Data Analysis
sample-correlation-analysis
Compute pairwise sample Pearson or Spearman correlations and visualize them as correlation heatmaps for QC and batch detection. Inputs: expression matrix, sample annotation. Outputs: correlation matrix, annotated heatmap, outlier sample flagging report.
Total Score: 86 / 100
Core Capability: 89 / 100
- Functional Suitability: 10 / 12
- Reliability: 10 / 12
- Performance & Context: 8 / 8
- Agent Usability: 14 / 16
- Human Usability: 7 / 8
- Security: 11 / 12
- Maintainability: 11 / 12
- Agent-Specific: 18 / 20
Medical Task: 19 / 20 assertions passed
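- Basic Pearson column analysis: 91 / 100, 4/4 assertions passed
- Row-labeled Spearman analysis: 85 / 100, 4/4 assertions passed
- Missing output variable: 67 / 100, 3/4 assertions passed
- Large Pearson TXT export: 89 / 100, 4/4 assertions passed
- Large Spearman one-sided run: 90 / 100, 4/4 assertions passed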
Veto Gates: Required pass for any deployment consideration
Skill Veto: ✓ All 4 gates passed
| Gate | Result | Detail |
|---|---|---|
| Operational Stability | PASS | System remains stable across varied inputs and edge cases |
| Structural Consistency | PASS | Output structure conforms to expected skill contract format |
| Result Determinism | PASS | Equivalent inputs produce semantically equivalent outputs |
| System Security | PASS | No prompt injection, data leakage, or unsafe tool use detected |
Research Veto: ✅ PASS (Applicable)
| Dimension | Result | Detail |
|---|---|---|
| Scientific Integrity | PASS | Outputs were generated from real bundled datasets and no fabricated statistical claims were detected. |
| Practice Boundaries | PASS | The skill stays within statistical computation and does not make diagnostic, prescriptive, or treatment claims. |
| Methodological Ground | PASS | Pearson and Spearman were applied to appropriate two-variable correlation tasks with documented assumptions and bounded scope. |
| Code Usability | PASS | All audited success-path commands executed successfully in the current environment without code changes. |
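As a rough illustration of the two-variable Pearson and Spearman computations the audit refers to, here is a minimal standard-library sketch. It is not the skill's own code, which this report does not show; the function names `pearson`, `average_ranks`, and `spearman` and the toy values are assumptions made for the example.

```python
from math import sqrt
from statistics import mean


def pearson(x, y):
    """Pearson r for two equal-length numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return sxy / sqrt(sxx * syy)


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rho: Pearson r computed on average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Made-up illustrative values: a monotonic but non-linear relationship.
x_values = [1, 2, 3, 4, 5]
y_values = [1, 4, 9, 16, 25]
r_pearson = pearson(x_values, y_values)
r_spearman = spearman(x_values, y_values)
print(f"Pearson r = {r_pearson:.4f}, Spearman rho = {r_spearman:.4f}")
```

On monotonic but curved data like this, Spearman reaches its maximum while Pearson stays below it, which is the usual reason to offer both methods.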
Core Capability: 89 / 100 (8 Categories)
Functional Suitability
Core Pearson and Spearman workflows are covered well, but a few runtime behaviors are not fully documented.
10 / 12
83%
Reliability
Error codes are strong, but failed runs still create empty output directories and one troubleshooting dependency note is inconsistent.
10 / 12
83%
Performance & Context
Full score achieved. The skill layers detail into references and keeps the top-level workflow concise.
8 / 8
100%
Agent Usability
Instructions are easy to follow, though a few wording inconsistencies slightly raise inference cost.
14 / 16
88%
Human Usability
Trigger language is natural, but the interface remains strict on exact variable names and file-based inputs.
7 / 8
88%
Security
Input validation is strong and no dangerous execution patterns were found. Routine session metadata is still written to disk on success.
11 / 12
92%
Maintainability
Scripts and references are well separated, but some documentation has drifted from observed runtime behavior.
11 / 12
92%
Agent-Specific
Triggering, layering, and idempotency are strong. Out-of-scope guidance could be more explicit.
18 / 20
90%
Core Capability Total: 89 / 100
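The total and the per-category percentages can be re-derived from the eight category scores listed above. The short check below is only a sanity calculation; the dictionary layout is an assumption of this sketch, not part of the report's tooling.

```python
# (earned, maximum) points per category, copied from the scorecard.
category_scores = {
    "Functional Suitability": (10, 12),
    "Reliability": (10, 12),
    "Performance & Context": (8, 8),
    "Agent Usability": (14, 16),
    "Human Usability": (7, 8),
    "Security": (11, 12),
    "Maintainability": (11, 12),
    "Agent-Specific": (18, 20),
}

earned_total = sum(earned for earned, _ in category_scores.values())
maximum_total = sum(maximum for _, maximum in category_scores.values())

# Percentages rounded to whole numbers, as displayed in the scorecard.
percentages = {
    name: round(100 * earned / maximum)
    for name, (earned, maximum) in category_scores.items()
}

print(f"Core Capability Total: {earned_total} / {maximum_total}")
for name, pct in percentages.items():
    print(f"{name}: {pct}%")
```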
Medical Task: Execution Average 84.4 / 100, Assertions 19/20 Passed
Canonical: Basic Pearson column analysis (✅ Pass, 91 / 100)
Executed cleanly and produced the expected CSV output.
Basic 37/40 | Specialized 54/60 | Total 91/100
- ✅ A1: Output creates the expected Pearson result file in table/.
- ✅ A2: Output reports both requested variables and the correct sample size.
- ✅ A3: Output includes the core statistical fields promised by the skill.
- ✅ A4: Output stays within the skill's two-variable correlation scope.
Pass rate: 4 / 4
Variant A: Row-labeled Spearman analysis (✅ Pass, 85 / 100)
Row-oriented input handling worked correctly and the result stayed accurate.
Basic 35/40 | Specialized 50/60 | Total 85/100
- ✅ A1: The skill handles variables stored as first-column row labels.
- ✅ A2: The requested Spearman method is preserved in the output.
- ✅ A3: The result includes the requested variable names and a valid sample size.
- ✅ A4: The skill does not invent confidence intervals when this path does not provide them.
Pass rate: 4 / 4
Edge: Missing output variable (⚠️ Warning, 67 / 100)
The failure message is clear, but the run still leaves an empty output directory tree behind.
Basic 28/40 | Specialized 39/60 | Total 67/100
- ✅ A1: The skill surfaces a specific, actionable missing-column error.
- ✅ A2: The run does not fabricate a result file after the validation failure.
- ❌ A3: The skill avoids unnecessary side effects on failed validation.
- ✅ A4: The failure path stays within scope and does not produce misleading statistics.
Pass rate: 3 / 4
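One common way to avoid the side effect flagged in A3 is to validate inputs before creating any output directories. The sketch below shows that ordering under stated assumptions: `run_analysis`, `read_header`, and `MissingColumnError` are hypothetical names, the column names are made up, and the skill's real entry point and SKILL_* codes are not shown in this report. Only the `table/` subdirectory name comes from the audit.

```python
import csv
import tempfile
from pathlib import Path


class MissingColumnError(ValueError):
    """Hypothetical error type standing in for the skill's own error codes."""


def read_header(csv_path):
    """Return the first row of a CSV file as a list of column names."""
    with open(csv_path, newline="") as handle:
        return next(csv.reader(handle), [])


def run_analysis(csv_path, x_col, y_col, out_dir):
    """Validate the requested columns first, then create the output directory."""
    header = read_header(csv_path)
    missing = [col for col in (x_col, y_col) if col not in header]
    if missing:
        # Fail before touching the filesystem, so nothing is left behind.
        raise MissingColumnError(f"missing column(s): {', '.join(missing)}")
    table_dir = Path(out_dir) / "table"
    table_dir.mkdir(parents=True, exist_ok=True)
    return table_dir


# Demonstration with a throwaway file: the second column name does not exist.
failure_message = None
with tempfile.TemporaryDirectory() as tmp:
    data_file = Path(tmp) / "data.csv"
    data_file.write_text("age,bmi\n34,22.1\n51,27.4\n")
    out_dir = Path(tmp) / "out"
    try:
        run_analysis(data_file, "age", "weight", out_dir)
    except MissingColumnError as err:
        failure_message = str(err)
    left_behind = out_dir.exists()

print(failure_message, "| output directory created:", left_behind)
```

If the directory has to exist before validation for some other reason, an alternative is to remove it in a cleanup step when validation fails.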
Variant B: Large Pearson TXT export (✅ Pass, 89 / 100)
Custom parameters and TXT output worked as documented on the large bundled dataset.
Basic 36/40 | Specialized 53/60 | Total 89/100
- ✅ A1: The skill respects custom output format and prefix settings.
- ✅ A2: The output preserves the requested hypothesis and confidence level.
- ✅ A3: The larger dataset executes cleanly without manual tuning.
- ✅ A4: The output remains constrained to the requested statistical summary.
Pass rate: 4 / 4
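For readers unfamiliar with how a confidence level enters a Pearson result, the Fisher z-transformation is the textbook way to build the interval. Whether the skill uses this exact method is not stated in the report, so treat the following as an illustrative sketch; `pearson_ci` and the example inputs are hypothetical.

```python
from math import atanh, sqrt, tanh
from statistics import NormalDist


def pearson_ci(r, n, confidence=0.95):
    """Two-sided confidence interval for Pearson r via the Fisher z-transformation."""
    if not -1 < r < 1:
        raise ValueError("r must lie strictly between -1 and 1")
    if n < 4:
        raise ValueError("the Fisher approximation needs at least 4 observations")
    if not 0 < confidence < 1:
        raise ValueError("confidence must lie strictly between 0 and 1")
    z = atanh(r)                      # transform r to an approximately normal scale
    standard_error = 1 / sqrt(n - 3)  # approximate standard error of z
    critical = NormalDist().inv_cdf(0.5 + confidence / 2)
    lower = tanh(z - critical * standard_error)
    upper = tanh(z + critical * standard_error)
    return lower, upper


# Hypothetical inputs: r = 0.5 observed on 50 paired samples.
low_95, high_95 = pearson_ci(0.5, 50, confidence=0.95)
low_99, high_99 = pearson_ci(0.5, 50, confidence=0.99)
print(f"95% CI: ({low_95:.3f}, {high_95:.3f})")
print(f"99% CI: ({low_99:.3f}, {high_99:.3f})")
```

A higher confidence level widens the interval, so a report should state the level alongside the bounds.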
Stress: Large Spearman one-sided run (✅ Pass, 90 / 100)
The large Spearman run completed successfully and preserved all requested parameters.
Basic 36/40 | Specialized 54/60 | Total 90/100
- ✅ A1: The skill completes a large-dataset Spearman run successfully.
- ✅ A2: The output preserves the requested method, variables, and one-sided alternative.
- ✅ A3: The output includes the promised core statistics.
- ✅ A4: The skill does not overstep into unsupported interpretation or causal claims.
Pass rate: 4 / 4
Medical Task Total: 84.4 / 100
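The 84.4 average and the 19/20 assertion count follow directly from the five task results above. The snippet below reproduces that arithmetic; the tuple layout is an assumption of this sketch.

```python
from statistics import mean

# (basic, specialized, assertions passed, assertions total) per task, from the report.
task_results = {
    "Canonical": (37, 54, 4, 4),
    "Variant A": (35, 50, 4, 4),
    "Edge": (28, 39, 3, 4),
    "Variant B": (36, 53, 4, 4),
    "Stress": (36, 54, 4, 4),
}

# Each task total is the sum of its Basic and Specialized sub-scores.
task_totals = {
    name: basic + specialized
    for name, (basic, specialized, _, _) in task_results.items()
}
execution_average = mean(task_totals.values())
assertions_passed = sum(passed for _, _, passed, _ in task_results.values())
assertions_total = sum(total for _, _, _, total in task_results.values())

print(f"Execution average: {execution_average:.1f} / 100")
print(f"Assertions: {assertions_passed}/{assertions_total} passed")
```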
Key Strengths
- The core Pearson and Spearman workflows execute successfully on both column-oriented and row-labeled bundled datasets.
- The skill exposes clear CLI parameters, deterministic outputs, and a lightweight reference structure that keeps the main SKILL.md concise.
- Error reporting uses stable, human-readable SKILL_* codes that make failures easy to diagnose in automation or manual use.