Data Analysis

sample-correlation-analysis

Compute pairwise sample-to-sample Pearson or Spearman correlations and visualize them as correlation heatmaps for QC and batch-effect detection. Inputs: expression matrix, sample annotation. Outputs: correlation matrix, annotated heatmap, outlier-sample flagging report.
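
The computation described above can be sketched in plain Python. This is an illustrative sketch, not the skill's actual implementation: the function names, the toy expression values, and the mean-correlation outlier rule with its 0.5 threshold are all assumptions made for the example. Heatmap plotting is left out.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences.
    Raises ZeroDivisionError if either sequence has zero variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            out[order[k]] = avg
        i = j + 1
    return out

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(ranks(x), ranks(y))

def sample_correlation_matrix(expr, method="pearson"):
    """expr maps sample name -> expression values in a shared gene order."""
    corr = pearson if method == "pearson" else spearman
    names = list(expr)
    return {a: {b: corr(expr[a], expr[b]) for b in names} for a in names}

def flag_outliers(matrix, threshold):
    """Hypothetical rule: flag samples whose mean correlation with all
    other samples falls below the threshold."""
    flagged = []
    for a, row in matrix.items():
        others = [r for b, r in row.items() if b != a]
        if others and sum(others) / len(others) < threshold:
            flagged.append(a)
    return flagged

# Toy expression matrix (made-up values): S3 disagrees with the rest.
expr = {
    "S1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "S2": [1.1, 2.1, 2.9, 4.2, 5.1],
    "S3": [5.0, 1.0, 4.0, 2.0, 3.0],
    "S4": [0.9, 2.2, 3.1, 3.9, 5.2],
}
matrix = sample_correlation_matrix(expr, method="spearman")
outliers = flag_outliers(matrix, threshold=0.5)
```

On this toy input only `S3` is flagged. A real QC threshold would depend on the assay and the dataset, so the value used here should not be read as a recommendation.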

Total Score: 86 / 100
Core Capability: 89 / 100
Functional Suitability: 10 / 12
Reliability: 10 / 12
Performance & Context: 8 / 8
Agent Usability: 14 / 16
Human Usability: 7 / 8
Security: 11 / 12
Maintainability: 11 / 12
Agent-Specific: 18 / 20
Medical Task: 19 / 20 Passed
91 | Basic Pearson column analysis | 4/4
85 | Row-labeled Spearman analysis | 4/4
67 | Missing output variable | 3/4
89 | Large Pearson TXT export | 4/4
90 | Large Spearman one-sided run | 4/4

Veto Gates: Required pass for any deployment consideration

Skill Veto: ✓ All 4 gates passed
Operational Stability | PASS | System remains stable across varied inputs and edge cases
Structural Consistency | PASS | Output structure conforms to expected skill contract format
Result Determinism | PASS | Equivalent inputs produce semantically equivalent outputs
System Security | PASS | No prompt injection, data leakage, or unsafe tool use detected
Research Veto: ✅ PASS (Applicable)
Dimension | Result | Detail
Scientific Integrity | PASS | Outputs were generated from real bundled datasets and no fabricated statistical claims were detected.
Practice Boundaries | PASS | The skill stays within statistical computation and does not make diagnostic, prescriptive, or treatment claims.
Methodological Ground | PASS | Pearson and Spearman were applied to appropriate two-variable correlation tasks with documented assumptions and bounded scope.
Code Usability | PASS | All audited success-path commands executed successfully in the current environment without code changes.

Core Capability: 89 / 100 (8 Categories)

Functional Suitability: 10 / 12 (83%)
Core Pearson and Spearman workflows are covered well, but a few runtime behaviors are not fully documented.
Reliability: 10 / 12 (83%)
Error codes are strong, but failed runs still create empty output directories and one troubleshooting dependency note is inconsistent.
Performance & Context: 8 / 8 (100%)
Full score achieved. The skill layers detail into references and keeps the top-level workflow concise.
Agent Usability: 14 / 16 (88%)
Instructions are easy to follow, though a few wording inconsistencies slightly raise inference cost.
Human Usability: 7 / 8 (88%)
Trigger language is natural, but the interface remains strict on exact variable names and file-based inputs.
Security: 11 / 12 (92%)
Input validation is strong and no dangerous execution patterns were found. Routine session metadata is still written to disk on success.
Maintainability: 11 / 12 (92%)
Scripts and references are well separated, but some documentation has drifted from observed runtime behavior.
Agent-Specific: 18 / 20 (90%)
Triggering, layering, and idempotency are strong. Out-of-scope guidance could be more explicit.
Core Capability Total: 89 / 100

Medical Task: Execution Average 84.4 / 100; Assertions 19/20 Passed

Canonical (91 / 100): ✅ Pass
Basic Pearson column analysis

Executed cleanly and produced the expected CSV output.

Basic 37/40 | Specialized 54/60 | Total 91/100
A1: Output creates the expected Pearson result file in table/.
A2: Output reports both requested variables and the correct sample size.
A3: Output includes the core statistical fields promised by the skill.
A4: Output stays within the skill's two-variable correlation scope.
Pass rate: 4 / 4
Variant A (85 / 100): ✅ Pass
Row-labeled Spearman analysis

Row-oriented input handling worked correctly and the result stayed accurate.

Basic 35/40 | Specialized 50/60 | Total 85/100
A1: The skill handles variables stored as first-column row labels.
A2: The requested Spearman method is preserved in the output.
A3: The result includes the requested variable names and a valid sample size.
A4: The skill does not invent confidence intervals when this path does not provide them.
Pass rate: 4 / 4
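
For readers unfamiliar with the row-labeled layout exercised in this case, the sketch below shows one way such input could be read: each row starts with the variable name and the remaining cells are its observations. The variable names `age` and `marker_a`, the values, and the helper `read_row_labeled` are hypothetical and are not taken from the skill or its bundled datasets.

```python
import csv
import io

def read_row_labeled(text):
    """Parse CSV text whose first column holds each row's variable name."""
    table = {}
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue  # skip blank lines
        label, *values = row
        table[label.strip()] = [float(v) for v in values]
    return table

# Two variables stored as rows, four observations each (made-up values).
sample = "age,34,51,47,29\nmarker_a,1.2,2.8,2.1,0.9\n"
table = read_row_labeled(sample)
n_obs = len(table["age"])
```

After parsing, each variable is an ordinary list of floats, so the same correlation routine can serve both column-oriented and row-labeled inputs.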
Edge (67 / 100): ⚠️ Warning
Missing output variable

The failure message is clear, but the run still leaves an empty output directory tree behind.

Basic 28/40 | Specialized 39/60 | Total 67/100
A1: The skill surfaces a specific, actionable missing-column error.
A2: The run does not fabricate a result file after the validation failure.
A3: The skill avoids unnecessary side effects on failed validation.
A4: The failure path stays within scope and does not produce misleading statistics.
Pass rate: 3 / 4
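
One common way to avoid the leftover directory tree noted in this case is to finish all input validation before touching the filesystem. The sketch below illustrates that ordering under stated assumptions: `SkillError`, `run_analysis`, and the code `SKILL_MISSING_COLUMN` are hypothetical names chosen to resemble the SKILL_* style the report mentions, and are not the skill's real identifiers.

```python
import os
import tempfile

class SkillError(Exception):
    """Hypothetical error type carrying a SKILL_*-style code."""
    def __init__(self, code, message):
        super().__init__(f"{code}: {message}")
        self.code = code

def run_analysis(columns, x_var, y_var, out_dir):
    # Validate first: nothing is written to disk if a variable is missing.
    missing = [v for v in (x_var, y_var) if v not in columns]
    if missing:
        raise SkillError("SKILL_MISSING_COLUMN",
                         "column(s) not found: " + ", ".join(missing))
    # Only after validation succeeds is the output tree created.
    table_dir = os.path.join(out_dir, "table")
    os.makedirs(table_dir, exist_ok=True)
    return table_dir

columns = {"age": [], "marker_a": []}  # made-up column names
with tempfile.TemporaryDirectory() as tmp:
    out_dir = os.path.join(tmp, "results")
    try:
        run_analysis(columns, "age", "marker_b", out_dir)
        failed_code = None
    except SkillError as err:
        failed_code = err.code
    left_behind = os.path.exists(out_dir)   # False: the failed run wrote nothing
    ok_dir = run_analysis(columns, "age", "marker_a", out_dir)
    created = os.path.isdir(ok_dir)         # True: the valid run creates table/
```

If directories must be created early for other reasons, an alternative is to remove them in the failure path, at the cost of extra cleanup logic.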
Variant B (89 / 100): ✅ Pass
Large Pearson TXT export

Custom parameters and TXT output worked as documented on the large bundled dataset.

Basic 36/40 | Specialized 53/60 | Total 89/100
A1: The skill respects custom output format and prefix settings.
A2: The output preserves the requested hypothesis and confidence level.
A3: The larger dataset executes cleanly without manual tuning.
A4: The output remains constrained to the requested statistical summary.
Pass rate: 4 / 4
Stress (90 / 100): ✅ Pass
Large Spearman one-sided run

The large Spearman run completed successfully and preserved all requested parameters.

Basic 36/40 | Specialized 54/60 | Total 90/100
A1: The skill completes a large-dataset Spearman run successfully.
A2: The output preserves the requested method, variables, and one-sided alternative.
A3: The output includes the promised core statistics.
A4: The skill does not overstep into unsupported interpretation or causal claims.
Pass rate: 4 / 4
Medical Task Total: 84.4 / 100
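
The Medical Task figures are consistent with a plain unweighted mean of the five case scores and a simple sum of the assertion counts. The short check below reproduces them from the numbers reported above. The unweighted averaging is an assumption that happens to match the reported total, not a documented scoring rule.

```python
# Per-case totals and assertion results as reported above.
scores = {"Canonical": 91, "Variant A": 85, "Edge": 67, "Variant B": 89, "Stress": 90}
assertions = {"Canonical": (4, 4), "Variant A": (4, 4), "Edge": (3, 4),
              "Variant B": (4, 4), "Stress": (4, 4)}

# Unweighted mean of the five case scores: 422 / 5.
execution_average = round(sum(scores.values()) / len(scores), 1)

# Assertions passed and assertions evaluated, summed over all cases.
passed = sum(p for p, _ in assertions.values())
total = sum(t for _, t in assertions.values())
```

Here `execution_average` evaluates to 84.4, with `passed` equal to 19 and `total` equal to 20.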

Key Strengths

  • The core Pearson and Spearman workflows execute successfully on both column-oriented and row-labeled bundled datasets.
  • The skill exposes clear CLI parameters, deterministic outputs, and a lightweight reference structure that keeps the main SKILL.md concise.
  • Error reporting uses stable, human-readable SKILL_* codes that make failures easy to diagnose in automation or manual use.