Data Analysis

sample-correlation-analysis

Compute pairwise sample Pearson or Spearman correlations and visualize as correlation heatmaps for QC and batch detection. Inputs: expression matrix, sample annotation. Outputs: correlation matrix, annotated heatmap, outlier sample flagging report.

Total Score: 86 / 100

Core Capability

89 / 100

Functional Suitability

10 / 12

Reliability

10 / 12

Performance & Context

8 / 8

Agent Usability

14 / 16

Human Usability

7 / 8

Security

11 / 12

Maintainability

11 / 12

Agent-Specific

18 / 20

Medical Task

19 / 20 Passed

Task	Score	Assertions
Basic Pearson column analysis	91	4/4
Row-labeled Spearman analysis	85	4/4
Missing output variable	67	3/4
Large Pearson TXT export	89	4/4
Large Spearman one-sided run	90	4/4

Veto Gates: required pass for any deployment consideration

Skill Veto: ✓ All 4 gates passed

Gate	Result	Detail
Operational Stability	PASS	System remains stable across varied inputs and edge cases.
Structural Consistency	PASS	Output structure conforms to expected skill contract format.
Result Determinism	PASS	Equivalent inputs produce semantically equivalent outputs.
System Security	PASS	No prompt injection, data leakage, or unsafe tool use detected.

Research Veto: ✅ PASS (Applicable)

Dimension	Result	Detail
Scientific Integrity	PASS	Outputs were generated from real bundled datasets and no fabricated statistical claims were detected.
Practice Boundaries	PASS	The skill stays within statistical computation and does not make diagnostic, prescriptive, or treatment claims.
Methodological Ground	PASS	Pearson and Spearman were applied to appropriate two-variable correlation tasks with documented assumptions and bounded scope.
Code Usability	PASS	All audited success-path commands executed successfully in the current environment without code changes.
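
For readers unfamiliar with the two statistics named in the Methodological Ground row, the following is a minimal standard-library sketch of two-variable Pearson and Spearman correlation. It is not the skill's own implementation, which this report does not show; the function names `pearson`, `average_ranks`, and `spearman` are assumptions made for illustration.

```python
from math import sqrt


def pearson(x, y):
    """Sample Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return sxy / sqrt(sxx * syy)


def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on average ranks."""
    return pearson(average_ranks(x), average_ranks(y))
```

Pearson measures linear association on the raw values, while Spearman applies the same formula to ranks, so it responds to any monotonic relationship and is less sensitive to outliers.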

Core Capability: 89 / 100 (8 Categories)

Functional Suitability

Core Pearson and Spearman workflows are covered well, but a few runtime behaviors are not fully documented.

10 / 12

83%

Reliability

Error codes are strong, but failed runs still create empty output directories and one troubleshooting dependency note is inconsistent.

10 / 12

83%

Performance & Context

Full score achieved. The skill layers detail into references and keeps the top-level workflow concise.

8 / 8

100%

Agent Usability

Instructions are easy to follow, though a few wording inconsistencies slightly raise inference cost.

14 / 16

88%

Human Usability

Trigger language is natural, but the interface remains strict on exact variable names and file-based inputs.

7 / 8

88%

Security

Input validation is strong and no dangerous execution patterns were found. Routine session metadata is still written to disk on success.

11 / 12

92%

Maintainability

Scripts and references are well separated, but some documentation has drifted from observed runtime behavior.

11 / 12

92%

Agent-Specific

Triggering, layering, and idempotency are strong. Out-of-scope guidance could be more explicit.

18 / 20

90%

Core Capability Total: 89 / 100
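
The total and the per-category percentages can be re-derived from the eight category scores listed above. The sketch below is only an arithmetic check; the dictionary layout and the half-up rounding rule are assumptions, since the report does not state how its percentages were rounded.

```python
# (earned, maximum) per category, copied from the scorecard above.
categories = {
    "Functional Suitability": (10, 12),
    "Reliability": (10, 12),
    "Performance & Context": (8, 8),
    "Agent Usability": (14, 16),
    "Human Usability": (7, 8),
    "Security": (11, 12),
    "Maintainability": (11, 12),
    "Agent-Specific": (18, 20),
}


def percent_half_up(earned, maximum):
    """Whole-number percentage, rounding halves up, in integer arithmetic."""
    return (200 * earned + maximum) // (2 * maximum)


total_earned = sum(earned for earned, _ in categories.values())
total_possible = sum(maximum for _, maximum in categories.values())
percentages = {name: percent_half_up(e, m) for name, (e, m) in categories.items()}

print(f"Core Capability Total: {total_earned} / {total_possible}")
for name, pct in percentages.items():
    print(f"{name}: {pct}%")
```

Under these assumptions the category scores sum to 89 of 100, and the computed percentages match the ones shown on each card.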

Medical Task: Execution Average 84.4 / 100, Assertions 19/20 Passed

Canonical

Basic Pearson column analysis

4/4 ✓

Variant A

Row-labeled Spearman analysis

4/4 ✓

Edge

Missing output variable

3/4 ⚠

Variant B

Large Pearson TXT export

4/4 ✓

Stress

Large Spearman one-sided run

4/4 ✓

Canonical: ✅ Pass

Basic Pearson column analysis

Executed cleanly and produced the expected CSV output.

Basic 37/40 | Specialized 54/60 | Total 91/100

✅ A1: Output creates the expected Pearson result file in table/.

✅ A2: Output reports both requested variables and the correct sample size.

✅ A3: Output includes the core statistical fields promised by the skill.

✅ A4: Output stays within the skill's two-variable correlation scope.

Pass rate: 4 / 4

Variant A: ✅ Pass

Row-labeled Spearman analysis

Row-oriented input handling worked correctly and the result stayed accurate.

Basic 35/40 | Specialized 50/60 | Total 85/100

✅ A1: The skill handles variables stored as first-column row labels.

✅ A2: The requested Spearman method is preserved in the output.

✅ A3: The result includes the requested variable names and a valid sample size.

✅ A4: The skill does not invent confidence intervals when this path does not provide them.

Pass rate: 4 / 4

Edge: ⚠️ Warning

Missing output variable

The failure message is clear, but the run still leaves an empty output directory tree behind.

Basic 28/40 | Specialized 39/60 | Total 67/100

✅ A1: The skill surfaces a specific, actionable missing-column error.

✅ A2: The run does not fabricate a result file after the validation failure.

❌ A3: The skill avoids unnecessary side effects on failed validation.

✅ A4: The failure path stays within scope and does not produce misleading statistics.

Pass rate: 3 / 4
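
One common way to avoid the side effect flagged in A3 is to finish all input validation before touching the filesystem. The sketch below illustrates that ordering only; it is not the skill's code. `SkillError`, `prepare_output_dir`, and the `SKILL_MISSING_COLUMN` code are hypothetical names (the report confirms only that codes follow a SKILL_* pattern and that results land in table/).

```python
import csv
from pathlib import Path


class SkillError(Exception):
    """Hypothetical error type carrying a stable SKILL_* style code."""

    def __init__(self, code, message):
        super().__init__(f"{code}: {message}")
        self.code = code


def prepare_output_dir(input_csv, var_x, var_y, out_dir):
    """Validate the requested columns first, then create the output directory."""
    with open(input_csv, newline="") as fh:
        header = next(csv.reader(fh), [])

    # Validation happens before any directory is created, so a failed
    # run leaves nothing behind on disk.
    missing = [name for name in (var_x, var_y) if name not in header]
    if missing:
        raise SkillError(
            "SKILL_MISSING_COLUMN",  # hypothetical code name
            f"column(s) not found in input: {', '.join(missing)}",
        )

    table_dir = Path(out_dir) / "table"
    table_dir.mkdir(parents=True, exist_ok=True)
    return table_dir
```

With this ordering, a missing-column failure raises before `mkdir` is reached, which would satisfy the A3 assertion as worded.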

Variant B: ✅ Pass

Large Pearson TXT export

Custom parameters and TXT output worked as documented on the large bundled dataset.

Basic 36/40 | Specialized 53/60 | Total 89/100

✅ A1: The skill respects custom output format and prefix settings.

✅ A2: The output preserves the requested hypothesis and confidence level.

✅ A3: The larger dataset executes cleanly without manual tuning.

✅ A4: The output remains constrained to the requested statistical summary.

Pass rate: 4 / 4

Stress: ✅ Pass

Large Spearman one-sided run

The large Spearman run completed successfully and preserved all requested parameters.

Basic 36/40 | Specialized 54/60 | Total 90/100

✅ A1: The skill completes a large-dataset Spearman run successfully.

✅ A2: The output preserves the requested method, variables, and one-sided alternative.

✅ A3: The output includes the promised core statistics.

✅ A4: The skill does not overstep into unsupported interpretation or causal claims.

Pass rate: 4 / 4

Medical Task Total: 84.4 / 100
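
The 84.4 figure is consistent with a plain unweighted mean of the five per-task totals, each of which is the sum of its Basic and Specialized parts. The sketch below reproduces that arithmetic; the unweighted mean is an assumption inferred from the numbers, not a rule stated in the report.

```python
# (basic, specialized, assertions_passed, assertions_total) per task,
# copied from the task cards above.
tasks = {
    "Basic Pearson column analysis": (37, 54, 4, 4),
    "Row-labeled Spearman analysis": (35, 50, 4, 4),
    "Missing output variable": (28, 39, 3, 4),
    "Large Pearson TXT export": (36, 53, 4, 4),
    "Large Spearman one-sided run": (36, 54, 4, 4),
}

# Each task total is Basic (out of 40) plus Specialized (out of 60).
totals = {name: basic + spec for name, (basic, spec, _, _) in tasks.items()}

execution_average = round(sum(totals.values()) / len(totals), 1)
passed = sum(p for _, _, p, _ in tasks.values())
asserted = sum(t for _, _, _, t in tasks.values())

print(f"Execution Average: {execution_average} / 100")
print(f"Assertions: {passed}/{asserted} Passed")
```

The single failed assertion (A3 on the edge case) accounts for both the 19/20 count and most of the gap between the edge-case score and the other four.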

Key Strengths

The core Pearson and Spearman workflows execute successfully on both column-oriented and row-labeled bundled datasets.
The skill exposes clear CLI parameters, deterministic outputs, and a lightweight reference structure that keeps the main SKILL.md concise.
Error reporting uses stable, human-readable SKILL_* codes that make failures easy to diagnose in automation or manual use.