Data Analysis

immune-pathway-analysis

Analyze immune-related pathway activity and immune cell-type enrichment scores from bulk RNA-seq expression data. Inputs: expression matrix, immune gene sets or pathway database. Outputs: pathway activity table, immune score comparison plots.

94 / 100 Total Score

Core Capability

98 / 100

Functional Suitability

12 / 12

Reliability

11 / 12

Performance & Context

8 / 8

Agent Usability

16 / 16

Human Usability

8 / 8

Security

12 / 12

Maintainability

12 / 12

Agent-Specific

19 / 20

Medical Task

23 / 23 Passed

Scenario	Score	Assertions
Full GSVA baseline	93	5/5
Full ssGSEA baseline	91	5/5
Visualize-only reuse	90	4/4
Analyze-only export	89	4/4
Full custom heatmap run	94	5/5

Veto Gates: Required pass for any deployment consideration

Skill Veto: ✓ All 4 gates passed

Gate	Result	Detail
Operational Stability	PASS	System remains stable across varied inputs and edge cases.
Structural Consistency	PASS	Output structure conforms to expected skill contract format.
Result Determinism	PASS	Equivalent inputs produce semantically equivalent outputs.
System Security	PASS	No prompt injection, data leakage, or unsafe tool use detected.

Research Veto: ✅ PASS (Applicable)

Dimension	Result	Detail
Scientific Integrity	PASS	Scientific integrity check completed without reportable redline findings.
Practice Boundaries	PASS	Practice boundaries check completed without reportable redline findings.
Methodological Ground	PASS	Methodological ground check completed without reportable redline findings.
Code Usability	PASS	Code usability check completed without reportable redline findings.

Core Capability: 98 / 100 (8 Categories)

Functional Suitability

Functional suitability was evaluated against the skill-auditor rubric; no blocking schema-level issue was recorded for this dimension.

12 / 12

100%

Reliability

Append-only provenance improves recovery, but reruns still overwrite core output artifacts in place.

11 / 12

92%

Performance & Context

Performance and context was evaluated against the skill-auditor rubric; no blocking schema-level issue was recorded for this dimension.

8 / 8

100%

Agent Usability

Completion format now tells the agent exactly how to summarize results after execution.

16 / 16

100%

Human Usability

Fixture notes now explain the smoke-test data and the minimal gene-set helper more clearly.

8 / 8

100%

Security

Security was evaluated against the skill-auditor rubric; no blocking schema-level issue was recorded for this dimension.

12 / 12

100%

Maintainability

Maintainability was evaluated against the skill-auditor rubric; no blocking schema-level issue was recorded for this dimension.

12 / 12

100%

Agent-Specific

Escape hatches and progressive disclosure are strong; reruns still append timestamped provenance by design.

19 / 20

95%

Core Capability Total: 98 / 100
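The total and the per-category percentages can be rechecked from the eight category scores listed above. The sketch below is only an arithmetic check; the variable and function names are illustrative, and rounding to the nearest whole percent is an assumption that happens to match the figures shown.

```python
# Category scores copied from the scorecard as (earned, maximum).
CATEGORY_SCORES = {
    "Functional Suitability": (12, 12),
    "Reliability": (11, 12),
    "Performance & Context": (8, 8),
    "Agent Usability": (16, 16),
    "Human Usability": (8, 8),
    "Security": (12, 12),
    "Maintainability": (12, 12),
    "Agent-Specific": (19, 20),
}


def summarize(scores):
    """Return total earned, total maximum, and whole-number percentages."""
    earned = sum(e for e, _ in scores.values())
    maximum = sum(m for _, m in scores.values())
    # Assumption: percentages are rounded to the nearest integer.
    percents = {name: round(100 * e / m) for name, (e, m) in scores.items()}
    return earned, maximum, percents


earned, maximum, percents = summarize(CATEGORY_SCORES)
print(f"Core Capability Total: {earned} / {maximum}")
for name, pct in percents.items():
    print(f"{name}: {pct}%")
```

The two categories below 100% (Reliability and Agent-Specific) account for the two points missing from the total.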

Medical Task: Execution Average 91.4 / 100; Assertions 23/23 Passed

Canonical

Full GSVA baseline

5/5 ✓

Variant A

Full ssGSEA baseline

5/5 ✓

Edge

Visualize-only reuse

4/4 ✓

Variant B

Analyze-only export

4/4 ✓

Stress

Full custom heatmap run

5/5 ✓

Canonical: ✅ Pass

Full GSVA baseline

The run fell back to ranking pathways by |t| because no pathways met FDR <= 0.05 on the bundled demo data.

Basic 38/40 | Specialized 55/60 | Total 93/100

✅ A1: Required outputs were generated for the audited workflow.

✅ A2: Input handling and validation behaved as documented.

✅ A3: No unsupported medical or scientific claim fabrication was detected.

✅ A4: Execution stayed within the stated Data Analysis skill scope.

✅ A5: Result artifacts and scoring evidence were sufficient for audit review.

Pass rate: 5 / 5
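The fallback noted in the canonical run (rank by |t| when nothing passes FDR <= 0.05) can be sketched as follows. This is a minimal illustration, not the skill's actual code: the record schema, the function names, the `top_n` parameter, the pathway labels, and the numbers are all invented for the example, and Benjamini-Hochberg is assumed as the FDR procedure.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


def select_pathways(results, fdr_cutoff=0.05, top_n=3):
    """Pick pathways by FDR; fall back to |t| ranking if none pass.

    `results` is a list of dicts with 'pathway', 't', and 'p' keys
    (a hypothetical schema used only for this sketch).
    """
    fdr = bh_adjust([r["p"] for r in results])
    scored = [dict(r, fdr=q) for r, q in zip(results, fdr)]
    significant = [r for r in scored if r["fdr"] <= fdr_cutoff]
    if significant:
        return sorted(significant, key=lambda r: r["fdr"])[:top_n], "fdr"
    ranked = sorted(scored, key=lambda r: abs(r["t"]), reverse=True)
    return ranked[:top_n], "abs_t_fallback"


# Invented example values, not the bundled demo data.
demo = [
    {"pathway": "PATHWAY_A", "t": 2.1, "p": 0.04},
    {"pathway": "PATHWAY_B", "t": -2.6, "p": 0.02},
    {"pathway": "PATHWAY_C", "t": 0.4, "p": 0.70},
    {"pathway": "PATHWAY_D", "t": -1.2, "p": 0.25},
]
top, mode = select_pathways(demo, top_n=2)
print(mode, [r["pathway"] for r in top])
```

Returning the selection mode alongside the pathways lets the report state explicitly that the ranking is a fallback rather than a list of significant hits.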

Variant A: ✅ Pass

Full ssGSEA baseline

Method switch to ssGSEA succeeded and produced a complete output bundle.

Basic 37/40 | Specialized 54/60 | Total 91/100

✅ A1: Required outputs were generated for the audited workflow.

✅ A2: Input handling and validation behaved as documented.

✅ A3: No unsupported medical or scientific claim fabrication was detected.

✅ A4: Execution stayed within the stated Data Analysis skill scope.

✅ A5: Result artifacts and scoring evidence were sufficient for audit review.

Pass rate: 5 / 5

Edge: ✅ Pass

Visualize-only reuse

Reused saved RDS state and appended a second provenance section without rerunning analysis.

Basic 36/40 | Specialized 54/60 | Total 90/100

✅ A1: Required outputs were generated for the audited workflow.

✅ A2: Input handling and validation behaved as documented.

✅ A3: No unsupported medical or scientific claim fabrication was detected.

✅ A4: Execution stayed within the stated Data Analysis skill scope.

Pass rate: 4 / 4
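The append-only provenance behaviour seen in the visualize-only run (a second timestamped section added to the existing record) might look roughly like the sketch below. The file name, section header format, mode labels, and function names are assumptions made for illustration; the audited skill's real provenance format is not shown in this report.

```python
from datetime import datetime, timezone
from pathlib import Path
import tempfile


def append_provenance(log_path, mode, details):
    """Append one timestamped section; earlier sections are never rewritten."""
    stamp = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    lines = [f"== run {stamp} mode={mode} =="]
    lines += [f"{key}: {value}" for key, value in details.items()]
    # Mode "a" creates the file if missing and otherwise appends to it.
    with Path(log_path).open("a", encoding="utf-8") as handle:
        handle.write("\n".join(lines) + "\n\n")


def count_sections(log_path):
    """Count the run sections recorded so far."""
    text = Path(log_path).read_text(encoding="utf-8")
    return sum(1 for line in text.splitlines() if line.startswith("== run "))


with tempfile.TemporaryDirectory() as tmp:
    log = Path(tmp) / "provenance.txt"  # hypothetical file name
    append_provenance(log, "full", {"method": "gsva"})
    append_provenance(log, "visualize", {"state": "reused saved RDS"})
    sections = count_sections(log)

print(sections)
```

Appending keeps the history of every run, which is the recovery benefit the Reliability note credits; it does not by itself protect output artifacts that a rerun overwrites in place.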

Variant B: ✅ Pass

Analyze-only export

Analyze mode correctly produced tables and RDS output without requiring a plot.

Basic 36/40 | Specialized 53/60 | Total 89/100

✅ A1: Required outputs were generated for the audited workflow.

✅ A2: Input handling and validation behaved as documented.

✅ A3: No unsupported medical or scientific claim fabrication was detected.

✅ A4: Execution stayed within the stated Data Analysis skill scope.

Pass rate: 4 / 4

Stress: ✅ Pass

Full custom heatmap run

Accepted multiple non-default plotting parameters and produced the custom heatmap PDF successfully.

Basic 38/40 | Specialized 56/60 | Total 94/100

✅ A1: Required outputs were generated for the audited workflow.

✅ A2: Input handling and validation behaved as documented.

✅ A3: No unsupported medical or scientific claim fabrication was detected.

✅ A4: Execution stayed within the stated Data Analysis skill scope.

✅ A5: Result artifacts and scoring evidence were sufficient for audit review.

Pass rate: 5 / 5

Medical Task Total: 91.4 / 100
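The 91.4 execution average and the 23/23 assertion count follow directly from the five scenario cards. The sketch below recomputes both from the Basic and Specialized sub-scores; it assumes the average is an unweighted mean of the scenario totals, which is consistent with the reported figure, and the dictionary layout is illustrative only.

```python
# Sub-scores and assertion counts copied from the five scenario cards.
SCENARIOS = {
    "Full GSVA baseline": {"basic": 38, "specialized": 55, "passed": 5, "asserted": 5},
    "Full ssGSEA baseline": {"basic": 37, "specialized": 54, "passed": 5, "asserted": 5},
    "Visualize-only reuse": {"basic": 36, "specialized": 54, "passed": 4, "asserted": 4},
    "Analyze-only export": {"basic": 36, "specialized": 53, "passed": 4, "asserted": 4},
    "Full custom heatmap run": {"basic": 38, "specialized": 56, "passed": 5, "asserted": 5},
}

# Each scenario total is Basic (out of 40) plus Specialized (out of 60).
totals = {name: s["basic"] + s["specialized"] for name, s in SCENARIOS.items()}

# Assumption: the execution average is an unweighted mean over scenarios.
average = sum(totals.values()) / len(totals)
passed = sum(s["passed"] for s in SCENARIOS.values())
asserted = sum(s["asserted"] for s in SCENARIOS.values())

print(f"Execution average: {average:.1f} / 100")
print(f"Assertions: {passed}/{asserted} passed")
```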

Key Strengths

Scope boundaries are explicit and reduce misuse across deconvolution, gene-level differential expression, and single-cell requests.
The CLI surface is well validated, including safe output path handling and precise parameter checks.
Provenance is strong: session info, output manifests, and run records are generated consistently.
The skill now explains smoke-test expectations and completion reporting more clearly to both agents and users.
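As an illustration of what "safe output path handling" can mean in a CLI, the sketch below rejects any requested output path that would resolve outside an allowed base directory. It is a generic pattern under stated assumptions, not the skill's own validation code; the function name and the example paths are hypothetical.

```python
from pathlib import Path
import tempfile


def resolve_output_path(base_dir, requested):
    """Resolve `requested` under `base_dir`, refusing paths that escape it."""
    base = Path(base_dir).resolve()
    # Joining with an absolute `requested` discards `base`, so absolute
    # paths outside the base are caught by the containment check below.
    candidate = (base / requested).resolve()
    try:
        candidate.relative_to(base)
    except ValueError:
        raise ValueError(f"output path escapes {base}: {requested}") from None
    return candidate


with tempfile.TemporaryDirectory() as tmp:
    accepted = resolve_output_path(tmp, "results/pathway_scores.tsv")
    print("accepted:", accepted.name)
    try:
        resolve_output_path(tmp, "../outside.tsv")
        rejected = False
    except ValueError:
        rejected = True
    print("traversal rejected:", rejected)
```

Resolving before checking containment means `..` segments and symlinks are normalized first, so the comparison is made on the path that would actually be written.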