Data Analysis
immune-pathway-analysis
Analyze immune-related pathway activity and immune cell-type enrichment scores from bulk RNA-seq expression data. Inputs: expression matrix, immune gene sets or pathway database. Outputs: pathway activity table, immune score comparison plots.
94 / 100 Total Score
Core Capability
98 / 100
Functional Suitability
12 / 12
Reliability
11 / 12
Performance & Context
8 / 8
Agent Usability
16 / 16
Human Usability
8 / 8
Security
12 / 12
Maintainability
12 / 12
Agent-Specific
19 / 20
Medical Task
23 / 23 Passed
| Scenario | Score | Assertions |
|---|---|---|
| Full GSVA baseline | 93 | 5/5 |
| Full ssGSEA baseline | 91 | 5/5 |
| Visualize-only reuse | 90 | 4/4 |
| Analyze-only export | 89 | 4/4 |
| Full custom heatmap run | 94 | 5/5 |
Veto Gates: Required pass for any deployment consideration
Skill Veto: ✓ All 4 gates passed
| Gate | Criterion | Result |
|---|---|---|
| Operational Stability | System remains stable across varied inputs and edge cases | PASS |
| Structural Consistency | Output structure conforms to expected skill contract format | PASS |
| Result Determinism | Equivalent inputs produce semantically equivalent outputs | PASS |
| System Security | No prompt injection, data leakage, or unsafe tool use detected | PASS |
Research Veto: ✅ PASS (Applicable)
| Dimension | Result | Detail |
|---|---|---|
| Scientific Integrity | PASS | Scientific integrity check completed without reportable redline findings. |
| Practice Boundaries | PASS | Practice boundaries check completed without reportable redline findings. |
| Methodological Ground | PASS | Methodological ground check completed without reportable redline findings. |
| Code Usability | PASS | Code usability check completed without reportable redline findings. |
Core Capability: 98 / 100 (8 Categories)
Functional Suitability
Functional suitability was evaluated against the skill-auditor rubric; no blocking schema-level issue was recorded for this dimension.
12 / 12
100%
Reliability
Append-only provenance improves recovery, but reruns still overwrite core output artifacts in place.
11 / 12
92%
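The Reliability note distinguishes two write behaviors: provenance that is appended on every run, and core artifacts that are replaced in place. The Python sketch below illustrates that distinction only. The file names `pathway_activity.tsv` and `provenance.md` and the helper `record_run` are hypothetical and are not taken from the audited skill.

```python
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def record_run(out_dir: Path, table_text: str) -> None:
    """Write one run's outputs (hypothetical file names, for illustration)."""
    # Core artifact: written in place, so a rerun replaces the previous contents.
    (out_dir / "pathway_activity.tsv").write_text(table_text)
    # Provenance: opened in append mode, so each rerun adds a timestamped section.
    stamp = datetime.now(timezone.utc).isoformat()
    with (out_dir / "provenance.md").open("a") as log:
        log.write(f"## Run at {stamp}\n")


with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp)
    record_run(out, "pathway\tscore\nIFN\t0.1\n")
    record_run(out, "pathway\tscore\nIFN\t0.2\n")
    table = (out / "pathway_activity.tsv").read_text()
    sections = (out / "provenance.md").read_text().count("## Run at")
```

After two runs the provenance file holds two sections while the table holds only the second run's values, which is the behavior the one-point deduction describes.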
Performance & Context
Performance and context was evaluated against the skill-auditor rubric; no blocking schema-level issue was recorded for this dimension.
8 / 8
100%
Agent Usability
Completion format now tells the agent exactly how to summarize results after execution.
16 / 16
100%
Human Usability
Fixture notes now explain the smoke-test data and the minimal gene-set helper more clearly.
8 / 8
100%
Security
Security was evaluated against the skill-auditor rubric; no blocking schema-level issue was recorded for this dimension.
12 / 12
100%
Maintainability
Maintainability was evaluated against the skill-auditor rubric; no blocking schema-level issue was recorded for this dimension.
12 / 12
100%
Agent-Specific
Escape hatches and progressive disclosure are strong; reruns still append timestamped provenance by design.
19 / 20
95%
Core Capability Total: 98 / 100
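The total and the per-category percentages can be rechecked from the eight category scores reported above. This is a minimal Python sketch of that arithmetic. The dictionary layout is an illustrative assumption, not the auditor's own schema.

```python
# (score, maximum) per category, copied from the scorecard above.
categories = {
    "Functional Suitability": (12, 12),
    "Reliability": (11, 12),
    "Performance & Context": (8, 8),
    "Agent Usability": (16, 16),
    "Human Usability": (8, 8),
    "Security": (12, 12),
    "Maintainability": (12, 12),
    "Agent-Specific": (19, 20),
}

total = sum(score for score, _ in categories.values())
maximum = sum(out_of for _, out_of in categories.values())

# Whole-number percentages, rounded as shown on the scorecard.
percent = {
    name: round(100 * score / out_of)
    for name, (score, out_of) in categories.items()
}

print(f"Core Capability Total: {total} / {maximum}")
```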
Medical Task: Execution Average 91.4 / 100, Assertions 23/23 Passed
Canonical: Full GSVA baseline (93 / 100) ✅ Pass
Fallback to |t| ranking because no pathways met FDR <= 0.05 on the bundled demo data.
Basic 38/40 | Specialized 55/60 | Total 93/100
- ✅ A1: Required outputs were generated for the audited workflow.
- ✅ A2: Input handling and validation behaved as documented.
- ✅ A3: No unsupported medical or scientific claim fabrication was detected.
- ✅ A4: Execution stayed within the stated Data Analysis skill scope.
- ✅ A5: Result artifacts and scoring evidence were sufficient for audit review.
Pass rate: 5 / 5
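The fallback noted for the canonical run can be pictured as follows: select pathways that pass the FDR cutoff, and if none do, rank by absolute t-statistic instead. This Python sketch makes several assumptions the audit does not confirm: that FDR means Benjamini-Hochberg adjustment, that the fallback keeps a fixed number of top pathways (`top_n`), and the pathway names and statistics, which are invented for illustration.

```python
def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def select_pathways(stats, fdr=0.05, top_n=2):
    """stats maps pathway name -> (t_statistic, p_value)."""
    names = list(stats)
    adj = bh_adjust([stats[n][1] for n in names])
    significant = [n for n, q in zip(names, adj) if q <= fdr]
    if significant:
        return significant, "fdr"
    # Nothing passed the cutoff: fall back to the largest |t| values.
    ranked = sorted(names, key=lambda n: abs(stats[n][0]), reverse=True)
    return ranked[:top_n], "abs_t"


# Hypothetical demo statistics, not values from the audited run.
demo = {
    "IFN_GAMMA": (2.1, 0.04),
    "T_CELL": (-2.6, 0.02),
    "NK_CELL": (0.8, 0.43),
    "B_CELL": (1.5, 0.14),
}
selected, mode = select_pathways(demo)
```

Returning the selection mode alongside the names lets a report state explicitly that the plotted pathways were chosen by ranking rather than by significance.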
Variant A: Full ssGSEA baseline (91 / 100) ✅ Pass
Method switch to ssGSEA succeeded and produced a complete output bundle.
Basic 37/40 | Specialized 54/60 | Total 91/100
- ✅ A1: Required outputs were generated for the audited workflow.
- ✅ A2: Input handling and validation behaved as documented.
- ✅ A3: No unsupported medical or scientific claim fabrication was detected.
- ✅ A4: Execution stayed within the stated Data Analysis skill scope.
- ✅ A5: Result artifacts and scoring evidence were sufficient for audit review.
Pass rate: 5 / 5
Edge: Visualize-only reuse (90 / 100) ✅ Pass
Reused saved RDS state and appended a second provenance section without rerunning analysis.
Basic 36/40 | Specialized 54/60 | Total 90/100
- ✅ A1: Required outputs were generated for the audited workflow.
- ✅ A2: Input handling and validation behaved as documented.
- ✅ A3: No unsupported medical or scientific claim fabrication was detected.
- ✅ A4: Execution stayed within the stated Data Analysis skill scope.
Pass rate: 4 / 4
Variant B: Analyze-only export (89 / 100) ✅ Pass
Analyze mode correctly produced tables and RDS output without requiring a plot.
Basic 36/40 | Specialized 53/60 | Total 89/100
- ✅ A1: Required outputs were generated for the audited workflow.
- ✅ A2: Input handling and validation behaved as documented.
- ✅ A3: No unsupported medical or scientific claim fabrication was detected.
- ✅ A4: Execution stayed within the stated Data Analysis skill scope.
Pass rate: 4 / 4
Stress: Full custom heatmap run (94 / 100) ✅ Pass
Accepted multiple non-default plotting parameters and produced the custom heatmap PDF successfully.
Basic 38/40 | Specialized 56/60 | Total 94/100
- ✅ A1: Required outputs were generated for the audited workflow.
- ✅ A2: Input handling and validation behaved as documented.
- ✅ A3: No unsupported medical or scientific claim fabrication was detected.
- ✅ A4: Execution stayed within the stated Data Analysis skill scope.
- ✅ A5: Result artifacts and scoring evidence were sufficient for audit review.
Pass rate: 5 / 5
Medical Task Total: 91.4 / 100
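The 91.4 execution average and the 23/23 assertion count are consistent with the five scenario results reported above, assuming the average is an unweighted mean (the report does not state the weighting). A short Python check of that arithmetic, with the tuple layout chosen only for illustration:

```python
# (scenario, basic, specialized, assertions_passed, assertions_total)
runs = [
    ("Full GSVA baseline", 38, 55, 5, 5),
    ("Full ssGSEA baseline", 37, 54, 5, 5),
    ("Visualize-only reuse", 36, 54, 4, 4),
    ("Analyze-only export", 36, 53, 4, 4),
    ("Full custom heatmap run", 38, 56, 5, 5),
]

# Each scenario total is Basic (out of 40) plus Specialized (out of 60).
totals = [basic + specialized for _, basic, specialized, _, _ in runs]

# Unweighted mean across scenarios, rounded to one decimal place.
execution_average = round(sum(totals) / len(totals), 1)

passed = sum(p for *_, p, _ in runs)
asserted = sum(t for *_, t in runs)

print(f"Execution Average: {execution_average} / 100")
print(f"Assertions: {passed}/{asserted} Passed")
```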
Key Strengths
- Scope boundaries are explicit and reduce misuse across deconvolution, gene-level differential expression, and single-cell requests.
- The CLI surface is well validated, including safe output path handling and precise parameter checks.
- Provenance is strong: session info, output manifests, and run records are generated consistently.
- The skill now explains smoke-test expectations and completion reporting more clearly to both agents and users.