Data Analysis

immune-pathway-analysis

Analyze immune-related pathway activity and immune cell-type enrichment scores from bulk RNA-seq expression data. Inputs: expression matrix, immune gene sets or pathway database. Outputs: pathway activity table, immune score comparison plots.

Total Score: 94 / 100
Core Capability: 98 / 100
Functional Suitability: 12 / 12
Reliability: 11 / 12
Performance & Context: 8 / 8
Agent Usability: 16 / 16
Human Usability: 8 / 8
Security: 12 / 12
Maintainability: 12 / 12
Agent-Specific: 19 / 20
Medical Task: 23 / 23 Passed
Full GSVA baseline: 93 (5/5)
Full ssGSEA baseline: 91 (5/5)
Visualize-only reuse: 90 (4/4)
Analyze-only export: 89 (4/4)
Full custom heatmap run: 94 (5/5)

Veto Gates: Required pass for any deployment consideration

Skill Veto: ✓ All 4 gates passed
Gate | Result | Detail
Operational Stability | PASS | System remains stable across varied inputs and edge cases
Structural Consistency | PASS | Output structure conforms to expected skill contract format
Result Determinism | PASS | Equivalent inputs produce semantically equivalent outputs
System Security | PASS | No prompt injection, data leakage, or unsafe tool use detected
Research Veto: ✅ PASS (Applicable)
Dimension | Result | Detail
Scientific Integrity | PASS | Scientific integrity check completed without reportable redline findings.
Practice Boundaries | PASS | Practice boundaries check completed without reportable redline findings.
Methodological Ground | PASS | Methodological ground check completed without reportable redline findings.
Code Usability | PASS | Code usability check completed without reportable redline findings.

Core Capability: 98 / 100 (8 Categories)

Functional Suitability
Functional suitability was evaluated against the skill-auditor rubric; no blocking schema-level issue was recorded for this dimension.
12 / 12
100%
Reliability
Append-only provenance improves recovery, but reruns still overwrite core output artifacts in place.
11 / 12
92%
Performance & Context
Performance and context was evaluated against the skill-auditor rubric; no blocking schema-level issue was recorded for this dimension.
8 / 8
100%
Agent Usability
Completion format now tells the agent exactly how to summarize results after execution.
16 / 16
100%
Human Usability
Fixture notes now explain the smoke-test data and the minimal gene-set helper more clearly.
8 / 8
100%
Security
Security was evaluated against the skill-auditor rubric; no blocking schema-level issue was recorded for this dimension.
12 / 12
100%
Maintainability
Maintainability was evaluated against the skill-auditor rubric; no blocking schema-level issue was recorded for this dimension.
12 / 12
100%
Agent-Specific
Escape hatches and progressive disclosure are strong; reruns still append timestamped provenance by design.
19 / 20
95%
Core Capability Total: 98 / 100
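
The provenance behaviour noted under Reliability and Agent-Specific (each rerun appends a timestamped section instead of rewriting earlier records) can be sketched as follows. This is a minimal illustration, not the skill's implementation: the function name `append_provenance`, the section layout, and the parameter names are all assumptions made for the example.

```python
from datetime import datetime, timezone
from pathlib import Path
from typing import Mapping


def append_provenance(log_path: Path, mode: str, params: Mapping[str, str]) -> None:
    """Append one timestamped run section; earlier sections are never rewritten.

    Hypothetical helper: the header format and field names are illustrative.
    """
    stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
    lines = [f"[run {stamp}] mode={mode}"]
    lines += [f"{key}: {value}" for key, value in sorted(params.items())]
    # Opening with "a" appends, so a rerun adds a section instead of overwriting.
    with log_path.open("a", encoding="utf-8") as handle:
        handle.write("\n".join(lines) + "\n\n")
```

Calling the helper twice on the same file leaves two sections in it, which mirrors the "second provenance section" observed in the visualize-only scenario. It does not address the separate finding that core output artifacts are overwritten in place.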

Medical Task: Execution Average 91.4 / 100; Assertions: 23/23 Passed

Canonical: Full GSVA baseline (93 / 100, ✅ Pass)

Fell back to |t| ranking because no pathways met FDR <= 0.05 on the bundled demo data.

Basic 38/40 | Specialized 55/60 | Total 93/100
A1: Required outputs were generated for the audited workflow.
A2: Input handling and validation behaved as documented.
A3: No unsupported medical or scientific claim fabrication was detected.
A4: Execution stayed within the stated Data Analysis skill scope.
A5: Result artifacts and scoring evidence were sufficient for audit review.
Pass rate: 5 / 5
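
The fallback recorded for the canonical run (ranking by |t| when no pathway meets FDR <= 0.05) can be sketched as follows. This is a hedged illustration, not the skill's code: the `PathwayResult` fields, the function name `select_top_pathways`, and the `top_n` parameter are assumptions; only the FDR threshold and the |t| ranking come from the audit note.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class PathwayResult:
    pathway: str
    t: float    # test statistic for the group comparison
    fdr: float  # multiple-testing adjusted p-value


def select_top_pathways(
    results: List[PathwayResult],
    fdr_cutoff: float = 0.05,
    top_n: int = 3,
) -> Tuple[List[PathwayResult], str]:
    """Return pathways to report plus a label saying which rule selected them.

    Hypothetical helper: names and defaults other than the 0.05 cutoff
    are illustrative assumptions.
    """
    significant = [r for r in results if r.fdr <= fdr_cutoff]
    if significant:
        # Normal path: keep significant pathways, smallest FDR first.
        return sorted(significant, key=lambda r: r.fdr)[:top_n], "fdr"
    # Fallback path: nothing met the cutoff, so rank every pathway by |t|.
    ranked = sorted(results, key=lambda r: abs(r.t), reverse=True)
    return ranked[:top_n], "abs_t_fallback"
```

Returning the selection label alongside the pathways lets a caller record in its summary that the fallback was used, which is the kind of disclosure the audit note reflects.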
Variant A: Full ssGSEA baseline (91 / 100, ✅ Pass)

Method switch to ssGSEA succeeded and produced a complete output bundle.

Basic 37/40 | Specialized 54/60 | Total 91/100
A1: Required outputs were generated for the audited workflow.
A2: Input handling and validation behaved as documented.
A3: No unsupported medical or scientific claim fabrication was detected.
A4: Execution stayed within the stated Data Analysis skill scope.
A5: Result artifacts and scoring evidence were sufficient for audit review.
Pass rate: 5 / 5
Edge: Visualize-only reuse (90 / 100, ✅ Pass)

Reused saved RDS state and appended a second provenance section without rerunning analysis.

Basic 36/40 | Specialized 54/60 | Total 90/100
A1: Required outputs were generated for the audited workflow.
A2: Input handling and validation behaved as documented.
A3: No unsupported medical or scientific claim fabrication was detected.
A4: Execution stayed within the stated Data Analysis skill scope.
Pass rate: 4 / 4
Variant B: Analyze-only export (89 / 100, ✅ Pass)

Analyze mode correctly produced tables and RDS output without requiring a plot.

Basic 36/40 | Specialized 53/60 | Total 89/100
A1: Required outputs were generated for the audited workflow.
A2: Input handling and validation behaved as documented.
A3: No unsupported medical or scientific claim fabrication was detected.
A4: Execution stayed within the stated Data Analysis skill scope.
Pass rate: 4 / 4
Stress: Full custom heatmap run (94 / 100, ✅ Pass)

Accepted multiple non-default plotting parameters and produced the custom heatmap PDF successfully.

Basic 38/40 | Specialized 56/60 | Total 94/100
A1: Required outputs were generated for the audited workflow.
A2: Input handling and validation behaved as documented.
A3: No unsupported medical or scientific claim fabrication was detected.
A4: Execution stayed within the stated Data Analysis skill scope.
A5: Result artifacts and scoring evidence were sufficient for audit review.
Pass rate: 5 / 5
Medical Task Total: 91.4 / 100
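
The Medical Task total is consistent with an unweighted mean of the five scenario scores, and the assertion count with a simple sum. The report does not state the aggregation rule, so the unweighted mean is an assumption here. A short check using only the figures reported above:

```python
# (score, assertions passed, assertions total) as reported per scenario.
scenarios = {
    "Full GSVA baseline": (93, 5, 5),
    "Full ssGSEA baseline": (91, 5, 5),
    "Visualize-only reuse": (90, 4, 4),
    "Analyze-only export": (89, 4, 4),
    "Full custom heatmap run": (94, 5, 5),
}

# Assumed aggregation: plain arithmetic mean over scenarios.
execution_average = sum(s for s, _, _ in scenarios.values()) / len(scenarios)
passed = sum(p for _, p, _ in scenarios.values())
total = sum(t for _, _, t in scenarios.values())

print(f"Execution average: {execution_average:.1f} / 100")  # 91.4 / 100
print(f"Assertions: {passed}/{total} passed")                # 23/23 passed
```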

Key Strengths

  • Scope boundaries are explicit and reduce misuse across deconvolution, gene-level differential expression, and single-cell requests.
  • The CLI surface is well validated, including safe output path handling and precise parameter checks.
  • Provenance is strong: session info, output manifests, and run records are generated consistently.
  • The skill now explains smoke-test expectations and completion reporting more clearly to both agents and users.