Data Analysis

ssgsea-immune-infiltration-analysis

Quantify immune cell infiltration per sample using single-sample GSEA (ssGSEA) with curated immune cell gene signatures. Inputs: expression matrix. Outputs: immune cell score matrix, heatmap, violin comparison plots, correlation with clinical variables.
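As an illustration of the per-sample scoring idea described above, the following is a minimal sketch of a rank-weighted running-sum enrichment score. It is not the audited R pipeline: the function name `ssgsea_score`, the toy gene names, and the weighting exponent `alpha=0.25` are assumptions made for this example, and the score is left unnormalized.

```python
def ssgsea_score(sample_expr, gene_set, alpha=0.25):
    """Simplified single-sample enrichment score for one gene set.

    sample_expr: dict mapping gene name -> expression value in ONE sample.
    gene_set: set of gene names forming the signature.
    alpha: rank-weighting exponent (assumed value for this sketch).
    """
    # Order genes from highest to lowest expression within the sample.
    genes = sorted(sample_expr, key=sample_expr.get, reverse=True)
    n = len(genes)
    hits = [g in gene_set for g in genes]
    n_hit = sum(hits)
    if n_hit == 0 or n_hit == n:
        raise ValueError("gene set must overlap the sample partially")

    # Signature genes are weighted by rank**alpha (top gene has rank n).
    weights = [(n - i) ** alpha if hit else 0.0 for i, hit in enumerate(hits)]
    total_weight = sum(weights)
    miss_step = 1.0 / (n - n_hit)

    # Sum the difference between the two running distributions.
    p_hit = p_miss = score = 0.0
    for w, hit in zip(weights, hits):
        if hit:
            p_hit += w / total_weight
        else:
            p_miss += miss_step
        score += p_hit - p_miss
    return score


# Toy sample: hypothetical genes G1..G6, not a real immune signature.
toy_sample = {"G1": 10.0, "G2": 8.0, "G3": 6.0, "G4": 4.0, "G5": 2.0, "G6": 1.0}
high = ssgsea_score(toy_sample, {"G1", "G2"})  # signature genes at the top
low = ssgsea_score(toy_sample, {"G5", "G6"})   # signature genes at the bottom
```

A signature whose genes sit near the top of the sample's ranking yields a positive score and one near the bottom a negative score. The values are relative enrichment scores, not absolute cell fractions.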

Total Score: 95 / 100
Core Capability
94 / 100
Functional Suitability
12 / 12
Reliability
11 / 12
Performance & Context
7 / 8
Agent Usability
15 / 16
Human Usability
7 / 8
Security
12 / 12
Maintainability
12 / 12
Agent-Specific
18 / 20
Medical Task
20 / 20 Passed
Default ssGSEA with plots: 96 (4/4)
GSVA Gaussian without plots: 95 (4/4)
Numeric column selectors: 96 (4/4)
Named column selectors: 95 (4/4)
Reuse-toggle no-plot cleanup: 99 (4/4)

Veto Gates: required pass for any deployment consideration

Skill Veto: ✓ All 4 gates passed
Operational Stability
System remains stable across varied inputs and edge cases
PASS
Structural Consistency
Output structure conforms to expected skill contract format
PASS
Result Determinism
Equivalent inputs produce semantically equivalent outputs
PASS
System Security
No prompt injection, data leakage, or unsafe tool use detected
PASS
Research Veto: ✅ PASS (Applicable)
Dimension | Result | Detail
Scientific Integrity | PASS | Scientific integrity check completed without reportable redline findings.
Practice Boundaries | PASS | The skill explicitly excludes clinical decision making and produced only descriptive analytic outputs.
Methodological Ground | PASS | The workflow correctly frames ssGSEA/GSVA as relative enrichment and correlation as hypothesis-generating rather than causal inference.
Code Usability | PASS | The R pipeline executed successfully across packaged tests, five audit inputs, and the output-directory reuse regression scenario.

Core Capability: 94 / 100 (8 categories)

Functional Suitability
Functional suitability was evaluated against the skill-auditor rubric; no blocking schema-level issue was recorded for this dimension.
12 / 12
100%
Reliability
The stale-plot rerun bug is fixed; only a minor no-plot summary wording ambiguity remains in run_record.txt.
11 / 12
92%
Performance & Context
Canonical plotted runs remain moderately heavy but acceptable for the packaged example data.
7 / 8
88%
Agent Usability
Agent usability was evaluated against the skill-auditor rubric; no blocking schema-level issue was recorded for this dimension.
15 / 16
94%
Human Usability
The skill is easy to follow; the empty plot directory behavior could be documented more explicitly.
7 / 8
88%
Security
Security was evaluated against the skill-auditor rubric; no blocking schema-level issue was recorded for this dimension.
12 / 12
100%
Maintainability
Regression coverage now protects both malformed inputs and output-directory reuse semantics.
12 / 12
100%
Agent-Specific
The skill is now effectively idempotent for reruns; only the no-plot summary wording remains slightly imprecise.
18 / 20
90%
Core Capability Total: 94 / 100
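The category scores above can be cross-checked against the reported total and percentages with a short script. This is only a consistency check on the figures in this report. Half-up rounding is an assumption about how the percentages were derived.

```python
# (score, maximum) per category, copied from the Core Capability section.
categories = {
    "Functional Suitability": (12, 12),
    "Reliability": (11, 12),
    "Performance & Context": (7, 8),
    "Agent Usability": (15, 16),
    "Human Usability": (7, 8),
    "Security": (12, 12),
    "Maintainability": (12, 12),
    "Agent-Specific": (18, 20),
}

total = sum(score for score, _ in categories.values())
maximum = sum(limit for _, limit in categories.values())

# Percentages rounded half-up (assumed rounding rule).
percents = {
    name: int(100 * score / limit + 0.5)
    for name, (score, limit) in categories.items()
}

print(f"Core Capability Total: {total} / {maximum}")
for name, pct in percents.items():
    print(f"{name}: {pct}%")
```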

Medical Task: Execution Average 96.2 / 100; Assertions 20/20 Passed

Canonical: Default ssGSEA with plots
Score: 96 / 100 (✅ Pass)

The full documented output set was produced, including plots, run_record.txt, and output_manifest.txt.

Basic 39/40 | Specialized 57/60 | Total 96/100
A1: Required outputs were generated for the audited workflow.
A2: Input handling and validation behaved as documented.
A3: No unsupported medical or scientific claim fabrication was detected.
A4: Execution stayed within the stated Data Analysis skill scope.
Pass rate: 4 / 4
Variant A: GSVA Gaussian without plots
Score: 95 / 100 (✅ Pass)

The documented GSVA audited baseline completed successfully and the manifest omitted plot entries.

Basic 38/40 | Specialized 57/60 | Total 95/100
A1: Required outputs were generated for the audited workflow.
A2: Input handling and validation behaved as documented.
A3: No unsupported medical or scientific claim fabrication was detected.
A4: Execution stayed within the stated Data Analysis skill scope.
Pass rate: 4 / 4
Edge: Numeric column selectors
Score: 96 / 100 (✅ Pass)

Index-based selector resolution and low-bound runtime settings completed successfully.

Basic 38/40 | Specialized 58/60 | Total 96/100
A1: Required outputs were generated for the audited workflow.
A2: Input handling and validation behaved as documented.
A3: No unsupported medical or scientific claim fabrication was detected.
A4: Execution stayed within the stated Data Analysis skill scope.
Pass rate: 4 / 4
Variant B: Named column selectors
Score: 95 / 100 (✅ Pass)

Named column resolution matched the documented workflow and preserved the expected output schema.

Basic 38/40 | Specialized 57/60 | Total 95/100
A1: Required outputs were generated for the audited workflow.
A2: Input handling and validation behaved as documented.
A3: No unsupported medical or scientific claim fabrication was detected.
A4: Execution stayed within the stated Data Analysis skill scope.
Pass rate: 4 / 4
Stress: Reuse-toggle no-plot cleanup
Score: 99 / 100 (✅ Pass)

After rerunning the same output directory with --make_plots=false, no stale PDFs remained and the manifest had no plot entries.

Basic 40/40 | Specialized 59/60 | Total 99/100
A1: Required outputs were generated for the audited workflow.
A2: Input handling and validation behaved as documented.
A3: No unsupported medical or scientific claim fabrication was detected.
A4: Execution stayed within the stated Data Analysis skill scope.
Pass rate: 4 / 4
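The reuse behavior checked in this stress scenario can be pictured with a minimal sketch: when plots are disabled, PDFs left by an earlier run are deleted before the manifest is written. The skill itself is an R pipeline, so this is only an illustration of the logic. The helper names `remove_stale_plots` and `write_manifest` and the file name `heatmap.pdf` are hypothetical.

```python
import tempfile
from pathlib import Path


def remove_stale_plots(out_dir, make_plots):
    """Delete leftover PDFs when plotting is disabled; return removed names."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    removed = []
    if not make_plots:
        for pdf in sorted(out.rglob("*.pdf")):
            pdf.unlink()
            removed.append(pdf.name)
    return removed


def write_manifest(out_dir):
    """List every output file, then the manifest itself, one per line."""
    out = Path(out_dir)
    files = sorted(
        p.relative_to(out).as_posix()
        for p in out.rglob("*")
        if p.is_file() and p.name != "output_manifest.txt"
    )
    entries = files + ["output_manifest.txt"]
    (out / "output_manifest.txt").write_text("\n".join(entries) + "\n")
    return entries


# Simulate a rerun into a directory that still holds a plot from a prior run.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "heatmap.pdf").write_text("stale plot")  # hypothetical name
    (Path(tmp) / "run_record.txt").write_text("run metadata")
    removed = remove_stale_plots(tmp, make_plots=False)
    manifest = write_manifest(tmp)
```

Cleaning up before writing the manifest keeps the manifest and the directory contents consistent, which is the property the stress scenario asserts.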
Medical Task Total: 96.2 / 100
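The reported execution average follows from the five scenario totals. The short check below recomputes it and confirms that each Basic and Specialized pair adds up to its scenario total. It uses only figures already stated in this report.

```python
# (basic, specialized, total) per scenario, copied from the sections above.
scenarios = {
    "Canonical": (39, 57, 96),
    "Variant A": (38, 57, 95),
    "Edge": (38, 58, 96),
    "Variant B": (38, 57, 95),
    "Stress": (40, 59, 99),
}

# Each scenario total should equal its Basic + Specialized sub-scores.
mismatches = [
    name
    for name, (basic, specialized, total) in scenarios.items()
    if basic + specialized != total
]

average = sum(total for _, _, total in scenarios.values()) / len(scenarios)
print(f"Execution average: {average:.1f} / 100")
```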

Key Strengths

  • The previous output-directory reuse bug is fixed and now protected by an explicit regression test.
  • Generated manifests are now aligned with the documented output inventory, including run_record.txt and output_manifest.txt.
  • Repeated canonical runs with the same seed produced identical hashes for key result tables, supporting deterministic behavior.
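A determinism check of the kind described in the last point can be sketched as a hash comparison between two output directories. This is an assumed approach, not the auditor's actual tooling. The helper names and the file name `scores.tsv` are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def compare_runs(dir_a, dir_b, table_names):
    """Map each table name to True if both runs produced identical bytes."""
    return {
        name: sha256_of(Path(dir_a) / name) == sha256_of(Path(dir_b) / name)
        for name in table_names
    }


# Two simulated runs writing the same (hypothetical) result table.
with tempfile.TemporaryDirectory() as run_a, tempfile.TemporaryDirectory() as run_b:
    for run_dir in (run_a, run_b):
        (Path(run_dir) / "scores.tsv").write_text("sample\tscore\nS1\t0.42\n")
    identical = compare_runs(run_a, run_b, ["scores.tsv"])
```

Byte-level hashes are strict: they only match if seeds, row order, and number formatting are all stable. A match is therefore strong evidence of reproducible output, while a mismatch does not by itself mean the results differ in substance.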