Data Analysis

ssgsea-immune-infiltration-analysis

Quantify immune cell infiltration per sample using single-sample GSEA (ssGSEA) with curated immune cell gene signatures. Inputs: expression matrix. Outputs: immune cell score matrix, heatmap, violin comparison plots, correlation with clinical variables.
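To make the scoring step concrete, here is a simplified sketch of a single-sample, rank-weighted enrichment score in the style of ssGSEA. It is not the audited R pipeline: the function name `ssgsea_score`, the toy expression values, and the two-gene signature are illustrative assumptions, and `alpha=0.25` is a commonly used rank-weighting exponent rather than a setting confirmed by this report.

```python
def ssgsea_score(expression, gene_set, alpha=0.25):
    """Simplified ssGSEA-style score for ONE sample (illustrative only).

    expression: dict mapping gene name -> expression value.
    gene_set:   set of gene names forming the signature.
    """
    # Order genes from highest to lowest expression (ties keep input order).
    genes = sorted(expression, key=lambda g: expression[g], reverse=True)
    n = len(genes)
    hits = [g in gene_set for g in genes]
    n_hits = sum(hits)
    if n_hits == 0 or n_hits == n:
        raise ValueError("signature must overlap some, but not all, genes")

    # Rank weight: the top-ranked gene gets rank n, the lowest gets rank 1.
    weights = [(n - i) ** alpha for i in range(n)]
    total_hit_weight = sum(w for w, hit in zip(weights, hits) if hit)
    miss_step = 1.0 / (n - n_hits)

    p_hit = p_miss = score = 0.0
    for weight, hit in zip(weights, hits):
        if hit:
            p_hit += weight / total_hit_weight
        else:
            p_miss += miss_step
        # ssGSEA sums the running difference instead of taking its maximum.
        score += p_hit - p_miss
    return score


# Toy samples (assumed values): the signature is high in one, low in the other.
signature = {"CD8A", "GZMB"}
sample_high = {"CD8A": 9.0, "GZMB": 8.0, "ACTB": 3.0, "GAPDH": 2.0, "ALB": 1.0}
sample_low = {"CD8A": 1.0, "GZMB": 2.0, "ACTB": 9.0, "GAPDH": 8.0, "ALB": 7.0}
score_high = ssgsea_score(sample_high, signature)
score_low = ssgsea_score(sample_low, signature)
```

The score is relative: it says where the signature genes sit in one sample's ranking, not how many cells of that type are present.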

95 / 100 Total Score

Category	Score
Core Capability	94 / 100
Functional Suitability	12 / 12
Reliability	11 / 12
Performance & Context	7 / 8
Agent Usability	15 / 16
Human Usability	7 / 8
Security	12 / 12
Maintainability	12 / 12
Agent-Specific	18 / 20
Medical Task	20 / 20 Passed

Scenario	Score	Assertions
Default ssGSEA with plots	96	4/4
GSVA Gaussian without plots	95	4/4
Numeric column selectors	96	4/4
Named column selectors	95	4/4
Reuse-toggle no-plot cleanup	99	4/4

Veto Gates: Required pass for any deployment consideration

Skill Veto: ✓ All 4 gates passed

Gate	Result	Detail
Operational Stability	PASS	System remains stable across varied inputs and edge cases
Structural Consistency	PASS	Output structure conforms to expected skill contract format
Result Determinism	PASS	Equivalent inputs produce semantically equivalent outputs
System Security	PASS	No prompt injection, data leakage, or unsafe tool use detected

Research Veto: ✅ PASS (Applicable)

Dimension	Result	Detail
Scientific Integrity	PASS	Scientific integrity check completed without reportable redline findings.
Practice Boundaries	PASS	The skill explicitly excludes clinical decision making and produced only descriptive analytic outputs.
Methodological Ground	PASS	The workflow correctly frames ssGSEA/GSVA as relative enrichment and correlation as hypothesis-generating rather than causal inference.
Code Usability	PASS	The R pipeline executed successfully across packaged tests, five audit inputs, and the output-directory reuse regression scenario.
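The Methodological Ground row treats correlation between immune scores and clinical variables as hypothesis-generating. A minimal sketch of such a correlation follows, assuming a Spearman rank correlation; the report does not state which coefficient the pipeline uses, and the function names and toy values are illustrative.

```python
def average_ranks(values):
    """Return 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    if len(x) != len(y) or len(x) < 3:
        raise ValueError("need two equal-length sequences with at least 3 values")
    rx, ry = average_ranks(x), average_ranks(y)
    mean_x, mean_y = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (var_x * var_y) ** 0.5


# Toy data (assumed): an immune score and a clinical variable per sample.
immune_score = [0.12, 0.35, 0.28, 0.51]
clinical_value = [41, 63, 55, 70]
rho = spearman(immune_score, clinical_value)
```

A coefficient like `rho` describes monotone association only; it does not establish that infiltration drives the clinical variable or the reverse.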

Core Capability: 94 / 100, 8 Categories

Category	Score	Percent	Detail
Functional Suitability	12 / 12	100%	Functional suitability was evaluated against the skill-auditor rubric; no blocking schema-level issue was recorded for this dimension.
Reliability	11 / 12	92%	The stale-plot rerun bug is fixed; only a minor no-plot summary wording ambiguity remains in run_record.txt.
Performance & Context	7 / 8	88%	Canonical plotted runs remain moderately heavy but acceptable for the packaged example data.
Agent Usability	15 / 16	94%	Agent usability was evaluated against the skill-auditor rubric; no blocking schema-level issue was recorded for this dimension.
Human Usability	7 / 8	88%	The skill is easy to follow; the empty plot directory behavior could be documented more explicitly.
Security	12 / 12	100%	Security was evaluated against the skill-auditor rubric; no blocking schema-level issue was recorded for this dimension.
Maintainability	12 / 12	100%	Regression coverage now protects both malformed inputs and output-directory reuse semantics.
Agent-Specific	18 / 20	90%	The skill is now effectively idempotent for reruns; only the no-plot summary wording remains slightly imprecise.

Core Capability Total: 94 / 100
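The category scores can be cross-checked against the reported total and percentages with a few lines of arithmetic. The sketch below assumes percentages are rounded half up to whole numbers; the report does not state its rounding rule.

```python
import math

# (score, maximum) per category, copied from the scorecard.
categories = {
    "Functional Suitability": (12, 12),
    "Reliability": (11, 12),
    "Performance & Context": (7, 8),
    "Agent Usability": (15, 16),
    "Human Usability": (7, 8),
    "Security": (12, 12),
    "Maintainability": (12, 12),
    "Agent-Specific": (18, 20),
}


def percent(score, maximum):
    """Whole-number percentage, rounded half up (assumed rounding rule)."""
    return math.floor(100 * score / maximum + 0.5)


total = sum(score for score, _ in categories.values())
maximum_total = sum(maximum for _, maximum in categories.values())
percentages = {name: percent(s, m) for name, (s, m) in categories.items()}
```

Under that assumption, the eight category scores sum to 94 out of a possible 100 and reproduce each listed percentage.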

Medical Task: Execution Average 96.2 / 100, Assertions 20/20 Passed

Case	Scenario	Assertions
Canonical	Default ssGSEA with plots	4/4 ✓
Variant A	GSVA Gaussian without plots	4/4 ✓
Edge	Numeric column selectors	4/4 ✓
Variant B	Named column selectors	4/4 ✓
Stress	Reuse-toggle no-plot cleanup	4/4 ✓

Canonical ✅ Pass

Default ssGSEA with plots

The full documented output set was produced, including plots, run_record.txt, and output_manifest.txt.

Basic 39/40 | Specialized 57/60 | Total 96/100

✅ A1: Required outputs were generated for the audited workflow.
✅ A2: Input handling and validation behaved as documented.
✅ A3: No unsupported medical or scientific claim fabrication was detected.
✅ A4: Execution stayed within the stated Data Analysis skill scope.

Pass rate: 4 / 4

Variant A ✅ Pass

GSVA Gaussian without plots

The documented GSVA audited baseline completed successfully and the manifest omitted plot entries.

Basic 38/40 | Specialized 57/60 | Total 95/100

✅ A1: Required outputs were generated for the audited workflow.
✅ A2: Input handling and validation behaved as documented.
✅ A3: No unsupported medical or scientific claim fabrication was detected.
✅ A4: Execution stayed within the stated Data Analysis skill scope.

Pass rate: 4 / 4

Edge ✅ Pass

Numeric column selectors

Index-based selector resolution and low-bound runtime settings completed successfully.

Basic 38/40 | Specialized 58/60 | Total 96/100

✅ A1: Required outputs were generated for the audited workflow.
✅ A2: Input handling and validation behaved as documented.
✅ A3: No unsupported medical or scientific claim fabrication was detected.
✅ A4: Execution stayed within the stated Data Analysis skill scope.

Pass rate: 4 / 4

Variant B ✅ Pass

Named column selectors

Named column resolution matched the documented workflow and preserved the expected output schema.

Basic 38/40 | Specialized 57/60 | Total 95/100

✅ A1: Required outputs were generated for the audited workflow.
✅ A2: Input handling and validation behaved as documented.
✅ A3: No unsupported medical or scientific claim fabrication was detected.
✅ A4: Execution stayed within the stated Data Analysis skill scope.

Pass rate: 4 / 4
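The Edge and Variant B scenarios exercise numeric and named column selectors. One way such a selector could be resolved is sketched below. The function `resolve_column`, the 1-based indexing (chosen to mirror R), and the example header are assumptions for illustration, not the skill's documented interface.

```python
def resolve_column(selector, header):
    """Map a selector to a 0-based column position.

    selector: a 1-based index (int or digit string) or an exact column name.
    header:   list of column names from the input table.
    """
    text = str(selector).strip()
    if text.isdigit():
        index = int(text)
        if not 1 <= index <= len(header):
            raise ValueError(f"column index {index} is outside 1..{len(header)}")
        return index - 1

    matches = [i for i, name in enumerate(header) if name == text]
    if not matches:
        raise ValueError(f"no column named {text!r}")
    if len(matches) > 1:
        raise ValueError(f"column name {text!r} is ambiguous")
    return matches[0]


# Hypothetical clinical table header.
header = ["sample_id", "group", "age"]
by_index = resolve_column("2", header)
by_name = resolve_column("group", header)
```

Treating both selector forms through one function keeps the downstream output schema identical regardless of how the column was specified.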

Stress ✅ Pass

Reuse-toggle no-plot cleanup

After rerunning the same output directory with --make_plots=false, no stale PDFs remained and the manifest had no plot entries.

Basic 40/40 | Specialized 59/60 | Total 99/100

✅ A1: Required outputs were generated for the audited workflow.
✅ A2: Input handling and validation behaved as documented.
✅ A3: No unsupported medical or scientific claim fabrication was detected.
✅ A4: Execution stayed within the stated Data Analysis skill scope.

Pass rate: 4 / 4
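A check like the one described for the Stress scenario could be scripted roughly as follows. This is a sketch under assumptions: the manifest is taken to list one relative path per line, plots are taken to be files ending in `.pdf`, and the helper name `find_stale_plots` is hypothetical. Only the file name output_manifest.txt comes from the report.

```python
from pathlib import Path


def find_stale_plots(output_dir, manifest_name="output_manifest.txt"):
    """Return (pdf_files_on_disk, pdf_entries_in_manifest) for a run directory.

    Both lists should be empty after a rerun with plotting disabled.
    """
    out = Path(output_dir)

    # PDFs still present anywhere under the output directory.
    pdf_files = sorted(p.relative_to(out).as_posix() for p in out.rglob("*.pdf"))

    # Manifest lines that point at a PDF (assumed one path per line).
    manifest = out / manifest_name
    lines = manifest.read_text().splitlines() if manifest.exists() else []
    pdf_entries = [ln.strip() for ln in lines if ln.strip().lower().endswith(".pdf")]

    return pdf_files, pdf_entries
```

Checking the directory and the manifest separately catches both failure modes: a leftover file that the manifest no longer mentions, and a manifest entry that outlived its file.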

Medical Task Total: 96.2 / 100
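The Medical Task total is consistent with an unweighted mean of the five scenario totals, which the short check below reproduces. Equal weighting is an assumption; the report gives the average but not the formula.

```python
# Scenario totals copied from the per-scenario results.
scenario_totals = {
    "Canonical": 96,
    "Variant A": 95,
    "Edge": 96,
    "Variant B": 95,
    "Stress": 99,
}
assertions_per_scenario = 4

execution_average = sum(scenario_totals.values()) / len(scenario_totals)
total_assertions = assertions_per_scenario * len(scenario_totals)

print(f"{execution_average:.1f} / 100, {total_assertions}/20 assertions")
# prints "96.2 / 100, 20/20 assertions"
```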

Key Strengths

The previous output-directory reuse bug is fixed and now protected by an explicit regression test.
Generated manifests are now aligned with the documented output inventory, including run_record.txt and output_manifest.txt.
Repeated canonical runs with the same seed produced identical hashes for key result tables, supporting deterministic behavior.
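The determinism claim in the last strength can be checked by hashing the key result tables from two runs and comparing digests. The sketch below assumes SHA-256 and hypothetical helper names; the report does not say which hash function or file list the audit used.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hex SHA-256 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def differing_tables(run_a, run_b, table_names):
    """Return the table names whose contents differ between two run directories."""
    return [
        name
        for name in table_names
        if sha256_of(Path(run_a) / name) != sha256_of(Path(run_b) / name)
    ]
```

An empty list from `differing_tables` means the listed tables are byte-identical across runs. Files that embed timestamps, such as run records, would need to be excluded or normalized before comparison.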