Data Analysis

cibersort-immune-infiltration-analysis

Deconvolve bulk RNA-seq or microarray expression into 22 LM22 immune cell proportions using CIBERSORT. Inputs: expression matrix, sample groups. Outputs: immune fraction table, comparison bar plots, statistical test results.
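
As a rough illustration of what "deconvolving" a sample into cell fractions means, the sketch below fits one toy mixture against a toy signature matrix. It is not the skill's code. It assumes an invented 3-gene by 2-cell-type signature instead of LM22, and it swaps in non-negative least squares solved by projected gradient descent in place of the nu-support-vector regression that CIBERSORT uses. Only the last step (rescaling non-negative weights to sum to 1) mirrors how CIBERSORT reports fractions. The names `deconvolve`, `toy_signature`, and `toy_mixture` are hypothetical.

```python
def deconvolve(signature, mixture, steps=2000, lr=0.1):
    """Estimate non-negative cell fractions f so that signature * f ~ mixture.

    signature: list of rows, one per gene, each with one value per cell type.
    mixture:   list of expression values, one per gene.
    """
    n_types = len(signature[0])
    weights = [1.0 / n_types] * n_types
    for _ in range(steps):
        # Residual of the current fit for every gene.
        residual = [
            sum(row[k] * weights[k] for k in range(n_types)) - m
            for row, m in zip(signature, mixture)
        ]
        # Gradient of 0.5 * ||S f - m||^2 with respect to each weight.
        grad = [
            sum(row[k] * r for row, r in zip(signature, residual))
            for k in range(n_types)
        ]
        # Gradient step, then projection back onto non-negative weights.
        weights = [max(0.0, w - lr * g) for w, g in zip(weights, grad)]
    total = sum(weights)
    if total == 0:
        raise ValueError("all estimated weights are zero")
    # Rescale so the reported fractions sum to 1.
    return [w / total for w in weights]


# Toy signature: 3 genes x 2 cell types (illustrative values only).
toy_signature = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
# Mixture constructed as 0.3 * type A + 0.7 * type B.
toy_mixture = [0.3, 0.7, 1.0]
fractions = deconvolve(toy_signature, toy_mixture)
```

On this noise-free toy input the recovered fractions approach the mixing proportions used to build `toy_mixture`. Real expression data is noisy and collinear, which is why a robust regression such as nu-SVR is used in practice.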

Total Score: 90 / 100

Core Capability

94 / 100

Functional Suitability

12 / 12

Reliability

11 / 12

Performance & Context

8 / 8

Agent Usability

15 / 16

Human Usability

7 / 8

Security

11 / 12

Maintainability

12 / 12

Agent-Specific

18 / 20

Medical Task

35 / 35 Passed

Packaged validation run: 91 / 100

5/5

No-plot explicit-column run: 90 / 100

5/5

Zero-permutation boundary: 86 / 100

5/5

Quantile-normalized run: 89 / 100

5/5

Higher-permutation run: 88 / 100

5/5

Missing case group: 82 / 100

5/5

Corrupted signature matrix: 83 / 100

5/5

Veto Gates: Required pass for any deployment consideration

Skill Veto: ✓ All 4 gates passed

Gate	Result	Detail
Operational Stability	PASS	System remains stable across varied inputs and edge cases.
Structural Consistency	PASS	Output structure conforms to expected skill contract format.
Result Determinism	PASS	Equivalent inputs produce semantically equivalent outputs.
System Security	PASS	No prompt injection, data leakage, or unsafe tool use detected.

Research Veto: ✅ PASS (Applicable)

Dimension	Result	Detail
Scientific Integrity	PASS	All quantitative statements in the re-audit were grounded in executed runs and recorded artifacts.
Practice Boundaries	PASS	The skill remains strictly non-clinical and does not provide diagnosis or treatment recommendations.
Methodological Ground	PASS	The implemented and documented workflow stays within a valid local CIBERSORT-style deconvolution scope with explicit limits.
Code Usability	PASS	Canonical, qn=true, perm=0, stress, and failure-reporting paths all executed correctly in the audit environment.

Core Capability: 94 / 100 (8 Categories)

Functional Suitability

Functional suitability was evaluated against the skill-auditor rubric; no blocking schema-level issue was recorded for this dimension.

12 / 12

100%

Reliability

Failure metadata and payload preservation are now implemented; final promotion is still copy-based rather than a true transactional rename.

11 / 12

92%
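
The difference between copy-based promotion and a transactional rename can be sketched as follows. This is an assumption-laden illustration, not the skill's implementation: the function name `promote_atomically` and the `.staging-` prefix are hypothetical, and the approach assumes the staging file and the final path sit on the same filesystem, where `os.replace` is atomic on POSIX systems.

```python
import os
import tempfile


def promote_atomically(payload: bytes, final_path: str) -> None:
    """Write payload to a temp file beside final_path, then rename over it."""
    directory = os.path.dirname(os.path.abspath(final_path))
    fd, staging_path = tempfile.mkstemp(dir=directory, prefix=".staging-")
    try:
        with os.fdopen(fd, "wb") as handle:
            handle.write(payload)
            handle.flush()
            os.fsync(handle.fileno())
        # Readers see either the old file or the new one, never a partial copy.
        os.replace(staging_path, final_path)
    except BaseException:
        # On failure the previous successful payload is left untouched.
        if os.path.exists(staging_path):
            os.remove(staging_path)
        raise
```

With a plain copy, a crash midway can leave a truncated file at the final path. With the rename, an interrupted run leaves at most a stray staging file, which the `except` branch removes when it can.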

Performance & Context

Performance and context was evaluated against the skill-auditor rubric; no blocking schema-level issue was recorded for this dimension.

8 / 8

100%

Agent Usability

The skill communicates outputs clearly, but still relies on artifact inspection rather than a formal response template.

15 / 16

94%

Human Usability

Malformed inputs still fail fast instead of triggering a clarification flow.

7 / 8

88%

Security

No injection or secret-handling issue was found, but logs intentionally retain file paths and run metadata.

11 / 12

92%

Maintainability

Recording logic is now separated into helper and report modules and covered by regression testing.

12 / 12

100%

Agent-Specific

Scope boundaries, failure-path guidance, and trigger precision are strong after the update.

18 / 20

90%

Core Capability Total: 94 / 100
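
The category figures above can be cross-checked with a few lines of arithmetic. The sketch below only restates the numbers from this scorecard. The half-up rounding rule is an assumption that happens to reproduce the displayed percentages, and the variable names are illustrative.

```python
# (earned, maximum) per category, copied from the scorecard.
category_scores = {
    "Functional Suitability": (12, 12),
    "Reliability": (11, 12),
    "Performance & Context": (8, 8),
    "Agent Usability": (15, 16),
    "Human Usability": (7, 8),
    "Security": (11, 12),
    "Maintainability": (12, 12),
    "Agent-Specific": (18, 20),
}

earned = sum(e for e, _ in category_scores.values())
maximum = sum(m for _, m in category_scores.values())

# Percentages rounded half up (assumed rule), e.g. 7/8 = 87.5% is shown as 88%.
percentages = {
    name: int(100 * e / m + 0.5) for name, (e, m) in category_scores.items()
}
```

The eight maxima sum to 100, so the 94 points earned translate directly into the 94 / 100 total.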

Medical Task: Execution Average 87 / 100; Assertions 35/35 Passed

Canonical

Packaged validation run

5/5 ✓

Variant A

No-plot explicit-column run

5/5 ✓

Edge

Zero-permutation boundary

5/5 ✓

Variant B

Quantile-normalized run

5/5 ✓

Stress

Higher-permutation run

5/5 ✓

Scope Boundary

Missing case group

5/5 ✓

Adversarial

Corrupted signature matrix

5/5 ✓

Canonical: ✅ Pass

Packaged validation run

Generated the full documented output set, including plots, session metadata, and append-only audit files.

Basic 37/40 | Specialized 54/60 | Total 91/100

✅ A1: Required outputs were generated for the audited workflow.

✅ A2: Input handling and validation behaved as documented.

✅ A3: No unsupported medical or scientific claim fabrication was detected.

✅ A4: Execution stayed within the stated Data Analysis skill scope.

✅ A5: Result artifacts and scoring evidence were sufficient for audit review.

Pass rate: 5 / 5

Variant A: ✅ Pass

No-plot explicit-column run

Explicit sample/group column selection worked and the plot directory remained empty as documented.

Basic 36/40 | Specialized 54/60 | Total 90/100

✅ A1: Required outputs were generated for the audited workflow.

✅ A2: Input handling and validation behaved as documented.

✅ A3: No unsupported medical or scientific claim fabrication was detected.

✅ A4: Execution stayed within the stated Data Analysis skill scope.

✅ A5: Result artifacts and scoring evidence were sufficient for audit review.

Pass rate: 5 / 5

Edge: ✅ Pass

Zero-permutation boundary

The run completed successfully and both CLI output and run_record.txt explicitly explained why empirical P-values were NA.

Basic 35/40 | Specialized 51/60 | Total 86/100

✅ A1: Required outputs were generated for the audited workflow.

✅ A2: Input handling and validation behaved as documented.

✅ A3: No unsupported medical or scientific claim fabrication was detected.

✅ A4: Execution stayed within the stated Data Analysis skill scope.

✅ A5: Result artifacts and scoring evidence were sufficient for audit review.

Pass rate: 5 / 5
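
Why a zero-permutation run reports NA can be seen from how an empirical P-value is formed: it is a fraction over a permutation null distribution, so with no permutations there is nothing to divide by. The sketch below is a generic illustration under that assumption. The function name `empirical_p_value` is hypothetical, and the statistic the skill actually permutes is not described in this report.

```python
from typing import Optional, Sequence


def empirical_p_value(
    observed: float, null_stats: Sequence[float]
) -> Optional[float]:
    """Fraction of null statistics at least as large as the observed one.

    Returns None (to be reported as NA) when no permutations were run.
    """
    if len(null_stats) == 0:
        return None
    at_least = sum(1 for s in null_stats if s >= observed)
    return at_least / len(null_stats)
```

Returning `None` rather than 0 or 1 keeps a "not computed" result distinguishable from a real extreme P-value in downstream tables.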

Variant B: ✅ Pass

Quantile-normalized run

The qn=true branch executed cleanly in the current container.

Basic 36/40 | Specialized 53/60 | Total 89/100

✅ A1: Required outputs were generated for the audited workflow.

✅ A2: Input handling and validation behaved as documented.

✅ A3: No unsupported medical or scientific claim fabrication was detected.

✅ A4: Execution stayed within the stated Data Analysis skill scope.

✅ A5: Result artifacts and scoring evidence were sufficient for audit review.

Pass rate: 5 / 5
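
For readers unfamiliar with what a quantile-normalization branch does to the expression matrix, here is a minimal standard-library sketch. It assumes samples are given as equal-length lists of gene values and breaks ties by position rather than averaging them, which may differ from the routine the skill calls. The function name `quantile_normalize` is hypothetical.

```python
def quantile_normalize(columns):
    """Quantile-normalize samples given as equal-length lists (one per sample)."""
    n_genes = len(columns[0])
    if any(len(col) != n_genes for col in columns):
        raise ValueError("all samples must have the same number of genes")
    sorted_cols = [sorted(col) for col in columns]
    # Reference distribution: mean of the k-th smallest value across samples.
    reference = [
        sum(col[k] for col in sorted_cols) / len(columns) for k in range(n_genes)
    ]
    normalized = []
    for col in columns:
        # Gene indices ordered from lowest to highest expression in this sample.
        order = sorted(range(n_genes), key=lambda i: col[i])
        out = [0.0] * n_genes
        for rank, gene_index in enumerate(order):
            out[gene_index] = reference[rank]
        normalized.append(out)
    return normalized


# Two toy samples with three genes each (illustrative values only).
normalized = quantile_normalize([[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]])
```

After normalization every sample contains the same set of values, and only the assignment of values to genes differs between samples.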

Stress: ✅ Pass

Higher-permutation run

The heavier 50-permutation path completed with stable outputs and no format drift.

Basic 36/40 | Specialized 52/60 | Total 88/100

✅ A1: Required outputs were generated for the audited workflow.

✅ A2: Input handling and validation behaved as documented.

✅ A3: No unsupported medical or scientific claim fabrication was detected.

✅ A4: Execution stayed within the stated Data Analysis skill scope.

✅ A5: Result artifacts and scoring evidence were sufficient for audit review.

Pass rate: 5 / 5

Scope Boundary: ✅ Pass

Missing case group

The run failed validation as expected, with a clear SKILL_INVALID_PARAMETER message and persisted failure artifacts.

Basic 32/40 | Specialized 50/60 | Total 82/100

✅ A1: Required outputs were generated for the audited workflow.

✅ A2: Input handling and validation behaved as documented.

✅ A3: No unsupported medical or scientific claim fabrication was detected.

✅ A4: Execution stayed within the stated Data Analysis skill scope.

✅ A5: Result artifacts and scoring evidence were sufficient for audit review.

Pass rate: 5 / 5
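
A fail-fast check of this kind might look like the sketch below. Only the `SKILL_INVALID_PARAMETER` code comes from the report. The exception class, the function name `validate_groups`, the argument names, and the message wording are assumptions made for illustration.

```python
class SkillInvalidParameter(ValueError):
    """Hypothetical exception carrying the SKILL_INVALID_PARAMETER code."""

    code = "SKILL_INVALID_PARAMETER"


def validate_groups(sample_groups, case_group, control_group):
    """Raise if either requested group label has no samples.

    sample_groups: dict mapping sample ID -> group label.
    """
    present = set(sample_groups.values())
    missing = [g for g in (case_group, control_group) if g not in present]
    if missing:
        raise SkillInvalidParameter(
            f"{SkillInvalidParameter.code}: group(s) not found in sample "
            f"annotations: {', '.join(missing)}"
        )
```

Running the check before any deconvolution work means a bad group label costs almost nothing and cannot leave half-written results behind.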

Adversarial: ✅ Pass

Corrupted signature matrix

The run failed validation as expected on non-finite signature values, with persisted run_record and output_manifest entries.

Basic 33/40 | Specialized 50/60 | Total 83/100

✅ A1: Required outputs were generated for the audited workflow.

✅ A2: Input handling and validation behaved as documented.

✅ A3: No unsupported medical or scientific claim fabrication was detected.

✅ A4: Execution stayed within the stated Data Analysis skill scope.

✅ A5: Result artifacts and scoring evidence were sufficient for audit review.

Pass rate: 5 / 5
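
Rejecting a corrupted signature matrix comes down to scanning for NaN or infinite entries before fitting. The following is a minimal sketch, assuming the signature is held as a nested dict of gene to cell type to value. The function names, the data layout, and the error text are hypothetical and are not taken from the skill.

```python
import math


def find_non_finite(signature):
    """Return sorted (gene, cell_type) pairs whose value is NaN or infinite.

    signature: dict mapping gene -> dict mapping cell type -> float.
    """
    bad = []
    for gene, row in signature.items():
        for cell_type, value in row.items():
            if not math.isfinite(value):
                bad.append((gene, cell_type))
    return sorted(bad)


def validate_signature(signature):
    """Raise ValueError if any signature entry is not a finite number."""
    bad = find_non_finite(signature)
    if bad:
        raise ValueError(
            f"signature matrix has {len(bad)} non-finite value(s), "
            f"first at {bad[0]}"
        )
```

Reporting the location of the first offending cell makes the failure record actionable without dumping the whole matrix into the log.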

Medical Task Total: 87 / 100
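
The 87 / 100 figure is consistent with a plain mean of the seven scenario totals, as the short check below shows. That the auditor uses an unweighted mean is an assumption inferred from the numbers, and the variable names are illustrative.

```python
# (basic, specialized) sub-scores per scenario, copied from the report.
scenario_scores = {
    "Canonical": (37, 54),
    "Variant A": (36, 54),
    "Edge": (35, 51),
    "Variant B": (36, 53),
    "Stress": (36, 52),
    "Scope Boundary": (32, 50),
    "Adversarial": (33, 50),
}

# Each scenario total is the basic score (out of 40) plus the specialized score (out of 60).
scenario_totals = {name: b + s for name, (b, s) in scenario_scores.items()}

# Unweighted mean across scenarios (assumed aggregation rule).
execution_average = sum(scenario_totals.values()) / len(scenario_totals)
```

The totals sum to 609 across seven scenarios, which gives an average of exactly 87.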

Key Strengths

Implementation, tests, documentation, and output artifacts are now tightly aligned around both success and failure paths.
Failure handling is materially stronger: invalid reruns no longer erase prior successful payloads and now leave auditable records.
The perm=0 boundary is now explicit, documented, and traceable in both CLI logs and run_record output.
Maintainability improved through modularized recording logic and regression coverage for the payload-preservation behavior.