Data Analysis

cibersort-immune-infiltration-analysis

Deconvolve bulk RNA-seq or microarray expression profiles into estimated proportions of the 22 immune cell types defined by the LM22 signature matrix, using CIBERSORT. Inputs: an expression matrix and sample group assignments. Outputs: an immune fraction table, group-comparison bar plots, and statistical test results.
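As a rough illustration of what "deconvolution" means here, the sketch below models one sample's expression vector as a non-negative mixture of signature-matrix columns (genes x cell types, as in LM22) and rescales the coefficients to sum to 1. It is a minimal sketch under stated assumptions, not the skill's code: CIBERSORT itself fits nu-support vector regression, whereas this example swaps in non-negative least squares solved by projected gradient descent, and the name `deconvolve_nnls` is hypothetical.

```python
from typing import List, Sequence


def deconvolve_nnls(signature: Sequence[Sequence[float]],
                    mixture: Sequence[float],
                    iters: int = 5000) -> List[float]:
    """Hypothetical helper: estimate relative cell-type fractions for one sample.

    signature: genes x cell types (one row per gene); mixture: one value per gene.
    Uses NNLS as a stand-in for CIBERSORT's nu-SVR (illustrative only).
    """
    n_genes, n_cells = len(signature), len(signature[0])
    # 1 / ||S||_F^2 never exceeds 1 / lambda_max(S^T S), so it is a safe step size.
    step = 1.0 / sum(v * v for row in signature for v in row)
    x = [0.0] * n_cells
    for _ in range(iters):
        # Residual of the current fit: S x - m
        residual = [sum(signature[g][c] * x[c] for c in range(n_cells)) - mixture[g]
                    for g in range(n_genes)]
        # Gradient of 0.5 * ||S x - m||^2 with respect to x is S^T (S x - m)
        grad = [sum(signature[g][c] * residual[g] for g in range(n_genes))
                for c in range(n_cells)]
        # Gradient step, then projection back onto the non-negative orthant
        x = [max(0.0, x[c] - step * grad[c]) for c in range(n_cells)]
    total = sum(x)
    # Rescale to relative fractions summing to 1, as CIBERSORT's relative mode does
    return [v / total for v in x] if total > 0 else x


if __name__ == "__main__":
    # Toy signature: 4 genes x 2 cell types; the mixture is 30% type A + 70% type B.
    sig = [[10.0, 0.0], [0.0, 10.0], [5.0, 5.0], [2.0, 8.0]]
    mix = [3.0, 7.0, 5.0, 6.2]
    print([round(f, 3) for f in deconvolve_nnls(sig, mix)])  # [0.3, 0.7]
```

With a full-rank toy signature and an exactly mixed sample, the recovered fractions match the mixing weights; real expression data add noise, which is one reason CIBERSORT uses a regression method designed to be robust to it.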

Total Score: 90 / 100
Core Capability: 94 / 100
  • Functional Suitability: 12 / 12
  • Reliability: 11 / 12
  • Performance & Context: 8 / 8
  • Agent Usability: 15 / 16
  • Human Usability: 7 / 8
  • Security: 11 / 12
  • Maintainability: 12 / 12
  • Agent-Specific: 18 / 20
Medical Task: 35 / 35 assertions passed
  • Packaged validation run: 91 (5/5)
  • No-plot explicit-column run: 90 (5/5)
  • Zero-permutation boundary: 86 (5/5)
  • Quantile-normalized run: 89 (5/5)
  • Higher-permutation run: 88 (5/5)
  • Missing case group: 82 (5/5)
  • Corrupted signature matrix: 83 (5/5)

Veto Gates: required pass for any deployment consideration

Skill Veto: ✓ All 4 gates passed
  • Operational Stability: PASS. System remains stable across varied inputs and edge cases.
  • Structural Consistency: PASS. Output structure conforms to the expected skill contract format.
  • Result Determinism: PASS. Equivalent inputs produce semantically equivalent outputs.
  • System Security: PASS. No prompt injection, data leakage, or unsafe tool use detected.

Research Veto: ✅ PASS (applicable)
  • Scientific Integrity: PASS. All quantitative statements in the re-audit were grounded in executed runs and recorded artifacts.
  • Practice Boundaries: PASS. The skill remains strictly non-clinical and does not provide diagnosis or treatment recommendations.
  • Methodological Ground: PASS. The implemented and documented workflow stays within a valid local CIBERSORT-style deconvolution scope with explicit limits.
  • Code Usability: PASS. The canonical, qn=true, perm=0, stress, and failure-reporting paths all executed correctly in the audit environment.

Core Capability: 94 / 100 (8 categories)

Functional Suitability: 12 / 12 (100%)
Functional suitability was evaluated against the skill-auditor rubric; no blocking schema-level issue was recorded for this dimension.

Reliability: 11 / 12 (92%)
Failure metadata and payload preservation are now implemented, but final promotion is still copy-based rather than a true transactional rename.

Performance & Context: 8 / 8 (100%)
Performance and context were evaluated against the skill-auditor rubric; no blocking schema-level issue was recorded for this dimension.

Agent Usability: 15 / 16 (94%)
The skill communicates outputs clearly, but still relies on artifact inspection rather than a formal response template.

Human Usability: 7 / 8 (88%)
Malformed inputs still fail fast instead of triggering a clarification flow.

Security: 11 / 12 (92%)
No injection or secret-handling issue was found, but logs intentionally retain file paths and run metadata.

Maintainability: 12 / 12 (100%)
Recording logic is now separated into helper and report modules and covered by regression tests.

Agent-Specific: 18 / 20 (90%)
Scope boundaries, failure-path guidance, and trigger precision are strong after the update.

Core Capability Total: 94 / 100
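The Reliability note contrasts copy-based promotion with a transactional rename. The general pattern can be sketched as: write the finished artifact to a temporary file in the destination directory, then swap it into place with `os.replace`, which Python documents as atomic on POSIX when both paths are on the same filesystem. This is an illustration of the pattern under that assumption, not the skill's code; `promote_atomically` is a hypothetical helper.

```python
import os
import tempfile


def promote_atomically(content: str, final_path: str) -> None:
    """Hypothetical helper: publish `content` at `final_path` via temp file + rename."""
    directory = os.path.dirname(os.path.abspath(final_path))
    # Stage in the destination directory so the rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".staging-")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            handle.write(content)
            handle.flush()
            os.fsync(handle.fileno())  # make the bytes durable before the swap
        # Readers see either the old file or the complete new one, never a partial copy.
        os.replace(tmp_path, final_path)
    except BaseException:
        # On failure, discard the staged file; any previous final_path is left untouched.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

Compared with copying into place, a failed run here leaves the previous artifact intact without needing separate payload-preservation logic for that file.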

Medical Task: execution average 87 / 100; assertions 35/35 passed

Canonical: Packaged validation run (✅ Pass, 91 / 100)

Generated the full documented output set, including plots, session metadata, and append-only audit files.

Basic 37/40 | Specialized 54/60 | Total 91/100
A1: Required outputs were generated for the audited workflow.
A2: Input handling and validation behaved as documented.
A3: No unsupported medical or scientific claim fabrication was detected.
A4: Execution stayed within the stated Data Analysis skill scope.
A5: Result artifacts and scoring evidence were sufficient for audit review.
Pass rate: 5 / 5
Variant A: No-plot explicit-column run (✅ Pass, 90 / 100)

Explicit sample/group column selection worked, and the plot directory remained empty as documented.

Basic 36/40 | Specialized 54/60 | Total 90/100
A1: Required outputs were generated for the audited workflow.
A2: Input handling and validation behaved as documented.
A3: No unsupported medical or scientific claim fabrication was detected.
A4: Execution stayed within the stated Data Analysis skill scope.
A5: Result artifacts and scoring evidence were sufficient for audit review.
Pass rate: 5 / 5
Edge: Zero-permutation boundary (✅ Pass, 86 / 100)

The run completed successfully, and both the CLI output and run_record.txt explicitly explained why the empirical P-values were NA.

Basic 35/40 | Specialized 51/60 | Total 86/100
A1: Required outputs were generated for the audited workflow.
A2: Input handling and validation behaved as documented.
A3: No unsupported medical or scientific claim fabrication was detected.
A4: Execution stayed within the stated Data Analysis skill scope.
A5: Result artifacts and scoring evidence were sufficient for audit review.
Pass rate: 5 / 5
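Why perm=0 forces NA can be shown with a minimal sketch: an empirical P-value is the fraction of permutation-null statistics at least as extreme as the observed one, so with no permutations there is no null distribution to compare against. The functions below are hypothetical and use the generic upper-tail definition; the exact formula used by CIBERSORT or by this skill may differ (a common add-one variant, (b + 1) / (m + 1), avoids reporting exactly zero).

```python
from typing import Optional, Sequence


def empirical_p_value(observed: float, null_stats: Sequence[float]) -> Optional[float]:
    """Upper-tail empirical P-value; None (reported as NA) when no permutations ran."""
    if len(null_stats) == 0:
        # perm=0: no null distribution exists, so no P-value can be estimated.
        return None
    at_least_as_extreme = sum(1 for s in null_stats if s >= observed)
    return at_least_as_extreme / len(null_stats)


def format_p(p: Optional[float]) -> str:
    """Render a missing P-value as NA, otherwise with three significant digits."""
    return "NA" if p is None else f"{p:.3g}"
```

For example, `format_p(empirical_p_value(0.8, []))` yields `"NA"`, while an observed statistic of 0.5 against the null values `[0.1, 0.6, 0.7, 0.2]` gives 2 / 4 = 0.5.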
Variant B: Quantile-normalized run (✅ Pass, 89 / 100)

The quantile-normalization branch (qn=true) executed cleanly in the current container.

Basic 36/40 | Specialized 53/60 | Total 89/100
A1: Required outputs were generated for the audited workflow.
A2: Input handling and validation behaved as documented.
A3: No unsupported medical or scientific claim fabrication was detected.
A4: Execution stayed within the stated Data Analysis skill scope.
A5: Result artifacts and scoring evidence were sufficient for audit review.
Pass rate: 5 / 5
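For readers unfamiliar with the qn=true option, quantile normalization forces every sample to share the same value distribution: each sample's values are ranked, and each rank is replaced by the mean of that rank across samples. The pure-Python sketch below assumes a complete matrix stored as one list per sample and breaks ties by position rather than averaging them; it illustrates the technique and is not the skill's implementation (`quantile_normalize` is a hypothetical name).

```python
from typing import List


def quantile_normalize(samples: List[List[float]]) -> List[List[float]]:
    """Quantile-normalize a matrix given as one list of gene values per sample."""
    n_genes = len(samples[0])
    n_samples = len(samples)
    # Reference distribution: mean of the k-th smallest value across all samples
    sorted_samples = [sorted(s) for s in samples]
    rank_means = [sum(s[k] for s in sorted_samples) / n_samples for k in range(n_genes)]
    normalized = []
    for values in samples:
        # Gene indices ordered from smallest to largest value (stable sort: ties keep input order)
        order = sorted(range(n_genes), key=lambda i: values[i])
        out = [0.0] * n_genes
        for rank, gene in enumerate(order):
            out[gene] = rank_means[rank]
        normalized.append(out)
    return normalized
```

For instance, samples `[5, 2, 3]` and `[4, 1, 6]` share rank means 1.5, 3.5, 5.5 and become `[5.5, 1.5, 3.5]` and `[3.5, 1.5, 5.5]`.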
Stress: Higher-permutation run (✅ Pass, 88 / 100)

The heavier 50-permutation path completed with stable outputs and no format drift.

Basic 36/40 | Specialized 52/60 | Total 88/100
A1: Required outputs were generated for the audited workflow.
A2: Input handling and validation behaved as documented.
A3: No unsupported medical or scientific claim fabrication was detected.
A4: Execution stayed within the stated Data Analysis skill scope.
A5: Result artifacts and scoring evidence were sufficient for audit review.
Pass rate: 5 / 5
Scope Boundary: Missing case group (✅ Pass, 82 / 100)

Failed validation as expected, with a clear SKILL_INVALID_PARAMETER message and persisted failure artifacts.

Basic 32/40 | Specialized 50/60 | Total 82/100
A1: Required outputs were generated for the audited workflow.
A2: Input handling and validation behaved as documented.
A3: No unsupported medical or scientific claim fabrication was detected.
A4: Execution stayed within the stated Data Analysis skill scope.
A5: Result artifacts and scoring evidence were sufficient for audit review.
Pass rate: 5 / 5
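The missing-case-group scenario can be pictured as a pre-flight check that both requested groups actually occur in the sample annotation before any deconvolution work starts. In this sketch only the SKILL_INVALID_PARAMETER code comes from the report; the exception class, function name, argument layout, and message wording are hypothetical.

```python
from typing import Dict


class SkillInvalidParameter(ValueError):
    """Hypothetical exception carrying the SKILL_INVALID_PARAMETER error code."""
    code = "SKILL_INVALID_PARAMETER"


def validate_groups(sample_groups: Dict[str, str], control: str, case: str) -> None:
    """Fail fast if either comparison group has no samples in the annotation."""
    present = set(sample_groups.values())
    missing = [g for g in (control, case) if g not in present]
    if missing:
        raise SkillInvalidParameter(
            f"{SkillInvalidParameter.code}: group(s) not found in sample annotation: "
            + ", ".join(missing)
        )
```

Failing before computation is what allows the run to stop with a clear message; persisting the failure artifacts would be a separate step.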
Adversarial: Corrupted signature matrix (✅ Pass, 83 / 100)

Failed validation as expected on non-finite signature values, with persisted run_record and output_manifest entries.

Basic 33/40 | Specialized 50/60 | Total 83/100
A1: Required outputs were generated for the audited workflow.
A2: Input handling and validation behaved as documented.
A3: No unsupported medical or scientific claim fabrication was detected.
A4: Execution stayed within the stated Data Analysis skill scope.
A5: Result artifacts and scoring evidence were sufficient for audit review.
Pass rate: 5 / 5
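Detecting a corrupted signature matrix comes down to rejecting NaN or infinite entries before deconvolution. The sketch below shows only that detection step; the function names and the use of a plain `ValueError` are assumptions, and the skill's recording of the failure in run_record and output_manifest is not modeled.

```python
import math
from typing import List, Sequence, Tuple


def non_finite_cells(matrix: Sequence[Sequence[float]]) -> List[Tuple[int, int]]:
    """Return (row, column) positions of NaN or infinite entries."""
    return [(i, j)
            for i, row in enumerate(matrix)
            for j, value in enumerate(row)
            if not math.isfinite(value)]


def check_signature(matrix: Sequence[Sequence[float]]) -> None:
    """Reject a signature matrix before deconvolution if any entry is non-finite."""
    bad = non_finite_cells(matrix)
    if bad:
        row, col = bad[0]
        raise ValueError(
            f"signature matrix has {len(bad)} non-finite value(s); "
            f"first at row {row}, column {col}"
        )
```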
Medical Task Total: 87 / 100
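Assuming both headline totals are simple unweighted aggregates (the report does not state its formulas, but the published figures are consistent with this), the arithmetic can be reproduced in a few lines. The Core Capability total is the sum of the eight category scores, and the Medical Task total is the mean of the seven case totals, 609 / 7 = 87. The overall Total Score is not reconstructed because its weighting is not given.

```python
# Core Capability categories as listed in this report: name -> (awarded, maximum)
CORE_CAPABILITY = {
    "Functional Suitability": (12, 12),
    "Reliability": (11, 12),
    "Performance & Context": (8, 8),
    "Agent Usability": (15, 16),
    "Human Usability": (7, 8),
    "Security": (11, 12),
    "Maintainability": (12, 12),
    "Agent-Specific": (18, 20),
}

# Medical Task cases: name -> (basic out of 40, specialized out of 60, total)
MEDICAL_CASES = {
    "Canonical": (37, 54, 91),
    "Variant A": (36, 54, 90),
    "Edge": (35, 51, 86),
    "Variant B": (36, 53, 89),
    "Stress": (36, 52, 88),
    "Scope Boundary": (32, 50, 82),
    "Adversarial": (33, 50, 83),
}

core_awarded = sum(a for a, _ in CORE_CAPABILITY.values())
core_maximum = sum(m for _, m in CORE_CAPABILITY.values())
# Per-category percentages rounded to whole numbers
core_percent = {name: round(100 * a / m) for name, (a, m) in CORE_CAPABILITY.items()}

# Each case total should equal its Basic plus Specialized score
for name, (basic, specialized, total) in MEDICAL_CASES.items():
    assert basic + specialized == total, name

case_totals = [total for _, _, total in MEDICAL_CASES.values()]
medical_average = sum(case_totals) / len(case_totals)

print(f"Core Capability: {core_awarded} / {core_maximum}")  # Core Capability: 94 / 100
print(f"Medical Task: {medical_average:g} / 100")  # Medical Task: 87 / 100
```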

Key Strengths

  • Implementation, tests, documentation, and output artifacts are now tightly aligned around both success and failure paths.
  • Failure handling is materially stronger: invalid reruns no longer erase prior successful payloads and now leave auditable records.
  • The perm=0 boundary is now explicit, documented, and traceable in both CLI logs and run_record output.
  • Maintainability improved through modularized recording logic and regression coverage for the payload-preservation behavior.