Data Analysis
consensus-clustering-analysis
Identify robust molecular subtypes via consensus clustering (ConsensusClusterPlus). Input: an expression matrix. Outputs: an optimal-K selection plot, consensus matrix heatmaps, a subtype membership table, and silhouette scores.
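In consensus clustering, the clustering step is repeated on many random subsamples. The consensus index of two samples is the fraction of runs in which they fell into the same cluster, counting only the runs that drew both samples. The Python sketch below illustrates only that bookkeeping and is not ConsensusClusterPlus code. It assumes each run is supplied as a dictionary from sampled sample index to cluster label, and the function name `consensus_matrix` is hypothetical.

```python
from typing import Dict, List


def consensus_matrix(n_samples: int, runs: List[Dict[int, int]]) -> List[List[float]]:
    """Build a consensus matrix from resampled clustering runs.

    Each run maps a subsampled sample index to its cluster label; samples
    absent from a run were not drawn in that resample (assumed input format).
    """
    together = [[0] * n_samples for _ in range(n_samples)]  # times a pair shared a cluster
    drawn = [[0] * n_samples for _ in range(n_samples)]     # times a pair was sampled together
    for labels in runs:
        members = list(labels)
        for i in members:
            for j in members:
                drawn[i][j] += 1
                if labels[i] == labels[j]:
                    together[i][j] += 1
    # Consensus index = co-clustered / co-sampled; 0.0 if the pair was never drawn together.
    return [
        [together[i][j] / drawn[i][j] if drawn[i][j] else 0.0 for j in range(n_samples)]
        for i in range(n_samples)
    ]


# Three toy runs over four samples (sample 3 is missing from the second run).
runs = [
    {0: 1, 1: 1, 2: 2, 3: 2},
    {0: 1, 1: 1, 2: 2},
    {0: 1, 1: 2, 2: 2, 3: 2},
]
M = consensus_matrix(4, runs)
```

In the toy data, samples 2 and 3 share a cluster in both runs that drew them, so their consensus index is 1.0, while samples 0 and 1 agree in two of three runs (about 0.67).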
Total Score: 94 / 100
Core Capability: 94 / 100

| Category | Score |
|---|---|
| Functional Suitability | 11 / 12 |
| Reliability | 11 / 12 |
| Performance & Context | 8 / 8 |
| Agent Usability | 15 / 16 |
| Human Usability | 7 / 8 |
| Security | 12 / 12 |
| Maintainability | 12 / 12 |
| Agent-Specific | 18 / 20 |
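Medical Task: 20 / 20 assertions passed

| Audit case | Score | Assertions |
|---|---|---|
| Default case-group run | 96 | 4/4 |
| Alias-column group file | 96 | 4/4 |
| Single-gene custom list | 86 | 4/4 |
| Top-10 features with K=4 | 93 | 4/4 |
| Custom gene list with K=4 | 95 | 4/4 |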
Veto Gates: required pass for any deployment consideration
Skill Veto: ✓ All 4 gates passed

| Gate | Requirement | Result |
|---|---|---|
| Operational Stability | System remains stable across varied inputs and edge cases | PASS |
| Structural Consistency | Output structure conforms to the expected skill contract format | PASS |
| Result Determinism | Equivalent inputs produce semantically equivalent outputs | PASS |
| System Security | No prompt injection, data leakage, or unsafe tool use detected | PASS |

Research Veto: ✅ PASS (applicable)
| Dimension | Result | Detail |
|---|---|---|
| Scientific Integrity | PASS | Scientific integrity check completed without reportable redline findings. |
| Practice Boundaries | PASS | Practice boundaries check completed without reportable redline findings. |
| Methodological Ground | PASS | Methodological ground check completed without reportable redline findings. |
| Code Usability | PASS | The R code executed successfully in the test suite and all live audit runs. |
Core Capability: 94 / 100 (8 categories)

| Category | Score | % | Auditor note |
|---|---|---|---|
| Functional Suitability | 11 / 12 | 92% | Dependency installation steps are not documented for first-run users. |
| Reliability | 11 / 12 | 92% | Recovery is strong, but there is no resumable checkpoint after partial method execution. |
| Performance & Context | 8 / 8 | 100% | Evaluated against the skill-auditor rubric; no blocking schema-level issue was recorded. |
| Agent Usability | 15 / 16 | 94% | Execution mode is implied by examples rather than explicitly labeled near the top. |
| Human Usability | 7 / 8 | 88% | Trigger language is slightly technical for non-specialist users. |
| Security | 12 / 12 | 100% | Evaluated against the skill-auditor rubric; no blocking schema-level issue was recorded. |
| Maintainability | 12 / 12 | 100% | Evaluated against the skill-auditor rubric; no blocking schema-level issue was recorded. |
| Agent-Specific | 18 / 20 | 90% | First-run environment setup and handoff guidance could be more explicit. |

Core Capability Total: 94 / 100
Medical Task: execution average 93.2 / 100; assertions 20/20 passed
Canonical: ✅ Pass (96 / 100)
Default case-group run
Selected pearson + hc with K=2 and PAC=0.0000.
Basic 39/40 | Specialized 57/60 | Total 96/100
- ✅ A1: Required outputs were generated for the audited workflow.
- ✅ A2: Input handling and validation behaved as documented.
- ✅ A3: No unsupported medical or scientific claim fabrication was detected.
- ✅ A4: Execution stayed within the stated Data Analysis skill scope.

Pass rate: 4 / 4
Variant A: ✅ Pass (96 / 100)
Alias-column group file
Alias column detection succeeded and the seeded output matched the canonical run.
Basic 39/40 | Specialized 57/60 | Total 96/100
- ✅ A1: Required outputs were generated for the audited workflow.
- ✅ A2: Input handling and validation behaved as documented.
- ✅ A3: No unsupported medical or scientific claim fabrication was detected.
- ✅ A4: Execution stayed within the stated Data Analysis skill scope.

Pass rate: 4 / 4
Edge: ✅ Pass (86 / 100)
Single-gene custom list
Expected validation failure: SKILL_INVALID_DATA was emitted before clustering work started.
Basic 36/40 | Specialized 50/60 | Total 86/100
- ✅ A1: Required outputs were generated for the audited workflow.
- ✅ A2: Input handling and validation behaved as documented.
- ✅ A3: No unsupported medical or scientific claim fabrication was detected.
- ✅ A4: Execution stayed within the stated Data Analysis skill scope.

Pass rate: 4 / 4
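The Edge case shows the skill rejecting a single-gene list with SKILL_INVALID_DATA before clustering starts. A fail-fast check of that kind might look like the following Python sketch. Only the error code comes from this report; the `SkillError` class, the `validate_gene_list` helper, and the two-gene minimum are illustrative assumptions.

```python
from typing import List


class SkillError(Exception):
    """Error carrying a SKILL_* code (hypothetical class; only the code name comes from the report)."""

    def __init__(self, code: str, message: str) -> None:
        super().__init__(f"{code}: {message}")
        self.code = code


MIN_FEATURES = 2  # assumed threshold; the audited skill's actual rule is not documented here


def validate_gene_list(genes: List[str], available: List[str]) -> List[str]:
    """Fail fast, before any clustering, if too few requested genes are usable."""
    present = set(available)
    # De-duplicate while preserving order, keeping only genes found in the matrix.
    usable = [g for g in dict.fromkeys(genes) if g in present]
    if len(usable) < MIN_FEATURES:
        raise SkillError(
            "SKILL_INVALID_DATA",
            f"need at least {MIN_FEATURES} matching genes, got {len(usable)}",
        )
    return usable
```

Raising before any resampling keeps the failure cheap and makes the error code, rather than a partial output directory, the result of an invalid request.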
Variant B: ✅ Pass (93 / 100)
Top-10 features with K=4
Non-default parameters were respected; selected pearson + hc with K=2 and PAC=0.1895.
Basic 38/40 | Specialized 55/60 | Total 93/100
- ✅ A1: Required outputs were generated for the audited workflow.
- ✅ A2: Input handling and validation behaved as documented.
- ✅ A3: No unsupported medical or scientific claim fabrication was detected.
- ✅ A4: Execution stayed within the stated Data Analysis skill scope.

Pass rate: 4 / 4
Stress: ✅ Pass (95 / 100)
Custom gene list with K=4
Completed the broader method grid and selected euclidean + hc with K=4 and PAC=0.1421.
Basic 39/40 | Specialized 56/60 | Total 95/100
- ✅ A1: Required outputs were generated for the audited workflow.
- ✅ A2: Input handling and validation behaved as documented.
- ✅ A3: No unsupported medical or scientific claim fabrication was detected.
- ✅ A4: Execution stayed within the stated Data Analysis skill scope.

Pass rate: 4 / 4
Medical Task Total: 93.2 / 100
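PAC (proportion of ambiguous clustering) measures the share of sample pairs whose consensus index is neither close to 0 nor close to 1, so lower values indicate crisper subtypes. The canonical run's PAC=0.0000 means no pair fell in the ambiguous band. The Python sketch below computes PAC as CDF(upper) minus CDF(lower) over distinct pairs, assuming bounds of 0.1 and 0.9, a common choice; the bounds the audited skill uses are not stated in this report. The helper `pick_k`, which keeps the lowest-PAC candidate, is a hypothetical selection rule.

```python
from typing import Dict, List


def pac(consensus: List[List[float]], lower: float = 0.1, upper: float = 0.9) -> float:
    """Proportion of ambiguous clustering over distinct sample pairs (i < j).

    Computed as CDF(upper) - CDF(lower), i.e. the share of consensus
    indices in the half-open interval (lower, upper].
    """
    n = len(consensus)
    values = [consensus[i][j] for i in range(n) for j in range(i + 1, n)]
    if not values:
        raise ValueError("PAC needs at least two samples")
    ambiguous = sum(1 for v in values if lower < v <= upper)
    return ambiguous / len(values)


def pick_k(matrices: Dict[int, List[List[float]]]) -> int:
    """Return the candidate K whose consensus matrix has the lowest PAC (ties go to the smaller K)."""
    return min(sorted(matrices), key=lambda k: pac(matrices[k]))


# Toy consensus matrices for two candidate K values.
crisp = [[1.0, 1.0, 0.0], [1.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
fuzzy = [[1.0, 0.5, 0.2], [0.5, 1.0, 0.95], [0.2, 0.95, 1.0]]
```

Here `pac(crisp)` is 0.0 because every pair is either always or never co-clustered, while two of the three pairs in `fuzzy` (0.5 and 0.2) are ambiguous, so `pick_k({2: crisp, 3: fuzzy})` returns 2.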
Key Strengths
- Strong progressive disclosure keeps the main skill concise while deeper details live in references and modular scripts.
- Input validation is robust and backed by a clear SKILL_* error taxonomy.
- Reproducibility is strong through seed control, session capture, and deterministic reruns.
- Engineering quality is high: modular R files, passing automated tests, and clean output contracts.
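Seed control is what makes reruns deterministic. When every random draw comes from one generator seeded with a fixed value, identical inputs yield identical subsamples and therefore identical consensus matrices. A minimal Python sketch follows, assuming an 80% subsampling fraction; `subsample_plan` is a hypothetical helper, not part of the audited R scripts.

```python
import random
from typing import List


def subsample_plan(n_samples: int, reps: int, fraction: float, seed: int) -> List[List[int]]:
    """Draw `reps` subsamples of sample indices from a dedicated seeded generator."""
    rng = random.Random(seed)  # isolated generator: unaffected by other code using `random`
    size = max(1, round(n_samples * fraction))
    return [sorted(rng.sample(range(n_samples), size)) for _ in range(reps)]


first = subsample_plan(n_samples=50, reps=5, fraction=0.8, seed=123)
second = subsample_plan(n_samples=50, reps=5, fraction=0.8, seed=123)
```

Using a private `random.Random` instance instead of the module-level generator means unrelated calls elsewhere in the process cannot shift the sequence, which is the property reruns depend on.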