Data Analysis

consensus-clustering-analysis

Identify robust molecular subtypes via consensus clustering (ConsensusClusterPlus). Inputs: expression matrix. Outputs: optimal k selection plot, consensus matrix heatmaps, subtype membership table, silhouette scores.

Total Score: 94 / 100

Core Capability: 94 / 100
  Functional Suitability: 11 / 12
  Reliability: 11 / 12
  Performance & Context: 8 / 8
  Agent Usability: 15 / 16
  Human Usability: 7 / 8
  Security: 12 / 12
  Maintainability: 12 / 12
  Agent-Specific: 18 / 20

Medical Task: 20 / 20 Passed
  Default case-group run: 96 (4/4)
  Alias-column group file: 96 (4/4)
  Single-gene custom list: 86 (4/4)
  Top-10 features with K=4: 93 (4/4)
  Custom gene list with K=4: 95 (4/4)

Veto Gates: Required pass for any deployment consideration

Skill Veto: ✓ All 4 gates passed
Operational Stability: PASS. System remains stable across varied inputs and edge cases.
Structural Consistency: PASS. Output structure conforms to expected skill contract format.
Result Determinism: PASS. Equivalent inputs produce semantically equivalent outputs.
System Security: PASS. No prompt injection, data leakage, or unsafe tool use detected.

Research Veto: ✅ PASS (Applicable)
Dimension | Result | Detail
Scientific Integrity | PASS | Scientific integrity check completed without reportable redline findings.
Practice Boundaries | PASS | Practice boundaries check completed without reportable redline findings.
Methodological Ground | PASS | Methodological ground check completed without reportable redline findings.
Code Usability | PASS | The R code executed successfully in the test suite and all live audit runs.

Core Capability: 94 / 100 (8 Categories)

Functional Suitability: 11 / 12 (92%)
Dependency installation steps are not documented for first-run users.

Reliability: 11 / 12 (92%)
Recovery is strong, but there is no resumable checkpoint after partial method execution.

Performance & Context: 8 / 8 (100%)
Performance and context was evaluated against the skill-auditor rubric; no blocking schema-level issue was recorded for this dimension.

Agent Usability: 15 / 16 (94%)
Execution mode is implied by examples rather than explicitly labeled near the top.

Human Usability: 7 / 8 (88%)
Trigger language is slightly technical for non-specialist users.

Security: 12 / 12 (100%)
Security was evaluated against the skill-auditor rubric; no blocking schema-level issue was recorded for this dimension.

Maintainability: 12 / 12 (100%)
Maintainability was evaluated against the skill-auditor rubric; no blocking schema-level issue was recorded for this dimension.

Agent-Specific: 18 / 20 (90%)
First-run environment setup and handoff guidance could be more explicit.

Core Capability Total: 94 / 100

Medical Task: Execution Average 93.2 / 100; Assertions 20/20 Passed

Score | Case | Scenario | Assertions
96 | Canonical | Default case-group run | 4/4
96 | Variant A | Alias-column group file | 4/4
86 | Edge | Single-gene custom list | 4/4
93 | Variant B | Top-10 features with K=4 | 4/4
95 | Stress | Custom gene list with K=4 | 4/4

Canonical (score 96): ✅ Pass
Default case-group run

Selected pearson + hc with K=2 and PAC=0.0000.

Basic 39/40 | Specialized 57/60 | Total 96/100
A1: Required outputs were generated for the audited workflow.
A2: Input handling and validation behaved as documented.
A3: No unsupported medical or scientific claim fabrication was detected.
A4: Execution stayed within the stated Data Analysis skill scope.
Pass rate: 4 / 4
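
The PAC value reported for this run (proportion of ambiguous clustering) is commonly computed as the share of sample pairs whose consensus index falls strictly between a lower and an upper bound. The sketch below illustrates that idea in Python on a toy matrix. The function name `pac_score` and the bounds 0.1 and 0.9 are assumptions for illustration; the report does not state which bounds or which implementation the audited R skill uses.

```python
def pac_score(consensus, lower=0.1, upper=0.9):
    """Share of sample pairs with an ambiguous consensus index.

    consensus: square, symmetric list of lists with values in [0, 1].
    lower/upper: assumed ambiguity bounds (not taken from the audited skill).
    """
    n = len(consensus)
    if n < 2 or any(len(row) != n for row in consensus):
        raise ValueError("consensus matrix must be square with at least two samples")
    # Count each sample pair once: strict upper triangle, diagonal excluded.
    pairs = [consensus[i][j] for i in range(n) for j in range(i + 1, n)]
    ambiguous = sum(1 for value in pairs if lower < value < upper)
    return ambiguous / len(pairs)


# Toy consensus matrix with two perfectly separated clusters.
clean = [
    [1.0, 1.0, 0.0, 0.0],
    [1.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 1.0],
    [0.0, 0.0, 1.0, 1.0],
]

# Same matrix with one unstable pair that co-clusters half the time.
noisy = [row[:] for row in clean]
noisy[1][2] = noisy[2][1] = 0.5

pac_clean = pac_score(clean)
pac_noisy = pac_score(noisy)
print(f"{pac_clean:.4f} {pac_noisy:.4f}")  # 0.0000 0.1667
```

A PAC of 0.0000, as in the canonical run, means that under the chosen bounds no pair of samples was ambiguously assigned.
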

Variant A (score 96): ✅ Pass
Alias-column group file

Alias column detection succeeded and the seeded output matched the canonical run.

Basic 39/40 | Specialized 57/60 | Total 96/100
A1: Required outputs were generated for the audited workflow.
A2: Input handling and validation behaved as documented.
A3: No unsupported medical or scientific claim fabrication was detected.
A4: Execution stayed within the stated Data Analysis skill scope.
Pass rate: 4 / 4

Edge (score 86): ✅ Pass
Single-gene custom list

Expected validation failure: SKILL_INVALID_DATA was emitted before clustering work started.

Basic 36/40 | Specialized 50/60 | Total 86/100
A1: Required outputs were generated for the audited workflow.
A2: Input handling and validation behaved as documented.
A3: No unsupported medical or scientific claim fabrication was detected.
A4: Execution stayed within the stated Data Analysis skill scope.
Pass rate: 4 / 4
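
The edge case above relies on fail-fast validation: the error is raised before any clustering work begins. A minimal Python sketch of that pattern follows. Only the error code SKILL_INVALID_DATA comes from the report; `SkillError`, `select_features`, `run_workflow`, the `MIN_FEATURES` threshold, and the gene names are hypothetical, and the audited skill itself is written in R.

```python
class SkillError(Exception):
    """Hypothetical exception carrying a SKILL_* error code."""

    def __init__(self, code, message):
        super().__init__(f"{code}: {message}")
        self.code = code


MIN_FEATURES = 2  # assumption: a single gene is too few to cluster on


def select_features(expression, gene_list):
    """Keep requested genes present in the matrix, or fail with a coded error.

    expression: dict mapping gene name -> list of per-sample values.
    """
    selected = {g: expression[g] for g in gene_list if g in expression}
    if len(selected) < MIN_FEATURES:
        raise SkillError(
            "SKILL_INVALID_DATA",
            f"need at least {MIN_FEATURES} features, got {len(selected)}",
        )
    return selected


def run_workflow(expression, gene_list, cluster_fn):
    # Validation runs first, so cluster_fn is never reached on bad input.
    features = select_features(expression, gene_list)
    return cluster_fn(features)


expression = {"TP53": [1.0, 2.0, 3.0], "EGFR": [0.5, 0.1, 0.9]}
clustering_calls = []


def fake_cluster(features):
    # Stand-in for the real clustering step; records that it was invoked.
    clustering_calls.append(sorted(features))
    return "clustered"


try:
    run_workflow(expression, ["TP53"], fake_cluster)
    error_code = None
except SkillError as err:
    error_code = err.code

print(error_code, clustering_calls)  # SKILL_INVALID_DATA []
```

The empty call list shows that the clustering stand-in was never invoked for the single-gene input.
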

Variant B (score 93): ✅ Pass
Top-10 features with K=4

Non-default parameters were respected; selected pearson + hc with K=2 and PAC=0.1895.

Basic 38/40 | Specialized 55/60 | Total 93/100
A1: Required outputs were generated for the audited workflow.
A2: Input handling and validation behaved as documented.
A3: No unsupported medical or scientific claim fabrication was detected.
A4: Execution stayed within the stated Data Analysis skill scope.
Pass rate: 4 / 4

Stress (score 95): ✅ Pass
Custom gene list with K=4

Completed the broader method grid and selected euclidean + hc with K=4 and PAC=0.1421.

Basic 39/40 | Specialized 56/60 | Total 95/100
A1: Required outputs were generated for the audited workflow.
A2: Input handling and validation behaved as documented.
A3: No unsupported medical or scientific claim fabrication was detected.
A4: Execution stayed within the stated Data Analysis skill scope.
Pass rate: 4 / 4

Medical Task Total: 93.2 / 100
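
The 93.2 figure is consistent with a plain mean of the five run totals, and each run total equals its Basic score plus its Specialized score. The short Python check below reproduces that arithmetic from the numbers in this report. Treating the average as an unweighted mean is an assumption; the report does not describe the aggregation rule.

```python
# (basic, specialized, total) per run, copied from the run summaries above.
runs = {
    "Canonical": (39, 57, 96),
    "Variant A": (39, 57, 96),
    "Edge": (36, 50, 86),
    "Variant B": (38, 55, 93),
    "Stress": (39, 56, 95),
}

# Each reported total should be the sum of its two components.
for name, (basic, specialized, total) in runs.items():
    assert basic + specialized == total, name

totals = [total for _, _, total in runs.values()]
average = sum(totals) / len(totals)  # assumed: unweighted mean of run totals
print(f"{average:.1f}")  # 93.2
```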

Key Strengths

  • Strong progressive disclosure keeps the main skill concise while deeper details live in references and modular scripts.
  • Input validation is robust and backed by a clear SKILL_* error taxonomy.
  • Reproducibility is strong through seed control, session capture, and deterministic reruns.
  • Engineering quality is high: modular R files, passing automated tests, and clean output contracts.
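
The seed-control point can be illustrated with a small sketch: when the subsampling step of a consensus workflow draws from a generator seeded with a fixed value, a rerun yields identical subsamples. The Python below is an illustration only. The function `subsample_runs`, the 80% fraction, and the seed value are hypothetical, and the audited skill handles seeding in R.

```python
import random


def subsample_runs(sample_ids, n_reps, fraction, seed):
    """Draw n_reps subsamples of the given fraction from a seeded generator."""
    rng = random.Random(seed)  # local generator, so global state is untouched
    size = max(1, int(len(sample_ids) * fraction))
    return [sorted(rng.sample(sample_ids, size)) for _ in range(n_reps)]


samples = [f"S{i:02d}" for i in range(1, 11)]

# Two independent runs with the same seed.
first = subsample_runs(samples, n_reps=5, fraction=0.8, seed=42)
second = subsample_runs(samples, n_reps=5, fraction=0.8, seed=42)

print(first == second)  # True
```

Using a local `random.Random` instance rather than the module-level functions keeps the result independent of any other code that consumes random numbers in the same process.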