Data Analysis

consensus-clustering-analysis

Identify robust molecular subtypes via consensus clustering (ConsensusClusterPlus). Inputs: expression matrix. Outputs: optimal k selection plot, consensus matrix heatmaps, subtype membership table, silhouette scores.

94 / 100 Total Score

Core Capability

94 / 100

Functional Suitability

11 / 12

Reliability

11 / 12

Performance & Context

8 / 8

Agent Usability

15 / 16

Human Usability

7 / 8

Security

12 / 12

Maintainability

12 / 12

Agent-Specific

18 / 20

Medical Task

20 / 20 Passed

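Default case-group run: 96 / 100 (4/4 assertions)

Alias-column group file: 96 / 100 (4/4 assertions)

Single-gene custom list: 86 / 100 (4/4 assertions)

Top-10 features with K=4: 93 / 100 (4/4 assertions)

Custom gene list with K=4: 95 / 100 (4/4 assertions)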

Veto Gates (required pass for any deployment consideration)

Skill Veto: ✓ All 4 gates passed

Operational Stability: PASS

System remains stable across varied inputs and edge cases.

Structural Consistency: PASS

Output structure conforms to expected skill contract format.

Result Determinism: PASS

Equivalent inputs produce semantically equivalent outputs.

System Security: PASS

No prompt injection, data leakage, or unsafe tool use detected.

Research Veto: ✅ PASS (Applicable)

Dimension	Result	Detail
Scientific Integrity	PASS	Scientific integrity check completed without reportable redline findings.
Practice Boundaries	PASS	Practice boundaries check completed without reportable redline findings.
Methodological Ground	PASS	Methodological ground check completed without reportable redline findings.
Code Usability	PASS	The R code executed successfully in the test suite and all live audit runs.

Core Capability: 94 / 100 (8 Categories)

Functional Suitability

Dependency installation steps are not documented for first-run users.

11 / 12

92%

Reliability

Recovery is strong, but there is no resumable checkpoint after partial method execution.

11 / 12

92%

Performance & Context

Performance and context was evaluated against the skill-auditor rubric; no blocking schema-level issue was recorded for this dimension.

8 / 8

100%

Agent Usability

Execution mode is implied by examples rather than explicitly labeled near the top.

15 / 16

94%

Human Usability

Trigger language is slightly technical for non-specialist users.

7 / 8

88%

Security

Security was evaluated against the skill-auditor rubric; no blocking schema-level issue was recorded for this dimension.

12 / 12

100%

Maintainability

Maintainability was evaluated against the skill-auditor rubric; no blocking schema-level issue was recorded for this dimension.

12 / 12

100%

Agent-Specific

First-run environment setup and handoff guidance could be more explicit.

18 / 20

90%

Core Capability Total: 94 / 100
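
The eight category scores sum to the reported total, and each percentage is the ratio of points earned to points available, rounded to a whole number. A minimal Python sketch of that arithmetic follows, using the scores listed in this report. The names are illustrative, and half-up rounding is an assumption, since the report does not state its rounding rule.

```python
from decimal import Decimal, ROUND_HALF_UP

# (earned, available) per category, copied from the scorecard in this report.
CATEGORY_SCORES = {
    "Functional Suitability": (11, 12),
    "Reliability": (11, 12),
    "Performance & Context": (8, 8),
    "Agent Usability": (15, 16),
    "Human Usability": (7, 8),
    "Security": (12, 12),
    "Maintainability": (12, 12),
    "Agent-Specific": (18, 20),
}


def percent(earned, available):
    """Whole-number percentage, rounding halves up (assumed rule)."""
    ratio = Decimal(earned) * 100 / Decimal(available)
    return int(ratio.quantize(Decimal("1"), rounding=ROUND_HALF_UP))


total_earned = sum(earned for earned, _ in CATEGORY_SCORES.values())
total_available = sum(available for _, available in CATEGORY_SCORES.values())
percentages = {
    name: percent(earned, available)
    for name, (earned, available) in CATEGORY_SCORES.items()
}

print(f"Core Capability Total: {total_earned} / {total_available}")
for name, pct in percentages.items():
    print(f"{name}: {pct}%")
```

Under these assumptions the sketch reproduces the figures shown in the report, including 88% for Human Usability (7 / 8 = 87.5%).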

Medical Task: Execution Average 93.2 / 100; Assertions 20/20 Passed

Canonical: ✅ Pass

Default case-group run

Selected pearson + hc with K=2 and PAC=0.0000.

Basic 39/40 | Specialized 57/60 | Total 96/100

✅ A1: Required outputs were generated for the audited workflow.

✅ A2: Input handling and validation behaved as documented.

✅ A3: No unsupported medical or scientific claim fabrication was detected.

✅ A4: Execution stayed within the stated Data Analysis skill scope.

Pass rate: 4 / 4
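
The PAC value reported for this run is presumably the proportion of ambiguous clustering, a common criterion for choosing K from a consensus matrix: the fraction of sample pairs whose consensus index falls between a lower and an upper threshold. The report does not state the thresholds or show the implementation, so the sketch below is only an illustration. The function name `pac_score`, the 0.1 and 0.9 bounds, and the toy matrices are assumptions.

```python
from itertools import combinations


def pac_score(consensus, lower=0.1, upper=0.9):
    """Fraction of sample pairs with lower < consensus <= upper.

    `consensus` is a symmetric matrix (list of lists) of co-clustering
    frequencies in [0, 1]. The thresholds are assumed defaults.
    """
    pairs = list(combinations(range(len(consensus)), 2))
    if not pairs:
        raise ValueError("need at least two samples")
    ambiguous = sum(1 for i, j in pairs if lower < consensus[i][j] <= upper)
    return ambiguous / len(pairs)


# Toy matrix: every pair always or never co-clusters, so nothing is ambiguous.
clean = [
    [1.0, 1.0, 0.0, 0.0],
    [1.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 1.0],
    [0.0, 0.0, 1.0, 1.0],
]

# Toy matrix: the last sample drifts between clusters across resamples.
noisy = [
    [1.0, 1.0, 0.0, 0.5],
    [1.0, 1.0, 0.0, 0.4],
    [0.0, 0.0, 1.0, 0.6],
    [0.5, 0.4, 0.6, 1.0],
]

print(f"PAC (clean): {pac_score(clean):.4f}")
print(f"PAC (noisy): {pac_score(noisy):.4f}")
```

Read this way, a PAC of 0.0000 means that no pair of samples had an intermediate consensus value, while larger values indicate less stable partitions.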

Variant A: ✅ Pass

Alias-column group file

Alias column detection succeeded and the seeded output matched the canonical run.

Basic 39/40 | Specialized 57/60 | Total 96/100

✅ A1: Required outputs were generated for the audited workflow.

✅ A2: Input handling and validation behaved as documented.

✅ A3: No unsupported medical or scientific claim fabrication was detected.

✅ A4: Execution stayed within the stated Data Analysis skill scope.

Pass rate: 4 / 4
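
The report does not say how the seeded output was compared with the canonical run. One plausible check, sketched below in Python, treats two subtype membership tables as equivalent when they group the samples identically, even if the cluster labels differ. The helper `same_partition` and the sample IDs are hypothetical.

```python
def same_partition(a, b):
    """True if two membership maps group samples identically.

    Cluster label names are ignored: {S1: 1, S2: 1} matches {S1: "B", S2: "B"}.
    """
    if a.keys() != b.keys():
        return False
    forward, backward = {}, {}
    for sample in a:
        label_a, label_b = a[sample], b[sample]
        # Each label in `a` must map to exactly one label in `b`, and back.
        if forward.setdefault(label_a, label_b) != label_b:
            return False
        if backward.setdefault(label_b, label_a) != label_a:
            return False
    return True


# Hypothetical membership tables (sample ID -> cluster label).
canonical = {"S1": 1, "S2": 1, "S3": 2}
relabeled = {"S1": "B", "S2": "B", "S3": "A"}
different = {"S1": 1, "S2": 2, "S3": 2}

print(same_partition(canonical, relabeled))
print(same_partition(canonical, different))
```

Comparing partitions rather than raw labels keeps the check from failing when a rerun only renumbers the clusters.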

Edge: ✅ Pass

Single-gene custom list

Expected validation failure: SKILL_INVALID_DATA was emitted before clustering work started.

Basic 36/40 | Specialized 50/60 | Total 86/100

✅ A1: Required outputs were generated for the audited workflow.

✅ A2: Input handling and validation behaved as documented.

✅ A3: No unsupported medical or scientific claim fabrication was detected.

✅ A4: Execution stayed within the stated Data Analysis skill scope.

Pass rate: 4 / 4
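
This edge case shows a fail-fast pattern: the input is rejected with a coded error before any clustering starts. The skill is written in R and its validation rules are not shown in this report, so the Python sketch below is only an illustration. The `SkillError` class, the `validate_features` and `run` helpers, the minimum of two features, and the gene name are assumptions. Only the `SKILL_INVALID_DATA` code comes from the report.

```python
class SkillError(Exception):
    """Error carrying a SKILL_* code (hypothetical class for illustration)."""

    def __init__(self, code, message):
        super().__init__(f"{code}: {message}")
        self.code = code


def validate_features(features, min_features=2):
    """Reject feature lists too small to cluster on (threshold is assumed)."""
    unique = list(dict.fromkeys(features))  # de-duplicate, keep order
    if len(unique) < min_features:
        raise SkillError(
            "SKILL_INVALID_DATA",
            f"need at least {min_features} features, got {len(unique)}",
        )
    return unique


def run(features):
    """Validate first; clustering would start only after validation passes."""
    checked = validate_features(features)
    # Clustering itself is omitted from this sketch.
    return {"status": "ok", "n_features": len(checked)}


try:
    run(["TP53"])  # single-gene list, placeholder gene name
except SkillError as err:
    caught_code = err.code
    print(err)
```

Raising before any heavy computation keeps the failure cheap and makes the error code the only output the caller has to handle.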

Variant B: ✅ Pass

Top-10 features with K=4

Non-default parameters were respected; selected pearson + hc with K=2 and PAC=0.1895.

Basic 38/40 | Specialized 55/60 | Total 93/100

✅ A1: Required outputs were generated for the audited workflow.

✅ A2: Input handling and validation behaved as documented.

✅ A3: No unsupported medical or scientific claim fabrication was detected.

✅ A4: Execution stayed within the stated Data Analysis skill scope.

Pass rate: 4 / 4

Stress: ✅ Pass

Custom gene list with K=4

Completed the broader method grid and selected euclidean + hc with K=4 and PAC=0.1421.

Basic 39/40 | Specialized 56/60 | Total 95/100

✅ A1: Required outputs were generated for the audited workflow.

✅ A2: Input handling and validation behaved as documented.

✅ A3: No unsupported medical or scientific claim fabrication was detected.

✅ A4: Execution stayed within the stated Data Analysis skill scope.

Pass rate: 4 / 4

Medical Task Total: 93.2 / 100
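
The Medical Task total is consistent with the arithmetic mean of the five per-case totals, and the 20/20 assertion count with the sum of the per-case pass rates. A short Python check, with the scores taken from the case results in this report (variable names are illustrative):

```python
from statistics import mean

# Per-case totals out of 100, as reported for each audited run.
CASE_TOTALS = {
    "Canonical": 96,
    "Variant A": 96,
    "Edge": 86,
    "Variant B": 93,
    "Stress": 95,
}

# Each case passed 4 of 4 assertions.
ASSERTIONS_PASSED = {name: 4 for name in CASE_TOTALS}
ASSERTIONS_PER_CASE = 4

execution_average = round(mean(CASE_TOTALS.values()), 1)
assertions_passed = sum(ASSERTIONS_PASSED.values())
assertions_total = ASSERTIONS_PER_CASE * len(CASE_TOTALS)

print(f"Execution Average: {execution_average} / 100")
print(f"Assertions: {assertions_passed}/{assertions_total} Passed")
```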

Key Strengths

Strong progressive disclosure keeps the main skill concise while deeper details live in references and modular scripts.
Input validation is robust and backed by a clear SKILL_* error taxonomy.
Reproducibility is strong through seed control, session capture, and deterministic reruns.
Engineering quality is high: modular R files, passing automated tests, and clean output contracts.