Data Analysis

decision-tree-analysis

Build and visualize classification or regression decision trees (rpart/sklearn DecisionTreeClassifier). Inputs: feature matrix, labels. Outputs: tree diagram, node split rules, feature importance scores, cross-validation accuracy.
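
To make "node split rules" concrete, the following is a minimal sketch of how a CART-style tree can choose one threshold on a numeric feature by minimizing weighted Gini impurity (the default classification criterion in both rpart and sklearn). It is not the skill's code: the function names and the toy data are hypothetical, and real implementations add pruning, multiple features, and stopping rules.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels: 1 - sum(p_k^2)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(values, labels):
    """Return (threshold, weighted_gini) for the best rule `value <= threshold`."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = (None, gini(labels))  # baseline: impurity of the unsplit node
    for i in range(1, n):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # no threshold fits between identical values
        threshold = (lo + hi) / 2
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (threshold, score)
    return best

# Hypothetical toy data: one numeric feature, two classes.
ages = [22, 25, 31, 47, 52, 60]
outcome = ["no", "no", "no", "yes", "yes", "yes"]
threshold, impurity = best_split(ages, outcome)
```

With this toy data the classes separate cleanly, so the chosen rule sits midway between 31 and 47 and both child nodes are pure.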

Total Score: 88 / 100

Core Capability

90 / 100

Functional Suitability

11 / 12

Reliability

11 / 12

Performance & Context

8 / 8

Agent Usability

15 / 16

Human Usability

7 / 8

Security

9 / 12

Maintainability

12 / 12

Agent-Specific

17 / 20

Medical Task

20 / 20 Passed

Sample1 classification run: 91 / 100

4/4

Auto-detected regression run: 87 / 100

4/4

High-dimensional TXT classification: 93 / 100

4/4

Excluded-column TXT export: 90 / 100

4/4

Missing-column validation failure: 68 / 100

4/4

Veto Gates: required pass for any deployment consideration

Skill Veto: ✓ All 4 gates passed

Gate	Result	Detail
Operational Stability	PASS	System remains stable across varied inputs and edge cases.
Structural Consistency	PASS	Output structure conforms to expected skill contract format.
Result Determinism	PASS	Equivalent inputs produce semantically equivalent outputs.
System Security	PASS	No prompt injection, data leakage, or unsafe tool use detected.

Research Veto: ✅ PASS (Applicable)

Dimension	Result	Detail
Scientific Integrity	PASS	Outputs were computed from local data and no fabricated analytical claims were observed.
Practice Boundaries	PASS	The skill stays within data-analysis scope and does not produce diagnostic or prescriptive medical advice.
Methodological Ground	PASS	The workflow applies documented classification and regression paths consistently, with no principled methodological failure observed in the audit runs.
Code Usability	PASS	The R entrypoint executed successfully across all valid test cases and failed cleanly on the intentional validation case.

Core Capability: 90 / 100 (8 Categories)

Functional Suitability

Coverage is strong across classification, regression, and auto detection, but the validation section does not include a regression example.

11 / 12

92%

Reliability

Validation and error codes are strong, though recovery guidance could say more about privacy-sensitive datasets and poor-model-fit follow-up.

11 / 12

92%

Performance & Context

No issues flagged.

8 / 8

100%

Agent Usability

The command, arguments, outputs, and examples are clear, but explicit out-of-scope guidance is limited.

15 / 16

94%

Human Usability

Trigger language is natural, but the workflow remains fairly strict about CLI-shaped inputs and documented file formats.

7 / 8

88%

Security

Input validation is solid and there is no raw-code execution path, but the documentation does not warn that row identifiers may be copied into output files.

9 / 12

75%

Maintainability

No issues flagged.

12 / 12

100%

Agent-Specific

Triggering is precise and the workflow is idempotent, but escape hatches for unsupported or sensitive-data cases can be more explicit.

17 / 20

85%

Core Capability Total: 90 / 100

Medical Task: Execution Average 85.8 / 100, Assertions 20/20 Passed

Canonical

Sample1 classification run

4/4 ✓

Variant A

Auto-detected regression run

4/4 ✓

Edge

High-dimensional TXT classification

4/4 ✓

Variant B

Excluded-column TXT export

4/4 ✓

Stress

Missing-column validation failure

4/4 ⚠

Canonical: ✅ Pass

Sample1 classification run

Executed exactly as documented and produced all primary artifacts plus session info.

Basic 38/40 | Specialized 53/60 | Total 91/100

✅ A1: Output directory contains feature importance, metrics, and figure artifacts.

✅ A2: The run resolves to classification mode.

✅ A3: Predictions preserve row identifiers when the first column is promoted to row names.

✅ A4: The skill completes without surfacing a raw stack trace.

Pass rate: 4 / 4
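
Assertion A3 is easier to picture with a small example. The sketch below assumes a CSV whose first column holds sample identifiers, promotes that column to row ids, and keeps each prediction keyed by its id. The column names, the data, and the one-rule `predict` stand-in are all hypothetical; the skill's R entrypoint is not reproduced here.

```python
import csv
import io

# Hypothetical input: the first column holds sample identifiers.
raw = "sample_id,age,marker,label\nS01,34,0.2,A\nS02,51,0.9,B\nS03,47,0.7,B\n"

rows = list(csv.reader(io.StringIO(raw)))
header, body = rows[0], rows[1:]

# Promote the first column to row identifiers and keep the rest as data.
row_ids = [r[0] for r in body]
records = [dict(zip(header[1:], r[1:])) for r in body]

def predict(record):
    """Stand-in for a fitted tree: a single hard-coded split rule."""
    return "B" if float(record["marker"]) > 0.5 else "A"

# Predictions stay keyed by the original identifier, in input order.
predictions = [(rid, predict(rec)) for rid, rec in zip(row_ids, records)]
```

Because the identifiers travel with the predictions, they also end up in any exported prediction file, which is the behaviour the Security category notes is not yet warned about in the documentation.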

Variant A: ✅ Pass

Auto-detected regression run

Auto mode correctly switched to regression and exported the documented regression metrics.

Basic 37/40 | Specialized 50/60 | Total 87/100

✅ A1: Auto mode resolves a numeric target with many unique values to regression.

✅ A2: Regression metrics are emitted with the documented fields.

✅ A3: The feature-importance table still includes zero-importance predictors.

✅ A4: The run remains reproducible under the documented seed behavior.

Pass rate: 4 / 4
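
The auto-detection behaviour in A1 can be sketched as a simple heuristic: a target that parses as numeric and has many distinct values is treated as regression, anything else as classification. The cutoff `max_classes=10` and the function name are assumptions made for illustration; the report does not state the rule the skill actually uses.

```python
def infer_task(target_values, max_classes=10):
    """Guess 'classification' or 'regression' from the target column.

    max_classes is a hypothetical cutoff, not the skill's documented value.
    """
    try:
        numeric = [float(v) for v in target_values]
    except (TypeError, ValueError):
        return "classification"  # non-numeric labels are treated as classes
    if len(set(numeric)) > max_classes:
        return "regression"
    return "classification"
```

For example, `infer_task(["a", "b", "a"])` and `infer_task([0, 1, 1, 0])` both resolve to classification, while a column of fifty distinct measurements resolves to regression.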

Edge: ✅ Pass

High-dimensional TXT classification

The TXT path handled 200 columns, warned about constant predictors, and still finished cleanly.

Basic 39/40 | Specialized 54/60 | Total 93/100

✅ A1: TXT input is parsed successfully.

✅ A2: Constant predictors are handled gracefully.

✅ A3: High-dimensional input still produces the documented output set.

✅ A4: The run stays within the declared scope of feature-importance analysis.

Pass rate: 4 / 4
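
One way to handle constant predictors gracefully is to detect them up front and warn rather than fail. The sketch below does only that; whether the skill drops such columns or keeps them with zero importance is not stated in this report, and the table layout and names are hypothetical.

```python
import warnings

def find_constant_columns(table):
    """Return names of columns whose values are all identical.

    table: dict mapping column name -> list of values (hypothetical layout).
    """
    constant = [name for name, values in table.items() if len(set(values)) <= 1]
    if constant:
        warnings.warn(
            f"{len(constant)} constant predictor(s) carry no split information: "
            + ", ".join(constant)
        )
    return constant

features = {
    "age": [34, 51, 47, 29],
    "batch": ["B1", "B1", "B1", "B1"],  # same value in every row
    "marker": [0.2, 0.9, 0.7, 0.4],
}

# Capture the warning so the example runs quietly and can be inspected.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    constant = find_constant_columns(features)
```

A constant column can never produce a useful split, so warning and continuing loses nothing while keeping the run alive on wide inputs such as the 200-column case here.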

Variant B: ✅ Pass

Excluded-column TXT export

Column exclusion and TXT table export both behaved as documented.

Basic 38/40 | Specialized 52/60 | Total 90/100

✅ A1: Requested excluded columns are removed without breaking the run.

✅ A2: TXT table output is honored.

✅ A3: Metrics and predictions are still exported when TXT table mode is used.

✅ A4: The run avoids redundant or surprising outputs.

Pass rate: 4 / 4
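
Column exclusion combined with TXT export might look like the sketch below. It assumes that "TXT" means tab-delimited text, which the report does not confirm, and the `export_table` helper and its parameters are hypothetical.

```python
import csv
import io

def export_table(rows, fieldnames, exclude=(), delimiter="\t"):
    """Write rows (a list of dicts) as delimited text, skipping excluded columns."""
    excluded = set(exclude)
    kept = [name for name in fieldnames if name not in excluded]
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=kept,
        delimiter=delimiter,
        extrasaction="ignore",  # silently skip keys that were excluded
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()

rows = [
    {"id": "S01", "age": 34, "site": "X"},
    {"id": "S02", "age": 51, "site": "Y"},
]
text = export_table(rows, ["id", "age", "site"], exclude=["site"])
```

Filtering the field list once, before writing, keeps the header and every data row consistent, so an excluded column cannot leak into part of the output.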

Stress: ⚠️ Warning

Missing-column validation failure

The command terminated early on a deliberate bad target column, but the validation message was precise and no partial outputs were created.

Basic 30/40 | Specialized 38/60 | Total 68/100

✅ A1: A missing target column is rejected with the documented error family.

✅ A2: The failure path is clear and free of raw stack traces.

✅ A3: The skill stops before creating misleading result artifacts.

✅ A4: The run is recoverable by correcting the column name and retrying.

Pass rate: 4 / 4
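
The fail-early pattern behind A1 to A4 can be sketched as follows: validate the target column before touching the filesystem, so a bad name produces a precise message and no partial outputs. `ValidationError`, `run_analysis`, and the message wording are hypothetical stand-ins, not the skill's actual error family.

```python
import os
import tempfile

class ValidationError(Exception):
    """Hypothetical error type standing in for the skill's documented error family."""

def run_analysis(columns, target, out_dir):
    # Validate first, so a bad target never leaves a half-written output directory.
    if target not in columns:
        raise ValidationError(
            f"Target column '{target}' not found. "
            f"Available columns: {', '.join(columns)}"
        )
    os.makedirs(out_dir, exist_ok=True)
    # ... model fitting and artifact export would follow here ...
    return out_dir

out_dir = os.path.join(tempfile.mkdtemp(), "results")
try:
    # Deliberate typo in the target name, mirroring the stress case.
    run_analysis(["age", "marker", "label"], target="lable", out_dir=out_dir)
    message = None
except ValidationError as err:
    message = str(err)
```

Retrying with the corrected name (`target="label"`) succeeds against the same output path, which matches the recoverability asserted in A4.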

Medical Task Total: 85.8 / 100
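
The execution average can be checked directly from the five per-case totals. The second calculation is only an assumption: the report does not say how the overall Total Score is derived, but an unweighted mean of the two section totals is consistent with the published figure.

```python
from statistics import mean

# Per-case totals as reported in the Medical Task section.
case_totals = {
    "Canonical": 91,
    "Variant A": 87,
    "Edge": 93,
    "Variant B": 90,
    "Stress": 68,
}

execution_average = round(mean(list(case_totals.values())), 1)

# Assumption: the overall score is the unweighted mean of the two section totals.
core_capability = 90
overall_guess = round(mean([core_capability, execution_average]))
```

Here `execution_average` evaluates to 85.8 and `overall_guess` to 88, matching the reported Medical Task total and Total Score.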

Key Strengths

The skill has a clear, script-backed contract with consistent arguments, outputs, and error codes.
Validation is strong: missing inputs, bad parameters, missing columns, and insufficient data are handled explicitly.
The implementation is reproducible and largely deterministic because train/test splitting is seed-controlled.
The workflow remains usable across CSV and TXT inputs, including a high-dimensional feature set.
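
The seed-controlled splitting mentioned above can be illustrated with a minimal sketch. The function name, the 30% test fraction, and the default seed are assumptions for the example; the skill's own split parameters are not given in this report.

```python
import random

def train_test_split_ids(row_ids, test_fraction=0.3, seed=42):
    """Shuffle row ids with a private RNG so the same seed gives the same split."""
    rng = random.Random(seed)  # local RNG: leaves the global random state alone
    shuffled = list(row_ids)
    rng.shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    # Returns (train_ids, test_ids).
    return shuffled[n_test:], shuffled[:n_test]

first = train_test_split_ids(range(20))
second = train_test_split_ids(range(20))
```

Calling the function twice with the same seed yields identical train and test sets, which is what makes downstream metrics repeatable. Tree fitting itself can still introduce ties, which may be why the report says "largely" deterministic.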