Data Analysis

decision-tree-analysis

Build and visualize classification or regression decision trees (rpart/sklearn DecisionTreeClassifier). Inputs: feature matrix, labels. Outputs: tree diagram, node split rules, feature importance scores, cross-validation accuracy.
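As a rough illustration of the outputs listed above, the following sketch uses scikit-learn's `DecisionTreeClassifier` on the bundled iris dataset. It is not the audited skill's code (the audit runs reported here exercise an R entrypoint); the dataset, `max_depth=3`, the seed, and the 5-fold setting are all assumptions chosen for the example.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier, export_text

# Assumption: iris stands in for the user's feature matrix and labels.
data = load_iris()
X, y = data.data, data.target
feature_names = list(data.feature_names)

# Assumption: depth and seed are illustrative, not the skill's defaults.
tree = DecisionTreeClassifier(max_depth=3, random_state=0)
tree.fit(X, y)

# Node split rules as indented text (a text stand-in for the tree diagram).
rules = export_text(tree, feature_names=feature_names)

# Feature importance scores, one per input column, including zeros.
importances = dict(zip(feature_names, tree.feature_importances_))

# 5-fold cross-validation accuracy on a fresh, unfitted copy of the model.
cv_scores = cross_val_score(
    DecisionTreeClassifier(max_depth=3, random_state=0), X, y, cv=5
)

print(rules)
for name, score in sorted(importances.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {score:.3f}")
print(f"CV accuracy: {cv_scores.mean():.3f} +/- {cv_scores.std():.3f}")
```

Cross-validation is run on a separate estimator so that the reported accuracy does not come from the tree that was fitted on all rows.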

Total Score: 88 / 100

Core Capability: 90 / 100
  • Functional Suitability: 11 / 12
  • Reliability: 11 / 12
  • Performance & Context: 8 / 8
  • Agent Usability: 15 / 16
  • Human Usability: 7 / 8
  • Security: 9 / 12
  • Maintainability: 12 / 12
  • Agent-Specific: 17 / 20

Medical Task: 20 / 20 Passed
  • Sample1 classification run: 91 (4/4)
  • Auto-detected regression run: 87 (4/4)
  • High-dimensional TXT classification: 93 (4/4)
  • Excluded-column TXT export: 90 (4/4)
  • Missing-column validation failure: 68 (4/4)

Veto Gates
Required pass for any deployment consideration

Skill Veto: ✓ All 4 gates passed
Operational Stability
System remains stable across varied inputs and edge cases
PASS
Structural Consistency
Output structure conforms to expected skill contract format
PASS
Result Determinism
Equivalent inputs produce semantically equivalent outputs
PASS
System Security
No prompt injection, data leakage, or unsafe tool use detected
PASS
Research Veto: ✅ PASS (Applicable)
Dimension | Result | Detail
Scientific Integrity | PASS | Outputs were computed from local data and no fabricated analytical claims were observed.
Practice Boundaries | PASS | The skill stays within data-analysis scope and does not produce diagnostic or prescriptive medical advice.
Methodological Ground | PASS | The workflow applies documented classification and regression paths consistently, with no principled methodological failure observed in the audit runs.
Code Usability | PASS | The R entrypoint executed successfully across all valid test cases and failed cleanly on the intentional validation case.

Core Capability: 90 / 100 (8 Categories)

Functional Suitability: 11 / 12 (92%)
Coverage is strong across classification, regression, and auto detection, but the validation section does not include a regression example.
Reliability: 11 / 12 (92%)
Validation and error codes are strong, though recovery guidance could say more about privacy-sensitive datasets and poor-model-fit follow-up.
Performance & Context: 8 / 8 (100%)
No issues flagged.
Agent Usability: 15 / 16 (94%)
The command, arguments, outputs, and examples are clear, but explicit out-of-scope guidance is limited.
Human Usability: 7 / 8 (88%)
Trigger language is natural, but the workflow remains fairly strict about CLI-shaped inputs and documented file formats.
Security: 9 / 12 (75%)
Input validation is solid and there is no raw-code execution path, but the documentation does not warn that row identifiers may be copied into output files.
Maintainability: 12 / 12 (100%)
No issues flagged.
Agent-Specific: 17 / 20 (85%)
Triggering is precise and the workflow is idempotent, but escape hatches for unsupported or sensitive-data cases can be more explicit.
Core Capability Total: 90 / 100
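The category percentages and the total can be re-derived from the earned and maximum points. The sketch below does that arithmetic; the half-up rounding rule is an assumption that happens to reproduce every percentage shown in this report, not something the report states.

```python
import math

# Category scores copied from the report: (earned, maximum).
categories = {
    "Functional Suitability": (11, 12),
    "Reliability": (11, 12),
    "Performance & Context": (8, 8),
    "Agent Usability": (15, 16),
    "Human Usability": (7, 8),
    "Security": (9, 12),
    "Maintainability": (12, 12),
    "Agent-Specific": (17, 20),
}


def percent_half_up(earned, maximum):
    # Assumption: percentages are rounded half up to a whole percent.
    return math.floor(100 * earned / maximum + 0.5)


total_earned = sum(earned for earned, _ in categories.values())
total_max = sum(maximum for _, maximum in categories.values())

for name, (earned, maximum) in categories.items():
    print(f"{name}: {earned} / {maximum} ({percent_half_up(earned, maximum)}%)")
print(f"Core Capability Total: {total_earned} / {total_max}")
```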

Medical Task
Execution Average: 85.8 / 100
Assertions: 20/20 Passed

Canonical: Sample1 classification run
✅ Pass, 91 / 100

Executed exactly as documented and produced all primary artifacts plus session info.

Basic 38/40 | Specialized 53/60 | Total 91/100
A1: Output directory contains feature importance, metrics, and figure artifacts.
A2: The run resolves to classification mode.
A3: Predictions preserve row identifiers when the first column is promoted to row names.
A4: The skill completes without surfacing a raw stack trace.
Pass rate: 4 / 4
Variant A: Auto-detected regression run
✅ Pass, 87 / 100

Auto mode correctly switched to regression and exported the documented regression metrics.

Basic 37/40 | Specialized 50/60 | Total 87/100
A1: Auto mode resolves a numeric target with many unique values to regression.
A2: Regression metrics are emitted with the documented fields.
A3: The feature-importance table still includes zero-importance predictors.
A4: The run remains reproducible under the documented seed behavior.
Pass rate: 4 / 4
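Assertion A1 describes auto mode choosing regression for a numeric target with many unique values. One common heuristic consistent with that description is sketched below. The function name `detect_task` and the `max_classes=10` cutoff are hypothetical; the report does not state the rule or threshold the skill actually uses.

```python
def detect_task(target_values, max_classes=10):
    """Guess 'classification' or 'regression' from raw target values.

    Assumption: max_classes is a hypothetical cutoff; the report does not
    state the threshold the skill actually uses.
    """
    values = [v for v in target_values if v is not None and str(v).strip() != ""]
    if not values:
        raise ValueError("target column has no non-missing values")

    def is_number(v):
        try:
            float(v)
            return True
        except (TypeError, ValueError):
            return False

    # Any non-numeric label forces classification.
    if not all(is_number(v) for v in values):
        return "classification"

    # Numeric target: few distinct values look like class codes,
    # many distinct values look like a continuous outcome.
    distinct = {float(v) for v in values}
    return "classification" if len(distinct) <= max_classes else "regression"


print(detect_task(["case", "control", "case"]))
print(detect_task([0, 1, 1, 0, 1]))
print(detect_task([x * 0.37 for x in range(50)]))
```

A cutoff of this kind misclassifies ordinal scores with many levels, which is one reason an explicit mode argument is usually offered alongside auto detection.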
Edge: High-dimensional TXT classification
✅ Pass, 93 / 100

The TXT path handled 200 columns, warned about constant predictors, and still finished cleanly.

Basic 39/40 | Specialized 54/60 | Total 93/100
A1: TXT input is parsed successfully.
A2: Constant predictors are handled gracefully.
A3: High-dimensional input still produces the documented output set.
A4: The run stays within the declared scope of feature-importance analysis.
Pass rate: 4 / 4
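The constant-predictor warning in this case can be pictured with a small check like the one below. It is a sketch only: the helper name `find_constant_predictors` and the definition of "constant" (at most one distinct non-missing value) are assumptions, and the report does not say whether the skill drops such columns or merely flags them.

```python
import warnings


def find_constant_predictors(columns):
    """Return names of predictors that cannot contribute a split.

    Assumption: 'constant' means at most one distinct non-missing value;
    the skill's exact rule is not given in the report.
    """
    constant = [
        name
        for name, values in columns.items()
        if len({v for v in values if v is not None}) <= 1
    ]
    if constant:
        # Warn and continue rather than abort the run.
        warnings.warn(
            f"{len(constant)} constant predictor(s) found: {', '.join(constant)}"
        )
    return constant


features = {
    "age": [34, 51, 47],
    "site": ["A", "A", "A"],
    "bmi": [22.1, 30.4, 27.8],
}
constant = find_constant_predictors(features)
print(constant)
```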
Variant B: Excluded-column TXT export
✅ Pass, 90 / 100

Column exclusion and TXT table export both behaved as documented.

Basic 38/40 | Specialized 52/60 | Total 90/100
A1: Requested excluded columns are removed without breaking the run.
A2: TXT table output is honored.
A3: Metrics and predictions are still exported when TXT table mode is used.
A4: The run avoids redundant or surprising outputs.
Pass rate: 4 / 4
Stress: Missing-column validation failure
⚠️ Warning, 68 / 100

The command terminated early on a deliberately invalid target column, but the validation message was precise and no partial outputs were created.

Basic 30/40 | Specialized 38/60 | Total 68/100
A1: A missing target column is rejected with the documented error family.
A2: The failure path is clear and free of raw stack traces.
A3: The skill stops before creating misleading result artifacts.
A4: The run is recoverable by correcting the column name and retrying.
Pass rate: 4 / 4
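The behavior checked in this case (reject a bad target column, show a clean message, write nothing, allow a retry) follows a validate-before-write pattern. A minimal sketch is given below; the `ValidationError` class, the `E_MISSING_COLUMN` code, and the `run` function are hypothetical names for illustration and are not taken from the skill.

```python
import csv
import io


class ValidationError(Exception):
    """Input problem that the user can fix and retry."""

    def __init__(self, code, message):
        super().__init__(f"[{code}] {message}")
        self.code = code


def validate_target(header, target):
    # Assumption: "E_MISSING_COLUMN" is a hypothetical code, not the skill's.
    if target not in header:
        raise ValidationError(
            "E_MISSING_COLUMN",
            f"target column '{target}' not found; available: {', '.join(header)}",
        )


def run(csv_text, target, outputs):
    """Validate first; only touch `outputs` once validation has passed."""
    header = next(csv.reader(io.StringIO(csv_text)))
    validate_target(header, target)
    outputs["metrics"] = "placeholder for real artifacts"
    return outputs


table = "id,age,outcome\n1,34,0\n"

outputs = {}
message = None
try:
    run(table, target="outcom", outputs=outputs)  # deliberate typo
except ValidationError as err:
    message = str(err)  # one-line message instead of a stack trace
print(message)
print(outputs)  # still empty: no partial artifacts were created

# Correcting the column name and retrying succeeds.
fixed = run(table, target="outcome", outputs={})
print(sorted(fixed))
```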
Medical Task Total: 85.8 / 100
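The 85.8 figure matches an unweighted mean of the five per-case scores, as the short check below shows. Treating it as an unweighted mean is an assumption; the report does not describe how the average is computed.

```python
from statistics import mean

# Per-case execution scores copied from the report.
scores = {
    "Canonical": 91,
    "Variant A": 87,
    "Edge": 93,
    "Variant B": 90,
    "Stress": 68,
}

# Assumption: the execution average is an unweighted mean of the five cases.
average = mean(scores.values())
print(f"Execution average: {average:.1f} / 100")
```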

Key Strengths

  • The skill has a clear, script-backed contract with consistent arguments, outputs, and error codes.
  • Validation is strong: missing inputs, bad parameters, missing columns, and insufficient data are handled explicitly.
  • The implementation is reproducible and largely deterministic because train/test splitting is seed-controlled.
  • The workflow remains usable across CSV and TXT inputs, including a high-dimensional feature set.
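The reproducibility point above rests on seed-controlled splitting. The sketch below shows the general idea with the standard library only; the function name `train_test_indices`, the 20% test fraction, and the seed values are illustrative assumptions, since the report says only that the split is seed-controlled.

```python
import random


def train_test_indices(n_rows, test_fraction=0.2, seed=42):
    """Return (train, test) row indices from a seeded shuffle.

    Assumption: the fraction and seed defaults are illustrative; the report
    only says the skill's split is seed-controlled.
    """
    rng = random.Random(seed)  # local generator, no global state touched
    indices = list(range(n_rows))
    rng.shuffle(indices)
    n_test = max(1, round(n_rows * test_fraction))
    return sorted(indices[n_test:]), sorted(indices[:n_test])


first = train_test_indices(100)
second = train_test_indices(100)
other = train_test_indices(100, seed=7)

print(first == second)  # compares two runs that share a seed
print(first == other)   # compares runs with different seeds
```

Using a local `random.Random` instance keeps the split independent of any other code that draws from the global generator, which is what makes repeated runs comparable.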