Data Analysis
decision-tree-analysis
Build and visualize classification or regression decision trees (rpart in R, or scikit-learn's DecisionTreeClassifier/DecisionTreeRegressor). Inputs: feature matrix, labels. Outputs: tree diagram, node split rules, feature importance scores, cross-validation accuracy.
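The node split rules come from choosing, at each node, the threshold that most reduces impurity. rpart (for classification) and scikit-learn's DecisionTreeClassifier both use the Gini index by default. The Python sketch below illustrates that criterion for a single numeric feature. It is a generic CART-style illustration, not the skill's R implementation, and the function names are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 - sum(p_k^2) over the class proportions p_k."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def best_threshold(values, labels):
    """Return (threshold, weighted_impurity) for the best 'x <= threshold' split.

    Hypothetical helper: scans every midpoint between distinct sorted values.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = (None, gini(labels))  # baseline: no split, impurity of the parent node
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot place a threshold between identical values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        weighted = (len(left) * gini(left) + len(right) * gini(right)) / n
        if weighted < best[1]:
            threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
            best = (threshold, weighted)
    return best
```

For values 1, 2, 3, 10, 11, 12 labelled a, a, a, b, b, b, the best rule is x <= 6.5, which leaves both child nodes pure (weighted impurity 0.0).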
88 / 100 Total Score
Core Capability
90 / 100
Functional Suitability
11 / 12
Reliability
11 / 12
Performance & Context
8 / 8
Agent Usability
15 / 16
Human Usability
7 / 8
Security
9 / 12
Maintainability
12 / 12
Agent-Specific
17 / 20
Medical Task
20 / 20 Passed
91 Sample1 classification run
4/4
87 Auto-detected regression run
4/4
93 High-dimensional TXT classification
4/4
90 Excluded-column TXT export
4/4
68 Missing-column validation failure
4/4
Veto Gates: required pass for any deployment consideration
Skill Veto: ✓ All 4 gates passed
Operational Stability
System remains stable across varied inputs and edge cases
PASS ✓
Structural Consistency
Output structure conforms to expected skill contract format
PASS ✓
Result Determinism
Equivalent inputs produce semantically equivalent outputs
PASS ✓
System Security
No prompt injection, data leakage, or unsafe tool use detected
PASS ✓
Research Veto: ✅ PASS (applicable)
| Dimension | Result | Detail |
|---|---|---|
| Scientific Integrity | PASS | Outputs were computed from local data and no fabricated analytical claims were observed. |
| Practice Boundaries | PASS | The skill stays within data-analysis scope and does not produce diagnostic or prescriptive medical advice. |
| Methodological Ground | PASS | The workflow applies the documented classification and regression paths consistently; no fundamental methodological failure was observed in the audit runs. |
| Code Usability | PASS | The R entrypoint executed successfully across all valid test cases and failed cleanly on the intentional validation case. |
Core Capability: 90 / 100 (8 categories)
Functional Suitability
Coverage is strong across classification, regression, and auto detection, but the validation section does not include a regression example.
11 / 12
92%
Reliability
Validation and error codes are strong, though recovery guidance could say more about handling privacy-sensitive datasets and about follow-up steps when model fit is poor.
11 / 12
92%
Performance & Context
No issues flagged.
8 / 8
100%
Agent Usability
The command, arguments, outputs, and examples are clear, but explicit out-of-scope guidance is limited.
15 / 16
94%
Human Usability
Trigger language is natural, but the workflow remains fairly strict about CLI-shaped inputs and documented file formats.
7 / 8
88%
Security
Input validation is solid and there is no raw-code execution path, but the documentation does not warn that row identifiers may be copied into output files.
9 / 12
75%
Maintainability
No issues flagged.
12 / 12
100%
Agent-Specific
Triggering is precise and the workflow is idempotent, but escape hatches for unsupported or sensitive-data cases could be more explicit.
17 / 20
85%
Core Capability Total: 90 / 100
Medical Task: Execution Average 85.8 / 100; Assertions: 20/20 Passed
91
Canonical ✅ Pass
Sample1 classification run
Executed exactly as documented and produced all primary artifacts plus session info.
Basic 38/40 | Specialized 53/60 | Total 91/100
✅ A1: Output directory contains feature importance, metrics, and figure artifacts.
✅ A2: The run resolves to classification mode.
✅ A3: Predictions preserve row identifiers when the first column is promoted to row names.
✅ A4: The skill completes without surfacing a raw stack trace.
Pass rate: 4 / 4
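Assertion A3 depends on carrying the first column through as an identifier rather than as a predictor, comparable to R's `read.csv(..., row.names = 1)`. The following is a minimal Python sketch under that assumption, with hypothetical helper names. It writes the identifiers verbatim into the prediction table, which is the behavior the Security note says should be documented for sensitive datasets.

```python
import csv
import io


def read_with_row_ids(text):
    """Parse CSV text, treating the first column as row identifiers (not a predictor)."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    ids, rows = [], []
    for record in reader:
        ids.append(record[0])
        rows.append(dict(zip(header[1:], record[1:])))
    return ids, rows


def write_predictions(ids, predictions):
    """Emit an id,prediction table so each prediction stays attached to its row.

    Note: identifiers are copied as-is into the output (hypothetical layout).
    """
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["id", "prediction"])
    for row_id, pred in zip(ids, predictions):
        writer.writerow([row_id, pred])
    return out.getvalue()
```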
87
Variant A ✅ Pass
Auto-detected regression run
Auto mode correctly switched to regression and exported the documented regression metrics.
Basic 37/40 | Specialized 50/60 | Total 87/100
✅ A1: Auto mode resolves a numeric target with many unique values to regression.
✅ A2: Regression metrics are emitted with the documented fields.
✅ A3: The feature-importance table still includes zero-importance predictors.
✅ A4: The run remains reproducible under the documented seed behavior.
Pass rate: 4 / 4
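The exact auto-mode rule is not stated in this report. The sketch below assumes a hypothetical cutoff of 10 distinct values (`max_classes`) and shows one way to resolve the task type. It also shows an importance table that keeps predictors the tree never used at 0 rather than dropping them, which is what A3 checks.

```python
def detect_task(target, max_classes=10):
    """Hypothetical rule: numeric target with more than max_classes distinct values -> regression."""
    try:
        numeric = [float(v) for v in target]
    except (TypeError, ValueError):
        return "classification"  # non-numeric labels can only be classes
    return "regression" if len(set(numeric)) > max_classes else "classification"


def importance_table(raw, features):
    """Normalize raw importances to sum to 1, listing every feature.

    Features absent from `raw` (never used in a split) are reported as 0.0
    instead of being omitted.
    """
    total = sum(raw.get(f, 0.0) for f in features)
    return [(f, raw.get(f, 0.0) / total if total > 0 else 0.0) for f in features]
```

For example, raw importances {"a": 3.0, "b": 1.0} over features a, b, c yield a: 0.75, b: 0.25, c: 0.0.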
93
Edge ✅ Pass
High-dimensional TXT classification
The TXT path handled 200 columns, warned about constant predictors, and still finished cleanly.
Basic 39/40 | Specialized 54/60 | Total 93/100
✅ A1: TXT input is parsed successfully.
✅ A2: Constant predictors are handled gracefully.
✅ A3: High-dimensional input still produces the documented output set.
✅ A4: The run stays within the declared scope of feature-importance analysis.
Pass rate: 4 / 4
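A constant predictor can never produce a split, so it can be set aside with a warning instead of failing the run. This report does not document how the skill handles such columns internally. The Python sketch below, with a hypothetical helper name, shows one plausible approach.

```python
import warnings


def drop_constant_columns(rows, columns):
    """Return (kept, dropped): predictors with a single distinct value are dropped with a warning."""
    constant = [c for c in columns if len({row[c] for row in rows}) <= 1]
    if constant:
        warnings.warn("dropping constant predictors: " + ", ".join(constant))
    kept = [c for c in columns if c not in constant]
    return kept, constant
```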
90
Variant B ✅ Pass
Excluded-column TXT export
Column exclusion and TXT table export both behaved as documented.
Basic 38/40 | Specialized 52/60 | Total 90/100
✅ A1: Requested excluded columns are removed without breaking the run.
✅ A2: TXT table output is honored.
✅ A3: Metrics and predictions are still exported when TXT table mode is used.
✅ A4: The run avoids redundant or surprising outputs.
Pass rate: 4 / 4
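Column exclusion and TXT export amount to filtering the predictor list and writing delimited tables. The sketch below assumes that TXT tables are tab-delimited and uses hypothetical function names. The skill's actual file layout may differ.

```python
import csv
import io


def select_features(header, target, exclude=()):
    """Predictors = all columns except the target and any excluded names."""
    excluded = set(exclude)
    return [c for c in header if c != target and c not in excluded]


def to_txt_table(rows, columns):
    """Write rows as a tab-delimited table (assumed TXT format)."""
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(columns)
    for row in rows:
        writer.writerow([row[c] for c in columns])
    return out.getvalue()
```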
68
Stress ⚠️ Warning
Missing-column validation failure
The command terminated early on a deliberately invalid target column, but the validation message was precise and no partial outputs were created.
Basic 30/40 | Specialized 38/60 | Total 68/100
✅ A1: A missing target column is rejected with the documented error family.
✅ A2: The failure path is clear and free of raw stack traces.
✅ A3: The skill stops before creating misleading result artifacts.
✅ A4: The run is recoverable by correcting the column name and retrying.
Pass rate: 4 / 4
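Taken together, A1 and A3 imply an ordering: the target column is validated before any output directory is created. The sketch below illustrates that ordering. `InputError` is a hypothetical stand-in for the skill's documented error family, and the message format is assumed.

```python
import os


class InputError(ValueError):
    """Hypothetical stand-in for the skill's documented input-error family."""


def validate_target(header, target):
    """Return the target's column index, or raise InputError listing valid names."""
    if target not in header:
        available = ", ".join(header)
        raise InputError(f"target column '{target}' not found; available columns: {available}")
    return header.index(target)


def run(header, target, outdir):
    """Validate first; create the output directory only once inputs are known good."""
    validate_target(header, target)
    os.makedirs(outdir, exist_ok=True)
    # ... model fitting and artifact writing would follow here ...
    return outdir
```

Listing the available columns in the message is what makes the failure recoverable (A4): the user can see the correct name and retry.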
Medical Task Total: 85.8 / 100
Key Strengths
- The skill has a clear, script-backed contract with consistent arguments, outputs, and error codes.
- Validation is strong: missing inputs, bad parameters, missing columns, and insufficient data are handled explicitly.
- The implementation is reproducible and largely deterministic because train/test splitting is seed-controlled.
- The workflow remains usable across CSV and TXT inputs, including a high-dimensional feature set.
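The determinism claim rests on seed-controlled train/test splitting. The minimal sketch below assumes a 30% test fraction and a default seed of 42, both hypothetical rather than the skill's documented defaults. It uses a private `random.Random` instance so the split does not depend on global RNG state.

```python
import random


def train_test_split(n_rows, test_fraction=0.3, seed=42):
    """Return (train_indices, test_indices); same seed -> identical split.

    test_fraction=0.3 and seed=42 are hypothetical defaults for illustration.
    """
    rng = random.Random(seed)  # private RNG: unaffected by other random calls
    indices = list(range(n_rows))
    rng.shuffle(indices)
    n_test = round(n_rows * test_fraction)
    return sorted(indices[n_test:]), sorted(indices[:n_test])
```

Calling it twice with the same seed yields identical index lists, which is the property the Result Determinism gate relies on.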