lightgbm-analysis
Train and evaluate LightGBM gradient boosting models for classification or regression with hyperparameter tuning. Inputs: feature matrix, labels. Outputs: trained model, feature importance ranking, SHAP summary plot, ROC or RMSE performance curves.
Veto Gates
Required pass for any deployment consideration
| Dimension | Result | Detail |
|---|---|---|
| Scientific Integrity | PASS | No fabricated metrics, references, or claims were observed; all reported values came from generated artifacts. |
| Practice Boundaries | PASS | The skill stays inside model-training and artifact-export scope and does not make direct medical or prescriptive claims. |
| Methodological Ground | PASS | Weak-model cases were flagged as caution-only with rerun guidance rather than being presented as valid interpretation-ready findings. |
| Code Usability | PASS | The R entrypoint ran successfully across binary, regression, TXT-input, and overwrite-guardrail scenarios. |
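The caution-only downgrade described under Methodological Ground can be illustrated with a minimal sketch. This is not the skill's R implementation: the function name `assess_model_quality`, the `min_minority_share` threshold, the `rerun_hint` wording, and the field values are all assumptions made for illustration. Only the field names `model_quality_flag` and `interpretation_status` come from the report.

```python
from collections import Counter


def assess_model_quality(y_true, y_pred, min_minority_share=0.05):
    """Flag a classifier whose predictions collapse toward one class.

    Hypothetical helper: the threshold and the returned values are
    illustrative assumptions, not the skill's actual rules.
    """
    if not y_pred or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and of equal length")

    counts = Counter(y_pred)
    # Consider every class seen in either the labels or the predictions,
    # so a class that is never predicted counts as zero.
    classes = set(y_true) | set(y_pred)
    minority_share = min(counts.get(c, 0) for c in classes) / len(y_pred)
    collapsed = minority_share < min_minority_share

    return {
        "model_quality_flag": "weak" if collapsed else "ok",
        "interpretation_status": "caution_only" if collapsed else "eligible",
        # Assumed wording; the real remediation text is not shown in the report.
        "rerun_hint": "rebalance classes or retune before interpreting" if collapsed else "",
    }


if __name__ == "__main__":
    # Every prediction is class 0, so the model is flagged as caution-only.
    print(assess_model_quality([0, 1, 0, 1], [0, 0, 0, 0]))
```

A check of this kind would let a run still export all artifacts while marking feature importance and SHAP output as not ready for interpretation.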
Core Capability: 90 / 100 (8 categories)
Medical Task Execution Average: 83 / 100. Assertions: 20/20 passed.
Run completed and exported all artifacts, but the model collapsed to one class and correctly downgraded itself to caution-only.
Regression workflow executed cleanly and exported an interpretation-eligible result set.
The tab-delimited TXT path worked correctly, and a repeated run produced identical metrics and importance tables.
The split-importance workflow completed, but the bundled example again degraded to a caution-only binary result.
The first run succeeded, and the second run correctly stopped with SKILL_OUTPUT_EXISTS instead of overwriting a populated directory.
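The overwrite guardrail in the last scenario can be sketched as follows. This is an illustration in Python, not the skill's R code: the exception class `SkillOutputExists` and the function `prepare_output_dir` are hypothetical names, and the message format is assumed. Only the `SKILL_OUTPUT_EXISTS` code itself appears in the report.

```python
import os
import tempfile


class SkillOutputExists(Exception):
    """Raised when the target output directory already contains files."""


def prepare_output_dir(path):
    """Create the output directory, refusing to reuse a populated one.

    Hypothetical helper; the message layout is an assumption.
    """
    if os.path.isdir(path) and os.listdir(path):
        raise SkillOutputExists(f"SKILL_OUTPUT_EXISTS: {path} is not empty")
    os.makedirs(path, exist_ok=True)
    return path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        out_dir = os.path.join(root, "run1")

        # First run: the directory does not exist yet, so it is created.
        prepare_output_dir(out_dir)
        with open(os.path.join(out_dir, "metrics.csv"), "w") as fh:
            fh.write("metric,value\n")

        # Second run: the directory now holds an artifact, so the call stops.
        try:
            prepare_output_dir(out_dir)
        except SkillOutputExists as err:
            print(err)
```

Failing early, before any training starts, keeps a repeated invocation from silently mixing artifacts from two runs.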
Key Strengths
- The CLI contract is explicit, with strong parameter validation and clear SKILL_* failure messages.
- The implementation is reproducible: the workflow seeds its sampling and produced identical outputs on a repeated TXT smoke test.
- Weak-model cases are handled responsibly through model_quality_flag, interpretation_status, remediation tables, and rerun hints.
- The skill ships a genuinely runnable R workflow with bundled test data, references, and structured output artifacts.
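The reproducibility point above (seeded sampling giving identical outputs on a repeated run) can be illustrated with a small sketch. The function `seeded_split`, its default seed, and the 80/20 split are assumptions for illustration only. The report does not state how the R workflow seeds or partitions its data.

```python
import random


def seeded_split(n_rows, test_fraction=0.2, seed=42):
    """Return (train_indices, test_indices) from a seeded shuffle.

    Hypothetical helper: a local Random instance keeps the split
    independent of any other random state in the process.
    """
    rng = random.Random(seed)
    indices = list(range(n_rows))
    rng.shuffle(indices)
    n_test = int(round(n_rows * test_fraction))
    return sorted(indices[n_test:]), sorted(indices[:n_test])


if __name__ == "__main__":
    first = seeded_split(100)
    second = seeded_split(100)
    # Same seed, same partition: downstream metrics can be compared run to run.
    print(first == second)
```

With the partition fixed by the seed, any difference in metrics between two runs points to a change in data or parameters rather than to sampling noise.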