Data Analysis

time-dependent-roc

Compute time-dependent AUC and ROC curves at multiple clinical time points for survival model evaluation. Inputs: survival time, event status, continuous biomarker or risk score. Outputs: time-AUC line plot, AUC table at 1/3/5 years, iAUC summary.
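
To make the evaluated quantity concrete, here is a minimal Python sketch of a cumulative/dynamic AUC at a single time horizon. It is not the skill's R implementation: the function name `naive_auc_at` and the toy data are hypothetical, and the estimator is deliberately naive. It drops subjects censored before the horizon instead of applying censoring weights, so it only approximates what a weighted estimator would report.

```python
from typing import Optional, Sequence


def naive_auc_at(
    times: Sequence[float],
    events: Sequence[int],
    marker: Sequence[float],
    horizon: float,
) -> Optional[float]:
    """Naive cumulative/dynamic AUC at `horizon` (no censoring weights).

    Cases: subjects with an observed event at or before the horizon.
    Controls: subjects still under observation after the horizon.
    Subjects censored before the horizon are excluded (a simplification).
    """
    cases = [m for t, e, m in zip(times, events, marker) if t <= horizon and e == 1]
    controls = [m for t, m in zip(times, marker) if t > horizon]
    if not cases or not controls:
        return None  # AUC is undefined without both groups

    concordant = 0.0
    for case_value in cases:
        for control_value in controls:
            if case_value > control_value:
                concordant += 1.0
            elif case_value == control_value:
                concordant += 0.5  # ties count as half
    return concordant / (len(cases) * len(controls))


# Toy data (hypothetical): follow-up in years, 1 = event, 0 = censored.
times = [1, 2, 3, 4, 5, 6]
events = [1, 1, 0, 1, 0, 0]
risk_score = [0.9, 0.2, 0.3, 0.7, 0.8, 0.1]
auc_by_year = {h: naive_auc_at(times, events, risk_score, h) for h in (1.5, 3.5, 4.5)}
```

Evaluating the same function at several horizons, as in `auc_by_year`, gives the kind of per-time-point table the skill describes for 1, 3, and 5 years.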

Total Score: 87 / 100

Core Capability: 88 / 100
Functional Suitability | 11 / 12
Reliability | 10 / 12
Performance & Context | 8 / 8
Agent Usability | 14 / 16
Human Usability | 7 / 8
Security | 10 / 12
Maintainability | 11 / 12
Agent-Specific | 17 / 20

Medical Task: 20 / 20 Passed
94 | Default risk_score on sample1 | 4/4
92 | Alternate marker GPR161 on sample2 | 4/4
78 | Missing required data_file | 4/4
76 | Invalid non-positive times | 4/4
90 | Five-timepoint TXT export on sample3 | 4/4

Veto Gates: Required pass for any deployment consideration

Skill Veto: ✓ All 4 gates passed
Operational Stability | PASS | System remains stable across varied inputs and edge cases
Structural Consistency | PASS | Output structure conforms to expected skill contract format
Result Determinism | PASS | Equivalent inputs produce semantically equivalent outputs
System Security | PASS | No prompt injection, data leakage, or unsafe tool use detected
Research Veto: ✅ PASS (Applicable)
Dimension | Result | Detail
Scientific Integrity | PASS | Outputs are computed from supplied data and no fabricated study claims, identifiers, or effect estimates were introduced by the skill.
Practice Boundaries | PASS | The skill computes ROC and AUC artifacts only and does not cross into diagnosis, treatment advice, or prescriptive clinical recommendations.
Methodological Ground | PASS | The workflow correctly targets time-dependent ROC analysis for survival data and rejects unusable time specifications before running analysis.
Code Usability | PASS | The R entry point executed successfully on valid inputs in this environment and produced the documented tables, figure, and session metadata.

Core Capability: 88 / 100 (8 Categories)

Functional Suitability | 11 / 12 | 92%
Core workflow, supported formats, default marker behavior, and outputs are covered well; advanced plotting and export flags are only partially documented in SKILL.md.

Reliability | 10 / 12 | 83%
Validation paths and error prefixes are strong, but recovery guidance for dependency setup and rerun behavior could be more explicit.

Performance & Context | 8 / 8 | 100%
Progressive disclosure is handled well through a concise SKILL.md plus references and modular scripts.

Agent Usability | 14 / 16 | 88%
The execution path is easy to follow, though agents still need `--help` to discover several advanced flags that appear in examples.

Human Usability | 7 / 8 | 88%
Trigger language is natural enough for statistical-analysis requests, but the skill assumes some survival-analysis vocabulary.

Security | 10 / 12 | 83%
Input validation is strong and no secrets are involved, but the documentation does not explicitly discuss PHI handling or de-identification expectations for patient-level data.

Maintainability | 11 / 12 | 92%
Code is cleanly split across utility, validation, analysis, and plotting files, with bundled sample data for verification.

Agent-Specific | 17 / 20 | 85%
Trigger precision and layering are strong; explicit stop conditions for unsupported interpretation requests and overwrite caveats would make the skill more robust.
Core Capability Total: 88 / 100
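
The category figures can be cross-checked with a few lines of Python. This is only a sketch: the earned and maximum scores are copied from the eight categories in this section, while the helper name `percent_half_up` and the round-half-up rule are assumptions, since the report does not say how its percentages are rounded.

```python
# (earned, maximum) per category, copied from the Core Capability section.
CATEGORIES = {
    "Functional Suitability": (11, 12),
    "Reliability": (10, 12),
    "Performance & Context": (8, 8),
    "Agent Usability": (14, 16),
    "Human Usability": (7, 8),
    "Security": (10, 12),
    "Maintainability": (11, 12),
    "Agent-Specific": (17, 20),
}


def percent_half_up(earned: int, maximum: int) -> int:
    """Integer percentage with halves rounded up (assumed rounding rule)."""
    return (200 * earned + maximum) // (2 * maximum)


total_earned = sum(earned for earned, _ in CATEGORIES.values())
total_max = sum(maximum for _, maximum in CATEGORIES.values())
percents = {name: percent_half_up(e, m) for name, (e, m) in CATEGORIES.items()}
```

Under that rounding assumption, the computed sum and percentages agree with the values listed for each category.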

Medical Task: Execution Average 86 / 100, Assertions 20/20 Passed

Canonical: ✅ Pass (94 / 100)
Default risk_score on sample1

Executed cleanly and produced every documented artifact in the expected directory structure.

Basic 38/40 | Specialized 56/60 | Total 94/100
A1: Output creates the documented AUC table in table/time_roc_auc.csv
A2: Output creates the documented ROC point export in data/time_roc_points.csv
A3: Output creates both figure/time_roc.pdf and session_info.txt
A4: Default marker handling uses risk_score when --marker_col is omitted
Pass rate: 4 / 4
Variant A: ✅ Pass (92 / 100)
Alternate marker GPR161 on sample2

The non-default marker path worked without adjustment and preserved the requested time points.

Basic 37/40 | Specialized 55/60 | Total 92/100
A1: The alternate marker column is accepted and surfaced in outputs
A2: The requested time points are preserved in the output table
A3: The documented output structure is created successfully
A4: The workflow succeeds without manual intervention on a non-default marker
Pass rate: 4 / 4
Edge: ✅ Pass (78 / 100)
Missing required data_file

This is an expected validation failure path; the skill rejected the request cleanly and printed help text.

Basic 31/40 | Specialized 47/60 | Total 78/100
A1: Missing input is rejected with SKILL_MISSING_INPUT
A2: The failure path prints actionable recovery guidance
A3: The process fails cleanly instead of hanging or emitting a stack trace
A4: No output directory or partial analysis artifacts are created on this path
Pass rate: 4 / 4
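
The behavior asserted in A1 and A4 follows a validate-before-write pattern, sketched below in Python. This is not the skill's R code: `SkillError`, `prepare_run`, and the message text after the `SKILL_MISSING_INPUT` prefix are hypothetical names chosen for illustration.

```python
import os
from typing import Optional


class SkillError(Exception):
    """Hypothetical error type whose message starts with a stable SKILL_* code."""


def prepare_run(data_file: Optional[str], out_dir: str) -> str:
    """Check required inputs first, then create the output directory.

    Validating before any filesystem write means a rejected run
    leaves no output directory or partial artifacts behind.
    """
    if not data_file:
        raise SkillError("SKILL_MISSING_INPUT: data_file is required")
    os.makedirs(out_dir, exist_ok=True)
    return out_dir
```

Ordering the checks this way is what lets a caller rely on the absence of the output directory as a signal that the run never started.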
Variant B: ✅ Pass (76 / 100)
Invalid non-positive times

This is an expected validation failure path; parameter checks ran before any data loading or analysis work.

Basic 30/40 | Specialized 46/60 | Total 76/100
A1: Non-positive time points are rejected with SKILL_INVALID_PARAMETER
A2: Help text is emitted after the validation failure
A3: Validation happens before analysis execution begins
A4: No output directory is created for the rejected run
Pass rate: 4 / 4
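
A time-point check of the kind exercised here might look like the following Python sketch. The function name `parse_time_points`, the comma-separated input format, and the wording after the `SKILL_INVALID_PARAMETER` prefix are assumptions; the report does not show how the skill's R entry point receives or parses time points.

```python
import math
from typing import List


def parse_time_points(raw: str) -> List[float]:
    """Parse a comma-separated list of time points and reject unusable values."""
    tokens = [tok.strip() for tok in raw.split(",") if tok.strip()]
    if not tokens:
        raise ValueError("SKILL_INVALID_PARAMETER: at least one time point is required")
    try:
        points = [float(tok) for tok in tokens]
    except ValueError:
        raise ValueError(
            f"SKILL_INVALID_PARAMETER: time points must be numeric, got {raw!r}"
        ) from None
    # Zero, negative, NaN, and infinite horizons cannot define an ROC curve.
    bad = [p for p in points if not (math.isfinite(p) and p > 0)]
    if bad:
        raise ValueError(
            f"SKILL_INVALID_PARAMETER: time points must be positive, got {bad}"
        )
    return points
```

Running this check before loading any data mirrors assertion A3: a call such as `parse_time_points("0,3")` fails immediately, while `parse_time_points("1,3,5")` returns the three horizons as floats.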
Stress: ✅ Pass (90 / 100)
Five-timepoint TXT export on sample3

A higher-parameter run completed successfully, including TXT exports, figure generation, and session metadata.

Basic 36/40 | Specialized 54/60 | Total 90/100
A1: TXT exports are produced in the documented data and table subdirectories
A2: The selected marker and weighting are preserved in the output table
A3: All requested time points are carried through to the results
A4: The figure and session metadata are emitted for reproducibility
Pass rate: 4 / 4
Medical Task Total: 86 / 100
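
The per-case totals and the execution average can be reproduced from the Basic and Specialized sub-scores. The short Python sketch below copies those sub-scores from the five cases above; treating the average as an unweighted mean is an assumption, although it does reproduce the reported 86.

```python
from statistics import mean

# (Basic out of 40, Specialized out of 60) per test case, copied from the report.
CASES = {
    "Canonical": (38, 56),
    "Variant A": (37, 55),
    "Edge": (31, 47),
    "Variant B": (30, 46),
    "Stress": (36, 54),
}

# Each case total is the sum of its two sub-scores.
totals = {name: basic + specialized for name, (basic, specialized) in CASES.items()}

# Unweighted mean across the five cases (assumed aggregation rule).
execution_average = mean(list(totals.values()))
```

The report does not state how the overall 87 / 100 is derived; it is at least consistent with the mean of the Core Capability total (88) and this execution average (86).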

Key Strengths

  • The skill cleanly separates concise user-facing instructions from implementation details in scripts and references.
  • CLI validation is strong: missing inputs and invalid parameters fail early with stable SKILL_* messages and help output.
  • The implementation is executable as shipped and produces reproducible table, figure, and session metadata artifacts on valid runs.
  • Marker-column flexibility, multiple file formats, and standardized output directories make the skill practical for repeated analysis tasks.