Data Analysis

time-dependent-roc

Compute time-dependent AUC and ROC curves at multiple clinical time points for survival model evaluation. Inputs: survival time, event status, continuous biomarker or risk score. Outputs: time-AUC line plot, AUC table at 1/3/5 years, iAUC summary.
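
For readers unfamiliar with the metric, here is a minimal Python sketch of a naive cumulative/dynamic AUC at a single time horizon. It is an illustration only and not the skill's R implementation: it applies no censoring weights, and the function name `naive_time_auc` and the toy data are hypothetical.

```python
import math

def naive_time_auc(time, event, marker, t):
    """Naive cumulative/dynamic AUC at horizon t (no censoring weights).

    Cases:    subjects with an observed event at or before t.
    Controls: subjects still under observation after t.
    Subjects censored before t are dropped, which can bias the estimate
    when censoring is heavy; weighted estimators correct for this.
    """
    cases = [m for ti, e, m in zip(time, event, marker) if ti <= t and e == 1]
    controls = [m for ti, m in zip(time, marker) if ti > t]
    if not cases or not controls:
        return math.nan  # AUC is undefined without both groups
    concordant = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                concordant += 1.0   # case ranked above control
            elif c == k:
                concordant += 0.5   # ties count as half
    return concordant / (len(cases) * len(controls))

# Hypothetical toy data: follow-up in years, 1 = event, 0 = censored.
times = [0.5, 2.0, 4.0, 6.0, 2.5, 7.0]
events = [1, 1, 1, 0, 0, 0]
markers = [0.9, 0.7, 0.15, 0.2, 0.8, 0.1]

auc_by_year = {t: naive_time_auc(times, events, markers, t) for t in (1, 3, 5)}
```

The Stress scenario in this report mentions a weighting option, which suggests the skill itself uses a censoring-weighted estimator; the sketch above deliberately omits that step to keep the case/control definition visible.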

Total Score: 87 / 100

Core Capability

88 / 100

Category	Score
Functional Suitability	11 / 12
Reliability	10 / 12
Performance & Context	8 / 8
Agent Usability	14 / 16
Human Usability	7 / 8
Security	10 / 12
Maintainability	11 / 12
Agent-Specific	17 / 20

Medical Task

20 / 20 Passed

Scenario	Score	Assertions
Default risk_score on sample1	94	4/4
Alternate marker GPR161 on sample2	92	4/4
Missing required data_file	78	4/4
Invalid non-positive times	76	4/4
Five-timepoint TXT export on sample3	90	4/4

Veto Gates: Required pass for any deployment consideration

Skill Veto: ✓ All 4 gates passed

Gate	Result	Detail
Operational Stability	PASS	System remains stable across varied inputs and edge cases
Structural Consistency	PASS	Output structure conforms to expected skill contract format
Result Determinism	PASS	Equivalent inputs produce semantically equivalent outputs
System Security	PASS	No prompt injection, data leakage, or unsafe tool use detected

Research Veto: ✅ PASS (Applicable)

Dimension	Result	Detail
Scientific Integrity	PASS	Outputs are computed from supplied data and no fabricated study claims, identifiers, or effect estimates were introduced by the skill.
Practice Boundaries	PASS	The skill computes ROC and AUC artifacts only and does not cross into diagnosis, treatment advice, or prescriptive clinical recommendations.
Methodological Ground	PASS	The workflow correctly targets time-dependent ROC analysis for survival data and rejects unusable time specifications before running analysis.
Code Usability	PASS	The R entry point executed successfully on valid inputs in this environment and produced the documented tables, figure, and session metadata.

Core Capability: 88 / 100 (8 Categories)

Functional Suitability

Core workflow, supported formats, default marker behavior, and outputs are covered well; advanced plotting and export flags are only partially documented in SKILL.md.

11 / 12

92%

Reliability

Validation paths and error prefixes are strong, but recovery guidance for dependency setup and rerun behavior could be more explicit.

10 / 12

83%

Performance & Context

Progressive disclosure is handled well through a concise SKILL.md plus references and modular scripts.

8 / 8

100%

Agent Usability

The execution path is easy to follow, though agents still need `--help` to discover several advanced flags that appear in examples.

14 / 16

88%

Human Usability

Trigger language is natural enough for statistical-analysis requests, but the skill assumes some survival-analysis vocabulary.

7 / 8

88%

Security

Input validation is strong and no secrets are involved, but the documentation does not explicitly discuss PHI handling or de-identification expectations for patient-level data.

10 / 12

83%

Maintainability

Code is cleanly split across utility, validation, analysis, and plotting files, with bundled sample data for verification.

11 / 12

92%

Agent-Specific

Trigger precision and layering are strong; explicit stop conditions for unsupported interpretation requests and overwrite caveats would make the skill more robust.

17 / 20

85%

Core Capability Total: 88 / 100

Medical Task: Execution Average 86 / 100; Assertions 20/20 Passed

Canonical: ✅ Pass

Default risk_score on sample1

Executed cleanly and produced every documented artifact in the expected directory structure.

Basic 38/40 | Specialized 56/60 | Total 94/100

✅ A1: Output creates the documented AUC table in table/time_roc_auc.csv
✅ A2: Output creates the documented ROC point export in data/time_roc_points.csv
✅ A3: Output creates both figure/time_roc.pdf and session_info.txt
✅ A4: Default marker handling uses risk_score when --marker_col is omitted

Pass rate: 4 / 4
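
Assertions A1 to A3 amount to a file-existence check on the output directory. A minimal Python sketch of such a check follows; the relative paths are the ones named in the assertions, while the helper name `missing_artifacts` and the assumption that session_info.txt sits at the root of the output directory are hypothetical.

```python
from pathlib import Path

# Relative paths as named in assertions A1-A3. Placing session_info.txt at
# the output root is an assumption; the report gives no subdirectory for it.
EXPECTED_ARTIFACTS = (
    "table/time_roc_auc.csv",
    "data/time_roc_points.csv",
    "figure/time_roc.pdf",
    "session_info.txt",
)

def missing_artifacts(output_dir):
    """Return the expected artifacts that are absent under output_dir."""
    root = Path(output_dir)
    return [rel for rel in EXPECTED_ARTIFACTS if not (root / rel).is_file()]
```

An empty return value corresponds to a run that produced every documented artifact.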

Variant A: ✅ Pass

Alternate marker GPR161 on sample2

The non-default marker path worked without adjustment and preserved the requested time points.

Basic 37/40 | Specialized 55/60 | Total 92/100

✅ A1: The alternate marker column is accepted and surfaced in outputs
✅ A2: The requested time points are preserved in the output table
✅ A3: The documented output structure is created successfully
✅ A4: The workflow succeeds without manual intervention on a non-default marker

Pass rate: 4 / 4

Edge: ✅ Pass

Missing required data_file

This is an expected validation failure path; the skill rejected the request cleanly and printed help text.

Basic 31/40 | Specialized 47/60 | Total 78/100

✅ A1: Missing input is rejected with SKILL_MISSING_INPUT
✅ A2: The failure path prints actionable recovery guidance
✅ A3: The process fails cleanly instead of hanging or emitting a stack trace
✅ A4: No output directory or partial analysis artifacts are created on this path

Pass rate: 4 / 4

Variant B: ✅ Pass

Invalid non-positive times

This is an expected validation failure path; parameter checks ran before any data loading or analysis work.

Basic 30/40 | Specialized 46/60 | Total 76/100

✅ A1: Non-positive time points are rejected with SKILL_INVALID_PARAMETER
✅ A2: Help text is emitted after the validation failure
✅ A3: Validation happens before analysis execution begins
✅ A4: No output directory is created for the rejected run

Pass rate: 4 / 4
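
The ordering tested by the Edge and Variant B scenarios (validate first, create outputs only afterwards) can be sketched in Python as below. This is not the skill's R code: the `SkillError` class, the function names, and the message text after each prefix are hypothetical; only the SKILL_MISSING_INPUT and SKILL_INVALID_PARAMETER prefixes come from this report.

```python
from pathlib import Path

class SkillError(Exception):
    """Rejected request; the message starts with a SKILL_* prefix."""

def validate_request(data_file, times):
    # Cheap argument checks only: no file is opened here.
    if not data_file:
        raise SkillError("SKILL_MISSING_INPUT: data_file is required")
    bad = [t for t in times if t <= 0]
    if bad:
        raise SkillError(
            f"SKILL_INVALID_PARAMETER: time points must be positive, got {bad}"
        )

def run(data_file, times, output_dir):
    validate_request(data_file, times)       # fail early, before side effects
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)   # reached only for valid requests
    return out                               # analysis itself is out of scope
```

Because the directory is created after validation, a rejected request leaves no output directory behind, which is the behavior assertion A4 checks in both failure scenarios.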

Stress: ✅ Pass

Five-timepoint TXT export on sample3

A higher-parameter run completed successfully, including TXT exports, figure generation, and session metadata.

Basic 36/40 | Specialized 54/60 | Total 90/100

✅ A1: TXT exports are produced in the documented data and table subdirectories
✅ A2: The selected marker and weighting are preserved in the output table
✅ A3: All requested time points are carried through to the results
✅ A4: The figure and session metadata are emitted for reproducibility

Pass rate: 4 / 4

Medical Task Total: 86 / 100
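
The totals in this report can be cross-checked with a short Python sketch. The figures are copied from the sections above; the variable names are hypothetical, and treating the overall score of 87 as the mean of the two section totals is an assumption that happens to match the numbers, not a documented formula.

```python
# (points awarded, points available) per Core Capability category
category_points = {
    "Functional Suitability": (11, 12),
    "Reliability": (10, 12),
    "Performance & Context": (8, 8),
    "Agent Usability": (14, 16),
    "Human Usability": (7, 8),
    "Security": (10, 12),
    "Maintainability": (11, 12),
    "Agent-Specific": (17, 20),
}

# (Basic, Specialized) sub-scores per Medical Task scenario
scenario_scores = {
    "Canonical": (38, 56),
    "Variant A": (37, 55),
    "Edge": (31, 47),
    "Variant B": (30, 46),
    "Stress": (36, 54),
}

core_total = sum(p for p, _ in category_points.values())
core_max = sum(m for _, m in category_points.values())
scenario_totals = {k: b + s for k, (b, s) in scenario_scores.items()}
execution_average = sum(scenario_totals.values()) / len(scenario_totals)

# Assumption: overall score is the mean of the two section totals.
assumed_overall = (core_total + execution_average) / 2
```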

Key Strengths

The skill cleanly separates concise user-facing instructions from implementation details in scripts and references.
CLI validation is strong: missing inputs and invalid parameters fail early with stable SKILL_* messages and help output.
The implementation is executable as shipped and produces reproducible table, figure, and session metadata artifacts on valid runs.
Marker-column flexibility, multiple file formats, and standardized output directories make the skill practical for repeated analysis tasks.