Survival Analysis (Kaplan-Meier)
AIPOCH-AI
Survival Analysis Code Generation and Chart Plotting
FILES
survival-analysis-km/
skill.md
scripts
main.py
references
README.md
sample_data.csv
SKILL.md
Survival Analysis (Kaplan-Meier)
Kaplan-Meier survival analysis tool for clinical and biological research. Generates publication-ready survival curves with statistical tests.
Features
- Kaplan-Meier Curve Generation: Publication-quality survival plots with confidence intervals
- Statistical Tests: Log-rank test, Wilcoxon test, Peto-Peto test
- Hazard Ratios: Cox proportional hazards regression with 95% CI
- Summary Statistics: Median survival time, restricted mean survival time (RMST)
- Multi-group Analysis: Supports 2+ comparison groups
- Risk Tables: Optional at-risk table below curves
Usage
Python Script
python scripts/main.py --input data.csv --time time_col --event event_col --group group_col --output results/
Arguments
| Argument | Description | Required |
|---|---|---|
| `--input` | Input CSV file path | Yes |
| `--time` | Column name for survival time | Yes |
| `--event` | Column name for event indicator (1=event, 0=censored) | Yes |
| `--group` | Column name for grouping variable | Optional |
| `--output` | Output directory for results | Yes |
| `--conf-level` | Confidence level (default: 0.95) | Optional |
| `--risk-table` | Include risk table in plot | Optional |
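The flags in the table map naturally onto an `argparse` parser. The sketch below is an assumption about how such a command line could be declared, not the actual contents of `scripts/main.py`; the function name `build_parser` is hypothetical, and `--risk-table` is treated as a boolean switch because the usage example passes it without a value.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented flags (not the real main.py)."""
    parser = argparse.ArgumentParser(description="Kaplan-Meier survival analysis")
    parser.add_argument("--input", required=True, help="Input CSV file path")
    parser.add_argument("--time", required=True, help="Column name for survival time")
    parser.add_argument("--event", required=True,
                        help="Column name for event indicator (1=event, 0=censored)")
    parser.add_argument("--group", default=None,
                        help="Column name for grouping variable")
    parser.add_argument("--output", required=True, help="Output directory for results")
    parser.add_argument("--conf-level", type=float, default=0.95,
                        help="Confidence level")
    # Assumed to be an on/off switch: present means "draw the risk table".
    parser.add_argument("--risk-table", action="store_true",
                        help="Include risk table in plot")
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch runs anywhere.
args = build_parser().parse_args(
    ["--input", "data.csv", "--time", "time_months", "--event", "death",
     "--output", "results/", "--risk-table"]
)
print(args.conf_level, args.group, args.risk_table)  # 0.95 None True
```

Note that `argparse` exposes `--conf-level` and `--risk-table` as `args.conf_level` and `args.risk_table`.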
Input Format
CSV with columns:
- Time column: Numeric, time to event or censoring
- Event column: Binary (1 = event occurred, 0 = censored/right-censored)
- Group column: Categorical variable for stratification
Example:
patient_id,time_months,death,treatment_group
P001,24.5,1,Drug_A
P002,36.2,0,Drug_A
P003,18.7,1,Placebo
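Before fitting anything, it can help to check that a file actually satisfies these column constraints. The following is a minimal sketch using only the standard library; the function name `load_survival_rows` and the rejection of negative times are assumptions for illustration, not behavior documented for this skill.

```python
import csv
import io

SAMPLE = """patient_id,time_months,death,treatment_group
P001,24.5,1,Drug_A
P002,36.2,0,Drug_A
P003,18.7,1,Placebo
"""


def load_survival_rows(handle, time_col, event_col, group_col=None):
    """Read (time, event, group) tuples and enforce the documented column rules."""
    rows = []
    # Data starts on line 2 because line 1 is the header.
    for line_no, record in enumerate(csv.DictReader(handle), start=2):
        time = float(record[time_col])  # raises ValueError if not numeric
        if time < 0:  # assumption: negative follow-up times are invalid
            raise ValueError(f"line {line_no}: negative time {time}")
        event = record[event_col].strip()
        if event not in ("0", "1"):
            raise ValueError(f"line {line_no}: event must be 0 or 1, got {event!r}")
        group = record[group_col] if group_col else "all"
        rows.append((time, int(event), group))
    return rows


rows = load_survival_rows(io.StringIO(SAMPLE), "time_months", "death",
                          "treatment_group")
print(rows[0])  # (24.5, 1, 'Drug_A')
```

When no group column is given, every row is assigned to a single group labeled `"all"`, which is a convention of this sketch only.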
Output Files
- `km_curve.png`: Kaplan-Meier survival curve
- `km_curve.pdf`: Vector version for publications
- `survival_stats.csv`: Statistical summary (median survival, confidence intervals)
- `hazard_ratios.csv`: Cox regression results with HR and 95% CI
- `logrank_test.csv`: Pairwise comparison p-values
- `report.txt`: Human-readable summary report
Technical Details
Statistical Methods
- Kaplan-Meier Estimator: Non-parametric maximum likelihood estimate of the survival function
  - Product-limit estimator: Ŝ(t) = Π(tᵢ≤t) (1 - dᵢ/nᵢ), where dᵢ is the number of events and nᵢ the number at risk at event time tᵢ
  - Greenwood's formula for variance estimation
- Log-Rank Test: Most widely used test for comparing survival curves
  - Null hypothesis: No difference in survival between groups
  - Gives equal weight to every event time; the Wilcoxon (Gehan-Breslow) variant instead weights by the number at risk
- Cox Proportional Hazards: Semi-parametric regression model
  - h(t|X) = h₀(t) × exp(β₁X₁ + β₂X₂ + ...)
  - Proportional hazards assumption checked via Schoenfeld residuals
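To make the product-limit formula and Greenwood's variance concrete, here is a small standard-library sketch. It is illustrative only and is not the code in `scripts/main.py`, which according to the dependency list relies on `lifelines`. The function name `kaplan_meier` and the toy data are assumptions; tied censored observations are treated as still at risk at the tied event time, which is the usual convention.

```python
import math


def kaplan_meier(times, events):
    """Return [(t_i, S(t_i), se_i)] at each distinct event time.

    times  : follow-up durations
    events : 1 if the event occurred, 0 if right-censored
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    greenwood_sum = 0.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        # Count events (d_i) and everyone leaving the risk set at time t.
        deaths = leaving = 0
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths > 0:
            survival *= 1.0 - deaths / n_at_risk
            if n_at_risk > deaths:
                # Greenwood: Var[S(t)] = S(t)^2 * sum d_i / (n_i * (n_i - d_i))
                greenwood_sum += deaths / (n_at_risk * (n_at_risk - deaths))
                se = survival * math.sqrt(greenwood_sum)
            else:
                se = float("nan")  # undefined once the curve reaches zero
            curve.append((t, survival, se))
        n_at_risk -= leaving
    return curve


# Toy data: six subjects, two of them censored (at t=2 and t=4).
curve = kaplan_meier([1, 2, 2, 3, 4, 5], [1, 1, 0, 1, 0, 1])
for t, s, se in curve:
    print(f"t={t}  S={s:.4f}  se={se:.4f}")
```

On the toy data the survival estimate steps through 5/6, 2/3, 4/9 and finally 0 at the event times 1, 2, 3 and 5.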
Dependencies
- `lifelines`: Core survival analysis library
- `matplotlib`, `seaborn`: Visualization
- `pandas`, `numpy`: Data handling
- `scipy`: Statistical tests
Technical Difficulty: High ⚠️
This skill involves advanced statistical modeling. Results should be reviewed by a biostatistician, especially for:
- Proportional hazards assumption violations
- Small sample sizes (< 30 per group)
- Heavy censoring (> 50%)
- Time-varying covariates
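Two of these conditions, small groups and heavy censoring, can be screened automatically before results go to a reviewer. The sketch below assumes rows shaped as `(time, event, group)` tuples and uses the thresholds from the list above; the function name `review_flags` is hypothetical. The other two conditions require model diagnostics and are not covered here.

```python
from collections import defaultdict


def review_flags(rows, min_group_size=30, max_censoring=0.5):
    """rows: iterable of (time, event, group). Returns {group: [warnings]}."""
    counts = defaultdict(lambda: [0, 0])  # group -> [n, n_censored]
    for _time, event, group in rows:
        counts[group][0] += 1
        counts[group][1] += int(event == 0)
    flags = {}
    for group, (n, censored) in counts.items():
        notes = []
        if n < min_group_size:
            notes.append(f"small sample (n={n})")
        if censored / n > max_censoring:
            notes.append(f"heavy censoring ({censored / n:.0%})")
        if notes:
            flags[group] = notes
    return flags


# Toy data: group A has 40 subjects with 30 censored; group B has 5 events.
toy_rows = [(10.0, 0, "A")] * 30 + [(5.0, 1, "A")] * 10 + [(3.0, 1, "B")] * 5
print(review_flags(toy_rows))
# {'A': ['heavy censoring (75%)'], 'B': ['small sample (n=5)']}
```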
References
See references/ folder for:
- Kaplan EL, Meier P (1958) original paper
- Cox DR (1972) regression models paper
- Sample datasets for testing
- Clinical reporting guidelines (ATN, CONSORT)
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| `--input` | str | Required | Input CSV file path |
| `--time` | str | Required | Column name for survival time |
| `--event` | str | Required | Column name for event indicator (1=event, 0=censored) |
| `--group` | str | Optional | Column name for grouping variable |
| `--output` | str | Required | Output directory for results |
| `--conf-level` | float | 0.95 | Confidence level |
| `--risk-table` | flag | Off | Include risk table in plot |
| `--figsize` | str | '10 | |
| `--dpi` | int | 300 | |
Example
# Basic survival curve
python scripts/main.py \
--input clinical_data.csv \
--time overall_survival_months \
--event death \
--group treatment_arm \
--output ./results/ \
--risk-table
Output includes:
- Survival curves with 95% confidence bands
- Median survival: Drug A = 28.4 months (95% CI: 24.1-32.7), Placebo = 18.2 months (95% CI: 15.3-21.1)
- Log-rank test p-value: 0.0023
- Hazard ratio: 0.62 (95% CI: 0.45-0.85), p = 0.003
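A log-rank p-value like the one reported above comes from comparing observed and expected event counts at each event time. The sketch below is a standard-library illustration for exactly two groups, not the skill's implementation (`lifelines` offers `lifelines.statistics.logrank_test` for real use). The function name `logrank_two_groups` and the toy data are assumptions.

```python
import math


def logrank_two_groups(times_a, events_a, times_b, events_b):
    """Two-sample log-rank test; returns (chi_square, p_value) on 1 df."""
    pooled = [(t, e, 0) for t, e in zip(times_a, events_a)]
    pooled += [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in pooled if e == 1})

    observed_minus_expected = 0.0
    variance = 0.0
    for t in event_times:
        n = sum(1 for u, _, _ in pooled if u >= t)               # at risk, both groups
        n_a = sum(1 for u, _, g in pooled if u >= t and g == 0)  # at risk, group A
        d = sum(1 for u, e, _ in pooled if u == t and e == 1)    # events, both groups
        d_a = sum(1 for u, e, g in pooled if u == t and e == 1 and g == 0)
        # Under the null hypothesis, group A expects d * n_a / n events at time t.
        observed_minus_expected += d_a - d * n_a / n
        if n > 1:
            # Hypergeometric variance of d_a given the margins.
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    if variance == 0:
        raise ValueError("variance is zero; the test is undefined for these data")
    chi_square = observed_minus_expected ** 2 / variance
    # Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi_square / 2.0))
    return chi_square, p_value


# Toy data: every event in group A happens before any event in group B.
chi2, p = logrank_two_groups([1, 2, 3, 4, 5], [1] * 5, [6, 7, 8, 9, 10], [1] * 5)
print(f"chi-square = {chi2:.2f}, p = {p:.4f}")
```

Every event time contributes with the same weight here, which is what distinguishes the log-rank test from weighted variants such as the Wilcoxon test.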
Risk Assessment
| Risk Indicator | Assessment | Level |
|---|---|---|
| Code Execution | Python scripts executed locally | Medium |
| Network Access | No external API calls | Low |
| File System Access | Read input files, write output files | Medium |
| Instruction Tampering | Standard prompt guidelines | Low |
| Data Exposure | Output files saved to workspace | Low |
Security Checklist
- No hardcoded credentials or API keys
- No unauthorized file system access (../)
- Output does not expose sensitive information
- Prompt injection protections in place
- Input file paths validated (no ../ traversal)
- Output directory restricted to workspace
- Script execution in sandboxed environment
- Error messages sanitized (no stack traces exposed)
- Dependencies audited
Prerequisites
# Python dependencies
pip install -r requirements.txt
Evaluation Criteria
Success Metrics
- Successfully executes main functionality
- Output meets quality standards
- Handles edge cases gracefully
- Performance is acceptable
Test Cases
- Basic Functionality: Standard input → Expected output
- Edge Case: Invalid input → Graceful error handling
- Performance: Large dataset → Acceptable processing time
Lifecycle Status
- Current Stage: Draft
- Next Review Date: 2026-03-06
- Known Issues: None
- Planned Improvements:
  - Performance optimization
  - Additional feature support