ROC Diagnostic Performance

AIPOCH

Use when evaluating diagnostic biomarker performance from case-control expression data with logistic regression and ROC curves, exporting coefficient and AUC tables together with a ROC PDF. NOT for: survival analysis, time-to-event outcomes, multiclass classification, calibration curves, decision-curve analysis, or nomogram construction.

FILES

roc-diagnostic-performance/
|-- skill.md
|-- scripts/
|   |-- cli.R
|   |-- functions.R
|   |-- io.R
|   |-- main.R
|   |-- plotting.R
|   |-- run_analysis.R
|   |-- utils.R
|   `-- validation.R
`-- references/
    |-- algorithm.md
    |-- cli-guide.md
    `-- troubleshooting.md

Total Score: 85 / 100

| Category | Score |
| --- | --- |
| Core Capability | 91 / 100 |
| Functional Suitability | 12 / 12 |
| Reliability | 10 / 12 |
| Performance & Context | 7 / 8 |
| Agent Usability | 15 / 16 |
| Human Usability | 7 / 8 |
| Security | 11 / 12 |
| Maintainability | 12 / 12 |
| Agent-Specific | 17 / 20 |

Medical Task: 23 / 25 Passed

| Score | Test case | Passed |
| --- | --- | --- |
| 86 | Basic ROC run with FOXP3, CD45, CD3E markers on Disease vs Control | 5/5 |
| 83 | Custom group column and plot customization options | 5/5 |
| 79 | Sample ID mismatch and insufficient case count | 4/5 |
| 83 | Timeout parameter and saved model bundle verification | 5/5 |
| 77 | Partial marker list with some absent genes and logistic failure | 4/5 |

SKILL.md

ROC Diagnostic Performance

When to Use

Use this skill when you need to:

  • evaluate one or more diagnostic marker genes in a case-control cohort;
  • build a multivariable logistic regression diagnostic model from marker expression values;
  • compare the ROC performance of the full model against individual markers.

Typical user requests:

  • "Use these genes to build a diagnostic ROC model for case vs control samples."
  • "Evaluate the AUC of FOXP3, CD45, and CD3E and plot all ROC curves together."
  • "Run logistic regression on biomarker expression and export ROC results."

When Not to Use

Do not use this skill for:

  • survival or prognostic analysis with time-to-event outcomes;
  • multiclass classification tasks;
  • calibration plots, nomograms, or decision-curve analysis;
  • non-expression diagnostic inputs such as imaging, clinical scores, or mutation-only tables.

When to Read External Files

| Situation | File to Read | Purpose |
| --- | --- | --- |
| Need algorithm details | references/algorithm.md | Logistic regression, ROC, AUC, and modeling assumptions |
| Need to run analysis | scripts/main.R | Execute Rscript scripts/main.R --expression_file ... --group_file ... |
| Encounter errors | references/troubleshooting.md | Common SKILL_* errors and solutions |
| Need CLI examples | references/cli-guide.md | Detailed command-line examples |
| Need test data | tests/data/ | Example expression matrix and group file |

Usage

Rscript scripts/main.R \
  --expression_file ./expression_matrix.csv \
  --group_file ./group_info.csv \
  --marker_genes FOXP3,CD45,CD3E \
  --case_group Disease \
  --output_dir ./output/ \
  --seed 42

Arguments

| Short | Long | Type | Default | Description |
| --- | --- | --- | --- | --- |
| -e | --expression_file | character | required | Expression matrix file in CSV/TSV format |
| -g | --group_file | character | required | Group file with sample IDs and labels |
| -m | --marker_genes | character | required | Comma-separated marker genes |
| -c | --case_group | character | required | Case group label in the group file |
|  | --group_col | character | NULL | Optional group column name; auto-detected if omitted |
| -o | --output_dir | character | ./output/ | Output directory |
|  | --overwrite | flag | FALSE | Allow writing into a non-empty output directory |
| -s | --seed | integer | 42 | Random seed for reproducibility |
| -T | --timeout_seconds | integer | 0 | Elapsed time limit in seconds; 0 disables timeout |
|  | --plot_width | double | 6 | ROC plot width in inches |
|  | --plot_height | double | 6 | ROC plot height in inches |
|  | --font_family | character | sans | PDF font family |
|  | --line_colors | character | #E64B35,#4DBBD5,#00A087,#3C5488,#F39B7F | Comma-separated ROC line colors |
|  | --line_width | double | 1.2 | ROC curve line width |
|  | --show_diagonal | character | true | Show diagonal reference line: true or false |
|  | --diagonal_color | character | #7F7F7F | Diagonal line color |
|  | --diagonal_lty | integer | 2 | Diagonal line type |
|  | --plot_title | character | ROC Diagnostic Performance | ROC plot title |
|  | --x_label | character | 1 - Specificity | X-axis label |
|  | --y_label | character | Sensitivity | Y-axis label |
|  | --base_cex | double | 0.9 | Base text-size multiplier |
|  | --legend_position | character | bottomright | Legend position |
|  | --legend_cex | double | 0.8 | Legend text size |

Input Format

Expression Matrix (expression_file)

CSV or TSV file with genes as rows and samples as columns. The first column must store unique gene identifiers.

gene,Sample1,Sample2,Sample3
FOXP3,8.4,7.1,3.8
CD45,2.1,1.9,5.4
CD3E,5.8,6.2,4.0

Requirements

  • File extension must be .csv, .tsv, or .txt.
  • The first column must contain non-missing, unique gene identifiers.
  • Remaining columns must be sample IDs.
  • Selected marker genes must have numeric finite expression values across matched samples.

Group File (group_file)

CSV or TSV file with sample IDs in the first column and at least one group-label column.

sample,group
Sample1,Disease
Sample2,Disease
Sample3,Control

Requirements

  • File extension must be .csv, .tsv, or .txt.
  • The first column must contain non-missing, unique sample IDs.
  • At least one group column must be present.
  • The case_group value must appear in the selected group column.
  • At least 10 matched samples, 2 case samples, and 2 control samples are required.
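
The minimum-count rule above can be expressed as a small pre-flight check. The sketch below is illustrative Python, not the skill's own R validation code; the function name check_group_counts is hypothetical, and it assumes that every label other than case_group counts as a control, which the source does not state explicitly.

```python
MIN_MATCHED = 10   # thresholds taken from the requirement above
MIN_CASE = 2
MIN_CONTROL = 2

def check_group_counts(labels, case_group):
    """Return a list of problems for the matched samples' labels.

    Assumption: any label different from case_group is a control.
    An empty list means the cohort meets the stated minimums.
    """
    n_case = sum(1 for label in labels if label == case_group)
    n_control = len(labels) - n_case
    problems = []
    if len(labels) < MIN_MATCHED:
        problems.append(f"only {len(labels)} matched samples (need {MIN_MATCHED})")
    if n_case < MIN_CASE:
        problems.append(f"only {n_case} case samples (need {MIN_CASE})")
    if n_control < MIN_CONTROL:
        problems.append(f"only {n_control} control samples (need {MIN_CONTROL})")
    return problems

labels = ["Disease"] * 4 + ["Control"] * 6
print(check_group_counts(labels, "Disease"))
```

Running a check like this before invoking the skill avoids a round trip that would otherwise end in SKILL_INVALID_PARAMETER.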

Output Files

| File | Description |
| --- | --- |
| data/analysis_data.rds | Matched sample-level analysis dataset used for model fitting |
| data/roc_model.rds | Saved logistic regression model bundle with data and selected genes |
| table/model_coefficients.csv | Logistic regression coefficients, z statistics, p-values, and odds ratios |
| table/roc_auc_summary.csv | AUC values for the full model and each marker |
| plot/roc_curve.pdf | ROC curves for the full model and individual markers |
| session_info.txt | Session information and run parameters |

model_coefficients.csv

| Column | Description |
| --- | --- |
| term | Model term name |
| estimate | Logistic regression coefficient |
| std_error | Standard error of the coefficient |
| z_value | Wald z statistic |
| p_value | Wald test p-value |
| odds_ratio | Exponentiated coefficient |
| odds_ratio_95_ci | Odds ratio with 95% confidence interval |
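
The columns above are linked by simple arithmetic, which the following Python sketch makes explicit. It assumes a Wald-type 95% interval, exp(estimate ± 1.96 × std_error); the source only says the z statistic and p-value are Wald quantities, so the interval method actually used by the skill may differ (for example, a profile-likelihood interval). The function name wald_summary is hypothetical.

```python
import math

Z_975 = 1.959963984540054  # 97.5th percentile of the standard normal

def wald_summary(estimate, std_error):
    """Derive z, two-sided p, odds ratio, and a Wald 95% CI from one coefficient."""
    z = estimate / std_error
    # Two-sided normal tail probability: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2))
    p = math.erfc(abs(z) / math.sqrt(2))
    return {
        "z_value": z,
        "p_value": p,
        "odds_ratio": math.exp(estimate),
        "ci_low": math.exp(estimate - Z_975 * std_error),   # assumption: Wald interval
        "ci_high": math.exp(estimate + Z_975 * std_error),
    }

summary = wald_summary(estimate=0.8, std_error=0.3)
print({name: round(value, 4) for name, value in summary.items()})
```

An odds ratio above 1 means higher expression of that marker is associated with higher odds of being in the case group, holding the other markers fixed.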

roc_auc_summary.csv

| Column | Description |
| --- | --- |
| model | Full model or marker name |
| auc | Area under the ROC curve |
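
To interpret the auc column: the area under the ROC curve equals the probability that a randomly chosen case scores higher than a randomly chosen control, with ties counted as one half. The Python sketch below computes it that way for illustration; it is not the skill's implementation, and auc_pairwise is a hypothetical name. It treats higher values as more case-like, so a marker that is lower in cases yields a value below 0.5 here. Whether the skill flips the direction in that situation is not stated in the source.

```python
def auc_pairwise(case_scores, control_scores):
    """AUC as the fraction of case/control pairs ranked correctly (ties = 0.5)."""
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))

# Toy values in the spirit of the example matrix: two cases, one control.
print(auc_pairwise([8.4, 7.1], [3.8]))  # marker higher in cases
print(auc_pairwise([2.1, 1.9], [5.4]))  # marker lower in cases
```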

Workflow

Step 1: Validate Input

  • Check that the expression matrix and group file exist and have supported formats.
  • Validate unique gene identifiers and sample IDs.
  • Match samples shared by both files.
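
As a rough illustration of Step 1, the Python sketch below checks first-column IDs and intersects the sample IDs of the two files. It is not the skill's R code: the helper names are hypothetical, the inline data are toy values (the group file deliberately lists Sample4 instead of Sample3), and only the comma delimiter is handled.

```python
import csv
import io

EXPRESSION_CSV = """gene,Sample1,Sample2,Sample3
FOXP3,8.4,7.1,3.8
CD45,2.1,1.9,5.4
CD3E,5.8,6.2,4.0
"""

GROUP_CSV = """sample,group
Sample1,Disease
Sample2,Disease
Sample4,Control
"""

def first_column_ids(text, delimiter=","):
    """Return (header, ids) and enforce non-missing, unique first-column IDs."""
    rows = [row for row in csv.reader(io.StringIO(text), delimiter=delimiter) if row]
    ids = [row[0] for row in rows[1:]]
    if any(not value.strip() for value in ids) or len(set(ids)) != len(ids):
        raise ValueError("first column must hold non-missing, unique IDs")
    return rows[0], ids

def shared_samples(expression_text, group_text):
    """Sample IDs present in both files, in expression-matrix column order."""
    header, _genes = first_column_ids(expression_text)
    _, group_ids = first_column_ids(group_text)
    group_set = set(group_ids)
    shared = [sample for sample in header[1:] if sample in group_set]
    if not shared:
        raise ValueError("SKILL_SAMPLE_MISMATCH: no sample IDs shared by both files")
    return shared

print(shared_samples(EXPRESSION_CSV, GROUP_CSV))  # Sample3 and Sample4 are unmatched
```

Matching is exact string comparison here, so differences in case or stray whitespace in sample IDs would leave samples unmatched.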

Step 2: Prepare Analysis Dataset

  • Keep only the requested marker genes that exist in the expression matrix.
  • Merge matched expression values with group labels.
  • Convert the selected case group to binary outcome labels.
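
Step 2 amounts to a reshape: genes-as-rows becomes samples-as-rows, with one 0/1 outcome column. A minimal Python sketch under stated assumptions follows; the function and field names are hypothetical, absent markers are silently dropped, and every non-case label is coded 0. The skill's R code may warn or behave differently.

```python
def build_analysis_rows(expression, groups, marker_genes, case_group):
    """Return (kept_markers, rows) for model fitting.

    expression: {gene: {sample_id: value}}; groups: {sample_id: label}.
    Requested markers absent from the matrix are dropped; each row holds
    one sample's marker values plus a 0/1 outcome (1 = case_group).
    """
    kept = [gene for gene in marker_genes if gene in expression]
    if not kept:
        raise ValueError("no requested marker genes found in the expression matrix")
    rows = []
    for sample, label in groups.items():
        if all(sample in expression[gene] for gene in kept):
            row = {"sample": sample, "outcome": 1 if label == case_group else 0}
            row.update({gene: float(expression[gene][sample]) for gene in kept})
            rows.append(row)
    return kept, rows

expression = {
    "FOXP3": {"Sample1": 8.4, "Sample2": 7.1, "Sample3": 3.8},
    "CD45": {"Sample1": 2.1, "Sample2": 1.9, "Sample3": 5.4},
}
groups = {"Sample1": "Disease", "Sample2": "Disease", "Sample3": "Control"}
kept, rows = build_analysis_rows(expression, groups, ["FOXP3", "CD45", "CD3E"], "Disease")
print(kept)     # CD3E is absent from this toy matrix, so it is dropped
print(rows[2])
```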

Step 3: Fit Logistic Regression

  • Fit a multivariable logistic regression model using the selected markers.
  • Extract coefficient estimates, standard errors, z statistics, p-values, and odds ratios with 95% confidence intervals.
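
The source does not show the fitting code, so the following is only a sketch of what a multivariable logistic fit computes: maximum-likelihood coefficients found by Newton-Raphson iterations on the log-likelihood. It is plain Python with hypothetical function names, not the skill's R implementation, and it has no safeguards: with perfectly separated groups or collinear markers the Hessian becomes singular or the iterations diverge, which is the kind of situation the skill reports as a logistic fitting failure.

```python
import math

def solve_linear(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]  # ZeroDivisionError if singular
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x

def fit_logistic(features, outcomes, max_iter=25, tol=1e-10):
    """Return [intercept, b1, ...] for 0/1 outcomes; features has one row per sample."""
    design = [[1.0] + list(row) for row in features]  # prepend intercept column
    p = len(design[0])
    beta = [0.0] * p
    for _ in range(max_iter):
        probs = [1.0 / (1.0 + math.exp(-sum(b * v for b, v in zip(beta, row))))
                 for row in design]
        # Gradient of the log-likelihood: X^T (y - mu)
        grad = [sum((y - mu) * row[j] for row, y, mu in zip(design, outcomes, probs))
                for j in range(p)]
        # Negative Hessian: X^T diag(mu * (1 - mu)) X
        hess = [[sum(mu * (1.0 - mu) * row[j] * row[k] for row, mu in zip(design, probs))
                 for k in range(p)] for j in range(p)]
        step = solve_linear(hess, grad)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta

# Toy data: one binary predictor, 1 of 4 positives at x = 0 and 3 of 4 at x = 1.
features = [[0], [0], [0], [0], [1], [1], [1], [1]]
outcomes = [1, 0, 0, 0, 1, 1, 1, 0]
print([round(b, 4) for b in fit_logistic(features, outcomes)])
```

For this toy data the maximum-likelihood solution has a closed form (intercept = log(1/3), slope = 2·log 3), which makes it a convenient sanity check for the iterations.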

Step 4: Compute ROC Performance

  • Generate the ROC curve of the full logistic model.
  • Generate ROC curves for each individual marker.
  • Calculate AUC values for the full model and each marker.
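
For intuition about Step 4, a ROC curve is the set of (1 - specificity, sensitivity) pairs obtained by sweeping a threshold over the scores, and the AUC is the area under those points. The Python sketch below is an illustration with hypothetical function names, not the skill's R code. For the full model the scores would be fitted probabilities; for a single marker, its expression values. Higher scores are treated as more case-like.

```python
def roc_points(scores, outcomes):
    """Return (fpr, tpr) pairs from (0, 0) to (1, 1) for 0/1 outcomes."""
    n_pos = sum(outcomes)
    n_neg = len(outcomes) - n_pos
    points = [(0.0, 0.0)]
    # Lower the threshold one distinct score at a time; call "case" when score >= t.
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, outcomes) if s >= threshold and y == 1)
        fp = sum(1 for s, y in zip(scores, outcomes) if s >= threshold and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points

def trapezoid_auc(points):
    """Area under consecutive (fpr, tpr) points by the trapezoid rule."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

scores = [0.9, 0.8, 0.7, 0.6]
outcomes = [1, 0, 1, 0]
curve = roc_points(scores, outcomes)
print(curve)
print(trapezoid_auc(curve))
```

Comparing the full-model AUC with each single-marker AUC shows how much the combination adds over the best individual marker, keeping in mind that an AUC computed on the same samples used for fitting tends to be optimistic.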

Step 5: Save Outputs

  • Save the matched analysis dataset and model bundle as .rds files.
  • Save coefficient and AUC summary tables as .csv files.
  • Save the combined ROC plot as a PDF.

Examples

Basic Usage

Rscript scripts/main.R \
  -e expression_matrix.csv \
  -g group_info.csv \
  -m FOXP3,CD45,CD3E \
  -c Disease \
  -o ./output/

With Explicit Group Column and Custom Plot

Rscript scripts/main.R \
  -e expression_matrix.csv \
  -g group_info.csv \
  -m FOXP3,CD45,CD3E \
  -c Disease \
  --group_col diagnosis \
  --plot_width 8 \
  --plot_height 6 \
  --plot_title "Biomarker ROC Comparison" \
  --legend_position topright \
  -o ./output/

With Test Data

Rscript scripts/main.R \
  -e tests/data/sample_expression_matrix.csv \
  -g tests/data/sample_group_info.csv \
  -m FOXP3,CD45,CD3E \
  -c Disease \
  -o tests/expected_output/ \
  --overwrite

Error Handling

| Error | Cause | Solution |
| --- | --- | --- |
| SKILL_INVALID_PARAMETER | Missing required argument, invalid option value, invalid matrix/group structure, invalid case label, insufficient case-control counts, or logistic fitting failure | Check argument names, input content, class balance, and model stability |
| SKILL_FILE_NOT_FOUND | Input file does not exist | Verify the file path |
| SKILL_EMPTY_DATA | Input file contains no usable rows, or no requested markers remain after filtering | Check file content, delimiter, and marker names |
| SKILL_MISSING_COLUMNS | Requested group column is absent | Verify --group_col and the group file header |
| SKILL_SAMPLE_MISMATCH | Expression matrix and group file do not share sample IDs | Verify that sample IDs match exactly between files |
| SKILL_PACKAGE_NOT_FOUND | Required R package is not installed | Install the missing CRAN package |

IF the error persists, READ: references/troubleshooting.md
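
An agent calling the skill may want to branch on the error code. The Python sketch below shows one way to do that; it assumes the SKILL_* code appears as a literal token in the script's stderr, which the source does not confirm, and the names run_skill and extract_skill_code are hypothetical. The sample message in the demo line is invented for illustration.

```python
import re
import subprocess

SKILL_CODE = re.compile(r"\bSKILL_[A-Z_]+\b")

def extract_skill_code(stderr_text):
    """Return the first SKILL_* token found in the text, or None."""
    match = SKILL_CODE.search(stderr_text)
    return match.group(0) if match else None

def run_skill(cli_args):
    """Run scripts/main.R with the given CLI arguments.

    Returns (exit_status, skill_code); skill_code is None when no
    SKILL_* token is found in stderr (assumed error format).
    """
    proc = subprocess.run(
        ["Rscript", "scripts/main.R", *cli_args],
        capture_output=True,
        text=True,
    )
    return proc.returncode, extract_skill_code(proc.stderr)

# Hypothetical stderr line, used only to exercise the parser.
print(extract_skill_code("Error: SKILL_FILE_NOT_FOUND: ./expression_matrix.csv"))
```

A caller could then map SKILL_SAMPLE_MISMATCH to a sample-ID check and SKILL_PACKAGE_NOT_FOUND to a package installation step, following the table above.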


Testing

Smoke Test With Included Data

Rscript scripts/main.R --help

Rscript scripts/main.R \
  -e tests/data/sample_expression_matrix.csv \
  -g tests/data/sample_group_info.csv \
  -m FOXP3,CD45,CD3E \
  -c Disease \
  -o tests/expected_output/ \
  --overwrite

Automated Smoke Test Script

Rscript tests/run_smoke_test.R

Optional shell wrapper:

bash tests/run_smoke_test.sh

Expected Output

tests/expected_output/
|-- data/analysis_data.rds
|-- data/roc_model.rds
|-- plot/roc_curve.pdf
|-- session_info.txt
|-- table/model_coefficients.csv
`-- table/roc_auc_summary.csv
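
The tree above can be turned into a simple post-run check. The Python sketch below only verifies that each listed file exists, not that its contents are valid; the function name missing_outputs is hypothetical.

```python
from pathlib import Path

# Relative paths copied from the expected-output tree above.
EXPECTED_OUTPUTS = [
    "data/analysis_data.rds",
    "data/roc_model.rds",
    "plot/roc_curve.pdf",
    "session_info.txt",
    "table/model_coefficients.csv",
    "table/roc_auc_summary.csv",
]

def missing_outputs(output_dir):
    """Return the expected files that are absent from output_dir."""
    root = Path(output_dir)
    return [rel for rel in EXPECTED_OUTPUTS if not (root / rel).is_file()]

print(missing_outputs("tests/expected_output/"))  # empty list after a successful run
```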

References

  1. Hosmer DW, Lemeshow S, Sturdivant RX (2013). Applied Logistic Regression.
  2. Fawcett T (2006). An Introduction to ROC Analysis. Pattern Recognition Letters.
  3. Robin X et al. (2011). pROC: an open-source package for R and S+ to analyze and compare ROC curves. BMC Bioinformatics.

For algorithm details, READ: references/algorithm.md


Implementation Checklist

  • CLI parsing with optparse
  • set.seed() for reproducibility
  • requireNamespace() dependency checks
  • Session info recording
  • Timeout parameter exposed as CLI option
  • File reading instructions in SKILL.md
  • Modular script structure in scripts/
  • Test data provided in tests/data/
  • Error handling with SKILL_* codes
  • References documented in references/

Last updated: 2026-04-17 | Version: 2.1.0