Lasso Logistics Analysis

AIPOCH

Use when building a binary classification model from an expression matrix or other omics feature matrix with LASSO logistic regression, cross-validation, and coefficient path visualization. NOT for: multiclass classification, survival/Cox models, or ordinary linear regression.

FILES

lasso-logistics-analysis/
  skill.md
  scripts/
    cli_utils.R
    io.R
    main.R
    modeling.R
    plotting.R
    run_analysis.R
    runtime_utils.R
    utils.R
  references/
    algorithm.md
    cli-guide.md
    troubleshooting.md
Evaluation Report

Total Score: 85 / 100

Core Capability: 94 / 100

| Category | Score |
|---|---|
| Functional Suitability | 11 / 12 |
| Reliability | 12 / 12 |
| Performance & Context | 8 / 8 |
| Agent Usability | 14 / 16 |
| Human Usability | 8 / 8 |
| Security | 12 / 12 |
| Maintainability | 11 / 12 |
| Agent-Specific | 18 / 20 |

Medical Task: 21 / 25 Passed

| Test case | Score | Checks passed |
|---|---|---|
| Binary classification on expression_matrix.csv vs groups.csv | 78 | 4/5 |
| Custom --nfolds 5 and --seed 123 | 78 | 4/5 |
| Optional feature panel restriction from genes.csv | 78 | 4/5 |
| Custom CV plot title via --cv_title flag | 78 | 4/5 |
| Missing --case_group argument validation test | 81 | 5/5 |

SKILL.md

LASSO Logistic Regression Analysis

When to Read External Files

| Situation | File to Read | Purpose |
|---|---|---|
| Need algorithm details | references/algorithm.md | LASSO objective function, cross-validation, and interpretation |
| Need to run analysis | scripts/main.R | Execute: Rscript scripts/main.R --input_file ... --group_file ... |
| Encounter errors | references/troubleshooting.md | Common errors and solutions |
| Need CLI examples | references/cli-guide.md | Detailed CLI usage examples |
| Need test data | tests/data/ | Sample input files for testing |
| Need workflow implementation details | scripts/run_analysis.R | Inspect orchestration, outputs, and file-writing behavior |
| Need input-validation or error-handling details | scripts/utils.R, scripts/io.R | Inspect validation, parsing, logging, and standardized safeguards |

Usage

Rscript scripts/main.R \
  --input_file ./expression_matrix.csv \
  --group_file ./groups.csv \
  --case_group case \
  --control_group control \
  --output_dir ./output/ \
  --nfolds 10 \
  --timeout_seconds 1800 \
  --seed 42

Arguments

| Short | Long | Type | Default | Description |
|---|---|---|---|---|
| -i | --input_file | character | required | Expression matrix file (features as rows, samples as columns) |
| -g | --group_file | character | required | Group file with sample and group columns |
| -c | --case_group | character | required | Case class label, encoded as 1 |
| -t | --control_group | character | required | Control class label, encoded as 0 |
| -f | --feature | character | NULL | Optional feature list file or comma-separated feature names |
| -n | --nfolds | integer | 10 | Number of cross-validation folds; supported values: 3, 5, 7, 10 |
| | --cv_title | character | "" | Optional title for the cross-validation plot |
| | --path_title | character | "" | Optional title for the coefficient path plot |
| | --timeout_seconds | integer | 1800 | Maximum elapsed runtime in seconds |
| -o | --output_dir | character | ./output/ | Output directory |
| -s | --seed | integer | 42 | Random seed for reproducibility |
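
As a rough illustration of how these arguments could be declared with optparse (which the implementation checklist names as the CLI parser), the sketch below defines a subset of the options and parses an explicit argument vector. The help strings and the required-argument check are assumptions for illustration, not the contents of scripts/cli_utils.R.

```r
library(optparse)

# Option definitions taken from the Arguments table; help strings are abbreviated.
option_list <- list(
  make_option(c("-i", "--input_file"), type = "character", default = NULL,
              help = "Expression matrix file"),
  make_option(c("-g", "--group_file"), type = "character", default = NULL,
              help = "Group file"),
  make_option(c("-c", "--case_group"), type = "character", default = NULL,
              help = "Case label (encoded as 1)"),
  make_option(c("-t", "--control_group"), type = "character", default = NULL,
              help = "Control label (encoded as 0)"),
  make_option(c("-n", "--nfolds"), type = "integer", default = 10L,
              help = "Cross-validation folds"),
  make_option("--timeout_seconds", type = "integer", default = 1800L,
              help = "Maximum elapsed runtime in seconds"),
  make_option(c("-s", "--seed"), type = "integer", default = 42L,
              help = "Random seed")
)
parser <- OptionParser(option_list = option_list)

# Parse an explicit vector so the sketch runs without a real command line.
opt <- parse_args(parser, args = c("-i", "expression_matrix.csv",
                                   "-g", "groups.csv",
                                   "-c", "case", "-t", "control", "-n", "5"))

# Required arguments have no default, so NULL means the flag was omitted.
required <- c("input_file", "group_file", "case_group", "control_group")
missing_args <- required[vapply(required, function(k) is.null(opt[[k]]), logical(1))]
if (length(missing_args) > 0) {
  stop("Missing required arguments: ", paste(missing_args, collapse = ", "))
}
print(opt$nfolds)
```

Options that have no short flag in the table, such as --timeout_seconds, are declared with the long name only.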

Input Format

Expression Matrix (input_file)

Features as rows, samples as columns, CSV or TSV format with feature IDs in the first column.

,Sample01,Sample02,Sample03
TSPAN6,1.8479,1.8318,3.8276
TNMD,0.0349,0.0533,1.3889

Group File (group_file)

CSV or TSV with sample IDs and binary-group labels.

sample,group
Sample01,case
Sample02,control
Sample03,case

Optional Feature File (feature)

One feature per line, or pass a comma-separated feature list directly on the CLI.

TNMD
DPM1
SCYL3
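
One way to support both forms of --feature is to treat the value as a file path when such a file exists and as a comma-separated list otherwise. The helper below, parse_features, is a hypothetical name used for illustration; the skill's actual parsing may differ (for example, in how it handles header rows in a CSV file).

```r
# Hypothetical helper: file path if the file exists, else comma-separated names.
parse_features <- function(feature) {
  if (is.null(feature) || !nzchar(feature)) {
    return(NULL)  # no panel requested
  }
  raw <- if (file.exists(feature)) {
    readLines(feature, warn = FALSE)            # one feature per line
  } else {
    strsplit(feature, ",", fixed = TRUE)[[1]]   # inline list from the CLI
  }
  feats <- trimws(raw)
  unique(feats[nzchar(feats)])                  # drop blanks and duplicates
}

# Demonstrate both input forms with a temporary file.
tmp <- tempfile(fileext = ".txt")
writeLines(c("TNMD", "DPM1", "SCYL3"), tmp)
from_file <- parse_features(tmp)
from_cli  <- parse_features("TNMD, DPM1,SCYL3")
print(from_file)
print(from_cli)
```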

Output Files

| File | Description |
|---|---|
| coefficient.csv | All coefficients at lambda.min |
| feature_matrix.csv | Sample-level matrix with original group labels and binary event column |
| selected_features.txt | Non-zero features at lambda.min excluding the intercept, when available |
| missing_features.txt | Requested features not found in the matrix, when applicable |
| lasso_lambda_binary_plot.pdf | Cross-validation curve |
| lasso_var_binary_plot.pdf | Coefficient path plot |
| session_info.txt | R session and package version info |

Workflow

Step 1: Validate Input

WHEN checking validation rules or parsing behavior, READ: scripts/utils.R and scripts/io.R

  • Check file existence
  • Read expression matrix and group file
  • Verify samples match between files
  • Ensure both classes are present with at least 2 samples per class
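
The checks above can be sketched in base R roughly as follows. The function name validate_inputs, the exact-match rule for sample IDs, and the message wording are assumptions for illustration; the real logic lives in scripts/utils.R and scripts/io.R.

```r
# Hypothetical validator; expr has features as rows and sample IDs as column names.
validate_inputs <- function(expr, groups, case_group, control_group,
                            min_per_class = 2) {
  if (!all(c("sample", "group") %in% colnames(groups))) {
    stop("SKILL_MISSING_COLUMNS: group file needs sample and group columns")
  }
  # Assumed rule: both files must list exactly the same sample IDs.
  if (!setequal(colnames(expr), groups$sample)) {
    stop("SKILL_SAMPLE_MISMATCH: sample IDs differ between matrix and group file")
  }
  # Count samples per class; labels other than case/control are ignored.
  counts <- table(factor(groups$group, levels = c(control_group, case_group)))
  if (any(counts < min_per_class)) {
    stop("SKILL_INVALID_DATA: each class needs at least ", min_per_class, " samples")
  }
  invisible(counts)
}

# Toy data in the documented layout (values are random).
expr <- matrix(rnorm(8), nrow = 2,
               dimnames = list(c("TSPAN6", "TNMD"), paste0("Sample0", 1:4)))
groups <- data.frame(sample = paste0("Sample0", 1:4),
                     group = c("case", "control", "case", "control"))
counts <- validate_inputs(expr, groups, "case", "control")
print(counts)
```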

Step 2: Prepare Modeling Matrix

WHEN checking class encoding or feature filtering behavior, READ: scripts/modeling.R

  • Encode case_group as 1 and control_group as 0
  • Optionally restrict to a user-supplied feature panel
  • Transpose expression data to sample-by-feature format
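
A minimal sketch of this preparation step, using made-up values in the documented input layout. The variable names (panel, missing_features, x, y) are illustrative and not taken from scripts/modeling.R.

```r
# Toy inputs mirroring the documented formats (values are made up).
expr <- data.frame(
  Sample01 = c(1.8, 0.03, 2.1),
  Sample02 = c(1.7, 0.05, 2.4),
  Sample03 = c(3.8, 1.39, 0.9),
  row.names = c("TSPAN6", "TNMD", "DPM1")
)
groups <- data.frame(sample = c("Sample01", "Sample02", "Sample03"),
                     group  = c("case", "control", "case"))
panel <- c("TNMD", "DPM1", "SCYL3")   # optional user-supplied feature panel

# Restrict to requested features that exist; record the ones that do not.
missing_features <- setdiff(panel, rownames(expr))
expr <- expr[intersect(panel, rownames(expr)), , drop = FALSE]

# Transpose to sample-by-feature and align rows with the group file order.
x <- t(as.matrix(expr))[groups$sample, , drop = FALSE]

# Encode case as 1 and control as 0.
y <- ifelse(groups$group == "case", 1L, 0L)

print(x)
print(y)
print(missing_features)
```

Aligning the rows of x to groups$sample before encoding keeps each label paired with the right sample even when the two files list samples in different orders.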

Step 3: Fit LASSO Logistic Regression

WHEN understanding the statistical method or lambda selection, READ: references/algorithm.md

  • Train a binomial glmnet model with alpha = 1
  • Run cv.glmnet to select the optimal lambda
  • Extract coefficients at lambda.min
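
The three calls above might look like the following on simulated data. The data-generating setup and object names are assumptions made so the example runs on its own; only the glmnet and cv.glmnet arguments mirror the workflow described here.

```r
library(glmnet)

set.seed(42)
# Simulated sample-by-feature matrix: 100 samples, 20 features, 2 informative.
n <- 100
p <- 20
x <- matrix(rnorm(n * p), nrow = n,
            dimnames = list(paste0("S", seq_len(n)), paste0("gene", seq_len(p))))
y <- rbinom(n, 1, plogis(2 * x[, 1] - 2 * x[, 2]))

# Full regularization path; alpha = 1 selects the pure L1 (LASSO) penalty.
fit <- glmnet(x, y, family = "binomial", alpha = 1)

# 10-fold cross-validation over the lambda sequence.
cvfit <- cv.glmnet(x, y, family = "binomial", alpha = 1, nfolds = 10)

# Coefficients at lambda.min; the first row is the intercept.
coefs <- as.matrix(coef(cvfit, s = "lambda.min"))
selected <- setdiff(rownames(coefs)[coefs[, 1] != 0], "(Intercept)")

print(cvfit$lambda.min)
print(selected)
```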

Step 4: Save Results and Visualizations

WHEN checking output generation or plot behavior, READ: scripts/run_analysis.R and scripts/plotting.R

  • Save flat output files directly into output_dir
  • Generate cross-validation and coefficient path PDF plots
  • Leave plot titles empty by default unless the user provides custom titles
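
As an illustration of the plotting step, the sketch below writes both PDFs with glmnet's built-in plot methods and base graphics. Whether scripts/plotting.R uses these methods, and the page dimensions chosen here, are assumptions; the file names match the Output Files section.

```r
library(glmnet)

set.seed(42)
# Small simulated problem so the example is self-contained.
x <- matrix(rnorm(100 * 10), nrow = 100)
y <- rbinom(100, 1, plogis(x[, 1]))
fit   <- glmnet(x, y, family = "binomial", alpha = 1)
cvfit <- cv.glmnet(x, y, family = "binomial", alpha = 1, nfolds = 5)

out_dir <- file.path(tempdir(), "lasso_output")
dir.create(out_dir, showWarnings = FALSE, recursive = TRUE)
cv_title   <- ""   # empty unless the user supplies --cv_title
path_title <- ""   # empty unless the user supplies --path_title

# Cross-validation curve: binomial deviance against log(lambda).
cv_pdf <- file.path(out_dir, "lasso_lambda_binary_plot.pdf")
pdf(cv_pdf, width = 7, height = 5)
plot(cvfit)
title(main = cv_title, line = 2.5)
dev.off()

# Coefficient paths: one line per feature as lambda changes.
path_pdf <- file.path(out_dir, "lasso_var_binary_plot.pdf")
pdf(path_pdf, width = 7, height = 5)
plot(fit, xvar = "lambda", label = TRUE)
title(main = path_title, line = 2.5)
dev.off()

print(list.files(out_dir))
```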

Methods

LASSO Logistic Regression

The model minimizes binomial deviance with an L1 penalty, shrinking weak coefficients to zero and performing embedded feature selection.
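
For reference, the objective that glmnet documents for the binomial family with alpha = 1 can be written as below. The notation (N samples, feature vector x_i, label y_i in {0, 1}, intercept beta_0, coefficients beta, penalty weight lambda) is introduced here for illustration; references/algorithm.md may use different symbols.

```latex
\min_{\beta_0,\,\beta}\;
  -\frac{1}{N}\sum_{i=1}^{N}
    \Big[\, y_i\,(\beta_0 + x_i^{\top}\beta)
          - \log\!\big(1 + e^{\beta_0 + x_i^{\top}\beta}\big) \Big]
  \;+\; \lambda \lVert \beta \rVert_1
```

The first term is the average negative log-likelihood of the logistic model, and the second is the L1 penalty that drives weak coefficients to exactly zero. The intercept is not penalized.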

Cross-Validation

cv.glmnet evaluates candidate lambda values across nfolds folds and reports lambda.min and lambda.1se.
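
To make the two reported values concrete: lambda.min is the lambda with the lowest mean cross-validated error, and lambda.1se is the largest lambda whose error is within one standard error of that minimum, so it usually gives a sparser model. The sketch below compares them on simulated data; the skill itself extracts coefficients at lambda.min, and the helper count_nonzero is a hypothetical name.

```r
library(glmnet)

set.seed(123)
n <- 120
p <- 30
x <- matrix(rnorm(n * p), nrow = n,
            dimnames = list(NULL, paste0("gene", seq_len(p))))
y <- rbinom(n, 1, plogis(1.5 * x[, 1] - 1.5 * x[, 2] + x[, 3]))

cvfit <- cv.glmnet(x, y, family = "binomial", alpha = 1, nfolds = 5)

# Hypothetical helper: number of non-zero coefficients, intercept excluded.
count_nonzero <- function(s) {
  b <- as.matrix(coef(cvfit, s = s))
  sum(b[-1, 1] != 0)
}

summary_tbl <- data.frame(
  rule       = c("lambda.min", "lambda.1se"),
  lambda     = c(cvfit$lambda.min, cvfit$lambda.1se),
  n_features = c(count_nonzero("lambda.min"), count_nonzero("lambda.1se"))
)
print(summary_tbl)
```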


Examples

Basic Usage

Rscript scripts/main.R \
  -i ./expression_matrix.csv \
  -g ./groups.csv \
  -c case \
  -t control \
  -o ./output

Use a Feature Panel

Rscript scripts/main.R \
  -i ./expression_matrix.csv \
  -g ./groups.csv \
  -c case \
  -t control \
  -f ./genes.txt \
  -o ./output

Custom Folds and Seed

Rscript scripts/main.R \
  -i ./expression_matrix.csv \
  -g ./groups.csv \
  -c case \
  -t control \
  -n 5 \
  --timeout_seconds 900 \
  -s 123 \
  -o ./output

Custom Plot Titles

Rscript scripts/main.R \
  -i ./expression_matrix.csv \
  -g ./groups.csv \
  -c case \
  -t control \
  --cv_title "LASSO Cross-Validation" \
  --path_title "LASSO Coefficient Paths" \
  --timeout_seconds 1200 \
  -o ./output

Error Handling

Common Errors

| Error | Cause | Solution |
|---|---|---|
| SKILL_FILE_NOT_FOUND | Input file does not exist | Check file path |
| SKILL_EMPTY_FILE | An input file exists but contains no data | Verify the file is not empty |
| SKILL_PARSE_ERROR | The input file cannot be parsed as CSV or TSV | Check delimiters, headers, and encoding |
| SKILL_FILE_WRITE_ERROR | The output directory cannot be created or written | Check output path and permissions |
| SKILL_EMPTY_DATA | The loaded table has no usable rows or columns | Verify that the input file contains valid data |
| SKILL_MISSING_COLUMNS | The group file does not provide the required columns | Provide sample and group columns |
| SKILL_INVALID_TYPE | A parameter or data field has the wrong type | Ensure numeric fields are numeric and strings are valid |
| SKILL_SAMPLE_MISMATCH | Sample IDs differ between matrix and group file | Make names match exactly |
| SKILL_INVALID_GROUP | Case/control labels not found in group file | Check --case_group and --control_group |
| SKILL_INVALID_DATA | Too few classes, samples, or valid features | Review input structure and feature list |
| SKILL_INVALID_PARAMETER | Unsupported nfolds or empty parameter | Use documented argument values |
| SKILL_DEPENDENCY_MISSING | Required R package not installed | Install missing CRAN package |
| SKILL_TIMEOUT | Analysis exceeded the configured time limit | Reduce feature count or increase --timeout_seconds |
| SKILL_MEMORY_ERROR | The runtime environment cannot allocate enough memory | Reduce matrix size or overall workload |
| SKILL_RUNTIME_ERROR | An unexpected runtime error occurred | Review the exact console error and retry |

IF error persists, READ: references/troubleshooting.md
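
One possible way to raise and catch errors that carry a SKILL_* code is to attach the code as an R condition class. The helper skill_stop, the skill_error class, and the message format below are hypothetical; the skill's own error helpers in scripts/utils.R may be structured differently.

```r
# Hypothetical helper: signal an error whose class includes the SKILL_* code.
skill_stop <- function(code, message) {
  cond <- structure(
    list(message = sprintf("[%s] %s", code, message), call = sys.call(-1)),
    class = c(code, "skill_error", "error", "condition")
  )
  stop(cond)
}

# Example check using the documented set of supported fold counts.
check_nfolds <- function(nfolds) {
  if (!nfolds %in% c(3, 5, 7, 10)) {
    skill_stop("SKILL_INVALID_PARAMETER", "nfolds must be one of 3, 5, 7, 10")
  }
  invisible(nfolds)
}

# Callers can branch on the code without parsing the message text.
result <- tryCatch(
  check_nfolds(4),
  skill_error = function(e) class(e)[1]
)
print(result)
# [1] "SKILL_INVALID_PARAMETER"
```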


Testing

Test with Sample Data

# Check help
Rscript scripts/main.R --help

# Run with sample data
Rscript scripts/main.R \
  -i tests/data/expression_matrix.csv \
  -g tests/data/groups.csv \
  -c case \
  -t control \
  -f tests/data/genes.csv \
  --timeout_seconds 1800 \
  -o tests/output

Validation Commands

# Check coefficient output
ls -la tests/output/coefficient.csv

# Check plots exist
ls -la tests/output/lasso_lambda_binary_plot.pdf
ls -la tests/output/lasso_var_binary_plot.pdf

Implementation Checklist

  • CLI parsing with optparse
  • set.seed() for reproducibility
  • requireNamespace() dependency checks
  • Session info recording
  • Timeout control with --timeout_seconds
  • Temp file cleanup
  • File reading instructions in SKILL.md
  • Modular script structure (<150 lines per file)
  • Test data provided
  • Error handling with SKILL_* codes
  • Scripts in scripts/ directory
  • References in references/ directory
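
Several checklist items (dependency checks, seeding, timeout control, session info) can be sketched in base R as follows. This is an assumption-laden illustration: the package list is a stand-in, with_time_limit is a hypothetical wrapper around setTimeLimit, and the skill's real timeout handling in scripts/runtime_utils.R may work differently.

```r
# Dependency check without attaching packages. Base packages stand in here;
# the skill itself would check packages such as glmnet and optparse.
required <- c("stats", "utils")
installed <- vapply(required, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- required[!installed]
if (length(missing_pkgs) > 0) {
  stop("SKILL_DEPENDENCY_MISSING: ", paste(missing_pkgs, collapse = ", "))
}

# Seeding makes random draws (and therefore CV fold assignment) repeatable.
set.seed(42)
a <- sample(10)
set.seed(42)
b <- sample(10)
same_draw <- identical(a, b)

# Hypothetical wrapper: abort the call if it runs longer than `seconds`.
with_time_limit <- function(fun, seconds) {
  setTimeLimit(elapsed = seconds, transient = TRUE)
  on.exit(setTimeLimit(elapsed = Inf), add = TRUE)  # always lift the limit
  fun()
}
status <- with_time_limit(function() { Sys.sleep(0.1); "ok" }, seconds = 5)

# Record the R version and loaded packages for reproducibility.
session_file <- file.path(tempdir(), "session_info.txt")
writeLines(capture.output(sessionInfo()), session_file)

print(same_draw)
print(status)
```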

Last updated: 2026-04-17 | Version: 1.0.0