
Elastic Net Feature Selection

AIPOCH

Use when selecting predictive genes or other molecular features from bulk expression matrices for binary case-vs-control classification with elastic net logistic regression, including coefficient path and cross-validation plots. Trigger keywords: elastic net, glmnet, feature selection, binary classification, lambda.min, lambda.1se. NOT for: survival/Cox modeling, multiclass outcomes, single-cell data, or non-expression tables.

FILES
elastic-net-feature-selection/
  skill.md
  scripts/
    functions.R
    io.R
    main.R
    modeling.R
    output.R
    run_analysis.R
    utils.R
    validation.R
  references/
    algorithm.md
    cli-guide.md
    troubleshooting.md
Total Score: 87 / 100

Core Capability: 98 / 100

  • Functional Suitability: 12 / 12
  • Reliability: 12 / 12
  • Performance & Context: 8 / 8
  • Agent Usability: 15 / 16
  • Human Usability: 8 / 8
  • Security: 12 / 12
  • Maintainability: 12 / 12
  • Agent-Specific: 19 / 20

Medical Task: 21 / 25 Passed

| Score | Task | Passed |
|---|---|---|
| 80 | Binary classification alpha=0.5 on expression_matrix.csv vs groups.csv | 4/5 |
| 80 | Auto alpha selection with alpha_grid 0,0.25,0.5,0.75,1 | 4/5 |
| 80 | alpha=0 (ridge mode) to verify empty selected_features.csv behavior | 4/5 |
| 80 | Conservative selection with lambda_choice=lambda.1se | 4/5 |
| 82 | Out-of-scope label in group file SKILL_INVALID_DATA enforcement test | 5/5 |

SKILL.md

Elastic Net Feature Selection

When to Use

  • Use this skill for binary case-vs-control classification on bulk expression matrices.
  • Use it when you need elastic net logistic regression feature selection, coefficient paths, and cv.glmnet-based lambda selection.
  • Use custom labels such as Tumor and Normal only when the group file still contains exactly two outcome levels.

Out of Scope

  • Survival or Cox modeling
  • Multiclass outcomes
  • Single-cell data
  • Non-expression tables

Out-of-scope enforcement:

  • If the group file contains any label outside the requested case_group and control_group, the command stops with SKILL_INVALID_DATA instead of silently dropping samples.
  • If either requested class is missing after validation, the command stops with SKILL_INVALID_DATA.
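The two enforcement rules above can be sketched as a small check. This is an illustrative Python sketch, not the skill's R code; the exception class and function name are hypothetical, and only the SKILL_INVALID_DATA code and the default labels come from this document.

```python
class SkillInvalidData(ValueError):
    """Hypothetical stand-in for the skill's SKILL_INVALID_DATA stop condition."""


def check_binary_labels(labels, case_group="case", control_group="control"):
    """Return the labels unchanged if they form a valid case/control comparison."""
    allowed = {case_group, control_group}
    unexpected = sorted(set(labels) - allowed)
    if unexpected:
        # Stop instead of silently dropping the offending samples.
        raise SkillInvalidData(f"SKILL_INVALID_DATA: unexpected labels {unexpected}")
    missing = sorted(allowed - set(labels))
    if missing:
        # Both requested classes must be present after validation.
        raise SkillInvalidData(f"SKILL_INVALID_DATA: missing class {missing}")
    return list(labels)
```

Raising on the first unexpected label, rather than filtering it out, keeps the sample count that reaches the model identical to the sample count the user supplied.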

When to Read External Files

| Situation | File to Read | Purpose |
|---|---|---|
| Need to understand alpha, lambda choice, or feature-selection behavior | references/algorithm.md | Elastic net logistic regression, penalty mixing, cross-validation, and coefficient selection assumptions |
| Need the authoritative executable entrypoint | scripts/main.R | Run: Rscript scripts/main.R --input_file ... --group_file ... --output_dir ... |
| Need parameter examples, smoke-test commands, or recorded local runs | references/cli-guide.md | Verified CLI examples for normal runs, conservative runs, and test-data runs |
| Need bundled sample inputs for a first run or regression test | tests/data/ | Sample expression matrix, group file, and feature list |
| Encounter errors, warnings, or timeout issues | references/troubleshooting.md | Common failures, console warning interpretation, and recovery steps |

Usage

Rscript scripts/main.R \
  --input_file ./expression_matrix.csv \
  --group_file ./groups.csv \
  --feature_file ./genes.csv \
  --case_group case \
  --control_group control \
  --alpha auto \
  --alpha_grid 0,0.25,0.5,0.75,1 \
  --nfolds 5 \
  --lambda_choice lambda.min \
  --standardize TRUE \
  --timeout_seconds 600 \
  --output_dir ./output/ \
  --seed 42

Arguments

| Short | Long | Type | Default | Description |
|---|---|---|---|---|
| -i | --input_file | character | required | Expression matrix file (genes as rows, samples as columns) |
| -g | --group_file | character | required | Group information file with sample and group columns |
| -f | --feature_file | character | NULL | Optional feature list file; if omitted, all matrix rows are used |
| -c | --case_group | character | case | Positive class label in the group file |
| -d | --control_group | character | control | Negative class label in the group file |
| -a | --alpha | character | 0.5 | Elastic net mixing parameter: numeric 0-1, or auto for CV-based selection |
|  | --alpha_grid | character | 0,0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9,1 | Comma-separated alpha candidates evaluated when alpha=auto |
| -n | --nfolds | integer | 5 | Cross-validation fold count; automatically reduced if a class has fewer samples than folds |
| -l | --lambda_choice | character | lambda.min | Coefficient extraction rule: lambda.min or lambda.1se |
| -z | --standardize | logical | TRUE | Standardize features inside glmnet |
| -t | --timeout_seconds | integer | 600 | Elapsed timeout limit in seconds |
| -o | --output_dir | character | ./output/ | Output directory |
| -s | --seed | integer | 42 | Random seed for reproducibility |

Input Format

Expression Matrix (input_file)

Genes as rows, samples as columns, CSV format with gene IDs in the first column.

,Sample01,Sample02,Sample03
TNMD,0.0349,0.0533,1.3889
DPM1,4.8627,5.4208,5.6370

Group File (group_file)

CSV with sample IDs and binary group labels.

sample,group
Sample01,case
Sample02,control
Sample03,case

Feature File (feature_file)

Optional plain text or single-column CSV file with one feature per line.

TNMD
DPM1
SCYL3

Output Files

| File | Description |
|---|---|
| alpha_tuning.csv | Cross-validated performance summary for each alpha candidate |
| model_coefficients.csv | Coefficients at the selected lambda, including intercept |
| selected_features.csv | Sparse selected features sorted by absolute effect size; written empty when the chosen alpha is 0 (ridge) |
| feature_matrix.csv | Sample-by-feature analysis matrix used for model fitting |
| coefficient_path.pdf | Coefficient trajectory plot across lambda values |
| cv_curve.pdf | Cross-validation error curve with lambda.min and lambda.1se |
| session_info.txt | R session and package version info |

Workflow

Step 1: Validate Input

  • WHEN preparing input files for a first run or regression test, READ: tests/data/
  • Check file existence
  • Reject empty input files
  • Detect sample and group columns in the group file
  • Reject group files that contain labels outside the requested binary comparison
  • Validate sample matching between expression matrix and group file
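The column and sample-matching checks in this step might look roughly like the Python sketch below. It is illustrative only and not the skill's R code: the helper names are hypothetical, and it assumes the group file uses the literal column names sample and group, whereas the skill itself detects these columns in a way this document does not spell out.

```python
import csv
import io


def matrix_sample_ids(matrix_csv_text):
    """Sample IDs are the header cells after the first (gene ID) column."""
    header = next(csv.reader(io.StringIO(matrix_csv_text)))
    return header[1:]


def group_labels(group_csv_text):
    """Map sample ID -> group label from a CSV with 'sample' and 'group' columns."""
    reader = csv.DictReader(io.StringIO(group_csv_text))
    if reader.fieldnames is None or not {"sample", "group"} <= set(reader.fieldnames):
        raise ValueError("SKILL_MISSING_COLUMNS: need 'sample' and 'group' columns")
    return {row["sample"]: row["group"] for row in reader}


def shared_samples(matrix_csv_text, group_csv_text):
    """Return matrix samples that also appear in the group file, in matrix order."""
    labels = group_labels(group_csv_text)
    shared = [s for s in matrix_sample_ids(matrix_csv_text) if s in labels]
    if not shared:
        raise ValueError("SKILL_SAMPLE_MISMATCH: no overlapping sample IDs")
    return shared
```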

Step 2: Prepare Modeling Matrix

  • Restrict samples to the requested case and control groups
  • Intersect the optional feature list with matrix row names
  • Build a sample-by-feature numeric matrix for glmnet
  • Drop zero-variance features before modeling
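As a rough illustration of the matrix preparation above, the Python sketch below filters to the requested features, drops constant features, and transposes genes-by-samples data into a sample-by-feature layout. It is not the skill's R code; the function name and the in-memory dict representation are assumptions made for the example.

```python
def build_feature_matrix(expr, sample_ids, keep_samples, features=None):
    """Build a sample-by-feature matrix from genes-by-samples data.

    expr maps gene ID -> list of values aligned with sample_ids.
    Returns (feature_names, matrix) with one matrix row per kept sample.
    """
    wanted = set(features) if features is not None else None
    col = {s: j for j, s in enumerate(sample_ids)}
    names, columns = [], []
    for gene, values in expr.items():
        if wanted is not None and gene not in wanted:
            continue  # not in the optional feature list
        column = [float(values[col[s]]) for s in keep_samples]
        if max(column) == min(column):
            continue  # zero variance across the kept samples
        names.append(gene)
        columns.append(column)
    # Transpose the per-feature columns into one row per sample.
    matrix = [list(row) for row in zip(*columns)]
    return names, matrix
```

Features listed in the feature file but absent from the matrix simply never match, which mirrors the intersection described in this step.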

Step 3: Run Elastic Net

  • WHEN deciding between alpha, lambda.min, and lambda.1se, READ: references/algorithm.md
  • If alpha=auto, evaluate the candidate alpha_grid with the same cross-validation folds
  • Fit the regularization path with glmnet
  • Run cv.glmnet to estimate the optimal lambda
  • Extract coefficients at lambda.min or lambda.1se
  • Apply runtime timeout and capture non-fatal warnings
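One way to obtain "the same cross-validation folds" for every alpha is to compute fold IDs once and reuse them, which in glmnet corresponds to passing a foldid vector to cv.glmnet. The Python sketch below illustrates the idea under two labeled assumptions that this document does not confirm: that the fold count is capped at the smallest class size, and that folds are stratified by class. The function names are hypothetical.

```python
import random


def effective_nfolds(labels, requested):
    """Assumed rule: cap the fold count at the smallest class size."""
    smallest = min(labels.count(value) for value in set(labels))
    nfolds = min(requested, smallest)
    if nfolds < 3:
        # cv.glmnet itself rejects fewer than 3 folds.
        raise ValueError("SKILL_INVALID_DATA: too few samples per class for CV")
    return nfolds


def stratified_fold_ids(labels, nfolds, seed=42):
    """Assign each sample a fold ID in 1..nfolds, balancing classes across folds."""
    rng = random.Random(seed)  # fixed seed keeps the folds reproducible
    fold_ids = [0] * len(labels)
    for value in sorted(set(labels)):
        members = [i for i, lab in enumerate(labels) if lab == value]
        rng.shuffle(members)
        for k, i in enumerate(members):
            fold_ids[i] = k % nfolds + 1
    return fold_ids
```

Because the fold IDs are fixed before any model is fitted, differences in cross-validated error between alpha candidates reflect the penalty rather than a different random split.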

Step 4: Export Results

  • WHEN you need exact invocation patterns or output inspection commands, READ: references/cli-guide.md
  • Save tuning tables and selected features
  • Generate coefficient path and cross-validation plots
  • Record session information for reproducibility

Methods

Elastic Net Logistic Regression

Elastic net combines lasso (L1) and ridge (L2) penalties through alpha, enabling sparse feature selection while stabilizing correlated predictors.
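As a small numerical illustration of how alpha mixes the two penalties, the Python sketch below evaluates the penalty term in the form given in the glmnet documentation, lambda * ((1 - alpha) / 2 * sum(beta^2) + alpha * sum(|beta|)). The function name is hypothetical and this is not the skill's R code; the intercept is assumed to be excluded from the coefficient list, as it is unpenalized.

```python
def elastic_net_penalty(coefs, lam, alpha):
    """Elastic net penalty for non-intercept coefficients.

    alpha = 1 gives the lasso (L1) penalty, alpha = 0 the ridge (L2) penalty.
    """
    l1 = sum(abs(b) for b in coefs)
    l2 = sum(b * b for b in coefs)
    return lam * ((1.0 - alpha) / 2.0 * l2 + alpha * l1)
```

Only the L1 part can shrink coefficients to exactly zero, which is why alpha = 0 yields a dense model with no sparse feature set.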

Cross-Validation

cv.glmnet evaluates the lambda path and reports both lambda.min and the more conservative lambda.1se.
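The two reported values follow a simple rule: lambda.min minimizes the mean cross-validated error, and lambda.1se is the largest lambda whose mean error is within one standard error of that minimum. The Python sketch below re-implements that rule for illustration only; the function name is hypothetical and the inputs stand in for the lambda, cvm, and cvsd components of a cv.glmnet result.

```python
def pick_lambdas(lambdas, cv_mean, cv_se):
    """Return (lambda_min, lambda_1se) from a cross-validation curve."""
    lowest = min(cv_mean)
    # Ties at the minimum resolve to the largest (most regularized) lambda.
    lambda_min = max(l for l, m in zip(lambdas, cv_mean) if m == lowest)
    se_at_min = cv_se[lambdas.index(lambda_min)]
    threshold = lowest + se_at_min
    # Largest lambda whose mean error stays within one SE of the minimum.
    lambda_1se = max(l for l, m in zip(lambdas, cv_mean) if m <= threshold)
    return lambda_min, lambda_1se
```

Since lambda.1se is never smaller than lambda.min, it applies at least as much shrinkage and typically keeps fewer features.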

Automatic Alpha Selection

When alpha=auto, the skill reuses the same cross-validation folds across all values in alpha_grid, compares the minimum cross-validated error for each candidate, and selects the best alpha before reporting coefficients and lambda-based outputs.

If the chosen alpha is 0, the model is ridge rather than sparse elastic net. In that case, selected_features.csv is written empty to avoid mislabeling dense ridge coefficients as selected features; use model_coefficients.csv for coefficient ranking instead.
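The comparison step can be pictured as follows. This Python sketch is illustrative and not the skill's R code: the function name is hypothetical, the input is assumed to be the per-lambda cross-validated errors for each alpha (all computed on the same folds), and breaking ties toward the smaller alpha is an arbitrary choice made for the example, since the document does not state a tie rule.

```python
def select_alpha(cv_error_by_alpha):
    """Pick the alpha whose lambda path reaches the lowest CV error.

    cv_error_by_alpha maps alpha -> list of mean CV errors along its lambda path.
    Returns (chosen_alpha, min_error_per_alpha, is_ridge).
    """
    min_error = {a: min(errs) for a, errs in cv_error_by_alpha.items()}
    # Assumed tie-break: prefer the smaller alpha when errors are equal.
    chosen = min(min_error, key=lambda a: (min_error[a], a))
    is_ridge = chosen == 0  # ridge: selected_features.csv would be written empty
    return chosen, min_error, is_ridge
```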

Feature Selection Rule

Selected features are the coefficients whose absolute value exceeds a small numerical tolerance at the chosen lambda, excluding the intercept term.

This rule is not applied when the chosen alpha is 0: ridge coefficients are dense by design, so selected_features.csv is written empty, as described under Automatic Alpha Selection.
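A minimal Python sketch of the selection rule is shown below for illustration; it is not the skill's R code. The function name and the tolerance value 1e-8 are assumptions, because the document does not state the actual tolerance. The "(Intercept)" key matches the row name glmnet uses for the intercept in coef() output.

```python
def select_features(coefficients, alpha, tol=1e-8):
    """Return (name, coefficient) pairs kept by the selection rule.

    coefficients maps term name -> coefficient at the chosen lambda.
    tol is a hypothetical tolerance; the skill's actual value is not documented here.
    """
    if alpha == 0:
        return []  # ridge is dense, so nothing is reported as selected
    kept = [
        (name, value)
        for name, value in coefficients.items()
        if name != "(Intercept)" and abs(value) > tol
    ]
    # Sort by absolute effect size, largest first.
    return sorted(kept, key=lambda item: abs(item[1]), reverse=True)
```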


Examples

First Run with Automatic Alpha

Rscript scripts/main.R \
  -i expression_matrix.csv \
  -g groups.csv \
  -f genes.csv \
  -a auto \
  --alpha_grid 0,0.25,0.5,0.75,1 \
  -o output/first_run

Fixed-Alpha Baseline

Rscript scripts/main.R \
  -i expression_matrix.csv \
  -g groups.csv \
  -f genes.csv \
  -a 0.5 \
  -o output/fixed_alpha

More Conservative Selection

Rscript scripts/main.R \
  -i expression_matrix.csv \
  -g groups.csv \
  -l lambda.1se \
  -o output/lambda_1se

Error Handling

Common Errors

| Error | Cause | Solution | Read More |
|---|---|---|---|
| SKILL_FILE_NOT_FOUND | Input file does not exist | Check file path and permissions | references/troubleshooting.md#skill_file_not_found |
| SKILL_EMPTY_DATA | Input file exists but is empty | Re-export the input file with data rows | references/troubleshooting.md#skill_empty_data |
| SKILL_MISSING_COLUMNS | Group file lacks sample/group columns | Verify the group file structure | references/troubleshooting.md#skill_missing_columns |
| SKILL_SAMPLE_MISMATCH | Sample IDs do not overlap between files | Ensure matrix column names match the group file | references/troubleshooting.md#skill_sample_mismatch |
| SKILL_INVALID_PARAMETER | CLI parameter is invalid | Check allowed values and ranges | references/troubleshooting.md#skill_invalid_parameter |
| SKILL_INVALID_DATA | Too few samples or usable features remain, the group file contains labels outside case_group and control_group, or a requested class is missing | Review filtering choices, group labels, and input data | references/troubleshooting.md#skill_invalid_data |
| SKILL_DEPENDENCY_MISSING | Required R package is not installed | Install missing packages before rerunning | references/troubleshooting.md#skill_dependency_missing |
| SKILL_PKG_VERSION | Installed package is too old | Upgrade the required package | references/troubleshooting.md#skill_pkg_version |
| SKILL_TIMEOUT | Run exceeded the configured time limit | Increase timeout_seconds or reduce data size | references/troubleshooting.md#skill_timeout |
| SKILL_RUNTIME_ERROR | An unexpected runtime or output-write failure occurred | Check output path permissions, free space, and the last console message | references/troubleshooting.md#skill_runtime_error |

IF the error persists, READ: references/troubleshooting.md
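For an agent or wrapper script, the error codes can be mapped to their troubleshooting anchors mechanically. The Python sketch below is a hypothetical helper, not part of the skill: it assumes the SKILL_* code appears verbatim somewhere in the captured console output, and it builds the anchor by lowercasing the code, matching the pattern in the Read More column.

```python
import re

# Matches codes such as SKILL_INVALID_DATA or SKILL_TIMEOUT.
SKILL_CODE = re.compile(r"\bSKILL_[A-Z_]+\b")


def first_skill_code(console_text):
    """Return the first SKILL_* code found in console output, or None."""
    match = SKILL_CODE.search(console_text)
    return match.group(0) if match else None


def troubleshooting_anchor(code):
    """Build the troubleshooting link for a code, following the table's pattern."""
    return f"references/troubleshooting.md#{code.lower()}"
```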


Testing

Test with Sample Data

# Check help
Rscript scripts/main.R --help

# Run with bundled test data
Rscript scripts/main.R \
  -i tests/data/expression_matrix.csv \
  -g tests/data/groups.csv \
  -f tests/data/genes.csv \
  -a auto \
  --alpha_grid 0,0.5,1 \
  -o tests/output \
  -n 5 \
  -t 600

Validation Commands

# Inspect selected features (may be header-only if auto-alpha selects ridge)
cat tests/output/selected_features.csv

# Check plots exist
ls -la tests/output
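To distinguish a header-only selected_features.csv (the ridge case) from a populated one without reading it by eye, a check along these lines could be scripted. This Python sketch is a hypothetical helper, not part of the skill, and it assumes the file starts with a single header row.

```python
import csv


def count_data_rows(lines):
    """Count non-empty CSV rows after the header; 0 means header-only or empty."""
    rows = [row for row in csv.reader(lines) if row]
    return max(len(rows) - 1, 0)
```

It accepts any iterable of lines, so it can be called on an open file handle, for example `count_data_rows(open("tests/output/selected_features.csv", newline=""))`.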

Implementation Checklist

  • CLI parsing with optparse
  • set.seed() for reproducibility
  • requireNamespace() dependency checks
  • Runtime package loading with library()
  • Session info recording
  • Timeout control with setTimeLimit()
  • Console warning handling
  • Out-of-scope label enforcement
  • gc() snapshot reporting
  • File reading instructions in SKILL.md
  • Modular script structure
  • Test data provided
  • Error handling with SKILL_* codes
  • Scripts in scripts/ directory
  • References in references/ directory

Last updated: 2026-04-20 | Version: 1.0.0