
Differential Expression Analysis

AIPOCH

Use when analyzing bulk RNA-seq or microarray expression data to identify differentially expressed genes between two biological groups (case vs control), with volcano plot and heatmap visualization. NOT for: single-cell RNA-seq, methylation analysis, or non-expression data.

FILES
differential-expression-analysis/
  skill.md
  scripts/
    diff_methods.R
    diff_visualization.R
    functions.R
    main.R
    run_analysis.R
    utils.R
  references/
    algorithm.md
    cli-guide.md
    troubleshooting.md
Total Score: 90 / 100

Core Capability: 91 / 100

| Category | Score |
|---|---|
| Functional Suitability | 11 / 12 |
| Reliability | 11 / 12 |
| Performance & Context | 7 / 8 |
| Agent Usability | 15 / 16 |
| Human Usability | 7 / 8 |
| Security | 11 / 12 |
| Maintainability | 11 / 12 |
| Agent-Specific | 18 / 20 |

Medical Task: 25 / 25 Passed

| Task | Score | Passed |
|---|---|---|
| limma differential expression smoke test | 93 | 5/5 |
| CLI help and option contract | 91 | 5/5 |
| sample matching and group validation review | 88 | 5/5 |
| visualization artifact workflow | 90 | 5/5 |
| multi-method workflow coverage review | 88 | 5/5 |

SKILL.md

Differential Expression Analysis

When to Read External Files

| Situation | File to Read | Purpose |
|---|---|---|
| Need algorithm details | references/algorithm.md | Statistical methods, formulas, assumptions |
| Need to run analysis | scripts/main.R | Execute: Rscript scripts/main.R --input_file ... --group_file ... |
| Encounter errors | references/troubleshooting.md | Common errors and solutions |
| Need CLI examples | references/cli-guide.md | Detailed CLI usage examples |
| Need test data | tests/data/ | Sample input files for testing |

Usage

Rscript scripts/main.R \
  --input_file ./expression_matrix.csv \
  --group_file ./group_info.csv \
  --output_dir ./output/ \
  --diff_method limma \
  --p_threshold 0.05 \
  --logfc_threshold 0.1 \
  --seed 42

Arguments

| Short | Long | Type | Default | Description |
|---|---|---|---|---|
| -i | --input_file | character | required | Expression matrix file (genes as rows, samples as columns) |
| -g | --group_file | character | required | Group information file (sample ID + group columns) |
| -o | --output_dir | character | ./output/ | Output directory |
| -m | --diff_method | character | limma | Method: limma, deseq2, edger, t, wilcox |
| -n | --norm_method | character | TMM | Normalization for edgeR: TMM, RLE, upperquartile |
| -p | --p_threshold | numeric | 0.05 | P-value threshold |
| -f | --logfc_threshold | numeric | 0.1 | Log fold change threshold |
| -s | --seed | integer | 42 | Random seed for reproducibility |

Input Format

Expression Matrix (input_file)

Genes as rows, samples as columns, CSV format with gene ID in first column.

"","GSM1442228","GSM1442229","GSM1442230"
"0610006L08Rik",3.438,3.237,3.265
"0610007P14Rik",6.734,7.017,6.807

Group File (group_file)

CSV with sample ID and group columns.

"ID","group"
"GSM1442228","Control"
"GSM1442229","Control"
"GSM1442230","DIC"

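Because the sample IDs in the group file must match the matrix column names, it can help to compare them before starting a run. The following is a minimal sketch, not part of the skill's scripts: the function name `sample_mismatches` is hypothetical, and it assumes simple CSV files in the format shown above, with no commas inside quoted fields.

```shell
# List sample IDs that appear in only one of the two files.
# $1 = expression matrix CSV, $2 = group CSV (formats as shown above).
# Assumption: no commas inside quoted fields.
sample_mismatches() {
  # Column names of the matrix, minus the first (gene ID) column.
  matrix_ids=$(head -n 1 "$1" | tr -d '"\r' | tr ',' '\n' | tail -n +2)
  # First column of the group file, minus its header row.
  group_ids=$(tail -n +2 "$2" | tr -d '"\r' | cut -d ',' -f 1)
  # IDs present in both lists occur twice and are dropped by uniq -u.
  printf '%s\n%s\n' "$matrix_ids" "$group_ids" | sed '/^$/d' | sort | uniq -u
}
```

Running `sample_mismatches expression_matrix.csv group_info.csv` prints nothing when the two files agree; any ID it prints is missing from one side.
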
Output Files

| File | Description |
|---|---|
| Diffanalysis.csv | Complete DE results with gene_id, logFC, Pvalue, Padj |
| volcano_plot.pdf | Volcano plot with significance thresholds |
| heatmap.pdf | Heatmap of top upregulated/downregulated genes |
| session_info.txt | R session and package version info |
| temp/rdegs.csv | Significant differentially expressed genes |
| temp/Diffanalysis_filtered.csv | Full results with group annotations |

Workflow

Step 1: Validate Input

  • Check file existence
  • Validate sample matching between expression matrix and group file
  • Verify at least 2 samples per group
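
The "at least 2 samples per group" rule can be checked from the shell before invoking R. This is a rough sketch under the assumption that the group file has the two-column layout shown under Input Format; `small_groups` is a hypothetical helper name, not something the skill provides.

```shell
# Print "group count" for every group with fewer than min_n samples.
# $1 = group CSV with header "ID","group"; $2 = minimum per group (default 2).
small_groups() {
  tail -n +2 "$1" | tr -d '"\r' | awk -F ',' -v min_n="${2:-2}" '
    NF >= 2 { count[$2]++ }
    END { for (g in count) if (count[g] < min_n) print g, count[g] }
  ' | sort
}
```

Empty output means every group is large enough. Applied to the three-row group file excerpt shown under Input Format, it would flag `DIC 1`, since that excerpt lists only one DIC sample.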

Step 2: Run Differential Expression

  • Choose method: limma, DESeq2, edgeR, t-test, or Wilcoxon
  • Calculate logFC and p-values
  • Apply multiple testing correction (Benjamini-Hochberg)
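
To make the Benjamini-Hochberg step concrete: with n p-values sorted in ascending order, the adjusted value at rank i is the smallest p(j) * n / j over all ranks j >= i, capped at 1. The sketch below illustrates that procedure on a plain list of p-values. It is a standalone illustration, not the code the skill's R scripts run; `bh_adjust` is a hypothetical name, and `sort -g` (general numeric sort, available in GNU and BSD sort) is assumed so that scientific notation is ordered correctly.

```shell
# Benjamini-Hochberg adjustment for p-values read from stdin (one per line).
# Prints "p adjusted_p" pairs in ascending order of p.
bh_adjust() {
  LC_ALL=C sort -g | LC_ALL=C awk '
    { p[NR] = $1 }
    END {
      n = NR
      running = 1                      # cap adjusted values at 1
      for (i = n; i >= 1; i--) {       # walk from the largest p downwards
        v = p[i] * n / i
        if (v < running) running = v   # enforce monotonicity
        adj[i] = running
      }
      for (i = 1; i <= n; i++) printf "%s %.4f\n", p[i], adj[i]
    }'
}
```

For example, `printf '0.01\n0.04\n0.03\n0.20\n' | bh_adjust` prints `0.01 0.0400`, `0.03 0.0533`, `0.04 0.0533`, and `0.20 0.2000`, one pair per line.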

Step 3: Filter Results

  • Filter by p-value and logFC thresholds
  • Classify genes as Up, Down, or Not significant
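
The classification rule can be sketched with awk against a results file. Several details here are assumptions rather than confirmed behavior of the skill: the column order gene_id, logFC, Pvalue, Padj is taken from the Output Files description, the filter is applied to the Pvalue column (swap in column 4 to filter on Padj), and strict inequalities are used. `classify_genes` is a hypothetical name.

```shell
# Append a "change" column (Up / Down / Not significant) to a DE results CSV.
# $1 = CSV with header gene_id,logFC,Pvalue,Padj (assumed column order)
# $2 = p-value threshold (default 0.05), $3 = logFC threshold (default 0.1)
classify_genes() {
  tr -d '"\r' < "$1" | LC_ALL=C awk -F ',' -v OFS=',' \
      -v p="${2:-0.05}" -v fc="${3:-0.1}" '
    NR == 1 { print $0, "change"; next }
    {
      label = "Not significant"
      if ($3 < p && $2 >  fc) label = "Up"
      if ($3 < p && $2 < -fc) label = "Down"
      print $0, label
    }'
}
```

A call such as `classify_genes output/Diffanalysis.csv 0.01 0.5` mirrors the thresholds used in the Custom Thresholds example.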

Step 4: Generate Visualizations

  • Volcano plot showing significance vs fold change
  • Heatmap of top differential genes
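
A volcano plot conventionally places logFC on the x-axis and -log10(p-value) on the y-axis, so small p-values sit near the top. The sketch below computes those coordinates from a results file. It assumes the column order gene_id, logFC, Pvalue, Padj and that the Pvalue column is the one plotted; the skill's plotting script may differ, and `volcano_coords` is a hypothetical name.

```shell
# Emit gene_id, logFC and -log10(Pvalue) for each row of a DE results CSV.
# $1 = CSV with header gene_id,logFC,Pvalue,Padj (assumed column order).
# Rows with a missing or non-positive p-value are skipped.
volcano_coords() {
  tr -d '"\r' < "$1" | LC_ALL=C awk -F ',' -v OFS=',' '
    NR == 1 { print "gene_id", "logFC", "neg_log10_p"; next }
    $3 + 0 > 0 { printf "%s,%s,%.4f\n", $1, $2, -log($3) / log(10) }'
}
```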

Methods

limma

Linear models for microarray and RNA-seq with empirical Bayes moderation. Recommended for normalized expression data (FPKM, TPM).

DESeq2

Negative binomial GLM with shrinkage estimation of dispersions. Recommended for raw count data.

edgeR

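Negative binomial models with empirical Bayes dispersion estimation and TMM normalization by default (see --norm_method). Intended for raw count data. Supports robust dispersion estimation.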

t-test / Wilcoxon

Simple two-group statistical tests. The t-test is parametric; the Wilcoxon test is its non-parametric alternative.
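
Since the recommendations above hinge on whether the matrix holds raw counts or normalized values, a quick heuristic can help pick a `--diff_method` value. This sketch is an assumption-laden illustration: it treats a matrix whose values are all non-negative integers as counts and everything else as normalized data, and the name `suggest_method` is hypothetical. For count data, `edger` would be an equally reasonable suggestion.

```shell
# Suggest a --diff_method value from the matrix contents.
# Prints "deseq2" if every expression value is a non-negative integer,
# otherwise "limma". $1 = expression matrix CSV (gene ID in first column).
suggest_method() {
  tail -n +2 "$1" | tr -d '"\r' | awk -F ',' '
    { for (i = 2; i <= NF; i++) if ($i !~ /^[0-9]+$/) non_integer = 1 }
    END { print (non_integer ? "limma" : "deseq2") }'
}
```

It could be combined with the CLI as `Rscript scripts/main.R -i matrix.csv -g group_info.csv -m "$(suggest_method matrix.csv)"`, though the heuristic is no substitute for knowing how the data were produced.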


Examples

Basic Usage (limma)

Rscript scripts/main.R \
  -i expression_matrix.csv \
  -g group_info.csv \
  -o ./output \
  -m limma

With DESeq2 for Count Data

Rscript scripts/main.R \
  -i count_matrix.csv \
  -g group_info.csv \
  -o ./output \
  -m deseq2

Custom Thresholds

Rscript scripts/main.R \
  -i expression_matrix.csv \
  -g group_info.csv \
  -o ./output \
  -p 0.01 \
  -f 0.5

Error Handling

Common Errors

| Error | Cause | Solution |
|---|---|---|
| SKILL_FILE_NOT_FOUND | Input file doesn't exist | Check file path |
| SKILL_SAMPLE_MISMATCH | Sample names don't match | Verify group file matches expression matrix columns |
| SKILL_INVALID_DATA | Fewer than 2 groups or fewer than 2 samples per group | Check group file |
| SKILL_FILTER_ERROR | No significant genes found | Relax thresholds or check data quality |
| SKILL_DEPENDENCY_MISSING | R package not installed | Install required packages |

IF error persists, READ: references/troubleshooting.md
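
For automated runs, the error codes in the table can be mapped back to their suggested fixes. The sketch below assumes the `SKILL_*` code appears somewhere in the captured text output of a failed run; how the scripts actually report errors is not documented here, so treat this as illustrative. `explain_skill_error` is a hypothetical helper name.

```shell
# Print the suggested fix for the first SKILL_* code found in a log file.
# $1 = file holding the captured output of a failed run.
# Assumption: the error code appears verbatim in that output.
explain_skill_error() {
  code=$(grep -o 'SKILL_[A-Z_]*' "$1" | head -n 1)
  case "$code" in
    SKILL_FILE_NOT_FOUND)     echo "$code: check the input file path" ;;
    SKILL_SAMPLE_MISMATCH)    echo "$code: verify group file IDs match the matrix columns" ;;
    SKILL_INVALID_DATA)       echo "$code: check the group file" ;;
    SKILL_FILTER_ERROR)       echo "$code: relax thresholds or check data quality" ;;
    SKILL_DEPENDENCY_MISSING) echo "$code: install the required R packages" ;;
    *) echo "no known SKILL_* code found; see references/troubleshooting.md" ;;
  esac
}
```

One way to use it is to redirect a run's output to a file, for example by appending `> run.log 2>&1` to the `Rscript` command, and call `explain_skill_error run.log` if the run fails.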


Testing

Test with Sample Data

# Check help
Rscript scripts/main.R --help

# Run with sample data
Rscript scripts/main.R \
  -i tests/data/Combined_Datasets_Matrix_mus.csv \
  -g tests/data/Combined_Datasets_mus_Group.csv \
  -o tests/output/

Validation Commands

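# Count lines in the results file (paths match the sample run above; substitute your own --output_dir)
wc -l tests/output/Diffanalysis.csv

# Check that the volcano plot exists
ls -la tests/output/volcano_plot.pdf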
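
The individual checks above can be folded into one helper that covers every top-level file listed under Output Files. This is a small sketch with a hypothetical function name, `check_outputs`. It deliberately skips the `temp/` files, on the assumption that temp file cleanup may remove them after a run.

```shell
# Report any expected output file that is missing or empty.
# $1 = output directory; returns non-zero if anything is missing.
check_outputs() {
  status=0
  for f in Diffanalysis.csv volcano_plot.pdf heatmap.pdf session_info.txt; do
    if [ ! -s "$1/$f" ]; then
      echo "missing or empty: $1/$f"
      status=1
    fi
  done
  return "$status"
}
```

After the sample run, `check_outputs tests/output` should print nothing and return 0 if all four files were written.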

Implementation Checklist

  • CLI parsing with optparse
  • set.seed() for reproducibility
  • requireNamespace() dependency checks
  • Session info recording
  • Temp file cleanup
  • File reading instructions in SKILL.md
  • Modular script structure (<100 lines per file)
  • Test data provided
  • Error handling with SKILL_* codes
  • Scripts in scripts/ directory
  • References in references/ directory

Last updated: 2026-04-01 | Version: 2.0.0