
DEG Screening Analysis

AIPOCH

Use when screening differentially expressed genes from a bulk expression matrix between two user-specified groups, producing DEG tables, a volcano plot, and a clustered heatmap. Triggers include DEG analysis, volcano plot, clustered heatmap, limma-based two-group comparison, and case-vs-control screening. NOT for single-cell RNA-seq, multi-group contrasts, count-model workflows such as DESeq2/edgeR, or non-expression omics data.

FILES

deg-screening-analysis/
  skill.md
  scripts/
    diff_methods.R
    diff_visualization.R
    functions.R
    main.R
    run_analysis.R
    utils.R
  references/
    algorithm.md
    cli-guide.md
    troubleshooting.md
Evaluation Report

Total score: 90 / 100

Core capability: 92 / 100

| Dimension | Score |
|---|---|
| Functional Suitability | 11 / 12 |
| Reliability | 11 / 12 |
| Performance & Context | 8 / 8 |
| Agent Usability | 15 / 16 |
| Human Usability | 8 / 8 |
| Security | 12 / 12 |
| Maintainability | 11 / 12 |
| Agent-Specific | 16 / 20 |

Medical task: 20 / 20 (passed)

| Test run | Score | Checks passed |
|---|---|---|
| Bundled OA vs control default run | 92 | 4/4 |
| Raw p-value stricter-threshold run | 90 | 4/4 |
| Zero-DEG strict-threshold run | 87 | 4/4 |
| Case-insensitive group-label run | 90 | 4/4 |
| Single-DEG sparse-heatmap run | 88 | 4/4 |

SKILL.md

Differential Expression Gene Screening Analysis (Volcano Plot & Clustered Heatmap)

When to Use

Use this skill when you need a reproducible two-group DEG workflow on a bulk expression matrix and want:

  • a full differential expression table
  • a filtered DEG table
  • a volcano plot
  • a clustered heatmap of top differential genes

Typical requests include:

  • compare case vs control samples with limma
  • screen upregulated and downregulated genes from a normalized expression matrix
  • generate a DEG table with volcano and heatmap outputs from bulk transcriptome data

Out of Scope

Do not use this skill for:

  • single-cell RNA-seq workflows
  • multi-group contrasts or factorial designs
  • count-model pipelines that require DESeq2 or edgeR
  • batch correction, covariate-adjusted models, or generalized design-matrix consulting
  • non-expression omics data

If the request falls outside this scope, stop and hand off to a more appropriate analysis workflow instead of forcing the data through this skill.

Practical Caveats

  • Diffanalysis.csv currently exports name, logFC, P.value, and P.adj.
  • --p_type controls both DEG screening semantics and volcano plot significance semantics.
  • plot/heatmap.pdf is generated only when at least two heatmap genes remain after ranking.
  • When only a few genes pass the thresholds, treat the tables and the volcano plot as the primary artifacts, because the heatmap may be skipped.

When to Read External Files

| Situation | File to Read | Purpose |
|---|---|---|
| Need algorithm details or statistical assumptions | references/algorithm.md | limma method, filtering logic, volcano/heatmap selection rules |
| Need to execute the workflow | scripts/main.R | Get the exact CLI entry and runnable command |
| Encounter an error code or bad input format | references/troubleshooting.md | Match SKILL_* errors to causes and fixes |
| Need more CLI examples | references/cli-guide.md | See complete command examples for common use cases |
| Need a minimal runnable example | tests/data/ | Use bundled test input files for validation |

Usage

Rscript scripts/main.R \
  --input_file tests/data/oa_exp.csv \
  --group_file tests/data/oa_group.csv \
  --case OA \
  --control control \
  --output_dir ./results
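When an agent or pipeline launches the skill from code, passing the arguments as a list avoids shell-quoting problems, for example with group names that contain spaces. The Python sketch below is illustrative only. `build_command` is a hypothetical helper and is not part of the skill. It uses only the long-form flags listed in the Arguments table.

```python
import shlex
from typing import Dict, List, Optional


def build_command(input_file: str, group_file: str, case: str, control: str,
                  output_dir: str = "./results",
                  extra: Optional[Dict[str, object]] = None) -> List[str]:
    """Assemble the argv list for scripts/main.R (hypothetical helper)."""
    cmd = ["Rscript", "scripts/main.R",
           "--input_file", input_file,
           "--group_file", group_file,
           "--case", case,
           "--control", control,
           "--output_dir", output_dir]
    # Append optional long-form flags from the Arguments table, e.g. {"p_type": "p"}.
    for flag, value in (extra or {}).items():
        cmd += [f"--{flag}", str(value)]
    return cmd


cmd = build_command("tests/data/oa_exp.csv", "tests/data/oa_group.csv",
                    "OA", "control", extra={"p_threshold": 0.01})
# Shell-escaped form for logging only; pass `cmd` itself to subprocess.run(cmd, check=True)
# from the skill root so that scripts/main.R resolves.
print(shlex.join(cmd))
```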

Arguments

| Short | Long | Type | Default | Required | Description |
|---|---|---|---|---|---|
| -i | --input_file | character | none | yes | Expression matrix CSV. First column is gene ID, remaining columns are sample values. |
| -g | --group_file | character | none | yes | Group annotation CSV. The script auto-detects sample and group columns, including files where the first column is row names or index. |
| -o | --output_dir | character | ./DEG | no | Output directory for tables, plots, and session metadata. |
| | --case | character | none | yes | Case group name to compare. Matching is case-insensitive and trimmed. |
| | --control | character | none | yes | Control group name to compare. Matching is case-insensitive and trimmed. |
| -m | --diff_method | character | limma | no | Differential expression method. Current implementation supports limma only. |
| -p | --p_threshold | numeric | 0.05 | no | Significance threshold for DEG screening. |
| -f | --logfc_threshold | numeric | 1 | no | Absolute log fold change threshold for DEG screening. |
| | --top_n | integer | 5 | no | Number of top upregulated and top downregulated genes considered for heatmap selection. |
| | --p_type | character | p.adj | no | P-value field used for significance filtering and volcano significance coloring. Allowed values: p, p.adj. |
| | --run_plots | logical | TRUE | no | Whether to generate the volcano plot and clustered heatmap. |
| | --timeout_seconds | integer | 3600 | no | Maximum allowed runtime before timeout. |
| -s | --seed | integer | 42 | no | Random seed recorded for reproducibility. |
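The --case and --control values are matched case-insensitively after trimming, and --p_type accepts only p or p.adj. The minimal Python sketch below shows that validation. The function names are hypothetical. Treating an ambiguous match as an error is an assumption about the R implementation, not documented behavior. An ambiguous match means two labels that differ only in case, such as "OA" and "oa".

```python
from typing import Iterable

ALLOWED_P_TYPES = ("p", "p.adj")


def match_group(requested: str, available: Iterable[str]) -> str:
    """Return the one label in `available` equal to `requested` after trimming and case-folding."""
    key = requested.strip().casefold()
    hits = sorted({label for label in available if label.strip().casefold() == key})
    if len(hits) != 1:
        # No match, or an ambiguous match such as "OA" and "oa" (assumed to be an error).
        raise ValueError(f"SKILL_INVALID_PARAMETER: {requested!r} matched {hits}")
    return hits[0]


def check_p_type(p_type: str) -> str:
    """Accept only the two documented --p_type values."""
    if p_type not in ALLOWED_P_TYPES:
        raise ValueError(f"SKILL_INVALID_PARAMETER: --p_type must be one of {ALLOWED_P_TYPES}")
    return p_type


labels = ["OA", "control", "OA", "control"]
print(match_group(" oa ", labels), match_group("CONTROL", labels), check_p_type("p.adj"))
```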

Output Files

| File | Format | Description |
|---|---|---|
| session_info.txt | txt | R session metadata and package versions used in the run. |
| data/DEG_list.rda | rda | Serialized R object containing method, groups, thresholds, the full differential table, and the screened DEG table. |
| table/Diffanalysis.csv | csv | Full differential expression result table with columns name, logFC, P.value, and P.adj. |
| table/DEG.csv | csv | Significant DEG table only, containing screened genes with group labels up or down. |
| plot/volcano_plot.pdf | pdf | Volcano plot of differential genes using the p-value mode selected by --p_type. |
| plot/heatmap.pdf | pdf | Clustered heatmap for selected top differential genes when at least two heatmap genes are available and plotting is enabled. |
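Downstream code often needs to reload table/Diffanalysis.csv. The Python sketch below parses it with the standard csv module. It makes two assumptions: the file is plain comma-separated with the four documented columns, and missing values use R's default NA spelling. Extra columns, such as an unnamed row-name column, are ignored. The sample rows are made-up values for illustration, not output from the bundled test data.

```python
import csv
import io
from typing import Dict, List, TextIO

REQUIRED_COLUMNS = ("name", "logFC", "P.value", "P.adj")


def _to_float(text: str) -> float:
    # R writes missing values as "NA" by default; map them (and blanks) to NaN.
    return float("nan") if text in ("NA", "") else float(text)


def read_diff_table(handle: TextIO) -> List[Dict[str, object]]:
    """Parse Diffanalysis.csv into dicts with numeric logFC, P.value and P.adj."""
    reader = csv.DictReader(handle)
    missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    return [
        {"name": row["name"], **{c: _to_float(row[c]) for c in REQUIRED_COLUMNS[1:]}}
        for row in reader
    ]


# Made-up rows; in practice use open("results/table/Diffanalysis.csv", newline="").
sample = io.StringIO("name,logFC,P.value,P.adj\nGENE1,2.5,0.001,0.01\nGENE2,-0.3,0.5,NA\n")
rows = read_diff_table(sample)
print(rows[0])
```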

Workflow

Step 1: Validate Input

  • check that input files exist
  • load the expression matrix and ensure it is non-empty
  • auto-detect sample and group columns in the group file
  • verify that sample IDs in the expression matrix match those in the group file
  • verify case/control groups exist and each selected group has at least two samples
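The sample-matching and group-size checks can be sketched as follows. This Python version is hypothetical and rests on three assumptions. First, the group file has already been reduced to a sample-to-group mapping; the R script's column auto-detection is not reproduced. Second, samples present in only one file are dropped rather than rejected. Third, groups with too few samples are reported as SKILL_INVALID_PARAMETER.

```python
from typing import Dict, List, Sequence


def validate_design(expr_samples: Sequence[str], sample_to_group: Dict[str, str],
                    case: str, control: str, min_per_group: int = 2) -> Dict[str, List[str]]:
    """Return the case and control sample IDs, in expression-matrix column order."""
    if case == control:
        raise ValueError("SKILL_INVALID_PARAMETER: --case and --control must differ")
    shared = [s for s in expr_samples if s in sample_to_group]
    if not shared:
        raise ValueError("SKILL_SAMPLE_MISMATCH: no sample IDs shared by the two files")
    selected: Dict[str, List[str]] = {case: [], control: []}
    for sample in shared:  # samples present in only one file are dropped (assumption)
        group = sample_to_group[sample]
        if group in selected:
            selected[group].append(sample)
    for group, samples in selected.items():
        if len(samples) < min_per_group:
            raise ValueError(f"SKILL_INVALID_PARAMETER: group {group!r} has "
                             f"{len(samples)} sample(s), need {min_per_group}")
    return selected


expr_columns = ["S1", "S2", "S3", "S4", "S5"]
groups = {"S1": "OA", "S2": "OA", "S3": "control", "S4": "control", "S6": "OA"}
print(validate_design(expr_columns, groups, "OA", "control"))
```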

Step 2: Run Differential Expression

  • fit a two-group limma linear model
  • build the contrast case - control
  • compute empirical Bayes moderated statistics
  • export the full differential result table
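In limma this step is typically done with lmFit, makeContrasts/contrasts.fit, and eBayes. The moderated t-test itself is not reproduced here. The Python sketch below covers only two quantities that can be checked by hand. For a plain two-group design without weights, the case - control coefficient equals the difference of group means on the log scale. P.adj is assumed to be the Benjamini-Hochberg adjustment, which is limma's topTable default; the skill's exact adjustment method is not stated here.

```python
from statistics import mean
from typing import List, Sequence


def log_fold_change(case_values: Sequence[float], control_values: Sequence[float]) -> float:
    """Case - control difference of group means for one gene (log-scale input)."""
    return mean(case_values) - mean(control_values)


def bh_adjust(pvalues: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in input order (like R's p.adjust)."""
    n = len(pvalues)
    # Walk from the largest p-value down, keeping a running minimum of p * n / rank.
    order = sorted(range(n), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * n
    running_min = 1.0
    for position, i in enumerate(order):
        rank = n - position  # 1-based rank in ascending p-value order
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


print(log_fold_change([8.0, 9.0, 10.0], [6.0, 7.0, 8.0]))  # 2.0
print(bh_adjust([0.01, 0.04, 0.03, 0.20]))
```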

Step 3: Screen Differentially Expressed Genes

  • apply p_threshold and logfc_threshold
  • use P.value or P.adj based on --p_type
  • label genes as up, down, or no
  • export DEG tables and serialized result objects
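The screening rule can be sketched in Python as shown below. The sketch assumes that both comparisons are strict (p < p_threshold and |logFC| > logfc_threshold). It also assumes that genes with a missing p-value are labeled no, and the output column name group is hypothetical. The example shows how switching --p_type from p.adj to p can change which genes pass.

```python
from typing import Any, Dict, List

P_COLUMNS = {"p": "P.value", "p.adj": "P.adj"}  # --p_type value -> Diffanalysis.csv column


def label_gene(row: Dict[str, Any], p_type: str = "p.adj",
               p_threshold: float = 0.05, logfc_threshold: float = 1.0) -> str:
    """Return "up", "down", or "no" for one row of the full differential table."""
    p_value = row[P_COLUMNS[p_type]]
    # Strict inequalities are an assumption; a NaN p-value compares False and yields "no".
    if p_value < p_threshold and abs(row["logFC"]) > logfc_threshold:
        return "up" if row["logFC"] > 0 else "down"
    return "no"


def screen(rows: List[Dict[str, Any]], **options: Any) -> List[Dict[str, Any]]:
    """Keep only up/down genes, adding a hypothetical "group" column."""
    kept = []
    for row in rows:
        label = label_gene(row, **options)
        if label != "no":
            kept.append({**row, "group": label})
    return kept


rows = [
    {"name": "GENE1", "logFC": 2.5, "P.value": 0.001, "P.adj": 0.01},
    {"name": "GENE2", "logFC": -1.8, "P.value": 0.002, "P.adj": 0.08},
    {"name": "GENE3", "logFC": 0.4, "P.value": 0.0001, "P.adj": 0.001},
]
print([r["name"] for r in screen(rows)])              # ['GENE1']
print([r["name"] for r in screen(rows, p_type="p")])  # ['GENE1', 'GENE2']
```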

Step 4: Generate Volcano Plot & Clustered Heatmap

  • build plot/volcano_plot.pdf directly from the full differential table
  • select top up and top down genes for heatmap input
  • build plot/heatmap.pdf only when at least two heatmap genes are available
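The volcano coordinates and the heatmap gene selection can be sketched in Python as follows. Two details are assumptions: ranking DEGs by logFC within each direction, and flooring p at 1e-300 so that -log10(p) stays finite when p is 0. references/algorithm.md documents the actual selection rules. An empty list stands for a skipped heatmap.

```python
import math
from typing import Any, Dict, List, Tuple


def volcano_point(row: Dict[str, Any], p_column: str = "P.adj") -> Tuple[float, float]:
    """x = logFC, y = -log10(p); the 1e-300 floor (an assumption) avoids log10(0)."""
    return row["logFC"], -math.log10(max(row[p_column], 1e-300))


def select_heatmap_genes(degs: List[Dict[str, Any]], top_n: int = 5) -> List[str]:
    """Top `top_n` up and top `top_n` down genes by logFC; [] means skip the heatmap."""
    up = sorted((d for d in degs if d["group"] == "up"),
                key=lambda d: d["logFC"], reverse=True)[:top_n]
    down = sorted((d for d in degs if d["group"] == "down"),
                  key=lambda d: d["logFC"])[:top_n]
    genes = [d["name"] for d in up + down]
    # The skill renders plot/heatmap.pdf only when at least two genes are selected.
    return genes if len(genes) >= 2 else []


degs = [
    {"name": "A", "logFC": 3.0, "group": "up"},
    {"name": "B", "logFC": 1.5, "group": "up"},
    {"name": "C", "logFC": 2.2, "group": "up"},
    {"name": "D", "logFC": -2.0, "group": "down"},
]
print(select_heatmap_genes(degs, top_n=2))  # ['A', 'C', 'D']
print(select_heatmap_genes(degs[:1]))       # []
```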

Error Handling

| Error Code | Meaning | Typical Fix |
|---|---|---|
| SKILL_FILE_NOT_FOUND | Input file path does not exist | Verify the file path and rerun |
| SKILL_PACKAGE_NOT_FOUND | Required R package is missing | Install the missing package, then rerun |
| SKILL_MISSING_COLUMNS | Input file does not contain the necessary columns | Check CSV structure and column placement |
| SKILL_EMPTY_DATA | Input file is empty or limma returns no analyzable rows | Validate input content or confirm the matrix contains enough valid values |
| SKILL_INVALID_PARAMETER | Argument value or group selection is invalid | Check thresholds, --case, --control, and --p_type |
| SKILL_SAMPLE_MISMATCH | Expression matrix samples and group file samples do not match | Align sample IDs between the two input files |
| SKILL_TIMEOUT | The run exceeded the allowed runtime | Increase --timeout_seconds or simplify the run |
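A wrapper can map a failed run to the matching fix from the error-code table. The Python sketch assumes that the SKILL_* code appears somewhere in the captured error output. The exact message format of the R scripts is not documented here, so `diagnose` is a hypothetical helper.

```python
import re
from typing import Optional

# Error codes and typical fixes, copied from the Error Handling table.
FIXES = {
    "SKILL_FILE_NOT_FOUND": "Verify the file path and rerun",
    "SKILL_PACKAGE_NOT_FOUND": "Install the missing package, then rerun",
    "SKILL_MISSING_COLUMNS": "Check CSV structure and column placement",
    "SKILL_EMPTY_DATA": "Validate input content or confirm the matrix contains enough valid values",
    "SKILL_INVALID_PARAMETER": "Check thresholds, --case, --control, and --p_type",
    "SKILL_SAMPLE_MISMATCH": "Align sample IDs between the two input files",
    "SKILL_TIMEOUT": "Increase --timeout_seconds or simplify the run",
}
CODE_PATTERN = re.compile(r"\bSKILL_[A-Z_]+\b")


def diagnose(error_output: str) -> Optional[str]:
    """Return 'CODE: fix' for the first SKILL_* code found, or None if there is none."""
    match = CODE_PATTERN.search(error_output)
    if match is None:
        return None
    code = match.group(0)
    return f"{code}: {FIXES.get(code, 'see references/troubleshooting.md')}"


print(diagnose("Error: SKILL_SAMPLE_MISMATCH (no shared sample IDs)"))
```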

If you need step-by-step fixes, read references/troubleshooting.md.

Testing

Rscript tests/run_tests.R

Minimal CLI smoke test:

Rscript scripts/main.R \
  --input_file tests/data/oa_exp.csv \
  --group_file tests/data/oa_group.csv \
  --case OA \
  --control control \
  --output_dir ./tests_output

Expected outputs:

  • tests_output/table/Diffanalysis.csv
  • tests_output/table/DEG.csv
  • tests_output/plot/volcano_plot.pdf
  • tests_output/session_info.txt

tests_output/plot/heatmap.pdf is expected only when at least two heatmap genes are selected. Runs with fewer than two selected heatmap genes skip heatmap generation with a warning instead of failing. tests_output/table/DEG.csv may be empty when no genes pass the current thresholds.
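The expected-output check can be automated. This Python sketch uses a hypothetical helper, `check_outputs`. It treats the four listed files as required and plot/heatmap.pdf as optional, because the heatmap is legitimately skipped for sparse results.

```python
from pathlib import Path
from typing import Dict, List

REQUIRED = ["table/Diffanalysis.csv", "table/DEG.csv",
            "plot/volcano_plot.pdf", "session_info.txt"]
OPTIONAL = ["plot/heatmap.pdf"]  # skipped with a warning for sparse results


def check_outputs(output_dir: str) -> Dict[str, List[str]]:
    """List missing required files and any optional files that were produced."""
    root = Path(output_dir)
    return {
        "missing": [p for p in REQUIRED if not (root / p).is_file()],
        "optional_present": [p for p in OPTIONAL if (root / p).is_file()],
    }


# After the smoke test, an empty "missing" list means every required output exists.
print(check_outputs("./tests_output"))
```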

Skill name: deg-screening-analysis