
Gsea

AIPOCH

Run gene set enrichment analysis (GSEA) on a ranked gene list and produce the enrichment table, running-score table, and enrichment plots.

Files

gsea/
  skill.md
  scripts/
    functions.R
    main.R
    plot_functions.R
    run_analysis.R
    utils.R
  references/
    algorithm.md
    cli-guide.md
    troubleshooting.md
  assets/
    ssGSEA.rds
Evaluation report

Total score: 90/100

Core capability: 91/100
  • Functional Suitability: 12/12
  • Reliability: 11/12
  • Performance & Context: 7/8
  • Agent Usability: 14/16
  • Human Usability: 6/8
  • Security: 12/12
  • Maintainability: 11/12
  • Agent-Specific: 18/20

Medical task: 24/24 passed
  • Human KEGG analysis: score 94, 5/5
  • Hallmarks via RDS: score 92, 5/5
  • Invalid gene column: score 76, 4/4
  • Plot top three pathways: score 93, 5/5
  • Conflicting mode inputs: score 91, 5/5

SKILL.md

When to read external files

| Situation | Read | Purpose |
|---|---|---|
| Need algorithm details | references/algorithm.md | Statistical method and formulas |
| Need to run an analysis | scripts/main.R | Full command reference |
| Hit an error | references/troubleshooting.md | Look up error codes and fixes |
| Need CLI examples | references/cli-guide.md | Worked argument examples |

Scope

Use this skill for:

  • Running GSEA on a gene list ranked by a statistic
  • Generating enrichment curve plots from existing enrichGSEA.csv and gsea_running_scores.csv
  • Smoke-testing the pipeline with tests/data/sample_deg_results.csv

Do not use it for:

  • Differential expression on raw expression matrices
  • Single-sample GSEA (ssGSEA)
  • Network analysis or multi-omics integration

Usage

Analysis mode: Rscript scripts/main.R --input tests/data/sample_deg_results.csv --outdir ./GSEA_analysis --type KEGG --species human --seed 42 --timeout 300

Plot mode: Rscript scripts/main.R --running_file ./GSEA_analysis/Table/gsea_running_scores.csv --enrich_file ./GSEA_analysis/Table/enrichGSEA.csv --plot_output ./GSEA_analysis/plot/gsea_plot.pdf --top_n 5 --plot_format pdf --seed 42 --timeout 300

See references/cli-guide.md for more worked examples.

Mode selection:

  • Passing only --input runs analysis mode
  • Passing both --running_file and --enrich_file runs plot mode
  • If both sets of arguments are provided, plot mode takes precedence; analysis mode is skipped and a warning is logged
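The mode-selection rules above can be sketched as a small decision function. This is a Python illustration under stated assumptions, not the logic inside scripts/main.R. The name select_mode is hypothetical. The handling of a lone --running_file or --enrich_file is undocumented, so the sketch falls back to analysis mode when --input is present and raises otherwise.

```python
import logging

logger = logging.getLogger("gsea_mode_sketch")


def select_mode(input_path=None, running_file=None, enrich_file=None):
    """Pick 'plot' or 'analysis' following the documented precedence rules."""
    plot_ready = running_file is not None and enrich_file is not None
    if plot_ready:
        if input_path is not None:
            # Both argument sets were supplied: plot mode wins and analysis is skipped.
            logger.warning("Both --input and plot-mode files given; skipping analysis mode.")
        return "plot"
    if input_path is not None:
        # Assumption: a lone --running_file or --enrich_file is ignored when --input is present.
        return "analysis"
    # Assumption: this case is undocumented, so the sketch refuses to guess.
    raise ValueError("SKILL_INVALID_PARAMETER: pass --input, or both --running_file and --enrich_file")


print(select_mode(input_path="deg.csv"))
print(select_mode(input_path="deg.csv", running_file="r.csv", enrich_file="e.csv"))
```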

Arguments

Analysis-mode arguments

| Short | Long | Type | Default | Required | Description |
|---|---|---|---|---|---|
| -i | --input | character | NULL | yes | Input CSV file |
| -o | --outdir | character | GSEA_analysis | no | Output directory |
| -g | --gene_col | character | name | no | Gene column name |
| -f | --fc_col | character | logFC | no | Ranking-statistic column name |
| -t | --type | character | KEGG | no | Gene-set type: KEGG, HALLMARKS, GO_BP, GO_MF, GO_CC. With a preloaded RDS, HALLMARKS is automatically mapped to the asset key Hallmarks |
| -s | --species | character | human | no | Species: human, mouse, rat |
| -p | --pvalue_cutoff | numeric | 0.05 | no | Significance threshold |
| -m | --method | character | fgsea | no | GSEA backend: fgsea or DOSE |
| -c | --chunk_size | numeric | 1000 | no | Chunk size for large gene-set conversion |
| -r | --rds_path | character | NULL | no | Path to a pre-stored gene-set RDS |
| -v | --verbose | logical | FALSE | no | Verbose logging |
|  | --seed | integer | 42 | no | Random seed |
|  | --timeout | integer | 300 | no | Timeout in seconds; <=0 disables it |
| -h | --help | logical | FALSE | no | Show help |
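Some analysis-mode values could be checked before a run, as in this hedged Python sketch. The names validate_analysis_args and SkillError are hypothetical. The accepted ranges for --pvalue_cutoff and --chunk_size are assumptions. The fgsea/DOSE choices and the rule that a --timeout of 0 or less disables the limit come from the table above.

```python
class SkillError(Exception):
    """Carries one of the documented SKILL_* codes alongside a message."""

    def __init__(self, code, message):
        super().__init__(f"{code}: {message}")
        self.code = code


VALID_METHODS = ("fgsea", "DOSE")


def validate_analysis_args(pvalue_cutoff=0.05, method="fgsea", chunk_size=1000, timeout=300):
    """Check a few analysis-mode values and normalize --timeout (<= 0 means no limit)."""
    if method not in VALID_METHODS:
        raise SkillError("SKILL_INVALID_PARAMETER", f"--method must be fgsea or DOSE, got {method!r}")
    # Assumption: a usable cutoff lies in (0, 1].
    if not 0 < pvalue_cutoff <= 1:
        raise SkillError("SKILL_INVALID_PARAMETER", f"--pvalue_cutoff out of range: {pvalue_cutoff}")
    # Assumption: each chunk must hold at least one gene set.
    if chunk_size < 1:
        raise SkillError("SKILL_INVALID_PARAMETER", f"--chunk_size must be >= 1, got {chunk_size}")
    return {
        "pvalue_cutoff": pvalue_cutoff,
        "method": method,
        "chunk_size": int(chunk_size),
        # None stands for "no timeout", matching the documented <=0 behavior.
        "timeout": timeout if timeout > 0 else None,
    }
```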

Plot-mode arguments

| Short | Long | Type | Default | Required | Description |
|---|---|---|---|---|---|
|  | --running_file | character | NULL | yes | Path to gsea_running_scores.csv |
|  | --enrich_file | character | NULL | yes | Path to enrichGSEA.csv |
|  | --plot_output | character | gsea_plot.pdf | no | Output plot path |
|  | --plot_width | numeric | 8 | no | Plot width |
|  | --plot_height | numeric | 6 | no | Plot height |
|  | --plot_format | character | pdf | no | Output format: pdf or png |
|  | --top_n | numeric | 1 | no | Number of top pathways to plot when geneSetID is not given |
|  | --rank_by | character | p.adjust | no | Column used to rank pathways |
|  | --geneSetID | character | "" | no | Comma-separated pathway IDs |
|  | --plot_title | character | "" | no | Plot title |
|  | --colors | character | #4DBBD5,#E64B35,#00A087,#F39B7F,#3C5488,#8491B4 | no | Color list |
|  | --base_size | numeric | 11 | no | Base font size |
|  | --subplots | character | 1,2,3 | no | Sub-panel indices to display |
|  | --rel_heights | character | 1.5,0.8,1 | no | Relative panel heights |
|  | --NES_table | logical | TRUE | no | Show NES annotation |
|  | --no_NES_table | logical | FALSE | no | Disable NES annotation |
|  | --NES_label_size | numeric | 4 | no | NES label font size |
|  | --NES_label_x | numeric | 0.75 | no | NES label x position |
|  | --NES_label_y | numeric | 0.75 | no | NES label y position |
|  | --NES_label_color | character | black | no | NES label color |
|  | --NES_label_hjust | numeric | 0 | no | NES label horizontal justification |
|  | --NES_label_vjust | numeric | 1 | no | NES label vertical justification |
|  | --line_width | numeric | 1 | no | ES line width |
|  | --dot_size | numeric | 1.2 | no | ES dot size |
|  | --legend_position | character | auto | no | Legend position |
|  | --legend_x | numeric | 0.02 | no | Inset legend x coordinate |
|  | --legend_y | numeric | 0.02 | no | Inset legend y coordinate |
|  | --legend_just_x | numeric | 0 | no | Legend horizontal justification |
|  | --legend_just_y | numeric | 0 | no | Legend vertical justification |
|  | --legend_text_size | numeric | 9 | no | Legend text size |
|  | --legend_key_size | numeric | 0.6 | no | Legend key size |
|  | --legend_bg_alpha | numeric | 0 | no | Legend background alpha |
|  | --grid_major_color | character | grey92 | no | Major grid color |
|  | --grid_minor_color | character | grey92 | no | Minor grid color |
|  | --ylab_es | character | Enrichment Score | no | ES panel y-axis title |
|  | --ylab_rank | character | Ranked List Metric | no | Rank panel y-axis title |
|  | --xlab_rank | character | Rank in Ordered Dataset | no | Rank panel x-axis title |
|  | --hit_height | numeric | 1 | no | Hit-bar height |
|  | --hit_gap | numeric | 0 | no | Hit-bar gap |
|  | --hit_linewidth | numeric | 0.5 | no | Hit-bar line width |
|  | --rank_bar_alpha | numeric | 0.9 | no | Rank-bar alpha |
|  | --rank_bar_height_ratio | numeric | 0.3 | no | Rank-bar height ratio |
|  | --rank_metric_segment_color | character | grey | no | Rank-line color |
|  | --rank_metric_segment_width | numeric | 0.3 | no | Rank-line width |
|  | --rank_metric_segment_alpha | numeric | 1 | no | Rank-line alpha |
|  | --pvalue_table | logical | FALSE | no | Show p-value table |
|  | --ES_geom | character | line | no | ES geometry: line or dot |
|  | --verbose | logical | FALSE | no | Verbose logging |
|  | --seed | integer | 42 | no | Random seed |
|  | --timeout | integer | 300 | no | Timeout in seconds; <=0 disables it |
| -h | --help | logical | FALSE | no | Show help |
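In plot mode, --geneSetID, --top_n, and --rank_by interact to pick the pathways to draw, and --colors, --subplots, and --rel_heights take comma-separated values. Both behaviors are illustrated in the Python sketch below, which is not plot_functions.R. The helper names are hypothetical. The sketch assumes that smaller --rank_by values rank higher. That holds for the default p.adjust but may not hold for other columns.

```python
def parse_csv_list(value, cast=str):
    """Split a comma-separated CLI value such as '1.5,0.8,1' into typed items."""
    return [cast(item.strip()) for item in value.split(",") if item.strip()]


def select_pathways(rows, gene_set_ids="", top_n=1, rank_by="p.adjust"):
    """Return the pathway IDs to plot.

    rows: dicts read from enrichGSEA.csv (must contain 'ID' and the rank_by column).
    """
    if gene_set_ids.strip():
        # Explicit IDs take priority over --top_n.
        wanted = parse_csv_list(gene_set_ids)
        known = {row["ID"] for row in rows}
        missing = [pid for pid in wanted if pid not in known]
        if missing:
            raise ValueError(f"Unknown geneSetID(s): {', '.join(missing)}")
        return wanted
    # Assumption: smaller rank_by values are better (true for p.adjust and pvalue).
    ranked = sorted(rows, key=lambda row: float(row[rank_by]))
    return [row["ID"] for row in ranked[: int(top_n)]]


# Hypothetical toy rows standing in for enrichGSEA.csv.
rows = [
    {"ID": "set_A", "p.adjust": "0.01"},
    {"ID": "set_B", "p.adjust": "0.001"},
    {"ID": "set_C", "p.adjust": "0.2"},
]
print(select_pathways(rows, top_n=2))
print(select_pathways(rows, gene_set_ids="set_C, set_A"))
print(parse_csv_list("1.5,0.8,1", float))
```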

Input format

Analysis-mode input is a CSV with at least:

  • a gene column (default column name: name)
  • a ranking-statistic column (default column name: logFC)

Example:

name,logFC,pvalue,padj
TP53,2.5,0.001,0.01
BRCA1,1.8,0.005,0.02
EGFR,-1.2,0.01,0.05
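The input requirements above amount to building a ranked list. The sketch below shows one way to do that in Python. It assumes that rows with a missing or non-numeric statistic are dropped and that duplicate gene symbols keep their largest-magnitude value. The skill's R scripts may filter differently. The error-message prefixes reuse the documented codes, but build_ranked_list itself is hypothetical.

```python
import csv
import io
import math


def build_ranked_list(csv_text, gene_col="name", fc_col="logFC"):
    """Turn DEG-style CSV text into (gene, statistic) pairs sorted by decreasing statistic."""
    reader = csv.DictReader(io.StringIO(csv_text))
    fields = reader.fieldnames or []
    missing = [col for col in (gene_col, fc_col) if col not in fields]
    if missing:
        raise ValueError(f"SKILL_MISSING_COLUMNS: {', '.join(missing)}")
    best = {}
    for row in reader:
        gene = (row.get(gene_col) or "").strip()
        raw = (row.get(fc_col) or "").strip()
        if not gene or not raw:
            continue  # skip rows without a gene symbol or statistic
        try:
            value = float(raw)
        except ValueError:
            continue  # skip non-numeric statistics such as "NA"
        if math.isnan(value):
            continue
        # Assumption: for duplicated symbols, keep the value with the largest magnitude.
        if gene not in best or abs(value) > abs(best[gene]):
            best[gene] = value
    if not best:
        raise ValueError("SKILL_EMPTY_DATA: no usable rows after filtering")
    return sorted(best.items(), key=lambda item: item[1], reverse=True)
```

Applied to the example above, this sketch orders the genes TP53, BRCA1, EGFR, because GSEA expects the most positive statistics at the top of the list.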

Value constraints:

  • type accepts KEGG, HALLMARKS, GO_BP, GO_MF, GO_CC
  • When using a preloaded RDS, HALLMARKS is automatically matched to the asset key Hallmarks
  • species accepts human, mouse, rat
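The value constraints, including the HALLMARKS-to-Hallmarks mapping for a preloaded RDS, can be expressed as a small lookup. resolve_gene_set_key is a hypothetical Python helper. It assumes exact, case-sensitive matching, and it assumes every other type uses its CLI spelling as the RDS key. The skill documents neither assumption.

```python
VALID_TYPES = ("KEGG", "HALLMARKS", "GO_BP", "GO_MF", "GO_CC")
VALID_SPECIES = ("human", "mouse", "rat")


def resolve_gene_set_key(gene_set_type, species, using_rds=False):
    """Validate --type/--species and return the key used to look up the gene sets."""
    if gene_set_type not in VALID_TYPES:
        raise ValueError(f"SKILL_INVALID_PARAMETER: --type must be one of {', '.join(VALID_TYPES)}")
    if species not in VALID_SPECIES:
        raise ValueError(f"SKILL_INVALID_PARAMETER: --species must be one of {', '.join(VALID_SPECIES)}")
    if using_rds and gene_set_type == "HALLMARKS":
        # Documented: the preloaded RDS stores hallmark sets under the key "Hallmarks".
        return "Hallmarks"
    # Assumption: every other type is looked up under its CLI spelling.
    return gene_set_type
```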

Output files

Paths are relative to --outdir.

| File | Format | Description |
|---|---|---|
| data/GSEA_list.rda | RDA | Full GSEA result object |
| Table/enrichGSEA.csv | CSV | Enrichment result table |
| Table/gsea_running_scores.csv | CSV | Running-score table; if no gene set passes the significance cutoff, a header-only file is still written |
| plot/ | directory | Plot output directory |
| session_info.txt | TXT | R version and package versions |
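The running-score table records how the enrichment score evolves along the ranked list. Below is a minimal Python sketch of the standard weighted running sum used by GSEA. It assumes an exponent of 1 on the ranking statistic, which is a common default. The fgsea and DOSE backends compute this in their own code, and the sketch implies nothing about the column layout of gsea_running_scores.csv.

```python
def running_enrichment_score(ranked, gene_set, weight=1.0):
    """Compute the GSEA running sum for one gene set.

    ranked: (gene, statistic) pairs already sorted by decreasing statistic.
    Returns (ES, running_scores), where ES is the running sum's largest deviation from zero.
    """
    hits = [gene in gene_set for gene, _ in ranked]
    n_total, n_hits = len(ranked), sum(hits)
    if n_hits == 0 or n_hits == n_total:
        raise ValueError("gene set must overlap the ranked list without covering all of it")
    # Hits step up in proportion to |statistic|**weight; these steps sum to 1.
    hit_norm = sum(abs(stat) ** weight for (_, stat), hit in zip(ranked, hits) if hit)
    if hit_norm == 0:
        raise ValueError("all hit statistics are zero")
    # Misses step down by a constant amount; these steps also sum to 1.
    miss_step = 1.0 / (n_total - n_hits)
    running, scores = 0.0, []
    for (_, stat), hit in zip(ranked, hits):
        running += abs(stat) ** weight / hit_norm if hit else -miss_step
        scores.append(running)
    es = max(scores, key=abs)
    return es, scores
```

Hits push the sum up and misses pull it down, so the curve starts and ends near zero. Its peak, whether positive or negative, is the ES reported for the set.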

Key columns in enrichGSEA.csv include ID, Description, NES, pvalue, p.adjust, and core_enrichment.
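The NES column holds a normalized enrichment score. A common normalization divides the observed ES by the mean magnitude of the same-signed null values, which makes scores comparable across gene sets of different sizes. The sketch below assumes that null_es holds ES values from permuted gene sets and leaves out how they are generated. It illustrates the idea and is not the exact computation of either backend.

```python
from statistics import mean


def normalized_enrichment_score(es, null_es):
    """Scale ES by the mean magnitude of same-signed ES values from a null distribution."""
    # Positive ES is compared with non-negative null values, negative ES with negative ones.
    same_sign = [x for x in null_es if (x >= 0) == (es >= 0)]
    if not same_sign:
        return float("nan")
    scale = abs(mean(same_sign))
    if scale == 0:
        return float("nan")
    return es / scale
```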

Error handling

Common error codes:

  • SKILL_FILE_NOT_FOUND: input file does not exist
  • SKILL_MISSING_COLUMNS: required columns are missing
  • SKILL_EMPTY_DATA: input is empty, or empty after filtering
  • SKILL_INVALID_PARAMETER: an argument has an invalid value
  • SKILL_PACKAGE_NOT_FOUND: a required package is not installed
  • SKILL_ANALYSIS_FAILED: GSEA still failed after retries
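SKILL_ANALYSIS_FAILED is raised only after retries. The number of attempts and any delay between them are undocumented, so max_attempts=3 and delay_seconds in this Python sketch are hypothetical values. The pattern is a plain bounded retry that re-raises with the documented code.

```python
import time


class SkillError(Exception):
    """Carries one of the documented SKILL_* codes alongside a message."""

    def __init__(self, code, message):
        super().__init__(f"{code}: {message}")
        self.code = code


def run_with_retries(analysis, max_attempts=3, delay_seconds=0.0):
    """Call analysis() up to max_attempts times; raise SKILL_ANALYSIS_FAILED if all fail."""
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return analysis()
        except Exception as exc:  # broad on purpose: any backend failure counts as a failed attempt
            last_error = exc
            if attempt < max_attempts and delay_seconds > 0:
                time.sleep(delay_seconds)
    raise SkillError(
        "SKILL_ANALYSIS_FAILED", f"GSEA failed after {max_attempts} attempts: {last_error}"
    ) from last_error
```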

For triage steps, see references/troubleshooting.md.

Exit codes:

  • 0: success
  • 1: failure
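An agent calling the skill can combine the exit code with the documented error codes. This Python sketch assumes the codes appear verbatim in the script's stdout or stderr, which the documentation does not state. run_skill and extract_error_codes are hypothetical helpers.

```python
import re
import subprocess

KNOWN_CODES = {
    "SKILL_FILE_NOT_FOUND",
    "SKILL_MISSING_COLUMNS",
    "SKILL_EMPTY_DATA",
    "SKILL_INVALID_PARAMETER",
    "SKILL_PACKAGE_NOT_FOUND",
    "SKILL_ANALYSIS_FAILED",
}
CODE_PATTERN = re.compile(r"\bSKILL_[A-Z_]+\b")


def extract_error_codes(log_text):
    """Return documented SKILL_* codes found in log text, deduplicated in first-seen order."""
    found = [code for code in CODE_PATTERN.findall(log_text) if code in KNOWN_CODES]
    return list(dict.fromkeys(found))


def run_skill(args, workdir="."):
    """Run scripts/main.R and report (succeeded, error_codes)."""
    result = subprocess.run(
        ["Rscript", "scripts/main.R", *args],
        cwd=workdir,
        capture_output=True,
        text=True,
    )
    # Documented contract: exit code 0 means success, 1 means failure.
    succeeded = result.returncode == 0
    return succeeded, extract_error_codes(result.stdout + result.stderr)
```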

Testing

Minimal test dataset: tests/data/sample_deg_results.csv

Minimal command: Rscript scripts/main.R --input tests/data/sample_deg_results.csv --outdir ./test_output --type KEGG --species human --seed 42 --timeout 300 --verbose

Expected output:

  • ./test_output/data/GSEA_list.rda
  • ./test_output/Table/enrichGSEA.csv
  • ./test_output/Table/gsea_running_scores.csv
  • ./test_output/session_info.txt
  • If no significant enrichment is found, gsea_running_scores.csv is still written but contains only the header
  • Exit code 0
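The expected outputs of the minimal test can be checked programmatically. This hypothetical Python helper verifies that the four listed files exist under the output directory. It also counts the data rows in gsea_running_scores.csv, where a count of 0 corresponds to the header-only case.

```python
import csv
import sys
from pathlib import Path

# Files the Testing section lists under the --outdir used in the minimal command.
EXPECTED_FILES = (
    "data/GSEA_list.rda",
    "Table/enrichGSEA.csv",
    "Table/gsea_running_scores.csv",
    "session_info.txt",
)


def check_test_output(outdir):
    """Return (missing_files, data_rows_in_running_scores or None if that file is absent)."""
    root = Path(outdir)
    missing = [rel for rel in EXPECTED_FILES if not (root / rel).is_file()]
    running = root / "Table" / "gsea_running_scores.csv"
    if not running.is_file():
        return missing, None
    with running.open(newline="") as handle:
        rows = list(csv.reader(handle))
    # A header-only file (no significant enrichment) yields 0 data rows.
    return missing, max(len(rows) - 1, 0)


if __name__ == "__main__":
    missing, n_rows = check_test_output(sys.argv[1] if len(sys.argv) > 1 else "./test_output")
    print("missing:", missing or "none")
    print("running-score data rows:", n_rows)
```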