Agent Skills

Wgcna Analysis

AIPOCH

Use when building a weighted gene co-expression network from a bulk expression matrix and a sample group file, filtering variable genes by MAD, identifying co-expression modules with WGCNA, correlating modules with traits, and exporting module-level plots and gene tables. NOT for single-cell RNA-seq, differential expression testing, methylation analysis, or datasets that are too small for WGCNA after quality control.

17
0
FILES
wgcna-analysis/
skill.md
scripts
chunk_io.R
functions.R
io.R
main.R
plotting.R
run_analysis.R
utils.R
wgcna_core.R
wgcna_selection.R
references
algorithm.md
cli-guide.md
troubleshooting.md
90100Total Score
View Evaluation Report
Core Capability
91 / 100
Functional Suitability
11 / 12
Reliability
10 / 12
Performance & Context
7 / 8
Agent Usability
15 / 16
Human Usability
7 / 8
Security
11 / 12
Maintainability
11 / 12
Agent-Specific
19 / 20
Medical Task
25 / 25 Passed
90Updated smoke test
5/5
88Signed bicor run
5/5
91Documentation conformance
5/5
91Explicit module export
5/5
89Chunked loading path
5/5

SKILL.md

WGCNA Analysis

When to Read External Files

AI Agent: This section tells you when to read additional files.

SituationFile to ReadPurpose
Need algorithm detailsreferences/algorithm.mdWGCNA workflow, filtering strategy, statistics, assumptions
Need to run analysisscripts/main.RExecute: Rscript scripts/main.R --input_file ... --group_file ...
Encounter errorsreferences/troubleshooting.mdCommon errors, causes, and fixes
Need CLI examplesreferences/cli-guide.mdComplete command examples for common scenarios
Need conversion audit detailsreferences/diagnosis-report.mdSkill readiness assessment and remediation summary
Need test datatests/data/Minimal example input files for validation

When Not to Use

  • Do not use for single-cell RNA-seq matrices.
  • Do not use for methylation data, DEG testing, or other non-WGCNA workflows.
  • Do not use if the user only wants exploratory discussion and does not want the analysis executed.
  • Do not proceed if the dataset is obviously too small for WGCNA or becomes too small after QC.

When an input is out of scope, stop early and state which limitation applies.


Usage

Rscript scripts/main.R \
  --input_file ./expression_matrix.csv \
  --group_file ./group_info.csv \
  --output_dir ./output/ \
  --sample_column sample \
  --group_column group \
  --network_type unsigned \
  --cor_type pearson \
  --mad_quantile 0.25 \
  --min_mad 0.01 \
  --max_genes 5000 \
  --min_module_size 30 \
  --merge_cut_height 0.25 \
  --soft_r2_cutoff 0.85 \
  --module_of_interest auto \
  --top_modules 1 \
  --tom_sample_size 400 \
  --chunk_size 0 \
  --seed 42 \
  --timeout_seconds 0

Arguments

ShortLongTypeDefaultDescription
-i--input_filecharacterrequiredExpression matrix file with genes in rows and samples in columns
-g--group_filecharacterrequiredSample-to-group mapping file
-o--output_dircharacter./output/Output directory
-a--sample_columncharactersampleSample column in the group file; falls back to the first column if not found
-b--group_columncharactergroupGroup column in the group file; falls back to the second column if not found
-n--network_typecharacterunsignedNetwork type for WGCNA: unsigned or signed
-c--cor_typecharacterpearsonCorrelation type: pearson or bicor
-q--mad_quantiledouble0.25MAD quantile used to define the variability cutoff
-m--min_maddouble0.01Minimum MAD cutoff combined with the quantile filter
-k--max_genesinteger0Maximum number of retained variable genes; 0 keeps all filtered genes
-p--min_module_sizeinteger30Minimum module size used by blockwiseModules()
-r--merge_cut_heightdouble0.25Merge cut height for module merging
-u--soft_r2_cutoffdouble0.85Target scale-free topology R-squared cutoff for soft-threshold selection
-t--trait_of_interestcharacterNULLTrait column used for module membership vs trait scatter plots; defaults to the first trait column
-x--module_of_interestcharacterautoModule color or comma-separated module colors to export; auto ranks modules by absolute module-trait correlation
--top_modulesinteger1Number of top-ranked modules to export when module_of_interest=auto
-y--tom_sample_sizeinteger400Number of genes sampled for the TOM heatmap
--chunk_sizeinteger0Row chunk size for large expression matrices; 0 disables chunked loading
-s--seedinteger42Random seed for reproducibility and TOM heatmap sampling
-z--timeout_secondsinteger0Optional elapsed-time limit in seconds; 0 disables timeout

Input Format

Expression Matrix (input_file)

CSV file with gene identifiers in the first column and sample IDs in the header row. All expression columns must be numeric.

,TCGA-78-7156,TCGA-44-6774,TCGA-69-A59K
A1BG,2.612908495,2.077948602,3.865545859
A1BG-AS1,3.526391179,3.221923473,4.684308865
A1CF,1.179552278,1.136255654,1.188781778

Requirements:

  • First column contains non-empty gene IDs
  • Sample column names are unique and non-empty
  • Expression values are numeric
  • After QC, the matrix must remain large enough for WGCNA

Group File (group_file)

CSV file with at least two columns: one sample ID column and one group column.

sample,group
TCGA-78-7156,Control
TCGA-44-6774,Control
TCGA-69-A59K,Control
TCGA-44-6147,Case

Requirements:

  • Sample IDs must match expression matrix column names
  • Sample IDs must be unique
  • Group values must be non-empty
  • At least two groups are required

Output Files

All files are written under output_dir.

FileFormatDescription
session_info.txttextR session information and package versions
plots/soft_threshold.pdfPDFScale-free topology fit and mean connectivity across tested powers
plots/sample_clustering.pdfPDFSample dendrogram with group color annotation
plots/gene_cluster_modules.pdfPDFGene dendrogram with module color labels
plots/module_eigengene_heatmap.pdfPDFEigengene adjacency heatmap
plots/tom_heatmap.pdfPDFTOM-based network heatmap for a sampled subset of genes
plots/module_trait_relationships.pdfPDFHeatmap of module-trait correlations and p-values
plots/module_membership_vs_trait_<module>_<trait>.pdfPDFScatter plot for each exported module against the selected trait
tables/sft_fit_indices.csvCSVSoft-threshold fit statistics returned by pickSoftThreshold()
tables/module_trait_cor.csvCSVModule eigengene vs trait correlation matrix
tables/module_trait_p.csvCSVP-value matrix corresponding to module-trait correlations
tables/module_assignments.csvCSVPer-gene module assignments
tables/selected_modules.csvCSVRanked summary of exported modules for the selected trait
tables/module_genes_<module>.csvCSVGene-level export for each selected module
tables/analysis_summary.csvCSVSummary of samples, retained genes, selected power, selected trait, and exported modules
data/net.rdsRDSFull WGCNA network object returned by blockwiseModules()
data/analysis_objects.rdsRDSSaved analysis objects, trait data, statistics, and selected modules
data/wgcna_tom-block.*.RDataRDataSaved TOM block file(s) produced by blockwiseModules()

Workflow

Step 1: Load and validate inputs

  • Validate file existence and CLI parameters
  • Read the expression matrix and group file
  • Check sample ID consistency between files

Step 2: Filter variable genes

  • Compute per-gene MAD values
  • Keep genes above the quantile-based MAD cutoff
  • Optionally cap the retained genes with max_genes

Step 3: Build the WGCNA network

  • Transpose the matrix into the WGCNA sample-by-gene format
  • Select soft-threshold power with pickSoftThreshold()
  • Build modules with blockwiseModules()

Step 4: Associate modules with traits

  • Encode groups as a trait matrix
  • Compute module-trait and gene-trait correlations
  • Rank modules by absolute correlation with the selected trait

Step 5: Export results

  • Save summary tables, module assignments, and per-module gene exports
  • Generate dendrogram, heatmap, TOM, and scatter plot PDFs
  • Save session info and serialized R objects

Error Handling

Common Errors

ErrorCauseSolution
SKILL_FILE_NOT_FOUNDInput file path is wrong or a required TOM file is missingCheck file paths and rerun the analysis from the start
SKILL_MISSING_COLUMNSRequired sample, group, or gene identifier columns are missing or emptyVerify the CSV header and column names
SKILL_EMPTY_DATAInput is empty, no genes pass filtering, or the filtered matrix is too small for WGCNACheck data quality and relax filtering settings
SKILL_INVALID_PARAMETERA CLI value is missing, out of range, or not in the allowed choicesReview parameter values in the command
SKILL_SAMPLE_MISMATCHGroup file sample IDs do not match expression matrix sample IDsMake sample identifiers identical across both files
SKILL_PACKAGE_NOT_FOUNDA required R package is not installedInstall missing packages before running

IF error persists, READ: references/troubleshooting.md


Testing

Smoke Test

# Check CLI help
Rscript scripts/main.R --help

# Run with bundled test data
Rscript scripts/main.R \
  --input_file tests/data/expression.csv \
  --group_file tests/data/group.csv \
  --output_dir tests/output-skill/ \
  --sample_column sample \
  --group_column group \
  --network_type unsigned \
  --cor_type pearson \
  --mad_quantile 0.25 \
  --min_mad 0.01 \
  --max_genes 500 \
  --min_module_size 20 \
  --merge_cut_height 0.25 \
  --soft_r2_cutoff 0.8 \
  --module_of_interest auto \
  --top_modules 1 \
  --tom_sample_size 150 \
  --chunk_size 0 \
  --seed 42 \
  --timeout_seconds 0

# Validate required outputs
Rscript tests/validate_outputs.R tests/output-skill/

Validation Commands

Rscript tests/validate_outputs.R tests/output-skill/

Expected outputs include session_info.txt, the plot PDFs, table CSVs, and the serialized R objects listed above.

Additional Verification

  • When exporting explicit modules, confirm the matching tables/module_genes_<module>.csv files exist.
  • When generating module membership plots, confirm the matching plots/module_membership_vs_trait_<module>_<trait>.pdf files exist.
  • If chunked loading is enabled, compare tables/analysis_summary.csv against a non-chunked run on the same input.

Notes

  • Re-running the same command with the same seed is expected to be reproducible.
  • Failed runs may still create output_dir and session_info.txt before exiting; this is expected and safe to overwrite on retry.

Implementation Checklist

  • CLI parsing with optparse
  • set.seed() for reproducibility
  • Dependency checks with explicit package loading
  • Relative source() paths via get_script_dir()
  • Optional timeout parameter
  • Optional chunked loading for large matrices
  • Session info recording
  • Error handling with SKILL_* codes
  • SKILL.md parameters match the implemented CLI
  • File reading instructions in SKILL.md
  • Test data provided in tests/data/
  • Smoke-test output validation script provided

Last updated: 2026-04-17 | Version: 1.0.0