Cibersort Immune Infiltration Analysis

AIPOCH

Use when estimating relative immune cell infiltration from a bulk expression matrix with a CIBERSORT-style nu-SVR deconvolution workflow based on an LM22 signature matrix, comparing one case group against one control group, and generating structured tables plus immune-fraction plots. NOT for single-cell RNA-seq, spatial data, clinical diagnosis, or workflows that require the original hosted CIBERSORT web service.

FILES

cibersort-immune-infiltration-analysis/
  skill.md
  scripts/
    cli_options.R
    deconvolution.R
    functions.R
    io.R
    main.R
    recording.R
    recording_helpers.R
    recording_reports.R
    run_analysis.R
    utils.R
    visualization.R
  references/
    algorithm.md
    cli-guide.md
    troubleshooting.md
Evaluation Report

Total Score: 90 / 100

| Category | Score |
|---|---|
| Core Capability | 94 / 100 |
| Functional Suitability | 12 / 12 |
| Reliability | 11 / 12 |
| Performance & Context | 8 / 8 |
| Agent Usability | 15 / 16 |
| Human Usability | 7 / 8 |
| Security | 11 / 12 |
| Maintainability | 12 / 12 |
| Agent-Specific | 18 / 20 |
| Medical Task | 35 / 35 Passed |

| Scenario | Score | Passed |
|---|---|---|
| Packaged validation run | 91 | 5/5 |
| No-plot explicit-column run | 90 | 5/5 |
| Zero-permutation boundary | 86 | 5/5 |
| Quantile-normalized run | 89 | 5/5 |
| Higher-permutation run | 88 | 5/5 |
| Missing case group | 82 | 5/5 |
| Corrupted signature matrix | 83 | 5/5 |

SKILL.md

CIBERSORT Immune Infiltration Analysis

When to Use

  • Estimate relative immune cell fractions from a bulk expression matrix.
  • Compare one case group against one control group after deconvolution.
  • Generate structured tables, a serialized result object, and optional PDF plots.

When Not to Use

  • Single-cell RNA-seq, spatial transcriptomics, or clustering tasks.
  • Absolute clinical interpretation or treatment recommendation.
  • Workflows that require the original online CIBERSORT service instead of a local R implementation.

Workflow

  1. Confirm that the expression matrix, group file, and signature matrix are available.
  2. Run scripts/main.R with the case and control groups.
  3. Review the full result table, derived summary tables, and optional plots.
  4. Inspect run_record.txt and output_manifest.txt after each run, including failed validation attempts.

When to Read External Files

| Situation | File to Read | Purpose |
|---|---|---|
| Need to run the analysis | scripts/main.R | CLI entry point |
| Need algorithm details | references/algorithm.md | HQ reference workflow and result interpretation |
| Encounter an error | references/troubleshooting.md | Error codes and environment fixes |
| Need CLI examples or the baseline record | references/cli-guide.md | Example commands and validation notes |
| Need packaged test inputs | tests/data/ | Demo expression matrix, group file, and LM22 file |

Usage

Rscript scripts/main.R \
  --input_file ./expression_matrix.csv \
  --group_file ./group_info.csv \
  --signature_file ./LM22.txt \
  --case_group treatment \
  --control_group control \
  --output_dir ./output \
  --qn false \
  --seed 42

Arguments

| Short | Long | Type | Default | Description |
|---|---|---|---|---|
| -i | --input_file | file | required | Expression matrix with genes as rows and samples as columns |
| -g | --group_file | file | required | Group annotation table |
| -a | --case_group | string | required | Case group label |
| -b | --control_group | string | required | Control group label |
| -o | --output_dir | dir | ./output | Output directory |
|  | --signature_file | file | tests/data/LM22.txt when present | Signature matrix file |
|  | --sample_col | string/int | none | Optional sample column name or 1-based index |
|  | --group_col | string/int | none | Optional group column name or 1-based index |
|  | --gene_id_case | string | upper | Gene ID normalization: asis, upper, or lower |
|  | --auto_unlog | boolean | true | Apply 2^x only if the expression matrix passes a conservative log-scale heuristic |
|  | --min_mean_expression | numeric | 1 | Minimum mean expression before deconvolution |
|  | --perm | integer | 1000 | Permutation count for empirical p-value estimation; 0 keeps the run lightweight but records P-value as NA |
|  | --qn | boolean | true | Apply quantile normalization to the mixture matrix |
|  | --svm_cores | integer | 1 | Worker count for the nu-SVR model selection step |
|  | --make_plots | boolean | true | Generate PDF plots |
|  | --plot_width | numeric | 16 | Default plot width in inches |
|  | --plot_height | numeric | 10 | Default plot height in inches |
| -s | --seed | integer | 42 | Random seed |
| -t | --timeout_seconds | integer | 0 | Optional timeout in seconds; 0 disables it |
|  | --verbose | boolean | true | Print progress logs |

Input Format

Expression Matrix

CSV or TSV. The first column must contain gene identifiers. Remaining columns must be numeric sample-level expression values.

When --auto_unlog=true, the workflow reports summary statistics and applies 2^x only if the matrix passes a conservative log-scale heuristic. If the matrix is ambiguous, the values are left unchanged and the startup log explains why.
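
The exact heuristic is not spelled out in this document, so the following is only a sketch of the general idea: look at the largest value in the matrix and treat a small maximum as a sign of log-scaled data. The cutoff of 50 and the helper names `max_expression` and `looks_log_scaled` are assumptions for illustration, not the skill's rule, and the sketch assumes a comma-delimited file without quoted fields.

```shell
# Sketch only: guess whether a CSV expression matrix is log-scaled from its
# largest value. The cutoff of 50 is an illustrative assumption and is not
# taken from the skill's documentation.
max_expression() {
  awk -F',' '
    NR > 1 {                                   # skip the header row
      for (c = 2; c <= NF; c++)                # skip the gene column
        if (!seen || $c + 0 > max) { max = $c + 0; seen = 1 }
    }
    END { if (seen) print max }
  ' "$1"
}

looks_log_scaled() {
  max=$(max_expression "$1")
  # Succeeds (status 0) only when a maximum was found and it is below 50.
  [ -n "$max" ] && awk -v m="$max" 'BEGIN { exit !(m < 50) }'
}

# Example:
# looks_log_scaled ./expression_matrix.csv && echo "values look log-scaled"
```

A single cutoff like this misclassifies some matrices, which is presumably why the skill describes its own check as conservative and leaves ambiguous inputs unchanged.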

If duplicate gene identifiers are present, they are consolidated after gene-ID normalization by taking the per-sample maximum before downstream filtering and deconvolution.

gene,Sample1,Sample2,Sample3
TP53,10.2,8.5,9.1
CXCL9,4.3,6.1,5.7
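
The duplicate-gene rule described above can be illustrated outside R. This is a minimal awk sketch under stated assumptions: a comma-delimited file without quoted fields, upper-case normalization (the documented default for --gene_id_case), and a hypothetical helper name `consolidate_genes`. It is not the skill's own implementation.

```shell
# Sketch only: collapse duplicate gene rows by per-sample maximum after
# upper-casing the gene identifier. consolidate_genes is a hypothetical name.
consolidate_genes() {
  awk -F',' -v OFS=',' '
    NR == 1 { print; next }                    # keep the header row as-is
    {
      id = toupper($1)
      if (!(id in seen)) {                     # remember first-seen order
        seen[id] = 1
        order[++n] = id
        ncol[id] = NF
      }
      for (c = 2; c <= NF; c++) {
        key = id SUBSEP c
        if (!(key in best) || $c + 0 > best[key] + 0) best[key] = $c
      }
    }
    END {
      for (i = 1; i <= n; i++) {
        id = order[i]
        line = id
        for (c = 2; c <= ncol[id]; c++) line = line OFS best[id SUBSEP c]
        print line
      }
    }
  ' "$1"
}

# Example:
# consolidate_genes ./expression_matrix.csv > ./expression_matrix.dedup.csv
```

Rows `tp53,1.5,9` and `TP53,3,2` would collapse to a single `TP53,3,9` row, because the maximum is taken separately for each sample column.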

Group File

CSV or TSV with one sample column and one group column.

sample,group
Sample1,control
Sample2,treatment
Sample3,treatment
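
Because the sample names in the group file have to line up with the expression matrix header, a quick comparison before running can save a failed run. The sketch below assumes comma-delimited files, the sample identifier in the first column of the group file, and no quoted fields; `unannotated_samples` is a hypothetical helper, and how the skill itself reports mismatches may differ.

```shell
# Sketch only: print expression-matrix samples that have no row in the
# group file. unannotated_samples is a hypothetical helper name.
# Usage: unannotated_samples EXPRESSION_CSV GROUP_CSV
unannotated_samples() {
  awk -F',' '
    NR == FNR {                      # first file read: the group table
      if (FNR > 1) known[$1] = 1     # skip its header row
      next
    }
    FNR == 1 {                       # second file: expression header only
      for (c = 2; c <= NF; c++)
        if (!($c in known)) print $c
      exit
    }
  ' "$2" "$1"
}

# Example:
# unannotated_samples ./expression_matrix.csv ./group_info.csv
```

Empty output means every sample column has a group annotation; any printed name is a candidate cause of a sample-mismatch error.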

Signature Matrix

The packaged default is tests/data/LM22.txt. A custom signature matrix must contain one gene column followed by immune-cell signature columns.

All immune-cell signature columns must be numeric and finite. If duplicate gene identifiers are present, they are consolidated by taking the per-cell-type maximum before gene intersection.

Output Files

| File | Description |
|---|---|
| data/cibersort_input.rds | Serialized aligned input matrices used by the local algorithm |
| data/cibersort_null_distribution.rds | Serialized permutation null distribution |
| data/cibersort_result.rds | Serialized result object with cell fractions, metrics, runtime settings, and heatmap rendering metadata |
| table/CIBERSORT_Results.csv | Full result table in CSV format |
| table/CIBERSORT-Results.txt | Full result table in tab-delimited text format |
| table/cibersort_cell_fractions_wide.csv | Wide-format immune cell fraction table |
| table/cibersort_cell_fractions_long.csv | Long-format immune cell fraction table |
| table/cibersort_group_compare.csv | Case-vs-control comparison summary |
| table/cibersort_quality_metrics.csv | Sample-level P-value, Correlation, and RMSE table |
| table/immune_cell_correlation_matrix.csv | Spearman correlation matrix across immune cell types |
| table/immune_cell_correlation_pvalue.csv | P-value matrix aligned to the correlation matrix |
| plot/immune_cell_composition_sample.pdf | Sample-level stacked composition plot when --make_plots=true |
| plot/immune_group_boxplot.pdf | Group comparison boxplot when --make_plots=true |
| plot/immune_correlation_heatmap.pdf | Immune-cell correlation heatmap when --make_plots=true |
| session_info.txt | R session information |
| output_manifest.txt | Append-only output manifest for successful and failed runs |
| run_record.txt | Append-only structured run record, including runtime notes and failed-run summaries |

When --make_plots=false, the plot/ directory may still exist as part of the standard output layout, but no PDF plot files are written.
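
After a run, the output layout listed above can be verified from the shell. This is a sketch rather than part of the skill: `missing_outputs` is a hypothetical helper, the file list simply mirrors the Output Files table, and whether every file is written under every option combination (for example --perm 0) is not confirmed here.

```shell
# Sketch only: print the documented output files that are absent from an
# output directory. missing_outputs is a hypothetical helper name.
# Usage: missing_outputs OUTPUT_DIR [true|false]   (second arg: plots expected)
missing_outputs() {
  out_dir=$1
  make_plots=${2:-true}
  files="data/cibersort_input.rds
data/cibersort_null_distribution.rds
data/cibersort_result.rds
table/CIBERSORT_Results.csv
table/CIBERSORT-Results.txt
table/cibersort_cell_fractions_wide.csv
table/cibersort_cell_fractions_long.csv
table/cibersort_group_compare.csv
table/cibersort_quality_metrics.csv
table/immune_cell_correlation_matrix.csv
table/immune_cell_correlation_pvalue.csv
session_info.txt
output_manifest.txt
run_record.txt"
  if [ "$make_plots" = "true" ]; then
    files="$files
plot/immune_cell_composition_sample.pdf
plot/immune_group_boxplot.pdf
plot/immune_correlation_heatmap.pdf"
  fi
  # Paths contain no spaces, so plain word splitting on newlines is safe here.
  for rel in $files; do
    [ -e "$out_dir/$rel" ] || echo "$rel"
  done
}

# Example:
# missing_outputs ./output false
```

Empty output means every expected file is present; passing `false` as the second argument skips the three PDF plots.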

When --perm=0, the workflow logs a warning and completes without empirical permutation testing, so the P-value column is recorded as NA.

When a rerun targets an existing --output_dir and then fails validation or execution, the previous successful payload is preserved and the failure is appended to run_record.txt and output_manifest.txt.

Error Handling

| Error Code | Meaning | Solution |
|---|---|---|
| SKILL_FILE_NOT_FOUND | An input file or signature matrix was not found | Check the file path and rerun |
| SKILL_MISSING_COLUMNS | A required column is missing | Fix the input schema |
| SKILL_EMPTY_DATA | No usable genes, samples, or deconvolution outputs remain | Check the data, filtering, or signature overlap |
| SKILL_INVALID_PARAMETER | A CLI parameter is missing or invalid | Review the argument table and input values |
| SKILL_SAMPLE_MISMATCH | Expression samples and group annotations do not align | Harmonize sample identifiers |
| SKILL_PACKAGE_NOT_FOUND | A required R package is missing | Install the missing package |
| SKILL_TIMEOUT | The configured time limit was exceeded | Increase --timeout_seconds or set it to 0 |

If the error persists, READ: references/troubleshooting.md
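
When a run is driven from a wrapper script, the error codes above can be mapped back to their suggested fixes. The sketch below assumes that the code appears verbatim somewhere in the captured log text, which this document does not confirm; `first_skill_error` and `suggest_fix` are hypothetical helper names, and the fix strings are copied from the table.

```shell
# Sketch only: pull the first SKILL_* code out of log text on stdin and map
# it to the fix listed in the Error Handling table. Both helper names are
# hypothetical, and the log format is an assumption.
first_skill_error() {
  grep -oE 'SKILL_[A-Z_]+' | head -n 1
}

suggest_fix() {
  case "$1" in
    SKILL_FILE_NOT_FOUND)    echo "Check the file path and rerun" ;;
    SKILL_MISSING_COLUMNS)   echo "Fix the input schema" ;;
    SKILL_EMPTY_DATA)        echo "Check the data, filtering, or signature overlap" ;;
    SKILL_INVALID_PARAMETER) echo "Review the argument table and input values" ;;
    SKILL_SAMPLE_MISMATCH)   echo "Harmonize sample identifiers" ;;
    SKILL_PACKAGE_NOT_FOUND) echo "Install the missing package" ;;
    SKILL_TIMEOUT)           echo "Increase --timeout_seconds or set it to 0" ;;
    *)                       echo "READ: references/troubleshooting.md" ;;
  esac
}

# Example, assuming the run's output was captured to run.log:
# suggest_fix "$(first_skill_error < run.log)"
```

Unrecognized or missing codes fall through to the troubleshooting reference.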

Input Validation

This skill accepts:

  • A bulk expression matrix file in CSV or TSV format with one gene column and numeric sample columns.
  • A group annotation file in CSV or TSV format with one sample column and one group column.
  • Exactly one case group label and one control group label for comparison.
  • An optional custom signature matrix compatible with the documented LM22-style schema.

Do not use this skill for:

  • Single-cell RNA-seq, spatial transcriptomics, or cell clustering workflows.
  • Clinical diagnosis, treatment recommendation, or patient-level medical decision making.
  • Requests that need the hosted CIBERSORT web service rather than this local R implementation.
  • Multi-group study designs that compare more than one case group against a control group in a single run.

If the user's request is outside this scope, do not proceed with the workflow. Instead respond:

"cibersort-immune-infiltration-analysis is designed for local CIBERSORT-style immune deconvolution from a bulk expression matrix with one case group and one control group. Your request appears to be outside this scope. Please provide compatible bulk-expression inputs and group labels, or use a more appropriate tool for your task."

Testing

Rscript scripts/main.R --help

Rscript tests/run_tests.R

Rscript tests/test_skill.R

Validated packaged test path:

Rscript scripts/main.R \
  --input_file tests/data/expression_matrix.csv \
  --group_file tests/data/group_info.csv \
  --signature_file tests/data/LM22.txt \
  --case_group Tumor \
  --control_group Healthy \
  --output_dir tests/output \
  --perm 25 \
  --qn false \
  --svm_cores 1 \
  --seed 42

Container note:

  • The packaged test path uses --qn false because preprocessCore::normalize.quantiles() may trigger environment-level thread failures in some containers.
  • If you need a quantile-normalized run, validate that environment first and record the result in references/cli-guide.md.
  • tests/run_tests.R also checks that a failed rerun does not erase an existing successful payload directory.