Gene Protein Expression Matrix Normalization

AIPOCH

Use when normalizing bulk gene or protein expression matrices with log2 transform, z-score standardization, or min-max scaling before downstream visualization or exploratory analysis. NOT for count-model normalization such as TPM/DESeq2 size factors, batch correction, or single-cell preprocessing.

FILES

gene-protein-expression-matrix-normalization/
  skill.md
  scripts/
    cli_options.R
    functions.R
    io.R
    main.R
    recording.R
    run_analysis.R
    utils.R
  references/
    algorithm.md
    cli-guide.md
    troubleshooting.md
Evaluation

Total Score: 93 / 100

Core Capability: 93 / 100

| Category | Score |
|---|---|
| Functional Suitability | 12 / 12 |
| Reliability | 11 / 12 |
| Performance & Context | 7 / 8 |
| Agent Usability | 15 / 16 |
| Human Usability | 8 / 8 |
| Security | 11 / 12 |
| Maintainability | 11 / 12 |
| Agent-Specific | 18 / 20 |

Medical Task: 25 / 25 Passed

| Test | Score | Passed |
|---|---|---|
| log2 normalization smoke test | 94 | 5/5 |
| zscore normalization smoke test | 93 | 5/5 |
| minmax normalization smoke test | 93 | 5/5 |
| CLI help and parameter contract | 91 | 5/5 |
| bundled output reproducibility review | 90 | 5/5 |

SKILL.md

Gene Protein Expression Matrix Normalization

When to Use

Use this skill when the user wants to normalize a numeric expression matrix before plotting, clustering, or exploratory comparison.

Typical requests:

  • "Normalize this gene expression matrix with log2"
  • "Do z-score scaling across samples"
  • "Map protein abundance values into 0 to 1"

When Not to Use

Do not use this skill for:

  • Count-model normalization such as CPM, TPM, TMM, or DESeq2 size factors
  • Batch correction or covariate adjustment
  • Single-cell preprocessing workflows
  • Matrices that contain missing, Inf, or NaN values unless they are cleaned first

When to Read External Files

When executing the analysis, run:

Rscript scripts/main.R --input_file <matrix.csv> --output_dir <output_dir> --method <log2|zscore|minmax>

| Situation | File to Read | Purpose |
|---|---|---|
| Need to execute the workflow | scripts/main.R | CLI entry point |
| Need algorithm details | references/algorithm.md | Method definitions and assumptions |
| Encounter an error | references/troubleshooting.md | Standard error codes and fixes |
| Need examples or baseline run details | references/cli-guide.md | Ready-to-run commands and test record |
| Need dependency declarations | DESCRIPTION | Runtime package list |

Usage

Rscript scripts/main.R \
  --input_file tests/data/expression_matrix.csv \
  --output_dir ./output \
  --method log2 \
  --pseudo_count 1 \
  --seed 42

Arguments

| Short | Long | Type | Default | Description |
|---|---|---|---|---|
| -i | --input_file | file | required | Expression matrix in CSV or TSV format |
| -o | --output_dir | dir | ./output | Output directory |
| -m | --method | string | log2 | Normalization method: log2, zscore, minmax |
| -r | --margin | string | column | Apply normalization by row or column |
| -p | --pseudo_count | numeric | 1 | Added before log2 transformation |
| -c | --center | boolean | true | Center values for z-score |
| -s | --scale_values | boolean | true | Scale values for z-score |
| -t | --timeout_seconds | integer | 0 | Optional timeout; 0 disables it |
| -d | --delimiter | string | auto | Input delimiter: auto, csv, or tsv |
|  | --seed | integer | 42 | Random seed |
|  | --verbose | boolean | true | Print progress logs |

Input Format

The first column must contain feature identifiers. Remaining columns must be finite numeric sample values.

Missing values and non-finite values such as NA, NaN, Inf, and -Inf are rejected.

feature,S1,S2,S3
TP53,10,20,30
EGFR,3,5,9

This skill accepts gene or protein expression matrices. It does not perform count-model normalization such as CPM, TPM, TMM, or DESeq2 size factors.
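
The input rules above could be checked roughly as follows. This is only an illustrative sketch, not the code in scripts/io.R: the function name `validate_expression_table` and the message wording are assumptions, and only the error code prefixes come from the Error Handling table.

```r
# Hypothetical validator: first column = feature IDs, remaining columns =
# finite numeric sample values. Not taken from the skill's own scripts.
validate_expression_table <- function(df) {
  if (ncol(df) < 2) {
    stop("SKILL_MISSING_COLUMNS: need one feature column and at least one sample column")
  }
  if (nrow(df) == 0) {
    stop("SKILL_EMPTY_DATA: no rows to normalize")
  }
  values <- df[, -1, drop = FALSE]
  is_num <- vapply(values, is.numeric, logical(1))
  if (!all(is_num)) {
    stop("SKILL_INVALID_PARAMETER: non-numeric sample column(s): ",
         paste(names(values)[!is_num], collapse = ", "))
  }
  mat <- as.matrix(values)
  storage.mode(mat) <- "double"
  # is.finite() is FALSE for NA, NaN, Inf, and -Inf alike.
  if (!all(is.finite(mat))) {
    stop("SKILL_INVALID_PARAMETER: matrix contains NA, NaN, Inf, or -Inf")
  }
  rownames(mat) <- df[[1]]
  mat
}

# The example table from this section, parsed from inline text.
csv_text <- "feature,S1,S2,S3\nTP53,10,20,30\nEGFR,3,5,9"
df <- read.csv(text = csv_text, stringsAsFactors = FALSE)
mat <- validate_expression_table(df)
```

Running a check like this before calling the CLI is one way to catch matrices that need cleaning first.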

Output Files

If --output_dir already exists, result files with the same names are overwritten. When --verbose=true, the workflow prints a warning before writing into a non-empty output directory.

For single-sample inputs, feature_summary.csv reports per-feature standard deviations as 0 by design because each feature contributes one observed value.

| File | Description |
|---|---|
| table/normalized_matrix.csv | Normalized matrix with the original feature column preserved |
| table/feature_summary.csv | Per-feature min, max, mean, and SD before and after normalization |
| table/sample_summary.csv | Per-sample min, max, mean, and SD before and after normalization |
| data/normalized_matrix.rds | Serialized normalized matrix and run metadata |
| run_record.txt | Structured execution record |
| output_manifest.txt | Output file manifest |
| session_info.txt | R session information |
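
A per-feature summary with the single-sample rule described above could be computed along these lines. This is a sketch under assumptions: `summarize_features` and the column names are hypothetical, and the real feature_summary.csv reports values both before and after normalization, whereas this covers one matrix only.

```r
# Hypothetical per-feature summary. In base R, sd() of a single value is NA,
# so the single-sample case is set to 0 explicitly.
summarize_features <- function(mat) {
  sds <- if (ncol(mat) < 2) rep(0, nrow(mat)) else apply(mat, 1, sd)
  data.frame(
    feature = rownames(mat),
    min = apply(mat, 1, min),
    max = apply(mat, 1, max),
    mean = rowMeans(mat),
    sd = sds,
    row.names = NULL
  )
}

expr <- matrix(c(10, 3, 20, 5, 30, 9), nrow = 2,
               dimnames = list(c("TP53", "EGFR"), c("S1", "S2", "S3")))
summary_all <- summarize_features(expr)
summary_one <- summarize_features(expr[, "S1", drop = FALSE])
```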

Methods

log2

Computes log2(x + pseudo_count) for each numeric value.
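
As a minimal sketch of this transform (not the code in scripts/functions.R), the helper below applies log2(x + pseudo_count) elementwise. The name `log2_normalize` and the guard against non-positive arguments are assumptions added for illustration; the source does not say how the skill treats values at or below -pseudo_count.

```r
# Hypothetical helper illustrating log2(x + pseudo_count).
log2_normalize <- function(mat, pseudo_count = 1) {
  stopifnot(is.numeric(mat), all(is.finite(mat)))
  # Assumed guard: x + pseudo_count <= 0 would produce -Inf or NaN.
  if (any(mat + pseudo_count <= 0)) {
    stop("all values must satisfy x + pseudo_count > 0")
  }
  log2(mat + pseudo_count)
}

# The example matrix from the Input Format section (filled column by column).
expr <- matrix(c(10, 3, 20, 5, 30, 9), nrow = 2,
               dimnames = list(c("TP53", "EGFR"), c("S1", "S2", "S3")))
log2_result <- log2_normalize(expr, pseudo_count = 1)
```

With pseudo_count = 1, a raw value of 0 maps to 0 and a raw value of 3 maps to 2.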

zscore

Centers and scales along the selected margin. margin=column standardizes each sample; margin=row standardizes each feature.

When center=false and scale_values=true, the workflow divides by standard deviation without subtracting the mean first.
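
The behavior described here can be sketched as follows. `zscore_normalize` is a hypothetical name, not the skill's function. Note that base R's `scale()` with `center = FALSE` divides by the root mean square rather than the standard deviation, so this sketch computes the SD explicitly to match the description above. Handling of zero-variance vectors is not covered because the source does not specify it.

```r
# Hypothetical helper: standardize along columns (samples) or rows (features).
zscore_normalize <- function(mat, margin = c("column", "row"),
                             center = TRUE, scale_values = TRUE) {
  margin <- match.arg(margin)
  # Work column-wise internally; transpose for row-wise standardization.
  work <- if (margin == "row") t(mat) else mat
  means <- colMeans(work)
  sds <- apply(work, 2, sd)  # SD of the raw values, independent of centering
  if (center) work <- sweep(work, 2, means, "-")
  if (scale_values) work <- sweep(work, 2, sds, "/")
  if (margin == "row") t(work) else work
}

expr <- matrix(c(10, 3, 20, 5, 30, 9), nrow = 2,
               dimnames = list(c("TP53", "EGFR"), c("S1", "S2", "S3")))
z_by_sample <- zscore_normalize(expr, "column")
# center = FALSE: divide each feature by its SD without subtracting the mean.
scaled_only <- zscore_normalize(expr, "row", center = FALSE)
```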

minmax

Rescales values to [0, 1] along the selected margin. Constant vectors are returned as zeros to avoid division-by-zero errors.
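
A minimal sketch of this rescaling, including the constant-vector rule, is shown below. The name `minmax_normalize` is hypothetical and the code is an illustration rather than the skill's implementation.

```r
# Hypothetical helper: rescale each column (sample) or row (feature) to [0, 1].
minmax_normalize <- function(mat, margin = c("column", "row")) {
  margin <- match.arg(margin)
  work <- if (margin == "row") t(mat) else mat
  mins <- apply(work, 2, min)
  ranges <- apply(work, 2, max) - mins
  # A constant vector has range 0; dividing by 1 instead leaves (x - min) = 0,
  # so the vector comes back as zeros without a division by zero.
  safe_ranges <- ifelse(ranges == 0, 1, ranges)
  work <- sweep(sweep(work, 2, mins, "-"), 2, safe_ranges, "/")
  if (margin == "row") t(work) else work
}

expr <- matrix(c(10, 3, 20, 5, 30, 9), nrow = 2,
               dimnames = list(c("TP53", "EGFR"), c("S1", "S2", "S3")))
mm_by_feature <- minmax_normalize(expr, "row")

# First row is constant, so it is returned as zeros.
with_constant <- matrix(c(5, 5, 5, 1, 2, 3), nrow = 2, byrow = TRUE)
mm_constant <- minmax_normalize(with_constant, "row")
```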

Error Handling

| Error | Cause | Solution |
|---|---|---|
| SKILL_FILE_NOT_FOUND | Input file path is invalid | Check the input path |
| SKILL_MISSING_COLUMNS | Matrix has fewer than two columns | Provide one feature column and at least one sample column |
| SKILL_INVALID_PARAMETER | CLI value is unsupported or malformed, or the matrix contains non-finite values | Review the argument table and inspect the matrix values |
| SKILL_TIMEOUT | The run exceeded --timeout_seconds | Increase the timeout or simplify the input size |
| SKILL_EMPTY_DATA | No usable rows or columns remain | Check the input matrix |

Testing

Rscript scripts/main.R --help

Rscript tests/run_tests.R

Rscript tests/run_tests.R audit_output_check

Rscript tests/test_skill.R

Rscript tests/test_skill.R audit_output_check --skip-prepare

tests/run_tests.R executes bundled log2, zscore, and minmax runs and writes their outputs under tests/output/.

When you pass a relative directory name such as audit_output_check, the test runner writes outputs under tests/output/audit_output_check/.

Run tests/run_tests.R before tests/test_skill.R when you want to validate pre-generated outputs explicitly. The validation script can also prepare missing outputs on its own.