
RF Model Importance Analysis

AIPOCH

Use when you need a standardized R CLI workflow to train a two-class random forest model from an expression-like feature matrix, rank variable importance, and generate reproducible error and importance plots. NOT for regression tasks, multi-class classification, missing-value imputation, preprocessing, or remote data fetching.

FILES

rf-model-importance-analysis/
  skill.md
  scripts/
    cli_options.R
    core_option_groups.R
    functions.R
    io.R
    main.R
    option_validation.R
    path_utils.R
    plot_option_groups.R
    recording.R
    run_analysis.R
    utils.R
    validation_utils.R
    visualization.R
  references/
    algorithm.md
    cli-guide.md
    troubleshooting.md
Evaluation Report

Total Score: 92 / 100

Core Capability: 97 / 100

  • Functional Suitability: 12 / 12
  • Reliability: 11 / 12
  • Performance & Context: 8 / 8
  • Agent Usability: 15 / 16
  • Human Usability: 7 / 8
  • Security: 12 / 12
  • Maintainability: 12 / 12
  • Agent-Specific: 20 / 20

Medical Task: 20 / 20 Passed

| Score | Test case | Result |
|---|---|---|
| 90 | Bundled dataset full analysis | 4/4 |
| 89 | Custom importance metric and thresholds | 4/4 |
| 86 | Identical case and control labels | 4/4 |
| 87 | Plot-only rerender from existing model | 4/4 |
| 89 | Heavier forest on bundled data | 4/4 |

SKILL.md

RF Model Importance Analysis

Quick Start

Use one of these three commands first, then consult the full argument table only if you need extra tuning.

1. Standard Run

Rscript scripts/main.R \
  --input_file tests/data/expression_matrix.csv \
  --group_file tests/data/group_info.csv \
  --case_group AR \
  --control_group Control \
  --output_dir tests/output/manual-test \
  --seed 42 \
  --timeout_seconds 300

2. Tuned Importance Run

Rscript scripts/main.R \
  --input_file tests/data/expression_matrix.csv \
  --group_file tests/data/group_info.csv \
  --case_group AR \
  --control_group Control \
  --output_dir tests/output/custom-importance \
  --seed 42 \
  --rf_ntree 800 \
  --rf_mtry 4 \
  --rf_imp_type 2 \
  --rf_imp_threshold 1 \
  --rf_top_n 8 \
  --rf_importance_top_n 8 \
  --timeout_seconds 300

3. Plot-Only Rerender

Run this only after a full analysis has already created output_dir/data/rf_result.rds.

Rscript scripts/main.R \
  --plot_only TRUE \
  --output_dir tests/output/manual-test \
  --seed 42 \
  --timeout_seconds 300
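
Because plot-only mode depends on an existing data/rf_result.rds, it can help to check for that file before calling the CLI. The following is a minimal sketch, not part of the skill itself: the helper names has_rf_model and rerender_plots are hypothetical, and only the file path and CLI flags come from this guide.

```shell
# Succeed only if the serialized model needed by --plot_only exists.
# The layout (data/rf_result.rds under the output directory) is taken from this guide.
has_rf_model() {
  [ -f "$1/data/rf_result.rds" ]
}

# Hypothetical wrapper: rerender plots only when the model bundle is present.
rerender_plots() {
  out_dir=$1
  if ! has_rf_model "$out_dir"; then
    echo "No $out_dir/data/rf_result.rds found; run a full analysis first." >&2
    return 1
  fi
  Rscript scripts/main.R \
    --plot_only TRUE \
    --output_dir "$out_dir" \
    --seed 42 \
    --timeout_seconds 300
}
```

For example, rerender_plots tests/output/manual-test returns 1 with a short message instead of invoking Rscript when no earlier full run exists.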

When to Read External Files

| Situation | File to Read | Purpose |
|---|---|---|
| Need algorithm details | references/algorithm.md | Explain random forest modeling, importance metrics, assumptions, and result interpretation |
| Need to execute the analysis | scripts/main.R | Run the CLI entry point with a complete Rscript command |
| Encounter an error | references/troubleshooting.md | Map error codes to causes and fixes |
| Need CLI examples | references/cli-guide.md | See installation steps and runnable command examples |
| Need a runnable smoke test | tests/data/ | Use the bundled small dataset for verification |

Stop Conditions

Do not use this skill when any of the following is true:

  • The task is regression, multiclass classification, time-series modeling, or remote data fetching.
  • The input still requires imputation, normalization, batch correction, or other preprocessing.
  • The feature matrix contains missing values, non-numeric feature columns, or mismatched sample IDs.

If one of those conditions applies, stop and hand off to a preprocessing or alternative modeling workflow before running this skill.

Usage

Before running the CLI, ensure the data is already cleaned for binary classification: samples in rows, numeric feature columns only, and no missing values. Imputation, normalization, and batch correction are outside this skill's scope.

Rscript scripts/main.R \
  --input_file ./input/expression_matrix.csv \
  --group_file ./input/group_info.csv \
  --case_group Case \
  --control_group Control \
  --output_dir output/basic-run \
  --seed 42 \
  --timeout_seconds 600

Arguments

| Short | Long | Type | Default | Required | Description |
|---|---|---|---|---|---|
| -i | --input_file | character | none | yes, unless --plot_only TRUE | Expression matrix file with samples in rows and features in columns |
| -g | --group_file | character | none | yes, unless --plot_only TRUE | Group file with sample IDs in the first column |
| -c | --case_group | character | none | yes, unless --plot_only TRUE | Case group label |
| -r | --control_group | character | none | yes, unless --plot_only TRUE | Control group label |
| -o | --output_dir | character | output | yes | Output directory inside the skill root |
| -p | --plot_only | logical | FALSE | no | Reuse output_dir/data/rf_result.rds and regenerate plots without retraining |
| -s | --seed | integer | 42 | no | Random seed for reproducibility |
| -t | --timeout_seconds | integer | 600 | no | Elapsed time limit for the run |
| | --rf_ntree | integer | 500 | no | Number of trees in the random forest |
| | --rf_mtry | integer | NA | no | Variables sampled at each split; NA uses the package default |
| | --rf_nodesize | integer | NA | no | Minimum terminal node size; NA uses the package default |
| | --rf_imp_type | integer | 1 | no | Importance metric type passed to randomForest::importance; allowed values are 1 or 2 |
| | --rf_imp_threshold | numeric | 0 | no | Minimum importance score retained in rf_top_features.csv |
| | --rf_top_n | integer | 30 | no | Maximum number of rows written to rf_top_features.csv |
| | --rf_error_xlab | character | Number of Trees | no | X-axis label for the RF error plot |
| | --rf_error_ylab | character | Error | no | Y-axis label for the RF error plot |
| | --rf_error_line_size | numeric | 0.6 | no | Line width for the RF error plot |
| | --rf_error_line_alpha | numeric | 1 | no | Line alpha for the RF error plot |
| | --rf_error_line_color | character | #6C85F9,#D9503D,#939DE4,#DEA441,#A2C6D6,#E9B9E1,#BDD69F,#EBC98A | no | Comma-separated line colors for non-OOB curves |
| | --rf_error_line_type | character | dashed | no | Line type for class-specific error curves |
| | --rf_error_line_oob_type | character | solid | no | Line type for the OOB curve |
| | --rf_error_legend_position | character | none | no | Legend position for the RF error plot |
| | --rf_error_border_color | character | black | no | Panel border color for the RF error plot |
| | --rf_error_border_fill | character | NA | no | Panel fill for the RF error plot; use NA or NULL as text |
| | --rf_error_border_size | numeric | 0.8 | no | Panel border width for the RF error plot |
| | --rf_error_base_size | numeric | 14 | no | Base font size for the RF error plot |
| | --rf_error_width | numeric | 6 | no | RF error plot width in inches |
| | --rf_error_height | numeric | 5 | no | RF error plot height in inches |
| | --rf_importance_sort | logical | TRUE | no | Sort variables in the importance plot |
| | --rf_importance_top_n | integer | 10 | no | Maximum number of variables shown in the importance plot |
| | --rf_importance_label_x_ann | logical | TRUE | no | Show x-axis tick labels in the importance plot |
| | --rf_importance_label_color | character | black | no | Text and point outline color in the importance plot |
| | --rf_importance_label_cex | numeric | 0.9 | no | Label size in the importance plot |
| | --rf_importance_point_cex | numeric | 0.9 | no | Point size in the importance plot |
| | --rf_importance_point_shape | integer | 23 | no | Point shape in the importance plot |
| | --rf_importance_point_fill | character | red | no | Point fill color in the importance plot |
| | --rf_importance_line_color | character | gray | no | Segment color in the importance plot |
| | --rf_importance_theme_border | logical | TRUE | no | Draw panel borders in the importance plot |
| | --rf_importance_theme_offset | numeric | 0.2 | no | Axis expansion factor in the importance plot |
| | --rf_importance_title | character | Variable Importance | no | Main title for the importance plot |
| | --rf_importance_title_x_ann | logical | TRUE | no | Show title and axis annotations in the importance plot |
| | --rf_importance_width | numeric | 6 | no | RF importance plot width in inches |
| | --rf_importance_height | numeric | 5 | no | RF importance plot height in inches |

Input Format

Expression Matrix

  • CSV or TSV.
  • First column: sample IDs.
  • Remaining columns: numeric features.
  • Samples must be rows.
  • Missing or non-numeric feature values are not allowed.

Example:

sample,HIF1A,NR4A1,SOCS1
S1,6.21,-1.34,2.01
S2,6.57,0.37,3.62
S3,7.05,2.12,5.01
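
The format rules above can be checked before a run with a small pre-flight script. This is a rough sketch rather than the skill's own validation: the function name check_matrix is hypothetical, and it assumes a plain comma-separated file with a header row, no quoted fields, and Unix line endings.

```shell
# Report rows whose feature columns (2..N) hold missing or non-numeric values.
# Assumes a simple comma-separated file with a header and no quoted fields.
check_matrix() {
  awk -F',' '
    NR == 1 { ncol = NF; next }          # remember the header width
    NF == 0 { next }                     # ignore blank lines
    NF != ncol {
      printf "row %d: expected %d columns, found %d\n", NR, ncol, NF
      bad = 1
      next
    }
    {
      for (i = 2; i <= NF; i++) {
        # Accept plain decimals and scientific notation; reject empty cells and NA.
        if ($i !~ /^[-+]?([0-9]+\.?[0-9]*|\.[0-9]+)([eE][-+]?[0-9]+)?$/) {
          printf "row %d, column %d: non-numeric or missing value \"%s\"\n", NR, i, $i
          bad = 1
        }
      }
    }
    END { exit (bad ? 1 : 0) }
  ' "$1"
}
```

Calling check_matrix tests/data/expression_matrix.csv prints one line per offending cell and returns a non-zero status if any were found, which makes it easy to stop before invoking Rscript.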

Group File

  • CSV or TSV.
  • First column: sample IDs.
  • One additional column must contain both the case and control labels.
  • Exactly two groups are supported.

Example:

sample,group
S1,Case
S2,Case
S3,Control
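
Since mismatched sample IDs and group counts other than two are stop conditions, a quick comparison of the two files can catch them early. The sketch below is illustrative only: the function names are hypothetical, the group label is assumed to sit in the second column as in the example, and the files are assumed to be simple comma-separated text with a header row. Whether the CLI itself requires identical ID sets or tolerates extra rows is not stated here, so treat the strict comparison as an assumption.

```shell
# List sample IDs (first column, header skipped) in sorted order.
sample_ids() {
  tail -n +2 "$1" | cut -d',' -f1 | sort
}

# Succeed only when both files contain exactly the same sample IDs.
same_sample_ids() {
  [ "$(sample_ids "$1")" = "$(sample_ids "$2")" ]
}

# Count distinct labels in the second column of the group file (assumed layout).
group_label_count() {
  tail -n +2 "$1" | cut -d',' -f2 | sort -u | wc -l | tr -d ' '
}
```

A typical use is to run same_sample_ids on the matrix and group files and to confirm that group_label_count prints 2 before starting the analysis.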

Output Files

| File | Format | Description |
|---|---|---|
| data/rf_result.rds | RDS | Serialized model bundle with the trained random forest and metadata |
| table/rf_feature_importance.csv | CSV | Full ranked feature-importance table using the selected importance metric |
| table/rf_top_features.csv | CSV | Filtered top feature table after applying --rf_imp_threshold and --rf_top_n |
| plot/rf_error_plot.pdf | PDF | Error curves across trees for OOB and class-specific classification error |
| plot/rf_importance_plot.pdf | PDF | Variable-importance plot generated by randomForest::varImpPlot() |
| session_info.txt | TXT | R version, platform, and package version information |
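
After a full run, the listed outputs can be verified with a short check. This is a sketch under the assumption that every path in the table is relative to the directory passed as --output_dir; the function name missing_outputs is hypothetical.

```shell
# Print each expected output that is absent; return non-zero if any is missing.
# Paths are assumed to be relative to the directory given as --output_dir.
missing_outputs() {
  out_dir=$1
  rc=0
  for f in data/rf_result.rds \
           table/rf_feature_importance.csv \
           table/rf_top_features.csv \
           plot/rf_error_plot.pdf \
           plot/rf_importance_plot.pdf \
           session_info.txt; do
    if [ ! -f "$out_dir/$f" ]; then
      echo "$f"
      rc=1
    fi
  done
  return "$rc"
}
```

For instance, missing_outputs tests/output/manual-test prints nothing and returns 0 when all six files are present.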

Error Handling

  • Successful runs exit with status code 0.
  • Failed runs exit with status code 1.
  • Error messages use standardized names such as SKILL_FILE_NOT_FOUND and SKILL_INVALID_PARAMETER.
  • Output paths are validated so that --output_dir cannot write outside the skill root.
  • The analysis never performs network requests and never executes user input through eval(), exec(), or system().

Common codes:

| Error Code | Meaning |
|---|---|
| SKILL_FILE_NOT_FOUND | An input file or required plot-only artifact does not exist |
| SKILL_MISSING_COLUMNS | The input file does not contain the required columns |
| SKILL_EMPTY_DATA | An input file is empty or a required model table is unavailable |
| SKILL_INVALID_PARAMETER | A CLI argument, group setting, numeric constraint, or path is invalid |
| SKILL_SAMPLE_MISMATCH | Sample IDs do not match between the expression matrix and group file |
| SKILL_PACKAGE_NOT_FOUND | One or more required CRAN packages are missing |
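
When the CLI is driven from a script, the exit status and the standardized error name can be combined into a one-line report. The sketch below assumes the error name appears somewhere in the text written to stderr, which this guide does not confirm; the function names extract_skill_code and run_and_report are hypothetical.

```shell
# Pull the first SKILL_* error name out of text read from stdin.
extract_skill_code() {
  grep -oE 'SKILL_[A-Z_]+' | head -n 1
}

# Run any command, capture its stderr, and summarize a failure in one line.
# Assumes the error name is printed on stderr (not confirmed by this guide).
run_and_report() {
  log=$(mktemp)
  rc=0
  "$@" 2> "$log" || rc=$?
  if [ "$rc" -ne 0 ]; then
    code=$(extract_skill_code < "$log")
    echo "failed with status $rc: ${code:-unknown error}"
  fi
  rm -f "$log"
  return "$rc"
}
```

A call such as run_and_report Rscript scripts/main.R --help passes the status through unchanged, so it can wrap any of the commands in this guide without altering how callers branch on success or failure.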

For detailed fixes, READ: references/troubleshooting.md

Testing

Help Check

Rscript scripts/main.R --help

Full Test Run

Rscript tests/run_tests.R

Direct Test Command

Rscript scripts/main.R \
  --input_file tests/data/expression_matrix.csv \
  --group_file tests/data/group_info.csv \
  --case_group AR \
  --control_group Control \
  --output_dir tests/output/manual-test \
  --seed 42 \
  --rf_ntree 200 \
  --rf_top_n 5 \
  --rf_importance_top_n 5 \
  --timeout_seconds 300