Umap Tsne Analysis

AIPOCH

Use when performing sample-level dimensionality reduction and visualization on abundance or OTU-style matrices with a companion group file, generating UMAP and/or t-SNE coordinates and plots for group separation assessment. NOT for: differential expression testing, single-cell workflows requiring dedicated embedding pipelines, or analyses without a sample grouping file.

Files
umap-tsne-analysis/
  skill.md
  scripts/
    dim_reduction_methods.R
    functions.R
    install_dependencies.R
    main.R
    run_analysis.R
    utils.R
    visualization.R
  references/
    algorithm.md
    cli-guide.md
    troubleshooting.md

Evaluation Report: total score 85/100

Core Capability: 85 / 100

| Dimension | Score |
|---|---|
| Functional Suitability | 11 / 12 |
| Reliability | 10 / 12 |
| Performance & Context | 7 / 8 |
| Agent Usability | 13 / 16 |
| Human Usability | 7 / 8 |
| Security | 11 / 12 |
| Maintainability | 10 / 12 |
| Agent-Specific | 16 / 20 |

Medical Task: 22 / 25 passed

| Test case | Score | Result |
|---|---|---|
| Both UMAP and t-SNE on OTU matrix with 3 groups | 90 | 5/5 |
| UMAP-only with custom column names and no normalization | 87 | 5/5 |
| Single-group input | 84 | 4/5 |
| Both methods with 120-second timeout | 85 | 4/5 |
| Partial sample ID overlap between group file and matrix | 81 | 4/5 |

SKILL.md

UMAP and t-SNE Analysis

Prerequisites

Run the following before the first analysis to install all required R packages:

Rscript scripts/install_dependencies.R

Alternative manual installation:

Rscript -e "install.packages(c('optparse','data.table','Rtsne','umap','ggplot2','vegan','R.utils'), repos='https://cloud.r-project.org')"

Note: R.utils is only required when --timeout > 0, but pre-installing it avoids environment drift across runs. The manual command above omits testthat, which scripts/install_dependencies.R installs as a development dependency for the test suite.

The skill cannot run until these packages are installed. In new or bare R environments, always run the prerequisite step first.


When to Read External Files

| Situation | File to Read | Purpose |
|---|---|---|
| Need algorithm details | references/algorithm.md | Dimensionality reduction methods, assumptions, parameter interpretation |
| Need to run analysis | scripts/main.R | Execute: Rscript scripts/main.R --input_file ... --group_file ... |
| Encounter errors | references/troubleshooting.md | Common errors and solutions |
| Need CLI examples | references/cli-guide.md | Detailed CLI usage examples |
| Need test data | tests/data/ | Sample input files for testing |

Usage

Rscript scripts/main.R \
  --input_file ./otu_table.csv \
  --group_file ./group_info.csv \
  --output_dir ./output/ \
  --method both \
  --seed 42
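When an agent drives the skill from Python, building the argument list explicitly avoids shell-quoting problems with paths that contain spaces. The sketch below is illustrative only. `build_command` and `run_skill` are hypothetical helpers, only long flags from the Arguments table are emitted, and it assumes the working directory is the skill root.

```python
import subprocess
from typing import Optional


# Hypothetical helpers for illustration; not part of the skill's R code.
def build_command(input_file: str, group_file: str, output_dir: str = "./output/",
                  method: str = "both", seed: int = 42,
                  extra: Optional[dict] = None) -> list:
    """Return an argv list for scripts/main.R using long option names."""
    if method not in ("tsne", "umap", "both"):
        raise ValueError(f"method must be tsne, umap, or both, not {method!r}")
    argv = ["Rscript", "scripts/main.R",
            "--input_file", input_file,
            "--group_file", group_file,
            "--output_dir", output_dir,
            "--method", method,
            "--seed", str(seed)]
    # Further long options from the Arguments table, e.g. {"perplexity": 10}
    for name, value in (extra or {}).items():
        # Render Python booleans as R-style TRUE/FALSE
        text = str(value).upper() if isinstance(value, bool) else str(value)
        argv += [f"--{name}", text]
    return argv


def run_skill(argv: list) -> subprocess.CompletedProcess:
    """Run the CLI without raising, so the caller can inspect returncode and stderr."""
    return subprocess.run(argv, capture_output=True, text=True, check=False)
```

For example, `run_skill(build_command("otu_table.csv", "group_info.csv", method="tsne", extra={"perplexity": 10}))` corresponds to the "Only t-SNE" example further down.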

Arguments

| Short | Long | Type | Default | Description |
|---|---|---|---|---|
| -i | --input_file | character | required | Abundance / OTU matrix file |
| -g | --group_file | character | required | Group information file |
| -o | --output_dir | character | ./output/ | Output directory |
| -m | --method | character | both | Method: tsne, umap, or both |
| | --sample_id_col | character | first column | Sample ID column in group file |
| | --group_col | character | second column | Group column in group file |
| | --perplexity | numeric | 25 | t-SNE perplexity |
| | --theta | numeric | 0.0 | t-SNE speed/accuracy trade-off; 0.0 runs exact t-SNE |
| | --pca | logical | FALSE | Whether to run an initial PCA step before t-SNE |
| | --check_duplicates | logical | FALSE | Whether t-SNE should check for duplicated rows |
| | --normalize | logical | TRUE | Whether to normalize data before UMAP |
| | --norm_method | character | hellinger | Normalization method passed to vegan::decostand() |
| | --n_neighbors | integer | 10 | UMAP neighborhood size |
| -s | --seed | integer | 42 | Random seed for reproducibility |
| -t | --timeout | integer | 0 | Timeout in seconds; 0 disables the timeout |
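The default --perplexity 25 is only valid for fairly large studies. In the Rtsne versions we are aware of, Rtsne() stops with "perplexity is too large for the number of samples" when 3 * perplexity > n_samples - 1. Under that rule the default needs at least 76 samples. The Python sketch below reproduces that check. It assumes the skill passes --perplexity straight to Rtsne, which is not confirmed here, and `max_perplexity` and `perplexity_ok` are hypothetical helpers.

```python
# Hypothetical helpers; the bound is assumed to match Rtsne's own input check.
def max_perplexity(n_samples: int) -> float:
    """Largest perplexity satisfying 3 * perplexity <= n_samples - 1."""
    return (n_samples - 1) / 3


def perplexity_ok(perplexity: float, n_samples: int) -> bool:
    """True if the perplexity is positive and small enough for n_samples."""
    return perplexity > 0 and 3 * perplexity <= n_samples - 1
```

For the four-sample example under Input Format, `max_perplexity(4)` is 1.0, so a run on data that small would need a correspondingly small --perplexity.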

Dependency Baseline

The skill was validated with the exact package baseline recorded in dependencies.lock.tsv.

| Package | Tested Version |
|---|---|
| optparse | 1.7.5 |
| data.table | 1.15.4 |
| Rtsne | 0.17 |
| umap | 0.2.10.0 |
| ggplot2 | 3.4.0 |
| vegan | 2.7.3 |
| R.utils | 2.13.0 |
| testthat | 3.1.2 |

Use dependencies.lock.tsv as the reproducibility baseline when validating a new environment.


Input Format

Abundance / OTU Matrix (input_file)

A CSV or TSV tabular file with features as rows and samples as columns; the first column holds the feature ID.

OTU_ID,S1,S2,S3,S4
OTU_1,10,3,0,5
OTU_2,2,8,1,0
OTU_3,0,0,6,9
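This layout can be parsed with the Python standard library alone. The sketch below is illustrative; the skill itself reads the file in R. It assumes a header row, the feature ID in the first column, and a delimiter that is either a tab or a comma, detected from the header line. `read_matrix` is a hypothetical name, and the R scripts may treat edge cases such as quoted fields differently.

```python
import csv
import io


# Hypothetical reader for illustration; the skill's own parsing is done in R.
def read_matrix(text: str):
    """Parse a features-by-samples table into (sample_ids, feature_ids, rows).

    rows[i][j] is the abundance of feature i in sample j.
    """
    header_line = text.splitlines()[0]
    delimiter = "\t" if "\t" in header_line else ","  # assumed: TSV or CSV only
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    header = next(reader)
    sample_ids = header[1:]  # first header cell (e.g. OTU_ID) names the ID column
    feature_ids, rows = [], []
    for record in reader:
        if not record:
            continue  # skip blank lines
        values = [float(v) for v in record[1:]]
        if len(values) != len(sample_ids):
            raise ValueError(f"row {record[0]!r} has {len(values)} values, "
                             f"expected {len(sample_ids)}")
        feature_ids.append(record[0])
        rows.append(values)
    return sample_ids, feature_ids, rows
```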

Group File (group_file)

Tabular file with at least two columns: sample ID and group label.

SampleID,Group
S1,Control
S2,Control
S3,Treatment
S4,Treatment

Requirements:

  • At least 2 groups with at least 2 samples per group are required.
  • All sample IDs in the group file must exist in the matrix columns.
  • Single-group inputs will produce a SKILL_INVALID_PARAMETER error because dimensionality reduction without group contrast produces uninterpretable plots.
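These requirements translate into a short pre-flight check. The Python sketch below is illustrative only; the skill performs its checks in R. It assumes the group file has already been read into (sample_id, group) pairs. Its messages borrow code names from the Error Handling section, but the script's real messages, and the code it uses for an undersized group, are not documented here. `validate_groups` is a hypothetical helper.

```python
from collections import Counter


# Hypothetical helper; error strings only borrow the SKILL_* code names.
def validate_groups(pairs, matrix_samples):
    """Check the group-file requirements; return group sizes or raise ValueError."""
    sizes = Counter(group for _, group in pairs)
    if len(sizes) < 2:
        raise ValueError("SKILL_INVALID_PARAMETER: at least 2 groups are required")
    small = sorted(g for g, n in sizes.items() if n < 2)
    if small:
        raise ValueError(f"SKILL_INVALID_PARAMETER: groups with < 2 samples: {small}")
    missing = sorted({s for s, _ in pairs} - set(matrix_samples))
    if missing:
        raise ValueError(f"SKILL_SAMPLE_MISMATCH: not in matrix columns: {missing}")
    return dict(sizes)
```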

Output Files

| File | Description |
|---|---|
| table/tsne_coordinates.csv | t-SNE coordinates with sample and group annotations |
| table/umap_coordinates.csv | UMAP coordinates with sample and group annotations |
| plot/tsne_plot.pdf | t-SNE scatter plot with group colors and ellipses |
| plot/umap_plot.pdf | UMAP scatter plot with group colors and ellipses |
| data/session_info.txt | R session and package version info |
| data/analysis_data.rda | Saved analysis object with aligned matrix, metadata, colors, and runtime parameters |
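Which of these files to expect depends on --method. The Python sketch below lists the paths implied by the Output Files list for a given method and reports any that are missing; the skill itself is written in R. It assumes the files are written under output_dir exactly as named there. `expected_outputs` and `missing_outputs` are hypothetical helpers.

```python
from pathlib import Path


# Hypothetical helpers; file names are taken from the Output Files list.
def expected_outputs(output_dir, method):
    """Output paths listed for a given --method (tsne, umap, or both)."""
    methods = {"tsne": ["tsne"], "umap": ["umap"], "both": ["tsne", "umap"]}[method]
    root = Path(output_dir)
    paths = []
    for m in methods:
        paths.append(root / "table" / f"{m}_coordinates.csv")
        paths.append(root / "plot" / f"{m}_plot.pdf")
    # Session info and the saved R object are listed independently of the method
    paths.append(root / "data" / "session_info.txt")
    paths.append(root / "data" / "analysis_data.rda")
    return paths


def missing_outputs(output_dir, method):
    """Expected paths that do not exist on disk."""
    return [p for p in expected_outputs(output_dir, method) if not p.exists()]
```

An empty list from `missing_outputs("./output", "both")` suggests the run wrote everything listed above.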

Workflow

Step 1: Validate Input

  • Check that matrix file and group file exist
  • Resolve sample ID and group columns
  • Validate at least 2 groups and at least 2 samples per group
  • Ensure all group-file sample IDs exist in the matrix
  • Remove samples with zero total abundance after alignment if needed

Step 2: Prepare Matrix

  • Convert input table into numeric matrix
  • Align matrix columns to sample order from the group file
  • Transpose matrix so rows become samples and columns become features
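Steps 1 and 2 amount to selecting and reordering the matrix columns by the group file, transposing, and dropping samples with zero total abundance. The Python sketch below does this on plain lists and is illustrative only. It assumes the matrix has already been parsed into sample IDs plus one row of values per feature, and that sample IDs were validated beforehand. `prepare_samples` is a hypothetical name. Whether the skill drops zero-total samples silently or reports them is not specified here.

```python
# Hypothetical helper; mirrors the described steps, not the skill's R code.
def prepare_samples(matrix_samples, rows, group_order):
    """Return (kept_sample_ids, sample_rows) with samples as rows.

    rows: one list per feature, values ordered like matrix_samples.
    group_order: sample IDs in the order they appear in the group file.
    """
    col = {s: j for j, s in enumerate(matrix_samples)}
    # Align: keep only group-file samples, in group-file order
    aligned = [[row[col[s]] for s in group_order] for row in rows]
    # Transpose: zip(*aligned) yields one tuple per sample
    by_sample = [list(values) for values in zip(*aligned)]
    # Drop samples whose abundances sum to zero
    kept = [(s, v) for s, v in zip(group_order, by_sample) if sum(v) > 0]
    return [s for s, _ in kept], [v for _, v in kept]
```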

Step 3: Run Dimensionality Reduction

  • Run t-SNE if --method tsne or --method both
  • Run UMAP if --method umap or --method both
  • Apply fixed random seed for reproducibility

Step 4: Generate Visualizations

  • Plot sample embeddings
  • Color points by group
  • Draw group ellipses when enabled
  • Save PDF outputs
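This section does not say how the ellipses are computed. If the plotting code relies on ggplot2::stat_ellipse(), which is an assumption, groups with very few points may get no ellipse, because ggplot2 skips such groups as far as we know. With the minimum of two samples per group, no covariance-based ellipse can be drawn at all.

For intuition, the Python sketch below computes a normal-theory ellipse for one group of 2-D coordinates:

* The centre is the group mean.
* The axes run along the covariance eigenvectors.
* The axes are scaled by the chi-square quantile with 2 degrees of freedom, which has the closed form -2 ln(1 - level).

ggplot2's default ellipse type and its small-sample scaling may differ from this. `normal_ellipse` is a hypothetical helper.

```python
import math


# Hypothetical helper; a normal-theory ellipse, not ggplot2's implementation.
def normal_ellipse(points, level=0.95):
    """Return (cx, cy, semi_major, semi_minor, angle_radians) for 2-D points."""
    n = len(points)
    if n < 3:
        raise ValueError("need at least 3 points for a covariance-based ellipse")
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    # Sample covariance matrix [[sxx, sxy], [sxy, syy]]
    sxx = sum((x - cx) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - cy) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - cx) * (y - cy) for x, y in points) / (n - 1)
    # Closed-form eigenvalues of a symmetric 2x2 matrix
    mean_diag = (sxx + syy) / 2
    spread = math.sqrt(((sxx - syy) / 2) ** 2 + sxy ** 2)
    lam1, lam2 = mean_diag + spread, mean_diag - spread
    # Chi-square quantile with 2 degrees of freedom at the given level
    scale = -2 * math.log(1 - level)
    angle = 0.5 * math.atan2(2 * sxy, sxx - syy)  # direction of the major axis
    return cx, cy, math.sqrt(scale * lam1), math.sqrt(scale * max(lam2, 0.0)), angle
```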

Methods

t-SNE

t-SNE is a non-linear dimensionality reduction method that preserves local neighborhood structure. It is useful for identifying local sample clustering patterns.
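Perplexity is what ties t-SNE to local neighborhood structure. For each sample, t-SNE picks a Gaussian bandwidth so that the entropy of that sample's neighbor distribution equals log2(perplexity), which makes perplexity roughly an effective number of neighbors. The Python sketch below shows this calibration step for one sample, using bisection on the precision beta = 1 / (2 sigma^2). It is a teaching sketch of the standard algorithm, not Rtsne's C++ implementation, and `neighbor_probabilities` is a hypothetical name.

```python
import math


# Hypothetical teaching helper; not Rtsne's implementation.
def neighbor_probabilities(sq_dists, perplexity, tol=1e-6, max_iter=200):
    """Return p(j|i) for one sample i, with 2**entropy matching the perplexity.

    sq_dists: squared distances from sample i to every other sample (i excluded).
    perplexity must not exceed len(sq_dists), the maximum reachable value.
    """
    target = math.log2(perplexity)  # target entropy in bits
    lo, hi, beta = 0.0, math.inf, 1.0  # beta = 1 / (2 * sigma**2)
    d_min = min(sq_dists)
    for _ in range(max_iter):
        # Shifting by d_min leaves the normalised probabilities unchanged
        weights = [math.exp(-beta * (d - d_min)) for d in sq_dists]
        total = sum(weights)
        probs = [w / total for w in weights]
        entropy = -sum(p * math.log2(p) for p in probs if p > 0)
        if abs(entropy - target) < tol:
            break
        if entropy > target:  # too flat: narrow the Gaussian
            lo = beta
            beta = beta * 2 if hi == math.inf else (beta + hi) / 2
        else:  # too peaked: widen the Gaussian
            hi = beta
            beta = (beta + lo) / 2
    return probs
```

Because the bandwidth is tuned per sample, t-SNE emphasizes who neighbors whom. Distances between well-separated clusters in the final plot are less reliable than the clusters themselves.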

UMAP

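UMAP is a manifold learning method that aims to preserve local structure while retaining more of the global structure than t-SNE typically does. It is often faster than t-SNE and can produce stable low-dimensional embeddings when parameters are chosen appropriately. The main one is --n_neighbors: smaller values emphasize local structure, larger values emphasize broader structure.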
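--n_neighbors sets how many nearest neighbors UMAP uses to build its fuzzy neighborhood graph. In the published UMAP construction, each point gets two quantities:

* an offset rho, the distance to its nearest neighbor;
* a bandwidth sigma, chosen so that the membership strengths to its k neighbors sum to log2(k).

The Python sketch below shows only that per-point step, with Euclidean distances. It omits graph symmetrization and the embedding optimization, it is not the umap R package's code, and `smooth_knn` is a hypothetical name.

```python
import math


# Hypothetical teaching helper; simplified from the published UMAP algorithm.
def smooth_knn(points, k, tol=1e-6, max_iter=100):
    """UMAP-style membership strengths from each point to its k nearest neighbors.

    Returns one {neighbor_index: weight} dict per point; weights use
    exp(-(d - rho) / sigma), with sigma tuned so each row sums to log2(k).
    """
    if not 2 <= k < len(points):
        raise ValueError("k must satisfy 2 <= k <= len(points) - 1")
    target = math.log2(k)
    graph = []
    for i, p in enumerate(points):
        # k nearest other points, as (distance, index) pairs sorted by distance
        knn = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)[:k]
        rho = knn[0][0]  # distance to the nearest neighbor
        lo, hi, sigma = 0.0, math.inf, 1.0
        for _ in range(max_iter):
            total = sum(math.exp(-max(0.0, d - rho) / sigma) for d, _ in knn)
            if abs(total - target) < tol:
                break
            if total > target:  # memberships too large: shrink sigma
                hi = sigma
                sigma = (lo + sigma) / 2
            else:  # memberships too small: grow sigma
                lo = sigma
                sigma = sigma * 2 if hi == math.inf else (sigma + hi) / 2
        graph.append({j: math.exp(-max(0.0, d - rho) / sigma) for d, j in knn})
    return graph
```

The guard reflects a practical limit: with n samples, each point has at most n - 1 other neighbors. Small studies therefore constrain how large --n_neighbors can usefully be.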

Normalization

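When --normalize TRUE (the default), the script applies vegan::decostand() with the selected --norm_method before UMAP. The default hellinger method converts each sample to relative abundances and takes square roots, which reduces the influence of a few highly abundant features; this suits abundance-style ecological matrices.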
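For the hellinger method, vegan::decostand() divides each value by its row total and takes the square root. decostand works row-wise for this method by default, so the step assumes rows are samples, as in the transposed matrix from Step 2. The Python sketch below reproduces the same arithmetic. `hellinger` is a hypothetical name, and the sketch does not mirror decostand's handling of NA values.

```python
import math


# Hypothetical helper reproducing the Hellinger arithmetic, not vegan's code.
def hellinger(sample_rows):
    """Hellinger transform: sqrt(x_ij / sum_j x_ij) for each sample row."""
    out = []
    for row in sample_rows:
        total = sum(row)
        if total == 0:
            # Zero-total samples are removed in Step 1; avoid dividing by zero here
            out.append([0.0] * len(row))
        else:
            out.append([math.sqrt(x / total) for x in row])
    return out
```

Each transformed sample has unit Euclidean length. Distances between samples therefore reflect composition rather than sequencing depth.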


Agent Response Contract

After a successful run, report:

  1. Method(s) run (tsne, umap, or both)
  2. Sample count and group count processed
  3. Key parameters used (perplexity for t-SNE, n_neighbors for UMAP)
  4. Group separation quality (describe visible clustering from coordinate ranges if accessible)
  5. Artifact paths: coordinate CSV(s) and plot PDF(s) produced
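For item 4, a coordinate CSV supports a rough descriptive summary: per-group centroids and coordinate ranges, plus distances between group centroids. The Python sketch below computes these. The column names Group, Dim1 and Dim2 are assumptions, because this section does not document the coordinate files' headers; adjust them to the actual header. `summarize_groups` is hypothetical. Centroid distances describe the plot and are not a statistical test of separation. For t-SNE in particular, treat between-cluster distances qualitatively.

```python
import csv
import io
import math
from collections import defaultdict


# Hypothetical helper; the default column names are assumed, not documented.
def summarize_groups(csv_text, group_col="Group", x_col="Dim1", y_col="Dim2"):
    """Per-group centroid and x/y ranges, plus pairwise centroid distances."""
    coords = defaultdict(list)
    for rec in csv.DictReader(io.StringIO(csv_text)):
        coords[rec[group_col]].append((float(rec[x_col]), float(rec[y_col])))
    summary = {}
    for group, pts in coords.items():
        xs, ys = [p[0] for p in pts], [p[1] for p in pts]
        summary[group] = {
            "n": len(pts),
            "centroid": (sum(xs) / len(xs), sum(ys) / len(ys)),
            "x_range": (min(xs), max(xs)),
            "y_range": (min(ys), max(ys)),
        }
    groups = sorted(summary)
    distances = {
        (a, b): math.dist(summary[a]["centroid"], summary[b]["centroid"])
        for i, a in enumerate(groups) for b in groups[i + 1:]
    }
    return summary, distances
```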

Examples

Basic Usage

Rscript scripts/main.R \
  -i otu_table.csv \
  -g group_info.csv \
  -o ./output \
  -m both

Only t-SNE

Rscript scripts/main.R \
  -i otu_table.csv \
  -g group_info.csv \
  -o ./output \
  -m tsne \
  --perplexity 10

Only UMAP with Custom Group Column

Rscript scripts/main.R \
  -i otu_table.csv \
  -g metadata.csv \
  -o ./output \
  -m umap \
  --sample_id_col SampleID \
  --group_col Treatment \
  --n_neighbors 15

Error Handling

Common Errors

| Error | Cause | Solution |
|---|---|---|
| SKILL_FILE_NOT_FOUND | Input file does not exist | Check file path |
| SKILL_MISSING_COLUMNS | Group or matrix file lacks required columns | Verify file format |
| SKILL_SAMPLE_MISMATCH | Sample IDs in group file do not match matrix columns | Check sample naming consistency |
| SKILL_EMPTY_DATA | Matrix becomes empty after preprocessing | Check input values and filtering |
| SKILL_INVALID_PARAMETER | Invalid method, invalid parameter value, or single-group input | Adjust CLI arguments; ensure at least 2 groups are present |
| SKILL_PACKAGE_NOT_FOUND | Required R package is missing | Run Rscript scripts/install_dependencies.R; note that file errors will only surface after packages are installed |
| SKILL_TIMEOUT | Analysis exceeded the configured timeout | Increase --timeout or set --timeout 0 |

IF error persists, READ: references/troubleshooting.md

Troubleshooting note: In environments where packages are not yet installed, SKILL_PACKAGE_NOT_FOUND will fire before file-validation errors. Install dependencies first, then re-run to expose any file-related errors.


Input Validation

This skill accepts:

  1. An abundance or OTU-style feature matrix (CSV/TSV, features as rows, samples as columns)
  2. A group file with at least two groups (CSV/TSV, sample IDs and group labels)

If the user's request does not involve UMAP or t-SNE dimensionality reduction for group separation visualization — for example, asking to run differential expression testing, process single-cell RNA-seq with specialized pipelines, perform clustering without a group file, or impute missing values — do not proceed with the workflow. Instead respond:

"UMAP and t-SNE Analysis is designed to perform sample-level dimensionality reduction and visualization on abundance or OTU-style matrices. Your request appears to be outside this scope. Please provide a feature matrix and group file for UMAP/t-SNE, or use a more appropriate tool for differential expression testing, single-cell analysis, or clustering."


Testing

Test with Sample Data

Rscript scripts/install_dependencies.R

Rscript scripts/main.R --help

Rscript scripts/main.R \
  -i tests/data/otu_table.csv \
  -g tests/data/group_info.csv \
  -o tests/output/ \
  -m both

Rscript tests/test_skill.R

Rscript tests/run_smoke_test.R

Validation Commands

ls -la tests/output/
ls -la tests/output/table
ls -la tests/output/plot
ls -la tests/output/data
wc -l tests/output/table/tsne_coordinates.csv
wc -l tests/output/table/umap_coordinates.csv

The canonical sample data live in tests/data/. Use those files for examples, smoke tests, and regression checks. The canonical output layout is output_dir/table, output_dir/plot, and output_dir/data.
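The wc -l checks above presumably expect one header line plus one line per retained sample. wc -l also undercounts by one when the file lacks a trailing newline. The Python sketch below counts CSV records instead. It assumes each coordinate file has exactly one header row. The expected count should be the number of samples left after Step 1 removes zero-total samples. `coordinate_rows_ok` is a hypothetical helper.

```python
import csv


# Hypothetical helper; assumes one header row per coordinate file.
def coordinate_rows_ok(csv_path, expected_samples):
    """True if the file has a header plus exactly expected_samples data rows."""
    with open(csv_path, newline="") as handle:
        records = [r for r in csv.reader(handle) if r]  # ignore blank lines
    return len(records) == expected_samples + 1
```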


Implementation Checklist

  • CLI parsing with optparse
  • set.seed() for reproducibility
  • requireNamespace() dependency checks
  • Dependency bootstrap script
  • Session info recording
  • File reading instructions in SKILL.md
  • Modular script structure
  • Error handling with SKILL_* codes
  • Test data provided in tests/data/
  • Version-pinned dependency baseline in dependencies.lock.tsv
  • Automated testthat coverage for validation and plotting edge cases
  • Scripts in scripts/ directory
  • References in references/ directory

Last updated: 2026-04-27 | Version: 1.1.0