Skill: Multi-Omics Integration Strategist (ID: 204)

AIPOCH-AI

Designs a multi-omics joint analysis scheme that plans how to cross-validate transcriptome (RNA), proteome (Pro), and metabolome (Met) data at the pathway level.

Overview

Designs joint analysis schemes for transcriptomics (RNA), proteomics (Pro), and metabolomics (Met) data, cross-validates the three layers at the pathway level, and recommends integrated analysis strategies at the systems biology level.

Use Cases

  • Systems biology mechanism research for complex diseases
  • Biomarker discovery and validation
  • Drug target identification and pathway validation
  • Multi-omics data quality assessment and consistency analysis

Directory Structure

.
├── SKILL.md                 # This file - Skill documentation
├── config/
│   └── pathways.json        # Pathway database configuration
├── scripts/
│   └── main.py             # Main analysis script
├── templates/
│   └── report_template.md   # Analysis report template
└── examples/
    └── sample_data/         # Sample datasets

Input

Required Files

| File | Format | Description |
|------|--------|-------------|
| rna_data.csv | CSV | Transcriptomics data: Gene ID, expression value, differential analysis results |
| pro_data.csv | CSV | Proteomics data: Protein ID, abundance value, differential analysis results |
| met_data.csv | CSV | Metabolomics data: Metabolite ID, concentration value, differential analysis results |

Input Format Specifications

RNA Data (rna_data.csv)

gene_id,gene_name,log2fc,pvalue,padj,sample_A,sample_B,...
ENSG00000012048,BRCA1,1.23,0.001,0.005,12.5,13.2,...

Protein Data (pro_data.csv)

protein_id,gene_name,log2fc,pvalue,padj,sample_A,sample_B,...
P38398,BRCA1,0.85,0.002,0.008,2450,2890,...

Metabolite Data (met_data.csv)

metabolite_id,metabolite_name,kegg_id,log2fc,pvalue,padj,...
C00187,Cholesterol,C00187,-1.45,0.003,0.012,...
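
The column layouts above can be checked before any integration step runs. The following is a minimal sketch, assuming only the header names shown in the examples; REQUIRED_COLUMNS and missing_columns are hypothetical names and are not taken from scripts/main.py.

```python
import csv
import io

# Required columns per omics layer, copied from the example headers.
# Per-sample columns (sample_A, sample_B, ...) are not checked here.
REQUIRED_COLUMNS = {
    "rna": ["gene_id", "gene_name", "log2fc", "pvalue", "padj"],
    "pro": ["protein_id", "gene_name", "log2fc", "pvalue", "padj"],
    "met": ["metabolite_id", "metabolite_name", "kegg_id", "log2fc", "pvalue", "padj"],
}


def missing_columns(omics, handle):
    """Return the required columns that are absent from the CSV header."""
    reader = csv.DictReader(handle)
    header = reader.fieldnames or []
    return [col for col in REQUIRED_COLUMNS[omics] if col not in header]


# In-memory stand-in for rna_data.csv so the sketch runs without files.
rna_csv = io.StringIO(
    "gene_id,gene_name,log2fc,pvalue,padj,sample_A,sample_B\n"
    "ENSG00000012048,BRCA1,1.23,0.001,0.005,12.5,13.2\n"
)
print(missing_columns("rna", rna_csv))  # prints []
```

With real data, the same function could be called on open("rna_data.csv", newline="") and the run aborted if the returned list is not empty.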

Integration Strategy

1. ID Mapping Layer

  • RNA → Protein: Mapped through Gene Symbol / UniProt ID
  • Protein → Metabolite: Associated through KEGG/Reactome enzyme-reaction-metabolite links
  • RNA → Metabolite: Indirectly associated through shared KEGG pathway membership
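
The RNA → Protein step can be sketched as a join on the shared gene symbol. This is an illustrative sketch under the assumption that both tables carry a gene_name column, as in the input examples; map_rna_to_protein is a hypothetical helper, and the second RNA record is made up to show an unmapped case.

```python
def map_rna_to_protein(rna_rows, pro_rows, key="gene_name"):
    """Pair RNA and protein records that share the same gene symbol."""
    # Index proteins by symbol; one symbol may map to several protein IDs.
    proteins_by_symbol = {}
    for row in pro_rows:
        proteins_by_symbol.setdefault(row[key], []).append(row)

    pairs, unmapped = [], []
    for rna in rna_rows:
        matches = proteins_by_symbol.get(rna[key], [])
        if not matches:
            unmapped.append(rna["gene_id"])
        for pro in matches:
            pairs.append((rna["gene_id"], pro["protein_id"], rna[key]))
    return pairs, unmapped


rna_rows = [
    {"gene_id": "ENSG00000012048", "gene_name": "BRCA1"},
    {"gene_id": "ENSG_EXAMPLE", "gene_name": "GENE_X"},  # made-up, no protein match
]
pro_rows = [{"protein_id": "P38398", "gene_name": "BRCA1"}]

pairs, unmapped = map_rna_to_protein(rna_rows, pro_rows)
# Fraction of RNA records that found at least one protein partner.
success_rate = (len(rna_rows) - len(unmapped)) / len(rna_rows)
print(pairs, unmapped, success_rate)
```

Keeping the unmapped IDs alongside the pairs makes it straightforward to report a mapping success rate of the kind shown in the integration report.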

2. Pathway Mapping

Supported databases:

  • KEGG (Kyoto Encyclopedia of Genes and Genomes)
  • Reactome
  • WikiPathways
  • GO (Gene Ontology) - Biological Process

3. Cross-Validation Methods

3.1 Directional Consistency Validation

  • Checks whether genes, proteins, and metabolites in the same pathway change in the same direction
  • Score: +1 (consistent), -1 (opposite), 0 (no data)
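
The scoring rule above can be sketched as follows. Two points are assumptions rather than documented behavior: a log2fc of exactly zero is treated like missing data, and a pathway-level score is taken as the mean of its per-pair scores. The function names are hypothetical.

```python
def direction_score(log2fc_a, log2fc_b):
    """Return +1 for same-sign changes, -1 for opposite signs, 0 for no data."""
    if log2fc_a is None or log2fc_b is None:
        return 0
    if log2fc_a == 0 or log2fc_b == 0:
        return 0  # assumption: no change is scored like missing data
    return 1 if (log2fc_a > 0) == (log2fc_b > 0) else -1


def pathway_score(pairs):
    """Average per-pair scores (assumed aggregation, not confirmed by the source)."""
    scores = [direction_score(a, b) for a, b in pairs]
    return sum(scores) / len(scores) if scores else 0.0


# (RNA log2fc, protein log2fc) pairs for one pathway; values are illustrative.
pairs = [(1.23, 0.85), (0.4, -0.2), (2.0, None), (-1.0, -0.5)]
print(pathway_score(pairs))  # prints 0.25
```

Under this aggregation, pairs with no data pull the pathway score toward zero instead of being dropped; excluding them from the denominator would be an equally defensible choice.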

3.2 Correlation Validation

  • Pearson/Spearman correlation analysis
  • Cross-omics expression profile clustering
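
As an illustration of the correlation step, the sketch below computes Pearson and Spearman coefficients for one gene/protein pair across matched samples. It uses only the standard library so that it runs anywhere; in a pipeline that already depends on scipy, scipy.stats.pearsonr and scipy.stats.spearmanr would be the natural substitutes. The sample values are illustrative and the function names are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation; returns NaN if either series is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return float("nan")
    return cov / sqrt(var_x * var_y)


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        rank = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation, computed as Pearson on average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Expression of one gene and abundance of its protein in four matched samples.
rna = [12.5, 13.2, 11.8, 14.0]
pro = [2450, 2890, 2300, 3100]
print(round(spearman(rna, pro), 3))  # prints 1.0
```

Spearman is often preferred across omics layers because RNA and protein measurements sit on very different scales and only the ordering of samples needs to agree.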

3.3 Pathway Enrichment Concordance

  • Independent enrichment analysis for each omics layer
  • Identification of commonly enriched pathways
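
The concordance idea can be sketched in two parts: a per-layer enrichment p-value and an intersection of the significant pathways. The example below swaps in a plain hypergeometric over-representation test, which is not the GSEA method provided by gseapy, and it applies no multiple-testing correction. The function names, the 0.05 cutoff, and the p-values in the usage example are all assumptions made for illustration.

```python
from math import comb


def hypergeom_pvalue(universe, pathway_size, hits_total, hits_in_pathway):
    """Upper-tail probability P(X >= hits_in_pathway) for an over-representation test."""
    upper = min(pathway_size, hits_total)
    total = comb(universe, hits_total)
    tail = sum(
        comb(pathway_size, k) * comb(universe - pathway_size, hits_total - k)
        for k in range(hits_in_pathway, upper + 1)
    )
    return tail / total


def common_pathways(per_omics_pvalues, alpha=0.05):
    """Return pathways whose p-value is below alpha in every omics layer."""
    enriched_sets = [
        {name for name, p in pvalues.items() if p < alpha}
        for pvalues in per_omics_pvalues.values()
    ]
    return set.intersection(*enriched_sets) if enriched_sets else set()


# Made-up p-values per layer, only to show the intersection step.
per_omics = {
    "rna": {"Glycolysis/Gluconeogenesis": 0.001, "Citrate cycle (TCA cycle)": 0.02},
    "pro": {"Glycolysis/Gluconeogenesis": 0.004, "Citrate cycle (TCA cycle)": 0.20},
    "met": {"Glycolysis/Gluconeogenesis": 0.010, "Citrate cycle (TCA cycle)": 0.03},
}
print(common_pathways(per_omics))  # prints {'Glycolysis/Gluconeogenesis'}
```

Requiring significance in every layer is the strictest reading of "common"; a looser variant would accept pathways enriched in at least two of the three layers.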

3.4 Network Topology Validation

  • Construct a cross-omics regulatory network
  • Identify key nodes (hub genes, proteins, and metabolites)
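
Hub identification can be illustrated with a simple degree count over an edge list. The sketch below uses the standard library only; with networkx, which is listed under Dependencies, nx.Graph plus nx.degree_centrality would serve the same purpose. The node labels are placeholders, not real identifiers, and hub_nodes is a hypothetical helper.

```python
from collections import Counter


def hub_nodes(edges, top_n=3):
    """Rank nodes of an undirected edge list by degree, highest first."""
    degree = Counter()
    for source, target in edges:
        degree[source] += 1
        degree[target] += 1
    return degree.most_common(top_n)


# Placeholder cross-omics edges: gene -> protein -> metabolite.
edges = [
    ("gene:G1", "protein:P1"),
    ("protein:P1", "met:M1"),
    ("protein:P1", "met:M2"),
    ("gene:G2", "protein:P2"),
    ("protein:P2", "met:M1"),
]
print(hub_nodes(edges, top_n=1))  # prints [('protein:P1', 3)]
```

Degree is the simplest hub criterion; betweenness or eigenvector centrality may be more informative once the network is larger.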

Output

1. Integration Report (integration_report.md)

# Multi-Omics Integration Analysis Report

## Executive Summary
- Sample count: RNA=30, Pro=28, Met=25
- Mapping success rate: RNA-Pro=85%, Pro-Met=62%
- Pathway coverage: 342 KEGG pathways

## Cross-Validation Results
### Highly Consistent Pathways (Score > 0.8)
1. Glycolysis/Gluconeogenesis (Score=0.92)
2. Citrate cycle (TCA cycle) (Score=0.88)

### Conflicting Pathways (Score < -0.3)
1. Fatty acid biosynthesis (Score=-0.45)

## Recommendations
- Focus on: Energy metabolism-related pathways
- Needs verification: Lipid metabolism pathway data quality
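
The two report sections above imply score thresholds of 0.8 and -0.3. A minimal sketch of how pathway scores might be sorted into those sections follows; the label "intermediate" for everything in between and the function name are assumptions, since the report does not name that group.

```python
def classify_pathway(score, high=0.8, low=-0.3):
    """Assign a report section based on the pathway cross-validation score."""
    if score > high:
        return "highly consistent"
    if score < low:
        return "conflicting"
    return "intermediate"  # assumed label for scores between the two cutoffs


# Scores taken from the example report.
scores = {
    "Glycolysis/Gluconeogenesis": 0.92,
    "Citrate cycle (TCA cycle)": 0.88,
    "Fatty acid biosynthesis": -0.45,
}
sections = {name: classify_pathway(s) for name, s in scores.items()}
print(sections)
```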

2. External Visualization Tools (Not Included)

This tool generates analysis results that can be visualized using external tools. Users may export results to:

| Chart Type | Purpose | External Tool Required |
|------------|---------|------------------------|
| Circos Plot | Cross-omics relationship panorama | matplotlib/circlize (user-installed) |
| Pathway Heatmap | Pathway-level changes | seaborn/ComplexHeatmap (user-installed) |
| Sankey Diagram | Data flow mapping | plotly (user-installed) |
| Network Graph | Molecular interaction network | networkx/cytoscape (networkx is included) |
| Correlation Matrix | Cross-omics correlation | seaborn (user-installed) |
| Bubble Plot | Integrated enrichment analysis | ggplot2/plotly (user-installed) |

Note: This skill focuses on data integration and analysis. Visualization requires separate installation of plotting libraries by the user.

3. Output Files

| File | Description |
|------|-------------|
| mapped_ids.json | ID mapping results |
| pathway_scores.csv | Pathway cross-validation scores |
| consistency_matrix.csv | Cross-omics consistency matrix |
| network_edges.csv | Network edge list |
| report.html | Interactive HTML report |

Usage

Basic Usage

python scripts/main.py \
  --rna rna_data.csv \
  --pro pro_data.csv \
  --met met_data.csv \
  --output ./results

Advanced Options

python scripts/main.py \
  --rna rna_data.csv \
  --pro pro_data.csv \
  --met met_data.csv \
  --pathway-db KEGG,Reactome \
  --id-mapping config/mapping.json \
  --method correlation+enrichment+network \
  --output ./results \
  --format html,csv,json

Configuration

config/pathways.json

{
  "databases": {
    "KEGG": {
      "enabled": true,
      "organism": "hsa",
      "min_genes": 3
    },
    "Reactome": {
      "enabled": true,
      "min_genes": 5
    }
  },
  "mapping": {
    "rna_to_protein": "gene_symbol",
    "protein_to_metabolite": "enzyme_commission"
  }
}
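
A short sketch of how this configuration might be consumed is shown below. It assumes that min_genes is the minimum number of mapped genes a pathway needs in order to be kept, which the configuration itself does not state; enabled_databases and passes_min_genes are hypothetical helpers, not functions from scripts/main.py.

```python
import json

# Inline copy of config/pathways.json so the sketch runs without files.
CONFIG_TEXT = """
{
  "databases": {
    "KEGG": {"enabled": true, "organism": "hsa", "min_genes": 3},
    "Reactome": {"enabled": true, "min_genes": 5}
  },
  "mapping": {
    "rna_to_protein": "gene_symbol",
    "protein_to_metabolite": "enzyme_commission"
  }
}
"""


def enabled_databases(config):
    """Return only the database entries whose 'enabled' flag is true."""
    return {
        name: options
        for name, options in config["databases"].items()
        if options.get("enabled")
    }


def passes_min_genes(config, database, n_genes):
    """Check a pathway's mapped gene count against the database threshold."""
    return n_genes >= config["databases"][database]["min_genes"]


config = json.loads(CONFIG_TEXT)
print(sorted(enabled_databases(config)))        # prints ['KEGG', 'Reactome']
print(passes_min_genes(config, "Reactome", 4))  # prints False
```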

Dependencies

  • Python >= 3.8
  • pandas >= 1.3.0
  • numpy >= 1.21.0
  • scipy >= 1.7.0
  • scikit-learn >= 1.0.0
  • networkx >= 2.6.0
  • matplotlib >= 3.4.0
  • seaborn >= 0.11.0
  • gseapy >= 1.0.0 (Pathway enrichment analysis)

Version

  • Version: 1.0.0
  • Last Updated: 2026-02-06
  • Author: OpenClaw Bioinformatics Team

Risk Assessment

| Risk Indicator | Assessment | Level |
|----------------|------------|-------|
| Code Execution | Python/R scripts executed locally | Medium |
| Network Access | No external API calls | Low |
| File System Access | Read input files, write output files | Medium |
| Instruction Tampering | Standard prompt guidelines | Low |
| Data Exposure | Output files saved to workspace | Low |

Security Checklist

  • No hardcoded credentials or API keys
  • No unauthorized file system access (../)
  • Output does not expose sensitive information
  • Prompt injection protections in place
  • Input file paths validated (no ../ traversal)
  • Output directory restricted to workspace
  • Script execution in sandboxed environment
  • Error messages sanitized (no stack traces exposed)
  • Dependencies audited
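
The path-related items in the checklist can be illustrated with a small guard that rejects any path resolving outside the workspace. This is a sketch of one possible check, not the validation used by scripts/main.py; resolve_inside and the /tmp/workspace location are hypothetical.

```python
from pathlib import Path


def resolve_inside(workspace, user_path):
    """Resolve user_path under workspace and reject anything that escapes it."""
    root = Path(workspace).resolve()
    # Joining with an absolute user_path discards root, so the check below
    # also catches absolute paths that point outside the workspace.
    candidate = (root / user_path).resolve()
    if candidate != root and root not in candidate.parents:
        raise ValueError(f"path escapes workspace: {user_path}")
    return candidate


print(resolve_inside("/tmp/workspace", "results/pathway_scores.csv"))

try:
    resolve_inside("/tmp/workspace", "../etc/passwd")
except ValueError as err:
    print(err)  # prints: path escapes workspace: ../etc/passwd
```

Resolving before comparing matters: a purely textual check for "../" would miss paths that escape through symlinks or absolute locations.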

Prerequisites

# Python dependencies
pip install -r requirements.txt

Evaluation Criteria

Success Metrics

  • Successfully executes main functionality
  • Output meets quality standards
  • Handles edge cases gracefully
  • Performance is acceptable

Test Cases

  1. Basic Functionality: Standard input → Expected output
  2. Edge Case: Invalid input → Graceful error handling
  3. Performance: Large dataset → Acceptable processing time

Lifecycle Status

  • Current Stage: Draft
  • Next Review Date: 2026-03-06
  • Known Issues: None
  • Planned Improvements:
    • Performance optimization
    • Additional feature support

Parameters

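| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| --rna | str | Required | |
| --pro | str | Required | |
| --met | str | Required | |
| --output | str | './results' | |
| --databases | str | 'KEGG' | |
| --create-sample | str | Required | Create sample data for testing |
| --format | str | 'md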