Skill: Multi-Omics Integration Strategist (ID: 204)
AIPOCH-AI
Designs a multi-omics joint analysis scheme that plans how to cross-validate transcriptome (RNA), proteome (Pro), and metabolome (Met) data at the pathway level.
Overview
Designs joint analysis schemes for multi-omics data (transcriptomics, RNA; proteomics, Pro; metabolomics, Met), cross-validates findings across the three layers at the pathway level, and recommends integrated, systems-biology-level analysis strategies.
Use Cases
- Systems biology mechanism research for complex diseases
- Biomarker discovery and validation
- Drug target identification and pathway validation
- Multi-omics data quality assessment and consistency analysis
Directory Structure
.
├── SKILL.md                    # This file - Skill documentation
├── config/
│   └── pathways.json           # Pathway database configuration
├── scripts/
│   └── main.py                 # Main analysis script
├── templates/
│   └── report_template.md      # Analysis report template
└── examples/
    └── sample_data/            # Sample datasets
Input
Required Files
| File | Format | Description |
|---|---|---|
| rna_data.csv | CSV | Transcriptomics data: Gene ID, expression value, differential analysis results |
| pro_data.csv | CSV | Proteomics data: Protein ID, abundance value, differential analysis results |
| met_data.csv | CSV | Metabolomics data: Metabolite ID, concentration value, differential analysis results |
Input Format Specifications
RNA Data (rna_data.csv)
gene_id,gene_name,log2fc,pvalue,padj,sample_A,sample_B,...
ENSG00000012048,BRCA1,1.23,0.001,0.005,12.5,13.2,...
Protein Data (pro_data.csv)
protein_id,gene_name,log2fc,pvalue,padj,sample_A,sample_B,...
P38398,BRCA1,0.85,0.002,0.008,2450,2890,...
Metabolite Data (met_data.csv)
metabolite_id,metabolite_name,kegg_id,log2fc,pvalue,padj,...
C00187,Cholesterol,C00187,-1.45,0.003,0.012,...
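The header rows above suggest a simple pre-flight check before integration. The following is a minimal sketch, assuming pandas is used for loading; the helper name `load_omics_table` and the `REQUIRED_COLUMNS` mapping are hypothetical and not part of `scripts/main.py`.

```python
import io

import pandas as pd

# Column names copied from the header rows shown above.
# Per-sample columns (sample_A, sample_B, ...) vary by dataset and are not checked.
REQUIRED_COLUMNS = {
    "rna": ["gene_id", "gene_name", "log2fc", "pvalue", "padj"],
    "pro": ["protein_id", "gene_name", "log2fc", "pvalue", "padj"],
    "met": ["metabolite_id", "metabolite_name", "kegg_id", "log2fc", "pvalue", "padj"],
}


def load_omics_table(source, layer):
    """Read one omics CSV and fail early if a required column is missing.

    Hypothetical helper for illustration only.
    """
    df = pd.read_csv(source)
    missing = [col for col in REQUIRED_COLUMNS[layer] if col not in df.columns]
    if missing:
        raise ValueError(f"{layer} table is missing columns: {missing}")
    return df


# An in-memory CSV stands in for rna_data.csv.
rna_csv = io.StringIO(
    "gene_id,gene_name,log2fc,pvalue,padj,sample_A,sample_B\n"
    "ENSG00000012048,BRCA1,1.23,0.001,0.005,12.5,13.2\n"
)
rna = load_omics_table(rna_csv, "rna")
```

Failing on a missing column at load time keeps later mapping and scoring steps from silently producing empty joins.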
Integration Strategy
1. ID Mapping Layer
- RNA → Protein: Mapping through Gene Symbol / UniProt ID
- Protein → Metabolite: Association through KEGG/Reactome enzyme-reaction-metabolite
- RNA → Metabolite: Indirect association through KEGG pathway
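The RNA → Protein step can be sketched as a join on the shared `gene_name` column. This is an illustration under stated assumptions, not the skill's actual implementation: the toy rows are made up (the TP53 values are invented), and "mapping success rate" is assumed here to mean the fraction of RNA rows that find a protein partner.

```python
import pandas as pd

# Toy tables using the column names from the input specification.
rna = pd.DataFrame({
    "gene_id": ["ENSG00000012048", "ENSG00000141510"],
    "gene_name": ["BRCA1", "TP53"],
    "log2fc": [1.23, -0.60],
})
pro = pd.DataFrame({
    "protein_id": ["P38398"],
    "gene_name": ["BRCA1"],
    "log2fc": [0.85],
})

# An outer join keeps unmatched rows, so the mapping rate can be reported.
mapped = rna.merge(
    pro,
    on="gene_name",
    how="outer",
    suffixes=("_rna", "_pro"),
    indicator=True,
)
matched = mapped[mapped["_merge"] == "both"]

# Assumed definition: share of RNA rows that have a protein partner.
success_rate = len(matched) / len(rna)
```

Gene symbols are convenient but ambiguous for some genes; joining on a stable identifier (for example a UniProt accession resolved from the Ensembl ID) would be a stricter alternative.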
2. Pathway Mapping
Supported databases:
- KEGG (Kyoto Encyclopedia of Genes and Genomes)
- Reactome
- WikiPathways
- GO (Gene Ontology) - Biological Process
3. Cross-Validation Methods
3.1 Directional Consistency Validation
- Checks whether genes, proteins, and metabolites in the same pathway change in the same direction
- Score: +1 (consistent), -1 (opposite), 0 (no data)
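The scoring rule above can be sketched as follows. Two points are assumptions rather than documented behavior: a log2 fold change of exactly zero is treated like missing data, and the pathway-level score is taken as the mean of the pair scores that have data (which would yield fractional values like those in the sample report). The function names are hypothetical.

```python
def sign(x):
    """Return -1, 0, or +1 according to the sign of x."""
    return (x > 0) - (x < 0)


def pair_score(log2fc_a, log2fc_b):
    """+1 if both changes share a direction, -1 if opposite, 0 if no data."""
    if log2fc_a is None or log2fc_b is None:
        return 0
    sign_a, sign_b = sign(log2fc_a), sign(log2fc_b)
    if sign_a == 0 or sign_b == 0:
        return 0  # assumption: no change is treated as no data
    return 1 if sign_a == sign_b else -1


def pathway_direction_score(pairs):
    """Mean pair score over pairs with data; None if nothing can be scored."""
    scored = [s for s in (pair_score(a, b) for a, b in pairs) if s != 0]
    return sum(scored) / len(scored) if scored else None


# (RNA log2fc, protein log2fc) for four hypothetical genes in one pathway.
example_pairs = [(1.23, 0.85), (0.90, 1.10), (-0.40, 0.70), (2.00, None)]
score = pathway_direction_score(example_pairs)
```

In this toy pathway two pairs agree, one disagrees, and one has no protein measurement, so the score is 1/3.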
3.2 Correlation Validation
- Pearson/Spearman correlation analysis
- Cross-omics expression profile clustering
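As a rough illustration of the correlation step, the sketch below compares RNA and protein log2 fold changes for five matched genes using SciPy. The numbers are invented for the example, and correlating fold changes (rather than per-sample profiles) is one possible choice, not necessarily what the skill does.

```python
from scipy.stats import pearsonr, spearmanr

# Invented log2 fold changes for five genes matched across both layers.
rna_log2fc = [1.23, 0.40, -0.75, 2.10, -1.30]
pro_log2fc = [0.85, 0.10, -0.20, 1.60, -0.90]

# Spearman compares ranks, so it is robust to the scale difference
# that is typical between transcript and protein measurements.
rho, rho_pvalue = spearmanr(rna_log2fc, pro_log2fc)

# Pearson measures linear agreement of the raw values.
r, r_pvalue = pearsonr(rna_log2fc, pro_log2fc)
```

With only a handful of matched features per pathway the p-values are weak evidence, so the coefficient is usually more informative than the significance test.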
3.3 Pathway Enrichment Concordance
- Independent enrichment analysis for each omics
- Common enriched pathway identification
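One way to picture enrichment concordance is to run an over-representation test per layer and intersect the significant pathways. The sketch below swaps in a plain hypergeometric test from SciPy instead of gseapy, uses invented hit lists and background sizes, and omits multiple-testing correction; all function names are hypothetical.

```python
from scipy.stats import hypergeom


def enrichment_pvalue(hits, pathway_members, background_size):
    """P(overlap >= observed) under a hypergeometric null."""
    hits, pathway_members = set(hits), set(pathway_members)
    overlap = len(hits & pathway_members)
    # sf(k - 1) gives P(X >= k); arguments are (k, population, successes, draws).
    return hypergeom.sf(overlap - 1, background_size, len(pathway_members), len(hits))


def enriched_pathways(hits, pathways, background_size, alpha=0.05):
    """Names of pathways whose enrichment p-value falls below alpha."""
    return {
        name
        for name, members in pathways.items()
        if enrichment_pvalue(hits, members, background_size) < alpha
    }


# Toy pathway definitions: gene symbols and KEGG compound IDs.
gene_pathways = {
    "Glycolysis": {"HK1", "PFKL", "PKM", "GAPDH", "ENO1"},
    "TCA": {"CS", "IDH1", "MDH2", "SDHA", "FH"},
}
met_pathways = {
    "Glycolysis": {"C00031", "C00022", "C00186", "C00074"},
    "TCA": {"C00158", "C00026", "C00042", "C00149"},
}

# Invented lists of significantly changed features; background sizes are assumptions.
rna_enriched = enriched_pathways({"HK1", "PFKL", "PKM", "GAPDH"}, gene_pathways, 1000)
pro_enriched = enriched_pathways({"PKM", "GAPDH", "ENO1", "ALB"}, gene_pathways, 1000)
met_enriched = enriched_pathways({"C00031", "C00022", "C00074"}, met_pathways, 500)

# Pathways supported independently by all three layers.
common = rna_enriched & pro_enriched & met_enriched
```

Running each layer against its own background keeps the test honest, since the measurable universe differs between transcripts, proteins, and metabolites.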
3.4 Network Topology Validation
- Construct cross-omics regulatory network
- Identify key nodes (Hub genes/proteins/metabolites)
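Hub identification can be sketched with networkx by ranking nodes on degree. The edge list below is a small hand-written example (PKM and LDHA with their substrates and products as KEGG compound IDs), and the `layer:identifier` node naming is an assumption made for readability.

```python
import networkx as nx

# Hand-written cross-omics edges: gene -> protein, protein -> metabolite.
edges = [
    ("gene:PKM", "protein:PKM"),
    ("protein:PKM", "metabolite:C00022"),   # pyruvate
    ("protein:PKM", "metabolite:C00074"),   # phosphoenolpyruvate
    ("gene:LDHA", "protein:LDHA"),
    ("protein:LDHA", "metabolite:C00022"),  # pyruvate
    ("protein:LDHA", "metabolite:C00186"),  # lactate
]

graph = nx.Graph()
graph.add_edges_from(edges)

# Rank by degree (descending), breaking ties alphabetically for stable output.
degree = dict(graph.degree())
hubs = sorted(degree, key=lambda node: (-degree[node], node))[:3]
```

Degree is the simplest hub criterion; betweenness or eigenvector centrality (also available in networkx) may be more appropriate for larger networks.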
Output
1. Integration Report (integration_report.md)
# Multi-Omics Integration Analysis Report
## Executive Summary
- Sample count: RNA=30, Pro=28, Met=25
- Mapping success rate: RNA-Pro=85%, Pro-Met=62%
- Pathway coverage: 342 KEGG pathways
## Cross-Validation Results
### Highly Consistent Pathways (Score > 0.8)
1. Glycolysis/Gluconeogenesis (Score=0.92)
2. Citrate cycle (TCA cycle) (Score=0.88)
### Conflicting Pathways (Score < -0.3)
1. Fatty acid biosynthesis (Score=-0.45)
## Recommendations
- Focus on: Energy metabolism-related pathways
- Needs verification: Lipid metabolism pathway data quality
2. External Visualization Tools (Not Included)
This tool generates analysis results that can be visualized using external tools. Users may export results to:
| Chart Type | Purpose | External Tool Required |
|---|---|---|
| Circos Plot | Cross-omics relationship panorama | matplotlib/circlize (user-installed) |
| Pathway Heatmap | Pathway-level changes | seaborn/ComplexHeatmap (user-installed) |
| Sankey Diagram | Data flow mapping | plotly (user-installed) |
| Network Graph | Molecular interaction network | networkx/Cytoscape (networkx is included) |
| Correlation Matrix | Cross-omics correlation | seaborn (user-installed) |
| Bubble Plot | Integrated enrichment analysis | ggplot2/plotly (user-installed) |
Note: This skill focuses on data integration and analysis. Visualization requires separate installation of plotting libraries by the user.
3. Output Files
| File | Description |
|---|---|
| mapped_ids.json | ID mapping results |
| pathway_scores.csv | Pathway cross-validation scores |
| consistency_matrix.csv | Cross-omics consistency matrix |
| network_edges.csv | Network edge list |
| report.html | Interactive HTML report |
Usage
Basic Usage
python scripts/main.py \
  --rna rna_data.csv \
  --pro pro_data.csv \
  --met met_data.csv \
  --output ./results
Advanced Options
python scripts/main.py \
  --rna rna_data.csv \
  --pro pro_data.csv \
  --met met_data.csv \
  --pathway-db KEGG,Reactome \
  --id-mapping config/mapping.json \
  --method correlation+enrichment+network \
  --output ./results \
  --format html,csv,json
Configuration
config/pathways.json
{
  "databases": {
    "KEGG": {
      "enabled": true,
      "organism": "hsa",
      "min_genes": 3
    },
    "Reactome": {
      "enabled": true,
      "min_genes": 5
    }
  },
  "mapping": {
    "rna_to_protein": "gene_symbol",
    "protein_to_metabolite": "enzyme_commission"
  }
}
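A configuration like this could be consumed as sketched below. The helper names are hypothetical, and `min_genes` is assumed to mean the minimum number of mapped genes a pathway needs before it is scored; the source does not define it.

```python
import json

# Same content as config/pathways.json, inlined so the example is self-contained.
config_text = """
{
  "databases": {
    "KEGG": {"enabled": true, "organism": "hsa", "min_genes": 3},
    "Reactome": {"enabled": true, "min_genes": 5}
  },
  "mapping": {
    "rna_to_protein": "gene_symbol",
    "protein_to_metabolite": "enzyme_commission"
  }
}
"""
config = json.loads(config_text)


def enabled_databases(config):
    """Names of pathway databases switched on in the config."""
    return [name for name, opts in config["databases"].items() if opts.get("enabled")]


def passes_min_genes(config, database, n_mapped_genes):
    """Assumed semantics: keep a pathway only if enough of its genes were mapped."""
    return n_mapped_genes >= config["databases"][database]["min_genes"]


active = enabled_databases(config)
```

A size floor of this kind is commonly used to avoid scoring pathways from one or two measured members, where a single feature would dominate the result.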
Dependencies
- Python >= 3.8
- pandas >= 1.3.0
- numpy >= 1.21.0
- scipy >= 1.7.0
- scikit-learn >= 1.0.0
- networkx >= 2.6.0
- matplotlib >= 3.4.0
- seaborn >= 0.11.0
- gseapy >= 1.0.0 (Pathway enrichment analysis)
Version
- Version: 1.0.0
- Last Updated: 2026-02-06
- Author: OpenClaw Bioinformatics Team
Risk Assessment
| Risk Indicator | Assessment | Level |
|---|---|---|
| Code Execution | Python/R scripts executed locally | Medium |
| Network Access | No external API calls | Low |
| File System Access | Read input files, write output files | Medium |
| Instruction Tampering | Standard prompt guidelines | Low |
| Data Exposure | Output files saved to workspace | Low |
Security Checklist
- No hardcoded credentials or API keys
- No unauthorized file system access (../)
- Output does not expose sensitive information
- Prompt injection protections in place
- Input file paths validated (no ../ traversal)
- Output directory restricted to workspace
- Script execution in sandboxed environment
- Error messages sanitized (no stack traces exposed)
- Dependencies audited
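The path-related items in the checklist (no `../` traversal, output restricted to the workspace) can be illustrated with a small guard. This is a sketch under the assumption that all inputs and outputs must live under a single workspace directory; `resolve_inside` is a hypothetical helper, not a function from `scripts/main.py`.

```python
import tempfile
from pathlib import Path


def resolve_inside(workspace, user_path):
    """Resolve user_path and reject it if it lands outside the workspace."""
    root = Path(workspace).resolve()
    candidate = (root / user_path).resolve()
    try:
        # relative_to raises ValueError when candidate is not under root.
        candidate.relative_to(root)
    except ValueError:
        raise ValueError(f"path escapes workspace: {user_path}") from None
    return candidate


# A temporary directory stands in for the real workspace.
workspace = tempfile.mkdtemp()
ok = resolve_inside(workspace, "results/pathway_scores.csv")

try:
    resolve_inside(workspace, "../outside.csv")
    escaped = True
except ValueError:
    escaped = False
```

Resolving before checking matters: a string test for `..` alone would miss absolute paths and symlinks that point outside the workspace.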
Prerequisites
# Python dependencies
pip install -r requirements.txt
Evaluation Criteria
Success Metrics
- Successfully executes main functionality
- Output meets quality standards
- Handles edge cases gracefully
- Performance is acceptable
Test Cases
- Basic Functionality: Standard input → Expected output
- Edge Case: Invalid input → Graceful error handling
- Performance: Large dataset → Acceptable processing time
Lifecycle Status
- Current Stage: Draft
- Next Review Date: 2026-03-06
- Known Issues: None
- Planned Improvements:
- Performance optimization
- Additional feature support
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
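| --rna | str | Required | Path to the transcriptomics CSV (e.g. rna_data.csv) |
| --pro | str | Required | Path to the proteomics CSV (e.g. pro_data.csv) |
| --met | str | Required | Path to the metabolomics CSV (e.g. met_data.csv) |
| --output | str | './results' | Output directory |
| --databases | str | 'KEGG' | Pathway database(s) to use |
| --create-sample | str | Required | Create sample data for testing |
| --format | str | 'md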