Volcano Plot Script Generator

AIPOCH-AI

Generates code for volcano plots from differentially expressed gene (DEG) analysis results. Triggered when the user needs a visualization of gene expression data, a p-value versus fold-change scatter plot, or a publication-ready figure for bioinformatics analysis.

FILES
volcano-plot-script/
  skill.md
  scripts/
    main.py
    test_deg.csv
  references/
    best_practices.md
    example_deg_data.csv
    markers.txt
  assets/
    example_volcano.png
    example_volcano.R

SKILL.md

Volcano Plot Script Generator

A skill for generating publication-ready volcano plots from differential gene expression analysis results.

Overview

Volcano plots visualize the relationship between statistical significance (p-values) and magnitude of change (fold changes) in gene expression data. This skill generates customizable R or Python scripts for creating high-quality figures suitable for publications.
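As a rough illustration of the two axes, the sketch below computes the plotted coordinates (log2 fold change on x, -log10 of the p-value on y) and sorts each gene into an up, down, or non-significant group. It uses only the standard library. The function names, the clamp value for p = 0, the strict versus inclusive comparisons, and the sample genes are all assumptions made for the example, not taken from scripts/main.py.

```python
import math


def neg_log10(pvalue, floor=1e-300):
    """Return -log10(p); p is clamped at `floor` (an assumed value) so p == 0 stays finite."""
    return -math.log10(max(pvalue, floor))


def classify(log2fc, pvalue, log2fc_thresh=1.0, pvalue_thresh=0.05):
    """Label one gene as 'up', 'down', or 'ns' (not significant).

    Assumption: p must be strictly below its threshold, while the
    fold-change threshold is inclusive. The source does not specify this.
    """
    if pvalue < pvalue_thresh:
        if log2fc >= log2fc_thresh:
            return "up"
        if log2fc <= -log2fc_thresh:
            return "down"
    return "ns"


# Hypothetical genes: (name, log2 fold change, p-value)
genes = [("GENE_A", 2.5, 0.001), ("GENE_B", -1.8, 0.01), ("GENE_C", 0.3, 0.2)]

# Each point is (name, x, y, group); x and y are the volcano plot coordinates.
points = [(name, lfc, neg_log10(p), classify(lfc, p)) for name, lfc, p in genes]
```

A plotting script would typically draw one scatter layer per group and color it with the corresponding palette entry.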

Use Cases

  • Visualize RNA-seq DEG analysis results
  • Identify significantly upregulated and downregulated genes
  • Highlight genes of interest (markers, pathways)
  • Generate publication-quality figures for manuscripts
  • Compare multiple experimental conditions

Input Requirements

Required input data format:

  • Gene identifier (gene symbol or ENSEMBL ID)
  • Log2 fold change values
  • Adjusted or raw p-values
  • Optional: gene annotations, pathways
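The input format above could be checked before plotting along the following lines. This is a minimal sketch using the standard csv module rather than pandas, so that it stays self-contained. The function name `load_deg_rows` is hypothetical, and the policy of skipping rows with missing or non-numeric values (for example the `NA` that DESeq2 writes for filtered genes) is an assumption, not documented behavior of the skill.

```python
import csv
import io
import math


def load_deg_rows(handle, gene_col="gene", log2fc_col="log2FoldChange",
                  pvalue_col="padj", delimiter=","):
    """Read DEG rows from an open text handle.

    Returns (rows, skipped): rows is a list of (gene, log2fc, pvalue) tuples,
    skipped counts rows dropped for missing or non-numeric values.
    """
    reader = csv.DictReader(handle, delimiter=delimiter)
    missing = [c for c in (gene_col, log2fc_col, pvalue_col)
               if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing required column(s): {', '.join(missing)}")

    rows, skipped = [], 0
    for record in reader:
        try:
            log2fc = float(record[log2fc_col])
            pvalue = float(record[pvalue_col])
        except (TypeError, ValueError):
            # Covers "NA", empty strings, and short rows (value is None).
            skipped += 1
            continue
        if math.isnan(log2fc) or math.isnan(pvalue) or not record[gene_col]:
            skipped += 1
            continue
        rows.append((record[gene_col], log2fc, pvalue))
    return rows, skipped


# Hypothetical in-memory file with one unusable row (padj is NA).
sample = io.StringIO(
    "gene,log2FoldChange,padj\n"
    "GENE_A,2.5,0.001\n"
    "GENE_B,-1.8,NA\n"
    "GENE_C,0.3,0.2\n"
)
rows, skipped = load_deg_rows(sample)
```

For TSV input the same function can be called with `delimiter="\t"`.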

Output

  • Publication-ready volcano plot (PNG/PDF/SVG)
  • Customizable R or Python script
  • Optional: labeled significant gene lists
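One way the labeled gene list could be assembled is sketched below. The source does not say how the top N genes are ranked, so ordering by smallest p-value (ties broken by larger absolute fold change) is an assumption, as are the function name `genes_to_label` and the rule that explicitly requested genes are appended after the top N.

```python
def genes_to_label(rows, top_n=10, requested=None,
                   log2fc_thresh=1.0, pvalue_thresh=0.05):
    """Pick gene names to annotate on the plot.

    rows: list of (gene, log2fc, pvalue) tuples.
    requested: optional list of gene names the user asked for explicitly.
    """
    significant = [r for r in rows
                   if r[2] < pvalue_thresh and abs(r[1]) >= log2fc_thresh]
    # Assumed ranking: smallest p-value first, then largest |log2FC|.
    significant.sort(key=lambda r: (r[2], -abs(r[1])))
    labels = [gene for gene, _, _ in significant[:top_n]]

    # Append requested genes that exist in the data and are not already listed.
    present = {gene for gene, _, _ in rows}
    for gene in requested or []:
        if gene in present and gene not in labels:
            labels.append(gene)
    return labels


# Hypothetical data: GENE_X is requested but absent, so it is ignored.
rows = [("A", 2.5, 0.001), ("B", -1.8, 0.01), ("C", 0.3, 0.2), ("D", 3.0, 0.0001)]
top_two = genes_to_label(rows, top_n=2)
with_requested = genes_to_label(rows, top_n=1, requested=["C", "GENE_X"])
```

The returned list can also be written out as the optional significant-gene list alongside the figure.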

Usage

# Example: Run the volcano plot generator
python scripts/main.py --input deg_results.csv --output volcano_plot.png

Parameters

| Parameter | Description | Default |
|---|---|---|
| --input | Path to DEG results CSV/TSV | required |
| --output | Output plot file path | volcano_plot.png |
| --log2fc-col | Column name for log2 fold change | log2FoldChange |
| --pvalue-col | Column name for p-value | padj |
| --gene-col | Column name for gene IDs | gene |
| --log2fc-thresh | Log2 FC threshold for significance | 1.0 |
| --pvalue-thresh | P-value threshold | 0.05 |
| --label-genes | File with genes to label | None |
| --top-n | Label top N significant genes | 10 |
| --color-up | Color for upregulated genes | #E74C3C |
| --color-down | Color for downregulated genes | #3498DB |
| --color-ns | Color for non-significant genes | #95A5A6 |
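For orientation, the parameters listed above could be declared with argparse roughly as follows. This is a sketch that mirrors the documented flags and defaults; it is not the actual contents of scripts/main.py, and the help strings and the `build_parser` name are illustrative.

```python
import argparse


def build_parser():
    """Build a parser that mirrors the documented flags and defaults."""
    parser = argparse.ArgumentParser(
        description="Generate a volcano plot from DEG results.")
    parser.add_argument("--input", required=True,
                        help="Path to DEG results CSV/TSV")
    parser.add_argument("--output", default="volcano_plot.png",
                        help="Output plot file path")
    parser.add_argument("--log2fc-col", default="log2FoldChange",
                        help="Column name for log2 fold change")
    parser.add_argument("--pvalue-col", default="padj",
                        help="Column name for p-value")
    parser.add_argument("--gene-col", default="gene",
                        help="Column name for gene IDs")
    parser.add_argument("--log2fc-thresh", type=float, default=1.0,
                        help="Log2 FC threshold for significance")
    parser.add_argument("--pvalue-thresh", type=float, default=0.05,
                        help="P-value threshold")
    parser.add_argument("--label-genes", default=None,
                        help="File with genes to label")
    parser.add_argument("--top-n", type=int, default=10,
                        help="Label top N significant genes")
    parser.add_argument("--color-up", default="#E74C3C",
                        help="Color for upregulated genes")
    parser.add_argument("--color-down", default="#3498DB",
                        help="Color for downregulated genes")
    parser.add_argument("--color-ns", default="#95A5A6",
                        help="Color for non-significant genes")
    return parser


# argparse exposes "--pvalue-thresh" as args.pvalue_thresh (dashes become underscores).
args = build_parser().parse_args(
    ["--input", "deg_results.csv", "--pvalue-thresh", "0.01"])
```

Declaring the thresholds with `type=float` and `--top-n` with `type=int` means malformed values are rejected at parse time instead of failing later during plotting.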

Technical Difficulty

Medium - Requires understanding of:

  • DEG analysis concepts (fold change, p-values, FDR)
  • Data visualization principles
  • Matplotlib/ggplot2 plotting libraries

Dependencies

Python

  • pandas
  • matplotlib
  • seaborn
  • numpy

R

  • ggplot2
  • dplyr
  • ggrepel (for label positioning)

Author

Auto-generated skill for bioinformatics visualization.

Risk Assessment

| Risk Indicator | Assessment | Level |
|---|---|---|
| Code Execution | Python/R scripts executed locally | Medium |
| Network Access | No external API calls | Low |
| File System Access | Read input files, write output plots | Medium |
| Instruction Tampering | Standard prompt guidelines | Low |
| Data Exposure | Output files saved to workspace | Low |

Security Checklist

  • No hardcoded credentials or API keys
  • Input file paths validated (no ../ traversal)
  • Output directory restricted to workspace
  • Script execution in sandboxed environment
  • Error messages sanitized (no stack traces exposed)
  • Dependencies audited (pandas, matplotlib, seaborn, numpy)
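The two path-related items in the checklist could be enforced with a check like the one below. It is a sketch under the assumption that a single workspace directory is the allowed root; the helper name `resolve_in_workspace` is hypothetical and nothing here is taken from the skill's own code.

```python
from pathlib import Path


def resolve_in_workspace(user_path, workspace):
    """Resolve user_path against workspace and reject anything outside it.

    Handles '../' traversal and absolute paths: both resolve to a location
    that is not under the workspace root and therefore raise ValueError.
    """
    root = Path(workspace).resolve()
    candidate = (root / user_path).resolve()
    try:
        candidate.relative_to(root)
    except ValueError:
        raise ValueError(f"path escapes workspace: {user_path!r}") from None
    return candidate
```

Resolving before comparing matters because a purely textual check for `../` misses symlinks and absolute paths.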

Prerequisites

# Python dependencies
pip install -r requirements.txt

# R dependencies (if using R; run inside an R session)
install.packages(c("ggplot2", "dplyr", "ggrepel"))

Evaluation Criteria

Success Metrics

  • Successfully generates an executable Python/R script
  • Output plot is of publication-ready quality
  • Correctly identifies significant genes based on thresholds
  • Handles missing or malformed data gracefully
  • Color scheme is accessible (colorblind-friendly)

Test Cases

  1. Basic DEG Visualization: Input standard DESeq2 results → Valid volcano plot
  2. Custom Thresholds: Adjust log2FC and p-value thresholds → Correct gene classification
  3. Gene Labeling: Specify genes to label → Labels appear correctly
  4. Large Dataset: Input 20,000+ genes → Performance remains acceptable
  5. Malformed Data: Input with missing values → Graceful error handling
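Test cases 4 and 5 need input data. One option is a small generator for synthetic DEG tables, sketched below. The column names follow the documented defaults, but the distributions (normally distributed fold changes, uniformly distributed p-values), the `NA` marker for missing values, and the function name `write_synthetic_deg` are assumptions chosen only to exercise code paths, not to resemble real biology.

```python
import csv
import io
import random


def write_synthetic_deg(handle, n_genes=20000, missing_rate=0.01, seed=0):
    """Write a synthetic DEG table as CSV to an open text handle.

    A fraction `missing_rate` of rows gets padj set to "NA" so that
    missing-value handling can be tested on the same file.
    """
    rng = random.Random(seed)  # fixed seed keeps the file reproducible
    writer = csv.writer(handle)
    writer.writerow(["gene", "log2FoldChange", "padj"])
    for i in range(n_genes):
        log2fc = rng.gauss(0.0, 1.5)
        padj = "NA" if rng.random() < missing_rate else f"{rng.random():.6g}"
        writer.writerow([f"GENE_{i:05d}", f"{log2fc:.4f}", padj])


# Build the table in memory; a real test would write it to a temporary file
# and pass that path to the plotting script.
buffer = io.StringIO()
write_synthetic_deg(buffer, n_genes=20000)
```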

Lifecycle Status

  • Current Stage: Draft
  • Next Review Date: 2026-03-06
  • Known Issues: None
  • Planned Improvements:
    • Add interactive plot option (Plotly)
    • Support for multiple comparison groups
    • Integration with pathway enrichment tools