Neoantigen Predictor

AIPOCH

Predict and rank neoantigen candidate peptides from patient HLA typing and tumor mutation profiles for immunotherapy research.

FILES
neoantigen-predictor/
skill.md
scripts
main.py
__pycache__
references
example_mutations.csv
README.md
neoantigen-predictor_audit_result_v1.json
neoantigen-predictor_audit_result_v2.json
POLISH_CHANGELOG.md
requirements.txt

SKILL.md

Neoantigen Predictor

Predicts patient-specific neoantigen candidate peptides with high predicted immunogenicity from HLA typing and tumor mutation profiles, supporting target screening for tumor immunotherapy.

Quick Check

python -m py_compile scripts/main.py
python scripts/main.py --help
python scripts/main.py --hla "HLA-A*02:01" --mutations mutations.csv --output results.json

When to Use

  • Use this skill to predict neoantigens from tumor mutation data and patient HLA typing.
  • Use this skill to screen high-priority immunotherapy targets based on MHC binding affinity and immunogenicity scores.
  • Use this skill for data analysis tasks that require explicit assumptions, bounded scope, and a reproducible output format.
  • Use this skill when you need a documented fallback path for missing inputs, execution errors, or partial evidence.

Workflow

  1. Validate input first (hard gate): Confirm the request is within scope. If vaccine design, clinical trial interpretation, or general genomics analysis is requested, emit the scope refusal before any processing.
  2. Confirm the user objective, required inputs, and non-negotiable constraints before doing detailed work.
  3. Check that the available inputs support the task without unsupported assumptions, and stop early if they do not.
  4. Use the packaged script path or the documented reasoning path with only the inputs that are actually available.
  5. Return a structured result that separates assumptions, deliverables, risks, and unresolved items.
  6. If execution fails or inputs are incomplete, switch to the fallback path and state exactly what blocked full completion.

Function Overview

Neoantigens are variant peptides arising from non-synonymous mutations in tumor cells that are presented by the patient's HLA molecules and can be recognized by T cells. This tool integrates:

  1. Mutant Peptide Generation — Extract 8-11mer variant peptides from mutation sites
  2. HLA Binding Prediction — Predict peptide binding affinity to patient HLA molecules
  3. Immunogenicity Assessment — Assess potential to elicit immune response
  4. Priority Ranking — Comprehensive scoring to screen optimal neoantigen candidates
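Step 1 can be illustrated with a minimal sketch. It assumes a single amino-acid substitution and that the reference protein sequence is already available (the source does not say how scripts/main.py obtains it); the function name `mutant_peptides` is hypothetical and not part of the packaged script.

```python
from typing import Dict, List, Sequence


def mutant_peptides(
    protein_seq: str,
    position: int,
    alt_aa: str,
    lengths: Sequence[int] = (8, 9, 10, 11),
) -> Dict[int, List[str]]:
    """Return every k-mer (k in `lengths`) that overlaps a missense substitution.

    Hypothetical helper. `position` is 1-based, as in HGVS protein notation
    (p.R273H -> 273); `alt_aa` is the one-letter mutant residue.
    """
    idx = position - 1
    if not 0 <= idx < len(protein_seq):
        raise ValueError(f"position {position} is outside the protein sequence")
    mutant = protein_seq[:idx] + alt_aa + protein_seq[idx + 1:]

    peptides: Dict[int, List[str]] = {}
    for k in lengths:
        # A window starting at s covers the mutation when s <= idx <= s + k - 1;
        # it must also fit inside the sequence (s + k <= len(mutant)).
        first = max(0, idx - k + 1)
        last = min(idx, len(mutant) - k)
        peptides[k] = [mutant[s:s + k] for s in range(first, last + 1)]
    return peptides


if __name__ == "__main__":
    toy = "ACDEFGHIKLMNPQRSTVWY"  # toy sequence, not a real protein
    for k, peps in mutant_peptides(toy, 10, "R", lengths=(9,)).items():
        print(k, peps)
```

Each k-mer length yields at most k windows, so one substitution produces at most 8 + 9 + 10 + 11 = 38 candidate peptides before binding prediction.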

Input Format

HLA Typing Input

| Format | Example | Description |
|---|---|---|
| Standard nomenclature | HLA-A*02:01 | WHO standard HLA nomenclature |
| Simplified | A0201 | Omits the "HLA-" prefix, "*", and ":" |
| Multiple alleles | HLA-A*02:01,A*11:01,B*07:02 | Comma-separated |
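The three input formats can be mapped onto one canonical form before prediction. The sketch below is an assumption, not the parser in scripts/main.py: it accepts class I genes (A, B, C) at two-field resolution only, and `normalize_hla` is a hypothetical name.

```python
import re
from typing import List

# Hypothetical pattern: class I genes (A, B, C) at two-field resolution only.
_ALLELE = re.compile(
    r"^(?:HLA-)?(?P<gene>[ABC])\*?(?P<field1>\d{2,3}):?(?P<field2>\d{2,3})$"
)


def normalize_hla(raw: str) -> List[str]:
    """Convert a comma-separated HLA string to names like 'HLA-A*02:01'."""
    alleles: List[str] = []
    for token in raw.split(","):
        token = token.strip().upper()
        if not token:
            continue  # tolerate trailing commas
        match = _ALLELE.match(token)
        if match is None:
            raise ValueError(f"unrecognized HLA allele: {token!r}")
        name = f"HLA-{match['gene']}*{match['field1']}:{match['field2']}"
        if name not in alleles:  # drop repeats, e.g. a homozygous call listed twice
            alleles.append(name)
    return alleles
```

Raising on unrecognized tokens, rather than skipping them, keeps the "state exactly which fields are missing" rule enforceable. Simplified names with three-digit fields (for example a six-digit string) are ambiguous without separators, so the standard nomenclature is the safer input.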

Mutation Data Input

VCF Format:

#CHROM  POS     ID  REF ALT QUAL    FILTER  INFO
chr17   7579472 .   G   A   100     PASS    GENE=TP53;AA=p.R273H
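A minimal reader for this VCF layout is sketched below. It assumes the `GENE=` and `AA=` INFO keys shown in the example; annotation tools commonly store consequences under other keys (for example VEP's `CSQ` or SnpEff's `ANN`), and multi-allelic ALT values are not split. `parse_vcf` is a hypothetical name; the output dicts use the keys from the Python API example.

```python
from typing import Dict, Iterable, List


def parse_vcf(lines: Iterable[str], pass_only: bool = True) -> List[Dict[str, object]]:
    """Turn VCF data lines into mutation dicts shaped like the Python API input.

    Illustrative parser, not the one in scripts/main.py.
    """
    mutations: List[Dict[str, object]] = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header/meta lines
        # VCF columns are tab-separated; split() also tolerates aligned spaces.
        chrom, pos, _id, ref, alt, _qual, filt, info = line.split()[:8]
        if pass_only and filt != "PASS":
            continue
        info_map: Dict[str, str] = {}
        for item in info.split(";"):
            key, _, value = item.partition("=")
            info_map[key] = value  # flag keys (no '=') map to ""
        mutations.append({
            "gene": info_map.get("GENE"),
            "chrom": chrom,
            "pos": int(pos),
            "ref": ref,
            "alt": alt,
            "protein_change": info_map.get("AA"),
        })
    return mutations
```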

Table Format (CSV):

Gene,Chrom,Position,Ref,Alt,Protein_Change
TP53,chr17,7579472,G,A,p.R273H
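The CSV columns line up one-to-one with the keys of the mutation dicts in the Python API example. A sketch of that mapping, assuming the header names above; `read_mutation_csv` and `COLUMN_MAP` are hypothetical:

```python
import csv
from typing import Dict, Iterable, List

# CSV header -> key used by the Python API mutation dicts.
COLUMN_MAP = {
    "Gene": "gene",
    "Chrom": "chrom",
    "Position": "pos",
    "Ref": "ref",
    "Alt": "alt",
    "Protein_Change": "protein_change",
}


def read_mutation_csv(lines: Iterable[str]) -> List[Dict[str, object]]:
    """Read the table format into mutation dicts, naming any missing fields."""
    mutations: List[Dict[str, object]] = []
    for row_no, record in enumerate(csv.DictReader(lines), start=1):
        # Short rows yield None for absent columns; treat those like empty cells.
        missing = [col for col in COLUMN_MAP if not (record.get(col) or "").strip()]
        if missing:
            raise ValueError(f"data row {row_no}: missing {', '.join(missing)}")
        row: Dict[str, object] = {
            key: record[col].strip() for col, key in COLUMN_MAP.items()
        }
        row["pos"] = int(record["Position"])
        mutations.append(row)
    return mutations
```

With a file on disk, open it with `newline=""` as the `csv` module documentation recommends: `with open("mutations.csv", newline="") as fh: mutations = read_mutation_csv(fh)`.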

Usage

Command Line

python scripts/main.py \
  --hla "HLA-A*02:01,HLA-A*11:01,B*07:02" \
  --vcf mutations.vcf \
  --output neoantigen_results.json

python scripts/main.py \
  --hla-file hla_genotype.txt \
  --mutations mutations.csv \
  --peptide-length 9,10,11 \
  --rank-cutoff 0.5 \
  --output results.json

Python API

from scripts.main import NeoantigenPredictor

predictor = NeoantigenPredictor()
hla_alleles = ["HLA-A*02:01", "HLA-A*11:01", "HLA-B*07:02"]
mutations = [{"gene": "TP53", "chrom": "chr17", "pos": 7579472, "ref": "G", "alt": "A", "protein_change": "p.R273H"}]
results = predictor.predict(hla_alleles=hla_alleles, mutations=mutations, peptide_length=[9, 10], mhc_method="netmhcpan")
high_affinity = predictor.filter_by_binding(results, rank_threshold=0.5)

Scoring Algorithms

MHC Binding Affinity

| Metric | Threshold |
|---|---|
| %Rank | <0.5% = Strong, <2% = Weak |
| IC50 (nM) | <50 nM = High, <500 nM = Intermediate |
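The thresholds translate directly into a small classifier. In this sketch %Rank is taken as a percentage (0.5 means 0.5%), and the labels "non-binder" and "low" for values past both cutoffs are assumptions, since the table defines no class there; `classify_binding` is a hypothetical name.

```python
from typing import Dict, Optional


def classify_binding(
    rank_percent: float, ic50_nm: Optional[float] = None
) -> Dict[str, Optional[str]]:
    """Label a peptide-HLA prediction using the %Rank and IC50 thresholds."""
    if rank_percent < 0.5:
        rank_class = "strong"
    elif rank_percent < 2.0:
        rank_class = "weak"
    else:
        rank_class = "non-binder"  # assumed label; not defined by the table

    if ic50_nm is None:
        ic50_class = None  # IC50 not reported by the predictor
    elif ic50_nm < 50:
        ic50_class = "high"
    elif ic50_nm < 500:
        ic50_class = "intermediate"
    else:
        ic50_class = "low"  # assumed label; not defined by the table
    return {"rank_class": rank_class, "ic50_class": ic50_class}
```

Keeping the two labels separate, instead of merging them into one call, leaves room for the cases where %Rank and IC50 disagree.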

Priority Score

priority_score = (
    0.40 * (1 - rank_percentile) +   # MHC binding (rank_percentile assumed scaled to 0-1)
    0.35 * immunogenicity_score +     # Immunogenicity
    0.25 * clinical_score             # Expression, clonality
)
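The same weighting written as a checked function. It assumes all three inputs are already scaled to [0, 1]; the source does not say how a raw NetMHCpan %Rank (0-100) is mapped onto `rank_percentile`, so callers passing raw %Rank values would need to rescale first. Because the weights sum to 1.0, the score also stays in [0, 1].

```python
def priority_score(
    rank_percentile: float, immunogenicity_score: float, clinical_score: float
) -> float:
    """Weighted priority score (higher is better); inputs must lie in [0, 1]."""
    inputs = {
        "rank_percentile": rank_percentile,
        "immunogenicity_score": immunogenicity_score,
        "clinical_score": clinical_score,
    }
    for name, value in inputs.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must lie in [0, 1], got {value}")
    return (
        0.40 * (1 - rank_percentile)  # MHC binding: lower rank is better
        + 0.35 * immunogenicity_score
        + 0.25 * clinical_score  # expression, clonality
    )
```

Ranking then reduces to sorting candidates by this value in descending order, for example `sorted(candidates, key=lambda c: priority_score(*c), reverse=True)` for a list of `(rank, immunogenicity, clinical)` tuples.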

Algorithm Limitations

  • MHC binding prediction accuracy: ~85% (Rank < 0.5 threshold)
  • Immunogenicity prediction requires experimental validation (~60-70% correlation)
  • Does not consider HLA expression levels on the cell surface
  • Cannot predict immune tolerance or suppressive T cell responses

Clinical Application Notes

Important: This tool is for research purposes only. Prediction results must not be the sole basis for clinical decisions.

  • All candidate neoantigens require experimental validation (e.g., ELISPOT, tetramer staining)
  • Consider patient immune status and treatment history
  • Assess potential autoimmune toxicity risks
  • Interpret results alongside the immune infiltration status of the tumor microenvironment

Dependencies

  • Python 3.8+ (required; scripts/main.py uses the dataclasses module)
  • biopython, pandas, numpy, requests
  • NetMHCpan 4.1 (optional, local install for improved performance)

Prerequisites

pip install -r requirements.txt

Input Validation

This skill accepts patient HLA typing data and tumor mutation profiles (VCF, CSV, or FASTA format) for predicting neoantigen candidates and immunotherapy targets.

If the user's request does not involve neoantigen prediction from HLA and mutation data — for example, asking to design vaccines, interpret clinical trial results, or perform general genomics analysis — do not proceed with the workflow. Instead respond:

"neoantigen-predictor is designed to predict neoantigen candidates from HLA typing and tumor mutation data for immunotherapy research. Your request appears to be outside this scope. Please provide HLA alleles and mutation data, or use a more appropriate tool for your task."

Do not continue the workflow when the request is out of scope, missing HLA typing or mutation data, or would require clinical decision-making. For missing inputs, state exactly which fields are missing.
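The gate can be expressed as a check that runs before any processing. In this sketch the field names `hla_alleles`, `mutations`, and `variant_peptides` and the functions `missing_inputs` and `gate` are hypothetical, and the scope decision itself is left to the caller as the `in_scope` flag; only the refusal text comes from this document.

```python
from typing import Dict, List, Optional

SCOPE_REFUSAL = (
    "neoantigen-predictor is designed to predict neoantigen candidates from HLA typing "
    "and tumor mutation data for immunotherapy research. Your request appears to be "
    "outside this scope. Please provide HLA alleles and mutation data, or use a more "
    "appropriate tool for your task."
)


def missing_inputs(request: Dict[str, object]) -> List[str]:
    """Name each required field that is absent or empty."""
    missing: List[str] = []
    if not request.get("hla_alleles"):
        missing.append("hla_alleles")
    # Pre-generated peptides can stand in for mutation data (fallback path).
    if not (request.get("mutations") or request.get("variant_peptides")):
        missing.append("mutations (or variant_peptides)")
    return missing


def gate(request: Dict[str, object], in_scope: bool) -> Optional[str]:
    """Return a stop message, or None when the workflow may proceed."""
    if not in_scope:
        return SCOPE_REFUSAL
    missing = missing_inputs(request)
    if missing:
        return "Missing required input(s): " + ", ".join(missing)
    return None
```

The scope check runs first so an out-of-scope request never reaches input validation, matching the hard gate in step 1 of the workflow.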

Fallback Behavior

If scripts/main.py fails or required inputs are incomplete:

  1. Report the exact failure point and error message (sanitized).
  2. State what can still be completed (e.g., peptide generation without binding prediction if NetMHCpan is unavailable).
  3. Manual fallback: use --variant-peptides peptides.fasta to skip mutation processing and predict binding for pre-generated peptides directly.
  4. Do not fabricate binding scores, immunogenicity values, or clinical interpretations.
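For the manual fallback, pre-generated peptides need to be read and checked before binding prediction. The sketch below is a stdlib-only FASTA reader (Biopython's `SeqIO` would also work); the default length bounds mirror the 8-11mer range in the Function Overview, and `read_peptide_fasta` is a hypothetical name. Malformed records raise errors instead of being repaired, in line with rule 4.

```python
from typing import Dict, Iterable, List, Tuple

AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def read_peptide_fasta(
    lines: Iterable[str], min_len: int = 8, max_len: int = 11
) -> Dict[str, str]:
    """Parse pre-generated peptides; reject malformed records instead of fixing them."""
    records: List[Tuple[str, List[str]]] = []  # (name, sequence chunks)
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            fields = line[1:].split()
            if not fields:
                raise ValueError("FASTA header without a name")
            records.append((fields[0], []))  # keep the first word as the name
        elif not records:
            raise ValueError("sequence found before the first '>' header")
        else:
            records[-1][1].append(line.upper())

    peptides: Dict[str, str] = {}
    for name, chunks in records:
        seq = "".join(chunks)
        if name in peptides:
            raise ValueError(f"duplicate peptide name: {name}")
        if not min_len <= len(seq) <= max_len:
            raise ValueError(f"{name}: length {len(seq)} outside {min_len}-{max_len}")
        bad = set(seq) - AMINO_ACIDS
        if bad:
            raise ValueError(f"{name}: non-standard residues {sorted(bad)}")
        peptides[name] = seq
    return peptides
```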

Output Requirements

Every final response must make these items explicit when relevant:

  • Objective or requested deliverable
  • Inputs used and assumptions introduced
  • Workflow or decision path
  • Core result, recommendation, or artifact
  • Constraints, risks, caveats, or validation needs (always include research-only disclaimer)
  • Unresolved items and next-step checks

Error Handling

  • If required inputs are missing, state exactly which fields are missing and request only the minimum additional information.
  • If the task goes outside the documented scope, stop instead of guessing or silently widening the assignment.
  • If scripts/main.py fails, report the failure point, summarize what still can be completed safely, and provide a manual fallback.
  • Do not fabricate files, citations, data, search results, or execution outcomes.

Response Template

Use the following fixed structure for non-trivial requests:

  1. Objective
  2. Inputs Received
  3. Assumptions
  4. Workflow
  5. Deliverable
  6. Risks and Limits (always include research-only disclaimer)
  7. Next Checks

For stress-test or multi-constraint requests, also include:

  • Constraints checklist (compliance, performance, error paths)
  • Explicit boundary statement confirming no clinical decisions were made
  • Unresolved items with explicit blocking reasons

If the request is simple, you may compress the structure, but always keep the research disclaimer and scope limits explicit.
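One way to keep the seven-part structure and the research-only disclaimer from drifting is to render responses from a fixed container. The `SkillResponse` class and `render` method below are a hypothetical sketch, not part of scripts/main.py; the disclaimer text is taken from the Clinical Application Notes.

```python
from dataclasses import dataclass, field
from typing import List

DISCLAIMER = (
    "This tool is for research purposes only. Prediction results must not be "
    "the sole basis for clinical decisions."
)


@dataclass
class SkillResponse:
    """Hypothetical container mirroring the seven-part response template."""

    objective: str
    inputs_received: List[str]
    assumptions: List[str] = field(default_factory=list)
    workflow: List[str] = field(default_factory=list)
    deliverable: str = ""
    risks_and_limits: List[str] = field(default_factory=list)
    next_checks: List[str] = field(default_factory=list)

    def render(self) -> str:
        risks = list(self.risks_and_limits)
        if DISCLAIMER not in risks:
            risks.append(DISCLAIMER)  # the disclaimer is always included
        sections = [
            ("Objective", [self.objective]),
            ("Inputs Received", self.inputs_received),
            ("Assumptions", self.assumptions),
            ("Workflow", self.workflow),
            ("Deliverable", [self.deliverable]),
            ("Risks and Limits", risks),
            ("Next Checks", self.next_checks),
        ]
        lines: List[str] = []
        for number, (title, items) in enumerate(sections, start=1):
            lines.append(f"{number}. {title}")
            # Print "none" rather than dropping an empty section.
            for item in [i for i in items if i] or ["none"]:
                lines.append(f"   - {item}")
        return "\n".join(lines)
```

Rendering empty sections as "none" keeps unresolved items visible instead of silently omitting them.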