Neoantigen Predictor
Predicts patient-specific neoantigen candidate peptides with high immunogenicity based on HLA typing and tumor mutation profiles, providing target screening for tumor immunotherapy.
Quick Check
python -m py_compile scripts/main.py
python scripts/main.py --help
python scripts/main.py --hla "HLA-A*02:01" --mutations mutations.csv --output results.json
When to Use
- Use this skill to predict neoantigens from tumor mutation data and patient HLA typing.
- Use this skill to screen high-priority immunotherapy targets based on MHC binding affinity and immunogenicity scores.
- Use this skill for data analysis tasks that require explicit assumptions, bounded scope, and a reproducible output format.
- Use this skill when you need a documented fallback path for missing inputs, execution errors, or partial evidence.
Workflow
- Validate input first (hard gate): Confirm the request is within scope. If vaccine design, clinical trial interpretation, or general genomics analysis is requested, emit the scope refusal before any processing.
- Confirm the user objective, required inputs, and non-negotiable constraints before doing detailed work.
- Validate that the request matches the documented scope and stop early if the task would require unsupported assumptions.
- Use the packaged script path or the documented reasoning path with only the inputs that are actually available.
- Return a structured result that separates assumptions, deliverables, risks, and unresolved items.
- If execution fails or inputs are incomplete, switch to the fallback path and state exactly what blocked full completion.
Function Overview
Neoantigens are variant peptides generated by non-synonymous mutations in tumor cells, presented by the patient's HLA molecules and recognized by T cells. This tool integrates:
- Mutant Peptide Generation — Extract 8-11mer variant peptides from mutation sites
- HLA Binding Prediction — Predict peptide binding affinity to patient HLA molecules
- Immunogenicity Assessment — Assess potential to elicit immune response
- Priority Ranking — Comprehensive scoring to screen optimal neoantigen candidates
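The peptide generation step can be pictured as a sliding window over the mutated protein sequence. The sketch below is illustrative only and is not taken from scripts/main.py: the function name `mutant_peptides`, its parameters, and the toy sequence in the usage note are assumptions. It handles single amino acid substitutions and keeps only windows that contain the mutated residue.

```python
from typing import Iterable, List, Tuple


def mutant_peptides(
    protein: str,
    pos: int,
    alt: str,
    lengths: Iterable[int] = (8, 9, 10, 11),
) -> List[Tuple[int, str]]:
    """Return (1-based start, peptide) for every window covering the mutation.

    protein: wild-type protein sequence (assumed input, one-letter codes)
    pos:     1-based position of the substituted residue
    alt:     single-letter mutant amino acid
    """
    idx = pos - 1
    if not 0 <= idx < len(protein):
        raise ValueError(f"position {pos} is outside the protein sequence")
    if len(alt) != 1:
        raise ValueError("only single-residue substitutions are handled here")

    # Apply the substitution to obtain the mutant sequence.
    mutated = protein[:idx] + alt + protein[idx + 1:]

    peptides = []
    for k in lengths:
        # A window of length k covers idx when idx - k + 1 <= start <= idx,
        # clipped so the window stays inside the sequence.
        first = max(0, idx - k + 1)
        last = min(idx, len(mutated) - k)
        for start in range(first, last + 1):
            peptides.append((start + 1, mutated[start:start + k]))
    return peptides
```

For a hypothetical 20-residue sequence with the substitution at position 10, `mutant_peptides("ACDEFGHIKLMNPQRSTVWY", 10, "R", lengths=(9,))` yields nine 9-mers, one for each offset of the mutated residue within the window.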
Input Format
HLA Typing Input
| Format | Example | Description |
|---|---|---|
| Standard Nomenclature | HLA-A*02:01 | WHO standard HLA nomenclature |
| Simplified | A0201 | Omits the HLA- prefix and the * and : separators |
| Multi-alleles | HLA-A*02:01,A*11:01,B*07:02 | Comma-separated |
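Since three spellings are accepted, it can help to normalize them to one form before prediction. The following is a minimal sketch, not the tool's own parser: `normalize_hla` and `parse_hla_list` are hypothetical names, and the pattern assumes class I genes A, B, and C with two-field resolution.

```python
import re
from typing import List

# Optional "HLA-" prefix, gene letter, optional "*", two-digit allele group,
# optional ":", then a two- or three-digit protein field.
_ALLELE = re.compile(r"^(?:HLA-)?([ABC])\*?(\d{2}):?(\d{2,3})$", re.IGNORECASE)


def normalize_hla(allele: str) -> str:
    """Convert 'A0201', 'A*02:01' or 'HLA-A*02:01' to 'HLA-A*02:01'."""
    match = _ALLELE.match(allele.strip())
    if match is None:
        raise ValueError(f"unrecognized HLA allele: {allele!r}")
    gene, group, protein = match.groups()
    return f"HLA-{gene.upper()}*{group}:{protein}"


def parse_hla_list(text: str) -> List[str]:
    """Split a comma-separated allele string and normalize each entry."""
    return [normalize_hla(item) for item in text.split(",") if item.strip()]
```

With this sketch, `parse_hla_list("HLA-A*02:01,A*11:01,B*07:02")` returns `["HLA-A*02:01", "HLA-A*11:01", "HLA-B*07:02"]`, and an unparseable entry raises `ValueError` instead of being passed along silently.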
Mutation Data Input
VCF Format:
#CHROM POS ID REF ALT QUAL FILTER INFO
chr17 7579472 . G A 100 PASS GENE=TP53;AA=p.R273H
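To show how a record like the one above maps onto the mutation fields, here is a small sketch that splits one VCF data line and reads GENE and AA from the INFO column. The helper name `parse_vcf_line` is hypothetical, and treating GENE and AA as the relevant INFO keys is an assumption based only on the example record. Real VCF files are tab-separated; splitting on any whitespace works here because VCF fields contain no spaces.

```python
from typing import Dict, Optional, Union


def parse_vcf_line(line: str) -> Optional[Dict[str, Union[str, int, None]]]:
    """Parse one VCF data line into a mutation record; skip header lines."""
    if not line.strip() or line.startswith("#"):
        return None

    fields = line.split()
    if len(fields) < 8:
        raise ValueError("a VCF data line needs at least 8 columns")
    chrom, pos, _id, ref, alt, _qual, _filter, info = fields[:8]

    # INFO is a ';'-separated list of KEY=VALUE pairs (flags without '=' are skipped).
    info_map = dict(item.split("=", 1) for item in info.split(";") if "=" in item)

    return {
        "gene": info_map.get("GENE"),
        "chrom": chrom,
        "pos": int(pos),
        "ref": ref,
        "alt": alt,
        "protein_change": info_map.get("AA"),
    }
```

The returned keys follow the dictionary layout used in the Python API example in this document, so the record could be passed on without renaming.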
Table Format (CSV):
| Gene | Chrom | Position | Ref | Alt | Protein_Change |
|---|---|---|---|---|---|
| TP53 | chr17 | 7579472 | G | A | p.R273H |
Usage
Command Line
python scripts/main.py \
--hla "HLA-A*02:01,HLA-A*11:01,B*07:02" \
--vcf mutations.vcf \
--output neoantigen_results.json
python scripts/main.py \
--hla-file hla_genotype.txt \
--mutations mutations.csv \
--peptide-length 9,10,11 \
--rank-cutoff 0.5 \
--output results.json
Python API
from scripts.main import NeoantigenPredictor
predictor = NeoantigenPredictor()
hla_alleles = ["HLA-A*02:01", "HLA-A*11:01", "HLA-B*07:02"]
mutations = [{"gene": "TP53", "chrom": "chr17", "pos": 7579472, "ref": "G", "alt": "A", "protein_change": "p.R273H"}]
results = predictor.predict(hla_alleles=hla_alleles, mutations=mutations, peptide_length=[9, 10], mhc_method="netmhcpan")
high_affinity = predictor.filter_by_binding(results, rank_threshold=0.5)
Scoring Algorithms
MHC Binding Affinity
| Metric | Threshold |
|---|---|
| Rank % | <0.5% = Strong, <2% = Weak |
| IC50 (nM) | <50nM = High, <500nM = Intermediate |
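The thresholds in the table translate directly into two small classifiers. This is a sketch under stated assumptions: the function names are hypothetical, and the labels "non-binder" and "low" for values beyond the listed cutoffs are added here for completeness and do not appear in the table.

```python
def classify_by_rank(rank_percent: float) -> str:
    """Classify a binder by percentile rank (in percent, lower is better)."""
    if rank_percent < 0:
        raise ValueError("rank must be non-negative")
    if rank_percent < 0.5:
        return "strong"
    if rank_percent < 2.0:
        return "weak"
    return "non-binder"  # assumed label for ranks of 2% and above


def classify_by_ic50(ic50_nm: float) -> str:
    """Classify binding affinity by predicted IC50 in nM (lower is better)."""
    if ic50_nm < 0:
        raise ValueError("IC50 must be non-negative")
    if ic50_nm < 50:
        return "high"
    if ic50_nm < 500:
        return "intermediate"
    return "low"  # assumed label for IC50 of 500 nM and above
```

Both comparisons are strict, so a rank of exactly 0.5% is classified as weak and an IC50 of exactly 50 nM as intermediate, matching the "<" signs in the table.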
Priority Score
priority_score = (
0.40 * (1 - rank_percentile) + # MHC binding
0.35 * immunogenicity_score + # Immunogenicity
0.25 * clinical_score # Expression, clonality
)
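A runnable version of the weighted sum might look like the following. The weights come from the formula above; everything else is an assumption: the function name, the requirement that all three inputs are already scaled to the range 0 to 1 (so a rank reported in percent would first need rescaling), and the range check itself.

```python
from typing import Dict, List

# Weights copied from the priority score formula.
W_BINDING = 0.40
W_IMMUNOGENICITY = 0.35
W_CLINICAL = 0.25


def priority_score(rank_percentile: float,
                   immunogenicity_score: float,
                   clinical_score: float) -> float:
    """Weighted priority score; all inputs are assumed to lie in [0, 1]."""
    for name, value in (("rank_percentile", rank_percentile),
                        ("immunogenicity_score", immunogenicity_score),
                        ("clinical_score", clinical_score)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {value}")
    return (W_BINDING * (1 - rank_percentile)
            + W_IMMUNOGENICITY * immunogenicity_score
            + W_CLINICAL * clinical_score)


def rank_candidates(candidates: List[Dict[str, float]]) -> List[Dict[str, float]]:
    """Sort candidate dicts by descending priority score."""
    return sorted(
        candidates,
        key=lambda c: priority_score(c["rank_percentile"],
                                     c["immunogenicity_score"],
                                     c["clinical_score"]),
        reverse=True,
    )
```

Because the weights sum to 1, the score also stays within 0 to 1 under these assumptions, which makes scores comparable across patients only if the three inputs are scaled the same way each time.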
Algorithm Limitations
- MHC binding prediction accuracy: ~85% at the Rank < 0.5% threshold
- Immunogenicity prediction requires experimental validation (~60-70% correlation)
- Does not consider HLA molecule expression levels on cell surface
- Cannot predict immune tolerance or suppressive T cell responses
Clinical Application Notes
Important: This tool is for research purposes only. Prediction results must not be the sole basis for clinical decisions.
- All candidate neoantigens require experimental validation (e.g., ELISPOT, tetramer staining)
- Consider patient immune status and treatment history
- Assess potential autoimmune toxicity risks
- Combine with tumor microenvironment immune infiltration status
Dependencies
- Python 3.8+ (strictly required; dataclasses module used)
- biopython, pandas, numpy, requests
- NetMHCpan 4.1 (optional, local install for improved performance)
Prerequisites
pip install -r requirements.txt
Input Validation
This skill accepts: patient HLA typing data and tumor mutation profiles (VCF, CSV, or FASTA format) for the purpose of predicting neoantigen candidates and immunotherapy targets.
If the user's request does not involve neoantigen prediction from HLA and mutation data — for example, asking to design vaccines, interpret clinical trial results, or perform general genomics analysis — do not proceed with the workflow. Instead respond:
"neoantigen-predictor is designed to predict neoantigen candidates from HLA typing and tumor mutation data for immunotherapy research. Your request appears to be outside this scope. Please provide HLA alleles and mutation data, or use a more appropriate tool for your task."
Do not continue the workflow when the request is out of scope, missing HLA typing or mutation data, or would require clinical decision-making. For missing inputs, state exactly which fields are missing.
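The "state exactly which fields are missing" rule can be sketched as a pre-flight check that returns a list of missing items rather than raising on the first one. The function name `missing_inputs` and the set of required mutation fields are assumptions; the field names are borrowed from the Python API example in this document.

```python
from typing import Dict, List, Optional, Sequence

# Assumed required keys, taken from the mutation dict in the API example.
REQUIRED_MUTATION_FIELDS = ("gene", "chrom", "pos", "ref", "alt", "protein_change")


def missing_inputs(hla_alleles: Optional[Sequence[str]],
                   mutations: Optional[Sequence[Dict[str, object]]]) -> List[str]:
    """Return the names of all missing inputs; an empty list means ready to run."""
    missing = []
    if not hla_alleles:
        missing.append("hla_alleles")
    if not mutations:
        missing.append("mutations")
    for index, mutation in enumerate(mutations or []):
        for field in REQUIRED_MUTATION_FIELDS:
            # Treat absent keys, None, and empty strings as missing.
            if mutation.get(field) in (None, ""):
                missing.append(f"mutations[{index}].{field}")
    return missing
```

Collecting every gap in one pass lets the response request the minimum additional information in a single round trip.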
Fallback Behavior
If scripts/main.py fails or required inputs are incomplete:
- Report the exact failure point and error message (sanitized).
- State what can still be completed (e.g., peptide generation without binding prediction if NetMHCpan is unavailable).
- Manual fallback: use --variant-peptides peptides.fasta to skip mutation processing and predict binding for pre-generated peptides directly.
- Do not fabricate binding scores, immunogenicity values, or clinical interpretations.
Output Requirements
Every final response must make these items explicit when relevant:
- Objective or requested deliverable
- Inputs used and assumptions introduced
- Workflow or decision path
- Core result, recommendation, or artifact
- Constraints, risks, caveats, or validation needs (always include research-only disclaimer)
- Unresolved items and next-step checks
Error Handling
- If required inputs are missing, state exactly which fields are missing and request only the minimum additional information.
- If the task goes outside the documented scope, stop instead of guessing or silently widening the assignment.
- If scripts/main.py fails, report the failure point, summarize what still can be completed safely, and provide a manual fallback.
- Do not fabricate files, citations, data, search results, or execution outcomes.
Response Template
Use the following fixed structure for non-trivial requests:
- Objective
- Inputs Received
- Assumptions
- Workflow
- Deliverable
- Risks and Limits (always include research-only disclaimer)
- Next Checks
For stress/multi-constraint requests, also include:
- Constraints checklist (compliance, performance, error paths)
- Explicit boundary statement confirming no clinical decisions were made
- Unresolved items with explicit blocking reasons
If the request is simple, you may compress the structure, but always keep the research disclaimer and scope limits explicit.