Agent Skills
Gwas Database
AIPOCH
Query the NHGRI-EBI GWAS Catalog to retrieve SNP–trait associations, study metadata, and (when available) summary statistics when you need evidence for a variant, trait/disease, gene, or genomic region.
SKILL.md
When to Use
Use this skill when you need to:
- Look up a specific variant (rsID) to see all reported trait/disease associations and their p-values/effect sizes.
- Find variants associated with a trait/disease (via free text or an EFO trait ID) for downstream interpretation or reporting.
- Perform gene-centric exploration to identify GWAS hits within/near a gene of interest.
- Retrieve study-level metadata (GCST accession, PMID, cohorts, ancestry, sample size) to assess evidence quality and applicability.
- Access or filter summary statistics (when available) for genome-wide analyses (e.g., fine-mapping, colocalization, PRS development).
Key Features
- Multiple query entry points: rsID, EFO trait ID, gene symbol, chromosomal region, GCST accession, PMID.
- Structured entities: studies, associations, variants (SNPs), and traits (EFO-mapped).
- Programmatic access via:
  - GWAS Catalog REST API: https://www.ebi.ac.uk/gwas/rest/api
  - Summary Statistics API: https://www.ebi.ac.uk/gwas/summary-statistics/api
- Association-level fields commonly used in analysis: p-value, strongest allele, odds ratio/beta, mapped trait labels.
- Pagination support for bulk extraction (page, size, and _links navigation).
Dependencies
- Python 3.9+
- requests >= 2.31.0
- pandas >= 2.0.0 (optional; for tabular outputs)
Example Usage
The following script is a complete, runnable example that:
- fetches associations for an EFO trait,
- filters by genome-wide significance,
- returns a tidy table.
import time

import requests
import pandas as pd

GWAS_REST_BASE = "https://www.ebi.ac.uk/gwas/rest/api"


def fetch_trait_associations(efo_id: str, page_size: int = 100, max_pages: int = 50):
    """
    Fetch associations for a given EFO trait ID from the GWAS Catalog REST API.
    Returns a list of association JSON objects.
    """
    url = f"{GWAS_REST_BASE}/efoTraits/{efo_id}/associations"
    headers = {"Accept": "application/json"}
    all_assocs = []
    for page in range(max_pages):  # max_pages is a safety limit
        params = {"page": page, "size": page_size}
        r = requests.get(url, params=params, headers=headers, timeout=60)
        r.raise_for_status()
        data = r.json()
        assocs = data.get("_embedded", {}).get("associations", [])
        if not assocs:  # empty page: no more results
            break
        all_assocs.extend(assocs)
        time.sleep(0.1)  # be polite to the public API
    return all_assocs


def to_table(assocs, p_threshold: float = 5e-8) -> pd.DataFrame:
    rows = []
    for a in assocs:
        # Parse the p-value defensively; skip missing or non-numeric values.
        p = a.get("pvalue")
        try:
            p_float = float(p) if p is not None else None
        except (TypeError, ValueError):
            p_float = None
        if p_float is None or p_float > p_threshold:
            continue
        rows.append({
            "rsId": a.get("rsId"),
            "trait": a.get("efoTrait") or a.get("mappedLabel"),
            "pvalue": p_float,
            "strongestAllele": a.get("strongestAllele"),
            "orPerCopyNum": a.get("orPerCopyNum"),
            "betaNum": a.get("betaNum"),
            "pubmedId": a.get("pubmedId"),
            "studyAccession": a.get("studyAccession"),
        })
    df = pd.DataFrame(rows).drop_duplicates()
    if not df.empty:
        df = df.sort_values("pvalue", ascending=True).reset_index(drop=True)
    return df


if __name__ == "__main__":
    # Example: Type 2 diabetes (EFO_0001360)
    efo_id = "EFO_0001360"
    assocs = fetch_trait_associations(efo_id)
    df = to_table(assocs, p_threshold=5e-8)
    print(df.head(20).to_string(index=False))
    print(f"\nSignificant associations: {len(df)}")
    if not df.empty:
        print(f"Unique variants: {df['rsId'].nunique()}")
Implementation Details
Data Model and Identifiers
- Study accession: GCST... (e.g., GCST001234)
- Variant identifier: rs... (e.g., rs7903146)
- Trait identifier: EFO term (e.g., EFO_0001360)
- Gene symbol: HGNC-approved symbol (e.g., APOE, TCF7L2)
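Because each identifier type has a recognizable shape, a query string can be routed to the right lookup before any request is made. The sketch below is one possible way to do that. The regular expressions and the `classify_identifier` helper are assumptions inferred from the examples above, not an official specification, and anything that does not match is treated as a gene symbol or free text.

```python
import re

# Assumed patterns, inferred from the example identifiers in this section.
# They are illustrative only and may not cover every identifier in the Catalog.
_PATTERNS = [
    ("study", re.compile(r"^GCST\d+$")),    # e.g. GCST001234
    ("variant", re.compile(r"^rs\d+$")),    # e.g. rs7903146
    ("trait", re.compile(r"^EFO_\d+$")),    # e.g. EFO_0001360
]


def classify_identifier(value: str) -> str:
    """Return 'study', 'variant', 'trait', or 'gene_or_text' for a query string."""
    token = value.strip()
    for kind, pattern in _PATTERNS:
        if pattern.match(token):
            return kind
    # Gene symbols (APOE, TCF7L2) and free-text traits have no fixed prefix.
    return "gene_or_text"
```

For example, `classify_identifier("rs7903146")` yields `"variant"`, while `classify_identifier("TCF7L2")` falls through to `"gene_or_text"`.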
Core Endpoints (REST API)
- Study details: GET /studies/{GCST}
- Variant details: GET /singleNucleotidePolymorphisms/{rsId}
- Variant associations: GET /singleNucleotidePolymorphisms/{rsId}/associations
- Trait associations: GET /efoTraits/{EFO}/associations
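The endpoint paths above can be kept in one place so that callers never assemble URLs by hand. This is a minimal sketch: the `ENDPOINTS` mapping and the `build_url` helper are hypothetical names introduced here, while the path templates and base URL are copied from this document.

```python
from urllib.parse import quote

GWAS_REST_BASE = "https://www.ebi.ac.uk/gwas/rest/api"

# Path templates copied from the endpoint list above; the keys are made-up labels.
ENDPOINTS = {
    "study": "/studies/{id}",
    "variant": "/singleNucleotidePolymorphisms/{id}",
    "variant_associations": "/singleNucleotidePolymorphisms/{id}/associations",
    "trait_associations": "/efoTraits/{id}/associations",
}


def build_url(kind: str, identifier: str) -> str:
    """Build a full REST URL for one of the endpoint kinds listed above."""
    if kind not in ENDPOINTS:
        raise ValueError(f"unknown endpoint kind: {kind!r}")
    # Percent-encode the identifier so unexpected characters cannot alter the path.
    return GWAS_REST_BASE + ENDPOINTS[kind].format(id=quote(identifier, safe=""))
```

The returned URL can be passed directly to `requests.get` with an `Accept: application/json` header, as in the example script.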
Pagination Strategy
- Most list endpoints are paginated.
- Use query parameters:
  - size: number of records per page (commonly 20–100)
  - page: zero-based page index
- Stop conditions:
  - _embedded.associations is empty, or
  - you reach a predefined max_pages safety limit.
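The two stop conditions can be isolated in a small generator that is independent of the HTTP client. The sketch below assumes a caller-supplied `get_page(page, size)` function (a hypothetical name) that returns the decoded JSON for one page; `iter_pages` is likewise an illustrative helper, not part of any library.

```python
from typing import Callable, Dict, Iterator, List


def iter_pages(
    get_page: Callable[[int, int], Dict],
    embedded_key: str = "associations",
    page_size: int = 100,
    max_pages: int = 50,
) -> Iterator[List[dict]]:
    """Yield one list of records per page until a stop condition is met."""
    for page in range(max_pages):             # stop condition 2: safety limit
        data = get_page(page, page_size)      # page index is zero-based
        records = data.get("_embedded", {}).get(embedded_key, [])
        if not records:                       # stop condition 1: empty page
            return
        yield records
```

In practice `get_page` would wrap `requests.get(url, params={"page": page, "size": size}, timeout=60)` followed by `raise_for_status()` and `.json()`. Keeping the fetch function injectable also makes the loop easy to exercise with canned responses.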
Significance Thresholds and Filtering
- A common GWAS threshold is p ≤ 5×10⁻⁸ (genome-wide significance).
- Filtering should be applied after parsing pvalue into a numeric type; handle missing or non-numeric values safely.
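A minimal sketch of this parse-then-filter step is shown below. The helper names `parse_pvalue` and `is_significant` are hypothetical, and rejecting NaN or values outside [0, 1] is an extra assumption added here rather than something the Catalog documents.

```python
import math
from typing import Optional

GENOME_WIDE = 5e-8  # the common genome-wide significance threshold


def parse_pvalue(raw) -> Optional[float]:
    """Convert a raw p-value (number or string) to float; None if unusable."""
    if raw is None:
        return None
    try:
        p = float(raw)
    except (TypeError, ValueError):
        return None
    # Assumption: treat NaN and out-of-range values as missing.
    if math.isnan(p) or not (0.0 <= p <= 1.0):
        return None
    return p


def is_significant(raw, threshold: float = GENOME_WIDE) -> bool:
    """True only when the value parses cleanly and is at or below the threshold."""
    p = parse_pvalue(raw)
    return p is not None and p <= threshold
```

Returning `None` for unparseable input keeps the "missing" case distinct from "not significant", so callers can count or log dropped records separately if needed.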
Summary Statistics Access (when available)
- Summary Statistics API base: https://www.ebi.ac.uk/gwas/summary-statistics/api
- Typical filters include chromosome/position ranges and p-value bounds (endpoint availability and parameters may vary by resource version).
- For bulk downloads, the Catalog also provides an FTP directory: http://ftp.ebi.ac.uk/pub/databases/gwas/summary_statistics/
Practical Notes for Robust Use
- Be considerate of the public API: add small delays between requests and cache results for iterative workflows.
- Always interpret associations in context:
- ancestry/cohort metadata,
- sample size,
- replication status,
- effect size harmonization needs across studies.
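The delay-and-cache advice can be combined in one small wrapper. The sketch below is illustrative only: `PoliteCache` is a hypothetical class, the cache lives in memory for the lifetime of the process, and the 0.1 second default interval mirrors the example script rather than any published rate limit.

```python
import time
from typing import Callable, Dict, Optional


class PoliteCache:
    """Memoize responses by URL and space out real requests."""

    def __init__(
        self,
        fetch: Callable[[str], dict],
        min_interval: float = 0.1,
        sleep: Callable[[float], None] = time.sleep,
        clock: Callable[[], float] = time.monotonic,
    ):
        self._fetch = fetch
        self._min_interval = min_interval
        self._sleep = sleep
        self._clock = clock
        self._cache: Dict[str, dict] = {}
        self._last_call: Optional[float] = None

    def get(self, url: str) -> dict:
        # Cached responses are returned immediately, with no delay.
        if url in self._cache:
            return self._cache[url]
        # Wait until at least min_interval has passed since the last real request.
        if self._last_call is not None:
            wait = self._min_interval - (self._clock() - self._last_call)
            if wait > 0:
                self._sleep(wait)
        result = self._fetch(url)
        self._last_call = self._clock()
        self._cache[url] = result
        return result
```

Here `fetch` would typically be a function that calls `requests.get(url, timeout=60)`, checks the status, and returns `.json()`. For workflows that span several runs, an on-disk cache would be the natural next step.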