GWAS Database

AIPOCH

Query the NHGRI-EBI GWAS Catalog for SNP–trait associations, study metadata, and (when available) summary statistics. Use it when you need evidence for a variant, trait or disease, gene, or genomic region.


SKILL.md

When to Use

Use this skill when you need to:

  1. Look up a specific variant (rsID) to see all reported trait/disease associations and their p-values/effect sizes.
  2. Find variants associated with a trait/disease (via free text or an EFO trait ID) for downstream interpretation or reporting.
  3. Perform gene-centric exploration to identify GWAS hits within/near a gene of interest.
  4. Retrieve study-level metadata (GCST accession, PMID, cohorts, ancestry, sample size) to assess evidence quality and applicability.
  5. Access or filter summary statistics (when available) for genome-wide analyses (e.g., fine-mapping, colocalization, PRS development).

Key Features

  • Multiple query entry points: rsID, EFO trait ID, gene symbol, chromosomal region, GCST accession, PMID.
  • Structured entities: studies, associations, variants (SNPs), and traits (EFO-mapped).
  • Programmatic access via:
    • GWAS Catalog REST API: https://www.ebi.ac.uk/gwas/rest/api
    • Summary Statistics API: https://www.ebi.ac.uk/gwas/summary-statistics/api
  • Association-level fields commonly used in analysis: p-value, strongest allele, odds ratio/beta, mapped trait labels.
  • Pagination support for bulk extraction (page, size, and _links navigation).

Dependencies

  • Python 3.9+
  • requests >= 2.31.0
  • pandas >= 2.0.0 (optional; for tabular outputs)

Example Usage

The following script is a complete, runnable example that:

  1. fetches associations for an EFO trait,
  2. filters by genome-wide significance,
  3. returns a tidy table.

import time
import requests
import pandas as pd

GWAS_REST_BASE = "https://www.ebi.ac.uk/gwas/rest/api"

def fetch_trait_associations(efo_id: str, page_size: int = 100, max_pages: int = 50):
    """
    Fetch associations for a given EFO trait ID from the GWAS Catalog REST API.
    Returns a list of association JSON objects.
    """
    url = f"{GWAS_REST_BASE}/efoTraits/{efo_id}/associations"
    headers = {"Accept": "application/json"}

    all_assocs = []
    for page in range(max_pages):
        params = {"page": page, "size": page_size}
        r = requests.get(url, params=params, headers=headers, timeout=60)
        r.raise_for_status()
        data = r.json()

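        assocs = data.get("_embedded", {}).get("associations", [])
        if not assocs:
            break

        all_assocs.extend(assocs)

        # Stop on the last page. If the response has no "page" block, treat it
        # as unpaginated so the same records are not fetched again.
        page_info = data.get("page")
        if not page_info or page + 1 >= page_info.get("totalPages", 0):
            break

        time.sleep(0.1)  # be polite to the public API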

    return all_assocs

def to_table(assocs, p_threshold: float = 5e-8) -> pd.DataFrame:
    rows = []
    for a in assocs:
        p = a.get("pvalue")
        try:
            p_float = float(p) if p is not None else None
        except (TypeError, ValueError):
            p_float = None

        if p_float is None or p_float > p_threshold:
            continue

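        # Field names follow this skill's example. Verify them against a live
        # response; a.get() returns None for any key that is absent.
        rows.append({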
            "rsId": a.get("rsId"),
            "trait": a.get("efoTrait") or a.get("mappedLabel"),
            "pvalue": p_float,
            "strongestAllele": a.get("strongestAllele"),
            "orPerCopyNum": a.get("orPerCopyNum"),
            "betaNum": a.get("betaNum"),
            "pubmedId": a.get("pubmedId"),
            "studyAccession": a.get("studyAccession"),
        })

    df = pd.DataFrame(rows).drop_duplicates()
    if not df.empty:
        df = df.sort_values("pvalue", ascending=True).reset_index(drop=True)
    return df

if __name__ == "__main__":
    # Example: Type 2 diabetes (EFO_0001360)
    efo_id = "EFO_0001360"

    assocs = fetch_trait_associations(efo_id)
    df = to_table(assocs, p_threshold=5e-8)

    print(df.head(20).to_string(index=False))
    print(f"\nSignificant associations: {len(df)}")
    if not df.empty:
        print(f"Unique variants: {df['rsId'].nunique()}")

Implementation Details

Data Model and Identifiers

  • Study accession: GCST... (e.g., GCST001234)
  • Variant identifier: rs... (e.g., rs7903146)
  • Trait identifier: EFO term (e.g., EFO_0001360)
  • Gene symbol: HGNC-approved symbol (e.g., APOE, TCF7L2)
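
Because each identifier type above has a recognizable prefix, a query string can be routed to the right lookup before any request is made. The following is a minimal sketch, assuming the patterns implied by the examples in this list. `classify_identifier` is a hypothetical helper, not part of the GWAS Catalog API, and real identifiers may take other forms.

```python
import re

# Patterns inferred from the examples in the list above (assumption);
# treat the result as a heuristic, not a validation.
_PATTERNS = {
    "study": re.compile(r"^GCST\d+$"),
    "variant": re.compile(r"^rs\d+$"),
    "trait": re.compile(r"^EFO_\d+$"),
}


def classify_identifier(value: str) -> str:
    """Return 'study', 'variant', 'trait', or 'gene_or_text' for a query string."""
    token = value.strip()
    for kind, pattern in _PATTERNS.items():
        if pattern.match(token):
            return kind
    # Anything else is assumed to be a gene symbol or a free-text query.
    return "gene_or_text"
```

Gene symbols have no fixed prefix, so the sketch treats them as the fallback case rather than trying to match them.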

Core Endpoints (REST API)

  • Study details: GET /studies/{GCST}
  • Variant details: GET /singleNucleotidePolymorphisms/{rsId}
  • Variant associations: GET /singleNucleotidePolymorphisms/{rsId}/associations
  • Trait associations: GET /efoTraits/{EFO}/associations
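
The endpoint paths above can be kept in one lookup table so that URL construction stays consistent across a workflow. This is a minimal sketch: the path templates are copied from the list above, while `ENDPOINTS` and `build_url` are hypothetical names introduced for illustration.

```python
from urllib.parse import quote

GWAS_REST_BASE = "https://www.ebi.ac.uk/gwas/rest/api"

# Path templates copied from the endpoint list above.
ENDPOINTS = {
    "study": "/studies/{id}",
    "variant": "/singleNucleotidePolymorphisms/{id}",
    "variant_associations": "/singleNucleotidePolymorphisms/{id}/associations",
    "trait_associations": "/efoTraits/{id}/associations",
}


def build_url(kind: str, identifier: str) -> str:
    """Build a full REST URL for one of the endpoint kinds in ENDPOINTS."""
    if kind not in ENDPOINTS:
        raise ValueError(f"unknown endpoint kind: {kind!r}")
    # Percent-encode the identifier so unexpected characters cannot alter the path.
    return GWAS_REST_BASE + ENDPOINTS[kind].format(id=quote(identifier, safe=""))
```

For example, `build_url("variant_associations", "rs7903146")` yields the associations URL for that variant.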

Pagination Strategy

  • Most list endpoints are paginated.
  • Use query parameters:
    • size: number of records per page (commonly 20–100)
    • page: zero-based page index
  • Stop conditions:
    • _embedded.associations is empty, or
    • you reach a predefined max_pages safety limit.
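
The two stop conditions above can be expressed as a small generator that is independent of the HTTP client. This is a sketch under stated assumptions: `fetch_page` is a hypothetical callable that takes a zero-based page index and returns the decoded JSON body for that page.

```python
from typing import Callable, Dict, Iterator


def iter_associations(
    fetch_page: Callable[[int], Dict], max_pages: int = 50
) -> Iterator[Dict]:
    """Yield association records page by page until a stop condition is met."""
    for page in range(max_pages):  # stop condition 2: max_pages safety limit
        data = fetch_page(page)
        records = data.get("_embedded", {}).get("associations", [])
        if not records:  # stop condition 1: empty _embedded.associations
            return
        yield from records
```

With `requests`, `fetch_page` could be something like `lambda page: requests.get(url, params={"page": page, "size": 100}, timeout=60).json()`. Keeping the fetcher injectable also makes the loop easy to test without network access.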

Significance Thresholds and Filtering

  • A common GWAS threshold is p ≤ 5×10⁻⁸ (genome-wide significance).
  • Filtering should be applied after parsing pvalue into a numeric type; handle missing or non-numeric values safely.
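
A small helper can separate parsing from filtering so that malformed values never reach the comparison. The sketch below assumes p-values arrive as numbers or numeric strings. `parse_pvalue` and `is_significant` are hypothetical names, and the inclusive comparison mirrors the p ≤ 5×10⁻⁸ threshold stated above.

```python
import math
from typing import Any, Optional

GENOME_WIDE = 5e-8  # threshold quoted above


def parse_pvalue(raw: Any) -> Optional[float]:
    """Return the p-value as a float, or None if it is missing or malformed."""
    try:
        p = float(raw)
    except (TypeError, ValueError):
        return None
    # Reject NaN and anything outside the valid probability range.
    if math.isnan(p) or not 0.0 <= p <= 1.0:
        return None
    return p


def is_significant(raw: Any, threshold: float = GENOME_WIDE) -> bool:
    """True only when the value parses cleanly and is at or below the threshold."""
    p = parse_pvalue(raw)
    return p is not None and p <= threshold
```

Strings too small for a float (for example `"1e-400"`) underflow to `0.0` in Python, so they still pass the filter, although their magnitude is lost.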

Summary Statistics Access (when available)

  • Summary Statistics API base: https://www.ebi.ac.uk/gwas/summary-statistics/api
  • Typical filters include chromosome/position ranges and p-value bounds (endpoint availability and parameters may vary by resource version).
  • For bulk downloads, the Catalog also provides an FTP directory:
    • http://ftp.ebi.ac.uk/pub/databases/gwas/summary_statistics/
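
Once a summary-statistics file has been downloaded, the region and p-value filters described above can also be applied locally. The following is a sketch only: it assumes a tab-separated file with columns named `chromosome`, `base_pair_location`, and `p_value`, which may not match every file. `read_tsv_rows` and `filter_region` are hypothetical helpers.

```python
import csv
from typing import Dict, Iterable, Iterator


def read_tsv_rows(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Parse tab-separated lines (e.g. an open file) into dict rows."""
    return csv.DictReader(lines, delimiter="\t")


def filter_region(
    rows: Iterable[Dict[str, str]],
    chromosome: str,
    start: int,
    end: int,
    p_upper: float = 5e-8,
) -> Iterator[Dict[str, str]]:
    """Yield rows on `chromosome` within [start, end] with p-value <= p_upper."""
    for row in rows:
        try:
            # Column names are assumptions; adjust them to the file's header.
            pos = int(row["base_pair_location"])
            p = float(row["p_value"])
        except (KeyError, TypeError, ValueError):
            continue  # skip rows with missing or non-numeric fields
        if str(row.get("chromosome")) == str(chromosome) and start <= pos <= end and p <= p_upper:
            yield row
```

Because both functions stream rows one at a time, `filter_region(read_tsv_rows(open(path)), ...)` avoids loading a genome-wide file into memory.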

Practical Notes for Robust Use

  • Respect the public API: add small delays between requests and cache results for iterative workflows.
  • Always interpret associations in context:
    • ancestry/cohort metadata,
    • sample size,
    • replication status,
    • effect size harmonization needs across studies.
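
The delay-and-cache advice above can be combined in one wrapper around whatever function performs the request. This is a minimal in-memory sketch. `make_polite_cached` is a hypothetical helper, `fetch` stands for any callable that maps a URL to a decoded response, and the 0.1 s default interval is an illustrative choice rather than a documented rate limit.

```python
import time
from typing import Callable, Dict


def make_polite_cached(
    fetch: Callable[[str], dict],
    min_interval: float = 0.1,
) -> Callable[[str], dict]:
    """Wrap `fetch` so repeated URLs hit a cache and new ones are spaced out."""
    cache: Dict[str, dict] = {}
    last_call = None  # monotonic timestamp of the most recent real request

    def cached_fetch(url: str) -> dict:
        nonlocal last_call
        if url in cache:
            return cache[url]  # no network call, no delay
        if last_call is not None:
            wait = min_interval - (time.monotonic() - last_call)
            if wait > 0:
                time.sleep(wait)
        result = fetch(url)
        last_call = time.monotonic()
        cache[url] = result
        return result

    return cached_fetch
```

The cache lives only for the lifetime of the process; a workflow that spans sessions would need to persist results to disk instead.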