Agent Skills

Reference Finder

AIPOCH

Automatically finds and ranks PubMed references for each sentence in scientific text; use when you need titles, DOIs, and brief recommendation reasons from the PubMed E-utilities API.

FILES
reference-finder/
  skill.md
  scripts/
    find_refs.py
  references/
    evaluation-checklist.md
91 / 100 Total Score
Core Capability: 88 / 100
  Functional Suitability: 11 / 12
  Reliability: 10 / 12
  Performance & Context: 8 / 8
  Agent Usability: 14 / 16
  Human Usability: 8 / 8
  Security: 10 / 12
  Maintainability: 10 / 12
Agent-Specific: 17 / 20
  Medical Task: 20 / 20 (Passed)
Evaluated scenarios (score, pass rate):
  97 (4/4): You have a scientific paragraph and want suggested PubMed papers for each sentence
  93 (4/4): You need top-ranked references with title, DOI, PMID, year, and a short "why recommended" explanation
  91 (4/4): Sentence-level reference matching for scientific text
  91 (4/4): Returns the top N (default: 3) most relevant PubMed records per sentence
  91 (4/4): End-to-end case for sentence-level reference matching for scientific text

SKILL.md

When to Use

  • You have a scientific paragraph and want suggested PubMed papers for each sentence.
  • You need top-ranked references with title, DOI, PMID, year, and a short "why recommended" explanation.
  • You are drafting or reviewing a manuscript and want quick literature grounding for key claims.
  • You want a lightweight reference matcher that uses only the official PubMed E-utilities API (no third-party services).
  • You need a scriptable tool for batch or CLI workflows to generate candidate citations.

Key Features

  • Sentence-level reference matching for scientific text.
  • Returns the top N (default: 3) most relevant PubMed records per sentence.
  • Outputs structured fields: title, DOI, PMID, year, recommendation reason.
  • Relevance ranking based on:
    • keyword overlap / match strength,
    • publication year preference,
    • citation-count signal (when available/derivable).
  • Safety constraints:
    • Network access restricted to eutils.ncbi.nlm.nih.gov.
    • No local filesystem writes except to outputs/ during execution.
    • Request timeout set to 30 seconds with clear error messages.
  • Supports Python API usage and CLI usage (including interactive mode).

Dependencies

  • Python 3.x (standard library only; no third-party packages required)

Example Usage

Python (direct call)

from reference_finder import find_references

text = "CRISPR-Cas9 gene editing has revolutionized biomedical research."

results = find_references(text)

for ref in results[:3]:
    print(f"- {ref['title']} ({ref['year']})")
    print(f"  DOI: {ref['doi']}")
    print(f"  PMID: {ref['pmid']}")
    print(f"  Reason: {ref['reason']}")

CLI (single input)

python scripts/find_refs.py "CRISPR-Cas9 gene editing has revolutionized biomedical research."

CLI (interactive mode)

python scripts/find_refs.py

Example output (JSON)

[
  {
    "pmid": "PMID:",
    "title": "A Programmable Dual-RNA-Guided DNA Endonuclease in Vitro",
    "doi": "10.1126/science.1225829",
    "year": 2012,
    "reason": "Highest keyword match for 'CRISPR-Cas9', foundational paper"
  }
]

Implementation Details

Data flow

  1. Sentence splitting: The input text is split into sentences (implementation-defined; typically punctuation-based).
  2. PubMed search (ESearch): For each sentence, a query is sent to:
    • https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi
  3. Record retrieval (EFetch): The top candidate PMIDs are fetched via:
    • https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi
  4. Field extraction: Title, year, PMID, and DOI (when present) are extracted from the returned metadata.
  5. Ranking and selection: Candidates are scored and the top N are returned with a short recommendation reason.
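The first three steps above can be sketched using only the standard library. This is a minimal illustration, not the skill's actual implementation: the function names, parameters, and the regex-based splitter are assumptions, and the real splitter is implementation-defined.

```python
import re
from urllib.parse import urlencode

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"
EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"

def split_sentences(text):
    """Step 1: naive punctuation-based sentence splitting."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def esearch_url(sentence, retmax=10):
    """Step 2: build the ESearch query URL for one sentence."""
    params = {"db": "pubmed", "term": sentence, "retmax": retmax, "retmode": "json"}
    return f"{ESEARCH}?{urlencode(params)}"

def efetch_url(pmids):
    """Step 3: build the EFetch URL for the candidate PMIDs."""
    params = {"db": "pubmed", "id": ",".join(pmids), "retmode": "xml"}
    return f"{EFETCH}?{urlencode(params)}"
```

Fetching and parsing the responses (steps 4-5) then extracts title, year, PMID, and DOI from the returned metadata before scoring.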

Ranking signals

  • Keyword match: Measures overlap between sentence terms and retrieved record metadata (e.g., title/abstract terms when available).
  • Publication year: Used as a preference signal (e.g., favoring more recent work unless a classic/foundational match is strong).
  • Citation count: Incorporated when available/derivable; otherwise treated as missing without failing the run.
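One way to combine the three signals is a weighted sum. The weights, field names, and decay constants below are illustrative assumptions, not the skill's actual scoring formula; note that a missing citation count simply contributes nothing rather than failing the run.

```python
def relevance_score(sentence, record, current_year=2024):
    """Score one candidate record against one sentence.
    `record` is a dict with 'title', 'year', and optional 'citations'
    (field names are hypothetical, not the skill's actual schema)."""
    sent_terms = set(sentence.lower().split())
    title_terms = set(record.get("title", "").lower().split())
    # Keyword match: fraction of sentence terms found in the record title.
    overlap = len(sent_terms & title_terms) / max(len(sent_terms), 1)
    # Publication year: mild preference for recent work, decaying over ~20 years.
    age = max(current_year - record.get("year", current_year), 0)
    recency = max(1.0 - age / 20.0, 0.0)
    # Citation count: used when available, zero-weight when missing.
    citations = record.get("citations")
    cite_bonus = min(citations / 1000.0, 1.0) if citations is not None else 0.0
    return 0.6 * overlap + 0.25 * recency + 0.15 * cite_bonus
```

Weighting keyword overlap most heavily matches the intent above: recency and citations act as tie-breakers rather than dominating a strong textual match.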

Operational constraints and safety

  • Allowed network host: eutils.ncbi.nlm.nih.gov only.
  • Prohibited: Any third-party URLs.
  • Filesystem: Do not write outside outputs/ during execution.
  • Rate limiting: Use a reasonable request cadence (e.g., ~0.5s between requests) to respect API limits.
  • Timeout: 30 seconds per request.
  • Error handling: Return semantic, user-readable error messages for network/API/parse failures.
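The host allowlist, timeout, pacing, and error-message constraints can all live in a single request wrapper. This is a sketch under the stated constraints; `safe_fetch` and its module-level pacing state are hypothetical names, not the skill's actual code.

```python
import time
import urllib.request
from urllib.parse import urlparse

ALLOWED_HOST = "eutils.ncbi.nlm.nih.gov"
REQUEST_TIMEOUT = 30   # seconds per request
MIN_INTERVAL = 0.5     # seconds between requests
_last_request = [0.0]  # monotonic timestamp of the previous request

def safe_fetch(url):
    """Fetch a URL subject to the constraints above: host allowlist,
    30 s timeout, ~0.5 s pacing, and readable error messages."""
    host = urlparse(url).hostname
    if host != ALLOWED_HOST:
        raise ValueError(f"Blocked request: {host!r} is not an allowed host "
                         f"(only {ALLOWED_HOST} is permitted)")
    # Rate limiting: sleep until at least MIN_INTERVAL has elapsed.
    wait = MIN_INTERVAL - (time.monotonic() - _last_request[0])
    if wait > 0:
        time.sleep(wait)
    _last_request[0] = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=REQUEST_TIMEOUT) as resp:
            return resp.read()
    except OSError as exc:
        raise RuntimeError(f"PubMed request failed for {url}: {exc}") from exc
```

Checking the host before any network activity means a disallowed URL fails fast with a clear message instead of ever leaving the machine.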

Defaults

  • Top references per sentence: 3
  • Endpoints:
    • ESearch: https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi
    • EFetch: https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi