Agent Skills

Citation Management

AIPOCH

Comprehensive citation management for academic research; use when you need to discover papers (Google Scholar/PubMed), extract/verify metadata (DOI/PMID/arXiv/URL), and produce validated, clean BibTeX for manuscripts.

FILES
citation-management/
  skill.md
  scripts/
    doi_to_bibtex.py
    extract_metadata.py
    format_bibtex.py
    search_google_scholar.py
    search_pubmed.py
    validate_citations.py
  references/
    bibtex_formatting.md
    citation_validation.md
    google_scholar_search.md
    metadata_extraction.md
    pubmed_search.md
Total Score: 87 / 100
Core Capability: 88 / 100
  • Functional Suitability: 11 / 12
  • Reliability: 10 / 12
  • Performance & Context: 8 / 8
  • Agent Usability: 14 / 16
  • Human Usability: 8 / 8
  • Security: 10 / 12
  • Maintainability: 10 / 12
  • Agent-Specific: 17 / 20
Medical Task: 15 / 20 Passed
  • You need to find relevant or highly cited papers on a topic using Google Scholar or PubMed (score 88, 3/4 passed)
  • You have identifiers (e.g., DOI, PMID, arXiv ID, URL) and must convert them into correct BibTeX (score 86, 3/4 passed)
  • Paper discovery (score 86, 3/4 passed)
  • Google Scholar search with year filtering, pagination, and citation-count sorting (score 86, 3/4 passed)
  • End-to-end case for Paper discovery (score 86, 3/4 passed)

SKILL.md

When to Use

  • You need to find relevant or highly cited papers on a topic using Google Scholar or PubMed.
  • You have identifiers (e.g., DOI, PMID, arXiv ID, URL) and must convert them into correct BibTeX.
  • You want to verify citation accuracy (DOI resolution, required fields, consistency with CrossRef/PubMed).
  • You need to clean, deduplicate, sort, and standardize an existing .bib file before submission.
  • You are preparing a thesis/manuscript and need a reproducible workflow from search → extraction → formatting → validation.

Key Features

  • Paper discovery
    • Google Scholar search with year filtering, pagination, and citation-count sorting.
    • PubMed search with MeSH terms, field tags, publication-type filters, and date ranges.
  • Metadata extraction
    • Resolve DOI/PMID/arXiv/URL to structured metadata via CrossRef, PubMed E-utilities, and arXiv APIs.
    • Batch processing from files containing mixed identifiers.
  • BibTeX generation & cleanup
    • Generate BibTeX entries with appropriate entry types and required fields.
    • Format, sort (key/year/author), and deduplicate BibTeX libraries.
  • Citation validation
    • DOI resolution checks and metadata cross-checking.
    • Required-field checks by entry type, syntax validation, duplicate detection, and optional auto-fix.
  • Workflow integration
    • Produces submission-ready .bib files for LaTeX/Overleaf workflows and complements literature review pipelines.

Dependencies

  • Python: 3.10+ (recommended)
  • Python packages:
    • requests>=2.31.0
    • scholarly>=1.7.11 (optional; required only for Google Scholar automation)

Example Usage

A complete, end-to-end workflow that searches, extracts metadata, formats, deduplicates, and validates a bibliography:

# 1) Search PubMed (biomedical focus)
python scripts/search_pubmed.py \
  --query '"CRISPR-Cas Systems"[MeSH] AND "Gene Editing"[MeSH]' \
  --date-start 2020-01-01 \
  --date-end 2024-12-31 \
  --limit 200 \
  --output crispr_pubmed.json

# 2) Search Google Scholar (broad coverage)
python scripts/search_google_scholar.py "CRISPR gene editing therapeutics" \
  --year-start 2020 \
  --year-end 2024 \
  --limit 100 \
  --output crispr_scholar.json

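# 3) Extract metadata from each search output (or from mixed identifiers).
#    Concatenating two JSON files with cat does not produce valid JSON,
#    so convert each file separately and merge the resulting .bib files.
python scripts/extract_metadata.py \
  --input crispr_pubmed.json \
  --output crispr_pubmed.bib
python scripts/extract_metadata.py \
  --input crispr_scholar.json \
  --output crispr_scholar.bib
cat crispr_pubmed.bib crispr_scholar.bib > combined.bib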

# 4) Add known papers by DOI (append)
python scripts/doi_to_bibtex.py 10.1038/s41586-021-03819-2 >> combined.bib
python scripts/doi_to_bibtex.py 10.1126/science.aam9317 >> combined.bib

# 5) Format + deduplicate + sort (newest first)
python scripts/format_bibtex.py combined.bib \
  --deduplicate \
  --sort year \
  --descending \
  --output formatted.bib

# 6) Validate + auto-fix common issues + emit report
python scripts/validate_citations.py formatted.bib \
  --auto-fix \
  --report validation.json \
  --output final_references.bib

# 7) Inspect validation results
cat validation.json

Implementation Details

1) Search (Discovery)

  • Google Scholar (scripts/search_google_scholar.py)

    • Supports query operators such as exact phrases ("deep learning"), author filters (author:LeCun), title-only (intitle:"neural networks"), exclusions (-survey), and year ranges.
    • Typical parameters:
      • --year-start, --year-end: constrain publication years
      • --limit: cap results
      • --sort-by citations: prioritize highly cited papers (when supported by the script)
  • PubMed (scripts/search_pubmed.py)

    • Uses NCBI E-utilities (e.g., ESearch/EFetch/ESummary) to retrieve PMIDs and metadata.
    • Typical parameters:
      • --query: supports MeSH terms, field tags, and Boolean logic
      • --date-start, --date-end: publication date filtering
      • --publication-types: e.g., Clinical Trial,Review
      • --format: JSON or BibTeX output (if supported)

(See: references/google_scholar_search.md, references/pubmed_search.md)
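
The query options above can be sketched as two small builders. This is an illustration only, not the code in scripts/search_pubmed.py or scripts/search_google_scholar.py: the function names build_esearch_url and build_scholar_query are hypothetical, and nothing is sent over the network. The ESearch parameters used (db, term, retmax, retmode, datetype, mindate, maxdate) are standard NCBI E-utilities parameters.

```python
from urllib.parse import urlencode

# Public NCBI ESearch endpoint.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(query, date_start=None, date_end=None, limit=20):
    """Return an ESearch URL for PubMed (hypothetical helper; sends no request)."""
    params = {"db": "pubmed", "term": query, "retmax": limit, "retmode": "json"}
    if date_start and date_end:
        # ESearch wants both bounds together, written as YYYY/MM/DD.
        params["datetype"] = "pdat"  # filter on publication date
        params["mindate"] = date_start.replace("-", "/")
        params["maxdate"] = date_end.replace("-", "/")
    return f"{ESEARCH_URL}?{urlencode(params)}"


def build_scholar_query(phrase=None, author=None, intitle=None, exclude=()):
    """Compose a Google Scholar query string from the operators listed above."""
    parts = []
    if phrase:
        parts.append(f'"{phrase}"')            # exact phrase
    if author:
        parts.append(f"author:{author}")       # author filter
    if intitle:
        parts.append(f'intitle:"{intitle}"')   # title-only match
    parts.extend(f"-{term}" for term in exclude)  # exclusions
    return " ".join(parts)


if __name__ == "__main__":
    print(build_esearch_url('"CRISPR-Cas Systems"[MeSH]', "2020-01-01", "2024-12-31", 200))
    print(build_scholar_query("deep learning", author="LeCun", exclude=["survey"]))
```

Building the URL separately from sending it keeps the query logic easy to inspect and test; a real script would also need rate limiting and, for NCBI, ideally an API key.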

2) Metadata Extraction (Normalization)

  • Identifier inputs: DOI, PMID, arXiv ID, URL, or mixed lists/files.
  • Primary sources:
    • CrossRef API for DOI-centric journal metadata
    • PubMed E-utilities for biomedical records (PMID/PMCID, MeSH, abstracts)
    • arXiv API for preprints and versioned records
    • DataCite API for datasets/software DOIs (if implemented/used)
  • Field mapping goals:
    • Required: author, title, year
    • Articles: journal, volume, number, pages, doi
    • Conferences: booktitle, pages
    • Preprints: repository + identifier (e.g., eprint, archivePrefix)
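
As a rough sketch of the field mapping for journal articles, the function below converts a CrossRef work record (the "message" object of a CrossRef works response) into BibTeX fields. The helper names are hypothetical and the sample record in the usage lines is made up for illustration; the actual mapping in scripts/extract_metadata.py may differ.

```python
import re


def format_author(author):
    """Format one CrossRef author object as 'Last, First'."""
    family, given = author.get("family"), author.get("given")
    if family and given:
        return f"{family}, {given}"
    if family:
        return family
    # Organizations carry only "name"; brace it so BibTeX keeps it whole.
    name = author.get("name", "")
    return f"{{{name}}}" if name else ""


def first(values):
    """CrossRef stores titles and container titles as lists."""
    return values[0] if values else ""


def crossref_to_fields(message):
    """Map a CrossRef work record to BibTeX fields, dropping empty ones."""
    parts = message.get("issued", {}).get("date-parts", [[]])
    year = str(parts[0][0]) if parts and parts[0] and parts[0][0] else ""
    authors = filter(None, (format_author(a) for a in message.get("author", [])))
    fields = {
        "author": " and ".join(authors),
        "title": first(message.get("title", [])),
        "year": year,
        "journal": first(message.get("container-title", [])),
        "volume": message.get("volume", ""),
        "number": message.get("issue", ""),
        # BibTeX page ranges use a double hyphen.
        "pages": re.sub(r"\s*[-\u2013]+\s*", "--", message.get("page", "")),
        "doi": message.get("DOI", ""),
    }
    return {name: value for name, value in fields.items() if value}


if __name__ == "__main__":
    sample = {  # made-up record, for illustration only
        "title": ["An illustrative title"],
        "author": [{"given": "Jane", "family": "Smith"}],
        "container-title": ["Journal of Examples"],
        "issued": {"date-parts": [[2021, 7, 15]]},
        "page": "123-145",
        "DOI": "10.1234/example",
    }
    print(crossref_to_fields(sample))
```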

(See: references/metadata_extraction.md)
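
Handling a mixed list of identifiers requires deciding, per line, which source to query. A minimal sketch of that routing step follows, assuming simple regular expressions are good enough; classify_identifier is a hypothetical name, and the patterns do not cover every case (old-style arXiv IDs such as hep-th/9901001, for instance, are not recognized).

```python
import re

DOI_RE = re.compile(r"^10\.\d{4,9}/\S+$")
ARXIV_RE = re.compile(r"^(arxiv:)?\d{4}\.\d{4,5}(v\d+)?$", re.IGNORECASE)
PMID_RE = re.compile(r"^(pmid:)?\d{1,8}$", re.IGNORECASE)
DOI_PREFIX_RE = re.compile(r"^(https?://(dx\.)?doi\.org/|doi:)", re.IGNORECASE)


def classify_identifier(raw):
    """Return (kind, normalized_value) for one identifier string."""
    text = raw.strip()
    # Accept bare DOIs as well as doi: and doi.org URL forms.
    doi_candidate = DOI_PREFIX_RE.sub("", text)
    if DOI_RE.match(doi_candidate):
        return ("doi", doi_candidate)
    if ARXIV_RE.match(text):
        return ("arxiv", re.sub(r"^arxiv:", "", text, flags=re.IGNORECASE))
    if PMID_RE.match(text):
        return ("pmid", re.sub(r"^pmid:", "", text, flags=re.IGNORECASE))
    if re.match(r"^https?://", text, re.IGNORECASE):
        return ("url", text)
    return ("unknown", text)


if __name__ == "__main__":
    for line in ["10.1038/s41586-021-03819-2", "arXiv:1706.03762v5",
                 "PMID:12345678", "https://example.org/paper"]:
        print(classify_identifier(line))
```

DOIs are checked first because a doi.org link would otherwise be classified as a plain URL.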

3) BibTeX Formatting (Quality & Consistency)

  • Entry types commonly produced: @article, @inproceedings, @book, @misc.
  • Formatting rules enforced/encouraged:
    • Page ranges use -- (e.g., 123--145)
    • Protect capitalization in titles with braces (e.g., {CRISPR})
    • Consistent author formatting: each name as Last, First, with multiple authors joined by "and" (e.g., Last, First and Last, First)
    • Stable citation keys (project convention; often FirstAuthorYearKeyword)

(See: references/bibtex_formatting.md)
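
The formatting rules above can be illustrated with a few small helpers. This is a sketch under stated assumptions, not the behavior of scripts/format_bibtex.py: all function names are hypothetical, the list of terms to protect is supplied by the caller, and the key scheme is one reading of the FirstAuthorYearKeyword convention (first non-stopword of the title).

```python
import re

STOPWORDS = {"a", "an", "the", "of", "on", "in", "for", "and", "with", "to"}


def normalize_pages(pages):
    """Rewrite any hyphen or dash run in a page range as '--'."""
    return re.sub(r"\s*[-\u2013\u2014]+\s*", "--", pages.strip())


def protect_terms(title, terms):
    """Wrap each listed term in braces unless it is already braced."""
    for term in terms:
        pattern = re.compile(r"(?<!\{)\b" + re.escape(term) + r"\b(?!\})")
        title = pattern.sub("{" + term + "}", title)
    return title


def make_key(authors, year, title):
    """Build a FirstAuthorYearKeyword key from 'Last, First' author strings."""
    family = re.sub(r"[^A-Za-z]", "", authors[0].split(",")[0])
    words = re.findall(r"[A-Za-z]+", title)
    keyword = next((w for w in words if w.lower() not in STOPWORDS), "")
    return f"{family}{year}{keyword.capitalize()}"


def render_entry(entry_type, key, fields):
    """Serialize one entry with brace-delimited field values."""
    lines = [f"@{entry_type}{{{key},"]
    lines += [f"  {name} = {{{value}}}," for name, value in fields.items()]
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    authors = ["Smith, Jane", "Doe, John"]  # made-up example data
    title = "A survey of CRISPR screens"
    fields = {
        "author": " and ".join(authors),
        "title": protect_terms(title, ["CRISPR"]),
        "year": "2021",
        "pages": normalize_pages("123-145"),
    }
    print(render_entry("article", make_key(authors, "2021", title), fields))
```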

4) Validation (Correctness)

Validation typically checks:

  • DOI validity: resolves via doi.org and matches CrossRef metadata.
  • Required fields: present per entry type; no empty critical fields.
  • Consistency: year format, numeric volume/issue, page-range syntax, URL accessibility.
  • Duplicates: same DOI, near-identical titles, or same author/year/title combinations.
  • BibTeX syntax: braces/quotes, commas, unique keys, special character handling.
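
A minimal sketch of the required-field and consistency checks, assuming entries have already been parsed into a type and a dict of fields. The REQUIRED table follows the standard BibTeX entry definitions, and check_entry is a hypothetical name; scripts/validate_citations.py may apply different or stricter rules.

```python
import re

# Required fields per entry type, following standard BibTeX definitions.
# @book additionally needs author OR editor, handled separately below.
REQUIRED = {
    "article": ["author", "title", "journal", "year"],
    "inproceedings": ["author", "title", "booktitle", "year"],
    "book": ["title", "publisher", "year"],
    "misc": [],
}


def check_entry(entry_type, fields):
    """Return (errors, warnings) for one parsed entry."""
    errors, warnings = [], []
    for name in REQUIRED.get(entry_type, []):
        if not fields.get(name, "").strip():
            errors.append(f"missing required field: {name}")
    if entry_type == "book" and not (fields.get("author") or fields.get("editor")):
        errors.append("missing required field: author or editor")

    year = fields.get("year", "")
    if year and not re.fullmatch(r"\d{4}", year):
        errors.append(f"year is not four digits: {year}")

    pages = fields.get("pages", "")
    if pages and not re.fullmatch(r"\w+(--\w+)?", pages):
        warnings.append(f"page range should use '--': {pages}")

    for name in ("volume", "number"):
        value = fields.get(name, "")
        if value and not value.isdigit():
            warnings.append(f"{name} is not numeric: {value}")
    return errors, warnings


if __name__ == "__main__":
    entry = {"author": "Smith, Jane", "title": "T", "year": "2021", "pages": "12-19"}
    print(check_entry("article", entry))
```

Missing fields and malformed years are treated as errors, while stylistic issues such as single-hyphen page ranges are only warnings; that split is a design choice of this sketch.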

Outputs may include a machine-readable report (e.g., JSON) with errors and warnings. (See: references/citation_validation.md)
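
Duplicate detection and a JSON report could look roughly like the following. The entry layout (dicts with key, doi, and title), the 0.95 similarity threshold, and the report shape are all assumptions made for this sketch; the real report format is described in references/citation_validation.md.

```python
import json
import re
from difflib import SequenceMatcher


def normalize_title(title):
    """Lowercase and collapse braces, punctuation, and whitespace."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def find_duplicates(entries, threshold=0.95):
    """Compare every pair of entries by DOI, then by title similarity."""
    duplicates = []
    for i, a in enumerate(entries):
        for b in entries[i + 1:]:
            keys = [a["key"], b["key"]]
            doi_a, doi_b = a.get("doi", "").lower(), b.get("doi", "").lower()
            if doi_a and doi_a == doi_b:  # DOIs are case-insensitive
                duplicates.append({"keys": keys, "reason": "same doi"})
                continue
            title_a = normalize_title(a.get("title", ""))
            title_b = normalize_title(b.get("title", ""))
            if not (title_a and title_b):
                continue  # never match on missing titles
            if SequenceMatcher(None, title_a, title_b).ratio() >= threshold:
                duplicates.append({"keys": keys, "reason": "near-identical title"})
    return duplicates


if __name__ == "__main__":
    entries = [  # made-up entries, for illustration only
        {"key": "a", "doi": "10.1234/ABC", "title": "First"},
        {"key": "b", "doi": "10.1234/abc", "title": "Different title entirely"},
        {"key": "c", "title": "{CRISPR} screens in human cells"},
        {"key": "d", "title": "CRISPR Screens in Human Cells."},
    ]
    report = {"errors": [], "warnings": find_duplicates(entries)}
    print(json.dumps(report, indent=2))
```

The pairwise comparison is quadratic in the number of entries, which is fine for typical bibliographies; very large libraries would need indexing by DOI and normalized title first.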