Agent Skills

Hmdb Database

AIPOCH

Access the Human Metabolome Database (HMDB) to search metabolites by name/structure/ID and extract chemical/biological/clinical fields when you need metabolomics research data or automated HMDB XML mining.

73
7
FILES
hmdb-database/
skill.md
scripts
hmdb_parser.py
references
hmdb_data_fields.md
90100Total Score
View Evaluation Report
Core Capability
87 / 100
Functional Suitability
11 / 12
Reliability
10 / 12
Performance & Context
8 / 8
Agent Usability
14 / 16
Human Usability
8 / 8
Security
9 / 12
Maintainability
10 / 12
Agent-Specific
17 / 20
Medical Task
20 / 20 Passed
97You need to look up a metabolite by common name (e.g., “Caffeine”) and retrieve its HMDB entry data
4/4
93You have an HMDB ID (e.g., HMDB0000001) and want to extract standardized chemical/biological/clinical fields for downstream analysis
4/4
91Search metabolites by:
4/4
91Text name
4/4
91End-to-end case for Search metabolites by:
4/4

SKILL.md

When to Use

  • You need to look up a metabolite by common name (e.g., “Caffeine”) and retrieve its HMDB entry data.
  • You have an HMDB ID (e.g., HMDB0000001) and want to extract standardized chemical/biological/clinical fields for downstream analysis.
  • You want to build a local, scriptable pipeline to mine the HMDB XML dump instead of manually browsing the website.
  • You need to map HMDB identifiers to external resources (e.g., KEGG, PubChem, ChEBI) for integration tasks.
  • You are preparing metabolomics datasets and need pathway/enzyme/transporter annotations from HMDB entries.

Key Features

  • Search metabolites by:
    • Text name
    • HMDB identifier (e.g., HMDB0000001)
    • Structure-related query (as supported by the parser/search implementation)
  • Parse the HMDB XML dataset and extract:
    • Chemical data (formula, molecular weight, InChI/SMILES where available)
    • Biological data (pathways, enzymes, transporters)
    • Clinical data (disease associations, biofluid concentrations)
  • Optional structuring of extracted results for analysis workflows (e.g., tabular outputs).
  • Supports integration workflows by exposing identifiers suitable for cross-database mapping.

Dependencies

  • Python >=3.9
  • Standard library:
    • xml.etree.ElementTree (built-in)
  • Optional:
    • pandas >= 1.5

Example Usage

1) Download HMDB XML

Download the HMDB metabolite XML dataset from:

Assume you saved it as:

data/hmdb_metabolites.xml

2) Search and Extract Fields (Runnable Example)

from scripts.hmdb_parser import HMDBParser

def main():
    # Path to the HMDB XML dump downloaded from hmdb.ca/downloads
    xml_path = "data/hmdb_metabolites.xml"

    parser = HMDBParser(xml_path)

    # Search by metabolite name (text query)
    results = parser.search("Caffeine")

    # Print basic information from the first match (structure depends on implementation)
    if not results:
        print("No results found.")
        return

    first = results[0]
    print("Top match:")
    print(first)

if __name__ == "__main__":
    main()

3) Field Reference

For a curated list of extractable fields and how they map to HMDB XML elements, see:

  • references/hmdb_data_fields.md

Implementation Details

  • Data acquisition

    • Primary workflow uses the official HMDB downloadable XML dataset (recommended for bulk parsing).
    • Single-entry lookups can be done via the HMDB website, but this skill is designed around XML parsing.
  • Parsing approach

    • The parser reads the HMDB XML and traverses metabolite entries using xml.etree.ElementTree.
    • Extracted fields should follow the definitions documented in references/hmdb_data_fields.md.
  • Search behavior

    • Name/ID search typically matches against key textual identifiers (e.g., common name, synonyms, HMDB accession).
    • Structure-based search is dependent on what structural fields are indexed/exposed by HMDBParser (e.g., SMILES/InChI).
  • Integration / cross-references

    • HMDB entries often include cross-references to external databases (e.g., KEGG, PubChem, ChEBI).
    • A common workflow is to extract these identifiers and build mapping tables for downstream joins.
  • Spectral analysis (conceptual)

    • HMDB contains NMR/MS references for some metabolites; this skill can be extended to link parsed entries to spectral metadata.
    • Actual spectral matching/identification is not guaranteed unless implemented in the codebase.