Agent Skills
Matchms
AIPOCH
Process, clean, and compare mass spectrometry (MS/MS) spectra with Matchms; use when you need reproducible spectral filtering and similarity scoring for metabolomics workflows.
Total Score: 85 / 100
Core Capability: 83 / 100
- Functional Suitability: 10 / 12
- Reliability: 10 / 12
- Performance & Context: 8 / 8
- Agent Usability: 13 / 16
- Human Usability: 7 / 8
- Security: 9 / 12
- Maintainability: 10 / 12
- Agent-Specific: 16 / 20
Medical Task: 20 / 20 Passed
- 91 (4/4): Process, clean, and compare mass spectrometry (MS/MS) spectra with Matchms; use when you need reproducible spectral filtering and similarity scoring for metabolomics workflows
- 87 (4/4): Process, clean, and compare mass spectrometry (MS/MS) spectra with Matchms; use when you need reproducible spectral filtering and similarity scoring for metabolomics workflows
- 85 (4/4): Process, clean, and compare mass spectrometry (MS/MS) spectra with Matchms
- 85 (4/4): Packaged executable path(s): scripts/similarity_pipeline.py
- 85 (4/4): End-to-end case for Scope-focused workflow aligned to: Process, clean, and compare mass spectrometry (MS/MS) spectra with Matchms; use when you need reproducible spectral filtering and similarity scoring for metabolomics workflows
SKILL.md
Matchms Skill
When to Use
- Use this skill when you need to process, clean, and compare mass spectrometry (MS/MS) spectra with Matchms, with reproducible spectral filtering and similarity scoring for metabolomics workflows.
- Use this skill when a data analytics task needs a packaged method instead of ad-hoc freeform output.
- Use this skill when the user expects a concrete deliverable, validation step, or file-based result.
- Use this skill when `scripts/similarity_pipeline.py` is the most direct path to complete the request.
- Use this skill when you need the `matchms` package behavior rather than a generic answer.
Key Features
- Scope-focused workflow aligned to: Process, clean, and compare mass spectrometry (MS/MS) spectra with Matchms; use when you need reproducible spectral filtering and similarity scoring for metabolomics workflows.
- Packaged executable path(s): `scripts/similarity_pipeline.py`.
- Reference material available in `references/` for task-specific guidance.
- Structured execution path designed to keep outputs consistent and reviewable.
Dependencies
- Python: 3.10+. Repository baseline for current packaged skills.
- Third-party packages: not explicitly version-pinned in this skill package. Add pinned versions if this skill needs stricter environment control.
Example Usage
cd "20260316/scientific-skills/Data Analytics/matchms"
python -m py_compile scripts/similarity_pipeline.py
python scripts/similarity_pipeline.py --help
Example run plan:
- Confirm the user input, output path, and any required config values.
- Edit the in-file `CONFIG` block or documented parameters if the script uses fixed settings.
- Run `python scripts/similarity_pipeline.py` with the validated inputs.
- Review the generated output and return the final artifact with any assumptions called out.
Implementation Details
- Execution model: validate the request, choose the packaged workflow, and produce a bounded deliverable.
- Input controls: confirm the source files, scope limits, output format, and acceptance criteria before running any script.
- Primary implementation surface: `scripts/similarity_pipeline.py`.
- Reference guidance: `references/` contains supporting rules, prompts, or checklists.
- Parameters to clarify first: input path, output path, scope filters, thresholds, and any domain-specific constraints.
- Output discipline: keep results reproducible, identify assumptions explicitly, and avoid undocumented side effects.
1. When to Use
Use this skill when you need to:
- Import and harmonize MS/MS spectra from common community formats (e.g., MGF/MSP) before analysis.
- Clean spectra (peak filtering, intensity normalization) to improve downstream similarity scoring and identification.
- Compute spectral similarity (Cosine/Modified Cosine/Fingerprint-based) for library matching or clustering.
- Build reproducible, configurable processing pipelines for metabolomics projects.
- Compare many spectra efficiently (all-vs-all or query-vs-library) and store/inspect score outputs.
2. Key Features
- Import/Export support: Read spectra from mzML, mzXML, MGF, MSP, and JSON (depending on installed readers).
- Filtering & harmonization: Metadata standardization, peak cleaning, intensity normalization, and other reusable filters.
- Similarity scoring:
  - Cosine similarity (Greedy/Hungarian variants)
  - Modified Cosine (accounts for precursor mass shifts)
  - Fingerprint-based similarities (when molecular fingerprints are available)
- Pipeline composition: Chain filters and scoring steps into repeatable workflows.
Additional reference material (if present in the repository):
- Filters: `references/filtering.md`
- Similarity: `references/similarity.md`
- Workflows: `references/workflows.md`
3. Dependencies
- `matchms` (version depends on your environment; pin in your project, e.g., `matchms>=0.20,<1.0`)
- `numpy` (e.g., `numpy>=1.20`)
- `scipy` (e.g., `scipy>=1.7`)
- `rdkit` (optional; required for chemistry/fingerprint-related functionality, version varies by distribution)
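To see which of these packages an environment actually provides before pinning them, a small standard-library sketch can help. The function name `installed_versions` is hypothetical, and the distribution names are assumed to match the package names listed above (RDKit in particular may be installed under a different distribution name depending on the channel).

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages=("matchms", "numpy", "scipy", "rdkit")):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            # Not installed under this distribution name.
            found[name] = None
    return found


if __name__ == "__main__":
    for name, ver in installed_versions().items():
        print(f"{name}: {ver if ver is not None else 'not installed'}")
```

The reported versions can then be copied into a pinned requirements file for the project.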
4. Example Usage
A minimal, runnable example that loads spectra from an MGF file and computes pairwise cosine scores:
from matchms import calculate_scores
from matchms.importing import load_from_mgf
from matchms.similarity import CosineGreedy


def main():
    # Load spectra from an MGF file
    spectra = list(load_from_mgf("data.mgf"))

    # Compute similarity scores (all-vs-all)
    scores = calculate_scores(
        references=spectra,
        queries=spectra,
        similarity_function=CosineGreedy(),
    )

    # Iterating yields (reference spectrum, query spectrum, score), not indices.
    # For CosineGreedy the score entry holds the cosine value and the number of
    # matched peaks; field names vary between matchms releases, so positional
    # access is used here. The "compound_name" key may be absent in your data.
    for reference, query, score in scores:
        print(
            f"ref={reference.get('compound_name')} "
            f"query={query.get('compound_name')} "
            f"cosine={score[0]:.4f} matches={score[1]}"
        )


if __name__ == "__main__":
    main()
5. Implementation Details
- Data model: Matchms operates on `Spectrum` objects containing peak m/z and intensity arrays plus metadata (e.g., precursor m/z, charge, compound name/identifier).
- Filtering stage: Typical pipelines apply filters to:
  - standardize/repair metadata fields,
  - remove noise peaks (e.g., by intensity threshold or m/z window rules),
  - normalize intensities (commonly to a maximum of 1.0 or to unit norm).
  See `references/filtering.md` for filter patterns and recommended sequences.
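The noise-removal and normalization steps of the filtering stage can be sketched in plain Python. This is an illustration of the idea only, not Matchms code: the function `clean_peaks`, its parameters, and the `(mz, intensity)` tuple representation are assumptions made for the example. In Matchms the equivalent steps are applied as filter functions to `Spectrum` objects.

```python
def clean_peaks(peaks, mz_min=0.0, mz_max=1000.0, min_relative_intensity=0.01):
    """Restrict peaks to an m/z window, scale to a maximum of 1.0, drop weak peaks.

    `peaks` is a list of (mz, intensity) tuples; all names here are hypothetical.
    """
    # Keep only peaks inside the m/z window.
    in_window = [(mz, inten) for mz, inten in peaks if mz_min <= mz <= mz_max]
    if not in_window:
        return []

    # Normalize so the most intense remaining peak has intensity 1.0.
    top = max(inten for _, inten in in_window)
    if top <= 0:
        return []
    normalized = [(mz, inten / top) for mz, inten in in_window]

    # Treat peaks below the relative-intensity threshold as noise.
    return [(mz, inten) for mz, inten in normalized if inten >= min_relative_intensity]


# Made-up peak list for illustration.
raw = [(50.0, 5.0), (100.0, 200.0), (150.5, 1.0), (1200.0, 900.0)]
cleaned = clean_peaks(raw)
print(cleaned)
```

Note that the order matters: windowing before normalization means the out-of-range peak at m/z 1200 does not set the intensity scale.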
- Cosine similarity (Greedy/Hungarian):
  - Peaks are matched within an m/z tolerance (implementation-specific defaults; configure via the similarity class parameters).
  - Greedy matching selects best available peak matches iteratively.
  - Hungarian matching solves an assignment problem to maximize total match score under one-to-one constraints.
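The difference between greedy and optimal one-to-one matching can be sketched on a tiny example. This is not the Matchms implementation: all function names are hypothetical, peaks are plain `(mz, intensity)` tuples, and a brute-force search over assignments stands in for the Hungarian algorithm purely for illustration (it only works for very small spectra).

```python
from itertools import permutations
from math import sqrt


def candidate_pairs(spec_a, spec_b, tolerance):
    """Return {(i, j): intensity product} for peaks within `tolerance` in m/z."""
    return {
        (i, j): int_a * int_b
        for i, (mz_a, int_a) in enumerate(spec_a)
        for j, (mz_b, int_b) in enumerate(spec_b)
        if abs(mz_a - mz_b) <= tolerance
    }


def normalize(total, spec_a, spec_b):
    """Divide the summed products by the product of both intensity norms."""
    norm = sqrt(sum(i * i for _, i in spec_a)) * sqrt(sum(i * i for _, i in spec_b))
    return total / norm if norm else 0.0


def cosine_greedy(spec_a, spec_b, tolerance):
    """Greedy: accept pairs in order of decreasing product, each peak used once."""
    pairs = candidate_pairs(spec_a, spec_b, tolerance)
    used_a, used_b, total = set(), set(), 0.0
    for (i, j), product in sorted(pairs.items(), key=lambda item: -item[1]):
        if i not in used_a and j not in used_b:
            used_a.add(i)
            used_b.add(j)
            total += product
    return normalize(total, spec_a, spec_b)


def cosine_optimal(spec_a, spec_b, tolerance):
    """Best one-to-one assignment by brute force (stand-in for Hungarian)."""
    pairs = candidate_pairs(spec_a, spec_b, tolerance)
    # None slots let a peak in spec_a stay unmatched.
    slots = list(range(len(spec_b))) + [None] * len(spec_a)
    best = 0.0
    for assignment in permutations(slots, len(spec_a)):
        total = sum(pairs.get((i, j), 0.0) for i, j in enumerate(assignment))
        best = max(best, total)
    return normalize(best, spec_a, spec_b)


# Made-up spectra chosen so that the two strategies disagree.
spec_a = [(100.00, 1.0), (100.09, 0.9)]
spec_b = [(100.05, 1.0), (99.95, 0.8)]
greedy_score = cosine_greedy(spec_a, spec_b, tolerance=0.06)
optimal_score = cosine_optimal(spec_a, spec_b, tolerance=0.06)
print(f"greedy={greedy_score:.3f} optimal={optimal_score:.3f}")
```

In this contrived case the greedy pass takes the strongest pair first and leaves one peak in each spectrum unmatched, while the exhaustive search pairs both peaks and scores higher. On real spectra the two variants often agree, and greedy matching is considerably cheaper.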
- Modified Cosine:
  - Extends cosine matching by allowing peak alignment with a precursor mass shift, improving matching for related compounds/adducts.
  - Typically requires precursor m/z metadata to be present and consistent.
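The shifted alignment behind Modified Cosine can be sketched as follows. This is a simplified illustration under stated assumptions, not the Matchms implementation: `modified_cosine` is a hypothetical function, peaks are `(mz, intensity)` tuples, matching is greedy, and a pair counts as a candidate if it matches either directly or after shifting by the precursor m/z difference.

```python
from math import sqrt


def modified_cosine(spec_a, spec_b, precursor_a, precursor_b, tolerance=0.1):
    """Greedy cosine score with direct and precursor-shifted peak matches.

    Returns (score, number of matched peak pairs).
    """
    shift = precursor_a - precursor_b
    candidates = []
    for i, (mz_a, int_a) in enumerate(spec_a):
        for j, (mz_b, int_b) in enumerate(spec_b):
            direct = abs(mz_a - mz_b) <= tolerance
            shifted = abs(mz_a - (mz_b + shift)) <= tolerance
            if direct or shifted:
                candidates.append((int_a * int_b, i, j))

    # Accept the highest-product pairs first; each peak is used at most once.
    used_a, used_b, total, n_matches = set(), set(), 0.0, 0
    for product, i, j in sorted(candidates, reverse=True):
        if i not in used_a and j not in used_b:
            used_a.add(i)
            used_b.add(j)
            total += product
            n_matches += 1

    norm = sqrt(sum(i * i for _, i in spec_a)) * sqrt(sum(i * i for _, i in spec_b))
    return (total / norm if norm else 0.0), n_matches


# Made-up example: spec_b resembles spec_a with one fragment shifted by +14 m/z.
spec_a = [(100.0, 1.0), (200.0, 0.5)]
spec_b = [(100.0, 1.0), (214.0, 0.5)]

with_shift = modified_cosine(spec_a, spec_b, precursor_a=300.0, precursor_b=314.0)
# Equal precursors give a zero shift, which reduces to plain cosine matching.
without_shift = modified_cosine(spec_a, spec_b, precursor_a=300.0, precursor_b=300.0)
print(with_shift, without_shift)
```

With the precursor difference taken into account, both peaks align; without it, only the unshifted fragment matches. This is also why missing or inconsistent precursor m/z metadata undermines the score.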
- Fingerprint similarity (optional):
  - Requires molecular fingerprints (often derived via RDKit) and compares spectra/compounds using fingerprint similarity metrics.
  - Use when you have structure annotations or can compute fingerprints reliably.
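As an example of a fingerprint similarity metric, the Tanimoto (Jaccard) coefficient on binary fingerprints can be sketched in a few lines. The function name and the representation of a fingerprint as a set of on-bit indices are assumptions for illustration; real fingerprints would typically come from a chemistry toolkit such as RDKit.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto (Jaccard) similarity of two fingerprints given as sets of on-bits."""
    bits_a, bits_b = set(bits_a), set(bits_b)
    union = bits_a | bits_b
    if not union:
        # Two empty fingerprints carry no information; report 0.0 by convention.
        return 0.0
    return len(bits_a & bits_b) / len(union)


# Made-up on-bit indices for two hypothetical compounds.
fingerprint_a = {1, 4, 7, 9}
fingerprint_b = {1, 4, 8}
similarity = tanimoto(fingerprint_a, fingerprint_b)
print(similarity)
```

Two bits are shared out of five distinct on-bits, so the similarity is 2 / 5.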
- Workflow reproducibility:
  - Prefer explicit, ordered filter chains and pinned dependency versions.
  - Store configuration (tolerances, normalization choices, filters used) alongside results for traceability.
  See `references/workflows.md` for pipeline organization guidance.
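One way to keep configuration next to results is to write both as JSON files in the same run directory. The sketch below uses only the standard library; the function name, file names, config keys, and result fields are all hypothetical placeholders rather than anything defined by this skill or by Matchms.

```python
import json
import tempfile
from pathlib import Path


def write_run_record(output_dir, config, results):
    """Write config.json and results.json side by side in `output_dir`."""
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    for name, payload in (("config.json", config), ("results.json", results)):
        # sort_keys keeps the files stable and diff-friendly across runs.
        (output_dir / name).write_text(json.dumps(payload, indent=2, sort_keys=True))
    return output_dir


# Placeholder settings and scores, for illustration only.
config = {
    "similarity": "CosineGreedy",
    "tolerance": 0.1,
    "filters": ["standardize_metadata", "remove_noise", "normalize_intensities"],
}
results = [{"reference": "spec_001", "query": "spec_002", "score": 0.87, "matches": 6}]

with tempfile.TemporaryDirectory() as tmp:
    run_dir = write_run_record(Path(tmp) / "run_001", config, results)
    reloaded_config = json.loads((run_dir / "config.json").read_text())
    written_files = sorted(p.name for p in run_dir.iterdir())

print(written_files, reloaded_config == config)
```

Recording the filter order as a list preserves the sequence in which filters were applied, which matters because filter chains are order-dependent.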