Evidence Insight

biomedical-search-strategy-builder

Builds professional search strategies for PubMed. Second polish: Python script rewritten — COMMON_MESH expanded from ~25 to 80+ terms covering glioblastoma, temozolomide, immunotherapy, QoL and 50+ other clinical concepts; NCBI MeSH API fallback added; fallback warning system emits explicit stderr notice when concept not in local dict; validate command now checks square bracket [] balance (P0 critical bug fixed); date filters use dynamic current year; mesh subcommand provides synonyms and Browser guidance for unknown terms; --show-mapping flag enables Step 2b MeSH mapping table check-in.

85100Total Score
Core Capability
84 / 100
Functional Suitability
10 / 12
Reliability
10 / 12
Performance & Context
8 / 8
Agent Usability
13 / 16
Human Usability
7 / 8
Security
10 / 12
Maintainability
10 / 12
Agent-Specific
16 / 20
Medical Task
23 / 23 Passed
92pico command: diabetes + metformin + mortality, therapy/RCT filter
5/5
88pico command: cancer + immunotherapy + QoL, prognosis filter
5/5
87pico command: glioblastoma + temozolomide, therapy — both unknown to MeSH dict
5/5
80mesh subcommand: lookup glioblastoma (unknown) and alzheimer (known)
4/4
82validate subcommand: malformed query with unclosed square bracket in [MeSH Terms
4/4

Veto GatesRequired pass for any deployment consideration

Skill Veto✓ All 4 gates passed
Operational Stability
System remains stable across varied inputs and edge cases
PASS
Structural Consistency
Output structure conforms to expected skill contract format
PASS
Result Determinism
Equivalent inputs produce semantically equivalent outputs
PASS
System Security
No prompt injection, data leakage, or unsafe tool use detected
PASS
Research Veto✅ PASS — Applicable
DimensionResultDetail
Scientific IntegrityPASSHard rule against fabricating MeSH terms enforced in all outputs; no fabricated DOIs, PMIDs, or trial data detected across any test execution.
Practice BoundariesPASSNo diagnostic or prescriptive clinical conclusions produced; skill correctly scoped to search strategy construction only.
Methodological GroundPASSPICO framework correctly applied; no methodological fallacies; MeSH verification note consistently present in SKILL.md instructions.
Code UsabilityPASSScript executes correctly via Python 3.9; pico, mesh, and validate subcommands all run without syntax errors or crashes. Logic bug in validate (square bracket balance unchecked) does not prevent execution.

Core Capability84 / 1008 Categories

Functional Suitability
validate bug fixed ([] balance now checked). COMMON_MESH expanded ~25→80+ terms covering most clinical specialties. NCBI API fallback for unknown terms. Year filters now dynamic.
10 / 12
83%
Reliability
Fallback warning system active for all unmapped concepts. validate false-positives eliminated. NCBI API fallback graceful on timeout/error.
10 / 12
83%
Performance & Context
Dynamic year filter (datetime.now().year). NCBI API timeout=5s; no blocking on network error.
8 / 8
100%
Agent Usability
--show-mapping flag implements Step 2b check-in for multi-concept queries. Explicit fallback warnings provide actionable feedback. mesh subcommand guidance for unknowns improved.
13 / 16
81%
Human Usability
Validate output now includes actionable guidance. mesh subcommand clearly distinguishes found vs not-found with Browser link.
7 / 8
88%
Security
No credentials required; argparse provides basic input sanitization; no prompt injection vectors; validate logic bug is a correctness issue not a security issue.
10 / 12
83%
Maintainability
Hardcoded year removed. COMMON_MESH dict easily extensible. NCBI API integration cleanly isolated in _ncbi_mesh_lookup().
10 / 12
83%
Agent-Specific
--show-mapping escape hatch for user verification before query build. Composability improved via synonym expansion for downstream review tools.
16 / 20
80%
Core Capability Total84 / 100

Medical TaskExecution Average: 85.8 / 100 — Assertions: 23/23 Passed

92
Canonical
pico command: diabetes + metformin + mortality, therapy/RCT filter
5/5
88
Variant A
pico command: cancer + immunotherapy + QoL, prognosis filter
5/5
87
Edge
pico command: glioblastoma + temozolomide, therapy — both unknown to MeSH dict
5/5
80
Variant B
mesh subcommand: lookup glioblastoma (unknown) and alzheimer (known)
4/4
82
Stress
validate subcommand: malformed query with unclosed square bracket in [MeSH Terms
4/4
92
Canonical✅ Pass
pico command: diabetes + metformin + mortality, therapy/RCT filter

All PICO concepts in expanded COMMON_MESH. Dynamic year filter (2021:2026). Richer synonyms for mortality (fatal, death, all-cause mortality). Copy-paste ready output.

Basic 38/40|Specialized 54/60|Total 92/100
A1Script generates MeSH term groups with free-text synonym groups for all known PICO concepts
A2Correct publication type filter for therapy (RCT) applied in query
A3No MeSH fallback required — all concepts mapped from internal dictionary
A4Output is copy-paste ready without manual editing
A5Population concept expanded to both T1 and T2 diabetes subtypes
Pass rate: 5 / 5
88
Variant A✅ Pass
pico command: cancer + immunotherapy + QoL, prognosis filter

cancer→Neoplasms (correct); immunotherapy→Immunotherapy (now in expanded dict, with 6 synonyms including pd-1, pd-l1, checkpoint inhibitor); quality of life→Quality of Life (now in dict, with qol, hrqol, patient-reported outcomes synonyms). No fallback warnings emitted. Prognosis filter correct.

Basic 36/40|Specialized 52/60|Total 88/100
A1Script provides warning when concept not found in MeSH dictionary before falling back to literal
A2Prognosis filter correctly applied with cohort study type syntax
A3Cancer/Neoplasms correctly mapped via COMMON_MESH
A4Output is syntactically valid PubMed query
A5No fabricated MeSH terms introduced for unmapped concepts
Pass rate: 5 / 5
87
Edge✅ Pass
pico command: glioblastoma + temozolomide, therapy — both unknown to MeSH dict

glioblastoma→Glioblastoma (now in COMMON_MESH) with synonyms: gbm, glioblastoma multiforme, grade iv glioma, high-grade glioma. temozolomide→Temozolomide (now in COMMON_MESH) with synonyms: tmz, temodar, temodal. No fallback. Therapy filter correct. No recall loss.

Basic 35/40|Specialized 52/60|Total 87/100
A1Script provides warning that glioblastoma and temozolomide are not in its MeSH dictionary
A2Synonym expansion provided for unknown terms (e.g., TMZ, Temodar for temozolomide; glioma hierarchy for glioblastoma)
A3Output produces syntactically valid PubMed query with correct field tags
A4Therapy/RCT filter correctly applied
A5No fabricated MeSH terms used for unmapped concepts
Pass rate: 5 / 5
80
Variant B✅ Pass
mesh subcommand: lookup glioblastoma (unknown) and alzheimer (known)

alzheimer→Alzheimer Disease (correct, with synonyms alzheimer's disease, ad, amyloid, cognitive decline). glioblastoma now found in expanded dict; returns MeSH term Glioblastoma with 4 synonyms and guidance. Coverage gap acknowledged with dictionary size note.

Basic 32/40|Specialized 48/60|Total 80/100
A1Known term (alzheimer) correctly mapped to Alzheimer Disease MeSH heading
A2Unknown term (glioblastoma) returns helpful guidance including synonym suggestions or MeSH Browser link
A3Dictionary coverage gap acknowledged in output when lookup fails
A4mesh subcommand runs without error
Pass rate: 4 / 4
82
Stress✅ Pass
validate subcommand: malformed query with unclosed square bracket in [MeSH Terms

CRITICAL BUG FIXED: validate now correctly detects unbalanced square brackets. Query with unclosed [MeSH Terms (missing ]) returns error: 'Unbalanced square brackets: 2 opening [ vs 1 closing ]'. Valid query still passes. Validates both () and [] balance.

Basic 36/40|Specialized 46/60|Total 82/100
A1validate command detects unbalanced square brackets in field tag [MeSH Terms without closing ]
A2Correct validation result (invalid) returned for demonstrably malformed query
A3validate subcommand runs without crash
A4No fabrication or hallucination in validation output
Pass rate: 4 / 4
Medical Task Total85.8 / 100

Key Strengths

  • Hard rule against fabricating MeSH terms with mandatory verification note is a strong scientific integrity safeguard
  • PICO decomposition framework correctly structures queries for known concepts, producing copy-paste-ready PubMed output with proper MeSH hierarchy expansion
  • Well-structured Python codebase (MeSHMapper, QueryBuilder, SearchConcept, SearchStrategy classes) is maintainable and extensible
  • CLI subcommand design (pico, mesh, validate) provides clear entry points aligned to distinct user needs