pubmed-search-specialist
Veto Gates: required pass for any deployment consideration
| Dimension | Result | Detail |
|---|---|---|
| Scientific Integrity | PASS | The archived evaluation kept the skill tied to retrieved records or indexed source material rather than invented scientific claims. |
| Practice Boundaries | PASS | The legacy review kept this workflow on the evidence-access side of the boundary, not the advice-giving side. |
| Methodological Ground | PASS | The older review treated the package logic as methodologically aligned with its stated workflow. |
| Code Usability | PASS | The search or lookup workflow still exposed a usable entrypoint and a clear output expectation. |
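The Code Usability gate above checks that a documented entrypoint can be invoked and its output consumed. A minimal smoke-test sketch of that idea, assuming (hypothetically) that the entrypoint accepts a `--query` flag and prints JSON; the flag name and output contract are illustrative assumptions, not the package's documented interface:

```python
import json
import subprocess
import sys

def smoke_test(entrypoint="scripts/main.py", query="metformin[tiab]"):
    """Invoke a packaged entrypoint and verify it emits parseable JSON.

    The --query flag and the JSON output contract are assumptions for
    illustration; consult the package's own documentation for the real
    interface. Returns (ok, detail) rather than raising, so a harness
    can record the failure reason per input.
    """
    cmd = [sys.executable, entrypoint, "--query", query]
    result = subprocess.run(cmd, capture_output=True, text=True, timeout=60)
    if result.returncode != 0:
        return False, result.stderr.strip()
    try:
        payload = json.loads(result.stdout)
    except json.JSONDecodeError as exc:
        return False, f"non-JSON output: {exc}"
    return True, payload
```

A harness would call `smoke_test()` once per recorded input and log the `(ok, detail)` pair alongside the assertion results.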
Core Capability: 88 / 100 (8 categories)
Medical Task Execution Average: 83.6 / 100 (assertions: 18/20 passed)
The scenario "Build complex Boolean query strings for precise PubMed/MEDLINE..." completed within the documented boundary of "Build complex Boolean query strings for precise PubMed/MEDLINE literature retrieval...".
The archived evaluation treated "Use this skill for evidence insight tasks that require explicit..." as a clean in-scope run.
The "Build complex Boolean query strings for precise PubMed/MEDLINE..." path verified the packaged helper command without exposing any deeper execution issue.
The scenario "Packaged executable path(s): scripts/main.py" completed within the documented boundary of "Build complex Boolean query strings for precise PubMed/MEDLINE literature retrieval...".
This stress case was mostly intact, but the archived review centered its concern on whether "the output stays within declared skill scope and target objective."
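The scenarios above exercise Boolean query construction for PubMed/MEDLINE. A minimal sketch of what such a builder might look like; the helper name is hypothetical and is not the package's `scripts/main.py` logic, though the `[tiab]` and `[MeSH Terms]` field tags are real PubMed search tags:

```python
def build_pubmed_query(groups, joiner="AND"):
    """Combine term groups into a PubMed Boolean query string.

    Each group is a list of (term, field_tag) pairs OR'd together
    (e.g. free-text synonyms plus a MeSH heading); groups are then
    joined with the top-level Boolean operator, AND by default.
    Multi-word terms are quoted so PubMed treats them as phrases.
    """
    clauses = []
    for group in groups:
        terms = [
            f'"{term}"[{tag}]' if " " in term else f"{term}[{tag}]"
            for term, tag in group
        ]
        clauses.append("(" + " OR ".join(terms) + ")")
    return f" {joiner} ".join(clauses)

query = build_pubmed_query([
    [("metformin", "tiab"), ("Metformin", "MeSH Terms")],
    [("type 2 diabetes", "tiab"), ("Diabetes Mellitus, Type 2", "MeSH Terms")],
])
print(query)
# → (metformin[tiab] OR Metformin[MeSH Terms]) AND ("type 2 diabetes"[tiab] OR "Diabetes Mellitus, Type 2"[MeSH Terms])
```

Grouping synonyms with OR inside parentheses and joining concepts with AND is the standard shape of a precise MEDLINE retrieval query.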
Key Strengths
- Primary routing is Evidence Insight with execution mode B
- Static quality score is 88/100 and dynamic average is 83.6/100
- Assertions and command execution outcomes are recorded per input for human review
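Since per-input assertion outcomes are recorded and rolled up into the dynamic average, a sketch of that rollup may clarify how the two headline numbers relate. The function and the sample figures below are illustrative only, not the archived evaluation's raw data:

```python
def rollup(scenarios):
    """Average per-scenario scores and tally assertion pass counts.

    Each scenario dict carries its own score out of 100 plus the
    number of assertions passed and attempted for that input.
    """
    avg = sum(s["score"] for s in scenarios) / len(scenarios)
    passed = sum(s["assertions_passed"] for s in scenarios)
    total = sum(s["assertions_total"] for s in scenarios)
    return avg, passed, total

# Illustrative numbers only, not the archived evaluation's records.
scenarios = [
    {"score": 90, "assertions_passed": 5, "assertions_total": 5},
    {"score": 78, "assertions_passed": 4, "assertions_total": 5},
]
avg, passed, total = rollup(scenarios)
print(f"{avg:.1f} / 100, assertions {passed}/{total}")
# → 84.0 / 100, assertions 9/10
```

Keeping the raw per-input records alongside the aggregate, as the review notes, is what lets a human auditor trace a headline figure like 83.6 back to individual runs.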