reference-finder
Veto Gates: required pass for any deployment consideration
| Dimension | Result | Detail |
|---|---|---|
| Scientific Integrity | PASS | The archived evaluation did not indicate fabricated or unsupported scientific claims in reference-finder. |
| Practice Boundaries | PASS | Practice boundaries held because the package remained focused on source handling, lookup, or structured evidence use. |
| Methodological Ground | PASS | Methodological grounding was preserved through the documented inputs, transformations, and expected artifacts. |
| Code Usability | PASS | The legacy audit did not record a code-usability failure in the packaged analysis path. |
Core Capability: 88 / 100 — 8 Categories
Medical Task Execution Average: 92.6 / 100 — Assertions: 20/20 Passed
The archived run treated "You have a scientific paragraph and want suggested PubMed papers..." as a bounded extraction workflow, keeping attention on source fields, fallback logic, and output shape.
This variant case stayed focused on extracting and normalizing evidence from the provided records instead of drifting into unsupported interpretation.
This edge case stayed within the packaged analysis boundary and kept a reviewable task contract.
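The cases above describe a bounded extraction workflow that normalizes evidence records with fallback logic and a fixed output shape. A minimal sketch of that pattern, assuming illustrative field names (`title`, `article_title`, `pmid`, `year` are hypothetical, not the package's actual schema):

```python
def normalize_record(raw: dict) -> dict:
    """Map a raw PubMed-style record to a fixed output shape.

    Field names here are illustrative assumptions; the point is the
    fallback logic (try alternate keys, degrade to a placeholder) and
    the stable output contract.
    """
    title = raw.get("title") or raw.get("article_title") or "(untitled)"
    pmid = str(raw.get("pmid", "")) or None  # missing pmid -> None, not ""
    year = raw.get("year")  # optional field, passed through as-is
    return {"pmid": pmid, "title": title, "year": year}


records = [
    {"article_title": "Example study", "pmid": 12345},
    {"title": "Second study", "year": 2020},
]
normalized = [normalize_record(r) for r in records]
```

Because every record is forced into the same three-key shape, downstream assertions can check the output contract without knowing which fallback fired.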
The archived run treated "Returns the top N (default: 3) most relevant PubMed records per sentence" as a bounded extraction workflow, keeping attention on source fields, fallback logic, and output shape.
The end-to-end case for "Sentence-level reference matching for..." remained tied to the documented analysis contract, even though the preserved evidence centered on instructions rather than a full rerun.
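The contract described above, returning the top N (default: 3) most relevant records per sentence, can be sketched with a toy relevance score; the word-overlap scoring here is an assumption for illustration, not the package's actual ranking method:

```python
def score(sentence: str, record: dict) -> int:
    """Toy relevance score: number of shared lowercase word tokens."""
    s_words = set(sentence.lower().split())
    r_words = set(record["title"].lower().split())
    return len(s_words & r_words)


def top_n_per_sentence(sentences, records, n=3):
    """Return the top-n records for each sentence, best score first."""
    return {
        s: sorted(records, key=lambda r: score(s, r), reverse=True)[:n]
        for s in sentences
    }


records = [
    {"title": "gene expression in mice"},
    {"title": "protein folding"},
    {"title": "mice behavior study"},
    {"title": "quantum dots"},
]
ranked = top_n_per_sentence(["expression of genes in mice"], records)
```

The shape of the result, a list of at most `n` records per sentence, is what the audited assertions can verify regardless of how relevance is actually computed.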
Key Strengths
- Primary routing is Evidence Insight with execution mode B
- Static quality score is 88/100 and dynamic average is 79.6/100
- Assertions and command execution outcomes are recorded per input for human review
- Execution verification summary: Script verification 1/1; adjustment=5. find_refs.py: OK
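The strengths list notes that assertion and execution outcomes are recorded per input for human review. A minimal sketch of such a per-input record, assuming a hypothetical structure (the audit does not specify the actual format):

```python
def run_assertions(input_id: str, checks: dict) -> dict:
    """Run named boolean checks for one input and record pass/fail.

    `checks` maps a check name to a zero-argument callable; the returned
    record keeps per-check results alongside a summary, so a reviewer
    can see exactly which assertion failed for which input.
    """
    results = {name: bool(fn()) for name, fn in checks.items()}
    return {
        "input": input_id,
        "results": results,
        "passed": sum(results.values()),
        "total": len(results),
    }


record = run_assertions("case-1", {
    "output_is_list": lambda: isinstance([1, 2], list),
    "nonempty": lambda: len([1, 2]) > 0,
})
```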