quality-assessment
Automates critical appraisal and quality assessment for research papers by analyzing text against established methodological standards (such as risk of bias tools, quality checklists, or reporting guidelines) and synthesizing a structured evaluation report. Use when you need to assess the methodological quality, internal validity, or reporting completeness of any type of study—including RCTs, observational studies, systematic reviews, qualitative research, or diagnostic accuracy studies.
Veto Gates
Required pass for any deployment consideration
| Dimension | Result | Detail |
|---|---|---|
| Scientific Integrity | PASS | Scientific integrity held because extraction and analysis outputs stayed tied to provided text, metadata, or runtime evidence rather than invented study findings. |
| Practice Boundaries | PASS | The evaluated outputs stayed inside the skill's declared scope ("Automates critical appraisal and quality assessment for research papers by analyzing text...") and did not drift into unsupported interpretation beyond the available inputs. |
| Methodological Ground | PASS | The workflow stayed grounded in its declared rubric or scale-selection logic rather than improvised criteria. |
| Code Usability | PASS | No code-usability failure was recorded for quality-assessment in the legacy evaluation. |
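The gating rule implied by the table above (every dimension must pass before deployment is considered) can be sketched as follows. This is a minimal illustration, not the evaluator's actual implementation: the `GateResult` type and the `passes_veto_gates` and `failed_gates` helpers are hypothetical names introduced here, and the only data used is the four PASS results reported in the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GateResult:
    """One row of the veto-gate table (hypothetical structure)."""
    dimension: str
    result: str  # "PASS" or "FAIL"


def failed_gates(gates):
    """Return the dimensions whose result is anything other than PASS."""
    return [g.dimension for g in gates if g.result != "PASS"]


def passes_veto_gates(gates):
    """A single non-PASS gate vetoes deployment consideration."""
    return not failed_gates(gates)


# Results as reported in the table above.
reported = [
    GateResult("Scientific Integrity", "PASS"),
    GateResult("Practice Boundaries", "PASS"),
    GateResult("Methodological Ground", "PASS"),
    GateResult("Code Usability", "PASS"),
]

print(passes_veto_gates(reported))
```

Treating the gates as a strict conjunction, rather than averaging them into the score, keeps one failed dimension from being masked by strong results elsewhere.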
Core Capability
88 / 100, 8 Categories
Medical Task
Execution Average: 86.2 / 100 (Assertions: 15/20 Passed)
- Canonical case: mostly intact. Archived review note: the script execution path completed successfully for the documented case.
- Variant A run: archived review note: the script execution path completed successfully for the documented case.
- Edge run: archived review note: the script execution path completed successfully for the documented case.
- Variant B case: mostly intact. Archived review note: the script execution path completed successfully for the documented case.
- Stress run: archived review note: the script execution path completed successfully for the documented case.
Key Strengths
- Primary routing is Data Analysis with execution mode B
- Static quality score is 88/100 and dynamic average is 72.6/100
- Assertions and command execution outcomes are recorded per input for human review
- Execution verification summary: Script verification 1/1; adjustment=5. extract_pdf.py: OK