protocol-deviation-classifier
Determine whether an incident in a clinical trial is a "major deviation."
Veto Gates
Required pass for any deployment consideration
| Dimension | Result | Detail |
|---|---|---|
| Scientific Integrity | PASS | The archived review kept this workflow anchored to supplied data fields and observable execution behavior, not fabricated results. |
| Practice Boundaries | PASS | Practice boundaries held because the package remained focused on determining whether an incident in a clinical trial is a "major deviation" rather than overclaiming what the records supported. |
| Methodological Ground | PASS | Methodological grounding was preserved through the documented inputs, transformations, and expected artifacts. |
| Code Usability | PASS | The legacy audit did not record a code-usability failure in the packaged analysis path. |
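
The gate rule above ("required pass for any deployment consideration") can be sketched as an all-or-nothing check. This is a minimal illustration only: the `GateResult` type and the `deployment_eligible` and `failed_gates` helpers are hypothetical names, not part of the packaged skill, and the sketch assumes that a single FAIL on any dimension blocks consideration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class GateResult:
    """One row of the veto-gate table (hypothetical structure)."""
    dimension: str
    passed: bool


def deployment_eligible(gates: List[GateResult]) -> bool:
    """Return True only if at least one gate exists and every gate passed."""
    return bool(gates) and all(g.passed for g in gates)


def failed_gates(gates: List[GateResult]) -> List[str]:
    """List the dimensions that would veto deployment consideration."""
    return [g.dimension for g in gates if not g.passed]


# Results as reported in the table above.
GATES = [
    GateResult("Scientific Integrity", True),
    GateResult("Practice Boundaries", True),
    GateResult("Methodological Ground", True),
    GateResult("Code Usability", True),
]

if __name__ == "__main__":
    print("eligible:", deployment_eligible(GATES))
    print("vetoed by:", failed_gates(GATES))
```

Treating an empty gate list as not eligible is a design choice in this sketch, so that a missing audit cannot pass by default.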
Core Capability
87 / 100, 8 Categories
Medical Task
Execution Average: 84.2 / 100; Assertions: 17/20 Passed
The "major deviation" determination scenario completed within the skill's documented scope.
The archived evaluation treated the case "Use this skill for data analysis tasks that require explicit..." as a clean in-scope run.
The archived run for the "major deviation" determination confirmed the helper entrypoint and left the workflow in a stable state.
The archived evidence for the packaged executable path, scripts/main.py, shows a real execution snag, though not one that broke the workflow contract.
The main finding recorded for this stress run was that the output stays within the declared skill scope and target objective.
Key Strengths
- Primary routing is Data Analysis with execution mode B
- Static quality score is 87/100 and dynamic (execution) average is 84.2/100
- Assertions and command execution outcomes are recorded per input for human review