Data Analysis

diagnostic-study-quality-assessment-quadas-2

Total Score: 86 / 100
Core Capability: 81 / 100
Functional Suitability: 10 / 12
Reliability: 9 / 12
Performance & Context: 8 / 8
Agent Usability: 13 / 16
Human Usability: 7 / 8
Security: 9 / 12
Maintainability: 9 / 12
Agent-Specific: 16 / 20
Medical Task: 20 / 20 Passed
Canonical: 93 / 100, assertions 4/4
Variant A: 89 / 100, assertions 4/4
Edge: 87 / 100, assertions 4/4
Variant B: 87 / 100, assertions 4/4
Stress: 87 / 100, assertions 4/4

Veto Gates: Required pass for any deployment consideration

Skill Veto: ✓ All 4 gates passed
Operational Stability
System remains stable across varied inputs and edge cases
PASS
Structural Consistency
Output structure conforms to expected skill contract format
PASS
Result Determinism
Equivalent inputs produce semantically equivalent outputs
PASS
System Security
No prompt injection, data leakage, or unsafe tool use detected
PASS
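The gate list above can be read as an all-or-nothing check. The snippet below is a minimal sketch of that reading, not the evaluator's actual code: the gate names and results are copied from the report, while the `veto_cleared` helper and the rule that every gate must report PASS are assumptions drawn from the "Required pass" wording.

```python
# Gate names and results are copied from the Skill Veto list in this report.
# The all-must-pass rule is an assumption based on the "Required pass" wording.
skill_veto_gates = {
    "Operational Stability": "PASS",
    "Structural Consistency": "PASS",
    "Result Determinism": "PASS",
    "System Security": "PASS",
}


def veto_cleared(gates: dict[str, str]) -> bool:
    """Return True only if every gate reports PASS (hypothetical helper)."""
    return all(result == "PASS" for result in gates.values())


# Collect any gate that did not pass so a reviewer can see what blocked deployment.
failed = [name for name, result in skill_veto_gates.items() if result != "PASS"]

print(f"{len(skill_veto_gates) - len(failed)} of {len(skill_veto_gates)} gates passed")
print("veto cleared:", veto_cleared(skill_veto_gates))
```

Under this reading, a single non-PASS gate would block deployment regardless of the numeric scores.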
Research Veto: ✅ PASS (Applicable)
Dimension | Result | Detail
Scientific Integrity | PASS | Extraction and analysis outputs stayed tied to the provided text, metadata, or runtime evidence rather than invented study findings.
Practice Boundaries | PASS | The archived review kept this package within its stated scope (analyzing clinical diagnostic accuracy studies for bias using the QUADAS-2 tool) rather than freeform inference detached from source data.
Methodological Ground | PASS | The package kept its judgments tied to explicit rubric logic.
Code Usability | PASS | The legacy audit did not flag code-usability issues for the packaged diagnostic-study-quality-assessment-quadas-2 workflow.
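To illustrate what "explicit rubric logic" can mean for QUADAS-2, here is a hedged sketch. QUADAS-2 rates risk of bias in four domains (Patient Selection, Index Test, Reference Standard, Flow and Timing) and applicability concerns in the first three. The class, the function name, and the example ratings are all hypothetical; the report does not show how the packaged scripts implement this.

```python
from dataclasses import dataclass
from typing import Optional

VALID = {"low", "high", "unclear"}


@dataclass(frozen=True)
class DomainJudgment:
    """One QUADAS-2 domain rating (hypothetical structure for illustration)."""
    domain: str
    risk_of_bias: str
    applicability: Optional[str]  # Flow and Timing has no applicability rating


def overall(ratings: list[str]) -> str:
    """Summarize ratings: 'low' only if every domain is low, else 'at risk'."""
    unknown = [r for r in ratings if r not in VALID]
    if unknown:
        raise ValueError(f"unexpected rating(s): {unknown}")
    return "low" if all(r == "low" for r in ratings) else "at risk"


# Illustrative ratings only; they are not taken from any study in this report.
example_study = [
    DomainJudgment("Patient Selection", "low", "low"),
    DomainJudgment("Index Test", "unclear", "low"),
    DomainJudgment("Reference Standard", "low", "low"),
    DomainJudgment("Flow and Timing", "low", None),
]

bias = overall([d.risk_of_bias for d in example_study])
applicability = overall(
    [d.applicability for d in example_study if d.applicability is not None]
)
print("risk of bias:", bias)
print("applicability:", applicability)
```

The summary rule follows the usual QUADAS-2 guidance that one "high" or "unclear" domain is enough to flag a study, which keeps each overall judgment traceable to a specific domain.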

Core Capability: 81 / 100 (8 Categories)

Functional Suitability
Functional suitability lost points to the legacy issue 'Improve stress-case output rigor': stress and boundary scenarios show weaker consistency.
10 / 12
83%
Reliability
The archived reliability deduction traces back to the legacy issue 'Improve stress-case output rigor': stress and boundary scenarios show weaker consistency.
9 / 12
75%
Performance & Context
Performance & Context reached full score in the archived evaluation.
8 / 8
100%
Agent Usability
The packaged analysis path is understandable, though the archived score suggests slightly clearer routing would help.
13 / 16
81%
Human Usability
The package is readable overall, though the archived review still left a small human-usability gap.
7 / 8
88%
Security
The packaged workflow stayed safe overall, with only a small remaining deduction around boundary signaling.
9 / 12
75%
Maintainability
The archived review treated the package as maintainable, while still preserving some room for cleanup.
9 / 12
75%
Agent-Specific
Related legacy finding for diagnostic-study-quality-assessment-quadas-2: 'Improve stress-case output rigor'. Stress and boundary scenarios show weaker consistency.
16 / 20
80%
Core Capability Total: 81 / 100
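The category scores can be cross-checked against the reported total with a few lines of arithmetic. This is a sketch for verification only; the scores and maxima are copied from the eight categories above, and plain summation is assumed as the aggregation rule because the figures add up that way.

```python
# (score, maximum) pairs copied from the eight Core Capability categories.
categories = {
    "Functional Suitability": (10, 12),
    "Reliability": (9, 12),
    "Performance & Context": (8, 8),
    "Agent Usability": (13, 16),
    "Human Usability": (7, 8),
    "Security": (9, 12),
    "Maintainability": (9, 12),
    "Agent-Specific": (16, 20),
}

# Per-category percentage, rounded to a whole percent as in the report.
for name, (score, maximum) in categories.items():
    print(f"{name}: {score} / {maximum} = {score / maximum:.0%}")

total = sum(score for score, _ in categories.values())
maximum_total = sum(maximum for _, maximum in categories.values())
print(f"Core Capability Total: {total} / {maximum_total}")
```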

Medical Task: Execution Average 88.6 / 100, Assertions 20/20 Passed

Canonical ✅ Pass (93 / 100)
Analyzes clinical diagnostic accuracy studies for bias using the QUADAS-2 tool. Use when Claude needs to assess the quality, risk of bias, or applicability of diagnostic accuracy studies (e.g., "Assess this paper using QUADAS-2")

The archived run treated "Analyzes clinical diagnostic accuracy studies for bias using the..." as a bounded analysis workflow rather than a purely narrative instruction path.

Basic 35/40 | Specialized 58/60 | Total 93/100
A1: The diagnostic-study-quality-assessment-quadas-2 output structure matches the documented deliverable
A2: The instruction path remains actionable for the documented case
A3: The output stays fully within the documented skill boundary
A4: The response quality is acceptable for the documented path
Pass rate: 4 / 4
Variant A ✅ Pass (89 / 100)
Analyzes clinical diagnostic accuracy studies for bias using the QUADAS-2 tool. Use when Claude needs to assess the quality, risk of bias, or applicability of diagnostic accuracy studies (e.g., "Assess this paper using QUADAS-2")

This Variant A case stayed within the packaged analysis boundary and kept a reviewable task contract.

Basic 33/40 | Specialized 56/60 | Total 89/100
A1: The diagnostic-study-quality-assessment-quadas-2 output structure matches the documented deliverable
A2: The instruction path remains actionable for the documented case
A3: The output stays fully within the documented skill boundary
A4: The response quality is acceptable for the documented path
Pass rate: 4 / 4
Edge ✅ Pass (87 / 100)
Analyzes clinical diagnostic accuracy studies for bias using the QUADAS-2 tool

This edge case stayed within the packaged analysis boundary and kept a reviewable task contract.

Basic 32/40 | Specialized 55/60 | Total 87/100
A1: The diagnostic-study-quality-assessment-quadas-2 output structure matches the documented deliverable
A2: The instruction path remains actionable for the documented case
A3: The output stays fully within the documented skill boundary
A4: The response quality is acceptable for the documented path
Pass rate: 4 / 4
Variant B ✅ Pass (87 / 100)
Packaged executable path(s): scripts/pdf_extractor.py plus 1 additional script(s)

The "Packaged executable path(s): scripts/pdf_extractor.py plus 1..." case remained an analysis-style extraction path whose value came from structured data capture rather than a freeform narrative response.

Basic 31/40 | Specialized 56/60 | Total 87/100
A1: The diagnostic-study-quality-assessment-quadas-2 output structure matches the documented deliverable
A2: The instruction path remains actionable for the documented case
A3: The output stays fully within the documented skill boundary
A4: The response quality is acceptable for the documented path
Pass rate: 4 / 4
Stress ✅ Pass (87 / 100)
End-to-end case for Scope-focused workflow aligned to: Analyzes clinical diagnostic accuracy studies for bias using the QUADAS-2 tool. Use when Claude needs to assess the quality, risk of bias, or applicability of diagnostic accuracy studies (e.g., "Assess this paper using QUADAS-2")

The archived run treated "Analyzes clinical diagnostic accuracy studies for bias using the QUADAS-2 tool" as a bounded analysis workflow rather than a purely narrative instruction path.

Basic 28/40 | Specialized 59/60 | Total 87/100
A1: The diagnostic-study-quality-assessment-quadas-2 output structure matches the documented deliverable
A2: The instruction path remains actionable for the documented case
A3: The output stays fully within the documented skill boundary
A4: The response quality is acceptable for the documented path
Pass rate: 4 / 4
Medical Task Total: 88.6 / 100
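The 88.6 figure is consistent with a plain mean of the five case totals, and each total is the sum of its Basic and Specialized parts. The sketch below reproduces that arithmetic from the numbers in the five case cards; an unweighted average is assumed because it matches the reported value.

```python
# (Basic out of 40, Specialized out of 60) copied from the five case cards.
cases = {
    "Canonical": (35, 58),
    "Variant A": (33, 56),
    "Edge": (32, 55),
    "Variant B": (31, 56),
    "Stress": (28, 59),
}

# Each case total is Basic + Specialized, out of 100.
totals = {name: basic + specialized for name, (basic, specialized) in cases.items()}
for name, total in totals.items():
    print(f"{name}: {total} / 100")

# Unweighted mean across cases (assumed aggregation rule).
average = sum(totals.values()) / len(totals)
print(f"Execution average: {average:.1f} / 100")

# Four assertions (A1 to A4) per case, all reported as passed.
assertions_passed = 4 * len(cases)
print(f"Assertions: {assertions_passed}/{4 * len(cases)} passed")
```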

Key Strengths

  • Primary routing is Data Analysis with execution mode B
  • Static quality score is 81/100 and dynamic average is 77.6/100
  • Assertions and command execution outcomes are recorded per input for human review
  • Execution verification summary: Script verification 1/2; adjustment=3. pdf_extractor.py: rc=1; quadas_assessment.py: OK
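For human review it can help to turn the execution verification summary into structured data. The following is a rough sketch that parses the summary string as it appears in this report. The regular expression and the treatment of `OK` as success are assumptions, and the meaning of `adjustment=3` is not documented here, so it is left uninterpreted.

```python
import re

# Summary string copied from the execution verification line in this report.
summary = (
    "Script verification 1/2; adjustment=3. "
    "pdf_extractor.py: rc=1; quadas_assessment.py: OK"
)

# Match "<script>.py: OK" or "<script>.py: rc=<n>" (assumed format).
pattern = re.compile(r"(\w+\.py): (OK|rc=\d+)")

# Map each script to True if it reported OK, False otherwise.
statuses = {script: status == "OK" for script, status in pattern.findall(summary)}

verified = sum(statuses.values())
failing = [script for script, ok in statuses.items() if not ok]

print(f"verified {verified}/{len(statuses)} scripts")
print("failing:", failing)
```

The parsed counts agree with the "1/2" in the summary: one script ran cleanly and `pdf_extractor.py` exited with a non-zero return code.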