Data Analysis

diagnostic-study-quality-assessment-quadas-2

Analyzes clinical diagnostic accuracy studies for bias using the QUADAS-2 tool. Use when Claude needs to assess the quality, risk of bias, or applicability of diagnostic accuracy studies (e.g., "Assess this paper using QUADAS-2").
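For orientation, a QUADAS-2 assessment can be pictured as one small record per study. It covers four domains (patient selection, index test, reference standard, flow and timing), each with a risk-of-bias judgment, plus applicability concerns for the first three. The sketch below is illustrative only: the class, field, and method names are hypothetical and are not taken from the packaged scripts, and the overall-judgment rule is one common convention rather than the package's confirmed logic.

```python
from dataclasses import dataclass
from typing import Dict

# QUADAS-2 domains; applicability is not rated for "flow_and_timing".
DOMAINS = ("patient_selection", "index_test", "reference_standard", "flow_and_timing")
APPLICABILITY_DOMAINS = DOMAINS[:3]
JUDGMENTS = {"low", "high", "unclear"}


@dataclass
class Quadas2Assessment:
    """Hypothetical container for one study's QUADAS-2 judgments."""

    study_id: str
    risk_of_bias: Dict[str, str]
    applicability: Dict[str, str]

    def __post_init__(self) -> None:
        # Reject missing or extra domains and judgments outside low/high/unclear.
        checks = (
            ("risk_of_bias", self.risk_of_bias, DOMAINS),
            ("applicability", self.applicability, APPLICABILITY_DOMAINS),
        )
        for name, ratings, expected in checks:
            if set(ratings) != set(expected):
                raise ValueError(f"{name} must cover exactly: {expected}")
            bad = set(ratings.values()) - JUDGMENTS
            if bad:
                raise ValueError(f"invalid judgment(s) in {name}: {sorted(bad)}")

    def overall_risk_of_bias(self) -> str:
        # Assumed convention: "low" only when every domain is rated low.
        if all(value == "low" for value in self.risk_of_bias.values()):
            return "low"
        return "at risk"


if __name__ == "__main__":
    example = Quadas2Assessment(
        study_id="example-study",  # made-up identifier, not a real study
        risk_of_bias={d: "low" for d in DOMAINS} | {"index_test": "unclear"},
        applicability={d: "low" for d in APPLICABILITY_DOMAINS},
    )
    print(example.study_id, example.overall_risk_of_bias())
```

Keeping the judgments as plain strings keeps the record easy to serialize; an `Enum` would be a reasonable alternative if stricter typing is wanted.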

86 / 100 Total Score

Category	Score
Core Capability	81 / 100
Functional Suitability	10 / 12
Reliability	9 / 12
Performance & Context	8 / 8
Agent Usability	13 / 16
Human Usability	7 / 8
Security	9 / 12
Maintainability	9 / 12
Agent-Specific	16 / 20
Medical Task	20 / 20 Passed

Case	Score	Assertions	Description
Canonical	93	4/4	Analyzes clinical diagnostic accuracy studies for bias using the QUADAS-2 tool. Use when Claude needs to assess the quality, risk of bias, or applicability of diagnostic accuracy studies (e.g., "Assess this paper using QUADAS-2")
Variant A	89	4/4	Analyzes clinical diagnostic accuracy studies for bias using the QUADAS-2 tool. Use when Claude needs to assess the quality, risk of bias, or applicability of diagnostic accuracy studies (e.g., "Assess this paper using QUADAS-2")
Edge	87	4/4	Analyzes clinical diagnostic accuracy studies for bias using the QUADAS-2 tool
Variant B	87	4/4	Packaged executable path(s): scripts/pdf_extractor.py plus 1 additional script(s)
Stress	87	4/4	End-to-end case for Scope-focused workflow aligned to: Analyzes clinical diagnostic accuracy studies for bias using the QUADAS-2 tool. Use when Claude needs to assess the quality, risk of bias, or applicability of diagnostic accuracy studies (e.g., "Assess this paper using QUADAS-2")

Veto Gates: Required pass for any deployment consideration

Skill Veto ✓ All 4 gates passed

Gate	Result	Detail
Operational Stability	PASS	System remains stable across varied inputs and edge cases
Structural Consistency	PASS	Output structure conforms to expected skill contract format
Result Determinism	PASS	Equivalent inputs produce semantically equivalent outputs
System Security	PASS	No prompt injection, data leakage, or unsafe tool use detected

Research Veto ✅ PASS (Applicable)

Dimension	Result	Detail
Scientific Integrity	PASS	Scientific integrity held because extraction and analysis outputs stayed tied to provided text, metadata, or runtime evidence rather than invented study findings.
Practice Boundaries	PASS	The archived review kept this package within its documented scope (analyzing clinical diagnostic accuracy studies for bias using the QUADAS-2 tool) rather than freeform inference detached from source data.
Methodological Ground	PASS	Methodological grounding held because the package kept its judgments tied to explicit rubric logic.
Code Usability	PASS	The legacy audit did not flag code-usability issues for the packaged diagnostic-study-quality-assessment-quadas-2 workflow.

Core Capability: 81 / 100 (8 Categories)

Functional Suitability

Functional suitability lost points to the legacy issue 'Improve stress-case output rigor': stress and boundary scenarios show weaker consistency.

10 / 12

83%

Reliability

The archived reliability deduction traces back to the legacy issue 'Improve stress-case output rigor': stress and boundary scenarios show weaker consistency.

9 / 12

75%

Performance & Context

Performance & Context reached the full score in the archived evaluation.

8 / 8

100%

Agent Usability

The packaged analysis path is understandable, though the archived score suggests slightly clearer routing would help.

13 / 16

81%

Human Usability

The package is readable overall, though the archived review still left a small human-usability gap.

7 / 8

88%

Security

The packaged workflow stayed safe overall, with only a small remaining deduction around boundary signaling.

9 / 12

75%

Maintainability

The archived review treated the package as maintainable, while still preserving some room for cleanup.

9 / 12

75%

Agent-Specific

Related legacy finding for diagnostic-study-quality-assessment-quadas-2: 'Improve stress-case output rigor'. Stress and boundary scenarios show weaker consistency.

16 / 20

80%

Core Capability Total: 81 / 100
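The Core Capability total and the per-category percentages can be re-derived from the category scores listed above. The following is a small consistency check, not the evaluator's own scoring code: the function names are hypothetical, and the round-half-up rule is an assumption chosen because it reproduces the displayed percentages (for example 7 / 8 shown as 88%).

```python
# (score, maximum) pairs copied from the category breakdown in this report.
CATEGORY_SCORES = {
    "Functional Suitability": (10, 12),
    "Reliability": (9, 12),
    "Performance & Context": (8, 8),
    "Agent Usability": (13, 16),
    "Human Usability": (7, 8),
    "Security": (9, 12),
    "Maintainability": (9, 12),
    "Agent-Specific": (16, 20),
}


def core_capability_total(scores):
    """Return (points earned, points available) summed over all categories."""
    earned = sum(score for score, _ in scores.values())
    available = sum(maximum for _, maximum in scores.values())
    return earned, available


def percent(score, maximum):
    """Whole-number percentage, rounding halves up (assumed display rule)."""
    # Integer arithmetic avoids floating-point surprises at exact .5 values.
    return (200 * score + maximum) // (2 * maximum)


if __name__ == "__main__":
    earned, available = core_capability_total(CATEGORY_SCORES)
    print(f"Core Capability: {earned} / {available}")
    for name, (score, maximum) in CATEGORY_SCORES.items():
        print(f"{name}: {score} / {maximum} = {percent(score, maximum)}%")
```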

Medical Task: Execution Average 88.6 / 100; Assertions 20/20 Passed

Canonical

4/4 ✓

Variant A

4/4 ✓

Edge

Analyzes clinical diagnostic accuracy studies for bias using the QUADAS-2 tool

4/4 ✓

Variant B

Packaged executable path(s): scripts/pdf_extractor.py plus 1 additional script(s)

4/4 ✓

Stress

End-to-end case for Scope-focused workflow aligned to: Analyzes clinical diagnostic accuracy studies for bias using the QUADAS-2 tool. Use when Claude needs to assess the quality, risk of bias, or applicability of diagnostic accuracy studies (e.g., "Assess this paper using QUADAS-2")

4/4 ✓

Canonical ✅ Pass

The archived run treated "Analyzes clinical diagnostic accuracy studies for bias using the..." as a bounded analysis workflow rather than a purely narrative instruction path.

Basic 35/40 | Specialized 58/60 | Total 93/100

✅ A1: The diagnostic-study-quality-assessment-quadas-2 output structure matches the documented deliverable

✅ A2: The instruction path remains actionable for the documented case

✅ A3: The output stays fully within the documented skill boundary

✅ A4: The response quality is acceptable for the documented path

Pass rate: 4 / 4

Variant A ✅ Pass

This Variant A case stayed within the packaged analysis boundary and kept a reviewable task contract.

Basic 33/40 | Specialized 56/60 | Total 89/100

✅ A1: The diagnostic-study-quality-assessment-quadas-2 output structure matches the documented deliverable

✅ A2: The instruction path remains actionable for the documented case

✅ A3: The output stays fully within the documented skill boundary

✅ A4: The response quality is acceptable for the documented path

Pass rate: 4 / 4

Edge ✅ Pass

Analyzes clinical diagnostic accuracy studies for bias using the QUADAS-2 tool

This edge case stayed within the packaged analysis boundary and kept a reviewable task contract.

Basic 32/40 | Specialized 55/60 | Total 87/100

✅ A1: The diagnostic-study-quality-assessment-quadas-2 output structure matches the documented deliverable

✅ A2: The instruction path remains actionable for the documented case

✅ A3: The output stays fully within the documented skill boundary

✅ A4: The response quality is acceptable for the documented path

Pass rate: 4 / 4

Variant B ✅ Pass

Packaged executable path(s): scripts/pdf_extractor.py plus 1 additional script(s)

The case "Packaged executable path(s): scripts/pdf_extractor.py plus 1..." remained an analysis-style extraction path whose value came from structured data capture rather than a freeform narrative response.

Basic 31/40 | Specialized 56/60 | Total 87/100

✅ A1: The diagnostic-study-quality-assessment-quadas-2 output structure matches the documented deliverable

✅ A2: The instruction path remains actionable for the documented case

✅ A3: The output stays fully within the documented skill boundary

✅ A4: The response quality is acceptable for the documented path

Pass rate: 4 / 4

Stress ✅ Pass

The archived run treated "Analyzes clinical diagnostic accuracy studies for bias using the QUADAS-2 tool" as a bounded analysis workflow rather than a purely narrative instruction path.

Basic 28/40 | Specialized 59/60 | Total 87/100

✅ A1: The diagnostic-study-quality-assessment-quadas-2 output structure matches the documented deliverable

✅ A2: The instruction path remains actionable for the documented case

✅ A3: The output stays fully within the documented skill boundary

✅ A4: The response quality is acceptable for the documented path

Pass rate: 4 / 4

Medical Task Total: 88.6 / 100
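The Medical Task total appears to be the plain mean of the five per-case totals, and each case total is the sum of its Basic and Specialized parts. The sketch below checks that arithmetic against the figures reported above. Treating the total as an unweighted mean is an assumption that happens to match the reported 88.6, and the variable and function names are hypothetical.

```python
from statistics import mean

# (basic, specialized, total) copied from the per-case results in this report.
CASES = {
    "Canonical": (35, 58, 93),
    "Variant A": (33, 56, 89),
    "Edge": (32, 55, 87),
    "Variant B": (31, 56, 87),
    "Stress": (28, 59, 87),
}


def inconsistent_cases(cases):
    """Return the names of cases whose Basic + Specialized differs from Total."""
    return [
        name
        for name, (basic, specialized, total) in cases.items()
        if basic + specialized != total
    ]


def execution_average(cases):
    """Unweighted mean of the per-case totals (assumed aggregation rule)."""
    return mean(total for _, _, total in cases.values())


if __name__ == "__main__":
    print("Inconsistent cases:", inconsistent_cases(CASES))
    print("Execution average:", round(execution_average(CASES), 1))
```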

Key Strengths

Primary routing is Data Analysis with execution mode B
Static quality score is 81/100 and dynamic average is 77.6/100
Assertions and command execution outcomes are recorded per input for human review
Execution verification summary: script verification passed for 1 of 2 scripts (adjustment=3). pdf_extractor.py exited with rc=1; quadas_assessment.py ran OK.
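A script-verification line like the one above could be produced by running each packaged script and recording its return code. The sketch below is a guess at the general shape, not the evaluator's actual harness: the function names are hypothetical, the scripts are invoked without arguments (the real command lines are not given in this report), and the `adjustment` value is not modeled because its meaning is not documented here.

```python
import subprocess
import sys
from pathlib import Path
from typing import Dict, Iterable


def verify_scripts(paths: Iterable[Path], timeout: float = 30.0) -> Dict[str, int]:
    """Run each script with the current interpreter and record its return code."""
    results: Dict[str, int] = {}
    for path in paths:
        proc = subprocess.run(
            [sys.executable, str(path)],
            capture_output=True,  # keep script output out of the report stream
            text=True,
            timeout=timeout,
        )
        results[path.name] = proc.returncode
    return results


def summarize(results: Dict[str, int]) -> str:
    """Format results in the style 'Script verification 1/2. a.py: rc=1; b.py: OK'."""
    passed = sum(1 for rc in results.values() if rc == 0)
    parts = []
    for name, rc in results.items():
        status = "OK" if rc == 0 else f"rc={rc}"
        parts.append(f"{name}: {status}")
    return f"Script verification {passed}/{len(results)}. " + "; ".join(parts)


if __name__ == "__main__":
    # Pass script paths on the command line, e.g. scripts/pdf_extractor.py
    script_paths = [Path(arg) for arg in sys.argv[1:]]
    print(summarize(verify_scripts(script_paths)))
```

A non-zero return code only shows that the script exited with an error under these invocation assumptions; the captured stderr would be the next place to look for the cause.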