Protocol Design

clinic-research-design

Generates a structured prompt framework for clinical study protocols. Supports Diagnostic, Efficacy, Etiology, and Prognosis studies. Calculates sample size and provides logic guides for LLMs.

91 / 100 Total Score

Core Capability

83 / 100

Functional Suitability

11 / 12

Reliability

10 / 12

Performance & Context

8 / 8

Agent Usability

13 / 16

Human Usability

7 / 8

Security

9 / 12

Maintainability

9 / 12

Agent-Specific

16 / 20

Medical Task

20 / 20 Passed

Case	Score	Assertions	Description
Canonical	100	4/4	Generates a structured prompt framework for clinical study protocols. Supports Diagnostic, Efficacy, Etiology, and Prognosis studies. Calculates sample size and provides logic guides for LLMs.
Variant A	98	4/4	Generates a structured prompt framework for clinical study protocols. Supports Diagnostic, Efficacy, Etiology, and Prognosis studies. Calculates sample size and provides logic guides for LLMs.
Edge	96	4/4	Generates a structured prompt framework for clinical study protocols. Supports Diagnostic, Efficacy, Etiology, and Prognosis studies. Calculates sample size and provides logic guides for LLMs.
Variant B	95	4/4	Packaged executable path(s): scripts/calculators/sample_size.py plus 4 additional script(s)
Stress	95	4/4	End-to-end case for Scope-focused workflow aligned to: Generates a structured prompt framework for clinical study protocols. Supports Diagnostic, Efficacy, Etiology, and Prognosis studies. Calculates sample size and provides logic guides for LLMs.

Veto Gates: Required pass for any deployment consideration

Skill Veto: ✓ All 4 gates passed

Gate	Result	Detail
Operational Stability	PASS	System remains stable across varied inputs and edge cases.
Structural Consistency	PASS	Output structure conforms to expected skill contract format.
Result Determinism	PASS	Equivalent inputs produce semantically equivalent outputs.
System Security	PASS	No prompt injection, data leakage, or unsafe tool use detected.

Research Veto: ✅ PASS (Applicable)

Dimension	Result	Detail
Scientific Integrity	PASS	Scientific integrity held because the archived workflow stayed at the level of study planning, hypothesis framing, and experiment design rather than claiming completed results.
Practice Boundaries	PASS	The package remained on the planning side of the boundary and did not cross into clinical or diagnostic advice.
Methodological Ground	PASS	Methodological grounding was preserved through the documented inputs, transformations, and expected artifacts.
Code Usability	N/A	This package is packaging-first and output-first, not code-first, so code usability is treated as not applicable.

Core Capability: 83 / 100 (8 Categories)

Functional Suitability

The archived review left some room to tighten how the skill's stated scope ("Generates a structured prompt framework for clinical study protocols. Supports Diagnostic,...") maps onto a finished protocol-style deliverable.

11 / 12

92%

Reliability

The package stayed structured, but the archived score suggests more consistency would help under sparse or stress-case inputs.

10 / 12

83%

Performance & Context

No point loss was recorded for performance context in the legacy audit.

8 / 8

100%

Agent Usability

The planning path is understandable, but the archived score suggests a little more trigger clarity would help agents route into it faster.

13 / 16

81%

Human Usability

Human usability lost one point to the legacy issue 'Minor polish before wide rollout'. No major defects were found.

7 / 8

88%

Security

Security scored well, though the archived review still left some room to make boundary language even more explicit.

9 / 12

75%

Maintainability

The package remains maintainable, though the archived review saw modest room to simplify or stabilize its planning logic.

9 / 12

75%

Agent-Specific

Agent-specific quality remained high, with a small gap around determinism or edge-case prompting behavior.

16 / 20

80%

Core Capability Total: 83 / 100
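
As a cross-check on the breakdown above, the short sketch below recomputes each category percentage and the Core Capability total from the listed scores. The names `CATEGORY_SCORES`, `percent`, and `totals` are illustrative and not part of the reviewed package, and the round-half-up rule is an assumption that happens to reproduce the percentages shown.

```python
import math

# Category scores as (earned, maximum), copied from the breakdown above.
CATEGORY_SCORES = {
    "Functional Suitability": (11, 12),
    "Reliability": (10, 12),
    "Performance & Context": (8, 8),
    "Agent Usability": (13, 16),
    "Human Usability": (7, 8),
    "Security": (9, 12),
    "Maintainability": (9, 12),
    "Agent-Specific": (16, 20),
}


def percent(earned, maximum):
    """Whole-number percentage, rounded half up (assumed rounding rule)."""
    return math.floor(100 * earned / maximum + 0.5)


def totals(scores):
    """Sum earned and maximum points across all categories."""
    earned = sum(e for e, _ in scores.values())
    maximum = sum(m for _, m in scores.values())
    return earned, maximum


if __name__ == "__main__":
    for name, (earned, maximum) in CATEGORY_SCORES.items():
        print(f"{name}: {earned} / {maximum} ({percent(earned, maximum)}%)")
    print("Core Capability Total: %d / %d" % totals(CATEGORY_SCORES))
```

The eight category maxima sum to 100, so the total reads directly as a score out of 100.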

Medical Task: Execution Average 96.8 / 100; Assertions 20/20 Passed


Canonical: ✅ Pass

Generates a structured prompt framework for clinical study protocols. Supports Diagnostic, Efficacy, Etiology, and Prognosis studies. Calculates sample size and provides logic guides for LLMs.

The archived run for "Generates a structured prompt framework for clinical study..." confirmed the helper entrypoint and left the workflow in a stable state.

Basic 38/40 | Specialized 60/60 | Total 100/100

✅ A1: The clinic-research-design output structure matches the documented deliverable
✅ A2: The script execution path completed successfully for the documented case
✅ A3: The output stays fully within the documented skill boundary
✅ A4: The response quality is acceptable for the documented path

Pass rate: 4 / 4

Variant A: ✅ Pass

Generates a structured prompt framework for clinical study protocols. Supports Diagnostic, Efficacy, Etiology, and Prognosis studies. Calculates sample size and provides logic guides for LLMs.

The archived run for "Generates a structured prompt framework for clinical study..." confirmed the helper entrypoint and left the workflow in a stable state.

Basic 36/40 | Specialized 60/60 | Total 98/100

✅ A1: The clinic-research-design output structure matches the documented deliverable
✅ A2: The script execution path completed successfully for the documented case
✅ A3: The output stays fully within the documented skill boundary
✅ A4: The response quality is acceptable for the documented path

Pass rate: 4 / 4

Edge: ✅ Pass

Generates a structured prompt framework for clinical study protocols. Supports Diagnostic, Efficacy, Etiology, and Prognosis studies. Calculates sample size and provides logic guides for LLMs.

The archived run for "Generates a structured prompt framework for clinical study..." confirmed the helper entrypoint and left the workflow in a stable state.

Basic 35/40 | Specialized 60/60 | Total 96/100

✅ A1: The clinic-research-design output structure matches the documented deliverable
✅ A2: The script execution path completed successfully for the documented case
✅ A3: The output stays fully within the documented skill boundary
✅ A4: The response quality is acceptable for the documented path

Pass rate: 4 / 4

Variant B: ✅ Pass

Packaged executable path(s): scripts/calculators/sample_size.py plus 4 additional script(s)

The archived run for "Packaged executable path(s): scripts/calculators/sample_size.py..." confirmed the helper entrypoint and left the workflow in a stable state.

Basic 34/40 | Specialized 60/60 | Total 95/100

✅ A1: The clinic-research-design output structure matches the documented deliverable
✅ A2: The script execution path completed successfully for the documented case
✅ A3: The output stays fully within the documented skill boundary
✅ A4: The response quality is acceptable for the documented path

Pass rate: 4 / 4

Stress: ✅ Pass

The "Generates a structured prompt framework for clinical study protocols. Supports Diagnostic,..." path verified the packaged helper command without exposing a deeper execution issue.

Basic 31/40 | Specialized 60/60 | Total 95/100

✅ A1: The clinic-research-design output structure matches the documented deliverable
✅ A2: The script execution path completed successfully for the documented case
✅ A3: The output stays fully within the documented skill boundary
✅ A4: The response quality is acceptable for the documented path

Pass rate: 4 / 4

Medical Task Total: 96.8 / 100
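
The 96.8 figure is consistent with an unweighted mean of the five per-case totals reported above, and the 20/20 assertion count with four assertions per case. The sketch below reproduces both under that assumption. The report does not state how the average is actually computed, and the names `CASES`, `execution_average`, and `assertion_tally` are illustrative only.

```python
from statistics import mean

# (case label, total score, assertions passed, assertions run),
# copied from the per-case results above.
CASES = [
    ("Canonical", 100, 4, 4),
    ("Variant A", 98, 4, 4),
    ("Edge", 96, 4, 4),
    ("Variant B", 95, 4, 4),
    ("Stress", 95, 4, 4),
]


def execution_average(cases):
    """Unweighted mean of the case totals, to one decimal (assumed method)."""
    return round(mean(score for _, score, _, _ in cases), 1)


def assertion_tally(cases):
    """Total assertions passed and run across all cases."""
    passed = sum(p for _, _, p, _ in cases)
    run = sum(r for _, _, _, r in cases)
    return passed, run


if __name__ == "__main__":
    passed, run = assertion_tally(CASES)
    print(f"Execution Average: {execution_average(CASES)} / 100")
    print(f"Assertions: {passed}/{run} Passed")
```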

Key Strengths

Primary routing is Protocol Design with execution mode B
Static quality score is 83/100 and dynamic average is 84.6/100
Assertions and command execution outcomes are recorded per input for human review
Execution verification summary: Script verification 4/4; adjustment=5. main.py: OK; protocol_writer.py: OK; study_classifier.py: OK; validate_skill.py: OK