Protocol Design

basic-research-design

A biomedical research topic designer that generates progressive experimental subtitles and detailed research outlines based on a given subject. Use when the user wants to design a research proposal, outline experiments for a topic, or structure a biomedical study.

87 / 100 Total Score

Core Capability

85 / 100

Functional Suitability

11 / 12

Reliability

9 / 12

Performance & Context

7 / 8

Agent Usability

14 / 16

Human Usability

8 / 8

Security

11 / 12

Maintainability

9 / 12

Agent-Specific

16 / 20

Medical Task

20 / 20 Passed

93 | A biomedical research topic designer that generates progressive experimental subtitles and detailed research outlines based on a given subject | 4/4

89 | Step 2: Generate Research Outline | 4/4

87 | Step 2: Generate Research Outline | 4/4

87 | Step 1: Generate Subtitles | 4/4

87 | Quality Rules | 4/4

Veto Gates

Required pass for any deployment consideration

Skill Veto: ✓ All 4 gates passed

✓

Operational Stability

System remains stable across varied inputs and edge cases

PASS

✓

Structural Consistency

Output structure conforms to expected skill contract format

PASS

✓

Result Determinism

Equivalent inputs produce semantically equivalent outputs

PASS

✓

System Security

No prompt injection, data leakage, or unsafe tool use detected

PASS

Research Veto: ✅ PASS (Applicable)

Dimension	Result	Detail
Scientific Integrity	PASS	The archived evaluation treated outputs as protocol guidance to be tested later, not as validated experimental findings.
Practice Boundaries	PASS	The package remained on the planning side of the boundary and did not cross into clinical or diagnostic advice.
Methodological Ground	PASS	The archived evaluation treated the workflow as method-linked rather than ad hoc.
Code Usability	N/A	The package is evaluated primarily as a structured deliverable rather than an executable scientific code workflow.
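
The gate logic reported above can be sketched in a few lines. This is only an illustration, assuming that a veto passes when every applicable gate reads PASS and that an N/A dimension is skipped rather than counted as a failure (which is consistent with the Research Veto passing despite one N/A row). The dictionary names and the `veto_passed` helper are hypothetical and not part of the evaluated package.

```python
# Hypothetical representation of the gate results listed in this report.
SKILL_VETO = {
    "Operational Stability": "PASS",
    "Structural Consistency": "PASS",
    "Result Determinism": "PASS",
    "System Security": "PASS",
}
RESEARCH_VETO = {
    "Scientific Integrity": "PASS",
    "Practice Boundaries": "PASS",
    "Methodological Ground": "PASS",
    "Code Usability": "N/A",
}


def veto_passed(results):
    """Return True when every applicable gate passed.

    Assumption: "N/A" gates are skipped instead of being treated as failures.
    """
    applicable = [result for result in results.values() if result != "N/A"]
    return all(result == "PASS" for result in applicable)


skill_ok = veto_passed(SKILL_VETO)
research_ok = veto_passed(RESEARCH_VETO)
print(f"Skill Veto passed: {skill_ok}, Research Veto passed: {research_ok}")
```

Under this reading, a single FAIL in any applicable gate would block deployment consideration regardless of the numeric scores.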

Core Capability: 85 / 100 (8 Categories)

Functional Suitability

The archived deduction in functional suitability traces back to the legacy issue 'Improve stress-case output rigor': stress and boundary scenarios show weaker consistency.

11 / 12

92%

Reliability

Reliability was softened by the legacy issue 'Improve stress-case output rigor': stress and boundary scenarios show weaker consistency.

9 / 12

75%

Performance & Context

The workflow scales reasonably, but the archived review still recorded a small performance-context deduction.

7 / 8

88%

Agent Usability

Agent usability was strong, though the package could make its decision points even easier to follow at first read.

14 / 16

88%

Human Usability

Human usability reached full score in the archived evaluation.

8 / 8

100%

Security

The planning workflow stayed safe overall, but the archived score suggests slightly stronger boundary signaling would help.

11 / 12

92%

Maintainability

The package remains maintainable, though the archived review saw modest room to simplify or stabilize its planning logic.

9 / 12

75%

Agent-Specific

The archived deduction in the agent-specific category traces back to the legacy issue 'Improve stress-case output rigor': stress and boundary scenarios show weaker consistency.

16 / 20

80%

Core Capability Total: 85 / 100
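
The category scores above can be cross-checked against the reported total and percentages. The sketch below assumes the total is a plain sum of the eight category scores and that percentages are rounded half up to whole numbers; the report does not state either rule, so both are assumptions, and the names `CATEGORIES` and `percent` are illustrative.

```python
# (score, maximum) per category, copied from the report.
CATEGORIES = {
    "Functional Suitability": (11, 12),
    "Reliability": (9, 12),
    "Performance & Context": (7, 8),
    "Agent Usability": (14, 16),
    "Human Usability": (8, 8),
    "Security": (11, 12),
    "Maintainability": (9, 12),
    "Agent-Specific": (16, 20),
}


def percent(score, maximum):
    """Whole-number percentage, rounded half up using integer arithmetic."""
    return (200 * score + maximum) // (2 * maximum)


total = sum(score for score, _ in CATEGORIES.values())
maximum = sum(cap for _, cap in CATEGORIES.values())
percents = {name: percent(s, m) for name, (s, m) in CATEGORIES.items()}

print(f"Core Capability Total: {total} / {maximum}")
for name, value in percents.items():
    print(f"{name}: {value}%")
```

With these inputs the sum comes to 85 out of 100, and the per-category percentages match the values listed in the report.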

Medical Task: Execution Average 88.6 / 100, Assertions 20/20 Passed

Canonical

A biomedical research topic designer that generates progressive experimental subtitles and detailed research outlines based on a given subject

4/4 ✓

Variant A

Step 2: Generate Research Outline

4/4 ✓

Edge

Step 2: Generate Research Outline

4/4 ✓

Variant B

Step 1: Generate Subtitles

4/4 ✓

Stress

Quality Rules

4/4 ✓

Canonical: ✅ Pass

A biomedical research topic designer that generates progressive experimental subtitles and detailed research outlines based on a given subject

This canonical case stayed in proposal-building mode, using "A biomedical research topic designer that generates progressive..." to refine the study plan instead of running code.

Basic 36/40 | Specialized 57/60 | Total 93/100

✅ A1: The basic-research-design output structure matches the documented deliverable

✅ A2: The instruction path remains actionable for the documented case

✅ A3: The output stays fully within the documented skill boundary

✅ A4: The response quality is acceptable for the documented path

Pass rate: 4 / 4

Variant A: ✅ Pass

Step 2: Generate Research Outline

This Variant A case stayed in proposal-building mode, using "Step 2: Generate Research Outline" to refine the study plan instead of running code.

Basic 34/40 | Specialized 55/60 | Total 89/100

✅ A1: The basic-research-design output structure matches the documented deliverable

✅ A2: The instruction path remains actionable for the documented case

✅ A3: The output stays fully within the documented skill boundary

✅ A4: The response quality is acceptable for the documented path

Pass rate: 4 / 4

Edge: ✅ Pass

Step 2: Generate Research Outline

This edge case stayed in proposal-building mode, using "Step 2: Generate Research Outline" to refine the study plan instead of running code.

Basic 33/40 | Specialized 54/60 | Total 87/100

✅ A1: The basic-research-design output structure matches the documented deliverable

✅ A2: The instruction path remains actionable for the documented case

✅ A3: The output stays fully within the documented skill boundary

✅ A4: The response quality is acceptable for the documented path

Pass rate: 4 / 4

Variant B: ✅ Pass

Step 1: Generate Subtitles

The archived run treated "Step 1: Generate Subtitles" as a protocol-planning step that structures hypotheses, methods, and scope rather than executing the study itself.

Basic 32/40 | Specialized 55/60 | Total 87/100

✅ A1: The basic-research-design output structure matches the documented deliverable

✅ A2: The instruction path remains actionable for the documented case

✅ A3: The output stays fully within the documented skill boundary

✅ A4: The response quality is acceptable for the documented path

Pass rate: 4 / 4

Stress: ✅ Pass

Quality Rules

The "Quality Rules" case remained a design-stage deliverable aimed at shaping the research plan, not producing validated experimental output.

Basic 29/40 | Specialized 58/60 | Total 87/100

✅ A1: The basic-research-design output structure matches the documented deliverable

✅ A2: The instruction path remains actionable for the documented case

✅ A3: The output stays fully within the documented skill boundary

✅ A4: The response quality is acceptable for the documented path

Pass rate: 4 / 4

Medical Task Total: 88.6 / 100
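
The per-case scores can be reconciled with the reported execution average. The sketch below assumes each case total is Basic plus Specialized and that the execution average is the unweighted mean of the five case totals; the report does not spell out the formula, so this is an inference from the numbers, and the `CASES` mapping is an illustrative name.

```python
# (basic out of 40, specialized out of 60) per test case, copied from the report.
CASES = {
    "Canonical": (36, 57),
    "Variant A": (34, 55),
    "Edge": (33, 54),
    "Variant B": (32, 55),
    "Stress": (29, 58),
}

# Assumption: case total = basic + specialized, on a 100-point scale.
totals = {name: basic + specialized for name, (basic, specialized) in CASES.items()}

# Assumption: the execution average is the unweighted mean of the case totals.
average = sum(totals.values()) / len(totals)

for name, value in totals.items():
    print(f"{name}: {value}/100")
print(f"Execution average: {average:.1f} / 100")
```

The five totals (93, 89, 87, 87, 87) sum to 443, giving a mean of 88.6, which agrees with the Medical Task Total.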

Key Strengths

Primary routing is Protocol Design with execution mode B
Static quality score is 85/100 and dynamic average is 88.6/100
Assertions and command execution outcomes are recorded per input for human review
Execution verification summary: No runnable Python scripts were available for verification