Protocol Design

protocol-standardization

Standardize fragmented experimental steps into reproducible protocol documents when you need method organization, lab SOP drafting, or cross-operator reproducibility; missing parameters must be explicitly marked as "To be supplemented/Not provided".
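The missing-parameter rule can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the field names, the `REQUIRED` tuple (taken from the parameters named in the evaluation scenarios), and the helper functions are hypothetical and are not the skill's actual implementation.

```python
# Hypothetical sketch: field names, REQUIRED, and both helpers are illustrative
# assumptions, not part of the evaluated skill package.
MISSING = "To be supplemented/Not provided"
REQUIRED = ("temperature", "time", "concentration", "volume", "mixing_speed")


def standardize_step(raw_step: dict) -> dict:
    """Return a step in which every required parameter is present or explicitly marked missing."""
    step = {"action": raw_step.get("action", MISSING)}
    for name in REQUIRED:
        value = raw_step.get(name)
        # Never guess a value: an absent or empty parameter gets the explicit marker.
        step[name] = value if value not in (None, "") else MISSING
    return step


def missing_parameters(step: dict) -> list:
    """List the required parameters that still carry the missing marker."""
    return [name for name in REQUIRED if step[name] == MISSING]


notes = {"action": "Incubate sample", "temperature": "37 °C", "time": "30 min"}
step = standardize_step(notes)
print(missing_parameters(step))  # ['concentration', 'volume', 'mixing_speed']
```

Keeping the marker as a single constant means a reviewer can search the finished protocol for every gap before it is handed to another operator.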

85 / 100 Total Score
Core Capability
81 / 100
Functional Suitability
10 / 12
Reliability
9 / 12
Performance & Context
7 / 8
Agent Usability
13 / 16
Human Usability
8 / 8
Security
10 / 12
Maintainability
9 / 12
Agent-Specific
15 / 20
Medical Task
20 / 20 Passed
92
You have messy notes (chat logs, notebook fragments, bullet points) and need a formal, reproducible experimental protocol
4/4
88
You are preparing a lab SOP for standardization across multiple operators or sites
4/4
86
Converts fragmented experimental steps into a standardized protocol structure (prep → execution → closing)
4/4
86
Enforces parameter completeness for reproducibility (e.g., temperature, time, concentration, volume, mixing/rotation speed)
4/4
86
End-to-end case for converting fragmented experimental steps into a standardized protocol structure (prep → execution → closing)
4/4

Veto Gates
Required pass for any deployment consideration

Skill Veto: ✓ All 4 gates passed
Operational Stability
System remains stable across varied inputs and edge cases
PASS
Structural Consistency
Output structure conforms to expected skill contract format
PASS
Result Determinism
Equivalent inputs produce semantically equivalent outputs
PASS
System Security
No prompt injection, data leakage, or unsafe tool use detected
PASS
Research Veto: ✅ PASS (Applicable)
Dimension | Result | Detail
Scientific Integrity | PASS | The archived evaluation treated outputs as protocol guidance to be tested later, not as validated experimental findings.
Practice Boundaries | PASS | The package remained on the planning side of the boundary and did not cross into clinical or diagnostic advice.
Methodological Ground | PASS | The legacy review kept the package aligned with its named analysis library, data structure, or processing workflow.
Code Usability | N/A | The audited artifact centers on document or reasoning outputs, so code usability is not the main evaluation target here.

Core Capability: 81 / 100 (8 Categories)

Functional Suitability
Related legacy finding for protocol-standardization: improve stress-case output rigor, since stress and boundary scenarios show weaker consistency.
10 / 12
83%
Reliability
Related legacy finding for protocol-standardization: improve stress-case output rigor, since stress and boundary scenarios show weaker consistency.
9 / 12
75%
Performance & Context
The archived evaluation left minor headroom in how efficiently the workflow handles heavier planning contexts.
7 / 8
88%
Agent Usability
The planning path is understandable, but the archived score suggests a little more trigger clarity would help agents route into it faster.
13 / 16
81%
Human Usability
No point loss was recorded for human usability in the legacy audit.
8 / 8
100%
Security
Security scored well, though the archived review still left some room to make boundary language even more explicit.
10 / 12
83%
Maintainability
Maintainability held up, but a little more consolidation or clearer packaging would likely close the remaining gap.
9 / 12
75%
Agent-Specific
The archived deduction in Agent-Specific traces back to the legacy finding for protocol-standardization: improve stress-case output rigor, since stress and boundary scenarios show weaker consistency.
15 / 20
75%
Core Capability Total: 81 / 100
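The category scores above can be cross-checked with a short arithmetic sketch. The (earned, maximum) pairs are copied from the scorecard; the half-up rounding in `percent` is an assumption, since the report does not say how its percentages were rounded.

```python
import math

# (earned, maximum) pairs copied from the Core Capability scorecard.
categories = {
    "Functional Suitability": (10, 12),
    "Reliability": (9, 12),
    "Performance & Context": (7, 8),
    "Agent Usability": (13, 16),
    "Human Usability": (8, 8),
    "Security": (10, 12),
    "Maintainability": (9, 12),
    "Agent-Specific": (15, 20),
}


def percent(earned: int, maximum: int) -> int:
    """Whole-number percentage, rounding halves up (assumed rounding rule)."""
    return math.floor(100 * earned / maximum + 0.5)


total_earned = sum(earned for earned, _ in categories.values())
total_max = sum(maximum for _, maximum in categories.values())

for name, (earned, maximum) in categories.items():
    print(f"{name}: {earned} / {maximum} = {percent(earned, maximum)}%")
print(f"Core Capability Total: {total_earned} / {total_max}")
```

Under that rounding assumption the eight categories sum to 81 of 100 points and each computed percentage matches the one printed in the scorecard.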

Medical Task: Execution Average 87.6 / 100; Assertions 20/20 Passed

92
Canonical
You have messy notes (chat logs, notebook fragments, bullet points) and need a formal, reproducible experimental protocol
4/4
88
Variant A
You are preparing a lab SOP for standardization across multiple operators or sites
4/4
86
Edge
Converts fragmented experimental steps into a standardized protocol structure (prep → execution → closing)
4/4
86
Variant B
Enforces parameter completeness for reproducibility (e.g., temperature, time, concentration, volume, mixing/rotation speed)
4/4
86
Stress
End-to-end case for converting fragmented experimental steps into a standardized protocol structure (prep → execution → closing)
4/4
92
Canonical ✅ Pass
You have messy notes (chat logs, notebook fragments, bullet points) and need a formal, reproducible experimental protocol

The run for "You have messy notes (chat logs, notebook fragments, bullet points)..." stayed in planning mode and returned a bounded design deliverable without relying on a runnable script.

Basic 35/40 | Specialized 57/60 | Total 92/100
A1: The protocol-standardization output structure matches the documented deliverable
A2: The instruction path remains actionable for the documented case
A3: The output stays fully within the documented skill boundary
A4: The response quality is acceptable for the documented path
Pass rate: 4 / 4
88
Variant A ✅ Pass
You are preparing a lab SOP for standardization across multiple operators or sites

The archived run treated "You are preparing a lab SOP for standardization across multiple..." as a protocol-design path rather than an executable workflow.

Basic 33/40 | Specialized 55/60 | Total 88/100
A1: The protocol-standardization output structure matches the documented deliverable
A2: The instruction path remains actionable for the documented case
A3: The output stays fully within the documented skill boundary
A4: The response quality is acceptable for the documented path
Pass rate: 4 / 4
86
Edge ✅ Pass
Converts fragmented experimental steps into a standardized protocol structure (prep → execution → closing)

The archived run treated "Converts fragmented experimental steps into a standardized protocol..." as a protocol-design path rather than an executable workflow.

Basic 32/40 | Specialized 54/60 | Total 86/100
A1: The protocol-standardization output structure matches the documented deliverable
A2: The instruction path remains actionable for the documented case
A3: The output stays fully within the documented skill boundary
A4: The response quality is acceptable for the documented path
Pass rate: 4 / 4
86
Variant B ✅ Pass
Enforces parameter completeness for reproducibility (e.g., temperature, time, concentration, volume, mixing/rotation speed)

The run for "Enforces parameter completeness for reproducibility (e.g.,..." stayed in planning mode and returned a bounded design deliverable without relying on a runnable script.

Basic 31/40 | Specialized 55/60 | Total 86/100
A1: The protocol-standardization output structure matches the documented deliverable
A2: The instruction path remains actionable for the documented case
A3: The output stays fully within the documented skill boundary
A4: The response quality is acceptable for the documented path
Pass rate: 4 / 4
86
Stress ✅ Pass
End-to-end case for converting fragmented experimental steps into a standardized protocol structure (prep → execution → closing)

This stress case remained a study-design support path, not a code-driven execution run.

Basic 28/40 | Specialized 58/60 | Total 86/100
A1: The protocol-standardization output structure matches the documented deliverable
A2: The instruction path remains actionable for the documented case
A3: The output stays fully within the documented skill boundary
A4: The response quality is acceptable for the documented path
Pass rate: 4 / 4
Medical Task Total: 87.6 / 100
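The Medical Task figures can be reproduced from the five scenario cards. This is only a sketch of the arithmetic: the Basic and Specialized values are copied from the cards, and treating the total as an unweighted mean of the five scenario totals is an assumption that happens to match the reported value.

```python
# (Basic, Specialized) sub-scores copied from the five scenario cards.
scenarios = {
    "Canonical": (35, 57),
    "Variant A": (33, 55),
    "Edge": (32, 54),
    "Variant B": (31, 55),
    "Stress": (28, 58),
}

# Each scenario total is Basic (out of 40) plus Specialized (out of 60).
totals = {name: basic + specialized for name, (basic, specialized) in scenarios.items()}

# Assumed aggregation: a plain, unweighted mean across scenarios.
average = sum(totals.values()) / len(totals)

print(totals)
print(round(average, 1))  # 87.6
```

The Stress case has the lowest Basic sub-score (28/40) of the five, which is in line with the legacy finding about stress-case rigor noted in the Core Capability section.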

Key Strengths

  • Primary routing is Protocol Design with execution mode A
  • Static quality score is 81/100 and dynamic average is 79.6/100
  • Assertions and command execution outcomes are recorded per input for human review
  • Execution verification summary: No script verification was applicable