Protocol Design

sample-size-and-power-planning-assistant

Plans sample size estimation logic, power assumptions, feasibility checks, and fallback enrollment strategies for clinical and translational study protocols.

Total Score
90 / 100
Core Capability
92 / 100
Functional Suitability
12 / 12
Reliability
10 / 12
Performance & Context
7 / 8
Agent Usability
15 / 16
Human Usability
7 / 8
Security
12 / 12
Maintainability
11 / 12
Agent-Specific
18 / 20
Medical Task
25 / 25 Passed
Canonical input: 91 / 100, 5/5 assertions passed
Variant A input: 91 / 100, 5/5 assertions passed
Variant B input: 88 / 100, 5/5 assertions passed
Edge input: 86 / 100, 5/5 assertions passed
Stress input: 86 / 100, 5/5 assertions passed

Veto Gates: Required pass for any deployment consideration

Skill Veto: ✓ All 4 gates passed
Operational Stability
System remains stable across varied inputs and edge cases
PASS
Structural Consistency
Output structure conforms to expected skill contract format
PASS
Result Determinism
Equivalent inputs produce semantically equivalent outputs
PASS
System Security
No prompt injection, data leakage, or unsafe tool use detected
PASS
Research Veto: ✅ PASS (Applicable)
Dimension | Result | Detail
Scientific Integrity | PASS | No fabricated references, DOIs, PMIDs, statistical values, or clinical data detected.
Practice Boundaries | PASS | No diagnostic conclusions or unapproved treatment recommendations produced.
Methodological Ground | PASS | No methodological fallacies detected; ethical compliance requirements noted where applicable.
Code Usability | N/A | No code generated; Mode A skill

Core Capability: 92 / 100 (8 Categories)

Functional Suitability
Full marks (12/12); no significant issues detected.
12 / 12
100%
Reliability
Assumption-explicit labeling is excellent; fallback strategies for underpowered scenarios are valuable
10 / 12
83%
Performance & Context
Strong score (7/8); minor gaps noted.
7 / 8
88%
Agent Usability
Strong score (15/16); minor gaps noted.
15 / 16
94%
Human Usability
Strong score (7/8); minor gaps noted.
7 / 8
88%
Security
Full marks (12/12); no significant issues detected.
12 / 12
100%
Maintainability
Strong score (11/12); minor gaps noted.
11 / 12
92%
Agent-Specific
Realistic enrollment constraint integration is a rare and practical feature in sample size planning
18 / 20
90%
Core Capability Total: 92 / 100
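
The category figures above can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only: the scores are copied from this report, while the names `CATEGORY_SCORES` and `summarize` are hypothetical, and the half-up rounding of percentages is an assumption inferred from the displayed values (for example 7 / 8 shown as 88%).

```python
# Category scores as (earned, maximum), copied from the report above.
CATEGORY_SCORES = {
    "Functional Suitability": (12, 12),
    "Reliability": (10, 12),
    "Performance & Context": (7, 8),
    "Agent Usability": (15, 16),
    "Human Usability": (7, 8),
    "Security": (12, 12),
    "Maintainability": (11, 12),
    "Agent-Specific": (18, 20),
}


def summarize(scores):
    """Return (total earned, total maximum, per-category percent).

    Percentages are rounded half up, which is an assumption that
    happens to match the values displayed in the report.
    """
    earned = sum(e for e, _ in scores.values())
    maximum = sum(m for _, m in scores.values())
    percents = {
        name: int(100 * e / m + 0.5) for name, (e, m) in scores.items()
    }
    return earned, maximum, percents


earned, maximum, percents = summarize(CATEGORY_SCORES)
print(f"Core Capability Total: {earned} / {maximum}")
for name, pct in percents.items():
    print(f"{name}: {pct}%")
```

Under these assumptions the eight categories sum to 92 out of a possible 100, which agrees with the total reported above.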

Medical Task: Execution Average 88.4 / 100; Assertions: 25/25 Passed

Canonical: ✅ Pass (91 / 100)
Canonical input for sample-size-and-power-planning-assistant

5/5 assertions passed.

Basic 36/40 | Specialized 55/60 | Total 91/100
A1: Core assertion 1 for canonical input
A2: Core assertion 2 for canonical input
A3: Core assertion 3 for canonical input
A4: Core assertion 4 for canonical input
A5: Core assertion 5 for canonical input
Pass rate: 5 / 5
Variant A: ✅ Pass (91 / 100)
Variant A input for sample-size-and-power-planning-assistant

5/5 assertions passed.

Basic 36/40 | Specialized 55/60 | Total 91/100
A1: Core assertion 1 for Variant A input
A2: Core assertion 2 for Variant A input
A3: Core assertion 3 for Variant A input
A4: Core assertion 4 for Variant A input
A5: Core assertion 5 for Variant A input
Pass rate: 5 / 5
Variant B: ✅ Pass (88 / 100)
Variant B input for sample-size-and-power-planning-assistant

5/5 assertions passed.

Basic 35/40 | Specialized 53/60 | Total 88/100
A1: Core assertion 1 for Variant B input
A2: Core assertion 2 for Variant B input
A3: Core assertion 3 for Variant B input
A4: Core assertion 4 for Variant B input
A5: Core assertion 5 for Variant B input
Pass rate: 5 / 5
Edge: ✅ Pass (86 / 100)
Edge input for sample-size-and-power-planning-assistant

5/5 assertions passed.

Basic 34/40 | Specialized 52/60 | Total 86/100
A1: Core assertion 1 for edge input
A2: Core assertion 2 for edge input
A3: Core assertion 3 for edge input
A4: Core assertion 4 for edge input
A5: Core assertion 5 for edge input
Pass rate: 5 / 5
Stress: ✅ Pass (86 / 100)
Stress input for sample-size-and-power-planning-assistant

5/5 assertions passed.

Basic 34/40 | Specialized 52/60 | Total 86/100
A1: Core assertion 1 for stress input
A2: Core assertion 2 for stress input
A3: Core assertion 3 for stress input
A4: Core assertion 4 for stress input
A5: Core assertion 5 for stress input
Pass rate: 5 / 5
Medical Task Total: 88.4 / 100
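
The per-run totals and the execution average can be reproduced from the Basic and Specialized sub-scores listed for each run. This is a minimal sketch, assuming the run total is the plain sum of the two sub-scores and the average is an unweighted mean over the five runs; both assumptions match the reported numbers, but the report does not state the scoring rule. The names `RUNS`, `run_totals`, and `execution_average` are hypothetical.

```python
# (Basic out of 40, Specialized out of 60) per run, copied from the report.
RUNS = {
    "Canonical": (36, 55),
    "Variant A": (36, 55),
    "Variant B": (35, 53),
    "Edge": (34, 52),
    "Stress": (34, 52),
}


def run_totals(runs):
    """Total per run, assumed to be Basic + Specialized."""
    return {name: basic + spec for name, (basic, spec) in runs.items()}


def execution_average(runs):
    """Unweighted mean of the run totals, rounded to one decimal place."""
    totals = list(run_totals(runs).values())
    return round(sum(totals) / len(totals), 1)


print(run_totals(RUNS))
print(f"Execution average: {execution_average(RUNS)} / 100")
```

With these inputs the run totals are 91, 91, 88, 86, and 86, and their mean is 88.4, consistent with the Medical Task Total above.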

Key Strengths

  • Assumption-explicit labeling with transparent uncertainty ranges prevents false precision claims
  • Fallback enrollment strategies provide practical guidance when target sample size is unachievable
  • Study-type-specific calculation logic (survival vs binary vs continuous) is appropriate and rigorous
  • Prohibition on fabricating event rates, dropout rates, or published precedents maintains integrity
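
To make the planning steps named in these strengths concrete, here is a minimal sketch of a sample size estimate for a continuous outcome, followed by dropout inflation and a feasibility check. It is not the skill's own logic (the report notes that the skill generates no code); it uses the textbook normal-approximation formula for comparing two means with equal allocation, and every function name and input value is a hypothetical placeholder rather than a figure from any study. Binary and survival endpoints need different formulas and are not covered here.

```python
import math
from statistics import NormalDist


def n_per_group_continuous(delta, sigma, alpha=0.05, power=0.80):
    """Per-group n for a two-sided, two-sample comparison of means.

    Normal approximation: n = 2 * ((z_{1-alpha/2} + z_{power}) * sigma / delta)^2.
    delta (detectable difference) and sigma (common SD) are assumptions
    the planner must supply and justify; nothing is estimated here.
    """
    if delta == 0:
        raise ValueError("delta must be non-zero")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_power) * sigma / delta) ** 2)


def inflate_for_dropout(n, dropout_rate):
    """Enrollment target per group so that about n complete the study."""
    if not 0 <= dropout_rate < 1:
        raise ValueError("dropout_rate must be in [0, 1)")
    return math.ceil(n / (1 - dropout_rate))


def is_feasible(n_enrolled_per_group, max_enrollable_total, groups=2):
    """True if the enrollment target fits the stated recruitment ceiling."""
    return groups * n_enrolled_per_group <= max_enrollable_total


# Hypothetical planning inputs, labeled as assumptions, not real data.
n = n_per_group_continuous(delta=0.5, sigma=1.0)  # 63 per group
target = inflate_for_dropout(n, dropout_rate=0.2)
print(n, target, is_feasible(target, max_enrollable_total=150))
```

When the feasibility check fails, the planner would revisit the assumptions explicitly (for example a larger detectable difference, longer accrual, or additional sites) rather than quietly lowering the target.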