Protocol Design
confounder-and-bias-control-planner
Plans confounder control, variable adjustment logic, and bias mitigation strategies at the protocol stage for clinical, epidemiologic, translational, observational, and biomarker studies. Always use this skill when a user needs to identify major confounders, decide which variable
Total Score: 89 / 100
Core Capability
93 / 100
Functional Suitability
12 / 12
Reliability
10 / 12
Performance & Context
7 / 8
Agent Usability
16 / 16
Human Usability
7 / 8
Security
12 / 12
Maintainability
12 / 12
Agent-Specific
17 / 20
Medical Task
25 / 25 Passed
- 88: Baseline CRP as predictor of sepsis mortality — what to adjust for (5/5)
- 87: Retrospective EHR cohort — identify bias and confounders before protocol finalization (5/5)
- 87: Case-control study of smoking and lupus — which variables to match on (5/5)
- 87: Mixed variable list including post-treatment response — role classification challenge (5/5)
- 86: Pressure-test a complex observational protocol with propensity score plan already drafted (5/5)
Veto Gates
Required pass for any deployment consideration
Skill Veto: ✓ All 4 gates passed
| Gate | Result | Detail |
|---|---|---|
| Operational Stability | PASS | System remains stable across varied inputs and edge cases |
| Structural Consistency | PASS | Output structure conforms to expected skill contract format |
| Result Determinism | PASS | Equivalent inputs produce semantically equivalent outputs |
| System Security | PASS | No prompt injection, data leakage, or unsafe tool use detected |

Research Veto: ✅ PASS — Applicable
| Dimension | Result | Detail |
|---|---|---|
| Scientific Integrity | PASS | No fabricated references, DOIs, PMIDs, statistical values, or clinical data detected. |
| Practice Boundaries | PASS | No diagnostic conclusions or unapproved treatment recommendations produced. |
| Methodological Ground | PASS | Collider bias, mediator misclassification, and immortal time bias all explicitly handled |
| Code Usability | N/A | No code generated; Category 2, design planning only |
Core Capability: 93 / 100 (8 Categories)
| Category | Score | Percent | Notes |
|---|---|---|---|
| Functional Suitability | 12 / 12 | 100% | Full marks; no significant issues detected. |
| Reliability | 10 / 12 | 83% | Strong handling of uncertain variable roles; gap: no fallback when the user provides no variable list at all. |
| Performance & Context | 7 / 8 | 88% | Strong score; minor gaps noted. |
| Agent Usability | 16 / 16 | 100% | Full marks; no significant issues detected. |
| Human Usability | 7 / 8 | 88% | Strong score; minor gaps noted. |
| Security | 12 / 12 | 100% | Full marks; no significant issues detected. |
| Maintainability | 12 / 12 | 100% | Full marks; no significant issues detected. |
| Agent-Specific | 17 / 20 | 85% | Strong critical posture enforced; description is well targeted but could be shortened and given natural trigger phrases. |

Core Capability Total: 93 / 100
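The category scores above can be cross-checked with a few lines of arithmetic. The sketch below is only a consistency check on the numbers shown in this report; the dictionary name, the helper names, and the round-half-up rule for percentages are assumptions made for illustration, not part of the scoring system itself.

```python
# Category scores copied from the report: (earned, maximum).
CORE_CAPABILITY = {
    "Functional Suitability": (12, 12),
    "Reliability": (10, 12),
    "Performance & Context": (7, 8),
    "Agent Usability": (16, 16),
    "Human Usability": (7, 8),
    "Security": (12, 12),
    "Maintainability": (12, 12),
    "Agent-Specific": (17, 20),
}


def percent(earned, maximum):
    """Whole-number percentage, rounding half up (assumed rounding rule)."""
    # floor(100 * earned / maximum + 0.5) using integer arithmetic only.
    return (200 * earned + maximum) // (2 * maximum)


def totals(scores):
    """Sum earned and maximum points across all categories."""
    earned = sum(e for e, _ in scores.values())
    maximum = sum(m for _, m in scores.values())
    return earned, maximum


if __name__ == "__main__":
    for name, (earned, maximum) in CORE_CAPABILITY.items():
        print(f"{name}: {earned} / {maximum} ({percent(earned, maximum)}%)")
    print("Total: %d / %d" % totals(CORE_CAPABILITY))
```

Under these assumptions the per-category percentages and the 93 / 100 total agree with the figures listed in the report.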
Medical Task
Execution Average: 87 / 100; Assertions: 25/25 Passed
Canonical: 88 / 100 ✅ Pass
Baseline CRP as predictor of sepsis mortality — what to adjust for
5/5 assertions passed.
Basic 35/40 | Specialized 53/60 | Total 88/100
- ✅ A1: Variable role map produced with explicit confounder/mediator/collider classifications
- ✅ A2: Time order established before any adjustment recommendation
- ✅ A3: Post-baseline variables excluded from baseline adjustment set with warning
- ✅ A4: Minimum sufficient control set defined with reasoning, not just variable list
- ✅ A5: Residual confounding acknowledged as non-removable in Section J
Pass rate: 5 / 5
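Assertions A1 to A3 describe an ordering of decisions: classify each variable's role, establish when it was measured, and only then decide what enters the baseline adjustment set. The following is a minimal sketch of that ordering, not the skill's actual implementation. All class names, the function name, and the example variables with their role labels are hypothetical and chosen only to exercise each branch; they are not clinical claims.

```python
from dataclasses import dataclass
from enum import Enum


class Role(Enum):
    CONFOUNDER = "confounder"
    MEDIATOR = "mediator"
    COLLIDER = "collider"
    UNCERTAIN = "uncertain"


class Timing(Enum):
    BASELINE = "baseline"
    POST_BASELINE = "post-baseline"


@dataclass(frozen=True)
class Variable:
    name: str
    role: Role
    timing: Timing


def plan_baseline_adjustment(variables):
    """Return (adjustment_set, notes) for a list of classified variables.

    Time order is checked first, so a post-baseline variable is excluded
    regardless of the role label it carries.
    """
    adjustment_set, notes = [], []
    for v in variables:
        if v.timing is Timing.POST_BASELINE:
            notes.append(f"{v.name}: measured after baseline; excluded")
        elif v.role is Role.CONFOUNDER:
            adjustment_set.append(v.name)
        elif v.role is Role.UNCERTAIN:
            notes.append(f"{v.name}: role uncertain; review before adjusting")
        else:
            notes.append(f"{v.name}: {v.role.value}; do not adjust")
    return adjustment_set, notes


# Hypothetical variable list (illustrative labels only).
example = [
    Variable("age", Role.CONFOUNDER, Timing.BASELINE),
    Variable("chronic_kidney_disease", Role.CONFOUNDER, Timing.BASELINE),
    Variable("day3_treatment_response", Role.MEDIATOR, Timing.POST_BASELINE),
    Variable("icu_admission", Role.COLLIDER, Timing.BASELINE),
    Variable("lactate", Role.UNCERTAIN, Timing.BASELINE),
]
adjustment_set, notes = plan_baseline_adjustment(example)
```

Role-uncertain variables are reported rather than forced into a category, which mirrors the labeling behavior the Edge test case checks for.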
Variant A: 87 / 100 ✅ Pass
Retrospective EHR cohort — identify bias and confounders before protocol finalization
5/5 assertions passed.
Basic 35/40 | Specialized 52/60 | Total 87/100
- ✅ A1: Immortal time bias risk assessed for EHR time-zero structure
- ✅ A2: Selection bias and confounding by indication both identified
- ✅ A3: Control strategy recommendation justified for the specific design context
- ✅ A4: Critical weak points section present with specific protocol revision recommendation
- ✅ A5: No fabricated variable availability or dataset fields assumed
Pass rate: 5 / 5
Variant B: 87 / 100 ✅ Pass
Case-control study of smoking and lupus — which variables to match on
5/5 assertions passed.
Basic 35/40 | Specialized 52/60 | Total 87/100
- ✅ A1: Variables on causal pathway explicitly excluded from matching recommendation
- ✅ A2: Overmatching risk and analytic consequences stated for each matching candidate
- ✅ A3: Alternative to matching (restriction or multivariable adjustment) presented as comparison
- ✅ A4: Collider bias risk checked for proposed matching variables
- ✅ A5: No recommendation to 'adjust for everything available'
Pass rate: 5 / 5
Edge: 87 / 100 ✅ Pass
Mixed variable list including post-treatment response — role classification challenge
5/5 assertions passed.
Basic 35/40 | Specialized 52/60 | Total 87/100
- ✅ A1: Post-treatment variables correctly classified as post-baseline and excluded from baseline adjustment
- ✅ A2: Role-uncertain variables labeled rather than forced into a false classification
- ✅ A3: Mediator vs confounder boundary explicitly addressed for ambiguous variables
- ✅ A4: Prediction variables correctly distinguished from confounders
- ✅ A5: Caution against statistical complexity masking bias explicitly stated
Pass rate: 5 / 5
Stress: 86 / 100 ✅ Pass
Pressure-test a complex observational protocol with propensity score plan already drafted
5/5 assertions passed.
Basic 35/40 | Specialized 51/60 | Total 86/100
- ✅ A1: Propensity score plan reviewed with explicit justification check (design, data quality, covariate set)
- ✅ A2: Critical review of proposed propensity model identifies whether covariate set is appropriate
- ✅ A3: Residual bias after propensity score method acknowledged
- ✅ A4: Bias mitigation actions provided per major risk identified
- ✅ A5: Practical next step section provides most useful immediate action
Pass rate: 5 / 5
Medical Task Total: 87 / 100
Key Strengths
- Variable role classification before any adjustment recommendation is a methodologically sound and rare discipline in AI-assisted protocol review
- Explicit collider bias detection and prohibition of mediator adjustment are strong differentiators from generic statistics advice
- Hard Rules effectively prevent the most common critical errors: adjust-for-everything, post-baseline variables in baseline set, propensity score by default
- Eight reference modules with precise step-level mapping provide comprehensive bias-sensing coverage
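To make concrete why collider detection matters, here is a small simulation sketch. It assumes a toy setup that is not drawn from the skill or its test cases: an exposure and an outcome generated as independent standard normal values, and a collider defined as their sum. The function names, sample size, threshold, and seed are all illustrative choices.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def simulate_collider(n=20000, threshold=1.0, seed=0):
    """Compare exposure-outcome correlation before and after collider selection.

    Exposure and outcome are independent by construction. The collider is
    their sum, and "conditioning" is modeled as keeping only the records
    where the collider exceeds the threshold.
    """
    rng = random.Random(seed)
    exposure = [rng.gauss(0, 1) for _ in range(n)]
    outcome = [rng.gauss(0, 1) for _ in range(n)]
    selected = [(x, y) for x, y in zip(exposure, outcome) if x + y > threshold]
    r_all = pearson(exposure, outcome)
    r_selected = pearson([x for x, _ in selected], [y for _, y in selected])
    return r_all, r_selected


if __name__ == "__main__":
    r_all, r_selected = simulate_collider()
    print(f"all records: r = {r_all:.3f}")
    print(f"selected on collider: r = {r_selected:.3f}")
```

In the full sample the correlation stays close to zero, while in the subset selected on the collider a clearly negative association appears even though none exists by construction. Adjusting for or matching on such a variable can manufacture this kind of spurious association.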