medication-reconciliation

Compare patient pre-admission medication lists with inpatient orders to automatically identify omitted or duplicated medications and improve medication safety.

Total Score: 89 / 100

Core Capability: 90 / 100
  • Functional Suitability: 12 / 12
  • Reliability: 11 / 12
  • Performance & Context: 7 / 8
  • Agent Usability: 15 / 16
  • Human Usability: 7 / 8
  • Security: 12 / 12
  • Maintainability: 11 / 12
  • Agent-Specific: 15 / 20

Medical Task: 20 / 20 Passed
  • 89: Reconcile pre-admission list against inpatient orders using --example (4/4)
  • 87: Patient with missing critical anticoagulant in inpatient orders (4/4)
  • 86: Patient with duplicate medication (same drug, same dose, different brand names) (4/4)
  • 85: Patient with dose change (same drug, different dose: Metformin 500mg vs 1000mg) (4/4)
  • 86: Malformed JSON input file (missing required fields) (4/4)

Veto Gates: Required pass for any deployment consideration

Skill Veto: ✓ All 4 gates passed
  • Operational Stability: PASS. System remains stable across varied inputs and edge cases.
  • Structural Consistency: PASS. Output structure conforms to expected skill contract format.
  • Result Determinism: PASS. Equivalent inputs produce semantically equivalent outputs.
  • System Security: PASS. No prompt injection, data leakage, or unsafe tool use detected.

Core Capability: 90 / 100 (8 Categories)

Functional Suitability: 12 / 12 (100%)
Dose-change detection now documented in workflow (step 6) and output format with JSON example. All six output categories covered: continued, dose_changed, discontinued, new_medications, duplicates, warnings.

Reliability: 11 / 12 (92%)
Comprehensive error handling retained. PHI check step added as step 1 with explicit user confirmation prompt before processing.

Performance & Context: 7 / 8 (88%)
SKILL.md is 126 lines, which is concise. Minor: example data generation still creates files on disk as a side effect of the --example flag.

Agent Usability: 15 / 16 (94%)
Workflow steps are clear and clinically logical. PHI check and dose-change detection steps are now explicit. Output template well-defined. Minor: --verbose flag behavior still not fully documented.

Human Usability: 7 / 8 (88%)
Description is natural and discoverable. HIPAA compliance reminder is appropriate and now enforced via workflow step.

Security: 12 / 12 (100%)
PHI check step now explicitly prompts user to confirm de-identification before processing. Medical disclaimer and HIPAA reminder present. No hardcoded secrets.

Maintainability: 11 / 12 (92%)
Clean class separation retained. Dose-change detection documented in SKILL.md. Drug synonym mapping externalized as class constant.

Agent-Specific: 15 / 20 (75%)
Trigger precision good. Escape hatches present. Idempotent by design. Composability still limited: no structured API mode. Critical drug class list still hardcoded.

Core Capability Total: 90 / 100

Medical Task: Execution Average 88.4 / 100, Assertions 20/20 Passed

Canonical (89 / 100): ✅ Pass
Reconcile pre-admission list against inpatient orders using --example

PHI check step now prompts for de-identification confirmation before processing. Example data runs correctly. Atorvastatin/Lipitor synonym match works. Report structure complete.

Basic 37/40 | Specialized 52/60 | Total 89/100
  • A1: Output report contains continued, discontinued, new_medications, and duplicates sections
  • A2: Drug synonym matching correctly identifies Atorvastatin/Lipitor as the same drug
  • A3: PHI check step prompts user to confirm de-identification before processing
  • A4: Medical disclaimer present in SKILL.md and output
Pass rate: 4 / 4
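
To illustrate the kind of brand/generic matching that assertion A2 exercises, here is a minimal sketch. The table `BRAND_TO_GENERIC` and the helpers `normalize_drug_name` and `same_drug` are hypothetical names chosen for this example; the skill's actual synonym mapping is not shown in this report and may be structured differently.

```python
# Hypothetical brand-to-generic table (an assumption for illustration only;
# the skill's real mapping is not reproduced in this report).
BRAND_TO_GENERIC = {
    "lipitor": "atorvastatin",
    "glucophage": "metformin",
    "coumadin": "warfarin",
}


def normalize_drug_name(name: str) -> str:
    """Return a lowercase generic name for a brand or generic input."""
    key = name.strip().lower()
    # Unknown names fall through unchanged so generics map to themselves.
    return BRAND_TO_GENERIC.get(key, key)


def same_drug(a: str, b: str) -> bool:
    """True when both names normalize to the same generic name."""
    return normalize_drug_name(a) == normalize_drug_name(b)
```

Under these assumptions, `same_drug("Atorvastatin", "Lipitor")` evaluates to `True`, while names that normalize to different generics do not match.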
Variant A (87 / 100): ✅ Pass
Patient with missing critical anticoagulant in inpatient orders

Critical drug class detection correctly fires for anticoagulant. Warning level set to 'critical'. Recommendation generated for physician review.

Basic 36/40 | Specialized 51/60 | Total 87/100
  • A1: Critical warning generated for missing anticoagulant
  • A2: Warning level correctly set to 'critical' (not 'info')
  • A3: Recommendation includes physician review suggestion
  • A4: Output does not prescribe or recommend specific medications
Pass rate: 4 / 4
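
The behavior checked above can be sketched roughly as follows. This is an illustration under stated assumptions, not the skill's code: the `CRITICAL_CLASSES` table, the function names, the warning fields, and the choice of 'info' for non-critical omissions are all hypothetical.

```python
# Hypothetical critical-class table; the report only says the real list is hardcoded.
CRITICAL_CLASSES = {
    "anticoagulant": {"warfarin", "apixaban", "rivaroxaban"},
}


def drug_class(generic):
    """Return the critical class containing this generic name, or None."""
    for class_name, members in CRITICAL_CLASSES.items():
        if generic in members:
            return class_name
    return None


def missing_medication_warnings(pre_admission, inpatient):
    """Flag pre-admission drugs absent from inpatient orders.

    Drugs in a critical class get level 'critical'; others get 'info'
    (the non-critical level is an assumption made for this sketch).
    """
    ordered = {name.strip().lower() for name in inpatient}
    warnings = []
    for name in pre_admission:
        generic = name.strip().lower()
        if generic in ordered:
            continue
        critical_class = drug_class(generic)
        warnings.append({
            "drug": generic,
            "drug_class": critical_class,
            "level": "critical" if critical_class else "info",
            # Only asks for review; never suggests a specific medication.
            "recommendation": "Not found in inpatient orders; physician review recommended.",
        })
    return warnings
```

The recommendation text deliberately stops at requesting physician review, in line with assertion A4.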
Edge (86 / 100): ✅ Pass
Patient with duplicate medication (same drug, same dose, different brand names)

Duplicate detection correctly identifies same generic name + same dose as duplicate. Warning generated.

Basic 36/40 | Specialized 50/60 | Total 86/100
  • A1: Duplicate medication correctly identified
  • A2: Duplicate warning generated with both drug names
  • A3: Duplicate count reflected in summary
  • A4: Output does not make clinical decision about duplicate
Pass rate: 4 / 4
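
The "same generic name + same dose" rule described above could be implemented along these lines. This is a sketch only: the `name` and `dose` field names, the `BRAND_TO_GENERIC` table, and `find_duplicates` are assumptions for illustration and are not taken from the skill's source.

```python
from collections import defaultdict

# Hypothetical brand-to-generic table, for illustration only.
BRAND_TO_GENERIC = {"lipitor": "atorvastatin"}


def find_duplicates(orders):
    """Group orders by (generic name, dose) and report groups with more than one entry.

    Each order is assumed to be a dict with 'name' and 'dose' keys.
    """
    groups = defaultdict(list)
    for order in orders:
        name = order["name"].strip().lower()
        generic = BRAND_TO_GENERIC.get(name, name)
        # Normalize spacing and case so "20 mg" and "20mg" compare equal.
        dose = order["dose"].replace(" ", "").lower()
        groups[(generic, dose)].append(order["name"])
    return [
        {"generic": generic, "dose": dose, "names": names}
        for (generic, dose), names in groups.items()
        if len(names) > 1
    ]
```

Each returned entry keeps the names exactly as ordered, so a warning can cite both of them (assertion A2), and the length of the list gives the duplicate count for a summary (assertion A3).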
Variant B (85 / 100): ✅ Pass
Patient with dose change (same drug, different dose: Metformin 500mg vs 1000mg)

Dose-change detection now documented in workflow step 6 and output format. Metformin 500mg vs 1000mg correctly flagged as dose_changed with physician verification warning.

Basic 35/40 | Specialized 50/60 | Total 85/100
  • A1: Dose change between pre-admission and inpatient order is detected and flagged as dose_changed
  • A2: Output correctly identifies the drug as present in both lists
  • A3: Dose-change warning includes physician verification message
  • A4: Output does not make clinical judgment about dose change
Pass rate: 4 / 4
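
As a rough illustration of how a drug present in both lists might be routed to dose_changed rather than continued, consider the sketch below. The category names come from this report; the function `reconcile`, its input shape (generic name mapped to a dose string), and the warning wording are assumptions for the example.

```python
def reconcile(pre_admission, inpatient):
    """Classify drugs by comparing two {generic_name: dose} mappings.

    The input shape is an assumption for this sketch; the skill's actual
    input format is not shown in this report.
    """
    result = {
        "continued": [],
        "dose_changed": [],
        "discontinued": [],
        "new_medications": [],
    }
    for drug, home_dose in pre_admission.items():
        if drug not in inpatient:
            result["discontinued"].append(drug)
        elif inpatient[drug] == home_dose:
            result["continued"].append(drug)
        else:
            # Present in both lists with a different dose: flag, do not judge.
            result["dose_changed"].append({
                "drug": drug,
                "pre_admission_dose": home_dose,
                "inpatient_dose": inpatient[drug],
                "warning": "Dose differs from pre-admission list; physician verification required.",
            })
    result["new_medications"] = [d for d in inpatient if d not in pre_admission]
    return result
```

With a pre-admission entry of metformin at 500mg and an inpatient order at 1000mg, this sketch places metformin under dose_changed and attaches a verification message without stating which dose is correct.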
Stress (86 / 100): ✅ Pass
Malformed JSON input file (missing required fields)

JSONDecodeError caught and reported clearly. Script exits with error code 1. No crash or silent failure.

Basic 36/40 | Specialized 50/60 | Total 86/100
  • A1: Script does not crash on malformed JSON input
  • A2: Error message clearly identifies the JSON parsing failure
  • A3: Script exits with non-zero exit code on error
  • A4: No partial output written on error
Pass rate: 4 / 4
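
The error-handling pattern described above can be sketched as follows, using Python's standard `json` module. The names `InputError`, `REQUIRED_FIELDS`, `parse_medication_list`, and `run` are hypothetical, as is the assumption that the input is a JSON array of objects with `name` and `dose` fields.

```python
import json
import sys

# Hypothetical required fields; the skill's real schema is not shown in this report.
REQUIRED_FIELDS = ("name", "dose")


class InputError(Exception):
    """Raised when the input cannot be parsed or is missing required fields."""


def parse_medication_list(text):
    """Parse and validate a JSON medication list, raising InputError on failure."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError as exc:
        raise InputError(
            f"Invalid JSON at line {exc.lineno}, column {exc.colno}: {exc.msg}"
        ) from exc
    if not isinstance(data, list):
        raise InputError("Expected a JSON array of medication objects")
    for index, item in enumerate(data):
        if not isinstance(item, dict):
            raise InputError(f"Medication #{index} is not a JSON object")
        missing = [field for field in REQUIRED_FIELDS if field not in item]
        if missing:
            raise InputError(
                f"Medication #{index} is missing required field(s): {', '.join(missing)}"
            )
    return data


def run(text):
    """Return a process exit code: 0 on success, 1 on any input error."""
    try:
        medications = parse_medication_list(text)
    except InputError as exc:
        print(f"Error: {exc}", file=sys.stderr)
        return 1
    # Output is produced only after validation succeeds, so nothing partial is written.
    print(json.dumps(medications))
    return 0
```

A caller would typically finish with `sys.exit(run(text))`. Validating everything before printing is one simple way to satisfy assertion A4.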
Medical Task Total: 88.4 / 100

Key Strengths

  • PHI check step now enforced as step 1 in workflow — mandatory de-identification confirmation before any patient data is processed
  • Dose-change detection fully documented in workflow and output format with JSON example, closing the clinically significant gap from v1
  • Drug synonym mapping (brand/generic) enables robust matching across naming conventions
  • Critical drug class detection with tiered warning levels (critical/warning/info) is clinically meaningful
  • Comprehensive error handling with clear exit codes and actionable error messages
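
As an illustration of the mandatory de-identification confirmation named in the first strength, a confirmation gate might look like the sketch below. The function name `confirm_deidentified`, the prompt wording, and the accepted answers are assumptions; the report does not show how the skill phrases or implements this step.

```python
def confirm_deidentified(ask=input):
    """Ask the user to confirm that the data has been de-identified.

    `ask` is injectable so the prompt can be exercised without a terminal.
    Anything other than an explicit yes is treated as a refusal.
    """
    answer = ask("Has all patient data been de-identified? [y/N] ")
    return answer.strip().lower() in {"y", "yes"}


def process_if_confirmed(process, ask=input):
    """Run `process` only after confirmation; otherwise return None untouched."""
    if not confirm_deidentified(ask):
        return None
    return process()
```

Defaulting to refusal on empty or unrecognized input keeps the gate conservative: processing starts only on an explicit confirmation.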