lab-inventory-predictor

Predict depletion time of critical lab reagents based on historical usage frequency, and automatically generate purchase alerts when stock falls below safety thresholds.

Total Score: 83 / 100
Core Capability: 85 / 100
Functional Suitability: 11 / 12
Reliability: 10 / 12
Performance & Context: 7 / 8
Agent Usability: 14 / 16
Human Usability: 7 / 8
Security: 10 / 12
Maintainability: 10 / 12
Agent-Specific: 16 / 20
Medical Task: 20 / 20 Passed
83: Add reagent and record usage, then check status (4/4)
83: Generate purchase alerts for multiple reagents near threshold (4/4)
80: Reagent with zero usage history: depletion prediction requested (4/4)
82: Generate full inventory report in JSON format (4/4)
78: Request to predict depletion for 20 reagents with irregular usage patterns (4/4)

Veto Gates: required pass for any deployment consideration

Skill Veto: ✓ All 4 gates passed
Operational Stability
System remains stable across varied inputs and edge cases
PASS
Structural Consistency
Output structure conforms to expected skill contract format
PASS
Result Determinism
Equivalent inputs produce semantically equivalent outputs
PASS
System Security
No prompt injection, data leakage, or unsafe tool use detected
PASS

Core Capability: 85 / 100 (8 Categories)

Functional Suitability: 11 / 12 (92%)
All five core capabilities documented; LOW_CONFIDENCE flag specified; prediction algorithm clearly stated; per-reagent inline risk note mandated in Response Template
Reliability: 10 / 12 (83%)
Fallback behavior documented; LOW_CONFIDENCE flag added for fewer than 3 usage records; path traversal rejection in Error Handling; per-reagent inline risk note mandated
Performance & Context: 7 / 8 (88%)
No external dependencies, which keeps the skill efficient; SKILL.md is lean at 199 lines
Agent Usability: 14 / 16 (88%)
Workflow steps clear; response template well-defined; LOW_CONFIDENCE flag guidance added; per-reagent inline risk note mandated in Response Template
Human Usability: 7 / 8 (88%)
Description is natural and discoverable; error forgiveness is good thanks to the fallback template
Security: 10 / 12 (83%)
Path traversal rejection explicitly documented in Error Handling for --data-file; no hardcoded secrets; no injection vectors
Maintainability: 10 / 12 (83%)
Script is 565 lines with clear class structure; SKILL.md well-separated; Python 3.8+ requirement prominently stated with upgrade instructions
Agent-Specific: 16 / 20 (80%)
Trigger precision good; progressive disclosure present; escape hatches documented; LOW_CONFIDENCE flag closes idempotency concern on sparse data; per-reagent inline risk note mandated
Core Capability Total: 85 / 100
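
The path traversal rejection credited under Reliability and Security could look roughly like the sketch below. This is not the skill's actual code: the function name `resolve_data_file` and the idea of a fixed `base_dir` are assumptions made for illustration. It uses `Path.relative_to` rather than `is_relative_to` so that it stays compatible with the stated Python 3.8+ requirement.

```python
from pathlib import Path


def resolve_data_file(user_path: str, base_dir: str) -> Path:
    """Resolve a --data-file value and reject anything outside base_dir.

    Hypothetical helper: the real script's names and layout may differ.
    """
    base = Path(base_dir).resolve()
    # Joining with an absolute user_path discards base, so the containment
    # check below also covers absolute paths, not only "../" sequences.
    candidate = (base / user_path).resolve()
    try:
        candidate.relative_to(base)  # raises ValueError if outside base
    except ValueError:
        raise ValueError(
            f"rejected --data-file outside {base}: {user_path!r}"
        ) from None
    return candidate
```

Resolving both paths before comparing them means symlinks and `..` segments are normalized first, so the check is made on the location that would actually be opened.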

Medical Task: Execution Average 81.2 / 100; Assertions 20/20 Passed

Canonical ✅ Pass (83 / 100)
Add reagent and record usage, then check status

Script requires Python 3.8+ (dataclasses); evaluated via Mode A. Python version requirement prominently documented. All output fields present.

Basic 33/40 | Specialized 50/60 | Total 83/100
A1: Output includes reagent name, current stock, and predicted depletion date
A2: Output separates assumptions from deliverables
A3: Output does not fabricate inventory data
A4: Output stays within lab inventory scope
Pass rate: 4 / 4
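
For readers unfamiliar with how a depletion date can be derived from usage records, the sketch below shows one simple approach: average the recorded consumption over the days since the first record, then divide the current stock by that rate. The evaluated script's exact algorithm is not reproduced in this report, so the averaging rule and the name `estimate_depletion` are assumptions.

```python
from datetime import date, timedelta
from typing import List, Optional, Tuple


def estimate_depletion(
    current_stock: float,
    usage: List[Tuple[date, float]],
    today: date,
) -> Optional[date]:
    """Estimate a depletion date from (date, amount_used) records.

    Assumed rule: daily consumption = total used / days since first record.
    Returns None when no rate can be derived.
    """
    if not usage:
        return None
    total_used = sum(amount for _, amount in usage)
    first_day = min(day for day, _ in usage)
    span_days = max((today - first_day).days, 1)  # avoid a zero-day span
    daily_consumption = total_used / span_days
    if daily_consumption <= 0:
        return None
    return today + timedelta(days=int(current_stock / daily_consumption))
```

With 100 units in stock and 20 units used over the previous 10 days, the rate is 2 units per day, which puts depletion 50 days out.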
Variant A ✅ Pass (83 / 100)
Generate purchase alerts for multiple reagents near threshold

Alert logic correctly applies both time-based and stock-based triggers per documented algorithm.

Basic 33/40 | Specialized 50/60 | Total 83/100
A1: Output lists reagents triggering alerts with reason (time-based or stock-based)
A2: Output includes safety_days and lead_time_days in alert rationale
A3: Output does not recommend purchasing reagents not near threshold
A4: Output includes next-step checks
Pass rate: 4 / 4
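
The two alert triggers mentioned above can be illustrated with a short sketch. The thresholds here are assumptions, not the documented algorithm: the time-based trigger is taken to fire when the remaining days of stock are at most `safety_days + lead_time_days`, and the stock-based trigger when stock is at or below a hypothetical `min_stock` level.

```python
from typing import List


def alert_reasons(
    current_stock: float,
    daily_consumption: float,
    safety_days: int,
    lead_time_days: int,
    min_stock: float,
) -> List[str]:
    """Return the triggers that fire for one reagent (possibly none).

    Assumed rules, for illustration only:
      time-based:  days of stock left <= safety_days + lead_time_days
      stock-based: current_stock <= min_stock
    """
    reasons: List[str] = []
    if daily_consumption > 0:  # no rate means no time-based estimate
        days_left = current_stock / daily_consumption
        if days_left <= safety_days + lead_time_days:
            reasons.append("time-based")
    if current_stock <= min_stock:
        reasons.append("stock-based")
    return reasons
```

Returning the list of reasons rather than a boolean lets the alert text state why a reagent was flagged, which is what assertions A1 and A2 check for.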
Edge ✅ Pass (80 / 100)
Reagent with zero usage history: depletion prediction requested

Division-by-zero risk when daily_consumption=0; the skill correctly falls back and states that depletion cannot be predicted without usage history. Fallback structure complete.

Basic 32/40 | Specialized 48/60 | Total 80/100
A1: Output explicitly states that depletion cannot be predicted without usage history
A2: Output does not fabricate a consumption rate
A3: Output provides a next-step recommendation (record at least one usage event)
A4: Output uses the documented fallback structure
Pass rate: 4 / 4
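
A guard of the following shape would avoid the division-by-zero case and produce a fallback instead of a fabricated rate. The field names (`status`, `reason`, `next_step`) and the `NO_PREDICTION` value are hypothetical; the skill's documented fallback structure is not quoted in this report.

```python
from typing import Any, Dict


def predict_or_fallback(
    name: str, current_stock: float, daily_consumption: float
) -> Dict[str, Any]:
    """Return a prediction, or a fallback record when no rate is available.

    Field names are illustrative, not the skill's documented structure.
    """
    if daily_consumption <= 0:
        # Never divide by zero and never invent a consumption rate.
        return {
            "reagent": name,
            "status": "NO_PREDICTION",
            "reason": "no usage history; depletion cannot be predicted",
            "next_step": "record at least one usage event",
        }
    return {
        "reagent": name,
        "status": "OK",
        "days_until_depletion": current_stock / daily_consumption,
    }
```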
Variant B ✅ Pass (82 / 100)
Generate full inventory report in JSON format

Report action with --format json produces structured output correctly.

Basic 33/40 | Specialized 49/60 | Total 82/100
A1: Output is valid JSON when --format json is specified
A2: Output includes all reagents with stock, consumption rate, and depletion date
A3: Output does not include fabricated data
A4: Output scope stays within inventory reporting
Pass rate: 4 / 4
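
As a rough illustration of what assertions A1 and A2 require, the sketch below serializes one row per reagent with stock, consumption rate, and depletion date. The key names and the top-level `reagents` wrapper are assumptions; the script's real JSON schema is not shown in this report.

```python
import json
from datetime import date
from typing import Any, Dict, List, Optional


def build_json_report(reagents: List[Dict[str, Any]]) -> str:
    """Serialize an inventory report; key names are hypothetical."""
    rows = []
    for r in reagents:
        depletion: Optional[date] = r.get("depletion_date")
        rows.append(
            {
                "name": r["name"],
                "current_stock": r["current_stock"],
                "daily_consumption": r["daily_consumption"],
                # Dates are not JSON-serializable; emit ISO 8601 or null.
                "depletion_date": depletion.isoformat() if depletion else None,
            }
        )
    return json.dumps({"reagents": rows}, indent=2)
```

Emitting `null` for a missing depletion date keeps the output valid JSON without inventing a value for reagents that have no prediction.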
Stress ✅ Pass (78 / 100)
Request to predict depletion for 20 reagents with irregular usage patterns

LOW_CONFIDENCE flag documented and emitted for reagents with fewer than 3 usage records. Per-reagent inline risk note mandated in Response Template.

Basic 32/40 | Specialized 46/60 | Total 78/100
A1: Output covers all 20 reagents without truncation
A2: Output flags reagents with fewer than 3 usage records as LOW_CONFIDENCE predictions
A3: Output does not fabricate usage data for reagents with no records
A4: Per-reagent inline risk note is emitted adjacent to each LOW_CONFIDENCE prediction
Pass rate: 4 / 4
Medical Task Total: 81.2 / 100
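
The LOW_CONFIDENCE behavior exercised by the stress case can be sketched as follows. The threshold of 3 usage records comes from the evaluation notes above; the function name, line format, and wording of the risk note are assumptions.

```python
MIN_RECORDS_FOR_CONFIDENCE = 3  # threshold stated in the evaluation notes


def format_prediction(name: str, usage_records: int, depletion_date: str) -> str:
    """Render one prediction line, flagging sparse data inline.

    The line format and note wording are illustrative only.
    """
    line = f"{name}: predicted depletion {depletion_date}"
    if usage_records < MIN_RECORDS_FOR_CONFIDENCE:
        # Keep the risk note on the same line as the prediction it qualifies.
        line += (
            f" [LOW_CONFIDENCE: based on {usage_records} usage record(s); "
            "verify manually]"
        )
    return line
```

Attaching the note to the prediction line itself, rather than to a summary at the end, matches the per-reagent inline requirement checked by assertion A4 of the stress case.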

Key Strengths

  • Comprehensive prediction algorithm with both time-based and stock-based alert triggers clearly documented
  • LOW_CONFIDENCE flag for sparse usage data with per-reagent inline risk note mandated in Response Template
  • Strong fallback behavior with explicit error reporting and manual recovery path
  • No external dependencies makes the skill highly portable and stable