
lab-inventory-predictor

Predict depletion time of critical lab reagents based on historical usage frequency, and automatically generate purchase alerts when stock falls below safety thresholds.
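A minimal sketch of depletion prediction from historical usage follows. It assumes usage is logged as hypothetical `(date, amount)` pairs and that the consumption rate is the average daily usage since the first record. It illustrates the idea and is not the skill's actual script. Days remaining are rounded down, so the estimate errs on the early side.

```python
from datetime import date, timedelta
from typing import List, Optional, Tuple


def predict_depletion(
    current_stock: float,
    usage_log: List[Tuple[date, float]],  # hypothetical format: (day, amount used)
    today: date,
) -> Optional[date]:
    """Estimate the depletion date from average daily consumption.

    Returns None when there is no usage history, because no rate can be inferred.
    """
    if not usage_log:
        return None
    first_day = min(day for day, _ in usage_log)
    span_days = max((today - first_day).days, 1)  # avoid a zero-length window
    total_used = sum(amount for _, amount in usage_log)
    daily_consumption = total_used / span_days
    if daily_consumption <= 0:
        return None
    days_left = current_stock / daily_consumption
    # Round down so the predicted date is never later than the estimate
    return today + timedelta(days=int(days_left))
```

For example, 100 units in stock with 20 units used over the last 10 days gives a rate of 2 units/day and a predicted depletion 50 days out.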

Total Score: 83 / 100

Core Capability: 85 / 100
  Functional Suitability: 11 / 12
  Reliability: 10 / 12
  Performance & Context: 7 / 8
  Agent Usability: 14 / 16
  Human Usability: 7 / 8
  Security: 10 / 12
  Maintainability: 10 / 12
  Agent-Specific: 16 / 20

Medical Task: 20 / 20 Passed

Add reagent and record usage, then check status: 83 / 100, 4/4
Generate purchase alerts for multiple reagents near threshold: 83 / 100, 4/4
Reagent with zero usage history (depletion prediction requested): 80 / 100, 4/4
Generate full inventory report in JSON format: 82 / 100, 4/4
Request to predict depletion for 20 reagents with irregular usage patterns: 78 / 100, 4/4

Veto Gates: required pass for any deployment consideration

Skill Veto: ✓ All 4 gates passed

✓ Operational Stability (PASS): System remains stable across varied inputs and edge cases
✓ Structural Consistency (PASS): Output structure conforms to the expected skill contract format
✓ Result Determinism (PASS): Equivalent inputs produce semantically equivalent outputs
✓ System Security (PASS): No prompt injection, data leakage, or unsafe tool use detected

Core Capability: 85 / 100 (8 categories)

Functional Suitability

All five core capabilities documented; LOW_CONFIDENCE flag specified; prediction algorithm clearly stated; per-reagent inline risk note mandated in Response Template

11 / 12

92%

Reliability

Fallback behavior documented; LOW_CONFIDENCE flag added for fewer than 3 usage records; path traversal rejection in Error Handling; per-reagent inline risk note mandated

10 / 12

83%

Performance & Context

No external dependencies (efficient); SKILL.md is 199 lines (lean)

7 / 8

88%

Agent Usability

Workflow steps clear; response template well-defined; LOW_CONFIDENCE flag guidance added; per-reagent inline risk note mandated in Response Template

14 / 16

88%

Human Usability

Description is natural and discoverable; error forgiveness is good via the fallback template

7 / 8

88%

Security

Path traversal rejection explicitly documented in Error Handling for --data-file; no hardcoded secrets; no injection vectors

10 / 12

83%
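The path traversal rejection noted under Security can be sketched as follows. The helper name `resolve_data_file` and the idea of a fixed base directory are assumptions, since the report does not show the skill's actual check. The sketch uses `os.path.commonpath`, which works on Python 3.8 (unlike `Path.is_relative_to`, added in 3.9), and raises `ValueError` for any path that resolves outside the base.

```python
import os


def resolve_data_file(base_dir: str, user_path: str) -> str:
    """Resolve a user-supplied --data-file path, rejecting anything outside base_dir.

    Hypothetical helper for illustration only.
    """
    base = os.path.realpath(base_dir)
    # realpath collapses "..", and an absolute user_path replaces base in join()
    candidate = os.path.realpath(os.path.join(base, user_path))
    # The common path equals base only when candidate lies inside base
    if os.path.commonpath([base, candidate]) != base:
        raise ValueError("Rejected data file outside %s: %r" % (base, user_path))
    return candidate
```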

Maintainability

Script 565 lines with clear class structure; SKILL.md well-separated; Python 3.8+ requirement prominently stated with upgrade instructions

10 / 12

83%

Agent-Specific

Trigger precision good; progressive disclosure present; escape hatches documented; LOW_CONFIDENCE flag closes idempotency concern on sparse data; per-reagent inline risk note mandated

16 / 20

80%

Core Capability Total: 85 / 100

Medical Task Execution Average: 81.2 / 100 (Assertions: 20/20 Passed)

Canonical: Add reagent and record usage, then check status (4/4 ✓)
Variant A: Generate purchase alerts for multiple reagents near threshold (4/4 ✓)
Edge: Reagent with zero usage history, depletion prediction requested (4/4 ✓)
Variant B: Generate full inventory report in JSON format (4/4 ✓)
Stress: Request to predict depletion for 20 reagents with irregular usage patterns (4/4 ✓)

Canonical: ✅ Pass

Add reagent and record usage, then check status

The script requires Python 3.8+ (it uses dataclasses) and was evaluated via Mode A. The Python version requirement is prominently documented, and all output fields are present.

Basic 33/40 | Specialized 50/60 | Total 83/100

✅ A1: Output includes reagent name, current stock, and predicted depletion date

✅ A2: Output separates assumptions from deliverables

✅ A3: Output does not fabricate inventory data

✅ A4: Output stays within lab inventory scope

Pass rate: 4 / 4

Variant A: ✅ Pass

Generate purchase alerts for multiple reagents near threshold

Alert logic correctly applies both time-based and stock-based triggers per documented algorithm.

Basic 33/40 | Specialized 50/60 | Total 83/100

✅ A1: Output lists reagents triggering alerts with reason (time-based or stock-based)

✅ A2: Output includes safety_days and lead_time_days in alert rationale

✅ A3: Output does not recommend purchasing reagents not near threshold

✅ A4: Output includes next-step checks

Pass rate: 4 / 4
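The report does not reproduce the documented alert algorithm, so the sketch below assumes a common formulation: a stock-based alert fires when current stock is at or below a hypothetical `min_stock` threshold, and a time-based alert fires when predicted days remaining fall within `lead_time_days + safety_days`. The `Reagent` fields other than `safety_days` and `lead_time_days` are illustrative.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Reagent:
    name: str
    current_stock: float
    daily_consumption: float  # average units used per day
    min_stock: float          # hypothetical stock-based threshold
    lead_time_days: int
    safety_days: int


def purchase_alerts(reagents: List[Reagent]) -> List[dict]:
    """Return one alert per reagent that trips either trigger, with its reasons."""
    alerts = []
    for r in reagents:
        reasons = []
        if r.current_stock <= r.min_stock:
            reasons.append("stock-based")
        if r.daily_consumption > 0:  # skip the time check when no rate is known
            days_left = r.current_stock / r.daily_consumption
            if days_left <= r.lead_time_days + r.safety_days:
                reasons.append("time-based")
        if reasons:  # reagents not near any threshold get no alert
            alerts.append({
                "name": r.name,
                "reasons": reasons,
                "lead_time_days": r.lead_time_days,
                "safety_days": r.safety_days,
            })
    return alerts
```

Recording every trigger that fired, along with the lead time and safety margin, keeps the alert rationale explicit.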

Edge: ✅ Pass

Reagent with zero usage history (depletion prediction requested)

A division-by-zero risk exists when daily_consumption = 0; the skill correctly falls back and states that no consumption assumption can be made. The fallback structure is complete.

Basic 32/40 | Specialized 48/60 | Total 80/100

✅ A1: Output explicitly states that depletion cannot be predicted without usage history

✅ A2: Output does not fabricate a consumption rate

✅ A3: Output provides a next-step recommendation (record at least one usage event)

✅ A4: Output uses the documented fallback structure

Pass rate: 4 / 4
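A minimal sketch of the zero-usage guard described in the Edge case. The result dictionary and its field names are hypothetical and do not reproduce the skill's documented fallback structure. The point is that a zero or missing rate short-circuits before any division.

```python
from typing import Optional


def depletion_or_fallback(name: str, current_stock: float,
                          daily_consumption: Optional[float]) -> dict:
    """Return a prediction, or an explicit fallback when no rate is known."""
    if not daily_consumption:  # None or 0: no consumption rate can be inferred
        return {
            "name": name,
            "current_stock": current_stock,
            "predicted_days_remaining": None,  # never fabricate a rate
            "note": "Depletion cannot be predicted without usage history.",
            "next_step": "Record at least one usage event.",
        }
    return {
        "name": name,
        "current_stock": current_stock,
        "predicted_days_remaining": current_stock / daily_consumption,
    }
```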

Variant B: ✅ Pass

Generate full inventory report in JSON format

Report action with --format json produces structured output correctly.

Basic 33/40 | Specialized 49/60 | Total 82/100

✅ A1: Output is valid JSON when --format json is specified

✅ A2: Output includes all reagents with stock, consumption rate, and depletion date

✅ A3: Output does not include fabricated data

✅ A4: Output scope stays within inventory reporting

Pass rate: 4 / 4
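A sketch of a report renderer with a JSON mode, in the spirit of `--format json`. The row fields and the `{"reagents": [...]}` envelope are assumptions. `default=str` lets values such as `datetime.date` serialize as ISO strings instead of raising `TypeError`.

```python
import json
from typing import Dict, List


def render_report(rows: List[Dict], fmt: str = "text") -> str:
    """Render inventory rows as JSON or as plain text lines."""
    if fmt == "json":
        # default=str converts non-JSON types (e.g. datetime.date) via str()
        return json.dumps({"reagents": rows}, indent=2, default=str)
    return "\n".join(
        f"{r['name']}: stock={r['current_stock']}, "
        f"rate={r['daily_consumption']}/day, depletes {r['depletion_date']}"
        for r in rows
    )
```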

Stress: ✅ Pass

Request to predict depletion for 20 reagents with irregular usage patterns

LOW_CONFIDENCE flag documented and emitted for reagents with fewer than 3 usage records. Per-reagent inline risk note mandated in Response Template.

Basic 32/40 | Specialized 46/60 | Total 78/100

✅ A1: Output covers all 20 reagents without truncation

✅ A2: Output flags reagents with fewer than 3 usage records as LOW_CONFIDENCE predictions

✅ A3: Output does not fabricate usage data for reagents with no records

✅ A4: Per-reagent inline risk note is emitted adjacent to each LOW_CONFIDENCE prediction

Pass rate: 4 / 4
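A sketch of the LOW_CONFIDENCE rule with the risk note placed directly under the flagged prediction. The threshold of 3 usage records comes from the report. The function name, line format, and risk-note wording are hypothetical.

```python
from typing import List

LOW_CONFIDENCE_MIN_RECORDS = 3  # threshold stated in the evaluation


def format_prediction(name: str, days_remaining: float,
                      usage_record_count: int) -> List[str]:
    """Format one reagent's prediction, adding a flag and inline risk note if sparse."""
    lines = [f"{name}: ~{days_remaining:.0f} days of stock remaining"]
    if usage_record_count < LOW_CONFIDENCE_MIN_RECORDS:
        lines[0] += " [LOW_CONFIDENCE]"
        # Risk note emitted adjacent to the flagged prediction
        lines.append(f"  Risk: based on only {usage_record_count} usage record(s); "
                     "treat this estimate as provisional.")
    return lines
```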

Medical Task Total: 81.2 / 100

Key Strengths

Comprehensive prediction algorithm with both time-based and stock-based alert triggers clearly documented
LOW_CONFIDENCE flag for sparse usage data with per-reagent inline risk note mandated in Response Template
Strong fallback behavior with explicit error reporting and manual recovery path
No external dependencies, making the skill highly portable and stable