
cold-chain-risk-calculator

Calculate temperature excursion risks for cold chain transport. Assesses route risk, packaging suitability, and monitoring requirements for biological samples and pharmaceuticals requiring controlled-temperature shipping.

78 / 100 Total Score
Core Capability
79 / 100
Functional Suitability
10 / 12
Reliability
9 / 12
Performance & Context
6 / 8
Agent Usability
13 / 16
Human Usability
7 / 8
Security
10 / 12
Maintainability
10 / 12
Agent-Specific
14 / 20
Medical Task
11 / 12 Passed
NYC-Boston 48h dry-ice shipment: 80 / 100, 4/4 assertions
LAX-London 120h liquid-nitrogen shipment: 76 / 100, 4/4 assertions
Negative duration input (-5 hours): 74 / 100, 3/4 assertions

Veto Gates: required pass for any deployment consideration

Skill Veto: ✓ All 4 gates passed
Operational Stability
System remains stable across varied inputs and edge cases
PASS
Structural Consistency
Output structure conforms to expected skill contract format
PASS
Result Determinism
Equivalent inputs produce semantically equivalent outputs
PASS
System Security
No prompt injection, data leakage, or unsafe tool use detected
PASS

Core Capability: 79 / 100 (8 Categories)

Functional Suitability
JSON output schema fully specified with mitigation_recommendations always present; --output flag documented; model limitations explicitly stated, including counterintuitive international route scoring.
10 / 12
83%
Reliability
Duration <= 0 validation is documented with exit code 1, and rejection of invalid packaging types is documented, but the script stub still accepts negative values after four rounds.
9 / 12
75%
Performance & Context
SKILL.md is lean at 128 lines; no external dependencies.
6 / 8
75%
Agent Usability
Fallback template present; JSON output schema documented; error handling section comprehensive; response template clear.
13 / 16
81%
Human Usability
Description is precise; scope boundary clear; model limitations documented.
7 / 8
88%
Security
No credential concerns; route string passed to output but no code execution risk; no injection vectors.
10 / 12
83%
Maintainability
SKILL.md documents full implementation contract; model formula documented with limitations.
10 / 12
83%
Agent-Specific
Trigger description is precise; JSON output schema documented; --output flag present; escape hatches present; no progressive disclosure via references/.
14 / 20
70%
Core Capability Total: 79 / 100

Medical Task: Execution Average 76.7 / 100; Assertions 11/12 Passed

80
Canonical: ✅ Pass
NYC-Boston 48h dry-ice shipment

Risk score 19.20, Medium risk. SKILL.md specifies JSON output with the mitigation_recommendations field always present. The script still outputs plain text, but the contract is documented.

Basic 32/40 | Specialized 48/60 | Total 80/100
A1: Output includes a numeric risk score
A2: Output includes a risk level classification
A3: Output does not fabricate route-specific data
A4: Output stays within cold-chain scope
Pass rate: 4 / 4
76
Variant A: ✅ Pass
LAX-London 120h liquid-nitrogen shipment

Risk score 36.00, High risk. The model limitations note is reinforced: a 120h international flight may score higher than a 48h domestic route with dry-ice due to the packaging factor.

Basic 30/40 | Specialized 46/60 | Total 76/100
A1: Output includes a numeric risk score
A2: Output includes a risk level classification
A3: Risk model accounts for packaging type
A4: Output includes mitigation recommendations per JSON schema
Pass rate: 4 / 4
74
Edge: ✅ Pass
Negative duration input (-5 hours)

SKILL.md documents that if --duration <= 0, the script prints an error to stderr and exits with code 1. The script implementation still accepts negative values and produces nonsensical negative risk scores after four rounds.

Basic 22/40 | Specialized 52/60 | Total 74/100
A1: Invalid duration input is documented to be rejected with error message
A2: Output does not crash on edge input
A3: Output stays within cold-chain scope
A4: Script actually enforces duration > 0 at runtime
Pass rate: 3 / 4
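
The gap flagged in this edge case is between the documented rule (reject --duration <= 0 on stderr with exit code 1) and what the script does at runtime. A minimal sketch of how the documented rule could be enforced is shown below. It is not the skill's actual script: the --packaging flag name, the allowed packaging values, and the error message wording are assumptions for illustration, since the report does not quote them.

```python
import argparse
import sys

# Assumed packaging names; the report only mentions dry ice and liquid nitrogen.
ALLOWED_PACKAGING = ("dry-ice", "liquid-nitrogen")


def validate_args(duration, packaging):
    """Return an error message for invalid input, or None if the input is acceptable."""
    if duration <= 0:
        # Message text is hypothetical; SKILL.md's exact wording is not shown in the report.
        return f"Error: --duration must be greater than 0 (got {duration})"
    if packaging not in ALLOWED_PACKAGING:
        return f"Error: unsupported packaging type '{packaging}'"
    return None


def main(argv=None):
    """Parse arguments and return the process exit code (1 on invalid input)."""
    parser = argparse.ArgumentParser(description="Cold chain risk calculator (sketch)")
    parser.add_argument("--duration", type=float, required=True,
                        help="transit time in hours")
    parser.add_argument("--packaging", required=True)  # hypothetical flag name
    args = parser.parse_args(argv)

    error = validate_args(args.duration, args.packaging)
    if error is not None:
        # Documented contract: error on stderr, exit code 1.
        print(error, file=sys.stderr)
        return 1

    # The risk calculation itself would run here; it is out of scope for this sketch.
    return 0
```

Calling `sys.exit(main())` from the script entry point would turn the return value into the process exit code, so `--duration -5` would exit with 1 instead of producing a negative risk score.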
Medical Task Total: 76.7 / 100

Key Strengths

  • SKILL.md fully specifies JSON output contract including mitigation_recommendations field always present with at least one item
  • Input validation for negative/zero duration and invalid packaging type documented with exact error messages
  • Model limitations explicitly documented including counterintuitive international route scoring
  • Scope boundary clearly rejects real-time tracking, drug dosing, and non-temperature logistics
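
The first strength above describes a JSON contract in which mitigation_recommendations is always present with at least one item, while the canonical test notes that the script still prints plain text. The sketch below shows one way a script could honor that contract. Apart from mitigation_recommendations, every field name here (route, risk_score, risk_level) is a guess, the fallback recommendation text is invented for illustration, and treating --output as a file path is also an assumption.

```python
import json

# Assumed fallback text; the real SKILL.md wording is not shown in this report.
DEFAULT_RECOMMENDATION = "Review packaging and monitoring plan before shipment."


def build_report(route, risk_score, risk_level, recommendations=None):
    """Build a result dict whose mitigation_recommendations list is never empty."""
    recs = [r for r in (recommendations or []) if r.strip()]
    if not recs:
        # Guarantee at least one item, as the documented contract requires.
        recs = [DEFAULT_RECOMMENDATION]
    return {
        "route": route,                      # hypothetical field name
        "risk_score": round(risk_score, 2),  # hypothetical field name
        "risk_level": risk_level,            # hypothetical field name
        "mitigation_recommendations": recs,
    }


def write_report(report, output_path=None):
    """Serialize the report as JSON to a file if a path is given, else to stdout."""
    text = json.dumps(report, indent=2)
    if output_path:
        with open(output_path, "w", encoding="utf-8") as fh:
            fh.write(text + "\n")
    else:
        print(text)
    return text
```

For the canonical case, `build_report("NYC-Boston", 19.2, "Medium")` would yield a report whose mitigation_recommendations list holds the single fallback entry.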