cold-chain-risk-calculator

Calculate temperature excursion risks for cold chain transport. Assesses route risk, packaging suitability, and monitoring requirements for biological samples and pharmaceuticals requiring controlled-temperature shipping.

78 / 100 Total Score

Core Capability

79 / 100

Functional Suitability

10 / 12

Reliability

9 / 12

Performance & Context

6 / 8

Agent Usability

13 / 16

Human Usability

7 / 8

Security

10 / 12

Maintainability

10 / 12

Agent-Specific

14 / 20

Medical Task

11 / 12 Passed

80 / 100: NYC-Boston 48h dry-ice shipment

4/4

76 / 100: LAX-London 120h liquid-nitrogen shipment

4/4

74 / 100: Negative duration input (-5 hours)

3/4

Veto Gates: Required pass for any deployment consideration

Skill Veto: ✓ All 4 gates passed

✓

Operational Stability

System remains stable across varied inputs and edge cases

PASS

✓

Structural Consistency

Output structure conforms to expected skill contract format

PASS

✓

Result Determinism

Equivalent inputs produce semantically equivalent outputs

PASS

✓

System Security

No prompt injection, data leakage, or unsafe tool use detected

PASS

Core Capability: 79 / 100 (8 Categories)

Functional Suitability

JSON output schema fully specified, with mitigation_recommendations always present; --output flag documented; model limitations explicitly stated, including the counterintuitive scoring of international routes.

10 / 12

83%

Reliability

Duration <= 0 validation documented with exit code 1; invalid packaging type rejection documented; after four rounds, the script stub still accepts negative values.

9 / 12

75%

Performance & Context

SKILL.md is lean at 128 lines; no external dependencies.

6 / 8

75%

Agent Usability

Fallback template present; JSON output schema documented; error handling section comprehensive; response template clear.

13 / 16

81%

Human Usability

Description is precise; scope boundary clear; model limitations documented.

7 / 8

88%

Security

No credential concerns; route string passed to output but no code execution risk; no injection vectors.

10 / 12

83%

Maintainability

SKILL.md documents full implementation contract; model formula documented with limitations.

10 / 12

83%

Agent-Specific

Trigger description is precise; JSON output schema documented; --output flag present; escape hatches present; no progressive disclosure via references/.

14 / 20

70%

Core Capability Total: 79 / 100

Medical Task: Execution Average 76.7 / 100; Assertions 11/12 Passed

Canonical

NYC-Boston 48h dry-ice shipment

4/4 ✓

Variant A

LAX-London 120h liquid-nitrogen shipment

4/4 ✓

Edge

Negative duration input (-5 hours)

3/4 ✓

Canonical: ✅ Pass

NYC-Boston 48h dry-ice shipment

Risk score 19.20, Medium risk. SKILL.md specifies JSON output with the mitigation_recommendations field always present. The script still outputs plain text, but the contract is documented.

Basic 32/40 | Specialized 48/60 | Total 80/100

✅ A1: Output includes a numeric risk score

✅ A2: Output includes a risk level classification

✅ A3: Output does not fabricate route-specific data

✅ A4: Output stays within cold-chain scope

Pass rate: 4 / 4
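
The JSON contract described for the canonical case could look roughly like the sketch below. Only the `mitigation_recommendations` field and the `--output` flag come from this review; every other field name, the fallback recommendation text, and the `build_report` and `write_report` helpers are hypothetical. The risk level is passed in rather than derived, because the review does not state the score thresholds.

```python
import json


def build_report(route, duration_hours, packaging, risk_score, risk_level,
                 recommendations=None):
    """Assemble a JSON-serialisable report (field names are assumptions)."""
    # The documented contract requires mitigation_recommendations to be
    # present with at least one item, so fall back to a generic entry.
    recs = list(recommendations or [])
    if not recs:
        recs.append("Review packaging and monitoring plan before dispatch.")
    return {
        "route": route,
        "duration_hours": duration_hours,
        "packaging": packaging,
        "risk_score": round(risk_score, 2),
        "risk_level": risk_level,
        "mitigation_recommendations": recs,
    }


def write_report(report, output_path=None):
    """Return the JSON text; also write it to output_path if one is given."""
    text = json.dumps(report, indent=2)
    if output_path:  # stands in for the documented --output flag
        with open(output_path, "w", encoding="utf-8") as fh:
            fh.write(text + "\n")
    return text


# Values taken from the canonical case in this review.
report = build_report("NYC-Boston", 48, "dry-ice", 19.20, "Medium")
print(write_report(report))
```

Filling in a default recommendation inside `build_report` keeps the "always present" guarantee in one place instead of relying on every caller to supply a list.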

Variant A: ✅ Pass

LAX-London 120h liquid-nitrogen shipment

Risk score 36.00, High risk. This reinforces the model-limitations note: a 120h international flight may score higher than a 48h domestic dry-ice route because of the packaging factor.

Basic 30/40 | Specialized 46/60 | Total 76/100

✅ A1: Output includes a numeric risk score

✅ A2: Output includes a risk level classification

✅ A3: Risk model accounts for packaging type

✅ A4: Output includes mitigation recommendations per JSON schema

Pass rate: 4 / 4
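
As a quick check on the packaging effect, dividing each reported score by its duration gives the score per hour for the two passing cases. This is plain arithmetic on the numbers in this report. Reading the ratios as packaging factors assumes a model that is linear in duration, which the report does not confirm.

```python
# Reported (duration in hours, risk score) pairs from the two passing cases.
cases = {
    "NYC-Boston, dry-ice": (48, 19.20),
    "LAX-London, liquid-nitrogen": (120, 36.00),
}

# Score per hour of transit. Treating this as a packaging factor is an
# assumption: it only holds if the model is linear in duration.
per_hour = {
    name: round(score / hours, 4)
    for name, (hours, score) in cases.items()
}

for name, ratio in per_hour.items():
    print(f"{name}: {ratio:.2f} risk points per hour")
```

The dry-ice case works out to 0.40 points per hour and the liquid-nitrogen case to 0.30, so the longer shipment has the higher total score but the lower score per hour.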

Edge: ✅ Pass

Negative duration input (-5 hours)

SKILL.md documents: if --duration <= 0, print an error to stderr and exit with code 1. After four rounds, the script implementation still accepts negative values and produces nonsensical negative risk scores.

Basic 22/40 | Specialized 52/60 | Total 74/100

✅ A1: Invalid duration input is documented to be rejected with error message

✅ A2: Output does not crash on edge input

✅ A3: Output stays within cold-chain scope

❌ A4: Script actually enforces duration > 0 at runtime

Pass rate: 3 / 4
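
The failing assertion A4 concerns runtime enforcement of the documented rule. A minimal sketch of such a guard follows, assuming a Python script. The stderr message and exit code 1 follow the documented behaviour, but the message wording, the `VALID_PACKAGING` set, and the function names are hypothetical, since the review does not quote them.

```python
import sys

# Hypothetical packaging names; the review does not list the accepted values.
VALID_PACKAGING = {"dry-ice", "liquid-nitrogen"}


def validate_inputs(duration_hours, packaging):
    """Return an error message for invalid input, or None if acceptable."""
    if duration_hours <= 0:
        return f"Error: --duration must be greater than 0 (got {duration_hours})"
    if packaging not in VALID_PACKAGING:
        return f"Error: unsupported packaging type '{packaging}'"
    return None


def main(duration_hours, packaging):
    """Return the process exit code: 1 on invalid input, 0 otherwise."""
    error = validate_inputs(duration_hours, packaging)
    if error is not None:
        print(error, file=sys.stderr)  # documented: error goes to stderr
        return 1                       # documented: exit code 1
    # The risk calculation would run here, only on validated input.
    return 0


# Edge case from this review: a duration of -5 hours is rejected.
exit_code = main(-5, "dry-ice")
print(f"exit code for -5 hours: {exit_code}")
```

Running the check before any scoring means a negative duration can never reach the risk formula, which is what prevents the negative scores described above.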

Medical Task Total: 76.7 / 100
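
The two section totals can be reproduced from the figures in this report. The sketch below assumes the Core Capability total is a plain sum of the eight category scores and the execution average is an unweighted mean of the three case totals; both assumptions match the reported numbers, but the report does not state the formulas.

```python
# Category scores (earned, maximum) as listed under Core Capability.
core = {
    "Functional Suitability": (10, 12),
    "Reliability": (9, 12),
    "Performance & Context": (6, 8),
    "Agent Usability": (13, 16),
    "Human Usability": (7, 8),
    "Security": (10, 12),
    "Maintainability": (10, 12),
    "Agent-Specific": (14, 20),
}
core_total = sum(earned for earned, _ in core.values())
core_max = sum(maximum for _, maximum in core.values())

# Per-case totals for the Canonical, Variant A, and Edge executions.
case_totals = [80, 76, 74]
execution_average = round(sum(case_totals) / len(case_totals), 1)

print(f"Core Capability: {core_total} / {core_max}")
print(f"Execution average: {execution_average} / 100")
```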

Key Strengths

SKILL.md fully specifies JSON output contract including mitigation_recommendations field always present with at least one item
Input validation for negative/zero duration and invalid packaging type documented with exact error messages
Model limitations explicitly documented including counterintuitive international route scoring
Scope boundary clearly rejects real-time tracking, drug dosing, and non-temperature logistics