Other
multi-panel-figure-assembler
82 / 100 Total Score
Core Capability
85 / 100
Functional Suitability
11 / 12
Reliability
10 / 12
Performance & Context
7 / 8
Agent Usability
15 / 16
Human Usability
7 / 8
Security
10 / 12
Maintainability
10 / 12
Agent-Specific
15 / 20
Medical Task
20 / 20 Passed
82
Assemble 6 PNG panels into 2x3 composite at 300 DPI
4/4
82
Assemble 6 panels in 3x2 layout at 600 DPI with custom label size
4/4
83
Only 4 of 6 panels provided
4/4
83
Input panel path contains ../ traversal
4/4
82
Request to generate plots from data instead of assembling existing images
4/4
Veto Gates
Required pass for any deployment consideration
Skill Veto: ✓ All 4 gates passed
✓ Operational Stability: PASS
System remains stable across varied inputs and edge cases
✓ Structural Consistency: PASS
Output structure conforms to expected skill contract format
✓ Result Determinism: PASS
Equivalent inputs produce semantically equivalent outputs
✓ System Security: PASS
No prompt injection, data leakage, or unsafe tool use detected
Core Capability: 85 / 100, 8 Categories
Functional Suitability
2x3 and 3x2 layouts documented; DPI, label, padding, border all configurable; 6-panel scope design note added; future --panels parameter mentioned
11 / 12
92%
Reliability
Fallback template documented with detailed fields; error handling explicit including panel count mismatch; PIL dependency install instruction in Error Handling
10 / 12
83%
Performance & Context
SKILL.md is 133 lines — lean; Pillow and numpy are reasonable deps for image processing
7 / 8
88%
Agent Usability
Workflow clear; input validation enforced as Step 1 hard gate with explicit no-partial-processing instruction; matplotlib/seaborn/ggplot2 alternative tool suggestions added
15 / 16
94%
Human Usability
Description is natural and discoverable; forgiveness good via auto-resize of panels
7 / 8
88%
Security
Path traversal check explicitly documented; no hardcoded secrets; no injection vectors
10 / 12
83%
Maintainability
Script 400 lines with clear class structure; SKILL.md well-separated; testability good with example command
10 / 12
83%
Agent-Specific
Trigger precision good; escape hatches documented; input validation now a hard gate with no-partial-processing instruction; 6-panel scope design note improves composability clarity
15 / 20
75%
Core Capability Total: 85 / 100
Medical Task
Execution Average: 80.4 / 100, Assertions: 20/20 Passed
82
Canonical
Assemble 6 PNG panels into 2x3 composite at 300 DPI
4/4 ✓
82
Variant A
Assemble 6 panels in 3x2 layout at 600 DPI with custom label size
4/4 ✓
83
Edge
Only 4 of 6 panels provided
4/4 ✓
83
Variant B
Input panel path contains ../ traversal
4/4 ✓
82
Stress
Request to generate plots from data instead of assembling existing images
4/4 ✓
82
Canonical: ✅ Pass
Assemble 6 PNG panels into 2x3 composite at 300 DPI
Script requires PIL; evaluated via Mode A. Logic is correct based on code review. PIL dependency check added to Quick Check.
Basic 32/40 | Specialized 50/60 | Total 82/100
✅ A1: Output composite image has 2x3 grid layout with panels A-F
✅ A2: Output DPI matches specified 300 DPI
✅ A3: Output does not fabricate panel content
✅ A4: Output stays within figure assembly scope
Pass rate: 4 / 4
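The grid geometry behind a 2x3 composite can be sketched with the standard library alone. This is an illustrative sketch, not the skill's actual script: the function names, the 900x600 panel size, the 30 px padding, and the reading of "2x3" as 2 rows by 3 columns are all assumptions.

```python
from typing import Dict, List, Tuple


def panel_offsets(rows: int, cols: int, panel_w: int, panel_h: int,
                  padding: int = 0) -> List[Tuple[int, int]]:
    """Top-left (x, y) offset of each panel, in row-major order (A, B, C, ...)."""
    return [
        (padding + c * (panel_w + padding), padding + r * (panel_h + padding))
        for r in range(rows)
        for c in range(cols)
    ]


def canvas_size(rows: int, cols: int, panel_w: int, panel_h: int,
                padding: int = 0) -> Tuple[int, int]:
    """Overall canvas (width, height), with padding around and between panels."""
    width = cols * panel_w + (cols + 1) * padding
    height = rows * panel_h + (rows + 1) * padding
    return width, height


# Hypothetical example: six equally sized panels, assumed 2 rows x 3 columns.
labels = "ABCDEF"
layout: Dict[str, Tuple[int, int]] = dict(
    zip(labels, panel_offsets(2, 3, panel_w=900, panel_h=600, padding=30))
)
size = canvas_size(2, 3, panel_w=900, panel_h=600, padding=30)
```

With Pillow, each offset could be passed to `Image.paste` and the resolution recorded by saving with `dpi=(300, 300)`. Whether the reviewed script does it this way is not stated in the report.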
82
Variant A: ✅ Pass
Assemble 6 panels in 3x2 layout at 600 DPI with custom label size
3x2 layout and 600 DPI correctly applied; label size parameter correctly passed.
Basic 32/40 | Specialized 50/60 | Total 82/100
✅ A1: Output uses 3x2 grid layout
✅ A2: Output DPI matches specified 600 DPI
✅ A3: Output label size matches --label-size 32
✅ A4: Output does not fabricate panel content
Pass rate: 4 / 4
83
Edge: ✅ Pass
Only 4 of 6 panels provided
Panel count mismatch correctly detected and reported with exact count.
Basic 33/40 | Specialized 50/60 | Total 83/100
✅ A1: Output reports panel count mismatch (4 provided, 6 required)
✅ A2: Output uses the documented fallback template structure
✅ A3: Output does not attempt assembly with incomplete panels
✅ A4: Output provides next-step guidance
Pass rate: 4 / 4
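A panel count gate of the kind this case exercises might look like the following minimal sketch. The function name, the message wording, and the constant are hypothetical; only the "4 provided, 6 required" figures come from the report.

```python
from typing import Optional, Sequence

REQUIRED_PANELS = 6  # assumed fixed scope, per "4 provided, 6 required"


def check_panel_count(paths: Sequence[str],
                      required: int = REQUIRED_PANELS) -> Optional[str]:
    """Return None when the count matches, otherwise an error message.

    The caller is expected to stop before any assembly when a message
    is returned, so no partial composite is produced.
    """
    provided = len(paths)
    if provided != required:
        return f"Panel count mismatch: {provided} provided, {required} required"
    return None
```

Returning a message instead of raising keeps the check easy to route into a fallback template, though an exception would serve equally well.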
83
Variant B: ✅ Pass
Input panel path contains ../ traversal
Path traversal correctly detected and rejected with warning.
Basic 33/40 | Specialized 50/60 | Total 83/100
✅ A1: Output rejects path containing ../ with path traversal warning
✅ A2: Output does not process the traversal path
✅ A3: Output provides a clear error message
✅ A4: Output stays within scope
Pass rate: 4 / 4
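One way to detect a `../` component before touching the file system is sketched below. It is an assumption about how such a check could be written, not the skill's documented implementation, and `has_traversal` is a hypothetical name.

```python
from pathlib import PurePosixPath, PureWindowsPath


def has_traversal(path: str) -> bool:
    """True if any path component is '..', under POSIX or Windows separators.

    Pure paths never access the file system, so the check is safe to run
    on untrusted input before any file is opened.
    """
    parts = set(PurePosixPath(path).parts) | set(PureWindowsPath(path).parts)
    return ".." in parts
```

Comparing whole components avoids false positives on file names that merely contain two dots, such as `fig..v2.png`.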
82
Stress: ✅ Pass
Request to generate plots from data instead of assembling existing images
Out-of-scope request correctly declined with verbatim redirect message and alternative tool suggestions before any processing. Hard gate fully enforced.
Basic 32/40 | Specialized 50/60 | Total 82/100
✅ A1: Output declines plot generation request as out of scope
✅ A2: Output provides the documented redirect message
✅ A3: Output suggests appropriate alternative tool for plot generation (matplotlib, seaborn, ggplot2)
✅ A4: Output does not generate any data context before refusal
Pass rate: 4 / 4
Medical Task Total: 80.4 / 100
Key Strengths
- Most detailed fallback template in this collection — includes Assumptions, Constraints, Risks, and Unresolved fields
- Input validation hard gate fully enforced — no processing before refusal fires
- Panel count mismatch detection prevents silent assembly failures
- Path traversal protection explicitly documented and enforced