Data Analysis

meta-sensitivity-plot

Total Score: 86 / 100
Core Capability
79 / 100
Functional Suitability
10 / 12
Reliability
10 / 12
Performance & Context
8 / 8
Agent Usability
12 / 16
Human Usability
6 / 8
Security
9 / 12
Maintainability
9 / 12
Agent-Specific
15 / 20
Medical Task
20 / 20 Passed

Veto Gates: required to pass for any deployment consideration

Skill Veto: ✓ All 4 gates passed
Operational Stability
System remains stable across varied inputs and edge cases
PASS
Structural Consistency
Output structure conforms to expected skill contract format
PASS
Result Determinism
Equivalent inputs produce semantically equivalent outputs
PASS
System Security
No prompt injection, data leakage, or unsafe tool use detected
PASS
Research Veto: ✅ PASS (Applicable)
Dimension | Result | Detail
Scientific Integrity | PASS | No scientific-integrity problem surfaced: the package did not claim more than the available records, article text, or script evidence supported.
Practice Boundaries | PASS | The evaluated outputs stayed within the scope of "Generate leave-one-out sensitivity analysis plots for meta-analysis. Input is a CSV file..." and did not drift into interpretation unsupported by the available inputs.
Methodological Ground | PASS | The archived evaluation treated the workflow as method-linked rather than ad hoc.
Code Usability | PASS | The legacy audit did not record a code-usability failure in the packaged analysis path.

Core Capability: 79 / 100 (8 categories)

Functional Suitability
Functional suitability lost points to the legacy issue 'Improve stress-case output rigor': stress and boundary scenarios show weaker consistency.
10 / 12
83%
Reliability
Related legacy finding for meta-sensitivity-plot: 'Improve stress-case output rigor'. Stress and boundary scenarios show weaker consistency.
10 / 12
83%
Performance & Context
No points were deducted for performance and context in the legacy audit.
8 / 8
100%
Agent Usability
The packaged analysis path is understandable, though the archived score suggests slightly clearer routing would help.
12 / 16
75%
Human Usability
The archived score suggests the output contract could be a little easier for users to inspect or reuse.
6 / 8
75%
Security
Security remained strong, though the archived review still left some room for clearer execution guardrails.
9 / 12
75%
Maintainability
The analysis package is maintainable overall, though the archived score suggests modest cleanup headroom.
9 / 12
75%
Agent-Specific
The archived Agent-Specific deduction traces back to the legacy issue 'Improve stress-case output rigor': stress and boundary scenarios show weaker consistency.
15 / 20
75%
Core Capability Total: 79 / 100
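As a quick consistency check, the eight category scores listed above sum to the reported total, and each percentage is the earned score over the category maximum. A minimal Python sketch; the dictionary simply restates the listed scores:

```python
# Category scores restated from the Core Capability breakdown: (earned, maximum).
categories = {
    "Functional Suitability": (10, 12),
    "Reliability": (10, 12),
    "Performance & Context": (8, 8),
    "Agent Usability": (12, 16),
    "Human Usability": (6, 8),
    "Security": (9, 12),
    "Maintainability": (9, 12),
    "Agent-Specific": (15, 20),
}

earned = sum(score for score, _ in categories.values())
maximum = sum(cap for _, cap in categories.values())
print(f"Core Capability Total: {earned} / {maximum}")  # Core Capability Total: 79 / 100

# Per-category percentages, rounded as in the breakdown (e.g. 10/12 -> 83%).
for name, (score, cap) in categories.items():
    print(f"{name}: {round(100 * score / cap)}%")
```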

Medical Task: Execution Average 90.6 / 100; Assertions 20/20 Passed

Canonical (95 / 100): 4/4 assertions
Variant A (91 / 100): 4/4 assertions
Edge (89 / 100): 4/4 assertions
Variant B (89 / 100): 4/4 assertions
Stress (89 / 100): 4/4 assertions
Canonical (95 / 100): ✅ Pass
"Generate leave-one-out sensitivity analysis plots for meta-analysis. Input is a CSV file containing meta-analysis data; outputs are a sensitivity forest plot (PNG) and a sensitivity data table (CSV) showing pooled effect estimates after excluding each study in turn."

The "Generate leave-one-out sensitivity analysis plots for..." task remained tied to the documented analysis contract, even though the preserved evidence centered on instructions rather than a full rerun.

Basic 35/40 | Specialized 60/60 | Total 95/100
A1: The meta-sensitivity-plot output structure matches the documented deliverable
A2: The instruction path remains actionable for the documented case
A3: The output stays fully within the documented skill boundary
A4: The response quality is acceptable for the documented path
Pass rate: 4 / 4
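The canonical task re-pools the meta-analysis once per omitted study. This report does not state which pooling model sensitivity_analysis.py uses, so the sketch below assumes an inverse-variance fixed-effect model with a normal-approximation 95% CI; a random-effects model (for example DerSimonian-Laird) would change the weights. The function names, the (name, effect, se) tuple layout, and the example studies are hypothetical and for illustration only.

```python
import math


def pooled_fixed_effect(effects, ses):
    """Inverse-variance fixed-effect estimate with a normal-approximation 95% CI."""
    weights = [1.0 / se ** 2 for se in ses]
    total_weight = sum(weights)
    estimate = sum(w * y for w, y in zip(weights, effects)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return estimate, estimate - 1.96 * pooled_se, estimate + 1.96 * pooled_se


def leave_one_out(studies):
    """Re-pool the remaining studies after omitting each one in turn.

    `studies` is a list of (name, effect, se) tuples (hypothetical layout).
    """
    if len(studies) < 3:
        raise ValueError("need at least 3 studies so two remain after each omission")
    rows = []
    for i, (name, _, _) in enumerate(studies):
        remaining = studies[:i] + studies[i + 1:]
        estimate, ci_low, ci_high = pooled_fixed_effect(
            [effect for _, effect, _ in remaining],
            [se for _, _, se in remaining],
        )
        rows.append({"omitted": name, "estimate": estimate,
                     "ci_low": ci_low, "ci_high": ci_high})
    return rows


# Illustrative input only; not data from this evaluation.
studies = [("Study A", 0.30, 0.10), ("Study B", 0.10, 0.20), ("Study C", 0.50, 0.10)]
for row in leave_one_out(studies):
    print(f"{row['omitted']}: {row['estimate']:.3f} "
          f"[{row['ci_low']:.3f}, {row['ci_high']:.3f}]")
```

Each returned row corresponds to one line of a sensitivity data table; a forest plot would draw one point estimate and interval per omitted study.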
Variant A (91 / 100): ✅ Pass
"Generate leave-one-out sensitivity analysis plots for meta-analysis. Input is a CSV file containing meta-analysis data; outputs are a sensitivity forest plot (PNG) and a sensitivity data table (CSV) showing pooled effect estimates after excluding each study in turn."

The archived run treated "Generate leave-one-out sensitivity analysis plots for..." as a bounded analysis workflow rather than a purely narrative instruction path.

Basic 33/40 | Specialized 58/60 | Total 91/100
A1: The meta-sensitivity-plot output structure matches the documented deliverable
A2: The instruction path remains actionable for the documented case
A3: The output stays fully within the documented skill boundary
A4: The response quality is acceptable for the documented path
Pass rate: 4 / 4
Edge (89 / 100): ✅ Pass
Step 1: Validate input

The archived run treated "Step 1: Validate input" as a bounded analysis workflow rather than a purely narrative instruction path.

Basic 32/40 | Specialized 57/60 | Total 89/100
A1: The meta-sensitivity-plot output structure matches the documented deliverable
A2: The instruction path remains actionable for the documented case
A3: The output stays fully within the documented skill boundary
A4: The response quality is acceptable for the documented path
Pass rate: 4 / 4
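The Edge case exercises the input-validation step ("Step 1: Validate input"). A minimal sketch of what such a check could look like, assuming the CSV carries one row per study with hypothetical columns `study`, `effect`, and `se`; the skill's actual input schema is not documented in this report. The three-study minimum is also an assumption: it guarantees that at least two studies remain to pool after each omission.

```python
import csv
import io
import math

# Hypothetical column names; the skill's real input schema is not documented here.
REQUIRED_COLUMNS = ("study", "effect", "se")


def validate_rows(csv_text):
    """Parse meta-analysis CSV text and return (study, effect, se) tuples.

    Raises ValueError on missing columns, non-numeric or non-finite values,
    non-positive standard errors, or fewer than three studies.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing required columns: {missing}")
    studies = []
    for row_no, row in enumerate(reader, start=1):
        try:
            effect = float(row["effect"])
            se = float(row["se"])
        except (TypeError, ValueError):
            # TypeError covers short rows, where DictReader fills missing fields with None.
            raise ValueError(f"data row {row_no}: effect and se must be numeric") from None
        if not (math.isfinite(effect) and math.isfinite(se)) or se <= 0:
            raise ValueError(f"data row {row_no}: effect must be finite and se positive")
        studies.append((row["study"], effect, se))
    if len(studies) < 3:
        raise ValueError("leave-one-out needs at least 3 studies")
    return studies


sample = "study,effect,se\nA,0.3,0.1\nB,0.1,0.2\nC,0.5,0.1\n"
print(validate_rows(sample))
```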
Variant B (89 / 100): ✅ Pass
Step 3: Output

The archived run treated "Step 3: Output" as a bounded analysis workflow rather than a purely narrative instruction path.

Basic 31/40 | Specialized 58/60 | Total 89/100
A1: The meta-sensitivity-plot output structure matches the documented deliverable
A2: The instruction path remains actionable for the documented case
A3: The output stays fully within the documented skill boundary
A4: The response quality is acceptable for the documented path
Pass rate: 4 / 4
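For the output step ("Step 3: Output"), the sensitivity data table can be written with Python's standard csv module. This sketch assumes hypothetical column names (`omitted`, `estimate`, `ci_low`, `ci_high`) and uses illustrative values, not results from this evaluation; rendering the PNG forest plot is out of scope here.

```python
import csv
import io

# Hypothetical column names for the sensitivity data table.
FIELDS = ("omitted", "estimate", "ci_low", "ci_high")


def write_sensitivity_table(rows, handle, digits=4):
    """Write one CSV row per omitted study, formatting floats to `digits` places."""
    writer = csv.DictWriter(handle, fieldnames=FIELDS)
    writer.writeheader()
    for row in rows:
        writer.writerow({
            key: f"{row[key]:.{digits}f}" if isinstance(row[key], float) else row[key]
            for key in FIELDS
        })


# Illustrative values only; not results from this evaluation.
rows = [
    {"omitted": "Study A", "estimate": 0.42, "ci_low": 0.25, "ci_high": 0.59},
    {"omitted": "Study B", "estimate": 0.40, "ci_low": 0.26, "ci_high": 0.54},
]
buffer = io.StringIO()
write_sensitivity_table(rows, buffer)
print(buffer.getvalue())
```

When writing to a file instead of an in-memory buffer, open it with `newline=""`, as the csv module documentation recommends, so row terminators are not doubled on Windows.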
Stress (89 / 100): ✅ Pass
Step 3: Output

The archived run treated "Step 3: Output" as a bounded analysis workflow rather than a purely narrative instruction path.

Basic 28/40 | Specialized 60/60 | Total 89/100
A1: The meta-sensitivity-plot output structure matches the documented deliverable
A2: The instruction path remains actionable for the documented case
A3: The output stays fully within the documented skill boundary
A4: The response quality is acceptable for the documented path
Pass rate: 4 / 4
Medical Task Total: 90.6 / 100
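The Medical Task total is the mean of the five case scores (95, 91, 89, 89, 89). A minimal Python check; the dictionary only restates the per-case totals reported above:

```python
# Case scores restated from the Medical Task results above.
case_scores = {"Canonical": 95, "Variant A": 91, "Edge": 89, "Variant B": 89, "Stress": 89}
average = sum(case_scores.values()) / len(case_scores)
print(f"Medical Task Total: {average:.1f} / 100")  # Medical Task Total: 90.6 / 100
```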

Key Strengths

  • Primary routing is Data Analysis with execution mode B
  • Static quality score is 79/100 and dynamic average is 77.6/100
  • Assertions and command execution outcomes are recorded per input for human review
  • Execution verification summary: script verification 1/1 (adjustment=5); sensitivity_analysis.py: OK