Data Analysis

experimental-data-analysis

Statistical analysis and reporting for experimental datasets; use when you need to interpret experimental results, test significance (t-tests/ANOVA), or generate reproducible reports.

91 / 100 Total Score

Core Capability

88 / 100

Functional Suitability

11 / 12

Reliability

10 / 12

Performance & Context

8 / 8

Agent Usability

14 / 16

Human Usability

8 / 8

Security

10 / 12

Maintainability

10 / 12

Agent-Specific

17 / 20

Medical Task

20 / 20 Passed

98 / 100: You have experimental results in CSV form and need a reproducible end-to-end analysis workflow (clean → test → report)

4/4

94 / 100: You need to compare two conditions (independent or paired) and determine statistical significance with effect sizes

4/4

92 / 100: Reproducible, run-based execution that writes all artifacts into outputs/runs/<timestamp>/

4/4

92 / 100: Data preparation guidance: missing values, outliers, and variable type identification (continuous/categorical; grouping factors)

4/4

92 / 100: End-to-end case for Reproducible, run-based execution that writes all artifacts into outputs/runs/<timestamp>/

4/4

Veto Gates: Required pass for any deployment consideration

Skill Veto: ✓ All 4 gates passed

✓

Operational Stability

System remains stable across varied inputs and edge cases

PASS

✓

Structural Consistency

Output structure conforms to expected skill contract format

PASS

✓

Result Determinism

Equivalent inputs produce semantically equivalent outputs

PASS

✓

System Security

No prompt injection, data leakage, or unsafe tool use detected

PASS

Research Veto: ✅ PASS (Applicable)

Dimension	Result	Detail
Scientific Integrity	PASS	No scientific-integrity problem was surfaced because the package did not claim more than the available records, article text, or script evidence supported.
Practice Boundaries	PASS	The evaluated outputs stayed inside the documented scope (statistical analysis and reporting for experimental datasets) and did not drift into unsupported interpretation beyond the available inputs.
Methodological Ground	PASS	Methodological grounding was preserved through the documented inputs, transformations, and expected artifacts.
Code Usability	PASS	The legacy audit did not record a code-usability failure in the packaged analysis path.

Core Capability: 88 / 100 (8 Categories)

Functional Suitability

The archived review left a small gap in how directly the skill (statistical analysis and reporting for experimental datasets) resolves into a finished analysis deliverable.

11 / 12

92%

Reliability

The legacy audit preserved a modest reliability gap around harder runs or more demanding inputs.

10 / 12

83%

Performance & Context

No point loss was recorded for performance context in the legacy audit.

8 / 8

100%

Agent Usability

The packaged analysis path is understandable, though the archived score suggests slightly clearer routing would help.

14 / 16

88%

Human Usability

No point loss was recorded for human usability in the legacy audit.

8 / 8

100%

Security

Security remained strong, though the archived review still left some room for clearer execution guardrails.

10 / 12

83%

Maintainability

The analysis package is maintainable overall, though the archived score suggests modest cleanup headroom.

10 / 12

83%

Agent-Specific

The package is strongly shaped for agent use, though the archived score still left a small gap in execution determinism.

17 / 20

85%

Core Capability Total: 88 / 100
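The total and the per-category percentages can be re-derived from the category scores listed above. The following is a minimal sketch for checking that arithmetic; the dictionary name and the round-half-up rule are assumptions chosen to match the displayed percentages, not something the audit documents.

```python
import math

# (earned, maximum) per category, copied from the scorecard above.
CATEGORIES = {
    "Functional Suitability": (11, 12),
    "Reliability": (10, 12),
    "Performance & Context": (8, 8),
    "Agent Usability": (14, 16),
    "Human Usability": (8, 8),
    "Security": (10, 12),
    "Maintainability": (10, 12),
    "Agent-Specific": (17, 20),
}


def percent(earned, maximum):
    """Whole-number percentage, rounded half up (assumed rounding rule)."""
    return math.floor(100 * earned / maximum + 0.5)


total_earned = sum(earned for earned, _ in CATEGORIES.values())
total_max = sum(maximum for _, maximum in CATEGORIES.values())

for name, (earned, maximum) in CATEGORIES.items():
    print(f"{name}: {earned} / {maximum} = {percent(earned, maximum)}%")
print(f"Core Capability Total: {total_earned} / {total_max}")
```

Under these assumptions the eight categories sum to 88 out of 100, which agrees with the reported total.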

Medical Task: Execution Average 93.6 / 100; Assertions 20/20 Passed

Canonical

You have experimental results in CSV form and need a reproducible end-to-end analysis workflow (clean → test → report)

4/4 ✓

Variant A

You need to compare two conditions (independent or paired) and determine statistical significance with effect sizes

4/4 ✓

Edge

Reproducible, run-based execution that writes all artifacts into outputs/runs/<timestamp>/

4/4 ✓

Variant B

Data preparation guidance: missing values, outliers, and variable type identification (continuous/categorical; grouping factors)

4/4 ✓

Stress

End-to-end case for Reproducible, run-based execution that writes all artifacts into outputs/runs/<timestamp>/

4/4 ✓

Canonical: ✅ Pass

You have experimental results in CSV form and need a reproducible end-to-end analysis workflow (clean → test → report)

The canonical scenario completed within the documented scope of the skill (statistical analysis and reporting for experimental datasets).

Basic 38/40 | Specialized 60/60 | Total 98/100

✅ A1: The experimental-data-analysis output structure matches the documented deliverable

✅ A2: The script execution path completed successfully for the documented case

✅ A3: The output stays fully within the documented skill boundary

✅ A4: The response quality is acceptable for the documented path

Pass rate: 4 / 4

Variant A: ✅ Pass

You need to compare two conditions (independent or paired) and determine statistical significance with effect sizes

The archived evaluation treated this two-condition comparison scenario as a clean in-scope run.

Basic 36/40 | Specialized 58/60 | Total 94/100

✅ A1: The experimental-data-analysis output structure matches the documented deliverable

✅ A2: The script execution path completed successfully for the documented case

✅ A3: The output stays fully within the documented skill boundary

✅ A4: The response quality is acceptable for the documented path

Pass rate: 4 / 4
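For readers unfamiliar with what an independent versus paired comparison with an effect size involves, here is a minimal standard-library sketch. It is not the skill's own implementation: the function names are hypothetical, Welch's unequal-variance t statistic is one common choice for the independent case, and the p-value step is left out because it needs a t distribution (for example from a statistics library).

```python
import math
import statistics


def cohens_d_independent(a, b):
    """Cohen's d for two independent samples, using the pooled standard deviation."""
    na, nb = len(a), len(b)
    va, vb = statistics.variance(a), statistics.variance(b)
    pooled_sd = math.sqrt(((na - 1) * va + (nb - 1) * vb) / (na + nb - 2))
    return (statistics.mean(a) - statistics.mean(b)) / pooled_sd


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    na, nb = len(a), len(b)
    va, vb = statistics.variance(a), statistics.variance(b)
    se2 = va / na + vb / nb  # squared standard error of the mean difference
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(se2)
    df = se2 ** 2 / ((va / na) ** 2 / (na - 1) + (vb / nb) ** 2 / (nb - 1))
    return t, df


def paired_t(a, b):
    """Paired t statistic on per-subject differences; df = n - 1."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    t = statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))
    return t, n - 1


if __name__ == "__main__":
    control = [1, 2, 3, 4, 5]      # made-up illustration data
    treatment = [2, 3, 4, 5, 6]
    print(welch_t(control, treatment), cohens_d_independent(control, treatment))
```

The paired version applies only when each value in one condition has a matching value from the same subject in the other; otherwise the independent form is the appropriate one.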

Edge: ✅ Pass

Reproducible, run-based execution that writes all artifacts into outputs/runs/<timestamp>/

The edge scenario completed within the documented scope of the skill (statistical analysis and reporting for experimental datasets).

Basic 35/40 | Specialized 57/60 | Total 92/100

✅ A1: The experimental-data-analysis output structure matches the documented deliverable

✅ A2: The script execution path completed successfully for the documented case

✅ A3: The output stays fully within the documented skill boundary

✅ A4: The response quality is acceptable for the documented path

Pass rate: 4 / 4
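The run-based layout named in this scenario can be illustrated with a short sketch. This is an assumption-laden example, not the packaged init_run.py: the function names, the UTC timestamp format, and the JSON artifact are all hypothetical; only the outputs/runs/<timestamp>/ pattern comes from the report.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def create_run_dir(base, now=None):
    """Create <base>/<timestamp>/ and return it.

    The timestamp format (UTC, YYYYMMDDTHHMMSSZ) is an assumption; it sorts
    chronologically and avoids characters that are awkward in file names.
    """
    now = (now or datetime.now(timezone.utc)).astimezone(timezone.utc)
    run_dir = Path(base) / now.strftime("%Y%m%dT%H%M%SZ")
    # exist_ok=False so an earlier run is never silently overwritten.
    run_dir.mkdir(parents=True, exist_ok=False)
    return run_dir


def write_artifact(run_dir, name, payload):
    """Write one JSON artifact inside the run directory and return its path."""
    path = run_dir / name
    path.write_text(json.dumps(payload, indent=2, sort_keys=True), encoding="utf-8")
    return path


if __name__ == "__main__":
    run_dir = create_run_dir("outputs/runs")
    print(write_artifact(run_dir, "summary.json", {"status": "ok"}))
```

Keeping every artifact under a single per-run directory means two runs never share files, which is what makes a later comparison between runs straightforward.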

Variant B: ✅ Pass

Data preparation guidance: missing values, outliers, and variable type identification (continuous/categorical; grouping factors)

The data preparation guidance scenario remained well aligned with the documented contract in the preserved audit.

Basic 34/40 | Specialized 58/60 | Total 92/100

✅ A1: The experimental-data-analysis output structure matches the documented deliverable

✅ A2: The script execution path completed successfully for the documented case

✅ A3: The output stays fully within the documented skill boundary

✅ A4: The response quality is acceptable for the documented path

Pass rate: 4 / 4
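As an illustration of the three preparation checks named in this scenario, the sketch below profiles one column of raw CSV strings. It is a simplified example under stated assumptions: the missing-value tokens, the Tukey 1.5 × IQR outlier rule, and the function names are choices made here, not rules taken from the skill.

```python
import statistics

# Strings treated as missing (assumed list; adjust to the dataset's conventions).
MISSING_TOKENS = {"", "na", "n/a", "nan", "null", "none"}


def is_missing(value):
    return value is None or value.strip().lower() in MISSING_TOKENS


def iqr_outliers(numbers, k=1.5):
    """Values outside the Tukey fences [Q1 - k*IQR, Q3 + k*IQR]."""
    if len(numbers) < 4:
        return []  # too few points for quartiles to be meaningful
    q1, _, q3 = statistics.quantiles(numbers, n=4)
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [x for x in numbers if x < low or x > high]


def profile_column(values):
    """Classify a column as continuous or categorical and report missing values and outliers."""
    present = [v for v in values if not is_missing(v)]
    n_missing = len(values) - len(present)
    try:
        numbers = [float(v) for v in present]
    except ValueError:
        # At least one non-numeric value: treat as a categorical/grouping factor.
        return {"type": "categorical", "missing": n_missing,
                "levels": sorted(set(present)), "outliers": []}
    return {"type": "continuous", "missing": n_missing,
            "levels": [], "outliers": iqr_outliers(numbers)}


if __name__ == "__main__":
    print(profile_column(["1", "2", "", "3", "4", "5", "100", "NA"]))
    print(profile_column(["ctrl", "drug", "", "ctrl"]))
```

Flagged outliers are only reported here, not removed; whether to exclude them is an analysis decision that should be recorded alongside the results.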

Stress: ✅ Pass

End-to-end case for Reproducible, run-based execution that writes all artifacts into outputs/runs/<timestamp>/

The stress scenario (end-to-end, run-based execution) completed within the documented scope of the skill (statistical analysis and reporting for experimental datasets).

Basic 31/40 | Specialized 60/60 | Total 92/100

✅ A1: The experimental-data-analysis output structure matches the documented deliverable

✅ A2: The script execution path completed successfully for the documented case

✅ A3: The output stays fully within the documented skill boundary

✅ A4: The response quality is acceptable for the documented path

Pass rate: 4 / 4

Medical Task Total: 93.6 / 100
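The 93.6 figure is consistent with a plain (unweighted) mean of the five scenario totals reported above. A short check, assuming equal weighting, which the report does not state explicitly:

```python
# Scenario totals copied from the per-scenario results above.
scenario_totals = {
    "Canonical": 98,
    "Variant A": 94,
    "Edge": 92,
    "Variant B": 92,
    "Stress": 92,
}

# Unweighted mean across scenarios (equal weighting is an assumption).
execution_average = sum(scenario_totals.values()) / len(scenario_totals)
print(f"Execution average: {execution_average:.1f} / 100")
```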

Key Strengths

Primary routing is Data Analysis with execution mode B
Static quality score is 88/100 and dynamic average is 82.6/100
Assertions and command execution outcomes are recorded per input for human review
Execution verification summary: script verification 1/2 (adjustment=3); analyze_experiment.py exited with return code 1 (rc=1); init_run.py: OK
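A per-script summary of this kind can be produced by running each script and recording its return code. The sketch below shows one way to do that with the standard library; it is hypothetical and not the evaluator's actual harness. The function names are made up, and the example uses stand-in commands because the report does not give the real scripts' arguments.

```python
import subprocess
import sys


def verify_scripts(commands):
    """Run each command and record its return code (0 means the script ran OK)."""
    results = {}
    for name, argv in commands.items():
        completed = subprocess.run(argv, capture_output=True, text=True)
        results[name] = completed.returncode
    return results


def summarize(results):
    """Format the results as 'Script verification passed/total' plus per-script status."""
    passed = sum(1 for rc in results.values() if rc == 0)
    parts = []
    for name, rc in results.items():
        status = "OK" if rc == 0 else f"rc={rc}"
        parts.append(f"{name}: {status}")
    return f"Script verification {passed}/{len(results)}. " + "; ".join(parts)


if __name__ == "__main__":
    # Stand-in commands: one exits with code 1, one exits cleanly.
    demo = {
        "fails.py": [sys.executable, "-c", "import sys; sys.exit(1)"],
        "ok.py": [sys.executable, "-c", "pass"],
    }
    print(summarize(verify_scripts(demo)))
```

A non-zero return code only says that the script exited with an error; the captured stderr would be the place to look for the cause.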