Other

adaptyv

90 / 100 Total Score

Core Capability

83 / 100

Functional Suitability

11 / 12

Reliability

10 / 12

Performance & Context

8 / 8

Agent Usability

13 / 16

Human Usability

7 / 8

Security

9 / 12

Maintainability

9 / 12

Agent-Specific

16 / 20

Medical Task

20 / 20 Passed

99: Cloud laboratory platform for automated protein testing and validation

4/4

95: Cloud laboratory platform for automated protein testing and validation

4/4

93: Cloud laboratory platform for automated protein testing and validation

4/4

93: Packaged executable path(s): scripts/validate_skill.py

4/4

93: End-to-end case for the scope-focused workflow aligned to: Cloud laboratory platform for automated protein testing and validation; use when you have designed protein sequences and need wet-lab experimental validation (e.g., binding, expression, thermostability, enzyme activity) and API-based submission/status/result retrieval

4/4

Veto Gates

Required pass for any deployment consideration

Skill Veto: ✓ All 4 gates passed

Operational Stability

System remains stable across varied inputs and edge cases

PASS

Structural Consistency

Output structure conforms to expected skill contract format

PASS

Result Determinism

Equivalent inputs produce semantically equivalent outputs

PASS

System Security

No prompt injection, data leakage, or unsafe tool use detected

PASS
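The gate logic above can be sketched in a few lines. This is only an illustration: the gate names and outcomes are transcribed from the report, while the rule that every gate must report PASS is an assumption inferred from the "Required pass" wording, and `veto_passed` is a hypothetical helper, not part of the evaluated skill.

```python
# Gate names and outcomes transcribed from the report above.
veto_gates = {
    "Operational Stability": "PASS",
    "Structural Consistency": "PASS",
    "Result Determinism": "PASS",
    "System Security": "PASS",
}


def veto_passed(gates):
    """Return True only if every gate reports PASS (assumed all-or-nothing rule)."""
    return all(status == "PASS" for status in gates.values())


# Count passing gates and report the overall veto outcome.
passed_count = sum(status == "PASS" for status in veto_gates.values())
print(f"{passed_count}/{len(veto_gates)} gates passed; veto cleared: {veto_passed(veto_gates)}")
```

Under this assumed rule a single failing gate blocks deployment consideration regardless of the numeric scores.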

Core Capability: 83 / 100 (8 Categories)

Functional Suitability

Related legacy finding for adaptyv: improve stress-case output rigor. Stress and boundary scenarios show weaker consistency.

11 / 12

92%

Reliability

Related legacy finding for adaptyv: improve stress-case output rigor. Stress and boundary scenarios show weaker consistency.

10 / 12

83%

Performance & Context

Performance context reached full score in the archived evaluation.

8 / 8

100%

Agent Usability

A modest deduction remained in agent usability for adaptyv in the archived review.

13 / 16

81%

Human Usability

The legacy audit deducted points for adaptyv in human usability.

7 / 8

88%

Security

A modest deduction remained in security for adaptyv in the archived review.

9 / 12

75%

Maintainability

The legacy audit deducted points for adaptyv in maintainability.

9 / 12

75%

Agent-Specific

Related legacy finding for adaptyv: improve stress-case output rigor. Stress and boundary scenarios show weaker consistency.

16 / 20

80%

Core Capability Total: 83 / 100
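As a cross-check, the eight category scores can be summed to reproduce the 83 / 100 total and the per-category percentages. The sketch below assumes the total is a plain sum and that percentages are rounded half-up to whole numbers; both assumptions match the figures shown but are not stated in the report. The `percent` helper is hypothetical.

```python
import math

# (earned, maximum) per category, transcribed from the report.
category_scores = {
    "Functional Suitability": (11, 12),
    "Reliability": (10, 12),
    "Performance & Context": (8, 8),
    "Agent Usability": (13, 16),
    "Human Usability": (7, 8),
    "Security": (9, 12),
    "Maintainability": (9, 12),
    "Agent-Specific": (16, 20),
}


def percent(earned, maximum):
    """Whole-number percentage, rounding halves up (assumed rounding rule)."""
    return math.floor(100 * earned / maximum + 0.5)


total_earned = sum(earned for earned, _ in category_scores.values())
total_max = sum(maximum for _, maximum in category_scores.values())

for name, (earned, maximum) in category_scores.items():
    print(f"{name}: {earned} / {maximum} ({percent(earned, maximum)}%)")
print(f"Core Capability Total: {total_earned} / {total_max}")
```

Security and Maintainability carry the lowest ratios (75% each), so those are the categories where the remaining points are concentrated.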

Medical Task

Execution Average: 94.6 / 100 | Assertions: 20/20 Passed

Canonical

Cloud laboratory platform for automated protein testing and validation

4/4 ✓

Variant A

Cloud laboratory platform for automated protein testing and validation

4/4 ✓

Edge

Cloud laboratory platform for automated protein testing and validation

4/4 ✓

Variant B

Packaged executable path(s): scripts/validate_skill.py

4/4 ✓

Stress

End-to-end case for Scope-focused workflow aligned to: Cloud laboratory platform for automated protein testing and validation; use when you have designed protein sequences and need wet-lab experimental validation (e.g., binding, expression, thermostability, enzyme activity) and API-based submission/status/result retrieval

4/4 ✓

Canonical: ✅ Pass

Cloud laboratory platform for automated protein testing and validation

The "Cloud laboratory platform for automated protein testing and validation" path verified the packaged helper command without exposing a deeper execution issue.

Basic 38/40 | Specialized 60/60 | Total 99/100

✅ A1: The adaptyv output structure matches the documented deliverable

✅ A2: The script execution path completed successfully for the documented case

✅ A3: The output stays fully within the documented skill boundary

✅ A4: The response quality is acceptable for the documented path

Pass rate: 4 / 4

Variant A: ✅ Pass

Cloud laboratory platform for automated protein testing and validation

For "Cloud laboratory platform for automated protein testing and validation", the preserved evidence is lightweight but positive: the packaged validation command behaved as expected.

Basic 36/40 | Specialized 59/60 | Total 95/100

✅ A1: The adaptyv output structure matches the documented deliverable

✅ A2: The script execution path completed successfully for the documented case

✅ A3: The output stays fully within the documented skill boundary

✅ A4: The response quality is acceptable for the documented path

Pass rate: 4 / 4

Edge: ✅ Pass

Cloud laboratory platform for automated protein testing and validation

The archived run for "Cloud laboratory platform for automated protein testing and validation" confirmed the helper entrypoint and left the workflow in a stable state.

Basic 35/40 | Specialized 58/60 | Total 93/100

✅ A1: The adaptyv output structure matches the documented deliverable

✅ A2: The script execution path completed successfully for the documented case

✅ A3: The output stays fully within the documented skill boundary

✅ A4: The response quality is acceptable for the documented path

Pass rate: 4 / 4

Variant B: ✅ Pass

Packaged executable path(s): scripts/validate_skill.py

For the packaged executable path scripts/validate_skill.py, the preserved evidence is lightweight but positive: the packaged validation command behaved as expected.

Basic 34/40 | Specialized 59/60 | Total 93/100

✅ A1: The adaptyv output structure matches the documented deliverable

✅ A2: The script execution path completed successfully for the documented case

✅ A3: The output stays fully within the documented skill boundary

✅ A4: The response quality is acceptable for the documented path

Pass rate: 4 / 4

Stress: ✅ Pass

The "Cloud laboratory platform for automated protein testing and validation" path verified the packaged helper command without exposing a deeper execution issue.

Basic 31/40 | Specialized 60/60 | Total 93/100

✅ A1: The adaptyv output structure matches the documented deliverable

✅ A2: The script execution path completed successfully for the documented case

✅ A3: The output stays fully within the documented skill boundary

✅ A4: The response quality is acceptable for the documented path

Pass rate: 4 / 4

Medical Task Total: 94.6 / 100
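The 94.6 figure is consistent with an unweighted mean of the five reported case totals, and 20/20 with four assertions per case. The sketch below assumes exactly that aggregation. It uses the reported Total values directly, because the report does not state how Basic and Specialized combine into Total (for the Canonical and Stress cases they do not simply add up).

```python
# Reported "Total" score per test case, transcribed from the report.
case_totals = {
    "Canonical": 99,
    "Variant A": 95,
    "Edge": 93,
    "Variant B": 93,
    "Stress": 93,
}

# Every case lists assertions A1-A4, all passing.
assertions_per_case = 4

# Assumed aggregation: plain arithmetic mean over the five cases.
execution_average = sum(case_totals.values()) / len(case_totals)
assertions_passed = assertions_per_case * len(case_totals)

print(f"Execution Average: {execution_average:.1f} / 100")
print(f"Assertions: {assertions_passed}/{assertions_passed} Passed")
```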

Key Strengths

Primary routing is Other with execution mode B
Static quality score is 83/100 and dynamic average is 81.6/100
Assertions and command execution outcomes are recorded per input for human review
Execution verification summary: Script verification 1/1; adjustment=5. validate_skill.py: OK