Evidence Insight

gget

85 / 100 Total Score

Core Capability

85 / 100

Functional Suitability

11 / 12

Reliability

9 / 12

Performance & Context

8 / 8

Agent Usability

14 / 16

Human Usability

8 / 8

Security

9 / 12

Maintainability

9 / 12

Agent-Specific

17 / 20

Medical Task

15 / 20 Passed

88 / 100: You need to search genes/proteins by keyword and species across common databases (e.g., Ensembl/UniProt/NCBI)

3/4

85 / 100: You want to fetch detailed metadata for one or many Ensembl/UniProt/NCBI identifiers

3/4

85 / 100: Unified wrapper (scripts/wrapper.py) exposing multiple gget subcommands through a consistent interface

3/4

85 / 100: Gene/protein search and identifier resolution across multiple databases

3/4

85 / 100: End-to-end case for Unified wrapper (scripts/wrapper.py) exposing multiple gget subcommands through a consistent interface

3/4

Veto Gates: Required pass for any deployment consideration

Skill Veto: ✓ All 4 gates passed

✓

Operational Stability

System remains stable across varied inputs and edge cases

PASS

✓

Structural Consistency

Output structure conforms to expected skill contract format

PASS

✓

Result Determinism

Equivalent inputs produce semantically equivalent outputs

PASS

✓

System Security

No prompt injection, data leakage, or unsafe tool use detected

PASS

Research Veto: ✅ PASS (Applicable)

Dimension	Result	Detail
Scientific Integrity	PASS	The archived evaluation kept the skill tied to retrieved records or indexed source material rather than invented scientific claims.
Practice Boundaries	PASS	The package stayed in retrieval, extraction, or evidence-organization scope rather than drifting into unsupported interpretation.
Methodological Ground	PASS	No methodological-grounding issue was recorded for gget in the archived evaluation.
Code Usability	PASS	The legacy evaluation recorded no usability failure in the packaged retrieval path.

Core Capability: 85 / 100 (8 Categories)

Functional Suitability

Related legacy finding for gget: improve stress-case output rigor. Stress and boundary scenarios show weaker consistency.

11 / 12

92%

Reliability

The archived deduction in reliability traces back to this finding: improve stress-case output rigor. Stress and boundary scenarios show weaker consistency.

9 / 12

75%

Performance & Context

No point loss was recorded for Performance & Context in the legacy audit.

8 / 8

100%

Agent Usability

The legacy audit deducted 2 of 16 points for gget in agent usability.

14 / 16

88%

Human Usability

Human usability reached full score in the archived evaluation.

8 / 8

100%

Security

The archived evaluation deducted 3 of 12 points for gget under security.

9 / 12

75%

Maintainability

The legacy audit deducted 3 of 12 points for gget in maintainability.

9 / 12

75%

Agent-Specific

Related legacy finding for gget: stabilize executable path and fallback behavior. Some inputs only reached PARTIAL due to execution gaps or weak boundary handling.

17 / 20

85%
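The finding above, together with the timeouts noted in the task runs, suggests bounding each script invocation and reporting a degraded status instead of hanging. The following is a minimal sketch under stated assumptions, not the skill's actual implementation: `run_with_timeout` is a hypothetical helper, the `FAIL` label is an assumption (the report itself only names PASS and PARTIAL), and the demo command is a stand-in rather than a real `scripts/wrapper.py` call.

```python
import subprocess
import sys


def run_with_timeout(cmd, timeout_s):
    """Run cmd and return (status, stdout).

    status is "PASS" on exit code 0, "PARTIAL" if the timeout was hit,
    and "FAIL" otherwise (hypothetical labels, see the lead-in).
    """
    try:
        completed = subprocess.run(
            cmd, capture_output=True, text=True, timeout=timeout_s
        )
    except subprocess.TimeoutExpired as exc:
        # Keep whatever output was captured before the child was killed.
        partial = exc.stdout or ""
        if isinstance(partial, bytes):
            partial = partial.decode(errors="replace")
        return "PARTIAL", partial
    except FileNotFoundError:
        # The executable path itself could not be resolved.
        return "FAIL", ""
    status = "PASS" if completed.returncode == 0 else "FAIL"
    return status, completed.stdout


# Stand-in command; a real run would point at the packaged script instead.
status, out = run_with_timeout([sys.executable, "-c", "print('ok')"], 30)
print(status, out.strip())
```

Returning a status alongside any partial output lets a caller record a timed-out run as PARTIAL rather than losing the result entirely.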

Core Capability Total: 85 / 100
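The total can be checked against the eight category scores listed above. The sketch below simply re-adds them; the half-up rounding rule in `percent` is an assumption chosen because it reproduces the displayed percentages (for example 14 / 16 shown as 88%).

```python
# (earned, maximum) per category, copied from the scores in this report.
category_scores = {
    "Functional Suitability": (11, 12),
    "Reliability": (9, 12),
    "Performance & Context": (8, 8),
    "Agent Usability": (14, 16),
    "Human Usability": (8, 8),
    "Security": (9, 12),
    "Maintainability": (9, 12),
    "Agent-Specific": (17, 20),
}


def percent(earned, maximum):
    """Percentage rounded half-up, using integer arithmetic only."""
    return (earned * 200 + maximum) // (2 * maximum)


total_earned = sum(earned for earned, _ in category_scores.values())
total_max = sum(maximum for _, maximum in category_scores.values())

for name, (earned, maximum) in category_scores.items():
    print(f"{name}: {earned} / {maximum} ({percent(earned, maximum)}%)")
print(f"Core Capability Total: {total_earned} / {total_max}")
```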

Medical Task: Execution Average 85.6 / 100; Assertions 15/20 Passed

Canonical

You need to search genes/proteins by keyword and species across common databases (e.g., Ensembl/UniProt/NCBI)

3/4 ✓

Variant A

You want to fetch detailed metadata for one or many Ensembl/UniProt/NCBI identifiers

3/4 ✓

Edge

Unified wrapper (scripts/wrapper.py) exposing multiple gget subcommands through a consistent interface

3/4 ✓

Variant B

Gene/protein search and identifier resolution across multiple databases

3/4 ✓

Stress

End-to-end case for Unified wrapper (scripts/wrapper.py) exposing multiple gget subcommands through a consistent interface

3/4 ✓

Canonical: ✅ Pass

You need to search genes/proteins by keyword and species across common databases (e.g., Ensembl/UniProt/NCBI)

The workflow for "You need to search genes/proteins by keyword and species across..." is present, though the archived execution was cut short by a timeout.

Basic 33/40 | Specialized 55/60 | Total 88/100

✅ A1: The gget output structure matches the documented deliverable

❌ A2: The script execution path completed successfully for the documented case

✅ A3: The output stays fully within the documented skill boundary

✅ A4: The response quality is acceptable for the documented path

Pass rate: 3 / 4

Variant A: ✅ Pass

You want to fetch detailed metadata for one or many Ensembl/UniProt/NCBI identifiers

The packaged path for "You want to fetch detailed metadata for one or many..." remained intelligible, but the observed run timed out before completion.

Basic 31/40 | Specialized 54/60 | Total 85/100

✅ A1: The gget output structure matches the documented deliverable

❌ A2: The script execution path completed successfully for the documented case

✅ A3: The output stays fully within the documented skill boundary

✅ A4: The response quality is acceptable for the documented path

Pass rate: 3 / 4

Edge: ✅ Pass

Unified wrapper (scripts/wrapper.py) exposing multiple gget subcommands through a consistent interface

The "Unified wrapper (scripts/wrapper.py) exposing multiple gget..." path is defined clearly, but this run was interrupted by a timeout.

Basic 30/40 | Specialized 55/60 | Total 85/100

✅ A1: The gget output structure matches the documented deliverable

❌ A2: The script execution path completed successfully for the documented case

✅ A3: The output stays fully within the documented skill boundary

✅ A4: The response quality is acceptable for the documented path

Pass rate: 3 / 4

Variant B: ✅ Pass

Gene/protein search and identifier resolution across multiple databases

The "Gene/protein search and identifier resolution across multiple databases" path is defined clearly, but this run was interrupted by a timeout.

Basic 29/40 | Specialized 56/60 | Total 85/100

✅ A1: The gget output structure matches the documented deliverable

❌ A2: The script execution path completed successfully for the documented case

✅ A3: The output stays fully within the documented skill boundary

✅ A4: The response quality is acceptable for the documented path

Pass rate: 3 / 4

Stress: ✅ Pass

End-to-end case for Unified wrapper (scripts/wrapper.py) exposing multiple gget subcommands through a consistent interface

The workflow for "End-to-end case for Unified wrapper (scripts/wrapper.py) exposing..." is present, though the archived execution was cut short by a timeout.

Basic 26/40 | Specialized 59/60 | Total 85/100

✅ A1: The gget output structure matches the documented deliverable

❌ A2: The script execution path completed successfully for the documented case

✅ A3: The output stays fully within the documented skill boundary

✅ A4: The response quality is acceptable for the documented path

Pass rate: 3 / 4

Medical Task Total: 85.6 / 100
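The 85.6 figure is consistent with a plain mean of the five scenario totals, and the 15/20 assertion count with three passes per scenario. The sketch below reproduces both from the numbers in this report; treating the average as an unweighted mean is an assumption, since the report does not state how it was computed.

```python
# (basic, specialized, assertions_passed) per scenario, copied from this report.
scenarios = {
    "Canonical": (33, 55, 3),
    "Variant A": (31, 54, 3),
    "Edge": (30, 55, 3),
    "Variant B": (29, 56, 3),
    "Stress": (26, 59, 3),
}
ASSERTIONS_PER_SCENARIO = 4  # A1 through A4

# Each scenario total is Basic (out of 40) plus Specialized (out of 60).
totals = {name: basic + spec for name, (basic, spec, _) in scenarios.items()}

# Unweighted mean across scenarios (assumed aggregation).
execution_average = sum(totals.values()) / len(totals)

passed = sum(p for _, _, p in scenarios.values())
possible = ASSERTIONS_PER_SCENARIO * len(scenarios)

print(totals)
print(f"Execution Average: {execution_average:.1f} / 100")
print(f"Assertions: {passed}/{possible} Passed")
```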

Key Strengths

Primary routing is Evidence Insight with execution mode B
Static quality score is 85/100 and dynamic average is 73.6/100
Assertions and command execution outcomes are recorded per input for human review
Execution verification summary: Script verification 1/1; adjustment=5. wrapper.py: OK