Data Analysis

ppi-network-analysis

Construct and analyze protein-protein interaction networks by querying the STRING database for input gene lists. Inputs: DEG or candidate gene list. Outputs: PPI network graph, hub genes by degree/betweenness centrality, network topology statistics.
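As a rough illustration of what "hub genes by degree/betweenness centrality" means, the sketch below filters a toy edge list by score, builds an undirected graph, and computes both metrics in plain Python. It is not the skill's implementation: the function names, the placeholder gene labels, and the scores are invented for this example, and the default cutoff of 400 is borrowed from the lower bound exercised in this audit. Betweenness is computed with Brandes' algorithm and left unnormalized.

```python
from collections import defaultdict, deque

def build_graph(edges, min_score=400):
    """Keep edges whose score meets the cutoff; return adjacency sets.

    Nodes whose edges are all filtered out do not appear in the result.
    """
    adj = defaultdict(set)
    for a, b, score in edges:
        if a != b and score >= min_score:
            adj[a].add(b)
            adj[b].add(a)
    return dict(adj)

def degree(adj):
    """Number of retained interaction partners per node."""
    return {node: len(nbrs) for node, nbrs in adj.items()}

def betweenness(adj):
    """Unnormalized betweenness for an undirected, unweighted graph (Brandes)."""
    bc = dict.fromkeys(adj, 0.0)
    for s in adj:
        stack = []
        preds = {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)   # number of shortest paths from s
        sigma[s] = 1
        dist = dict.fromkeys(adj, -1)
        dist[s] = 0
        queue = deque([s])
        while queue:                    # breadth-first search from s
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(adj, 0.0)
        while stack:                    # accumulate dependencies in reverse BFS order
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Each unordered pair was counted from both endpoints.
    return {node: value / 2 for node, value in bc.items()}

# Toy edge list: placeholder labels and made-up scores, not STRING data.
toy_edges = [
    ("A", "B", 900),
    ("A", "C", 700),
    ("A", "D", 450),
    ("D", "E", 400),
    ("B", "C", 300),  # below the cutoff, so it is dropped
]
adj = build_graph(toy_edges, min_score=400)
deg = degree(adj)
btw = betweenness(adj)
hubs = sorted(adj, key=lambda n: (-deg[n], -btw[n], n))
```

In this toy graph node A ranks first on both metrics and D second, which is the kind of ordering a hub-gene table would report.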

Total Score: 90 / 100

Core Capability

94 / 100

Functional Suitability

12 / 12

Reliability

11 / 12

Performance & Context

8 / 8

Agent Usability

15 / 16

Human Usability

8 / 8

Security

12 / 12

Maintainability

12 / 12

Agent-Specific

16 / 20

Medical Task

32 / 32 Passed

Scenario	Score	Assertions
Human bundled gene list	92	5/5
Numeric species with styled plot	90	5/5
Lower-bound threshold 400	88	5/5
Plot-only regeneration	84	4/4
High-option styled run	92	5/5
Unsupported species rejection	82	4/4
Invalid line_type rejection	84	4/4

Veto Gates: Required pass for any deployment consideration

Skill Veto: ✓ All 4 gates passed

Gate	Result	Detail
Operational Stability	PASS	System remains stable across varied inputs and edge cases.
Structural Consistency	PASS	Output structure conforms to expected skill contract format.
Result Determinism	PASS	Equivalent inputs produce semantically equivalent outputs.
System Security	PASS	No prompt injection, data leakage, or unsafe tool use detected.

Research Veto: ✅ PASS (Applicable)

Dimension	Result	Detail
Scientific Integrity	PASS	No fabricated biological or statistical claims were introduced across the audited outputs.
Practice Boundaries	PASS	Outputs remained technical and procedural; no diagnostic, prescriptive, or treatment advice was generated.
Methodological Ground	PASS	STRING-cache mapping, score filtering, graph construction, and plot-only reuse matched the documented workflow without methodological fallacies.
Code Usability	PASS	The CLI, bundled tests, and new regression checks all ran successfully after the fixes.

Core Capability: 94 / 100 (8 Categories)

Functional Suitability

Full score achieved.

12 / 12

100%

Reliability

Failure modes are clear and recoverable; species validation still occurs after gene-list parsing, but no incorrect side effects remain.

11 / 12

92%

Performance & Context

Full score achieved.

8 / 8

100%

Agent Usability

The execution path is clear and example-driven, with only minor front-loaded logging before one validation failure path.

15 / 16

94%

Human Usability

Full score achieved.

8 / 8

100%

Security

Full score achieved.

12 / 12

100%

Maintainability

Full score achieved.

12 / 12

100%

Agent-Specific

Trigger precision, layering, and escape hatches are strong; the skill remains slightly conservative in one validation order but behaves consistently.

16 / 20

80%

Core Capability Total: 94 / 100
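The total can be cross-checked against the eight category scores listed above. The snippet below is only an arithmetic check; the variable names are chosen for this example, and rounding percentages to the nearest integer is an assumption about how the scorecard displays them.

```python
# (earned, maximum) per category, copied from the scorecard.
categories = {
    "Functional Suitability": (12, 12),
    "Reliability": (11, 12),
    "Performance & Context": (8, 8),
    "Agent Usability": (15, 16),
    "Human Usability": (8, 8),
    "Security": (12, 12),
    "Maintainability": (12, 12),
    "Agent-Specific": (16, 20),
}

earned = sum(e for e, _ in categories.values())
maximum = sum(m for _, m in categories.values())

# Percentages rounded to the nearest integer (assumed display convention).
percent = {name: round(100 * e / m) for name, (e, m) in categories.items()}

print(f"{earned} / {maximum}")
```

The six points short of the maximum come from Reliability (1), Agent Usability (1), and Agent-Specific (4).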

Medical Task: Execution Average 87.4 / 100; Assertions 32/32 Passed

Canonical

Human bundled gene list

5/5 ✓

Variant A

Numeric species with styled plot

5/5 ✓

Edge

Lower-bound threshold 400

5/5 ✓

Variant B

Plot-only regeneration

4/4 ✓

Stress

High-option styled run

5/5 ✓

Scope Boundary

Unsupported species rejection

4/4 ✓

Adversarial

Invalid line_type rejection

4/4 ✓

Canonical: ✅ Pass

Human bundled gene list

Executed cleanly and produced the full documented artifact set.

Basic 37/40 | Specialized 55/60 | Total 92/100

✅ A1: Output creates all documented artifacts
✅ A2: Summary metrics are internally consistent
✅ A3: Execution stays within the offline STRING workflow
✅ A4: Output remains inside the requested skill-local directory
✅ A5: Reproducibility controls are available for layout generation

Pass rate: 5 / 5

Variant A: ✅ Pass

Numeric species with styled plot

Numeric species normalization and supported styling options worked as documented.

Basic 36/40 | Specialized 54/60 | Total 90/100

✅ A1: Numeric species 9606 resolves successfully
✅ A2: Supported style options do not break the run
✅ A3: The skill still produces the documented output set
✅ A4: Core network results stay consistent for the same input and threshold
✅ A5: Execution stays in scope for a local STRING workflow

Pass rate: 5 / 5

Edge: ✅ Pass

Lower-bound threshold 400

The documented lower boundary was accepted and produced a denser network with a useful warning.

Basic 35/40 | Specialized 53/60 | Total 88/100

✅ A1: Threshold lower bound 400 is accepted
✅ A2: The tool emits a useful low-threshold warning
✅ A3: Output files are still produced successfully
✅ A4: Summary metrics reflect the changed threshold
✅ A5: The workflow remains methodologically coherent at the boundary

Pass rate: 5 / 5

Variant B: ✅ Pass

Plot-only regeneration

Plot-only reuse behaved as documented and avoided unnecessary rebuild work.

Basic 34/40 | Specialized 50/60 | Total 84/100

✅ A1: Plot-only mode succeeds from a prior full run
✅ A2: The CLI does not force unnecessary rebuild work
✅ A3: Output stays in the existing target directory
✅ A4: The reuse path matches the documented contract

Pass rate: 4 / 4

Stress: ✅ Pass

High-option styled run

A broad supported option set executed successfully without destabilizing the analysis outputs.

Basic 37/40 | Specialized 55/60 | Total 92/100

✅ A1: A broad supported option set executes successfully
✅ A2: Data outputs remain stable for the same input and threshold
✅ A3: Plot customization stays within the documented feature set
✅ A4: Reproducibility and containment remain intact
✅ A5: The full output contract is preserved under higher option complexity

Pass rate: 5 / 5

Scope Boundary: ✅ Pass

Unsupported species rejection

The command failed on the expected validation path and produced no fresh output directory.

Basic 33/40 | Specialized 49/60 | Total 82/100

✅ A1: Unsupported species is rejected with SKILL_INVALID_PARAMETER
✅ A2: The error message tells the user how to recover
✅ A3: A fresh failing run does not create a new output directory
✅ A4: The command fails before any cache or file-output side effects

Pass rate: 4 / 4

Adversarial: ✅ Pass

Invalid line_type rejection

The unsupported line_type value now fails validation immediately with no side effects.

Basic 34/40 | Specialized 50/60 | Total 84/100

✅ A1: Unsupported line_type values are rejected with SKILL_INVALID_PARAMETER
✅ A2: Runtime behavior now matches the documented contract for unsupported plot options
✅ A3: A fresh failing run does not create a new output directory
✅ A4: Validation fails before any analysis side effects

Pass rate: 4 / 4
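The two rejection scenarios both check the same property: parameters are validated before anything is written. A minimal sketch of that fail-fast ordering follows. Everything except the error code SKILL_INVALID_PARAMETER, the taxon ID 9606, and the lower bound of 400 is an assumption made for illustration: the function and class names, the set of accepted line types, the species aliases, and the upper bound of 1000 (taken from STRING's 0 to 1000 score scale) are hypothetical and may not match the skill's real CLI.

```python
import os
import tempfile

class SkillError(Exception):
    """Error carrying a machine-readable code (class name is hypothetical)."""
    def __init__(self, code, message):
        super().__init__(f"{code}: {message}")
        self.code = code

# Assumed values for illustration only; the real supported sets are not
# listed in this report.
SUPPORTED_SPECIES = {"human": 9606, "9606": 9606}
SUPPORTED_LINE_TYPES = {"solid", "dashed", "dotted"}

def validate_params(species, threshold, line_type):
    """Check every parameter and return the resolved taxon ID."""
    key = str(species).strip().lower()
    if key not in SUPPORTED_SPECIES:
        raise SkillError(
            "SKILL_INVALID_PARAMETER",
            f"unsupported species {species!r}; use one of {sorted(SUPPORTED_SPECIES)}",
        )
    if not 400 <= threshold <= 1000:
        raise SkillError(
            "SKILL_INVALID_PARAMETER",
            f"threshold {threshold} is outside 400-1000",
        )
    if line_type not in SUPPORTED_LINE_TYPES:
        raise SkillError(
            "SKILL_INVALID_PARAMETER",
            f"unsupported line_type {line_type!r}; use one of {sorted(SUPPORTED_LINE_TYPES)}",
        )
    return SUPPORTED_SPECIES[key]

def run(species, threshold, line_type, out_dir):
    """Validate first; create the output directory only after all checks pass."""
    taxon = validate_params(species, threshold, line_type)
    os.makedirs(out_dir, exist_ok=True)  # first side effect happens here
    return taxon
```

Because `os.makedirs` is only reached after `validate_params` returns, a rejected run leaves no output directory behind, which is the behavior assertions A3 and A4 describe.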

Medical Task Total: 87.4 / 100
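The 87.4 figure is consistent with an unweighted mean of the seven scenario totals. The check below assumes equal weighting, which the report does not state explicitly; variable names are chosen for this example.

```python
# (total score, assertions passed, assertions run) per scenario, from the report.
scenarios = {
    "Canonical": (92, 5, 5),
    "Variant A": (90, 5, 5),
    "Edge": (88, 5, 5),
    "Variant B": (84, 4, 4),
    "Stress": (92, 5, 5),
    "Scope Boundary": (82, 4, 4),
    "Adversarial": (84, 4, 4),
}

scores = [total for total, _, _ in scenarios.values()]
execution_average = round(sum(scores) / len(scores), 1)  # unweighted mean (assumed)

passed = sum(p for _, p, _ in scenarios.values())
run_count = sum(n for _, _, n in scenarios.values())

print(execution_average, f"{passed}/{run_count}")
```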

Key Strengths

The skill is genuinely runnable offline with a bundled STRING cache and reproducible seed controls.
The R implementation is modular, and the packaged smoke tests plus new regression checks passed without modification.
Input validation, path containment, and documented failure modes now align with runtime behavior.
Successful runs consistently wrote the documented bundle, tables, PDF plot, and session information.