Data Analysis

ppi-network-analysis

Construct and analyze protein-protein interaction networks by querying the STRING database for input gene lists. Inputs: DEG or candidate gene list. Outputs: PPI network graph, hub genes by degree/betweenness centrality, network topology statistics.
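
As an illustration of the hub-gene step described above, the following is a minimal sketch of degree and betweenness centrality on a toy undirected graph. It is not the skill's code (the audited implementation is in R); the node names G1 to G5, the edge list, and the helper names are hypothetical, and betweenness is computed with Brandes' algorithm for unweighted graphs.

```python
from collections import deque

def build_adjacency(edges):
    """Build an undirected adjacency map from (node_a, node_b) pairs."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj

def degree_centrality(adj):
    """Raw degree: the number of interaction partners per node."""
    return {node: len(nbrs) for node, nbrs in adj.items()}

def betweenness_centrality(adj):
    """Unnormalized betweenness (Brandes) for an undirected, unweighted graph."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        stack = []
        preds = {v: [] for v in adj}
        sigma = {v: 0 for v in adj}   # number of shortest paths from s
        dist = {v: -1 for v in adj}   # BFS distance from s
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adj}
        while stack:  # accumulate dependencies in reverse BFS order
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Each unordered pair was counted from both endpoints.
    return {v: b / 2 for v, b in bc.items()}

# Hypothetical toy network, not real STRING data.
toy_edges = [("G1", "G2"), ("G1", "G3"), ("G1", "G4"), ("G4", "G5")]
toy_adj = build_adjacency(toy_edges)
toy_degree = degree_centrality(toy_adj)
toy_betweenness = betweenness_centrality(toy_adj)
toy_hubs = sorted(toy_adj, key=lambda n: (-toy_degree[n], -toy_betweenness[n]))
```

In this toy graph G1 ranks first on both measures, which is the sense in which a node would be reported as a hub.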

Total Score: 90 / 100

Core Capability: 94 / 100
Functional Suitability | 12 / 12
Reliability | 11 / 12
Performance & Context | 8 / 8
Agent Usability | 15 / 16
Human Usability | 8 / 8
Security | 12 / 12
Maintainability | 12 / 12
Agent-Specific | 16 / 20

Medical Task: 32 / 32 Passed
92 | Human bundled gene list | 5/5
90 | Numeric species with styled plot | 5/5
88 | Lower-bound threshold 400 | 5/5
84 | Plot-only regeneration | 4/4
92 | High-option styled run | 5/5
82 | Unsupported species rejection | 4/4
84 | Invalid line_type rejection | 4/4

Veto Gates: Required pass for any deployment consideration

Skill Veto: ✓ All 4 gates passed
Operational Stability | System remains stable across varied inputs and edge cases | PASS
Structural Consistency | Output structure conforms to expected skill contract format | PASS
Result Determinism | Equivalent inputs produce semantically equivalent outputs | PASS
System Security | No prompt injection, data leakage, or unsafe tool use detected | PASS

Research Veto: ✅ PASS (Applicable)
Dimension | Result | Detail
Scientific Integrity | PASS | No fabricated biological or statistical claims were introduced across the audited outputs.
Practice Boundaries | PASS | Outputs remained technical and procedural; no diagnostic, prescriptive, or treatment advice was generated.
Methodological Ground | PASS | STRING-cache mapping, score filtering, graph construction, and plot-only reuse matched the documented workflow without methodological fallacies.
Code Usability | PASS | The CLI, bundled tests, and new regression checks all ran successfully after the fixes.

Core Capability: 94 / 100 (8 Categories)

Category | Score | Percent | Notes
Functional Suitability | 12 / 12 | 100% | Full score achieved.
Reliability | 11 / 12 | 92% | Failure modes are clear and recoverable; species validation still occurs after gene-list parsing, but no incorrect side effects remain.
Performance & Context | 8 / 8 | 100% | Full score achieved.
Agent Usability | 15 / 16 | 94% | The execution path is clear and example-driven, with only minor front-loaded logging before one validation failure path.
Human Usability | 8 / 8 | 100% | Full score achieved.
Security | 12 / 12 | 100% | Full score achieved.
Maintainability | 12 / 12 | 100% | Full score achieved.
Agent-Specific | 16 / 20 | 80% | Trigger precision, layering, and escape hatches are strong; the skill remains slightly conservative in one validation order but behaves consistently.
Core Capability Total: 94 / 100

Medical Task: Execution Average 87.4 / 100; Assertions 32/32 Passed

Score | Scenario | Case | Assertions
92 | Canonical | Human bundled gene list | 5/5
90 | Variant A | Numeric species with styled plot | 5/5
88 | Edge | Lower-bound threshold 400 | 5/5
84 | Variant B | Plot-only regeneration | 4/4
92 | Stress | High-option styled run | 5/5
82 | Scope Boundary | Unsupported species rejection | 4/4
84 | Adversarial | Invalid line_type rejection | 4/4

92 | Canonical | ✅ Pass
Human bundled gene list

Executed cleanly and produced the full documented artifact set.

Basic 37/40 | Specialized 55/60 | Total 92/100
A1: Output creates all documented artifacts
A2: Summary metrics are internally consistent
A3: Execution stays within the offline STRING workflow
A4: Output remains inside the requested skill-local directory
A5: Reproducibility controls are available for layout generation
Pass rate: 5 / 5

90 | Variant A | ✅ Pass
Numeric species with styled plot

Numeric species normalization and supported styling options worked as documented.

Basic 36/40 | Specialized 54/60 | Total 90/100
A1: Numeric species 9606 resolves successfully
A2: Supported style options do not break the run
A3: The skill still produces the documented output set
A4: Core network results stay consistent for the same input and threshold
A5: Execution stays in scope for a local STRING workflow
Pass rate: 5 / 5

88 | Edge | ✅ Pass
Lower-bound threshold 400

The documented lower boundary was accepted and produced a denser network with a useful warning.

Basic 35/40 | Specialized 53/60 | Total 88/100
A1: Threshold lower bound 400 is accepted
A2: The tool emits a useful low-threshold warning
A3: Output files are still produced successfully
A4: Summary metrics reflect the changed threshold
A5: The workflow remains methodologically coherent at the boundary
Pass rate: 5 / 5
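
The boundary behavior in this case (400 accepted, a warning emitted, a denser network produced) can be sketched roughly as follows. This is an illustrative Python sketch, not the skill's R code: the only value taken from the report is the lower bound of 400. The upper bound of 1000 (the top of STRING's combined-score scale), the warning cutoff of 700, the function name, and the toy edges are all assumptions.

```python
import warnings

MIN_SCORE = 400    # lower bound documented in this report
MAX_SCORE = 1000   # assumption: top of STRING's combined-score scale
WARN_BELOW = 700   # assumption: hypothetical cutoff for the low-threshold warning

def filter_edges(edges, threshold):
    """Keep (a, b, score) edges whose score is at or above the threshold."""
    if not MIN_SCORE <= threshold <= MAX_SCORE:
        raise ValueError(
            f"threshold must be between {MIN_SCORE} and {MAX_SCORE}, got {threshold}"
        )
    if threshold < WARN_BELOW:
        warnings.warn(
            f"threshold {threshold} is permissive; expect a denser, noisier network"
        )
    return [(a, b, s) for a, b, s in edges if s >= threshold]

# Hypothetical scored edges, not real STRING data.
scored_edges = [("G1", "G2", 950), ("G1", "G3", 720), ("G2", "G3", 450), ("G3", "G4", 410)]

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    dense = filter_edges(scored_edges, 400)   # boundary value: accepted, with a warning
    sparse = filter_edges(scored_edges, 700)  # stricter cutoff: no warning, fewer edges
    warning_count = len(caught)
```

Lowering the threshold can only add edges, which is why the summary metrics are expected to change with it (assertion A4).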

84 | Variant B | ✅ Pass
Plot-only regeneration

Plot-only reuse behaved as documented and avoided unnecessary rebuild work.

Basic 34/40 | Specialized 50/60 | Total 84/100
A1: Plot-only mode succeeds from a prior full run
A2: The CLI does not force unnecessary rebuild work
A3: Output stays in the existing target directory
A4: The reuse path matches the documented contract
Pass rate: 4 / 4

92 | Stress | ✅ Pass
High-option styled run

A broad supported option set executed successfully without destabilizing the analysis outputs.

Basic 37/40 | Specialized 55/60 | Total 92/100
A1: A broad supported option set executes successfully
A2: Data outputs remain stable for the same input and threshold
A3: Plot customization stays within the documented feature set
A4: Reproducibility and containment remain intact
A5: The full output contract is preserved under higher option complexity
Pass rate: 5 / 5

82 | Scope Boundary | ✅ Pass
Unsupported species rejection

The command failed on the expected validation path and produced no fresh output directory.

Basic 33/40 | Specialized 49/60 | Total 82/100
A1: Unsupported species is rejected with SKILL_INVALID_PARAMETER
A2: The error message tells the user how to recover
A3: A fresh failing run does not create a new output directory
A4: The command fails before any cache or file-output side effects
Pass rate: 4 / 4
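
The property checked here, validation failing before any side effects, can be sketched as a validate-then-write ordering. This Python sketch is illustrative only and is not the skill's R code. The error code SKILL_INVALID_PARAMETER and the taxon ID 9606 come from the report; the SkillError class, the SUPPORTED_SPECIES table, and the function names are hypothetical.

```python
import tempfile
from pathlib import Path

class SkillError(Exception):
    """Hypothetical error type that carries a machine-readable code."""
    def __init__(self, code, message):
        super().__init__(f"{code}: {message}")
        self.code = code

# Hypothetical table; the report does not list the actual supported species.
SUPPORTED_SPECIES = {"human": 9606}

def normalize_species(value):
    """Accept a species name or a numeric taxon ID and return the taxon ID."""
    text = str(value).strip().lower()
    if text.isdigit() and int(text) in SUPPORTED_SPECIES.values():
        return int(text)
    if text in SUPPORTED_SPECIES:
        return SUPPORTED_SPECIES[text]
    allowed = ", ".join(f"{name} ({tid})" for name, tid in SUPPORTED_SPECIES.items())
    raise SkillError(
        "SKILL_INVALID_PARAMETER",
        f"unsupported species {value!r}; supported values: {allowed}",
    )

def run(species, out_dir):
    """Validate every parameter first, then create the output directory."""
    taxon = normalize_species(species)      # may raise; nothing written yet
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)  # first side effect, after validation
    return taxon, out

with tempfile.TemporaryDirectory() as tmp:
    ok_taxon, ok_dir = run("9606", Path(tmp) / "ok")
    ok_created = ok_dir.is_dir()
    bad_dir = Path(tmp) / "bad"
    try:
        run("martian", bad_dir)
        error_code = None
    except SkillError as exc:
        error_code = exc.code
    bad_created = bad_dir.exists()
```

Because the directory is created only after normalization succeeds, a rejected run leaves no new output directory behind, which is what assertions A3 and A4 test.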

84 | Adversarial | ✅ Pass
Invalid line_type rejection

The unsupported line_type value now fails validation immediately with no side effects.

Basic 34/40 | Specialized 50/60 | Total 84/100
A1: Unsupported line_type values are rejected with SKILL_INVALID_PARAMETER
A2: Runtime behavior now matches the documented contract for unsupported plot options
A3: A fresh failing run does not create a new output directory
A4: Validation fails before any analysis side effects
Pass rate: 4 / 4

Medical Task Total: 87.4 / 100
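
The reported totals can be checked against the seven scenario scores listed above. The short Python check below uses only numbers from this report and assumes the execution average is an unweighted mean, which matches the reported 87.4.

```python
# (scenario, score, assertions passed, assertions total), copied from the report
scenarios = [
    ("Canonical", 92, 5, 5),
    ("Variant A", 90, 5, 5),
    ("Edge", 88, 5, 5),
    ("Variant B", 84, 4, 4),
    ("Stress", 92, 5, 5),
    ("Scope Boundary", 82, 4, 4),
    ("Adversarial", 84, 4, 4),
]

# Unweighted mean of the scenario scores (assumption about the averaging rule).
execution_average = sum(score for _, score, _, _ in scenarios) / len(scenarios)
assertions_passed = sum(passed for _, _, passed, _ in scenarios)
assertions_total = sum(total for _, _, _, total in scenarios)

print(f"Execution average: {execution_average:.1f} / 100")
print(f"Assertions: {assertions_passed}/{assertions_total} passed")
```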

Key Strengths

  • The skill is genuinely runnable offline with a bundled STRING cache and reproducible seed controls.
  • The R implementation is modular, and the packaged smoke tests plus new regression checks passed without modification.
  • Input validation, path containment, and documented failure modes now align with runtime behavior.
  • Successful runs consistently wrote the documented bundle, tables, PDF plot, and session information.