Data Analysis

gsea

Perform Gene Set Enrichment Analysis using fgsea or clusterProfiler GSEA on a pre-ranked or expression-derived gene list. Inputs: ranked gene list or expression matrix with phenotype labels. Outputs: NES table, enrichment score plots, leading-edge genes.
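For readers unfamiliar with the outputs named above, the following is a minimal sketch of the standard weighted running-sum enrichment score and its leading-edge genes. It is not the skill's implementation (the skill delegates to fgsea or clusterProfiler in R). The function name `running_enrichment` and the toy gene names are hypothetical, and the sketch computes only the raw enrichment score, not the NES or p-values, which require permutation-based normalization.

```python
def running_enrichment(ranked, gene_set, p=1.0):
    """Return (es, curve, leading_edge) for one gene set.

    ranked   -- list of (gene, score) pairs sorted by descending score
    gene_set -- set of gene identifiers
    p        -- weighting exponent applied to |score| for hits
    """
    hits = [gene in gene_set for gene, _ in ranked]
    n, n_hit = len(ranked), sum(hits)
    if n_hit == 0 or n_hit == n:
        raise ValueError("gene set must overlap the ranked list only partially")

    hit_total = sum(abs(s) ** p for (_, s), h in zip(ranked, hits) if h)
    if hit_total == 0:
        raise ValueError("all hit scores are zero; weights are undefined")
    miss_step = 1.0 / (n - n_hit)

    # Walk down the ranking: step up on hits (weighted), down on misses.
    running, curve = 0.0, []
    for (_, score), is_hit in zip(ranked, hits):
        running += abs(score) ** p / hit_total if is_hit else -miss_step
        curve.append(running)

    # The enrichment score is the maximum deviation from zero.
    peak = max(range(n), key=lambda i: abs(curve[i]))
    es = curve[peak]

    # Leading edge: hits up to the peak (positive ES) or from the peak on (negative ES).
    window = range(0, peak + 1) if es >= 0 else range(peak, n)
    leading_edge = [ranked[i][0] for i in window if hits[i]]
    return es, curve, leading_edge


# Toy ranking with hypothetical gene names.
ranked = [("A", 3.0), ("B", 2.0), ("C", 1.0), ("D", -1.0), ("E", -2.0)]
es, curve, leading_edge = running_enrichment(ranked, {"A", "B"})
print(es, leading_edge)
```

The `curve` list corresponds to what an enrichment score plot draws against rank position.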

90 / 100 Total Score

Core Capability

91 / 100

Functional Suitability

12 / 12

Reliability

11 / 12

Performance & Context

7 / 8

Agent Usability

14 / 16

Human Usability

6 / 8

Security

12 / 12

Maintainability

11 / 12

Agent-Specific

18 / 20

Medical Task

24 / 24 Passed

94 Human KEGG analysis

5/5

92 Hallmarks via RDS

5/5

76 Invalid gene column

4/4

93 Plot top three pathways

5/5

91 Conflicting mode inputs

5/5

Veto Gates: Required pass for any deployment consideration

Skill Veto: ✓ All 4 gates passed

✓

Operational Stability

System remains stable across varied inputs and edge cases

PASS

✓

Structural Consistency

Output structure conforms to expected skill contract format

PASS

✓

Result Determinism

Equivalent inputs produce semantically equivalent outputs

PASS

✓

System Security

No prompt injection, data leakage, or unsafe tool use detected

PASS

Research Veto: ✅ PASS (Applicable)

Dimension	Result	Detail
Scientific Integrity	PASS	No fabricated identifiers, statistics, or biomedical claims were produced in any output.
Practice Boundaries	PASS	Outputs remained computational and did not provide diagnosis, treatment advice, or unsafe medical guidance.
Methodological Ground	PASS	The tested workflows matched the stated GSEA and plotting scope without methodological fallacies.
Code Usability	PASS	Core analysis and plotting commands executed successfully, and validation failures were handled cleanly.

Core Capability: 91 / 100 (8 Categories)

Functional Suitability

Full score achieved.

12 / 12

100%

Reliability

Error handling is strong, but the top-6 running-score export limit is not surfaced early enough for downstream plotting expectations.

11 / 12

92%

Performance & Context

Reference loading is layered well, though the SKILL.md parameter section is dense and slightly heavier than necessary.

7 / 8

88%

Agent Usability

Instructions are clear overall, but the workflow logic is implicit rather than summarized in a short execution checklist.

14 / 16

88%

Human Usability

The description is concise but under-advertises the plotting use case and downstream limitations.

6 / 8

75%

Security

Full score achieved.

12 / 12

100%

Maintainability

Scripts are modular and testable, with only minor documentation coupling around output-file assumptions.

11 / 12

92%

Agent-Specific

Triggering is precise, but the frontmatter description does not mention plotting mode and therefore slightly under-triggers.

18 / 20

90%

Core Capability Total: 91 / 100
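The category scores above can be checked against the reported total and percentages with a few lines of arithmetic. This is a minimal sketch using figures copied from this report. Rounding half up is an assumption about how the percentages were derived.

```python
import math

# (earned, maximum) per category, copied from the scorecard in this report.
CATEGORIES = {
    "Functional Suitability": (12, 12),
    "Reliability": (11, 12),
    "Performance & Context": (7, 8),
    "Agent Usability": (14, 16),
    "Human Usability": (6, 8),
    "Security": (12, 12),
    "Maintainability": (11, 12),
    "Agent-Specific": (18, 20),
}


def round_half_up(x):
    """Round to the nearest integer, with .5 going up (assumed convention)."""
    return math.floor(x + 0.5)


earned = sum(e for e, _ in CATEGORIES.values())
maximum = sum(m for _, m in CATEGORIES.values())
percentages = {
    name: round_half_up(100 * e / m) for name, (e, m) in CATEGORIES.items()
}

print(f"Core Capability total: {earned} / {maximum}")
for name, pct in percentages.items():
    print(f"{name}: {pct}%")
```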

Medical Task: Execution Average 89.2 / 100, Assertions 24/24 Passed

Canonical

Human KEGG analysis

5/5 ✓

Variant A

Hallmarks via RDS

5/5 ✓

Edge

Invalid gene column

4/4 ✓

Variant B

Plot top three pathways

5/5 ✓

Stress

Conflicting mode inputs

5/5 ✓

Canonical ✅ Pass

Human KEGG analysis

Executed successfully, producing all expected artifacts and reproducible outputs.

Basic 38/40 | Specialized 56/60 | Total 94/100

✅ A1: Output creates all documented core analysis artifacts

✅ A2: Output includes machine-verifiable run summary markers

✅ A3: Enrichment table contains expected GSEA columns

✅ A4: Running score table is non-empty for significant pathways

✅ A5: Re-running with the same seed is reproducible

Pass rate: 5 / 5

Variant A ✅ Pass

Hallmarks via RDS

Alias mapping and alternate gene-set family both worked as documented.

Basic 37/40 | Specialized 55/60 | Total 92/100

✅ A1: Skill accepts HALLMARKS as documented

✅ A2: RDS alias mapping resolves the Hallmarks key correctly

✅ A3: Output includes significant Hallmark pathways

✅ A4: Output stays within the stated analysis scope

✅ A5: Result structure matches the documented schema

Pass rate: 5 / 5

Edge ✅ Pass

Invalid gene column

Expected validation failure surfaced clearly and left no partial result files.

Basic 32/40 | Specialized 44/60 | Total 76/100

✅ A1: Invalid column input is rejected with a named error code

✅ A2: Error reporting is actionable

✅ A3: Failure does not leave partial result files behind

✅ A4: The skill remains within scope on failure

Pass rate: 4 / 4

Variant B ✅ Pass

Plot top three pathways

Visualization mode executed cleanly and produced the requested PDF output.

Basic 38/40 | Specialized 55/60 | Total 93/100

✅ A1: Plotting mode works from the documented table inputs

✅ A2: Output confirms how many pathways were plotted

✅ A3: Output lists the plotted pathway IDs

✅ A4: Plotting parameters are respected

✅ A5: Visualization stays within scope and does not rerun analysis

Pass rate: 5 / 5

Stress ✅ Pass

Conflicting mode inputs

Conflict handling matched the documented precedence rule and still produced the PNG plot.

Basic 37/40 | Specialized 54/60 | Total 91/100

✅ A1: Conflicting mode selection follows the documented precedence rule

✅ A2: The warning is explicit rather than silent

✅ A3: Plot output is still generated successfully after the warning

✅ A4: No unintended analysis outputs are created during conflict handling

✅ A5: Control flow remains deterministic under mixed inputs

Pass rate: 5 / 5

Medical Task Total: 89.2 / 100
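The execution average and assertion count follow directly from the five per-task results above. A minimal sketch of that arithmetic, with all figures copied from this report, is:

```python
# (basic, specialized, total, assertions_passed) per task, copied from this report.
TASKS = {
    "Human KEGG analysis": (38, 56, 94, 5),
    "Hallmarks via RDS": (37, 55, 92, 5),
    "Invalid gene column": (32, 44, 76, 4),
    "Plot top three pathways": (38, 55, 93, 5),
    "Conflicting mode inputs": (37, 54, 91, 5),
}

# Each task total should equal its Basic and Specialized sub-scores combined.
for name, (basic, specialized, total, _) in TASKS.items():
    assert basic + specialized == total, name

execution_average = sum(t[2] for t in TASKS.values()) / len(TASKS)
assertions_passed = sum(t[3] for t in TASKS.values())

print(f"Execution average: {execution_average:.1f} / 100")
print(f"Assertions passed: {assertions_passed}")
```

The Edge task's 76 is the main drag on the average; the other four tasks score between 91 and 94.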

Key Strengths

Analysis mode, plotting mode, and conflict handling all executed successfully when run with real commands.
Validation errors are specific, named, and actionable rather than cryptic R stack traces.
Outputs are reproducible with a fixed seed and are easy to verify from generated artifact files.
The skill uses clean separation between CLI documentation, troubleshooting guidance, and R implementation.
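One way to verify the fixed-seed reproducibility noted above is to hash the artifact files from two runs and compare them. The sketch below assumes each run writes to its own directory. The helper names `digest_tree` and `differing_files` and the directory names are hypothetical and not part of the skill. A byte-for-byte comparison is also stricter than semantic equivalence: plot files such as PDFs may embed timestamps, so it may be sensible to restrict the check to table outputs.

```python
import hashlib
from pathlib import Path


def digest_tree(root):
    """Map each file's path (relative to root) to its SHA-256 hex digest."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def differing_files(run_a, run_b):
    """Return sorted relative paths that are missing from one run or differ in content."""
    a, b = digest_tree(run_a), digest_tree(run_b)
    return sorted(name for name in a.keys() | b.keys() if a.get(name) != b.get(name))
```

For example, `differing_files("run_seed42_a", "run_seed42_b")` returning an empty list would indicate byte-identical artifacts across the two runs.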