Data Analysis

deg-screening-analysis

Screen and filter differentially expressed genes (DEGs) using volcano plots, MA plots, and multi-threshold criteria. Inputs: a differential expression analysis (DEA) result table with fold changes and p-values. Outputs: a filtered DEG list, an annotated volcano plot, and enrichment-ready gene sets.
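The multi-threshold criterion can be sketched in a few lines. This is a minimal illustration, not the skill's implementation. It assumes limma `topTable`-style keys (`logFC`, `P.Value`, `adj.P.Val`), a hypothetical `gene` key, and default cutoffs of |log2FC| ≥ 1 and p < 0.05. The skill's real column names, cutoffs, and inclusive or exclusive boundaries may differ.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]

# Assumed limma topTable-style column names; the skill's actual columns may differ.
RAW_P, ADJ_P, LFC = "P.Value", "adj.P.Val", "logFC"


def classify(row: Row, lfc_cutoff: float, p_cutoff: float, p_key: str) -> str:
    """Label a gene 'up', 'down', or 'ns' (not significant) using both thresholds."""
    if row[p_key] >= p_cutoff:
        return "ns"
    if row[LFC] >= lfc_cutoff:
        return "up"
    if row[LFC] <= -lfc_cutoff:
        return "down"
    return "ns"


def filter_degs(rows: List[Row], lfc_cutoff: float = 1.0,
                p_cutoff: float = 0.05, use_adjusted: bool = True) -> List[Row]:
    """Keep rows passing the fold-change AND p-value criteria, tagged with direction."""
    p_key = ADJ_P if use_adjusted else RAW_P
    kept = []
    for row in rows:
        status = classify(row, lfc_cutoff, p_cutoff, p_key)
        if status != "ns":
            kept.append({**row, "status": status})
    return kept


# Toy table (hypothetical values, for illustration only).
table = [
    {"gene": "G1", LFC: 2.3, RAW_P: 0.001, ADJ_P: 0.01},
    {"gene": "G2", LFC: -1.5, RAW_P: 0.004, ADJ_P: 0.08},
    {"gene": "G3", LFC: 0.4, RAW_P: 0.0001, ADJ_P: 0.002},
]
print([r["gene"] for r in filter_degs(table)])                     # ['G1']
print([r["gene"] for r in filter_degs(table, use_adjusted=False)])  # ['G1', 'G2']
```

Each kept row carries a `status`, so the same result can drive both the DEG list and the up/down coloring of a volcano plot.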

90 / 100 Total Score

Category	Score
Core Capability	92 / 100
Functional Suitability	11 / 12
Reliability	11 / 12
Performance & Context	8 / 8
Agent Usability	15 / 16
Human Usability	8 / 8
Security	12 / 12
Maintainability	11 / 12
Agent-Specific	16 / 20
Medical Task	20 / 20 Passed

Score	Scenario	Assertions
92	Bundled OA vs control default run	4/4
90	Raw p-value stricter-threshold run	4/4
87	Zero-DEG strict-threshold run	4/4
90	Case-insensitive group-label run	4/4
88	Single-DEG sparse-heatmap run	4/4

Veto Gates: required pass for any deployment consideration

Skill Veto: ✓ All 4 gates passed

Gate	Criterion	Result
Operational Stability	System remains stable across varied inputs and edge cases	✓ PASS
Structural Consistency	Output structure conforms to the expected skill contract format	✓ PASS
Result Determinism	Equivalent inputs produce semantically equivalent outputs	✓ PASS
System Security	No prompt injection, data leakage, or unsafe tool use detected	✓ PASS

Research Veto: ✅ PASS (Applicable)

Dimension	Result	Detail
Scientific Integrity	PASS	Passed checks. The skill reports computed limma statistics and does not fabricate citations, biomedical claims, or study outcomes.
Practice Boundaries	PASS	Passed checks. The skill stays within computational DEG screening and does not provide diagnosis, treatment advice, or unsafe biomedical recommendations.
Methodological Ground	PASS	Passed checks. The analysis matches a documented two-group limma workflow and the updated plot behavior now aligns with the declared p-value semantics.
Code Usability	PASS	Passed checks. Canonical, variant, zero-DEG, and sparse-heatmap runs all completed successfully, and the bundled tests passed.
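One way to keep a volcano plot consistent with the declared p-value semantics is to derive both the y-axis and the significance line from the same column the screen uses. The sketch below assumes limma `topTable`-style keys (`logFC`, `P.Value`, `adj.P.Val`) and a hypothetical `gene` key; `volcano_points` is an illustrative name and only computes coordinates, leaving rendering to any plotting library.

```python
import math
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]


def volcano_points(rows: List[Row], p_cutoff: float = 0.05,
                   use_adjusted: bool = True) -> Tuple[List[Tuple[str, float, float]], float]:
    """Return (gene, x, y) points and the y-position of the cutoff line.

    x = logFC and y = -log10(p). The p column is chosen by the same flag the
    DEG screen would use, so plotted significance matches screened significance.
    """
    p_key = "adj.P.Val" if use_adjusted else "P.Value"
    points = []
    for row in rows:
        p = max(row[p_key], 1e-300)  # avoid log10(0) for extremely small p-values
        points.append((row["gene"], row["logFC"], -math.log10(p)))
    threshold_y = -math.log10(p_cutoff)  # horizontal line marking the p-value cutoff
    return points, threshold_y


rows = [{"gene": "G1", "logFC": 2.3, "P.Value": 0.001, "adj.P.Val": 0.01}]
pts, line = volcano_points(rows, use_adjusted=False)
print(pts, round(line, 3))
```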

Core Capability: 92 / 100 (8 categories)

Functional Suitability

Core promises are implemented and documented; only a minor wording mismatch remains in the workflow text that still mentions a volcano input table rather than the current direct plotting path.

11 / 12

92%

Reliability

The run paths are now stable and degrade cleanly, though the zero-DEG path emits two closely related heatmap warnings that could be consolidated.

11 / 12

92%

Performance & Context

No issues flagged. The skill remains compact and efficient to invoke.

8 / 8

100%

Agent Usability

The CLI surface and examples are clear, with only minor room to tighten wording around plotting side effects and warning behavior.

15 / 16

94%

Human Usability

No issues flagged. The examples, arguments, and output contract are now easy to follow.

8 / 8

100%

Security

No issues flagged. The scripts validate inputs, avoid dangerous execution primitives, and do not expose credentials or destructive operations.

12 / 12

100%

Maintainability

The code is modular and test-backed, with only small remaining documentation cleanup items.

11 / 12

92%

Agent-Specific

Trigger precision and scope boundaries are good. A slightly stronger body-level escape hatch for plot-free runs would make invocation guidance even clearer.

16 / 20

80%

Core Capability Total: 92 / 100

Medical Task: Execution Average 89.4 / 100; Assertions 20/20 Passed

Case	Scenario	Assertions
Canonical	Bundled OA vs control default run	4/4 ✓
Variant A	Raw p-value stricter-threshold run	4/4 ✓
Edge	Zero-DEG strict-threshold run	4/4 ✓
Variant B	Case-insensitive group-label run	4/4 ✓
Stress	Single-DEG sparse-heatmap run	4/4 ✓

Canonical: ✅ Pass

Bundled OA vs control default run

Executed successfully with all declared core artifacts present.

Basic 36/40 | Specialized 56/60 | Total 92/100

✅ A1: Core declared output files are created

✅ A2: The canonical run completes successfully with the bundled data

✅ A3: The DEG table is a filtered subset of the full differential table

✅ A4: The workflow stays within the declared two-group limma scope

Pass rate: 4 / 4
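Assertion A3 can be checked mechanically. A minimal sketch, assuming both tables are CSV text keyed by a hypothetical `gene` column: every DEG row must appear in the full table, shared columns must match exactly, and extra DEG-only columns (for example a direction label) are ignored.

```python
import csv
import io
from typing import Dict


def is_filtered_subset(full_csv: str, deg_csv: str, key: str = "gene") -> bool:
    """True if every DEG row appears in the full table with identical shared values."""
    full: Dict[str, Dict[str, str]] = {
        row[key]: row for row in csv.DictReader(io.StringIO(full_csv))
    }
    for row in csv.DictReader(io.StringIO(deg_csv)):
        ref = full.get(row[key])
        if ref is None:
            return False  # DEG gene missing from the full table
        if any(ref[col] != val for col, val in row.items() if col in ref):
            return False  # a shared column disagrees with the full table
    return True


full_table = "gene,logFC,adj.P.Val\nG1,2.3,0.01\nG2,-1.5,0.08\n"
deg_table = "gene,logFC,adj.P.Val,status\nG1,2.3,0.01,up\n"
print(is_filtered_subset(full_table, deg_table))
```

The comparison is textual, so it assumes both files were written with the same number formatting.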

Variant A: ✅ Pass

Raw p-value stricter-threshold run

Executed successfully under stricter thresholds and raw p-value screening.

Basic 36/40 | Specialized 54/60 | Total 90/100

✅ A1: Raw p-value screening executes successfully

✅ A2: The volcano path remains compatible with the raw p-value mode

✅ A3: The stricter thresholds still produce a coherent DEG table

✅ A4: Plot outputs are still generated under the variant configuration

Pass rate: 4 / 4
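For context on the raw versus adjusted modes: limma's `topTable` reports Benjamini-Hochberg (BH) adjusted p-values by default. The sketch below is a plain-Python BH step-up adjustment, similar in intent to R's `p.adjust(method = "BH")`, offered only to illustrate why adjusted values are never smaller than raw ones; it is not the skill's code.

```python
from typing import List


def bh_adjust(pvals: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    n = len(pvals)
    order = sorted(range(n), key=lambda i: pvals[i])  # indices by ascending p
    adjusted = [0.0] * n
    running_min = 1.0  # starting at 1 also caps adjusted values at 1
    # Walk from the largest p-value down, enforcing monotonicity via a running minimum.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


raw = [0.01, 0.04, 0.03, 0.005]
adj = bh_adjust(raw)
print([round(a, 4) for a in adj])  # [0.02, 0.04, 0.04, 0.02]
```

Because every adjusted value is at least its raw counterpart, screening on raw p-values at the same cutoff admits at least as many genes as screening on adjusted ones.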

Edge: ✅ Pass

Zero-DEG strict-threshold run

Completed cleanly with an empty DEG table and informative warnings.

Basic 35/40 | Specialized 52/60 | Total 87/100

✅ A1: Strict thresholds still complete without crashing

✅ A2: Empty DEG output is emitted instead of failing

✅ A3: The skill explains the missing heatmap with warnings

✅ A4: The run still produces the remaining valid outputs

Pass rate: 4 / 4
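The zero-DEG behavior (emit an empty table rather than fail) can be sketched as writing a header-only CSV plus a single warning. The function name and column list are hypothetical; only the standard-library `csv` and `warnings` modules are used.

```python
import csv
import io
import warnings
from typing import Any, Dict, List, TextIO


def write_deg_table(rows: List[Dict[str, Any]], fieldnames: List[str], out: TextIO) -> int:
    """Write the DEG table; with zero rows, still write the header and warn once."""
    writer = csv.DictWriter(out, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    if not rows:
        warnings.warn("No genes passed the thresholds; wrote a header-only DEG table.")
    return len(rows)


buf = io.StringIO()
n_written = write_deg_table([], ["gene", "logFC", "adj.P.Val"], buf)
print(n_written, repr(buf.getvalue()))
```

Keeping the header means downstream steps that read the DEG file still see the expected columns, just with no rows.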

Variant B: ✅ Pass

Case-insensitive group-label run

Executed successfully with case-insensitive group matching and a smaller heatmap selection budget.

Basic 36/40 | Specialized 54/60 | Total 90/100

✅ A1: Case and control group-label matching ignores letter case

✅ A2: A multi-column group file layout is accepted

✅ A3: Plot outputs are still generated under the variant configuration

✅ A4: The run remains within the documented CLI surface

Pass rate: 4 / 4
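Case-insensitive matching against a multi-column group file can be sketched as below. This assumes a comma-separated file with hypothetical `sample` and `group` columns and OA/control labels as in the bundled example. Other columns are ignored, and the function rejects inputs that do not yield both a case and a control group, mirroring the two-group scope.

```python
import csv
import io
from typing import Dict


def read_groups(text: str, case: str = "OA", control: str = "control",
                sample_col: str = "sample", group_col: str = "group") -> Dict[str, str]:
    """Map sample IDs to 'case' or 'control', matching labels case-insensitively."""
    wanted = {case.casefold(): "case", control.casefold(): "control"}
    assignment: Dict[str, str] = {}
    for row in csv.DictReader(io.StringIO(text)):
        role = wanted.get(row[group_col].strip().casefold())
        if role is not None:  # samples with other labels are left out
            assignment[row[sample_col]] = role
    if set(assignment.values()) != {"case", "control"}:
        raise ValueError("Both a case group and a control group must be present.")
    return assignment


groups_csv = "sample,batch,group\nS1,b1,oa\nS2,b1,Control\nS3,b2,OA\nS4,b2,other\n"
print(read_groups(groups_csv))
```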

Stress: ✅ Pass

Single-DEG sparse-heatmap run

A sparse-result run completed successfully and skipped heatmap rendering cleanly.

Basic 35/40 | Specialized 53/60 | Total 88/100

✅ A1: A valid sparse-result run completes successfully

✅ A2: Heatmap generation degrades gracefully when fewer than two genes are selected

✅ A3: The result still includes the successful non-heatmap artifacts

✅ A4: The request remains within the declared two-group limma scope

Pass rate: 4 / 4
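The graceful heatmap skip can be expressed as a selection step that returns nothing when fewer than two genes survive, emitting one warning that covers both the zero-gene and one-gene cases, which is the kind of consolidation the Reliability note suggests. `select_heatmap_genes`, the `max_genes` budget, and ranking by adjusted p-value are illustrative assumptions, not the skill's documented behavior.

```python
import warnings
from typing import Any, Dict, List, Optional


def select_heatmap_genes(degs: List[Dict[str, Any]], max_genes: int = 50,
                         rank_key: str = "adj.P.Val") -> Optional[List[str]]:
    """Return up to max_genes most significant genes, or None if fewer than 2 remain."""
    ranked = sorted(degs, key=lambda r: r[rank_key])[:max_genes]
    if len(ranked) < 2:
        # One consolidated message for both the empty and the single-gene case.
        warnings.warn(f"Heatmap skipped: {len(ranked)} gene(s) selected, at least 2 required.")
        return None
    return [r["gene"] for r in ranked]


degs = [{"gene": "G1", "adj.P.Val": 0.01}, {"gene": "G2", "adj.P.Val": 0.001}]
print(select_heatmap_genes(degs))               # ['G2', 'G1']
print(select_heatmap_genes(degs, max_genes=1))  # None, with a warning
```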

Medical Task Total: 89.4 / 100
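The per-run subtotals and the 89.4 average can be re-derived from the five run scores reported in this section; this short check only restates that arithmetic.

```python
# Scores copied from the five runs: (basic /40, specialized /60, total /100).
runs = {
    "Canonical": (36, 56, 92),
    "Variant A": (36, 54, 90),
    "Edge": (35, 52, 87),
    "Variant B": (36, 54, 90),
    "Stress": (35, 53, 88),
}

# Each total should equal basic + specialized.
for name, (basic, specialized, total) in runs.items():
    assert basic + specialized == total, name

average = sum(total for _, _, total in runs.values()) / len(runs)
print(f"{average:.1f}")  # 89.4
```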

Key Strengths

The CLI contract is clear, runnable, and backed by bundled example data plus a passing automated test suite.
Canonical, variant, zero-DEG, and sparse-heatmap runs all completed successfully without violating the skill scope.
The plotting path now aligns with the declared p-value semantics and degrades cleanly on low-signal runs.
Determinism is strong: repeated canonical runs produced identical hashes for both major CSV outputs.
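The determinism check (identical hashes across repeated runs) can be reproduced with the standard library. A sketch assuming each run writes into its own directory; the file names `full_table.csv` and `deg_table.csv` are hypothetical stand-ins for the skill's two major CSV outputs.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Iterable


def sha256_of(path: Path) -> str:
    """Hex SHA-256 digest of a file's bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def outputs_identical(run_a: Path, run_b: Path, names: Iterable[str]) -> bool:
    """True if every named output file hashes identically in both run directories."""
    return all(sha256_of(run_a / n) == sha256_of(run_b / n) for n in names)


# Demo with two temporary "run" directories holding identical outputs.
outputs = ["full_table.csv", "deg_table.csv"]  # hypothetical file names
with tempfile.TemporaryDirectory() as a, tempfile.TemporaryDirectory() as b:
    for d in (a, b):
        for name in outputs:
            (Path(d) / name).write_text("gene,logFC\nG1,2.3\n")
    same = outputs_identical(Path(a), Path(b), outputs)
    (Path(b) / "deg_table.csv").write_text("gene,logFC\n")
    changed = outputs_identical(Path(a), Path(b), outputs)
print(same, changed)  # True False
```

Byte-level hashes are stricter than the "semantically equivalent" criterion of the Result Determinism gate: a change in float formatting or row order breaks hash equality even when the results are equivalent.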