Data Analysis

deg-screening-analysis

Screen and filter differentially expressed genes using volcano plots, MA plots, and multi-threshold criteria. Inputs: DEA result table with fold-change and p-values. Outputs: filtered DEG list, annotated volcano plot, enrichment-ready gene sets.
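The multi-threshold screening described above can be sketched roughly as follows. This is a minimal illustration, not the skill's implementation: the column names (`log2fc`, `p`, `adj_p`), the default cutoffs, and the function names are assumptions made for the example.

```python
import math


def screen_degs(rows, lfc_cutoff=1.0, p_cutoff=0.05, p_column="adj_p"):
    """Keep rows that pass both the fold-change and the p-value threshold.

    Column names and default cutoffs are hypothetical, chosen for illustration.
    """
    selected = []
    for row in rows:
        if abs(row["log2fc"]) >= lfc_cutoff and row[p_column] < p_cutoff:
            direction = "up" if row["log2fc"] > 0 else "down"
            selected.append({**row, "direction": direction})
    return selected


def volcano_coordinates(row, p_column="adj_p"):
    """Volcano plot position: x is log2 fold change, y is -log10(p)."""
    return row["log2fc"], -math.log10(row[p_column])


# Toy differential-expression table (made-up values).
table = [
    {"gene": "G1", "log2fc": 2.3, "p": 0.0001, "adj_p": 0.002},
    {"gene": "G2", "log2fc": -1.6, "p": 0.004, "adj_p": 0.03},
    {"gene": "G3", "log2fc": 0.4, "p": 0.0005, "adj_p": 0.01},
    {"gene": "G4", "log2fc": 1.8, "p": 0.03, "adj_p": 0.2},
]

degs = screen_degs(table)
print([(d["gene"], d["direction"]) for d in degs])
```

With these toy values, G3 fails the fold-change cutoff and G4 fails the adjusted p-value cutoff, so only G1 and G2 are kept. Switching `p_column` to `"p"` would also admit G4.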

90 / 100 Total Score
Core Capability
92 / 100
Functional Suitability
11 / 12
Reliability
11 / 12
Performance & Context
8 / 8
Agent Usability
15 / 16
Human Usability
8 / 8
Security
12 / 12
Maintainability
11 / 12
Agent-Specific
16 / 20
Medical Task
20 / 20 Passed
92 Bundled OA vs control default run
4/4
90 Raw p-value stricter-threshold run
4/4
87 Zero-DEG strict-threshold run
4/4
90 Case-insensitive group-label run
4/4
88 Single-DEG sparse-heatmap run
4/4

Veto Gates: Required pass for any deployment consideration

Skill Veto: ✓ All 4 gates passed
Operational Stability
System remains stable across varied inputs and edge cases
PASS
Structural Consistency
Output structure conforms to expected skill contract format
PASS
Result Determinism
Equivalent inputs produce semantically equivalent outputs
PASS
System Security
No prompt injection, data leakage, or unsafe tool use detected
PASS
Research Veto: ✅ PASS (Applicable)
Dimension | Result | Detail
Scientific Integrity | PASS | Passed checks. The skill reports computed limma statistics and does not fabricate citations, biomedical claims, or study outcomes.
Practice Boundaries | PASS | Passed checks. The skill stays within computational DEG screening and does not provide diagnosis, treatment advice, or unsafe biomedical recommendations.
Methodological Ground | PASS | Passed checks. The analysis matches a documented two-group limma workflow and the updated plot behavior now aligns with the declared p-value semantics.
Code Usability | PASS | Passed checks. Canonical, variant, zero-DEG, and sparse-heatmap runs all completed successfully, and the bundled tests passed.

Core Capability: 92 / 100 (8 Categories)

Functional Suitability
Core promises are implemented and documented. One minor wording mismatch remains: the workflow text still mentions a volcano input table rather than the current direct plotting path.
11 / 12
92%
Reliability
The run paths are now stable and degrade cleanly, though the zero-DEG path emits two closely related heatmap warnings that could be consolidated.
11 / 12
92%
Performance & Context
No issues flagged. The skill remains compact and efficient to invoke.
8 / 8
100%
Agent Usability
The CLI surface and examples are clear, with only minor room to tighten wording around plotting side effects and warning behavior.
15 / 16
94%
Human Usability
No issues flagged. The examples, arguments, and output contract are now easy to follow.
8 / 8
100%
Security
No issues flagged. The scripts validate inputs, avoid dangerous execution primitives, and do not expose credentials or destructive operations.
12 / 12
100%
Maintainability
The code is modular and test-backed, with only small remaining documentation cleanup items.
11 / 12
92%
Agent-Specific
Trigger precision and scope boundaries are good. A slightly stronger body-level escape hatch for plot-free runs would make invocation guidance even clearer.
16 / 20
80%
Core Capability Total: 92 / 100
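The Agent-Specific note above asks for a clearer escape hatch for plot-free runs. One way such an option could look is sketched below. Everything here is hypothetical: the `--no-plots` flag, the other argument names, and the output file names are invented for illustration and are not taken from the skill's actual CLI.

```python
import argparse


def build_parser():
    """Build a toy CLI; all flags here are hypothetical, not the skill's own."""
    parser = argparse.ArgumentParser(prog="deg-screening-sketch")
    parser.add_argument("--lfc", type=float, default=1.0,
                        help="absolute log2 fold-change cutoff")
    parser.add_argument("--p-cutoff", type=float, default=0.05,
                        help="p-value cutoff")
    parser.add_argument("--no-plots", action="store_true",
                        help="write tables only and skip every plotting step")
    return parser


def planned_outputs(args):
    """List the artifacts a run would produce (file names are placeholders)."""
    outputs = ["all_genes.csv", "deg.csv"]
    if not args.no_plots:
        outputs += ["volcano.png", "heatmap.png"]
    return outputs


args = build_parser().parse_args(["--no-plots"])
print(planned_outputs(args))
```

Keeping the decision in one place (`planned_outputs` in this sketch) means the documented output contract and the actual plotting side effects cannot drift apart.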

Medical Task: Execution Average 89.4 / 100; Assertions 20/20 Passed

92
Canonical: ✅ Pass
Bundled OA vs control default run

Executed successfully with all declared core artifacts present.

Basic 36/40 | Specialized 56/60 | Total 92/100
A1: Core declared output files are created
A2: The canonical run completes successfully with the bundled data
A3: The DEG table is a filtered subset of the full differential table
A4: The workflow stays within the declared two-group limma scope
Pass rate: 4 / 4
90
Variant A: ✅ Pass
Raw p-value stricter-threshold run

Executed successfully under stricter thresholds and raw p-value screening.

Basic 36/40 | Specialized 54/60 | Total 90/100
A1: Raw p-value screening executes successfully
A2: The volcano path remains compatible with the raw p-value mode
A3: The stricter thresholds still produce a coherent DEG table
A4: Plot outputs are still generated under the variant configuration
Pass rate: 4 / 4
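To illustrate why raw p-value screening and adjusted p-value screening can select different gene sets, here is a small sketch of the Benjamini-Hochberg adjustment. It assumes BH is the adjustment in play, which this report does not state; the p-values are made up.

```python
def benjamini_hochberg(p_values):
    """Return BH-adjusted p-values in the same order as the input."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that adjusted
    # values never increase as the raw p-value decreases.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


raw = [0.001, 0.01, 0.035, 0.045, 0.2]  # made-up raw p-values
adj = benjamini_hochberg(raw)

cutoff = 0.05
n_raw = sum(p < cutoff for p in raw)
n_adj = sum(p < cutoff for p in adj)
print(n_raw, n_adj)
```

In this toy case four genes pass on raw p-values but only two pass after adjustment, which is why the screening column should be stated alongside any DEG count.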
87
Edge: ✅ Pass
Zero-DEG strict-threshold run

Completed cleanly with an empty DEG table and informative warnings.

Basic 35/40 | Specialized 52/60 | Total 87/100
A1: Strict thresholds still complete without crashing
A2: Empty DEG output is emitted instead of failing
A3: The skill explains the missing heatmap with warnings
A4: The run still produces the remaining valid outputs
Pass rate: 4 / 4
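The "empty DEG output instead of failing" behavior checked in this run can be sketched as a writer that always emits a header, even when no gene passes the thresholds. The field names below are assumptions for illustration, not the skill's actual column layout.

```python
import csv
import io

# Hypothetical column layout for the DEG table.
FIELDS = ["gene", "log2fc", "p", "adj_p"]


def write_deg_table(rows, handle):
    """Write the DEG table; the header is written even if rows is empty."""
    writer = csv.DictWriter(handle, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)


buffer = io.StringIO()
write_deg_table([], buffer)  # zero-DEG case
print(buffer.getvalue().splitlines())
```

A header-only file keeps downstream readers working, since they can still parse the columns and see zero rows rather than hitting a missing or empty file.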
90
Variant B: ✅ Pass
Case-insensitive group-label run

Executed successfully with case-insensitive group matching and a smaller heatmap selection budget.

Basic 36/40 | Specialized 54/60 | Total 90/100
A1: Case and control matching is tolerant to letter case
A2: A multi-column group file layout is accepted
A3: Plot outputs are still generated under the variant configuration
A4: The run remains within the documented CLI surface
Pass rate: 4 / 4
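Case-insensitive group matching over a multi-column group file, as exercised in this run, might look like the sketch below. The column names (`sample`, `batch`, `group`) and the function name are assumptions; the report does not describe the real file layout.

```python
import csv
import io


def assign_groups(group_csv_text, case_label, control_label,
                  sample_col="sample", group_col="group"):
    """Map each sample to 'case' or 'control', ignoring letter case.

    Extra columns are ignored; column names are hypothetical.
    """
    case_key = case_label.strip().lower()
    control_key = control_label.strip().lower()
    assignment = {}
    for row in csv.DictReader(io.StringIO(group_csv_text)):
        label = row[group_col].strip().lower()
        if label == case_key:
            assignment[row[sample_col]] = "case"
        elif label == control_key:
            assignment[row[sample_col]] = "control"
        # Samples with any other label are left out of the comparison.
    found = set(assignment.values())
    if found != {"case", "control"}:
        raise ValueError("both a case and a control group are required")
    return assignment


groups_text = (
    "sample,batch,group\n"
    "S1,b1,OA\n"
    "S2,b1,oa\n"
    "S3,b2,Control\n"
    "S4,b2,CONTROL\n"
)
print(assign_groups(groups_text, "oa", "control"))
```

Normalizing both the requested labels and the file labels once, at load time, avoids scattering case handling through the rest of the workflow.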
88
Stress: ✅ Pass
Single-DEG sparse-heatmap run

A sparse-result run completed successfully and skipped heatmap rendering cleanly.

Basic 35/40 | Specialized 53/60 | Total 88/100
A1: A valid sparse-result run completes successfully
A2: Heatmap generation degrades gracefully when fewer than two genes are selected
A3: The result still includes the successful non-heatmap artifacts
A4: The request remains within the declared two-group limma scope
Pass rate: 4 / 4
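The graceful heatmap skip tested here, combined with the single consolidated warning suggested under Reliability, could be sketched as follows. The function names and message wording are hypothetical; only the "fewer than two genes" condition comes from the assertions above.

```python
import warnings


def heatmap_skip_reason(n_degs, min_genes=2):
    """Return one message explaining a skipped heatmap, or None to proceed."""
    if n_degs >= min_genes:
        return None
    return (f"Heatmap skipped: {n_degs} DEG(s) selected, "
            f"at least {min_genes} are needed.")


def run_plots(n_degs):
    """Return the plots that would be drawn; warn once if the heatmap is skipped."""
    reason = heatmap_skip_reason(n_degs)
    if reason is not None:
        warnings.warn(reason, UserWarning, stacklevel=2)
        return ["volcano"]
    return ["volcano", "heatmap"]


# Single-DEG case: the volcano plot is kept and exactly one warning is raised.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    produced = run_plots(1)
print(produced, len(caught))
```

Deriving the skip decision and its message from one helper is what keeps the zero-DEG and single-DEG paths from emitting two near-duplicate warnings.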
Medical Task Total: 89.4 / 100

Key Strengths

  • The CLI contract is clear, runnable, and backed by bundled example data plus a passing automated test suite.
  • Canonical, variant, zero-DEG, and sparse-heatmap runs all completed successfully without violating the skill scope.
  • The plotting path now aligns with the declared p-value semantics and degrades cleanly on low-signal runs.
  • Determinism is strong: repeated canonical runs produced identical hashes for both major CSV outputs.
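A determinism check of the kind described in the last point can be approximated by hashing the output files of two runs and comparing digests. This is a sketch under assumptions: SHA-256 and the file name `deg.csv` are choices made for the example, since the report does not say which hash or file names were used.

```python
import hashlib
import os
import tempfile


def sha256_of_file(path, chunk_size=65536):
    """Hash a file in chunks so large tables need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def outputs_match(run_a, run_b, names):
    """True if every named file has the same digest in both run directories."""
    return all(
        sha256_of_file(os.path.join(run_a, name))
        == sha256_of_file(os.path.join(run_b, name))
        for name in names
    )


# Simulate two runs that wrote byte-identical output (placeholder content).
with tempfile.TemporaryDirectory() as run_a, tempfile.TemporaryDirectory() as run_b:
    for run_dir in (run_a, run_b):
        with open(os.path.join(run_dir, "deg.csv"), "w", newline="") as handle:
            handle.write("gene,log2fc\nG1,2.3\n")
    same = outputs_match(run_a, run_b, ["deg.csv"])
print(same)
```

Byte-level hashing is stricter than the "semantically equivalent" criterion in the Result Determinism gate, so a match here is sufficient evidence, while a mismatch may only reflect row order or float formatting.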