Data Analysis
deg-screening-analysis
Screen and filter differentially expressed genes using volcano plots, MA plots, and multi-threshold criteria. Inputs: DEA result table with fold-change and p-values. Outputs: filtered DEG list, annotated volcano plot, enrichment-ready gene sets.
90100Total Score
Core Capability
92 / 100
Functional Suitability
11 / 12
Reliability
11 / 12
Performance & Context
8 / 8
Agent Usability
15 / 16
Human Usability
8 / 8
Security
12 / 12
Maintainability
11 / 12
Agent-Specific
16 / 20
Medical Task
20 / 20 Passed
92Bundled OA vs control default run
4/4
90Raw p-value stricter-threshold run
4/4
87Zero-DEG strict-threshold run
4/4
90Case-insensitive group-label run
4/4
88Single-DEG sparse-heatmap run
4/4
Veto GatesRequired pass for any deployment consideration
Skill Veto✓ All 4 gates passed
✓
Operational Stability
System remains stable across varied inputs and edge cases
PASS✓
Structural Consistency
Output structure conforms to expected skill contract format
PASS✓
Result Determinism
Equivalent inputs produce semantically equivalent outputs
PASS✓
System Security
No prompt injection, data leakage, or unsafe tool use detected
PASSResearch Veto✅ PASS — Applicable
| Dimension | Result | Detail |
|---|---|---|
| Scientific Integrity | PASS | Passed checks. The skill reports computed limma statistics and does not fabricate citations, biomedical claims, or study outcomes. |
| Practice Boundaries | PASS | Passed checks. The skill stays within computational DEG screening and does not provide diagnosis, treatment advice, or unsafe biomedical recommendations. |
| Methodological Ground | PASS | Passed checks. The analysis matches a documented two-group limma workflow and the updated plot behavior now aligns with the declared p-value semantics. |
| Code Usability | PASS | Passed checks. Canonical, variant, zero-DEG, and sparse-heatmap runs all completed successfully, and the bundled tests passed. |
Core Capability92 / 100 — 8 Categories
Functional Suitability
Core promises are implemented and documented; only a minor wording mismatch remains in the workflow text that still mentions a volcano input table rather than the current direct plotting path.
11 / 12
92%
Reliability
The run paths are now stable and degrade cleanly, though the zero-DEG path emits two closely related heatmap warnings that could be consolidated.
11 / 12
92%
Performance & Context
No issues flagged. The skill remains compact and efficient to invoke.
8 / 8
100%
Agent Usability
The CLI surface and examples are clear, with only minor room to tighten wording around plotting side effects and warning behavior.
15 / 16
94%
Human Usability
No issues flagged. The examples, arguments, and output contract are now easy to follow.
8 / 8
100%
Security
No issues flagged. The scripts validate inputs, avoid dangerous execution primitives, and do not expose credentials or destructive operations.
12 / 12
100%
Maintainability
The code is modular and test-backed, with only small remaining documentation cleanup items.
11 / 12
92%
Agent-Specific
Trigger precision and scope boundaries are good. A slightly stronger body-level escape hatch for plot-free runs would make invocation guidance even clearer.
16 / 20
80%
Core Capability Total92 / 100
Medical TaskExecution Average: 89.4 / 100 — Assertions: 20/20 Passed
92
Canonical
Bundled OA vs control default run
4/4 ✓
90
Variant A
Raw p-value stricter-threshold run
4/4 ✓
87
Edge
Zero-DEG strict-threshold run
4/4 ✓
90
Variant B
Case-insensitive group-label run
4/4 ✓
88
Stress
Single-DEG sparse-heatmap run
4/4 ✓
92
Canonical✅ Pass
Bundled OA vs control default run
Executed successfully with all declared core artifacts present.
Basic 36/40|Specialized 56/60|Total 92/100
✅A1Core declared output files are created
✅A2The canonical run completes successfully with the bundled data
✅A3The DEG table is a filtered subset of the full differential table
✅A4The workflow stays within the declared two-group limma scope
Pass rate: 4 / 4
90
Variant A✅ Pass
Raw p-value stricter-threshold run
Executed successfully under stricter thresholds and raw p-value screening.
Basic 36/40|Specialized 54/60|Total 90/100
✅A1Raw p-value screening executes successfully
✅A2The volcano path remains compatible with the raw p-value mode
✅A3The stricter thresholds still produce a coherent DEG table
✅A4Plot outputs are still generated under the variant configuration
Pass rate: 4 / 4
87
Edge✅ Pass
Zero-DEG strict-threshold run
Completed cleanly with an empty DEG table and informative warnings.
Basic 35/40|Specialized 52/60|Total 87/100
✅A1Strict thresholds still complete without crashing
✅A2Empty DEG output is emitted instead of failing
✅A3The skill explains the missing heatmap with warnings
✅A4The run still produces the remaining valid outputs
Pass rate: 4 / 4
90
Variant B✅ Pass
Case-insensitive group-label run
Executed successfully with case-insensitive group matching and a smaller heatmap selection budget.
Basic 36/40|Specialized 54/60|Total 90/100
✅A1Case and control matching is tolerant to letter case
✅A2A multi-column group file layout is accepted
✅A3Plot outputs are still generated under the variant configuration
✅A4The run remains within the documented CLI surface
Pass rate: 4 / 4
88
Stress✅ Pass
Single-DEG sparse-heatmap run
A sparse-result run completed successfully and skipped heatmap rendering cleanly.
Basic 35/40|Specialized 53/60|Total 88/100
✅A1A valid sparse-result run completes successfully
✅A2Heatmap generation degrades gracefully when fewer than two genes are selected
✅A3The result still includes the successful non-heatmap artifacts
✅A4The request remains within the declared two-group limma scope
Pass rate: 4 / 4
Medical Task Total89.4 / 100
Key Strengths
- The CLI contract is clear, runnable, and backed by bundled example data plus a passing automated test suite.
- Canonical, variant, zero-DEG, and sparse-heatmap runs all completed successfully without violating the skill scope.
- The plotting path now aligns with the declared p-value semantics and degrades cleanly on low-signal runs.
- Determinism is strong: repeated canonical runs produced identical hashes for both major CSV outputs.