Data Analysis

scanpy

Total Score: 89 / 100
Core Capability: 86 / 100
  • Functional Suitability: 11 / 12
  • Reliability: 10 / 12
  • Performance & Context: 7 / 8
  • Agent Usability: 14 / 16
  • Human Usability: 7 / 8
  • Security: 10 / 12
  • Maintainability: 10 / 12
  • Agent-Specific: 17 / 20
Medical Task: 20 / 20 Passed
  • Canonical (96, 4/4): Standard single-cell RNA-seq analysis pipeline. For quality control (QC), normalization, dimensionality reduction (PCA/UMAP/t-SNE), clustering, differential expression analysis, and visualization. Best suited for exploratory single-cell transcriptomics analysis using established workflows. For deep learning models, use scvi-tools; for data format issues, use anndata
  • Variant A (92, 4/4): same task description as Canonical
  • Edge (90, 4/4): same task description as Canonical
  • Variant B (90, 4/4): Packaged executable path(s): scripts/qc_analysis.py
  • Stress (90, 4/4): End-to-end case for Scope-focused workflow aligned to the same task description as Canonical
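The normalization step in the task description can be shown without scanpy itself. In scanpy's standard workflow this step is usually `sc.pp.normalize_total(adata, target_sum=1e4)` followed by `sc.pp.log1p(adata)`. The sketch below reproduces that arithmetic in plain Python on a toy cells × genes counts matrix, which makes the effect easy to inspect. The helper name `normalize_log1p` and the data are hypothetical and are not part of the packaged skill.

```python
import math

def normalize_log1p(counts, target_sum=1e4):
    """Scale each cell (row) to target_sum total counts, then apply natural log(1 + x).

    Mirrors the arithmetic of total-count normalization followed by log1p.
    Cells with zero total counts are left as zeros here to avoid division by zero
    (a choice made for this sketch, not a claim about scanpy's behavior).
    """
    result = []
    for row in counts:
        total = sum(row)
        scale = target_sum / total if total > 0 else 0.0
        result.append([math.log1p(value * scale) for value in row])
    return result

# Hypothetical toy counts: 2 cells x 3 genes.
counts = [
    [10, 0, 30],
    [5, 5, 0],
]
normalized = normalize_log1p(counts)
for row in normalized:
    print([round(v, 3) for v in row])
```

After this step, undoing the log (`math.expm1`) and summing any cell's values gives back `target_sum`. That per-cell equality is what makes cells with different sequencing depths comparable.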

Veto Gates: required pass for any deployment consideration

Skill Veto: ✓ All 4 gates passed
  • Operational Stability: PASS. System remains stable across varied inputs and edge cases
  • Structural Consistency: PASS. Output structure conforms to expected skill contract format
  • Result Determinism: PASS. Equivalent inputs produce semantically equivalent outputs
  • System Security: PASS. No prompt injection, data leakage, or unsafe tool use detected
Research Veto: ✅ PASS (applicable)
  • Scientific Integrity: PASS. No scientific-integrity problem was surfaced because the package did not claim more than the available records, article text, or script evidence supported.
  • Practice Boundaries: PASS. The evaluated outputs stayed inside the scope of "Standard single-cell RNA-seq analysis pipeline. For quality control (QC), normalization,..." and did not drift into unsupported interpretation beyond the available inputs.
  • Methodological Ground: PASS. The archived evaluation treated the workflow as method-linked rather than ad hoc.
  • Code Usability: PASS. The archived review preserved a usable code path with named scripts, expected inputs, and a recognizable output contract.

Core Capability: 86 / 100 (8 categories)

  • Functional Suitability: 11 / 12 (92%). Related legacy finding for scanpy: improve stress-case output rigor; stress and boundary scenarios show weaker consistency.
  • Reliability: 10 / 12 (83%). Related legacy finding for scanpy: improve stress-case output rigor; stress and boundary scenarios show weaker consistency.
  • Performance & Context: 7 / 8 (88%). Performance-context scoring suggests the package could handle larger or denser runs a little more gracefully.
  • Agent Usability: 14 / 16 (88%). The packaged analysis path is understandable, though the archived score suggests slightly clearer routing would help.
  • Human Usability: 7 / 8 (88%). The archived score suggests the output contract could be a little easier for users to inspect or reuse.
  • Security: 10 / 12 (83%). Security remained strong, though the archived review still left some room for clearer execution guardrails.
  • Maintainability: 10 / 12 (83%). Maintainability stayed solid, with only limited room to simplify scripts, dependencies, or packaging structure.
  • Agent-Specific: 17 / 20 (85%). Related legacy finding for scanpy: improve stress-case output rigor; stress and boundary scenarios show weaker consistency.

Core Capability Total: 86 / 100
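The Core Capability total is a plain sum of the eight category scores, and the category maxima also sum to 100. Each listed percentage equals the category ratio rounded half-up to a whole percent. The check below reproduces that arithmetic from the numbers in this report. It is only an illustration and is not the scoring tool's own code. The helper `percent_half_up` is hypothetical.

```python
# Category scores copied from the Core Capability breakdown: (earned, maximum).
categories = {
    "Functional Suitability": (11, 12),
    "Reliability": (10, 12),
    "Performance & Context": (7, 8),
    "Agent Usability": (14, 16),
    "Human Usability": (7, 8),
    "Security": (10, 12),
    "Maintainability": (10, 12),
    "Agent-Specific": (17, 20),
}

def percent_half_up(earned, maximum):
    """Return earned/maximum as a whole percentage, rounding exact halves upward.

    floor(100 * e / m + 0.5) == (200 * e + m) // (2 * m); integer arithmetic
    avoids floating-point surprises at exact halves such as 7/8 = 87.5%.
    """
    return (200 * earned + maximum) // (2 * maximum)

earned_total = sum(e for e, _ in categories.values())
max_total = sum(m for _, m in categories.values())
percentages = {name: percent_half_up(e, m) for name, (e, m) in categories.items()}

print(f"Core Capability: {earned_total} / {max_total}")
for name, pct in percentages.items():
    print(f"{name}: {pct}%")
```

Running it prints `Core Capability: 86 / 100`, followed by the same percentages listed per category.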

Medical Task: Execution Average 91.6 / 100; Assertions 20/20 Passed

Canonical: 96 / 100 ✅ Pass
Standard single-cell RNA-seq analysis pipeline. For quality control (QC), normalization, dimensionality reduction (PCA/UMAP/t-SNE), clustering, differential expression analysis, and visualization. Best suited for exploratory single-cell transcriptomics analysis using established workflows. For deep learning models, use scvi-tools; for data format issues, use anndata

The case "Standard single-cell RNA-seq analysis pipeline. For quality control..." remained tied to the documented analysis contract, even when the preserved evidence centered on instructions rather than a full rerun.

Basic 36/40 | Specialized 60/60 | Total 96/100
A1: The scanpy output structure matches the documented deliverable
A2: The instruction path remains actionable for the documented case
A3: The output stays fully within the documented skill boundary
A4: The response quality is acceptable for the documented path
Pass rate: 4 / 4
Variant A: 92 / 100 ✅ Pass
Standard single-cell RNA-seq analysis pipeline. For quality control (QC), normalization, dimensionality reduction (PCA/UMAP/t-SNE), clustering, differential expression analysis, and visualization. Best suited for exploratory single-cell transcriptomics analysis using established workflows. For deep learning models, use scvi-tools; for data format issues, use anndata

This Variant A case stayed within the packaged analysis boundary and kept a reviewable task contract.

Basic 34/40 | Specialized 58/60 | Total 92/100
A1: The scanpy output structure matches the documented deliverable
A2: The instruction path remains actionable for the documented case
A3: The output stays fully within the documented skill boundary
A4: The response quality is acceptable for the documented path
Pass rate: 4 / 4
Edge: 90 / 100 ✅ Pass
Standard single-cell RNA-seq analysis pipeline. For quality control (QC), normalization, dimensionality reduction (PCA/UMAP/t-SNE), clustering, differential expression analysis, and visualization. Best suited for exploratory single-cell transcriptomics analysis using established workflows. For deep learning models, use scvi-tools; for data format issues, use anndata

The case "Standard single-cell RNA-seq analysis pipeline. For quality control..." remained tied to the documented analysis contract, even when the preserved evidence centered on instructions rather than a full rerun.

Basic 33/40 | Specialized 57/60 | Total 90/100
A1: The scanpy output structure matches the documented deliverable
A2: The instruction path remains actionable for the documented case
A3: The output stays fully within the documented skill boundary
A4: The response quality is acceptable for the documented path
Pass rate: 4 / 4
Variant B: 90 / 100 ✅ Pass
Packaged executable path(s): scripts/qc_analysis.py

The case "Packaged executable path(s): scripts/qc_analysis.py" remained tied to the documented analysis contract, even when the preserved evidence centered on instructions rather than a full rerun.

Basic 32/40 | Specialized 58/60 | Total 90/100
A1: The scanpy output structure matches the documented deliverable
A2: The instruction path remains actionable for the documented case
A3: The output stays fully within the documented skill boundary
A4: The response quality is acceptable for the documented path
Pass rate: 4 / 4
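This report does not reproduce the contents of scripts/qc_analysis.py. The sketch below is a hypothetical illustration of what a per-cell QC step typically computes. It derives the three metrics most scanpy QC workflows inspect: total counts, number of genes detected, and percentage of counts from mitochondrial genes. Mitochondrial gene names start with "MT-" in human annotations. It then applies example thresholds. In scanpy itself, these metrics usually come from flagging `adata.var["mt"] = adata.var_names.str.startswith("MT-")` and then calling `sc.pp.calculate_qc_metrics(adata, qc_vars=["mt"], inplace=True)`. The function names `qc_metrics` and `passing_cells`, the toy data, and the thresholds are assumptions for illustration only.

```python
def qc_metrics(counts, gene_names, mito_prefix="MT-"):
    """Compute per-cell QC metrics from a cells x genes count matrix.

    Returns one dict per cell with total counts, number of genes with
    nonzero counts, and percentage of counts from mitochondrial genes.
    The prefix match is case-insensitive so "mt-" (mouse) also matches.
    """
    prefix = mito_prefix.upper()
    is_mito = [name.upper().startswith(prefix) for name in gene_names]
    metrics = []
    for row in counts:
        total = sum(row)
        n_genes = sum(1 for value in row if value > 0)
        mito = sum(value for value, flag in zip(row, is_mito) if flag)
        pct_mito = 100.0 * mito / total if total > 0 else 0.0
        metrics.append({
            "total_counts": total,
            "n_genes_by_counts": n_genes,
            "pct_counts_mt": pct_mito,
        })
    return metrics

def passing_cells(metrics, min_genes=2, max_pct_mito=20.0):
    """Return indices of cells meeting example (hypothetical) QC thresholds."""
    return [
        i for i, m in enumerate(metrics)
        if m["n_genes_by_counts"] >= min_genes and m["pct_counts_mt"] <= max_pct_mito
    ]

# Hypothetical toy data: 3 cells x 4 genes.
gene_names = ["CD3E", "MS4A1", "MT-CO1", "LYZ"]
counts = [
    [4, 0, 1, 5],  # several genes detected, low mitochondrial fraction
    [0, 0, 8, 2],  # dominated by mitochondrial reads
    [0, 0, 0, 3],  # too few genes detected
]
metrics = qc_metrics(counts, gene_names)
keep = passing_cells(metrics)
print(keep)
```

Real thresholds depend on the tissue and protocol. They are usually chosen after plotting these metrics rather than fixed in advance.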
Stress: 90 / 100 ✅ Pass
End-to-end case for Scope-focused workflow aligned to: Standard single-cell RNA-seq analysis pipeline. For quality control (QC), normalization, dimensionality reduction (PCA/UMAP/t-SNE), clustering, differential expression analysis, and visualization. Best suited for exploratory single-cell transcriptomics analysis using established workflows. For deep learning models, use scvi-tools; for data format issues, use anndata

The archived run treated "Standard single-cell RNA-seq analysis pipeline. For quality control (QC), normalization,..." as a bounded analysis workflow rather than a purely narrative instruction path.

Basic 29/40 | Specialized 60/60 | Total 90/100
A1: The scanpy output structure matches the documented deliverable
A2: The instruction path remains actionable for the documented case
A3: The output stays fully within the documented skill boundary
A4: The response quality is acceptable for the documented path
Pass rate: 4 / 4
Medical Task Total: 91.6 / 100
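The Medical Task total of 91.6 is the arithmetic mean of the five scenario totals (96, 92, 90, 90, 90). The 20/20 assertion count is five scenarios times four assertions (A1 to A4) each. The check below reproduces that arithmetic from the scenario cards. It is an illustration, not the evaluator's own code.

```python
from statistics import mean

# Scenario totals and assertion results copied from the scenario cards.
scenarios = {
    "Canonical": {"total": 96, "passed": 4, "assertions": 4},
    "Variant A": {"total": 92, "passed": 4, "assertions": 4},
    "Edge": {"total": 90, "passed": 4, "assertions": 4},
    "Variant B": {"total": 90, "passed": 4, "assertions": 4},
    "Stress": {"total": 90, "passed": 4, "assertions": 4},
}

average = mean(s["total"] for s in scenarios.values())
passed = sum(s["passed"] for s in scenarios.values())
assertions = sum(s["assertions"] for s in scenarios.values())

print(f"Execution average: {average:.1f} / 100")
print(f"Assertions: {passed}/{assertions} passed")
```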

Key Strengths

  • Primary routing is Data Analysis with execution mode B
  • Static quality score is 86/100 and dynamic average is 78.6/100
  • Assertions and command execution outcomes are recorded per input for human review
  • Execution verification summary: Script verification 1/1; adjustment=5. qc_analysis.py: OK