Data Analysis

pca-dimensionality-reduction

Reduce high-dimensional gene expression data using PCA and visualize sample clustering and variance structure. Inputs: expression matrix, sample annotation table. Outputs: PCA score plots, scree plot, variance-explained table, 2D/3D scatter plots.
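
The variance-explained table listed among the outputs can be sketched as below. This is a minimal illustration, not the skill's implementation: the report says the skill itself is R code, and the function name `variance_explained` and the example standard deviations are hypothetical. It relies only on the standard identity that a component's share of variance is its squared standard deviation divided by the sum of squared standard deviations over all components.

```python
def variance_explained(sdev):
    """Return (component, proportion, cumulative) rows.

    `sdev` holds per-component standard deviations for ALL components,
    not only the retained ones; otherwise the proportions are inflated.
    """
    variances = [s * s for s in sdev]
    total = sum(variances)
    if total <= 0:
        raise ValueError("total variance must be positive")
    rows, running = [], 0.0
    for i, v in enumerate(variances, start=1):
        prop = v / total
        running += prop
        rows.append((f"PC{i}", prop, running))
    return rows


# Hypothetical standard deviations for four components (illustration only).
for name, prop, cum in variance_explained([3.0, 2.0, 1.0, 1.0]):
    print(f"{name}\t{prop:.3f}\t{cum:.3f}")
```

The cumulative column is the same information a scree plot shows graphically, so both outputs can be derived from one list of standard deviations.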

Total Score: 93 / 100
Core Capability: 92 / 100
  • Functional Suitability: 12 / 12
  • Reliability: 10 / 12
  • Performance & Context: 7 / 8
  • Agent Usability: 16 / 16
  • Human Usability: 7 / 8
  • Security: 10 / 12
  • Maintainability: 12 / 12
  • Agent-Specific: 18 / 20
Medical Task: 20 / 20 Passed
  • Explicit gene-feature PCA: 96 (4/4)
  • High-dimensional auto-feature PCA: 91 (4/4)
  • Invalid component count: 86 (4/4)
  • Missing-value filtering with groups: 97 (4/4)
  • TXT export with custom parameters: 96 (4/4)

Veto Gates: Required pass for any deployment consideration

Skill Veto: ✓ All 4 gates passed
  • Operational Stability: PASS. System remains stable across varied inputs and edge cases.
  • Structural Consistency: PASS. Output structure conforms to expected skill contract format.
  • Result Determinism: PASS. Equivalent inputs produce semantically equivalent outputs.
  • System Security: PASS. No prompt injection, data leakage, or unsafe tool use detected.
Research Veto: ✅ PASS (Applicable)
Dimension | Result | Detail
Scientific Integrity | PASS | No fabricated claims, invented statistics, or unverifiable research statements were produced across the test runs.
Practice Boundaries | PASS | The skill stayed within PCA execution scope and did not emit diagnostic, prescriptive, or medically unsafe guidance.
Methodological Ground | PASS | All successful runs used an appropriate PCA workflow for numeric tabular data, and the edge case rejected invalid parameters cleanly.
Code Usability | PASS | The R code executed successfully on all valid scenarios and handled the invalid scenario with a clean validation failure.

Core Capability: 92 / 100 (8 Categories)

  • Functional Suitability, 12 / 12 (100%): Full score achieved.
  • Reliability, 10 / 12 (83%): Validation and recovery behavior are strong, but failures after output directory creation can still leave partial directories and there are no fallback paths beyond aborting.
  • Performance & Context, 7 / 8 (88%): Progressive disclosure is good, but very wide datasets produce extremely verbose stdout because the full retained feature list is logged.
  • Agent Usability, 16 / 16 (100%): Full score achieved.
  • Human Usability, 7 / 8 (88%): Trigger language is natural, but the workflow remains rigid for loosely phrased requests because it is primarily a CLI skill.
  • Security, 10 / 12 (83%): No secret-handling issues were found, but output_prefix is only checked for emptiness and the tool always writes a filtered copy of the input data into the output directory.
  • Maintainability, 12 / 12 (100%): Full score achieved.
  • Agent-Specific, 18 / 20 (90%): Triggering and layering are strong, but out-of-scope guidance could be more explicit and composability would improve with a concise machine-readable execution summary.
Core Capability Total: 92 / 100
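
The Reliability and Security findings above (partial directories after a late failure, and output_prefix checked only for emptiness) could be addressed along the following lines. This is a sketch under stated assumptions, written in Python for illustration even though the skill is reported to be R code. The allowed-character policy, the `.tsv` suffix, and the function names `validate_output_prefix` and `write_outputs_atomically` are all hypothetical and are not taken from the skill.

```python
import os
import re
import shutil
import tempfile
from pathlib import Path

# Assumed policy (not from the skill): letters, digits, dot, underscore,
# hyphen; no leading dot, no path separators.
_PREFIX_RE = re.compile(r"[A-Za-z0-9_-][A-Za-z0-9._-]*")


def validate_output_prefix(prefix):
    """Reject prefixes that are empty or could escape the output directory."""
    if not _PREFIX_RE.fullmatch(prefix):
        raise ValueError(f"invalid output_prefix: {prefix!r}")
    return prefix


def write_outputs_atomically(final_dir, prefix, tables):
    """Write every file into a staging directory, then rename it into place.

    `tables` maps a short table name to its text content. If any write
    fails, the staging directory is removed and `final_dir` never appears.
    """
    validate_output_prefix(prefix)
    final_dir = Path(final_dir)
    if final_dir.exists():
        raise FileExistsError(f"output directory already exists: {final_dir}")
    final_dir.parent.mkdir(parents=True, exist_ok=True)
    # Stage next to the destination so the final rename stays on one filesystem.
    staging = Path(tempfile.mkdtemp(prefix=".pca_staging_", dir=final_dir.parent))
    try:
        for name, text in tables.items():
            (staging / f"{prefix}_{name}.tsv").write_text(text, encoding="utf-8")
        os.rename(staging, final_dir)
    except Exception:
        shutil.rmtree(staging, ignore_errors=True)
        raise
    return final_dir
```

Staging beside the destination keeps the final step a single rename, so a reader of the output directory sees either a complete run or nothing.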

Medical Task: Execution Average 93.2 / 100; Assertions 20/20 Passed

  • Canonical: Explicit gene-feature PCA, 96 (4/4)
  • Variant A: High-dimensional auto-feature PCA, 91 (4/4)
  • Edge: Invalid component count, 86 (4/4)
  • Variant B: Missing-value filtering with groups, 97 (4/4)
  • Stress: TXT export with custom parameters, 96 (4/4)

Canonical: Explicit gene-feature PCA (96 / 100, ✅ Pass)

Executed perfectly on the documented happy-path example.

Basic 39/40 | Specialized 57/60 | Total 96/100
  • A1: Output creates the documented table, figure, and data directories
  • A2: Summary output includes the documented variance fields
  • A3: Score output preserves sample and group information
  • A4: Run completes without warnings or errors on the canonical example
Pass rate: 4 / 4

Variant A: High-dimensional auto-feature PCA (91 / 100, ✅ Pass)

Methodology and execution were sound, but stdout became extremely verbose because all 16690 feature names were logged.

Basic 37/40 | Specialized 54/60 | Total 91/100
  • A1: Auto-detection resolves a usable sample ID column
  • A2: The run exports exactly the requested number of components
  • A3: The documented tables and figures are written successfully
  • A4: The method remains executable on a high-dimensional dataset with more features than samples (p > n)
Pass rate: 4 / 4
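
The verbosity noted for Variant A could be reduced by logging a count and a short preview instead of every retained feature name. The sketch below is illustrative only: the helper name `summarize_features`, the default preview length, and the `GENE…` names are hypothetical, and the skill's actual logging code (reported to be R) is not shown in this report.

```python
def summarize_features(names, limit=10):
    """Return a one-line log message with at most `limit` feature names."""
    names = list(names)
    shown = ", ".join(names[:limit])
    hidden = len(names) - limit
    if hidden > 0:
        return f"{len(names)} features retained: {shown}, ... (+{hidden} more)"
    return f"{len(names)} features retained: {shown}"


# Hypothetical feature names matching the count reported for Variant A.
names = [f"GENE{i}" for i in range(16690)]
print(summarize_features(names, limit=3))
# 16690 features retained: GENE0, GENE1, GENE2, ... (+16687 more)
```

The full list can still be written to a file in the output directory, so nothing is lost while stdout stays short enough for an agent's context window.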
Edge: Invalid component count (86 / 100, ✅ Pass)

Expected validation failure surfaced clearly before any analysis or file writes.

Basic 36/40 | Specialized 50/60 | Total 86/100
  • A1: Invalid n_components is rejected before analysis begins
  • A2: Error reporting is specific and actionable
  • A3: No partial output directory is created for this validation failure
  • A4: The failure stays safely in scope without unsafe side effects
Pass rate: 4 / 4

Variant B: Missing-value filtering with groups (97 / 100, ✅ Pass)

Excellent degraded-path behavior with clear warnings and complete output generation.

Basic 39/40 | Specialized 58/60 | Total 97/100
  • A1: Rows with incomplete feature values are removed and reported
  • A2: Requested components are reduced safely when the data cannot support them
  • A3: Group information is preserved after filtering
  • A4: Standard outputs are still generated after filtering
Pass rate: 4 / 4
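
The behavior checked in the Edge and Variant B cases (reject an invalid n_components up front, drop incomplete rows, then reduce the component count when the filtered data cannot support it) can be sketched as follows. This is an assumption-laden illustration in Python, not the skill's R code. The function names are hypothetical, and the cap of min(n_samples - 1, n_features) is the maximum rank of mean-centered data; the report does not state which cap the skill actually applies.

```python
import math


def drop_incomplete_rows(rows):
    """Return (kept_rows, n_dropped), dropping rows with None or NaN values."""
    def complete(values):
        return all(
            v is not None and not (isinstance(v, float) and math.isnan(v))
            for v in values
        )
    kept = [r for r in rows if complete(r)]
    return kept, len(rows) - len(kept)


def resolve_n_components(requested, n_samples, n_features):
    """Validate `requested` and cap it; return (n_components, warning_or_None)."""
    if isinstance(requested, bool) or not isinstance(requested, int) or requested < 1:
        raise ValueError(f"n_components must be a positive integer, got {requested!r}")
    # Assumed cap: mean-centered data has rank at most min(n - 1, p).
    max_components = min(n_samples - 1, n_features)
    if max_components < 1:
        raise ValueError("need at least 2 complete samples and 1 feature")
    if requested > max_components:
        return max_components, f"n_components reduced from {requested} to {max_components}"
    return requested, None


# Hypothetical 4-sample, 3-feature matrix with two incomplete rows.
rows = [[1.0, 2.0, 3.0], [4.0, None, 6.0], [7.0, 8.0, 9.0], [1.0, 0.0, float("nan")]]
kept, dropped = drop_incomplete_rows(rows)
n_comp, warning = resolve_n_components(5, len(kept), len(kept[0]))
print(dropped, n_comp, warning)
```

Running validation before filtering and capping after it mirrors the order the two test cases imply: an invalid request fails with no side effects, while a valid but oversized request degrades with a warning.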
Stress: TXT export with custom parameters (96 / 100, ✅ Pass)

Custom output format and parameter overrides were handled cleanly without regressions.

Basic 39/40 | Specialized 57/60 | Total 96/100
  • A1: TXT outputs use the requested prefix consistently
  • A2: Metadata records the parameter override correctly
  • A3: Top-loadings output respects the requested limit
  • A4: Figures are still generated for the customized run
Pass rate: 4 / 4
Medical Task Total: 93.2 / 100

Key Strengths

  • The skill is tightly aligned with its stated scope and all documented core workflows executed successfully.
  • Validation and error messages are concrete, namespaced, and easy for both humans and agents to act on.
  • The implementation is modular, deterministic, and produces reproducible tabular and figure outputs.
  • Degraded-path behavior is strong: missing-value filtering and component downscaling are both surfaced clearly.