Data Analysis

pca-dimensionality-reduction

Reduce high-dimensional gene expression data using PCA and visualize sample clustering and variance structure. Inputs: expression matrix, sample annotation table. Outputs: PCA score plots, scree plot, variance-explained table, 2D/3D scatter plots.
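This report does not include the skill's R code. To illustrate what a variance-explained table contains, here is a minimal standard-library Python sketch that finds the leading principal components by power iteration with deflation on the sample covariance matrix. The function name `pca_variance_table` is hypothetical, and the sketch assumes samples as rows, so an expression matrix stored genes-by-samples would need transposing first. In R, `stats::prcomp` is the usual tool, but the report does not say which routine the skill uses. Power iteration is used here only to keep the example dependency-free. It can converge slowly when eigenvalues are close, so an SVD-based routine is the practical choice for real expression matrices.

```python
import math


def pca_variance_table(rows, k, iters=1000):
    """Top-k PCA via power iteration with deflation (illustrative sketch only).

    rows: samples x features, as equal-length numeric lists.
    Returns (table, scores). Each table entry is (PC label, eigenvalue,
    proportion of variance, cumulative proportion). Each scores entry holds
    one sample's PC scores. Fewer than k PCs are returned when the data
    has no variance left.
    """
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centered = [[r[j] - means[j] for j in range(p)] for r in rows]
    # Sample covariance matrix (p x p); its trace is the total variance.
    cov = [[sum(x[a] * x[b] for x in centered) / (n - 1) for b in range(p)]
           for a in range(p)]
    total = sum(cov[j][j] for j in range(p))
    work = [row[:] for row in cov]
    eigvals, comps = [], []
    for _ in range(k):
        v = [1.0 / (j + 1) for j in range(p)]  # arbitrary non-zero start vector
        lam = 0.0
        for _ in range(iters):
            w = [sum(work[a][b] * v[b] for b in range(p)) for a in range(p)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm <= 1e-10 * total:  # remaining variance is numerically zero
                lam = 0.0
                break
            v = [x / norm for x in w]
            lam = norm
        if lam == 0.0:
            break  # no variance left: stop early instead of inventing a PC
        eigvals.append(lam)
        comps.append(v)
        # Deflate: subtract the component just found before the next pass.
        for a in range(p):
            for b in range(p):
                work[a][b] -= lam * v[a] * v[b]
    table, cum = [], 0.0
    for i, lam in enumerate(eigvals):
        prop = lam / total
        cum += prop
        table.append((f"PC{i + 1}", lam, prop, cum))
    # Scores: projection of each centered sample onto each component.
    scores = [[sum(x[j] * v[j] for j in range(p)) for v in comps] for x in centered]
    return table, scores
```

A scree plot is then simply the proportion column plotted against the PC labels.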

Total Score: 93 / 100

Core Capability: 92 / 100

Functional Suitability: 12 / 12
Reliability: 10 / 12
Performance & Context: 7 / 8
Agent Usability: 16 / 16
Human Usability: 7 / 8
Security: 10 / 12
Maintainability: 12 / 12
Agent-Specific: 18 / 20

Medical Task: 20 / 20 assertions passed

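Explicit gene-feature PCA: 96 / 100, 4/4 assertions
High-dimensional auto-feature PCA: 91 / 100, 4/4 assertions
Invalid component count: 86 / 100, 4/4 assertions
Missing-value filtering with groups: 97 / 100, 4/4 assertions
TXT export with custom parameters: 96 / 100, 4/4 assertions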

Veto Gates: required pass for any deployment consideration

Skill Veto: ✓ All 4 gates passed

✓ Operational Stability: PASS
System remains stable across varied inputs and edge cases.

✓ Structural Consistency: PASS
Output structure conforms to the expected skill contract format.

✓ Result Determinism: PASS
Equivalent inputs produce semantically equivalent outputs.

✓ System Security: PASS
No prompt injection, data leakage, or unsafe tool use detected.

Research Veto: ✅ PASS (Applicable)

Dimension	Result	Detail
Scientific Integrity	PASS	No fabricated claims, invented statistics, or unverifiable research statements were produced across the test runs.
Practice Boundaries	PASS	The skill stayed within PCA execution scope and did not emit diagnostic, prescriptive, or medically unsafe guidance.
Methodological Ground	PASS	All successful runs used an appropriate PCA workflow for numeric tabular data, and the edge case rejected invalid parameters cleanly.
Code Usability	PASS	The R code executed successfully on all valid scenarios and handled the invalid scenario with a clean validation failure.

Core Capability: 92 / 100 across 8 categories

Functional Suitability

Full score achieved

12 / 12

100%

Reliability

Validation and recovery behavior are strong, but failures after output directory creation can still leave partial directories and there are no fallback paths beyond aborting.

10 / 12

83%
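One way to close the partial-directory gap is to delete the output directory when a run fails after creating it. The sketch below shows that pattern in Python. The helper `run_with_cleanup` and its `analysis` callback are hypothetical and do not come from the skill's R code. The equivalent R calls would be `dir.create` followed by `unlink(out_dir, recursive = TRUE)` on error.

```python
import os
import shutil


def run_with_cleanup(out_dir, analysis):
    """Run analysis(out_dir); on failure, remove out_dir if this call created it.

    Hypothetical helper illustrating the cleanup pattern only.
    """
    created = not os.path.exists(out_dir)
    os.makedirs(out_dir, exist_ok=True)
    try:
        return analysis(out_dir)
    except Exception:
        if created:
            # Remove the partial directory so a failed run leaves nothing behind.
            shutil.rmtree(out_dir, ignore_errors=True)
        raise
```

A pre-existing directory is never deleted, because the user may already keep other results in it.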

Performance & Context

Progressive disclosure is good, but very wide datasets produce extremely verbose stdout because the full retained feature list is logged.

7 / 8

88%
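The verbose-stdout issue, where 16690 feature names were printed in one run, can be avoided by logging a count plus a short preview. Here is a minimal Python sketch of that idea. The function `summarize_features` and its `max_shown` default are hypothetical.

```python
def summarize_features(names, max_shown=10):
    """One-line summary of retained features: a count plus a short preview."""
    shown = ", ".join(names[:max_shown])
    extra = len(names) - max_shown
    suffix = f", ... (+{extra} more)" if extra > 0 else ""
    return f"{len(names)} features retained: {shown}{suffix}"


print(summarize_features([f"g{i}" for i in range(8)], max_shown=3))
# prints "8 features retained: g0, g1, g2, ... (+5 more)"
```

The full list still belongs in an output file, where agents and humans can inspect it without flooding the console.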

Agent Usability

Full score achieved

16 / 16

100%

Human Usability

Trigger language is natural, but the workflow remains rigid for loosely phrased requests because it is primarily a CLI skill.

7 / 8

88%

Security

No secret-handling issues were found, but output_prefix is only checked for emptiness and the tool always writes a filtered copy of the input data into the output directory.

10 / 12

83%
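A stricter check on `output_prefix` would reject anything that could act as a path, not just empty strings. This Python sketch uses an allow-list regular expression. The exact character set and the 64-character limit are illustrative assumptions, not the skill's rules.

```python
import re

# Assumed allow-list: starts with a letter or digit, then letters, digits, '.', '_' or '-'.
_PREFIX_RE = re.compile(r"[A-Za-z0-9][A-Za-z0-9._-]{0,63}")


def validate_output_prefix(prefix):
    """Reject empty prefixes and prefixes containing path separators (illustrative rule)."""
    if not isinstance(prefix, str) or not _PREFIX_RE.fullmatch(prefix):
        raise ValueError(
            f"invalid output_prefix {prefix!r}: use 1-64 characters from "
            "[A-Za-z0-9._-], starting with a letter or digit"
        )
    return prefix
```

Because `/` is not in the allow-list and the first character cannot be `.`, inputs such as `../x` or `/abs` fail before any file is written.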

Maintainability

Full score achieved

12 / 12

100%

Agent-Specific

Triggering and layering are strong, but out-of-scope guidance could be more explicit and composability would improve with a concise machine-readable execution summary.

18 / 20

90%
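A concise machine-readable execution summary could be a single JSON object emitted at the end of a run. The Python sketch below shows one possible shape. The function name, the field names, and the example output paths are hypothetical and do not describe an existing schema.

```python
import json


def execution_summary(status, n_samples, n_features, variance_explained, outputs):
    """Build a compact JSON run summary (hypothetical schema)."""
    summary = {
        "status": status,
        "n_samples": n_samples,
        "n_features": n_features,
        "n_components": len(variance_explained),
        "variance_explained": [round(v, 4) for v in variance_explained],
        "cumulative_variance": round(sum(variance_explained), 4),
        "outputs": sorted(outputs),
    }
    return json.dumps(summary, sort_keys=True)


print(execution_summary("ok", 12, 500, [0.5, 0.25],
                        ["tables/scores.tsv", "figures/scree.pdf"]))
```

Sorting keys and output paths keeps the summary byte-stable across equivalent runs, which fits the determinism gate above.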

Core Capability Total: 92 / 100

Medical Task Execution Average: 93.2 / 100 (Assertions: 20/20 passed)

Canonical

Explicit gene-feature PCA

4/4 ✓

Variant A

High-dimensional auto-feature PCA

4/4 ✓

Edge

Invalid component count

4/4 ✓

Variant B

Missing-value filtering with groups

4/4 ✓

Stress

TXT export with custom parameters

4/4 ✓

Canonical: ✅ Pass

Explicit gene-feature PCA

Executed perfectly on the documented happy-path example.

Basic 39/40 | Specialized 57/60 | Total 96/100

✅ A1: The run creates the documented table, figure, and data directories.
✅ A2: Summary output includes the documented variance fields.
✅ A3: Score output preserves sample and group information.
✅ A4: The run completes without warnings or errors on the canonical example.

Pass rate: 4 / 4

Variant A: ✅ Pass

High-dimensional auto-feature PCA

Methodology and execution were sound, but stdout became extremely verbose because all 16690 feature names were logged.

Basic 37/40 | Specialized 54/60 | Total 91/100

✅ A1: Auto-detection resolves a usable sample ID column.
✅ A2: The run exports exactly the requested number of components.
✅ A3: The documented tables and figures are written successfully.
✅ A4: The method remains executable on a high-dimensional dataset with more features than samples (p > n).

Pass rate: 4 / 4

Edge: ✅ Pass

Invalid component count

The expected validation failure surfaced clearly before any analysis or file writes.

Basic 36/40 | Specialized 50/60 | Total 86/100

✅ A1: An invalid n_components value is rejected before analysis begins.
✅ A2: Error reporting is specific and actionable.
✅ A3: No partial output directory is created for this validation failure.
✅ A4: The failure stays safely in scope without unsafe side effects.

Pass rate: 4 / 4
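The edge case depends on validating parameters before any I/O happens. A minimal Python sketch of such a check follows. The function name `validate_n_components` and the error wording are hypothetical, and the skill's actual messages may differ.

```python
def validate_n_components(value):
    """Reject invalid component counts before any analysis or file writes (sketch)."""
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(value, bool) or not isinstance(value, int):
        raise ValueError(f"n_components must be an integer, got {value!r}")
    if value < 1:
        raise ValueError(f"n_components must be at least 1, got {value}")
    return value
```

Calling this before creating any output directory keeps a failed run free of side effects, matching assertion A3.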

Variant B: ✅ Pass

Missing-value filtering with groups

Excellent degraded-path behavior with clear warnings and complete output generation.

Basic 39/40 | Specialized 58/60 | Total 97/100

✅ A1: Rows with incomplete feature values are removed and reported.
✅ A2: Requested components are reduced safely when the data cannot support them.
✅ A3: Group information is preserved after filtering.
✅ A4: Standard outputs are still generated after filtering.

Pass rate: 4 / 4
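The two degraded-path behaviors in this case can be sketched together: drop incomplete rows with a warning, then cap the component count. The cap here is the rank limit min(n - 1, p), because n centered samples span at most n - 1 dimensions. The report does not state the skill's exact rule, so treat the cap as an assumption. The function `filter_and_cap` is hypothetical, and group labels are left out for brevity.

```python
import math
import warnings


def _is_missing(v):
    """Treat None and float NaN as missing values."""
    return v is None or (isinstance(v, float) and math.isnan(v))


def filter_and_cap(rows, n_components):
    """Drop incomplete rows, then cap n_components at min(n - 1, p) (illustrative)."""
    complete = [r for r in rows if not any(_is_missing(v) for v in r)]
    dropped = len(rows) - len(complete)
    if dropped:
        warnings.warn(f"removed {dropped} row(s) with missing feature values")
    n_features = len(rows[0]) if rows else 0
    max_k = max(0, min(len(complete) - 1, n_features))
    k = min(n_components, max_k)
    if k < n_components:
        warnings.warn(f"n_components reduced from {n_components} to {k}")
    return complete, k
```

For example, 5 rows with 3 features, 2 of which contain a missing value, leave 3 complete rows. A request for 5 components is therefore reduced to 2, and each step emits its own warning.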

Stress: ✅ Pass

TXT export with custom parameters

Custom output format and parameter overrides were handled cleanly without regressions.

Basic 39/40 | Specialized 57/60 | Total 96/100

✅ A1: TXT outputs use the requested prefix consistently.
✅ A2: Metadata records the parameter override correctly.
✅ A3: Top-loadings output respects the requested limit.
✅ A4: Figures are still generated for the customized run.

Pass rate: 4 / 4

Medical Task Total: 93.2 / 100
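The Medical Task total is the mean of the five case totals reported above. Each case total is the sum of its Basic and Specialized scores, for example 39 + 57 = 96 for the canonical case:

```latex
\bar{S}_{\text{task}} = \frac{96 + 91 + 86 + 97 + 96}{5} = \frac{466}{5} = 93.2
```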

Key Strengths

The skill is tightly aligned with its stated scope and all documented core workflows executed successfully.
Validation and error messages are concrete, namespaced, and easy for both humans and agents to act on.
The implementation is modular, deterministic, and produces reproducible tabular and figure outputs.
Degraded-path behavior is strong: missing-value filtering and component downscaling are both surfaced clearly.