Data Analysis
pca-dimensionality-reduction
Reduce high-dimensional gene expression data using PCA and visualize sample clustering and variance structure. Inputs: expression matrix, sample annotation table. Outputs: PCA score plots, scree plot, variance-explained table, 2D/3D scatter plots.
Total Score: 93 / 100
Core Capability: 92 / 100

| Category | Score |
|---|---|
| Functional Suitability | 12 / 12 |
| Reliability | 10 / 12 |
| Performance & Context | 7 / 8 |
| Agent Usability | 16 / 16 |
| Human Usability | 7 / 8 |
| Security | 10 / 12 |
| Maintainability | 12 / 12 |
| Agent-Specific | 18 / 20 |
Medical Task: 20 / 20 Passed

| Scenario | Score | Assertions |
|---|---|---|
| Explicit gene-feature PCA | 96 | 4/4 |
| High-dimensional auto-feature PCA | 91 | 4/4 |
| Invalid component count | 86 | 4/4 |
| Missing-value filtering with groups | 97 | 4/4 |
| TXT export with custom parameters | 96 | 4/4 |
Veto Gates: Required pass for any deployment consideration
Skill Veto: ✓ All 4 gates passed

| Gate | Result | Detail |
|---|---|---|
| Operational Stability | PASS | System remains stable across varied inputs and edge cases |
| Structural Consistency | PASS | Output structure conforms to expected skill contract format |
| Result Determinism | PASS | Equivalent inputs produce semantically equivalent outputs |
| System Security | PASS | No prompt injection, data leakage, or unsafe tool use detected |

Research Veto: ✅ PASS (Applicable)

| Dimension | Result | Detail |
|---|---|---|
| Scientific Integrity | PASS | No fabricated claims, invented statistics, or unverifiable research statements were produced across the test runs. |
| Practice Boundaries | PASS | The skill stayed within PCA execution scope and did not emit diagnostic, prescriptive, or medically unsafe guidance. |
| Methodological Ground | PASS | All successful runs used an appropriate PCA workflow for numeric tabular data, and the edge case rejected invalid parameters cleanly. |
| Code Usability | PASS | The R code executed successfully on all valid scenarios and handled the invalid scenario with a clean validation failure. |
Core Capability: 92 / 100 (8 Categories)
Functional Suitability: 12 / 12 (100%)
Full score achieved
Reliability: 10 / 12 (83%)
Validation and recovery behavior are strong, but failures after output directory creation can still leave partial directories and there are no fallback paths beyond aborting.
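One common way to avoid the partial-directory problem noted under Reliability is to write into a staging directory and publish it with a single rename only after every file has been written. The sketch below is illustrative Python, not the skill's R implementation; `write_outputs_atomically`, `final_dir`, and `write_fn` are hypothetical names.

```python
import os
import shutil
import tempfile


def write_outputs_atomically(final_dir, write_fn):
    """Run write_fn(staging_dir) and publish the result only if it succeeds.

    final_dir and write_fn are hypothetical names used for this sketch.
    """
    final_dir = os.path.abspath(final_dir)
    if os.path.exists(final_dir):
        raise FileExistsError(f"output directory already exists: {final_dir}")
    parent = os.path.dirname(final_dir)
    os.makedirs(parent, exist_ok=True)
    # Stage next to the target so the final rename stays on one filesystem.
    staging = tempfile.mkdtemp(prefix=".pca_staging_", dir=parent)
    try:
        write_fn(staging)
        os.rename(staging, final_dir)
    except BaseException:
        # Any failure removes the staging directory, so nothing partial remains.
        shutil.rmtree(staging, ignore_errors=True)
        raise
    return final_dir
```

With this pattern a failure in any writer leaves no directory at the final path. Note that `tempfile.mkdtemp` creates the directory with owner-only permissions, which the renamed directory keeps.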
Performance & Context: 7 / 8 (88%)
Progressive disclosure is good, but very wide datasets produce extremely verbose stdout because the full retained feature list is logged.
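A possible mitigation for the verbose stdout is to log the feature count plus a short preview rather than every name. This is a minimal Python sketch of that idea; `summarize_features` and `max_shown` are hypothetical names and do not describe the skill's actual logging code.

```python
def summarize_features(names, max_shown=5):
    """Return a one-line summary of a feature list instead of the full list."""
    names = list(names)
    total = len(names)
    if total <= max_shown:
        return f"Retained {total} features: {', '.join(names)}"
    head = ", ".join(names[:max_shown])
    # Show only the first max_shown names and report how many were omitted.
    return f"Retained {total} features: {head}, ... ({total - max_shown} more)"
```

The full list could still be written to a file in the output directory for anyone who needs it.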
Agent Usability: 16 / 16 (100%)
Full score achieved
Human Usability: 7 / 8 (88%)
Trigger language is natural, but the workflow remains rigid for loosely phrased requests because it is primarily a CLI skill.
Security: 10 / 12 (83%)
No secret-handling issues were found, but output_prefix is only checked for emptiness and the tool always writes a filtered copy of the input data into the output directory.
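A stricter check on `output_prefix` could restrict it to a safe file-name stem, which rules out path separators and parent-directory references. The sketch below shows one such allow-list in Python. The permitted character set and the 64-character limit are assumptions chosen for illustration, and `validate_output_prefix` is a hypothetical name.

```python
import re

# Assumed policy: starts with a letter or digit, then letters, digits, '.', '_' or '-'.
_PREFIX_PATTERN = re.compile(r"[A-Za-z0-9][A-Za-z0-9._-]{0,63}")


def validate_output_prefix(prefix):
    """Return prefix unchanged if it is a safe file-name stem, else raise ValueError."""
    if not isinstance(prefix, str) or not prefix:
        raise ValueError("output_prefix must be a non-empty string")
    if not _PREFIX_PATTERN.fullmatch(prefix):
        raise ValueError(
            "output_prefix may contain only letters, digits, '.', '_' and '-', "
            "must start with a letter or digit, and be at most 64 characters"
        )
    if ".." in prefix:
        raise ValueError("output_prefix must not contain '..'")
    return prefix
```

Because the pattern excludes `/` and `\`, a value such as `../results` is rejected before any path is built from it.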
Maintainability: 12 / 12 (100%)
Full score achieved
Agent-Specific: 18 / 20 (90%)
Triggering and layering are strong, but out-of-scope guidance could be more explicit and composability would improve with a concise machine-readable execution summary.
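A machine-readable execution summary could be as small as one JSON line printed at the end of a run. The sketch below illustrates the idea in Python. Every field name is a hypothetical choice for this example and none of them is taken from the skill's actual output contract.

```python
import json


def build_run_summary(status, n_samples, n_features, n_components, outputs, warnings=()):
    """Serialize a compact, single-line JSON summary of one run.

    All field names here are illustrative assumptions.
    """
    summary = {
        "status": status,
        "n_samples": n_samples,
        "n_features": n_features,
        "n_components": n_components,
        "outputs": sorted(outputs),
        "warnings": list(warnings),
    }
    # sort_keys keeps the line stable across runs, which helps downstream diffing.
    return json.dumps(summary, sort_keys=True, separators=(",", ":"))
```

A calling agent can then parse the final stdout line instead of scraping free-form log text.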
Core Capability Total: 92 / 100
Medical Task: Execution Average 93.2 / 100 (Assertions: 20/20 Passed)
Canonical (✅ Pass, 96 / 100): Explicit gene-feature PCA
Executed perfectly on the documented happy-path example.
Basic 39/40 | Specialized 57/60 | Total 96/100
- ✅ A1: Output creates the documented table, figure, and data directories
- ✅ A2: Summary output includes the documented variance fields
- ✅ A3: Score output preserves sample and group information
- ✅ A4: Run completes without warnings or errors on the canonical example
Pass rate: 4 / 4
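For readers unfamiliar with the variance fields checked in A2, a variance-explained table is usually derived from the component standard deviations: each component's variance is its squared standard deviation, and its proportion is that variance divided by the total. The Python sketch below shows the arithmetic only. The column names are assumptions and may differ from the skill's documented fields.

```python
def variance_explained(sdev):
    """Build variance-explained rows from per-component standard deviations."""
    variances = [s * s for s in sdev]
    total = sum(variances)
    if total <= 0:
        raise ValueError("total variance must be positive")
    rows = []
    cumulative = 0.0
    for i, v in enumerate(variances, start=1):
        proportion = v / total
        cumulative += proportion
        rows.append({
            "component": f"PC{i}",
            "variance": v,
            "proportion": proportion,
            "cumulative": cumulative,
        })
    return rows
```

The proportions are relative to the components passed in, so the standard deviations of all components should be supplied, not only the retained ones. Otherwise the retained components appear to explain more variance than they do.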
Variant A (✅ Pass, 91 / 100): High-dimensional auto-feature PCA
Methodology and execution were sound, but stdout became extremely verbose because all 16690 feature names were logged.
Basic 37/40 | Specialized 54/60 | Total 91/100
- ✅ A1: Auto-detection resolves a usable sample ID column
- ✅ A2: The run exports exactly the requested number of components
- ✅ A3: The documented tables and figures are written successfully
- ✅ A4: The method remains executable on a high-dimensional dataset where p is greater than n
Pass rate: 4 / 4
Edge (✅ Pass, 86 / 100): Invalid component count
Expected validation failure surfaced clearly before any analysis or file writes.
Basic 36/40 | Specialized 50/60 | Total 86/100
- ✅ A1: Invalid n_components is rejected before analysis begins
- ✅ A2: Error reporting is specific and actionable
- ✅ A3: No partial output directory is created for this validation failure
- ✅ A4: The failure stays safely in scope without unsafe side effects
Pass rate: 4 / 4
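The fail-fast behavior tested in this scenario can be sketched as a parameter check that runs before any data is read or any directory is created. The Python example below assumes that a valid `n_components` is a positive whole number. The report does not state the skill's exact rule, and `validate_n_components` is a hypothetical name.

```python
def validate_n_components(value):
    """Return value if it is a positive whole number, else raise ValueError."""
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(value, bool) or not isinstance(value, int):
        raise ValueError(f"n_components must be an integer, got {value!r}")
    if value < 1:
        raise ValueError(f"n_components must be at least 1, got {value}")
    return value
```

Calling this check first means a bad value raises before any side effect occurs, which matches the "no partial output directory" expectation in A3.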
Variant B (✅ Pass, 97 / 100): Missing-value filtering with groups
Excellent degraded-path behavior with clear warnings and complete output generation.
Basic 39/40 | Specialized 58/60 | Total 97/100
- ✅ A1: Rows with incomplete feature values are removed and reported
- ✅ A2: Requested components are reduced safely when the data cannot support them
- ✅ A3: Group information is preserved after filtering
- ✅ A4: Standard outputs are still generated after filtering
Pass rate: 4 / 4
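The two degraded-path behaviors in A1 and A2 can be sketched together: drop samples with missing feature values, report how many were dropped, then cap the requested component count at what the remaining data supports. The Python sketch below assumes a cap of min(n_samples - 1, n_features), the largest number of components with non-zero variance after centering. The skill's own rule may differ, and both function names are hypothetical.

```python
import math


def drop_incomplete_rows(rows, feature_names):
    """Return (complete_rows, n_dropped) for a list of dict-like sample rows."""
    def is_missing(value):
        return value is None or (isinstance(value, float) and math.isnan(value))

    kept = [r for r in rows if not any(is_missing(r.get(f)) for f in feature_names)]
    return kept, len(rows) - len(kept)


def clamp_components(requested, n_samples, n_features):
    """Cap the requested component count at min(n_samples - 1, n_features)."""
    limit = min(n_samples - 1, n_features)
    if limit < 1:
        raise ValueError("not enough complete samples or features for PCA")
    return min(requested, limit)
```

Because whole rows are kept or dropped, non-feature columns such as a group label travel with each retained sample, which is the property A3 checks.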
Stress (✅ Pass, 96 / 100): TXT export with custom parameters
Custom output format and parameter overrides were handled cleanly without regressions.
Basic 39/40 | Specialized 57/60 | Total 96/100
- ✅ A1: TXT outputs use the requested prefix consistently
- ✅ A2: Metadata records the parameter override correctly
- ✅ A3: Top-loadings output respects the requested limit
- ✅ A4: Figures are still generated for the customized run
Pass rate: 4 / 4
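As an illustration of the top-loadings limit checked in A3, the Python sketch below selects the features with the largest absolute loading on one component. Ranking by absolute value with ties broken by feature name is an assumption for this example, not a documented detail of the skill, and `top_loadings` is a hypothetical name.

```python
def top_loadings(loadings, limit):
    """Return up to `limit` (feature, loading) pairs ranked by absolute loading.

    loadings maps feature name to its loading on a single component.
    """
    if limit < 1:
        raise ValueError("limit must be at least 1")
    # Sort by descending magnitude; ties fall back to feature name for determinism.
    ranked = sorted(loadings.items(), key=lambda kv: (-abs(kv[1]), kv[0]))
    return ranked[:limit]
```

The signed loading is kept in the result so the direction of each feature's contribution is not lost.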
Medical Task Total: 93.2 / 100
Key Strengths
- The skill is tightly aligned with its stated scope and all documented core workflows executed successfully.
- Validation and error messages are concrete, namespaced, and easy for both humans and agents to act on.
- The implementation is modular, deterministic, and produces reproducible tabular and figure outputs.
- Degraded-path behavior is strong: missing-value filtering and component downscaling are both surfaced clearly.