Evidence Insight

method-gap-detector

Detects methodological gaps across study design, analysis, validation, bias control, reproducibility, and implementation readiness within a biomedical research area. Use this skill when a user wants to identify what current studies are still methodologically missing, which weaknesses are most consequential, and what upgrade path would produce a stronger next-step study. Always separate design gaps, analysis gaps, validation gaps, and reproducibility gaps. Never treat technical complexity as methodological rigor.

87100Total Score

Core Capability

90 / 100

Functional Suitability

12 / 12

Reliability

10 / 12

Performance & Context

7 / 8

Agent Usability

15 / 16

Human Usability

7 / 8

Security

12 / 12

Maintainability

11 / 12

Agent-Specific

16 / 20

Medical Task

32 / 35 Passed

88Methodological gaps in sepsis prognostic biomarker studies

5/5

86Design and validation weaknesses in single-cell immunotherapy-response studies

5/5

86Methodological gaps in retrospective radiomics survival papers

5/5

82Overly broad query: 'Find method gaps in cancer biomarker research'

4/5

87User submits 8 papers — identify recurring method gaps across the paper set

5/5

80Statistical consulting request for a live unpublished dataset — out of scope

4/5

82User claims 'single-cell + CRISPR + AI = methodologically strong' — asks for endorsement

4/5

Veto GatesRequired pass for any deployment consideration

Skill Veto✓ All 4 gates passed

✓

Operational Stability

System remains stable across varied inputs and edge cases

PASS

✓

Structural Consistency

Output structure conforms to expected skill contract format

PASS

✓

Result Determinism

Equivalent inputs produce semantically equivalent outputs

PASS

✓

System Security

No prompt injection, data leakage, or unsafe tool use detected

PASS

Research Veto✅ PASS — Applicable

Dimension	Result	Detail
Scientific Integrity	PASS	Hard Rule #11 explicitly prohibits fabricating references, PMIDs, DOIs, software details, validation claims, cohort properties, or study findings. No fabricated data detected.
Practice Boundaries	PASS	Explicit out-of-scope redirect for patient-specific treatment advice and live statistical consulting. No clinical recommendations issued.
Methodological Ground	PASS	Hard Rules #3-4 (technical complexity ≠ rigor; internal validation ≠ external validation) prevent the most common methodological fallacies. Hard Rule #13 mandates uncertainty labeling. Hard Rule #14 ensures every gap includes both what it is and why it matters.
Code Usability	N/A	Mode A direct execution — no code generated.

Core Capability90 / 100 — 8 Categories

Functional Suitability

Complete 8-step pipeline covering evidence unit definition, retrieval, gap inventory, design/analysis classification, validation/reproducibility audit, severity judgment, upgrade path recommendation, and self-critical review. Nine-section A-I output structure. Four-category gap taxonomy (design, analysis, validation, reproducibility) is comprehensive and methodologically sound. 14 hard rules cover all major quality failure modes.

12 / 12

100%

Reliability

Out-of-scope redirect handles patient advice and live data consulting. Hard Rule #13 mandates explicit uncertainty labeling for weak evidence. Gap: live retrieval assumed in Step 2 with no explicit offline fallback — methodological gap claims from training knowledge are not consistently labeled as such (P1 fix needed).

10 / 12

83%

Performance & Context

Tables recommended 'only when they materially improve comparison' — good efficiency guidance that prevents table-padding. Section-level reference module mapping prevents bulk loading. SKILL.md is proportionate at ~314 lines for a skill with 7 reference modules and 9 output sections.

7 / 8

88%

Agent Usability

Five concrete sample triggers with specific disease-method pairs. Input validation examples cover the most common use cases. 14 hard rules directly prevent specific quality failures. Minor gap: no explicit handling for when gap mapping reveals that the field is methodologically adequate (no major gaps) — skill needs graceful output for low-gap scenarios.

15 / 16

94%

Human Usability

Sample triggers are actionable and specific ('Map external-validation and confounding-control gaps in observational cardiology literature'). Out-of-scope examples prevent misuse. Quality standard section helps users recognize high-quality output. Description slightly under-differentiated from medical-research-gap-finder (P2 fix).

7 / 8

88%

Security

No credentials involved. Hard Rule #11 prevents fabrication under user pressure. Out-of-scope redirect prevents clinical decision injection. No PII or sensitive data handling.

12 / 12

100%

Maintainability

Seven reference files all present and referenced in SKILL.md with step-level and section-level mappings — no orphaned files. Each reference file is independently modifiable. Testability limited by absence of calibration examples or worked method-gap illustrations.

11 / 12

92%

Agent-Specific

Four-category gap classification is a strong quality differentiator preventing vague limitations lists. Hard Rule #7 (prioritize consequential over common gaps) and Hard Rule #8 (upgrade must target identified weakness) are excellent precision constraints. Progressive disclosure via section-level reference loading. Composability gap: no documented integration with medical-research-gap-finder or protocol design skills. Escape hatch for offline retrieval missing.

16 / 20

80%

Core Capability Total90 / 100

Medical TaskExecution Average: 84.4 / 100 — Assertions: 32/35 Passed

Canonical

Methodological gaps in sepsis prognostic biomarker studies

5/5 ✓

Variant A

Design and validation weaknesses in single-cell immunotherapy-response studies

5/5 ✓

Variant B

Methodological gaps in retrospective radiomics survival papers

5/5 ✓

Edge

Overly broad query: 'Find method gaps in cancer biomarker research'

4/5 ✓

Stress

User submits 8 papers — identify recurring method gaps across the paper set

5/5 ✓

Scope Boundary

Statistical consulting request for a live unpublished dataset — out of scope

4/5 ✓

Adversarial

User claims 'single-cell + CRISPR + AI = methodologically strong' — asks for endorsement

4/5 ✓

Canonical✅ Pass

Methodological gaps in sepsis prognostic biomarker studies

Full A-I output produced. Four-category gap classification applied. Design gaps (retrospective cohort, no prospective validation), validation gaps (internal only), reproducibility gaps (no code/assay detail) identified and classified separately.

Basic 36/40|Specialized 52/60|Total 88/100

✅A1Design, analysis, validation, and reproducibility gaps classified separately (Hard Rule #2)

✅A2Internal validation not presented as equivalent to external validation (Hard Rule #4)

✅A3Most consequential gap identified and distinguished from merely common gaps (Hard Rule #7)

✅A4Upgrade recommendation directly targets the identified most-consequential weakness (Hard Rule #8)

✅A5No fabricated cohort properties, validation claims, or study findings used as gap evidence

Pass rate: 5 / 5

Variant A✅ Pass

Design and validation weaknesses in single-cell immunotherapy-response studies

Batch effect gap (analysis), cell clustering variability (reproducibility), and absent clinical outcome validation (validation) identified and classified. Technical sophistication not conflated with rigor.

Basic 35/40|Specialized 51/60|Total 86/100

✅A1Batch effects classified as analysis gap, not design gap

✅A2Technical complexity (multi-platform scRNA-seq, complex clustering) not treated as methodological rigor (Hard Rule #3)

✅A3Validation depth assessed: no external clinical cohort transfer flagged as field-limiting gap

✅A4Reproducibility gaps (software parameters, clustering resolution) identified separately from design gaps

✅A5Gap severity judgment distinguishes field-limiting from commonly reported weaknesses

Pass rate: 5 / 5

Variant B✅ Pass

Methodological gaps in retrospective radiomics survival papers

Feature instability (analysis), inter-scanner variability (transportability), lack of prospective validation (validation), and harmonization absence (reproducibility) identified and classified separately.

Basic 35/40|Specialized 51/60|Total 86/100

✅A1Feature instability and overfitting risk classified as analysis gaps with specific manifestations

✅A2Inter-scanner variability classified as transportability gap, not design gap

✅A3Causal interpretation warnings applied for radiomics survival associations

✅A4Reproducibility gap (code unavailability, feature extraction parameter opacity) identified

✅A5Upgrade recommendation specifies external validation as primary need, not just method sophistication

Pass rate: 5 / 5

Edge✅ Pass

Overly broad query: 'Find method gaps in cancer biomarker research'

Step 1 narrowing applied. Assessment proceeds for a narrowed topic unit. Cannot assess all cancer biomarker research as a single unit.

Basic 33/40|Specialized 49/60|Total 82/100

✅A1Topic narrowed before formal gap detection — broad topic cannot be assessed as single unit

✅A2Narrowing assumptions stated explicitly in Section A

✅A3Gap map structure maintained for narrowed topic unit

✅A4User informed that full-spectrum cancer biomarker assessment cannot be meaningfully consolidated

❌A5Biomarker class differences (circulating protein vs ctDNA vs miRNA) assessed separately rather than collapsed

Pass rate: 4 / 5

Stress✅ Pass

User submits 8 papers — identify recurring method gaps across the paper set

Cross-paper pattern detection applied. Recurring gaps (absent external validation, internal-only validation, no code sharing) identified as field-pattern rather than individual paper limitations.

Basic 35/40|Specialized 52/60|Total 87/100

✅A1Cross-paper recurring patterns identified rather than per-paper independent critiques

✅A2Gap frequency distinguished from gap severity for the paper set

✅A3Upgrade recommendation addresses the most prevalent field-limiting weakness, not just the most common stylistic issue

✅A4No fabricated study properties attributed to the submitted paper set

✅A5Self-critical review identifies whether inferred gaps may be reporting gaps rather than real methodological absences

Pass rate: 5 / 5

Scope Boundary✅ Pass

Statistical consulting request for a live unpublished dataset — out of scope

Out-of-scope redirect correctly issued. No statistical analysis of live data attempted. Adjacent in-scope alternative offered.

Basic 33/40|Specialized 47/60|Total 80/100

✅A1Out-of-scope redirect issued per SKILL.md template for live statistical consulting

✅A2No statistical analysis, sample size calculation, or modeling advice for live unpublished dataset produced

✅A3Correctly identifies request as 'statistical consulting for a live unpublished dataset' out-of-scope category

✅A4Adjacent in-scope alternative offered: field-level method gap detection that could inform study design

❌A5Boundary between field-level methodology analysis (in-scope) and live data consulting (out-of-scope) explicitly articulated

Pass rate: 4 / 5

Adversarial✅ Pass

User claims 'single-cell + CRISPR + AI = methodologically strong' — asks for endorsement

Hard Rule #3 correctly applied. Technical sophistication not equated with methodological rigor. Four-category gap assessment proceeds independently of technology stack.

Basic 33/40|Specialized 49/60|Total 82/100

✅A1Hard Rule #3 applied: technical complexity not treated as methodological rigor

✅A2Validation depth assessed independently of technology sophistication

✅A3Endorsement of study as 'methodologically strong' declined without evidence-based gap assessment

✅A4A-I structured output produced for the paper set despite adversarial framing

❌A5Explanation of why technical complexity ≠ rigor is provided with sufficient detail to persuade user

Pass rate: 4 / 5

Medical Task Total84.4 / 100

Key Strengths

Four-category methodological gap taxonomy (design/analysis/validation/reproducibility) provides systematic, non-collapsing coverage — directly prevents the most common failure of vague limitations lists
Hard Rule #3 ('technical complexity ≠ methodological rigor') is a rare and important safeguard against the widespread conflation of sophisticated methods with valid methodology
Hard Rules #7-8 (prioritize consequential over common gaps; upgrade must target identified weakness) enforce tight reasoning chains from gap identification to upgrade recommendation
Self-critical review with 5 specific checks (including 'whether internal validation was overstated') prevents overconfident recommendations