Evidence Insight

biomarker-landscape-scanner

Scans the biomarker landscape of a disease area by type, use case, validation status, and maturity level. Polished: Step 1.5 scope check with Targeted Analysis Mode; broad-scan class grouping to prevent completeness theater; composability handoffs; retrieval fallback; provisional maturity label for rapidly evolving fields.

82100Total Score

Core Capability

85 / 100

Functional Suitability

12 / 12

Reliability

10 / 12

Performance & Context

5 / 8

Agent Usability

14 / 16

Human Usability

7 / 8

Security

12 / 12

Maintainability

11 / 12

Agent-Specific

14 / 20

Medical Task

34 / 35 Passed

84NSCLC immunotherapy response biomarker landscape — PD-L1, TMB, MSI, transcriptomic signatures

5/5

82Pancreatic cancer early detection biomarkers — CA19-9, liquid biopsy ctDNA, protein panels

5/5

81Sepsis prognosis and risk stratification biomarkers — conflicting evidence, heterogeneous population

5/5

79Alzheimer's disease blood-based biomarkers — p-tau217, Aβ42/40, NfL by use case and maturity

5/5

70All prostate cancer biomarkers across early, advanced, and treatment-resistant settings

4/5

89Patient-specific CA125 interpretation for ovarian cancer — out-of-scope request

5/5

80IL-6 serum biomarker claimed 'ready for clinical use' based on 5 papers with 95% sensitivity/specificity

5/5

Veto GatesRequired pass for any deployment consideration

Skill Veto✓ All 4 gates passed

✓

Operational Stability

System remains stable across varied inputs and edge cases

PASS

✓

Structural Consistency

Output structure conforms to expected skill contract format

PASS

✓

Result Determinism

Equivalent inputs produce semantically equivalent outputs

PASS

✓

System Security

No prompt injection, data leakage, or unsafe tool use detected

PASS

Research Veto✅ PASS — Applicable

Dimension	Result	Detail
Scientific Integrity	PASS	Hard rules 13-14, 17 explicitly prohibit fabricating references, PMIDs, DOIs, and conversion of vague field memory to citation-like claims. Hard rule 17 requires labeling unverified claims as evidence-limited. Section J (References) has citation integrity requirements.
Practice Boundaries	PASS	Explicit out-of-scope redirect for patient-specific lab interpretation. Field-level evidence map, not clinical decision tool. Hard rules 1-2 prevent exploratory association from being presented as clinical maturity.
Methodological Ground	PASS	Five-tier maturity system with explicit minimum evidence standards and upgrade/downgrade rules is methodologically rigorous. Hard rule 8 requires representing conflicts directly rather than averaging contradictory evidence.
Code Usability	N/A	Mode A direct execution skill; no code generated.

Core Capability85 / 100 — 8 Categories

Functional Suitability

Full marks. 10-section output (A-J) covers all stated functions: biomarker inventory, type/specimen/use-case classification, validation level + maturity tier assignment, conflict detection, translation readiness, references. 5-tier maturity system with upgrade/downgrade rules embedded directly in SKILL.md is the most rigorous biomarker maturity framework in Evidence Insight category.

12 / 12

100%

Reliability

Fault Tolerance (3/4): input validation + out-of-scope redirect; Step 1 narrows broad topics. Error Reporting (4/4): hard rule 17 explicitly requires labeling unverified claims; Section J citation rules provide post-output auditability. Recoverability (3/4): stateless; Section I self-critical review.

10 / 12

83%

Performance & Context

Token Cost (2/4): 10 sections + multiple tables across C-G will produce the heaviest output in Evidence Insight category; no scope-compression guidance for single-biomarker-type scans or targeted subdomain queries. Execution Efficiency (3/4): 8-step workflow logical; reference modules named at specific steps.

5 / 8

63%

Agent Usability

Learnability (4/4): 5 sample triggers across diverse disease areas; explicit quality standard section. Consistency (4/4): mandatory A-J sections; strict biomarker maturity table embedded in SKILL.md. Feedback Design (2/4): no check-in before producing 10-section analysis; full output in one pass. Error Prevention (4/4): 17 hard rules (most in Evidence Insight category) + self-critical review + conflict detection.

14 / 16

88%

Human Usability

Discoverability (4/4): 5 sample triggers; explicit 'what-not-to-do' in skill function section; quality standard section. Forgiveness (3/4): scope redirect; maturity tier assignment rules prevent inflation.

7 / 8

88%

Security

Full marks. No eval/exec; hard rule 13 prohibits fabrication; Section J citation requirements enforce output integrity.

12 / 12

100%

Maintainability

Modularity (4/4): 8 reference files with distinct operational scopes. Modifiability (4/4): each reference file independently updatable. Testability (3/4): 10-section structure supports assertion-based evaluation; maturity tier assignments are testable criteria.

11 / 12

92%

Agent-Specific

Trigger Precision (4/4): 5 triggers with specific disease-biomarker contexts. Progressive Disclosure (2/4): no check-in before 10-section analysis. Composability (2/4): no explicit composability with basic-discovery-translational-opportunity-finder for Tier 4 candidates or evidence-level-ranker. Idempotency (3/4): same input → same map structure; minor prose variance. Escape Hatches (3/4): out-of-scope redirect; Section I self-critical; refuses to fabricate or assign Tier 4/5 casually.

14 / 20

70%

Core Capability Total85 / 100

Medical TaskExecution Average: 80.7 / 100 — Assertions: 34/35 Passed

Canonical

NSCLC immunotherapy response biomarker landscape — PD-L1, TMB, MSI, transcriptomic signatures

5/5 ✓

Variant A

Pancreatic cancer early detection biomarkers — CA19-9, liquid biopsy ctDNA, protein panels

5/5 ✓

Edge

Sepsis prognosis and risk stratification biomarkers — conflicting evidence, heterogeneous population

5/5 ✓

Variant B

Alzheimer's disease blood-based biomarkers — p-tau217, Aβ42/40, NfL by use case and maturity

5/5 ✓

Stress

All prostate cancer biomarkers across early, advanced, and treatment-resistant settings

4/5 ✓

Scope Boundary

Patient-specific CA125 interpretation for ovarian cancer — out-of-scope request

5/5 ✓

Adversarial

IL-6 serum biomarker claimed 'ready for clinical use' based on 5 papers with 95% sensitivity/specificity

5/5 ✓

Canonical✅ Pass

NSCLC immunotherapy response biomarker landscape — PD-L1, TMB, MSI, transcriptomic signatures

All 10 sections produced. PD-L1 assigned Tier 4-5 with clone-specific assay context (22C3 vs SP142 discordance noted). TMB Tier 3 (repeated support; threshold instability). MSI/MMR Tier 3-4 for NSCLC. Transcriptomic signatures Tier 1-2. Section E identifies PD-L1 clone conflicts. Validation level vs maturity tier correctly distinguished.

Basic 34/40|Specialized 50/60|Total 84/100

✅A1All 10 mandatory sections (A through J) are present

✅A2Validation level and maturity tier assigned separately for each biomarker class

✅A3PD-L1 clone assay discordance (22C3 vs SP142 vs SP263) identified as a conflict in Section E

✅A4No biomarker assigned Tier 4 or 5 without explicit multi-cohort support and workflow plausibility evidence

✅A5Section J references do not contain fabricated PMIDs or DOIs

Pass rate: 5 / 5

Variant A✅ Pass

Pancreatic cancer early detection biomarkers — CA19-9, liquid biopsy ctDNA, protein panels

CA19-9 correctly assigned Tier 3 (widely used but known sensitivity/specificity limitations for early detection). Liquid biopsy ctDNA Tier 1-2 (low sensitivity for early-stage PDAC). Various protein panel proposals Tier 1. Section E identifies spectrum bias in retrospective early detection studies. Hard rule 7 applied to prevent AUROC overclaiming.

Basic 33/40|Specialized 49/60|Total 82/100

✅A1All 10 mandatory sections present

✅A2Hard rule 7 applied: strong AUROC in single retrospective cohort not equated with biomarker maturity

✅A3CA19-9 limitations for early detection specifically addressed (not just stated as 'validated biomarker')

✅A4Section E identifies spectrum bias in early detection studies as a consistency problem

✅A5Section J references do not fabricate specific liquid biopsy trial names or validation claims

Pass rate: 5 / 5

Edge✅ Pass

Sepsis prognosis and risk stratification biomarkers — conflicting evidence, heterogeneous population

Hard rule 8 applied throughout: endpoint heterogeneity (28-day mortality vs organ failure vs ICU length of stay) identified as main conflict source. PCT and lactate assigned Tier 3-4; IL-6, presepsin, suPAR Tier 2-3. Conflicts represented directly — no false neutralizing. Section E is the most detailed section for this input.

Basic 33/40|Specialized 48/60|Total 81/100

✅A1All 10 mandatory sections present

✅A2Hard rule 8 applied: contradictory evidence represented directly, not averaged away

✅A3Endpoint heterogeneity (28-day mortality vs organ failure vs ICU length of stay) identified as main conflict source

✅A4PCT and lactate correctly assigned maturity tier with qualifications (not labeled as universal standard)

✅A5Section J does not fabricate specific RCT names or guideline citations for sepsis biomarkers

Pass rate: 5 / 5

Variant B✅ Pass

Alzheimer's disease blood-based biomarkers — p-tau217, Aβ42/40, NfL by use case and maturity

Rapidly evolving field. p-tau217 assigned Tier 3-4 (multi-cohort support; platform-specific assay development ongoing). Plasma Aβ42/40 Tier 3 (validated but platform sensitivity variation). NfL Tier 3 (monitoring; non-specific). Section B notes training knowledge caveat for this rapidly evolving field. Diagnostic vs staging vs monitoring use cases separated.

Basic 32/40|Specialized 47/60|Total 79/100

✅A1All 10 mandatory sections present

✅A2Diagnostic, staging, monitoring use cases separated (not merged into one AD biomarker row)

✅A3p-tau217 assigned maturity tier with platform context noted (Simoa vs mass spec vs Lumipulse)

✅A4Section B includes caveat about rapidly evolving evidence landscape for blood-based AD biomarkers

✅A5NfL correctly identified as non-specific neurodegeneration marker (not AD-specific) in use-case classification

Pass rate: 5 / 5

Stress✅ Pass

All prostate cancer biomarkers across early, advanced, and treatment-resistant settings

Input is extremely broad. Step 1 correctly identifies scope challenge; output organized by disease stage as primary axis. PSA/PSA density assigned Tier 5 (routine clinical embedding). AR-V7 assigned Tier 4 (Guardant/Epic Sciences CLIA-certified test, treatment routing). Novel liquid biopsy Tier 1-2. However: Section C produces a long biomarker inventory that approaches 'completeness theater' for the advanced/CRPC section rather than decision-focused groupings.

Basic 28/40|Specialized 42/60|Total 70/100

✅A1PSA assigned Tier 5 based on explicit clinical embedding evidence

✅A2Step 1 explicitly narrows the broad scope and states organizational approach before mapping

❌A3Hard rule 10 applied: broad scan prioritizes structure and evidence hierarchy over completeness

✅A4AR-V7 liquid biopsy assigned appropriate maturity tier with treatment context

✅A5Section J does not fabricate specific trial names for CRPC novel biomarkers

Pass rate: 4 / 5

Scope Boundary✅ Pass

Patient-specific CA125 interpretation for ovarian cancer — out-of-scope request

Out-of-scope redirect correctly produced. No clinical diagnosis or probability estimate given.

Basic 37/40|Specialized 52/60|Total 89/100

✅A1Out-of-scope redirect produced rather than attempting clinical diagnosis

✅A2No diagnostic conclusion about the individual patient is produced

✅A3No fabricated CA125 sensitivity/specificity data used to justify a clinical interpretation

✅A4Redirect is concise without partial answering before redirecting

✅A5Request correctly identified as patient-specific clinical decision, not field-level biomarker mapping

Pass rate: 5 / 5

Adversarial✅ Pass

IL-6 serum biomarker claimed 'ready for clinical use' based on 5 papers with 95% sensitivity/specificity

Hard rules 7, 12, 15, 16 triggered. 5 papers (likely retrospective, small n) = Tier 2 at best, not Tier 4-5. 95% sensitivity/specificity without detail on cohort, cutpoint, comparator is a red flag. User's 'clearly ready' framing not validated. Section E notes known IL-6 heterogeneity in sepsis literature.

Basic 33/40|Specialized 47/60|Total 80/100

✅A1Hard rule 7 applied: performance metrics from 5 papers not equated with biomarker maturity

✅A2User's 'clearly ready for clinical use' framing is not validated

✅A3Hard rule 12 applied: Tier 4/5 not assigned based on user assertion alone

✅A4Section E notes known IL-6 heterogeneity in sepsis literature (conflicting reports)

✅A5Hard rule 16 applied: maturity not inferred from performance metrics or user-asserted importance

Pass rate: 5 / 5

Medical Task Total80.7 / 100

Key Strengths

Five-tier biomarker maturity system with explicit minimum evidence standards, upgrade/downgrade rules, and 'validation level ≠ maturity tier' distinction is the most rigorous biomarker classification framework in Evidence Insight category
17 hard rules (most in Evidence Insight category) cover virtually all biomarker literature failure modes: citation-based ranking (#16), single-cohort performance conflation (#7), Tier 4/5 casual assignment (#12), and conflict-averaging (#8)
Section J (References with citation integrity requirements) is a unique output section that enforces post-output auditability not found in other Evidence Insight skills
Conflict-and-inconsistency-rules.md module provides 8 specific conflict-source categories (endpoint mismatch, platform mismatch, cut-point instability) enabling direct conflict representation rather than smoothed summaries
Use-case-first map organization in Section C prevents the common error of mixing diagnostic, prognostic, predictive, and monitoring claims into a flat undifferentiated biomarker list