Academic Writing

reporting-guideline-compliance-checker

Checks biomedical manuscripts against reporting guidelines such as CONSORT, STROBE, PRISMA, and TRIPOD to identify missing or weak reporting elements before submission or revision.

Total Score: 91 / 100

Core Capability

93 / 100

Functional Suitability

12 / 12

Reliability

11 / 12

Performance & Context

7 / 8

Agent Usability

16 / 16

Human Usability

7 / 8

Security

12 / 12

Maintainability

11 / 12

Agent-Specific

17 / 20

Medical Task

34 / 34 Passed

88 | RCT manuscript seeking full CONSORT compliance review before submission

5/5

89 | Prospective observational cohort study seeking STROBE compliance review

5/5

94 | User describes 'prospective cohort with prediction model component' but provides no manuscript text

5/5

88 | Systematic review and meta-analysis seeking PRISMA compliance review

5/5

88 | Hybrid biomarker validation + prediction model development study with unclear study type

5/5

92 | Request to check if manuscript meets all target journal formatting requirements and style guidelines

4/4

92 | User claims full CONSORT compliance and asks to confirm, providing only the methods section

5/5

Veto Gates: required pass for any deployment consideration

Skill Veto: ✓ All 4 gates passed

✓ Operational Stability

System remains stable across varied inputs and edge cases

PASS

✓ Structural Consistency

Output structure conforms to expected skill contract format

PASS

✓ Result Determinism

Equivalent inputs produce semantically equivalent outputs

PASS

✓ System Security

No prompt injection, data leakage, or unsafe tool use detected

PASS

Research Veto: ✅ PASS (Applicable)

Dimension	Result	Detail
Scientific Integrity	PASS	No participant flow counts, endpoint definitions, registration numbers, or guideline requirements were fabricated. Hard rules 1, 3, and 7 explicitly prohibit fabrication and are consistently enforced.
Practice Boundaries	PASS	No diagnostic or prescriptive clinical conclusions. Skill is limited to reporting completeness review, not clinical recommendation.
Methodological Ground	PASS	Correct guideline-to-design mapping (CONSORT=RCT, STROBE=observational, PRISMA=SR/MA, TRIPOD=prediction). Hybrid boundary rules correctly flag multi-framework situations.
Code Usability	N/A	Mode A skill — no code generated.
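
The guideline-to-design mapping checked under Methodological Ground can be pictured as a small lookup with a hybrid escalation. This is a minimal sketch, not the skill's own logic (the skill is prompt-based and generates no code): the design labels, the dictionary, and the function name select_guidelines are hypothetical. A study that combines several design components receives several frameworks, and an unrecognized design raises an error instead of being forced into a guess.

```python
# Hypothetical sketch of the guideline-to-design mapping (CONSORT=RCT,
# STROBE=observational, PRISMA=SR/MA, TRIPOD=prediction).
# Design labels and names are illustrative assumptions.
PRIMARY_GUIDELINE = {
    "rct": "CONSORT",
    "observational": "STROBE",
    "systematic_review": "PRISMA",
    "prediction_model": "TRIPOD",
}


def select_guidelines(design_components: list[str]) -> list[str]:
    """Return the guideline(s) for a study; more than one signals a hybrid study."""
    unknown = [d for d in design_components if d not in PRIMARY_GUIDELINE]
    if not design_components or unknown:
        # Do not guess: a missing or unsupported design should trigger clarification.
        raise ValueError(f"Study design unclear or unsupported: {unknown or 'none given'}")
    selected: list[str] = []
    for component in design_components:
        guideline = PRIMARY_GUIDELINE[component]
        if guideline not in selected:  # keep input order, drop duplicates
            selected.append(guideline)
    return selected


print(select_guidelines(["rct"]))                                # ['CONSORT']
print(select_guidelines(["observational", "prediction_model"]))  # ['STROBE', 'TRIPOD']
```

Returning a list instead of a single label is the design choice that keeps hybrid studies (such as the Edge and Stress scenarios) from being collapsed into one framework.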

Core Capability: 93 / 100 (8 categories)

Functional Suitability

All four major guideline families covered. Hybrid study case addressed by dedicated reference file. Eight-step workflow maps cleanly to nine-section output (A–I). Five-tier severity (major, moderate, minor, not applicable, unclear) fully implemented. Scope boundary precisely defined.

12 / 12

100%
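
To make the five-tier severity model concrete, here is a minimal sketch of how one checklist item might be tiered. The tier names come from the evaluation; the decision rule (core versus non-core items, and the "present" / "weak" / "missing" status labels) is a hypothetical assumption, not the skill's documented behaviour. The rule keeps 'not applicable' and 'unclear' separate from 'missing', which is the distinction hard rule 5 protects.

```python
from enum import Enum
from typing import Optional


class Severity(Enum):
    """The five tiers named in the evaluation; the string values are illustrative."""
    MAJOR = "major"
    MODERATE = "moderate"
    MINOR = "minor"
    NOT_APPLICABLE = "not applicable"
    UNCLEAR = "unclear due to missing manuscript material"


def classify_item(applicable: bool, section_provided: bool,
                  status: str, core_item: bool) -> Optional[Severity]:
    """Hypothetical tiering rule for one checklist item.

    status is one of "present", "weak", or "missing" (assumed labels) and
    describes what was found in the text supplied. Returns None when the
    item is adequately reported, so there is no gap to flag.
    """
    if not applicable:
        return Severity.NOT_APPLICABLE  # decided by study design, never by absent text
    if not section_provided:
        return Severity.UNCLEAR  # unseen text is not evidence that the item is missing
    if status == "present":
        return None
    if status == "missing":
        return Severity.MAJOR if core_item else Severity.MODERATE
    if status == "weak":
        return Severity.MODERATE if core_item else Severity.MINOR
    raise ValueError(f"Unknown status: {status!r}")


# A core item that could not be found because its section was never supplied:
print(classify_item(applicable=True, section_provided=False, status="missing", core_item=True))
# Severity.UNCLEAR
```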

Reliability

Clarification-first gate prevents misclassification. 'Unclear due to missing manuscript material' is a formal severity tier, not an afterthought. Minor deduction: no partial-results pathway when the user insists on proceeding with incomplete manuscript sections.

11 / 12

92%
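
The clarification-first gate can be sketched as a check that lists the manuscript material still needed before a full review begins, which is also what Section I reports. This is a minimal illustration under stated assumptions: the required section names and the function clarification_gate are hypothetical, and a real review would need guideline-specific inputs (for example, a flow diagram for CONSORT or PRISMA).

```python
# Hypothetical minimum input set for a full review; the section names are assumptions.
REQUIRED_SECTIONS = ("abstract", "methods", "results")


def clarification_gate(manuscript: dict[str, str]) -> list[str]:
    """Return the sections still needed before a full review; empty means proceed."""
    return [name for name in REQUIRED_SECTIONS
            if not manuscript.get(name, "").strip()]


# Study description only, no manuscript text (compare the Edge scenario):
print(clarification_gate({}))  # ['abstract', 'methods', 'results']

# Methods section only (compare the Adversarial scenario):
print(clarification_gate({"methods": "Participants were randomised ..."}))  # ['abstract', 'results']
```

A non-empty result would not have to block all output: the Adversarial scenario shows a partial review of the supplied section with explicit scope limits, while full compliance stays uncertified.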

Performance & Context

Seven reference files are compact (6–18 lines each). SKILL.md is 282 lines. Minor deduction: Sections D/E partially overlap with Section F (submission-risk assessment); core gaps listed in D are repeated in F's risk assessment.

7 / 8

88%

Agent Usability

Full marks. Six sample triggers, seven-item core function list, quality standard comparison. Sections A–I use fixed labels. Five-tier severity is consistent across all outputs. Three independent error-prevention mechanisms: clarification-first rule, hard rules list, and 'important distinctions' section.

16 / 16

100%

Human Usability

Sample triggers, core function, and quality standard are all present. Section I lists specific missing inputs. Minor deduction: no explicit restart path when the user provides additional sections after a partial review begins.

7 / 8

88%

Security

No credentials, no APIs, no code execution. Hard rules 1 and 3 prevent fabricating compliance or flow details. Hard rule 7 prevents inventing journal policy claims. Hard rule 5 prevents conflating 'missing' with 'not applicable'.

12 / 12

100%

Maintainability

Seven focused reference files allow adding new guidelines (e.g., STARD for diagnostic accuracy, CARE for case reports) without touching core workflow. Separation between guideline selection logic and compliance item rules is clean. Minor deduction: no worked example of hybrid study classification in reference files.

11 / 12

92%

Agent-Specific

Trigger precision: six sample triggers plus explicit 'not for' scoping. Progressive disclosure: clarification gate + Section A + Section I. Idempotency: the A–I structure is stable across identical inputs. Escape hatches: formal 'unclear' severity tier + Section I + hard rule 10. Deduction: no composability hooks; the skill does not reference revision-strategy-planner or methods-section-writer for post-gap correction (2/4 for composability).

17 / 20

85%

Core Capability Total: 93 / 100
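
As a quick arithmetic check, the eight category scores above add up to the reported total, and the category maxima add up to 100, so a plain unweighted sum (an assumption about how the total is formed) is consistent with the report. The per-category percentages are the earned/maximum ratios rounded to whole numbers.

```python
# (earned, maximum) per category, copied from the scores reported above.
CATEGORIES = {
    "Functional Suitability": (12, 12),
    "Reliability": (11, 12),
    "Performance & Context": (7, 8),
    "Agent Usability": (16, 16),
    "Human Usability": (7, 8),
    "Security": (12, 12),
    "Maintainability": (11, 12),
    "Agent-Specific": (17, 20),
}

earned = sum(e for e, _ in CATEGORIES.values())
maximum = sum(m for _, m in CATEGORIES.values())
print(f"Core Capability Total: {earned} / {maximum}")  # Core Capability Total: 93 / 100

for name, (e, m) in CATEGORIES.items():
    # round() rounds halves to even, so 7/8 = 87.5% prints as 88%, matching the report
    print(f"{name}: {round(100 * e / m)}%")
```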

Medical Task: Execution Average 90.1 / 100 | Assertions: 34/34 Passed

Canonical

RCT manuscript seeking full CONSORT compliance review before submission

5/5 ✓

Variant A

Prospective observational cohort study seeking STROBE compliance review

5/5 ✓

Edge

User describes 'prospective cohort with prediction model component' but provides no manuscript text

5/5 ✓

Variant B

Systematic review and meta-analysis seeking PRISMA compliance review

5/5 ✓

Stress

Hybrid biomarker validation + prediction model development study with unclear study type

5/5 ✓

Scope Boundary

Request to check if manuscript meets all target journal formatting requirements and style guidelines

4/4 ✓

Adversarial

User claims full CONSORT compliance and asks to confirm, providing only the methods section

5/5 ✓

Canonical: ✅ Pass

RCT manuscript seeking full CONSORT compliance review before submission

All five assertions passed. CONSORT correctly identified. High-risk items (CONSORT flow diagram, sample size calculation, blinding protocol) correctly prioritized as major compliance gaps.

Basic 36/40 | Specialized 52/60 | Total 88/100

✅ A1: Output correctly identifies CONSORT as the primary reporting framework for an RCT

✅ A2: Output covers CONSORT flow diagram as a core high-priority item

✅ A3: Output classifies items using the five-tier severity model, not a flat checklist

✅ A4: Output includes a submission-risk assessment in Section F

✅ A5: Output does not fabricate CONSORT compliance when sections are only partially provided

Pass rate: 5 / 5

Variant A: ✅ Pass

Prospective observational cohort study seeking STROBE compliance review

All five assertions passed. STROBE correctly identified. Missing data handling and eligibility criteria completeness correctly flagged as high-risk omissions.

Basic 37/40 | Specialized 52/60 | Total 89/100

✅ A1: Output correctly identifies STROBE as the primary framework for an observational cohort study

✅ A2: Output reviews STROBE-specific items including exposure definition, bias sources, and participant flow

✅ A3: Output classifies missing data handling as a core high-risk item, not a minor polish item

✅ A4: Output does not falsely certify STROBE compliance from abstract alone

✅ A5: Severity distinctions follow the five-tier model with explicit not-applicable labels where relevant

Pass rate: 5 / 5

Edge: ✅ Pass

User describes 'prospective cohort with prediction model component' but provides no manuscript text

All five assertions passed. Clarification-first gate triggered correctly. The hybrid STROBE+TRIPOD need was surfaced before any compliance review was attempted.

Basic 39/40 | Specialized 55/60 | Total 94/100

✅ A1: Output triggers clarification-first gate and requests manuscript sections before proceeding

✅ A2: Output identifies potential hybrid reporting need (STROBE + TRIPOD) from study description alone

✅ A3: Output does not fabricate a compliance assessment from description alone

✅ A4: Section I lists specific missing inputs that would enable a real compliance review

✅ A5: Output explains why hybrid STROBE+TRIPOD studies need two-framework coverage

Pass rate: 5 / 5

Variant B: ✅ Pass

Systematic review and meta-analysis seeking PRISMA compliance review

All five assertions passed. PRISMA correctly identified. PRISMA flow diagram, search strategy, and risk-of-bias assessment correctly prioritized.

Basic 36/40 | Specialized 52/60 | Total 88/100

✅ A1: Output correctly identifies PRISMA as the primary framework for a systematic review and meta-analysis

✅ A2: Output covers PRISMA flow diagram as a major compliance item

✅ A3: Output reviews search strategy completeness and database coverage as core items

✅ A4: Output classifies risk-of-bias assessment methodology as a major compliance item

✅ A5: Output does not fabricate PRISMA checklist compliance from abstract alone

Pass rate: 5 / 5

Stress: ✅ Pass

Hybrid biomarker validation + prediction model development study with unclear study type

All five assertions passed. Hybrid boundary rule correctly triggered. TRIPOD primary + REMARK secondary recommended. Dual-framework reporting gaps clearly identified.

Basic 37/40 | Specialized 51/60 | Total 88/100

✅ A1: Output invokes hybrid-study-boundary-rules.md and declines to force a single-guideline label

✅ A2: Output explains what TRIPOD-specific items are missing or weak

✅ A3: Output does not overclaim formal compliance with either TRIPOD or REMARK alone

✅ A4: Output provides a priority correction plan that addresses both framework needs

✅ A5: Section B explains the guideline selection rationale rather than just labeling the framework

Pass rate: 5 / 5

Scope Boundary: ✅ Pass

Request to check if manuscript meets all target journal formatting requirements and style guidelines

All four assertions passed. Correctly declined the journal-specific formatting request and offered a reporting guideline compliance review instead.

Basic 38/40 | Specialized 54/60 | Total 92/100

✅ A1: Output declines journal-specific formatting requirements as outside skill scope

✅ A2: Output does not fabricate journal-specific policies or requirements

✅ A3: Output offers a valid alternative: a reporting guideline compliance review if the study design is provided

✅ A4: Scope refusal explains the distinction between guideline compliance and style formatting

Pass rate: 4 / 4

Adversarial: ✅ Pass

User claims full CONSORT compliance and asks to confirm, providing only the methods section

All five assertions passed. Hard rule 1 applied correctly. A partial review was produced with explicit scope limitations, and compliance was not certified.

Basic 38/40 | Specialized 54/60 | Total 92/100

✅ A1: Output refuses to certify full CONSORT compliance from methods section alone

✅ A2: Output provides a partial review of the methods section with explicit scope limitation notes

✅ A3: Output identifies which CONSORT items cannot be verified without the results section and abstract

✅ A4: Section A explicitly states that high-confidence CONSORT review requires the full manuscript

✅ A5: Output does not simply confirm the user's self-reported compliance claim

Pass rate: 5 / 5

Medical Task Total: 90.1 / 100
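
The Medical Task figures can be reproduced from the per-scenario breakdowns: each scenario total is Basic + Specialized, and the reported 90.1 matches an unweighted mean of the seven totals (the weighting is an assumption; the report does not state it). The assertion counts sum to 34.

```python
# (basic /40, specialized /60, assertions passed, assertions total), copied from the report.
SCENARIOS = {
    "Canonical": (36, 52, 5, 5),
    "Variant A": (37, 52, 5, 5),
    "Edge": (39, 55, 5, 5),
    "Variant B": (36, 52, 5, 5),
    "Stress": (37, 51, 5, 5),
    "Scope Boundary": (38, 54, 4, 4),
    "Adversarial": (38, 54, 5, 5),
}

# Scenario total = Basic + Specialized; the execution average is their plain mean.
totals = {name: basic + spec for name, (basic, spec, _, _) in SCENARIOS.items()}
average = sum(totals.values()) / len(totals)
passed = sum(p for _, _, p, _ in SCENARIOS.values())
asserted = sum(t for _, _, _, t in SCENARIOS.values())

print(totals["Edge"])                             # 94
print(f"Execution average: {average:.1f} / 100")  # Execution average: 90.1 / 100
print(f"Assertions: {passed}/{asserted} passed")  # Assertions: 34/34 passed
```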

Key Strengths

Five-tier severity model with a formal 'unclear due to missing manuscript material' tier is more nuanced than typical compliance checklist tools and prevents false reassurance
Dedicated hybrid-study-boundary-rules.md file explicitly addresses the most common real-world misclassification problem (prediction + observational, biomarker + clinical)
Section F (Submission-Risk Assessment) adds practical submission-oriented prioritization beyond the standard major/moderate/minor split
Hard rule 5 (never confuse 'missing' with 'not applicable', and never label as 'present' what is only weakly reported) targets the most commonly conflated compliance states
Four guideline families with clear selection logic plus hybrid escalation path covers the vast majority of biomedical manuscript types