Academic Writing

limitation-and-risk-writer

Acknowledges limitations in sample, design, measurement, and validation in a professional way that improves credibility without undermining the whole paper.

90 / 100 Total Score

Core Capability

92 / 100

Functional Suitability

12 / 12

Reliability

11 / 12

Performance & Context

8 / 8

Agent Usability

16 / 16

Human Usability

7 / 8

Security

12 / 12

Maintainability

8 / 12

Agent-Specific

18 / 20

Medical Task

23 / 25 Passed

90 / 100: Manuscript limitations paragraph for retrospective cohort study on statin use and CRC risk: (1) retrospective design with unmeasured confounders, (2) single-center tertiary data, (3) ICD-code-based statin exposure

5/5

90 / 100: NIH R01 grant risk section: (1) enrollment risk (rare disease, 2 sites), (2) biomarker assay not yet CLIA-certified, (3) mouse model fidelity to human phenotype

5/5

84 / 100: Vague input: 'Help me write my limitations section. My sample is too small.'

4/5

90 / 100: Reviewer rebuttal: Reviewer 2 states 'The sample size of n=47 is too small to draw meaningful conclusions'

5/5

89 / 100: Rewrite a self-defeating limitations section that catastrophizes all findings and ends with 'these limitations make it hard to trust our conclusions'

4/5

Veto Gates: Required pass for any deployment consideration

Skill Veto: ✓ All 4 gates passed

Gate	Result	Detail
Operational Stability	PASS	System remains stable across varied inputs and edge cases
Structural Consistency	PASS	Output structure conforms to expected skill contract format
Result Determinism	PASS	Equivalent inputs produce semantically equivalent outputs
System Security	PASS	No prompt injection, data leakage, or unsafe tool use detected

Research Veto: ✅ PASS (Applicable)

Dimension	Result	Detail
Scientific Integrity	PASS	No fabricated references, PMIDs, DOIs, or statistical data. Hard rules explicitly prohibit fabricating limitations and making validity judgments without user-provided grounds.
Practice Boundaries	PASS	No diagnostic or prescriptive clinical conclusions. Input validation section explicitly excludes clinical advice about whether study weaknesses invalidate conclusions for patient care.
Methodological Ground	PASS	No methodological fallacies. The acknowledge+impact+mitigation formula correctly separates limitation acknowledgment from invalidation claims.
Code Usability	N/A	No executable code generated; Mode A direct-execution skill.

Core Capability: 92 / 100 (8 Categories)

Functional Suitability

Full marks. 5-step workflow covers all three context types (manuscript, grant, rebuttal). 6-category limitation taxonomy is comprehensive. Phrase bank with worked examples prevents common writing errors. Calibration checklist prevents premature delivery.

12 / 12

100%

Reliability

The input validation section provides a clear scope refusal template. Step 1 asks follow-up questions when input is vague (e.g., 'small sample' triggers specificity questions). Minor deduction: no guidance for handling disagreement when the user disputes a limitation classification.

11 / 12

92%

Performance & Context

Full marks. Formula-driven output is concise and predictable. Phrase bank prevents redundant content generation. Step 1 input collection upfront minimizes rework.

8 / 8

100%

Agent Usability

Full marks. Trigger phrases cover all three workflow types. Category taxonomy is clearly labeled. Avoidance phrases prevent the two most common errors (dismissive and catastrophizing). Calibration checklist enforces consistent tone.

16 / 16

100%

Human Usability

Five explicit 'When to Use' scenarios make entry points clear. Input validation refusal template is graceful. Minor deduction: no re-framing option described if the user disagrees with how a limitation was categorized or worded.

7 / 8

88%

Security

Full marks. Hard rules explicitly prohibit fabricating limitations, suggesting invalidity without grounds, and inflating study strengths within limitation statements.

12 / 12

100%

Maintainability

All substantive content is inline in SKILL.md, with no separation into modular reference files. The only reference file (audit-reference.md) is stale: it references 'python scripts/main.py', which does not exist, misidentifies the skill as 'study-limitations-drafter', and contains no substantive limitation-writing content. This risks a false Mode B indication and should be corrected.

8 / 12

67%
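
A stale script reference like the one described above could be caught with a small check. The following is only a sketch under assumptions: the function name, the regular expression, and the idea of resolving paths against the skill's root directory are illustrative and are not part of the skill or of the audit tooling.

```python
import re
from pathlib import Path

# Assumed pattern for commands such as "python scripts/main.py".
SCRIPT_COMMAND = re.compile(r"python3?\s+([\w./-]+\.py)")


def missing_scripts(reference_text: str, skill_root: Path) -> list[str]:
    """Return script paths mentioned in the text that do not exist under skill_root."""
    referenced = SCRIPT_COMMAND.findall(reference_text)
    return [path for path in referenced if not (skill_root / path).is_file()]


if __name__ == "__main__":
    # Hypothetical usage: the skill root is assumed to be the current directory.
    text = "Run 'python scripts/main.py' to start the audit."
    print(missing_scripts(text, Path(".")))
```

Running such a check over each reference file before release would flag the 'python scripts/main.py' mention if that script is absent.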

Agent-Specific

Trigger precision is excellent with three distinct context types. Formula ensures idempotent output. Escape hatches (input validation + scope refusal template) are well-designed. Minor deduction on composability: no explicit integration with discussion-composer or revision-strategy-planner, although both are natural workflow neighbors.

18 / 20

90%

Core Capability Total: 92 / 100
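
The total and the per-category percentages can be re-derived from the eight category scores listed above. This is a minimal sketch, assuming the percentages are rounded half up (the report does not state its rounding rule); all names are illustrative.

```python
# Category scores as (earned, maximum), copied from the report.
CATEGORY_SCORES = {
    "Functional Suitability": (12, 12),
    "Reliability": (11, 12),
    "Performance & Context": (8, 8),
    "Agent Usability": (16, 16),
    "Human Usability": (7, 8),
    "Security": (12, 12),
    "Maintainability": (8, 12),
    "Agent-Specific": (18, 20),
}


def half_up_percent(earned: int, maximum: int) -> int:
    """Integer percentage, rounded half up (assumed rule; 7/8 gives 88)."""
    return (200 * earned + maximum) // (2 * maximum)


def summarize(scores):
    """Return (earned total, maximum total, per-category percentages)."""
    earned = sum(e for e, _ in scores.values())
    maximum = sum(m for _, m in scores.values())
    percents = {name: half_up_percent(e, m) for name, (e, m) in scores.items()}
    return earned, maximum, percents


if __name__ == "__main__":
    earned, maximum, percents = summarize(CATEGORY_SCORES)
    print(f"Core Capability Total: {earned} / {maximum}")
    for name, pct in percents.items():
        print(f"{name}: {pct}%")
```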

Medical Task: Execution Average 88.6 / 100, Assertions 23/25 Passed

Canonical

Manuscript limitations paragraph for retrospective cohort study on statin use and CRC risk: (1) retrospective design with unmeasured confounders, (2) single-center tertiary data, (3) ICD-code-based statin exposure

5/5 ✓

Variant A

NIH R01 grant risk section: (1) enrollment risk (rare disease, 2 sites), (2) biomarker assay not yet CLIA-certified, (3) mouse model fidelity to human phenotype

5/5 ✓

Edge

Vague input: 'Help me write my limitations section. My sample is too small.'

4/5 ✓

Variant B

Reviewer rebuttal: Reviewer 2 states 'The sample size of n=47 is too small to draw meaningful conclusions'

5/5 ✓

Stress

Rewrite a self-defeating limitations section that catastrophizes all findings and ends with 'these limitations make it hard to trust our conclusions'

4/5 ✓

Canonical: ✅ Pass

Step 2 correctly classifies all three limitations (Design, Sample, Measurement). Formula applied to each with acknowledge+impact+mitigation structure. Ordered by impact severity. Forward-looking future direction statement closes the paragraph.

Basic 37/40 | Specialized 53/60 | Total 90/100

✅ A1: Each limitation classified into correct category (Design / Sample / Measurement)

✅ A2: Each limitation statement follows acknowledge+impact+mitigation formula

✅ A3: No fabricated limitations added beyond the three user-specified ones

✅ A4: Limitations ordered from highest to lowest impact per Step 4 guidance (design first, then sample, then measurement)

✅ A5: Paragraph closes with a forward-looking statement about future study design needs

Pass rate: 5 / 5
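
The acknowledge+impact+mitigation formula and the ordering checked in A4 can be pictured as a small data structure. The sketch below is illustrative only: the skill generates no executable code, so the class, the function, and the category ranking (taken from the A4 wording) are assumptions, and the bracketed strings are placeholders for text the authors would supply.

```python
from dataclasses import dataclass

# Assumed ordering, following A4: design first, then sample, then measurement.
CATEGORY_ORDER = {"Design": 0, "Sample": 1, "Measurement": 2}


@dataclass
class Limitation:
    category: str
    acknowledge: str  # what the limitation is
    impact: str       # how it affects interpretation
    mitigation: str   # what was done, or could be done, about it

    def render(self) -> str:
        return f"{self.acknowledge} {self.impact} {self.mitigation}"


def build_paragraph(limitations, closing: str) -> str:
    """Order limitations by category rank and append a forward-looking closing."""
    ordered = sorted(limitations, key=lambda lim: CATEGORY_ORDER[lim.category])
    return " ".join([lim.render() for lim in ordered] + [closing])


if __name__ == "__main__":
    items = [
        Limitation("Measurement", "Statin exposure was based on ICD codes.",
                   "[impact, supplied by the authors]", "[mitigation, supplied by the authors]"),
        Limitation("Design", "The retrospective design leaves unmeasured confounders.",
                   "[impact, supplied by the authors]", "[mitigation, supplied by the authors]"),
        Limitation("Sample", "Data came from a single tertiary center.",
                   "[impact, supplied by the authors]", "[mitigation, supplied by the authors]"),
    ]
    print(build_paragraph(items, "[forward-looking statement on future study design]"))
```

Keeping the three parts as separate fields makes it easy to see when a statement stops at acknowledgment and never reaches impact or mitigation.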

Variant A: ✅ Pass

NIH R01 grant risk section: (1) enrollment risk (rare disease, 2 sites), (2) biomarker assay not yet CLIA-certified, (3) mouse model fidelity to human phenotype

Grant tone correctly applied: limitations framed as 'challenges' with mitigation plans. Each risk paired with a contingency measure (additional site recruitment, CLIA timeline, complementary in vitro validation). No catastrophizing language.

Basic 37/40 | Specialized 53/60 | Total 90/100

✅ A1: Grant tone applied: limitations framed as 'challenges' with mitigation plans, not as deficiencies

✅ A2: Each challenge paired with a concrete mitigation measure

✅ A3: No language suggests the grant is non-competitive due to these risks

✅ A4: Mouse model limitation correctly framed as a known boundary with a validation strategy rather than a fatal flaw

✅ A5: No limitations fabricated beyond the three user-specified risks

Pass rate: 5 / 5

Edge: ✅ Pass

Vague input: 'Help me write my limitations section. My sample is too small.'

Step 1 correctly triggers follow-up questions: What was the sample size? What minimum would have been adequate? What study type? No limitation statement is produced from vague input alone. The input validation scope refusal template is not needed (the input is in scope but vague, not out of scope).

Basic 35/40 | Specialized 49/60 | Total 84/100

✅ A1: Skill correctly declines to produce limitation statement from vague 'small sample' input

✅ A2: Follow-up questions ask for sample size, minimum adequate size, and study type

✅ A3: No fabricated limitation statement generated from the vague input

✅ A4: Response explains why specifics are needed before writing the limitation

❌ A5: Response offers a provisional limitation formula template the user could fill in if they prefer a faster path

Pass rate: 4 / 5
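
The follow-up behavior tested in this case amounts to asking only for the details that are still missing. A hedged sketch follows: the field names and function are hypothetical, and the question wording is taken from the case description above rather than from SKILL.md itself.

```python
# Details the case description says are needed before writing a sample-size limitation.
REQUIRED_DETAILS = {
    "sample_size": "What was the sample size?",
    "minimum_adequate_size": "What minimum would have been adequate?",
    "study_type": "What study type?",
}


def follow_up_questions(details: dict) -> list[str]:
    """Return one question per required detail that is absent or empty."""
    return [
        question
        for key, question in REQUIRED_DETAILS.items()
        if details.get(key) in (None, "")
    ]


if __name__ == "__main__":
    # The vague input "My sample is too small" supplies none of the details.
    for question in follow_up_questions({}):
        print(question)
```

An empty result would signal that enough is known to draft the statement; anything else means asking before writing.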

Variant B: ✅ Pass

Reviewer rebuttal: Reviewer 2 states 'The sample size of n=47 is too small to draw meaningful conclusions'

Rebuttal tone correctly applied. Three-part structure: acknowledgment of sample constraint + contextualization of why core finding remains valid + specific analytical or textual revision offered. No catastrophizing. No dismissal of the reviewer's concern.

Basic 37/40 | Specialized 53/60 | Total 90/100

✅ A1: Rebuttal structure applied: acknowledgment + contextualization + revision offered

✅ A2: n=47 constraint is acknowledged without dismissive phrasing (e.g., 'this limitation is common to all studies')

✅ A3: Reviewer's concern contextualized without invalidating the core finding

✅ A4: Concrete revision offered (e.g., text addition acknowledging power limitation, or additional sensitivity analysis)

✅ A5: Rebuttal does not use limitation response as opportunity to overclaim study strengths

Pass rate: 5 / 5

Stress: ✅ Pass

Rewrite a self-defeating limitations section that catastrophizes all findings and ends with 'these limitations make it hard to trust our conclusions'

All catastrophizing language identified and replaced using the formula. Each limitation rewritten with impact+mitigation structure. The 'hard to trust our conclusions' closing statement removed and replaced with a forward-looking direction. Minor issue: checklist step 5 (tone consistency) may not fully address residual hedging language in rewritten output.

Basic 37/40 | Specialized 52/60 | Total 89/100

✅ A1: All catastrophizing phrases identified and flagged before rewriting

✅ A2: Rewritten limitations follow the acknowledge+impact+mitigation formula

✅ A3: The 'hard to trust our conclusions' closing removed and replaced with a forward-looking statement

✅ A4: Rewritten output does not introduce new limitations that were not in the original text

❌ A5: Tone consistency across all rewritten limitation statements verified by calibration checklist

Pass rate: 4 / 5

Medical Task Total: 88.6 / 100
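
The execution average and the assertion count follow directly from the five case results. The sketch below assumes the average is the unweighted mean of the case totals, which matches the reported figure; names are illustrative.

```python
from fractions import Fraction

# Per-case results copied from the report:
# (basic, specialized, total, assertions passed, assertions run).
CASES = {
    "Canonical": (37, 53, 90, 5, 5),
    "Variant A": (37, 53, 90, 5, 5),
    "Edge": (35, 49, 84, 4, 5),
    "Variant B": (37, 53, 90, 5, 5),
    "Stress": (37, 52, 89, 4, 5),
}


def summarize_cases(cases):
    """Check subscores against totals, then return (mean total, passed, run)."""
    for name, (basic, specialized, total, _, _) in cases.items():
        if basic + specialized != total:
            raise ValueError(f"{name}: subscores do not add up to the total")
    average = Fraction(sum(c[2] for c in cases.values()), len(cases))
    passed = sum(c[3] for c in cases.values())
    run = sum(c[4] for c in cases.values())
    return average, passed, run


if __name__ == "__main__":
    average, passed, run = summarize_cases(CASES)
    print(f"Execution average: {float(average):.1f} / 100")  # 88.6 / 100
    print(f"Assertions: {passed}/{run} passed")              # 23/25 passed
```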

Key Strengths

Acknowledge+impact+mitigation formula prevents dead-end limitation statements and ensures every acknowledged weakness has a professional framing
Three-context routing (manuscript / grant / reviewer rebuttal) with distinct tone calibration rules is a practical differentiator covering the full academic workflow
Avoidance phrase list for both dismissive and catastrophizing language is a precise quality-control mechanism
Input validation scope refusal template correctly excludes clinical validity judgments while remaining helpful for writing tasks
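
The avoidance phrase list mentioned above could be applied as a simple screening pass before delivery. This is a sketch under stated assumptions: the full phrase list in SKILL.md is not reproduced in this report, so only the two phrases quoted in the test cases are included, and the function name is hypothetical.

```python
import re

# Only the two phrases quoted in this report; the skill's real list is assumed to be longer.
AVOID_PHRASES = {
    "dismissive": ["this limitation is common to all studies"],
    "catastrophizing": ["these limitations make it hard to trust our conclusions"],
}


def flag_phrases(text: str) -> list[tuple[str, str]]:
    """Return (tone, phrase) pairs for every avoidance phrase found in the text."""
    # Lowercase and collapse whitespace so line breaks do not hide a phrase.
    normalized = re.sub(r"\s+", " ", text.lower())
    hits = []
    for tone, phrases in AVOID_PHRASES.items():
        for phrase in phrases:
            if phrase in normalized:
                hits.append((tone, phrase))
    return hits


if __name__ == "__main__":
    draft = "Overall, these limitations make it hard to trust our conclusions."
    for tone, phrase in flag_phrases(draft):
        print(f"{tone}: '{phrase}'")
```

Exact substring matching is deliberately crude; it catches verbatim phrases only, so a human or model pass on tone is still needed for paraphrased variants.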