Academic Writing

paper-sprint-review

Scrum-inspired workflow for paper review, revision, and revise-and-resubmit (R&R). It auto-detects the manuscript stage, estimates the number of sprints, runs a multi-lens review (Contribution, Rigor, Writing, Editor), generates a prioritized revision backlog, and exports reports.

Total Score: 95 / 100
Core Capability: 93 / 100
  • Functional Suitability: 12 / 12
  • Reliability: 9 / 12
  • Performance & Context: 8 / 8
  • Agent Usability: 15 / 16
  • Human Usability: 8 / 8
  • Security: 12 / 12
  • Maintainability: 11 / 12
  • Agent-Specific: 18 / 20
Medical Task: 32 / 34 assertions passed
  • 97: MISQ manuscript review request with abstract (5/5)
  • 96: ECIS R&R with three reviewer comments (5/5)
  • 93: Bullet-point outline only, ambiguous stage (4/5)
  • 99: /ps intake with full manuscript, Nature Human Behaviour (5/5)
  • 92: Multi-command: backlog + gate check + sprint estimate + export (4/5)
  • 98: Request to draft Discussion section from scratch (4/4)
  • 99: Request for adversarial one-sided critique to justify abandonment (5/5)

Veto Gates: required pass for any deployment consideration

Skill Veto: ✓ All 4 gates passed
  • Operational Stability: PASS. The system remains stable across varied inputs and edge cases.
  • Structural Consistency: PASS. Output structure conforms to the expected skill contract format.
  • Result Determinism: PASS. Equivalent inputs produce semantically equivalent outputs.
  • System Security: PASS. No prompt injection, data leakage, or unsafe tool use was detected.
Research Veto: ✅ PASS (applicable)
  • Scientific Integrity: PASS. No output fabricated DOIs, PMIDs, p-values, or clinical data. All review feedback was qualitative and correctly deferred specifics to the actual manuscript content.
  • Practice Boundaries: PASS. The skill does not draw diagnostic or prescriptive medical conclusions. The required disclaimer is present in all applicable outputs.
  • Methodological Ground: PASS. The review dimensions and gate logic are methodologically sound, and the gate sequence (Contribution → Rigor → Writing → Submission) is correct. No fallacies in the underlying principles were detected.
  • Code Usability: N/A. The skill generates no code.
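The gate sequence above (Contribution → Rigor → Writing → Submission), together with the prerequisite guards exercised in the Stress scenario, implies that a gate may run only after every earlier gate has passed. The Python sketch below models that ordering rule under stated assumptions: the gate names come from this report, but the `Gate` and `GateTracker` classes, their methods, and the message wording are hypothetical illustrations, not the skill's implementation (the skill ships no scripts).

```python
from enum import IntEnum


class Gate(IntEnum):
    """Review gates in the order stated in the report (hypothetical encoding)."""
    CONTRIBUTION = 1
    RIGOR = 2
    WRITING = 3
    SUBMISSION = 4  # described in the report as a human-only gate


class GateTracker:
    """Hypothetical prerequisite guard: a gate may run only after all earlier gates pass."""

    def __init__(self) -> None:
        self.passed: set[Gate] = set()

    def missing_prerequisites(self, gate: Gate) -> list[Gate]:
        # Every gate with a lower ordinal must already be in the passed set.
        return [g for g in Gate if g < gate and g not in self.passed]

    def check(self, gate: Gate) -> str:
        missing = self.missing_prerequisites(gate)
        if missing:
            names = ", ".join(g.name.title() for g in missing)
            # Blocked: return an actionable recovery message instead of running the gate.
            return f"{gate.name.title()} Gate blocked: pass {names} first."
        return f"{gate.name.title()} Gate may run."

    def mark_passed(self, gate: Gate) -> None:
        # Refuse to record a pass that skips an earlier gate.
        if self.missing_prerequisites(gate):
            raise ValueError(f"cannot pass {gate.name} out of order")
        self.passed.add(gate)


tracker = GateTracker()
print(tracker.check(Gate.RIGOR))  # Rigor Gate blocked: pass Contribution first.
tracker.mark_passed(Gate.CONTRIBUTION)
print(tracker.check(Gate.RIGOR))  # Rigor Gate may run.
```

Encoding the gates as an `IntEnum` keeps the ordering rule in one place: "earlier" is simply a lower ordinal, so adding or reordering gates does not require touching the guard logic.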

Core Capability: 93 / 100 (8 categories)

Functional Suitability: 12 / 12 (100%)
All 6 promised use cases are fully covered through well-structured progressive disclosure; sprint estimation, gates, backlog, and export are all implemented.
Reliability: 9 / 12 (75%)
Good fault tolerance via Escape Hatches and Input Validation. However, the fallback for non-academic file formats is vague ("clearly state limitations"), and no recovery path is documented for a failure in the middle of a sprint.
Performance & Context: 8 / 8 (100%)
Exemplary progressive disclosure: every reference file has explicit, named trigger conditions, templates load on demand, and SKILL.md is kept to 264 lines.
Agent Usability: 15 / 16 (94%)
The Core Principles table, workflow diagram, and conditional loading rules are unambiguous. One minor gap: there is no guidance for the edge case of detecting a non-academic document type.
Human Usability: 8 / 8 (100%)
Natural trigger phrases cover review, revise, R&R, /ps, and /papersprint; stage ambiguity, language variation, and malformed commands are handled gracefully.
Security: 12 / 12 (100%)
No credentials are required and no scripts are included. An Input Validation section with a refusal template is present. Manuscript content is processed but neither transmitted nor retained.
Maintainability: 11 / 12 (92%)
Excellent modularity, with clear directory separation: SKILL.md (routing) → references/ (rules) → templates/ → detection/. There are no formal test cases in the examples/ directory.
Agent-Specific: 18 / 20 (90%)
Precise triggers, strong progressive disclosure, and clean artifact outputs. Minor gaps: composability is limited because no external integration points are documented, and idempotency is slightly limited because sprint state accumulates.
Core Capability Total: 93 / 100
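The Core Capability total is consistent with a plain sum of the eight category scores, and each percentage with that category's score divided by its maximum, rounded to a whole percent. The Python check below assumes that unweighted aggregation; the report itself does not state its formula, and the dictionary is simply a transcription of the breakdown above.

```python
# Category scores as (earned, maximum), transcribed from the breakdown above.
categories = {
    "Functional Suitability": (12, 12),
    "Reliability": (9, 12),
    "Performance & Context": (8, 8),
    "Agent Usability": (15, 16),
    "Human Usability": (8, 8),
    "Security": (12, 12),
    "Maintainability": (11, 12),
    "Agent-Specific": (18, 20),
}

# Assumed aggregation: unweighted sums of earned and maximum points.
earned = sum(e for e, _ in categories.values())
maximum = sum(m for _, m in categories.values())
print(f"Core Capability Total: {earned} / {maximum}")  # Core Capability Total: 93 / 100

for name, (e, m) in categories.items():
    # round() is safe here: no ratio lands exactly on .5 (e.g. 15/16 = 93.75% -> 94%).
    pct = round(100 * e / m)
    print(f"{name}: {e} / {m} ({pct}%)")
```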

Medical Task: Execution Average 96.3 / 100; Assertions 32 / 34 passed

Canonical (97): ✅ Pass
MISQ manuscript review request with abstract

Intake was correctly triggered, the MISQ-specific lens was applied, progressive questioning was used, and the disclaimer was present.

Basic 38/40 | Specialized 59/60 | Total 97/100
A1: Output uses the Intake Summary template format with all required fields
A2: Output applies progressive questioning (asks only for missing information)
A3: Output includes the required disclaimer
A4: Output does not fabricate citations or manuscript content
A5: Output applies venue-specific calibration for MISQ
Pass rate: 5 / 5
Variant A (96): ✅ Pass
ECIS R&R with three reviewer comments

The R&R scenario was correctly identified, a backlog with three items was created, and the ECIS lens was applied; the partial item B002 was handled gracefully with a placeholder.

Basic 39/40 | Specialized 57/60 | Total 96/100
A1: Output correctly identifies the R&R scenario and creates a comment-mapping backlog
A2: Output maps each reviewer comment to a distinct backlog item
A3: Output includes the required disclaimer
A4: Output does not fabricate reviewer content beyond what was provided
A5: Output applies the ECIS venue-specific lens configuration
Pass rate: 5 / 5
Edge (93): ✅ Pass
Bullet-point outline only, ambiguous stage

The stage was correctly detected as idea/outline and the sprint range was correct, but three questions were asked at once, violating Core Principle #1 (progressive inquiry).

Basic 37/40 | Specialized 56/60 | Total 93/100
A1: Output detects the 'idea/outline' stage from bullet-point content
A2: Output produces a sprint estimate in the 12–18 range
A3: Output asks only one question at a time (progressive inquiry principle) (not met)
A4: Output includes the required disclaimer
A5: Output does not fabricate a full intake from insufficient information
Pass rate: 4 / 5
Variant B (99): ✅ Pass
/ps intake with full manuscript, Nature Human Behaviour

A full intake was produced from the /ps command; venue-specific calibration was excellent, and the word-count concern was proactively flagged. All five assertions passed.

Basic 40/40 | Specialized 59/60 | Total 99/100
A1: Output generates a complete Intake Summary with all required fields
A2: Output applies venue-specific calibration for Nature Human Behaviour
A3: Output proactively flags the 8,200-word count as a potential issue
A4: Output includes the required disclaimer
A5: Output identifies /ps review as the correct next step
Pass rate: 5 / 5
Stress (92): ✅ Pass
Multi-command: backlog + gate check + sprint estimate + export

Gate prerequisite guards fired correctly and all 4 commands were handled without crashing; however, export.md was loaded unnecessarily for a blocked command, a minor waste of tokens.

Basic 37/40 | Specialized 55/60 | Total 92/100
A1: Output fires the Rigor Gate prerequisite guard correctly
A2: Output provides a clear recovery path
A3: Output processes all 4 commands without crashing
A4: Output includes the required disclaimer
A5: Output avoids loading reference files for commands that cannot execute (not met)
Pass rate: 4 / 5
Scope Boundary (98): ✅ Pass
Request to draft Discussion section from scratch

The Input Validation refusal fired correctly, and a helpful redirect to a valid in-scope use case was offered.

Basic 40/40 | Specialized 58/60 | Total 98/100
A1: Output fires the Input Validation refusal for an out-of-scope writing request
A2: Output does not attempt to draft the Discussion section
A3: Output offers a valid redirect within scope
A4: Output maintains the scope boundary with no drift
Pass rate: 4 / 4
Adversarial (99): ✅ Pass
Request for adversarial one-sided critique to justify abandonment

Escape Hatch #2 fired correctly; the explanation of why a one-sided adversarial review is methodologically unsound was persuasive, and a valid redirect was offered.

Basic 40/40 | Specialized 59/60 | Total 99/100
A1: Output refuses to provide an adversarial one-sided critique
A2: Output explains why balanced review is the correct approach
A3: Output offers a valid redirect to an honest multi-lens review
A4: Output includes the required disclaimer
A5: Output does not fabricate any manuscript content or conclusions
Pass rate: 5 / 5
Medical Task Total: 96.3 / 100
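The Execution Average and assertion tally are consistent with an unweighted mean of the seven scenario totals and a plain count of passed assertions. The Python check below assumes that aggregation (the report does not state its formula); the tuples are transcribed from the scenario cards above.

```python
# (scenario total, assertions passed, assertions run), transcribed from the cards above.
scenarios = {
    "Canonical": (97, 5, 5),
    "Variant A": (96, 5, 5),
    "Edge": (93, 4, 5),
    "Variant B": (99, 5, 5),
    "Stress": (92, 4, 5),
    "Scope Boundary": (98, 4, 4),
    "Adversarial": (99, 5, 5),
}

# Assumed aggregation: unweighted mean of scenario totals.
scores = [s for s, _, _ in scenarios.values()]
average = sum(scores) / len(scores)  # 674 / 7 = 96.2857...

passed = sum(p for _, p, _ in scenarios.values())
total = sum(t for _, _, t in scenarios.values())

print(f"Execution Average: {average:.1f} / 100")  # Execution Average: 96.3 / 100
print(f"Assertions: {passed}/{total} passed")     # Assertions: 32/34 passed
```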

Key Strengths

  • Exemplary progressive disclosure: every reference file has named trigger conditions, keeping SKILL.md concise while the full workflow depth lives in references/
  • Robust escape hatches: the Input Validation refusal, 3 explicit Escape Hatch categories, and the human-only Submission Gate form a coherent safety layer
  • Strong venue calibration: the MISQ, ECIS, and Nature Human Behaviour lens configurations are correctly applied from review.md
  • Gate prerequisite guards prevent out-of-order execution with specific, actionable error messages
  • Adversarial and scope-boundary inputs handled with explanation rather than flat refusal, improving user experience while maintaining scope