Academic Writing

lay-summary-for-cross-disciplinary-teams

Rewrites technical research content into a structured lay summary that cross-disciplinary teams can quickly understand and act on.

90 / 100 Total Score
Core Capability
92 / 100
Functional Suitability
12 / 12
Reliability
11 / 12
Performance & Context
8 / 8
Agent Usability
14 / 16
Human Usability
8 / 8
Security
10 / 12
Maintainability
11 / 12
Agent-Specific
18 / 20
Medical Task
23 / 25 Passed
90 | Mixed-audience lay summary for scRNA-seq study on CD8+ T cell exhaustion in early NSCLC (42 cancer patients, 20 controls; Tex population significantly expanded, PD-1 upregulated) | 5/5
90 | Bioinformatics-only audience for GWAS study on 12 novel T2D loci from UKBB (450,000 participants, EUR ancestry; lead SNP OR 1.23, 95% CI 1.18-1.29) | 5/5
85 | Vague input: 'Can you write a lay summary of my research on cancer?' (no study design, findings, or audience provided) | 5/5
90 | Management/leadership audience for phase 2 RCT in locally advanced pancreatic cancer (ZZZ drug; OS improved 7.2 to 11.4 months, HR 0.68, p=0.031; SAE rate comparable) | 4/5
86 | Partially defined multi-cohort proteomics study on Alzheimer's biomarkers (plasma GFAP and p-tau 217 in MCI converters, N=320); study goal not formally defined, audience unspecified | 4/5

Veto Gates: Required pass for any deployment consideration

Skill Veto: ✓ All 4 gates passed
Operational Stability
System remains stable across varied inputs and edge cases
PASS
Structural Consistency
Output structure conforms to expected skill contract format
PASS
Result Determinism
Equivalent inputs produce semantically equivalent outputs
PASS
System Security
No prompt injection, data leakage, or unsafe tool use detected
PASS
Research Veto: ✅ PASS (Applicable)
Dimension | Result | Detail
Scientific Integrity | PASS | No fabricated statistics, PMIDs, DOIs, or clinical data. Template requires that findings be quantified from input data only, and evidence boundary section prevents overclaiming.
Practice Boundaries | PASS | No diagnostic or prescriptive recommendations. Audience-guide.md explicitly instructs against premature clinical conclusions when evidence is insufficient.
Methodological Ground | PASS | Evidence boundary section is a mandatory template element; exploratory-to-definitive inflation is structurally prevented.
Code Usability | N/A | No code generated; Mode A direct-execution skill.

Core Capability: 92 / 100 (8 Categories)

Functional Suitability
Full marks. All five audience types covered with distinct language registers. 6-section template + annotated example + 4-step quality checklist provide comprehensive coverage. Pipeline positioning (upstream/downstream skills) is explicitly documented.
12 / 12
100%
Reliability
Vague input handled by redirecting to content clarification. Missing elements noted in output rather than fabricated. Minor deduction: no structured follow-up question template when input is insufficient — Step 1 asks for material but doesn't provide specific clarification prompts.
11 / 12
92%
Performance & Context
Full marks. Template-driven output keeps scope controlled. Audience filtering prevents unnecessary section generation. Step 2 internal mapping runs before writing.
8 / 8
100%
Agent Usability
Trigger phrases in description are extensive and precise. Template enforces consistent output. Minor deduction: Step 2 internal extraction is a silent check — missing elements are not explicitly surfaced to the user for confirmation before writing begins.
14 / 16
88%
Human Usability
Full marks. Pipeline positioning makes it clear when to use this skill vs. upstream/downstream alternatives. Missing element notation provides clear path for users to improve output.
8 / 8
100%
Security
Input validation asks for source material and audience type. No formal out-of-scope refusal template — off-topic requests (press releases, consent forms, social media copy) could trigger this skill without explicit rejection mechanism.
10 / 12
83%
Maintainability
Template in assets/, audience guide in references/ — clean separation of concerns. Each can be updated independently. Minor deduction: no versioning note for when audience guide language conventions may need revision.
11 / 12
92%
Agent-Specific
Explicit upstream/downstream skill composability is a first-class design feature. Quality checklist at Step 4 prevents premature delivery. Minor deduction on escape hatches: no formal scope-refusal section for clearly off-scope requests.
18 / 20
90%
Core Capability Total: 92 / 100
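The category arithmetic above can be re-checked with a short sketch. The scores are copied from this report; the half-up rounding rule in `category_percent` is an assumption that happens to reproduce the percentages shown, not a documented scoring rule.

```python
# Category scores as (earned, maximum), copied from the report above.
CORE_CAPABILITY = {
    "Functional Suitability": (12, 12),
    "Reliability": (11, 12),
    "Performance & Context": (8, 8),
    "Agent Usability": (14, 16),
    "Human Usability": (8, 8),
    "Security": (10, 12),
    "Maintainability": (11, 12),
    "Agent-Specific": (18, 20),
}


def category_percent(earned: int, maximum: int) -> int:
    """Whole-number percentage, rounded half up (assumed rounding rule)."""
    return (200 * earned + maximum) // (2 * maximum)


def totals(scores):
    """Sum earned and maximum points across all categories."""
    earned = sum(e for e, _ in scores.values())
    maximum = sum(m for _, m in scores.values())
    return earned, maximum


if __name__ == "__main__":
    for name, (earned, maximum) in CORE_CAPABILITY.items():
        print(f"{name}: {earned}/{maximum} = {category_percent(earned, maximum)}%")
    print("Total: %d / %d" % totals(CORE_CAPABILITY))
```

Integer arithmetic is used for the rounding so that values such as 87.5% (14/16) are not affected by floating-point representation.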

Medical Task: Execution Average 88.2 / 100; Assertions 23/25 Passed

Canonical (90 / 100): ✅ Pass
Mixed-audience lay summary for scRNA-seq study on CD8+ T cell exhaustion in early NSCLC (42 cancer patients, 20 controls; Tex population significantly expanded, PD-1 upregulated)

Full 6-section template output produced. All 5 audience bullets included for mixed audience. Evidence boundary explicitly states cross-sectional blood data; tumor-infiltrating T cell pattern not confirmed. Statistics accurately reported from input.

Basic 37/40 | Specialized 53/60 | Total 90/100
A1: Output follows the 6-section template structure (What we tried to find out / What we did / What we found / What this means and doesn't / Team bullets / What comes next)
A2: All 5 team-specific bullets present for mixed audience (clinical, wet-lab, bioinformatics, product, management)
A3: Evidence boundary explicitly distinguishes what the data supports from what remains unconfirmed
A4: Key finding quantified using numbers from input (p<0.001, PD-1 upregulation)
A5: No jargon introduced without definition in plain-language sections
Pass rate: 5 / 5
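Assertion A1 (the 6-section structure) lends itself to a mechanical check. The following is a minimal sketch, assuming the section headings appear in the output with the wording listed in A1; the actual headings in output-template.md may differ, and `missing_sections` is a hypothetical helper, not part of the skill.

```python
from typing import List

# Heading wording taken from assertion A1; treat it as an assumption about
# what the real template emits.
REQUIRED_SECTIONS = [
    "What we tried to find out",
    "What we did",
    "What we found",
    "What this means and doesn't",
    "Team bullets",
    "What comes next",
]


def missing_sections(summary_text: str) -> List[str]:
    """Return required headings that do not appear in the expected order."""
    lowered = summary_text.lower()
    missing = []
    position = 0
    for heading in REQUIRED_SECTIONS:
        found = lowered.find(heading.lower(), position)
        if found == -1:
            # Absent, or present only before an earlier required heading.
            missing.append(heading)
        else:
            position = found + len(heading)
    return missing
```

An empty return value means all six headings were found in order; a heading that appears out of sequence is reported as missing.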
Variant A (90 / 100): ✅ Pass
Bioinformatics-only audience for GWAS study on 12 novel T2D loci from UKBB (450,000 participants, EUR ancestry; lead SNP OR 1.23, 95% CI 1.18-1.29)

Single-audience output focused on bioinformatics team. Analytical caveats (EUR-only ancestry, GWAS p-value threshold, replication needed), statistical detail (OR with CI), and validation needs correctly emphasized. Non-bioinformatics audience bullets correctly omitted.

Basic 37/40 | Specialized 53/60 | Total 90/100
A1: Output restricts team bullets to bioinformatics audience only; four other audiences correctly omitted
A2: OR and CI reported accurately from input without inflation
A3: EUR-ancestry restriction flagged as an evidence boundary in the 'what this means and doesn't' section
A4: Bioinformatics bullet emphasizes analytical caveats and replication needs
A5: No clinical or commercial speculation added for a GWAS discovery study
Pass rate: 5 / 5
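The audience filtering exercised by this variant (one audience requested, four omitted) can be sketched as follows. The five audience names come from the Canonical test's assertion A2; the function name and the `"mixed"` keyword handling are illustrative assumptions, not the skill's actual implementation.

```python
from typing import Iterable, List

# The five audiences named in this report.
ALL_AUDIENCES = ("clinical", "wet-lab", "bioinformatics", "product", "management")


def select_audiences(requested: Iterable[str]) -> List[str]:
    """Return the audiences that should receive a team bullet.

    "mixed" expands to all five audiences; otherwise only the requested
    audiences are kept, in the canonical order above.
    """
    requested_set = {name.strip().lower() for name in requested}
    if "mixed" in requested_set:
        return list(ALL_AUDIENCES)
    unknown = requested_set - set(ALL_AUDIENCES)
    if unknown:
        raise ValueError(f"Unknown audience(s): {sorted(unknown)}")
    return [name for name in ALL_AUDIENCES if name in requested_set]
```

For the GWAS case above, `select_audiences(["bioinformatics"])` yields a single entry, so no bullets are generated for the other four teams.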
Edge (85 / 100): ✅ Pass
Vague input: 'Can you write a lay summary of my research on cancer?' (no study design, findings, or audience provided)

Skill correctly withholds lay summary. Clarification requested covering study design, key finding with quantification, and audience. Pipeline positioning note (content must be clarified first) correctly invoked. No fabricated cancer research content.

Basic 35/40 | Specialized 50/60 | Total 85/100
A1: Skill declines to produce lay summary from vague topic-only input
A2: Focused clarification questions cover study design, key finding, quantification, and audience
A3: No fabricated cancer research content generated
A4: Pipeline positioning invoked: user directed to clarify objectives and findings first
A5: Response provides actionable path for user to proceed (e.g., paste abstract or key results paragraph)
Pass rate: 5 / 5
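The Reliability notes flag that the skill has no structured follow-up question template. One way such a template could look is sketched below. The four element names follow assertion A2 of this test; the question wording and the `clarification_questions` helper are hypothetical.

```python
from typing import Dict, List, Optional

# Hypothetical prompt wording; the four element names follow assertion A2.
CLARIFICATION_PROMPTS = {
    "study design": "What was the study design, and who or what was studied?",
    "key finding": "What is the single most important finding?",
    "quantification": "Which numbers from your data support that finding?",
    "audience": "Which team will read the summary, or is it a mixed audience?",
}


def clarification_questions(provided: Dict[str, Optional[str]]) -> List[str]:
    """Return one question per required element that is absent or blank."""
    questions = []
    for element, prompt in CLARIFICATION_PROMPTS.items():
        value = provided.get(element)
        if value is None or not value.strip():
            questions.append(prompt)
    return questions
```

For the vague input in this test, none of the four elements is supplied, so all four questions would be returned and no summary text would be drafted.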
Variant B (90 / 100): ✅ Pass
Management/leadership audience for phase 2 RCT in locally advanced pancreatic cancer (ZZZ drug; OS improved 7.2 to 11.4 months, HR 0.68, p=0.031; SAE rate comparable)

Single-audience management output. Plain language throughout. Phase 2 limitation stated. OS improvement framed concretely. Minor issue: skill does not explicitly flag that phase 2 OS endpoint at this sample size doesn't yet meet regulatory threshold for standard-of-care change.

Basic 37/40 | Specialized 53/60 | Total 90/100
A1: Output uses management-register plain language throughout with no unexplained jargon
A2: Phase 2 study size limitation clearly stated in evidence boundary section
A3: OS improvement quantified concretely (11.4 vs 7.2 months) rather than using vague positives
A4 (not met): Skill explicitly flags that phase 2 OS endpoint does not meet regulatory standard-of-care threshold
A5: Next step section names a concrete follow-up (e.g., phase 3 design, regulatory meeting) rather than 'further research needed'
Pass rate: 4 / 5
Stress (86 / 100): ✅ Pass
Partially defined multi-cohort proteomics study on Alzheimer's biomarkers (plasma GFAP and p-tau 217 in MCI converters, N=320); study goal not formally defined, audience unspecified

Step 2 core element mapping correctly identifies missing study goal. Audience defaults to mixed. Missing study goal noted in output. Summary produced from available findings with evidence boundary acknowledging goal uncertainty. Minor issue: the audience default to mixed is applied silently without notifying the user.

Basic 34/40 | Specialized 52/60 | Total 86/100
A1: Step 2 core element mapping explicitly identifies missing study goal and notes it in the output
A2: Audience defaults to mixed when unspecified, and all 5 team bullets are included
A3 (not met): Audience default to mixed explicitly notified to the user before writing
A4: Plasma biomarker evidence boundary correctly stated (MCI converters cohort, no replication)
A5: No Alzheimer's progression claims fabricated beyond what the input N=320 cohort supports
Pass rate: 4 / 5
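The unmet assertion in this test concerns the silent audience default. A minimal sketch of a non-silent default is shown below; `resolve_audience` and the notice wording are hypothetical and only illustrate returning the default together with a message that can be shown to the user before writing begins.

```python
from typing import Optional, Tuple

# Hypothetical notice text shown when the audience falls back to the default.
DEFAULT_NOTICE = (
    "No audience was specified, so this summary is written for a mixed "
    "audience. Name a single team if you want a narrower version."
)


def resolve_audience(audience: Optional[str]) -> Tuple[str, Optional[str]]:
    """Return (audience, notice); notice is None when the user chose one."""
    if audience is None or not audience.strip():
        return "mixed", DEFAULT_NOTICE
    return audience.strip().lower(), None
```

Returning the notice alongside the resolved value makes the default visible to the caller, which can surface it before any summary text is produced.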
Medical Task Total: 88.2 / 100
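The Medical Task total is consistent with an unweighted mean of the five execution scores. The sketch below recomputes it from the per-test figures in this report; equal weighting is an assumption inferred from the numbers, not a documented rule.

```python
from fractions import Fraction

# (label, execution score, assertions passed, assertions total), from the report.
MEDICAL_TASKS = [
    ("Canonical", 90, 5, 5),
    ("Variant A", 90, 5, 5),
    ("Edge", 85, 5, 5),
    ("Variant B", 90, 4, 5),
    ("Stress", 86, 4, 5),
]


def execution_average(tasks) -> Fraction:
    """Unweighted mean of execution scores, kept exact as a Fraction."""
    return Fraction(sum(score for _, score, _, _ in tasks), len(tasks))


def assertion_counts(tasks):
    """Total assertions passed and total assertions evaluated."""
    passed = sum(p for _, _, p, _ in tasks)
    total = sum(t for _, _, _, t in tasks)
    return passed, total


if __name__ == "__main__":
    print(f"Execution average: {float(execution_average(MEDICAL_TASKS)):.1f} / 100")
    print("Assertions: %d/%d passed" % assertion_counts(MEDICAL_TASKS))
```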

Key Strengths

  • Annotated worked example in output-template.md provides a high-quality reference for correct output format, significantly reducing output variance
  • Explicit upstream/downstream pipeline positioning (upstream: clarify research content; downstream: slide deck, graphical abstract) enables clean workflow integration
  • 5-audience differentiation with distinct language registers in audience-guide.md prevents register mismatch without bloating SKILL.md
  • Evidence boundary is a mandatory template section, structurally preventing overclaiming across all output types
  • 4-step quality checklist before delivery prevents jargon and accuracy failures from reaching the user