Academic Writing

lay-summary-for-cross-disciplinary-teams

Rewrites technical research content into a structured lay summary that cross-disciplinary teams can quickly understand and act on.

90 / 100 Total Score

Core Capability

92 / 100

Functional Suitability

12 / 12

Reliability

11 / 12

Performance & Context

8 / 8

Agent Usability

14 / 16

Human Usability

8 / 8

Security

10 / 12

Maintainability

11 / 12

Agent-Specific

18 / 20

Medical Task

23 / 25 Passed

90 / 100: Mixed-audience lay summary for scRNA-seq study on CD8+ T cell exhaustion in early NSCLC (42 cancer patients, 20 controls; Tex population significantly expanded, PD-1 upregulated)

5/5

90 / 100: Bioinformatics-only audience for GWAS study on 12 novel T2D loci from UKBB (450,000 participants, EUR ancestry; lead SNP OR 1.23, 95% CI 1.18-1.29)

5/5

85 / 100: Vague input: 'Can you write a lay summary of my research on cancer?' (no study design, findings, or audience provided)

5/5

90 / 100: Management/leadership audience for phase 2 RCT in locally advanced pancreatic cancer (ZZZ drug; OS improved 7.2 to 11.4 months, HR 0.68, p=0.031; SAE rate comparable)

4/5

86 / 100: Partially defined multi-cohort proteomics study on Alzheimer's biomarkers (plasma GFAP and p-tau 217 in MCI converters, N=320); study goal not formally defined, audience unspecified

4/5

Veto Gates: Required pass for any deployment consideration

Skill Veto: ✓ All 4 gates passed

✓

Operational Stability

System remains stable across varied inputs and edge cases

PASS

✓

Structural Consistency

Output structure conforms to expected skill contract format

PASS

✓

Result Determinism

Equivalent inputs produce semantically equivalent outputs

PASS

✓

System Security

No prompt injection, data leakage, or unsafe tool use detected

PASS

Research Veto: ✅ PASS (Applicable)

Dimension	Result	Detail
Scientific Integrity	PASS	No fabricated statistics, PMIDs, DOIs, or clinical data. Template requires that findings be quantified from input data only, and evidence boundary section prevents overclaiming.
Practice Boundaries	PASS	No diagnostic or prescriptive recommendations. Audience-guide.md explicitly instructs against premature clinical conclusions when evidence is insufficient.
Methodological Ground	PASS	Evidence boundary section is a mandatory template element; exploratory-to-definitive inflation is structurally prevented.
Code Usability	N/A	No code generated; Mode A direct-execution skill.

Core Capability: 92 / 100 (8 Categories)

Functional Suitability

Full marks. All five audience types covered with distinct language registers. 6-section template + annotated example + 4-step quality checklist provide comprehensive coverage. Pipeline positioning (upstream/downstream skills) is explicitly documented.

12 / 12

100%

Reliability

Vague input handled by redirecting to content clarification. Missing elements noted in output rather than fabricated. Minor deduction: no structured follow-up question template when input is insufficient — Step 1 asks for material but doesn't provide specific clarification prompts.

11 / 12

92%

Performance & Context

Full marks. Template-driven output keeps scope controlled. Audience filtering prevents unnecessary section generation. Step 2 internal mapping runs before writing.

8 / 8

100%

Agent Usability

Trigger phrases in description are extensive and precise. Template enforces consistent output. Minor deduction: Step 2 internal extraction is a silent check — missing elements are not explicitly surfaced to the user for confirmation before writing begins.

14 / 16

88%
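
One way to address the deduction above is to have the Step 2 check return its gaps so they can be shown to the user before writing starts. The following is a minimal sketch, not the skill's implementation (the skill ships no code). The element names, the `find_missing_elements` and `build_confirmation_message` helpers, and the prompt wording are all hypothetical.

```python
# Hypothetical sketch: surface missing core elements before drafting.
# Element names and prompts are illustrative assumptions, not taken from SKILL.md.
CLARIFICATION_PROMPTS = {
    "study_goal": "What question was the study designed to answer?",
    "study_design": "What was the study design (cohort, RCT, GWAS, ...)?",
    "key_finding": "What is the main finding, with its numbers?",
    "audience": "Who will read this summary (clinical, wet-lab, "
                "bioinformatics, product, management, or mixed)?",
}


def find_missing_elements(elements):
    """Return the names of core elements that are absent or blank."""
    return [
        name
        for name in CLARIFICATION_PROMPTS
        if not str(elements.get(name) or "").strip()
    ]


def build_confirmation_message(elements):
    """Build a user-facing note listing gaps, or an empty string if none."""
    missing = find_missing_elements(elements)
    if not missing:
        return ""
    lines = ["Before I write the summary, please confirm or supply:"]
    lines += [f"- {CLARIFICATION_PROMPTS[name]}" for name in missing]
    return "\n".join(lines)
```

For an input like the Stress case (design and findings given, goal and audience absent), `find_missing_elements` would report `study_goal` and `audience`, and the message would ask about both instead of applying a default silently.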

Human Usability

Full marks. Pipeline positioning makes it clear when to use this skill vs. upstream/downstream alternatives. Missing element notation provides clear path for users to improve output.

8 / 8

100%

Security

Input validation asks for source material and audience type. Deduction: there is no formal out-of-scope refusal template, so off-topic requests (press releases, consent forms, social media copy) could trigger this skill without an explicit rejection mechanism.

10 / 12

83%

Maintainability

Template in assets/, audience guide in references/ — clean separation of concerns. Each can be updated independently. Minor deduction: no versioning note for when audience guide language conventions may need revision.

11 / 12

92%

Agent-Specific

Explicit upstream/downstream skill composability is a first-class design feature. Quality checklist at Step 4 prevents premature delivery. Minor deduction on escape hatches: no formal scope-refusal section for clearly off-scope requests.

18 / 20

90%

Core Capability Total: 92 / 100
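
The total and the per-category percentages can be re-derived from the eight category scores listed above. The sketch below is only a consistency check on the reported numbers. The half-up rounding rule is an assumption (the report does not state one), chosen because it reproduces the 88% shown for 14 / 16.

```python
# Category scores as (earned, maximum), copied from the report above.
CATEGORY_SCORES = {
    "Functional Suitability": (12, 12),
    "Reliability": (11, 12),
    "Performance & Context": (8, 8),
    "Agent Usability": (14, 16),
    "Human Usability": (8, 8),
    "Security": (10, 12),
    "Maintainability": (11, 12),
    "Agent-Specific": (18, 20),
}


def core_capability_total(scores):
    """Sum earned and maximum points across all categories."""
    earned = sum(e for e, _ in scores.values())
    maximum = sum(m for _, m in scores.values())
    return earned, maximum


def category_percent(earned, maximum):
    """Whole-number percentage, rounded half up (assumed convention)."""
    # Integer arithmetic avoids floating-point surprises at exact .5 values.
    return (200 * earned + maximum) // (2 * maximum)


if __name__ == "__main__":
    earned, maximum = core_capability_total(CATEGORY_SCORES)
    print(f"Core Capability Total: {earned} / {maximum}")
    for name, (e, m) in CATEGORY_SCORES.items():
        print(f"{name}: {e} / {m} = {category_percent(e, m)}%")
```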

Medical Task: Execution Average 88.2 / 100, Assertions 23/25 Passed

Canonical

Mixed-audience lay summary for scRNA-seq study on CD8+ T cell exhaustion in early NSCLC (42 cancer patients, 20 controls; Tex population significantly expanded, PD-1 upregulated)

5/5 ✓

Variant A

Bioinformatics-only audience for GWAS study on 12 novel T2D loci from UKBB (450,000 participants, EUR ancestry; lead SNP OR 1.23, 95% CI 1.18-1.29)

5/5 ✓

Edge

Vague input: 'Can you write a lay summary of my research on cancer?' — no study design, findings, or audience provided

5/5 ✓

Variant B

Management/leadership audience for phase 2 RCT in locally advanced pancreatic cancer (ZZZ drug; OS improved 7.2 to 11.4 months, HR 0.68, p=0.031; SAE rate comparable)

4/5 ✓

Stress

Partially defined multi-cohort proteomics study on Alzheimer's biomarkers (plasma GFAP and p-tau 217 in MCI converters, N=320) — study goal not formally defined, audience unspecified

4/5 ✓

Canonical: ✅ Pass

Mixed-audience lay summary for scRNA-seq study on CD8+ T cell exhaustion in early NSCLC (42 cancer patients, 20 controls; Tex population significantly expanded, PD-1 upregulated)

Full 6-section template output produced. All 5 audience bullets included for mixed audience. Evidence boundary explicitly states cross-sectional blood data; tumor-infiltrating T cell pattern not confirmed. Statistics accurately reported from input.

Basic 37/40 | Specialized 53/60 | Total 90/100

✅ A1: Output follows the 6-section template structure (What we tried to find out / What we did / What we found / What this means and doesn't / Team bullets / What comes next)

✅ A2: All 5 team-specific bullets present for mixed audience (clinical, wet-lab, bioinformatics, product, management)

✅ A3: Evidence boundary explicitly distinguishes what the data supports from what remains unconfirmed

✅ A4: Key finding quantified using numbers from input (p<0.001, PD-1 upregulation)

✅ A5: No jargon introduced without definition in plain-language sections

Pass rate: 5 / 5

Variant A: ✅ Pass

Bioinformatics-only audience for GWAS study on 12 novel T2D loci from UKBB (450,000 participants, EUR ancestry; lead SNP OR 1.23, 95% CI 1.18-1.29)

Single-audience output focused on bioinformatics team. Analytical caveats (EUR-only ancestry, GWAS p-value threshold, replication needed), statistical detail (OR with CI), and validation needs correctly emphasized. Non-bioinformatics audience bullets correctly omitted.

Basic 37/40 | Specialized 53/60 | Total 90/100

✅ A1: Output restricts team bullets to bioinformatics audience only; four other audiences correctly omitted

✅ A2: OR and CI reported accurately from input without inflation

✅ A3: EUR-ancestry restriction flagged as an evidence boundary in the 'what this means and doesn't' section

✅ A4: Bioinformatics bullet emphasizes analytical caveats and replication needs

✅ A5: No clinical or commercial speculation added for a GWAS discovery study

Pass rate: 5 / 5

Edge: ✅ Pass

Vague input: 'Can you write a lay summary of my research on cancer?' — no study design, findings, or audience provided

Skill correctly withholds lay summary. Clarification requested covering study design, key finding with quantification, and audience. Pipeline positioning note (content must be clarified first) correctly invoked. No fabricated cancer research content.

Basic 35/40 | Specialized 50/60 | Total 85/100

✅ A1: Skill declines to produce lay summary from vague topic-only input

✅ A2: Focused clarification questions cover study design, key finding, quantification, and audience

✅ A3: No fabricated cancer research content generated

✅ A4: Pipeline positioning invoked: user directed to clarify objectives and findings first

✅ A5: Response provides actionable path for user to proceed (e.g., paste abstract or key results paragraph)

Pass rate: 5 / 5

Variant B: ✅ Pass

Management/leadership audience for phase 2 RCT in locally advanced pancreatic cancer (ZZZ drug; OS improved 7.2 to 11.4 months, HR 0.68, p=0.031; SAE rate comparable)

Single-audience management output. Plain language throughout. Phase 2 limitation stated. OS improvement framed concretely. Minor issue: the skill does not explicitly flag that a phase 2 OS endpoint at this sample size does not yet meet the regulatory threshold for a standard-of-care change.

Basic 37/40 | Specialized 53/60 | Total 90/100

✅ A1: Output uses management-register plain language throughout with no unexplained jargon

✅ A2: Phase 2 study size limitation clearly stated in evidence boundary section

✅ A3: OS improvement quantified concretely (11.4 vs 7.2 months) rather than using vague positives

❌ A4: Skill explicitly flags that phase 2 OS endpoint does not meet regulatory standard-of-care threshold

✅ A5: Next step section names a concrete follow-up (e.g., phase 3 design, regulatory meeting) rather than 'further research needed'

Pass rate: 4 / 5

Stress: ✅ Pass

Partially defined multi-cohort proteomics study on Alzheimer's biomarkers (plasma GFAP and p-tau 217 in MCI converters, N=320) — study goal not formally defined, audience unspecified

Step 2 core element mapping correctly identifies missing study goal. Audience defaults to mixed. Missing study goal noted in output. Summary produced from available findings with evidence boundary acknowledging goal uncertainty. Minor issue: the audience default to mixed is applied silently without notifying the user.

Basic 34/40 | Specialized 52/60 | Total 86/100

✅ A1: Step 2 core element mapping explicitly identifies missing study goal and notes it in the output

✅ A2: Audience defaults to mixed when unspecified, and all 5 team bullets are included

❌ A3: Audience default to mixed explicitly notified to the user before writing

✅ A4: Plasma biomarker evidence boundary correctly stated (MCI converters cohort, no replication)

✅ A5: No Alzheimer's progression claims fabricated beyond what the input N=320 cohort supports

Pass rate: 4 / 5

Medical Task Total: 88.2 / 100
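
The 88.2 figure and the 23/25 assertion count are consistent with the five per-case results reported above. As a quick check, the sketch below assumes the execution average is an unweighted mean of the case totals. The report does not state the formula, but this assumption reproduces its number.

```python
# Per-case results as (total score, assertions passed, assertions run),
# copied from the report above.
CASE_RESULTS = {
    "Canonical": (90, 5, 5),
    "Variant A": (90, 5, 5),
    "Edge": (85, 5, 5),
    "Variant B": (90, 4, 5),
    "Stress": (86, 4, 5),
}


def execution_average(results):
    """Unweighted mean of the per-case total scores (assumed formula)."""
    totals = [total for total, _, _ in results.values()]
    return sum(totals) / len(totals)


def assertion_tally(results):
    """Return (assertions passed, assertions run) summed over all cases."""
    passed = sum(p for _, p, _ in results.values())
    run = sum(n for _, _, n in results.values())
    return passed, run


if __name__ == "__main__":
    passed, run = assertion_tally(CASE_RESULTS)
    print(f"Execution Average: {execution_average(CASE_RESULTS):.1f} / 100")
    print(f"Assertions: {passed}/{run} Passed")
```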

Key Strengths

Annotated worked example in output-template.md provides a high-quality reference for correct output format, significantly reducing output variance
Explicit upstream/downstream pipeline positioning (upstream: clarify research content; downstream: slide deck, graphical abstract) enables clean workflow integration
5-audience differentiation with distinct language registers in audience-guide.md prevents register mismatch without bloating SKILL.md
Evidence boundary is a mandatory template section, structurally preventing overclaiming across all output types
4-step quality checklist before delivery prevents jargon and accuracy failures from reaching the user