Academic Poster Generator
Complete workflow for generating academic research posters from PDF literature; use when you need to extract paper content from PDFs and produce a LaTeX-based poster (beamerposter/tikzposter/baposter) with mandatory figure generation and a final rendered HTML deliverable.
SKILL.md
When to Use
- You have one or more research paper PDFs and need a conference-style poster draft with standard sections (Intro/Methods/Results/Conclusions).
- You want to automatically extract title/authors/abstract/sections from a PDF and restructure them into concise poster bullet points.
- You need a LaTeX poster scaffold using beamerposter, tikzposter, or baposter, with standard poster sizes (A0/A1/36×48").
- You must ensure posters are visual-first (at least 2–3 generated figures) and pass an automated quality gate before delivery.
- You want a browser-viewable final output (
poster.rendered.html) rather than a compiled PDF (PDF compilation is optional).
Key Features
- End-to-end pipeline: PDF → metadata extraction → content structuring → figure generation → LaTeX poster assembly → HTML base conversion → agent rendering → final HTML.
- Scripted PDF processing:
- Convert PDF pages to images for reference.
- Extract metadata (title, authors, abstract, section structure).
- Structure content into poster-ready bullet points with word limits.
- Mandatory figure workflow:
- Generate at least 2–3 figures (schematics/flowcharts/mechanisms/comparison charts).
- Insert figures into LaTeX templates automatically or via config.
- Template support:
assets/templates/beamerposter-template.texassets/templates/tikzposter-template.texassets/templates/baposter-template.tex
- HTML-first delivery policy:
- Final deliverables:
poster.rendered.html+figures/ - Intermediate artifacts are temporary and must not be returned.
- Final deliverables:
- Quality control:
- Automated checks for HTML/LaTeX outputs.
- Manual checklist reference:
references/poster_quality_checklist.md
- Design guidance:
- Poster design principles:
references/design_principles.md
- Poster design principles:
Dependencies
Python (recommended)
- Python
>=3.8 pypdf(version not pinned)pdfplumber(version not pinned)pdf2image(version not pinned)pytesseract(version not pinned; required for OCR on scanned PDFs)pandas(version not pinned; optional for table handling)Pillow(version not pinned)matplotlib(version not pinned)
Install (example):
pip install pypdf pdfplumber pdf2image pytesseract pandas pillow matplotlib
System tools
tesseract-ocr(for OCR; required only for scanned PDFs)- Poppler utilities (commonly required by
pdf2image, e.g.,pdftoppm)
LaTeX (optional, only if compiling PDF)
- TeX Live / MiKTeX / MacTeX
- TeX packages (via
tlmgr, names may vary by distribution):beamerpostertikzposterbaposterqrcodexcolortcolorboxsubcaptiongraphics
Example:
tlmgr install beamerposter tikzposter baposter qrcode xcolor tcolorbox subcaption graphics
HTML conversion
pandoc(required for--html-onlybase HTML generation)pdf2htmlEX(optional; only if doing PDF → HTML rendering)
Example Usage
1) Run the end-to-end pipeline (recommended)
Creates a timestamped directory under output/ and supports HTML-only mode.
python scripts/run_poster_pipeline.py paper.pdf --html-only
Expected final structure:
output/YYYYMMDD_HHMMSS/
├── poster.rendered.html
└── figures/
├── mechanism.png
├── process_flowchart.png
└── comparison_chart.png
2) Run stages manually (fully reproducible)
Stage A — Extract metadata and structure content (temporary artifacts)
python scripts/extract_metadata.py paper.pdf metadata.json
python scripts/structure_content.py paper.pdf --json poster_content.json --latex content.tex --max-words 800
python scripts/convert_pdf_to_images.py paper.pdf paper_images/ --dpi 600
Stage B — Generate figures (mandatory)
At least 2–3 figures are required.
python scripts/generate_figures.py schematic "Cell signaling pathway" mechanism.png
python scripts/generate_figures.py flowchart "Step 1;Step 2;Step 3" process_flowchart.png
python scripts/generate_figures.py comparison '{"Control":[65],"Our Method":[87]}' comparison_chart.png
Or generate from a config:
python scripts/generate_figures.py config figures_config.json
Stage C — Create poster LaTeX from a template (temporary artifact)
cp assets/templates/beamerposter-template.tex poster.tex
# (Edit poster.tex: title/authors/institute + paste structured content)
Stage D — Insert figures into LaTeX (temporary artifact)
python scripts/insert_figures.py poster.tex --all
# or:
python scripts/insert_figures.py poster.tex --config figures_config.json
Stage E — Convert to base HTML and render final HTML (final artifact)
python scripts/convert_poster.py poster.tex --html-only
# Agent step: read poster.html, validate <img> paths, inject CSS/layout, output poster.rendered.html
Stage F — Quality control (mandatory)
python scripts/check_poster_quality.py poster.rendered.html
Implementation Details
Pipeline and file policy
All generated files must be placed under output/ (preferably a timestamped subdirectory). Only the following are considered final deliverables:
poster.rendered.htmlfigures/directory (PNG figures)
All other files are intermediate/temporary and must not be returned to the user, including (non-exhaustive):
poster.tex,poster.htmlmetadata.json,poster_content.json,figures_config.json,figures.json- LaTeX auxiliary files (
.aux,.log,.out, etc.)
Conceptual flow:
PDF → metadata extraction (temp) → content structuring (temp) → figure generation (final figures/)
→ LaTeX poster (temp) → base HTML (temp) → agent rendering → poster.rendered.html (final)
PDF extraction approach
- Text and tables are extracted via
pdfplumber. - For scanned PDFs, OCR is performed by converting pages to images (
pdf2image) and runningpytesseract.
Typical extraction targets:
- Title/authors (first-page patterns)
- Abstract (from “Abstract” header to next section)
- Section blocks (Introduction/Methods/Results/Conclusion)
- Tables (converted to summaries for poster bullets)
Content structuring rules
- Convert paragraphs to bullet points suitable for poster blocks.
- Enforce a poster-friendly word budget (commonly 600–800 words total).
- Prefer 3–6 key visuals; summarize tables into a small set of metrics.
Figure requirements (mandatory)
- Every poster must include at least 2–3 generated figures.
- Target 40–50% of poster area as visual content.
- Figures should be ≥300 DPI, with clear labels and concise captions.
Supported figure categories:
- Schematics (systems, pathways, conceptual diagrams)
- Flowcharts (procedures, pipelines, algorithms)
- Mechanism diagrams (biological/chemical processes)
- Comparison charts (bar charts, benchmarks)
LaTeX package selection
Use one of the supported poster packages depending on style needs:
- beamerposter (traditional academic)
- tikzposter (modern, colorful)
- baposter (multi-column layouts)
Templates are provided in assets/templates/.
HTML rendering requirements (agent step)
After pandoc produces poster.html, the renderer must:
- Verify all
<img>references exist and paths resolve tofigures/. - Apply poster-grade CSS (grid columns, typography, spacing, captions).
- Ensure accessibility: WCAG AA contrast ≥ 4.5:1.
- Output
poster.rendered.htmlas the only final HTML artifact.
Minimum layout expectations:
- 2–3 column grid on desktop; single column on narrow screens.
- Clear hierarchy: title/authors/institution, section headers, bullet lists, figure blocks with captions.
Quality control requirements (mandatory)
Run scripts/check_poster_quality.py on the final output:
- Validate that images exist and render correctly.
- Confirm layout integrity (columns/blocks).
- Ensure readability (font sizes, spacing).
- Confirm contrast compliance (≥ 4.5:1).
- Confirm figure count (≥ 2–3) and that no placeholder text remains.
References:
- Design principles:
references/design_principles.md - Manual checklist:
references/poster_quality_checklist.md