Content Proofreading

AIPOCH

An academic proofreading skill for Chinese/English manuscripts, triggered when you need automated checks for spelling, grammar, terminology consistency, and formatting before submission.

FILES
content-proofreading/
skill.md
scripts
annotation_generator.py
chinese_checker.py
english_checker.py
init_run.py
terminology_manager.py
word_converter.py
assets
terminology
biology.json
sample.txt
test1_report.json
test2_report.json
test3_report.json
test4_report.json
test5_report.json

SKILL.md

When to Use

  • You are preparing an academic paper for journal/conference submission and need a final language + formatting pass.
  • You have bilingual (Chinese/English) content and want consistent punctuation, wording, and style across both languages.
  • Your manuscript contains domain terminology (e.g., life sciences) and you need consistent Chinese–English term mapping and abbreviation rules.
  • You need to validate references, numbers/units, and heading levels against a required style (APA, MLA, or GB/T 7714).
  • You want a shareable report (HTML or Markdown annotations) with precise error locations and revision suggestions.

Key Features

  • English checks

    • Spelling (including US/UK variants)
    • Grammar (agreement, tense, articles, clause structure)
    • Punctuation conventions (US/UK)
    • Style suggestions (redundancy detection, passive voice optimization)
  • Chinese checks

    • Typo/misused character detection (dictionary-based)
    • Grammar and collocation checks
    • Chinese vs. English punctuation normalization
    • Academic expression optimization suggestions
  • Terminology consistency

    • Domain terminology database (life sciences by default)
    • Bidirectional Chinese–English correspondence checks
    • Abbreviation rules (require full form on first occurrence)
    • Synonym unification to preferred standard terms
  • Formatting checks

    • Reference style validation (APA, MLA, GB/T 7714, etc.)
    • Number and unit normalization
    • Heading level consistency
    • Abbreviation consistency across the document
  • Reporting

    • HTML interactive report or Markdown annotations
    • Precise error localization
    • Actionable revision suggestions

Dependencies

  • Python: >= 3.8

  • Python packages (install via pip install -r requirements.txt; see that file for versions)

    • language-tool-python: English grammar checking
    • opencc: Traditional/Simplified Chinese conversion
    • jieba: Chinese tokenization
    • pyenchant: spelling checks
    • markdown: Markdown rendering
    • python-docx: .docx reading
    • docx2pdf: Word-to-PDF conversion

Example Usage

1) Install

python -m venv .venv
source .venv/bin/activate  # Windows: .venv\Scripts\activate

pip install -r requirements.txt

2) Run (basic)

python scripts/init_run.py --input <paper_file_path> --output <output_path>

3) Run (advanced)

python scripts/init_run.py \
  --input paper.md \
  --output report.html \
  --lang en \
  --style apa \
  --terminology biology \
  --format html

4) CLI parameters

| Parameter | Description | Default |
|---|---|---|
| --input | Input file path | Required |
| --output | Output report path | Generates an HTML report by default |
| --lang | Language to check (en / zh / both) | both |
| --style | Reference style (apa / mla / gb) | apa |
| --terminology | Domain terminology set | biology |
| --format | Output format (html / markdown) | html |
| --no-pdf | Skip the optional Word-to-PDF conversion | false |
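
The parameter table above can be sketched with the standard-library argparse module. This is a minimal illustration that assumes the flags behave exactly as documented; build_parser is a hypothetical helper, not the actual contents of scripts/init_run.py.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented CLI table."""
    parser = argparse.ArgumentParser(prog="init_run.py")
    parser.add_argument("--input", required=True, help="Input file path")
    parser.add_argument("--output", default=None, help="Output report path")
    parser.add_argument("--lang", choices=["en", "zh", "both"], default="both")
    parser.add_argument("--style", choices=["apa", "mla", "gb"], default="apa")
    parser.add_argument("--terminology", default="biology")
    parser.add_argument("--format", choices=["html", "markdown"], default="html")
    # store_true gives the documented default of false
    parser.add_argument("--no-pdf", action="store_true")
    return parser


# Parse an explicit argument list so the sketch runs without a real command line.
args = build_parser().parse_args(["--input", "paper.md", "--lang", "en"])
print(args.lang, args.style, args.format, args.no_pdf)
```

Restricting --lang, --style, and --format with choices makes a typo such as --lang cn fail at parse time rather than partway through a run.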

5) Use as a Python module (end-to-end)

from scripts.english_checker import EnglishChecker
from scripts.chinese_checker import ChineseChecker
from scripts.terminology_manager import TerminologyManager
from scripts.annotation_generator import AnnotationGenerator

# Sample text: the abbreviation is introduced together with its full form.
text = """
Messenger RNA (mRNA) is transcribed in the nucleus.
"""

# One checker per concern; the terminology manager is pointed at the biology set.
en_checker = EnglishChecker()
zh_checker = ChineseChecker()
term_manager = TerminologyManager(domain="biology")

# Collect the issues from every checker into a single list.
results = []
results.extend(en_checker.check(text))
results.extend(zh_checker.check(text))
results.extend(term_manager.check(text))

# Render the merged issues as HTML and write the report to disk.
generator = AnnotationGenerator(output_format="html")
report = generator.generate(results)

with open("report.html", "w", encoding="utf-8") as f:
    f.write(report)

Implementation Details

Architecture / Core Modules

  • english_checker.py

    • Core engine for English spelling/grammar/style checks.
    • Designed to be rule-extensible (add or register new rule sets).
  • chinese_checker.py

    • Core engine for Chinese typo/grammar/style checks.
    • Includes a library of common academic writing error patterns.
  • terminology_manager.py

    • Terminology database management (import/export/query/update).
    • Performs term consistency checks, bilingual mapping validation, and abbreviation policy checks.
  • annotation_generator.py

    • Converts detected issues into a visual report (HTML) or annotated Markdown.
    • Ensures issues include location, type, and suggested fix.
  • word_converter.py

    • Extracts text from .docx.
    • Optionally converts Word to PDF (can be disabled via --no-pdf).
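
As an illustration of the "location, type, and suggested fix" requirement on annotation_generator.py, the sketch below models one issue record and renders a list of them as Markdown bullets. The Issue class, its field names, and render_markdown are assumptions made for this example; the real module may structure its results differently.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Issue:
    """Hypothetical issue record; field names are illustrative only."""
    line: int        # 1-based line number in the manuscript
    column: int      # 1-based column within that line
    category: str    # e.g. "spelling", "grammar", "terminology"
    message: str     # what was detected
    suggestion: str  # proposed fix


def render_markdown(issues: List[Issue]) -> str:
    """Render issues as Markdown bullets, ordered by position in the text."""
    ordered = sorted(issues, key=lambda i: (i.line, i.column))
    return "\n".join(
        f"- L{i.line}:C{i.column} [{i.category}] {i.message} -> {i.suggestion}"
        for i in ordered
    )


sample = [
    Issue(3, 1, "grammar", "subject-verb disagreement", "use 'are'"),
    Issue(1, 5, "spelling", "'teh'", "the"),
]
print(render_markdown(sample))
```

Sorting by line and column before rendering keeps the report in reading order even when the checkers return their results in separate batches.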

Terminology database format (JSON)

Organized by domain; each entry can include bilingual forms and abbreviation metadata:

{
  "biology": {
    "cell": {
      "en": "cell",
      "abbrev": null,
      "full_form": null
    },
    "mrna": {
      "en": "mRNA",
      "abbrev": "mRNA",
      "full_form": "messenger RNA"
    }
  }
}

Checking logic (typical):

  • If an abbreviation (e.g., mRNA) appears, verify the full form appears at first mention (e.g., messenger RNA (mRNA)).
  • If both Chinese and English terms appear, verify they match the configured mapping for the selected domain.
  • If synonyms are detected, prefer the standardized term defined in the database.
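
The first-mention rule for abbreviations can be sketched as follows. This is a simplified illustration that reuses the JSON structure shown above; TERMS and check_abbreviations are hypothetical names, and the actual logic in terminology_manager.py is not shown in this document.

```python
import re
from typing import List

# Terminology entries copied from the JSON example above.
TERMS = {
    "biology": {
        "cell": {"en": "cell", "abbrev": None, "full_form": None},
        "mrna": {"en": "mRNA", "abbrev": "mRNA", "full_form": "messenger RNA"},
    }
}


def check_abbreviations(text: str, domain: str = "biology") -> List[str]:
    """Flag abbreviations whose first occurrence is not 'full form (abbrev)'."""
    problems = []
    for entry in TERMS[domain].values():
        abbrev, full = entry["abbrev"], entry["full_form"]
        if not abbrev or not full:
            continue  # entry has no abbreviation policy
        first = re.search(r"\b" + re.escape(abbrev) + r"\b", text)
        if first is None:
            continue  # abbreviation never appears
        # The text before the first occurrence must end with "<full form> (".
        intro = re.compile(re.escape(full) + r"\s*\(\s*$", re.IGNORECASE)
        if not intro.search(text[: first.start()]):
            problems.append(
                f"'{abbrev}' at offset {first.start()} lacks '{full} ({abbrev})'"
            )
    return problems


print(check_abbreviations("Messenger RNA (mRNA) is transcribed in the nucleus."))
print(check_abbreviations("mRNA is unstable. Messenger RNA (mRNA) degrades."))
```

The first call reports nothing because the abbreviation is introduced with its full form; the second reports one problem because mRNA is used before it is defined.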

Rule database format (JSON)

Rules are grouped first by language or formatting area, then by category:

{
  "english": {
    "spelling": [],
    "grammar": [],
    "style": []
  },
  "format": {
    "references": [],
    "numbers": [],
    "units": []
  }
}

How rules are applied (high level):

  • Load rule sets by --lang and --style.
  • Run language-specific checks (English/Chinese) and formatting checks.
  • Merge results into a unified issue list.
  • Render issues into the selected output format (html / markdown) with location-aware annotations.
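
The flow above (select checks by language, run them, merge the results) can be sketched as a small dispatcher. The two toy rules and the CHECKERS registry are stand-ins invented for this example; they are not the rule sets shipped with the skill.

```python
from typing import Callable, Dict, List

Issue = Dict[str, object]
Checker = Callable[[str], List[Issue]]


def check_double_space(text: str) -> List[Issue]:
    """Toy English rule: flag the first run of two consecutive spaces."""
    pos = text.find("  ")
    if pos < 0:
        return []
    return [{"offset": pos, "type": "style", "suggestion": "use a single space"}]


def check_halfwidth_comma(text: str) -> List[Issue]:
    """Toy Chinese rule: flag an ASCII comma directly after a CJK character."""
    issues: List[Issue] = []
    for i, ch in enumerate(text[1:], start=1):
        if ch == "," and "\u4e00" <= text[i - 1] <= "\u9fff":
            issues.append({"offset": i, "type": "punctuation", "suggestion": "use ，"})
    return issues


# Hypothetical registry keyed by the values accepted by --lang.
CHECKERS: Dict[str, List[Checker]] = {
    "en": [check_double_space],
    "zh": [check_halfwidth_comma],
}


def run_pipeline(text: str, lang: str = "both") -> List[Issue]:
    """Run the checkers selected by lang and merge results in text order."""
    selected = ["en", "zh"] if lang == "both" else [lang]
    merged: List[Issue] = []
    for code in selected:
        for checker in CHECKERS[code]:
            merged.extend(checker(text))
    return sorted(merged, key=lambda issue: issue["offset"])


for issue in run_pipeline("细胞,cell  line"):
    print(issue)
```

Keeping every checker behind the same text-in, issue-list-out signature is what lets the merge step stay a plain extend followed by a sort.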

Extensibility

  • Add new rules

    1. Create a rule file under assets/rules/.
    2. Implement rules following the project’s rule template.
    3. Register the rule set in the rule index.
    4. Run tests to measure precision and recall, paying particular attention to false positives.
  • Add new terminology sets

    1. Create a terminology JSON under assets/terminology/.
    2. Follow the domain structure shown above.
    3. Register the new domain in the terminology index so it can be selected via --terminology.
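
Before registering a new terminology file, it can help to check that it follows the domain structure shown earlier. The validator below is a sketch that assumes every entry carries the three keys seen in the biology example (en, abbrev, full_form); the real schema may allow more fields, and validate_terminology is a hypothetical helper.

```python
import json
from typing import List

# Keys taken from the biology example; treat this set as an assumption.
REQUIRED_KEYS = {"en", "abbrev", "full_form"}


def validate_terminology(raw: str) -> List[str]:
    """Return a list of structural problems found in a terminology JSON string."""
    errors: List[str] = []
    data = json.loads(raw)
    for domain, entries in data.items():
        if not isinstance(entries, dict):
            errors.append(f"{domain}: expected an object of term entries")
            continue
        for key, entry in entries.items():
            if isinstance(entry, dict):
                missing = REQUIRED_KEYS - set(entry)
            else:
                missing = REQUIRED_KEYS
            if missing:
                errors.append(f"{domain}.{key}: missing {sorted(missing)}")
    return errors


good = '{"biology": {"cell": {"en": "cell", "abbrev": null, "full_form": null}}}'
bad = '{"chemistry": {"atp": {"en": "ATP"}}}'
print(validate_terminology(good))
print(validate_terminology(bad))
```

Running a check like this on a new file surfaces missing keys before the domain is selected via --terminology.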