Agent Skills
Image Ocr
AIPOCH
Extract text from images with Tesseract OCR; use it when you need to recognize text from PNG/JPEG/TIFF/BMP images, select a language model, or run OCR via natural-language requests (e.g., "Interpret the image at C:\path\image.png").
2
0
FILES
87100Total Score
View Evaluation ReportCore Capability
88 / 100
Functional Suitability
11 / 12
Reliability
10 / 12
Performance & Context
8 / 8
Agent Usability
14 / 16
Human Usability
8 / 8
Security
10 / 12
Maintainability
10 / 12
Agent-Specific
17 / 20
Medical Task
20 / 20 Passed
90You need to extract text from an image file (PNG/JPEG/TIFF/BMP) for downstream processing or review
4/4
86You want to run OCR with a specific Tesseract language model (e.g., eng, chi_sim)
4/4
86OCR text extraction using Tesseract via pytesseract
4/4
86Supports common image formats: PNG, JPEG, TIFF, BMP (via Pillow)
4/4
86End-to-end case for OCR text extraction using Tesseract via pytesseract
4/4
SKILL.md
When to Use
- You need to extract text from an image file (PNG/JPEG/TIFF/BMP) for downstream processing or review.
- You want to run OCR with a specific Tesseract language model (e.g.,
eng,chi_sim). - You prefer providing a natural-language request that contains an image path (e.g., "Interpret the image at ...") instead of manually setting
image_path. - You need a quick local OCR verification workflow from the command line.
- You want a simple JSON-configured OCR runner that can be integrated into scripts or automation.
Key Features
- OCR text extraction using Tesseract via
pytesseract. - Supports common image formats: PNG, JPEG, TIFF, BMP (via Pillow).
- Multi-language OCR through the
langconfiguration option. - Natural-language request parsing to automatically locate the image path.
- Config-driven execution through
scripts/ocr_config.json.
Dependencies
- Python packages:
pytesseract(version not specified)Pillow(version not specified)
- System dependency:
- Tesseract OCR (installed separately; ensure
tesseract_cmdpoints to the executable)
- Tesseract OCR (installed separately; ensure
Example Usage
- Install dependencies (example):
pip install pytesseract Pillow
-
Install Tesseract OCR (system-level) and ensure it is accessible.
- If it is not on
PATH, settesseract_cmdto the full executable path in the config.
- If it is not on
-
Create or edit
scripts/ocr_config.json:
Option A: Direct image path
{
"image_path": "C:\\Users\\xuw\\Desktop\\test_image.png",
"request": "",
"lang": "chi_sim",
"tesseract_cmd": "tesseract"
}
Option B: Natural-language request (image path embedded)
{
"request": "Interpret the image at C:\\Users\\xuw\\Desktop\\test_image.png",
"lang": "chi_sim",
"tesseract_cmd": "tesseract"
}
- Run:
python scripts/image_ocr.py
Implementation Details
-
Configuration inputs
image_path: Explicit path to the image file to OCR.request: Natural-language instruction that includes an image path; when provided, the script extracts the path from this text and uses it as the OCR target.lang: Tesseract language model code (e.g.,eng,chi_sim). This is passed to Tesseract to control recognition language.tesseract_cmd: The Tesseract executable name or full path; used to configurepytesseractto locate Tesseract.
-
Execution flow (high level)
- Load
scripts/ocr_config.json. - Determine the target image path:
- Use
image_pathif present and non-empty; otherwise parse the path fromrequest.
- Use
- Load the image via Pillow.
- Run OCR via
pytesseractwith the configuredlang. - Output the extracted text (script-defined output behavior).
- Load
-
Language model requirement
- The selected
langmust be installed in your local Tesseract language data; otherwise OCR may fail or fall back depending on your Tesseract setup.
- The selected
When Not to Use
- Do not use this skill when the required source data, identifiers, files, or credentials are missing.
- Do not use this skill when the user asks for fabricated results, unsupported claims, or out-of-scope conclusions.
- Do not use this skill when a simpler direct answer is more appropriate than the documented workflow.
Required Inputs
- A clearly specified task goal aligned with the documented scope.
- All required files, identifiers, parameters, or environment variables before execution.
- Any domain constraints, formatting requirements, and expected output destination if applicable.
Recommended Workflow
- Validate the request against the skill boundary and confirm all required inputs are present.
- Select the documented execution path and prefer the simplest supported command or procedure.
- Produce the expected output using the documented file format, schema, or narrative structure.
- Run a final validation pass for completeness, consistency, and safety before returning the result.
Deterministic Output Rules
- Use the same section order for every supported request of this skill.
- Keep output field names stable and do not rename documented keys across examples.
- If a value is unavailable, emit an explicit placeholder instead of omitting the field.
Output Contract
- Return a structured deliverable that is directly usable without reformatting.
- If a file is produced, prefer a deterministic output name such as
image_ocr_result.mdunless the skill documentation defines a better convention. - Include a short validation summary describing what was checked, what assumptions were made, and any remaining limitations.
Validation and Safety Rules
- Validate required inputs before execution and stop early when mandatory fields or files are missing.
- Do not fabricate measurements, references, findings, or conclusions that are not supported by the provided source material.
- Emit a clear warning when credentials, privacy constraints, safety boundaries, or unsupported requests affect the result.
- Keep the output safe, reproducible, and within the documented scope at all times.
Failure Handling
- If validation fails, explain the exact missing field, file, or parameter and show the minimum fix required.
- If an external dependency or script fails, surface the command path, likely cause, and the next recovery step.
- If partial output is returned, label it clearly and identify which checks could not be completed.
Completion Checklist
- Confirm all required inputs were present and valid.
- Confirm the supported execution path completed without unresolved errors.
- Confirm the final deliverable matches the documented format exactly.
- Confirm assumptions, limitations, and warnings are surfaced explicitly.
Quick Validation
Run this minimal verification path before full execution when possible:
python scripts/image_ocr.py --help
Expected output format:
Result file: image_ocr_result.md
Validation summary: PASS/FAIL with brief notes
Assumptions: explicit list if any
Scope Reminder
- Core purpose: Extract text from images with Tesseract OCR; use it when you need to recognize text from PNG/JPEG/TIFF/BMP images, select a language model, or run OCR via natural-language requests (e.g., "Interpret the image at C:\path\image.png").