Medical Skill Auditor: How AIPOCH Evaluates Medical Research Agent Skills

Explore AIPOCH’s Medical Skill Auditor, a structured evaluation framework for medical research agent skills. Learn how veto gates, static scoring, and dynamic scoring help ensure reliability, security, and scientific integrity.

AIPOCH, April 8, 2026

You can explore a growing collection of Medical Research Agent Skills on AIPOCH GitHub.

If you find the collection useful, consider giving the repository a ⭐ to support the project!

What is the Medical Skill Auditor?

The Medical Skill Auditor is a standardized tool for assessing the quality of Agent Skills. Its core function is to run a comprehensive quality check on a skill before that skill is officially deployed to users.

How does the Medical Skill Auditor Work?

Veto Gates

To enforce strict quality control, the Medical Skill Auditor applies two layers of veto checks: a Skill Veto and a Research Veto. A failure in any of these checks may lead to immediate rejection of the skill.

Skill Veto

Figure: Skill Veto example for the agent skill “medical-research-literature-reader-pro”.

The Skill Veto covers four checks:

  • Operational Stability
  • Structural Consistency
  • Result Determinism
  • System Security

Research Veto

Figure: Research Veto example for the agent skill “medical-research-literature-reader-pro”.

The Research Veto covers four checks:

  • Scientific Integrity
  • Practice Boundaries
  • Methodological Ground
  • Code Usability
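The two veto layers can be pictured as a gate that runs before any scoring. The following is a minimal sketch, not AIPOCH's implementation: the check names come from the lists above, while `VetoResult`, `run_veto_gates`, and the boolean `check_results` input are hypothetical. It also assumes the strict reading, where any failed or missing check rejects the skill.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Check names are taken from the article; everything else is illustrative.
SKILL_VETO = ("Operational Stability", "Structural Consistency",
              "Result Determinism", "System Security")
RESEARCH_VETO = ("Scientific Integrity", "Practice Boundaries",
                 "Methodological Ground", "Code Usability")


@dataclass
class VetoResult:
    passed: bool
    failures: list[str] = field(default_factory=list)


def run_veto_gates(check_results: dict[str, bool]) -> VetoResult:
    """Collect failed checks from both layers.

    Assumed policy: a check that is False or absent counts as a failure,
    and a single failure is enough to reject the skill.
    """
    failures = [
        f"{layer}: {name}"
        for layer, names in (("Skill Veto", SKILL_VETO),
                             ("Research Veto", RESEARCH_VETO))
        for name in names
        if not check_results.get(name, False)
    ]
    return VetoResult(passed=not failures, failures=failures)


# Hypothetical usage: every check passes except one Skill Veto check.
results = {name: True for name in SKILL_VETO + RESEARCH_VETO}
results["Result Determinism"] = False
outcome = run_veto_gates(results)
print(outcome.passed)    # False
print(outcome.failures)  # ['Skill Veto: Result Determinism']
```

Returning the list of failures rather than a bare boolean keeps the rejection reason visible, which is useful when a skill author needs to know which gate blocked deployment.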

Core Capability

Figure: Static Score example for the agent skill “medical-research-literature-reader-pro”.

The static score evaluates a skill’s design and contract against key dimensions: Functional Suitability, Reliability, Performance & Context, Agent Usability, Human Usability, Security, Agent-Specific, and Maintainability.

Medical Task

Figure: Medical Task example for the agent skill “medical-research-literature-reader-pro”.

The Medical Task evaluation assesses the actual outputs of a skill against layered criteria.

For skill testing, the AI generates the test inputs automatically. The number of inputs depends on the complexity of the skill, so some categories are dropped for simpler skills. The following 7 input categories make up the most comprehensive version:

  • Canonical
  • Variant A
  • Edge
  • Variant B
  • Stress
  • Scope Boundary
  • Adversarial

Skill Complexity Classification

Label     Code/Rank  Definition
Simple    S          Narrow task scope
Moderate  M          Moderate branching or multiple task types
Complex   C          Broad or multi-step specialized skill

The complexity label determines how many test inputs are generated:

  • Simple (S): 3 inputs
  • Moderate (M): 5 inputs
  • Complex (C): 7 inputs
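The mapping from complexity code to input count can be sketched as a small lookup. This is an illustrative sketch only: the counts and category names are taken from the text above, while `INPUTS_PER_COMPLEXITY` and `input_count` are hypothetical names. The article does not say which categories are kept for Simple and Moderate skills, so the sketch deliberately returns only the count.

```python
# Input categories as listed in the article (full set, used for Complex skills).
INPUT_CATEGORIES = ["Canonical", "Variant A", "Edge", "Variant B",
                    "Stress", "Scope Boundary", "Adversarial"]

# Number of generated inputs per complexity code, as stated in the article.
INPUTS_PER_COMPLEXITY = {"S": 3, "M": 5, "C": 7}


def input_count(complexity_code: str) -> int:
    """Return how many test inputs a skill of the given complexity receives."""
    code = complexity_code.strip().upper()
    if code not in INPUTS_PER_COMPLEXITY:
        raise ValueError(f"unknown complexity code: {complexity_code!r}")
    return INPUTS_PER_COMPLEXITY[code]


for code in ("S", "M", "C"):
    print(code, input_count(code))
```

The Complex count matches the length of `INPUT_CATEGORIES`, which is consistent with the 7-input set being described as the most comprehensive version.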

Final Score

Figure: Final Score example for the agent skill “medical-research-literature-reader-pro”.

The Medical Skill Auditor uses a two-stage scoring system: a static evaluation of design quality, which accounts for 40% of the final score, and a dynamic evaluation of runtime performance, which accounts for 60%. The final score is the weighted sum of the two.

  • Static (40%)
  • Dynamic (60%)

Final Score = Static Score × 40% + Dynamic Score × 60%
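As a worked example of the formula above, here is a minimal sketch of the weighted sum. The weights come from the article; the function name `final_score`, the 0 to 100 scale, and the sample scores are assumptions made for illustration.

```python
STATIC_WEIGHT = 0.40   # design quality, per the article
DYNAMIC_WEIGHT = 0.60  # runtime performance, per the article


def final_score(static_score: float, dynamic_score: float) -> float:
    """Combine the two stage scores with the 40/60 weighting.

    Assumes both scores share one scale (0 to 100 here, which the
    article does not specify).
    """
    for label, value in (("static", static_score), ("dynamic", dynamic_score)):
        if not 0 <= value <= 100:
            raise ValueError(f"{label} score out of range: {value}")
    return static_score * STATIC_WEIGHT + dynamic_score * DYNAMIC_WEIGHT


# Hypothetical scores: 80 * 0.4 + 90 * 0.6 = 32 + 54 = 86
print(round(final_score(80, 90), 1))  # 86.0
```

Because the dynamic stage carries the larger weight, a skill with a strong design but weak runtime outputs scores lower than the reverse: `final_score(90, 80)` gives 84 rather than 86.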

You can view evaluation results for selected AIPOCH skills here.