Agent Skills

ESM


Toolkit for protein language models (ESM3 for multimodal generative protein design; ESM C for efficient embeddings). Use when you need sequence/structure/function generation or prediction, inverse folding, protein embeddings, or scalable inference via local weights or the Forge API.

FILES

esm/
  skill.md
  references/
    esm-c-api.md
    esm3-api.md
    forge-api.md
    workflows.md
Total Score: 87/100
Core Capability: 84/100
Functional Suitability: 11/12
Reliability: 9/12
Performance & Context: 7/8
Agent Usability: 14/16
Human Usability: 8/8
Security: 10/12
Maintainability: 9/12
Agent-Specific: 16/20
Medical Task: 20/20 (Passed)
Evaluation tasks:

  • 93: Designing novel proteins with desired properties by generating sequences (optionally conditioned on structure/function) using ESM3 (4/4)
  • 89: Completing or editing sequences (e.g., filling masked residues, generating variants) for protein engineering workflows (4/4)
  • 87: ESM3 multimodal generation across sequence, structure, and function tracks (4/4)
  • 87: Local inference (e.g., esm3_sm_open_v1) and cloud inference via Forge (e.g., esm3-medium-2024-08, esm3-large-2024-03) (4/4)
  • 87: End-to-end case for ESM3 multimodal generation across sequence, structure, and function tracks (4/4)

SKILL.md

When to Use

  • Designing novel proteins with desired properties by generating sequences (optionally conditioned on structure/function) using ESM3.
  • Completing or editing sequences (e.g., filling masked residues, generating variants) for protein engineering workflows.
  • Predicting 3D structure from sequence or performing inverse folding (designing sequences for a target structure) with ESM3’s structure/sequence tracks.
  • Generating protein embeddings for downstream ML tasks (classification, clustering, similarity search, function prediction) using ESM C.
  • Scaling inference to many sequences using the Forge API (async/batch execution, hosted large models).

Key Features

  • ESM3 multimodal generation across sequence, structure, and function tracks.
  • Local inference (e.g., esm3_sm_open_v1) and cloud inference via Forge (e.g., esm3-medium-2024-08, esm3-large-2024-03).
  • Structure prediction (sequence → coordinates/PDB) and inverse folding (structure → designed sequence).
  • Functional conditioning via function annotations to bias generation toward desired functional regions.
  • ESM C embeddings for efficient, high-quality protein representations.
  • Async batch processing with Forge for high-throughput workloads.

Additional reference docs (if present in this skill package):

  • references/esm3-api.md (ESM3 API, generation parameters, multimodal prompting)
  • references/esm-c-api.md (ESM C API, embedding strategies, optimization)
  • references/forge-api.md (authentication, rate limits, batching)
  • references/workflows.md (end-to-end workflows)

Dependencies

  • esm (Python package; install via pip/uv)
  • flash-attn (optional; recommended for faster attention on supported GPUs)

Version notes: exact versions depend on your environment and CUDA/PyTorch stack. Install commands below reflect the upstream package usage.

Example Usage

The following script demonstrates:

  1. local ESM3 sequence completion,
  2. Forge-based async batch generation, and
  3. local ESM C embeddings.
"""
End-to-end example for ESM:
- Local ESM3: sequence completion
- Forge ESM3: async batch generation (requires token)
- Local ESM C: embeddings
"""

import os
import asyncio

# ---------- 1) Local ESM3: sequence completion ----------
from esm.models.esm3 import ESM3
from esm.sdk.api import ESMProtein, GenerationConfig

def local_esm3_sequence_completion():
    # Load the open-weights ESM3 model locally (use "cpu" if no CUDA GPU is available)
    model = ESM3.from_pretrained("esm3_sm_open_v1").to("cuda")

    # '_' indicates masked/unknown residues to be generated
    protein = ESMProtein(sequence="MPRT___KEND")

    # "MPRT___KEND" has three masked positions; one step per mask is a common choice
    completed = model.generate(
        protein,
        GenerationConfig(track="sequence", num_steps=3),
    )
    print("Local ESM3 completed sequence:", completed.sequence)


# ---------- 2) Forge ESM3: async batch generation ----------
from esm.sdk.forge import ESM3ForgeInferenceClient

async def forge_batch_generation():
    token = os.environ.get("FORGE_TOKEN", "<token>")
    client = ESM3ForgeInferenceClient(
        model="esm3-medium-2024-08",
        url="https://forge.evolutionaryscale.ai",
        token=token,
    )

    proteins = [ESMProtein(sequence="MPRT" + "_" * 50 + "KEND") for _ in range(5)]
    tasks = [
        client.async_generate(p, GenerationConfig(track="sequence", num_steps=50))
        for p in proteins
    ]
    results = await asyncio.gather(*tasks)
    print("Forge batch results (first):", results[0].sequence)


# ---------- 3) Local ESM C: embeddings ----------
from esm.models.esmc import ESMC
from esm.sdk.api import LogitsConfig

def local_esmc_embeddings():
    model = ESMC.from_pretrained("esmc_300m").to("cuda")

    protein = ESMProtein(sequence="MPRTKEINDAGLIVHSP")
    encoded = model.encode(protein)
    output = model.logits(encoded, LogitsConfig(sequence=True, return_embeddings=True))

    # output.embeddings holds per-residue embeddings; exact shape depends on the model
    print("ESM C embeddings computed; shape:", output.embeddings.shape)


if __name__ == "__main__":
    local_esm3_sequence_completion()

    # Run the Forge example only when a token is configured:
    #   export FORGE_TOKEN="..."
    if os.environ.get("FORGE_TOKEN"):
        asyncio.run(forge_batch_generation())

    local_esmc_embeddings()

Installation Commands

# Base
uv pip install esm

# Optional acceleration (GPU environments where supported)
uv pip install flash-attn --no-build-isolation

Implementation Details

ESM3 Tracks and Generation

  • Tracks determine what the model generates:

    • track="sequence": generates amino-acid tokens (use _ for masked positions).
    • track="structure": predicts 3D coordinates; can be exported as PDB (see references/esm3-api.md).
    • track="function": predicts or conditions on functional annotations.
  • Core generation parameters (via GenerationConfig):

    • num_steps: number of iterative generation steps; commonly aligned with the number of masked residues for sequence completion, or set to a design budget for de novo generation.
    • temperature: controls sampling diversity (lower = more deterministic; higher = more diverse).
    • Additional advanced controls and multimodal prompting patterns are documented in references/esm3-api.md.
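One way to pick num_steps for sequence completion is to derive it from the prompt itself. A minimal helper (hypothetical, not part of the esm package) that counts the masked positions:

```python
def num_masked(sequence: str) -> int:
    """Count '_' mask tokens so num_steps can match the masked positions."""
    return sequence.count("_")

# Example: "MPRT___KEND" has 3 masked residues, so num_steps=3 is a natural choice,
# e.g. GenerationConfig(track="sequence", num_steps=num_masked("MPRT___KEND"))
print(num_masked("MPRT___KEND"))  # 3
```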

Structure Prediction and Inverse Folding

  • Structure prediction: provide a sequence and generate on the structure track to obtain coordinates and/or a PDB representation.
  • Inverse folding: start from a target structure (e.g., ESMProtein.from_pdb(...)), remove/omit the sequence, then generate on the sequence track to design a sequence compatible with the structure.
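The inverse-folding flow above can be sketched as a single function. This is a sketch under assumptions: it requires the esm package and a CUDA GPU, and the num_steps and temperature values are illustrative, not recommended settings.

```python
def design_sequence_for_structure(pdb_path: str) -> str:
    """Sketch: design a sequence compatible with a target backbone structure."""
    # Imports inside the function so the sketch can be defined without esm installed
    from esm.models.esm3 import ESM3
    from esm.sdk.api import ESMProtein, GenerationConfig

    model = ESM3.from_pretrained("esm3_sm_open_v1").to("cuda")

    protein = ESMProtein.from_pdb(pdb_path)  # coordinates loaded from the PDB file
    protein.sequence = None                  # drop the native sequence so it is redesigned

    designed = model.generate(
        protein,
        GenerationConfig(track="sequence", num_steps=20, temperature=0.7),
    )
    return designed.sequence
```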

ESM C Embeddings

  • ESM C models are optimized for representation learning:
    • Use model.encode(ESMProtein(...)) to tokenize and prepare inputs.
    • Use model.logits(..., LogitsConfig(sequence=True, return_embeddings=True)) to obtain logits and per-residue embeddings suitable for downstream tasks (classification, clustering, similarity).
  • For batching and performance strategies (padding, caching, normalization), see references/esm-c-api.md.
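For downstream classification or clustering, per-residue embeddings are commonly mean-pooled into a single fixed-size protein-level vector. The idea, shown with plain Python lists standing in for the actual embedding tensor:

```python
def mean_pool(per_residue):
    """Average per-residue vectors into one protein-level vector."""
    length = len(per_residue)
    dim = len(per_residue[0])
    return [sum(vec[d] for vec in per_residue) / length for d in range(dim)]

# Toy "embeddings" for a 3-residue protein with 2-dimensional vectors
pooled = mean_pool([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
print(pooled)  # [3.0, 4.0]
```

In practice the same pooling is applied along the sequence dimension of the embedding tensor returned by the model.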

Forge API (Hosted Inference)

  • Forge provides access to larger hosted models and scalable execution:
    • Use ESM3ForgeInferenceClient(...) with a token.
    • Prefer async_generate + asyncio.gather(...) for throughput.
  • Authentication, rate limits, and batching modes are detailed in references/forge-api.md.
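Firing an unbounded asyncio.gather over many async_generate calls can trip server-side rate limits; a semaphore caps the number of in-flight requests. The sketch below demonstrates the pattern with a stub coroutine standing in for client.async_generate:

```python
import asyncio

async def generate_bounded(generate, prompts, max_in_flight=4):
    """Run generate(prompt) over all prompts with bounded concurrency."""
    sem = asyncio.Semaphore(max_in_flight)

    async def one(prompt):
        async with sem:  # at most max_in_flight requests run concurrently
            return await generate(prompt)

    return await asyncio.gather(*(one(p) for p in prompts))

# Stub standing in for client.async_generate; returns the prompt uppercased
async def fake_generate(prompt):
    await asyncio.sleep(0)
    return prompt.upper()

results = asyncio.run(generate_bounded(fake_generate, ["mprt", "kend"]))
print(results)  # ['MPRT', 'KEND']
```

In real use, pass a closure over client.async_generate (with its GenerationConfig) as the generate argument and tune max_in_flight to your rate limit.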