Shard Access (Signed URLs)

Return signed URLs for binder and non-binder Parquet shards so you can rank compounds, train models, and support target decisions with binding and selectivity scores. Access requires either an active dataset subscription or qualifying buyout (public datasets), or a qualifying owned data-generation order for the requested `protein_uuid` (private datasets).

POST /v2/data-access/shards

Return signed URLs for existing Parquet shards. Use the Om SDK or your analytics workflow to load, filter, and analyze shard data.

cURL
curl -X POST https://api.omtx.ai/v2/data-access/shards \
  -H "x-api-key: YOUR_API_KEY" \
  -H "idempotency-key: export-$(date +%s)" \
  -H "Content-Type: application/json" \
  -d '{
    "protein_uuid": "550e8400-e29b-41d4-a716-446655440000"
  }'
Response
{
  "protein_uuid": "550e8400-e29b-41d4-a716-446655440000",
  "dataset_id": "dataset-public-v5",
  "vintage_id": "2b8a7a1b-6f38-4c8e-9e1d-0d4e6e8f9a12",
  "vintage": "20260201_om1",
  "binders": {
    "urls": [
      {
        "file_path": "part-00000.parquet",
        "url": "SECURE SIGNED URL"
      }
    ]
  },
  "non_binders": {
    "urls": [
      {
        "file_path": "part-00000.parquet",
        "url": "SECURE SIGNED URL"
      }
    ]
  },
  "binder_urls": [
    "SECURE SIGNED URL"
  ],
  "non_binder_urls": [
    "SECURE SIGNED URL"
  ],
  "expires_at": "2026-02-01T00:00:00Z"
}
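The response carries each pool twice: as `file_path`/`url` pairs under `binders.urls` and `non_binders.urls`, and as flat lists. As a minimal sketch of working with the paired form, the snippet below builds a lookup from shard file name to signed URL. The helper name `shard_index` is hypothetical (it is not part of the Om SDK), and the sample payload is a trimmed copy of the response above with placeholder URLs.

```python
import json

# Trimmed copy of the sample response above; the URLs are placeholders.
SAMPLE_RESPONSE = """
{
  "protein_uuid": "550e8400-e29b-41d4-a716-446655440000",
  "binders": {"urls": [{"file_path": "part-00000.parquet", "url": "SECURE SIGNED URL"}]},
  "non_binders": {"urls": [{"file_path": "part-00000.parquet", "url": "SECURE SIGNED URL"}]},
  "binder_urls": ["SECURE SIGNED URL"],
  "non_binder_urls": ["SECURE SIGNED URL"],
  "expires_at": "2026-02-01T00:00:00Z"
}
"""


def shard_index(response: dict) -> dict:
    """Map each pool name to a {file_path: url} dictionary (hypothetical helper)."""
    index = {}
    for pool in ("binders", "non_binders"):
        # Tolerate a missing pool by treating it as an empty shard list.
        entries = response.get(pool, {}).get("urls", [])
        index[pool] = {entry["file_path"]: entry["url"] for entry in entries}
    return index


index = shard_index(json.loads(SAMPLE_RESPONSE))
for pool, shards in index.items():
    print(pool, "->", sorted(shards))
```

Keeping the file name next to its URL makes it easier to name local copies and to re-request a single shard after its URL expires.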
  • Public datasets require an active dataset subscription or qualifying buyout.
  • Private datasets require a qualifying owned data-generation order for the requested protein UUID.
  • The API returns the newest dataset snapshot available to your account for the requested `protein_uuid`.
  • Access follows your active plan, buyout, or qualifying generation order.
  • The response contains signed URLs for the full shard set in that snapshot's dataset export.
  • Primary SDK path: `client.load_data(...)` loads binders and non-binders together in one call.
  • For explicit per-pool control, use `client.load_binders(...)` and `client.load_nonbinders(...)`.
  • Use `binding_score` and `selectivity_score` to rank compounds and prioritize candidates.
  • The API always returns all shard URLs; the SDK loaders control how many rows are loaded into modeling workflows.
  • If `n` is omitted (or `n=None`) in SDK loaders, the full pool is loaded.
  • `client.binders.urls(...)` returns flat `binder_urls` / `non_binder_urls` lists for direct URL handling.
  • Signed URLs expire after 60 minutes. Request fresh URLs when needed for long-running jobs.
  • The `Idempotency-Key` header is required for POST requests.
  • Dataset outputs can be reused in Models Hub workflows and paired with Diligence target research.
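For long-running jobs, the 60-minute expiry noted above means a job may need to check `expires_at` before each download and request fresh URLs when the deadline is close. The following is a minimal sketch of that check, not SDK behavior: `parse_expiry`, `needs_refresh`, and the 5-minute safety margin are all hypothetical, and it assumes `expires_at` applies to every URL in the response.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def parse_expiry(expires_at: str) -> datetime:
    """Parse an ISO 8601 timestamp such as "2026-02-01T00:00:00Z"."""
    # Swap the trailing "Z" for an explicit offset so this also works
    # on Python versions older than 3.11.
    return datetime.fromisoformat(expires_at.replace("Z", "+00:00"))


def needs_refresh(
    expires_at: str,
    now: Optional[datetime] = None,
    margin: timedelta = timedelta(minutes=5),  # assumed safety margin
) -> bool:
    """Return True when the URLs are expired or within `margin` of expiring."""
    current = now or datetime.now(timezone.utc)
    return current >= parse_expiry(expires_at) - margin


expires_at = "2026-02-01T00:00:00Z"
just_before = datetime(2026, 1, 31, 23, 58, tzinfo=timezone.utc)
well_before = datetime(2026, 1, 31, 23, 0, tzinfo=timezone.utc)
print(needs_refresh(expires_at, now=just_before))  # inside the margin
print(needs_refresh(expires_at, now=well_before))  # a full hour left
```

When `needs_refresh` returns True, the job would call the endpoint again and continue with the new URLs rather than retrying an expired one.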

Request Parameters

  • `protein_uuid` (required): UUID of the target protein.
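Because `protein_uuid` is the only request parameter, a client can validate it locally before calling the endpoint. The sketch below builds (but does not send) the same POST request as the cURL example using only the standard library. The helper `build_shard_request` is hypothetical, and using a random UUID for the idempotency key is an assumption; the cURL example uses a timestamp instead.

```python
import json
import urllib.request
import uuid

API_URL = "https://api.omtx.ai/v2/data-access/shards"


def build_shard_request(protein_uuid: str, api_key: str) -> urllib.request.Request:
    """Build the POST request without sending it (hypothetical helper)."""
    # uuid.UUID raises ValueError on malformed input, so bad IDs fail locally.
    canonical = str(uuid.UUID(protein_uuid))
    body = json.dumps({"protein_uuid": canonical}).encode("utf-8")
    headers = {
        "x-api-key": api_key,
        # Assumption: any unique string is acceptable as the key.
        "Idempotency-Key": f"export-{uuid.uuid4()}",
        "Content-Type": "application/json",
    }
    return urllib.request.Request(API_URL, data=body, headers=headers, method="POST")


request = build_shard_request("550e8400-e29b-41d4-a716-446655440000", "YOUR_API_KEY")
print(request.get_method(), request.full_url)
```

Passing the result to `urllib.request.urlopen(request)` would send it; the sketch stops short of that so it can run without credentials.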

Python SDK

Combined training load (recommended)
Python
from omtx import OmClient

client = OmClient(api_key="YOUR_API_KEY")

loaded = client.load_data(
    protein_uuid="550e8400-e29b-41d4-a716-446655440000",
    binders=50000,
    nonbinder_multiplier=5,
    sample_seed=42,
)
binders = loaded["binders"]
nonbinders = loaded["nonbinders"]

print("Loaded shapes:", binders.shape, nonbinders.shape)
binders.show(top_n=24)  # defaults: smiles + binding_score
Load binder and non-binder pools directly (recommended)
Python
from omtx import OmClient

client = OmClient(api_key="YOUR_API_KEY")

binders = client.load_binders(
    protein_uuid="550e8400-e29b-41d4-a716-446655440000",
    n=1000,
    sample_seed=42,
)
nonbinders = client.load_nonbinders(
    protein_uuid="550e8400-e29b-41d4-a716-446655440000",
    n=10000,
    sample_seed=42,
)
# Omit n (or set n=None) to load the full pool.

print("Loaded shapes:", binders.shape, nonbinders.shape)
binders.show(top_n=24)  # defaults: smiles + binding_score
Advanced: request signed URL lists
Python
from omtx import OmClient

client = OmClient(api_key="YOUR_API_KEY")

urls = client.binders.urls(
    protein_uuid="550e8400-e29b-41d4-a716-446655440000",
)

print("Binder shards:", len(urls["binder_urls"]))
print("First binder URL:", urls["binder_urls"][0])
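Once you have a flat URL list, one option is to stream each shard to local disk before analysis. The sketch below uses only the standard library; `download_shard` is a hypothetical helper rather than an SDK method, and the 60-second timeout is an arbitrary choice. Reading the downloaded Parquet file afterwards would need a Parquet-capable library, which is outside this sketch.

```python
import shutil
import urllib.request
from pathlib import Path


def download_shard(url: str, dest: Path, timeout: float = 60.0) -> Path:
    """Stream one signed URL to `dest` without holding the file in memory."""
    dest.parent.mkdir(parents=True, exist_ok=True)
    with urllib.request.urlopen(url, timeout=timeout) as response, open(dest, "wb") as out:
        # copyfileobj copies in chunks, so large shards do not exhaust RAM.
        shutil.copyfileobj(response, out)
    return dest


# Example usage with the `urls` dictionary from the snippet above:
# for i, url in enumerate(urls["binder_urls"]):
#     download_shard(url, Path("shards") / f"binders-{i:05d}.parquet")
```

Downloads should finish before the signed URLs expire; for large shard sets, request a fresh list partway through rather than reusing an old one.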

Parquet Columns

  • smiles
  • gene
  • uniprot_id
  • sequence
  • MW
  • logP
  • PSA
  • HBD
  • HBA
  • binding_score
  • selectivity_score
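
As an illustration of ranking with the `binding_score` and `selectivity_score` columns, the sketch below sorts rows by binding score and breaks ties with selectivity score. It assumes higher values are better for both columns, which this page does not state. The rows are made-up toy values, not real dataset output, and `rank_compounds` is a hypothetical helper.

```python
# Toy rows with invented scores, shaped like a subset of the Parquet columns.
rows = [
    {"smiles": "CCO", "binding_score": 0.91, "selectivity_score": 0.40},
    {"smiles": "c1ccccc1", "binding_score": 0.91, "selectivity_score": 0.75},
    {"smiles": "CC(=O)O", "binding_score": 0.55, "selectivity_score": 0.90},
]


def rank_compounds(rows, top_n=None):
    """Sort by binding_score, then selectivity_score, both descending."""
    # Assumption: higher scores indicate better candidates.
    ranked = sorted(
        rows,
        key=lambda r: (r["binding_score"], r["selectivity_score"]),
        reverse=True,
    )
    return ranked if top_n is None else ranked[:top_n]


for row in rank_compounds(rows, top_n=2):
    print(row["smiles"], row["binding_score"], row["selectivity_score"])
```

A lexicographic sort is only one possible policy; a weighted combination of the two scores or a selectivity threshold may suit some projects better.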