Dataset Catalog

Returns a unified dataset catalog for discovery, together with the dataset access currently available to your account. The catalog call itself does not deduct Wallet Credits.

GET /v2/datasets/catalog

Dataset Catalog

Returns public discovery rows, your generated dataset rows, and the dataset IDs you can download right now.

cURL
curl -G https://api.omtx.ai/v2/datasets/catalog \
  -H "x-api-key: YOUR_API_KEY"
Response
{
  "catalog": {
    "items": [
      {
        "dataset_id": "dataset-public-v5",
        "protein_uuid": "550e8400-e29b-41d4-a716-446655440000",
        "protein_name": "Breast cancer type 1 susceptibility protein",
        "uniprot_id": "P38398",
        "sequence": null,
        "vintage": "20260201_om1",
        "vintage_id": "9512ba4b-8a67-4310-9100-fcb6fb06cf70",
        "is_public": true,
        "num_data_points": 83143571,
        "num_actives": 450612,
        "num_inactives": 82692959
      }
    ],
    "count": 1
  },
  "data_generated": {
    "items": [
      {
        "protein_uuid": "43d75238-ae2a-5935-8ade-30115565034e",
        "protein_name": "Generated Protein",
        "generation_status": "ready",
        "license_kind": "exclusive",
        "requires_subscription": false,
        "subscription_entitled": true,
        "dataset_id": "dataset-private-v2",
        "latest_vintage_id": "9512ba4b-8a67-4310-9100-fcb6fb06cf70",
        "latest_vintage": "20260201_om1",
        "is_public": false,
        "num_data_points": 83143571,
        "num_actives": 450612,
        "num_inactives": 82692959
      }
    ],
    "count": 1
  },
  "accessible_generated_protein_uuids": [
    "43d75238-ae2a-5935-8ade-30115565034e"
  ],
  "accessible_dataset_ids": [
    "dataset-public-v5",
    "dataset-private-v2"
  ]
}
  • Catalog discovery does not deduct Wallet Credits.
  • `catalog` contains only `is_public=true` rows.
  • `data_generated` contains the generated rows you can currently use.
  • `accessible_generated_protein_uuids` is the deduplicated list of generated protein UUIDs available to your account right now.
  • `accessible_dataset_ids` lists the datasets your account can download right now.
  • Public dataset downloads require an active subscription or qualifying buyout.
  • Private dataset downloads require a qualifying owned data-generation order for that `protein_uuid`.
  • Use `protein_uuid` with `client.load_binders(...)` / `client.load_nonbinders(...)` (recommended) or `/v2/data-access/shards` for direct signed URL access.
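
The access rules above can be checked on the client before a download is requested. The following is a minimal sketch that assumes the response shape shown in the sample; `can_download` and `downloadable_public_rows` are hypothetical helper names, not part of the SDK.

```python
def can_download(payload: dict, dataset_id: str) -> bool:
    """Return True if dataset_id appears in accessible_dataset_ids."""
    return dataset_id in set(payload.get("accessible_dataset_ids", []))


def downloadable_public_rows(payload: dict) -> list:
    """Public catalog rows whose dataset_id the account can download right now."""
    accessible = set(payload.get("accessible_dataset_ids", []))
    return [
        row
        for row in payload.get("catalog", {}).get("items", [])
        if row.get("dataset_id") in accessible
    ]


# Trimmed copy of the sample response (only the fields used here).
sample = {
    "catalog": {
        "items": [
            {
                "dataset_id": "dataset-public-v5",
                "protein_uuid": "550e8400-e29b-41d4-a716-446655440000",
                "is_public": True,
            }
        ],
        "count": 1,
    },
    "accessible_dataset_ids": ["dataset-public-v5", "dataset-private-v2"],
}

print(can_download(sample, "dataset-public-v5"))
print([row["protein_uuid"] for row in downloadable_public_rows(sample)])
```

Filtering against `accessible_dataset_ids` keeps the entitlement decision on the server side; the client only reads the list it was given.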

Response Fields

  • catalog.items[] --- Public discovery rows (is_public=true) with their dataset metadata
  • catalog.count --- Number of public discovery rows
  • data_generated.items[] --- Generated dataset rows currently available to your account
  • data_generated.count --- Number of generated dataset rows
  • accessible_generated_protein_uuids[] --- Deduplicated list of generated protein UUIDs available to your account
  • accessible_dataset_ids[] --- Dataset IDs you can download with /v2/data-access/shards
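
The per-row counts in `items[]` can be summarized as a quick sanity check. This is a rough sketch that uses the numbers from the sample response; `summarize_row` is a hypothetical helper, and treating `num_actives + num_inactives == num_data_points` as an invariant is an assumption that happens to hold for the sample rows, not a documented guarantee.

```python
def summarize_row(row: dict) -> dict:
    """Compute the active fraction and check that the counts add up."""
    total = row["num_data_points"]
    actives = row["num_actives"]
    inactives = row["num_inactives"]
    return {
        "dataset_id": row.get("dataset_id"),
        # Guard against an empty dataset to avoid dividing by zero.
        "active_fraction": actives / total if total else 0.0,
        # Assumption: actives and inactives partition the data points.
        "counts_consistent": actives + inactives == total,
    }


# Counts copied from the sample response.
row = {
    "dataset_id": "dataset-public-v5",
    "num_data_points": 83143571,
    "num_actives": 450612,
    "num_inactives": 82692959,
}

summary = summarize_row(row)
print(f"{summary['dataset_id']}: {summary['active_fraction']:.4%} actives")
print("Counts consistent:", summary["counts_consistent"])
```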

Python SDK

Python
from omtx import OmClient

client = OmClient(api_key="YOUR_API_KEY")

# Get unified catalog payload
catalog = client.datasets.catalog()

print(f"Public rows: {catalog['catalog']['count']}")
print(f"Generated rows: {catalog['data_generated']['count']}")
print(f"Generated protein UUIDs: {len(catalog['accessible_generated_protein_uuids'])}")
print(f"Accessible dataset IDs: {len(catalog['accessible_dataset_ids'])}")

# Recommended next step: load binder/non-binder pools from a catalog protein UUID.
# Public rows need an active subscription or qualifying buyout to download.
items = catalog["catalog"]["items"]
if items:
    protein_uuid = items[0]["protein_uuid"]
    binders = client.load_binders(
        protein_uuid=protein_uuid,
        n=1000,
        sample_seed=42,
    )
    nonbinders = client.load_nonbinders(
        protein_uuid=protein_uuid,
        n=10000,
        sample_seed=42,
    )
    print("Loaded shapes:", binders.shape, nonbinders.shape)
else:
    print("No public catalog rows returned.")
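
Where the SDK is not installed, the cURL call can be approximated with the Python standard library. This is a sketch only: `build_catalog_request` and `fetch_catalog` are hypothetical names, and it assumes the endpoint returns the JSON body shown in the sample response. The URL and the `x-api-key` header are taken from the cURL example.

```python
import json
import urllib.request

CATALOG_URL = "https://api.omtx.ai/v2/datasets/catalog"


def build_catalog_request(api_key: str) -> urllib.request.Request:
    """Build the GET request with the same header as the cURL example."""
    return urllib.request.Request(
        CATALOG_URL,
        headers={"x-api-key": api_key},
        method="GET",
    )


def fetch_catalog(api_key: str, timeout: float = 30.0) -> dict:
    """Send the request and decode the JSON body (performs a network call)."""
    request = build_catalog_request(api_key)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


# Building the request does not touch the network; call fetch_catalog(...) to send it.
req = build_catalog_request("YOUR_API_KEY")
print(req.get_method(), req.full_url)
```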