Dataset Catalog
Unified dataset catalog that combines public dataset discovery with the dataset access available to your account. Calling the catalog endpoint does not deduct Wallet Credits.
GET /v2/datasets/catalog
Returns public discovery rows, your generated dataset rows, and the dataset IDs you can download right now.
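The endpoint is a plain authenticated GET, so it can also be called without the SDK. Below is a minimal sketch using only the Python standard library. It reuses the URL and `x-api-key` header from the cURL example and assumes the response body is UTF-8 JSON. `build_catalog_request` and `fetch_catalog` are hypothetical helper names, not part of the SDK.

```python
import json
import urllib.request

# URL and header name taken from the cURL example in this section.
CATALOG_URL = "https://api.omtx.ai/v2/datasets/catalog"


def build_catalog_request(api_key: str) -> urllib.request.Request:
    """Build (but do not send) an authenticated GET request for the catalog."""
    return urllib.request.Request(
        CATALOG_URL,
        headers={"x-api-key": api_key},  # HTTP header names are case-insensitive
        method="GET",
    )


def fetch_catalog(api_key: str, timeout: float = 30.0) -> dict:
    """Send the request and decode the JSON body (assumes UTF-8 JSON)."""
    request = build_catalog_request(api_key)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

For example, `fetch_catalog("YOUR_API_KEY")["catalog"]["count"]` gives the number of public rows. Non-2xx responses are raised as `urllib.error.HTTPError` by `urlopen`.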
cURL
```bash
curl -G https://api.omtx.ai/v2/datasets/catalog \
  -H "x-api-key: YOUR_API_KEY"
```
Response
```json
{
  "catalog": {
    "items": [
      {
        "dataset_id": "dataset-public-v5",
        "protein_uuid": "550e8400-e29b-41d4-a716-446655440000",
        "protein_name": "Breast cancer type 1 susceptibility protein",
        "uniprot_id": "P38398",
        "sequence": null,
        "vintage": "20260201_om1",
        "vintage_id": "9512ba4b-8a67-4310-9100-fcb6fb06cf70",
        "is_public": true,
        "num_data_points": 83143571,
        "num_actives": 450612,
        "num_inactives": 82692959
      }
    ],
    "count": 1
  },
  "data_generated": {
    "items": [
      {
        "protein_uuid": "43d75238-ae2a-5935-8ade-30115565034e",
        "protein_name": "Generated Protein",
        "generation_status": "ready",
        "license_kind": "exclusive",
        "requires_subscription": false,
        "subscription_entitled": true,
        "dataset_id": "dataset-private-v2",
        "latest_vintage_id": "9512ba4b-8a67-4310-9100-fcb6fb06cf70",
        "latest_vintage": "20260201_om1",
        "is_public": false,
        "num_data_points": 83143571,
        "num_actives": 450612,
        "num_inactives": 82692959
      }
    ],
    "count": 1
  },
  "accessible_generated_protein_uuids": [
    "43d75238-ae2a-5935-8ade-30115565034e"
  ],
  "accessible_dataset_ids": [
    "dataset-public-v5",
    "dataset-private-v2"
  ]
}
```
- No Wallet Credits charge for catalog discovery.
- `catalog` contains only public rows (`is_public=true`).
- `data_generated` contains the generated dataset rows you can currently use.
- `accessible_generated_protein_uuids` is the deduplicated list of generated protein UUIDs available to your account right now.
- `accessible_dataset_ids` lists the datasets your account can download right now.
- Public dataset downloads require an active subscription or qualifying buyout.
- Private dataset downloads require a qualifying owned data-generation order for that `protein_uuid`.
- Pass a row's `protein_uuid` to `client.load_binders(...)` and `client.load_nonbinders(...)` (recommended), or use `/v2/data-access/shards` for direct signed-URL access.
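The download rules above can be expressed as a small lookup over the catalog payload. Below is a minimal sketch. It assumes that `accessible_dataset_ids` is the authoritative list of what is downloadable and that a row's `is_public` flag decides which requirement applies. `access_report` is a hypothetical helper, not part of the SDK.

```python
def access_report(payload: dict) -> dict[str, str]:
    """Map each dataset_id in a catalog payload to a short access status."""
    accessible = set(payload.get("accessible_dataset_ids", []))
    rows = payload["catalog"]["items"] + payload["data_generated"]["items"]
    report: dict[str, str] = {}
    for row in rows:
        dataset_id = row.get("dataset_id")
        if dataset_id is None:
            continue  # skip rows that carry no dataset ID
        if dataset_id in accessible:
            report[dataset_id] = "downloadable"
        elif row["is_public"]:
            # Public datasets: active subscription or qualifying buyout
            report[dataset_id] = "needs subscription or buyout"
        else:
            # Private datasets: owned data-generation order for the protein_uuid
            report[dataset_id] = "needs owned generation order"
    return report
```

Checking the report before calling the SDK loaders gives a clearer message than a failed download.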
Response Fields
- `catalog.items[]`: Public discovery rows (`is_public=true`) with dataset metadata from `datasets`.
- `catalog.count`: Number of public discovery rows.
- `data_generated.items[]`: Generated dataset rows currently available to your account.
- `data_generated.count`: Number of generated dataset rows.
- `accessible_generated_protein_uuids[]`: Deduplicated list of generated protein UUIDs available to your account.
- `accessible_dataset_ids[]`: Dataset IDs you can download with `/v2/data-access/shards`.
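The relationships between these fields can be checked client-side, for example in a test against a recorded response. Below is a minimal sketch, and `check_catalog` is a hypothetical helper. The last check assumes that `accessible_generated_protein_uuids` contains exactly the `protein_uuid` values of `data_generated.items`. The field descriptions suggest this but do not state it outright.

```python
def check_catalog(payload: dict) -> list[str]:
    """Return consistency problems found in a catalog payload (empty if none)."""
    problems: list[str] = []
    for section in ("catalog", "data_generated"):
        items = payload[section]["items"]
        count = payload[section]["count"]
        if count != len(items):
            problems.append(f"{section}.count ({count}) != len({section}.items) ({len(items)})")
    # catalog.items should hold only public rows
    if not all(row["is_public"] for row in payload["catalog"]["items"]):
        problems.append("catalog.items contains a non-public row")
    listed = payload["accessible_generated_protein_uuids"]
    if len(listed) != len(set(listed)):
        problems.append("accessible_generated_protein_uuids has duplicates")
    # Assumption: the UUID list mirrors the generated rows exactly
    generated = {row["protein_uuid"] for row in payload["data_generated"]["items"]}
    if set(listed) != generated:
        problems.append("accessible_generated_protein_uuids differs from data_generated rows")
    return problems
```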
Python SDK
```python
from omtx import OmClient

client = OmClient(api_key="YOUR_API_KEY")

# Get the unified catalog payload (no Wallet Credits charge)
catalog = client.datasets.catalog()
print(f"Public rows: {catalog['catalog']['count']}")
print(f"Generated rows: {catalog['data_generated']['count']}")
print(f"Generated protein UUIDs: {len(catalog['accessible_generated_protein_uuids'])}")
print(f"Accessible dataset IDs: {len(catalog['accessible_dataset_ids'])}")

# Recommended next step: load binder/non-binder pools for a catalog protein UUID.
# Public datasets require an active subscription or qualifying buyout.
items = catalog["catalog"]["items"]
if not items:
    raise SystemExit("No public catalog rows returned")
protein_uuid = items[0]["protein_uuid"]

binders = client.load_binders(
    protein_uuid=protein_uuid,
    n=1000,
    sample_seed=42,  # fixed seed for reproducible sampling
)
nonbinders = client.load_nonbinders(
    protein_uuid=protein_uuid,
    n=10000,
    sample_seed=42,
)
print("Loaded shapes:", binders.shape, nonbinders.shape)
```
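The SDK example uses the first public row, which may not be downloadable for your account. Below is a minimal sketch for choosing a row first. It assumes `client.datasets.catalog()` returns the same dict shape as the REST response. `pick_protein_uuid` and the most-actives heuristic are illustrative choices, not part of the SDK.

```python
def pick_protein_uuid(payload: dict) -> str:
    """Return the protein_uuid of the downloadable row with the most actives."""
    accessible = set(payload["accessible_dataset_ids"])
    rows = [
        row
        for row in payload["catalog"]["items"] + payload["data_generated"]["items"]
        if row.get("dataset_id") in accessible
    ]
    if not rows:
        raise LookupError("no downloadable dataset rows in the catalog payload")
    # Hypothetical heuristic: prefer the dataset with the largest active pool
    best = max(rows, key=lambda row: row["num_actives"])
    return best["protein_uuid"]
```

With the client from the example, `client.load_binders(protein_uuid=pick_protein_uuid(catalog), n=1000, sample_seed=42)` would then load binders for the chosen protein.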