Shard Access (Signed URLs)
Return signed URLs for binder and non-binder Parquet shards so you can rank compounds, train models, and support target decisions with binding and selectivity scores. Access requires either an active dataset subscription or qualifying buyout (public datasets), or a qualifying owned data-generation order for the requested `protein_uuid` (private datasets).
POST /v2/data-access/shards
Return signed URLs for existing Parquet shards. Use the Om SDK or your analytics workflow to load, filter, and analyze shard data.
cURL
curl -X POST https://api.omtx.ai/v2/data-access/shards \
  -H "x-api-key: YOUR_API_KEY" \
  -H "idempotency-key: export-$(date +%s)" \
  -H "Content-Type: application/json" \
  -d '{
    "protein_uuid": "0d64fb6a-8a66-50ad-82b6-fabee8bb1516"
  }'
Response
{
  "protein_uuid": "0d64fb6a-8a66-50ad-82b6-fabee8bb1516",
  "dataset_id": "dataset-public-v5",
  "vintage_id": "2b8a7a1b-6f38-4c8e-9e1d-0d4e6e8f9a12",
  "vintage": "20260201_om1",
  "binders": {
    "urls": [
      {
        "file_path": "part-00000.parquet",
        "url": "SECURE SIGNED URL"
      }
    ]
  },
  "non_binders": {
    "urls": [
      {
        "file_path": "part-00000.parquet",
        "url": "SECURE SIGNED URL"
      }
    ]
  },
  "binder_urls": [
    "SECURE SIGNED URL"
  ],
  "non_binder_urls": [
    "SECURE SIGNED URL"
  ],
  "expires_at": "2026-02-01T00:00:00Z"
}
- Public datasets require an active dataset subscription or qualifying buyout.
- Private datasets require a qualifying owned data-generation order for the requested protein UUID.
- The API returns the newest dataset snapshot available to your account for the requested `protein_uuid`.
- Examples below use the public SOX2 dataset UUID `0d64fb6a-8a66-50ad-82b6-fabee8bb1516`.
- Access follows your active plan, buyout, or qualifying generation order.
- Returns signed URLs for the full shard set in that dataset export.
- Primary SDK path: `client.load_data(...)` loads binders and non-binders together in one call.
- For explicit per-pool control, use `client.load_binders(...)` and `client.load_nonbinders(...)`.
- Use `binding_score` and `selectivity_score` to rank compounds and prioritize candidates.
- The API returns all shard URLs; your SDK decides how many rows to load into modeling workflows.
- If `n` is omitted (or `n=None`) in SDK loaders, the full pool is loaded.
- `client.binders.urls(...)` returns flat `binder_urls` / `non_binder_urls` lists for direct URL handling.
- Signed URLs expire after 60 minutes. Request fresh URLs when needed for long-running jobs.
- The `Idempotency-Key` header is required for POST requests.
- Dataset outputs can be reused in Models Hub workflows and paired with Diligence target research.
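Because signed URLs expire after 60 minutes, a long-running job may want to check `expires_at` before each batch of downloads. The sketch below shows one way to do that with the Python standard library, using the response shape shown above. The helper names `parse_expiry`, `needs_refresh`, and `shard_urls` and the five-minute safety margin are illustrative assumptions, not part of the Om SDK.

```python
from datetime import datetime, timedelta, timezone


def parse_expiry(expires_at: str) -> datetime:
    """Parse an ISO 8601 timestamp such as '2026-02-01T00:00:00Z'."""
    # Older Python versions do not accept a trailing 'Z' in fromisoformat.
    return datetime.fromisoformat(expires_at.replace("Z", "+00:00"))


def needs_refresh(response: dict, now: datetime, margin_minutes: int = 5) -> bool:
    """Return True when the URLs expire within the safety margin (assumed value)."""
    deadline = parse_expiry(response["expires_at"])
    return now + timedelta(minutes=margin_minutes) >= deadline


def shard_urls(response: dict, pool: str) -> list[tuple[str, str]]:
    """Return (file_path, url) pairs for the 'binders' or 'non_binders' pool."""
    return [(item["file_path"], item["url"]) for item in response[pool]["urls"]]


# Placeholder response mirroring the documented shape (the URLs are not real).
example = {
    "binders": {"urls": [{"file_path": "part-00000.parquet", "url": "SECURE SIGNED URL"}]},
    "non_binders": {"urls": [{"file_path": "part-00000.parquet", "url": "SECURE SIGNED URL"}]},
    "expires_at": "2026-02-01T00:00:00Z",
}
now = datetime(2026, 1, 31, 23, 30, tzinfo=timezone.utc)
print(needs_refresh(example, now))  # False: 30 minutes remain
print(shard_urls(example, "binders"))
```

When `needs_refresh` returns True, the job would repeat the POST request (with a new idempotency key) and continue with the fresh URLs.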
Request Parameters
`protein_uuid` (required): Target protein UUID.
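For callers not using the SDK or cURL, the same request can be assembled with the Python standard library. This is a minimal sketch that mirrors the headers and body of the cURL example; the function name `build_shard_request` is hypothetical, and using a random UUID as the idempotency key value is an assumption (the cURL example uses a timestamp instead). The request is built but not sent.

```python
import json
import urllib.request
import uuid

API_URL = "https://api.omtx.ai/v2/data-access/shards"


def build_shard_request(api_key: str, protein_uuid: str) -> urllib.request.Request:
    """Build (but do not send) the POST request for shard access."""
    body = json.dumps({"protein_uuid": protein_uuid}).encode("utf-8")
    headers = {
        "x-api-key": api_key,
        # Assumption: any unique string is acceptable as the key value.
        "idempotency-key": f"export-{uuid.uuid4()}",
        "Content-Type": "application/json",
    }
    return urllib.request.Request(API_URL, data=body, headers=headers, method="POST")


request = build_shard_request("YOUR_API_KEY", "0d64fb6a-8a66-50ad-82b6-fabee8bb1516")
print(request.get_method(), request.full_url)
# To send it:
#   with urllib.request.urlopen(request) as resp:
#       data = json.load(resp)
```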
Python SDK
Combined training load (recommended)
Python
from omtx import OmClient
client = OmClient(api_key="YOUR_API_KEY")
# Load binders and non-binders together in one call.
loaded = client.load_data(
    protein_uuid="0d64fb6a-8a66-50ad-82b6-fabee8bb1516",
    binders=50000,
    nonbinder_multiplier=5,
    sample_seed=42,
)
binders = loaded["binders"]
nonbinders = loaded["nonbinders"]
print("Loaded shapes:", binders.shape, nonbinders.shape)
binders.show(top_n=24)  # defaults: smiles + binding_score
Load binder and non-binder pools directly (recommended)
Python
from omtx import OmClient
client = OmClient(api_key="YOUR_API_KEY")
binders = client.load_binders(
    protein_uuid="0d64fb6a-8a66-50ad-82b6-fabee8bb1516",
    n=1000,
    sample_seed=42,
)
nonbinders = client.load_nonbinders(
    protein_uuid="0d64fb6a-8a66-50ad-82b6-fabee8bb1516",
    n=10000,
    sample_seed=42,
)
# Omit n (or set n=None) to load the full pool.
print("Loaded shapes:", binders.shape, nonbinders.shape)
binders.show(top_n=24)  # defaults: smiles + binding_score
Advanced: request signed URL lists
Python
from omtx import OmClient
client = OmClient(api_key="YOUR_API_KEY")
# Returns flat binder_urls / non_binder_urls lists for direct URL handling.
urls = client.binders.urls(
    protein_uuid="0d64fb6a-8a66-50ad-82b6-fabee8bb1516",
)
print("Binder shards:", len(urls["binder_urls"]))
print("First binder URL:", urls["binder_urls"][0])
Parquet Columns
- smiles
- gene
- uniprot_id
- sequence
- MW
- logP
- PSA
- HBD
- HBA
- binding_score
- selectivity_score
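As an illustration of ranking compounds by `binding_score` and `selectivity_score`, the sketch below sorts plain dictionaries keyed by the column names above. The rows are made-up values for illustration only, `rank_compounds` is a hypothetical helper rather than an SDK function, and the code assumes higher scores are better, which this page does not state.

```python
def rank_compounds(rows, top_n=3):
    """Sort rows by binding_score, then selectivity_score, both descending."""
    ranked = sorted(
        rows,
        # Assumption: higher scores indicate better candidates.
        key=lambda row: (row["binding_score"], row["selectivity_score"]),
        reverse=True,
    )
    return ranked[:top_n]


# Made-up rows for illustration; real rows come from the Parquet shards.
rows = [
    {"smiles": "CCO", "binding_score": 0.42, "selectivity_score": 0.30},
    {"smiles": "c1ccccc1", "binding_score": 0.91, "selectivity_score": 0.55},
    {"smiles": "CC(=O)O", "binding_score": 0.91, "selectivity_score": 0.80},
]
for row in rank_compounds(rows, top_n=2):
    print(row["smiles"], row["binding_score"], row["selectivity_score"])
```

Sorting on a tuple uses `selectivity_score` only to break ties in `binding_score`; a weighted combination of the two is another option, depending on the project's priorities.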