Skip to main content

Manage datasets

import decentriq_platform as dq
user_email = "@@ YOUR EMAIL HERE @@"
api_token = "@@ YOUR TOKEN HERE @@"
client = dq.create_client(user_email, api_token)
enclave_specs = dq.enclave_specifications.latest()
Setup Script

If you want to test this functionality and don't have a clean room already set up, you can use this script to create an appropriate environment to test the rest of this guide with.

import decentriq_platform as dq
from decentriq_platform.analytics import *
from decentriq_platform.lookalike_media import LookalikeMediaDcrBuilder

USER_EMAIL = "@@ YOUR EMAIL HERE @@"
OTHER_EMAIL = "@@ OTHER EMAIL HERE @@"
API_TOKEN = "@@ YOUR TOKEN HERE @@"

client = dq.create_client(USER_EMAIL, API_TOKEN)

# build an example DCR
builder = AnalyticsDcrBuilder(client=client)
dcr_definition = builder.\
with_name("My DCR").\
with_owner(USER_EMAIL).\
with_description("My test DCR").\
add_node_definitions([
RawDataNodeDefinition(name="my-raw-data-node", is_required=True),
TableDataNodeDefinition(
name="my-table-data-node",
columns=[
Column(
name="name",
format_type=FormatType.STRING,
is_nullable=False,
),
Column(
name="salary",
format_type=FormatType.INTEGER,
is_nullable=False,
),
],
is_required=True,
),
]).\
add_participant(
USER_EMAIL,
data_owner_of=[
"my-raw-data-node",
"my-table-data-node",
]
).\
build()
dcr = client.publish_analytics_dcr(dcr_definition)
# Generate an encryption key
encryption_key = dq.Key()

# Read dataset locally, encrypt, upload and provision it to DCR
with open("/path/to/dataset.csv", "rb") as dataset:
DATASET_ID = client.upload_dataset(
dataset,
encryption_key,
"dataset_name",
)

List datasets and get details

datasets = client.get_available_datasets()
# Each such dataset is a dictionary:
dataset_name = datasets[0]["name"]
manifest_hash = datasets[0]["manifestHash"]
client.get_dataset(manifest_hash)

Deprovision and delete datasets via SDK

Deprovision a Dataset

A deprovisioned dataset can no longer be used in a specific clean room, but remains available to be reprovisioned.

# Deprovision the dataset from the Table or File node
dcr.get_node("my-raw-data-node").remove_published_dataset()

Delete a Dataset

A deleted dataset can no longer be provisioned to any clean rooms. Clean rooms that previously had the dataset provisioned may still have a copy cached. Therefore, before deleting a dataset from the Decentriq platform, it is best practice to deprovision it from any clean rooms using it first.

client.delete_dataset(DATASET_ID)

Copy IDs from the Decentriq UI to use in the SDK

To obtain a DCR ID

Access the DCR, click on the … icon in the top-right corner, then Copy ID.

Copy DCR ID

To obtain a Table or File node Name

Use the same node name as you see in the UI.

To obtain a dataset ID

Access the Datasets page, locate the dataset and copy the ID displayed at the bottom of the details panel. Copy dataset ID