
Manage datasets

```python
import decentriq_platform as dq

user_email = "@@ YOUR EMAIL HERE @@"
api_token = "@@ YOUR TOKEN HERE @@"
client = dq.create_client(user_email, api_token)
enclave_specs = dq.enclave_specifications.latest()
```
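Hard-coding the token works for a quick test, but you may prefer to read the credentials from environment variables. The sketch below assumes two hypothetical variable names, `DQ_USER_EMAIL` and `DQ_API_TOKEN`. `load_credentials` is an illustrative helper, not part of the SDK. Its results would be passed to `dq.create_client` exactly as above.

```python
import os


def load_credentials() -> tuple[str, str]:
    """Read the Decentriq user email and API token from environment variables.

    DQ_USER_EMAIL and DQ_API_TOKEN are hypothetical names chosen for this sketch.
    """
    names = ("DQ_USER_EMAIL", "DQ_API_TOKEN")
    # Fail early with a readable message instead of sending empty credentials
    missing = [name for name in names if not os.environ.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return os.environ["DQ_USER_EMAIL"], os.environ["DQ_API_TOKEN"]


# Usage (requires the SDK):
# user_email, api_token = load_credentials()
# client = dq.create_client(user_email, api_token)
```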
Setup Script

If you want to test this functionality but don't have a clean room set up yet, you can use the following script to create a suitable environment for the rest of this guide.

```python
import decentriq_platform as dq
from decentriq_platform.analytics import *
from decentriq_platform.keychain import Keychain, KeychainEntry

USER_EMAIL = "@@ YOUR EMAIL HERE @@"
OTHER_EMAIL = "@@ OTHER EMAIL HERE @@"
API_TOKEN = "@@ YOUR TOKEN HERE @@"
KEYCHAIN_PASSWORD = "@@ YOUR KEYCHAIN PASSWORD HERE @@"

client = dq.create_client(USER_EMAIL, API_TOKEN)

# Build an example DCR with one raw (file) data node and one table data node
builder = AnalyticsDcrBuilder(client=client)
dcr_definition = (
    builder
    .with_name("My DCR")
    .with_owner(USER_EMAIL)
    .with_description("My test DCR")
    .add_node_definitions([
        RawDataNodeDefinition(name="my-raw-data-node", is_required=True),
        TableDataNodeDefinition(
            name="my-table-data-node",
            columns=[
                Column(
                    name="name",
                    format_type=FormatType.STRING,
                    is_nullable=False,
                ),
                Column(
                    name="salary",
                    format_type=FormatType.INTEGER,
                    is_nullable=False,
                ),
            ],
            is_required=True,
        ),
    ])
    .add_participant(
        USER_EMAIL,
        data_owner_of=[
            "my-raw-data-node",
            "my-table-data-node",
        ],
    )
    .build()
)
dcr = client.publish_analytics_dcr(dcr_definition)

# Generate an encryption key
encryption_key = dq.Key()

# Read dataset locally, encrypt, upload and provision it to DCR
with open("/path/to/dataset.csv", "rb") as dataset:
    DATASET_ID = client.upload_dataset(
        dataset,
        encryption_key,
        "dataset_name",
    )

# Store the encryption key in the Keychain, referenced by the dataset ID
my_keychain = Keychain.get_or_create_unlocked_keychain(client, bytes(KEYCHAIN_PASSWORD, "utf8"))
my_keychain.insert(KeychainEntry("dataset_key", DATASET_ID, encryption_key.material))
```
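The script expects a CSV file at `/path/to/dataset.csv`. If you only need throwaway test data, the sketch below writes a few rows shaped like the `name`/`salary` columns of `my-table-data-node`. The file location, the helper name `write_sample_dataset`, and the values are all hypothetical. The sketch writes no header row; adjust that if your data node expects one.

```python
import csv
import tempfile
from pathlib import Path


def write_sample_dataset(path: Path) -> Path:
    """Write made-up rows matching the (name: STRING, salary: INTEGER) schema."""
    rows = [("Alice", 52000), ("Bob", 48000), ("Carol", 61000)]  # hypothetical values
    with path.open("w", newline="") as f:
        csv.writer(f).writerows(rows)  # no header row written
    return path


# Pass str(sample_path) in place of "/path/to/dataset.csv" in the setup script
sample_path = write_sample_dataset(Path(tempfile.gettempdir()) / "dataset.csv")
```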

List datasets and get details

```python
datasets = client.get_available_datasets()
# Each such dataset is a dictionary:
dataset_name = datasets[0]["name"]
manifest_hash = datasets[0]["manifestHash"]
client.get_dataset(manifest_hash)
```
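Because get_available_datasets() returns a list of dictionaries, looking up a dataset by name is a plain list search. The sketch below relies only on the `name` and `manifestHash` keys used above. `find_manifest_hash` is a hypothetical helper, not part of the SDK. Dataset names are not assumed to be unique, so it refuses to guess when several datasets share a name.

```python
from typing import Any


def find_manifest_hash(datasets: list[dict[str, Any]], name: str) -> str:
    """Return the manifestHash of the single dataset called `name`."""
    matches = [d["manifestHash"] for d in datasets if d.get("name") == name]
    if not matches:
        raise LookupError(f"No dataset named {name!r}")
    if len(matches) > 1:
        raise LookupError(f"{len(matches)} datasets named {name!r}; pick one by manifestHash")
    return matches[0]


# Usage with the SDK:
# manifest_hash = find_manifest_hash(client.get_available_datasets(), "dataset_name")
# client.get_dataset(manifest_hash)
```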

Deprovision and delete datasets via SDK

Deprovision a Dataset

A deprovisioned dataset can no longer be used in a specific clean room, but remains available to be reprovisioned.

```python
# Deprovision the dataset from the Table or File node
dcr.get_node("my-raw-data-node").remove_published_dataset()
```
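To deprovision datasets from several data nodes of the same clean room, you can repeat the call for each node. The sketch below uses a hypothetical helper, `deprovision_nodes`. It relies only on `get_node(...)` and `remove_published_dataset()` as used above, so any object exposing those methods works.

```python
from typing import Iterable


def deprovision_nodes(dcr, node_names: Iterable[str]) -> list[str]:
    """Remove the published dataset from each named node; return the processed names."""
    done = []
    for node_name in node_names:
        dcr.get_node(node_name).remove_published_dataset()
        done.append(node_name)
    return done


# Usage with the setup script's DCR:
# deprovision_nodes(dcr, ["my-raw-data-node", "my-table-data-node"])
```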

Delete a Dataset

A deleted dataset can no longer be provisioned to any clean room. Clean rooms that previously had the dataset provisioned may still hold a cached copy. Therefore, before deleting a dataset from the Decentriq platform, it is best practice to first deprovision it from every clean room that uses it.

```python
client.delete_dataset(DATASET_ID)
```

The encryption key will remain in the Keychain and must be removed separately. Please check the Keychain guide for more details.

Delete an encryption key

When you delete a dataset from the Decentriq platform with client.delete_dataset(DATASET_ID), its encryption key remains in the Keychain and must be removed separately.

To delete an encryption key from the Keychain, call the remove() method with the entry type ("dataset_key") and the ID of the dataset the key belongs to. This neither deletes the dataset nor deprovisions it from any DCR; for that, follow the instructions above.

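```python
from decentriq_platform.keychain import Keychain

# Unlock the Keychain and remove the key stored for the dataset
my_keychain = Keychain.get_or_create_unlocked_keychain(client, bytes(KEYCHAIN_PASSWORD, "utf8"))
my_keychain.remove("dataset_key", DATASET_ID)
```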
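The steps in this guide can be combined into one cleanup routine that follows the recommended order: deprovision first, then delete the dataset, then remove its key. The sketch below uses a hypothetical helper, `clean_up_dataset`. It assumes you already know every (DCR, node name) pair where the dataset is provisioned, since this guide does not show how to look those up.

```python
def clean_up_dataset(client, keychain, provisioned_in, dataset_id, key_type="dataset_key"):
    """Deprovision a dataset, delete it, then remove its key from the Keychain.

    provisioned_in: iterable of (dcr, node_name) pairs where the dataset is provisioned.
    """
    # 1. Deprovision from every clean room first, as recommended before deletion
    for dcr, node_name in provisioned_in:
        dcr.get_node(node_name).remove_published_dataset()
    # 2. Delete the dataset from the Decentriq platform
    client.delete_dataset(dataset_id)
    # 3. The key stays in the Keychain unless removed explicitly
    keychain.remove(key_type, dataset_id)


# Usage with the setup script's objects:
# clean_up_dataset(client, my_keychain, [(dcr, "my-raw-data-node")], DATASET_ID)
```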

Copy IDs from the Decentriq UI to use in the SDK

To obtain a DCR ID

Access the DCR, click on the … icon in the top-right corner, then Copy ID.


To obtain a Table or File node name

Use the node name exactly as it appears in the UI.

To obtain a dataset ID

Access the Datasets page, locate the dataset and copy the ID displayed at the bottom of the details panel.