
Uploading and provisioning data

import decentriq_platform as dq
user_email = "@@ YOUR EMAIL HERE @@"
api_token = "@@ YOUR TOKEN HERE @@"
client = dq.create_client(user_email, api_token)
enclave_specs = dq.enclave_specifications.latest()
Setup Script

If you want to try this functionality but don't have a data clean room (DCR) set up yet, the following script creates a suitable environment for the rest of this guide.

import decentriq_platform as dq
from decentriq_platform.analytics import AnalyticsDcrBuilder
from decentriq_platform.analytics import RawDataNodeDefinition
from decentriq_platform.analytics import PythonComputeNodeDefinition
import io

user_email = "@@ YOUR EMAIL HERE @@"
api_token = "@@ YOUR TOKEN HERE @@"


client = dq.create_client(user_email, api_token)
enclave_specs = dq.enclave_specifications.latest()

builder = AnalyticsDcrBuilder(client=client)
builder.\
    with_name("My DCR").\
    with_owner(user_email).\
    with_description("My test DCR")

# Create a `RawDataNodeDefinition` and add it to the DCR right away:
builder.add_node_definition(
    RawDataNodeDefinition(name="my-raw-data-node", is_required=True)
)

builder.add_node_definition(
    PythonComputeNodeDefinition(
        name="python-node",
        script="""
import shutil
shutil.copyfile("/input/my-raw-data-node", "/output/result.txt")
""",
        dependencies=["my-raw-data-node"]
    )
)
builder.add_participant(
    user_email,
    data_owner_of=["my-raw-data-node"],
    analyst_of=["python-node"]
)
dcr_definition = builder.build()
dcr = client.publish_analytics_dcr(dcr_definition)

Import datasets

Making data accessible to the DCR

A dataset is available for use by a DCR once it has been uploaded to the Decentriq Platform and published (or "connected") to a Data Node within that DCR. Using the AnalyticsDcr object returned when the DCR was published (dcr in the setup script), we can obtain a handle on the data node and upload data to it as follows:

import io
from decentriq_platform import Key

key = Key() # generate an encryption key with which to encrypt the dataset
raw_data_node = dcr.get_node("my-raw-data-node")
data = io.BytesIO(b"my-dataset")
raw_data_node.upload_and_publish_dataset(data, key, "my-data.txt")

# For demo purposes we used a BytesIO wrapper around a byte string.
# In a real-world use case, however, you would probably want to read a local file instead.
# In that case, use the following syntax (note the "rb" mode, which opens the file as binary):
#
# with open("local-file.txt", "rb") as data:
#     raw_data_node.upload_and_publish_dataset(data, key, "my-data.txt")
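
When the dataset is produced in code rather than read from disk, the same kind of binary file-like object can be built in memory. The following is a minimal sketch, assuming the upload call accepts any binary stream, as the BytesIO example above suggests. rows_to_binary_stream is a hypothetical helper name and is not part of the Decentriq SDK.

```python
import csv
import io


def rows_to_binary_stream(rows):
    """Serialize rows as UTF-8 CSV and return them as a binary in-memory stream.

    Hypothetical helper for illustration; not part of the Decentriq SDK.
    """
    text_buffer = io.StringIO(newline="")
    writer = csv.writer(text_buffer, lineterminator="\n")
    writer.writerows(rows)
    # Encode to bytes so the stream behaves like a file opened with "rb".
    return io.BytesIO(text_buffer.getvalue().encode("utf-8"))


data = rows_to_binary_stream([["id", "value"], [1, "a"], [2, "b"]])

# The stream can then be passed in place of the BytesIO object used above:
# raw_data_node.upload_and_publish_dataset(data, key, "my-data.csv")
```

Encoding explicitly keeps the byte content independent of the platform's default text encoding and line endings.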

Often it is useful to upload a dataset in a separate step. It can then simply be published to the Data Node using its publish_dataset method:

data = io.BytesIO(b"some-new-data-dataset")
key = Key()

# Upload the dataset to the Decentriq Platform in a separate step
manifest_hash = client.upload_dataset(data, key, "my-new-data.txt")

# Make the dataset available within a DCR.
raw_data_node.publish_dataset(manifest_hash, key)
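
One possible reason to upload separately is to reuse a single upload. The sketch below assumes, without confirmation from this guide, that one manifest hash and key can be published to more than one data node. publish_to_nodes and RecordingNode are hypothetical names; only the publish_dataset(manifest_hash, key) call comes from the example above.

```python
class RecordingNode:
    """Hypothetical stand-in for a data node, used only for a local dry run."""

    def __init__(self, name):
        self.name = name
        self.published = []

    def publish_dataset(self, manifest_hash, key):
        # Record the call instead of contacting the platform.
        self.published.append((manifest_hash, key))


def publish_to_nodes(nodes, manifest_hash, key):
    """Publish one already-uploaded dataset to every node; return the node count."""
    for node in nodes:
        node.publish_dataset(manifest_hash, key)
    return len(nodes)


# Dry run with stand-in nodes and placeholder values.
dry_run_nodes = [RecordingNode("node-a"), RecordingNode("node-b")]
count = publish_to_nodes(dry_run_nodes, "example-hash", b"example-key")

# With real objects this would look like (assumption, see above):
# publish_to_nodes([raw_data_node], manifest_hash, key)
```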