
Provisioning a new dataset

This guide shows how to provision a dataset to a Media Data Clean Room (DCR) by uploading it directly from a local file.

import decentriq_platform as dq
user_email = "@@ YOUR EMAIL HERE @@"
api_token = "@@ YOUR TOKEN HERE @@"
client = dq.create_client(user_email, api_token)
enclave_specs = dq.enclave_specifications.latest()

Setup Script

If you want to try this functionality and don't already have a clean room set up, you can use the following script to create a suitable environment for the rest of this guide.

import decentriq_platform as dq
from decentriq_platform.media import MediaDcrBuilder

advertiser_email = "@@ YOUR EMAIL HERE @@"
advertiser_api_token = "@@ YOUR TOKEN HERE @@"
publisher_email = "@@ EMAIL OF PUBLISHER PARTICIPANT @@"

advertiser_client = dq.create_client(advertiser_email, advertiser_api_token)

builder = MediaDcrBuilder(client=advertiser_client)
dcr_definition = (
    builder
    .with_name("My DCR")
    .with_insights()
    .with_lookalike()
    .with_retargeting()
    .with_matching_id_format(dq.types.MatchingId.STRING)
    .with_publisher_emails(publisher_email)
    .with_advertiser_emails(advertiser_email)
    .with_agency_emails(["test@agency.com"])
    .with_observer_emails(["test@observer.com"])
    .build()
)
media_dcr = advertiser_client.publish_media_dcr(dcr_definition)
dcr_id = media_dcr.id

Direct Upload

Advertisers can provision their data with the Session method publish_dataset. Using a Keychain is optional, but it makes later operations on the dataset easier. The example upload script below also stores the dataset key in the Keychain, so reprovisioning the data requires only the Keychain password, not the key itself.

import decentriq_platform as dq
from decentriq_platform import Keychain, KeychainEntry

user_email = "@@ YOUR EMAIL HERE @@"
api_token = "@@ YOUR TOKEN HERE @@"
keychain_password = "@@ YOUR KEYCHAIN PASSWORD HERE @@"
dataset_name = "audiences.csv"
dataset_path = "/path/to/advertiser_data.csv"

client = dq.create_client(user_email, api_token)
user_keychain = Keychain.get_or_create_unlocked_keychain(client, bytes(keychain_password, 'utf8'))

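# dcr_id is the ID of the Media DCR to provision to (for example, from the setup script).
data_room_descriptions = {description['id']: description for description in client.get_data_room_descriptions()}
data_room_description = data_room_descriptions[dcr_id]
session = client.create_session_from_data_room_description(data_room_description)

# Create a key and upload the dataset, storing the key in the Keychain.
key = dq.Key()
with open(dataset_path, "rb") as f:
    dataset_id = client.upload_dataset(
        f,
        key,
        dataset_name,
        store_in_keychain=user_keychain
    )
# Replace any existing Keychain entry for this dataset with the current key material.
user_keychain.remove("dataset_key", dataset_id)
user_keychain.insert(KeychainEntry("dataset_key", dataset_id, key.material))

# Provision the uploaded dataset to "audiences" in the DCR.
session.publish_dataset(dcr_id, dataset_id, "audiences", key)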
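
The scripts in this guide hard-code the email, API token, and Keychain password as placeholders. As a minimal sketch of one alternative, the helpers below read these values from environment variables. The variable names (DQ_USER_EMAIL, DQ_API_TOKEN, DQ_KEYCHAIN_PASSWORD) and the helpers read_setting and to_keychain_bytes are assumptions made for illustration; they are not part of decentriq_platform.

```python
import os


def read_setting(name, env=None):
    """Return the non-empty value of `name` from `env` (default: os.environ)."""
    env = os.environ if env is None else env
    value = env.get(name, "").strip()
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set")
    return value


def to_keychain_bytes(password):
    """Encode the password the same way the scripts in this guide do."""
    return bytes(password, "utf8")


# Hypothetical usage, replacing the hard-coded placeholders:
# user_email = read_setting("DQ_USER_EMAIL")
# api_token = read_setting("DQ_API_TOKEN")
# keychain_password = read_setting("DQ_KEYCHAIN_PASSWORD")
```

Failing early with a named variable tends to be easier to diagnose than an authentication error raised later by the client.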

Upload from Keychain

This example provisions a dataset that has already been uploaded to a Media DCR, using the key stored in the Keychain. You do not need to remember the key to reprovision the data; you only need the Keychain password.

import decentriq_platform as dq
from decentriq_platform import Keychain, KeychainEntry

user_email = "@@ YOUR EMAIL HERE @@"
api_token = "@@ YOUR TOKEN HERE @@"
keychain_password = "@@ YOUR KEYCHAIN PASSWORD HERE @@"
dataset_id = "@@ YOUR DATASET ID HERE @@"

client = dq.create_client(user_email, api_token)
user_keychain = Keychain.get_or_create_unlocked_keychain(client, bytes(keychain_password, 'utf8'))
key = dq.Key(user_keychain.get("dataset_key", dataset_id).value)

data_room_descriptions = {description['id']: description for description in client.get_data_room_descriptions()}
data_room_description = data_room_descriptions[dcr_id]
session = client.create_session_from_data_room_description(data_room_description)
session.publish_dataset(dcr_id, dataset_id, "audiences", key)

# Alternate way to call without a session:
# dcr = client.retrieve_media_dcr(dcr_id)
# dcr.get_node("audiences").publish_dataset(
#     dataset_id,
#     key
# )
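
The scripts in this guide look up the data room description by indexing a dictionary with dcr_id, which raises a bare KeyError if the DCR is not visible to the current user. The sketch below wraps the same lookup in a hypothetical helper, find_data_room_description, that reports which IDs were found. It assumes only what the scripts already rely on: each description is a dict with an 'id' entry. String IDs are an additional assumption.

```python
def find_data_room_description(descriptions, dcr_id):
    """Return the description whose 'id' equals dcr_id, or raise LookupError."""
    by_id = {description["id"]: description for description in descriptions}
    try:
        return by_id[dcr_id]
    except KeyError:
        known = ", ".join(sorted(by_id)) or "none"
        raise LookupError(
            f"No data room with id {dcr_id!r}; visible ids: {known}"
        ) from None


# Hypothetical usage with the client from the scripts in this guide:
# data_room_description = find_data_room_description(
#     client.get_data_room_descriptions(), dcr_id
# )
```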

No Keychain

For completeness, this example upload script does the same operation without using the Keychain. You will need to use the same key again to reprovision this data.

import decentriq_platform as dq

user_email = "@@ YOUR EMAIL HERE @@"
api_token = "@@ YOUR TOKEN HERE @@"
dataset_name = "audiences.csv"
dataset_path = "/path/to/advertiser_data.csv"

client = dq.create_client(user_email, api_token)
data_room_descriptions = {description['id']: description for description in client.get_data_room_descriptions()}

data_room_description = data_room_descriptions[dcr_id]
session = client.create_session_from_data_room_description(data_room_description)

key = dq.Key()
with open(dataset_path, "rb") as f:
    dataset_id = client.upload_dataset(f, key, dataset_name)
session.publish_dataset(dcr_id, dataset_id, "audiences", key)
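
Because the same key is needed to reprovision the data, a script that skips the Keychain has to keep the key somewhere itself. The following is a minimal sketch of one option, writing the raw key bytes to a local file that only the owner can read. The helpers save_key_material and load_key_material and the file name are hypothetical. The sketch assumes that key.material holds the raw bytes and that dq.Key accepts those bytes, as the Keychain examples in this guide suggest.

```python
import os
from pathlib import Path


def save_key_material(path, material):
    """Write raw key bytes to `path`, creating the file with owner-only permissions."""
    # The 0o600 mode applies on POSIX systems; Windows largely ignores it.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "wb") as f:
        f.write(material)


def load_key_material(path):
    """Read the raw key bytes back from `path`."""
    return Path(path).read_bytes()


# Hypothetical usage with the script above:
# save_key_material("audiences.key", key.material)
# ...later, to reprovision:
# key = dq.Key(load_key_material("audiences.key"))
```

A file on disk is only as safe as the machine it lives on, so the Keychain is likely the better choice wherever it is available.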