decentriq_platform.data_lab

Sub-modules

  • decentriq_platform.data_lab.check_data_lab_type

Functions

parse_statistics_result

def parse_statistics_result(
json_str: str,
) -> Union[decentriq_platform.data_lab.api.StatisticsResult, decentriq_platform.data_lab.api.StatisticsResultV2]

Parse a statistics result JSON string into the appropriate version.

Automatically detects whether the result is V1 or V2 format based on the presence of distinguishing fields.

Parameters:

  • json_str: The JSON string to parse

Returns:

  • Either StatisticsResult (V1) or StatisticsResultV2 depending on the format
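
As a rough illustration of detecting a format from the presence of distinguishing fields, the sketch below inspects the decoded JSON for a marker key. The helper `detect_statistics_version` and the key name `identifiers` are hypothetical; the fields the library actually checks are not listed in this reference.

```python
import json

# Hypothetical marker key, assumed here to appear only in V2 payloads.
V2_MARKER_FIELDS = {"identifiers"}


def detect_statistics_version(json_str: str) -> str:
    """Return "v2" if any assumed V2-only field is present, else "v1"."""
    payload = json.loads(json_str)
    if not isinstance(payload, dict):
        raise ValueError("expected a JSON object")
    # intersection() iterates over the dictionary's keys.
    return "v2" if V2_MARKER_FIELDS.intersection(payload) else "v1"
```

In real code, call parse_statistics_result directly and branch on the type of the returned object rather than re-implementing the detection.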

Classes

ComputeJob

ComputeJob(
dcr_id: str,
action: DataRoomComputeAction,
result_zip_file_name: str,
client: Client,
)

Abstract base class for compute jobs in archv2 DCRs.

Initialize a compute job.

Parameters:

  • dcr_id: The identifier for the DCR or DataLab
  • action: The compute action to perform
  • result_zip_file_name: The name of the file in the zip to get the result from
  • client: The client instance for API communication

Ancestors (in MRO)

  • abc.ABC

Descendants

  • decentriq_platform.data_lab.compute_job.DataLabStatisticsJob
  • decentriq_platform.data_lab.compute_job.DataLabValidationStatusJob
  • decentriq_platform.data_lab.compute_job.ValidationReportJob
  • decentriq_platform.media.compute_job.MediaAudienceUserListJob
  • decentriq_platform.media.compute_job.MediaDataAttributesJob
  • decentriq_platform.media.compute_job.MediaEstimateAudienceSizeJob
  • decentriq_platform.media.compute_job.MediaGetAudiencesJob
  • decentriq_platform.media.compute_job.MediaGetCustomAudiencesJob
  • decentriq_platform.media.compute_job.MediaGetSeedAudiencesJob
  • decentriq_platform.media.compute_job.MediaInsightsJob
  • decentriq_platform.media.compute_job.MediaLookalikeAudienceStatisticsJob
  • decentriq_platform.media.compute_job.MediaMatchingValidationReportJob
  • decentriq_platform.media.compute_job.MediaModelQualityReportJob
  • decentriq_platform.media.compute_job.MediaOverlapStatisticsJob
  • decentriq_platform.media.compute_job.ValidationReportJob

download_result

def download_result(
self,
task_result_hash: str,
) -> io.RawIOBase

Download the result of the compute job.

Parameters:

  • task_result_hash: The hash of the task result to download

Returns:

  • The result of the compute job

get_result

def get_result(
self,
) -> bytes

Get the result of the compute job.

Returns:

  • The content of the result file as bytes. If the job failed, an Exception will be raised.

get_result_as_zipfile

def get_result_as_zipfile(
self,
) -> zipfile.ZipFile

Get the result of the compute job as a zipfile.ZipFile.

Returns:

  • The content of the result file as a zipfile.ZipFile. If the job failed, an Exception will be raised.
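
To show how a single named file (compare result_zip_file_name in the constructor) can be pulled out of a zip archive, here is a minimal sketch using only the standard library. The helper `read_file_from_zip` and the file name `report.json` are hypothetical, and the in-memory archive merely stands in for a job result; how the library extracts the file internally is not documented here.

```python
import io
import zipfile


def read_file_from_zip(zip_bytes: bytes, file_name: str) -> bytes:
    """Open an in-memory zip archive and return one member's content."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        return archive.read(file_name)


# Build a small archive in memory to stand in for a job result.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("report.json", '{"ok": true}')

content = read_file_from_zip(buffer.getvalue(), "report.json")
```

With the ZipFile object returned by get_result_as_zipfile, the equivalent step would be a call to its read or open method.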

is_complete

def is_complete(
self,
) -> bool

Check if the compute job is complete.

Returns:

  • True if the job is complete, False otherwise

result

def result(
self,
) -> Any

Get the parsed result of the compute job.

Each job subclass should implement this method and return the appropriate type which represents the job result.

run

def run(
self,
) -> None

Run the compute job.

Raises:

  • Exception: If the computation has already been run

wait_for_completion

def wait_for_completion(
self,
timeout: Optional[int] = None,
sleep_interval: int = 1,
) -> Self

Wait for the compute job to complete.

Parameters:

  • timeout: The maximum time to wait for the job to complete, in seconds
  • sleep_interval: The interval to wait between checks, in seconds

Returns:

  • The compute job
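
The timeout and sleep_interval parameters suggest a polling loop. The sketch below shows one way such a loop could look; it is not the library's implementation. The helper `wait_until` is hypothetical, and raising TimeoutError on expiry is an assumption, since this reference does not say what happens when the timeout elapses.

```python
import time
from typing import Callable, Optional


def wait_until(
    is_complete: Callable[[], bool],
    timeout: Optional[int] = None,
    sleep_interval: int = 1,
) -> None:
    """Poll is_complete() until it returns True or the timeout elapses."""
    # A monotonic clock is unaffected by system clock adjustments.
    deadline = None if timeout is None else time.monotonic() + timeout
    while not is_complete():
        if deadline is not None and time.monotonic() >= deadline:
            raise TimeoutError("job did not complete in time")
        time.sleep(sleep_interval)
```

Passing timeout=None makes the loop wait indefinitely, which matches the default in the signature above.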

DataLab

DataLab(
client: decentriq_platform.client.Client,
cfg: decentriq_platform.data_lab.data_lab.DataLabConfig,
existing_data_lab: Optional[decentriq_platform.data_lab.data_lab.ExistingDataLab] = None,
)

DataLab v2 implementation using the archv2 architecture.

This class inherits from DataLabInterface, ensuring API compatibility with the legacy DataLab implementation.

Ancestors (in MRO)

  • decentriq_platform.data_lab.data_lab_interface.DataLabInterface
  • abc.ABC

Static methods

is_data_lab_validated

def is_data_lab_validated(
client_v2: decentriq_platform.archv2.client.ClientV2,
data_lab_id: str,
verification_key: bytes,
) -> bool

Check if the DataLab has been validated (all required jobs completed successfully).

Instance variables

id: str : Alias for data_lab_id for convenience.

deprovision_dataset

def deprovision_dataset(
self,
dataset_type: decentriq_platform.types.DataLabDatasetType,
)

Deprovision a single dataset from the DataLab.

Parameters:

  • dataset_type: The type of the dataset.

get

def get(
self,
) -> decentriq_platform.archv2.client.DataLabV2

Get the DataLab definition from the backend.

Returns:

  • The DataLab definition

get_statistics_report

def get_statistics_report(
self,
timeout: Optional[int] = None,
) -> Dict[str, Any]

Get the statistics report, waiting for completion if necessary.

If run() was called previously, this will wait for and return results from that job. Otherwise, it will start a new statistics job.

Parameters:

  • timeout: The maximum time to wait for the job to complete, in seconds.

Returns:

  • The statistics result for the DataLab datasets

get_validation_report

def get_validation_report(
self,
timeout: Optional[int] = None,
) -> Dict[str, Any]

Get the validation report, waiting for completion if necessary.

If run() was called previously, this will wait for and return results from those jobs. Otherwise, it will start new validation jobs.

Parameters:

  • timeout: The maximum time to wait for the jobs to complete, in seconds.

Returns:

  • The validation reports for all datasets as a dictionary

get_validation_status

def get_validation_status(
self,
timeout: Optional[int] = None,
) -> Dict[str, Any]

Get the validation status, which checks that all validation reports passed and that identifiers and segments datasets are not empty after validation.

Parameters:

  • timeout: The maximum time to wait for the job to complete, in seconds.

Returns:

  • A dictionary with "status" ("SUCCESS" or "FAILED") and per-dataset details.
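
A caller will typically branch on the top-level "status" entry. The sketch below relies only on that documented key; the helper `validation_succeeded` is hypothetical, and the layout of the per-dataset details is not assumed.

```python
from typing import Any, Dict


def validation_succeeded(status_report: Dict[str, Any]) -> bool:
    """Return True only when the top-level "status" entry is "SUCCESS"."""
    return status_report.get("status") == "SUCCESS"


# Hand-written stand-in for a returned dictionary; only "status" is documented.
example_status = {"status": "FAILED"}
print(validation_succeeded(example_status))
```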

is_statistics_shared_with_dq

def is_statistics_shared_with_dq(
self,
) -> bool

Check whether or not the statistics are shared with DQ.

is_validation_passed

def is_validation_passed(
self,
validation_report: Dict[str, Any],
) -> bool

Check whether or not DataLab validation has passed.

Parameters:

  • validation_report: Result of calling get_validation_report on this DataLab.

provision_dataset

def provision_dataset(
self,
manifest_hash: str,
key: decentriq_platform.storage.Key,
dataset_type: decentriq_platform.types.DataLabDatasetType,
)

Provision a single dataset to the DataLab.

Parameters:

  • manifest_hash: The manifest hash of the uploaded dataset.
  • key: The key used to encrypt the dataset.
  • dataset_type: The type of the dataset.

provision_local_datasets

def provision_local_datasets(
self,
key: decentriq_platform.storage.Key,
matching_data_path: str,
segments_data_path: Optional[str] = None,
demographics_data_path: Optional[str] = None,
embeddings_data_path: Optional[str] = None,
*,
secret_store_options: Optional[decentriq_platform.client.SecretStoreOptions] = None,
)

Upload local datasets and provision to the DataLab.

Parameters:

  • key: The key used to encrypt the dataset.
  • matching_data_path: The file path to the "matching" dataset.
  • segments_data_path: The file path to the "segments" dataset.
  • demographics_data_path: The file path to the "demographics" dataset.
  • embeddings_data_path: The file path to the "embeddings" dataset.
  • secret_store_options: Optional secret store configuration.

run

def run(
self,
)

Start the DataLab's validation jobs and statistics job. This function does not block while the jobs run; call get_validation_report or get_statistics_report to wait for completion and retrieve the results.
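
The non-blocking run followed by blocking report calls can be sketched as a small helper. It uses only method names documented in this section and accepts any object that provides them; the helper `run_and_collect`, the RuntimeError on failed validation, and the shape of the returned dictionary are illustrative assumptions.

```python
from typing import Any, Dict, Optional


def run_and_collect(data_lab: Any, timeout: Optional[int] = None) -> Dict[str, Any]:
    """Start the jobs, then block on each report in turn."""
    data_lab.run()  # does not block; the jobs are only kicked off
    validation_report = data_lab.get_validation_report(timeout=timeout)
    if not data_lab.is_validation_passed(validation_report):
        raise RuntimeError("DataLab validation failed")
    statistics = data_lab.get_statistics_report(timeout=timeout)
    return {"validation": validation_report, "statistics": statistics}
```

Checking validation before fetching statistics is a design choice of this sketch, not a requirement stated by the reference.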

DataLabBuilder

DataLabBuilder(
client: decentriq_platform.client.Client,
)

A helper class to build a Data Lab.

build

def build(
self,
) -> decentriq_platform.data_lab.data_lab_interface.DataLabInterface

Build the DataLab.

from_existing

def from_existing(
self,
data_lab_id: str,
) -> Self

Construct a new DataLab from an existing DataLab with the given ID.

Parameters:

  • data_lab_id: The ID of the existing DataLab.

with_collaboration_types

def with_collaboration_types(
self,
collaboration_types: list[decentriq_platform.media.media.CollaborationType],
) -> Self

Set the collaboration types for the DataLab.

with_demographics

def with_demographics(
self,
) -> Self

Enable demographics in the DataLab.

with_disable_drop_invalid_rows

def with_disable_drop_invalid_rows(
self,
)

Disable dropping of invalid rows in the Data Lab.

with_embeddings

def with_embeddings(
self,
num_embeddings: int,
) -> Self

Enable embeddings in the DataLab.

Parameters:

  • num_embeddings: The number of embeddings the DataLab should use.

with_identifiers_config

def with_identifiers_config(
self,
identifiers_config: list[decentriq_platform.data_lab.configs.IdentifiersConfig],
) -> Self

Set the identifiers config.

Parameters:

  • identifiers_config: The identifiers config to use.

with_matching_id_format

def with_matching_id_format(
self,
matching_id: decentriq_platform.types.MatchingId,
) -> Self

Set the matching ID format.

Parameters:

  • matching_id: The type of matching ID to use.

with_name

def with_name(
self,
name: str,
) -> Self

Set the name of the DataLab.

Parameters:

  • name: Name to be used for the DataLab.

with_num_identifiers_columns

def with_num_identifiers_columns(
self,
num_identifiers_columns: int,
) -> Self

Set the expected number of columns in the identifiers dataset.

Parameters:

  • num_identifiers_columns: The number of columns in the identifiers dataset.

with_segments

def with_segments(
self,
) -> Self

Enable segments in the DataLab.
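
Because most DataLabBuilder setters return Self, they can be chained before calling build. The sketch below accepts any builder object exposing the documented methods; the helper `build_data_lab` and its `num_embeddings` default are hypothetical. It does not chain with_disable_drop_invalid_rows, since that signature shows no return annotation.

```python
from typing import Any


def build_data_lab(builder: Any, name: str, num_embeddings: int = 0) -> Any:
    """Chain the with_* setters that return the builder, then build."""
    builder = builder.with_name(name).with_segments().with_demographics()
    if num_embeddings > 0:
        # Embeddings are enabled only when a positive count is requested.
        builder = builder.with_embeddings(num_embeddings)
    return builder.build()
```

A typical call would pass a DataLabBuilder constructed with a client, for example build_data_lab(DataLabBuilder(client), "my lab"), where client is an existing Client instance.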

DataLabConfig

DataLabConfig(
name: str,
has_demographics: bool,
has_embeddings: bool,
num_embeddings: int,
has_segments: bool,
num_identifiers_columns: int,
identifiers_config: Optional[list[decentriq_platform.data_lab.configs.IdentifiersConfig]] = None,
matching_id: Optional[decentriq_platform.types.MatchingId] = None,
collaboration_types: Optional[list[decentriq_platform.media.media.CollaborationType]] = None,
force_spark_validation: bool = False,
drop_invalid_rows: bool = True,
share_statistics: bool = False,
)

DataLabDemographicsValidationReportJob

DataLabDemographicsValidationReportJob(
data_lab_id: str,
client: decentriq_platform.client.Client,
)

A compute job for retrieving demographics validation reports from a DataLab v2.

Initialize a DataLabDemographicsValidationReportJob instance.

Parameters:

  • data_lab_id: The identifier for the DataLab
  • client: The client instance for API communication

Ancestors (in MRO)

  • decentriq_platform.data_lab.compute_job.ValidationReportJob
  • decentriq_platform.archv2.compute_job.ComputeJob
  • abc.ABC

result

def result(
self,
) -> decentriq_platform.media.api.ValidationReport

Get the result of the DataLabDemographicsValidationReportJob.

Returns:

  • The validation report

DataLabEmbeddingsValidationReportJob

DataLabEmbeddingsValidationReportJob(
data_lab_id: str,
client: decentriq_platform.client.Client,
)

A compute job for retrieving embeddings validation reports from a DataLab v2.

Initialize a DataLabEmbeddingsValidationReportJob instance.

Parameters:

  • data_lab_id: The identifier for the DataLab
  • client: The client instance for API communication

Ancestors (in MRO)

  • decentriq_platform.data_lab.compute_job.ValidationReportJob
  • decentriq_platform.archv2.compute_job.ComputeJob
  • abc.ABC

result

def result(
self,
) -> decentriq_platform.media.api.ValidationReport

Get the result of the DataLabEmbeddingsValidationReportJob.

Returns:

  • The validation report

DataLabIdentifiersValidationReportJob

DataLabIdentifiersValidationReportJob(
data_lab_id: str,
client: decentriq_platform.client.Client,
)

A compute job for retrieving identifiers validation reports from a DataLab v2.

Initialize a DataLabIdentifiersValidationReportJob instance.

Parameters:

  • data_lab_id: The identifier for the DataLab
  • client: The client instance for API communication

Ancestors (in MRO)

  • decentriq_platform.data_lab.compute_job.ValidationReportJob
  • decentriq_platform.archv2.compute_job.ComputeJob
  • abc.ABC

result

def result(
self,
) -> decentriq_platform.media.api.IdentifiersValidationReport

Get the result of the DataLabIdentifiersValidationReportJob.

Returns:

  • The validation report

DataLabInterface

DataLabInterface(
)

Abstract base class defining the public API for DataLab implementations.

Both the new DataLab v2 and the legacy DataLab inherit from this interface, ensuring API compatibility between implementations.

Ancestors (in MRO)

  • abc.ABC

Descendants

  • decentriq_platform.data_lab.data_lab.DataLab

deprovision_dataset

def deprovision_dataset(
self,
dataset_type: decentriq_platform.types.DataLabDatasetType,
) -> None

Deprovision a single dataset from the DataLab.

Parameters:

  • dataset_type: The type of the dataset.

get_statistics_report

def get_statistics_report(
self,
timeout: Optional[int] = None,
) -> Dict[str, Any]

Retrieve the statistics report.

This function will block until the report is ready unless a timeout is specified.

Parameters:

  • timeout: Maximum time to wait (in seconds) for the statistics report to become available.

Returns:

  • The statistics result for the DataLab datasets.

get_validation_report

def get_validation_report(
self,
timeout: Optional[int] = None,
) -> Dict[str, Any]

Retrieve the validation report.

This function will block until the report is ready unless a timeout is specified.

Parameters:

  • timeout: Maximum time to wait (in seconds) for the validation report to become available.

Returns:

  • The validation reports for all datasets as a dictionary.

is_validation_passed

def is_validation_passed(
self,
validation_report: Dict[str, Any],
) -> bool

Check whether DataLab validation has passed.

Parameters:

  • validation_report: Result of calling get_validation_report on this DataLab.

Returns:

  • True if validation passed, False otherwise.

provision_dataset

def provision_dataset(
self,
manifest_hash: str,
key: decentriq_platform.storage.Key,
dataset_type: decentriq_platform.types.DataLabDatasetType,
) -> None

Provision a single dataset to the DataLab.

Parameters:

  • manifest_hash: The manifest hash of the uploaded dataset.
  • key: The key used to encrypt the dataset.
  • dataset_type: The type of the dataset.

provision_local_datasets

def provision_local_datasets(
self,
key: decentriq_platform.storage.Key,
matching_data_path: str,
segments_data_path: Optional[str] = None,
demographics_data_path: Optional[str] = None,
embeddings_data_path: Optional[str] = None,
*,
secret_store_options: Optional[decentriq_platform.client.SecretStoreOptions] = None,
) -> None

Upload local datasets and provision to the DataLab.

Parameters:

  • key: The key used to encrypt the dataset.
  • matching_data_path: The file path to the "matching" dataset.
  • segments_data_path: The file path to the "segments" dataset.
  • demographics_data_path: The file path to the "demographics" dataset.
  • embeddings_data_path: The file path to the "embeddings" dataset.
  • secret_store_options: Optional secret store configuration.

run

def run(
self,
/,
*,
dry_run: Optional[decentriq_platform.types.DryRunOptions] = None,
parameters: Optional[Mapping[str, str]] = None,
) -> None

Run the DataLab validation and statistics jobs.

This function kicks off the validation and statistics computation jobs. It does not block waiting for the results. Instead, call get_validation_report or get_statistics_report to retrieve results.

Parameters:

  • dry_run: Optional dry run configuration.
  • parameters: Optional parameters for the computation.

DataLabSegmentsValidationReportJob

DataLabSegmentsValidationReportJob(
data_lab_id: str,
client: decentriq_platform.client.Client,
)

A compute job for retrieving segments validation reports from a DataLab v2.

Initialize a DataLabSegmentsValidationReportJob instance.

Parameters:

  • data_lab_id: The identifier for the DataLab
  • client: The client instance for API communication

Ancestors (in MRO)

  • decentriq_platform.data_lab.compute_job.ValidationReportJob
  • decentriq_platform.archv2.compute_job.ComputeJob
  • abc.ABC

result

def result(
self,
)> decentriq_platform.media.api.ValidationReport

Get the result of the DataLabSegmentsValidationReportJob.

Returns:

  • The validation report

DataLabStatisticsJob

DataLabStatisticsJob(
data_lab_id: str,
client: decentriq_platform.client.Client,
)

A compute job for retrieving publisher data statistics from a DataLab v2.

Initialize a DataLabStatisticsJob instance.

Parameters:

  • data_lab_id: The identifier for the DataLab
  • client: The client instance for API communication

Ancestors (in MRO)

  • decentriq_platform.archv2.compute_job.ComputeJob
  • abc.ABC

result

def result(
self,
) -> Union[decentriq_platform.data_lab.api.StatisticsResult, decentriq_platform.data_lab.api.StatisticsResultV2]

Get the result of the DataLabStatisticsJob.

Automatically returns the appropriate version (V1 or V2) based on the format of the result data.

Returns:

  • The statistics result (either StatisticsResult or StatisticsResultV2)

DataLabValidationReportsJob

DataLabValidationReportsJob(
data_lab_id: str,
client: decentriq_platform.client.Client,
has_segments: bool = True,
has_demographics: bool = False,
has_embeddings: bool = False,
)

A class to encapsulate the various validation report jobs for a DataLab v2.

This class does not inherit from ComputeJob because it encapsulates multiple jobs rather than being a single job itself.

Initialize a DataLabValidationReportsJob instance.

Parameters:

  • data_lab_id: The identifier for the DataLab
  • client: The client instance for API communication
  • has_segments: Whether the DataLab has segments data
  • has_demographics: Whether the DataLab has demographics data
  • has_embeddings: Whether the DataLab has embeddings data

is_complete

def is_complete(
self,
) -> bool

Check if all validation jobs are complete.

Returns:

  • True if all jobs are complete, False otherwise

result

def result(
self,
) -> decentriq_platform.data_lab.validation_reports.ValidationReports

Get the result of the DataLabValidationReportsJob.

Returns:

  • The validation reports

run

def run(
self,
) -> None

Run all validation jobs.

wait_for_completion

def wait_for_completion(
self,
timeout: Optional[int] = None,
sleep_interval: int = 1,
) -> Self

Wait for all validation jobs to complete.

Parameters:

  • timeout: The maximum time to wait for the jobs to complete, in seconds
  • sleep_interval: The interval to wait between checks, in seconds

Returns:

  • The compute job
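
The idea of wrapping several jobs behind one run and is_complete pair, as DataLabValidationReportsJob is described as doing, can be sketched with a small composite class. `Job` and `JobGroup` are hypothetical stand-ins written for this illustration and do not reflect the library's internals.

```python
from typing import List, Protocol


class Job(Protocol):
    """Minimal shape of a job as used by this sketch."""

    def run(self) -> None: ...

    def is_complete(self) -> bool: ...


class JobGroup:
    """Treat several jobs as one unit without being a job subclass itself."""

    def __init__(self, jobs: List[Job]) -> None:
        self._jobs = list(jobs)

    def run(self) -> None:
        # Start every wrapped job in order.
        for job in self._jobs:
            job.run()

    def is_complete(self) -> bool:
        # The group is complete only when every wrapped job is complete.
        return all(job.is_complete() for job in self._jobs)
```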

IdType

IdType(
*args,
**kwds,
)

The type of identifier.

Members:

- STRING
- EMAIL
- HASHED_EMAIL
- PHONE_NUMBER
- HASHED_PHONE_NUMBER
- MAID
- HASHED_MAID
- UTIQ_MARTECH
- UTIQ_ADTECH
- FIRST_ID
- ID5
- NET_ID
- ONE_ID
- IPV4_ADDRESS

Ancestors (in MRO)

  • builtins.str
  • enum.Enum

IdentifiersConfig

IdentifiersConfig(
id_name: str,
id_type: decentriq_platform.types.IdType,
is_matching_id: bool,
is_activation_id: bool,
)

to_dict

def to_dict(
self,
) -> Dict[str, Any]

StatisticsResult

StatisticsResult(
**data: Any,
)

Result of the compute statistics job (Legacy/V1).

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

Ancestors (in MRO)

  • pydantic.main.BaseModel

StatisticsResultV2

StatisticsResultV2(
**data: Any,
)

Result of the compute statistics job (V2).

This is the new structure that provides comprehensive statistics for data lab analysis including identifiers, segments, demographics, and embeddings.

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

Ancestors (in MRO)

  • pydantic.main.BaseModel

ValidationReport

ValidationReport(
**data: Any,
)

A base class for creating Pydantic models.

Attributes:

  • __class_vars__: The names of the class variables defined on the model.
  • __private_attributes__: Metadata about the private attributes of the model.
  • __signature__: The synthesized `__init__` signature (inspect.Signature) of the model.
  • __pydantic_complete__: Whether model building is completed, or if there are still undefined fields.
  • __pydantic_core_schema__: The core schema of the model.
  • __pydantic_custom_init__: Whether the model has a custom `__init__` function.
  • __pydantic_decorators__: Metadata containing the decorators defined on the model. This replaces `Model.__validators__` and `Model.__root_validators__` from Pydantic V1.
  • __pydantic_generic_metadata__: A dictionary containing metadata about generic Pydantic models. The `origin` and `args` items map to the `__origin__` and `__args__` attributes of generic aliases, and the `parameter` item maps to the `__parameter__` attribute of generic classes.
  • __pydantic_parent_namespace__: Parent namespace of the model, used for automatic rebuilding of models.
  • __pydantic_post_init__: The name of the post-init method for the model, if defined.
  • __pydantic_root_model__: Whether the model is a `RootModel` (pydantic.root_model.RootModel).
  • __pydantic_serializer__: The `pydantic-core` `SchemaSerializer` used to dump instances of the model.
  • __pydantic_validator__: The `pydantic-core` `SchemaValidator` used to validate instances of the model.
  • __pydantic_fields__: A dictionary of field names and their corresponding `FieldInfo` (pydantic.fields.FieldInfo) objects.
  • __pydantic_computed_fields__: A dictionary of computed field names and their corresponding `ComputedFieldInfo` (pydantic.fields.ComputedFieldInfo) objects.
  • __pydantic_extra__: A dictionary containing extra values, if `extra` (pydantic.config.ConfigDict.extra) is set to `'allow'`.
  • __pydantic_fields_set__: The names of fields explicitly set during instantiation.
  • __pydantic_private__: Values of private attributes set on the model instance.

Create a new model by parsing and validating input data from keyword arguments.

Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.

self is explicitly positional-only to allow self as a field name.

Ancestors (in MRO)

  • pydantic.main.BaseModel

ValidationReportJob

ValidationReportJob(
data_lab_id: str,
action: decentriq_dcr_compiler._schemas.data_room_compute_action.DataRoomComputeAction,
client: decentriq_platform.client.Client,
)

A compute job for retrieving validation reports from a DataLab v2.

Initialize a ValidationReportJob instance.

Parameters:

  • data_lab_id: The identifier for the DataLab
  • action: The compute action to perform
  • client: The client instance for API communication

Ancestors (in MRO)

  • decentriq_platform.archv2.compute_job.ComputeJob
  • abc.ABC

Descendants

  • decentriq_platform.data_lab.compute_job.DataLabDemographicsValidationReportJob
  • decentriq_platform.data_lab.compute_job.DataLabEmbeddingsValidationReportJob
  • decentriq_platform.data_lab.compute_job.DataLabIdentifiersValidationReportJob
  • decentriq_platform.data_lab.compute_job.DataLabSegmentsValidationReportJob

get_validation_report

def get_validation_report(
self,
) -> decentriq_platform.media.api.ValidationReport

Get the validation report.

Returns:

  • The validation report

ValidationReports

ValidationReports(
identifiers: decentriq_platform.media.api.IdentifiersValidationReport,
segments: Optional[decentriq_platform.media.api.ValidationReport] = None,
demographics: Optional[decentriq_platform.media.api.ValidationReport] = None,
embeddings: Optional[decentriq_platform.media.api.ValidationReport] = None,
)

A class to represent the validation reports for a DataLab v2.

Initialize a ValidationReports instance.

Parameters:

  • identifiers: The identifiers validation report (required)
  • segments: The segments validation report (optional)
  • demographics: The demographics validation report (optional)
  • embeddings: The embeddings validation report (optional)

is_passed

def is_passed(
self,
) -> bool

Check if all validation reports passed.

Returns:

  • True if all validation reports passed, False otherwise

model_dump_json

def model_dump_json(
self,
) -> str

Return a JSON string representation of the validation reports.

This is added to mimic the behaviour of Pydantic types to provide a consistent interface for the user.