decentriq_platform.data_lab
Sub-modules
- decentriq_platform.data_lab.check_data_lab_type
Functions
parse_statistics_result
def parse_statistics_result(
json_str: str,
) ‑> Union[decentriq_platform.data_lab.api.StatisticsResult, decentriq_platform.data_lab.api.StatisticsResultV2]
Parse a statistics result JSON string into the appropriate version.
Automatically detects whether the result is V1 or V2 format based on the presence of distinguishing fields.
Parameters:
json_str: The JSON string to parse
Returns:
- Either StatisticsResult (V1) or StatisticsResultV2 depending on the format
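The detection step described above can be pictured with a small sketch. The source does not say which fields distinguish V1 from V2, so the key name `identifiers` and the helper `detect_statistics_version` below are hypothetical and are not part of decentriq_platform.

```python
import json

# Hypothetical: the real distinguishing fields are not documented here.
V2_ONLY_KEYS = {"identifiers"}


def detect_statistics_version(json_str: str) -> str:
    """Return "v2" if a V2-only key is present at the top level, else "v1"."""
    payload = json.loads(json_str)
    if not isinstance(payload, dict):
        raise ValueError("expected a JSON object")
    # Iterating a dict yields its keys, so this intersects with the key set.
    return "v2" if V2_ONLY_KEYS.intersection(payload) else "v1"
```

In practice, callers would use parse_statistics_result directly and branch on the type of the returned object.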
Classes
ComputeJob
ComputeJob(
dcr_id: str,
action: DataRoomComputeAction,
result_zip_file_name: str,
client: Client,
)
Abstract base class for compute jobs in archv2 DCRs.
Initialize a compute job.
Parameters:
dcr_id: The identifier for the DCR or DataLab
action: The compute action to perform
result_zip_file_name: The name of the file in the zip to get the result from
client: The client instance for API communication
Ancestors (in MRO)
- abc.ABC
Descendants
- decentriq_platform.data_lab.compute_job.DataLabStatisticsJob
- decentriq_platform.data_lab.compute_job.DataLabValidationStatusJob
- decentriq_platform.data_lab.compute_job.ValidationReportJob
- decentriq_platform.media.compute_job.MediaAudienceUserListJob
- decentriq_platform.media.compute_job.MediaDataAttributesJob
- decentriq_platform.media.compute_job.MediaEstimateAudienceSizeJob
- decentriq_platform.media.compute_job.MediaGetAudiencesJob
- decentriq_platform.media.compute_job.MediaGetCustomAudiencesJob
- decentriq_platform.media.compute_job.MediaGetSeedAudiencesJob
- decentriq_platform.media.compute_job.MediaInsightsJob
- decentriq_platform.media.compute_job.MediaLookalikeAudienceStatisticsJob
- decentriq_platform.media.compute_job.MediaMatchingValidationReportJob
- decentriq_platform.media.compute_job.MediaModelQualityReportJob
- decentriq_platform.media.compute_job.MediaOverlapStatisticsJob
- decentriq_platform.media.compute_job.ValidationReportJob
download_result
def download_result(
self,
task_result_hash: str,
) ‑> io.RawIOBase
Download the result of the compute job.
Parameters:
task_result_hash: The hash of the task result to download
Returns:
- The result of the compute job
get_result
def get_result(
self,
) ‑> bytes
Get the result of the compute job.
Returns:
- The content of the result file as bytes. An exception is raised if the job failed.
get_result_as_zipfile
def get_result_as_zipfile(
self,
) ‑> zipfile.ZipFile
Get the result of the compute job as a zipfile.ZipFile.
Returns:
- The content of the result file as a zipfile.ZipFile. If the job failed, an Exception will be raised.
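The constructor's result_zip_file_name names a file inside the result zip. Reading such a file from an archive held in memory can be sketched with the standard library alone. The helper `read_member` is hypothetical and says nothing about how the library implements get_result.

```python
import io
import zipfile


def read_member(archive_bytes: bytes, member_name: str) -> bytes:
    """Open an in-memory zip archive and return the content of one member."""
    with zipfile.ZipFile(io.BytesIO(archive_bytes), "r") as archive:
        # Raises KeyError if member_name is not in the archive.
        return archive.read(member_name)
```

When a zipfile.ZipFile is already available from get_result_as_zipfile, its read() method can be called directly on the member name.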
is_complete
def is_complete(
self,
) ‑> bool
Check if the compute job is complete.
Returns:
- True if the job is complete, False otherwise
result
def result(
self,
) ‑> Any
Get the parsed result of the compute job.
Each job subclass should implement this method and return the appropriate type which represents the job result.
run
def run(
self,
) ‑> None
Run the compute job.
Raises:
Exception: If the computation has already been run
wait_for_completion
def wait_for_completion(
self,
timeout: Optional[int] = None,
sleep_interval: int = 1,
) ‑> Self
Wait for the compute job to complete.
Parameters:
timeout: The maximum time to wait for the job to complete, in seconds
sleep_interval: The interval to wait between checks, in seconds
Returns:
- The compute job
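The timeout and sleep_interval parameters suggest a polling loop around is_complete. A minimal sketch of such a loop follows. The function name `wait_until` and the choice of TimeoutError are assumptions, since the source does not say what happens when the timeout elapses.

```python
import time
from typing import Callable, Optional


def wait_until(
    is_complete: Callable[[], bool],
    timeout: Optional[int] = None,
    sleep_interval: int = 1,
) -> None:
    """Poll is_complete() until it returns True or the timeout elapses."""
    # timeout=None means wait indefinitely.
    deadline = None if timeout is None else time.monotonic() + timeout
    while not is_complete():
        if deadline is not None and time.monotonic() >= deadline:
            raise TimeoutError("job did not complete in time")  # assumed behaviour
        time.sleep(sleep_interval)
```

With a job object, this would be called as `wait_until(job.is_complete, timeout=60)`. The documented wait_for_completion additionally returns the job itself, which allows chaining.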
DataLab
DataLab(
client: decentriq_platform.client.Client,
cfg: decentriq_platform.data_lab.data_lab.DataLabConfig,
existing_data_lab: Optional[decentriq_platform.data_lab.data_lab.ExistingDataLab] = None,
)
DataLab v2 implementation using the archv2 architecture.
This class inherits from DataLabInterface, ensuring API compatibility with the legacy DataLab implementation.
Ancestors (in MRO)
- decentriq_platform.data_lab.data_lab_interface.DataLabInterface
- abc.ABC
Static methods
is_data_lab_validated
def is_data_lab_validated(
client_v2: decentriq_platform.archv2.client.ClientV2,
data_lab_id: str,
verification_key: bytes,
) ‑> bool
Check if the DataLab has been validated (all required jobs completed successfully).
Instance variables
id: str
Alias for data_lab_id for convenience.
deprovision_dataset
def deprovision_dataset(
self,
dataset_type: decentriq_platform.types.DataLabDatasetType,
)
Deprovision a single dataset from the DataLab.
Parameters:
dataset_type: The type of the dataset.
get
def get(
self,
) ‑> decentriq_platform.archv2.client.DataLabV2
Get the DataLab definition from the backend.
Returns:
- The DataLab definition
get_statistics_report
def get_statistics_report(
self,
timeout: Optional[int] = None,
) ‑> Dict[str, Any]
Get the statistics report, waiting for completion if necessary.
If run() was called previously, this will wait for and return results from
that job. Otherwise, it will start a new statistics job.
Parameters:
timeout: The maximum time to wait for the job to complete, in seconds.
Returns:
- The statistics result for the DataLab datasets
get_validation_report
def get_validation_report(
self,
timeout: Optional[int] = None,
) ‑> Dict[str, Any]
Get the validation report, waiting for completion if necessary.
If run() was called previously, this will wait for and return results from
those jobs. Otherwise, it will start new validation jobs.
Parameters:
timeout: The maximum time to wait for the jobs to complete, in seconds.
Returns:
- The validation reports for all datasets as a dictionary
get_validation_status
def get_validation_status(
self,
timeout: Optional[int] = None,
) ‑> Dict[str, Any]
Get the validation status. The status reflects whether all validation reports passed and whether the identifiers and segments datasets are non-empty after validation.
Parameters:
timeout: The maximum time to wait for the job to complete, in seconds.
Returns:
- A dictionary with "status" ("SUCCESS" or "FAILED") and per-dataset details.
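A caller would typically branch on the "status" entry of the returned dictionary. The sketch below assumes only what is stated above, namely a top-level "status" key holding "SUCCESS" or "FAILED". The helper name `validation_succeeded` is hypothetical, and the layout of the per-dataset details is not documented here, so it is left untouched.

```python
from typing import Any, Dict


def validation_succeeded(status_report: Dict[str, Any]) -> bool:
    """Return True only when the top-level "status" entry is "SUCCESS"."""
    # A missing key is treated as a failure rather than raising.
    return status_report.get("status") == "SUCCESS"
```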
is_statistics_shared_with_dq
def is_statistics_shared_with_dq(
self,
) ‑> bool
Check whether the statistics are shared with DQ.
is_validation_passed
def is_validation_passed(
self,
validation_report: Dict[str, Any],
) ‑> bool
Check whether DataLab validation has passed.
Parameters:
validation_report: Result of calling get_validation_report on this DataLab.
provision_dataset
def provision_dataset(
self,
manifest_hash: str,
key: decentriq_platform.storage.Key,
dataset_type: decentriq_platform.types.DataLabDatasetType,
)
Provision a single dataset to the DataLab.
Parameters:
manifest_hash: The manifest hash of the uploaded dataset.
key: The key used to encrypt the dataset.
dataset_type: The type of the dataset.
provision_local_datasets
def provision_local_datasets(
self,
key: decentriq_platform.storage.Key,
matching_data_path: str,
segments_data_path: Optional[str] = None,
demographics_data_path: Optional[str] = None,
embeddings_data_path: Optional[str] = None,
*,
secret_store_options: Optional[decentriq_platform.client.SecretStoreOptions] = None,
)
Upload local datasets and provision to the DataLab.
Parameters:
key: The key used to encrypt the dataset.
matching_data_path: The file path to the "matching" dataset.
segments_data_path: The file path to the "segments" dataset.
demographics_data_path: The file path to the "demographics" dataset.
embeddings_data_path: The file path to the "embeddings" dataset.
secret_store_options: Optional secret store configuration.
run
def run(
self,
)
Start the validation jobs and the statistics job for the DataLab.
This function does not block waiting for the results. Instead, call
get_validation_report or get_statistics_report to wait for completion
and retrieve the results.
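The non-blocking run followed by blocking report calls can be sketched as one helper. It uses only method names documented on this page and accepts any object that provides them. The function name `run_and_collect` and the choice of RuntimeError on failed validation are assumptions for illustration.

```python
from typing import Any, Dict, Optional


def run_and_collect(data_lab: Any, timeout: Optional[int] = None) -> Dict[str, Any]:
    """Start the jobs, then block on each report in turn."""
    data_lab.run()  # kicks off the jobs and returns without waiting
    validation_report = data_lab.get_validation_report(timeout=timeout)
    if not data_lab.is_validation_passed(validation_report):
        raise RuntimeError("DataLab validation failed")  # assumed error handling
    statistics = data_lab.get_statistics_report(timeout=timeout)
    return {"validation": validation_report, "statistics": statistics}
```

Checking validation before requesting statistics is a design choice of this sketch, not a requirement stated by the source.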
DataLabBuilder
DataLabBuilder(
client: decentriq_platform.client.Client,
)
A helper class to build a Data Lab.
build
def build(
self,
) ‑> decentriq_platform.data_lab.data_lab_interface.DataLabInterface
Build the DataLab.
from_existing
def from_existing(
self,
data_lab_id: str,
) ‑> Self
Construct a new DataLab from an existing DataLab with the given ID.
Parameters:
data_lab_id: The ID of the existing DataLab.
with_collaboration_types
def with_collaboration_types(
self,
collaboration_types: list[decentriq_platform.media.media.CollaborationType],
) ‑> Self
Set the collaboration types for the DataLab.
with_demographics
def with_demographics(
self,
) ‑> Self
Enable demographics in the DataLab.
with_disable_drop_invalid_rows
def with_disable_drop_invalid_rows(
self,
)
Disable dropping of invalid rows in the Data Lab.
with_embeddings
def with_embeddings(
self,
num_embeddings: int,
) ‑> Self
Enable embeddings in the DataLab.
Parameters:
num_embeddings: The number of embeddings the DataLab should use.
with_identifiers_config
def with_identifiers_config(
self,
identifiers_config: list[decentriq_platform.data_lab.configs.IdentifiersConfig],
) ‑> Self
Set the identifiers config.
Parameters:
identifiers_config: The identifiers config to use.
with_matching_id_format
def with_matching_id_format(
self,
matching_id: decentriq_platform.types.MatchingId,
) ‑> Self
Set the matching ID format.
Parameters:
matching_id: The type of matching ID to use.
with_name
def with_name(
self,
name: str,
) ‑> Self
Set the name of the DataLab.
Parameters:
name: Name to be used for the DataLab.
with_num_identifiers_columns
def with_num_identifiers_columns(
self,
num_identifiers_columns: int,
) ‑> Self
Set the expected number of columns in the identifiers dataset.
Parameters:
num_identifiers_columns: The number of columns in the identifiers dataset.
with_segments
def with_segments(
self,
) ‑> Self
Enable segments in the DataLab.
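Most of the with_* setters above are annotated as returning Self, so they can be chained before build. The sketch below chains only setters documented with a Self return and accepts any builder object that provides them. The function name `build_data_lab` and the particular combination of options are illustrative assumptions.

```python
from typing import Any


def build_data_lab(builder: Any, name: str, num_embeddings: int) -> Any:
    """Chain the documented with_* setters and finish with build()."""
    return (
        builder.with_name(name)
        .with_segments()
        .with_demographics()
        .with_embeddings(num_embeddings)
        .build()
    )
```

with_disable_drop_invalid_rows is left out of the chain because its signature above carries no return annotation.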
DataLabConfig
DataLabConfig(
name: str,
has_demographics: bool,
has_embeddings: bool,
num_embeddings: int,
has_segments: bool,
num_identifiers_columns: int,
identifiers_config: Optional[list[decentriq_platform.data_lab.configs.IdentifiersConfig]] = None,
matching_id: Optional[decentriq_platform.types.MatchingId] = None,
collaboration_types: Optional[list[decentriq_platform.media.media.CollaborationType]] = None,
force_spark_validation: bool = False,
drop_invalid_rows: bool = True,
share_statistics: bool = False,
)
DataLabDemographicsValidationReportJob
DataLabDemographicsValidationReportJob(
data_lab_id: str,
client: decentriq_platform.client.Client,
)
A compute job for retrieving demographics validation reports from a DataLab v2.
Initialize a DataLabDemographicsValidationReportJob instance.
Parameters:
data_lab_id: The identifier for the DataLab
client: The client instance for API communication
Ancestors (in MRO)
- decentriq_platform.data_lab.compute_job.ValidationReportJob
- decentriq_platform.archv2.compute_job.ComputeJob
- abc.ABC
result
def result(
self,
) ‑> decentriq_platform.media.api.ValidationReport
Get the result of the DataLabDemographicsValidationReportJob.
Returns:
- The validation report
DataLabEmbeddingsValidationReportJob
DataLabEmbeddingsValidationReportJob(
data_lab_id: str,
client: decentriq_platform.client.Client,
)
A compute job for retrieving embeddings validation reports from a DataLab v2.
Initialize a DataLabEmbeddingsValidationReportJob instance.
Parameters:
data_lab_id: The identifier for the DataLab
client: The client instance for API communication
Ancestors (in MRO)
- decentriq_platform.data_lab.compute_job.ValidationReportJob
- decentriq_platform.archv2.compute_job.ComputeJob
- abc.ABC
result
def result(
self,
) ‑> decentriq_platform.media.api.ValidationReport
Get the result of the DataLabEmbeddingsValidationReportJob.
Returns:
- The validation report
DataLabIdentifiersValidationReportJob
DataLabIdentifiersValidationReportJob(
data_lab_id: str,
client: decentriq_platform.client.Client,
)
A compute job for retrieving identifiers validation reports from a DataLab v2.
Initialize a DataLabIdentifiersValidationReportJob instance.
Parameters:
data_lab_id: The identifier for the DataLab
client: The client instance for API communication
Ancestors (in MRO)
- decentriq_platform.data_lab.compute_job.ValidationReportJob
- decentriq_platform.archv2.compute_job.ComputeJob
- abc.ABC
result
def result(
self,
) ‑> decentriq_platform.media.api.IdentifiersValidationReport
Get the result of the DataLabIdentifiersValidationReportJob.
Returns:
- The validation report
DataLabInterface
DataLabInterface(
)
Abstract base class defining the public API for DataLab implementations.
Both the new DataLab v2 and the legacy DataLab inherit from this interface, ensuring API compatibility between implementations.
Ancestors (in MRO)
- abc.ABC
Descendants
- decentriq_platform.data_lab.data_lab.DataLab
deprovision_dataset
def deprovision_dataset(
self,
dataset_type: decentriq_platform.types.DataLabDatasetType,
) ‑> None
Deprovision a single dataset from the DataLab.
Parameters:
dataset_type: The type of the dataset.
get_statistics_report
def get_statistics_report(
self,
timeout: Optional[int] = None,
) ‑> Dict[str, Any]
Retrieve the statistics report.
This function will block until the report is ready unless a timeout is specified.
Parameters:
timeout: Maximum time to wait (in seconds) for the statistics report to become available.
Returns:
- The statistics result for the DataLab datasets.
get_validation_report
def get_validation_report(
self,
timeout: Optional[int] = None,
) ‑> Dict[str, Any]
Retrieve the validation report.
This function will block until the report is ready unless a timeout is specified.
Parameters:
timeout: Maximum time to wait (in seconds) for the validation report to become available.
Returns:
- The validation reports for all datasets as a dictionary.
is_validation_passed
def is_validation_passed(
self,
validation_report: Dict[str, Any],
) ‑> bool
Check whether DataLab validation has passed.
Parameters:
validation_report: Result of calling get_validation_report on this DataLab.
Returns:
- True if validation passed, False otherwise.
provision_dataset
def provision_dataset(
self,
manifest_hash: str,
key: decentriq_platform.storage.Key,
dataset_type: decentriq_platform.types.DataLabDatasetType,
) ‑> None
Provision a single dataset to the DataLab.
Parameters:
manifest_hash: The manifest hash of the uploaded dataset.
key: The key used to encrypt the dataset.
dataset_type: The type of the dataset.
provision_local_datasets
def provision_local_datasets(
self,
key: decentriq_platform.storage.Key,
matching_data_path: str,
segments_data_path: Optional[str] = None,
demographics_data_path: Optional[str] = None,
embeddings_data_path: Optional[str] = None,
*,
secret_store_options: Optional[decentriq_platform.client.SecretStoreOptions] = None,
) ‑> None
Upload local datasets and provision to the DataLab.
Parameters:
key: The key used to encrypt the dataset.
matching_data_path: The file path to the "matching" dataset.
segments_data_path: The file path to the "segments" dataset.
demographics_data_path: The file path to the "demographics" dataset.
embeddings_data_path: The file path to the "embeddings" dataset.
secret_store_options: Optional secret store configuration.
run
def run(
self,
/,
*,
dry_run: Optional[decentriq_platform.types.DryRunOptions] = None,
parameters: Optional[Mapping[str, str]] = None,
) ‑> None
Run the DataLab validation and statistics jobs.
This function kicks off the validation and statistics computation jobs.
It does not block waiting for the results. Instead, call
get_validation_report or get_statistics_report to retrieve results.
Parameters:
dry_run: Optional dry run configuration.
parameters: Optional parameters for the computation.
DataLabSegmentsValidationReportJob
DataLabSegmentsValidationReportJob(
data_lab_id: str,
client: decentriq_platform.client.Client,
)
A compute job for retrieving segments validation reports from a DataLab v2.
Initialize a DataLabSegmentsValidationReportJob instance.
Parameters:
data_lab_id: The identifier for the DataLab
client: The client instance for API communication
Ancestors (in MRO)
- decentriq_platform.data_lab.compute_job.ValidationReportJob
- decentriq_platform.archv2.compute_job.ComputeJob
- abc.ABC
result
def result(
self,
) ‑> decentriq_platform.media.api.ValidationReport
Get the result of the DataLabSegmentsValidationReportJob.
Returns:
- The validation report
DataLabStatisticsJob
DataLabStatisticsJob(
data_lab_id: str,
client: decentriq_platform.client.Client,
)
A compute job for retrieving publisher data statistics from a DataLab v2.
Initialize a DataLabStatisticsJob instance.
Parameters:
data_lab_id: The identifier for the DataLab
client: The client instance for API communication
Ancestors (in MRO)
- decentriq_platform.archv2.compute_job.ComputeJob
- abc.ABC
result
def result(
self,
) ‑> Union[decentriq_platform.data_lab.api.StatisticsResult, decentriq_platform.data_lab.api.StatisticsResultV2]
Get the result of the DataLabStatisticsJob.
Automatically returns the appropriate version (V1 or V2) based on the format of the result data.
Returns:
- The statistics result (either StatisticsResult or StatisticsResultV2)
DataLabValidationReportsJob
DataLabValidationReportsJob(
data_lab_id: str,
client: decentriq_platform.client.Client,
has_segments: bool = True,
has_demographics: bool = False,
has_embeddings: bool = False,
)
A class to encapsulate the various validation report jobs for a DataLab v2.
This class does not inherit from ComputeJob because it encapsulates multiple jobs rather than being a single job itself.
Initialize a DataLabValidationReportsJob instance.
Parameters:
data_lab_id: The identifier for the DataLab
client: The client instance for API communication
has_segments: Whether the DataLab has segments data
has_demographics: Whether the DataLab has demographics data
has_embeddings: Whether the DataLab has embeddings data
is_complete
def is_complete(
self,
) ‑> bool
Check if all validation jobs are complete.
Returns:
- True if all jobs are complete, False otherwise
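Because this class wraps several jobs, its completion check amounts to an aggregate over the individual jobs. A minimal sketch of that aggregation follows, assuming each wrapped job exposes is_complete() as ComputeJob does. The helper name `all_complete` is hypothetical.

```python
from typing import Any, Iterable


def all_complete(jobs: Iterable[Any]) -> bool:
    """Return True when every job reports is_complete() as True."""
    # all() stops at the first incomplete job.
    return all(job.is_complete() for job in jobs)
```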
result
def result(
self,
) ‑> decentriq_platform.data_lab.validation_reports.ValidationReports
Get the result of the DataLabValidationReportsJob.
Returns:
- The validation reports
run
def run(
self,
) ‑> None
Run all validation jobs.
wait_for_completion
def wait_for_completion(
self,
timeout: Optional[int] = None,
sleep_interval: int = 1,
) ‑> Self
Wait for all validation jobs to complete.
Parameters:
timeout: The maximum time to wait for the jobs to complete, in seconds
sleep_interval: The interval to wait between checks, in seconds
Returns:
- The compute job
IdType
IdType(
*args,
**kwds,
)
The type of identifier.
Members:
- STRING
- EMAIL
- HASHED_EMAIL
- PHONE_NUMBER
- HASHED_PHONE_NUMBER
- MAID
- HASHED_MAID
- UTIQ_MARTECH
- UTIQ_ADTECH
- FIRST_ID
- ID5
- NET_ID
- ONE_ID
- IPV4_ADDRESS
Ancestors (in MRO)
- builtins.str
- enum.Enum
IdentifiersConfig
IdentifiersConfig(
id_name: str,
id_type: decentriq_platform.types.IdType,
is_matching_id: bool,
is_activation_id: bool,
)
to_dict
def to_dict(
self,
) ‑> Dict[str, Any]
StatisticsResult
StatisticsResult(
**data: Any,
)
Result of the compute statistics job (Legacy/V1).
Create a new model by parsing and validating input data from keyword arguments.
Raises pydantic_core.ValidationError if the input data cannot be
validated to form a valid model.
self is explicitly positional-only to allow self as a field name.
Ancestors (in MRO)
- pydantic.main.BaseModel
StatisticsResultV2
StatisticsResultV2(
**data: Any,
)
Result of the compute statistics job (V2).
This is the new structure that provides comprehensive statistics for data lab analysis including identifiers, segments, demographics, and embeddings.
Create a new model by parsing and validating input data from keyword arguments.
Raises pydantic_core.ValidationError if the input data cannot be
validated to form a valid model.
self is explicitly positional-only to allow self as a field name.
Ancestors (in MRO)
- pydantic.main.BaseModel
ValidationReport
ValidationReport(
**data: Any,
)
Create a new model by parsing and validating input data from keyword arguments.
Raises pydantic_core.ValidationError if the input data cannot be
validated to form a valid model.
self is explicitly positional-only to allow self as a field name.
Ancestors (in MRO)
- pydantic.main.BaseModel
ValidationReportJob
ValidationReportJob(
data_lab_id: str,
action: decentriq_dcr_compiler._schemas.data_room_compute_action.DataRoomComputeAction,
client: decentriq_platform.client.Client,
)
A compute job for retrieving validation reports from a DataLab v2.
Initialize a ValidationReportJob instance.
Parameters:
data_lab_id: The identifier for the DataLab
action: The compute action to perform
client: The client instance for API communication
Ancestors (in MRO)
- decentriq_platform.archv2.compute_job.ComputeJob
- abc.ABC
Descendants
- decentriq_platform.data_lab.compute_job.DataLabDemographicsValidationReportJob
- decentriq_platform.data_lab.compute_job.DataLabEmbeddingsValidationReportJob
- decentriq_platform.data_lab.compute_job.DataLabIdentifiersValidationReportJob
- decentriq_platform.data_lab.compute_job.DataLabSegmentsValidationReportJob
get_validation_report
def get_validation_report(
self,
) ‑> decentriq_platform.media.api.ValidationReport
Get the validation report.
Returns:
- The validation report
ValidationReports
ValidationReports(
identifiers: decentriq_platform.media.api.IdentifiersValidationReport,
segments: Optional[decentriq_platform.media.api.ValidationReport] = None,
demographics: Optional[decentriq_platform.media.api.ValidationReport] = None,
embeddings: Optional[decentriq_platform.media.api.ValidationReport] = None,
)
A class to represent the validation reports for a DataLab v2.
Initialize a ValidationReports instance.
Parameters:
identifiers: The identifiers validation report (required)
segments: The segments validation report (optional)
demographics: The demographics validation report (optional)
embeddings: The embeddings validation report (optional)
is_passed
def is_passed(
self,
) ‑> bool
Check if all validation reports passed.
Returns:
- True if all validation reports passed, False otherwise
model_dump_json
def model_dump_json(
self,
) ‑> str
Return a JSON string representation of the validation reports.
This is added to mimic the behaviour of Pydantic types to provide a consistent interface for the user.