Data
In the Data tab, participants can provide the seed and base audiences and see the overlap results.
As soon as both the seed and base audiences have been provided, the platform starts computing the overlap, the insights statistics, and the audiences. Depending on the dataset sizes, this can take several hours. Changes to the seed or base audience also trigger a recomputation.
Providing the seed audience
To provide the seed audience, you need the Provide seed audience permission.
The seed audience is a list of users that should be matched with the base audience. The seed audience can include multiple audiences separated by an audienceType field.
Datasets must be prepared in CSV format, UTF-8 encoded, and without column headers or index columns. The seed audience data should contain the following two columns, in this order:
- matchingId: The identifier to be matched with the base audience. Values in this column must use the matching ID type of this Media DCR (you can find this information in the Configuration tab). See the supported matching ID types for the formatting requirements of each type.
- audienceType: Groups rows into separate seed audiences, so that multiple seed audiences can be uploaded in one dataset. It distinguishes audience segments, for example based on users' interactions with advertisers or on segments provided by data partners. This column is optional when uploading via the UI (the platform fills it with a default value) but required when provisioning datasets using the SDK. A matchingId can appear under several audienceType values, but each combination of matchingId and audienceType must be unique; duplicates are removed automatically.
If the audienceType column is used, empty values and the value 0 are each treated as separate segments.
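The platform removes duplicate rows itself, but it can be useful to preview what will remain before uploading. The following is a minimal local sketch, assuming your rows are already `(matchingId, audienceType)` string pairs in the correct matching ID format. `dedupe_seed_rows` is a hypothetical helper, not part of the Decentriq SDK. It keeps one row per unique pair and, because values are compared as strings, treats an empty audienceType and `"0"` as two different segments.

```python
from typing import Iterable, List, Tuple

# A seed audience row: (matchingId, audienceType), both as strings.
Row = Tuple[str, str]


def dedupe_seed_rows(rows: Iterable[Row]) -> List[Row]:
    """Keep the first occurrence of each (matchingId, audienceType) pair.

    Hypothetical preview helper. A matchingId may appear under several
    audienceType values; only identical pairs count as duplicates.
    audienceType values are compared as strings, so "" and "0" remain
    distinct segments.
    """
    seen = set()
    unique: List[Row] = []
    for matching_id, audience_type in rows:
        pair = (matching_id, audience_type)
        if pair not in seen:
            seen.add(pair)
            unique.append(pair)
    return unique


rows = [
    ("id-1", "Sneaker Campaign"),
    ("id-1", "Sneaker Campaign"),     # identical pair: dropped
    ("id-1", "Athleisure Campaign"),  # same ID, other segment: kept
    ("id-2", ""),                     # empty audienceType segment
    ("id-2", "0"),                    # "0" is a separate segment from ""
]
print(dedupe_seed_rows(rows))
# [('id-1', 'Sneaker Campaign'), ('id-1', 'Athleisure Campaign'), ('id-2', ''), ('id-2', '0')]
```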
Example seed audience dataset
| matchingId (matching ID type: Hashed email) | audienceType |
|---|---|
| e4191e26a5d04a3d8ae9008189a0db7168aba7e8b92944ae90c6… | Sneaker Campaign |
| fb3f59dbe51d42f3bf4f826ac5c22f1f13d2d9e464804539ad59… | Sneaker Campaign |
| fb3f59dbe51d42f3bf4f826ac5c22f1f13d2d9e464804539ad59… | Athleisure Campaign |
| .... | .... |
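A file in the format shown above can be produced with Python's standard `csv` module. This is a sketch rather than an official tool: `write_seed_csv` is a hypothetical helper, and the IDs below are placeholders, not real hashed emails. It writes UTF-8 text with no header row and no index column, with matchingId first and audienceType second.

```python
import csv
import os
import tempfile
from typing import Iterable, Tuple


def write_seed_csv(rows: Iterable[Tuple[str, str]], path: str) -> int:
    """Write (matchingId, audienceType) rows as a headerless UTF-8 CSV.

    Hypothetical helper. No header and no index column are written.
    Returns the number of rows written.
    """
    count = 0
    # newline="" lets the csv module control line endings itself.
    with open(path, "w", encoding="utf-8", newline="") as f:
        writer = csv.writer(f)
        for matching_id, audience_type in rows:
            writer.writerow([matching_id, audience_type])
            count += 1
    return count


# Placeholder IDs; real values must follow the Media DCR's matching ID format.
rows = [
    ("hashed-id-1", "Sneaker Campaign"),
    ("hashed-id-2", "Sneaker Campaign"),
    ("hashed-id-2", "Athleisure Campaign"),
]
path = os.path.join(tempfile.mkdtemp(), "seed_audience.csv")
written = write_seed_csv(rows, path)

# Read the file back to check that it contains only the data rows.
with open(path, encoding="utf-8", newline="") as f:
    read_back = [tuple(r) for r in csv.reader(f)]
print(written, read_back)
```

Because `csv.writer` quotes a field only when needed, an audienceType name that contains a comma is still written as a single column.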
For the Media DCR to function, the overlap between the seed audience and the publisher's (or data partner's) base audience must be at least 100 users. The quality of the lookalike model and of the insights statistics improves greatly when the overlap is larger than 1,500 users.
The overlap depends strongly on the size of the publisher's matchable audience. As a rule of thumb, aim for more than 10,000 users in the seed audience. For publishers with large market penetration, this number can be significantly lower, down to around 1,000.
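Before uploading, you can check each audienceType against the rule of thumb above. In this sketch, the helper names and the `high_penetration_publisher` flag are assumptions for illustration: whether a publisher has large market penetration is a judgment you make yourself, and the thresholds are the guidance values from this page, not limits enforced by the platform. Seed size is only a proxy; the actual requirement applies to the overlap, which is known only after the computation runs.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

# Guidance values from this page, not limits enforced by the platform.
RECOMMENDED_SEED_SIZE = 10_000       # "aim for more than 10,000"
HIGH_PENETRATION_SEED_SIZE = 1_000   # "down to around 1,000"


def seed_sizes(rows: Iterable[Tuple[str, str]]) -> Dict[str, int]:
    """Count distinct matchingIds per audienceType."""
    ids: Dict[str, Set[str]] = defaultdict(set)
    for matching_id, audience_type in rows:
        ids[audience_type].add(matching_id)
    return {audience_type: len(members) for audience_type, members in ids.items()}


def meets_rule_of_thumb(size: int, high_penetration_publisher: bool = False) -> bool:
    """Compare one seed audience size with the rule of thumb.

    high_penetration_publisher is an assumed, manually chosen flag.
    """
    if high_penetration_publisher:
        return size >= HIGH_PENETRATION_SEED_SIZE
    return size > RECOMMENDED_SEED_SIZE


# Synthetic example rows: two seed audiences of different sizes.
rows = [(f"id-{i}", "Sneaker Campaign") for i in range(12_000)]
rows += [(f"id-{i}", "Athleisure Campaign") for i in range(2_500)]

sizes = seed_sizes(rows)
for audience_type, size in sizes.items():
    print(audience_type, size, meets_rule_of_thumb(size))
```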
Provisioning the seed audience
After clicking Provision dataset, you can either upload a file directly or provision a dataset from the Dataset Portal. It is recommended to upload your dataset to the Dataset Portal first, as it supports larger file sizes and automated pipelines and lets you provision the same dataset to multiple Media DCRs. See Data management for details.

Refreshing the seed audience
To refresh seed audience data:
- Click the dropdown next to the name of the provisioned dataset and select Deprovision dataset (you can also select Delete dataset).
- Upload the new data via the UI or provision it from the Dataset Portal as described above.
Providing the base audience
To provide the base audience, you need the Provide base audience permission.
To do so, you provision an existing, compatible Datalab that contains your prepared datasets. See the Datalab documentation for dataset requirements, formatting rules, and examples.
A Datalab is compatible if:
- The Datalab is validated
- The Datalab includes the Matching table
- If the Media DCR supports the Insights collaboration type, the Datalab must include the Segments table
- If the Media DCR supports the AI lookalike audiences collaboration type, the Datalab must include either the Segments table or the Embeddings table
- If the Media DCR supports the Rule-based audiences collaboration type, the Datalab must include the Segments table
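The compatibility rules above, together with the matching ID type requirement from the provisioning steps, can be expressed as a small checklist function. `DatalabSummary` and `compatibility_problems` are hypothetical names, not SDK types; the platform performs this check itself when you select a Datalab, so the sketch only illustrates the logic. Collaboration types and table names are plain strings matching the names used on this page.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class DatalabSummary:
    """Hypothetical local description of a Datalab (not an SDK type)."""
    validated: bool
    matching_id_type: str
    tables: Set[str] = field(default_factory=set)


def compatibility_problems(
    lab: DatalabSummary, dcr_matching_id_type: str, collaboration_types: Set[str]
) -> List[str]:
    """Return the unmet compatibility rules; an empty list means compatible."""
    problems: List[str] = []
    if not lab.validated:
        problems.append("Datalab is not validated")
    if lab.matching_id_type != dcr_matching_id_type:
        problems.append("matching ID type differs from the Media DCR")
    if "Matching" not in lab.tables:
        problems.append("missing Matching table")
    if "Insights" in collaboration_types and "Segments" not in lab.tables:
        problems.append("Insights requires the Segments table")
    if "AI lookalike audiences" in collaboration_types and not ({"Segments", "Embeddings"} & lab.tables):
        problems.append("AI lookalike audiences requires the Segments or Embeddings table")
    if "Rule-based audiences" in collaboration_types and "Segments" not in lab.tables:
        problems.append("Rule-based audiences requires the Segments table")
    return problems


lab = DatalabSummary(validated=True, matching_id_type="Hashed email", tables={"Matching", "Embeddings"})
print(compatibility_problems(lab, "Hashed email", {"AI lookalike audiences"}))
print(compatibility_problems(lab, "Hashed email", {"Insights", "AI lookalike audiences"}))
```

In this example, a Datalab with only the Matching and Embeddings tables satisfies AI lookalike audiences but not Insights, which needs the Segments table.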
Provisioning the base audience
- Click Provision Datalab
- Select a compatible and validated Datalab. To be compatible, the Datalab must have the same matching ID type as the Media DCR (see the Configuration tab for this information).

Your Datalab can be provisioned to multiple Media DCRs simultaneously, and you can create multiple Datalabs for different use cases or matching ID types.
Refreshing the base audience
To refresh base audience data in Datalabs:
- Upload new datasets to the Dataset Portal
- Optionally delete existing datasets from the Dataset Portal
- Deprovision existing datasets from your Datalab(s) and provision the newly uploaded datasets
- Deprovision the existing Datalab from the Media DCR and reprovision the same Datalab (this propagates the new datasets to the Media DCR)
For frequent data refresh (both seed and base audiences), Decentriq recommends using the Python SDK to automate these steps.
View overlap
To see the overlap, you need the View overlap permission.
After the seed and base audiences have been provided (see above), the overlap calculation is triggered. Once it has completed, the overlap chart is displayed. You can select the seed audience (audienceType) for which the chart is shown. Overlap statistics are only shown if they include at least 100 users, so seed audiences (audienceType values) that are too small cannot be selected.
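The display rule can be sketched as a filter over per-audienceType overlap counts. The counts below are made-up illustration values, and `selectable_audience_types` and `limited_quality_audience_types` are hypothetical helpers. The second one flags segments whose overlap is large enough to be shown but not above the 1,500-user level at which the lookalike model and insights statistics improve greatly.

```python
from typing import Dict, List

MIN_DISPLAYED_OVERLAP = 100  # statistics are shown only from 100 users
QUALITY_OVERLAP = 1_500      # quality improves greatly above this overlap


def selectable_audience_types(overlaps: Dict[str, int]) -> List[str]:
    """audienceType values whose overlap is large enough to be shown."""
    return sorted(a for a, n in overlaps.items() if n >= MIN_DISPLAYED_OVERLAP)


def limited_quality_audience_types(overlaps: Dict[str, int]) -> List[str]:
    """Shown, but with an overlap of at most 1,500 users."""
    return sorted(
        a for a, n in overlaps.items() if MIN_DISPLAYED_OVERLAP <= n <= QUALITY_OVERLAP
    )


# Made-up overlap counts for illustration only.
overlaps = {"Sneaker Campaign": 2_400, "Athleisure Campaign": 640, "Trail Running": 45}
print(selectable_audience_types(overlaps))       # ['Athleisure Campaign', 'Sneaker Campaign']
print(limited_quality_audience_types(overlaps))  # ['Athleisure Campaign']
```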
