Data

In the Data tab, participants can provide the seed and base audiences and see the overlap results.

As soon as both the seed and base audiences have been provided, the platform starts computing the overlap, the insights statistics, and the audiences. Depending on the dataset sizes, this may take several hours. Note that any change to the seed or base audience also triggers a recomputation.

Providing the seed audience

note

To provide the seed audience, you need the Provide seed audience permission.

The seed audience is a list of users that should be matched with the base audience. The seed audience can include multiple audiences separated by an audienceType field.

Datasets must be prepared in CSV format, UTF-8 encoded, and without column headers or index columns. The seed audience data should include two columns in the following order:

  • matchingId: This is the identifier to be matched with the base audience. This column must be of the matching ID type of this Media DCR (you can find this information in the Configuration tab). See the supported matching ID types and the formatting requirements therein.
  • audienceType: Multiple seed audiences can be uploaded by grouping them in the audienceType column. This column is used to differentiate between different audience segments based on their interactions with advertisers or based on segments provided by data partners. This column is optional when uploading via the UI (the platform will fill it with a default value), but required when provisioning datasets using the SDK. Note that a given matchingId can appear in multiple audienceType rows, but that the combination of matchingId and audienceType must be unique; duplicates are removed automatically.
danger

If the audienceType column is being used, empty values and 0 will be considered as separate segments.

Example seed audience dataset
matchingId (matching ID type: Hashed email)            | audienceType
e4191e26a5d04a3d8ae9008189a0db7168aba7e8b92944ae90c6…  | Sneaker Campaign
fb3f59dbe51d42f3bf4f826ac5c22f1f13d2d9e464804539ad59…  | Sneaker Campaign
fb3f59dbe51d42f3bf4f826ac5c22f1f13d2d9e464804539ad59…  | Athleisure Campaign
....                                                   | ....
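
The format rules above can be checked before upload with a small script. The following is a minimal sketch using only the Python standard library; the function names `write_seed_audience` and `find_suspicious_audience_types` are hypothetical helpers, not part of the Decentriq SDK. It assumes the matching IDs are already formatted according to the matching ID type of the Media DCR.

```python
import csv
from pathlib import Path


def find_suspicious_audience_types(rows):
    """Return audienceType values that are empty or "0".

    Such values are treated as separate segments, so they usually
    indicate a data preparation mistake.
    """
    return sorted({audience for _, audience in rows if audience.strip() in ("", "0")})


def write_seed_audience(rows, path):
    """Write (matchingId, audienceType) pairs as a headerless UTF-8 CSV.

    Duplicate pairs are dropped, keeping the first occurrence.
    Returns the rows that were written.
    """
    seen = set()
    unique_rows = []
    for matching_id, audience_type in rows:
        key = (matching_id, audience_type)
        if key not in seen:
            seen.add(key)
            unique_rows.append(key)

    with Path(path).open("w", encoding="utf-8", newline="") as f:
        # No header row and no index column: matchingId first, audienceType second.
        csv.writer(f).writerows(unique_rows)
    return unique_rows
```

The same matchingId may appear once per audienceType; only exact repeats of a (matchingId, audienceType) pair are dropped.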
Seed audience sizing

For the Media DCR to function, the overlap between the seed audience and the publisher's (or data partner's) base audience must be at least 100 users. The quality of the lookalike model and the insights statistics improves greatly with an overlap of more than 1,500 users.

The overlap strongly depends on the size of the publisher's matchable audience. As a rule of thumb, aim for more than 10,000 users in the seed audience. This target can be significantly lower, even down to 1,000 users, for publishers with large market penetration.
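
The thresholds above can be summarized in a few lines of Python. This is only an illustrative sketch: the constant and function names are hypothetical, and the labels "insufficient", "usable", and "good" are chosen here for readability rather than taken from the platform.

```python
MIN_OVERLAP = 100               # below this, the Media DCR does not function
GOOD_OVERLAP = 1_500            # above this, model and insights quality improve greatly
RECOMMENDED_SEED_SIZE = 10_000  # rule of thumb for the seed audience size


def assess_overlap(overlap_users):
    """Classify an overlap size against the documented thresholds."""
    if overlap_users < MIN_OVERLAP:
        return "insufficient"
    if overlap_users <= GOOD_OVERLAP:
        return "usable"
    return "good"


def seed_size_meets_rule_of_thumb(seed_users):
    """True if the seed audience has more than the recommended 10,000 users."""
    return seed_users > RECOMMENDED_SEED_SIZE
```

Because the rule of thumb can drop to around 1,000 users for publishers with large market penetration, a failed `seed_size_meets_rule_of_thumb` check is a prompt to review the seed audience, not a hard failure.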

Provisioning the seed audience

After clicking Provision dataset, you can upload a file directly or provision a dataset from the Dataset Portal. It is recommended to upload your dataset to the Dataset Portal first, because it supports larger file sizes and automated pipelines, and allows provisioning the same dataset to multiple Media DCRs. See Data management for details.

[Figure: Provision seed audience]

Refreshing the seed audience

To refresh seed audience data:

  1. Click the dropdown next to the name of the dataset you provisioned and select Deprovision dataset (alternatively, select Delete dataset).
  2. Reupload via the UI or provision from the Dataset Portal as described above.

Providing the base audience

note

To provide the base audience, you need the Provide base audience permission.

To provide the base audience, you need to provision an existing compatible Datalab that contains your prepared datasets. See the Datalab documentation for information about dataset requirements, formatting rules, and examples.

A Datalab is compatible if:

  • The Datalab is validated
  • The Datalab includes the Matching table
  • If the Media DCR supports the Insights collaboration type, the Datalab must include the Segments table
  • If the Media DCR supports the AI lookalike audiences collaboration type, the Datalab must include either the Segments table or the Embeddings table
  • If the Media DCR supports the Rule-based audiences collaboration type, the Datalab must include the Segments table
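
The compatibility rules can be expressed as a simple check. The sketch below is illustrative only: the `Datalab` and `MediaDcr` classes, their fields, and the string keys for tables and collaboration types are hypothetical stand-ins, not the Decentriq SDK's data model. It also includes the matching ID type requirement stated in the provisioning steps.

```python
from dataclasses import dataclass, field


@dataclass
class Datalab:
    """Hypothetical description of a Datalab, for illustration only."""
    validated: bool
    matching_id_type: str
    tables: set = field(default_factory=set)  # e.g. {"matching", "segments"}


@dataclass
class MediaDcr:
    """Hypothetical description of a Media DCR, for illustration only."""
    matching_id_type: str
    collaboration_types: set  # e.g. {"insights", "lookalike", "rule_based"}


def compatibility_issues(datalab, dcr):
    """Return the reasons a Datalab is incompatible (empty list if compatible)."""
    issues = []
    if not datalab.validated:
        issues.append("Datalab is not validated")
    if datalab.matching_id_type != dcr.matching_id_type:
        issues.append("Matching ID type differs from the Media DCR")
    if "matching" not in datalab.tables:
        issues.append("Matching table is missing")

    types = dcr.collaboration_types
    if "insights" in types and "segments" not in datalab.tables:
        issues.append("Insights requires the Segments table")
    if "lookalike" in types and not ({"segments", "embeddings"} & datalab.tables):
        issues.append("AI lookalike audiences requires the Segments or Embeddings table")
    if "rule_based" in types and "segments" not in datalab.tables:
        issues.append("Rule-based audiences requires the Segments table")
    return issues
```

Returning a list of reasons rather than a single boolean makes it easier to see which table or setting needs to be fixed before provisioning.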

Provisioning the base audience

  1. Click Provision Datalab
  2. Select a compatible and validated Datalab. To be compatible, the Datalab must have the same matching ID type as the Media DCR (see the Configuration tab for this information).
[Figure: Provision base audience]

Your Datalab can be provisioned to multiple Media DCRs simultaneously, and you can create multiple Datalabs for different use cases or matching ID types.

Refreshing the base audience

To refresh base audience data in Datalabs:

  1. Upload new datasets to the Dataset Portal
    • Optionally delete existing datasets from the Dataset Portal
  2. Deprovision existing datasets from your Datalab(s) and provision the newly uploaded datasets
  3. Deprovision the existing Datalab from the Media DCR and reprovision the same Datalab (this propagates the new datasets to the Media DCR)
tip

For frequent data refreshes (of both seed and base audiences), Decentriq recommends using the Python SDK to automate these steps.

View overlap

note

To see the overlap, you need the View overlap permission.

After the base and seed audiences have been provided (see above), the overlap calculation is triggered. Once it has completed, the overlap chart is shown. You can select the seed audience (audienceType) for which the chart is displayed. Overlap statistics are only shown if they include at least 100 users; seed audiences (audienceType) that are too small cannot be selected.
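
As a rough illustration of the selection rule, the sketch below filters audience types by the 100-user minimum. It assumes the threshold is applied to the overlap count of each audienceType, which is an interpretation of the text above; the function name and the input mapping are hypothetical.

```python
MIN_DISPLAYED_USERS = 100  # overlap statistics below this size are not shown


def selectable_audience_types(overlap_by_type, min_users=MIN_DISPLAYED_USERS):
    """Return the audienceType values whose overlap is large enough to display.

    `overlap_by_type` maps each audienceType to its overlap user count
    (a hypothetical input; the platform computes these values itself).
    """
    return sorted(name for name, users in overlap_by_type.items() if users >= min_users)
```

For example, with overlaps of 2,300 for "Sneaker Campaign" and 40 for "Athleisure Campaign", only "Sneaker Campaign" would be selectable under this assumption.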

[Figure: View overlap]