Get started with Decentriq UI
Below, you will go through a practical example showing the following use case:
A bank and an insurance provider want to know the overlap of their customer bases, but neither can share its CRM data with the other party. This is an example of a collaboration workflow that the Decentriq platform makes possible. Using the Decentriq platform, the parties can securely connect sensitive customer data while keeping it private, and run the overlap computation on it with a straightforward workflow.
In this example, we will define the computations in SQL. You can use the following files to reproduce the steps below and create your first Data Clean Room:
Step 1 - Access the platform
- Navigate to https://platform.decentriq.com/
- Log in with your credentials. If you do not have any credentials yet, please contact your contact person on the Decentriq team
Step 2 - Create a Data Clean Room
- Click the New Data Clean Room button.
- Give it a name. The example Data Clean Room computes the confidential overlap between a bank and an insurance provider.
- Here you can decide to start from scratch or to import a JSON template like the one in the 'simple example material' folder linked at the beginning of this page.
Step 3 - Define the datasets
Define the datasets to be provisioned by Data Owners:
- By default, datasets must be provisioned before any computations that depend on them can run. This behavior can be toggled via the checkbox.
- Add a new table when using structured datasets (CSV) and define the expected schema by adding columns with types.
- Add a new file when working with unstructured datasets (JSON, TXT, ZIP or any other kind).
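As a concrete illustration, a structured dataset for this example might be a customer table with an email column used for matching. The table name and columns below are hypothetical, not prescribed by the platform; this sketch simply checks that a CSV file's header matches the schema you would define in the Data tab before provisioning it:

```python
import csv
import io

# Hypothetical expected schema for the bank's customer table:
# column name -> declared type, as it would be defined in the Data Clean Room UI.
EXPECTED_SCHEMA = {"email": "TEXT", "first_name": "TEXT", "last_name": "TEXT"}

def header_matches_schema(csv_text: str, schema: dict) -> bool:
    """Return True if the CSV header has exactly the expected columns, in order."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, [])
    return header == list(schema)

sample = "email,first_name,last_name\nalice@example.com,Alice,Smith\n"
print(header_matches_schema(sample, EXPECTED_SCHEMA))  # True
```

Validating files locally like this before uploading avoids provisioning failures caused by a header that does not match the declared table schema.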
Step 4 - Define the computations
These can be SQL, Python, R or Synthetic Data. In this example, we will use a SQL query to define a computation that calculates the overlap. For a list of supported data types and SQL clauses, check the SQL Computation section.
- Here, you can also set up the privacy settings. Their purpose is to guarantee that the output does not leak sensitive data. In the example, the privacy filter is activated, which guarantees that a minimum number of rows is aggregated before the output is shown.
- Type in the query content
- Use the Table browser for a quick reference of the tables and columns available. Click the copy icon in front of each item and paste it directly into the editor for a faster experience.
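To make the overlap computation concrete, here is a minimal sketch of the kind of SQL query you might type in, executed locally against SQLite with toy data. The table and column names are hypothetical, and inside a Data Clean Room the query runs in the confidential computing environment rather than locally:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical tables mirroring the two parties' provisioned datasets.
cur.execute("CREATE TABLE bank_customers (email TEXT)")
cur.execute("CREATE TABLE insurance_customers (email TEXT)")
cur.executemany("INSERT INTO bank_customers VALUES (?)",
                [("alice@example.com",), ("bob@example.com",), ("carol@example.com",)])
cur.executemany("INSERT INTO insurance_customers VALUES (?)",
                [("bob@example.com",), ("carol@example.com",), ("dave@example.com",)])

# An overlap query of the kind described above: it returns only an
# aggregate count, never the matching customer records themselves.
overlap_query = """
    SELECT COUNT(*) AS overlap_size
    FROM bank_customers b
    INNER JOIN insurance_customers i ON b.email = i.email
"""
print(cur.execute(overlap_query).fetchone()[0])  # 2
```

Because the query aggregates rather than listing matched rows, it is the sort of output a row-count privacy filter is designed to protect.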
Additionally, you could create a new Synthetic Data computation that takes a sensitive table as its source and produces artificial data with the same schema as the source:
- Mask the columns whose values should not appear in the results - these will be replaced with random values of the appropriate type.
- All other columns will be synthesized using differential privacy while keeping similar statistical properties.
Add as many computations as you wish, combining different languages and referencing results from each other.
Once completed, press the Test all computations button to make sure they will work once the Data Clean Room is published.
This will test the computation with empty datasets and only return the expected result schema.
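The dry-run behavior described above can be illustrated locally: running an overlap-style query against empty tables yields no meaningful values, only the shape of the result. This is a sketch with hypothetical table names; the real test runs inside the confidential computing environment:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
# Empty tables standing in for not-yet-provisioned datasets.
cur.execute("CREATE TABLE bank_customers (email TEXT)")
cur.execute("CREATE TABLE insurance_customers (email TEXT)")

cur.execute("""
    SELECT COUNT(*) AS overlap_size
    FROM bank_customers b
    INNER JOIN insurance_customers i ON b.email = i.email
""")
# Only the result schema is of interest here, not the values.
print([col[0] for col in cur.description])  # ['overlap_size']
```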
After publishing, Data Owners can provision datasets and the computations can be run.
Step 5 - Set permissions
Define the participants that will be invited to the collaboration, and assign them permissions to interact with the tables, files and/or computations:
- Enable Data Clean Room interactivity to allow participants to request new computations (subject to approval by the affected Data Owners) after the Data Clean Room is published. Otherwise, the Data Clean Room will be immutable by default.
- Enable development environment to give participants access to a tab where they can run arbitrary computations based on the data and computation results they have permissions for.
- Use the dropdown boxes to assign Data Owner and Analyst permissions to each participant on each dataset and computation.
- Add a new participant by typing in their email - an invitation will be sent as soon as the Data Clean Room is published.
Step 6 - Encrypt and publish the Data Clean Room
- Click the Encrypt and publish button at the top-right side.
- The Data Clean Room definition will be enforced in our confidential computing environment once published, and cannot be changed unless the interactivity feature is enabled.
- Note that you can duplicate the DCR, or export its definition in JSON format to save it offline at any moment.
- Now, participants can start collaborating in the published Data Clean Room.
Step 7 - Provision datasets and run computations
The Actions tab contains all datasets you are a Data Owner of, and all computations you have Analyst permissions for. To see the entire DCR definition, please refer to the Overview tab.
- The Data Owners can provision datasets in CSV format to the tables by following the guided wizard. Analysts can then run the computations and get the results back.
- You can find the CSV files needed to run the example in the ZIP folder provided above. Once the datasets are provisioned, you can run your computation.
- It is also possible to provision unstructured datasets if a file was defined in the Data tab when drafting the DCR.
- Once all necessary data is available, click the Run button of each computation and get the results.
Step 8 - Browse provisioned datasets
From the sidebar, access the overview of all your provisioned data in the Datasets page:
Here you can see the full list of the datasets you have uploaded, which Data Clean Rooms they are provisioned to, and some metadata:
- Size and number of rows
- Columns of the dataset
- Summary statistics that have been computed during the upload, if available.
Sharing summary statistics with other Data Clean Room participants is optional.
Step 9 - Check the tamper-proof audit log
All participants of a DCR created via the UI have auditing permissions, for full transparency and to build trust among them:
This means that all of them can:
- Inspect the Data Clean Room definition
- Be aware of who uploaded data
- Access and download the audit log, i.e. the register of all activities in the Data Clean Room together with the user who performed each one