Decentriq’s Data Clean Rooms support running arbitrary R scripts in confidential computing, guaranteeing a high level of security. You can use R to perform complex statistical analyses on sensitive data that is never revealed to anyone. It's also possible to use R Markdown and generate LaTeX documents.
The list of available libraries is not exhaustive.
How it works
The R enclave worker is a containerized confidential computing environment inside a Data Clean Room. It takes datasets as input, executes an arbitrary script, and generates an output. The container has no open ports, so it cannot perform HTTP requests or access external resources. All relevant files must be mounted in advance, as explained below.
At the moment, only CPU processing is supported. Make sure your script does not require a GPU to execute.
Input files are accessible read-only via the /input directory.
Results of computations available in the Data Clean Room can be mounted to this directory. Once mounted, files are located at a specific path depending on the computation type.
- Python and R:
- SQL and Synthetic data:
SQL Computation named
salary_sum can be accessed at
The dataset.csv file does not include headers.
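Because dataset.csv has no header row, any script that reads it has to supply the column names itself. The minimal Python sketch below does this with the standard csv module. The function name read_headerless_csv is hypothetical and not part of any Decentriq API; the commented path follows the pattern used by the R example in this section.

```python
import csv


def read_headerless_csv(lines, column_names):
    """Parse headerless CSV lines (e.g. an open file) into a list of dicts.

    The first line is treated as data, not as a header, because the
    caller supplies the column names. Values remain strings.
    """
    return list(csv.DictReader(lines, fieldnames=column_names))


# Usage inside a computation (path pattern taken from the R example):
# with open("/input/table_name/dataset.csv", newline="") as f:
#     rows = read_headerless_csv(f, ["column_name_1", "column_name_2"])
rows = read_headerless_csv(["a,1", "b,2"], ["column_name_1", "column_name_2"])
print(rows)
```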
Optionally, you can also mount your own static files to support your script.
Learn how to mount input files in the sections below, either via the Decentriq UI or the Python SDK.
Bring your existing R script.
Example logic to read, process and output data:
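# Import libraries
library(dplyr)
# Read content from input file (dataset.csv has no header row)
table_data <- read.csv(file = "/input/table_name/dataset.csv", sep = ",", header = FALSE)
names(table_data) <- c("column_name_1", "column_name_2")
# Process data: compute the mean of column_name_2
results <- table_data %>%
  summarise(column_name_2 = mean(column_name_2))
# Write resulting files to output folder (the file name is illustrative)
write.csv(results, "/output/results.csv", row.names = FALSE)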
The input files are only available after the Data Clean Room is published and a dataset is provisioned. Therefore, when you validate this script (before publishing), the input files will be empty.
To avoid validation errors caused by an empty dataset, we recommend wrapping the data processing logic in a tryCatch() block to handle expected issues, or mounting a test dataset to the container and adding an if clause that uses it when the main dataset is empty.
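The fallback approach described above can be sketched in Python as follows; in an R script the same idea maps to file.exists()/file.size() checks and tryCatch(). This is a sketch under stated assumptions: the helper names choose_input and process_safely, and a test dataset mounted at /input/test_data/dataset.csv, are hypothetical.

```python
import os


def choose_input(main_path, test_path):
    """Return main_path if it exists and is non-empty, else fall back to test_path.

    Before publishing, the mounted dataset is empty, so validation runs
    against the (hypothetical) test dataset instead.
    """
    if os.path.isfile(main_path) and os.path.getsize(main_path) > 0:
        return main_path
    return test_path


def process_safely(path, process):
    """Run process(path); return None instead of failing on expected I/O or parse errors."""
    try:
        return process(path)
    except (OSError, ValueError) as exc:
        print(f"Skipping processing of {path}: {exc}")
        return None


# Usage sketch with hypothetical paths and a hypothetical processing function:
# path = choose_input("/input/table_name/dataset.csv", "/input/test_data/dataset.csv")
# result = process_safely(path, my_processing_function)
```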
The /tmp directory is available read/write during script execution to support your logic. It is wiped once the execution is completed and is not included in the output.
Output files are accessible write-only via the /output directory.
Write all resulting files of your computation to this directory. Sub-directories are also supported.
Once the execution is completed, the output becomes available as <computation_name>.zip, which users with the required permissions can download.
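To anticipate which entry names you will find in the downloaded archive, you can emulate the zipping step locally. The sketch below is a local approximation built on Python's zipfile module, not Decentriq's implementation; it assumes entries are stored relative to /output with sub-directories preserved. zip_output_dir is a hypothetical helper.

```python
import io
import os
import tempfile
import zipfile


def zip_output_dir(output_dir):
    """Zip every file under output_dir, preserving sub-directories.

    Entry names are relative to output_dir, e.g. "reports/summary.txt".
    Returns the archive as bytes.
    """
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as zf:
        for root, _dirs, files in os.walk(output_dir):
            for name in files:
                full_path = os.path.join(root, name)
                zf.write(full_path, os.path.relpath(full_path, output_dir))
    return buffer.getvalue()


# Local demo: a stand-in for /output containing one sub-directory.
with tempfile.TemporaryDirectory() as out:
    os.makedirs(os.path.join(out, "reports"))
    with open(os.path.join(out, "results.csv"), "w") as f:
        f.write("a,1\n")
    with open(os.path.join(out, "reports", "summary.txt"), "w") as f:
        f.write("done\n")
    archive = zip_output_dir(out)

with zipfile.ZipFile(io.BytesIO(archive)) as zf:
    print(sorted(zf.namelist()))  # ['reports/summary.txt', 'results.csv']
```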
Create an R computation in the Decentriq UI
Access platform.decentriq.com with your credentials
Create a Data Clean Room
In the Computations tab, add a new computation of type R and give it a name.
In the File browser on the right side, mount the necessary input files (which will become available in the /input directory) by selecting existing computations, tables or files in the Data Clean Room.
In the Main script tab, paste your existing script and adapt the file paths based on the mounted files.
Clicking the copy icon next to each file in the file browser gives you a snippet that imports it into a data frame or file. Paste it directly into your script.
If necessary, add static text files to the container by clicking the + icon next to the Main script tab. These files will be available in the /input directory.
Press the Test all computations button to check for errors in the script.
Once the Data Clean Room is configured with data, computations and permissions, press the Encrypt and publish button.
As soon as the Data Clean Room is published, your computation is available in the Overview tab, where you can press Run and get the results.
Create an R computation using the Python SDK
The decentriq_platform.container module provides functionality to run computations within containers. For example, it enables executing R scripts within the trusted execution environment and processing both structured and unstructured input data.
How to use this functionality is illustrated in this example. It is also recommended to read through the Python SDK tutorial, as it introduces certain important concepts and terminology used in this example.
Assume we want to create a Data Clean Room that simply converts some text in an input file to uppercase. Using the Python SDK to accomplish this task could look as follows:
First, we set up a connection to an enclave and create a DataRoomBuilder object:
import io
import decentriq_platform as dq
import decentriq_platform.container as dqc
user_email = "firstname.lastname@example.org"
api_token = "@@ YOUR TOKEN HERE @@"
client = dq.create_client(user_email, api_token)
enclave_specs = dq.enclave_specifications.versions([
auth, _ = client.create_auth_using_decentriq_pki(enclave_specs)
session = client.create_session(auth, enclave_specs)
builder = dq.DataRoomBuilder(
We request the enclave specification named
"decentriq.r-latex-worker-32-32:v12" in addition to the driver enclave.
This enclave allows you to run R code and provides common libraries as part of its R environment.
The script to uppercase text contained in an input file could look like this:
my_script_content = b"""
df_lowercase <- read.csv("/input/lowercase.csv", header=FALSE)
df_uppercase <- lapply(df_lowercase, toupper)
write.table(df_uppercase, "/output/uppercase.csv", sep=",", col.names=FALSE, row.names = FALSE, quote = FALSE)
"""
Here we defined the script within a multi-line string. For larger scripts, however, defining them in a file would likely be easier.
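To know what result to expect before running the computation in the enclave, you can emulate the script's logic locally. The pure-Python sketch below mirrors what the R script does; uppercase_csv is a hypothetical helper. R's toupper() and Python's str.upper() agree on ASCII text but may differ for some non-ASCII characters.

```python
import csv
import io


def uppercase_csv(text):
    """Emulate the R script locally: read a headerless CSV, upper-case every
    value, and write it back without header, row names, or quotes."""
    out_lines = []
    for row in csv.reader(io.StringIO(text)):
        if not row:  # skip blank lines, as read.csv does by default
            continue
        out_lines.append(",".join(value.upper() for value in row) + "\n")
    return "".join(out_lines)


print(repr(uppercase_csv("hello,world")))  # 'HELLO,WORLD\n'
```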
To use this script in a Data Clean Room, it first has to be loaded into a
StaticContent node, which is then added to the DCR configuration. This makes the R script visible to all the participants in the DCR.
# If you wrote your script in a separate file, you can simply open
# the file using `with open` (note that we specify the "b" flag to read
# the file as a binary string), like so:
# with open("my_script.R", "rb") as data:
#     my_script_content = data.read()
script_node = dq.StaticContent("script_node", my_script_content)
script_node_id = builder.add_compute_node(script_node)
The StaticContent node does not run the computation; it simply provides the script to be executed. Before worrying about execution, however, we need to add a data node to which we can upload the input file whose content should be converted to uppercase:
data_node_id = builder.add_data_node("input_data_node", is_required=True)
Whenever we add a data or compute node, the builder object will assign an identifier to the newly added node. This identifier needs to be provided to the respective method whenever we want to interact with this node.
Now we can add the node that will actually execute our script. The compute node class capable of executing such scripts is called StaticContainerCompute.
When creating this node, we need to specify the path at which the script should be made available (called "mounting") so that we can refer to it from within the enclave. The same holds for any input data that we provide, either with additional StaticContent nodes (note that such static content is visible to all participants of a DCR) or with data nodes, as we do here. This is achieved using MountPoint objects, which live in the proto namespace (these are low-level objects used in client-enclave communication). We also specify the output path, i.e. the directory in which we store all of our output files. The Decentriq platform automatically zips all the files in this location and provides them as the result of this computation.
Finally, we tell the platform what particular enclave to use for executing our script (remember that we requested the corresponding enclave specification earlier when creating the Data Clean Room builder object).
from decentriq_platform.container.proto import MountPoint
uppercase_csv_node = dqc.StaticContainerCompute(
The name given to the compute node has no meaning to the enclave and only serves as a human-readable name.
The enclave addresses computations (and data nodes) using identifiers that are automatically generated by the Data Clean Room builder object when adding the node to the Data Clean Room. These ids are required when we want to interact with the node (e.g. triggering the computation or referring to the computation when adding user permissions).
uppercase_csv_node_id = builder.add_compute_node(uppercase_csv_node)
data_room = builder.build()
data_room_id = session.publish_data_room(data_room)
After building and publishing the DCR, we can upload data and connect it to our input node.
key = dq.Key()
# Here again, you can use the Python construct `with open(path, "rb") as data`
# to read the data in the right format from a file.
data = io.BytesIO(b"hello,world")
dataset_id = client.upload_dataset(data, key, "myfile")
session.publish_dataset(data_room_id, dataset_id, data_node_id, key)
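The upload above uses a single hard-coded row. For more rows, the sketch below serializes them to headerless CSV bytes with the standard csv module. It writes no header because the R script reads the file with header=FALSE; UTF-8 encoding is an assumption, and rows_to_csv_bytes is a hypothetical helper. The resulting io.BytesIO object can be passed to client.upload_dataset as in the snippet above.

```python
import csv
import io


def rows_to_csv_bytes(rows):
    """Serialize rows to headerless CSV bytes (UTF-8 assumed)."""
    text = io.StringIO()
    # Use "\n" line endings instead of the csv module's default "\r\n".
    csv.writer(text, lineterminator="\n").writerows(rows)
    return text.getvalue().encode("utf-8")


data = io.BytesIO(rows_to_csv_bytes([["hello", "world"], ["good", "morning"]]))
# data can now be passed to client.upload_dataset(data, key, "myfile").
```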
When retrieving results for the computation, you get binary data representing a zip archive. It can be read as a zipfile.ZipFile object containing all the files you wrote to the specified output path:
raw_result = session.run_computation_and_get_results(data_room_id, uppercase_csv_node_id)
zip_result = dqc.read_result_as_zipfile(raw_result)
result = zip_result.read("uppercase.csv").decode()
assert "HELLO,WORLD" in result
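If your script writes several files to the output directory, you can read them all at once. The small sketch below assumes every output file is UTF-8 text; read_all_text_files is a hypothetical helper, and the in-memory archive only stands in for the zip_result returned by dqc.read_result_as_zipfile.

```python
import io
import zipfile


def read_all_text_files(zip_result, encoding="utf-8"):
    """Return {entry name: decoded text} for every file entry in a zipfile.ZipFile."""
    return {
        info.filename: zip_result.read(info).decode(encoding)
        for info in zip_result.infolist()
        if not info.is_dir()
    }


# Stand-in for the zip_result returned by dqc.read_result_as_zipfile(raw_result).
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("uppercase.csv", "HELLO,WORLD\n")

outputs = read_all_text_files(zipfile.ZipFile(buffer))
print(outputs)  # {'uppercase.csv': 'HELLO,WORLD\n'}
```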
When the referenced Data Clean Room was created using the Decentriq UI, the compute_node_id argument of run_computation_and_get_results() has a specific format that includes <NODE_ID>, where <NODE_ID> corresponds to the value you see when hovering your mouse pointer over the name of that computation.