Supported data sources

Decentriq supports importing data from the following external data sources:

  • Amazon S3
  • Azure Blob Storage
  • Google Cloud Storage
  • Snowflake
  • Salesforce
  • Permutive

Amazon S3, Azure Blob Storage and Google Cloud Storage

For these sources, the storage bucket configuration (bucket name, region and object key) is required, along with access credentials. The imported object is stored as-is, without any data manipulation. Once imported, the dataset can be provisioned to any DCR for further collaboration.
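As a sketch, the information the import requires for an S3 source can be pictured as the following structure. The field names and values here are illustrative only and do not match the exact UI labels:

```python
# Illustrative sketch of the configuration an S3 import needs.
# Field names and values are examples, not the exact Decentriq UI labels.
s3_import_config = {
    "bucket": "my-company-datasets",    # S3 bucket name
    "region": "eu-central-1",           # AWS region of the bucket
    "object_key": "exports/users.csv",  # object to import, stored as-is
    "credentials": {
        "access_key_id": "EXAMPLE_KEY_ID",      # key with read access
        "secret_access_key": "EXAMPLE_SECRET",  # matching secret key
    },
}

# The object is imported without transformation, so its format in the
# bucket (e.g. CSV) is the format of the resulting dataset.
required = {"bucket", "region", "object_key", "credentials"}
missing = required - s3_import_config.keys()
assert not missing, f"missing fields: {missing}"
```

The same kind of information (container/bucket, object, region and credentials) applies to Azure Blob Storage and Google Cloud Storage imports.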

Import from Amazon S3 via the Decentriq UI

  1. From the sidebar, open the Datasets page and click the Import from external source button in the top right corner, then select Amazon S3 from the available options.
[Screenshot: Import dataset sources]
  2. Fill in all fields with the required configuration and access credentials for the S3 bucket.
[Screenshot: S3 import configuration]
  3. The import process starts. Once in progress, the browser window can be closed; the task keeps running on the server side.
[Screenshot: S3 import in progress]
  4. Once the process succeeds, the dataset is available on the Datasets page and can be provisioned to multiple Data Clean Rooms.
[Screenshot: S3 imported dataset]

Import from Amazon S3 via the Python SDK

To import datasets programmatically using the Decentriq Python SDK, please follow the Import Datasets Cookbook.

Snowflake

The following steps must be performed on the Snowflake platform prior to importing the data:

  1. Create a file format compatible with Decentriq datasets.
CREATE OR REPLACE FILE FORMAT decentriq_csv
TYPE = 'CSV'
FIELD_DELIMITER = ','
FIELD_OPTIONALLY_ENCLOSED_BY = '0x22'
COMPRESSION = NONE;
  2. Create a Snowflake internal stage. It is important that Snowflake server-side encryption (SNOWFLAKE_SSE) is enabled.
CREATE OR REPLACE STAGE decentriq_export_stage ENCRYPTION = (TYPE = 'SNOWFLAKE_SSE');
  3. Unload the data from a database table into the previously created stage.
COPY INTO @decentriq_export_stage FROM DB_NAME.DB_SCHEMA.TABLE_NAME FILE_FORMAT = decentriq_csv;
  4. Access the stage (e.g. with LIST @decentriq_export_stage;) and verify that the data has been unloaded. One or more CSV files should be present, for example:
data_0_0_0.csv
data_0_1_0.csv
data_0_1_1.csv

Once the above steps are completed, fill in the required fields in the Decentriq UI and start the import process. This creates a single dataset containing the contents of all the staged files. Once complete, the dataset is ready to be provisioned to any DCR for further collaboration.
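Because Snowflake unloads one table into several part files, the import concatenates all staged files into one dataset. A minimal local sketch of that behaviour, with invented in-memory file contents (the decentriq_csv format above writes no header row, so part files can be appended directly):

```python
import csv
import io

# Invented stand-ins for the unloaded part files in the stage.
part_files = {
    "data_0_0_0.csv": "1,alice\n2,bob\n",
    "data_0_1_0.csv": "3,carol\n",
}

def combine_parts(parts):
    """Concatenate CSV part files into one list of rows.

    The file format above writes no header row, so the parts can be
    appended directly in order.
    """
    rows = []
    for name in sorted(parts):
        rows.extend(csv.reader(io.StringIO(parts[name])))
    return rows

combined = combine_parts(part_files)
# combined -> [['1', 'alice'], ['2', 'bob'], ['3', 'carol']]
```

This is only a mental model of the server-side behaviour; the actual merge happens inside the Decentriq platform during the import.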

Salesforce

Decentriq can import any object from Salesforce and store it as a dataset in the Decentriq platform. Note: the imported object is stored as-is, without any data manipulation.

Permutive

Decentriq imports Publisher datasets from Permutive via a Cloud Storage service and stores them as datasets in the Decentriq platform. These datasets can then be provisioned directly to Decentriq Data Labs for Publishers. Please prepare your datasets in advance of importing.