Supported data sources
Decentriq supports importing data from the following external data sources:
- Amazon S3
- Azure Blob Storage
- Google Cloud Storage
- Snowflake
- Salesforce
- Permutive
Amazon S3, Azure Blob and Google Cloud Storage
The storage bucket configuration, such as the bucket name, region and object key, is required, as well as the access credentials. The imported object will be stored as-is, without any data manipulation. Once imported, the dataset can be provisioned to any DCR for further collaboration.
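As a quick sanity check before starting an import, you can confirm that the credentials you plan to enter can actually read the object. The sketch below uses boto3 for Amazon S3; the bucket name, region, object key and credentials are placeholders, not values provided by Decentriq.
import boto3
from botocore.exceptions import ClientError

# Placeholder values - use the same bucket, region, key and credentials
# you intend to enter in the Decentriq import form.
s3 = boto3.client(
    "s3",
    region_name="eu-central-1",
    aws_access_key_id="AKIA...",
    aws_secret_access_key="...",
)

try:
    # head_object succeeds only if the credentials can read the object.
    meta = s3.head_object(Bucket="my-bucket", Key="exports/data.csv")
    print(f"Object found, size: {meta['ContentLength']} bytes")
except ClientError as err:
    print(f"Cannot access object: {err}")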
Import from Amazon S3 via the Decentriq UI
After accessing the Datasets page from the sidebar and clicking the Import from external source button in the top-right corner, select Amazon S3 from the available options:
Fill in all fields with the required configuration and the access credentials for the S3 bucket:
The importing process will start. Once in progress, the browser window can be closed as the task will still be running on the server side.
Once the process has succeeded, the dataset is available on the Datasets page and can be provisioned to multiple Data Clean Rooms.
Import from Amazon S3 via the Python SDK
To import datasets programmatically using the Decentriq Python SDK, please follow the Import Datasets Cookbook.
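The import-from-S3 API is documented in the cookbook itself; as an illustration only, a minimal sketch of working with datasets through the SDK is shown below. The helper names (create_client, Key, upload_dataset) are taken from older SDK examples and should be verified against the cookbook, and the credentials and file name are placeholders.
import decentriq_platform as dq

# Assumed helper names from SDK examples; check the Import Datasets Cookbook
# for the current interface before using this in practice.
client = dq.create_client("user@example.com", "YOUR_API_TOKEN")

# Datasets are encrypted client-side; Key() generates the encryption key.
encryption_key = dq.Key()

with open("data.csv", "rb") as f:
    dataset_id = client.upload_dataset(f, encryption_key, "data.csv")

print(f"Uploaded dataset: {dataset_id}")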
Snowflake
The following steps must be performed on the Snowflake platform prior to importing the data:
- Create a file format compatible with Decentriq datasets.
CREATE OR REPLACE FILE FORMAT decentriq_csv
TYPE = 'CSV'
FIELD_DELIMITER = ','
FIELD_OPTIONALLY_ENCLOSED_BY = '0x22'
COMPRESSION = NONE
- Create a Snowflake internal stage. It is important that Snowflake server-side encryption (SNOWFLAKE_SSE) is enabled.
CREATE OR REPLACE STAGE decentriq_export_stage ENCRYPTION = (TYPE = 'SNOWFLAKE_SSE');
- Unload the data from a database table into the previously created stage.
COPY INTO @decentriq_export_stage FROM DB_NAME.DB_SCHEMA.TABLE_NAME FILE_FORMAT = decentriq_csv;
- Access the stage and verify that the data has been unloaded. One or more CSV files should be present. An example list looks like:
data_0_0_0.csv
data_0_1_0.csv
data_0_1_1.csv
Once the above steps are completed, fill in the required fields in the Decentriq UI and start the import process. This creates a single dataset containing the contents of all the staged files. Once complete, the dataset is ready to be provisioned to any DCR for further collaboration.
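If you prefer to script the staging steps above instead of running them in a Snowflake worksheet, a minimal sketch using the Snowflake Python connector could look as follows; the account, credentials and the database, schema and table names are placeholders.
import snowflake.connector

# Placeholder connection parameters - replace with your account details.
conn = snowflake.connector.connect(
    account="my_account",
    user="my_user",
    password="my_password",
    warehouse="my_warehouse",
    database="DB_NAME",
    schema="DB_SCHEMA",
)

statements = [
    # File format compatible with Decentriq datasets.
    """CREATE OR REPLACE FILE FORMAT decentriq_csv
       TYPE = 'CSV'
       FIELD_DELIMITER = ','
       FIELD_OPTIONALLY_ENCLOSED_BY = '0x22'
       COMPRESSION = NONE""",
    # Internal stage with Snowflake server-side encryption enabled.
    "CREATE OR REPLACE STAGE decentriq_export_stage ENCRYPTION = (TYPE = 'SNOWFLAKE_SSE')",
    # Unload the table into the stage as CSV files.
    "COPY INTO @decentriq_export_stage FROM DB_NAME.DB_SCHEMA.TABLE_NAME FILE_FORMAT = decentriq_csv",
]

cur = conn.cursor()
try:
    for stmt in statements:
        cur.execute(stmt)
    # Verify that the data has been unloaded into the stage.
    cur.execute("LIST @decentriq_export_stage")
    for row in cur.fetchall():
        print(row[0])  # staged file name, e.g. data_0_0_0.csv
finally:
    cur.close()
    conn.close()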
Salesforce
This will import any object from Salesforce and store it as a dataset in the Decentriq platform. Note: The imported file/dataset will be stored as is, without any data manipulation.
Permutive
This will import Publisher datasets from Permutive via a Cloud Storage service and store them as datasets in the Decentriq platform. These datasets can then be provisioned directly to Decentriq Data Labs for Publishers. Please prepare your datasets in advance for importing.