SQL computation
Decentriq’s Data Clean Rooms are equipped with a own SQL engine, powered by confidential computing to guarantee a high level of security. You can query on provisioned datasets, and only results will be available. Sensitive data is never revealed to anyone. Results of SQL computations can be used as input to other computations such as Python, R and Synthetic Data Generation.
Current version
decentriq.sql-worker:v10
Datatypes
VARCHAR
INT
-BIGINT
FLOAT
NULL
values are supportedNULL
literal in expressions not yet supported
NOT NULL
annotation supported
Aggregation functions
SUM
AVG
MAX
MIN
COUNT
STDEV
STDEVP
Window and ranking functions
ROW_NUMBER
RANK
PERCENT_RANK
PERCENTILE_CONT
(can be used for MEAN)NTILE
RANDOM() OVER()
(To generate a random value)
Scalar functions
LENGTH(string)
-> To get the number of characters of a stringROUND( float_number , n_of_decimals[OPTIONAL] )
→ To round a float number to a give n. of decimals. If the second parameter is not given, it returns an integer. Please note that it can be applied only on columns and cannot be used on aggregation functions
Supported syntax clauses table
Function | Description | Syntax example |
---|---|---|
SELECT & FROM | Full support | SELECT a, b, c FROM TEST |
WILDCARD operator | supports "*" | SELECT * FROM test |
WHERE clause | Full support for WHERE clause | SELECT a FROM test WHERE b >= 4 |
GROUP BY | GROUP BY full support | SELECT column1, column 2, COUNT(*) FROM table GROUP BY column1, column 2 |
ORDER BY | You can now order the output rows by one or more columns | SELECT column1, column2, ... FROM table_name ORDER BY column1, column2, ... ASC|DESC |
JOINs: INNER, LEFT, RIGHT, FULL OUTER, CROSS | INNER JOIN, LEFT JOIN, RIGHT JOIN, FULL OUTER JOIN, CROSS JOIN for joining tables. only supports equality conditions for the ON operator. Rows with NULL keys are treated as if there was no match. | SELECT table1.column, table2.column FROM table1 LEFT JOIN table2 ON table1.key = table2.key |
Fuzzy matching | fuzzystrmatch(column1, column2, length, unique = false) where length is the Levenshtein distance and the optional unique controls if all or only the best match are returned. Supported on FROM statements. Supports AND clause. - If unique = false, all rows match where column1 and column2 have Levenshtein distance of at most length. - If unique = true, then for each element in column1 only the element in column2 with the lowest Levenshtein distance matches. If multiple have the same lowest Levenshtein distance, only the first match is returned. If all are larger than length, no element matches. | SELECT * FROM table1 INNER JOIN table2 ON table1.last_name = table2.last_name AND fuzzystrmatch(table1.first_name, table2.first_name, 4, true) |
UNION & UNION ALL | It is possible to combine different SELECT statement with same column number and similar column type | SELECT FROM table1 UNION SELECT FROM table 2 |
Aliasing of columns, tables and subqueries | Columns, tables and queries can be renamed with the 'AS' clause | SELECT a AS alias FROM TEST |
CAST operator | Cast operator - limited for: BIG_INT, DOUBLE and VARCHAR | SELECT CAST(10 as DOUBLE) FROM test |
CASE operator | CASE operator in SELECT clauses (must include an ELSE clause) | SELECT CASE WHEN a = '1' THEN 'it is one' WHEN a = '2' THEN 'it is two' ELSE 'i do not know' END FROM test |
IN operator | IN operator in the where clause - no support for subquery | SELECT a FROM test WHERE b IN (1, 2, 3) |
BETWEEN operator | BETWEEN operator in the where clause | SELECT a FROM test WHERE b BETWEEN 10 AND 20 |
Subquery in FROM statement | SELECT a FROM (SELECT b FROM test) | |
Subquery | Subquery using WITH clause | WITH subquery_table as (... SUBQUERY ....) SELECT ... FROM subquery_table |
DISTINCT | Support for the DISTINCT clause on the SELECT | SELECT DISTINCT a FROM table_A |
Privacy filter
Decentriq supports a k-anonymity filter feature for the SQL queries that are run in the Data Clean Rooms.
The K-anonymity technology forces the queries to output only aggregated values over a minimum amount of rows (the K parameter). All the other results will not be returned to the users and will remain unaccessible within the confidential computing environment.
Querying computation results
In the Data Clean Room, SQL queries can reference the output of other computations in the FROM
statement to compose nested queries. You can create independent computations or chains of them, using the output of one computation as the input of the next. This approach also allows for combining SQL and Python, as well as other types like Synthetic Data Generation. To use nested queries within the same block, you can use the WITH..AS()
approach, which will run the entire block at once.
# Computation_1 querying table1.
SELECT sum(col_a) AS sum_col_a
FROM table1
# Computation_2 querying Computation_1.
SELECT sum_col_a
FROM Computation_1
WHERE sum_col_a>5
Create a SQL computation in the Decentriq UI
Follow the Decentriq UI tutorial
Create a SQL computation using the Python SDK
Follow the Python SDK tutorial