Skip to main content

SQL computation

Decentriq’s Data Clean Rooms are equipped with a own SQL engine, powered by confidential computing to guarantee a high level of security. You can query on provisioned datasets, and only results will be available. Sensitive data is never revealed to anyone. Results of SQL computations can be used as input to other computations such as Python, R and Synthetic Data Generation.

Current version

decentriq.sql-worker:v10

Datatypes

  • VARCHAR
  • INT - BIGINT
  • FLOAT
  • NULL values are supported
    • NULL literal in expressions not yet supported
  • NOT NULL annotation supported

Aggregation functions

  • SUM
  • AVG
  • MAX
  • MIN
  • COUNT
  • STDEV
  • STDEVP

Window and ranking functions

  • ROW_NUMBER
  • RANK
  • PERCENT_RANK
  • PERCENTILE_CONT (can be used for MEAN)
  • NTILE
  • RANDOM() OVER() (To generate a random value)

Scalar functions

  • LENGTH(string) -> To get the number of characters of a string
  • ROUND( float_number , n_of_decimals[OPTIONAL] ) → To round a float number to a give n. of decimals. If the second parameter is not given, it returns an integer. Please note that it can be applied only on columns and cannot be used on aggregation functions

Supported syntax clauses table

FunctionDescriptionSyntax example
SELECT & FROMFull supportSELECT a, b, c FROM TEST
WILDCARD operatorsupports "*"SELECT * FROM test
WHERE clauseFull support for WHERE clauseSELECT a
FROM test
WHERE b >= 4
GROUP BYGROUP BY full supportSELECT column1, column 2, COUNT(*)
FROM table
GROUP BY column1, column 2
ORDER BYYou can now order the output rows by one or more columnsSELECT column1, column2, ...
FROM table_name
ORDER BY column1, column2, ...
ASC|DESC
JOINs: INNER, LEFT, RIGHT, FULL OUTER, CROSSINNER JOIN, LEFT JOIN, RIGHT JOIN, FULL OUTER JOIN, CROSS JOIN for joining tables. only supports equality conditions for the ON operator. Rows with NULL keys are treated as if there was no match.SELECT table1.column, table2.column
FROM table1 LEFT JOIN table2
ON table1.key = table2.key
Fuzzy matchingfuzzystrmatch(column1, column2, length, unique = false) where length is the Levenshtein distance and the optional unique controls if all or only the best match are returned. Supported on FROM statements. Supports AND clause.
- If unique = false, all rows match where column1 and column2 have Levenshtein distance of at most length.
- If unique = true, then for each element in column1 only the element in column2 with the lowest Levenshtein distance matches. If multiple have the same lowest Levenshtein distance, only the first match is returned. If all are larger than length, no element matches.
SELECT *
FROM table1 INNER JOIN table2
ON table1.last_name = table2.last_name AND
fuzzystrmatch(table1.first_name, table2.first_name, 4, true)
UNION & UNION ALLIt is possible to combine different SELECT statement with same column number and similar column typeSELECT FROM table1
UNION
SELECT
FROM table 2
Aliasing of columns, tables and subqueriesColumns, tables and queries can be renamed with the 'AS' clauseSELECT a AS alias
FROM TEST
CAST operatorCast operator - limited for: BIG_INT, DOUBLE and VARCHARSELECT CAST(10 as DOUBLE)
FROM test
CASE operatorCASE operator in SELECT clauses (must include an ELSE clause)SELECT CASE WHEN a = '1' THEN 'it is one' WHEN a = '2' THEN 'it is two' ELSE 'i do not know' END
FROM test
IN operatorIN operator in the where clause - no support for subquerySELECT a
FROM test
WHERE b IN (1, 2, 3)
BETWEEN operatorBETWEEN operator in the where clauseSELECT a
FROM test
WHERE b BETWEEN 10 AND 20
Subquery in FROM statementSELECT a FROM
(SELECT b FROM test)
SubquerySubquery using WITH clauseWITH subquery_table as (... SUBQUERY ....)
SELECT ... FROM subquery_table
DISTINCTSupport for the DISTINCT clause on the SELECTSELECT DISTINCT a
FROM table_A

Privacy filter

Decentriq supports a k-anonymity filter feature for the SQL queries that are run in the Data Clean Rooms.

The K-anonymity technology forces the queries to output only aggregated values over a minimum amount of rows (the K parameter). All the other results will not be returned to the users and will remain unaccessible within the confidential computing environment.

Querying computation results

In the Data Clean Room, SQL queries can reference the output of other computations in the FROM statement to compose nested queries. You can create independent computations or chains of them, using the output of one computation as the input of the next. This approach also allows for combining SQL and Python, as well as other types like Synthetic Data Generation. To use nested queries within the same block, you can use the WITH..AS() approach, which will run the entire block at once.

# Computation_1 querying table1.
SELECT sum(col_a) AS sum_col_a
FROM table1
# Computation_2 querying Computation_1.
SELECT sum_col_a
FROM Computation_1
WHERE sum_col_a>5

Create a SQL computation in the Decentriq UI

Follow the Decentriq UI tutorial

Create a SQL computation using the Python SDK

Follow the Python SDK tutorial