decentriq_util.sql

Functions

read_sql_data

def read_sql_data(
data_path: str,
schema_path: str,
) -> pandas.core.frame.DataFrame

Read data as output by the SQL validation node.

The input file must not contain a header row!

Empty values for columns of type TEXT are replaced with empty strings.

Empty values in columns of type INT64 are read as the "pandas.NA" value. For more information, please refer to the pandas documentation.
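The empty-value handling described above can be reproduced with plain pandas. The sketch below is illustrative only: the column names, data, and the CSV-based schema are assumptions for demonstration, not the actual validation-node output or its protobuf schema file.

```python
import io

import pandas as pd

# A headerless input file, as described above (illustrative data;
# the second row has an empty TEXT cell and an empty INT64 cell).
raw = io.StringIO("alice,1\n,\nbob,3\n")

# TEXT columns use pandas' string dtype with empty values replaced by
# empty strings; INT64 columns use the nullable Int64 dtype so that
# empty cells are read as pd.NA.
df = pd.read_csv(
    raw,
    header=None,
    names=["name", "score"],
    dtype={"name": "string", "score": "Int64"},
)
df["name"] = df["name"].fillna("")

print(df["name"].tolist())         # empty TEXT value becomes ""
print(df["score"].isna().tolist())  # empty INT64 value is pd.NA
```
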

read_sql_data_from_dir

def read_sql_data_from_dir(
path: str,
) -> pandas.core.frame.DataFrame

Note: this function is deprecated; use read_tabular_data instead.

This function uses the older read_sql_data function to convert the standard format of validated tabular input datasets, as well as the output of SQL computations, into a pandas dataframe, automatically importing the headers that are contained in the types file. For example:

import decentriq_util.sql as dqu
import pandas as pd

dataframe_1 = dqu.read_sql_data_from_dir("/input/mySQLComputation")
dataframe_2 = dqu.read_sql_data_from_dir("/input/myTabularDataset")

read_tabular_data

def read_tabular_data(
path: str,
) -> pandas.core.frame.DataFrame

Read data from a tabular input node or as written by an SQL computation.

The input file must not contain a header row!

Empty values for columns of type TEXT are replaced with empty strings.

Empty values in columns of type INT64 are read as the "pandas.NA" value. For more information, please refer to the pandas documentation.
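Because empty INT64 cells come back as pandas.NA, downstream code may need to account for missing values explicitly. The following pandas-only sketch uses made-up data to show how pd.NA behaves in a nullable Int64 column such as read_tabular_data would return:

```python
import pandas as pd

# An INT64 column with one empty cell, as read_tabular_data would
# return it (illustrative data, not a real dataset).
scores = pd.Series([10, pd.NA, 30], dtype="Int64")

# pd.NA propagates through element-wise arithmetic ...
doubled = scores * 2          # [20, <NA>, 60]

# ... but is skipped by aggregations.
total = scores.sum()          # 40

# Replace pd.NA explicitly before exporting as plain integers.
filled = scores.fillna(0)     # [10, 0, 30]
```
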

write_sql_data

def write_sql_data(
df: pandas.core.frame.DataFrame,
data_path: str,
schema_path: str,
)

Write a Pandas dataframe to the given path and store its schema, including column names, at schema_path, using the protobuf format used for cross-worker interactions.
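The data file itself is a plain table without a header row (as the read functions above require); column names live in the separate schema file. A pandas-only sketch of the data half of this convention, assuming a CSV layout for illustration (the protobuf schema file is not shown):

```python
import io

import pandas as pd

df = pd.DataFrame({"name": ["alice", "bob"], "score": [1, 2]})

# Write the data without a header row or index; the column names
# would be stored separately in the schema file.
buf = io.StringIO()
df.to_csv(buf, header=False, index=False)

print(buf.getvalue())
```
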

write_sql_data_to_dir

def write_sql_data_to_dir(
df: pandas.core.frame.DataFrame,
path: str,
)

Note: this function is deprecated; use write_tabular_data instead.

Write a Pandas dataframe into the given directory in a format compatible with downstream SQL computations. This function allows you to use the output of a Python computation as input to an SQL computation; to do so, use /output as the path.

For example:

import decentriq_util.sql as dqu
import pandas as pd

my_dataframe = pd.DataFrame()
...
dqu.write_sql_data_to_dir(my_dataframe, "/output")

write_tabular_data

def write_tabular_data(
df: pandas.core.frame.DataFrame,
path: str = '/output',
)

Write a Pandas dataframe into the given directory in a format compatible with downstream SQL computations. This function allows you to use the output of a Python computation as input to an SQL computation. For example:

import decentriq_util.sql as dqu
import pandas as pd

my_dataframe = pd.DataFrame()
...
dqu.write_tabular_data(my_dataframe)

Classes

ColumnType

ColumnType(
*args,
**kwargs,
)

A ProtocolMessage

Ancestors (in MRO)

  • google.protobuf.pyext._message.CMessage
  • google.protobuf.message.Message

Instance variables

nullable : Field compute_sql.ColumnType.nullable

primitiveType : Field compute_sql.ColumnType.primitiveType

NamedColumn

NamedColumn(
*args,
**kwargs,
)

A ProtocolMessage

Ancestors (in MRO)

  • google.protobuf.pyext._message.CMessage
  • google.protobuf.message.Message

Instance variables

columnType : Field compute_sql.NamedColumn.columnType

name : Field compute_sql.NamedColumn.name

TableSchema

TableSchema(
*args,
**kwargs,
)

A ProtocolMessage

Ancestors (in MRO)

  • google.protobuf.pyext._message.CMessage
  • google.protobuf.message.Message

Instance variables

namedColumns : Field compute_sql.TableSchema.namedColumns