decentriq_util.sql
Functions
read_sql_data
def read_sql_data(
data_path: str,
schema_path: str,
) -> pandas.core.frame.DataFrame
Read data as output by the SQL validation node.
The input file must not contain a header row!
Empty values for columns of type TEXT are replaced with empty strings.
Empty values in columns of type INT64 are read as the "pandas.NA" value. For more information, please refer to the pandas documentation.
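A minimal usage sketch (the file names below are assumptions; use the paths produced by your validation node):
import decentriq_util.sql as dqu
# Hypothetical paths: the first file holds the header-less data, the second the protobuf schema.
df = dqu.read_sql_data("/input/myValidatedDataset/dataset.csv", "/input/myValidatedDataset/types")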
read_sql_data_from_dir
def read_sql_data_from_dir(
path: str,
) -> pandas.core.frame.DataFrame
Note: this function is deprecated, use the function read_tabular_data instead.
This function uses the read_sql_data function above to convert the standard format of validated tabular input datasets and the output of SQL computations into a pandas dataframe, automatically importing the headers that are contained in the types file as well. One example below:
import decentriq_util.sql as dqu
import pandas as pd
dataframe_1 = dqu.read_sql_data_from_dir("/input/mySQLComputation")
dataframe_2 = dqu.read_sql_data_from_dir("/input/myTabularDataset")
read_tabular_data
def read_tabular_data(
path: str,
) -> pandas.core.frame.DataFrame
Read data from a tabular input node or as written by an SQL computation.
The input file must not contain a header row!
Empty values for columns of type TEXT are replaced with empty strings.
Empty values in columns of type INT64 are read as the "pandas.NA" value. For more information, please refer to the pandas documentation.
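For example (the directory names mirror the deprecated read_sql_data_from_dir example above; the actual names depend on your upstream nodes):
import decentriq_util.sql as dqu
dataframe_1 = dqu.read_tabular_data("/input/mySQLComputation")
dataframe_2 = dqu.read_tabular_data("/input/myTabularDataset")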
write_sql_data
def write_sql_data(
df: pandas.core.frame.DataFrame,
data_path: str,
schema_path: str,
)
Write a Pandas dataframe to the given data_path, and store its schema, including column names, at schema_path using the protobuf format used for cross-worker interactions.
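A minimal sketch (the output file names here are assumptions; pass whichever paths your data room expects):
import decentriq_util.sql as dqu
import pandas as pd
my_dataframe = pd.DataFrame({"id": [1, 2], "name": ["a", "b"]})
# Hypothetical output paths: the data and its protobuf schema are written to separate files.
dqu.write_sql_data(my_dataframe, "/output/dataset.csv", "/output/types")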
write_sql_data_to_dir
def write_sql_data_to_dir(
df: pandas.core.frame.DataFrame,
path: str,
)
Note: this function is deprecated, use the function write_tabular_data instead.
Write a Pandas dataframe into the given directory in a format compatible with downstream SQL computations. This function allows you to use the output of a Python computation as input to an SQL computation. To do so, you will need to use /output as the path.
For example:
import decentriq_util.sql as dqu
import pandas as pd
my_dataframe = pd.DataFrame()
...
dqu.write_sql_data_to_dir(my_dataframe, "/output")
write_tabular_data
def write_tabular_data(
df: pandas.core.frame.DataFrame,
path: str = '/output',
)
Write a Pandas dataframe into the given directory in a format compatible with downstream SQL computations. This function allows you to use the output of a Python computation as input to an SQL computation. Since path defaults to /output, it can usually be omitted. For example:
import decentriq_util.sql as dqu
import pandas as pd
my_dataframe = pd.DataFrame()
...
dqu.write_tabular_data(my_dataframe)
Classes
ColumnType
ColumnType(
*args,
**kwargs,
)
A ProtocolMessage
Ancestors (in MRO)
- google.protobuf.pyext._message.CMessage
- google.protobuf.message.Message
Instance variables
nullable
: Field compute_sql.ColumnType.nullable
primitiveType
: Field compute_sql.ColumnType.primitiveType
NamedColumn
NamedColumn(
*args,
**kwargs,
)
A ProtocolMessage
Ancestors (in MRO)
- google.protobuf.pyext._message.CMessage
- google.protobuf.message.Message
Instance variables
columnType
: Field compute_sql.NamedColumn.columnType
name
: Field compute_sql.NamedColumn.name
TableSchema
TableSchema(
*args,
**kwargs,
)
A ProtocolMessage
Ancestors (in MRO)
- google.protobuf.pyext._message.CMessage
- google.protobuf.message.Message
Instance variables
namedColumns
: Field compute_sql.TableSchema.namedColumns
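These classes are plain protobuf messages, so the standard protobuf Python API applies. A sketch of building a schema by hand (assuming namedColumns is a repeated field, as the plural name suggests; no primitiveType is set because the enum values are not listed here):
from decentriq_util.sql import TableSchema
schema = TableSchema()
column = schema.namedColumns.add()  # append a new NamedColumn to the repeated field
column.name = "id"
column.columnType.nullable = False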