datto.CleanDataframe
- class datto.CleanDataframe
Clean data using NLP, regex, calculations, etc.
- __init__()
Methods
__init__()
batch_merge_operation(df_1, df_2, ...)
Merge two pandas DataFrames in chunks for faster processing.
batch_pandas_operation(df, num_splits, ...)
Apply a function to a pandas DataFrame in chunks for faster processing.
Rename all columns to use underscores so that they can be referenced without bracket formatting.
compress_df(df)
Compress each DataFrame column as much as possible, depending on its type and values.
fill_nulls_using_regression_model(X_train, ...)
Train a regression model on non-null data and predict values to fill in nulls.
fix_col_data_type(df, col, desired_dt)
Change a column's datatype using the best method for each type.
lematize(text)
Parameter: text.
make_uuid(id_num)
Make a UUID from a text string.
Remove columns with the same name
To keep only the main text of an email, this method removes greetings, sign-offs, and signatures by dropping sentences that are less than 5% verbs.
remove_links(text)
Parameter: text.
remove_names(text)
Parameter: text.
remove_pii(text)
Remove common patterns of personally identifiable information (PII). Parameter: text (str).
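The text-cleaning entries above (remove_links, remove_pii, make_uuid) are described only briefly, so the following is a minimal sketch of the general techniques they name, using only the Python standard library. It is not datto's implementation: the function names strip_links, mask_pii, and text_to_uuid, the regular expressions, the placeholder tokens, and the choice of uuid5 with the DNS namespace are all assumptions made for illustration.

```python
import re
import uuid

# Hypothetical patterns for illustration only; not taken from datto's source.
URL_PATTERN = re.compile(r"(?:https?://|www\.)\S+", re.IGNORECASE)
EMAIL_PATTERN = re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")
PHONE_PATTERN = re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b")


def strip_links(text: str) -> str:
    """Remove URLs, then collapse the whitespace left behind."""
    without_links = URL_PATTERN.sub(" ", text)
    return " ".join(without_links.split())


def mask_pii(text: str) -> str:
    """Replace email addresses and 10-digit phone numbers with placeholders."""
    text = EMAIL_PATTERN.sub("[EMAIL]", text)
    return PHONE_PATTERN.sub("[PHONE]", text)


def text_to_uuid(text: str) -> uuid.UUID:
    """Derive a deterministic (name-based, version 5) UUID from a string."""
    # Assumption: the DNS namespace is an arbitrary but fixed choice here.
    return uuid.uuid5(uuid.NAMESPACE_DNS, text)


if __name__ == "__main__":
    print(strip_links("Docs at https://example.com/a?b=1 and www.test.org here"))
    print(mask_pii("Mail jane.doe@example.com or call 555-123-4567"))
    print(text_to_uuid("customer-42"))
```

Because the URL pattern consumes every non-whitespace character after the scheme, punctuation attached to the end of a link is removed along with it. A name-based UUID is used so that the same input string always maps to the same identifier, which is usually what is wanted when replacing a raw ID column.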