pyeasyeda.summary_suggestions
Module Contents
Functions
- pyeasyeda.summary_suggestions.summary_suggestions(df, threshold=0.8)
Takes a pandas dataframe and returns a list containing three dataframes and a nested list. The first two dataframes hold summary statistics for the numeric and the categorical variables, respectively; the third holds the proportion of unique values for each categorical variable. The nested list names the categorical variables whose proportion of unique values exceeds the threshold, which makes them candidates for dropping.
- Parameters
df (pandas dataframe) – Dataframe to be examined
threshold (float) – Proportion of unique values above which a categorical variable is flagged as a candidate for dropping (default 0.8)
- Returns
results – List of three summary dataframes followed by a list of the flagged categorical variables
- Return type
list
Examples
>>> summary_suggestions(df)
[ (summary statistics for numeric variables), (summary statistics for categorical variables), (percentage of unique values for categorical variables), [list of variables with percentage of unique values higher than the threshold] ]
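The return shape described above can be illustrated with a small stand-in written in plain pandas. This is a minimal sketch, not pyeasyeda's actual implementation: the function name `sketch_summary_suggestions`, the column name `unique_share`, and the sample data are all hypothetical. It also assumes that every non-numeric column counts as categorical and that the unique-value measure is a proportion between 0 and 1.

```python
import pandas as pd


def sketch_summary_suggestions(df, threshold=0.8):
    """Hypothetical stand-in that mimics the documented return shape."""
    # Assumption: every non-numeric column is treated as categorical.
    numeric = df.select_dtypes(include="number")
    categorical = df.select_dtypes(exclude="number")

    numeric_summary = numeric.describe()
    categorical_summary = categorical.describe()

    # Share of distinct values per categorical column, between 0 and 1.
    unique_share = (categorical.nunique() / len(df)).to_frame("unique_share")

    # Names of the columns whose share of distinct values exceeds the threshold.
    flagged = unique_share.index[unique_share["unique_share"] > threshold].tolist()

    return [numeric_summary, categorical_summary, unique_share, flagged]


# Made-up sample data: "user_id" is unique on every row, "city" is not.
df = pd.DataFrame(
    {
        "age": [23, 35, 41, 29, 52],
        "city": ["Oslo", "Oslo", "Lima", "Lima", "Oslo"],
        "user_id": ["a1", "b2", "c3", "d4", "e5"],
    }
)

results = sketch_summary_suggestions(df)
numeric_summary, categorical_summary, unique_share, flagged = results
print(flagged)  # ['user_id']
```

In this sample, `user_id` has a distinct value on every row (share 1.0) and is flagged, while `city` (share 0.4) stays below the default threshold of 0.8. Unpacking the four elements by position, as shown, is one way to work with a list-shaped result like the one documented here.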