pyeasyeda.summary_suggestions

Module Contents

Functions

summary_suggestions(df, threshold=0.8)

Takes in a pandas dataframe and returns a list comprising three dataframes and a list.

pyeasyeda.summary_suggestions.summary_suggestions(df, threshold=0.8)

Takes in a pandas dataframe and returns a list comprising three dataframes and a nested list. The dataframes hold, in order, the summary statistics of the numeric variables, the summary statistics of the categorical variables, and the proportion of unique values for each categorical variable. The nested list names the categorical variables whose proportion of unique values exceeds the threshold, making them candidates for dropping.

Parameters
  • df (pandas dataframe) – Dataframe to be examined

  • threshold (float) – Proportion of unique values above which a categorical variable is flagged as a candidate for dropping; defaults to 0.8

Returns

results – List containing three summary dataframes and a list of the categorical variables that exceed the threshold

Return type

list

Examples

>>> summary_suggestions(df)

[
  (summary statistics for numeric variables),
  (summary statistics for categorical variables),
  (percentage of unique values for categorical variables),
  [list of variables with percentage of unique values higher than the threshold]
]
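
The four-part result described above can be approximated with plain pandas. The following is a minimal sketch and not the pyeasyeda implementation: the helper name summarize_sketch, the column name unique_proportion, the sample data, and the choice to treat every non-numeric column as categorical are all assumptions made for illustration.

```python
import pandas as pd


def summarize_sketch(df, threshold=0.8):
    """Hypothetical stand-in that mimics the documented four-part result."""
    # Assumption: numeric columns are those pandas classifies as "number";
    # every other column is treated as categorical.
    numeric = df.select_dtypes(include="number")
    categorical = df.select_dtypes(exclude="number")

    numeric_summary = numeric.describe()
    categorical_summary = categorical.describe()

    # Proportion of distinct values per categorical column (between 0 and 1).
    unique_share = (categorical.nunique() / len(df)).to_frame("unique_proportion")

    # Names of the columns whose proportion of unique values exceeds the threshold.
    flagged = unique_share[unique_share["unique_proportion"] > threshold].index.tolist()

    return [numeric_summary, categorical_summary, unique_share, flagged]


# Made-up sample data: "user_id" is unique on every row, "city" is not.
df = pd.DataFrame(
    {
        "age": [23, 35, 41, 29, 52],
        "city": ["Oslo", "Oslo", "Lima", "Lima", "Oslo"],
        "user_id": ["a1", "b2", "c3", "d4", "e5"],
    }
)

numeric_summary, categorical_summary, unique_share, flagged = summarize_sketch(df)
print(flagged)  # ['user_id']
```

In this sketch, user_id has 5 distinct values in 5 rows (proportion 1.0) and is flagged at the default threshold of 0.8, while city (proportion 0.4) is not. The real function may compute or format these values differently.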