# The pandas dropna() method lets you find and drop rows or columns that contain null (NaN/None) values in several ways. Its main parameters are:
# • axis: whether to drop rows or columns. Pass 0 or 'index' for rows (the default), or 1 or 'columns' for columns.
# • how: a string that is either 'any' (the default) or 'all'. 'any' drops the row/column if ANY of its values is null; 'all' drops it only if ALL of its values are null.
# • thresh: an integer giving the minimum number of non-null values a row/column must contain to be kept; anything with fewer non-null values is dropped.
# • subset: a list of labels along the other axis that limits where nulls are looked for (for example, the columns to check when dropping rows).
# • inplace: a boolean; if True, the DataFrame itself is modified (and None is returned) instead of a new DataFrame being returned.
df.dropna(inplace=True)  # Default call: drops every row that has at least one null value, modifying df directly.
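# A minimal, self-contained sketch of the default call above. It uses a small hypothetical DataFrame; the columns A and B are made up for illustration.
# With inplace=True, dropna() changes df itself and returns None, so do not assign its result back to df.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: rows 1 and 2 each contain one missing value.
df = pd.DataFrame({
    "A": [1.0, np.nan, 3.0],
    "B": [4.0, 5.0, np.nan],
})

result = df.dropna(inplace=True)  # modifies df in place
print(result)  # None
print(df)      # only row 0 (A=1.0, B=4.0) remains
```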
# Example 01: Dropping rows with at least one null value. This returns a new DataFrame with the results and leaves df unchanged.
newDF = df.dropna(axis=0, how='any')
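# To contrast how='any' with how='all' on rows, here is a small sketch using hypothetical toy data.
# Row 1 has only one missing value and row 2 is entirely missing, so the two settings keep different rows.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data.
df = pd.DataFrame({
    "A": [1.0, np.nan, np.nan],
    "B": [2.0, 5.0, np.nan],
})

any_df = df.dropna(axis=0, how="any")  # drops rows 1 and 2
all_df = df.dropna(axis=0, how="all")  # drops only row 2 (every value is null)

print(list(any_df.index))  # [0]
print(list(all_df.index))  # [0, 1]
print(len(df))             # 3: the original is untouched without inplace=True
```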
# Example 02: Changing the axis and using the 'how' and 'inplace' parameters.
# Drop only the columns in which every value is null.
df.dropna(axis=1, how='all', inplace=True)  # Instead of the number 1, you can pass axis='columns', which does the same.
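# A short sketch, assuming a hypothetical three-column DataFrame. It checks that axis=1 and axis='columns' behave identically.
# It also shows that how='all' removes only the fully empty column B and keeps the partly filled column C.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: B is entirely null, C is only partly null.
df = pd.DataFrame({
    "A": [1.0, 2.0],
    "B": [np.nan, np.nan],
    "C": [np.nan, 3.0],
})

by_number = df.dropna(axis=1, how="all")
by_name = df.dropna(axis="columns", how="all")

print(list(by_number.columns))  # ['A', 'C']
print(by_number.equals(by_name))  # True
```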
# Example 03: Using the threshold value.
# Keep only the rows with at least 2 non-NA values.
newDF = df.dropna(thresh=2)
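# A minimal sketch of thresh on hypothetical toy data. thresh counts non-null values per row, not null values.
# Rows 0 and 1 each have two non-null values and are kept. Row 2 has only one and is dropped.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data.
df = pd.DataFrame({
    "A": [1.0, np.nan, np.nan],
    "B": [2.0, 3.0, np.nan],
    "C": [np.nan, 4.0, 5.0],
})

kept = df.dropna(thresh=2)  # keep rows with at least 2 non-null values
print(list(kept.index))     # [0, 1]
```

# Recent pandas versions raise a TypeError if how and thresh are passed together, so use one or the other.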
# Pairwise deletion analyses all cases in which the variables of interest are present, and so maximizes the data available on an analysis-by-analysis basis.
# One strength of this technique is that it increases the statistical power of your analysis, but it has several disadvantages. It assumes that the data are missing completely at random (MCAR).
# If you delete pairwise, different numbers of observations contribute to different parts of your model, which can make interpretation difficult.
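# pandas already applies pairwise deletion in some computations: DataFrame.corr() computes each correlation while excluding NA/null values pair by pair.
# This sketch uses hypothetical columns x, y and z. It counts how many observations each pair actually uses and compares that with listwise deletion.
# The differing counts are the interpretation problem described above.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with missing values scattered across columns.
df = pd.DataFrame({
    "x": [1.0, 2.0, 3.0, 4.0, np.nan],
    "y": [2.0, 4.0, 6.0, np.nan, 10.0],
    "z": [np.nan, np.nan, 1.0, 0.0, 2.0],
})

# Listwise deletion: only rows complete in every column survive.
listwise_rows = len(df.dropna())

# Pairwise deletion: each correlation uses the rows where both columns are present.
pairwise_corr = df.corr()

# Number of observations behind each pair (1 where present, 0 where missing).
notna = df.notna().astype(int)
pair_counts = notna.T @ notna

print(listwise_rows)  # 1
print(pair_counts)    # x-y uses 3 rows, x-z and y-z use 2 rows each
```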
# Example 01: Using the columns that are needed for analysis.
# Define the list of columns in which to look for missing values. Here we use the two columns needed for the analysis and pass them to dropna() as the subset parameter.
#
# Pairwise deletion is an alternative to listwise deletion to mitigate the loss of data.
# Hence, in this example, all cases with available data on Age and Political affiliation are included in the analysis, regardless of missing values in other variables such as gender, income or education.
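# A minimal sketch of the subset approach described above. The column names Age, Political_affiliation and Income and their values are hypothetical.
# subset restricts the missing-value check to the analysis columns, so a missing Income does not remove a row.
# Running dropna() over the whole frame would discard every row here.

```python
import numpy as np
import pandas as pd

# Hypothetical survey-style data; None counts as missing in object columns.
df = pd.DataFrame({
    "Age": [34.0, np.nan, 51.0, 29.0],
    "Political_affiliation": ["A", "B", None, "A"],
    "Income": [np.nan, 42000.0, 55000.0, np.nan],
})

analysis_cols = ["Age", "Political_affiliation"]  # hypothetical analysis variables
analysis_df = df.dropna(subset=analysis_cols)

print(list(analysis_df.index))  # [0, 3]: Income gaps are ignored
print(len(df.dropna()))         # 0: checking every column drops all rows
```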
# It is generally better to keep data than to discard it. You can sometimes drop a variable if its data is missing for more than 60% of the observations, but only if that variable is insignificant.
# That said, imputation is usually preferred over dropping variables.
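# This sketch applies the 60% rule of thumb on a hypothetical DataFrame. It computes each column's share of missing values and keeps only columns at or below 0.6.
# Deciding whether a dropped variable is insignificant still has to be done by you. The final step uses simple column-mean imputation, which is only one of many imputation methods.

```python
import numpy as np
import pandas as pd

# Hypothetical data: column b is 80% missing, column c is 40% missing.
df = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "b": [np.nan, np.nan, np.nan, np.nan, 1.0],
    "c": [np.nan, 2.0, np.nan, 4.0, 5.0],
})

missing_share = df.isna().mean()  # fraction of nulls per column
reduced = df.loc[:, missing_share <= 0.6]

# Simple mean imputation for the remaining gaps (one option among many).
imputed = reduced.fillna(reduced.mean())

print(list(reduced.columns))  # ['a', 'c']
```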
del df['Name']  # Delete the Name variable (column) in place.
df.drop('Age', axis=1, inplace=True)  # Same effect as above: here we drop (delete) the Age variable.
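# A self-contained sketch comparing the two ways of removing a variable, using a hypothetical DataFrame.
# del and drop(..., inplace=True) both mutate df. drop(columns=[...]) without inplace returns a new DataFrame and leaves df as it was.

```python
import pandas as pd

# Hypothetical data.
df = pd.DataFrame({"Name": ["Ann", "Bo"], "Age": [30, 41], "Income": [50, 60]})

del df["Name"]                        # removes the column in place
df.drop("Age", axis=1, inplace=True)  # same effect for "Age"

slim = df.drop(columns=["Income"])    # non-mutating: returns a new frame

print(list(df.columns))    # ['Income']
print(list(slim.columns))  # []
```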