FILLING MISSING ROWS AND COLUMNS | HANDLING MISSING DATA | NOOB CODE PRO

Hey fellow programmer! Welcome back! This is a continuation of a previous lesson, where we learnt how to handle missing data by dropping rows and columns. In this lesson, we will handle missing data by filling rows and columns instead.

A quick note: if you haven’t read the last article on handling missing data, you might wanna do that first, because it introduces Python’s ‘Pandas’ library, which we will be using here. Now, let us learn how to handle missing data by filling those missing spaces with data:

REPLACING MISSING VALUES WITH VALUES OF YOUR CHOICE:

One way to get rid of missing values is to fill those spaces with values of your choice. You can use the fillna() method to replace missing values with any value you want. Let us say you have a dataframe with columns ‘Age’ and ‘Name’. You can pass fillna() a dictionary that maps each column name to the value that should fill the gaps in that column:

df.fillna({'Age': 0, 'Name': 'Unknown'}, inplace=True)


The above line of code will replace the missing values in the ‘Age’ column with 0 and the missing values in the ‘Name’ column with ‘Unknown’.

This way you can ensure that every row in these columns has some value and there are no missing or NaN values in these columns.
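Here is a small, self-contained sketch of that line in action. The names and ages are made-up sample data; the sketch builds a tiny dataframe with gaps, fills them with a dictionary, and then uses isna() to count how many missing values are left in each column.

```python
import numpy as np
import pandas as pd

# Made-up sample data with gaps in both columns
df = pd.DataFrame({
    "Name": ["Adams", None, "Jacobs", "Lee"],
    "Age": [25, np.nan, 31, np.nan],
})

# Fill each column with its own replacement value
df.fillna({"Age": 0, "Name": "Unknown"}, inplace=True)

print(df)

# Count the missing values left in each column (should all be 0)
print(df.isna().sum())
```

Because ‘Age’ holds NaN, pandas stores it as a float column, so the filled ages show up as 0.0 rather than 0.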

FORWARD AND BACKWARD FILLING:

Say you have four rows and the second row is missing a value. You can fill the second row with the same value as the first row or the third row. Take a look at the pictures below:


In the first picture, we filled the second row in the ‘Name’ column with ‘Adams’, the value from the first row (the row preceding it). This is called Forward Filling.

In the second picture, we filled it with ‘Jacobs’ from the third row (the row succeeding it). This is what we call Backward Filling.
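In case the pictures don't load for you, here is the same four-row example sketched in code with pandas' Series.ffill() and Series.bfill() methods. ‘Adams’ and ‘Jacobs’ come from the example above; the fourth name, ‘Lee’, is a made-up placeholder.

```python
import pandas as pd

# Four rows; the second name is missing ("Lee" is a made-up placeholder)
names = pd.Series(["Adams", None, "Jacobs", "Lee"], name="Name")

# Forward fill: copy the value from the row above the gap
forward = names.ffill()

# Backward fill: copy the value from the row below the gap
backward = names.bfill()

print(forward.tolist())   # ['Adams', 'Adams', 'Jacobs', 'Lee']
print(backward.tolist())  # ['Adams', 'Jacobs', 'Jacobs', 'Lee']
```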

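Forward and backward filling are a great way to handle missing data. In pandas, you can forward fill a column with the ffill() method and assign the result back to that column:

df['Name'] = df['Name'].ffill()


Here ffill() (Forward Fill) fills each missing value in the ‘Name’ column with the nearest value above it. Similarly, we can use bfill() for Backward Filling:

df['Name'] = df['Name'].bfill()

Older tutorials often write this as df.Name.fillna(method = ‘ffill’, inplace = True). The method argument of fillna() has been deprecated since pandas 2.1, and calling an inplace method on df.Name may not update df itself in newer pandas versions, so assigning the result back to the column is the safer pattern.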
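One thing to watch out for: forward filling has nothing to copy into a gap at the very top of a column, and backward filling has nothing to copy into a gap at the very bottom, so those gaps stay missing. The sketch below uses made-up names to show this; chaining ffill().bfill() is just one possible way to cover both ends, not the only choice.

```python
import pandas as pd

# Made-up column with gaps at the start, in the middle, and at the end
df = pd.DataFrame({"Name": [None, "Adams", None, "Jacobs", None]})

# Reassign results instead of calling an inplace method on df.Name
forward = df["Name"].ffill()   # first row stays missing: nothing above it
backward = df["Name"].bfill()  # last row stays missing: nothing below it

# Forward fill first, then backward fill whatever is still missing at the top
both = df["Name"].ffill().bfill()

print(forward.isna().tolist())   # [True, False, False, False, False]
print(backward.isna().tolist())  # [False, False, False, False, True]
print(both.tolist())             # ['Adams', 'Adams', 'Adams', 'Jacobs', 'Jacobs']
```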


CONCLUSION:

Filling missing data with values of your choice is a great alternative to dropping it: you keep your rows and columns, and you decide how much of an impact the unknown or missing data has on your model or algorithm.

Which method of handling missing data do you think is better, dropping entire rows and columns or filling rows and columns with values? Share your opinion in the comments below! 

Leave us a like if you found this article helpful and do share it with your friends who hate missing values in their gathered data ;) 

Want to join a community of programmers? Join Noob Code Pro’s Official Telegram Group and learn from other programmers’ experiences and share your own experiences with them. Programmers of all levels and languages are welcome, feel free to invite your programming buddies as well. A community works better with people in it ;)

See you next week with a new lesson to share with you, until then...

HAVE AN AWESOME DAY !!!

