Hey, how are you doing? I hope you are doing great 😊. Welcome to another lesson, my friend. Today we will discuss a simple yet very efficient way to find and eliminate duplicates from a list or database. Let’s get right into it.
INTRODUCTION:
You may often want to locate duplicate items in a database or list. The method that comes to mind right away is to use nested loops: iterate through the list and, for each item, check whether it appears anywhere else in the list. This method is definitely not wrong, but it compares every item against every other item, so the work grows roughly with the square of the list length and the program slows down quickly on large lists. As programmers, our job is to get the work done as efficiently as possible.
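For comparison, here is a minimal sketch of the nested-loop approach described above. The function name `find_dups_nested` and the sample list are made up for illustration; this is just one way such a loop could be written.

```python
def find_dups_nested(items):
    """Return each duplicated value once, using nested loops (quadratic time)."""
    dups = []
    for i in range(len(items)):
        # Compare the current item with every item that comes after it.
        for j in range(i + 1, len(items)):
            if items[i] == items[j] and items[i] not in dups:
                dups.append(items[i])
    return dups


sample = ["a", "b", "a", "c", "b", "a"]
print(find_dups_nested(sample))  # ['a', 'b']
```

It works, but for a list of n items the inner comparison runs on the order of n * n times, which is the slowdown we want to avoid.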
So, a more efficient and simpler way to solve this problem is to use a set. In Python, a set is a container, like a list or a tuple, that holds a collection of elements. A major characteristic of a set is that every element in it is unique, which means it cannot store duplicate items. For example, if we create an empty set using the set() constructor:
my_set = set()
And add some elements to it:
my_set.add('Noob Code Pro')
my_set.add('Self-taught')
my_set.add('Programmer')
This is what our set will look like (sets are unordered, so the elements may be printed in a different order):
{'Noob Code Pro', 'Self-taught', 'Programmer'}
But if we try to add an element that is already present in the set, it will not be added:
my_set.add('Noob Code Pro')
Our set still looks the same as before:
{'Noob Code Pro', 'Self-taught', 'Programmer'}
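The behaviour above can be checked with a short, runnable sketch. The variable names `size_before` and `size_after` are only for illustration; the point is that the length of the set stays the same when we add something it already contains.

```python
my_set = set()
for name in ["Noob Code Pro", "Self-taught", "Programmer"]:
    my_set.add(name)

size_before = len(my_set)
my_set.add("Noob Code Pro")  # already present, so the set ignores it
size_after = len(my_set)

print(size_before, size_after)  # 3 3

# Building a set directly from a list drops the repeated values too.
print(len(set([1, 2, 2, 3, 3, 3])))  # 3
```

This "length did not change" signal is what lets a set act as a duplicate detector.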
USING A SET TO FIND DUPLICATES IN LIST/DATABASE:
Now that you understand the “unique element” trait of sets, let us discuss how we can use it to find duplicates in a list or database.
The basic idea behind our algorithm is to iterate through a list of elements and add each element to a set, checking the length of the set before and after each addition. If the length of the set doesn’t change in some iteration, it means we have come across a duplicate item: it was already in the set, so adding it again had no effect. We will add this item to a separate list that will be printed at the end to show us the duplicates found in the list.
Let’s create a function to perform this task:
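def find_dups(my_list):
    a_set = set()  # unique items seen so far
    dups = []      # items that turned out to be repeats
    for item in my_list:
        initial_length = len(a_set)
        a_set.add(item)  # has no effect if item is already in the set
        final_length = len(a_set)
        # An unchanged length means the item was already present.
        if final_length == initial_length:
            dups.append(item)
    print(dups)
    print(a_set)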
In this function, we have initialized a set named ‘a_set’ to store the unique elements from the list passed to the function. We have also defined an empty list, ‘dups’, to store the duplicate elements from the list. This list will be printed at the end to show us the duplicate items found in the list.
Now, we simply iterate through the list and try to add every item to ‘a_set’. We store the length of ‘a_set’ before the addition in the variable ‘initial_length’, and the variable ‘final_length’ stores the length of ‘a_set’ after the addition. If the length of the set remains unchanged, the item was already in the set, so we recognize it as a duplicate and add it to our ‘dups’ list. Otherwise, the item is unique so far and is now stored in ‘a_set’.
At the end, the ‘dups’ list is printed with the duplicate items, and ‘a_set’ is printed with all the unique items from the list.
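If you would rather use the results than print them, here is a small variation on the same idea. It is only a sketch: the name `find_dups_returning` is made up for this example, and it swaps the length comparison for a direct `item in seen` membership check, which gives the same answer. It assumes the items are hashable (strings, numbers, tuples and so on), as any item stored in a set must be.

```python
def find_dups_returning(my_list):
    """Return (dups, seen) instead of printing them."""
    seen = set()  # unique items encountered so far
    dups = []     # every repeat occurrence, in the order found
    for item in my_list:
        if item in seen:
            dups.append(item)
        else:
            seen.add(item)
    return dups, seen


dups, unique = find_dups_returning([1, 2, 3, 2, 4, 1, 2])
print(dups)            # [2, 1, 2]
print(sorted(unique))  # [1, 2, 3, 4]
```

Notice that 2 shows up twice in the result because it occurs three times in the input; the function above behaves the same way. If you want each duplicate listed only once, you could collect the repeats in a second set instead of a list.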
CONCLUSION:
You can use this method to find duplicates in large lists or databases efficiently: adding an item to a set takes roughly constant time on average, so a single pass over the data is enough.
Do you know a different and more efficient way to solve this problem? Let me know in the comments below. Leave us a like if you found this method helpful; I would really appreciate it 😊.
Have some queries or questions? You can always find me in the comments section, on the Telegram channel, or on my Pinterest profile, where you can talk to me personally and ask questions about anything we have learnt so far.
If you are looking to join a community of programmers, you can join Noob Code Pro’s official Telegram channel for free.
Stay tuned for another article next week, same time, where we will discuss a new topic or concept in programming: what it is, how it works, and where you use it. More cool stuff coming your way, DON’T MISS IT!! And I'll see you next week. Goodbye and Good Luck :)
If you are a beginner, intermediate, advanced, or just someone interested in programming, feel free to join our Telegram channel and be among people like you:
And do you know the best part? Joining it is FREE!!!
So go ahead, click on the link, and I will see you there.
You can contact me personally through my email: code2learnofficial@gmail.com
HOPE YOU HAVE AN AWESOME DAY AHEAD !!!