Data science, data analysis, and many other programming tasks sometimes require you to download data from the internet. That data could be a single file or an entire webpage, and if you don't know how to fetch it with code, your project can lose a lot of time and efficiency as you resort to gathering that data manually, one item at a time.
But if you have been a programmer for even a week, you know we try to do every task in the most efficient way possible. So, in this lesson, we'll learn to write a Python program that can download any file or web page directly off the internet.
Let’s start coding….
‘requests’ MODULE
This module lets us easily download files from the web without having to deal with complicated issues such as network errors, connection problems, and data compression.
It is a third-party module, so it needs to be installed first using 'pip', just like any other third-party module. If you need help with installing third-party modules, here is the help.
HOW TO DOWNLOAD A FILE?
To download a file, use the following code:
res = requests.get('<url>')
In the actual code, replace <url> with the link to the webpage you want to download. In the example below, I used 'https://www.google.com', but you can use any URL you want.
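Here is a minimal, complete sketch of that request. The URL is just an example; swap in whatever page you need:

import requests

# Download the Google homepage (example URL; replace it with your own).
res = requests.get('https://www.google.com')
print(type(res))  # <class 'requests.models.Response'>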
The 'get()' function returns a Response object, so store it in a variable if you want to work with it afterwards.
A response object represents the reply that the web server gives you for your request.
We can check if the request succeeded by using the ‘status_code’ attribute of the response object.
print(res.status_code)
If the code prints '200', everything went fine. If it prints '404', the page you requested could not be found.
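You can also let 'requests' do this check for you. Its 'raise_for_status()' method (a standard part of the library, not something we defined here) raises an exception whenever the request failed:

import requests

res = requests.get('https://www.google.com')
print(res.status_code)   # 200 means success, 404 means the page was not found

# Optional: stop the program with an exception if the request failed.
res.raise_for_status()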
If the request was successful, the downloaded webpage is stored as a string in the response object's 'text' attribute.
res.text
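For example, assuming the 'res' object from the request above, you can peek at what was downloaded:

# Inspect the downloaded page (assumes 'res' from the earlier request).
print(len(res.text))     # total number of characters downloaded
print(res.text[:250])    # the first 250 characters of the page's HTML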
OPENING A DOWNLOADED FILE
Opening the file follows the same procedure as opening any other file, except that you must open it in 'Write Binary' mode by passing 'wb' as the second argument to open().
We open the file in the ‘Write Binary’ mode instead of normal text mode to preserve the Unicode encoding.
So, the syntax for opening a file to hold the downloaded data is as follows:
playfile = open('Hello.txt', 'wb')
The above code will create a new file named ‘Hello.txt’ (if it doesn’t exist already), and open it in ‘Write Binary’ mode.
Now we want to transfer the content downloaded from the web into this file using the response object's 'iter_content()' method, in the following way:
for chunk in res.iter_content(100000):
    playfile.write(chunk)
playfile.close()
'100000' is the maximum number of bytes in each chunk, i.e. how much data gets written per iteration of the loop.
Now the file 'Hello.txt' holds all the contents of the downloaded page, and you can access that content by opening 'Hello.txt' in read mode any time you want.
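Putting the whole lesson together, here is a small end-to-end sketch. The URL and file name are just placeholders, and a 'with' block is used so the file is closed automatically:

import requests

# Download a webpage (example URL; use whichever page you need).
res = requests.get('https://www.google.com')
res.raise_for_status()  # stop here if the request failed

# Save the downloaded content to disk in Write Binary mode.
with open('Hello.txt', 'wb') as playfile:
    for chunk in res.iter_content(100000):
        playfile.write(chunk)

# Read the saved copy back whenever you need it.
with open('Hello.txt', 'rb') as saved:
    print(saved.read()[:100])  # first 100 bytes of the saved page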
After the above code is executed, 'Hello.txt' contains the raw HTML of the downloaded page.
CONCLUSION
This way, you can download any webpage you want and store it on your computer for future use. We'll use this extracted data in more advanced lessons on web scraping and data analysis.
Give us a LIKE 👍 and FOLLOW, if you like our content and want more. We won’t let you down😉
If you have a doubt, query or any feedback, I would love to learn about it in the comments below👇
You can chat with me and other programmers from our community by joining our Telegram group. Follow the link and become an important and valued part of our programming community today. See you there!!!
I’ll be back with some new knowledge very soon, until then…..
HAVE AN AMAZING DAY !!!!