Welcome, viewers! Here is another tutorial on a basic Python project. In this tutorial, we are going to scrape Flipkart.
WEB SCRAPING
Contents:
- Ways to extract data (API vs. web scraping)
- What is web scraping?
- Simple steps in extracting data
- How to extract data from a website?
When you want to extract data, you will come across two terms: web scraping and API. Both have the same main goal, which is to get data out of websites and web services.
Web scraping allows you to extract the required data from a specific web page. This can be done manually or by using ready-made web scraper software. Here, "manually" means developing our own web scraping program.
You can then use the extracted information in your own projects or applications.
An API, on the other hand, provides structured access to data from applications, operating systems, and other services.
For example, the Google Maps API is used by Uber and Swiggy to access location services, and many weather APIs are available for retrieving weather information.
Some APIs are free, some offer only limited services for free, and some charge you for access to their data.
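To make the API side concrete, here is a minimal sketch of handling an API response. APIs usually return structured data such as JSON, so there is no HTML to dig through. The payload and its field names (city, temperature_c, condition) are invented for illustration and do not come from any real weather API.

```python
import json

# Hypothetical payload, invented for illustration. A real weather API
# defines its own field names, so check its documentation.
SAMPLE_RESPONSE = '{"city": "Chennai", "temperature_c": 31.5, "condition": "Sunny"}'


def summarize_weather(raw_json):
    """Turn a JSON string into a short, readable summary line."""
    data = json.loads(raw_json)  # JSON text -> Python dict
    return f'{data["city"]}: {data["temperature_c"]} C, {data["condition"]}'


summary = summarize_weather(SAMPLE_RESPONSE)
print(summary)
```

With an API the fields arrive already labelled. With web scraping you have to locate the same values inside the page's HTML yourself.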
Web Scraping:
Web scraping means extracting the data that is available on web pages. It is a simple process once you understand how the website is structured in terms of its HTML tags and CSS classes.
Writing the code to scrape the data is the easy part. Understanding the structure, that is, working out which part of the web page holds the data you want, is where you will spend most of your time.
It may be difficult the first time, but with practice you can scrape most web pages easily.
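Here is a small sketch of what "understanding the structure" means in practice. The HTML below is invented for illustration (the class names product, rating and price are assumptions, not Flipkart's real ones). Once you know which CSS class wraps each item, picking the values out takes only a few lines.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, invented for illustration. Real pages are much
# larger and use their own class names.
SAMPLE_HTML = """
<div class="product">
  <img alt="Demo Phone A" src="a.jpg">
  <div class="rating">4.4</div>
  <div class="price">Rs. 12,999</div>
</div>
<div class="product">
  <img alt="Demo Phone B" src="b.jpg">
  <div class="rating">4.1</div>
  <div class="price">Rs. 9,499</div>
</div>
"""

# html.parser ships with Python, so no extra parser is needed here
soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Every product sits in a <div class="product">, so select by that class
cards = soup.select(".product")

# Inside each card, read the image's alt text and the price element
names = [card.img["alt"] for card in cards]
prices = [card.select_one(".price").text for card in cards]

print(names)
print(prices)
```

On a real site you would find these class names by right-clicking the element in your browser and choosing Inspect.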
Packages:
The packages you need for web scraping are:
requests
BeautifulSoup4
Here, requests performs all the HTTP requests.
BeautifulSoup4, imported as bs4, handles all the HTML processing.
Note: I recommend checking the bs4 documentation. You can simply search for "bs4 documentation".
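To install these packages, open your terminal and run the commands below. The program in this tutorial also uses the lxml parser, so install lxml as well.
pip install bs4
pip install requests
pip install lxml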
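Once the packages are installed, the division of work between them can be sketched without touching the network. Here requests only builds the search URL and bs4 only parses a tiny HTML string. The shortened Flipkart URL with a single q parameter and the sample HTML are assumptions made for this illustration.

```python
import requests
from bs4 import BeautifulSoup

# requests side: build (but do not send) a GET request.
# Assumption: a bare "q" parameter is enough to describe the search.
request = requests.Request(
    "GET",
    "https://www.flipkart.com/search",
    params={"q": "redmi note 9 pro"},
)
prepared = request.prepare()
print(prepared.url)  # the query string is encoded for us

# bs4 side: turn HTML text into a searchable tree.
# Hypothetical HTML, invented for illustration.
html = "<html><head><title>Demo page</title></head><body></body></html>"
soup = BeautifulSoup(html, "html.parser")
title = soup.title.text
print(title)
```

In a real script you would call requests.get(...) and pass the response's .text to BeautifulSoup, as the program in this tutorial does.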
First of all, I have picked one web page at random: the Flipkart search results for the Redmi Note 9 Pro, as it is the latest phone in the market.
Let's check the snap. I want to extract the mobile name, the mobile ratings and reviews, and the mobile price.
Program:
from bs4 import BeautifulSoup as Soup
import requests

# Flipkart search results page for "redmi note 9 pro"
url = 'https://www.flipkart.com/search?q=redmi+note+9+pro&sid=tyy%2C4io&as=on&as-show=on&otracker=AS_QueryStore_OrganicAutoSuggest_2_6_na_na_na&otracker1=AS_QueryStore_OrganicAutoSuggest_2_6_na_na_na&as-pos=2&as-type=RECENT&suggestionId=redmi+note+9+pro%7CMobiles&requestId=69a8d2c8-b769-4d0d-a46e-9fd399e5a39e&as-searchtext=redmi%20'
res = requests.get(url)
# The "lxml" parser needs the lxml package to be installed
page_soup = Soup(res.text, "lxml")
# Each search result sits in a container with this CSS class
containers = page_soup.select('._1UoZlX')
for container in containers:
    # Mobile name, taken from the alt text of the product image
    print(container.div.img["alt"])
    # Ratings and reviews
    ratings = container.find_all("div", {"class": "niH0FQ"})
    print(ratings[0].text)
    # Price
    price = container.find_all("div", {"class": "_1vC4OE _2rQ-NK"})
    print(price[0].text)
Looks like a very small program, doesn't it?
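If you want to reuse the extraction or try it without hitting Flipkart, the same steps can be sketched as a function that takes HTML text and returns a list of dictionaries. This is only a sketch under some assumptions: the function name parse_products and the sample HTML are made up for illustration, the class names are the ones used in the program above (Flipkart may change them at any time), and which class holds the rating and which holds the price is inferred from the order in which the program prints them.

```python
from bs4 import BeautifulSoup


def parse_products(html):
    """Return one dict (name, rating, price) per search result card.

    Hypothetical helper for illustration. Missing values become None
    instead of raising an error.
    """
    soup = BeautifulSoup(html, "html.parser")
    products = []
    for card in soup.select("._1UoZlX"):
        img = card.find("img")
        rating = card.select_one(".niH0FQ")
        price = card.select_one("._1vC4OE._2rQ-NK")
        products.append({
            "name": img["alt"] if img and img.has_attr("alt") else None,
            "rating": rating.get_text(strip=True) if rating else None,
            "price": price.get_text(strip=True) if price else None,
        })
    return products


# Hypothetical HTML, invented to mimic the structure the program expects.
SAMPLE_HTML = """
<div class="_1UoZlX">
  <div><img alt="Demo Phone (Blue, 64 GB)" src="x.jpg"></div>
  <div class="niH0FQ">4.4</div>
  <div class="_1vC4OE _2rQ-NK">Rs. 13,999</div>
</div>
<div class="_1UoZlX">
  <div><img alt="Demo Phone (Black, 128 GB)" src="y.jpg"></div>
  <div class="_1vC4OE _2rQ-NK">Rs. 15,999</div>
</div>
"""

results = parse_products(SAMPLE_HTML)
for product in results:
    print(product)
```

The second card has no rating element, so its rating comes back as None rather than stopping the loop with an IndexError.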