A Simple Web Scraping Tutorial (Scraping Flipkart.com)

Welcome, viewers. Here is another tutorial on a basic Python project. In this tutorial, we are going to scrape Flipkart.

WEB SCRAPING

Contents:

Ways to Extract Data (API vs Web Scraping)
What is web scraping?
Simple steps in extracting data
-How to extract data from a website?

When we want to extract data, we often hear terms like web scraping and API. The main goal of both is to get data out of web pages or websites.

Web scraping allows you to extract the required data from a specific webpage. This can be done manually or by using ready-made web scraper software. Here, "manually" means that we develop our own web scraping program.

You can then use the extracted information in your projects or applications.

On the other hand, an API provides access to data from applications, operating systems, and other services.

For a better understanding: the Google Maps API is used by Uber and Swiggy to access location services, and there are many weather APIs that are used to extract weather information.

Some APIs are free, some provide limited services, and some charge you for access.
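
To make the difference concrete, here is a minimal sketch of what working with an API response typically looks like. The JSON payload and its field names (city, temperature_c, condition) are made up for illustration and do not come from any real weather API. The point is only that an API hands you structured data, so no HTML parsing is needed.

```python
import json

# Hypothetical response body, as a weather API might return it.
# The field names are assumptions for illustration only.
sample_response = '{"city": "Chennai", "temperature_c": 31.5, "condition": "Cloudy"}'


def summarize_weather(body):
    """Turn a JSON response body into a one-line summary."""
    data = json.loads(body)  # structured data: fields are accessed by key
    return f"{data['city']}: {data['temperature_c']} C, {data['condition']}"


print(summarize_weather(sample_response))
```

With web scraping, the same values would have to be located inside the page's HTML first.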

Web Scraping:

Web scraping is extracting the data that is available on webpages. It is a very simple process once you understand how the website is structured in terms of HTML, CSS, and so on.

Writing the code to scrape the data is the easy part. Understanding the structure, that is, finding which part of the webpage holds the data you want, is where you need to spend most of your time.

The first time, it may be difficult, but with practice you can easily scrape any webpage.
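
One way to practice reading a page's structure is to load a small piece of HTML and list the tags and classes it contains. The sketch below uses a made-up snippet (the class names product, title, and price are assumptions, not Flipkart's real markup) and Python's built-in "html.parser", so no extra parser is needed.

```python
from bs4 import BeautifulSoup

# Made-up HTML for practice; real pages are much larger.
html = """
<div class="product">
  <div class="title">Sample Phone</div>
  <div class="price">9,999</div>
</div>
"""


def list_classes(markup):
    """Return (tag name, class list) pairs for every tag that has a class."""
    soup = BeautifulSoup(markup, "html.parser")
    return [(tag.name, tag["class"]) for tag in soup.find_all(class_=True)]


for name, classes in list_classes(html):
    print(name, classes)
```

On a real page, the browser's "Inspect" tool gives the same kind of information interactively.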

Packages:

The packages you need for web scraping are

requests

BeautifulSoup4

Here, requests is used to perform all HTTP requests.

BeautifulSoup4, imported in code as bs4, is used to handle all HTML processing.

Note: I recommend checking the bs4 documentation; you can simply search for "bs4 documentation".

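To install these packages, open your terminal and run the commands below. The lxml package is included because the program uses the "lxml" parser.

pip install beautifulsoup4

pip install requests

pip install lxml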
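
To confirm that the installation worked, a quick check like the following can help. This is a small sketch using only the standard library; the helper name check_packages is my own and not part of any package. lxml is included in the check because the program below passes "lxml" to BeautifulSoup.

```python
import importlib.util


def check_packages(names):
    """Map each module name to True (importable) or False (missing)."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


# bs4, requests, and lxml are the import names used in this tutorial.
for name, found in check_packages(["bs4", "requests", "lxml"]).items():
    print(name, "installed" if found else "missing")
```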

First of all, I picked one webpage at random: the Flipkart search results for the Redmi Note 9 Pro, as it is one of the latest phones on the market at the time of writing.

Looking at the page, I want to extract the mobile name, the mobile ratings and reviews, and the mobile price.

Program :

from bs4 import BeautifulSoup as Soup
import requests

# Flipkart search results page for "redmi note 9 pro"
url = 'https://www.flipkart.com/search?q=redmi+note+9+pro&sid=tyy%2C4io&as=on&as-show=on&otracker=AS_QueryStore_OrganicAutoSuggest_2_6_na_na_na&otracker1=AS_QueryStore_OrganicAutoSuggest_2_6_na_na_na&as-pos=2&as-type=RECENT&suggestionId=redmi+note+9+pro%7CMobiles&requestId=69a8d2c8-b769-4d0d-a46e-9fd399e5a39e&as-searchtext=redmi%20'

# Download the page; res.text holds the HTML source
res = requests.get(url)
# Parse the HTML (the "lxml" parser needs the lxml package installed)
page_soup = Soup(res.text, "lxml")

# Each search result sits in a container with the class _1UoZlX
con = page_soup.select('._1UoZlX')

for container in con:
    # The mobile name is stored in the alt text of the product image
    print(container.div.img["alt"])
    # Ratings and reviews block
    rating = container.find_all("div", {"class": "niH0FQ"})
    print(rating[0].text)
    # Price block
    price = container.find_all("div", {"class": "_1vC4OE _2rQ-NK"})
    print(price[0].text)

Looks like a very small program, doesn't it?
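
The search URL above is long mostly because of its extra query parameters. As a hedged sketch, the same kind of URL can be built from a dictionary with requests, without sending anything over the network. I am assuming here that only the q parameter matters for the search and that the rest are tracking values; I have not verified this against Flipkart. The helper name build_search_url is my own.

```python
import requests


def build_search_url(query):
    """Build a Flipkart search URL from a query string without sending it."""
    request = requests.Request(
        "GET",
        "https://www.flipkart.com/search",
        params={"q": query},  # requests URL-encodes the parameters
    )
    return request.prepare().url


print(build_search_url("redmi note 9 pro"))
```

The resulting URL can then be passed to requests.get. Adding a timeout, for example requests.get(url, timeout=10), avoids waiting forever if the site does not respond.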

Explanation:

First, you need to import the requests and bs4 packages.

1. Send a request to the webpage using requests; the whole HTML source of the page is stored in the res variable (as res.text).

2. Using bs4, parse all the HTML data.

3. Select the required class. Here, all the required attributes are present in elements with the class "._1UoZlX".

4. From each of these elements, select the <div> in which the required attributes are present, here the ratings and reviews block and the price block.
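
The four steps can be tried offline on a small hard-coded snippet. The HTML below is invented to mimic the shape the program expects: it reuses the class names from the program, but it is not Flipkart's real markup, and treating niH0FQ as the rating and "_1vC4OE _2rQ-NK" as the price is my reading of the page. "html.parser" is used instead of "lxml" so that nothing extra has to be installed. The sketch also returns None for a missing field instead of raising an error.

```python
from bs4 import BeautifulSoup

# Invented markup; only the class names are taken from the program above.
html = """
<div class="_1UoZlX">
  <div><img alt="Sample Phone A"></div>
  <div class="niH0FQ">4.4</div>
  <div class="_1vC4OE _2rQ-NK">12,999</div>
</div>
<div class="_1UoZlX">
  <div><img alt="Sample Phone B"></div>
  <div class="_1vC4OE _2rQ-NK">9,999</div>
</div>
"""


def text_or_none(container, selector):
    """Return the stripped text of the first match, or None if absent."""
    node = container.select_one(selector)
    return node.get_text(strip=True) if node else None


def extract(markup):
    soup = BeautifulSoup(markup, "html.parser")   # step 2: parse the HTML
    rows = []
    for container in soup.select("._1UoZlX"):     # step 3: pick the containers
        img = container.select_one("img")
        rows.append({                             # step 4: read the fields
            "name": img["alt"] if img else None,
            "rating": text_or_none(container, ".niH0FQ"),  # assumed meaning
            "price": text_or_none(container, "._1vC4OE._2rQ-NK"),  # assumed meaning
        })
    return rows


for row in extract(html):
    print(row)
```

Step 1, the download, is left out on purpose so that the sketch runs without a network connection.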

With this process, you can scrape any webpage. Web scraping is an important project for beginners. Leave a comment to tell me whether this tutorial was understandable; your feedback will help me write more interesting projects.

If you have any doubts, comment. Please subscribe to Always Code for more interesting projects.

Always Code