When it comes to web scraping, most developers are familiar with libraries like BeautifulSoup and requests for static HTML pages. However, modern websites often rely on JavaScript to load content dynamically, which makes scraping more challenging. In this article, we’ll explore how to scrape data from JavaScript-rendered pages using Selenium, a powerful browser-automation tool, in Python.


Step 1: Install Required Libraries

First, install the required libraries. We’ll need Selenium for driving the browser, plus webdriver-manager, which the example below uses to fetch a ChromeDriver build matching your installed Chrome.

To install both, run:

pip install selenium webdriver-manager

If you'd rather manage the driver yourself, download ChromeDriver matching your Chrome version from the official ChromeDriver downloads page and make sure chromedriver is in your system's PATH. (On Selenium 4.6+, the built-in Selenium Manager can also resolve a driver automatically; see the note after the example.)


Step 2: Set Up Selenium to Load JavaScript Content

Now, let’s set up a simple script to open a webpage with Selenium and extract data after the JavaScript has loaded.

Here’s an example:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager

# Set up Chrome options
options = webdriver.ChromeOptions()
options.add_argument('--headless')  # Run in headless mode (without opening browser)

# Initialize the WebDriver
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()), options=options)

# Load the webpage
url = "https://example.com/dynamic-content"
driver.get(url)

# Give the JavaScript time to render (an implicit wait makes element lookups retry)
driver.implicitly_wait(10)  # Retry element lookups for up to 10 seconds

# Extract data once JavaScript has rendered
content = driver.find_element(By.CLASS_NAME, "dynamic-class").text

print(content)

# Close the browser window
driver.quit()
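
A note on driver setup: if you're on Selenium 4.6 or newer, the bundled Selenium Manager can resolve a matching driver for you, so webdriver-manager becomes optional and the setup shrinks to this:

from selenium import webdriver

# Selenium Manager (built into Selenium 4.6+) locates or downloads ChromeDriver itself
options = webdriver.ChromeOptions()
options.add_argument('--headless')
driver = webdriver.Chrome(options=options)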

Step 3: Handling Dynamic Content

One of the challenges of scraping JavaScript-heavy websites is waiting for the content to load. In the example above, implicitly_wait() sets a session-wide timeout: every element lookup keeps retrying for up to 10 seconds, which gives the JavaScript time to render the content before find_element() gives up.

However, for more precise control, you can use WebDriverWait to wait for specific elements to load:

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# Wait for a specific element to be visible
element = WebDriverWait(driver, 10).until(
    EC.visibility_of_element_located((By.CLASS_NAME, "dynamic-class"))
)

# Now extract the data
content = element.text

This approach waits until the element with class dynamic-class is visible before scraping its content.
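
If the page renders several matching elements rather than one, you can wait for all of them and collect their text in one pass. This is a small sketch reusing the placeholder class name from above; swap in a selector from the site you're actually scraping:

# Wait until every element with the target class is present in the DOM
elements = WebDriverWait(driver, 10).until(
    EC.presence_of_all_elements_located((By.CLASS_NAME, "dynamic-class"))
)

# Collect the text of each matching element
texts = [element.text for element in elements]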


Step 4: Scraping Multiple Pages

If you need to scrape multiple pages (e.g., paginate through content), you can use Selenium to click through pagination buttons or scroll through the page.

Example of clicking a "Next" button:

# Find the "Next" button and click it
next_button = driver.find_element(By.XPATH, "//button[@class='next']")
next_button.click()

# Wait for the new content: the old button going stale signals the page has
# been replaced (reuses the WebDriverWait/EC imports from Step 3)
WebDriverWait(driver, 10).until(EC.staleness_of(next_button))
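
For pages that load more content as you scroll (infinite scroll) rather than through a "Next" button, you can drive the scrolling from the script. The sketch below is a minimal version that assumes the page grows document.body as new items arrive; tune the sleep to the site's loading speed:

import time

# Scroll to the bottom repeatedly until the page height stops growing
last_height = driver.execute_script("return document.body.scrollHeight")
while True:
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    time.sleep(2)  # give the page a moment to fetch and render new items
    new_height = driver.execute_script("return document.body.scrollHeight")
    if new_height == last_height:
        break  # height unchanged: no more content is loading
    last_height = new_height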

Step 5: Extracting and Saving Data

Once the data is rendered by JavaScript, you can extract it and save it as you would with any other web scraping method. For example, saving the scraped data to a CSV:

import csv

# Example rows; in practice, build this list from the text you scraped
data = [["Column 1", "Column 2"], ["Row 1, Column 1", "Row 1, Column 2"], ["Row 2, Column 1", "Row 2, Column 2"]]

# Write to CSV
with open("scraped_data.csv", mode="w", newline="") as file:
    writer = csv.writer(file)
    writer.writerows(data)
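
To tie this back to the scraping code, here's a hedged sketch that turns scraped elements into CSV rows. It reuses the driver and the By import from earlier and assumes each element's text is one cell in a one-column table; a real site would need row-building logic matched to its markup:

import csv

# Placeholder selector: collect every element with the target class
elements = driver.find_elements(By.CLASS_NAME, "dynamic-class")
rows = [["Content"]] + [[element.text] for element in elements]

with open("scraped_data.csv", mode="w", newline="", encoding="utf-8") as file:
    csv.writer(file).writerows(rows)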

✅ Pros of Using Selenium for Web Scraping

  • 🧠 Handles JavaScript: Unlike requests + BeautifulSoup, Selenium drives a real browser that executes JavaScript, making it possible to scrape modern dynamic websites.
  • Full Browser Control: Selenium lets you interact with the page like a human would: clicking buttons, submitting forms, filling in fields, and so on.
  • 💡 Headless Mode: Run the browser in the background without opening a UI, which is ideal for scraping at scale.

⚠️ Cons of Using Selenium for Web Scraping

  • 🐢 Slower than BeautifulSoup: Since it controls an actual browser, Selenium is slower than fetching and parsing HTML with requests and BeautifulSoup.
  • Not always necessary: If the page content is static, you don't need Selenium; requests + BeautifulSoup will do just fine (see the quick sketch after this list).
  • 💻 Requires a WebDriver: Setting up Selenium means managing a WebDriver (like ChromeDriver), which is an extra step.
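
For comparison, here's what the static-page route looks like. This minimal sketch assumes the target page serves its content in the initial HTML (the URL and tag are placeholders):

import requests
from bs4 import BeautifulSoup

# Fetch the raw HTML; no browser or JavaScript engine involved
html = requests.get("https://example.com").text
soup = BeautifulSoup(html, "html.parser")

print(soup.find("h1").text)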

Summary

Selenium is a great tool when you need to scrape dynamic websites that render content with JavaScript. By automating a real browser, you can reach data that would otherwise be inaccessible to static scraping tools. While it’s slower than options like requests + BeautifulSoup, it’s often the only practical choice for JavaScript-heavy pages.


For a much more extensive guide on web scraping with Python, covering advanced techniques like handling authentication, working with APIs, and managing scraping at scale, check out my full 23-page PDF on Gumroad. It's available for just $10:

Mastering Web Scraping with Python Like a Pro.

If this was helpful, you can support me here: Buy Me a Coffee