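A Detailed Guide to Using Selenium in Python 3 Web Crawlers

Selenium is an automated testing tool that can simulate user actions in a browser, such as clicking, typing, and scrolling. In a Python 3 crawler, Selenium can drive a real browser, which makes it possible to scrape dynamic web pages. This article covers installing Selenium, basic usage, and the most commonly used APIs, and works through two examples.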

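Installing Selenium

In Python 3, the Selenium library can be installed with pip. The examples in this article also require Chrome to be installed locally. The install command is: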

pip install selenium
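
Because the locator API differs between Selenium 3 and Selenium 4, it can help to confirm which version pip actually installed. The following is a minimal sketch using only the standard library; installed_version() is a hypothetical helper name introduced here for illustration, not part of Selenium.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of a package, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        # The distribution is not installed in the current environment.
        return None
```

Calling installed_version('selenium') after the install should return a version string such as one beginning with "4."; a result of None suggests pip installed into a different Python environment.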

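Using Selenium

The following example uses Selenium to simulate browser behavior:

from selenium import webdriver
from selenium.webdriver.common.by import By

# Create the browser object
browser = webdriver.Chrome()

# Open the page
browser.get('https://www.baidu.com')

# Locate elements and interact with them
input_box = browser.find_element(By.ID, 'kw')
input_box.send_keys('Python')
submit_button = browser.find_element(By.ID, 'su')
submit_button.click()

# Close the browser
browser.quit()

The code above creates a Chrome browser object and opens the Baidu home page with get(). It then locates the search box and the search button with find_element(By.ID, ...), types the search keyword with send_keys(), and clicks the search button with click(). Finally, quit() closes the browser. Note that the older find_element_by_id() helper was deprecated in Selenium 4 and later removed, so the example uses find_element() with a By locator instead.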

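Common APIs

The following Selenium APIs are used most often:

webdriver.Chrome(): creates a Chrome browser object.
browser.get(url): opens the given page.
browser.find_element(By.ID, id): finds the element with the given id, where By is imported from selenium.webdriver.common.by. This is the Selenium 4 replacement for find_element_by_id(id), which was deprecated and later removed.
element.send_keys(text): types the given text into the element.
element.click(): clicks the element.
browser.quit(): closes the browser.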
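
The calls listed above can be combined into one small helper that always closes the browser, even when a lookup fails. This is only a sketch: run_search() and main() are hypothetical names introduced here, and locators are passed as (strategy, value) tuples such as (By.ID, 'kw') in the Selenium 4 style.

```python
def run_search(driver, url, box, button, keyword):
    """Open url, type keyword into the box, click the button, then quit.

    box and button are (strategy, value) locator tuples such as
    (By.ID, 'kw'). The driver is always closed, even on error.
    """
    try:
        driver.get(url)
        driver.find_element(*box).send_keys(keyword)
        driver.find_element(*button).click()
        return driver.current_url
    finally:
        # Runs on success and on failure, so no browser process is leaked.
        driver.quit()


def main():
    # Imported here so run_search() itself has no hard Selenium dependency.
    from selenium import webdriver
    from selenium.webdriver.common.by import By

    final_url = run_search(
        webdriver.Chrome(),
        'https://www.baidu.com',
        (By.ID, 'kw'),
        (By.ID, 'su'),
        'Python',
    )
    print(final_url)
```

Wrapping the calls in try/finally is a design choice: if find_element() raises because an element is missing, quit() still runs and the Chrome window does not stay open.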

Examples

Example 1

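The following Python program uses Selenium to simulate browser behavior and scrape the Douban Movie Top 250 list.

from selenium import webdriver
from selenium.webdriver.common.by import By

browser = webdriver.Chrome()
browser.get('https://movie.douban.com/top250')

movies = []
while True:
    # Find the movie entries on the current page
    movie_list = browser.find_elements(By.CSS_SELECTOR, '.grid_view .item')
    for movie in movie_list:
        title = movie.find_element(By.CSS_SELECTOR, '.title').text
        rating = movie.find_element(By.CSS_SELECTOR, '.rating_num').text
        movies.append({'title': title, 'rating': rating})

    # Look for the next-page link. find_elements() returns an empty list
    # instead of raising when nothing matches, so a missing link ends the loop.
    next_links = browser.find_elements(By.CSS_SELECTOR, '.next a')
    if not next_links or 'disabled' in (next_links[0].get_attribute('class') or ''):
        break

    # Click the next-page link
    next_links[0].click()

browser.quit()

for movie in movies:
    print(movie['title'], movie['rating'])

The code above opens the Douban Movie Top 250 page in a Selenium-driven browser. On each page it finds the movie entries with find_elements(By.CSS_SELECTOR, ...) and reads each movie's title and rating with find_element(By.CSS_SELECTOR, ...), appending them to a list. It then follows the next-page link until the link is missing or marked as disabled, closes the browser, and prints every title and rating with print().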
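
The per-entry extraction step described above can be factored into a reusable function. The sketch below is an illustration under stated assumptions, not part of the original program: extract_items() is a hypothetical helper, and the string 'css selector' is the value of Selenium's By.CSS_SELECTOR constant, written out so the helper has no import of its own.

```python
CSS = 'css selector'  # the value of selenium's By.CSS_SELECTOR


def extract_items(containers, field_selectors):
    """Build one dict per container element from a {field: css_selector} map.

    A field whose selector matches nothing is stored as None instead of
    raising, so one incomplete entry does not abort the whole crawl.
    """
    items = []
    for container in containers:
        item = {}
        for field, selector in field_selectors.items():
            # find_elements() returns an empty list when nothing matches.
            matches = container.find_elements(CSS, selector)
            item[field] = matches[0].text if matches else None
        items.append(item)
    return items
```

With a list of entry elements already located on the page, a call such as extract_items(movie_list, {'title': '.title', 'rating': '.rating_num'}) would produce the same list of dictionaries as the inner loop, while tolerating entries that lack one of the fields.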

Example 2

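The following Python program uses Selenium to simulate browser behavior and scrape product data from Taobao.

from selenium import webdriver
from selenium.webdriver.common.by import By

browser = webdriver.Chrome()
browser.get('https://www.taobao.com')

# Find the search box, type the keyword, and submit the search
input_box = browser.find_element(By.ID, 'q')
input_box.send_keys('Python')
submit_button = browser.find_element(By.CSS_SELECTOR, '.btn-search')
submit_button.click()

products = []
while True:
    # Find the product entries on the current page
    product_list = browser.find_elements(By.CSS_SELECTOR, '.J_MouserOnverReq')
    for product in product_list:
        title = product.find_element(By.CSS_SELECTOR, '.title').text
        price = product.find_element(By.CSS_SELECTOR, '.price').text
        products.append({'title': title, 'price': price})

    # Find the next-page button; stop if it is missing or disabled
    next_buttons = browser.find_elements(By.CSS_SELECTOR, '.J_Ajax.num.icon-tag')
    if not next_buttons or 'J_Disabled' in (next_buttons[0].get_attribute('class') or ''):
        break

    # Click the next-page button
    next_buttons[0].click()

browser.quit()

for product in products:
    print(product['title'], product['price'])

The code above opens the Taobao home page in a Selenium-driven browser. It locates the search box with find_element(By.ID, ...), types the search keyword with send_keys(), and clicks the search button. On each results page it finds the product entries with find_elements(By.CSS_SELECTOR, ...) and reads each product's title and price with find_element(By.CSS_SELECTOR, ...), appending them to a list. Finally, it prints every title and price with print(). The CSS selectors depend on Taobao's page markup and may need updating if the site changes.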
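
Both examples share the same "parse a page, then click next" loop, and an unbounded while True loop can run for a long time on a large result set. The sketch below shows one way to generalize the loop with a page cap. crawl_pages(), parse_page, and max_pages are hypothetical names introduced for illustration, and the sketch assumes the last page is signalled by the next-page element being absent, which may not hold for every site.

```python
def crawl_pages(driver, parse_page, next_locator, max_pages=10):
    """Collect parse_page(driver) results page by page.

    next_locator is a (strategy, value) tuple for the next-page element.
    Stops when that element is not found or after max_pages pages.
    """
    results = []
    for page in range(max_pages):
        results.extend(parse_page(driver))
        if page == max_pages - 1:
            break  # page cap reached; do not navigate further
        next_links = driver.find_elements(*next_locator)
        if not next_links:
            break  # no next-page element: this was the last page
        next_links[0].click()
    return results
```

Here parse_page would be a function that takes the driver and returns the list of records found on the current page, and next_locator would be a tuple such as (By.CSS_SELECTOR, '.next a').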

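Summary

This article explained how to use Selenium in Python 3 crawlers, covering installation, basic usage, and the most commonly used APIs, followed by two worked examples. With these basics in place, you can apply Selenium to scrape pages that only render fully in a real browser.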
