BeautifulSoup "ValueError: invalid literal for int() with base 10: 'None'": Causes and Solutions

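When parsing web pages with BeautifulSoup, you may run into the error "ValueError: invalid literal for int() with base 10: 'None'". It usually means that the code tried to read a tag or attribute that does not exist on the page. Specifically, find() returns None when no tag matches (find_all() returns an empty, list-like ResultSet instead), and tag.get('attr') returns None when the attribute is missing. The ValueError itself is raised by Python's int(), not by BeautifulSoup: it appears when that None has been converted to the string 'None' (for example via str() or string formatting) and is then passed to int().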
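The chain of events described above can be reproduced in a few lines. This is a minimal sketch: the HTML snippet and the data-stock attribute are made up for illustration and do not come from any real site.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the second product has no data-stock attribute.
html = '<div class="product" data-stock="5"></div><div class="product"></div>'
soup = BeautifulSoup(html, "html.parser")

first, second = soup.find_all("div", class_="product")

print(first.get("data-stock"))   # the attribute exists, so its string value is returned
print(second.get("data-stock"))  # the attribute is missing, so get() returns None
print(soup.find("span"))         # no <span> in the document, so find() returns None

# Turning the missing value into a string and then into an int reproduces the error.
try:
    int(str(second.get("data-stock")))
except ValueError as exc:
    error_message = str(exc)
    print(error_message)
```

The final print shows the same message as in the title, which suggests that the place to look is wherever a possibly missing value is stringified before being converted to a number.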

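The following example illustrates the problem in detail.

Suppose we want to scrape product information from a phone showcase website, including each product's name, price, and link. We could use the following code: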

import requests
from bs4 import BeautifulSoup

url = 'http://www.mobile.com/products'
response = requests.get(url)
html = response.text
soup = BeautifulSoup(html, 'html.parser')

contents = soup.find_all('div', class_='product')

for content in contents:
    name = content.find('div', class_='name').text.strip()
    price = content.find('div', class_='price').text.strip()
    link = content.find('a', href=True)['href']
    print(name, price, link)

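Here we extract three pieces of data: the name, the price, and the link's href. Some products may have no price element, in which case content.find('div', class_='price') returns None. The line price = content.find('div', class_='price').text.strip() then fails with AttributeError: 'NoneType' object has no attribute 'text'; the ValueError from the title appears when a missing value has been turned into the string 'None' and is later passed to int(), for example when converting the price to a number. Both failures have the same root cause and the same fix: check whether the tag was found, and assign None to the value if it was not. The code is as follows: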

import requests
from bs4 import BeautifulSoup

url = 'http://www.mobile.com/products'
response = requests.get(url)
html = response.text
soup = BeautifulSoup(html, 'html.parser')

contents = soup.find_all('div', class_='product')

for content in contents:
    name = content.find('div', class_='name').text.strip()
    # find() returns None when the product has no price element
    price_tag = content.find('div', class_='price')
    if price_tag is None:
        price = None
    else:
        price = price_tag.text.strip()
    link = content.find('a', href=True)['href']
    print(name, price, link)

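This guards against the missing price element and keeps the program running. Note that price can now be None, so check for it before passing the value to int(); otherwise int(str(price)) would raise the same "invalid literal for int()" error. In practice, the same check should be applied to any tag or attribute that may be absent from the page.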
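If the price needs to be a number, one possible approach is to wrap the conversion in a small helper. The sketch below assumes whole-number prices with no currency symbols or separators; to_int is a hypothetical helper name, and the inline HTML is made up so that the example runs without a network request.

```python
from bs4 import BeautifulSoup


def to_int(value, default=None):
    """Convert value to int; return default for None or non-numeric text."""
    if value is None:
        return default
    try:
        return int(str(value).strip())
    except ValueError:
        return default


# Hypothetical markup: the second product has no price element.
html = """
<div class="product"><div class="name">Phone A</div><div class="price">1999</div></div>
<div class="product"><div class="name">Phone B</div></div>
"""
soup = BeautifulSoup(html, "html.parser")

prices = []
for product in soup.find_all("div", class_="product"):
    price_tag = product.find("div", class_="price")
    # get_text() is only called when the tag was actually found.
    price_text = price_tag.get_text(strip=True) if price_tag is not None else None
    prices.append(to_int(price_text))

print(prices)  # [1999, None]
```

Returning a default instead of raising keeps the loop running for every product; whether a silent None is preferable to a logged warning depends on how the scraped data is used afterwards.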
