Counting word frequencies (word counts) in Python in several ways

Python offers several ways to count the words in a text. This article walks through the process step by step and closes with two examples.

1. Reading and splitting the text

Before counting anything, read the text file and split it into words. The steps are as follows:

Open the text file, for example with the open() function:

```python
with open('filename.txt', 'r') as f:
    text = f.read()
```

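Convert the text to lowercase and remove punctuation, then split it on whitespace:

```python
import re  # regular expression module

text = text.lower()                  # convert to lowercase
text = re.sub(r'[^\w\s]', '', text)  # remove punctuation
words = text.split()                 # split on whitespace into words
```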
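The steps above can be folded into one small helper. The following is a minimal sketch; the function name `tokenize` and the sample sentence are illustrative assumptions. Note that stripping punctuation also merges contractions, so "don't" becomes "dont".

```python
import re


def tokenize(text):
    """Lowercase the text, strip punctuation, and split it into words."""
    text = text.lower()
    # Remove every character that is neither a word character nor whitespace
    text = re.sub(r'[^\w\s]', '', text)
    return text.split()


sample = "Hello, world! Don't panic."
print(tokenize(sample))  # ['hello', 'world', 'dont', 'panic']
```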

2. Counting words with a dictionary

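A Python dictionary is a mutable container of key-value pairs, which makes it a natural fit for counting how often each word occurs. The steps are as follows:

Create an empty dictionary:

```python
word_counts = {}
```

Iterate over the word list. If a word is not yet in the dictionary, add it as a key with the value 1; otherwise add 1 to the value already stored for it:

```python
for word in words:
    if word not in word_counts:
        word_counts[word] = 1
    else:
        word_counts[word] += 1
```

Print the results:

```python
for word, count in word_counts.items():
    print(f'{word}: {count}')
```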
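The if/else branch in the counting loop can be shortened with dict.get(), which returns a default value when the key is missing. The following is a minimal sketch on a small hand-written word list; the function name `count_words` is an illustrative choice.

```python
def count_words(words):
    """Return a dict mapping each word to the number of times it occurs."""
    word_counts = {}
    for word in words:
        # get() returns 0 for a word not seen yet, so no membership test is needed
        word_counts[word] = word_counts.get(word, 0) + 1
    return word_counts


words = ['the', 'cat', 'and', 'the', 'hat']
print(count_words(words))  # {'the': 2, 'cat': 1, 'and': 1, 'hat': 1}
```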

3. Counting words with collections.Counter from the standard library

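The collections module in the Python standard library provides the Counter class, which makes word counting straightforward. The steps are as follows:

Import the Counter class:

```python
from collections import Counter
```

Count the words with Counter:

```python
word_counts = Counter(words)
```

Print the results:

```python
for word, count in word_counts.items():
    print(f'{word}: {count}')
```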
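Counter also offers a few conveniences beyond a plain dictionary. The short sketch below, using a made-up word list, shows two of them: most_common() for ranked results and a count of 0 for words that never appeared.

```python
from collections import Counter

words = ['the', 'cat', 'and', 'the', 'hat', 'and', 'the']
word_counts = Counter(words)

# most_common(n) returns the n highest counts as (word, count) pairs
top_two = word_counts.most_common(2)
print(top_two)  # [('the', 3), ('and', 2)]

# A word that never appeared has a count of 0 instead of raising KeyError
print(word_counts['dog'])  # 0
```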

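Example 1:

We can use each of the two methods above to find the 100 most frequent words in the original English text of a Harry Potter novel, assuming the text is stored in a local file named harry_potter.txt. The code is as follows: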

```python
# Method 1: count words with a dictionary
import re

with open('harry_potter.txt', 'r') as f:
    text = f.read()
text = text.lower()
text = re.sub(r'[^\w\s]', '', text)  # remove punctuation
words = text.split()
word_counts = {}

for word in words:
    if len(word) > 1:  # skip single-character tokens such as "a"
        if word not in word_counts:
            word_counts[word] = 1
        else:
            word_counts[word] += 1

# Sort by count, highest first, and print the 100 most frequent words
sorted_words = sorted(word_counts.items(), key=lambda x: x[1], reverse=True)
for word, count in sorted_words[:100]:
    print(f'{word}: {count}')

# Method 2: count words with collections.Counter
import re
from collections import Counter

with open('harry_potter.txt', 'r') as f:
    text = f.read()
text = text.lower()
text = re.sub(r'[^\w\s]', '', text)  # remove punctuation
words = text.split()
# Apply the same length filter as method 1 so both methods give the same result
word_counts = Counter(word for word in words if len(word) > 1)

# most_common(100) returns the 100 highest counts, already sorted
for word, count in word_counts.most_common(100):
    print(f'{word}: {count}')
```
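For a text as long as a full novel, the file does not have to be read into memory in one piece. The sketch below is one possible variation, not part of the original methods: it feeds a Counter one line at a time. The function name `count_words_by_line` and the two sample lines are illustrative assumptions.

```python
import re
from collections import Counter


def count_words_by_line(lines):
    """Count words from an iterable of lines, such as an open file object."""
    word_counts = Counter()
    for line in lines:
        cleaned = re.sub(r'[^\w\s]', '', line.lower())
        # update() adds the counts of this line's words to the running totals
        word_counts.update(cleaned.split())
    return word_counts


# Stand-in for a file: any iterable of strings works the same way
lines = ["The boy who lived.\n", "The boy smiled.\n"]
counts = count_words_by_line(lines)
print(counts.most_common(2))  # [('the', 2), ('boy', 2)]
```

With a real file, the open file object can be passed in directly, since iterating over it yields one line at a time.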

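Example 2:

Suppose we have a string and need to count how many times one particular word occurs in it. The following code does that:

```python
string = 'Python is a high-level programming language'
word = 'Python'
count = string.count(word)  # counts non-overlapping, case-sensitive substring matches

print(f'{word}: {count}')
```

The output is: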

Python: 1
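One caveat: str.count() matches substrings, so it also counts the target when it appears inside a longer word. If only whole words should count, a regular expression with word boundaries is one option. The following is a minimal sketch; the function name `count_whole_word` and the sample sentence are illustrative assumptions.

```python
import re


def count_whole_word(text, word):
    """Count occurrences of word in text, matching whole words only."""
    # \b marks a word boundary; re.escape guards against special characters
    pattern = r'\b' + re.escape(word) + r'\b'
    return len(re.findall(pattern, text))


text = 'Pythonic code is code written the Python way'
print(text.count('Python'))              # 2, because "Pythonic" also matches
print(count_whole_word(text, 'Python'))  # 1
```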
