
Word Frequency Statistics Explained: Can It Help You Pick Up Optimization Techniques Quickly?

GG网络技术分享 2025-11-13 04:37


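    import re
    from collections import Counter

    import matplotlib.pyplot as plt


    def get_word_counts(filename):
        # Read the whole file as UTF-8 text.
        with open(filename, encoding="utf-8") as f:
            text = f.read()
        # The original pattern was lost in extraction; \w+ on lowercased
        # text is an assumed stand-in.
        words = re.findall(r"\w+", text.lower())
        word_counts = Counter(words)
        return word_counts


    # The file names were lost in extraction; list the text files to analyze here.
    files = []

    all_word_counts = {}

    for file in files:
        word_counts = get_word_counts(file)
        all_word_counts[file] = word_counts

    for file, word_counts in all_word_counts.items():
        print(file)
        top_ten_words = word_counts.most_common(10)
        for word, count in top_ten_words:
            print(word, count)
        print()


    def plot_word_counts(file, word_counts):
        # Plotting the top ten words and titling by file name are assumptions;
        # the original call arguments were lost in extraction.
        labels, values = zip(*word_counts.most_common(10))
        indexes = range(len(labels))
        plt.bar(indexes, values)
        plt.xticks(indexes, labels)
        plt.title(file)
        plt.show()


    for file, word_counts in all_word_counts.items():
        plot_word_counts(file, word_counts)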

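In this script, we first define a get_word_counts function. It takes a file name as its argument, reads the file's contents, and uses re.findall with a regular expression to find every word. It then uses collections.Counter to count how many times each word occurs.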
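To see the counting step in isolation, here is a minimal sketch that works on an in-memory string instead of a file. The helper name count_words and the \w+ pattern are assumptions for illustration, since the article does not show the pattern it uses.

```python
import re
from collections import Counter


def count_words(text):
    """Return a Counter mapping each lowercased word to its frequency."""
    # \w+ matches runs of letters, digits, and underscores. This pattern is
    # an assumption; the article's own pattern is not shown.
    words = re.findall(r"\w+", text.lower())
    return Counter(words)


sample = "The cat sat on the mat. The mat was flat."
counts = count_words(sample)
print(counts["the"])  # 3
print(counts["mat"])  # 2
```

Lowercasing before matching means "The" and "the" are counted as one word, which is usually what you want for frequency analysis.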

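Next, we create a list, files, containing the names of all the files whose word frequencies we want to count. We then iterate over this list, call get_word_counts for each file, and store the result in the all_word_counts dictionary, keyed by file name.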
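The per-file loop can be sketched as follows. The function count_files and the two throwaway files are hypothetical, created only so the example runs anywhere; in practice you would pass your own file names.

```python
import re
import tempfile
from collections import Counter
from pathlib import Path


def count_files(paths):
    """Map each path to a Counter of the words in that file (UTF-8 assumed)."""
    results = {}
    for path in paths:
        text = Path(path).read_text(encoding="utf-8")
        results[str(path)] = Counter(re.findall(r"\w+", text.lower()))
    return results


# Build two temporary files so the sketch is self-contained.
with tempfile.TemporaryDirectory() as tmp:
    a = Path(tmp) / "a.txt"
    b = Path(tmp) / "b.txt"
    a.write_text("seo seo keyword", encoding="utf-8")
    b.write_text("keyword density", encoding="utf-8")
    all_counts = count_files([a, b])

for name, word_counts in all_counts.items():
    print(name, dict(word_counts))
```

Keeping one Counter per file, rather than a single merged Counter, makes it possible to compare files against each other later.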

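Finally, we iterate over this dictionary, print the 10 most frequent words in each file, and use the matplotlib library to draw a word-frequency chart for each file.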
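Before plotting, the top words have to be split into labels and bar heights. The sketch below shows one way to do that; top_n_series is a hypothetical helper, not part of the article's script. It uses list comprehensions rather than unpacking a zip, so an empty file yields two empty lists instead of raising an error.

```python
from collections import Counter


def top_n_series(word_counts, n=10):
    """Split the n most common words into parallel label and value lists."""
    top = word_counts.most_common(n)
    labels = [word for word, _ in top]
    values = [count for _, count in top]
    return labels, values


counts = Counter("to be or not to be that is the question".split())
labels, values = top_n_series(counts, n=2)
print(labels, values)  # ['to', 'be'] [2, 2]
print(top_n_series(Counter()))  # ([], [])
```

The two lists can then be passed to plt.bar and plt.xticks. Words with equal counts come back in the order they were first seen.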

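Note that this script assumes all the text files are in the same directory as the script and are encoded as UTF-8. If a file uses a different encoding, you may need to adjust the encoding parameter of the open function.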
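If you do not know a file's encoding in advance, one possible approach is to try a short list of candidates in order. This is a rough sketch, not part of the original script: the function name read_text_with_fallback and the candidate list are assumptions, and the right candidates depend on where your files come from.

```python
import tempfile
from pathlib import Path


def read_text_with_fallback(path, encodings=("utf-8", "gbk", "latin-1")):
    """Try each encoding in turn and return (text, encoding_used)."""
    data = Path(path).read_bytes()
    for encoding in encodings:
        try:
            return data.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    # Only reachable if the caller's list has no encoding that accepts the bytes.
    raise ValueError(f"could not decode {path} with any of {encodings}")


# Write a GBK-encoded sample so the fallback path is actually exercised.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "gbk_sample.txt"
    sample.write_bytes("词频统计".encode("gbk"))
    text, used = read_text_with_fallback(sample)

print(used)  # gbk
```

Note that latin-1 accepts any byte sequence, so as a last resort it never fails but may produce wrong characters. Put stricter encodings first.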
