November 7, 2022

Getting Started with Natural Language Processing in Python

A large portion of the data generated today is unstructured. Unstructured data includes social media comments, browsing histories, and customer feedback. Have you ever found yourself needing to analyze large amounts of text data without knowing how to proceed? Natural language processing in Python can help.

The goal of this tutorial is to enable you to analyze text data in Python using the concepts of natural language processing (NLP). You will first learn how to tokenize text into smaller chunks and normalize words to their root forms, and then remove any noise from your documents to prepare them for further analysis.

Let's get started!

Prerequisites

In this tutorial, we will use Python's nltk library to perform all NLP operations on text. At the time of writing, we are using version 3.4 of nltk. To install the library, you can use the pip command in your terminal:

```shell
pip install nltk==3.4
```

To check which version of nltk you have on your system, you can import the library into the Python interpreter and check the version:

```python
import nltk
print(nltk.__version__)
```

To perform certain operations with nltk in this tutorial, you may need to download specific resources. We will describe each resource as it is needed.

However, if you would like to avoid downloading individual resources later in the tutorial and grab them all at once, run the following command:

```shell
python -m nltk.downloader all
```

Step 1: Convert to Tokens

A computer system cannot find meaning in natural language by itself. The first step in processing natural language is to convert the raw text into tokens. A token is a combination of contiguous characters that carries some meaning. It is up to you to decide how to break a sentence into tokens. For example, a simple approach is to split a sentence on whitespace to break it into individual words.
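To sketch this idea with the standard library alone (before turning to NLTK), note that a plain whitespace split leaves punctuation attached to words, while a small regular expression can separate it:

```python
import re

sentence = "Hi, this is a nice hotel."

# Naive approach: split on whitespace; punctuation stays glued to words
print(sentence.split())
# ['Hi,', 'this', 'is', 'a', 'nice', 'hotel.']

# Slightly better: treat runs of word characters and individual
# punctuation marks as separate tokens
print(re.findall(r"\w+|[^\w\s]", sentence))
# ['Hi', ',', 'this', 'is', 'a', 'nice', 'hotel', '.']
```

This is only an illustration of the design space; NLTK's tokenizer, introduced next, handles many more edge cases.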

In the NLTK library, you can use the word_tokenize() function to convert a string into tokens. However, you first need to download the punkt resource. Run the following in the Python interpreter:

```python
nltk.download('punkt')
```

Next, you need to import word_tokenize from nltk.tokenize to use it:

```python
from nltk.tokenize import word_tokenize
print(word_tokenize("Hi, this is a nice hotel."))
```

The output of the code is as follows:

```
['Hi', ',', 'this', 'is', 'a', 'nice', 'hotel', '.']
```

You will notice that word_tokenize does not simply split the string on whitespace; it also separates punctuation marks into their own tokens. Whether to keep the punctuation in your analysis is up to you.

Step 2: Convert Words to Their Base Forms

When working with natural language, you will often notice that the same word can occur in multiple grammatical forms. For example, "go", "going", and "gone" are all forms of the same verb, "go".

While the needs of your project may require you to keep words in their various grammatical forms, let's discuss a way to convert the various grammatical forms of the same word into its base form. There are two techniques you can use for this conversion.

The first technique is stemming. Stemming is a simple algorithm that removes affixes from a word. There are multiple stemming algorithms available in NLTK; we will use the Porter algorithm in this tutorial.

We first import PorterStemmer from nltk.stem.porter. Next, we initialize the stemmer in the stemmer variable, then use the .stem() method to find the base form of a word:

```python
from nltk.stem.porter import PorterStemmer
stemmer = PorterStemmer()
print(stemmer.stem("going"))
```

The output of the code above is go. If you run the stemmer on the other forms of "go" listed above, you will notice that the stemmer returns the same base form, "go". However, since stemming is just a simple algorithm based on removing affixes, it fails when a word is used less commonly in the language.

For example, when you try to stem the word "constitutes", it gives an unintuitive result:

```python
print(stemmer.stem("constitutes"))
```

You will notice that the output is "constitut".
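To see why blind suffix stripping behaves this way, here is a toy stemmer (an illustrative sketch only, not the actual Porter algorithm, which applies far more careful rules) that removes a few common suffixes without checking whether a real word remains:

```python
def toy_stem(word):
    """Strip the first matching suffix; a crude imitation of affix removal."""
    for suffix in ("ing", "es", "ed", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 1:
            return word[:-len(suffix)]
    return word

print(toy_stem("going"))        # 'go'
print(toy_stem("constitutes"))  # 'constitut' -- not a dictionary word
```

The stemmer has no notion of vocabulary, so "constitut" is a perfectly acceptable result to it, which motivates the context-aware approach below.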

This problem is solved by moving to a more sophisticated approach that finds the base form of a word in its given context. The process is called lemmatization. Lemmatization normalizes a word based on the context and vocabulary of the text. In NLTK, you can use the WordNetLemmatizer class to lemmatize a sentence.

First, you need to download the wordnet resource from the NLTK downloader in the Python interpreter:

```python
nltk.download('wordnet')
```

Once downloaded, you need to import the WordNetLemmatizer class and initialize it:

```python
from nltk.stem.wordnet import WordNetLemmatizer
lem = WordNetLemmatizer()
```

To use the lemmatizer, call the .lemmatize() method. It takes two arguments: the word and its context. In our example, we will use "v" as the context. Let's explore the context further after looking at the output of the .lemmatize() method:

```python
print(lem.lemmatize('constitutes', 'v'))
```

You will notice that the .lemmatize() method correctly converts the word "constitutes" to its base form, "constitute". You will also notice that lemmatization takes longer than stemming, because the algorithm is more complex.

Let's check how to determine the second argument of the .lemmatize() method programmatically. NLTK has a pos_tag() function that helps determine the context of a word in a sentence. However, you first need to download the averaged_perceptron_tagger resource through the NLTK downloader:

```python
nltk.download('averaged_perceptron_tagger')
```

Next, import the pos_tag() function and run it on a sentence:

```python
from nltk.tag import pos_tag
sample = "Hi, this is a nice hotel."
print(pos_tag(word_tokenize(sample)))
```

You will notice that the output is a list of pairs. Each pair consists of a token and its tag, which signifies the context of the token in the overall text. Note that the tag for a punctuation mark is the mark itself:

```
[('Hi', 'NNP'),
 (',', ','),
 ('this', 'DT'),
 ('is', 'VBZ'),
 ('a', 'DT'),
 ('nice', 'JJ'),
 ('hotel', 'NN'),
 ('.', '.')]
```

How do you decode the context of each token? A full list of all the tags and their corresponding meanings is available on the Web. Notice that the tags of all nouns begin with "N" and the tags of all verbs begin with "V". We can use this information in the second argument of the .lemmatize() method:

```python
def lemmatize_tokens(sentence):
    lemmatizer = WordNetLemmatizer()
    lemmatized_tokens = []
    for word, tag in pos_tag(sentence):
        if tag.startswith('NN'):
            pos = 'n'
        elif tag.startswith('VB'):
            pos = 'v'
        else:
            pos = 'a'
        lemmatized_tokens.append(lemmatizer.lemmatize(word, pos))
    return lemmatized_tokens

sample = "Legal authority constitutes all magistrates."
print(lemmatize_tokens(word_tokenize(sample)))
```

The output of the code above is as follows:

```
['Legal', 'authority', 'constitute', 'all', 'magistrate', '.']
```

This output is as expected: "constitutes" and "magistrates" have been converted to "constitute" and "magistrate", respectively.

Step 3: Data Cleaning

The next step in preparing your data is to clean it and remove anything that does not add meaning to your analysis. Broadly, we will look at removing punctuation and stop words.

Removing punctuation is a fairly easy task. The punctuation object of the string library contains all the punctuation marks in English:

```python
import string
print(string.punctuation)
```

The output of this snippet is as follows:

```
!"#$%&'()*+,-./:;<=>?@[\]^_`{|}~
```

To remove punctuation from your tokens, you can simply run:

```python
cleaned_tokens = []
for token in tokens:
    if token not in string.punctuation:
        cleaned_tokens.append(token)
```

Next, we will focus on removing stop words. Stop words are commonly used words in a language, such as "I", "a", and "the", which add little meaning to a text when analyzing it. We will therefore remove stop words from our analysis. First, download the stopwords resource from the NLTK downloader:

```python
nltk.download('stopwords')
```

Once the download is complete, import stopwords from nltk.corpus and use the .words() method with 'english' as its argument. This gives a list of 179 stop words in English:

```python
from nltk.corpus import stopwords
stop_words = stopwords.words('english')
```

We can combine the lemmatization example with the concepts discussed in this section to create the following function, clean_data(). Additionally, before checking whether a word is part of the stop word list, we convert it to lowercase. That way, we still catch a stop word that is capitalized at the start of a sentence:

```python
def clean_data(tokens, stop_words=()):
    cleaned_tokens = []
    for token, tag in pos_tag(tokens):
        if tag.startswith("NN"):
            pos = 'n'
        elif tag.startswith('VB'):
            pos = 'v'
        else:
            pos = 'a'
        lemmatizer = WordNetLemmatizer()
        token = lemmatizer.lemmatize(token, pos)
        if token not in string.punctuation and token.lower() not in stop_words:
            cleaned_tokens.append(token)
    return cleaned_tokens

sample = "The quick brown fox jumps over the lazy dog."
stop_words = stopwords.words('english')
clean_data(word_tokenize(sample), stop_words)
```

The output of this example is as follows:

```
['quick', 'brown', 'fox', 'jump', 'lazy', 'dog']
```

As you can see, the punctuation and stop words have been removed.

Word Frequency Distribution

Now that you are familiar with the basic cleaning techniques in NLP, let's try to find the frequency of words in a text. For this exercise, we'll use the text of the fairy tale The Mouse, the Bird, and the Sausage, which is freely available on Gutenberg. We'll store the text of this fairy tale in a string, text.

First, we tokenize text and then clean it using the clean_data function defined above:

```python
tokens = word_tokenize(text)
cleaned_tokens = clean_data(tokens, stop_words=stop_words)
```

To find the frequency distribution of words in a text, you can use the FreqDist class of NLTK. Initialize the class with the tokens as the argument, then use the .most_common() method to find the most frequent terms. Let's find the top ten terms in this case:

```python
from nltk import FreqDist
freq_dist = FreqDist(cleaned_tokens)
freq_dist.most_common(10)
```

Here are the ten most frequent terms in this fairy tale:

```
[('bird', 15),
 ('sausage', 11),
 ('mouse', 8),
 ('wood', 7),
 ('time', 6),
 ('long', 5),
 ('make', 5),
 ('fly', 4),
 ('fetch', 4),
 ('water', 4)]
```

Unsurprisingly, the three most frequent nouns are the three main characters of the fairy tale.
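If you only need counts, the standard library offers the same most_common() behavior through collections.Counter; here is a minimal sketch using a small hypothetical token list in place of the cleaned fairy-tale tokens:

```python
from collections import Counter

# Hypothetical stand-in for a list of already-cleaned tokens
cleaned_tokens = ["bird", "sausage", "bird", "mouse", "bird", "sausage"]

freq = Counter(cleaned_tokens)
print(freq.most_common(2))
# [('bird', 3), ('sausage', 2)]
```

FreqDist is in fact built on the same idea, with extra NLP-oriented conveniences such as plotting.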

The raw frequency of a word may not be very significant when analyzing text. Typically, the next step in NLP is to compute a statistic, TF-IDF (term frequency - inverse document frequency), which signifies the importance of a word in a collection of documents.
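As a taste of that next step, the basic TF-IDF formula can be sketched with the standard library alone; this uses a tiny made-up corpus, and a real project would more likely reach for a library implementation such as scikit-learn's TfidfVectorizer:

```python
import math
from collections import Counter

# A tiny hypothetical corpus of already-tokenized documents
docs = [
    ["bird", "sausage", "mouse"],
    ["bird", "wood"],
    ["bird", "sausage"],
]

def tf_idf(term, doc, docs):
    # Term frequency: share of this document's tokens that are `term`
    tf = Counter(doc)[term] / len(doc)
    # Inverse document frequency: terms in fewer documents score higher
    df = sum(1 for d in docs if term in d)
    idf = math.log(len(docs) / df)
    return tf * idf

# "bird" appears in every document, so its idf (and tf-idf) is zero
print(tf_idf("bird", docs[0], docs))
# "mouse" appears in only one document, so it is far more distinctive
print(tf_idf("mouse", docs[0], docs))
```

The exact weighting and smoothing vary between implementations, but the intuition is the same: a word matters for a document when it is frequent there yet rare elsewhere.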

Conclusion

In this tutorial, we took our first steps with natural language processing in Python. We converted text into tokens, converted words to their base forms, and finally cleaned the text to remove any parts that did not add meaning to the analysis.
