Installing NLTK
sudo pip install -U nltk
Then start a Python shell, import nltk, and launch the data downloader:
>>> import nltk
>>> nltk.download()
Exploring NLTK's built-in corpora
xiaoxilong@ubuntu:~$ python
Python 2.7.6 (default, Oct 26 2016, 20:30:19)
[GCC 4.8.4] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from nltk.corpus import gutenberg
>>> print gutenberg.fileids()
[u'austen-emma.txt', u'austen-persuasion.txt', u'austen-sense.txt', u'bible-kjv.txt', u'blake-poems.txt', u'bryant-stories.txt', u'burgess-busterbrown.txt', u'carroll-alice.txt', u'chesterton-ball.txt', u'chesterton-brown.txt', u'chesterton-thursday.txt', u'edgeworth-parents.txt', u'melville-moby_dick.txt', u'milton-paradise.txt', u'shakespeare-caesar.txt', u'shakespeare-hamlet.txt', u'shakespeare-macbeth.txt', u'whitman-leaves.txt']
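Once the file ids are known, a natural next step is to size up each text. The following is only a minimal sketch: corpus_summary is a hypothetical helper, and TinyCorpus is a made-up stand-in that exposes the same fileids() and words(fileid) methods used in these notes, so the example runs without the NLTK data installed.

```python
def corpus_summary(reader):
    """Return (fileid, token count, distinct token count) for each file."""
    rows = []
    for fileid in reader.fileids():
        words = list(reader.words(fileid))
        rows.append((fileid, len(words), len(set(words))))
    return rows


class TinyCorpus(object):
    """Hypothetical stand-in offering fileids() and words(fileid)."""
    _texts = {
        'a.txt': 'the cat sat on the mat'.split(),
        'b.txt': 'to be or not to be'.split(),
    }

    def fileids(self):
        return sorted(self._texts)

    def words(self, fileid):
        return self._texts[fileid]


for fileid, n_tokens, n_types in corpus_summary(TinyCorpus()):
    print('%s: %d tokens, %d distinct' % (fileid, n_tokens, n_types))
```

Assuming the Gutenberg data has been downloaded, passing nltk.corpus.gutenberg in place of TinyCorpus() should produce one row per file id listed above.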
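Problem
AttributeError: 'FreqDist' object has no attribute 'inc'
NLTK 3 removed FreqDist.inc(). Pass the tokens to the FreqDist constructor instead, as in the session below, or increment a single count with fd[word] += 1.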
>>> from nltk import FreqDist
>>> fd = FreqDist(gutenberg.words('austen-persuasion.txt'))
>>> print fd.N()
98171
>>> print fd.B()
6132
>>> for word in fd.keys()[:10]:
...     print word, fd[word]
...
foul 3
four 21
Does 3
hanging 3
woody 1
looking 45
eligible 1
unanswered 1
Western 1
lord 1
>>>
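In NLTK 3, FreqDist is a subclass of collections.Counter, so the counting behaviour can be illustrated with a plain Counter. This is a rough sketch on a made-up sentence, not the Gutenberg text; it shows the item-assignment replacement for inc() and what N() and B() correspond to.

```python
from collections import Counter

# Made-up sample tokens, used only for illustration.
tokens = 'the cat sat on the mat near the door'.split()

# NLTK 2 code called fd.inc(token); a Counter-based distribution
# is updated through item assignment instead.
fd = Counter()
for token in tokens:
    fd[token] += 1

total_samples = sum(fd.values())   # the quantity fd.N() reports
distinct_samples = len(fd)         # the quantity fd.B() reports

print('N = %d, B = %d' % (total_samples, distinct_samples))
print(fd.most_common(1))           # most frequent sample with its count
```

Because keys() comes back in arbitrary order, as the session above shows, most_common(n) is probably the more useful way to look at the top of a distribution.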
Importing a custom function
Create an empty __init__.py file and a Python module named textpro.py.
Define the following function in textpro.py:
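def plural(word):
    """Return a rough English plural of word using simple suffix rules."""
    if word.endswith('y'):
        # any trailing 'y' is replaced by 'ies'
        return word[:-1] + 'ies'
    elif word[-1] in 'sx' or word[-2:] in ['sh', 'ch']:
        # sibilant endings take 'es'
        return word + 'es'
    elif word.endswith('an'):
        # 'man' -> 'men'; also turns 'fan' into 'fen'
        return word[:-2] + 'en'
    else:
        return word + 's'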
Add the directory that contains textpro.py to sys.path, then import and call the function:
>>> import sys
>>> sys.path.append(r"/home/xiaoxilong/share")
>>> import textpro
>>> from textpro import plural
>>> plural('wish')
'wishes'
>>> plural('fan')
'fen'
>>>
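The same sys.path mechanism can be tried end to end without touching a real project directory. The sketch below writes a throwaway module into a temporary directory and imports it; the module name textpro_demo and its shout function are hypothetical and exist only for this demonstration.

```python
import importlib
import os
import sys
import tempfile

# Hypothetical module name and body, used only for this demonstration.
module_dir = tempfile.mkdtemp()
module_path = os.path.join(module_dir, 'textpro_demo.py')
with open(module_path, 'w') as handle:
    handle.write("def shout(word):\n    return word.upper() + '!'\n")

# Put the directory on the module search path, then import by name.
if module_dir not in sys.path:
    sys.path.append(module_dir)

textpro_demo = importlib.import_module('textpro_demo')
print(textpro_demo.shout('wish'))
```

Note that no __init__.py is written here: a single module file in a directory on sys.path can be imported directly, and __init__.py only matters when the directory itself is imported as a package.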
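Enabling the division language feature (true division)
In Python 2, / between two integers performs floor division. The following import makes / return the exact quotient: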
from __future__ import division
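A short sketch of what the import changes, reusing the two counts printed by fd.B() and fd.N() earlier in these notes. The variable names are illustrative only. The import must be the first statement in the file, and under Python 3 it has no effect because / already means true division.

```python
from __future__ import division

distinct_words = 6132    # value printed by fd.B() earlier
total_words = 98171      # value printed by fd.N() earlier

# With the import, "/" is true division even for two integers.
lexical_diversity = distinct_words / total_words

# "//" stays available when floor division is actually wanted.
floor_result = distinct_words // total_words

print('true division:  %f' % lexical_diversity)
print('floor division: %d' % floor_result)
```

Without the import, Python 2 would evaluate distinct_words / total_words as 0, which is why ratio calculations such as this one need it.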