Porting your code to NLTK 3.0
Original link: https://github.com/nltk/nltk/wiki/Porting-your-code-to-NLTK-3.0
NLTK 3.0 contains a number of interface changes. These are being incorporated into a new version of the NLTK book, updated for Python 3 and NLTK 3.
The way NLTK works with unicode has changed: NLTK 3 requires all text input to be unicode and always returns text as unicode. Previously, some functions and classes worked on unicode and others required encoded bytestrings. Make sure you pass unicode to NLTK and expect unicode output from it; existing code that assumes bytestrings may start to fail.
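One way to meet the unicode requirement is to decode bytes once, at the edge of your program, and pass only text onward. The following is a minimal sketch under that assumption; the helper name to_text and the UTF-8 default are illustrative and not part of NLTK.

```python
def to_text(value, encoding="utf-8"):
    """Return value as a text string, decoding bytes if necessary.

    Hypothetical helper (not part of NLTK). The UTF-8 default is an
    assumption; pass the real encoding of your data if it differs.
    """
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


raw = b"caf\xc3\xa9 au lait"  # bytes, e.g. read from a file opened in binary mode
text = to_text(raw)            # text, safe to hand to NLTK 3
tokens = text.split()          # stand-in for an NLTK tokenizer call
```

Text that is already decoded passes through unchanged, so the helper can be applied to every input without checking its type first.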
Here are some changes you may need to make:
- grammar: ContextFreeGrammar → CFG, WeightedGrammar → PCFG, StatisticalDependencyGrammar → ProbabilisticDependencyGrammar, WeightedProduction → ProbabilisticProduction
- draw.tree: TreeSegmentWidget.node() → TreeSegmentWidget.label(), TreeSegmentWidget.set_node() → TreeSegmentWidget.set_label()
- parsers: nbest_parse() → parse()
- ccg.parse.chart: EdgeI.next() → EdgeI.nextsym()
- Chunk parser: top_node → root_label; chunk_node → chunk_label
- WordNet properties are now access methods, e.g. Synset.definition → Synset.definition()
- sem.relextract: mk_pairs() → _tree2semi_rel(), mk_reldicts() → semi_rel2reldict(), show_clause() → clause(), show_raw_rtuple() → rtuple()
- corpusname.tagged_words(simplify_tags=True) → corpusname.tagged_words(tagset='universal')
- util.clean_html() → BeautifulSoup.get_text(). clean_html() is now dropped; install and use BeautifulSoup or some other HTML parser instead.
- util.ibigrams() → util.bigrams()
- util.ingrams() → util.ngrams()
- util.itrigrams() → util.trigrams()
- metrics.windowdiff → metrics.segmentation.windowdiff(); metrics.windowdiff.demo() was removed.
- parse.generate2 was re-written and merged into parse.generate
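For the clean_html() item above, BeautifulSoup is the suggested replacement. If you would rather not add a dependency, the standard library's html.parser can serve as the "other HTML parser". The sketch below takes that route; the class and function names are hypothetical, and it does not try to reproduce the exact output of the old clean_html().

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect text nodes, skipping the content of <script> and <style>."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self._chunks.append(data)

    def text(self):
        # Collapse runs of whitespace so the result is easy to tokenize.
        return " ".join("".join(self._chunks).split())


def html_to_text(html):
    """Hypothetical stand-in for the removed util.clean_html()."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


page = "<html><body><p>Hello <b>NLTK</b> 3</p><script>var x = 1;</script></body></html>"
plain = html_to_text(page)
```

The resulting string can then be passed to an NLTK tokenizer in the usual way.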
Creating objects from strings:
- Many objects now support a fromstring() method:
  - tree.Tree.parse() → tree.Tree.fromstring()
  - tree.Tree() → tree.Tree.fromstring()
  - chunk.RegexpChunkRule.parse() → chunk.RegexpChunkRule.fromstring()
  - grammar.parse_cfg() → CFG.fromstring() (same for other types of grammar)
  - sem.LogicParser.parse() → sem.Expression.fromstring()
  - sem.DrtParser.parse() → sem.DrtExpression.fromstring()
  - sem.parse_valuation() → sem.Valuation.fromstring()
  - sem.parse_type() → sem.Type.fromstring()
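As an illustration of the fromstring() constructors together with the nbest_parse() → parse() rename, here is a short sketch. The toy grammar and sentence are made up for this example, and it assumes NLTK 3 is installed; no corpus data is needed.

```python
from nltk.grammar import CFG
from nltk.parse import ChartParser
from nltk.tree import Tree

# Toy grammar invented for this example.
grammar = CFG.fromstring("""
S -> NP VP
NP -> 'I' | Det N
VP -> V NP
Det -> 'the'
N -> 'dog'
V -> 'saw'
""")

parser = ChartParser(grammar)

# parse() yields trees lazily, so wrap it in list() if you need them all.
trees = list(parser.parse("I saw the dog".split()))

# Trees are now built from bracketed strings with Tree.fromstring().
expected = Tree.fromstring("(S (NP I) (VP (V saw) (NP (Det the) (N dog))))")
```

Because parse() returns an iterator rather than a list, code that indexed the result of nbest_parse() directly needs the list() call or a loop.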
Operations on lists of sentences or other items:
- tokenize.batch_tokenize() → tokenize.tokenize_sents()
- tag.batch_tag() → tag.tag_sents()
- parse.batch_parse() → parse.parse_sents()
- classify.batch_classify() → classify.classify_many()
- sem.batch_interpret() → sem.interpret_sents()
- sem.batch_evaluate() → sem.evaluate_sents()
- chunk.batch_ne_chunk() → chunk.ne_chunk_sents()
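If your code has to run against both NLTK 2 and NLTK 3 during a migration, one option is a small shim that prefers the new method name and falls back to the old one. The sketch below is only an illustration: call_sents and the two tagger classes are hypothetical stand-ins, not NLTK classes.

```python
def call_sents(obj, new_name, old_name, sentences):
    """Call obj.<new_name>(sentences), falling back to obj.<old_name>.

    Hypothetical shim for code that must support both NLTK 2 and NLTK 3.
    """
    method = getattr(obj, new_name, None)
    if method is None:
        method = getattr(obj, old_name)
    return method(sentences)


class LegacyTagger:
    """Stand-in for an NLTK 2 style tagger that only has batch_tag()."""

    def batch_tag(self, sentences):
        return [[(word, "X") for word in sent] for sent in sentences]


class ModernTagger:
    """Stand-in for an NLTK 3 style tagger that only has tag_sents()."""

    def tag_sents(self, sentences):
        return [[(word, "X") for word in sent] for sent in sentences]


sents = [["a", "b"], ["c"]]
old_result = call_sents(LegacyTagger(), "tag_sents", "batch_tag", sents)
new_result = call_sents(ModernTagger(), "tag_sents", "batch_tag", sents)
```

Once the NLTK 2 code path is no longer needed, the shim can be dropped and the new method called directly.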
Changes in probability.FreqDist:
- fdist.keys() → sorted(fdist)
- fdist.inc(x) → fdist[x] += 1
- fdist.samples() → sorted(fdist.keys())
- fdist.Nr(r) → fdist.Nr()[r]
- fdist.Nr_nonzero() → fdist.Nr().items()
- cfdist.conditions() → sorted(cfdist.conditions())
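The counting idioms above can be sketched as follows, assuming NLTK 3 is installed. The token list is made up for the example. In NLTK 3, FreqDist is built on collections.Counter, which is why most_common() is available.

```python
from nltk.probability import FreqDist

tokens = "the cat sat on the mat the end".split()

fdist = FreqDist()
for token in tokens:
    fdist[token] += 1        # replaces fdist.inc(token)

samples = sorted(fdist)      # replaces fdist.keys() / fdist.samples()
top = fdist.most_common(1)   # frequency order, inherited from Counter
total = fdist.N()            # total number of tokens counted
```

Note that sorted(fdist) orders samples alphabetically; use most_common() when you need them ordered by frequency.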
Porter stemmer changes:
- adjust_case(), cons(), cvc(), doublec(), m(), step1ab(), step1c(), step2(), step3(), step4(), step5(), vowelinstem() made private
- ends(), r(), setto() removed
Removed modules, classes and functions:
- classify.svm was removed. For classification based on support vector machines (SVMs), use classify.scikitlearn or scikit-learn directly. See https://github.com/nltk/nltk/issues/450.
- probability.GoodTuringProbDist class was removed. See https://github.com/nltk/nltk/issues/381.
- HiddenMarkovModelTaggerTransformI and its subclasses were removed. See https://github.com/nltk/nltk/issues/374.
- classify.maxent no longer supports algorithms backed by scipy.maxentropy. See https://github.com/nltk/nltk/issues/321.
- misc.babelfish was removed. See https://github.com/nltk/nltk/issues/265.
- sourcedstring was removed. See https://github.com/nltk/nltk/issues/322.
- yamltags was removed. JSON is now preferred instead. See https://github.com/nltk/nltk/issues/540
- mallet was removed, including the tag.crf module. See https://github.com/nltk/nltk/issues/104
- tag.simplify was removed. See https://github.com/nltk/nltk/issues/483
- model was removed. See https://github.com/nltk/nltk/issues?labels=model
- corpus.reader.wordnet._lcs_by_depth was removed. See https://github.com/nltk/nltk/issues/422.
Miscellaneous changes:
- probability.ConditionalProbDist.default_factory now inherits from dict instead of defaultdict
- probability.ConditionalProbDistI.default_factory now inherits from dict instead of defaultdict
- probability.DictionaryConditionalProbDist.default_factory now inherits from dict instead of defaultdict
Environment variables for third-party software:
- These have been normalised; please see Installing Third Party Software
More background on Python 3 and NLTK 3:
- http://docs.python.org/2/library/2to3.html
- http://docs.python.org/dev/whatsnew/3.0.html
- http://nltk.org/dev/python3porting.html