Porting your code to NLTK 3.0


Original link: https://github.com/nltk/nltk/wiki/Porting-your-code-to-NLTK-3.0

NLTK 3.0 contains a number of interface changes. These are being incorporated into a new version of the NLTK book, updated for Python 3 and NLTK 3.

The way NLTK works with unicode has changed: NLTK 3 requires all text input to be unicode and always returns text as unicode. Previously, some functions and classes worked on unicode while others required encoded bytestrings. Make sure you pass unicode to NLTK and expect unicode output from it; existing code that assumes bytestrings may start to fail.
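As an illustration of the unicode requirement, here is a minimal sketch of decoding input at the boundary before handing it to NLTK. The helper name to_text and the UTF-8 default are assumptions made for this example; they are not part of NLTK.

```python
def to_text(value, encoding="utf-8"):
    """Return value as a unicode string, decoding bytes if needed.

    Hypothetical helper for illustration; the encoding default is an assumption.
    """
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


# Bytes as they might arrive from a file opened in binary mode or a socket.
raw = "café au lait".encode("utf-8")

# Decode once, then work with unicode everywhere else.
text = to_text(raw)
tokens = text.split()
```

Decoding once where the data enters the program keeps every later NLTK call on unicode, which is the behaviour described above.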

Here are some changes you may need to make:

grammar: ContextFreeGrammar → CFG, WeightedGrammar → PCFG, StatisticalDependencyGrammar → ProbabilisticDependencyGrammar, WeightedProduction → ProbabilisticProduction
draw.tree: TreeSegmentWidget.node() → TreeSegmentWidget.label(), TreeSegmentWidget.set_node() → TreeSegmentWidget.set_label()
parsers: nbest_parse() → parse()
ccg.parse.chart: EdgeI.next() → EdgeI.nextsym()
Chunk parser: top_node → root_label; chunk_node → chunk_label
WordNet properties are now access methods, e.g. Synset.definition → Synset.definition()
sem.relextract: mk_pairs() → _tree2semi_rel(), mk_reldicts() → semi_rel2reldict(), show_clause() → clause(), show_raw_rtuple() → rtuple()
corpusname.tagged_words(simplify_tags=True) → corpusname.tagged_words(tagset='universal')
util.clean_html() → BeautifulSoup.get_text(). clean_html() has been dropped; install and use BeautifulSoup or some other HTML parser instead.
util.ibigrams() → util.bigrams()
util.ingrams() → util.ngrams()
util.itrigrams() → util.trigrams()
metrics.windowdiff → metrics.segmentation.windowdiff(); metrics.windowdiff.demo() was removed.
parse.generate2 was re-written and merged into parse.generate
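For the clean_html() entry in the list above, the named replacement is BeautifulSoup.get_text(). For readers who would rather avoid a third-party dependency, the following is a rough sketch of the "some other HTML parser" route using the standard library's html.parser. The class TextExtractor and function html_to_text are hypothetical names for this example, and the result is not guaranteed to match what clean_html() used to return.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect text nodes, skipping the contents of <script> and <style>."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self._chunks.append(data)

    def text(self):
        # Join the pieces and collapse runs of whitespace into single spaces.
        return " ".join(" ".join(self._chunks).split())


def html_to_text(html):
    """Return the visible text of an HTML string (illustrative helper)."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


sample = (
    "<html><head><style>p {color: red}</style></head>"
    "<body><p>Hello <b>NLTK</b> 3</p></body></html>"
)
plain = html_to_text(sample)
```

Because the pieces are joined with spaces, a word split by inline markup would come out as two tokens; a real HTML library handles such cases more carefully.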

Creating objects from strings:

Many objects now support a fromstring() method
tree.Tree.parse() → tree.Tree.fromstring()
tree.Tree() → tree.Tree.fromstring()
chunk.RegexpChunkRule.parse() → chunk.RegexpChunkRule.fromstring()
grammar.parse_cfg() → CFG.fromstring() (same for other types of grammar)
sem.LogicParser.parse() → sem.Expression.fromstring()
sem.DrtParser.parse() → sem.DrtExpression.fromstring()
sem.parse_valuation() → sem.Valuation.fromstring()
sem.parse_type() → sem.Type.fromstring()
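A short sketch of the fromstring() style for trees and grammars, combined with the nbest_parse() → parse() rename. It assumes NLTK 3 is installed; the toy grammar and sentence are made up for this example.

```python
from nltk import CFG, Tree
from nltk.parse import ChartParser

# Formerly Tree.parse(...) or Tree(...) with a bracketed string.
tree = Tree.fromstring("(S (NP I) (VP (V eat) (NP fish)))")

# Formerly grammar.parse_cfg(...).
grammar = CFG.fromstring("""
    S -> NP VP
    NP -> 'I' | 'fish'
    VP -> V NP
    V -> 'eat'
""")

# Formerly parser.nbest_parse(tokens); parse() yields trees lazily,
# so wrap it in list() if the old code indexed into the result.
parser = ChartParser(grammar)
parses = list(parser.parse(["I", "eat", "fish"]))
```

Code that previously read tree.node would use tree.label() here, in line with the node → label renames.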

Operations on lists of sentences or other items:

tokenize.batch_tokenize() → tokenize.tokenize_sents()
tag.batch_tag() → tag.tag_sents()
parse.batch_parse() → parse.parse_sents()
classify.batch_classify() → classify.classify_many()
sem.batch_interpret() → sem.interpret_sents()
sem.batch_evaluate() → sem.evaluate_sents()
chunk.batch_ne_chunk() → chunk.ne_chunk_sents()
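If a codebase has to run against both the old and new method names during a migration, one possible approach is a small shim that looks up whichever name exists. This is only a sketch: tag_sentences, OldStyleTagger and NewStyleTagger are hypothetical stand-ins written for this example, not NLTK classes.

```python
def tag_sentences(tagger, sentences):
    """Tag tokenized sentences using tag_sents() if present, else batch_tag()."""
    method = getattr(tagger, "tag_sents", None)
    if method is None:
        method = getattr(tagger, "batch_tag")
    return method(sentences)


class OldStyleTagger:
    """Stand-in exposing only the pre-3.0 method name."""

    def batch_tag(self, sentences):
        return [[(word, "X") for word in sent] for sent in sentences]


class NewStyleTagger:
    """Stand-in exposing only the NLTK 3 method name."""

    def tag_sents(self, sentences):
        return [[(word, "X") for word in sent] for sent in sentences]


sents = [["hello", "world"], ["bye"]]
old_result = tag_sentences(OldStyleTagger(), sents)
new_result = tag_sentences(NewStyleTagger(), sents)
```

The same lookup pattern would apply to the other renamed pairs in the list, though once the port is complete it is simpler to call the new names directly.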

Changes in probability.FreqDist:

fdist.keys() → sorted(fdist)
fdist.inc(x) → fdist[x] += 1
fdist.samples() → sorted(fdist.keys())
fdist.Nr(r) → fdist.Nr()[r]
fdist.Nr_nonzero() → fdist.Nr().items()
cfdist.conditions() → sorted(cfdist.conditions())
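The counting idioms above can be tried without NLTK installed, because FreqDist in NLTK 3 is built on collections.Counter. The sketch below uses a plain Counter as a stand-in; substituting nltk.FreqDist is assumed to behave the same for these particular operations. The most_common() call is a suggestion from this example rather than part of the list above.

```python
from collections import Counter

fdist = Counter()
for word in "the cat saw the other cat the".split():
    fdist[word] += 1            # replaces fdist.inc(word)

# Replaces fdist.keys() / fdist.samples(): an alphabetically sorted list.
samples = sorted(fdist)

# If old code relied on samples being ordered by frequency,
# most_common() is one way to get that ordering back.
by_count = [word for word, _ in fdist.most_common(2)]
```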

Porter stemmer changes:

adjust_case(), cons(), cvc(), doublec(), m(), step1ab(), step1c(), step2(), step3(), step4(), step5(), vowelinstem() made private
ends(), r(), setto() removed

Removed modules, classes and functions:

classify.svm was removed. For classification based on support vector machines (SVMs) use classify.scikitlearn or scikit-learn directly. See https://github.com/nltk/nltk/issues/450.
probability.GoodTuringProbDist class was removed. See https://github.com/nltk/nltk/issues/381.
HiddenMarkovModelTaggerTransformI and its subclasses were removed. See https://github.com/nltk/nltk/issues/374.
classify.maxent no longer supports algorithms backed by scipy.maxentropy. See https://github.com/nltk/nltk/issues/321.
misc.babelfish was removed. See https://github.com/nltk/nltk/issues/265.
sourcedstring was removed. See https://github.com/nltk/nltk/issues/322.
yamltags was removed. JSON is now preferred instead. See https://github.com/nltk/nltk/issues/540
mallet was removed, including the tag.crf module. See https://github.com/nltk/nltk/issues/104
tag.simplify was removed. See https://github.com/nltk/nltk/issues/483
model was removed. See https://github.com/nltk/nltk/issues?labels=model
corpus.reader.wordnet._lcs_by_depth was removed. See https://github.com/nltk/nltk/issues/422.

Miscellaneous changes:

probability.ConditionalProbDist.default_factory now inherits from dict instead of defaultdict
probability.ConditionalProbDistI.default_factory now inherits from dict instead of defaultdict
probability.DictionaryConditionalProbDist.default_factory now inherits from dict instead of defaultdict

Environment variables for third-party software:

These have been normalised; please see Installing Third Party Software

More background on Python 3 and NLTK 3:

http://docs.python.org/2/library/2to3.html
http://docs.python.org/dev/whatsnew/3.0.html
http://nltk.org/dev/python3porting.html

0 0