NLTK Changelog

New in NLTK 2.0.1 rc1 (Apr 11, 2011)

added interface to the Stanford POS Tagger
updates to sem.Boxer, sem.drt.DRS
allow unicode strings in grammars
allow non-string features in classifiers
modifications to HunposTagger
fixed issues with DRS printing
fixed bigram collocation finder for window_size > 2
doctest paths no longer presume unix-style pathname separators
fixed issue with NLTK's tokenize module colliding with the Python tokenize module
fixed issue with stemming Unicode strings
changed ViterbiParser.nbest_parse to parse
added ChaSen and KNBC Japanese corpus readers
preserve case in concordance display
fixed bug in simplification of Brown tags
added a version of IBM Model 1 as described in Koehn 2010
new class AlignedSent for aligned sentence data and evaluation metrics
new nltk.util.set_proxy to allow easy configuration of HTTP proxy
improvements to downloader user interface to catch URL and HTTP errors
added CHILDES corpus reader
created special exception hierarchy for Prover9 errors
significant changes to the underlying code of the boxer interface
path-based wordnet similarity metrics use a fake root node for verbs, following the Perl version
added ability to handle multi-sentence discourses in Boxer
added the 'english' Snowball stemmer
simplifications and corrections of Earley Chart Parser rules
several changes to the feature chart parsers for correct unification
bugfixes: FreqDist.plot, FreqDist.max, NgramModel.entropy, CategorizedCorpusReader, DecisionTreeClassifier
removal of Python >2.4 language features for 2.4 compatibility
removal of deprecated functions and associated warnings
added semantic domains to wordnet corpus reader
changed wordnet similarity functions to include instance hyponyms
updated to use latest version of Boxer
Data:
JEITA Public Morphologically Tagged Corpus (in ChaSen format)
KNB Annotated corpus of Japanese blog posts
Fixed some minor bugs in alvey.fcfg and added the number of parse trees to alvey_sentences.txt
added more comtrans data
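
The collocation finder fix listed above concerns bigram counting with a window wider than two tokens. As a rough illustration of what a window means here, the sketch below counts every ordered pair of tokens that fall within the same window. It is not NLTK's implementation: the function name windowed_bigrams and the counting scheme are assumptions made for this example, and it uses only the standard library.

```python
from collections import Counter


def windowed_bigrams(tokens, window_size=2):
    """Count ordered token pairs that occur within window_size tokens.

    Hypothetical helper for illustration only. With window_size=2 this
    counts adjacent pairs; larger windows also pair each token with
    tokens further to its right.
    """
    if window_size < 2:
        raise ValueError("window_size must be at least 2")
    counts = Counter()
    for i, left in enumerate(tokens):
        # Tokens at positions i+1 .. i+window_size-1 share a window with tokens[i].
        for right in tokens[i + 1:i + window_size]:
            counts[(left, right)] += 1
    return counts


tokens = ["a", "b", "c", "d"]
adjacent = windowed_bigrams(tokens, window_size=2)
wider = windowed_bigrams(tokens, window_size=3)
```

With window_size=3 the result also contains the skip pairs ("a", "c") and ("b", "d"), which is the kind of case the fixed code path handles.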

New in NLTK 2.0 Beta 9 (Jul 26, 2010)

New in NLTK 2.0 Beta 8 (Mar 11, 2010)

New in NLTK 2.0 Beta 6 (Sep 25, 2009)

New in NLTK 2.0 Beta 5 (Jul 20, 2009)

New in NLTK 0.9.9 (May 30, 2009)

New in NLTK 0.9.9 Beta 1 (Mar 16, 2009)

New in NLTK 0.9.8 (Feb 18, 2009)

New in NLTK 0.9.7 (Dec 19, 2008)

New in NLTK 0.9.6 (Dec 9, 2008)

NLTK:
new WordNet corpus reader (contributed by Steven Bethard)
incorporated dependency parsers into NLTK (previously in NLTK-Contrib) (contributed by Jason Narad)
moved nltk/cfg.py to nltk/grammar.py and incorporated dependency grammars
improved efficiency of unification algorithm
various enhancements to the semantics package
added plot() and tabulate() methods to FreqDist and ConditionalFreqDist
FreqDist.keys() and list(FreqDist) provide keys reverse-sorted by value, to avoid the confusion caused by FreqDist.sorted()
new downloader module to support interactive data download via nltk.download(); it can also be run from the command line using "python -m nltk.downloader all"
fixed WordNet bug that caused min_depth() to sometimes give an incorrect result
added nltk.util.Index as a wrapper around defaultdict(list) plus a functional-style initializer
fixed bug in Earley chart parser that caused it to break
added basic TnT tagger nltk.tag.tnt
new corpus reader for CoNLL dependency format (contributed by Kepa Sarasola and Iker Manterola)
misc other bugfixes
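
The nltk.util.Index entry above describes a wrapper around defaultdict(list) with a functional-style initializer. A minimal sketch of that idea follows, assuming the initializer takes an iterable of (key, value) pairs. The class below is a stand-in written for this example in current Python, not a copy of the NLTK source, and the word list is made up.

```python
from collections import defaultdict


class Index(defaultdict):
    """Illustrative stand-in: a defaultdict(list) filled from (key, value) pairs."""

    def __init__(self, pairs=()):
        super().__init__(list)
        # Append each value to the list stored under its key.
        for key, value in pairs:
            self[key].append(value)


# Group words by their first letter in a single expression.
words = ["apple", "avocado", "banana", "blueberry", "cherry"]
by_initial = Index((word[0], word) for word in words)
```

Building the index from a generator expression keeps the grouping logic in one place, and looking up a key that was never seen returns an empty list instead of raising KeyError.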
Contrib (work in progress):
TIGERSearch implementation by Torsten Marek
extensions to hole and glue semantics modules by Dan Garrette
new coreference package by Joseph Frazee
MapReduce interface by Xinfan Meng
Data:
Corpora are stored in compressed format if this will not compromise speed of access
Swadesh Corpus of comparative wordlists in 23 languages
Split grammar collection into separate packages
New Basque and Spanish grammar samples (contributed by Kepa Sarasola and Iker Manterola)
Brown Corpus sections now have meaningful names (e.g. 'a' is now 'news')
Fixed bug that forced users to manually unzip the WordNet corpus
New dependency-parsed version of Treebank corpus sample
Added movie script "Monty Python and the Holy Grail" to webtext corpus
Replaced words corpus data with a much larger list of English words
New URL for the list of available NLTK corpora: http://nltk.googlecode.com/svn/trunk/nltk_data/index.xml
Book:
complete rewrite of first three chapters to make the book accessible to a wider audience
new chapter on data-intensive language processing
extensive reworking of most chapters
Dropped subsection numbering; moved exercises to end of chapters
Distributions:
created Portfile to support Mac installation