MALLET Changelog

What's new in MALLET 2.0.7

Aug 3, 2012
  • Fixed a bug in the Generalized Expectation (GE) implementation for MaxEnt models. The old code could give low accuracy when using a small number of constraints. See the note at the top of this page for more information: http://mallet.cs.umass.edu/ge-classification.php
  • Fixed a bug in SVMLight2Vectors that could result in different Alphabets when importing multiple files at once.
  • Fixed a bug in SVMLight2Classify that allowed previously unobserved features to be added to the data Alphabet, possibly resulting in mismatching Classifier and InstanceList Alphabets.
  • Fixed bugs in the search direction computation in ConjugateGradient.
  • Added support for cross-validation in Vectors2Classify (in addition to random subsamples of the data set).
  • Added support for importing SVMLight data with Alphabets for which growth is stopped.
  • Added new options to Optimizers: it is now possible to set the convergence tolerance for GradientAscent, and set the LineOptimizer for LimitedMemoryBFGS, among others.
  • The GE implementation for MaxEnt models is more efficient, has support for multiple types of constraints, and support for implementing new constraints. More information: http://mallet.cs.umass.edu/ge-classification.php
  • The GE implementation for CRFs is much more efficient (O(L^2), where L is the number of labels, rather than O(L^3) or O(L^4)), has support for multiple types of constraints, and support for implementing new constraints. There is also now support for training CRFs with GE from the command line. See: http://mallet.cs.umass.edu/semi-sup-fst.php
  • Added preliminary support for Posterior Regularization (PR) training of both MaxEnt models and CRFs. See http://mallet.cs.umass.edu/ge-classification.php and http://mallet.cs.umass.edu/semi-sup-fst.php
  • Modified RankedFeatureVector to improve efficiency (from David North).
  • New topic model wrapper class: cc.mallet.topics.tui.TopicTrainer. This class simplifies training a topic model by focusing solely on standard LDA. Using the same interface for LDA, PAM, hLDA and other models made the command line options unnecessarily complicated and led to confusion over which options are available for which models.
  • We expect this interface to replace the current interface for the "train-topics" command in future versions. For this version, you can access the new trainer with this command:
  • bin/mallet run cc.mallet.topics.tui.TopicTrainer --input ...
  • Topic diagnostics XML. From the new TopicTrainer, use the --diagnostics-file [filename] command line argument.
  • Ability to restore models from gzipped "state" files. From the new TopicTrainer, use the --input-state [filename] argument. Note that you can manually edit this file. Any token with topic set to -1 will be immediately resampled upon loading.
  • The format for the "doc-topics" output file now prints the "Name" field rather than the "Source" field.
  • Bug fixes in likelihood calculation.
  • Made GRMM compatible with MALLET 2.0. GRMM should now work with this version of MALLET.
  • Made implementations of piecewise training and piecewise pseudolikelihood publicly available.
  • Bug fix to GRMM TableFactor (from John Pate).
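
The SVMLight2Classify fix and the support for Alphabets with growth stopped both rest on the same idea: once a feature dictionary is frozen, a previously unobserved feature should be dropped rather than given a new index, so that the data and the classifier keep agreeing on feature indices. The following is a minimal sketch of that behavior only. GrowableVocabulary and its methods are hypothetical names written for this illustration; this is not MALLET's Alphabet class and does not use the MALLET API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a feature dictionary; NOT MALLET's Alphabet.
final class GrowableVocabulary {
    private final Map<String, Integer> indexOf = new HashMap<>();
    private boolean growthStopped = false;

    // After this call, unseen features are no longer assigned indices.
    void stopGrowth() {
        growthStopped = true;
    }

    // Returns the index of a feature, adding it if growth is still allowed.
    // Returns -1 for an unseen feature once growth has been stopped.
    int lookupIndex(String feature) {
        Integer existing = indexOf.get(feature);
        if (existing != null) {
            return existing;
        }
        if (growthStopped) {
            return -1;
        }
        int next = indexOf.size();
        indexOf.put(feature, next);
        return next;
    }

    int size() {
        return indexOf.size();
    }

    // Maps features to indices, silently dropping features that are
    // unknown to a frozen vocabulary.
    List<Integer> encode(String... features) {
        List<Integer> indices = new ArrayList<>();
        for (String feature : features) {
            int index = lookupIndex(feature);
            if (index >= 0) {
                indices.add(index);
            }
        }
        return indices;
    }

    public static void main(String[] args) {
        GrowableVocabulary vocabulary = new GrowableVocabulary();
        // "Training" data: both features are new and get indices.
        System.out.println(vocabulary.encode("color=red", "size=large"));
        vocabulary.stopGrowth();
        // "Test" data: the unseen feature is dropped, the size is unchanged.
        System.out.println(vocabulary.encode("size=large", "shape=round"));
        System.out.println(vocabulary.size());
    }
}
```

In this sketch the vocabulary still holds two entries after the second call, which is the property the bug fix is meant to guarantee for the real data Alphabet.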

New in MALLET 2.0.5 (Apr 8, 2010)

  • Major updates:
  • Better Windows support. In addition to the Linux/Mac "bin/mallet" script, there is now a functionally identical "bin/mallet.bat" script. Windows support is still limited, but with this batch file you no longer need to install Cygwin to run MALLET from the command line.
  • Configuration files. All "bin/mallet" commands now take an optional configuration file, which allows you to specify command line parameters.
  • Note that configuration files do not support more than one instance of the same parameter (for example specifying multiple classifier trainers) -- to do this you will need to use true command line parameters.
  • A new class cc.mallet.util.MVNormal implements several utilities for working with multivariate normal distributions and symmetric positive definite matrices, represented as one-dimensional arrays.
  • Several topic model package enhancements:
  • Support for aligned corpora in multiple languages (the Polylingual Topic Model). Use the option "--language-inputs en.sequences de.sequences fr.sequences ..." to enable this feature. Each language must be imported as a separate serialized instance list, with empty instances inserted so that every list has the same length and aligned instances are at the same position in each list.
  • An initial version of topic held-out likelihood evaluation has been added. Use the option "--evaluator-filename" when training topics and then the "bin/mallet evaluate-topics" command to estimate the probability of new documents.
  • In addition to optimizing the document-topic hyperparameters, you can also optimize a topic-word hyperparameter. This is triggered automatically by "--optimize-interval".
  • Bug fixes to topic training and topic inference.
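
The MVNormal entry above mentions symmetric positive definite matrices stored as one-dimensional arrays. As a rough illustration of what working with such a representation involves, the sketch below computes a Cholesky factor of a matrix held in a flat array. The row-major layout (element (i, j) at index i * n + j), the class name FlatMatrixSketch and the method names are assumptions made for this example; they are not taken from cc.mallet.util.MVNormal, whose actual layout and signatures should be checked in the source.

```java
// Hypothetical illustration; NOT the cc.mallet.util.MVNormal API.
final class FlatMatrixSketch {

    // Reads element (row, col) of an n-by-n matrix stored row-major
    // in a one-dimensional array (layout assumed for this sketch).
    static double get(double[] matrix, int n, int row, int col) {
        return matrix[row * n + col];
    }

    // Returns the lower-triangular Cholesky factor L (flat, row-major)
    // of a symmetric positive definite matrix a, so that a = L * L^T.
    static double[] cholesky(double[] a, int n) {
        if (a.length != n * n) {
            throw new IllegalArgumentException("expected " + (n * n) + " entries");
        }
        double[] lower = new double[n * n];
        for (int i = 0; i < n; i++) {
            for (int j = 0; j <= i; j++) {
                double sum = a[i * n + j];
                for (int k = 0; k < j; k++) {
                    sum -= lower[i * n + k] * lower[j * n + k];
                }
                if (i == j) {
                    if (sum <= 0.0) {
                        throw new IllegalArgumentException("matrix is not positive definite");
                    }
                    lower[i * n + i] = Math.sqrt(sum);
                } else {
                    lower[i * n + j] = sum / lower[j * n + j];
                }
            }
        }
        return lower;
    }

    public static void main(String[] args) {
        // The 2-by-2 matrix [[4, 2], [2, 3]] in flat row-major form.
        double[] covariance = {4.0, 2.0, 2.0, 3.0};
        double[] factor = cholesky(covariance, 2);
        System.out.println(java.util.Arrays.toString(factor));
    }
}
```

A flat array avoids the per-row object overhead of double[][] in Java, at the cost of doing the index arithmetic by hand.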

New in MALLET 2.0 RC4 (Aug 8, 2009)

  • Major updates:
  • An implementation of generalized expectation criteria training of MaxEnt classifiers and methods for obtaining constraints (see Gregory Druck, Gideon Mann, and Andrew McCallum, "Learning from Labeled Features using Generalized Expectation Criteria").
  • PagedInstanceList has been substantially rewritten by Mike Bond.
  • Bug fixes to topic model hyperparameter optimization and topic inference.