Augustus Changelog

What's new in Augustus 0.5.2.0 (Apr 6, 2012)

  • Naive-Bayes Model
  • Regression Model
  • Tree Parametrization
  • Cluster Model improvements
  • Custom Processing Interface

New in Augustus 0.5.1.0 (Apr 5, 2012)

Some of the new features in this release are:
  • PMML:
      • All PMML is now represented using Augustus's own XML DOM code.
      • PMML models are automatically validated against XSD snippets, including validation while reading. This dramatically simplifies the algorithm code: algorithms can assume that the user-supplied model is valid and do not need to handle PMML structures such as choice blocks. Validation is done while reading for performance.
      • Augustus accepts and requires completely valid PMML. Any changes or additions must be implemented as extensions (with the prefix “X-ODG-”).
  • Configuration files now have an XSD specification as well and use the same mechanism for reading and validating.
  • Data streams are generalized into a common interface: the producer and consumer code is unaware of whether the data came from a file, a pipe, an HTTP server, JSON-based messaging, etc.
  • Data input and model output run in separate threads from the main processor: data can accumulate and wait while the main processor is busy, and the main processor can keep working while a model’s instantaneous state is written to a PMML file. Routine and emergency PMML output (e.g. every N events, or upon encountering an exception) is also new.
  • Scores (consumer output) are now extensible structures, rather than 4-tuples. Models can output more than a single numerical value.
  • More consistent and ubiquitous log file output, including a metadata log file whose priority can be set independently.
  • Segmentation:
      • Segmentation, transformations, and event weighting are handled independently of the algorithms. The algorithm code is unaware of whether it is the full model or only one segment, whether the fields are computed on the fly or were in the data stream, and which event-weighting algorithm is being used to produce counts, sums, means, variances, etc.
      • Segments are now automatically created as new data are encountered.
      • New segments cycle through “immature” (don’t score) → “mature” (score) → “locked” (optional: never update again) states to avoid scoring with a model created from too few data.
  • Dependencies on unnecessary external packages are being systematically removed.
  • Augustus’s new structure is designed to be accessible to end-users as a Python library, not only as a command line tool.
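
The segment life cycle described above (segments created automatically, then moving from “immature” to “mature” to an optional “locked” state) can be illustrated with a minimal Python sketch. This is not Augustus code: the names Maturity, Segment, SegmentTable, mature_after, and lock_after are hypothetical, the thresholds are assumed to be simple event counts, and a running mean stands in for a real model.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Hashable, Optional


class Maturity(Enum):
    IMMATURE = "immature"  # still learning; do not score
    MATURE = "mature"      # score and keep updating
    LOCKED = "locked"      # score, but never update again


@dataclass
class Segment:
    """Hypothetical per-segment state; a running mean stands in for a model."""
    count: int = 0
    mean: float = 0.0
    state: Maturity = Maturity.IMMATURE


class SegmentTable:
    """Hypothetical container that creates segments on demand."""

    def __init__(self, mature_after: int, lock_after: Optional[int] = None):
        self.mature_after = mature_after  # events needed before scoring starts
        self.lock_after = lock_after      # events after which updates stop (optional)
        self.segments: Dict[Hashable, Segment] = {}

    def process(self, key: Hashable, value: float) -> Optional[float]:
        """Score `value` against its segment, then update the segment.

        Returns None while the segment is immature.
        """
        # A segment is created the first time its key is encountered.
        seg = self.segments.setdefault(key, Segment())

        # Score against the model as it stood before this event.
        score = None if seg.state is Maturity.IMMATURE else value - seg.mean

        if seg.state is not Maturity.LOCKED:
            seg.count += 1
            seg.mean += (value - seg.mean) / seg.count  # incremental mean
            if self.lock_after is not None and seg.count >= self.lock_after:
                seg.state = Maturity.LOCKED
            elif seg.count >= self.mature_after:
                seg.state = Maturity.MATURE
        return score


# Usage: the first three events for segment "a" only train the segment.
table = SegmentTable(mature_after=3, lock_after=5)
scores = [table.process("a", v) for v in [10.0, 12.0, 14.0, 15.0, 20.0, 30.0]]
```

In this sketch the scoring logic never inspects which segment it belongs to; it only sees the per-segment state handed to it, which mirrors the separation between segmentation and algorithms described in the list above.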