Duke Changelog

New in Duke 1.2 (Feb 18, 2014)

  • New features:
      • Added longest common substring comparator.
      • LuceneDatabase now uses fuzzy search by default (which is much slower).
      • New default Record implementation, faster and uses less memory.
      • Support for changing CSV value separator.
      • Databases are now pluggable.
      • Improved inference of links in LinkDatabase.
  • API changes:
      • The ModifiableRecord interface was added.
      • Two methods have been added to the Database interface.
      • The DatabaseProperties class has been removed.
      • All Database implementations have moved into the duke.databases package.
      • The Configuration.createDatabase method is replaced by Configuration.getDatabase.
      • The Link and LinkFileWriter interfaces have changed, and now require a confidence value.
      • The TestFileUtils class is deprecated, and will be removed in the next release.
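
To illustrate the idea behind a longest common substring comparator, here is a minimal sketch. It is not Duke's implementation: the class name LcsSketch, the method names, and the choice to normalize by the shorter string's length are all assumptions made for this example.

```java
// Hypothetical sketch, not Duke's comparator: scores two strings by the
// length of their longest shared contiguous substring.
final class LcsSketch {

    /** Length of the longest contiguous substring shared by a and b. */
    static int longestCommonSubstring(String a, String b) {
        int best = 0;
        // prev[j] holds the length of the common suffix of a[0..i-1) and b[0..j)
        int[] prev = new int[b.length() + 1];
        for (int i = 1; i <= a.length(); i++) {
            int[] curr = new int[b.length() + 1];
            for (int j = 1; j <= b.length(); j++) {
                if (a.charAt(i - 1) == b.charAt(j - 1)) {
                    curr[j] = prev[j - 1] + 1;
                    best = Math.max(best, curr[j]);
                }
            }
            prev = curr;
        }
        return best;
    }

    /** Similarity in [0, 1]; dividing by the shorter length is an assumption. */
    static double similarity(String a, String b) {
        if (a.isEmpty() || b.isEmpty()) {
            return 0.0;
        }
        int shorter = Math.min(a.length(), b.length());
        return (double) longestCommonSubstring(a, b) / shorter;
    }
}
```

Under these assumptions, LcsSketch.similarity("abcdef", "zcdez") finds the shared substring "cde" and divides its length by 5.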

New in Duke 1.1 (Oct 22, 2013)

  • The main new feature in this version is the GeneticAlgorithm, which can be used to automatically tune a configuration.
  • Other new features:
      • Support for geosearch.
  • API changes:
      • Moved the Column class to the .datasources package.
      • Property is now an interface, and no longer a class.
      • Added setDoInference to InMemoryLinkDatabase.
      • Added ConfigWriter, which can export configurations to XML.
      • Added LinkFileWriter.
      • Added Processor.setProfiling and Processor.getProfiler to allow performance profiling via API.
      • Added Processor.removeMatchListener.
      • Added Configuration.copy() and Property.copy().
  • Changes to command-line tools:
      • Added the RecordSearch tool.
      • Added --reindex option to DebugCompare.
      • Added --lookups option to Duke.
  • Bug fixes:
      • SPARQL data source was broken. Now fixed, and test cases added.
      • Issue 124: --interactive fails in Eclipse.
      • Issue 114: --noreindex causes matches to disappear.

New in Duke 1.0 (Mar 2, 2013)

  • Performance improvements:
      • Support for multi-threading added.
      • Using NIOFSDirectory on all platforms except Windows.
      • New in-memory backend, faster than Lucene (experimental).
  • Changes to Comparators:
      • Geo-coordinate comparator added.
      • Q-grams comparator added.
      • Levenshtein implementation is now faster.
      • Weighted Levenshtein weight estimator now knows position in string (issue 81).
  • Changes to Cleaners:
      • Added PhoneNumberCleaner.
      • Extended and generalized regexp cleaner.
      • Removed sub-cleaner concept, added support for multiple cleaners.
  • Other improvements:
      • Implemented user control over lookup properties.
      • Upgraded to Lucene 4.0.
      • Added MatchListener.startProcessing() callback.
      • Removed some MatchListener callback methods (they weren't thread-safe).
      • InMemoryLinkDatabase now complete and tested.
      • LinkDatabaseMatchListener bug fixes.
      • Better validation of configurations.
      • JDBCEquivalenceClassDatabase added.
      • RDBMSLinkDatabase performance improvement.
  • Changes to command-line client:
      • Added data debug mode.
      • Fixed bug with reusing link file as test file.
      • Added pretty-printing of records.
      • Better interactive debugging behaviour.
      • Improvements to DebugCompare tool.
      • Added performance profiling to command-line client.
  • Bugs fixed:
      • Issue 83: Look up record by ID when ID is a URI.
      • Issue 90: Bug in command-line option parser.
      • Bug in CSV data source fixed.
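
As a rough illustration of what a q-grams comparator measures, the sketch below breaks each string into overlapping substrings of length q and scores the overlap between the two sets. The class name QGramSketch, the set-based representation, and the overlap formula (shared q-grams divided by the smaller set's size) are assumptions for this example and may differ from what Duke does.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch, not Duke's comparator.
final class QGramSketch {

    /** All distinct substrings of length q in s (empty if s is shorter than q). */
    static Set<String> qgrams(String s, int q) {
        Set<String> grams = new HashSet<>();
        for (int i = 0; i + q <= s.length(); i++) {
            grams.add(s.substring(i, i + q));
        }
        return grams;
    }

    /** Overlap coefficient of the two q-gram sets, in [0, 1]. */
    static double similarity(String a, String b, int q) {
        Set<String> ga = qgrams(a, q);
        Set<String> gb = qgrams(b, q);
        if (ga.isEmpty() || gb.isEmpty()) {
            // Too short to produce q-grams: fall back to exact equality.
            return a.equals(b) ? 1.0 : 0.0;
        }
        Set<String> common = new HashSet<>(ga);
        common.retainAll(gb);
        return (double) common.size() / Math.min(ga.size(), gb.size());
    }
}
```

With q = 2, "garshol" and "garshal" each yield six bigrams and share four of them (ga, ar, rs, sh), so this sketch scores them at 4/6.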

New in Duke 0.6 (Sep 17, 2012)

  • Changes:
      • A change to the calculation of property probabilities when values do not match exactly. This means that you may need to adjust the probabilities and thresholds in your applications.
      • Upgraded to Lucene 3.6.1.
      • Improvements to NorwegianCompanyNameCleaner and NorwegianAddressCleaner.
  • New Features:
      • A weighted Levenshtein comparator.
      • A Metaphone comparator.
      • A Jaccard index comparator.
      • A prototype of a comparator using a Norwegian version of Metaphone.
      • A generic value cleaner.
      • Support for setting objects as parameters of other objects.
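
For readers unfamiliar with the Jaccard index, here is a minimal sketch of the measure: the size of the intersection of two sets divided by the size of their union. Treating each value as a set of whitespace-separated tokens, and the class name JaccardSketch, are assumptions for this example rather than a description of Duke's comparator.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch, not Duke's comparator.
final class JaccardSketch {

    /** Splits a value into its distinct whitespace-separated tokens. */
    static Set<String> tokens(String s) {
        String trimmed = s.trim();
        if (trimmed.isEmpty()) {
            return new HashSet<>();
        }
        return new HashSet<>(Arrays.asList(trimmed.split("\\s+")));
    }

    /** Jaccard index |A ∩ B| / |A ∪ B| over the token sets, in [0, 1]. */
    static double jaccard(String a, String b) {
        Set<String> ta = tokens(a);
        Set<String> tb = tokens(b);
        if (ta.isEmpty() && tb.isEmpty()) {
            return 1.0; // two empty values are treated as identical
        }
        Set<String> intersection = new HashSet<>(ta);
        intersection.retainAll(tb);
        Set<String> union = new HashSet<>(ta);
        union.addAll(tb);
        return (double) intersection.size() / union.size();
    }
}
```

For example, "oslo city hall" and "oslo hall" share two tokens out of three distinct tokens overall, giving 2/3.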

New in Duke 0.3 (Sep 12, 2011)

  • Refactored API to make it much more user-friendly.
  • Documented the API to the same end.
  • New comparators: NumericComparator, DiceCoefficientComparator, SoundexComparator.
  • A new record linkage mode which can be used to link records from different data sets.
  • Numerous bug fixes and new test cases.
  • Performance improvements in the Levenshtein comparator.
  • Default cleaner now strips accents.
  • Upgraded to Lucene 3.3.0.
  • Version stamping in manifest file, API, and command-line client.
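
One common way to strip accents in Java, shown here only as a sketch, is to decompose the text with java.text.Normalizer and then drop the combining marks. Whether Duke's default cleaner works this way is not stated above; the class name AccentSketch and this approach are assumptions.

```java
import java.text.Normalizer;

// Hypothetical sketch, not Duke's cleaner.
final class AccentSketch {

    /** Removes combining marks after canonical decomposition (NFD). */
    static String stripAccents(String value) {
        // NFD splits e.g. U+00E9 into 'e' followed by a combining acute accent.
        String decomposed = Normalizer.normalize(value, Normalizer.Form.NFD);
        // \p{M} matches the combining marks left behind by the decomposition.
        return decomposed.replaceAll("\\p{M}+", "");
    }
}
```

Letters that have no canonical decomposition, such as ø, pass through this sketch unchanged, so a real cleaner may need an explicit mapping for them.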