New in Duke 1.2 (Feb 18, 2014)
- New features:
  - Added longest common substring comparator.
  - LuceneDatabase now uses fuzzy search by default (which is much slower).
  - New default Record implementation, faster and uses less memory.
  - Support for changing CSV value separator.
  - Databases are now pluggable.
  - Improved inference of links in LinkDatabase.
- API changes:
  - The ModifiableRecord interface was added.
  - Two methods have been added to the Database interface.
  - The DatabaseProperties class has been removed.
  - All Database implementations have moved into the duke.databases package.
  - The Configuration.createDatabase method is replaced by Configuration.getDatabase.
  - The Link and LinkFileWriter interfaces have changed, and now require a confidence value.
  - The TestFileUtils class is deprecated, and will be removed in the next release.
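
To illustrate the idea behind the longest common substring comparator listed under the new features, here is a minimal standalone sketch. It is not Duke's implementation: the class name `LcsSimilaritySketch`, its method names, and the choice to normalize by the length of the shorter string are all assumptions made for this example.

```java
public class LcsSimilaritySketch {
    // Length of the longest contiguous substring shared by a and b,
    // computed with a rolling dynamic-programming table.
    static int longestCommonSubstring(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int best = 0;
        for (int i = 1; i <= a.length(); i++) {
            int[] curr = new int[b.length() + 1];
            for (int j = 1; j <= b.length(); j++) {
                if (a.charAt(i - 1) == b.charAt(j - 1)) {
                    // Extend the common run that ended at (i-1, j-1).
                    curr[j] = prev[j - 1] + 1;
                    best = Math.max(best, curr[j]);
                }
            }
            prev = curr;
        }
        return best;
    }

    // Similarity in [0, 1]. Dividing by the shorter length is an
    // assumption of this sketch, not a documented Duke formula.
    static double similarity(String a, String b) {
        if (a.equals(b)) return 1.0;
        int shortest = Math.min(a.length(), b.length());
        if (shortest == 0) return 0.0;
        return (double) longestCommonSubstring(a, b) / shortest;
    }

    public static void main(String[] args) {
        System.out.println(similarity("oslo kommune", "kommune oslo"));
    }
}
```

A measure like this tolerates reordered or embedded fragments better than an edit distance does, which is why it can be useful for names and addresses.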
New in Duke 1.1 (Oct 22, 2013)
- The main new feature in this version is the GeneticAlgorithm, which can be used to automatically tune a configuration.
- Other new features:
  - Support for geosearch.
- API changes:
  - Moved Column class to .datasources.
  - Property is now an interface, and no longer a class.
  - Added setDoInference to InMemoryLinkDatabase.
  - Added ConfigWriter, which can export configurations to XML.
  - Added LinkFileWriter.
  - Added Processor.setProfiling and Processor.getProfiler to allow performance profiling via API.
  - Added Processor.removeMatchListener.
  - Added Configuration.copy() and Property.copy().
- Changes to command-line tools:
  - Added the RecordSearch tool.
  - Added --reindex option to DebugCompare.
  - Added --lookups option to Duke.
- Bug fixes:
  - SPARQL data source was broken. Now fixed, and test cases added.
  - Issue 124: --interactive fails in Eclipse.
  - Issue 114: --noreindex causes matches to disappear.
New in Duke 1.0 (Mar 2, 2013)
- Performance improvements:
  - Support for multi-threading added.
  - Using NIOFSDirectory on all platforms except Windows.
  - New in-memory backend, faster than Lucene (experimental).
- Changes to Comparators:
  - Geo-coordinate comparator added.
  - Q-grams comparator added.
  - Levenshtein implementation is now faster.
  - Weighted Levenshtein weight estimator now knows position in string (issue 81).
- Changes to Cleaners:
  - Added PhoneNumberCleaner.
  - Extended and generalized regexp cleaner.
  - Removed sub-cleaner concept, added support for multiple cleaners.
- Other improvements:
  - Implemented user control over lookup props.
  - Upgraded to Lucene 4.0.
  - Added MatchListener.startProcessing() callback.
  - Removed some MatchListener callback methods (weren't thread-safe).
  - InMemoryLinkDatabase now complete and tested.
  - LinkDatabaseMatchListener bug fixes.
  - Better validation of configurations.
  - JDBCEquivalenceClassDatabase added.
  - RDBMSLinkDatabase performance improvement.
- Changes to command-line client:
  - Added data debug mode.
  - Fixed bug with reusing link file as test file.
  - Added pretty-printing of records.
  - Better interactive debugging behaviour.
  - Improvements to DebugCompare tool.
  - Added performance profiling to command-line client.
- Bugs fixed:
  - Issue 83: Look up record by ID when ID is a URI.
  - Issue 90: Bug in command-line option parser.
  - Bug in CSV data source fixed.
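
As a rough illustration of what a q-grams comparator measures, the following standalone sketch splits two strings into overlapping substrings of length q and scores their overlap. The class `QGramSketch`, the set-based representation, and the division by the larger gram set are assumptions of this example and may differ from how Duke's comparator works.

```java
import java.util.HashSet;
import java.util.Set;

public class QGramSketch {
    // Collect the distinct substrings of length q; a non-empty string
    // shorter than q contributes itself as its only gram.
    static Set<String> qgrams(String s, int q) {
        Set<String> grams = new HashSet<>();
        if (s.length() < q) {
            if (!s.isEmpty()) grams.add(s);
            return grams;
        }
        for (int i = 0; i + q <= s.length(); i++) {
            grams.add(s.substring(i, i + q));
        }
        return grams;
    }

    // Shared grams divided by the size of the larger gram set
    // (an assumed normalization for this sketch).
    static double similarity(String a, String b, int q) {
        Set<String> ga = qgrams(a, q);
        Set<String> gb = qgrams(b, q);
        if (ga.isEmpty() || gb.isEmpty()) {
            return a.equals(b) ? 1.0 : 0.0;
        }
        Set<String> common = new HashSet<>(ga);
        common.retainAll(gb);
        return (double) common.size() / Math.max(ga.size(), gb.size());
    }

    public static void main(String[] args) {
        System.out.println(similarity("garshol", "garshoel", 2));
    }
}
```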
New in Duke 0.6 (Sep 17, 2012)
- Changes:
  - A change to the calculation of property probabilities when values do not match exactly. This means that you may need to adjust the probabilities and thresholds in your applications.
  - Upgraded to Lucene 3.6.1.
  - Improvements to NorwegianCompanyNameCleaner and NorwegianAddressCleaner.
- New Features:
  - A weighted Levenshtein comparator.
  - A Metaphone comparator.
  - A Jaccard index comparator.
  - A prototype of a comparator using a Norwegian version of Metaphone.
  - A generic value cleaner.
  - Support for setting objects as parameters of other objects.
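
For readers unfamiliar with the Jaccard index named in the feature list above, this standalone sketch computes it over whitespace-separated tokens: the size of the intersection divided by the size of the union. The class `JaccardSketch` and the lowercase, whitespace-based tokenization are assumptions of this example, not a description of Duke's comparator.

```java
import java.util.HashSet;
import java.util.Set;

public class JaccardSketch {
    // Lowercase the value and split it on whitespace into distinct tokens.
    static Set<String> tokens(String s) {
        Set<String> out = new HashSet<>();
        for (String t : s.trim().toLowerCase().split("\\s+")) {
            if (!t.isEmpty()) out.add(t);
        }
        return out;
    }

    // Jaccard index: |A ∩ B| / |A ∪ B|. Two empty values count as equal.
    static double jaccard(String a, String b) {
        Set<String> ta = tokens(a);
        Set<String> tb = tokens(b);
        if (ta.isEmpty() && tb.isEmpty()) return 1.0;
        Set<String> intersection = new HashSet<>(ta);
        intersection.retainAll(tb);
        Set<String> union = new HashSet<>(ta);
        union.addAll(tb);
        return (double) intersection.size() / union.size();
    }

    public static void main(String[] args) {
        System.out.println(jaccard("Acme Oil AS", "acme oil"));
    }
}
```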
New in Duke 0.3 (Sep 12, 2011)
- Refactored API to make it much more user-friendly.
- Documented the API to the same end.
- New comparators: NumericComparator, DiceCoefficientComparator, SoundexComparator
- A new record linkage mode which can be used to link records from different data sets.
- Numerous bug fixes and new test cases.
- Performance improvements in the Levenshtein comparator.
- Default cleaner now strips accents.
- Upgraded to Lucene 3.3.0.
- Version stamping in manifest file, API, and command-line client.
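
Since the Levenshtein comparator comes up in several of the releases above, a minimal sketch of the underlying edit distance may help. It uses the standard two-row dynamic-programming formulation; the class `LevenshteinSketch` and the conversion to a 0..1 similarity by dividing by the longer string's length are assumptions of this example, and Duke's optimized implementation may differ.

```java
public class LevenshteinSketch {
    // Minimum number of single-character insertions, deletions, and
    // substitutions needed to turn a into b.
    static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // transforming "" into b[0..j) takes j insertions
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // transforming a[0..i) into "" takes i deletions
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Similarity in [0, 1]; the normalization is an assumption of this sketch.
    static double similarity(String a, String b) {
        int longest = Math.max(a.length(), b.length());
        if (longest == 0) return 1.0;
        return 1.0 - (double) distance(a, b) / longest;
    }

    public static void main(String[] args) {
        System.out.println(distance("kitten", "sitting"));
    }
}
```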