Duke 1.2

A fast deduplication engine

What's new in Duke 1.2:

  • New features:
    • Added a longest common substring comparator.
    • LuceneDatabase now uses fuzzy search by default (this makes searches much slower).
    • New default Record implementation that is faster and uses less memory.
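To illustrate what a longest common substring comparator measures, here is a minimal Java sketch. It is not Duke's comparator: the class name, method names and the normalization (shared substring length divided by the longer string's length) are assumptions chosen for illustration, and Duke's own comparator may compute its score differently.

```java
// Hypothetical sketch of a longest-common-substring similarity score.
// Not Duke's actual comparator; class and method names are illustrative.
class LongestCommonSubstringSketch {

    // Length of the longest contiguous substring shared by a and b,
    // computed with the classic O(n*m) dynamic-programming table.
    static int longestCommonSubstring(String a, String b) {
        int best = 0;
        // prev[j] = length of the common suffix ending at a[i-2] and b[j-1]
        int[] prev = new int[b.length() + 1];
        for (int i = 1; i <= a.length(); i++) {
            int[] cur = new int[b.length() + 1];
            for (int j = 1; j <= b.length(); j++) {
                if (a.charAt(i - 1) == b.charAt(j - 1)) {
                    cur[j] = prev[j - 1] + 1;
                    best = Math.max(best, cur[j]);
                }
            }
            prev = cur;
        }
        return best;
    }

    // Similarity in [0, 1]: LCS length divided by the longer string's length.
    // This normalization is an assumption made for illustration.
    static double similarity(String a, String b) {
        int longer = Math.max(a.length(), b.length());
        if (longer == 0) return 1.0; // two empty strings are treated as identical
        return (double) longestCommonSubstring(a, b) / longer;
    }

    public static void main(String[] args) {
        System.out.println(longestCommonSubstring("garshol", "marshall")); // 4 ("arsh")
        System.out.println(similarity("abc", "abc")); // 1.0
    }
}
```

Unlike edit distance, this score rewards one long shared run of characters, which makes it tolerant of extra prefixes or suffixes around a common core.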
LICENSE TYPE: Apache
FILE SIZE: 5.1 MB
DEVELOPED BY: Lars Marius Garshol
CATEGORY: Developer Tools
Duke reports its findings to a MatchListener. You can write your own MatchListeners or use the ones that come with Duke.
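The listener idea is the classic observer pattern: the engine calls back into your code whenever it decides two records match. The Java sketch below shows a custom listener that collects matches in memory. The interface `MatchListenerSketch`, its `matches` method and the use of `Map<String, String>` as a record are hypothetical stand-ins; Duke's real MatchListener interface has its own method names and uses its own Record type.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical listener interface modeled on the idea of Duke's MatchListener;
// Duke's real interface has different methods and a Record type.
interface MatchListenerSketch {
    // Called when two records are judged to represent the same entity.
    void matches(Map<String, String> r1, Map<String, String> r2, double confidence);
}

// A custom listener that simply collects matches in memory.
class CollectingListener implements MatchListenerSketch {
    final List<String> found = new ArrayList<>();

    @Override
    public void matches(Map<String, String> r1, Map<String, String> r2, double confidence) {
        found.add(r1.get("id") + "=" + r2.get("id") + " (" + confidence + ")");
    }
}

class ListenerDemo {
    public static void main(String[] args) {
        CollectingListener listener = new CollectingListener();
        // The engine would invoke the listener; here we call it by hand.
        listener.matches(Map.of("id", "1"), Map.of("id", "7"), 0.93);
        System.out.println(listener.found); // [1=7 (0.93)]
    }
}
```

Keeping output handling behind an interface lets the same matching run write to a file, a database or an in-memory list without changing the engine.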
Duke is a small, free, easy-to-use, fast and flexible deduplication (entity resolution or record linkage) engine written in Java on top of Lucene.
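Building on a search library means a record does not have to be compared against every other record: a search index can first narrow the field to likely candidates. The plain-Java sketch below is a simplified stand-in for that role, assuming candidates are records sharing at least one lowercase token. It does not use Lucene, and Duke's actual queries (including the fuzzy search mentioned in the changelog) are more sophisticated; `CandidateIndex` and its methods are hypothetical.

```java
import java.util.*;

// Simplified stand-in for the role a search index such as Lucene plays:
// find candidate records that share at least one token with the query,
// so only those need detailed comparison. Names are illustrative.
class CandidateIndex {
    // token -> ids of records containing that token
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    // Index record `id` under each lowercase whitespace-separated token of `text`.
    void add(int id, String text) {
        for (String token : text.toLowerCase().split("\\s+")) {
            if (token.isEmpty()) continue;
            postings.computeIfAbsent(token, t -> new TreeSet<>()).add(id);
        }
    }

    // Return ids of all records sharing at least one token with `text`.
    Set<Integer> candidates(String text) {
        Set<Integer> result = new TreeSet<>();
        for (String token : text.toLowerCase().split("\\s+")) {
            result.addAll(postings.getOrDefault(token, Collections.emptySet()));
        }
        return result;
    }
}

class CandidateDemo {
    public static void main(String[] args) {
        CandidateIndex index = new CandidateIndex();
        index.add(1, "Lars Marius Garshol");
        index.add(2, "Lars Garshol");
        index.add(3, "Ada Lovelace");
        System.out.println(index.candidates("L. M. Garshol")); // [1, 2]
    }
}
```

Only records 1 and 2 would then go on to the more expensive property-by-property comparison; record 3 is never examined.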

At the moment Duke can process 1,000,000 records in 11 minutes on a standard laptop in a single thread.
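Put as a rate, the quoted benchmark works out as follows (11 minutes = 660 s):

```latex
\[
\frac{1{,}000{,}000\ \text{records}}{11 \times 60\ \text{s}}
  = \frac{10^{6}}{660\ \text{s}}
  \approx 1{,}515\ \text{records/s},
\qquad
\frac{660\ \text{s}}{10^{6}\ \text{records}} = 0.66\ \text{ms per record}.
\]
```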

Duke can be used to find duplicate records inside a single table/data source, or it can be used to find records in different tables/sources which most likely represent the same real-world entity.
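The difference between the two modes is which pairs of records get compared. The brute-force Java sketch below makes that concrete: deduplication compares every unordered pair within one list, while record linkage compares each record in one source with each record in the other. The match rule (case-insensitive string equality) is a hypothetical placeholder for Duke's configurable comparison, and Duke itself uses its index to avoid testing every pair.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiPredicate;

// Brute-force illustration of the two modes. `same` is a hypothetical
// stand-in for Duke's record comparison; Duke avoids all-pairs comparison.
class ModesDemo {

    // Deduplication: compare every unordered pair within one source.
    static <T> List<String> deduplicate(List<T> records, BiPredicate<T, T> same) {
        List<String> pairs = new ArrayList<>();
        for (int i = 0; i < records.size(); i++) {
            for (int j = i + 1; j < records.size(); j++) {
                if (same.test(records.get(i), records.get(j))) {
                    pairs.add(records.get(i) + " ~ " + records.get(j));
                }
            }
        }
        return pairs;
    }

    // Record linkage: compare each record in source A with each record in source B.
    static <T> List<String> link(List<T> a, List<T> b, BiPredicate<T, T> same) {
        List<String> pairs = new ArrayList<>();
        for (T x : a) {
            for (T y : b) {
                if (same.test(x, y)) {
                    pairs.add(x + " ~ " + y);
                }
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        // Hypothetical match rule: case-insensitive equality.
        BiPredicate<String, String> same = String::equalsIgnoreCase;
        System.out.println(deduplicate(List.of("Oslo", "Bergen", "OSLO"), same)); // [Oslo ~ OSLO]
        System.out.println(link(List.of("Oslo", "Bergen"), List.of("bergen", "Trondheim"), same)); // [Bergen ~ bergen]
    }
}
```

Note that linkage never pairs two records from the same source, which is exactly what distinguishes it from deduplication.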

Duke is written in the Java programming language and it can be used on Mac OS X, Windows and Linux.

Last updated on February 18th, 2014

Runs on: Mac OS X
