Thread: Duke 0.5
- 04-02-2012, 05:17 PM #1
Duke 0.5
Duke is a fast and flexible record linkage engine. It does not use the traditional blocking (sort by key) approach, but instead relies on Lucene. This makes it high-performance (able to process 1,000,000 records in ~10 minutes). Duke can be run from the command line, but also has an API allowing incremental linking applications to be built easily. It supports reading data from CSV, JDBC, SPARQL, and NTriples, and also supports a number of string comparators and string normalizers.
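To make the terms "string normalizer" and "string comparator" concrete, here is a minimal sketch in plain Java. It is not Duke's API: the class name `LinkageSketch` and its methods are hypothetical, and Levenshtein edit distance is just one common choice of comparator, picked here for illustration. The idea is that values are cleaned first, then scored for similarity on a 0.0 to 1.0 scale.

```java
import java.util.Locale;

// Hypothetical illustration only; these names are not part of Duke.
final class LinkageSketch {

    // Normalizer: trim, lowercase, and collapse runs of whitespace.
    static String normalize(String value) {
        return value.trim().toLowerCase(Locale.ROOT).replaceAll("\\s+", " ");
    }

    // Classic two-row dynamic-programming Levenshtein edit distance.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // distance from the empty prefix of a
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // distance to the empty prefix of b
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(
                        Math.min(curr[j - 1] + 1, prev[j] + 1),
                        prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Comparator: 1.0 for identical normalized values, falling toward 0.0
    // as the edit distance approaches the length of the longer value.
    static double similarity(String a, String b) {
        String x = normalize(a);
        String y = normalize(b);
        int longest = Math.max(x.length(), y.length());
        if (longest == 0) {
            return 1.0; // two empty values are treated as equal
        }
        return 1.0 - (double) levenshtein(x, y) / longest;
    }
}
```

With this sketch, `LinkageSketch.similarity("Acme Corp", " acme  corp ")` scores the two values as identical, because normalization removes the differences in case and spacing before the comparison runs.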
Changes
The internals have been cleaned up and refactored, and some performance tuning parameters have been added. There are new cleaners, support for pluggable backends, a new naïve in-memory backend, and much more.
URL: duke - Fast deduplication engine - Google Project Hosting