DataCleaner is a data quality analysis tool for data profiling, validation, and minor ETL-like tasks. These activities help you administer and monitor data quality so that your data stays useful and applicable to your business situation. It can be used for master data management (MDM) methodologies, data warehousing projects, statistical research, preparation for extract-transform-load (ETL) activities, and more.


This release adds saving, archiving, and sharing of data profiling results; automatic merging of duplicates (golden record creation); checking of contacts against sanction lists (due diligence checks); transformers for NoSQL data structures; specification of datastore connection properties on the command line; drill-down to details in value distribution; more user-friendly database connection configuration; and execution and scheduling of jobs via Pentaho Data Integration/Kettle.

URL: Open source data quality, data profiling, data cleansing, data matching | DataCleaner