DataCleaner is a data quality analysis tool for data profiling, validation, and minor ETL-like tasks. These activities help you manage and monitor your data quality so that your data stays useful and applicable to your business situation. It can be used in master data management (MDM) methodologies, data warehousing projects, statistical research, preparation for extract-transform-load (ETL) activities, and more.


Support for MongoDB databases, for both read and write operations. Integration with, which provides Customer DQ functions in the cloud. Duplicate detection (a.k.a. deduplication or fuzzy matching) analyzers. A "Table lookup" component for looking up multiple values from a table. An "Insert into table" component for inserting records into any kind of table (e.g. database tables, CSV files, Excel sheets, or MongoDB collections). Job-level variables, which allow for parameterizable jobs that can be controlled from the command line.

URL: Open source data quality, data profiling, data cleansing, data matching | DataCleaner