cpdetector is a small yet clever framework for codepage detection that integrates several strategies. It can be used as a library by third-party software that accesses textual data over a network. It also includes a best-practice implementation in the form of a command-line tool that sorts and transforms large collections of documents based on their codepage. Available strategies include jchardet (exclusion, frequency analysis, and guessing), detection of the HTML charset property, and detection of the XML encoding declaration.
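To make the two declaration-based strategies concrete, here is a minimal sketch of how a declared charset can be read from the start of a document. It is not cpdetector's own code and does not use its API: the class name DeclaredCharsetSniffer, the sniff method, and both regular expressions are assumptions made for illustration only. It also assumes the declaration is written in an ASCII-compatible encoding, so it would miss UTF-16 documents, and it returns the declared name without checking that the name is a charset the JVM supports.

```java
import java.nio.charset.StandardCharsets;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of cpdetector: reads a *declared* charset
// from the first bytes of an XML or HTML document.
class DeclaredCharsetSniffer {

    // Matches e.g. <?xml version="1.0" encoding="ISO-8859-1"?>
    private static final Pattern XML_DECL = Pattern.compile(
            "<\\?xml[^>]*?encoding\\s*=\\s*[\"']([A-Za-z0-9._:-]+)[\"']",
            Pattern.CASE_INSENSITIVE);

    // Matches e.g. <meta charset="utf-8"> and
    // <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
    private static final Pattern HTML_META = Pattern.compile(
            "<meta[^>]*?charset\\s*=\\s*[\"']?([A-Za-z0-9._:-]+)",
            Pattern.CASE_INSENSITIVE);

    // Returns the declared charset name, or empty if no declaration is found.
    static Optional<String> sniff(byte[] head) {
        // ISO-8859-1 maps every byte to exactly one char, so decoding never
        // fails and ASCII declarations survive unchanged.
        String text = new String(head, StandardCharsets.ISO_8859_1);

        Matcher xml = XML_DECL.matcher(text);
        if (xml.find()) {
            return Optional.of(xml.group(1));
        }
        Matcher html = HTML_META.matcher(text);
        if (html.find()) {
            return Optional.of(html.group(1));
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        byte[] sample = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?><doc/>"
                .getBytes(StandardCharsets.US_ASCII);
        // Prints the declared name if one is found, otherwise "undetected".
        System.out.println(sniff(sample).orElse("undetected"));
    }
}
```

A declaration can be missing or wrong, which is one reason to combine it with a statistical strategy such as jchardet rather than rely on it alone.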

Changes

This major bugfix release fixes two issues in command-line batch mode. The switch to skip moving undetected documents works again. In addition, no attempt is made any longer to transcode undetected documents; such attempts previously led to exceptional program flow.

URL: cpdetector, free Java code page detection.