cpdetector is a small yet clever framework for codepage detection that integrates different strategies. It may be used as a library by third-party software that accesses textual data over a network. It also includes a best-practice implementation in the form of a command line tool that sorts and transforms large collections of documents based on their codepage. Available strategies include jchardet (exclusion, frequency analysis, and guessing), detection of the HTML charset property, and detection of the XML encoding declaration.
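The two declaration-based strategies can be illustrated with plain JDK classes. The following is only a rough sketch and not cpdetector's own code: the class name DeclaredCharsetSketch, the methods detect and lookup, and both regular expressions are hypothetical names and simplifications chosen for this example. It assumes the document head is ASCII-compatible, so it will not find declarations in UTF-16 input.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration; not part of the cpdetector API.
class DeclaredCharsetSketch {
    // XML declaration, e.g. <?xml version="1.0" encoding="ISO-8859-1"?>
    private static final Pattern XML_ENCODING = Pattern.compile(
        "<\\?xml[^>]*?\\bencoding\\s*=\\s*[\"']([^\"']*)[\"']",
        Pattern.CASE_INSENSITIVE);

    // HTML meta tag, e.g. <meta charset="utf-8"> or
    // <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
    private static final Pattern HTML_CHARSET = Pattern.compile(
        "<meta[^>]*?\\bcharset\\s*=\\s*[\"']?([^\"'\\s/>;]*)",
        Pattern.CASE_INSENSITIVE);

    // Tries each strategy in turn and returns the first declared charset
    // that the running JVM actually supports.
    static Optional<Charset> detect(byte[] head) {
        // ISO-8859-1 maps every byte to exactly one char, so ASCII markup
        // survives decoding even if the real encoding is something else.
        String text = new String(head, StandardCharsets.ISO_8859_1);
        for (Pattern strategy : new Pattern[] {XML_ENCODING, HTML_CHARSET}) {
            Matcher m = strategy.matcher(text);
            if (m.find()) {
                Optional<Charset> declared = lookup(m.group(1));
                if (declared.isPresent()) {
                    return declared;
                }
            }
        }
        return Optional.empty();
    }

    // Resolves a declared name without throwing on illegal names (such as
    // the empty string) or on charsets the JVM does not know.
    static Optional<Charset> lookup(String name) {
        try {
            return Optional.of(Charset.forName(name.trim()));
        } catch (IllegalArgumentException e) {
            // Covers IllegalCharsetNameException and UnsupportedCharsetException.
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        byte[] sample = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?><doc/>"
            .getBytes(StandardCharsets.US_ASCII);
        System.out.println(detect(sample));
    }
}
```

A declaration only states what the author claimed, so a statistical detector such as jchardet remains useful as a fallback when detect returns an empty result.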


This release fixes a crash in command line mode that occurred when an invalid declared charset (the "" charset) was found. The command line tool (CodepageProcessor) no longer returns 0 when an error occurs. A bug that broke the ability to reset input streams after detection was also fixed.
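Why stream reset matters can be shown with the standard java.io mark/reset contract. The sketch below is an assumption about the general technique, not the code that was fixed in this release; the class ResettablePeekSketch and its peek method are hypothetical names. It reads at most limit bytes for inspection and then rewinds the stream so the caller can still consume the document from the start.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical illustration; not part of the cpdetector API.
class ResettablePeekSketch {
    // Returns up to 'limit' leading bytes and leaves the stream positioned
    // where it was before the call.
    static byte[] peek(InputStream in, int limit) throws IOException {
        if (!in.markSupported()) {
            throw new IllegalArgumentException("stream must support mark/reset");
        }
        in.mark(limit); // the mark stays valid while at most 'limit' bytes are read
        try {
            byte[] buffer = new byte[limit];
            int filled = 0;
            while (filled < limit) {
                int read = in.read(buffer, filled, limit - filled);
                if (read < 0) {
                    break; // end of stream before the limit was reached
                }
                filled += read;
            }
            return Arrays.copyOf(buffer, filled);
        } finally {
            in.reset(); // rewind even if reading failed
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "hello world".getBytes(StandardCharsets.US_ASCII);
        InputStream in = new BufferedInputStream(new ByteArrayInputStream(data));
        byte[] head = peek(in, 5);
        System.out.println(new String(head, StandardCharsets.US_ASCII));
        System.out.println((char) in.read()); // still the first byte of the stream
    }
}
```

Streams that do not support mark/reset can be wrapped in a BufferedInputStream first, as in the main method above.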

URL: cpdetector, free Java code page detection.