XSLT (Extensible Stylesheet Language Transformations) is used to transform XML files into other formats such as HTML. Many XSLT processors (libraries) are available for Java. A Java application, such as a JSP/Servlet, can use one of these libraries to read an XML file and transform it into HTML.
An XSLT processor takes two inputs: an XML file and an XSLT stylesheet.
For this post, I have chosen the Xalan-Java library for the transformation.
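To make the two inputs concrete, here is a minimal sketch of a transformation. It uses the standard JAXP API (javax.xml.transform), which Xalan-Java implements; if Xalan is not on the classpath, the JDK's built-in processor is used instead. The class name XsltDemo and the sample XML and stylesheet are made up for illustration and are not from any real project.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical demo class: transforms an in-memory XML string with an in-memory stylesheet.
public class XsltDemo {

    // Illustrative input document (assumption, not from the post).
    public static final String SAMPLE_XML =
        "<hobbies><hobby>Reading</hobby><hobby>Chess</hobby></hobbies>";

    // Illustrative stylesheet: turns each <hobby> into an HTML list item.
    public static final String SAMPLE_XSLT =
        "<xsl:stylesheet version=\"1.0\" "
      + "xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
      + "<xsl:output method=\"html\" indent=\"no\"/>"
      + "<xsl:template match=\"/hobbies\">"
      + "<ul><xsl:for-each select=\"hobby\">"
      + "<li><xsl:value-of select=\".\"/></li>"
      + "</xsl:for-each></ul>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Applies the stylesheet to the XML and returns the result as a string.
    public static String transform(String xml, String xslt) throws TransformerException {
        // Picks up whichever JAXP-compliant processor is available (Xalan if present).
        TransformerFactory factory = TransformerFactory.newInstance();
        Transformer transformer =
            factory.newTransformer(new StreamSource(new StringReader(xslt)));
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws TransformerException {
        System.out.println(transform(SAMPLE_XML, SAMPLE_XSLT));
    }
}
```

In a real application the two StreamSource objects would more likely wrap files or servlet resources than strings; strings are used here only to keep the sketch self-contained.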
Sometimes you are required to fetch and store data from web pages. If there are too many pages to parse, this cannot realistically be done by hand. Java's standard library can download a page's HTML, which your program can then process to extract the text.
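As a rough sketch of how downloading a page's HTML might look using only the standard library (Java 9 or later), consider the class below. The class name PageFetcher and its method names are hypothetical, and the code assumes the page is encoded as UTF-8, which is not true of every site.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class for downloading the raw HTML of a page.
public class PageFetcher {

    // Reads the whole stream and decodes it with the given charset.
    public static String readAll(InputStream in, Charset charset) throws IOException {
        return new String(in.readAllBytes(), charset);
    }

    // Opens the URL and returns its content as a string.
    // Assumption: the page is UTF-8 encoded.
    public static String fetchHtml(String url) throws IOException {
        try (InputStream in = URI.create(url).toURL().openStream()) {
            return readAll(in, StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.err.println("Usage: java PageFetcher <url>");
            return;
        }
        String html = fetchHtml(args[0]);
        System.out.println(html.length() + " characters fetched");
    }
}
```

The returned string holds the complete markup of the page, so any extraction logic can work on it without touching the network again.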
The approach is simple. You fetch all the HTML contents of a webpage and then write your own parser to extract the required information. For example, you might be asked to store only the text in the table data (td) tags under the caption Hobbies. So you will store all the HTML contents of web