Connecting to Wikipedia
I'm trying to learn how to connect to Wikipedia and retrieve articles from a Java application.
I have been reading this URL connection tutorial from java.sun:
So far I have been able to stream data from the URL into a string, but it contains a lot of HTML code.
So my questions are:
1. What ways are there to strip the HTML code from the string and keep just the text?
2. Is it possible to write to the Wikipedia search field from the Java application without any permission or the like?
I have been googling around a lot but haven't found anything that made sense to me. All I'm asking for is some tips, or maybe a tutorial link or two.
I'm using NetBeans, if that's relevant.
Best regards / mannez
I'm not much of a Java programmer, but I do know Wikipedia fairly well. If you don't mind wikitext (double brackets for links, and so on), you can get it for any article with a simple GET request by adding "?action=raw" to its URL.
For example, <http://en.wikipedia.org/wiki/Main_Page> becomes <http://en.wikipedia.org/wiki/Main_Page?action=raw>.
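A rough Java sketch of that GET request might look like the following. Treat it as an illustration under a few assumptions rather than a tested recipe: the class name WikiRawFetcher, the helper methods rawUrl and fetch, and the User-Agent string are all made up for this example. It also uses the https form of the URL, because HttpURLConnection does not follow a redirect from http to https on its own.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, not part of any Wikipedia or Java library.
public class WikiRawFetcher {

    // Adds action=raw to an article URL, using "&" if the URL
    // already has a query string and "?" otherwise.
    public static String rawUrl(String articleUrl) {
        String separator = articleUrl.contains("?") ? "&" : "?";
        return articleUrl + separator + "action=raw";
    }

    // Sends a plain GET request and returns the response body as a string.
    public static String fetch(String address) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(address).openConnection();
        conn.setRequestMethod("GET");
        // Assumption: a descriptive User-Agent; the value here is a placeholder.
        conn.setRequestProperty("User-Agent", "WikiRawFetcher-example/0.1");
        StringBuilder text = new StringBuilder();
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) {
                text.append(line).append('\n');
            }
        } finally {
            conn.disconnect();
        }
        return text.toString();
    }

    public static void main(String[] args) throws IOException {
        if (args.length > 0) {
            // Fetch the wikitext of the article URL given on the command line.
            System.out.println(fetch(rawUrl(args[0])));
        } else {
            // Without an argument, only show the rewritten example URL.
            System.out.println(rawUrl("https://en.wikipedia.org/wiki/Main_Page"));
        }
    }
}
```

Running it with an article URL as the first argument, for instance https://en.wikipedia.org/wiki/Main_Page, should print wikitext rather than HTML, so there are no tags to strip, only wiki markup such as the double brackets.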