- 05-25-2011, 06:38 PM #1
How to automatically classify data in a large database
Hello,
I am supposed to take data from a Wikipedia dump, a Freebase dump, or DBpedia.
I am then supposed to write code that outputs what every datum in that database is, e.g. the name of a person or a business, an address, and so on. It does not matter what language I write the code in, but I'm only familiar with C, C++, Java and Python. Java is my preferred language.
Those databases contain all types of data: title, person name, address, social security number, phone number...
I have three questions:
1) Since I have used machine learning a lot, I have decided to use a machine learning approach.
I have started looking into WEKA, a Java machine learning toolbox. However, it is only available under the GPL license. Is there another toolbox that I can use in a commercial product?
2) The problem I am facing with a machine learning approach is that I don't know what features to use. All I can think of right now is: the length of the datum, the number of alphabetic characters it has, and the number of digit characters it has.
That is very little given all the types of data those databases contain. Regular expressions do not seem to be a solution for this type of project.
3) Is there another approach I can use? I mean, is machine learning the only approach?
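To make question 2 concrete, here is a minimal sketch of the three features mentioned there (length, letter count, digit count). It is in Python only for brevity; the function name `basic_features` and the sample values are made up for illustration and do not come from any of the dumps.

```python
def basic_features(datum: str) -> dict:
    """Compute the three surface features listed in question 2.

    Hypothetical helper for illustration only.
    """
    return {
        # Total number of characters in the datum
        "length": len(datum),
        # Number of alphabetic characters (Unicode-aware)
        "num_letters": sum(ch.isalpha() for ch in datum),
        # Number of digit characters
        "num_digits": sum(ch.isdigit() for ch in datum),
    }


# Made-up sample values, one phone-like and one name-like
print(basic_features("555-1234"))    # {'length': 8, 'num_letters': 0, 'num_digits': 7}
print(basic_features("John Smith"))  # {'length': 10, 'num_letters': 9, 'num_digits': 0}
```

Each resulting dictionary would be one row of input for whatever classifier ends up being used. As the two samples show, these features can separate a phone number from a name, but they would probably not tell a person name from a business name.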
Thank you for your help.
Regards,
Herve