Feature extraction from a text file in Java, used for scoring sentences
I have separated the individual sentences of a text document, read from a file, using several heuristics. Now I need to index the sentences in order of appearance (e.g. the 1st sentence as 1, then 2, 3, etc.). Then I need to extract features from these indexed sentences. The features are position (1 if the sentence is at the top of the document, 0 if it is at the bottom), length (the number of words in the sentence), and thematic words (the most frequent words). Based on these features I will score the sentences. Any help would be appreciated.
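
For illustration, here is a minimal sketch of how the three features could be computed once the sentences are already in a List<String> in document order. Several things here are assumptions and not part of the question: the names SentenceFeatures, Features, tokenize, thematicWords and extract are hypothetical; position is interpolated linearly between 1 (first sentence) and 0 (last sentence); words are split on anything that is not a letter or digit; the stop-word list is a tiny placeholder; and the thematic feature is taken to be the number of words in a sentence that belong to the k most frequent non-stop-words of the whole document.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

class SentenceFeatures {

    /** Feature values for one sentence (hypothetical holder class). */
    static final class Features {
        public final int index;         // 1-based order of appearance
        public final double position;   // 1.0 for the first sentence, 0.0 for the last
        public final int length;        // number of words in the sentence
        public final int thematicCount; // words of the sentence that are thematic words

        Features(int index, double position, int length, int thematicCount) {
            this.index = index;
            this.position = position;
            this.length = length;
            this.thematicCount = thematicCount;
        }
    }

    // Assumption: a deliberately tiny stop-word list; replace it with a real one.
    private static final Set<String> STOP_WORDS = new HashSet<>(
            Arrays.asList("the", "a", "an", "is", "are", "of", "and", "to", "in", "on", "it"));

    /** Lowercases the sentence and splits on anything that is not a letter or digit. */
    static List<String> tokenize(String sentence) {
        List<String> words = new ArrayList<>();
        for (String w : sentence.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!w.isEmpty()) {
                words.add(w);
            }
        }
        return words;
    }

    /** Returns the k most frequent non-stop-words of the whole document. */
    static Set<String> thematicWords(List<String> sentences, int k) {
        Map<String, Integer> counts = new HashMap<>();
        for (String s : sentences) {
            for (String w : tokenize(s)) {
                if (!STOP_WORDS.contains(w)) {
                    counts.merge(w, 1, Integer::sum);
                }
            }
        }
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        // Highest count first; ties are broken alphabetically so the result is deterministic.
        entries.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        Set<String> top = new LinkedHashSet<>();
        for (int i = 0; i < Math.min(k, entries.size()); i++) {
            top.add(entries.get(i).getKey());
        }
        return top;
    }

    /** Computes index, position, length and thematic-word count for every sentence. */
    static List<Features> extract(List<String> sentences, int k) {
        Set<String> thematic = thematicWords(sentences, k);
        int n = sentences.size();
        List<Features> result = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            List<String> words = tokenize(sentences.get(i));
            int hits = 0;
            for (String w : words) {
                if (thematic.contains(w)) {
                    hits++;
                }
            }
            // Assumption: linear interpolation between 1 (top) and 0 (bottom).
            double position = (n == 1) ? 1.0 : 1.0 - (double) i / (n - 1);
            result.add(new Features(i + 1, position, words.size(), hits));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> sentences = Arrays.asList(
                "Cats chase mice.", "Mice fear cats and dogs.", "Dogs sleep.");
        for (Features f : extract(sentences, 2)) {
            System.out.println(f.index + " position=" + f.position
                    + " length=" + f.length + " thematic=" + f.thematicCount);
        }
    }
}
```

The sentence score itself is left out on purpose, since how the three features should be weighted is not stated above; a weighted sum of position, normalized length and thematicCount would be one option to try.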