Finding idf for noun phrases
I have a set of 500 articles in my database from which I am extracting noun phrases. I am using the Stanford POS tagger for POS tagging and then an NP chunker to identify noun phrases. Both of these tools accept text files as input, so at every stage I generate a text file of the 500 articles to provide as input to them.
Now I have a list of all the noun phrases and their frequencies across the 500 articles.
I want to weight the most frequent NPs using idf (inverse document frequency),
but I cannot find a way of doing it. Can somebody help me with this?
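
For reference, the weighting I have in mind is the usual idf(t) = log(N / df(t)), where N is the number of articles and df(t) is the number of articles that contain the noun phrase t. Below is a minimal sketch of that calculation. It assumes the noun phrases can be kept grouped per article, which a single combined frequency list does not give me. The names `np_idf` and `articles_nps` and the toy data are hypothetical, made up only for illustration.

```python
import math
from collections import Counter

def np_idf(articles_nps):
    """Return {noun_phrase: idf} for a list of per-article noun-phrase lists."""
    n_docs = len(articles_nps)
    doc_freq = Counter()
    for nps in articles_nps:
        # set() so that each article counts a phrase at most once
        doc_freq.update(set(nps))
    return {phrase: math.log(n_docs / df) for phrase, df in doc_freq.items()}

# Toy stand-in for the 500 articles: one list of extracted NPs per article.
articles_nps = [
    ["neural network", "training data", "neural network"],
    ["training data", "stock market"],
    ["neural network", "interest rate"],
]

idf = np_idf(articles_nps)

# Overall frequency of each NP across all articles (the list I already have).
total_freq = Counter(phrase for nps in articles_nps for phrase in nps)

# Frequency weighted by idf: phrases spread over many articles are damped.
weighted = {phrase: freq * idf[phrase] for phrase, freq in total_freq.items()}
```

The key point of the sketch is that idf needs to know which article each noun phrase came from, so the article boundaries would have to survive the tagging and chunking steps (for example by processing one file per article, or by inserting a separator between articles).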