Cassandra: Build the Index
by , 02-23-2012 at 06:27 PM
Once the data is ready, the next step is to store it in a column family. All the tags created by the tokenizer are processed in this step. The tokenizer has given us a list of tags, each paired with a document ID. With this information, we can:
• Check the tags for duplicates.
• Write the data to a column family in Cassandra.
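The duplicate check in the first bullet can be sketched in plain Java. This is an illustration only, assuming the tokenizer output is available as a list of (tag, document ID) pairs. The `TagDeduplicator` class and its `TagEntry` record are hypothetical names, not part of the original code. Grouping the pairs by tag collapses repeated (tag, docID) pairs and also yields the "row key to documents" shape needed for the write step.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class TagDeduplicator {

    /** One (tag, document ID) pair from the tokenizer. Hypothetical type for this sketch. */
    public record TagEntry(String tag, String docID) {}

    /**
     * Groups tokenizer output by tag and drops duplicate (tag, docID) pairs.
     * LinkedHashMap/LinkedHashSet keep first-seen order, so the output is predictable.
     */
    public static Map<String, Set<String>> dedupe(List<TagEntry> entries) {
        Map<String, Set<String>> docsByTag = new LinkedHashMap<>();
        for (TagEntry e : entries) {
            // A Set ignores a docID that is already recorded for this tag
            docsByTag.computeIfAbsent(e.tag(), k -> new LinkedHashSet<>()).add(e.docID());
        }
        return docsByTag;
    }

    public static void main(String[] args) {
        List<TagEntry> entries = List.of(
                new TagEntry("cassandra", "doc1"),
                new TagEntry("index", "doc1"),
                new TagEntry("cassandra", "doc1"), // duplicate, collapsed
                new TagEntry("cassandra", "doc2"));
        System.out.println(dedupe(entries)); // {cassandra=[doc1, doc2], index=[doc1]}
    }
}
```

Whether "duplication" should mean the same tag within one document or the same tag across documents is a design choice. This sketch keeps one entry per (tag, document) pair.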
Java code for building the index:

```java
// Tokenizes a document and writes one column per word to Cassandra.
// SimpleClient.cassandraClient.addTag(...) is a helper defined elsewhere (not shown in this post).
private void tokenize(String doc, String docID) {
    // remove all non-alphanumeric characters (whitespace is kept so words stay separated)
    doc = doc.replaceAll("[^a-zA-Z0-9\\s]", "");
    doc = doc.toLowerCase(); // ensure everything is lower case
    String[] lTerms = doc.trim().split("\\s+"); // split on runs of whitespace
    for (String word : lTerms) {
        if (word.isEmpty()) {
            continue; // an empty document yields a single empty token
        }
        // add the word as the row key, the doc ID as the column value,
        // and a random unique string as the column name
        /*
         * Inefficient: this makes a trip to the DB for every word.
         * A better way would be to collect all the docs associated with a word (row key),
         * create all the columns, and write them in a single batch operation.
         */
        SimpleClient.cassandraClient.addTag(word, java.util.UUID.randomUUID().toString(), docID);
        System.out.println(word);
    }
    System.out.println();
}
```
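The comment in the code suggests grouping columns per word and writing them in one batch instead of one round trip per word. A minimal sketch of that idea is below. The `BatchedIndexBuilder` class and its `RowWriter` interface are hypothetical. `RowWriter` stands in for whatever batch call your Cassandra client offers and is not a real driver API. Columns are buffered per row key (the word), so `flush` makes one write per distinct word rather than one per word occurrence.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.UUID;

public class BatchedIndexBuilder {

    /** Hypothetical stand-in for a Cassandra batch write; not a real driver API. */
    public interface RowWriter {
        // Writes all columns (column name -> doc ID) for one row key in a single call
        void writeRow(String rowKey, Map<String, String> columns);
    }

    // Buffered columns, grouped by row key (the word)
    private final Map<String, Map<String, String>> pending = new LinkedHashMap<>();

    /** Strips non-alphanumerics, lowercases, splits on whitespace, and buffers one column per word. */
    public void addDocument(String doc, String docID) {
        String cleaned = doc.replaceAll("[^a-zA-Z0-9\\s]", "").toLowerCase().trim();
        if (cleaned.isEmpty()) {
            return; // nothing to index
        }
        for (String word : cleaned.split("\\s+")) {
            // Random unique column name, doc ID as the column value
            pending.computeIfAbsent(word, k -> new LinkedHashMap<>())
                   .put(UUID.randomUUID().toString(), docID);
        }
    }

    /** Sends one write per distinct word, clears the buffer, and returns the number of rows written. */
    public int flush(RowWriter writer) {
        int rows = 0;
        for (Map.Entry<String, Map<String, String>> e : pending.entrySet()) {
            writer.writeRow(e.getKey(), e.getValue());
            rows++;
        }
        pending.clear();
        return rows;
    }
}
```

For example, indexing "Cassandra builds an index. Cassandra!" and "An index" buffers eight columns but produces only four row writes (cassandra, builds, an, index). The trade-off is memory: everything stays buffered until `flush` is called, so large inputs may need periodic flushing.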