Cassandra Index generator

by , 02-23-2012 at 07:23 PM
In this section, we will create a Cassandra index generator. The code below simulates a simple indexer built from a few components. It reads text files as the input resources for the index generator. After a file's content is read into memory, it is passed to a tokenizer, which uses a regular expression to remove all non-alphanumeric characters and then splits the text into words, using spaces as the delimiter. Finally, the indexer randomly chooses some of these words to be used as tags.
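The tokenizing and tagging steps described above can be sketched as follows. This is a minimal, self-contained illustration, not the author's code: the class name `TokenizerSketch`, the helper `pickTags`, the lower-casing of words, and the choice of distinct words as tag candidates are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Random;

public class TokenizerSketch {

    // Replace every run of non-alphanumeric characters with a single space,
    // then split on whitespace. Lower-casing is an assumption of this sketch.
    static List<String> tokenize(String content) {
        String cleaned = content.replaceAll("[^A-Za-z0-9]+", " ").trim().toLowerCase();
        if (cleaned.isEmpty()) {
            return new ArrayList<>();
        }
        return new ArrayList<>(Arrays.asList(cleaned.split("\\s+")));
    }

    // Hypothetical helper: pick up to `count` distinct words at random as tags.
    static List<String> pickTags(List<String> words, int count, Random rnd) {
        List<String> distinct = new ArrayList<>(new LinkedHashSet<>(words));
        Collections.shuffle(distinct, rnd);
        return new ArrayList<>(distinct.subList(0, Math.min(count, distinct.size())));
    }

    public static void main(String[] args) {
        List<String> words = tokenize("Hello, Cassandra! Index #1: hello-world.");
        System.out.println(words); // prints [hello, cassandra, index, 1, hello, world]
        System.out.println(pickTags(words, 2, new Random(42)));
    }
}
```

Passing a seeded `Random` into `pickTags` keeps the "random" tags reproducible between runs, which makes the indexer easier to test.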

Java Code: The init method of the Cassandra index generator
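public void init(int docC) {
    // Documents are named 1.txt, 2.txt, ..., docC.txt in the data/ folder
    for (int i = 0; i < docC; i++) {
        String name = (i + 1) + ".txt";
        // Read the file into memory and hand its content to the tokenizer
        tokenize(readFile("data/" + name), name);
    }
}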
In the code above, an integer is passed as a parameter to indicate the number of documents we want to index. The method then reads each document in turn (data/1.txt, data/2.txt, and so on) and passes its content to the tokenize method.
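The article does not show `readFile` or `tokenize`, so the following is only a sketch of how the whole loop might fit together under stated assumptions. The class `IndexerSketch`, the `dataDir` constructor parameter (standing in for the hard-coded `data/` folder), UTF-8 file encoding, and the in-memory `TreeMap` inverted index (word to document names) are all hypothetical; writing the index into Cassandra is left out.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

public class IndexerSketch {

    // Assumed in-memory inverted index: word -> names of documents containing it.
    // Persisting it to Cassandra is out of scope for this sketch.
    final Map<String, Set<String>> index = new TreeMap<>();
    private final Path dataDir;

    IndexerSketch(Path dataDir) {
        this.dataDir = dataDir;
    }

    // Read a whole file into memory (UTF-8 assumed).
    String readFile(Path path) throws IOException {
        return new String(Files.readAllBytes(path), StandardCharsets.UTF_8);
    }

    // Strip non-alphanumeric characters, split on whitespace,
    // and record each word against the document name.
    void tokenize(String content, String name) {
        String cleaned = content.replaceAll("[^A-Za-z0-9]+", " ").trim().toLowerCase();
        if (cleaned.isEmpty()) {
            return;
        }
        for (String word : cleaned.split("\\s+")) {
            index.computeIfAbsent(word, k -> new TreeSet<>()).add(name);
        }
    }

    // Index documents 1.txt .. docC.txt from the data directory.
    void init(int docC) throws IOException {
        for (int i = 0; i < docC; i++) {
            String name = (i + 1) + ".txt";
            tokenize(readFile(dataDir.resolve(name)), name);
        }
    }

    public static void main(String[] args) throws IOException {
        int docC = args.length > 0 ? Integer.parseInt(args[0]) : 2;
        IndexerSketch indexer = new IndexerSketch(Paths.get("data"));
        indexer.init(docC);
        indexer.index.forEach((word, docs) -> System.out.println(word + " -> " + docs));
    }
}
```

Keeping the index in sorted maps and sets makes the printed output deterministic, which is convenient when checking the indexer by hand before wiring it to a database.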

