Thread: IndexSearcher

  1. #1
    axenos is offline Member
    Join Date
    Mar 2011
    Posts
    18
    Rep Power
    0

    Default IndexSearcher

    Hello, I am new to Lucene and have run into a problem.
    I am creating an index and adding two .rtf documents to it. I assume the indexing part is correct, because once the index is created numDocs() returns 2, which is right.

    But when I search, I get 0 hits. I am using Lucene 3.0.3 and this is my code.

    Java Code:
    /*
     * To change this template, choose Tools | Templates
     * and open the template in the editor.
     */
    
    package myThesis;
    
    import java.io.File;
    import java.io.FileInputStream;
    import java.io.FileNotFoundException;
    import java.io.FileReader;
    import java.io.IOException;
    import java.io.Reader;
    import org.apache.lucene.analysis.Analyzer;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.index.CorruptIndexException;
    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.IndexWriter.MaxFieldLength;
    import org.apache.lucene.queryParser.ParseException;
    import org.apache.lucene.queryParser.QueryParser;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.ScoreDoc;
    import org.apache.lucene.search.TopScoreDocCollector;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.LockObtainFailedException;
    import org.apache.lucene.store.RAMDirectory;
    import org.apache.lucene.util.Version;
    /**
     *
     * @author Andreas
     */
    public class Lucene1{
    
        private Analyzer analyzer;
        private Directory directory;
        private IndexWriter iwriter;
        private Document doc;
        private IndexSearcher isearcher;
        private File dir;
        private String path;
        private FileInputStream is;
        private MaxFieldLength mlf;
        private Boolean b;
        private int originalNumDocs;
    
        public Lucene1(String search) throws FileNotFoundException, CorruptIndexException, LockObtainFailedException, IOException, ParseException{
            // Store the index in memory:
            // To store an index on disk, use this instead:
            //Directory directory = FSDirectory.open("/tmp/testindex");
            directory = new RAMDirectory();
            mlf = IndexWriter.MaxFieldLength.UNLIMITED;
            path = "C:/Users/Andreas/Documents/NetBeansProjects/docs/";
            dir = new File(path);
            StoreIndex(search);
        }
    
            public void StoreIndex(String searchString) throws CorruptIndexException, LockObtainFailedException, IOException, ParseException{
                analyzer = new StandardAnalyzer(Version.LUCENE_30);
                if (iwriter == null) b = true;
                else b = false;
                iwriter = new IndexWriter(directory, analyzer, b, mlf);
                System.out.println("Creating index with the following files ...");
                File[] files = dir.listFiles();
                originalNumDocs = iwriter.numDocs();
                for (File file : files) {
                    //System.out.println(file);
                    is = new FileInputStream(file);
                    doc = new Document();
                    path = file.getCanonicalPath();
                    doc.add(new Field("path", path, Field.Store.YES, Field.Index.ANALYZED));
                    
                    Reader reader = new FileReader(file);
                    doc.add(new Field("contents", reader,Field.TermVector.WITH_POSITIONS_OFFSETS));
    
                    iwriter.addDocument(doc);
                    System.out.println("Added: " + file);
                    //System.out.println(iwriter.numDocs());
                    //System.out.println(iwriter.numRamDocs());
                }           
                iwriter.optimize();
                iwriter.close();
                System.out.println("Index has been created.");
                System.out.println();
                System.out.println((iwriter.numDocs() - originalNumDocs) + " documents added.");
                SearchIndex(searchString);
            }
    
            public void SearchIndex(String searchString) throws IOException, ParseException{
                System.out.println("Searching for '" + searchString + "'");
                IndexReader ireader = IndexReader.open(directory, true);
                isearcher = new IndexSearcher(ireader);
                analyzer = new StandardAnalyzer(Version.LUCENE_30);
                // Parse a simple query that searches for "text":
                QueryParser parser = new QueryParser(Version.LUCENE_30, "content", analyzer);
                // Search for documents that contain the word searchString
                Query query = parser.parse(searchString);
                TopScoreDocCollector collector = TopScoreDocCollector.create(1000, true);
                isearcher.search(query, collector);
                /* First parameter is the query to be executed and
                second parameter indicates the no of search results to fetch */
                //TopDocs topDocs = isearcher.search(query,1000);
                
                // Get an array of references to matched documents
                ScoreDoc[] hits = collector.topDocs().scoreDocs;
                System.out.println("Total hits: " + collector.getTotalHits());
                for (ScoreDoc scoredoc : hits) {
                    //Retrieve the matched document and show relevant details
                    Document hitDoc = isearcher.doc(scoredoc.doc);
                    String path2 = hitDoc.get("path");
                    System.out.println("Hit: " + path2);
                    String path3 = hitDoc.get("content");
                    System.out.println(path3);
      
                }
                isearcher.close();
                directory.close();
            }
    }
    Any help would be appreciated. Thanks.
    Last edited by axenos; 03-30-2011 at 09:26 PM.

  2. #2
    Petr is offline Senior Member
    Join Date
    Jan 2011
    Location
    Russia
    Posts
    620
    Rep Power
    4

    Default

    Hi. Can you show an example of a file that you index, and a search word you use?
    Skype: petrarsentev
    http://TrackStudio.com

  3. #3
    axenos is offline Member
    Join Date
    Mar 2011
    Posts
    18
    Rep Power
    0

    Default

    Hi,

    The document I want to parse was a .java file, and I changed its extension to .txt.
    These are the document's contents:

    Java Code:
    package katastimaperifereiakwn;
    
    import java.io.Serializable;
    
    /**
     * A class that legates its variables and functions to the
     * classes Admin, Employee, Manager. Its functions
     * can be used to create and initialize an object, get and set its
     * values.
     *
     * @author Xenos Andreas-1391, Neroutsos Efthimis-1515
     */
    public class User implements Serializable {
    
        protected String firstName;
        protected String lastName;
        protected String username;
        protected String password;
        protected String post;
        protected String phoneNum;
        protected boolean k;
    
        /**
         * This function allows us to set the value of the variable k
         * which is used to control and check the flow of the user
         * (Admin, Employee, Manager) functions in the main class.
         * @param aK The value of the k (true/false)
         */
        public void setK(boolean aK)
        {
            k = aK;
        }
    
        /**
         * This function allows us to set the value of the variable k
         * which is used to control and check the flow of the user
         * (Admin, Employee, Manager) functions in the main class.
         * @return The value of the k.
         */
        public boolean getK()
        {
            return k;
        }
        
        /**
         * This is the constructor of the class User
         * @param first The first name of the user
         * @param last The last name of the user
         * @param user The username of the user
         * @param pass The password of the user
         * @param phone The phone number of the user
         */
        public User (String first,String last,String user,String pass,String phone)
        {
           firstName = first;
           lastName = last;
           username = user;
           password = pass;
           phoneNum = phone;
        }
    
        public User(){};
    
        /**
         * Use this function to set the first name of the user
         * @param first The first name of the user
         */
        public void setFirstName(String first)
        {
            firstName = first;
        }
    
        /**
         * Use this function to get the first name of the user
         * @return The first name of the user
         */
        public String getFirstName()
        {
            return firstName;
        }
    
    
        /**
         * Use this function to set the last name of the user
         * @param last The last name of the user
         */
        public void setLastName(String last)
        {
            lastName = last;
        }
    
        /**
         * Use this function to get the last name of the user
         * @return The last name of the user
         */
        public String getLastName()
        {
            return lastName;
        }
    
        /**
         * Use this function to set the username of the user
         * @param user The username of the user
         */
        public void setUsername(String user)
        {
            username = user;
        }
    
        /**
         * Use this function to get the username of the user
         * @return The username of the user
         */
        public String getUsername()
        {
            return username;
        }
    
        /**
         * Use this function to set the password of the user
         * @param pass The password of the user
         */
        public void setPassword(String pass)
        {
            password = pass;
        }
    
        /**
         * Use this function to get the password of the user
         * @return The password of the user
         */
        public String getPassword()
        {
            return password;
        }
    
        /**
         * Use this function to set the phone number of the user
         * @param phone The phone number of the user
         */
        public void setPhoneNum(String phone)
        {
            phoneNum = phone;
        }
    
        /**
        * Use this function to get the phone number of the user
        * @return The phone number of the user
        */
        public String getPhoneNum()
        {
            return phoneNum;
        }
    
        /**
         * Use this function to set the post of the user
         * @param pos The post of the user
         */
        public void setIdiotita(String pos)
        {
            post = pos;
        }
    
        /**
         * Use this function to get the post of the user
         * @return The post of the user
         */
        public String getIdiotita()
        {
            return post;
        }
    }

    Anyway, this document is just an example. I want to be able to parse any document, especially .java files.

    The search word I am using is 'user' and it gives me 0 hits.
    I am also adding another, similar file to the index.
    Have a look if you can, and if you need anything else, please let me know.

    OK, thank you again. Bye.

  4. #4
    axenos is offline Member
    Join Date
    Mar 2011
    Posts
    18
    Rep Power
    0

    Default

    I tried writing the code differently. I can't really see the difference between this and the original.
    This code finds some hits, but not all of them, and the line with d.get("content") prints null.

    Java Code:
    package myThesis;
    import java.io.File;
    import java.io.FileReader;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.queryParser.ParseException;
    import org.apache.lucene.queryParser.QueryParser;
    import org.apache.lucene.search.*;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.RAMDirectory;
    import org.apache.lucene.util.Version;
    
    import java.io.IOException;
    import java.io.Reader;
    import java.util.Scanner;
    
    public class Lucene2 {
    
        private File dir;
        private String path;
        
      public void RunMe() throws IOException, ParseException {
        // 0. Specify the analyzer for tokenizing text.
        //    The same analyzer should be used for indexing and searching
        StandardAnalyzer analyzer = new StandardAnalyzer(Version.LUCENE_30);
    
        // 1. create the index
        Directory index = new RAMDirectory();
        path = "C:/Users/Andreas/Documents/NetBeansProjects/docs/";
        dir = new File(path);
    
        // the boolean arg in the IndexWriter ctor means to
        // create a new index, overwriting any existing index
        IndexWriter iwriter = new IndexWriter(index, analyzer, true, IndexWriter.MaxFieldLength.UNLIMITED);
        File[] files = dir.listFiles();
        for (File file : files) {
            Reader reader = new FileReader(file);
            addDoc(iwriter, reader);
        }
        iwriter.optimize();
        iwriter.close();
    
        // 2. query
        System.out.print("Query: ");
        Scanner input = new Scanner(System.in);
        String query = input.next();
        Query q = new QueryParser(Version.LUCENE_30, "content", analyzer).parse(query);
    
        // 3. search
        int hitsPerPage = 1000;
        IndexSearcher searcher = new IndexSearcher(index, true);
        TopScoreDocCollector collector = TopScoreDocCollector.create(hitsPerPage, true);
        searcher.search(q, collector);
        ScoreDoc[] hits = collector.topDocs().scoreDocs;
    
        // 4. display results
        System.out.println("Found " + hits.length + " hits.");
        for(int i=0;i<hits.length;++i) {
          Document d = searcher.doc(hits[i].doc);
          System.out.println((i + 1) + ". " + d.get("content"));
        }
    
        // searcher can only be closed when there
        // is no need to access the documents any more.
        searcher.close();
      }
    
      private void addDoc(IndexWriter w, Reader reader) throws IOException {
        Document doc = new Document();
        doc.add(new Field("content", reader,Field.TermVector.WITH_POSITIONS_OFFSETS));
        w.addDocument(doc);
      }
    }
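
    A possible explanation for the null, offered as a sketch rather than a confirmed diagnosis: in Lucene 3.x a field built from a Reader is tokenized and indexed but never stored, so d.get("content") has nothing to return even when the document matches. If the text should be printable from a hit, one option is to read the file into a String first and add it with new Field("content", text, Field.Store.YES, Field.Index.ANALYZED), the same String-based constructor already used for "path" in Lucene1. The reading half needs only the JDK. The names ReaderText and readAll below are hypothetical helpers, not part of Lucene.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

// Hypothetical helper (not a Lucene class): drains a Reader into a String
// so the text can be passed to a String-valued, stored field.
final class ReaderText {

    private ReaderText() {
    }

    static String readAll(Reader reader) throws IOException {
        StringBuilder text = new StringBuilder();
        char[] buffer = new char[4096];
        int read;
        // read() returns -1 once the stream is exhausted
        while ((read = reader.read(buffer)) != -1) {
            text.append(buffer, 0, read);
        }
        return text.toString();
    }

    public static void main(String[] args) throws IOException {
        // A StringReader stands in for the FileReader used in the thread.
        Reader reader = new StringReader("public class User implements Serializable");
        try {
            System.out.println(readAll(reader));
        } finally {
            reader.close();
        }
    }
}
```

    Storing the full text makes the index larger, so an alternative is to store only a small identifying field (such as the path) and reopen the file when the content is needed.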

  5. #5
    Petr is offline Senior Member
    Join Date
    Jan 2011
    Location
    Russia
    Posts
    620
    Rep Power
    4

    Default

    That's cool. You posted three code examples, but none of them has a main method. :) Can you show an example that I can run?

  6. #6
    axenos is offline Member
    Join Date
    Mar 2011
    Posts
    18
    Rep Power
    0

    Default

    OK, you're right. For now I have a main class that only creates objects of these two (attempted) Lucene classes. So, for the two classes, I have this main:

    Java Code:
    package myThesis;
    
    import java.io.IOException;
    import java.sql.SQLException;
    import java.util.Scanner;
    import org.apache.lucene.index.CorruptIndexException;
    import org.apache.lucene.queryParser.ParseException;
    
    
    /**
     *
     * @author Xenos Andreas
     */
    public class Main {
        /**
         * @param args the command line arguments
         */
        public static void main(String[] args) throws IOException, SQLException, CorruptIndexException, ParseException
        {
    
            
           // System.out.print("Query: ");
            //query = input.next();
            //System.out.println();
            //Lucene1 luc = new Lucene1(query);
            Lucene2 l = new Lucene2();
            l.RunMe();
    
    
    
      }
    }

    I had some more code in the main class, but I removed it here because it dealt with JDBC and Apache BCEL. Here, I am just running Lucene.

  7. #7
    Petr is offline Senior Member
    Join Date
    Jan 2011
    Location
    Russia
    Posts
    620
    Rep Power
    4

    Default

    Oh, I confused myself. I don't understand why it happens, but it can't get the full file content back from the IndexSearcher.
    I changed your code a little. It now looks like this:
    Java Code:
     
     for (File file : files) {
                addDoc(iwriter, file);
            }
    ...
    for(int i=0;i<hits.length;++i) {
                Document d = searcher.doc(hits[i].doc);
                System.out.println((i + 1) + ". " + d.get("id"));
            }
    ...
    private void addDoc(IndexWriter w, File file) throws IOException {
            Document doc = new Document();
            doc.add(new Field("id", file.getName(), Field.Store.YES, Field.Index.NOT_ANALYZED));
            doc.add(new Field("content",new FileReader(file), Field.TermVector.WITH_POSITIONS_OFFSETS));
            w.addDocument(doc);
        }
    This is in class Lucene2.
    Now it works correctly, but I am still confused about why the original did not work. I mean this statement:
    Java Code:
      System.out.println("Hit: " + document.get(FIELD_CONTENTS));
    Hope this helps.

  8. #8
    axenos is offline Member
    Join Date
    Mar 2011
    Posts
    18
    Rep Power
    0

    Default

    It really helped me, Petr, thank you very much!

    I still don't understand what went wrong in the Lucene1 class.

    Anyway, it now works..see ya!
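
    For anyone still wondering about Lucene1: one visible difference from Lucene2 is that it indexes the text under the field name "contents" but builds its QueryParser with the default field "content". A query against a field that was never indexed matches nothing, which would be consistent with the 0 hits. The toy model below is not Lucene. TinyIndex and its methods are made up purely to illustrate the effect with plain JDK collections.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Toy stand-in for a fielded inverted index (NOT Lucene's implementation).
// Postings are keyed by "field:term", so the field name is part of every lookup.
final class TinyIndex {

    private final Map<String, Set<Integer>> postings = new HashMap<String, Set<Integer>>();

    // Lowercases the text, splits it on non-word characters, and records docId per term.
    void add(int docId, String field, String text) {
        for (String token : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (token.isEmpty()) {
                continue;
            }
            String key = field + ":" + token;
            Set<Integer> docs = postings.get(key);
            if (docs == null) {
                docs = new HashSet<Integer>();
                postings.put(key, docs);
            }
            docs.add(docId);
        }
    }

    // Returns how many documents contain the term in the given field.
    int hits(String field, String term) {
        Set<Integer> docs = postings.get(field + ":" + term.toLowerCase(Locale.ROOT));
        return docs == null ? 0 : docs.size();
    }

    public static void main(String[] args) {
        TinyIndex index = new TinyIndex();
        index.add(0, "contents", "public class User implements Serializable");
        index.add(1, "contents", "The username of the user");

        System.out.println(index.hits("content", "user"));  // prints 0: field never indexed
        System.out.println(index.hits("contents", "user")); // prints 2: field used at index time
    }
}
```

    In Lucene1 the corresponding change would be to use one name in both places, for example passing "contents" to the QueryParser constructor. Note that hitDoc.get("content") would still print null afterwards, because the text was added through a Reader and is therefore not stored.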
