  1. #1
    Kosala (Member; joined Jan 2012; 3 posts)

    How to extract sentences from a txt book?

    I want to extract sentences from a text file. Sentences are delimited by one of the characters ".", "!", "?" or "]".

    The following code separates sentences, but the problem is that it also treats the next line as a new sentence. Some sentences in the text file are separated by blank lines.

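    import java.io.File;
    import java.util.Scanner;
    import java.util.regex.Pattern;

    // (inside a method declared with "throws FileNotFoundException")
    File file = new File("book.txt");
    Scanner scanner = new Scanner(file);
    String sens;

    // Delimiter: any single one of . ! ? ]
    Pattern p = Pattern.compile("[\\.\\!\\?\\]]");
    scanner.useDelimiter(p);
    while (scanner.hasNext()) {
        sens = scanner.next().trim();
        System.out.println(sens);
    }
    scanner.close();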

    Is there a quick way to fix this in Java?

  2. #2
    Norm (Moderator; joined Jun 2008; location: SW Missouri; 17,318 posts)

    Re: How to extract sentences from a txt book?

    Quote: "the problem is, it also separates next line as a new sentence."
    Please explain what you mean. When I execute the code with this input:
    Java Code:
          Scanner scanner = new Scanner("this\nis on\nthree lines. This is\non two\n");
    I get two sentences as output. Because they contain \n characters, each one prints over more than one line.
    Change your println to this to see where each one starts and ends:
    Java Code:
           System.out.println(">" + sens +"<");
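    For reference, here is a minimal self-contained sketch of this experiment (the class DelimiterDemo and the helper tokens are hypothetical names). It makes visible that trim() only strips whitespace at the two ends of each token, so the \n characters inside a sentence are kept:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;
import java.util.regex.Pattern;

public class DelimiterDemo {
    // Same delimiter as in the original post: any single one of . ! ? ]
    private static final Pattern SENTENCE_END = Pattern.compile("[\\.\\!\\?\\]]");

    // Returns the trimmed tokens that Scanner produces.
    // trim() removes whitespace only at the ends, so embedded '\n' characters remain.
    static List<String> tokens(String text) {
        List<String> result = new ArrayList<>();
        try (Scanner scanner = new Scanner(text)) {
            scanner.useDelimiter(SENTENCE_END);
            while (scanner.hasNext()) {
                result.add(scanner.next().trim());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        for (String sens : tokens("this\nis on\nthree lines. This is\non two\n")) {
            // The > and < markers show exactly where each token starts and ends.
            System.out.println(">" + sens + "<");
        }
    }
}
```

    Running main prints each token between > and <, and the line breaks inside the tokens show up as line breaks in the output.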

  3. #3
    Kosala (Member)

    Re: How to extract sentences from a txt book?

    Many thanks for the quick reply. I am trying to solve the programming challenge at http://cplus.about.com/od/programmingchallenges/a/challenge26.htm, so as a first step I want to split the text file into sentences. I am doing it in Java.

    Each sentence I get has extra whitespace (line breaks and blank lines) inside it, for example:

    >Frankenstein,
    or the Modern Prometheus

    by
    Mary Wollstonecraft (Godwin) Shelley

    Letter 1
    St<

    The code:
    File file = new File("book.txt");
    Scanner scanner = new Scanner(file);

    Pattern p = Pattern.compile("[\\.\\!\\?\\]]");
    scanner.useDelimiter(p);
    while (scanner.hasNext()) {
        String sens = scanner.next();
        // Print each token between > and < to see its exact boundaries
        System.out.println(">" + sens + "<");
    }
    scanner.close();


    How can I remove this unnecessary whitespace from the sentences? Or is something wrong with this approach?

    Thanks a million
    Kosala

  4. #4
    Norm (Moderator)

    Re: How to extract sentences from a txt book?

    Quote: "remove these unnecessary space"
    Replace every 2 spaces with one space?
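    One way to apply this suggestion, sketched under the assumption that every run of whitespace (spaces, tabs, line breaks, blank lines) should become a single space: String.replaceAll with the regex \\s+ collapses each run in one pass, so the two-spaces replacement does not have to be repeated and newlines are handled too. SentenceCleaner, normalize and sentences are hypothetical names, and the sample text is made up from the output shown earlier in the thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;
import java.util.regex.Pattern;

public class SentenceCleaner {
    // Same delimiter as in the thread: any single one of . ! ? ]
    private static final Pattern SENTENCE_END = Pattern.compile("[\\.\\!\\?\\]]");

    // Collapses every run of whitespace (including line breaks and blank lines)
    // into a single space, then strips leading/trailing whitespace.
    static String normalize(String raw) {
        return raw.replaceAll("\\s+", " ").trim();
    }

    // Splits text on the sentence delimiters and normalizes each piece,
    // skipping pieces that are empty after normalization.
    static List<String> sentences(String text) {
        List<String> result = new ArrayList<>();
        try (Scanner scanner = new Scanner(text)) {
            scanner.useDelimiter(SENTENCE_END);
            while (scanner.hasNext()) {
                String sens = normalize(scanner.next());
                if (!sens.isEmpty()) {
                    result.add(sens);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String sample = "Frankenstein,\nor the Modern Prometheus\n\nby\n"
                + "Mary Wollstonecraft (Godwin) Shelley. Letter 1!";
        for (String sens : sentences(sample)) {
            System.out.println(">" + sens + "<");
        }
    }
}
```

    The first piece of the sample comes out on one line as "Frankenstein, or the Modern Prometheus by Mary Wollstonecraft (Godwin) Shelley".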

  5. #5
    Kosala (Member)

    Re: How to extract sentences from a txt book?

    Hi Norm, many thanks.
    How do I write a regular expression for these sentence separators?
    * Period .
    * Exclamation mark !
    * Question mark ?
    * Period plus double quote ."
    * A Closing square bracket ]

    I wrote it as Pattern p = Pattern.compile("[\\.\\!\\?\\]]"); but I do not know how to include the period-plus-double-quote case. Can someone give me a hand?

    Thanks in advance.
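    A sketch of one way to add the period-plus-double-quote separator, assuming the other separators stay as in the original pattern: use regex alternation, \\.\"|[.!?\\]], with the two-character case listed first so that at a period the regex engine tries to consume the closing quote before falling back to the bare period. Inside the character class, ".", "!" and "?" need no escaping; only "]" does. QuoteAwareSplitter and sentences are hypothetical names, and collapsing whitespace with \\s+ is an added assumption, not part of the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;
import java.util.regex.Pattern;

public class QuoteAwareSplitter {
    // Delimiter: a period followed by a double quote, OR any single one of . ! ? ]
    // The ." branch is listed first, so at a period the engine tries to consume
    // the closing quote as well before falling back to the bare period.
    static final Pattern SENTENCE_END = Pattern.compile("\\.\"|[.!?\\]]");

    static List<String> sentences(String text) {
        List<String> result = new ArrayList<>();
        try (Scanner scanner = new Scanner(text)) {
            scanner.useDelimiter(SENTENCE_END);
            while (scanner.hasNext()) {
                // Collapse whitespace runs (including line breaks) into one space.
                String sens = scanner.next().replaceAll("\\s+", " ").trim();
                if (!sens.isEmpty()) {
                    result.add(sens);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String sample = "He said, \"Stop.\" Then he left! Was it late? [Note] End.";
        for (String sens : sentences(sample)) {
            System.out.println(">" + sens + "<");
        }
    }
}
```

    With this delimiter the closing quote no longer leaks into the start of the next sentence, although an opening quote stays attached to its own sentence (the first piece of the sample is He said, "Stop). If !" and ?" should count as well, the pattern "[.!?]\"?|\\]" generalizes the same idea.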
