  1. #1
    Kangaroo128 is offline Member
    Join Date
    Aug 2009
    Posts
    14
    Rep Power
    0

    Delimiter question

    I have a quick question about a problem I'm having with the delimiter in the Scanner class. I'm writing a basic concordance program, and my output is supposed to display every unique word with its corresponding line numbers. Everything seems to work as it should, except that the line numbers of blank lines show up in my output as an entry with no word. For example, the output should read:

    a : 2,5,7,10
    bike : 3, 19
    etc...

    But instead I'm getting
    : 3, 9
    a : 2,5,7,10
    bike : 3, 19

    I'm sure this is a simple fix but I just don't know what the problem is. Here is my code:

    Java Code:
    import java.util.*;
    
    /**
     * The Concordance class takes input from
     * a file and returns a list of unique words
     * and the line numbers on which each unique
     * word can be found.
     */
    public class Concordance 
    {
    	static TreeMap<String,TreeSet<Integer>> wordMap = new TreeMap<String, TreeSet<Integer>>();
    	
    	public static void main(String[] args) 
    	{
    		Scanner input = new Scanner(System.in);
    		input.useDelimiter("\\W+");
    		int lineNum = 0;
    		String words;
    		String[] wordArray;
    		
    		while(input.hasNextLine())
    		{
    			words = input.nextLine();
    			wordArray = words.toLowerCase().split(" ");
    			lineNum++;  // Start a new line.
    			
    			for(int i = 0; i < wordArray.length; i++)
    			{
    				addWord(wordArray[i], lineNum);
    			}
    		}
    		printTreeMap(wordMap);
    	}
    	
    .
    . (Rest of Code)
    .
    
    }
    Thanks in advance!
    Last edited by Kangaroo128; 09-07-2009 at 11:40 PM.

  2. #2
    Kangaroo128 is offline Member
    Join Date
    Aug 2009
    Posts
    14
    Rep Power
    0

    I forgot to mention that I have a feeling the delimiter isn't working because I'm reading the entire line into my variable with input.nextLine() instead of reading each word with input.next(). But I need to go through the text one line at a time, since I have to keep track of the line number. So therein lies my pickle: how do I delimit the text while also working through it a line at a time?

  3. #3
    Tolls is offline Moderator
    Join Date
    Apr 2009
    Posts
    12,172
    Rep Power
    20

    Quote Originally Posted by Kangaroo128
    I forgot to mention that I have a feeling the delimiter isn't working because I'm reading the entire line into my variable with input.nextLine() instead of reading each word with input.next(). But I need to go through the text one line at a time, since I have to keep track of the line number. So therein lies my pickle: how do I delimit the text while also working through it a line at a time?
    Don't use the delimiter then. Just use String#split()?

    To be honest I have no idea what you're trying to do...could you give us a couple of lines of data and the expected output, and why that should be the expected output?
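
The suggestion above could look roughly like the following. This is only a sketch: the class name LineSplitSketch and the method wordsWithLineNumbers are made up for illustration and are not part of the original Concordance code. It reads with nextLine() so the line number stays known, and lets String#split do the delimiting instead of Scanner's useDelimiter (which nextLine() does not use).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

// Hypothetical helper class, not part of the original Concordance code.
class LineSplitSketch {

    // Reads the text one line at a time and returns "lineNum:word" entries,
    // so the line number is still known for every word.
    static List<String> wordsWithLineNumbers(Scanner input) {
        List<String> result = new ArrayList<>();
        int lineNum = 0;
        while (input.hasNextLine()) {
            String line = input.nextLine();
            lineNum++;
            // Split on runs of non-word characters instead of a single space.
            for (String word : line.toLowerCase().split("\\W+")) {
                result.add(lineNum + ":" + word);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Scanner input = new Scanner("This is a test.\nThis is another test.");
        // prints [1:this, 1:is, 1:a, 1:test, 2:this, 2:is, 2:another, 2:test]
        System.out.println(wordsWithLineNumbers(input));
    }
}
```

Note that this sketch does nothing special for blank lines: an empty line still produces one empty string from split().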

  4. #4
    phoenix123 is offline Member
    Join Date
    Sep 2009
    Posts
    6
    Rep Power
    0

    refer example on java2examples.com

  5. #5
    Tolls is offline Moderator
    Join Date
    Apr 2009
    Posts
    12,172
    Rep Power
    20

    Quote Originally Posted by phoenix123
    refer example on java2examples.com
    Any particular example, or are you just plugging a website?

  6. #6
    phoenix123 is offline Member
    Join Date
    Sep 2009
    Posts
    6
    Rep Power
    0

    use StringTokenizer, refer

    java2examples.com/java/Commonly%20Used%20Java%20Classes/StringTokenizer/stringTokenizerList.php

  7. #7
    r035198x is offline Senior Member
    Join Date
    Aug 2009
    Posts
    2,388
    Rep Power
    8

    Quote Originally Posted by phoenix123
    use StringTokenizer, refer

    java2examples.com/java/Commonly%20Used%20Java%20Classes/StringTokenizer/stringTokenizerList.php
    I'd rather you didn't use StringTokenizer at all. Read its API documentation to find out why. String#split() should be preferred these days.
    And please stop spamming the forums with your site suggestions.

  8. #8
    Tolls is offline Moderator
    Join Date
    Apr 2009
    Posts
    12,172
    Rep Power
    20

    Quote Originally Posted by phoenix123
    use StringTokenizer, refer

    java2examples.com/java/Commonly%20Used%20Java%20Classes/StringTokenizer/stringTokenizerList.php
    1. Then why didn't you post that in the first place?

    2. Not a good advert for your site. From the API:

    "StringTokenizer is a legacy class that is retained for compatibility reasons although its use is discouraged in new code."

    Hence my use of split() above.
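
For comparison, here is a rough sketch of the two approaches side by side. The class and method names are invented for illustration. With its default delimiters StringTokenizer breaks on whitespace only; trimming and then splitting on "\\s+" gives the same tokens for ordinary lines.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.StringTokenizer;

// Hypothetical comparison class, for illustration only.
class SplitVsTokenizerSketch {

    // Legacy approach: StringTokenizer with its default whitespace delimiters.
    static List<String> withTokenizer(String line) {
        List<String> tokens = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(line);
        while (st.hasMoreTokens()) {
            tokens.add(st.nextToken());
        }
        return tokens;
    }

    // Preferred approach: trim, then split on runs of whitespace.
    static List<String> withSplit(String line) {
        String trimmed = line.trim();
        if (trimmed.isEmpty()) {
            return new ArrayList<>();   // avoid the single "" that split() would return
        }
        return Arrays.asList(trimmed.split("\\s+"));
    }

    public static void main(String[] args) {
        String line = "This is  another test.";
        System.out.println(withTokenizer(line)); // [This, is, another, test.]
        System.out.println(withSplit(line));     // [This, is, another, test.]
    }
}
```

The advantage of split() is that the delimiter is a full regex, so punctuation can be handled in the same call.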

  9. #9
    Fubarable is offline Moderator
    Join Date
    Jun 2008
    Posts
    19,316
    Blog Entries
    1
    Rep Power
    26

    phoenix, all of your posts have such similarity as to suggest that you're spamming answers with plugs to a web site. Please be careful with this.

  10. #10
    Kangaroo128 is offline Member
    Join Date
    Aug 2009
    Posts
    14
    Rep Power
    0

    Sorry it took a while to get back about this. Basically, a concordance reads a text file and returns each unique word in it along with the line numbers where that word appears. For example:

    Text File:
    This is a test.
    This is another test. This-
    is yet another test.

    Output:
    a : 1
    another: 2, 3
    is : 1, 2, 3
    test: 1, 2, 3
    this: 1, 2
    yet: 3

    I think I got them all covered. But you get the idea. My problem is, my output is returning empty lines as :
    : 2, 4, 9

    And also returning words with characters like a comma or dash as:
    another, : 1, 2, 3
    this- : 1, 2, 3

    It needs to "delimit" the text and basically not return any characters other than letters, which is what I'm telling it to do.

    As you guys have mentioned, splitting with ("\\W+") gets rid of characters such as commas and dashes, but it still spits out the pesky blank lines for some reason. I have no clue how to fix it.
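
One likely explanation for the blank entry: calling split() on an empty string returns an array holding that one empty string, and a line that starts with a delimiter yields a leading empty string. Either way an empty "word" gets passed on. The sketch below assumes that is the cause; the class EmptyTokenSketch and the method wordsIn are hypothetical and not from the original program. It simply skips empty tokens.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the original Concordance class.
class EmptyTokenSketch {

    // Returns the lower-cased words on one line, dropping empty tokens.
    static List<String> wordsIn(String line) {
        List<String> words = new ArrayList<>();
        for (String token : line.toLowerCase().split("\\W+")) {
            if (!token.isEmpty()) {   // skips "" from blank lines and leading delimiters
                words.add(token);
            }
        }
        return words;
    }

    public static void main(String[] args) {
        // A blank line still yields one token: the empty string itself.
        System.out.println("".split("\\W+").length);            // 1
        // A line that starts with a delimiter yields a leading empty token.
        System.out.println("  indented".split("\\W+").length);  // 2
        System.out.println(wordsIn(""));                        // []
        System.out.println(wordsIn("This is another test. This-")); // [this, is, another, test, this]
    }
}
```

In the posted loop, the equivalent change would be a check for an empty string before the call to addWord.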

  11. #11
    Tolls is offline Moderator
    Join Date
    Apr 2009
    Posts
    12,172
    Rep Power
    20

    OK.
    Who here knows regexes?
    Essentially you want to read each line in as you would for any normal file, then split() it with a regex that matches commas, spaces, etc. One thing to decide, though, is what a hyphenated word counts as: one word or two? If it's one word, that adds a complication to the regex, but I'd ignore that for the moment.

    A quick peek at regexes and I think it might be something along the lines of split("[,-\\s]"). That might (I'm not terribly good at them) cover splitting on commas, dashes and any whitespace (that's the \s bit).

    ETA: I knew I'd get it wrong...the '-' needs escaping, so the expression is "[,\\-\\s]".
    Last edited by Tolls; 09-09-2009 at 10:32 AM.
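
A small sketch of how that character class behaves, using a made-up sample line and a hypothetical class name. As written, the class matches one delimiter character at a time, so a comma followed by a space leaves an empty string between them; adding a + quantifier (an addition here, not part of the post above) treats a run of delimiters as one. Periods are not in the class, so they would stay attached to words.

```java
import java.util.Arrays;

// Hypothetical demo class, for illustration only.
class CharClassSketch {

    // The expression from the post: comma, escaped hyphen, or any whitespace.
    static final String SINGLE = "[,\\-\\s]";
    // Same class with '+', so consecutive delimiters count as one separator.
    static final String RUNS = "[,\\-\\s]+";

    public static void main(String[] args) {
        String line = "yet another, hyphen-ated test";
        // prints [yet, another, , hyphen, ated, test] (note the empty element)
        System.out.println(Arrays.toString(line.split(SINGLE)));
        // prints [yet, another, hyphen, ated, test]
        System.out.println(Arrays.toString(line.split(RUNS)));
    }
}
```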
