Thread: tokens

  1. #1
    Gilgamesh

    tokens

    I am creating tokens with StringTokenizer from an input file (in order to compare these tokens to the elements of a String array). But I have a problem with words that are hyphenated because of a line break in the file. How can I join the broken parts of these words (and then treat them as tokens ready to be tested)?

    e.g. Java is a programming language origi-
    nally developed by Sun Microsystems
    and released in 1995 as a core compo-
    nent of Sun's Java platform.

    The '-' will still be one of the delimiters, right?
    Last edited by Gilgamesh; 12-01-2007 at 10:29 PM.
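One possible approach, as a minimal sketch that assumes the whole file fits in memory: repair the broken words in the full text first, and only then hand the text to StringTokenizer. Because the line-end hyphen is removed before tokenizing, '-' does not need to be a delimiter at all, and leaving it out keeps real compounds such as "e-mail" in one piece. The class and method names (`Dehyphenate`, `joinHyphenatedWords`, `tokenize`) are made up for illustration, and the regular expression only joins a hyphen that sits between two letters at a line break.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

class Dehyphenate {

	// Glues a word back together when it was split by a hyphen at a line break,
	// e.g. "origi-\nnally" -> "originally". Only a hyphen between two letters
	// followed by a line break (optionally \r\n, with surrounding blanks) is removed.
	static String joinHyphenatedWords(String text) {
		return text.replaceAll("(\\p{L})-[ \\t]*\\r?\\n\\s*(\\p{L})", "$1$2");
	}

	// Splits the repaired text on whitespace and common punctuation.
	// '-' is deliberately NOT a delimiter, so compounds like "e-mail" stay whole.
	static List<String> tokenize(String text) {
		List<String> tokens = new ArrayList<>();
		StringTokenizer st = new StringTokenizer(text, " \t\r\n.,?;:!");
		while (st.hasMoreTokens()) {
			tokens.add(st.nextToken());
		}
		return tokens;
	}

	public static void main(String[] args) {
		String text = "Java is a programming language origi-\n"
				+ "nally developed by Sun Microsystems\n"
				+ "and released in 1995 as a core compo-\n"
				+ "nent of Sun's Java platform.";
		System.out.println(tokenize(joinHyphenatedWords(text)));
	}
}
```

One trade-off: a genuine compound that happens to be split at its own hyphen (for example "well-" at the end of one line and "known" on the next) is also joined, into "wellknown"; the text alone cannot tell the two cases apart.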

  2. #2
    staykovmarin

    Can you post some of your code? I don't understand what you are doing.

  3. #3
    Gilgamesh

    Better now?

  4. #4
    staykovmarin

    I am not sure how you read the file, so I just read it line by line and, for testing, added each processed line to a list.

    Java Code:
    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.List;

    public class JoinHyphenated {
    	public static void main(String[] args) throws IOException {
    		List<String> file = new ArrayList<String>();
    		// start of a word that was broken by a hyphen at the end of the previous line
    		String temp = "";
    		try (BufferedReader rd = new BufferedReader(new FileReader("test/test2.txt"))) {
    			String s;
    			while ((s = rd.readLine()) != null) {
    				/*
    				 * Prepend the fragment carried over from the previous line,
    				 * so that
    				 *   foo
    				 *   foo-
    				 *   bar
    				 * becomes
    				 *   foo
    				 *   foobar
    				 */
    				String line = temp + s.trim();
    				temp = "";
    				if (line.endsWith("-")) {
    					// The broken word is everything after the last space, minus
    					// the hyphen. Carry it over and keep the rest of the line.
    					int lastSpace = line.lastIndexOf(' ');
    					temp = line.substring(lastSpace + 1, line.length() - 1);
    					if (lastSpace > 0) file.add(line.substring(0, lastSpace));
    				} else {
    					// ordinary line: keep it as it is
    					file.add(line);
    				}
    			}
    		}
    		// a hyphen on the very last line has nothing to join with
    		if (temp.length() > 0) file.add(temp);
    		System.out.println(file);
    	}
    }

  5. #5
    Gilgamesh

    I use a BufferedReader and call readLine() until it returns null; then a StringTokenizer creates the tokens (with delimiters such as .,?;: etc.).

    So I want not only the tokens defined by the delimiters, but also tokens made from words that have a hyphen at the end of one line and continue on the next line.

    (Then I want to take each and every token, whichever of these two ways it was created, and do something with it.)
    Last edited by Gilgamesh; 12-02-2007 at 02:09 AM.
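Since the tokens are produced line by line with StringTokenizer, the join can also be done at the token level instead of on whole lines. A minimal sketch under these assumptions: '-' is kept out of the delimiter set (otherwise the trailing hyphen is gone before you can see it), and when the last token of a line ends with '-', it is held back and glued to the first token of the next line. The names `HyphenTokens`, `tokens` and the `keywords` array are hypothetical; `keywords` stands in for your own String array.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.StringTokenizer;

class HyphenTokens {

	// Whitespace plus the punctuation from the question. '-' is left out on
	// purpose so that a trailing hyphen is still visible on the last token.
	static final String DELIMS = " \t.,?;:!";

	// Tokenizes the input line by line. When the last token of a line ends with
	// '-', it is held back (without the hyphen) and prefixed to the first token
	// of the next line. The caller is responsible for closing the reader.
	static List<String> tokens(Reader in) throws IOException {
		List<String> result = new ArrayList<>();
		BufferedReader rd = new BufferedReader(in);
		String pending = ""; // first half of a word broken at the previous line end
		String line;
		while ((line = rd.readLine()) != null) {
			StringTokenizer st = new StringTokenizer(line, DELIMS);
			while (st.hasMoreTokens()) {
				String tok = pending + st.nextToken();
				pending = "";
				boolean lastOnLine = !st.hasMoreTokens();
				if (lastOnLine && tok.length() > 1 && tok.endsWith("-")) {
					pending = tok.substring(0, tok.length() - 1); // wait for the rest
				} else {
					result.add(tok);
				}
			}
		}
		if (!pending.isEmpty()) result.add(pending); // hyphen on the very last line
		return result;
	}

	public static void main(String[] args) throws IOException {
		String text = "Java is a programming language origi-\n"
				+ "nally developed by Sun Microsystems\n"
				+ "and released in 1995 as a core compo-\n"
				+ "nent of Sun's Java platform.";
		// hypothetical stand-in for the String array the tokens are compared to
		String[] keywords = {"originally", "component", "platform"};
		List<String> known = Arrays.asList(keywords);
		for (String tok : tokens(new StringReader(text))) {
			if (known.contains(tok)) {
				System.out.println("match: " + tok);
			}
		}
	}
}
```

With the sample text this prints three matches: originally, component and platform. To read the real file, pass `new FileReader("test/test2.txt")` instead of the StringReader, ideally inside a try-with-resources block so the file gets closed; `tokens` itself does not close the reader it is given.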

  6. #6
    Gilgamesh
