  #1 (permalink)  
Old 11-25-2007, 12:14 AM
Member
 
Join Date: Nov 2007
Posts: 20
Rep Power: 0
Gilgamesh is on a distinguished road
Default tokens
I've used StringTokenizer to extract the words from a text, with delimiters such as "," and ".".

Questions:

1) I tried to define a final String DELIMITERS = "!@#" (etc.), but when I write
Code:
StringTokenizer st = new StringTokenizer(line, DELIMITERS);
it recognizes the delimiters, yet it creates a single token out of each whole line of the text document (with the delimiters removed).
If you can't figure out the problem, can you please tell me whether there is a different way to set the delimiters, other than something like this?
Code:
StringTokenizer (line, ".", ",", "?");

2) Some texts use a hyphen at the end of a line to continue a word onto the next line. How can I 'unify' the two tokens that make up one word?

Last edited by Gilgamesh; 11-25-2007 at 12:19 AM.
  #2 (permalink)  
Old 11-25-2007, 12:44 AM
hardwired's Avatar
Senior Member
 
Join Date: Jul 2007
Posts: 1,577
Rep Power: 4
hardwired is on a distinguished road
Default
"a different way to set the delimiters"
Try including the space character among the delimiters.
"Some texts use a hyphen at the end of a line to continue a word onto the next line. How can I 'unify' the two tokens that make up one word?"
Do you mean removing the hyphen and concatenating the two parts so that they become a single token?
Code:
import java.util.StringTokenizer;

public class TokenDelims {
    public static void main(String[] args) {
        String s = "This is#a special test-string for testing " +
                   "deliminators in a StringTokenizer";
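        // Every character in this string is a delimiter: space, '#' and '-'.
        String delims = " #-";
        StringTokenizer st = new StringTokenizer(s, delims);
        // Print one token per line; the delimiters themselves are dropped.
        while(st.hasMoreTokens())
            System.out.println(st.nextToken());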
    }
}
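One possible explanation for the "one token per line" symptom, assuming the delimiter string held only punctuation such as "!@#": StringTokenizer treats every character of its second argument as a separate delimiter, so without a space (and tab/newline) in that string, words are never split apart. A minimal sketch of the difference; the class name DelimiterDemo, the helper tokenize and the exact contents of DELIMITERS are made up for illustration:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

class DelimiterDemo {
    // Hypothetical constant: whitespace plus some punctuation.
    // Each character is a delimiter on its own.
    static final String DELIMITERS = " \t\n\r.,;:!?@#";

    // Collects all tokens of a line into a list.
    static List<String> tokenize(String line, String delims) {
        List<String> tokens = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(line, delims);
        while (st.hasMoreTokens()) {
            tokens.add(st.nextToken());
        }
        return tokens;
    }

    public static void main(String[] args) {
        String line = "Hello world, this is a test!";
        // No whitespace among the delimiters: the whole line is one token.
        System.out.println(tokenize(line, "!@#"));      // [Hello world, this is a test]
        // Whitespace included: the line is split into words.
        System.out.println(tokenize(line, DELIMITERS)); // [Hello, world, this, is, a, test]
    }
}
```

So a single delimiter string is enough; there is no need to pass each delimiter as its own argument.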
  #3 (permalink)  
Old 11-25-2007, 01:12 AM
Member
 
Join Date: Nov 2007
Posts: 20
Rep Power: 0
Gilgamesh is on a distinguished road
Default
"remove the hyphen and concatenate the two words together to become a single token"
Yes, I mean hyphens that show a word has been broken in order to fit onto a line.

But now I realize that hyphens are also used to join words together into a compound, e.g. 'left-handed'.

That sounds difficult. Would things be easier if I used String.split() instead of StringTokenizer?

So how can I code this? If there is a hyphen, check whether the two parts on either side of it (even across a line break) exist as a compound word in the dictionary (an ArrayList or Vector); if they do not, remove the hyphen and join the two parts into one word.

pain in the neck lol

Oh, and StringTokenizer has no ignoreCase option. :-|

Last edited by Gilgamesh; 11-25-2007 at 01:17 AM.
  #4 (permalink)  
Old 11-25-2007, 03:39 AM
hardwired's Avatar
Senior Member
 
Join Date: Jul 2007
Posts: 1,577
Rep Power: 4
hardwired is on a distinguished road
Default
You can save the token in a String and convert it to lower case:
Code:
String token = st.nextToken();
token = token.toLowerCase();
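As a rough sketch of how that could look for a whole text: the class and method names below are hypothetical, and passing Locale.ROOT is an assumption to keep the result independent of the machine's default locale. For a one-off comparison, String.equalsIgnoreCase is another option.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.StringTokenizer;

class LowerCaseTokens {
    // Returns every token of the text converted to lower case,
    // so that "The" and "THE" end up as the same word.
    static List<String> lowerCaseTokens(String text, String delims) {
        List<String> tokens = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(text, delims);
        while (st.hasMoreTokens()) {
            tokens.add(st.nextToken().toLowerCase(Locale.ROOT));
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(lowerCaseTokens("The cat and THE dog", " "));
        // [the, cat, and, the, dog]
    }
}
```

The hyphen handling from the earlier question: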
Code:
import java.util.StringTokenizer;

public class TokenDelims {
    public static void main(String[] args) {
        String s = "This is a test-string for testing delimi-\n" +
                   "nators in a StringTokenizer";
        String delims = " ";
        StringTokenizer st = new StringTokenizer(s, delims);
        while(st.hasMoreTokens()) {
            String token = st.nextToken();
            // Look for a hyphen immediately followed by a line break,
            // i.e. a word that was split to fit onto a line.
            int dash = token.indexOf("-\n");
            if(dash != -1) {
                // Drop the hyphen and the line break, joining the two halves.
                token = token.substring(0, dash) +
                        token.substring(dash + 2);
            }
            System.out.println(token);
        }
    }
}
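On the String.split() and dictionary idea: one way it might be done is to repair the line-break hyphens first and only then split. The sketch below is not a complete solution. The COMPOUNDS set is a stand-in for your dictionary, the class and method names are invented, and it only looks at a single hyphen directly before a line break, so hyphens in the middle of a line are left untouched.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HyphenJoiner {
    // Hypothetical dictionary of known hyphenated compounds, in lower case.
    static final Set<String> COMPOUNDS =
            new HashSet<>(Arrays.asList("left-handed", "well-known"));

    // letters, a hyphen, a line break, then more letters
    static final Pattern LINE_BREAK_HYPHEN =
            Pattern.compile("(\\p{L}+)-\\r?\\n(\\p{L}+)");

    // Joins words broken across lines. The hyphen is kept only when
    // the joined form is a known compound.
    static String joinLineBreakHyphens(String text) {
        Matcher m = LINE_BREAK_HYPHEN.matcher(text);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            String compound = m.group(1) + "-" + m.group(2);
            String replacement =
                    COMPOUNDS.contains(compound.toLowerCase(Locale.ROOT))
                            ? compound
                            : m.group(1) + m.group(2);
            m.appendReplacement(sb, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    // Splits on whitespace and common punctuation, but not on '-'.
    static List<String> words(String text) {
        List<String> result = new ArrayList<>();
        for (String w : joinLineBreakHyphens(text).split("[\\s.,;:!?]+")) {
            if (!w.isEmpty()) {
                result.add(w);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "A left-\nhanded writer tested delimi-\nnators.";
        System.out.println(words(text));
        // [A, left-handed, writer, tested, deliminators]
    }
}
```

split() takes a regular expression, so the delimiters go into a character class instead of a plain string. Whether that is easier than StringTokenizer is mostly a matter of taste; the dictionary lookup is the part that decides how good the result is.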
Copyright ©2006 - 2007, www.java-forums.org