  1. #1
    everlast88az is offline Member

    Testing StringTokenizer Tokens against a text pattern

    Hi,

    So I am pulling a record from a database that is delimited by commas. I parse it into tokens using StringTokenizer. I want to find a way to test each token to see if it contains a certain word or words. I am unfamiliar with regex; is there a simple way to test the string and see if it is the word I am looking for?

    This is one of my attempts in code to check whether a token contains the string "Zebra", but it outputs nothing. I'm not sure how I need to modify this to check inside tokens for words...

    String m = "Zebra";
    for (int j = 0; j < 149; j++)
    {
        StringTokenizer parse = new StringTokenizer(Overview[j], ";");
        while (parse.hasMoreTokens())
        {
            //System.out.println(parse.nextToken());
            if (parse.nextToken().startsWith(m)) { System.out.println(parse.nextToken()); }
        }
    }


    Here is my compiler output

    init:
    deps-jar:
    Compiling 1 source file to C:\NetBeansProjects\MSSQL Connector\build\classes
    compile:
    run:
    BUILD SUCCESSFUL (total time: 0 seconds)



    Everything outputs fine when I just use System.out.println(parse.nextToken());

  2. #2
    Norm is offline Moderator

    "Everything outputs fine when I just use ..."
    What doesn't work with your code?

    Try debugging your code by breaking it up into individual steps instead of chaining them together. Add println() between each step to see what values you are getting.

    Consider what nextToken() does. How many times are you calling it in your code? What happens to the value returned by the first call to nextToken()?
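
To make the point about calling nextToken() twice concrete, here is a minimal sketch (not the original poster's code) that stores each token in a local variable before testing it. The class name TokenCheck, the method findTokens, and the sample record are assumptions made up for illustration. It uses contains() rather than startsWith() so that a match anywhere in the token counts.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

public class TokenCheck {
    // Returns every ';'-separated token of the record that contains the keyword.
    static List<String> findTokens(String record, String keyword) {
        List<String> hits = new ArrayList<>();
        StringTokenizer parse = new StringTokenizer(record, ";");
        while (parse.hasMoreTokens()) {
            // Call nextToken() exactly once per iteration and keep the result,
            // so the token that is tested is the same token that is reported.
            String token = parse.nextToken().trim();
            if (token.contains(keyword)) {
                hits.add(token);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Hypothetical record in the same shape as the data discussed in this thread.
        String record = "Manufacturer: Zebra Technologies Corporation ; Product Model: 105SL";
        for (String hit : findTokens(record, "Zebra")) {
            System.out.println(hit);
        }
    }
}
```

In the original loop, the first nextToken() call consumes the token that is tested and the second call fetches a different token to print (or throws NoSuchElementException if none is left), which is why keeping the value in a variable matters.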

  3. #3
    everlast88az is offline Member

    Well I am working on a data mining project

    Here is a piece of my sample data:

    Manufacturer: Zebra Technologies Corporation ; Manufacturer Part Number: 10500-2001-0400 ; Manufacturer Website Address: sfkjshfkasjhfas ; Product Model: 105SL ; Product Name: 105SL Network Thermal Label Printer ; I

    I have several thousand of these record sets. Some contain certain fields like, say, "Manufacturer" or "Input Voltage", and other records do not. The data is not uniform. So I have this parsed out into tokens, and I have to load these tokens into a db, but I need some way to check each token to see if it contains the keyphrase I am looking for: if it does, insert it; if not, go to the next token and check that against my list of possible phrases.

    My issue is, looking at this first token "Manufacturer: Zebra Technologies Corporation ;" the only piece I am interested in checking is that first couple of words up until the colon.

    So would a string method from the String class be best to tackle this, or would a regex? I think regex is the way to go, but I have no clue where to start.

    Any ideas??
    Last edited by everlast88az; 11-05-2008 at 10:55 PM.

  4. #4
    Norm is offline Moderator

    Did you read my previous post with suggestions on where to look in your code, and how to debug it?
    StringTokenizer or String.split() could be used to tokenize your input data.
    "piece I am interested in ..."
    "see if it contains the keyphrase I am looking for"
    Given a token, how do you determine if it contains the keyphrase?
    That depends on what the keyphrase looks like. Can you give some examples? Are they case sensitive? Can you use indexOf() to find them, or do you need something else?
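
As a rough sketch of the indexOf() idea, assuming every field looks like "Name: value", the text before the first colon can be cut out and compared against a list of wanted field names. The class KeyPhraseCheck, the helpers keyOf and isWanted, and the WANTED list are hypothetical names chosen for this example; the comparison is case-insensitive, which may or may not suit the real data.

```java
import java.util.Arrays;
import java.util.List;

public class KeyPhraseCheck {
    // Hypothetical list of field names to keep; adjust to the real data.
    static final List<String> WANTED = Arrays.asList("Manufacturer", "Input Voltage");

    // Returns the text before the first colon, or null if the token has no colon.
    static String keyOf(String token) {
        int colon = token.indexOf(':');
        return colon < 0 ? null : token.substring(0, colon).trim();
    }

    // Case-insensitive comparison of the token's key against the wanted list.
    static boolean isWanted(String token) {
        String key = keyOf(token);
        if (key == null) {
            return false;
        }
        for (String wanted : WANTED) {
            if (wanted.equalsIgnoreCase(key)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String record = "Manufacturer: Zebra Technologies Corporation ; Product Model: 105SL";
        for (String token : record.split(";")) {
            if (isWanted(token)) {
                System.out.println(token.trim());
            }
        }
    }
}
```

Comparing the whole key, rather than using startsWith(), keeps "Manufacturer Part Number" from being mistaken for "Manufacturer".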

  5. #5
    everlast88az is offline Member

    OK, I got it working with regex up to the point where I was before. Now my problem is developing a regex that takes everything to the right of the keyphrase. For example, I have "Manufacturer: Toshiba". I want to remove the "Manufacturer: " portion and keep only the "Toshiba" part and store it. But my regex is wrong. Can I get some advice on how to write this? Thanks.

    String REGEX = ";";
    String REGEX2 = "Manufacturer:";
    String REGEX3 = "[^Manufacturer:\\s] ";
    String INPUT = Overview[1];

    Pattern p = Pattern.compile(REGEX);
    String[] items = p.split(INPUT);
    //for(String s : items) {
    for (int s = 0; s < items.length; s++) {
        //System.out.println(items[s]);
    }

    System.out.println();
    System.out.println();
    System.out.println(items[1]);

    if (items[1].matches(REGEX3))
    {
        Pattern q = Pattern.compile(REGEX3);
        String[] items2 = q.split(items[1]);
        System.out.println(items2);
    }


    OUTPUT



    init:
    deps-jar:
    Compiling 1 source file to C:\NetBeansProjects\MSSQL Connector\build\classes
    compile:
    run:


    Manufacturer: Toshiba
    BUILD SUCCESSFUL (total time: 0 seconds)
    Last edited by everlast88az; 11-05-2008 at 10:56 PM.
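
The thread does not record the fix the poster eventually found, so the following is only one possible approach. In REGEX3 above, the square brackets form a negated character class: it matches one single character that is not one of those letters, a colon, or whitespace, followed by a space. String.matches() requires the whole string to match, so the if block never runs. A capturing group is one way to keep only the text to the right of the keyphrase. The class ValueExtract and the method manufacturerOf are hypothetical names for this sketch.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ValueExtract {
    // "Manufacturer:" as literal text, optional whitespace, then group 1 captures
    // the value; leading and trailing whitespace around the value is left out.
    static final Pattern MANUFACTURER =
            Pattern.compile("^\\s*Manufacturer:\\s*(.*?)\\s*$");

    // Returns the value after "Manufacturer:", or null if the token is another field.
    static String manufacturerOf(String token) {
        Matcher m = MANUFACTURER.matcher(token);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(manufacturerOf("Manufacturer: Toshiba "));
    }
}
```

If the pattern is not needed elsewhere, plain String methods also work: find the colon with indexOf(':') and take substring(colon + 1).trim(). Note too that printing a String[] directly, as in System.out.println(items2), prints the array's type and hash code rather than its elements; java.util.Arrays.toString(items2) prints the contents.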

  6. #6
    everlast88az is offline Member

    haha nevermind I figured it out ;)
    Last edited by everlast88az; 11-06-2008 at 12:03 AM.
