- 11-05-2008, 04:03 PM #1
Member
- Join Date
- Nov 2008
- Posts
- 11
- Rep Power
- 0
Testing StringTokenizer Tokens against a text pattern
Hi,
I am pulling a record from a database that is delimited by commas, and I parse it into tokens using StringTokenizer. I want to find a way to test each token to see if it contains a certain word or words. I am unfamiliar with regex; is there a simple way to test the string and see if it is the word I am looking for?
This is one of my attempts in code to check the token against the string "zebra" to see if it contains that, but it outputs nothing. I'm not sure how I need to modify this to check inside tokens for words...
String m = "Zebra";
for(int j=0; j<149; j++)
{
    StringTokenizer parse = new StringTokenizer(Overview[j], ";");
    while(parse.hasMoreTokens())
    {
        //System.out.println(parse.nextToken());
        if(parse.nextToken().startsWith(m)){System.out.println(parse.nextToken());}
    }
}
Here is my compiler output
init:
deps-jar:
Compiling 1 source file to C:\NetBeansProjects\MSSQL Connector\build\classes
compile:
run:
BUILD SUCCESSFUL (total time: 0 seconds)
Everything outputs fine when I just use System.out.println(parse.nextToken());
- 11-05-2008, 06:55 PM #2
Quote: "Everything outputs fine when I just use"
What doesn't work with your code?
Try debugging your code by breaking it up into individual steps instead of chaining them together. Add println() between each step to see what values you are getting.
Consider what nextToken() does. How many times are you calling it in your code? What happens to the value returned by the first call to nextToken()?
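To make the hint above concrete, here is a minimal sketch of the "one step at a time" approach. Each call to nextToken() consumes a token, so the result is stored in a local variable and reused. Note also that startsWith("Zebra") can never match a token that begins with "Manufacturer:", so this sketch assumes a case-insensitive contains() check is what is wanted. The class name TokenCheck, the method findMatches, and the sample record are made up for illustration; the original Overview array is not shown in the thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

class TokenCheck {
    // Returns every ';'-delimited token of record that contains keyword (case-insensitive).
    static List<String> findMatches(String record, String keyword) {
        List<String> matches = new ArrayList<>();
        StringTokenizer parse = new StringTokenizer(record, ";");
        String needle = keyword.toLowerCase();
        while (parse.hasMoreTokens()) {
            // Call nextToken() exactly once per iteration and keep the result.
            String token = parse.nextToken().trim();
            if (token.toLowerCase().contains(needle)) {
                matches.add(token);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        // Assumed sample record, shaped like the data described in this thread.
        String record = "Manufacturer: Zebra Technologies Corporation ; Product Model: 105SL ;";
        for (String token : findMatches(record, "Zebra")) {
            System.out.println(token);
        }
    }
}
```

With the original loop, the second nextToken() inside println would print the token after the one that was tested, and could throw NoSuchElementException if the tested token was the last one.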
- 11-05-2008, 08:19 PM #3
Well, I am working on a data mining project.
Here is a piece of my sample data:
Manufacturer: Zebra Technologies Corporation ; Manufacturer Part Number: 10500-2001-0400 ; Manufacturer Website Address: sfkjshfkasjhfas ; Product Model: 105SL ; Product Name: 105SL Network Thermal Label Printer ; I
I have several thousand of these record sets. Some contain certain fields, like "Manufacturer" or "Input Voltage", and other records do not; the data is not uniform. I have each record parsed into tokens and have to load these tokens into a DB, so I need some way to check each token to see if it contains the keyphrase I am looking for. If it does, insert it; if not, go to the next token and check that against my list of possible phrases.
My issue is that, looking at the first token, "Manufacturer: Zebra Technologies Corporation ;", the only piece I am interested in checking is the first couple of words, up to the colon.
So would a method from the String class be the best way to tackle this, or a regex? I think regex is the way to go, but I have no clue where to start.
Any ideas?
Last edited by everlast88az; 11-05-2008 at 10:55 PM.
- 11-05-2008, 09:00 PM #4
Did you read my previous post with suggestions where you should look at your code? And how to debug it?
StringTokenizer or String.split could be used to tokenize your input data.
Given a token, how do you determine if it contains the keyphrase?
Quote: "piece I am interested in ..."
Quote: "see if it contains the keyphrase I am looking for"
It depends on what the keyphrase looks like. Can you give some examples? Are they case sensitive? Can you use indexOf() to find them, or do you need something else?
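As a rough illustration of the String.split and indexOf() suggestions above, the sketch below cuts each token at its first colon and looks the field name up in a set of known names. It assumes the keyphrases are whole field names compared case-insensitively, which the thread does not confirm. FieldNameCheck, KNOWN_FIELDS, fieldName and isKnownField are hypothetical names, and the list of fields is only an example.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class FieldNameCheck {
    // Hypothetical list of field names (lower case); replace with the real ones.
    static final Set<String> KNOWN_FIELDS = new HashSet<>(Arrays.asList(
            "manufacturer", "product model", "input voltage"));

    // Returns the trimmed text before the first ':' or null if the token has no colon.
    static String fieldName(String token) {
        int colon = token.indexOf(':');
        return colon < 0 ? null : token.substring(0, colon).trim();
    }

    // True if the token's field name is one of the known names, ignoring case.
    static boolean isKnownField(String token) {
        String name = fieldName(token);
        return name != null && KNOWN_FIELDS.contains(name.toLowerCase());
    }

    public static void main(String[] args) {
        String record = "Manufacturer: Zebra Technologies Corporation ; "
                + "Manufacturer Part Number: 10500-2001-0400 ; Product Model: 105SL";
        for (String token : record.split(";")) {
            System.out.println(fieldName(token) + " -> " + isKnownField(token));
        }
    }
}
```

Comparing the whole field name, rather than using startsWith("Manufacturer"), keeps "Manufacturer Part Number" from being mistaken for "Manufacturer". Cutting at the first colon also leaves any later colons (for example in a website address) inside the value.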
- 11-05-2008, 09:36 PM #5
OK, I got it working with regex up to the point where I was before. Now my problem is writing a regex that takes everything to the right of the keyphrase. For example, I have "Manufacturer: Toshiba". I want to remove the "Manufacturer: " portion, keep only the "Toshiba" part, and store it. But my regex is wrong. Can I get some advice on how to write this? Thanks.
String REGEX = ";";
String REGEX2 = "Manufacturer:";
String REGEX3 = "[^Manufacturer:\\s] ";
String INPUT = Overview[1];
Pattern p = Pattern.compile(REGEX);
String[] items = p.split(INPUT);
//for(String s : items) {
for(int s=0; s<items.length; s++){
    //System.out.println(items[s]);
}
System.out.println();
System.out.println();
System.out.println(items[1]);
if(items[1].matches(REGEX3))
{
    Pattern q = Pattern.compile(REGEX3);
    String[] items2 = q.split(items[1]);
    System.out.println(items2);
}
OUTPUT
init:
deps-jar:
Compiling 1 source file to C:\NetBeansProjects\MSSQL Connector\build\classes
compile:
run:
Manufacturer: Toshiba
BUILD SUCCESSFUL (total time: 0 seconds)
Last edited by everlast88az; 11-05-2008 at 10:56 PM.
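A note on why the if block above never runs, followed by one possible way to get the value. In a regex, "[^Manufacturer:\\s]" is a negated character class: it matches a single character that is not one of those letters, a colon, or whitespace. String.matches() also requires the whole string to match, so "Manufacturer: Toshiba" fails the test. The sketch below shows two alternatives, a capture group and a plain indexOf()/substring(). The class and method names are hypothetical, and it assumes each token has the form "key: value".

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ValueExtract {
    // Keyphrase, a colon, optional whitespace, then capture the rest of the token.
    static final Pattern MANUFACTURER =
            Pattern.compile("^\\s*Manufacturer:\\s*(.*?)\\s*$");

    // Regex version: returns the captured value, or null if the token is not a Manufacturer field.
    static String manufacturerByRegex(String token) {
        Matcher m = MANUFACTURER.matcher(token);
        return m.matches() ? m.group(1) : null;
    }

    // Plain String version: everything after the first ':' with surrounding whitespace removed.
    static String valueAfterColon(String token) {
        int colon = token.indexOf(':');
        return colon < 0 ? null : token.substring(colon + 1).trim();
    }

    public static void main(String[] args) {
        String token = " Manufacturer: Toshiba ";
        System.out.println(manufacturerByRegex(token));
        System.out.println(valueAfterColon(token));
    }
}
```

The substring version works for any field name, while the regex version is tied to one keyphrase. Which one fits better depends on how many different keyphrases have to be handled.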
- 11-05-2008, 09:43 PM #6