- 01-29-2011, 10:38 PM #1
Member
- Join Date
- Jan 2011
- Posts
- 3
- Rep Power
- 0
Tokenizer with data validation for missing text
I'm working on a project where we have to import a file to an array. The file has country, city, region, region number, and population.
I got the constructor and tokenizer to work, but the program also needs to do exception handling. For example, if the population is blank, it shouldn't try to create an object from that line.
My question is: can I even use a tokenizer in this case, or will I need to rewrite it to use substring? I didn't think about it when I started, but if the tokenizer splits on blank space, then it will skip over a missing field and try to parse the word after it. It has crashed every time I tried to input a file that was missing something.
- 01-29-2011, 10:45 PM #2
Moderator
- Join Date
- Feb 2009
- Location
- New Zealand
- Posts
- 4,545
- Rep Power
- 11
It's sort of hard to tokenise something that isn't there!
I think I would check each token expecting all 5 to be there. Two different things could go wrong: what I find doesn't match what I expect (numbers where I expect alphabetic characters or vice versa) or not enough tokens. That second case effectively finds the blanks. (There is a third possibility: too many tokens. Maybe that should be flagged as an error.)
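As a rough sketch of the check described above (not the original poster's code, which isn't shown), something like the following would count the tokens first and then test the two numeric fields. The class name `TokenCountCheck`, the `validate` method and the choice of delimiter are assumptions for illustration; the field order follows the first post.

```java
import java.util.StringTokenizer;

class TokenCountCheck {
    // country, city, region, region number, population (order taken from the first post)
    static final int EXPECTED_FIELDS = 5;

    /** Returns null if the line looks valid, otherwise a short description of the problem. */
    static String validate(String line, String delimiters) {
        StringTokenizer st = new StringTokenizer(line, delimiters);
        int count = st.countTokens();
        if (count < EXPECTED_FIELDS) {
            return "too few tokens: " + count;   // a field was blank or missing
        }
        if (count > EXPECTED_FIELDS) {
            return "too many tokens: " + count;
        }
        st.nextToken(); // country
        st.nextToken(); // city
        st.nextToken(); // region
        String regionNumber = st.nextToken().trim();
        String population = st.nextToken().trim();
        if (!isWholeNumber(regionNumber)) {
            return "region number is not numeric: " + regionNumber;
        }
        if (!isWholeNumber(population)) {
            return "population is not numeric: " + population;
        }
        return null;
    }

    /** True if the string is non-empty and contains only digits. */
    static boolean isWholeNumber(String s) {
        if (s.isEmpty()) {
            return false;
        }
        for (int i = 0; i < s.length(); i++) {
            if (!Character.isDigit(s.charAt(i))) {
                return false;
            }
        }
        return true;
    }
}
```

The count check only says that something is missing, not which field it was, so the error message can't be more specific than "too few tokens".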
------------------------
If the data uses some sort of fixed width format then missing fields are detectable (and you know which field is missing). In that case I probably would use substring(), trim() what it returns and check for empty or bad strings.
Last edited by pbrockway2; 01-29-2011 at 10:47 PM.
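A minimal sketch of the substring() and trim() idea for fixed-width data might look like this. The column boundaries in `STARTS` and `ENDS` are made up, since the thread never gives the real layout, and `FixedWidthParser` is a hypothetical name.

```java
class FixedWidthParser {
    // Hypothetical column boundaries: country, city, region, region number, population.
    // The real widths are not given in the thread.
    static final int[] STARTS = {0, 15, 30, 40, 45};
    static final int[] ENDS   = {15, 30, 40, 45, 55};

    /** Extracts one field by position and trims it; tolerates lines shorter than the layout. */
    static String field(String line, int index) {
        int start = Math.min(STARTS[index], line.length());
        int end = Math.min(ENDS[index], line.length());
        return line.substring(start, end).trim();
    }

    /** Returns the index of the first blank field, or -1 if every field has content. */
    static int firstBlankField(String line) {
        for (int i = 0; i < STARTS.length; i++) {
            if (field(line, i).isEmpty()) {
                return i;
            }
        }
        return -1;
    }
}
```

Because each field is read by position, a blank field comes back as an empty string at a known index, so the code can report exactly which field is missing. Clamping the indices to the line length avoids a StringIndexOutOfBoundsException when trailing blanks have been stripped from the line.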
- 01-29-2011, 11:00 PM #3
Member
- Join Date
- Jan 2011
- Posts
- 3
- Rep Power
- 0
Yes, it does use a fixed-width format. I started writing it again using substring, figuring the tokenizer wouldn't work with a blank field. I was just hoping to avoid doing it if possible.
Now that you mention it, I didn't think of testing for blank tokens. I got the validation to work checking for strings and numbers in the wrong field, but I'll see if I can add something to check for empty tokens too.
- 01-29-2011, 11:17 PM #4
Member
- Join Date
- Jan 2011
- Posts
- 3
- Rep Power
- 0
Got it working! I added a catch to handle a NoSuchElementException and it kept processing the rest of the file.
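The actual code isn't posted, but the general shape of that fix is probably something like the sketch below. The `City` class, the `parseAll` method and the delimiter parameter are all assumptions for illustration. The extra catch for NumberFormatException covers the "text in a numeric field" case mentioned earlier in the thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.StringTokenizer;

class SkipBadLines {
    /** Hypothetical record type; the real class is not shown in the thread. */
    static class City {
        final String country;
        final String city;
        final String region;
        final int regionNumber;
        final long population;

        City(String country, String city, String region, int regionNumber, long population) {
            this.country = country;
            this.city = city;
            this.region = region;
            this.regionNumber = regionNumber;
            this.population = population;
        }
    }

    /** Builds a City from each well-formed line and skips the rest. */
    static List<City> parseAll(List<String> lines, String delimiters) {
        List<City> cities = new ArrayList<>();
        for (String line : lines) {
            try {
                StringTokenizer st = new StringTokenizer(line, delimiters);
                String country = st.nextToken().trim();
                String city = st.nextToken().trim();
                String region = st.nextToken().trim();
                int regionNumber = Integer.parseInt(st.nextToken().trim());
                long population = Long.parseLong(st.nextToken().trim());
                cities.add(new City(country, city, region, regionNumber, population));
            } catch (NoSuchElementException e) {
                // nextToken() ran out of tokens: at least one field was missing
                System.err.println("Skipping line (missing field): " + line);
            } catch (NumberFormatException e) {
                // a numeric field held something that is not a number
                System.err.println("Skipping line (bad number): " + line);
            }
        }
        return cities;
    }
}
```

Putting the try/catch inside the loop is what lets the rest of the file keep processing after a bad line.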
Thanks for the help.
- 01-29-2011, 11:18 PM #5
Moderator
- Join Date
- Feb 2009
- Location
- New Zealand
- Posts
- 4,545
- Rep Power
- 11
"I'll see if I can add something to check for empty tokens too."
Maybe I wasn't clear - I don't think you'll find blank tokens because the tokeniser may well skip them.
What you might find is that the tokeniser only reports four tokens. In that case you know there was a blank because there should have been five tokens.
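A quick way to see this behaviour is to count the tokens on a line with an empty field. In the sketch below the comma delimiter and the class name `EmptyTokenDemo` are assumptions. The comparison with `String.split` and a negative limit is not something suggested in this thread; it is only included to show an approach that does keep the empty field.

```java
import java.util.StringTokenizer;

class EmptyTokenDemo {
    /** Number of tokens StringTokenizer reports; runs of delimiters are treated as one. */
    static int tokenizerCount(String line, String delimiters) {
        return new StringTokenizer(line, delimiters).countTokens();
    }

    /** Number of fields String.split reports; a negative limit keeps empty fields. */
    static int splitCount(String line, String regex) {
        return line.split(regex, -1).length;
    }

    public static void main(String[] args) {
        String line = "NZ,Auckland,,1,1400000"; // region is empty
        System.out.println(tokenizerCount(line, ",")); // prints 4
        System.out.println(splitCount(line, ","));     // prints 5
    }
}
```

With the tokeniser, the empty region simply disappears, so the only sign of trouble is that four tokens come back instead of five.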
-------------------------
But, again, fixed width says "substring" to me.
[Edit] ... slow post ;(
Glad you've got it working. Catching a NoSuchElementException is one way of seeing if the data doesn't have enough tokens.