  1. #1
    dug
    dug is offline Member
    Join Date
    Jan 2011
    Posts
    3
    Rep Power
    0

    Tokenizer with data validation for missing text

    I'm working on a project where we have to read a file into an array. Each line of the file has a country, city, region, region number, and population.

    I got the constructor and tokenizer to work, but it also needs to do exception handling. For example, if the population is blank, it shouldn't try to create an object from that line.

    My question is: can I even use a tokenizer in this case, or will I need to rewrite it to use substring? I didn't think about it when I started, but if the tokenizer splits on blank space, then it will be trying to parse the word after the blank space. (It has crashed every time I tried to input a file missing something.)

  2. #2
    pbrockway2 is offline Moderator
    Join Date
    Feb 2009
    Location
    New Zealand
    Posts
    4,585
    Rep Power
    12

    It's sort of hard to tokenise something that isn't there!

    I think I would check each token expecting all 5 to be there. Two different things could go wrong: what I find doesn't match what I expect (numbers where I expect alphabetic characters or vice versa) or not enough tokens. That second case effectively finds the blanks. (There is a third possibility: too many tokens. Maybe that should be flagged as an error.)
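
A minimal sketch of that check, assuming the tokenizer in question is `java.util.StringTokenizer` with its default whitespace delimiters and that every field is a single word. The class name `RecordCheck`, the `Problem` enum, and the sample lines are made up for illustration; the thread does not show the original code.

```java
import java.util.StringTokenizer;

class RecordCheck {
    // Hypothetical result type: which kind of problem, if any, a line has.
    enum Problem { NONE, TOO_FEW_TOKENS, TOO_MANY_TOKENS, BAD_NUMBER }

    // country, city, region, region number, population
    static final int EXPECTED_FIELDS = 5;

    static Problem check(String line) {
        StringTokenizer st = new StringTokenizer(line); // splits on whitespace
        int count = st.countTokens();
        if (count < EXPECTED_FIELDS) return Problem.TOO_FEW_TOKENS;
        if (count > EXPECTED_FIELDS) return Problem.TOO_MANY_TOKENS;

        st.nextToken(); // country
        st.nextToken(); // city
        st.nextToken(); // region
        String regionNumber = st.nextToken();
        String population = st.nextToken();
        try {
            // The last two fields are expected to be numeric.
            Integer.parseInt(regionNumber);
            Long.parseLong(population);
        } catch (NumberFormatException e) {
            return Problem.BAD_NUMBER;
        }
        return Problem.NONE;
    }

    public static void main(String[] args) {
        System.out.println(check("France Paris Ile-de-France 11 2100000"));     // NONE
        System.out.println(check("France Paris Ile-de-France 11"));             // TOO_FEW_TOKENS
        System.out.println(check("France Paris Ile-de-France eleven 2100000")); // BAD_NUMBER
    }
}
```

Counting first means a short line is rejected before any `nextToken()` call can throw. Note that this only says a field is missing, not which one.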

    ------------------------

    If the data uses some sort of fixed-width format, then missing fields are detectable (and you know which field is missing). In that case I would probably use substring(), trim() what it returns, and check for empty or bad strings.
    Last edited by pbrockway2; 01-29-2011 at 11:47 PM.
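
The substring() approach might look roughly like the sketch below. The column offsets are invented, since the thread never gives the real layout, and `FixedWidthRecord`, `field` and `firstMissingField` are hypothetical names. The real file's column positions would have to replace the ones here.

```java
class FixedWidthRecord {
    // Hypothetical layout: {start, end} offsets of each column, end exclusive.
    static final int[][] COLUMNS = { {0, 12}, {12, 24}, {24, 36}, {36, 40}, {40, 50} };
    static final String[] NAMES = { "country", "city", "region", "region number", "population" };

    // Cuts column i out of the line and trims the padding.
    // Offsets are clamped so a line that ends early yields "" instead of throwing.
    static String field(String line, int i) {
        int start = Math.min(COLUMNS[i][0], line.length());
        int end = Math.min(COLUMNS[i][1], line.length());
        return line.substring(start, end).trim();
    }

    // Returns the name of the first blank field, or null if every field has text.
    static String firstMissingField(String line) {
        for (int i = 0; i < COLUMNS.length; i++) {
            if (field(line, i).isEmpty()) return NAMES[i];
        }
        return null;
    }

    public static void main(String[] args) {
        String layout = "%-12s%-12s%-12s%-4s%-10s";
        String complete = String.format(layout, "France", "Paris", "IDF", "11", "2100000");
        String noCity = String.format(layout, "France", "", "IDF", "11", "2100000");
        System.out.println(firstMissingField(complete)); // null
        System.out.println(firstMissingField(noCity));   // city
    }
}
```

Because each field is read from a known position, a blank column is reported by name rather than just showing up as a short token count.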

  3. #3
    dug

    Yes, it does use a fixed width. I started writing it again using substring, figuring the tokenizer wouldn't work with a blank space. I was just hoping to avoid doing it if possible.

    Now that you mention it, I didn't think of testing for blank tokens. I got the validation to work checking for strings and numbers in the wrong field, but I'll see if I can add something to check for empty tokens too.

  4. #4
    dug

    Got it working! I added a catch to handle a NoSuchElementException and it kept processing the rest of the file.

    Thanks for the help.
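
The original code isn't posted, so the following is only a guess at the shape of that fix, again assuming `java.util.StringTokenizer` with whitespace delimiters. `CityLoader` and `City` are hypothetical names, and the lines are passed in as a list here to keep the example self-contained rather than read from a file.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.StringTokenizer;

class CityLoader {
    // Hypothetical record type holding the five fields of one line.
    static class City {
        final String country, city, region;
        final int regionNumber;
        final long population;

        City(String country, String city, String region, int regionNumber, long population) {
            this.country = country;
            this.city = city;
            this.region = region;
            this.regionNumber = regionNumber;
            this.population = population;
        }
    }

    static List<City> load(List<String> lines) {
        List<City> cities = new ArrayList<>();
        for (String line : lines) {
            try {
                StringTokenizer st = new StringTokenizer(line);
                // Java evaluates arguments left to right, so the tokens are
                // consumed in file order.
                cities.add(new City(st.nextToken(), st.nextToken(), st.nextToken(),
                        Integer.parseInt(st.nextToken()), Long.parseLong(st.nextToken())));
            } catch (NoSuchElementException e) {
                // nextToken() ran out of tokens: the line is missing a field.
                System.err.println("Skipping line with too few fields: " + line);
            } catch (NumberFormatException e) {
                System.err.println("Skipping line with a bad number: " + line);
            }
        }
        return cities;
    }
}
```

Because the try/catch sits inside the loop, one bad line is skipped and processing continues with the next one.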

  5. #5
    pbrockway2 is offline Moderator

    "I'll see if I can add something to check for empty tokens too."

    Maybe I wasn't clear - I don't think you'll find blank tokens because the tokeniser may well skip them.

    What you might find is that the tokeniser only reports four tokens. In that case you know there was a blank because there should have been five tokens.
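
A small demonstration of that behaviour, assuming `java.util.StringTokenizer` with its default whitespace delimiters. `BlankFieldDemo` and the sample lines are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

class BlankFieldDemo {
    // Collects every token StringTokenizer reports for a line.
    static List<String> tokens(String line) {
        List<String> out = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(line);
        while (st.hasMoreTokens()) {
            out.add(st.nextToken());
        }
        return out;
    }

    public static void main(String[] args) {
        String complete = "France Paris IDF 11 2100000";
        String blankRegion = "France Paris     11 2100000"; // region column left blank

        System.out.println(tokens(complete).size()); // 5
        System.out.println(tokens(blankRegion));     // [France, Paris, 11, 2100000]
    }
}
```

The run of spaces is treated as one separator, so no empty token appears. The remaining tokens shift left, and the count alone cannot say which field was blank.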

    -------------------------

    But, again, fixed width says "substring" to me.

    [Edit] ... slow post ;(

    Glad you've got it working. Catching a NoSuchElementException is one way of seeing if the data doesn't have enough tokens.
