  1. #1
    jazzermonty is offline Member
    Join Date
    Jan 2011
    Posts
    71
    Rep Power
    0

    Problem with Scanner - using delimiters

    Hello folks

    First, this isn't a homework project; it's just a pet project of mine. The problem I have is as follows:

    I have a large email list which has been provided to me by a third party. The third party doesn't have any validation on their email field, so the end users can input any old rubbish. The data has been supplied to me in a *.csv file. Here are the steps:

    1. Remove duplicates. Doddle, just read the *.csv file into a HashSet.
    2. Fix syntax errors. This is where the problem lies.

    I have examples of emails that are as follows:

    abc@somewhere.com
    abc@somewhereelse,com

    The first example is the happy path and I can deal with that. The second is where my problem lies. I'm already using "," as the delimiter, so when populating the HashSet the second example gives me "abc@somewhereelse". With the large array of top-level domains out there, I can't see how I can get the full email into the set so that I can then correct it (substitute the comma with a full stop). Any ideas? Is there any way I can escape the comma that sits in front of the domain ending, but not the comma at the end of each entry? Note, there may be more than one comma in each email, but I have a plan to deal with those.

    Just to be clear, it's obvious from looking at the email addresses that the second example is nothing more than a typo. There are other entries in the file that are clearly nonsense and they will be dropped.

    Any advice would be appreciated.

    Thanks

  2. #2
    masijade is offline Senior Member
    Join Date
    Jun 2008
    Posts
    2,571
    Rep Power
    9

    Search for a CSV library (there are a number of them out there) rather than using Scanner. Also, hopefully the CSV is valid, i.e. the email containing the comma is surrounded by quotes; otherwise the stray comma cannot be told apart from a field separator and there is nothing you can do anyway.
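
    To illustrate why the quotes matter, here is a minimal sketch of quote-aware field splitting. It is not a replacement for a real CSV library; the class name QuotedCsvLine and the method splitFields are made up for this example, and it assumes a whole record fits on one line.

```java
import java.util.ArrayList;
import java.util.List;

public class QuotedCsvLine {

    // Splits one CSV line on commas, ignoring commas inside double quotes.
    // Hypothetical helper for illustration only.
    public static List<String> splitFields(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;

        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == '"') {
                if (inQuotes && i + 1 < line.length() && line.charAt(i + 1) == '"') {
                    // A doubled quote inside a quoted field is a literal quote.
                    current.append('"');
                    i++;
                } else {
                    // Opening or closing quote: toggle the state, keep nothing.
                    inQuotes = !inQuotes;
                }
            } else if (c == ',' && !inQuotes) {
                // An unquoted comma ends the current field.
                fields.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString());
        return fields;
    }

    public static void main(String[] args) {
        // Quoted: the comma stays inside the email, so it can be repaired later.
        System.out.println(splitFields("\"abc@somewhereelse,com\",abc@somewhere.com"));
        // Unquoted: the same comma is indistinguishable from a separator.
        System.out.println(splitFields("abc@somewhereelse,com,abc@somewhere.com"));
    }
}
```

    With quotes, the first line yields two fields and the broken address survives intact. Without them, it yields three fields and the address is already split, which is the situation described in the first post.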

  3. #3
    jazzermonty is offline Member
    Join Date
    Jan 2011
    Posts
    71
    Rep Power
    0

    Hi masijade

    Thanks for the reply. I looked further into the data provided and managed to get round the issue by using the CRLF as the delimiter instead of the comma. The person that created the files obviously didn't think too hard about the quality, which in a weird way helped me out. Anyhoo, the good news is that the code worked, and out of 32k records they only have 200 with quality issues (the commas being the least of their worries, and I managed to fix most of them). A change request has been issued for validation on the front end. Happy days.
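
    The actual code isn't posted here, so the following is only a rough sketch of the line-break approach, under some assumptions: one email per line, an optional trailing comma at the end of each entry, and any remaining comma treated as a mistyped full stop. The names EmailLineReader and readEmails are invented for the example.

```java
import java.io.StringReader;
import java.util.HashSet;
import java.util.Scanner;
import java.util.Set;

public class EmailLineReader {

    // Reads one entry per line, repairs commas, and de-duplicates via a HashSet.
    // Hypothetical helper; the cleanup rules are assumptions, not the poster's code.
    public static Set<String> readEmails(Readable source) {
        Set<String> emails = new HashSet<>();
        try (Scanner scanner = new Scanner(source)) {
            // Split on line breaks (CRLF or bare LF) instead of on commas.
            scanner.useDelimiter("\\r?\\n");
            while (scanner.hasNext()) {
                String entry = scanner.next().trim();
                // Assumption: a comma at the very end is the field separator, so drop it.
                entry = entry.replaceAll(",+$", "");
                // Assumption: any comma left inside the address should be a full stop.
                entry = entry.replace(',', '.');
                if (!entry.isEmpty()) {
                    emails.add(entry);
                }
            }
        }
        return emails;
    }

    public static void main(String[] args) {
        String sample = "abc@somewhere.com,\r\n"
                + "abc@somewhereelse,com,\r\n"
                + "abc@somewhere.com,\r\n";
        // Prints the two distinct, repaired addresses (HashSet order is not guaranteed).
        System.out.println(readEmails(new StringReader(sample)));
    }
}
```

    For a real file, a java.io.FileReader (or a Scanner built directly on the file) would take the place of the StringReader. Entries that are clearly nonsense would still need a separate validation pass before being dropped.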
