  1. #1
    renu is offline Senior Member
    Join Date
    May 2010
    Posts
    117
    Rep Power
    0

    Question How to parse input files that have different types of delimiters

    Hi,

    Can anyone please help me with this?

    How do I parse an input file that may use any of several delimiters (for example tab, comma, tilde, caret, etc.)?

    I receive an input file that uses one of these delimiters, and I do not know in advance which one.

    How do I code this in Java?

    Java Code:
    // FIRST STEP: open the input file and read it record by record
    Scanner in = new Scanner(readin);
    while (in.hasNextLine()) {
        String input = in.nextLine();
        // How should I handle the delimiter and get the data?
    }

    examples of input files :-

    file 1 :-

    "2000,2020,100,300"

    In this file (records 1 to n) the delimiter is a comma, and each record is wrapped in double quotes, which I have to strip to get the data.
    column[1] = 2000
    column[2] = 2020
    column[3] = 100
    column[4] = 300
    I am reading the data into an array.

    file 2 :-
    2000 2020 100 300

    In this file (records 1 to n) the delimiter is a tab. How do I get the data into the array?

    file 3 :-
    2000~2020~100~300

    file 4 :-
    2000=2020=100=300

    file5 :-
    2000|'2020'|100|300

    In this file | is the delimiter, and I also have to drop the ' characters so that only the value 2020 ends up in column[2].

    Please help me handle input files with different delimiters in plain Java.

  2. #2
    Tolls is offline Moderator
    Join Date
    Apr 2009
    Posts
    11,802
    Rep Power
    19

    Default

    There must be some other rule about the data you are reading in for you to determine the delimiter.

    For example, is the data allowed to have spaces in it?
    eg
    Some Data|Some Other Data

    Is it all numbers?
    100,200,300

    What?

    Because at the moment what you are trying to do would rank as impossible without some additional rules about the format of the file.

  3. #3
    renu is offline Senior Member
    Join Date
    May 2010
    Posts
    117
    Rep Power
    0

    Default

    Sir,

    The file with the tab delimiter looks like this:

    4318 4318 11 11
    4318 4318 14 14
    4318 4318 200 200

    The file with the comma delimiter looks like this; here I also have to strip the " at the front and back of each record:

    "45010,45010,100,3100"
    "45020,45020,100,3100"

    The data should always be numbers; otherwise an error should be thrown.
    I should get 4 numbers per record.
    From the file example above, for record 1 I should get:

    column[1] = 45010
    column[2] = 45010
    column[3] = 100
    column[4] = 3100

    Did I answer your question, Sir?
    My Java code should handle any delimiter the input file might have and still get the data.

  4. #4
    Akynz is offline Member
    Join Date
    Apr 2011
    Posts
    3
    Rep Power
    0

    Default

    .split is your friend

  5. #5
    renu is offline Senior Member
    Join Date
    May 2010
    Posts
    117
    Rep Power
    0

    Angry

    Java Code:
    // FIRST STEP: open the input file and read it record by record
    Scanner in = new Scanner(readin);
    while (in.hasNextLine()) {
        String input = in.nextLine();
        // Remove any double or single quotes from the data before using it.
        // Lines without valid number values should not be considered.
        input = input.replaceAll("\"+", "");
        input = input.replaceAll("'+", "");
        if (input.length() == 0) {
            continue; // skip empty lines
        }
        input = input.trim();
        String delims = "[ .,?!\t]+";

        String[] column = input.split(delims);
    I have tried using split, but it throws an exception.

    Here I ran it with a tab-delimited input file.

    The error I get is:

    Exception in thread "main" java.lang.NumberFormatException: For input string: "4318 4318 11 11"
    at java.lang.NumberFormatException.forInputString(Unknown Source)
    at java.lang.Integer.parseInt(Unknown Source)
    at java.lang.Integer.parseInt(Unknown Source)
    at MainClass.main(MainClass.java:253)

  6. #6
    eRaaaa is offline Senior Member
    Join Date
    Oct 2010
    Location
    Germany
    Posts
    787
    Rep Power
    5

    Default

    There are many ways...

    Few ideas:
    #1
    Java Code:
    while (in.hasNextLine()) {
        Matcher m = Pattern.compile("\\d+").matcher(in.nextLine());
        while (m.find()) {
            System.out.print(m.group() + " ");
        }
        System.out.println();
    }
    #2
    Java Code:
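    // \D (one non-digit) is used here instead of .{1}, which would also swallow the last digit of the final number.
    String[] column = in.nextLine().replaceAll("(\\d+)\\D", "$1|").split("\\|");
    System.out.println(Arrays.toString(column));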
    #3
    use of nested Scanner objects :)

    #4
    ....next :)
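
Idea #3 (nested Scanner objects) could look roughly like the sketch below. It assumes every field is a non-negative integer, so anything that is not a digit can be treated as a delimiter; the class and method names are made up for illustration. Signs and decimal points would be lost with this delimiter pattern.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

final class NestedScannerDemo {

    // Inner Scanner: reads every integer on one line, treating any run of
    // non-digit characters (commas, tabs, quotes, pipes, ...) as the delimiter.
    static List<Integer> readNumbers(String line) {
        List<Integer> numbers = new ArrayList<>();
        try (Scanner fields = new Scanner(line).useDelimiter("[^0-9]+")) {
            while (fields.hasNextInt()) {
                numbers.add(fields.nextInt());
            }
        }
        return numbers;
    }

    public static void main(String[] args) {
        // Sample records taken from the thread, one per line.
        String data = "\"45010,45010,100,3100\"\n4318\t4318\t11\t11\n2000|'2020'|100|300\n";
        try (Scanner lines = new Scanner(data)) { // outer Scanner: one record per line
            while (lines.hasNextLine()) {
                System.out.println(readNumbers(lines.nextLine()));
            }
        }
    }
}
```

Because the delimiter is "everything that is not a digit", the quotes do not need to be stripped separately. The caller still has to check that each record yields the expected four numbers.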

  7. #7
    Tolls is offline Moderator
    Join Date
    Apr 2009
    Posts
    11,802
    Rep Power
    19

    Default

    It is your friend, but even split will have problems with some of this.
    The quotes for starters (though they could be stripped after the event).

    But the starting point is probably a split() regex based on all the possible delimiters.
    Identify each of the 4 numbers, stripping out quotes as necessary.

    Then shoot whoever decided this was a good idea. Unless this is an exercise I suppose.
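
A minimal sketch of that starting point follows. The delimiter set (tab, space, comma, tilde, caret, equals, pipe) is an assumption based on the sample files in the first post, and the class and method names are hypothetical. Quotes are stripped first, then the record is split, and Integer.parseInt is applied to each column rather than to the whole line.

```java
import java.util.Arrays;

final class SplitOnKnownDelimiters {

    // Assumed delimiter set; adjust it to whatever the real files can contain.
    // Inside a character class, ^ (when not first) and | are literal characters.
    private static final String DELIMS = "[\\t ,~^=|]+";

    static int[] parseRecord(String line) {
        // Strip single and double quotes, then surrounding whitespace.
        String cleaned = line.replaceAll("[\"']", "").trim();
        String[] column = cleaned.split(DELIMS);
        int[] values = new int[column.length];
        for (int i = 0; i < column.length; i++) {
            // Throws NumberFormatException if a column is not a plain integer.
            values[i] = Integer.parseInt(column[i]);
        }
        return values;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(parseRecord("\"2000,2020,100,300\"")));
        System.out.println(Arrays.toString(parseRecord("2000|'2020'|100|300")));
    }
}
```

An empty line ends up as a single empty column and fails in parseInt, so blank records should be skipped before calling the method.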

  8. #8
    Akynz is offline Member
    Join Date
    Apr 2011
    Posts
    3
    Rep Power
    0

    Default

    If the file is complicated, you should learn and use regex; it's really useful for a lot of things.

  9. #9
    renu is offline Senior Member
    Join Date
    May 2010
    Posts
    117
    Rep Power
    0

    Default

    Sir eRaaaa

    Can you please explain code 1 and code 2 to me?

    Thank You.

  10. #10
    eRaaaa is offline Senior Member
    Join Date
    Oct 2010
    Location
    Germany
    Posts
    787
    Rep Power
    5

    Default

    #1
    Pattern (Java Platform SE 6)

    \\d = A digit: [0-9]
    + = one or more times

    Matcher (Java Platform SE 6)

    find():
    Attempts to find the next subsequence of the input sequence that matches the pattern

    group()
    Returns the input subsequence matched by the previous match.

    #2
    String (Java Platform SE 6)

    replaceAll("(\\d+)\\D", "$1|") replaces each number followed by one non-digit character (your delimiter) with the number and a marker character (here |, but you can use any other character if you want :D).
    Note that \\D is used rather than .{1}: the final number has no delimiter after it, so with .{1} the regex backtracks and replaces its last digit instead (300 becomes 30|).
    For example, 200~ is replaced by 200|.
    After that, the string is split at |.
    With your example from file 1:
    2000,2020,100,300
    -->
    "2000|2020|100|300".split("\\|")
    -->
    column[0] = 2000
    column[1] = 2020
    column[2] = 100
    column[3] = 300


    Is it working at all? ;)
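
Approach #1 can be combined with the requirement stated earlier in the thread (four numbers per record, otherwise an error). The following is only a sketch under that assumption; the class name, method name, and the choice of IllegalArgumentException are illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class FindDigits {

    // Compiled once and reused for every record.
    private static final Pattern NUMBER = Pattern.compile("\\d+");

    // Collects every run of digits in the line and checks the count.
    static List<String> extract(String line, int expected) {
        List<String> found = new ArrayList<>();
        Matcher m = NUMBER.matcher(line);
        while (m.find()) {          // advance to the next run of digits
            found.add(m.group());   // the text matched by that find()
        }
        if (found.size() != expected) {
            throw new IllegalArgumentException(
                    "Expected " + expected + " numbers but found " + found.size() + " in: " + line);
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(extract("\"45010,45010,100,3100\"", 4));
        System.out.println(extract("2000|'2020'|100|300", 4));
    }
}
```

The count check catches records with too few or too many numbers, but not stray letters between otherwise valid fields; a stricter check would need a pattern that matches the whole record.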

  11. #11
    renu is offline Senior Member
    Join Date
    May 2010
    Posts
    117
    Rep Power
    0

    Question

    Thank you very much, Sir.

    I have used your code, put it in a function, and called it:

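    public static String getDelimiter(String str) {
        // The first character that is not a letter or digit is taken as the delimiter.
        Pattern p = Pattern.compile("([^A-Za-z0-9])");
        // trim() removes leading and trailing whitespace before matching
        Matcher m = p.matcher(str.trim());
        if (m.find())
            return m.group(0);
        else
            return null;
    }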

    And it is working.

    Thank you very much. Your answer and your explanation were a great help and meant a lot to me.

    Thanks again.


    Sir, please look at the code below and the getDelimiter method.

    Please tell me where I should throw an exception when I don't find a delimiter.
    How do I catch Java errors and write my own exception for this?


    Java Code:
    while (in.hasNextLine()) {
        String input = in.nextLine();
        // Remove any double or single quotes in the ccln data before using it.
        // Lines without a valid class code or line number should not be considered.
        input = input.replaceAll("\"+", "");
        input = input.replaceAll("'+", "");
        if (input.length() == 0) {
            continue; // skip empty lines
        }
        input = input.trim();
        // Pattern.quote keeps delimiters such as | or ^ from being read as regex operators.
        String[] column = input.split(Pattern.quote(getDelimiter(input)));
        // Question: where should the exception go when no delimiter is found?
    Last edited by renu; 04-12-2011 at 06:16 PM. Reason: How to throw exception , when delimiter not found.
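
One possible answer to that last question, as a sketch rather than a definitive design: let getDelimiter throw instead of returning null, and catch the exception in the loop that reads the records. DelimiterNotFoundException and splitRecord are hypothetical names introduced here. The exception extends RuntimeException to keep the example short; extending Exception instead would make it a checked exception that callers must declare or catch.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class DelimiterCheck {

    // Hypothetical custom exception for records with no recognisable delimiter.
    static class DelimiterNotFoundException extends RuntimeException {
        DelimiterNotFoundException(String message) {
            super(message);
        }
    }

    private static final Pattern NON_ALNUM = Pattern.compile("[^A-Za-z0-9]");

    // Returns the first non-alphanumeric character, or throws if there is none.
    static String getDelimiter(String str) {
        Matcher m = NON_ALNUM.matcher(str.trim());
        if (m.find()) {
            return m.group();
        }
        throw new DelimiterNotFoundException("No delimiter found in: " + str);
    }

    static String[] splitRecord(String input) {
        String cleaned = input.replaceAll("[\"']", "").trim();
        // Pattern.quote makes characters such as | and ^ literal in the split regex.
        return cleaned.split(Pattern.quote(getDelimiter(cleaned)));
    }

    public static void main(String[] args) {
        for (String line : new String[] {"2000|'2020'|100|300", "2000"}) {
            try {
                System.out.println(String.join(" ", splitRecord(line)));
            } catch (DelimiterNotFoundException e) {
                // The record is reported and skipped; the loop carries on.
                System.err.println("Skipping record: " + e.getMessage());
            }
        }
    }
}
```

Throwing where the problem is detected (inside getDelimiter) and catching where a decision can be made (the read loop) keeps the null check out of every caller.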
