- 04-12-2011, 04:32 PM #1
Senior Member
- Join Date
- May 2010
- Posts
- 113
- Rep Power
- 0
How to parse input files that have different types of delimiters
Hi,
Can anyone please help me with this:
How do I parse an input file that may use any of several delimiters (for example tab, comma, tilde, caret, etc.)?
Each input file I get uses some delimiter, and I don't know in advance which one it is.
How do I code this in Java?
Java Code:
// FIRST STEP: open the input file and read it record by record
Scanner in = new Scanner(readin);
while (in.hasNextLine()) {
    String input = in.nextLine();
    // How should I handle the delimiter and get the data?
}
Examples of input files:
File 1:
"2000,2020,100,300"
In this file (records 1 to n) the delimiter is a comma, and each record is wrapped in double quotes, which I need to remove to get the data:
column[1] = 2000
column[2] = 2020
column[3] = 100
column[4] = 300
I want to get the data into an array.
File 2:
2000 2020 100 300
In this file (records 1 to n) the delimiter is a tab. How do I get the data into the array?
File 3:
2000~2020~100~300
File 4:
2000=2020=100=300
File 5:
2000|'2020'|100|300
In this file | is the delimiter, and I also need to drop the ' characters so that just 2020 ends up in column[2].
Please help me handle these different input files and their delimiters in plain Java.
- 04-12-2011, 04:44 PM #2
Moderator
- Join Date
- Apr 2009
- Posts
- 10,438
- Rep Power
- 16
There must be some other rule about the data you are reading in for you to determine the delimiter.
For example, is the data allowed to have spaces in it?
eg
Some Data|Some Other Data
Is it all numbers?
100,200,300
What?
Because at the moment what you are trying to do would rank as impossible without some additional rules about the format of the file.
- 04-12-2011, 04:50 PM #3
Senior Member
- Join Date
- May 2010
- Posts
- 113
- Rep Power
- 0
Sir,
The tab-delimited file looks like this:
4318 4318 11 11
4318 4318 14 14
4318 4318 200 200
The comma-delimited file looks like this, and I also need to remove the " at the start and end of each record:
"45010,45010,100,3100"
"45020,45020,100,3100"
The data should always be numbers; otherwise the program should throw an error.
I should get exactly 4 numbers per record.
From the file above,
for record 1 I should get:
column[1] = 45010
column[2] = 45010
column[3] = 100
column[4] = 3100
Did I answer your question, Sir?
I need to handle any delimiter the input file might use, in Java code, to get the data.
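Since every record is supposed to hold exactly four numbers, that rule alone is enough to pick the delimiter. A minimal sketch, assuming the delimiter is one of a fixed candidate list (the list below is an assumption based on the delimiters named in the first post, and the class name DelimiterGuesser and method parseFourNumbers are hypothetical): strip the quotes, try each candidate, and keep the first one that yields four all-digit fields.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

class DelimiterGuesser {

    // Hypothetical candidate list: tab, comma, tilde, caret, equals, pipe, space.
    private static final List<String> CANDIDATES =
            Arrays.asList("\t", ",", "~", "^", "=", "|", " ");

    /** Returns the four numbers of one record, or throws if no candidate delimiter fits. */
    static int[] parseFourNumbers(String line) {
        // Remove double and single quotes, then surrounding whitespace.
        String cleaned = line.replace("\"", "").replace("'", "").trim();
        for (String delim : CANDIDATES) {
            // Pattern.quote makes "|" and "^" literal instead of regex metacharacters.
            String[] fields = cleaned.split(Pattern.quote(delim));
            if (fields.length == 4 && allDigits(fields)) {
                int[] numbers = new int[4];
                for (int i = 0; i < 4; i++) {
                    numbers[i] = Integer.parseInt(fields[i]);
                }
                return numbers;
            }
        }
        throw new IllegalArgumentException("Not a record of four numbers: " + line);
    }

    // True when every field consists only of digits.
    private static boolean allDigits(String[] fields) {
        for (String f : fields) {
            if (!f.matches("\\d+")) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(parseFourNumbers("\"45010,45010,100,3100\"")));
        System.out.println(Arrays.toString(parseFourNumbers("4318\t4318\t11\t11")));
        System.out.println(Arrays.toString(parseFourNumbers("2000|'2020'|100|300")));
    }
}
```

The all-digits check does not guard against values beyond the int range, so Integer.parseInt can still throw a NumberFormatException for very long numbers.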
- 04-12-2011, 04:51 PM #4
Member
- Join Date
- Apr 2011
- Posts
- 3
- Rep Power
- 0
.split is your friend
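One caveat with split: its argument is a regular expression, not a plain string. Commas, tildes and tabs work as-is, but | (and ^ or .) are regex metacharacters, so "2000|2020".split("|") splits around every single character. A small sketch (the class name SplitRegexDemo and the helper splitLiteral are hypothetical) comparing the forms:

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class SplitRegexDemo {

    // Split on a delimiter taken literally, whatever characters it contains.
    static String[] splitLiteral(String line, String delimiter) {
        return line.split(Pattern.quote(delimiter));
    }

    public static void main(String[] args) {
        String line = "2000|2020|100|300";
        // A bare "|" is an empty alternation, so this yields single characters, pipes included.
        System.out.println(Arrays.toString(line.split("|")));
        // Escaped or quoted, the pipe is matched literally.
        System.out.println(Arrays.toString(line.split("\\|")));
        System.out.println(Arrays.toString(splitLiteral(line, "|")));
        // Characters with no special regex meaning need no escaping.
        System.out.println(Arrays.toString("2000~2020~100~300".split("~")));
    }
}
```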
- 04-12-2011, 04:54 PM #5
Senior Member
- Join Date
- May 2010
- Posts
- 113
- Rep Power
- 0
I have tried using split:
Java Code:
// FIRST STEP: open the input file and read it record by record
Scanner in = new Scanner(readin);
while (in.hasNextLine()) {
    String input = in.nextLine();
    // If there are any double or single quotes in the data, remove them before using it.
    // Any additional lines without valid numeric values should not be considered.
    input = input.replaceAll("\"+", "");
    input = input.replaceAll("\'+", "");
    if (input.length() == 0) {
    } else {
        input = input.trim();
        String delims = "[ .,?!\t]+";
        String[] column = input.split(delims);
        // ... (rest of the loop not shown in the post)
    }
}
but it throws an exception.
Here I ran it on the tab-delimited input file. The error I get is:
Exception in thread "main" java.lang.NumberFormatException: For input string: "4318 4318 11 11"
at java.lang.NumberFormatException.forInputString(Unknown Source)
at java.lang.Integer.parseInt(Unknown Source)
at java.lang.Integer.parseInt(Unknown Source)
at MainClass.main(MainClass.java:253)
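The message shows that Integer.parseInt received the whole line. The posted code stops before the parseInt call, so the cause can't be confirmed from the thread; one possibility is that parseInt was called on input rather than on an element of column, another is that the gaps are not an ordinary tab or space (for example a non-breaking space, U+00A0, which [ .,?!\t]+ does not include). A quick check is to print the char value of every non-digit character in a line. A sketch (SeparatorInspector and separatorChars are hypothetical names):

```java
import java.util.ArrayList;
import java.util.List;

class SeparatorInspector {

    // Collects the char values (as ints) of every non-digit character in a line, in order.
    static List<Integer> separatorChars(String line) {
        List<Integer> codes = new ArrayList<Integer>();
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (!Character.isDigit(c)) {
                codes.add((int) c);
            }
        }
        return codes;
    }

    public static void main(String[] args) {
        // 9 = tab, 32 = space, 160 = non-breaking space.
        System.out.println(separatorChars("4318\t4318\t11\t11"));
        System.out.println(separatorChars("4318\u00A04318\u00A011\u00A011"));
    }
}
```

If 160 or another unexpected value turns up, adding that character to the character class, or splitting on \D+ (any run of non-digits), would cover it.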
- 04-12-2011, 04:56 PM #6
Senior Member
- Join Date
- Oct 2010
- Location
- Germany
- Posts
- 780
- Rep Power
- 4
There are many ways...
A few ideas:
#1
Java Code:
while (in.hasNextLine()) {
    Matcher m = Pattern.compile("\\d+").matcher(in.nextLine());
    while (m.find()) {
        System.out.print(m.group() + " ");
    }
    System.out.println();
}
#2
Java Code:
String[] column = in.nextLine().replaceAll("(\\d+).{1}", "$1|").split("\\|");
System.out.println(Arrays.toString(column));
#3
use of nested Scanner objects :)
#4
....next :)
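The nested-Scanner idea is only mentioned, not shown. One reading of it, as a sketch (NestedScannerDemo and readRecords are hypothetical names): an outer Scanner reads lines, and for each line an inner Scanner with useDelimiter("\\D+") treats every run of non-digit characters, quotes included, as a delimiter. Scanner skips delimiters before a token, so a leading " is not a problem.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class NestedScannerDemo {

    // Reads every line of the input and returns the integers found on each non-empty line.
    static List<List<Integer>> readRecords(String text) {
        List<List<Integer>> records = new ArrayList<List<Integer>>();
        Scanner lines = new Scanner(text);
        while (lines.hasNextLine()) {
            // Inner scanner: any run of non-digits (comma, tab, ~, |, quotes...) separates tokens.
            Scanner fields = new Scanner(lines.nextLine()).useDelimiter("\\D+");
            List<Integer> record = new ArrayList<Integer>();
            while (fields.hasNextInt()) {
                record.add(fields.nextInt());
            }
            fields.close();
            if (!record.isEmpty()) {
                records.add(record);
            }
        }
        lines.close();
        return records;
    }

    public static void main(String[] args) {
        String input = "\"45010,45010,100,3100\"\n4318\t4318\t11\t11\n2000|'2020'|100|300\n";
        System.out.println(readRecords(input));
    }
}
```

Because a minus sign also counts as a non-digit, negative numbers would lose their sign; that fits this data only if all values are non-negative.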
- 04-12-2011, 04:57 PM #7
Moderator
- Join Date
- Apr 2009
- Posts
- 10,438
- Rep Power
- 16
It is your friend, but even split will have problems with some of this.
The quotes for starters (though they could be stripped after the event).
But the starting point is probably a split() regex based on all the possible delimiters.
Identify each of the 4 numbers, stripping out quotes as necessary.
Then shoot whoever decided this was a good idea. Unless this is an exercise I suppose.
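Following that suggestion, a minimal sketch, assuming the delimiters are exactly the ones mentioned in the thread (tab, comma, tilde, caret, equals, pipe) plus spaces; the class name AllDelimitersSplit and method splitRecord are hypothetical. Quotes are stripped first, the line is split on any run of those characters, and anything other than four numeric fields is rejected.

```java
import java.util.Arrays;

class AllDelimitersSplit {

    // One character class with every known delimiter. Inside [...], '|' is literal,
    // and '^' is literal when it is not the first character.
    private static final String DELIMS = "[\\t,~=|^ ]+";

    static String[] splitRecord(String line) {
        // Remove double and single quotes, then surrounding whitespace.
        String cleaned = line.replaceAll("[\"']", "").trim();
        String[] fields = cleaned.split(DELIMS);
        if (fields.length != 4) {
            throw new IllegalArgumentException("Expected 4 fields, got " + fields.length + ": " + line);
        }
        for (String f : fields) {
            if (!f.matches("\\d+")) {
                throw new IllegalArgumentException("Not a number: " + f);
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitRecord("\"2000,2020,100,300\"")));
        System.out.println(Arrays.toString(splitRecord("2000^2020^100^300")));
        System.out.println(Arrays.toString(splitRecord("2000|'2020'|100|300")));
    }
}
```

A delimiter missing from DELIMS (a semicolon, say) leaves the line unsplit and triggers the exception, which is the trade-off for naming the delimiters explicitly.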
- 04-12-2011, 05:04 PM #8
Member
- Join Date
- Apr 2011
- Posts
- 3
- Rep Power
- 0
If the file is complicated, you should learn and use regex; it's really useful for a lot of things.
- 04-12-2011, 05:11 PM #9
Senior Member
- Join Date
- May 2010
- Posts
- 113
- Rep Power
- 0
- 04-12-2011, 05:53 PM #10
Senior Member
- Join Date
- Oct 2010
- Location
- Germany
- Posts
- 780
- Rep Power
- 4
#1
Pattern (Java Platform SE 6)
\\d = A digit: [0-9]
+ = one or more times
Matcher (Java Platform SE 6)
find():
Attempts to find the next subsequence of the input sequence that matches the pattern
group()
Returns the input subsequence matched by the previous match.
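Put together, find() and group() can fill the array directly instead of printing. A sketch of that idea (DigitExtractor and extractNumbers are hypothetical names); because only runs of digits are matched, the quotes and whatever delimiter the file uses are skipped without being named.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DigitExtractor {

    // Compiled once and reused for every line.
    private static final Pattern NUMBER = Pattern.compile("\\d+");

    static int[] extractNumbers(String line) {
        List<Integer> found = new ArrayList<Integer>();
        Matcher m = NUMBER.matcher(line);
        while (m.find()) {                          // next run of digits
            found.add(Integer.parseInt(m.group())); // the text of that run
        }
        int[] result = new int[found.size()];
        for (int i = 0; i < result.length; i++) {
            result[i] = found.get(i);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(extractNumbers("\"45010,45010,100,3100\"")));
        System.out.println(Arrays.toString(extractNumbers("2000|'2020'|100|300")));
    }
}
```

Checking result.length == 4 afterwards would cover the "exactly four numbers" rule from earlier in the thread.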
#2
String (Java Platform SE 6)
replaceAll("(\\d+).{1}", "$1|") replaces each number plus the one character after it (your delimiter) with the number and a marker character (here |; you can use any other character if you want :D)
As an example, 200~ is replaced by 200|
After that, the string is split at |
On your example with file 1:
2000,2020,100,300
-->
"2000|2020|100|300".split("\\|")
-->
column[0] = 2000
column[1] = 2020
column[2] = 100
column[3] = 300
Is it working at all? ;)
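One thing to check: .{1} needs a character after every number. When the last number on a line has nothing after it (as in the unquoted 2000,2020,100,300), the regex engine backtracks and 300 becomes 30|, so the last column comes out as 30. A possible fix, sketched below (ReplaceThenSplit and its two methods are hypothetical names), is to require a non-digit \D instead of any character, so a number at the very end of the line has no match and keeps all its digits.

```java
import java.util.Arrays;

class ReplaceThenSplit {

    // Idea as posted: number + any one following character -> number + "|".
    static String[] withAnyChar(String line) {
        return line.replaceAll("(\\d+).{1}", "$1|").split("\\|");
    }

    // Variant: the following character must be a non-digit (\D), so a number
    // at the very end of the line is not matched and keeps all of its digits.
    static String[] withNonDigit(String line) {
        return line.replaceAll("(\\d+)\\D", "$1|").split("\\|");
    }

    public static void main(String[] args) {
        String line = "2000,2020,100,300"; // quotes already stripped
        System.out.println(Arrays.toString(withAnyChar(line)));  // [2000, 2020, 100, 30]
        System.out.println(Arrays.toString(withNonDigit(line))); // [2000, 2020, 100, 300]
    }
}
```

Both versions assume exactly one delimiter character between numbers and that quotes were removed beforehand; a comma followed by a space, for instance, would leave the space in front of the next number.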
- 04-12-2011, 06:00 PM #11
Senior Member
- Join Date
- May 2010
- Posts
- 113
- Rep Power
- 0
Thank you very much, Sir.
I have used your code, put it in a function, and called it:
public static String getDelimiter(String str) {
    Pattern p = Pattern.compile("([^A-Za-z0-9])");
    // trim() removes whitespace at the start and end, so it is not picked as the delimiter
    Matcher m = p.matcher(str.trim());
    if (m.find())
        return m.group(0);
    else
        return null;
}
And it is working .
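One caution before relying on this for every file: split treats its argument as a regular expression, so a delimiter returned by getDelimiter is only safe as-is for characters such as ",", "~", "=" or a tab. For "|" the line is split around every character, and "^" or "." do not split as intended either. Wrapping the result in Pattern.quote makes it literal. A sketch (the class name QuotedDelimiterSplit and the helper splitOnDetected are hypothetical; getDelimiter is copied from the post):

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuotedDelimiterSplit {

    // getDelimiter as posted: the first character that is not a letter or digit.
    public static String getDelimiter(String str) {
        Pattern p = Pattern.compile("([^A-Za-z0-9])");
        Matcher m = p.matcher(str.trim()); // trim() removes leading/trailing whitespace
        if (m.find())
            return m.group(0);
        else
            return null;
    }

    // Splits on the detected delimiter, taken literally via Pattern.quote.
    // If getDelimiter returns null, Pattern.quote(null) throws a NullPointerException.
    static String[] splitOnDetected(String line) {
        String delimiter = getDelimiter(line);
        return line.trim().split(Pattern.quote(delimiter));
    }

    public static void main(String[] args) {
        String line = "2000|2020|100|300";
        System.out.println(Arrays.toString(line.split(getDelimiter(line)))); // pipe used as a regex
        System.out.println(Arrays.toString(splitOnDetected(line)));           // pipe taken literally
    }
}
```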
Thank you very much. Your answer and your explanation were a great help and meant a lot to me.
Thanks again.
Sir, please look at the code below and the getDelimiter method.
Where should I throw an exception when I don't find a delimiter?
How do I catch Java errors and write my own exception for them?
Java Code:
while (in.hasNextLine()) {
    String input = in.nextLine();
    // If there are any double or single quotes in the ccln data, remove them before using it.
    // Any additional lines without a valid class code or line number should not be considered.
    input = input.replaceAll("\"+", "");
    input = input.replaceAll("\'+", "");
    input = input.trim();
    if (input.isEmpty()) {
        continue; // skip blank lines
    }
    // Question: where should I throw an exception when getDelimiter(input) returns null?
    String[] column = input.split(getDelimiter(input));
}
Last edited by renu; 04-12-2011 at 06:16 PM. Reason: How to throw an exception when the delimiter is not found.
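On the exception question, one common approach (a sketch, not the only option): define a checked exception, throw it from a variant of getDelimiter instead of returning null, and catch it in the reading loop, where you decide whether to skip the line or stop. DelimiterNotFoundException, DelimiterParser and parseLines are hypothetical names; the split uses Pattern.quote so that "|" and "^" are taken literally.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical checked exception for a record with no usable delimiter.
class DelimiterNotFoundException extends Exception {
    DelimiterNotFoundException(String message) {
        super(message);
    }
}

class DelimiterParser {

    private static final Pattern NON_ALNUM = Pattern.compile("[^A-Za-z0-9]");

    // Like getDelimiter above, but throws instead of returning null.
    static String getDelimiter(String str) throws DelimiterNotFoundException {
        Matcher m = NON_ALNUM.matcher(str.trim());
        if (m.find()) {
            return m.group();
        }
        throw new DelimiterNotFoundException("No delimiter found in: " + str);
    }

    // Reads all lines; lines without a delimiter are reported on stderr and skipped.
    static List<String[]> parseLines(Scanner in) {
        List<String[]> records = new ArrayList<String[]>();
        while (in.hasNextLine()) {
            // Strip double and single quotes, then surrounding whitespace.
            String input = in.nextLine().replaceAll("[\"']+", "").trim();
            if (input.isEmpty()) {
                continue;
            }
            try {
                String delimiter = getDelimiter(input);
                records.add(input.split(Pattern.quote(delimiter)));
            } catch (DelimiterNotFoundException e) {
                System.err.println("Skipping line: " + e.getMessage());
            }
        }
        return records;
    }

    public static void main(String[] args) {
        Scanner in = new Scanner("\"45010,45010,100,3100\"\n12345\n2000~2020~100~300\n");
        for (String[] record : parseLines(in)) {
            System.out.println(java.util.Arrays.toString(record));
        }
    }
}
```

If a missing delimiter should reject the whole file instead, parseLines could declare throws DelimiterNotFoundException and drop the catch, leaving the decision to its caller.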