Thread: Parse structured text file
- 12-22-2012, 02:00 AM #1
Member
- Join Date
- Nov 2010
- Posts
- 24
- Rep Power
- 0
Parse structured text file
I need to parse a structured text file, but this is fairly new to me (I'm still a beginner).
I have this text structure:
Java Code:
type=TEXT;value=Prvi primer nekog teksta;
type=TEXT;value=Drugi primer nekog teksta;
type=NUMBER;value=Prvi primer nekog teksta;
type=NUMBER;value=1234;
type=NUMBER;value=1234aaa;
type=TEXT;value=1234aaa;
type=TEXT;value=1235;
I've successfully managed simple file parsing (reading line by line and printing each line), but now I need to check the structure of the parsed file, and that I don't know how to do.
I'm not asking for code examples (yet :) ), only for guidance on how to do this (because I need to do more complex things later in my project).
So my first question is: how do I check whether the parsed document follows the given structure?
After parsing, I need to check whether the values correspond to the given types. Each type needs its own separate class that does the checking (I'm mentioning this to make the requirements clearer).
What should I use here (collections, generics, enums)?
What approach do You suggest?
Thank You.
- 12-22-2012, 06:15 AM #2
Member
- Join Date
- Dec 2012
- Posts
- 74
- Rep Power
- 0
Re: Parse structured text file
You could do the following for each line:
1. Use String[] fields = line.split(";"), where line is the String variable holding each line of data read from your text file.
This uses the semicolon as the delimiter, so you end up with two String objects in the array called "fields".
2. Then, for each field, use split("=") to split it into subfields, check that "type" and "value" are there, and check that "TEXT" or "NUMBER" is in the right spot.
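The two steps above could be sketched roughly as follows. This is only an illustration: the class name SplitLineParser, the nested Entry holder and the parse method are hypothetical names, not anything from the thread, and the sketch assumes every line has exactly the form type=...;value=...; with no semicolon inside the value.

```java
import java.util.Optional;

/** Hypothetical helper illustrating the two-step split described above. */
final class SplitLineParser {

    /** Holds the two parts of one line, e.g. type "NUMBER" and value "1234". */
    static final class Entry {
        final String type;
        final String value;

        Entry(String type, String value) {
            this.type = type;
            this.value = value;
        }
    }

    /** Returns the parsed entry, or an empty Optional when the structure is wrong. */
    static Optional<Entry> parse(String line) {
        // split(";") drops trailing empty strings, so "a;b;" yields exactly two fields.
        String[] fields = line.trim().split(";");
        if (fields.length != 2) {
            return Optional.empty();
        }
        // A limit of 2 keeps any further '=' characters inside the right-hand part.
        String[] typeField = fields[0].split("=", 2);
        String[] valueField = fields[1].split("=", 2);
        if (typeField.length != 2 || valueField.length != 2) {
            return Optional.empty();
        }
        if (!typeField[0].equals("type") || !valueField[0].equals("value")) {
            return Optional.empty();
        }
        if (!typeField[1].equals("TEXT") && !typeField[1].equals("NUMBER")) {
            return Optional.empty();
        }
        return Optional.of(new Entry(typeField[1], valueField[1]));
    }
}
```

A caller would run SplitLineParser.parse(line) on each line read from the file and treat an empty result as a structural error. A value that itself contains a semicolon is rejected by this sketch, because the first split then produces more than two fields.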
You could use a regular expression, but I believe that the above method is good enough that you could avoid the complexity of learning how to write regular expressions.
Regular Expressions in Java:
Lesson: Regular Expressions (The Java™ Tutorials > Essential Classes)
- 12-23-2012, 11:48 PM #3
Senior Member
- Join Date
- Oct 2010
- Location
- Germany
- Posts
- 780
- Rep Power
- 4
Re: Parse structured text file
The problem with kaydeli2's solution shows up when a semicolon can also occur in the text value :-( Then you may get more than two strings in the array.
I would use the Scanner class with a regex for parsing (btw: split uses regular expressions too :P).
For validating, hmm yes, maybe an enum with a regex too, plus an isValid(String value) method?
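One possible shape for such an enum is sketched below. The enum name ValueType and both regular expressions are assumptions made for illustration: TEXT is taken to allow letters, digits and spaces, and NUMBER to allow digits only. The thread does not define the exact rules.

```java
import java.util.regex.Pattern;

/** Hypothetical enum: one constant per line type, each carrying its own regex. */
enum ValueType {
    // Assumed rule: text may contain letters, digits and spaces.
    TEXT("[a-zA-Z0-9 ]+"),
    // Assumed rule: a number consists of digits only.
    NUMBER("\\d+");

    private final Pattern pattern;

    ValueType(String regex) {
        this.pattern = Pattern.compile(regex);
    }

    /** True when the whole value matches the regex of this type. */
    boolean isValid(String value) {
        return pattern.matcher(value).matches();
    }
}
```

The type string read from the file can be turned into a constant with ValueType.valueOf("NUMBER"), which throws IllegalArgumentException for an unknown type, so an unknown type and an invalid value can be reported separately.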
- 12-24-2012, 02:47 AM #4
Member
- Join Date
- Nov 2010
- Posts
- 24
- Rep Power
- 0
Re: Parse structured text file
I used the kaydell2 approach: I created a separate Data class, and it does the column splits.
I'm also using a not-so-complicated regular expression to check whether the value part of the string array consists of digits and lowercase or capital letters, to determine its validity.
Each text line I read becomes a separate instance of the Data class, and I store each instance in an ArrayList.
It works, but what eRaaaa mentioned is interesting.
This is not a requirement, but it might be handy to implement it that way since it could occur.
We'll come to that later, since I have other things to implement here.
The second part is this:
I need to validate each line of text by its type (TEXT, NUMBER), but in a separate class.
Personally, I know a simpler way to do this, but here I'm required to call a specific class (one for TEXT and one for NUMBER) depending on the line's type. I've tried searching online, but no luck.
Can someone tell me what approach should be used here?
Also, those validators should be loaded dynamically at run time from a config file, using the ResourceBundle mechanism (this is totally unfamiliar to me, but maybe it clarifies how the validators should be created and implemented).
Thank You.
Last edited by mcdhappy80; 12-24-2012 at 02:52 AM.
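A rough sketch of how validators might be looked up through a ResourceBundle and instantiated by reflection follows. Every name here (LineValidator, TextLineValidator, NumberLineValidator, ValidatorLoader, demoBundle) is hypothetical, and so are the validation rules. To stay self-contained the sketch builds the bundle from an in-memory string. In a real project the same key=value pairs would presumably live in a .properties file on the classpath and be read with ResourceBundle.getBundle.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.MissingResourceException;
import java.util.PropertyResourceBundle;
import java.util.ResourceBundle;

/** Hypothetical contract that every per-type validator implements. */
interface LineValidator {
    boolean isValid(String value);
}

/** Assumed rule: text may contain letters, digits and spaces. */
final class TextLineValidator implements LineValidator {
    public boolean isValid(String value) {
        return value.matches("[a-zA-Z0-9 ]+");
    }
}

/** Assumed rule: a number consists of digits only. */
final class NumberLineValidator implements LineValidator {
    public boolean isValid(String value) {
        return value.matches("\\d+");
    }
}

/** Maps a type name to a validator class name via a ResourceBundle. */
final class ValidatorLoader {
    private final ResourceBundle bundle;

    ValidatorLoader(ResourceBundle bundle) {
        this.bundle = bundle;
    }

    /** Stand-in for a .properties file with lines such as NUMBER=NumberLineValidator. */
    static ResourceBundle demoBundle() {
        String config = "TEXT=" + TextLineValidator.class.getName() + "\n"
                + "NUMBER=" + NumberLineValidator.class.getName() + "\n";
        try {
            return new PropertyResourceBundle(new StringReader(config));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Looks up the class name for the type and creates a new validator instance. */
    LineValidator forType(String type) {
        try {
            String className = bundle.getString(type);
            return (LineValidator) Class.forName(className)
                    .getDeclaredConstructor()
                    .newInstance();
        } catch (MissingResourceException | ReflectiveOperationException e) {
            throw new IllegalArgumentException("No usable validator for type " + type, e);
        }
    }
}
```

With this arrangement the parsing code only needs the type string of a line: new ValidatorLoader(bundle).forType(type).isValid(value). Adding a new type then means adding one class and one line to the config file, without touching the parser.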
-
Re: Parse structured text file
Sometimes it's best not to re-invent the wheel, and since your file structure looks like it may comply with a CSV standard (although using semicolons rather than commas) and thus be parse-able with a CSV parser, you may be able to use one of these tools, many of which are available for downloading with a little searching.
- 12-24-2012, 01:16 PM #6
Member
- Join Date
- Nov 2010
- Posts
- 24
- Rep Power
- 0
Re: Parse structured text file
I agree, but this is a job application project, so I'm forced to re-invent it whether I like it or not :).
I've found this opencsv - Frequently Asked Questions and will look into it.
If You can recommend other tools or libraries (it's hard to search for something when You don't know what You are searching for), I'm happy to hear Your suggestions.
Thank You.
- 12-25-2012, 02:55 PM #7
Member
- Join Date
- Nov 2010
- Posts
- 24
- Rep Power
- 0
Re: Parse structured text file - Strange dot symbol

Can someone tell me why I see this symbol when I parse the text file? When I open the file in a text editor, the symbol is not there.
What can cause an invisible symbol to appear, and what would be the best approach to deal with it?
If You need to analyze my piece of code, just say so and I will be happy to share it.
Also, I've switched from BufferedReader to the Scanner class (I've read it is newer and has some advanced features for dealing with streams, thanks Fubarable).
Thank You.
Last edited by mcdhappy80; 12-25-2012 at 03:37 PM. Reason: Updated info
- 12-25-2012, 08:02 PM #8
- 12-26-2012, 12:35 AM #9
Member
- Join Date
- Nov 2010
- Posts
- 24
- Rep Power
- 0
Re: Parse structured text file - Strange dot symbol
Thank You DarrylBurke.
After Your answer I googled the BOM, and here's how I solved my problem.
On this link I found a class, which I've incorporated into my existing code as a separate Java class.
Here is my text parse function with Scanner, updated with the new class that removes the BOM:
Java Code:
// Needs: java.io.File, java.io.FileInputStream, java.io.IOException,
//        java.util.ArrayList, java.util.Scanner
private static void parseFunction(String pFajl) {
    ArrayList<Podatak> al = null;
    Scanner in = null;
    FileInputStream is = null;
    UnicodeBOMInputStream ubis = null; // BOM Class addition
    try {
        File tempFajl = new File(pFajl);
        if(tempFajl.exists()) {
            is = new FileInputStream(tempFajl);
            ubis = new UnicodeBOMInputStream(is); // BOM Class addition
            ubis.skipBOM(); // BOM Class addition
            in = new Scanner(ubis); // BOM Class addition - ubis parameter
            //String line = in.nextLine(); // Obsolete - doing the split inside the class
            in.useDelimiter(";"); // New code line
            //String [] columns = null; // Obsolete - doing the split inside the class
            al = new ArrayList<>();
            while(in.hasNextLine() != false) // New code line
            //while(line!= null && !line.equals("="))
            //while(line!= null)
            {
                // columns = line.split(";"); // Obsolete - doing the split inside the class
                //Podatak p = new Podatak(columns[0],columns[1]); // Obsolete - doing the split inside the class
                Podatak p = new Podatak(in); // New constructor - doing the split inside the class
                //al.add(p); // Still not adding instance, before I develop a validation mechanism inside the class itself
                //line = in.nextLine(); // Obsolete
                in.nextLine(); // New code line
            }
        } else {
            System.out.println("File\n" + tempFajl.getPath() + "\ndoesn't exist!\nCheck file path.");
        }
        if(al != null) // Prevents NullPointerException when file is not loaded
        {
            for(Podatak p : al) {
                System.out.println(p.toString());
                //System.out.println("checkDataStructure: " + p.proveriStrukturuPodatka());
            }
        }
    } catch(IOException e) {
        System.out.println("Error IOException:\n" + e.toString() + "\n" + e.getStackTrace());
    } catch(Exception e) {
        System.out.println("Error Exception:\n" + e.toString() + "\n" + e.getStackTrace());
    } finally {
        try {
            if(in != null) // Prevents NullPointer Exception
            {
                in.close();
            }
        } catch(Exception e) {
            System.out.println("Unable to close stream: " + e);
        }
    }
}
Hope this piece of code helps someone else solve a similar problem.
Last edited by mcdhappy80; 12-26-2012 at 04:22 AM. Reason: Removed bolded lines
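For readers who do not have the UnicodeBOMInputStream class at hand, the same effect can probably be achieved with the standard library alone, at least when the file is UTF-8. The sketch below assumes UTF-8 input. The class BomSkipper and its methods are hypothetical names. It relies on Java's UTF-8 decoder handing the BOM through as the character U+FEFF, which the code reads and discards.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper that skips a leading UTF-8 byte order mark, if present. */
final class BomSkipper {

    /** Wraps the stream in a UTF-8 reader positioned after an optional BOM. */
    static BufferedReader openUtf8SkippingBom(InputStream raw) throws IOException {
        BufferedReader reader =
                new BufferedReader(new InputStreamReader(raw, StandardCharsets.UTF_8));
        reader.mark(1);
        // The decoded BOM is the single character U+FEFF; anything else is real data.
        if (reader.read() != 0xFEFF) {
            reader.reset();
        }
        return reader;
    }

    /** Demo helper: returns the first line of the given bytes, or null if there is none. */
    static String firstLine(byte[] data) {
        try (BufferedReader reader = openUtf8SkippingBom(new ByteArrayInputStream(data))) {
            return reader.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The returned BufferedReader can be passed straight to new Scanner(reader), so the rest of a Scanner-based parse function would not need to change. Files saved as UTF-16 would need a different charset and are not covered by this sketch.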
-
Re: Parse structured text file
You shouldn't try to use text mark-up, such as bold mark-up, in code blocks. It doesn't work and only detracts. Otherwise, thanks for posting your solution.
- 12-26-2012, 12:44 AM #11
Member
- Join Date
- Nov 2010
- Posts
- 24
- Rep Power
- 0
Re: Parse structured text file
I've removed them. Is there some other way to point out certain lines in code?
And what approach would You use to check whether the structure of a string is valid?
One thing that crossed my mind is to check whether the semicolon occurs more than twice (then I would know there is an extra semicolon in the value field), but what do I do from there (I'm still learning about regexp)?
I've used the old-school method mentioned earlier: splitting the string, putting the elements into an array, and then validating each element. How would I validate the structure of the line before splitting?
Thank You.
========================= MESSAGE UPDATE ==================================
I've done some more online research and came up with these two solutions.
One solution is to count the number of delimiters in each line with this function (I found the example on this link):
Java Code:
public static int countDelimiter(String pString, char delimiter) {
    int count = 0;
    for (int i = 0; i < pString.length(); i++) {
        if (pString.charAt(i) == delimiter) {
            count++;
        }
    }
    return count;
}
The second is a regular expression that matches a line of my text file. I've also added another constructor that accepts a Scanner parameter, and I'm thinking of doing the line validation inside the class itself:
Java Code:
public Podatak(Scanner pIn) {
    System.out.println(pIn.findInLine(Pattern.compile("type=[A-Z]*;value=[a-zA-Z0-9 ]*;")));
}
Also, I've updated the code in the previous post to match the new situation (I've added comments but left the old lines of code in. If you think they should be deleted because they are distracting, let me know).
Last edited by mcdhappy80; 12-26-2012 at 04:25 AM. Reason: Updated info
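One way to validate the structure of a whole line before any splitting is to match it against a single pattern with capturing groups, as in the sketch below. The class LineStructure and its methods are hypothetical names, and the pattern is an assumption: it accepts any uppercase type name and lets the value contain semicolons, so only the last semicolon on the line is treated as the terminator.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper that checks a whole line against one pattern. */
final class LineStructure {

    // Group 1 is the type, group 2 the value. The greedy (.*) may contain
    // semicolons; the final ';' in the pattern matches the last one on the line.
    private static final Pattern LINE = Pattern.compile("type=([A-Z]+);value=(.*);");

    /** True when the line has the form type=...;value=...; */
    static boolean isWellFormed(String line) {
        return LINE.matcher(line.trim()).matches();
    }

    /** Returns {type, value}, or throws when the line is malformed. */
    static String[] typeAndValue(String line) {
        Matcher m = LINE.matcher(line.trim());
        if (!m.matches()) {
            throw new IllegalArgumentException("Malformed line: " + line);
        }
        return new String[] { m.group(1), m.group(2) };
    }
}
```

Because the groups already deliver the type and the value, no separate split is needed after a successful match, and a per-type validator can then be applied to the second element.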
- 12-26-2012, 02:52 AM #12