Thread: Parse structured text file

  1. #1
    mcdhappy80 is offline Member

    Question Parse structured text file

    I need to parse a structured text file, but this is fairly new to me (I'm still a beginner).
    I have this text structure:
    Java Code:
    type=TEXT;value=Prvi primer nekog teksta;
    type=TEXT;value=Drugi primer nekog teksta;
    type=NUMBER;value=Prvi primer nekog teksta;
    type=NUMBER;value=1234;
    type=NUMBER;value=1234aaa;
    type=TEXT;value=1234aaa;
    type=TEXT;value=1235;
    I've managed simple file parsing (reading line by line and printing each line), but now I need to check the structure of the parsed file, and I don't know how to do that.
    I'm not asking for code examples (yet :) ), only for guidance on how to do this (because I need to do more complex things later in my project).
    So my first question is: how do I check whether the parsed document follows the given structure?
    After parsing, I need to check that the values correspond to their types (and for each type I need a separate class that does the checking; I mention this to clarify the requirement).
    What should I use here (collections, generics, enums)?
    What approach do you suggest?
    Thank You.

  2. #2
    kaydell2 is offline Senior Member

    Default Re: Parse structured text file

    You could do the following for each line:

    1. Use

    String[] fields = line.split(";"), where line is the String read for each line of data in your text file.

    This uses the semicolon as a delimiter, so you end up with two String objects in the array called "fields".

    2. Then, for each field, use split("=") to split it into subfields, and check that "type" and "value" are present and that "TEXT" or "NUMBER" (or the value itself) is in the right spot.

    You could use a regular expression, but I believe the above method is good enough that you can avoid the complexity of learning how to write regular expressions.

    Regular Expressions in Java:
    Lesson: Regular Expressions (The Java™ Tutorials > Essential Classes)
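The two split steps described above can be sketched roughly as follows. This is only an illustration under assumptions: the names ParsedLine and SplitParser are made up for the example, and returning null for a malformed line is one possible convention, not something the thread prescribes.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical holder for one line: the text after "type=" and after "value=". */
final class ParsedLine {
    final String type;
    final String value;

    ParsedLine(String type, String value) {
        this.type = type;
        this.value = value;
    }
}

/** Hypothetical helper that applies the split(";") / split("=") idea. */
final class SplitParser {

    /** Returns the parsed line, or null if it does not have the expected shape. */
    static ParsedLine parseLine(String line) {
        // split(";") drops trailing empty strings, so "a;b;" yields two fields.
        String[] fields = line.split(";");
        if (fields.length != 2) {
            return null;
        }
        // A limit of 2 keeps any further '=' characters inside the value.
        String[] type = fields[0].split("=", 2);
        String[] value = fields[1].split("=", 2);
        if (type.length != 2 || value.length != 2) {
            return null;
        }
        if (!type[0].equals("type") || !value[0].equals("value")) {
            return null;
        }
        if (!type[1].equals("TEXT") && !type[1].equals("NUMBER")) {
            return null;
        }
        return new ParsedLine(type[1], value[1]);
    }

    /** Keeps only the lines that parse; malformed lines are skipped. */
    static List<ParsedLine> parseAll(List<String> lines) {
        List<ParsedLine> result = new ArrayList<>();
        for (String line : lines) {
            ParsedLine parsed = parseLine(line);
            if (parsed != null) {
                result.add(parsed);
            }
        }
        return result;
    }
}
```

Note that this only checks the structure of a line. Whether "1234aaa" is an acceptable NUMBER is a separate validation step.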

  3. #3
    eRaaaa is offline Senior Member

    Default Re: Parse structured text file

    The problem with kaydell2's solution arises when a semicolon can also occur in the text value :-( Then you may have more than two String objects in the array.
    I would use the Scanner class with a regex for parsing (btw: split uses regular expressions too :P).
    For validating, hmm, yes, maybe an enum with a regex and an isValid(String value) method?
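A minimal sketch of the enum idea might look like this. The enum name ValueType and both regular expressions are assumptions made for illustration; in particular, what counts as valid TEXT is not defined anywhere in the thread.

```java
import java.util.regex.Pattern;

/** Hypothetical enum: one constant per type, each carrying its own validation regex. */
enum ValueType {
    // Assumption: TEXT may contain letters, digits and spaces.
    TEXT("[a-zA-Z0-9 ]+"),
    // Assumption: NUMBER is a non-empty run of digits.
    NUMBER("\\d+");

    private final Pattern pattern;

    ValueType(String regex) {
        this.pattern = Pattern.compile(regex);
    }

    /** True if the whole value matches this type's pattern. */
    boolean isValid(String value) {
        return pattern.matcher(value).matches();
    }
}
```

With this in place, ValueType.valueOf("NUMBER").isValid("1234aaa") looks up the constant by the name read from the file and rejects the value. valueOf throws IllegalArgumentException for an unknown type name, which can double as a structure check.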

  4. #4
    mcdhappy80 is offline Member

    Question Re: Parse structured text file

    I used kaydell2's approach: I created a separate Data class, and in it I do the column splits.
    I also use a not-so-complicated regular expression to check the validity of the value part of the string array, i.e. whether it contains digits and lowercase or uppercase letters.
    Each text line I read becomes a separate instance of the Data class, and I store each instance in an ArrayList.
    It works, but what eRaaaa mentioned is interesting.
    It isn't a requirement, but it might be handy to implement it that way since it could occur.
    We'll come to that later, since I have other things to implement here.
    The second part is this:
    I need to validate each line of text by its type (TEXT, NUMBER), but in a separate class.
    Personally, I know a simpler way to do this, but here I'm required to call a specific class (one for TEXT and one for NUMBER) depending on the line's type. I've tried searching online, but no luck.
    Can someone tell me what approach should be used here?
    Also, those validators should be loaded dynamically at run time, from a config file, using the ResourceBundle mechanism (this is totally unfamiliar to me, but maybe it clarifies how the validators should be created and implemented).
    Thank You.
    Last edited by mcdhappy80; 12-24-2012 at 02:52 AM.
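One way to read the "validators loaded from a config file" requirement is a ResourceBundle that maps each type name to a class name, with the class instantiated by reflection. The sketch below assumes that reading; the interface LineValidator, the two validator classes, the loader, and the "validator.<TYPE>" key scheme are all hypothetical names invented for the example.

```java
import java.util.ResourceBundle;

/** Hypothetical common interface so the caller need not know the concrete class. */
interface LineValidator {
    boolean isValid(String value);
}

/** Hypothetical TEXT validator; the rule (non-empty) is only an assumption. */
class TextLineValidator implements LineValidator {
    @Override
    public boolean isValid(String value) {
        return !value.isEmpty();
    }
}

/** Hypothetical NUMBER validator; the rule (digits only) is only an assumption. */
class NumberLineValidator implements LineValidator {
    @Override
    public boolean isValid(String value) {
        return value.matches("\\d+");
    }
}

final class ValidatorLoader {

    /**
     * Looks up the key "validator.<type>" in the bundle, loads the class
     * named there, and creates it through its no-argument constructor.
     */
    static LineValidator load(ResourceBundle config, String type)
            throws ReflectiveOperationException {
        String className = config.getString("validator." + type);
        Class<?> cls = Class.forName(className);
        return (LineValidator) cls.getDeclaredConstructor().newInstance();
    }
}
```

In a project, the bundle would typically come from ResourceBundle.getBundle("validators"), backed by a validators.properties file on the classpath (file name assumed) with entries such as validator.TEXT=TextLineValidator. The class names in that file must be fully qualified if the classes live in a package.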

  5. #5
    Fubarable is offline Moderator

    Default Re: Parse structured text file

    Sometimes it's best not to re-invent the wheel, and since your file structure looks like it may comply with a CSV standard (although using semicolons rather than commas) and thus be parse-able with a CSV parser, you may be able to use one of these tools, many of which are available for downloading with a little searching.

  6. #6
    mcdhappy80 is offline Member

    Arrow Re: Parse structured text file

    Quote Originally Posted by Fubarable View Post
    Sometimes it's best not to re-invent the wheel, and since your file structure looks like it may comply with a CSV standard (although using semicolons rather than commas) and thus be parse-able with a CSV parser, you may be able to use one of these tools, many of which are available for downloading with a little searching.
    I agree, but this is a job application project, so I'm forced to re-invent it whether I like it or not :).
    I've found this opencsv - Frequently Asked Questions and will look into it.
    If you can recommend other tools or libraries (it's hard to search for something when you don't know what you're searching for), I'm happy to hear your suggestions.
    Thank You.

  7. #7
    mcdhappy80 is offline Member

    Question Re: Parse structured text file - Strange dot symbol

    [Attached image: parsestrangesymbol.jpg]
    Can someone tell me why I see this symbol when I parse the text file? When I open the file in a text editor, the symbol is not there.
    What can cause the invisible symbol to appear, and what would be the best way to deal with it?
    If you need to look at my piece of code, just say so and I'll be happy to share it.
    Also, I've switched from BufferedReader to the Scanner class (I've read that it is newer and has some advanced features for dealing with streams, thanks Fubarable).
    Thank You
    Last edited by mcdhappy80; 12-25-2012 at 03:37 PM. Reason: Updated info

  8. #8
    DarrylBurke is offline Forum Police

    Default Re: Parse structured text file - Strange dot symbol

    Quote Originally Posted by mcdhappy80 View Post
    Can someone tell me what can be the reason for me to see this symbol when I parse the text file
    Could be the BOM (Byte Order Mark).

    db
    If you're forever cleaning cobwebs, it's time to get rid of the spiders.
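For a UTF-8 file, the BOM shows up after decoding as the single character U+FEFF at the very start of the input, because Java's UTF-8 decoder does not strip it. A minimal sketch of dropping it without any extra library, assuming the file really is UTF-8 (the class name BomAwareReader is made up for the example):

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: reads all lines and drops a leading BOM if present. */
final class BomAwareReader {
    private static final char BOM = '\uFEFF';

    static List<String> readLines(InputStream in) throws IOException {
        List<String> lines = new ArrayList<>();
        // try-with-resources closes the reader (and the wrapped stream) for us.
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            boolean first = true;
            while ((line = reader.readLine()) != null) {
                // Only the first line of the file can start with the BOM.
                if (first && !line.isEmpty() && line.charAt(0) == BOM) {
                    line = line.substring(1);
                }
                first = false;
                lines.add(line);
            }
        }
        return lines;
    }
}
```

Files saved as UTF-16 or UTF-32 carry a different byte sequence and need the matching charset, which this sketch does not attempt to detect.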

  9. #9
    mcdhappy80 is offline Member

    Lightbulb Re: Parse structured text file - Strange dot symbol

    Quote Originally Posted by DarrylBurke View Post
    Could be the BOM (Byte Order Mark).

    db
    Thank You DarrylBurke.
    After your answer I googled the BOM, and here's how I solved my problem.
    At this link I found a class, which I added to my existing code as a separate Java class.
    Here is my text-parsing function with Scanner, updated to use the new class that removes the BOM:
    Java Code:
        private static void parseFunction(String pFajl)
        {
            ArrayList<Podatak> al = null; 
           
            Scanner in = null;
            FileInputStream is = null;
            UnicodeBOMInputStream ubis = null;                // BOM Class addition
     
            try
            {
                File tempFajl = new File(pFajl);
                   
                if(tempFajl.exists())
                {    
                    is = new FileInputStream(tempFajl);
                    ubis = new UnicodeBOMInputStream(is);  // BOM Class addition
                    ubis.skipBOM();                                   // BOM Class addition
                    
                    in =  new Scanner(ubis);                      // BOM Class addition - ubis parameter
    
                    //String line = in.nextLine();                  // Obsolete - doing the split inside the class
                    in.useDelimiter(";");                             // New code line
    
                    //String [] columns = null;                                           // Obsolete - doing the split inside the class
                    al = new ArrayList<>();
                    
                    while(in.hasNextLine())                                            // New code line
                    //while(line!= null && !line.equals("="))
                    //while(line!= null)
                    { 
                        // columns = line.split(";");                                           // Obsolete - doing the split inside the class
                        
                        //Podatak p = new Podatak(columns[0],columns[1]);                // Obsolete - doing the split inside the class
                        Podatak p = new Podatak(in);                                       // New constructor - doing the split inside the class
                        
                        //al.add(p);                  // Still not adding instance, before I develop a validation mechanism inside the class itself
                        
                        //line = in.nextLine();        // Obsolete
                        in.nextLine();                  // New code line
                    }
       
                }
                else
                {
                    System.out.println("File\n" + tempFajl.getPath() + "\ndoesn't exist!\nCheck file path.");
                }
                
                if(al != null) // Prevents NullPointerException when file is not loaded
                {
                    for(Podatak p : al)
                    {
                        System.out.println(p.toString());
                        //System.out.println("checkDataStructure: " + p.proveriStrukturuPodatka());
                    }
                }
            }
            catch(IOException e)
            {
                System.out.println("Error IOException:\n" + e);
                e.printStackTrace();   // concatenating getStackTrace() would only print the array reference
            }
            catch(Exception e)
            {
                System.out.println("Error Exception:\n" + e);
                e.printStackTrace();   // concatenating getStackTrace() would only print the array reference
            }
            finally
            {
                try
                {
                   if(in != null) // Prevents NullPointer Exception
                   {
                        in.close(); 
                   }
                }
                catch(Exception e)
                {
                    System.out.println("Unable to close stream: " + e);
                }
            }           
        }
    Hope this piece of code helps someone else solve a similar problem.
    Last edited by mcdhappy80; 12-26-2012 at 04:22 AM. Reason: Removed bolded lines

  10. #10
    Fubarable is offline Moderator

    Default Re: Parse structured text file

    You shouldn't try to use text mark-up, such as bold mark-up, in code blocks. It doesn't work and only detracts. Otherwise, thanks for posting your solution.

  11. #11
    mcdhappy80 is offline Member

    Question Re: Parse structured text file

    Quote Originally Posted by Fubarable View Post
    You shouldn't try to use text mark-up, such as bold mark-up, in code blocks. It doesn't work and only detracts. Otherwise, thanks for posting your solution.
    I've removed them. Is there some other way to point out certain lines in code?

    Quote Originally Posted by eRaaaa View Post
    The problem with kaydell2's solution arises when a semicolon can also occur in the text value :-( Then you may have more than two String objects in the array.
    I would use the Scanner class with a regex for parsing (btw: split uses regular expressions too :P).
    For validating, hmm, yes, maybe an enum with a regex and an isValid(String value) method?
    And what approach would you use to check whether the structure of the string is valid?
    One thing that came to mind is to check whether the semicolon occurs more than twice (then I would know that there is an extra semicolon in the value field), but what do I do from there (I'm still learning about regexps)?
    I've used the old-school method mentioned above: splitting the string, putting the elements into an array, and then validating each element. How would I validate the structure of the line before splitting?
    Thank You.
    ========================= MESSAGE UPDATE ==================================

    I've done some more online research and came up with these two solutions.
    The first counts the number of delimiters in a line with this function (I found the example at this link):
    Java Code:
    public static int countDelimiter(String pString, char delimiter)
    {
        int count = 0;
        for (int i = 0; i < pString.length(); i++)
        {
            if (pString.charAt(i) == delimiter)
            {
                count++;
            }
        }
        return count;
    }
    The second is a regular expression that matches a line of my text file. I've also added another constructor that accepts a Scanner parameter, and I'm thinking of doing the line validation inside the class itself:
    Java Code:
    public Podatak(Scanner pIn)
    {
        System.out.println(pIn.findInLine(Pattern.compile("type=[A-Z]*;value=[a-zA-Z0-9 ]*;")));
    }
    Also, I've updated the code in my previous post to match the new situation (I added comments but left the old lines of code in; if you think they are distracting and should be deleted, let me know).
    Last edited by mcdhappy80; 12-26-2012 at 04:25 AM. Reason: Updated info
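On the question of validating a line before splitting it, one option is to match the whole line against a pattern with capture groups, so that validation and extraction happen in one step. The sketch below is an assumption-laden illustration: the class name LinePattern and the choice to let the value contain semicolons are not from the thread.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper that validates a line and extracts type and value in one pass. */
final class LinePattern {
    // Group 1: the type name. Group 2: the value, which may itself contain ';' or '='.
    // The greedy (.*) backtracks so that the final ';' of the line ends the value.
    private static final Pattern LINE = Pattern.compile("type=([A-Z]+);value=(.*);");

    /** Returns {type, value}, or null when the line does not have the expected structure. */
    static String[] match(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.matches()) {   // matches() requires the entire line to fit the pattern
            return null;
        }
        return new String[] { m.group(1), m.group(2) };
    }
}
```

Because matches() tests the entire line, there is no need to count delimiters first: a line such as type=TEXT;value=a;b; yields the value "a;b" instead of being rejected or split in the wrong place.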

  12. #12
    DarrylBurke is offline Forum Police

    Default Re: Parse structured text file

    Quote Originally Posted by mcdhappy80 View Post
    Is there some other way to point out certain lines in code?
    Java Code:
    // inline comments
    db
