
Thread: How to search an XML file for a string?

  1. #1
    jazzermonty is offline Member
    Join Date
    Jan 2011
    Posts
    71
    Rep Power
    0

    Default How to search an XML file for a string?

    Hi folks

    OK, so not a homework question this time. I have an XML file; below is a snippet of what it looks like:

    Java Code:
    	<JobProfile>
    		<JobProfileId>188</JobProfileId>
    		<Title><![CDATA[Software Engineer or Programmer]]></Title>
    		<Description><![CDATA[Computer software engineer, Software programmer, Software developer.]]></Description>
    		<Summary><![CDATA[Software engineers or programmers design and develop the programs which tell computers to perform specified functions, such as controlling complex manufacturing processes or analysing financial information.]]></Summary>
    		<TheWork><![CDATA[You could be:working from specifications drawn up by a computer systems analyst or business analyst.]]></TheWork>
    		<Conditions><![CDATA[You would work normal hours from an office. You might have to work evenings or weekends to meet deadlines.]]></Conditions>
    		<GettingIn><![CDATA[Most entrants have a Higher National Certificate (HNC), Higher National Diploma (HND) or degree in a subject such as computer science.]]></GettingIn>
    		<WhatDoesItTake><![CDATA[You need to have: a sound knowledge of information technology applicationsan analytical, logical and methodical approachgood problem solving skills a high level of patience.]]></WhatDoesItTake>
    		<Training><![CDATA[Training is usually on the job.]]></Training>
    		<GettingOn><![CDATA[With experience, you might gain promotion to team leader, project manager or senior developer.]]></GettingOn>
    		<Pay><![CDATA[The figures below are only a guide. Actual pay rates may vary, depending on:where you work.]]></Pay>
    		<MoreInfo><![CDATA[If you are considering a career in IT why not take a look at the <a class="blueLink" href="http://www.java-forums.org"> Java Fourms )]]></MoreInfo>
    		<Level1>True</Level1>
    		<Level2>True</Level2>
    		<Level3>False</Level3>
    		<Level4>False</Level4>
    		<CareerSectors>
    			<CareerSector CSID="41" SectorName="Computing and ICT" CAID="8" HBCode="8A"/>
    		</CareerSectors>
    		<SocCodes>
    			<SocCode Code="2136"/>
    		</SocCodes>
    	</JobProfile>
    This has been severely snipped, not for confidentiality but because the original is unwieldy. The full file is 5,324,677 characters and 595,087 words (including the tags), which matters for my question.

    Notice that the <MoreInfo> tag contains an href. I need to find every href in the full file and report back. I'm comfortable finding the hrefs with a regex, and will probably put the results in a list, something like Title | Software Engineer or Programmer | href | Java Programming Forum - The Front Page, which I'm also comfortable doing.

    What I need help with is how to read in the XML and search it in an efficient way. I've seen examples of people using DocumentBuilderFactory to parse the file and then String xml = nodeToString(xmlDoc.getDocumentElement()); to convert the document to a string, which I could then search. Given the amount of content in the file, and that it will grow, is this the best way to do it? If so, can anyone point me to a good tutorial on using DOM sources?

    Many thanks

  2. #2
    doWhile is offline Moderator
    Join Date
    Jul 2010
    Location
    California
    Posts
    1,642
    Rep Power
    7

    Default Re: How to search an XML file for a string?

    If you are not using the DocumentBuilder parser for anything other than reading the document, you might just wish to use simple File IO reading to read the file in. If the file size is unwieldy, you don't necessarily have to read it into memory, just parse as you read using something such as a BufferedReader.
    Lesson: Basic I/O (The Java™ Tutorials > Essential Classes)
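
    A minimal sketch of the line-by-line approach described above, assuming every href is double-quoted and never split across two lines. The class name HrefScanner and the regex are illustrative assumptions, not something from the thread.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HrefScanner {
    // Hypothetical pattern: captures the value of a double-quoted href attribute.
    private static final Pattern HREF = Pattern.compile("href\\s*=\\s*\"([^\"]*)\"");

    // Reads the input one line at a time, so only the current line is held in memory.
    static List<String> findHrefs(Reader source) {
        List<String> links = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                Matcher m = HREF.matcher(line);
                while (m.find()) {          // a line may contain more than one link
                    links.add(m.group(1));
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return links;
    }

    public static void main(String[] args) {
        String sample = "<MoreInfo><![CDATA[take a look at the "
                + "<a class=\"blueLink\" href=\"http://www.java-forums.org\"> Java Forums]]></MoreInfo>\n"
                + "<Level1>True</Level1>\n";
        System.out.println(findHrefs(new StringReader(sample)));
    }
}
```

    For the real file, something like Files.newBufferedReader(Paths.get("JobProfiles.xml")) could be passed in instead of the StringReader. One trade-off: a plain text scan does not know which <Title> a link belongs to unless the code also remembers the most recent title it has seen.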

  3. #3
    jazzermonty is offline Member
    Join Date
    Jan 2011
    Posts
    71
    Rep Power
    0

    Default Re: How to search an XML file for a string?

    Quote Originally Posted by doWhile View Post
    If you are not using the DocumentBuilder parser for anything other than reading the document, you might just wish to use simple File IO reading to read the file in. If the file size is unwieldy, you don't necessarily have to read it into memory, just parse as you read using something such as a BufferedReader.
    Lesson: Basic I/O (The Java™ Tutorials > Essential Classes)
    Hi again doWhile. I've used the I/O classes before in another project for an email cleaner. I wasn't sure whether that was the right approach, but going with your advice above I'll continue down that path. If I get it working (still a noobie, remember) I'll post back.

    Thanks again

  4. #4
    jazzermonty is offline Member
    Join Date
    Jan 2011
    Posts
    71
    Rep Power
    0

    Default Re: How to search an XML file for a string?

    OK, bumping this as I've made some progress but have come unstuck. Here is the code thus far:

    Java Code:
    /*
     * To change this template, choose Tools | Templates
     * and open the template in the editor.
     */
    
    package xmlparser;
    
    /**
     *
     * @author james
     */
    
    import java.io.File;
    import javax.xml.parsers.DocumentBuilder;
    import javax.xml.parsers.DocumentBuilderFactory;
    import org.w3c.dom.CharacterData;
    import org.w3c.dom.Document;
    import org.w3c.dom.Element;
    import org.w3c.dom.Node;
    import org.w3c.dom.NodeList;
    
    
    public class findLink
    {
        private File file;
    
        public findLink()
        {
          file = new File("JobProfiles.xml");
        }
    
        public void link()
        {
    
    try
        {
        
        DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = db.parse(file);
        doc.getDocumentElement().normalize();
        NodeList nodes = doc.getElementsByTagName("JobProfile");
    
        for (int i = 0; i < nodes.getLength(); i++) {
          Element element = (Element) nodes.item(i);
    
          NodeList name = element.getElementsByTagName("JobProfileId");
          Element line = (Element) name.item(0);
          System.out.println("JobProfileId: " + getCharFromElement(line));
    
          NodeList title = element.getElementsByTagName("Title");
          line = (Element) title.item(0);
          System.out.println("Title: " + getCharFromElement(line));
    
          NodeList description = element.getElementsByTagName("Description");
          line = (Element) description.item(0);
          System.out.println("Description: " + getCharFromElement(line));
    
          NodeList summary = element.getElementsByTagName("Summary");
          line = (Element) summary.item(0);
          System.out.println("Summary: " + getCharFromElement(line));
    
          NodeList theWork = element.getElementsByTagName("TheWork");
          line = (Element) theWork.item(0);
          System.out.println("TheWork: " + getCharFromElement(line));
    
          NodeList conditions = element.getElementsByTagName("Conditions");
          line = (Element) conditions.item(0);
          System.out.println("Conditions: " + getCharFromElement(line));
    
          NodeList gettingIn = element.getElementsByTagName("GettingIn");
          line = (Element) gettingIn.item(0);
          System.out.println("GettingIn: " + getCharFromElement(line));
    
          NodeList whatDoesItTake = element.getElementsByTagName("WhatDoesItTake");
          line = (Element)whatDoesItTake.item(0);
          System.out.println("WhatDoesItTake: " + getCharFromElement(line));
    
          NodeList training = element.getElementsByTagName("Training");
          line = (Element) training.item(0);
          System.out.println("Training: " + getCharFromElement(line));
    
          NodeList gettingOn = element.getElementsByTagName("GettingOn");
          line = (Element) gettingOn.item(0);
          System.out.println("GettingOn: " + getCharFromElement(line));
    
          NodeList pay = element.getElementsByTagName("Pay");
          line = (Element) pay.item(0);
          System.out.println("Pay: " + getCharFromElement(line));
    
          NodeList moreInfo = element.getElementsByTagName("MoreInfo");
          line = (Element) moreInfo.item(0);
          System.out.println("MoreInfo: " + getCharFromElement(line));
    
          NodeList level1 = element.getElementsByTagName("Level1");
          line = (Element) level1.item(0);
          System.out.println("Level1: " + getCharFromElement(line));
    
          NodeList level2 = element.getElementsByTagName("Level2");
          line = (Element) level2.item(0);
          System.out.println("Level2: " + getCharFromElement(line));
    
          NodeList level3 = element.getElementsByTagName("Level3");
          line = (Element) level3.item(0);
          System.out.println("Level3: " + getCharFromElement(line));
    
          NodeList level4 = element.getElementsByTagName("Level4");
          line = (Element) level4.item(0);
          System.out.println("Level4: " + getCharFromElement(line));
    
          NodeList careersSectors = element.getElementsByTagName("CareerSectors");
          line = (Element) careersSectors.item(0);
          System.out.println("CareerSectors: " + getCharFromElement(line));
    
          NodeList socCode = element.getElementsByTagName("SocCodes");
          line = (Element) socCode.item(0);
          System.out.println("SocCodes: " + getCharFromElement(line));
        }
          }
        catch (Exception e)
        {
            System.out.println(e);
        }
    
        }
       public static String getCharFromElement(Element e)
      {
        Node child = e.getFirstChild();
        if (child instanceof CharacterData)
        {
          CharacterData cd = (CharacterData) child;
          return cd.getData();
        }
        return "";
      }
    }
    The main class:-

    Java Code:
    /*
     * To change this template, choose Tools | Templates
     * and open the template in the editor.
     */
    
    package xmlparser;
    
    /**
     *
     * @author james
     */
    
    
        /**
         * @param args the command line arguments
         */
    
    public class Main
    {
      public static void main(String arg[]) throws Exception
      {
           new findLink();
      }
    }
    Which is simple at the moment. The problem is that the findLink class isn't printing anything, yet I get no errors or exceptions. Any pointers on where I'm going wrong?

    Thanks in advance.

    P.S. The file JobProfiles.xml is in the correct location. I had this all working inside the main method but felt that wasn't the most elegant way to go, so I handed it off to the findLink class. My reasoning is that I will most likely add new classes that perform different searches.
    Last edited by jazzermonty; 03-20-2012 at 11:01 PM. Reason: Additional relevant information
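
    As a side note on the code above, the repeated getElementsByTagName blocks could probably be collapsed into a loop over the tag names. This is only a sketch: the class name ProfileFields, the parseRoot helper and the shortened tag list are assumptions for illustration, and it uses getTextContent() rather than the getCharFromElement helper from the post.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

class ProfileFields {
    // A subset of the child tag names from the sample; extend as needed.
    private static final String[] TAGS = {"JobProfileId", "Title", "MoreInfo", "Level1"};

    // Collects the text of each listed child tag; a missing tag maps to an empty string.
    static Map<String, String> read(Element profile) {
        Map<String, String> fields = new LinkedHashMap<>();
        for (String tag : TAGS) {
            Node node = profile.getElementsByTagName(tag).item(0);
            fields.put(tag, node == null ? "" : node.getTextContent());
        }
        return fields;
    }

    // Demo helper (hypothetical): parses an XML string and returns its root element.
    static Element parseRoot(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return doc.getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        Element profile = parseRoot("<JobProfile><JobProfileId>188</JobProfileId>"
                + "<Title><![CDATA[Software Engineer or Programmer]]></Title></JobProfile>");
        // Print each field in the order the tags are listed.
        read(profile).forEach((tag, text) -> System.out.println(tag + ": " + text));
    }
}
```

    Keeping the tag names in one array means a new search class only needs a different list, which fits the plan of adding more searches later.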

  5. #5
    jazzermonty is offline Member
    Join Date
    Jan 2011
    Posts
    71
    Rep Power
    0

    Default Re: How to search an XML file for a string?

    Apologies, got it. Man, I really need to do this stuff when I'm not tired. Anyway, for those interested, the amendments to the main method are as follows:

    Java Code:
    /*
     * To change this template, choose Tools | Templates
     * and open the template in the editor.
     */
    
    package xmlparser;
    
    /**
     *
     * @author james
     */
    
    
        /**
         * @param args the command line arguments
         */
    
    public class Main
    {
      public static void main(String arg[]) throws Exception
      {
            findLink fl = new findLink();
            fl.link();
    
      }
    
     
    }
    So yes, I wasn't telling my new findLink object which method to execute. Doh! Will post back again when more progress is made (assuming this is within the rules).

    Cheers

  6. #6
    jazzermonty is offline Member
    Join Date
    Jan 2011
    Posts
    71
    Rep Power
    0

    Default Re: How to search an XML file for a string?

    Hi. Ok, this time it's a genuine issue that I'd appreciate some help with. I'm focusing in on the <MoreInfo> node and I've added the following codeblock:

    Java Code:
     match = href.matcher(line.toString());
          if (match.find());
            {
              System.out.println(jobTitle + " : " + match.group());
            }
    which seems to find the regex match, but I get the error java.lang.IllegalStateException: No match found. If I remove the match.group() part of the output it seems to work. My objective is to return the character sequence that matches the regex, but I can't figure out why it won't work. Any pointers?

    Thanks
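
    A likely cause, going only by the snippet in this post: the semicolon straight after if (match.find()) ends the if statement, so the braces that follow run unconditionally and match.group() is called even when find() returned false, which is when Matcher throws IllegalStateException: No match found. Separately, line.toString() on a DOM Element returns an implementation-specific string that may not contain the CDATA text, so getCharFromElement(line) from the earlier post is probably what was intended. The sketch below shows the guarded form; the pattern and the class name HrefMatchDemo are assumptions, since the original href Pattern is not shown.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HrefMatchDemo {
    // Hypothetical pattern standing in for the poster's `href` Pattern, which is not shown.
    private static final Pattern HREF = Pattern.compile("href=\"[^\"]*\"");

    // Returns the first href match in text, or null when there is none.
    static String firstHref(String text) {
        Matcher match = HREF.matcher(text);
        if (match.find()) {   // no semicolon here: the block runs only after a successful find()
            return match.group();
        }
        return null;
    }

    public static void main(String[] args) {
        // prints href="http://www.java-forums.org"
        System.out.println(firstHref("<a class=\"blueLink\" href=\"http://www.java-forums.org\">"));
        // prints null
        System.out.println(firstHref("no link in this text"));
    }
}
```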

  7. #7
    jazzermonty is offline Member
    Join Date
    Jan 2011
    Posts
    71
    Rep Power
    0

    Default Re: How to search an XML file for a string?

    Hello again

    Really hoping someone is viewing this forum. I have a new issue that I'd like to discuss. OK, in my XML I have a node with an attribute like this:
    Java Code:
    <SocCodes>
    			<SocCode Code="3416"/>
    		</SocCodes>
    But equally valid is this:

    Java Code:
    <SocCodes>
    </SocCodes>
    And in my code I've written this:
    Java Code:
     NodeList socCode = element.getElementsByTagName("SocCode");
          line = (Element) socCode.item(0);
          code = socCode.item(0).getAttributes().getNamedItem("Code").toString();
          if (socCode == null)
                  {
                     break;
                  }
           else
                  {
                     System.out.println(code + " : " + jobTitle);
                  }
    Which I was using to try and handle the fact that <SocCode> can genuinely be missing. I want to be able to report back on this fact as all the job profiles should indeed have a SOC code attached. In the code above, I was trying to skip over the null, which I can then reverse to report back when the <SocCode> is missing.

    Any ideas?

    Thanks

  8. #8
    Tolls is offline Moderator
    Join Date
    Apr 2009
    Posts
    11,755
    Rep Power
    19

    Default Re: How to search an XML file for a string?

    If there are no nodes in the NodeList then 'line' in line 2 will be null, and you'll get an exception on line 3 (NullPointer) as socCode.item(0) will be null.
    So you need to check for null before doing any of that stuff with attributes.

  9. #9
    jazzermonty is offline Member
    Join Date
    Jan 2011
    Posts
    71
    Rep Power
    0

    Default Re: How to search an XML file for a string?

    Quote Originally Posted by Tolls View Post
    If there are no nodes in the NodeList then 'line' in line 2 will be null, and you'll get an exception on line 3 (NullPointer) as socCode.item(0) will be null.
    So you need to check for null before doing any of that stuff with attributes.
    Ah, Tolls, you're a star. Fixed code below:

    Java Code:
    NodeList socCode = element.getElementsByTagName("SocCode");
          line = (Element) socCode.item(0);
            if (line!=null)
                {
                    code = socCode.item(0).getAttributes().getNamedItem("Code").toString();
                    System.out.println(code);
                }
            else
                {
                    System.out.println(jobTitle);
                }
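
    A self-contained sketch of the same null-safe lookup, for anyone who wants to try it outside the full program. The class name SocCodeCheck and the parseRoot helper are assumptions for illustration. It uses Element.getAttribute("Code") instead of getNamedItem("Code").toString(), because the result of toString() on a DOM node is implementation-specific and may include the attribute name as well as its value.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

class SocCodeCheck {
    // Returns the Code attribute of the first <SocCode> under profile, or null if there is none.
    static String socCodeOf(Element profile) {
        NodeList socCodes = profile.getElementsByTagName("SocCode");
        if (socCodes.getLength() == 0) {
            return null;                      // check before touching item(0)
        }
        Element first = (Element) socCodes.item(0);
        return first.getAttribute("Code");    // empty string if the attribute itself is absent
    }

    // Demo helper (hypothetical): parses an XML string and returns its root element.
    static Element parseRoot(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return doc.getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        Element with = parseRoot("<JobProfile><SocCodes><SocCode Code=\"3416\"/></SocCodes></JobProfile>");
        Element without = parseRoot("<JobProfile><SocCodes></SocCodes></JobProfile>");
        System.out.println(socCodeOf(with));    // prints 3416
        System.out.println(socCodeOf(without)); // prints null
    }
}
```

    Returning null for a missing <SocCode> leaves it to the caller to decide how to report the profiles that lack a code.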
