  1. #1
    Kenjitsuka is offline Member

    Solved: Extracting values from textfile with semi-random content

    Hello everyone,

    I programmed a bit in Java about 8 years ago, when I was in college.
    But I was not all that good at it. Despite that, I am trying to help a friend with a problem with a program, in Java.

    I've managed to scrounge, customize and combine code for most of the 9 subtasks it needs to do. But I can't find anything at all on the one task I'd like to ask you for some pointers on.

    The program reads a directory for zip files, extracts them and then reads the contents of an xml file that was extracted. That I've already handled.

    But, I must locate and extract 23 values a different program has dumped in the file. Those values will then be stored in my database. Making the database connection and inputting and reading values is also something I've fixed.

    ==================

    The problem is the xml files. They are semi-random and the values I need are of random length. Basically, after a random number of lines of stuff I don't need, there come lines like this:
    Java Code:
    <field id="MRZ1" fieldvalue="P&lt;HUNKARPATI&lt;&lt;VIKTORIA&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;" valuetype="unicode" checksum="NO"/>
    <field id="MRZ2" fieldvalue="HU12345600HUN9202287F1501010123456782&lt;&lt;&lt;&lt;&lt;04" valuetype="unicode" checksum="OK"/>
    <field id="MRZ_TYPE" fieldvalue="P&lt;" valuetype="unicode" checksum="NO"/>
    <field id="MRZ_ISSUE_COUNTRY" fieldvalue="HUN" valuetype="unicode" checksum="NO"/>
    I need the characters between 'fieldvalue="' and the next ".
    What I can't figure out is how to tell Java to start reading characters from the " symbol that follows the 'fieldvalue' text.
    Each line starts with a different value for field id, so the value I need does not begin at a fixed character position and I can't tell the program to start extracting at a fixed offset.

    I have been thinking about it, but all the randomness is giving me a headache...
    I have googled extensively, but all I can come up with is tons of pages telling people how to generate random text.

    For your convenience below is the entire content of one of the xml files.
    Thanks so much in advance!
    Kenji.

    Java Code:
    <?xml version="1.0" ?>
    <!-- Passport Reader Document File -->
    <root>
       <reader device="PRMC233R106491" software="2.1.2.4"/>
       <pagelist>
          <imagelist>
             <image file="image1.jpg" captime="2010-08-16 17:17:58.687" camera="514" light="INFRA" page="0"/>
             <image file="image2.jpg" captime="2010-08-16 17:18:00.093" camera="259" light="UV" page="0"/>
             <image file="image3.jpg" captime="2010-08-16 17:17:59.187" camera="257" light="WHITE" page="0"/>
          </imagelist>
       </pagelist>
       <tasklist>
          <document task="RESOLVERFIDDATA">
             <rfidfilelist>
                <rfidfile id="EF_COM" rfidfiledata="60155F0104303130375F36063034303030305C0361756F" datatype="binary"/>
             </rfidfilelist>
             <fieldlist>
                <field id="RFID_DOCUMENT_DESCRIPTOR" fieldvalue="107" rfiddir="-2147450873"/>
             </fieldlist>
          </document>
          <document type="ICAO standard Passport (MRP)" task="RESOLVEMRZTEXT">
             <fieldlist>
                <field id="MRZ1" fieldvalue="P&lt;HUNKARPATI&lt;&lt;VIKTORIA&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;" valuetype="unicode" checksum="NO"/>
                <field id="MRZ2" fieldvalue="HU12345600HUN9202287F1501010123456782&lt;&lt;&lt;&lt;&lt;04" valuetype="unicode" checksum="OK"/>
                <field id="MRZ_TYPE" fieldvalue="P&lt;" valuetype="unicode" checksum="NO"/>
                <field id="MRZ_ISSUE_COUNTRY" fieldvalue="HUN" valuetype="unicode" checksum="NO"/>
                <field id="MRZ_NAME" fieldvalue="KARPATI&lt;&lt;VIKTORIA&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;" valuetype="unicode" checksum="NO"/>
                <field id="MRZ_DOCUMENT_NUMBER" fieldvalue="HU12345600" valuetype="unicode" checksum="OK"/>
                <field id="MRZ_NATIONALITY" fieldvalue="HUN" valuetype="unicode" checksum="NO"/>
                <field id="MRZ_SEX" fieldvalue="F" valuetype="unicode" checksum="NO"/>
                <field id="MRZ_BIRTH_DATE" fieldvalue="9202287" valuetype="unicode" checksum="OK"/>
                <field id="MRZ_EXPIRY_DATE" fieldvalue="1501010" valuetype="unicode" checksum="OK"/>
                <field id="MRZ_PERSONAL_DATA" fieldvalue="123456782&lt;&lt;&lt;&lt;&lt;0" valuetype="unicode" checksum="OK"/>
                <field id="TYPE" fieldvalue="P" valuetype="unicode" checksum="NO"/>
                <field id="ISSUE_COUNTRY" fieldvalue="HUN" valuetype="unicode" checksum="NO"/>
                <field id="NAME" fieldvalue="KARPATI VIKTORIA" valuetype="unicode" checksum="NO"/>
                <field id="SURNAME" fieldvalue="KARPATI" valuetype="unicode" checksum="NO"/>
                <field id="GIVENNAME" fieldvalue="VIKTORIA" valuetype="unicode" checksum="NO"/>
                <field id="DOCUMENT_NUMBER" fieldvalue="HU1234560" valuetype="unicode" checksum="OK"/>
                <field id="NATIONALITY" fieldvalue="HUN" valuetype="unicode" checksum="NO"/>
                <field id="SEX" fieldvalue="F" valuetype="unicode" checksum="NO"/>
                <field id="BIRTH_DATE" fieldvalue="920228" valuetype="unicode" checksum="OK"/>
                <field id="EXPIRY_DATE" fieldvalue="150101" valuetype="unicode" checksum="OK"/>
                <field id="PERSONAL_DATA" fieldvalue="123456782" valuetype="unicode" checksum="OK"/>
                <field id="MRZ_FIELDS" fieldvalue="P&lt;HUNKARPATI&lt;&lt;VIKTORIA&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&#x0A;HU12345600HUN9202287F1501010123456782&lt;&lt;&lt;&lt;&lt;04" valuetype="unicode" checksum="NO"/>
             </fieldlist>
          </document>
          <document task="RESOLVERFIDDATA">
             <rfidfilelist>
                <rfidfile id="EF_SOD" rfidfiledata="vxgen0001.bin" datatype="file"/>
             </rfidfilelist>
             <fieldlist/>
          </document>
          <document task="RESOLVERFIDDATA">
             <rfidfilelist>
                <rfidfile id="EF_DG15" rfidfiledata="6F81C23081BF300D06092A864886F70D01010105000381AD003081A90281A100B3246E1C134BD3DC23BF4E4FBCC520C9B2A06A3EB6D5B8DBCC3ADCF786739DE050E366CE83F7BAF1B594F9349EBB10E70A80D75EDEBBF5CC465EE3B76236A40B828BBBE1B2787AFDFEA00090D008957ABCC3A1C03F4F9637EEE76502361012F0425F6B37007B7EBD8E5FFE9D0B8C09E269913E60E9B675181396960FCF23640D3014C793052E7A9822D172BE430EC3A44A5EA907D9E2DCEEC1F94F5420B0D82B0203010001" datatype="binary"/>
             </rfidfilelist>
             <fieldlist/>
          </document>
          <document task="RESOLVERFIDDATA">
             <rfidfilelist>
                <rfidfile id="EF_DG2" rfidfiledata="vxgen0002.bin" datatype="file"/>
             </rfidfilelist>
             <fieldlist>
                <field id="RFID_FACE" fieldvalue="vxgen0003.jp2" valuetype="file" image="vxgen0003.jp2"/>
             </fieldlist>
          </document>
          <document type="ICAO standard Passport (MRP)" windowframe="4146,8430,129146,8430,129146,95434,4146,95434" task="RECOGNIZE">
             <fieldlist>
                <field id="MRZ1" fieldvalue="P&lt;HUNKARPATI&lt;&lt;VIKTORIA&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;" valuetype="unicode" checksum="NO" windowframe="10189,78318,122123,78247,122123,80754,10197,80833"/>
                <field id="MRZ2" fieldvalue="HU12345600HUN9202287F1501010123456782&lt;&lt;&lt;&lt;&lt;04" valuetype="unicode" checksum="OK" windowframe="10101,84702,122079,84630,122079,87507,10101,87578"/>
                <field id="MRZ_TYPE" fieldvalue="P&lt;" valuetype="unicode" checksum="NO" windowframe="10189,78318,14601,78298,14601,80817,10197,80833"/>
                <field id="MRZ_ISSUE_COUNTRY" fieldvalue="HUN" valuetype="unicode" checksum="NO" windowframe="15378,78318,22425,78318,22425,80833,15378,80833"/>
                <field id="MRZ_NAME" fieldvalue="KARPATI&lt;&lt;VIKTORIA&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;" valuetype="unicode" checksum="NO" windowframe="23203,78389,122123,78247,122123,80754,23203,80905"/>
                <field id="MRZ_DOCUMENT_NUMBER" fieldvalue="HU12345600" valuetype="unicode" checksum="OK" windowframe="10101,84702,35213,84702,35213,87586,10101,87578"/>
                <field id="MRZ_NATIONALITY" fieldvalue="HUN" valuetype="unicode" checksum="NO" windowframe="36086,84717,43014,84733,43014,87614,36086,87594"/>
                <field id="MRZ_SEX" fieldvalue="F" valuetype="unicode" checksum="NO" windowframe="61865,84840,63603,84840,63603,87717,61865,87717"/>
                <field id="MRZ_BIRTH_DATE" fieldvalue="9202287" valuetype="unicode" checksum="OK" windowframe="43843,84769,61027,84769,61027,87649,43843,87649"/>
                <field id="MRZ_EXPIRY_DATE" fieldvalue="1501010" valuetype="unicode" checksum="OK" windowframe="64440,84769,81489,84769,81489,87649,64440,87649"/>
                <field id="MRZ_PERSONAL_DATA" fieldvalue="123456782&lt;&lt;&lt;&lt;&lt;0" valuetype="unicode" checksum="OK" windowframe="82334,84769,119572,84630,119572,87507,82334,87649"/>
                <field id="TYPE" fieldvalue="P" valuetype="unicode" checksum="NO"/>
                <field id="ISSUE_COUNTRY" fieldvalue="HUN" valuetype="unicode" checksum="NO"/>
                <field id="NAME" fieldvalue="KARPATI VIKTORIA" valuetype="unicode" checksum="NO"/>
                <field id="SURNAME" fieldvalue="KARPATI" valuetype="unicode" checksum="NO"/>
                <field id="GIVENNAME" fieldvalue="VIKTORIA" valuetype="unicode" checksum="NO"/>
                <field id="DOCUMENT_NUMBER" fieldvalue="HU1234560" valuetype="unicode" checksum="OK"/>
                <field id="NATIONALITY" fieldvalue="HUN" valuetype="unicode" checksum="NO"/>
                <field id="SEX" fieldvalue="F" valuetype="unicode" checksum="NO"/>
                <field id="BIRTH_DATE" fieldvalue="920228" valuetype="unicode" checksum="OK"/>
                <field id="EXPIRY_DATE" fieldvalue="150101" valuetype="unicode" checksum="OK"/>
                <field id="PERSONAL_DATA" fieldvalue="123456782" valuetype="unicode" checksum="OK"/>
                <field id="MRZ_FIELDS" fieldvalue="P&lt;HUNKARPATI&lt;&lt;VIKTORIA&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&#x0A;HU12345600HUN9202287F1501010123456782&lt;&lt;&lt;&lt;&lt;04" valuetype="unicode" checksum="NO" windowframe="10189,78318,122123,78247,122079,87507,10101,87578"/>
                <field id="VIZ_FACE" checksum="NO" windowframe="12145,21932,38645,21932,38645,57933,12145,57933"/>
                <field id="VIZ_AUTH_DOCAREA" fieldvalue="1000" checksum="OK" windowframe="4146,8430,129146,8430,129146,95434,4146,95434"/>
                <field id="MRZ_AUTH_B900" fieldvalue="1000" checksum="OK" windowframe="10101,78247,122123,78247,122123,87717,10101,87717"/>
             </fieldlist>
             <imagelist>
                <image file="image1.jpg" captime="2010-08-16 17:17:58.687" camera="514" light="INFRA" page="0"/>
                <image file="image3.jpg" captime="2010-08-16 17:17:59.187" camera="257" light="WHITE" page="0"/>
                <image file="image2.jpg" captime="2010-08-16 17:18:00.093" camera="259" light="UV" page="0"/>
             </imagelist>
          </document>
          <document task="RESOLVERFIDDATA">
             <rfidfilelist>
                <rfidfile id="EF_DG1" rfidfiledata="615B5F1F58503C48554E4B4152504154493C3C56494B544F5249413C3C3C3C3C3C3C3C3C3C3C3C3C3C3C3C3C3C3C3C3C3C4855313233343536303048554E3932303232383746313530313031303132333435363738323C3C3C3C3C3034" datatype="binary"/>
             </rfidfilelist>
             <fieldlist>
                <field id="RFID_MRZ" fieldvalue="P&lt;HUNKARPATI&lt;&lt;VIKTORIA&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;&lt;HU12345600HUN9202287F1501010123456782&lt;&lt;&lt;&lt;&lt;04"/>
             </fieldlist>
          </document>
       </tasklist>
    </root>
    Last edited by Kenjitsuka; 10-10-2010 at 04:34 PM. Reason: Issue solved

  2. #2
    Fubarable is offline Moderator

  3. #3
    Kenjitsuka is offline Member

    I am not parsing it.
    Is that necessary or useful?

    As I've said, I am not very good at programming, so I try to keep it as simple as possible. I'll look into XPath, as I've not heard of it yet.

    My Java books are very old and only deal with really simple stuff to teach the very basics, so they are no use.

    EDIT: Reading into XPath, it seems it can be very helpful! Thanks :)
    Last edited by Kenjitsuka; 10-09-2010 at 08:31 PM.

  4. #4
    Kenjitsuka is offline Member

    I've been reading up on XPath and it does seem to hold the key to extracting the field values I need. But while I managed to get a small class going, I don't seem to be able to access any values.

    It seems to run through the nodes, and I can get various outputs.
    For example, I can get a row of "null"s, or no output at all.

    I have tried swapping the getTextContent() method for several other methods,
    for example getAttributes() and getNodeValue().

    I've tried several things to select different nodes, to see if that would help get better output: going from the root to different paths and selections, like using @* or '//file/text()' as the value for the XPathExpression.

    Here's the XML parser:

    Java Code:
    package laadpackage;

    import java.io.IOException;
    import javax.xml.parsers.*;
    import javax.xml.xpath.*;
    import org.w3c.dom.*;
    import org.xml.sax.SAXException;

    public class splitter {

      public static void main(String[] args)
          throws ParserConfigurationException, SAXException,
                 IOException, XPathExpressionException {

        DocumentBuilderFactory domFactory = DocumentBuilderFactory.newInstance();
        domFactory.setNamespaceAware(true);
        DocumentBuilder builder = domFactory.newDocumentBuilder();
        Document doc = builder.parse("C:/temp/document.xml");
        XPath xpath = XPathFactory.newInstance().newXPath();
        // XPath query intended to show the value of every node
        XPathExpression expr = xpath.compile("/");

        Object result = expr.evaluate(doc, XPathConstants.NODESET);
        NodeList nodes = (NodeList) result;
        for (int i = 0; i < nodes.getLength(); i++) {
          System.out.println(nodes.item(i).getAttributes());
        }
      }
    }
    Your thoughts and expertise are greatly appreciated!

  5. #5
    eRaaaa is online now Senior Member

    try:
    XPathExpression expr = xpath.compile("/");
    -->
    XPathExpression expr = xpath.compile("//@fieldvalue");

    and
    System.out.println(nodes.item(i).getAttributes());
    -->
    System.out.println(nodes.item(i).getNodeValue());
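
    Putting both changes together, a minimal self-contained sketch might look like the one below. It parses a shortened inline sample instead of C:/temp/document.xml so that it runs on its own. The class name FieldValueDemo, the method fieldValues and the SAMPLE string are made up for illustration and are not part of the original program.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class FieldValueDemo {

    // Hypothetical, shortened stand-in for the passport reader file.
    // It only assumes the same element and attribute names as the real one.
    static final String SAMPLE =
        "<root><tasklist><document task=\"RESOLVEMRZTEXT\"><fieldlist>"
      + "<field id=\"MRZ_TYPE\" fieldvalue=\"P&lt;\" valuetype=\"unicode\" checksum=\"NO\"/>"
      + "<field id=\"MRZ_ISSUE_COUNTRY\" fieldvalue=\"HUN\" valuetype=\"unicode\" checksum=\"NO\"/>"
      + "<field id=\"VIZ_FACE\" checksum=\"NO\"/>"
      + "</fieldlist></document></tasklist></root>";

    // Returns the value of every fieldvalue attribute, in document order.
    static List<String> fieldValues(String xml) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));

            XPath xpath = XPathFactory.newInstance().newXPath();
            // "//@fieldvalue" selects the fieldvalue attribute nodes anywhere in the tree
            XPathExpression expr = xpath.compile("//@fieldvalue");
            NodeList nodes = (NodeList) expr.evaluate(doc, XPathConstants.NODESET);

            List<String> values = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                // For an attribute node, getNodeValue() is the attribute's text
                values.add(nodes.item(i).getNodeValue());
            }
            return values;
        } catch (ParserConfigurationException | SAXException
                 | IOException | XPathExpressionException e) {
            throw new IllegalStateException("Could not read field values", e);
        }
    }

    public static void main(String[] args) {
        for (String value : fieldValues(SAMPLE)) {
            System.out.println(value);
        }
    }
}
```

    Run against the sample, this prints P< and then HUN. The parser decodes &lt; back to <, and the VIZ_FACE field is skipped because it has no fieldvalue attribute.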

  6. #6
    Fubarable is offline Moderator

    It's been a while since I've used XPath, but from what I recall, the key is in your XPathExpression, and your String which you're basing your expression on, "/", seems a bit anemic. If it were my code, I'd study XPath expressions and try to find one better suited for solving my problem.

    edit: or what eRaaaa said...

  7. #7
    Kenjitsuka is offline Member

    I've made the changes you suggested, eRaaaa, and it works like a charm!!!

    It seems I didn't try "@fieldvalue" when trying every permutation I found in XPath introductions. I think I did try "@*", but that didn't give any output, possibly because I was printing getAttributes() instead of getNodeValue().

    Thanks a lot!!!!

    If anyone wants an, admittedly very static and inflexible, way of extracting XML values using Java, you can use the code above and apply the changes eRaaaa suggested. :D

    P.S.
    At first I got the value "107" as my first result, which I don't need, but because of the way it is being parsed I could easily ignore it by setting the initial value of the loop counter i to 1 :)
    Using XPath is really so much easier and more flexible than having to sort through the content as if it were a text file :)

  8. #8
    eRaaaa is online now Senior Member

    Quote Originally Posted by Kenjitsuka View Post
    At first I got the value "107" as my first result, which I don't need, but because of the way it is being parsed I could easily ignore it by setting the initial value of Int to 1 :)
    Yes, or if you can distinguish between them, maybe you could use another XPath expression.
    One difference I notice is that the document containing "107" doesn't declare a "type" attribute, and its field node doesn't have a valuetype attribute.

    Example:
    if you only want the fieldvalue of fields that have a valuetype attribute, you could use: //field[@valuetype]//@fieldvalue

    But then, as an example, you will not get the fieldvalue of
    <field id="MRZ_AUTH_B900" fieldvalue="1000" checksum="OK" windowframe="10101,78247,122123,78247,122123,87717,10101,87717"/>
    because it doesn't have the valuetype attribute.
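
    To see the trade-off side by side, here is a small sketch that runs both expressions over the same input. The class ValueTypeFilterDemo, the helper select and the three-field SAMPLE are invented for this illustration; only the element names, attribute names and the two XPath expressions come from the thread.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class ValueTypeFilterDemo {

    // Hypothetical sample: one field with valuetype, two without it.
    static final String SAMPLE =
        "<root><fieldlist>"
      + "<field id=\"RFID_DOCUMENT_DESCRIPTOR\" fieldvalue=\"107\" rfiddir=\"-2147450873\"/>"
      + "<field id=\"SURNAME\" fieldvalue=\"KARPATI\" valuetype=\"unicode\" checksum=\"NO\"/>"
      + "<field id=\"MRZ_AUTH_B900\" fieldvalue=\"1000\" checksum=\"OK\"/>"
      + "</fieldlist></root>";

    // Evaluates an XPath expression and returns the matched node values in document order.
    static List<String> select(String xml, String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(expression, doc, XPathConstants.NODESET);
            List<String> values = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                values.add(nodes.item(i).getNodeValue());
            }
            return values;
        } catch (ParserConfigurationException | SAXException
                 | IOException | XPathExpressionException e) {
            throw new IllegalStateException("Could not evaluate " + expression, e);
        }
    }

    public static void main(String[] args) {
        // Every fieldvalue, including the unwanted "107"
        System.out.println(select(SAMPLE, "//@fieldvalue"));
        // Only fields that carry a valuetype attribute
        System.out.println(select(SAMPLE, "//field[@valuetype]//@fieldvalue"));
    }
}
```

    On this sample the first call yields 107, KARPATI and 1000, while the filtered call yields only KARPATI: the predicate drops "107" but also drops MRZ_AUTH_B900, exactly as described above.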

  9. #9
    Kenjitsuka is offline Member

    Thanks for the additional info, eRaaaa!
    Luckily for me, the fields I need are always in the same order and directly follow each other. So I don't have to distinguish between them, phew! ;)
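
    Should the order ever stop being reliable, one alternative is to key each value by its id attribute rather than by position. The sketch below is only an illustration under assumptions: the class FieldsByIdDemo, the method fieldsById and the SAMPLE string are hypothetical, and restricting the lookup to the document with task="RECOGNIZE" is a guess at which block holds the wanted fields.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class FieldsByIdDemo {

    // Hypothetical sample with the same ids appearing in two documents.
    static final String SAMPLE =
        "<root><tasklist>"
      + "<document task=\"RESOLVEMRZTEXT\"><fieldlist>"
      + "<field id=\"SURNAME\" fieldvalue=\"KARPATI\" valuetype=\"unicode\"/>"
      + "</fieldlist></document>"
      + "<document task=\"RECOGNIZE\"><fieldlist>"
      + "<field id=\"SURNAME\" fieldvalue=\"KARPATI\" valuetype=\"unicode\"/>"
      + "<field id=\"GIVENNAME\" fieldvalue=\"VIKTORIA\" valuetype=\"unicode\"/>"
      + "<field id=\"VIZ_FACE\" checksum=\"NO\"/>"
      + "</fieldlist></document>"
      + "</tasklist></root>";

    // Maps field id -> fieldvalue for the RECOGNIZE document only (assumed to be the one of interest).
    static Map<String, String> fieldsById(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            // Select field elements that actually have a fieldvalue attribute
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath().evaluate(
                "//document[@task='RECOGNIZE']//field[@fieldvalue]",
                doc, XPathConstants.NODESET);
            Map<String, String> values = new LinkedHashMap<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                Element field = (Element) nodes.item(i);
                values.put(field.getAttribute("id"), field.getAttribute("fieldvalue"));
            }
            return values;
        } catch (ParserConfigurationException | SAXException
                 | IOException | XPathExpressionException e) {
            throw new IllegalStateException("Could not read fields", e);
        }
    }

    public static void main(String[] args) {
        Map<String, String> fields = fieldsById(SAMPLE);
        System.out.println(fields.get("SURNAME"));
        System.out.println(fields.get("GIVENNAME"));
    }
}
```

    With a map like this, each database column can be filled from a named entry, so an extra or reordered field in the file would not shift the values.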

  10. #10
    Junky is offline Grand Poobah

    Quote Originally Posted by Kenjitsuka View Post
    I am trying to help a friend with a problem with a program, in Java.

    I've managed to scrounge, customize and combine code for most of the 9 subtasks it needs to do. But I can't find anything at all on the one task I'd like to ask you for some pointers on.

    The program reads a directory for zip files, extracts them and then reads the contents of an xml file that was extracted. That I've already handled.

    But, I must locate and extract 23 values a different program has dumped in the file. Those values will then be stored in my database. Making the database connection and inputting and reading values is also something I've fixed.

    ==================

    The problem is the xml files. They are semi-random and the values I need are of random length. Basically, after a random number of lines of stuff I don't need, there come lines like this:
    Ah yes, the old "This isn't my homework. Honest!" trick.
