- 10-09-2010, 07:00 PM #1
Member
- Join Date
- Oct 2010
- Posts
- 18
- Rep Power
- 0
Solved: Extracting values from textfile with semi-random content
Hello everyone,
I programmed a bit in Java 8 years ago, when I was in college.
But I was not all that good at it. Despite that, I am trying to help a friend with a problem in a Java program.
I've managed to scrounge, customize and combine code for most of the 9 subtasks it needs to do. But I can't find anything at all on the task I am about to ask you for some pointers on.
The program reads a directory for zip files, extracts them and then reads the contents of an xml file that was extracted. That I've already handled.
But, I must locate and extract 23 values a different program has dumped in the file. Those values will then be stored in my database. Making the database connection and inputting and reading values is also something I've fixed.
==================
The problem is the xml files. They are semi-random and the values I need are of random length. Basically, after a random number of lines of stuff I don't need, there come lines like this:
Java Code:
<field id="MRZ1" fieldvalue="P<HUNKARPATI<<VIKTORIA<<<<<<<<<<<<<<<<<<<<<<" valuetype="unicode" checksum="NO"/>
<field id="MRZ2" fieldvalue="HU12345600HUN9202287F1501010123456782<<<<<04" valuetype="unicode" checksum="OK"/>
<field id="MRZ_TYPE" fieldvalue="P<" valuetype="unicode" checksum="NO"/>
<field id="MRZ_ISSUE_COUNTRY" fieldvalue="HUN" valuetype="unicode" checksum="NO"/>
I need the characters between 'fieldvalue="' and the next ".
What I can't figure out is how to tell Java to start reading characters starting from the "-symbol after the 'fieldvalue' text.
Each line starts with a different value for field id, which makes it impossible to tell the program to start extracting from a fixed number of characters.
I have been thinking about it, but all the randomness is giving me a headache...
I have googled extensively, but all I can come up with is tons of pages telling people how to generate random text.
For your convenience below is the entire content of one of the xml files.
Thanks so much in advance!
Kenji.
Java Code:<?xml version="1.0" ?> <!-- Passport Reader Document File --> <root> <reader device="PRMC233R106491" software="2.1.2.4"/> <pagelist> <imagelist> <image file="image1.jpg" captime="2010-08-16 17:17:58.687" camera="514" light="INFRA" page="0"/> <image file="image2.jpg" captime="2010-08-16 17:18:00.093" camera="259" light="UV" page="0"/> <image file="image3.jpg" captime="2010-08-16 17:17:59.187" camera="257" light="WHITE" page="0"/> </imagelist> </pagelist> <tasklist> <document task="RESOLVERFIDDATA"> <rfidfilelist> <rfidfile id="EF_COM" rfidfiledata="60155F0104303130375F36063034303030305C0361756F" datatype="binary"/> </rfidfilelist> <fieldlist> <field id="RFID_DOCUMENT_DESCRIPTOR" fieldvalue="107" rfiddir="-2147450873"/> </fieldlist> </document> <document type="ICAO standard Passport (MRP)" task="RESOLVEMRZTEXT"> <fieldlist> <field id="MRZ1" fieldvalue="P<HUNKARPATI<<VIKTORIA<<<<<<<<<<<<<<<<<<<<<<" valuetype="unicode" checksum="NO"/> <field id="MRZ2" fieldvalue="HU12345600HUN9202287F1501010123456782<<<<<04" valuetype="unicode" checksum="OK"/> <field id="MRZ_TYPE" fieldvalue="P<" valuetype="unicode" checksum="NO"/> <field id="MRZ_ISSUE_COUNTRY" fieldvalue="HUN" valuetype="unicode" checksum="NO"/> <field id="MRZ_NAME" fieldvalue="KARPATI<<VIKTORIA<<<<<<<<<<<<<<<<<<<<<<" valuetype="unicode" checksum="NO"/> <field id="MRZ_DOCUMENT_NUMBER" fieldvalue="HU12345600" valuetype="unicode" checksum="OK"/> <field id="MRZ_NATIONALITY" fieldvalue="HUN" valuetype="unicode" checksum="NO"/> <field id="MRZ_SEX" fieldvalue="F" valuetype="unicode" checksum="NO"/> <field id="MRZ_BIRTH_DATE" fieldvalue="9202287" valuetype="unicode" checksum="OK"/> <field id="MRZ_EXPIRY_DATE" fieldvalue="1501010" valuetype="unicode" checksum="OK"/> <field id="MRZ_PERSONAL_DATA" fieldvalue="123456782<<<<<0" valuetype="unicode" checksum="OK"/> <field id="TYPE" fieldvalue="P" valuetype="unicode" checksum="NO"/> <field id="ISSUE_COUNTRY" fieldvalue="HUN" valuetype="unicode" checksum="NO"/> <field 
id="NAME" fieldvalue="KARPATI VIKTORIA" valuetype="unicode" checksum="NO"/> <field id="SURNAME" fieldvalue="KARPATI" valuetype="unicode" checksum="NO"/> <field id="GIVENNAME" fieldvalue="VIKTORIA" valuetype="unicode" checksum="NO"/> <field id="DOCUMENT_NUMBER" fieldvalue="HU1234560" valuetype="unicode" checksum="OK"/> <field id="NATIONALITY" fieldvalue="HUN" valuetype="unicode" checksum="NO"/> <field id="SEX" fieldvalue="F" valuetype="unicode" checksum="NO"/> <field id="BIRTH_DATE" fieldvalue="920228" valuetype="unicode" checksum="OK"/> <field id="EXPIRY_DATE" fieldvalue="150101" valuetype="unicode" checksum="OK"/> <field id="PERSONAL_DATA" fieldvalue="123456782" valuetype="unicode" checksum="OK"/> <field id="MRZ_FIELDS" fieldvalue="P<HUNKARPATI<<VIKTORIA<<<<<<<<<<<<<<<<<<<<<<
HU12345600HUN9202287F1501010123456782<<<<<04" valuetype="unicode" checksum="NO"/> </fieldlist> </document> <document task="RESOLVERFIDDATA"> <rfidfilelist> <rfidfile id="EF_SOD" rfidfiledata="vxgen0001.bin" datatype="file"/> </rfidfilelist> <fieldlist/> </document> <document task="RESOLVERFIDDATA"> <rfidfilelist> <rfidfile id="EF_DG15" rfidfiledata="6F81C23081BF300D06092A864886F70D01010105000381AD003081A90281A100B3246E1C134BD3DC23BF4E4FBCC520C9B2A06A3EB6D5B8DBCC3ADCF786739DE050E366CE83F7BAF1B594F9349EBB10E70A80D75EDEBBF5CC465EE3B76236A40B828BBBE1B2787AFDFEA00090D008957ABCC3A1C03F4F9637EEE76502361012F0425F6B37007B7EBD8E5FFE9D0B8C09E269913E60E9B675181396960FCF23640D3014C793052E7A9822D172BE430EC3A44A5EA907D9E2DCEEC1F94F5420B0D82B0203010001" datatype="binary"/> </rfidfilelist> <fieldlist/> </document> <document task="RESOLVERFIDDATA"> <rfidfilelist> <rfidfile id="EF_DG2" rfidfiledata="vxgen0002.bin" datatype="file"/> </rfidfilelist> <fieldlist> <field id="RFID_FACE" fieldvalue="vxgen0003.jp2" valuetype="file" image="vxgen0003.jp2"/> </fieldlist> </document> <document type="ICAO standard Passport (MRP)" windowframe="4146,8430,129146,8430,129146,95434,4146,95434" task="RECOGNIZE"> <fieldlist> <field id="MRZ1" fieldvalue="P<HUNKARPATI<<VIKTORIA<<<<<<<<<<<<<<<<<<<<<<" valuetype="unicode" checksum="NO" windowframe="10189,78318,122123,78247,122123,80754,10197,80833"/> <field id="MRZ2" fieldvalue="HU12345600HUN9202287F1501010123456782<<<<<04" valuetype="unicode" checksum="OK" windowframe="10101,84702,122079,84630,122079,87507,10101,87578"/> <field id="MRZ_TYPE" fieldvalue="P<" valuetype="unicode" checksum="NO" windowframe="10189,78318,14601,78298,14601,80817,10197,80833"/> <field id="MRZ_ISSUE_COUNTRY" fieldvalue="HUN" valuetype="unicode" checksum="NO" windowframe="15378,78318,22425,78318,22425,80833,15378,80833"/> <field id="MRZ_NAME" fieldvalue="KARPATI<<VIKTORIA<<<<<<<<<<<<<<<<<<<<<<" valuetype="unicode" checksum="NO" 
windowframe="23203,78389,122123,78247,122123,80754,23203,80905"/> <field id="MRZ_DOCUMENT_NUMBER" fieldvalue="HU12345600" valuetype="unicode" checksum="OK" windowframe="10101,84702,35213,84702,35213,87586,10101,87578"/> <field id="MRZ_NATIONALITY" fieldvalue="HUN" valuetype="unicode" checksum="NO" windowframe="36086,84717,43014,84733,43014,87614,36086,87594"/> <field id="MRZ_SEX" fieldvalue="F" valuetype="unicode" checksum="NO" windowframe="61865,84840,63603,84840,63603,87717,61865,87717"/> <field id="MRZ_BIRTH_DATE" fieldvalue="9202287" valuetype="unicode" checksum="OK" windowframe="43843,84769,61027,84769,61027,87649,43843,87649"/> <field id="MRZ_EXPIRY_DATE" fieldvalue="1501010" valuetype="unicode" checksum="OK" windowframe="64440,84769,81489,84769,81489,87649,64440,87649"/> <field id="MRZ_PERSONAL_DATA" fieldvalue="123456782<<<<<0" valuetype="unicode" checksum="OK" windowframe="82334,84769,119572,84630,119572,87507,82334,87649"/> <field id="TYPE" fieldvalue="P" valuetype="unicode" checksum="NO"/> <field id="ISSUE_COUNTRY" fieldvalue="HUN" valuetype="unicode" checksum="NO"/> <field id="NAME" fieldvalue="KARPATI VIKTORIA" valuetype="unicode" checksum="NO"/> <field id="SURNAME" fieldvalue="KARPATI" valuetype="unicode" checksum="NO"/> <field id="GIVENNAME" fieldvalue="VIKTORIA" valuetype="unicode" checksum="NO"/> <field id="DOCUMENT_NUMBER" fieldvalue="HU1234560" valuetype="unicode" checksum="OK"/> <field id="NATIONALITY" fieldvalue="HUN" valuetype="unicode" checksum="NO"/> <field id="SEX" fieldvalue="F" valuetype="unicode" checksum="NO"/> <field id="BIRTH_DATE" fieldvalue="920228" valuetype="unicode" checksum="OK"/> <field id="EXPIRY_DATE" fieldvalue="150101" valuetype="unicode" checksum="OK"/> <field id="PERSONAL_DATA" fieldvalue="123456782" valuetype="unicode" checksum="OK"/> <field id="MRZ_FIELDS" fieldvalue="P<HUNKARPATI<<VIKTORIA<<<<<<<<<<<<<<<<<<<<<<
HU12345600HUN9202287F1501010123456782<<<<<04" valuetype="unicode" checksum="NO" windowframe="10189,78318,122123,78247,122079,87507,10101,87578"/> <field id="VIZ_FACE" checksum="NO" windowframe="12145,21932,38645,21932,38645,57933,12145,57933"/> <field id="VIZ_AUTH_DOCAREA" fieldvalue="1000" checksum="OK" windowframe="4146,8430,129146,8430,129146,95434,4146,95434"/> <field id="MRZ_AUTH_B900" fieldvalue="1000" checksum="OK" windowframe="10101,78247,122123,78247,122123,87717,10101,87717"/> </fieldlist> <imagelist> <image file="image1.jpg" captime="2010-08-16 17:17:58.687" camera="514" light="INFRA" page="0"/> <image file="image3.jpg" captime="2010-08-16 17:17:59.187" camera="257" light="WHITE" page="0"/> <image file="image2.jpg" captime="2010-08-16 17:18:00.093" camera="259" light="UV" page="0"/> </imagelist> </document> <document task="RESOLVERFIDDATA"> <rfidfilelist> <rfidfile id="EF_DG1" rfidfiledata="615B5F1F58503C48554E4B4152504154493C3C56494B544F5249413C3C3C3C3C3C3C3C3C3C3C3C3C3C3C3C3C3C3C3C3C3C4855313233343536303048554E3932303232383746313530313031303132333435363738323C3C3C3C3C3034" datatype="binary"/> </rfidfilelist> <fieldlist> <field id="RFID_MRZ" fieldvalue="P<HUNKARPATI<<VIKTORIA<<<<<<<<<<<<<<<<<<<<<<HU12345600HUN9202287F1501010123456782<<<<<04"/> </fieldlist> </document> </tasklist> </root>Last edited by Kenjitsuka; 10-10-2010 at 04:34 PM. Reason: Issue solved
-
How are you parsing the XML file? DOM? SAX? StAX? Have you tried XPath?
- 10-09-2010, 07:18 PM #3
Member
- Join Date
- Oct 2010
- Posts
- 18
- Rep Power
- 0
I am not parsing it.
Is that necessary or useful?
As I've said, I am not very good at programming, so I try to keep it as simple as possible. I'll look into XPath, as I've not heard of it yet.
My Java books are very old and only deal with really simple stuff to teach the very basics, so they are no use.
EDIT: Reading into XPath, it seems it can be very helpful! Thanks :)
Last edited by Kenjitsuka; 10-09-2010 at 08:31 PM.
- 10-10-2010, 04:00 PM #4
Member
- Join Date
- Oct 2010
- Posts
- 18
- Rep Power
- 0
I've been reading up on XPATH and it does seem to hold the key to extracting the field values I need. But while I managed to get a small class going I don't seem to be able to access any values.
It seems to run through the nodes, and I can get various outputs.
For example, I can get a row of "NULL"-s, or no output at all.
I have tried to swap the getTextContent() method with several other methods.
For example getAttributes() and getNodeValue().
I've tried several things to select different nodes, to see if that would help get better output: going from the root to different paths and selections, like using @* or '//file/text()' as the value for the XPathExpression.
Here's the XML parser:
Java Code:
package laadpackage;

import org.w3c.dom.*;
import javax.xml.xpath.*;
import javax.xml.parsers.*;
import java.io.IOException;
import org.xml.sax.SAXException;

public class splitter {
    public static void main(String[] args)
            throws ParserConfigurationException, SAXException, IOException, XPathExpressionException {
        DocumentBuilderFactory domFactory = DocumentBuilderFactory.newInstance();
        domFactory.setNamespaceAware(true);
        DocumentBuilder builder = domFactory.newDocumentBuilder();
        Document doc = builder.parse("C:/temp/document.xml");
        XPath xpath = XPathFactory.newInstance().newXPath();
        // XPath Query for showing all nodes value
        XPathExpression expr = xpath.compile("/");
        Object result = expr.evaluate(doc, XPathConstants.NODESET);
        NodeList nodes = (NodeList) result;
        for (int i = 0; i < nodes.getLength(); i++) {
            System.out.println(nodes.item(i).getAttributes());
        }
    }
}
Your thoughts and expertise are greatly appreciated!
- 10-10-2010, 04:08 PM #5
Senior Member
- Join Date
- Oct 2010
- Location
- Germany
- Posts
- 780
- Rep Power
- 4
try:
XPathExpression expr = xpath.compile("/");
-->
XPathExpression expr = xpath.compile("//@fieldvalue");
and
System.out.println(nodes.item(i).getAttributes());
-->
System.out.println(nodes.item(i).getNodeValue());
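For anyone who wants to try the two changes above without the original file, here is a minimal, self-contained sketch. The class name FieldValueDemo, the helper fieldValues and the trimmed-down SAMPLE string are made up for illustration, and the sample assumes the '<' characters are stored as &lt; inside the real file. It also hints at why the original code printed null: "/" selects only the Document node, and getAttributes() returns null for any node that is not an Element.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class FieldValueDemo {
    // Hypothetical, trimmed-down stand-in for the passport reader file.
    // Assumption: '<' inside attribute values is escaped as &lt; in the file.
    static final String SAMPLE =
        "<root><tasklist>"
        + "<document task=\"RESOLVERFIDDATA\"><fieldlist>"
        + "<field id=\"RFID_DOCUMENT_DESCRIPTOR\" fieldvalue=\"107\" rfiddir=\"-2147450873\"/>"
        + "</fieldlist></document>"
        + "<document task=\"RESOLVEMRZTEXT\"><fieldlist>"
        + "<field id=\"MRZ_TYPE\" fieldvalue=\"P&lt;\" valuetype=\"unicode\" checksum=\"NO\"/>"
        + "<field id=\"MRZ_ISSUE_COUNTRY\" fieldvalue=\"HUN\" valuetype=\"unicode\" checksum=\"NO\"/>"
        + "</fieldlist></document>"
        + "</tasklist></root>";

    // Returns the value of every fieldvalue attribute in the given XML text.
    static List<String> fieldValues(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            // "//@fieldvalue" selects the attribute nodes themselves,
            // so getNodeValue() yields the text between the quotes.
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate("//@fieldvalue", doc, XPathConstants.NODESET);
            List<String> values = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                values.add(nodes.item(i).getNodeValue());
            }
            return values;
        } catch (Exception e) {
            throw new IllegalStateException("Could not read field values", e);
        }
    }

    public static void main(String[] args) {
        for (String value : fieldValues(SAMPLE)) {
            System.out.println(value);
        }
    }
}
```

The parser un-escapes &lt; on the way in, so the values come back with a plain '<' and no extra string handling is needed.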
-
It's been a while since I've used XPath, but from what I recall, the key is in your XPathExpression, and your String which you're basing your expression on, "/", seems a bit anemic. If it were my code, I'd study XPath expressions and try to find one better suited for solving my problem.
edit: or what eRaaaa said...
- 10-10-2010, 04:24 PM #7
Member
- Join Date
- Oct 2010
- Posts
- 18
- Rep Power
- 0
I've made the changes you suggested, eRaaaa, and it works like a charm!!!
It seems I didn't try "@fieldvalue" when trying every permutation I found in XPath introductions. I think I did try "@*", but that didn't give any output, possibly because I had the class print getAttributes() instead of getNodeValue().
Thanks a lot!!!!
If there's anyone who wants an, admittedly very static and inflexible, way of extracting XML values using Java, you can use the code above and implement the changes eRaaaa suggested. :D
P.S.
At first I got the value "107" as my first result, which I don't need, but because of the way it is being parsed I could easily ignore it by setting the initial value of the int loop counter to 1 :)
Using XPath is really so much easier and more flexible than having to sort through the content as if it were a text file :)
- 10-10-2010, 04:53 PM #8
Senior Member
- Join Date
- Oct 2010
- Location
- Germany
- Posts
- 780
- Rep Power
- 4
Yes, or if you can tell them apart, maybe you could use another XPath expression.
One difference I notice is that the document containing "107" doesn't declare a "type" attribute, and its field node doesn't have a valuetype attribute.
Example:
If you only want the fieldvalue of fields that have a valuetype attribute, you could use: //field[@valuetype]//@fieldvalue
But then, for example, you will not get the fieldvalue of
<field id="MRZ_AUTH_B900" fieldvalue="1000" checksum="OK" windowframe="10101,78247,122123,78247,122123,87717,10101,87717"/>
because it doesn't have the valuetype attribute.
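The effect of that predicate can be checked with a small sketch. The class name ValueTypeFilterDemo, the helper select and the shortened SAMPLE string are hypothetical, the field lines are cut down from the file in the first post, and the windowframe attribute is left out for brevity.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class ValueTypeFilterDemo {
    // Hypothetical sample: two fields with a valuetype attribute, two without.
    static final String SAMPLE =
        "<root>"
        + "<field id=\"RFID_DOCUMENT_DESCRIPTOR\" fieldvalue=\"107\" rfiddir=\"-2147450873\"/>"
        + "<field id=\"MRZ_ISSUE_COUNTRY\" fieldvalue=\"HUN\" valuetype=\"unicode\" checksum=\"NO\"/>"
        + "<field id=\"MRZ_SEX\" fieldvalue=\"F\" valuetype=\"unicode\" checksum=\"NO\"/>"
        + "<field id=\"MRZ_AUTH_B900\" fieldvalue=\"1000\" checksum=\"OK\"/>"
        + "</root>";

    // Evaluates an XPath expression that selects attribute nodes
    // and returns their values in document order.
    static List<String> select(String xml, String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(expression, doc, XPathConstants.NODESET);
            List<String> values = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                values.add(nodes.item(i).getNodeValue());
            }
            return values;
        } catch (Exception e) {
            throw new IllegalStateException("Could not evaluate " + expression, e);
        }
    }

    public static void main(String[] args) {
        // All four values, including the unwanted "107".
        System.out.println(select(SAMPLE, "//@fieldvalue"));
        // Only fields that carry a valuetype attribute: "107" and "1000" drop out.
        System.out.println(select(SAMPLE, "//field[@valuetype]//@fieldvalue"));
    }
}
```

So the predicate removes the unwanted "107", at the cost of also dropping MRZ_AUTH_B900 and any other field without valuetype.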
- 10-10-2010, 10:38 PM #9
Member
- Join Date
- Oct 2010
- Posts
- 18
- Rep Power
- 0
Thanks for the additional info, eRaaaa!
Luckily for me, the fields I need are always in the same order, and in a row as well. So I don't have to distinguish between them, phew! ;)
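If the order ever stops being reliable, one possible alternative (not something used in this thread) is to key each value by its id attribute instead of by position. The sketch below assumes that approach. The class name FieldsByIdDemo, the helper fieldsById and the SAMPLE string are hypothetical, and where an id occurs more than once (as MRZ1 does in the full file) it simply keeps the first occurrence.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class FieldsByIdDemo {
    // Hypothetical sample; VIZ_FACE has no fieldvalue, as in the original file.
    static final String SAMPLE =
        "<root>"
        + "<field id=\"RFID_DOCUMENT_DESCRIPTOR\" fieldvalue=\"107\" rfiddir=\"-2147450873\"/>"
        + "<field id=\"SURNAME\" fieldvalue=\"KARPATI\" valuetype=\"unicode\" checksum=\"NO\"/>"
        + "<field id=\"GIVENNAME\" fieldvalue=\"VIKTORIA\" valuetype=\"unicode\" checksum=\"NO\"/>"
        + "<field id=\"VIZ_FACE\" checksum=\"NO\"/>"
        + "</root>";

    // Maps each field id to its fieldvalue, keeping document order.
    static Map<String, String> fieldsById(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            // Select the field elements (not the attributes) that have both attributes.
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate("//field[@id and @fieldvalue]", doc, XPathConstants.NODESET);
            Map<String, String> values = new LinkedHashMap<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                Element field = (Element) nodes.item(i);
                // Assumption: the first occurrence of an id wins.
                values.putIfAbsent(field.getAttribute("id"), field.getAttribute("fieldvalue"));
            }
            return values;
        } catch (Exception e) {
            throw new IllegalStateException("Could not read fields", e);
        }
    }

    public static void main(String[] args) {
        Map<String, String> fields = fieldsById(SAMPLE);
        System.out.println(fields.get("SURNAME"));
        System.out.println(fields.get("GIVENNAME"));
    }
}
```

With a map like this, the database code can ask for "SURNAME" or "BIRTH_DATE" by name, and an extra or missing field elsewhere in the file no longer shifts the values.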
- 03-02-2011, 11:52 PM #10