Make Transformer Stream file output ???
Hello!
I'm currently using DOM to parse a very large XML file (about 25MB). I'm well aware that a SAX parser may be better for this since it does not load the entire XML into memory, but this is the way I've built it.
My problem is that I am outputting a new XML file based on a bunch of logic I've constructed in my program. Currently it seems that my output is being built in memory, and then when the program completes, it outputs the entire new XML file at once. I think it would be faster, and use considerably fewer resources, if I could stream the new XML while the program is running... however, I am unsure how to go about this.
My program is split up into multiple methods, so I'm not really sure how to make them all stream to the same file in the correct order.
Some of my code:
method that starts the creation of the XML:
Code:
public static void createXMLHead() {
    try {
        // dbfac, docBuilder, doc and root are static fields declared elsewhere in the class
        dbfac = DocumentBuilderFactory.newInstance();
        docBuilder = dbfac.newDocumentBuilder();
        doc = docBuilder.newDocument();

        // root element plus its schema attributes
        root = doc.createElement("AmazonEnvelope");
        doc.appendChild(root);
        Attr attr = doc.createAttribute("xmlns:xsi");
        attr.setValue("http://www.w3.org/2001/XMLSchema-instance");
        Attr attr2 = doc.createAttribute("xsi:noNamespaceSchemaLocation");
        attr2.setValue("amzn-envelope.xsd");
        root.setAttributeNode(attr);
        root.setAttributeNode(attr2);

        // <Header> with document version and merchant identifier
        Element header = doc.createElement("Header");
        root.appendChild(header);
        Element docVer = doc.createElement("DocumentVersion");
        docVer.appendChild(doc.createTextNode("1.01"));
        header.appendChild(docVer);
        Element merchIdent = doc.createElement("MerchantIdentifier");
        merchIdent.appendChild(doc.createTextNode("1234566778"));
        header.appendChild(merchIdent);

        // <MessageType> is a sibling of <Header>, directly under the root
        Element messType = doc.createElement("MessageType");
        messType.appendChild(doc.createTextNode("Inventory"));
        root.appendChild(messType);
    } catch (Exception e) {
        e.printStackTrace();
    }
}
method that creates the body of the XML; it is called once for every item I parse out of the original XML:
Code:
public static void createXMLBody(String itemSku, String itemAvailability) {
    // numOfItems is a static counter; it doubles as the MessageID
    numOfItems++;
    Element message = doc.createElement("Message");
    root.appendChild(message);

    Element messageId = doc.createElement("MessageID");
    messageId.appendChild(doc.createTextNode(Integer.toString(numOfItems)));
    message.appendChild(messageId);

    Element opType = doc.createElement("OperationType");
    opType.appendChild(doc.createTextNode("Update"));
    message.appendChild(opType);

    // <Inventory> holds the SKU and its available quantity
    Element inventory = doc.createElement("Inventory");
    message.appendChild(inventory);
    Element sku = doc.createElement("SKU");
    sku.appendChild(doc.createTextNode(itemSku));
    inventory.appendChild(sku);
    Element quantity = doc.createElement("Quantity");
    quantity.appendChild(doc.createTextNode(itemAvailability));
    inventory.appendChild(quantity);
}
and lastly the method that saves the entire thing to an XML file:
Code:
public static void saveXML() {
    String randomName = null;
    String xmlName = null;
    String AmzIUTimeStamp = null;
    try {
        // set up a transformer to serialize the DOM tree
        TransformerFactory transfac = TransformerFactory.newInstance();
        Transformer trans = transfac.newTransformer();
        trans.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "no");
        trans.setOutputProperty(OutputKeys.INDENT, "yes");
        //trans.setOutputProperty(OutputKeys.ENCODING, "ISO-8859-1");
        doc.setXmlStandalone(true);

        // formulate XML name; "HH" is the 24-hour clock ("hh" is 12-hour
        // and would produce the same name for AM and PM)
        DateFormat df = new SimpleDateFormat("yyyyMMdd_HHmmss");
        df.setTimeZone(TimeZone.getTimeZone("PST"));
        AmzIUTimeStamp = df.format(new Date());
        randomName = Long.toHexString(Double.doubleToLongBits(Math.random()));
        xmlName = "AMZ_IU_" + AmzIUTimeStamp + "_" + randomName + ".xml";

        // write the whole XML tree to the output file in one pass
        StreamResult result = new StreamResult(new File("C:\\Documents and Settings\\username\\Desktop\\AMZ_TEST_DATA\\outbox\\" + xmlName));
        DOMSource source = new DOMSource(doc);
        trans.transform(source, result);
    } catch (Exception e) {
        e.printStackTrace();
    }
}
Can anyone make any suggestions on how to go about this correctly, while still keeping my methods split up? Thanks!
Re: Make Transformer Stream file output ???
Quote: Originally Posted by SnakeDoc
I'm currently using DOM to parse a very large XML file (about 25MB). I'm well aware that a SAX parser may be better for this since it does not load the entire XML into memory, but this is the way I've built it.
My problem is that I am outputting a new XML file based on a bunch of logic I've constructed in my program. Currently it seems that my output is being built in memory, and then when the program completes, it outputs the entire new XML file at once. I think it would be faster, and use considerably fewer resources, if I could stream the new XML while the program is running... however, I am unsure how to go about this....
I'm no XML guru, so please take anything I say with a grain of salt, but when you say "large XML" and "desire to stream", I can't help but think of a streaming XML parser such as SAX, as you've mentioned, or perhaps even better, StAX. I'm curious why you must use DOM here and not one of the streaming parsers? A change might kill two birds with one stone.
Re: Make Transformer Stream file output ???
hello Fubarable!
thanks for the response...
I would rather stick with DOM for now since I understand it decently, and SAX has quite a different syntax and is brand new to me. Besides, I have a few hundred lines written using DOM right now and don't really want to start over using SAX lol. I'm not sure that using DOM is even really the issue here, since I'm trying to stream my output to a file instead of dumping it all at once... as in, as I create each line of my XML, I'd like it to go to a file instead of residing in memory and then being written out once the method is complete. I'm using DocumentBuilderFactory and DocumentBuilder to create the file, coupled with Transformer... so I'm not sure if that changes the case or not.
thanks for any suggestions!
Re: Make Transformer Stream file output ???
Of course SAX is useful for streaming XML parsing, but I'm pretty sure that it is not meant for writing XML to a file. I'm not sure about StAX in this regard. Edit: on review of sources, it does appear that you *can* use StAX to output XML, although I don't think that it formats the output for pretty printing, but there are other utilities for that.
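For reference, here is a minimal sketch of what StAX output could look like for the envelope in the first post, using the standard javax.xml.stream.XMLStreamWriter. The class name StreamingEnvelopeWriter and its method names are made up for illustration and are not from the original program; the element names and values are copied from the posted code. The idea is that one writer object is shared by all the methods, and each call appends to the same stream in call order, so nothing but the current element is held in memory. Note that this writer does not indent its output.

```java
import java.io.StringWriter;
import java.io.Writer;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

// Hypothetical helper: streams the envelope piece by piece instead of building a DOM.
class StreamingEnvelopeWriter {
    private static final String XSI = "http://www.w3.org/2001/XMLSchema-instance";
    private final XMLStreamWriter out;
    private int numOfItems = 0;

    StreamingEnvelopeWriter(Writer target) throws XMLStreamException {
        out = XMLOutputFactory.newInstance().createXMLStreamWriter(target);
    }

    // Writes <name>text</name>; text is escaped by the writer.
    private void writeTextElement(String name, String text) throws XMLStreamException {
        out.writeStartElement(name);
        out.writeCharacters(text);
        out.writeEndElement();
    }

    // Counterpart of createXMLHead(): opens the root and writes the header.
    void writeHead() throws XMLStreamException {
        out.writeStartDocument();
        out.writeStartElement("AmazonEnvelope");
        out.writeNamespace("xsi", XSI);
        out.writeAttribute("xsi", XSI, "noNamespaceSchemaLocation", "amzn-envelope.xsd");
        out.writeStartElement("Header");
        writeTextElement("DocumentVersion", "1.01");
        writeTextElement("MerchantIdentifier", "1234566778");
        out.writeEndElement(); // </Header>
        writeTextElement("MessageType", "Inventory");
    }

    // Counterpart of createXMLBody(): writes one <Message> and flushes it.
    void writeBody(String itemSku, String itemAvailability) throws XMLStreamException {
        numOfItems++;
        out.writeStartElement("Message");
        writeTextElement("MessageID", Integer.toString(numOfItems));
        writeTextElement("OperationType", "Update");
        out.writeStartElement("Inventory");
        writeTextElement("SKU", itemSku);
        writeTextElement("Quantity", itemAvailability);
        out.writeEndElement(); // </Inventory>
        out.writeEndElement(); // </Message>
        out.flush();           // push this message to the target now
    }

    // Counterpart of saveXML(): closes the still-open root element.
    void finish() throws XMLStreamException {
        out.writeEndDocument();
        out.flush();
        out.close(); // does not close the underlying Writer
    }

    public static void main(String[] args) throws XMLStreamException {
        StringWriter sw = new StringWriter(); // stand-in for a file writer
        StreamingEnvelopeWriter w = new StreamingEnvelopeWriter(sw);
        w.writeHead();
        w.writeBody("SKU-1", "5");
        w.writeBody("SKU-2", "0");
        w.finish();
        System.out.println(sw);
    }
}
```

To write to a real file, the StringWriter would be swapped for a buffered writer on the output file, which the caller then closes after finish(). The DOM parsing of the input file could stay exactly as it is; only the three output methods would change.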
Re: Make Transformer Stream file output ???
hmm... ya, I'm more trying to figure out a way to stream my output to XML than to modify how I'm parsing the original file. I'm using DocumentBuilderFactory, DocumentBuilder, and Transformer to output a "clean" looking XML file based on the logic in my program, but my problem is that it only starts to write the file after it has been fully constructed in memory. Instead, I'd like it to start streaming the output to a file as it's being created, so basically write the file in real time. If I need to rewrite how I'm outputting my file, that's ok... but I'd really like to stay away from rewriting my parsing method, because I don't think that really has much to do with my output (it's just how I get data from the original file so that I can populate my output file - and since DOM reads the entire original file into memory before I parse it and populate my output, it shouldn't be part of the problem).. maybe I'm wrong lol... I'm still very much a newbie!
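One possible way to keep the DOM and Transformer code while still writing in real time is to build each Message as a small, detached DOM fragment and transform it straight to a shared Writer, instead of appending it to one big document. The sketch below is an assumption about how that could be arranged, not code from the original program: the class name FragmentStreamer and its method names are invented, the root start and end tags are written by hand as plain strings, and the Writer is assumed to encode as UTF-8 to match the declaration.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical helper: serializes one small DOM fragment at a time to a shared Writer.
class FragmentStreamer {
    private static final String XSI = "http://www.w3.org/2001/XMLSchema-instance";
    private final Writer out;
    private final Document doc;      // used only as a factory for detached elements
    private final Transformer trans;
    private int numOfItems = 0;

    FragmentStreamer(Writer out) throws ParserConfigurationException, TransformerException {
        this.out = out;
        this.doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        this.trans = TransformerFactory.newInstance().newTransformer();
        // Fragments must not each carry their own <?xml ...?> declaration.
        trans.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        trans.setOutputProperty(OutputKeys.INDENT, "yes");
    }

    private Element textElement(String name, String text) {
        Element e = doc.createElement(name);
        e.appendChild(doc.createTextNode(text));
        return e;
    }

    // Serializes a detached element; it is never attached to doc, so it can be
    // garbage collected once this method returns.
    private void writeFragment(Element fragment) throws TransformerException, IOException {
        trans.transform(new DOMSource(fragment), new StreamResult(out));
        out.write("\n");
        out.flush();
    }

    void writeHead() throws TransformerException, IOException {
        out.write("<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"yes\"?>\n");
        out.write("<AmazonEnvelope xmlns:xsi=\"" + XSI
                + "\" xsi:noNamespaceSchemaLocation=\"amzn-envelope.xsd\">\n");
        Element header = doc.createElement("Header");
        header.appendChild(textElement("DocumentVersion", "1.01"));
        header.appendChild(textElement("MerchantIdentifier", "1234566778"));
        writeFragment(header);
        writeFragment(textElement("MessageType", "Inventory"));
    }

    void writeMessage(String itemSku, String itemAvailability)
            throws TransformerException, IOException {
        numOfItems++;
        Element message = doc.createElement("Message");
        message.appendChild(textElement("MessageID", Integer.toString(numOfItems)));
        message.appendChild(textElement("OperationType", "Update"));
        Element inventory = doc.createElement("Inventory");
        inventory.appendChild(textElement("SKU", itemSku));
        inventory.appendChild(textElement("Quantity", itemAvailability));
        message.appendChild(inventory);
        writeFragment(message);
    }

    void finish() throws IOException {
        out.write("</AmazonEnvelope>\n");
        out.flush();
    }

    public static void main(String[] args) throws Exception {
        StringWriter sw = new StringWriter(); // stand-in for a file writer
        FragmentStreamer s = new FragmentStreamer(sw);
        s.writeHead();
        s.writeMessage("SKU-1", "5");
        s.finish();
        System.out.println(sw);
    }
}
```

The trade-off is that the root element is no longer produced by the Transformer, so the nested fragments are indented relative to themselves rather than to the root, and the hand-written tags have to be kept well-formed manually.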
Thanks again for any advice! :)