  1. #1
    Salacious is offline Member
    Join Date
    Jan 2012
    Posts
    3
    Rep Power
    0

    Default Tokenizing string[] and csv files

    I am not sure if this belongs here. However, I am not new to Java, so I didn't post this in the "I am a newb, please help me grasp the basics" section. I should also point out that I am not looking for the answer (that's cheating); I am looking for someone to say "read this...". I learn best from examples, if that helps at all.

    Question

    I recently pulled some data out of an XML CDATA section. It is passed into my method as a String and split into a String[] info array with split("\\|", -1); the -1 limit keeps trailing empty fields. Before being tokenized, the data looks like this:

    Note: the data shown here is the String that is passed into the method below.

    Apples|jake|jack daniels|1234
    45|james|bananas

    When tokenized and placed into the array, it is printed as:

    Java Code:
    Apples
    Jake
    Jack Daniels
    1234
       45 //New Line, Debugger shows it as: 1234      \n45
    James
    Bananas
    I need to take that and write it to a CSV file that looks like:

    Java Code:
    Apples  Jake    Jack Daniels 1234
    45      James   Bananas
    The code I am using to tokenize everything is:

    Java Code:
    	private static final String getInfo(String str) throws Exception
    	{
    		// split() does the tokenizing on its own, so no StringTokenizer loop is needed.
    		// "\\|" matches a literal pipe; the -1 limit keeps trailing empty fields.
    		String[] info = str.trim().split("\\|", -1);

    		// Print each token on its own line.
    		for (int i = 0; i < info.length; i++)
    		{
    			System.out.println(info[i]);
    		}

    		return "Nothing to return";
    	}
    So, based on the information I have given (assuming it makes sense and is enough), what are your thoughts on how I would get the tokenized data into a CSV file formatted as shown? I have been up and down the Java documentation and in and out of Google. I know how to write to a CSV file; the problem is that when I pass this data as-is into a method that creates the CSV file, the file is created but never populated.

    If this isn't the right section, please move it.
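
    The writing code is not shown in this post, so this is only a guess: a file that is created but stays empty is often the result of a writer that is never flushed or closed. A minimal sketch, assuming the rows are already comma-delimited Strings; the class name CsvFileWriter and the method writeRows are hypothetical names used only for illustration:

```java
import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;

public class CsvFileWriter {

    // Writes each row on its own line.
    // try-with-resources closes the writer when the block ends, and closing
    // a BufferedWriter flushes any text still held in its buffer.
    public static void writeRows(String path, String[] rows) throws IOException {
        try (BufferedWriter out = new BufferedWriter(new FileWriter(path))) {
            for (String row : rows) {
                out.write(row);
                out.newLine();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // "data2.csv" is the output name mentioned in the thread's code comments.
        writeRows("data2.csv", new String[] {
            "Apples,jake,jack daniels,1234",
            "45,james,bananas"
        });
    }
}
```

    If the original method opens a FileWriter and returns without calling close() or flush(), that alone could explain the empty file.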

  2. #2
    Fubarable is offline Moderator
    Join Date
    Jun 2008
    Posts
    19,316
    Blog Entries
    1
    Rep Power
    25

    Default Re: Tokenizing string[] and csv files

    I'm confused as that's the most un-CSV data I've ever seen. I'm sure you know, but CSV means comma-separated values.

    Can you outline exactly the behavior you're trying to achieve and how your current code does not meet this requirement?

  3. #3
    Salacious is offline Member
    Join Date
    Jan 2012
    Posts
    3
    Rep Power
    0

    Default Re: Tokenizing string[] and csv files

    I do know that, and that's the problem. All the information I pulled out of the CDATA section of the XML was pipe delimited, as shown in the example above. I need to take that data and somehow put it into a CSV file. So that would mean (as I re-evaluate what I have done and do a #headtodesk) I need to turn all those pipes into ',' so they can be processed. But that still leaves me with the issue of how to get it into CSV, even once the pipes are transformed into ','.

    Essentially to dumb this right down to one sentence with a question:

    The pipe-delimited data I received from the CDATA section of the XML file I read in needs to be placed into a CSV file. How?

    --> Do I change all the pipes to commas? (Is that possible? I have never tried.)

  4. #4
    Fubarable is offline Moderator
    Join Date
    Jun 2008
    Posts
    19,316
    Blog Entries
    1
    Rep Power
    25

    Default Re: Tokenizing string[] and csv files

    Maybe I'm misunderstanding your question and over-simplifying, but the pipes can be changed to commas easily with String's replace(...) or replaceAll(...) method (replaceAll(...) takes a regular expression, so the pipe has to be escaped as "\\|" there). There are still some unresolved issues in my mind, though, including:
    • Is there any chance that the pre-analyzed (pre-piped) text contains pipe symbols, and if so, how would you want to handle that?
    • Is there any chance that the pre-analyzed (pre-piped) text contains commas, and if so, how would you want to handle that? Would you want to enclose the String in quotes?
    • Could the pre-analyzed (pre-piped) text contain quoted Strings, and if so, how would you want to handle that?
    • I'm still unclear on how you wish to handle line breaks. Do you want to change them to spaces? Leave them in? Treat them the same as pipes and use them as a place to put a comma delimiter?
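
    To make the comma and quote questions above concrete, here is a minimal sketch of one common answer: the quoting convention described in RFC 4180, where a field containing a comma, a double quote, or a line break is wrapped in double quotes and embedded quotes are doubled. It assumes a pipe never appears inside a field and that each input String is a single line; the class and method names (PipeToCsv, escapeField, toCsvLine) are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class PipeToCsv {

    // Quote a field only when it contains a comma, a double quote, or a line
    // break; embedded double quotes are doubled.
    public static String escapeField(String field) {
        boolean needsQuotes = field.contains(",") || field.contains("\"")
                || field.contains("\n") || field.contains("\r");
        if (!needsQuotes) {
            return field;
        }
        return "\"" + field.replace("\"", "\"\"") + "\"";
    }

    // Convert one pipe-delimited line into one CSV row.
    // Assumption: a pipe never appears inside a field.
    public static String toCsvLine(String pipeLine) {
        String[] fields = pipeLine.split("\\|", -1); // -1 keeps trailing empty fields
        List<String> escaped = new ArrayList<>();
        for (String field : fields) {
            escaped.add(escapeField(field));
        }
        return String.join(",", escaped);
    }

    public static void main(String[] args) {
        // prints: Apples,jake,jack daniels,1234
        System.out.println(toCsvLine("Apples|jake|jack daniels|1234"));
        // prints: 45,"james, jr.","say ""hi"""
        System.out.println(toCsvLine("45|james, jr.|say \"hi\""));
    }
}
```

    If pipes can also occur inside the data itself, this split-based approach is not enough, and the source format would need its own escaping rule first.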

  5. #5
    Salacious is offline Member
    Join Date
    Jan 2012
    Posts
    3
    Rep Power
    0

    Default Re: Tokenizing string[] and csv files

    Let me post everything I have and give more information, which is what I should have done in my first post >_<. Essentially, below is all the code I have written so far. It rips the data from the CDATA section, parses it for pipes and spits it out.

    So with this information, this code and all this jazz, I will again boil this down. I apologize for not giving all the info.

    How do I take this pipe-delimited info, with commas, quotes and possibly other pipes, and in as little code as possible (apparently 10 lines or less, which is impossible to me but not to a computer science graduate that is my boss) convert it all to CSV? Are there any examples, documentation or the like for this out there?

    • There are line breaks
    • There are commas
    • There are quotes
    • There might be other |'s, but as far as I can see the pipes are separating the data like cells in a CSV file (when opened in Excel)
    • Must be comma delimited; each line break is a new row



    Thoughts?

    Your help and patience with my apparent stupidity is much appreciated.

    Java Code:
    package xmlcsv;
    
    import java.io.File;
    import java.io.FileWriter;
    import java.util.StringTokenizer;
    
    import javax.xml.parsers.DocumentBuilder;
    import javax.xml.parsers.DocumentBuilderFactory;
    
    import org.w3c.dom.CDATASection;
    import org.w3c.dom.CharacterData;
    import org.w3c.dom.Document;
    import org.w3c.dom.Element;
    import org.w3c.dom.Node;
    import org.w3c.dom.NodeList;
    
    /**
     * 
     * Essentially we take a xml file, find its cdata section and then
     * process the information in there to then spit out to a csv file.
     * 
     * This code contains hard coded information that should be changed.
     * You are granted permission to change, alter, redistribute this at will.
     *
     */
    public class XmlCSV {
    
    	/**
    	 * We need to process the data in the Cdata section and spit it out in a csv file.
    	 * This is the "main" part of the program which is run.
    	 * 
    	 * @param args
    	 */
    	public static void main(String[] args) throws Exception 
    	{
    		File file = new File("Data.xml");
    		DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
    		Document doc = db.parse(file);
    		doc.getDocumentElement().normalize();
    
    		NodeList nodes = doc.getElementsByTagName("TransmissionWrapper");
    		for (int i = 0; i< nodes.getLength(); i++)
    		{
    			Element element = (Element) nodes.item(i);
    			NodeList cData = element.getElementsByTagName("TransmissionData");
    			Element line = (Element) cData.item(0);
    
    			//test
    			getInfo(getCData(line));
    		}
    	}
    
    	/**
    	 * We want to get all the data out of the CData Section.
    	 * Pass it into the getInfo() method to then get the
    	 * the data which will be parsed and thrown into a csv file.
    	 * 
    	 * @param e
    	 * @return
    	 */
    	private static final String getCData(Element e)
    	{
    		NodeList child = e.getChildNodes();
    		if (child != null)
    		{
    			for (int i = 0; i< child.getLength(); i++)
    			{
    				Node childNode = child.item(i);
    				if (childNode.getNodeType() == Node.CDATA_SECTION_NODE)
    				{
    					CDATASection cdata = (CDATASection) childNode;
    					String data = cdata.getData();
    					return data;
    				}
    			}
    		}
    
    		return "Theres nothing there";
    	}
    
    	/**
    	 * We need to tokenize based on the pipe symbol and spit
    	 * it out into another method that will then place it into
    	 * the CSV file to which we have named (hardcoded): "data2.csv"
    	 * 
    	 * @throws Exception 
    	 * 
    	 */
    	private static final String getInfo(String str) throws Exception
    	{
    		// split() does the tokenizing on its own, so no StringTokenizer loop is needed.
    		// "\\|" matches a literal pipe; the -1 limit keeps trailing empty fields.
    		String[] info = str.trim().split("\\|", -1);

    		// Print each token on its own line.
    		for (int i = 0; i < info.length; i++)
    		{
    			System.out.println(info[i]);
    		}

    		return "Nothing to return";
    	}
    
    }
    I really think I am going in the wrong direction, and I really need someone to just point me in the right one.
    Last edited by Salacious; 01-28-2012 at 10:01 PM.

  6. #6
    bams is offline Member
    Join Date
    Jan 2012
    Posts
    8
    Rep Power
    0

    Default Re: Tokenizing string[] and csv files

    Try using Regular Expressions in Java

    Regular Expressions and the Java Programming Language
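
    For this thread, the regular-expression detail that matters most is that String.split(...) takes a regex and that | is the alternation operator, so an unescaped "|" matches the empty string rather than a pipe character. A small sketch, assuming one line of pipe-delimited input; SplitOnPipe and fields are hypothetical names:

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class SplitOnPipe {

    // Pattern.quote("|") yields a regex that matches a literal pipe;
    // it has the same effect here as writing "\\|" by hand.
    public static String[] fields(String line) {
        return line.split(Pattern.quote("|"), -1); // -1 keeps trailing empty fields
    }

    public static void main(String[] args) {
        // prints: [45, james, bananas]
        System.out.println(Arrays.toString(fields("45|james|bananas")));
        // prints: [a, , ]
        System.out.println(Arrays.toString(fields("a||")));
    }
}
```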

  7. #7
    Tolls is offline Moderator
    Join Date
    Apr 2009
    Posts
    11,450
    Rep Power
    19

    Default Re: Tokenizing string[] and csv files

    I realise this is a couple of days old now, but Fubarable is quite right I think.
    If this is your data:
    Apples|jake|jack daniels|1234
    45|james|bananas

    Then read a line, replace the pipes with commas and write it out to the new file.
    Repeat to EOF.

    Indeed, you could probably use Apache's CSV library to handle much of this for you, though I haven't used it myself.
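
    The read, replace, write loop described above could be sketched roughly as follows. This is only an illustration under the simplest assumption, namely that no field contains a comma, a quote or a pipe, so a plain character replace is enough. LineByLineConverter and convert are hypothetical names, and the input is supplied from a String here so the example runs on its own.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;

public class LineByLineConverter {

    // Reads pipe-delimited lines from 'in' and writes comma-delimited lines to 'out'.
    // Assumption: no field contains a comma, quote, or pipe.
    public static void convert(Reader in, Writer out) throws IOException {
        BufferedReader reader = new BufferedReader(in);
        BufferedWriter writer = new BufferedWriter(out);
        String line;
        while ((line = reader.readLine()) != null) { // null marks the end of the input
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue;                             // skip blank lines
            }
            writer.write(trimmed.replace('|', ','));
            writer.newLine();
        }
        writer.flush(); // push buffered text through to the underlying writer
    }

    public static void main(String[] args) throws IOException {
        String cdata = "Apples|jake|jack daniels|1234\n   45|james|bananas\n";
        StringWriter result = new StringWriter();
        convert(new StringReader(cdata), result);
        System.out.print(result);
    }
}
```

    To work with files instead, the same method can be given a FileReader and a FileWriter, with the caller closing both when done.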
