  1. #1
    dvreed77 is offline Member
    Join Date
    Feb 2011
    Posts
    27
    Rep Power
    0

    Problems with string split

    Hi,

    I am trying to read and parse the header of a dat file. I am porting code over from C++ where name-value pairs are easily handled by copying the header to a char array and doing sscanf( --, "%s %f",--).

    In Java I am reading the file through a DataInputStream and calling read(char[], 0, 80*Character.SIZE), where 80*Character.SIZE is the length of the header. I then convert the char array to a String, and this is where things get funny.
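
    For reference, Character.SIZE is the width of a char in bits (16), so 80*Character.SIZE asks for 1280 characters rather than 80. A minimal sketch of reading a fixed-length header as raw bytes follows. It assumes the header uses one byte per character; the class name, the readHeader helper and the sample bytes are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

class HeaderReader {
    // Reads exactly `length` bytes and decodes them one byte per char.
    static String readHeader(InputStream in, int length) throws IOException {
        byte[] raw = new byte[length];
        // readFully throws EOFException if the stream ends before `length` bytes.
        new DataInputStream(in).readFully(raw);
        return new String(raw, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical stand-in for the real file contents.
        byte[] fake = "JDAY 001 and the rest of the file".getBytes(StandardCharsets.ISO_8859_1);
        String header = readHeader(new ByteArrayInputStream(fake), 8);
        System.out.println(header);
    }
}
```

    ISO-8859-1 maps every byte to the char with the same value, so nothing is lost or replaced during decoding, whatever the bytes turn out to be.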

    If I call println on this string, nothing prints out. If I trim the string first, only the first name-value pair prints. (I am looking at the output in the Eclipse console.)

    If I do s1 = s1.replaceAll("\\W+", " "), everything in the header is printed, except that values with a minus sign or a decimal point are split at that character, so 67.52 becomes 67 52.

    If I do String[] s2 = s1.split("\\W+"), the same thing happens: non-integer numbers end up in two separate elements.

    If I do String[] s2 = s1.split("[ ]+"), all the values are kept in good condition but all the names are removed.
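
    For illustration, a minimal sketch of how these patterns behave on an ordinary string. The sample text is made up (it is not the real header) and the class and method names are hypothetical. \W matches any character that is not a letter, digit, or underscore, so it also matches '.' and '-', while \s matches whitespace only.

```java
import java.util.Arrays;

class RegexSplitDemo {
    // Replaces every run of non-word characters with a single space.
    static String collapseNonWord(String s) {
        return s.replaceAll("\\W+", " ");
    }

    // Splits on runs of whitespace only, leaving '.' and '-' untouched.
    static String[] splitOnWhitespace(String s) {
        return s.split("\\s+");
    }

    public static void main(String[] args) {
        // Hypothetical sample text; the real header contents differ.
        String sample = "LAT -67.52  LON 12.5";
        System.out.println(collapseNonWord(sample));                     // LAT 67 52 LON 12 5
        System.out.println(Arrays.toString(splitOnWhitespace(sample))); // [LAT, -67.52, LON, 12.5]
    }
}
```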

    I can look at this file in emacs and see there are '@' between each name-value pair and whitespace between the actual name and value.

    Hopefully I have provided enough information. Please let me know if you can help out in any way.

    Thanks in advance.

  2. #2
    Petr is offline Senior Member
    Join Date
    Jan 2011
    Location
    Russia
    Posts
    618
    Rep Power
    4

    Can you show some example data that you read from the file, and the result you want to achieve? You wrote a lot of text, but it is not clear.
    Skype: petrarsentev
    http://TrackStudio.com

  3. #3
    dvreed77 is offline Member

    Hi,

    I thought it was a little convoluted. Here is a sample of the dat file and my code.

    I had to relabel it as .txt but it still has the same problem.

    Java Code:
    import java.io.BufferedReader;
    import java.io.File;
    import java.io.FileInputStream;
    import java.io.FileReader;
    import java.io.IOException;

    public class ArrayReader {
        public ArrayReader() throws IOException {
            File inputFile = new File("example.txt");
            System.out.println("File exists: " + inputFile.exists());
            try (FileInputStream instream = new FileInputStream(inputFile)) {
                System.out.println("Available: " + instream.available());
            }

            // Character.SIZE is the number of bits in a char (16), so this buffer holds 640 chars.
            char[] header = new char[40 * Character.SIZE];
            int charsRead;
            try (BufferedReader bufin = new BufferedReader(new FileReader(inputFile))) {
                charsRead = bufin.read(header, 0, header.length);
            }
            // Keep only the characters that were actually read (read returns -1 at end of file).
            String s1 = new String(header, 0, Math.max(charsRead, 0));

            // \W+ matches runs of non-word characters, which includes '.' and '-'.
            System.out.println(s1.replaceAll("\\W+", " "));

            // Split on runs of whitespace and print each token.
            for (String token : s1.split("\\s+")) {
                System.out.println(token);
            }
        }
    }
    You should notice that when I do s1.replaceAll it prints out everything but also breaks values apart at periods, etc. When I do the split, it doesn't see the name of the name-value pair.

    Thanks for the help.

  4. #4
    Petr is offline Senior Member

    I see the file uses a special encoding. Which encoding is it?

  5. #5
    dvreed77 is offline Member

    I do not know how this was written. Is there any way to figure it out?

  6. #6
    Petr is offline Senior Member

    OK. When I open the file in gedit I see the following:
    Java Code:
    TIME   Wed Sep 27 22:26:15 2006

    As I understand it, you want to get key/value pairs that look like this:
    TIME=Wed Sep 27 22:26:15 2006
    JDAY=001
    Am I right?

  7. #7
    dvreed77 is offline Member

    Right, exactly.

  8. #8
    dvreed77 is offline Member

    Hi,

    I was wondering whether you had any luck with the example I gave. I am still having difficulty parsing this file.

    Thanks again

  9. #9
    toadaly is offline Senior Member
    Join Date
    Jan 2009
    Posts
    671
    Rep Power
    6

    Well, if I were doing this, I'd treat the first line specially, since it has a different format from the second line. I'd do a simple String.startsWith("TIME") to make sure it's the first thing in the first line, and then take a substring to get the rest of the first line. For the second line, I'd split first on "\\.+". This gives you the tokens to parse. Then I'd split each token on whitespace to get the key and value.

  10. #10
    dvreed77 is offline Member

    Thanks Toadaly,

    The weird thing is that it's all whitespace according to Java. If you look at this file in emacs or some other viewer, there is a difference between the characters outside the pairs and those within the pairs themselves.

    In the example I gave I did s1.replaceAll("\\W+", " "), and this replaces the separators both outside the pairs and within the pairs with a space. Similarly, if I do s1.replaceAll("\\W+", "-"), it again replaces everything outside the pairs and within the pairs with a dash.

    There is a difference between these characters but I don't know how to find out what it is.
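
    One way to find out, sketched here under the assumption that the header is already in a String: print the numeric code of every character that is not visible ASCII. The class name, the describe helper and the sample string are hypothetical.

```java
class CharDump {
    // Returns the input with every character outside the visible ASCII range
    // (including spaces and tabs) replaced by an escape such as <U+0009>.
    static String describe(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c > ' ' && c < 0x7F) {
                out.append(c); // visible ASCII: keep as-is
            } else {
                out.append(String.format("<U+%04X>", (int) c));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Made-up sample: a tab inside the pair, some other separator after it.
        String sample = "JDAY\t001\u00A0\u00A0LAT";
        System.out.println(describe(sample));
    }
}
```

    Once the codes are visible, characters that look identical in a console or editor can be told apart and matched explicitly in a regex.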

    Thanks for the help.

  11. #11
    toadaly is offline Senior Member

    Ok, so the "." are really an unknown character that emacs displays as ".". No problem. Instead of trying to strip those out, you can create a pattern to look for the other stuff:

    Java Code:
    import java.util.regex.*;
    ...

    // One or more capitals, then whitespace, then a run of visible characters.
    Pattern pattern = Pattern.compile("([A-Z]+\\p{Space}+\\p{Graph}*)");
    Matcher matcher = pattern.matcher(line); // line: your line of characters

    while (matcher.find()) {
      String keyValPair = matcher.group(1);
    }
    This regex says: "look for one or more all-caps alpha characters, followed by at least one whitespace character, followed by zero or more displayable characters (alphas, numerics, and punctuation)". The parentheses mark the capturing group. Each call to find() skips over whatever lies between matches, so no leading or trailing "don't care" pattern is needed; a greedy ".*" in front would swallow all but the last pair.

  12. #12
    dvreed77 is offline Member

    Hi,

    I figured it out. I had to go look at the actual bytes of data. The characters in between were null chars, so I just did:

    s1.split("\0+")

    and it worked perfectly.

    Thanks for taking a look.
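
    For anyone landing here later, a minimal sketch of turning such a header into name-value pairs. It assumes that pairs are separated by runs of null characters and that each name is separated from its value by whitespace, as described in this thread. The sample header, the class name and the parseHeader helper are made up for illustration. In Java source the pattern is written "\0+": the string literal holds a null character followed by +.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class HeaderParser {
    static Map<String, String> parseHeader(String header) {
        Map<String, String> pairs = new LinkedHashMap<>();
        // "\0+" is a regex for a run of one or more null characters.
        for (String pair : header.split("\0+")) {
            // Limit of 2 keeps values that contain spaces in one piece.
            String[] nameValue = pair.trim().split("\\s+", 2);
            if (nameValue.length == 2) {
                pairs.put(nameValue[0], nameValue[1]);
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        // Hypothetical header; the real file has more fields.
        String header = "TIME   Wed Sep 27 22:26:15 2006\0\0\0JDAY 001\0\0LAT -67.52\0\0\0\0";
        System.out.println(parseHeader(header));
    }
}
```

    A LinkedHashMap keeps the pairs in file order, which makes the output easier to compare against the raw header.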
