  1. #1
    loki

    Removing duplicate whitespace

    Hi all

    Basically I've got some code below which removes all the whitespace from a text file.

    Originally it read through the whole file line by line, but this can take forever, as some of the files that I'm dealing with can be quite large.

    What I want the program to do is remove the whitespace from just the XML tags and not from the whole file. For instance, the tags appear as < t a g >< / t a g > instead of the normal <tag></tag>, which is a problem when I'm trying to use DOMParse to get the XML from the file.

    As you can see from the code, I added the line
    Java Code:
     if (strLine.contains("< M D R - D V D >"))
    This was just to see whether the program would pick up that particular tag, which it did. However, my file has loads of different tags.

    My question is: is there any way of modifying the code so that the program matches all the tag names with a single line of code, without my having to enter every single tag name?

    I have a few other questions, but I'll get this one out of the way first.

    Java Code:
    import java.util.regex.*;
    import java.io.*;

    public class regularexpressions{
      public static void main(String[] args) throws IOException{
        BufferedReader bf = new BufferedReader(new InputStreamReader(System.in));
        System.out.print("Enter file name: ");
        String filename = bf.readLine();
        File file = new File(filename);
        if(!filename.endsWith(".txt")){
          System.out.println("Usage: This is not a text file!");
          System.exit(0);
        }
        else if(!file.exists()){
          System.out.println("File not found!");
          System.exit(0);
        }
        // Compile the pattern once rather than once per line.
        Pattern p = Pattern.compile("\\s+");
        // StringBuilder avoids copying the whole result on every append.
        StringBuilder afterReplace = new StringBuilder();
        String strLine;
        // Read everything first, because the same file is overwritten below.
        try (BufferedReader br = new BufferedReader(new FileReader(filename))) {
          while ((strLine = br.readLine()) != null) {
            // Test only: handles one hard-coded tag; all other lines are dropped.
            if (strLine.contains("< M D R - D V D >")){
              System.out.println(strLine);
              afterReplace.append(p.matcher(strLine).replaceAll("")).append("\r\n");
            }
          }
        }
        try (BufferedWriter out = new BufferedWriter(new FileWriter(filename))) {
          out.write(afterReplace.toString());
        }
      }
    }
    Thanks

    loki

  2. #2
    Singing Boyo

    This is one of those things that sounds simple but is actually quite tricky to program. Your best bet is to use String.split("<") to find where the tags begin and add the '<' back onto the beginning of each String in the returned array (every element except the first, which is the text before the first tag). Then use String.split(">") on each of those to find where the tag ends, adding the '>' back onto the end of the FIRST String in the returned array. You will then have a number of String arrays in which the first String is a tag. After removing the whitespace from that String, you can join all the Strings back together to rebuild the original string, e.g.
    Java Code:
    public class question {
        // Removes the whitespace inside every <...> tag of myString and prints the result.
        public question(String myString){
            // The -1 limit keeps trailing empty strings, so a '<' at the very end is not lost.
            String[] myLongTags = myString.split("<", -1);
            // The text before the first '<' is copied unchanged.
            StringBuilder result = new StringBuilder(myLongTags[0]);
            for(int i = 1; i < myLongTags.length; i++){
                // split() dropped the '<', so put it back, then split at the first '>' only.
                String[] myTags = ("<" + myLongTags[i]).split(">", 2);
                if(myTags.length == 2){
                    // myTags[0] is the tag, myTags[1] is the text that follows it.
                    result.append(removeWhiteSpaces(myTags[0] + ">")).append(myTags[1]);
                } else {
                    // No closing '>' was found, so this is not a tag; leave it unchanged.
                    result.append(myTags[0]);
                }
            }
            System.out.println(result);
        }
        // Removes every whitespace character (spaces, tabs, line breaks) from s.
        public String removeWhiteSpaces(String s){
            return s.replaceAll("\\s+", "");
        }
    }
    Hope this helps. Here's a main method to test it:
    Java Code:
    	public static void main(String[]args){
    		String s = "The quick < b r > brown fox jumped over <b  r > the <l  a zy> dogs";
    		new question(s);
    	}
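
    As for matching every tag "using a single line of code" without listing the tag names, one possible alternative is a regular expression. The following is only a sketch under some assumptions: the class name TagWhitespaceStripper and the method stripTagWhitespace are made up for illustration, a tag is taken to be anything between '<' and the next '>', and the tags are assumed to carry no attributes (whitespace between a tag name and an attribute would be removed as well).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original thread.
public class TagWhitespaceStripper {
    // One tag: '<', then any run of characters other than '<' or '>', then '>'.
    private static final Pattern TAG = Pattern.compile("<[^<>]*>");
    private static final Pattern WHITESPACE = Pattern.compile("\\s+");

    // Returns text with the whitespace removed inside each tag.
    // Text between tags is copied through unchanged.
    static String stripTagWhitespace(String text) {
        Matcher tags = TAG.matcher(text);
        StringBuffer result = new StringBuffer();
        while (tags.find()) {
            String cleaned = WHITESPACE.matcher(tags.group()).replaceAll("");
            // quoteReplacement stops '$' and '\' in the tag from being read as group references.
            tags.appendReplacement(result, Matcher.quoteReplacement(cleaned));
        }
        tags.appendTail(result);
        return result.toString();
    }

    public static void main(String[] args) {
        String s = "The quick < b r > brown fox jumped over <b  r > the <l  a zy> dogs";
        System.out.println(stripTagWhitespace(s));
    }
}
```

    In a file-processing loop the same method could be called on each line as it is read, assuming no tag is split across two lines.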
    Last edited by Singing Boyo; 04-25-2009 at 07:01 PM.
    If the above doesn't make sense to you, ignore it, but remember it - might be useful!
    And if you just randomly taught yourself to program, well... you're just like me!
