  1. #1
    Dirt.Diver (Member; joined Jun 2008; 7 posts)

    Remove duplicate lines from a text file

    Hi,

    This is my first post. :)

    I am trying to remove duplicate lines from a text file. To make things difficult, the duplicate lines have different timestamps but the same reference number. Some groups of duplicates run to 10 lines, whereas others are only 2 lines.

    Here are some examples of duplicate lines, in the format <timestamp>,<reference>,<error message>:

    08:47:22,95847170050,Problem inputting data.
    08:47:29,95847170050,Problem inputting data.
    08:47:35,95847170050,Problem inputting data.
    08:53:28, 96672540040, More problems inputting data.
    08:53:35, 96672540040, More problems inputting data.
    08:53:41, 96672540040, More problems inputting data.

    I want to delete all but the most recent duplicate line.

    I am new to Java, so can you tell me the best way of doing this? :(

    Thank you in advance.
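Each sample line has three comma-separated fields, and in the second group there is a space after each comma. A minimal sketch of splitting such a line, assuming timestamps are always `HH:mm:ss` within a single day, might trim each field (so `95847170050` and ` 95847170050` compare equal) and convert the timestamp to seconds since midnight so that "most recent" becomes a plain integer comparison. Splitting with a limit of 3 keeps any commas inside the message intact. The class `LogLine` and its `parse` method are hypothetical names used only for illustration.

```java
// Hypothetical helper: splits "<timestamp>,<reference>,<error message>" into fields.
class LogLine {
    final int seconds;       // timestamp as seconds since midnight
    final String reference;  // trimmed reference number
    final String message;    // trimmed error message
    final String raw;        // original line, so it can be written back unchanged

    LogLine(int seconds, String reference, String message, String raw) {
        this.seconds = seconds;
        this.reference = reference;
        this.message = message;
        this.raw = raw;
    }

    // Returns null for blank or malformed lines instead of throwing.
    static LogLine parse(String line) {
        // A limit of 3 keeps any commas inside the message intact.
        String[] parts = line.split(",", 3);
        if (parts.length < 3) {
            return null;
        }
        String[] hms = parts[0].trim().split(":");
        if (hms.length != 3) {
            return null;
        }
        try {
            int seconds = Integer.parseInt(hms[0]) * 3600
                        + Integer.parseInt(hms[1]) * 60
                        + Integer.parseInt(hms[2]);
            return new LogLine(seconds, parts[1].trim(), parts[2].trim(), line);
        } catch (NumberFormatException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        LogLine a = parse("08:47:22,95847170050,Problem inputting data.");
        LogLine b = parse("08:53:28, 96672540040, More problems inputting data.");
        System.out.println(a.reference + " at " + a.seconds + "s"); // prints "95847170050 at 31642s"
        System.out.println(b.reference + " at " + b.seconds + "s"); // prints "96672540040 at 32008s"
    }
}
```

Keeping the raw line alongside the parsed fields means the chosen line can be written back out exactly as it appeared in the input.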

  2. #2
    hardwired (Senior Member; joined Jul 2007; 1,576 posts)

    Java Code:
    import java.io.*;
    import java.text.*;
    import java.util.*;
    
    public class PurgeTest {
        static DateFormat df = new SimpleDateFormat("HH:mm:ss");
    
        public static void main(String[] args) {
            String source = "purgeTest.txt";
            List<String> allLines = new ArrayList<String>();
            List<String> references = new ArrayList<String>();
            // Read in the file and load lists.
            readData(source, allLines, references);
            // What did we get?
            print(allLines, "allLines");
            print(references, "references");
            // Process data.
            List<String> latest = getLatestEntries(allLines, references);
            print(latest, "latest");
            // Write out to file.
            String dest = "purgeTestOutput.txt";
            writeToFile(dest, latest);
        }
    
        private static void readData(String path, List<String> allLines,
                                     List<String> references) {
            try {
                File file = new File(path);
                BufferedReader br = new BufferedReader(
                                    new InputStreamReader(
                                    new FileInputStream(file)));
                String line;
                while((line = br.readLine()) != null) {
                    String[] s = line.split(",");
                    // Skip blank or malformed lines that have no reference field.
                    if(s.length < 2) continue;
                    allLines.add(line);
                    // Collect unique references.
                    if(!references.contains(s[1])) {
                        references.add(s[1]);
                    }
                }
                br.close();
            } catch(IOException e) {
                System.out.println("read error: " + e.getMessage());
            }
        }
    
        private static List<String> getLatestEntries(List<String> allLines,
                                                     List<String> references) {
            // For each reference, save the latest entry.
            List<String> list = new ArrayList<String>();
            for(int i = 0; i < references.size(); i++) {
                String ref = references.get(i);
                Date date = null;
                int maxValIndex = -1; // always set below: every reference occurs in allLines
                //System.out.printf("ref = %s%n", ref);
                for(int j = 0; j < allLines.size(); j++) {
                    String next = allLines.get(j);
                    if(next.split(",")[1].equals(ref)) {
                        Date nextDate = parse(next.split(",")[0]);
                        if(date == null) {
                            date = nextDate;
                            maxValIndex = j;
                            continue;
                        }
                        if(nextDate != null && nextDate.compareTo(date) > 0) {
                            date = nextDate;
                            maxValIndex = j;
                        }
                    }
                }
                list.add(allLines.get(maxValIndex));
            }
            return list;
        }
    
        private static Date parse(String s) {
            try {
                return df.parse(s);
            } catch(ParseException e) {
                System.out.printf("parse error for %s: %s%n",
                                   s, e.getMessage());
                return null;
            }
        }
    
        private static void writeToFile(String path, List<String> list) {
            try {
                File file = new File(path);
                BufferedWriter bw = new BufferedWriter(
                                    new OutputStreamWriter(
                                    new FileOutputStream(file)));
                for(int i = 0; i < list.size(); i++) {
                    String s = list.get(i);
                    bw.write(s, 0, s.length());
                    bw.newLine();
                }
                bw.close();
            } catch(IOException e) {
                System.out.println("write error: " + e.getMessage());
            }
        }
    
        private static void print(List<String> list, String s) {
            System.out.println(s + " =");
            for(int i = 0; i < list.size(); i++) {
                System.out.println(list.get(i));
            }
            System.out.println("---------------");
        }
    }
    purgeTest.txt
    Java Code:
    08:47:22,95847170050,Problem inputting data.
    08:47:29,95847170050,Problem inputting data.
    08:47:35,95847170050,Problem inputting data.
    08:53:28, 96672540040, More problems inputting data.
    08:53:35, 96672540040, More problems inputting data.
    08:53:41, 96672540040, More problems inputting data.
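The nested loops in `getLatestEntries` rescan every line for each reference, so the work grows with (number of references) × (number of lines). One possible single-pass alternative, sketched below, keeps a `LinkedHashMap` from reference to the latest line seen so far; `LinkedHashMap` preserves the order in which keys were first inserted, so the output order matches the first appearance of each reference. This sketch assumes Java 5 or later (generics), zero-padded `HH:mm:ss` timestamps within one day (so they sort chronologically as plain strings), and, unlike the program above, trims the reference so `95847170050` and ` 95847170050` count as the same key. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical single-pass variant of getLatestEntries.
class LatestEntries {

    // For each reference, keeps the line with the latest timestamp.
    // Output order follows the first appearance of each reference.
    static List<String> latestPerReference(List<String> lines) {
        Map<String, String> latest = new LinkedHashMap<String, String>();
        for (String line : lines) {
            String[] s = line.split(",");
            if (s.length < 2) {
                continue; // skip blank or malformed lines
            }
            String ref = s[1].trim();
            String time = s[0].trim();
            String previous = latest.get(ref);
            // Zero-padded HH:mm:ss strings sort chronologically as plain text;
            // ">=" lets the later line in the file win when timestamps are equal.
            if (previous == null || time.compareTo(previous.split(",")[0].trim()) >= 0) {
                latest.put(ref, line); // re-putting a key keeps its original position
            }
        }
        return new ArrayList<String>(latest.values());
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
            "08:47:22,95847170050,Problem inputting data.",
            "08:47:29,95847170050,Problem inputting data.",
            "08:47:35,95847170050,Problem inputting data.",
            "08:53:28, 96672540040, More problems inputting data.",
            "08:53:35, 96672540040, More problems inputting data.",
            "08:53:41, 96672540040, More problems inputting data.");
        for (String line : latestPerReference(lines)) {
            System.out.println(line);
        }
    }
}
```

With the purgeTest.txt lines this keeps the 08:47:35 and 08:53:41 lines, the same two the program above selects, while touching each input line only once.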

  3. #3
    fishtoprecords (Senior Member; joined Jun 2008; 571 posts)

    On a good OS there's no need to write a program; just run:
    sort < infile.txt | uniq
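`uniq` drops only adjacent identical lines (hence the `sort` first), so this pipeline removes exact duplicates only. Every sample line has a different timestamp, so none of them are identical and `sort < infile.txt | uniq` would leave all six in place. A rough Java counterpart, sketched below with a `TreeSet` (sorted, no duplicates) standing in for `sort | uniq`, makes that easy to check. `TreeSet` orders strings by `String.compareTo`, which may differ from the locale-dependent ordering of `sort`; the class name `SortUniq` is hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

// Hypothetical Java counterpart of "sort < infile.txt | uniq".
class SortUniq {

    // Sorts the lines and drops exact duplicates, like sort | uniq.
    static List<String> sortUniq(List<String> lines) {
        return new ArrayList<String>(new TreeSet<String>(lines));
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
            "08:47:29,95847170050,Problem inputting data.",
            "08:47:22,95847170050,Problem inputting data.",
            "08:47:22,95847170050,Problem inputting data.",
            "08:53:28, 96672540040, More problems inputting data.");
        // Only the exact repeat of the 08:47:22 line is removed;
        // lines that differ only in their timestamp all survive.
        for (String line : sortUniq(lines)) {
            System.out.println(line);
        }
    }
}
```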

  4. #4
    Eranga (Moderator; Colombo, Sri Lanka; joined Jul 2007; 11,372 posts)

  5. #5
    fishtoprecords (Senior Member; joined Jun 2008; 571 posts)

    Quote Originally Posted by Eranga:
      What do you mean by 'good OS'? I'm not clear on that.

    I didn't want to start any flame wars. :-)

    Any OS with the Unix tools: any Linux, BSD, or Unix, Apple OS X, or even Windows with the Cygwin package.

  6. #6
    Eranga (Moderator; Colombo, Sri Lanka; joined Jul 2007; 11,372 posts)

    Got the point. But rather than relying on add-ons, I think it's better to make effective use of code.

  7. #7
    Dirt.Diver (Member; joined Jun 2008; 7 posts)

    Thank you, hardwired.

    When I compile the code, I get the following errors, and I can't see what is wrong:

    PurgeTest.java:10: '(' or '[' expected
    List<String> allLines = new ArrayList<String>();
    ^
    PurgeTest.java:11: '(' or '[' expected
    List<String> references = new ArrayList<String>();
    ^
    PurgeTest.java:25: <identifier> expected
    private static void readData(String path, List<String> allLines, List<String> references) {
    ^
    PurgeTest.java:111: ')' expected
    }

    I am compiling against j2sdk1.4.2_17 due to work constraints.

  8. #8
    Eranga (Moderator; Colombo, Sri Lanka; joined Jul 2007; 11,372 posts)

  9. #9
    Dirt.Diver (Member; joined Jun 2008; 7 posts)

    But I still get the same errors on compile.

    I am using j2sdk1.4.2_17 due to work constraints. Could this be the problem?

  10. #10
    Eranga (Moderator; Colombo, Sri Lanka; joined Jul 2007; 11,372 posts)

    The JDK version shouldn't matter here, because List has been available since around 1.2, if I remember correctly.

    Try declaring the List as an ArrayList and see, though honestly it probably won't help.

  11. #11
    Dirt.Diver (Member; joined Jun 2008; 7 posts)

    I have redefined List as ArrayList and I get the same problems.

    >javac PurgeTest.java
    PurgeTest.java:10: '(' or '[' expected
    ArrayList<String> allLines = new ArrayList<String>();
    ^
    PurgeTest.java:11: '(' or '[' expected
    ArrayList<String> references = new ArrayList<String>();
    ^
    PurgeTest.java:25: <identifier> expected
    private static void readData(String path, ArrayList<String> allLines, ArrayList<String> references) {
    ^
    PurgeTest.java:111: ')' expected
    }
    ^
    4 errors

  12. #12
    Eranga (Moderator; Colombo, Sri Lanka; joined Jul 2007; 11,372 posts)

  13. #13
    Dirt.Diver (Member; joined Jun 2008; 7 posts)

    In line 111 I have a "}" and nothing else.

  14. #14
    daGame (Member; joined May 2008; 24 posts)

    Quote Originally Posted by Dirt.Diver:
      But I still get the same errors on compile.

      I am using j2sdk1.4.2_17 due to work constraints. Could this be the problem?

    The errors occur because j2sdk 1.4.2 doesn't support generics. Just remove the <String> parts from the List<String> declarations and then try again. Generics are supported from Java 1.5 onward.

    regards

  15. #15
    Dirt.Diver (Member; joined Jun 2008; 7 posts)

    Studying further, I found that generics (the stuff in angle brackets) were introduced in 1.5.

    I have removed them.

    I changed "List" to "ArrayList", placed casts in the appropriate places, and got it to work.

    Thank you
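For anyone else stuck on j2sdk 1.4.2, the changes described above amount to using raw collection types and casting each element back to `String` when it is read. A minimal sketch of `getLatestEntries` in that style follows. It avoids generics, the enhanced `for` loop, and `System.out.printf` (which hardwired's `parse` method calls), all of which arrived in Java 5. As a simplification, it compares the zero-padded `HH:mm:ss` strings directly instead of parsing them into `Date` objects, which assumes all timestamps fall within one day. The class name `PurgeTest14` is hypothetical; compiling it with a newer JDK prints unchecked-operation warnings, which is expected for raw types.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical Java 1.4-style version: raw List types plus explicit casts.
class PurgeTest14 {

    // For each reference, returns the line with the latest HH:mm:ss timestamp.
    static List getLatestEntries(List allLines, List references) {
        List list = new ArrayList();
        for (int i = 0; i < references.size(); i++) {
            String ref = (String) references.get(i); // cast needed without generics
            String latestTime = null;
            int maxValIndex = -1;
            for (int j = 0; j < allLines.size(); j++) {
                String[] fields = ((String) allLines.get(j)).split(",");
                if (fields.length < 2 || !fields[1].equals(ref)) {
                    continue; // malformed line or a different reference
                }
                // Zero-padded HH:mm:ss strings compare chronologically as text.
                if (latestTime == null || fields[0].compareTo(latestTime) > 0) {
                    latestTime = fields[0];
                    maxValIndex = j;
                }
            }
            if (maxValIndex >= 0) {
                list.add(allLines.get(maxValIndex));
            }
        }
        return list;
    }

    public static void main(String[] args) {
        List allLines = new ArrayList();
        allLines.add("08:47:22,95847170050,Problem inputting data.");
        allLines.add("08:47:35,95847170050,Problem inputting data.");
        allLines.add("08:53:41, 96672540040, More problems inputting data.");
        List references = new ArrayList();
        references.add("95847170050");
        references.add(" 96672540040"); // untrimmed key, as produced by split(",")
        List latest = getLatestEntries(allLines, references);
        for (int i = 0; i < latest.size(); i++) {
            String line = (String) latest.get(i);
            System.out.println(line);
        }
    }
}
```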

  16. #16
    Dirt.Diver (Member; joined Jun 2008; 7 posts)
