  1. #1
    bezudar is offline Member
    Join Date
    Nov 2008
    Posts
    7
    Rep Power
    0

    Default Compare 2 large files

    Hi,
    I want to compare two large files, both around 300 MB. They are text files containing a single column of numbers. I want to get the values that are unique to file2. I tried using diff, but it failed with a "memory exhausted" error. Both files are sorted, and I am running on a 2 GHz Core 2 Duo with 1 GB of RAM. Help!!
    I also tried Java, reading 100 records at a time and comparing them,
    but I get this error:
    Exception in thread "main" java.lang.StackOverflowError
    at java.nio.Buffer.<init>(Unknown Source)
    at java.nio.CharBuffer.<init>(Unknown Source)
    at java.nio.HeapCharBuffer.<init>(Unknown Source)
    at java.nio.CharBuffer.wrap(Unknown Source)
    at sun.nio.cs.StreamDecoder$CharsetSD.implRead(Unknown Source)
    at sun.nio.cs.StreamDecoder.read(Unknown Source)
    at java.io.InputStreamReader.read(Unknown Source)
    at java.io.BufferedReader.fill(Unknown Source)
    at java.io.BufferedReader.readLine(Unknown Source)
    at java.io.BufferedReader.readLine(Unknown Source)
    Thanks in advance
    AC

    My code is:
    import java.io.BufferedReader;
    import java.io.BufferedWriter;
    import java.io.File;
    import java.io.FileReader;
    import java.io.FileWriter;
    import java.io.IOException;
    import java.io.LineNumberReader;
    import java.io.RandomAccessFile;
    import java.io.Reader;
    import java.util.ArrayList;
    import java.util.Stack;

    import com.sun.org.apache.bcel.internal.generic.NEW;


    public class Try1 {
    int count1=0;
    int count2=0;
    int count3=0;
    int count4=0;
    File file1 = new File("c:\\1.txt");
    BufferedReader bufferReader1;
    ArrayList<Long> tempArrayList2 = new ArrayList<Long>();
    File file2 = new File("c:\\3.txt");
    BufferedWriter bufferwriter;
    File file = new File("c:\\2.txt");
    BufferedReader bufferReader;

    public void init(){
    try {
    bufferwriter = new BufferedWriter(new FileWriter(file2));
    bufferReader1 = new BufferedReader(new FileReader(
    file1));

    bufferReader = new BufferedReader(new FileReader(
    file));


    BufferedReader bufferReader3 = new BufferedReader(new FileReader(
    file));
    String readLine = null;
    RandomAccessFile randFile = new RandomAccessFile(file, "r");
    long lastRec = randFile.length();
    randFile.close();

    LineNumberReader lineRead = new LineNumberReader(bufferReader);
    lineRead.skip(lastRec);
    int countRec = lineRead.getLineNumber() + 1; // declared here so the class compiles
    } catch (IOException e) {
    // TODO Auto-generated catch block
    e.printStackTrace();
    }
    }

    public void readBlacklistFile(Stack<Long> tempArrayList){
    ArrayList<Long> tempArrayList1 = new ArrayList<Long>();
    String readLine = null;


    try{


    LineNumberReader lineNumber= new LineNumberReader(bufferReader1);
    while ((readLine = bufferReader1.readLine()) !=null) {
    Long tempL = Long.parseLong(readLine.trim());
    count2++;
    System.out.println(tempArrayList.get(tempArrayList.size()-1));
    if(tempL <=tempArrayList.get(tempArrayList.size()-1)){
    //System.out.println("abhi");
    tempArrayList1.add(tempL);
    //System.out.println(tempArrayList1.get(count2-1));
    }else {
    //count3 = lineNumber.getLineNumber();
    System.out.println("kals");
    break;
    }

    }
    for(int i=0;i<tempArrayList.size();i++){
    System.out.println("final");
    long l = tempArrayList.get(i);
    tempArrayList2.add(l);
    if(tempArrayList1.contains(l)){
    System.out.println("kas");
    tempArrayList2.remove(l);
    }

    }
    System.out.println("sasas");
    tempArrayList.clear();
    tempArrayList1.clear();
    for(int i=0;i<tempArrayList2.size();i++){
    System.out.println("hello");
    bufferwriter.write(""+tempArrayList2.get(i));
    bufferwriter.newLine();
    }
    tempArrayList2.clear();
    readBaseFile("");

    }catch(Exception e){
    e.printStackTrace();
    }
    }
    public void readBaseFile(String filePath){
    String readLine = null;
    Stack<Long> tempArrayList = new Stack<Long>();
    try{
    System.out.println("aaaaaaaaaaa");
    while (((readLine = bufferReader.readLine()) != null)) {
    count1++;

    String s=readLine.trim();
    long temp = Long.parseLong(s);
    tempArrayList.add(temp);

    if(count1%100==0){
    readBlacklistFile(tempArrayList);
    tempArrayList.clear();
    break;
    }



    }
    }catch(Exception e){
    e.printStackTrace();
    }
    }
    public static void main(String[] args) {
    Try1 test1 = new Try1();
    test1.init();
    test1.readBaseFile("c:\\1.txt");
    }

    }

  2. #2
    fishtoprecords is offline Senior Member
    Join Date
    Jun 2008
    Posts
    571
    Rep Power
    7

    Default

    What are you trying to do? Tell whether the files are identical or different?

    Or tell exactly what is different?

    In either case, you can't just load the whole file(s) into memory; you have to be smarter than that.

    To do #1, all you have to do is calculate a SHA-1 checksum of each file. If the checksums are the same, the files are, for all practical purposes, the same. You can either read each file a piece at a time and feed it to java.security.MessageDigest, or call an external program such as sha1sum to do it for you.
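
    A minimal sketch of the checksum idea, assuming Java 7 or later. The class and method names (Sha1Compare, sha1, sameContent) are made up for illustration. The file is streamed in 8 KB chunks, so memory use does not grow with file size:

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper class, not part of the original poster's code.
class Sha1Compare {

    // Streams the file through SHA-1 in fixed-size chunks.
    static byte[] sha1(Path file) throws IOException, NoSuchAlgorithmException {
        MessageDigest digest = MessageDigest.getInstance("SHA-1");
        byte[] buffer = new byte[8192];
        try (InputStream in = Files.newInputStream(file)) {
            int n;
            while ((n = in.read(buffer)) != -1) {
                digest.update(buffer, 0, n);
            }
        }
        return digest.digest();
    }

    // True when both files hash to the same SHA-1 value.
    static boolean sameContent(Path a, Path b) throws IOException, NoSuchAlgorithmException {
        return MessageDigest.isEqual(sha1(a), sha1(b));
    }
}
```

    Note that this only answers "identical or not"; it says nothing about which records differ.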

    Doing a diff is actually hard. It's not hard to read a line from each file and see if they are identical. That's easy. But if one file has a single line deleted, every line after it will show up as different. The challenge is to get back in sync and see if the rest of the files match.
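
    If both files really are sorted, as the original post says, getting back in sync is much simpler than in a general diff: always advance whichever file holds the smaller value. A rough sketch follows, assuming one number per line and ascending numeric order in both files (not string order). The names SortedDifference, next and writeOnlyInSecond are invented for this example:

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.Reader;
import java.io.Writer;

// Hypothetical helper class; assumes both inputs are sorted ascending numerically.
class SortedDifference {

    // Returns the next non-blank line parsed as a long, or null at end of input.
    static Long next(BufferedReader r) throws IOException {
        String line;
        while ((line = r.readLine()) != null) {
            line = line.trim();
            if (!line.isEmpty()) {
                return Long.parseLong(line);
            }
        }
        return null;
    }

    // Writes every value from file2 that does not occur in file1.
    static void writeOnlyInSecond(Reader file1, Reader file2, Writer out) throws IOException {
        BufferedReader r1 = new BufferedReader(file1);
        BufferedReader r2 = new BufferedReader(file2);
        BufferedWriter w = new BufferedWriter(out);
        Long a = next(r1);
        Long b = next(r2);
        while (b != null) {
            if (a == null || b < a) {
                // file1 is exhausted or already past b, so b is only in file2.
                w.write(Long.toString(b));
                w.newLine();
                b = next(r2);
            } else if (a < b) {
                a = next(r1);   // file1 is behind; catch up
            } else {
                b = next(r2);   // equal: present in both, skip it
            }
        }
        w.flush();
    }
}
```

    This is a plain loop with no recursion, and it holds only one value from each file at a time, so neither the heap nor the call stack grows with file size. A value that appears several times in file2 and never in file1 is written once per occurrence.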

  3. #3
    bezudar is offline Member
    Join Date
    Nov 2008
    Posts
    7
    Rep Power
    0

    Default

    I don't want to check whether the files are the same. I want to check the file contents: whether a record is present in both file1 and file2, irrespective of line number. I only want the records that are unique to file2.

  4. #4
    Nicholas Jordan is offline Senior Member
    Join Date
    Jun 2008
    Location
    Southwest
    Posts
    1,018
    Rep Power
    8

    Default

    Then we can build a TreeMap or TreeSet from file 2 and use a List for file 1. Combinatorics has many nooks and crannies:
    Analytic Combinatorics
    Introduction to Programming Using Java.
    Cybercartography: A new theoretical construct proposed by D.R. Fraser Taylor
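
    One way the TreeSet idea could look, as a sketch only. The names TreeSetDifference and onlyInSecond are invented here, and file 1 is streamed line by line rather than held in a List, to save memory. Every value from file 2 goes into a TreeSet, then each value read from file 1 is removed from it:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.TreeSet;

// Hypothetical helper class illustrating the TreeSet suggestion.
class TreeSetDifference {

    // Returns the values that occur in file2 but not in file1, in ascending order.
    static TreeSet<Long> onlyInSecond(Reader file1, Reader file2) throws IOException {
        TreeSet<Long> remaining = new TreeSet<>();
        String line;

        // Load every value of file2 into the set (duplicates collapse).
        BufferedReader r2 = new BufferedReader(file2);
        while ((line = r2.readLine()) != null) {
            line = line.trim();
            if (!line.isEmpty()) {
                remaining.add(Long.parseLong(line));
            }
        }

        // Stream file1 and drop anything it shares with file2.
        BufferedReader r1 = new BufferedReader(file1);
        while ((line = r1.readLine()) != null) {
            line = line.trim();
            if (!line.isEmpty()) {
                remaining.remove(Long.parseLong(line));
            }
        }
        return remaining;
    }
}
```

    This does not need sorted input, but the whole of file 2 sits in memory as boxed Long objects. For a 300 MB file that may well exceed what a machine with 1 GB of RAM can give the JVM, so it is probably only practical for smaller inputs or a larger heap.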
