  1. #1 alvations (Member, joined Oct 2008; 19 posts)

    Deep copying an ArrayList to add to a list

    i want to read some values from a file into an arraylist, then keep those arraylists in another list, so that i end up with a list of lists.

    but add() of the list only stores a reference (a shallow copy) of the list, so every time i clear my arraylist, the values in all my lists go with it.

    is there a way to add a deep copy of the arraylist to my list? can anyone help?

    __________________________________________________ _
    List<List> sentences = new ArrayList<List>();
    StreamTokenizer token = new StreamTokenizer(new FileReader("some file.txt")); // closing quote was missing in the original post

    while (token.nextToken() != StreamTokenizer.TT_EOF)
    {
        if (token.ttype == StreamTokenizer.TT_WORD)
        {
            List<List> each_sentence = new ArrayList<List>();
            ArrayList<String> iob = new ArrayList<String>();
            ArrayList<String> pos = new ArrayList<String>();

            iob.add(token.sval);
            token.nextToken();
            iob.add(token.sval);
            token.nextToken();

            pos.add(token.sval);
            token.nextToken();
            pos.add(token.sval);
            token.nextToken();

            each_sentence.add(iob);
            each_sentence.add(pos);

            sentences.add(each_sentence);
            iob.clear();
            pos.clear();
        }
    } // closing brace for the while loop was missing in the original post

    ______________________________________________

  2. #2 Fubarable (Moderator, joined Jun 2008; 19,316 posts)

    Why do you feel compelled to clear iob and pos? Just leave them be.

  3. #3 Nicholas Jordan (Senior Member, Southwest, joined Jun 2008; 1,018 posts)

    Why are you calling new on each
    Java Code:
    if (token.ttype == StreamTokenizer.TT_WORD)
    ?

    You get a new list that way; where did the old one go?
    Introduction to Programming Using Java.
    Cybercartography: A new theoretical construct proposed by D.R. Fraser Taylor

  4. #4 alvations

    oh, i think i posted the wrong condition. let me post the full code.

    ___________________________________________
    List<List> sentences = new ArrayList<List>();

    List<List> each_sentence = new ArrayList<List>();
    ArrayList<String> iob = new ArrayList<String>();
    ArrayList<String> pos = new ArrayList<String>();
    ArrayList<String> word = new ArrayList<String>(); // used below but never declared in the original post

    try
    {
        StreamTokenizer token = new StreamTokenizer(new FileReader("some_file.text"));
        token.resetSyntax();
        token.ordinaryChar(' ');
        token.wordChars(33, 126);

        while (token.nextToken() != StreamTokenizer.TT_EOF)
        {
            if (token.ttype == StreamTokenizer.TT_WORD)
            {
                iob.add(token.sval);
                token.nextToken();
                token.nextToken();
                token.nextToken();
                pos.add(token.sval);
                token.nextToken();
                token.nextToken();
                token.nextToken();
                word.add(token.sval);
            }

            if (token.ttype == StreamTokenizer.TT_EOL)
            {
                if (token.nextToken() == StreamTokenizer.TT_EOL)
                {
                    each_sentence.add(iob);
                    each_sentence.add(pos);
                    each_sentence.add(word);
                    sentences.add(each_sentence);
                    iob.clear();
                    pos.clear();
                    word.clear();
                }
                else
                    token.pushBack();
            }
        }
    } catch (IOException e) {} // the original snippet was cut off before the try was closed
    __________________________________________________ _____
    //"some_file.text"

    xxx yyy zzz
    oas asd dfg

    der trg dft
    erb thy erg
    __________________________________________________ _____

    i want xxx and oas to go into the arraylist <iob>,
    then yyy and asd into arraylist <pos>,
    and zzz and dfg into arraylist <word>.

    these 3 lists should sit inside the list of arraylists <each_sentence>, and each sentence is an element of <sentences>.

    one problem is i need to clear the lists so that i can read the next <each_sentence>. if not, the 2nd <each_sentence> will be made up of <each_sentence>1 plus <each_sentence>2.

  5. #5 Norm (Moderator, Eastern Florida, joined Jun 2008; 17,578 posts)

    To debug your code, add some println() statements to show how the values change and where the execution flow goes.

  6. #6 alvations

    i've tried printing. the problem is that every time i clear my arraylist, my list of arraylists gets cleared too. i want to make a deep copy so that this problem with the shallow (clone) copy won't occur.
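    [Editor's note: for lists of immutable Strings, the ArrayList copy constructor gives exactly the independent copy being asked for here. A minimal sketch; the class and method names are illustrative, not from the thread:]

    ```java
    import java.util.ArrayList;
    import java.util.List;

    public class CopyDemo {
        // For lists of immutable Strings, the copy constructor is all the
        // "deep copy" needed: the new list has its own backing array.
        static List<String> snapshot(List<String> original) {
            return new ArrayList<String>(original);
        }

        public static void main(String[] args) {
            List<List<String>> sentences = new ArrayList<List<String>>();
            List<String> iob = new ArrayList<String>();
            iob.add("B-NP");
            iob.add("I-NP");

            sentences.add(snapshot(iob)); // store a copy, not the working list
            iob.clear();                  // safe: the stored copy is unaffected

            System.out.println(sentences.get(0)); // [B-NP, I-NP]
        }
    }
    ```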

  7. #7 Norm (Moderator)

    Instead of copying and clearing the ArrayList, just create a new one, leaving the old ones in the List.
    Otherwise, can you write a short, simple program that compiles and executes to demonstrate your problem, and post it? No need for a file or StreamTokenizer; put everything in one program.
    Last edited by Norm; 10-06-2008 at 07:33 PM.
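    [Editor's note: a sketch of this suggestion, with fresh lists allocated per sentence instead of clear(). The class name and the faked token input are illustrative; the thread's real input comes from a StreamTokenizer:]

    ```java
    import java.util.ArrayList;
    import java.util.List;

    public class FreshListsDemo {
        // Each inner String[] stands in for one sentence's token stream,
        // alternating iob tag and pos tag.
        public static List<List<List<String>>> gather(String[][] sentencesIn) {
            List<List<List<String>>> sentences = new ArrayList<List<List<String>>>();
            for (String[] tokens : sentencesIn) {
                List<String> iob = new ArrayList<String>(); // fresh per sentence
                List<String> pos = new ArrayList<String>();
                for (int i = 0; i + 1 < tokens.length; i += 2) {
                    iob.add(tokens[i]);
                    pos.add(tokens[i + 1]);
                }
                List<List<String>> eachSentence = new ArrayList<List<String>>();
                eachSentence.add(iob);
                eachSentence.add(pos);
                sentences.add(eachSentence);
                // no clear() needed: the next iteration makes new lists,
                // so the ones already stored in sentences stay intact
            }
            return sentences;
        }
    }
    ```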

  8. #8 alvations

    thanks for the replies. here's the program i've made:

    _______________________________________
    import java.io.*;
    import java.util.*;

    public class gather_data
    {
        public static void main(String args[]) throws Exception
        {
            int iobcount = 0;
            int storecount = 0;
            int sentencecount = 0; // used below but never declared in the original post
            List<List> sentences = new ArrayList<List>();
            ArrayList<String> store = new ArrayList<String>();

            // Build "001.txt" .. "100.txt" (zero-padded to three digits)
            String filename[] = new String[100];
            for (int i = 0; i < 100; i++)
            {
                int j = i + 1;
                String jstring = Integer.toString(j);
                if (j < 10)
                    filename[i] = "00" + jstring + ".txt";  // 001.txt .. 009.txt
                else if (j < 100)
                    filename[i] = "0" + jstring + ".txt";   // 010.txt .. 099.txt
                else
                    filename[i] = jstring + ".txt";         // 100.txt
            }

            for (int i = 0; i < 100; i++)
            {
                List<List> each_sentence = new ArrayList<List>();
                ArrayList<String> iob = new ArrayList<String>();
                ArrayList<String> pos = new ArrayList<String>();
                ArrayList<String> word = new ArrayList<String>();

                try
                {
                    StreamTokenizer token = new StreamTokenizer(new FileReader(filename[i]));
                    token.resetSyntax();
                    token.ordinaryChar(' ');
                    token.wordChars(33, 126);

                    while (token.nextToken() != StreamTokenizer.TT_EOF)
                    {
                        if (token.ttype == StreamTokenizer.TT_WORD)
                        {
                            iob.add(token.sval);
                            token.nextToken();
                            token.nextToken();
                            token.nextToken();
                            pos.add(token.sval);
                            token.nextToken();
                            token.nextToken();
                            token.nextToken();
                            word.add(token.sval);
                        }

                        if (token.ttype == StreamTokenizer.TT_EOL)
                        {
                            if (token.nextToken() == StreamTokenizer.TT_EOL)
                            {
                                each_sentence.add(iob);
                                each_sentence.add(pos);
                                each_sentence.add(word);
                                sentences.add(each_sentence);
                                sentencecount++;
                                iob.clear();
                                pos.clear();
                                word.clear();
                                System.out.println(sentences.get(sentencecount - 1));
                            }
                            else
                                token.pushBack();
                        }
                    }
                } catch (IOException e) {}
            }
        }
    }


    __________________________________________________ _____

    B-NP NNP Pierre
    I-NP NNP Vinken
    O COMMA COMMA
    B-NP CD 61
    I-NP NNS years
    B-ADJP JJ old
    O COMMA COMMA
    B-VP MD will
    I-VP VB join
    B-NP DT the
    I-NP NN board
    B-PP IN as
    B-NP DT a
    I-NP JJ nonexecutive
    I-NP NN director
    B-NP NNP Nov.
    I-NP CD 29
    O . .

    B-NP NNP Mr.
    I-NP NNP Vinken
    B-VP VBZ is
    B-NP NN chairman
    B-PP IN of
    B-NP NNP Elsevier
    I-NP NNP N.V.
    O COMMA COMMA
    B-NP DT the
    I-NP NNP Dutch
    I-NP VBG publishing
    I-NP NN group
    O . .

    ________________________________________________

    that is one of the files i've read. i'm supposed to read these values into a sentences object, i.e. a list of sentences. is this clearer for you guys?

  9. #9 Nicholas Jordan (Senior Member)

    This is clearer, but I need some overview. The statement
    "one problem is i need to clear it so that i can read the other"
    in the post of 10-05-2008, 02:54 PM, along with the more complete code, brings to mind the question of why anything needs to be cleared. In general, I have arrived at the syntax if(condition){doSomething();}else{next}, but I still cannot determine from the code what needs to be cleared, or why and where.

    It looks as though you are doing a student exercise designed to read a file and pull particular information for a report. Is this what you are trying to achieve?

  10. #10 Norm (Moderator)

    Instead of copying and clearing the ArrayList, just create a new one, leaving the old ones in the List.

  11. #11 alvations

    i'm trying to write a text chunker to do natural language processing, but before that i need to extract some data and put it into arraylists. the text file is part of a corpus i have.

    what i'm trying to achieve is to put each column into an arraylist and group the 3 columns into a list. i'm using arraylist since i do not know how many words there are in a sentence. since every sentence is separated by a newline, i need to clear my iob, pos and word lists to store the next sentence.

    if i keep creating new arraylists, i still end up with a list of nulls after i get out of the for loop. i need the data to stay stored; that is why i have a <sentences> list that should store the values. but it stores the references instead of the values themselves, because i cannot deep copy the elements of the arraylist into a list.
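    [Editor's note: the "stores the references" behaviour described above can be demonstrated in a few lines; the class name is illustrative:]

    ```java
    import java.util.ArrayList;
    import java.util.List;

    public class AliasDemo {
        // Returns the size of the stored "sentence" after the working list
        // is cleared; it comes back 0 because add() stored a reference.
        static int storedSizeAfterClear() {
            List<List<String>> sentences = new ArrayList<List<String>>();
            List<String> iob = new ArrayList<String>();
            iob.add("B-NP");

            sentences.add(iob); // stores a reference to iob, not a copy
            iob.clear();        // ...which therefore also empties sentences.get(0)

            return sentences.get(0).size();
        }

        public static void main(String[] args) {
            System.out.println(storedSizeAfterClear()); // 0
        }
    }
    ```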

    if we were to strip away the idea of using an arraylist or list, is there another way to extract the data?
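    [Editor's note: one simpler extraction route, not taken up in the thread, is a plain BufferedReader with String.split instead of a StreamTokenizer. A sketch under that assumption; it reads from any Reader, treats a blank line as the end of a sentence, and allocates fresh lists per sentence:]

    ```java
    import java.io.BufferedReader;
    import java.io.IOException;
    import java.io.Reader;
    import java.util.ArrayList;
    import java.util.List;

    public class SplitReader {
        // Parse "iob pos word" lines; a blank line ends a sentence.
        public static List<List<List<String>>> parse(Reader in) throws IOException {
            List<List<List<String>>> sentences = new ArrayList<List<List<String>>>();
            List<String> iob = new ArrayList<String>();
            List<String> pos = new ArrayList<String>();
            List<String> word = new ArrayList<String>();
            BufferedReader br = new BufferedReader(in);
            String line;
            while ((line = br.readLine()) != null) {
                line = line.trim();
                if (line.isEmpty()) {
                    if (!iob.isEmpty()) {
                        List<List<String>> sentence = new ArrayList<List<String>>();
                        sentence.add(iob); sentence.add(pos); sentence.add(word);
                        sentences.add(sentence);
                        iob = new ArrayList<String>(); // fresh lists, no clear()
                        pos = new ArrayList<String>();
                        word = new ArrayList<String>();
                    }
                    continue;
                }
                String[] cols = line.split("\\s+");
                if (cols.length >= 3) {
                    iob.add(cols[0]); pos.add(cols[1]); word.add(cols[2]);
                }
            }
            if (!iob.isEmpty()) { // last sentence, if the file has no trailing blank line
                List<List<String>> sentence = new ArrayList<List<String>>();
                sentence.add(iob); sentence.add(pos); sentence.add(word);
                sentences.add(sentence);
            }
            return sentences;
        }
    }
    ```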

  12. #12 alvations

    i've tried norm's suggestion and did this. it's impossible to create an array of lists, since it's against the generics rules, and i can't seem to get the syntax to create an array of arraylists, so this is what i've come up with. it works, but sometimes it drops a few sentences from my data and then some sentences.get() calls come back with no strings...

    any idea what to do with it? this is like the hardcoded version of deep copy, done by creating lots of copies.

    ________________________________________
    import java.io.*;
    import java.util.*;
    import java.lang.*;
    import java.util.ArrayList;
    import java.util.Collections;


    public class gather_data
    {

    public static void main(String args[]) throws Exception
    {
    int iobcount=0;int storecount=0; int sentencecount = 0;
    List<List> sentences = new ArrayList<List>();

    String filename[] = new String[100];
    for(int i=0;i<100;i++) filename[i] = "";
    /* for(int i=0;i<100;i++)
    {
    filename[i] = "";
    String jstring = Integer.toString(i+100);
    filename[i] = filename[i].concat(jstring);
    filename[i] = filename[i].concat(".txt");
    }*/

    for(int i=0;i<100;i++)
    {
    if(i<10)
    {
    int j=i+1;
    String jstring = Integer.toString(j);
    filename[i] = filename[i].concat("00");
    filename[i] = filename[i].concat(jstring);
    filename[i] = filename[i].concat(".txt");
    // System.out.println(filename[i]);
    }
    else
    {
    int j=i+1;
    String jstring = Integer.toString(j);
    filename[i] = filename[i].concat("0");
    filename[i] = filename[i].concat(jstring);
    filename[i] = filename[i].concat(".txt");
    // System.out.println(filename[i]);
    }
    }


    for(int i=0; i<100; i++)
    {

    List<List> each_sentence = new ArrayList<List>();
    List<List> each_sentence2 = new ArrayList<List>();
    List<List> each_sentence3 = new ArrayList<List>();
    List<List> each_sentence4 = new ArrayList<List>();
    List<List> each_sentence5 = new ArrayList<List>();
    List<List> each_sentence6 = new ArrayList<List>();
    List<List> each_sentence7 = new ArrayList<List>();
    List<List> each_sentence8 = new ArrayList<List>();
    List<List> each_sentence9 = new ArrayList<List>();
    List<List> each_sentence10 = new ArrayList<List>();
    List<List> each_sentence11 = new ArrayList<List>();
    List<List> each_sentence12 = new ArrayList<List>();
    List<List> each_sentence13 = new ArrayList<List>();
    List<List> each_sentence14 = new ArrayList<List>();
    List<List> each_sentence15 = new ArrayList<List>();
    List<List> each_sentence16 = new ArrayList<List>();
    List<List> each_sentence17 = new ArrayList<List>();

    ArrayList<String> iob = new ArrayList<String> ();
    ArrayList<String> pos = new ArrayList<String> ();
    ArrayList<String> word = new ArrayList<String> ();
    ArrayList<String> iob2 = new ArrayList<String> ();
    ArrayList<String> pos2 = new ArrayList<String> ();
    ArrayList<String> word2 = new ArrayList<String> ();
    ArrayList<String> iob3 = new ArrayList<String> ();
    ArrayList<String> pos3 = new ArrayList<String> ();
    ArrayList<String> word3 = new ArrayList<String> ();
    ArrayList<String> iob4 = new ArrayList<String> ();
    ArrayList<String> pos4 = new ArrayList<String> ();
    ArrayList<String> word4 = new ArrayList<String> ();
    ArrayList<String> iob5 = new ArrayList<String> ();
    ArrayList<String> pos5 = new ArrayList<String> ();
    ArrayList<String> word5 = new ArrayList<String> ();
    ArrayList<String> iob6 = new ArrayList<String> ();
    ArrayList<String> pos6 = new ArrayList<String> ();
    ArrayList<String> word6 = new ArrayList<String> ();
    ArrayList<String> iob7 = new ArrayList<String> ();
    ArrayList<String> pos7 = new ArrayList<String> ();
    ArrayList<String> word7 = new ArrayList<String> ();
    ArrayList<String> iob8 = new ArrayList<String> ();
    ArrayList<String> pos8 = new ArrayList<String> ();
    ArrayList<String> word8 = new ArrayList<String> ();
    ArrayList<String> iob9 = new ArrayList<String> ();
    ArrayList<String> pos9 = new ArrayList<String> ();
    ArrayList<String> word9 = new ArrayList<String> ();
    ArrayList<String> iob10 = new ArrayList<String> ();
    ArrayList<String> pos10 = new ArrayList<String> ();
    ArrayList<String> word10 = new ArrayList<String> ();
    ArrayList<String> iob11 = new ArrayList<String> ();
    ArrayList<String> pos11 = new ArrayList<String> ();
    ArrayList<String> word11 = new ArrayList<String> ();
    ArrayList<String> iob12 = new ArrayList<String> ();
    ArrayList<String> pos12 = new ArrayList<String> ();
    ArrayList<String> word12 = new ArrayList<String> ();
    ArrayList<String> iob13 = new ArrayList<String> ();
    ArrayList<String> pos13 = new ArrayList<String> ();
    ArrayList<String> word13 = new ArrayList<String> ();
    ArrayList<String> iob14 = new ArrayList<String> ();
    ArrayList<String> pos14 = new ArrayList<String> ();
    ArrayList<String> word14 = new ArrayList<String> ();
    ArrayList<String> iob15 = new ArrayList<String> ();
    ArrayList<String> pos15 = new ArrayList<String> ();
    ArrayList<String> word15 = new ArrayList<String> ();
    ArrayList<String> iob16 = new ArrayList<String> ();
    ArrayList<String> pos16 = new ArrayList<String> ();
    ArrayList<String> word16 = new ArrayList<String> ();



    try
    {
    StreamTokenizer token = new StreamTokenizer(new FileReader(filename[i]));
    //StreamTokenizer token = new StreamTokenizer(new FileReader("008.txt"));
    token.resetSyntax();
    token.ordinaryChar(' ');
    token.wordChars(33,126);
    token.wordChars(48,57);
    //token.quoteChar(9);


    while(token.nextToken()!= StreamTokenizer.TT_EOF)
    {
    if(token.ttype==StreamTokenizer.TT_WORD)
    {
    // System.out.println(token.sval);
    iob.add(token.sval);
    token.nextToken();
    token.nextToken();
    token.nextToken();
    pos.add(token.sval);
    token.nextToken();
    token.nextToken();
    token.nextToken();
    word.add(token.sval);
    // System.out.println(word.get(2));
    }

    if(token.ttype ==StreamTokenizer.TT_EOL)
    {
    if(token.nextToken()==StreamTokenizer.TT_EOL)
    {

    each_sentence.add(iob);
    each_sentence.add(pos);
    each_sentence.add(word);
    sentences.add(each_sentence);
    sentencecount++;
    // System.out.println(sentences.get(sentencecount-1));
    break;
    // iob.clear();
    // pos.clear();
    // word.clear();
    // System.out.println(sentences.get(sentencecount-1));
    }
    else
    token.pushBack();
    }}

    while(token.nextToken()!= StreamTokenizer.TT_EOF)
    {
    if(token.ttype==StreamTokenizer.TT_WORD)
    {
    //System.out.println(token.sval);
    iob2.add(token.sval);
    token.nextToken();
    token.nextToken();
    token.nextToken();
    pos2.add(token.sval);
    token.nextToken();
    token.nextToken();
    token.nextToken();
    word2.add(token.sval);

    // System.out.println(word.get(2));
    }

    if(token.ttype ==StreamTokenizer.TT_EOL)
    {
    if(token.nextToken()==StreamTokenizer.TT_EOL)
    {
    each_sentence2.add(iob2);
    each_sentence2.add(pos2);
    each_sentence2.add(word2);
    sentences.add(each_sentence2);
    sentencecount++;
    //System.out.println(sentences.get(sentencecount-1));
    break;
    // iob.clear();
    // pos.clear();
    // word.clear();
    // System.out.println(sentences.get(sentencecount-1));
    }
    else
    token.pushBack();
    }}

    while(token.nextToken()!= StreamTokenizer.TT_EOF)
    {
    if(token.ttype==StreamTokenizer.TT_WORD)
    {
    //System.out.println(token.sval);
    iob3.add(token.sval);
    token.nextToken();
    token.nextToken();
    token.nextToken();
    pos3.add(token.sval);
    token.nextToken();
    token.nextToken();
    token.nextToken();
    word3.add(token.sval);

    // System.out.println(word.get(2));
    }

    if(token.ttype ==StreamTokenizer.TT_EOL)
    {
    if(token.nextToken()==StreamTokenizer.TT_EOL)
    {
    each_sentence3.add(iob3);
    each_sentence3.add(pos3);
    each_sentence3.add(word3);
    sentences.add(each_sentence3);
    sentencecount++;
    //System.out.println(sentences.get(sentencecount-1));
    break;
    // iob.clear();
    // pos.clear();
    // word.clear();
    // System.out.println(sentences.get(sentencecount-1));
    }
    else
    token.pushBack();
    }}

    while(token.nextToken()!= StreamTokenizer.TT_EOF)
    {
    if(token.ttype==StreamTokenizer.TT_WORD)
    {
    //System.out.println(token.sval);
    iob4.add(token.sval);
    token.nextToken();
    token.nextToken();
    token.nextToken();
    pos4.add(token.sval);
    token.nextToken();
    token.nextToken();
    token.nextToken();
    word4.add(token.sval);

    // System.out.println(word.get(2));
    }

    if(token.ttype ==StreamTokenizer.TT_EOL)
    {
    if(token.nextToken()==StreamTokenizer.TT_EOL)
    {
    each_sentence4.add(iob4);
    each_sentence4.add(pos4);
    each_sentence4.add(word4);
    sentences.add(each_sentence4);
    sentencecount++;
    //System.out.println(sentences.get(sentencecount-1));
    break;
    // iob.clear();
    // pos.clear();
    // word.clear();
    // System.out.println(sentences.get(sentencecount-1));
    }
    else
    token.pushBack();
    }}

    while(token.nextToken()!= StreamTokenizer.TT_EOF)
    {
    if(token.ttype==StreamTokenizer.TT_WORD)
    {
    //System.out.println(token.sval);
    iob5.add(token.sval);
    token.nextToken();
    token.nextToken();
    token.nextToken();
    pos5.add(token.sval);
    token.nextToken();
    token.nextToken();
    token.nextToken();
    word5.add(token.sval);

    // System.out.println(word.get(2));
    }

    if(token.ttype ==StreamTokenizer.TT_EOL)
    {
    if(token.nextToken()==StreamTokenizer.TT_EOL)
    {
    each_sentence5.add(iob5);
    each_sentence5.add(pos5);
    each_sentence5.add(word5);
    sentences.add(each_sentence5);
    sentencecount++;
    //System.out.println(sentences.get(sentencecount-1));
    break;
    // iob.clear();
    // pos.clear();
    // word.clear();
    // System.out.println(sentences.get(sentencecount-1));
    }
    else
    token.pushBack();
    }}

    while(token.nextToken()!= StreamTokenizer.TT_EOF)
    {
    if(token.ttype==StreamTokenizer.TT_WORD)
    {
    //System.out.println(token.sval);
    // bug in the original post: this sixth block read into iob4/pos4/word4
    // while the lists stored below are iob6/pos6/word6, which is likely
    // why some sentences come back empty
    iob6.add(token.sval);
    token.nextToken();
    token.nextToken();
    token.nextToken();
    pos6.add(token.sval);
    token.nextToken();
    token.nextToken();
    token.nextToken();
    word6.add(token.sval);

    // System.out.println(word.get(2));
    }

    if(token.ttype ==StreamTokenizer.TT_EOL)
    {
    if(token.nextToken()==StreamTokenizer.TT_EOL)
    {
    each_sentence6.add(iob6);
    each_sentence6.add(pos6);
    each_sentence6.add(word6);
    sentences.add(each_sentence6);
    sentencecount++;
    //System.out.println(sentences.get(sentencecount-1));
    break;
    // iob.clear();
    // pos.clear();
    // word.clear();
    // System.out.println(sentences.get(sentencecount-1));
    }
    else
    token.pushBack();
    }}

    while(token.nextToken()!= StreamTokenizer.TT_EOF)
    {
    if(token.ttype==StreamTokenizer.TT_WORD)
    {
    //System.out.println(token.sval);
    iob7.add(token.sval);
    token.nextToken();
    token.nextToken();
    token.nextToken();
    pos7.add(token.sval);
    token.nextToken();
    token.nextToken();
    token.nextToken();
    word7.add(token.sval);

    // System.out.println(word.get(2));
    }

    if(token.ttype ==StreamTokenizer.TT_EOL)
    {
    if(token.nextToken()==StreamTokenizer.TT_EOL)
    {
    each_sentence7.add(iob7);
    each_sentence7.add(pos7);
    each_sentence7.add(word7);
    sentences.add(each_sentence7);
    sentencecount++;
    //System.out.println(sentences.get(sentencecount-1));
    break;
    // iob.clear();
    // pos.clear();
    // word.clear();
    // System.out.println(sentences.get(sentencecount-1));
    }
    else
    token.pushBack();
    }}

    while(token.nextToken()!= StreamTokenizer.TT_EOF)
    {
    if(token.ttype==StreamTokenizer.TT_WORD)
    {
    //System.out.println(token.sval);
    iob8.add(token.sval);
    token.nextToken();
    token.nextToken();
    token.nextToken();
    pos8.add(token.sval);
    token.nextToken();
    token.nextToken();
    token.nextToken();
    word8.add(token.sval);

    // System.out.println(word.get(2));
    }

    if(token.ttype ==StreamTokenizer.TT_EOL)
    {
    if(token.nextToken()==StreamTokenizer.TT_EOL)
    {
    each_sentence8.add(iob8);
    each_sentence8.add(pos8);
    each_sentence8.add(word8);
    sentences.add(each_sentence8);
    sentencecount++;
    //System.out.println(sentences.get(sentencecount-1));
    break;
    // iob.clear();
    // pos.clear();
    // word.clear();
    // System.out.println(sentences.get(sentencecount-1));
    }
    else
    token.pushBack();
    }}

    while(token.nextToken()!= StreamTokenizer.TT_EOF)
    {
    if(token.ttype==StreamTokenizer.TT_WORD)
    {
    //System.out.println(token.sval);
    iob9.add(token.sval);
    token.nextToken();
    token.nextToken();
    token.nextToken();
    pos9.add(token.sval);
    token.nextToken();
    token.nextToken();
    token.nextToken();
    word9.add(token.sval);

    // System.out.println(word.get(2));
    }

    if(token.ttype ==StreamTokenizer.TT_EOL)
    {
    if(token.nextToken()==StreamTokenizer.TT_EOL)
    {
    each_sentence9.add(iob9);
    each_sentence9.add(pos9);
    each_sentence9.add(word9);
    sentences.add(each_sentence9);
    sentencecount++;
    //System.out.println(sentences.get(sentencecount-1));
    break;
    // iob.clear();
    // pos.clear();
    // word.clear();
    // System.out.println(sentences.get(sentencecount-1));
    }
    else
    token.pushBack();
    }}

    while(token.nextToken()!= StreamTokenizer.TT_EOF)
    {
    if(token.ttype==StreamTokenizer.TT_WORD)
    {
    //System.out.println(token.sval);
    iob10.add(token.sval);
    token.nextToken();
    token.nextToken();
    token.nextToken();
    pos10.add(token.sval);
    token.nextToken();
    token.nextToken();
    token.nextToken();
    word10.add(token.sval);

    // System.out.println(word.get(2));
    }

    if(token.ttype ==StreamTokenizer.TT_EOL)
    {
    if(token.nextToken()==StreamTokenizer.TT_EOL)
    {
    each_sentence10.add(iob10);
    each_sentence10.add(pos10);
    each_sentence10.add(word10);
    sentences.add(each_sentence10);
    sentencecount++;
    //System.out.println(sentences.get(sentencecount-1));
    break;
    // iob.clear();
    // pos.clear();
    // word.clear();
    // System.out.println(sentences.get(sentencecount-1));
    }
    else
    token.pushBack();
    }}
    while(token.nextToken()!= StreamTokenizer.TT_EOF)
    {
    if(token.ttype==StreamTokenizer.TT_WORD)
    {
    //System.out.println(token.sval);
    iob11.add(token.sval);
    token.nextToken();
    token.nextToken();
    token.nextToken();
    pos11.add(token.sval);
    token.nextToken();
    token.nextToken();
    token.nextToken();
    word11.add(token.sval);

    // System.out.println(word.get(2));
    }

    if(token.ttype ==StreamTokenizer.TT_EOL)
    {
    if(token.nextToken()==StreamTokenizer.TT_EOL)
    {
    each_sentence11.add(iob11);
    each_sentence11.add(pos11);
    each_sentence11.add(word11);
    sentences.add(each_sentence11);
    sentencecount++;
    //System.out.println(sentences.get(sentencecount-1));
    break;
    // iob.clear();
    // pos.clear();
    // word.clear();
    // System.out.println(sentences.get(sentencecount-1));
    }
    else
    token.pushBack();
    }}

    while(token.nextToken()!= StreamTokenizer.TT_EOF)
    {
    if(token.ttype==StreamTokenizer.TT_WORD)
    {
    //System.out.println(token.sval);
    iob12.add(token.sval);
    token.nextToken();
    token.nextToken();
    token.nextToken();
    pos12.add(token.sval);
    token.nextToken();
    token.nextToken();
    token.nextToken();
    word12.add(token.sval);

    // System.out.println(word.get(2));
    }

    if(token.ttype ==StreamTokenizer.TT_EOL)
    {
    if(token.nextToken()==StreamTokenizer.TT_EOL)
    {
    each_sentence12.add(iob12);
    each_sentence12.add(pos12);
    each_sentence12.add(word12);
    sentences.add(each_sentence12);
    sentencecount++;
    //System.out.println(sentences.get(sentencecount-1));
    break;
    // iob.clear();
    // pos.clear();
    // word.clear();
    // System.out.println(sentences.get(sentencecount-1));
    }
    else
    token.pushBack();
    }}

    while(token.nextToken()!= StreamTokenizer.TT_EOF)
    {
    if(token.ttype==StreamTokenizer.TT_WORD)
    {
    //System.out.println(token.sval);
    iob13.add(token.sval);
    token.nextToken();
    token.nextToken();
    token.nextToken();
    pos13.add(token.sval);
    token.nextToken();
    token.nextToken();
    token.nextToken();
    word13.add(token.sval);

    // System.out.println(word.get(2));
    }

    if(token.ttype ==StreamTokenizer.TT_EOL)
    {
    if(token.nextToken()==StreamTokenizer.TT_EOL)
    {
    each_sentence13.add(iob13);
    each_sentence13.add(pos13);
    each_sentence13.add(word13);
    sentences.add(each_sentence13);
    sentencecount++;
    //System.out.println(sentences.get(sentencecount-1));
    break;
    // iob.clear();
    // pos.clear();
    // word.clear();
    // System.out.println(sentences.get(sentencecount-1));
    }
    else
    token.pushBack();
    }}

    while(token.nextToken()!= StreamTokenizer.TT_EOF)
    {
    if(token.ttype==StreamTokenizer.TT_WORD)
    {
    //System.out.println(token.sval);
    iob14.add(token.sval);
    token.nextToken();
    token.nextToken();
    token.nextToken();
    pos14.add(token.sval);
    token.nextToken();
    token.nextToken();
    token.nextToken();
    word14.add(token.sval);

    // System.out.println(word.get(2));
    }

    if(token.ttype ==StreamTokenizer.TT_EOL)
    {
    if(token.nextToken()==StreamTokenizer.TT_EOL)
    {
    each_sentence14.add(iob14);
    each_sentence14.add(pos14);
    each_sentence14.add(word14);
    sentences.add(each_sentence14);
    sentencecount++;
    //System.out.println(sentences.get(sentencecount-1));
    break;
    // iob.clear();
    // pos.clear();
    // word.clear();
    // System.out.println(sentences.get(sentencecount-1));
    }
    else
    token.pushBack();
    }}

    while(token.nextToken()!= StreamTokenizer.TT_EOF)
    {
    if(token.ttype==StreamTokenizer.TT_WORD)
    {
    //System.out.println(token.sval);
    iob15.add(token.sval);
    token.nextToken();
    token.nextToken();
    token.nextToken();
    pos15.add(token.sval);
    token.nextToken();
    token.nextToken();
    token.nextToken();
    word15.add(token.sval);

    // System.out.println(word.get(2));
    }

    if(token.ttype ==StreamTokenizer.TT_EOL)
    {
    if(token.nextToken()==StreamTokenizer.TT_EOL)
    {
    each_sentence15.add(iob15);
    each_sentence15.add(pos15);
    each_sentence15.add(word15);
    sentences.add(each_sentence15);
    sentencecount++;
    //System.out.println(sentences.get(sentencecount-1));
    break;
    // iob.clear();
    // pos.clear();
    // word.clear();
    // System.out.println(sentences.get(sentencecount-1));
    }
    else
    token.pushBack();
    }}
    while(token.nextToken()!= StreamTokenizer.TT_EOF)
    {
    if(token.ttype==StreamTokenizer.TT_WORD)
    {
    //System.out.println(token.sval);
    iob16.add(token.sval);
    token.nextToken();
    token.nextToken();
    token.nextToken();
    pos16.add(token.sval);
    token.nextToken();
    token.nextToken();
    token.nextToken();
    word16.add(token.sval);

    // System.out.println(word.get(2));
    }

    if(token.ttype ==StreamTokenizer.TT_EOL)
    {
    if(token.nextToken()==StreamTokenizer.TT_EOL)
    {
    each_sentence16.add(iob16);
    each_sentence16.add(pos16);
    each_sentence16.add(word16);
    sentences.add(each_sentence16);
    sentencecount++;
    //System.out.println(sentences.get(sentencecount-1));
    break;
    // iob.clear();
    // pos.clear();
    // word.clear();
    // System.out.println(sentences.get(sentencecount-1));
    }
    else
    token.pushBack();
    }}



    // System.out.println(sentences.get(sentencecount-1));
    } catch (IOException e) {}





    }

    }

    }

    __________________________________________________ __

  13. #13 Nicholas Jordan (Senior Member)

    NLP is challenging, to say the least. Given the skills shown by that declaration of numerous Collections, it would probably pay to study some compiler science for a while, as parsing like this is the front end of most compilers. There is a syntax that looks something like < < >> with which one can build lists of lists, or otherwise what leads to n-dimensioned arrays, but that gets clumsy, and I suggest studying Stacks, Lists, Maps and related Collections rather than trying to push the cart with hardcode.
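    [Editor's note: the "< < >>" syntax being gestured at is nested generics. Fully typed, the thread's sentences structure can be declared and built like this; the class and method names are illustrative:]

    ```java
    import java.util.ArrayList;
    import java.util.List;

    public class NestedGenerics {
        // sentences -> each sentence -> three columns -> tags/words
        static List<List<List<String>>> singleton(String tag) {
            List<List<List<String>>> sentences = new ArrayList<List<List<String>>>();
            List<List<String>> eachSentence = new ArrayList<List<String>>();
            List<String> iob = new ArrayList<String>();
            iob.add(tag);
            eachSentence.add(iob);
            sentences.add(eachSentence);
            return sentences;
        }

        public static void main(String[] args) {
            List<List<List<String>>> sentences = singleton("B-NP");
            System.out.println(sentences.get(0).get(0).get(0)); // B-NP
        }
    }
    ```

    Declaring the element type as List<List<String>> (rather than the raw List<List> used in the thread's code) lets the compiler check every get() and add() all the way down.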

  14. #14 alvations

    thanks guys. i guess for this NLP project i'm just going to pick out 20 sentences from each text file in the corpus. i'll go and study more on Collections, maps and linked lists, but for now 20 sentences from each file sounds reasonable. hahaha. i could say i'm doing random sampling...

    i've read up on deep copying linked lists, and the simplest way was to manipulate a memory buffer and copy the data from there. i guess that's too much for this project too.

    oh, by the way, i'm trying to code a simplistic chunker that detects noun phrases and verb phrases in sentences. it's a transformation-based learning chunker inspired by Ramshaw. i saw his chunker and he uses lots of hashtables to store his data; due to my lack of knowledge of hashtables, i've chosen the List class as my data type.

    guess there's still a long way to go before i'm good enough to do proper NLP, but for now i'll just make do with hardcode, because of the deadline for my projects.

    thanks guys
