- 10-05-2008, 09:54 PM #1
Member
- Join Date: Oct 2008
- Posts: 19
- Rep Power: 0
Deep copying an ArrayList to add to a List
I want to read some values from a file and add them to an ArrayList, then keep these ArrayLists in another list, so that in the end I have a list of lists.
But add() only adds a shallow (clone) copy of the ArrayList to the list, so every time I clear my ArrayList, all the values in my lists are gone.
Is there a way to add a deep copy of the ArrayList to my list? Can anyone help?
__________________________________________________ _
List<List> sentences = new ArrayList<List>();
StreamTokenizer token = new StreamTokenizer(new FileReader("some file.txt"));
while(token.nextToken()!= StreamTokenizer.TT_EOF)
{
if(token.ttype==StreamTokenizer.TT_WORD)
{
List<List> each_sentence = new ArrayList<List>();
ArrayList<String> iob = new ArrayList<String> ();
ArrayList<String> pos = new ArrayList<String> ();
iob.add(token.sval);
token.nextToken();
iob.add(token.sval);
token.nextToken();
pos.add(token.sval);
token.nextToken();
pos.add(token.sval);
token.nextToken();
each_sentence.add(iob);
each_sentence.add(pos);
sentences.add(each_sentence);
iob.clear();
pos.clear();
}
}
______________________________________________
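A minimal sketch of the behavior described in this post, with no file involved (the class and method names are illustrative): `List.add` stores a reference to the `ArrayList` object rather than a copy, so clearing `iob` afterwards also empties the entry held in `sentences`. Adding a copy made with the `ArrayList` copy constructor keeps the stored data; since `String` objects are immutable, this one-level copy behaves like a deep copy for lists of strings.

```java
import java.util.ArrayList;
import java.util.List;

class ReferenceVsCopy {
    // Adds the list itself (a reference), then clears it: the stored entry is emptied too.
    static List<List<String>> addThenClear() {
        List<List<String>> sentences = new ArrayList<>();
        List<String> iob = new ArrayList<>();
        iob.add("B-NP");
        iob.add("I-NP");
        sentences.add(iob);   // stores a reference to the same ArrayList object
        iob.clear();          // ...so this also empties sentences.get(0)
        return sentences;
    }

    // Adds a copy, then clears the original: the stored copy keeps its elements.
    static List<List<String>> addCopyThenClear() {
        List<List<String>> sentences = new ArrayList<>();
        List<String> iob = new ArrayList<>();
        iob.add("B-NP");
        iob.add("I-NP");
        sentences.add(new ArrayList<>(iob)); // copy constructor: a new list holding the same Strings
        iob.clear();                         // only the original list is emptied
        return sentences;
    }

    public static void main(String[] args) {
        System.out.println(addThenClear());     // prints [[]]
        System.out.println(addCopyThenClear()); // prints [[B-NP, I-NP]]
    }
}
```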
-
Why do you feel compelled to clear iob and pos? Just leave them be.
- 10-05-2008, 11:08 PM #3
Why are you calling new each time through if(token.ttype==StreamTokenizer.TT_WORD)? You get a new list that way; where did the old one go?
Introduction to Programming Using Java.
Cybercartography: A new theoretical construct proposed by D.R. Fraser Taylor
- 10-06-2008, 03:30 AM #4
Oh, I think I posted the wrong condition. Let me post the full code.
___________________________________________
List<List> sentences = new ArrayList<List>();
List<List> each_sentence = new ArrayList<List>();
ArrayList<String> iob = new ArrayList<String> ();
ArrayList<String> pos = new ArrayList<String> ();
ArrayList<String> word = new ArrayList<String> ();
try
{
StreamTokenizer token = new StreamTokenizer(new FileReader("some_file.text"));
token.resetSyntax();
token.ordinaryChar(' ');
token.wordChars(33,126);
token.wordChars(48,57);
//token.quoteChar(9);
while(token.nextToken()!= StreamTokenizer.TT_EOF)
{
if(token.ttype==StreamTokenizer.TT_WORD)
{
iob.add(token.sval);
token.nextToken();
token.nextToken();
token.nextToken();
pos.add(token.sval);
token.nextToken();
token.nextToken();
token.nextToken();
word.add(token.sval);
}
if(token.ttype ==StreamTokenizer.TT_EOL)
{
if(token.nextToken()==StreamTokenizer.TT_EOL)
{
each_sentence.add(iob); each_sentence.add(pos);
each_sentence.add(word);
sentences.add(each_sentence);
iob.clear();
pos.clear();
word.clear();
}
else
token.pushBack();
}}
}
catch (IOException e)
{
System.err.println(e);
}
__________________________________________________ _____
//"some_file.text"
xxx yyy zzz
oas asd dfg
der trg dft
erb thy erg
__________________________________________________ _____
I want xxx and oas to be saved in the ArrayList <iob>,
yyy and asd in the ArrayList <pos>,
and zzz and dfg in the ArrayList <word>.
These 3 lists should be grouped under the list of ArrayLists <each_sentence>,
and each sentence is an element of <sentences>.
One problem is that I need to clear the lists so that I can read the next <each_sentence>. If I don't, the 2nd <each_sentence> will be made up of <each_sentence> 1 and <each_sentence> 2.
- 10-06-2008, 02:02 PM #5
To debug your code, you need to add some println() statements to it to show how the values are changing and where the execution flow goes.
- 10-06-2008, 06:58 PM #6
I've tried printing. The problem is that every time I clear my ArrayList, my list of ArrayLists gets cleared too. I want to make a deep copy so that this problem with the shallow (clone) copy won't occur.
- 10-06-2008, 07:18 PM #7
Instead of copying and clearing the ArrayList, just create a new one, leaving the old ones in the List.
Otherwise, can you write a short, simple program that compiles and executes to demonstrate your problem, and post it? No need for a file or StreamTokenizer; put everything in one program.
Last edited by Norm; 10-06-2008 at 07:33 PM.
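Following the suggestion above, a short self-contained sketch with no file or StreamTokenizer: the rows reuse the sample data from earlier in the thread, and an empty row is assumed to mark the end of a sentence. After each sentence is stored, the code creates fresh lists instead of calling `clear()`, so the stored ones are left untouched. The class name `NewListsPerSentence` is illustrative.

```java
import java.util.ArrayList;
import java.util.List;

class NewListsPerSentence {
    // Groups "iob pos word" rows into sentences; an empty row ends a sentence.
    // Note: a sentence is only stored when an empty row follows it.
    static List<List<List<String>>> group(String[] rows) {
        List<List<List<String>>> sentences = new ArrayList<>();
        List<String> iob = new ArrayList<>();
        List<String> pos = new ArrayList<>();
        List<String> word = new ArrayList<>();
        for (String row : rows) {
            if (row.isEmpty()) {
                List<List<String>> eachSentence = new ArrayList<>();
                eachSentence.add(iob);
                eachSentence.add(pos);
                eachSentence.add(word);
                sentences.add(eachSentence);
                // Start new lists instead of calling clear(): the stored ones stay intact.
                iob = new ArrayList<>();
                pos = new ArrayList<>();
                word = new ArrayList<>();
            } else {
                String[] cols = row.split(" ");
                iob.add(cols[0]);
                pos.add(cols[1]);
                word.add(cols[2]);
            }
        }
        return sentences;
    }

    public static void main(String[] args) {
        String[] rows = {"xxx yyy zzz", "oas asd dfg", "", "der trg dft", "erb thy erg", ""};
        System.out.println(group(rows));
        // prints [[[xxx, oas], [yyy, asd], [zzz, dfg]], [[der, erb], [trg, thy], [dft, erg]]]
    }
}
```

Because `iob`, `pos` and `word` are reassigned rather than cleared, each stored sentence keeps its own list objects and no copying is needed.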
- 10-07-2008, 04:50 AM #8
Thanks for the replies. Here's the program I've made:
_______________________________________
import java.io.*;
import java.util.*;
public class gather_data
{
public static void main(String args[]) throws Exception
{
int iobcount=0;int storecount=0;int sentencecount=0;
List<List> sentences = new ArrayList<List>();
ArrayList<String> store = new ArrayList<String> ();
String filename[] = new String[100];
// Zero-padded file names: 001.txt, 002.txt, ..., 100.txt
for(int i=0;i<100;i++)
{
filename[i] = String.format("%03d.txt", i + 1);
}
for(int i=0; i<100; i++)
{
List<List> each_sentence = new ArrayList<List>();
ArrayList<String> iob = new ArrayList<String> ();
ArrayList<String> pos = new ArrayList<String> ();
ArrayList<String> word = new ArrayList<String> ();
try
{
StreamTokenizer token = new StreamTokenizer(new FileReader(filename[i]));
//StreamTokenizer token = new StreamTokenizer(new FileReader("008.txt"));
token.resetSyntax();
token.ordinaryChar(' ');
token.wordChars(33,126);
token.wordChars(48,57);
//token.quoteChar(9);
while(token.nextToken()!= StreamTokenizer.TT_EOF)
{
if(token.ttype==StreamTokenizer.TT_WORD)
{
iob.add(token.sval);
token.nextToken();
token.nextToken();
token.nextToken();
pos.add(token.sval);
token.nextToken();
token.nextToken();
token.nextToken();
word.add(token.sval);
// System.out.println(word.get(2));
}
if(token.ttype ==StreamTokenizer.TT_EOL)
{
if(token.nextToken()==StreamTokenizer.TT_EOL)
{
each_sentence.add(iob);
each_sentence.add(pos);
each_sentence.add(word);
sentences.add(each_sentence);
sentencecount++;
iob.clear();
pos.clear();
word.clear();
System.out.println(sentences.get(sentencecount-1));
}
else
token.pushBack();
}
/* try {
BufferedWriter out = new BufferedWriter(new FileWriter("combined", true));
out.write(iob.get(iobcount-1));
out.write("\t");
out.write(pos.get(iobcount-1));
out.write("\t");
out.write(word.get(iobcount-1));
out.write("\n");
out.close();
} catch (IOException e) {}
*/
}
// System.out.println(sentences.get(sentencecount-1));
} catch (IOException e) { System.err.println("Could not read " + filename[i] + ": " + e); }
}
}
}
__________________________________________________ _____
B-NP NNP Pierre
I-NP NNP Vinken
O COMMA COMMA
B-NP CD 61
I-NP NNS years
B-ADJP JJ old
O COMMA COMMA
B-VP MD will
I-VP VB join
B-NP DT the
I-NP NN board
B-PP IN as
B-NP DT a
I-NP JJ nonexecutive
I-NP NN director
B-NP NNP Nov.
I-NP CD 29
O . .
B-NP NNP Mr.
I-NP NNP Vinken
B-VP VBZ is
B-NP NN chairman
B-PP IN of
B-NP NNP Elsevier
I-NP NNP N.V.
O COMMA COMMA
B-NP DT the
I-NP NNP Dutch
I-NP VBG publishing
I-NP NN group
O . .
________________________________________________
That is one of the files I've read. I'm supposed to read these values into a sentences object, or a list of sentences. Is this clearer for you guys?
- 10-07-2008, 01:05 PM #9
This is clearer, but I need some overview:
Your post of 10-05-2008, 02:54 PM, along with the more complete code, brings to mind the question of why you need to clear anything. In general, I have arrived at the pattern if(condition){doSomething();}else{next}, but I still cannot determine from the code what needs to be cleared, where, and why. You wrote: "one problem is i need to clear it so that i can read the other"
It looks as though you are doing a student exercise designed to read a file and pull particular information for a report. Is this what you are trying to achieve?
- 10-07-2008, 02:02 PM #10
Instead of copying and clearing the ArrayList, just create a new one, leaving the old ones in the List.
- 10-08-2008, 03:22 AM #11
I'm trying to write a text chunker to do natural language processing, but before that I need to extract some data and put it into ArrayLists. The text file is part of a corpus I have.
So what I'm trying to achieve is to put each column into an ArrayList and group the 3 columns into a list. I'm using ArrayList since I do not know how many words there are in a sentence. Since every sentence is separated by a newline, I'll need to clear my iob, pos and word lists to store the next sentence.
If I were to keep creating ArrayLists, I'd still end up with a List of nulls after I get out of the for loop. I need the values to stay stored; that is why I have a <sentences> list that should store them. But it stored references to the lists instead of the values themselves, because I cannot deep copy the elements of the ArrayList into a List.
If we were to strip out the idea of using ArrayList or List, is there another way to extract the data?
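One possible answer, sketched under assumptions: read the corpus line by line with `BufferedReader.readLine`, split each non-blank line on whitespace (spaces or tabs) into the three columns, and treat a blank line as the end of a sentence. Instead of three parallel lists, each row becomes a small object, and a sentence is a `List` of those. The `Token` class and the `read`/`readString` methods are hypothetical names for this sketch, not part of any library.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class CorpusReader {
    // One row of the corpus: chunk tag, part-of-speech tag, word.
    static final class Token {
        final String iob, pos, word;
        Token(String iob, String pos, String word) {
            this.iob = iob; this.pos = pos; this.word = word;
        }
        @Override public String toString() { return word + "/" + pos + "/" + iob; }
    }

    // Reads whitespace-separated "iob pos word" rows; a blank line ends a sentence.
    static List<List<Token>> read(Reader in) throws IOException {
        List<List<Token>> sentences = new ArrayList<>();
        List<Token> current = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(in)) {
            String line;
            while ((line = reader.readLine()) != null) {
                String trimmed = line.trim();
                if (trimmed.isEmpty()) {
                    if (!current.isEmpty()) {
                        sentences.add(current);
                        current = new ArrayList<>(); // fresh list; the stored one is untouched
                    }
                    continue;
                }
                String[] cols = trimmed.split("\\s+");
                if (cols.length >= 3) {
                    current.add(new Token(cols[0], cols[1], cols[2]));
                }
            }
        }
        if (!current.isEmpty()) {
            sentences.add(current); // last sentence when there is no trailing blank line
        }
        return sentences;
    }

    // Convenience wrapper for in-memory text.
    static List<List<Token>> readString(String text) {
        try {
            return read(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String sample = "B-NP NNP Pierre\nI-NP NNP Vinken\nO COMMA COMMA\n\nB-NP NNP Mr.\nI-NP NNP Vinken\n";
        List<List<Token>> sentences = readString(sample);
        System.out.println(sentences.size()); // prints 2
        System.out.println(sentences.get(1)); // prints [Mr./NNP/B-NP, Vinken/NNP/I-NP]
    }
}
```

To read one of the corpus files, pass `new FileReader("001.txt")` (or any other `Reader`) to `read` instead of a `StringReader`.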
- 10-08-2008, 05:40 AM #12
I've tried Norm's suggestion and did this. It's impossible to create an array of List since it's against the generics rules, and I can't seem to get the syntax to create an array of ArrayList, so this is what I've come up with. It works, but sometimes it removes a few sentences from my data, and then I get some sentence.get() calls with no strings...
Any idea what to do about it? This is like a hardcoded version of deep copy, done by creating lots of copies.
________________________________________
import java.io.*;
import java.util.*;
public class gather_data
{
public static void main(String args[]) throws Exception
{
int iobcount=0;int storecount=0; int sentencecount = 0;
List<List> sentences = new ArrayList<List>();
String filename[] = new String[100];
// Zero-padded file names: 001.txt, 002.txt, ..., 100.txt
for(int i=0;i<100;i++)
{
filename[i] = String.format("%03d.txt", i + 1);
}
for(int i=0; i<100; i++)
{
List<List> each_sentence = new ArrayList<List>();
List<List> each_sentence2 = new ArrayList<List>();
List<List> each_sentence3 = new ArrayList<List>();
List<List> each_sentence4 = new ArrayList<List>();
List<List> each_sentence5 = new ArrayList<List>();
List<List> each_sentence6 = new ArrayList<List>();
List<List> each_sentence7 = new ArrayList<List>();
List<List> each_sentence8 = new ArrayList<List>();
List<List> each_sentence9 = new ArrayList<List>();
List<List> each_sentence10 = new ArrayList<List>();
List<List> each_sentence11 = new ArrayList<List>();
List<List> each_sentence12 = new ArrayList<List>();
List<List> each_sentence13 = new ArrayList<List>();
List<List> each_sentence14 = new ArrayList<List>();
List<List> each_sentence15 = new ArrayList<List>();
List<List> each_sentence16 = new ArrayList<List>();
List<List> each_sentence17 = new ArrayList<List>();
ArrayList<String> iob = new ArrayList<String> ();
ArrayList<String> pos = new ArrayList<String> ();
ArrayList<String> word = new ArrayList<String> ();
ArrayList<String> iob2 = new ArrayList<String> ();
ArrayList<String> pos2 = new ArrayList<String> ();
ArrayList<String> word2 = new ArrayList<String> ();
ArrayList<String> iob3 = new ArrayList<String> ();
ArrayList<String> pos3 = new ArrayList<String> ();
ArrayList<String> word3 = new ArrayList<String> ();
ArrayList<String> iob4 = new ArrayList<String> ();
ArrayList<String> pos4 = new ArrayList<String> ();
ArrayList<String> word4 = new ArrayList<String> ();
ArrayList<String> iob5 = new ArrayList<String> ();
ArrayList<String> pos5 = new ArrayList<String> ();
ArrayList<String> word5 = new ArrayList<String> ();
ArrayList<String> iob6 = new ArrayList<String> ();
ArrayList<String> pos6 = new ArrayList<String> ();
ArrayList<String> word6 = new ArrayList<String> ();
ArrayList<String> iob7 = new ArrayList<String> ();
ArrayList<String> pos7 = new ArrayList<String> ();
ArrayList<String> word7 = new ArrayList<String> ();
ArrayList<String> iob8 = new ArrayList<String> ();
ArrayList<String> pos8 = new ArrayList<String> ();
ArrayList<String> word8 = new ArrayList<String> ();
ArrayList<String> iob9 = new ArrayList<String> ();
ArrayList<String> pos9 = new ArrayList<String> ();
ArrayList<String> word9 = new ArrayList<String> ();
ArrayList<String> iob10 = new ArrayList<String> ();
ArrayList<String> pos10 = new ArrayList<String> ();
ArrayList<String> word10 = new ArrayList<String> ();
ArrayList<String> iob11 = new ArrayList<String> ();
ArrayList<String> pos11 = new ArrayList<String> ();
ArrayList<String> word11 = new ArrayList<String> ();
ArrayList<String> iob12 = new ArrayList<String> ();
ArrayList<String> pos12 = new ArrayList<String> ();
ArrayList<String> word12 = new ArrayList<String> ();
ArrayList<String> iob13 = new ArrayList<String> ();
ArrayList<String> pos13 = new ArrayList<String> ();
ArrayList<String> word13 = new ArrayList<String> ();
ArrayList<String> iob14 = new ArrayList<String> ();
ArrayList<String> pos14 = new ArrayList<String> ();
ArrayList<String> word14 = new ArrayList<String> ();
ArrayList<String> iob15 = new ArrayList<String> ();
ArrayList<String> pos15 = new ArrayList<String> ();
ArrayList<String> word15 = new ArrayList<String> ();
ArrayList<String> iob16 = new ArrayList<String> ();
ArrayList<String> pos16 = new ArrayList<String> ();
ArrayList<String> word16 = new ArrayList<String> ();
try
{
StreamTokenizer token = new StreamTokenizer(new FileReader(filename[i]));
//StreamTokenizer token = new StreamTokenizer(new FileReader("008.txt"));
token.resetSyntax();
token.ordinaryChar(' ');
token.wordChars(33,126);
token.wordChars(48,57);
//token.quoteChar(9);
while(token.nextToken()!= StreamTokenizer.TT_EOF)
{
if(token.ttype==StreamTokenizer.TT_WORD)
{
// System.out.println(token.sval);
iob.add(token.sval);
token.nextToken();
token.nextToken();
token.nextToken();
pos.add(token.sval);
token.nextToken();
token.nextToken();
token.nextToken();
word.add(token.sval);
// System.out.println(word.get(2));
}
if(token.ttype ==StreamTokenizer.TT_EOL)
{
if(token.nextToken()==StreamTokenizer.TT_EOL)
{
each_sentence.add(iob);
each_sentence.add(pos);
each_sentence.add(word);
sentences.add(each_sentence);
sentencecount++;
// System.out.println(sentences.get(sentencecount-1));
break;
// iob.clear();
// pos.clear();
// word.clear();
// System.out.println(sentences.get(sentencecount-1));
}
else
token.pushBack();
}}
while(token.nextToken()!= StreamTokenizer.TT_EOF)
{
if(token.ttype==StreamTokenizer.TT_WORD)
{
//System.out.println(token.sval);
iob2.add(token.sval);
token.nextToken();
token.nextToken();
token.nextToken();
pos2.add(token.sval);
token.nextToken();
token.nextToken();
token.nextToken();
word2.add(token.sval);
// System.out.println(word.get(2));
}
if(token.ttype ==StreamTokenizer.TT_EOL)
{
if(token.nextToken()==StreamTokenizer.TT_EOL)
{
each_sentence2.add(iob2);
each_sentence2.add(pos2);
each_sentence2.add(word2);
sentences.add(each_sentence2);
sentencecount++;
//System.out.println(sentences.get(sentencecount-1));
break;
// iob.clear();
// pos.clear();
// word.clear();
// System.out.println(sentences.get(sentencecount-1));
}
else
token.pushBack();
}}
while(token.nextToken()!= StreamTokenizer.TT_EOF)
{
if(token.ttype==StreamTokenizer.TT_WORD)
{
//System.out.println(token.sval);
iob3.add(token.sval);
token.nextToken();
token.nextToken();
token.nextToken();
pos3.add(token.sval);
token.nextToken();
token.nextToken();
token.nextToken();
word3.add(token.sval);
// System.out.println(word.get(2));
}
if(token.ttype ==StreamTokenizer.TT_EOL)
{
if(token.nextToken()==StreamTokenizer.TT_EOL)
{
each_sentence3.add(iob3);
each_sentence3.add(pos3);
each_sentence3.add(word3);
sentences.add(each_sentence3);
sentencecount++;
//System.out.println(sentences.get(sentencecount-1));
break;
// iob.clear();
// pos.clear();
// word.clear();
// System.out.println(sentences.get(sentencecount-1));
}
else
token.pushBack();
}}
while(token.nextToken()!= StreamTokenizer.TT_EOF)
{
if(token.ttype==StreamTokenizer.TT_WORD)
{
//System.out.println(token.sval);
iob4.add(token.sval);
token.nextToken();
token.nextToken();
token.nextToken();
pos4.add(token.sval);
token.nextToken();
token.nextToken();
token.nextToken();
word4.add(token.sval);
// System.out.println(word.get(2));
}
if(token.ttype ==StreamTokenizer.TT_EOL)
{
if(token.nextToken()==StreamTokenizer.TT_EOL)
{
each_sentence4.add(iob4);
each_sentence4.add(pos4);
each_sentence4.add(word4);
sentences.add(each_sentence4);
sentencecount++;
//System.out.println(sentences.get(sentencecount-1));
break;
// iob.clear();
// pos.clear();
// word.clear();
// System.out.println(sentences.get(sentencecount-1));
}
else
token.pushBack();
}}
while(token.nextToken()!= StreamTokenizer.TT_EOF)
{
if(token.ttype==StreamTokenizer.TT_WORD)
{
//System.out.println(token.sval);
iob5.add(token.sval);
token.nextToken();
token.nextToken();
token.nextToken();
pos5.add(token.sval);
token.nextToken();
token.nextToken();
token.nextToken();
word5.add(token.sval);
// System.out.println(word.get(2));
}
if(token.ttype ==StreamTokenizer.TT_EOL)
{
if(token.nextToken()==StreamTokenizer.TT_EOL)
{
each_sentence5.add(iob5);
each_sentence5.add(pos5);
each_sentence5.add(word5);
sentences.add(each_sentence5);
sentencecount++;
//System.out.println(sentences.get(sentencecount-1));
break;
// iob.clear();
// pos.clear();
// word.clear();
// System.out.println(sentences.get(sentencecount-1));
}
else
token.pushBack();
}}
while(token.nextToken()!= StreamTokenizer.TT_EOF)
{
if(token.ttype==StreamTokenizer.TT_WORD)
{
//System.out.println(token.sval);
iob6.add(token.sval);
token.nextToken();
token.nextToken();
token.nextToken();
pos6.add(token.sval);
token.nextToken();
token.nextToken();
token.nextToken();
word6.add(token.sval);
// System.out.println(word.get(2));
}
if(token.ttype ==StreamTokenizer.TT_EOL)
{
if(token.nextToken()==StreamTokenizer.TT_EOL)
{
each_sentence6.add(iob6);
each_sentence6.add(pos6);
each_sentence6.add(word6);
sentences.add(each_sentence6);
sentencecount++;
//System.out.println(sentences.get(sentencecount-1));
break;
// iob.clear();
// pos.clear();
// word.clear();
// System.out.println(sentences.get(sentencecount-1));
}
else
token.pushBack();
}}
while(token.nextToken()!= StreamTokenizer.TT_EOF)
{
if(token.ttype==StreamTokenizer.TT_WORD)
{
//System.out.println(token.sval);
iob7.add(token.sval);
token.nextToken();
token.nextToken();
token.nextToken();
pos7.add(token.sval);
token.nextToken();
token.nextToken();
token.nextToken();
word7.add(token.sval);
// System.out.println(word.get(2));
}
if(token.ttype ==StreamTokenizer.TT_EOL)
{
if(token.nextToken()==StreamTokenizer.TT_EOL)
{
each_sentence7.add(iob7);
each_sentence7.add(pos7);
each_sentence7.add(word7);
sentences.add(each_sentence7);
sentencecount++;
//System.out.println(sentences.get(sentencecount-1));
break;
// iob.clear();
// pos.clear();
// word.clear();
// System.out.println(sentences.get(sentencecount-1));
}
else
token.pushBack();
}}
while(token.nextToken()!= StreamTokenizer.TT_EOF)
{
if(token.ttype==StreamTokenizer.TT_WORD)
{
//System.out.println(token.sval);
iob8.add(token.sval);
token.nextToken();
token.nextToken();
token.nextToken();
pos8.add(token.sval);
token.nextToken();
token.nextToken();
token.nextToken();
word8.add(token.sval);
// System.out.println(word.get(2));
}
if(token.ttype ==StreamTokenizer.TT_EOL)
{
if(token.nextToken()==StreamTokenizer.TT_EOL)
{
each_sentence8.add(iob8);
each_sentence8.add(pos8);
each_sentence8.add(word8);
sentences.add(each_sentence8);
sentencecount++;
//System.out.println(sentences.get(sentencecount-1));
break;
// iob.clear();
// pos.clear();
// word.clear();
// System.out.println(sentences.get(sentencecount-1));
}
else
token.pushBack();
}}
while(token.nextToken()!= StreamTokenizer.TT_EOF)
{
if(token.ttype==StreamTokenizer.TT_WORD)
{
//System.out.println(token.sval);
iob9.add(token.sval);
token.nextToken();
token.nextToken();
token.nextToken();
pos9.add(token.sval);
token.nextToken();
token.nextToken();
token.nextToken();
word9.add(token.sval);
// System.out.println(word.get(2));
}
if(token.ttype ==StreamTokenizer.TT_EOL)
{
if(token.nextToken()==StreamTokenizer.TT_EOL)
{
each_sentence9.add(iob9);
each_sentence9.add(pos9);
each_sentence9.add(word9);
sentences.add(each_sentence9);
sentencecount++;
//System.out.println(sentences.get(sentencecount-1));
break;
// iob.clear();
// pos.clear();
// word.clear();
// System.out.println(sentences.get(sentencecount-1));
}
else
token.pushBack();
}}
while(token.nextToken()!= StreamTokenizer.TT_EOF)
{
if(token.ttype==StreamTokenizer.TT_WORD)
{
//System.out.println(token.sval);
iob10.add(token.sval);
token.nextToken();
token.nextToken();
token.nextToken();
pos10.add(token.sval);
token.nextToken();
token.nextToken();
token.nextToken();
word10.add(token.sval);
// System.out.println(word.get(2));
}
if(token.ttype ==StreamTokenizer.TT_EOL)
{
if(token.nextToken()==StreamTokenizer.TT_EOL)
{
each_sentence10.add(iob10);
each_sentence10.add(pos10);
each_sentence10.add(word10);
sentences.add(each_sentence10);
sentencecount++;
//System.out.println(sentences.get(sentencecount-1));
break;
// iob.clear();
// pos.clear();
// word.clear();
// System.out.println(sentences.get(sentencecount-1));
}
else
token.pushBack();
}}
while(token.nextToken()!= StreamTokenizer.TT_EOF)
{
if(token.ttype==StreamTokenizer.TT_WORD)
{
//System.out.println(token.sval);
iob11.add(token.sval);
token.nextToken();
token.nextToken();
token.nextToken();
pos11.add(token.sval);
token.nextToken();
token.nextToken();
token.nextToken();
word11.add(token.sval);
// System.out.println(word.get(2));
}
if(token.ttype ==StreamTokenizer.TT_EOL)
{
if(token.nextToken()==StreamTokenizer.TT_EOL)
{
each_sentence11.add(iob11);
each_sentence11.add(pos11);
each_sentence11.add(word11);
sentences.add(each_sentence11);
sentencecount++;
//System.out.println(sentences.get(sentencecount-1));
break;
// iob.clear();
// pos.clear();
// word.clear();
// System.out.println(sentences.get(sentencecount-1));
}
else
token.pushBack();
}}
while(token.nextToken()!= StreamTokenizer.TT_EOF)
{
if(token.ttype==StreamTokenizer.TT_WORD)
{
//System.out.println(token.sval);
iob12.add(token.sval);
token.nextToken();
token.nextToken();
token.nextToken();
pos12.add(token.sval);
token.nextToken();
token.nextToken();
token.nextToken();
word12.add(token.sval);
// System.out.println(word.get(2));
}
if(token.ttype ==StreamTokenizer.TT_EOL)
{
if(token.nextToken()==StreamTokenizer.TT_EOL)
{
each_sentence12.add(iob12);
each_sentence12.add(pos12);
each_sentence12.add(word12);
sentences.add(each_sentence12);
sentencecount++;
//System.out.println(sentences.get(sentencecount-1));
break;
// iob.clear();
// pos.clear();
// word.clear();
// System.out.println(sentences.get(sentencecount-1));
}
else
token.pushBack();
}}
while(token.nextToken()!= StreamTokenizer.TT_EOF)
{
if(token.ttype==StreamTokenizer.TT_WORD)
{
//System.out.println(token.sval);
iob13.add(token.sval);
token.nextToken();
token.nextToken();
token.nextToken();
pos13.add(token.sval);
token.nextToken();
token.nextToken();
token.nextToken();
word13.add(token.sval);
// System.out.println(word.get(2));
}
if(token.ttype ==StreamTokenizer.TT_EOL)
{
if(token.nextToken()==StreamTokenizer.TT_EOL)
{
each_sentence13.add(iob13);
each_sentence13.add(pos13);
each_sentence13.add(word13);
sentences.add(each_sentence13);
sentencecount++;
//System.out.println(sentences.get(sentencecount-1));
break;
// iob.clear();
// pos.clear();
// word.clear();
// System.out.println(sentences.get(sentencecount-1));
}
else
token.pushBack();
}}
while(token.nextToken()!= StreamTokenizer.TT_EOF)
{
if(token.ttype==StreamTokenizer.TT_WORD)
{
//System.out.println(token.sval);
iob14.add(token.sval);
token.nextToken();
token.nextToken();
token.nextToken();
pos14.add(token.sval);
token.nextToken();
token.nextToken();
token.nextToken();
word14.add(token.sval);
// System.out.println(word.get(2));
}
if(token.ttype ==StreamTokenizer.TT_EOL)
{
if(token.nextToken()==StreamTokenizer.TT_EOL)
{
each_sentence14.add(iob14);
each_sentence14.add(pos14);
each_sentence14.add(word14);
sentences.add(each_sentence14);
sentencecount++;
//System.out.println(sentences.get(sentencecount-1));
break;
// iob.clear();
// pos.clear();
// word.clear();
// System.out.println(sentences.get(sentencecount-1));
}
else
token.pushBack();
}}
while(token.nextToken()!= StreamTokenizer.TT_EOF)
{
if(token.ttype==StreamTokenizer.TT_WORD)
{
//System.out.println(token.sval);
iob15.add(token.sval);
token.nextToken();
token.nextToken();
token.nextToken();
pos15.add(token.sval);
token.nextToken();
token.nextToken();
token.nextToken();
word15.add(token.sval);
// System.out.println(word.get(2));
}
if(token.ttype ==StreamTokenizer.TT_EOL)
{
if(token.nextToken()==StreamTokenizer.TT_EOL)
{
each_sentence15.add(iob15);
each_sentence15.add(pos15);
each_sentence15.add(word15);
sentences.add(each_sentence15);
sentencecount++;
//System.out.println(sentences.get(sentencecount-1));
break;
// iob.clear();
// pos.clear();
// word.clear();
// System.out.println(sentences.get(sentencecount-1));
}
else
token.pushBack();
}}
while(token.nextToken()!= StreamTokenizer.TT_EOF)
{
if(token.ttype==StreamTokenizer.TT_WORD)
{
//System.out.println(token.sval);
iob16.add(token.sval);
token.nextToken();
token.nextToken();
token.nextToken();
pos16.add(token.sval);
token.nextToken();
token.nextToken();
token.nextToken();
word16.add(token.sval);
// System.out.println(word.get(2));
}
if(token.ttype ==StreamTokenizer.TT_EOL)
{
if(token.nextToken()==StreamTokenizer.TT_EOL)
{
each_sentence16.add(iob16);
each_sentence16.add(pos16);
each_sentence16.add(word16);
sentences.add(each_sentence16);
sentencecount++;
//System.out.println(sentences.get(sentencecount-1));
break;
// iob.clear();
// pos.clear();
// word.clear();
// System.out.println(sentences.get(sentencecount-1));
}
else
token.pushBack();
}}
// System.out.println(sentences.get(sentencecount-1));
} catch (IOException e) { System.err.println("Could not read " + filename[i] + ": " + e); }
}
}
}
__________________________________________________ __
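On the array-of-lists question raised in this post: the compiler rejects `new ArrayList<String>[n]` as generic array creation, but an array of the raw type can be assigned to `List<String>[]` with an unchecked warning. In most cases a `List` of `List`s that grows as sentences are read is simpler, since no fixed count such as 16 has to be chosen in advance. The class and method names below are illustrative.

```java
import java.util.ArrayList;
import java.util.List;

class ArrayOfLists {
    // `new ArrayList<String>[n]` does not compile ("generic array creation"), but a raw
    // List[] can be assigned to List<String>[] via an unchecked conversion. The warning
    // is suppressed here because every slot is filled with a List<String>.
    static List<String>[] fixedSlots(int n) {
        @SuppressWarnings({"unchecked", "rawtypes"})
        List<String>[] slots = new List[n];
        for (int i = 0; i < n; i++) {
            slots[i] = new ArrayList<>(); // a separate list object per slot
        }
        return slots;
    }

    // Usually simpler: a List of Lists grows as needed, one new inner list per sentence.
    static List<List<String>> growable(String[][] sentences) {
        List<List<String>> all = new ArrayList<>();
        for (String[] sentence : sentences) {
            List<String> words = new ArrayList<>();
            for (String w : sentence) {
                words.add(w);
            }
            all.add(words);
        }
        return all;
    }

    public static void main(String[] args) {
        List<String>[] slots = fixedSlots(3);
        slots[0].add("Pierre");
        System.out.println(slots[0] + " " + slots[1]); // prints [Pierre] []
        System.out.println(growable(new String[][] {{"Pierre", "Vinken"}, {"Mr.", "Vinken"}}));
        // prints [[Pierre, Vinken], [Mr., Vinken]]
    }
}
```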
- 10-08-2008, 02:05 PM #13
NLP is challenging (to say the least), judging by the skills shown in declaring numerous Collections by hand. You should probably study some compiler science for a while, as tokenizing and parsing text is the front end of most compilers. There is a syntax that looks something like < < >> (nested generics) with which one can do lists of lists, or otherwise arrive at n-dimensional arrays, but that is clumsy, and I suggest studying Stacks, Lists, Maps and related Collections rather than trying to push the cart with hardcode.
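One reading of this advice, as a sketch: spell out the nesting with generics instead of the raw `List<List>` used in the thread, and use a `Map` so the three columns are looked up by name rather than by position. The keys `iob`, `pos` and `word` and the `sentence` helper are illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class TypedCollections {
    // Builds one sentence as a Map from column name to that column's values,
    // so callers write get("word") instead of remembering that index 2 is the word.
    static Map<String, List<String>> sentence(String[][] rows) {
        Map<String, List<String>> columns = new LinkedHashMap<>(); // keeps iob, pos, word order
        columns.put("iob", new ArrayList<String>());
        columns.put("pos", new ArrayList<String>());
        columns.put("word", new ArrayList<String>());
        for (String[] row : rows) {
            columns.get("iob").add(row[0]);
            columns.get("pos").add(row[1]);
            columns.get("word").add(row[2]);
        }
        return columns;
    }

    public static void main(String[] args) {
        // Fully typed nesting: a list of sentences, each a map of named columns.
        List<Map<String, List<String>>> sentences = new ArrayList<>();
        sentences.add(sentence(new String[][] {{"B-NP", "NNP", "Pierre"}, {"I-NP", "NNP", "Vinken"}}));
        System.out.println(sentences.get(0).get("word")); // prints [Pierre, Vinken]
        System.out.println(sentences.get(0));
        // prints {iob=[B-NP, I-NP], pos=[NNP, NNP], word=[Pierre, Vinken]}
    }
}
```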
- 10-08-2008, 03:13 PM #14
Thanks, guys. I guess for this NLP project I'm just going to pick out 20 sentences from each text file in the corpus. I'll go and study more on Collections, Maps and LinkedList, but for now 20 sentences from each file sounds reasonable, hahaha. I could say I'm doing random sampling...
I've read up on deep copying a LinkedList, and the simplest way was to manipulate a memory buffer and copy the data from there. I guess that's too much for this project too.
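For lists of strings like the ones in this thread, a deep copy does not need any memory-buffer tricks: copying the outer list and each inner list with the `ArrayList` copy constructor is enough, because the `String` elements are immutable and can be shared safely. A minimal sketch (the `deepCopy` name is illustrative):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class DeepCopy {
    // Copies the outer list and every inner list; the Strings themselves are shared,
    // which is safe because String objects cannot be modified.
    static List<List<String>> deepCopy(List<List<String>> original) {
        List<List<String>> copy = new ArrayList<>(original.size());
        for (List<String> inner : original) {
            copy.add(new ArrayList<>(inner));
        }
        return copy;
    }

    public static void main(String[] args) {
        List<List<String>> eachSentence = new ArrayList<>();
        eachSentence.add(new ArrayList<>(Arrays.asList("B-NP", "I-NP")));
        eachSentence.add(new ArrayList<>(Arrays.asList("NNP", "NNP")));
        List<List<String>> saved = deepCopy(eachSentence);
        for (List<String> column : eachSentence) {
            column.clear(); // emptying the original inner lists...
        }
        System.out.println(saved); // ...leaves the copy intact: prints [[B-NP, I-NP], [NNP, NNP]]
    }
}
```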
Oh, by the way, I'm trying to code a simplistic chunker that detects noun phrases and verb phrases in sentences. It's a transformation-based learning chunker inspired by Ramshaw. I saw his chunker, and he uses lots of hashtables to store his data; due to my lack of knowledge of hashtables, I've chosen the List class as my data type.
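Since the chunker described here works from the IOB column, a small illustration of how those tags map back to phrases may help: `B-X` opens a chunk of type X and `I-X` continues it. This sketch only decodes tags that are already present; it is not the transformation-based learning step itself, and the `chunks` method is a hypothetical helper.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class IobChunks {
    // Collects the word spans of one chunk type (e.g. "NP") from IOB tags:
    // "B-NP" starts a chunk, "I-NP" continues it (or starts one if none is open),
    // and any other tag closes the open chunk.
    static List<String> chunks(List<String> iob, List<String> words, String type) {
        List<String> result = new ArrayList<>();
        StringBuilder current = null;
        for (int i = 0; i < iob.size(); i++) {
            String tag = iob.get(i);
            boolean begin = tag.equals("B-" + type);
            boolean inside = tag.equals("I-" + type);
            if (begin || (inside && current == null)) {
                if (current != null) result.add(current.toString());
                current = new StringBuilder(words.get(i));
            } else if (inside) {
                current.append(' ').append(words.get(i));
            } else {
                if (current != null) result.add(current.toString());
                current = null;
            }
        }
        if (current != null) result.add(current.toString());
        return result;
    }

    public static void main(String[] args) {
        List<String> iob = Arrays.asList("B-NP", "I-NP", "O", "B-NP", "I-NP", "B-ADJP",
                "O", "B-VP", "I-VP", "B-NP", "I-NP");
        List<String> words = Arrays.asList("Pierre", "Vinken", "COMMA", "61", "years", "old",
                "COMMA", "will", "join", "the", "board");
        System.out.println(chunks(iob, words, "NP")); // prints [Pierre Vinken, 61 years, the board]
        System.out.println(chunks(iob, words, "VP")); // prints [will join]
    }
}
```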
I guess there's still a long way to go before I'm good enough to do proper NLP, but for now I'll just make do with hardcode, because of the deadline for my project.
Thanks, guys.