Thread: building a tokenizer
- 01-20-2010, 02:15 PM #1
Hey there,
I'm trying to build a tokenizer to break sentences up into words. Splitting the string on whitespace alone isn't sufficient, so I'd like to give it a few more delimiters.
Whitespace is one of them, but so is a dot followed by whitespace, a question mark followed by whitespace, etc. (But not a dot on its own, since that can be part of an abbreviation, for example.) So what I want to split on is actually a mix of single characters (whitespace) and strings (a punctuation mark followed by whitespace).
So conceptually I figure it would look something like this:
input.split("[", ", ". ", "? ", "! , " "]";
but Eclipse doesn't really like that...
I've read the docs for String and Pattern/Matcher, but they don't really help me.
Can anybody here point me in the right direction?
Thanks.
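One possible direction, as a minimal sketch rather than a definitive answer: `String.split` and `Pattern.split` take a regular expression, so the whole set of delimiters described above can be written as one pattern instead of a list of strings. The class name `SimpleTokenizer`, the method name `tokenize`, and the decision to also strip a punctuation mark at the very end of the input are assumptions made for illustration; they are not from the original post.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class SimpleTokenizer {

    // Delimiter: an optional punctuation mark (. , ? !) followed by one or
    // more whitespace characters, OR a punctuation mark at the very end of
    // the input (the end-of-input case is an assumption, not in the post).
    // A dot NOT followed by whitespace (as in "1.5") is left alone.
    private static final Pattern DELIMITER =
            Pattern.compile("[.,?!]?\\s+|[.,?!]$");

    // Hypothetical helper: trims the input, then splits on the delimiter.
    public static String[] tokenize(String input) {
        return DELIMITER.split(input.trim());
    }

    public static void main(String[] args) {
        String input = "Is version 1.5 ready? Yes, it is. Great!";
        // prints [Is, version, 1.5, ready, Yes, it, is, Great]
        System.out.println(Arrays.toString(tokenize(input)));
    }
}
```

Note that under this rule an abbreviation followed by a space, such as "e.g. ", still loses its final dot, because a dot followed by whitespace always counts as a delimiter. Handling that case would need extra logic, for example a list of known abbreviations.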
- 01-20-2010, 04:08 PM #2
Probably a crosspost? See: New To Java - trying to build a tokenizer
- 01-20-2010, 05:45 PM #3