Thread: building a tokenizer
- 01-20-2010, 03:15 PM #1 (Member, joined Dec 2007)
building a tokenizer
I'm trying to build a tokenizer that breaks sentences up into words. Splitting the string on whitespace alone isn't sufficient, so I'd like to give it a few more separators.
Whitespace is one of them, but so are a dot followed by whitespace, a question mark followed by whitespace, etc. (but not a dot on its own, since that can be part of an abbreviation, for example). So what I want to split on is actually a mix of single characters (whitespace) and strings (a punctuation mark followed by whitespace).
So conceptually, I figured it would look something like this:
input.split([" ", ". ", "? ", "! "]);
But Eclipse doesn't accept that (it isn't valid Java).
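`String.split` takes a single regular expression, not a list of separator strings, which is why the line above doesn't compile. The separators described above can be folded into one pattern: one or more whitespace characters, optionally preceded by `.`, `?` or `!`. The sketch below assumes exactly that rule, and additionally (an assumption beyond the original question) treats the end of the input as a separator so a final `!` or `.` is stripped too. `SplitTokenizer` and `tokenize` are hypothetical names. A dot that is not followed by whitespace, as in `3.14`, stays inside the token; an abbreviation followed by a space (`etc. `) will still be split, which is inherent in the rule as stated.

```java
import java.util.Arrays;

public class SplitTokenizer {
    // Hypothetical helper: split on whitespace (or end of input),
    // optionally preceded by '.', '?' or '!'.
    // trim() prevents a leading empty token when the input starts with whitespace;
    // split() with no limit already drops trailing empty strings.
    static String[] tokenize(String input) {
        return input.trim().split("[.?!]?(?:\\s+|$)");
    }

    public static void main(String[] args) {
        String sentence = "Pi is 3.14. Is that right? Yes!";
        System.out.println(Arrays.toString(tokenize(sentence)));
    }
}
```

For the sample sentence this should print `[Pi, is, 3.14, Is, that, right, Yes]`.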
I've read the docs for String and Pattern/Matcher, but they don't really help me.
Could anybody here point me in the right direction?
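Since `Pattern` and `Matcher` came up, another option is to describe the words instead of the separators and collect them with `Matcher.find()`. This is a sketch under the same assumptions as before (a trailing `.`, `?` or `!` is dropped only when followed by whitespace or the end of the input); `MatchTokenizer` and the `TOKEN` pattern are hypothetical. A token must start with a character that is neither whitespace nor `.`, `?` or `!`, then extends reluctantly up to the next separator.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MatchTokenizer {
    // Hypothetical token pattern: a first char that is not whitespace or . ? !,
    // then the shortest run of non-whitespace that is followed (lookahead) by an
    // optional . ? or ! and then whitespace or the end of the input.
    private static final Pattern TOKEN =
            Pattern.compile("[^\\s.?!]\\S*?(?=[.?!]?(?:\\s|$))");

    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        while (m.find()) {          // scan left to right for the next token
            tokens.add(m.group());  // the whole match is the word itself
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("Pi is 3.14. Is that right? Yes!"));
    }
}
```

Matching the words rather than splitting on separators avoids `split`'s empty-string edge cases (for example, a leading empty token when the input starts with whitespace), and the precompiled `Pattern` can be reused across many sentences.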
- 01-20-2010, 05:08 PM #2
- 01-20-2010, 06:45 PM #3