Results 1 to 12 of 12
- 07-09-2011, 01:05 AM #1
Test if string *could* match regular expression
Rather than test if a whole string matches a given pattern, I want to know if a string could begin a match for that pattern.
For example, if the pattern is "[A-Z]+[0-9]+"
"A" - not a match, but could begin of one: true
"ABC" - not a match, but could begin one: true
"ABC123" - match: true
"9" - could not begin a match: false
"ABC123X" - could not begin a match: false
I don't think is is possible with Patterns and Matchers. (Correct me if I'm wrong!) Is there a library out there that can do this?
- 07-09-2011, 02:04 AM #2Moderator
- Join Date
- Jul 2010
- Rep Power
Your answer might be more complicated than this solution, but using a * quantifier on your last character class (as opposed to +) would correlate with what you want.
- 07-09-2011, 02:19 AM #3
As I read characters from an input stream, I want to know if what I've read so far could be the beginning of a VT220 escape sequence. For example, I want "\033\\[" to match because it is the beginning of many valid sequences. Whenever adding one more character would make it no longer a possible match -- like the 'X' in the final example of my original post -- I want to act on it.
I can't just wait for it to become a complete match, and then cease to be a match, because if the input is bad it might never become a valid match.
To state my problem more generally, I want to know if regexp A (the characters I've read, taken as a literal regexp) matches a subset of regexp B (a regexp describing VT220 escape sequences). I found some stuff about this problem on StackOverflow... something about converting the two regexps to minimal DFAs and something about empty languages. Stuff I don't really understand.
I'm probably totally overthinking this...
- 07-09-2011, 12:49 PM #4
Sun's RE implementation doesn't offer direct access to the DFA (Deterministic Finit Automaton) that represents the original RE. Your question would've been easy to solve: the input String doesn't bring the DFA to an error state and it doesn't bring it to its acccepting state. You could write your own RE framework that gives access to the NFA (the predecessor of the DFA) and check that. Creating an NFA (Non-deterministic Finit Automaton) is quite easy. For any single character RE, say x, create a small graph [S] -> x -> [E] with start state [S] and ending (accepting) state [E]. For a sequence of normal characters concatenate the individual NFAs. For the reflexive closure * create an additional edge from [E] to [S] and from [S] to [E]. For the transitive closure only create an additional edge from [E] to [S]. For the inclusive or operation create two edges starting from the same state [S] and ending in the same state [E]; the edges are labeled with the operands of the inclusive-or REs of the entire RE. The final state [E] is the accepting state and the 'leftmost' [S] state is the start state of the NFA. For any state not having a character y on its outgoing edges create a (single) failure state [F] from [S] -> y -> [F].
Your problem boils down to traversing the NFA given an input string and checking that the string doesn't lead to state [F].
Last edited by JosAH; 07-09-2011 at 12:51 PM.cenosillicaphobia: the fear for an empty beer glass
- 07-09-2011, 09:10 PM #5
Thanks, Jos! I can't quite wrap my head around a NFA this morning (I find a DFA much easier to comprehend) but I'm sure I'll figure it out. I found this article that I think will be helpful when I have time to sit down and study it: Regular Expression Matching Can Be Simple And Fast
- 07-09-2011, 09:23 PM #6
edit: sorry, forgot a link (btw, Sedgewick rewrote his book in Java too)
Last edited by JosAH; 07-09-2011 at 09:45 PM.cenosillicaphobia: the fear for an empty beer glass
- 07-10-2011, 12:21 AM #7
Oh yeah, I read Sedgewick's book on graph algorithms. Good stuff.
BTW... the next component of my project, after I get the escape sequences figured out, will be a text menu system that owes its general design pattern to your article on Bytes. Thanks for that.
- 07-10-2011, 10:42 AM #8
- 07-10-2011, 11:01 AM #9
Mine's going to be a bit more complicated, because I want to accept a few different types of input (single characters, numeric strings, or both) and the ability to print a message, then redisplay the prompt along with any incomplete input. But it'll follow your general pattern and advice on separating the "business logic" from the menu.
A nonsensical example of what I mean:
Enter a five-digit number: [COLOR="red"]12[/COLOR] MONSTERS ARE ATTACKING! Enter a five-digit number: 12[COLOR="red"]345[/COLOR] You entered 12345.
Last edited by kjkrum; 07-10-2011 at 11:03 AM.
- 07-10-2011, 11:30 AM #10
Last edited by JosAH; 07-10-2011 at 11:32 AM.cenosillicaphobia: the fear for an empty beer glass
- 07-12-2011, 10:19 AM #11
- 07-12-2011, 11:01 AM #12
Joscenosillicaphobia: the fear for an empty beer glass
- By johann_p in forum Advanced JavaReplies: 11Last Post: 06-12-2011, 09:38 AM
- By j_kathiresan in forum New To JavaReplies: 1Last Post: 04-04-2011, 01:51 PM
- By prof.deedee in forum JDBCReplies: 3Last Post: 02-19-2010, 12:15 PM
- By sidharth in forum Advanced JavaReplies: 2Last Post: 11-11-2009, 06:56 AM
- By johann_p in forum New To JavaReplies: 0Last Post: 05-16-2008, 08:50 PM