Results 1 to 12 of 12
- 07-09-2011, 12:05 AM #1
Test if string *could* match regular expression
Rather than test if a whole string matches a given pattern, I want to know if a string could begin a match for that pattern.
For example, if the pattern is "[A-Z]+[0-9]+"
"A" - not a match, but could begin of one: true
"ABC" - not a match, but could begin one: true
"ABC123" - match: true
"9" - could not begin a match: false
"ABC123X" - could not begin a match: false
I don't think is is possible with Patterns and Matchers. (Correct me if I'm wrong!) Is there a library out there that can do this?Get in the habit of using standard Java naming conventions!
- 07-09-2011, 01:04 AM #2
Moderator
- Join Date
- Jul 2010
- Location
- California
- Posts
- 1,605
- Rep Power
- 5
Your answer might be more complicated than this solution, but using a * quantifier on your last character class (as opposed to +) would correlate with what you want.
- 07-09-2011, 01:19 AM #3
As I read characters from an input stream, I want to know if what I've read so far could be the beginning of a VT220 escape sequence. For example, I want "\033\\[" to match because it is the beginning of many valid sequences. Whenever adding one more character would make it no longer a possible match -- like the 'X' in the final example of my original post -- I want to act on it.
I can't just wait for it to become a complete match, and then cease to be a match, because if the input is bad it might never become a valid match.
To state my problem more generally, I want to know if regexp A (the characters I've read, taken as a literal regexp) matches a subset of regexp B (a regexp describing VT220 escape sequences). I found some stuff about this problem on StackOverflow... something about converting the two regexps to minimal DFAs and something about empty languages. Stuff I don't really understand.
I'm probably totally overthinking this...Get in the habit of using standard Java naming conventions!
- 07-09-2011, 11:49 AM #4
- Join Date
- Sep 2008
- Location
- Voorschoten, the Netherlands
- Posts
- 11,399
- Blog Entries
- 7
- Rep Power
- 17
Sun's RE implementation doesn't offer direct access to the DFA (Deterministic Finit Automaton) that represents the original RE. Your question would've been easy to solve: the input String doesn't bring the DFA to an error state and it doesn't bring it to its acccepting state. You could write your own RE framework that gives access to the NFA (the predecessor of the DFA) and check that. Creating an NFA (Non-deterministic Finit Automaton) is quite easy. For any single character RE, say x, create a small graph [S] -> x -> [E] with start state [S] and ending (accepting) state [E]. For a sequence of normal characters concatenate the individual NFAs. For the reflexive closure * create an additional edge from [E] to [S] and from [S] to [E]. For the transitive closure only create an additional edge from [E] to [S]. For the inclusive or operation create two edges starting from the same state [S] and ending in the same state [E]; the edges are labeled with the operands of the inclusive-or REs of the entire RE. The final state [E] is the accepting state and the 'leftmost' [S] state is the start state of the NFA. For any state not having a character y on its outgoing edges create a (single) failure state [F] from [S] -> y -> [F].
Your problem boils down to traversing the NFA given an input string and checking that the string doesn't lead to state [F].
kind regards,
JosLast edited by JosAH; 07-09-2011 at 11:51 AM.
When people rob a bank they get a penalty; when banks rob people they get a bonus.
- 07-09-2011, 08:10 PM #5
Thanks, Jos! I can't quite wrap my head around a NFA this morning (I find a DFA much easier to comprehend) but I'm sure I'll figure it out. I found this article that I think will be helpful when I have time to sit down and study it: Regular Expression Matching Can Be Simple And Fast
Get in the habit of using standard Java naming conventions!
- 07-09-2011, 08:23 PM #6
- Join Date
- Sep 2008
- Location
- Voorschoten, the Netherlands
- Posts
- 11,399
- Blog Entries
- 7
- Rep Power
- 17
That's a nice article; it explains much better than I did how to build an NFA. The explanation how to turn an NFA into a DFA is a bit sloppy though; I refer to 'the Dragon Book' by Aho and Ullman (sp?) for a very thorough explanation of NFA --> DFA. Also, read Sedgewick's "Algoritms in C" for an efficient NFA simulator.
kind regards,
Jos
edit: sorry, forgot a link (btw, Sedgewick rewrote his book in Java too)Last edited by JosAH; 07-09-2011 at 08:45 PM.
When people rob a bank they get a penalty; when banks rob people they get a bonus.
- 07-09-2011, 11:21 PM #7
Oh yeah, I read Sedgewick's book on graph algorithms. Good stuff.
BTW... the next component of my project, after I get the escape sequences figured out, will be a text menu system that owes its general design pattern to your article on Bytes. Thanks for that.Get in the habit of using standard Java naming conventions!
- 07-10-2011, 09:42 AM #8
- Join Date
- Sep 2008
- Location
- Voorschoten, the Netherlands
- Posts
- 11,399
- Blog Entries
- 7
- Rep Power
- 17
- 07-10-2011, 10:01 AM #9
Mine's going to be a bit more complicated, because I want to accept a few different types of input (single characters, numeric strings, or both) and the ability to print a message, then redisplay the prompt along with any incomplete input. But it'll follow your general pattern and advice on separating the "business logic" from the menu.
A nonsensical example of what I mean:
Java Code:Enter a five-digit number: [COLOR="red"]12[/COLOR] MONSTERS ARE ATTACKING! Enter a five-digit number: 12[COLOR="red"]345[/COLOR] You entered 12345.
Last edited by kjkrum; 07-10-2011 at 10:03 AM.
Get in the habit of using standard Java naming conventions!
- 07-10-2011, 10:30 AM #10
- Join Date
- Sep 2008
- Location
- Voorschoten, the Netherlands
- Posts
- 11,399
- Blog Entries
- 7
- Rep Power
- 17
Last edited by JosAH; 07-10-2011 at 10:32 AM.
When people rob a bank they get a penalty; when banks rob people they get a bonus.
- 07-12-2011, 09:19 AM #11
I ended up using the BRICS automata library. It can compile a RegExp into an Automaton, and then step(char) through the Automaton's States.
Get in the habit of using standard Java naming conventions!
- 07-12-2011, 10:01 AM #12
- Join Date
- Sep 2008
- Location
- Voorschoten, the Netherlands
- Posts
- 11,399
- Blog Entries
- 7
- Rep Power
- 17
If you can step through the NFA (or DFA) your question is answered; if the iteration doesn't end up in an error state the input String could be the start of a matching String for your RE.If the iteration ends up in an accepting state your input String does match your RE.
kind regards,
JosWhen people rob a bank they get a penalty; when banks rob people they get a bonus.
Similar Threads
-
Regular Expresion: match longest leftmost alternative, not first?
By johann_p in forum Advanced JavaReplies: 11Last Post: 06-12-2011, 08:38 AM -
Searching the XML string content using Regular Expression !!!!
By j_kathiresan in forum New To JavaReplies: 1Last Post: 04-04-2011, 12:51 PM -
regular expression
By prof.deedee in forum JDBCReplies: 3Last Post: 02-19-2010, 11:15 AM -
java regular expression to search block/string/word exist in a paragraph?
By sidharth in forum Advanced JavaReplies: 2Last Post: 11-11-2009, 05:56 AM -
Get all groups from a regular expression match?
By johann_p in forum New To JavaReplies: 0Last Post: 05-16-2008, 07:50 PM


LinkBack URL
About LinkBacks
Reply With Quote

Bookmarks