Results 1 to 12 of 12
  1. #1
    kjkrum's Avatar
    kjkrum is offline Senior Member
    Join Date
    Apr 2011
    Location
    Tucson, AZ
    Posts
    1,060
    Rep Power
    6

    Default Test if string *could* match regular expression

    Rather than test if a whole string matches a given pattern, I want to know if a string could begin a match for that pattern.

    For example, if the pattern is "[A-Z]+[0-9]+"
    "A" - not a match, but could begin of one: true
    "ABC" - not a match, but could begin one: true
    "ABC123" - match: true
    "9" - could not begin a match: false
    "ABC123X" - could not begin a match: false

    I don't think is is possible with Patterns and Matchers. (Correct me if I'm wrong!) Is there a library out there that can do this?
    Get in the habit of using standard Java naming conventions!

  2. #2
    doWhile is offline Moderator
    Join Date
    Jul 2010
    Location
    California
    Posts
    1,641
    Rep Power
    7

    Default

    Your answer might be more complicated than this solution, but using a * quantifier on your last character class (as opposed to +) would correlate with what you want.

  3. #3
    kjkrum's Avatar
    kjkrum is offline Senior Member
    Join Date
    Apr 2011
    Location
    Tucson, AZ
    Posts
    1,060
    Rep Power
    6

    Default

    As I read characters from an input stream, I want to know if what I've read so far could be the beginning of a VT220 escape sequence. For example, I want "\033\\[" to match because it is the beginning of many valid sequences. Whenever adding one more character would make it no longer a possible match -- like the 'X' in the final example of my original post -- I want to act on it.

    I can't just wait for it to become a complete match, and then cease to be a match, because if the input is bad it might never become a valid match.

    To state my problem more generally, I want to know if regexp A (the characters I've read, taken as a literal regexp) matches a subset of regexp B (a regexp describing VT220 escape sequences). I found some stuff about this problem on StackOverflow... something about converting the two regexps to minimal DFAs and something about empty languages. Stuff I don't really understand.

    I'm probably totally overthinking this...
    Get in the habit of using standard Java naming conventions!

  4. #4
    JosAH's Avatar
    JosAH is offline Moderator
    Join Date
    Sep 2008
    Location
    Voorschoten, the Netherlands
    Posts
    13,728
    Blog Entries
    7
    Rep Power
    21

    Default

    Sun's RE implementation doesn't offer direct access to the DFA (Deterministic Finit Automaton) that represents the original RE. Your question would've been easy to solve: the input String doesn't bring the DFA to an error state and it doesn't bring it to its acccepting state. You could write your own RE framework that gives access to the NFA (the predecessor of the DFA) and check that. Creating an NFA (Non-deterministic Finit Automaton) is quite easy. For any single character RE, say x, create a small graph [S] -> x -> [E] with start state [S] and ending (accepting) state [E]. For a sequence of normal characters concatenate the individual NFAs. For the reflexive closure * create an additional edge from [E] to [S] and from [S] to [E]. For the transitive closure only create an additional edge from [E] to [S]. For the inclusive or operation create two edges starting from the same state [S] and ending in the same state [E]; the edges are labeled with the operands of the inclusive-or REs of the entire RE. The final state [E] is the accepting state and the 'leftmost' [S] state is the start state of the NFA. For any state not having a character y on its outgoing edges create a (single) failure state [F] from [S] -> y -> [F].

    Your problem boils down to traversing the NFA given an input string and checking that the string doesn't lead to state [F].

    kind regards,

    Jos
    Last edited by JosAH; 07-09-2011 at 12:51 PM.
    cenosillicaphobia: the fear for an empty beer glass

  5. #5
    kjkrum's Avatar
    kjkrum is offline Senior Member
    Join Date
    Apr 2011
    Location
    Tucson, AZ
    Posts
    1,060
    Rep Power
    6

    Default

    Thanks, Jos! I can't quite wrap my head around a NFA this morning (I find a DFA much easier to comprehend) but I'm sure I'll figure it out. I found this article that I think will be helpful when I have time to sit down and study it: Regular Expression Matching Can Be Simple And Fast
    Get in the habit of using standard Java naming conventions!

  6. #6
    JosAH's Avatar
    JosAH is offline Moderator
    Join Date
    Sep 2008
    Location
    Voorschoten, the Netherlands
    Posts
    13,728
    Blog Entries
    7
    Rep Power
    21

    Default

    Quote Originally Posted by kjkrum View Post
    Thanks, Jos! I can't quite wrap my head around a NFA this morning (I find a DFA much easier to comprehend) but I'm sure I'll figure it out. I found this article that I think will be helpful when I have time to sit down and study it: Regular Expression Matching Can Be Simple And Fast
    That's a nice article; it explains much better than I did how to build an NFA. The explanation how to turn an NFA into a DFA is a bit sloppy though; I refer to 'the Dragon Book' by Aho and Ullman (sp?) for a very thorough explanation of NFA --> DFA. Also, read Sedgewick's "Algoritms in C" for an efficient NFA simulator.

    kind regards,

    Jos

    edit: sorry, forgot a link (btw, Sedgewick rewrote his book in Java too)
    Last edited by JosAH; 07-09-2011 at 09:45 PM.
    cenosillicaphobia: the fear for an empty beer glass

  7. #7
    kjkrum's Avatar
    kjkrum is offline Senior Member
    Join Date
    Apr 2011
    Location
    Tucson, AZ
    Posts
    1,060
    Rep Power
    6

    Default

    Oh yeah, I read Sedgewick's book on graph algorithms. Good stuff.

    BTW... the next component of my project, after I get the escape sequences figured out, will be a text menu system that owes its general design pattern to your article on Bytes. Thanks for that.
    Get in the habit of using standard Java naming conventions!

  8. #8
    JosAH's Avatar
    JosAH is offline Moderator
    Join Date
    Sep 2008
    Location
    Voorschoten, the Netherlands
    Posts
    13,728
    Blog Entries
    7
    Rep Power
    21

    Default

    Quote Originally Posted by kjkrum View Post
    BTW... the next component of my project, after I get the escape sequences figured out, will be a text menu system that owes its general design pattern to your article on Bytes. Thanks for that.
    Golly, I forgot all about that one; does it work for your purposes? When I reread my old code it looks suspiciously simple (but that should be a good sign ;-)

    kind regards,

    Jos
    cenosillicaphobia: the fear for an empty beer glass

  9. #9
    kjkrum's Avatar
    kjkrum is offline Senior Member
    Join Date
    Apr 2011
    Location
    Tucson, AZ
    Posts
    1,060
    Rep Power
    6

    Default

    Mine's going to be a bit more complicated, because I want to accept a few different types of input (single characters, numeric strings, or both) and the ability to print a message, then redisplay the prompt along with any incomplete input. But it'll follow your general pattern and advice on separating the "business logic" from the menu.

    A nonsensical example of what I mean:

    Java Code:
    Enter a five-digit number: [COLOR="red"]12[/COLOR]
    MONSTERS ARE ATTACKING!
    Enter a five-digit number: 12[COLOR="red"]345[/COLOR]
    You entered 12345.
    Last edited by kjkrum; 07-10-2011 at 11:03 AM.
    Get in the habit of using standard Java naming conventions!

  10. #10
    JosAH's Avatar
    JosAH is offline Moderator
    Join Date
    Sep 2008
    Location
    Voorschoten, the Netherlands
    Posts
    13,728
    Blog Entries
    7
    Rep Power
    21

    Default

    Quote Originally Posted by kjkrum View Post
    Mine's going to be a bit more complicated, because I want to accept a few different types of input (single characters, numeric strings, or both) and the ability to print a message, then redisplay the prompt along with any incomplete input. But it'll follow your general pattern and advice on separating the "business logic" from the menu.

    A nonsensical example of what I mean:

    Java Code:
    Enter a five-digit number: [COLOR="red"]12[/COLOR]
    MONSTERS ARE ATTACKING!
    Enter a five-digit number: 12[COLOR="red"]345[/COLOR]
    You entered 12345.
    Does that example belong to the 'menu logic' or is it part of your 'business logic'?

    edit: now I see: you want the prompting for a next menu to be more flexible, right?

    kind regards,

    Jos
    Last edited by JosAH; 07-10-2011 at 11:32 AM.
    cenosillicaphobia: the fear for an empty beer glass

  11. #11
    kjkrum's Avatar
    kjkrum is offline Senior Member
    Join Date
    Apr 2011
    Location
    Tucson, AZ
    Posts
    1,060
    Rep Power
    6

    Default

    I ended up using the BRICS automata library. It can compile a RegExp into an Automaton, and then step(char) through the Automaton's States.
    Get in the habit of using standard Java naming conventions!

  12. #12
    JosAH's Avatar
    JosAH is offline Moderator
    Join Date
    Sep 2008
    Location
    Voorschoten, the Netherlands
    Posts
    13,728
    Blog Entries
    7
    Rep Power
    21

    Default

    Quote Originally Posted by kjkrum View Post
    I ended up using the BRICS automata library. It can compile a RegExp into an Automaton, and then step(char) through the Automaton's States.
    If you can step through the NFA (or DFA) your question is answered; if the iteration doesn't end up in an error state the input String could be the start of a matching String for your RE.If the iteration ends up in an accepting state your input String does match your RE.

    kind regards,

    Jos
    cenosillicaphobia: the fear for an empty beer glass

Similar Threads

  1. Replies: 11
    Last Post: 06-12-2011, 09:38 AM
  2. Replies: 1
    Last Post: 04-04-2011, 01:51 PM
  3. regular expression
    By prof.deedee in forum JDBC
    Replies: 3
    Last Post: 02-19-2010, 12:15 PM
  4. Replies: 2
    Last Post: 11-11-2009, 06:56 AM
  5. Get all groups from a regular expression match?
    By johann_p in forum New To Java
    Replies: 0
    Last Post: 05-16-2008, 08:50 PM

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •