Results 1 to 4 of 4
  1. #1
    Alessio is offline Member
    Join Date
    Oct 2010
    Posts
    14
    Rep Power
    0

    Question Regex: find all possible matches

    Hi,

    this is a problem that has been bothering me a while now:

    imagine you have a string (e.g. "___a___b___c___") and you have a regex (e.g. "([abc]).*([abc])") and you want to find all possible combinations of matching groups (e.g. a-b, a-c, b-c).

    You can play around with greedy and non greedy quantifiers (e.g. "([abc]).*([abc])" and "([abc]).*?([abc])") which will find some of this combinations (e.g. a-b and a-c) - but not all (in this case not b-c).

    What I've been doing till now is basically this:

    Java Code:
    String haystack = "___a___b___c___";
    for (int j = 0; j < 100; j++) {
    	for (int i = 0; i < 100; i++) {
    		String pattern = "^.{" + j + "}([abc]).{" + i + "}([abc])";
    		Pattern p = Pattern.compile(pattern);
    		Matcher m = p.matcher(haystack);
    		while (m.find()) {
    			System.out.printf("%2d,%2d : %s - %s\n",
    					j, i, 
    					m.group(1),
    					m.group(2));
    		}
    	}
    }
    output:
    Java Code:
     3, 3 : a - b
     3, 7 : a - c
     7, 3 : b - c
    this code runs through all possible string lengths and finds all combinations that match the given pattern.

    However it does this by trying all possible expressions (in this case 100*100 = 10000 times - yes you could use less than 100, but you get the point).

    Is there a more elegant (i.e. faster) solution to this?

    Regards,
    Alessio

  2. #2
    Alessio is offline Member
    Join Date
    Oct 2010
    Posts
    14
    Rep Power
    0

    Default

    *bump* (sorry, about that)

  3. #3
    Zack's Avatar
    Zack is offline Senior Member
    Join Date
    Jun 2010
    Location
    Destiny Islands
    Posts
    692
    Rep Power
    10

    Default

    This is probably how I'd do it... it's not much "faster" but it prevents hardcoding values (like 100).
    Java Code:
    String haystack = "___a___b___c___";
    for (int i = 0; i < haystack.length() - 1; i++) {
    	for (int j = i + 1; j < haystack.length(); j++) {
    		String pattern = "^([abc]).*([abc])$";
    		Pattern p = Pattern.compile(pattern);
    		Matcher m = p.matcher(haystack.substring(i,j));
    		while (m.find()) {
    			System.out.printf("%2d,%2d : %s - %s\n",
    					i,j,
    					m.group(1),
    					m.group(2));
    		}
    	}
    }
    This code loops through the string starting at some point then uses all the subsets of the string after that point. Using the substring method, it pulls a section out of the string. For example, you'd get something like this for the value of haystack.substring(i,j):
    Java Code:
    _
    __
    ___
    ___a
    ___a_
    ___a__
    ___a___
    ___a___b
    ___a___b_
    ...
    _
    _c
    _c_
    _c__
    c
    c_
    c__
    _
    __
    _
    Then the regex compares that pattern (using string start/end ^$) to the substring taken out. If it matches, it records the start/end indices.

    I'm kind of convinced there's a faster way to do this yet, so I'm still looking. This was worth posting I thought though.

  4. #4
    Alessio is offline Member
    Join Date
    Oct 2010
    Posts
    14
    Rep Power
    0

    Default

    Yes, thats definitely a possibility, but basically it does the same thing as my code: just try every possible sub string / string length.

    I'm beginning to think that thats just the only possible solution with the Java Regex Parser...

Similar Threads

  1. help: matches don`t work
    By asabdo in forum New To Java
    Replies: 6
    Last Post: 01-16-2010, 08:06 AM
  2. Replies: 0
    Last Post: 11-17-2009, 02:58 PM
  3. How "Pattern.matches(regex, input)" function works?
    By kishan in forum Advanced Java
    Replies: 2
    Last Post: 04-26-2009, 01:46 AM
  4. Regex that matches whole word
    By moaxjlou in forum New To Java
    Replies: 13
    Last Post: 11-04-2008, 06:48 PM
  5. Help with password matches
    By Albert in forum AWT / Swing
    Replies: 1
    Last Post: 07-10-2007, 05:17 PM

Tags for this Thread

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •