Results 1 to 4 of 4
Thread: Regex: find all possible matches
- 10-13-2010, 12:43 PM #1
Member
- Join Date
- Oct 2010
- Posts
- 14
- Rep Power
- 0
Regex: find all possible matches
Hi,
this is a problem that has been bothering me a while now:
imagine you have a string (e.g. "___a___b___c___") and you have a regex (e.g. "([abc]).*([abc])") and you want to find all possible combinations of matching groups (e.g. a-b, a-c, b-c).
You can play around with greedy and non greedy quantifiers (e.g. "([abc]).*([abc])" and "([abc]).*?([abc])") which will find some of this combinations (e.g. a-b and a-c) - but not all (in this case not b-c).
What I've been doing till now is basically this:
output:Java Code:String haystack = "___a___b___c___"; for (int j = 0; j < 100; j++) { for (int i = 0; i < 100; i++) { String pattern = "^.{" + j + "}([abc]).{" + i + "}([abc])"; Pattern p = Pattern.compile(pattern); Matcher m = p.matcher(haystack); while (m.find()) { System.out.printf("%2d,%2d : %s - %s\n", j, i, m.group(1), m.group(2)); } } }
this code runs through all possible string lengths and finds all combinations that match the given pattern.Java Code:3, 3 : a - b 3, 7 : a - c 7, 3 : b - c
However it does this by trying all possible expressions (in this case 100*100 = 10000 times - yes you could use less than 100, but you get the point).
Is there a more elegant (i.e. faster) solution to this?
Regards,
Alessio
- 10-18-2010, 01:19 PM #2
Member
- Join Date
- Oct 2010
- Posts
- 14
- Rep Power
- 0
*bump* (sorry, about that)
- 10-18-2010, 07:30 PM #3
This is probably how I'd do it... it's not much "faster" but it prevents hardcoding values (like 100).
This code loops through the string starting at some point then uses all the subsets of the string after that point. Using the substring method, it pulls a section out of the string. For example, you'd get something like this for the value of haystack.substring(i,j):Java Code:String haystack = "___a___b___c___"; for (int i = 0; i < haystack.length() - 1; i++) { for (int j = i + 1; j < haystack.length(); j++) { String pattern = "^([abc]).*([abc])$"; Pattern p = Pattern.compile(pattern); Matcher m = p.matcher(haystack.substring(i,j)); while (m.find()) { System.out.printf("%2d,%2d : %s - %s\n", i,j, m.group(1), m.group(2)); } } }
Then the regex compares that pattern (using string start/end ^$) to the substring taken out. If it matches, it records the start/end indices.Java Code:_ __ ___ ___a ___a_ ___a__ ___a___ ___a___b ___a___b_ ... _ _c _c_ _c__ c c_ c__ _ __ _
I'm kind of convinced there's a faster way to do this yet, so I'm still looking. This was worth posting I thought though.
- 10-19-2010, 01:10 PM #4
Member
- Join Date
- Oct 2010
- Posts
- 14
- Rep Power
- 0
Similar Threads
-
help: matches don`t work
By asabdo in forum New To JavaReplies: 6Last Post: 01-16-2010, 07:06 AM -
Lucene Not Throwing Matches When Search String Is Without Space
By nishu.soni in forum LuceneReplies: 0Last Post: 11-17-2009, 01:58 PM -
How "Pattern.matches(regex, input)" function works?
By kishan in forum Advanced JavaReplies: 2Last Post: 04-26-2009, 12:46 AM -
Regex that matches whole word
By moaxjlou in forum New To JavaReplies: 13Last Post: 11-04-2008, 05:48 PM -
Help with password matches
By Albert in forum AWT / SwingReplies: 1Last Post: 07-10-2007, 04:17 PM


LinkBack URL
About LinkBacks
Reply With Quote

Bookmarks