  1. #1
    bobocheez

    Getting unknown substring from indexes?

    Hi,
    I'm trying to go through some long strings stored in an array and extract substrings whose content I don't know in advance, based on a beginning and an ending index. The substring is primarily a URL.

    For example:
    Java Code:
    line[0] = "someRandom283794237982570388 <a href="http://someplace.com">SP</a>jibberish aliaj)&^#@ofnmao of TEXT";
    
    // we want to print http://someplace.com,
    // assuming that we don't know what the URL is
    My approach to this is as follows (it's unpolished & not finished, it prints out a lot of the same thing over and over):
    Java Code:
    // line is an array of strings
    for (int i = 0; i <= 12; i++) {
        int lengthOfString = line[i].length();
        for (int i2 = 0; i2 <= lengthOfString; i2++) {
            // search for the next 'g' at or after position i2
            int indexOfChar = line[i].indexOf('g', i2);
            if (indexOfChar != -1) {
                System.out.println("L" + i + " " + indexOfChar);
            }
        }
    }
    // it prints out the locations of the letter g, but prints the same location multiple times
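The repeated output in the loop above happens because the search start only moves forward by one character per pass, so indexOf keeps finding the same 'g' until the start moves past it. A minimal sketch of one way around that, assuming the goal is to list each position once: restart the search just after the previous hit. The class name, method name, and sample strings here are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class CharPositions {
    // Returns every index at which target occurs in text, each exactly once.
    static List<Integer> positionsOf(String text, char target) {
        List<Integer> positions = new ArrayList<>();
        int from = 0;
        while (true) {
            int found = text.indexOf(target, from);
            if (found == -1) {
                break; // no further occurrences
            }
            positions.add(found);
            from = found + 1; // resume just past the hit instead of stepping by one
        }
        return positions;
    }

    public static void main(String[] args) {
        // Hypothetical sample data standing in for the real array of lines.
        String[] line = { "gibberish going", "no match here" };
        for (int i = 0; i < line.length; i++) {
            for (int index : positionsOf(line[i], 'g')) {
                System.out.println("L" + i + " " + index);
            }
        }
    }
}
```

Looping to line.length rather than a fixed 12 also avoids running off the end of a shorter array.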
    I was thinking of getting the index of the 9 sequential characters <a href=" (including the space), then doing the same for ">
    Note: disregard the unescaped quotes in the HTML above. In the actual script, the line would come from a text file.
    If I then took everything between the first " and the second one, the following would print out:
    Java Code:
    http://someplace.com
    However, this looks very inefficient to me (noob), and I'm not sure how to go about it. Would regex (which I'm not familiar with) work better, or am I just going in the wrong direction altogether?
    Thanks
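The index-based idea described in this post can be sketched roughly as follows, assuming the link always has the exact form <a href="..."> with nothing else inside the tag. The class and method names are hypothetical, and only the first link in the string is returned.

```java
public class HrefByIndex {
    static final String START = "<a href=\""; // the 9 characters before the URL
    static final String END = "\">";          // the 2 characters after the URL

    // Returns the text between the first START and the next END,
    // or null if either marker is missing.
    static String firstHref(String text) {
        int start = text.indexOf(START);
        if (start == -1) {
            return null;
        }
        int urlStart = start + START.length();
        int urlEnd = text.indexOf(END, urlStart);
        if (urlEnd == -1) {
            return null;
        }
        return text.substring(urlStart, urlEnd);
    }

    public static void main(String[] args) {
        String line = "someRandom283794237982570388 <a href=\"http://someplace.com\">SP</a>jibberish aliaj)&^#@ofnmao of TEXT";
        System.out.println(firstHref(line)); // prints http://someplace.com
    }
}
```

Each indexOf call scans the string once, so this is not as inefficient as it may look; the main drawback is that it is rigid about the exact tag layout.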

  2. #2
    xerberuz

    Regex would work better.

    Java Code:
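    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    // Group 1 captures the URL: [^\"]* matches any run of characters that are not double quotes.
    final Pattern pattern = Pattern.compile("<a href=\"([^\"]*)\">");
    final String urls = "<a href=\"www.test.com\"><a href=\"www.test2.com\"><a href=\"www.test3.com\">";

    final Matcher matcher = pattern.matcher(urls);
    // find() moves on to the next match each time, so every URL is printed once.
    while (matcher.find())
    {
        System.out.println(matcher.group(1));
    }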

  3. #3
    bobocheez

    That works great, thanks.
    What if you don't want to exclude the double quotes, though, and want to match every character?
    All I can think of is to exclude a character like "`" instead, because it's not used that often, but it could still occur.
    Java Code:
    ([^`]*)
    Also, what does the double quote escape do? I can't find anything about it online. I can see that it's the literal double quote outside of the square brackets, but what about inside? If the pattern looks for everything except the double quote, zero or more times, then what would you do if you wanted every character in between 2 specific characters?
    Last edited by bobocheez; 12-19-2010 at 05:29 AM.
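On the follow-up questions: the \" is an escape for the Java string literal, not for the regex. The regex engine only ever sees a plain ", both outside and inside the square brackets, and [^"] is a negated character class meaning "any one character except a double quote". To capture every character between two delimiters without excluding anything, one common option is a reluctant quantifier, .*?, which expands only as far as needed for the rest of the pattern to match. A minimal sketch, with a hypothetical class name and made-up sample input:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BetweenDelimiters {
    // In Java source, \" is a string-literal escape; the regex engine receives
    // <a href="(.*?)"> where .*? is a reluctant "match any characters".
    static final Pattern HREF = Pattern.compile("<a href=\"(.*?)\">");

    // Collects the captured text of every match, in order.
    static List<String> capture(String text) {
        List<String> found = new ArrayList<>();
        Matcher matcher = HREF.matcher(text);
        while (matcher.find()) {
            found.add(matcher.group(1));
        }
        return found;
    }

    public static void main(String[] args) {
        // Made-up input containing a backtick and embedded double quotes.
        String text = "<a href=\"a`b.com\">x</a> <a href=\"say \"hi\" now\">y</a>";
        for (String value : capture(text)) {
            System.out.println(value);
        }
    }
}
```

A greedy .* in the same position would run on to the last "> in the string, which is why the reluctant form is used here. By default . does not match line terminators; Pattern.DOTALL changes that if the text between the delimiters can span lines.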
