  1. #21
    Nicholas Jordan is offline Senior Member
    Join Date
    Jun 2008
    Location
    Southwest
    Posts
    1,018
    Rep Power
    8

    Mastering Regular Expressions

    Quote: Originally Posted by Darryl.Burke
    ....And when I'm really really stuck I wait for some guru to post a solution and then try to understand it, with help from the author.

    Ahhh..., another Master. (We seem to be accumulating them; maybe it's our slogan poll or something.) I have a copy of Mastering Regular Expressions, which appears to be profoundly useful in this setting. I would have worked on this using that book but am swamped right now.

    It is hard to envision someone of this caliber being "really stuck" .....
    Introduction to Programming Using Java.
    Cybercartography: A new theoretical construct proposed by D.R. Fraser Taylor

  2. #22
    fishtoprecords is offline Senior Member
    Join Date
    Jun 2008
    Posts
    571
    Rep Power
    7


    Actually, regexes are not always a good solution to problems like this.

    It looks like you are trying to pull apart real HTML, which is rarely compliant with even the weak HTML specs.

    There are times when you really want a proper parser. In the olden days, you'd use lex and yacc, but there are now Java equivalents and even compiler development books using them.

    With a context grammar, you can easily decide which whitespace is meaningful and which is just optional filler. Doing it in a context-free grammar may be possible, but I have not beaten my brain into those depths in a decade.
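As a rough illustration of the non-regex route, here is a minimal hand-written scanner, not a full parser generated from a grammar. The class name `TagScanner`, the method `removeTagsExcept` and the `keep` set are all hypothetical names chosen for this sketch. It assumes a tag is simply everything between `<` and the next `>`, so comments, scripts and attribute values containing `>` are not handled.

```java
import java.util.Locale;
import java.util.Set;

final class TagScanner {

    /** Removes every tag whose name is not in keep; text outside tags is copied unchanged. */
    static String removeTagsExcept(String html, Set<String> keep) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < html.length()) {
            int open = html.indexOf('<', i);
            int close = (open < 0) ? -1 : html.indexOf('>', open);
            if (close < 0) {
                // No complete tag left: copy the remainder as plain text.
                out.append(html, i, html.length());
                break;
            }
            out.append(html, i, open); // text before the tag
            if (keep.contains(tagName(html.substring(open + 1, close)))) {
                out.append(html, open, close + 1); // keep the tag exactly as written
            }
            i = close + 1;
        }
        return out.toString();
    }

    /** Skips leading whitespace and '/', then reads letters and digits as the upper-cased tag name. */
    static String tagName(String inside) {
        int start = 0;
        while (start < inside.length()
                && (Character.isWhitespace(inside.charAt(start)) || inside.charAt(start) == '/')) {
            start++;
        }
        int end = start;
        while (end < inside.length() && Character.isLetterOrDigit(inside.charAt(end))) {
            end++;
        }
        return inside.substring(start, end).toUpperCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        String input = "<div>< p>Hi <I>there</I></p><LINK><li>x</li></div>";
        // prints: < p>Hi there</p><li>x</li>
        System.out.println(removeTagsExcept(input, Set.of("P", "H1", "LI")));
    }
}
```

Because the tag name is read explicitly, the decision about which whitespace is filler lives in one small method instead of being encoded in the pattern.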

  3. #23
    Norm is online now Moderator
    Join Date
    Jun 2008
    Location
    Eastern Florida
    Posts
    17,791
    Rep Power
    25


    This was an exercise to learn regex.
    I'll restate the problem.
    Looking for a regex that will match a substring:
    starting with a <
    followed by optional spaces
    followed by any char that is NOT a P
    followed by any characters
    followed by a >

    I can't get the optional spaces bit to work.
    I thought \s*? would do it: whitespace, 0 or more, non-greedy.
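A likely reason the optional-spaces part misbehaves: a space is itself "a char that is NOT a P", so the engine can let `\s*?` match nothing and let `[^P]` consume the space. A negative lookahead avoids that. The sketch below compares the two; the pattern named `NAIVE` is only one plausible spelling of the attempt described above, and the class and method names are made up for the example.

```java
import java.util.regex.Pattern;

final class OptionalSpacesDemo {

    // Reluctant \s*? followed by "not P": [^P] is free to match a space.
    static final Pattern NAIVE = Pattern.compile("<\\s*?[^P].*?>");

    // Lookahead: refuse to match when optional whitespace and then P follow the '<'.
    static final Pattern LOOKAHEAD = Pattern.compile("<(?!\\s*P)[^>]*>");

    static String strip(Pattern pattern, String input) {
        return pattern.matcher(input).replaceAll("");
    }

    public static void main(String[] args) {
        String input = "<B>x</B>< P>y";
        System.out.println(strip(NAIVE, input));     // prints: xy       (the "< P>" tag was removed too)
        System.out.println(strip(LOOKAHEAD, input)); // prints: x< P>y   (the "< P>" tag survives)
    }
}
```

Note that the lookahead version still removes `</P>`, because the character after `<` there is `/`, not `P`; the restated problem does not say what should happen to closing tags.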

  4. #24
    Norm is online now Moderator
    Join Date
    Jun 2008
    Location
    Eastern Florida
    Posts
    17,791
    Rep Power
    25

    Solved(???)

    Here's as far as I'm going with this problem. The given regex removes all tags except those in the list. It allows for leading blanks and lower case.
    Java Code:
          String regex = "<[^((\\s*P)|(\\s*H1)|(\\s*LI))].*?>";     // Remove all tags except these
    
          System.out.println(regex + " on '" + input + "' = '" 
            + Pattern.compile(regex, Pattern.CASE_INSENSITIVE).matcher(input).replaceAll("") +"'");
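For comparison, a hedged alternative that states "any tag except P, H1 or LI" with a negative lookahead and a real alternation instead of a character class. This is a sketch, not the poster's solution. It assumes that closing tags of the listed elements should be kept as well, and it uses `\b` so that a name such as LINK is not mistaken for LI. The names `KeepListedTags` and `removeUnlistedTags` are invented for the example.

```java
import java.util.regex.Pattern;

final class KeepListedTags {

    // Assumption: </P>, </H1> and </LI> are kept too, hence the optional '/'.
    static final Pattern UNLISTED_TAG = Pattern.compile(
            "<(?!\\s*/?\\s*(?:P|H1|LI)\\b)[^>]*>", Pattern.CASE_INSENSITIVE);

    static String removeUnlistedTags(String input) {
        return UNLISTED_TAG.matcher(input).replaceAll("");
    }

    public static void main(String[] args) {
        String input = "<div>< p>Hi <I>there</I></p><LINK><li>x</li></div>";
        // prints: < p>Hi there</p><li>x</li>
        System.out.println(removeUnlistedTags(input));
    }
}
```

The text between tags is untouched because only the matched tags are replaced with the empty string.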

  5. #25
    Nicholas Jordan is offline Senior Member
    Join Date
    Jun 2008
    Location
    Southwest
    Posts
    1,018
    Rep Power
    8


    Java Code:
    String regex = "<[^((\\s*P)|(\\s*H1)|(\\s*LI))].*?>";
    Opening angle bracket, followed by not zero or more whitespace ... alternation inside a character class? ... letter 'P' OR zero or more whitespace followed by letter 'H' or character '1' OR zero or more whitespace followed by character 'L' or character 'I', followed by anything zero or more times (reluctant), followed by a closing angle bracket.

    Probably an interesting study, but where is the forward slash in the closing tag? And this will leave the text between opening and closing tags, will it not? That appears to me to be the intent of the work. To save the displayed text, perhaps exposing it to further work later in the code such as wrapping it in a new tag, I would think we could use round braces in a tree-like manner, with code to find the closing tag, given the limited logic of Pattern and Matcher combinations.

    Mostly, not to be picky, what I see here is alternation inside [], which appears to me to need further review. Then later we have .*, which I would think would slurp to the EOL. If I used that, I would place it at the back of a tree or something so as not to pick it up, or use syntax which states "this is here but do not use it", for efficiency reasons. The above code reads to me as skipping any tag (the contents of any tag) which holds P or H or "1" or L or I, but it finds the opening braces either way and does not account for the forward slash in the closing tag. Text between tags would be skipped, and thus the design says to me: student's work that finds all opening tags, except for not doing it correctly.
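A quick way to check the reading above is to run the posted regex against a few single tags. Inside `[...]` the characters `(`, `)`, `*` and `|` are literals, so the negated class simply excludes one character from the set `( ) * |`, whitespace, `P`, `H`, `1`, `L`, `I`. The class name `CharClassReading` and the sample tags are made up for this sketch; only upper-case samples are used.

```java
import java.util.regex.Pattern;

final class CharClassReading {

    // The regex exactly as posted in the thread.
    static final Pattern ORIGINAL = Pattern.compile(
            "<[^((\\s*P)|(\\s*H1)|(\\s*LI))].*?>", Pattern.CASE_INSENSITIVE);

    static String strip(String input) {
        return ORIGINAL.matcher(input).replaceAll("");
    }

    public static void main(String[] args) {
        // Tags whose first character is in the excluded set are kept, all others are removed.
        for (String tag : new String[] {"<B>", "</P>", "<P>", "<I>", "<HR>", "< B>"}) {
            System.out.println(tag + " -> '" + strip(tag) + "'");
        }
        // The reluctant .*? stops at the first '>', so text between tags is left in place.
        System.out.println(strip("<B>bold</B>")); // prints: bold
    }
}
```

On these samples `<B>` and `</P>` are removed, while `<P>`, `<I>`, `<HR>` and `< B>` are kept: the leading blank makes the class fail to match rather than being skipped.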

    Not harping, learning on your dime.

  6. #26
    Norm is online now Moderator
    Join Date
    Jun 2008
    Location
    Eastern Florida
    Posts
    17,791
    Rep Power
    25


    Good points. I'll leave them for those interested to pursue.

  7. #27
    Nicholas Jordan is offline Senior Member
    Join Date
    Jun 2008
    Location
    Southwest
    Posts
    1,018
    Rep Power
    8


    I posted about the regex after one read, as my take on the matter, and decided to just try to read the regex and convert it to plain English. I have a rather good book on regexes; the author, who has mastery of the subject, takes a decent part of a major chapter to show [] versus () as syntax. The short of it: to keep it simple on smaller jobs, use ( | | | ), but for finds that can grow large on large strings, something along the lines of "^\\w+" or "^[abcdef]+" or "< ?[lsmft]>" is of greater efficiency. Alternation (|) can and does drive NFA regex engines nuts under some conditions unless there are good tweaks and optimizations built in.
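To make the [] versus () distinction concrete, the sketch below writes the last example pattern both ways and checks that they accept the same strings. It does not measure speed; the efficiency claim is the poster's, and whether it holds depends on the engine. The class name `ClassVersusAlternation` and the sample strings are invented for the example.

```java
import java.util.regex.Pattern;

final class ClassVersusAlternation {

    // One character out of a set, written as a character class.
    static final Pattern WITH_CLASS = Pattern.compile("< ?[lsmft]>");

    // The same set written as an alternation of single characters.
    static final Pattern WITH_ALTERNATION = Pattern.compile("< ?(?:l|s|m|f|t)>");

    /** True when both patterns give the same full-match answer for the input. */
    static boolean agree(String input) {
        return WITH_CLASS.matcher(input).matches()
                == WITH_ALTERNATION.matcher(input).matches();
    }

    public static void main(String[] args) {
        for (String sample : new String[] {"<l>", "< s>", "<st>", "<x>", "<  m>"}) {
            System.out.println(sample
                    + " class=" + WITH_CLASS.matcher(sample).matches()
                    + " alternation=" + WITH_ALTERNATION.matcher(sample).matches());
        }
    }
}
```

A character class can only ever stand for a single character, so multi-character names such as H1 or LI need the alternation form (or a lookahead built from it).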
