Thread: [SOLVED] More RegEx help
- 05-22-2008, 11:25 PM #1
I'm trying to make an XHTML file parser with regular expressions. I want it to find tags and store them. For example:
<html>
<head>
<title>example</title>
</head>
<body>
<p>para</p>
<hr />
<p>para2</p>
</body>
</html>
The application should return:
<html>:1
<head>:1
<title>:1
</title>:1
</head>:1
<body>:1
<p>:2
</p>:2
<hr />:1
</body>:1
</html>:1
Afterwards I'll make it group starting and ending tags together (<head>, </head>),
make it recognize that something isn't a tag if a ! follows the <
(<!-- --> and <!DOCTYPE html...), and make it skip everything between <script> and </script>,
since there are no tags in there and the < and > signs in JavaScript/VBScript could be mistaken for tags. I'll also change it so that it shows only <tagname> and no attributes.
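The later steps listed above (ignoring anything where ! follows <, skipping what sits between <script> and </script>, and showing only the tag name) could be sketched along these lines. This is only a rough illustration under some assumptions: the class name `TagNames` and the method `tagNames` are made up for the example, comments and script bodies are stripped with `String.replaceAll` before matching, and attribute values are assumed not to contain a `>` character.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TagNames { // hypothetical name, for illustration only
    // group 1: optional "/" of a closing tag; group 2: tag name;
    // group 3: optional "/" of a self-closing tag such as <hr />.
    // A "<" followed by "!" never matches, because a letter (or "/") must come next.
    private static final Pattern TAG =
            Pattern.compile("<(/?)([A-Za-z][A-Za-z0-9]*)[^>]*?(/?)>");

    public static List<String> tagNames(String xhtml) {
        // Drop comments first, since they may contain text that looks like tags.
        String cleaned = xhtml.replaceAll("(?s)<!--.*?-->", "");
        // Keep the <script> and </script> tags but discard everything between them.
        cleaned = cleaned.replaceAll("(?s)(<script[^>]*>).*?(</script\\s*>)", "$1$2");

        List<String> result = new ArrayList<>();
        Matcher m = TAG.matcher(cleaned);
        while (m.find()) {
            String selfClose = m.group(3).isEmpty() ? "" : " /";
            // Rebuild the tag from its name only, leaving attributes out.
            result.add("<" + m.group(1) + m.group(2) + selfClose + ">");
        }
        return result;
    }

    public static void main(String[] args) {
        String sample = "<!-- <b>old</b> --><p class=\"a\">x</p>"
                + "<script type=\"text/javascript\">if (a < b) {}</script><hr />";
        System.out.println(tagNames(sample));
    }
}
```

Grouping each start tag with its end tag is left out here; the list this returns keeps the tags in document order, so a pairing pass could be added on top of it.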
But for now, I just want to get it started. Here is my code so far. I need help where it says so in the comments:
Please reply, I need help with this.
Java Code:
/**
 * This class parses xhtml files - 05/22/08
 * and returns each unique tag
 * type and the quantity of it
 * using java.util.regex package.
 */
import java.util.regex.Pattern;
import java.util.regex.Matcher;

public class XCheck1 { // simple driver class to be improved later.
    public static void main(String[] args) {
        XC_Model model = new XC_Model();
        model.run();
    }
}

class XC_Model {
    private String[] tags;    // array to store the tags
    private int[] tagCounts;  // each item in tags has the same index item in tagCounts,
                              // which stores how many of that tag there are.
    private String cmatch;    // cmatch = current match
    private boolean found;    // if another of the same type of tag is found in the array
    private int top;          // index of the next free slot in tags

    public XC_Model() { // paramless constructor initializes fields
        tags = new String[500];   // I'll deal with more than 500 tags later
        tagCounts = new int[500]; // see above^
        cmatch = "";
        top = 0;
    }

    public String getData() { // I'll change this after, for now it can parse Strings of "XHTML"
        String data = "<p>This is a <em><strong>XHTML</strong></em> paragraph</p><hr />"; // tags and random text
        return data;
    }

    public void doScan() { // the method that actually does the parsing
        Pattern pattern = Pattern.compile("<.*>");    // This should mean "<".....">"
        Matcher matcher = pattern.matcher(getData()); // getData later will get the file
        while( /* theres more matches */ ) { // ************ this area is where i need help ********
            cmatch = /* The current match */ ; // i'm not sure how to use regex to match 1 by 1 like this
            found = false; // found starts off as false
            for( int i = 0; i < top; i++ ) {
                if( cmatch.equals(tags[i]) ) { // if it finds another tag that is the same
                    tagCounts[i]++; // add 1 to the quantity of that tag
                    found = true;   // found a match is true
                    break;          // break off any more looping
                }
            }
            if( found == false ) { // if this is the first encounter with this tag
                tags[top] = cmatch; // add it to the array
                tagCounts[top] = 1; // 1 of this unique tag so far
                top++;              // shift up to the next item
            }
        }
    }

    public void run() { // later this will have a param for xhtml file
        doScan();
        for( int j = 0; j < top; j++ ) { // only the slots that were filled
            System.out.println(tags[j] + ": " + tagCounts[j]); // print out the results
        }
    }
}
Last edited by JT4NK3D; 05-23-2008 at 10:10 PM. Reason: typo
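For the loop marked in the comments, one possible approach is `Matcher.find()`, which moves to the next match and returns false once none remain, together with `Matcher.group()`, which returns the text of the current match. Note also that `<.*>` is greedy: on the test string it matches everything from the first `<` to the last `>` in a single hit. A negated character class such as `<[^>]*>` stops at the first `>` instead. The sketch below is a minimal illustration rather than a finished version of the posted class: the name `TagCounter`, the method `countTags`, and the `LinkedHashMap` standing in for the two parallel arrays are all assumptions made to keep it short, and `Map.merge` needs Java 8 or later.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TagCounter { // hypothetical name, not from the original post
    // "<", then any run of characters that are not ">", then ">"
    private static final Pattern TAG = Pattern.compile("<[^>]*>");

    // Returns each distinct tag with its count, in order of first appearance.
    public static Map<String, Integer> countTags(String data) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        Matcher matcher = TAG.matcher(data);
        while (matcher.find()) {              // true while there are more matches
            String cmatch = matcher.group();  // the current match
            counts.merge(cmatch, 1, Integer::sum); // insert with 1 or add 1
        }
        return counts;
    }

    public static void main(String[] args) {
        String data = "<p>This is a <em><strong>XHTML</strong></em> paragraph</p><hr />";
        for (Map.Entry<String, Integer> e : countTags(data).entrySet()) {
            System.out.println(e.getKey() + ":" + e.getValue());
        }
    }
}
```

The same two calls drop straight into the array-based `doScan()`: `matcher.find()` as the `while` condition and `matcher.group()` as the value assigned to `cmatch`.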
- 05-23-2008, 01:01 AM #2
Please answer, I need help.
- 05-23-2008, 04:07 AM #3
Is XCheck1 your starting point?