Java Forums

#1 (permalink)
05-23-2008, 01:25 AM
JT4NK3D
Member
Join Date: Nov 2007
Posts: 50
[SOLVED] More RegEx help
I'm trying to make an XHTML file parser with regular expressions. I want it to find tags and store them. For example:
<html>
<head>
<title>example</title>
</head>
<body>
<p>para</p>
<hr />
<p>para2</p>
</body>
</html>
The application should return:
<html>:1
<head>:1
<title>:1
</title>:1
</head>:1
<body>:1
<p>:2
</p>:2
<hr />:1
</body>:1
</html>:1
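A listing like the one above can be produced in a few lines. This is only a sketch, not the poster's approach: it assumes every tag fits the pattern `<[^>]+>` and uses a `LinkedHashMap` so the tags print in first-seen order. The names `TagCounter` and `countTags` are made up for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TagCounter {
    // "<", then one or more characters that are not ">", then ">"
    private static final Pattern TAG = Pattern.compile("<[^>]+>");

    // Hypothetical helper: maps each distinct tag text to how often it occurs.
    public static Map<String, Integer> countTags(String xhtml) {
        // LinkedHashMap remembers insertion order, so tags keep first-seen order
        Map<String, Integer> counts = new LinkedHashMap<>();
        Matcher m = TAG.matcher(xhtml);
        while (m.find()) {                            // advance to the next tag
            counts.merge(m.group(), 1, Integer::sum); // add 1 to that tag's count
        }
        return counts;
    }

    public static void main(String[] args) {
        String xhtml = "<html>\n<head>\n<title>example</title>\n</head>\n"
                + "<body>\n<p>para</p>\n<hr />\n<p>para2</p>\n</body>\n</html>";
        for (Map.Entry<String, Integer> e : countTags(xhtml).entrySet()) {
            System.out.println(e.getKey() + ":" + e.getValue());
        }
    }
}
```

A map also removes the fixed 500-entry limit that parallel arrays would impose.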
After that, I'll make it group starting and ending tags together (<head>, </head>),
make it recognize that something isn't a tag if a ! follows the <
(<!-- --> and <!DOCTYPE html...), and make it skip everything between <script> and </script>,
since there are no tags in there and the < and > signs in JavaScript/VBScript could be confused with tags. I'll also change it so that it shows only <tagname> and no attributes.
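The later steps described here (ignore anything starting with `<!`, skip script contents, show only the tag name) could be sketched as follows. This is one possible shape under stated assumptions, not a full XHTML parser: it assumes lowercase `</script` end tags, no `>` inside attribute values, and no DOCTYPE internal subset. The class `TagNames` and method `tagNames` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TagNames {
    // Alternatives, tried left to right at each position:
    //   1. a comment <!-- ... -->          (matched, then ignored)
    //   2. any other <! ... > declaration  (matched, then ignored)
    //   3. a tag: group 1 is "/" for an end tag, group 2 is the tag name
    private static final Pattern TOKEN = Pattern.compile(
            "<!--.*?-->|<![^>]*>|<(/?)([A-Za-z_][\\w:.-]*)[^>]*>",
            Pattern.DOTALL);

    // Hypothetical helper: returns "<name>" or "</name>" for each tag, attributes dropped.
    public static List<String> tagNames(String xhtml) {
        List<String> names = new ArrayList<>();
        Matcher m = TOKEN.matcher(xhtml);
        while (m.find()) {
            if (m.group(2) == null) {
                continue; // comment or <!DOCTYPE ...>: not a tag
            }
            String slash = m.group(1); // "" for a start tag, "/" for an end tag
            String name = m.group(2);
            names.add("<" + slash + name + ">");

            boolean selfClosing = m.group().endsWith("/>");
            if (slash.isEmpty() && !selfClosing && name.equals("script")) {
                // Jump over the script body so its < and > signs are never scanned.
                int close = xhtml.indexOf("</script", m.end());
                if (close >= 0) {
                    m.region(close, xhtml.length()); // next find() starts at </script
                }
            }
        }
        return names;
    }

    public static void main(String[] args) {
        String xhtml = "<!DOCTYPE html><html><!-- <b>not a tag</b> -->"
                + "<script type=\"text/javascript\">if (a < b && c > d) { go('<i>'); }</script>"
                + "<p class=\"x\">hi</p><hr /></html>";
        System.out.println(tagNames(xhtml));
        // prints [<html>, <script>, </script>, <p>, </p>, <hr>, </html>]
    }
}
```

Grouping start and end tags can then be done on the returned names, for example by stripping the leading `/` before counting.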
But for now, I just want to get it started. Here is my code so far. I need help where it says so in the comments:
Code:
/**
 * This class parses xhtml files - 05/22/08
 * and returns each unique tag
 * type and the quantity of it
 * using java.util.regex package.
 */
import java.util.regex.Pattern;
import java.util.regex.Matcher;

public class XCheck1 { // simple driver class to be improved later.
    public static void main(String[] args) {
        XC_Model model = new XC_Model();
        model.run();
    }
}

class XC_Model {
    private String[] tags;   // array to store the tags
    private int[] tagCounts; // tagCounts[i] stores how many of tags[i] there are
    private String cmatch;   // cmatch = current match
    private boolean found;   // if another of the same type of tag is found in the array
    private int top;         // index of the next free slot in tags

    public XC_Model() { // paramless constructor initializes fields
        tags = new String[500];   // I'll deal with more than 500 tags later
        tagCounts = new int[500]; // see above^
        cmatch = "";
        top = 0;
    }

    public String getData() { // I'll change this after, for now it can parse Strings of "XHTML"
        String data = "<p>This is a <em><strong>XHTML</strong></em> paragraph</p><hr />"; // tags and random text
        return data;
    }

    public void doScan() { // the method that actually does the parsing
        // "<", then one or more characters that are not ">", then ">".
        // The greedy "<.*>" would run from the first "<" to the last ">" on a line.
        Pattern pattern = Pattern.compile("<[^>]+>");
        Matcher matcher = pattern.matcher(getData()); // getData later will get the file
        // ************ this area is where I need help ********
        while (matcher.find()) {      // find() advances to the next match, if there is one
            cmatch = matcher.group(); // group() returns the text of the current match
            found = false;            // found starts off as false
            for (int i = 0; i < top; i++) {
                if (cmatch.equals(tags[i])) { // if it finds another tag that is the same
                    tagCounts[i]++; // add 1 to the quantity of that tag
                    found = true;   // found a match is true
                    break;          // break off any more looping
                }
            }
            if (!found) { // if this is the first encounter with this tag
                tags[top] = cmatch; // add it to the array
                tagCounts[top] = 1; // 1 of this unique tag so far
                top++;              // shift up to the next item
            }
        }
    }

    public void run() { // later this will have a param for xhtml file
        doScan();
        for (int j = 0; j < top; j++) { // only the slots that were filled
            System.out.println(tags[j] + ": " + tagCounts[j]); // print out the results
        }
    }
}
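A note on the pattern itself: in java.util.regex, `.` also matches `>` and `*` is greedy, so `<.*>` applied to a line with several tags gives a single match running from the first `<` to the last `>`. The sketch below compares it with a negated character class and with the reluctant form `<.*?>`. `GreedyDemo` and `allMatches` are illustrative names, not part of any library.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GreedyDemo {
    // Hypothetical helper: collects every match of regex in input, in order.
    public static List<String> allMatches(String regex, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {      // true while another match exists
            out.add(m.group()); // text of the match just found
        }
        return out;
    }

    public static void main(String[] args) {
        String data = "<p>This is a <em>XHTML</em> paragraph</p><hr />";
        // Greedy: one match covering the whole string
        System.out.println(allMatches("<.*>", data));
        // Negated class: <p>, <em>, </em>, </p>, <hr />
        System.out.println(allMatches("<[^>]+>", data));
        // Reluctant quantifier: the same five tags
        System.out.println(allMatches("<.*?>", data));
    }
}
```

The negated class is usually the safer of the two working forms, since it cannot run past a `>` even when the input spans several lines.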
Please reply, I need help with this.

Last edited by JT4NK3D : 05-24-2008 at 12:10 AM. Reason: typo
#2 (permalink)
05-23-2008, 03:01 AM
JT4NK3D
Member
Join Date: Nov 2007
Posts: 50
Please answer, I need help.
#3 (permalink)
05-23-2008, 06:07 AM
Eranga
Moderator
Join Date: Jul 2007
Location: Colombo, Sri Lanka
Posts: 4,592
Is XCheck1 your starting point?
Copyright ©2006 - 2007, www.java-forums.org