using Delimiter with metacharacters

#1 wntdaliv (Member), 12-02-2008, 03:56 AM
Hi, I'm trying to parse a file in which the information is split up by surrounding brackets, i.e. [] and {}.

example segment of file:

{
[program]
[statement] NL . NL ;
[statement] NL [program] ;
[statement] NL [program] ;
[statement] NL [program] ;
[statement] NL [program] ;
[statement] NL [program] ;
[statement] NL [program] ;
[statement] NL [program] ;
[statement] NL [program] ;
[statement] NL [program] ;
[statement] NL [program] ;
}

I'm trying to parse each token within [] (e.g. [statement]) into an object. However, I can't use useDelimiter("[") to split up the tokens and then call next(), because [ is a regex metacharacter.

How can I go about parsing my tokens? Is there a workaround to force "[" to be treated as a pattern in itself, like any other pattern you can use as a delimiter, or is there something completely different that I can do?

Thanks in advance,
wntdaliv

#2 Fubarable (Moderator), 12-02-2008, 04:00 AM
Have you tried back-slashing it? i.e., instead of "[" use "\["

caveat: I'm no expert in the field of regex.

#3 wntdaliv (Member), 12-02-2008, 04:02 AM
Yeah, I did. Unfortunately "\[" is an invalid escape sequence.

I think it's because "\" is itself a metacharacter, which has to be followed by something specific, like "\n" for a new line, etc.

#4 wntdaliv (Member), 12-02-2008, 04:15 AM
Update
OK, so I've found on the Java pages that you can supposedly force a metacharacter to be treated like a regular character:

Quote:

There are two ways to force a metacharacter to be treated as an ordinary character:

-- precede the metacharacter with a backslash, or
-- enclose it within \Q (which starts the quote) and \E (which ends it).

When using this technique, the \Q and \E can be placed at any location within the expression, provided that the \Q comes first.

However, when I try to do this, my compiler (Eclipse) gives me the error:

Invalid escape sequence (valid ones are \b \t \n \f \r \" \' \\ )

Any ideas about getting around this?

#5 Fubarable (Moderator), 12-02-2008, 04:27 AM
Let's see your code.

Also, are you only interested in the text between the brackets and not interested in the other text? Exactly what is your goal here?

#6 wntdaliv (Member), 12-02-2008, 04:37 AM
OK, so my goal is to get the text between the brackets and turn it into one type of object, and to take the text that isn't between brackets and turn it into a different type of object.

Here's some code:

Code:
	public Grammar parseFile(File file)
	throws IOException
	{
		Grammar g = new Grammar();
		Scanner scanner = new Scanner(file);
		Pattern pattern = Pattern.compile("["); // This is the trouble
		scanner.useDelimiter(pattern);
		startVariable = scanner.next();
		startVariable = startVariable.substring(1, startVariable.length() - 2);
		
		scanner.useDelimiter("{");
		while(scanner.hasNext())
		{
			g.addRule(parseRule(scanner.next()));
		}
		return g;
	}
I'm trying to take all of the information in the file and turn it into objects (using other classes and such).

#7 Fubarable (Moderator), 12-02-2008, 05:10 AM
You do know of course that to use back slashes here, you have to double them, right?

For instance, I think that a regex String like this will match anything inside square brackets:
Code:
String regex = "(?<=\\[)([^\\]]*)(?=\\])";
but again, I'm still very new to regexes so I can't guarantee how well this would work.
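
Applied to the Scanner delimiter from the question, the doubling might look roughly like the sketch below. It assumes the input is already a String rather than a File, and the class and method names (DelimiterDemo, chunks) are made up for illustration. Pattern.quote is shown as an alternative that avoids writing the backslashes by hand.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;
import java.util.regex.Pattern;

class DelimiterDemo {
    // Hypothetical helper: splits the input on the given delimiter regex
    // and returns the trimmed, non-empty chunks.
    static List<String> chunks(String input, String delimiterRegex) {
        List<String> result = new ArrayList<>();
        Scanner scanner = new Scanner(input);
        scanner.useDelimiter(delimiterRegex);
        while (scanner.hasNext()) {
            String chunk = scanner.next().trim();
            if (!chunk.isEmpty()) {
                result.add(chunk);
            }
        }
        scanner.close();
        return result;
    }

    public static void main(String[] args) {
        String line = "[statement] NL [program] ;";
        // "\\[" in Java source is the two-character regex \[ (a literal "[")
        System.out.println(chunks(line, "\\["));
        // Pattern.quote("[") produces an equivalent regex without manual escaping
        System.out.println(chunks(line, Pattern.quote("[")));
    }
}
```

Both calls should give the same two chunks, "statement] NL" and "program] ;". Note that splitting on "[" alone leaves the closing "]" in each chunk, so it still needs trimming afterwards.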


#8 wntdaliv (Member), 12-02-2008, 05:17 AM
haha, all that trouble and I just had to double it. Thanks so much!

"\\{" worked just fine as a delimiter

#9 Fubarable (Moderator), 12-02-2008, 05:17 AM
For example given this input file:
parsefile.txt
Code:
{
[program]
[statement] NL . NL ;
[statement] NL [program] ;
[statement] NL [program] ;
[statement] NL [program] ;
[statement] NL [program] ;
[statement] NL [program] ;
[statement] NL [program] ;
[statement] NL [program] ;
[statement] NL [program] ;
[statement] NL [program] ;
[statement] NL [program] ;
}
and this code:
MyParse.java
Code:
import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MyParse
{
  private static final String PATH = "src/dy08/m12/a/";
  private static final String FILE = "parsefile.txt";

  public static void main(String[] args) throws FileNotFoundException
  {
    File toParse = new File(PATH + FILE);
    Scanner scanner = new Scanner(toParse);
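    // Lookbehind for "[", then everything up to the next "]", then lookahead for "]".
    // The backslashes are doubled because this is a Java String literal.
    String regex = "(?<=\\[)([^\\]]*)(?=\\])";
    Pattern p = Pattern.compile(regex);
    
    while (scanner.hasNextLine())
    {
      String line = scanner.nextLine();
      Matcher matcher = p.matcher(line);
      // find() resumes after the previous match, so no manual index is needed
      while (matcher.find())
      {
        System.out.print(matcher.group() + ", ");
      }
      System.out.println();
    }
    scanner.close();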
  }
}
I get this result:
Code:
program, 
statement, 
statement, program, 
statement, program, 
statement, program, 
statement, program, 
statement, program, 
statement, program, 
statement, program, 
statement, program, 
statement, program, 
statement, program,
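
The example above only pulls out the bracketed names. Since the stated goal was to treat bracketed and non-bracketed text as two different kinds of object, here is a rough sketch of one way to get both in a single pass. Everything in it is an assumption for illustration: the class name RuleTokens, the "VAR:" and "TERM:" tags (stand-ins for whatever real classes would be used), and the choice to treat whitespace as the separator between non-bracketed tokens.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RuleTokens {
    // Group 1: the text inside [...].
    // Group 2: any other run of characters that are not whitespace or brackets.
    private static final Pattern TOKEN =
        Pattern.compile("\\[([^\\]]*)\\]|([^\\s\\[\\]]+)");

    // Returns each token tagged "VAR:" (was in brackets) or "TERM:" (was not).
    static List<String> tokenize(String line) {
        List<String> out = new ArrayList<>();
        Matcher m = TOKEN.matcher(line);
        while (m.find()) {
            if (m.group(1) != null) {
                out.add("VAR:" + m.group(1));
            } else {
                out.add("TERM:" + m.group(2));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // prints [VAR:statement, TERM:NL, VAR:program, TERM:;]
        System.out.println(tokenize("[statement] NL [program] ;"));
    }
}
```

In real code the string tags would presumably be replaced by constructing the two object types directly inside the if/else.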

#10 Fubarable (Moderator), 12-02-2008, 05:19 AM
Originally Posted by wntdaliv
haha, all that trouble and I just had to double it. Thanks so much!
Cool. I'm glad the fix was so simple.

#11 Darryl.Burke (Senior Member), 12-02-2008, 06:42 AM
Just to expand on the reason for doubling backslashes for regex Strings:

-- the backslash is the escape (quoting) character in a String literal
-- to include a single backslash in the value of a String variable assigned from a String literal, you have to escape it by preceding it with another backslash

A simple test that helps this sink in:
Code:
System.out.println("\\".length()); // prints 1
This is especially important to understand when the regex itself has to match a \ character. Since the \ is also the escape character in a regex pattern, you now need 4 backslashes in the String literal:
Code:
String regex = "\\\\";
This results in the value of the variable regex being the two characters \\ which, as a regex, match a single \.

If you were reading a regex String from a text file or from JOptionPane#showInputDialog, you would use single, not double, backslashes to escape any regex metacharacter. Two backslashes from such a source would match the backslash character itself.

db
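
The backslash levels described above can be checked with a few lines. This is only a sketch; the class name BackslashLevels and the helper matches are made up for the example, while Pattern.matches and Pattern.quote are standard java.util.regex calls.

```java
import java.util.regex.Pattern;

class BackslashLevels {
    // True if the regex (given as its actual character value) matches the whole input.
    static boolean matches(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        String oneBackslash = "\\";         // String value: \   (length 1)
        String regexForBackslash = "\\\\";  // String value: \\  (length 2)

        System.out.println(regexForBackslash.length());               // prints 2
        System.out.println(matches(regexForBackslash, oneBackslash)); // prints true
        System.out.println(matches("\\[", "["));                      // prints true

        // Pattern.quote wraps its argument in \Q...\E, so no manual escaping is needed
        System.out.println(Pattern.quote("["));                       // prints \Q[\E
        System.out.println(matches(Pattern.quote("["), "["));         // prints true
    }
}
```

Writing the \Q...\E form directly in source needs the same doubling, i.e. "\\Q[\\E", which is why a bare "\Q" triggered the Eclipse "Invalid escape sequence" error earlier in the thread.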
Copyright ©2006 - 2007, www.java-forums.org