  1. #1
    wntdaliv is offline Member
    Join Date
    Dec 2008
    Posts
    6
    Rep Power
    0

    Default using Delimiter with metacharacters

    Hi, I'm trying to parse a file in which the pieces of information are delimited by brackets, i.e. [] and {}.

    example segment of file:

    {
    [program]
    [statement] NL . NL ;
    [statement] NL [program] ;
    [statement] NL [program] ;
    [statement] NL [program] ;
    [statement] NL [program] ;
    [statement] NL [program] ;
    [statement] NL [program] ;
    [statement] NL [program] ;
    [statement] NL [program] ;
    [statement] NL [program] ;
    [statement] NL [program] ;
    }

    I'm trying to turn each token inside [] (e.g. [statement]) into an object. However, I can't call useDelimiter("[") to split up the tokens and then read them with next(), because [ is a regex metacharacter.

    How can I go about parsing my tokens? Is there a workaround to force "[" to be treated as a pattern in itself, like any other pattern that you can use as a delimiter, or is there something completely different that I can do?

    Thanks in advance,
    wntdaliv

  2. #2
    Fubarable is offline Moderator
    Join Date
    Jun 2008
    Posts
    19,316
    Blog Entries
    1
    Rep Power
    26

    Default

    Have you tried back-slashing it? i.e., instead of "[" use "\["

    caveat: I'm no expert in the field of regex.

  3. #3
    wntdaliv is offline Member

    Default

    Yeah, I did. Unfortunately the compiler rejects "\[" as an invalid escape sequence.

    I think it's because "\" is itself a metacharacter that has to be followed by something specific, like "\n" for a new line, etc.

  4. #4
    wntdaliv is offline Member

    Default Update

    OK, so I've found on the Java pages that you can supposedly force a metacharacter to be treated like a regular character. They say:


    There are two ways to force a metacharacter to be treated as an ordinary character:

    -- precede the metacharacter with a backslash, or
    -- enclose it within \Q (which starts the quote) and \E (which ends it).

    When using this technique, the \Q and \E can be placed at any location within the expression, provided that the \Q comes first.

    However, when I try to do this, my compiler (Eclipse) gives me the error:

    Invalid escape sequence (valid ones are \b \t \n \f \r \" \' \\ )

    Any ideas about getting around this?

  5. #5
    Fubarable is offline Moderator

    Default

    Let's see your code.

    Also, are you only interested in the text between the brackets and not interested in the other text? Exactly what is your goal here?

  6. #6
    wntdaliv is offline Member

    Default

    OK, so my goal is to take the text between the brackets and turn it into one type of object, and to take the text that isn't between brackets and turn it into a different type of object.

    Here's some code:

    Java Code:
    	public Grammar parseFile(File file)
    	throws IOException
    	{
    		Grammar g = new Grammar();
    		Scanner scanner = new Scanner(file);
    		Pattern pattern = Pattern.compile("["); // This is the trouble
    		scanner.useDelimiter(pattern);
    		startVariable = scanner.next();
    		startVariable = startVariable.substring(1, startVariable.length() - 2);
    		
    		scanner.useDelimiter("{");
    		while(scanner.hasNext())
    		{
    			g.addRule(parseRule(scanner.next()));
    		}
    		return g;
    	}
    I'm trying to take all of the information in the file and turn it into objects (using other classes and such).

  7. #7
    Fubarable is offline Moderator

    Default

    You do know of course that to use back slashes here, you have to double them, right?

    For instance, I think that a regex String like this will match anything inside square brackets:
    Java Code:
    String regex = "(?<=\\[)([^\\]]*)(?=\\])";
    but again, I'm still very new to regexes so I can't guarantee how well this would work.
    Last edited by Fubarable; 12-02-2008 at 05:12 AM.
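
For anyone reading along, here is a rough breakdown of that pattern as a small runnable sketch. The class and method names (BracketContents, extract) are made up for illustration and are not from the thread; the pattern itself is the one posted above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, named here only for illustration.
class BracketContents {
    // Same pattern as in the post above, written as a Java String literal.
    // What the regex engine sees after the compiler removes one level
    // of backslashes: (?<=\[)([^\]]*)(?=\])
    //   (?<=\[)   lookbehind: the previous character is a literal '['
    //   ([^\]]*)  a run of characters that are not ']'
    //   (?=\])    lookahead: the next character is a literal ']'
    static final Pattern INSIDE_BRACKETS =
            Pattern.compile("(?<=\\[)([^\\]]*)(?=\\])");

    // Collects the text found between each pair of square brackets.
    static List<String> extract(String line) {
        List<String> found = new ArrayList<>();
        Matcher m = INSIDE_BRACKETS.matcher(line);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        // prints [statement, program]
        System.out.println(extract("[statement] NL [program] ;"));
    }
}
```

Because the brackets are only looked at and never consumed, m.group() returns the bare name without the surrounding [ and ].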

  8. #8
    wntdaliv is offline Member
    Join Date
    Dec 2008
    Posts
    6
    Rep Power
    0

    Default

    haha, all that trouble and I just had to double it. Thanks so much!

    "\\{" worked just fine as a delimiter
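
A minimal sketch of the delimiter fix, assuming a Scanner over an in-memory String instead of the real file. DelimiterDemo and split are hypothetical names. It also shows Pattern.quote and the \Q...\E form from the documentation quoted earlier, which should behave the same way for a single literal character.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the original code.
class DelimiterDemo {
    // Splits the input with a Scanner using the given delimiter regex.
    static List<String> split(String input, String delimiterRegex) {
        List<String> tokens = new ArrayList<>();
        try (Scanner scanner = new Scanner(input)) {
            scanner.useDelimiter(delimiterRegex);
            while (scanner.hasNext()) {
                tokens.add(scanner.next());
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String input = "a[b[c";
        // Doubled backslash in source, so the regex engine receives \[
        System.out.println(split(input, "\\["));               // [a, b, c]
        // Pattern.quote builds a literal-matching regex for us
        System.out.println(split(input, Pattern.quote("[")));  // [a, b, c]
        // The \Q...\E form, again with the backslashes doubled in source
        System.out.println(split(input, "\\Q[\\E"));           // [a, b, c]
    }
}
```

Pattern.quote is probably the least error-prone of the three when the delimiter text comes from a variable, since nothing has to be escaped by hand.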

  9. #9
    Fubarable is offline Moderator

    Default

    For example, given this input file:
    parsefile.txt
    Java Code:
    {
    [program]
    [statement] NL . NL ;
    [statement] NL [program] ;
    [statement] NL [program] ;
    [statement] NL [program] ;
    [statement] NL [program] ;
    [statement] NL [program] ;
    [statement] NL [program] ;
    [statement] NL [program] ;
    [statement] NL [program] ;
    [statement] NL [program] ;
    [statement] NL [program] ;
    }
    and this code:
    MyParse.java
    Java Code:
    import java.io.File;
    import java.io.FileNotFoundException;
    import java.util.Scanner;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;
    
    public class MyParse
    {
      private static final String PATH = "src/dy08/m12/a/";
      private static final String FILE = "parsefile.txt";
    
      public static void main(String[] args) throws FileNotFoundException
      {
        File toParse = new File(PATH + FILE);
        Scanner scanner = new Scanner(toParse);
        String regex = "(?<=\\[)([^\\]]*)(?=\\])";
        Pattern p = Pattern.compile(regex);
        
        while (scanner.hasNextLine())
        {
          String line = scanner.nextLine();
          Matcher matcher = p.matcher(line);
          // find() resumes after the previous match, so no manual index is needed
          while (matcher.find())
          {
            System.out.print(matcher.group() + ", ");
          }
          System.out.println();
        }
      }
    }
    I get this result:
    Java Code:
    program, 
    statement, 
    statement, program, 
    statement, program, 
    statement, program, 
    statement, program, 
    statement, program, 
    statement, program, 
    statement, program, 
    statement, program, 
    statement, program, 
    statement, program,
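
Since the stated goal was to build one kind of object from bracketed text and another kind from everything else, here is one possible sketch that classifies both in a single pass. It assumes tokens are separated by whitespace, and RuleTokenizer plus the VAR/TERM labels are hypothetical stand-ins for whatever classes the real program uses.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical tokenizer, assuming whitespace-separated tokens.
class RuleTokenizer {
    // Alternative 1: [name], with the name captured in group 1.
    // Alternative 2: any other run of non-whitespace, captured in group 2.
    static final Pattern TOKEN = Pattern.compile("\\[([^\\]]*)\\]|(\\S+)");

    // Returns labelled strings; a real program would build objects here.
    static List<String> tokenize(String line) {
        List<String> out = new ArrayList<>();
        Matcher m = TOKEN.matcher(line);
        while (m.find()) {
            if (m.group(1) != null) {
                out.add("VAR(" + m.group(1) + ")");   // text inside brackets
            } else {
                out.add("TERM(" + m.group(2) + ")");  // text outside brackets
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // prints [VAR(statement), TERM(NL), VAR(program), TERM(;)]
        System.out.println(tokenize("[statement] NL [program] ;"));
    }
}
```

Only one of the two groups participates in any given match, so checking group(1) for null is enough to tell the two token kinds apart.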

  10. #10
    Fubarable is offline Moderator

    Default

    Quote: Originally posted by wntdaliv
    haha, all that trouble and I just had to double it. Thanks so much!
    Cool. I'm glad the fix was so simple.

  11. #11
    DarrylBurke is offline Member
    Join Date
    Sep 2008
    Location
    Madgaon, Goa, India
    Posts
    11,193
    Rep Power
    19

    Default

    Just to expand on the reason for doubling backslashes for regex Strings:

    -- the backslash is the quote character for a String literal
    -- to include a single backslash in the value of a String variable assigned from a String literal, you have to quote it by preceding it with another backslash

    A simple test that helps this sink in:
    Java Code:
    System.out.println("\\".length()); // prints 1
    This is important to understand especially when a \ character is required to be matched by regex. Since the \ is also the quoting character for a regex pattern, you now need 4 backslashes in the String literal:
    Java Code:
    String regex = "\\\\";
    This gives the variable regex the two-character value \\ (two backslashes), which as a regex matches a single \ character.

    If you were reading a regex String from a text file or a JOptionPane#showInputDialog, you would use single, not double, backslashes to quote any regex metacharacter. Two backslashes from such a source would match the backslash character itself.
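
The two levels of escaping can be checked with a few print statements. This is just a sketch; BackslashLevels is a made-up class name.

```java
import java.util.regex.Pattern;

// Hypothetical demo class for the two escaping levels.
class BackslashLevels {
    public static void main(String[] args) {
        // Level 1: the compiler turns \\ in the literal into one backslash.
        String bracket = "\\[";
        System.out.println(bracket.length());             // prints 2
        System.out.println(bracket);                      // prints \[

        // Level 2: the regex engine reads \[ as a literal '['.
        System.out.println("a[b".split(bracket).length);  // prints 2

        // Matching a backslash itself needs both levels at once.
        String backslash = "\\\\";
        System.out.println(backslash.length());           // prints 2
        // The input "\\" is a one-character String holding one backslash.
        System.out.println(Pattern.matches(backslash, "\\")); // prints true
    }
}
```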

    db
