Page 2 of 2
Results 21 to 30 of 30
  1. #21
    quad64bit (Moderator)
    Join Date: Jul 2009
    Location: VA
    Posts: 1,323
    Rep Power: 7

    Sure, but what would be the value of recognizing a literal String in a mess such as this: ((;"foo))"
    A regex could convert your string into a series of tokens, namely a stack:
    Java Code:
    "
    )
    )
    foo
    "
    ;
    (
    (
    which could then be fed into a derivation table for a series of reductions and/or code generation. That's the whole point of a compiler. The scanner portion simply tokenizes the input into discrete tokens, which are a combination of literal values and the terminal and non-terminal symbols described in the grammar for the language.

  2. #22
    JosAH (Moderator)
    Join Date: Sep 2008
    Location: Voorschoten, the Netherlands
    Posts: 13,560
    Blog Entries: 7
    Rep Power: 21

    Originally posted by quad64bit:
    A regex could convert your string into a series of tokens, namely a stack:
    Java Code:
    "
    )
    )
    foo
    "
    ;
    (
    (
    which could then be fed into a derivation table for a series of reductions and/or code generation. That's the whole point of a compiler. The scanner portion simply tokenizes the input into discrete tokens, which are a combination of literal values and the terminal and non-terminal symbols described in the grammar for the language.
    No, that is not what the tokenizer would produce; it would produce a sequence of tokens like this: (, (, ;, "foo))". The compiler would strongly complain about this sequence, but the literal String "foo))" would have been recognized by the tokenizer. That's why REs are of little use here, and you do indeed need a full parser if you want to recognize methods, etc.
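
    The token sequence described above can be sketched with a single ordered regex. This is only an illustration, not either poster's actual code: the class name StringAwareTokenizer, the tokenize method, and the tiny token set (string literals, identifiers, and the punctuation characters "(", ")" and ";") are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: class name, method name and token set are illustrative only.
class StringAwareTokenizer {

    // The string-literal alternative is listed first, so an opening quote
    // swallows everything up to the matching closing quote as ONE token.
    private static final Pattern TOKEN = Pattern.compile(
            "\"(?:[^\"\\\\]|\\\\.)*\""    // string literal, allowing backslash escapes
          + "|[A-Za-z_][A-Za-z0-9_]*"      // identifier
          + "|[();]");                     // single punctuation characters

    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Prints: [(, (, ;, "foo))"]
        System.out.println(tokenize("((;\"foo))\""));
    }
}
```

    With this token set the parentheses inside the quotes never become separate tokens, and whitespace inside a literal such as "foo bar baz" stays part of that one token. Rejecting the sequence as a whole is still the parser's job.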

    kind regards,

    Jos
    cenosillicaphobia: the fear for an empty beer glass

  3. #23
    quad64bit (Moderator)

    I don't know what to tell you. I wrote one. It works fine. It uses regex in Java. It produces tokens like I described. It converts high-level languages like PL/SQL into assembly. As I described, the tokens in and of themselves are useless without a grammar. The string you provided would end in an error, but as a result of a failed reduction when recursing through the grammar. Whether this happens in the tokenizer or the parser doesn't really matter in the end, as bad syntax is bad syntax. Perhaps we're just not understanding each other?

  4. #24
    JosAH (Moderator)

    Originally posted by quad64bit:
    I don't know what to tell you. I wrote one. It works fine. It uses regex in Java. It produces tokens like I described. It converts high-level languages like PL/SQL into assembly. As I described, the tokens in and of themselves are useless without a grammar. The string you provided would end in an error, but as a result of a failed reduction when recursing through the grammar. Whether this happens in the tokenizer or the parser doesn't really matter in the end, as bad syntax is bad syntax. Perhaps we're just not understanding each other?
    I agree completely that just a tokenizer is useless without a (type 2) grammar parser. REs are inadequate for parsing programming languages such as Java; that was my whole point. By the way, what surprises me is that your tokenizer chops up literal strings. It has to preserve white space then, e.g. the string "foo bar baz" consists of seven tokens in your tokenizer: two double quotes, two spaces and the elementary tokens foo, bar and baz. I find that a bit unusual.

    kind regards,

    Jos

  5. #25
    quad64bit (Moderator)

    The whitespace is left in the string and the elements are clipped off one at a time. If anything other than whitespace is left in the string after all matches have been found, then the input was in error. The only preprocessing it did was to remove multi-line comments and end-of-line comments; the rest was handled with repeated match/removal, with the removed tokens being pushed onto the stack. This allowed some errors (invalid characters or structure) to be found immediately, before the grammar table was consulted.

  6. #26
    quad64bit (Moderator)

    Here is an example from my old compiler manual:
    Java Code:
    my_int	INT(	10 ):!=5 ;--Somecommenthere
    my_int	INT(	10 ):!=5 ;
    my_int INT ( 10 ) :!= 5 ;
     INT ( 10 ) :!= 5 ;
       ( 10 ) :!= 5 ;
        10 ) :!= 5 ;
         ) :!= 5 ;
          :!= 5 ;
          != 5 ;
          !5;
          !;
          !
    ! ERROR
    Where ! is an illegal character in the context of :=
    All of the matches are removed from the string one at a time, non-matches are left behind. Anything other than spaces left in the resulting string is automatically in error. This eliminates many syntax and some code structure errors (but not all).

    The rest is done during parsing, reduction, assignment table insertion, and code generation.
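
    The match/removal scan traced above could look roughly like the following. This is a sketch under stated assumptions, not the code from the compiler being described: the class ClipScanner, its fields, and the token set (":=", ":", "=", identifiers, integers, parentheses and ";") are hypothetical, and only "--" end-of-line comments are handled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of a clip-and-check scan; names and token set are assumptions.
class ClipScanner {

    // End-of-line comments in the example start with "--".
    private static final Pattern LINE_COMMENT = Pattern.compile("--.*$");

    // Assumed tokens: ":=" (tried before ":" and "="), identifiers, integers, punctuation.
    private static final Pattern TOKEN = Pattern.compile(
            ":=|[A-Za-z_][A-Za-z0-9_]*|[0-9]+|[():=;]");

    final List<String> tokens = new ArrayList<>();
    final String leftover;  // what remains once every match has been clipped out

    ClipScanner(String line) {
        String code = LINE_COMMENT.matcher(line).replaceFirst("");
        StringBuilder rest = new StringBuilder(code);
        Matcher m = TOKEN.matcher(code);
        while (m.find()) {
            tokens.add(m.group());
            // Blank out the matched text so that only non-matches stay behind.
            for (int i = m.start(); i < m.end(); i++) {
                rest.setCharAt(i, ' ');
            }
        }
        leftover = rest.toString().trim();
    }

    // Anything other than whitespace left behind means the line is in error.
    boolean hasError() {
        return !leftover.isEmpty();
    }

    public static void main(String[] args) {
        ClipScanner s = new ClipScanner("my_int\tINT(\t10 ):!=5 ;--Somecommenthere");
        System.out.println(s.tokens);     // [my_int, INT, (, 10, ), :, =, 5, ;]
        System.out.println(s.leftover);   // !
        System.out.println(s.hasError()); // true
    }
}
```

    Note that this sketch would also strip a "--" that appears inside a string literal. A real scanner would have to recognize literals before removing comments.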

  7. #27
    JosAH (Moderator)

    Originally posted by quad64bit:
    Here is an example from my old compiler manual:
    Java Code:
    my_int	INT(	10 ):!=5 ;--Somecommenthere
    my_int	INT(	10 ):!=5 ;
    my_int INT ( 10 ) :!= 5 ;
     INT ( 10 ) :!= 5 ;
       ( 10 ) :!= 5 ;
        10 ) :!= 5 ;
         ) :!= 5 ;
          :!= 5 ;
          != 5 ;
          !5;
          !;
          !
    ! ERROR
    Where ! is an illegal character in the context of :=
    All of the matches are removed from the string one at a time, non-matches are left behind. Anything other than spaces left in the resulting string is automatically in error. This eliminates many syntax and some code structure errors (but not all).

    The rest is done during parsing, reduction, assignment table insertion, and code generation.
    I don't understand that example because it allows for out-of-order parsing of the tokens; e.g. if your example had been:

    Java Code:
    my_int	INT(	10 ):-=5 ;--Somecommenthere
    near the end of the reduction you'd end up with this:
    Java Code:
          :-= 5 ;
          -= 5 ;
          -5;
          ;
    <no error>
    As I wrote: I don't understand your example and I don't understand what parsing method you are describing; it certainly isn't LALR(1) or LL(k) because both of them are strictly left to right parsing methods ...

    kind regards,

    Jos

  8. #28
    quad64bit (Moderator)

    Parsing hasn't occurred yet; this is simply scanning and tokenizing. Parsing happens after the list of tokens has been produced.

  9. #29
    JosAH (Moderator)

    Originally posted by quad64bit:
    Parsing hasn't occurred yet; this is simply scanning and tokenizing. Parsing happens after the list of tokens has been produced.
    But, if I'm not mistaken, your tokenizing process allows (sub)tokens to be switched, e.g. the sequence :-= forms the token := (which is removed from the queue) and leaves the - in the queue. I don't understand the process at all.

    kind regards,

    Jos

  10. #30
    quad64bit (Moderator)

    Essentially, we can recognize : , we can recognize = , and we can recognize := , but not :!= . In fact, the ! is completely illegal in the syntax unless it is inside a string literal. Since :!= is invalid, the regex looks for the next biggest match, in this case just : .

    ! is illegal outside of a literal, so it is skipped and the next match occurs, which is = . Had the syntax been correct ( := ), a single := token would have been created, but that pattern wasn't found. Only matches are removed, so anything that doesn't fit the pattern defined in the regex is left behind. At the end of the processing of a line, the system determines whether or not an error occurred (though it doesn't know where in the line it happened, just that something isn't right) and, if so, compilation halts and the line that caused the error is returned for the user to debug.
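
    In java.util.regex, a "biggest match first" behaviour like the one described comes from the order of the alternatives, because the engine tries them left to right at each position. A small sketch of that, where the class OperatorMatch and both patterns are hypothetical and cover only the three operators mentioned:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch; the class name and the three-operator token set are assumptions.
class OperatorMatch {

    // ":=" is listed before ":" and "=", so the two-character operator wins when present.
    static final Pattern LONGEST_FIRST = Pattern.compile(":=|:|=");

    // Same alternatives in the other order: ":" matches first, so ":=" is never produced.
    static final Pattern SHORTEST_FIRST = Pattern.compile(":|=|:=");

    static List<String> matches(Pattern p, String text) {
        List<String> found = new ArrayList<>();
        Matcher m = p.matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(matches(LONGEST_FIRST, ":="));   // [:=]
        System.out.println(matches(LONGEST_FIRST, ":!="));  // [:, =]
        System.out.println(matches(SHORTEST_FIRST, ":="));  // [:, =]
    }
}
```

    In the second call the ! is skipped by find() and never appears in the result, which is why a separate check for leftover characters is needed to flag the line.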
