- 01-23-2011, 03:21 PM #21
Quote: Sure, but what would be the value of recognizing a literal String in a mess such as this: ((;"foo))"
A regex could convert your string into a series of tokens, namely a stack:
Java Code:
" ) ) foo " ; ( (
which could then be fed into a derivation table for a series of reductions and/or code generation. That's the whole point of a compiler. The scanner portion simply tokenizes the input into discrete tokens, which are a combination of literal values and the terminal and non-terminal symbols described in the grammar for the language.
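A minimal sketch of the stack idea, assuming a scanner with no string-literal rule. The pattern `[();"]|\w+` and the class name `StackTokenizer` are illustrative choices, not the poster's code. Each match is pushed onto a `java.util.ArrayDeque`, so iterating the deque from the top gives the stack order shown in the post.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical scanner: every double quote is its own token (no string-literal rule).
class StackTokenizer {
    // Single punctuation characters, or runs of word characters.
    private static final Pattern TOKEN = Pattern.compile("[();\"]|\\w+");

    static Deque<String> tokenize(String input) {
        Deque<String> stack = new ArrayDeque<>();
        Matcher m = TOKEN.matcher(input);
        while (m.find()) {
            stack.push(m.group()); // the most recently scanned token ends up on top
        }
        return stack;
    }

    public static void main(String[] args) {
        // Iterating an ArrayDeque used as a stack runs from top to bottom.
        System.out.println(String.join(" ", tokenize("((;\"foo))\"")));
    }
}
```

Because the double quote is just another single-character token here, `foo` comes out as a bare word and the quoted text is never seen as one literal; that is the point the next reply takes issue with.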
- 01-23-2011, 04:06 PM #22
- Join Date: Sep 2008
- Location: Voorschoten, the Netherlands
- Posts: 11,400
- Blog Entries: 7
- Rep Power: 17
No, that is not what the tokenizer would produce; it would produce a sequence of tokens like this: (, (, ;, "foo))". The compiler would strongly complain about this sequence, but the literal String "foo))" would've been recognized by the tokenizer. That's why REs are of little use here and you do indeed need a full parser if you want to recognize methods, etc.
kind regards,
Jos
When people rob a bank they get a penalty; when banks rob people they get a bonus.
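For contrast, a sketch of a tokenizer that treats a string literal as a single token, as described above. The pattern is an assumption (backslash escapes allowed, no multi-line strings); listing the literal alternative first is what keeps `"foo))"` intact instead of splitting it at the parentheses.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical literal-aware tokenizer, for illustration only.
class LiteralTokenizer {
    // 1) a double-quoted literal with optional backslash escapes,
    // 2) single punctuation characters, 3) runs of word characters.
    private static final Pattern TOKEN =
            Pattern.compile("\"(?:[^\"\\\\]|\\\\.)*\"|[();]|\\w+");

    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("((;\"foo))\""));
    }
}
```

Running `main` yields the four tokens `(`, `(`, `;` and `"foo))"`; rejecting that sequence as invalid Java is left entirely to the parser.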
- 01-24-2011, 12:45 AM #23
I don't know what to tell you. I wrote one. It works fine. It uses regexes in Java. It produces tokens like I described. It converts high-level languages like PL/SQL into assembly. As I described, the tokens in and of themselves are useless without a grammar. The string you provided would end in an error, but as a result of a failed reduction when recursing through the grammar. Whether this happens in the tokenizer or the parser doesn't really matter in the end, as bad syntax is bad syntax. Perhaps we're just not understanding each other?
- 01-24-2011, 07:49 AM #24
I agree completely that a tokenizer alone is useless without a (type 2) grammar parser. REs are inadequate for parsing programming languages such as Java; that was my whole point. B.t.w., what surprises me is that your tokenizer chops up literal strings. It has to preserve white space then; e.g. the string "foo bar baz" consists of seven tokens in your tokenizer: two double quotes, two spaces and the elementary tokens foo, bar and baz. I find that a bit unusual.
kind regards,
Jos
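A sketch of the tokenizer shape being questioned here, assuming quotes, words and whitespace runs all become separate tokens (the pattern and the class name `WhitespaceTokenizer` are hypothetical). It shows why white space must then be kept: the seven tokens of `"foo bar baz"` rebuild the literal only if the two blanks survive.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical tokenizer with no string-literal rule that keeps white space as tokens.
class WhitespaceTokenizer {
    // A lone double quote, a run of word characters, or a run of white space.
    private static final Pattern TOKEN = Pattern.compile("\"|\\w+|\\s+");

    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        String literal = "\"foo bar baz\"";
        List<String> tokens = tokenize(literal);
        System.out.println(tokens.size() + " tokens");
        // Concatenating the tokens restores the literal only because the blanks were kept.
        System.out.println(String.join("", tokens).equals(literal));
    }
}
```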
- 01-24-2011, 01:58 PM #25
The whitespace is left in the string and the elements are clipped off one at a time. If anything other than whitespace is left in the string after all matches have been found, then the syntax was in error. The only preprocessing it did was to remove multi-line comments and end-of-line comments; the rest was handled with repeated match/removal, with the removed tokens being pushed onto the stack. This allowed some errors (invalid characters or structure) to be found immediately, before the grammar table was consulted.
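The comment-removal preprocessing could look roughly like this, assuming PL/SQL-style comments (`--` to end of line, `/* ... */` possibly spanning lines). The class name `CommentStripper` is hypothetical, and the sketch is simplified: it does not protect comment markers that appear inside string literals, and a multi-line block comment collapses to a single space.

```java
import java.util.regex.Pattern;

// Hypothetical comment-stripping pass run before tokenizing.
class CommentStripper {
    // DOTALL lets '.' cross line breaks; the reluctant '*?' stops at the first "*/".
    private static final Pattern BLOCK = Pattern.compile("/\\*.*?\\*/", Pattern.DOTALL);
    // "--" up to (not including) the end of the line.
    private static final Pattern LINE = Pattern.compile("--[^\\r\\n]*");

    static String strip(String source) {
        // Block comments first, replaced by a space so adjacent tokens are not glued together.
        String noBlocks = BLOCK.matcher(source).replaceAll(" ");
        return LINE.matcher(noBlocks).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(strip("my_int INT( 10 ):!=5 ;--Somecommenthere"));
    }
}
```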
- 01-24-2011, 02:08 PM #26
Here is an example from my old compiler manual:
Java Code:
my_int INT( 10 ):!=5 ;--Somecommenthere
my_int INT( 10 ):!=5 ;
my_int INT ( 10 ) :!= 5 ;
INT ( 10 ) :!= 5 ;
( 10 ) :!= 5 ;
10 ) :!= 5 ;
) :!= 5 ;
:!= 5 ;
!= 5 ;
!5;
!;
!
! ERROR
Where ! is an illegal character in the context of :=
All of the matches are removed from the string one at a time; non-matches are left behind. Anything other than spaces left in the resulting string is automatically in error. This catches many syntax errors and some code-structure errors (but not all).
The rest is done during parsing, reduction, assignment table insertion, and code generation.
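A sketch of the match/removal idea applied to the manual's example. It assumes a small hypothetical token set, and it collects matches in one left-to-right `Matcher.find()` pass while keeping the skipped text, which should leave the same residue as removing matches one at a time. It is an illustration, not the poster's actual implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical match-and-remove scanner for the declaration in the example.
class MatchRemoveScanner {
    // ":=" is tried before ":" and "=", so it wins whenever it is actually present.
    private static final Pattern TOKEN =
            Pattern.compile(":=|[A-Za-z_]\\w*|\\d+|[():=;]");

    final List<String> tokens = new ArrayList<>();
    final String leftover;

    MatchRemoveScanner(String line) {
        Matcher m = TOKEN.matcher(line);
        StringBuilder rest = new StringBuilder();
        int last = 0;
        while (m.find()) {
            rest.append(line, last, m.start()); // text the pattern skipped stays behind
            tokens.add(m.group());              // the match is "removed" into the token list
            last = m.end();
        }
        rest.append(line, last, line.length());
        leftover = rest.toString();
    }

    // Anything other than white space left over means the line is in error.
    boolean hasError() {
        return !leftover.trim().isEmpty();
    }

    public static void main(String[] args) {
        MatchRemoveScanner s = new MatchRemoveScanner("my_int INT( 10 ):!=5 ;");
        System.out.println(s.tokens + " leftover=\"" + s.leftover.trim() + "\" error=" + s.hasError());
    }
}
```

Note that `:!=` is scanned as `:` and `=` with `!` left behind; with the correct `:=` the longer alternative matches first and no residue remains.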
- 01-24-2011, 03:07 PM #27
I don't understand that example because it allows for out-of-order parsing of the tokens; e.g. if your example would've been:
Java Code:
my_int INT( 10 ):-=5 ;--Somecommenthere
near the end of the reduction you'd end up with this:
Java Code:
:-= 5 ;
-= 5 ;
-5;
;
<no error>
As I wrote: I don't understand your example and I don't understand what parsing method you are describing; it certainly isn't LALR(1) or LL(k) because both of them are strictly left-to-right parsing methods ...
kind regards,
Jos
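The objection can be checked with the same kind of regex scanner, assuming `-` is a legal token in the language (the token set and the class name `MinusTokenDemo` are hypothetical). The residue is empty, so the scan reports no error, and the meaningless run `:`, `-`, `=` is passed on untouched.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical scanner whose token set includes '-'.
class MinusTokenDemo {
    // '-' sits last in the character class, so it is a literal minus sign.
    private static final Pattern TOKEN =
            Pattern.compile(":=|[A-Za-z_]\\w*|\\d+|[():=;-]");

    static List<String> tokens(String line) {
        List<String> out = new ArrayList<>();
        Matcher m = TOKEN.matcher(line);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    // Residue after deleting every match; blank residue means "no scan error".
    static String leftover(String line) {
        return TOKEN.matcher(line).replaceAll("").trim();
    }

    public static void main(String[] args) {
        String line = "my_int INT( 10 ):-=5 ;";
        System.out.println(tokens(line) + " leftover=\"" + leftover(line) + "\"");
    }
}
```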
- 01-24-2011, 03:24 PM #28
Parsing hasn't occurred yet; this is simply scanning and tokenizing. Parsing happens after the list of tokens has been produced.
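To illustrate that split, a sketch of a parser stage consuming an already-produced token list strictly left to right. The one-rule grammar and the class name `DeclParser` are hypothetical, written only to match the manual's declaration; a real grammar would be table-driven or far larger.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical one-rule grammar, for illustration only:
//   decl ::= IDENT "INT" "(" NUMBER ")" ":=" NUMBER ";"
class DeclParser {
    private final List<String> tokens;
    private int pos = 0;

    DeclParser(List<String> tokens) {
        this.tokens = tokens;
    }

    // Consume the next token if it equals the expected literal.
    private boolean expect(String literal) {
        if (pos < tokens.size() && tokens.get(pos).equals(literal)) { pos++; return true; }
        return false;
    }

    // Consume the next token if it matches the given regex (e.g. identifier, number).
    private boolean expectMatching(String regex) {
        if (pos < tokens.size() && tokens.get(pos).matches(regex)) { pos++; return true; }
        return false;
    }

    // Any mismatch rejects the declaration; nothing may follow the ';'.
    boolean parseDecl() {
        return expectMatching("[A-Za-z_]\\w*")
                && expect("INT")
                && expect("(")
                && expectMatching("\\d+")
                && expect(")")
                && expect(":=")
                && expectMatching("\\d+")
                && expect(";")
                && pos == tokens.size();
    }

    public static void main(String[] args) {
        List<String> good = Arrays.asList("my_int", "INT", "(", "10", ")", ":=", "5", ";");
        List<String> bad = Arrays.asList("my_int", "INT", "(", "10", ")", ":", "-", "=", "5", ";");
        System.out.println(new DeclParser(good).parseDecl()); // true
        System.out.println(new DeclParser(bad).parseDecl());  // false: ":" where ":=" is expected
    }
}
```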
- 01-24-2011, 03:42 PM #29
When people rob a bank they get a penalty; when banks rob people they get a bonus.
- 01-24-2011, 03:56 PM #30
Essentially, we can recognize :, we can recognize =, and we can recognize :=, but not :!=. In fact, the ! is completely illegal in the syntax unless inside a string literal. Since :!= is invalid, the regex looks for the next biggest match, in this case just :.
! is illegal outside of a literal, so it is ignored and the next match occurs, which was =. Had the syntax been correct (:=), then a token would have been created as :=. But that pattern wasn't found. Only matches are removed, so anything that doesn't fit the pattern defined in the regex is ignored. At the end of the processing of a line, the system determines whether or not an error occurred (though it doesn't know where in the line it happened, just that something isn't right), and if so, compilation halts and the line that caused the error is returned for the user to debug.
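The per-line check described here might be sketched as follows; the token pattern and the names `LineChecker` and `firstBadLine` are hypothetical. Since only the residue of a whole line is inspected, the scanner can report which line failed but not the column within it.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical line-by-line scan that halts at the first line with illegal residue.
class LineChecker {
    // ":=" precedes ":" and "=" so it is taken as one token when present.
    private static final Pattern TOKEN = Pattern.compile(":=|[A-Za-z_]\\w*|\\d+|[():=;]");

    // Returns the 1-based number of the first bad line, or 0 if every line scans cleanly.
    static int firstBadLine(List<String> lines) {
        for (int i = 0; i < lines.size(); i++) {
            String leftover = TOKEN.matcher(lines.get(i)).replaceAll("");
            if (!leftover.trim().isEmpty()) {
                return i + 1; // only the line is known, not the position inside it
            }
        }
        return 0;
    }

    public static void main(String[] args) {
        List<String> program = Arrays.asList(
                "my_int INT( 10 ) := 5 ;",
                "my_int INT( 10 ):!=5 ;");
        int bad = firstBadLine(program);
        if (bad > 0) {
            System.out.println("Scan error on line " + bad + ": " + program.get(bad - 1));
        }
    }
}
```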