  1. #1
    hedonist is offline Member

    Tokenizer related testing

    Hi,
    I am tokenizing Java files. I have separated the Java code into keywords and identifiers (variables). Now I have to test whether the generated tokens are correct. It seems to work for a few lines of code when I inspect them manually, but I have a problem when the file is very large: I cannot manually inspect large files and check every token. We could say that if the system works for smaller files then it should work for larger ones as well, but how do I know that for sure? Are there any ways to test large files?

    I thank you for your effort.

  3. #3
    Eranga is offline Moderator

    Java provides various ways to process data, and the Java IO API covers the basic functionality for file processing as well.

    The InputStream class has two main methods for reading data from a stream, such as one opened on a file.

    1. int read()
    2. int read(byte[] b, int off, int len)

    The first method reads only one byte of data at a time, whereas the second one reads up to len bytes of data from the stream into an array of bytes. The second method usually performs better, because you can read the stream a block at a time instead of byte by byte.
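
    To make the difference between the two read methods concrete, here is a minimal sketch. The class name BlockReadSketch, the method names, and the 4096-byte buffer size are assumptions chosen for illustration; an in-memory stream stands in for a real file so the example runs on its own.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, not part of the original thread.
public class BlockReadSketch {

    // Counts the bytes in a stream by calling read() once per byte.
    static long countByteAtATime(InputStream in) throws IOException {
        long total = 0;
        while (in.read() != -1) {
            total++;
        }
        return total;
    }

    // Counts the bytes in a stream by filling a buffer with
    // read(byte[], int, int); each call returns how many bytes it read,
    // or -1 at the end of the stream.
    static long countInBlocks(InputStream in, int bufferSize) throws IOException {
        byte[] buffer = new byte[bufferSize];
        long total = 0;
        int n;
        while ((n = in.read(buffer, 0, buffer.length)) != -1) {
            total += n;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "int x = 42;".getBytes(StandardCharsets.US_ASCII);
        System.out.println(countByteAtATime(new ByteArrayInputStream(data)));
        System.out.println(countInBlocks(new ByteArrayInputStream(data), 4096));
    }
}
```

    Both methods see the same bytes; the block version simply makes far fewer calls on the stream. For a real file, wrapping the stream in a BufferedInputStream is another common way to get a similar effect.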

  4. #4
    hedonist is offline Member

    Thanks Eranga. I used StreamTokenizer to separate the tokens. The basic part is done, but I still have a problem tokenizing code that does not have any delimiters such as ';'. For example, I want to tokenize Visual Basic (VB) code from Java using StreamTokenizer, so I tried checking for a space as the delimiter, since VB may not have delimiters (such as ';') to separate different lines of code.
    I was expecting the following code to separate tokens based on the space character (which has an ASCII value of 32). But it doesn't work.

    if(StreamTokenizer.ttype==32)
    //separate the codes.

    Thank you beforehand

  5. #5
    JosAH is offline Moderator

    A tokenizer for a programming language (such as Java) should at least know about literals (int, double, String, character), reserved words, identifier names and, yes, punctuation tokens (operators, parentheses etc). Everything that can't be separated by a token type classification is separated by white space, i.e. int3 is an identifier name and forx is too, because nothing splits them into int and 3 or for and x. Tokenizing is not difficult, but it isn't as simple as separating groups of characters based on a single separator.

    kind regards,

    Jos
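
    As an illustration of the point about white space, here is a minimal sketch using java.io.StreamTokenizer. By default StreamTokenizer treats the space character as white space and skips it, so ttype never equals 32 unless ordinaryChar(' ') is called first; ttype is also an instance field, so it has to be read from a tokenizer object rather than from the class. For a line-oriented language, one option is eolIsSignificant(true), which reports line ends as TT_EOL. The class name LineAwareTokens, the output labels, and the VB-like sample input are assumptions for illustration only, and this is not a complete VB tokenizer.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, not part of the original thread.
public class LineAwareTokens {

    // Splits the source into labelled tokens, reporting line ends as "EOL".
    static List<String> tokenize(String source) throws IOException {
        StreamTokenizer st = new StreamTokenizer(new StringReader(source));
        st.eolIsSignificant(true); // line ends become TT_EOL tokens
        List<String> tokens = new ArrayList<>();
        while (st.nextToken() != StreamTokenizer.TT_EOF) {
            switch (st.ttype) {
                case StreamTokenizer.TT_WORD:
                    tokens.add("WORD(" + st.sval + ")");
                    break;
                case StreamTokenizer.TT_NUMBER:
                    tokens.add("NUM(" + st.nval + ")");
                    break;
                case StreamTokenizer.TT_EOL:
                    tokens.add("EOL");
                    break;
                default:
                    // Any other character (for example '=') is returned as its own code.
                    tokens.add("CHAR(" + (char) st.ttype + ")");
            }
        }
        return tokens;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(tokenize("Dim count As Integer\ncount = 42\n"));
    }
}
```

    Running main prints [WORD(Dim), WORD(count), WORD(As), WORD(Integer), EOL, WORD(count), CHAR(=), NUM(42.0), EOL]. Note that the default StreamTokenizer settings also treat ' and " as quote characters and / as a comment character, which would need adjusting before using it on real VB source.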

