  1. #1
    theordore is offline Member
    Join Date
    May 2011
    Posts
    10
    Rep Power
    0

    Default Seeking examples of text processing code

    Hello!
    Can someone point me to a book or a web site that has a lot of tutorial-type examples in the area of text (natural language) processing? Related to this is an exploding field of applications named "text analytics" and some of the major programs here are JAVA-based. Thanks in advance.

  2. #2
    Dark's Avatar
    Dark is offline Senior Member
    Join Date
    Apr 2011
    Location
    Camp Lejuene, North Carolina
    Posts
    643
    Rep Power
    4

    Default

    You know more about this than I do, so why don't you Google it? There is a reason why Google has its own religion now.
    • Use [code][/code] tags when posting code. That way people don't want to stab their eyes out when trying to help you.
    • +Rep people for helpful posts.

  3. #3
    theordore is offline Member
    Join Date
    May 2011
    Posts
    10
    Rep Power
    0

    Default

    Thanks. I was hoping to find a new colleague who is already "into" this class of JAVA-based applications and knows a good place where beginners would find annotated code.

  4. #4
    Dark's Avatar
    Dark is offline Senior Member
    Join Date
    Apr 2011
    Location
    Camp Lejuene, North Carolina
    Posts
    643
    Rep Power
    4

    Default

    Well, before you jump into any special area of programming, it's normally a good idea to understand the base language first.

  5. #5
    theordore is offline Member
    Join Date
    May 2011
    Posts
    10
    Rep Power
    0

    Default

    Of course, and I **did** do that before posting my question in this Forum. I guess I am in the wrong place. Cheers!

  6. #6
    JosAH's Avatar
    JosAH is online now Moderator
    Join Date
    Sep 2008
    Location
    Voorschoten, the Netherlands
    Posts
    13,662
    Blog Entries
    7
    Rep Power
    21

    Default

    Quote Originally Posted by theordore View Post
    Hello!
    Can someone point me to a book or a web site that has a lot of tutorial-type examples in the area of text (natural language) processing? Related to this is an exploding field of applications named "text analytics" and some of the major programs here are JAVA-based. Thanks in advance.
    If you want grammar-based parsing of a natural language, have a look at Daniel Sleator's 'link grammar'. Basically it's just a context-free grammar parser with a huge dictionary of words including their inflections. Sorry, no link here, but Google is your friend.

    kind regards,

    Jos
    cenosillicaphobia: the fear for an empty beer glass

  7. #7
    Jodokus's Avatar
    Jodokus is offline Senior Member
    Join Date
    Jan 2011
    Location
    Amsterdam, the Netherlands
    Posts
    230
    Rep Power
    4

    Default

    jtmt - Java Text Mining Toolkit
    It is a long trail of articles on text mining, starting in 2008. It will take a lot of work and a lot of library downloads, but I got most of it working.
    No bug ever had to calculate its fitnessfunction.

  8. #8
    theordore is offline Member
    Join Date
    May 2011
    Posts
    10
    Rep Power
    0

    Default

    Thanks Jos, I will check your link. It will probably be useful if I can get at the source code. Yes, Google is helpful -- I found one site that specializes in string processing with JAVA. Cheers!

  9. #9
    theordore is offline Member
    Join Date
    May 2011
    Posts
    10
    Rep Power
    0

    Default

    RE: jtmt - Java Text Mining Toolkit
    Jodokus, I think that with you I have hit the jackpot; because my primary interest was in finding a colleague who is "into" this type of programming, so that we can communicate. I have been ramping up my JAVA learning (I already write in several other languages) in order to get into the two most well-known open source NLP packages, both of which are JAVA-based: LUCENE and GATE. I am doing this for my org., which will not approve the spending to buy the commercial software being sold by people like IBM and SAS. Their programs, esp. what IBM+SPSS is selling, seem very very powerful but even a one-desk license costs an arm and a leg. So I'm trying to see if we could get done what we need to do by modifying LUCENE (esp.). I'm interested to know whether your toolkit is available in some way, and will send you a follow-up private message.

  10. #10
    Jodokus's Avatar
    Jodokus is offline Senior Member
    Join Date
    Jan 2011
    Location
    Amsterdam, the Netherlands
    Posts
    230
    Rep Power
    4

    Default

    @Theodore,
    I certainly hope I didn't give the impression that it's MY toolkit. It is from a Californian programmer, Sujit Pal, who did a very good job of learning text mining and, in the process, developing code that he published on his weblog.
    When I said "I got it working" I meant reading it, downloading it and searching for the needed libraries. I'm willing to help you, but I'm not "The Sujit Pal" himself. I liked the trail because he develops it over the months, and in my opinion it is impressive code, but I can't judge how it compares with professional software.
    I'll read your message more thoroughly tomorrow to understand your intentions better.

  11. #11
    theordore is offline Member
    Join Date
    May 2011
    Posts
    10
    Rep Power
    0

    Default

    Thank you for your clarification, Jodokus. I look forward to hearing from you again once you have had a chance to reflect on the particular application that I have in mind.

    I visited Sujit Pal's site after sending you my previous message, and looked at the number of files. It is an impressive collection; but as someone coming into the field preoccupied with the practical application I have in mind, I think I would need a lot of introductory documentation in order to collect and properly link the modules he has that would be useful for my work. Unfortunately, he says that he has paid very little attention to documentation beyond a lot of clarifying comments inserted into his code, and trying to make progress by working with these comments would represent too great a time investment for me.

    At the same time, I feel that I represent a class of professionals and small-business people that constitutes an important opportunity for people like Sujit, if he's inclined to be entrepreneurial. Practical applications of natural language processing are "sprouting all over the place", and not only for the big companies in biotechnology and large-scale consumer marketing that are supporting the related software work being done by giants like IBM and SAS.

    People like me have a decent shot at gaining benefits from such applications, since we are already reasonably tech-savvy; but they have to be packaged and delivered to us with appropriate documentation (about organizing input data, for example) and "buttons for us to press", so that we can quickly use a program to go through a vast block of electronic text, and come back out with information that can help us in our work.

    The time demands of our work are such that there is no chance for us to dig as deeply into the software design dimension as Sujit has, so if he wants to be successfully entrepreneurial in dealing with us he needs to meet us part-way between where he is as a software developer and where we are as people needing applications of natural language processing.

    Cheers!

  12. #12
    Jodokus's Avatar
    Jodokus is offline Senior Member
    Join Date
    Jan 2011
    Location
    Amsterdam, the Netherlands
    Posts
    230
    Rep Power
    4

    Default

    A warning first: it is considered very bad behaviour to take a discussion offline (away from the forum), especially one-sidedly. The normal reaction is to publish everything right away in the open, which is what I intend to do next time
    (and I'm not guaranteeing that I won't this time, if it becomes relevant in my opinion). I definitely don't plan to discuss anything outside the forum.

    Quote Originally Posted by theordore
    People like me have a decent shot at gaining benefits from such applications, since we are already reasonably tech-savvy; but they have to be packaged and delivered to us with appropriate documentation (about organizing input data, for example) and "buttons for us to press", so that we can quickly use a program to go through a vast block of electronic text, and come back out with information that can help us in our work.

    The time demands of our work are such that there is no chance for us to dig as deeply into the software design dimension as Sujit has, so if he wants to be successfully entrepreneurial in dealing with us he needs to meet us part-way between where he is as a software developer and where we are as people needing applications of natural language processing.
    This sounds rather fishy and rude to me. There is no reason to lecture "people like Sujit" how to approach "people like you". They are a rare breed of people to be grateful to, publishing their effort in the open for everyone to learn and use.
    But if you think he would be willing to be "entrepreneurial with "us"" just send him a mail, you might be successful.
    Last edited by Jodokus; 05-23-2011 at 09:41 AM. Reason: spelling

  13. #13
    theordore is offline Member
    Join Date
    May 2011
    Posts
    10
    Rep Power
    0

    Default

    RE: A warning first: it is considered very bad behaviour to take a discussion offline (away from the forum) ...
    First, I originated the discussion. Second, the Forum software clearly provides for a private message. Third, are we to believe that people either publish all their thoughts or say nothing -- there is no room for a private discussion? You've got to be joking.

  14. #14
    Jodokus's Avatar
    Jodokus is offline Senior Member
    Join Date
    Jan 2011
    Location
    Amsterdam, the Netherlands
    Posts
    230
    Rep Power
    4

    Default

    No, I'm not joking. It is irrelevant that you started the discussion. A thread is also there to be followed by other users who are trying to find solutions and to learn. It is impolite to leave them wondering what people are talking about because those people are keeping information to themselves.

    That a tool exists is no argument for using it any way you want. I could use it, for example, to ask certain people to help me find that recent thread where somebody was ridiculed to the ground (and his code published) because he sent lots of sensitive code as a private message. But then my private message would not be informative for others.
    (????? Please help me find it or I send you a private message ?????)

    You can also see it from the perspective of the receiver (me in this case). On a forum I'm free to answer when it suits me; when it doesn't, other people take over. Now I'm lured into a kind of personal commitment I didn't ask for. When I get to know somebody, he (or she!) and I can decide to start something beautiful and leave the forum wondering what we are doing.

    One compliment: you could have opted for "duck and cover", but you chose confrontation.
    Last edited by Jodokus; 05-24-2011 at 12:16 PM. Reason: more questionmarks
