    julia1004

    Graduate Research using NER

    Hello!

    I am currently a graduate student at a university in Pittsburgh, PA.

    I am trying to use a Named Entity Recognizer (NER) with JGAAP, a free program that analyzes text for authorship attribution. My project uses the NER from Stanford University to 1) locate each named entity and 2) return the words an author uses immediately before and after it. The goal is to see whether authors differ in the words they frequently place before and after named entities and, if so, whether those before/after words can help determine which author wrote a given piece of work.
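A minimal sketch of that before/after counting, working on tokens that have already been labeled. It does not call the Stanford NER: the `Token` class, the use of `O` as the label for non-entities, and the `B:`/`A:` key prefixes are assumptions made only for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class ContextWordCounter {

    /** A word paired with its label; "O" is assumed to mean "not an entity". */
    static final class Token {
        final String word;
        final String label;

        Token(String word, String label) {
            this.word = word;
            this.label = label;
        }

        boolean isEntity() {
            return !"O".equals(label);
        }
    }

    /** Counts the words directly before ("B:") and after ("A:") each entity token. */
    static Map<String, Integer> countContext(List<Token> sentence) {
        Map<String, Integer> counts = new TreeMap<>();
        for (int j = 0; j < sentence.size(); j++) {
            if (!sentence.get(j).isEntity()) {
                continue;
            }
            if (j > 0) {
                counts.merge("B:" + sentence.get(j - 1).word, 1, Integer::sum);
            }
            if (j < sentence.size() - 1) {
                counts.merge("A:" + sentence.get(j + 1).word, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<Token> sentence = Arrays.asList(
                new Token("said", "O"), new Token("Darcy", "PERSON"),
                new Token("to", "O"), new Token("Elizabeth", "PERSON"),
                new Token("gravely", "O"));
        System.out.println(countContext(sentence));
    }
}
```

Running `countContext` over every sentence of each author's documents and comparing the resulting frequency tables is one way the per-author comparison could start.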

    I am fairly new to Java programming. I have taken two courses and done a few other small projects, but I am not sure where my current code is failing.

    I have attached two text files: the first contains the code for recognizing the named entities in a work and returning them as a list, and the second contains the code for returning the words an author uses before and after each entity, with a frequency count.

    Currently, the first file prints null for every word: I cannot get the NER to recognize any named entity. The same problem then breaks the second piece of code.

    I have been working on this project for a few weeks and am stumped as to where to go next. If anyone has any input, please respond.

    Thank you very much for your time and assistance.


    FIRST CODE:
    Java Code:
    package com.jgaap.eventDrivers;
    
    import edu.stanford.nlp.ie.AbstractSequenceClassifier;
    import edu.stanford.nlp.ie.crf.CRFClassifier;
    import edu.stanford.nlp.ling.CoreLabel;
    
    import java.util.List;
    
    import com.jgaap.generics.Document;
    import com.jgaap.generics.EventDriver;
    import com.jgaap.generics.EventGenerationException;
    import com.jgaap.generics.EventSet;
    import com.jgaap.generics.Event;
    
    public class StanfordNamedEntityRecognizer extends EventDriver {
    
    	private volatile AbstractSequenceClassifier<CoreLabel> classifier;
    
    	@Override
    	public String displayName() {
    		return "Stanford Named Entity Recognizer";
    	}
    
    	@Override
    	public String tooltipText() {
    		return "A Named Entity Recognizer developed by the Stanford NLP Group http://nlp.stanford.edu";
    	}
    
    	@Override
    	public boolean showInGUI() {
    		return true;
    	}
    
    	@Override
    	synchronized public EventSet createEventSet(Document doc)
    			throws EventGenerationException {
    		EventSet eventSet = new EventSet();
    //		String serializedClassifier = "/com/jgaap/resources/models/ner/english.all.3class.distsim.crf.ser.gz";  original classifier
    		String serializedClassifier = "/com/jgaap/resources/models/ner/english.muc.7class.distsim.crf.ser.gz";  // Runs with this one too.  Still no output besides list of null
    		if (classifier == null)  
    			synchronized (this) {
    				if (classifier == null) {   
    					try {
    						classifier = CRFClassifier.getJarClassifier(serializedClassifier, null);
    					} catch (Exception e) {
    						e.printStackTrace();
    						throw new EventGenerationException(
    								"Classifier failed to load");
    					}
    				}
    			}
    		
    		String fileContents = doc.stringify();
    		List<List<CoreLabel>> out = classifier.classify(fileContents);
    		for (List<CoreLabel> sentence : out) {
    			for (CoreLabel word : sentence) {
    				System.out.println(word.ner());  // Added to see if it is finding any words.  Everything is being returned null.
    				if (word.ner() != null) {
    					eventSet.addEvent(new Event(word.word()));
    					System.out.println(word.toString() + "\t" + word.word()
    							+ "\t" + word.ner());
    				}
    			}
    		}
    		return eventSet;
    	}
    
    }
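One possible cause of the all-null output, offered as a hypothesis rather than a confirmed diagnosis: `classify` may store each token's label under `CoreAnnotations.AnswerAnnotation`, while `ner()` reads a different key that `classify` leaves unset. If that is the case, reading `word.get(CoreAnnotations.AnswerAnnotation.class)` instead of `word.ner()`, and skipping the background label `O`, would be worth trying. The sketch below only imitates that situation with a plain `Map` per token; the key names and the `labelOf` helper are hypothetical stand-ins, not Stanford NLP API.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class AnnotationKeyDemo {

    // Hypothetical key names standing in for the two annotation keys on a token.
    static final String ANSWER_KEY = "answer";
    static final String NER_KEY = "ner";

    /** Builds a token the way the classifier is assumed to: label stored under ANSWER_KEY only. */
    static Map<String, String> token(String word, String label) {
        Map<String, String> t = new HashMap<>();
        t.put("word", word);
        t.put(ANSWER_KEY, label);
        return t;
    }

    /** Returns the entity label, or null when the token carries the background label "O". */
    static String labelOf(Map<String, String> token) {
        String label = token.get(ANSWER_KEY);
        return (label == null || "O".equals(label)) ? null : label;
    }

    public static void main(String[] args) {
        List<Map<String, String>> sentence = Arrays.asList(
                token("Julia", "PERSON"), token("studies", "O"),
                token("in", "O"), token("Pittsburgh", "LOCATION"));
        for (Map<String, String> t : sentence) {
            // Reading NER_KEY yields null for every token because nothing was stored there;
            // labelOf() reads the key that actually holds the label.
            System.out.println(t.get("word") + "\t" + t.get(NER_KEY) + "\t" + labelOf(t));
        }
    }
}
```

Note that with this reading every token gets a non-null label, so the `!= null` test alone would no longer separate entities from ordinary words; the comparison against `O` does that job.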
    SECOND CODE:

    Java Code:
    package com.jgaap.eventDrivers;
    
    import edu.stanford.nlp.ie.AbstractSequenceClassifier;
    import edu.stanford.nlp.ie.crf.CRFClassifier;
    import edu.stanford.nlp.ling.CoreLabel;
    
    import java.util.List;
    
    import com.jgaap.generics.Document;
    import com.jgaap.generics.Event;
    import com.jgaap.generics.EventDriver;
    import com.jgaap.generics.EventGenerationException;
    import com.jgaap.generics.EventSet;
    
    public class WordsBeforeAfterNamedEntities extends EventDriver {
    
    	private volatile AbstractSequenceClassifier<CoreLabel> classifier;
    
    	@Override
    	public String displayName() {
    		return "Words Before and After Named Entities";
    	}
    
    	@Override
    	public String tooltipText() {
    		return "Counts the words used before and after named entities";
    	}
    
    	@Override
    	public boolean showInGUI() {
    		return true;
    	}
    
    	@Override
    	public EventSet createEventSet(Document doc)
    			throws EventGenerationException {
    		EventSet eventSet = new EventSet();
    		String serializedClassifier = "/com/jgaap/resources/models/ner/english.all.3class.distsim.crf.ser.gz";
    		if (classifier == null)
    			synchronized (this) {
    				if (classifier == null) {
    					try {
    						classifier = CRFClassifier.getJarClassifier(
    								serializedClassifier, null);
    					} catch (Exception e) {
    						e.printStackTrace();
    						throw new EventGenerationException(
    								"Classifier failed to load");
    					}
    				}
    			}
    		String fileContents = doc.stringify();
    		List<List<CoreLabel>> out = classifier.classify(fileContents);
    
    		for (int i = 0; i < out.size(); i++) {
    			for (int j = 0; j < out.get(i).size(); j++) {
    				if (out.get(i).get(j).ner() != null) {
    					if (j > 0) {
    						eventSet.addEvent(new Event("B"
    								+ out.get(i).get(j - 1).word()));
    					}
    					if (j < out.get(i).size() - 1) {
    						eventSet.addEvent(new Event("A"
    								+ out.get(i).get(j + 1).word()));
    					}
    				}
    			}
    		}
    		return eventSet;
    	}
    }
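One case the loop above does not handle is a multi-word entity: for "New York", "New" is reported as the word before "York" and "York" as the word after "New". A possible refinement is to treat a run of consecutive tokens with the same entity label as one span and emit only the words outside it. The sketch works on parallel word and label lists and assumes `O` marks non-entities; `contextEvents` is a hypothetical helper, not part of JGAAP or Stanford NLP.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class EntitySpanContext {

    /** Returns "B"+word and "A"+word events for the words bordering each entity span. */
    static List<String> contextEvents(List<String> words, List<String> labels) {
        List<String> events = new ArrayList<>();
        int j = 0;
        while (j < words.size()) {
            String label = labels.get(j);
            if ("O".equals(label)) {
                j++;
                continue;
            }
            int start = j;
            // Extend the span while the following tokens carry the same entity label.
            while (j + 1 < words.size() && label.equals(labels.get(j + 1))) {
                j++;
            }
            if (start > 0) {
                events.add("B" + words.get(start - 1));
            }
            if (j + 1 < words.size()) {
                events.add("A" + words.get(j + 1));
            }
            j++;
        }
        return events;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList(
                "She", "left", "New", "York", "for", "Paris", "yesterday");
        List<String> labels = Arrays.asList(
                "O", "O", "LOCATION", "LOCATION", "O", "LOCATION", "O");
        System.out.println(contextEvents(words, labels));
    }
}
```

Two different entities of the same type that sit directly next to each other would be merged into one span by this rule, so it is a simplification rather than a full entity chunker.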
    Last edited by JosAH; 11-21-2012 at 04:26 PM. Reason: added [code] ... [/code] tags
