Thread: Graduate Research using NER
- 11-20-2012, 09:07 PM #1
Hello!
I am currently a graduate student at a university in Pittsburgh, PA.
I am trying to use a named entity recognizer (NER) in JGAAP, a free program that analyzes text for authorship attribution. My project uses the NER from Stanford University to 1) locate each named entity, and 2) return the words an author uses immediately before and after it. The goal is to see whether authors differ in the words they frequently use before/after named entities and, if so, whether analysis of those before/after words can determine which author wrote a given piece of work.
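To make the goal concrete, here is a minimal sketch of step 2 on a hand-tagged sentence. It does not call the Stanford NER or JGAAP: the class name `ContextWordCounter`, the sample sentence, and its tags are made up for illustration, and it assumes that tokens outside any entity carry the background tag `O`. It also assumes a multi-word entity such as "John Smith" should count as one entity, so only the word before its first token and the word after its last token are recorded.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class ContextWordCounter {
    // Assumed tag for tokens that are not part of any named entity.
    static final String BACKGROUND = "O";

    // Counts the word directly before ("B" prefix) and directly after ("A" prefix)
    // each run of consecutive entity tokens in a single sentence.
    // Adjacent entities of different types are treated as one run in this sketch.
    static Map<String, Integer> countContextWords(List<String> words, List<String> tags) {
        Map<String, Integer> counts = new TreeMap<>();
        int i = 0;
        while (i < words.size()) {
            if (BACKGROUND.equals(tags.get(i))) {
                i++;
                continue;
            }
            int start = i; // first token of the entity
            while (i < words.size() && !BACKGROUND.equals(tags.get(i))) {
                i++;
            }
            // i now points just past the last token of the entity
            if (start > 0) {
                counts.merge("B" + words.get(start - 1), 1, Integer::sum);
            }
            if (i < words.size()) {
                counts.merge("A" + words.get(i), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList(
                "Yesterday", "Professor", "John", "Smith", "visited", "Pittsburgh", "again");
        List<String> tags = Arrays.asList(
                "O", "O", "PERSON", "PERSON", "O", "LOCATION", "O");
        // prints {Aagain=1, Avisited=1, BProfessor=1, Bvisited=1}
        System.out.println(countContextWords(words, tags));
    }
}
```

In the real event driver the words and tags would come from the classifier output, one sentence at a time, and the counts would become JGAAP events instead of map entries.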
I am fairly new to Java programming (I have taken two courses and done a few small projects), and I am not sure where my current code is failing.
I have attached two text files: the first contains the code for 1) recognizing the named entities in a work and returning them as a list, and the second contains the code for 2) returning the words an author uses before and after each entity, with a frequency count.
Currently, the first file returns null for every word; I cannot get the NER to recognize a single named entity. This in turn breaks the second piece of code.
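One way to narrow this down is to tally the labels that come back, which separates "every label is null" from "labels are present but all background". The sketch below is library-free: the `LabelTally` class and the sample label lists are made up for illustration, and `O` as the background tag is assumed from the Stanford NER output format. One thing worth trying (an assumption, not confirmed for this JGAAP setup) is to feed it labels read with `word.get(CoreAnnotations.AnswerAnnotation.class)`, the accessor used in the Stanford NER demo, instead of `word.ner()`.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class LabelTally {
    // Counts how often each label occurs; a null label is counted under "<null>".
    static Map<String, Integer> tally(List<String> labels) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String label : labels) {
            String key = (label == null) ? "<null>" : label;
            counts.merge(key, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // What the current output looks like: no label was ever set.
        List<String> allNull = Arrays.asList(null, null, null);
        System.out.println(tally(allNull)); // prints {<null>=3}

        // Hypothetical healthy output: mostly background, a few entity tags.
        List<String> mixed = Arrays.asList("O", "PERSON", "PERSON", "O", "LOCATION", "O");
        System.out.println(tally(mixed)); // prints {LOCATION=1, O=3, PERSON=2}
    }
}
```

If the tally shows `O` for ordinary words once labels do come back, a `!= null` test would accept every token, so the entity check would need to exclude the background tag as well.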
I have been working on this project for a few weeks and am stumped as to where to go next. If anyone has any input, please respond.
Thank you very much for your time and assistance.
FIRST CODE:
Java Code:
package com.jgaap.eventDrivers;

import edu.stanford.nlp.ie.AbstractSequenceClassifier;
import edu.stanford.nlp.ie.crf.*;
import edu.stanford.nlp.ling.CoreLabel;

import java.util.List;

import com.jgaap.generics.Document;
import com.jgaap.generics.EventDriver;
import com.jgaap.generics.EventGenerationException;
import com.jgaap.generics.EventSet;
import com.jgaap.generics.Event;

public class StanfordNamedEntityRecognizer extends EventDriver {

    private volatile AbstractSequenceClassifier<CoreLabel> classifier;

    @Override
    public String displayName() {
        return "Stanford Named Entity Recognizer";
    }

    @Override
    public String tooltipText() {
        return "A Named Entity Recognizer developed by the Stanford NLP Group http://nlp.stanford.edu";
    }

    @Override
    public boolean showInGUI() {
        return true;
    }

    @Override
    synchronized public EventSet createEventSet(Document doc) throws EventGenerationException {
        EventSet eventSet = new EventSet();
        // String serializedClassifier = "/com/jgaap/resources/models/ner/english.all.3class.distsim.crf.ser.gz"; original classifier
        String serializedClassifier = "/com/jgaap/resources/models/ner/english.muc.7class.distsim.crf.ser.gz"; // Runs with this one too. Still no output besides list of null
        if (classifier == null)
            synchronized (this) {
                if (classifier == null) {
                    try {
                        classifier = CRFClassifier.getJarClassifier(serializedClassifier, null);
                    } catch (Exception e) {
                        e.printStackTrace();
                        throw new EventGenerationException("Classifier failed to load");
                    }
                }
            }
        String fileContents = doc.stringify();
        List<List<CoreLabel>> out = classifier.classify(fileContents);
        for (List<CoreLabel> sentence : out) {
            for (CoreLabel word : sentence) {
                System.out.println(word.ner()); // Added to see if it is finding any words. Everything is being returned null.
                if (word.ner() != null) {
                    eventSet.addEvent(new Event(word.word()));
                    System.out.println(word.toString() + "\t" + word.word() + "\t" + word.ner());
                }
            }
        }
        return eventSet;
    }
}
SECOND CODE:
Java Code:
package com.jgaap.eventDrivers;

import edu.stanford.nlp.ie.AbstractSequenceClassifier;
import edu.stanford.nlp.ie.crf.CRFClassifier;
import edu.stanford.nlp.ling.CoreLabel;

import java.util.List;

import com.jgaap.generics.Document;
import com.jgaap.generics.Event;
import com.jgaap.generics.EventDriver;
import com.jgaap.generics.EventGenerationException;
import com.jgaap.generics.EventSet;

public class WordsBeforeAfterNamedEntities extends EventDriver {

    private volatile AbstractSequenceClassifier<CoreLabel> classifier;
    String serializedClassifier = "com.jgaap.generics.Document";

    @Override
    public String displayName() {
        return "Words Before and After Named Entities";
    }

    @Override
    public String tooltipText() {
        return "Counts the words used before and after named entities";
    }

    @Override
    public boolean showInGUI() {
        return true;
    }

    @Override
    public EventSet createEventSet(Document doc) throws EventGenerationException {
        EventSet eventSet = new EventSet();
        String serializedClassifier = "/com/jgaap/resources/models/ner/english.all.3class.distsim.crf.ser.gz";
        if (classifier == null)
            synchronized (this) {
                if (classifier == null) {
                    try {
                        classifier = CRFClassifier.getJarClassifier(serializedClassifier, null);
                    } catch (Exception e) {
                        e.printStackTrace();
                        throw new EventGenerationException("Classifier failed to load");
                    }
                }
            }
        String fileContents = doc.stringify();
        List<List<CoreLabel>> out = classifier.classify(fileContents);
        for (int i = 0; i < out.size(); i++) {
            for (int j = 0; j < out.get(i).size(); j++) {
                if (out.get(i).get(j).ner() != null) {
                    if (j > 0) {
                        eventSet.addEvent(new Event("B" + out.get(i).get(j - 1).word()));
                    }
                    if (j < out.get(i).size() - 1) {
                        eventSet.addEvent(new Event("A" + out.get(i).get(j + 1).word()));
                    }
                }
            }
        }
        return eventSet;
    }
}
Last edited by JosAH; 11-21-2012 at 04:26 PM. Reason: added [code] ... [/code] tags