SDK NLP

André Santos edited this page Nov 20, 2016 · 2 revisions

Users can apply NLP processing techniques to a document. The SDK supports five parsing levels: tokenization, part-of-speech (POS) tagging, lemmatization, chunking, and dependency parsing.

The following source code snippet shows how to build a processing pipeline, apply NLP processing techniques to a document, and retrieve the results, using the data provided in the "example" folder.

// Set files
String documentFile = "example/annotate/in/22528326.txt";
String outputFile = "example/annotate/out/22528326.conl";

// Create reader
Reader reader = new RawReader();

// Parsing level (uncomment the desired level)
//ParserLevel parsingLevel = ParserLevel.TOKENIZATION;
//ParserLevel parsingLevel = ParserLevel.POS;
//ParserLevel parsingLevel = ParserLevel.LEMMATIZATION;
ParserLevel parsingLevel = ParserLevel.CHUNKING;
//ParserLevel parsingLevel = ParserLevel.DEPENDENCY;

// Create parser
Parser parser = new GDepParser(ParserLanguage.ENGLISH, parsingLevel, 
new LingpipeSentenceSplitter(), false).launch();

// Create NLP module
NLP nlp = new NLP(parser);

// Create writer
Writer writer = new CoNLLWriter();

// Set document stream
InputStream documentStream = new FileInputStream(documentFile);

// Run pipeline to get annotations
Pipeline pipeline = new DefaultPipeline()
       .add(reader)
       .add(nlp)
       .add(writer);

OutputStream outputStream = pipeline.run(documentStream).get(0);

// Write annotations to output file
FileUtils.writeStringToFile(new File(outputFile), outputStream.toString());

// Close streams
documentStream.close();
outputStream.close();

// Close parser
parser.close();
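The `CoNLLWriter` emits the annotations in a CoNLL-style layout: one token per line with tab-separated columns, and a blank line between sentences (the exact column set depends on the chosen parsing level). As a minimal, hedged sketch of post-processing that output — `ConllStats` and `countTokens` are hypothetical helpers for illustration, not part of the SDK — the file can be inspected with plain Java:

```java
import java.util.Arrays;

public class ConllStats {

    // Count tokens in CoNLL-style output: every non-blank line is one token,
    // and blank lines only separate sentences.
    static int countTokens(String conll) {
        return (int) Arrays.stream(conll.split("\n"))
                .filter(line -> !line.trim().isEmpty())
                .count();
    }

    public static void main(String[] args) {
        // Illustrative two-sentence fragment; the columns shown
        // (index, token, lemma, POS, chunk) are an assumption.
        String sample = "1\tInhibition\tinhibition\tNN\tB-NP\n"
                + "2\tof\tof\tIN\tB-PP\n"
                + "\n"
                + "1\tNF-kappaB\tnf-kappab\tNN\tB-NP\n";
        System.out.println(countTokens(sample)); // prints 3
    }
}
```

Reading the file produced above (e.g. with `FileUtils.readFileToString`) and passing its contents to such a helper is a quick sanity check that the pipeline annotated every token.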