SDK NLP
Users can apply NLP processing techniques to a document. Five parsing levels are supported: tokenization, part-of-speech (POS) tagging, lemmatization, chunking, and dependency parsing.
The following source code snippet shows how to apply NLP processing to a document and retrieve the results by creating a processing pipeline, using the data provided in the "example" folder.
// Set files
String documentFile = "example/annotate/in/22528326.txt";
String outputFile = "example/annotate/out/22528326.conl";
// Create reader
Reader reader = new RawReader();
// Parsing level (uncomment the desired level; only one should be active)
//ParserLevel parsingLevel = ParserLevel.TOKENIZATION;
//ParserLevel parsingLevel = ParserLevel.POS;
//ParserLevel parsingLevel = ParserLevel.LEMMATIZATION;
ParserLevel parsingLevel = ParserLevel.CHUNKING;
//ParserLevel parsingLevel = ParserLevel.DEPENDENCY;
// Create parser
Parser parser = new GDepParser(ParserLanguage.ENGLISH, parsingLevel,
new LingpipeSentenceSplitter(), false).launch();
// Create NLP
NLP nlp = new NLP(parser);
// Create writer
Writer writer = new CoNLLWriter();
// Set document stream
InputStream documentStream = new FileInputStream(documentFile);
// Run pipeline to get annotations
Pipeline pipeline = new DefaultPipeline()
.add(reader)
.add(nlp)
.add(writer);
OutputStream outputStream = pipeline.run(documentStream).get(0);
// Write annotations to output file
FileUtils.writeStringToFile(new File(outputFile), outputStream.toString());
// Close streams
documentStream.close();
outputStream.close();
// Close parser
parser.close();
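The same pipeline components can also be reused to annotate several documents in one go. The sketch below is a minimal illustration of that idea, assuming the pipeline can be re-run once per input stream; the folder layout and the file-name filter are illustrative, and exception handling is omitted as in the snippet above.
// Minimal batch-processing sketch (assumption: the pipeline can be re-run per document)
File inputFolder = new File("example/annotate/in/");
File outputFolder = new File("example/annotate/out/");
// Create parser and pipeline once, reusing the same components as above
Parser parser = new GDepParser(ParserLanguage.ENGLISH, ParserLevel.CHUNKING,
        new LingpipeSentenceSplitter(), false).launch();
Pipeline pipeline = new DefaultPipeline()
        .add(new RawReader())
        .add(new NLP(parser))
        .add(new CoNLLWriter());
// Annotate every ".txt" file in the input folder
for (File documentFile : inputFolder.listFiles((dir, name) -> name.endsWith(".txt"))) {
    InputStream documentStream = new FileInputStream(documentFile);
    OutputStream outputStream = pipeline.run(documentStream).get(0);
    // Write the annotations using the input file name with a ".conl" extension
    String outputName = documentFile.getName().replace(".txt", ".conl");
    FileUtils.writeStringToFile(new File(outputFolder, outputName), outputStream.toString());
    // Close streams for this document
    documentStream.close();
    outputStream.close();
}
// Close parser after all documents are processed
parser.close();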