An efficient language-free distributional representation for the projection of named entities from a resource-rich language to low-resource languages. Check out the paper for more details.
- The `data` folder contains the annotations of the corpus using Brat.
- The `result` folder contains the results (F1, Precision, Recall) by tag for the 10 repetitions of the training. The file [results.csv](/~https://github.com/frankl1/Word2vec-For-NER-In-Low-Resource-Languages/blob/master/results.csv) contains the results averaged over the 10 repetitions.
- The Jupyter notebook `brat2corpus.ipynb` converts the Brat annotations into two parallel corpora: `corpus-ewo.txt` (the Ewondo side) and `corpus-en.txt` (the English side). Each line in these files is a token with its associated annotation, and blank lines denote the end of a phrase (a verse in our case). The two files are aligned so that a phrase and its translation are at the same position (see the reading sketch after this list).
- The Jupyter notebooks `fnn-ne-projection-corpus-based-tf.ipynb`, `fnn-ne-projection-phrase-based.ipynb`, and `fnn-ne-projection-side-based.ipynb` contain the source code used to train the model and compute its performance using, respectively, the corpus-based, the phrase-based, and the side-based term frequency (one possible reading of these variants is sketched below).
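To illustrate the corpus format described above, here is a minimal reading sketch. The file names come from this repository, but the assumption that each non-blank line is a whitespace-separated `token annotation` pair is ours and may differ from the actual layout.

```python
# Minimal sketch: read an annotated corpus file into a list of phrases,
# where each phrase is a list of (token, tag) pairs.
# Assumption (not verified against the repo): each non-blank line is
# "token tag" separated by whitespace, and blank lines separate phrases.

def read_corpus(path):
    phrases, current = [], []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:  # blank line marks the end of the current phrase
                if current:
                    phrases.append(current)
                    current = []
            else:
                parts = line.split()
                current.append((parts[0], parts[-1]))
    if current:  # keep the last phrase if the file lacks a trailing blank line
        phrases.append(current)
    return phrases

ewo = read_corpus("corpus-ewo.txt")
en = read_corpus("corpus-en.txt")
assert len(ewo) == len(en)  # the two files are phrase-aligned
```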
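This README does not define the three term-frequency variants, so the sketch below shows only one plausible interpretation, purely for illustration: corpus-based counts a token over both sides of the parallel corpus, side-based counts it within a single language side, and phrase-based counts it within one phrase. The actual definitions are in the notebooks and in the paper.

```python
# Purely illustrative: three possible term-frequency flavours.
# These are our assumptions, not the definitions used in the notebooks.
from collections import Counter

def flatten(phrases):
    """All tokens of one side, given phrases as lists of (token, tag) pairs."""
    return [token for phrase in phrases for token, _tag in phrase]

def corpus_tf(token, ewo_phrases, en_phrases):
    """Frequency of `token` over the whole parallel corpus (both sides)."""
    tokens = flatten(ewo_phrases) + flatten(en_phrases)
    return Counter(tokens)[token] / len(tokens)

def side_tf(token, side_phrases):
    """Frequency of `token` within one language side only."""
    tokens = flatten(side_phrases)
    return Counter(tokens)[token] / len(tokens)

def phrase_tf(token, phrase):
    """Frequency of `token` within a single phrase."""
    tokens = [t for t, _tag in phrase]
    return Counter(tokens)[token] / len(tokens)
```

For example, with the `ewo` phrases loaded by the previous sketch, `phrase_tf(ewo[0][0][0], ewo[0])` gives the frequency of the first token of the first Ewondo phrase within that phrase.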
After cloning this repository, conda can be used to install all the dependencies with the command `conda env create -f requirements.yml`.
If you use this work, please cite the paper:

@article{mbouopda2020named,
  title={Named Entity Recognition in Low-resource Languages using Cross-lingual Distributional Word Representation},
  author={Mbouopda, Michael Franklin and Melatagia Yonta, Paulin},
  journal={Revue Africaine de la Recherche en Informatique et Math{\'e}matiques Appliqu{\'e}es},
  volume={33},
  year={2020},
  publisher={Episciences.org}
}