spaCy ● neuralcoref by huggingface ● Neo4j DB ● MongoDB Atlas ● NetworkX
- Introduction
- Setup Instructions
- Getting Started
- Abstract Idea and Problem
- Tradeoff between RAM size and DB access time with the solution
- Implementation
- Future Development
- Results
- License
- Acknowledgements
This repo analyzes Wikipedia pages to extract the crux of the information they contain, which can make web searches a lot easier. I am using the opportunity to mine the massive amount of information that Wikipedia offers on famous personalities from around the world.
I have used the knowledge graph technique to analyze the text and discover patterns and trends. This repo uses spaCy for natural language processing, along with its pretrained language model (English text only) for named entity recognition (NER) and extraction. It also uses neuralcoref, the "Fast Coreference Resolution in spaCy with Neural Networks" package, to resolve coreferences (for example, a pronoun that refers back to a person) while extracting named entities. The extracted knowledge graph is finally stored in the Neo4j graph database for better visualization of the links between the collected pieces of information.
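The final step described above, storing the extracted links in Neo4j, can be sketched roughly as follows. This is a minimal illustration and not the repo's actual code: the `Triple` alias, the `relation_type` and `to_cypher` helpers, the `Entity` node label, and the sample triple are all hypothetical names chosen for the example. It only builds parameterised Cypher `MERGE` statements as strings, so it runs without a database.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical shape for one extracted fact: (subject, relation, object).
Triple = Tuple[str, str, str]


def relation_type(label: str) -> str:
    """Turn free text into a relationship type made of letters, digits, underscores."""
    cleaned = "".join(ch if ch.isalnum() else "_" for ch in label.strip()).upper()
    if cleaned and cleaned[0].isdigit():
        # Avoid a type name that starts with a digit.
        cleaned = "R_" + cleaned
    return cleaned or "RELATED_TO"


def to_cypher(triples: Iterable[Triple]) -> List[Tuple[str, Dict[str, str]]]:
    """Build one (query, parameters) pair per triple.

    MERGE is used so that re-running the import does not duplicate
    nodes or relationships that already exist.
    """
    statements = []
    for subj, rel, obj in triples:
        query = (
            "MERGE (a:Entity {name: $subj}) "
            "MERGE (b:Entity {name: $obj}) "
            f"MERGE (a)-[:{relation_type(rel)}]->(b)"
        )
        statements.append((query, {"subj": subj, "obj": obj}))
    return statements


if __name__ == "__main__":
    for query, params in to_cypher([("Ada Lovelace", "born in", "London")]):
        print(query, params)
```

Entity names are passed as query parameters rather than interpolated into the string. Relationship types cannot be parameterised in Cypher, which is why the relation text is sanitised first. With the official `neo4j` Python driver, each pair could then be passed to `session.run(query, params)`.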
- Clone this repo:
$ git clone https://github.com/Dhyeythumar/Knowledge-Graph-with-Neo4j.git
- Create and activate a Python virtual environment (use Python 3.8; the activation command shown is for Windows):
$ virtualenv KG_env -p path/to/your/python/3.8/exe/file
$ KG_env\Scripts\activate
- Install the requirements:
$ cd Knowledge-Graph-with-Neo4j
$ pip install -r requirements.txt
- Install spaCy's English language model:
$ python -m spacy download en_core_web_md
- Build neuralcoref from source (the package on PyPI is not compatible with spaCy's language model):
$ git clone https://github.com/huggingface/neuralcoref.git
$ cd neuralcoref
$ pip install -r requirements.txt
$ pip install -e .
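Once the steps above are done, a quick sanity check can confirm that the key packages are importable from the active virtual environment. This is an optional sketch, not part of the repo: the `check_modules` helper and the `REQUIRED` tuple are hypothetical, and the list covers only the packages named in the setup steps, not everything in `requirements.txt`.

```python
import importlib.util
import sys
from typing import Dict, Iterable

# Import names of the packages installed in the setup steps (assumed list).
REQUIRED = ("spacy", "neuralcoref", "en_core_web_md")


def check_modules(names: Iterable[str]) -> Dict[str, bool]:
    """Report, for each top-level module name, whether it can be found."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    print("Python", sys.version.split()[0])
    for name, found in check_modules(REQUIRED).items():
        print(f"{name:<16} {'found' if found else 'MISSING'}")
```

`find_spec` only locates a module without importing it, so the check stays fast and does not load the language model.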
Run the project by executing:
$ python main.py
With the above setup, you are good to go ✌.
If you want a deeper understanding of how the project is implemented and which trade-offs I faced while developing it 😀, check this README file. The remaining sections from the table of contents are covered in that file.
Licensed under the MIT License.
- Medium article on "Auto-Generated Knowledge Graphs" for Knowledge Graph implementation and visualization using networkx.