Creating knowledge graphs by scraping wiki pages and storing data in the Neo4j Graph DB.

dhyeythumar/Knowledge-Graph-with-Neo4j

Knowledge Graph with Neo4j

Building and analyzing a knowledge graph by scraping Wikipedia pages about famous personalities.

Introduction

This repo analyzes Wikipedia pages to extract their key information, which can make web searches a lot easier. It draws on the large amount of information about famous personalities from around the world that is available on the internet (through Wikipedia pages).

I have used the knowledge graph technique to analyze the data and discover patterns and trends. This repo uses spaCy for natural language processing, together with its pretrained language model (English text only), for named entity recognition (NER) and extraction. It also uses neuralcoref, the "Fast Coreference Resolution in spaCy with Neural Networks" package, to resolve coreferences (for example, a pronoun that refers back to a person) before named entities are extracted. The extracted knowledge graph is finally stored in the graph database Neo4j for better visualization of the links between the collected pieces of information.
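The final storage step described above can be sketched roughly as follows. This is a minimal illustration, not the code used in this repo: it assumes the extracted knowledge is available as (subject, relation, object) triples, and the `Entity` label, the `name` property, the helper names `relation_type` and `triples_to_cypher`, and the sample triple are all hypothetical. It only builds Cypher statements with the standard library, so it runs without a Neo4j server.

```python
import re
from typing import Dict, Iterable, List, Tuple

# Assumed shape of an extracted fact: (subject, relation, object).
Triple = Tuple[str, str, str]


def relation_type(relation: str) -> str:
    """Turn free text such as 'born in' into a safe Cypher relationship type.

    The relationship type is spliced into the query text below, so it is
    restricted to letters, digits and underscores.
    """
    cleaned = re.sub(r"[^0-9A-Za-z]+", "_", relation).strip("_").upper()
    if not cleaned:
        raise ValueError(f"cannot derive a relationship type from {relation!r}")
    if cleaned[0].isdigit():
        cleaned = "REL_" + cleaned
    return cleaned


def triples_to_cypher(triples: Iterable[Triple]) -> List[Tuple[str, Dict[str, str]]]:
    """Build one (query, parameters) pair per triple.

    MERGE is used so that re-running the import does not duplicate nodes or
    relationships. Node names are passed as parameters, not formatted into
    the query string. The 'Entity' label and 'name' property are assumptions.
    """
    statements = []
    for subject, relation, obj in triples:
        query = (
            "MERGE (s:Entity {name: $subject}) "
            "MERGE (o:Entity {name: $object}) "
            f"MERGE (s)-[:{relation_type(relation)}]->(o)"
        )
        statements.append((query, {"subject": subject, "object": obj}))
    return statements


if __name__ == "__main__":
    # Illustrative triple only; not taken from the project's data.
    for query, params in triples_to_cypher([("Marie Curie", "born in", "Warsaw")]):
        print(query, params)
```

With the official `neo4j` Python driver, each pair could then be passed to `session.run(query, params)`; whether this repo connects to Neo4j that way is not stated here, so treat that as one possible option.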

Setup Instructions

  • Clone this repo:
$ git clone https://github.com/Dhyeythumar/Knowledge-Graph-with-Neo4j.git
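  • Create and activate a Python virtual environment (use Python 3.8):
$ virtualenv KG_env -p path/to/your/python/3.8/exe/file
$ KG_env\Scripts\activate
    (The activate command above is for Windows; on Linux or macOS, run "source KG_env/bin/activate" instead.)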
  • Install the requirements:
$ cd Knowledge-Graph-with-Neo4j
$ pip install -r requirements.txt
$ python -m spacy download en_core_web_md
  • Build neuralcoref from source (the package on PyPI is not compatible with spaCy's language model):
$ git clone https://github.com/huggingface/neuralcoref.git
$ cd neuralcoref
$ pip install -r requirements.txt
$ pip install -e .
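
After the steps above, a quick sanity check can confirm that the pieces are visible to the interpreter inside the virtual environment. The script below is a small sketch and not part of this repo: the helper name `check_environment` is hypothetical, and the module names `spacy`, `en_core_web_md` and `neuralcoref` are assumed to be the importable names of the packages installed above.

```python
import importlib.util
import sys
from typing import Dict, Iterable

# Assumed import names for the packages installed in the setup steps.
REQUIRED_MODULES = ("spacy", "en_core_web_md", "neuralcoref")


def check_environment(modules: Iterable[str] = REQUIRED_MODULES) -> Dict[str, bool]:
    """Report which top-level modules can be located, without importing them."""
    return {name: importlib.util.find_spec(name) is not None for name in modules}


if __name__ == "__main__":
    print("Python", ".".join(map(str, sys.version_info[:3])))
    if sys.version_info[:2] != (3, 8):
        print("note: the setup instructions recommend Python 3.8")
    for name, found in check_environment().items():
        print(f"{name}: {'ok' if found else 'MISSING'}")
```

Any module reported as MISSING points back to the corresponding install step.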

Getting Started

Run this project by just executing:

$ python main.py

With the above setup, you are good to go ✌.


If you want a deeper understanding of how the project is implemented and what trade-offs I faced while developing it 😀, check this README file. The remaining sections are covered in that file.


License

Licensed under the MIT License.

Acknowledgements

  1. Medium article on "Auto-Generated Knowledge Graphs" for Knowledge Graph implementation and visualization using networkx.
