The dataset was first scraped from an Arabic book library website, then cleaned of non-Arabic words, numerals, and tags before going through several preprocessing steps. Second, after applying the TF-IDF vectorizer, we trained a Random Forest Classifier (RFC) on the data. Finally, two Question-Answer datasets were create…
Updated Mar 2, 2022 - Jupyter Notebook
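
The cleaning step mentioned in the description (removing tags, numerals, and non-Arabic words) could look roughly like the sketch below. It is only an illustration, not the repository's actual code: the function name `clean_arabic`, the choice to strip diacritics, and the use of the basic Arabic letter range U+0621 to U+064A are all assumptions.

```python
import re

# Hypothetical helper patterns; the repository's real rules may differ.
TAG_RE = re.compile(r"<[^>]+>")                      # HTML/XML-style tags
DIACRITICS_RE = re.compile(r"[\u064B-\u0652\u0640]")  # harakat and tatweel
NON_ARABIC_RE = re.compile(r"[^\u0621-\u064A\s]")     # anything but Arabic letters/whitespace


def clean_arabic(text: str) -> str:
    """Keep only Arabic letters, separated by single spaces."""
    text = TAG_RE.sub(" ", text)            # drop markup first
    text = DIACRITICS_RE.sub("", text)      # remove marks without splitting words
    text = NON_ARABIC_RE.sub(" ", text)     # Latin words, digits, punctuation -> space
    return " ".join(text.split())           # collapse runs of whitespace


if __name__ == "__main__":
    print(clean_arabic("<b>مرحبا</b> hello بالعالم 2022"))
```

Arabic-Indic digits (U+0660 to U+0669) fall outside the kept range, so they are removed along with Western numerals. The cleaned strings would then be the input to the TF-IDF vectorizer mentioned above.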