Error in How to eval KAZU-NER model #1
Hi @Kik099, thank you for your interest in our work. I just found out that I only wrote the sample scripts for the test split; they can be applied to all splits.

`export DATA_DIR=${HOME}/KAZU-NER-exp/BC5CDR_test` # Please use the absolute path to avoid unexpected errors

If you have set it correctly, `ls ${DATA_DIR}` should display the train.tsv, dev.tsv, and test.tsv files. Please run the conversion code for each of the train.tsv, dev.tsv, and test.tsv splits:
Please let me know if this does not work. I will update the README too. Thanks!
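The setup described above can be sketched as follows. The paths are illustrative, and the per-split conversion command itself is the one in the repo README; here it is only echoed as a placeholder:

```shell
# Illustrative only: create a dummy DATA_DIR to show the expected layout.
# In practice this would be ${HOME}/KAZU-NER-exp/BC5CDR_test with real data.
export DATA_DIR="$(mktemp -d)/BC5CDR_test"
mkdir -p "${DATA_DIR}"
touch "${DATA_DIR}/train.tsv" "${DATA_DIR}/dev.tsv" "${DATA_DIR}/test.tsv"

ls "${DATA_DIR}"   # should list: dev.tsv  test.tsv  train.tsv

# Run the conversion once per split (exact flags are in the repo README).
for SPLIT in train dev test; do
  echo "convert ${DATA_DIR}/${SPLIT}.tsv -> ${DATA_DIR}/${SPLIT}.prob_conll"
done
```

The point is simply that the same script invocation is repeated for each of the three split files, not just for test.tsv.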
Hi @wonjininfo, with what you suggested, that error was fixed. As you can see, it started running:
But then, a few minutes later, another error appeared. I put the output in the attached text file, so it is easy to see. Do you know what I need to do? Thanks for your help.
Hi @Kik099, thank you for sharing the log file. I found it a bit challenging to trace the issue at the moment, but I suspect it may be related to the number of labels.

Let's start with the simplest approach. If you're only trying to evaluate the model, and not training it, we can copy the test.prob_conll file and use it to create train.prob_conll and dev.prob_conll. This will give us three identical files with different names. Please try running the process again with these files.

Additionally, could you please share the exact command line you used in the shell and the version of transformers you're working with? This will help me replicate and trace the error. If you're uncomfortable sharing this information here (as this is a publicly open space), feel free to email me.
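The copy step suggested above might look like this; the directory and file contents here are stand-ins, not the real dataset paths:

```shell
# Evaluation-only workaround: reuse the test split under the other two names.
# DATA_DIR is a stand-in for wherever your prob_conll files actually live,
# e.g. ${HOME}/KAZU-NER-exp/BC5CDR_test.
DATA_DIR="$(mktemp -d)"
echo "dummy" > "${DATA_DIR}/test.prob_conll"   # stand-in for the real file

cp "${DATA_DIR}/test.prob_conll" "${DATA_DIR}/train.prob_conll"
cp "${DATA_DIR}/test.prob_conll" "${DATA_DIR}/dev.prob_conll"
ls "${DATA_DIR}"   # three identical files with different names
```

Since the train and dev files are never actually trained on in evaluation-only mode, their contents do not matter beyond satisfying the file-loading code.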
Hi @wonjininfo, I followed your instructions but encountered the same error even with the three identical files under different names. Regarding the libraries I have installed, here is the list: datasets 1.18.3 pypi_0 pypi When I attempted to install torch==1.8.2, here's the error message:
To resolve this, I installed torch==1.8.1 instead. However, this led to another error:
Please let me know how I can resolve this issue. Best regards,
Thanks for sharing. I will work on that and get back to you soon.
I just have another question: I already trained the model, so how do I run it now? I am developing a thesis and will discuss your article there. I have the following files in the directory "_tmp\output\MultiLabelNER-test": all_results.json
Hi @Kik099 ,
Does this mean that the previous error is no longer occurring? Regarding your other question:
Could you please clarify what you mean? I noticed there might be some typos, so I want to make sure I understand your question correctly.
Hi @wonjininfo
Unfortunately, the error is still appearing during the evaluation phase.
Additionally, after training the model, I would like to use it for token classification. How can I input text to the model to obtain token classifications?
Hi @wonjininfo Best, Rodrigo Saraiva
Hi Rodrigo, I spent a few hours resolving the dependency issues and identified some points that needed updating because they no longer worked. I have updated them in the README.
Install: if you use a higher version of Python, you need a different version of tokenizers. If you encounter the error
# Tested torch version: torch==2.1.0 CUDA 12.1
#pip install torch==2.1.0 torchvision==0.16.0 torchaudio==2.1.0 --index-url https://download.pytorch.org/whl/cu121
pip install transformers==4.16.2 tokenizers==0.12 datasets==1.18.3 seqeval==1.2.2

After that, I followed my README and, for testing purposes, copied the development file to the test file.
To input text into the model for token classification, you need to preprocess your text to match the format of the prob_conll files.
Hi @wonjininfo, thank you so much for your reply; I really appreciate it. To summarize: it seems that there was a demo website for testing KAZU (http://kazu.korea.ac.kr/), which is currently not working. On that website, we could input a simple phrase and obtain the token classifications. To achieve this in the KAZU project (/~https://github.com/AstraZeneca/KAZU), we can run code built around `def kazu_test(cfg):`. Running that script produces the token classifications for the words in the text. Are you saying that to achieve this in this project, we need to convert the phrase to the test.prob_conll format? If so, which code should I run: the evaluation code?
For the demo website: it is currently managed by my former colleagues, who are postgraduate students at Korea University. I have asked them to reboot the server.

Regarding the KAZU project, that's a good point; I was primarily focused on this repository. You can certainly use the KAZU repository (/~https://github.com/AstraZeneca/KAZU), but please note that KAZU is designed for industrial use. It includes additional matching algorithms using ontologies and various features, including preprocessing (from plain text to final output in JSON format). Some of these features might not be suitable for other domains (i.e., non-biomedical/clinical domains), and removing them could be challenging due to the large codebase.

In contrast, this repository focuses exclusively on the core module, emphasizing the neural model aspect of NER recognition (without linking). It is more academically oriented and does not offer end-to-end processing from plain text to final output, so users will need to manage preprocessing and post-processing themselves.

Our label2prob.py script provides a conversion from CoNLL format to our prob_conll format. Once you have your data in CoNLL format, you can use this script to convert it into the required input format. For converting plain text to CoNLL format, other researchers might have shared scripts online, but we did not include such scripts here, as our full pipeline is intended to be used with the KAZU repository.
Hi @wonjininfo, so you are saying that I cannot use this model to predict on plain text directly; to do that, the plain text needs to be in the prob_conll format? If that is the case, how can I make predictions on plain text? Or did I misunderstand?
You can use this model with the training and evaluation codes to predict any text, but you'll need to write or find some preprocessing code to convert plain text into a CoNLL-like format. So the short answer is no: you can't use it as-is. You'll need to write a few dozen lines of code to get it working. I haven't used these myself, but you might find these resources useful: spacy-conll or this Stack Overflow answer. Still, a few tweaks are required.
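As a rough sketch of that preprocessing step (everything below is illustrative, not code from this repository): tokenize the text, emit one token per line with a placeholder `O` label, and separate sentences with blank lines. The resulting CoNLL-style file could then be converted with `label2prob.py`.

```python
# Hypothetical sketch: plain text -> minimal CoNLL-style format with dummy
# "O" labels (the model's predictions replace these; they just satisfy the
# input format). Uses naive regex sentence splitting and tokenization.
import re

def text_to_conll(text: str) -> str:
    lines = []
    # Split on sentence-ending punctuation followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        # One token per line: word characters or single punctuation marks.
        for token in re.findall(r"\w+|[^\w\s]", sentence):
            lines.append(f"{token}\tO")
        lines.append("")  # blank line separates sentences
    return "\n".join(lines)

print(text_to_conll("EGFR mutations are common. They drive lung cancer."))
```

A real pipeline would use a proper tokenizer (e.g. the one from spacy-conll mentioned above) rather than this regex, but the output shape is the same: token, tab, label, one per line.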
Hi @wonjininfo, I have trained the model. Will this be enough? file
Hi @wonjininfo, sorry, but can you help me please?
Sorry, I do not quite get what your question is, but I tried to analyze your error log.

If you are asking whether you can use your own fine-tuned model instead of the BERT model, then yes: put its location in the model path argument.

One error I can see in your log file, errorEval.json, involves the variable that holds the classification report. As you can see in the API document, in my case the key is "_", and this might have changed if you are using different settings or a different version of sklearn. I think you need to check the type of that variable here:

KAZU-NER-module/evaluate_multi_label.py, lines 160 to 165 (at commit 51c592b)
I suggest you use pdb to check it: |
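For illustration, this is roughly what that inspection might show. The `report` dict below is a hand-written stand-in for a `classification_report(..., output_dict=True)` result, not actual output from the script:

```python
# Stand-in for: report = classification_report(y_true, y_pred, output_dict=True)
# The per-label key ("_" in my runs) can differ across settings and library
# versions, so inspect the dict's keys before indexing into it.
report = {
    "_": {"precision": 0.91, "recall": 0.88, "f1-score": 0.89, "support": 500},
    "micro avg": {"precision": 0.91, "recall": 0.88, "f1-score": 0.89, "support": 500},
}

# import pdb; pdb.set_trace()   # drop in here to inspect interactively

print(type(report).__name__, sorted(report.keys()))

# Separate the actual label keys from the aggregate rows.
label_keys = [k for k in report if not k.endswith("avg")]
print(label_keys, report[label_keys[0]]["f1-score"])
```

Setting a breakpoint just before the failing line (around lines 160 to 165 of evaluate_multi_label.py) and checking `type(report)` and `report.keys()` like this should reveal whether the key the script expects actually exists in your run.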
Thanks for the answer, @wonjininfo. I will try what you suggested; thanks a lot for your help.
I got this error: "FileNotFoundError: Unable to find '\Users\kkiko\KAZU-NER-exp\BC5CDR_test\dev.prob_conll' at C:\Users\kkiko\KAZU-NER-exp\BC5CDR_test\prob_conll".
I found that following the steps in the README.md does not create the dev.prob_conll file.
What do I need to do?