ValidationError from haystack API when attempting curl request to doc-qa #387

Closed
aaronbriel opened this issue Sep 16, 2020 · 9 comments
Labels: type:bug (Something isn't working)


@aaronbriel

aaronbriel commented Sep 16, 2020

Describe the bug
This appears to be the same issue as #101, but I can reproduce it with the latest release (0.3.0) and with the latest master branch. When I send a curl request to the haystack REST API, I get a ValidationError from the doc-qa endpoint and an AttributeError from the faq-qa endpoint.

Error message
Here is the stack trace from the haystack server when I run:
curl --request POST --url 'http://127.0.0.1:8000/models/1/doc-qa' --data '{"questions": ["Where is covid from?"]}'

(.venv) Aarons-MacBook-Pro-3:haystack aaronbriel$ gunicorn rest_api.application:app -b 0.0.0.0:8000 -k uvicorn.workers.UvicornWorker
[2020-09-16 16:19:39 -0500] [74657] [INFO] Starting gunicorn 20.0.4
[2020-09-16 16:19:39 -0500] [74657] [INFO] Listening at: http://0.0.0.0:8000 (74657)
[2020-09-16 16:19:39 -0500] [74657] [INFO] Using worker: uvicorn.workers.UvicornWorker
[2020-09-16 16:19:39 -0500] [74660] [INFO] Booting worker with pid: 74660
09/16/2020 16:19:42 - INFO - elasticsearch -   HEAD http://localhost:9200/document [status:200 request:0.015s]
09/16/2020 16:19:43 - INFO - elasticsearch -   HEAD http://localhost:9200/label [status:200 request:0.006s]
09/16/2020 16:19:43 - INFO - elasticsearch -   HEAD http://localhost:9200/document [status:200 request:0.010s]
09/16/2020 16:19:43 - INFO - elasticsearch -   HEAD http://localhost:9200/label [status:200 request:0.003s]
09/16/2020 16:19:43 - INFO - farm.utils -   device: cpu n_gpu: 0, distributed training: False, automatic mixed precision training: None
09/16/2020 16:19:43 - INFO - farm.infer -   Could not find `deepset/roberta-base-squad2` locally. Try to download from model hub ...
09/16/2020 16:19:47 - WARNING - farm.modeling.language_model -   Could not automatically detect from language model name what language it is.
	 We guess it's an *ENGLISH* model ...
	 If not: Init the language model by supplying the 'language' param.
09/16/2020 16:19:51 - WARNING - farm.modeling.prediction_head -   Some unused parameters are passed to the QuestionAnsweringHead. Might not be a problem. Params: {"loss_ignore_index": -1}
09/16/2020 16:19:51 - WARNING - farm.utils -   Failed to log params: Could not find experiment with ID 0
09/16/2020 16:19:53 - WARNING - farm.utils -   Failed to log params: Could not find experiment with ID 0
09/16/2020 16:19:53 - INFO - farm.utils -   device: cpu n_gpu: 0, distributed training: False, automatic mixed precision training: None
09/16/2020 16:19:53 - INFO - farm.infer -   Got ya 4 parallel workers to do inference ...
09/16/2020 16:19:53 - INFO - farm.infer -    0    0    0    0
09/16/2020 16:19:53 - INFO - farm.infer -   /w\  /w\  /w\  /w\
09/16/2020 16:19:53 - INFO - farm.infer -   /'\  / \  /'\  /'\
09/16/2020 16:19:53 - INFO - farm.infer -
09/16/2020 16:19:53 - INFO - elasticsearch -   HEAD http://localhost:9200/document [status:200 request:0.011s]
09/16/2020 16:19:53 - INFO - elasticsearch -   HEAD http://localhost:9200/label [status:200 request:0.006s]
09/16/2020 16:19:53 - INFO - rest_api.application -   Open http://127.0.0.1:8000/docs to see Swagger API Documentation.
09/16/2020 16:19:53 - INFO - rest_api.application -
Or just try it out directly: curl --request POST --url 'http://127.0.0.1:8000/models/1/doc-qa' --data '{"questions": ["What is the capital of Germany?"]}'

[2020-09-16 16:19:53 -0500] [74660] [INFO] Started server process [74660]
[2020-09-16 16:19:53 -0500] [74660] [INFO] Waiting for application startup.
[2020-09-16 16:19:53 -0500] [74660] [INFO] Application startup complete.
09/16/2020 16:20:13 - INFO - haystack.retriever.sparse -   Got 1 candidates from retriever
09/16/2020 16:20:13 - INFO - haystack.finder -   Reader is looking for detailed answer in 705 chars ...
Inferencing Samples: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00,  3.44 Batches/s]
09/16/2020 16:20:14 - INFO - rest_api.controller.search -   {"request": {"questions": ["How is the virus spreading?"], "filters": null, "top_k_reader": 5, "top_k_retriever": 1}, "results": [{"question": "How is the virus spreading?", "no_ans_gap": 17.944154739379883, "answers": [{"answer": "from person-to-person", "score": 12.883939743041992, "probability": 0.8334797478865303, "context": "This virus was first detected in Wuhan City, Hubei Province, China. The first infections were linked to a live animal market, but the virus is now spreading from person-to-person. It\u2019s important to note that person-to-person spread can happen on a continuum. Some viruses are highly contagious (like measles), while other viruses are less so.\n\nThe virus that causes COVID-19 seems to be spreading easily and sustainably in the community (\u201ccommunity spread\u201d) in some affected geographic areas. Communi", "offset_start": 157, "offset_end": 178, "offset_start_in_doc": 157, "offset_end_in_doc": 178, "document_id": "764dbd34-a152-4ae6-9d87-d1aaf3d2b11c", "meta": {"question_emb": [0.5515517847878593, -1.12009368624006, -0.3748682567051479, -0.5144797733851841, 0.2756530387060983, 
... 0.43540845598493305, -0.7300209999084473, 0.1662832157952445, -0.11270188433783394, 0.0368795565196446], "answer_html": "<p>This virus was first detected in Wuhan City, Hubei Province, China. The first infections were linked to a live animal market, but the virus is now spreading from person-to-person. It&rsquo;s important to note that person-to-person spread can happen on a continuum. Some viruses are highly contagious (like measles), while other viruses are less so.</p>\n<p>The virus that causes COVID-19 seems to be spreading easily and sustainably in the community (&ldquo;community spread&rdquo;) in <a href=\"/coronavirus/2019-ncov/about/transmission.html#geographic\">some affected geographic areas</a>. Community spread means people have been infected with the virus in an area, including some who are not sure how or where they became infected.</p>\n<p>Learn what is known about the <a href=\"/coronavirus/2019-ncov/about/transmission.html\">spread of newly emerged coronaviruses</a>.</p>", "link": "\nhttps://www.cdc.gov/coronavirus/2019-ncov/faq.html", "source": "Center for Disease Control and Prevention (CDC)", "category": "How It Spreads", "country": "USA", "region": "", "city": "", "lang": "en", "last_update": "2020/03/17", "name": "Frequently Asked Questions"}}, {"answer": null, "score": -5.060214996337891, "probability": 0.34693779767057226, "context": null, "offset_start": 0, "offset_end": 0, "document_id": null, "meta": {}}]}], "time": "0.74"}
[2020-09-16 16:20:14 -0500] [74660] [ERROR] Exception in ASGI application
Traceback (most recent call last):
  File "/Users/aaronbriel/chatbot/.venv/lib/python3.7/site-packages/uvicorn/protocols/http/httptools_impl.py", line 390, in run_asgi
    result = await app(self.scope, self.receive, self.send)
  File "/Users/aaronbriel/chatbot/.venv/lib/python3.7/site-packages/uvicorn/middleware/proxy_headers.py", line 45, in __call__
    return await self.app(scope, receive, send)
  File "/Users/aaronbriel/chatbot/.venv/lib/python3.7/site-packages/fastapi/applications.py", line 179, in __call__
    await super().__call__(scope, receive, send)
  File "/Users/aaronbriel/chatbot/.venv/lib/python3.7/site-packages/starlette/applications.py", line 111, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/Users/aaronbriel/chatbot/.venv/lib/python3.7/site-packages/starlette/middleware/errors.py", line 181, in __call__
    raise exc from None
  File "/Users/aaronbriel/chatbot/.venv/lib/python3.7/site-packages/starlette/middleware/errors.py", line 159, in __call__
    await self.app(scope, receive, _send)
  File "/Users/aaronbriel/chatbot/.venv/lib/python3.7/site-packages/starlette/middleware/cors.py", line 78, in __call__
    await self.app(scope, receive, send)
  File "/Users/aaronbriel/chatbot/.venv/lib/python3.7/site-packages/starlette/exceptions.py", line 82, in __call__
    raise exc from None
  File "/Users/aaronbriel/chatbot/.venv/lib/python3.7/site-packages/starlette/exceptions.py", line 71, in __call__
    await self.app(scope, receive, sender)
  File "/Users/aaronbriel/chatbot/.venv/lib/python3.7/site-packages/starlette/routing.py", line 566, in __call__
    await route.handle(scope, receive, send)
  File "/Users/aaronbriel/chatbot/.venv/lib/python3.7/site-packages/starlette/routing.py", line 227, in handle
    await self.app(scope, receive, send)
  File "/Users/aaronbriel/chatbot/.venv/lib/python3.7/site-packages/starlette/routing.py", line 41, in app
    response = await func(request)
  File "/Users/aaronbriel/chatbot/.venv/lib/python3.7/site-packages/fastapi/routing.py", line 199, in app
    is_coroutine=is_coroutine,
  File "/Users/aaronbriel/chatbot/.venv/lib/python3.7/site-packages/fastapi/routing.py", line 111, in serialize_response
    raise ValidationError(errors, field.type_)
pydantic.error_wrappers.ValidationError: 1 validation error for Answers
response -> results -> 0 -> answers -> 0 -> meta -> question_emb
  str type expected (type=type_error.str)
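The error path `response -> results -> 0 -> answers -> 0 -> meta -> question_emb` says the response schema expects that `meta` value to be a string, while the document store returned the question embedding as a list of floats. The minimal sketch below, assuming a plain-dict response shaped like the one logged above, walks the `results`/`answers`/`meta` nesting and reports every non-string meta value in the same path notation. `find_non_string_meta` is a hypothetical helper, not part of haystack.

```python
from typing import Any, Dict, List


def find_non_string_meta(response: Dict[str, Any]) -> List[str]:
    """Return the path of every meta value that is not a string.

    Hypothetical helper: it mirrors the path notation of the pydantic error,
    e.g. "response -> results -> 0 -> answers -> 0 -> meta -> question_emb".
    """
    offending = []
    for i, result in enumerate(response.get("results", [])):
        for j, answer in enumerate(result.get("answers", [])):
            # "meta" may be missing or empty, e.g. for the no-answer entry
            for key, value in (answer.get("meta") or {}).items():
                if not isinstance(value, str):
                    offending.append(
                        f"response -> results -> {i} -> answers -> {j} -> meta -> {key}"
                    )
    return offending


# Shape taken from the logged response (embedding truncated to two values)
sample = {"results": [{"answers": [
    {"answer": "from person-to-person",
     "meta": {"question_emb": [0.5515517847878593, -1.12009368624006],
              "category": "How It Spreads"}},
    {"answer": None, "meta": {}},
]}]}
print(find_non_string_meta(sample))
# ['response -> results -> 0 -> answers -> 0 -> meta -> question_emb']
```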

Here is the haystack stack trace when I run:
curl --request POST --url 'http://127.0.0.1:8000/models/1/faq-qa' --data '{"questions": ["Where is covid from?"]}'

09/16/2020 16:23:07 - INFO - haystack.retriever.sparse -   Got 10 candidates from retriever
[2020-09-16 16:23:07 -0500] [74660] [ERROR] Exception in ASGI application
Traceback (most recent call last):
  File "/Users/aaronbriel/chatbot/.venv/lib/python3.7/site-packages/uvicorn/protocols/http/httptools_impl.py", line 390, in run_asgi
    result = await app(self.scope, self.receive, self.send)
  File "/Users/aaronbriel/chatbot/.venv/lib/python3.7/site-packages/uvicorn/middleware/proxy_headers.py", line 45, in __call__
    return await self.app(scope, receive, send)
  File "/Users/aaronbriel/chatbot/.venv/lib/python3.7/site-packages/fastapi/applications.py", line 179, in __call__
    await super().__call__(scope, receive, send)
  File "/Users/aaronbriel/chatbot/.venv/lib/python3.7/site-packages/starlette/applications.py", line 111, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/Users/aaronbriel/chatbot/.venv/lib/python3.7/site-packages/starlette/middleware/errors.py", line 181, in __call__
    raise exc from None
  File "/Users/aaronbriel/chatbot/.venv/lib/python3.7/site-packages/starlette/middleware/errors.py", line 159, in __call__
    await self.app(scope, receive, _send)
  File "/Users/aaronbriel/chatbot/.venv/lib/python3.7/site-packages/starlette/middleware/cors.py", line 78, in __call__
    await self.app(scope, receive, send)
  File "/Users/aaronbriel/chatbot/.venv/lib/python3.7/site-packages/starlette/exceptions.py", line 82, in __call__
    raise exc from None
  File "/Users/aaronbriel/chatbot/.venv/lib/python3.7/site-packages/starlette/exceptions.py", line 71, in __call__
    await self.app(scope, receive, sender)
  File "/Users/aaronbriel/chatbot/.venv/lib/python3.7/site-packages/starlette/routing.py", line 566, in __call__
    await route.handle(scope, receive, send)
  File "/Users/aaronbriel/chatbot/.venv/lib/python3.7/site-packages/starlette/routing.py", line 227, in handle
    await self.app(scope, receive, send)
  File "/Users/aaronbriel/chatbot/.venv/lib/python3.7/site-packages/starlette/routing.py", line 41, in app
    response = await func(request)
  File "/Users/aaronbriel/chatbot/.venv/lib/python3.7/site-packages/fastapi/routing.py", line 183, in app
    dependant=dependant, values=values, is_coroutine=is_coroutine
  File "/Users/aaronbriel/chatbot/.venv/lib/python3.7/site-packages/fastapi/routing.py", line 135, in run_endpoint_function
    return await run_in_threadpool(dependant.call, **values)
  File "/Users/aaronbriel/chatbot/.venv/lib/python3.7/site-packages/starlette/concurrency.py", line 34, in run_in_threadpool
    return await loop.run_in_executor(None, func, *args)
  File "/usr/local/opt/python@3.7/Frameworks/Python.framework/Versions/3.7/lib/python3.7/concurrent/futures/thread.py", line 57, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/Users/aaronbriel/chatbot/chatbots/haystack/rest_api/controller/search.py", line 190, in faq_qa
    question=question, top_k_retriever=request.top_k_retriever, filters=filters,
  File "/Users/aaronbriel/chatbot/.venv/src/farm-haystack/haystack/finder.py", line 107, in get_answers_via_similar_questions
    if self.retriever.embedding_model:  # type: ignore
AttributeError: 'ElasticsearchRetriever' object has no attribute 'embedding_model'
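This second traceback fails because `get_answers_via_similar_questions` reads `self.retriever.embedding_model`, an attribute the sparse `ElasticsearchRetriever` does not define; similar-question (FAQ) search presumably needs an embedding-based retriever. The sketch below is not haystack's actual fix. It only illustrates, with hypothetical stub classes and a hypothetical `require_embedding_model` helper, how a `getattr` check would turn the bare AttributeError into an explicit message.

```python
class SparseRetrieverStub:
    """Hypothetical stand-in for a BM25-style retriever with no embedding model."""


class EmbeddingRetrieverStub:
    """Hypothetical stand-in for a retriever that can embed queries."""

    def __init__(self, embedding_model: str):
        self.embedding_model = embedding_model


def require_embedding_model(retriever) -> str:
    # getattr with a default avoids the bare AttributeError from the traceback
    model = getattr(retriever, "embedding_model", None)
    if not model:
        raise ValueError(
            f"{type(retriever).__name__} has no embedding model; "
            "similar-question search needs an embedding-based retriever"
        )
    return model
```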

Expected behavior
I expected a response containing the answers instead of a server error.

Additional context
I've tried both the latest haystack release from pip and the latest master branch installed directly from the GitHub repo.

To Reproduce
Start the gunicorn server: gunicorn rest_api.application:app -b 0.0.0.0:8000 -k uvicorn.workers.UvicornWorker
Send a curl request: curl --request POST --url 'http://127.0.0.1:8000/models/1/doc-qa' --data '{"questions": ["Where is covid from?"]}'
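For scripted reproduction, the same request can be built with Python's standard library. This sketch assumes the gunicorn server started above is listening on 127.0.0.1:8000. It sets `Content-Type: application/json` explicitly, whereas curl's `--data` defaults to `application/x-www-form-urlencoded`. `build_doc_qa_request` is a hypothetical helper name.

```python
import json
import urllib.request


def build_doc_qa_request(question: str, host: str = "http://127.0.0.1:8000",
                         model_id: int = 1) -> urllib.request.Request:
    """Build (but do not send) the POST request that the curl command issues."""
    payload = {"questions": [question]}
    return urllib.request.Request(
        url=f"{host}/models/{model_id}/doc-qa",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Usage (requires the server to be running):
# with urllib.request.urlopen(build_doc_qa_request("Where is covid from?")) as resp:
#     print(json.loads(resp.read()))
```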

System:

  • OS: MacOS Mojave 10.14.6
  • GPU/CPU:
  • Haystack version (commit or version number): Latest master branch
  • DocumentStore: Elasticsearch in a Docker container, installed following a Medium article
  • Reader:
  • Retriever:
@aaronbriel
Author

aaronbriel commented Sep 17, 2020

I verified the fix for the doc-qa route with the latest changes from #390. Thanks for the quick response!

However, the faq-qa route (i.e. curl --request POST --url 'http://127.0.0.1:8000/models/1/faq-qa' --data '{"questions": ["How is the virus spreading?"]}') still fails with:
  File "/Users/aaronbriel/chatbot/.venv/src/farm-haystack/haystack/finder.py", line 107, in get_answers_via_similar_questions
    if self.retriever.embedding_model:  # type: ignore

I assumed this was because EMBEDDING_MODEL_PATH was not set, so I tried setting it both as an environment variable and as the default value in rest_api/config.py (EMBEDDING_MODEL_PATH = os.getenv("EMBEDDING_MODEL_PATH", 'deepset/sentence_bert')), but the error persists.
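Assuming the quoted line sits at module level in rest_api/config.py, the environment variable is read once, when that module is imported, so it has to be exported before gunicorn starts the workers. The sketch below wraps the same `os.getenv` pattern in a hypothetical `embedding_model_path` function so the lookup and its fallback can be checked directly; it says nothing about whether the setting is the cause of the error above.

```python
import os

DEFAULT_EMBEDDING_MODEL = "deepset/sentence_bert"


def embedding_model_path() -> str:
    # Same pattern as the config line quoted above, but evaluated on each call
    # instead of once at import time.
    return os.getenv("EMBEDDING_MODEL_PATH", DEFAULT_EMBEDDING_MODEL)
```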

If this is a separate issue, I can open a new one.

@tholor
Member

tholor commented Sep 17, 2020

Yep, these were indeed two separate problems.
We tackled the first one in #390 and are working on the second one in #389.

@aaronbriel
Author

Beautiful. I verified that #389 fixes the second issue. Thanks!

@Graduo

Graduo commented Apr 13, 2021

Hi, I'm back again. There may still be a bug that needs to be fixed.
When I run:
curl --request POST --url 'http://127.0.0.1:8000/models/1/doc-qa' --data '{
"questions": [
"where did remember the titans camp take place"
],
"top_k_reader": 1,
"top_k_retriever": 1
}'

[2021-04-13 08:50:20 +0000] [352950] [INFO] Handling signal: winch
04/13/2021 08:50:43 - INFO - haystack.finder -   Got 1 candidates from retriever
04/13/2021 08:50:43 - INFO - haystack.finder -   Reader is looking for detailed answer in 1218 chars ...
Inferencing Samples: 100%|████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 50.66 Batches/s]
04/13/2021 08:50:43 - INFO - haystack -   {"request": {"questions": ["where did remember the titans camp take place"], "filters": null, "top_k_reader": 1, "top_k_retriever": 1}, "results": [{"query": "where did remember the titans camp take place", "no_ans_gap": 13.019956588745117, "answers": [{"answer": "Gettysburg College", "score": 8.597307205200195, "probability": 0.7454827536430236, "context": "The black students have a meeting in the gymnasium in auditioning to play for the team until Boone arrives, but the meeting turns into a fiasco when Yoast and white students interrupt. On August 15, 1971, the players gather and journey to Gettysburg College, where their training camp takes place. As their days of training camp progress, black and white football team members frequently clash in racially motivated conflicts, including some between captains Gerry Bertier and Julius Campbell. But af", "offset_start": 239, "offset_end": 257, "offset_start_in_doc": 239, "offset_end_in_doc": 257, "document_id": "cff9092d-a85c-4628-92ff-7266ead95514", "meta": {"emb": [0.07882298529148102, ...,-0.08937514573335648], "name": "Remember the Titans"}}], "question": "where did remember the titans camp take place"}], "time": "0.72"}
[2021-04-13 08:50:43 +0000] [352963] [ERROR] Exception in ASGI application
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/uvicorn/protocols/http/httptools_impl.py", line 398, in run_asgi
    result = await app(self.scope, self.receive, self.send)
  File "/usr/local/lib/python3.6/site-packages/uvicorn/middleware/proxy_headers.py", line 45, in __call__
    return await self.app(scope, receive, send)
  File "/usr/local/lib/python3.6/site-packages/fastapi/applications.py", line 201, in __call__
    await super().__call__(scope, receive, send)  # pragma: no cover
  File "/usr/local/lib/python3.6/site-packages/starlette/applications.py", line 111, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/usr/local/lib/python3.6/site-packages/starlette/middleware/errors.py", line 181, in __call__
    raise exc from None
  File "/usr/local/lib/python3.6/site-packages/starlette/middleware/errors.py", line 159, in __call__
    await self.app(scope, receive, _send)
  File "/usr/local/lib/python3.6/site-packages/starlette/middleware/cors.py", line 86, in __call__
    await self.simple_response(scope, receive, send, request_headers=headers)
  File "/usr/local/lib/python3.6/site-packages/starlette/middleware/cors.py", line 142, in simple_response
    await self.app(scope, receive, send)
  File "/usr/local/lib/python3.6/site-packages/starlette/exceptions.py", line 82, in __call__
    raise exc from None
  File "/usr/local/lib/python3.6/site-packages/starlette/exceptions.py", line 71, in __call__
    await self.app(scope, receive, sender)
  File "/usr/local/lib/python3.6/site-packages/starlette/routing.py", line 566, in __call__
    await route.handle(scope, receive, send)
  File "/usr/local/lib/python3.6/site-packages/starlette/routing.py", line 227, in handle
    await self.app(scope, receive, send)
  File "/usr/local/lib/python3.6/site-packages/starlette/routing.py", line 41, in app
    response = await func(request)
  File "/usr/local/lib/python3.6/site-packages/fastapi/routing.py", line 218, in app
    is_coroutine=is_coroutine,
  File "/usr/local/lib/python3.6/site-packages/fastapi/routing.py", line 126, in serialize_response
    raise ValidationError(errors, field.type_)
pydantic.error_wrappers.ValidationError: 1 validation error for Answers
response -> results -> 0 -> answers -> 0 -> meta -> emb

@Timoeller
Contributor

Hey @Graduo, are you using the latest haystack master?

Because we removed the doc-qa endpoint there:
The /doc-qa and /faq-qa endpoints are replaced by a more generic POST /query endpoint. This new endpoint uses Pipelines under the hood, which can be configured in rest_api/pipeline.yaml.

@Graduo

Graduo commented Apr 13, 2021

OK, I'll try it ~

@Graduo

Graduo commented Apr 14, 2021

By the way, is there a way to fix this without updating to the latest haystack master, for example with some kind of "filter" so that "meta": {"emb": ...} is not returned in the results?

@Timoeller
Contributor

Sorry, I don't understand what you mean by "it".
You can either use an older version of haystack with your old curl command, or update to the current master and use the /query endpoint instead of the /doc-qa endpoint.
Why would you not want to return the embeddings?

@Graduo

Graduo commented Apr 14, 2021

Sorry, I didn't explain my expectation clearly. I want to use the older version of haystack. I ran into the bug described above, which looks like #387. The problem seems to be that the values returned in "result" must be strings, but in my result
"meta": {"emb": [0.07882298529148102, ...,-0.08937514573335648], the value of the key "emb" is a list.
Now I have solved it ~
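The thread does not say how this was solved. One possible workaround on an older version, sketched here under the assumption that clients do not need the embedding vectors, is to drop list-valued embedding keys such as `emb` and `question_emb` from each answer's `meta` before the response is validated. `strip_embeddings` is a hypothetical helper, not a haystack function; another option would be to widen the response model so that `meta` accepts non-string values.

```python
import copy
from typing import Any, Dict, Iterable

EMBEDDING_KEYS = ("emb", "question_emb")  # keys seen in the logs in this thread


def strip_embeddings(response: Dict[str, Any],
                     keys: Iterable[str] = EMBEDDING_KEYS) -> Dict[str, Any]:
    """Return a copy of the response with embedding vectors removed from meta."""
    keys = tuple(keys)  # allow any iterable, reused for every answer
    cleaned = copy.deepcopy(response)  # leave the caller's object untouched
    for result in cleaned.get("results", []):
        for answer in result.get("answers", []):
            meta = answer.get("meta") or {}
            for key in keys:
                meta.pop(key, None)
    return cleaned
```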
