Allow spiders to return dicts. #1081
Good job, @kmike!
Nice!
* start with scrapy.Spider, then mention spider arguments, then describe generic spiders;
* change wording regarding start_urls/start_requests;
* show an example of start_requests vs start_urls;
* show an example of dicts as items;
* as defining Item is an optional step now, docs for Items are moved below Spider docs.
Please check: all docs except for the overview and tutorial should be updated.
What do you think about adding a FEED_EXPORT_FIELDS option to allow defining a list of fields to export? Without Item classes, the CSV exporter can't figure out the header robustly (currently the fields of the first item are used). @nramirezuy also mentioned this feature here.
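To illustrate the header problem described above, here is a minimal sketch that uses only the standard library `csv` module, not Scrapy's actual exporter. The `export_csv` helper and its `fields` parameter are hypothetical names chosen for this example; `fields` only plays the role that the proposed option would play.

```python
import csv
import io


def export_csv(items, fields=None):
    """Write dict items as CSV text (hypothetical helper, not Scrapy code)."""
    items = list(items)
    if fields is None:
        # Mimic the behaviour described in the comment:
        # the header is taken from the keys of the first item.
        fields = list(items[0].keys()) if items else []
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf,
        fieldnames=fields,
        extrasaction="ignore",  # silently drop keys missing from the header
        restval="",             # fill fields an item does not have
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(items)
    return buf.getvalue()


# The second item has a key that the first one lacks.
items = [{"name": "a"}, {"name": "b", "price": "10"}]

inferred = export_csv(items)                            # "price" column is lost
explicit = export_csv(items, fields=["name", "price"])  # both columns kept
```

With the inferred header, the `price` value of the second item never reaches the output, while an explicit field list keeps the column for every row.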
user input or other changing conditions you can return regular Python
dicts from spiders.
What do you think about changing it to something like "You can return regular Python dicts from spiders since Scrapy 1.0. For older versions, you can dynamically create Item classes::"?
The dynamic creation just doesn't seem to make much sense when you have the ability to return arbitrary dicts.
A good catch. I just removed the whole section. I don't think we should document workarounds for limitations of older Scrapy versions.
It was a hack, and dicts-as-items cover most use cases. Dicts don't allow attaching metadata to fields, but e.g. adding a "_meta" key and removing it in a custom serializer is no worse than creating classes dynamically.
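A rough sketch of the "_meta" idea, in plain Python with no Scrapy APIs involved. The `strip_meta` function name and the layout of the "_meta" value are assumptions made for illustration; the comment above only suggests the key name.

```python
def strip_meta(item):
    """Return a copy of a dict item without the "_meta" key (hypothetical helper)."""
    return {key: value for key, value in item.items() if key != "_meta"}


# Per-field metadata travels with the item under a reserved key.
item = {
    "name": "Widget",
    "price": "9.99",
    "_meta": {"price": {"unit": "USD"}},  # assumed layout, for illustration only
}

# A serialization step would call this just before export.
exported = strip_meta(item)
```

The original item is left untouched, so code that runs before export can still read the metadata.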
This PR doesn't allow items to be arbitrary dict-like objects, like @shaneaevans proposes in #1064 (comment): an item must be either a subclass of BaseItem or a dict / subclass of dict. Maybe instead of checking for dict/BaseItem explicitly we can start checking for MutableMapping, but it is more risky. I think that starting with stricter requirements on spider output is better; we can make them less strict in the future.
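The difference between the two checks can be sketched as follows. This is not the code from the PR: `BaseItem` here is an empty stand-in class, and `is_item_strict` / `is_item_loose` are hypothetical names used only to contrast the two approaches.

```python
from collections import OrderedDict, UserDict
from collections.abc import MutableMapping


class BaseItem:
    """Empty stand-in for Scrapy's BaseItem, so the sketch runs without Scrapy."""


def is_item_strict(obj):
    # Strict variant: only BaseItem subclasses and dict (or dict subclasses).
    return isinstance(obj, (BaseItem, dict))


def is_item_loose(obj):
    # Looser variant: any object registered as a MutableMapping also counts.
    return isinstance(obj, (BaseItem, MutableMapping))


# OrderedDict subclasses dict, so both checks accept it.
# UserDict is dict-like but not a dict subclass, so only the loose check accepts it.
candidates = [{}, OrderedDict(), UserDict(), BaseItem(), "not an item"]
strict_results = [is_item_strict(c) for c in candidates]
loose_results = [is_item_loose(c) for c in candidates]
```

Starting with the strict check and relaxing it later is backwards compatible, since everything the strict check accepts is also accepted by the loose one.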
Tests are failing because of https://bitbucket.org/hpk42/tox/issue/230/tox-191-ignores-package-versions-specified.
Allow spiders to return dicts.
A PR to fix GH-1064.
Docs are missing.