Allow spiders to return dicts. #1081

Merged
merged 5 commits into from
Mar 27, 2015
Conversation

kmike
Member

@kmike kmike commented Mar 18, 2015

A PR to fix GH-1064.

Docs are missing.

@pablohoffman
Member

good job @kmike !

@dangra
Member

dangra commented Mar 18, 2015

nice!

kmike added 2 commits March 19, 2015 05:16
* start with scrapy.Spider, then mention spider arguments,
  then describe generic spiders;
* change wording regarding start_urls/start_requests;
* show an example of start_requests vs start_urls;
* show an example of dicts as items;
* as defining Item is an optional step now, docs for Items are
  moved below Spider docs.
@kmike
Member Author

kmike commented Mar 19, 2015

Please check - all docs except for overview & tutorial should be updated.

What do you think about adding a FEED_EXPORT_FIELDS option to allow defining a list of fields to export? Without Item classes, the CSV exporter can't figure out the header robustly (currently the fields of the first item are used). @nramirezuy also mentioned this feature here.

@nramirezuy
Contributor

With FEED_EXPORT_FIELDS you can also export fewer fields than you have in your item.
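
To illustrate the header problem and the subset use case discussed above, here is a minimal standalone sketch using only the standard library `csv` module. It does not use Scrapy's exporters; the `export_csv` helper, the sample items, and the field names are all hypothetical.

```python
import csv
import io

# Hypothetical scraped items: the second has a key the first one lacks.
items = [
    {"name": "Widget", "price": "9.99"},
    {"name": "Gadget", "price": "19.99", "stock": "3"},
]


def export_csv(rows, fieldnames):
    """Write dict rows to CSV text using an explicit, ordered column list."""
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf,
        fieldnames=fieldnames,
        restval="",             # fill columns that a row does not have
        extrasaction="ignore",  # drop keys that are not in fieldnames
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


# Header guessed from the first item: the "stock" column is silently lost.
guessed = export_csv(items, list(items[0].keys()))

# Header given explicitly: fixed column order, and fewer fields than the items hold.
explicit = export_csv(items, ["name", "stock"])
```

With an explicit field list the output no longer depends on which item happens to come first, which is the robustness concern raised for dict items.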

@pablohoffman
Member

FEED_EXPORT_FIELDS 👍


::
user input or other changing conditions you can return regular Python
dicts from spiders.
Member

What do you think about wording it like this: "You can return regular Python dicts from spiders since Scrapy 1.0. For older versions, you can dynamically create Item classes::"?

The dynamic creation just doesn't seem to make much sense when you have the ability to return arbitrary dicts.

Member Author

A good catch. I just removed the whole section. I don't think we should document workarounds for limitations of older Scrapy versions.

It was a hack, and dicts-as-items cover most use cases.

Dicts don't allow attaching metadata to fields,
but e.g. adding "_meta" key and removing it in a custom serializer
is no worse than creating classes dynamically.
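
The "_meta" idea in the commit message above could look roughly like the following sketch. It is only an illustration in plain Python: `serialize_item` is a hypothetical helper, not a Scrapy API, and the layout of the metadata under "_meta" is an assumption.

```python
import json


def serialize_item(item, meta_key="_meta"):
    """Serialize a dict item to JSON, leaving out the metadata key."""
    public = {key: value for key, value in item.items() if key != meta_key}
    return json.dumps(public, sort_keys=True)


# A dict item that carries per-field metadata under a reserved key.
item = {
    "name": "Widget",
    "price": 9.99,
    "_meta": {"price": {"currency": "USD"}},
}

serialized = serialize_item(item)
```

The original dict is left untouched, so other components can still read the metadata before export.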
@kmike
Member Author

kmike commented Mar 23, 2015

This PR doesn't allow items to be arbitrary dict-like objects, like @shaneaevans proposes in #1064 (comment) - item must be either a subclass of BaseItem or a dict / subclass of a dict.

Maybe instead of checking for dict/BaseItem explicitly we can start checking for MutableMapping, but it is more risky. I think that starting with stricter requirements on spider output is better - we can make them less strict in the future.
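
The difference between the two checks discussed above can be sketched as follows. This is not the code from the PR: `BaseItem` here is an empty stand-in class so the example runs without Scrapy, and `is_item_strict`, `is_item_relaxed`, `Record`, and `MyDict` are hypothetical names.

```python
from collections import UserDict
from collections.abc import MutableMapping


class BaseItem:
    """Stand-in for Scrapy's BaseItem so this sketch runs on its own."""


def is_item_strict(obj):
    """Accept only BaseItem instances and dicts (including dict subclasses)."""
    return isinstance(obj, (BaseItem, dict))


def is_item_relaxed(obj):
    """Accept BaseItem instances and any mutable mapping."""
    return isinstance(obj, (BaseItem, MutableMapping))


class MyDict(dict):
    """A dict subclass: accepted by both checks."""


class Record(UserDict):
    """A dict-like object that is not a dict subclass."""


record = Record(name="Widget")
```

Under the strict check `record` would not count as an item, while the relaxed check would accept it along with any other mutable mapping.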

@kmike
Member Author

kmike commented Mar 23, 2015

Successfully merging this pull request may close these issues.

allow spiders to return dicts instead of Items
5 participants