[CssSelectorBridge] Improvements (#3537) #3573

Merged
merged 4 commits into from
Jul 26, 2023

Conversation

ORelio
Contributor

@ORelio ORelio commented Jul 26, 2023

Continuation from #3457 including suggestions from #3537

Improve parameter documentation / add tooltips

image

Allow extracting content from home page instead of article page

image

Using site https://yohane.net/news/ as suggested by @antermin in #3537:

Home page:

image

Bridge parameters:

?action=display&bridge=CssSelectorBridge&home_page=https%3A%2F%2Fyohane.net%2Fnews%2F&url_selector=article&content_cleanup=div.p-article__body&format=Html
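
For anyone assembling this query string by hand, here is a minimal Python sketch that rebuilds it from plain parameter values. The instance URL `https://rss-bridge.example/` is a placeholder I made up; the parameter names and values are the ones shown above.

```python
from urllib.parse import urlencode

# Hypothetical RSS-Bridge instance URL (assumption, not from this PR).
INSTANCE = "https://rss-bridge.example/"

# Parameter names and values copied from the query string above.
params = {
    "action": "display",
    "bridge": "CssSelectorBridge",
    "home_page": "https://yohane.net/news/",
    "url_selector": "article",
    "content_cleanup": "div.p-article__body",
    "format": "Html",
}

# urlencode percent-encodes ':' and '/' in the home page URL.
query = "?" + urlencode(params)
feed_url = INSTANCE + query
print(feed_url)
```

Leaving `content_selector` out, as here, is what makes the bridge take content from the home page rather than from each article page.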

Resulting feed:
image

Feed is generated from home page using only one HTTP request to the website.

Keep titles from home page when every page <title> is the same

Now let's assume we want expanded content but each article page <title> is the same:

image
(The first tab is the home page; the others are the first three articles. The title is the same in every tab.)

In that case, the bridge detects that each article <title> is useless and keeps the titles taken from the links on the home page.

Bridge parameters:

?action=display&bridge=CssSelectorBridge&home_page=https%3A%2F%2Fyohane.net%2Fnews%2F&url_selector=article&content_selector=div.c-detail__article__bloc&content_cleanup=.c-detail__article_ttl,%20.c-detail__article_time,%20.c-detail__line,%20.c-detail__article_share&format=Html
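
The query above is hard to read in its encoded form, so here is a small Python sketch that decodes it. Splitting `content_cleanup` on commas is only done here to print the selectors one by one; it is not a claim about how the bridge parses the value.

```python
from urllib.parse import parse_qs

# Query string copied from the bridge parameters above (without the leading '?').
query = (
    "action=display&bridge=CssSelectorBridge"
    "&home_page=https%3A%2F%2Fyohane.net%2Fnews%2F"
    "&url_selector=article"
    "&content_selector=div.c-detail__article__bloc"
    "&content_cleanup=.c-detail__article_ttl,%20.c-detail__article_time,"
    "%20.c-detail__line,%20.c-detail__article_share"
    "&format=Html"
)

# parse_qs returns a list per key; each key appears once here.
params = {key: values[0] for key, values in parse_qs(query).items()}

# Display-only split of the CSS selector list.
cleanup_selectors = [s.strip() for s in params["content_cleanup"].split(",")]

for name, value in params.items():
    print(f"{name} = {value}")
print(cleanup_selectors)
```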

Resulting feed:
image
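
To illustrate the title fallback described above, here is a rough Python sketch. The bridge itself is written in PHP and its exact detection logic is not shown in this conversation, so the function name `choose_titles` and the "all page titles identical" rule are assumptions made for illustration.

```python
def choose_titles(link_titles, page_titles):
    """Pick one title per article.

    link_titles: titles taken from the links on the home page.
    page_titles: <title> values read from each article page.
    Hypothetical helper; not the bridge's actual code.
    """
    if len(link_titles) != len(page_titles):
        raise ValueError("expected one page title per link title")

    # A <title> shared by every article page carries no information,
    # so fall back to the titles found on the home page.
    all_same = len(page_titles) > 1 and len(set(page_titles)) == 1
    if all_same:
        return list(link_titles)

    # Otherwise prefer the page title, using the link title if it is empty.
    return [page or link for page, link in zip(page_titles, link_titles)]


print(choose_titles(["News A", "News B", "News C"], ["Site", "Site", "Site"]))
```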

Compatibility notes

Changes contained in this pull request should not affect compatibility of existing feeds. The only side effect I can think of is that existing feeds targeting a link's parent element may now show some content where they previously showed none.

* Improve parameter documentation / add tooltips
* Allow extracting content from home page instead of article page
* Keep titles from home page when every page <title> is the same
@github-actions

github-actions bot commented Jul 26, 2023

Pull request artifacts

file                                  last change
CssSelectorBridge-current-context1    2023-07-26, 17:41:14
CssSelectorBridge-pr-context1         2023-07-26, 17:41:14

@dvikan dvikan merged commit 977c0db into RSS-Bridge:master Jul 26, 2023
@ORelio
Contributor Author

ORelio commented Jul 26, 2023

@dvikan Thanks for reviewing my PR so fast 🙂

@dvikan
Contributor

dvikan commented Jul 29, 2023

getting this

Trying to get property 'href' of non-object at bridges/CssSelectorBridge.php line 202 
Trying to get property 'plaintext' of non-object at bridges/CssSelectorBridge.php line 203 
Trying to get property 'href' of non-object at bridges/CssSelectorBridge.php line 209

@ORelio
Contributor Author

ORelio commented Jul 31, 2023

getting this

Trying to get property 'href' of non-object at bridges/CssSelectorBridge.php line 202 
Trying to get property 'plaintext' of non-object at bridges/CssSelectorBridge.php line 203 
Trying to get property 'href' of non-object at bridges/CssSelectorBridge.php line 209

@dvikan Managed to reproduce your issue. This happens when the URL selector matches a DOM element that is not a link and does not contain any link. Fixing in #3585.

dvikan pushed a commit that referenced this pull request Jul 31, 2023
When using parent element as URL selector:

* If no <a> inside some elements, ignore them
* If no <a> inside ALL elements, report an error

Fixes #3573 #issuecomment-1656943318
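
The two rules in this commit message can be sketched roughly as follows. This is an illustrative Python version using only the standard library, not the PHP code from `bridges/CssSelectorBridge.php`; the names `LinkCollector` and `extract_links` are hypothetical, and the "selector" is simplified to a bare tag name other than `a`.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Record the first <a> found inside each <container_tag> element."""

    def __init__(self, container_tag):
        super().__init__()
        self.container_tag = container_tag
        self.containers = []   # one entry per container: dict, or None if no <a>
        self._depth = 0        # nesting depth inside the current container
        self._in_link = False  # True while reading the text of the recorded <a>

    def handle_starttag(self, tag, attrs):
        if tag == self.container_tag and self._depth == 0:
            self._depth = 1
            self.containers.append(None)
        elif self._depth > 0:
            if tag == self.container_tag:
                self._depth += 1
            if tag == "a" and self.containers[-1] is None:
                self.containers[-1] = {"href": dict(attrs).get("href"), "title": ""}
                self._in_link = True

    def handle_endtag(self, tag):
        if tag == "a":
            self._in_link = False
        elif tag == self.container_tag and self._depth > 0:
            self._depth -= 1

    def handle_data(self, data):
        if self._in_link:
            self.containers[-1]["title"] += data


def extract_links(html, container_tag):
    parser = LinkCollector(container_tag)
    parser.feed(html)
    parser.close()
    # Rule 1: containers without a usable <a> are ignored.
    links = [c for c in parser.containers if c is not None and c["href"]]
    # Rule 2: if no container has an <a>, report an error.
    if not links:
        raise ValueError(f"No <a> found inside any <{container_tag}> element")
    return links


page = '<article><a href="/news/1">First</a></article><article><p>No link</p></article>'
print(extract_links(page, "article"))
```

Checking for the missing link before reading `href` or the link text is what avoids the "property of non-object" warnings reported above.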

2 participants