Hey 👋
Amazing lib. This is the first time I've run into any issues.
I am trying to scrape the https://gett.com/uk/about/ webpage to get all the cities they are in, but I get an error:
    ❯ yarn seed
    yarn run v1.12.3
    $ node data/seed.js
    (get) loaded [get] https://gett.com/uk/about
    (find) no results for "#section3"
    []
    (get) stack: 0, requests: 1 (0 queued), RAM: 36.09Mb (+36.09Mb), libxml: 0.0% (44 nodes), heap: 60% of 16.83Mb
    ✨ Done in 0.70s.

    ~/Projects/uber-cities master*
    ❯ yarn seed
    yarn run v1.12.3
    $ node data/seed.js
    (get) loaded [get] https://gett.com/uk/about
    Document {
      errors: [
        { Error: htmlParseEntityRef: expecting ';'
            at Object.module.exports.fromHtml (/Users/saravieira/Projects/uber-cities/node_modules/libxmljs/lib/document.js:143:21)
            at next (/Users/saravieira/Projects/uber-cities/node_modules/osmosis/lib/Request.js:51:31)
            at /Users/saravieira/Projects/uber-cities/node_modules/osmosis/lib/Request.js:99:13
            at done (/Users/saravieira/Projects/uber-cities/node_modules/needle/lib/needle.js:432:14)
            at PassThrough.<anonymous> (/Users/saravieira/Projects/uber-cities/node_modules/needle/lib/needle.js:671:11)
            at PassThrough.emit (events.js:180:13)
            at endReadableNT (_stream_readable.js:1106:12)
            at process._tickCallback (internal/process/next_tick.js:178:19)
          domain: 5, code: 23, level: 2, column: 341, file: 'https://gett.com/uk/about', line: 1 },
        { Error: htmlParseEntityRef: expecting ';'
            at Object.module.exports.fromHtml (/Users/saravieira/Projects/uber-cities/node_modules/libxmljs/lib/document.js:143:21)
            at next (/Users/saravieira/Projects/uber-cities/node_modules/osmosis/lib/Request.js:51:31)
            at /Users/saravieira/Projects/uber-cities/node_modules/osmosis/lib/Request.js:99:13
            at done (/Users/saravieira/Projects/uber-cities/node_modules/needle/lib/needle.js:432:14)
            at PassThrough.<anonymous> (/Users/saravieira/Projects/uber-cities/node_modules/needle/lib/needle.js:671:11)
            at PassThrough.emit (events.js:180:13)
            at endReadableNT (_stream_readable.js:1106:12)
            at process._tickCallback (internal/process/next_tick.js:178:19)
          domain: 5, code: 23, level: 2, column: 473, file: 'https://gett.com/uk/about', line: 1 },
        { Error: htmlParseEntityRef: expecting ';'
            at Object.module.exports.fromHtml (/Users/saravieira/Projects/uber-cities/node_modules/libxmljs/lib/document.js:143:21)
            at next (/Users/saravieira/Projects/uber-cities/node_modules/osmosis/lib/Request.js:51:31)
            at /Users/saravieira/Projects/uber-cities/node_modules/osmosis/lib/Request.js:99:13
            at done (/Users/saravieira/Projects/uber-cities/node_modules/needle/lib/needle.js:432:14)
            at PassThrough.<anonymous> (/Users/saravieira/Projects/uber-cities/node_modules/needle/lib/needle.js:671:11)
            at PassThrough.emit (events.js:180:13)
            at endReadableNT (_stream_readable.js:1106:12)
            at process._tickCallback (internal/process/next_tick.js:178:19)
          domain: 5, code: 23, level: 2, column: 516, file: 'https://gett.com/uk/about', line: 1 },
        { Error: htmlParseEntityRef: expecting ';'
            at Object.module.exports.fromHtml (/Users/saravieira/Projects/uber-cities/node_modules/libxmljs/lib/document.js:143:21)
            at next (/Users/saravieira/Projects/uber-cities/node_modules/osmosis/lib/Request.js:51:31)
            at /Users/saravieira/Projects/uber-cities/node_modules/osmosis/lib/Request.js:99:13
            at done (/Users/saravieira/Projects/uber-cities/node_modules/needle/lib/needle.js:432:14)
            at PassThrough.<anonymous> (/Users/saravieira/Projects/uber-cities/node_modules/needle/lib/needle.js:671:11)
            at PassThrough.emit (events.js:180:13)
            at endReadableNT (_stream_readable.js:1106:12)
            at process._tickCallback (internal/process/next_tick.js:178:19)
          domain: 5, code: 23, level: 2, column: 525, file: 'https://gett.com/uk/about', line: 1 } ],
I assume this is because the HTML on their page is malformed.
Is there any way to work around this and get the HTML back, even if only as a string?
Thank you
This is caused by HTML entities that are missing their trailing semicolon, such as:
Editors’ Choice on App Store
One option would be to use the preprocess option to fix these before the document is parsed.
The only other option would be to set libxml to ignore HTML entities.
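As a rough illustration of the first option, here is a minimal sketch of a function that appends the missing semicolon to a handful of well-known named entities. The name `fixEntities` and the entity list are assumptions made for this example, not part of osmosis or libxmljs, and how the function gets hooked into the preprocess option is left open; check the osmosis documentation for the exact signature.

```javascript
// Hypothetical helper, not part of osmosis: repairs named HTML entities
// that are missing their trailing ';' so the parser stops reporting
// "htmlParseEntityRef: expecting ';'" for them.

// Assumed, deliberately short list of entity names to repair.
// Extend it to match whatever the target page actually uses.
const KNOWN_ENTITIES = [
  'amp', 'lt', 'gt', 'quot', 'apos', 'nbsp',
  'lsquo', 'rsquo', 'ldquo', 'rdquo',
  'ndash', 'mdash', 'hellip', 'copy', 'reg',
];

// Match "&name" only when it is NOT followed by ';' or another
// alphanumeric character, so "&amp;" and longer names such as
// "&ltimes;" are left untouched.
const ENTITY_WITHOUT_SEMICOLON = new RegExp(
  `&(${KNOWN_ENTITIES.join('|')})(?![;A-Za-z0-9])`,
  'g'
);

function fixEntities(html) {
  // Re-emit the entity name with the semicolon appended.
  return html.replace(ENTITY_WITHOUT_SEMICOLON, '&$1;');
}
```

The lookahead keeps the fix conservative: an entity that runs straight into the following letters (no space or punctuation after the name) is ambiguous and is left as it is, so pages with that pattern would need a more careful rule.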