This bug has probably always been there in all versions/branches of crawler4j.
I opened a pull request with more explanation and a workaround in the Javadoc of the tests. The workaround could probably also be applied inside edu.uci.ics.crawler4j.parser.Parser, around line 89: test whether getHtml() returns "<html></html>" and skip URL extraction if that's the case. Looks like a safe bet?
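The check described above can be sketched as a small standalone guard. This is only an illustration, not the code from the pull request or from crawler4j: the names EmptyHtmlGuard and shouldExtractUrls are hypothetical, and the trimming of whitespace and the handling of null are assumptions beyond the plain comparison with "<html></html>".

```java
// Hypothetical helper, not part of crawler4j: decides whether URL
// extraction should run for a parsed page, based on its HTML string.
final class EmptyHtmlGuard {

    // The empty document that the issue proposes to treat as "nothing to extract".
    private static final String EMPTY_DOCUMENT = "<html></html>";

    private EmptyHtmlGuard() {
        // static utility, no instances
    }

    // Returns false when the HTML is null or is just the empty document
    // (ignoring leading/trailing whitespace, which is an assumption here),
    // true otherwise.
    static boolean shouldExtractUrls(String html) {
        if (html == null) {
            return false;
        }
        return !EMPTY_DOCUMENT.equals(html.trim());
    }
}
```

Inside the parser, the value returned by getHtml() would be passed to such a check before the outgoing URLs are collected, and extraction would be skipped when it returns false.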
Feel free to use it however fits the project best. The tests are in JUnit 5 instead of Spock, but I didn't use mocking frameworks or add other dependencies like that. The test that verifies that the wrong extraction happens should be useful, as it can also signal when it would be safe to get rid of the "fix code".