Releases: HHN/crawler4j

v4.7.0

25 Oct 08:28

Breaking Changes

Robots

  • Replaces homebrew robotstxt code with crawler-commons

Normalization

  • Replaces homebrew URL normalization with crawler-commons

You now need to pass a BasicURLNormalizer to both the PageFetcher and the CrawlController. A normalizer can be built like this, for example:

BasicURLNormalizer normalizer = BasicURLNormalizer.newBuilder().idnNormalization(BasicURLNormalizer.IdnNormalization.NONE).build();

Please note that BasicURLNormalizer can also apply IDN normalization; the example above turns it off with IdnNormalization.NONE.
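
For readers unsure what URL normalization does in a crawler, here is a minimal, self-contained sketch using only java.net.URI. It is not the crawler-commons implementation and does not reproduce its rules. The class NormalizationSketch and the method normalize are hypothetical names chosen for illustration, and the steps shown (lowercasing scheme and host, dropping default ports, resolving dot segments, removing the fragment) are an assumed typical subset.

```java
import java.net.URI;
import java.util.Locale;

// Hypothetical illustration only; not part of crawler4j or crawler-commons.
public class NormalizationSketch {

    // Returns a canonical form so that equivalent URLs compare equal.
    static String normalize(String url) {
        // URI.normalize() resolves "." and ".." path segments.
        URI uri = URI.create(url).normalize();
        if (uri.getScheme() == null || uri.getHost() == null) {
            throw new IllegalArgumentException("absolute URL with host expected: " + url);
        }
        String scheme = uri.getScheme().toLowerCase(Locale.ROOT);
        String host = uri.getHost().toLowerCase(Locale.ROOT);

        // Drop the port when it is the default for the scheme.
        int port = uri.getPort();
        if (("http".equals(scheme) && port == 80) || ("https".equals(scheme) && port == 443)) {
            port = -1;
        }

        // An empty path is equivalent to "/".
        String path = uri.getRawPath();
        if (path == null || path.isEmpty()) {
            path = "/";
        }

        // Rebuild the URL; the fragment is intentionally left out.
        StringBuilder sb = new StringBuilder();
        sb.append(scheme).append("://").append(host);
        if (port != -1) {
            sb.append(':').append(port);
        }
        sb.append(path);
        if (uri.getRawQuery() != null) {
            sb.append('?').append(uri.getRawQuery());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // prints "http://example.com/a/c"
        System.out.println(normalize("HTTP://Example.COM:80/a/./b/../c#top"));
    }
}
```

A crawler needs this kind of canonical form so that the frontier does not schedule the same page twice under different spellings of its URL.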

Dependency Upgrades

  • Updates Tika to 2.1.0 (check and update your excludes if you import crawler4j into your own code base)
  • Updates Jackson to 2.13.0 (test scope only)
  • Updates PostgreSQL driver to 42.3.0 (examples only)
  • Updates Flyway to 8.0.1 (examples only)
  • Updates Guava to 31.0.1-jre
  • Updates Groovy to 3.0.9 (test scope only)

Additional Notes

Full Changelog: v4.6.0...v4.7.0

v4.6.0

26 Jul 13:53

Breaking Dependency Upgrades

  • Updates Tika to 2.0.0 (check and update your excludes if you import crawler4j into your own code base)
  • Updates HttpClient to 5.1.x

v4.5.1

31 May 14:22

Breaking Changes

  • Updates Bytecode Level to Java 11

Dependency Upgrades

v4.5.0

19 Dec 16:16

First release of my crawler4j fork.

It includes some breaking changes:

  • New module structure
  • New groupId to be able to deploy to Maven Central
  • New artifactIds for core artifacts (see README.md)

Other changes:

  • Frontier (database) abstraction layer
  • First draft of an HSQLDB-based frontier implementation