Handle invalid S3 hostname exceptions with older aws-java-sdk versions #254
Conversation
Current coverage is 89.33% (diff: 100%)

@@            master    #254    diff @@
=======================================
   Files          12      12
   Lines         641     647      +6
   Methods       559     564      +5
   Messages        0       0
   Branches       82      83      +1
=======================================
 + Hits          572     578      +6
   Misses         69      69
   Partials        0       0
} catch {
  case e: java.lang.IllegalArgumentException => {
    if (e.getMessage().
      startsWith("Invalid S3 URI: hostname does not appear to be a valid S3 endpoint")) {
Is there a way to programmatically get the AWS SDK version so that you don't have to use exceptions for control flow?
It turns out that you can do this with https://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/amazonaws/util/VersionInfoUtils.html#getVersion--, but it looks like the implementation of that method relies on being able to load a properties file from the SDK JAR, and that might break under certain repackaging scenarios, so while it's a bit ugly I think this is fine for now. Just go ahead and clean things up so we don't have to explicitly re-throw.
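For what it's worth, a minimal sketch of what that version check might look like (the 1.11 cutoff below is a placeholder, not the actual SDK release in which the URI parsing changed):

```scala
import com.amazonaws.util.VersionInfoUtils
import scala.util.Try

// Sketch only: VersionInfoUtils reads the SDK version from a properties
// file bundled in the SDK jar, which is exactly the part that can break
// under repackaging/shading.
val sdkVersion = VersionInfoUtils.getVersion() // e.g. "1.7.4"

// Hypothetical cutoff: the real SDK version where AmazonS3URI's hostname
// parsing changed would need to be confirmed before relying on this.
val needsEndpointWorkaround =
  Try(sdkVersion.split("\\.").take(2).map(_.toInt)).toOption match {
    case Some(Array(major, minor)) => major < 1 || (major == 1 && minor < 11)
    case _ => true // unparseable version; assume the old behavior
  }
```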
I'm going to test this out later by running all of the integration tests with an old AWS SDK. It might be cool to reconfigure the build so we can add an old SDK version to our test matrix. If you'd like to take a shot at this and don't mind SBT magic, take a look at how we handle the Spark and Avro versions in the build and test configurations (the key bit of trickiness is how we build against one version and test against another so that we catch binary compatibility problems).
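For flavor, a very loose sbt sketch of the matrix idea (the property name and version numbers are made up, and the real build's Spark/Avro handling is more involved, since it forces a different version onto the test classpath than the one compiled against):

```scala
// build.sbt (sketch): pick the AWS SDK version per test-matrix entry, e.g.
//   sbt -Daws.sdk.version=1.7.4 test
val awsSdkVersion = sys.props.getOrElse("aws.sdk.version", "1.10.22")

libraryDependencies += "com.amazonaws" % "aws-java-sdk" % awsSdkVersion % "provided"
```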
 */
def addEndpointToUrl(url: String, domain: String = "s3.amazonaws.com"): String = {
  val uri = new URI(url)
  val hostWithEndpoint = uri.getHost() + "." + domain
You can omit the `()` parens in these `get*()` calls; they're not necessary and cause style suggestions in IntelliJ.
I fear that this will break if we don't test it, so in 39776a6 I've added end-to-end integration tests that run against the 1.7.4 version of the SDK. Could you go ahead and cherry-pick that change, address my comments, and push, then I'll retest and merge?
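To make the excerpt above self-contained, here's one plausible completion of the helper; everything past `hostWithEndpoint` is inferred, not the actual merged code (with the parens on the `get*` calls dropped per the comment above):

```scala
import java.net.URI

// Sketch: rebuild the URL with the default S3 endpoint appended to the
// bare bucket hostname, e.g. "s3n://bucket/path" ->
// "s3n://bucket.s3.amazonaws.com/path".
def addEndpointToUrl(url: String, domain: String = "s3.amazonaws.com"): String = {
  val uri = new URI(url)
  val hostWithEndpoint = uri.getHost + "." + domain
  new URI(
    uri.getScheme,
    uri.getUserInfo,
    hostWithEndpoint,
    uri.getPort,
    uri.getPath,
    uri.getQuery,
    uri.getFragment
  ).toString
}
```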
// try to instantiate AmazonS3URI with url
new AmazonS3URI(url)
} catch {
  case e: IllegalArgumentException if e.getMessage.startsWith("Invalid S3 URI: hostname does not appear to be a valid S3 endpoint") => {
Scalastyle is complaining that this line is too long. Try wrapping the string to the next line.
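The wrapped form might end up looking something like this (the enclosing method name and the `addEndpointToUrl` fallback are inferred from the excerpts above, not the exact merged code):

```scala
import com.amazonaws.services.s3.AmazonS3URI

// Sketch: try the URL as-is, and on the old SDK's hostname error retry
// with the default S3 endpoint appended.
def createS3URI(url: String): AmazonS3URI = {
  try {
    // try to instantiate AmazonS3URI with url
    new AmazonS3URI(url)
  } catch {
    case e: IllegalArgumentException if e.getMessage.startsWith(
        "Invalid S3 URI: hostname does not appear to be a valid S3 endpoint") =>
      new AmazonS3URI(addEndpointToUrl(url))
  }
}
```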
LGTM, so I'm going to merge this to master and will try to backport it to
Yay! Thanks for merging this in so quickly. Can't wait to start using the official branch in our builds rather than including this in our patched-up jar. Let me know if there are any issues that come up and I will try to address them.
I'll try to package either a
We've seen a lot of messages lately regarding the "Invalid S3 URI: hostname does not appear to be a valid S3 endpoint" exception, so we thought we should contribute our two cents and the code changes that worked for us.

We tried many of the approaches listed in that thread, including using the `spark.executor.extraClassPath` and `spark.driver.extraClassPath` configuration properties to prepend to the classpath, and including the newer SDK in the assembled jar or as a shaded jar. Unfortunately, many of these approaches failed, mainly because the machines themselves already have the older aws-java-sdk jar, which usually takes precedence.

We ended up going with what Josh mentioned earlier: changing the S3 URL in the spark-redshift code to add the endpoint to the host (`*.s3.amazonaws.com`). This logic first tries to instantiate an `AmazonS3URI` and, if that fails, retries with the default S3 Amazon domain added to the host.

Author: James Hou <jameshou@data101.udemy.com>
Author: James Hou <james.hou@gmail.com>
Author: Josh Rosen <joshrosen@databricks.com>

Closes #254 from jameshou/feature/add-s3-full-endpoint-v1.
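For anyone retracing our steps, the classpath-prepending attempt mentioned above looked roughly like this (paths are placeholders); it still lost out to the older SDK jar already installed on the machines:

```scala
import org.apache.spark.SparkConf

// Sketch of the approach that did NOT work reliably for us: prepending a
// newer aws-java-sdk jar to the driver and executor classpaths.
val conf = new SparkConf()
  .set("spark.driver.extraClassPath", "/path/to/newer/aws-java-sdk.jar")
  .set("spark.executor.extraClassPath", "/path/to/newer/aws-java-sdk.jar")
```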