Handle invalid S3 hostname exceptions with older aws-java-sdk versions #254
Conversation
Current coverage is 89.33% (diff: 100%)

@@            master    #254    diff @@
=======================================
   Files          12      12
   Lines         641     647      +6
   Methods       559     564      +5
   Messages        0       0
   Branches       82      83      +1
=======================================
 + Hits          572     578      +6
   Misses         69      69
   Partials        0       0
} catch {
  case e: java.lang.IllegalArgumentException => {
    if (e.getMessage().
      startsWith("Invalid S3 URI: hostname does not appear to be a valid S3 endpoint")) {
Is there a way to programmatically get the AWS SDK version so that you don't have to use exceptions for control flow?
It turns out that you can do this with https://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/amazonaws/util/VersionInfoUtils.html#getVersion--, but it looks like the implementation of that method relies on being able to load a properties file from the SDK JAR, and that might break under certain repackaging scenarios, so while it's a bit ugly I think this is fine for now. Just go ahead and clean things up so we don't have to explicitly re-throw.
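For what it's worth, a minimal sketch of what that version check might look like (the 1.11 cutoff below is a placeholder, not the actual SDK release in which the URI parsing changed):

```scala
import com.amazonaws.util.VersionInfoUtils
import scala.util.Try

// Sketch only: VersionInfoUtils reads the SDK version from a properties
// file bundled in the SDK jar, which is exactly the part that can break
// under repackaging/shading.
val sdkVersion = VersionInfoUtils.getVersion() // e.g. "1.7.4"

// Hypothetical cutoff: the real SDK version where AmazonS3URI's hostname
// parsing changed would need to be confirmed before relying on this.
val needsEndpointWorkaround =
  Try(sdkVersion.split("\\.").take(2).map(_.toInt)).toOption match {
    case Some(Array(major, minor)) => major < 1 || (major == 1 && minor < 11)
    case _ => true // unparseable version; assume the old behavior
  }
```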
I'm going to test this out later by running all of the integration tests with an old AWS SDK. It might be cool to reconfigure the build so we can add an old SDK version to our test matrix. If you'd like to take a shot at this and don't mind SBT magic, take a look at how we handle the Spark and Avro versions in the build and test configurations (the key bit of trickiness is how we build against one version and test against another so that we catch binary compatibility problems).
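For flavor, a very loose sbt sketch of the matrix idea (the property name and version numbers are made up, and the real build's Spark/Avro handling is more involved, since it forces a different version onto the test classpath than the one compiled against):

```scala
// build.sbt (sketch): pick the AWS SDK version per test-matrix entry, e.g.
//   sbt -Daws.sdk.version=1.7.4 test
val awsSdkVersion = sys.props.getOrElse("aws.sdk.version", "1.10.22")

libraryDependencies += "com.amazonaws" % "aws-java-sdk" % awsSdkVersion % "provided"
```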
 */
def addEndpointToUrl(url: String, domain: String = "s3.amazonaws.com"): String = {
  val uri = new URI(url)
  val hostWithEndpoint = uri.getHost() + "." + domain
You can omit the `()` parens in these `get*()` calls; they're not necessary and cause style suggestions in IntelliJ.
I fear that this will break if we don't test it, so in 39776a6 I've added end-to-end integration tests that run against the 1.7.4 version of the SDK. Could you go ahead and cherry-pick that change, address my comments, and push, then I'll retest and merge?
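To make the excerpt above self-contained, here's one plausible completion of the helper; everything past `hostWithEndpoint` is inferred, not the actual merged code (with the parens on the `get*` calls dropped per the comment above):

```scala
import java.net.URI

// Sketch: rebuild the URL with the default S3 endpoint appended to the
// bare bucket hostname, e.g. "s3n://bucket/path" ->
// "s3n://bucket.s3.amazonaws.com/path".
def addEndpointToUrl(url: String, domain: String = "s3.amazonaws.com"): String = {
  val uri = new URI(url)
  val hostWithEndpoint = uri.getHost + "." + domain
  new URI(
    uri.getScheme,
    uri.getUserInfo,
    hostWithEndpoint,
    uri.getPort,
    uri.getPath,
    uri.getQuery,
    uri.getFragment
  ).toString
}
```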
// try to instantiate AmazonS3URI with url
new AmazonS3URI(url)
} catch {
  case e: IllegalArgumentException if e.getMessage.startsWith("Invalid S3 URI: hostname does not appear to be a valid S3 endpoint") => {
Scalastyle is complaining that this line is too long. Try wrapping the string to the next line.
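The wrapped form might end up looking something like this (the enclosing method name and the `addEndpointToUrl` fallback are inferred from the excerpts above, not the exact merged code):

```scala
import com.amazonaws.services.s3.AmazonS3URI

// Sketch: try the URL as-is, and on the old SDK's hostname error retry
// with the default S3 endpoint appended.
def createS3URI(url: String): AmazonS3URI = {
  try {
    // try to instantiate AmazonS3URI with url
    new AmazonS3URI(url)
  } catch {
    case e: IllegalArgumentException if e.getMessage.startsWith(
        "Invalid S3 URI: hostname does not appear to be a valid S3 endpoint") =>
      new AmazonS3URI(addEndpointToUrl(url))
  }
}
```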
LGTM, so I'm going to merge this to master and will try to backport it to
Yay! Thanks for merging this in so quickly. Can't wait to start using the official branch in our builds rather than including this in our patched-up jar. Let me know if there are any issues that come up and I will try to address them.
I'll try to package either a
We've seen a lot of messages lately regarding the "Invalid S3 URI: hostname does not appear to be a valid S3 endpoint" exception, so we thought we should contribute our two cents and the code changes that worked for us.

We tried many of the approaches listed in that thread, including using the `spark.executor.extraClassPath` and `spark.driver.extraClassPath` configuration properties to prepend to the classpath, and including the newer SDK in the assembled jar or as a shaded jar. Unfortunately, many of these approaches failed, mainly because the machines themselves already have the older aws-java-sdk jar, which usually takes precedence.

We ended up going with what Josh mentioned earlier: changing the S3 URL in the spark-redshift code to add the endpoint to the host (`*.s3.amazonaws.com`). This logic first tries to instantiate an `AmazonS3URI` and, if that fails, retries with the default S3 Amazon domain added to the host.

Author: James Hou <jameshou@data101.udemy.com>
Author: James Hou <james.hou@gmail.com>
Author: Josh Rosen <joshrosen@databricks.com>

Closes #254 from jameshou/feature/add-s3-full-endpoint-v1.
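For anyone retracing our steps, the classpath-prepending attempt mentioned above looked roughly like this (paths are placeholders); it still lost out to the older SDK jar already installed on the machines:

```scala
import org.apache.spark.SparkConf

// Sketch of the approach that did NOT work reliably for us: prepending a
// newer aws-java-sdk jar to the driver and executor classpaths.
val conf = new SparkConf()
  .set("spark.driver.extraClassPath", "/path/to/newer/aws-java-sdk.jar")
  .set("spark.executor.extraClassPath", "/path/to/newer/aws-java-sdk.jar")
```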