
[EXE-1558] Setup build and deploy; add pushdown #1

Merged
merged 10 commits into spark-redshift-community-3-3 from setup-aiq
May 24, 2023

Conversation

dorisZ017
Collaborator

@dorisZ017 dorisZ017 commented May 19, 2023

Noticed some issues with partially invalid queries, documented here: https://actioniq.atlassian.net/browse/EXE-1614; will continue digging as part of bringing generic JDBC pushdown to flame.

Tested reading from Redshift in /~https://github.com/ActionIQ/flame/pull/440/files

sc.setLogLevel("info")
// spark.conf.set("spark.aiq.sql.enable_jdbc_pushdown", true)
val redshiftOpts = Map(
  "aws_iam_role" -> "arn:aws:iam::938824509083:role/journeys-redshift-dev-redshift-role",
  "tempDir" -> s"s3a://aiq-1d-expire-dev/redshift/109",
  "driver" -> "com.amazon.redshift.jdbc42.Driver",
  "dbtable" -> "qahybridredshift.rd_dim_item",
  "queryGroup" -> "123456678908",
  "aiq_partner" -> "partnerName",
  "ApplicationName" -> "appName",
  "password" -> <>,
  "user" -> <>,
  "url" -> "jdbc:redshift://test-hybridcompute.938824509083.us-east-1.redshift-serverless.amazonaws.com:5439/dev?"
)
val df = spark.read.format("io.github.spark_redshift_community.spark.redshift").options(redshiftOpts).load()
df.createOrReplaceTempView("t")
val qe = spark.sql("select color from t where lower(color) = 'snow'").queryExecution
qe.sparkPlan.asInstanceOf[io.github.spark_redshift_community.spark.redshift.pushdowns.RedshiftPushDownPlan].statement
res8: String = SELECT ( SUBQUERY_1.color ) AS SUBQUERY_2_COL_0 FROM ( SELECT * FROM ( SELECT * FROM qahybridredshift.rd_dim_item AS REDSHIFT_QUERY_ALIAS ) AS SUBQUERY_0 WHERE ( ( SUBQUERY_0.color IS NOT NULL ) AND ( LOWER ( SUBQUERY_0.color ) = ?(Some(snow)) ) ) ) AS SUBQUERY_1
[Screenshot: 2023-05-24 at 12:19 PM]
sbt test
sbt publish
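The generated statement above wraps each pushed-down operator in a numbered subquery alias (`SUBQUERY_0`, `SUBQUERY_1`, ...). A minimal sketch of that wrapping scheme, using hypothetical helper names (`PushdownSketch`, `wrap`, `build`) rather than the connector's actual API:

```scala
// Illustrative sketch of the SUBQUERY_n nesting visible in the generated
// statement above. All names here are hypothetical, not the connector's API.
object PushdownSketch {
  // Each pushdown layer wraps the previous one under a numbered alias.
  def wrap(inner: String, depth: Int): String =
    s"( $inner ) AS SUBQUERY_$depth"

  // Table scan -> filter -> projection, mirroring the statement shown above.
  def build(table: String, filter: String, column: String): String = {
    val scan     = wrap(s"SELECT * FROM $table AS REDSHIFT_QUERY_ALIAS", 0)
    val filtered = wrap(s"SELECT * FROM $scan WHERE $filter", 1)
    s"SELECT ( SUBQUERY_1.$column ) AS SUBQUERY_2_COL_0 FROM $filtered"
  }
}
```

For example, `PushdownSketch.build("qahybridredshift.rd_dim_item", "LOWER ( SUBQUERY_0.color ) = 'snow'", "color")` reproduces the shape of the `res8` statement, minus the parameterized `?(Some(snow))` placeholder.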

@dorisZ017 dorisZ017 changed the title setup deploy [EXE-1558] Setup build and deploy May 22, 2023
@dorisZ017 dorisZ017 marked this pull request as ready for review May 22, 2023 20:13
@dorisZ017 dorisZ017 requested a review from MasterDDT May 22, 2023 20:19
@dorisZ017 dorisZ017 marked this pull request as draft May 23, 2023 17:38
@MasterDDT

Hmm, how did GitHub let you request me for review if it's still in draft mode 😄

@dorisZ017
Collaborator Author

> Hmm, how did GitHub let you request me for review if it's still in draft mode 😄

No, I was just adding the build and publish stuff, but later figured I could just add pushdowns on top of it, so I converted it to a draft... I split the pushdowns into another PR if you want to review the build part first.

@dorisZ017 dorisZ017 changed the title [EXE-1558] Setup build and deploy [EXE-1558] Setup build and deploy; add pushdown May 24, 2023
@dorisZ017 dorisZ017 marked this pull request as ready for review May 24, 2023 16:23
@dorisZ017 dorisZ017 requested a review from ShaoFuWu May 24, 2023 16:23

@ShaoFuWu ShaoFuWu left a comment


Looks good 👍 One thing to follow up on: can you test the pushdowns that we listed in our connector Confluence page and make sure those still work in Spark 3? I guess it can wait until you deploy Spark and run from AIQ, if that's easier.
Also, can you please fix the formatting in RedshiftPushDownQuery.scala?

@@ -1,7 +1,4 @@
-Dfile.encoding=UTF8
-Xms1024M
-Xmx1024M
-Xss6M
-XX:MaxPermSize=512m


These don't work with Java 17?

build.sbt Outdated
// DON't UPGRADE AWS-SDK-JAVA if not compatible with hadoop version
val testAWSJavaSDKVersion = sys.props.get("aws.testVersion").getOrElse("1.11.1033")
val testAWSJavaSDKVersion = sys.props.get("aws.testVersion").getOrElse("1.11.1026")


We use aws-java-sdk 1.12.x; should we upgrade this one? Or is it just for tests and we don't care?

Collaborator Author


I was referencing the dependencies of hadoop-aws: https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-aws/3.3.2, but yeah, I'll try upgrading here.
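For context, the `sys.props` lookup in the build.sbt line above just lets a `-D` flag on the sbt command line (e.g. `sbt -Daws.testVersion=1.12.262 test`) override the pinned default. A minimal sketch of the same pattern:

```scala
// Same pattern as the build.sbt line above: a JVM system property
// overrides the pinned default version when present.
def awsTestVersion: String =
  sys.props.get("aws.testVersion").getOrElse("1.11.1026")
```

With no `-Daws.testVersion=...` set, the pinned default is used; otherwise the property wins.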

}

if (jdbcProps.get("aiq_testing").exists(_.toBoolean)) {
  RedshiftPushDownSqlStatement.capturedQueries.append(finalQuery)


Prob don't need this anymore 🤷, now that you've exposed the statement in the plan. But I guess it's fine to keep.

I don't see it used below in any tests?

Collaborator Author


Will try adding some unit tests later.
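Such a unit test could be fairly small, assuming `capturedQueries` is a mutable buffer the connector appends to whenever the `aiq_testing` option is set. The sketch below uses stand-in names (`CaptureSketch`, `maybeCapture`), not the repo's real code:

```scala
import scala.collection.mutable

// Stand-in for the capture hook in the diff above: append the final pushdown
// statement only when the aiq_testing option is "true". All names here are
// illustrative, not the connector's real API.
object CaptureSketch {
  val capturedQueries: mutable.ArrayBuffer[String] = mutable.ArrayBuffer.empty

  def maybeCapture(jdbcProps: Map[String, String], finalQuery: String): Unit =
    if (jdbcProps.get("aiq_testing").exists(_.toBoolean)) {
      capturedQueries.append(finalQuery)
    }
}
```

A test would then run a query with `"aiq_testing" -> "true"` in the options and assert on the buffer's contents afterwards.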

@dorisZ017 dorisZ017 merged commit 5cfff18 into spark-redshift-community-3-3 May 24, 2023
@delete-merged-branch delete-merged-branch bot deleted the setup-aiq branch May 24, 2023 21:41