Add spark k8s operator launcher #1225
Conversation
/kind feature
/retest
Signed-off-by: Oleg Avdeev <oleg.v.avdeev@gmail.com>
@@ -265,7 +264,7 @@ def _stage_file(self, file_path: str, job_id: str) -> str:
         return blob_uri_str

     def dataproc_submit(
-        self, job_params: SparkJobParameters
+        self, job_params: SparkJobParameters, extra_properties: Dict[str, str]
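The hunk above threads an `extra_properties` dict into `dataproc_submit`. A minimal sketch of how such a parameter is typically used, merging caller-supplied Spark properties over launcher defaults. The `SparkJobParameters` stand-in, the default values, and the merge behavior here are assumptions for illustration, not Feast's actual implementation:

```python
from typing import Dict, NamedTuple

# Minimal stand-in for Feast's SparkJobParameters; the field is
# illustrative, not the actual Feast class.
class SparkJobParameters(NamedTuple):
    main_file: str

# Hypothetical launcher defaults for the sketch.
DEFAULT_PROPERTIES: Dict[str, str] = {"spark.executor.instances": "2"}

def dataproc_submit(job_params: SparkJobParameters,
                    extra_properties: Dict[str, str]) -> Dict[str, str]:
    # Caller-supplied properties override the defaults, which is the
    # usual reason for threading an extra_properties dict through a
    # submit call.
    return {**DEFAULT_PROPERTIES, **extra_properties}

merged = dataproc_submit(SparkJobParameters("job.py"),
                         {"spark.executor.memory": "4g"})
```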
I'm not sure if you saw #1198. Should we close that PR after yours has been merged, or first merge that one?
No, I haven't. It looks like it is solving a slightly different problem, namely user-specified extra options. I don't feel strongly about which one to merge first (and which one to rebase).
I'd rather add tests in a followup PR. I plan to add another integration test, alongside with
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: oavdeev, woop. The full list of commands accepted by this bot can be found here. The pull request process is described here.
/lgtm
What this PR does / why we need it:
This allows Feast to run Spark jobs using spark-k8s-operator. To enable it:
- Set `spark_launcher="k8s"`.
- Set `spark_k8s_use_incluster_config` to `True` or `False`, depending on whether Spark is running in the same k8s cluster or not. Make sure that the feast serviceaccount has permissions to create `SparkApplication` resources. Provide KUBECONFIG if running Feast outside of the cluster.
- `spark_staging_location` and `historical_feature_output_location` need to be set; use the `s3a://` URL scheme if using S3.
- Set `spark_k8s_job_template_path` to point to a YAML template containing the Spark application configuration. Feast comes with one out of the box, but in production you'll likely want to provide a custom one.

Additional changes:
- Support for the `s3a://` URL scheme (used by OSS Spark to access S3; we didn't need this for EMR, since it understands the `s3://` scheme just fine).

Does this PR introduce a user-facing change?: