Add local env runtime variables override for local testing #690
api/py/ai/chronon/repo/run.py
Outdated
@@ -326,7 +329,12 @@ def set_runtime_env(args): | |||
environment["cli_args"]["CHRONON_DRIVER_JAR"] = args.chronon_jar | |||
environment["cli_args"]["CHRONON_ONLINE_JAR"] = args.online_jar | |||
environment["cli_args"]["CHRONON_ONLINE_CLASS"] = args.online_class | |||
order = ["conf_env", "team_env", "default_env", "common_env", "cli_args"] | |||
# If the job is running on airflow, ignore the dev team environment. | |||
if 'AIRFLOW_CTX_EXECUTION_DATE' in os.environ: |
We want to keep the Airbnb-specific code within our data repo, so that the code here stays general when we are ready to open source.
Got it! If we cannot add the Airflow-related logic to the Chronon code, I am thinking of some other options:
1. Ask users to add an extra test flag when they run the Python job for local testing, like
python3 ~/.local/bin/run.py --mode=backfill --conf=production/joins/zipline_test/test_online_join_small.v2 --ds=2022-07-02 --test
Then in run.py we can prioritize the dev_team_env if the test flag is set.
2. Ask users to put the compiled config files they want to test into the test folder. They can add environment variables under the key test in the teams.json file so that run.py automatically picks up the test parameters. No code change is needed in run.py.
3. Nothing changes on the user side, but we add something in the data repo code, such as adding --env=prod in the Airflow scheduling code. In run.py we try to get env from the command: if env is set to prod, use the production env; otherwise use dev_team_env. This helps Airbnb internal users but introduces extra complexity for outside users (they may not use Airflow, and they may need to add this env parameter).
Options 1 and 2 both need users to change the way they do the testing. Wondering if you have any other suggestions here? @better365
@@ -326,7 +336,7 @@ def set_runtime_env(args): | |||
environment["cli_args"]["CHRONON_DRIVER_JAR"] = args.chronon_jar | |||
environment["cli_args"]["CHRONON_ONLINE_JAR"] = args.online_jar | |||
environment["cli_args"]["CHRONON_ONLINE_CLASS"] = args.online_class | |||
order = ["conf_env", "team_env", "default_env", "common_env", "cli_args"] | |||
order = ["conf_env", "team_env", "production_team_env", "default_env", "common_env", "cli_args"] |
Should we put production_team_env before team_env?
order = ["conf_env", "team_env", "production_team_env", "default_env", "common_env", "cli_args"] | |
order = ["conf_env", "production_team_env", "team_env", "default_env", "common_env", "cli_args"] |
Hi @better365, the PR has been moved to #698.
For this change, if we put production_team_env before team_env, then even for local testing the production environment will always be picked as long as it is provided in the teams.json file, right? We want to use the dev environment if we set env=dev.
The logic is:
env=dev: production_team_env = production environment, team_env = dev team environment, and we want to use the dev team environment.
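The precedence described in this comment can be sketched as follows. This is only an illustration, not the code in run.py: the helper names `team_layers` and `first_wins`, the `dev`/`production` key layout of the team config, and the variable names are all hypothetical, and the behavior for env=production is an assumption.

```python
def team_layers(team_conf, mode, env="dev"):
    """Hypothetical helper: build the team-level layers for one mode."""
    production = team_conf.get("production", {}).get(mode, {})
    dev = team_conf.get("dev", {}).get(mode, {})
    # Assumption: with env=production the dev settings are ignored entirely.
    team_env = dev if env == "dev" else production
    # team_env comes before production_team_env, as in the proposed `order` list.
    return [("team_env", team_env), ("production_team_env", production)]


def first_wins(layers):
    """Merge layers so that the first layer to define a key keeps it."""
    merged = {}
    for _name, values in layers:
        for key, value in values.items():
            merged.setdefault(key, value)
    return merged


# Hypothetical team config; the real teams.json layout may differ.
team_conf = {
    "production": {"backfill": {"QUEUE": "prod_queue", "EXECUTOR_MEMORY": "8G"}},
    "dev": {"backfill": {"QUEUE": "dev_queue"}},
}
dev_env = first_wins(team_layers(team_conf, "backfill", env="dev"))
prod_env = first_wins(team_layers(team_conf, "backfill", env="production"))
```

With team_env listed first, dev values override production values under env=dev, while keys that only production defines still fall through. Swapping the two entries would make the production value win whenever it is present, which is the concern raised here.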
PR has been moved to #698
Summary
Currently Chronon sets the runtime environment variables from several sources, in the following priority order:
- Environment variables existing already.
- Environment variables derived from args (like app_name)
- conf.metaData.modeToEnvMap for the mode (set on config)
- team environment per context and mode set on teams.json
- default team environment per context and mode set on teams.json
- Common Environment set in teams.json
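A minimal sketch of this kind of priority resolution, assuming each source is a plain dict and that the first source to define a variable wins. The helper name `resolve_runtime_env` and the sample variables are hypothetical; the source names are taken from the `order` list in the diff, and the step that derives variables from args (like app_name) is left out.

```python
import os

# Source names mirror the `order` list from the diff; earlier entries win.
ORDER = ["conf_env", "team_env", "default_env", "common_env", "cli_args"]


def resolve_runtime_env(sources, existing=None):
    """Hypothetical helper: merge env sources so the first one to define a key wins."""
    # Variables that already exist are never overwritten.
    resolved = dict(os.environ if existing is None else existing)
    for name in ORDER:
        for key, value in sources.get(name, {}).items():
            if value is not None and key not in resolved:
                resolved[key] = value
    return resolved


# Hypothetical sample values for illustration only.
sources = {
    "conf_env": {"EXECUTOR_MEMORY": "8G"},
    "team_env": {"EXECUTOR_MEMORY": "4G", "QUEUE": "team_queue"},
    "common_env": {"QUEUE": "default_queue", "VERSION": "1"},
}
resolved = resolve_runtime_env(sources, existing={"VERSION": "2"})
print(resolved)
```

Here the pre-existing VERSION is kept, the config-level EXECUTOR_MEMORY beats the team-level one, and the team QUEUE beats the common one.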
The per-context, per-mode team environment is a team-level setting defined in teams.json. A team can put environment settings (such as EMR clusters and queues) there based on its needs, and these settings are applied to all jobs within the team folder.
However, users sometimes want to test a new job locally, and local tests may use different environment settings than the production run on Airflow. Currently they need to put the job config in a different folder (such as the test folder) rather than the production folder (the folder name is parsed as the context) and add context-specific environment variables. This adds complexity to the testing process, since they still have to copy their changes to the production config to make it work in the production environment.
This PR tries to solve this problem by introducing a new arg env whose default value is dev. Users can define runtime environment variables for different environments in the teams.json file, for example, for the production environment and the dev environment:
--env=production, so the production environment will be used.
Why / Goal
Test Plan
Local tests with
python3 scripts/run.py --mode=backfill --conf=production/joins/zipline_test/test_online_join_small.v2 --ds=2024-02-20 --env=production
Local tests with
python3 scripts/run.py --mode=backfill --conf=production/joins/zipline_test/test_online_join_small.v2 --ds=2024-02-20 --env=dev
Local tests with
python3 scripts/run.py --mode=backfill --conf=production/joins/zipline_test/test_online_join_small.v2 --ds=2024-02-20
Local tests with
python3 scripts/run.py --mode=backfill --conf=production/joins/zipline_test/test_online_join_small.v2 --ds=2024-02-20 --env=dev
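The invocations above differ only in the --env value. A minimal argparse sketch of how such a flag could default to dev when omitted; this shows a hypothetical subset of the arguments and is not the actual parser in run.py:

```python
import argparse


def build_parser():
    """Hypothetical subset of run.py arguments, for illustration only."""
    parser = argparse.ArgumentParser(description="sketch of the --env flag")
    parser.add_argument("--mode")
    parser.add_argument("--conf")
    parser.add_argument("--ds")
    # Per the PR summary, env defaults to dev when the flag is not passed.
    parser.add_argument("--env", default="dev")
    return parser


parser = build_parser()
base = [
    "--mode=backfill",
    "--conf=production/joins/zipline_test/test_online_join_small.v2",
    "--ds=2024-02-20",
]
default_run = parser.parse_args(base)
dev_run = parser.parse_args(base + ["--env=dev"])
production_run = parser.parse_args(base + ["--env=production"])
```

Under this sketch the run without --env and the run with --env=dev resolve to the same environment, so only the --env=production run should pick up production settings.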
Checklist
Reviewers
@better365