This repository has been archived by the owner on May 31, 2024. It is now read-only.

Passing environment variables to increase tolerance for metadata serv… #80

Merged 1 commit on Oct 7, 2021


AndreyDovydenko
Contributor

During large scatter steps (e.g. the Nextflow sarek workflow), container jobs may fail to get credentials. In Cromwell, the cause was jobs swamping the host metadata service, and the fix was to include the following in the container environment:

export AWS_METADATA_SERVICE_TIMEOUT=10
export AWS_METADATA_SERVICE_NUM_ATTEMPTS=10

Description of Changes
Pass these environment variables to the container alongside the ones we already set.
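
A minimal sketch of the idea, not the actual AGC CDK code: the two variables are merged into whatever environment map the container already receives. The type `ContainerEnv`, the helper `withMetadataServiceTolerance`, and the placeholder `EXAMPLE_VAR` are hypothetical names used only for illustration. The values come from the exports above (a 10 second timeout and 10 attempts when the AWS CLI or SDK queries the instance metadata service).

```typescript
// Hypothetical helper for illustration; names are not taken from the AGC code base.
type ContainerEnv = Record<string, string>;

// Values from the PR description above.
const METADATA_SERVICE_ENV: ContainerEnv = {
  AWS_METADATA_SERVICE_TIMEOUT: "10",
  AWS_METADATA_SERVICE_NUM_ATTEMPTS: "10",
};

// Returns a new map and leaves the input untouched.
// Entries already present in `existing` take precedence over the defaults,
// so an explicit override is not clobbered (an assumption of this sketch).
function withMetadataServiceTolerance(existing: ContainerEnv): ContainerEnv {
  return { ...METADATA_SERVICE_ENV, ...existing };
}

// EXAMPLE_VAR stands in for the variables the container already gets.
const containerEnv = withMetadataServiceTolerance({ EXAMPLE_VAR: "value" });
console.log(containerEnv);
```

Letting existing entries win is a design choice of this sketch; swapping the spread order would force the two settings instead.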

Description of how you validated changes
I deployed the new CDK code to my own test account, ran a workflow, and verified that the newly provided environment variables are printed in the job log.

Checklist

  • If this change would make any existing documentation invalid, I have included those updates within this PR
  • I have added unit tests that prove my fix is effective or that my feature works
  • I have linted my code before raising the PR

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license

@AndreyDovydenko AndreyDovydenko merged commit a93b0d6 into aws:main Oct 7, 2021
tneely added a commit that referenced this pull request Nov 11, 2021
* Ahmaalba/Adapter Role Least Privilege Design (#57)

* Reduce role privileges for adapter role

* Added adapter role output bucket permissions

* Adjusted Nextflow SubmitJobBatchPolicy props

* CodeQL Security Analysis (#62)

* Display message if no contexts deployed (#64)

* Removed redundant command "agc version" (#67)

* Pull request template (#65)

Creates a PR template for github

* Update issue templates (#63)

Updates the issue templates to provide baseline guidance for customers

* update dependencies (#69)

Co-authored-by: Pang, Lee <pwyming@.amazon.com>

* Updates the documentation for generating minimal permissions (#71)

* Add run-id flag to workflow status command (#74)

`agc workflow status` command was missing the `-r` flag to indicate that the string we are passing in is the run-id.

* Update workflow documentation to include more info about the URL format

* Update workflows.md

Updates the documentation for workflows since we are parsing them when deploying workflows.

* Update workflows.md

* add attribution for GATK best practices workflows (#78)

* add instructions to use AGC local CDK for bootstrapping (#79)

* Passing environment variables to increase tolerance for metadata service endpoint timeouts (#80)

* [Bug] Creating a cromwell Spot Context also creates an on demand Compute Env (#66)

Avoid creating an on demand compute env for cromwell Spot Context

* markjschreiber/better-account-not-activated-message (#75)

* Better error message when attempting to deploy a context with no activated account.

* Ensure context names are unique and sorted so deployment is in consistent order

* LPD Full Managed Policy Permission Descope (#61)

* LPD Batch permission descope

* Ahmaalba/Adapter Role Least Privilege Design (#57)

* Reduce role privileges for adapter role

* Added adapter role output bucket permissions

* Adjusted Nextflow SubmitJobBatchPolicy props

* Prettier fix

* Removal of BatchFullAccess managed policy

* Code deduplication

* LPD Full Implementation

* Env removal from engine options

* Region and account parameter adjustment

* Nextflow onspot instance bug fix

* Usage of Arn.Format rather than custom ARN creation

* Made roles retrieve account and region through props rather than ArnComponents

* Removal of account and region to use default values

* Added batch:ListJobs permissions to nextflow adapter

* workflow engine documentation (#60)

* Adds windows 10 as an OS option (#81)

Tested AGC on a windows 10 machine running Ubuntu

* Adds amd instance types (#84)

* Corrected configuration of read workflow so that no MANIFEST is required (#86)

* Fixing acg typo to agc (#91)

Moving two instances of `acg` to `agc`.

* rnaseq pipeline to use proper inputs.json file (#90)

The rnaseq pipeline was referencing the inputs.json file from the atacseq example. This PR switches it to the proper inputs.json file.

The old inputs.json file:
```
{
  "input": "s3://healthai-public-assets-us-east-1/agc-demo-data/atacseq/design.csv",
  "genome": "GRCh38",
  "single_end": true
}
```
The new inputs.json file:
```
{
  "reads": "s3://1000genomes/phase3/data/HG00243/sequence_read/SRR*_{1,2}.filt.fastq.gz",
  "genome": "GRCh37",
  "skip_qc": true
}
```

* Add workflow output command (#85)

* workflow output command implementation

* Workflow and Context autocomplete implementation (#82)

* Workflow and Context autocomplete implementation

* Support Max vCpu in project contexts (#89)

feat: Configurable max vCpu for compute environments

Add maxVCpu as a property of Context to control the maximum number of vCpus a compute environment can have at a given time

Support default Context values, set when a Context is unmarshalled; any value set in agc-project.yml overrides the default.

refs: #31

* Latest Release Link (#92)

* Cleaned up Readme (#93)

* Added version checker to AGC (#94)

* Added version checker to AGC

* Addressed feedback on the PR

* Simplified version checker code

* Go compiler version 1.16.0 -> 1.17.2 (#95)

* Tabular text implementation and tests (#88)

* Tabular text implementation and tests

* Add Stale Issue Handling (#96)

* added project validate command (#97)

* markjschreiber/engine-in-contex-list (#99)

* add engine name to context list command output

* markjschreiber/clean-up-codebase (#98)

* chore: Update Pull Request Template to Follow Conventional Commits (#100)

Co-authored-by: Angela Li <dzl@amazon.com>

* ci: Improved ci workflow (#102)

* Builds the CDK project and validates eslint, also formats and fails if any formatting changes are detected.
* Checks for format changes in the CLI project

* ci: Add semantics behavior overrides (#106)

* fix: Shows the relevant error if the workflow logs can't be retrieved (#103)

* fix: workflows from demo-wdl-project should run without errors out of the box (#108)

* test: use go 1.17 features to simplify unit tests (#110)

* fix: show logs for workflows with more than 100 tasks (#114)

* fix: use proper go tags for windows build (#117)

* fix: use proper go tags for windows build

* use nf-core for this workflow (#123)

* feat: context destroy --force flag (#118)

* context destroy --force flag

* fix: Pass engine endpoint directly to the WES adapter (#122)

* chore: clean up project init code (#126)

* ci: Add standard version, conventional changelog and bump script (#119)

* ci: Add standard version, conventional changelog and bump script

* fix: Fixes how users interact with the context commands (#115)

Fixes how users interact with the context commands by allowing contexts to be passed in without the -c flag

* fix: invalid AWS Health url (#130)

correctly point AWS health link to `aws.amazon.com/health`

* build: Revamp build and release process (#127)

We are updating our build pipeline to better automate the release process. This requires a few build related changes in our source code.

* fix: Use correct context name (#132)

the context name in `/examples/demo-wdl-project` is `myContext`, which is used by the examples here.

* build: use latest build images (#134)

* feat: Initial infrastructure for MiniWdl support (#125)

Adds a MiniWdl stack which creates the appropriate batch resources and job definition to run MiniWdl jobs.

* test: Added context deploy benchmarking script (#111)

* Context deploy benchmark script

* fix: Adds a message when new logs aren't shown to the user immediately (#131)

* Adds a message when new logs aren't shown to the user immediately

* fix: correctly link to core app (#133)

* fix: temporary folder potential leak in some error scenarios. unit test for cdk command execution (#140)

* fix: temporary folder potential leak in some error scenarios. unit test for cdk command execution

* fixed typo in method name, updated implementation for channel waiter

* fix: updates context describe to be consistent with context destroy (#143)

* fix: updates context describe to be consistent with context destroy

* Best practice is to avoid mutation of inputs. Therefore, copy instead of move input (#145)

* build: move release files one folder down (#147)

* fix: miniwdl interpolation workaround

The gatk4-rnaseq-germline-snps-indels workflow revealed a possible bug in miniwdl where it doesn't correctly handle string interpolation of optional values used in a calculation. This change to the workflow works around the problem in miniwdl.

* fix: updates how the logs are shown from cloudwatch (#142)

fix: updates how the logs are shown from cloudwatch

* fix: improve contrast in docs (#149)

* docs: Add information about example inputs and runtimes (#146)

* add information about example inputs and runtimes

* fix: Asserts order deterministically (#153)

* docs: ongoing cost details (#152)

* added ongoing costs section to contexts.md
* added cost estimate links

* fix: Workflow status now ignores unqueryable stacks (#138)

fix: Workflow status now ignores unqueryable stacks

* docs: miniwdl engine docs and example project for GATK best practices (#158)

* add engine docs
* add miniwdl examples

* feat: Introducing AWS Lambda based WES Adapter for running the workflows (#155)

* Introducing AWS Lambda based WES Adapter for running the workflows

* Addressing the comments from PR review

* fix: Deregionalize min permissions (#128)

* add route53:ListHostedZonesByName

* de-regionalize resource arns

* split out CDK specific s3 permissions

* chore(release): 1.1.0

Co-authored-by: AhmadBassyiouni <30308260+abassyiouni@users.noreply.github.com>
Co-authored-by: Guy Hawkins <2242982+ghawk1ns@users.noreply.github.com>
Co-authored-by: Taylor <tneely@users.noreply.github.com>
Co-authored-by: Illya Yalovyy <IllyaYalovyy@users.noreply.github.com>
Co-authored-by: elliot-smith <elliotsm@amazon.com>
Co-authored-by: W. Lee Pang, PhD <wleepang@gmail.com>
Co-authored-by: Pang, Lee <pwyming@.amazon.com>
Co-authored-by: Drew Dresser <andrewjdresser@gmail.com>
Co-authored-by: Andrey Dovydenko <dovydenk@amazon.com>
Co-authored-by: Mark Schreiber <mrschre@amazon.com>
Co-authored-by: Sean Smith <seaam@amazon.com>
Co-authored-by: a-li <7497012+a-li@users.noreply.github.com>
Co-authored-by: Angela Li <dzl@amazon.com>
Co-authored-by: nbraid <braidn@amazon.com>
tneely pushed a commit to tneely/amazon-genomics-cli that referenced this pull request Nov 11, 2021
tneely added a commit to tneely/amazon-genomics-cli that referenced this pull request Nov 11, 2021
* Ahmaalba/Adapter Role Least Privilege Design (aws#57)

* Reduce role privileges for adapter role

* Added adapter role output bucket permissions

* Adjusted Nextflow SubmitJobBatchPolicy props

* CodeQL Security Analysis (aws#62)

* Display message if no contexts deployed (aws#64)

* Removed redundant command "agc version" (aws#67)

* Pull request template (aws#65)

Creates a PR template for github

* Update issue templates (aws#63)

Updates the issue templates to provide baseline guidance for customers

* update dependencies (aws#69)

Co-authored-by: Pang, Lee <pwyming@.amazon.com>

* Updates the documentation for generating minimal permissions (aws#71)

* Add run-id flag to workflow status command (aws#74)

`agc workflow status` command was missing the `-r` flag to indicate that the string we are passing in is the run-id.

* Update workflow documentation to include more info about the URL format

* Update workflows.md

Updates the documentation for workflows since we are parsing them when deploying workflows.

* Update workflows.md

* add attribution for GATK best practices workflows (aws#78)

* add instructions to use AGC local CDK for bootstrapping (aws#79)

* Passing environment variables to increase tolerance for metadata service endpoint timeouts (aws#80)

* [Bug] Creating a cromwell Spot Context also creates an on demand Compute Env (aws#66)

Avoid creating an on demand compute env for cromwell Spot Context

* markjschreiber/better-account-not-activated-message (aws#75)

* Better error message when attempting to deploy a context with no activated account.

* Ensure context names are unique and sorted so deployment is in consistent order

* LPD Full Managed Policy Permission Descope (aws#61)

* LPD Batch permission descope

* Ahmaalba/Adapter Role Least Privilege Design (aws#57)

* Reduce role privileges for adapter role

* Added adapter role output bucket permissions

* Adjusted Nextflow SubmitJobBatchPolicy props

* Prettier fix

* Removal of BatchFullAccess managed policy

* Code deduplication

* LPD Full Implementation

* Env removal from engine options

* Region and account parameter adjustment

* Nextflow onspot instance bug fix

* Usage of Arn.Format rather than custom ARN creation

* Made roles retrieve account and region through props rather than ArnComponents

* Removal of account and region to use default values

* Added batch:ListJobs permissions to nextflow adapter

* workflow engine documentation (aws#60)

* Adds windows 10 as an OS option (aws#81)

Tested AGC on a windows 10 machine running Ubuntu

* Adds amd instance types (aws#84)

* Corrected configuration of read workflow so that no MANIFEST is required (aws#86)

* Fixing acg typo to agc (aws#91)

Moving two instances of `acg` to `agc`.

* rnaseq pipeline to use proper inputs.json file (aws#90)

The rnaseq pipeline was referencing the inputs.json file from the atacseq example. This PR switches it to the proper inputs.json file.

The old inputs.json file:
```
{
  "input": "s3://healthai-public-assets-us-east-1/agc-demo-data/atacseq/design.csv",
  "genome": "GRCh38",
  "single_end": true
}
```
The new inputs.json file:
```
{
  "reads": "s3://1000genomes/phase3/data/HG00243/sequence_read/SRR*_{1,2}.filt.fastq.gz",
  "genome": "GRCh37",
  "skip_qc": true
}
```

* Add workflow output command (aws#85)

* workflow output command implementation

* Workflow and Context autocomplete implementation (aws#82)

* Workflow and Context autocomplete implementation

* Support Max vCpu in project contexts (aws#89)

feat: Configurable max vCpu for compute environments

Add maxVCpu as a property of Context to control the maximum number of vCpus a compute environment can have at a given time

Support default Context values, set when a Context is unmarshalled; any value set in agc-project.yml overrides the default.

refs: aws#31

* Latest Release Link (aws#92)

* Cleaned up Readme (aws#93)

* Added version checker to AGC (aws#94)

* Added version checker to AGC

* Addressed feedback on the PR

* Simplified version checker code

* Go compiler version 1.16.0 -> 1.17.2 (aws#95)

* Tabular text implementation and tests (aws#88)

* Tabular text implementation and tests

* Add Stale Issue Handling (aws#96)

* added project validate command (aws#97)

* markjschreiber/engine-in-contex-list (aws#99)

* add engine name to context list command output

* markjschreiber/clean-up-codebase (aws#98)

* chore: Update Pull Request Template to Follow Conventional Commits (aws#100)

Co-authored-by: Angela Li <dzl@amazon.com>

* ci: Improved ci workflow (aws#102)

* Builds the CDK project and validates eslint, also formats and fails if any formatting changes are detected.
* Checks for format changes in the CLI project

* ci: Add semantics behavior overrides (aws#106)

* fix: Shows the relevant error if the workflow logs can't be retrieved (aws#103)

* fix: workflows from demo-wdl-project should run without errors out of the box (aws#108)

* test: use go 1.17 features to simplify unit tests (aws#110)

* fix: show logs for workflows with more than 100 tasks (aws#114)

* fix: use proper go tags for windows build (aws#117)

* fix: use proper go tags for windows build

* use nf-core for this workflow (aws#123)

* feat: context destroy --force flag (aws#118)

* context destroy --force flag

* fix: Pass engine endpoint directly to the WES adapter (aws#122)

* chore: clean up project init code (aws#126)

* ci: Add standard version, conventional changelog and bump script (aws#119)

* ci: Add standard version, conventional changelog and bump script

* fix: Fixes how users interact with the context commands (aws#115)

Fixes how users interact with the context commands by allowing contexts to be passed in without the -c flag

* fix: invalid AWS Health url (aws#130)

correctly point AWS health link to `aws.amazon.com/health`

* build: Revamp build and release process (aws#127)

We are updating our build pipeline to better automate the release process. This requires a few build related changes in our source code.

* fix: Use correct context name (aws#132)

the context name in `/examples/demo-wdl-project` is `myContext`, which is used by the examples here.

* build: use latest build images (aws#134)

* feat: Initial infrastructure for MiniWdl support (aws#125)

Adds a MiniWdl stack which creates the appropriate batch resources and job definition to run MiniWdl jobs.

* test: Added context deploy benchmarking script (aws#111)

* Context deploy benchmark script

* fix: Adds a message when new logs aren't shown to the user immediately (aws#131)

* Adds a message when new logs aren't shown to the user immediately

* fix: correctly link to core app (aws#133)

* fix: temporary folder potential leak in some error scenarios. unit test for cdk command execution (aws#140)

* fix: temporary folder potential leak in some error scenarios. unit test for cdk command execution

* fixed typo in method name, updated implementation for channel waiter

* Move release files one folder down

* fix: updates context describe to be consistent with context destroy (aws#143)

* fix: updates context describe to be consistent with context destroy

* Best practice is to avoid mutation of inputs. Therefore, copy instead of move input (aws#145)

* fix: miniwdl interpolation workaround

The gatk4-rnaseq-germline-snps-indels workflow revealed a possible bug in miniwdl where it doesn't correctly handle string interpolation of optional values used in a calculation. This change to the workflow works around the problem in miniwdl.

* fix: updates how the logs are shown from cloudwatch (aws#142)

fix: updates how the logs are shown from cloudwatch

* fix: improve contrast in docs (aws#149)

* docs: Add information about example inputs and runtimes (aws#146)

* add information about example inputs and runtimes

* fix: Asserts order deterministically (aws#153)

* docs: ongoing cost details (aws#152)

* added ongoing costs section to contexts.md
* added cost estimate links

* fix: Workflow status now ignores unqueryable stacks (aws#138)

fix: Workflow status now ignores unqueryable stacks

* docs: miniwdl engine docs and example project for GATK best practices (aws#158)

* add engine docs
* add miniwdl examples

* feat: Introducing AWS Lambda based WES Adapter for running the workflows (aws#155)

* Introducing AWS Lambda based WES Adapter for running the workflows

* Addressing the comments from PR review

* fix: Deregionalize min permissions (aws#128)

* add route53:ListHostedZonesByName

* de-regionalize resource arns

* split out CDK specific s3 permissions

* fix for installation.md (aws#161)

* feat: Improved Workflow logs (aws#156)

* feat: Improved Workflow logs

By default, workflow logs for a run will log out run status and individual task status.
Tasks logs can be emitted with `--task <taskId>` for a single task log, `--all-tasks` for all task logs, and `--failed-tasks` for failed task logs.

* chore(release): 1.1.0

Co-authored-by: AhmadBassyiouni <30308260+abassyiouni@users.noreply.github.com>
Co-authored-by: Guy Hawkins <2242982+ghawk1ns@users.noreply.github.com>
Co-authored-by: Illya Yalovyy <IllyaYalovyy@users.noreply.github.com>
Co-authored-by: elliot-smith <elliotsm@amazon.com>
Co-authored-by: W. Lee Pang, PhD <wleepang@gmail.com>
Co-authored-by: Pang, Lee <pwyming@.amazon.com>
Co-authored-by: Drew Dresser <andrewjdresser@gmail.com>
Co-authored-by: Andrey Dovydenko <dovydenk@amazon.com>
Co-authored-by: Mark Schreiber <mrschre@amazon.com>
Co-authored-by: Sean Smith <seaam@amazon.com>
Co-authored-by: a-li <7497012+a-li@users.noreply.github.com>
Co-authored-by: Angela Li <dzl@amazon.com>
Co-authored-by: nbraid <braidn@amazon.com>