Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

pubsys: added ami validation #3020

Merged
merged 2 commits into from
Apr 20, 2023

Conversation

mjsterckx
Copy link
Contributor

@mjsterckx mjsterckx commented Apr 14, 2023

Description of changes:

Added a validate-ami subcommand to the pubsys tool. This command has the following signature:

pubsys-validate-ami 0.1.0
Validates EC2 images

USAGE:
    pubsys --infra-config-path <infra-config-path> validate-ami [FLAGS] [OPTIONS] --expected-amis-path <expected-amis-path>

FLAGS:
        --json       If this argument is given, print the validation results summary as a JSON object instead of a
                     plaintext table
    -h, --help       Prints help information
    -V, --version    Prints version information

OPTIONS:
        --expected-amis-path <expected-amis-path>           File holding the expected amis
        --write-results-path <write-results-path>           Optional path where the validation results should be written
        --write-results-filter <write-results-filter>...
            Optional filter to only write validation results with these statuses to the above path The available
            statuses are: `Correct`, `Incorrect`, `Missing`
        --log-level <log-level>
            How much detail to log; from least to most: ERROR, WARN, INFO, DEBUG, TRACE [default: INFO]

The function takes a path to a file containing an image object per region. An example file looks like this:

{
  "us-west-2": {
    "id": "ami-012345678",
    "name": "bottlerocket-aws-k8s-1.24-x86_64-v1.14.0-abcdef-dirty",
    "public": true,
    "launch_permissions": [
      {
        "group": "all"
      }
    ]
  }
}

The function will call the describe-images function on each image in the file to ensure that the retrieved id, name, and publicity match the expected values. If public is false, then the specific launch_permissions are retrieved using describe-image-attribute and included in the comparison. Aside from publicity and launch_permissions, the function will check that sriov_net_support is simple and ena_support is true.

The result of the function looks like this:

+-----------+---------+-----------+---------+------------+
| String    | correct | incorrect | missing | unreachable |
+-----------+---------+-----------+---------+------------+
| us-west-2 | 0       | 0         | 1       | 0          |
+-----------+---------+-----------+---------+------------+
  • correct indicates that all retrieved values match the expected values
  • incorrect indicates that at least one value does not match the expected value
  • missing indicates that the specific image failed to be retrieved
  • unreachable indicates that the images in this region could not be retrieved because the region could not be reached

--write-results-path and --write-results-filter allow the user to write the results of the ami validation to a JSON file. The filter determines which statuses will be written to this file. The file looks like this:

[
  {
    "id": "ami-012345678",
    "expected_image_def": {
      "id": "ami-012345678",
      "name": "bottlerocket-aws-k8s-1.24-x86_64-v1.14.0-abcdef-dirty",
      "public": true,
      "launch_permissions": null,
      "ena_support": true,
      "sriov_net_support": "simple"
    },
    "actual_image_def": null,
    "region": "us-west-2",
    "status": "Missing"
  }
]

Modified the SSM validation results to represent the new Unreachable status as well.

Testing done:

  • Unit tests
  • Ran the program on an unregistered ami (results seen above) and on registered public ones, which showed as correct.

Terms of contribution:

By submitting this pull request, I agree that this contribution is dual-licensed under the terms of both the Apache License, version 2.0, and the MIT license.

@mjsterckx mjsterckx requested review from cbgbt, rpkelly and webern April 14, 2023 19:05
Copy link
Contributor

@cbgbt cbgbt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work! Most of these are just nitpicks or tips. The thing that I care the most to see changed is to add a test for the AMI file parsing code. The function is a pure function and the logic inside is complex enough to warrant it.

write_results_path: Option<PathBuf>,

#[structopt(long, requires = "write-results-path")]
/// Optional filter to only write validation results with these statuses to the above path
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit, but the help text looks funky without this.

Suggested change
/// Optional filter to only write validation results with these statuses to the above path
/// Optional filter to only write validation results with these statuses to the above path.

Comment on lines 61 to 82

pub(crate) async fn describe_images<'a>(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could use a docstring here to explain why we need this.

region: &Region,
client: &Ec2Client,
image_ids: Vec<String>,
expected_image_public: &HashMap<String, bool>,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd document what the purpose of this parameter is in the docstring, since it's not obvious from the function signature.

correct: i32,
incorrect: i32,
missing: i32,
accessible: bool,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have strong feelings about rewriting this, but I'm realizing that because accessible removes meaning from the other values here, I think a more maintainable way to write this would be to:

  • Change AmiValidationRegionSummary to an enum of something like
enum AmiValiationRegionSummary {
    Results {
        correct: u64, // etc
    },
    Unreachable,
  • Modify display() to render this by copying it to a private implementation which uses the marker values, or n/a or something similar in the table.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's currently implemented the same way in the ssm-validation so I will address those together in a different PR.

}),
)
})
.collect::<HashMap<Region, ami::Result<HashSet<AmiValidationResult>>>>();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bit of a gnarly turbofish that probably isn't needed, since there's a type specifier on the variable binding.

Suggested change
.collect::<HashMap<Region, ami::Result<HashSet<AmiValidationResult>>>>();
.collect();

})
.collect::<HashMap<Region, ami::Result<HashSet<AmiValidationResult>>>>();

let validation_results = AmiValidationResults::new(results);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit, but usually new isn't appropriate when you're initializing something that's computation heavy like that. A better name would be from_result_map or evaluate_results() or something.

Suggested change
let validation_results = AmiValidationResults::new(results);
let validation_results = AmiValidationResults::new(results);

Comment on lines 199 to 203
// If the value for each Region is a JSON object instead of an array, wrap it in an array
let vectored_images = expected_amis
.into_iter()
.map(|(region, value)| {
let image_values = match value.as_array() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we write a test for this behavior? As an alternative tip, you can get Serde to parse this for you "for free" via #[serde(untagged)] on an enum.

enum ImageDef {
    Image(Image),
    ImageList(Vec<Image>),
}

impl ImageDef {
    fn images(&self) -> Vec<Image> {
        match self {
            // etc
        }
    }
}

Copy link
Contributor

@ecpullen ecpullen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still need to check validate_ami/results.rs

let mut requests = Vec::with_capacity(clients.len());
for region in clients.keys() {
trace!("Requesting images in {}", region);
let ec2_client: &Ec2Client = &clients[region];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
let ec2_client: &Ec2Client = &clients[region];
let ec2_client: &Ec2Client = &clients.get(region).context()...;

Indexing with [] can panic.

) -> HashMap<&'a Region, Result<HashMap<String, ImageDef>>> {
// Build requests for images; we have to request with a regional client so we split them by
// region
let mut requests = Vec::with_capacity(clients.len());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using an iterator for this function could simplify some of the logic below.

region.as_ref()
);
let mut image_def = ImageDef::from(image.to_owned());
if !*expected_public {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this go above 141 and then make ImageDef::from() take in the bool so image_def doesn't need to be mut?

let infra_config = InfraConfig::from_path_or_lock(&args.infra_config_path, false)
.context(error::ConfigSnafu)?;

let aws = infra_config.aws.clone().unwrap_or_default();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this need .clone()

info!("Parsed expected ami file");

// Create a HashMap of AmiClients, one for each region where validation should happen
let base_region = &Region::new(aws.regions[0].clone());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
let base_region = &Region::new(aws.regions[0].clone());
let base_region = &Region::new(aws.regions.get(0).context()...);

[] can panic

Comment on lines 155 to 156
/// Validates EC2 images in a single region, based on a Vec (ImageDef) of expected images
/// and a HashMap (AmiId, ImageDef) of actual retrieved images. Returns a HashSet of
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Super nit

Suggested change
/// Validates EC2 images in a single region, based on a Vec (ImageDef) of expected images
/// and a HashMap (AmiId, ImageDef) of actual retrieved images. Returns a HashSet of
/// Validates EC2 images in a single region, based on a `Vec<ImageDef>` of expected images
/// and a `HashMap<AmiId, ImageDef>` of actual retrieved images. Returns a `HashSet<AmiValidationResult>` containing ...

Comment on lines 159 to 160
expected_images: Vec<ImageDef>,
actual_images: &HashMap<AmiId, ImageDef>,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason 1 is a reference and the other is taking ownership?

Comment on lines 169 to 171
if image.public {
image.launch_permissions = None;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why isn't image.launch_permissions set to None when the ImageDef is created? If that's not possible can launch_permissions be a function?

Comment on lines 173 to 174
image.id.clone(),
image.clone(),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to duplicate the image.id?

)
.context(error::ParseExpectedImagesFileSnafu)?;

// If the value for each Region is a JSON object instead of an array, wrap it in an array
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What would cause the value to be an object instead of an array? This is a file that another pubsys command is writing right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct, this takes a file that is written by pubsys ami where each region has a single AMI object. However, I wanted to add functionality where the command can take a file containing an array of AMI objects per region as well.

Copy link
Contributor

@ecpullen ecpullen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 to the suggested AmiValidationRegionSummary refactor. Otherwise just a few small things.

Comment on lines 24 to 26
impl Display for AmiValidationResultStatus {
fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
match self {
AmiValidationResultStatus::Correct => write!(f, "Correct"),
AmiValidationResultStatus::Incorrect => write!(f, "Incorrect"),
AmiValidationResultStatus::Missing => write!(f, "Missing"),
}
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
impl Display for AmiValidationResultStatus {
fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
match self {
AmiValidationResultStatus::Correct => write!(f, "Correct"),
AmiValidationResultStatus::Incorrect => write!(f, "Incorrect"),
AmiValidationResultStatus::Missing => write!(f, "Missing"),
}
}
}
derive_display_from_serialize!(AmiValidationResultStatus);

That will have the same effect. And expand if new variants of the enum are added.

Comment on lines 34 to 47
impl FromStr for AmiValidationResultStatus {
type Err = super::Error;

fn from_str(s: &str) -> std::result::Result<Self, Self::Err> {
match s {
"Correct" => Ok(Self::Correct),
"Incorrect" => Ok(Self::Incorrect),
"Missing" => Ok(Self::Missing),
filter => Err(Self::Err::InvalidStatusFilter {
filter: filter.to_string(),
}),
}
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
impl FromStr for AmiValidationResultStatus {
type Err = super::Error;
fn from_str(s: &str) -> std::result::Result<Self, Self::Err> {
match s {
"Correct" => Ok(Self::Correct),
"Incorrect" => Ok(Self::Incorrect),
"Missing" => Ok(Self::Missing),
filter => Err(Self::Err::InvalidStatusFilter {
filter: filter.to_string(),
}),
}
}
}
derive_fromstr_from_deserialize!(AmiValidationResultStatus)

pub(crate) region: Region,

/// The validation status of the image
pub(crate) status: AmiValidationResultStatus,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should just be a function since it is described by other fields of this enum.

Copy link
Contributor

@cbgbt cbgbt Apr 19, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I suggested this be added as a field in a prior revision, since it's otherwise not really a Result being computed, but rather a validation computation being created which is later used to compute the result.

I guess if we renamed this AmiValidation with a method AmiValidation::get_result() or something, I'd be ok with that.

But aren't these serialized? Do we want to keep the status so that the computation result is specifically kept?

expected_images
.get(region)
.map(|e| e.to_owned())
.unwrap_or(vec![]),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit:

Suggested change
.unwrap_or(vec![]),
.unwrap_or_default();,


#[derive(Debug, Snafu)]
#[snafu(visibility(pub(super)))]
pub(crate) enum Error {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please check the error enum variants and ensure they're being used. I see a few that aren't being used anywhere. e.g. ReadValidationConfig, ParseValidationConfig, MissingField etc.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please check the error enum variants and ensure they're being used. I see a few that aren't being used anywhere. e.g. ReadValidationConfig, ParseValidationConfig, MissingField etc.

As an aside, the reason you have to check is because the code that Snafu generates "uses" each enum variant so the compiler can't warn you about unused variants.

/// Represents a single EC2 image validation result
#[derive(Debug, Eq, Hash, PartialEq, Serialize)]
pub(crate) struct AmiValidationResult {
/// The id of the image
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: proper capitalization

Suggested change
/// The id of the image
/// The ID of the image

and where ever else this may apply

/// The id of the image
pub(crate) id: String,

/// ImageDef containing expected values for the image
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit:

Suggested change
/// ImageDef containing expected values for the image
/// `ImageDef` containing expected values for the image

and where ever else this may apply.

Copy link
Contributor

@webern webern left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I love the testing.

use std::path::PathBuf;
use structopt::{clap, StructOpt};

/// Validates EC2 images
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Describe this in more detail. What's the input, what's the output, etc.


let validation_results = AmiValidationResults::new(results);

// If a path was given to write the results to, write the results
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Writing nit

Suggested change
// If a path was given to write the results to, write the results
// If a path was given, write the results


#[derive(Debug, Snafu)]
#[snafu(visibility(pub(super)))]
pub(crate) enum Error {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please check the error enum variants and ensure they're being used. I see a few that aren't being used anywhere. e.g. ReadValidationConfig, ParseValidationConfig, MissingField etc.

As an aside, the reason you have to check is because the code that Snafu generates "uses" each enum variant so the compiler can't warn you about unused variants.

Comment on lines 169 to 202
#[snafu(display("Failed to describe images in {}: {}", region, source))]
DescribeImages {
region: String,
source: SdkError<DescribeImagesError>,
},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where-ever we're using a SdkError as source we need to do this to get the full error message:

Suggested change
#[snafu(display("Failed to describe images in {}: {}", region, source))]
DescribeImages {
region: String,
source: SdkError<DescribeImagesError>,
},
use aws_smithy_types::error::display::DisplayErrorContext;
#[snafu(display("Failed to describe images in {}: {}", region, DisplayErrorContext(source)))]
DescribeImages {
region: String,
source: SdkError<DescribeImagesError>,
},

See https://docs.rs/aws-smithy-client/latest/aws_smithy_client/enum.SdkError.html

When logging an error from the SDK, it is recommended that you either wrap the error in DisplayErrorContext, use another error reporter library that visits the error’s cause/source chain, or call Error::source for more details about the underlying cause.

Note: This needs to be a tree wide change, but might as well start here.

Please create an issue for the other places if you're willing.

@mjsterckx mjsterckx force-pushed the pubsys-ami-validation branch from 57dc749 to 0e83ef4 Compare April 19, 2023 20:54
Copy link
Contributor

@cbgbt cbgbt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good!

Sorry to pile on more feedback -- this is a substantial change! Just some tidying up.

}
}

/// Fetches all images with IDs in `image_ids`. The map `expected_image_public` is used to
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like the docstrings for this and describe_images_in_region may not have been updated for some subsequent changes to the code. I would fix this before merging.

Comment on lines 92 to 94
match result {
Ok(result) => Ok(result),
Err(_) => Err(error::Error::UnreachableRegion {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit/tip (don't feel compelled to change this): I think this is the same as

result.map_err(|_| error::Error::UnreachableRegion { region: region.to_string() })

match on Result and Option when the operation to be carried out is already expressed as a method on those structs tends to result in deep nesting.

Comment on lines 91 to 94
incorrect: i32,
missing: i32,
unexpected: i32,
accessible: bool,
unreachable: i32,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: these should probably be unsigned.

Comment on lines 91 to 93
Err(_) => Err(error::Error::UnreachableRegion {
region: region.to_string(),
}),
Copy link
Contributor

@cbgbt cbgbt Apr 19, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it make sense for this to hold onto the original error?

I think we just want to make sure the reason for the unavailability can be surfaced. Ensuring that it is logged somewhere at ERROR level probably suffices, since it's not like the deeper result information is stored in our serialized format.

@mjsterckx mjsterckx force-pushed the pubsys-ami-validation branch from 0e83ef4 to 2e8b67d Compare April 20, 2023 00:14
@mjsterckx mjsterckx merged commit 99b5bf1 into bottlerocket-os:develop Apr 20, 2023
@mjsterckx mjsterckx deleted the pubsys-ami-validation branch April 20, 2023 02:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants