Skip to content

rouault/cog_validator

Repository files navigation

Cloud Optimized GeoTIFF validator

This is a standalone (Python / Flask) service that allows users to submit GeoTIFF files (preferably by URL) and check their compliance with the Cloud Optimized GeoTIFF (COG) specification: https://trac.osgeo.org/gdal/wiki/CloudOptimizedGeoTIFF

This utility is also compatible of being deployed as a AWS Lambda function, through the AWS API Gateway.

API endpoint: /api/validate

GET request, with the following query parameters :

  • url (required): URL to the GeoTIFF file
  • use_vsicurl=true/false (optional, defaults to true): if true, the file is read using the GDAL /vsicurl/ subsystem (using HTTP GET range requests). If false, the file is locally downloaded in its entirety before being validated (note: when the service run as a AWS Lambda function, only up to 500 MB can be downloaded)

For example: /api/validate?url=http://path/to/my.tif

POST request, with a form encoded with multipart/form-data

  • file: file content as multipart attachment

POST request, with a form encoded with application/x-www-form-urlencoded

  • url (exclusive with file): URL to the GeoTIFF file
  • use_vsicurl=true/false (defaults to true). See above
  • filename (optional, recommended): file name
  • file_b64: file content as a Base64 encoded string

This later interface is mostly needed to overcome a current limitation of the AWS API Gateway interface that does not accept multipart/form-data

For all the above interfaces, the query will return a JSON document with the following keys:

  • status (required): 'success' or 'failure'
  • error (optional): error message. present when the request is invalid, or the file cannot be read
  • validation_errors (optional): array of errors. Only present if the file is a GeoTIFF file but does not comply with the COG requirements
  • gdal_info (optional): dictionary with the output of "gdalinfo -json". Only present if the file is a GeoTIFF file
  • details (optional): dictionary with file offsets of IFDs and first data block of each IFD. Only present if the file is a GeoTIFF file

HTML endpoint: /html

The service expose a basic HTML page for users to submit their GeoTIFF files and display the result of the validation

AWS Lambda / API Gateway

The service can be deployed as a AWS Lamba function, accessible through the AWS API Gateway.

Running "make" will generate a cog_validator.zip that contains the Python code of this service, the Python dependencies as well as a GDAL 2.2 build. This requires Docker to be available, to generate the cog_validator_deps.zip (which contains the Python dependencies as well as a GDAL 2.2 build)

Assuming you have a AWS account with initial setup, follow the following steps to deploy the service:

  • Role creation

    • Go to the AWS IAM management console
    • Click on "Roles"
    • Click on "Create new role"
    • Click on the Select button of "AWS Lambda"
    • In the Filter enter "AWSLambdaBasicExecutionRole" and check the corresponding checkbox
    • Click on "Next Step"
    • Enter "lambda_basic_execution" as role name
    • Click on "Create role"
  • Lambda function creation

    • Go to the AWS Lambda management console
    • "Create function"
    • In "Select Blueprint" step, select "Author from scratch"
    • Skip Add Trigger with "Next"
    • Give a name to the function, for example "cog_validator"
    • Select "Python 2.7" as Runtime
    • Select "Upload a .ZIP file" as "Code entry type"
    • In "Function package", click on Upload an select the generated cog_validator.zip
    • Enter "lambda_main.handle" in "Handler"
    • In "Existing role", select "lambda_basic_execution"
    • Click on Next, and Creation function to proceed on file uploading and lambda function creation
    • Edit the Configuration / Advanced settings, to increase the timeout to 5 minutes and the memory to 512 MB, and Save
    • To test everything works, in Actions dropdown list, choose "Configure test event" and enter the following payload.
        {
            "headers": { "Host": "foo" },
            "httpMethod": "GET",
            "queryStringParameters": { "url": "http://svn.osgeo.org/gdal/trunk/autotest/gcore/data/byte.tif" },
            "path": "/api/validate"
        }
  • API Gateway deployment

    • Go to the AWS API Gateway management console
    • In APIs tab, click on "Create API"
    • Enter "cog_validator" as API name
    • Click on "Create API"
    • In Resources tab, in Actions dropdown list, select "Create Resource"
    • Check the "Configure as Proxy resource" checkbox and click on "Create Resource"
    • In the "/{proxy+} - ANY - Setup" form that is now displayed, keep the "Lambda Function Proxy" integration type
    • Select the appropriate Lambda region (the one in which you created the Lambda function in the above steps)
    • In "Lambda Function" entry, type "cog_validator"
    • Click on "Save" and confirm that you add permission to the API Gateway to invoke your Lambda function
    • To test everything works, click on the TEST icon
    • In Resources tab, in Actions dropdown list, select "Deploy API"
    • In Deployment stage, select "New stage"
    • Enter "prod" as stage name
    • Click on Deploy
    • A new form is displayed with an invoke URL like https://some_value_here.execute-api.eu-central-1.amazonaws.com/prod
    • Copy-paste it in your browser and add "/html" at the end. A HTML page "Cloud optimized GeoTIFF validator" should now be displayed !

Development

GDAL 2.2 with its Python (2.7) bindings must be installed, as well as the Python flask and requests modules.

A basic self test is available with the ./test.sh script

Credits

The following resources have served as inspiration for AWS Lamba and API Gateway deployment