Here you can find additional components for running Gaarf tool in Google Cloud.
There are the following components provided:
- Cloud Functions (in functions folder) - CFs that you can use for running gaarf with queries located on GCS
- Cloud Workflow (in workflow folder) - a Cloud Workflow that orchestrates enumerating queries on GCS and calling CFs with each of them
- Interactive generator that creates all required components and shell scrips for you
So for using Cloud Workflow you need to deploy the workflow and cloud functions. But you can use cloud functions independently if you need.
The easiest way to initialize a Google Cloud infrastructure for automated running of your queries is to use our interactive generator create-gaarf-wf.
npm init gaarf-wf@latest
To better understand what the generator creates for you we recommend to read the following sections.
NOTE: if you're installing on Google Cloud Shell then it's worth cleaning npm cache first to make sure you're using the latest versions:
rm -rf ~/.npm/
To deploy workflow for gaarf you can use gcp/workflow/setup.sh
script. It accepts parameters:
-n
or--name
- name of the workflow, by default it will begaarf-wf
-l
or--location
- location region for workflow, be default it will beus-central1
As soon as you deployed your workflow (you can use the same workflow for many projects by the way) you can call it directly (via gcloud workflows run
) or create a Scheduler job for calling it regularly.
For both methods you'll need to provide parameters for workflow.
Executing workflow directly:
gcloud workflows run gaarf-wf \
--data='{
"cloud_function":"gaarf",
...
}'
Create a Scheduler job:
PROJECT_ID=$(gcloud config get-value project 2> /dev/null)
PROJECT_NUMBER=$(gcloud projects describe $PROJECT_ID --format="csv(projectNumber)" | tail -n 1)
# be default Cloud Worflows run under the Compute Engine default service account:
SERVICE_ACCOUNT=$PROJECT_NUMBER-compute@developer.gserviceaccount.com
REGION=us-central1
WORKFLOW_NAME=gaarf-wf
JOB_NAME=$WORKFLOW_NAME
data='{
\"cloud_function\":\"gaarf\",
...
}'
# you might need to delete job first:
#gcloud scheduler jobs delete $JOB_NAME --location $REGION
# daily at midnight
gcloud scheduler jobs create http $JOB_NAME \
--schedule="0 8 * * *" \
--uri="https://workflowexecutions.googleapis.com/v1/projects/$PROJECT_ID/locations/$REGION/workflows/$WORKFLOW_NAME/executions" \
--location=$REGION \
--message-body="{\"argument\": \"$data\"}" \
--oauth-service-account-email="$SERVICE_ACCOUNT"
Please notice the escaping of quotes for job's argument.
cloud_function
- name for gaarf cloud function (by defaultgaarf
but could be customized during deployment)gcs_bucket
- GCS bucket name where queries are stored, by default your GCP project idads_queries_path
- relative GCS path for ads queries, e.g. "gaarf/ads-queries/" (then workflow will fetch all files from gs://your_bucket/gaarf/adds-queries/*) (required)bq_queries_path
- relative GCS path for BigQuery queries, e.g. "gaarf/bq-queries" (required)dataset
- BigQuery dataset id for writing results of ads queries (required)bq_dataset_location
- BigQuery dataset location, e.g. "europe", by default "us" (optional)cid
- Ads customer id, can be either a MCC or child account, without dashes (required), or a list of CIDs comma separated (required)cid_ignore
- Ads customer id (or a comma separated list) to exclude from the resulting listcustomer_ids_query
- a path to a file with GAQL query that refines for which accounts to execute scripts (optional)customer_ids_batchsize
- a batch size for customer ids (cids), if not specified accounts will be processed by 1000 accounts (see gaarf-getids CF)customer_ids_offset
- an offset in resulting list of accounts if you need to implemented an external batching - i.e. execute workflow only for a subset of accounts from specified seed account(s). It differs from internal batching where accounts processed by batches to workaround the maximum steps limitation of Cloud Workflows (100K runtime steps).ads_config_path
- a full GCS path to your google-ads.yaml config, e.g. "gs://MYPROJECT/path/to/google-ads.yaml" (required)ads_macro
- an object with macro for Ads queries, see the root README (optional)bq_macro
- an object with macro for BigQuery queries, see the root README (optional)bq_sql
- an object with sql parameters for BigQuery queries, see the root README (optional)bq_writer_options
- additional options (as object) for BqWriter, see the root README (optional)concurrency_limit
- a custom concurrency level to use instead of the default one (20) - it's a number of concurrent threads for parallel loop over scripts and over accounts. For example, if you specify 5 then there will be 5 parallel executions of scripts for each of them there will be 5 parallel running CF executions, so in total in a moment there will be 5*5=25 parallel executions of the CF. (optional)workflow_ads_id
- a workflow id for child Ads workflow, by default it's the parent name with the '-ads' suffix (optional)disable_strict_views
- an option to pass to gaarf-bq-view CF to disable adding the WHERE condition with a list of accounts for views (optional)
We provide several cloud functions that correlate to cli tools: gaarf, gaarf-getcids, gaarf-bq and gaarf-bq-view. Default name of main function is 'gaarf' and other ones' names based on it with suffixes. You can change the base name during deployment by supplying a --name
/-n
parameter for deploy.sh
and/or setup.sh
.
To deploy cloud functions you need to execute setup.sh
which enables required APIs, sets up permissions and does actual deployment of functions using deploy.sh
. If later on something changes in functions' code or Ads API settings you can redeploy by executing deploy.sh
.
Additionally you can customize region (-r
) and memory (-m
).
There are following functions provided:
- gaarf-getcids - fetches and returns customer account ids (CIDs) for further processing
- gaarf - the main function for executing Ads queries, correlates to the
gaarf
cli tool - gaarf-bq-view - function for creating unified views for all customer-based tables in BQ with data from Ads API
- gaarf-bq - function for executing post-processing queries in BigQuery, correlates to the
gaarf-bq
cli tool
Please note that you need to copy your google-ads.yaml
somewhere where functions (gaarf and gaarf-getcids) can find it. Preferably onto Cloud Storage. Then you should provide ads_config_path
query argument or ADS_CONFIG
environment variable with a path to the Ads config (e.g. gs://myproject/path/to/google-ads.yaml). As a last resort the function will search for google-ads.yaml
locally, so we can just copy your config to function' folder before deployment.
Alternatively you can provide all configuration values for Ads API via environment variables (you can supply env vars via --env-vars-file env.yaml
cli option for gcloud tool - you'll need to adapt the deploy.sh
for this):
- DEVELOPER_TOKEN,
- LOGIN_CUSTOMER_ID
- CLIENT_ID
- CLIENT_SECRET
- REFRESH_TOKEN
Normally your functions are called by Cloud Workflows workflow and you don't need to bother about their parameters'.
Note that actual functions names are chosen on deployment. Here they are mentioned under the default ones.
Body:
script
- an object with two fieldsquery
andname
, wherequery
contains an Ads query text (as alternative to providing it via thescript_path
query argument) andname
is a name (normally base file name would use) used for target table with data retuned by the scriptmacro
- an object with macros for ads querieswriter_options
- an object with additional options for a writer
Query string:
ads_config_path
- a path to Ads config, e.g. gs://bucket/path/to/google-ads.yaml, if not passed then a local file will be tried to used (should be deployed with the function)script_path
- a path to Ads query file, currently only GCS paths are supported, e.g. gs://bucket/path/to/file, it's mandatory if script/name were not provided via bodywriter
- writer to use: "bq", "json", "csv". By default - "bq" (BigQuery)customer_id
- customer id (CID), without '-', can be specified in google-ads.yaml as well, if so then can be omittedexpand_mcc
- true to expand account incustomer_id
argument. By default (if false) it also disables creating union viewsbq_project_id
- BigQuery project id for outputbq_dataset
- BiQuery dataset id for outputbq_dataset_location
- BigQuery dataset locationoutput_path
- output path for interim data (for BigQueryWriter) or generated data (Csv/Json writers)
Returns:
- A map of CID (customer account id) to row counts.
Body:
script
- an object with two fieldsquery
andname
, wherequery
contains an Ads query text (as alternative to providing it via thescript_path
query argument) andname
is a name (normally base file name would use) used for target table with data retuned by the scriptmacro
- an object with macros for queriessql
- an object with sql parameters for queries
Query string:
project_id
- a GCP project iddataset_location
- BigQuery dataset location
Returns:
- an object with
rowCount
field with a row count if the script returned data, otherwise (the script is a DDL like create view) response is empty
Query string:
ads_config_path
- a path to Ads config, same as for gaarfcustomer_id
- customer id (CID), without '-', can be specified in google-ads.yaml as well, if so then can be omittedcustomer_ids_query
- custom Ads query to filter customer accounts expanded fromcustomer_id
, same as same-name argument for gaarf cli tool. Query's first column should be a customer id (CID)customer_ids_ignore
- a list of customer ids to exclude from the resultcustomer_ids_batchsize
- a size of batches into which account ids list will be split.customer_ids_offset
- an offset in the customer ids list resulted from the seed CIDs and optional query incustomer_ids_query
, it allows to implement an external batching.flatten
- flatten the list of customer ids. Ifcustomer_ids_offset
is provided then the list will be a subset of CIDs otherwise it will be the whole list of accounts, ignoring batching (regadless of the customer_ids_batchsize's value) Body:customer_ids_query
- same as QueryString's argument as an alternative
Returns:
if no flatten
specifiedd then the CF returns an object:
- accounts - array of arrays with CIDs - i.e. CIDs grouped in batches, but if customer_ids_offset provided then there will be only one batch e.g. [1, 2, ..., 10_000] => [ [1, 2, ..., 5_000], [5_001, 10_000] ]
- batchCount - number of batches (items in root array)
- batchSize - number of ids per batch (items in nested arrays)
If flatten
provided then respose is simply an array with CIDs
Query string:
project_id
- a GCP project iddataset
- BiQuery dataset iddataset_location
- BigQuery dataset locationscript_path
- a path to Ads query file to get table id from (file base name will be used)table
- table id, not needed ifscript_path
is specified
For deployment we recommend using our interactive generator create-gaarf-wf. It greatly simplifies deployment. As a result it creates a bunch of shell scripts that reviewed here.
Common deployment scripts are:
deploy-queries.sh
- copy queries and google-ads.yaml to GCS (where they will be picked up by the workflow during execution)deploy-wf.sh
- deploy Cloud functions and Cloud workflowschedule-wf.sh
- create a schedule job with parametersrun-wf.sh
- execute workflow synchronously with the same parameters as the job
For deployment of all components you can use the setup.sh script with an argument with a name of your project.
Providing you supplied a name "myproject" the script will deploy functions with base name "myproject" ("myproject-bq", "myproject-getcids", "myproject-bq-view") and a workflow "myproject-wf". You can always customize names and other settings for components by adjusting setup.sh
scripts in components' folders.