Provision cross-device FL cloud infrastructure #336

Merged
merged 48 commits into from
Dec 5, 2023
68d88e8
feat: Adding exclusion of Terraform generated files
laurentgrangeau Nov 7, 2023
accd698
feat: Add Tensorboard Dockerfile
laurentgrangeau Nov 7, 2023
adbad26
feat: Add server Dockerfile
laurentgrangeau Nov 7, 2023
eeeb68e
feat: Add client Dockerfile
laurentgrangeau Nov 7, 2023
b0f8926
fix: Superlinter errors
laurentgrangeau Nov 7, 2023
3277b92
feat: Add manifests for deploying cross-device example
laurentgrangeau Nov 14, 2023
568551b
feat: Add TF script for cross-device example
laurentgrangeau Nov 14, 2023
e836e90
fix: Remove unwanted folder
laurentgrangeau Nov 14, 2023
37af830
fix: Moving manifests in a separate PR
laurentgrangeau Nov 14, 2023
1f64d3d
fix: Remove test file
laurentgrangeau Nov 14, 2023
2f491ae
Merge branch 'main' into cross-device-fl
laurentgrangeau Nov 14, 2023
8a23d18
feat: Make cross-device an optional module
laurentgrangeau Nov 14, 2023
be17718
Merge branch 'cross-device-fl' of /~https://github.com/GoogleCloudPlatf…
laurentgrangeau Nov 14, 2023
aa0e5bb
fix: Move to a module under the folder
laurentgrangeau Nov 14, 2023
6e76395
feat: Finalize cross-device module
laurentgrangeau Nov 14, 2023
0276a49
fix: Superlinter errors
laurentgrangeau Nov 14, 2023
71008ca
fix: Superlinter errors
laurentgrangeau Nov 14, 2023
4b6ca93
Merge branch 'main' into cross-device-fl
laurentgrangeau Nov 15, 2023
d671043
fix: PR comments
laurentgrangeau Nov 15, 2023
f85fe91
fix: PR comments
laurentgrangeau Nov 15, 2023
553a386
fix: Review of the PR
laurentgrangeau Nov 15, 2023
e4c3f1e
Merge branch 'main' into cross-device-fl
laurentgrangeau Nov 16, 2023
eb6fd5d
fix: PR comments
laurentgrangeau Nov 16, 2023
0c50238
fix: Remove gitignore tfvars
laurentgrangeau Nov 16, 2023
dd4624d
fix: Superlinter errors
laurentgrangeau Nov 16, 2023
3c91ee3
fix: PR comments
laurentgrangeau Nov 17, 2023
6db1900
fix: PR comments
laurentgrangeau Nov 17, 2023
0efd503
fix: PR comments
laurentgrangeau Nov 20, 2023
da80dd3
fix: Unused variable
laurentgrangeau Nov 20, 2023
8882783
fix: PR comments
laurentgrangeau Nov 27, 2023
d542df0
fix: Roles for SA in WI
laurentgrangeau Nov 27, 2023
617afa2
fix: Don't analyze SQL
laurentgrangeau Nov 27, 2023
27fa931
fix: Typo
laurentgrangeau Nov 27, 2023
3e7adbb
feat: Add roles to SA in namespace
laurentgrangeau Nov 30, 2023
1c39775
fix: Bugs
laurentgrangeau Nov 30, 2023
1fcc9f1
fix: Superlinter
laurentgrangeau Nov 30, 2023
c544812
fix: PR reviews
laurentgrangeau Nov 30, 2023
c7da50a
fix: SA namespace
laurentgrangeau Nov 30, 2023
ef89730
feat: Add instructions in the README
laurentgrangeau Nov 30, 2023
54dc3f2
Merge branch 'main' into cross-device-fl
laurentgrangeau Nov 30, 2023
311f983
fix: PR comments
laurentgrangeau Dec 4, 2023
f5b2e9d
fix: Rewrite README.md
laurentgrangeau Dec 4, 2023
2e4d78f
fix: Refactor README.md
laurentgrangeau Dec 4, 2023
935c699
fix: PR comments
laurentgrangeau Dec 5, 2023
9d2a443
Merge branch 'main' into cross-device-fl
laurentgrangeau Dec 5, 2023
c108fb0
fix: Move prerequisites
laurentgrangeau Dec 5, 2023
dadcb1f
Merge branch 'cross-device-fl' of /~https://github.com/GoogleCloudPlatf…
laurentgrangeau Dec 5, 2023
bb3b387
fix: Remove minimum nodes
laurentgrangeau Dec 5, 2023
13 changes: 11 additions & 2 deletions .gitignore
@@ -1,3 +1,14 @@
# Local .terraform directories
**/.terraform/*

# .tfstate files
*.tfstate
*.tfstate.*

# Crash log files
crash.log
crash.*.log

__pycache__/
.Python
build/
@@ -6,8 +17,6 @@ build/
target/
venv/

.terraform
*.tfstate*
*.out

tmp/
2 changes: 2 additions & 0 deletions .sqlfluffignore
@@ -0,0 +1,2 @@
# Ignore the Spanner DDL file because SQLFluff can't handle GoogleSQL yet
terraform/cross-device/files/spanner.ddl.sql
49 changes: 49 additions & 0 deletions terraform/cross-device/README.md
@@ -0,0 +1,49 @@
# Cross-device Federated Learning

This module is an example of an end-to-end demo for cross-device Federated Learning. This example deploys six workloads:
- `aggregator`: a job that reads device gradients and calculates the aggregated result with Differential Privacy
- `collector`: a job that runs periodically to query active tasks and encrypted gradients, and decides when to kick off aggregation
- `modelupdater`: a job that listens to events and publishes results so that devices can download them
- `task-assignment`: a front-end service that distributes training tasks to devices
- `task-management`: a job that manages tasks
- `task-scheduler`: a job that either runs periodically or is triggered by events

## Infrastructure

This module creates:
- A Spanner instance for storing the status of training
- Pub/Sub topics that act as message buses between microservices
- Cloud Storage buckets for storing the trained models, the aggregated gradients, and the client gradients

### Prerequisites

- A POSIX-compliant shell
- Git (tested with version 2.41)
- Docker (tested with version 20.10.21)

### Deploy the blueprint

This example builds on top of the infrastructure that the
[blueprint provides](../../../../README.md), and follows the best practices the
blueprint establishes.

To deploy this solution with end-to-end confidentiality:
- Set the `cross_device` Terraform variable to `true`
- Set the `enable_confidential_nodes` Terraform variable to `true`
- Set the `cluster_tenant_pool_machine_type` Terraform variable to `n2d-standard-8`
- Set the `cross_device_workloads_kubernetes_namespace` Terraform variable to prepare the namespace for future deployments

### Containers running in different namespaces, in the same GKE cluster

1. Provision infrastructure by following the instructions in the [main README](../../../../README.md).
1. From Cloud Shell, change the working directory to the `terraform` directory.
1. Initialize the following Terraform variables:

```hcl
enable_confidential_nodes = true
cluster_tenant_pool_machine_type = "n2d-standard-4"
cluster_default_pool_machine_type = "n2d-standard-4"
cross_device = true
```

1. Run `terraform apply`, and wait for Terraform to complete the provisioning process.
20 changes: 20 additions & 0 deletions terraform/cross-device/files/spanner.ddl.sql
@@ -0,0 +1,20 @@
CREATE TABLE Task(PopulationName STRING(64) NOT NULL, TaskId INT64 NOT NULL, TotalIteration INT64, MinAggregationSize INT64, MaxAggregationSize INT64, Status INT64, CreatedTime TIMESTAMP, StartTime TIMESTAMP, StopTime TIMESTAMP, StartTaskNoEarlierThan TIMESTAMP, DoNotCreateIterationAfter TIMESTAMP, MaxParallel INT64, CorrelationId STRING(MAX), MinClientVersion STRING(32), MaxClientVersion STRING(32)) PRIMARY KEY(PopulationName,TaskId)
CREATE INDEX TaskMinCorrelationIdIndex ON Task(CorrelationId)
CREATE INDEX TaskMinClientVersionIndex ON Task(MinClientVersion)
CREATE INDEX TaskMaxClientVersionIndex ON Task(MaxClientVersion)
CREATE TABLE TaskStatusHistory(PopulationName STRING(64) NOT NULL, TaskId INT64 NOT NULL, StatusId INT64 NOT NULL, Status INT64 NOT NULL, CreatedTime TIMESTAMP NOT NULL) PRIMARY KEY(PopulationName, TaskId, StatusId), INTERLEAVE IN PARENT Task ON DELETE CASCADE
CREATE INDEX TaskStatusHistoryStatusIndex ON TaskStatusHistory(Status)
CREATE INDEX TaskStatusHistoryCreatedTimeIndex ON TaskStatusHistory(CreatedTime)
CREATE TABLE Iteration (PopulationName STRING(64) NOT NULL, TaskId INT64 NOT NULL, IterationId INT64 NOT NULL, AttemptId INT64 NOT NULL, Status INT64 NOT NULL, BaseIterationId INT64 NOT NULL, BaseOnResultId INT64 NOT NULL, ReportGoal INT64 NOT NULL, ExpirationTime TIMESTAMP, ResultId INT64 NOT NULL) PRIMARY KEY(PopulationName, TaskId, IterationId, AttemptId), INTERLEAVE IN PARENT Task ON DELETE CASCADE
CREATE INDEX IterationStatusIndex ON Iteration(Status)
CREATE INDEX IterationExpirationTimeIndex ON Iteration(ExpirationTime)
CREATE TABLE IterationStatusHistory( PopulationName STRING(64) NOT NULL, TaskId INT64 NOT NULL, IterationId INT64 NOT NULL, AttemptId INT64 NOT NULL, StatusId INT64 NOT NULL, Status INT64 NOT NULL, CreatedTime TIMESTAMP NOT NULL) PRIMARY KEY(PopulationName, TaskId, IterationId, AttemptId, StatusId), INTERLEAVE IN PARENT Iteration ON DELETE CASCADE
CREATE INDEX IterationStatusHistoryStatusIndex ON IterationStatusHistory(Status)
CREATE INDEX IterationStatusHistoryCreatedTimeIndex ON IterationStatusHistory(CreatedTime)
CREATE TABLE Assignment(PopulationName STRING(64) NOT NULL, TaskId INT64 NOT NULL, IterationId INT64 NOT NULL, AttemptId INT64 NOT NULL, SessionId STRING(64) NOT NULL, CorrelationId STRING(MAX), Status INT64 NOT NULL, CreatedTime TIMESTAMP NOT NULL OPTIONS (allow_commit_timestamp=true)) PRIMARY KEY(PopulationName, TaskId, IterationId, AttemptId, SessionId), INTERLEAVE IN PARENT Iteration ON DELETE CASCADE
CREATE INDEX AssignmentCorrelationIdIndex ON Assignment(CorrelationId)
CREATE INDEX AssignmentCreateTimeIndex ON Assignment(CreatedTime)
CREATE INDEX AssignmentStatusIndex ON Assignment(Status)
CREATE TABLE AssignmentStatusHistory(PopulationName STRING(64) NOT NULL, TaskId INT64 NOT NULL, IterationId INT64 NOT NULL, AttemptId INT64 NOT NULL,SessionId STRING(64) NOT NULL, StatusId INT64 NOT NULL, Status INT64 NOT NULL, CreatedTime TIMESTAMP NOT NULL) PRIMARY KEY(PopulationName, TaskId, IterationId, AttemptId, SessionId, StatusId), INTERLEAVE IN PARENT Assignment ON DELETE CASCADE
CREATE INDEX AssignmentStatusHistoryStatusIndex ON AssignmentStatusHistory(Status)
CREATE INDEX AssignmentStatusHistoryCreatedTimeIndex ON AssignmentStatusHistory(CreatedTime)
27 changes: 27 additions & 0 deletions terraform/cross-device/iam.tf
@@ -0,0 +1,27 @@
# Copyright 2023 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

module "project-iam-bindings" {
  source   = "terraform-google-modules/iam/google//modules/projects_iam"
  version  = "7.6.0"
  projects = [data.google_project.project.project_id]

  bindings = {
    "roles/spanner.admin"                  = var.list_apps_sa_iam_emails,
    "roles/logging.logWriter"              = var.list_apps_sa_iam_emails,
    "roles/iam.serviceAccountTokenCreator" = var.list_apps_sa_iam_emails,
    "roles/storage.objectAdmin"            = var.list_apps_sa_iam_emails,
    "roles/pubsub.admin"                   = var.list_apps_sa_iam_emails
  }
}
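The `bindings` map grants each listed role to every member in `var.list_apps_sa_iam_emails`. The `projects_iam` module takes members in IAM member syntax, such as `serviceAccount:<email>`. As an illustration only, a value for that variable could look like the sketch below; the local name, the service account names, and the project ID are hypothetical and do not come from this PR.

```hcl
locals {
  # Hypothetical example value for var.list_apps_sa_iam_emails.
  # Each entry uses the "serviceAccount:" prefix that IAM bindings require.
  example_apps_sa_iam_emails = [
    "serviceAccount:aggregator@example-project.iam.gserviceaccount.com",
    "serviceAccount:collector@example-project.iam.gserviceaccount.com",
  ]
}
```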
17 changes: 17 additions & 0 deletions terraform/cross-device/project.tf
@@ -0,0 +1,17 @@
# Copyright 2023 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

data "google_project" "project" {
  project_id = var.project_id
}
66 changes: 66 additions & 0 deletions terraform/cross-device/pubsub.tf
@@ -0,0 +1,66 @@
# Copyright 2023 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

locals {
  topics = {
    aggregator_topic   = "aggregator-${var.environment}"
    modelupdater_topic = "modelupdater-${var.environment}"
  }
}

# Dead-letter topics, one per entry in local.topics.
module "pubsub_dl" {
  for_each             = local.topics
  source               = "terraform-google-modules/pubsub/google"
  version              = "6.0.0"
  project_id           = data.google_project.project.project_id
  topic                = "${each.value}-topic-dead-letter"
  create_subscriptions = true
  create_topic         = true

  pull_subscriptions = [
    {
      name                             = "${each.value}-dlq-subscription"
      topic_message_retention_duration = "604800s"
      retain_acked_messages            = true
      ack_deadline_seconds             = 600
      enable_exactly_once_delivery     = true
      service_account                  = "service-${data.google_project.project.number}@gcp-sa-pubsub.iam.gserviceaccount.com"
      expiration_policy                = ""
    }
  ]
}

# Main topics, each with a subscription that forwards undeliverable messages
# to the matching dead-letter topic.
module "pubsub" {
  for_each             = local.topics
  source               = "terraform-google-modules/pubsub/google"
  version              = "6.0.0"
  project_id           = data.google_project.project.project_id
  topic                = "${each.value}-topic"
  create_subscriptions = true
  create_topic         = true

  pull_subscriptions = [
    {
      name = "${each.value}-subscription"
      # Pub/Sub expects the fully qualified name: projects/{project}/topics/{topic}
      dead_letter_topic                = "projects/${var.project_id}/topics/${each.value}-topic-dead-letter"
      topic_message_retention_duration = "604800s"
      retain_acked_messages            = true
      ack_deadline_seconds             = 600
      max_delivery_attempts            = 10
      enable_exactly_once_delivery     = true
      service_account                  = "service-${data.google_project.project.number}@gcp-sa-pubsub.iam.gserviceaccount.com"
      expiration_policy                = ""
    }
  ]
}
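A subscription's dead-letter policy refers to its topic by the fully qualified resource name, `projects/{project}/topics/{topic}`. One way to keep that name in a single place is a local map keyed like `local.topics`. This is a minimal sketch, not part of the PR: the local name `dead_letter_topic_ids` is hypothetical, and it assumes the dead-letter topics keep the `-topic-dead-letter` suffix used in `pubsub.tf`.

```hcl
locals {
  # Hypothetical helper: maps each key of local.topics to the fully
  # qualified name of the dead-letter topic created by module.pubsub_dl.
  dead_letter_topic_ids = {
    for key, name in local.topics :
    key => "projects/${var.project_id}/topics/${name}-topic-dead-letter"
  }
}
```

Inside `module "pubsub"`, the subscription could then set `dead_letter_topic = local.dead_letter_topic_ids[each.key]`.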
25 changes: 25 additions & 0 deletions terraform/cross-device/services.tf
@@ -0,0 +1,25 @@
# Copyright 2023 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

module "project-services" {
  source  = "terraform-google-modules/project-factory/google//modules/project_services"
  version = "14.3.0"

  project_id                  = var.project_id
  disable_services_on_destroy = false
  activate_apis = [
    "pubsub.googleapis.com",
    "spanner.googleapis.com"
  ]
}
35 changes: 35 additions & 0 deletions terraform/cross-device/spanner.tf
@@ -0,0 +1,35 @@
# Copyright 2023 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

locals {
  file_contents = file("${path.module}/files/spanner.ddl.sql")
  string_list   = split("\n", local.file_contents)
}

resource "google_spanner_instance" "fcp_task_spanner_instance" {
  name             = "fcp-task-${var.environment}"
  display_name     = "fcp-task-${var.environment}"
  project          = data.google_project.project.project_id
  config           = var.spanner_instance_config
  processing_units = var.spanner_processing_units
}

resource "google_spanner_database" "fcp_task_spanner_database" {
  instance                 = google_spanner_instance.fcp_task_spanner_instance.name
  name                     = "fcp-task-db-${var.environment}"
  project                  = data.google_project.project.project_id
  version_retention_period = var.spanner_database_retention_period
  deletion_protection      = var.spanner_database_deletion_protection
  ddl                      = local.string_list
}
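`spanner.tf` splits the DDL file on newlines and passes every resulting element to `ddl`. If the file ends with a newline or contains a blank line, `split` also returns an empty string, which Spanner would presumably not accept as a DDL statement. The sketch below is a more defensive variant, assuming one statement per line as in `spanner.ddl.sql`. The local names are hypothetical and not part of the PR.

```hcl
locals {
  # Same file that spanner.tf reads.
  ddl_file_contents = file("${path.module}/files/spanner.ddl.sql")

  # Trim surrounding whitespace from each line, then drop empty entries
  # so that blank lines and a trailing newline do not become statements.
  ddl_statements = compact([
    for line in split("\n", local.ddl_file_contents) : trimspace(line)
  ])
}
```

The database resource would then use `ddl = local.ddl_statements`. `trimspace` also removes a trailing carriage return if the file was saved with CRLF line endings.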
53 changes: 53 additions & 0 deletions terraform/cross-device/storage.tf
@@ -0,0 +1,53 @@
# Copyright 2023 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

module "buckets" {
  source     = "terraform-google-modules/cloud-storage/google"
  version    = "5.0.0"
  project_id = data.google_project.project.project_id
  location   = var.region
  prefix     = "fcp-${var.environment}"
  names      = ["model-0", "aggregated-gradient-0", "client-gradient-0"]
  force_destroy = {
    model-0               = var.model_bucket_force_destroy
    aggregated-gradient-0 = var.aggregated_gradient_bucket_force_destroy
    client-gradient-0     = var.client_gradient_bucket_force_destroy
  }
  versioning = {
    model-0               = var.model_bucket_versioning
    aggregated-gradient-0 = var.aggregated_gradient_bucket_versioning
    client-gradient-0     = var.client_gradient_bucket_versioning
  }
  public_access_prevention = "enforced"
  bucket_policy_only = {
    model-0               = true
    aggregated-gradient-0 = true
    client-gradient-0     = true
  }
  lifecycle_rules = [{
    action = {
      type = "Delete"
    }
    condition = {
      age = 60
    }
    }, {
    action = {
      type = "Delete"
    }
    condition = {
      days_since_noncurrent_time = 10
    }
  }]
}
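`storage.tf` reads six input variables whose declarations are not part of this excerpt. The module's `force_destroy` and `versioning` inputs are maps of booleans, so each variable needs to be a `bool`. Here is a minimal sketch of two such declarations. The variable names come from `storage.tf`; the descriptions and defaults are assumptions, not the PR's actual values.

```hcl
# Assumed declaration: the default of false is a guess that favors
# protecting bucket contents on `terraform destroy`.
variable "model_bucket_force_destroy" {
  description = "Whether to delete all objects in the model bucket when the bucket is destroyed."
  type        = bool
  default     = false
}

# Assumed declaration: the default of true is a guess. Versioning is what
# makes the `days_since_noncurrent_time` lifecycle rule meaningful.
variable "model_bucket_versioning" {
  description = "Whether to enable object versioning on the model bucket."
  type        = bool
  default     = true
}
```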