feat: sic metadata gen and ingestion (#34)
Co-authored-by: Balaji Subramaniam <balajismaniam@google.com>
balajismaniam and Balaji Subramaniam authored Jul 12, 2023
1 parent f6e2b7d commit ccfe4a4
Showing 7 changed files with 253 additions and 48 deletions.
66 changes: 24 additions & 42 deletions README.md
```diff
@@ -1,58 +1,46 @@
-# terraform-google-gen-ai-document-summarization
+# Generative AI Document Summarization
 
 ## Description
 ### Tagline
-This is an auto-generated module.
+Create summaries of a large corpus of documents using Generative AI.
 
 ### Detailed
-This module was generated from [terraform-google-module-template](/~https://github.com/terraform-google-modules/terraform-google-module-template/), which by default generates a module that simply creates a GCS bucket. As the module develops, this README should be updated.
-
-The resources/services/activations/deletions that this module will create/trigger are:
-
-- Create a GCS bucket with the provided name
+This solution showcases how to summarize a large corpus of documents using Generative AI. It provides an
+end-to-end demonstration of document summarization going all the way from raw documents, detecting text
+in the documents and summarizing the documents on-demand using Vertex AI LLM APIs, Cloud Vision Optical
+Character Recognition (OCR) and BigQuery.
 
 ### PreDeploy
 To deploy this blueprint you must have an active billing account and billing permissions.
 
 ## Architecture
-![alt text for diagram](https://www.link-to-architecture-diagram.com)
-1. Architecture description step no. 1
-2. Architecture description step no. 2
-3. Architecture description step no. N
+![Document Summarization using Generative AI](https://www.gstatic.com/pantheon/images/solutions/gen_ai_document_summarization_architecture_v1.svg)
+1. The developer follows a tutorial on a Jupyter Notebook, where they upload a PDF — either through Vertex AI Workbench or Colaboratory.
+2. The uploaded PDF file is sent to a function running on Cloud Functions. This function handles PDF file processing.
+3. The Cloud Functions function uses Cloud Vision to extract all text from the PDF file.
+4. The Cloud Functions function stores the extracted text inside a Cloud Storage bucket.
+5. The Cloud Functions function uses Vertex AI’s LLM API to summarize the extracted text.
+6. The Cloud Functions function stores the text summaries of PDFs in BigQuery tables.
+7. As an alternative to uploading PDF files through Jupyter Notebook, the developer can upload a PDF file directly to a Cloud Storage
+bucket — for instance, through the Console UI or gcloud. This upload triggers Eventarc to begin the Document Processing phase.
+8. As a result of the direct upload to Cloud Storage, Eventarc triggers the Document Processing phase, handled by Cloud Functions.
 
 ## Documentation
-- [Hosting a Static Website](https://cloud.google.com/storage/docs/hosting-static-website)
+- [Generative AI Document Summary](https://cloud.google.com/architecture/ai-ml/generative-ai-document-summarization)
 
 ## Deployment Duration
-Configuration: X mins
-Deployment: Y mins
+Configuration: 1 min
+Deployment: 10 mins
 
 ## Cost
-[Blueprint cost details](https://cloud.google.com/products/calculator?id=02fb0c45-cc29-4567-8cc6-f72ac9024add)
-
-## Usage
-
-Basic usage of this module is as follows:
-
-```hcl
-module "gen_ai_document_summarization" {
-  source  = "terraform-google-modules/gen-ai-document-summarization/google"
-  version = "~> 0.1"
-  project_id  = "<PROJECT ID>"
-  bucket_name = "gcs-test-bucket"
-}
-```
-
-Functional examples are included in the
-[examples](./examples/) directory.
+[Cost Details](https://cloud.google.com/products/calculator/#id=78888c9b-02ac-4130-9327-fecd7f4cfb11)
 
 <!-- BEGINNING OF PRE-COMMIT-TERRAFORM DOCS HOOK -->
 ## Inputs
 
 | Name | Description | Type | Default | Required |
 |------|-------------|------|---------|:--------:|
-| bucket\_name | The name of the bucket to create | `string` | n/a | yes |
+| bucket\_name | The name of the bucket to create | `string` | `"genai-doc-summary-webhook-1234"` | no |
 | gcf\_timeout\_seconds | GCF execution timeout | `number` | `900` | no |
 | project\_id | The Google Cloud project ID to deploy to | `string` | n/a | yes |
 | region | Google Cloud region | `string` | `"us-central1"` | no |
```
```diff
@@ -64,7 +52,8 @@ Functional examples are included in the
 
 | Name | Description |
 |------|-------------|
-| bucket\_name | Name of the bucket |
+| genai\_doc\_summary\_colab\_url | The URL to launch the notebook tutorial for the Generative AI Document Summarization Solution |
+| neos\_walkthrough\_url | The URL to launch the in-console tutorial for the Generative AI Document Summarization solution |
 
 <!-- END OF PRE-COMMIT-TERRAFORM DOCS HOOK -->
 
```
```diff
@@ -86,23 +75,16 @@ the resources of this module:
 
 - Storage Admin: `roles/storage.admin`
 
-The [Project Factory module][project-factory-module] and the
-[IAM module][iam-module] may be used in combination to provision a
-service account with the necessary roles applied.
-
 ### APIs
 
 A project with the following APIs enabled must be used to host the
 resources of this module:
 
 - Google Cloud Storage JSON API: `storage-api.googleapis.com`
-
-The [Project Factory module][project-factory-module] can be used to
-provision a project with the necessary APIs enabled.
 
 ## Contributing
 
-Refer to the [contribution guidelines](./CONTRIBUTING.md) for
+Refer to the [contribution guidelines](./docs/CONTRIBUTING.md) for
 information on contributing to this module.
 
 [iam-module]: https://registry.terraform.io/modules/terraform-google-modules/iam/google
```
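Steps 7 and 8 of the README's architecture list describe the event path: a direct upload to Cloud Storage, which Eventarc routes to Cloud Functions. The sketch below shows one way that path could be expressed in Terraform.

It assumes 2nd-gen Cloud Functions. The `run`, `eventarc`, `cloudbuild` and `artifactregistry` APIs enabled in this commit are consistent with that choice but do not confirm it. The following are all hypothetical and are not taken from the module's `main.tf`:

- the bucket names;
- the local `webhook` source directory;
- the `python311` runtime;
- the `on_cloud_event` entry point.

```hcl
# Sketch only: bucket names, source dir, runtime and entry point are hypothetical.
terraform {
  required_providers {
    google  = { source = "hashicorp/google" }
    archive = { source = "hashicorp/archive" }
  }
}

variable "project_id" {
  type = string
}

variable "region" {
  type    = string
  default = "us-central1"
}

# Bucket that receives direct PDF uploads (step 7).
resource "google_storage_bucket" "uploads" {
  project                     = var.project_id
  name                        = "${var.project_id}-pdf-uploads"
  location                    = var.region
  uniform_bucket_level_access = true
}

# Bucket that stores the zipped function source.
resource "google_storage_bucket" "gcf_source" {
  project                     = var.project_id
  name                        = "${var.project_id}-gcf-source"
  location                    = var.region
  uniform_bucket_level_access = true
}

# Zip a local source directory and upload it; the object name changes with the content.
data "archive_file" "webhook" {
  type        = "zip"
  source_dir  = "${path.module}/webhook"
  output_path = "${path.module}/.tmp/webhook.zip"
}

resource "google_storage_bucket_object" "webhook_src" {
  name   = "webhook-${data.archive_file.webhook.output_md5}.zip"
  bucket = google_storage_bucket.gcf_source.name
  source = data.archive_file.webhook.output_path
}

# Cloud Storage events are delivered through Pub/Sub, so the
# Cloud Storage service agent needs the Pub/Sub Publisher role.
data "google_storage_project_service_account" "gcs" {
  project = var.project_id
}

resource "google_project_iam_member" "gcs_pubsub_publisher" {
  project = var.project_id
  role    = "roles/pubsub.publisher"
  member  = "serviceAccount:${data.google_storage_project_service_account.gcs.email_address}"
}

# Function that Eventarc invokes when an object is finalized in the uploads bucket (step 8).
resource "google_cloudfunctions2_function" "webhook" {
  project  = var.project_id
  name     = "webhook"
  location = var.region

  build_config {
    runtime     = "python311"
    entry_point = "on_cloud_event" # hypothetical handler name
    source {
      storage_source {
        bucket = google_storage_bucket.gcf_source.name
        object = google_storage_bucket_object.webhook_src.name
      }
    }
  }

  service_config {
    timeout_seconds = 540 # event-driven functions are limited to 9 minutes
  }

  event_trigger {
    trigger_region = var.region # must match the uploads bucket location
    event_type     = "google.cloud.storage.object.v1.finalized"
    retry_policy   = "RETRY_POLICY_DO_NOT_RETRY"
    event_filters {
      attribute = "bucket"
      value     = google_storage_bucket.uploads.name
    }
  }

  depends_on = [google_project_iam_member.gcs_pubsub_publisher]
}
```

The 540-second timeout is deliberately below the module's `gcf_timeout_seconds` default of 900. That is because event-driven functions cannot run as long as HTTP ones.

A real deployment would also need a trigger service account that is allowed to receive Eventarc events. The sketch leaves this at the provider default.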
50 changes: 50 additions & 0 deletions metadata.display.yaml
@@ -0,0 +1,50 @@
```yaml
# Copyright 2023 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#      http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

apiVersion: blueprints.cloud.google.com/v1alpha1
kind: BlueprintMetadata
metadata:
  name: terraform-genai-doc-summarization-display
  annotations:
    config.kubernetes.io/local-config: "true"
spec:
  info:
    title: Generative AI Document Summarization
    source:
      repo: /~https://github.com/balajismaniam/terraform-genai-doc-summarization.git
      sourceType: git
  ui:
    input:
      variables:
        bucket_name:
          name: bucket_name
          title: Bucket Name
        gcf_timeout_seconds:
          name: gcf_timeout_seconds
          title: GCF Timeout Seconds
        project_id:
          name: project_id
          title: Project ID
        region:
          name: region
          title: Region
        time_to_enable_apis:
          name: time_to_enable_apis
          title: Time To Enable APIs
        webhook_name:
          name: webhook_name
          title: Webhook Name
        webhook_path:
          name: webhook_path
          title: Webhook Path
```
146 changes: 146 additions & 0 deletions metadata.yaml
@@ -0,0 +1,146 @@
```yaml
# Copyright 2023 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#      http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

apiVersion: blueprints.cloud.google.com/v1alpha1
kind: BlueprintMetadata
metadata:
  name: terraform-genai-doc-summarization
  annotations:
    config.kubernetes.io/local-config: "true"
spec:
  info:
    title: Generative AI Document Summarization
    source:
      repo: /~https://github.com/balajismaniam/terraform-genai-doc-summarization.git
      sourceType: git
    version: 0.0.1
    actuationTool:
      flavor: Terraform
      version: '>= 0.13'
    description:
      tagline: Create summaries of a large corpus of documents using Generative AI.
      detailed: |-
        This solution showcases how to summarize a large corpus of documents using Generative AI. It provides an
        end-to-end demonstration of document summarization going all the way from raw documents, detecting text
        in the documents and summarizing the documents on-demand using Vertex AI LLM APIs, Cloud Vision Optical
        Character Recognition (OCR) and BigQuery.
      preDeploy: To deploy this blueprint you must have an active billing account and billing permissions.
    icon: assets/icon.png
    deploymentDuration:
      configurationSecs: 60
      deploymentSecs: 600
    costEstimate:
      description: Cost Details
      url: https://cloud.google.com/products/calculator/#id=78888c9b-02ac-4130-9327-fecd7f4cfb11
    cloudProducts:
      - productId: VERTEX_SECTION
        pageUrl: ""
        label: Vertex AI
      - productId: VISION_SECTION
        pageUrl: ""
        label: Vertex AI Vision
      - productId: search_BIGQUERY_SECTION
        pageUrl: ""
        label: BigQuery
      - productId: FUNCTIONS_SECTION
        pageUrl: ""
        label: Cloud Functions
      - productId: STORAGE_SECTION
        pageUrl: ""
        label: Cloud Storage
  content:
    architecture:
      diagramUrl: https://www.gstatic.com/pantheon/images/solutions/gen_ai_document_summarization_architecture_v1.svg
      description:
        - 1. The developer follows a tutorial on a Jupyter Notebook, where they upload a PDF — either through Vertex AI Workbench or Colaboratory.
        - 2. The uploaded PDF file is sent to a function running on Cloud Functions. This function handles PDF file processing.
        - 3. The Cloud Functions function uses Cloud Vision to extract all text from the PDF file.
        - 4. The Cloud Functions function stores the extracted text inside a Cloud Storage bucket.
        - 5. The Cloud Functions function uses Vertex AI’s LLM API to summarize the extracted text.
        - 6. The Cloud Functions function stores the text summaries of PDFs in BigQuery tables.
        - 7. As an alternative to uploading PDF files through Jupyter Notebook, the developer can upload a PDF file directly to a Cloud Storage
        - bucket — for instance, through the Console UI or gcloud. This upload triggers Eventarc to begin the Document Processing phase.
        - 8. As a result of the direct upload to Cloud Storage, Eventarc triggers the Document Processing phase, handled by Cloud Functions.
    documentation:
      - title: Generative AI Document Summary
        url: https://cloud.google.com/architecture/ai-ml/generative-ai-document-summarization
    examples:
      - name: simple_example
        location: examples/simple_example
  interfaces:
    variables:
      - name: bucket_name
        description: The name of the bucket to create
        varType: string
        defaultValue: genai-doc-summary-webhook-1234
      - name: gcf_timeout_seconds
        description: GCF execution timeout
        varType: number
        defaultValue: 900
      - name: project_id
        description: The Google Cloud project ID to deploy to
        varType: string
        required: true
      - name: region
        description: Google Cloud region
        varType: string
        defaultValue: us-central1
      - name: time_to_enable_apis
        description: Wait time to enable APIs in new projects
        varType: string
        defaultValue: 180s
      - name: webhook_name
        description: Name of the webhook
        varType: string
        defaultValue: webhook
      - name: webhook_path
        description: Path to the webhook directory
        varType: string
        defaultValue: webhook
    outputs:
      - name: genai_doc_summary_colab_url
        description: The URL to launch the notebook tutorial for the Generative AI Document Summarization Solution
      - name: neos_walkthrough_url
        description: The URL to launch the in-console tutorial for the Generative AI Document Summarization solution
  requirements:
    roles:
      - level: Project
        roles:
          - roles/aiplatform.admin
          - roles/artifactregistry.reader
          - roles/bigquery.admin
          - roles/cloudfunctions.admin
          - roles/eventarc.admin
          - roles/iam.serviceAccountAdmin
          - roles/iam.serviceAccountUser
          - roles/logging.admin
          - roles/pubsub.admin
          - roles/resourcemanager.projectIamAdmin
          - roles/run.admin
          - roles/serviceusage.serviceUsageAdmin
          - roles/storage.admin
    services:
      - aiplatform.googleapis.com
      - artifactregistry.googleapis.com
      - bigquery.googleapis.com
      - cloudbuild.googleapis.com
      - cloudfunctions.googleapis.com
      - cloudresourcemanager.googleapis.com
      - eventarc.googleapis.com
      - iam.googleapis.com
      - run.googleapis.com
      - serviceusage.googleapis.com
      - storage-api.googleapis.com
      - storage.googleapis.com
      - vision.googleapis.com
```
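The `interfaces.variables` section of `metadata.yaml` describes the module's Terraform inputs. Read back as HCL, its entries correspond to `variable` blocks. This commit's `variables.tf` diff does not show three of them: `gcf_timeout_seconds`, `webhook_name` and `webhook_path`.

The declarations below are reconstructed only from the metadata entries. They are a sketch, not a copy of the repository file, which may order, format or validate them differently.

```hcl
# Reconstructed from metadata.yaml interfaces.variables; not the repository file itself.
variable "gcf_timeout_seconds" {
  description = "GCF execution timeout"
  type        = number
  default     = 900
}

variable "webhook_name" {
  description = "Name of the webhook"
  type        = string
  default     = "webhook"
}

variable "webhook_path" {
  description = "Path to the webhook directory"
  type        = string
  default     = "webhook"
}
```

In the deployment UI, `metadata.display.yaml` labels these inputs "GCF Timeout Seconds", "Webhook Name" and "Webhook Path".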
11 changes: 8 additions & 3 deletions outputs.tf
```diff
@@ -14,7 +14,12 @@
  * limitations under the License.
  */
 
-output "bucket_name" {
-  description = "Name of the bucket"
-  value       = google_storage_bucket.main.name
+output "neos_walkthrough_url" {
+  value       = "https://console.cloud.google.com/products/solutions/deployments?walkthrough_id=panels--sic--document-summarization-gcf_tour"
+  description = "The URL to launch the in-console tutorial for the Generative AI Document Summarization solution"
 }
+
+output "genai_doc_summary_colab_url" {
+  value       = "https://colab.research.google.com/github/GoogleCloudPlatform/terraform-google-gen-ai-document-summarization/blob/main/notebook/gen_ai_jss.ipynb"
+  description = "The URL to launch the notebook tutorial for the Generative AI Document Summarization Solution"
+}
```
15 changes: 13 additions & 2 deletions test/setup/iam.tf
```diff
@@ -16,8 +16,19 @@
 
 locals {
   int_required_roles = [
-    #TODO: Pare down the roles.
-    "roles/owner"
+    "roles/aiplatform.admin",
+    "roles/artifactregistry.reader",
+    "roles/bigquery.admin",
+    "roles/cloudfunctions.admin",
+    "roles/eventarc.admin",
+    "roles/iam.serviceAccountAdmin",
+    "roles/iam.serviceAccountUser",
+    "roles/logging.admin",
+    "roles/pubsub.admin",
+    "roles/resourcemanager.projectIamAdmin",
+    "roles/run.admin",
+    "roles/serviceusage.serviceUsageAdmin",
+    "roles/storage.admin",
   ]
 }
 
```
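This change replaces the broad `roles/owner` grant with thirteen narrower roles. A common way to consume such a list is to create one `google_project_iam_member` per role.

The sketch assumes a hypothetical `ci-account` service account. The role list is abbreviated. The sketch does not reproduce the rest of this repository's `test/setup/iam.tf`.

```hcl
# Sketch: grant each required role to a hypothetical CI service account.
variable "project_id" {
  type = string
}

locals {
  # Abbreviated; the full list is in the iam.tf diff above.
  int_required_roles = [
    "roles/aiplatform.admin",
    "roles/cloudfunctions.admin",
    "roles/eventarc.admin",
    "roles/storage.admin",
  ]
}

# Hypothetical service account used by the integration tests.
resource "google_service_account" "int_test" {
  project      = var.project_id
  account_id   = "ci-account"
  display_name = "ci-account"
}

# One non-authoritative binding per role, keyed by the role name.
resource "google_project_iam_member" "int_test" {
  for_each = toset(local.int_required_roles)

  project = var.project_id
  role    = each.value
  member  = "serviceAccount:${google_service_account.int_test.email}"
}
```

The sketch keys `for_each` by role name rather than using `count`. That way, removing a role later destroys only that one binding instead of shifting every index after it.

`google_project_iam_member` adds a single member without touching other members of the same role. It is therefore safe to use alongside bindings managed elsewhere.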
12 changes: 11 additions & 1 deletion test/setup/main.tf
```diff
@@ -25,8 +25,18 @@ module "project" {
   billing_account = var.billing_account
 
   activate_apis = [
+    "aiplatform.googleapis.com",
+    "artifactregistry.googleapis.com",
+    "bigquery.googleapis.com",
+    "cloudbuild.googleapis.com",
+    "cloudfunctions.googleapis.com",
     "cloudresourcemanager.googleapis.com",
+    "eventarc.googleapis.com",
+    "iam.googleapis.com",
+    "run.googleapis.com",
+    "serviceusage.googleapis.com",
     "storage-api.googleapis.com",
-    "serviceusage.googleapis.com"
+    "storage.googleapis.com",
+    "vision.googleapis.com",
   ]
 }
```
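The test setup enables these APIs through the project factory's `activate_apis`. Separately, `metadata.yaml` declares a `time_to_enable_apis` variable ("Wait time to enable APIs in new projects", default `180s`).

This commit does not show how the module itself uses that variable. One plausible pattern, sketched here as an assumption, pairs `google_project_service` with the `time_sleep` resource from the `hashicorp/time` provider.

```hcl
# Sketch: enable APIs, then wait before creating dependent resources.
terraform {
  required_providers {
    google = { source = "hashicorp/google" }
    time   = { source = "hashicorp/time" }
  }
}

variable "project_id" {
  type = string
}

variable "time_to_enable_apis" {
  description = "Wait time to enable APIs in new projects"
  type        = string
  default     = "180s"
}

locals {
  # Abbreviated; the full set is the activate_apis list above.
  services = [
    "aiplatform.googleapis.com",
    "cloudfunctions.googleapis.com",
    "eventarc.googleapis.com",
    "vision.googleapis.com",
  ]
}

resource "google_project_service" "enabled" {
  for_each = toset(local.services)

  project            = var.project_id
  service            = each.value
  disable_on_destroy = false # leave APIs on when this config is destroyed
}

# Newly enabled APIs can take a while to propagate.
resource "time_sleep" "wait_for_apis" {
  depends_on      = [google_project_service.enabled]
  create_duration = var.time_to_enable_apis
}
```

Resources that call these APIs would then declare `depends_on = [time_sleep.wait_for_apis]`. That makes them wait out the propagation delay instead of failing on the first apply.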
1 change: 1 addition & 0 deletions variables.tf
```diff
@@ -26,6 +26,7 @@ variable "project_id" {
 variable "bucket_name" {
   description = "The name of the bucket to create"
   type        = string
+  default     = "genai-doc-summary-webhook-1234"
 }
 
 variable "region" {
```
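After this commit, `bucket_name` has a default. The module also exposes two tutorial URLs in place of the bucket name. A caller therefore only has to pass `project_id`.

A minimal root configuration might look like the following. The local `source` path and the project and bucket values are placeholders, not published locations.

```hcl
# Sketch: calling the module from a root configuration.
module "gen_ai_document_summarization" {
  source = "./modules/terraform-genai-doc-summarization" # hypothetical local path

  project_id = "my-project-id" # placeholder
  region     = "us-central1"

  # Placeholder; overrides the shared default bucket name.
  bucket_name = "my-project-id-genai-docs"
}

output "walkthrough_url" {
  value = module.gen_ai_document_summarization.neos_walkthrough_url
}

output "colab_url" {
  value = module.gen_ai_document_summarization.genai_doc_summary_colab_url
}
```

Cloud Storage bucket names are globally unique. Two deployments that both keep the default `genai-doc-summary-webhook-1234` would conflict, which is why the sketch overrides `bucket_name`.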
