Merge pull request #142 from raju-rangan/main

feat: Orchestrate an intelligent document processing workflow using tool-use on Amazon Bedrock

jonathancaevans authored Jan 15, 2025
2 parents 101bb16 + ca0a261 commit 4c2365c

Showing 19 changed files with 1,872 additions and 0 deletions.
10 changes: 10 additions & 0 deletions medical-idp/.gitignore
@@ -0,0 +1,10 @@
.idea
grobid/__pycache__
__pycache__
.venv
.vscode
temp/*
*.DS_Store
*/.ipynb_checkpoints
.ipynb_checkpoints/*
Untitled*
8 changes: 8 additions & 0 deletions medical-idp/.streamlit/config.toml
@@ -0,0 +1,8 @@
[logger]
level = "info"

[browser]
gatherUsageStats = true

[ui]
hideTopBar = true
52 changes: 52 additions & 0 deletions medical-idp/README.md
@@ -0,0 +1,52 @@
# Orchestrate an intelligent document processing workflow using tool-use on Amazon Bedrock

## Solution Overview

This intelligent document processing solution leverages Amazon Bedrock to orchestrate a sophisticated workflow for handling multi-page healthcare documents with mixed content types. At the core of this solution is Amazon Bedrock's Converse API with its powerful tool-use capabilities, which enables foundation models to interact with external functions and APIs as part of their response generation.
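In code, a Converse API tool-use request pairs the conversation with a `toolConfig` that describes the functions the model may call. The sketch below is illustrative only: the tool name `extract_patient_info` and its schema are hypothetical and not part of this repository; only the model ID and the request shape follow the Bedrock Converse API.

```python
# Minimal sketch of a Converse API tool-use request body.
# The tool name and schema are hypothetical examples, not code from this repo.

def build_tool_use_request(user_text):
    """Build the keyword arguments for a bedrock-runtime converse() call."""
    tool_config = {
        "tools": [{
            "toolSpec": {
                "name": "extract_patient_info",  # hypothetical tool
                "description": "Extract structured fields from a patient intake form.",
                "inputSchema": {
                    "json": {
                        "type": "object",
                        "properties": {
                            "patient_name": {"type": "string"},
                            "date_of_birth": {"type": "string"},
                        },
                        "required": ["patient_name"],
                    }
                },
            }
        }]
    }
    messages = [{"role": "user", "content": [{"text": user_text}]}]
    return {
        "modelId": "anthropic.claude-3-haiku-20240307-v1:0",
        "messages": messages,
        "toolConfig": tool_config,
    }

# With boto3, the request would be sent roughly as:
#   client = boto3.client("bedrock-runtime")
#   response = client.converse(**build_tool_use_request("..."))
# When the model decides to call a tool, the response contains a
# "toolUse" content block that the application executes and answers
# with a "toolResult" block on the next turn.
```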

The solution employs a strategic multi-model approach, optimizing for both performance and cost by selecting the most appropriate model for each task:

* **Claude 3 Haiku**: Serves as the workflow orchestrator due to its lower latency and cost-effectiveness. Its strong reasoning and tool-use abilities make it ideal for:
- Coordinating the overall document processing pipeline
- Making routing decisions for different document types
- Invoking appropriate processing functions
- Managing the workflow state

* **Claude 3.5 Sonnet (v2)**: Handles vision-intensive tasks, where its superior visual reasoning excels at:
- Interpreting complex document layouts and structure
- Extracting text from tables and forms
- Processing medical charts and handwritten notes
- Converting unstructured visual information into structured data
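The multi-model split above amounts to a simple routing rule: vision-heavy work goes to Sonnet, orchestration goes to Haiku. A minimal sketch, assuming made-up task names (only the two model IDs are real public Bedrock identifiers; the routing function itself is illustrative, not code from this repository):

```python
# Illustrative task-to-model routing; task names are invented for
# this sketch, only the model IDs are real Bedrock identifiers.
ORCHESTRATOR_MODEL = "anthropic.claude-3-haiku-20240307-v1:0"
VISION_MODEL = "anthropic.claude-3-5-sonnet-20241022-v2:0"

# Tasks that benefit from stronger visual reasoning
VISION_TASKS = {"layout_analysis", "table_extraction", "handwriting_ocr"}

def pick_model(task: str) -> str:
    """Route vision-heavy tasks to Sonnet, everything else to Haiku."""
    return VISION_MODEL if task in VISION_TASKS else ORCHESTRATOR_MODEL
```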

![Orchestration Flow](static/flow_diagram.webp)

## Use Case and Dataset

For our example use case, we'll examine a patient intake process at a healthcare institution. The workflow processes a patient health information package containing three distinct document types that demonstrate the varying complexity in document processing:

1. **Structured Document**: A new patient intake form with standardized fields for personal information, medical history, and current symptoms. This form follows a consistent layout with clearly defined fields and checkboxes, making it an ideal example of a structured document.

2. **Semi-structured Document**: A health insurance card that contains essential coverage information. While insurance cards generally contain similar information (policy number, group ID, coverage dates), they come from different providers with varying layouts and formats, showing the semi-structured nature of these documents.

3. **Unstructured Document**: A handwritten doctor's note from an initial consultation, containing free-form observations, preliminary diagnoses, and treatment recommendations. This represents the most challenging category of unstructured documents, where information isn't confined to any predetermined format or structure.

The example document can be downloaded [here](docs/new-patient-registration.pdf).

## Solution Setup
1. Set up an Amazon SageMaker domain using the instructions in the [quick setup guide](https://docs.aws.amazon.com/sagemaker/latest/dg/onboard-quick-start.html)
2. Launch Studio, then create and launch a JupyterLab space using the instructions in the [documentation](https://docs.aws.amazon.com/sagemaker/latest/dg/studio-updated-jl-user-guide-create-space.html)
3. Follow the instructions in the documentation to [create a guardrail](https://docs.aws.amazon.com/bedrock/latest/userguide/guardrails-create.html). Focus on adding “Sensitive Information Filters” that mask personally identifiable information (PII) and protected health information (PHI).
4. Clone the code from the aws-samples GitHub repository
`git clone <repo-url>`
5. Change directory to the root of the cloned repository by running
`cd medical-idp`
6. Install dependencies by running
`pip install -r requirements.txt`
7. Update setup.sh with the guardrail ID you created in step 3. Then set the environment variables by running
`source setup.sh`
8. Finally, start the Streamlit application by running
`streamlit run app.py`

Now you are ready to explore the intelligent document processing workflow using Amazon Bedrock.

> ⚠️ **WARNING**: This codebase demonstrates intelligent document processing capabilities using Claude models and references medical documents as an example. Any medical or healthcare-related analysis, diagnosis, or decision-making without proper review and validation by qualified medical professionals is done at your own risk. Neither AWS nor the authors assume any liability for such use.
Empty file added medical-idp/config.py
Empty file.
Binary file added medical-idp/docs/new-patient-registration.pdf
Binary file not shown.
11 changes: 11 additions & 0 deletions medical-idp/requirements.txt
@@ -0,0 +1,11 @@
pyarrow
python-dotenv
beautifulsoup4
lxml
grobid-client-python
watchdog
streamlit-pdf-viewer
requests
PyMuPDF
Pillow
boto3
16 changes: 16 additions & 0 deletions medical-idp/setup.sh
@@ -0,0 +1,16 @@
#!/bin/bash

# Set the Guardrail ID
export GUARDRAIL_ID="<TODO: ENTER GUARDRAIL ID HERE>"

# Set the Guardrail Version
export GUARDRAIL_VERSION="DRAFT" # Change this to use a specific version of the guardrail

# Print the values to confirm
echo "GUARDRAIL_ID set to: $GUARDRAIL_ID"
echo "GUARDRAIL_VERSION set to: $GUARDRAIL_VERSION"

# Optionally, you can add more environment variables here if needed

# Note: Running this script directly won't affect your current shell session.
# You need to source it for the variables to be available in your current session.
Binary file added medical-idp/static/favicon.ico
Binary file not shown.
Binary file added medical-idp/static/flow_diagram.webp
Binary file not shown.
148 changes: 148 additions & 0 deletions medical-idp/streamlit_app.py
@@ -0,0 +1,148 @@
import os
from hashlib import blake2b
from tempfile import NamedTemporaryFile
import subprocess

import dotenv
from streamlit_pdf_viewer import pdf_viewer
from PIL import Image
dotenv.load_dotenv(override=True)
import boto3
import json
import streamlit as st

from utils.processor import FileProcessor, ToolSpec

region_name = "us-east-1"

# Create a Boto3 session
session = boto3.Session(region_name=region_name)

# Create a Bedrock client
bedrock_client = session.client("bedrock")

# Create an instance of FileProcessor
processor = FileProcessor()

title = "From Conversation to Automation"

css = '''
<style>
    [data-testid="column"] {
        overflow-y: auto;
    }
    .column-2 {
        max-height: 80vh;
        overflow-y: auto;
    }
</style>
'''


if 'tmp_file' not in st.session_state:
    st.session_state['tmp_file'] = None

if 'doc_id' not in st.session_state:
    st.session_state['doc_id'] = None

if 'button_enabled' not in st.session_state:
    st.session_state['button_enabled'] = False

if 'binary' not in st.session_state:
    st.session_state['binary'] = None

im = Image.open("static/favicon.ico")
st.set_page_config(
    page_title=title,
    page_icon=im,
    initial_sidebar_state="expanded",
    menu_items={
        'About': "A demo to showcase intelligent document processing using tools in Amazon Bedrock"
    },
    layout="wide"
)
st.markdown(css, unsafe_allow_html=True)
with st.sidebar:
    st.header("Documentation")
    st.markdown("[Amazon Bedrock](https://aws.amazon.com/bedrock/)")
    st.markdown(
        """Upload doctor's notes and see the Anthropic Claude model's multi-modal capability to extract information""")

    st.header("Inference Options")
    enable_guardrails = st.toggle('Use Guardrails', value=False, disabled=not st.session_state['button_enabled'],
                                  help="When enabled, uses a guardrail to detect and block PII in the request and response.")
    temp = st.slider(label="Temperature", min_value=0.0, max_value=1.0, step=0.1, value=0.0, help="Temperature controls the level of randomness in the model's output")
    maxTokens = st.slider(label="Max Output Tokens", min_value=50, max_value=2048, value=2000, help="Max output tokens controls the size of the model's output")

    resolution_boost = st.slider(label="Resolution boost", min_value=1, max_value=10, value=1)
    width = st.slider(label="PDF width", min_value=100, max_value=1000, value=800)



def new_file():
    st.session_state['doc_id'] = None
    st.session_state['button_enabled'] = True
    st.session_state['binary'] = None
    st.session_state['tmp_file'] = None

col1, col2 = st.columns([6, 4])

with col1:
    st.title(title)
    st.subheader("Connecting foundation models to external tools.")
    process_button = st.button("Process Document", disabled=not st.session_state['button_enabled'])
    uploaded_file = st.file_uploader("Upload a document",
                                     type=("pdf"),
                                     on_change=new_file,
                                     help="Process patient documents using generative AI")

if uploaded_file:
    if not st.session_state['binary']:
        with st.spinner('Reading file...'):
            binary = uploaded_file.getvalue()
            tmp_file = NamedTemporaryFile(suffix='.pdf', delete=False)
            tmp_file.write(bytearray(binary))
            st.session_state["tmp_file"] = tmp_file.name
            st.session_state['binary'] = binary

    with st.spinner("Rendering PDF document"):
        pdf_viewer(
            input=st.session_state['binary'],
            width=width,
            pages_vertical_spacing=10,
            resolution_boost=resolution_boost
        )

with col2:
    st.markdown('<div class="column-2">', unsafe_allow_html=True)  # Start of scrollable column
    st.subheader("Output")
    st.markdown("Output from the foundation model")

if process_button:
    if st.session_state['tmp_file']:
        placeholder = st.empty()
        with st.spinner("Processing the document..."):
            toolspecs = [ToolSpec.DOCUMENT_PROCESSING_PIPELINE]  # Always include the main DOCUMENT_PROCESSING_PIPELINE tool
            toolspecs.append(ToolSpec.DOC_NOTES)
            toolspecs.append(ToolSpec.NEW_PATIENT_INFO)
            toolspecs.append(ToolSpec.INSURANCE_FORM)

            tmp_file = st.session_state['tmp_file']

            prompt = ("1. Extract, 2. save, and 3. summarize the information from the patient information package located at " + tmp_file + ". " +
                      "The package might contain various types of documents, including insurance cards. Extract and save information from all documents provided. " +
                      "Perform any preprocessing or classification of the file provided prior to the extraction. " +
                      "Set the enable_guardrails parameter to " + str(enable_guardrails) + ". " +
                      "At the end, list all the tools that you had access to. Give an explanation of why each tool was used; if a tool was not used, explain why not. " +
                      "Think step by step.")
            processor.process_file(prompt=prompt,
                                   placeholder=placeholder,
                                   enable_guardrails=enable_guardrails,
                                   temperature=temp,
                                   maxTokens=maxTokens,
                                   toolspecs=toolspecs)


141 changes: 141 additions & 0 deletions medical-idp/tools/document_classifier.py
@@ -0,0 +1,141 @@
import json
from utils.constants import ModelIDs
from utils.bedrockutility import BedrockUtils

UNKNOWN_TYPE = "UNK"
DOCUMENT_TYPES = ["INTAKE_FORM", "INSURANCE_CARD", "DOC_NOTES", UNKNOWN_TYPE]

class DocumentClassifier:
    def __init__(self, file_handler):
        self.sonnet_3_5_bedrock_utils = BedrockUtils(model_id=ModelIDs.anthropic_claude_3_5_sonnet)
        self.sonnet_3_0_bedrock_utils = BedrockUtils(model_id=ModelIDs.anthropic_claude_3_sonnet)
        self.haiku_bedrock_utils = BedrockUtils(model_id=ModelIDs.anthropic_claude_3_haiku)
        self.meta_32_util = BedrockUtils(model_id=ModelIDs.meta_llama_32_model_id)
        self.file_handler = file_handler

    def classify_documents(self, input_data):
        """Classify documents based on their content."""
        return self.categorize_document(input_data['document_paths'])

    def categorize_document(self, file_paths):
        """
        Categorize documents based on their content.
        """
        try:
            if len(file_paths) == 1:
                # Single file handling
                binary_data, media_type = self.file_handler.get_binary_for_file(file_paths[0])
                if binary_data is None or media_type is None:
                    return []

                message_content = [
                    {"image": {"format": media_type, "source": {"bytes": data}}}
                    for data in binary_data
                ]
            else:
                # Multiple file handling
                binary_data_array = []
                for file_path in file_paths:
                    binary_data, media_type = self.file_handler.get_binary_for_file(file_path)
                    if binary_data is None or media_type is None:
                        continue
                    # Only use the first page for classification in the multiple-file case
                    binary_data_array.append((binary_data[0], media_type))

                if not binary_data_array:
                    return []

                message_content = [
                    {"image": {"format": media_type, "source": {"bytes": data}}}
                    for data, media_type in binary_data_array
                ]

            message_list = [{
                "role": 'user',
                "content": [
                    *message_content,
                    {"text": "What type of document is in each of these images?"}
                ]
            }]

            # Create system message with instructions
            data = {"file_paths": file_paths}
            files = json.dumps(data, indent=2)
            system_message = self._create_system_message(files)

            response = self.sonnet_3_0_bedrock_utils.invoke_bedrock(
                message_list=message_list,
                system_message=system_message
            )
            response_message = [response['output']['message']]
            return response_message

        except Exception as e:
            print(f"An error occurred: {str(e)}")
            return []

    def _create_system_message(self, files):
        """
        Create a system message for document classification in a doctor's consultation package.
        """
        return [{
            "text": f'''
            <task>
            You are a medical document processing agent. You have perfect vision.
            You meticulously analyze the images and categorize them based on these document types:
            <document_types>INTAKE_FORM, INSURANCE_CARD, DOC_NOTES</document_types>
            </task>
            <input_files>
            {files}
            </input_files>
            <instructions>
            1. Categorize each file into one of the document types.
            2. Use 'UNK' for unknown document types.
            3. Look at all the sections on each page and associate <topics> with them.
            4. For example, if `Patient Information` is on the page, the topics will include `PATIENT_INFO`;
               if `Medical History` is on the page, they will include `MEDICAL_HISTORY`.
               If none of the listed topics is found on the page, just return UNK.
            5. Only include the topics that are relevant to the particular file.
            6. Ensure that there is no confusion between the section number and the file path.
            7. Your output should be an array with one element per file,
               with the following attributes for each element: `category`, `file_path`, and `topics`.
            </instructions>
            <topics>
            PATIENT_INFO,
            MEDICAL_HISTORY,
            CURRENT_MEDICATIONS,
            ALLERGIES,
            VITAL_SIGNS,
            CHIEF_COMPLAINT,
            PHYSICAL_EXAMINATION,
            DIAGNOSIS,
            TREATMENT_PLAN,
            INSURANCE_DETAILS,
            UNK
            </topics>
            <important>
            Do not include any text outside the JSON object in your response.
            Your entire response should be parseable as a single JSON object.
            </important>
            <example_output>
            [
                {{
                    "category": "INTAKE_FORM",
                    "file_path": "temporary/file/path.png",
                    "topics": [
                        "PATIENT_INFO",
                        "MEDICAL_HISTORY",
                        "CURRENT_MEDICATIONS",
                        "ALLERGIES"
                    ]
                }}
            ]
            </example_output>
            '''
        }]
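Because the system prompt requires the model's entire response to be parseable JSON, a downstream caller would typically parse and sanity-check it before use. The sketch below is a hypothetical validation step (the helper name and the fallback behavior are assumptions, not code from this commit):

```python
import json

# Mirrors the categories listed in the classifier's system prompt
DOCUMENT_TYPES = {"INTAKE_FORM", "INSURANCE_CARD", "DOC_NOTES", "UNK"}

def parse_classification(response_text):
    """Parse the model's JSON output and coerce unexpected labels to 'UNK'."""
    items = json.loads(response_text)
    for item in items:
        if item.get("category") not in DOCUMENT_TYPES:
            item["category"] = "UNK"  # fall back on unexpected labels
    return items
```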
