This repository contains a Google Cloud Function that runs TabPFN classification models to categorize transactions. It's designed to work with Google Sheets data and provides a simple API for classification tasks.
TabPFN Cloud Function is built to:
- Process transaction data from API requests
- Apply machine learning (TabPFN) to classify transactions into categories
- Return predictions via HTTP responses
- Handle authentication and rate limiting
- Support Google Cloud Storage for model storage and retrieval
┌─────────────────┐     ┌───────────────┐     ┌───────────────────┐     ┌───────────────┐
│  Google Sheets  │────▶│  Apps Script  │────▶│  Cloud Function   │────▶│ Cloud Storage │
└─────────────────┘     └───────────────┘     └───────────────────┘     └───────────────┘
                                                        │
                                                        ▼
                                              ┌───────────────────┐
                                              │   TabPFN Client   │
                                              └───────────────────┘
cloud_function@google_apps_script/
├── .env.example.yaml # Example environment configuration
├── .env.prod # Production environment variables
├── .env.test # Test environment variables
├── .env.yaml # Current environment configuration
├── .gcloudignore # Files to exclude from deployment
├── .gitattributes # Git attributes configuration
├── .gitignore # Git ignore file
├── Code.gs # Google Apps Script integration
├── README.md # This documentation
├── cloudbuild.example.yaml # Example Cloud Build configuration
├── cloudbuild.yaml # Cloud Build configuration
├── deploy.ps1 # PowerShell deployment script
├── get_token.py # API token management utility
├── main.py # Main Cloud Function entrypoint
├── predictor.py # Transaction prediction logic
├── preprocessing.py # Data preprocessing utilities
├── requirements.txt # Python dependencies
├── models/ # Model files directory
│   ├── .gitkeep # Placeholder for git
│   └── tabpfn-client/ # TabPFN model directory
│       ├── .gitkeep # Placeholder for git
│       ├── tabpfn_model.pkl # Main TabPFN model
│       └── transformers.pkl # Model transformers
- Google Sheets: End-user interface where transactions are stored and categorized
- Apps Script (Code.gs): Google Apps Script that creates a custom menu and handles communication with the Cloud Function
- Cloud Function (main.py): HTTP endpoint that receives transaction data and returns predictions
- Cloud Storage: Stores TabPFN model files for the Cloud Function to access
- TabPFN Client (predictor.py): Core ML component that categorizes transactions using the TabPFN model
- Python 3.10
- Google Cloud Platform account
- TabPFN API token (from TabPFN)
- Google Cloud Storage bucket (for model storage)
- Clone this repository:
  git clone https://github.com/belalanne/tabpfn-cloud-function.git
  cd tabpfn-cloud-function
- Create and activate a virtual environment:
  python -m venv venv
  venv\Scripts\activate # Windows
  source venv/bin/activate # Linux/Mac
- Install dependencies:
  pip install -r requirements.txt
- Set up environment variables by creating a .env file:
  USE_GCS=false
  USE_MOCK=true
  TABPFN_API_TOKEN=your_api_token_here
  GCS_BUCKET=your_gcs_bucket_name
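As a rough illustration of how these settings might be read, the sketch below parses `KEY=value` lines and converts the `USE_GCS` and `USE_MOCK` flags to booleans. The helper names `parse_env_lines` and `as_bool` are hypothetical and not part of this repository; the actual loading logic in `main.py` may differ.

```python
def parse_env_lines(lines):
    """Parse KEY=value lines, skipping blanks and # comments."""
    settings = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    return settings


def as_bool(value):
    """Interpret the string flags used in .env ("true"/"false")."""
    return str(value).strip().lower() in ("true", "1", "yes")


# The four settings listed above, as they would appear in the .env file.
example = [
    "USE_GCS=false",
    "USE_MOCK=true",
    "TABPFN_API_TOKEN=your_api_token_here",
    "GCS_BUCKET=your_gcs_bucket_name",
]
settings = parse_env_lines(example)
use_gcs = as_bool(settings["USE_GCS"])
use_mock = as_bool(settings["USE_MOCK"])
```

With these example values, mock predictions are enabled and GCS is disabled, which suits local testing without model files or a real token.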
- Create a GCP project and enable required APIs:
  - Cloud Functions API
  - Cloud Build API
  - Cloud Storage API
- Set up environment variables:
  - Create a .env.yaml file based on the provided template
  - Replace the placeholder values with your actual settings
- Deploy using gcloud:
  gcloud functions deploy infer-category \
    --gen2 \
    --region=your-region \
    --runtime=python310 \
    --source=. \
    --entry-point=infer_category \
    --trigger-http \
    --memory=2048MB \
    --timeout=540s \
    --env-vars-file=.env.yaml
- Or use the provided deployment script:
  ./deploy.ps1
Once deployed, your function will be available at:
https://your-region-your-project.cloudfunctions.net/infer-category
Send a POST request with the following JSON structure:
{
  "transactions": [
    {
      "date": "2023-04-15",
      "description": "PAYMENT *GROCERY STORE",
      "amount": -45.67,
      "account": "Checking"
    },
    {
      "date": "2023-04-16",
      "description": "DIRECT DEPOSIT SALARY",
      "amount": 1200.00,
      "account": "Savings"
    }
  ]
}
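A minimal sketch of building this request from Python with only the standard library follows. The helper `build_request` is hypothetical, the URL is the placeholder used in this README, and authentication headers are omitted because the required scheme is not described here.

```python
import json
import urllib.request

# Placeholder URL from this README; replace with your deployed endpoint.
CLOUD_FUNCTION_URL = "https://your-region-your-project.cloudfunctions.net/infer-category"


def build_request(transactions, url=CLOUD_FUNCTION_URL):
    """Wrap transactions in the documented payload and build a POST request."""
    body = json.dumps({"transactions": transactions}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_request([
    {"date": "2023-04-15", "description": "PAYMENT *GROCERY STORE",
     "amount": -45.67, "account": "Checking"},
])

# Sending is left commented out because the URL above is a placeholder:
# with urllib.request.urlopen(request, timeout=60) as response:
#     result = json.loads(response.read())
```

Keeping request construction separate from sending makes the payload easy to inspect before any network call is made.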
The function returns:
{
  "success": true,
  "results": [
    {
      "date": "2023-04-15",
      "description": "PAYMENT *GROCERY STORE",
      "amount": -45.67,
      "account": "Checking",
      "category": "Groceries",
      "confidence": 0.89
    },
    {
      "date": "2023-04-16",
      "description": "DIRECT DEPOSIT SALARY",
      "amount": 1200.00,
      "account": "Savings",
      "category": "Income",
      "confidence": 0.95
    }
  ],
  "request_id": "20230415_123456_789",
  "mode": "tabpfn"
}
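One way a caller might consume this response is to check the `success` flag and pick out predictions that deserve a manual review. The sketch below assumes the response shape shown above; the function name and the 0.9 threshold are hypothetical choices, not values defined by this project.

```python
def low_confidence_results(response, threshold=0.9):
    """Return results whose confidence falls below the (hypothetical) threshold."""
    if not response.get("success"):
        raise ValueError("Cloud Function reported a failure")
    return [r for r in response.get("results", []) if r["confidence"] < threshold]


# Trimmed copy of the example response above.
response = {
    "success": True,
    "results": [
        {"description": "PAYMENT *GROCERY STORE", "category": "Groceries", "confidence": 0.89},
        {"description": "DIRECT DEPOSIT SALARY", "category": "Income", "confidence": 0.95},
    ],
    "request_id": "20230415_123456_789",
    "mode": "tabpfn",
}

flagged = low_confidence_results(response)
for item in flagged:
    print(f"Review: {item['description']} -> {item['category']} ({item['confidence']:.2f})")
```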
This function integrates with Google Sheets through the provided Apps Script. A full implementation is available in the Code.gs file included in this repository.
- In your Google Sheet, go to Extensions > Apps Script
- Create a new script project
- Copy the contents of the Code.gs file from this repository into your script editor
- Update the CLOUD_FUNCTION_URL variable at the top of the script with your deployed function URL:
  const CLOUD_FUNCTION_URL = 'https://your-region-your-project.cloudfunctions.net/infer-category';
- Save the script and reload your Google Sheet
Once set up, the script provides:
- A new "Transaction Categories" menu in your Google Sheet
- Automatic detection of transaction columns
- Batch processing to handle large transaction sets
- Color-coded confidence scores
- Error handling and reporting
- API usage monitoring
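The batch processing mentioned above happens in the Apps Script (JavaScript) in this repository. The underlying idea is shown here as a small Python sketch, where `batched` and the batch size of 3 are hypothetical and chosen only for illustration: a large list of transactions is split into consecutive chunks so that each request stays small.

```python
def batched(items, batch_size):
    """Yield consecutive slices of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


# Seven dummy transactions split into batches of three.
transactions = [{"description": f"TX {i}", "amount": -float(i)} for i in range(7)]
batches = list(batched(transactions, batch_size=3))
```

Each batch could then be sent as its own `transactions` payload, with the results concatenated in the original order.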
To use:
- Create a sheet with columns for dateOp, transaction_description, and amount
- Add your transaction data
- Select "Transaction Categories" > "Predict Categories" from the menu
- View the results in the automatically created columns
The Google Sheets integration includes:
- Batch processing for large datasets
- API usage tracking and limits display
- Detailed error reporting
- Conditional formatting for confidence scores
- JSON response viewing for debugging
The TabPFN Cloud Function relies on model files that must be accessible to the function at runtime. There are two approaches: store the files in Google Cloud Storage, or bundle them with the deployment.
- Create a Google Cloud Storage bucket:
  gsutil mb -l LOCATION gs://YOUR_BUCKET_NAME
- Upload the TabPFN model files to your bucket:
  gsutil cp models/tabpfn-client/*.pkl gs://YOUR_BUCKET_NAME/models/tabpfn-client/
- Configure your .env.yaml to use GCS:
  GCS_BUCKET: "your-bucket-name"
  USE_GCS: "true"
- Make sure your Cloud Function has permission to access the GCS bucket (using appropriate IAM roles)
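For orientation, fetching the uploaded files at runtime could look roughly like the sketch below, which uses the `google-cloud-storage` client. This is an assumption about how such a download might be written, not the code in `predictor.py`; the helper names and the `/tmp/models` destination are hypothetical, and the object names simply mirror the upload command above.

```python
import os

# Object layout produced by the gsutil cp command above.
MODEL_PREFIX = "models/tabpfn-client/"
MODEL_FILES = ("tabpfn_model.pkl", "transformers.pkl")


def blob_names(prefix=MODEL_PREFIX, files=MODEL_FILES):
    """Return the GCS object names of the model files."""
    return [prefix + name for name in files]


def download_models(bucket_name, dest_dir="/tmp/models"):
    """Download the model files from GCS into dest_dir and return local paths."""
    # Imported lazily so this module loads even without the GCS client installed.
    from google.cloud import storage

    os.makedirs(dest_dir, exist_ok=True)
    bucket = storage.Client().bucket(bucket_name)
    local_paths = []
    for name in blob_names():
        local_path = os.path.join(dest_dir, os.path.basename(name))
        bucket.blob(name).download_to_filename(local_path)
        local_paths.append(local_path)
    return local_paths
```

Calling `download_models("your-bucket-name")` requires credentials with read access to the bucket, which is what the IAM step above provides.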
For smaller models or testing purposes, you can include the model files directly in your deployment:
- Place the model files in the models/tabpfn-client/ directory
- Configure your .env.yaml to not use GCS:
  USE_GCS: "false"
- Deploy your function normally
Note: This approach can increase cold start times and may not be suitable for large models.
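Before deploying with bundled models, it can help to confirm that both files are actually present in the directory. The check below is a hypothetical pre-deployment helper, not part of this repository; the file names are taken from the project structure listed earlier.

```python
import tempfile
from pathlib import Path

REQUIRED_FILES = ("tabpfn_model.pkl", "transformers.pkl")


def missing_model_files(model_dir="models/tabpfn-client"):
    """Return the required model files that are absent from model_dir."""
    base = Path(model_dir)
    return [name for name in REQUIRED_FILES if not (base / name).is_file()]


# Demonstration against a temporary directory holding only one of the files.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "tabpfn_model.pkl").touch()
    missing = missing_model_files(tmp)
```

An empty list from `missing_model_files()` means both files are in place.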
Configure these environment variables for deployment:
Variable | Description | Example |
---|---|---|
GCS_BUCKET | Google Cloud Storage bucket name | my-models-bucket |
USE_GCS | Whether to use GCS for model storage | true or false |
USE_MOCK | Use mock predictions for testing | true or false |
TABPFN_API_TOKEN | API token for TabPFN | your_api_token |
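A quick consistency check over these variables can catch misconfiguration before deployment. The sketch below is hypothetical and rests on two assumptions that this README implies but does not state outright: `GCS_BUCKET` is only needed when `USE_GCS` is true, and `TABPFN_API_TOKEN` is only needed when `USE_MOCK` is false.

```python
def validate_config(env):
    """Return a list of problems found in the deployment settings."""
    problems = []
    flags = {}
    for name in ("USE_GCS", "USE_MOCK"):
        value = str(env.get(name, "")).lower()
        if value not in ("true", "false"):
            problems.append(f"{name} must be 'true' or 'false'")
        flags[name] = value == "true"
    # Assumption: a bucket is only needed when GCS storage is enabled.
    if flags["USE_GCS"] and not env.get("GCS_BUCKET"):
        problems.append("GCS_BUCKET is required when USE_GCS is true")
    # Assumption: a real token is only needed when mock predictions are off.
    if not flags["USE_MOCK"] and not env.get("TABPFN_API_TOKEN"):
        problems.append("TABPFN_API_TOKEN is required when USE_MOCK is false")
    return problems


ok = validate_config({"USE_GCS": "false", "USE_MOCK": "true"})
bad = validate_config({"USE_GCS": "true", "USE_MOCK": "false"})
```

Passing `os.environ` instead of a literal dictionary would run the same check against the current process environment.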
- TabPFN for the underlying classification technology
- Google Cloud Platform for serverless infrastructure