A multi-agent tool that generates comprehensive company research reports. The platform uses a pipeline of AI agents to gather, curate, and synthesize information about any company.
- Multi-Source Research: Gathers data from various sources including company websites, news articles, financial reports, and industry analyses
- AI-Powered Content Filtering: Uses Tavily's relevance scoring for content curation
- Real-Time Progress Streaming: Uses WebSocket connections to stream research progress and results
- Dual Model Architecture:
  - Gemini 2.0 Flash for high-context research synthesis
  - GPT-4o-mini for precise report formatting and editing
- Modern React Frontend: Responsive UI with real-time updates, progress tracking, and download options
- Modular Architecture: Built using a pipeline of specialized research and processing nodes
The platform follows an agentic framework with specialized nodes that process data sequentially:
- Research Nodes:
  - `CompanyAnalyzer`: Researches core business information
  - `IndustryAnalyzer`: Analyzes market position and trends
  - `FinancialAnalyst`: Gathers financial metrics and performance data
  - `NewsScanner`: Collects recent news and developments
- Processing Nodes:
  - `Collector`: Aggregates research data from all analyzers
  - `Curator`: Implements content filtering and relevance scoring
  - `Briefing`: Generates category-specific summaries using Gemini 2.0 Flash
  - `Editor`: Compiles and formats the briefings into a final report using GPT-4o-mini
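The sequential flow through these nodes can be sketched as a simple pipeline runner. The `run_pipeline()` helper and the stub node functions below are illustrative assumptions, not the repository's actual implementation; the real nodes carry far richer state.

```python
from typing import Callable

State = dict  # the shared research state threaded between nodes

def run_pipeline(state: State, nodes: list[Callable[[State], State]]) -> State:
    """Run each node in order, passing the evolving research state along."""
    for node in nodes:
        state = node(state)
    return state

# Stub nodes standing in for CompanyAnalyzer, IndustryAnalyzer, Collector, etc.
def company_analyzer(state: State) -> State:
    state.setdefault("docs", []).append("company overview")
    return state

def industry_analyzer(state: State) -> State:
    state.setdefault("docs", []).append("industry trends")
    return state

def collector(state: State) -> State:
    # Aggregates what the research nodes produced
    state["collected"] = list(state.get("docs", []))
    return state

final_state = run_pipeline({}, [company_analyzer, industry_analyzer, collector])
```

Each node receives the state produced by the previous one, which is what lets later processing nodes (Curator, Briefing, Editor) build on the combined research output.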
The platform leverages separate models for optimal performance:
- Gemini 2.0 Flash (`briefing.py`):
  - Handles high-context research synthesis tasks
  - Excels at processing and summarizing large volumes of data
  - Used for generating initial category briefings
  - Efficient at maintaining context across multiple documents
- GPT-4o-mini (`editor.py`):
  - Specializes in precise formatting and editing tasks
  - Handles markdown structure and consistency
  - Superior at following exact formatting instructions
  - Used for:
    - Final report compilation
    - Content deduplication
    - Markdown formatting
    - Real-time report streaming
This approach combines Gemini's strength in handling large context windows with GPT-4o-mini's precision in following specific formatting instructions.
The platform uses a content filtering system in `curator.py`:
- Relevance Scoring:
  - Documents are scored by Tavily's AI-powered search
  - A minimum threshold (default 0.4) is required to proceed
  - Scores reflect relevance to the specific research query
  - Higher scores indicate better matches to the research intent
- Document Processing:
  - Content is normalized and cleaned
  - URLs are deduplicated and standardized
  - Documents are sorted by relevance scores
  - Real-time progress updates are sent via WebSocket
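The filtering and deduplication steps above can be sketched as follows. The document shape, field names, and `normalize_url()` rules are assumptions for illustration; only the 0.4 threshold and the score-based sort come from the description above.

```python
from urllib.parse import urlsplit, urlunsplit

MIN_SCORE = 0.4  # default relevance threshold

def normalize_url(url: str) -> str:
    """Standardize a URL so near-duplicates compare equal (assumed rules)."""
    parts = urlsplit(url.strip())
    return urlunsplit((
        parts.scheme.lower(),
        parts.netloc.lower(),
        parts.path.rstrip("/"),
        parts.query,
        "",  # drop fragments
    ))

def curate(docs: list[dict]) -> list[dict]:
    """Keep documents above the relevance threshold, dedupe by URL,
    and return them sorted by score (highest first)."""
    seen, kept = set(), []
    for doc in docs:
        if doc.get("score", 0.0) < MIN_SCORE:
            continue  # below the minimum relevance threshold
        url = normalize_url(doc["url"])
        if url in seen:
            continue  # duplicate of an already-kept document
        seen.add(url)
        kept.append(doc)
    return sorted(kept, key=lambda d: d["score"], reverse=True)
```

For example, a duplicate URL that differs only in host casing or a trailing slash would be kept once, and anything scoring below 0.4 would be dropped before briefing generation.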
The platform implements a WebSocket-based real-time communication system:
- Backend Implementation:
  - Uses FastAPI's WebSocket support
  - Maintains persistent connections per research job
  - Sends structured status updates for various events:

```python
await websocket_manager.send_status_update(
    job_id=job_id,
    status="processing",
    message=f"Generating {category} briefing",
    result={
        "step": "Briefing",
        "category": category,
        "total_docs": len(docs),
    },
)
```
- Frontend Integration:
  - React components subscribe to WebSocket updates
  - Updates are processed and displayed in real-time
  - Different UI components handle specific update types:
    - Query generation progress
    - Document curation statistics
    - Briefing completion status
    - Report generation progress
- Status Types:
  - `query_generating`: Real-time query creation updates
  - `document_kept`: Document curation progress
  - `briefing_start/complete`: Briefing generation status
  - `report_chunk`: Streaming report generation
  - `curation_complete`: Final document statistics
The easiest way to get started is using the setup script:
- Clone the repository:

```bash
git clone /~https://github.com/pogjester/tavily-company-research.git
cd tavily-company-research
```
- Make the setup script executable and run it:

```bash
chmod +x setup.sh
./setup.sh
```
The setup script will:
- Check for required Python and Node.js versions
- Optionally create a Python virtual environment (recommended)
- Install all dependencies (Python and Node.js)
- Guide you through setting up your environment variables
- Optionally start both backend and frontend servers
You'll need the following API keys ready:
- Tavily API Key
- Google Gemini API Key
- OpenAI API Key
- MongoDB URI (optional)
If you prefer to set up manually, follow these steps:
- Clone the repository:

```bash
git clone /~https://github.com/pogjester/tavily-company-research.git
cd tavily-company-research
```
- Install backend dependencies:

```bash
# Optional: create and activate a virtual environment
python -m venv .venv
source .venv/bin/activate

# Install Python dependencies
pip install -r requirements.txt
```
- Install frontend dependencies:

```bash
cd ui
npm install
```
- Create a `.env` file with your API keys:

```bash
TAVILY_API_KEY=your_tavily_key
GEMINI_API_KEY=your_gemini_key
OPENAI_API_KEY=your_openai_key

# Optional: Enable MongoDB persistence
# MONGODB_URI=your_mongodb_connection_string
```
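The backend presumably validates these variables at startup; a minimal sketch of that check is below. The `load_config()` helper and the config dict shape are illustrative assumptions, not the repository's actual code — only the variable names come from the `.env` template above.

```python
import os

def load_config() -> dict:
    """Read the API keys from the environment, failing fast if any
    required key is missing. MONGODB_URI stays optional."""
    required = ["TAVILY_API_KEY", "GEMINI_API_KEY", "OPENAI_API_KEY"]
    missing = [key for key in required if not os.getenv(key)]
    if missing:
        raise RuntimeError(f"Missing required environment variables: {missing}")
    return {
        "tavily": os.environ["TAVILY_API_KEY"],
        "gemini": os.environ["GEMINI_API_KEY"],
        "openai": os.environ["OPENAI_API_KEY"],
        "mongodb_uri": os.getenv("MONGODB_URI"),  # optional persistence
    }
```

Failing fast on missing keys surfaces configuration mistakes immediately instead of mid-research when the first API call fails.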
The application can be run using Docker and Docker Compose:
- Clone the repository:

```bash
git clone /~https://github.com/pogjester/tavily-company-research.git
cd tavily-company-research
```
- Create a `.env` file with your API keys:

```bash
TAVILY_API_KEY=your_tavily_key
GEMINI_API_KEY=your_gemini_key
OPENAI_API_KEY=your_openai_key

# Optional: Enable MongoDB persistence
# MONGODB_URI=your_mongodb_connection_string
```
- Build and start the containers:

```bash
docker compose up --build
```
This will start both the backend and frontend services:
- Backend API will be available at `http://localhost:8000`
- Frontend will be available at `http://localhost:5174`
To stop the services:

```bash
docker compose down
```
Note: When updating environment variables in `.env`, you'll need to restart the containers:

```bash
docker compose down && docker compose up
```
- Start the backend server (choose one option):

```bash
# Option 1: Direct Python module
python application.py

# Option 2: FastAPI with Uvicorn (hot reload)
pip install uvicorn  # if not already installed
uvicorn application:app --reload --port 8000
```

The backend will be available at:

- API Endpoint: `http://localhost:8000`
- WebSocket Endpoint: `ws://localhost:8000/research/ws/{job_id}`

- In a new terminal, start the frontend development server:

```bash
cd ui
npm run dev
```

- Access the application at `http://localhost:5173`
The application can be deployed to various cloud platforms. Here are some common options:
- Install the EB CLI:

```bash
pip install awsebcli
```

- Initialize the EB application:

```bash
eb init -p python-3.11 tavily-research
```

- Create and deploy:

```bash
eb create tavily-research-prod
```
- Docker: The application includes a Dockerfile for containerized deployment
- Heroku: Deploy directly from GitHub with the Python buildpack
- Google Cloud Run: Suitable for containerized deployment with automatic scaling
Choose the platform that best suits your needs. The application is platform-agnostic and can be hosted anywhere that supports Python web applications.
- Fork the repository
- Create a feature branch (`git checkout -b feature/amazing-feature`)
- Commit your changes (`git commit -m 'Add amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
- Tavily for the research API
- Google Gemini for the text generation model
- All other open-source libraries and their contributors