
OpenDeepResearcher via Searxng πŸ§‘β€πŸ”¬

A robust research tool that uses AI to perform comprehensive research on any topic. It offers flexible deployment through Docker or direct Python execution as an OpenAI-compatible endpoint that works with whatever frontend you like, and it can also run fully locally with Ollama integration.

OpenDeepResearcher

A demo using the hybrid (left) and online (right) modes with Msty as the frontend; there is also a dedicated manual for the Msty settings.

πŸš€ Docker/Python Setup (Recommended)

The setup provides an OpenAI-compatible API endpoint with flexible configuration for different operation modes:

  1. Configure research.config based on your needs:
    [Settings]
    # Choose your operation mode:
    use_jina = true/false    # Use Jina API for fast web parsing
    use_ollama = true/false  # Use local Ollama models
    with_planning = true     # Enable research planning
    
    # For online mode (Maximum Speed):
    use_jina = true
    use_ollama = false
    default_model = anthropic/claude-3.5-haiku
    reason_model = deepseek/deepseek-r1-distill-qwen-32b
    
    # For hybrid mode (Balance):
    use_jina = true
    use_ollama = true
    
    # For fully local mode (Maximum Privacy):
    use_jina = false
    use_ollama = true
    default_model = mistral-small
    reason_model = deepseek-r1:14b
    
    [Concurrency]
    use_embed_browser = true/false  # Use embedded browser instead of external Chrome
    
    [API]
    openai_url = https://openrouter.ai/api/v1/chat/completions # Works with most OpenAI-compatible endpoints
    openai_compat_api_key = your-key-here  # For API authentication
    jina_api_key = your-jina-key          # Only needed if use_jina = true
    searxng_url = http://localhost:4000/search # Default for docker setup
    # OR use a reliable public instance if you don't want to set up docker:
    # searxng_url = https://searx.perennialte.ch/search

Tip

If you want to use free models like gemini-experimental-xxx from Google, or are on a lower usage tier, please set rate limits in research.config; read the troubleshooting section for more details. Follow the discussion here to set up online mode with the free Gemini API.
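
The modes above are just combinations of the use_jina and use_ollama switches. If you want to double-check which mode a given research.config selects, the file can be read back with Python's standard configparser. This is only an illustrative sketch (it assumes one value per key and inline # comments, as in the example above), not code from this repository:

# Illustrative sketch: print the operation mode implied by research.config.
import configparser

config = configparser.ConfigParser(inline_comment_prefixes=("#",))
config.read("research.config")

use_jina = config.getboolean("Settings", "use_jina", fallback=False)
use_ollama = config.getboolean("Settings", "use_ollama", fallback=False)

if use_jina and not use_ollama:
    print("Online mode (maximum speed)")
elif use_jina and use_ollama:
    print("Hybrid mode (speed/privacy balance)")
elif use_ollama:
    print("Fully local mode (maximum privacy)")
else:
    print("Custom configuration")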

  2. Setup Requirements:

    • For local models (if use_ollama = true):
      ollama pull mistral-small    # search & writing
      ollama pull deepseek-r1:14b  # reasoning & planning
    • For local web parsing (if use_jina = false):
      # Option 1: Use external Chrome (use_embed_browser = false in research.config)
      # Start Chrome in debug mode; optionally add --user-data-dir to use a profile with your online credentials (a quick connection check is sketched after this step)
      google-chrome --remote-debugging-port=9222 --remote-debugging-address=0.0.0.0 [--user-data-dir=/path/to/profile]
      
      # Option 2: Use embedded browser (use_embed_browser = true in research.config)
      # No manual Chrome setup needed; the container manages a headless browser automatically
      
      # Optional: Enhanced parsing
      ollama pull reader-lm:0.5b  # webpage parsing
      pip install docling         # PDF parsing
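
For the external Chrome option above (Option 1), you can verify that Playwright is able to attach to the debugging port before starting the researcher. This is a hypothetical standalone check (pip install playwright), not code from this repository:

# Standalone check: attach Playwright to a Chrome instance started with
# --remote-debugging-port=9222 (external-browser setup, use_embed_browser = false).
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.connect_over_cdp("http://localhost:9222")
    context = browser.contexts[0]   # default context of the running Chrome
    page = context.new_page()
    page.goto("https://example.com")
    print("Connected, page title:", page.title())
    browser.close()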
  3. Choose your deployment:

    A. Using Docker and CPU (recommended):

    cd docker
    docker compose up --build

    A.1. Using Docker and GPU

    docker compose -f docker-compose.xxx.yml up --build # xxx = cuda or rocm

    But for most users, I still suggest the CPU version for its smaller image size, as the GPU is currently only used to accelerate PDF OCR in fully local mode.

    B. Direct Python (same functionality without containerization):

    cd docker
    pip install -r requirements.txt
    python main.py  # Runs on http://localhost:8000
  4. Access points:
    • OpenAI-compatible API: http://localhost:8000
    • SearXNG (Docker setup): http://localhost:4000

  5. Usage Example:

curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deep_researcher",
    "messages": [{"role": "user", "content": "Latest developments in quantum computing"}],
    "stream": true,
    "max_iterations": 10,
    "max_search_items": 4,
    "default_model": "anthropic/claude-3.5-haiku",
    "reason_model": "deepseek/deepseek-r1-distill-qwen-32b"
  }'

Here max_iterations sets the research depth (>1), max_search_items sets the number of results per search (>1, only used when use_jina = false), and default_model / reason_model optionally override the models from research.config.
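
The same request can be made with the official openai Python client (pip install openai). The custom parameters are passed through extra_body; the api_key value below is a placeholder, since a local setup may not check it:

# Equivalent of the curl example using the openai Python client.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="placeholder")

stream = client.chat.completions.create(
    model="deep_researcher",
    messages=[{"role": "user", "content": "Latest developments in quantum computing"}],
    stream=True,
    extra_body={
        "max_iterations": 10,    # research depth
        "max_search_items": 4,   # results per search, used when use_jina = false
    },
)

for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)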

Note

For advanced settings and configurations, please check docker/README.md for detailed instructions.

πŸ–₯️ Simple Gradio Interface to Test the Server Setup (Online Mode only)

For those who prefer a graphical interface and don't want to install a 3rd-party chat client, a simple Gradio-based UI is available in the simple-webui directory.

Gradio sample usage

To use the Gradio interface:

  1. Install dependencies:
    cd simple-webui
    pip install -r requirements.txt
  2. Make sure the OpenDeepResearcher API is running, either via Docker or directly with Python.
  3. Start the interface:
    python gradio_online_mode.py
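
If you just want to see what such a frontend boils down to, below is a bare-bones sketch of a Gradio chat wired to the local endpoint. It is not the bundled gradio_online_mode.py, only a minimal illustration (pip install gradio openai); the api_key is again a placeholder:

# Minimal illustration of a Gradio chat talking to the local OpenAI-compatible API.
import gradio as gr
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="placeholder")

def respond(message, history):
    # Forward the user message to the deep_researcher model and return the report.
    resp = client.chat.completions.create(
        model="deep_researcher",
        messages=[{"role": "user", "content": message}],
    )
    return resp.choices[0].message.content

gr.ChatInterface(respond).launch()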

πŸ““ Jupyter Notebook Setup (alternative, soon to be deprecated)

If you prefer using Jupyter notebooks directly:

1. Online Mode (Maximum Speed)

2. Hybrid Mode (Speed/Privacy Balance)

3. Fully Local Mode (Maximum Privacy)

πŸ§‘β€πŸ”¬ How It Works

graph TB;
    subgraph Input
    A[User Query]
    end
    subgraph Planning
    B[Generate Research Plan]
    E[Generate Writing Plan]
    end
    subgraph Research
    C[Search Agent]
    D[Evaluate Results]
    end
    subgraph Output
    F[Final Report]
    end
    
    A --> B
    B --> C
    C --> D
    D -->|Need More| C
    D -->|Complete| E
    E --> F
    
    style A fill:#f9f,stroke:#333,stroke-width:2px
    style F fill:#ccf,stroke:#333,stroke-width:2px
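
In code terms, the diagram corresponds to a plan-search-evaluate loop followed by a writing pass. The toy sketch below only mirrors that control flow; the function bodies are trivial placeholders, not the repository's implementation:

# Toy schematic of the plan -> search -> evaluate -> write loop shown above.
# All function bodies are placeholders; only the control flow is meaningful.
from typing import List

def generate_research_plan(query: str) -> str:
    return f"Investigate '{query}' from several angles"         # CoT planning model

def search_agent(plan: str, findings: List[str]) -> List[str]:
    return [f"source {len(findings) + 1} relevant to: {plan}"]   # SearXNG + page parsing

def enough_evidence(findings: List[str]) -> bool:
    return len(findings) >= 3                                    # evaluation step

def write_report(plan: str, findings: List[str]) -> str:
    return "\n".join(["Final report", plan] + findings)          # writing model + citations

def deep_research(query: str, max_iterations: int = 10) -> str:
    plan = generate_research_plan(query)
    findings: List[str] = []
    for _ in range(max_iterations):
        findings += search_agent(plan, findings)                 # "Need more" -> keep searching
        if enough_evidence(findings):                            # "Complete" -> stop
            break
    return write_report(plan, findings)

print(deep_research("Latest developments in quantum computing"))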

βš™οΈ Core Components

  • SearXNG: Private, unbiased search
  • Content Parsing:
    • Fast: Jina API
    • Private: reader-lm + docling (local)
  • LLM Provider:
    • Fast: OpenRouter API
    • Private: Ollama (local models)

🏁 Roadmap

  • Support Ollama
  • Support Playwright and your own credentials to bypass publisher limits
  • Use Playwright and Ollama's reader-lm for 100% local service
  • Make into a docker image for easy install
  • Add a simple Gradio interface for quick testing
  • Refine process and reduce token usage via DSPy
  • Add more parsing methods with a decision agent to optimize per website extraction
  • Integrate tool calling
  • Add classifier models to fact-check sources

πŸ’‘ Troubleshooting

  • RuntimeError with asyncio: Install and apply nest_asyncio (see the snippet after this list)
  • API or Rate limit Issues:
    • Verify API keys and endpoints
    • When hitting rate limits:
      1. Set appropriate request_per_minute in research.config (-1 to disable)
      2. Configure fallback_model with a model that has:
        • Large context length (100k+ for online mode, 32k+ for local)
        • High tokens per minute limits
        • Example: google/gemini-2.0-flash-001
      3. Add operation_wait_time between iterations if needed
  • Jina URL resolve issue: Wait and retry, usually due to high load
  • Chrome/Browser: Choose between external Chrome (use_jina = false, use_embed_browser = false) or embedded browser (use_embed_browser = true)
  • SearXNG Access: For local setup, verify port 4000 is available. Alternatively, use https://searx.perennialte.ch/ or another public instance that supports JSON output (test with instance-url/search?q=test&format=json to see if it returns JSON data or 403)
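
For the asyncio issue above, the fix is two lines (pip install nest_asyncio):

import nest_asyncio

nest_asyncio.apply()  # patches the already-running event loop, e.g. inside Jupyter

To test whether a SearXNG instance allows JSON output, a quick check with requests works as well as the URL given above; a 403 means the instance blocks the JSON format. The URL below is the Docker default and is only an example:

# Quick JSON-output check for a SearXNG instance (pip install requests).
import requests

resp = requests.get(
    "http://localhost:4000/search",            # or a public instance URL
    params={"q": "test", "format": "json"},
    timeout=10,
)
print(resp.status_code)
if resp.ok:
    print(resp.json().get("results", [])[:1])  # a JSON body with results means it works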

Price prediction

  • If you use the online mode, the cost is around $0.1 to $0.5 for simple reports finished in minutes, or up to $2 for complex reports taking up to an hour (using the paid Gemini 2.0 Flash as a reference; Claude and o3-mini will be much more expensive).
  • If you use the hybrid mode, the cost is around $0.01 to $0.1 even for the most comprehensive reports. But please make sure the models have enough context length to work with; at least 32k tokens is recommended.
  • If you use the fully local mode, generation takes much longer: a report with 5 iterations and 4 search items takes around 1 hour on my RX 7800 XT.

As an example, an 8-page proceedings-style physics report that went through 573 sources in online mode took 51 minutes and €1.4 with Gemini 2.0 Flash (via OpenRouter) and Jina.

Of course, the above is if you don't count the electricity bill.


Follow the original author Matt on X for updates to the base code.

Follow this repo for academic and local use updates.

OpenDeepResearcher and OpenDeepResearcher-via-searxng are released under the MIT License. See the LICENSE file for more details.

My gratitude also goes to all the open-source software used in this project, including Ollama, SearXNG, docling, Playwright, Jina, and many more.
