Integrations

Run VeloFill with Ollama and Gemma 3

Spin up Gemma 3 through Ollama, point VeloFill at your localhost endpoint, and keep sensitive data inside your network.

Updated March 3, 2026 · 8 min read

Looking for more advanced features? This guide covers the simplest way to connect VeloFill to a local model. If you need to manage multiple models, enforce stricter security, or centralize logging, see our guide on routing VeloFill through LiteLLM.

Why pair VeloFill with Gemma 3

Running Gemma 3 locally gives you fast responses and keeps regulated data off public clouds. Ollama exposes the same OpenAI-compatible API that VeloFill already speaks, so with a few tweaks you can automate form filling without sending prompts over the internet.

Prerequisites

  • A machine with a GPU that has at least 6 GB of VRAM for the 4B model (12 GB for 12B, 24 GB for 27B). 16 GB of system RAM is a minimum, with 32 GB recommended.
  • Ollama installed and running on macOS, Windows, or Linux.
  • VeloFill extension installed in Chrome, Edge, or Firefox.
  • Optional but encouraged: GPU acceleration for faster inference times.

Step 1: pull Gemma 3 through Ollama

  1. Open a terminal on the machine hosting Ollama.
  2. Ensure the Ollama application is running. If it was installed as a service, it should already be active. If not, you can start it manually with ollama serve.
  3. Download the Gemma 3 model:
    ollama pull gemma3:4b
    
  4. (Optional) Test the model locally to confirm it responds:
    ollama run gemma3:4b "Summarize VeloFill in two sentences."
    

Tip: If you need more capability and have 12+ GB of VRAM, try gemma3:12b. For maximum quality with 24+ GB VRAM, use gemma3:27b. The instructions below work the same way, just swap the tag where needed.

Which Gemma 3 model should you use?

Gemma 3 is available in several sizes. Here’s how to choose:

Model        VRAM needed   Best for
gemma3:4b    6 GB          Most users; fast and efficient
gemma3:12b   12 GB         Better quality, still responsive
gemma3:27b   24 GB         Maximum quality for complex forms

Looking for Gemma 3 9B? Google’s Gemma 3 does not have a 9B variant. The Gemma 2 family included a 9B model, but Gemma 3 uses different sizes (4B, 12B, and 27B for the multimodal models). If you have 10-12 GB of VRAM and were hoping to use a 9B model, we recommend gemma3:12b; it’s the closest match and offers better performance.

Other model options

Gemma 3 isn’t your only option. Ollama supports many models suitable for form filling:

Model             Size     VRAM needed   Best for
llama3.2:3b       2 GB     4 GB          Fast, simple forms
llama3.2:latest   4.7 GB   8 GB          General purpose (recommended)
mistral:7b        4.1 GB   8 GB          Strong reasoning
qwen2.5:7b        4.7 GB   8 GB          Good multilingual support
qwen3:8b          4.8 GB   8 GB          Latest Qwen, strong performance

To use a different model, simply pull it and update the Model ID in VeloFill:

# Example: Use Llama 3.2 instead
ollama pull llama3.2:latest

Then enter llama3.2:latest as the Model ID in VeloFill.

Step 2: configure CORS for browser access

Ollama listens on http://127.0.0.1:11434 by default, but browser extensions require explicit CORS permission. You must set the OLLAMA_ORIGINS environment variable before starting Ollama:

macOS/Linux:

export OLLAMA_ORIGINS="*"
ollama serve

Windows (PowerShell):

$env:OLLAMA_ORIGINS="*"
ollama serve

Important: Restart Ollama after setting this variable. On macOS, quit the Ollama app and reopen it. On Linux with systemd, you may need to edit the service:

sudo systemctl edit ollama.service

Add:

[Service]
Environment="OLLAMA_ORIGINS=*"

Then run sudo systemctl daemon-reload && sudo systemctl restart ollama.

Warning: The Ollama API provides unrestricted access by default, and setting OLLAMA_ORIGINS="*" allows requests from any browser origin. Do not expose the Ollama port directly to the public internet, as this would allow anyone to use your model.

If you need to access Ollama from a different workstation, use a secure method like an SSH tunnel or a reverse proxy that can add an authentication layer.
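For example, an SSH local port forward lets a remote workstation reach Ollama without opening port 11434 to the wider network. The user and host names below are placeholders for your own environment:

```shell
# Forward this workstation's port 11434 to the Ollama host's loopback interface.
# "user" and "ollama-host" are placeholders for your own credentials and hostname.
ssh -N -L 11434:127.0.0.1:11434 user@ollama-host
# While the tunnel is up, VeloFill on this workstation can keep using
# http://127.0.0.1:11434/v1 as its endpoint URL.
```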

Step 3: point VeloFill at Gemma 3

  1. Open the VeloFill extension and choose Options → LLM Provider.
  2. Select OpenAI-compatible / Custom endpoint.
  3. Set the endpoint URL to http://127.0.0.1:11434/v1.
  4. Leave the API key field blank unless you put Ollama behind a proxy that enforces keys.
  5. Under Model ID, enter gemma3:4b (or the tag you pulled earlier).
  6. Save the configuration.
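Before testing in the browser, you can verify that the OpenAI-compatible endpoint responds with a quick curl request. The prompt here is just an example:

```shell
# Smoke-test the OpenAI-compatible endpoint VeloFill will use
curl -s http://127.0.0.1:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemma3:4b",
    "messages": [{"role": "user", "content": "Say hello in five words."}]
  }'
# A JSON response containing a "choices" array confirms the endpoint is live
```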

Optional: adjust model parameters

These parameters can be adjusted directly within the VeloFill extension’s LLM Provider options screen.

  • Max tokens: Gemma 3 handles up to 8K tokens comfortably; cap requests to avoid latency spikes.
  • Temperature: Start at 0.2 for deterministic autofill responses; increase gradually for more creative copy.
  • System prompt: Reinforce internal style guides or classification instructions to keep outputs on-brand.
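In request terms, the three settings above correspond to fields of the OpenAI-compatible body that reaches Ollama. A sketch of such a body follows; the values and prompts are illustrative, not VeloFill defaults:

```shell
# Print an example OpenAI-compatible request body using the parameters above.
# All values are illustrative, not VeloFill defaults.
cat <<'EOF'
{
  "model": "gemma3:4b",
  "max_tokens": 512,
  "temperature": 0.2,
  "messages": [
    {"role": "system", "content": "Answer tersely and follow the company style guide."},
    {"role": "user", "content": "Suggest a value for the Job Title field."}
  ]
}
EOF
```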

Step 4: validate the setup inside VeloFill

  1. Open a low-risk form (newsletter signup, sandbox CRM, etc.).
  2. Trigger the VeloFill autofill workflow.
  3. Watch the status panel—responses should show Gemma 3 as the active model, returning in a couple of seconds.
  4. If nothing happens, open the browser devtools console and look for network calls to /v1/chat/completions. A 200 response confirms the integration is live.

Troubleshooting checklist

  • Connection refused: Ensure ollama serve is running and that firewalls allow port 11434.
  • Model download slow: Start with the smaller gemma3:4b on lower-bandwidth connections and pull a larger tag later.
  • Latency high: Reduce prompt size, lower max tokens, or move the model to a machine with a stronger GPU.
  • Prompt privacy: Keep the Ollama host on your internal network. Avoid port forwarding directly to the public internet without authentication.
  • CORS errors: Ensure OLLAMA_ORIGINS="*" is set and Ollama was restarted after setting it.
  • Model not found: Verify the model is downloaded with ollama list. The name must match exactly, including the tag (e.g., gemma3:4b, not just gemma3).
  • Out of memory: If you see memory errors, try a smaller model (e.g., gemma3:4b instead of gemma3:12b) or close other GPU-intensive applications. Check available VRAM with nvidia-smi on Linux/Windows.
  • Model not loading: Check if the model is running with ollama ps. If it shows 0% GPU utilization, your model may be falling back to CPU, which is much slower.

Keep iterating

  • Layer VeloFill knowledge bases so Gemma 3 can reference your private SOPs.
  • Schedule periodic model refreshes when new Gemma 3 builds release through Ollama.
  • Pair with LiteLLM if you want a unified gateway for multiple local models.

Need more help? See our troubleshooting guide for additional support, or check the Ollama documentation for model-specific issues.

Related reading


How to Use VeloFill with Mistral AI

Learn how to connect VeloFill to Mistral's high-performance language models. This guide covers creating an API key on La Plateforme, configuring VeloFill, and optimizing your setup.


Need a guided walkthrough?

Our team can help you connect VeloFill to your workflows, secure API keys, and roll out best practices.
