
LangChain Model Switching: Use OpenAI, Claude, Gemini, and Ollama in One App

Intermediate · 60 min · 2 exercises · 30 XP

You've built an app on GPT-4o. It works great — until OpenAI has an outage and your users see nothing but error messages. Or your costs spike because GPT-4o is overkill for simple summarisation tasks. Or your enterprise client demands that sensitive data never leaves their network, so you need a local model.

The fix isn't rewriting your app for each provider. It's writing your app once and swapping the model with a single config change. That's exactly what LangChain's model abstraction gives you, and by the end of this tutorial, you'll have a production-ready model selector that handles four providers, automatic fallbacks, and rate-limit retries.

Why Model Switching Matters

I've shipped three different LLM-powered products, and every single one ended up needing multiple models. Not because I planned it that way, but because reality forced it. Here's what happens in practice:

Cost control — GPT-4o costs roughly 15x more than GPT-4o-mini per token. If 70% of your requests are simple classification tasks, you're burning money on a vastly overqualified model. Routing those tasks to a cheaper model cuts your bill dramatically.

Reliability — Every provider has outages. OpenAI, Anthropic, Google — they all go down. If your app depends on a single provider, your app goes down too. A fallback chain that tries Provider B when Provider A fails is table stakes for production.

Privacy and compliance — Some clients require that data never leaves their infrastructure. Ollama running a local Llama model satisfies that constraint. Your cloud-hosted app won't.

Model strengths — Claude excels at long-context document analysis. GPT-4o is strong at structured output. Gemini handles multimodal inputs natively. Matching the task to the model's strengths gives better results.

Setting Up Four Providers

Before we build the switcher, let's get each provider working individually. Each one requires its own package and API key (except Ollama, which runs locally).

Prerequisites

Python version: 3.10+

Required packages: langchain (0.3+), langchain-openai, langchain-anthropic, langchain-google-genai, langchain-ollama

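All four provider packages install from PyPI. A typical install looks like this (unpinned here; pin exact versions for production):

```shell
pip install -U langchain langchain-openai langchain-anthropic \
    langchain-google-genai langchain-ollama
```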

API keys: You need keys for the cloud providers. Set them as environment variables — never hardcode them in your scripts.

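On macOS or Linux, export the keys in your shell (the values below are placeholders; add the lines to your shell profile to persist them):

```shell
export OPENAI_API_KEY="sk-..."
export ANTHROPIC_API_KEY="sk-ant-..."
export GOOGLE_API_KEY="AIza..."
# Ollama needs no key -- it talks to a local server instead
```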

Initialising Each Model

Each provider has its own LangChain class, but they all share the same interface. Watch how the invoke() call is identical across all four — that's the abstraction at work.

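A sketch of initialising one model per provider. The model names are examples; substitute whatever your accounts (and your local Ollama install) actually have access to:

```python
from langchain_openai import ChatOpenAI
from langchain_anthropic import ChatAnthropic
from langchain_google_genai import ChatGoogleGenerativeAI
from langchain_ollama import ChatOllama

# Four providers, one constructor pattern. Model names are examples.
gpt = ChatOpenAI(model="gpt-4o-mini", temperature=0)
claude = ChatAnthropic(model="claude-sonnet-4-20250514", temperature=0)
gemini = ChatGoogleGenerativeAI(model="gemini-2.0-flash", temperature=0)
llama = ChatOllama(model="llama3.1", temperature=0)  # needs `ollama serve` running
```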

Every one of these objects is a BaseChatModel. That means they all respond to .invoke(), .stream(), .batch(), and .ainvoke() with the same signature. Let's prove it:

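To see the shared interface in action, one loop can drive all four models with an identical call (this assumes the API keys above are set and an Ollama server is running locally):

```python
from langchain_core.messages import HumanMessage
from langchain_openai import ChatOpenAI
from langchain_anthropic import ChatAnthropic
from langchain_google_genai import ChatGoogleGenerativeAI
from langchain_ollama import ChatOllama

models = {
    "openai": ChatOpenAI(model="gpt-4o-mini"),
    "anthropic": ChatAnthropic(model="claude-sonnet-4-20250514"),
    "google": ChatGoogleGenerativeAI(model="gemini-2.0-flash"),
    "ollama": ChatOllama(model="llama3.1"),
}
question = [HumanMessage(content="What is 2 + 2? Answer with one word.")]

for name, model in models.items():
    response = model.invoke(question)  # identical call for every provider
    print(f"{name}: {response.content}")
```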

Each provider returns an AIMessage with a .content attribute. The response format is identical regardless of which model generated it. This is the foundation everything else builds on.

Building a Config-Driven Model Selector

Hardcoding ChatOpenAI(model="gpt-4o-mini") everywhere in your app is the same mistake as hardcoding database connection strings. When you need to change the model, you're hunting through files. A config-driven selector centralises that decision into one place.

The idea is straightforward: define a function that takes a provider name and model name, then returns the right LangChain model object. Your application code never imports provider-specific classes directly — it calls the selector.

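A minimal selector might look like this. Deferring the imports inside each branch is a deliberate choice: only the provider you actually select needs its package installed.

```python
def get_model(provider: str, model: str, **kwargs):
    """Return a LangChain chat model for the given provider and model name.

    Imports are deferred so only the selected provider's package is required.
    """
    if provider == "openai":
        from langchain_openai import ChatOpenAI
        return ChatOpenAI(model=model, **kwargs)
    if provider == "anthropic":
        from langchain_anthropic import ChatAnthropic
        return ChatAnthropic(model=model, **kwargs)
    if provider == "google":
        from langchain_google_genai import ChatGoogleGenerativeAI
        return ChatGoogleGenerativeAI(model=model, **kwargs)
    if provider == "ollama":
        from langchain_ollama import ChatOllama
        return ChatOllama(model=model, **kwargs)
    raise ValueError(f"Unknown provider: {provider!r}")
```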

Now your app code looks like this — and switching models means changing two strings, not rewriting imports and constructor calls:

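Application code then only ever touches the selector. This sketch assumes the selector above is saved as model_selector.py (a hypothetical filename):

```python
from langchain_core.messages import HumanMessage

from model_selector import get_model  # the selector defined above

model = get_model("openai", "gpt-4o-mini", temperature=0)
messages = [HumanMessage(content="Summarise LangChain in one sentence.")]
response = model.invoke(messages)
print(response.content)
```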

To switch from OpenAI to Claude, change get_model("openai", "gpt-4o-mini") to get_model("anthropic", "claude-sonnet-4-20250514"). Nothing else changes. Not the message format, not the invoke call, not the response parsing.

Config File Approach

For real applications, I prefer pulling the model choice from a config file rather than hardcoding it in Python. A YAML or JSON config separates deployment decisions from code, so ops teams can switch models without touching your source.

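A sketch of the YAML approach, using an inline string so it runs as-is; in a real project you would load config.yaml from disk instead:

```python
import yaml

# In practice: cfg = yaml.safe_load(open("config.yaml"))["model"]
CONFIG = """
model:
  provider: anthropic
  name: claude-sonnet-4-20250514
  temperature: 0.2
"""

cfg = yaml.safe_load(CONFIG)["model"]
provider, name = cfg["provider"], cfg["name"]
print(f"Configured model: {provider}/{name}")
# Pass provider and name to get_model() -- ops edit YAML, not Python
```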
Exercise 1: Build a Model Selector Function

Write a function called select_model that takes a provider string and returns the corresponding model class name as a string. Support three providers: "openai" should return "ChatOpenAI", "anthropic" should return "ChatAnthropic", and "google" should return "ChatGoogleGenerativeAI". If the provider is not recognised, raise a ValueError with the message Unknown provider: '<name>' (where <name> is the provider that was passed in).


Fallback Chains — Automatic Provider Failover

A model selector is great for manual switching, but what happens when OpenAI returns a 500 error at 2 AM? You need automatic failover — try the primary model, and if it fails, fall back to an alternative without any human intervention.

LangChain has a built-in .with_fallbacks() method for exactly this. You chain models together, and LangChain tries them in order until one succeeds.

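A sketch of a three-provider chain (the model names are examples):

```python
from langchain_openai import ChatOpenAI
from langchain_anthropic import ChatAnthropic
from langchain_google_genai import ChatGoogleGenerativeAI

primary = ChatOpenAI(model="gpt-4o-mini")
model_with_fallbacks = primary.with_fallbacks([
    ChatAnthropic(model="claude-sonnet-4-20250514"),   # tried if OpenAI fails
    ChatGoogleGenerativeAI(model="gemini-2.0-flash"),  # tried if Claude also fails
])

# Same interface as a plain model -- callers never know a fallback fired
response = model_with_fallbacks.invoke("Name one benefit of model fallbacks.")
print(response.content)
```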

The model_with_fallbacks object behaves exactly like a regular model. You call .invoke() on it the same way. If the primary model (GPT-4o-mini) raises an exception, LangChain catches it and tries Claude. If Claude also fails, it tries Gemini. If all three fail, it raises the last exception.


This is genuinely one of the most useful patterns in production LLM apps. I've seen systems where the fallback chain saved entire deployments during provider outages. The caller has no idea which model actually answered — it just works.

Testing Fallback Behaviour

You'll want to verify that fallbacks actually trigger. The simplest way is to intentionally misconfigure the primary model so it fails, then check that the fallback responds.

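One way to force the primary to fail is an obviously invalid API key with retries disabled. A sketch:

```python
from langchain_openai import ChatOpenAI
from langchain_anthropic import ChatAnthropic

# Deliberately broken primary: bogus key, no retries, so it fails fast
broken_primary = ChatOpenAI(model="gpt-4o-mini", api_key="sk-invalid", max_retries=0)
fallback = ChatAnthropic(model="claude-sonnet-4-20250514")

model = broken_primary.with_fallbacks([fallback])
response = model.invoke("Which provider answered this?")
# If you get a response at all, the fallback fired -- the primary cannot succeed
print(response.content)
```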
Exercise 2: Implement a Fallback Chain

Write a function called create_fallback_chain that simulates a fallback chain. It takes two arguments: providers, a list of provider names (strings) in priority order, and failed_providers, a set of providers that are currently failing. Return the first provider in the list that is NOT in failed_providers. If every provider has failed, return "ALL_FAILED".


Handling Rate Limits Gracefully

Rate limits are the most common failure mode in LLM applications. Hit your provider's requests-per-minute ceiling, and you get a 429 error. Without handling, your app crashes. With proper handling, it waits and retries automatically.

LangChain models support a max_retries parameter out of the box. When the provider returns a rate-limit error, LangChain waits with exponential backoff and retries.


For heavier workloads — batch processing thousands of documents, for instance — you might hit rate limits repeatedly. Combining retries with fallbacks gives you a robust pipeline: retry the primary model a few times, and if it still can't get through, switch to a different provider entirely.

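A sketch combining per-provider retries with cross-provider failover (retry counts and timeouts here are illustrative):

```python
from langchain_openai import ChatOpenAI
from langchain_anthropic import ChatAnthropic

# Retry each provider a few times before giving up on it entirely
primary = ChatOpenAI(model="gpt-4o-mini", max_retries=3, timeout=30)
backup = ChatAnthropic(model="claude-sonnet-4-20250514", max_retries=3, timeout=30)

# Only after the primary exhausts its retries does the chain move to the backup
robust_model = primary.with_fallbacks([backup])
response = robust_model.invoke("Classify this sentence as positive or negative: great!")
print(response.content)
```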

Model-Based Routing — Matching Tasks to Models

Not every request deserves your most expensive model. A simple yes/no classification doesn't need GPT-4o. A complex code generation task doesn't belong on a tiny local model. Routing requests to the right model based on task type is where you see real cost savings.

The pattern is a routing function that examines the request and picks the appropriate model. This is different from the fallback chain — fallbacks handle failures, routing handles intent.

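A minimal routing table and lookup function; the task names and model choices here are illustrative, not prescriptive:

```python
# Routing table: cheap models for cheap work, strong models where it counts
ROUTES = {
    "classify":   ("openai",    "gpt-4o-mini"),
    "long_doc":   ("anthropic", "claude-sonnet-4-20250514"),
    "multimodal": ("google",    "gemini-2.0-flash"),
    "private":    ("ollama",    "llama3.1"),
}

def route(task_type: str) -> tuple[str, str]:
    """Return (provider, model name) for a task type, defaulting to the cheap model."""
    try:
        return ROUTES[task_type]
    except KeyError:
        return ROUTES["classify"]  # cheap, safe default for unknown tasks
```

Pair `route()` with the `get_model()` selector from earlier and the business logic never names a model directly.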

Now your application code declares what it needs, not which model to use. The routing table — separate from your business logic — makes the cost/quality tradeoff explicit and easy to adjust.


Putting It All Together — Production Model Manager

Let's combine the config-driven selector, fallback chains, retries, and routing into a single class that you can drop into any project. This is the pattern I use in production — a ModelManager that handles all the complexity so the rest of the app doesn't have to think about it.

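A sketch of such a manager. It caches clients so each one is constructed once, defers provider imports, and attaches fallbacks per task:

```python
class ModelManager:
    """Routes tasks to models, caches clients, and wires in fallbacks.

    routes maps task type -> (provider, model name); fallback_order lists
    (provider, model name) pairs to try when the primary fails.
    """

    def __init__(self, routes, fallback_order=()):
        self.routes = dict(routes)
        self.fallback_order = list(fallback_order)
        self._cache = {}  # (provider, model) -> client; built once, reused

    def _build(self, provider, model):
        key = (provider, model)
        if key in self._cache:
            return self._cache[key]
        # Imports are deferred so only installed providers are required
        if provider == "openai":
            from langchain_openai import ChatOpenAI
            client = ChatOpenAI(model=model, max_retries=2)
        elif provider == "anthropic":
            from langchain_anthropic import ChatAnthropic
            client = ChatAnthropic(model=model, max_retries=2)
        elif provider == "google":
            from langchain_google_genai import ChatGoogleGenerativeAI
            client = ChatGoogleGenerativeAI(model=model, max_retries=2)
        elif provider == "ollama":
            from langchain_ollama import ChatOllama
            client = ChatOllama(model=model)
        else:
            raise ValueError(f"Unknown provider: {provider!r}")
        self._cache[key] = client
        return client

    def for_task(self, task_type):
        """Return a ready-to-invoke model for this task, fallbacks attached."""
        provider, model = self.routes[task_type]
        primary = self._build(provider, model)
        fallbacks = [self._build(p, m) for p, m in self.fallback_order
                     if (p, m) != (provider, model)]
        return primary.with_fallbacks(fallbacks) if fallbacks else primary
```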

Using the ModelManager is clean and explicit. The application code stays model-agnostic while the manager handles all the provider-specific details.


Common Mistakes and How to Fix Them

Mistake 1: Wrong Parameter Names Across Providers

The LangChain abstraction normalises the interface, but constructor parameters still differ slightly between providers. This catches people who assume all providers accept the same kwargs.

Wrong — using OpenAI parameters on Anthropic
# This will error — Anthropic uses "timeout", not "request_timeout"
model = ChatAnthropic(
    model="claude-sonnet-4-20250514",
    request_timeout=30,  # OpenAI parameter name
)
Correct — use the right parameter name for each provider
from langchain_openai import ChatOpenAI
from langchain_anthropic import ChatAnthropic

# OpenAI uses request_timeout
openai_model = ChatOpenAI(model="gpt-4o-mini", request_timeout=30)

# Anthropic uses timeout
claude_model = ChatAnthropic(model="claude-sonnet-4-20250514", timeout=30)

Mistake 2: Creating Models Inside Loops

Each model constructor creates a new HTTP client and validates the API key. Creating a model inside a loop means you spin up a new client for every iteration — slow and wasteful.

Wrong — new model object per iteration
questions = ["Q1?", "Q2?", "Q3?"]
for q in questions:
    model = ChatOpenAI(model="gpt-4o-mini")  # New client each time!
    response = model.invoke([HumanMessage(content=q)])
    print(response.content)
Correct — create once, reuse
from langchain_openai import ChatOpenAI
from langchain_core.messages import HumanMessage

model = ChatOpenAI(model="gpt-4o-mini")  # Create once
questions = ["Q1?", "Q2?", "Q3?"]
for q in questions:
    response = model.invoke([HumanMessage(content=q)])
    print(response.content)

Mistake 3: No Fallbacks in Production

If your app uses a single provider with no fallback, you're one outage away from a production incident. Every LLM provider has downtime — building without fallbacks is like running a web server without health checks.

Risky — single point of failure
# If OpenAI goes down, your entire app goes down
model = ChatOpenAI(model="gpt-4o-mini")
response = model.invoke(messages)
Resilient — automatic failover
from langchain_openai import ChatOpenAI
from langchain_anthropic import ChatAnthropic

primary = ChatOpenAI(model="gpt-4o-mini", max_retries=2)
fallback = ChatAnthropic(model="claude-sonnet-4-20250514", max_retries=2)
model = primary.with_fallbacks([fallback])
response = model.invoke(messages)

Mistake 4: Forgetting That Ollama Requires a Running Server

Unlike cloud providers where an API key is enough, Ollama requires a local server process. If you try to invoke an Ollama model without ollama serve running, you'll get a connection error.

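A quick preflight check, using only the standard library, that the server is actually reachable before you construct a ChatOllama instance (Ollama's default address is localhost:11434):

```python
import urllib.request
import urllib.error

def ollama_running(base_url: str = "http://localhost:11434") -> bool:
    """Return True if an Ollama server answers at base_url."""
    try:
        with urllib.request.urlopen(base_url, timeout=2) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False

if ollama_running():
    print("Ollama is up -- ChatOllama calls will work")
else:
    print("No Ollama server found. Run `ollama serve` (and `ollama pull llama3.1`) first.")
```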

Provider Comparison — Quick Reference

When deciding which provider to use for which task, this table summarises the key tradeoffs. Pricing changes frequently, so check each provider's pricing page for current rates.

| Feature | OpenAI (GPT-4o-mini) | Anthropic (Claude Sonnet) | Google (Gemini Flash) | Ollama (Local) |
| --- | --- | --- | --- | --- |
| Setup | API key | API key | API key | Local install |
| Latency | Low | Low | Low | Depends on hardware |
| Max context | 128K tokens | 200K tokens | 1M tokens | Model-dependent |
| Strengths | Structured output, function calling | Long docs, careful reasoning | Multimodal, large context | Privacy, no API costs |
| LangChain class | ChatOpenAI | ChatAnthropic | ChatGoogleGenerativeAI | ChatOllama |
| Package | langchain-openai | langchain-anthropic | langchain-google-genai | langchain-ollama |

Complete Code

Here's the full ModelManager class with the routing function and fallback support in a single script. Copy this into a model_manager.py file and import it across your project.

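A consolidated sketch of the pieces above. Provider and model names are examples, and the dynamic import keeps the module loadable even when only some provider packages are installed:

```python
"""model_manager.py -- config-driven model selection with routing and fallbacks."""

_PROVIDERS = {
    "openai": ("langchain_openai", "ChatOpenAI"),
    "anthropic": ("langchain_anthropic", "ChatAnthropic"),
    "google": ("langchain_google_genai", "ChatGoogleGenerativeAI"),
    "ollama": ("langchain_ollama", "ChatOllama"),
}

def get_model(provider: str, model: str, **kwargs):
    """Instantiate a chat model for any supported provider (lazy imports)."""
    try:
        module_name, class_name = _PROVIDERS[provider]
    except KeyError:
        raise ValueError(f"Unknown provider: {provider!r}") from None
    module = __import__(module_name, fromlist=[class_name])
    return getattr(module, class_name)(model=model, **kwargs)

# Task routing table: cheap models for cheap work, strong models where it counts
ROUTES = {
    "classify": ("openai", "gpt-4o-mini"),
    "long_doc": ("anthropic", "claude-sonnet-4-20250514"),
    "multimodal": ("google", "gemini-2.0-flash"),
    "private": ("ollama", "llama3.1"),
}

class ModelManager:
    """Resolves task types to models, with cached clients and fallbacks."""

    def __init__(self, routes=ROUTES, fallback_order=()):
        self.routes = dict(routes)
        self.fallback_order = list(fallback_order)
        self._cache = {}  # (provider, model) -> client

    def _get(self, provider, model):
        key = (provider, model)
        if key not in self._cache:
            # Ollama has no rate limits to retry against; cloud providers do
            extra = {} if provider == "ollama" else {"max_retries": 2}
            self._cache[key] = get_model(provider, model, **extra)
        return self._cache[key]

    def for_task(self, task_type):
        """Return a ready-to-invoke model for this task, fallbacks attached."""
        provider, model = self.routes[task_type]
        primary = self._get(provider, model)
        fallbacks = [self._get(p, m) for p, m in self.fallback_order
                     if (p, m) != (provider, model)]
        return primary.with_fallbacks(fallbacks) if fallbacks else primary
```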

Frequently Asked Questions

Can I mix streaming and non-streaming across providers?

Yes. All four providers support .stream() and .astream() through the same LangChain interface. The streaming chunks may differ slightly in structure, but you iterate over them the same way.

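For example, a sketch that streams the same prompt from two providers (model names are examples):

```python
from langchain_openai import ChatOpenAI
from langchain_anthropic import ChatAnthropic

for model in (ChatOpenAI(model="gpt-4o-mini"),
              ChatAnthropic(model="claude-sonnet-4-20250514")):
    # Identical iteration regardless of provider
    for chunk in model.stream("Say hello in three words."):
        print(chunk.content, end="", flush=True)
    print()
```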

What happens if my fallback model also fails?

LangChain tries each fallback in order. If the last model in the chain also raises an exception, that exception propagates to your calling code. Wrap the .invoke() call in a try/except block to handle the case where all providers are down.

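A sketch, rebuilding a small fallback chain inline so the example stands alone:

```python
from langchain_openai import ChatOpenAI
from langchain_anthropic import ChatAnthropic

model_with_fallbacks = ChatOpenAI(model="gpt-4o-mini").with_fallbacks(
    [ChatAnthropic(model="claude-sonnet-4-20250514")]
)

try:
    response = model_with_fallbacks.invoke("Ping")
    print(response.content)
except Exception as exc:
    # Every provider in the chain failed -- degrade gracefully
    print(f"All providers down; serving a cached or canned answer instead ({exc})")
```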

How do I track which provider actually responded?

The response's response_metadata dictionary typically includes the model name. You can also use LangChain callbacks to log exactly which model in the fallback chain handled each request. This is essential for cost tracking and debugging.

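A sketch; the exact metadata keys vary by provider, so treat "model_name" as a convention rather than a guarantee:

```python
from langchain_openai import ChatOpenAI

model = ChatOpenAI(model="gpt-4o-mini")
response = model.invoke("Ping")
# Inspect the whole dict if the key is missing for your provider
print(response.response_metadata.get("model_name", response.response_metadata))
```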

Does model switching work with LCEL chains and agents?

Absolutely. Since every model returned by the selector is a BaseChatModel, it plugs into any LCEL chain, agent, or tool pipeline. A prompt template piped to a model with fallbacks works exactly as you'd expect: prompt | model_with_fallbacks | output_parser. The fallback logic is invisible to the rest of the chain.

References

  • LangChain documentation — How to add fallbacks to a runnable
  • LangChain documentation — Chat model integrations
  • LangChain documentation — langchain-openai package
  • LangChain documentation — langchain-anthropic package
  • LangChain documentation — langchain-google-genai package
  • LangChain documentation — langchain-ollama package
  • OpenAI API documentation — Rate limits
  • Ollama documentation — Getting started
