How to stream chat model responses
All chat models implement the Runnable interface, which comes with default implementations of standard runnable methods (i.e. invoke, batch, stream, streamEvents).

The default streaming implementation provides an AsyncIterator that yields a single value: the final output from the underlying chat model provider.

The default implementation does not provide support for token-by-token streaming, but it ensures that the model can be swapped in for any other model, since it supports the same standard interface.
The ability to stream the output token-by-token depends on whether the provider has implemented proper streaming support.
See which integrations support token-by-token streaming here.
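As a minimal sketch of this shared interface (using the ChatOpenAI integration from @langchain/openai purely for illustration), invoke resolves to the final message in one piece, while stream resolves to an async iterable of message chunks, even when a provider falls back to yielding a single final chunk:

import { ChatOpenAI } from "@langchain/openai";

const model = new ChatOpenAI({ model: "gpt-3.5-turbo-0125" });

// invoke() resolves to the complete AIMessage...
const message = await model.invoke("Hello!");

// ...while stream() resolves to an async iterable of AIMessageChunk values.
for await (const chunk of await model.stream("Hello!")) {
  console.log(chunk.content);
}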
Streaming
Below, we use a --- to help visualize the delimiter between tokens.
Pick your chat model:
- OpenAI
- Anthropic
- FireworksAI
- MistralAI
Install dependencies
- npm
- yarn
- pnpm
npm i @langchain/openai
yarn add @langchain/openai
pnpm add @langchain/openai
Add environment variables
OPENAI_API_KEY=your-api-key
Instantiate the model
import { ChatOpenAI } from "@langchain/openai";
const model = new ChatOpenAI({
model: "gpt-3.5-turbo-0125",
temperature: 0
});
Install dependencies
- npm
- yarn
- pnpm
npm i @langchain/anthropic
yarn add @langchain/anthropic
pnpm add @langchain/anthropic
Add environment variables
ANTHROPIC_API_KEY=your-api-key
Instantiate the model
import { ChatAnthropic } from "@langchain/anthropic";
const model = new ChatAnthropic({
model: "claude-3-sonnet-20240229",
temperature: 0
});
Install dependencies
- npm
- yarn
- pnpm
npm i @langchain/community
yarn add @langchain/community
pnpm add @langchain/community
Add environment variables
FIREWORKS_API_KEY=your-api-key
Instantiate the model
import { ChatFireworks } from "@langchain/community/chat_models/fireworks";
const model = new ChatFireworks({
model: "accounts/fireworks/models/firefunction-v1",
temperature: 0
});
Install dependencies
- npm
- yarn
- pnpm
npm i @langchain/mistralai
yarn add @langchain/mistralai
pnpm add @langchain/mistralai
Add environment variables
MISTRAL_API_KEY=your-api-key
Instantiate the model
import { ChatMistralAI } from "@langchain/mistralai";
const model = new ChatMistralAI({
model: "mistral-large-latest",
temperature: 0
});
// Each chunk is an AIMessageChunk; log its content followed by the --- delimiter.
for await (const chunk of await model.stream(
  "Write me a 1 verse song about goldfish on the moon"
)) {
  console.log(`${chunk.content}
---`);
}
---
Here
---
is
---
a
---
---
1
---
---
verse
---
song
---
about
---
gol
---
dfish
---
on
---
the
---
moon
---
:
---
Gol
---
dfish
---
on
---
the
---
moon
---
,
---
swimming
---
through
---
the
---
sk
---
ies
---
,
---
Floating
---
in
---
the
---
darkness
---
,
---
beneath
---
the
---
lunar
---
eyes
---
.
---
Weight
---
less
---
as
---
they
---
drift
---
,
---
through
---
the
---
endless
---
voi
---
d,
---
D
---
rif
---
ting
---
,
---
swimming
---
,
---
exploring
---
,
---
this
---
new
---
worl
---
d unexp
---
lo
---
ye
---
d.
---
---
---
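If you also want the full response once streaming finishes, the chunks can typically be merged as they arrive. This is a hedged sketch, assuming the concat method on AIMessageChunk and reusing the model instantiated above:

import type { AIMessageChunk } from "@langchain/core/messages";

// Accumulate the streamed chunks into a single message (sketch; assumes chunk.concat()).
let full: AIMessageChunk | undefined;
for await (const chunk of await model.stream(
  "Write me a 1 verse song about goldfish on the moon"
)) {
  full = full === undefined ? chunk : full.concat(chunk);
}
console.log(full?.content);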
Stream events
Chat models also support the standard streamEvents method.
This method is useful if you're streaming output from a larger LLM application that contains multiple steps (e.g., an LLM chain composed of a prompt, LLM, and output parser).
let idx = 0;

// The version option selects the streamEvents schema; "v1" matches the events shown below.
for await (const event of model.streamEvents(
  "Write me a 1 verse song about goldfish on the moon",
  {
    version: "v1",
  }
)) {
  idx += 1;
  if (idx >= 5) {
    console.log("...Truncated");
    break;
  }
  console.log(event);
}
{
run_id: "a84e1294-d281-4757-8f3f-dc4440612949",
event: "on_llm_start",
name: "ChatAnthropic",
tags: [],
metadata: {},
data: { input: "Write me a 1 verse song about goldfish on the moon" }
}
{
event: "on_llm_stream",
run_id: "a84e1294-d281-4757-8f3f-dc4440612949",
tags: [],
metadata: {},
name: "ChatAnthropic",
data: {
chunk: AIMessageChunk {
lc_serializable: true,
lc_kwargs: {
content: "",
additional_kwargs: {
id: "msg_01DqDQ9in33ZhmrCzdZaRNMZ",
type: "message",
role: "assistant",
model: "claude-3-haiku-20240307"
},
tool_calls: [],
invalid_tool_calls: [],
tool_call_chunks: [],
response_metadata: {}
},
lc_namespace: [ "langchain_core", "messages" ],
content: "",
name: undefined,
additional_kwargs: {
id: "msg_01DqDQ9in33ZhmrCzdZaRNMZ",
type: "message",
role: "assistant",
model: "claude-3-haiku-20240307"
},
response_metadata: {},
tool_calls: [],
invalid_tool_calls: [],
tool_call_chunks: []
}
}
}
{
event: "on_llm_stream",
run_id: "a84e1294-d281-4757-8f3f-dc4440612949",
tags: [],
metadata: {},
name: "ChatAnthropic",
data: {
chunk: AIMessageChunk {
lc_serializable: true,
lc_kwargs: {
content: "Here",
additional_kwargs: {},
tool_calls: [],
invalid_tool_calls: [],
tool_call_chunks: [],
response_metadata: {}
},
lc_namespace: [ "langchain_core", "messages" ],
content: "Here",
name: undefined,
additional_kwargs: {},
response_metadata: {},
tool_calls: [],
invalid_tool_calls: [],
tool_call_chunks: []
}
}
}
{
event: "on_llm_stream",
run_id: "a84e1294-d281-4757-8f3f-dc4440612949",
tags: [],
metadata: {},
name: "ChatAnthropic",
data: {
chunk: AIMessageChunk {
lc_serializable: true,
lc_kwargs: {
content: " is",
additional_kwargs: {},
tool_calls: [],
invalid_tool_calls: [],
tool_call_chunks: [],
response_metadata: {}
},
lc_namespace: [ "langchain_core", "messages" ],
content: " is",
name: undefined,
additional_kwargs: {},
response_metadata: {},
tool_calls: [],
invalid_tool_calls: [],
tool_call_chunks: []
}
}
}
...Truncated
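Because each event carries an event name, the stream can be filtered down to just the token chunks. A hedged sketch, assuming the "v1" event names shown above and plain string chunk content:

// Only react to token chunks; other events (start, end, etc.) are skipped.
for await (const event of model.streamEvents(
  "Write me a 1 verse song about goldfish on the moon",
  { version: "v1" }
)) {
  if (event.event === "on_llm_stream") {
    // Assumes the chunk content is a plain string here.
    process.stdout.write(String(event.data.chunk.content));
  }
}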