Intermediate

⚡ Streaming Responses

Display agent output token-by-token in real time for a fast, responsive user experience.

Streaming is essential for any interactive application. Without it, the user stares at a blank screen until the model finishes generating the entire response — which can take several seconds for longer outputs. With streaming, text appears progressively, just like in ChatGPT or Copilot.

Agent Framework provides RunStreamingAsync(), which returns an IAsyncEnumerable<AgentResponseUpdate>. You iterate over it with await foreach and process each update as it arrives. Each update may contain a fragment of the response text, a tool call event, or a status update.

Key Concepts

  • RunStreamingAsync() — returns an async stream of response fragments
  • AgentResponseUpdate — a single streaming update; may contain text, tool calls, or metadata
  • AgentResponseUpdate.Text — extracts just the text fragment from the update
  • await foreach — C# construct for consuming IAsyncEnumerable
  • Partial output — each update is only a fragment; accumulate them for the full response
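The await foreach pattern works with any IAsyncEnumerable<T>, not just agent responses. A minimal self-contained sketch (no Azure dependencies; the SimulateTokens iterator is a stand-in for RunStreamingAsync, not part of the framework):

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// A stand-in async iterator that yields text fragments one at a time,
// the same shape a real streaming response takes.
static async IAsyncEnumerable<string> SimulateTokens()
{
    foreach (var token in new[] { "The ", "Moon ", "is ", "slowly ", "drifting ", "away." })
    {
        await Task.Delay(50); // mimic network latency between tokens
        yield return token;
    }
}

await foreach (var token in SimulateTokens())
{
    Console.Write(token); // each fragment prints as soon as it arrives
}
Console.WriteLine();
```

The same loop shape carries over directly: swap SimulateTokens() for agent.RunStreamingAsync(...) and the fragment for update.Text.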

NuGet Packages

dotnet add package Microsoft.Agents.AI.OpenAI --prerelease
dotnet add package Azure.AI.OpenAI --prerelease
dotnet add package Azure.Identity

Code Sample β€” Basic Streaming

using Azure.AI.OpenAI;
using Azure.Identity;
using Microsoft.Agents.AI;

AIAgent agent = new AzureOpenAIClient(
        new Uri(Environment.GetEnvironmentVariable("AZURE_OPENAI_ENDPOINT")!),
        new AzureCliCredential())
    .GetChatClient("gpt-4o-mini")
    .AsAIAgent(instructions: "You are a helpful assistant.");

// RunStreamingAsync returns IAsyncEnumerable<AgentResponseUpdate>.
// Each update arrives as the model generates tokens.
await foreach (var update in agent.RunStreamingAsync("Tell me a fun fact about the Moon."))
{
    // update.Text is null for non-text updates (e.g. tool call events).
    if (update.Text is not null)
    {
        Console.Write(update.Text);
    }
}

// Print a newline after streaming completes.
Console.WriteLine();

Code Sample β€” Streaming with Accumulated Response

using Azure.AI.OpenAI;
using Azure.Identity;
using Microsoft.Agents.AI;
using System.Text;

AIAgent agent = new AzureOpenAIClient(
        new Uri(Environment.GetEnvironmentVariable("AZURE_OPENAI_ENDPOINT")!),
        new AzureCliCredential())
    .GetChatClient("gpt-4o-mini")
    .AsAIAgent(instructions: "You are a helpful assistant.");

var fullResponse = new StringBuilder();

await foreach (var update in agent.RunStreamingAsync("What are the planets in our solar system?"))
{
    if (update.Text is not null)
    {
        Console.Write(update.Text);   // stream to console in real time
        fullResponse.Append(update.Text);
    }
}

Console.WriteLine();
Console.WriteLine($"--- Full response ({fullResponse.Length} chars) ---");
Console.WriteLine(fullResponse.ToString());

Step-by-Step Explanation

  1. Call RunStreamingAsync() — This sends the request to the model and immediately returns an async enumerable without waiting for the full response.
  2. Iterate with await foreach — Each AgentResponseUpdate arrives as the model produces tokens. The loop body executes for every update.
  3. Check update.Text — Not every update contains text. Tool call invocations, function results, and metadata also arrive as updates. Guard with a null check (or is not null) before writing.
  4. Accumulate if needed — Use a StringBuilder or similar to capture the full response for storage, logging, or further processing after streaming completes.
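Because a streamed response can run for many seconds, it is also worth wiring up cancellation. A hedged sketch, assuming RunStreamingAsync accepts an optional CancellationToken parameter (verify the exact overload in your installed prerelease version):

```csharp
using var cts = new CancellationTokenSource(TimeSpan.FromSeconds(30));

try
{
    await foreach (var update in agent.RunStreamingAsync(
        "Summarize the history of lunar exploration.",
        cancellationToken: cts.Token))
    {
        if (update.Text is not null)
        {
            Console.Write(update.Text);
        }
    }
}
catch (OperationCanceledException)
{
    // The stream stopped partway; anything already printed stays on screen.
    Console.WriteLine();
    Console.WriteLine("[stream cancelled after 30 seconds]");
}
```

Only the remaining fragments are dropped on cancellation; partial text already written to the console (or appended to a StringBuilder) is preserved.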

Next Steps