Overview

Enable real-time response streaming by setting stream: true in your /v2/chat requests. The API will send Server-Sent Events (SSE) as the response is generated, providing a better user experience with progressive content delivery.

Benefits of Streaming

Better UX

Show real-time progress and thinking to keep users engaged

Lower Latency

Display content as it’s generated instead of waiting for the complete response

Transparency

Show the AI’s thinking process and search progress

Progressive Display

Render content incrementally for faster perceived performance

How It Works

When streaming is enabled:

1. Request Sent: your client sends a POST request with stream: true
2. Connection Established: the server establishes an SSE connection
3. Events Streamed: the server sends events as the response is generated
4. Connection Closed: the server closes the connection after the final event

Streaming Request

curl -X POST https://api.fintool.com/v2/chat \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -N \
  -d '{
    "messages": [
      {
        "role": "user",
        "content": "Analyze Microsoft earnings"
      }
    ],
    "stream": true
  }'
Always pass the --no-buffer flag (-N) to cURL, or set stream=True in your HTTP client, to receive events in real time.

Server-Sent Events Format

Events are sent in the SSE format:
event: message
data: {"type": "message", "message": {...}}

event: message
data: {"type": "message", "message": {...}}
Each event contains a JSON object with the following structure:

Event Types

type
string
The type of event. Common values:
  • message - New content or updates
  • thinking - AI is processing (optional)
  • done - Stream complete
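
A minimal sketch of how a client might branch on these types (the payload shapes follow the examples below; the done handling is an assumption, since only its type value is documented here):
def handle_event(data):
    # Dispatch on the documented event types
    event_type = data.get("type")

    if event_type == "message":
        message = data.get("message", {})
        if message.get("thinking"):
            print(f"[thinking] {message['thinking']}")
        if message.get("content"):
            print(message["content"])
    elif event_type == "done":
        # Assumed shape: no payload beyond the type marker
        print("[stream complete]")
    else:
        # New event types may be added over time; ignore them gracefully
        pass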

Event Examples

Thinking Event

Shows the AI’s current processing state:
{
  "type": "message",
  "message": {
    "role": "assistant",
    "thinking": "Searching Microsoft financial documents...",
    "content": ""
  }
}

Content Event

Progressive content delivery:
{
  "type": "message",
  "message": {
    "role": "assistant",
    "content": "Microsoft reported Q2 2025 revenue of $69.6B"
  }
}

Complete Event with Citations

Final event includes full content and citations:
{
  "type": "message",
  "message": {
    "role": "assistant",
    "content": "Microsoft reported Q2 2025 revenue of $69.6B **[msft_q2_2025]**, up 17% year-over-year.",
    "metadata": {
      "session_data": "eyJzZXNzaW9uX2lkIjoi..."
    }
  },
  "citations": [
    {
      "chunk_id": "msft_q2_2025",
      "document_title": "Microsoft Q2 2025 Earnings Release",
      "page_number": 1,
      "relevance_score": 0.98
    }
  ]
}
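
Citations are referenced inline through **[chunk_id]** markers in the content. A minimal sketch for resolving those markers into numbered footnotes (the marker pattern is inferred from the example above):
import re

def resolve_citations(content, citations):
    # Map each chunk_id to a footnote number, e.g. **[msft_q2_2025]** -> [1]
    index = {c["chunk_id"]: i + 1 for i, c in enumerate(citations)}

    def replace(match):
        chunk_id = match.group(1)
        return f"[{index[chunk_id]}]" if chunk_id in index else match.group(0)

    text = re.sub(r"\*\*\[([^\]]+)\]\*\*", replace, content)
    footnotes = [
        f"[{i + 1}] {c['document_title']}, page {c['page_number']}"
        for i, c in enumerate(citations)
    ]
    return text, footnotes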

Complete Streaming Example

Here’s a full example of handling streaming responses in Python:
import requests
import json

def stream_chat(query):
    url = "https://api.fintool.com/v2/chat"
    headers = {
        "Authorization": "Bearer YOUR_API_KEY",
        "Content-Type": "application/json"
    }

    payload = {
        "messages": [{"role": "user", "content": query}],
        "stream": True
    }

    full_content = ""
    citations = []

    with requests.post(url, headers=headers, json=payload, stream=True) as response:
        for line in response.iter_lines():
            if not line:
                continue

            line = line.decode('utf-8')

            # Parse SSE format: the event line names the event, the data line
            # carries the JSON payload
            if line.startswith('event: '):
                event_type = line[7:]
            elif line.startswith('data: '):
                try:
                    data = json.loads(line[6:])

                    if data.get('type') == 'message':
                        message = data.get('message', {})

                        # Show thinking state
                        if 'thinking' in message:
                            print(f"[Thinking] {message['thinking']}")

                        # Update content
                        if 'content' in message:
                            new_content = message['content']
                            # Content is cumulative; print only the new suffix
                            if new_content.startswith(full_content):
                                new_part = new_content[len(full_content):]
                                print(new_part, end='', flush=True)
                                full_content = new_content
                            else:
                                # Content was rewritten rather than appended
                                full_content = new_content

                        # Store citations
                        if 'citations' in data:
                            citations = data['citations']

                except json.JSONDecodeError:
                    pass  # Ignore malformed or partial data lines

    print("\n\nCitations:")
    for citation in citations:
        print(f"- {citation['document_title']}, Page {citation['page_number']}")

# Usage
stream_chat("What was Apple's revenue in Q4 2024?")

React/TypeScript Example

For web applications using React:
import { useState } from 'react';

interface Message {
  role: 'user' | 'assistant';
  content: string;
  thinking?: string;
}

function ChatComponent() {
  const [messages, setMessages] = useState<Message[]>([]);
  const [isStreaming, setIsStreaming] = useState(false);

  const sendMessage = async (query: string) => {
    setIsStreaming(true);

    const response = await fetch('https://api.fintool.com/v2/chat', {
      method: 'POST',
      headers: {
        'Authorization': 'Bearer YOUR_API_KEY',
        'Content-Type': 'application/json'
      },
      body: JSON.stringify({
        messages: [...messages, { role: 'user', content: query }],
        stream: true
      })
    });

    const reader = response.body?.getReader();
    const decoder = new TextDecoder();
    let buffer = '';

    const assistantMessage: Message = { role: 'assistant', content: '' };
    setMessages(prev => [...prev, assistantMessage]);

    while (true) {
      const { done, value } = await reader!.read();
      if (done) break;

      // Stream-decode so multi-byte characters split across chunks survive
      buffer += decoder.decode(value, { stream: true });

      // A chunk may contain several lines or end mid-line; keep the trailing
      // partial line in the buffer for the next read
      const lines = buffer.split('\n');
      buffer = lines.pop() ?? '';

      for (const line of lines) {
        if (!line.startsWith('data: ')) continue;

        let data;
        try {
          data = JSON.parse(line.slice(6));
        } catch {
          continue; // skip malformed payloads
        }

        if (data.type === 'message') {
          setMessages(prev => {
            // Replace the last message with a new object instead of
            // mutating state in place
            const updated = [...prev];
            const last = updated[updated.length - 1];
            updated[updated.length - 1] = {
              ...last,
              thinking: data.message.thinking ?? last.thinking,
              content: data.message.content || last.content
            };
            return updated;
          });
        }
      }
    }

    setIsStreaming(false);
  };

  return (
    <div>
      {messages.map((msg, i) => (
        <div key={i}>
          {msg.thinking && <em>{msg.thinking}</em>}
          <p>{msg.content}</p>
        </div>
      ))}
    </div>
  );
}

Best Practices

Implement retry logic for network failures. SSE connections can drop, especially on mobile networks.
from tenacity import retry, stop_after_attempt, wait_exponential

@retry(stop=stop_after_attempt(3), wait=wait_exponential(multiplier=1, min=2, max=10))
def stream_with_retry(query):
    # Your streaming code here
    pass
  • Store partial content and update the UI incrementally to avoid flickering or jumpy displays.
  • Display the “thinking” field to keep users informed about processing progress.
  • A single network chunk can contain multiple SSE events. Always split on the blank-line delimiter (\n\n) and handle each event separately; see the parsing sketch below.
  • Always clean up connections and close streams properly when the component unmounts or the user navigates away.
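
A minimal sketch of that event parsing, buffering raw chunks from the requests response used in the Python example above and yielding one payload per complete event:
import codecs
import json

def iter_sse_events(response):
    # Events are separated by a blank line ("\n\n"); buffer raw bytes and
    # yield one parsed JSON payload per complete event
    decoder = codecs.getincrementaldecoder("utf-8")()
    buffer = ""
    for chunk in response.iter_content(chunk_size=None):
        buffer += decoder.decode(chunk)
        while "\n\n" in buffer:
            event, buffer = buffer.split("\n\n", 1)
            for line in event.split("\n"):
                if line.startswith("data: "):
                    try:
                        yield json.loads(line[6:])
                    except json.JSONDecodeError:
                        pass  # ignore malformed payloads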

Troubleshooting

If events aren’t arriving in real time, make sure response buffering is disabled:
  • cURL: Use --no-buffer or -N flag
  • Python requests: Use stream=True
  • Fetch API: Read from response.body.getReader()
SSE payload lines carry a data: prefix. Always strip it before parsing the JSON:
if line.startswith('data: '):
    data = json.loads(line[6:])  # Skip 'data: ' prefix
Check your network timeout settings. Some proxies or load balancers may close idle connections. Consider implementing heartbeat/keepalive logic.
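
With Python requests, for example, separate connect and read timeouts make a dropped connection fail fast instead of hanging; for streaming, the read timeout applies between received chunks (values here are illustrative, reusing url, headers, and payload from the Python example above):
# (connect timeout, read timeout) in seconds; a stalled stream raises an
# error instead of blocking forever
response = requests.post(url, headers=headers, json=payload,
                         stream=True, timeout=(5, 60))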

Non-Streaming vs Streaming

Non-Streaming (stream: false)

  • Single response when complete
  • Simpler to implement
  • Better for batch processing
  • Lower overhead

Streaming (stream: true)

  • Progressive content delivery
  • Better user experience
  • Shows thinking process
  • Appears faster
Use streaming for interactive applications where users are waiting for responses. Use non-streaming for background jobs or batch processing.
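
For comparison, a minimal non-streaming call returns one complete JSON body (a sketch; the response field names are assumed to mirror the final streamed event shown above):
import requests

response = requests.post(
    "https://api.fintool.com/v2/chat",
    headers={
        "Authorization": "Bearer YOUR_API_KEY",
        "Content-Type": "application/json"
    },
    json={
        "messages": [{"role": "user", "content": "Analyze Microsoft earnings"}],
        "stream": False
    }
)

result = response.json()
# Field names assumed to mirror the streamed "message" events above
print(result["message"]["content"])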