Overview

Enable real-time response streaming by setting stream: true in your /v2/chat requests. The API will send Server-Sent Events (SSE) as the response is generated, providing a better user experience with progressive content delivery.

Benefits of Streaming

Better UX

Show real-time progress and thinking to keep users engaged

Lower Latency

Display content as it’s generated instead of waiting for the complete response

Transparency

Show the AI’s thinking process and search progress

Progressive Display

Render content incrementally for faster perceived performance

How It Works

When streaming is enabled:

1. Request Sent: your client sends a POST request with stream: true
2. Connection Established: the server establishes an SSE connection
3. Events Streamed: the server sends events as the response is generated
4. Connection Closed: the server closes the connection after the final event

Streaming Request

curl -X POST https://api.fintool.com/v2/chat \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -N \
  -d '{
    "messages": [
      {
        "role": "user",
        "content": "Analyze Microsoft earnings"
      }
    ],
    "stream": true
  }'
Always pass the --no-buffer flag (-N) to cURL, or set stream=True in your HTTP client, to receive events in real time.

Server-Sent Events Format

Events are sent in the SSE format:
event: message
data: {"type": "message", "message": {...}}

event: message
data: {"type": "message", "message": {...}}
Each event contains a JSON object with the following structure:

Event Types

type
string
The type of event. Common values:
  • message - New content or updates
  • thinking - AI is processing (optional)
  • done - Stream complete
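
A minimal sketch of how a client might branch on these types (the payload shapes follow the examples below; the done handling is an assumption, since only its type value is documented here):
def handle_event(data):
    # Dispatch on the documented event types
    event_type = data.get("type")

    if event_type == "message":
        message = data.get("message", {})
        if message.get("thinking"):
            print(f"[thinking] {message['thinking']}")
        if message.get("content"):
            print(message["content"])
    elif event_type == "done":
        # Assumed shape: no payload beyond the type marker
        print("[stream complete]")
    else:
        # New event types may be added over time; ignore them gracefully
        pass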

Event Examples

Thinking Event

Shows the AI’s current processing state:
{
  "type": "message",
  "message": {
    "role": "assistant",
    "thinking": "Searching Microsoft financial documents...",
    "content": ""
  }
}

Content Event

Progressive content delivery:
{
  "type": "message",
  "message": {
    "role": "assistant",
    "content": "Microsoft reported Q2 2025 revenue of $69.6B"
  }
}

Complete Event with Citations

Final event includes full content and citations:
{
  "type": "message",
  "message": {
    "role": "assistant",
    "content": "Microsoft reported Q2 2025 revenue of $69.6B **[msft_q2_2025]**, up 17% year-over-year.",
    "metadata": {
      "session_data": "eyJzZXNzaW9uX2lkIjoi..."
    }
  },
  "citations": [
    {
      "chunk_id": "msft_q2_2025",
      "document_title": "Microsoft Q2 2025 Earnings Release",
      "page_number": 1,
      "relevance_score": 0.98
    }
  ]
}
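
Citations are referenced inline through **[chunk_id]** markers in the content. A minimal sketch for resolving those markers into numbered footnotes (the marker pattern is inferred from the example above):
import re

def resolve_citations(content, citations):
    # Map each chunk_id to a footnote number, e.g. **[msft_q2_2025]** -> [1]
    index = {c["chunk_id"]: i + 1 for i, c in enumerate(citations)}

    def replace(match):
        chunk_id = match.group(1)
        return f"[{index[chunk_id]}]" if chunk_id in index else match.group(0)

    text = re.sub(r"\*\*\[([^\]]+)\]\*\*", replace, content)
    footnotes = [
        f"[{i + 1}] {c['document_title']}, page {c['page_number']}"
        for i, c in enumerate(citations)
    ]
    return text, footnotes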

Complete Streaming Example

Here’s a full example of handling streaming responses in Python:
import requests
import json

def stream_chat(query):
    url = "https://api.fintool.com/v2/chat"
    headers = {
        "Authorization": "Bearer YOUR_API_KEY",
        "Content-Type": "application/json"
    }

    payload = {
        "messages": [{"role": "user", "content": query}],
        "stream": True
    }

    full_content = ""
    citations = []

    with requests.post(url, headers=headers, json=payload, stream=True) as response:
        for line in response.iter_lines():
            if not line:
                continue

            line = line.decode('utf-8')

            # Parse SSE format: the event line names the event, the data line
            # carries the JSON payload
            if line.startswith('event: '):
                event_type = line[7:]
            elif line.startswith('data: '):
                try:
                    data = json.loads(line[6:])

                    if data.get('type') == 'message':
                        message = data.get('message', {})

                        # Show thinking state
                        if 'thinking' in message:
                            print(f"[Thinking] {message['thinking']}")

                        # Update content
                        if 'content' in message:
                            new_content = message['content']
                            # Content is cumulative; print only the new suffix
                            if new_content.startswith(full_content):
                                new_part = new_content[len(full_content):]
                                print(new_part, end='', flush=True)
                                full_content = new_content
                            else:
                                # Content was rewritten rather than appended
                                full_content = new_content

                        # Store citations
                        if 'citations' in data:
                            citations = data['citations']

                except json.JSONDecodeError:
                    pass  # Ignore malformed or partial data lines

    print("\n\nCitations:")
    for citation in citations:
        print(f"- {citation['document_title']}, Page {citation['page_number']}")

# Usage
stream_chat("What was Apple's revenue in Q4 2024?")

React/TypeScript Example

For web applications using React:
import { useState } from 'react';

interface Message {
  role: 'user' | 'assistant';
  content: string;
  thinking?: string;
}

function ChatComponent() {
  const [messages, setMessages] = useState<Message[]>([]);
  const [isStreaming, setIsStreaming] = useState(false);

  const sendMessage = async (query: string) => {
    setIsStreaming(true);

    const response = await fetch('https://api.fintool.com/v2/chat', {
      method: 'POST',
      headers: {
        'Authorization': 'Bearer YOUR_API_KEY',
        'Content-Type': 'application/json'
      },
      body: JSON.stringify({
        messages: [...messages, { role: 'user', content: query }],
        stream: true
      })
    });

    const reader = response.body?.getReader();
    const decoder = new TextDecoder();
    let buffer = '';

    const assistantMessage: Message = { role: 'assistant', content: '' };
    setMessages(prev => [...prev, assistantMessage]);

    while (true) {
      const { done, value } = await reader!.read();
      if (done) break;

      // Stream-decode so multi-byte characters split across chunks survive
      buffer += decoder.decode(value, { stream: true });

      // A chunk may contain several lines or end mid-line; keep the trailing
      // partial line in the buffer for the next read
      const lines = buffer.split('\n');
      buffer = lines.pop() ?? '';

      for (const line of lines) {
        if (!line.startsWith('data: ')) continue;

        let data;
        try {
          data = JSON.parse(line.slice(6));
        } catch {
          continue; // skip malformed payloads
        }

        if (data.type === 'message') {
          setMessages(prev => {
            // Replace the last message with a new object instead of
            // mutating state in place
            const updated = [...prev];
            const last = updated[updated.length - 1];
            updated[updated.length - 1] = {
              ...last,
              thinking: data.message.thinking ?? last.thinking,
              content: data.message.content || last.content
            };
            return updated;
          });
        }
      }
    }

    setIsStreaming(false);
  };

  return (
    <div>
      {messages.map((msg, i) => (
        <div key={i}>
          {msg.thinking && <em>{msg.thinking}</em>}
          <p>{msg.content}</p>
        </div>
      ))}
    </div>
  );
}

Best Practices

Implement retry logic for network failures. SSE connections can drop, especially on mobile networks.
from tenacity import retry, stop_after_attempt, wait_exponential

@retry(stop=stop_after_attempt(3), wait=wait_exponential(multiplier=1, min=2, max=10))
def stream_with_retry(query):
    # Your streaming code here
    pass
  • Store partial content and update the UI incrementally to avoid flickering or jumpy displays.
  • Display the “thinking” field to keep users informed about processing progress.
  • A single network chunk can contain multiple SSE events. Always split on the blank-line delimiter (\n\n) and handle each event separately; see the parsing sketch below.
  • Always clean up connections and close streams properly when the component unmounts or the user navigates away.
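
A minimal sketch of that event parsing, buffering raw chunks from the requests response used in the Python example above and yielding one payload per complete event:
import codecs
import json

def iter_sse_events(response):
    # Events are separated by a blank line ("\n\n"); buffer raw bytes and
    # yield one parsed JSON payload per complete event
    decoder = codecs.getincrementaldecoder("utf-8")()
    buffer = ""
    for chunk in response.iter_content(chunk_size=None):
        buffer += decoder.decode(chunk)
        while "\n\n" in buffer:
            event, buffer = buffer.split("\n\n", 1)
            for line in event.split("\n"):
                if line.startswith("data: "):
                    try:
                        yield json.loads(line[6:])
                    except json.JSONDecodeError:
                        pass  # ignore malformed payloads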

Troubleshooting

If events aren’t arriving in real time, make sure response buffering is disabled:
  • cURL: Use --no-buffer or -N flag
  • Python requests: Use stream=True
  • Fetch API: Read from response.body.getReader()
SSE payload lines carry a data: prefix. Always strip it before parsing the JSON:
if line.startswith('data: '):
    data = json.loads(line[6:])  # Skip 'data: ' prefix
Check your network timeout settings. Some proxies or load balancers may close idle connections. Consider implementing heartbeat/keepalive logic.
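
With Python requests, for example, separate connect and read timeouts make a dropped connection fail fast instead of hanging; for streaming, the read timeout applies between received chunks (values here are illustrative, reusing url, headers, and payload from the Python example above):
# (connect timeout, read timeout) in seconds; a stalled stream raises an
# error instead of blocking forever
response = requests.post(url, headers=headers, json=payload,
                         stream=True, timeout=(5, 60))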

Non-Streaming vs Streaming

Non-Streaming (stream: false)

  • Single response when complete
  • Simpler to implement
  • Better for batch processing
  • Lower overhead

Streaming (stream: true)

  • Progressive content delivery
  • Better user experience
  • Shows thinking process
  • Appears faster
Use streaming for interactive applications where users are waiting for responses. Use non-streaming for background jobs or batch processing.
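
For comparison, a minimal non-streaming call returns one complete JSON body (a sketch; the response field names are assumed to mirror the final streamed event shown above):
import requests

response = requests.post(
    "https://api.fintool.com/v2/chat",
    headers={
        "Authorization": "Bearer YOUR_API_KEY",
        "Content-Type": "application/json"
    },
    json={
        "messages": [{"role": "user", "content": "Analyze Microsoft earnings"}],
        "stream": False
    }
)

result = response.json()
# Field names assumed to mirror the streamed "message" events above
print(result["message"]["content"])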