Overview
Enable real-time response streaming by setting stream: true in your /v2/chat requests. The API will send Server-Sent Events (SSE) as the response is generated, providing a better user experience with progressive content delivery.
Benefits of Streaming
- **Better UX**: Show real-time progress and thinking to keep users engaged
- **Lower Latency**: Display content as it's generated instead of waiting for the complete response
- **Transparency**: Show the AI's thinking process and search progress
- **Progressive Display**: Render content incrementally for faster perceived performance
How It Works
When streaming is enabled:

1. **Request Sent**: your client sends a POST request with `stream: true`
2. **Connection Established**: the server opens an SSE connection
3. **Events Streamed**: the server sends events as the response is generated
4. **Connection Closed**: the server closes the connection after the final event
Streaming Request
```bash
curl -X POST https://api.fintool.com/v2/chat \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -N \
  -d '{
    "messages": [
      {
        "role": "user",
        "content": "Analyze Microsoft earnings"
      }
    ],
    "stream": true
  }'
```
Always use the `--no-buffer` (`-N`) flag with cURL, or set `stream=True` in your HTTP client, to receive events in real time.
Events are sent in the SSE format, with a blank line separating consecutive events:

```
event: message
data: {"type": "message", "message": {...}}

event: message
data: {"type": "message", "message": {...}}
```
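These blank-line-separated blocks are straightforward to decode by hand. As a minimal sketch (the helper name is ours, not part of the API), one raw event block can be split into its `event` and `data` fields:

```python
import json

def parse_sse_event(raw_event: str) -> dict:
    """Parse one SSE event block into its fields.

    `raw_event` is the text between blank-line separators,
    e.g. "event: message\ndata: {...}".
    """
    fields = {}
    for line in raw_event.strip().splitlines():
        if ": " in line:
            key, value = line.split(": ", 1)
            fields[key] = value
    # The data field carries the JSON payload
    if "data" in fields:
        fields["data"] = json.loads(fields["data"])
    return fields

event = parse_sse_event(
    'event: message\ndata: {"type": "message", "message": {"content": "Hi"}}'
)
```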
Each event contains a JSON object with the following structure:
Event Types
The type of event. Common values:

- `message`: new content or updates
- `thinking`: the AI is processing (optional)
- `done`: stream complete
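In practice, the examples below carry the thinking state inside a `message` event, so a handler can branch on both the `type` field and the message contents. An illustrative dispatcher (function and return values are ours, not part of the API):

```python
def handle_event(data: dict) -> str:
    """Dispatch a decoded event payload by its type field."""
    event_type = data.get("type")
    if event_type == "message":
        message = data.get("message", {})
        # Thinking updates arrive as message events with a thinking field
        if message.get("thinking"):
            return f"thinking: {message['thinking']}"
        return f"content: {message.get('content', '')}"
    if event_type == "done":
        return "stream complete"
    return f"unhandled event type: {event_type}"

label = handle_event({"type": "message", "message": {"content": "Revenue rose 17%"}})
```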
Event Examples
Thinking Event
Shows the AI’s current processing state:
```json
{
  "type": "message",
  "message": {
    "role": "assistant",
    "thinking": "Searching Microsoft financial documents...",
    "content": ""
  }
}
```
Content Event
Progressive content delivery:
```json
{
  "type": "message",
  "message": {
    "role": "assistant",
    "content": "Microsoft reported Q2 2025 revenue of $69.6B"
  }
}
```
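Since each content event carries the cumulative text so far, a client that wants only the newly added portion can diff consecutive snapshots. A small helper, assuming snapshots are cumulative as in the examples here:

```python
def content_delta(previous: str, current: str) -> str:
    """Return the newly appended text, given two cumulative snapshots.

    Falls back to the full current snapshot if it does not extend
    the previous one (e.g. after a correction or rewrite).
    """
    if current.startswith(previous):
        return current[len(previous):]
    return current

delta = content_delta(
    "Microsoft reported Q2 2025 revenue",
    "Microsoft reported Q2 2025 revenue of $69.6B",
)
```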
Complete Event with Citations
Final event includes full content and citations:
```json
{
  "type": "message",
  "message": {
    "role": "assistant",
    "content": "Microsoft reported Q2 2025 revenue of $69.6B **[msft_q2_2025]**, up 17% year-over-year.",
    "metadata": {
      "session_data": "eyJzZXNzaW9uX2lkIjoi..."
    }
  },
  "citations": [
    {
      "chunk_id": "msft_q2_2025",
      "document_title": "Microsoft Q2 2025 Earnings Release",
      "page_number": 1,
      "relevance_score": 0.98
    }
  ]
}
```
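Inline markers like `**[msft_q2_2025]**` reference `chunk_id` values in the citations array. As an illustrative sketch (the helper name is ours), they can be rewritten as numbered references for display:

```python
import re

def number_citations(content: str, citations: list) -> str:
    """Replace **[chunk_id]** markers with numbered references like [1]."""
    # Map each chunk_id to its 1-based position in the citations list
    order = {c["chunk_id"]: i + 1 for i, c in enumerate(citations)}

    def repl(match):
        chunk_id = match.group(1)
        # Leave unknown markers untouched
        return f"[{order[chunk_id]}]" if chunk_id in order else match.group(0)

    return re.sub(r"\*\*\[([^\]]+)\]\*\*", repl, content)

text = number_citations(
    "Microsoft reported Q2 2025 revenue of $69.6B **[msft_q2_2025]**.",
    [{"chunk_id": "msft_q2_2025", "document_title": "Microsoft Q2 2025 Earnings Release"}],
)
```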
Complete Streaming Example
Here’s a full example of handling streaming responses in Python:
```python
import requests
import json

def stream_chat(query):
    url = "https://api.fintool.com/v2/chat"
    headers = {
        "Authorization": "Bearer YOUR_API_KEY",
        "Content-Type": "application/json",
    }
    payload = {
        "messages": [{"role": "user", "content": query}],
        "stream": True,
    }

    full_content = ""
    citations = []

    with requests.post(url, headers=headers, json=payload, stream=True) as response:
        for line in response.iter_lines():
            if not line:
                continue
            line = line.decode("utf-8")

            # Parse SSE format
            if line.startswith("event: "):
                event_type = line[7:]
            elif line.startswith("data: "):
                try:
                    data = json.loads(line[6:])
                    if data.get("type") == "message":
                        message = data.get("message", {})

                        # Show thinking state
                        if "thinking" in message:
                            print(f"[Thinking] {message['thinking']}")

                        # Update content
                        if "content" in message:
                            new_content = message["content"]
                            # Print only the newly added portion
                            if new_content.startswith(full_content):
                                new_part = new_content[len(full_content):]
                                print(new_part, end="", flush=True)
                            full_content = new_content

                    # Store citations
                    if "citations" in data:
                        citations = data["citations"]
                except json.JSONDecodeError:
                    pass

    print("\n\nCitations:")
    for citation in citations:
        print(f"- {citation['document_title']}, Page {citation['page_number']}")

# Usage
stream_chat("What was Apple's revenue in Q4 2024?")
```
React/TypeScript Example
For web applications using React:
```tsx
import { useState } from 'react';

interface Message {
  role: 'user' | 'assistant';
  content: string;
  thinking?: string;
}

function ChatComponent() {
  const [messages, setMessages] = useState<Message[]>([]);
  const [isStreaming, setIsStreaming] = useState(false);

  const sendMessage = async (query: string) => {
    setIsStreaming(true);

    const response = await fetch('https://api.fintool.com/v2/chat', {
      method: 'POST',
      headers: {
        'Authorization': 'Bearer YOUR_API_KEY',
        'Content-Type': 'application/json'
      },
      body: JSON.stringify({
        messages: [...messages, { role: 'user', content: query }],
        stream: true
      })
    });

    const reader = response.body?.getReader();
    const decoder = new TextDecoder();

    const assistantMessage: Message = { role: 'assistant', content: '' };
    setMessages(prev => [...prev, assistantMessage]);

    while (true) {
      const { done, value } = await reader!.read();
      if (done) break;

      const chunk = decoder.decode(value);
      const lines = chunk.split('\n');

      for (const line of lines) {
        if (line.startsWith('data: ')) {
          try {
            const data = JSON.parse(line.slice(6));
            if (data.type === 'message') {
              setMessages(prev => {
                // Copy the last message instead of mutating state in place
                const updated = [...prev];
                const lastMsg = { ...updated[updated.length - 1] };
                if (data.message.thinking) {
                  lastMsg.thinking = data.message.thinking;
                }
                if (data.message.content) {
                  lastMsg.content = data.message.content;
                }
                updated[updated.length - 1] = lastMsg;
                return updated;
              });
            }
          } catch {
            // Ignore partial JSON when a chunk boundary splits an event
          }
        }
      }
    }

    setIsStreaming(false);
  };

  return (
    <div>
      {messages.map((msg, i) => (
        <div key={i}>
          {msg.thinking && <em>{msg.thinking}</em>}
          <p>{msg.content}</p>
        </div>
      ))}
    </div>
  );
}
```
Best Practices
Implement retry logic for network failures. SSE connections can drop, especially on mobile networks.

```python
from tenacity import retry, stop_after_attempt, wait_exponential

@retry(stop=stop_after_attempt(3), wait=wait_exponential(multiplier=1, min=2, max=10))
def stream_with_retry(query):
    # Your streaming code here
    pass
```
- Store partial content and update the UI incrementally to avoid flickering or jumpy displays.
- Display the "thinking" field to keep users informed about processing progress.
- The SSE format can deliver multiple events in a single chunk. Always split on `\n\n` and handle each event separately.
- Always clean up connections and close streams when the component unmounts or the user navigates away.
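As noted above, a single network read can contain several events, or end mid-event. A buffering generator that yields only complete event blocks, as an illustrative sketch (the function name is ours):

```python
def iter_sse_events(chunks):
    """Yield complete SSE event blocks from an iterable of text chunks.

    Buffers partial data across chunk boundaries and splits on the
    blank-line separator between events.
    """
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        # Emit every complete event currently in the buffer
        while "\n\n" in buffer:
            event, buffer = buffer.split("\n\n", 1)
            if event.strip():
                yield event

events = list(iter_sse_events([
    'event: message\ndata: {"a": 1}\n\nevent: mes',  # chunk ends mid-event
    'sage\ndata: {"b": 2}\n\n',
]))
```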
Troubleshooting
Events not appearing in real-time
Make sure buffering is disabled so events are delivered as they arrive:

- cURL: use the `--no-buffer` (`-N`) flag
- Python requests: pass `stream=True`
- Fetch API: read from `response.body.getReader()`
JSON parse errors

The SSE format uses a `data: ` prefix. Always strip it before parsing JSON:

```python
if line.startswith('data: '):
    data = json.loads(line[6:])  # Skip 'data: ' prefix
```
Connection closes prematurely
Check your network timeout settings. Some proxies or load balancers may close idle connections. Consider implementing heartbeat/keepalive logic.
Non-Streaming vs Streaming
Non-Streaming (`stream: false`)

- Single response when complete
- Simpler to implement
- Better for batch processing
- Lower overhead

Streaming (`stream: true`)

- Progressive content delivery
- Better user experience
- Shows thinking process
- Appears faster
Use streaming for interactive applications where users are waiting for responses. Use non-streaming for background jobs or batch processing.