Real-Time as an Architectural Concern
Real-time features are not a layer you add to a finished application. The moment your UI needs to reflect state that changes on a server or on another client, your architecture changes. You now have distributed state, and the fundamental problems of distributed systems apply: message ordering, delivery guarantees, conflict resolution, and consistency under network partition.
Most frontend engineers encounter real-time through one of three paths: a chat feature that needs to push new messages, a dashboard that needs live metrics, or a document editor that needs to synchronise across users. Each has the same underlying challenge but different tolerance for latency, conflict, and data loss. Understanding those trade-offs determines which protocol and architecture is appropriate.
What This Guide Covers
- Transport primitives: WebSockets, Server-Sent Events, and polling compared across latency, infrastructure, and browser support dimensions.
- Connection management: state machines, heartbeats, reconnect with exponential backoff, and presence systems.
- Optimistic UI at real-time scale: reconciling client-predicted state with authoritative server state arriving asynchronously.
- Conflict resolution: last-write-wins, operational transforms, and CRDTs for eventually consistent collaborative data.
- Collaborative architecture: Yjs, awareness protocols, room abstraction, and managed real-time platforms.
- Offline synchronisation: mutation queues, background sync, and conflict detection when reconnecting after an outage.
- Scaling: why WebSocket servers are hard to scale horizontally, and how Redis pub/sub and managed platforms solve it.
WebSocket vs SSE vs Polling
Three primitives cover almost every real-time requirement in frontend systems. The choice between them is architectural: it affects your server infrastructure, your reconnection strategy, and the complexity of your client code.
| Dimension | WebSocket | Server-Sent Events | Polling |
|---|---|---|---|
| Direction | Bidirectional (full-duplex) | Server to client only | Client-initiated request/response |
| Protocol | WS / WSS (upgrade from HTTP) | HTTP/1.1, HTTP/2 native | Standard HTTP |
| Latency | Lowest (~1-5ms after handshake) | Low (~10-50ms) | Interval-bound (1s+) |
| Auto-reconnect | Manual implementation required | Built-in browser reconnect | Inherent (each request is independent) |
| HTTP/2 multiplexing | Not applicable (different protocol) | Yes, shares connection pool | Yes |
| Firewall / proxy | Sometimes blocked | Rarely blocked (plain HTTP) | Never blocked |
| Server scaling | Stateful: sticky sessions or pub/sub required | Stateful: same challenges as WebSocket | Stateless: any server can handle any request |
| Infrastructure cost | Persistent connection per client | Persistent connection per client | No persistent connections |
| Best for | Chat, gaming, collaborative editing, live cursors | Notifications, feeds, streaming AI, dashboards | Order status, low-frequency updates, simple setups |
Choose WebSocket When:
- The client sends data as frequently as the server (chat, gaming, collaboration)
- Sub-100ms latency is required in both directions
- You need a custom binary protocol or message framing
- You are building live cursors, shared drawing, or voice coordination
Choose SSE When:
- Updates flow only from server to client (feeds, notifications, AI streaming)
- You want HTTP/2 multiplexing and standard HTTP caching headers
- Firewall compatibility is a concern
- You want native browser reconnect without custom retry logic
Choose Polling When:
- Updates are infrequent (every 30s or more)
- Infrastructure simplicity is more important than latency
- The update cadence is irregular and unpredictable
- You need standard HTTP caching at CDN or browser layer
Consider Managed Real-Time When:
- You do not want to manage WebSocket server scaling
- You need presence, rooms, and pub/sub out of the box
- Options: Ably, Pusher Channels, Liveblocks, PartyKit, Supabase Realtime
- Cost of managed service is lower than engineering a scalable self-hosted solution
WebSocket Architecture
A WebSocket connection begins with an HTTP upgrade handshake and transitions to a persistent full-duplex TCP connection. From the browser's perspective it is an event-emitting object: you send messages and listen for incoming ones. The architecture challenge is everything around that: typed message protocols, state management, channel multiplexing, and resilient reconnection.
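Channel multiplexing is named above but not shown elsewhere in this guide, so here is a minimal sketch of one way to do it: every frame carries a channel name, and a small dispatcher routes the payload to that channel's listeners. The `Envelope` shape and the `ChannelMux` class are hypothetical names for illustration, not part of any library.

```typescript
// Hypothetical wire envelope: one socket carries many logical channels.
interface Envelope {
  channel: string;
  payload: unknown;
}

type Listener = (payload: unknown) => void;

// Runtime check, because JSON.parse returns untyped data.
function isEnvelope(value: unknown): value is Envelope {
  return (
    typeof value === "object" &&
    value !== null &&
    typeof (value as { channel?: unknown }).channel === "string" &&
    "payload" in value
  );
}

class ChannelMux {
  private listeners = new Map<string, Set<Listener>>();

  // Register a listener and return a function that removes it again.
  subscribe(channel: string, listener: Listener): () => void {
    const set = this.listeners.get(channel) ?? new Set<Listener>();
    this.listeners.set(channel, set);
    set.add(listener);
    return () => {
      set.delete(listener);
    };
  }

  // Route one raw frame. Returns true if at least one listener received it.
  dispatch(raw: string): boolean {
    let parsed: unknown;
    try {
      parsed = JSON.parse(raw);
    } catch {
      return false; // Malformed frame: drop it
    }
    if (!isEnvelope(parsed)) return false;
    const set = this.listeners.get(parsed.channel);
    if (!set || set.size === 0) return false;
    for (const listener of set) listener(parsed.payload);
    return true;
  }
}
```

In a browser, `dispatch` would be called from `ws.onmessage` with `event.data`, so that several rooms or features can share a single connection.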
Typed Message Protocol
A typed message protocol is the contract between client and server over a WebSocket connection. Use discriminated unions so TypeScript can narrow the payload type from the type field without any casting.
// Every message over the wire carries an id for deduplication
// and a timestamp for ordering
export interface BaseMessage {
id: string; // Unique per message, used to deduplicate on reconnect
type: string;
ts: string; // ISO 8601, server-assigned
}
// Server -> Client messages
export interface ChatMessage extends BaseMessage {
type: "chat:message";
payload: { roomId: string; userId: string; text: string; };
}
export interface PresenceUpdate extends BaseMessage {
type: "presence:update";
payload: { userId: string; status: "online" | "away" | "offline"; };
}
export interface CursorMove extends BaseMessage {
type: "cursor:move";
payload: { userId: string; x: number; y: number; };
}
export interface PatchEvent extends BaseMessage {
type: "doc:patch";
payload: { docId: string; patch: unknown; version: number; };
}
export interface ServerAck extends BaseMessage {
type: "ack";
payload: { messageId: string; };
}
export interface ServerError extends BaseMessage {
type: "error";
payload: { code: string; message: string; };
}
// Client -> Server messages
export interface SendChat {
type: "chat:send";
payload: { roomId: string; text: string; };
}
export interface JoinRoom {
type: "room:join";
payload: { roomId: string; };
}
// Union types for type-safe dispatch and handling
export type ServerMessage =
| ChatMessage
| PresenceUpdate
| CursorMove
| PatchEvent
| ServerAck
| ServerError;
export type ClientMessage = SendChat | JoinRoom;
Connection State Machine
import { useEffect, useRef, useCallback, useReducer } from "react";
import type { ServerMessage, ClientMessage } from "@/types/ws-protocol";
type ConnectionStatus =
| "idle"
| "connecting"
| "connected"
| "reconnecting"
| "disconnected";
interface ConnectionState {
status: ConnectionStatus;
attempt: number;
lastError: string | null;
}
type ConnectionAction =
| { type: "CONNECT" }
| { type: "CONNECTED" }
| { type: "DISCONNECTED"; error?: string }
| { type: "RECONNECT" }
| { type: "GIVE_UP" };
function reducer(state: ConnectionState, action: ConnectionAction): ConnectionState {
switch (action.type) {
case "CONNECT":
return { ...state, status: "connecting", lastError: null };
case "CONNECTED":
return { status: "connected", attempt: 0, lastError: null };
case "DISCONNECTED":
return { ...state, status: "disconnected", lastError: action.error ?? null };
case "RECONNECT":
return { ...state, status: "reconnecting", attempt: state.attempt + 1 };
case "GIVE_UP":
return { ...state, status: "disconnected" };
default:
return state;
}
}
const MAX_RETRIES = 8;
const BASE_DELAY_MS = 500;
const MAX_DELAY_MS = 30_000;
export function useWebSocketConnection(
url: string,
onMessage: (msg: ServerMessage) => void
) {
const [state, dispatch] = useReducer(reducer, {
status: "idle",
attempt: 0,
lastError: null,
});
const wsRef = useRef<WebSocket | null>(null);
const seenIds = useRef(new Set<string>());
const retryTimer = useRef<ReturnType<typeof setTimeout> | undefined>(undefined);
// The retry count lives in a ref: the onclose handler is created once per socket,
// so reading state.attempt there would always see the value captured at creation
const attemptRef = useRef(0);
const connect = useCallback(() => {
dispatch({ type: "CONNECT" });
const ws = new WebSocket(url);
wsRef.current = ws;
ws.onopen = () => {
attemptRef.current = 0;
dispatch({ type: "CONNECTED" });
};
ws.onmessage = (event) => {
try {
const msg = JSON.parse(event.data) as ServerMessage;
// Deduplicate by message id to handle replay on reconnect
if (seenIds.current.has(msg.id)) return;
seenIds.current.add(msg.id);
// Trim the set to avoid unbounded memory growth
// (a Set iterates in insertion order, so the first entries are the oldest)
if (seenIds.current.size > 10_000) {
const oldest = [...seenIds.current].slice(0, 5_000);
oldest.forEach((id) => seenIds.current.delete(id));
}
onMessage(msg);
} catch {
// Malformed message: log and continue
}
};
ws.onclose = (event) => {
dispatch({ type: "DISCONNECTED", error: event.reason });
// Do not retry on clean close (code 1000) or auth failure (4001)
if (event.code === 1000 || event.code === 4001) return;
attemptRef.current += 1;
const attempt = attemptRef.current;
if (attempt >= MAX_RETRIES) {
dispatch({ type: "GIVE_UP" });
return;
}
// Exponential backoff with jitter
const delay = Math.min(BASE_DELAY_MS * 2 ** attempt, MAX_DELAY_MS);
const jitter = Math.random() * delay * 0.2; // 20% jitter
dispatch({ type: "RECONNECT" });
retryTimer.current = setTimeout(connect, delay + jitter);
};
ws.onerror = () => {
// An error event is followed by a close event; reconnection is handled in onclose
};
}, [url, onMessage]);
const send = useCallback((msg: ClientMessage) => {
if (wsRef.current?.readyState === WebSocket.OPEN) {
wsRef.current.send(JSON.stringify(msg));
}
}, []);
useEffect(() => {
connect();
return () => {
clearTimeout(retryTimer.current);
wsRef.current?.close(1000, "Component unmounted");
};
}, []);
return { status: state.status, lastError: state.lastError, send };
}
Integrating with TanStack Query
import { useCallback } from "react";
import { useQueryClient } from "@tanstack/react-query";
import { useWebSocketConnection } from "./useWebSocketConnection";
import type { ServerMessage, ChatMessage } from "@/types/ws-protocol";
export function useRealtimeRoom(roomId: string) {
const queryClient = useQueryClient();
const handleMessage = useCallback((msg: ServerMessage) => {
switch (msg.type) {
case "chat:message":
// Append new message to the cached list
queryClient.setQueryData<ChatMessage[]>(
["chat", roomId, "messages"],
(old) => (old ? [...old, msg] : [msg])
);
break;
case "presence:update":
// Update a single user's presence in the presence map
queryClient.setQueryData<Record<string, string>>(
["chat", roomId, "presence"],
(old) => ({ ...old, [msg.payload.userId]: msg.payload.status })
);
break;
case "error":
// Surface server-side errors to the appropriate query
console.error("[WS Error]", msg.payload.code, msg.payload.message);
break;
}
}, [roomId, queryClient]);
const { status, send } = useWebSocketConnection(
`${process.env.NEXT_PUBLIC_WS_URL}/rooms/${roomId}`,
handleMessage
);
return { connectionStatus: status, send };
}
Server-Sent Events
Server-Sent Events (SSE) deliver a stream of server-initiated events over a standard HTTP connection. Unlike WebSockets, the protocol is plain HTTP: no upgrade handshake, no custom framing, and full compatibility with HTTP/2 multiplexing and existing infrastructure like load balancers and CDN edge nodes. The browser automatically reconnects with the last event ID so the server can replay missed events without the client needing to implement any retry logic.
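To make the wire format and the replay behaviour concrete, the sketch below serialises an event in `text/event-stream` framing and selects the events a reconnecting client missed, based on its `Last-Event-ID` header. The `BufferedEvent` shape, the in-memory buffer, and the use of a sequence number as the event ID are assumptions for illustration.

```typescript
// Hypothetical buffered event; the numeric sequence id is an assumption.
interface BufferedEvent {
  id: number; // Monotonically increasing sequence number
  event: string; // SSE event name
  data: unknown; // JSON-serialisable payload
}

// Serialise one event in text/event-stream framing.
// Each field ends with "\n"; the blank line after the last field dispatches the event.
// JSON.stringify never emits raw newlines, so a single data line is sufficient.
function formatSSE(e: BufferedEvent): string {
  return `id: ${e.id}\nevent: ${e.event}\ndata: ${JSON.stringify(e.data)}\n\n`;
}

// Select the events a reconnecting client missed.
// lastEventId is the value of the Last-Event-ID request header, if present.
function eventsSince(buffer: BufferedEvent[], lastEventId: string | null): BufferedEvent[] {
  // No header: a fresh client, nothing to replay
  if (lastEventId === null || lastEventId.trim() === "") return [];
  const last = Number(lastEventId);
  // Unparseable header: treat it like a fresh client
  if (!Number.isFinite(last)) return [];
  return buffer.filter((e) => e.id > last);
}
```

A route handler could call `eventsSince` once at the start of the stream and enqueue each result through `formatSSE` before subscribing to live events.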
import { auth } from "@/lib/auth";
import { eventBus } from "@/lib/event-bus";
import { NextRequest } from "next/server";
export const runtime = "nodejs"; // Edge runtime does not support long-lived responses
export async function GET(
req: NextRequest,
{ params }: { params: { channel: string } }
) {
const session = await auth();
if (!session) {
return new Response("Unauthorized", { status: 401 });
}
const { channel } = params;
const encoder = new TextEncoder();
const stream = new ReadableStream({
start(controller) {
// Helper to send a named SSE event
function sendEvent(eventName: string, data: unknown) {
// Each field ends with \n; the blank line (\n\n) after the last field dispatches the event
const payload =
`event: ${eventName}\n` +
`data: ${JSON.stringify(data)}\n` +
`id: ${Date.now()}\n\n`;
controller.enqueue(encoder.encode(payload));
}
// Send a heartbeat comment every 15s to prevent proxy timeouts
const heartbeat = setInterval(() => {
controller.enqueue(encoder.encode(": heartbeat\n\n"));
}, 15_000);
// Subscribe to the event bus for this channel
const unsubscribe = eventBus.subscribe(channel, (event) => {
sendEvent(event.type, event.payload);
});
// Clean up when the client disconnects
req.signal.addEventListener("abort", () => {
clearInterval(heartbeat);
unsubscribe();
controller.close();
});
// Send connection confirmation
sendEvent("connected", { channel, userId: session.user.id });
},
});
return new Response(stream, {
headers: {
"Content-Type": "text/event-stream",
"Cache-Control": "no-cache, no-transform",
"X-Accel-Buffering": "no", // Disable nginx buffering
Connection: "keep-alive",
},
});
}

import { useEffect, useRef } from "react";
type SSEHandler<T> = (data: T) => void;
type SSEHandlers = Record<string, SSEHandler<unknown>>;
interface UseSSEOptions {
withCredentials?: boolean;
onConnect?: () => void;
onDisconnect?: () => void;
onError?: (error: Event) => void;
}
export function useSSE(url: string, handlers: SSEHandlers, opts: UseSSEOptions = {}) {
const handlersRef = useRef(handlers);
handlersRef.current = handlers; // Always up-to-date without re-creating the effect
useEffect(() => {
let es: EventSource;
let retryCount = 0;
let stopped = false;
function connect() {
es = new EventSource(url, { withCredentials: opts.withCredentials ?? true });
es.onopen = () => {
retryCount = 0;
opts.onConnect?.();
};
// Register all named event handlers
for (const [eventName, handler] of Object.entries(handlersRef.current)) {
es.addEventListener(eventName, (event: MessageEvent) => {
try {
handler(JSON.parse(event.data));
} catch {
handler(event.data);
}
});
}
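es.onerror = (error) => {
opts.onError?.(error);
// While readyState is CONNECTING the browser is already retrying on its own
// and will resend Last-Event-ID, so leave the connection alone
if (es.readyState !== EventSource.CLOSED) return;
opts.onDisconnect?.();
// The browser has given up (for example on a non-200 response): layer our own backoff
if (!stopped && retryCount < 10) {
const delay = Math.min(500 * 2 ** retryCount, 30_000);
retryCount++;
// Re-check the flag so a retry scheduled before unmount does not reopen the stream
setTimeout(() => {
if (!stopped) connect();
}, delay);
}
};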
}
connect();
return () => {
stopped = true;
es?.close();
};
}, [url]); // Only reconnect if URL changes
}
// Usage:
// useSSE("/api/events/notifications", {
// notification: (data) => addNotification(data as Notification),
// "notifications:clear": () => clearAll(),
// });
Polling Strategies
Polling is underrated. For updates that arrive every 30 seconds or less often, or whose arrival time is unpredictable, polling is simpler to reason about, easier to scale, and compatible with standard HTTP caching. The key is making polling smart: it should adapt to context, stop when the data is no longer relevant, and use conditional requests to avoid transferring unchanged payloads.
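The client side of a conditional request can be sketched as two small pure functions: one builds the `If-None-Match` header from the last ETag seen, the other folds a response into the poller's state. Browsers usually do this through the HTTP cache without any code; the explicit form below is a sketch for clients that manage ETags themselves. `PollState` and `PollResult` are hypothetical shapes.

```typescript
// What the poller remembers between ticks.
interface PollState<T> {
  etag: string | null;
  data: T | null;
}

// Hypothetical summary of an HTTP response that the caller has already read.
interface PollResult<T> {
  status: number;
  etag: string | null;
  body: T | null;
}

// Build request headers: send If-None-Match only once an ETag is known.
function conditionalHeaders(etag: string | null): Record<string, string> {
  return etag === null ? {} : { "If-None-Match": etag };
}

// Fold one poll response into the previous state.
function applyPollResult<T>(prev: PollState<T>, res: PollResult<T>): PollState<T> {
  // 304 Not Modified: no body was transferred, keep what we already have
  if (res.status === 304) return prev;
  if (res.status < 200 || res.status >= 300) {
    throw new Error(`Poll failed: ${res.status}`);
  }
  return { etag: res.etag, data: res.body };
}
```

Keeping the two steps free of `fetch` makes the polling policy easy to unit test and to reuse across transports.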
import { useQuery } from "@tanstack/react-query";
import { useEffect, useRef } from "react";
type TerminalStatus = "delivered" | "failed" | "cancelled" | "refunded";
const TERMINAL: Set<string> = new Set(["delivered", "failed", "cancelled", "refunded"]);
interface OrderStatusResponse {
status: string;
updatedAt: string;
}
export function useOrderStatus(orderId: string) {
const isHidden = useRef(false);
// Track tab visibility to reduce polling when the tab is backgrounded
useEffect(() => {
function handleVisibility() {
isHidden.current = document.visibilityState === "hidden";
}
document.addEventListener("visibilitychange", handleVisibility);
return () => document.removeEventListener("visibilitychange", handleVisibility);
}, []);
return useQuery({
queryKey: ["orders", orderId, "status"],
queryFn: () =>
fetch(`/api/orders/${orderId}/status`).then((r) => r.json()) as Promise<OrderStatusResponse>,
refetchInterval: (query) => {
// Stop polling when the order reaches a terminal state
const status = query.state.data?.status;
if (status && TERMINAL.has(status)) return false;
// Poll less frequently when the tab is hidden
return isHidden.current ? 60_000 : 10_000;
},
// Re-fetch immediately when the tab regains focus
refetchOnWindowFocus: true,
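// Treat cached data as stale immediately so every interval tick and focus event refetches;
// the previous data stays on screen while the refetch is in flight
staleTime: 0,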
});
}

import { NextRequest, NextResponse } from "next/server";
import { getOrderStatus } from "@/lib/orders";
import { createETag } from "@/lib/etag";
export async function GET(
req: NextRequest,
{ params }: { params: { id: string } }
) {
const order = await getOrderStatus(params.id);
const etag = createETag(order);
// Conditional request: return 304 if data has not changed
// This avoids transferring the payload on every poll tick
if (req.headers.get("If-None-Match") === etag) {
return new Response(null, { status: 304 });
}
return NextResponse.json(order, {
headers: {
ETag: etag,
"Cache-Control": "no-cache", // Revalidate on every request, but use ETag
},
});
}
Connection State Management
A real-time connection is not binary (connected or not). It passes through a lifecycle: idle before the first connect, connecting during the handshake, connected when open, reconnecting during retry delays after a drop, and disconnected after giving up. Exposing this state to the UI is what allows you to show meaningful feedback rather than silently showing stale data.
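As a rough illustration of exposing that lifecycle to the UI, the sketch below maps each connection status to an optional banner. The status names come from this guide; the `Banner` shape and the wording of the messages are placeholders.

```typescript
type ConnectionStatus =
  | "idle"
  | "connecting"
  | "connected"
  | "reconnecting"
  | "disconnected";

// Hypothetical view model for a status banner.
interface Banner {
  text: string;
  tone: "info" | "warning" | "error";
}

// Returns null when there is nothing worth telling the user.
function connectionBanner(status: ConnectionStatus, attempt: number): Banner | null {
  switch (status) {
    case "idle":
    case "connected":
      return null;
    case "connecting":
      return { text: "Connecting...", tone: "info" };
    case "reconnecting":
      return { text: `Connection lost. Reconnecting (attempt ${attempt})...`, tone: "warning" };
    case "disconnected":
      return { text: "Offline. The data shown may be out of date.", tone: "error" };
  }
}
```

Because the switch is exhaustive over the union, adding a new status later produces a compile error here rather than a silent gap in the UI.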
Heartbeat and Keepalive
Proxies, load balancers, and mobile networks terminate idle TCP connections after a timeout that is often shorter than you expect (60 seconds is common). A heartbeat prevents these silent disconnections by sending a ping at regular intervals and expecting a pong back.
export class WebSocketHeartbeat {
private pingTimer: ReturnType<typeof setInterval> | null = null;
private pongTimeout: ReturnType<typeof setTimeout> | null = null;
private readonly PING_INTERVAL = 25_000; // Ping every 25s
private readonly PONG_TIMEOUT = 5_000; // Consider dead if no pong in 5s
constructor(
private ws: WebSocket,
private onDead: () => void
) {}
start() {
this.pingTimer = setInterval(() => {
if (this.ws.readyState !== WebSocket.OPEN) return;
this.ws.send(JSON.stringify({ type: "ping" }));
this.pongTimeout = setTimeout(() => {
// No pong received: close and trigger reconnect
this.ws.close(4000, "Ping timeout");
this.onDead();
}, this.PONG_TIMEOUT);
}, this.PING_INTERVAL);
}
receivedPong() {
if (this.pongTimeout) {
clearTimeout(this.pongTimeout);
this.pongTimeout = null;
}
}
stop() {
if (this.pingTimer) clearInterval(this.pingTimer);
if (this.pongTimeout) clearTimeout(this.pongTimeout);
}
}
Presence System
A presence system tracks which users are currently online, away, or offline within a shared context (a room, a document, a page). The standard pattern is: on connect, the client broadcasts its status; on disconnect, the server broadcasts an offline status on the client's behalf; and in between, the client sends a periodic “I am still here” heartbeat and moves from “online” to “away” after a period of inactivity.
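One way to express that pattern is to derive status from two timestamps instead of storing it directly: the last heartbeat proves the connection is alive, and the last user activity separates "online" from "away". This is a sketch under assumed thresholds; `PresenceRecord` and the 45-second offline cutoff are hypothetical, while the 120-second away threshold mirrors the default used in the hook in this section.

```typescript
type PresenceStatus = "online" | "away" | "offline";

// Hypothetical per-user record kept by whoever computes presence.
interface PresenceRecord {
  lastHeartbeatAt: number; // ms since epoch of the last "still here" heartbeat
  lastActivityAt: number; // ms since epoch of the last reported user input
}

const OFFLINE_AFTER_MS = 45_000; // Assumed: no heartbeat for this long means offline
const AWAY_AFTER_MS = 120_000; // No user input for this long means away

function derivePresence(rec: PresenceRecord, now: number): PresenceStatus {
  // A missing heartbeat outranks everything: the connection is gone
  if (now - rec.lastHeartbeatAt > OFFLINE_AFTER_MS) return "offline";
  // Connected but idle
  if (now - rec.lastActivityAt > AWAY_AFTER_MS) return "away";
  return "online";
}
```

Deriving the status this way means a client that vanishes without a clean disconnect still ages out to "offline" on its own.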
import { useEffect, useRef } from "react";
type PresenceStatus = "online" | "away" | "offline";
interface UsePresenceOptions {
sendStatus: (status: PresenceStatus) => void;
awayAfterMs?: number;
}
export function usePresence({ sendStatus, awayAfterMs = 120_000 }: UsePresenceOptions) {
const awayTimer = useRef<ReturnType<typeof setTimeout>>();
const currentStatus = useRef<PresenceStatus>("online");
function setStatus(status: PresenceStatus) {
if (currentStatus.current === status) return;
currentStatus.current = status;
sendStatus(status);
}
function resetAwayTimer() {
clearTimeout(awayTimer.current);
if (currentStatus.current !== "online") setStatus("online");
awayTimer.current = setTimeout(() => setStatus("away"), awayAfterMs);
}
useEffect(() => {
const events = ["mousemove", "keydown", "touchstart", "scroll", "click"];
events.forEach((e) => window.addEventListener(e, resetAwayTimer, { passive: true }));
function handleVisibility() {
if (document.visibilityState === "hidden") {
setStatus("away");
} else {
resetAwayTimer();
}
}
document.addEventListener("visibilitychange", handleVisibility);
setStatus("online");
resetAwayTimer();
return () => {
clearTimeout(awayTimer.current);
events.forEach((e) => window.removeEventListener(e, resetAwayTimer));
document.removeEventListener("visibilitychange", handleVisibility);
};
}, []);
}
Optimistic UI at Real-Time Scale
Optimistic updates become more complex in a real-time context because the authoritative state can arrive from two sources: the server's response to your own mutation, and real-time events from other clients. An optimistic update you applied to the local cache may be overwritten by a real-time event before the server has even responded to your request.
The Problem
You optimistically mark a task “done”. Before the server responds, a real-time event arrives showing another user changed the same task. Your optimistic state was based on a version that is now stale. Applying the real-time event would silently discard your change.
The Solution
Track pending mutations by operation ID. When a real-time event arrives, check if there are pending mutations for the same entity. If so, defer applying the event until the mutation settles (confirmed or rolled back), then apply the event on top of the confirmed state.
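The deferral step described above can be sketched as a small self-contained buffer: events for an entity are held while any mutation for it is in flight, then replayed in arrival order once the last one settles. `DeferredEvents` and its method names are hypothetical, and the sketch leaves out timeouts for mutations that never settle.

```typescript
class DeferredEvents<E> {
  // entityId -> ids of mutations still in flight
  private pendingOps = new Map<string, Set<string>>();
  // entityId -> events held back, in arrival order
  private deferred = new Map<string, E[]>();
  private applyEvent: (event: E) => void;

  constructor(applyEvent: (event: E) => void) {
    this.applyEvent = applyEvent;
  }

  // Call when an optimistic mutation starts.
  begin(entityId: string, operationId: string): void {
    const ops = this.pendingOps.get(entityId) ?? new Set<string>();
    ops.add(operationId);
    this.pendingOps.set(entityId, ops);
  }

  // Call for every incoming real-time event: apply now, or hold it.
  receive(entityId: string, event: E): void {
    if (this.pendingOps.has(entityId)) {
      const held = this.deferred.get(entityId) ?? [];
      held.push(event);
      this.deferred.set(entityId, held);
      return;
    }
    this.applyEvent(event);
  }

  // Call when a mutation is confirmed or rolled back.
  settle(entityId: string, operationId: string): void {
    const ops = this.pendingOps.get(entityId);
    if (!ops) return;
    ops.delete(operationId);
    if (ops.size > 0) return; // Other mutations for this entity are still in flight
    this.pendingOps.delete(entityId);
    const held = this.deferred.get(entityId) ?? [];
    this.deferred.delete(entityId);
    for (const event of held) this.applyEvent(event);
  }
}
```

In a mutation hook, `begin` would pair with the optimistic update and `settle` with the settled callback, so both the success and the rollback path release the held events.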
interface PendingMutation<T> {
operationId: string;
entityId: string;
optimisticValue: T;
timestamp: number;
}
// A simple in-memory store for pending mutations
// In a more complex app, this would live in Zustand or a context
const pending = new Map<string, PendingMutation<unknown>>();
export const optimisticQueue = {
add<T>(entityId: string, optimisticValue: T): string {
const operationId = crypto.randomUUID();
pending.set(operationId, {
operationId,
entityId,
optimisticValue,
timestamp: Date.now(),
});
return operationId;
},
remove(operationId: string) {
pending.delete(operationId);
},
hasPendingFor(entityId: string): boolean {
for (const m of pending.values()) {
if (m.entityId === entityId) return true;
}
return false;
},
// Call this when a real-time event arrives for an entity that has pending mutations.
// Returns true if the event was deferred (should not be applied yet).
shouldDefer(entityId: string): boolean {
return this.hasPendingFor(entityId);
},
};

import { useMutation, useQueryClient } from "@tanstack/react-query";
import { optimisticQueue } from "@/lib/optimistic-queue";
import type { Task } from "@/types/api";
export function useCompleteTask(taskId: string) {
const queryClient = useQueryClient();
return useMutation({
mutationFn: () =>
fetch(`/api/tasks/${taskId}/complete`, { method: "POST" }).then((r) => r.json()),
onMutate: async () => {
await queryClient.cancelQueries({ queryKey: ["tasks", taskId] });
const snapshot = queryClient.getQueryData<Task>(["tasks", taskId]);
const operationId = optimisticQueue.add(taskId, { status: "done" });
queryClient.setQueryData<Task>(["tasks", taskId], (old) =>
old ? { ...old, status: "done" } : old
);
return { snapshot, operationId };
},
onError: (_err, _vars, context) => {
if (context?.snapshot) {
queryClient.setQueryData(["tasks", taskId], context.snapshot);
}
if (context?.operationId) {
optimisticQueue.remove(context.operationId);
}
},
onSuccess: (_data, _vars, context) => {
if (context?.operationId) {
optimisticQueue.remove(context.operationId);
}
},
onSettled: () => {
queryClient.invalidateQueries({ queryKey: ["tasks", taskId] });
},
});
}
// In the WebSocket message handler, defer events for entities with pending mutations:
// if (optimisticQueue.shouldDefer(msg.payload.taskId)) {
// deferredEvents.push(msg);
// return;
// }
// After the mutation settles, replay deferred events.
Real-Time State Synchronisation
The core challenge of real-time state synchronisation is maintaining a consistent view of shared state across clients when updates arrive asynchronously and out of order. Two patterns dominate in 2026: using TanStack Query as the real-time cache (simple, most apps), and a normalised entity store with a Zustand slice updated by real-time events (complex, high-frequency).
TanStack Query as the Real-Time Cache
For most applications, TanStack Query's setQueryData is all you need. Real-time events update the cache directly, and components re-render because they are subscribed to those query keys. This requires no separate state management layer.
import { QueryClient } from "@tanstack/react-query";
import type { ServerMessage, ChatMessage } from "@/types/ws-protocol";
// A single function that routes incoming real-time events to the correct query keys.
// Register this as the onMessage handler for your WebSocket or SSE connection.
export function createRealtimeSyncHandler(queryClient: QueryClient) {
return function handleServerMessage(msg: ServerMessage) {
switch (msg.type) {
case "chat:message":
queryClient.setQueryData<ChatMessage[]>(
["rooms", msg.payload.roomId, "messages"],
(old) => {
if (!old) return [msg];
// Prevent duplicates on reconnect replay
if (old.some((m) => m.id === msg.id)) return old;
return [...old, msg];
}
);
break;
case "presence:update":
queryClient.setQueryData<Record<string, string>>(
["presence", msg.payload.userId.split(":")[0]], // roomId from userId scope
(old) => ({ ...old, [msg.payload.userId]: msg.payload.status })
);
break;
case "doc:patch":
// For document patches, invalidate and re-fetch rather than applying the patch
// client-side to avoid implementing an OT engine in the frontend
queryClient.invalidateQueries({ queryKey: ["docs", msg.payload.docId] });
break;
default:
// Unhandled message types: log in development only
if (process.env.NODE_ENV === "development") {
console.warn("[RealtimeSync] Unhandled message type:", (msg as ServerMessage).type);
}
}
};
}
High-Frequency Updates with Zustand
For high-frequency events like cursor positions or typing indicators that update many times per second, TanStack Query introduces unnecessary overhead (re-fetching, cache GC, devtools updates). A Zustand slice with subscribeWithSelector provides surgical re-renders with minimal overhead.
import { create } from "zustand";
import { subscribeWithSelector } from "zustand/middleware";
import { throttle } from "lodash-es";
interface CursorPosition { x: number; y: number; }
interface CursorsState {
// userId -> position
cursors: Record<string, CursorPosition>;
setCursor: (userId: string, pos: CursorPosition) => void;
removeCursor: (userId: string) => void;
}
export const useCursorsStore = create<CursorsState>()(
subscribeWithSelector((set) => ({
cursors: {},
setCursor: (userId, pos) =>
set((state) => ({
cursors: { ...state.cursors, [userId]: pos },
})),
removeCursor: (userId) =>
set((state) => {
const { [userId]: _, ...rest } = state.cursors;
return { cursors: rest };
}),
}))
);
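// Throttle incoming cursor events to 60fps maximum before writing to the store.
// The throttle is per user: a single shared throttle would drop user A's update
// whenever user B's event arrived inside the same 16ms window
const throttledByUser = new Map<string, (x: number, y: number) => void>();
export function handleCursorMove(userId: string, x: number, y: number) {
let update = throttledByUser.get(userId);
if (!update) {
update = throttle((nx: number, ny: number) => {
useCursorsStore.getState().setCursor(userId, { x: nx, y: ny });
}, 16); // ~60fps
throttledByUser.set(userId, update);
}
update(x, y);
}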
// A component subscribes only to a specific user's cursor to minimise re-renders
export function useUserCursor(userId: string): CursorPosition | undefined {
return useCursorsStore((state) => state.cursors[userId]);
}
Conflict Resolution
When multiple clients can modify the same data concurrently, conflicts are inevitable. A conflict occurs when two clients apply changes to the same entity based on the same prior state, producing divergent versions that must be reconciled. The resolution strategy you choose has consequences for data integrity, implementation complexity, and user experience.
| Strategy | How It Works | Data Loss Risk | Complexity | Best For |
|---|---|---|---|---|
| Last Write Wins (LWW) | Highest timestamp or version wins | High (concurrent writes lost silently) | Very low | Presence status, cursor position, settings |
| Server Wins | Server state always overrides client optimistic state | Medium (user edits reverted) | Low | Order status, payment state, inventory |
| Client Wins | Client optimistic state is reapplied after server response | Medium (concurrent edits lost) | Medium | Single-user editing, form submissions |
| Operational Transform (OT) | Transform concurrent operations against each other before applying | None (all changes preserved) | Very high | Text documents (Google Docs model) |
| CRDT | Data structures that merge deterministically without a server | None (convergent by design) | Medium-High | Text, lists, counters, sets in collaborative apps |
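To give a feel for what "merge deterministically" means in the CRDT row, here is a sketch of a grow-only counter (G-Counter), one of the simplest CRDTs. It is an illustration of the convergence property only and makes no claim about how Yjs or any other library represents data; the function names are hypothetical.

```typescript
// replicaId -> number of increments made by that replica
type GCounter = Record<string, number>;

// Each replica only ever increases its own slot.
function gcIncrement(counter: GCounter, replicaId: string, by = 1): GCounter {
  return { ...counter, [replicaId]: (counter[replicaId] ?? 0) + by };
}

// Merge takes the per-replica maximum. It is commutative, associative, and
// idempotent, so replicas converge regardless of delivery order or duplicates.
function gcMerge(a: GCounter, b: GCounter): GCounter {
  const out: GCounter = { ...a };
  for (const [id, n] of Object.entries(b)) {
    out[id] = Math.max(out[id] ?? 0, n);
  }
  return out;
}

// The counter's value is the sum over all replicas.
function gcValue(counter: GCounter): number {
  return Object.values(counter).reduce((sum, n) => sum + n, 0);
}
```

Two clients that increment offline and later exchange state in either order end up with the same total, which is the property that lets CRDT-based apps sync without a central arbiter.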
Version-Based Conflict Detection
// Attach a version to every mutable entity
export interface Versioned<T> {
data: T;
version: number; // Monotonically increasing server-assigned version
updatedAt: string; // ISO 8601 for human-readable ordering
}
// When sending a mutation, include the version the client based the change on
export interface MutationWithVersion<T> {
payload: T;
baseVersion: number; // The version the client had when they started editing
}
// The server compares baseVersion to the current version:
// - If they match: no concurrent modification, apply the change
// - If they differ: conflict, return 409 with the current state for the client to resolve
// Client-side conflict handler
export async function submitWithConflictResolution<T>(
url: string,
mutation: MutationWithVersion<T>,
onConflict: (serverVersion: Versioned<T>, clientChange: T) => T
): Promise<Versioned<T>> {
const res = await fetch(url, {
method: "PATCH",
headers: { "Content-Type": "application/json" },
body: JSON.stringify(mutation),
});
if (res.status === 409) {
// Server returned the current state alongside the conflict
const serverVersion: Versioned<T> = await res.json();
// Ask the application layer to resolve the conflict
const resolved = onConflict(serverVersion, mutation.payload);
// Retry with the resolved value based on the server's current version
return submitWithConflictResolution(
url,
{ payload: resolved, baseVersion: serverVersion.version },
onConflict
);
}
if (!res.ok) throw new Error(`Mutation failed: ${res.status}`);
return res.json();
}
Collaborative Application Architecture
Building a genuinely collaborative experience, where multiple users can edit the same document, canvas, or data structure concurrently without conflicts, requires a fundamentally different data model than a traditional CRUD application. The 2026 standard for collaborative rich text and structured documents is Yjs, a CRDT-based shared data structure library with providers for synchronisation over WebSocket, WebRTC, and IndexedDB.
import * as Y from "yjs";
import { WebsocketProvider } from "y-websocket";
import { IndexeddbPersistence } from "y-indexeddb";
export function createCollaborativeDoc(docId: string, userId: string) {
// The Y.Doc is the shared CRDT document
const doc = new Y.Doc();
// IndexedDB persistence: the document survives page refreshes and offline periods
const localProvider = new IndexeddbPersistence(`doc-${docId}`, doc);
// WebSocket provider: synchronises with the server and other clients in real time
const wsProvider = new WebsocketProvider(
process.env.NEXT_PUBLIC_YJS_SERVER!,
docId,
doc,
{
connect: true,
// Re-request the full server state every 5s as a safety net against missed updates
resyncInterval: 5_000,
}
);
// The awareness protocol carries ephemeral per-user state (cursor, selection, name)
const { awareness } = wsProvider;
awareness.setLocalStateField("user", {
id: userId,
name: "Current User",
color: generateUserColor(userId),
cursor: null,
});
return {
doc,
wsProvider,
localProvider,
awareness,
// Shared data types within the document
text: doc.getText("content"),
comments: doc.getArray("comments"),
metadata: doc.getMap("metadata"),
destroy() {
awareness.setLocalState(null); // Signal offline to other users
wsProvider.destroy();
localProvider.destroy();
},
};
}
function generateUserColor(userId: string): string {
// Deterministic colour from userId so the same user always gets the same colour
const hash = userId.split("").reduce((acc, c) => acc + c.charCodeAt(0), 0);
const hue = hash % 360;
return `hsl(${hue}, 70%, 55%)`;
}

import { useEditor, EditorContent } from "@tiptap/react";
import StarterKit from "@tiptap/starter-kit";
import Collaboration from "@tiptap/extension-collaboration";
import CollaborationCursor from "@tiptap/extension-collaboration-cursor";
import { createCollaborativeDoc } from "@/lib/collaboration/doc-provider";
import { useEffect, useState } from "react";
interface Props {
docId: string;
userId: string;
userName: string;
}
export function CollaborativeEditor({ docId, userId, userName }: Props) {
// Lazy initialiser: the document and its providers are created once per mount,
// not on every render. Render this component with key={docId} so that switching
// documents remounts it with a fresh instance.
const [collab] = useState(() => createCollaborativeDoc(docId, userId));
useEffect(() => {
// Close the providers when the component unmounts
return () => collab.destroy();
}, [collab]);
const editor = useEditor({
extensions: [
StarterKit.configure({ history: false }), // Yjs manages history
Collaboration.configure({ document: collab.doc }),
CollaborationCursor.configure({
provider: collab.wsProvider,
user: { name: userName, color: "#0ea5e9" },
// Render each remote user's cursor with their name label
render: (user) => {
const cursor = document.createElement("span");
cursor.classList.add("collaboration-cursor__caret");
cursor.style.borderColor = user.color;
const label = document.createElement("div");
label.classList.add("collaboration-cursor__label");
label.style.backgroundColor = user.color;
label.textContent = user.name;
cursor.appendChild(label);
return cursor;
},
}),
],
editorProps: {
attributes: { class: "prose focus:outline-none" },
},
});
return <EditorContent editor={editor} />;
}
Managed Real-Time Platforms
Self-hosting a WebSocket server with Yjs means managing sticky sessions, Redis pub/sub for fan-out, and a persistence layer for document state. Managed platforms take on that operational work:
| Platform | Best For | CRDT Support | Presence | Self-Host Option |
|---|---|---|---|---|
| Liveblocks | Collaborative SaaS products | Yes (Yjs + custom) | Yes | No |
| PartyKit | Multiplayer web apps, Cloudflare Workers-native | Yes (via Yjs) | Yes | Yes (open source) |
| Ably | Pub/sub at scale, Spaces API for presence | Limited (LiveSync) | Yes (Spaces) | No |
| Supabase Realtime | Postgres-backed real-time with broadcast + presence | No | Yes | Yes (open source) |
Real-Time Caching Strategies
Caching in a real-time system is more nuanced than in a request/response API because the cache can be invalidated by events that arrive outside the normal query lifecycle. The key is treating real-time events as cache mutations rather than as signals to refetch: apply the delta directly to the cache when you have the full event payload, and invalidate to trigger a refetch only when the event carries an incomplete picture.
| Event Type | Cache Strategy | Why |
|---|---|---|
| Full entity update | setQueryData(key, newEntity) | Payload is complete: apply directly, no round-trip |
| Partial field update | setQueryData(key, merge(old, patch)) | Merge patch into cached entity |
| Entity created | Prepend to list + setQueryData for detail | Avoid invalidating the whole list |
| Entity deleted | Filter from list + removeQueries for detail | Remove stale cache entries immediately |
| Aggregate changed | invalidateQueries(key) | Cannot reconstruct aggregate client-side safely |
| High-frequency delta | Throttled Zustand write, not query cache | TanStack Query has overhead unsuitable for 60fps events |
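For the high-frequency row, one option is to coalesce deltas before they reach any store. The sketch below is illustrative rather than a prescribed implementation: `createCoalescingBuffer`, `push`, and `flush` are hypothetical names, it assumes that a later delta for the same key supersedes an earlier one, and `onFlush` stands in for the single throttled Zustand write.

```typescript
// Hypothetical helper: buffers high-frequency deltas and hands them to the
// store in one batch per interval instead of one write per event.
function createCoalescingBuffer<T>(
  onFlush: (batch: Map<string, T>) => void,
  intervalMs: number,
  // Injectable so the buffer can be driven manually in tests
  schedule: (fn: () => void, ms: number) => void = (fn, ms) => {
    setTimeout(fn, ms);
  }
) {
  let pending = new Map<string, T>();
  let flushScheduled = false;

  function flush(): void {
    flushScheduled = false;
    if (pending.size === 0) return;
    const batch = pending;
    pending = new Map<string, T>();
    onFlush(batch);
  }

  return {
    // A later value for the same key replaces the earlier one
    push(key: string, value: T): void {
      pending.set(key, value);
      if (!flushScheduled) {
        flushScheduled = true;
        schedule(flush, intervalMs);
      }
    },
    flush,
  };
}
```

With a 100 ms interval, several ticks for the same key inside one window reach `onFlush` as a single batch carrying only the latest value, so the UI re-renders once per window rather than once per event.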
import { QueryClient } from "@tanstack/react-query";
export function applyRealtimeCacheOperation(
queryClient: QueryClient,
event: {
operation: "create" | "update" | "delete";
entity: string;
id: string;
payload?: unknown;
}
) {
const detailKey = [event.entity, event.id];
const listKey = [event.entity];
switch (event.operation) {
case "create":
// Set the detail cache entry
if (event.payload) {
queryClient.setQueryData(detailKey, event.payload);
}
// Prepend to any active list queries without refetching
queryClient.setQueriesData<{ data: unknown[]; meta: unknown }>(
{ queryKey: listKey, exact: false },
(old) => {
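// The prefix match also hits detail entries ([entity, id]); skip anything that is not a list page
if (!old || !Array.isArray(old.data) || !event.payload) return old;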
return { ...old, data: [event.payload, ...old.data] };
}
);
break;
case "update":
if (event.payload) {
// Update detail
queryClient.setQueryData(detailKey, event.payload);
// Patch matching items in all list caches
queryClient.setQueriesData<{ data: unknown[]; meta: unknown }>(
{ queryKey: listKey, exact: false },
(old) => {
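// Skip detail entries matched by the key prefix; only list pages carry a data array
if (!old || !Array.isArray(old.data)) return old;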
return {
...old,
data: old.data.map((item) =>
(item as { id: string }).id === event.id ? event.payload : item
),
};
}
);
}
break;
case "delete":
queryClient.removeQueries({ queryKey: detailKey });
queryClient.setQueriesData<{ data: unknown[]; meta: unknown }>(
{ queryKey: listKey, exact: false },
(old) => {
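// Skip detail entries matched by the key prefix; only list pages carry a data array
if (!old || !Array.isArray(old.data)) return old;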
return {
...old,
data: old.data.filter((item) => (item as { id: string }).id !== event.id),
};
}
);
break;
}
}
Offline Synchronisation
An offline-capable application continues to work without a network connection and reconciles local changes with the server when connectivity is restored. This requires three things: a local persistence layer to queue mutations, detection of online/offline transitions, and a sync strategy that handles conflicts between offline mutations and changes that happened on the server while the client was disconnected.
import { openDB, IDBPDatabase } from "idb";
interface QueuedMutation {
id: string;
url: string;
method: string;
body: string;
timestamp: number;
retryCount: number;
}
const DB_NAME = "offline-mutations";
const STORE = "queue";
async function getDb(): Promise<IDBPDatabase> {
return openDB(DB_NAME, 1, {
upgrade(db) {
db.createObjectStore(STORE, { keyPath: "id" });
},
});
}
export const mutationQueue = {
async enqueue(url: string, method: string, body: unknown): Promise<string> {
const db = await getDb();
const id = crypto.randomUUID();
await db.put(STORE, {
id,
url,
method,
body: JSON.stringify(body),
timestamp: Date.now(),
retryCount: 0,
});
return id;
},
async getAll(): Promise<QueuedMutation[]> {
const db = await getDb();
return db.getAll(STORE);
},
async remove(id: string): Promise<void> {
const db = await getDb();
await db.delete(STORE, id);
},
async incrementRetry(id: string): Promise<void> {
const db = await getDb();
const item = await db.get(STORE, id);
if (item) {
await db.put(STORE, { ...item, retryCount: item.retryCount + 1 });
}
},
};
import { mutationQueue } from "./mutation-queue";
type SyncResult =
| { id: string; status: "ok" }
| { id: string; status: "conflict"; serverData: unknown }
| { id: string; status: "error"; reason: string };
export async function syncOfflineMutations(
onConflict: (mutation: { url: string; body: unknown }, serverData: unknown) => unknown
): Promise<SyncResult[]> {
const pending = await mutationQueue.getAll();
if (pending.length === 0) return [];
// Process in chronological order to preserve causal ordering
const sorted = pending.sort((a, b) => a.timestamp - b.timestamp);
const results: SyncResult[] = [];
for (const mutation of sorted) {
try {
const res = await fetch(mutation.url, {
method: mutation.method,
headers: { "Content-Type": "application/json" },
body: mutation.body,
});
if (res.ok) {
await mutationQueue.remove(mutation.id);
results.push({ id: mutation.id, status: "ok" });
continue;
}
if (res.status === 409) {
// Server reports a conflict: let the application layer resolve it
const serverData = await res.json();
const resolved = onConflict(
{ url: mutation.url, body: JSON.parse(mutation.body) },
serverData
);
// Retry with the resolved payload
const retryRes = await fetch(mutation.url, {
method: mutation.method,
headers: { "Content-Type": "application/json" },
body: JSON.stringify(resolved),
});
if (retryRes.ok) {
await mutationQueue.remove(mutation.id);
results.push({ id: mutation.id, status: "conflict", serverData });
} else {
results.push({ id: mutation.id, status: "error", reason: `${retryRes.status}` });
}
continue;
}
await mutationQueue.incrementRetry(mutation.id);
results.push({ id: mutation.id, status: "error", reason: `HTTP ${res.status}` });
} catch (err) {
await mutationQueue.incrementRetry(mutation.id);
results.push({ id: mutation.id, status: "error", reason: String(err) });
}
}
return results;
}
// Wire up the sync engine to the online event
// window.addEventListener("online", () => syncOfflineMutations(resolveConflict));
import { useMutation, useQueryClient } from "@tanstack/react-query";
import { mutationQueue } from "@/lib/offline/mutation-queue";
interface OfflineMutationOptions<TInput> {
url: string;
method?: string;
onSuccess?: () => void;
queryKeysToInvalidate?: unknown[][];
}
export function useOfflineMutation<TInput>(opts: OfflineMutationOptions<TInput>) {
const queryClient = useQueryClient();
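return useMutation({
// Run mutationFn even when offline; the default networkMode ("online") would
// pause the mutation instead, and the offline branch below would never execute
networkMode: "always",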
mutationFn: async (input: TInput) => {
if (!navigator.onLine) {
// Offline: enqueue the mutation for later sync
await mutationQueue.enqueue(opts.url, opts.method ?? "POST", input);
// Return a synthetic response to allow optimistic updates to proceed
return { queued: true };
}
const res = await fetch(opts.url, {
method: opts.method ?? "POST",
headers: { "Content-Type": "application/json" },
body: JSON.stringify(input),
});
if (!res.ok) throw new Error(`${res.status}`);
return res.json();
},
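// onSuccess fires only when mutationFn resolves (including the queued case)
onSuccess: () => {
opts.onSuccess?.();
},
// onSettled fires after both success and failure, so it only refreshes caches
onSettled: () => {
opts.queryKeysToInvalidate?.forEach((key) => {
queryClient.invalidateQueries({ queryKey: key });
});
},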
});
}
Scaling Real-Time Systems
Scaling a real-time system is fundamentally different from scaling a REST API. A REST server is stateless: any server can handle any request. A WebSocket server is stateful: each connection is pinned to the server that accepted it. When a message needs to reach all clients in a room, it must reach clients on every server instance, not just the one that received the message.
The Problem: Sticky Sessions
With sticky sessions (routing the same client to the same server via a cookie or IP hash), adding servers increases capacity, but each client remains pinned to a single instance. A message from Client A on Server 1 will not reach Client B on Server 2 without a fan-out layer.
The Solution: Redis Pub/Sub
Every server subscribes to a shared Redis pub/sub channel for each room. When a message arrives on any server, it publishes to Redis. Every server receives the message from Redis and delivers it to its connected clients who are in that room.
import { createClient } from "redis";
import { WebSocket } from "ws"; // Assumes the `ws` package provides the server-side sockets
const publisher = createClient({ url: process.env.REDIS_URL });
const subscriber = createClient({ url: process.env.REDIS_URL });
await publisher.connect();
await subscriber.connect();
// In-memory registry of active WebSocket connections per room on this server instance
const localConnections = new Map<string, Set<WebSocket>>();
export function registerConnection(roomId: string, ws: WebSocket) {
if (!localConnections.has(roomId)) {
localConnections.set(roomId, new Set());
// Subscribe to the Redis channel for this room when the first client joins
subscriber.subscribe(`room:${roomId}`, (message) => {
deliverToLocalClients(roomId, message);
});
}
localConnections.get(roomId)!.add(ws);
}
export function deregisterConnection(roomId: string, ws: WebSocket) {
const conns = localConnections.get(roomId);
if (!conns) return;
conns.delete(ws);
if (conns.size === 0) {
localConnections.delete(roomId);
subscriber.unsubscribe(`room:${roomId}`);
}
}
// Publish a message to all servers via Redis
export async function broadcastToRoom(roomId: string, message: unknown) {
await publisher.publish(`room:${roomId}`, JSON.stringify(message));
}
// Deliver a message to all clients connected to this server instance in the room
function deliverToLocalClients(roomId: string, rawMessage: string) {
const conns = localConnections.get(roomId);
if (!conns) return;
for (const ws of conns) {
if (ws.readyState === WebSocket.OPEN) {
ws.send(rawMessage);
}
}
}
Connection Limits and Backpressure
Each WebSocket connection holds memory on the server and a file descriptor in the OS. As a rough guide, a single Node.js server can hold on the order of 10,000-50,000 concurrent connections, depending on message volume. At scale, distribute connections across a cluster and implement backpressure: if a client's outbound queue keeps growing (a slow client or slow link that cannot keep up with the message rate), pause sending to that client and resume when the queue drains.
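A minimal sketch of per-client backpressure follows. It assumes the socket exposes `bufferedAmount` (as the browser WebSocket and the `ws` package do); `BackpressuredSender`, `SocketLike`, and the default thresholds are hypothetical and chosen only for illustration.

```typescript
// Minimal socket shape needed for backpressure decisions
interface SocketLike {
  readonly bufferedAmount: number; // bytes accepted by send() but not yet transmitted
  send(data: string): void;
}

class BackpressuredSender {
  private readonly socket: SocketLike;
  private readonly highWaterMark: number;
  private readonly maxQueued: number;
  private readonly queue: string[] = [];

  // Defaults are illustrative, not recommendations
  constructor(socket: SocketLike, highWaterMark = 1_000_000, maxQueued = 1_000) {
    this.socket = socket;
    this.highWaterMark = highWaterMark;
    this.maxQueued = maxQueued;
  }

  // Returns false when the client has fallen too far behind; the caller can
  // then disconnect it or force a full resync instead of queueing forever.
  send(message: string): boolean {
    this.drain();
    if (this.queue.length >= this.maxQueued) return false;
    this.queue.push(message);
    this.drain();
    return true;
  }

  // Sends queued messages until the socket buffer reaches the high-water mark.
  // Call again on a timer to resume once the buffer has emptied.
  drain(): void {
    while (this.queue.length > 0 && this.socket.bufferedAmount < this.highWaterMark) {
      const next = this.queue.shift();
      if (next !== undefined) this.socket.send(next);
    }
  }

  get queuedCount(): number {
    return this.queue.length;
  }
}
```

The design choice here is to bound the queue: an unbounded queue only moves the memory problem from the socket buffer into application memory, whereas a hard cap turns a persistently slow client into an explicit disconnect-or-resync decision.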
Real-Time Anti-Patterns
Reconnecting Without Deduplication
When a WebSocket reconnects, the server may replay missed messages from the last known event ID. Without client-side deduplication by message ID, replayed messages appear twice in the UI. Always deduplicate by a stable message ID before applying a message to application state.
// WRONG: applies every incoming message without checking for duplicates
ws.onmessage = (event) => {
const msg = JSON.parse(event.data);
setMessages((prev) => [...prev, msg]); // Duplicates appear on reconnect
};
// CORRECT: deduplicate by stable message id
const seenIds = new Set<string>();
ws.onmessage = (event) => {
const msg = JSON.parse(event.data);
if (seenIds.has(msg.id)) return; // Replay on reconnect: skip
seenIds.add(msg.id);
setMessages((prev) => [...prev, msg]);
};
Opening a New Connection Per Component
// WRONG: each component opens its own WebSocket connection
function ChatPanel() {
const ws = new WebSocket("wss://api.example.com/ws"); // New conn per mount
// ...
}
// CORRECT: share a single connection, subscribe components to a context or store
// The connection is opened once at the app level.
// Components consume events via a context or a pub/sub within the app.
// Option 1: React context
const WSContext = createContext<WSContextValue | null>(null);
export function WSProvider({ children }: { children: React.ReactNode }) {
const connection = useWebSocketConnection(WS_URL, handleMessage);
return <WSContext.Provider value={connection}>{children}</WSContext.Provider>;
}
// Option 2: Zustand store connected to a single WS instance (see section 08)
Updating UI on Every Keystroke Over WebSocket
Sending every keystroke to the server (for typing indicators or collaborative editing) at full speed saturates the connection with low-value messages. Throttle or debounce outgoing events by type before sending.
// WRONG: sends every keystroke immediately
function handleInput(value: string) {
ws.send(JSON.stringify({ type: "doc:change", payload: { value } }));
}
// CORRECT: throttle expensive events, debounce infrequent ones
import { throttle, debounce } from "lodash-es";
// Cursor moves: throttle to 30fps (plenty for visible smoothness)
const sendCursorMove = throttle((x: number, y: number) => {
ws.send(JSON.stringify({ type: "cursor:move", payload: { x, y } }));
}, 33);
// Document changes: debounce to send at most once per 200ms idle period
const sendDocChange = debounce((content: string) => {
ws.send(JSON.stringify({ type: "doc:change", payload: { content } }));
}, 200);
// Typing indicator: send "start" once per typing burst, then "stop" after 3s idle
let isTyping = false;
const sendTypingStop = debounce(() => {
isTyping = false;
ws.send(JSON.stringify({ type: "typing:stop" }));
}, 3_000);
function handleInput(value: string) {
if (!isTyping) {
isTyping = true;
ws.send(JSON.stringify({ type: "typing:start" })); // Sent only at the start of a burst
}
sendTypingStop();
sendDocChange(value);
}
No Backoff on Reconnect
Reconnecting immediately and repeatedly after a failure floods the server at exactly the moment it is most likely to be under stress. Always use exponential backoff with jitter. The jitter spreads reconnect attempts from multiple clients across time, preventing a thundering herd from overwhelming a recovering server.
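One common way to combine the two is "full jitter", where each delay is drawn uniformly between zero and an exponentially growing ceiling. The sketch below assumes that variant; `backoffDelay`, `reconnectWithBackoff`, and the option names are hypothetical, and `connect` stands in for whatever opens your socket.

```typescript
interface BackoffOptions {
  baseMs: number; // delay ceiling for the first retry
  maxMs: number; // upper bound on any single delay
}

// Full jitter: a uniformly random delay in [0, min(maxMs, baseMs * 2^attempt))
function backoffDelay(
  attempt: number,
  opts: BackoffOptions,
  random: () => number = Math.random
): number {
  const ceiling = Math.min(opts.maxMs, opts.baseMs * 2 ** attempt);
  return Math.floor(random() * ceiling);
}

// Tries `connect` up to `maxAttempts` times, sleeping with jittered backoff
// between failures. Resolves to true on success, false when attempts run out.
async function reconnectWithBackoff(
  connect: () => Promise<void>,
  opts: BackoffOptions & { maxAttempts: number },
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => {
      setTimeout(resolve, ms);
    })
): Promise<boolean> {
  for (let attempt = 0; attempt < opts.maxAttempts; attempt++) {
    if (attempt > 0) {
      await sleep(backoffDelay(attempt - 1, opts));
    }
    try {
      await connect();
      return true;
    } catch {
      // Connection failed: fall through to the next attempt
    }
  }
  return false;
}
```

With `baseMs: 1000` and `maxMs: 30000`, the ceiling doubles from 1s to 2s, 4s, 8s and so on until it is capped at 30s, while the random factor keeps clients that disconnected together from retrying in lockstep.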
Real-Time in a Vite SPA
A Vite SPA connects to real-time systems directly from the browser: there is no server layer to proxy connections or manage subscriptions. The patterns are identical to those in Next.js on the client side. The main difference is how you obtain the WebSocket URL and authentication token, and whether you use a managed real-time platform to avoid operating your own WebSocket server.
| Concern | Next.js Approach | Vite SPA Approach |
|---|---|---|
| WS URL | Server env var, injected server-side | VITE_WS_URL in .env (public) |
| Auth token for WS | httpOnly session cookie sent with upgrade request | Bearer token in first WS message or URL query param |
| SSE endpoint | Next.js Route Handler (app/api) | Express/Fastify/Hono BFF or managed platform |
| Offline sync server | Next.js API route receives synced mutations | Same REST/tRPC API the SPA already uses |
Direct WebSocket in a Vite SPA
// In a Vite SPA, create the connection once at the module level or in a provider.
// The useWebSocketConnection hook from section 03 works identically here.
// The only change is the URL source:
const WS_URL = import.meta.env.VITE_WS_URL as string;
// Token from your in-memory auth store (never localStorage for real-time tokens)
const AUTH_TOKEN = authStore.getAccessToken();
// Authenticate over the WebSocket handshake via URL query param
// (only safe over WSS / TLS - the token is in the URL)
const AUTHENTICATED_URL = `${WS_URL}?token=${encodeURIComponent(AUTH_TOKEN)}`;
// Or, authenticate via first message after connecting (preferred):
// ws.onopen = () => ws.send(JSON.stringify({ type: "auth", token: AUTH_TOKEN }));
export const wsConnection = new WebSocketConnection(AUTHENTICATED_URL, handleMessage);
Using a Managed Platform in a Vite SPA
import Ably from "ably";
// Obtain a token from your BFF rather than embedding the Ably API key in the browser
async function getAblyToken(): Promise<Ably.TokenRequest> {
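const res = await fetch("/api/realtime/token", { credentials: "include" });
// Surface auth failures to authCallback instead of passing an error body to Ably
if (!res.ok) throw new Error(`Token request failed: ${res.status}`);
return res.json();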
}
export const ablyClient = new Ably.Realtime({
authCallback: async (_tokenParams, callback) => {
try {
const tokenRequest = await getAblyToken();
callback(null, tokenRequest);
} catch (err) {
callback(err as Error, null);
}
},
});
// Subscribe to a room's channel; payloads arrive as unknown and should be validated by the caller
export function subscribeToRoom(
roomId: string,
onMessage: (msg: unknown) => void
) {
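const channel = ablyClient.channels.get(`rooms:${roomId}`);
// Keep a reference to the listener so cleanup removes only this subscription,
// not every listener attached to the channel
const listener = (msg: { data?: unknown }) => onMessage(msg.data);
channel.subscribe("message", listener);
return () => channel.unsubscribe("message", listener);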
}
// Publish a message to a room
export async function publishToRoom(roomId: string, data: unknown) {
const channel = ablyClient.channels.get(`rooms:${roomId}`);
await channel.publish("message", data);
}
Wiring Real-Time Events into TanStack Query
import { useEffect } from "react";
import { useQueryClient } from "@tanstack/react-query";
import { subscribeToRoom } from "@/lib/realtime-ably";
import { applyRealtimeCacheOperation } from "@/lib/realtime-cache";
import { useAuthStore } from "@/stores/auth";
export function RealtimeProvider({ children }: { children: React.ReactNode }) {
const queryClient = useQueryClient();
const { user } = useAuthStore();
useEffect(() => {
if (!user) return;
// Subscribe to the user's personal channel for notifications
const unsubscribePersonal = subscribeToRoom(
`user:${user.id}`,
(event) => {
if (typeof event === "object" && event !== null && "operation" in event) {
applyRealtimeCacheOperation(
queryClient,
event as Parameters<typeof applyRealtimeCacheOperation>[1]
);
}
}
);
return () => {
unsubscribePersonal();
};
}, [user?.id, queryClient]);
return <>{children}</>;
}