Building a real-time chat system that handles thousands of concurrent connections is a classic challenge in connection management. At Joyridez, we needed a solution that could scale efficiently without the overhead of threads or the complexity of callback-based state machines. After evaluating several approaches, we settled on C++20 coroutines. This article walks through our reasoning, the implementation details, and the lessons we learned along the way.
Why Coroutines for Real-Time Chat?
Real-time chat systems are fundamentally about managing many simultaneous connections. Each user sends messages, receives updates, and may disconnect at any time. Traditional approaches—one thread per connection or an event loop with callbacks—each have well-known drawbacks.
Thread-per-connection quickly exhausts memory as connections grow. Even with a thread pool, context switching overhead becomes significant. Callback-based async code, on the other hand, leads to what is often called 'callback hell'—deeply nested lambdas that are hard to read and maintain.
Coroutines offer a middle ground: they allow you to write asynchronous code that looks synchronous, while avoiding the resource overhead of threads. C++20 coroutines are stackless—they suspend execution at specific suspension points without consuming a full stack. This means you can have hundreds of thousands of coroutines active simultaneously, each holding minimal state.
For Joyridez, the primary motivation was developer productivity. Our chat system involves complex sequences of operations: authenticate user, join a room, subscribe to message stream, handle incoming messages, and manage heartbeats. With coroutines, each sequence reads like a linear function, with co_await marking the asynchronous steps. This dramatically reduced the number of bugs related to state transitions and resource leaks.
We also appreciated that coroutines integrate naturally with existing C++ code. You don't need to rewrite your entire codebase; you can wrap existing async primitives with a coroutine-compatible awaitable. This allowed us to adopt coroutines incrementally, starting with the connection handler and later extending to the message routing layer.
Comparison with Other Approaches
To give a balanced view, we compared coroutines against two alternatives: threads and traditional async with callbacks. Threads are simple to reason about but don't scale to high connection counts. Callbacks scale well but are error-prone. Coroutines combine the scalability of callbacks with the readability of threads. The trade-off is that coroutines require a compatible runtime and a deeper understanding of the language feature.
Core Mechanism: How Coroutines Enable Async I/O
At the heart of C++20 coroutines is the concept of an awaitable—any type that provides await_ready, await_suspend, and await_resume methods. When you write co_await some_awaitable, the compiler transforms the function into a state machine. The coroutine frame holds local variables and the suspension point.
In our chat system, the most common awaitable is a socket read or write operation. We built a thin abstraction over POSIX sockets (or Windows I/O completion ports) that returns an awaitable. When co_await is called, if the operation is not yet complete, the coroutine suspends and control returns to the event loop. Once the I/O completes, the event loop resumes the coroutine, which then continues execution as if the operation had been synchronous.
This pattern is similar to JavaScript's async/await, but at a lower level. The key difference is that C++ coroutines are zero-overhead in the sense that they don't allocate a stack per coroutine. The coroutine frame is heap-allocated (or custom-allocated) and contains only the state needed for the suspension points.
For Joyridez, the most important benefit was that we could write connection handlers that loop over messages without blocking a thread. A typical handler looks like:
    while (true) {
        auto message = co_await socket.read();
        if (message.empty()) break;  // stop on an empty read (treated here as end of stream)
        auto response = co_await process_message(message);
        co_await socket.write(response);
    }

This code is clean and easy to follow. The event loop underneath handles thousands of such coroutines, switching between them as I/O completes.
Memory Footprint
Each coroutine frame typically consumes a few hundred bytes. For 10,000 concurrent connections, that's a few megabytes—much less than the hundreds of megabytes required for thread stacks. This makes coroutines ideal for connection-heavy workloads.
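The arithmetic behind that comparison can be checked in a few lines. The per-unit sizes below are assumptions chosen for illustration (300 bytes per coroutine frame, 64 KiB of committed stack per thread), not measurements from our system.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Total bytes for `count` units of `bytes_each`, guarding against overflow.
constexpr std::size_t total_bytes(std::size_t bytes_each, std::size_t count) {
    assert(count == 0 || bytes_each <= SIZE_MAX / count);
    return bytes_each * count;
}

constexpr double to_mib(std::size_t bytes) {
    return static_cast<double>(bytes) / (1024.0 * 1024.0);
}

// Assumed figures, for illustration only.
constexpr std::size_t frame_bytes = 300;               // one coroutine frame
constexpr std::size_t thread_stack_bytes = 64 * 1024;  // committed stack per thread
constexpr std::size_t connections = 10'000;

// Coroutines: 3,000,000 bytes, a little under 3 MiB.
static_assert(total_bytes(frame_bytes, connections) == 3'000'000);
// Threads: 655,360,000 bytes, which is exactly 625 MiB.
static_assert(total_bytes(thread_stack_bytes, connections) == 655'360'000);
```

With larger per-thread stacks the gap widens further; the frame figure grows with the number of locals that live across a suspension point.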
Implementation Details: Building the Chat System
Our chat system architecture consists of three layers: the connection manager, the message router, and the application logic. The connection manager handles WebSocket upgrades and maintains a map of active sessions. The message router dispatches messages to the appropriate room or user. Application logic includes authentication, rate limiting, and message persistence.
We chose to implement the connection manager using a reactor pattern with epoll on Linux. The event loop is a simple while loop that calls epoll_wait and then resumes the coroutines associated with ready file descriptors.
Each connection is represented by a coroutine that handles the entire lifecycle: accept, handshake, read/write loop, and cleanup. The coroutine is launched when a new connection is accepted. It first performs the WebSocket handshake, which is a short exchange but still network I/O, so the handler co_awaits it rather than blocking the event loop. It then enters the read/write loop.
One challenge we faced was handling partial reads. TCP is a stream protocol, and a single read may not return a complete WebSocket frame. We solved this by implementing a buffered reader that accumulates data until a full frame is available. The reader itself is a coroutine that suspends until enough data arrives.
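The accumulation step can be sketched independently of the coroutine machinery. To keep the example short it assumes a simplified framing in which each frame is a single length byte followed by that many payload bytes; this is not the real WebSocket frame format, and `frame_buffer` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>

// Accumulates raw bytes and hands out complete frames only.
// Assumed wire format (illustration only): [1-byte length][payload].
class frame_buffer {
public:
    // Append whatever a single read() happened to return.
    void feed(std::string_view chunk) { buf_.append(chunk); }

    // Return the next complete frame, or nullopt if more bytes are needed.
    std::optional<std::string> next_frame() {
        if (buf_.empty()) return std::nullopt;
        const auto len = static_cast<std::size_t>(static_cast<unsigned char>(buf_[0]));
        if (buf_.size() < 1 + len) return std::nullopt;  // partial frame: keep waiting
        std::string payload = buf_.substr(1, len);
        assert(payload.size() == len);
        buf_.erase(0, 1 + len);  // bytes of the following frame stay buffered
        return payload;
    }

private:
    std::string buf_;
};
```

In a coroutine-based reader, the `nullopt` branch is where the reader would co_await the next socket read, feed the result in, and try again.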
Code Walkthrough: Connection Handler
Here's a simplified version of our connection handler:
    task handle_connection(socket_t socket) {
        // Upgrade the raw socket to a WebSocket connection.
        auto ws = co_await websocket_handshake(socket);
        // Named `sess` so the variable does not shadow the `session` type.
        auto sess = std::make_shared<session>(ws);
        add_to_room(sess, "general");
        while (true) {
            auto frame = co_await ws.read_frame();
            if (frame.is_close()) break;
            auto response = co_await process_frame(frame);
            co_await ws.write_frame(response);
        }
        remove_from_room(sess);
        co_await ws.close();
    }

This code is straightforward. The task type is our coroutine return type, which is essentially a void-returning coroutine that can be awaited. The co_await on read_frame and write_frame suspends the coroutine until the I/O completes.
One subtlety: we use shared_ptr for the session object to ensure it lives as long as the coroutine. The shared_ptr is a local variable stored in the coroutine frame, so the session is kept alive even if the connection is closed externally.
Edge Cases and Exceptions
Real-world systems have many edge cases. Here are the ones that gave us the most trouble, and how we handled them.
Coroutine Cancellation
When a client disconnects, we need to cancel the coroutine that's handling that connection. C++20 coroutines don't have built-in cancellation. We implemented a cancellation token pattern: each coroutine checks a flag before each co_await. If the flag is set, it returns early. The event loop sets the flag when the socket is closed. Note that a suspended coroutine only sees the flag once it is resumed, so the pending operation has to complete or fail before the check can run.
This approach requires cooperation from the coroutine—if it's stuck in a long computation, it won't check the flag. In practice, our I/O-bound coroutines check the flag frequently enough.
Backpressure
If a client sends messages faster than the server can process them, or reads more slowly than we produce them, we need backpressure. We implemented a bounded queue for outgoing messages. Coroutines make the common case easy: the write function co_awaits until space is available. If the queue stays full, we drop the oldest message or close the connection.
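The drop-oldest policy is the simplest of these to show in isolation. The sketch below is a plain data structure with a hypothetical name (`outgoing_queue`); it leaves out the awaiting and close-the-connection paths.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <optional>
#include <string>
#include <utility>

// Bounded FIFO for outgoing messages that evicts the oldest entry when full.
class outgoing_queue {
public:
    explicit outgoing_queue(std::size_t capacity) : capacity_(capacity) {
        assert(capacity_ > 0);
    }

    // Enqueue a message. Returns true if an older message was dropped to make room.
    bool push(std::string msg) {
        bool dropped = false;
        if (queue_.size() == capacity_) {
            queue_.pop_front();
            ++dropped_total_;
            dropped = true;
        }
        queue_.push_back(std::move(msg));
        return dropped;
    }

    // Dequeue the oldest remaining message, if any.
    std::optional<std::string> pop() {
        if (queue_.empty()) return std::nullopt;
        std::string msg = std::move(queue_.front());
        queue_.pop_front();
        return msg;
    }

    std::size_t size() const { return queue_.size(); }
    std::size_t dropped_total() const { return dropped_total_; }

private:
    std::size_t capacity_;
    std::size_t dropped_total_ = 0;
    std::deque<std::string> queue_;
};
```

Tracking `dropped_total()` per session gives a natural trigger for the stricter policy: once a client has forced too many drops, close the connection instead.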
We also added a timeout on co_await to prevent a slow client from holding up resources indefinitely. The awaitable checks a deadline and resumes with an error if the timeout expires.
Resource Cleanup
Coroutine frames are heap-allocated, so proper cleanup is essential. We use RAII extensively: the session object's destructor removes it from the room and closes the socket. The coroutine frame holds the session via shared_ptr, so when the coroutine finishes (normally or via cancellation), the session is released.
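One gotcha: if a coroutine is destroyed without being resumed (e.g., if the event loop shuts down), the objects living in its frame still need their destructors run. C++20 handles this automatically: calling destroy() on the handle of a suspended coroutine destroys the variables that are in scope at the suspension point and then frees the frame. The catch is that someone has to call destroy(); a handle that is simply dropped leaks the frame and everything in it.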
Limitations of the Approach
Coroutines are not a silver bullet. We encountered several limitations during development.
Debugging Difficulty
Debugging coroutines is harder than debugging synchronous code. The call stack is flattened—you only see the current coroutine's frame, not the chain of calls that led to it. Tools like GDB have improved, but breakpoints inside coroutines can be confusing. We mitigated this by logging coroutine IDs and using structured logging to trace the flow.
Compiler Support
While C++20 coroutines are standardized, compiler support is still maturing. We used GCC 11 and Clang 14, both of which worked well. However, we encountered bugs in earlier versions, particularly around the noexcept specification and the promise_type interface. We recommend using the latest compilers and testing thoroughly.
Ecosystem Integration
Not all C++ libraries are coroutine-aware. We had to write adapters for our database client and HTTP library. This required understanding the library's async model and wrapping it in an awaitable. Over time, more libraries are adding native coroutine support, but the ecosystem is not yet mature.
Performance Overhead
Coroutines have some overhead: allocation of the coroutine frame, and the state machine dispatch. In our benchmarks, a coroutine-based echo server handled about 90% of the throughput of a well-tuned callback-based server. For most chat applications, this is acceptable. The productivity gains far outweigh the small performance penalty.
Reader FAQ
Do coroutines use threads?
No, coroutines themselves are not threads. They are functions that can suspend and resume. They run on a single thread (or a pool of threads) managed by the event loop. Multiple coroutines can be interleaved on the same thread, yielding at co_await points.
Can I use coroutines with existing thread pools?
Yes. You can have a coroutine that co_awaits a task that runs on a thread pool. The coroutine suspends until the task completes, then resumes on the event loop thread. This is useful for CPU-bound work.
How do I handle errors in coroutines?
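Exceptions work normally inside a coroutine body, and a try block can span co_await expressions. An exception that escapes the body does not propagate directly to the resumer; it is routed to promise_type::unhandled_exception(), and the promise decides what happens next (typically it stores the exception and rethrows it to whoever awaits the task). We catch exceptions in the top-level coroutine and log them, then close the connection.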
What is the memory overhead per coroutine?
Typically a few hundred bytes, depending on the number of local variables and the size of the coroutine frame. You can customize allocation using the promise_type's operator new.
Are coroutines suitable for high-frequency trading?
Probably not. Suspending and resuming a coroutine is cheap on its own, but frame allocation and a trip through the event loop add latency that is hard to bound. For sub-microsecond requirements, you'd need a different approach. For most network services, the latency is negligible.
Practical Takeaways
After building our chat system with C++20 coroutines, we have several concrete recommendations for teams considering this approach.
First, start with a small prototype. Convert one connection handler to use coroutines and measure the impact. This will give you confidence and help you identify tooling gaps.
Second, invest in good logging. Since the call stack is flattened, you need to track coroutine identity manually. We added a coroutine_id to each session and included it in every log line.
Third, handle cancellation early. Design your coroutines to be cancel-safe from the start. Refactoring later is painful.
Fourth, use RAII for all resources. Coroutine frames can be destroyed without resuming, so make sure destructors clean up properly.
Finally, stay up to date with compiler releases. Coroutine support improves with each version, and bug fixes are frequent.
Coroutines are not the only way to build a chat system, but for Joyridez, they struck the right balance between developer productivity and runtime efficiency. If you're managing thousands of connections and value readable code, they are worth a serious look.