Backpressure in gRPC streaming

HTTP/2 already does flow control. Each stream has a receive window that the receiver tops up with WINDOW_UPDATE frames as it reads. So in principle a slow consumer slows the producer — the producer's Send() call blocks once the window is empty.

In practice this is not enough on its own, for two reasons:

  1. The default window is small (64 KB per stream, 64 KB per connection on the gRPC-Go server) and the receiver tops it up greedily once you start reading. By the time you notice your consumer is behind, the in-flight bytes have already piled up in OS socket buffers and user-space queues.
  2. Your application-level queue between "decode RPC message" and "do something with it" is usually larger than the HTTP/2 window. Window-based backpressure stops at the framer; it does not push back on whatever your goroutine is doing.

What I do

For unary RPC nothing changes. For server-streaming, I treat the stream as a pull-based iterator on the server side — the next Send only happens after the producer is told to produce. The signal comes from a separate control channel embedded in the stream:

type Item struct { Seq uint64; Payload []byte }

// Each Send is preceded by a Recv on a small control stream
// that requests N items. This makes the consumer's rate
// observable on the server, not just observable to the framer.
func (s *Server) Subscribe(req *pb.SubReq, stream pb.Feed_SubscribeServer) error {
    pull := make(chan int, 1)
    go controlLoop(stream.Context(), pull)

    for {
        n, ok := <-pull
        if !ok { return nil }
        for i := 0; i < n; i++ {
            item, ok := s.producer.Next(stream.Context())
            if !ok { return nil }
            if err := stream.Send(item); err != nil { return err }
        }
    }
}

This is just the reactive-streams contract (Reactor, Akka, RxJava) ported to gRPC. The cost is one extra message round-trip per batch; in exchange the producer never builds a queue it cannot serve, and slow consumers cannot pull memory out from under fast ones.

Tuning the HTTP/2 windows is still useful

Even with pull-based control, I bump the server windows to ~1 MB per stream and ~4 MB per connection. The cost is memory; the win is that Send rarely blocks inside a single batch and the goroutine's tail-latency stays flat.

grpc.NewServer(
    grpc.InitialWindowSize(1 << 20),       // 1 MB per stream
    grpc.InitialConnWindowSize(4 << 20),   // 4 MB per conn
)

What I do not do

I do not rely on the bidirectional "BDP estimator" that gRPC-Go ships with. It adapts windows based on observed bandwidth-delay product, which is fine for throughput but it cannot tell the difference between "consumer is slow" and "link is slow." For backpressure I want the consumer's rate to be the only signal.

I also do not put a buffered channel between producer.Next and stream.Send. That hides the very signal I am trying to surface.