Distributed Real-Time Chunking

Abstract

Action chunking policies are increasingly run on remote servers due to model size, hardware constraints on edge devices, and cost. Async inference has become a common strategy for producing smooth action trajectories and closing the gap between action chunks caused by inference and network latency. Existing approaches such as Real-Time Execution of Action Chunking Flow Policies (Black, Galliker, and Levine 2025) and SmolVLA: A Vision-Language-Action Model for Affordable and Efficient Robotics (Shukor et al. 2025) address different problems within async inference. RTC in-painting focuses on trajectory discontinuities and mode switching, while SmolVLA Async Inference addresses a distributed client-server architecture. To our knowledge, what is currently missing from the literature is a unified async inference approach that combines RTC in-painting, a well-defined distributed architecture, and resilient behavior under unreliable communication channels. We present Distributed Real-Time Chunking (DRTC), an RTC-compatible approach for distributed client-server scenarios that is designed to handle these failure modes, and we evaluate DRTC’s behavior under injected faults. A cooldown mechanism enables recovery from lost and delayed messages. Thread message passing and action schedule merging are modeled as Last Write Wins (LWW) registers from the CRDT (Shapiro et al. 2011) literature, and described analytically by the semilattice join. The semilattice join absorbs reordered and duplicated messages and ensures monotone data-flow from observation to action execution.

A LeRobot (Cadene et al. 2024) based implementation of DRTC is available at https://github.com/jackvial/drtc.

Introduction

Action chunking has become a prominent strategy in robotics learning (ACT (Zhao et al. 2023), \(\pi0\) (Black, Galliker, and Levine 2025), SmolVLA (Shukor et al. 2025), HIL-SERL (Luo et al. 2024), \(\pi^*_{0.6}\) (Amin et al. 2025)). One of the driving factors towards action chunking has been the increasing size of models, and with that increased inference times. Large models are often impractical to run onboard robots, and especially on mobile robots where power, space, and cost are constraining factors. These constraints, in part, have led to client-server architectures where the policy runs on a remote GPU server receiving observations from the robot client and sending action chunks in return.

While action chunking provides smooth operation across individual actions, inference latency greater than the control period will cause gaps between action chunks. This latency gap is further increased by network round-trip time in the distributed client-server case. Real-Time Execution of Action Chunking Flow Policies (Black, Galliker, and Levine 2025) and Async Inference (Shukor et al. 2025) address the latency gap problem by initiating inference asynchronously so that the next action chunk arrives before the current chunk has been exhausted. While RTC effectively enables async inference and addresses trajectory discontinuities and mode switching, it does not describe the specifics of a distributed client-server architecture. SmolVLA Async Inference does define a distributed architecture, but it does not address trajectory discontinuities and mode switching. Neither RTC nor SmolVLA Async Inference addresses unreliable communication channel failure modes.

DRTC extends RTC to a distributed client-server architecture that specifically describes client-server responsibilities, and behavior under unreliable communication channels.

We introduce a client-side, cooldown-augmented inference request trigger that both manages inference request cadence and allows graceful recovery from delayed and lost action chunk results.

We model thread message passing and schedule merging as Last Write Wins (LWW) registers from the CRDT (Shapiro et al. 2011) literature, and describe both analytically as the semilattice join. The properties of the semilattice join provide the desired characteristics for handling message reordering, duplication, and ensuring the action schedule converges to monotone updates relative to observations.

Previous-chunk access, required for RTC in-painting, is supported via a server-side action chunk buffer that allows the action schedule to be reconstructed server-side from indices sent by the client.

Our latency estimation employs a Jacobson–Karels estimator, which has preferable recovery characteristics under latency spikes compared to a max-of-last-\(n\) latency estimate.

Furthermore, we provide detailed implementation notes and experimental data demonstrating the system running on a physical so101, a Raspberry Pi 5 client, and a remote cloud GPU server.

Related Work

RTC via In-Painting

Real-Time Execution of Action Chunking Flow Policies (RTC) (Black, Galliker, and Levine 2025) introduces a guidance-based in-painting approach to address discontinuities and mode switching across overlapping trajectories in async inference for flow-matching policies. While the RTC experimental setup describes remote inference over a LAN, the implementation details present a shared-state model that does not explicitly outline distributed system semantics. For distributed client-server deployments, additional protocol semantics are needed to define client-server responsibilities and behavior under unreliable communication channels.

SmolVLA Async Inference

SmolVLA (Shukor et al. 2025) presents an async inference client-server architecture using gRPC and protobuf for the transport layer. The authors reason about the requirements for covering the latency gap to achieve async inference, but the system depends on a fixed threshold parameter \(g\) that does not adapt to latency varying over time. Inference is triggered once the current horizon falls below the threshold. To prevent excessive inference requests while the action queue is under threshold, a client-side joint-space observation similarity filter is introduced, but this can delay useful updates for small movements. To prevent similarity filtering from suppressing updates indefinitely, a bypass condition is introduced to force inference. The bypass is enabled only when the client-side action queue has been emptied, which can result in gaps in the action trajectory. As with RTC, unreliable communication channel failure modes are not explicitly modeled.

Distributed Real-Time Chunking

DRTC builds on the work in Real-Time Execution of Action Chunking Flow Policies (Black, Galliker, and Levine 2025) by explicitly defining a distributed client-server architecture along with specifically addressing how the system should handle lost messages, message reordering, message duplication, and latency spikes.

System Architecture Overview

Client responsibilities.

The main control loop runs on the client and owns executing actions on the robot, determining when to initiate an inference request, merging new actions into the schedule, and updating the cooldown counter. The client runs two background threads: one for capturing and sending observations to the server, and the other for receiving actions from the server. Thread communication is handled by LWW registers.

Server responsibilities.

The server runs inference on the main thread in a polling fashion checking an LWW register for incoming observation and metadata from the client. The server also maintains a gRPC thread pool for communication with the client.

Preliminaries

The robot control loop operates in discrete time steps \(t \in \mathbb{N}\), with each iteration corresponding to approximately \(\frac{1}{\text{FPS}}\) seconds. At each control step, the system checks an action schedule \(\psi\) for actions to execute. The current action step \(n(t) \in \mathbb{Z}_{\geq -1}\), initialized to \(n(0) = -1\), advances only when the schedule is non-empty and an action gets executed on the robot: \[n(t+1) = \begin{cases} n(t) + 1 & \text{if } |\psi(t)| > 0 \\ n(t) & \text{otherwise} \end{cases}\]

Observations, resulting action chunks, and events are indexed by \(k\) where \(k \in \mathbb{N}\), and so we say action chunk \(A_{k}\) resulted from observation \(O_{k}\). At the time of inference request triggering, observations are tagged with control step \(t_{k}\) and action step \(n_{k}\). \(t_{k}\) is used as a logical timestamp and to reason about causality. Each element of \(\psi\) is a tuple \(\langle j, t_{k}, a, n_{k} \rangle\) where \(j\) is the action step, and \(a\) is the action position vector.

Each action chunk \(A_k\) comprises \(H\) consecutive actions beginning at action step \(n_k\) that was captured at the time of inference request triggering. \[A_{k} = [a_{n_{k}}, a_{n_{k}+1}, \ldots, a_{n_{k}+H-1}]\]

Execution Horizon & Schedule Constraints. Following (Black, Galliker, and Levine 2025), we define \(s_{min}\) as the configurable target number of actions to execute from each chunk. When \(s_{min} > \hat{\ell}_\Delta\), the hard mask region will be \(\hat{\ell}_\Delta\), so the effective horizon in this case is \(s_{min}\). When \(\hat{\ell}_\Delta \ge s_{min}\), the execution horizon will be \(\hat{\ell}_\Delta\). In general the effective horizon is therefore \(s = \max(s_{min}, \hat{\ell}_\Delta)\), subject to the constraints \[\hat{\ell}_\Delta \leq s \leq H - \hat{\ell}_\Delta\]

In general the total overlap used for in-painting will be \(H - s\).

Because in DRTC we only increment \(n\) when an action is executed, the first execution horizon will be \(s_{min} + \hat{\ell}_\Delta\); for all subsequent chunks we fall into the usual regime for \(s\).

When merging new actions into the schedule, the incoming chunk starts at the action step at which inference was initiated and overwrites the steps from that point onward that have not already been executed. This means the schedule size \(|\psi|\) should usually be at most approximately \(H - \hat{\ell}_\Delta\), and never more than \(H\) (see Section 3.6 for details on merging).

Distributed RTC In-Painting

To support RTC in-painting (Black, Galliker, and Levine 2025) in the distributed client-server setting, DRTC implements a fixed size buffer of the past raw action chunks that is used to reconstruct the client side action schedule \(\psi\) on the policy server.

For the merging strategy discussed in Section 3.6 we focus on the case where actions with the same action step are always overwritten by incoming actions with greater \(t\), even for the hard mask region (frozen region). Support for multiple spans is not necessary under this strategy, as the action schedule would only contain actions produced from a single source control timestep at the time of inference request triggering. We retain the multiple span support to accommodate future evolutions of the system where this might not be the case.

Server-Side Action Buffer. The policy server maintains a fixed size buffer \(\mathcal{B}\) of raw action chunks indexed by the source control timestep \(t_{k}\) for observation \(O_{k}\) that produced the action chunk \(A_{k}\). \[\mathcal{B}[t_{k}] \gets A_k^{\text{raw}}\]

Following Real-Time Execution of Action Chunking Flow Policies (Black, Galliker, and Levine 2025), the hard mask region gets full guidance weight of 1 and the actions from the incoming chunk are allowed to overwrite action steps from the hard mask region that might still exist in the action schedule \(\psi\) if the action chunk arrived early.

Under normal conditions, and for the always overwrite merging strategy, we would only ever need to maintain a previous action buffer of size 1, but to account for action chunks being lost in transit we set the buffer size to a conservative size of 10.

Action Schedule Span Representation. For span reconstruction the action schedule \(\psi\) is a set of elements \(\langle j, t_{k}, a, n_{k} \rangle\), where \(n_{k}\) is the source action step captured at inference request triggering. When requesting inference, the client sends the current action schedule’s span representation, a list of tuples \(\langle t_{k}, f, e \rangle\) where \(t_{k}\) is the source control timestep and \([f, e)\) are the offset indices relative to \(n_{k}\) for each run of consecutive actions belonging to the same \(t_{k}\). The spans of raw actions for each \(t_{k}\) are then reconstructed on the server by indexing into the buffer \(\mathcal{B}\) with \(t_{k}\), and are used for in-painting with the next inference result. Note that the \(t_{k}\) referred to here tags the result of a previous observation whose actions have been merged into the action schedule, not the current \(t_{k}\) that will tag the current observation.

Algorithm: ReconstructSchedule
1:function ReconstructSchedule(spans)
2: \(S = [\,]\)
3:for \((t_{k}, f, e)\) in spans
4:  \(S\).Append(\(\mathcal{B}[t_{k}][f:e]\))
5:end for
6:return Concat(\(S\))
7:end function
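The span representation and its server-side reconstruction can be sketched in Python as follows. The helper names (`build_spans`, `reconstruct_schedule`) and container choices are illustrative assumptions, not the reference implementation:

```python
def build_spans(schedule):
    """Compress an ordered schedule of (j, t_k, action, n_k) tuples into
    (t_k, f, e) runs, where [f, e) are offsets relative to n_k."""
    spans = []
    for j, t_k, _action, n_k in schedule:
        offset = j - n_k
        if spans and spans[-1][0] == t_k and spans[-1][2] == offset:
            t, f, e = spans[-1]
            spans[-1] = (t, f, e + 1)  # extend the current consecutive run
        else:
            spans.append((t_k, offset, offset + 1))  # start a new run
    return spans


def reconstruct_schedule(spans, buffer):
    """Server side: rebuild the raw action sequence by slicing the chunk
    buffer B with each span's offsets (cf. Algorithm ReconstructSchedule)."""
    parts = []
    for t_k, f, e in spans:
        parts.extend(buffer[t_k][f:e])
    return parts
```

Under the always-overwrite merging strategy the list produced by `build_spans` will typically contain a single span, since all scheduled actions share one source timestep.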

Hard & Soft Masking Regions. For in-painting, the current action schedule \(\psi\) can be described by two regions: a hard mask (frozen) region covering the first \(\hat{\ell}_\Delta\) steps, which receives full guidance weight, and a soft mask region covering the remainder of the overlap.

\(\hat{\ell}_\Delta\) and \(overlapEnd \gets H - s\) are also sent to the server along with the spans. The remaining implementation of in-painting on the policy server is the same as described in RTC (Black, Galliker, and Levine 2025).

Latency Estimation

Latency or Round-trip Time (RTT) \(\ell_{k}\) is measured as the wall-clock time elapsed from capturing observation \(O_{k}\) to receiving its corresponding action chunk \(A_{k}\). We use the Jacobson–Karels algorithm (Jacobson 1988), originally developed for TCP congestion control, to build a smooth estimate of RTT. The algorithm estimates an average \(\bar{\ell}_{k}\) and average error \(\sigma\): \[\begin{aligned} \bar{\ell}_{k+1} &= (1 - \alpha) \cdot \bar{\ell}_{k} + \alpha \cdot \ell_{k} \\ \sigma_{k+1} &= (1 - \beta) \cdot \sigma_{k} + \beta \cdot |\ell_{k} - \bar{\ell}_{k}| \end{aligned}\] where \(\alpha\) and \(\beta\) are gain/learning-rate parameters, set to 0.125 and 0.25 respectively as per (Jacobson 1988). \(\sigma\) acts as an uncertainty term: when sample variation is high \(\sigma\) is large, and when sample variation is low \(\sigma\) shrinks. Combining the average and average error, the estimate becomes: \[\hat{\ell} = \bar{\ell} + K \cdot \sigma\] where \(K\) is a configurable parameter; we found \(K = 1.5\) to provide a good balance between conservative estimation and fast recovery from latency spikes.

The latency estimate is then quantized to action steps for comparison with \(|\psi(t)|\) \[\hat{\ell}_{\Delta} = \left\lceil \hat{\ell} \cdot \text{FPS} \right\rceil\]

To prevent large variance on initialization, the estimator is seeded with \(\bar{\ell}_{k}\) equal to the first measurement \(\ell_{k}\), and \(\sigma_{k}\) set to 0.

Jacobson–Karels has more desirable recovery characteristics than the max-of-previous-\(n\)-measurements estimate used in RTC via in-painting (Black, Galliker, and Levine 2025). With the max strategy, a single large latency spike keeps the estimate elevated for \(n\) inference requests, causing the system to recover slowly in some cases. In contrast, the exponential decay in Jacobson–Karels allows the estimate to more quickly begin returning toward the mean as variance in latency measurements subsides.
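The estimator above can be sketched directly from the equations; the class name and method surface are illustrative assumptions, while the \(\alpha\), \(\beta\), \(K\) defaults, seeding, and quantization follow the values given in the text:

```python
import math


class JacobsonKarelsEstimator:
    """Sketch of the smoothed RTT estimate described above (assumed API)."""

    def __init__(self, alpha=0.125, beta=0.25, K=1.5):
        self.alpha, self.beta, self.K = alpha, beta, K
        self.mean = None  # \bar{\ell}, seeded on the first sample
        self.dev = 0.0    # \sigma, seeded to 0

    def update(self, rtt):
        if self.mean is None:
            self.mean = rtt  # seed with the first measurement
        else:
            # deviation is updated against the previous mean
            self.dev = (1 - self.beta) * self.dev + self.beta * abs(rtt - self.mean)
            self.mean = (1 - self.alpha) * self.mean + self.alpha * rtt
        return self.estimate()

    def estimate(self):
        return self.mean + self.K * self.dev

    def estimate_steps(self, fps):
        # quantize to action steps for comparison with the schedule size
        return math.ceil(self.estimate() * fps)
```

A single spike inflates `dev` and therefore the padded estimate, but both decay exponentially as subsequent samples return to the mean, which is the recovery behavior contrasted with the max-of-\(n\) strategy.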

DRTC Inference Request Triggering

Let \(|\psi(t)|\) be the number of currently scheduled future actions. To stay close to our minimum execution horizon \(s_{min}\), we should trigger an inference request at \[|\psi(t)| \leq H - s_{min}\]

Once the inference request condition is met we want to allow the server time to respond with an action chunk and not wastefully trigger inference requests on every control tick while the action schedule is under the threshold. To this end, we extend the inference request condition to include a cooldown condition \[|\psi(t)| \leq H - s_{min} \land O^{c}(t) = 0\]

\(O^{c}(t)\) is an integer counter initialized to a number of control steps proportional to the round-trip latency, plus a safety margin of \(\epsilon\), and decremented each control tick until it reaches 0 \[O^{c}(t+1) = \begin{cases} s_{min} + \epsilon & \text{until latency estimate exists} \\ \hat{\ell}_{\Delta} + \epsilon & \text{if inference requested at } t \\ \max(O^{c}(t) - 1, 0) & \text{otherwise} \end{cases}\] If the action chunk is never received, the cooldown counter will reach 0 and inference will be triggered again. This repeats every \(\hat{\ell}_{\Delta} + \epsilon\) control ticks until the action schedule is replenished.

On the first inference request, and until a latency estimate exists, the counter is initialized to \(s_{min} + \epsilon\). For all subsequent inferences the counter is set to \(\hat{\ell}_{\Delta} + \epsilon\). \(\epsilon\) is added so that a slightly late arrival does not retrigger inference, covering cases where the latency estimate has not yet had time to adjust. If actions arrive later than \(\hat{\ell}_{\Delta} + \epsilon\), or an observation or action chunk is lost in transit, inference will be triggered again, repeating until the system recovers.

Note that with the cooldown the effective execution horizon becomes \(s = \max(s_{min}, \hat{\ell}_{\Delta} + \epsilon)\); given that \(\epsilon\) is small, we still consider the execution horizon to be \(s = \max(s_{min}, \hat{\ell}_{\Delta})\) for in-painting. The result is that the first \(\epsilon\) steps from the soft mask region will on average be executed before the arrival of the next action chunk update.
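The cooldown-gated trigger condition and counter update above can be sketched as two small functions; the function names are hypothetical, and `latency_steps` stands in for \(\hat{\ell}_{\Delta}\) (None until the first estimate exists):

```python
def should_trigger(schedule_len, cooldown, H, s_min):
    """Trigger inference only when the schedule is low AND the cooldown
    counter has expired: |psi(t)| <= H - s_min and O^c(t) = 0."""
    return schedule_len <= H - s_min and cooldown == 0


def next_cooldown(cooldown, triggered, latency_steps, s_min, eps):
    """Counter update O^c(t+1): reset on trigger, else decrement to 0."""
    if triggered:
        # s_min + eps until a latency estimate exists, then l_hat + eps
        return (s_min if latency_steps is None else latency_steps) + eps
    return max(cooldown - 1, 0)
```

If an action chunk never arrives, `cooldown` counts down to 0 and `should_trigger` fires again, reproducing the periodic re-trigger behavior described above.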

Message Passing & Schedule Merging

The network is an unreliable channel that may reorder or duplicate messages. We would like the system to be resilient to reordering and duplication, and for messages to be delivered in the same order relative to their associated observations. To achieve this, we model and implement message passing across threads as Last Write Wins (LWW) registers (Shapiro et al. 2011), and model the action schedule as a map of LWW registers where each message is tagged with the control timestep \(t\). In our architecture, messages transmitted over the network must be passed across a thread before being read; this allows network transmission to effectively inherit the characteristics of the registers used for thread message passing. The timestep \(t\) acts as a logical timestamp (Lamport 1978) and allows us to reason about causality, with the goal of having the action schedule converge to a set of monotonically increasing updates. Analytically, message passing is described by the semilattice join, and merging by the pointwise semilattice join.

Message Passing.

Let \(m = \langle t_m, p_m \rangle\) and \(r = \langle t_r, p_r \rangle\) for \(m, r, \bot \in M_{\bot} := (\mathbb{N}\times \mathcal{P}) \;\cup\; \{\bot\}\), where \(t\) is the monotonically increasing control step, \(p\) is the generic message payload, and \(t\) uniquely identifies a message and its payload. With \(\bot\) as the empty register and elements ordered by \(t\) \[\bot \le x \;\; \forall x \in M_{\bot}, \qquad \langle t,p\rangle \le \langle t',p'\rangle \;\iff\; t \le t'\]

then the join is defined

\[\begin{aligned} \qquad \langle t,p\rangle \vee \langle t',p'\rangle = \begin{cases} \langle t',p'\rangle & \text{if } t' > t\\ \langle t,p\rangle & \text{otherwise.} \end{cases} \end{aligned}\]

\[\begin{aligned} m \vee \bot = m \end{aligned}\]

and the register update is \[\begin{aligned} r \gets r \vee m \end{aligned}\]

Schedule Merging.

Let the schedule be the partial function \(\psi:\mathbb{N}\rightharpoonup (\mathbb{N}\times \mathcal{A}) \;\cup\; \{\bot\}\), where for action step \(j \in \mathrm{dom}(\psi)\) we have \(\psi(j)=\langle t,a\rangle\), \(t\) is as above, and \(a\) is the action vector (\(n_{k}\) is omitted as it is not involved in the join comparison). Let \(\psi'\) be the incoming action chunk of the same type. For \(j \notin \mathrm{dom}(\psi)\) we take \(\psi(j) = \bot\), with \(\bot \le x\) and \(\bot \vee x = x\), i.e., the join takes the element \(x\) that exists on one side. Each element indexed by \(j\) is ordered by \(t\)

\[\begin{aligned} \langle t,a\rangle \le \langle t',a'\rangle &\iff t \le t' \end{aligned}\]

and so the register updates are the pointwise join for each \(j\) greater than the latest action step \(n(t)\) executed on the robot

\[\forall j > n(t):\qquad \psi(j) \gets \psi(j) \vee \psi'(j)\]

The join operation is commutative, idempotent, and associative. Commutativity and associativity under the join provide the behavior that applying updates in any arrival order or grouping results in the same final state. Idempotency under the join provides the behavior that duplicate delivery has no effect after the first update for the same \(t\). Late messages with smaller \(t\) are consumed by the join. From the above it follows that for each action \(j\), \(\psi(j)\) converges to the largest control step \(t\) seen by the system, excluding any messages that may have been lost in transit.
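The join and its absorption of reordered and duplicated deliveries can be demonstrated in a few lines of Python; `BOT` and `join` are illustrative names for \(\bot\) and \(\vee\) as defined above:

```python
BOT = None  # the empty register, bottom of the semilattice


def join(x, y):
    """Last-write-wins semilattice join on (t, payload) pairs."""
    if x is BOT:
        return y
    if y is BOT:
        return x
    return y if y[0] > x[0] else x  # ties keep the current value


# Reordered and duplicated deliveries converge to the largest t seen:
r = BOT
for msg in [(3, "c"), (1, "a"), (3, "c"), (2, "b")]:
    r = join(r, msg)
# r == (3, "c"): the late (1, "a") and (2, "b") and the duplicate are absorbed
```

Because the join is commutative, associative, and idempotent, any arrival order or grouping of the same message set yields the same final register state.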

Implementation Details

The system comprises a robot client and a policy server. For the case we are concerned with, the client and server run on different machines, but they can also run on the same machine. Following (Shukor et al. 2025), gRPC and protobuf are used for network communication. LWW registers (Section 3.6) are used for thread communication following a single-producer single-consumer (SPSC) pattern, each implementing the semilattice join \(r \gets r \vee m\) keyed by control step \(t_{k}\).

Algorithm 1 Robot Client Control Loop
1:Input: horizon \(H\), safety margin \(\epsilon\), min execution horizon \(s_{\min}\)
2:Init: \(t \leftarrow 0, \psi(0) \leftarrow \emptyset\), \(O^c(0) \leftarrow 0\), \(n(0) \leftarrow -1\)
3:Threads:
4:ObservationSender,
5:ActionReceiver
6:Registers:
7: \(RC_{\text{obs}}\) (main \(\rightarrow\) obs sender)
8: \(RC_{\text{act}}\) (action receiver \(\rightarrow\) main)
Thread Main:
9:loop
10:# Execute action if available
11:if \(|\psi(t)| > 0\) then
12:  \((j, t_{k}, a, n_{k}) \leftarrow \textsc{PopFront}(\psi)\)
13:  Execute(\(a\))
14:  \(n(t+1) \leftarrow j\)
15:else
16:  \(n(t+1) \leftarrow n(t)\)
17:end if
18:# Trigger inference
19:if \(|\psi(t)| \leq H - s_{\min}\) and \(O^c(t) = 0\) then
20:  \(s \leftarrow \max(s_{\min},\; \hat{\ell}_\Delta)\)
21:  \(\text{spans} \leftarrow \textsc{BuildSpans}(\psi)\)
22:  # Request; obs sender captures \(O_k\)
23:  \(RC_{\text{obs}} \leftarrow RC_{\text{obs}} \vee \langle t,\;(\text{spans},\, \hat{\ell}_\Delta) \rangle\)
24:  # use \(s_{min} + \epsilon\) until latency estimate exists
25:  \(O^c(t+1) \leftarrow \hat{\ell}_\Delta + \epsilon\)
26:else
27:  \(O^c(t+1) \leftarrow \max(O^c(t) - 1,\; 0)\)
28:end if
29:# Check for incoming actions and merge
30:if \(RC_{\text{act}}\) has new data then
31:  \((A_k, t_k, \ell_k) \leftarrow \textsc{Read}(RC_{\text{act}})\)
32:  \(\psi' \leftarrow \langle t_k, A_k \rangle\)
33:  \(\forall j > n(t):\; \psi(j) \leftarrow \psi(j) \vee \psi'(j)\)
34:  \(\hat{\ell}_\Delta \leftarrow\) UpdateLatencyEstimate(\(\ell_k\))
35:end if
36: \(t \leftarrow t + 1\)
37:end loop
Thread ObservationSender:
38:loop
39:if \(RC_{\text{obs}}\) has new data then
40:  \((t_k, \text{spans}, \hat{\ell}_\Delta) \leftarrow \textsc{Read}(RC_{\text{obs}})\)
41:  \(O_k \leftarrow \textsc{CaptureObservation}()\)
42:  SendToServer(\(O_k, t_k, \text{spans}, \hat{\ell}_\Delta\))
43:end if
44:end loop
Thread ActionReceiver:
45:loop
46: \((A_k, t_k) \leftarrow \textsc{ReceiveFromServer}()\)
47: \(\ell_k \leftarrow \textsc{EstimateLatency}()\)
48: \(RC_{\text{act}} \leftarrow RC_{\text{act}} \vee \langle t_k,\; (A_k, \ell_k) \rangle\)
49:end loop
Algorithm 2 Policy Server Inference Loop
1:Input: policy \(\pi\), buffer capacity \(B\)
2:# Server-side action chunk buffer
3:Init: \(\mathcal{B} \leftarrow \textsc{LRUCache}(B)\)
4:Threads: gRPC threadpool
5:Registers:
6:\(RS_{\text{obs}}\) (receiver \(\rightarrow\) main)
7:\(RS_{\text{act}}\) (main \(\rightarrow\) stream)
8:loop
9:if \(RS_{\text{obs}}\) has new data then
10:  \((O_k, t_k, \text{spans}, \hat{\ell}_\Delta) \leftarrow \textsc{Read}(RS_{\text{obs}})\)
11:  \(\psi^{\text{raw}} \leftarrow \textsc{RebuildSchedule}(\text{spans}, \mathcal{B})\)
12:  # Inference with in-painting
13:  \(A_k^{\text{raw}} \leftarrow \pi(O_k, \psi^{\text{raw}}, \hat{\ell}_\Delta)\)
14:  # Cache action chunk before postprocessing
15:  \(\mathcal{B}[t_k] \leftarrow A_k^{\text{raw}}\)
16:  \(A_k \leftarrow \textsc{Postprocess}(A_k^{\text{raw}})\)
17:  # Push via action stream
18:  \(RS_{\text{act}} \leftarrow RS_{\text{act}} \vee \langle t_k,\; A_k \rangle\)
19:end if
20:end loop
gRPC ObservationReceiver (handler):
21: \((O_k, t_k, \text{spans}, \hat{\ell}_\Delta) \leftarrow \textsc{ReceiveFromClient}()\)
22: \(RS_{\text{obs}} \leftarrow RS_{\text{obs}} \vee \langle t_k,\; (O_k, \text{spans}, \hat{\ell}_\Delta) \rangle\)
gRPC ActionStreamer (generator):
23:loop
24:if \(RS_{\text{act}}\) has new data then
25:  \((t_k, A_k) \leftarrow \textsc{Read}(RS_{\text{act}})\)
26:  Yield(\(A_k, t_k\))
27:end if
28:end loop

Robot Client. The client runs three threads: (1) the main thread runs the control loop, (2) an observation sender thread sends the observation and metadata to the server (3) an action-receiver thread listens for incoming action chunks. The observation register \(RC_{\text{obs}}\) has the main thread as producer and the observation-sender as consumer; the action register \(RC_{\text{act}}\) has the action-receiver as producer and the main thread as consumer.

When the inference condition is met, the main thread builds the span representation and mask region indices (Section 3.3) and passes these as a request to the observation sender thread via an LWW register. The observation sender thread polls the LWW register, captures an observation from the sensors and sends the observation payload to the policy server.

While it is useful to model and analyze the action schedule \(\psi\) as a map of LWW registers, in practice it may be implemented as a SortedDict container keyed by action step, still following the pointwise semilattice join described in Section 3.6 and subject to the condition \(\forall j > n(t)\). The main thread should have exclusive write ownership of the action schedule, the latest action step pointer, and the cooldown counter.
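A minimal sketch of this pointwise merge, assuming a plain dict keyed by action step (a SortedDict behaves the same for this purpose); `merge_chunk` is a hypothetical helper name:

```python
def merge_chunk(schedule, chunk, n):
    """Pointwise LWW merge of an incoming chunk into the schedule.

    Both `schedule` and `chunk` map action step j -> (t, action); `n` is
    the latest action step executed on the robot.
    """
    for j, incoming in chunk.items():
        if j <= n:
            continue  # never touch already-executed steps
        current = schedule.get(j)
        if current is None or incoming[0] > current[0]:
            schedule[j] = incoming  # pointwise join keyed on control step t
    return schedule
```

Re-applying the same chunk is a no-op (idempotency), and a late chunk with smaller \(t\) leaves newer entries untouched.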

Policy Server. The policy server runs inference on the main thread in a polling fashion. On each iteration it will check the LWW register \(RS_{\text{obs}}\) that is populated by the observation receiver and run inference. The server maintains a fixed size LRU cache \(\mathcal{B}\) of raw action chunks indexed by control timestep \(t_{k}\). On receiving an observation with its span representation and mask indices, the main thread reconstructs the schedule from \(\mathcal{B}\) (Section 3.3), runs inference with RTC in-painting, caches the raw chunk in \(\mathcal{B}\), writes to LWW register \(RS_{\text{act}}\) and streams the postprocessed result back to the client.

LWW Register with Watermark.

Registers used for thread message passing are read/write; values are not consumed, which is necessary for the semilattice join comparison. To avoid re-processing the same message, the consumer maintains a watermark that records the last read value. We encapsulate the state, lock, and join into a single LWWRegister class, and the watermark into a companion LWWReader class (Algorithm [alg:lww]) instantiated from a factory method. This encapsulation is important in practice: each register instance owns its own lock \(L\) and state \(s\), so call sites interact only through Update and Read without managing concurrency directly. Similarly, each LWWReader owns its watermark \(w\), so consumers call ReadIfNewer and watermark state is handled internally.

The locking and watermark are specific to the registers used for thread message passing. The main thread owns all reads and writes to the action schedule, so no lock is needed there. Read-once semantics are already enforced by the LWW register used to pass messages from the action receiver thread to the main thread on the client.

Algorithm 3 LWW Register with Watermark
class LWWRegister:
1:# Initially empty \(\langle \bot,\, \bot \rangle\)
2:State: \(s = \langle t,\, p \rangle\)
3:Lock: \(L\)
4:function Update(\(t',\, p'\))
5:with lock \(L\) do
6:  if \(t' > s.t\) then
7:   \(s \leftarrow \langle t',\, p' \rangle\)
8:  end if
9:end with
10:end function
11:function Read()
12:with lock \(L\) do
13:  return \(s\)
14:end with
15:end function
class LWWReader:
16:# Reference to parent LWWRegister
17:Field: \(R\)
18:Watermark: \(w \leftarrow -1\)
19:function ReadIfNewer()
20: \(\langle t,\, p \rangle \leftarrow R.\textsc{Read}()\)
21: \(\textit{is\_new} \leftarrow (t > w)\)
22:if \(\textit{is\_new}\) then
23:  \(w \leftarrow t\)
24:end if
25:return \(\langle t,\, p \rangle,\; \textit{is\_new}\)
26:end function
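Algorithm 3 translates directly into Python; the class surface below mirrors the pseudocode but is a sketch, not the reference implementation:

```python
import threading


class LWWRegister:
    """Thread message-passing register guarded by its own lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self._state = (-1, None)  # (t, payload); t = -1 marks the empty register

    def update(self, t, payload):
        with self._lock:
            if t > self._state[0]:  # the semilattice join r <- r v m
                self._state = (t, payload)

    def read(self):
        with self._lock:
            return self._state

    def reader(self):
        """Factory for the companion consumer-side view."""
        return LWWReader(self)


class LWWReader:
    """Single-consumer view; the watermark suppresses re-reads."""

    def __init__(self, register):
        self._register = register
        self._watermark = -1  # last t seen by this consumer

    def read_if_newer(self):
        t, payload = self._register.read()
        is_new = t > self._watermark
        if is_new:
            self._watermark = t
        return (t, payload), is_new
```

The producer only calls `update`, the consumer only calls `read_if_newer`, and neither manages locking or watermark state at the call site.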

Additional Implementation Details.

The following are not the focus of this work but each offered considerable improvements to the performance of the system and were used for all experiments.

For in-painting we used the RTC improvements described by Soare (Soare 2025). A client-side Butterworth low-pass filter is applied per action dimension to smooth each trajectory and reduce jitter; we used the SciPy implementation, although any low-pass filter should work about as well here. Camera images are JPEG-compressed before transmission to reduce network payload size. The camera reader is configured to allow reading stale frames so that observation capture does not block on camera I/O.
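For illustration, a second-order Butterworth low-pass can be implemented as a streaming biquad in pure Python (standard bilinear-transform coefficients with \(Q = 1/\sqrt{2}\)); this is a dependency-free stand-in for the SciPy filter used in the implementation, with assumed names, applied to one action dimension at a time:

```python
import math


class ButterworthLowPass:
    """Streaming 2nd-order Butterworth low-pass biquad (unit DC gain)."""

    def __init__(self, cutoff_hz, fs):
        w0 = 2.0 * math.pi * cutoff_hz / fs
        alpha = math.sin(w0) / math.sqrt(2.0)  # Q = 1/sqrt(2) for Butterworth
        cos0 = math.cos(w0)
        a0 = 1.0 + alpha
        # normalized feedforward and feedback coefficients
        self.b = ((1.0 - cos0) / 2.0 / a0, (1.0 - cos0) / a0,
                  (1.0 - cos0) / 2.0 / a0)
        self.a = (-2.0 * cos0 / a0, (1.0 - alpha) / a0)
        self.x1 = self.x2 = self.y1 = self.y2 = 0.0

    def step(self, x):
        """Filter one sample of one action dimension."""
        y = (self.b[0] * x + self.b[1] * self.x1 + self.b[2] * self.x2
             - self.a[0] * self.y1 - self.a[1] * self.y2)
        self.x2, self.x1 = self.x1, x
        self.y2, self.y1 = self.y1, y
        return y
```

With the table's parameters (3 Hz cutoff at 60 FPS control rate), slow motion passes through essentially unchanged while high-frequency jitter between consecutive actions is attenuated.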

Experiments

We evaluate DRTC running on a physical so101 with a distributed client and server, where the client runs on a Raspberry Pi 5 and the policy server runs on a cloud GPU server or a local GPU server. The specific node configuration for each experiment can be found in its associated figure caption. Simulated faults are injected into the system to confirm that it behaves as expected. All experiments use a SmolVLA policy trained on a pick-and-place task. Table 1 contains the shared parameters for all experiments.

Fault Injection

Here we describe each of the simulated faults used in our experiments. For each fault injection event, a start and end time specifies when the fault is active; we refer to this interval as the fault window.

Drop Observation

Drop observation faults are injected in the observation sender thread on the client. During the fault window, observations are discarded and not sent to the policy server. Drop faults are used to demonstrate that the system can recover from dropped observation and action chunk messages.

Drop Action Chunk

Drop action chunk faults are injected in the action receiver thread on the client. During the fault window, action chunks are discarded and not passed to the main control thread.

Duplicate Observation

Duplicate observation faults are injected in the observation sender thread on the client. During the fault window each observation will be sent to the policy server twice.

Duplicate Action Chunk

Duplicate action chunk faults are injected in the action receiver thread on the client. During the fault window, each action chunk message is written twice to the LWW register used for communication with the main thread running the control loop.

Reorder Observation

Reorder observation faults are injected in the observation sender thread on the client. During the fault window, each observation is held until the next observation arrives at the fault injection site; the two are then swapped and sent to the policy server.

Reorder Action Chunk

Reorder action chunk faults are injected in the action receiver thread on the client. During the fault window, each action chunk message is held until the next one arrives at the fault injection site; the two are then swapped and written to the LWW register used for communication with the main thread running the control loop.

Latency Spike/Delay

Latency spike faults are injected in the inference loop on the main thread of the policy server. During the fault window, the process sleeps, blocking the thread for the specified delay.

Real System Events

Observation Triggered

Indicates if an observation request message was sent from the control loop to the observation sender thread on the client.

Action Received

Indicates if an action chunk was received by the control loop from the action receiver thread on the client.

Experiment Configuration
Table 1: Shared parameters for all experiments.

Parameter                        Value
-------------------------------  --------------------------------------
Robot type                       so101
GPU                              RTX 4090
Number of cameras                2
Cameras                          camera2 (800x600), camera1 (800x600)
Policy type                      smolvla
Weights                          so101_smolvla_pickplaceorangecube_e100
Dataset                          so101_pickplaceorangecube_e100
Chunk size                       50
FPS                              60
\(s_{\min}\)                     20
\(\epsilon\)                     1
Latency estimator                Jacobson–Karels
\(\alpha\)                       0.125
\(\beta\)                        0.25
\(K\)                            1.5
Flow matching steps              8
RTC enabled                      True
RTC max guidance weight          N/A
RTC attention schedule           linear
RTC \(\sigma_d\)                 0.2
RTC full trajectory alignment    False
Filter type                      Butterworth
Filter cutoff (Hz)               3
Gain                             1.4

System Behavior Under A Variety Of Faults

[Figure fig:mixture_of_faults: DRTC operating under a variety of sequentially injected simulated faults.]

Figure [fig:mixture_of_faults] shows DRTC operating under a variety of simulated faults injected sequentially over a 25-second episode. The fault injection schedule subplot indicates when each fault event is active.

Drop Events

The drop observation and drop action events each produce a noticeable gap in the action trajectory, as they starve the client of new action chunks. Once the fault window ends, the system gracefully recovers as expected, demonstrating that inference request triggering does not depend on the arrival of action chunks. Additionally, the cooldown counter throttles inference requests as expected, rather than entering a regime of re-triggering on each control step.
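The cooldown behavior can be sketched as a counter measured in control steps; the name and step-based interface here are assumptions for illustration, not the DRTC code:

```python
class InferenceCooldown:
    """After an inference trigger, suppress re-triggering for
    `cooldown_steps` control steps. Illustrative sketch only."""

    def __init__(self, cooldown_steps: int):
        self.cooldown_steps = cooldown_steps
        self._remaining = 0

    def should_trigger(self, wants_inference: bool) -> bool:
        # Called once per control step. While the cooldown is active,
        # requests are throttled even if inference is wanted.
        if self._remaining > 0:
            self._remaining -= 1
            return False
        if wants_inference:
            self._remaining = self.cooldown_steps
            return True
        return False
```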

Duplication Events

While duplication events are active, we see no change in system behavior. This is expected: the LWW registers absorb duplicated messages, preventing back-pressure and wasteful inference under message duplication.

Reordering Events

For reordering events we see gaps in the trajectory, which is expected since the fault holds each message until the next observation or action chunk arrives. The commutativity of the LWW register join absorbs the out-of-order messages: we see no increase in action received events, and no mode switching in the trajectory, which might be expected if out-of-order messages were processed.

Latency Estimation Behavior

Max Of Last 10 Estimator

Figure [fig:spikes_max_of_last_10] shows the behavior of the latency estimator that takes the maximum of the last 10 measurements, under injected latency delays. Latency spikes cause the estimate to remain elevated for the next 10 inference request round trips, until the spike measurement is pushed out of the fixed-size buffer.
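The max-of-last-10 estimator can be sketched with a fixed-size buffer (a minimal sketch, not the DRTC code):

```python
from collections import deque


class MaxOfLastN:
    """Latency estimate = maximum of the last N round-trip
    measurements; N = 10 in the experiments. A single spike keeps the
    estimate elevated until N more measurements push it out."""

    def __init__(self, n: int = 10):
        self._buf = deque(maxlen=n)

    def update(self, rtt_s: float) -> float:
        self._buf.append(rtt_s)
        return max(self._buf)
```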

Jacobson–Karels Estimator

Figure [fig:spikes_jk] shows the behavior of the Jacobson–Karels latency estimator under injected latency delays. Compared to the max-of-last-10 strategy, the execution horizon returns to \(s_{\min}\) sooner. The difference is not large in these experiments, given the already large mean latency relative to the execution horizon, but where the horizon is large compared to the latency we would expect the Jacobson–Karels estimator to be more clearly favorable.
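The Jacobson–Karels estimator (Jacobson 1988) can be sketched with the Table 1 parameters \(\alpha = 0.125\), \(\beta = 0.25\), \(K = 1.5\); the initialization on the first sample follows TCP's conventions and is an assumption here:

```python
class JacobsonKarels:
    """Jacobson-Karels smoothed latency estimator (sketch).

    srtt   <- srtt + alpha * (rtt - srtt)
    rttvar <- rttvar + beta * (|rtt - srtt| - rttvar)
    estimate = srtt + K * rttvar
    """

    def __init__(self, alpha: float = 0.125, beta: float = 0.25,
                 k: float = 1.5):
        self.alpha, self.beta, self.k = alpha, beta, k
        self.srtt = None   # smoothed round-trip time
        self.rttvar = 0.0  # smoothed mean deviation

    def update(self, rtt_s: float) -> float:
        if self.srtt is None:
            # First sample: TCP-style initialization (assumed here).
            self.srtt = rtt_s
            self.rttvar = rtt_s / 2
        else:
            # Deviation is updated before srtt, as in Jacobson's paper.
            self.rttvar += self.beta * (abs(rtt_s - self.srtt) - self.rttvar)
            self.srtt += self.alpha * (rtt_s - self.srtt)
        # Conservative estimate: smoothed RTT plus K deviations.
        return self.srtt + self.k * self.rttvar
```

Because the spike enters the estimate only through the exponentially weighted terms, its influence decays geometrically rather than persisting for a fixed window.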

[Figure fig:spikes_max_of_last_10: execution horizon under injected latency spikes with the max-of-last-10 estimator.]

[Figure fig:spikes_jk: execution horizon under injected latency spikes with the Jacobson–Karels estimator.]

Conclusion

Distributed Real-Time Chunking (DRTC) is an RTC-compatible approach designed for distributed client-server deployments. The cooldown mechanism provides efficient resilience to lost and delayed messages. Modeling thread message passing and action schedule merging as LWW registers with a semilattice join handles message reordering and duplication, and ensures that only monotone updates propagate forward through the system. We presented extensive implementation details, including a multithreading and resource ownership model designed to reduce latency and lock contention, and validated the expected behavior of the system experimentally on physical hardware under transient injected faults.

Limitations and future work.

Future work includes direct baseline comparisons with RTC and SmolVLA under identical latency and fault conditions, plus broader network impairment testing beyond transient injected events.

Acknowledgements.

The DRTC implementation presented here is built on LeRobot (Cadene et al. 2024) and the LeRobot RTC implementation. Special thanks to the LeRobot team and open source contributors Eugene Mironov and Khalil Meftah for their work on the RTC implementation.

Amin, Ali, Raichelle Aniceto, Ashwin Balakrishna, Kevin Black, Ken Conley, Grace Connors, James Darpinian, et al. 2025. “\(\pi^*_{0.6}\): A VLA That Learns from Experience.” arXiv Preprint arXiv:2511.14759. https://doi.org/10.48550/arXiv.2511.14759.
Black, Kevin, Manuel Y Galliker, and Sergey Levine. 2025. “Real-Time Execution of Action Chunking Flow Policies.” arXiv Preprint arXiv:2506.07339. https://doi.org/10.48550/arXiv.2506.07339.
Cadene, Remi, Simon Alibert, Alexander Soare, Quentin Gallouedec, Adil Zouitine, Steven Palma, Pepijn Kooijmans, et al. 2024. “LeRobot: State-of-the-Art Machine Learning for Real-World Robotics in Pytorch.” https://github.com/huggingface/lerobot.
Jacobson, Van. 1988. “Congestion Avoidance and Control.” In Proceedings of the ACM SIGCOMM ’88 Symposium, 314–29. Stanford, CA: ACM. https://doi.org/10.1145/52325.52356.
Lamport, Leslie. 1978. “Time, Clocks, and the Ordering of Events in a Distributed System.” Communications of the ACM 21 (7): 558–65. https://doi.org/10.1145/359545.359563.
Luo, Jianlan, Charles Xu, Jeffrey Wu, and Sergey Levine. 2024. “Precise and Dexterous Robotic Manipulation via Human-in-the-Loop Reinforcement Learning.” arXiv Preprint arXiv:2410.21845. https://doi.org/10.48550/arXiv.2410.21845.
Shapiro, Marc, Nuno Preguiça, Carlos Baquero, and Marek Zawirski. 2011. “A Comprehensive Study of Convergent and Commutative Replicated Data Types.” Inria. https://dsf.berkeley.edu/cs286/papers/crdt-tr2011.pdf.
Shukor, Mustafa, Dana Aubakirova, Francesco Capuano, Pepijn Kooijmans, Steven Palma, Adil Zouitine, Michel Aractingi, et al. 2025. “SmolVLA: A Vision-Language-Action Model for Affordable and Efficient Robotics.” arXiv Preprint arXiv:2506.01844. https://doi.org/10.48550/arXiv.2506.01844.
Soare, Alexander. 2025. “Smooth-as-Butter Robot Policies.” https://alexander-soare.github.io/robotics/2025/08/05/smooth-as-butter-robot-policies.html.
Zhao, Tony Z, Vikash Kumar, Sergey Levine, and Chelsea Finn. 2023. “Learning Fine-Grained Bimanual Manipulation with Low-Cost Hardware.” arXiv Preprint arXiv:2304.13705. https://doi.org/10.48550/arXiv.2304.13705.