End-to-end architecture, sensor fusion, and real-time gimbal control
for autonomous optical target tracking
This paper describes the architecture and key design decisions of a ground-based GPS beacon tracking system that automatically commands a two-axis brushless gimbal camera to follow an airborne target carrying an XBee 802.15.4 radio beacon. The system spans three embedded components: an airborne Teensy 4.0 fusing GPS, IMU, barometer, and magnetometer into a 9-state Extended Kalman Filter and transmitting compact binary telemetry; a Teensy 4.1 gimbal controller executing 2 kHz field-oriented motor control on two GBM5208 brushless motors via SimpleFOC; and a Python ground station running a 6-DOF EKF, a critically-damped setpoint filter, and VISCA RS-485 zoom control. End-to-end tracking latency is dominated by the critically-damped filter at the default tuning (≈ 650 ms rise time at ωn = 6.0 rad/s), with a pre-filter pipeline of approximately 73–138 ms. Measured steady-state gimbal position hold accuracy is 0.46° RMS on elevation and 0.47° RMS on azimuth under camera load at 12 V.
Optical tracking of small unmanned aerial systems (UAS) from a fixed ground station presents a compound instrumentation challenge. The target must be continuously localized in three-dimensional space, the gimbal must be driven to point a narrow-field camera at that location faster than the target moves through the field of view, and the system must remain stable in the presence of mechanical resonance, sensor quantization noise, and variable RF link quality.
The system described here addresses this problem with a GPS-primary tracking architecture: the target carries a small radio beacon that continuously transmits its GPS-derived position and IMU attitude. The ground station fuses this telemetry in an Extended Kalman Filter and drives a brushless gimbal directly from the EKF state estimate, supplemented by lead compensation to reduce the effect of end-to-end latency. No computer vision is required for basic tracking; optical feedback is reserved for a future augmentation layer.
The design goal is a practically deployable prototype system — not a research demonstrator. Hardware is commercial off-the-shelf wherever possible, the software is openly licensed, and the system is documented honestly with respect to its current performance and known limitations.
The system comprises three independently operable components connected by serial radio and USB links. Figure 1 shows the full data flow from airborne sensors to gimbal motor coils.
The airborne unit is a Teensy 4.0 (600 MHz ARM Cortex-M7) integrating four sensors: an ICM-20948 IMU via SPI1 operating its on-chip Digital Motion Processor at 112 Hz, a u-blox NEO-M9N GPS receiver at 10 Hz via UART, a Bosch BMP581 barometer via I2C Wire2, and a QMC5883L magnetometer via I2C Wire0. The DMP produces calibrated quaternion orientation without consuming Teensy cycles for sensor fusion. Raw accelerometer data from the DMP FIFO drives the EKF predict step. A 9-state NED EKF fuses GPS position and velocity with IMU-derived acceleration, publishing position, velocity, and uncertainty estimates at the IMU rate. Every 100 ms, the firmware assembles a 34-byte binary telemetry frame, appends a CRC-16/CCITT-FALSE checksum, and transmits it over XBee 802.15.4 at 230,400 baud. The same hardware simultaneously operates as a 3-axis PID gyroscope stabiliser for the carrier aircraft, reading S.Bus inputs and driving six servos.
The ground station is a Python 3.11 application running on Windows. A SerialReceiver thread drains the XBee port, validates CRC, and queues TelemetryFrame objects. An EKFEstimatorThread runs at 100 Hz, executing predict steps between GPS updates and fusing GPS fixes and velocity measurements when available. A GimbalControllerThread runs at 200 Hz, applying lead compensation, filtering the setpoint through a critically-damped second-order filter, and writing angle commands over USB Serial to the gimbal controller. A DataLoggerThread batches binary records to a MRSLOG03 flight log on disk. All inter-thread communication uses bounded collections.deque objects defined in a single ipc_channels.py module, with no upward calls between threads.
The gimbal controller is a Teensy 4.1 running SimpleFOC v2.4.0 at a 2 kHz FOC loop rate. Elevation uses cascaded angle-velocity-voltage control; azimuth uses direct-voltage angle control (angle_nocascade), which proved more stable for the low-friction azimuth bearing. Both axes run AS5048A 14-bit magnetic encoders on separate SPI buses to avoid chip-select contention. PWM is set to 25 kHz to eliminate audible switching noise. A three-tier runaway fault detector monitors shaft velocity per axis, distinguishing sensor glitches from genuine motor runaway, and halts the affected axis with a defined recovery sequence.
Understanding the latency budget is prerequisite to tuning the lead compensation parameter. Table 1 breaks down each pipeline stage from GPS fix generation to gimbal response.
| Stage | Nominal | Worst Case | Notes |
|---|---|---|---|
| GPS fix interval | 100 ms | 100 ms | NEO-M9N at 10 Hz; telemetry decimated every 25th IMU tick |
| XBee 802.15.4 RF link | 9.5 ms | 15 ms | 7–12 ms nominal, ±2–3 ms jitter at 50 m LOS |
| Serial RX + CRC parse | 1 ms | 5 ms | GS serial timeout up to 5 ms between buffer drains |
| EKF predict + update | 1 ms | 2 ms | Numpy matrix ops; dominated by 6×6 Joseph-form update |
| CDF at ωn = 6.0 rad/s | 650 ms | 650 ms | 0→90% rise time; dominant term; tunable via ωn spinbox |
| Pre-filter total (avg) | ~73 ms | ~138 ms | Excluding CDF; lead compensation targets this range |
The critically-damped filter (CDF) rise time dwarfs all other contributions at the default tuning. This is a deliberate design tradeoff: the filter eliminates the high-frequency jitter that would otherwise produce visible gimbal chatter from GPS quantization and RF jitter, at the cost of tracking lag. Lead compensation is the mechanism for recovering that lag without reintroducing noise.
Before the setpoint enters the CDF, the ground station projects the EKF position estimate forward by latency_s seconds using the EKF velocity vector:
pos_lead = pos_ned + vel_ned × latency_s
The lead position is then converted to azimuth and elevation angles, which become the CDF setpoint. The recommended tuning range is 0.4–0.7 s, which covers the average pre-filter pipeline (73 ms) plus the dominant CDF contribution. At high target velocities the prediction error grows with target acceleration — the lead term is valid only for approximately constant-velocity flight segments.
The latency_s spinbox in the ground station UI is capped at 300 ms in the current implementation — below the recommended 400–700 ms range identified by the latency budget analysis. This cap should be raised to 1000 ms to allow full tuning range without code changes.
The airborne EKF maintains a 9-element state vector in the NED (North-East-Down) frame: position (3), velocity (3), and accelerometer bias (3). This allows bias estimation without a separate calibration step, at the cost of slower convergence. The process model is a constant-velocity plant with white acceleration noise driving the velocity states and a random-walk model driving the bias states. GPS position and velocity measurements are applied at 10 Hz; the barometer provides an altitude measurement at the same rate via a scalar update on the Down position state.
GPS outlier rejection uses a threshold gate on the implied position jump between consecutive fixes: a fix that implies travel exceeding GPS_SPIKE_THRESHOLD_M (15 m) in one 100 ms epoch is discarded. This is a simple Euclidean gate rather than a Mahalanobis gate; a Mahalanobis gate on the 3×3 position innovation covariance would be more principled and is a candidate for a future revision.
The ground station runs a separate, simpler 6-state EKF (position + velocity, no bias states) on the received telemetry. Process noise is derived using the Van Loan continuous noise model, which ensures Q scales correctly with dt when the filter runs at irregular intervals. Three measurement gates are applied before each GPS update:
1. Mahalanobis distance — chi-squared test at 3σ threshold; rejects fixes inconsistent with the current covariance.
2. Acceleration plausibility — implied velocity change must not exceed 5g over one update interval.
3. Range gate — fix must be within 20 km of the ground station.
GPS measurement noise is scaled adaptively: fixes with HDOP above a threshold receive proportionally inflated measurement noise, reducing their weight in the update step without outright rejection. This handles the gradual accuracy degradation during partial sky obstruction without triggering the hard Mahalanobis gate.
The airborne firmware does not include the NEO-M9N NAV-PVT iTOW timestamp in the 34-byte telemetry frame. The ground station EKF cannot distinguish a freshly received GPS fix from a retransmitted or delayed frame, and will fuse it as if it arrived at the time the serial byte was received. Under nominal link conditions (low loss, <15 ms latency) this introduces a sub-epoch timing error of approximately 5–10 ms. Under poor link conditions the error can approach one full GPS epoch (100 ms). Adding iTOW to the telemetry frame is the highest-priority firmware revision item.
Ground station angle commands pass through a second-order critically-damped filter (damping ratio ζ = 1.0) before being transmitted to the Teensy. Critical damping is chosen to eliminate overshoot — any underdamped response would cause the camera to swing past the target on step commands, which is visible in the optical output and counterproductive for capture. The filter is discretised with semi-implicit Euler at 200 Hz. The natural frequency ωn is configurable via a UI spinbox; Table 2 shows the performance tradeoff across the tuning range.
| ωn (rad/s) | Rise time (0→90%) | Tracking lag | Use case |
|---|---|---|---|
| 3.0 | ~1.3 s | High | Very smooth, slow-moving targets |
| 6.0 (default) | ~650 ms | Moderate | General tracking; lead comp. required |
| 10.0 | ~390 ms | Lower | Fast targets; increased command noise |
| 20.0 | ~195 ms | Low | Near-passthrough; GPS jitter visible |
The elevation and azimuth axes use different SimpleFOC control modes for reasons specific to their mechanical loads. The elevation axis runs cascaded angle → velocity → voltage control (MotionControlType::angle). The velocity integrator winds up against the static camera weight load, providing stable hold without a position error offset. The azimuth axis runs direct-voltage angle control (angle_nocascade): the angle PID outputs voltage directly, bypassing the inner velocity loop. This was chosen because the azimuth bearing has low, uniform friction; the inner velocity loop added phase lag without stability benefit for this load profile.
The BirdDog MAKI Ultra 20× camera is controlled via VISCA RS-485 at 9600 baud. Zoom position is mapped to slant range using a configurable smooth-step anchor table, allowing the zoom to track the target at a consistent apparent size across a range of distances. Zoom commands are rate-limited to 2 Hz to remain within the VISCA command budget; the ground station computes zoom target at 200 Hz but the VISCA I/O thread only transmits when the computed position changes significantly. The zoom position is further smoothed by its own critically-damped filter to avoid abrupt zoom changes during range fluctuations.
The gimbal controller implements a three-tier per-axis fault state machine:
| Tier | Trigger | Action | Recovery |
|---|---|---|---|
| Instant clamp | |ω| > 40 rad/s | Disable axis, enter glitch hold | Re-enable after 2 s if |ω| < 3 rad/s |
| Runaway watchdog | |ω| > 15 rad/s for 500 ms + travel > 1 rad | Disable, 2 s pause, resume at current angle | Automatic; up to 3 events before halt |
| Repeated fault | >3 runaways or >10 spikes | Permanent halt (while(1)) | Power cycle required |
The distinction between Tier 1 and Tier 2 is the travel criterion: a high velocity reading with less than 1 radian of actual shaft travel is classified as a sensor glitch (quantization noise spike or SPI fault), not a motor runaway. This prevents the fault recovery sequence from disrupting tracking during transient encoder noise events.
Plant characterization data was collected with the camera payload installed at 12 V supply. Step response tests used ±10° commanded steps at the operating tune. Table 3 summarizes the measured results.
| Metric | Elevation | Azimuth |
|---|---|---|
| Rise time (10–90%) | 0.120 s | 0.160 s |
| Overshoot | 5.6% | 0.0% |
| Damping ratio ζ | 0.655 | ~0.757 |
| Natural frequency ωn | 3.81 rad/s | ~10.64 rad/s |
| Steady-state RMS error | 0.46° | 0.47° |
| Steady-state mean error | −0.08° | −0.08° |
| 95th percentile error | 1.03° | 0.91° |
| Closed-loop bandwidth (−3 dB) | 0.53 Hz | 1.65 Hz |
| Structural resonance | 4.9 Hz, 34 dB | — |
| Mean Vq in steady state | 0.46 V | −0.01 V |
The 0.46°/0.47° RMS steady-state error reflects residual low-frequency oscillation visible in the error signal between step transitions. The 4.9 Hz structural resonance on the elevation axis is suppressed by a second-order IIR notch filter applied post-loopFOC(), but the 34 dB peak before notching indicates significant mechanical energy at frequencies adjacent to the notch — a broader-bandwidth notch or structural damping treatment may be warranted. The non-zero mean Vq on elevation (0.46 V) confirms the velocity integrator is actively holding the camera weight against gravity, as designed. The near-zero azimuth Vq confirms the direct-voltage control mode is not fighting a constant load torque.
GPS iTOW absent from telemetry. The NEO-M9N NAV-PVT iTOW field is not transmitted, preventing the ground station from detecting stale GPS measurements. This is the highest-priority firmware revision item.
Flat-earth approximation. The NED coordinate conversion in geometry.py uses a flat-earth model valid to approximately 0.01% error at ranges up to 10 km. Operation at longer ranges requires an ellipsoidal geodetic transformation.
CDF latency compensation spinbox cap. The latency_s UI spinbox is capped at 300 ms, below the recommended tuning range. The cap should be raised to 1000 ms.
Single-frequency GPS. The NEO-M9N is an L1-only receiver. Horizontal position accuracy under open-sky conditions is approximately 2–3 m CEP, which at 100 m slant range represents a ±1.1° angular uncertainty — comparable to the measured steady-state error.
Steady-state tracking error. The 0.46°–0.47° RMS error exceeds what a narrow-field zoom camera requires for reliable target retention at long range. The primary candidate for improvement is mechanical — reducing the structural resonance amplitude or increasing the notch filter bandwidth without exciting the adjacent modes.
iTOW in telemetry frame. Adding the 4-byte GPS timestamp to the AirTelemetry_t struct enables the ground station to compensate for stale measurements and provides a common time reference for multi-sensor data fusion.
Optical augmentation. The DetectionBasedCameraTracker module in the ground station is under development. When complete, it will publish angular corrections from the camera frame to the gimbal controller, closing an optical feedback loop around the GPS-primary tracking system.
Dual-frequency GPS. Upgrading to an L1/L5 receiver (e.g., u-blox F10) would reduce position uncertainty to sub-meter CEP under open-sky conditions, relaxing the accuracy requirement on the gimbal and enabling operation at longer tracking ranges.
Mechanical damping. A review of the gimbal structure to identify and treat the source of the 4.9 Hz resonance — likely a compliant joint in the elevation axis — would reduce the notch filter requirement and potentially lower steady-state tracking error.
The described system demonstrates that a GPS-primary beacon tracking approach is viable for small UAS tracking at tactically useful ranges using commercial off-the-shelf hardware. The end-to-end pipeline from airborne sensor to camera pointing is implemented in approximately 3,500 lines of embedded C++ and 6,000 lines of Python, all openly licensed under PolyForm Noncommercial 1.0.0.
The dominant engineering constraints are the critically-damped filter latency (inherent in the smooth-tracking requirement) and the GPS position uncertainty at close range (inherent in single-frequency L1 receivers). Both are well-characterized and have defined upgrade paths. The system as built provides a solid foundation for the optical augmentation layer that represents the logical next development phase.
Source code, schematics, and plant characterization data are available at github.com/lcumbridge/mrs-beacon-tracker. Commercial licensing inquiries should be directed to microrobosys.com.