Synthesizing Detection Data from Multiple Sources | Steinmetz Symposium

We study how a social robot with limited computational resources can maintain an accurate, live estimate of a person's position by fusing noisy, asynchronous outputs from multiple detectors. In a deliberately minimal setting, two vision-based methods (the "MIT Method" and "BOXES") each report a one-dimensional angular position when they detect a person, with different update rates, noise levels, and systematic biases. Using an inherited dataset of 4,207 labeled samples collected in a motion capture lab across twelve scenarios (six motion patterns, each under normal and low-light conditions), we compare three lightweight fusion approaches against a simple last-value baseline. The fusion methods are an Exponentially Weighted Moving Average (EWMA), a scalar Kalman Filter (KF) that models the angle as a random walk with sensor-specific noise and bias, and a filter based on Dempster-Shafer Theory (DST) that maintains a discrete belief distribution over angle and combines evidence on a one-dimensional grid. All methods process the same merged measurement stream and are evaluated against interpolated ground truth using mean absolute error (MAE), root mean squared error (RMSE), and median absolute error (MedAE), with additional trajectory plots for qualitative analysis. Across all scenarios, each fusion method improves on the last-value baseline, with the Dempster-Shafer approach achieving the lowest overall errors and the Kalman Filter producing the smoothest and most visually stable trajectories, particularly in low-light conditions. Including a bias term allows both the Kalman Filter and the Dempster-Shafer approach to correct for the systematic bias of the underlying detectors. The EWMA offers modest gains over the baseline but is more sensitive to rapid detector fluctuations, which makes it less reliable.
Our results show that even simple, computationally inexpensive fusion schemes can meaningfully improve tracking accuracy over naive use of individual detectors, and they highlight tradeoffs between accuracy, smoothness, and bias correction that are relevant for deployment on social robots and other systems.
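
As a rough illustration of the kind of lightweight fusion described above, the following Python sketch shows an EWMA update, a scalar Kalman filter with a random-walk state and per-sensor noise and bias, and the three error metrics. It is a minimal sketch under stated assumptions, not the implementation used in this work: the function and class names, the smoothing factor, and all variance and bias values are hypothetical, and the DST filter is not shown.

```python
import math


def ewma_update(estimate, measurement, alpha=0.3):
    """Blend a new measurement into the running estimate (alpha is an assumed value)."""
    if estimate is None:  # first measurement initializes the estimate
        return measurement
    return alpha * measurement + (1 - alpha) * estimate


class ScalarKalman:
    """Scalar Kalman filter: random-walk angle, per-sensor noise variance and bias."""

    def __init__(self, process_var, sensor_var, sensor_bias):
        self.q = process_var   # random-walk variance added per unit time
        self.r = sensor_var    # dict: sensor name -> measurement variance
        self.b = sensor_bias   # dict: sensor name -> systematic offset
        self.x = None          # current angle estimate
        self.p = None          # variance of the estimate

    def update(self, z, sensor, dt):
        z_corrected = z - self.b[sensor]  # remove the assumed sensor bias
        if self.x is None:
            self.x, self.p = z_corrected, self.r[sensor]
            return self.x
        p_pred = self.p + self.q * dt          # predict: uncertainty grows with time
        k = p_pred / (p_pred + self.r[sensor])  # Kalman gain
        self.x = self.x + k * (z_corrected - self.x)
        self.p = (1 - k) * p_pred
        return self.x


def error_metrics(estimates, truth):
    """Return (MAE, RMSE, MedAE) of the estimates against ground truth."""
    errs = sorted(abs(e - t) for e, t in zip(estimates, truth))
    n = len(errs)
    mae = sum(errs) / n
    rmse = math.sqrt(sum(e * e for e in errs) / n)
    mid = n // 2
    medae = errs[mid] if n % 2 else 0.5 * (errs[mid - 1] + errs[mid])
    return mae, rmse, medae
```

In this sketch, measurements from both detectors would be merged into one time-ordered stream and passed to `update` one at a time, with `dt` the time elapsed since the previous measurement, so that a detector with a lower assumed variance pulls the estimate more strongly than a noisier one.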

Presenting

Primary Speaker

Shane Mullahy

Additional Speakers

Neil Daterao

Hunter Gould

Faculty Sponsors

Aaron Cass

Abstract Details

Presentation Type

Poster

Faculty Department/Program