Publications

Our teams aspire to make discoveries that impact everyone, and core to our approach is sharing our research and tools to fuel progress in the field.

people standing in front of a screen with images and a chipboard

Our teams aspire to make discoveries that impact everyone, and core to our approach is sharing our research and tools to fuel progress in the field.

Sort By
  • Title
  • Title, descending
  • Year
  • Year, descending
1 - 15 of 11309 publications
    Preview abstract The rapid expansion of the Internet of Things (IoT) and smart home ecosystems has led to a fragmented landscape of user data management across consumer electronics (CE) such as Smart TVs, gaming consoles, and set-top boxes. Current onboarding processes on these devices are characterized by high friction due to manual data entry and opaque data-sharing practices. This paper introduces the User Data Sharing System (UDSS), a platform-agnostic framework designed to facilitate secure, privacy-first PII (Personally Identifiable Information) exchange between device platforms and third-party applications. Our system implements a Contextual Scope Enforcement (CSE) mechanism that programmatically restricts data exposure based on user intent—specifically distinguishing between Sign-In and Sign-Up workflows. Unlike cloud-anchored identity standards such as FIDO2/WebAuthn, UDSS is designed for shared, device-centric CE environments where persistent user-to-device bind-ing cannot be assumed. We further propose a tiered access model that balances developer needs with regulatory compliance (GDPR/CCPA). A proof-of-concept implementation on a reference ARMv8 Linux-based middleware demonstrates that UDSS reduces user onboarding latency by 65% and measurably reduces PII over-exposure risk through protocol-enforced data minimization. This framework provides a standardized approach to identity management in the heterogeneous CE market. View details
    Preview abstract Large Language Models (LLMs) such as ChatGPT can infer personal attributes from seemingly innocuous text, raising privacy risks beyond memorized data leakage. While prior work has demonstrated these risks, little is known about how users estimate and respond. We conducted a survey with 240 U.S. participants who judged text snippets for inference risks, reported concern levels, and attempted rewrites to block inference. We compared their rewrites with those generated by ChatGPT and Rescriber, a state-of-the-art sanitization tool. Results show that participants struggled to anticipate inference, performing a little better than chance. User rewrites were effective in just 28% of cases - better than Rescriber but worse than ChatGPT. We examined our participants’ rewriting strategies, and observed that while paraphrasing was the most common strategy it is also the least effective; instead abstraction and adding ambiguity were more successful. Our work highlights the importance of inference-aware design in LLM interactions. View details
    The Synthetic Gap: Automating Forensic Investigation of "AI Slop" with the Scaled Abuse Forensics Examiner (SAFE)
    Vahid Jalali
    Longling Wang
    Geethik Narayana Kamineni
    Utkarsh Chaudhary
    Crystal Zhao
    Lucas Liu
    2026
    Preview abstract Generative AI capabilities have enabled malicious actors to flood online platforms with "AI slop"—mass-produced, low-quality synthetic media designed to overwhelm traditional integrity systems. These adversarial campaigns often utilize coordinated networks to distribute unique, localized variations of synthetic content, rendering static detection methods ineffective. The signals to detect coordination often have recall gaps. The content is not exactly duplicative to be in the same repetitive video cluster. The abusers however show similar patterns of behavior which need forensics. Manual forensic investigations cannot scale to match the velocity of these generative attacks. To address this, we present SAFE (Scaled Abuse Forensics Examiner), an automated multi-agent architecture designed for the scalable forensics of adversarial synthetic media. The system decomposes the investigation process into specialized agents: a Cluster Understanding Agent specialized in analyzing the relations between channels in a cluster, a Behavior Understanding Agent that identifies inorganic spatiotemporal patterns, and a Content Understanding Agent that utilizes LoRA-adapted Large Language Models (LLMs) and few-shot learning to detect existing policy violations and spirit of the policy violations respectively . A Root Agent synthesizes these multimodal signals to render a final verdict. Early deployment results indicate that SAFE significantly accelerates the identification of novel synthetic threats, reducing forensic investigation time compared to human-in-the-loop workflows. View details
    Preview abstract The emergence of Agentic AI—autonomous systems capable of reasoning, decision-making, and multi-step execution—represents a paradigm shift in enterprise technology. Moving beyond simple generative tasks, these agents offer the potential to solve long-standing industry pain points, with over 90% of enterprises planning integration within the next three years. However, the transition from successful proof-of-concept (PoC) to a resilient, production-grade system presents significant hurdles. This article categorizes these challenges into three primary domains: Technical and Engineering Hurdles: Issues such as "entangled workflows" that complicate debugging, the struggle to maintain output quality and mitigate hallucinations, and the unpredictability caused by shifting underlying models or data sources. People, Process, and Ecosystem Hurdles: The high operational costs and unclear ROI of large models, the necessity of a new "Agent Ops" skillset, the complexity of integrating agents with disparate enterprise systems, and a rapidly evolving regulatory landscape. The Pace of Change and Security risks: The technical debt incurred by shifting software frameworks and the expanded attack surface created by autonomous agents. The article concludes that successful deployment requires a shift from informal "vibe-testing" to rigorous engineering discipline. By adopting code-first frameworks, establishing robust evaluation metrics (KPIs), and prioritizing functional deployment over theoretical optimization, organizations can effectively manage the lifecycle of Agentic AI and realize its transformative business value. View details
    Preview abstract Source-to-source compilers may perform inefficiently by executing transpilation passes on scripts that do not contain the specific language features a pass is designed to transform, potentially leading to redundant processing. A compiler can analyze a script to generate a per-script feature map, for example, by identifying language features in its abstract syntax tree (AST). Before executing a transpilation pass, the compiler can check this map and may bypass the pass for that script if the specific feature targeted by the pass is not present. This feature map can also be dynamically updated throughout the compilation process as other passes transform the code. This method of conditional pass execution based on content-aware analysis may reduce redundant AST traversals, which could decrease overall compilation time and computational resource consumption. View details
    Preview abstract We introduce a new context-enriched time series forecasting benchmark TimesX. TimesX contains a wide selection of high-quality real-world time series and diverse textual contexts from an automated generating pipeline, which helps address three main issues of existing benchmarks: (1) poor generalization due to low data volume and data being synthetic, (2) restricted forms of context, and (3) an inability to mitigate data leakage. We conduct a thorough empirical study of current multimodal solutions on TimesX. Our results suggest that most multimodal solutions that work well on existing benchmarks may fail on TimesX. In contrast, simple ensemble methods that leverage the rich textual context can outperform strong unimodal baselines and other multimodal baselines. ** Below this is what was submitted to ITP. ** We create a real world multimodal time-series forecasting benchmark that encompasses diverse domains and regions. Each time-series is annotated by various kinds of contexts like metadata, date and holiday information, dynamic events related to the time-series. This is sufficiently more advanced than other available benchmarks which rely wither on static metadata alone or synthetic examples. This forms a test bed for multimodal forecasting. We also present some baseline results showing that ensembles of publicly available LLMs and time-series foundation models can demonstrate non-trivial performance on this bechmark. View details
    Preview abstract Communicating spatial tasks via text or speech creates ``a mental mapping gap'' that limits an agent’s expressiveness. Inspired by co-speech gestures in face-to-face conversation, we propose \textsc{AgentHands}, an LLM-powered XR system that equips agents with hands to render responses clearer and more engaging. Guided by a design taxonomy distilled from a formative study (N=10), we implement a novel pipeline to generate and render a hand agent that augments conversational responses with synchronized, space-aware, and interactive hand gestures: using a meta-instruction, \textsc{AgentHands} generates verbal responses embedded with \textit{GestureEvents} aligned to specific words; each event specifies gesture type and parameters. At runtime, a parser converts events into time-stamped poses and motions, driving an animation system that renders expressive hands synchronized with speech. In a within-subjects study (N=12), \textsc{AgentHands} increased engagement and made spatially grounded conversations easier to follow compared to a speech-only baseline. View details
    Preview abstract **Agentic Engineering** is the rigorous discipline of treating Large Language Models as semi-autonomous systems that execute complex, multi-step workflows (trajectories) based on verifiable specifications, rather than using them as simple autocomplete engines. Here is a brief summary of its core principles: * **Main Goals:** It aims to maximize the agent's autonomous run-time, multiply a single engineer's impact by running parallel tasks, and offload tedious boilerplate coding. * **The "Harness":** A raw model is virtually useless without heavy investment in a harness—comprising tools, system prompts, and strict guardrails—to reliably guide the model and enforce coding policies. * **Loss of Micro-Control:** Engineers must surrender idiosyncratic stylistic preferences; if the agent's code passes automated linters and tests, it is accepted. * **Meta-Debugging:** When failures occur, engineers no longer fix code syntax. Instead, they debug the workflow itself—adjusting the agent's tools, search queries, or prompt constraints to ensure repeatable success. View details
    Preview abstract The remarkable success of Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs) in 2D computer vision has catalyzed significant research into their adaptation for the complex domain of 3D analysis. However, a fundamental dichotomy exists between the regular, dense grid of 2D images and the irregular, sparse nature of 3D data formats such as point clouds and meshes. This paper provides a comprehensive survey and a novel intellectual framework for navigating this burgeoning field. Our core contribution is a new taxonomy that organizes adaptation strategies into three distinct families: (1) Data-centric methods, which project 3D data into 2D formats to leverage off-the-shelf 2D models; (2) Architecture-centric methods, which design intrinsic network modules to directly process 3D data; and (3) Hybrid methods, which synergistically combine pre-trained 2D features with 3D modeling processing pipelines to benefit from both rich visual priors and explicit geometric reasoning. Through this taxonomic lens, we conduct a systematic review and qualitative synthesis of the field. We illuminate the fundamental trade-offs between these families concerning computational complexity, reliance on large-scale pre-training, and the preservation of geometric inductive biases. Based on this analysis, we identify and discuss critical open challenges and chart promising future research directions, including the development of 3D foundation models, advancements in self-supervised learning for geometric data, and the deeper integration of multi-modal signals. This survey serves as an essential resource and roadmap for researchers seeking to understand and advance the state-of-the-art in 3D computer vision. View details
    Preview abstract PURPOSE: To introduce Cardio Load (CL), a metric quantifying cardiovascular work from all activities across the day, and to investigate its distribution by age, gender, and workout profiles. CL adapts the Training Impulse (TRIMP) model by leveraging continuous heart rate and movement data from wearables, enabling minute-level intensity estimation. We also discuss the derivation of weekly target loads, intended to guide fitness maintenance. METHODS: A retrospective analysis was conducted on 31.2 million hours of wrist-worn wearable data collected over a six-week period. The dataset comprised a 40,000-subject subset (37.9% female) of consenting Google Pixel Watch® users in the United States, aged 18 to 80 years (18-39: 41.8%, 40-59: 43.5%, 60+: 14.6%). Measured data included minute-interval heart rate averages, resting and maximum heart rates, minute-interval averaged accelerometer log energy, and manually-logged or auto-detected activity types. Cardio Load scores and target loads were calculated daily for each subject and compared across age and gender. We also compared the proportions of CL gained during workouts and incidental daily activities for these groups. RESULTS: Overall, the study population's mean ± SD weekly CL scores were 221 ± 156 (female) and 259 ± 169 (male). Median weekly Cardio Load (CL) values exhibited consistency for individuals between 30 and 75 years of age. When analyzed in five-year age groups, the coefficient of variation (CV%) of median weekly CL values within this age range was less than 4.5%, with younger and older subjects demonstrating higher and lower median CL, respectively. The median proportion of CL accumulated during structured workouts versus incidental daily activity was 41.0% (female) and 49.0% (male) for all subjects, though this varied considerably with average weekly workout duration. CV% of weekly target load and daily target load over 6 weeks was 23.6% and 35.2% respectively. CONCLUSION: Cardio Load provides a continuous quantification of activity load from wearables, acknowledging both structured workouts and everydayincidental activity. CL is equitably rewarded for age ranges spanning 30-75 years. Weekly target loads were found to have little measurement variability and be more consistent and, consequently, more practical for planning training and physical activity than daily targets. View details
    Performance analysis of updated Sleep Tracking algorithms across Google and Fitbit wearable devices
    Arno Charton
    Linda Lei
    Siddhant Swaroop
    Marius Guerard
    Michael Dixon
    Logan Niehaus
    Shao-Po Ma
    Logan Schneider
    Ross Wilkinson
    Ryan Gillard
    Conor Heneghan
    Pramod Rudrapatna
    Mark Malhotra
    Shwetak Patel
    Google, Google, 1600 Amphitheatre Parkway Mountain View, CA 94043 (2026) (to appear)
    Preview abstract Background: The general public has increasingly adopted consumer wearables for sleep tracking over the past 15 years, but reports on performance versus gold standards such as polysomnogram (PSG), high quality sleep diaries and at-home portable EEG systems still show potential for improved performance. Two aspects in particular are worthy of consideration: (a) improved recognition of sleep sessions (times when a person is in bed and has attempted to sleep), and (b) improved accuracy on recognizing sleep stages relative to an accepted standard such as PSG. Aims: This study aimed to: 1) provide an update on the methodology and performance of a system for correctly recognizing valid sleep sessions, and 2) detail an updated description of how sleep stages are calculated using accelerometer and inter-beat intervals Methods: Novel machine learning algorithms were developed to recognize sleep sessions and sleep stages using accelerometer sensors and inter-beat intervals derived from the watch or tracker photoplethysmogram. Algorithms were developed on over 3000 nights of human-scored free-living sleep sessions from a representative population of 122 subjects, and then tested on an independent validation set of 47 users. Within sleep sessions, an algorithm was developed to recognize periods when the user was attempting to sleep (Time-Attempting-To-Sleep = TATS). For sleep stage estimation, an algorithm was trained on human expert-scored polysomnograms, and then tested on 50 withheld subject nights for its ability to recognize Wake, Light (N1/N2), Deep (N3) and REM sleep relative to expert scored labels. Results: For sleep session estimation, the algorithm had at least 95% overlap on TATS with human consensus scoring for 94% of nights from healthy sleepers. For sleep stage estimation, comparing with the current Fitbit algorithm, Cohen’s kappa for four-class determination of sleep stage increased from an average of 0.56 (std 0.13) to 0.63 (std 0.12), and average accuracy increased from 71% (std 0.10) to 77% (std 0.078) Conclusion: A set of new algorithms has been developed and tested on Fitbit and Pixel Watches and is capable of providing robust and accurate measurement of sleep in free-living environments. View details
    Preview abstract Object-Counting for remote-sensing (RS) imagery is raising increasing research interest due to its crucial role in a wide and diverse set of applications. While several promising methods for RS object-counting have been proposed, existing methods focus on a closed, pre-defined set of object classes. This limitation necessitates costly re-annotation and model re-training to adapt current approaches for counting of novel objects that have not been seen during training, and severely inhibits their application in dynamic, real-world monitoring scenarios. To address this gap, in this work we propose RS-OVC - an adaptation of existing work for Open Vocabulary Counting (OVC) approach from general computer vision to the RS domain. We show that our model is capable of accurate counting of novel object classes, that are unseen during training, based solely on textual and/or visual conditioning. View details
    Agentic Coding Needs Proactivity, Not Just Autonomy
    Georgios Evangelopoulos
    (2026) (to appear)
    Preview abstract Coding agents are rapidly changing the landscape of software development, moving from inline com- pletion to autonomous systems that edit repositories, open pull requests, respond to issues, and run scheduled or webhook triggered routines across the development life cycle. The next generation is increasingly described as proactive and long-horizon: agents should notice relevant changes before the developer asks, connect signals across tools, decide when to interrupt, and carry preferences across sessions. Yet the field lacks a precise account of what proactivity means for software development, how it differs from autonomy, what acceptance criteria proactive long-horizon tasks should satisfy, and which metrics determine whether unsolicited agent behavior is useful rather than merely active. We argue that proactive coding agents should be evaluated by the quality and improvement of their insight policy: the policy that decides what matters next, what evidence supports it, whether to surface it, and how to adapt after feedback. We re-anchor this view in mixed initiative interaction, introduce a three level taxonomy (Reactive, Scheduled, and Situation Aware), compare contemporary coding agents against five operational criteria, and sketch an active user simulation protocol with three evaluation targets: Insight Decision Quality (IDQ), Context Grounding Score (CGS), and Learning Lift (LL). View details
    A Computer Vision Problem in Flatland
    Erin Connelly
    Annalisa Crannell
    Timothy Duff
    Rekha R. Thomas
    SIAM Journal on Applied Algebra and Geometry, 10 (2026), pp. 14-45
    Preview abstract When is it possible to project two sets of labeled points of equal cardinality lying in a pair of projective planes to the same image on a projective line? We give a complete answer to this question, obtaining the following results. We first show that such a pair of projections exist if and only if the two point sets are themselves images of a common point set in projective space. Moreover, we find that for generic pairs of point sets, a common projection exists if and only if their cardinality is at most seven. In these cases, we give an explicit description of the loci of projection centers that enable a common image. View details
    DeduBB: Binary Code Size Reduction via Post-Link Basic Block De-duplication
    Chaitanya Mamatha Ananda
    Rajiv Gupta
    Mahbod Afarin
    Han Shen
    LCTES (Languages, Compilers, Tools and Theory of Embedded Systems) (2026) (to appear)
    Preview abstract Binary sizes of newer versions of software applications tend to be larger, primarily due to feature bloat. This poses various challenges, particularly for mobile applications. It affects upgrade rates directly impacting revenues, increases maintenance costs of supporting multiple versions, and prevents some users from getting critical security fixes. Code bloat also poses a problem for large warehouse-scale applications. Such applications experience performance degradation when their code size exceeds what smaller and more efficient code models can handle. In this paper, we introduce a post-link optimization tech nique called DeduBB, which deduplicates basic blocks of an application across procedure boundaries. While prior tech- niques used function outlining to de-duplicate redundant code sequences, it missed out on many opportunities as it cannot handle code that manipulates the program stack. In addition, previous techniques were either limited to the scope of a module or lacked scalable implementations required to handle large warehouse-scale applications. Our technique, DeduBB, handles all types of code duplication as we use a novel save-and-jump code pattern to execute de-duplicated code blocks. In addition, DeduBB has been designed to work on scalable post-link optimizers and can even be applied to large warehouse-scale datacenter applications. Finally, DeduBB is profile-guided and can be applied selectively to infrequently executed cold basic blocks to not affect application performance. In fact, in several cases, the performance of the smaller application binary improves due to reductions in its hot working set size. We have implemented our technique on the state-of-the-art post link optimizers, BOLT and Propeller. Experiments show that we can significantly reduce the code size of several benchmarks by 1.55% to 18.63%, on both Arm and x86 platforms, and on binaries that have already been heavily optimized for size using existing code size reduction features. Furthermore, aided by profiles, our technique can retain more than 80% of the maximal code size savings without affecting performance. View details
    ×