Ashish Shenoy

Applied researcher and machine learning engineer. Research interests span computer vision (scene text recognition), multimodal large language models (LLMs), privacy preserving ML (Federated Learning), on-device / wearables AI and Automatic Speech Recognition (ASR).

I also serve on technical program committee for EMNLP and ACL among others.

selected publications

KDD 24

Lumos: Empowering Multimodal LLMs with Scene Text Recognition

Ashish Shenoy, Yichao Lu, Srihari Jayakumar, and 11 more authors

In Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2024

Abs HTML

We introduce Lumos, the first end-to-end multimodal question-answering system with text understanding capabilities. At the core of Lumos is a Scene Text Recognition (STR) component that extracts text from first person point-of-view images, the output of which is used to augment input to a Multimodal Large Language Model (MM-LLM). While building Lumos, we encountered numerous challenges related to STR quality, overall latency, and model inference. In this paper, we delve into those challenges, and discuss the system architecture, design choices, and modeling techniques employed to overcome these obstacles. We also provide a comprehensive evaluation for each component, showcasing high quality and efficiency.
FL-ICML 23

Green Federated Learning

Ashkan Yousefpour, Shen Guo, Ashish Shenoy, and 7 more authors

2023

HTML
Interspeech 21

Adapting Long Context NLM for ASR Rescoring in Conversational Agents

Ashish Shenoy, Sravan Bodapati, Monica Sunkara, and 2 more authors

In Proc. Interspeech 2021, 2021

HTML PDF
ACL-ECNLP

ASR Adaptation for E-commerce Chatbots using Cross-Utterance Context and Multi-Task Language Modeling

Ashish Shenoy, Sravan Bodapati, and Katrin Kirchhoff

In Proceedings of the 4th Workshop on e-Commerce and NLP, Aug 2021

Abs HTML PDF

Automatic Speech Recognition (ASR) robustness toward slot entities are critical in e-commerce voice assistants that involve monetary transactions and purchases. Along with effective domain adaptation, it is intuitive that cross utterance contextual cues play an important role in disambiguating domain specific content words from speech. In this paper, we investigate various techniques to improve contextualization, content word robustness and domain adaptation of a Transformer-XL neural language model (NLM) to rescore ASR N-best hypotheses. To improve contextualization, we utilize turn level dialogue acts along with cross utterance context carry over. Additionally, to adapt our domain-general NLM towards e-commerce on-the-fly, we use embeddings derived from a finetuned masked LM on in-domain data. Finally, to improve robustness towards in-domain content words, we propose a multi-task model that can jointly perform content word detection and language modeling tasks. Compared to a non-contextual LSTM LM baseline, our best performing NLM rescorer results in a content WER reduction of 19.2% on e-commerce audio test set and a slot labeling F1 improvement of 6.4%.
Interspeech 22

Domain Prompts: Towards memory and compute efficient domain adaptation of ASR systems

Saket Dingliwal, Ashish Shenoy, Sravan Bodapati, and 3 more authors

In Proc. Interspeech 2022, Aug 2022

HTML PDF
ASRU 21

Remember the Context! ASR Slot Error Correction Through Memorization

Dhanush Bekal, Ashish Shenoy, Monica Sunkara, and 2 more authors

In 2021 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), Aug 2021

HTML PDF
preprint

Now It Sounds Like You: Learning Personalized Vocabulary On Device

Sid Wang, Ashish Shenoy, Pierce Chuang, and 1 more author

Aug 2023

selected patents

patent

Wearable Device Including An Artificially Intelligent Assistant For Generating Responses Based On Shared Contextual Data, And Systems And Methods Of Use Thereof

Ashish Vishwanath Shenoy

2025

HTML
patent

Dynamic language model updates with boosting

Ashish Vishwanath Shenoy

2025

HTML
patent

Systems and Methods for Providing User Experiences on Smart Assistant Systems

Ashish Vishwanath Shenoy

2023

HTML
patent

Contextual biasing of neural language models using metadata from a natural language understanding component and embedded recent history

Ashish Vishwanath Shenoy

2023

HTML