research and things
hii this is a summary of all the research i've done since around 2020. my main interests are natural language processing, ai safety, computational linguistics, digital agents, and conversational ai, although i also have experience with speech and image processing, human-computer interaction, and music and ai. feel free to also check out my google scholar at this link! additionally, below, you can find brief summaries/abstracts of my publications. enjoy! ^_^
ProGRes: Prompted Generative Rescoring on ASR n-Best [PAPER] Conference: IEEE Spoken Language Technology Workshop
"Large Language Models (LLMs) have shown their ability to improve the performance of speech recognizers by effectively rescoring the n-best hypotheses generated during the beam search process. However, the best way to exploit recent generative instruction-tuned LLMs for hypothesis rescoring is still unclear. This paper proposes a novel method that uses instruction-tuned LLMs to dynamically expand the n-best speech recognition hypotheses with new hypotheses generated through appropriately-prompted LLMs. Specifically, we introduce a new zero-shot method for ASR n-best rescoring, which combines confidence scores, LLM sequence scoring, and prompt-based hypothesis generation. We compare Llama-3-Instruct, GPT-3.5 Turbo, and GPT-4 Turbo as prompt-based generators with Llama-3 as sequence scorer LLM. We evaluated our approach using different speech recognizers and observed significant relative improvement in the word error rate (WER) ranging from 5% to 25%."
President Botrick: An Analysis of Deep Learning-Based Conversational AI Models to Identify and Create Influential Political Speeches [PAPER] Conference: AAAI 2023 Workshop for AI and Diplomacy
"This paper explores the defining qualities of language that are considered influential and charismatic in the context of political speech. Transformer-based models have shown to be efficient in analyzing contextual clues and generating coherent texts in a variety of domains. With limited research in the identification and exploration of the replication of persuasion in natural human language and generation of influential speech, we seek to analyze the aspects of public speech that are deemed persuasive and impactful, and generate text accordingly. We propose a two-part experiment: First, we train a BERT-based encoder to weigh segments of speech in order to predict its influence on an audience; second, we train a GPT-based decoder to use an established understanding of persuasion to generate new political speech. We show that, using these models, a speech can be created that mimics the natural language habits of prominent political figures."
Comparing Approaches to Language Understanding for Human-Robot Dialogue: An Error Taxonomy and Analysis [PAPER] Conference: Language Resources and Evaluation Conference 2022
"In this paper, we compare two different approaches to language understanding for a human-robot interaction domain in which a human commander gives navigation instructions to a robot. We contrast a relevance-based classifier with a GPT-2 model, using about 2000 input-output examples as training data. With this level of training data, the relevance-based model outperforms the GPT-2 based model 79% to 68%, and an Oracle combination set an upper-bound of 85%. We also present a taxonomy of types of errors made by each model, indicating that they have somewhat different strengths and weaknesses, so we also examine the potential for a combined model."
ML-Based Eye Tracking for Augmented Reality Heads-Up Displays (AR HUDs) [PAPER] Conference: Society for Information Display Annual Display Week 2021
"3D Augmented Reality (AR) Heads-up Displays (HUDs) have the potential of overlaying virtual objects at the correct locations with accurate motion parallax. Accurate overlays require tracking the pupils of the driver’s eyes. We developed an ML-based pupil tracking system based on a convolutional neural network (CNN) to find the precise location of the pupils."
to send us an email use: rep@heavensgate.com