research and things

[image: M16, the Eagle nebula]

hii! this is a summary of all the research i've done since around 2020. my main interests are natural language processing, ai safety, computational linguistics, digital agents, and conversational ai, although i also have experience with speech and image processing, human-computer interaction, and music and ai.

feel free to also check out my google scholar at this link!

additionally, below, you can find brief summaries/abstracts of my publications. enjoy! ^_^

ProGRes: Prompted Generative Rescoring on ASR n-Best [PAPER]

Conference: IEEE Spoken Language Technology Workshop

             "Large Language Models (LLMs) have shown their ability to improve the performance of speech 
              recognizers by effectively rescoring the n-best hypotheses generated during the beam search process. 
              However, the best way to exploit recent generative instruction-tuned LLMs for hypothesis rescoring is
              still unclear. This paper proposes a novel method that uses instruction-tuned LLMs to dynamically expand
              the n-best speech recognition hypotheses with new hypotheses generated through appropriately-prompted
              LLMs. Specifically, we introduce a new zero-shot method for ASR n-best rescoring, which combines 
              confidence scores, LLM sequence scoring, and prompt-based hypothesis generation. We compare 
              Llama-3-Instruct, GPT-3.5 Turbo, and GPT-4 Turbo as prompt-based generators with Llama-3 as sequence 
              scorer LLM. We evaluated our approach using different speech recognizers and observed significant 
              relative improvement in the word error rate (WER) ranging from 5% to 25%."
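
a rough sketch of the general idea behind n-best rescoring, for readers who want something concrete. this is not the paper's actual implementation: the `Hypothesis` fields, the log-linear interpolation, and the `weight` parameter are all assumptions made up for illustration, and the prompt-based hypothesis generation step is left out entirely. the second helper shows what a "relative improvement" in WER means.

```python
import math
from dataclasses import dataclass


@dataclass
class Hypothesis:
    text: str
    asr_confidence: float  # hypothetical recognizer confidence in (0, 1]
    llm_logprob: float     # hypothetical LLM sequence log-probability


def rescore(hyps, weight=0.5):
    """Sort hypotheses best-first by an interpolated score.

    Assumed scoring rule (illustrative only):
        weight * log(asr_confidence) + (1 - weight) * llm_logprob
    """
    def score(h):
        # Clamp the confidence so log() never sees zero.
        conf = max(h.asr_confidence, 1e-10)
        return weight * math.log(conf) + (1.0 - weight) * h.llm_logprob

    return sorted(hyps, key=score, reverse=True)


def relative_wer_improvement(baseline_wer, new_wer):
    """Fraction by which WER dropped relative to the baseline."""
    return (baseline_wer - new_wer) / baseline_wer


if __name__ == "__main__":
    # Toy n-best list with made-up scores.
    n_best = [
        Hypothesis("i scream", 0.9, -12.0),
        Hypothesis("ice cream", 0.8, -6.0),
    ]
    print(rescore(n_best)[0].text)
    print(relative_wer_improvement(10.0, 8.0))
```

with `weight=1.0` the ranking falls back to the recognizer's own confidence, and with `weight=0.0` it uses only the LLM score, so the weight controls how much each source is trusted.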

President Botrick: An Analysis of Deep Learning-Based Conversational AI Models to Identify and Create Influential Political Speeches [PAPER]

Conference: AAAI 2023 Workshop for AI and Diplomacy

             "This paper explores the defining qualities of language that are considered influential 
              and charismatic in the context of political speech. Transformer-based models have shown
              to be efficient in analyzing contextual clues and generating coherent texts in a variety
              of domains. With limited research in the identification and exploration of the replication 
              of persuasion in natural human language and generation of influential speech, we seek to 
              analyze the aspects of public speech that are deemed persuasive and impactful, and generate 
              text accordingly. We propose a two-part experiment: First, we train a BERT-based encoder 
              to weigh segments of speech in order to predict its influence on an audience; second, we 
              train a GPT-based decoder to use an established understanding of persuasion to generate new 
              political speech. We show that, using these models, a speech can be created that mimics the 
              natural language habits of prominent political figures."

Comparing Approaches to Language Understanding for Human-Robot Dialogue: An Error Taxonomy and Analysis [PAPER]

Conference: Language Resources and Evaluation Conference 2022

             "In this paper, we compare two different approaches to language understanding for a
              human-robot interaction domain in which a human commander gives navigation instructions to a
              robot. We contrast a relevance-based classifier with a GPT-2 model, using about 2000 input-output
              examples as training data. With this level of training data, the relevance-based model outperforms
              the GPT-2 based model 79% to 68%, and an Oracle combination set an upper-bound of 85%. We also 
              present a taxonomy of types of errors made by each model, indicating that they have somewhat
              different strengths and weaknesses, so we also examine the potential for a combined model."
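
a small sketch of how an oracle combination gives an upper bound: an input counts as correct if at least one of the two models got it right. the function names and the toy correctness flags below are made up for illustration and are not the paper's data or code.

```python
def accuracy(correct_flags):
    """Fraction of examples a single model answered correctly."""
    return sum(correct_flags) / len(correct_flags)


def oracle_accuracy(correct_a, correct_b):
    """Accuracy of an ideal selector that always picks a correct model
    whenever at least one of the two is correct."""
    if len(correct_a) != len(correct_b):
        raise ValueError("both models must be scored on the same examples")
    hits = sum(1 for a, b in zip(correct_a, correct_b) if a or b)
    return hits / len(correct_a)


if __name__ == "__main__":
    # Toy per-example correctness flags (made up).
    model_a = [True, True, False, False]
    model_b = [True, False, True, False]
    print(accuracy(model_a), accuracy(model_b))
    print(oracle_accuracy(model_a, model_b))
```

the gap between the oracle number and the best single model shows how much a combined model could gain at most, which is why differing error patterns matter.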

ML-Based Eye Tracking for Augmented Reality Heads-Up Displays (AR HUDs) [PAPER]

Conference: Society for Information Display Annual Display Week 2021

             "3D Augmented Reality (AR) Heads-up Displays (HUDs) have the potential of overlaying
              virtual objects at the correct locations with accurate motion parallax. Accurate overlays 
              require tracking the pupils of the driver’s eyes. We developed an ML-based pupil tracking 
              system based on a convolutional neural network (CNN) to find the precise location of the pupils."



to send us an email, use: rep@heavensgate.com