| research and things |
EDIT: i'm no longer updating this page, but my google scholar will always have my most recent work and interests :-)
hii this is a summary of all the research i've done since around 2020. my main interests are
natural language processing, ai safety, computational linguistics, digital agents, and conversational ai, although i also have experience with speech and image processing, human-computer interaction, and music and ai. feel free to also check out my google scholar at this link! additionally, below, you can find brief summaries/abstracts of my publications. enjoy! ^_^
|
|
ProGRes: Prompted Generative Rescoring on ASR n-Best [PAPER] Conference: IEEE Spoken Language Technology Workshop
"Large Language Models (LLMs) have shown their ability to improve the performance of speech
recognizers by effectively rescoring the n-best hypotheses generated during the beam search process.
However, the best way to exploit recent generative instruction-tuned LLMs for hypothesis rescoring is
still unclear. This paper proposes a novel method that uses instruction-tuned LLMs to dynamically expand
the n-best speech recognition hypotheses with new hypotheses generated through appropriately-prompted
LLMs. Specifically, we introduce a new zero-shot method for ASR n-best rescoring, which combines
confidence scores, LLM sequence scoring, and prompt-based hypothesis generation. We compare
Llama-3-Instruct, GPT-3.5 Turbo, and GPT-4 Turbo as prompt-based generators with Llama-3 as sequence
scorer LLM. We evaluated our approach using different speech recognizers and observed significant
relative improvement in the word error rate (WER) ranging from 5% to 25%."
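a rough sketch of the rescoring idea from the abstract above (not the paper's actual code): the original n-best list is extended with extra generated hypotheses, and every candidate is ranked by a weighted sum of an asr confidence score and a language-model sequence score. the function names, the `lm_weight` value, the fallback confidence given to generated hypotheses, and the toy scorer are all assumptions made up for illustration.

```python
from typing import Callable, List, Tuple

# (hypothesis text, ASR confidence expressed as a log-probability)
Hypothesis = Tuple[str, float]


def rescore_nbest(
    nbest: List[Hypothesis],
    generated: List[str],
    lm_score: Callable[[str], float],
    lm_weight: float = 0.5,
) -> str:
    """Extend the n-best list with generated hypotheses and return the top one."""
    if not nbest:
        raise ValueError("nbest must contain at least one hypothesis")

    # Assumption: generated hypotheses carry no ASR confidence of their own,
    # so they inherit the lowest confidence found in the original list.
    floor = min(conf for _, conf in nbest)

    candidates = list(nbest)
    seen = {text for text, _ in nbest}
    for text in generated:
        if text not in seen:
            candidates.append((text, floor))
            seen.add(text)

    def combined(hyp: Hypothesis) -> float:
        text, conf = hyp
        # Linear interpolation of the two scores (hypothetical weighting).
        return (1.0 - lm_weight) * conf + lm_weight * lm_score(text)

    return max(candidates, key=combined)[0]


def toy_lm_score(text: str) -> float:
    """Stand-in scorer: a real system would use an LLM sequence log-probability."""
    vocab = {"i", "want", "to", "recognize", "speech"}
    # Each out-of-vocabulary word costs one point.
    return -float(sum(1 for word in text.split() if word not in vocab))


if __name__ == "__main__":
    nbest = [
        ("i want to wreck a nice beach", -1.0),
        ("i want to recognise speech", -1.2),
    ]
    generated = ["i want to recognize speech"]
    print(rescore_nbest(nbest, generated, toy_lm_score))
```

with `lm_weight=0.0` the ranking falls back to the asr confidence alone, which is a handy sanity check when tuning the weight.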
President Botrick: An Analysis of Deep Learning-Based Conversational AI Models to Identify and Create Influential Political Speeches [PAPER] Conference: AAAI 2023 Workshop for AI and Diplomacy
"This paper explores the defining qualities of language that are considered influential
and charismatic in the context of political speech. Transformer-based models have shown
to be efficient in analyzing contextual clues and generating coherent texts in a variety
of domains. With limited research in the identification and exploration of the replication
of persuasion in natural human language and generation of influential speech, we seek to
analyze the aspects of public speech that are deemed persuasive and impactful, and generate
text accordingly. We propose a two-part experiment: First, we train a BERT-based encoder
to weigh segments of speech in order to predict its influence on an audience; second, we
train a GPT-based decoder to use an established understanding of persuasion to generate new
political speech. We show that, using these models, a speech can be created that mimics the
natural language habits of prominent political figures."
Comparing Approaches to Language Understanding for Human-Robot Dialogue: An Error Taxonomy and Analysis [PAPER] Conference: Language Resources and Evaluation Conference 2022
"In this paper, we compare two different approaches to language understanding for a
human-robot interaction domain in which a human commander gives navigation instructions to a
robot. We contrast a relevance-based classifier with a GPT-2 model, using about 2000 input-output
examples as training data. With this level of training data, the relevance-based model outperforms
the GPT-2 based model 79% to 68%, and an Oracle combination set an upper-bound of 85%. We also
present a taxonomy of types of errors made by each model, indicating that they have somewhat
different strengths and weaknesses, so we also examine the potential for a combined model."
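a small sketch of how an oracle upper bound like the one in the abstract above can be computed, assuming the usual meaning of "oracle combination": an example counts as correct if at least one of the two models gets it right. the per-example results below are invented toy data, not numbers from the paper, and the function names are hypothetical.

```python
from typing import List, Sequence


def accuracy(correct: Sequence[bool]) -> float:
    """Fraction of examples marked correct."""
    if not correct:
        raise ValueError("need at least one example")
    return sum(correct) / len(correct)


def oracle_accuracy(model_a: Sequence[bool], model_b: Sequence[bool]) -> float:
    """Accuracy of an ideal selector that always picks whichever model is right."""
    if len(model_a) != len(model_b):
        raise ValueError("both models must be scored on the same examples")
    either: List[bool] = [a or b for a, b in zip(model_a, model_b)]
    return accuracy(either)


if __name__ == "__main__":
    # Toy per-example correctness flags (made up for illustration).
    classifier_correct = [True, True, False, True, False]
    generator_correct = [True, False, True, False, False]
    print("classifier:", accuracy(classifier_correct))
    print("generator:", accuracy(generator_correct))
    print("oracle:", oracle_accuracy(classifier_correct, generator_correct))
```

the gap between the oracle number and the better single model shows how much a combined model could gain at most, which is why differing error types matter.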
ML-Based Eye Tracking for Augmented Reality Heads-Up Displays (AR HUDs) [PAPER] Conference: Society for Information Display Annual Display Week 2021
"3D Augmented Reality (AR) Heads-up Displays (HUDs) have the potential of overlaying
virtual objects at the correct locations with accurate motion parallax. Accurate overlays
require tracking the pupils of the driver’s eyes. We developed an ML-based pupil tracking
system based on a convolutional neural network (CNN) to find the precise location of the pupils."
|

