I am a third-year PhD student in the Department of Statistics & Data Science at Carnegie Mellon University. I work with Professor Edward Kennedy on causal inference problems related to algorithmic fairness and bias. I have also worked with Professor Rebecca Nugent on clustering methods to estimate latent class parameters in cognitive diagnosis models.
I previously worked as a Data Scientist/Quantitative Analyst Intern at Google in New York City, applying statistical machine learning to text classification. I was also a Data Science Intern at Box, Inc. in Redwood City, where I built a model to identify new marketing and sales leads.
Before starting my PhD, I worked as a (Senior) Research Specialist at the Center for Advanced Study of Language at the University of Maryland. I worked on research in psycholinguistics, bilingualism, and second language acquisition.
Outside of grad school, I enjoy hiking, running, swimming, music, and studying languages.
PhD in Statistics & Data Science, 2021
Carnegie Mellon University
M.S. in Statistics, 2017
Carnegie Mellon University
B.S. in Math, 2016
University of Maryland
B.A. in Linguistics, 2009
University of Michigan
Data Scientist Intern, Summer 2018
Google, New York City, NY
Data Science Intern, Summer 2017
Box, Redwood City, CA
Senior Faculty Research Specialist, 2010–2016
University of Maryland Center for Advanced Study of Language
05/2019–08/2019: Data Scientist internship at Google in Mountain View, CA.
01/2019: Presentation at the AAAI/ACM Conference on Artificial Intelligence, Ethics, and Society in Honolulu, HI.
07/2018: Presentation at the Doctoral Consortium at the Educational Data Mining conference in Buffalo, NY.
06/2018–08/2018: Data Scientist internship at Google in New York City.
06/2018: Poster presentation at the Classification Society Meeting in Stony Brook, NY.
05/2018: Poster presentation at the Atlantic Causal Inference Conference in Pittsburgh, PA.
11/2017: National finalist at the Citadel Data Open in New York, NY.
09/16/2017: Winner of the Citadel Data Open datathon at Carnegie Mellon, with fellow students Nic Dalmasso, Kwhangho Kim, and Chirag Nagpal. (550+ student applications, around 125 students selected to compete)
05/2017–08/2017: Data Science internship at Box in Redwood City, CA.
Cognitive Diagnosis Models (CDMs) are a type of latent class model used frequently in education and psychometrics. One typical application is to estimate whether each student in a learning environment has mastered each of a set of K skills. Traditional likelihood-based estimation methods are computationally intensive and become intractable for large datasets. I am investigating how to estimate skill profiles using a lightweight, model-agnostic alternative: clustering, via k-means-type algorithms or hierarchical agglomerative clustering. In particular, I am studying how best to perform this clustering when hierarchical relationships among skills restrict the skill space by rendering some skill profiles impossible. (Supervised by Professor Rebecca Nugent)
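To make the idea of a restricted skill space concrete, here is a minimal sketch, not the method used in this project. It assumes a hierarchy is given as prerequisite pairs and that each student already has a per-skill score in [0, 1]; the function names `permissible_profiles` and `nearest_profile` and the example numbers are hypothetical.

```python
from itertools import product


def permissible_profiles(num_skills, prerequisites):
    """Enumerate binary skill profiles consistent with a prerequisite hierarchy.

    `prerequisites` is a list of (a, b) pairs meaning that skill `a`
    must be mastered before skill `b` can be (illustrative encoding).
    """
    profiles = []
    for profile in product((0, 1), repeat=num_skills):
        # A profile is impossible if it masters b without mastering a.
        if all(profile[a] >= profile[b] for a, b in prerequisites):
            profiles.append(profile)
    return profiles


def nearest_profile(scores, profiles):
    """Assign per-skill scores to the closest profile (squared Euclidean distance)."""
    return min(
        profiles,
        key=lambda p: sum((s - x) ** 2 for s, x in zip(scores, p)),
    )


# Linear hierarchy over K = 3 skills: skill 0 -> skill 1 -> skill 2.
linear = permissible_profiles(3, [(0, 1), (1, 2)])
print(len(linear), "of", 2 ** 3, "profiles remain")

# Hypothetical student scores: strong on skill 1, weak on skills 0 and 2.
student = (0.2, 0.9, 0.1)
unrestricted = permissible_profiles(3, [])
print(nearest_profile(student, unrestricted))  # ignores the hierarchy
print(nearest_profile(student, linear))        # respects the hierarchy
```

With a linear hierarchy only 4 of the 8 profiles remain, and the same student is assigned to a different profile once impossible profiles are excluded. This is the kind of effect a hierarchy-aware clustering would need to account for.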
The subject of algorithmic fairness has received increasing attention in recent years, in particular since the publication of a 2016 ProPublica article alleging that software used in the criminal justice system to predict recidivism is biased against Black defendants. There has been much debate about what criteria are appropriate for designating an algorithm (or a decision process generally) as “fair.” I am thinking about ways to frame fairness in counterfactual terms within a causal inference framework, e.g., by answering questions like “would this person recidivate if they were granted bail?” as opposed to “what is the likelihood of this person recidivating generally?” (Supervised by Professor Edward Kennedy)
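The contrast between the two questions can be written in standard potential-outcomes notation. This is a sketch rather than the specific formulation used in this project, and the symbols are assumptions introduced for illustration: X denotes a person's covariates, A = 1 indicates that bail is granted, Y = 1 indicates recidivism, and Y^{a=1} is the outcome that would occur if bail were granted.

```latex
% Observational risk: recidivism among people with covariates x,
% under whatever bail decisions were actually made.
\[
  \mathbb{P}(Y = 1 \mid X = x)
\]

% Counterfactual risk: recidivism if this person were granted bail.
\[
  \mathbb{P}\bigl(Y^{a=1} = 1 \mid X = x\bigr)
\]

% Under consistency, positivity, and no unmeasured confounding given X,
% the counterfactual risk is identified from observed data:
\[
  \mathbb{P}\bigl(Y^{a=1} = 1 \mid X = x\bigr)
  = \mathbb{P}(Y = 1 \mid A = 1, X = x)
\]
```

The two risks generally differ whenever past bail decisions depended on factors that also affect recidivism.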
A weekly reading group on methods and theoretical issues in causal inference.
A weekly reading group on a broad range of topics, including clustering, unsupervised learning, data visualization, online learning, and data science education.