Survey & Data Science · NLP · Computational Social Science · Public Opinion Measurement
Portfolio · GitHub · LinkedIn · Google Scholar
I am a Survey and Data Science graduate student at the University of Maryland, College Park, with prior graduate training in Data Science and undergraduate training in Statistics.
My work sits at the intersection of survey methodology, machine learning, NLP, public discourse analysis, and computational social science. I am especially interested in how data are generated, measured, validated, and interpreted — whether the data come from surveys, social media, administrative records, news, or large language models.
I build projects that connect statistical rigor with practical data systems: survey response modeling, LLM-assisted discourse coding, sentiment and stance analysis, policy measurement, multilevel modeling, and reproducible research workflows.
- Survey methodology, nonresponse, mode effects, and Total Survey Error
- LLM evaluation for public opinion and digital trace measurement
- NLP pipelines for framing, metaphor, stance, and sentiment analysis
- Computational social science using news, Reddit, Bluesky, YouTube, and administrative data
- Multilevel modeling, causal inference, and interpretable machine learning
- Public-facing research tools, dashboards, and reproducible workflows
- LLM-assisted metaphor, stance, and framing analysis of AI discourse across news and social media. Methods: LLM annotation, metaphor detection, stance classification, embeddings, NLP pipelines
- Survey methodology project evaluating postcard reminders, mail vs. web mode effects, response rates, and subgroup nonresponse. Methods: response-rate analysis, bootstrap inference, logistic regression, chi-square tests
- Policy/data project comparing immigration narratives from news and Reddit with CBP and ICE enforcement indicators. Methods: monthly aggregation, z-scores, divergence measures, regression, residual diagnostics
- AAPOR-selected project comparing LLM-generated EV sentiment with observed public discourse from Reddit, news, and other online sources. Methods: sentiment analysis, LLM evaluation, platform comparison, public opinion measurement
- Interpretable ML project combining Global Terrorism Database event records with international news framing to analyze terrorism severity rankings. Methods: random forest, decision trees, feature importance, residual analysis, media framing
- Public narrative analytics project studying how ThuggerDaily activity temporally aligned with YSL RICO trial discourse across sentiment, engagement, volume, and topic prevalence. Methods: schema standardization, sentiment scoring, engagement normalization, topic grouping, event-window analysis, pre/post tests, lag correlations, regression summaries, DiD, interrupted time series, Streamlit, Neon/Postgres
Survey methodology | sampling | nonresponse | mode effects | Total Survey Error
NLP | sentiment analysis | stance classification | metaphor/framing analysis
LLM evaluation | prompt workflows | text-as-data | computational social science
Multilevel modeling | causal inference | regression | interpretable machine learning
Data cleaning | reproducible reporting | dashboards | research communication
- AAPOR 80th Annual Conference — EV public sentiment and LLM comparison project
- IISA 2025 — Media-aware GTI ranking analysis
- NCSET Best Paper — Topological Data Analysis on DNA/RNA structures
- NCSET Best Paper — PageRank and HITS citation-network analysis
I am currently focused on research and applied data roles that combine:
- rigorous survey/statistical methodology,
- large-scale text and public discourse data,
- LLM evaluation and AI-assisted coding workflows,
- policy, social, and behavioral data analysis,
- reproducible data products for research and decision-making.