<!DOCTYPE HTML>
<!--
Miniport by HTML5 UP
html5up.net | @ajlkn
Free for personal and commercial use under the CCA 3.0 license (html5up.net/license)
-->
<html>
<head>
<title>Nina Plotko - Portfolio</title>
<meta charset="utf-8" />
<meta name="viewport" content="width=device-width, initial-scale=1, user-scalable=no" />
<link rel="stylesheet" href="assets/css/main.css" />
</head>
<body class="is-preload">
<!-- Nav -->
<nav id="nav">
<ul class="container">
<li><a href="#top">Top</a></li>
<li><a href="#skills">Skills</a></li>
<li><a href="#portfolio">Portfolio</a></li>
<li><a href="#contact">Contact</a></li>
</ul>
</nav>
<!-- Home -->
<article id="top" class="wrapper style1">
<div class="container">
<div class="row">
<div class="col-5-large">
<span class="image fit"><img src="images/professional_headshot_NinaPlotko.jpg" alt="Professional headshot of Nina Plotko"/></span>
</div>
<div class="col-8 col-7-large col-12-medium">
<header>
<h1>Hi, I'm <strong>Nina Plotko</strong></h1>
</header>
<p>I am a Senior Data Scientist at ADP. I earned my B.S. in Industrial and Systems Engineering from the Georgia Institute of Technology, with a concentration in Data Analytics and Science. This is my data science portfolio. Scroll down to see my coursework, skills, and project experience. You can also check out some of my work on <a href="https://github.com/nplotko">GitHub</a> and find me on <a href="https://www.linkedin.com/in/nina-plotko-09a7b41a7/">LinkedIn</a>.</p>
</div>
</div>
</div>
</article>
<!-- Skills -->
<article id="skills" class="wrapper style2">
<div class="container">
<header>
<h2>Coursework at Georgia Tech</h2>
</header>
<div class="row aln-center">
<div class="col-10-medium">
<section class="box style1">
<p>Data Input and Manipulation, Linear Algebra, Calculus (up to Multivariable), Discrete Mathematics, Probability with Applications, Statistics, Regression and Forecasting, Database Systems, Engineering Economy, Simulation Analysis and Design, Courses in Optimization (Engineering Optimization and Advanced Optimization), Classes in Stochastic Systems (Stochastic Manufacturing and Service Systems, Advanced Stochastic Systems), Machine Learning, Methods for Quality Improvement, Human-Computer Interaction, Foundations of Modern Data Science.</p>
</section>
</div>
</div>
<br>
<br>
<header>
<h2>My Data Science Skills</h2>
</header>
<div class="row aln-center">
<div class="col-10-medium">
<section class="box style2">
<h3>Large Language Models (LLMs) and Natural Language Processing (NLP)</h3>
<p>Before the widespread release of GPT-3.5, I was using classic NLP techniques such as embedding models (BERT), semantic distance calculations, and sentiment/topic classifiers. I now have over a year of experience using LLMs for retrieval-augmented Q&amp;A, content generation, and knowledge engineering. I'm skilled at combining traditional NLP tools with prompting techniques to produce high-quality, reliable output from LLMs.</p>
</section>
</div>
<div class="col-10-medium">
<section class="box style2">
<h3>Data Visualizations</h3>
<p>Using Python packages like Matplotlib, Seaborn, and Plotly, I create meaningful visualizations to drive my exploratory data analysis and convey the findings from my predictive models. I create scatter plots, bar and line charts, pie charts, heatmaps, and choropleth maps, as well as statistical graphs like boxplots and histograms. Aggregating data and plotting it in various ways helps people who are not data specialists understand the dataset and see trends.</p>
</section>
</div>
</div>
<br>
<div class="row aln-center">
<div class="col-10-medium">
<section class="box style1">
<h3>Machine Learning Algorithms</h3>
<p>My deep foundational understanding of popular models like regression, neural networks, and k-nearest neighbors gives me an advantage when implementing algorithms with scikit-learn and Statsmodels. My data analytics skills support the ML pipeline during exploratory data analysis. I have a strong appreciation for hyperparameter tuning, for which I use k-fold cross-validation, validation curves, and grid search. I have trained models on standard datasets and corporate data. I have also experimented with unsupervised learning techniques like LDA and dimensionality reduction techniques like PCA and UMAP.</p>
</section>
</div>
</div>
<br>
<div class="row aln-center">
<div class="col-10-medium">
<section class="box style1">
<h3>Data Analytics</h3>
<p>I have a passion for identifying trends in data. My statistical knowledge, combined with my technical skills, makes me an excellent data analyst. Using data manipulation libraries like Pandas and NumPy, I can calculate measures of central tendency and examine the distributions of the data to understand if, why, and how something occurred in the dataset. I have experience analyzing datasets from Kaggle, as well as web scraping and using APIs to analyze web data.</p>
</section>
</div>
</div>
<br>
<footer>
<p>My experience in data science has also allowed me to work with packages that run on GPUs, including PyTorch and TensorFlow, which speed up calculations for large projects. In addition to my proficiency in Python, I have experience using SQL and R for projects.</p>
<a href="#portfolio" class="button large scrolly">See some of my recent work</a>
</footer>
</div>
</article>
<!-- Portfolio -->
<article id="portfolio" class="wrapper style3">
<div class="container">
<header>
<h2>My Data Science Projects</h2>
<p>Below are descriptions of some of the projects I have worked on. Projects with links will take you to the code on my personal GitHub (in the form of Jupyter notebooks).</p>
</header>
<div class="row aln-center">
<div class="col-6">
<article class="box style3">
<h3>LLM Search Snippets</h3>
<p><strong>Project as Data Scientist at ADP</strong><br>This was a proof of concept for augmenting the Enterprise Search experience with an LLM-powered Q&amp;A. The PoC focused on creating "Google search snippets" for questions relating to payroll, retirement, benefits, and other HR topics. My role on this project was the generation and testing portion, where I handled the prompt engineering, guardrails, and manipulation of data from the search results.</p>
</article>
</div>
<div class="col-6">
<article class="box style3">
<h3>LLM Training Scenario Creation</h3>
<p><strong>Project as Data Scientist at ADP</strong><br>This was a proof of concept for augmenting the Learning Business Partner role with an LLM-powered training scenario creation tool. The PoC focused on taking high-quality calls and using GPT models to create new scenarios for associates to learn from. This solution created high-quality training scenarios 20x faster than without generative AI.</p>
</article>
</div>
</div>
<br>
<div class="row aln-center">
<div class="col-6">
<article class="box style3">
<h3>LLM RAG Q&amp;A for Human Resource Outsourcing</h3>
<p><strong>Project as Data Scientist at ADP</strong><br>This was a proof of concept for augmenting the HR Business Partner role with an LLM-powered Q&amp;A. The PoC focused on family and medical leave (FMLA). I uploaded government websites, regulations, and fact sheets to S3 and set up a vector database within AWS OpenSearch. I ran experiments and benchmarked the search parameters and the prompts for different LLMs like GPT-3, GPT-4, and Claude v1.</p>
</article>
</div>
<div class="col-6">
<article class="box style3">
<h3>Outlier Detection Algorithm</h3>
<p><strong>Project as Data Scientist at ADP</strong><br>Upstream misclassifications and unexpected salaries were affecting a downstream deep learning model (a compensation benchmarking model). I developed the algorithm that identifies the misclassifications using semantic distance and statistical anomaly detection techniques. This algorithm now runs in production on monthly payroll data of 20M+ records. The resulting filtered table is used to train the downstream model and also powers many of the Analytics products at ADP.</p>
</article>
</div>
</div>
<br>
<div class="col">
<article class="box style2">
<h3>Cost Estimation Tool for Honeywell Building Technologies</h3>
<p><strong>Senior Design Project</strong><br>Honeywell approached the team about large deviations between their cost estimates and the actual costs incurred. I led the team in building one of the deliverables, the Cost Estimation Tool, which predicts the labor cost of a project. The Cost Estimation Tool has three regression models in the back end, one for each type of project. This project involved a meticulous feature selection process to determine which materials and project attributes should be used as variables. I led the regression assumption analysis, feature engineering and selection, and the training of the models. Each model is non-linear, with interactions between the material type and the project attributes. This project had another deliverable, the Risk Simulation Model. Together, these deliverables won best overall Industrial Engineering project at the Georgia Tech Senior Design Capstone Expo.</p>
</article>
</div>
<br>
<div class="row aln-center">
<div class="col-6">
<article class="box style3">
<h3><a href="https://github.com/nplotko/IMDB-movie-reviews">IMDB Movie Review Analysis and Classification</a></h3>
<p><strong>Personal Project</strong><br>Using the famous IMDB dataset (taken from Kaggle), I performed exploratory data analysis, created wordclouds, and pre-processed the text using TF-IDF vectorization. Then, I performed cross-validation on logistic regression and k-nearest neighbors models, using sklearn's validation_curve to visualize how accuracy changes across hyperparameter settings. I also compared these models to LinearSVC. The packages used in this project were Pandas, scikit-learn, and spaCy.</p>
</article>
</div>
<div class="col-6">
<article class="box style3">
<h3>Analysis of Sitewide Feedback</h3>
<p><strong>Project as Data Science and Machine Learning Intern at ADP</strong><br>I led the project kickoff to streamline the process for viewing sitewide feedback on a product. I experimented with unsupervised learning models like LDA (topic modeling) and UMAP with HDBSCAN (clustering). I also built a multi-label topic classification model that identifies whether feedback is about support, system performance, UI/UX issues, and more, which I did by fine-tuning SBERT. I also trained a classifier to predict the sentiment of the feedback.</p>
</article>
</div>
</div>
<br>
<div class="row aln-center">
<div class="col-6">
<article class="box style2">
<h3><a href="https://github.com/nplotko/university-data-EDA">Analysis of University Data</a></h3>
<p><strong>Personal Project</strong><br>To learn more about Python's Plotly library, I performed exploratory data analysis on a dataset of university data. This dataset includes world rank, teaching scores, research scores, the number of students, the female/male ratio, and more. The exploratory data analysis consists of bar charts, line charts, wordclouds, and scatter plots. Everything built with Plotly is interactive. I wanted to learn which qualities lead to a higher university ranking, so I performed linear regression to predict the total score of a university given the features of the dataset.</p>
</article>
</div>
<br>
<div class="col-6">
<article class="box style2">
<h3><a href="https://github.com/nplotko/double-descent-lr">Experiments with Overparameterized Linear Regression Models</a></h3>
<p><strong>Project for Foundations of Modern Data Science Class</strong><br>After researching the novel "Double Descent Phenomenon," I conducted a series of experiments with randomly generated data to understand and visualize this phenomenon's relation to linear regression. One experiment replicates the work of Preetum Nakkiran, while the others were based on my own research. Because this project featured a form of linear regression that sklearn's estimator could not be applied to, I performed the linear algebra calculations to train the models used here. I also used PyTorch to speed up the matrix operations.</p>
</article>
</div>
<br>
<div class="col-6">
<article class="box style2">
<h3>Applied Data Science Capstone</h3>
<p><strong>Project for IBM Data Science Professional Certificate on Coursera</strong><br>Given a few datasets containing information on SpaceX rocket launches, I carried out the full data science pipeline. This included data cleaning, normalization, and exploratory data analysis, for which I created a dashboard using Plotly's Dash package. Using this dashboard, I identified correlations between rocket characteristics and launch failures. I then performed feature engineering, built classifiers with multiple algorithms to predict successful missions, and analyzed the effect of payload mass on landing success rates.</p>
</article>
</div>
<div class="col-6">
<article class="box style2">
<h3>Drone Grocery Delivery Database</h3>
<p><strong>Project for Database Systems Class</strong><br>In this class, I was given a series of 15 screens outlining the full functionality of a drone grocery delivery system with users including grocery store managers, drone technicians, and customers. Customers would pick a grocery store, select the groceries and quantities, and place an order. This order would be mapped to a drone to then deliver the groceries. From these screens, I created an ERD and mapped it to a relational schema. In SQL, I created the database, inserted the data into the database, and created procedures for the functionality of all users.</p>
</article>
</div>
<br>
<div class="col">
<article class="box style2">
<h3>Predicting Life Expectancy from Immunizations and Socio-Economic Factors</h3>
<p><strong>Project for Regression and Forecasting Class</strong><br>Using a Kaggle dataset, my team built multiple regression models and performed factor analysis to determine what the most significant factors were in predicting life expectancy for several countries. First, we cleaned the data. Next, we checked the assumptions for linear regression (constant variance and normality) and transformed the data to fix issues with collinearity of features. We used several techniques such as forward/backward selection, stepwise regression, and best subsets regression to learn the most significant features, and we used RMSE and R-squared as our accuracy measures. The most significant features found were mortality rates, deaths under 5 years of age, GDP, infant deaths, status of country (developing or developed), total government expenditure percentage, income composition of resources, and alcohol usage. With these features, the team was able to obtain an R-squared value of 83.76 percent. This project was done completely using R.</p>
</article>
</div>
</div>
<footer>
<a href="mailto:nplotko@gmail.com" class="button large scrolly">Email Me.</a>
</footer>
</div>
</article>
<!-- Contact -->
<article id="contact" class="wrapper style4">
<div class="container medium">
<div class="col-12">
<hr />
<h3>Find me on ...</h3>
<ul class="social">
<li><a href="https://www.linkedin.com/in/nina-plotko-09a7b41a7/" class="icon brands fa-linkedin-in"><span class="label">LinkedIn</span></a></li>
<li><a href="https://github.com/nplotko" class="icon brands fa-github"><span class="label">GitHub</span></a></li>
</ul>
<hr />
</div>
</div>
<footer>
<ul id="copyright">
<li>© Untitled. All rights reserved.</li><li>Design: <a href="http://html5up.net">HTML5 UP</a></li>
</ul>
</footer>
</article>
<!-- Scripts -->
<script src="assets/js/jquery.min.js"></script>
<script src="assets/js/jquery.scrolly.min.js"></script>
<script src="assets/js/browser.min.js"></script>
<script src="assets/js/breakpoints.min.js"></script>
<script src="assets/js/util.js"></script>
<script src="assets/js/main.js"></script>
</body>
</html>