My Projects


Check out some of the projects I have worked on!

Foster Care Machine Learning Analysis

Personal Project

Exploration of Foster Care Data


Tools: Python, pandas, sklearn, matplotlib, seaborn

  • Examined the impact of placement settings on the welfare of children in the foster care system over 20 years (2001-2021).
  • Engineered robust data processing pipelines in Python to manage and analyze 16+ million records across two decades, enhancing the accuracy of predictive modeling for foster care placement stability.
  • Utilized advanced statistical methods to evaluate placement stability and identify factors contributing to successful outcomes, improving data-driven decision-making in child welfare policies.
  • Created predictive models to forecast the length of stay and likelihood of reunification, achieving a 75% accuracy rate, which informed strategic resource allocation for foster care services.
  • Analyzed longitudinal data to uncover trends in foster care entries and exits, providing insights that influenced the development of supportive programs for at-risk youth.
  • Presented findings to stakeholders through interactive dashboards and detailed reports, utilizing data visualization tools like Tableau to enhance understanding and facilitate action-oriented discussions.

Location, Location, Location: How Product Placement Impacts Clicks

UC Berkeley Course: Statistics

Product Placement Results Stargazer Output


Tools: R

  • Conducted a comprehensive analysis of the impact of product placement on user clicks in an e-commerce setting, leveraging over 165,000 clickstream data points.
  • Applied regression models to evaluate the statistical significance of webpage product location on click-through rates, uncovering insights on consumer behavior in digital retail environments.
  • Managed data preprocessing, including outlier removal and data transformation, to enhance model accuracy and reliability.
  • Demonstrated the negligible impact of product placement on clicks, while highlighting the significance of page number and product category on consumer engagement.
  • Identified and addressed potential model limitations and biases, proposing methodological improvements for future research.
  • Collaborated within a team to deliver actionable insights for optimizing online product presentation strategies to increase consumer engagement.
  • Final Paper

Analyzing Voting Difficulty

UC Berkeley Course: Statistics



Tools: R

  • Rigorously evaluated voting difficulty disparities between Republican and Democratic voters using 2022 American National Election Studies (ANES) data.
  • Implemented data wrangling techniques to clean and process 1,585 survey entries, resulting in a refined dataset of 976 entries for analysis.
  • Applied non-parametric Wilcoxon rank-sum tests to statistically analyze differences in voting difficulty, achieving significant results with a p-value of 3.6163904 × 10⁻⁶.
  • Utilized Spearman's rank correlation to assess practical implications of voting difficulties, revealing a negative association indicating Democrats experienced greater voting challenges.
  • Communicated findings to inform outreach programs, advocacy efforts, and funding strategies for a leading Democratic NGO, highlighting the importance of addressing voting disparities.
  • Proposed enhancements for future studies, including multi-year data analysis, localized research, and improved participant incentivization to strengthen findings on voting equity.
  • Final Paper
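
The project itself was done in R, so the following is not the project's code. It is a minimal Python sketch of the two techniques named above: a Wilcoxon rank-sum test (normal approximation with a tie correction, no continuity correction) and Spearman's rank correlation. The function names and the sample ratings are hypothetical and are not drawn from the ANES data.

```python
from math import erfc, sqrt


def midranks(values):
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def rank_sum_test(x, y):
    """Return (W, two-sided p) using the normal approximation."""
    pooled = list(x) + list(y)
    n1, n2, n = len(x), len(y), len(pooled)
    w = sum(midranks(pooled)[:n1])  # rank sum of the first sample
    mean_w = n1 * (n + 1) / 2
    # Tie correction: sum of t^3 - t over each group of t tied values.
    counts = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    tie_term = sum(t ** 3 - t for t in counts.values())
    var_w = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_w == 0:  # every observation is identical
        return w, 1.0
    z = (w - mean_w) / sqrt(var_w)
    return w, erfc(abs(z) / sqrt(2))


def spearman_rho(a, b):
    """Spearman's rho: the Pearson correlation of the midranks."""
    ra, rb = midranks(list(a)), midranks(list(b))
    ma, mb = sum(ra) / len(ra), sum(rb) / len(rb)
    cov = sum((p - ma) * (q - mb) for p, q in zip(ra, rb))
    sa = sqrt(sum((p - ma) ** 2 for p in ra))
    sb = sqrt(sum((q - mb) ** 2 for q in rb))
    return cov / (sa * sb)


# Hypothetical difficulty ratings (1 = not difficult, 5 = very difficult).
group_a = [1, 1, 2, 1, 3, 1, 2]
group_b = [2, 3, 1, 4, 2, 5, 3]
w, p = rank_sum_test(group_a, group_b)
print(f"W = {w}, two-sided p = {p:.4f}")

# Code group membership as 0/1 and correlate it with the ratings.
labels = [0] * len(group_a) + [1] * len(group_b)
print(f"rho = {spearman_rho(labels, group_a + group_b):.3f}")
```

The sign of rho depends entirely on how the groups are coded, so a negative value only has meaning once the 0/1 coding is stated.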

Par for Partnerships

UC Berkeley Course: Data Engineering



Tools: Python, gmaps

  • Orchestrated a strategic golf tournament aimed at fostering relationships between CourseKey, an education software company, and potential clients in the trade school sector.
  • Leveraged data analytics to select 8 prime golf courses across 7 states, targeting 221 schools, optimizing for proximity and potential customer value.
  • Utilized MongoDB for efficient data management and Neo4j graph databases for complex relationship analysis, enhancing the targeting process for high-value prospects.
  • Applied graph algorithms, including node similarity and weighted degree centrality, to prioritize invitations and optimize event logistics.
  • Implemented Redis for real-time leaderboard updates during the tournament, enhancing participant engagement and operational efficiency.
  • Achieved strategic marketing objectives by integrating advanced data analysis with practical event planning, setting the stage for significant revenue growth opportunities for CourseKey.
  • Final Presentation
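
The project ran its graph analysis in Neo4j, so the following is not the project's code. It is a minimal pure-Python sketch of the two graph measures named above. Weighted degree centrality is computed as the sum of edge weights touching each node. Node similarity is shown as Jaccard overlap of neighbor sets, which is one common definition; whether the project used that variant is an assumption. The node names, edge weights, and function names are all hypothetical.

```python
from collections import defaultdict


def weighted_degree_centrality(edges):
    """Sum the weights of all edges touching each node (undirected graph)."""
    scores = defaultdict(float)
    for a, b, weight in edges:
        scores[a] += weight
        scores[b] += weight
    return dict(scores)


def jaccard_similarity(neighbors_a, neighbors_b):
    """Share of neighbors two nodes have in common (0.0 if both have none)."""
    a, b = set(neighbors_a), set(neighbors_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical school-to-course edges; the weight stands in for something
# like potential customer value and is invented for this sketch.
edges = [
    ("School A", "Course 1", 3.0),
    ("School B", "Course 1", 1.5),
    ("School A", "Course 2", 2.0),
    ("School C", "Course 2", 4.0),
]

scores = weighted_degree_centrality(edges)
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)

# Two schools are similar when they connect to the same courses.
print(jaccard_similarity({"Course 1", "Course 2"}, {"Course 2"}))
```

Ranking nodes by the centrality score gives a simple priority order, and the similarity score can then group prospects that share connections.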

Bechdel Movie Test

UC Berkeley Course: Programming for Data Scientists

Bechdel Genres with highest failure rates


Tools: Python, pandas, matplotlib, seaborn

  • Collaborated on a data science project to investigate the correlation between movie characteristics and the Bechdel Test scores, analyzing a merged dataset from Bechdel Test and IMDb.
  • Utilized statistical analysis techniques to process and evaluate data from over 800 movies, aiming to identify patterns related to female representation in films.
  • Employed Python for data cleaning, transformation, and analysis, ensuring rigorous examination of variables such as genre, revenue, budget, and IMDb scores.
  • Presented findings indicating no significant relationship between movie features and Bechdel Test passing rates, contributing to ongoing discussions on gender representation in media.
  • Highlighted the limitations and challenges of merging disparate datasets, addressing discrepancies in movie listings and the impact on data analysis outcomes.
  • Contributed to a comprehensive report detailing methodological approaches, trend analysis, correlation studies, and insights on female representation in films, enhancing academic and industry understanding of gender dynamics in media.
  • Final Paper

This Website

Personal Project

Website HTML/CSS Screenshot


Tools: HTML, CSS, JavaScript

I started this website when I graduated in 2021. The benefits of creating it were twofold. In my classes I had only scratched the surface of HTML and CSS, and I wanted a productive way to hold onto and build this skill. And, of course, I was eager to build a portfolio where I could show my skill set in a unique and personal way. I have since realized that this means I now have to maintain GitHub, LinkedIn, AND a personal website, but I actually enjoy flexing the HTML/CSS muscle since I don't use it much in my work anymore.