This portfolio contains eight projects that reflect my growth in data science and visualization:
- Basic R Functions with City Tax Parcel Data: This project explored basic R functions using city tax parcel data from Syracuse, NY. It laid the foundation for my understanding of data manipulation and analysis in R.
- Logical Grouping in R: Building on the first project, I created logical groups from the Syracuse city tax parcel data, enhancing my ability to segment and analyze data in meaningful ways.
- Customized Graphics with R: This project advanced my data visualization skills. I used R's plotting functions to create customized graphics that present the city tax parcel data in a visually compelling format.
- Dynamic Graphics with R Shiny: Using the R Shiny package, I developed dynamic graphics and interactive elements, such as a drop-down menu and season statistics highlights, offering an engaging way to present and explore data.
- Traffic Crash Data Analysis in Tempe: I analyzed Tempe's traffic crash data to identify causes of crashes and recommend strategies for reducing traffic-related injuries and fatalities. The project shows my ability to derive actionable insights from complex datasets.
- Data Merging Techniques: This project highlighted my skills in combining datasets using the merge() function, a crucial technique in data science for linking related observations and uncovering deeper insights.
- Interactive Dashboard for Traffic Accident Insights: I created a comprehensive dashboard from Tempe's traffic accidents dataset, featuring six interactive tabs and various visualization tools for in-depth analysis of traffic accidents. The project demonstrates my proficiency in dashboard creation and user interaction design.
- Flight Delay Prediction with Regression Models: In this advanced project, I analyzed a flight dataset to predict delays using several classification models, including Logistic Regression, KNN, and Naive Bayes. My work included data wrangling, transformation, and a detailed analysis of delays by reason and time, culminating in performance visualization through confusion matrices.

Each project reflects a step forward in my data science journey, showcasing a commitment to learning and applying a wide array of techniques and tools to solve real-world problems.
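
As a minimal sketch of the merge() technique mentioned in the data merging project, the example below joins two small data frames on a shared key. The data frames, column names, and values are made up for illustration and are not taken from the Tempe dataset or the course material.

```r
# Hypothetical crash records: one row per crash (illustrative data only)
crashes <- data.frame(
  crash_id = c(1, 2, 3, 4),
  year     = c(2018, 2018, 2019, 2019),
  injuries = c(0, 2, 1, 0)
)

# Hypothetical driver records: crash 4 has no matching driver row
drivers <- data.frame(
  crash_id   = c(1, 2, 3),
  driver_age = c(34, 19, 52)
)

# Inner join (the default): keeps only crash_id values found in both tables
inner <- merge(crashes, drivers, by = "crash_id")

# Left join: all.x = TRUE keeps every crash and fills unmatched columns with NA
left <- merge(crashes, drivers, by = "crash_id", all.x = TRUE)

print(inner)
print(left)
```

Comparing the row counts of the two results is a quick way to check how many observations had no match in the second table.
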
- This project was completed as part of the Introductory Data Science course in R, taught at the Andrew Young School of Policy Studies, Georgia State University.
- The course material is available in the public repository ays-r-coding-sum-2022.
- Special thanks to the course instructors:
  - Professor Jamison Crawford, whose guidance was invaluable (Jamison Crawford's GitHub).
  - Professor Jesse Lecy, whose insights were fundamental to the learning process (Jesse Lecy's GitHub).