This repository contains a project that analyzes Chicago crime data in relation to socioeconomic factors. The goal is to provide actionable insights and recommendations for reducing crime rates through data visualization and trend analysis.
Chicago has been identified as a city with one of the highest crime rates in the United States. Crime rates can affect tourism, immigration, and business opportunities, and they also provide a measure of how policy reform has affected a city. This project analyzes crime trends in Chicago over the past 20 years, looking at the correlation between geography, income, and crime, and at how policy has affected crime rates, crime types, and citizen sentiment about safety and trust in the police.
- The City of Chicago wants a better understanding of crime rates and police sentiment across the city over time
- Prevalence of crime in different neighborhoods
- Types of crimes being committed
- How crimes influence citizens’ feeling of safety and trust in police
- City officials want to identify heavily impacted areas
- Determine if areas need additional reforms
- Determine whether existing reforms need changes to have a greater impact
- Analysis will include
- Distribution of crime by type
- Distribution of crime by region
- Citizen sentiment by region and income
Data Model (Chicago_Crime_data_model.mwb)
- Contains the MySQL Workbench file to create relational model and EER diagram
STM (STM.xlsx)
- Contains the Source to Target Mapping (STM) spreadsheet, created in Google Sheets
Google Cloud Storage Loading (Chicago Crime Database ETL.html, ETL Script.ipynb)
- Contains a Python notebook that cleans and transforms the data and loads it into Google Cloud Storage
- Identification of datasets
- Reported incidents of crime that occurred in the City of Chicago from 2001 to present
- Data is extracted from the Chicago Police Department's CLEAR (Citizen Law Enforcement Analysis and Reporting) system
- Survey of citizen safety and police sentiment scores
- Development of relational models
- Data preprocessing to select only the columns relevant to the analysis
- Determined Primary Keys and Foreign Keys to create relationships between tables
- Extraction/Cleaning of data
- Download of data into CSV format
- Development of Python script to clean/import records into relational models
- Import of records into relational model developed by team via Google Cloud
- Development of visualizations
- Development of data source modeling based on ETL tables
- Visualizations developed to address questions raised in business case
- Inclusion of filtering based on time to review time series trends
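As a rough illustration of the relational-model and import steps above, the sketch below builds a two-table model with a primary key and a foreign key and loads a few cleaned records into it. It uses Python's built-in `sqlite3` as a local stand-in for the cloud-hosted model. The table names, column names, and sample rows are hypothetical and are not taken from `Chicago_Crime_data_model.mwb`.

```python
import sqlite3

# Hypothetical two-table model; names are illustrative, not the project's schema.
SCHEMA = """
CREATE TABLE district (
    district_id   INTEGER PRIMARY KEY,
    district_name TEXT NOT NULL
);
CREATE TABLE crime (
    crime_id     INTEGER PRIMARY KEY,
    district_id  INTEGER NOT NULL REFERENCES district(district_id),
    primary_type TEXT NOT NULL,
    year         INTEGER NOT NULL
);
"""

def build_database(districts, crimes):
    """Create an in-memory database and load cleaned records into it."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO district VALUES (?, ?)", districts)
    conn.executemany("INSERT INTO crime VALUES (?, ?, ?, ?)", crimes)
    conn.commit()
    return conn

# Made-up sample rows for demonstration only.
conn = build_database(
    districts=[(7, "Englewood"), (11, "Harrison")],
    crimes=[(1, 7, "THEFT", 2020), (2, 11, "BATTERY", 2021), (3, 7, "BATTERY", 2021)],
)

# Join through the foreign key to count crimes per district.
rows = conn.execute(
    "SELECT d.district_name, COUNT(*) FROM crime c "
    "JOIN district d ON d.district_id = c.district_id "
    "GROUP BY d.district_name ORDER BY d.district_name"
).fetchall()
print(rows)
```

Loading the parent table (`district`) before the child table (`crime`) keeps the foreign key constraint satisfied during import.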
https://data.cityofchicago.org/browse?category=Public%20Safety
https://data.cityofchicago.org/Public-Safety/Crimes-2001-to-Present/ijzp-q8t2
https://data.cityofchicago.org/Public-Safety/Police-Sentiment-Scores/28me-84fj
- MySQL Workbench to create relational model and EER diagram
- Google Sheets for Source to Target Mapping (STM)
- Python to clean and transform data and load into Google Cloud Storage
- Google Cloud Storage to store cleaned dataset
- Google BigQuery to query dataset
- Cloud Storage and BigQuery were used because the dataset was large enough to cause issues in SQL Server
- Tableau to create reports and dashboard
A normalized, relational diagram was created as there were not enough tables to necessitate a dimensional model. We did not require denormalization in any of our tables.
- Assume that District ID in both datasets match
- Rows with NULL values in columns such as IUCR, FBI, and Beat will be dropped
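The NULL-handling assumption can be sketched with the standard library alone. The column headers below (`IUCR`, `FBI Code`, `Beat`) and the sample rows are assumptions for illustration. The project notebook may use different names or tooling.

```python
import csv
import io

# Assumed header names for the columns that must not be empty.
REQUIRED_COLUMNS = ("IUCR", "FBI Code", "Beat")

def drop_incomplete_rows(rows, required=REQUIRED_COLUMNS):
    """Keep only rows where every required column has a non-blank value."""
    return [
        row for row in rows
        if all((row.get(col) or "").strip() for col in required)
    ]

# Made-up sample data: rows 2 and 3 each miss one required value.
sample_csv = """ID,IUCR,FBI Code,Beat,District
1,0820,06,0711,7
2,,08B,1112,11
3,0486,08B,,11
"""

rows = list(csv.DictReader(io.StringIO(sample_csv)))
clean = drop_incomplete_rows(rows)
print([row["ID"] for row in clean])  # -> ['1']
```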
- Average of Safety Grouped by District
- Average of Trust Grouped by District
- Average of Safety Grouped by District For Low, Medium and High Income Levels Respectively
- Average of Trust Grouped by District For Low, Medium and High Income Levels Respectively
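The grouped averages listed above could be computed along the following lines. This is a minimal sketch: the record layout (`district`, `income`, `safety`, `trust`) and the values are hypothetical, and the published sentiment dataset may expose income levels as separate columns instead of a single field.

```python
from collections import defaultdict
from statistics import mean

# Made-up survey records for illustration only.
records = [
    {"district": 7, "income": "low", "safety": 40.0, "trust": 45.0},
    {"district": 7, "income": "high", "safety": 50.0, "trust": 65.0},
    {"district": 11, "income": "low", "safety": 42.0, "trust": 44.0},
    {"district": 11, "income": "low", "safety": 46.0, "trust": 50.0},
]

def average_by(records, key_fields, value_field):
    """Average value_field over groups defined by the tuple of key_fields."""
    groups = defaultdict(list)
    for rec in records:
        key = tuple(rec[field] for field in key_fields)
        groups[key].append(rec[value_field])
    return {key: mean(values) for key, values in groups.items()}

safety_by_district = average_by(records, ["district"], "safety")
trust_by_district_income = average_by(records, ["district", "income"], "trust")
print(safety_by_district)
print(trust_by_district_income)
```

Passing the grouping fields as a list lets the same helper produce both the per-district metrics and the per-district, per-income metrics.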
Analysis: We analyzed the trend of crime incidents over time.
Visualization: Line plot illustrating crime counts by year.
Insight: Our data shows overall crime decreasing over time.
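The yearly counts behind a line plot like this one can be derived by parsing each incident date and tallying by year. The sketch below assumes a timestamp format of `MM/DD/YYYY hh:mm:ss AM/PM`. That format and the sample dates are assumptions, not values from the project's data.

```python
from collections import Counter
from datetime import datetime

def crimes_per_year(date_strings, fmt="%m/%d/%Y %I:%M:%S %p"):
    """Count incidents per calendar year, returned in ascending year order."""
    years = (datetime.strptime(value, fmt).year for value in date_strings)
    return dict(sorted(Counter(years).items()))

# Made-up incident timestamps for demonstration.
sample_dates = [
    "01/15/2019 08:30:00 PM",
    "03/02/2019 11:05:00 AM",
    "07/21/2020 01:45:00 AM",
]

yearly = crimes_per_year(sample_dates)
print(yearly)  # -> {2019: 2, 2020: 1}
```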
Analysis: We evaluated crime counts over a decade, split into its first and second halves (5 years each).
Visualization: Heatmap of crime type by year.
Insight: Theft, Battery & Criminal Damage remain top three crime types across the last ten years.
Analysis: We assessed crime rates per capita across different neighborhoods.
Visualization: Heatmap depicting crime per capita by neighborhood.
Insight: Englewood, East Garfield Park and West Englewood have the highest crime per capita in Chicago.
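A per capita rate normalizes raw crime counts by population so that neighborhoods of different sizes can be compared. The sketch below expresses the rate per 1,000 residents. The scaling factor, the area names, and all figures are assumptions for illustration, and the README does not state which population source the project used.

```python
def crimes_per_1000(crime_counts, populations):
    """Return crimes per 1,000 residents by area, highest rate first."""
    rates = {}
    for area, count in crime_counts.items():
        population = populations.get(area)
        if not population:
            continue  # skip areas with missing or zero population
        rates[area] = 1000 * count / population
    return dict(sorted(rates.items(), key=lambda item: item[1], reverse=True))

# Made-up counts and populations; "Area C" has no population figure.
crime_counts = {"Area A": 500, "Area B": 300, "Area C": 120}
populations = {"Area A": 25000, "Area B": 10000}

rates = crimes_per_1000(crime_counts, populations)
print(rates)  # -> {'Area B': 30.0, 'Area A': 20.0}
```

Area A has more incidents than Area B but a lower rate, which is the distinction a per capita heatmap is meant to surface.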
Analysis: We visualized the correlation between trust scores of citizens and safety scores of neighborhoods.
Visualization: Scatterplot of overall trust plotted against overall safety.
Insight: There is a strong positive correlation between overall safety scores and trust.
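One common way to quantify the relationship shown in a scatterplot like this is the Pearson correlation coefficient. The README does not say which measure the project used, so the sketch below is an assumption, and the scores in it are made up.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)

# Made-up district-level scores for illustration only.
safety = [40.0, 50.0, 60.0, 70.0]
trust = [45.0, 52.0, 61.0, 72.0]

r = pearson(safety, trust)
print(round(r, 3))
```

Values near 1 indicate that trust rises almost linearly with safety, while values near 0 indicate little linear relationship.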
Analysis: We examined the relationship between low, medium, and high income levels and their corresponding safety scores.
Visualization: Three scatterplots showing each income level plotted against overall safety scores.
Insight: High-income residents trust the police in their districts regardless of safety scores.
Analysis: We assessed safety and trust scores across various districts in Chicago.
Visualization: Heatmap displaying overall safety and trust scores by district.
Insight: Overall Safety and Trust scores are the lowest for Englewood and Harrison.
7. Is there a difference in correlation between trust and safety scores for each income level group?
Analysis: We examined the relationship between trust and safety scores within different income level groups.
Visualization: Scatterplot illustrating the correlation between trust and safety scores by income segments, along with a correlation matrix for trust and safety ratings.
Insight: There is overall a positive correlation between safety and trust sentiment scores. However, low and medium-income residents tend to trust the police when safety scores are high. In contrast, there is a weaker correlation for high-income residents, who maintain trust in the police even when safety scores are low.
Based on the insights from this analysis, we recommend the following:
1. Allocate More Resources to Specific Crimes: Increase resources dedicated to addressing theft, battery, and criminal damage to enhance community safety.
2. Leverage the Positive Correlation Between Safety and Trust: Focus on building trust in areas with low safety and trust sentiment scores to improve overall community sentiment.
3. Target Districts:
- District 7 - Englewood (South Side)
- District 11 - Harrison (West Garfield)
4. Conduct a Deeper Analysis of Englewood: Investigate Englewood further, as it exhibits the highest crime rate per capita along with the lowest safety and trust sentiment scores.
5. Explore the Relationship Between Trust and Crime: Analyze whether higher trust levels lead to lower crime rates or if low crime rates foster higher trust in the police.