This project, LA Crime Analytics, analyzes and predicts criminal activity in Los Angeles using data analytics and machine learning. It uses a Los Angeles crime dataset from Kaggle to explore crime indicators, the relationship between victim demographics and crime types, and the effectiveness of predictive policing.
The analysis is intended to help law enforcement agencies enhance their understanding of crime patterns and make data-driven decisions regarding crime prevention and resource allocation.
- Course: ISDS 577 - Seminar in Information Systems Implementation, Spring 2024
- Group Members:
  - Jeevan Gowda Hemanth Kumar
  - Sai Krishna Mallineni
  - Anwesh Reddy Malgireddy
  - Bala Avinash Allam
- Are there crime indicators that are proportionate to the populations who are victims of, or at risk of, crime?
- Is there a correlation between victim age and victim sex? Are crime factors such as crime description, location, and purpose associated with victim age and sex?
- What role do crime attributes such as description, location, and date of occurrence play in crime prevalence?
- Can machine learning models accurately predict crime from the attributes in this dataset?
- How can data analytics help identify and control crime in relation to area name, victim age, victim sex, and weapons?
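One way to approach the association questions above is a chi-square test of independence on a contingency table. The sketch below tests whether victim sex and crime description are related. It is not necessarily the test this project used; the column names `vict_sex` and `crime_desc` are hypothetical, and the small toy DataFrame stands in for the real dataset.

```python
import pandas as pd
from scipy.stats import chi2_contingency

# Toy data standing in for the cleaned crime table (hypothetical column names).
df = pd.DataFrame({
    "vict_sex": ["M", "F", "M", "M", "F", "F", "M", "F", "M", "F"] * 10,
    "crime_desc": ["ASSAULT", "THEFT", "ASSAULT", "BURGLARY", "THEFT",
                   "THEFT", "ASSAULT", "BURGLARY", "BURGLARY", "THEFT"] * 10,
})


def sex_crime_association(frame, sex_col="vict_sex", crime_col="crime_desc"):
    """Chi-square test of independence between victim sex and crime type."""
    # Rows: victim sex, columns: crime description, cells: record counts.
    table = pd.crosstab(frame[sex_col], frame[crime_col])
    chi2, p_value, dof, _expected = chi2_contingency(table)
    return table, chi2, p_value, dof


table, chi2, p_value, dof = sex_crime_association(df)
print(table)
print(f"chi2={chi2:.2f}, dof={dof}, p={p_value:.4f}")
```

A small p-value suggests that the distribution of crime types differs between male and female victims; it does not measure how strong that difference is.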
- Dataset: Los Angeles Crimes Dataset obtained from Kaggle.
- Format: CSV (Comma Separated Values)
- Legal Privacy: The dataset is public and shared under the Community Data License Agreement.
- Data Attributes: 247,989 records from 2020 to 2023 with 12 feature variables, including crime area, victim information, crime description, and weapon details.
The dataset was cleaned by removing irrelevant variables and rows with missing values, and only records from 2022 to 2023 were retained for analysis. For the machine learning tasks, the data was split into training and testing sets and further processed to reduce class imbalance.
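The preprocessing steps above can be sketched with pandas and scikit-learn as follows. This is a minimal sketch, not the project's notebook code. The snake_case column names are hypothetical, the toy rows are not real data, and random oversampling of the training set is only one of several ways to handle class imbalance.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical snake_case column names; rename to match the real CSV headers.
KEEP = ["date_occ", "area_name", "vict_age", "vict_sex", "weapon_desc", "crime_desc"]


def clean_crime_data(raw: pd.DataFrame) -> pd.DataFrame:
    """Keep relevant columns, drop rows with missing values, keep 2022-2023 only."""
    # Select columns first so empty irrelevant columns don't cause extra row drops.
    df = raw[KEEP].dropna().copy()
    df["date_occ"] = pd.to_datetime(df["date_occ"])
    in_range = df["date_occ"].dt.year.between(2022, 2023)
    return df[in_range].reset_index(drop=True)


def oversample_minority(train: pd.DataFrame, target: str, seed: int = 0) -> pd.DataFrame:
    """Randomly duplicate rows of smaller classes until every class matches the largest."""
    largest = train[target].value_counts().max()
    parts = []
    for _, group in train.groupby(target):
        extra = group.sample(n=largest - len(group), replace=True, random_state=seed)
        parts.append(pd.concat([group, extra]))
    # Shuffle so classes are interleaved.
    return pd.concat(parts).sample(frac=1, random_state=seed).reset_index(drop=True)


# Toy rows standing in for the Kaggle CSV (not real data).
raw = pd.DataFrame({
    "date_occ": ["2021-05-01", "2022-01-15", "2022-03-02", "2022-07-19", "2023-02-11",
                 "2023-06-30", "2023-09-09", "2022-11-23", "2023-12-01", "2022-04-04"],
    "area_name": ["77th Street", "Van Nuys", "Central", "77th Street", "Central",
                  "77th Street", "Van Nuys", "Central", "77th Street", None],
    "vict_age": [30, 41, 28, 35, 52, 19, 44, 33, 27, 61],
    "vict_sex": ["M", "F", "M", "M", "F", "M", "F", "M", "F", "M"],
    "weapon_desc": ["NONE", "NONE", "KNIFE", "HANDGUN", "NONE",
                    "NONE", "NONE", "KNIFE", "NONE", "NONE"],
    "crime_desc": ["ASSAULT", "THEFT", "ASSAULT", "ASSAULT", "THEFT",
                   "THEFT", "THEFT", "ASSAULT", "THEFT", "THEFT"],
    "mocodes": [None] * 10,  # an "irrelevant" column that is entirely empty
})

clean = clean_crime_data(raw)  # drops the 2021 row and the row missing area_name
# Split before resampling so duplicated rows never leak into the test set.
train, test = train_test_split(clean, test_size=0.25, random_state=42)
train_balanced = oversample_minority(train, target="crime_desc")
print(len(clean), len(train), len(test))
print(train_balanced["crime_desc"].value_counts())
```

Resampling only the training split keeps the test set representative of the real class distribution, so the accuracy measured on it is not inflated by duplicated rows.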
- Data Analysis and Visualization: Python, Google Colab, ggplot, pandas
- Machine Learning: scikit-learn (classification models such as logistic regression)
- Programming Languages: Python
- Crime Indicators: Data analysis identified key indicators of crime, including crime location, victim age, and victim sex.
- Crime Patterns: The 77th Street Division has the highest crime rate, while other areas such as Van Nuys have lower rates.
- Victim Demographics: Males are more likely than females to be victims, and adults aged 26-44 are the most at-risk age group.
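Summaries like these come from simple counts. The sketch below shows one way to compute them with pandas. The column names are hypothetical, the age bins other than 26-44 are assumptions, and the toy counts are illustrative only; they are not the project's results.

```python
import pandas as pd

# Hypothetical age bands; only 26-44 is taken from the findings above.
AGE_BINS = [0, 17, 25, 44, 64, 120]
AGE_LABELS = ["0-17", "18-25", "26-44", "45-64", "65+"]


def crime_profile(df: pd.DataFrame):
    """Count records per area, victim-sex shares, and records per age band."""
    by_area = df["area_name"].value_counts()  # sorted, most records first
    by_sex = df["vict_sex"].value_counts(normalize=True)  # shares summing to 1
    # include_lowest=True keeps an age of 0 inside the first band.
    age_group = pd.cut(df["vict_age"], bins=AGE_BINS, labels=AGE_LABELS,
                       include_lowest=True)
    by_age = age_group.value_counts()
    return by_area, by_sex, by_age


# Toy rows (not real data).
toy = pd.DataFrame({
    "area_name": ["77th Street", "Van Nuys", "Central", "77th Street",
                  "Central", "77th Street", "Central", "77th Street"],
    "vict_sex": ["M", "F", "M", "M", "F", "M", "F", "M"],
    "vict_age": [30, 41, 28, 35, 52, 19, 44, 33],
})

by_area, by_sex, by_age = crime_profile(toy)
print(by_area, by_sex, by_age, sep="\n\n")
```

Because the dataset does not include population figures, these are record counts; turning them into per-capita rates would require an external population source for each area.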
- Predictive Modeling: Machine learning algorithms were used to predict crime occurrences. The logistic regression model showed promising results for predicting future crimes from the dataset's attributes.
- Accuracy: KNN Random Classifier achieved 80% accuracy in crime prediction.
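The modeling step can be sketched as a scikit-learn pipeline that one-hot encodes the categorical attributes, scales victim age, and compares logistic regression with a k-nearest-neighbors classifier on a held-out test set. The feature list, `n_neighbors=5`, and the synthetic labels are assumptions made for illustration. The toy data is not expected to reproduce the 80% figure above.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical feature columns.
CATEGORICAL = ["area_name", "vict_sex", "weapon_desc"]
NUMERIC = ["vict_age"]


def build_pipeline(model):
    """One-hot encode categorical columns, scale age, then fit the given model."""
    pre = ColumnTransformer([
        ("cat", OneHotEncoder(handle_unknown="ignore"), CATEGORICAL),
        ("num", StandardScaler(), NUMERIC),
    ])
    return Pipeline([("pre", pre), ("model", model)])


def evaluate(df, target="crime_desc", seed=42):
    """Return test-set accuracy for each candidate model."""
    X, y = df[CATEGORICAL + NUMERIC], df[target]
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=seed, stratify=y)
    models = {
        "logistic_regression": LogisticRegression(max_iter=1000),
        "knn": KNeighborsClassifier(n_neighbors=5),
    }
    scores = {}
    for name, model in models.items():
        pipe = build_pipeline(model).fit(X_train, y_train)
        scores[name] = accuracy_score(y_test, pipe.predict(X_test))
    return scores


# Synthetic rows (not real data): the label depends only on the weapon.
rng = np.random.default_rng(0)
n = 200
weapon = rng.choice(["NONE", "KNIFE", "HANDGUN"], size=n)
toy = pd.DataFrame({
    "area_name": rng.choice(["77th Street", "Central", "Van Nuys"], size=n),
    "vict_sex": rng.choice(["M", "F"], size=n),
    "weapon_desc": weapon,
    "vict_age": rng.integers(18, 80, size=n),
    "crime_desc": np.where(weapon == "NONE", "THEFT", "ASSAULT"),
})

scores = evaluate(toy)
print(scores)
```

Keeping the preprocessing inside the pipeline means the encoder and scaler are fitted only on the training split. This avoids leaking test-set statistics into the model.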
- Integration with Operations: Integrate the prediction model into law enforcement operations to improve crime detection.
- Resource Allocation: Allocate police resources effectively to high-risk areas to prevent crime.
- Strategic Planning: Use predictive insights to make informed decisions and take proactive measures to reduce crime rates.
- Setup: Upload the crime dataset to Google Drive, then mount Google Drive in Google Colab.
- Data Cleaning: Run the Python code in the provided notebook to clean and preprocess the data.
- Data Analysis and Visualization: Use the provided code to visualize crime patterns and generate insights.
- Machine Learning: Train and test machine learning models to predict crime.
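A minimal sketch of the setup step, assuming the notebook runs in Google Colab. `google.colab.drive.mount` is Colab's Drive-mounting call. The CSV path and the helper names `mount_drive_if_colab` and `load_crime_csv` are hypothetical; adjust the path to wherever the dataset was uploaded.

```python
from pathlib import Path

import pandas as pd

# Hypothetical location of the dataset inside Google Drive.
DEFAULT_CSV = Path("/content/drive/MyDrive/crime_data.csv")


def mount_drive_if_colab(mountpoint: str = "/content/drive") -> bool:
    """Mount Google Drive when running in Colab; return False anywhere else."""
    try:
        from google.colab import drive  # only importable inside Colab
    except ImportError:
        return False
    drive.mount(mountpoint)
    return True


def load_crime_csv(path=DEFAULT_CSV) -> pd.DataFrame:
    """Read the crime CSV into a pandas DataFrame."""
    return pd.read_csv(path)


if __name__ == "__main__":
    print("Drive mounted:", mount_drive_if_colab())
    if DEFAULT_CSV.exists():
        crimes = load_crime_csv()
        print(crimes.shape)
```

Outside Colab the mount step is skipped, so the same loader can read a local copy of the CSV by passing its path explicitly.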
- Improve the accuracy of machine learning models by including more relevant attributes.
- Explore additional datasets for better crime analysis and enhanced predictive capabilities.
This project uses data from Kaggle under the Community Data License Agreement.
- Crime Health Indicator Data: Kaggle Dataset
- Software and Libraries: Python, Google Colab, ggplot, pandas, scikit-learn
- Contextual Resources: See references in the final report for related literature and studies.