Posted on

Understanding ETL development with a case study on terrorist activity and its connection to country demographics

ETL (Extract, Transform, Load) development is an essential process in data management. It enables organizations to collect, transform, and transfer data from various sources into a central location, usually a data warehouse or database, for further analysis and reporting. In this post, we will explore a case study on ETL development, where we examine the connection between terrorist activity and country demographics using large datasets.

Data Sources: For this case study, we used two CSV datasets: one containing information on terrorist activity from 1972 to 2022 and another containing information on country demographics, economic status, and military expenditures. We supplemented the project with articles from the New York Times, retrieved through its API and web scraping.

Data Extraction: To extract the data from the sources, we used Python and its Pandas library to read the CSV files. Web scraping was done using the BeautifulSoup library. The extracted data was then cleaned and preprocessed to ensure that it was in the correct format and ready for analysis.

Data Transformation: The extracted data was then transformed to make it usable for analysis. This involved removing duplicate values, filling in missing values, and converting the data into the correct format. We mainly used the Pandas library to perform these tasks.
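As a minimal sketch of the extraction and cleaning steps described above, the snippet below reads a small inline CSV with Pandas, drops duplicate rows, and fills in missing values. The column names (`country`, `year`, `killed`) and the sample rows are hypothetical stand-ins, not the project's actual schema, and filling missing counts with 0 is an assumption that may not suit every column.

```python
import io

import pandas as pd

# Hypothetical sample standing in for one of the CSV files.
RAW_CSV = """country,year,killed
A,1975,3
A,1975,3
B,1980,
C,,1
"""


def clean_events(df: pd.DataFrame) -> pd.DataFrame:
    """Drop duplicate rows, drop rows without a year, and fill missing counts with 0."""
    out = df.drop_duplicates()
    out = out.dropna(subset=["year"])
    out = out.assign(
        year=out["year"].astype(int),
        killed=out["killed"].fillna(0).astype(int),
    )
    return out.reset_index(drop=True)


events = clean_events(pd.read_csv(io.StringIO(RAW_CSV)))
print(events)
```

With real files, `pd.read_csv` would take the file path instead of the in-memory buffer used here.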

Data Loading: The transformed data was then loaded into a MongoDB database, which is a document-oriented NoSQL database. We used PyMongo, a Python driver for MongoDB, to connect to the database and load the data.
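The loading step could look roughly like the sketch below. The connection URI, database name, and collection name are hypothetical placeholders, and the helper functions `load_records` and `connect` are illustrative names rather than code from the project.

```python
from typing import Any, Iterable


def load_records(collection: Any, records: Iterable[dict]) -> int:
    """Insert records into a MongoDB collection and return how many were written."""
    docs = [dict(r) for r in records]
    if not docs:
        return 0  # insert_many rejects an empty list, so return early
    result = collection.insert_many(docs)
    return len(result.inserted_ids)


def connect(uri: str = "mongodb://localhost:27017", db: str = "capstone", name: str = "events"):
    """Return a collection handle; the URI, database, and collection names are hypothetical."""
    # Imported here so load_records can be exercised without PyMongo or a running server.
    from pymongo import MongoClient

    return MongoClient(uri)[db][name]
```

With a running MongoDB instance, a cleaned DataFrame `df` could be loaded with `load_records(connect(), df.to_dict("records"))`, since each row dictionary maps naturally onto one document.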

Data Analysis: We used Python libraries such as Pandas, Plotly, PyMongo, and Scikit-learn to analyze the data, including with a machine learning model, and to answer the following questions:

  • Which organizations are responsible for the most deaths and injuries from 1972-2022?
  • Which states/regions of a country had the most terrorist attacks from 1972-2022?
  • What are the most common categories of terrorism within each region from 1972-2022?
  • In which decade were the most terrorist acts committed across the world?
  • What is the demographic breakdown (gender ratio and percentage of each age range in a population) of each state where terrorism was committed?
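Several of the questions above reduce to group-and-aggregate operations. The sketch below shows one way to answer the decade and organization questions with Pandas. The rows and column names (`year`, `group`, `killed`, `wounded`) are made up for illustration and do not come from the project's dataset.

```python
import pandas as pd

# Hypothetical rows; the column names are illustrative only.
events = pd.DataFrame(
    {
        "year": [1975, 1978, 1983, 1991, 1994, 1999],
        "group": ["X", "Y", "X", "Z", "X", "Z"],
        "killed": [3, 0, 5, 2, 1, 4],
        "wounded": [1, 2, 0, 6, 0, 3],
    }
)

# Attacks per decade: integer-divide the year by 10 to bucket it.
events["decade"] = (events["year"] // 10) * 10
attacks_per_decade = events.groupby("decade").size()
busiest_decade = int(attacks_per_decade.idxmax())

# Casualties (deaths plus injuries) per organization, largest first.
casualties = (
    events.assign(casualties=events["killed"] + events["wounded"])
    .groupby("group")["casualties"]
    .sum()
    .sort_values(ascending=False)
)

print(busiest_decade)
print(casualties)
```

The same pattern, grouping by a region or category column instead, would cover the per-region questions.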

To showcase the insights obtained from the analysis, we built a web-based dashboard using Flask, a Python web framework. The dashboard contains interactive visualizations including graphs, tables, a map, and web-scraped articles.

Conclusion: This case study demonstrates how ETL development can be used to analyze large datasets and answer complex questions. It also highlights the importance of the different steps involved in ETL development and the tools and technologies used to perform each step. With the increasing availability of data, ETL development will continue to play a critical role in data management and decision-making.

Github Repo: https://github.com/namin1993/Capstone_Project

Posted on

App for Crime and Poverty Analysis in Philadelphia

I developed a web application for visualizing and analyzing crime data in Philadelphia. The goal of the application is to provide a user-friendly interface for exploring crime patterns and trends in the city, and to give recruiters an understanding of my technical skills.

The data for the project was sourced from the Philadelphia Police Department and included information on over 2 million crimes reported in the city between 2021 and 2022. I used Python to clean and process the data, which was provided in the form of CSV files.

Those CSV files were then converted into GeoJSON files so that the data would be more easily accessible to the application. GeoJSON maps each record to a location, which allowed me to create interactive crime maps that display information on the number and type of crimes in each neighborhood.
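A CSV-to-GeoJSON conversion of this kind can be sketched with the Python standard library alone. The field names (`lat`, `lng`, `offense`) and the sample rows below are assumptions for illustration; the actual police data may use different columns.

```python
import csv
import io
import json


def csv_to_geojson(csv_text: str, lat_field: str = "lat", lon_field: str = "lng") -> dict:
    """Convert CSV rows with latitude/longitude columns into a GeoJSON FeatureCollection."""
    features = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        try:
            lat = float(row.pop(lat_field))
            lon = float(row.pop(lon_field))
        except (KeyError, ValueError, TypeError):
            continue  # skip rows with missing or malformed coordinates
        features.append(
            {
                "type": "Feature",
                # GeoJSON orders coordinates as [longitude, latitude].
                "geometry": {"type": "Point", "coordinates": [lon, lat]},
                "properties": row,
            }
        )
    return {"type": "FeatureCollection", "features": features}


# Hypothetical sample rows; the second one has no coordinates and is skipped.
SAMPLE = """lat,lng,offense
39.9526,-75.1652,Theft
,,Vandalism
"""

collection = csv_to_geojson(SAMPLE)
print(json.dumps(collection, indent=2))
```

A mapping library such as LeafletJS can read a FeatureCollection like this directly, which is why the remaining columns are kept as each feature's `properties`.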

The front-end of the application was built using JavaScript, specifically LeafletJS, an open-source JavaScript library for creating maps. With LeafletJS, I was able to create an intuitive and interactive user interface that allows users to pan, zoom, and filter the data on the crime maps. I also added support for multiple data sets, which gives users several dimensions for examining the crime data across the different districts in Philadelphia.

This application will be further developed in the future to include several other visualization tools and statistics to help users understand the data. Users will be able to view crime statistics for different neighborhoods, view time-series plots of crime rates, and compare crime rates across different time periods.

Overall, this project demonstrates my skills in data processing, web development, and visualization. I was able to take a large and complex data set and turn it into a valuable resource for understanding crime patterns and trends in Philadelphia. I believe that my web application would be an invaluable tool for anyone interested in the relationship between crime and poverty in their city.

Application: http://phillycrime.nehla.codes

Github: https://github.com/namin1993/Philadelphia_Crime_Map