Resource Allocation During Natural Disaster

Aim of the project:

  1. Analyze the IMD data and identify areas that are frequently hit by floods
  2. Predict the number of human casualties
  3. Estimate the distribution of displaced people and use it to propose warehouse locations

Dataset: The India Flood Inventory, a geospatial dataset developed in collaboration with the India Meteorological Department (IMD). It records floods in India, including fatalities, damage, and other relevant parameters. However, a large share of the data is missing and has to be handled before further analysis.

A glimpse of the columns, with their non-null counts, data types, and percentage of missing values:

image

The following columns are missing most of their data (more than 80%): Location, Latitude, Longitude, Severity, Area affected, Human injured, Human displaced, Animal fatality, and Event source.

Data Cleaning and Preprocessing:

  1. Created Start month and Start year from the start date, and End month and End year from the end date, then dropped the start date and end date columns. In about 3% of the rows the start date was later than the end date, so those rows were dropped.

  2. The Main cause column was converted to lowercase and stripped of surrounding whitespace. Inspecting the entries revealed two problems that needed to be addressed:
    a) Entries such as 'flood' and 'floods', or 'heavy rain' and 'heavy rains', mean the same thing. Many entries have this problem.
    b) Apart from the most frequent values such as heavy rain and flood, many entries occur only once and are long free-text strings.
    Further preprocessing therefore included punctuation removal and word lemmatization. I took the first 14 unique values into consideration and replaced the others with 'other'.

  3. We needed the latitude and longitude of each place for the geospatial analysis. Going through these columns, we observed the following:
    a) Very few rows have values in the Latitude and Longitude columns, so the district and state data must be converted into coordinates.
    b) We first consider the dataframe of rows where latitude and longitude are null, keeping the district, state, and location columns.
    c) In this data, district is null in 410 rows and state is null in 356 rows. State is always present when district is present, and a further 54 rows contain only state information.
    d) The remaining rows, which have neither coordinates nor district/state information, have data in the Location column.
    We geocode the data in the df_geo dataframe, that is, convert each address into a location on the map.
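Steps 1 to 3 above can be sketched without any third-party library. This is a minimal sketch under stated assumptions, not the project's actual code: the field names (`start_date`, `end_date`, `district`, `state`, `location`) are hypothetical, the plural stripping is a crude stand-in for real lemmatization, "first 14 unique values" is interpreted as the 14 most frequent values, and appending "India" to the geocoding query is an assumption.

```python
from collections import Counter
from datetime import date
import string


def add_date_parts(rows):
    """Split start/end dates into month and year; drop rows where start > end."""
    cleaned = []
    for row in rows:
        start, end = row["start_date"], row["end_date"]
        if start > end:  # inconsistent record, dropped as in step 1
            continue
        new_row = {k: v for k, v in row.items()
                   if k not in ("start_date", "end_date")}
        new_row.update(start_month=start.month, start_year=start.year,
                       end_month=end.month, end_year=end.year)
        cleaned.append(new_row)
    return cleaned


def normalize_cause(text):
    """Lowercase, strip, remove punctuation, then strip a trailing plural 's'.

    The plural stripping is a crude stand-in for lemmatization (assumption).
    """
    text = text.lower().strip()
    text = text.translate(str.maketrans("", "", string.punctuation))
    words = []
    for word in text.split():
        if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
            word = word[:-1]
        words.append(word)
    return " ".join(words)


def keep_top_causes(causes, top_n=14):
    """Keep the top_n most frequent causes; replace the rest with 'other'."""
    keep = {cause for cause, _ in Counter(causes).most_common(top_n)}
    return [cause if cause in keep else "other" for cause in causes]


def geocode_query(row):
    """Build the address to geocode: district + state, else state, else location."""
    district = row.get("district")
    state = row.get("state")
    location = row.get("location")
    if district and state:
        return f"{district}, {state}, India"
    if state:
        return f"{state}, India"
    if location:
        return f"{location}, India"
    return None  # nothing usable to geocode
```

The string returned by `geocode_query` would then be passed to whichever geocoding service is used; the fallback order mirrors observations (c) and (d).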

image

After fetching the coordinates, we plotted a sample of 500 points to visualize the flood locations. image

image

The sample maps show that the north-eastern and southern parts of India are the most affected by floods, so we can also do a region-specific geospatial analysis. Taking a window

image

image

  4. Severity, Area Affected, Human fatality, Human injured, Human Displaced, Animal Fatality, Description of Casualties/injured, Extent of damage:

    The EM_DAT event source has no information on the extent of the floods, so we drop the corresponding rows. We also restrict the analysis to a range of coordinates covering the areas where most floods occur. Based on the marker cluster map, we focus on the north-eastern part.
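The two filters described above can be sketched as follows. The bounding box values are an assumption chosen to roughly cover north-eastern India, not the window used in the project, and the field names (`event_source`, `latitude`, `longitude`) are hypothetical.

```python
# Assumed bounding box roughly covering north-eastern India (illustrative values).
NE_BOX = {"lat_min": 22.0, "lat_max": 29.5, "lon_min": 89.0, "lon_max": 97.5}


def filter_events(rows, box=NE_BOX, dropped_source="EM_DAT"):
    """Drop rows from `dropped_source` and rows outside the bounding box."""
    kept = []
    for row in rows:
        if row["event_source"] == dropped_source:
            continue  # this source carries no extent information
        lat, lon = row["latitude"], row["longitude"]
        if lat is None or lon is None:
            continue  # cannot place the event on the map
        inside_lat = box["lat_min"] <= lat <= box["lat_max"]
        inside_lon = box["lon_min"] <= lon <= box["lon_max"]
        if inside_lat and inside_lon:
            kept.append(row)
    return kept
```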

image

The DFO source contains complete information on Severity, Area affected, Human fatality, and Human Displaced, whereas the IMD source contains some information on Human fatality, Human injured, Animal fatality, description, and extent.

image

Data Visualization

image

image

image

image

image

image

image

image

Predictive Modelling

image

Predicted Human fatality with an R² of 0.55 after feature selection, using an artificial neural network.
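For reference, R² compares the model's squared error with that of always predicting the mean, so a value of 0.55 means the squared error is 45% of the mean baseline's on the evaluated data. A minimal sketch of the standard definition (the function name is illustrative):

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot.

    Assumes y_true is not constant; otherwise SS_tot is zero.
    """
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot
```

A perfect prediction gives 1, and predicting the mean for every sample gives 0.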

image image

Next, we estimated the number of people displaced. Missing values were filled in with KNNImputer, and Linear Regression achieved an R² score of 0.58.
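The idea behind k-nearest-neighbour imputation can be sketched in plain Python. This is a simplified illustration, not the library implementation: each missing value is replaced by the mean of that column over the `k` nearest rows, with distance measured only over the columns both rows have observed. Missing values are represented as `None`, and the function names are hypothetical.

```python
import math


def nan_distance(a, b):
    """Root mean squared difference over columns present in both rows."""
    pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not pairs:
        return None  # no shared columns, distance undefined
    return math.sqrt(sum((x - y) ** 2 for x, y in pairs) / len(pairs))


def knn_impute(rows, k=2):
    """Fill each None with the mean of that column over the k nearest rows."""
    imputed = [list(row) for row in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            candidates = []
            for m, other in enumerate(rows):
                if m == i or other[j] is None:
                    continue  # a neighbour must have this column observed
                dist = nan_distance(row, other)
                if dist is not None:
                    candidates.append((dist, other[j]))
            candidates.sort(key=lambda c: c[0])
            nearest = [v for _, v in candidates[:k]]
            if nearest:
                imputed[i][j] = sum(nearest) / len(nearest)
    return imputed
```

Distances are computed from the original rows rather than the partially filled ones, so the result does not depend on the order in which values are imputed.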

We plotted several heat maps to show how the impact of floods was spread across the state of Assam and its neighbouring region.

image

image

image

Warehouse Allocation

To determine the distribution of displaced people, we assumed that it follows the population distribution in the census report. The last detailed census took place in 2011 (the 2021 census could not be held due to COVID-19), which gives the following heatmap of Assam's population:
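Under this assumption, each place receives a share of the displaced people proportional to its census population. A minimal sketch, with a hypothetical function name and purely illustrative numbers in the usage note:

```python
def allocate_displaced(total_displaced, population_by_place):
    """Split the displaced total across places in proportion to population."""
    total_population = sum(population_by_place.values())
    return {
        place: total_displaced * population / total_population
        for place, population in population_by_place.items()
    }
```

For example, `allocate_displaced(1000, {"town_a": 300, "town_b": 100})` assigns 750 people to `town_a` and 250 to `town_b`.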

image

Each warehouse corresponds to one cluster. The warehouse is placed at the cluster center, which minimizes the within-cluster sum of squares, that is, the sum of squared L2 distances between the cluster center and the points in the cluster.
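The text does not name the clustering algorithm. One common way to minimize the within-cluster sum of squares is k-means (Lloyd's algorithm), sketched below as an assumption. Points are treated as planar `(x, y)` pairs, which is only an approximation for latitude/longitude, and the initial centers are passed in explicitly to keep the example deterministic.

```python
def kmeans(points, initial_centers, iterations=20):
    """Lloyd's algorithm: assign points to the nearest center, then recenter."""
    centers = [tuple(c) for c in initial_centers]
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for px, py in points:
            # Index of the center with the smallest squared L2 distance.
            nearest = min(
                range(len(centers)),
                key=lambda c: (px - centers[c][0]) ** 2 + (py - centers[c][1]) ** 2,
            )
            clusters[nearest].append((px, py))
        new_centers = []
        for old_center, members in zip(centers, clusters):
            if members:
                mean_x = sum(x for x, _ in members) / len(members)
                mean_y = sum(y for _, y in members) / len(members)
                new_centers.append((mean_x, mean_y))
            else:
                new_centers.append(old_center)  # keep an empty cluster's center
        if new_centers == centers:
            break  # assignments are stable
        centers = new_centers
    return centers
```

Each returned center is a candidate warehouse location. Weighting points by population would be a natural extension, but it is not shown here.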

image

We then narrowed the analysis down to the Kamrup district of Assam, for two main reasons: (i) the 2011 census has population information for more towns in Kamrup than in other districts; (ii) focusing on one district gives a more detailed solution, as it would be meaningless for a whole state to have only 5-6 warehouses.

image

image

In the context of disaster response planning in Assam, the Voronoi diagram delineates the region of influence of each cluster centroid: it partitions the geographical area into regions whose points are closer to that centroid than to any other.
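Deciding which Voronoi region a point falls in amounts to finding its nearest centroid. A minimal sketch, again treating coordinates as planar (an approximation) and using hypothetical function names:

```python
import math


def nearest_centroid(point, centroids):
    """Return the index of the centroid whose Voronoi region contains `point`."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))


def voronoi_assignment(points, centroids):
    """Group points by the centroid (warehouse) they are closest to."""
    regions = {i: [] for i in range(len(centroids))}
    for point in points:
        regions[nearest_centroid(point, centroids)].append(point)
    return regions
```

Points that are exactly equidistant from two centroids lie on a region boundary. This sketch assigns them to the lower-indexed centroid.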

image