
This repository is for the team's work on measuring the integration and network effect of the 17 Sustainable Development Goals.


PeishanLi/G5055_Practicum_Project2


G5055_Practicum_Project2 | Fall 2021

Python R

Team Members (in alphabetical order)

  • Core Team: Qinyue Hao, Jasmine Hwang, Dan Li, Peishan Li, Rina Shin, Connie Xu, Hanyu Zhang
  • Supporting Team: Zhiwen Huang, Cara Latinazo, Xingchen Li, Soobin Oh, Lizabeth Singh, Mengying Xu, Tianqing Zhou

Project Description

This project was developed by graduate students in the Social Sciences program of Columbia University in collaboration with the UN SDG Fund. Its objective was to develop models that define and quantitatively measure the networks among the 17 Sustainable Development Goals (SDGs). To do this, we built two models: a text model based on indicator descriptions from the UN SDG Indicator Metadata, and coefficient networks based on coefficients calculated from the UN SDG Indicator Database. The team also examined the similarity between the two models and the generalizability of the network model across the two example countries.

This project matters because the different SDG domains are interconnected and cannot be effectively addressed unless they are treated as interdependent. Although these networks should be both theoretical and evidence-based, little research has been conducted to validate their empirical grounding.

Additional information about the project can be found on the project slides here

Team member contact information can also be found on the slides.

Scoping and Methodologies:

Scoping:

For the coefficient model, we selected two specific countries: Indonesia and Guatemala.

The two countries were chosen based on the UN Joint SDG Fund's countries of interest, differences in geographical location, similarities in factors such as population density and political stability, and relative data availability.

For the coefficient model, the team used data from 2012 to 2020.

Model Methodologies Used:

  1. Text Model: Network Model based on TF-IDF and cosine similarities between indicator descriptions from SDG metadata

  2. Coefficient Social Network Model: Whole Network, Positive and Negative Network Models based on coefficients for year-to-year changes in SDG indicator measures

    • Whole Network: An undirected, weighted network with all available indicators as nodes, statistically significant (p < 0.05) relationships as ties, and the corresponding correlation coefficients as tie weights.
    • Positive Network: A subgraph of the whole network, with only the positive linkages and the indicators they connect.
    • Negative Network: A subgraph of the whole network, with only the negative linkages and the indicators they connect.
  3. QAP Procedure and Network Logistic Regression: The Quadratic Assignment Procedure (QAP) handles the non-independence of network observations by permuting the rows and columns of a matrix together, which preserves the network's underlying structure. To focus on predicting the existence rather than the strength of ties, we made the positive and negative network models binary by recoding every tie's coefficient to 1, then ran Network Logistic Regression between them to test the predictive strength of one network on the other.
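
The Text Model idea (TF-IDF weights plus cosine similarity between indicator descriptions) can be sketched in plain Python as follows. This is a minimal illustration, not the repository's implementation: the function names, the smoothed IDF formula, the similarity threshold, and the three sample descriptions are all assumptions made for the example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a description and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs):
    """Return one {term: tf-idf weight} dict per document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many descriptions each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        # Term frequency times a smoothed inverse document frequency (assumed variant).
        vectors.append({
            term: (count / total) * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in counts.items()
        })
    return vectors


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def text_network(descriptions, threshold=0.1):
    """Build {(id_a, id_b): similarity} for pairs at or above the threshold."""
    ids = sorted(descriptions)
    vectors = dict(zip(ids, tfidf_vectors([descriptions[i] for i in ids])))
    edges = {}
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            sim = cosine_similarity(vectors[a], vectors[b])
            if sim >= threshold:
                edges[(a, b)] = sim
    return edges


# Hypothetical indicator IDs and descriptions, invented for illustration only.
descriptions = {
    "A": "proportion of population living below the poverty line",
    "B": "proportion of population covered by social protection",
    "C": "marine protected areas coverage",
}
edges = text_network(descriptions)
```

Here only the pair ("A", "B") shares vocabulary, so it is the only tie that survives the threshold. In practice the threshold and tokenization choices would need tuning against the real metadata.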

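The whole, positive, and negative coefficient networks described above can be derived from a table of pairwise results roughly as follows. This is a sketch under stated assumptions: the input is taken to be a list of (indicator, indicator, coefficient, p-value) tuples, and the names `significant_ties` and `subgraph` as well as the sample values are hypothetical.

```python
def significant_ties(results, alpha=0.05):
    """Keep statistically significant pairs as undirected, weighted ties."""
    ties = {}
    for a, b, coef, p_value in results:
        if a != b and p_value < alpha:
            # Undirected tie: store each pair once under a sorted key.
            ties[tuple(sorted((a, b)))] = coef
    return ties


def subgraph(ties, keep):
    """Return (nodes, ties) restricted to ties whose weight satisfies keep()."""
    kept = {pair: w for pair, w in ties.items() if keep(w)}
    nodes = {node for pair in kept for node in pair}
    return nodes, kept


# Hypothetical results: (indicator_a, indicator_b, coefficient, p_value).
results = [
    ("I1", "I2", 0.62, 0.01),
    ("I3", "I2", -0.48, 0.03),
    ("I1", "I3", 0.10, 0.40),  # not significant, so no tie
]
# The whole network keeps every available indicator as a node, even isolates.
all_indicators = {"I1", "I2", "I3", "I4"}

whole_ties = significant_ties(results)
pos_nodes, pos_ties = subgraph(whole_ties, lambda w: w > 0)
neg_nodes, neg_ties = subgraph(whole_ties, lambda w: w < 0)

# Binary versions for tie-existence models: every retained tie becomes 1.
pos_binary = {pair: 1 for pair in pos_ties}
neg_binary = {pair: 1 for pair in neg_ties}
```

The positive and negative subgraphs only carry the indicators their ties connect, while `all_indicators` stands in for the node set of the whole network.
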
Final Deliverables

  1. Blog

  2. Research Paper

  3. Interactive Visualizations

  4. See key findings and other visualizations on the final presentation slides.

Repository Directory Contents:

├── Codes
	├── Data Accessing and Preprocessing 
	├── Text Model
	├── Coefficient Network Model 
		├── Composite Method
		├── Regression Models
	├── Representative Method ^^
		├── Correlation Analysis
		├── Pick Central Variable
	├── Data Visualizations
		├── coefficient network 
		├── text network 
		├── data missingness and disaggregation 
	
├── Data  
	├── Centrality_representative_results (1) ^^ 
		├── centrality_scores(after removing disaggregation)
		├── indicator_picked(before removing disaggregation)
		├── measure_picked(before removing disaggregation)
	├── Guatemala & Indonesia Correlation among Indicators (1) ^^
	├── Guatemala & Indonesia Correlation among Targets (1) ^^
	├── Guatemala & Indonesia Correlation among measurements (1) ^^ 
	├── Guatemala & Indonesia Correlation among measurements-WITHOUT disaggregation ^^ (1) 
	Guatemala & Indonesia Correlation among Targets Ungrouped.csv ^^
	Indonesia.csv & Guatemala.csv ^^
	Guatemala & Indonesia data after selecting one measurement for each indicator.csv ^^
	Guatmala & Indonesia Data Without Disaggregation.csv ^^
	├── variable_types (1) ^^
	├── variables_picked (1) ^^
	├── List of indicators (1) 
	├── Data_preprocessed_for_PCA (1) 
	├── PCA_results (1) 
	├── coefficient_network (1) ^^
	├── Text_Model_Data (2) 

├── Visualizations 
	├── Disaggregated_Data (1a) 
	├── Missing_Data (1a) 
	├── Model_Viz (1c) 	
	├── Interactive_Plots 
	├── Text_Model_Viz (2) 
	├── goal_hexcodes_edge.csv 
	├── goal_hexcodes.csv

^^ - item is likely deprecated 
(1), (2), (1 & 2) - refers to model that the folder is corresponding to. 

Codes

  1. Text Model
  2. Coefficient Social Network Model
  3. QAP Analysis
    • QAP_regression_sig.Rmd: This R Markdown file shows the regressions between different networks using OLS network models and network logistic models.
  4. Data Visualization
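
The permutation idea behind the QAP analysis listed above can be illustrated in Python. This sketch is a simplified QAP correlation test, not the network logistic regression in QAP_regression_sig.Rmd: it permutes the rows and columns of one adjacency matrix together and compares the observed correlation with the permuted ones. All function names and the example matrices are assumptions.

```python
import random


def off_diagonal(matrix):
    """Flatten a square matrix, skipping the diagonal (self-ties)."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(n) if i != j]


def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / (var_x * var_y) ** 0.5


def permute(matrix, order):
    """Apply the same reordering to rows and columns, keeping the structure intact."""
    return [[matrix[i][j] for j in order] for i in order]


def qap_correlation(x, y, n_perm=1000, seed=0):
    """Return the observed correlation and a two-sided permutation p-value."""
    rng = random.Random(seed)
    observed = pearson(off_diagonal(x), off_diagonal(y))
    order = list(range(len(x)))
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(order)
        permuted = pearson(off_diagonal(permute(x, order)), off_diagonal(y))
        # Small tolerance guards against floating-point noise in ties.
        if abs(permuted) >= abs(observed) - 1e-12:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical binary networks: a five-node path graph compared with a copy of itself.
x = [
    [0, 1, 0, 0, 0],
    [1, 0, 1, 0, 0],
    [0, 1, 0, 1, 0],
    [0, 0, 1, 0, 1],
    [0, 0, 0, 1, 0],
]
y = [row[:] for row in x]
observed, p_value = qap_correlation(x, y, n_perm=200)
```

Because node labels are shuffled rather than individual cells, each permuted matrix is still a valid network with the same structure, which is what lets the test respect the dependence among ties.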

Data pre-processing:

  1. Text Model
  2. Coefficient Social Network Model
    • UN_SDG_2_Functions.py: This Python module includes a function called preprocess. After importing UN_SDG_2_Functions and reading an SDG CSV file (2012-2021) from the API, users can call this function to pivot the data directly into an indicator metric by time (year) format. A more detailed version is available as an .ipynb file.
    • For_PCA_Data_Preprocessing.ipynb: This Jupyter notebook uses PCA to preprocess data for building coefficients at the indicator level.
    • Missingness_Imputation.ipynb: This Jupyter notebook includes code to impute missing data for UN countries using linear regression slope fitting over time.
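
As a rough picture of the pivot step described for UN_SDG_2_Functions.py, the sketch below reshapes long-format rows into an indicator metric by year layout using only the standard library. It is not the repository's preprocess function: the name `pivot_series_by_year`, the column names `SeriesCode`, `TimePeriod`, and `Value`, and the sample rows are assumptions to be adjusted to the actual CSV.

```python
import csv
import io
from collections import defaultdict


def pivot_series_by_year(rows, series_col="SeriesCode",
                         year_col="TimePeriod", value_col="Value"):
    """Pivot long-format rows into {series: {year: value}} with aligned years.

    The column names are assumptions; adjust them to the downloaded CSV.
    """
    table = defaultdict(dict)
    years = set()
    for row in rows:
        try:
            year = int(row[year_col])
            value = float(row[value_col])
        except ValueError:
            continue  # skip rows whose year or value is not numeric
        table[row[series_col]][year] = value
        years.add(year)
    ordered_years = sorted(years)
    # Fill years without an observation with None so every series has the same columns.
    return {
        series: {year: values.get(year) for year in ordered_years}
        for series, values in table.items()
    }


# Hypothetical long-format sample standing in for a downloaded SDG CSV.
sample_csv = """SeriesCode,TimePeriod,Value
S1,2012,1.0
S1,2013,2.0
S2,2013,5.5
S2,2014,N/A
S2,2015,6.5
"""
wide = pivot_series_by_year(csv.DictReader(io.StringIO(sample_csv)))
```

Each series ends up with one entry per year seen anywhere in the file, which makes gaps explicit before any imputation is applied.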

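The imputation approach mentioned for Missingness_Imputation.ipynb (fitting a linear trend over time and using it to fill gaps) might look like the following in its simplest form. This is an assumed reading of that description, not the notebook's code; `fit_line`, `impute_series`, and the sample values are hypothetical.

```python
def fit_line(points):
    """Ordinary least squares slope and intercept for (year, value) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    if sxx == 0:
        return 0.0, mean_y
    slope = sum((x - mean_x) * (y - mean_y) for x, y in points) / sxx
    return slope, mean_y - slope * mean_x


def impute_series(series):
    """Fill None values in {year: value} using a line fitted to observed years."""
    observed = [(year, value) for year, value in series.items() if value is not None]
    if len(observed) < 2:
        return dict(series)  # too few observations to estimate a trend
    slope, intercept = fit_line(observed)
    return {
        year: (value if value is not None else slope * year + intercept)
        for year, value in series.items()
    }


# Hypothetical indicator series with two missing years.
series = {2012: 10.0, 2013: None, 2014: 14.0, 2015: None}
filled = impute_series(series)
```

Observed values are left untouched and only the gaps are replaced by the fitted trend; series with fewer than two observations are returned unchanged because no slope can be estimated.
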
Please feel free to contact our team with questions, issues, and concerns. Thank you!
