Iss435 #437

Draft

ekgutierrez1 wants to merge 31 commits into main from iss435

Collaborator

ekgutierrez1 commented Dec 17, 2024

The city homeless student files for 2019-2022 are ready: specifically, the total homeless student file and the subgroup-by-race file. I'll now turn to the county versions of the same.

cdsolari and others added 5 commits

October 17, 2024 12:19


          Update README.md

a378739


          Resolve merge conflict by accepting ReadMe suggestions

98c0c50


          started updating city subgroup dofile

9d43440


          changed dofile names to identify subgroup code for 2019 forward

41ab4ed


          updated code/cleaning in city subgroup

ac9c7c9

also added to gitignore

cdsolari requested a review from rpitingolo

December 17, 2024 21:34

ekgutierrez1 added 5 commits

December 17, 2024 16:47


          updated county file

5ae1775

still working on CT and quality checks


          turns out we don't need CT crosswalk for this

4960e05


          Update .gitignore

4b25324


          updated missings/quality checks

733cd01


          added some notes for reviewer

cb70619

Collaborator Author

ekgutierrez1 commented Dec 18, 2024

The county homeless student files for 2019-2022 are ready: specifically, the total homeless student file and the subgroup-by-race file. One big flag: CT counties prior to 2022 are not in the crosswalk and therefore are not in this data. This will need to be rerun when the crosswalk is fixed.

awunderground and others added 10 commits

January 2, 2025 15:36


          Add folder for final forms

0db2a88


          Merge pull request #442 from UI-Research/forms_folder

d6b67a1

Add folder for final forms


          homeless and ela county files

f699e67


          updating place-populations crosswalk to add 2014 PEP data

ef80eaa

Added 2014 PEP population data into the crosswalk manually since the API is limited


          Update create-place-populations.qmd

ad4b566


          Adding 2014 PEP population data and re-adding the 8 CT counties throu…

f431c24

…gh 2021

As discussed, aligning with the data team decision to maintain the original 8 CT counties through 2021.
Also manually adding PEP population data for 2014 to complete previously missing data.


          Updates to the README

1028be4

Kicked off more specific documentation about the crosswalks in the README - will revisit as needed


          Merge pull request #443 from UI-Research/Iss425

6ceac83

Iss425


          Merge branch 'version2025' of https://github.com/UI-Research/mobility…

5224a25

…-from-poverty into iss435


          used updated crosswalk to include CT in county files

e4c2d74

Collaborator Author

ekgutierrez1 commented Jan 9, 2025

Based on the updated crosswalk, CT is now available for prior years in both the all-homelessness files (2019-2022) and the subgroup files (2019-2022).

Contributor

rpitingolo commented Jan 9, 2025

@ekgutierrez1 just finished reviewing. I will have line item comments in a bit. I need to do another pull to update for the commit you just pushed.

rpitingolo requested changes

Contributor

rpitingolo left a comment

@ekgutierrez1 added line item comments! Many of the comments from the city code also apply to the county as much of the code is duplicated.

02_housing/Homelessness_metrics_city_subgroup.do

+              cap n ssc install libjson
+              net install educationdata, replace from("https://urbaninstitute.github.io/education-data-package-stata/")
+              *Set up globals and directories

Contributor

rpitingolo Jan 9, 2025

In the checklist on the repo Wiki under reproducibility there is "The program runs from start to finish without stopping due to errors or incompleteness". This currently does not pass because of some of the code here in the setup section. The code will error if the folders in the mkdir lines already exist. I'm not sure how to do this off the top of my head, but the code should check whether they exist first and then mkdir only if they don't.
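A minimal sketch of two ways this check could be written, offered as an assumption rather than as what the do-file currently does: Stata's `capture` prefix suppresses the error `mkdir` raises when a folder already exists, and Mata's `direxists()` allows an explicit test first. The folder name `demo_intermediate` is a hypothetical placeholder, not one of the paths the do-file creates.

```stata
* Hypothetical folder name, used only for illustration
local newdir "demo_intermediate"

* Option 1: attempt the mkdir and ignore the "already exists" error
capture mkdir "`newdir'"

* Option 2: check for the folder explicitly and create it only if absent
mata: st_numscalar("dir_exists", direxists("`newdir'"))
if scalar(dir_exists) == 0 {
    mkdir "`newdir'"
}

* Running the setup a second time no longer stops the do-file
capture mkdir "`newdir'"
display "setup finished without error"
```

Option 1 is shorter, but it also hides unrelated failures such as a missing parent folder; Option 2 only skips the call when the folder is really there.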

Collaborator Author

ekgutierrez1 Jan 29, 2025

The error it provides (that the folders already exist) does not prevent the rest of the code from running. If it is a new user, this code auto-creates the paths they need. Since it doesn't prevent the rest of the code from running but is necessary for the paths of the code, I'd prefer to leave as is (this is how all of my files currently run, but we can ask the higher ups).

02_housing/Homelessness_metrics_city_subgroup.do

               clear all
-              global gitfolder "C:\Users\ekgut\OneDrive\Desktop\urban\Github2\mobility-from-poverty"
-              global years 2019 2020 2021 // refers to 2019-20 school year through most recent data
+              global gitfolder "C:\Users\ekgut\OneDrive\Desktop\urban\Github\mobility-from-poverty"

Contributor

rpitingolo Jan 9, 2025

Also from the checklist: "The program avoids hardcoding local file paths and instead uses global paths that will work regardless of where the program is being ran"

Collaborator Author

ekgutierrez1 Jan 13, 2025

This uses a global - the user will have to hardcode this upon download. There isn't a way around this unfortunately. That's just how globals work.
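One possible alternative, sketched under the assumption that Stata's working directory is already the repository root when the do-file starts (this is not confirmed by the PR): fall back to `c(pwd)` when `gitfolder` has not been set, so a user-specific path is only needed when someone chooses to override it.

```stata
* Assumption: the user has launched Stata from, or cd'd into, the
* repository root before running the do-file
if "${gitfolder}" == "" {
    * Use the current working directory instead of a user-specific path
    global gitfolder "`c(pwd)'"
}
display "Using repository root: ${gitfolder}"
```

A user who keeps the repository elsewhere can still define `gitfolder` before running the file, and the fallback is then skipped.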

02_housing/Homelessness_metrics_city_subgroup.do

+              *****************************
+              ****City/Place Crosswalk*****
+              *****************************
+              ** Import city crosswalk file to edit names of city crosswalk to match city location strings in CCD school district data

Contributor

rpitingolo Jan 9, 2025

It would be helpful if comments described more specifically what is happening. For example it looks like you are converting the place codes from numeric to string, then adding leading zeros but that isn't really described in the comment.
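For illustration, the numeric-to-string conversion with leading zeros described here might be commented along these lines. This is a sketch with made-up values and a hypothetical variable name `place`, not the code in the do-file.

```stata
clear
* Hypothetical numeric place codes; leading zeros are lost when stored as numbers
input long place
1000
7000
55000
end

* Convert each code to a 5-character string, padding with leading zeros
* so that 1000 becomes "01000"
gen str5 place_str = string(place, "%05.0f")
list
```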

Collaborator Author

ekgutierrez1 Jan 29, 2025

I added relevant comments to this and the county dofiles.

02_housing/Homelessness_metrics_city_subgroup.do

+              		}
+              save "intermediate/ccd_lea_recent_city_race.dta", replace
+              *merge two ccd datasets together

Contributor

rpitingolo Jan 9, 2025

It looks like the merge has 3,896 unmatched records. Is that expected? It would be helpful in the comment to describe what the expectation is.

Collaborator Author

ekgutierrez1 Jan 29, 2025

Yes, it's expected. I added a line and a comment to explain.
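A sketch of how an expected number of unmatched records could be written down as a check instead of only a comment. The data and the counts below are invented for the example; the real expectation would be whatever the analyst has verified for the CCD files.

```stata
* Toy "using" dataset with hypothetical cities
clear
input int year str2 state str10 city_name
2019 "01" "alpha"
2019 "01" "beta"
2019 "02" "gamma"
end
tempfile using_side
save `using_side'

* Toy "master" dataset
clear
input int year str2 state str10 city_name int enrollment
2019 "01" "alpha" 120
2019 "02" "gamma" 45
2019 "02" "delta" 80
end

merge 1:1 year state city_name using `using_side'

* Expectation: exactly one master-only and one using-only record
count if _merge == 1
assert r(N) == 1
count if _merge == 2
assert r(N) == 1
```

If a later data update changes the number of unmatched records, the `assert` stops the run and forces someone to confirm the new count is still expected.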

02_housing/Homelessness_metrics_city_subgroup.do

               	unzipfile "EdDataEx Homelessness `year'.zip", replace
               	}
               	cd "${gitfolder}\02_housing\data"
-              *import csvs
+              *Due to changes in EdDataExpress website, 2022-23 data must be manually downloaded. Please follow the following steps.

Contributor

rpitingolo Jan 9, 2025

This causes me a lot of stress. When I went to the website there is a big red banner that says

Due to current system issues, datasets must be limited to fewer than 150,000 rows. A selection of data greater than 150,000 rows may result in a truncated dataset.

It looks like we may need an export larger than 150k so this may be a problem. Aside from that, it requires an amount of human intervention that is prone to error.

I don't know what the best solution is if the file is not able to be downloaded programmatically. Perhaps @awunderground or @jwalsh28 can weigh in?

Collaborator Author

ekgutierrez1 Jan 13, 2025

I worked this out with Claudia as the best case solution for this - but welcome other comments. Their website is problematic right now, so we gave instructions as best as possible in the comments for how to download the data. If you follow the instructions, it should work without the 150,000 problem.

02_housing/Homelessness_metrics_city_subgroup.do

               	}
-              *new as of 4/13/23 - updated 2/8/24 - in ACS-based metrics, if it was less than 30, it's set to NA
+              	*if aggregated enrollment is less than 30, quality of the variable is 3
               	replace homeless_quality = 3 if enrollment<30

Contributor

rpitingolo Jan 9, 2025

Should the replace homeless_quality = 3 if enrollment<30 line be part of the loop?

Collaborator Author

ekgutierrez1 Jan 29, 2025

No, the next set of code/loop takes care of the individual race categories. If the total homeless student count is less than 30, regardless of the individual race counts, the quality should be 3. However, if the homeless counts for individual race categories are less than 10, they should be replaced with NA, because they are smaller categories than the aggregated counts.
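To make the two rules easier to compare, here is a toy sketch that follows the thresholds as stated in this thread (the diff flags on `enrollment<30`). The data and the `black_count` variable are hypothetical, and the real do-file may order or implement these steps differently.

```stata
clear
* Hypothetical rows: total enrollment, aggregate quality flag, one subgroup count
input int enrollment byte homeless_quality int black_count
250 1 40
25  1 12
300 1 6
end

* Aggregate rule (outside the race loop): small totals get quality flag 3
replace homeless_quality = 3 if enrollment < 30

* Subgroup rule (inside the race loop): counts under 10 become missing (NA)
foreach var in black {
    replace `var'_count = . if `var'_count < 10
}
list
```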

02_housing/Homelessness_metrics_city_subgroup.do

               		replace `var'_count=-1 if `var'_count<=2
               		}
+              *merge to crosswalk of places/cities
+              merge 1:1 year state city_name using "intermediate/cityfile.dta"

Contributor

rpitingolo Jan 9, 2025

We are losing a vast majority of cases on this merge? Is that right? Felt like a red flag to me.

Collaborator Author

ekgutierrez1 Jan 29, 2025

It is correct because we only care about records with _merge values of 2 and 3. I added some comments in the dofiles to help clarify.

02_housing/Homelessness_metrics_city_subgroup.do

-              *check quality
+              ****************
+              *Quality Checks

Contributor

rpitingolo Jan 9, 2025

A lot of quality checks are difficult for me to interpret as someone without much topical knowledge of the data. For checks where a large number of summary stats are created it would be helpful to know what to look for in those numbers. Alternatively using pass/fail checks would be a bit easier to interpret.

Collaborator Author

ekgutierrez1 Jan 29, 2025

Added assert commands where relevant and comments for other checks.
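As an illustration of the pass/fail style, a few `assert` checks of this kind might look like the sketch below. The rows and the `homeless_share` variable name are assumptions made for the example, not taken from the final files.

```stata
clear
* Hypothetical output rows
input int year float homeless_share byte homeless_quality
2019 0.031 1
2020 0.027 2
2021 .     .
end

* Shares must be missing or fall between 0 and 1
assert inrange(homeless_share, 0, 1) | missing(homeless_share)

* Quality flags must be 1, 2, 3, or missing
assert inlist(homeless_quality, 1, 2, 3) | missing(homeless_quality)

* A missing share should always carry a missing quality flag
assert missing(homeless_quality) if missing(homeless_share)
```

Each `assert` is silent when the condition holds and stops the do-file when it does not, so a reviewer does not need topical knowledge to read the result.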

02_housing/Homelessness_metrics_county_subgroup.do

-              	*we replace other==. to other==1 to mirror lines 147 in the other 4 race categories
-              	replace other = 1 if other==. & year==2019
+              	*we replace other==. to other==1 to mirror other 4 race categories
+              	replace other = 1 if other==.

Contributor

rpitingolo Jan 9, 2025

There are two lines of commented out code in the loop:

*replace `var'="1" if `var'=="S"
*destring `var', replace

Is this temporary or permanent? If permanent please delete these lines. It doesn't look like this is commented out in the city code.

Collaborator Author

ekgutierrez1 Jan 29, 2025

Great catch, I have now deleted that commented out code.

02_housing/Homelessness_metrics_county_subgroup.do Outdated

+              tab `var'_quality if `var'_share==.
+              tab `var'_quality if `var'_count==.
+              }
+              *To reviewer: in 2019, state 17 and county 061 there is an instance where black enrollment is 0 but homeless count is 3.

Contributor

rpitingolo Jan 9, 2025

If we think this is an issue caused by the underlying data then it should probably be switched to the most conservative (unreliable data) flag and documented somewhere

Collaborator Author

ekgutierrez1 Jan 29, 2025

It's already a flag = 3 (most unreliable), but based on the underlying data, I think it should actually be NA. Making this change now.

jwalsh28 and others added 5 commits

January 13, 2025 12:46


          Fix file evaluation form bug - remove as.numeric argument which was t…

8926e2e

…hrowing an error


          Adjust number of variables for de-bug tweak

24f8465


          separated the ela subgroup data

4c683ba

I separated the three subgroups from the ELA subgroup data into three separate eval forms. The "ela_subgroup_county" file should be deleted.


          Final fix to final evaluation test function - tweak geography test to…

cd9c23c

… pad with zeros


          Merge pull request #446 from UI-Research/iss445

8cbcc01

Fix final evaluation function

cdsolari assigned rpitingolo

ekgutierrez1 added 6 commits

January 24, 2025 14:11


          Merge branch 'version2025' of https://github.com/UI-Research/mobility…

af9f2b0

…-from-poverty into iss435


          remove old eval forms

7949bfa


          added eval forms

08f749f


          added place eval forms

faa2f0d


          made changes to address reviewer comments

9f2d9a0


          reran after version2025 update

72c99d8
