Illinois Teacher's Salary from 1999--2012 #3
@soodoku I've written code to scrape all the teacher links, iterate over them, and grab the metadata for each link, but I don't know if I feel right about blasting their server to the level that it would take to get all that data. For now, I'll push the script file to the repo.
I see. Worrying about putting load on their server is reasonable. Two aspects to that:
But it still makes sense to a) go year by year, and b) call Sys.sleep(1) between requests. We can also email them to ask for the data, though I am not very optimistic that we will get anything. What do you think?
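A minimal sketch of the "year by year, pause between requests" idea, written in Python for illustration (the thread itself refers to R's `Sys.sleep`). The function name `scrape_by_year`, the `fetch` callback, and the newest-year-first ordering are assumptions for this example, not the contents of the script in the repo.

```python
import time
from typing import Callable, Dict, Iterable, List


def scrape_by_year(
    links_by_year: Dict[int, Iterable[str]],
    fetch: Callable[[str], str],
    delay_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[int, List[str]]:
    """Fetch every link one year at a time, pausing after each request."""
    results: Dict[int, List[str]] = {}
    # Assumption: handle the most recent year first.
    for year in sorted(links_by_year, reverse=True):
        pages: List[str] = []
        for url in links_by_year[year]:
            pages.append(fetch(url))  # `fetch` is a caller-supplied downloader
            sleep(delay_s)            # fixed pause to limit load on the server
        results[year] = pages
    return results
```

Passing `fetch` and `sleep` in as arguments keeps the loop easy to dry-run with stand-ins before pointing it at the real site.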
Yeah I have calls to I'm going out of town Wednesday morning through Monday, so I probably won't be able to start that process until next week. If you or anyone else on the team starts it and has questions about the code, feel free to let me know.
So I worked on this some today, quick update: I let the script run overnight to scrape all of the teacher links from each of the district links. There's a total of 2,226,915 unique teacher links (keep in mind, one teacher can have multiple links, as the data is split up by year). Assuming 0.5 seconds of computation/rvest time per request, and including a Sys.sleep of 2 seconds per request, we're looking at roughly 1,546 hours, or 64.4 days, of non-stop scrape time required to complete the task. Should we consider narrowing our focus on this? Maybe limit the years to the five most recent years (2008 - 2012), or filter the 2mil+ links to only include unique teachers (keeping only the most recent instance of each teacher)? Let me know what you think. I made a few general refactor edits to the scrape code today, I'll push that to the repo now.
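The run-time estimate above is just requests multiplied by seconds per request. A small Python sketch of that arithmetic, using the figures from the comment (the function name `estimate_scrape_time` is made up for this example):

```python
from typing import Tuple


def estimate_scrape_time(
    n_requests: int, compute_s: float, sleep_s: float
) -> Tuple[float, float]:
    """Return (hours, days) for a strictly sequential scrape."""
    total_s = n_requests * (compute_s + sleep_s)
    hours = total_s / 3600
    days = hours / 24
    return hours, days


# Figures from the comment: 2,226,915 links, 0.5 s compute, 2 s sleep.
hours, days = estimate_scrape_time(2_226_915, 0.5, 2.0)
print(f"{hours:.2f} hours, {days:.1f} days")  # 1546.47 hours, 64.4 days
```

The same function can be rerun with a shorter sleep or a smaller link count (a single year, say) to compare options before committing to a run.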
Awesome @ChrisMuir! 2.2M teacher-years is a lot! I agree that we should start out small. Probably do 2012 first and then go back in time slowly. One year at a time makes sense to me. And we can do it over next many ways. p.s. There are some odd things in the data including $0 salaries.
Cool, yeah I'm letting it run on the 2012 teacher links for now. Once that's done, I'll write those results to csv and upload to the repo. We can take a look at that data and decide what to do from there. Thanks!
Just pushed the 2012 IL teacher salaries to the repo. The data came out very clean from the website, there were over 162K records scraped and every single one returned as a neat 10-variable data frame, all with the same col headers. It made binding them all up into a single data set headache-free. I'll start the script on 2011 tonight.
I think this is done also, right? Should we close this issue @ChrisMuir?
No, unfortunately this isn't done, I'm slowly working through each year. The number of records per year is around 160k, and each record requires a single request to the website, via If you don't think we need to go all the way back to 1999, that's no problem, just let me know.
Righto! Thanks, man! I vote for getting all the data. Longitudinal data is great for econometrics. Paired w/ some outcome data (from health to economic outcomes), it can probably lead to imp. insights.
Even descriptively, it would be great to know how teacher salaries have fared under Republicans, Dems., how close elections affect salaries, and also just how they compare over time to median wage in the respective areas.
Cool, got it. I'll keep adding data to the repo as each year finishes.
Quick update, the site has been completely down for the last ~48 hours. No "site maintenance" screen or anything, just a blank white page. I'll keep checking it.
sigh. just checked. still down.
Source: http://www.familytaxpayers.org/ftf/ftf_salaries.php
For each year, list all districts. Each school in a district links to a clickable list of teachers and salaries. Each teacher's name is clickable and leads to metadata on that teacher.
Useful to produce year-by-year lists for now. We can merge later.
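One way to keep year-by-year lists now and merge later, sketched in Python. The `(year, url)` tuple shape and both function names are assumptions for illustration; nothing here reflects how the site or the repo actually stores links.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Link = Tuple[int, str]  # (year, teacher URL); hypothetical shape


def lists_by_year(links: Iterable[Link]) -> Dict[int, List[str]]:
    """Group teacher links into one list per year, preserving input order."""
    by_year: Dict[int, List[str]] = defaultdict(list)
    for year, url in links:
        by_year[year].append(url)
    return dict(by_year)


def merge_years(by_year: Dict[int, List[str]]) -> List[Link]:
    """Recombine the per-year lists into a single list, oldest year first."""
    return [(year, url) for year in sorted(by_year) for url in by_year[year]]
```

Keeping the year in the merged rows means nothing is lost by splitting first, so the per-year files can be processed and checked independently.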