This issue has been copied over from jpycroft#2 to have it on the main repository.
As a number of data issues arose when producing the get_imm_resid function within demographics.py, I'm starting this issue to keep a record of them and to allow others to comment on the decisions taken.
The production of net immigration rates, imm_rates, went through the following steps:
1. Try to use Eurostat immigration rates directly. This did not work because Eurostat no longer publishes immigration by age for the UK (presumably a post-Brexit change). For future reference, the data for other EU countries are there.
2. Return to the OG-USA style of backing out the immigration rates from the population totals, as done in get_imm_resid in demographics.py in OG-USA-Calibration. This works well for most ages, but not for:
   a. age 0, newborns (see point 3).
   b. the oldest ages, especially age 90+ (see point 4).
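To make the residual idea concrete, here is a minimal sketch for a single pair of consecutive years. It is not the actual get_imm_resid code: the function name, argument names and the infant mortality term are assumptions made for illustration, and array index i is assumed to correspond to age i.

```python
import numpy as np


def residual_imm_rates(pop_t, pop_t1, mort_rates, newborns, infmort_rate=0.0):
    """Back out net immigration rates by age as a residual (illustrative sketch).

    pop_t, pop_t1 : population by age in year t and year t+1
    mort_rates    : mortality rate by age, applied between t and t+1
    newborns      : number of births feeding the age-0 cohort in t+1
    infmort_rate  : infant mortality rate (assumed parameter, default 0)
    """
    pop_t = np.asarray(pop_t, dtype=float)
    pop_t1 = np.asarray(pop_t1, dtype=float)
    mort_rates = np.asarray(mort_rates, dtype=float)

    imm = np.zeros_like(pop_t)
    # Age 0: whatever is not explained by surviving newborns is net immigration.
    imm[0] = (pop_t1[0] - (1.0 - infmort_rate) * newborns) / pop_t[0]
    # Ages 1+: whatever is not explained by survivors ageing up one year.
    imm[1:] = (pop_t1[1:] - (1.0 - mort_rates[:-1]) * pop_t[:-1]) / pop_t[1:]
    return imm
```

Because every rate is a residual divided by a population count, any inconsistency between the population, births and mortality inputs ends up in the immigration rate, which is why age 0 and the oldest ages are the fragile cases.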
3. Adjust newborn values:
The OG-USA methodology applies fert_rates to the 2015, 2016 and 2017 populations to obtain the newborns for 2016, 2017 and 2018. The problem is that the fert_rates for 2018 are much lower than the 2015 rates, so applying them to the earlier populations understates births: the calculated newborns in 2016 are more than 40,000 below the actual number. The standard get_imm_resid allocates this shortfall to net immigration of babies, which produces a net immigration rate of 7% for age 0, whereas the actual rate is closer to 0.7%.
Instead of approximating, I downloaded the actual numbers of newborns in 2015, 2016 and 2017 from Eurostat. These then become the "newborn" array, from which imm_rates[0] is calculated.
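The sketch below shows why the choice of newborn series matters so much for age 0. The helper name, the averaging over years and the infant mortality term are assumptions, and the numbers in the usage note are made up for illustration, not Eurostat values.

```python
import numpy as np


def imm_rate_age0(newborns, pop0_next, pop0_now, infmort_rate=0.0):
    """Age-0 net immigration rate, averaged over several years (sketch).

    newborns  : births in each base year (the "newborn" array)
    pop0_next : age-0 population in each following year
    pop0_now  : age-0 population in each base year
    """
    newborns = np.asarray(newborns, dtype=float)
    pop0_next = np.asarray(pop0_next, dtype=float)
    pop0_now = np.asarray(pop0_now, dtype=float)
    # Residual per year, then a simple mean across years.
    rates = (pop0_next - (1.0 - infmort_rate) * newborns) / pop0_now
    return rates.mean()
```

With hypothetical inputs pop0_now = [1000, 1000, 1000] and pop0_next = [1010, 1005, 1015], actual births of 1000 per year give a rate of 1%, while births understated at 950 per year give 6%: the whole shortfall in births is read as immigration.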
4. The over 90s:
The over 90s data is not consistent. The standard methodology suggests that imm_rates for some ages over 90 rise above 5%, hitting 19% for age 99. This is vanishingly unlikely to be accurate. The overall population and mortality numbers are not consistent with each other (even when one downloads the full data for all years). Any errors in the data are amplified by the small denominators, e.g. there are fewer than 10,000 people aged 99.
To fix this, I have replaced the over 90s values with the average value for ages 80 to 89. This allows for some continued migration of the over 90s, but by using the data for ages 80 to 89, the errors are smoothed out and the denominators used for the calculation are much larger.
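A minimal sketch of this replacement, assuming imm_rates is a NumPy array in which index i corresponds to age i. The function and parameter names are hypothetical.

```python
import numpy as np


def replace_oldest_ages(imm_rates, start_age=90, ref_lo=80, ref_hi=89):
    """Overwrite rates for ages >= start_age with the mean over ref_lo..ref_hi."""
    out = np.array(imm_rates, dtype=float)  # copy, so the input is left untouched
    out[start_age:] = out[ref_lo:ref_hi + 1].mean()
    return out
```

The mean here is an unweighted average of the ten rates for ages 80 to 89. Weighting by population would be another option, but the text above does not say which was used.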
5. Moving average smoothing:
The above adjustments lead to much improved imm_rates. However, there are still a number of spikes in the data, which are unlikely to contain real long-term information. Therefore, a simple three-year moving average is applied.
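One way to implement such a smoother is sketched below. It assumes the average is taken over adjacent ages, and it pads the ends by repeating the edge values so that the first and last rates are not pulled towards zero. Both choices are assumptions, since the treatment of the end points is not described here.

```python
import numpy as np


def moving_average_3(x):
    """Centred three-point moving average with edge-value padding (sketch)."""
    x = np.asarray(x, dtype=float)
    padded = np.pad(x, 1, mode="edge")  # repeat first and last values once
    kernel = np.ones(3) / 3.0
    # mode="valid" returns an array of the same length as x after padding by 1.
    return np.convolve(padded, kernel, mode="valid")
```

A plain np.convolve(x, kernel, mode="same") would also return an array of the right length, but it pads with zeros, which understates the rates at age 0 and at the oldest age.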
@jdebacker @rickecon