Skip to content

Commit

Permalink
fix: Make corrections to SPI validation article and add to docs table…
Browse files Browse the repository at this point in the history
… of contents
  • Loading branch information
anth-volk committed Aug 7, 2024
1 parent b1b3128 commit c9c723d
Show file tree
Hide file tree
Showing 2 changed files with 12 additions and 14 deletions.
1 change: 1 addition & 0 deletions docs/book/_toc.yml
Original file line number Diff line number Diff line change
Expand Up @@ -16,6 +16,7 @@ parts:
- file: programs/gov/dcms/bbc/tv-licence
- file: programs/gov/revenue_scotland/land-and-buildings-transaction-tax
- file: programs/gov/wra/land-transaction-tax
- file: model/spi-validation
- caption: Using the model
chapters:
- file: usage/getting-started
Original file line number Diff line number Diff line change
Expand Up @@ -6,15 +6,15 @@
"source": [
"# Validation against the Survey of Personal Incomes\n",
"\n",
"To ensure accuracy and precision, PolicyEngine's free and open-source simulation and microsimulation model is periodically validated against existing external datasets. This process involves taking external datasets meant to represent a particular slice of British society and loading their inputs, such as household net income, into our data model. We then compare our calculated outputs with theirs in an effort to find discrepancies between the two. In this post, we'll describe our efforts to validate our model against the newly released Survey of Personal Incomes (SPI) 2020/21 dataset.\n",
"To ensure accuracy and precision, PolicyEngine's free and open-source simulation and microsimulation model is periodically validated against existing external datasets. This process involves taking external datasets meant to represent a particular slice of British society and loading their inputs, such as total individual employment income, into our data model. We then compare our calculated outputs with theirs in an effort to find discrepancies between the two. In this post, we'll describe our efforts to validate our model against the newly released Survey of Personal Incomes (SPI) 2020/21 dataset.\n",
"\n",
"## The UK Survey of Personal Incomes\n",
"\n",
"The Survey of Personal Incomes (SPI) is a comprehensive dataset produced by HM Revenue and Customs (HMRC). It provides detailed information on individuals' income, tax, and other financial data based on a sample of HMRC records. The SPI is among the most comprehensive UK individual-level financial datasets, and is used by HMRC, HM Treasury, and individual Members of Parliament, among others.[^1]\n",
"\n",
"## Validation Results\n",
"## Validation results\n",
"\n",
"We chose to use total individual income tax as a yardstick, using the SPI's input variables to compare PolicyEngine UK's calculated total income tax against the SPI's reported amount. \n",
"We chose to validate PolicyEngine against the SPI's total individual income, using the SPI's input variables to compare PolicyEngine UK's calculated total income tax against the SPI's reported amount. \n",
"\n",
"To do so, we first remove the SPI's \"composite records,\" which are used to represent smaller population groups as a collective. This group of records is only meant to be employed within society-wide modelling, and its outputs cannot be correctly computed using its inputs. This group of records contains 1,770 observations, representing around 0.2% of total records.\n",
"\n",
Expand All @@ -32,8 +32,8 @@
"| Total error (£) | Percentage of all records |\n",
"|-----------------|---------------------------|\n",
"| <=10 | 92.7% |\n",
"| <100 | 0.8% |\n",
"| <1,000 | 4.8% |\n",
"| <100 | 93.5% |\n",
"| <1,000 | 98.3% |\n",
"| >=1,000 | 1.7% |"
]
},
Expand All @@ -43,25 +43,22 @@
"source": [
"For 92.7% of individuals included within the SPI, PolicyEngine UK's calculated total income tax is within £10 of the SPI's calculated amount. We view this as a robust demonstration of PolicyEngine's accuracy and precision.\n",
"\n",
"## Analysing Model Discrepancies\n",
"## Analysing model discrepancies\n",
"\n",
"For the remaining records where our calculations do not closely match the SPI, we believe the differences are attributable to two main factors:\n",
"\n",
"1. Differences in input information: The SPI sometimes uses additional input data that is not available to PolicyEngine, leading to differences in calculations.\n",
"2. Structural differences: At times, the SPI model's structure differs from that of PolicyEngine in a way that complicates further model validation.\n",
"\n",
"Five relevant error clusters appeared among the records where the SPI and PolicyEngine calculate differing outputs:\n",
"Four relevant error clusters appeared among the records where the SPI and PolicyEngine calculate differing outputs:\n",
"\n",
"1. Cases where an individual appears to transfer a portion of their Personal Allowance to their spouse as part of the Marriage Allowance. This is because the SPI removes the transferred amount from the individual's Personal Allowance, but does not indicate that the subsequent lower amount is driven by the Marriage Allowance. This may represent as much as 37,998 records (4.8% of total).\n",
"1. Cases where an individual receives less than £100,000 in annual income and is non-resident for taxation purposes [IS THIS CORRECT? VERIFY VIA SAS], thereby receiving £0 Personal Allowance, even though our model would project a full £12,500 allowance. The SPI uses a variable that is not publicly available to calculate this status. This represents 2,054 records (0.3% of total).\n",
"1. Cases where both the SPI and PolicyEngine calculate the same amount of taxable savings income for an individual receiving any form of savings income classified by the SPI as \"Other Investment Income\" (e.g., interest on securities), but the SPI finds that they owe around £1,000 more in income tax than we do. [REASON IS?] This represents 1,276 records (0.2% of total).\n",
"1. Cases where an individual near the Personal Allowance taper value who claims Gift Aid receives a full Personal Allowance of £12,500 within the SPI model. [REASON IS?] This represents 738 records (0.1% of total).\n",
"\n",
"[DO WE WANT A GRAPH HERE?]\n",
"\n",
"1. Cases where an individual receives less than £100,000 in annual income and is non-resident for taxation purposes. In this situation, the individual receives £0 Personal Allowance, even though our model would project a full £12,500 allowance. The SPI uses a variable that is not publicly available to calculate this status. This represents 2,054 records (0.3% of total).\n",
"1. Cases where both the SPI and PolicyEngine calculate the same amount of taxable savings income for an individual receiving any form of savings income classified by the SPI as \"Other Investment Income\" (e.g., interest on securities), but the SPI finds that they owe around £1,000 more in income tax than we do. This represents 1,276 records (0.2% of total).\n",
"1. Cases where an individual near the Personal Allowance taper value who claims Gift Aid receives a full Personal Allowance of £12,500 within the SPI model. The SPI model's Gift Aid variable signifies the amount an individual deducts due to Gift Aid donations, while PolicyEngine's represents an individual's total Gift Aid-eligible donations. We then calculate the indivdual's deduction amount from this value. This error represents 738 records (0.1% of total).\n",
"\n",
"## Conclusion\n",
"PolicyEngine's UK model shows a high level of accordance with the SPI's 2020/21 data, with almost 93% of records with the same inputs returning the same total income tax calculation. While some discrepancies exist, they are limited in scope and often attributable to either testability limitations within the SPI's structure or to inputs that the SPI has not made publicly available. This validation exercise builds upon our previous validations [LINK HERE? SOMETHING ABOUT NEVER-ENDING IMPROVEMENTS TO THE MODEL?]. We welcome questions and comments on our findings or methodology - please feel free to [get in touch](mailto:[email protected])."
"PolicyEngine's UK model shows a high level of accordance with the SPI's 2020/21 data, with almost 93% of records with the same inputs returning the same total income tax calculation. While some discrepancies exist, they are limited in scope and often attributable to either testability limitations within the SPI's structure or to inputs that the SPI has not made publicly available. This validation exercise builds upon our previous validations, such as our [Autumn 2023 model calibration](https://policyengine.org/uk/research/uk-calibration-2023-q4). We welcome questions and comments on our findings or methodology - please feel free to [get in touch](mailto:[email protected])."
]
},
{
Expand Down

0 comments on commit c9c723d

Please sign in to comment.