Fiscal data record identifiers #7

Open
2 tasks
pwalsh opened this issue Sep 7, 2017 · 15 comments

Comments

@pwalsh
Member

pwalsh commented Sep 7, 2017

Description

There has long been discussion around fiscal data record identifiers. Recently, we've agreed with GIFT that we can add a transaction identifier concept to potentially support linkage with Open Contracting. Some discussion here, and then in a very recent telco, reveals that the original request there for transaction identifiers may be misleading, as what is required is budget record identifiers.

And then, there is the general fact that a budget record is actually a number of measures and dimensions, each of which potentially needs unique identifiers.

We want to add identifiers as a concept, for linkage.

Tasks

  • @jpmckinney can you please summarise the requirements succinctly from the OCDS side - I think it will really help. Some example data that demonstrates would be ideal
  • Discuss ways to have identifiers for a record, and whether this is the "same implementation" as, for example, ids for dimensions on a record
@jpmckinney

jpmckinney commented Sep 7, 2017

Modeling

In terms of modelling, the Open Contracting Data Standard models a contracting process (from planning through to implementation). It includes some budget information/links, because it is important for many use cases to reconcile contracting processes with their budgetary funding. Because budgetary information is distinct from contracting information, and because even the most granular section of a budget can fund multiple contracting processes, it makes sense to model the budgetary information outside OCDS, and to simply link to it from relevant parts of OCDS.

With respect to transactions, on the other hand, OCDS models these, because a transaction relating to a contracting process is smaller (in an information-hierarchical sense) than the contracting process it relates to, and it is straightforward to model such transactions as part of OCDS. In other words, whether or not OFDP offers identifiers for transactions makes no difference to the use cases for OCDS, because OCDS can model transactions directly, independent of whatever choices OFDP makes.

Serialization

OCDS is serialized as JSON. Its fields (properties) for planning information are under planning in its release schema. Under planning is budget, which (among others) has these two fields that are relevant to the present discussion:

  • id: "An identifier for the budget line item which provides funds for this contracting process. This identifier should be possible to cross-reference against the provided data source."
  • uri: "A URI pointing directly to a machine-readable record about the budget line-item or line-items that fund this contracting process. Information may be provided in a range of formats, including using IATI, the Open Fiscal Data Standard or any other standard which provides structured data on budget sources. Human readable documents can be included using the planning.documents block."
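
As a rough illustration of the two fields quoted above, here is a minimal sketch of the `planning.budget` part of an OCDS release, written as a Python dictionary and dumped as JSON. The `id` and `uri` values are hypothetical placeholders, and `budget_link` is an illustrative helper, not part of any OCDS tooling.

```python
import json

# Hypothetical values: this id and uri are placeholders, not real records.
release_fragment = {
    "planning": {
        "budget": {
            # Identifier of the budget line item that funds this process.
            "id": "example-budget-line-001",
            # Machine-readable record about that budget line item.
            "uri": "https://example.org/budget/2017/example-budget-line-001",
        }
    }
}


def budget_link(release):
    """Return (id, uri) of the funding budget line, or (None, None) if absent."""
    budget = release.get("planning", {}).get("budget", {})
    return budget.get("id"), budget.get("uri")


print(json.dumps(release_fragment, indent=2))
print(budget_link(release_fragment))
```

The point of the sketch is only that the link is carried by two plain values, so whatever identifier the budget dataset exposes has to be something that can be written into `id` and resolved through `uri`.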

Source data

The Global Open Data Index offers links to many budgetary datasets, any of which can serve as an example of source data. Budgetary data is commonly organized as a hierarchy of programs, subprograms, projects, etc. (with terminology and depth of hierarchy varying across governments), and is commonly serialized in a tabular format. Each row within such tables is what is meant by a 'budget line item' in OCDS (though feel free to refine that definition based on reputable sources).

How budget line items from such source data are mapped into OFDP as measures or dimensions is not something I fully understand, though I would happily read a worked example that clarifies how this works. In fact, such documentation may significantly help users of OFDP and OCDS with how to model and link the two datasets.

As a quick example, this URL is an identifier for a budget line item at the level of 'proyecto' (which may not be the most granular level). It's about equitable social development relating to agriculture, so you can imagine a contracting process that awards funds to a social development agency to support minority-owned, women-owned and emerging small agricultural businesses. The use case is that, starting with the contracting process, you want to see if it was funded by a sensible budget line item (in this case it was).

@jpmckinney

Tagging @juanpane @transpresupuestaria @timgdavies as this relates to prior conversations about linking to budget data from OCDS.

@pwalsh
Member Author

pwalsh commented Sep 8, 2017

The Global Open Data Index offers links to many budgetary datasets, any of which can serve as an example of source data

@jpmckinney I guess you know that I am quite familiar with GODI and budget data in general. I'm simply asking for some example data to demonstrate the possibility of linkage between contracts and budgets; I'm not asking for examples of budget data, of which I literally have thousands.

@jpmckinney

jpmckinney commented Sep 8, 2017

@pwalsh What I write is a reflection of my understanding, so that others may correct any confusion or misunderstanding. What I write is not some indirect commentary on your understanding.

I provide a fictitious but realistic prose example of linkage between contracts and budgets further down my last comment. I can describe it as JSON if you want it in a data format. If you want real (not just realistic) data, I can ask the people I tagged (who work directly with publishers) for a quick example.

@pwalsh
Member Author

pwalsh commented Sep 8, 2017

@jpmckinney ok, thanks.

It seems there has really been confusion about what was originally requested here, as we discovered on our call yesterday. I'm just trying to get it clear, and I am glad we have the chance to do so together.

This confusion still exists even with, for example, the discussions from December 2016 (ref. ref.), emails I have sent via GIFT about it in the last month, and right up until our call yesterday.

So, bear with me, but I am just trying to make sure we all know what we want here.

To be clear, repeating what I've said before:

  • Adding record identifiers to FDP (I'm avoiding usage of "transaction identifier" as it was a major source of confusion) is easy. I 100% agree that doing so is good and useful as part of the use of standards for policy change.
  • We didn't have such a thing as a first class concept in FDP v0.3, as, in the absence of source data having an explicit unique identifier field (and in the context of budgets, I have never seen this in raw source data), we need to make one as a composite key, synthesize one, etc. I discussed my previous learnings on this here
  • In the last 1.5 years, working with thousands of datasets (well beyond our work with GIFT), and even with Fiscal Data Package and OpenSpending as part of the backbone of a major linked data project, we've not really encountered a blocking issue with not having record identifiers as an explicit part of Fiscal Data Package.
  • Which brings us to now, and the only use case we have for adding this is for linkage with the Open Contracting Data Standard. I think this is an exciting and compelling use case; I just honestly would love to see some real world data that would potentially be "unlocked" by us doing this :). So, it would be great to see some data, but in any event, we are also adding this to the spec.

@timgdavies

As a general point in standard development: We hadn't seen a field called 'contracting process identifier' in source data when we started designing OCDS, but it is conceptually key to meeting user needs of tracing the full contracting process. Publishers generally have little trouble using fields inside their systems to express this latent concept. GIFT is a normative initiative, and so has a normative role in working out what a modern joined-up data infrastructure for public spending should look like, and providing the frameworks for people to get to there from here.

In terms of real world use-cases and data (and the acknowledged need to get a clear conceptual understanding), this thread open-contracting/standard#483 might prove useful - and includes data.

This slide deck prepared following meetings in December with GIFT and OKF on the fringes of the OGP may also provide useful notes on use-cases, and the conceptual relationships.

I'm curious about the idea that an FDP record represents both multiple dimensions and measures. Generally in normalised data, I would anticipate one measure with multiple dimensions. If we can unpack the degree of normalisation of a common budget record, I suspect that will really help us in identifying how to unambiguously make the budget-contract-spending linkages work.

@timgdavies

Can you say more, Paul, about the confusion on the transaction identifier concept? Was this due to it being used in the context of budgets?

For spending from government systems, it seems to me that this should not be too tricky a concept - and many systems do have an internal or external identifier for specific spending transactions.

I understand that transaction does not apply to budgets.

@pwalsh
Member Author

pwalsh commented Sep 8, 2017

I'm curious about the idea that an FDP record represents both multiple dimensions and measures.

I'm using "record identifiers" here to get away from the aforementioned "transaction identifiers". While a single line in source data can have multiple measures, and FDP provides "mark up" for this, such lines likely produce multiple records. We'll need to get into the details of the correct semantics, and I expect @akariv will lead on that (looks like we already see that my use of "record identifiers" is potentially misleading).
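
To make the "one source line, several records" point concrete, here is a minimal sketch under stated assumptions: the column names (`year`, `department`, `approved`, `executed`, `paid`) and all values are hypothetical, and `split_measures` is an illustrative helper rather than anything defined by FDP.

```python
# Hypothetical measure columns in a wide source row.
MEASURES = ["approved", "executed", "paid"]


def split_measures(row, measures=MEASURES):
    """Turn one wide source row into one record per reported measure."""
    # Everything that is not a measure is treated as a dimension.
    dimensions = {k: v for k, v in row.items() if k not in measures}
    records = []
    for name in measures:
        value = row.get(name)
        if value is None:
            continue  # nothing reported for this measure
        records.append({**dimensions, "measure": name, "value": value})
    return records


# Hypothetical source line with three measure columns, one of them empty.
source_row = {
    "year": 2017,
    "department": "7",
    "approved": 1000.0,
    "executed": 400.0,
    "paid": None,
}

for record in split_measures(source_row):
    print(record)
```

In this sketch the dimension values alone no longer identify a record once the row is split, so an identifier would need to cover either the source line or the line plus the measure name. Which of the two is meant is exactly the semantic question raised here.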

@pwalsh
Member Author

pwalsh commented Sep 8, 2017

Can you say more, Paul, about the confusion on the transaction identifier concept? Was this due to it being used in the context of budgets?

Correct.

For spending from government systems, it seems to me that this should not be too tricky a concept - and many systems do have an internal or external identifier for specific spending transactions.

Definitely. Still, I've seen some examples from published UK25k spend data that tripped us up (transaction IDs not unique, no other unique identifier provided outside of the internal system, which, clearly, must have one). I'll try to dig up those examples, but handling such examples is less an issue for the standard and more for implementations.
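
For the implementation side, a check along the following lines could flag published identifiers that are not actually unique. This is only a sketch: the field name `transaction_id` and the sample rows are hypothetical, not taken from the UK data mentioned above.

```python
from collections import Counter


def duplicate_ids(rows, id_field):
    """Return each identifier value that occurs more than once, with its count."""
    counts = Counter(row[id_field] for row in rows)
    return {value: n for value, n in counts.items() if n > 1}


# Hypothetical spend rows in which one transaction id is reused.
rows = [
    {"transaction_id": "T-1", "amount": 25000},
    {"transaction_id": "T-2", "amount": 31000},
    {"transaction_id": "T-1", "amount": 47500},
]

print(duplicate_ids(rows, "transaction_id"))  # {'T-1': 2}
```

An empty result would suggest the field is usable as a record identifier for that file; a non-empty one means the implementation has to fall back to a composite or synthesized key.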

@jpmckinney

jpmckinney commented Sep 10, 2017

I wrote some things offline while on a flight, so please forgive any repetition of things Tim has already written.

would love to see some real world data that would potentially be "unlocked" by us doing this

I believe, in the general case, in order for third parties to link to records contained within an OFDP dataset, those records will need identifiers. OCDS may be a first use case, but it seems like anything that needs to refer to budget line items would benefit from an identifier.

There may not be much data to refer to (though Tim offered some links and others may as well), because identifiers are commonly absent from source data, which makes links between datasets hard to establish. A more accessible example would be from an investigative journalist manually making the links between datasets in the absence of identifiers in order to document misspending. Or, as Juan presented on his screen, a government makes the links in an internal operational system, in order to track spending against the budget line item (very common); so, ostensibly, all contracting data would be 'unlocked' for such a use case, if I understand your sense. A common use case for these identifiers within civil society would be to repeat or check that work of the government to monitor and hold it accountable.

Also, for context with regard to linking: before OCDS, most governments – if they published contracting datasets at all – published one dataset per stage (e.g. planning, tender, award, contract, implementation) without linking the datasets or using identifiers for others to do so. Linking data across datasets is sadly still fairly new, but hopefully this issue will be a step towards making it more common.

I'm satisfied with the identifier being optional, by the way (relating to some of the linked prior issues or conversations).

@LindseyAM

Sharing here a couple of use cases I have heard from folks in different countries.

  1. "We want to be able to check that budget is indeed available before funds are committed (aka award a contract)" - in places where contracts are awarded when funds are not actually available, the result erodes private sector confidence (as contractors will perform work and remain unpaid or under paid for extended lengths of time). Transparently showing that budget is available (through a link between the contracting process and the budget) can help to improve this trust and efficiency.

  2. "We want to be able to check that the budget lines are being used for their intended purpose" - similarly, it may be that budgets are being used to pay for contracts unrelated to their intended purpose. A link between budgets and contracts can help folks to check that the budget is being executed properly.

Hope this is helpful.
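
The first use case above can be sketched as a simple join on a shared budget line identifier. Everything in this example is hypothetical (the ids, the amounts, and the field names `budget_id` and `value`); it only shows the kind of check that becomes possible once contracts carry a budget identifier.

```python
from collections import defaultdict


def committed_by_budget_line(contracts):
    """Sum contract values per linked budget line identifier."""
    totals = defaultdict(float)
    for contract in contracts:
        totals[contract["budget_id"]] += contract["value"]
    return dict(totals)


def overcommitted(approved, contracts):
    """Return {budget_id: excess} where linked contracts exceed the approved amount."""
    totals = committed_by_budget_line(contracts)
    return {
        budget_id: total - approved[budget_id]
        for budget_id, total in totals.items()
        if budget_id in approved and total > approved[budget_id]
    }


# Hypothetical approved amounts per budget line and contracts linked to them.
approved = {"line-A": 1000000, "line-B": 500000}
contracts = [
    {"id": "c-1", "budget_id": "line-A", "value": 600000},
    {"id": "c-2", "budget_id": "line-A", "value": 500000},
    {"id": "c-3", "budget_id": "line-B", "value": 200000},
]

print(overcommitted(approved, contracts))
```

The second use case (checking intended purpose) needs the same join, followed by a human or rule-based comparison of the contract description against the budget line's classification.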

@akariv
Member

akariv commented Sep 28, 2017

So, would a URI like this work for fetching the information from the Mexican Federal Budget for (Year=2017, MODALIDAD="A", PP="17", RAMO="7", CAPITULO="3000", CONCEPTO="3700")?

https://openspending.org/api/3/cubes/6018ab87076187018fc29c94a68a3cd2:presupuesto-mexico-2008-20164t-2017/facts/?cut=date_2.CICLO:2017|activity_ID_MODALIDAD.ID_MODALIDAD:"A"|activity_ID_PP.ID_PP:"17"|administrative_classification_2.ID_RAMO:"7"|economic_classification_ID.ID_CAPITULO:"3000"|economic_classification_ID_2.ID_CONCEPTO:"3700"

This is what it returns:

{
  "total_fact_count": 1,
  "data": [
    {
      "expenditure_type_2.ID_TIPOGASTO": "1",
      "expenditure_type_2.DESC_TIPOGASTO": "Gasto corriente",
      "functional_classification_GPO.GPO_FUNCIONAL": "1",
      "functional_classification_GPO.DESC_GPO_FUNCIONAL": "Gobierno",
      "economic_classification_ID_4.ID_PARTIDA_ESPECIFICA": "",
      "economic_classification_ID_4.DESC_PARTIDA_ESPECIFICA": "",
      "economic_classification_ID_3.ID_PARTIDA_GENERICA": "",
      "economic_classification_ID_3.DESC_PARTIDA_GENERICA": "",
      "functional_classification_ID_2.ID_SUBFUNCION": "4",
      "functional_classification_ID_2.DESC_SUBFUNCION": "Derechos Humanos",
      "activity_ID_PP.ID_PP": "17",
      "activity_ID_PP.DESC_PP": "Derechos humanos",
      "date_2.CICLO": 2017,
      "budget_line_id_2.ID_CLAVE_CARTERA": "0",
      "activity_ID_MODALIDAD.ID_MODALIDAD": "A",
      "activity_ID_MODALIDAD.DESC_MODALIDAD": "Funciones de las Fuerzas Armadas",
      "economic_classification_ID.ID_CAPITULO": "3000",
      "economic_classification_ID.DESC_CAPITULO": "Servicios generales",
      "functional_classification_ID_3.ID_AI": "3",
      "functional_classification_ID_3.DESC_AI": "Defensa de la integridad, la independencia, la soberanía del territorio nacional y la seguridad interior",
      "fin_source_2.ID_FF": "1",
      "fin_source_2.DESC_FF": "Recursos fiscales",
      "economic_classification_ID_2.ID_CONCEPTO": "3700",
      "economic_classification_ID_2.DESC_CONCEPTO": "Servicios de traslado y viáticos",
      "functional_classification_ID.ID_FUNCION": "2",
      "functional_classification_ID.DESC_FUNCION": "Justicia",
      "administrative_classification_3.ID_UR": "139",
      "administrative_classification_3.DESC_UR": "Dirección General de Derechos Humanos",
      "geo_source_2.ID_ENTIDAD_FEDERATIVA": "9",
      "geo_source_2.ENTIDAD_FEDERATIVA": "Ciudad de México",
      "administrative_classification_2.ID_RAMO": "7",
      "administrative_classification_2.DESC_RAMO": "Defensa Nacional",
      "MONTO_EJERCIDO": null,
      "MONTO_EJERCICIO": null,
      "MONTO_ADEFAS": null,
      "MONTO_PAGADO": null,
      "MONTO_MODIFICADO": null,
      "MONTO_APROBADO": 9860000.0,
      "MONTO_DEVENGADO": null
    }
  ],
  "cell": [
    {
      "ref": "date_2.CICLO",
      "operator": ":",
      "value": [
        2017
      ]
    },
    {
      "ref": "activity_ID_MODALIDAD.ID_MODALIDAD",
      "operator": ":",
      "value": [
        "A"
      ]
    },
    {
      "ref": "activity_ID_PP.ID_PP",
      "operator": ":",
      "value": [
        "17"
      ]
    },
    {
      "ref": "administrative_classification_2.ID_RAMO",
      "operator": ":",
      "value": [
        "7"
      ]
    },
    {
      "ref": "economic_classification_ID.ID_CAPITULO",
      "operator": ":",
      "value": [
        "3000"
      ]
    },
    {
      "ref": "economic_classification_ID_2.ID_CONCEPTO",
      "operator": ":",
      "value": [
        "3700"
      ]
    }
  ],
  "fields": [
    "expenditure_type_2.ID_TIPOGASTO",
    "expenditure_type_2.DESC_TIPOGASTO",
    "functional_classification_GPO.GPO_FUNCIONAL",
    "functional_classification_GPO.DESC_GPO_FUNCIONAL",
    "economic_classification_ID_4.ID_PARTIDA_ESPECIFICA",
    "economic_classification_ID_4.DESC_PARTIDA_ESPECIFICA",
    "economic_classification_ID_3.ID_PARTIDA_GENERICA",
    "economic_classification_ID_3.DESC_PARTIDA_GENERICA",
    "functional_classification_ID_2.ID_SUBFUNCION",
    "functional_classification_ID_2.DESC_SUBFUNCION",
    "activity_ID_PP.ID_PP",
    "activity_ID_PP.DESC_PP",
    "date_2.CICLO",
    "budget_line_id_2.ID_CLAVE_CARTERA",
    "activity_ID_MODALIDAD.ID_MODALIDAD",
    "activity_ID_MODALIDAD.DESC_MODALIDAD",
    "economic_classification_ID.ID_CAPITULO",
    "economic_classification_ID.DESC_CAPITULO",
    "functional_classification_ID_3.ID_AI",
    "functional_classification_ID_3.DESC_AI",
    "fin_source_2.ID_FF",
    "fin_source_2.DESC_FF",
    "economic_classification_ID_2.ID_CONCEPTO",
    "economic_classification_ID_2.DESC_CONCEPTO",
    "functional_classification_ID.ID_FUNCION",
    "functional_classification_ID.DESC_FUNCION",
    "administrative_classification_3.ID_UR",
    "administrative_classification_3.DESC_UR",
    "geo_source_2.ID_ENTIDAD_FEDERATIVA",
    "geo_source_2.ENTIDAD_FEDERATIVA",
    "administrative_classification_2.ID_RAMO",
    "administrative_classification_2.DESC_RAMO",
    "MONTO_EJERCIDO",
    "MONTO_EJERCICIO",
    "MONTO_ADEFAS",
    "MONTO_PAGADO",
    "MONTO_MODIFICADO",
    "MONTO_APROBADO",
    "MONTO_DEVENGADO"
  ],
  "order": [],
  "page": 1,
  "page_size": 20,
  "status": "ok"
}
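
The query URL above can be assembled from the dimension/value pairs it encodes. The following is a minimal sketch, assuming (from the example URL only, not from API documentation) that cuts are joined with `|`, that each cut is `ref:value`, and that string values are wrapped in double quotes while numbers are left bare. `facts_url` and `format_cut_value` are hypothetical helper names.

```python
API_BASE = "https://openspending.org/api/3/cubes"


def format_cut_value(value):
    """Wrap strings in double quotes; leave numbers bare, as in the URL above."""
    return f'"{value}"' if isinstance(value, str) else str(value)


def facts_url(cube, cuts):
    """Build a facts URL whose cut parameter selects the given dimension values."""
    cut = "|".join(f"{ref}:{format_cut_value(v)}" for ref, v in cuts.items())
    return f"{API_BASE}/{cube}/facts/?cut={cut}"


cube = "6018ab87076187018fc29c94a68a3cd2:presupuesto-mexico-2008-20164t-2017"
cuts = {
    "date_2.CICLO": 2017,
    "activity_ID_MODALIDAD.ID_MODALIDAD": "A",
    "activity_ID_PP.ID_PP": "17",
    "administrative_classification_2.ID_RAMO": "7",
    "economic_classification_ID.ID_CAPITULO": "3000",
    "economic_classification_ID_2.ID_CONCEPTO": "3700",
}

print(facts_url(cube, cuts))
```

The sketch reproduces the URL exactly as written in this comment, without percent-encoding; a strict HTTP client may need the `|` and `"` characters encoded. Note that the resulting URI identifies a budget line only as long as the chosen cuts match exactly one fact (`total_fact_count` of 1 in the response above).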

@pwalsh
Member Author

pwalsh commented Nov 29, 2017

@akariv

Can you:

  1. Add here a brief description, and a sample snippet, of the new syntax/handling being proposed for v1, specifically in regards to identifiers.
  2. Additionally, link to the current text of the v1 spec in whole.

@akariv
Member

akariv commented Dec 13, 2017

This is the current FDP draft: https://hackmd.io/BwNgpgrCDGDsBMBaAhtALARkWsPEE5posR8RxgAzffWfDIA=?view

Generally speaking, the new fiscal data package is a tabular data package. As such, it holds one or more data tables, each with a schema and a defined primaryKey.

As suggested above, a mapping of {k => v for k in primaryKey} could be used as a unique row identifier, which is also somewhat resilient to some schema changes (e.g. adding or removing columns).
The exact means of encoding (i.e. should it be JSON? query parameters? base64? etc.) could be left to the implementors, I think.
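
One possible reading of this proposal, as a sketch only: take the primaryKey fields of a row, serialize them as canonical JSON, and wrap that in URL-safe base64 to get an opaque token. The field names in `primary_key` and the sample row are hypothetical, and the JSON-plus-base64 encoding is just one of the options listed above, not something the draft specifies.

```python
import base64
import json


def row_key(row, primary_key):
    """Pick the primaryKey fields out of a row, i.e. {k: row[k] for k in primaryKey}."""
    return {k: row[k] for k in primary_key}


def encode_key(key):
    """Serialize a key mapping to a URL-safe token (canonical JSON, then base64)."""
    canonical = json.dumps(key, sort_keys=True, separators=(",", ":"))
    return base64.urlsafe_b64encode(canonical.encode("utf-8")).decode("ascii")


def decode_key(token):
    """Recover the key mapping from a token produced by encode_key."""
    return json.loads(base64.urlsafe_b64decode(token.encode("ascii")))


# Hypothetical primaryKey and row; the amount is not part of the key.
primary_key = ["CICLO", "ID_RAMO", "ID_UR"]
row = {"CICLO": 2017, "ID_RAMO": "7", "ID_UR": "139", "MONTO_APROBADO": 9860000.0}

token = encode_key(row_key(row, primary_key))
print(token)
print(decode_key(token))
```

Because only key fields enter the token, adding or dropping non-key columns leaves the identifier unchanged, which is the resilience mentioned above. Sorting the keys before serializing keeps the token stable regardless of column order.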

@pwalsh
Member Author

pwalsh commented Jan 8, 2018

@akariv you might want to look at this extensive discussion open-contracting/standard#483

@roll roll unassigned akariv Jan 3, 2024
@roll roll transferred this issue from frictionlessdata/datapackage Jan 5, 2024