The OpenBudgets.eu data model

Jindřich Mynarz of University of Economics, Prague

Nov 11, 2016

Driven by the use cases of the OpenBudgets.eu project, we designed a data model for fiscal data published by the public sector, covering both planned budgets and executed transactions. It is expressed in RDF, primarily in terms of the Data Cube Vocabulary (DCV), a W3C standard, and is conceived as a set of reusable component properties that can be composed into dataset-specific data structures. The data model is described in its RDF representation, which is accompanied by a data modelling guide and reference.

The design of the data model was informed by a survey of existing vocabularies, interviews with domain experts and prospective users, and a year of use in the OpenBudgets.eu project, in which it was applied to more than 200 datasets ranging from the EU level to the municipal level. With all this input incorporated, the data model can be considered stable.

The OpenBudgets.eu data model is based on the Data Cube Vocabulary. DCV is a vocabulary for describing multidimensional statistical data. It organizes measures, optionally qualified by attributes, in logical spaces coordinated by dimensions. Fiscal data typically consists of monetary amounts indexed by values of various dimensions, such as the fiscal year or the funded project. Amounts form the measures coordinated by dimensions in fiscal data cubes.
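
To make the cube metaphor concrete, here is a minimal sketch in plain Python rather than RDF. It treats each observation as an amount (the measure) qualified by a currency (an attribute) and located by dimension values. The class and function names, and all sample figures, are invented for illustration and are not part of the data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    dimensions: tuple  # sorted (dimension, value) pairs locating the observation
    amount: float      # the measure
    currency: str      # an attribute qualifying the measure


def make_observation(amount, currency, **dimensions):
    """Build an observation from a measure, an attribute, and dimension values."""
    return Observation(tuple(sorted(dimensions.items())), amount, currency)


# Invented sample data: three cells of a small fiscal data cube.
cube = [
    make_observation(1200000.0, "EUR", fiscalYear=2015, project="road-repairs"),
    make_observation(800000.0, "EUR", fiscalYear=2016, project="road-repairs"),
    make_observation(300000.0, "EUR", fiscalYear=2016, project="library"),
]


def slice_total(cube, **fixed):
    """Sum the amounts of observations whose dimensions match all fixed values."""
    return sum(
        o.amount
        for o in cube
        if all(dict(o.dimensions).get(k) == v for k, v in fixed.items())
    )


total_2016 = slice_total(cube, fiscalYear=2016)
total_roads = slice_total(cube, project="road-repairs")
```

Fixing one dimension and summing over the rest corresponds to taking a slice of the cube, which is the kind of operation the dimensional structure is meant to support.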

Generic data cube and an example OpenBudgets.eu data cube


The world of fiscal datasets is a diverse one. There are hardly any fiscal datasets conforming to the same structure, though many fiscal datasets share common elements. A fixed data schema would therefore find little reuse. Instead, we decided to adopt component properties as the basic units of reuse. Component properties represent dimensions, attributes, or measures (i.e. instances of qb:ComponentProperty in DCV). Using the OpenBudgets.eu data model, data structure definitions of specific datasets can be produced by cherry-picking component properties. The core component properties can be either reused directly or extended via subproperties, such as in the case of a local classification. Values of component properties can be constrained by code lists that enumerate their allowed values. The OpenBudgets.eu data model defines several core code lists, such as the one for budget phases.
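
The composition described above can be sketched as follows. This is a hedged illustration, not the project's tooling: it assembles a dataset-specific data structure definition from picked component properties, adds a local subproperty with its own code list, and prints the result as Turtle. The `qb:` and `rdfs:` namespaces are the standard ones; the `obeu-dimension:` and `obeu-measure:` namespace URIs, the `ex:` resources, and the helper functions are assumptions made for this example.

```python
# Only qb: and rdfs: are standard; the other namespace URIs are assumptions.
PREFIXES = {
    "qb": "http://purl.org/linked-data/cube#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "obeu-dimension": "http://data.openbudgets.eu/ontology/dsd/dimension/",
    "obeu-measure": "http://data.openbudgets.eu/ontology/dsd/measure/",
    "ex": "http://example.org/budget/",
}


def data_structure_definition(dsd, dimensions, measures):
    """Return triples attaching the picked component properties to a DSD."""
    triples = [(dsd, "a", "qb:DataStructureDefinition")]
    for kind, props in (("qb:dimension", dimensions), ("qb:measure", measures)):
        for prop in props:
            # Each component specification is written as a Turtle blank node.
            triples.append((dsd, "qb:component", f"[ {kind} {prop} ]"))
    return triples


def to_turtle(triples):
    """Serialize (subject, predicate, object) strings as a Turtle document."""
    header = [f"@prefix {p}: <{uri}> ." for p, uri in PREFIXES.items()]
    body = [f"{s} {p} {o} ." for s, p, o in triples]
    return "\n".join(header + [""] + body)


triples = data_structure_definition(
    "ex:dsd",
    dimensions=[
        "obeu-dimension:fiscalYear",          # core property reused directly
        "obeu-dimension:budgetPhase",         # core property reused directly
        "ex:localFunctionalClassification",   # hypothetical local extension
    ],
    measures=["obeu-measure:amount"],
)

# The local classification extends a core property and is bound to a code list.
triples += [
    ("ex:localFunctionalClassification", "a", "qb:DimensionProperty"),
    ("ex:localFunctionalClassification", "rdfs:subPropertyOf",
     "obeu-dimension:functionalClassification"),
    ("ex:localFunctionalClassification", "qb:codeList", "ex:localFunctions"),
]

turtle = to_turtle(triples)
print(turtle)
```

Because the local dimension is declared as a subproperty of a core one, a consumer that only knows the core property can still recognize what the local classification means.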

We used RDF as an enabling technology for semantic grounding of the fiscal domain, allowing us to mint terms describing the domain, and as a way to integrate heterogeneous datasets. Data integration is enabled by the reuse of component properties, which in turn allows fiscal datasets to be used in combination and compared with one another. Moreover, RDF makes it straightforward to combine fiscal data with other statistical datasets, such as macroeconomic indicators from Eurostat, which can provide relevant context. Datasets modelled using the common terms from the data model are self-describing. Since definitions of the terms describing the data can be found by following their links, it is possible to use the data without prior knowledge. Not needing to ask for an explanation of the data removes a barrier to using it.
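
As a rough illustration of why shared component properties ease comparison, the sketch below aligns two datasets on a dimension they both reuse. It works on plain Python dictionaries rather than RDF, and the datasets, figures, and function names are all invented for the example.

```python
from collections import defaultdict

# Invented observations from two hypothetical datasets that both reuse
# the fiscalYear dimension and the amount measure.
city_a = [{"fiscalYear": 2015, "amount": 500.0}, {"fiscalYear": 2016, "amount": 650.0}]
city_b = [{"fiscalYear": 2015, "amount": 900.0}, {"fiscalYear": 2016, "amount": 850.0}]


def totals_by(dimension, observations):
    """Sum amounts per value of the given dimension."""
    totals = defaultdict(float)
    for obs in observations:
        totals[obs[dimension]] += obs["amount"]
    return dict(totals)


def compare(dimension, **datasets):
    """Align the totals of several datasets on the values of one shared dimension."""
    per_dataset = {name: totals_by(dimension, obs) for name, obs in datasets.items()}
    keys = sorted(set().union(*per_dataset.values()))
    return {k: {name: t.get(k) for name, t in per_dataset.items()} for k in keys}


comparison = compare("fiscalYear", city_a=city_a, city_b=city_b)
```

The alignment needs no dataset-specific mapping because both datasets name the dimension and the measure identically, which is the effect that reusing component properties has on real RDF data.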

DCV defines cardinality restrictions for its component properties. While values of dimensions must be provided for each observation so that it can be located in the space of the data cube, attributes may be optional. Since we encountered many properties with missing values that did not match the semantics of attributes as qualifiers of measures, we created obeu:OptionalProperty as a new subclass of qb:ComponentProperty to represent optional properties. Although an arbitrary RDF property could be used to achieve the same goal, an obeu:OptionalProperty can be included in a dataset’s data structure definition and thus guide the consumption of the dataset. An example of an optional property is location, which is applicable only to monetary amounts tied to a particular place.
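
The difference in cardinality can be sketched as a simple check, assuming a hypothetical data structure with two dimensions, one optional property, and one measure. The property selection, the sample values, and the `check_observation` function are illustrative assumptions, not a validator shipped with the data model.

```python
# Hypothetical structure: which properties the dataset declares, by role.
DIMENSIONS = {"fiscalYear", "budgetPhase"}  # must be present on every observation
OPTIONAL = {"location"}                     # declared, but may be absent
MEASURES = {"amount"}                       # must be present on every observation


def check_observation(obs):
    """Return a list of problems; an empty list means the observation fits."""
    present = set(obs)
    problems = [f"missing dimension: {d}" for d in sorted(DIMENSIONS - present)]
    problems += [f"missing measure: {m}" for m in sorted(MEASURES - present)]
    allowed = DIMENSIONS | OPTIONAL | MEASURES
    problems += [f"undeclared property: {p}" for p in sorted(present - allowed)]
    return problems


# An observation without a location is still valid.
ok = check_observation({"fiscalYear": 2016, "budgetPhase": "approved", "amount": 100.0})

# An observation without a dimension value cannot be located in the cube.
bad = check_observation({"budgetPhase": "approved", "amount": 100.0, "location": "Prague"})
```

Declaring location as optional in the structure tells consumers that the property may appear and that its absence is not an error, which an ad hoc RDF property would not communicate.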

The following table lists the core dimensions, attributes, optional properties, and measures defined by the OpenBudgets.eu data model.

OpenBudgets.eu component properties

Dimensions: accountingRecord, administrativeClassification, budgetLine, budgetPhase, classification, currency, date, economicClassification, fiscalPeriod, fiscalYear, functionalClassification, operationCharacter, organization, partner, paymentPhase, programmeClassification, project, taxesIncluded

Attributes: currency, taxesIncluded

Optional properties: contract, location

Measures: amount

The data model is linked to other vocabularies, such as the Payments Ontology and the SDMX statistical data model. These links support interoperability with datasets described by the linked vocabularies and guide data migration when such datasets are integrated.

Links of the OpenBudgets.eu data model to other vocabularies


Since the data model was designed to cater for a large variety of fiscal datasets within the scope of the OpenBudgets.eu project, you may find it applicable to a wide range of datasets beyond those we use it for. It has served us well in the project so far, so it may do the same for other efforts that require modelling of fiscal data. The data model is released under the terms of an open licence, so you may take it, reuse it, and build on it.