4. Datasets and demand

This step can be implemented as follows:

The person promoting data sharing in the organisation, the data administrator, and the organisation's data protection expert work together to

identify data suitable for sharing and the person responsible for it,
determine the type of data that is in demand outside the organisation, and
compile a list of datasets that the organisation is planning to open in the future and agree on practical measures and monitoring practices.

Identification of datasets

This section describes how the organisation can map and classify its datasets and identify high-value datasets that are suitable for sharing or opening.

Public administration organisations map and classify their data, particularly on the basis of statutory tasks and recommendations. The purpose of mapping the datasets is to identify, among other things, the rights and restrictions affecting the disclosure of datasets and, consequently, the possibilities of sharing the data.

The organisation must first survey what type of data it has in general and the parts of this data that are suitable for sharing and opening. At this point, the limitations for data sharing posed by the information systems used in the organisation must also be taken into account.
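The survey described above could be recorded in a simple inventory. The following is a minimal sketch only: the structure, the field names (responsible_person, contains_personal_data, secrecy_restricted, system_supports_export) and the sample entries are all illustrative assumptions, not part of any official model.

```python
from dataclasses import dataclass


@dataclass
class DatasetEntry:
    """One row of a hypothetical dataset inventory (all fields are illustrative)."""
    name: str
    responsible_person: str
    contains_personal_data: bool
    secrecy_restricted: bool
    system_supports_export: bool  # limitation posed by the information system


def sharing_candidates(inventory):
    """Return entries with no recorded restriction that would prevent sharing."""
    return [
        entry for entry in inventory
        if not entry.contains_personal_data
        and not entry.secrecy_restricted
        and entry.system_supports_export
    ]


# Invented sample data for illustration only.
inventory = [
    DatasetEntry("Street maintenance schedule", "A. Example", False, False, True),
    DatasetEntry("Client register", "B. Example", True, True, True),
    DatasetEntry("Archived noise measurements", "C. Example", False, False, False),
]

for entry in sharing_candidates(inventory):
    print(entry.name, "-", entry.responsible_person)
```

A filter like this only flags first-pass candidates. Whether a dataset can actually be shared remains a legal and organisational assessment.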

Example

Large organisations often make use of hundreds of information systems for different purposes, which are used to store large amounts of data on a daily basis. For example, according to various surveys, the central government uses around 1,800 information systems, while the City of Helsinki has approximately 700–900 information systems in use. In addition to information systems, there is a huge number of documents – such as spreadsheets, images, and audio and video files – that could be provided as open data.

Data mapping methods

You can survey and map your data using, for example, the descriptions and tools presented below.

Information management model

Government agencies, municipalities, and other information management entities referred to in the Public Information Management Act (906/2019) (in Finnish) must prepare and maintain an information management model that provides an overview of the entity’s information management practices. The information management model can be used to highlight various details, such as

the titles and purposes of information resources, and
data disclosure targets and retention periods.

The information management model helps authorities manage the continuously increasing volume of data at their disposal. The model helps them understand and manage the life cycle of data and, consequently, also identify and manage risks related to the use of new digital services. Rather than comprising a one-off obligation to produce a description, the information management model must be maintained whenever changes occur in the information management conducted by the information management entity and in the content of said management.

The information management model is useful in situations where the organisation is considering the possibilities of sharing its data and needs help choosing the data it should open. Read the Information Management Board's recommendation regarding the information management model (in Finnish) (Ministry of Finance publications 2020:29).

Information management map

The public administration information management map is a description of how information management is organised in public administration. The information management map provides an overall picture of what registers and legislation exist in each sector of public administration.

The map allows different actors to plan and develop the interoperability of the information resources and information systems used in public administration. The information management map also provides a better starting point for assessing the impacts of extensive administrative or structural changes on the actors responsible for information management.

For example, the map helps ministries to prepare regulations on information management and information resources, to guide interoperability of information resources and information systems in their sectors, and to plan ways of implementing interoperability. The map also serves the administration’s customers when a view of how data concerning a person is managed in public administration is needed.

The purpose of the information management map is to provide different actors in society with an overview of

how information management in public administration has been organised,
which data are maintained in different information resources, and
under what conditions the data could be available for the actor's needs.

The first version of the information management map was published in early 2022 on the State Treasury's Exploreadministration.fi service, and it can be used freely for different purposes.

For more information on the information management map, visit the Ministry of Finance's website.

Description to implement document publicity

Under section 28 of the Public Information Management Act (906/2019), central government agencies, municipalities and other information management entities referred to in the Act must maintain a description of the information resources and case register that they manage. This ‘description to implement document publicity’ is one way of helping citizens to know to whom they should address their requests for information.

Among other things, the description must contain information about

the information systems containing information belonging to the case register or the information management of services,
the datasets included in the information systems by data groups, and
the open access to datasets via a technical API.

The information management entity shall publish the description in a public data network in so far as the information in the description is not secret. Read the Information Management Board's recommendation on preparing descriptions to implement document publicity (in Finnish) (Ministry of Finance publications 2020:22).

The description to implement document publicity has replaced the specifications on information management systems referred to in section 18 of the Act on the Openness of Government Activities (621/1999). Some public administration actors have opened their information system descriptions or lists as open data. In 2016, the Association of Finnish Local and Regional Authorities published its instructions for opening the list of information systems for municipalities (in Finnish), prepared by the six largest cities, on the Open Data service.

Data balance sheet

The data balance sheet contains an evaluation of how the organisation's data, data protection, and information security have been implemented. In particular, its aim is to map the status of data processing, life cycle management in personal data processing and the necessary development measures. The idea is that the data balance sheet would serve as a management and internal control tool, support data protection work and increase the effectiveness of activities.

The data balance sheet plays a key role in monitoring compliance with data protection and the controller's accountability referred to in the General Data Protection Regulation (Article 5 of EU 2016/679) as well as demonstration of transparency. The obligation to demonstrate compliance means that the organisation must be able to demonstrate that it complies with the General Data Protection Regulation in the processing of personal data and that it also implements the data protection principles in practice. The data balance sheet also serves as a show of trust towards the organisation's stakeholders.

The Association of Finnish Local and Regional Authorities has published a data balance sheet template (in Finnish, docx file) that organisations can use when planning and drawing up their data balance sheets. For more information on this template, see the Association’s presentation Data balance sheet – what and why (in Finnish, pdf) (11 Sep 2019).

Examples of data balance sheets of different public administration organisations:

Interoperability tools

The Interoperability Platform maintained by the Digital and Population Data Services Agency provides tools for describing the structure and semantic significance of data. These interoperability tools include: Data models, Terminologies, and Code Lists. The aim is to provide a single location for those who want to find the metadata descriptions of the data they need.

Interoperability tools allow content-publishing organisations to describe what type of data they provide and the formats in which they provide it to interested parties. In addition to describing the structure of the data, these interoperability tools can be used to describe the significance of the data, so that potential users can ascertain whether they understand the contents of the data in the same way as the party publishing it.

The basic principle of these tools is to make use of previous descriptions to the furthest extent possible and to link new descriptions to any previously made ones. The principles of data linking and the reuse of metadata descriptions are presented in the interoperability method. For a more detailed description of the interoperability platform and method, see the “Metadata description” section in step 6.

Determining the value of datasets

This section describes how the value of datasets can be determined.

When surveying datasets suitable for distribution, their value must be taken into account. The benefit and added value of sharing data is a key reason for prioritising the opening of certain datasets.

This value is no longer defined in purely monetary terms, as it can consist of several factors. A purely monetary approach to value can cause an organisation to overlook many other important aspects, which is why it is no longer considered a suitable approach in today's information society.

In other words, the value of data should be examined from a number of perspectives, which may also facilitate the final decision on whether the data should be opened. For example, the benefits of sharing and opening datasets form part of the value of a dataset. You can read more about the benefits of sharing data in step 3 of the operating model: “Motivation and organisation” and in step 5: “Planning and implementation”. The value of datasets can also be determined with, for example, the factors used in PESTEL analyses or the value pyramid.

According to the PESTEL analysis, the value of a dataset can be examined from the following perspectives, which are also used in the EU's annual Open Data Maturity assessments (pdf):

Social value promotes equality (minority rights, rights of persons with disabilities, opportunities for participation).
Economic value can be measured with financial indicators (new jobs, companies, services, tax revenue).
Ecological value promotes material efficiency and circular economy (material recycling efficiency, natural resources use, recycling).
Knowledge promotes knowledge-based decision-making, which often has significant side effects (political and societal decision-making).

The value of datasets should not be confused with so-called high-value datasets, the opening of which is specifically provided for in European Commission Regulation 2023/138 adopted under the Open Data Directive (EU) 2019/1024, and which are described in more detail below. The value of these datasets has been defined separately by the Commission, and they are subject to specific statutory obligations. However, the criteria specified in the Open Data Directive can be used as a guideline for determining the value of other datasets:

the reuse of the datasets is associated with important benefits for society, the environment, or the economy
the datasets are suitable for creating value-added services, applications, and new, high-quality and decent jobs
the value-added services and applications created on the basis of the datasets have a large number of potential beneficiaries.

High-value datasets

High-value datasets (HVD) are defined in the European Commission Regulation 2023/138. They are public sector documents and information that, under the Open Data Directive, are considered to have particular value for society, the environment, and the economy. High-value datasets must be made available free of charge and in machine-readable form through APIs. The European Commission’s regulation is based on the Open Data Directive, which is explained in more detail in step 2 of the operating model. If a document or material does not fall within the scope of the Directive, it is also not considered a high-value dataset under the Regulation.

Licence

Any high-value datasets under the Directive and Regulation must be made available for reuse under a licence that allows for their unrestricted reuse. Suitable licences include the CC0 licence or, alternatively, the Creative Commons BY 4.0 licence or an equivalent or less restrictive open licence.

Metadata

Public sector bodies with high-value datasets, as defined by the Regulation, must ensure that the datasets are designated as high-value datasets in their metadata descriptions. The Implementing Regulation also contains detailed, sector-specific requirements for metadata. For more information about metadata and how it should be described, see step 6 of the operating model.

What are high-value datasets?

According to the Implementing Regulation, high-value datasets are divided into six thematic categories:

Geospatial
Earth observation and environment
Meteorological
Statistics
Company and company ownership
Mobility
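The licence, metadata and category requirements described above can be sketched as a simple check on a metadata record. This is a hedged illustration only: the record keys (is_high_value_dataset, hvd_category, licence, api_url) are hypothetical and do not follow any particular metadata standard, and the sample record is invented.

```python
# The six thematic categories listed in the text.
HVD_CATEGORIES = {
    "Geospatial",
    "Earth observation and environment",
    "Meteorological",
    "Statistics",
    "Company and company ownership",
    "Mobility",
}

# Licences named in the text; "equivalent or less restrictive" licences
# would need a case-by-case assessment and are not modelled here.
ACCEPTED_LICENCES = {"CC0", "CC BY 4.0"}


def check_hvd_record(record):
    """Return a list of problems found in a hypothetical metadata record."""
    problems = []
    if not record.get("is_high_value_dataset"):
        problems.append("not designated as a high-value dataset")
    if record.get("hvd_category") not in HVD_CATEGORIES:
        problems.append("unknown or missing thematic category")
    if record.get("licence") not in ACCEPTED_LICENCES:
        problems.append("licence needs manual review")
    if not record.get("api_url"):
        problems.append("no API access point recorded")
    return problems


# Invented sample record for illustration only.
record = {
    "title": "Example weather observations",
    "is_high_value_dataset": True,
    "hvd_category": "Meteorological",
    "licence": "CC BY 4.0",
    "api_url": "",
}
print(check_hvd_record(record))
```

The check covers only the points mentioned in this section. The sector-specific metadata requirements of the Implementing Regulation are not represented.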

What are the benefits of opening high-value datasets?

According to the European Commission, high-value datasets significantly reduce the barriers to entry into European data-driven markets and increase the reuse of datasets. They help promote research, the creation of new digital services, and the improvement of existing services or business processes.

The reuse of geospatial and mobility data can open up business opportunities for the logistics or transport sector and improve the efficiency of the provision of public services, for example by understanding traffic flows to improve transport efficiency.

Earth observation and environmental data, as well as meteorological data (e.g. radar data, air quality, soil pollution, biodiversity), can be used to support, for example, research and knowledge-based decision-making, especially in combating climate change and its impacts.

Statistical data (e.g. labour market, demographic structure, industrial production) makes it easier to predict the impacts of, for example, possible policy measures.

Company and company ownership data increases market transparency and allows for more accurate targeting of private investments or public support. The wider availability of information concerning businesses has clear social benefits, for example in the fight against crime (including financial crime), increasing civic participation, and promoting transparency in business.

Personal data

The Open Data Directive and the Regulation pertaining to high-value datasets do not apply to documents whose availability or disclosure has been restricted on the basis of the protection of personal data. Compliance with the GDPR must always be ensured with regard to the processing of personal data.

Value pyramid

Understanding the value of data has developed significantly over the past 20 years. For example, it is no longer believed that public administration should be responsible for producing value by itself – instead, it should produce value together with the rest of society. The value creation process often takes place in ecosystem-like structures, where several organisations can create value together. Organisations produce value together with their customers, i.e. both the openers and users of the data participate in the value creation process.

The opening of data generates value depending on the opened data’s utilisation rate and methods. The value pyramid can be used to determine the value of datasets by, for example, surveying data users about the order of importance of the value-bringing elements included in the pyramid. This helps to form a true picture of the value of the dataset being opened.

According to the value pyramid, value-bringing elements can be divided into five levels:

inspirational value,
individual value,
ease of doing business value,
functional value, and
minimum-requirements (table stakes) value.

Figure: Value pyramid based on Maslow’s hierarchy of needs (adapted from: The B2B Elements of Value, Harvard Business Review)

The values at the lower levels of the pyramid are more objective, which makes it easier to measure and utilise them when identifying the value of datasets. The higher the level of the pyramid, the more subjective the value, which also makes it more difficult to measure these values.
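The survey approach mentioned above, in which data users rank the value-bringing elements by importance, could be summarised as follows. This is a minimal sketch under stated assumptions: averaging rank positions is just one possible way to aggregate responses, and the sample responses are invented.

```python
from statistics import mean

# The five levels of the value pyramid, as listed in the text.
ELEMENTS = [
    "inspirational",
    "individual",
    "ease of doing business",
    "functional",
    "table stakes",
]

# Each response ranks the five elements from most (first) to least important.
# Invented sample responses for illustration only.
responses = [
    ["functional", "table stakes", "ease of doing business", "individual", "inspirational"],
    ["table stakes", "functional", "individual", "ease of doing business", "inspirational"],
    ["functional", "ease of doing business", "table stakes", "inspirational", "individual"],
]


def mean_ranks(responses):
    """Average rank position per element (1 = most important)."""
    return {
        element: mean(r.index(element) + 1 for r in responses)
        for element in ELEMENTS
    }


# Sort elements so that the one ranked most important on average comes first.
ranking = sorted(mean_ranks(responses).items(), key=lambda item: item[1])
for element, rank in ranking:
    print(f"{element}: {rank:.2f}")
```

A lower average rank means that respondents considered the element more important for the dataset in question.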

Demand for datasets

This section describes how the organisation can determine the type of data that is in demand outside the organisation and how the data to be opened could benefit both potential users and the organisation itself.

The data’s potential users may have different data-related needs, but it is also important to take the needs of your own organisation into account in this context. The organisation’s operations should be guided by its needs, which is why surveying and prioritising needs is important.

Dataset and data product

Datasets are used to structure data into a cohesive and logical entity. If necessary, subsets of data can be compiled into a single dataset from several different information systems, databases, or data warehouses.

Datasets can also be seen as data products. A data product is a maintainable product created for an organisation’s internal or external customer whose value is based on its data content. In other words, data can be considered to have customers whose needs the product meets.

Methods for identifying data needs

The demand for data and data-related needs can be surveyed, for example, with the methods below.

Information requests and feedback received by the organisation

Any information requests and feedback are recorded in the information management entity’s case register or feedback systems in the organisation. Analysing these requests and feedback may reveal datasets for which there is a wider need. It is advisable to process your feedback in a transparent manner.

User statistics of the organisation’s website

If the organisation has published descriptive or other information about its datasets on its website, it can analyse the website’s user statistics and determine which datasets may be most interesting to its visitors.

For example, the Open Data service collects statistical data on the view counts and download numbers of datasets, among other things. These can be viewed on the service’s Statistics page and on the pages of individual datasets.
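User statistics of this kind can be analysed with very little tooling. The sketch below assumes the statistics have been exported as CSV text; the column names (dataset, views, downloads) and the figures are invented for illustration and do not reflect the export format of any particular service.

```python
import csv
import io

# Invented sample export; the column names are assumptions for this sketch.
SAMPLE_CSV = """dataset,views,downloads
Bus timetables,1200,340
Building permits,450,20
Air quality measurements,800,410
"""


def rank_by_downloads(csv_text):
    """Return (dataset, views, downloads) tuples sorted by downloads, highest first."""
    rows = csv.DictReader(io.StringIO(csv_text))
    stats = [(r["dataset"], int(r["views"]), int(r["downloads"])) for r in rows]
    return sorted(stats, key=lambda s: s[2], reverse=True)


for name, views, downloads in rank_by_downloads(SAMPLE_CSV):
    print(f"{name}: {downloads} downloads, {views} views")
```

Sorting by downloads rather than views is a design choice for this example. Views may indicate interest, while downloads are closer to actual use.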

User surveys

The organisation may publish a survey, for example on social media, in which potential data users are asked about what kind of data they would like to access and for what purposes they would use it. For example, the Finnish Meteorological Institute has conducted user surveys to map users’ preferences regarding datasets to be opened.

Piloting the opening of data

Data users’ needs can be investigated and tested by means of pilots: the organisation can begin by only opening a small part of the data and targeting the supply to a limited group of users. The best-case scenario is that pilot users will quickly produce the first applications using the data, which can be presented as examples of data use. Other information needs may also emerge during the piloting.

Piloting is particularly worthwhile if it is likely that several groups will be interested in the data to be opened. The organisation can start by opening the data that is most likely to interest its intended target group in a format that benefits them. When planning piloting, the organisation should also remember that a decision to stop piloting should be made at some point. Rather than proceeding directly from piloting to production, the production system must be designed separately. After the piloting, other datasets can also be opened, and the accessibility of the data can be expanded by using different file formats.

The following are examples of different data user groups:

Authorities: It is easy for authorities to make use of the open data provided by other authorities, as the reuse of such data is not subject to any complex agreements, legal bases, or other such factors. For example, the cooperation between municipalities and the state may be based on sharing information as open data.
Innovators: The group of innovators consists of individual users who adopt new innovations quickly. They may provide good ideas for using the data, but getting this group together may be challenging.
Individual developers: Individual developers are experimental data users who can test APIs and explore the types of data the organisation has. Testing new ideas with this group can be fast and productive.
Small companies: Small companies often take up new sources of information faster than their larger counterparts. To work together with a small company, intensive customer support may be required.
Large companies: Large companies are often driven by business pressures, and development processes can be lengthy and cumbersome. Large companies may insist on the open data being continuously and reliably accessible, with high-quality customer support available 24/7.
Data aggregators: Data aggregators promote the widest possible use of open data. They collect data from different sources, creating their own sets that they share with others. For example, an aggregator may collect timetable information on means of transport through different APIs and produce an application where users can find the timetables for all modes of transport.
Higher education students: Student groups in different fields can be easy to reach, for example by organising courses in cooperation with the educational institution, and students complete a large number of different projects and theses.

Other ways to identify needs

Data users' needs can also be determined by means of data polls, interviews, identification of the organisation's internal needs, or by providing users with a tool for requesting datasets.

The organisation should express its willingness to receive data requests and offer a method for receiving them (such as a specific e-mail address or a separate online form). Discussions on users’ preferences can also be conducted at events.

For example, HRI organises regular developer meetings, at which it receives data requests. From time to time, it presents data that it is only planning to open, enabling it to listen to future users’ preferences, for example concerning the data form and format. Read more about HRI’s data preferences and developer cooperation.

Support materials on the topic

This section contains support material related to the topics discussed in this step.

Training courses on the data.europa.eu website:

Updated: 12/20/2023