4. Planning and implementation

This part can be implemented as follows:

The person promoting data sharing in the organisation and the data administrator 

  • assess the possibilities of sharing data, for example in terms of user rights.
  • assess the benefits, risks and costs of data sharing.
  • assess the need for data anonymisation and aggregation.
  • evaluate the quality of the data to be shared.
  • select the license under which the data will be shared.
  • make a decision on opening data.

A data protection specialist is consulted about 

  • identifying information security and data protection risks.
  • the need for anonymisation and aggregation.

The organisation's IT experts and the person responsible for the data to be shared

  • consider the technical implementation of data sharing, for example if the data should be shared in file format or through an interface.

Definition of the data to be opened

This part describes aspects that the organisation should assess and specify when it starts planning the opening of data in practice. 

No official recommendations exist for defining the datasets to be opened.
At the start, organisations that have already opened their data have usually identified the person who administers and is responsible for the data the organisation is planning to open, the information system behind it, and potential use cases. At this stage, the organisation can, for example, make use of its information management model, which describes the organisation's information resources, or rely on the assistance of the person responsible for opening data, if the organisation has allocated resources to this task. 

Organisations that have opened their data have considered the aspects related to dataset ownership, copyrights, data disclosures, data protection and information security. If these factors have not prevented data sharing, the way in which the dataset could in practice be constructed and distributed technically has been examined with the data controller. At the same time, the completeness of the dataset has been determined, as well as the level of accuracy at which the data can be opened without compromising its potential usefulness or usability. 

In connection with identifying the dataset, the following have also been assessed:

  • the potential benefits, risks and costs of opening the data,
  • the quality, metadata and publication channel of the data to be opened,
  • the life cycle of the dataset, and
  • planning of sufficient communications network capacity and user support.

These aspects are described in more detail in the next steps. 

As a basic premise, organisations that have opened their data have striven to determine if national or international standards exist for opening the data in question (for example, data modelling and formats), or if some other party has already opened similar data, in which case the organisation can use that party's information model for opening its own data. 

The six largest cities in Finland have compiled a list of certain international standards (Google Sheets) which have also been used for data sharing in Finland. The DCAT-AP data model profile is used in the metadata of the datasets, for example on opendata.fi. When using international standards, organisations should note the differences between the laws of different countries, especially regarding data protection. 

At the same time, it is also advisable to consider if it would be possible to also open the data production process (calculation rules, algorithms, etc.) when opening the dataset.

For more information, visit the web service data.europa.eu.

Tips from Helsinki Region Infoshare

It is important to clearly designate a single party (a responsible role) for opening the data, whom parties outside the organisation can contact with questions about open data. It is also a good idea to organise developer meetings and similar events, where you can collect feedback even before opening the data.

Tips from the Finnish Meteorological Institute

The data controller should consider how metadata and a verbal description of the dataset will be produced, and how any need for user support will be responded to after publication.

Tips from the National Land Survey of Finland

The data controller must go through the copyrights of the data in detail, ensuring that its ownership is clear. If the data has previously been licensed against a fee, the transition period and customer communications should be planned carefully. In addition, sufficient communications network capacity must be ensured.


Benefits, risks and costs

This step describes how the organisation can assess the potential benefits, risks and costs of opening the planned dataset. To download a tool developed for assessing benefits, risks and costs, go to the section Method for assessing the potential benefits, risks and costs of opening data.

The Information Management Board has issued a set of recommendations for applying certain information security provisions (in Finnish) (Ministry of Finance publications 2021:65), according to which information risk management is a continuous activity, and the information management entity should describe the objectives, responsibilities and key methods related to it. The management is responsible for the organisation of and allocation of resources to information risk management. In addition, the information management entity maintains datasets comprising the risk assessment results and risk management plans and regularly assesses if this data is partly or fully secret or classified.

The Information Management Board has also issued a recommendation on the criteria for assessing information security in public administration (Julkri), which contains instructions for applying the criteria (Ministry of Finance publications 2022:43). The assessment criteria support the needs to develop and evaluate information security in all branches of public administration. They can be used to assess compliance with the Public Information Management Act, the Decree on Security Classification of Documents in Central Government and, in part, the information security requirements laid down in the General Data Protection Regulation.

Digital security

Ensuring digital security is important when planning and going ahead with opening data. Digital security includes matters related to risk management, continuity management and preparedness as well as cyber security, information security and data protection. Citizens, companies and communities must be able to rely on ethically sustainable, open and transparent public administration services also in the digital environment.

The Government Resolution on Digital Security in the Public Sector (Ministry of Finance publications 2020:23) defines the principles of development and the key services for promoting security in digital environments. To promote digitalisation and digital security in a balanced manner, the Ministry of Finance has appointed a strategic management group for digitalisation and digital security in public administration.
Find out about actions and documents related to developing digital security (in Finnish).

Method for assessing the potential benefits, risks and costs of opening data

As part of the operating model for data sharing, an assessment method was developed to help public administration organisations to assess the potential benefits of opening and sharing their datasets and the risks and costs associated with data sharing. This tool is also known as the BRC method (benefits, risks and costs). 

Download the BRC assessment tool (in Finnish, Excel file)

The BRC tool is an Excel sheet that provides an overview of assessment results based on the responses the user has filled in, helping to determine the potential benefits, risk profile and costs arising from opening the dataset. It should be noted that the summary based on the responses is only an indicative overview of different observations, not a recommendation. Each organisation makes the decisions on opening its data independently, taking into account legislation (including access rights and data disclosure), official guidelines and its internal policies. For example, the summary can be used as background material to justify the potential benefits of data sharing to those who make the decision on opening data.

The BRC method is based on assessment methods used by public administration organisations in Finland and internationally. Parties such as the National Institute of Standards and Technology, the University of Washington, Harvard University and several other expert organisations have been involved in developing the first version of the method.

The assessment method can be used to:

  • prioritise the order in which datasets are opened when resources are limited
  • identify datasets whose opening involves different risks
  • reach an understanding of the costs incurred from opening data
  • identify datasets with the greatest potential benefits for external stakeholders (data users) 
  • analyse possible income obtained from sharing data
  • introduce a systematic approach to the opening of datasets and decision-making on data sharing

The assessment method is intended for those responsible for opening data in organisations. They may include those responsible for datasets, heads of information management or data opening coordinators. In addition, it is advisable to involve specialists from each area, from technical experts to data protection officers, in the different stages of the assessment.

Ensuring data protection

This part describes the assessment of needs to aggregate and anonymise the dataset to be opened as well as organisation-specific practices for ensuring the data protection of the dataset and for carrying out any aggregation, anonymisation or pseudonymisation required. 

There are no official recommendations for assessing data aggregation and anonymisation needs.

Usually, organisations that have already opened their data have carefully assessed if the dataset to be shared is public and if it may contain personal data or other data critical to the functioning of society. If the dataset to be opened contains data concerning persons or related to the security of society in some way, the opening of the datasets and any aggregation, anonymisation or pseudonymisation of the data should be discussed with the organisation's data protection officer. The data protection officer or other data protection experts or networks of the organisation should also otherwise be consulted when assessing the aggregation and anonymisation needs of the dataset. 

According to the principle of openness (Act on the Openness of Government Activities 621/1999), official documents are in the public domain unless specifically provided otherwise in that Act or another Act. It is important to note, however, that a public document may contain personal data, and there must always be a legal basis for disclosing personal data even if the document is public. The authority must assess whether the personal data contained in the document can be disclosed: the fact that a document is public does not necessarily mean that all the information in it can be published, as it may contain personal data that cannot be published even though the document is not secret. The precondition for secrecy is that the criteria for secrecy laid down in the Act on the Openness of Government Activities are fulfilled; secrecy provisions are also included in special legislation. 

Anonymisation is a way of removing personal data from a dataset. It should be noted, however, that as long as a person can be directly or indirectly identified based on the data or the data can be reverted to an identifiable form, it is still personal data, and the General Data Protection Regulation applies to it.

Anonymisation means processing the personal data in a way that eliminates the possibility of identifying individual persons on its basis. For example, the data can be aggregated and opened at a more general level, or converted into a statistical format, so that the data concerning an individual person is no longer in an identifiable form. Identification must be prevented irreversibly, ensuring that neither the controller nor any third party can convert the data back into an identifiable form using the data in their possession.

Pseudonymisation means processing the personal data in a way that eliminates the possibility of associating it with a certain person without additional information. Such additional information must be kept carefully separate from personal data.
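As an illustration of the distinction, pseudonymisation of an identifier can be sketched as a keyed one-way transformation, where the key plays the role of the "additional information" that must be kept separate from the data. The field name and key value below are hypothetical, and this is a sketch of the principle rather than a prescribed implementation:

```python
import hmac
import hashlib

# Hypothetical secret key: the "additional information" that must be
# stored separately from the pseudonymised dataset.
SECRET_KEY = b"stored-separately-from-the-dataset"

def pseudonymise(identifier: str) -> str:
    """Replace an identifier with a keyed one-way hash (HMAC-SHA-256).

    The same input always yields the same pseudonym, so uniqueness is
    preserved, but the original value cannot be recovered without the key.
    """
    return hmac.new(SECRET_KEY, identifier.encode("utf-8"),
                    hashlib.sha256).hexdigest()

record = {"person_id": "010190-123A", "municipality": "Helsinki"}
record["person_id"] = pseudonymise(record["person_id"])
```

Because whoever holds the key could in principle re-link the pseudonyms to individuals, data processed in this way remains personal data under the General Data Protection Regulation.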

The Office of the Data Protection Ombudsman is the national supervisory authority overseeing compliance with data protection legislation. The Data Protection Ombudsman and the Deputy Data Protection Ombudsmen perform their duties independently and impartially. The Office of the Data Protection Ombudsman has an Expert Board (whose term of office runs from 1 October 2020 to 30 September 2023). The Expert Board’s task is, on the Data Protection Ombudsman’s request, to issue statements on significant issues related to applying the legislation on personal data processing. For more information, visit the website of the Office of the Data Protection Ombudsman.

Under the General Data Protection Regulation, certain controllers and processors of personal data must appoint a data protection officer. This obligation applies to all authorities and public administration bodies. The data protection officer provides guidance on data protection to the controller and employees processing personal data. They monitor compliance with the GDPR and the information activities and training provision related to data protection in their organisation. The data protection officer provides advice related to impact assessments and serves as the contact point for the supervisory authority.

Read more:

State Treasury's practices

State Treasury analysts carry out analyses on assignment. In such analyses, the shared information platform of the central government is mainly used, to which datasets specified in the assignment are imported. Together with the customer, the analyst defines the data areas needed for the analysis using a data navigator. The navigator contains descriptions of the data residing in the systems of central government's joint service providers. In this description, the service provider and agencies have specified if a field may contain personal or secret information. A predefined set of data masking rules is generated for the described fields.

The analyst places an order with the data engineer for data that matches the assignment. The data engineer retrieves the necessary columns from the service providers' systems through interfaces, removing any unnecessary columns from the dataset to minimise the data, and masking the data according to the defined rules:

  • Text fields that may contain personal data are deleted
    • For example in financial monitoring data, description texts 1 and 2 for the monitoring target are deleted
    • For example, fields containing personal names or e-mails are deleted
  • In the dataset, personal identifiers are encrypted using an encryption algorithm, ensuring that the original value cannot be identified while preserving the uniqueness of the identifier
    • For example, using a cryptographic sealing function, the contents of fields containing personal identity codes are converted into a string from which the original value cannot be directly derived
  • The State Treasury may not have access rights to the data at a certain level of accuracy, but the data aggregates produced from it may be public. In these cases, the service provider aggregates the data to a public level defined together with the agencies on a case-by-case basis. In this context, aggregation refers to the re-grouping of data on the basis of one or more factors to a less accurate level.
    • For example, instead of the operating unit, the sum or average of the accounting unit level is shown
    • For example, trips to different continents are shown, instead of to individual countries

The masked and minimised data is transferred to the platform for the analyst's use. The analysis is carried out on the masked data, which does not contain directly identifiable personal data. If the analyst nevertheless finds that the data may contain direct personal data, they inform the data engineer so that the masking rules for the fields in question can be adjusted, and refrain from processing the dataset until the correction has been made and the dataset is free from personal data. Once the analysis has been carried out, the analyst aggregates the results to a statistical level before presenting them to the customer: the groups shown must contain data concerning at least five persons, ensuring that no individuals can be identified in the results.
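The final aggregation step, showing only groups that contain data on at least five persons, can be sketched like this. The five-person threshold comes from the description above; the field names and data are invented for illustration:

```python
from collections import Counter

MIN_GROUP_SIZE = 5  # groups smaller than this are suppressed entirely

def aggregate_counts(rows, group_field):
    """Count persons per group and drop groups below the threshold."""
    counts = Counter(row[group_field] for row in rows)
    return {group: n for group, n in counts.items() if n >= MIN_GROUP_SIZE}

# Hypothetical unit-level rows: 7 persons in agency A, 3 in agency B
rows = [{"agency": "A"}] * 7 + [{"agency": "B"}] * 3
result = aggregate_counts(rows, "agency")  # agency B is suppressed
```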

Statistics Finland's practices

To ensure the data protection of a dataset, it is important to examine if there are target units in the dataset whose identity or attributes could be disclosed directly or indirectly. A precondition for direct identification is that the dataset includes a unique attribute of the target unit, such as a name, address or business ID. Indirect identification is possible when the target unit can be identified on the basis of several attributes, for example ‘mayor’ as the occupation and the municipality in which the person works as additional information. Some attributes of an individual target unit may also be revealed without identifying the target unit when a larger group to which the target unit belongs shares some of the same properties: in a survey on well-being at work, for example, all employees of a certain department may have responded and expressed dissatisfaction with the physical working environment.

When assessing this risk of disclosure, there is a major difference between talking about unit-level datasets or data that has been aggregated in some way. When processing unit-level data in which the properties of an individual target unit are examined by target unit, indirect disclosure may still be possible even if the data has been aggregated by attributes. Longitudinal datasets which examine the situation of the target unit over the long term are a good example of this. The mobility or work history of a person can very quickly lead to a situation where the possibility of indirect identification cannot be excluded, even if the data is aggregated to some extent. In the case of unit-specific datasets, the risk of disclosure should consequently be examined across a broad front, taking several attributes into account simultaneously. In general, anonymisation of unit-level datasets using aggregation and data delimitation results in small datasets mainly used as examples. Alternative data protection methods include data scrambling, (multiple) imputation or the production of synthetic datasets. 

Statistics Finland has produced anonymous unit-level datasets intended for teaching purposes. The results obtained from these datasets may be indicative, but they are in no way suitable for producing statistical reports or scientific research. Read more about teaching datasets.

Aggregated data refers to data in which the attribute values of several target units have been compiled. This data can be divided into frequency tables, which describe the numbers of target units, and quantity tables, which describe attribute values such as totals or averages of an attribute. For frequency tables, disclosure risk is controlled with a threshold value: the number of target units in each cell must reach the threshold at minimum. The threshold value depends on the attributes to be examined. In its official population statistics, Statistics Finland partly includes even individual persons in the statistics. Generally, however, at least three target units in a cell are required to ensure data protection. Applying this minimum value avoids situations where two target units sharing the same attributes could infer each other's values on the basis of the published data. Statistics Finland applies a higher threshold when examining data at a geographical level more accurate than a municipality (the threshold may be as high as fifty when looking at grid data), and a threshold of ten is usually applied to special category data referred to in the General Data Protection Regulation or to crime data. 
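As a sketch, the threshold rule for a frequency table amounts to flagging every non-empty cell whose count falls below the chosen threshold (three in the general case described above). The table contents here are hypothetical:

```python
def flag_primary_risk(frequency_table: dict, threshold: int = 3) -> set:
    """Return the cells of a frequency table that fall below the threshold.

    Cells with a non-zero count smaller than the threshold present a
    primary disclosure risk and must be protected before publication.
    """
    return {cell for cell, count in frequency_table.items()
            if 0 < count < threshold}

# Hypothetical table: (occupation, municipality) -> number of persons
table = {("mayor", "Espoo"): 1, ("teacher", "Espoo"): 240,
         ("mayor", "Vantaa"): 1, ("nurse", "Vantaa"): 2}
risky = flag_primary_risk(table)
```

With the default threshold of three, the two mayors and the cell of two nurses are flagged, while the cell of 240 teachers is safe to publish.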

For quantity tables, merely looking at the threshold value is not enough to prevent the attribute values of another target unit from being inferred if the target units are in the same cell. In this case, Statistics Finland also applies the dominance rule to identify cells involving a disclosure risk. This means that cells in which a single target unit, or several target units together, dominate (produce the majority of the value of the cell) are flagged as subject to protection. For example, if a table examines company turnover by industry and region, it should not be possible to infer the value of an individual large company in a cell where the other companies' turnovers are very low in proportion to the largest.
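A common formalisation of this idea is the (n, k) dominance rule: a cell is flagged if its n largest contributors together produce more than k per cent of the cell value. The parameter values and turnover figures below are illustrative, not Statistics Finland's actual parameters:

```python
def dominated(contributions, n=1, k=85.0):
    """Apply the (n, k) dominance rule to one cell of a quantity table.

    Returns True if the n largest contributions together exceed
    k per cent of the cell total, i.e. the cell must be protected.
    """
    total = sum(contributions)
    if total <= 0:
        return False
    top = sum(sorted(contributions, reverse=True)[:n])
    return 100.0 * top / total > k

# Turnover of companies in one industry/region cell (hypothetical figures):
# one large company dominates the cell total, so the cell is flagged.
cell = [9_200_000, 40_000, 25_000, 10_000]
```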

The threshold value or dominance rule can be used to determine the cells presenting a primary disclosure risk. If their values are simply deleted (masked) in the published dataset, recalculating them is easy if the dataset also contains marginals, i.e. the totals of rows and columns. In this case, additional masking must be used to ensure data protection. Special software, such as Tau-Argus and the sdcTable R package, is available for determining the cells subject to this secondary masking and ensuring adequate protection. More information about the software is available on GitHub.

For more information on data protection, see Statistics Finland’s resources for researchers:

HRI’s instructions for assessing the aggregation and anonymisation needs of survey data

In cooperation with the City of Helsinki's data protection officer, Helsinki Region Infoshare has created instructions for opening survey datasets (and other data containing personal data) (in Finnish).

Good practices, support materials and other publications of VAHTI working groups 

VAHTI is a cooperative, preparatory, and coordinating body of organisations responsible for the development of digital security in public administration. Organisations can use best practices and VAHTI guidelines to develop different areas of security. 

VAHTI activities were transferred to the Digital and Population Data Services Agency in early 2020.

Outdated recommendations can still be applied if legislative amendments are taken into account.

The Digital and Population Data Services Agency has produced several training packages on digitally secure life on eOppiva, for example online training on risk management in the digital world and the ABC of data protection.

Selecting the form of data sharing and file format

This part describes the forms in which data can be shared and what should be taken into account when selecting one. 

No official recommendations exist for determining the form in which datasets should be shared and the file formats to be used.

The selected data sharing method should be compliant with the legislation on access rights, data disclosure and providing data in machine-readable format as well as the obligations imposed by these statutes, such as sections 22 and 24 of the Act on Information Management in Public Administration. In addition, any needs to modify the datasets should be accounted for, including pseudonymisation or anonymisation.

Organisations that have already opened their data have shared data as files, through APIs or using a download service. The technical implementation of data sharing largely depends on the types of sharing solutions developed for the information system. Data in file format can be exported from the system as a batch report and/or through an API. Older information systems rarely have APIs, and developing one for them is rarely possible, which is why batch files may be the only option for sharing data. Whenever possible, the dataset should be shared in several different formats, for example offering a file in addition to an API.

When publishing open data, it is advisable to use open data formats (file formats) as far as possible. More information on selecting file formats for open data is available on the data.europa.eu service. 

Which sharing method is suitable for each type of data?

Sharing data as a file

A file is a suitable way of sharing small and/or static datasets that do not change much or often. High-quality open data is shared using an open file format.

An open file format means a non-commercial format that anyone can use free of charge. The use of open file formats is not restricted by copyrights, patents, trademarks or other restrictions. For example, Microsoft’s .docx and .xlsx file formats are commercial, not open, and using them with free software is difficult. Open file formats usually enable software-independent re-use of data. This is important to ensure that commercial rights do not restrict the re-use of data.

The list below contains tips for publishing different types of datasets:

  • Text data: TXT. The easiest and most reliable file format for publishing text files is .txt.
  • Tabular data: CSV. The best and easiest file format for tables is .csv (Comma-separated Values). CSV files can be easily created using common spreadsheet programs, including Microsoft Office Excel, by selecting csv as the file format when saving.
  • Spatial data, small vector datasets: GeoJSON, KML, Esri shapefile (shp) or GeoPackage. The first two options use the global WGS84 coordinate system, which can be easily processed with a variety of programs and tools. A shapefile, on the other hand, supports several coordinate systems, including those developed for Finnish conditions.
  • Spatial data, large raster datasets: GeoTIFF or NetCDF. To publish data in raster format, for example the GeoTIFF file format can be used. 
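As a sketch of the simplest case above, a small table can be exported to an open CSV format with nothing more than the Python standard library. The column names and figures are invented for illustration:

```python
import csv

# Hypothetical tabular dataset to be published as open data
rows = [
    {"municipality": "Helsinki", "population": 658457},
    {"municipality": "Espoo", "population": 297132},
]

# newline="" and an explicit encoding keep the output portable
with open("population.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=["municipality", "population"])
    writer.writeheader()
    writer.writerows(rows)
```

The resulting file opens in any spreadsheet program and can be parsed by any programming language, which is precisely the point of an open, non-proprietary format.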

When sharing data in PDF format, it is advisable to pay attention to the PDF version used and ensure that the data is in a machine-readable format. Adobe developed and patented PDF in the 1990s as a commercial file format. In 2008, its version 1.7 (ISO 32000-1) was standardised as an almost open format, while some of its features remained Adobe’s property (including Adobe XML Forms Architecture and Adobe JavaScript). In the PDF 2.0 version (ISO 32000-2) published in 2017, however, all features were open. Read more about open file formats.

Read a comprehensive list of open file formats in Wikipedia.

In Tim Berners-Lee's five-star model, data published in an open file format earns at least three of the five stars.

  • One star: make your data available on the Web (whatever format) under an open license.
  • Two stars: make it available as structured data (e.g. Excel instead of an image scan of a table).
  • Three stars: make it available in a non-proprietary open format (e.g. CSV instead of Excel).
  • Four stars: use URIs to denote things, so that people can point at your data.
  • Five stars: link your data to other data to provide context.

Sharing data through an API

What is an API?

Application Programming Interfaces (APIs) are documented interfaces through which software, applications, or systems can exchange data or functionalities. The API provides data or a functionality in a machine-readable, documented format, making it possible for some other software, application or system to use it programmatically. 

In this operating model, API, application programming interface and technical interface referred to in the Public Information Management Act mean the same thing. It should be noted that rather than referring to an interface intended for end users, APIs are always used by some other software, application, application component or system.

Why use APIs for data sharing?

Sharing data through APIs is in many ways advisable and useful, especially if the volume of the data is very large and the dataset is updated frequently or in real time, in other words comprises dynamic data such as train timetables or weather data. However, it is worth remembering that file sharing is also useful, especially for those persons and parties who are unable to use APIs. If an API is not otherwise in use, file sharing may demand significantly fewer resources from the sharing party than implementing and maintaining a new interface.

An API can be web-based, implemented with technologies such as REST, SOAP or GraphQL, or, for example, a file-based or database-based interface. The essential point is that the API provides data in a machine-readable, documented format, making it possible for some other software, application or system to use it programmatically. Providing the data through web-based interfaces is a good idea if this is possible and consistent with the purpose of use. 

Web-based APIs can be used in both internal and external interfaces, and a wide range of information security controls can be implemented in them. The file format to be shared depends on the communication protocol. For example, web interfaces usually use an HTTPS-based communication protocol or architecture, such as REST. APIs are also highly suitable for sharing statistical data residing in a database. For example, see the datasets in Statistics Finland's open databases.

It is important for the organisation to determine which types of datasets are offered or used through APIs internally and externally, and which datasets should be accessible through user interfaces. Internal access and use can be implemented through internal interfaces (internal APIs). To provide for external access and use, partner interfaces (partner APIs) or public interfaces (public APIs) can be used, depending on the classification of the data. The essential point is that the interfaces are taken into account as part of the organisation's other information management and operating processes as well as the goals of knowledge management. 

Using the national API principles in the design and development of APIs is advisable. The public administration API principles provide support and instructions for public administration actors in the development, management and file formats of APIs. Among other things, the API principles provide support for specifying the interfaces, assigning of responsibilities, promotion of interoperability, procurement, testing and implementation of APIs. When designing the API, it is important to specify how changes to the life cycle plan or service level of the API are managed.

Additional information and support material for API development, management and file formats:

Comparing the forms in which data can be shared

The table below can be used to support the selection of an appropriate form of data sharing. It strives to highlight the differences between the possible forms.

Comparison between a file and an API

How easy is it to use?
  • File: Usually the easiest to use. Small CSV files, for example, can be opened using standard office software.
  • API: APIs are often only used by people with programming skills. The design of the API affects its user-friendliness, and the entire life cycle of the API should be taken into account in its design.

Technical competence required of the administrator
  • File: No specific technical competence is required.
  • API: Requires competence in both developing and maintaining the interface.

Data volume
  • File: Suited to small data volumes.
  • API: Suited to large data volumes.

Data delimitation
  • File: When the data is published as a file, the entire dataset is always downloaded at once.
  • API: The data is delimited based on a query, or all of the data can be retrieved at once. The interface can also offer files.

Rate of data updates
  • File: A file is primarily suitable for sharing data that changes very little or infrequently. If the data changes, the updated version must be shared separately.
  • API: An API is often recommended for data that changes frequently.

Monitoring of use
  • File: Challenging, as a file is easy to copy.
  • API: Easy, as analytics can be collected on interface requests, including the IP address, query, time, date, query response, etc.

Practical examples
  • File: Postal codes, the State budget, most popular first names, small statistics.
  • API: Weather and timetable data, business data, mobility data.
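The data delimitation difference in the comparison above can be illustrated in code: a published file is always serialised and downloaded whole, while an API returns only the rows matching a query. The dataset, field names and functions below are invented for illustration.

```python
import csv
import io

# Invented example dataset: yearly visitor counts per city.
ROWS = [
    {"city": "Helsinki", "year": "2020", "visitors": "120000"},
    {"city": "Espoo", "year": "2020", "visitors": "40000"},
    {"city": "Helsinki", "year": "2021", "visitors": "150000"},
]

def publish_as_file() -> str:
    """File sharing: the entire dataset is serialised and downloaded at once."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["city", "year", "visitors"])
    writer.writeheader()
    writer.writerows(ROWS)
    return buf.getvalue()

def query_api(city: str) -> list:
    """API sharing: the response is delimited by the query parameters."""
    return [row for row in ROWS if row["city"] == city]
```

A file download always carries every row, including ones the user does not need; the API call transfers only the requested subset, which matters as data volumes grow.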

Examples of how organisations are sharing data

Methods used by the Finnish Meteorological Institute to share data

The Finnish Meteorological Institute shares its data through its own interface services and Amazon AWS.

Helsinki Region Infoshare’s tips for choosing the data sharing method

Helsinki Region Infoshare, the open data service of the cities in the Helsinki Metropolitan Area, has compiled tips for assessing technical feasibility. The questions below support the assessment. 

In which format should data be opened?

As a file:

  • File in which the data is maintained (xlsx/csv/shp/ …) 
  • The data is exported manually from the system
  • The data is exported automatically from the system
  • Usually a quick and free-of-charge way to open data, but often requires manual updates that rely on someone remembering to make them

Through an API:

  • An API is created for data that is exported from the system automatically
  • An API is produced for the system or a copy of the system
  • More work and resources are required at the beginning, but no separate updates are required
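The "data is exported automatically from the system" option above can be sketched as a scheduled job that dumps the relevant table to a CSV file. The table and column names below are invented for illustration; in practice the query would target the organisation's own system.

```python
import csv
import sqlite3

def export_to_csv(db_path: str, out_path: str) -> int:
    """Export the (invented) 'postal_codes' table to a CSV file.

    Returns the number of rows written. Run on a schedule (e.g. nightly),
    this removes the need for manual, memory-dependent updates."""
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute("SELECT code, name FROM postal_codes").fetchall()
    finally:
        conn.close()
    with open(out_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["code", "name"])
        writer.writerows(rows)
    return len(rows)
```

The same job can publish the file to the organisation's download service, so the shared copy never drifts out of date.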

Questions that should be considered when selecting the data format:

  • How often is the data updated?
  • How large is the data volume?
  • Is it real-time or, for example, annually compiled data?
  • How much manual work does editing the data take?
  • What could the data be used for?
  • Are there any standards?
  • Has any other party already opened similar data? How was it done? Would it be possible to open the data in a similar format?

Helsinki Region Infoshare uses a tool named Datasette, which makes it possible to publish data maintained in file format through a queryable interface. Read more about Datasette on HRI’s website (in Finnish).

Defining data quality

This step describes how the quality of the dataset to be opened can be evaluated, specified and described.

No official recommendations exist for defining the quality of datasets to be opened.

Organisations that have already opened their data have striven to describe their evaluation of the current quality of the data, including possible shortcomings, in the description, or metadata, of the dataset. In the opendata.fi service, for example, you can type the data quality evaluation in the Description field of the dataset’s metadata, or add the description as a separate resource in PDF format. 

It is important to note that even if the quality of the dataset to be opened is not as good as the party administering the data or its stakeholders might wish, this does not necessarily prevent sharing. The dataset can be shared, with attention drawn to its quality shortcomings in the metadata.
The public administration's shared data quality criteria and indicators developed to support improvement in the quality of public administration data can be used to evaluate and describe the data quality.
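In practice, the quality evaluation can be attached directly to the dataset's metadata. The sketch below uses CKAN-style field names (opendata.fi is CKAN-based, with the description held in a "notes" field), but the exact fields and the dataset itself are assumptions for illustration; check the target service's own documentation.

```python
# Sketch of dataset metadata carrying a quality evaluation in the
# description field; field names are CKAN-style assumptions.
dataset_metadata = {
    "name": "example-visitor-counts",
    "notes": (
        "Visitor counts per city, 2020-2021. "
        "Known quality shortcomings: the figures for 2020 are estimates, "
        "and data for some municipalities is missing."
    ),
    "resources": [
        {"url": "https://example.org/visitors.csv", "format": "CSV"},
    ],
}
```

Stating the shortcomings openly lets data users judge for themselves whether the quality is sufficient for their purpose.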

Data quality criteria

A public Data Quality Framework has been developed under the leadership of Statistics Finland and through broad-based cooperation within public administration. This work has been carried out as part of the Ministry of Finance's project on Opening up and using data. The data quality criteria and indicators were published in spring 2020. 

The data quality criteria can be used to describe and assess the quality of data. They also help data users to assess if the data is of sufficiently good quality for the intended purpose. In the longer term, the quality criteria help improve the quality of data and information resources.

The quality criteria are intended as a flexible tool; not all criteria or, in particular, indicators are necessarily relevant to every situation or dataset. It should also be noted that the purpose of the data affects the level that should be aimed for under each quality criterion. For instance, some purposes call for continuously updated data (pandemic monitoring), whereas for others annual or even less frequent updates (the location of old buildings) are sufficient. While the quality criteria and their indicators add up to a hierarchical structure, they also affect and are linked to each other.

The quality criteria of the quality framework, and especially their indicators, focus on structured data. From the data user’s perspective, the quality criteria for datasets have been grouped under three questions.

How well does the data describe reality?

  • Timeliness: Timeliness describes the time dimension of datasets. The closer the reference date of the dataset is to the present, the more timely the data will be. The reference date is the date to which the data relates.
  • Coherence (regularity, logical integrity of data): Coherence indicates that the dataset is consistent and non-contradictory. Coherence can also be used to describe consistency between different datasets.
  • Completeness (extensiveness): Completeness describes the temporal and regional coverage of the dataset as well as the target units and attribute data that are aimed for. In other words, completeness indicates the extent to which the dataset contains the desired data.
  • Correctness (validity): Correctness describes the extent to which the data in the dataset corresponds to reality. By examining the correctness of data, systematic distortions in the dataset may also be picked up.
  • Accuracy (unbiasedness): Accuracy describes how well the data in the dataset corresponds to what is aimed for and how precise the data is.

How has the data been described?

  • Traceability (non-repudiation): Traceability indicates that any changes made to the dataset and its data can be traced. The origin of the data is known.
  • Intelligibility (interpretability, comprehensibility): Intelligibility describes the extent to which the dataset has metadata that helps to understand the data when in use.
  • Compliance with recommendations (interoperability, semantic uniformity, consistency): Compliance indicates that the dataset and its attribute data comply with known standards, practices and statutes and that they have been reported in connection with the dataset.

How can the data be used?

  • Machine readability: Machine readability indicates if the data has been structured to enable computerised processing, and processing in different information systems.
  • Punctuality (timeliness): Punctuality means that the dataset is available on the given date and with sufficient frequency to reflect changes in the dataset.
  • Access rights: Access rights describe how the rights to use the dataset have been defined and what users may do with the data, in other words what purposes the data can be used for. 
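Some of the criteria above lend themselves to simple numeric indicators. The sketch below computes completeness as the share of expected values actually present, and timeliness as the age of the reference date; the formulas are this guide's own illustration, not the official indicators of the quality framework.

```python
from datetime import date

def completeness(records: list, fields: list) -> float:
    """Share of expected values actually present (1.0 = fully complete)."""
    expected = len(records) * len(fields)
    present = sum(
        1 for r in records for f in fields if r.get(f) not in (None, "")
    )
    return present / expected if expected else 0.0

def timeliness_days(reference_date: date, today: date) -> int:
    """Age of the data: days between the reference date and today.

    The smaller the value, the more timely the dataset."""
    return (today - reference_date).days
```

Indicators like these can be recomputed on every update and published alongside the dataset's metadata, making the quality description verifiable rather than a one-off statement.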

Read more:

Defining access rights

This step describes how the dataset to be opened should be licensed, in other words what terms of use should be set for the data.  

No official recommendations exist for defining the access rights for datasets to be opened.

Data access rights are defined by selecting a suitable licence that informs data users of the terms on which the published data may be used. The licence must be cited in the metadata of the dataset to be published. 

How do you select suitable access rights for the opened data?

In order to qualify as open data, the shared data must have an open licence that allows free sharing, modification and use of the dataset for all purposes, including commercial ones. A fully open licence (for example CC0) means that the organisation waives all copyrights that restrict the dataset’s use, within the limits of legislation.

Datasets published as open data are usually licensed under a Creative Commons CC BY 4.0 or CC0 licence. While no national recommendations currently exist in Finland for the licensing of public administration’s open datasets, the earlier JHS 189 recommendation on open data access rights recommended the use of the CC BY 4.0 licence. 
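When metadata is processed programmatically, the licence is usually recorded as a short identifier, which also makes it easy to check that a dataset's licence actually qualifies as open. A minimal sketch, with an intentionally small, illustrative allow-list (identifiers follow the common SPDX-style spelling):

```python
# Illustrative allow-list of fully open licences; not exhaustive.
OPEN_LICENCES = {"CC0-1.0", "CC-BY-4.0"}

def is_open_licence(licence_id: str) -> bool:
    """Check whether a licence identifier is on the open-licence list."""
    return licence_id.upper() in OPEN_LICENCES
```

A check like this can be run automatically when datasets are published, so that data labelled "open" is never shared under restrictive terms by mistake.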

Most common open data licences

Creative Commons CC0 1.0 Universal

The CC0 licence means that all copyrights to the data are waived. Data licensed under a CC0 licence has been fully released for free use, for both commercial and non-commercial purposes. The data user does not need to acknowledge the origin of the data or request permission for its use. 

The metadata of datasets is often published under a CC0 licence. For example, the metadata on hri.fi service is CC0-licensed, which makes it possible for the metadata to be automatically copied to the opendata.fi service.

Creative Commons Attribution 4.0 International (CC BY 4.0)

As its name indicates, the CC BY 4.0 (Creative Commons Attribution 4.0 International) licence obliges the data user to acknowledge the origin of the data. The data user must acknowledge the source, provide a link to the licence and indicate any changes made to the data. Apart from these obligations, CC BY 4.0 licensed data can be used freely.

Example of acknowledgement

Helsinki Region Infoshare recommends the following acknowledgement when using data published on the service: "Source: Revenue and expenditure of the City of Helsinki. Data maintained by Helsinki City Executive Office. Dataset downloaded from Helsinki Region Infoshare service on 15 November 2021 under Creative Commons Attribution 4.0 licence".
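An acknowledgement like the one above can be generated automatically from the dataset's metadata, which keeps attributions consistent across an organisation's datasets. The wording follows the HRI example; the function and parameter names are this guide's own illustration.

```python
def attribution(title: str, maintainer: str, service: str,
                download_date: str, licence: str) -> str:
    """Build a CC BY-style source acknowledgement from metadata fields."""
    return (
        f"Source: {title}. Data maintained by {maintainer}. "
        f"Dataset downloaded from {service} on {download_date} "
        f"under {licence} licence."
    )
```

A data portal can offer such a ready-made attribution string next to each dataset, lowering the threshold for users to comply with the licence terms.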

Example of a disclaimer

In addition to a licence, liability towards data users may sometimes need to be limited with a disclaimer.

(Organisation name) shall not be liable for any loss, litigation, claim, prosecution, cost or damage of whatsoever nature, caused either directly or indirectly by association with the open data published by (organisation name) or the use of the open data published by (organisation name).

Why should Creative Commons licences be used?

When using Creative Commons licences, the organisation knows in advance how, for example, disputes concerning data access rights will be handled. It is important that the organisation does not draft licences of its own, as the case law related to such licences cannot be predicted. 

The use of known licences also benefits data users:

  • The Creative Commons licences are internationally recognised, which makes cross-border use possible.
  • Aggregating and re-using datasets is easier when they are subject to consistent and familiar terms and conditions.

Read more about Creative Commons

Creative Commons is an international non-profit organisation that promotes the sharing and use of creativity and information by means of free legal tools. The free and user-friendly Creative Commons copyright licences provide an easy, standardised way to give the public the right to share and re-use creative outputs on specific terms and conditions. Rather than replacing copyright, the CC licences operate alongside it. 

Read more about Creative Commons Finland’s work (in Finnish).

Read more about open data licences on data.europa.eu.

Creative Commons offers assistance for selecting a suitable licence.

Deciding to open data

This step describes how the organisation can proceed in decision-making on opening data.

There are no official recommendations for making decisions on opening data.

Under Finnish legislation, the decision to open a dataset is made by the authority to which the task of administering the data has been assigned in legislation. For example, the Finnish Institute for Health and Welfare (THL) makes the decisions on providing its datasets as open data. There is no centralised body in Finland that makes decisions on the openness of data for the entire administration.

Data is opened for someone to use it. Organisations often know at least some of the customers who could use the data to be opened and see value in it. It is worth assessing the opportunities created by opening the data together with these potential data users, for example in a workshop. A description of what data the organisation could open is needed as input. Organisations often have a great deal of data, and only a small part of it can be opened at once.

At the same time, a decision on managing any residual risks should be made. Residual risk refers to the remaining risk or part of a risk that the organisation cannot or chooses not to counteract with measures. Read more about residual risks in risk management instructions (in Finnish).

If the organisation intends to open several datasets, it may need to prioritise the order in which they are opened and its development measures.

Practice of Helsinki Region Infoshare

In the City of Helsinki, no official decision on opening data is made; instead, the data owner specifies the data to be opened without a formal decision-making process. The number of datasets opened through HRI is small enough that no prioritisation is needed. Data opening processes that have been under preparation for longer are shown on the HRI website (in Finnish).

Practice of the Finnish Meteorological Institute

At the Finnish Meteorological Institute, decisions on opening data through an interface and prioritisation are made by a steering group operating within the Institute.

Practice of opendata.fi

The opendata.fi service is a free publication platform for the entire public administration. The platform operates on a self-service principle, which means that each authority can freely use it for opening and using data.