This step can be implemented as follows
The person promoting data sharing in the organisation and the data administrator
- prepare comprehensive metadata for the data.
- determine where the data can be published.
- publish the data in a data portal, such as opendata.fi.
- together with the organisation’s communications unit, agree on how they will communicate about the publication of the data.
The organisation's IT professionals
- prepare the data for publication.
This step discusses why and how the organisation should describe the dataset to be opened and where the description data can be published. This description is also referred to as metadata.
Why is describing the dataset important?
Describing datasets helps data users, both people and computers, to understand the data and, consequently, facilitates its re-use. Information that describes the data is called metadata. Metadata provides information, for example, about the dataset's origin, structure or access rights. Without this information it would be impossible to use the data. When data is published on the Internet, its original context can easily be lost, which makes it particularly important to provide the user with metadata.
Let’s say that a dataset contains the number 37. Without metadata, the number 37 can refer to indoor temperature, shoe size, seating position or something else. You need metadata to help you understand the meaning of the number correctly. In this case, the metadata could tell you that the number 37 refers to an indoor temperature given in degrees Celsius.
Datasets usually have two types of metadata.
- Internal metadata of the dataset describes the data fields of the dataset and their connections.
- For example, the data model can specify the date format used in a table or what the table column ‘name’ refers to.
- To define the data model, the Data Vocabularies tool on the interoperability platform can be used.
- External metadata of the dataset describes the entire set, including its administrator and quality and how the data was produced.
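The difference between the two types of metadata can be sketched in a few lines of Python. This is only an illustration: the field names, the sample row and the `interpret` helper are hypothetical and do not come from any particular data model or tool.

```python
from datetime import datetime

# Hypothetical raw row, as a data user might receive it in a file.
row = {"measured_on": "03.02.2024", "value": "37"}

# Internal metadata: describes the data fields (meaning, type, unit, format).
internal_metadata = {
    "measured_on": {"label": "Measurement date", "type": "date", "format": "%d.%m.%Y"},
    "value": {"label": "Indoor temperature", "type": "number", "unit": "°C"},
}

# External metadata: describes the dataset as a whole.
external_metadata = {
    "title": "Indoor temperature measurements",
    "administrator": "Example organisation",
    "production_process": "Hourly sensor readings",
}


def interpret(raw_row, fields):
    """Convert raw strings into typed values using the field descriptions."""
    result = {}
    for name, raw in raw_row.items():
        spec = fields[name]
        if spec["type"] == "date":
            result[name] = datetime.strptime(raw, spec["format"]).date()
        elif spec["type"] == "number":
            result[name] = (float(raw), spec["unit"])
        else:
            result[name] = raw
    return result


interpreted = interpret(row, internal_metadata)
print(interpreted["measured_on"].isoformat())
print(interpreted["value"])
```

Without the date format in the internal metadata, the string "03.02.2024" could be read either as 3 February or as 2 March.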
No Information Management Board recommendation exists on drawing up metadata. As one example of current practice, the metadata published on the opendata.fi service uses the EU countries’ common DCAT-AP 2.0 data model.
Under the INSPIRE Directive, EU Member States have an obligation to prepare metadata for the spatial data sets and services covered by the Directive and publish them on a search service, which in Finland is Paikkatietohakemisto.
What is the DCAT-AP data model?
The DCAT-AP data model is based on the DCAT (Data Catalog) vocabulary created by the World Wide Web Consortium (W3C). The purpose of the DCAT vocabulary is to harmonise the metadata used by services intended for sharing different datasets.
DCAT-AP also harmonises the metadata required by data portals in different countries by providing a common metadata model that can be used by all portals. There are separate DCAT-AP versions for spatial data (GeoDCAT-AP) and statistical data (StatDCAT-AP).
Benefits of using the DCAT-AP data model:
- When all countries describe their datasets in the same way, datasets will be more interoperable and easier to use, also internationally.
- For example, a description of the data content is available for all datasets, and the descriptions should preferably also be offered in English.
- A predefined minimum volume of descriptive data guarantees that the datasets have been described to a sufficiently high standard.
- For example, it is impossible for data users to utilise data whose description data does not contain a licence, as the user cannot know what the terms of using the data are.
- The greater the amount of metadata available, the more discoverable the data is.
- For example, keywords added to metadata help data users find relevant data in different services.
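As a rough illustration of what a machine-readable description built on DCAT terms can look like, the sketch below assembles a JSON-LD record in Python. The `dcat:` and `dct:` terms come from the W3C DCAT vocabulary and Dublin Core; the function name, the sample values and the choice of fields are assumptions made for this example and do not represent the complete set of properties that DCAT-AP or any portal requires.

```python
import json


def build_dcat_dataset(title, description, keywords, licence_url, access_url):
    """Return a JSON-LD dictionary describing one dataset with DCAT terms."""
    return {
        "@context": {
            "dcat": "http://www.w3.org/ns/dcat#",
            "dct": "http://purl.org/dc/terms/",
        },
        "@type": "dcat:Dataset",
        "dct:title": {"@value": title, "@language": "en"},
        "dct:description": {"@value": description, "@language": "en"},
        "dcat:keyword": list(keywords),
        "dcat:distribution": [
            {
                "@type": "dcat:Distribution",
                "dcat:accessURL": {"@id": access_url},
                "dct:license": {"@id": licence_url},
            }
        ],
    }


# Hypothetical dataset used only to show the shape of the record.
record = build_dcat_dataset(
    title="Indoor temperature measurements",
    description="Hourly indoor temperatures in degrees Celsius.",
    keywords=["temperature", "buildings"],
    licence_url="https://creativecommons.org/licenses/by/4.0/",
    access_url="https://example.org/data/indoor-temperature.csv",
)
print(json.dumps(record, indent=2, ensure_ascii=False))
```

The licence is attached to the distribution, so a user can see the terms of use for each downloadable file.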
How should open data be described?
The metadata should be available in both human and machine readable formats.
It is advisable to provide the metadata in Finnish, but also in Swedish and English if possible. In particular, metadata in English is useful for potential international data users. The datasets described in the opendata.fi service can be automatically found in the data.europa.eu service, which compiles European data and thus gives international visibility to open data from different countries.
Organisations that have already opened their data have published descriptive information together with the dataset, including the title, context, producer, content and structure of the dataset. They have published this metadata either on a data portal (such as opendata.fi), on the organisation's website or both.
What kind of metadata should be provided?
The metadata of a dataset should include at least the following:
- basic information, including the name and licence
- When you publish a dataset in a data portal, there are usually mandatory fields for the necessary basic information.
- a description of the data content and possible shortcomings
- the data production process
- a description of data quality using the indicators of the public administration’s common quality criteria
- any calculation formulas associated with data production or similar, if possible
- contact details of the data administrator.
It is also advisable to specify in the metadata how often the dataset will be updated. The date on which the data will be updated and the update cycle are important information for data users. Data distributed through an API (such as weather data) may even be updated in real time, whereas data shared in file format (for example statistical data) may be updated less frequently, for example once a year.
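A check of this minimum metadata could be automated before publication. The following is a minimal sketch: the field names in `REQUIRED_FIELDS` and `RECOMMENDED_FIELDS` are hypothetical labels for the items listed above, not the field names of any actual portal.

```python
# Hypothetical field names; a real portal defines its own mandatory fields.
REQUIRED_FIELDS = (
    "name",
    "licence",
    "description",
    "production_process",
    "quality",
    "contact",
)
RECOMMENDED_FIELDS = ("update_frequency",)


def _is_missing(metadata, field):
    """Treat absent, None and whitespace-only values as missing."""
    value = metadata.get(field)
    return value is None or not str(value).strip()


def check_metadata(metadata):
    """Return (missing required fields, missing recommended fields)."""
    required = [f for f in REQUIRED_FIELDS if _is_missing(metadata, f)]
    recommended = [f for f in RECOMMENDED_FIELDS if _is_missing(metadata, f)]
    return required, recommended


# Hypothetical draft description with some gaps.
draft = {
    "name": "Indoor temperature measurements",
    "licence": "CC BY 4.0",
    "description": "Hourly indoor temperatures; sensor 4 was offline in March.",
    "contact": "data@example.org",
}

missing_required, missing_recommended = check_metadata(draft)
print("Missing required:", missing_required)
print("Missing recommended:", missing_recommended)
```

Calculation formulas are left out of the check because the list above asks for them only if possible.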
Metadata practices in opendata.fi service
When you publish information on opendata.fi, you must fill in at least the mandatory metadata fields. This description information is required to make it possible for users to discover the published data on the service and use it. This metadata includes a description and the licence. If you wish, you can also add a separate text file to the dataset, in which you can provide a more detailed description of the data.
The metadata required for datasets on the opendata.fi service is based on the DCAT-AP 2.0 data model. DCAT-AP is a mutual agreement between the EU Member States on what metadata should be provided for open data. Its purpose is to facilitate the utilisation of data by harmonising the metadata requirements in different countries. When all countries provide similar information on their datasets in a similar format, developing international applications is easier. DCAT-AP is based on the DCAT standard created by W3C (in English).
Opendata.fi has created and documented its own DCAT-AP extension (in English). Thanks to this extension, opendata.fi is also DCAT-AP compatible, the search function of the service works better, and datasets are easy to find on data.europa.eu, to which the metadata of the datasets are copied (harvested). A metadata template for describing datasets can also be found in the opendata.fi technical framework (in Finnish, Google docs).
Read more about DCAT-AP and opendata.fi extension.
Data.europa.eu offers more information about the DCAT metadata model and training videos.
Metadata policies of Paikkatietohakemisto
The Paikkatietohakemisto service maintained by the National Land Survey of Finland enables actors that produce spatial data to store and share metadata for both INSPIRE compliant and other spatial data sets and services.
The description of spatial datasets includes their geographic and temporal coverage, their production process, and any restrictions related to availability. The description of spatial data services includes the datasets offered and links to the actual service. When you enter the keyword opendata.fi in the metadata of open spatial datasets and services, they are harvested to the opendata.fi service. The keyword ‘non-inspire’ should be added to datasets and services not within the scope of the INSPIRE Directive to ensure that they do not end up in Finland’s INSPIRE monitoring results.
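The effect of the two keywords can be expressed as a small rule. This sketch only restates the paragraph above; the function name and the case-insensitive matching are assumptions, and the actual harvesting logic of the services may differ.

```python
def keyword_effects(keywords):
    """Report what the special keywords in a metadata record lead to."""
    # Assumption: keywords are compared case-insensitively, ignoring spaces.
    normalised = {keyword.strip().lower() for keyword in keywords}
    return {
        "harvested_to_opendata_fi": "opendata.fi" in normalised,
        "excluded_from_inspire_monitoring": "non-inspire" in normalised,
    }


# Hypothetical open spatial dataset outside the scope of INSPIRE.
effects = keyword_effects(["buildings", "opendata.fi", "non-inspire"])
print(effects)
```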
Nearly 300 producers of metadata have registered with the service, representing approximately 150 organisations. The metadata of more than 1,500 spatial datasets or services has been published. About one half of them are within the scope of INSPIRE.
Metadata practices of Helsinki Region Infoshare service
Datasets should be given names that can be understood by as wide a range of users as possible. It is also a good idea to write a non-technical description of the dataset, including
- the type of data it contains,
- how the data was produced or collected, and
- if the data has a particular feature the user should know about to be able to interpret it correctly.
The description should also openly inform users of any shortcomings and possible errors in the data. It would also be advisable to describe and write out the data attribute information and any abbreviations used in the dataset. The metadata of the dataset should also be translated into English.
Metadata practices of the Finnish Environment Institute
The Finnish Environment Institute uses a tool for describing the metadata (metadata editor) and has a separate metadata service intended for end users.
Metadata can be produced for datasets and information systems, interface services, environmental reports and research data. Specific metadata profiles have been implemented for each one of these. The metadata profiles of datasets and interface services are compliant with the requirements of the INSPIRE Directive.
Both the metadata editor and the metadata service offer open interfaces for harvesting metadata. Through these interfaces, descriptions of the Finnish Environment Institute’s datasets and interfaces are transferred to the Paikkatietohakemisto service maintained by the National Land Survey of Finland (metadata compliant with the INSPIRE Directive), the opendata.fi service maintained by the Digital and Population Data Services Agency (metadata of the Finnish Environment Institute’s open datasets) and CSC's Etsin.fi service.
The Finnish Environment Institute's metadata system includes a metadata editor and a metadata service intended for end users of metadata. Both services have an interface for harvesting metadata.
Metadata practices of Statistics Finland
Production of reliable statistics requires a wide array of background information about statistics and the subjects they describe. The information about statistics section contains metadata for Statistics Finland's statistics.
The metadata service contains the following metadata sets.
- Descriptions of statistics produced by Statistics Finland
- Concepts and their definitions
- Classifications that lay the foundation for all statistics.
Statistics Finland maintains and publishes national classification recommendations. Most of them are based on international standards confirmed by EU directives. Use of classification recommendations improves the comparability of statistical information produced at different times and in different areas.
Classifications consist of headings (the names of groups), the codes given to the groups (numerical or alphabetical), and descriptions of the groups (definitions). Classification refers to dividing the individual items of information present in statistical data into groups according to certain features, so that each unit belongs to only one group. In the classification, the groups are named and codes are issued to them. Statistics Finland classifications have also been published through an open interface service.
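The structure described above (code, heading, description, and exactly one group per unit) can be modelled as follows. The classification in this sketch is invented for illustration and is not an actual Statistics Finland classification; the `Group` class and the `classify` function are likewise hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Group:
    code: str         # numerical or alphabetical code
    heading: str      # name of the group
    description: str  # definition of the group


# Invented example classification.
BUILDING_TYPE = [
    Group("1", "Detached house", "A residential building with one or two dwellings."),
    Group("2", "Terraced house", "A residential building with attached dwellings."),
    Group("3", "Block of flats", "A residential building with stacked dwellings."),
]


def classify(units, classification, code_of):
    """Place every unit in exactly one group, keyed by the group code."""
    groups = {group.code: [] for group in classification}
    for unit in units:
        code = code_of(unit)
        if code not in groups:
            raise ValueError(f"No group with code {code!r} in the classification")
        groups[code].append(unit)
    return groups


# Hypothetical statistical units, each carrying one classification code.
buildings = [
    {"id": "A", "type_code": "1"},
    {"id": "B", "type_code": "3"},
    {"id": "C", "type_code": "1"},
]

grouped = classify(buildings, BUILDING_TYPE, lambda unit: unit["type_code"])
print({code: len(members) for code, members in grouped.items()})
```

Because each unit carries a single code, it can only end up in one group, and a unit with an unknown code is rejected rather than silently dropped.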
Using the interoperability platform
The Interoperability platform maintained by the Digital and Population Data Services Agency enables uniform description and specification of content and efficient and transparent cooperation between actors in information management. It consists of glossaries, code sets and data models needed in information flows and other information management.
The interoperability platform is intended for both public administration and the private sector. The platform is available free of charge for terminology work, code set management and data modelling. Data content producers are responsible for their own data specifications and their quality, and for keeping them up to date.
The ready-made data specifications on the interoperability platform can be freely used. Using existing code sets and data models in your system development is cost-effective and improves interoperability between the systems of different actors. The consistent use of concepts makes services easier to plan and understand.
The Interoperability platform makes use of the interoperability method (in Finnish), which helps to create and maintain the semantic interoperability of information, that is, processing in which the meaning of information remains unchanged in information flows. The interoperability method is a shared way of producing, managing and maintaining the information specifications and metadata needed for digital services and information flows.
Central to this are the uniformity and re-use of data specifications describing data content: maximum use is made of the existing vocabularies, code sets and data models. Data content in keeping with the interoperability method is described on the Interoperability platform, which is an open source online platform for creating machine-readable data specifications.
The interoperability method informs the creation of an organisation’s common core concepts, core classes and codes and the generalisation of descriptions produced by an organisation for everyone’s use. It also guides and provides a frame of reference for the use of these common descriptions when organisations produce their own descriptions.
Public documentation on the Interoperability platform (in Finnish) contains necessary information on how to join the Interoperability platform and basic instructions on how to get off to a good start with using the tools.
Publication and communication
This step describes how and where the dataset to be opened should be published and how its publication should be communicated.
No official recommendations exist for publishing and communicating about the datasets to be opened.
Organisations that have already opened their data have usually published the datasets to be opened and their metadata in a public data portal, ensuring that the datasets can be found as easily and quickly as possible.
Where should the opened data be published?
Several data sharing portals are available in Finland.
- Anyone can publish their open data on opendata.fi.
- The cities in the Helsinki Metropolitan Area publish their data on the HRI service as a rule.
It is advisable to make the datasets opened by central government actors visible on opendata.fi. This way, the message reaches a large number of people interested in using the data.
The opendata.fi service serves as a national open data contact point, and the metadata for the datasets available on it is harvested by and published on the data.europa.eu service administered by the European Commission. Opendata.fi contains a large volume of metadata for datasets published in other Finnish data portals; among other things, data are harvested from the Helsinki Region Infoshare service and Paikkatietohakemisto.
In addition to publishing the metadata for its dataset in a data portal, the organisation may also publish the data on its website.
Compliance with FAIR principles in data publication
The FAIR principles were originally developed for research data. However, they are also applicable to open data publication, even though not all of the principles can necessarily be followed as such.
FAIR stands for
- Findable,
- Accessible,
- Interoperable, and
- Reusable.
By following these principles, an effort is made to ensure that data can be easily re-used and that published data and its metadata are of a high quality.
Check that the open data you publish complies with the FAIR principles:
- Publish the data in a public data portal, such as opendata.fi.
- Make sure that the data is assigned a unique identifier.
- Publish the data in an open file format.
- Describe the data comprehensively.
- License the data under an open licence, such as Creative Commons BY 4.0.
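The checklist above could be run as a simple pre-publication check. The sketch below is illustrative only: the metadata keys, the lists of open formats and licence identifiers, and the use of a non-empty description as a stand-in for "described comprehensively" are all assumptions, not an official FAIR assessment.

```python
# Illustrative, non-exhaustive lists chosen for this example.
OPEN_FORMATS = {"csv", "json", "xml", "geojson"}
OPEN_LICENCES = {"cc-by-4.0", "cc0-1.0"}


def fair_checklist(metadata):
    """Return one True/False result per item of the checklist."""
    file_format = str(metadata.get("format") or "").lower()
    licence_id = str(metadata.get("licence_id") or "").lower()
    description = str(metadata.get("description") or "").strip()
    return {
        "published_in_portal": bool(metadata.get("portal_url")),
        "has_unique_identifier": bool(metadata.get("identifier")),
        "open_file_format": file_format in OPEN_FORMATS,
        # Crude proxy: a real review would assess the content of the description.
        "has_description": bool(description),
        "open_licence": licence_id in OPEN_LICENCES,
    }


# Hypothetical dataset description.
dataset = {
    "portal_url": "https://example.org/portal/dataset/indoor-temperature",
    "identifier": "https://example.org/id/dataset/indoor-temperature",
    "format": "CSV",
    "description": "Hourly indoor temperatures in degrees Celsius.",
    "licence_id": "CC-BY-4.0",
}

results = fair_checklist(dataset)
for item, passed in results.items():
    print(f"{item}: {'ok' if passed else 'missing'}")
```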
Read more about the FAIR principles and publishing open data (in Finnish).
How should I communicate about the publication of data?
It is advisable for an organisation to market the datasets it has opened using different methods of communication and encourage the use of the data.
For example, you can spread information about the opened data
- on the organisation's website,
- in a newsletter and
- on social media.
In addition, data can be presented to the organisation's networks, and stakeholders can be encouraged to use it by organising different events where the data is utilised.
The organisation should also encourage data users to tell the organisation about applications based on the open data published by it. Possible examples of applications can be showcased on social media, for example, to inspire other application developers and data users.
Publishing data on opendata.fi
Opendata.fi is a national open data service. Its aim is to make all open data in Finland available on a single site. The advantage of a national portal is that the organisation does not need to use its own resources to develop and maintain a portal. This also makes it easy for different users to find the datasets. Opendata.fi is based on open source code, and it is available in three languages.
Opendata.fi offers organisations that publish data:
- A free publishing platform for open datasets.
- International visibility. Data.europa.eu service harvests the opendata.fi service, which means that all data uploaded to the service can also be found in the European open data portal.
- Statistics. Opendata.fi provides various statistics, including on the use of the service and the popularity of organisations and datasets.
- Support materials for opening and publishing data.
Opendata.fi can also be used through an API. Read more about using opendata.fi through an API (in Finnish).
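As a rough sketch of what API use might look like, the Python below searches for datasets by keyword. It assumes that the portal exposes a CKAN-style Action API with a `package_search` action and that the base address in `API_BASE` is correct; both are assumptions that should be checked against the portal's own API documentation. The helper function names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed base address of a CKAN-style Action API; verify before use.
API_BASE = "https://www.avoindata.fi/data/api/3/action"


def build_search_url(query, rows=5, base=API_BASE):
    """Build the address of a dataset search request."""
    return f"{base}/package_search?{urlencode({'q': query, 'rows': rows})}"


def extract_titles(response):
    """Pick the dataset titles out of a decoded search response."""
    if not response.get("success"):
        raise RuntimeError("The API reported that the request failed")
    return [dataset.get("title", "") for dataset in response["result"]["results"]]


def search_datasets(query, rows=5):
    """Send the search request and return the titles of the matching datasets."""
    with urlopen(build_search_url(query, rows), timeout=10) as reply:
        return extract_titles(json.load(reply))


if __name__ == "__main__":
    # Requires network access; prints the titles of up to five matching datasets.
    for title in search_datasets("weather"):
        print(title)
```

Separating the request from the parsing makes it possible to test the parsing with a saved response, without calling the service.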
Publishing in Paikkatietohakemisto and on Paikkatietoikkuna.fi
Paikkatietohakemisto is a national metadata service maintained by the National Land Survey of Finland, in which data producers compile, publish and update metadata. Read more about publishing metadata in Paikkatietohakemisto (in Finnish).
Services are a key part of compliance with the INSPIRE Directive in Finland (in Finnish). The INSPIRE Directive imposes obligations on the authorities that manage or maintain spatial data sets within the scope of the directive. Read more about obligations under the INSPIRE Directive on the National Land Survey of Finland’s website (in Finnish).
Publishing on HRI service
The open data of the cities in the Helsinki Metropolitan Area (Helsinki, Espoo, Vantaa and Kauniainen together with their joint municipal authorities) is published on hri.fi, from where the metadata for the datasets is harvested automatically to opendata.fi.
The opened datasets and major updates should be communicated about as extensively as possible. If possible, it is advisable to communicate about such procedures as interface version updates in advance. In addition to the organisation's website and the intranet, suitable communication channels include newsletters, social media channels and various events. Information can be spread by several parties: in the Helsinki Metropolitan Area, for example, the party opening the data and the HRI service communicate about its opening.