Finna training corpora


The dataset contains TF-IDF matrices intended for machine learning. The matrices were generated from document corpora built from metadata that was extracted from the Finna.fi service in 2019 through its open API. Corpora are provided in Finnish, Swedish and English.
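A TF-IDF matrix has one row per document and one column per vocabulary term. Each cell weights how often a term occurs in that document, scaled down by how many documents in the corpus contain the term. The exact tokenisation, weighting formula and normalisation used to build these particular matrices are not stated on this page. The sketch below is only a minimal pure-Python illustration that assumes whitespace tokenisation, the smoothed IDF formula that scikit-learn uses by default, and L2-normalised rows. The function name `tfidf_matrix` and the sample documents are hypothetical.

```python
import math
from collections import Counter


def tfidf_matrix(docs):
    """Return (vocabulary, rows); rows[i][j] is the TF-IDF weight of vocabulary[j] in docs[i].

    Assumptions (not taken from the dataset description): lowercase whitespace
    tokenisation, smoothed IDF, L2-normalised rows.
    """
    tokenized = [doc.lower().split() for doc in docs]
    vocabulary = sorted({term for tokens in tokenized for term in tokens})
    n_docs = len(tokenized)

    # Document frequency: in how many documents each term appears at least once.
    df = Counter(term for tokens in tokenized for term in set(tokens))

    # Smoothed IDF: log((1 + N) / (1 + df)) + 1, so rare terms get larger weights.
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1 for t in vocabulary}

    rows = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        # Term frequency (count / document length) multiplied by IDF.
        row = [counts[t] / total * idf[t] if total else 0.0 for t in vocabulary]
        # L2-normalise so documents of different lengths are comparable.
        norm = math.sqrt(sum(w * w for w in row))
        rows.append([w / norm if norm else 0.0 for w in row])
    return vocabulary, rows


if __name__ == "__main__":
    docs = ["kirja historia suomi", "book history finland", "kirja kirja romaani"]
    vocab, matrix = tfidf_matrix(docs)
    print(vocab)
    for row in matrix:
        print([round(w, 3) for w in row])
```

In the third sample document, "kirja" gets a higher weight than "romaani" because it occurs twice, even though it also appears in another document. Real pipelines usually store such matrices in a sparse format, since most cells are zero.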

Resources (6)

Additional Info

Field Value
Outdated No
The training matrices were produced by CSC - IT Center for Science Ltd. (CSC - Tieteen tietotekniikan keskus Oy). The original data was collected by the National Library of Finland (Kansalliskirjasto).

Links to additional information
  1. https://github.com/NatLibFi/Annif-corpora/tree/master/training/2019
Collection type Open data
State Active
Dataset maintainer Analytics team (Analytiikkaryhmä)
Maintainer email analytics@csc.fi