Finna training corpora

This dataset contains TF-IDF matrices intended for machine learning use. The matrices were generated from document corpora built from metadata extracted from the Finna service in 2019 via its open API. Corpora are available in Finnish, Swedish and English.
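
For readers unfamiliar with the term, a TF-IDF matrix has one row per document and one column per term, with each cell weighting how often a term occurs in a document against how many documents contain it. The following is a minimal sketch of that idea only. The function name `tfidf_matrix`, the sample records, the whitespace tokenization and the plain `tf * log(N / df)` weighting are all assumptions made for illustration; the description does not state which weighting variant, preprocessing or file format the published matrices use.

```python
import math
from collections import Counter


def tfidf_matrix(docs):
    """Return (vocabulary, matrix): one row per document, one column per term.

    Hypothetical helper for illustration; not the pipeline used for the dataset.
    """
    # Naive tokenization: lowercase and split on whitespace (an assumption).
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    n_docs = len(tokenized)

    # Document frequency: in how many documents each term occurs.
    df = Counter(tok for toks in tokenized for tok in set(toks))

    matrix = []
    for toks in tokenized:
        counts = Counter(toks)
        row = [
            # Relative term frequency times inverse document frequency.
            (counts[term] / len(toks)) * math.log(n_docs / df[term]) if toks else 0.0
            for term in vocab
        ]
        matrix.append(row)
    return vocab, matrix


# Made-up stand-ins for metadata records, not taken from the dataset.
docs = ["kirja historia suomi", "kirja musiikki", "historia kartta suomi"]
vocab, matrix = tfidf_matrix(docs)
```

Terms that occur in few documents receive higher weights, and a term that is absent from a document gets a zero in that row, which is why such matrices are usually sparse.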


Additional Info

Collection Open Data
Maintainer CSC – IT Center For Science Ltd.
Maintainer email
Links to additional information
Update frequency
Last modified 26.02.2021
Created on 24.02.2021