CSC – IT Center for Science Ltd.

Search term: "korpus" (Finnish for "corpus")

3 datasets found

    The dataset contains TF-IDF data matrices intended for machine learning use. The matrices are generated from document corpora based on 7,400 Master's and doctoral theses published between 2010 and 2017, collected from the...

    Enterprise
    TXT

    The dataset contains TF-IDF data matrices intended for machine learning use. The matrices are generated from document corpora based on metadata extracted from the Finna.fi service in 2019 via its open API. There...

    Enterprise
    TXT

    The dataset contains TF-IDF data matrices generated from the "Ask a librarian" question/answer corpus and intended for machine learning use. The corpus is in Finnish. The data matrices are especially suitable for training Extreme...

    Enterprise
    TXT
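All three listings describe their contents as TF-IDF matrices built from document corpora. As a hedged illustration of what such a matrix holds, the sketch below builds a small one from a toy corpus in plain Python. The weighting (raw term count multiplied by log(N/df)), the whitespace tokenization, and the function name `tfidf_matrix` are assumptions made for this example; the catalog entries do not state which TF-IDF variant or preprocessing the published matrices use.

```python
import math
from collections import Counter


def tfidf_matrix(docs):
    """Build a dense document-term TF-IDF matrix from raw strings.

    Assumed weighting (illustrative, not taken from the datasets):
      tf  = raw count of the term in the document
      idf = log(N / df), with N = number of documents and
            df = number of documents that contain the term
    Returns (vocab, matrix): matrix[i][j] is the weight of vocab[j]
    in document i.
    """
    # Naive tokenization: lowercase and split on whitespace.
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    n_docs = len(tokenized)
    # Document frequency: each term is counted at most once per document.
    df = Counter(tok for toks in tokenized for tok in set(toks))
    idf = {term: math.log(n_docs / df[term]) for term in vocab}
    matrix = []
    for toks in tokenized:
        counts = Counter(toks)
        # One row per document, one column per vocabulary term.
        matrix.append([counts[term] * idf[term] for term in vocab])
    return vocab, matrix


if __name__ == "__main__":
    corpus = [
        "the library lends books",
        "the library answers questions",
        "the thesis studies machine learning",
    ]
    vocab, matrix = tfidf_matrix(corpus)
    print(vocab)
    for row in matrix:
        print([round(x, 3) for x in row])
```

Each row corresponds to one document and each column to one vocabulary term. A term that occurs in every document (here "the") receives an IDF of 0 and so contributes nothing. Library implementations such as scikit-learn's `TfidfVectorizer` apply smoothed IDF and L2 row normalization by default, so their values will differ from this unsmoothed sketch.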