Data matrix (test) for training XMTC machine learning models (TF-IDF) with TNPP lemmatisation

Maturity: Current version

URL: https://www.avoindata.fi/data/dataset/ce9fdd58-e128-4755-8155-3709f68cc6d7/resource/40f1a5d0-7898-44f0-ae0d-e323fb1ab885/download/kirjastonhoitaja-fi-tnpp.sparse

Data matrix for training XMTC machine learning models (TF-IDF) with TNPP lemmatisation. Contains only the test subset of the corpus. The text data follows the Bag-of-Words feature file format of The Extreme Classification Repository (http://manikvarma.org/downloads/XC/XMLRepository.html).

The first line is formatted as:

total_documents number_of_features number_of_labels

All other lines represent one document per line:

label1,label2,...,labelk ft1:ft1_val ft2:ft2_val ft3:ft3_val .. ftd:ftd_val

i.e., a comma-separated list of labels followed by all non-zero components of the TF-IDF vector, each given as component_number:value.
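
The format described above can be read with a few lines of Python. The following is a minimal sketch, not an official loader: the function name parse_xc_sparse and the three-line sample are hypothetical, and it assumes that labels and component numbers are integers and that a document with no labels starts directly with a component_number:value pair.

```python
def parse_xc_sparse(lines):
    """Parse an iterable of text lines in the format described above.

    Returns ((total_documents, number_of_features, number_of_labels), docs),
    where docs is a list of (labels, vector) pairs and vector maps
    component_number -> value for the non-zero TF-IDF components.
    """
    it = iter(lines)
    # First line: total_documents number_of_features number_of_labels
    n_docs, n_features, n_labels = (int(x) for x in next(it).split())

    docs = []
    for raw in it:
        tokens = raw.split()
        if not tokens:
            continue  # skip blank lines
        if ":" in tokens[0]:
            # Assumption: a line without labels starts with a feature pair.
            labels, feats = [], tokens
        else:
            # Assumption: labels are integer identifiers.
            labels = [int(label) for label in tokens[0].split(",")]
            feats = tokens[1:]
        vector = {}
        for tok in feats:
            idx, val = tok.split(":", 1)
            vector[int(idx)] = float(val)
        docs.append((labels, vector))
    return (n_docs, n_features, n_labels), docs


# Hypothetical sample: 2 documents, 5 features, 3 labels.
sample = [
    "2 5 3",
    "0,2 1:0.5 4:0.25",
    "1 0:1.0",
]
shape, docs = parse_xc_sparse(sample)
```

For the downloaded resource, an open text file handle can be passed in place of the sample list, since the function only needs an iterable of lines.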

Extra information

Last updated: February 24, 2021
Created: February 24, 2021
Format: TXT
License: Creative Commons CCZero 1.0

Technical extra information

Name: Datamatriisi (test) XMTC-koneoppimismallien koulutukseen (TF-IDF) TNPP-lemmatisointiin perustuen
Size: 658574
Data status: Current version
Coordinate system: upload
SHA256: 61aab49b5a5ad2d51388815c656f37a6b54e5c6d7b0dc5cde18b70121c0ff1de
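
The SHA256 value listed above can be used to check that a downloaded copy is intact. The following is a minimal sketch using Python's standard hashlib module; the helper names sha256_hex, file_chunks and verify are hypothetical, and the local file path passed to verify is whatever name the download was saved under.

```python
import hashlib

# Checksum as listed in the technical extra information above.
EXPECTED_SHA256 = "61aab49b5a5ad2d51388815c656f37a6b54e5c6d7b0dc5cde18b70121c0ff1de"


def sha256_hex(chunks):
    """Return the SHA-256 hex digest of an iterable of bytes objects."""
    digest = hashlib.sha256()
    for chunk in chunks:
        digest.update(chunk)
    return digest.hexdigest()


def file_chunks(path, chunk_size=65536):
    """Yield the contents of a file in fixed-size binary chunks."""
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            yield chunk


def verify(path, expected=EXPECTED_SHA256):
    """Return True if the file at path (hypothetical local copy) matches."""
    return sha256_hex(file_chunks(path)) == expected
```

Reading in chunks keeps memory use constant, although at the listed size the whole file would also fit in memory comfortably.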