File:Readability-model-results-barplot.png
Summary
| Description | English: Summary of results for the supervised models trained on 70% of the SEW data and tested on the held-out 30% of SEW, on VW, and on Klexikon. For each dataset there are three feature sets: all 7 language-agnostic features; readability metrics not customized for the language (only Flesch Reading Ease, FRE, so that the metric stays constant across all languages); and FRE customized for the language. For each feature set, we train three types of supervised ML models: Logistic Regression (LR), Support Vector Machines (SVM, with a linear kernel), and Random Forests (RF). In the VW datasets for German, Russian, and Spanish, the language-agnostic features perform best, especially with the RF models. For English, the readability features perform best in SEW. Notably, for languages other than French, the language-agnostic features always outperform the non-customized readability features. |
| Date | |
| Source | Own work |
| Author | MGerlach (WMF) |
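The description distinguishes a non-customized FRE from a customized one. As a rough illustration only, the sketch below computes FRE, taken here to mean Flesch Reading Ease, from raw counts using the standard English coefficients. The names `FreCoefficients`, `ENGLISH`, and `flesch_reading_ease` are hypothetical, and the assumption that "customized" means swapping in language-specific coefficients is ours; none of this is the code that produced the plot.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FreCoefficients:
    """Coefficients of a Flesch-style formula (hypothetical container)."""
    base: float = 206.835
    words_per_sentence: float = 1.015
    syllables_per_word: float = 84.6


# Standard Flesch Reading Ease coefficients for English.
# A language-customized variant would pass different coefficients (assumption).
ENGLISH = FreCoefficients()


def flesch_reading_ease(n_sentences: int, n_words: int, n_syllables: int,
                        coef: FreCoefficients = ENGLISH) -> float:
    """Return the reading-ease score; higher values mean easier text."""
    if n_sentences <= 0 or n_words <= 0:
        raise ValueError("need at least one sentence and one word")
    avg_sentence_length = n_words / n_sentences
    avg_syllables_per_word = n_syllables / n_words
    return (coef.base
            - coef.words_per_sentence * avg_sentence_length
            - coef.syllables_per_word * avg_syllables_per_word)


if __name__ == "__main__":
    # One sentence of ten words: fewer syllables gives a higher (easier) score.
    print(flesch_reading_ease(1, 10, 15))
    print(flesch_reading_ease(1, 10, 12))
```

Keeping the coefficients in a separate object means the same function covers both the non-customized case (English coefficients applied to every language) and a customized one (coefficients chosen per language).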
Licensing
I, the copyright holder of this work, hereby publish it under the following license:
This file is licensed under the Creative Commons Attribution-Share Alike 4.0 International license.
- You are free:
- to share – to copy, distribute and transmit the work
- to remix – to adapt the work
- Under the following conditions:
- attribution – You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.
- share alike – If you remix, transform, or build upon the material, you must distribute your contributions under the same or compatible license as the original.