File:Wikipedias' article depth vs number of articles.png

Summary

Category:AI-generated images including prompts
Description
English: See m:User:Tim Ocean for info on this metric.
Prompt
InfoField
"create matplotlib python code to create a chart for the following data which is in the following format:

lang (this has the label for the dot)
Number of articles (x axis)
Words per article
Depth* (y axis)
(space)
here is the data:

(the data as below)
"
Date
Source

Own work

Raw data

(language, number of articles, words per article, depth*)
English 7085151 709.86 1336
Cebuano 6115898 216.86 209
German 3065898 551.76 742
French 2718357 675.29 1145
Swedish 2619342 186.88 232
Dutch 2201228 247.84 281
Spanish 2071945 634.49 1016
Russian 2069963 559.71 892
Italian 1942793 546.34 889
Polish 1673719 323.17 394
Ukrainian 1396184 395.25 603
Vietnamese 1296630 277.79 535
Portuguese 1159159 538.26 918
Catalan 783388 520.06 662
Finnish 606606 297.57 384
Czech 579646 486.58 660
Hungarian 562443 432.84 593
Serbo-Croatian 461167 264.15 503
Esperanto 377333 245.25 288
Lithuanian 223935 225.02 285
Latin 140669 154.71 169
Ido 59986 131.23 82
Volapük 45855 71.86 109
Scots 34282 225.50 359
Interlingua 30146 110.45 80
Kotava 29896 91.94 34
Interlingue 13358 301.30 155
Sardinian 7728 306.27 362
Kashubian 5495 80.62 65
Lingua Franca Nova 4490 357.24 283
Pennsylvania German 2039 75.91 106
Novial 1877 82.86 107
Tetum 1380 195.23 269
Lojban 1348 344.33 559
Gothic 976 101.75 162
Dinka 338 268.36 397
Cree 14 162.29 341
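Some language names in the raw data contain spaces ("Lingua Franca Nova", "Pennsylvania German"), so a parser for this block should split the three numeric columns off the right-hand side rather than splitting on every space. A minimal sketch (the `parse_rows` helper is illustrative, not part of the original code):

```python
# Parse whitespace-separated rows of the form
#   <language name> <articles> <words per article> <depth*>
# where the language name itself may contain spaces.
raw = """\
English 7085151 709.86 1336
Lingua Franca Nova 4490 357.24 283
Cree 14 162.29 341
"""

def parse_rows(text):
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue                      # skip blank separator lines
        # rsplit with maxsplit=3 keeps multi-word names intact,
        # because only the last three fields are numeric.
        lang, articles, words, depth = line.rsplit(None, 3)
        rows.append((lang, int(articles), float(words), int(depth)))
    return rows

rows = parse_rows(raw)
print(rows[1])  # ('Lingua Franca Nova', 4490, 357.24, 283)
```

The resulting tuples match the `data` list hard-coded in the Python code below.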

Author GPT-5, prompted by me using the data at Tim Ocean, with slightly modified Python code
Other versions
File:Depth star vs articles.png

PNG development

InfoField
 This plot was created with Matplotlib.
Category:PNG created with Matplotlib code#Wikipedias'%20article%20depth%20vs%20number%20of%20articles.png

Source code

Python code

import matplotlib.pyplot as plt
import numpy as np

# Data (lang, num_articles, words_per_article, depth)
data = [
    ("English",7085151,709.86,1336),
    ("Cebuano",6115898,216.86,209),
    ("German",3065898,551.76,742),
    ("French",2718357,675.29,1145),
    ("Swedish",2619342,186.88,232),
    ("Dutch",2201228,247.84,281),
    ("Spanish",2071945,634.49,1016),
    ("Russian",2069963,559.71,892),
    ("Italian",1942793,546.34,889),
    ("Polish",1673719,323.17,394),
    ("Ukrainian",1396184,395.25,603),
    ("Vietnamese",1296630,277.79,535),
    ("Portuguese",1159159,538.26,918),
    ("Catalan",783388,520.06,662),
    ("Finnish",606606,297.57,384),
    ("Czech",579646,486.58,660),
    ("Hungarian",562443,432.84,593),
    ("Serbo-Croatian",461167,264.15,503),
    ("Esperanto",377333,245.25,288),
    ("Lithuanian",223935,225.02,285),
    ("Latin",140669,154.71,169),
    ("Ido",59986,131.23,82),
    ("Volapük",45855,71.86,109),
    ("Scots",34282,225.50,359),
    ("Interlingua",30146,110.45,80),
    ("Kotava",29896,91.94,34),
    ("Interlingue",13358,301.30,155),
    ("Sardinian",7728,306.27,362),
    ("Kashubian",5495,80.62,65),
    ("Lingua Franca Nova",4490,357.24,283),
    ("Pennsylvania German",2039,75.91,106),
    ("Novial",1877,82.86,107),
    ("Tetum",1380,195.23,269),
    ("Lojban",1348,344.33,559),
    ("Gothic",976,101.75,162),
    ("Dinka",338,268.36,397),
    ("Cree",14,162.29,341),
]

langs, num_articles, words_per_article, depth = zip(*data)
num_articles = np.array(num_articles)
words_per_article = np.array(words_per_article)
depth = np.array(depth)

# Scale marker sizes (so they're visible but not huge)
area = (words_per_article - words_per_article.min()) / (np.ptp(words_per_article) + 1e-9) * 800 + 50

plt.figure(figsize=(12,8))
scatter = plt.scatter(num_articles, depth, s=area, c=words_per_article, cmap='viridis', alpha=0.85, edgecolor='k', linewidth=0.4)

# Use linear x-axis (remove log scale)
# plt.xscale('log')  # removed
plt.xlabel("Number of articles")
plt.ylabel("Depth*")
plt.title("Languages: Number of articles vs Depth (marker size = words per article, color = words per article)")

# Optionally set x limits for better visualization (uncomment and adjust if desired)
# plt.xlim(0, num_articles.max()*1.05)

cbar = plt.colorbar(scatter)
cbar.set_label("Words per article")

# Annotate a subset to avoid overcrowding: annotate top 25 by number of articles
idxs = np.argsort(-num_articles)[:25]
for i in idxs:
    plt.annotate(langs[i], (num_articles[i], depth[i]),
                 xytext=(5,3), textcoords='offset points', fontsize=9)

plt.grid(axis='y', alpha=0.3)
plt.tight_layout()
plt.show()
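The script above ends with `plt.show()`, which opens an interactive window. To write the PNG file this page hosts, the figure has to be saved instead; a sketch of that step (the filename and `dpi` value are assumptions, not taken from the original):

```python
# Save a figure to PNG without needing a display, using the
# non-interactive Agg backend (must be selected before pyplot is used).
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt

plt.figure(figsize=(12, 8))
plt.scatter([1, 2, 3], [4, 5, 6])      # stand-in for the real scatter call
plt.tight_layout()
plt.savefig("depth_vs_articles.png", dpi=150)  # writes the PNG to disk
plt.close()
```

`plt.savefig` can replace (or precede) `plt.show()` in the full script unchanged.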

Licensing

I, the copyright holder of this work, hereby publish it under the following license:
w:en:Creative Commons
attribution
This file is licensed under the Creative Commons Attribution 4.0 International license.
You are free:
  • to share – to copy, distribute and transmit the work
  • to remix – to adapt the work
Under the following conditions:
  • attribution – You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.
Category:Creative Commons Attribution missing SDC copyright status Category:CC-BY-4.0#Wikipedias'%20article%20depth%20vs%20number%20of%20articles.png Category:Creative Commons Attribution 4.0 missing SDC copyright license Category:Self-published work Category:Self-published work missing SDC copyright license
Public domain
This file is in the public domain because it is the work of a computer algorithm or artificial intelligence and does not contain sufficient human authorship to support a copyright claim.

The United Kingdom (legislation) and Hong Kong (legislation) provide a limited term of copyright protection for computer-generated works of 50 years from creation.
AI derivative works Legal disclaimer
Most image-generating AI models were trained using works that are protected by copyright. In some cases, such models can output content with major copyrightable image elements which are identical to or derivative of the original training data, making these outputs derivative works. Accordingly, there is a risk that AI-generated media uploaded on Commons may violate the rights of the authors of the original works. See Commons:AI-generated media for additional details.


Category:PD-algorithm#Wikipedias'%20article%20depth%20vs%20number%20of%20articles.png Category:Wikipedia article depth collaborativeness statistics Category:English-language charts Category:AI-generated charts with matplotlib code