File:Wikipedias' article depth vs number of articles.png
Summary
Description:
  Prompt: "create matplotlib python code to create a chart for the following data which is in the following format: lang (this has the label for the dot) "
Source: Own work
Author: GPT-5 prompted by me using the data at Tim Ocean and slightly modified Python code
PNG development
Source code
Python code
import matplotlib.pyplot as plt
import numpy as np
# Data (lang, num_articles, words_per_article, depth)
data = [
    ("English",7085151,709.86,1336),
    ("Cebuano",6115898,216.86,209),
    ("German",3065898,551.76,742),
    ("French",2718357,675.29,1145),
    ("Swedish",2619342,186.88,232),
    ("Dutch",2201228,247.84,281),
    ("Spanish",2071945,634.49,1016),
    ("Russian",2069963,559.71,892),
    ("Italian",1942793,546.34,889),
    ("Polish",1673719,323.17,394),
    ("Ukrainian",1396184,395.25,603),
    ("Vietnamese",1296630,277.79,535),
    ("Portuguese",1159159,538.26,918),
    ("Catalan",783388,520.06,662),
    ("Finnish",606606,297.57,384),
    ("Czech",579646,486.58,660),
    ("Hungarian",562443,432.84,593),
    ("Serbo-Croatian",461167,264.15,503),
    ("Esperanto",377333,245.25,288),
    ("Lithuanian",223935,225.02,285),
    ("Latin",140669,154.71,169),
    ("Ido",59986,131.23,82),
    ("Volapük",45855,71.86,109),
    ("Scots",34282,225.50,359),
    ("Interlingua",30146,110.45,80),
    ("Kotava",29896,91.94,34),
    ("Interlingue",13358,301.30,155),
    ("Sardinian",7728,306.27,362),
    ("Kashubian",5495,80.62,65),
    ("Lingua Franca Nova",4490,357.24,283),
    ("Pennsylvania German",2039,75.91,106),
    ("Novial",1877,82.86,107),
    ("Tetum",1380,195.23,269),
    ("Lojban",1348,344.33,559),
    ("Gothic",976,101.75,162),
    ("Dinka",338,268.36,397),
    ("Cree",14,162.29,341),
]
langs, num_articles, words_per_article, depth = zip(*data)
num_articles = np.array(num_articles)
words_per_article = np.array(words_per_article)
depth = np.array(depth)
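# Scale marker areas (points^2) linearly from 50 (fewest words per article) to 850 (most);
# the 1e-9 term avoids division by zero if all values were equal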
area = (words_per_article - words_per_article.min()) / (np.ptp(words_per_article) + 1e-9) * 800 + 50
plt.figure(figsize=(12,8))
scatter = plt.scatter(num_articles, depth, s=area, c=words_per_article, cmap='viridis', alpha=0.85, edgecolor='k', linewidth=0.4)
# Use linear x-axis (remove log scale)
# plt.xscale('log') # removed
plt.xlabel("Number of articles")
plt.ylabel("Depth*")
plt.title("Languages: Number of articles vs Depth (marker size = words per article, color = words per article)")
# Optionally set x limits for better visualization (uncomment and adjust if desired)
# plt.xlim(0, num_articles.max()*1.05)
cbar = plt.colorbar(scatter)
cbar.set_label("Words per article")
# Annotate a subset to avoid overcrowding: annotate top 25 by number of articles
idxs = np.argsort(-num_articles)[:25]
for i in idxs:
    plt.annotate(langs[i], (num_articles[i], depth[i]),
                 xytext=(5, 3), textcoords='offset points', fontsize=9)
plt.grid(axis='y', alpha=0.3)
plt.tight_layout()
plt.show()
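The marker-size line in the script is a min-max rescaling of words per article onto areas between 50 and 850 (matplotlib's `s` argument is in points squared), and the labels come from sorting by article count. A minimal sketch that isolates those two steps so they can be checked without drawing the chart; the helper names `scale_marker_areas` and `top_k_indices` are hypothetical, and only four rows of the data list are reused.

```python
import numpy as np


def scale_marker_areas(values, min_area=50.0, span=800.0, eps=1e-9):
    """Linearly map values onto marker areas in points^2 (matplotlib's unit for `s`).

    The smallest value gets min_area and the largest gets (almost exactly)
    min_area + span; eps avoids division by zero if all values are equal.
    """
    values = np.asarray(values, dtype=float)
    value_range = values.max() - values.min()  # same quantity as np.ptp(values)
    return (values - values.min()) / (value_range + eps) * span + min_area


def top_k_indices(counts, k):
    """Indices of the k largest counts, largest first (same idea as np.argsort(-x)[:k])."""
    return np.argsort(-np.asarray(counts))[:k]


# Four rows reused from the data list above: (lang, num_articles, words_per_article)
sample = [
    ("English", 7085151, 709.86),
    ("Volapük", 45855, 71.86),
    ("Cree", 14, 162.29),
    ("German", 3065898, 551.76),
]
langs = [row[0] for row in sample]
areas = scale_marker_areas([row[2] for row in sample])

for lang, area in zip(langs, areas):
    print(f"{lang}: marker area {area:.1f}")

# Labels that would be annotated if only the top 2 by article count were kept
print([langs[i] for i in top_k_indices([row[1] for row in sample], 2)])
```

In the full dataset Volapük also has the fewest words per article and English the most, so they get the smallest (50) and largest (about 850) markers in the chart. Because the data list is already sorted by article count, annotating the top 25 labels English through Interlingua and leaves the 12 smallest wikis, from Kotava to Cree, unlabeled.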
Licensing
- You are free:
  - to share – to copy, distribute and transmit the work
  - to remix – to adapt the work
- Under the following conditions:
  - attribution – You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.
This file is in the public domain because it is the work of a computer algorithm or artificial intelligence and does not contain sufficient human authorship to support a copyright claim.
The United Kingdom (legislation) and Hong Kong (legislation) provide a limited term of copyright protection for computer-generated works of 50 years from creation.
Legal disclaimer: Most image-generating AI models were trained using works that are protected by copyright. In some cases, such models can output content with major copyrightable image elements which are identical to or derivative of the original training data, making these outputs derivative works. Accordingly, there is a risk that AI-generated media uploaded on Commons may violate the rights of the authors of the original works. See Commons:AI-generated media for additional details.