Commons:Timed Text
TimedText is a custom Wikimedia Commons namespace to hold closed captioning text, or subtitles, to be associated with other media, such as audio or video files. This page intends to explain the feature’s concept and use.
Closed captioning (CC) and subtitling are both processes of displaying text on a television, video screen, or other visual display to provide additional or interpretive information. Both are typically used as a transcription of the audio portion of a program as it occurs (either verbatim or in edited form), sometimes including descriptions of non-speech elements. This aids hearing-impaired and deaf people and provides a way for non-native language speakers to understand the content in a multimedia file.
Viewing Timed Texts
Thumbnails of videos and audio clips that have closed captioning available will show the CC icon overlaid, even if the Timed Text does not contain actual Closed Captions.
After opening the player, subtitles in your language are automatically enabled.
Use the CC icon in the controls of the player to switch between languages, toggle subtitles on and off, or change the formatting of the subtitles.
Timed Text can be used for any media that is presented in a time sequence:
- Audio file
- Silent video
- Spoken video
- Animation demonstrating a concept or how something works
Actual examples
- Commons:Timed Text Demo Page, a page highlighting a few timed-text examples.
- TimedText:Krazy Kat Bugologist 1916 silent.ogv.de.srt, German captions
- TimedText:Krazy Kat Bugologist 1916 silent.ogv.en.srt, English captions
- TimedText:Wikipedia Edit 2014.webm.pl.srt, Polish captions
Discovering files having Timed Texts
- Search by prefix: select the TimedText: prefix and add the text after it, e.g. TimedText:Elephants_Dream.ogv.
- Enter a full page name (e.g. TimedText:Elephants_Dream.ogv.en.srt) to create a TimedText page; see Commons:Timed Text.
- {{Allpages|102}} lists all pages in namespace 102.
Commons needs a means to find Timed Text files for specific languages; the following suffer from the Search function's limitations (such as: it does not show all matches; it includes non-matches; it needs regular expression support).
Search including some Timed Text .en.srt files in different languages:
English • German • French • Portuguese • Russian • Swedish • Ukrainian • Polish • Indonesian
Other methods to help users find Timed Texts:
- {{Closed captions}} displays links to all the closed captioning files available for a file; it can be placed on a media page and its talk page.
- {{special|Prefixindex/TimedText:{{PAGENAME}}.|stripprefix|1|subtitles}} yields a link to all related Timed Text files (example).
- Commons:Timed Text/search by lang displays search links for all Timed Text files for a given language, useful for Commons pages, Categories and Talk pages.
Flagging and locating media needing subtitles
The {{Captions requested}} template can be used to flag media needing captions. The template adds the file to the Media needing subtitles category, so one can see for which media users or authors have requested transcripts.
This template and category are in the scope of Commons:WikiProject Deaf and its sisters meta:Deaf Wikimedians and Wikipedia:WikiProject Deaf.
General Timed Text creation instructions
There are comprehensive guides explaining and recommending certain practices in minute detail. This subsection is meant to give you just enough guidance to get you going.
TimedText page creation
Via file description page

This is the recommended method. Use the link at the top of any suitable multimedia file on Commons. A dialog lets you select the timed text content language; you do not need to look it up yourself.
Directly in the media player

By using the CC button in the toolbar of the Wikimedia HTML5 media player, you can select subtitles if they are available, or open the Subtitles editor to create subtitles for the video.
Creating a blank page
This is for advanced users.
You can always directly create the page in Commons using the template TimedText:[Common_File_Name.extension].[language].srt, where [Common_File_Name.extension] is the name of the file, and [language] is the ISO 639 code for the language.
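The naming pattern above can be sketched in a few lines of Python. This is only an illustration: the function name timed_text_title and its parameters are made up for this example and are not part of any Commons tool.

```python
def timed_text_title(media_file_name: str, language: str, fmt: str = "srt") -> str:
    """Return the TimedText page title for a media file and a language code.

    Hypothetical helper: it only applies the pattern
    TimedText:[Common_File_Name.extension].[language].srt described above.
    """
    # Accept names copied with or without the "File:" namespace prefix.
    if media_file_name.startswith("File:"):
        media_file_name = media_file_name[len("File:"):]
    return f"TimedText:{media_file_name}.{language}.{fmt}"


print(timed_text_title("File:Elephants Dream.ogv", "en"))
# prints: TimedText:Elephants Dream.ogv.en.srt
```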
Using the Subtitler tool
Use the tool Subtitler to add subtitles to a video.
Input
As of 2025, Commons supports only the SRT subtitle format for timed texts. Because of its simplicity it is supported by virtually all kinds of software.
Data format
SRT subtitles are a series of numbered cards associating displayed text with a playback time window.
1
00:00:20,000 --> 00:00:24,400
Subtitle card.
During playback the text “Subtitle card.” is displayed starting at 20 seconds and ending at 24.4 seconds (inclusive) into the media file.
Note the use of a comma instead of a period to separate seconds from milliseconds. To avoid simple syntax mistakes (e. g. writing a period out of habit), and because manual editing soon becomes tedious, it is strongly recommended to use proper editing software for creating subtitles, i. e. for anything beyond fixing a spelling mistake. There are also in‐browser editing sites if you do not want to or cannot install any software.
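To illustrate the timing syntax, here is a minimal Python sketch that converts an SRT timing line into milliseconds and rejects a period used as the millisecond separator. The function names are invented for this example, and the strictness of the pattern (at least two hour digits, exactly three millisecond digits) is an assumption; real players may be more lenient.

```python
import re

# HH:MM:SS,mmm with a comma before the milliseconds.
_TIMESTAMP = re.compile(r"(\d{2,}):([0-5]\d):([0-5]\d),(\d{3})")


def srt_time_to_ms(stamp: str) -> int:
    """Convert one SRT timestamp such as 00:00:24,400 to milliseconds."""
    match = _TIMESTAMP.fullmatch(stamp)
    if match is None:
        raise ValueError(f"not a valid SRT timestamp: {stamp!r}")
    hours, minutes, seconds, millis = (int(group) for group in match.groups())
    return ((hours * 60 + minutes) * 60 + seconds) * 1000 + millis


def parse_timing_line(line: str) -> tuple[int, int]:
    """Split 'start --> end' and return both times in milliseconds."""
    start, arrow, end = line.strip().partition(" --> ")
    if not arrow:
        raise ValueError(f"missing ' --> ' separator: {line!r}")
    return srt_time_to_ms(start), srt_time_to_ms(end)


print(parse_timing_line("00:00:20,000 --> 00:00:24,400"))
# prints: (20000, 24400)
```

A timestamp written with a period, such as 00:00:20.000, raises a ValueError in this sketch.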
SRT subtitle cards are separated by (at least) one blank line. Blank lines are lines that do not contain any characters, including space.
1
00:00:20,000 --> 00:00:21,500
Words more words.
2
00:00:21,500 --> 00:00:24,400
More.
Markup
Unfortunately, MediaWiki markup is not supported. The SRT subtitle format recognizes only a small set of markup:
- Bold: <b>Bold</b>
- Italic: <i>Italic</i>
- Underlined: <u>Underlined</u>
The casing of the tag names (<i> vs. <I>) does not matter.
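Before saving, a subtitle text can be checked for tags outside this small set. The following Python sketch is an illustration only; the helper name unsupported_tags is hypothetical, and the regular expression is a simple approximation of what counts as a tag.

```python
import re

# The three tags listed above; comparison is case-insensitive.
SUPPORTED_TAGS = {"b", "i", "u"}

# Matches opening and closing tags such as <i>, </I> or <font color="red">.
_TAG = re.compile(r"</?([A-Za-z][A-Za-z0-9]*)\b[^>]*>")


def unsupported_tags(text: str) -> set[str]:
    """Return the lower-cased names of all tags that are not b, i or u."""
    found = {match.group(1).lower() for match in _TAG.finditer(text)}
    return found - SUPPORTED_TAGS


print(unsupported_tags("Do <I>not</I> do that."))
# prints: set()
```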
Contents
As of 2025, the kind of timed text is decided on a case‐by‐case basis. You could for example transcribe music, show the score of a football match, or add citations. In practice all timed texts contain at least the dialog of the primary audio track, but may contain additional information:
- indication of passages without a transcript because they were [unintelligible] to the transcriber(s)
- sound cues
  - important sound cues, such as [derisive snort]
  - unimportant sound cues, such as sounds that have been accidentally picked up by the mic, e. g. a narrator’s [page flip] (of the manuscript)
- annotations
  - articulation description
    - emphasized words (Do <i>not</i> do that.)
    - unusual loudness (whispering, soft, screaming)
    - singing: ♪ Subtitles are so, so beautiful ♪
  - speaker attribution, esp. for off‐screen speech
  - conversion of imperial into metric units (or vice versa for American English subtitles)
  - corrections
    - correction of factual mistakes
    - correction of significant wording mistakes
  - if a pun (or similar) in the original language could not be emulated in the target language, a hint that there was a pun, maybe mentioning the involved words
With accessibility in mind, SDH (subtitles for the deaf and hard of hearing) are preferred. However, genuine SDH may amend dialogs to fit additional descriptive texts while also observing a reasonable CPS (characters per second) limit. On Commons we do not do that because there is (as of 2025) no way to document your decisions for other collaborators. Hence, the real sound is the authoritative version, as everyone capable of hearing can verify the timed text’s correctness.
At the end of media files, professional subtitles credit the subtitle authors, translators, editors and so on; this is not done on Commons because the revision history already credits them.
Extraction
Create Subtitles from DVD
To copy existing subtitles from a DVD you can use software such as SubRip. You can then copy and paste them into the TimedText page on Commons.
Create Subtitles with YouTube
YouTube allows users with a YouTube account to create subtitles for any uploaded file. Keep in mind that the speech recognition is automated and may produce unexpected results. It is preferable to upload a transcript of the file to YouTube, which gives a much better result. You can then copy and paste the subtitles into the TimedText page on Commons.
Steps to create the subtitles (a video tutorial of the steps can be found here):
- Upload the file. (The multimedia file must also include a video track, but you are free to choose a blank one or any other.)
- While uploading, set the Video language for your file to the appropriate language under the "Show more" menu.
  - Or, after uploading, select "Subtitles" in the specific video's Details or in the YouTube Studio navigation.
- Click on "Add" or "Add language".
- You can add subtitles in one of three ways:
  - Upload a transcript in the proper format.
  - Copy and paste the transcript.
  - Type manually while watching the video.
- The captions are then integrated into the video.
- Download the .sbv file from the Subtitles menu under the three dot menu while in the "Edit Timings" view.
- Convert the contents of the .sbv file into an .srt file. There are various online tools to help with this step.
  - ffmpeg is one open-source option (directions).
- Upload the .srt file to the corresponding page of the video on Wikimedia Commons.
Downloading subtitles from YouTube
You can download subtitles from a video on YouTube (and probably several other video websites) like so:
- Install yt-dlp
- Run yt-dlp --list-subs url (replace url with the YouTube URL)
- Run e. g. yt-dlp --write-subs --sub-langs en --sub-format vtt url (replace url with the YouTube URL)
- If srt subtitles are available, use that format instead of vtt; you can also download all subtitles at once
- Convert the vtt subtitles (or the format you have) to srt subtitles using a tool like FFmpeg (see: #Convert YouTube Subtitles to Timed Text format) or a web UI like this
- You can then paste these into the TimedText page of the video on Wikimedia Commons
If you use the tool video2commons, you can check “Import subtitles”, but that does not work for vtt subtitles (phab:T368298), so for these videos you also need to follow the steps above to import subtitles.
Converting scrolling captions to block captions
YouTube's auto-generated subtitles are scrolling captions. A program is available that converts these to block captions so they can be put on Commons. First, download the video and its automatic subtitles with yt-dlp --write-auto-subs url (replace url with the video URL). Then, use option 3 of the program. The result is usually acceptable, but the program tends to put "word. word," at the end of a block, although a full stop would be the natural place to end a block. This is a known limitation that has not been fixed, so check the block boundaries by hand.
Machine transcription
You can use the open source tool SoniTranslate to generate machine-transcribed subtitles more easily and quickly. You should check the result, especially if you also use the tool for machine translation into other languages. For example, it may output years as long texts instead of numbers or get people's names wrong. How to use this tool is described in Help:AI video dubbing.[1] If there are no existing subtitles to import, this is likely the fastest way to add TimedTexts. Depending on how long the video is, transcription usually takes only a few seconds, even if you don't have a GPU.
The timings are made so that they are well suited for dubbing videos into other languages, which is often not the case for manually made subtitles. You can edit the subtitles, then save them as an srt file and use that as input to the tool to let it create audio or subtitles in another language.
Creating subtitles with whisper.cpp
As of 2024, the Whisper AI models[1] are the most advanced speech transcription models available and can be run locally, either using Python or whisper.cpp. Unlike the earlier Vosk models, they will also produce punctuation, bringing their output much closer to a high-quality human transcription. All the same, you should check AI-generated subtitles against the video and correct mistakes, add punctuation, check correct spelling of people and place names, check facts and figures, etc. AI subtitles are very useful as a first draft, but often also contain some silly mistakes a human transcriber would not have made.
An advantage of whisper.cpp is that it is particularly optimized for running on the CPU rather than the GPU (so it is especially useful if you have an AMD graphics card and therefore no CUDA). But CUDA and Metal (on a Mac) are also supported, therefore it can easily adapt to different hardware configurations. Another advantage is that it does not require installing any external dependencies, i.e. no Python or PyTorch, since it is written in C++, making it a much smaller download than a Python machine learning environment.
Some video editing and closed captioning GUI software now features built-in Whisper functionality: Open source examples include the video editor Kdenlive (since version 23.04; requires Python) and Subtitle Edit (either Python or C++ can be used to run Whisper models).
But running the command-line version of whisper.cpp directly to create an SRT file is not too difficult either, provided your operating system has a C compiler, make, etc. to compile it with:
First, use e.g. ffmpeg to extract a video's audio track and convert it to 16 kHz sample rate:
ffmpeg -i some_video.ogv -ar 16000 -ac 1 -c:a pcm_s16le audio.wav
Next, compile whisper.cpp and download a model (the base model optimized for English content is about 140 MB; "medium" can also handle other languages and is about 1.5 GB) and then start the conversion with e.g.:
./main -m models/ggml-base.en.bin -f audio.wav -t 8 -pc -osrt
This will use 8 CPU threads and create an SRT file called audio.wav.srt in the same directory.
During recognition, words will be color-coded by confidence (green = very certain, red = very uncertain), so you can quickly see if the model is having trouble.
If a smaller model delivers unusable output, you can try a larger model, e.g. medium, which will be slower but produce better results.
You can also translate from other languages, e.g. adding "-l fr -tr" to the options will translate French audio to English.
Convert YouTube Subtitles to Timed Text format
SBV Subtitles
If you export the SBV format from YouTube subtitles, you can use ffmpeg to convert the subtitle file to the SRT (SubRip) format used by Commons. The -fix_sub_duration option also solves the overlap issue that is common when converting YouTube subtitles for Commons.
ffmpeg -fix_sub_duration -i ⟨input⟩.sbv ⟨output⟩.srt
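If ffmpeg is not at hand, the conversion can also be approximated with a short script. The Python sketch below is an illustration under stated assumptions: it assumes that each SBV card starts with a timing line of the form H:MM:SS.mmm,H:MM:SS.mmm followed by the text, and that cards are separated by blank lines. The function name sbv_to_srt is made up for this example. Unlike the ffmpeg option above, it does not repair overlapping cards.

```python
import re

# Assumed SBV timing line: start and end separated by a comma,
# with a period before the milliseconds.
_SBV_TIMING = re.compile(
    r"(\d+):(\d{2}):(\d{2})\.(\d{3}),(\d+):(\d{2}):(\d{2})\.(\d{3})"
)


def sbv_to_srt(sbv_text: str) -> str:
    """Convert SBV text to numbered SRT cards (timings are copied unchanged)."""
    if not sbv_text.strip():
        return ""
    cards = []
    # Cards are assumed to be separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", sbv_text.strip()):
        lines = block.splitlines()
        match = _SBV_TIMING.fullmatch(lines[0].strip())
        if match is None:
            raise ValueError(f"unexpected timing line: {lines[0]!r}")
        h1, m1, s1, ms1, h2, m2, s2, ms2 = (int(g) for g in match.groups())
        timing = (
            f"{h1:02d}:{m1:02d}:{s1:02d},{ms1:03d} --> "
            f"{h2:02d}:{m2:02d}:{s2:02d},{ms2:03d}"
        )
        text = "\n".join(lines[1:])
        cards.append(f"{len(cards) + 1}\n{timing}\n{text}")
    # SRT cards are separated by one blank line.
    return "\n\n".join(cards) + "\n"


print(sbv_to_srt("0:00:20.000,0:00:24.400\nSubtitle card.\n"))
```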
XML Subtitles

This section describes how to convert XML YouTube subtitles to the SubRip (srt) format, which is the TimedText subtitle format used on Wikimedia Commons.
If
- the YouTube video has subtitles in some language (e.g. I created this YouTube video with subtitles in English, in Russian and in Livvi-Karelian languages),
- this video was uploaded to Wikimedia Commons (e.g. this file),
- you want to copy YouTube subtitles to the same video at Commons.
Then:
- Download the subtitle in XML, put the ID of the YouTube video at the end of the URL: http://video.google.com/timedtext?hl=en&lang=en&v=__youtube_video_ID__
- Install Ruby.
- Download a Ruby program to convert video subtitles from YouTube's XML format to the SubRip format.
- Run this program to convert the XML file to an .srt file.
- Copy and paste the contents of the .SRT file into the corresponding page of the video on Wikimedia Commons.
Transitions
Subtitle cards may not overlap at any point in time. For the transition between subtitles there are two different styles:
- Subtitles for continuous speech are immediately next to each other. The end time of the earlier subtitle equals the start time of the subsequent subtitle. This can help keep the CPS at low values.
- There is always a constant gap (e. g. ⅒ of a second) between any two subtitles of continuous speech, giving the impression of a “flashing” effect and thus drawing the viewer’s attention to the subtitles. Sometimes speakers repeat one and the same phrase; the “flashing” can underscore this repetition.
Commons does not prescribe which style you follow, but you must be consistent. You may not switch between styles within the same timed text file. Both styles are in use for professionally created subtitles.
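The no-overlap rule and the gaps between cards can be checked mechanically. The Python sketch below assumes the cards are already available as (start, end) pairs in milliseconds and in playback order; the function names are hypothetical. Whether the gaps follow one style consistently is left to the reader, because the script cannot know which cards belong to continuous speech.

```python
def find_overlaps(cards: list[tuple[int, int]]) -> list[str]:
    """Report every card that starts before the previous card has ended."""
    problems = []
    for index in range(1, len(cards)):
        previous_end = cards[index - 1][1]
        start = cards[index][0]
        if start < previous_end:
            problems.append(
                f"card {index} overlaps card {index + 1} by {previous_end - start} ms"
            )
    return problems


def transition_gaps(cards: list[tuple[int, int]]) -> list[int]:
    """Return the gap in milliseconds between each pair of consecutive cards."""
    return [cards[i][0] - cards[i - 1][1] for i in range(1, len(cards))]


cards = [(20000, 21500), (21500, 24400)]
print(find_overlaps(cards))    # prints: []
print(transition_gaps(cards))  # prints: [0]
```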
Lead and Tail
So as not to “reveal” information before it is “disclosed” in speech, subtitles should not start before the respective speaker commences speech. The modal verb should means you may deviate from this rule. In general, you may add a more generous lead to educational content (e. g. narration in documentaries), which is the predominant kind of content on Commons. Some people are slow readers or just appreciate being granted more time when learning something new.
Similarly, subtitles should disappear shortly after the utterance concluded. Prefer increasing the tail time over increasing the lead time if the CPS value is high.
On occasion you may have a “negative tail”: In elongated speech (in particular singing, where words are stretched, e. g. in operas) it does not make sense to keep the subtitle on screen until the speaker has finally finished uttering the last syllable.
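The CPS (characters per second) value mentioned above is simply the number of displayed characters divided by the display duration. The Python sketch below makes its assumptions explicit: the b, i and u tags are not counted, line breaks count as one character, and spaces are counted. Counting conventions differ between tools, so treat the result as a rough guide. The function name is made up for this example.

```python
import re


def characters_per_second(text: str, start_ms: int, end_ms: int) -> float:
    """Return displayed characters per second for one subtitle card."""
    # Formatting tags are not displayed, so they are removed before counting.
    visible = re.sub(r"</?[biu]>", "", text, flags=re.IGNORECASE)
    # Assumption: a line break counts like a single space.
    visible = visible.replace("\n", " ")
    duration_s = (end_ms - start_ms) / 1000
    if duration_s <= 0:
        raise ValueError("end time must be after start time")
    return len(visible) / duration_s


print(characters_per_second("Do <i>not</i> do that.", 20000, 21500))
# prints: 10.0
```

Lengthening the tail increases the duration and therefore lowers the value returned for the same text.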
Videos
Subtitling videos requires taking account of the picture: Considering a standard video frame rate of 25 FPS, it is highly frowned upon to show or hide subtitles within 1 to 4 frames of a hard cut. Subtitles must appear or disappear either
- exactly on time as the hard cut happens, or
- significantly before or after a hard cut (usually more than 4 frames at 25 FPS).
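The frame rule can be expressed as a small check. The Python sketch below is an illustration with hypothetical names; it assumes the time of the hard cut is known in milliseconds and treats any distance greater than zero and up to 4 frames as too close.

```python
def too_close_to_cut(
    event_ms: int, cut_ms: int, fps: float = 25.0, max_frames: int = 4
) -> bool:
    """Return True if a subtitle start or end is near a hard cut but not on it."""
    # At 25 FPS one frame lasts 40 ms.
    distance_frames = abs(event_ms - cut_ms) * fps / 1000
    # Exactly on the cut (distance 0) is fine; more than max_frames away is fine.
    return 0 < distance_frames <= max_frames


print(too_close_to_cut(20000, 20000))  # prints: False (exactly on the cut)
print(too_close_to_cut(20080, 20000))  # prints: True (2 frames after the cut)
print(too_close_to_cut(20200, 20000))  # prints: False (5 frames after the cut)
```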
As of 2025, the most common form of displaying subtitles is the overlay method. Subtitles are rendered over (on) the video picture. This, however, may hide information such as the name and professional title of a person interviewed. Unfortunately, as of 2025 the accepted subtitle data formats do not offer a way to ensure such information is not covered by subtitles, in particular a “show this subtitle card at top edge” command is not available.
With accessibility in mind, subtitles for videos can, besides the contents mentioned above, include cards highlighting the absence of sounds.
For example the picture shows two people arguing orally (= mouth movement) but it cannot be heard what they are saying.
This discrepancy can be clarified with [no audible dialog] or similar.
Furthermore, you may want to consider adding annotating subtitle cards about on‐screen symbols, such as
- translations, e. g. of banners or signs,
- transcriptions of texts written in a foreign script, or
- for localized subtitles, explanation of relevant symbols that are virtually unknown in the target locality.
File name
In order to associate timed texts with their media files, the beginning of the timed text’s file name has to match the respective media file’s name.
That means all timed texts for File:some.video have TimedText:some.video as their prefix.
What comes after that is up to you, yet to provide a reasonable user experience it is customary to use a suffix indicating the principal natural language and file format, e. g. .en.srt for English‐language subtitles.
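The association rule and the customary suffix can be sketched as follows. This Python example is an illustration only; both function names are hypothetical, and the second function assumes the customary .[language].[format] suffix, which is a convention rather than a requirement.

```python
def belongs_to(timed_text_title: str, media_file_name: str) -> bool:
    """Check that a timed text title has TimedText:<media file name> as its prefix."""
    return timed_text_title.startswith(f"TimedText:{media_file_name}")


def split_timed_text_title(title: str) -> tuple[str, str, str]:
    """Split 'TimedText:<media file>.<language>.<format>' into its three parts."""
    prefix = "TimedText:"
    if not title.startswith(prefix):
        raise ValueError(f"not a TimedText title: {title!r}")
    # The last two dot-separated parts are taken as language code and format.
    media_file, language, fmt = title[len(prefix):].rsplit(".", 2)
    return media_file, language, fmt


print(split_timed_text_title("TimedText:Krazy Kat Bugologist 1916 silent.ogv.de.srt"))
# prints: ('Krazy Kat Bugologist 1916 silent.ogv', 'de', 'srt')
```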
Nonlingual
The “language” code zxx indicates non‐lingual content, for example a timed text showing a real time clock of surveillance footage.
Multilingual
For polylingual media there are multiple options:
- include all speech but leave untranslated
  - *.mul.srt file (mul = multiple languages code)
  - provides roughly the same experience as hearing people have
  - description of audio cues is unmanageable
- omit untranslated, yet still indicate: [speaks Asian]
  - *.en.srt file (for English subtitles)
  - provides a comparable experience as hearing people have
  - unsatisfactory especially if prolonged
- tag and translate: [speaks German] I doubt it.
  - *.en.srt file (for English subtitles)
  - even for polyglots switching between multiple languages can be difficult
  - subtitles do not become too heavy (as in the bilingual option below)
- bilingual subtitles (include both the original and translated version), spread across two lines
  - no naming convention, but this option allows using a #REDIRECT
  - readers proficient in the other language can read the original
  - because you cannot spread text across two lines you need more frequent cuts (more subtitle cards)
  - the “reading speed” can differ a lot between the languages, thus for one language the speed is too generous, for the other too fast
Again, use your best judgment, but stay consistent with your choice.
Signing
For signed speech there are multiple options, but de facto the last one is virtually always used.
- subtitle only spoken speech (and possibly describe sound cues) and name the timed text according to the primarily spoken language; this makes sense if there is a sign language interpreter purely for accessibility, yet the interpretation deviates (e. g. because of time pressure)
- indicate signed glosses; however
  - some notations require precise control of formatting that is, as of 2025, not possible
  - there is no ISO 639‑3 code for notations, e. g. gsg denotes German sign language yet it does not imply any specific notation
- translate signed speech to the closest corresponding orally spoken language, e. g. ASL → (American) English. Accordingly the timed text is named with a .en.srt suffix.
Multimedia
No naming scheme has been established for media files containing multiple streams differing in contents. By convention, without any extra indication in the file name, users expect timed texts to be suitable for the primary video track and primary audio track.
Of course this is not much of an issue if there is no intent to ever supply timed texts; e. g. there is a separate M & E soundtrack, which is actually meant to facilitate creation of dubbed versions, not to be listened to on its own.
Internationalization
Steps
Preferably after the subtitles have been transcribed in the original language of the video onto a Timed Text file, they can be translated into other languages as follows:
- Open the Timed Text file in the original language, say English for example TimedText:Elephants Dream.ogv.en.srt, in edit mode and copy the whole of the page.
- In the address bar replace en with the language code of your choice, say fr, then paste the original text in the new page.
- View the original video, then translate the text into your language.
- After saving the new page, the video with the subtitles should load onto the page; you can view it to check the timing of the subtitles.
- Add a category link to the talk page [[Category: Timed Text in Language Name|Language Name]]. For example, see TimedText talk:Elephants Dream.ogv.fr.srt.
Finding media that need subtitles translation
One way to find such videos is to open one of the subcategories of Category:Files with closed captioning depending on the preferred starting language, and then to use Help:FastCCI (on the top right of the page) to include only the videos that do not have subtitles for your preferred target language.
- To find videos with subtitles in English to translate them, go to Category: Files with closed captioning in English.
- Then, click on the FastCCI arrow to open the sub-menu and select “In this category but not in …”
- In the textbox, enter the corresponding category depending on your preferred target language:
  - For German, enter Files with closed captioning in German
  - For French, enter Files with closed captioning in French
  - For Russian, enter Files with closed captioning in Russian
  - etc.
Maintenance tasks
- Patrol changes in the TimedText namespace: Recent changes
- Find Orphaned TimedText pages that have no associated media file (any longer).
Timed Text talk
The namespace is for discussing the respective Timed Text pages, but it could also be used to link and categorize the Timed Text page.
Linking
How to associate closed captions with multimedia files?
- Redirect to avoid duplicated content, for example TimedText:Elephants Dream (high quality).ogv.pt.srt redirects to the existing TimedText:Elephants Dream.ogv.pt.srt. This ensures the closed captions template displays the correct file name of the caption files (this could be important with movie clips).
- {{Closed captions}}'s parameter is an alternative
- more support is needed for the Timed Text function;
- Categorizing: It is not possible to categorize the Timed Text page itself, but its Timed Text talk page can be categorized.
A possible categorization scheme is:
- Category:Timed Text, within Category:File formats and Category:Media types
  - Category:Timed Text in German, also within Category:Legend in German
  - Category:Timed Text in French, also within Category:Legend in French
  - Category:Timed Text in English, also within Category:Legend in English
Related categories: Category:Files with closed captioning
See also
- {{Captions requested}}
- SubRip
- Help:Namespaces lists all Commons namespaces
- Category:Video, base category for media about video
- Category:Videos, base category for video files
External sites
- National Captioning Institute (in English)
- The W3C Timed Text homepage (in English)
- Captions For Deaf and Hard-of-Hearing Viewers, National Institute on Deafness and Other Communication Disorders (in English)
Wikipedia articles about the topics of Timed Text or subtitles
These are articles about either Q844253: Timed text, or Q204028: subtitle.
- Dansk: Undertekster
- Deutsch: Untertitelung
- Ελληνικά: Υπότιτλοι
- English: Timed Text is also termed subtitles, closed captioning and closed caption text. See also Subtitle (captioning).
- Esperanto: Subtekstoj
- Español: Subtítulo
- Français : sous-titrage
- Interlingua: Subtitulos
- Italiano: Sottotitolo
- 日本語: 字幕
- 한국어: 자막
- Македонски: Толкување
- Nederlands: Ondertiteling
- Norsk bokmål: Undertekster
- Português: Legenda
- Русский: Субтитры
- Slovenščina: podnaslovi
- Svenska: Textning
- Українська: Субтитри
- 粵語: 字幕
- 中文:字幕
- Bahasa Indonesia: Teks Berwaktu
References
- [1] AI: Artificial Intelligence