Commons:Timed Text/zh

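Timed Text (TimedText) is a custom namespace on Wikimedia Commons that holds the text of closed captions or subtitles so that it can be associated with other media, such as audio or video files. This page explains the concept of the feature and how to use it.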

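Closed captioning (CC) and subtitling are both processes of displaying text on a television, video screen, or other visual display to provide additional or interpretive information. Both are typically a transcription of the audio portion of a program as it occurs (either verbatim or in edited form), sometimes including descriptions of non-speech elements. This helps deaf and hard-of-hearing people, and gives non-native speakers a way to understand the content of multimedia files.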

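Usage

Video player example

Thumbnails of video and audio clips that have closed captions show an overlaid CC icon. When you open the player, subtitles in your language are enabled automatically. You can use the CC icon in the player controls to switch languages, turn subtitles on or off, or change the subtitle formatting.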

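Timed text can be used with any media that is presented in chronological order:

  • Audio files
  • Silent video (mime)
  • Video with speech
  • Animations demonstrating a concept or how something works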

Practical cases

Finding

Find a timed text
Enter the name of the video to search for below. Do not delete the TimedText: prefix; add the name after it, e.g. TimedText:Elephants_Dream.ogv.
REMINDER: If the TimedText page does not exist, also add the language and extension (e.g. TimedText:Elephants_Dream.ogv.en.srt) to create a TimedText page; see Commons:Timed Text.
  • {{Allpages|102}} is rendered as TimedText and lists all pages in namespace 102.

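Commons needs a way to find timed text files for a specific language. The searches below suffer from limitations of the search function (for example, they do not show all matching results, they include non-matching content, and they would need regular expression support). Searches for .srt timed text files in several languages:

English, German, French, Portuguese, Russian, Swedish, Ukrainian, Polish, Indonesian

Other ways to help users find timed text:

  • {{Closed captions}} displays links to all closed caption files available for a file; it can be placed on a media page and its talk page.
  • {{special|Prefixindex/TimedText:{{PAGENAME}}.|stripprefix|1|subtitles}} generates a link to all related timed text files (example).
  • Commons:Timed Text/search by lang displays search links for all timed text files in a specified language; this is useful on Commons pages, categories, and talk pages.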

Flagging and finding videos that need subtitles

The {{Captions requested}} template can be used to flag media needing captions. The template adds the file to the Media needing subtitles category, so one can see for which media users or authors have requested transcripts.

This template and category fall within the scope of Commons:WikiProject Deaf and its sister project Wikipedia:WikiProject Deaf.

General Timed Text creation instructions

There are comprehensive guides explaining and recommending certain practices in minute detail. This subsection is meant to give you just enough guidance to get you going.

TimedText page creation

Via file description page

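Timed text link

Option 1: On the file's Commons file description page (recommended)

This is the recommended method. A dialog lets you select the timed text content language; you do not need to look it up yourself.

Option 2: Directly in the media player

Location of the CC button

Using the CC button on the toolbar of the Wikimedia HTML5 media player, you can select one of the available subtitles or open the subtitle editor to create subtitles for a video.

Option 3: Create a blank page (for advanced users)

This is for advanced users. You can always create the page directly on Commons using the pattern TimedText:[Common_File_Name.extension].[language].srt, where [Common_File_Name.extension] is the file name and [language] is the ISO code of the language.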

Using the Subtitler tool

Use the tool Subtitler to add subtitles to a video.

Input

As of 2025, Commons supports only the SRT subtitle format for timed texts. Because of its simplicity, it is supported by virtually all kinds of software.

Data format

SRT subtitles are a series of numbered cards associating displayed text with a playback time window.

1
00:00:20,000 --> 00:00:24,400
Subtitle card.

During playback the text Subtitle card. is displayed starting at 20 seconds and ending at 24.4 seconds (inclusive) into the media file.

Note the use of a comma instead of a period to separate seconds from milliseconds. To avoid simple syntax mistakes (e. g. writing a period out of habit), and because manual editing becomes tedious at some point, it is strongly recommended to use proper editing software when creating subtitles, i. e. whenever you are doing more than fixing a spelling mistake. There are also in‐browser editing sites if you do not want to or cannot install any software.

SRT subtitle cards are separated by (at least) one blank line. Blank lines are lines that do not contain any characters, including spaces.

1
00:00:20,000 --> 00:00:21,500
Words more words.

2
00:00:21,500 --> 00:00:24,400
More.
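
The card layout described above can be checked mechanically. The following is a minimal Python sketch, not the parser used by Commons: the names Card and parse_srt are hypothetical, and it assumes strictly formatted input (a number line, a timing line with comma‐separated milliseconds, then one or more text lines).

```python
import re
from typing import List, NamedTuple


class Card(NamedTuple):
    index: int
    start_ms: int
    end_ms: int
    text: str


# SRT timestamps use a comma before the milliseconds: HH:MM:SS,mmm
_TIME = r"(\d{2}):(\d{2}):(\d{2}),(\d{3})"
_TIMING_LINE = re.compile(rf"^{_TIME} --> {_TIME}$")


def _to_ms(h: str, m: str, s: str, ms: str) -> int:
    """Convert the four timestamp fields to milliseconds."""
    return ((int(h) * 60 + int(m)) * 60 + int(s)) * 1000 + int(ms)


def parse_srt(source: str) -> List[Card]:
    """Split SRT text into cards; raise ValueError on malformed cards."""
    cards = []
    normalized = source.replace("\r\n", "\n").strip()
    # Cards are separated by one or more blank lines.
    for block in re.split(r"\n{2,}", normalized):
        lines = block.split("\n")
        if len(lines) < 3:
            raise ValueError(f"incomplete card: {block!r}")
        match = _TIMING_LINE.match(lines[1].strip())
        if match is None:
            # A period instead of a comma ends up here.
            raise ValueError(f"bad timing line: {lines[1]!r}")
        fields = match.groups()
        cards.append(
            Card(
                index=int(lines[0]),
                start_ms=_to_ms(*fields[:4]),
                end_ms=_to_ms(*fields[4:]),
                text="\n".join(lines[2:]),
            )
        )
    return cards
```

Called on the two‐card example above, parse_srt returns two cards, the first running from 20000 ms to 21500 ms. A timing line written with a period, such as 00:00:20.000, raises ValueError.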

Markup

Unfortunately, MediaWiki markup is not supported. The SRT subtitle format recognizes only a small set of markup:

  • Bold <b>Bold</b>
  • Italic <i>Italic</i>
  • Underlined <u>Underlined</u>

The casing of the tag names (<i> vs. <I>) does not matter.
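
As a rough aid, a short Python sketch can flag tags outside the three listed above before you save a page. The function name unsupported_tags is hypothetical, and the check only looks for HTML‐style tags; it does not detect MediaWiki markup such as '''bold'''.

```python
import re

# The three tags named in this section; matching is case-insensitive.
ALLOWED_TAGS = {"b", "i", "u"}

# Matches opening and closing HTML-style tags and captures the tag name.
_TAG = re.compile(r"</?\s*([A-Za-z][A-Za-z0-9]*)[^>]*>")


def unsupported_tags(text: str) -> list:
    """Return the sorted, lower-cased names of tags other than b, i, u."""
    found = {match.group(1).lower() for match in _TAG.finditer(text)}
    return sorted(found - ALLOWED_TAGS)
```

For example, unsupported_tags("Do <I>not</I> do <span>that</span>.") reports only span, because <I> is accepted regardless of casing.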

Contents

As of 2025, the kind of timed text is decided on a case‐by‐case basis. You could for example transcribe music, show the score of a football match, or add citations. In practice all timed texts contain at least the dialog of the primary audio track, but may contain additional information:

  • an indication that words are missing from the transcript because they were [unintelligible] to the transcriber(s)
  • sound cues
    • important sound cues, such as [derisive snort]
    • unimportant sound cues, such as sounds that have been accidentally picked up by the mic, e. g. a narrator’s [page flip] (of the manuscript)
  • annotations
    • articulation description
      • emphasized words (Do <i>not</i> do that.)
      • unusual loudness (whispering, soft, screaming)
      • singing ♪ Subtitles are so, so beautiful ♪
    • speaker attribution, esp. for off‐screen speech
    • conversion of imperial into metric units (or vice versa for American English subtitles)
    • corrections
      • correction of factual mistakes
      • correction of significant wording mistakes
    • if a pun (or similar) in the original language could not be emulated in the target language, a hint that there was a pun, maybe mentioning the words involved

With accessibility in mind, subtitles for the deaf and hard of hearing (SDH) are preferred. However, genuine SDH may amend dialog to make room for additional descriptive text while also observing a reasonable characters‐per‐second (CPS) limit. On Commons we do not do that because there is (as of 2025) no way to document your decisions for other collaborators. Hence, the real sound is the authoritative version, as all people able to hear can verify the timed text’s correctness.
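
To make the CPS figure concrete, here is a minimal Python sketch. The function name chars_per_second is hypothetical, and the counting rule is an assumption: it counts every visible character including spaces, and ignores line breaks and the <b>, <i>, <u> tags. Subtitle editors differ in how they count, so treat the result as an estimate.

```python
import re

# Formatting tags are not read by the viewer, so they are not counted.
_FORMAT_TAG = re.compile(r"</?[biu]>", re.IGNORECASE)


def chars_per_second(text: str, start_ms: int, end_ms: int) -> float:
    """Estimate the reading speed of one subtitle card."""
    duration_s = (end_ms - start_ms) / 1000
    if duration_s <= 0:
        raise ValueError("card must end after it starts")
    visible = _FORMAT_TAG.sub("", text).replace("\n", "")
    return len(visible) / duration_s
```

A card reading "Do <i>not</i> do that." shown for exactly one second has 15 visible characters and therefore a CPS of 15.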

At the end of media files, professional subtitles credit the subtitle authors, translators, editors and so on; this is not done on Commons because the revision history already credits them.

Extraction

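Creating subtitles from a DVD

To copy the existing subtitles of a DVD, you can use software such as SubRip. You can then copy and paste them into the subtitle page on Wikimedia Commons.

Creating subtitles with YouTube

YouTube allows users who have a YouTube account to create subtitles from any uploaded file. Keep in mind that the speech recognition is automated and can produce unexpected results. It is best to upload a transcript of the file to YouTube; this gives better results. You can then copy and paste the subtitles into the subtitle page on Wikimedia Commons.

Steps for creating subtitles (a video tutorial of the steps can be found here).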

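  1. Upload the file. (The multimedia file must also contain a video track, but you are free to use a blank video track or any other video track.)
  2. When uploading, set the file's video language to the appropriate language under the "Show more" menu.
  3. Alternatively, after uploading, select "Subtitles" in the details of the specific video or in the YouTube Studio navigation.
  4. Click "Add" or "Add language".
  5. You can add subtitles in one of three ways:
    1. Upload a transcript in a suitable format.
    2. Copy and paste a transcript.
    3. Type manually while watching the video.
  6. The subtitles are then integrated into the video.
  7. In the "Edit timings" view, download the .sbv file from the subtitles menu under the three‐dot (...) menu.
  8. Convert the contents of the .sbv file to an .srt file. Several online tools can help with this step.
    1. ffmpeg is an open‐source option (guide).
  9. Upload the .srt file to the corresponding page for the video on Wikimedia Commons.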

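Downloading subtitles from YouTube

You can download subtitles from videos on YouTube (and probably several other video sites) like this:

  1. Install yt-dlp
  2. Run yt-dlp --list-subs url (replace url with the YouTube URL)
  3. Run, for example, yt-dlp --write-subs --sub-langs en --sub-format vtt url (replace url with the YouTube URL)
  4. SRT subtitles may also be available, in which case you should use those instead of VTT subtitles; alternatively, you can download all subtitles at once
  5. Use FFmpeg (see: #Convert YouTube Subtitles to Timed Text format) or a web UI tool such as this one to convert the VTT subtitles (or whatever format you have) to SRT subtitles.
  6. You can then paste these into the TimedText page of the video on Wikimedia Commons

If you use the video2commons tool, you can tick "Import subtitles", but this does not work for VTT subtitles (phab:T368298), so for those videos you also need to follow the steps above to import the subtitles.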

Converting scrolling captions to block captions

YouTube's auto‐generated subtitles are scrolling captions. I wrote a program that converts these to block captions so they can be put on Commons. First, download the subtitles with yt-dlp --write-auto-subs url (replace url with the video's URL). Then, use option 3. It works reasonably well, but it has a habit of putting "word. word," at the end of a block, which is undesirable because a full stop is a natural place to end a block. The code is long, and I expect this would be difficult to fix now.

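Machine transcription

You can use the open‐source tool SoniTranslate to generate machine‐transcribed subtitles more simply and quickly. It is better if you check the output, especially if you also use the tool for machine translation into other languages. For example, it may write years out as long text instead of digits, or get people's names wrong. How to use the tool is explained in Help:AI video dubbing.[1] If there are no existing subtitles to import, this is probably the fastest way to add timed text. Even if you do not have a GPU, transcription usually takes only a few seconds, depending on how long the video is.

These timed subtitles are well suited for dubbing the video into other languages, which manually created subtitles often are not. You can edit the subtitles, save them as an srt file, and use that file as input to the tool to have it produce audio or subtitles in other languages.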

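Creating subtitles with whisper.cpp

As of 2024, the Whisper speech recognition AI models[1] are the state of the art for speech transcription and can be run locally using Python or whisper.cpp. Unlike the earlier Vosk models, they also produce punctuation, which brings their output closer to a high‐quality human transcription. As always, you should check AI‐generated subtitles against the video and correct errors, add punctuation, check the spelling of personal and place names, check facts and figures, and so on. AI subtitles are very useful as a first draft, but they usually also contain silly mistakes that a human transcriber would not make.

One advantage of whisper.cpp is that it is specifically optimized to run on the CPU rather than the GPU (so it is especially useful if you have an AMD graphics card and therefore no Nvidia CUDA). CUDA and (on Mac) Metal are also supported, so it adapts easily to different hardware configurations. Another advantage is that it requires no external dependencies, i. e. no Python or PyTorch, because it is written in C++; this makes its download much smaller than a Python machine learning environment.

Some video editing and closed captioning GUI programs now have Whisper built in: open‐source examples include the video editor Kdenlive (since version 23.04; requires Python) and Subtitle Edit (which can run Whisper models with either Python or C++).

Running the command‐line version of whisper.cpp directly to create an SRT file is not too difficult either, provided your operating system has a C compiler, make, and similar tools to compile it:

First, use e. g. ffmpeg to extract the video's audio track and convert it to a 16 kHz sample rate: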

ffmpeg -i some_video.ogv -ar 16000 -ac 1 -c:a pcm_s16le audio.wav

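Next, compile whisper.cpp and download a model (the base model optimized for English content is about 140 MB; the "medium" model, which can also handle other languages, is about 1.5 GB), then start the conversion, for example: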

./main -m models/ggml-base.en.bin -f audio.wav -t 8 -pc -osrt

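This uses 8 CPU threads (-t 8) and creates an SRT file named audio.wav.srt in the same directory. During recognition, words are color‐coded by confidence (green = very confident, red = very unsure), so you can quickly see whether the model is having trouble. If a smaller model produces unusable output, you can try a larger one such as medium, which is slower but gives better results.

You can also translate from other languages; for example, adding "-l fr -tr" to the options translates French audio into English.

Convert YouTube Subtitles to Timed Text format

SBV subtitles

If you export subtitles from YouTube in SBV format, you can use ffmpeg to convert the subtitle file into the SRT (SubRip) format used by Commons. The -fix_sub_duration option also resolves the overlap problem that often occurs when converting YouTube captions for Commons.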

ffmpeg -fix_sub_duration -i ⟨input⟩.sbv ⟨output⟩.srt
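
If ffmpeg is not at hand, the format change itself is small enough to sketch in Python. The function name sbv_to_srt is hypothetical, and the sketch assumes the usual SBV layout: a timing line of the form H:MM:SS.mmm,H:MM:SS.mmm followed by the text, with blank lines between entries. Unlike the ffmpeg command above, it does not repair overlapping cards.

```python
import re

# Assumed SBV timing line: start and end separated by a comma,
# milliseconds separated by a period, hours not zero-padded.
_SBV_TIMING = re.compile(
    r"^(\d+):(\d{2}):(\d{2})\.(\d{3}),(\d+):(\d{2}):(\d{2})\.(\d{3})$"
)


def _srt_time(h: str, m: str, s: str, ms: str) -> str:
    """Format one timestamp the SRT way: HH:MM:SS,mmm."""
    return f"{int(h):02d}:{m}:{s},{ms}"


def sbv_to_srt(sbv: str) -> str:
    """Convert SBV text to numbered SRT cards."""
    cards = []
    blocks = re.split(r"\n{2,}", sbv.replace("\r\n", "\n").strip())
    for number, block in enumerate(blocks, start=1):
        timing, _, text = block.partition("\n")
        match = _SBV_TIMING.match(timing.strip())
        if match is None:
            raise ValueError(f"bad SBV timing line: {timing!r}")
        fields = match.groups()
        start = _srt_time(*fields[:4])
        end = _srt_time(*fields[4:])
        cards.append(f"{number}\n{start} --> {end}\n{text}")
    return "\n\n".join(cards) + "\n"
```

An entry such as 0:00:00.599,0:00:04.160 becomes card 1 with the timing line 00:00:00,599 --> 00:00:04,160.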

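XML subtitles

Slides 10-12 describe creating subtitles in YouTube Creator Studio, downloading the YouTube subtitles in SRT format, and uploading the result to TimedText on Wikimedia Commons.

This section explains how to convert XML YouTube subtitles to the SubRip (srt) format, which is the TimedText subtitle format used by Wikimedia Commons.

If:

  • a YouTube video has subtitles in some languages (for example, a YouTube video I made has subtitles in three languages: English, Russian, and Livvi‐Karelian),
  • the video has been uploaded to Wikimedia Commons (for example, this file),
  • you want to copy the YouTube subtitles to the same video on Commons.

Then:

  1. Download the subtitles in XML format, putting the ID of the YouTube video at the end of the URL: http://video.google.com/timedtext?hl=en&lang=en&v=__youtube_video_ID__
  2. Install Ruby.
  3. Download the Ruby program that converts YouTube's XML video subtitles to the SubRip format.
  4. Run the program to convert the XML file to an .SRT file.
  5. Copy the contents of the .SRT file and paste them into the corresponding page for the video on Wikimedia Commons.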

Transitions

Subtitle cards may not overlap at any point in time. For the transition between subtitles there are two different styles:

  • Subtitles for continuous speech are immediately next to each other. The end time of the earlier subtitle equals the start time of the subsequent subtitle. This can help keep the CPS at low values.
  • There is always a constant gap (e. g. ⅒ of a second) between any two subtitles of continuous speech, which gives the impression of a “flashing” effect and thus draws the viewer’s attention to the subtitles. Sometimes speakers repeat one and the same phrase; the “flashing” can underscore this repetition.

Commons does not prescribe which style you follow, but you must be consistent. You may not switch between styles within the same timed text file. Both styles are in use for professionally created subtitles.
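
Both rules (no overlap, one consistent style) lend themselves to an automatic check. The Python sketch below is only an illustration: the function name check_transitions is hypothetical, cards are given as (start, end) pairs in milliseconds, and the threshold below which two cards count as continuous speech is an assumption that you should adapt to your material.

```python
from typing import List, Tuple


def check_transitions(
    cards: List[Tuple[int, int]], continuous_ms: int = 500
) -> Tuple[List[Tuple[int, int]], bool]:
    """Return (overlapping card pairs, whether one gap style is used).

    cards holds (start_ms, end_ms) pairs in display order. Gaps larger
    than continuous_ms are treated as pauses in speech and ignored.
    """
    overlaps = []
    gaps = set()
    for i in range(len(cards) - 1):
        gap = cards[i + 1][0] - cards[i][1]
        if gap < 0:
            # The next card starts before the current one ends.
            overlaps.append((i, i + 1))
        elif gap <= continuous_ms:
            gaps.add(gap)
    # Consistent means every continuous transition uses the same gap,
    # either 0 ms (adjacent style) or one constant value (flashing style).
    return overlaps, len(gaps) <= 1
```

For instance, mixing one adjacent transition with one 100 ms gap is reported as inconsistent, while a file using 100 ms gaps throughout passes.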

Lead and Tail

So as not to “reveal” information before it is “disclosed” in speech, subtitles should not start before the respective speaker begins speaking. The modal verb “should” means you may deviate from this rule. In general, you may add a more generous lead to educational content (e. g. narration in documentaries), which is predominantly what is found on Commons. Some people are slow readers or simply appreciate being granted more time when learning.

Similarly, subtitles should disappear shortly after the utterance concluded. Prefer increasing the tail time over increasing the lead time if the CPS value is high.

On occasion you may have a “negative tail”: in elongated speech, in particular when singers stretch syllables as in operas, it does not make sense to keep the subtitle on screen until the speaker has finally finished uttering the last syllable.

Videos

Subtitling videos requires taking the picture into account: at a standard video frame rate of 25 FPS, showing or hiding subtitles within 1 to 4 frames of a hard cut is strongly discouraged.

Subtitles must appear or disappear either

  • exactly on time as the hard cut happens, or
  • significantly sooner or later (usually more than 4 frames at 25 FPS) before or after a hard cut.
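
The frame rule can be expressed as a small check. In this Python sketch the function name too_close_to_cut is hypothetical, and rounding the distance to whole frames is an assumption about how to treat timestamps that do not fall exactly on a frame boundary.

```python
def too_close_to_cut(
    event_ms: int, cut_ms: int, fps: float = 25.0, max_frames: int = 4
) -> bool:
    """Tell whether a subtitle appears or disappears 1 to max_frames
    frames away from a hard cut, which the rule above discourages.

    event_ms is the start or end time of a card, cut_ms the time of
    the hard cut. A distance of 0 frames (exactly on the cut) is fine.
    """
    frames = round(abs(event_ms - cut_ms) * fps / 1000)
    return 1 <= frames <= max_frames
```

At 25 FPS one frame lasts 40 ms, so a card ending 40 ms after a cut is flagged, while one ending exactly on the cut or 200 ms (5 frames) after it is not.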

As of 2025, the most common form of displaying subtitles is the overlay method. Subtitles are rendered over (on) the video picture. This, however, may hide information such as the name and professional title of a person interviewed. Unfortunately, as of 2025 the accepted subtitle data formats do not offer a way to ensure such information is not covered by subtitles, in particular a “show this subtitle card at top edge” command is not available.

With accessibility in mind, subtitles for videos can, besides the contents mentioned above, include cards highlighting the absence of sounds. For example, the picture shows two people arguing orally (= mouth movement) but it cannot be heard what they are saying. This discrepancy can be clarified with [no audible dialog] or similar.

Furthermore, you may want to consider adding annotating subtitle cards about on‐screen symbols, such as

  • translations, e. g. of banners or signs,
  • transcriptions of texts written in a foreign script, or
  • for localized subtitles, explanation of relevant symbols that are virtually unknown in the target locality.

File name

In order to associate timed texts with their media files, the beginning of the timed text’s file name has to match the respective media file’s name. That means all timed texts for File:some.video have TimedText:some.video as their prefix. What comes after that is up to you, yet to provide a reasonable user experience it is customary to use a suffix indicating the principal natural language and file format, e. g. .en.srt for English‐language subtitles.
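
The naming rule can be written down as two small helpers. This is a Python sketch with hypothetical function names; it only builds and compares page titles and does not contact Commons.

```python
def timed_text_title(file_name: str, language: str, fmt: str = "srt") -> str:
    """Build the customary TimedText title for a media file.

    file_name may be given with or without the "File:" prefix.
    language is the language code, e.g. "en", "mul" or "zxx".
    """
    if file_name.startswith("File:"):
        file_name = file_name[len("File:"):]
    return f"TimedText:{file_name}.{language}.{fmt}"


def belongs_to(timed_text_page: str, file_name: str) -> bool:
    """Check the prefix rule: the timed text title must begin with
    "TimedText:" followed by the media file's name."""
    if file_name.startswith("File:"):
        file_name = file_name[len("File:"):]
    return timed_text_page.startswith(f"TimedText:{file_name}")
```

For example, timed_text_title("File:Elephants Dream.ogv", "en") yields TimedText:Elephants Dream.ogv.en.srt.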

Nonlingual

The “language” code zxx indicates non‐lingual content, for example a timed text showing a real time clock of surveillance footage.

Multilingual

For polylingual media there are multiple options:

  • include all speech but leave it untranslated
    • *.mul.srt file (mul = multiple languages code)
    • provides roughly the same experience that hearing people have
    • description of audio cues is unmanageable
  • omit untranslated speech, yet still indicate it: [speaks Asian]
    • *.en.srt file
    • provides an experience comparable to that of hearing people
    • unsatisfactory, especially if prolonged
  • tag and translate: [speaks German] I doubt it.
    • *.en.srt file
    • even for polyglots, switching between multiple languages can be difficult
    • subtitles do not become too heavy (as in the bilingual option below)
  • bilingual subtitles (include both the original and translated version), spread across two lines
    • no naming convention, but this option allows the use of a #REDIRECT
    • readers proficient in the other language can read the original
    • because you cannot spread text across two lines, you need more frequent cuts (more subtitle cards)
    • the “reading speed” can differ a lot between the languages, so the pace may be too generous for one language and too fast for the other

Again, use your best judgment, but stay consistent with your choice.

Signing

For signed speech there are multiple options, but de facto the last one is virtually always used.

  • subtitle only spoken speech (and possibly describe sound cues) and name the timed text according to the primarily spoken language – this makes sense if there is a sign language interpreter purely for accessibility, yet the interpretation deviates (e.g. because of time pressure)
  • indicate signed glosses – however
    • some notations require precise control of formatting that is, as of 2025, not possible
    • there is no ISO 639‑3 code for notations, e. g. gsg denotes German sign language yet it does not imply any specific notation
  • translate signed speech to the closest corresponding orally spoken language, e. g. ASL → (American) English

Accordingly the timed text is named with a .en.srt suffix.

Multimedia

No naming scheme has been established for media files containing multiple streams differing in contents. By convention, without any extra indication in the file name, users expect timed texts to be suitable for the primary video track and primary audio track.

Of course this is not much of an issue if there is no intent to ever supply timed texts; e. g. there is a separate M & E soundtrack, which is actually meant to facilitate creation of dubbed versions, not to be listened to on its own.

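Internationalization

Steps

Once subtitles have been transcribed into a Timed Text file in the video's original language, they can be translated into other languages as follows:

  • Open the Timed Text file of the original language in edit mode, e. g. TimedText:Elephants Dream.ogv.en.srt for English, and copy the entire page.
  • In the address bar, replace "en" with the code of your chosen language, e. g. "fr", then paste the original text into the new page.
  • Watch the original video and translate the text into your language.
  • After you save the new page, the video with the subtitles should load on the page; you can watch the video to check the timing of the subtitles.
  • Add a category link [[Category: Timed Text in Language Name|Language Name]] on the talk page. For an example, see TimedText talk:Elephants Dream.ogv.fr.srt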

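Finding videos that need subtitles

One way to find such videos is to open a subcategory of Category: Files with closed captioning according to your preferred source language, then use Help:FastCCI (at the top right of the page) to list the videos that do not have subtitles in the desired target language.

  1. To find videos with English subtitles to translate, visit Category: Files with closed captioning in English
  2. Then click the FastCCI arrow to open the submenu and select "In this category but not in...".
  3. In the text box, enter the category that corresponds to your preferred target language:
    • for German, enter Files with closed captioning in German
    • for French, enter Files with closed captioning in French
    • for Russian, enter Files with closed captioning in Russian

and so on.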

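Maintenance

Timed text discussion

The TimedText talk namespace is used to discuss the respective timed text pages, but can also be used for linking and categorizing timed text pages.

Linking

This section needs expansion

How can closed captions be associated with multimedia files?

One possible categorization scheme is: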

 [[:Category:File formats]] + [[:Category:Media types]]
                       |
               [[:Category:Timed Text]] + [[:Category:Legend in German]]
                                   | 
                           [[:Category:Timed Text in German]]
 
                                   + [[:Category:Legend in French]]
                                   | 
                           [[:Category:Timed Text in French]]
 
                                   + [[:Category:Legend in English]]
                                   | 
                           [[:Category:Timed Text in English]]

Related category: Category:Files with closed captioning

See also

External sites

Wikipedia articles on timed text or subtitling

These are the articles associated with either Q844253 (timed text) or Q204028 (subtitles).

References

  1. AI: artificial intelligence
Category:Commons features/Translations Category:Commons video resources/Translations Category:Timed Text/Translations