Full Version: Question about UTF-8 and file formats
mikebell
Hi mironto & rest of the ST users!

First of all, thank you so much, mironto, for developing such a wonderful program! I started using it a week ago and love it. I do have a bunch of questions that I hope you, or someone else for that matter, could help me with.

1) I have noticed that there are two types of subtitle files out there... one looks like this:

The SRT file:

CODE

3
00:00:41,303 --> 00:00:43,350
That's the rock
Michael described.
.... etc.


Here's what I think these things are: 3 is the number of the subtitle in the list. 00:00:41 is the start time and 00:00:43 is the end time, i.e. how long the subtitle will be displayed.

What are 303 and 350? Milliseconds?

Second type I found looks like this:

The SUB file:

CODE

{1}{1}25.000
{71}{94}:"בפרקים הקודמים של "אבודים
.... etc.


I'm guessing the 25.000 is the framerate? What exactly are 71 and 94? Are these frames, or times in some kind of weird format?

2) If the framerate is not specified in the SUB file, is it safe to assume it's 25 (PAL)?

3) Foreign character text in these files... it is not in UTF-8 format, right? Why don't they use UTF-8 or UTF-16? Is it OK to convert them to UTF-8 when I save them?

Thanks!!!
adicoto
1. You are right, those are milliseconds. And 3 is the number of the line in the subtitle file. Don't mess this numbering up, or you'll end up with an unreadable subtitle.
2. Yes, they are frame numbers. For example, on a 25 fps file, it will look something like this:

{25}{50}How are you
{51}{100}Fine, thank you.

So, the line "How are you" will be displayed for 1 second (25 frames, from 25 to 50) and the second line for about 2 seconds (49 frames, from 51 to 100).
The framerate is not always specified in the file (some translators do write it in the first line, some don't). If it is missing, I would assume that the framerate is 23.976, as this is more common.
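
The frame arithmetic above can be sketched in a few lines of Python. This is only an illustration, not how Subtitles Translator does it: the function names are made up for the example, the framerate has to be supplied by the caller (25 or 23.976, as discussed), and it assumes the usual SUB convention that "|" separates lines inside one subtitle.

```python
import re

# Assumption: a SUB (MicroDVD-style) line looks like "{start}{end}text".
LINE_RE = re.compile(r"^\{(\d+)\}\{(\d+)\}(.*)$")


def frame_to_srt_time(frame: int, fps: float) -> str:
    """Convert a frame number to an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = round(frame * 1000 / fps)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    seconds, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


def sub_line_to_srt(index: int, line: str, fps: float) -> str:
    """Turn one SUB line into one numbered SRT entry."""
    match = LINE_RE.match(line.strip())
    if match is None:
        raise ValueError(f"not a SUB line: {line!r}")
    start, end, text = match.groups()
    start_time = frame_to_srt_time(int(start), fps)
    end_time = frame_to_srt_time(int(end), fps)
    # Assumption: "|" marks a line break inside a single subtitle.
    body = text.replace("|", "\n")
    return f"{index}\n{start_time} --> {end_time}\n{body}\n"


if __name__ == "__main__":
    print(sub_line_to_srt(1, "{25}{50}How are you", fps=25.0))
    print(sub_line_to_srt(2, "{51}{100}Fine, thank you.", fps=25.0))
```

With a 25 fps file, frame 25 comes out as 00:00:01,000 and frame 50 as 00:00:02,000. If the wrong framerate is assumed, every timestamp drifts proportionally, which is why the guess matters.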

I suggest leaving the encoding in plain text format, as some players still have trouble with UTF encoding.

PS. Welcome to our world, enjoy your stay.
mikebell
adicoto,

Thank you so much for clearing up the confusion! I now understand what's going on.

One quick thing about this:

QUOTE
I suggest leaving the encoding in plain text format, as some players still have trouble with UTF encoding.


How is the "code page" specified? If I'm editing a file in some weird codepage, how do I know which code page to open it as? I just don't understand this bit... This is why I was thinking about UTF-8...
adicoto
Just use ANSI. Open the file with Notepad; if Notepad can open it, then any player will handle it.
On the other hand, the script used is a different matter. If the text is, for example, Chinese and you open it without having support for that script installed, you will end up seeing... what you saw in that file.
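
On the code page question: a plain text subtitle file does not record which code page it was saved in, so a program can only try likely candidates. Below is a rough Python sketch of that idea. The function name and the candidate list are assumptions for illustration (cp1255 is the Windows Hebrew code page, which would fit the Hebrew sample earlier in the thread; cp1252 is Western European).

```python
from typing import Sequence, Tuple


def decode_subtitle(
    raw: bytes,
    candidates: Sequence[str] = ("utf-8-sig", "cp1255", "cp1252"),
) -> Tuple[str, str]:
    """Try each candidate encoding in order; return (text, encoding used).

    UTF-8 goes first because it is strict: bytes from a legacy code page
    rarely happen to form valid UTF-8. The single-byte code pages after it
    accept almost anything, so a "successful" decode there is only a guess.
    """
    for encoding in candidates:
        try:
            return raw.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    raise ValueError("none of the candidate encodings could decode the data")


def to_utf8(raw: bytes, candidates: Sequence[str] = ("utf-8-sig", "cp1255")) -> bytes:
    """Re-encode subtitle bytes as UTF-8 (without a byte order mark)."""
    text, _ = decode_subtitle(raw, candidates)
    return text.encode("utf-8")


if __name__ == "__main__":
    legacy = "אבודים".encode("cp1255")  # stand-in for bytes read from a file
    text, used = decode_subtitle(legacy)
    print(used, text)
```

The result still has to be checked by eye: a wrong single-byte code page usually decodes without any error and just shows the wrong letters. Whether to save as UTF-8 afterwards depends on the player, as mentioned above.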