MP3 File Internal Structure
These diagrams are my attempt to visualise the internal structure of an mp3 file using ID3 version 2.3 tags. They are based on what I have read on the Internet. I confess that I did not always fully understand what I was reading, so should you find that I have anything wrong, please contact me and I will happily correct it.
To fully understand what is going on you may need to do some homework. You will need to know about these topics. (Or you can just use the dll without worrying too much about what is happening inside.)
This diagram shows the overall structure of an MP3 file using ID3 version 2.3 metadata. In this context 'Tag' refers to the block of the file containing all of the v2.3 metadata. Beware: this is potentially confusing, as normally 'tag' refers to a single item of metadata, for example the artist's name. But not here.
There is the main header, and, optionally, an extended header. (None of the files in my collection had an extended header.)
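As a rough illustration of the main header, here is a minimal Python sketch. It assumes the layout given in the published ID3v2.3 specification: the three bytes "ID3", two version bytes, one flags byte, and a four-byte "synchsafe" size that uses only seven bits of each byte. The function name parse_id3v2_header and the dictionary keys are made up for this example and are not part of the dll.

```python
def parse_id3v2_header(data: bytes):
    """Parse the 10-byte ID3v2 main header; return None if no tag is present."""
    if len(data) < 10 or data[:3] != b"ID3":
        return None
    major, revision, flags = data[3], data[4], data[5]
    size_bytes = data[6:10]
    # Each size byte must have its top bit clear ("synchsafe" integer).
    if any(b & 0x80 for b in size_bytes):
        raise ValueError("invalid synchsafe tag size")
    size = 0
    for b in size_bytes:
        size = (size << 7) | b  # 7 useful bits per byte, most significant first
    return {
        "version": (major, revision),           # (3, 0) for ID3v2.3
        "unsynchronisation": bool(flags & 0x80),
        "extended_header": bool(flags & 0x40),  # the optional extended header
        "experimental": bool(flags & 0x20),
        "tag_size": size,                       # excludes the 10-byte header itself
    }
```

For example, parse_id3v2_header(open(path, "rb").read(10)) would report whether a file (path is hypothetical) starts with a v2.3 tag and whether the extended header flag is set.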
Remember that in an mp3 file the tag is a block of the file holding all the metadata, i.e. all of the things we commonly refer to as tags: artist, title, etc.
The tag is made up of frames, plus, optionally, padding. We are most interested in text frames as these hold the information about our music, one frame per item. There are other types of frame, most of which we can ignore.
A frame holds a single piece of information about the file.
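The following Python sketch shows one way the frames might be walked. It assumes the ID3v2.3 frame header layout from the specification: a four-character frame ID, a four-byte big-endian size (a plain integer in v2.3, not synchsafe), and two flag bytes. It also assumes that padding consists of zero bytes, so a frame ID starting with 0x00 marks the end of the frames. The name iter_frames is hypothetical, and the sketch ignores unsynchronisation and the per-frame compression and encryption flags.

```python
def iter_frames(tag_body: bytes):
    """Yield (frame_id, flags, payload) for each frame in an ID3v2.3 tag body.

    tag_body is assumed to be the bytes that follow the main header
    (and the extended header, if there is one).
    """
    pos = 0
    while pos + 10 <= len(tag_body):
        frame_id = tag_body[pos:pos + 4]
        if frame_id[0] == 0:
            break  # reached the zero-filled padding
        size = int.from_bytes(tag_body[pos + 4:pos + 8], "big")
        flags = int.from_bytes(tag_body[pos + 8:pos + 10], "big")
        start = pos + 10
        end = start + size
        if end > len(tag_body):
            break  # truncated or corrupt frame; stop rather than guess
        yield frame_id.decode("latin-1"), flags, tag_body[start:end]
        pos = end
```

A generator keeps the sketch short and lets the caller pick out only the frames of interest, for example those whose ID starts with "T".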
These are the frames that hold the textual data describing the track: artist='Prince', track='Purple Rain' etc.
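A text frame's payload can be decoded along these lines. This is a minimal Python sketch, assuming the ID3v2.3 convention that the first payload byte selects the encoding (0x00 for ISO-8859-1, 0x01 for UTF-16 with a BOM) and the rest is the text. The function name decode_text_frame is made up for this example, and stripping trailing NULL characters is a tolerance added here for files that include a terminator.

```python
def decode_text_frame(payload: bytes) -> str:
    """Decode the payload of an ID3v2.3 text frame such as TPE1 or TIT2."""
    if not payload:
        return ""
    encoding, raw = payload[0], payload[1:]
    if encoding == 0x00:
        text = raw.decode("latin-1")   # ISO-8859-1
    elif encoding == 0x01:
        text = raw.decode("utf-16")    # the codec consumes the BOM
    else:
        raise ValueError(f"unexpected text encoding byte {encoding:#04x}")
    # Some writers append a terminating NULL; drop it if present.
    return text.rstrip("\x00")
```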
We may also be interested in COMMENT frames. These may hold proprietary binary data, for example added by iTunes, or simple textual comments. I have chosen to process textual comments, whilst ignoring binary comments.
I could not find an online explanation of the internal format of these that I was fully able to understand and that corresponded to what I saw in the handful of test files that I examined. The diagram below is my best guess, but take it with a 'pinch of salt'.
In February 2021 I received several e-mails from 'Timmy' suggesting that I may have some of this wrong. Here are his comments on my COMM frame diagram:
----------------------------------------
Text Encoding, 1 byte
0x00 for ISO-8859-1, or
0x01 for UTF-16 with BOM, or
0x02 for UTF-16BE without BOM, or
0x03 for UTF-8.
Language Code, 3 bytes
E.g. "eng" (0x65, 0x6E, 0x67).
Comment Description, n byte(s)
Only if the text encoding is 0x01 UTF-16, the
description must start with a BOM; xFE xFF or xFF xFE.
The description may have a length of 0, but must end
with a terminating NULL character formatted accordingly:
- If the text encoding is 0x00 ISO-8859-1 or 0x03 UTF-8,
  the NULL character is 1 byte; 0x00.
- If the text encoding is 0x01 UTF-16 or 0x02 UTF-16BE,
  the NULL character is 2 bytes; 0x00 0x00.
Comment, n byte(s)
Comment ends abruptly without any termination by NULL
characters.
----------------------------------------
Thanks Timmy!
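Timmy's layout can be turned into a short Python sketch like the one below. It follows his description byte for byte: encoding, language code, NULL-terminated description, then the unterminated comment. The name parse_comm_frame and the lookup table are assumptions made for this example. Note that, as far as the published specifications go, encodings 0x02 and 0x03 were introduced with ID3v2.4, so a strict v2.3 file should only contain 0x00 or 0x01; the sketch accepts all four to match the list above.

```python
# Encoding byte -> (Python codec name, width of the NULL terminator in bytes)
_ENCODINGS = {
    0x00: ("latin-1", 1),    # ISO-8859-1
    0x01: ("utf-16", 2),     # UTF-16 with BOM
    0x02: ("utf-16-be", 2),  # UTF-16BE without BOM
    0x03: ("utf-8", 1),      # UTF-8
}

def parse_comm_frame(payload: bytes):
    """Split a COMM frame payload into (language, description, comment)."""
    if len(payload) < 4 or payload[0] not in _ENCODINGS:
        raise ValueError("not a recognisable COMM payload")
    codec, width = _ENCODINGS[payload[0]]
    language = payload[1:4].decode("latin-1")
    rest = payload[4:]
    terminator = b"\x00" * width
    # Scan in steps of `width` so a 2-byte NULL is only matched on a
    # character boundary, not across two neighbouring UTF-16 characters.
    end = -1
    for i in range(0, len(rest) - width + 1, width):
        if rest[i:i + width] == terminator:
            end = i
            break
    if end < 0:
        raise ValueError("description is not NULL-terminated")
    description = rest[:end].decode(codec)
    # The comment runs to the end of the frame with no terminator.
    comment = rest[end + width:].decode(codec)
    return language, description, comment
```

For a payload of b"\x00eng\x00Great track" this returns the language "eng", an empty description, and the comment "Great track".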
If anyone else knows more than me, and sees errors with these notes, please contact me and I will either annotate them appropriately or take them down.