What is MIME (Multipurpose Internet Mail Extensions)

What is MIME?

MIME stands for Multipurpose Internet Mail Extensions. As the name indicates, it is an extension to the Internet email protocol that allows its users to exchange different kinds of data files over the Internet, such as images, audio, and video. MIME is also required when a message contains text in character sets other than ASCII. Virtually all human-written Internet email and a fairly large proportion of automated email is transmitted via SMTP in MIME format. MIME was designed mainly for SMTP, but the content types defined by the MIME standards are also important in communication protocols outside of email, such as HTTP.

In 1991, Nathaniel Borenstein of Bellcore proposed to the IETF that SMTP be extended so that Internet (but mainly Web) clients and servers could recognize and handle kinds of data other than ASCII text. As a result, new file types were added to “mail” as a supported Internet Protocol file type.

How MIME Works

Web servers insert a MIME header at the beginning of every Web transmission. Clients use this content-type (or media-type) header to select an appropriate “player” application for the type of data the header indicates.
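As a rough illustration of how a client might act on this header, the Python sketch below parses a Content-Type value with the standard library's email.message.Message class and looks up a handler for it. The HANDLERS table and the function names parse_content_type and pick_handler are hypothetical, introduced only for this example; real clients use much richer logic.

```python
from email.message import Message

# Hypothetical mapping from media type to a "player" application.
HANDLERS = {
    "text/html": "browser",
    "image/jpeg": "image viewer",
    "audio/mpeg": "audio player",
}


def parse_content_type(value):
    """Return (media type, charset) parsed from a Content-Type header value."""
    msg = Message()
    msg["Content-Type"] = value
    return msg.get_content_type(), msg.get_content_charset()


def pick_handler(value):
    """Choose a handler by media type, falling back to saving the data."""
    media_type, _charset = parse_content_type(value)
    return HANDLERS.get(media_type, "save to disk")


print(parse_content_type("text/html; charset=UTF-8"))  # ('text/html', 'utf-8')
print(pick_handler("image/jpeg"))                      # image viewer
```

The media type and charset come back lowercased, so a lookup table like this one does not need to worry about how the sender capitalized them.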

MIME headers

Now, let’s look at the MIME headers. The main ones are listed here, and each is described in detail in its own section.

1) MIME-Version

2) Content-Type

3) Content-Disposition

4) Content-Transfer-Encoding

MIME-Version

The MIME-Version header indicates that the message is MIME formatted. Its value is typically “1.0”, so the header looks like this:

eg: MIME-Version: 1.0

When MIME was developed, its designers planned to issue newer versions, but the problems caused by changing a standard discouraged any further release.
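A small sketch, using Python's standard email package, of where this header shows up in practice: EmailMessage adds it when content is set, and it can be read back from a parsed message like any other header. The subject and body text are made up for the example.

```python
from email import message_from_string
from email.message import EmailMessage

# Building a message: set_content() adds MIME-Version if it is missing.
msg = EmailMessage()
msg["Subject"] = "Hello"
msg.set_content("Plain ASCII body.\n")
print(msg["MIME-Version"])  # 1.0

# Parsing a message: the header is read like any other header field.
raw = "MIME-Version: 1.0\nSubject: Hello\n\nPlain ASCII body.\n"
parsed = message_from_string(raw)
print(parsed["MIME-Version"])  # 1.0
```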

Content-Type

This header describes the content type of the message. Its value consists of a type and a subtype, separated by a slash.

eg: Content-Type: text/plain

Through the use of the multipart type, MIME allows mail messages to have parts arranged in a tree structure, where the leaf nodes are any non-multipart content type and the non-leaf nodes are any of a variety of multipart types. This mechanism supports:

1) Simple text messages using text/plain (the default value for “Content-Type:”).

2) Text plus attachments (multipart/mixed with a text/plain part and other non-text parts). A MIME message that includes an attached file generally indicates the file’s original name with the “Content-Disposition:” header, so the type of file is indicated both by the MIME content type and by the (usually OS-specific) filename extension.

3) Reply with original attached (multipart/mixed with a text/plain part and the original message as a message/rfc822 part).

4) Alternative content, such as a message sent in both plain text and another format such as HTML (multipart/alternative with the same content in text/plain and text/html forms).

5) Image, audio, video, and application content (for example, image/jpeg, audio/mpeg, video/mp4, and application/msword).

6) Many other message constructs.
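The tree structure described above can be sketched with Python's standard email package. The example below builds a message with a plain-text body, an HTML alternative, and one attachment, then prints the resulting tree. The outline helper, the file name chart.png, and the placeholder bytes are assumptions made for the illustration, not part of any standard.

```python
from email.message import EmailMessage

msg = EmailMessage()
msg["Subject"] = "Report"
msg.set_content("See the attached chart.\n")             # text/plain leaf
msg.add_alternative("<p>See the attached chart.</p>",
                    subtype="html")                      # wraps body in multipart/alternative
msg.add_attachment(b"\x89PNG placeholder bytes",         # not a real image
                   maintype="image", subtype="png",
                   filename="chart.png")                 # wraps everything in multipart/mixed


def outline(part, depth=0):
    """Return the content types of a message tree, indented by depth."""
    lines = ["  " * depth + part.get_content_type()]
    if part.is_multipart():
        for child in part.iter_parts():
            lines.extend(outline(child, depth + 1))
    return lines


print("\n".join(outline(msg)))
# multipart/mixed
#   multipart/alternative
#     text/plain
#     text/html
#   image/png
```

The non-leaf nodes are the two multipart containers; every leaf is a non-multipart type, matching the description above.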

Content-Disposition

The original MIME specifications only described the structure of mail messages. They did not address the issue of presentation styles. The content-disposition header field was added in RFC 2183. It specifies the presentation style.

There are two types of Content-Disposition:

1) Inline Content-Disposition

2) Attachment Content-Disposition

Inline Content-Disposition

Content with an inline disposition is displayed automatically when the message is shown.

Attachment Content-Disposition

Content with an attachment disposition is not displayed automatically. It requires some user action to be opened and displayed.

The content-disposition header also provides fields for specifying the name of the file, the creation date, and modification date, which can be used by the reader’s mail user agent to store the attachment.

eg: Content-Disposition: attachment; filename=genome.jpeg; modification-date="Wed, 12 Feb 1997 16:29:51 -0500";

Nowadays, a good majority of mail user agents do not follow this prescription fully. The widely used Mozilla Thunderbird mail client makes its own decisions about which MIME parts should be displayed automatically, ignoring the content-disposition headers in the messages. Thunderbird prior to version 3 also sends out newly composed messages with inline content-disposition for all MIME parts.

Many mail user agents also send messages with the file name in the name parameter of the content-type header instead of the filename parameter of the content-disposition header. This practice is discouraged – the file name should be specified either through just the filename parameter, or through both the filename and the name parameters.
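To make the header's fields concrete, here is a minimal sketch that reads the example Content-Disposition value shown earlier in this section using Python's standard email.message.Message class. A real mail user agent would obtain the part from a parsed message rather than setting the header by hand as done here.

```python
from email.message import Message

part = Message()
part["Content-Disposition"] = (
    'attachment; filename=genome.jpeg; '
    'modification-date="Wed, 12 Feb 1997 16:29:51 -0500"'
)

# The disposition type: "inline" or "attachment".
print(part.get_content_disposition())  # attachment

# The suggested file name for storing the attachment.
print(part.get_filename())             # genome.jpeg

# Other parameters are read from the same header by name.
print(part.get_param("modification-date", header="content-disposition"))
# Wed, 12 Feb 1997 16:29:51 -0500
```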

Content-Transfer-Encoding

In June 1992, MIME (RFC 1341) defined a set of methods for representing binary data as ASCII text. The Content-Transfer-Encoding header serves two purposes, both of which concern whether a binary-to-text encoding scheme has been applied on top of the original encoding specified in the Content-Type header:

1) If such a binary-to-text encoding method has been used, it states which one.

2) If not, it provides a descriptive label for the format of content, with respect to the presence of 8-bit or binary content.

The RFC and the IANA’s list of transfer encodings define the values shown below, which are not case sensitive. Note that ‘7bit’, ‘8bit’, and ‘binary’ mean that no binary-to-text encoding on top of the original encoding was used. In these cases, the header is actually redundant for the email client to decode the message body, but it may still be useful as an indicator of what type of object is being sent. Values ‘quoted-printable’ and ‘base64’ tell the email client that a binary-to-text encoding scheme was used and that appropriate initial decoding is necessary before the message can be read with its original encoding (e.g. UTF-8).

Suitable for use with normal SMTP:

1) 7bit – up to 998 octets per line of the code range 1..127 with CR and LF (codes 13 and 10 respectively) only allowed to appear as part of a CRLF line ending. This is the default value.

2) quoted-printable – used to encode arbitrary octet sequences into a form that satisfies the rules of 7bit. Designed to be efficient and readable when used for text data consisting primarily of US-ASCII characters, but also containing a small proportion of bytes with values outside that range.

3) base64 – used to encode arbitrary octet sequences into a form that satisfies the rules of 7bit. Designed to be efficient for non-text 8 bit and binary data. It is sometimes used for text data that frequently uses non-US-ASCII characters.

Suitable for use with SMTP servers that support the 8BITMIME SMTP extension (RFC 6152):

1) 8bit – up to 998 octets per line with CR and LF (codes 13 and 10 respectively) only allowed to appear as part of a CRLF line ending.

Suitable for use with SMTP servers that support the BINARYMIME SMTP extension (RFC 3030):

1) binary – any sequence of octets.
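The relationship between these values can be sketched with Python's standard base64 and quopri modules. The is_7bit helper below is a hypothetical checker written for this example from the 7bit rules listed above; it is not a library function. The sample text is also made up.

```python
import base64
import quopri


def is_7bit(data: bytes) -> bool:
    """Check the 7bit rules: octets 1..127, CR/LF only as CRLF, lines <= 998."""
    if any(octet == 0 or octet > 127 for octet in data):
        return False
    for line in data.split(b"\r\n"):
        if len(line) > 998 or b"\r" in line or b"\n" in line:
            return False
    return True


raw = "Café menu".encode("utf-8")   # contains two non-ASCII octets

qp = quopri.encodestring(raw)       # quoted-printable: mostly readable
b64 = base64.b64encode(raw)         # base64: compact for binary data

print(is_7bit(raw))                 # False
print(qp)                           # b'Caf=C3=A9 menu'
print(is_7bit(qp), is_7bit(b64))    # True True

# Both encodings are reversible, so the receiver recovers the original octets.
print(quopri.decodestring(qp) == raw, base64.b64decode(b64) == raw)  # True True
```

Note that this sketch encodes a short string in one piece; real messages also wrap encoded output into short lines, which the sketch does not attempt.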

Encoded-Word

Since RFC 2822, conforming message header names and values should use only ASCII characters. Values that contain non-ASCII data should use the MIME encoded-word syntax (RFC 2047) instead of a literal string. This syntax uses a string of ASCII characters indicating both the original character encoding (the “charset”) and the content-transfer-encoding used to map the bytes of the charset into ASCII characters.

The form is: “=?charset?encoding?encoded text?=”.

charset: The charset can be any character set registered with IANA. Typically, it would be the same charset as the message body.

encoding: The encoding can be either “Q” denoting Q-encoding that is similar to the quoted-printable encoding, or “B” denoting base64 encoding.

encoded text: The encoded text is the Q-encoded or base64-encoded text.

encoded-word: An encoded-word may not be more than 75 characters long, including charset, encoding, encoded text, and delimiters. If it is desirable to encode more text than will fit in an encoded-word of 75 characters, multiple encoded-words (separated by CRLF SPACE) may be used.
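The form above can be assembled by hand in a few lines. The sketch below builds a “B” (base64) encoded-word and checks it with decode_header from Python's standard email.header module. The encoded_word function is a hypothetical helper written for this example; it handles only a single word and simply refuses text that would exceed the 75-character limit instead of splitting it.

```python
import base64
from email.header import decode_header


def encoded_word(text, charset="utf-8"):
    """Build one =?charset?B?encoded text?= word, enforcing the 75-char limit."""
    payload = base64.b64encode(text.encode(charset)).decode("ascii")
    word = f"=?{charset}?B?{payload}?="
    if len(word) > 75:
        raise ValueError("too long for one encoded-word; split the text")
    return word


word = encoded_word("Grüße")
print(word)                 # =?utf-8?B?R3LDvMOfZQ==?=

# decode_header returns (bytes, charset) pairs for each decoded piece.
decoded_bytes, charset = decode_header(word)[0]
print(decoded_bytes.decode(charset))  # Grüße
```

For production use, the email.header.Header class in the standard library handles the splitting into multiple encoded-words automatically.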