EDDYMENS

Published 2 weeks ago

What Is The Supplementary Multilingual Plane (SMP)?

Like me, you have likely encountered the Basic Multilingual Plane (BMP), also called Plane 0. It is the section of Unicode that covers most of the characters and symbols we use every day, and it is often what we have in mind when we talk about Unicode. Most of us know there are further planes beyond it, but we have likely never worked with them directly.

In this article I want to look at the Supplementary Multilingual Plane through the lens of working with emojis, because that was my first encounter with characters outside the BMP, I think.

Let's start with the Basic Multilingual Plane

As I mentioned earlier, the Basic Multilingual Plane (BMP) is the first and most commonly used range of Unicode characters.

Unicode is a standard that assigns a unique number, called a code point, to every character and symbol used in digital text. Characters within the BMP have code points written as four hex digits, from 0000 to FFFF.

a.textContent = "\u00aa"; // ª (U+00AA, feminine ordinal indicator); the plain letter a is "\u0061"
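You can confirm that a BMP escape is a single 16-bit unit by inspecting the string directly. The sketch below runs in Node.js or a browser console. It logs to the console instead of setting `textContent`, and the variable names are only illustrative. Note that `\u00aa` is ª, while the plain letter a is `\u0061`.

```javascript
// A BMP character occupies a single 16-bit UTF-16 code unit in a JavaScript string.
const ordinal = "\u00aa"; // ª, U+00AA FEMININE ORDINAL INDICATOR
const letterA = "\u0061"; // a, U+0061 LATIN SMALL LETTER A

console.log(ordinal.length);                      // 1 (one code unit)
console.log(ordinal.codePointAt(0).toString(16)); // aa
console.log(letterA === "a");                     // true
```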

Everything after \u (or U+) tells the programming language or reader that what follows is a Unicode code point written in hex. Each hex digit stands for 4 bits in binary (for example, F is 1111), so four hex digits make 4 × 4 = 16 bits. That is why, in UTF-16, each BMP character fits in a single 16-bit unit. Anything above FFFF falls outside the BMP, into the supplementary planes, the first of which is the SMP.
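You can check the hex-to-bits arithmetic with JavaScript number literals and `toString(2)`, which prints a number in binary. A small sketch:

```javascript
// Each hex digit encodes exactly 4 bits: 0xF is 0b1111.
console.log((0xf).toString(2)); // 1111

// Four hex digits give 4 * 4 = 16 bits, so the largest BMP code point is 0xFFFF.
const maxBmp = 0xffff;
console.log(maxBmp);                    // 65535
console.log(maxBmp.toString(2).length); // 16

// The first code point outside the BMP needs a 17th bit.
const firstSupplementary = 0x10000;
console.log(firstSupplementary.toString(2).length); // 17
```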

Without going into too much depth, the 16-bit figure matters because several programming languages, JavaScript and Java among them, store strings as sequences of 16-bit code units. This encoding is better known as UTF-16.
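To see UTF-16 code units in action, compare `length` for a BMP character and an SMP character. This sketch assumes a modern engine (ES2015 or later) for the spread syntax.

```javascript
// String.length counts UTF-16 code units, not characters.
const sun = "\u2600";        // ☀ U+2600, inside the BMP
const moon = "\uD83C\uDF19"; // 🌙 U+1F319, outside the BMP

console.log(sun.length);  // 1
console.log(moon.length); // 2: one character stored as two 16-bit code units

// Spreading a string iterates by code point, so the pair counts as one item.
console.log([...moon].length); // 1
```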

What’s Outside the BMP? Meet the Supplementary Multilingual Plane

Now that we know about the BMP and what UTF-16 is all about, let's talk about the Supplementary Multilingual Plane (SMP).

Characters added later were assigned code points starting from U+10000, and these include most emojis. Such code points need more than 16 bits and live in the supplementary planes. One of these extended planes is the Supplementary Multilingual Plane (SMP, Plane 1, covering U+10000 to U+1FFFF), which contains characters such as:

  • Additional emojis
  • Rare or ancient scripts
  • Special symbols for math, music, and technical fields
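Every plane is exactly 0x10000 code points wide, so you can compute a character's plane from its code point. The helper `planeOf` below is a hypothetical name used for illustration, not a built-in.

```javascript
// Each Unicode plane holds 0x10000 (65,536) code points, so the plane number
// is the code point divided by 0x10000, rounded down.
function planeOf(codePoint) {
  if (!Number.isInteger(codePoint) || codePoint < 0 || codePoint > 0x10ffff) {
    throw new RangeError(`Not a Unicode code point: ${codePoint}`);
  }
  return Math.floor(codePoint / 0x10000);
}

console.log(planeOf(0x00aa));  // 0 (BMP)
console.log(planeOf(0x1f319)); // 1 (SMP, crescent moon emoji)
console.log(planeOf(0x20000)); // 2 (Supplementary Ideographic Plane)
```

Plane 2, the Supplementary Ideographic Plane, is another supplementary plane, used mostly for CJK ideographs.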

Let's talk about surrogate pairs

The crescent moon emoji (U+1F319) is part of the SMP, not the BMP. Its code point needs more than 16 bits, so it cannot be written with a single four-digit \u escape. Instead, UTF-16 stores it as two 16-bit code units called a surrogate pair, like this:

a.textContent = "\uD83C\uDF19 Dark Mode"; // Displays the crescent moon emoji with Dark Mode text
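The pair `\uD83C\uDF19` follows a fixed rule. UTF-16 subtracts 0x10000 from the code point, leaving a 20-bit value. It adds the top 10 bits to 0xD800 and the bottom 10 bits to 0xDC00. Here is a sketch of that calculation and its reverse. The helper names `toSurrogatePair` and `fromSurrogatePair` are hypothetical.

```javascript
// Split a supplementary code point (U+10000..U+10FFFF) into a UTF-16 surrogate pair.
function toSurrogatePair(codePoint) {
  if (codePoint < 0x10000 || codePoint > 0x10ffff) {
    throw new RangeError("Only supplementary code points need a surrogate pair");
  }
  const offset = codePoint - 0x10000;    // 20-bit value
  const high = 0xd800 + (offset >> 10);  // top 10 bits -> high surrogate
  const low = 0xdc00 + (offset & 0x3ff); // bottom 10 bits -> low surrogate
  return [high, low];
}

// Recombine a surrogate pair into the original code point.
function fromSurrogatePair(high, low) {
  return (high - 0xd800) * 0x400 + (low - 0xdc00) + 0x10000;
}

const [high, low] = toSurrogatePair(0x1f319);
console.log(high.toString(16), low.toString(16));               // d83c df19
console.log(String.fromCharCode(high, low) === "\uD83C\uDF19"); // true
console.log(fromSurrogatePair(high, low).toString(16));         // 1f319
```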

How to Work with SMP Characters

When dealing with characters outside the BMP (like U+1F319 for 🌙), you need to be aware of how they’re encoded. Here are some quick guidelines:

  • BMP characters: U+0000 to U+FFFF (e.g., \u2600 for ☀) are easy to use with \u escape sequences.
  • SMP and other planes: U+10000 and above (e.g., U+1F319 for 🌙) require surrogate pairs in UTF-16.
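Writing surrogate pairs by hand is error-prone. Since ES2015, JavaScript also accepts a code point escape with braces, `\u{...}`, and the `String.fromCodePoint` function. All three forms below should produce the identical string. A sketch:

```javascript
// Three ways to produce the same crescent moon string (ES2015+).
const viaPair = "\uD83C\uDF19";                    // explicit surrogate pair
const viaBraces = "\u{1F319}";                     // code point escape, no manual pair
const viaFunction = String.fromCodePoint(0x1f319); // built from the code point

console.log(viaPair === viaBraces && viaBraces === viaFunction); // true
console.log(viaBraces.length); // 2: the string is still stored as UTF-16

// codePointAt reads the full code point; charCodeAt sees only one code unit.
console.log(viaBraces.codePointAt(0).toString(16)); // 1f319
console.log(viaBraces.charCodeAt(0).toString(16));  // d83c
```

Whichever form you use, the string in memory is the same surrogate pair, which is why `length` still reports 2.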

To check whether a character is in the BMP or a supplementary plane, look at its code point:

  • BMP: Code points up to U+FFFF (4 hex digits).
  • SMP: Code points U+10000 to U+1FFFF (5 hex digits).
  • Beyond the SMP: Code points from U+20000 up to U+10FFFF (5 or 6 hex digits).
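You can turn this rule of thumb into a small checker. `describeCharacter` is a hypothetical helper. It reads the full code point with `codePointAt`, so it also handles surrogate pairs, and it assumes a non-empty string.

```javascript
// Describe the first character of a string by its code point range.
function describeCharacter(text) {
  const codePoint = text.codePointAt(0);
  if (codePoint === undefined) throw new RangeError("Expected a non-empty string");
  const hex = codePoint.toString(16).toUpperCase().padStart(4, "0");
  let range;
  if (codePoint <= 0xffff) range = "BMP";
  else if (codePoint <= 0x1ffff) range = "SMP";
  else range = "beyond the SMP";
  return `U+${hex}: ${hex.length} hex digits, ${range}`;
}

console.log(describeCharacter("\u2600"));     // U+2600: 4 hex digits, BMP
console.log(describeCharacter("\u{1F319}"));  // U+1F319: 5 hex digits, SMP
console.log(describeCharacter("\u{20000}"));  // U+20000: 5 hex digits, beyond the SMP
console.log(describeCharacter("\u{10FFFF}")); // U+10FFFF: 6 hex digits, beyond the SMP
```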

Summary

  • The BMP is Unicode’s most commonly used range, covering characters from U+0000 to U+FFFF.
  • The SMP is an extended range, U+10000 to U+1FFFF, for characters added later, including most emojis.
  • BMP characters are simple to work with using \u sequences, while SMP characters need special handling with surrogate pairs in UTF-16.

Understanding these two planes can help you manage emojis and special characters smoothly in your projects!
