Concatenation and SMS Encoding
Introduction
Effective use of SMS Marketing requires a solid understanding of how SMS messages are encoded. There are two main encoding systems: GSM Encoding and Unicode Encoding. The choice between them depends on the set of characters used in the message.
Summary
The summary below provides a simplified overview. For full details on concatenation and encoding, please refer to the complete article.
- GSM Encoding – Limit of 160 characters per message; supports a limited set of letters, digits, and basic symbols, but not emojis or non-Latin scripts.
- Unicode Encoding – Limit of 70 characters per message; allows representation of a wide range of characters, including symbols, emojis, and non-Latin scripts, making it ideal for multilingual messages.
GSM Encoding
GSM (Global System for Mobile Communications) encoding is widely used for transmitting text messages via mobile networks. It supports a restricted set of alphanumeric characters, including letters, numbers, and some basic symbols.
Closum supports all standard GSM characters as well as characters from the GSM extended table.
This type of encoding is optimized to use fewer network resources, allowing faster data transmission.
- Standard GSM characters are encoded using 7 bits per character.
- Extended table characters (such as ^, {, }, [, ], ~, |, \ and €) take two 7-bit units each: an ESC prefix followed by the character itself. Each one therefore counts as two characters toward the message limit.
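The counting rule above can be sketched in a few lines of Python. This is only an illustration, not Closum's implementation: the names `GSM_BASIC`, `GSM_EXTENDED`, and `gsm_septets` are hypothetical, and the character tables are taken from the standard GSM 7-bit default alphabet.

```python
# Standard GSM 7-bit alphabet: each character costs one 7-bit unit (septet).
GSM_BASIC = (
    "@£$¥èéùìòÇ\nØø\rÅåΔ_ΦΓΛΩΠΨΣΘΞÆæßÉ"
    " !\"#¤%&'()*+,-./0123456789:;<=>?"
    "¡ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÑÜ§"
    "¿abcdefghijklmnopqrstuvwxyzäöñüà"
)
# Extended table: sent as ESC + character, so each costs two septets.
GSM_EXTENDED = "\f^{}\\[~]|€"


def gsm_septets(text):
    """Return the number of septets needed, or None if text is not GSM-encodable."""
    total = 0
    for ch in text:
        if ch in GSM_BASIC:
            total += 1
        elif ch in GSM_EXTENDED:
            total += 2
        else:
            return None  # character requires Unicode encoding
    return total


print(gsm_septets("bonjour monde"))   # 13
print(gsm_septets("Isto ^ Aquilo"))   # 14, because "^" counts twice
print(gsm_septets("こんにちは世界"))    # None
```

A `None` result indicates that at least one character falls outside both GSM tables, so the message would need Unicode encoding.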
Advantages of GSM encoding:
- High transmission efficiency
- Allows more characters per SMS
- Reduced network usage
- Broad compatibility with mobile devices and telecom providers worldwide
Limitation: GSM encoding does not support all characters from more comprehensive systems like Unicode.
Unicode Encoding
Unicode is a universal character encoding standard that covers most of the world's writing systems. It assigns a unique numeric code to each character, regardless of language or script.
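Scripts such as Arabic, Chinese, Korean, Japanese, and Cyrillic, as well as emojis, require Unicode encoding. These characters are encoded in the 16-bit UCS-2 format, at 2 bytes per character. Most emojis fall outside the 16-bit range and are typically sent as two 16-bit units, so each one counts as two characters.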
Even characters that exist in the GSM set are encoded using UCS-2 when Unicode is selected for the message.
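As a rough illustration of the 2-bytes-per-character cost, the sketch below measures a message with Python's built-in `utf-16-be` codec. This codec is used here as a stand-in for UCS-2 (the two are identical for characters inside the 16-bit range); the helper name `ucs2_size` is hypothetical.

```python
def ucs2_size(text):
    """Return (16-bit units, bytes) for text sent with Unicode encoding."""
    encoded = text.encode("utf-16-be")  # 2 bytes per 16-bit unit
    return len(encoded) // 2, len(encoded)


print(ucs2_size("こんにちは世界"))  # (7, 14)
print(ucs2_size("abc"))            # (3, 6): GSM characters also cost 2 bytes each
print(ucs2_size("😀"))             # (2, 4): this emoji needs two 16-bit units
```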
Encoding Examples
Message | Type | Size per Character | Total Size | Character Set Used |
---|---|---|---|---|
bonjour monde | Text | 1 septet (7 bits) | 13 septets | GSM Standard |
Isto ^ Aquilo | Text | 1 septet (2 for "^", extended) | 14 septets | GSM Standard + Extended Table |
こんにちは世界 | Unicode | 2 bytes (UCS-2) | 14 bytes | Unicode |
Advantages of Unicode encoding:
- Global language support
- Allows use of emojis and special symbols
- Enables visually engaging and dynamic messages
- Essential for international marketing campaigns
Maximum Character Limits
The maximum size of a single SMS message is 140 bytes. This corresponds to:
- 160 characters (7-bit) using GSM encoding
- 70 characters (16-bit) using Unicode (UCS-2)
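Both limits follow directly from the 140-byte payload. The short calculation below shows the arithmetic; the variable names are illustrative only.

```python
SMS_PAYLOAD_BYTES = 140
payload_bits = SMS_PAYLOAD_BYTES * 8   # 1120 bits available per message

gsm_chars = payload_bits // 7          # 7 bits per GSM character
ucs2_chars = payload_bits // 16        # 16 bits per UCS-2 character

print(gsm_chars, ucs2_chars)           # 160 70
```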
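Messages exceeding this limit are split into multiple parts (concatenation), each billed separately. Every part of a concatenated message carries a User Data Header (UDH) that takes up the space of 7 GSM characters, leaving 153 characters per part.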
Parts | Max Characters (GSM) | Calculation |
---|---|---|
1 | 160 | No UDH, full 160 characters |
2 | 306 | (160 - 7) × 2 = 306 |
3 | 459 | (160 - 7) × 3 = 459 |
4 | 612 | (160 - 7) × 4 = 612 |
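If you are sending a message in Unicode, each character uses 2 bytes, so the limits drop accordingly: 70 characters for a single message, and fewer per part once the header is included (67 with the standard 6-byte header).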
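The number of billed parts can be estimated with a small helper like the one below. It is a sketch under stated assumptions: the GSM figures (160 single, 153 per part) come from the "(160 - 7)" calculation in the table, while the Unicode per-part figure of 67 assumes the standard 6-byte UDH and is not stated in this article. The names `LIMITS` and `sms_parts` are hypothetical.

```python
# (single-message limit, per-part limit when concatenated)
LIMITS = {
    "gsm": (160, 153),   # 160 - 7 characters taken by the UDH
    "ucs2": (70, 67),    # assumption: 6-byte UDH leaves 134 bytes = 67 characters
}


def sms_parts(length, encoding="gsm"):
    """Estimate billed parts for a message of `length` encoded characters.

    For GSM, `length` should already count extended characters twice.
    """
    single, per_part = LIMITS[encoding]
    if length <= single:
        return 1
    # Integer ceiling division: round up to the next whole part.
    return (length + per_part - 1) // per_part


print(sms_parts(160))          # 1
print(sms_parts(161))          # 2
print(sms_parts(459))          # 3
print(sms_parts(71, "ucs2"))   # 2
```

A check such as `sms_parts(n) <= 6` could be used before sending to flag unusually long messages.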
Closum supports messages up to 3,200 characters, but not all carriers accept this length. As a best practice, avoid sending messages that exceed 6 parts.
Conclusion
GSM encoding offers efficient transmission and universal support, making it ideal for simple, text-based messages. Unicode encoding provides flexibility, enabling businesses to reach a global audience with localized, multilingual messages—along with the use of emojis and symbols to enhance engagement.
Choosing the right SMS encoding depends on your audience, system capabilities, and message content. It's important to weigh the advantages of each format and select the one that ensures optimal delivery and clarity for your recipients.
Updated on: 10/07/2025