Base64 Encoder & Decoder
What Is Base64 Encoding? A Deep Dive Into the Algorithm, Use Cases, and Implementation
Base64 is a binary-to-text encoding scheme that converts arbitrary binary data into a string composed exclusively of printable ASCII characters. Defined in RFC 4648 and previously specified in RFC 2045 as part of the MIME specification, Base64 serves a critical role in modern computing: it allows binary data to travel safely through systems designed to handle only text. Without Base64, embedding an image in an HTML page, attaching a file to an email, or including a cryptographic signature in a JSON Web Token would be far more complicated.
The name "Base64" comes from the encoding's alphabet of exactly 64 characters: the 26 uppercase letters A-Z, the 26 lowercase letters a-z, the 10 digits 0-9, and two additional characters — traditionally + (plus) and / (forward slash). A 65th character, = (equals sign), serves as padding. This carefully chosen alphabet ensures that every character in the output is safe for transmission across virtually any text-based system, including email, URLs (with modifications), and XML documents.
How the Base64 Encoding Algorithm Works Step by Step
The encoding process follows a precise mathematical procedure. The algorithm takes input bytes three at a time (24 bits total), then divides those 24 bits into four 6-bit groups. Each 6-bit group (which can hold values 0 through 63) maps to exactly one character in the Base64 alphabet. Here is the process broken down:
Step 1 — Convert to binary: Take the first three bytes of input and write their 8-bit binary representations. For example, the word "Man" in ASCII is M=77 (01001101), a=97 (01100001), n=110 (01101110), giving the 24-bit sequence 010011010110000101101110.
Step 2 — Split into 6-bit groups: Divide the 24 bits into four groups of 6: 010011 | 010110 | 000101 | 101110. These evaluate to decimal values 19, 22, 5, 46.
Step 3 — Map to characters: Look up each value in the Base64 alphabet table. Index 19 = T, index 22 = W, index 5 = F, index 46 = u. So "Man" encodes to "TWFu".
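The three steps above can be sketched in a few lines of Python. This is a minimal illustration that handles exactly one full 3-byte group; the names ALPHABET and encode_triplet are invented for this example and are not part of any library.

```python
# Illustrative sketch: encode exactly three bytes by hand.
ALPHABET = (
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789+/"
)

def encode_triplet(chunk: bytes) -> str:
    """Encode exactly 3 bytes into 4 Base64 characters (hypothetical helper)."""
    if len(chunk) != 3:
        raise ValueError("this sketch only handles full 3-byte groups")
    # Step 1: join the three bytes into one 24-bit integer.
    bits = (chunk[0] << 16) | (chunk[1] << 8) | chunk[2]
    # Step 2: slice out four 6-bit values, most significant first.
    indices = [(bits >> shift) & 0b111111 for shift in (18, 12, 6, 0)]
    # Step 3: look each value up in the alphabet.
    return "".join(ALPHABET[i] for i in indices)

result = encode_triplet(b"Man")
print(result)  # TWFu
```

Shifting and masking a single integer is only one way to do the split; working on a string of "0" and "1" characters would mirror the prose more literally but is slower.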
The Base64 Alphabet Table
| Index Range | Characters | Description |
|---|---|---|
| 0-25 | A-Z | Uppercase letters |
| 26-51 | a-z | Lowercase letters |
| 52-61 | 0-9 | Digits |
| 62 | + | Plus (or - in Base64URL) |
| 63 | / | Slash (or _ in Base64URL) |
| Padding | = | Used when input length is not divisible by 3 |
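As a rough illustration, the table can be rebuilt from Python's standard string constants. The names STANDARD, URL_SAFE, and lookup are assumptions made for this sketch.

```python
import string

# Indices 0-25, 26-51, 52-61, then the two extra characters at 62 and 63.
STANDARD = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"
# Base64URL keeps the first 62 characters and swaps only the last two.
URL_SAFE = STANDARD[:62] + "-_"

# Reverse mapping from character to index, as a decoder would need.
lookup = {ch: i for i, ch in enumerate(STANDARD)}

print(len(STANDARD))                                          # 64
print(STANDARD[19], STANDARD[22], STANDARD[5], STANDARD[46])  # T W F u
```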
Understanding Base64 Padding
Because the algorithm processes input in groups of 3 bytes, the input length is not always a multiple of 3. When the final group has only 1 byte (8 bits), 4 zero bits are appended to make 12 bits, or two 6-bit groups, producing two Base64 characters followed by == padding. When the final group has 2 bytes (16 bits), 2 zero bits are appended to make 18 bits, or three 6-bit groups, producing three Base64 characters followed by a single = padding. The padding tells the decoder how many bytes the final group really contained: a single = means the last four characters decode to 2 bytes, and == means they decode to 1 byte.
For example, "Ma" (two bytes: 77, 97) produces 24 bits after zero-padding: 01001101 01100001 00000000. Split into 6-bit groups: 010011 | 010110 | 000100 | 000000. The first three groups map to "TWE" and the fourth is padding, giving "TWE=". Meanwhile, "M" alone (one byte: 77) produces: 01001101 00000000 00000000, splitting to 010011 | 010000 | 000000 | 000000, which maps to "TQ==".
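The padding rules can be checked with a small hand-written encoder and decoder. This is a sketch for study, not a replacement for the standard library: encode and decode are hypothetical names, and decode assumes valid, fully padded input whose length is a multiple of 4.

```python
import base64

ALPHABET = (
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789+/"
)

def encode(data: bytes) -> str:
    """Base64-encode bytes, padding the final group with '=' (sketch)."""
    out = []
    for i in range(0, len(data), 3):
        chunk = data[i:i + 3]
        missing = 3 - len(chunk)                  # 0, 1, or 2 bytes short
        bits = int.from_bytes(chunk + b"\x00" * missing, "big")
        chars = [ALPHABET[(bits >> s) & 0x3F] for s in (18, 12, 6, 0)]
        if missing:
            # Groups made purely of filler zeros become '=' characters.
            chars[-missing:] = ["="] * missing
        out.append("".join(chars))
    return "".join(out)

def decode(text: str) -> bytes:
    """Decode padded Base64 text; assumes well-formed input (sketch)."""
    out = bytearray()
    for i in range(0, len(text), 4):
        quad = text[i:i + 4]
        pad = quad.count("=")
        bits = 0
        for ch in quad:
            bits = (bits << 6) | (0 if ch == "=" else ALPHABET.index(ch))
        # Drop one byte per '=' character: those bytes were only filler.
        out += bits.to_bytes(3, "big")[:3 - pad]
    return bytes(out)

print(encode(b"Ma"), encode(b"M"))  # TWE= TQ==
```

Comparing the result against base64.b64encode from the standard library, as the checks for this sketch do, is a quick way to confirm the hand-written version agrees on every padding case.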
The 33% Size Overhead Explained
Base64 encoding always increases the size of the data by approximately 33%. This is an inherent mathematical consequence: three bytes of input (24 bits) produce four bytes of output (32 bits), giving a ratio of 4/3 = 1.333. For a 1 MB image, the Base64 representation will be roughly 1.33 MB. While this overhead is acceptable for small payloads like API tokens or email attachments, it makes Base64 impractical for large file transfers where binary protocols are available. When line breaks are added (as in MIME), the overhead increases slightly further.
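The arithmetic can be made concrete with two small helper functions. Both names are hypothetical, and the MIME figure assumes 76-character lines each terminated by a 2-byte CRLF, with 1 MB taken as 1,000,000 bytes.

```python
def encoded_length(n_bytes: int) -> int:
    """Length of padded Base64 output: 4 characters per 3-byte group, rounded up."""
    return 4 * ((n_bytes + 2) // 3)

def mime_length(n_bytes: int, line_len: int = 76) -> int:
    """Output length when every line of line_len characters ends with CRLF (assumed)."""
    chars = encoded_length(n_bytes)
    lines = (chars + line_len - 1) // line_len
    return chars + 2 * lines

size = 1_000_000
print(encoded_length(size) / size)  # about 1.33
print(mime_length(size) / size)     # about 1.37
```

For very small inputs the relative overhead is higher than 33% because of padding: a single byte becomes four characters.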
Real-World Use Cases for Base64 Encoding
Data URIs in HTML and CSS: You can embed images, fonts, and other binary files directly in HTML or CSS using the data URI scheme. For example, data:image/png;base64,iVBORw0KGgo... embeds a PNG image inline. This eliminates an HTTP request, which can improve performance for small files, though the embedded asset is about 33% larger than the original file and the browser cannot cache it separately from the document that contains it.
Email attachments (MIME): The MIME standard uses Base64 to encode binary attachments in email, which is a text-based protocol (SMTP was originally designed to carry 7-bit ASCII). Files you attach to an email, such as PDFs, images, and ZIP archives, are typically Base64-encoded before transmission and decoded by the recipient's email client.
JSON Web Tokens (JWTs): JWTs encode their header and payload segments using Base64URL encoding. When you see a JWT like eyJhbGciOiJIUzI1NiJ9.eyJzdWIiOiIxMjM0NTY3ODkwIn0.signature, the first two segments are Base64URL-encoded JSON objects that you can decode to inspect the token's claims.
API authentication: HTTP Basic Authentication encodes the username:password string in Base64 and sends it in the Authorization header. While this is not encryption (the credentials are trivially decodable), it ensures the credentials travel safely through HTTP headers that only support printable ASCII characters.
Binary data in XML and JSON: Since XML and JSON are text formats that cannot natively represent arbitrary binary data, Base64 provides a standard way to embed binary content. AWS SDKs, for instance, frequently Base64-encode binary parameters in API requests.
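Two of the use cases above are easy to try with Python's standard library. The sketch below builds an HTTP Basic Authentication header and decodes the two JWT segments quoted earlier; the credentials "alice:s3cret" and the helper decode_jwt_segment are invented for illustration, and no signature verification is attempted.

```python
import base64
import json

# HTTP Basic Authentication: Base64 of "username:password" (hypothetical credentials).
credentials = "alice:s3cret"
token = base64.b64encode(credentials.encode("utf-8")).decode("ascii")
auth_header = f"Authorization: Basic {token}"

def decode_jwt_segment(segment: str) -> dict:
    """Decode one Base64URL JWT segment into a dict (inspection only)."""
    # JWT segments omit '=' padding, so restore it before decoding.
    padded = segment + "=" * (-len(segment) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))

jwt = "eyJhbGciOiJIUzI1NiJ9.eyJzdWIiOiIxMjM0NTY3ODkwIn0.signature"
header_b64, payload_b64, _signature = jwt.split(".")

print(decode_jwt_segment(header_b64))   # {'alg': 'HS256'}
print(decode_jwt_segment(payload_b64))  # {'sub': '1234567890'}
```

Decoding a token this way only reveals its contents. Whether the token is trustworthy depends on verifying the signature, which this sketch deliberately leaves out.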
Base64 vs Base64URL: When URL Safety Matters
Standard Base64 uses + and / as characters 62 and 63, but both have special meaning in URLs (+ represents a space in query strings, and / is a path separator). Base64URL, defined in RFC 4648 section 5, replaces + with - (hyphen) and / with _ (underscore), and typically omits the = padding. This variant is used in JWTs, filename-safe encoding, and any context where the encoded string might appear in a URL. When switching between standard Base64 and Base64URL, you simply need to swap these two characters and handle padding accordingly.
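The swap between the two variants can be sketched as a pair of string conversions. The function names are hypothetical; Python's standard library also provides base64.urlsafe_b64encode and base64.urlsafe_b64decode, which use the URL-safe alphabet but keep the = padding.

```python
import base64

def to_base64url(standard: str) -> str:
    """Convert standard Base64 text to unpadded Base64URL (sketch)."""
    return standard.rstrip("=").replace("+", "-").replace("/", "_")

def from_base64url(urlsafe: str) -> str:
    """Convert Base64URL text back to padded standard Base64 (sketch)."""
    text = urlsafe.replace("-", "+").replace("_", "/")
    # Restore the padding that Base64URL usually omits.
    return text + "=" * (-len(text) % 4)

raw = b"\xfb\xff"  # chosen so the output contains both + and /
standard = base64.b64encode(raw).decode("ascii")
print(standard)                # +/8=
print(to_base64url(standard))  # -_8
```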
Base64 in JavaScript: btoa(), atob(), and Unicode
JavaScript provides two native functions for Base64: btoa() (binary to ASCII) for encoding and atob() (ASCII to binary) for decoding. However, these functions only handle strings whose characters fall in the Latin-1 range (code points 0-255); btoa() throws an error on anything else. For Unicode text containing characters beyond Latin-1, such as Chinese, Arabic, or emoji, you must first encode the string to UTF-8 bytes. This converter uses the pattern btoa(unescape(encodeURIComponent(text))) for encoding and decodeURIComponent(escape(atob(base64))) for decoding, which correctly handles full Unicode input. Note that escape() and unescape() are deprecated, so new code may prefer TextEncoder and TextDecoder for the UTF-8 conversion. In Node.js, you would use Buffer.from(text).toString('base64') instead.
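The underlying point, that Base64 operates on bytes and text must be converted to UTF-8 first, is not specific to JavaScript. The Python sketch below shows the same idea; encode_text and decode_text are hypothetical helper names, and the Latin-1 comparison is included only to mirror what btoa() does with a character such as é.

```python
import base64

def encode_text(text: str) -> str:
    """UTF-8 encode the text, then Base64-encode the resulting bytes."""
    return base64.b64encode(text.encode("utf-8")).decode("ascii")

def decode_text(b64: str) -> str:
    """Base64-decode to bytes, then interpret those bytes as UTF-8."""
    return base64.b64decode(b64).decode("utf-8")

sample = "héllo 日本語 🎉"
assert decode_text(encode_text(sample)) == sample

# The same character gives different output depending on the byte encoding.
print(encode_text("é"))                                       # w6k=  (UTF-8: C3 A9)
print(base64.b64encode("é".encode("latin-1")).decode("ascii"))  # 6Q==  (Latin-1: E9)

# Characters outside Latin-1 cannot be encoded that way at all.
try:
    "日本語".encode("latin-1")
    latin1_ok = True
except UnicodeEncodeError:
    latin1_ok = False
print(latin1_ok)  # False
```

Because both sides must agree on the byte encoding, a string encoded as Latin-1 and decoded as UTF-8 (or the reverse) will not round-trip correctly.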
Common Mistakes When Working with Base64
Treating Base64 as encryption: Base64 provides zero security. It is an encoding, not a cipher. Anyone can decode Base64 instantly. Never rely on Base64 alone to protect passwords, API keys, or other sensitive data; encrypt them first with a proper algorithm like AES-256.
Ignoring the size overhead: Base64-encoding large files increases their size by 33%, consuming more bandwidth and storage. For large binary payloads, use multipart form data or binary protocols instead.
Mixing up Base64 and Base64URL: Using standard Base64 in URLs without converting + to - and / to _ can corrupt the value or cause parsing errors, because + may be decoded as a space and / may be read as a path separator. Always use the URL-safe variant for JWTs, query parameters, and filenames.
Double-encoding: Encoding data that is already Base64-encoded produces a valid but incorrect result. If your decoded output looks like Base64, you may need to decode it a second time.
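The double-encoding mistake is easy to reproduce. In this small sketch the variable names are illustrative only; it shows that the twice-encoded value is itself valid Base64, which is why the problem often goes unnoticed until the data is used.

```python
import base64

original = b"Man"
once = base64.b64encode(original)  # b"TWFu"
twice = base64.b64encode(once)     # b"VFdGdQ=="

# A single decode of the double-encoded value yields Base64 text, not the data.
first_pass = base64.b64decode(twice)
print(first_pass)                    # b'TWFu'
print(base64.b64decode(first_pass))  # b'Man'
```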
Frequently Asked Questions
What is Base64 encoding and why is it used?
Base64 is a binary-to-text encoding scheme that converts binary data into a string of 64 printable ASCII characters (A-Z, a-z, 0-9, +, /). It is used whenever binary data needs to be transmitted through text-only channels, such as embedding images in HTML via data URIs, attaching files in email (MIME), or including binary tokens in JSON APIs. The encoding ensures that binary data survives transmission through systems that may modify or reject non-text bytes.
How does the Base64 encoding algorithm work?
Base64 takes every 3 bytes (24 bits) of input and splits them into four 6-bit groups. Each 6-bit value (0-63) maps to one of 64 characters in the Base64 alphabet. If the input length is not divisible by 3, padding characters (=) are added to make the output length a multiple of 4. One trailing byte produces two Base64 characters plus ==, and two trailing bytes produce three Base64 characters plus =. Because every 3 input bytes become 4 output characters, Base64 output is approximately 33% larger than the original data.
What is the difference between Base64 and Base64URL?
Standard Base64 uses + and / as its 62nd and 63rd characters, which have special meanings in URLs (+ represents a space in query strings, / is a path separator). Base64URL (RFC 4648 section 5) replaces + with - (hyphen) and / with _ (underscore), and typically omits the = padding. This variant is used in JWTs (JSON Web Tokens), URL query parameters, and any context where the encoded string appears in a URL or filename.
Does Base64 provide encryption or security?
No. Base64 is an encoding, not encryption. It provides zero security — anyone can decode Base64 instantly with no key. It is designed for safe data transport through text-only channels, not for confidentiality. If you need to protect data, use proper encryption (AES-256, RSA) and then optionally Base64-encode the encrypted ciphertext for transport through text-based protocols.
How much larger does data become after Base64 encoding?
Base64 encoding increases data size by approximately 33%. Every 3 bytes of input (24 bits) produce 4 bytes of output (32 bits), giving a ratio of 4/3 or 1.333. A 1 MB file becomes approximately 1.33 MB when Base64-encoded. With line breaks added (as in MIME format), the overhead increases slightly further to about 37%. This predictable overhead makes Base64 impractical for large file transfers where binary protocols are available, but acceptable for small payloads like API tokens, email attachments, and inline data URIs.
How do I encode or decode Base64 in different programming languages?
In JavaScript, use btoa() to encode and atob() to decode (for Unicode, use the encodeURIComponent/TextEncoder pattern). In Python 3, use base64.b64encode() and base64.b64decode(). In Java, use java.util.Base64 encoder/decoder. In PHP, use base64_encode() and base64_decode(). Use our ASCII converter to understand the underlying character codes that Base64 operates on.