URL Encoding Explained: What Is Percent-Encoding
Every time you click a link, submit a form, or type a web address, your browser handles URL encoding behind the scenes. Those cryptic strings of percent signs and hexadecimal numbers you see in URLs are not random; they are a carefully designed mechanism that makes the internet work reliably. Understanding URL encoding is essential for web developers, API designers, and anyone who works with web technologies. This guide explains what URL encoding is, why it exists, how it works, and the security implications you need to know about.
What Is URL Encoding?
URL encoding, formally known as percent-encoding, is a mechanism
defined in RFC 3986 that converts characters into a format that can be
safely transmitted within a Uniform Resource Identifier (URI). The URL
specification only allows a subset of ASCII characters to appear
directly in a URL. Any character outside this allowed set must be
encoded as a percent sign (%) followed by two hexadecimal
digits representing the character's byte value.
For example, the space character cannot appear directly in a URL. When
encoded, it becomes %20, where 20 is the
hexadecimal representation of 32, the ASCII code for a space.
Similarly, the less-than symbol < becomes
%3C, because the ASCII code for < is 60,
which is 3C in hexadecimal.
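The arithmetic behind these two examples can be checked in a few lines of Python. This is only an illustrative sketch; the helper name `percent_triplet` is made up for this example and is not part of any library.

```python
def percent_triplet(ch: str) -> str:
    """Return the %XX form of one ASCII character (hypothetical helper)."""
    code = ord(ch)                 # decimal ASCII code, e.g. 32 for a space
    if code > 0x7F:
        raise ValueError("this sketch handles ASCII characters only")
    return f"%{code:02X}"          # percent sign plus two uppercase hex digits

print(percent_triplet(" "))        # %20
print(percent_triplet("<"))        # %3C
```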
Why URL Encoding Is Necessary
URLs have a specific syntax that uses certain characters as
delimiters. The question mark ? separates the path from
the query string. The ampersand & separates query
parameters. The equals sign = separates parameter names
from values. The forward slash / separates path segments.
If these characters appear in the data being transmitted, they would
be misinterpreted as URL structure rather than content.
Consider what happens when a user searches for "rock & roll" on a
website. Without encoding, the URL would look like
/search?q=rock & roll. The server parsing the query string would
interpret the & as a query parameter separator, splitting the
query into two parameters: q=rock and a second parameter
named roll with no value. URL encoding fixes this:
/search?q=rock%20%26%20roll.
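The split described above can be reproduced with Python's standard query-string parser. This is a small sketch that parses only the query portion of the two URLs; `keep_blank_values=True` is set so that the stray parameter with no value stays visible.

```python
from urllib.parse import parse_qsl

# Unencoded: the raw "&" is treated as a parameter separator.
broken = parse_qsl("q=rock & roll", keep_blank_values=True)
print(broken)   # [('q', 'rock '), (' roll', '')]

# Encoded: the whole phrase survives as a single value.
fixed = parse_qsl("q=rock%20%26%20roll", keep_blank_values=True)
print(fixed)    # [('q', 'rock & roll')]
```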
How Percent-Encoding Works
The encoding process follows a straightforward algorithm defined by the URI specification:
- Identify unreserved characters: The characters A-Z, a-z, 0-9, -, ., _, and ~ are unreserved. They never need encoding and can appear directly in a URL.
- Identify reserved characters: The characters :, /, ?, #, [, ], @, !, $, &, ', (, ), *, +, ,, ;, and = are reserved for URL syntax. They must be encoded when used as data rather than delimiters.
- Encode everything else: Any character not in the unreserved or reserved set must be percent-encoded. This includes spaces, control characters, and all non-ASCII characters.
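The steps above can be sketched as a small encoder. This is a minimal illustration, not a replacement for a library function: it treats the input as data for a single URL component, so reserved characters are encoded along with everything else outside the unreserved set. The function name `percent_encode` is hypothetical.

```python
import string

# Unreserved characters never need encoding.
UNRESERVED = set(string.ascii_letters + string.digits + "-._~")

def percent_encode(text: str) -> str:
    """Percent-encode text for use as data in one URL component (sketch)."""
    out = []
    for byte in text.encode("utf-8"):      # work on bytes, not characters
        ch = chr(byte)
        if ch in UNRESERVED:
            out.append(ch)                 # safe to emit as-is
        else:
            out.append(f"%{byte:02X}")     # %XX with uppercase hex digits
    return "".join(out)

print(percent_encode("rock & roll"))       # rock%20%26%20roll
print(percent_encode("a-b_c.d~e"))         # a-b_c.d~e
```

In real code, a standard library function such as Python's `urllib.parse.quote` is the safer choice; the sketch only shows the logic.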
The Encoding Process for Non-ASCII Characters
Non-ASCII characters (like accented letters, CJK characters, and emoji) require an additional step. First, the character is converted to its UTF-8 byte sequence. Then each byte is percent-encoded individually. This means a single character can produce multiple percent-encoded triplets.
For example, the Euro sign (€) has the UTF-8 byte
sequence E2 82 AC. When URL-encoded, it becomes
%E2%82%AC. The Chinese character for "middle"
(中) has the UTF-8 sequence E4 B8 AD,
producing %E4%B8%AD. The emoji 😀 (grinning face)
has a 4-byte UTF-8 sequence, encoding to %F0%9F%98%80.
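The byte sequences quoted above can be reproduced with Python's standard library. The sketch below prints each character's UTF-8 bytes next to the percent-encoded form returned by `urllib.parse.quote`.

```python
from urllib.parse import quote

for ch in ["€", "中", "😀"]:
    utf8 = ch.encode("utf-8")                      # step 1: UTF-8 bytes
    hex_bytes = " ".join(f"{b:02X}" for b in utf8)
    print(hex_bytes, "->", quote(ch))              # step 2: one %XX per byte

# E2 82 AC -> %E2%82%AC
# E4 B8 AD -> %E4%B8%AD
# F0 9F 98 80 -> %F0%9F%98%80
```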
Common Encoded Characters
The following table shows the most frequently encountered URL-encoded characters:
| Character | Encoded | ASCII Code | Reason |
|---|---|---|---|
| Space | %20 | 32 (0x20) | Not allowed in URLs |
| ! | %21 | 33 (0x21) | Reserved (sub-delim) |
| # | %23 | 35 (0x23) | Reserved (fragment delimiter) |
| $ | %24 | 36 (0x24) | Reserved (sub-delim) |
| & | %26 | 38 (0x26) | Reserved (query separator) |
| ' | %27 | 39 (0x27) | Reserved (sub-delim) |
| ( | %28 | 40 (0x28) | Reserved (sub-delim) |
| ) | %29 | 41 (0x29) | Reserved (sub-delim) |
| + | %2B | 43 (0x2B) | Reserved (space in query) |
| , | %2C | 44 (0x2C) | Reserved (sub-delim) |
| / | %2F | 47 (0x2F) | Reserved (path separator) |
| : | %3A | 58 (0x3A) | Reserved (scheme delimiter) |
| ; | %3B | 59 (0x3B) | Reserved (param separator) |
| = | %3D | 61 (0x3D) | Reserved (value delimiter) |
| ? | %3F | 63 (0x3F) | Reserved (query delimiter) |
| @ | %40 | 64 (0x40) | Reserved (authority delimiter) |
| [ | %5B | 91 (0x5B) | Reserved (IPv6 literal) |
| ] | %5D | 93 (0x5D) | Reserved (IPv6 literal) |
URL Encoding vs HTML Encoding
URL encoding and HTML entity encoding are often confused, but they serve entirely different purposes and operate in different contexts.
URL Encoding
URL encoding converts characters to %XX format for safe
transmission within a URL. It is governed by RFC 3986 and is applied
to URL components like the path, query string, and fragment.
HTML Encoding
HTML encoding converts characters to entity references like
&amp;, &lt;, and
&gt; for safe rendering within an HTML document. It
prevents the browser from interpreting content as HTML markup. HTML
encoding is governed by the HTML specification.
Key Differences
| Aspect | URL Encoding | HTML Encoding |
|---|---|---|
| Context | URLs and URIs | HTML documents |
| Format | %XX (percent + hex) | &name; or &#NNN; |
| Example for < | %3C | &lt; |
| Example for & | %26 | &amp; |
| Example for " | %22 | &quot; |
| Purpose | Safe URL transmission | Prevent HTML injection |
| Specification | RFC 3986 | HTML Living Standard |
In practice, you often need both. When embedding a URL in HTML (like
an href attribute), the URL itself needs URL encoding,
and the entire attribute value may need HTML encoding if it contains
characters like & or ". For example, a search for the literal text
"rock&roll" would be linked as <a href="/search?q=rock%26roll">
(URL-encoded ampersand). Writing
<a href="/search?q=rock&amp;roll"> (HTML-encoded ampersand) is valid
HTML, but it means something different: the browser turns &amp; back
into &, which then acts as a parameter separator. When the ampersand
is data, URL-encode it; reserve &amp; for the separators between
parameters.
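The two layers can be applied in order: URL-encode the data first, then HTML-encode the finished URL when placing it in an attribute. The sketch below uses Python's standard library; the second parameter `lang` and the link text are assumptions added only to show a real separator next to an encoded ampersand.

```python
from html import escape
from urllib.parse import urlencode

# Step 1: URL-encode the data. The "&" inside the value becomes %26,
# while the "&" between parameters is inserted by urlencode as a separator.
url = "/search?" + urlencode({"q": "rock&roll", "lang": "en"})
print(url)    # /search?q=rock%26roll&lang=en

# Step 2: HTML-encode the whole attribute value.
href = escape(url, quote=True)
print(f'<a href="{href}">Search</a>')
# <a href="/search?q=rock%26roll&amp;lang=en">Search</a>
```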
URL Encoding in Different Programming Languages
Every major programming language provides built-in functions for URL encoding and decoding. However, the exact behavior varies, and understanding the differences is crucial for avoiding bugs.
JavaScript
JavaScript provides three encoding functions, each with different behavior:
// encodeURI - encodes a complete URL
// Preserves: :, /, ?, #, &, =, +, $, @, ;, ,, !, ~, *, ', (, )
const url = encodeURI("https://example.com/search?q=hello world");
// Result: "https://example.com/search?q=hello%20world"
// encodeURIComponent - encodes a URL component
// Encodes URL delimiters too; leaves only letters, digits, and - _ . ! ~ * ' ( )
const param = encodeURIComponent("price=100&discount=20");
// Result: "price%3D100%26discount%3D20"
// Never use escape() - it is deprecated
// It does not handle Unicode correctly
The critical distinction is that encodeURI preserves URL
structure characters, while encodeURIComponent encodes
them along with everything else except letters, digits, and
- _ . ! ~ * ' ( ). Use encodeURI when you have a full URL and
encodeURIComponent when you are encoding a single
parameter value.
Python
from urllib.parse import quote, quote_plus, urlencode
# quote - standard URL encoding (spaces as %20)
encoded = quote("hello world&more")
# Result: "hello%20world%26more"
# quote_plus - spaces become + instead of %20
encoded_plus = quote_plus("hello world")
# Result: "hello+world"
# urlencode - encodes a dictionary as a query string
params = {"q": "hello world", "lang": "en"}
query = urlencode(params)
# Result: "q=hello+world&lang=en"
PHP
// urlencode - spaces become +
$encoded = urlencode("hello world&more");
// Result: "hello+world%26more"
// rawurlencode - RFC 3986 compliant (spaces as %20)
$raw = rawurlencode("hello world&more");
// Result: "hello%20world%26more"
The Space Character: %20 vs +
The space character has two common encodings, and understanding when to use each is important:
- %20: The standard percent-encoding defined by RFC 3986. This is the correct encoding for spaces in the path component of a URL and is always safe.
- + (plus sign): Defined by the application/x-www-form-urlencoded media type, which is used for encoding form data in query strings. In this encoding, spaces are replaced with plus signs, and plus signs themselves are encoded as %2B.
The distinction matters because decoding + as a space is
only correct in the context of
application/x-www-form-urlencoded data. In the URL path
component, + is a literal plus sign, not a space. Most
server-side frameworks handle this correctly for query strings, but it
can cause subtle bugs when encoding paths or working with custom URL
schemes.
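The difference is easy to see by encoding and decoding the same strings both ways. The sketch below uses Python's `urllib.parse`, where the `_plus` variants follow the form-encoding convention and the plain variants follow RFC 3986.

```python
from urllib.parse import quote, quote_plus, unquote, unquote_plus

# Encoding: a literal plus sign is always escaped, only the space differs.
print(quote("a b+c"))         # a%20b%2Bc
print(quote_plus("a b+c"))    # a+b%2Bc

# Decoding: "+" only means "space" under form-encoding rules.
print(unquote("a+b"))         # a+b  (path semantics: literal plus)
print(unquote_plus("a+b"))    # a b  (form semantics: space)
```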
Security Implications
URL encoding has significant security implications that every web developer must understand.
Double Encoding Attacks
Double encoding occurs when data is URL-encoded more than once. An
attacker might submit %2527, which decodes to
%27 on the first pass and to a single quote
' on the second pass. If a security filter only checks
the first decode, it would miss the malicious character. This can
bypass input validation, cross-site scripting (XSS) filters, and SQL
injection protections.
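The two decoding passes can be traced step by step. The sketch below also shows one possible defence: decoding until the value stops changing before any validation runs. The helper `fully_decode` and its `max_rounds` limit are assumptions made for illustration; some applications may prefer to reject multiply-encoded input outright.

```python
from urllib.parse import unquote

def fully_decode(value: str, max_rounds: int = 5) -> str:
    """Decode repeatedly until the value is stable (hypothetical helper)."""
    for _ in range(max_rounds):
        decoded = unquote(value)
        if decoded == value:       # nothing left to decode
            return value
        value = decoded
    raise ValueError("too many layers of percent-encoding")

payload = "%2527"
first_pass = unquote(payload)       # "%27": still looks harmless
second_pass = unquote(first_pass)   # "'": the character the filter missed
print(first_pass, second_pass)

# Validating the fully decoded value closes the gap.
print(fully_decode(payload))        # '
```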
For example, consider a filter that blocks the string
<script>. An attacker might submit
%253Cscript%253E, which the filter sees as harmless
percent-encoded text. If the application later decodes the value a
second time, it becomes <script> after the filter has already run,
and the injected markup reaches the page.
URL Encoding Is Not Encryption
A common misconception is that URL encoding hides or protects data. It does not. URL encoding is trivially reversible; anyone can decode a percent-encoded string. Never use URL encoding as a substitute for encryption, authentication, or access control. Sensitive data like passwords, API keys, and personal information should never appear in URLs, encoded or not, because URLs are logged by browsers, servers, proxies, and can be visible in referrer headers.
Open Redirect Vulnerabilities
Some applications use URL-encoded redirect parameters like
/redirect?url=https%3A%2F%2Fevil.com. If the application
does not validate the target URL after decoding, attackers can use
this to redirect users to phishing sites. Always validate decoded URL
values against an allowlist of permitted domains.
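A minimal allowlist check might look like the sketch below. The host names in `ALLOWED_HOSTS` and the function name `is_safe_redirect` are placeholders, and this version deliberately rejects relative targets; adapt the rules to your own application.

```python
from urllib.parse import parse_qs, urlsplit

ALLOWED_HOSTS = {"example.com", "www.example.com"}   # assumed allowlist

def is_safe_redirect(target: str) -> bool:
    """Accept only absolute http(s) URLs whose host is allowlisted (sketch)."""
    try:
        parts = urlsplit(target)
    except ValueError:               # malformed URL, e.g. a broken IPv6 literal
        return False
    return parts.scheme in {"http", "https"} and parts.hostname in ALLOWED_HOSTS

request_url = "/redirect?url=https%3A%2F%2Fevil.com"
# parse_qs percent-decodes the value, so validation sees the real target.
target = parse_qs(urlsplit(request_url).query)["url"][0]

print(target)                                            # https://evil.com
print(is_safe_redirect(target))                          # False
print(is_safe_redirect("https://example.com/account"))   # True
```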
Path Traversal
URL encoding can be used to disguise path traversal sequences. The
string ../../etc/passwd might appear as
%2e%2e%2f%2e%2e%2f%65%74%63%2f%70%61%73%73%77%64. Web
servers must decode and normalize paths before checking for traversal
attacks.
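The decode-then-normalize order can be sketched as follows. `BASE_DIR` and `resolve_request_path` are hypothetical names, and the check is purely lexical: it does not follow symlinks, so a production server should rely on its framework's own safe path handling.

```python
import posixpath
from urllib.parse import unquote

BASE_DIR = "/var/www/static"   # assumed document root

def resolve_request_path(raw_path: str) -> str:
    """Decode, normalize, then confirm the path stays under BASE_DIR (sketch)."""
    decoded = unquote(raw_path)                          # 1. decode first
    joined = posixpath.join(BASE_DIR, decoded.lstrip("/"))
    candidate = posixpath.normpath(joined)               # 2. collapse ../ segments
    if candidate != BASE_DIR and not candidate.startswith(BASE_DIR + "/"):
        raise ValueError("path escapes the base directory")   # 3. then check
    return candidate

print(resolve_request_path("css/site.css"))   # /var/www/static/css/site.css

try:
    resolve_request_path("%2e%2e%2f%2e%2e%2f%65%74%63%2f%70%61%73%73%77%64")
except ValueError as err:
    print("rejected:", err)
```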
Practical Examples
Encoding Query Parameters
When building URLs with dynamic query parameters, always encode the values:
const baseUrl = "https://api.example.com/search";
const query = "user input with special chars: & ? = #";
const url = `${baseUrl}?q=${encodeURIComponent(query)}`;
// Result: "https://api.example.com/search?q=user%20input%20with%20special%20chars%3A%20%26%20%3F%20%3D%20%23"
Encoding URL Paths
Path segments that contain special characters need encoding, but you must not encode the path separators:
const category = "electronics & gadgets";
const product = "USB-C cable (6ft)";
const url = `/products/${encodeURIComponent(category)}/${encodeURIComponent(product)}`;
// Result: "/products/electronics%20%26%20gadgets/USB-C%20cable%20(6ft)"
Decoding URL Components
const encoded = "hello%20world%3F%26%3D%23";
const decoded = decodeURIComponent(encoded);
// Result: "hello world?&=#"
Common Pitfalls
- Using the wrong encoding function: Using encodeURI for query parameter values leaves & and = unencoded, breaking the query string structure. Always use encodeURIComponent for individual parameter values.
- Encoding twice: If your framework already encodes URL parameters, manually encoding them again results in double-encoded values like %2520 instead of %20. Know your framework's behavior.
- Not decoding server-side: Most web frameworks automatically decode URL parameters, but if you are processing raw request URLs, you must decode them yourself. Failing to do so means your application receives encoded strings instead of the actual data.
- Confusing URL encoding with Base64: Base64 encoding is a different mechanism used for encoding binary data as ASCII text. It produces strings like aGVsbG8=, not percent-encoded strings. They are not interchangeable.
- Assuming all encodings are the same: URL encoding, HTML encoding, JavaScript string escaping (\n, \t), and CSS escaping are all different. Use the correct encoding for each context.
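Two of these pitfalls are easy to reproduce. The short Python sketch below shows how a second encoding pass turns %20 into %2520, and how Base64 output differs from percent-encoding for the same kind of input.

```python
import base64
from urllib.parse import quote

once = quote("a b")      # first pass: the space becomes %20
twice = quote(once)      # second pass: the % itself becomes %25
print(once, twice)       # a%20b a%2520b

# Base64 is a different scheme with a different alphabet and output.
b64 = base64.b64encode(b"hello").decode("ascii")
print(b64)               # aGVsbG8=
```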
Frequently Asked Questions
What is URL encoding?
URL encoding, also called percent-encoding, is a mechanism that converts characters into a format that can be safely transmitted in a URL. It replaces unsafe or reserved characters with a percent sign followed by two hexadecimal digits representing the character's byte value. For example, a space becomes %20 and a question mark becomes %3F.
Why do some URLs contain %20?
%20 is the URL-encoded representation of a space character. Spaces are not allowed in URLs because they can be ambiguous and break URL parsing. The percent sign followed by 20 represents the hexadecimal value 0x20, which is the ASCII code for a space. Some systems also use + for spaces in query strings, but %20 is the standard encoding.
What is the difference between URL encoding and HTML encoding?
URL encoding converts characters to percent-encoded format (%XX) for safe transmission in URLs. HTML encoding converts characters to entity references (&amp;, &lt;, etc.) for safe display in HTML documents. They serve different purposes: URL encoding is for URLs, HTML encoding is for HTML content. A character like < is written differently in each: %3C in a URL and &lt; in HTML.
When should I use encodeURIComponent vs encodeURI?
Use encodeURI when encoding a complete URL, as it preserves URL structure characters like :, /, ?, &, and =. Use encodeURIComponent when encoding a single URL component value (like a query parameter), as it also encodes the characters that have structural meaning in URLs. Always use encodeURIComponent for query parameter values.
Can URL encoding be used for security?
URL encoding alone is not a security measure. It is a transport mechanism, not encryption or authentication. In fact, double encoding attacks can bypass security filters by encoding malicious input multiple times. Always validate and sanitize input on the server side, use parameterized queries for databases, and apply proper output encoding for the target context (HTML, URL, JavaScript).