
URL Encoding Explained: What Is Percent-Encoding

Every time you click a link, submit a form, or type a web address, your browser handles URL encoding behind the scenes. Those cryptic strings of percent signs and hexadecimal numbers you see in URLs are not random; they are a carefully designed mechanism that makes the internet work reliably. Understanding URL encoding is essential for web developers, API designers, and anyone who works with web technologies. This guide explains what URL encoding is, why it exists, how it works, and the security implications you need to know about.

What Is URL Encoding?

URL encoding, formally known as percent-encoding, is a mechanism defined in RFC 3986 that converts characters into a format that can be safely transmitted within a Uniform Resource Identifier (URI). The URL specification only allows a subset of ASCII characters to appear directly in a URL. Any character outside this allowed set must be written as one or more triplets, each consisting of a percent sign (%) followed by two hexadecimal digits that give the value of one byte of the character.

For example, the space character cannot appear directly in a URL. When encoded, it becomes %20, where 20 is the hexadecimal representation of 32, the ASCII code for a space. Similarly, the less-than symbol < becomes %3C, because the ASCII code for < is 60, which is 3C in hexadecimal.
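The arithmetic above can be checked with a short Python sketch. The helper name percent_encode_byte is hypothetical and only handles single ASCII characters:

```python
def percent_encode_byte(ch: str) -> str:
    """Return the %XX triplet for one ASCII character (illustrative helper)."""
    code = ord(ch)  # decimal character code, e.g. 32 for a space
    if code > 0x7F:
        raise ValueError("this sketch only handles single-byte ASCII characters")
    return f"%{code:02X}"  # two uppercase hexadecimal digits


print(ord(" "), percent_encode_byte(" "))  # 32 %20
print(ord("<"), percent_encode_byte("<"))  # 60 %3C
```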

Why URL Encoding Is Necessary

URLs have a specific syntax that uses certain characters as delimiters. The question mark ? separates the path from the query string. The ampersand & separates query parameters. The equals sign = separates parameter names from values. The forward slash / separates path segments. If these characters appear in the data being transmitted, they would be misinterpreted as URL structure rather than content.

Consider what happens when a user searches for "rock & roll" on a website. Without encoding, the URL would look like /search?q=rock & roll. The server's query parser would treat the & as a parameter separator, splitting the query into two parameters: q=rock and a second parameter named roll with no value. The raw spaces are also invalid in a URL. URL encoding fixes both problems: /search?q=rock%20%26%20roll, where %20 encodes each space and %26 encodes the ampersand.
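A minimal sketch of this failure, using Python's standard urllib.parse module to stand in for the server's query parser (real frameworks may differ in how they treat the stray spaces):

```python
from urllib.parse import parse_qs, quote

raw_query = "q=rock & roll"                           # unencoded, as typed
encoded_query = "q=" + quote("rock & roll", safe="")  # value encoded first

# keep_blank_values=True keeps the valueless " roll" parameter visible.
broken = parse_qs(raw_query, keep_blank_values=True)
fixed = parse_qs(encoded_query)

print(broken)         # {'q': ['rock '], ' roll': ['']}
print(encoded_query)  # q=rock%20%26%20roll
print(fixed)          # {'q': ['rock & roll']}
```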

How Percent-Encoding Works

The encoding process follows a straightforward algorithm defined by the URI specification:

  1. Identify unreserved characters: The characters A-Z, a-z, 0-9, -, ., _, and ~ are unreserved. They never need encoding and can appear directly in a URL.
  2. Identify reserved characters: The characters :, /, ?, #, [, ], @, !, $, &, ', (, ), *, +, ,, ;, and = are reserved for URL syntax. They must be encoded when used as data rather than delimiters.
  3. Encode everything else: Any character in neither the unreserved nor the reserved set must be percent-encoded. This includes spaces, control characters, other ASCII punctuation such as ", <, >, and {, and all non-ASCII characters.
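The steps above can be sketched in a few lines of Python. This is an illustration rather than a reference implementation: the function name percent_encode is hypothetical, and it assumes the input is a data value (such as a single query parameter), so reserved characters are encoded as well.

```python
import string

# Step 1: the unreserved set, which is never encoded.
UNRESERVED = set(string.ascii_letters + string.digits + "-._~")


def percent_encode(data: str) -> str:
    """Encode a data value: keep unreserved characters, encode every other byte."""
    out = []
    for byte in data.encode("utf-8"):  # non-ASCII characters yield several bytes
        ch = chr(byte)
        if ch in UNRESERVED:
            out.append(ch)
        else:
            # Steps 2 and 3: reserved characters used as data, and everything else.
            out.append(f"%{byte:02X}")
    return "".join(out)


print(percent_encode("rock & roll"))  # rock%20%26%20roll
print(percent_encode("a-b_c.d~e"))    # a-b_c.d~e
print(percent_encode("a/b?c=d"))      # a%2Fb%3Fc%3Dd
```

For real code, the standard library call urllib.parse.quote(value, safe="") does the same job.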

The Encoding Process for Non-ASCII Characters

Non-ASCII characters (like accented letters, CJK characters, and emoji) require an additional step. First, the character is converted to its UTF-8 byte sequence. Then each byte is percent-encoded individually. This means a single character can produce multiple percent-encoded triplets.

For example, the Euro sign (€) has the UTF-8 byte sequence E2 82 AC. When URL-encoded, it becomes %E2%82%AC. The Chinese character for "middle" (中) has the UTF-8 sequence E4 B8 AD, producing %E4%B8%AD. The emoji 😀 (grinning face) has a 4-byte UTF-8 sequence, encoding to %F0%9F%98%80.
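The two-step process (UTF-8 bytes first, then one triplet per byte) can be reproduced with Python's standard library. This sketch only prints the intermediate bytes next to the result of urllib.parse.quote:

```python
from urllib.parse import quote, unquote

encoded = {}
for ch in ["€", "中", "😀"]:
    utf8_bytes = ch.encode("utf-8")                       # step 1: UTF-8 bytes
    hex_bytes = " ".join(f"{b:02X}" for b in utf8_bytes)
    encoded[ch] = quote(ch)                               # step 2: one %XX per byte
    print(ch, hex_bytes, encoded[ch])

# Decoding reverses both steps.
print(unquote("%E2%82%AC"))  # €
```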

Common Encoded Characters

The following table shows the most frequently encountered URL-encoded characters:

| Character | Encoded | ASCII Code | Reason |
| --- | --- | --- | --- |
| Space | %20 | 32 (0x20) | Not allowed in URLs |
| ! | %21 | 33 (0x21) | Reserved (sub-delim) |
| # | %23 | 35 (0x23) | Reserved (fragment delimiter) |
| $ | %24 | 36 (0x24) | Reserved (sub-delim) |
| & | %26 | 38 (0x26) | Reserved (query separator) |
| ' | %27 | 39 (0x27) | Reserved (sub-delim) |
| ( | %28 | 40 (0x28) | Reserved (sub-delim) |
| ) | %29 | 41 (0x29) | Reserved (sub-delim) |
| + | %2B | 43 (0x2B) | Reserved (space in query) |
| , | %2C | 44 (0x2C) | Reserved (sub-delim) |
| / | %2F | 47 (0x2F) | Reserved (path separator) |
| : | %3A | 58 (0x3A) | Reserved (scheme delimiter) |
| ; | %3B | 59 (0x3B) | Reserved (param separator) |
| = | %3D | 61 (0x3D) | Reserved (value delimiter) |
| ? | %3F | 63 (0x3F) | Reserved (query delimiter) |
| @ | %40 | 64 (0x40) | Reserved (authority delimiter) |
| [ | %5B | 91 (0x5B) | Reserved (IPv6 literal) |
| ] | %5D | 93 (0x5D) | Reserved (IPv6 literal) |

URL Encoding vs HTML Encoding

URL encoding and HTML entity encoding are often confused, but they serve entirely different purposes and operate in different contexts.

URL Encoding

URL encoding converts characters to %XX format for safe transmission within a URL. It is governed by RFC 3986 and is applied to URL components like the path, query string, and fragment.

HTML Encoding

HTML encoding converts characters to entity references like &amp;, &lt;, and &gt; for safe rendering within an HTML document. It prevents the browser from interpreting content as HTML markup. HTML encoding is governed by the HTML specification.

Key Differences

| Aspect | URL Encoding | HTML Encoding |
| --- | --- | --- |
| Context | URLs and URIs | HTML documents |
| Format | %XX (percent + hex) | &name; or &#NNN; |
| Example for < | %3C | &lt; |
| Example for & | %26 | &amp; |
| Example for " | %22 | &quot; |
| Purpose | Safe URL transmission | Prevent HTML injection |
| Specification | RFC 3986 | HTML Living Standard |

In practice, you often need both. When embedding a URL in HTML (like an href attribute), the URL itself needs URL encoding, and the entire attribute value then needs HTML encoding if it contains characters like & or ". The two layers solve different problems and are not interchangeable. If the ampersand is part of the data, as in a search for rock&roll, URL-encode it: <a href="/search?q=rock%26roll">. If the ampersand separates two query parameters, keep it as a delimiter in the URL and HTML-encode it in the attribute, for example <a href="/search?q=rock&amp;lang=en">. Writing <a href="/search?q=rock&amp;roll"> is valid HTML, but the browser requests /search?q=rock&roll, which the server parses as q=rock plus a second parameter named roll, so the intended search term is lost.
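One way to apply both layers in the right order is sketched below in Python: URL-encode the data first, then HTML-encode the finished URL when placing it in the attribute. The lang=en parameter and the link text are made up for illustration.

```python
from html import escape
from urllib.parse import quote

search_term = "rock&roll"  # this ampersand is data

# Layer 1: URL-encode the value; the "&" before lang is a real delimiter.
href = "/search?q=" + quote(search_term, safe="") + "&lang=en"

# Layer 2: HTML-encode the whole attribute value.
anchor = f'<a href="{escape(href, quote=True)}">Search</a>'

print(href)    # /search?q=rock%26roll&lang=en
print(anchor)  # <a href="/search?q=rock%26roll&amp;lang=en">Search</a>
```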

URL Encoding in Different Programming Languages

Every major programming language provides built-in functions for URL encoding and decoding. However, the exact behavior varies, and understanding the differences is crucial for avoiding bugs.

JavaScript

JavaScript provides three encoding functions, each with different behavior:

// encodeURI - encodes a complete URL
// Preserves: :, /, ?, #, &, =, +, $, @, ;, ,, !, ~, *, ', (, )
const url = encodeURI("https://example.com/search?q=hello world");
// Result: "https://example.com/search?q=hello%20world"

// encodeURIComponent - encodes a URL component
// Encodes URL delimiters too; leaves only A-Z a-z 0-9 - _ . ! ~ * ' ( ) unencoded
const param = encodeURIComponent("price=100&discount=20");
// Result: "price%3D100%26discount%3D20"

// Never use escape() - it is deprecated
// It does not handle Unicode correctly

The critical distinction is that encodeURI preserves URL structure characters, while encodeURIComponent encodes them. Neither function encodes letters, digits, or the characters - _ . ! ~ * ' ( ). Use encodeURI when you have a full URL and encodeURIComponent when you are encoding a single parameter value.

Python

from urllib.parse import quote, quote_plus, urlencode

# quote - standard URL encoding (spaces as %20)
encoded = quote("hello world&more")
# Result: "hello%20world%26more"

# quote_plus - spaces become + instead of %20
encoded_plus = quote_plus("hello world")
# Result: "hello+world"

# urlencode - encodes a dictionary as a query string
params = {"q": "hello world", "lang": "en"}
query = urlencode(params)
# Result: "q=hello+world&lang=en"

PHP

// urlencode - spaces become +
$encoded = urlencode("hello world&more");
// Result: "hello+world%26more"

// rawurlencode - RFC 3986 compliant (spaces as %20)
$raw = rawurlencode("hello world&more");
// Result: "hello%20world%26more"

The Space Character: %20 vs +

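The space character has two common encodings, and understanding when to use each is important: %20 is the percent-encoding defined by RFC 3986 and works in any URL component, while + is the form-encoding convention used in query strings and form bodies.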

The distinction matters because decoding + as a space is only correct in the context of application/x-www-form-urlencoded data. In the URL path component, + is a literal plus sign, not a space. Most server-side frameworks handle this correctly for query strings, but it can cause subtle bugs when encoding paths or working with custom URL schemes.
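The mismatch is easy to reproduce with Python's urllib.parse, which offers a separate function pair for each convention. This sketch only shows what goes wrong when the decoder does not match the encoder:

```python
from urllib.parse import quote, quote_plus, unquote, unquote_plus

text = "1 + 1"
percent_style = quote(text, safe="")  # spaces as %20
form_style = quote_plus(text)         # spaces as +

print(percent_style)  # 1%20%2B%201
print(form_style)     # 1+%2B+1

# Matching decoders round-trip correctly.
assert unquote(percent_style) == text
assert unquote_plus(form_style) == text

# Mismatched decoders corrupt the value.
print(repr(unquote(form_style)))  # '1+++1'  (plus signs left as-is)
print(repr(unquote_plus("c++")))  # 'c  '    (literal plus signs in a path lost)
```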

Security Implications

URL encoding has significant security implications that every web developer must understand.

Double Encoding Attacks

Double encoding occurs when data is URL-encoded more than once. An attacker might submit %2527, which decodes to %27 on the first pass and to a single quote ' on the second pass. If a security filter only checks the first decode, it would miss the malicious character. This can bypass input validation, cross-site scripting (XSS) filters, and SQL injection protections.

For example, consider a filter that blocks the string <script>. An attacker might submit %253Cscript%253E, which the filter sees as harmless percent-encoded text even after one decode (%3Cscript%3E). If a later layer of the application decodes the value a second time, it becomes <script>, and the payload reaches the page unfiltered.
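The two decode passes can be traced in Python. The is_double_encoded helper is a hypothetical check, shown as one possible defense (reject input that still changes after a single decode) rather than a complete one:

```python
from urllib.parse import unquote

payload = "%253Cscript%253E"
once = unquote(payload)  # first pass:  %3Cscript%3E
twice = unquote(once)    # second pass: <script>

print(once)
print(twice)


def is_double_encoded(value: str) -> bool:
    """Return True if the value still changes after one decode pass."""
    decoded = unquote(value)
    return unquote(decoded) != decoded


print(is_double_encoded(payload))         # True
print(is_double_encoded("%3Cscript%3E"))  # False: encoded once only
print(is_double_encoded("100%25"))        # False: decodes to "100%"
```

Decoding exactly once, at a single well-defined point, and validating the result there avoids the gap between what the filter saw and what the application used.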

URL Encoding Is Not Encryption

A common misconception is that URL encoding hides or protects data. It does not. URL encoding is trivially reversible; anyone can decode a percent-encoded string. Never use URL encoding as a substitute for encryption, authentication, or access control. Sensitive data like passwords, API keys, and personal information should never appear in URLs, encoded or not, because URLs are logged by browsers, servers, and proxies, and can leak through referrer headers.

Open Redirect Vulnerabilities

Some applications use URL-encoded redirect parameters like /redirect?url=https%3A%2F%2Fevil.com. If the application does not validate the target URL after decoding, attackers can use this to redirect users to phishing sites. Always validate decoded URL values against an allowlist of permitted domains.
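A minimal allowlist check might look like the following Python sketch. The ALLOWED_HOSTS set and the is_safe_redirect name are assumptions for illustration, and the function expects the already-decoded parameter value:

```python
from urllib.parse import parse_qs, urlsplit

ALLOWED_HOSTS = {"example.com", "www.example.com"}  # hypothetical allowlist


def is_safe_redirect(target: str) -> bool:
    """Accept same-site relative paths or http(s) URLs on allowlisted hosts."""
    parts = urlsplit(target)
    if not parts.scheme and not parts.netloc:
        # Relative path: must start with a single "/" and contain no backslash.
        return (
            target.startswith("/")
            and not target.startswith("//")
            and "\\" not in target
        )
    return parts.scheme in {"http", "https"} and parts.hostname in ALLOWED_HOSTS


# parse_qs decodes the parameter, so validation runs on the decoded value.
target = parse_qs("url=https%3A%2F%2Fevil.com")["url"][0]
print(target, is_safe_redirect(target))                # https://evil.com False
print(is_safe_redirect("https://example.com/account")) # True
print(is_safe_redirect("/account"))                    # True
print(is_safe_redirect("//evil.com"))                  # False
```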

Path Traversal

URL encoding can be used to disguise path traversal sequences. The string ../../etc/passwd might appear as %2e%2e%2f%2e%2e%2f%65%74%63%2f%70%61%73%73%77%64. Web servers must decode and normalize paths before checking for traversal attacks.
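The decode-then-normalize order can be sketched as follows in Python. The resolve_under_root helper and the /var/www root are hypothetical; the sketch assumes POSIX-style paths and a root without a trailing slash, and it ignores symlinks.

```python
import posixpath
from urllib.parse import unquote


def resolve_under_root(root: str, encoded_path: str) -> str:
    """Decode, normalize, then verify the result stays inside root."""
    decoded = unquote(encoded_path)  # decode first
    candidate = posixpath.normpath(posixpath.join(root, decoded.lstrip("/")))
    if candidate != root and not candidate.startswith(root + "/"):
        raise ValueError("path escapes the document root")
    return candidate


print(resolve_under_root("/var/www", "images/logo.png"))  # /var/www/images/logo.png

attack = "%2e%2e%2f%2e%2e%2f%65%74%63%2f%70%61%73%73%77%64"
try:
    resolve_under_root("/var/www", attack)
except ValueError as exc:
    print("rejected:", exc)  # rejected: path escapes the document root
```

Checking for ".." before decoding would miss this input entirely, because the raw string contains no dots or slashes.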

Practical Examples

Encoding Query Parameters

When building URLs with dynamic query parameters, always encode the values:

const baseUrl = "https://api.example.com/search";
const query = "user input with special chars: & ? = #";
const url = `${baseUrl}?q=${encodeURIComponent(query)}`;
// Result: "https://api.example.com/search?q=user%20input%20with%20special%20chars%3A%20%26%20%3F%20%3D%20%23"

Encoding URL Paths

Path segments that contain special characters need encoding, but you must not encode the path separators:

const category = "electronics & gadgets";
const product = "USB-C cable (6ft)";
const url = `/products/${encodeURIComponent(category)}/${encodeURIComponent(product)}`;
// Result: "/products/electronics%20%26%20gadgets/USB-C%20cable%20(6ft)"
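For comparison, a rough Python equivalent of the two examples above, using urllib.parse. The build_url helper and the sort parameter are hypothetical. Note that quote with safe="" also encodes parentheses, unlike encodeURIComponent, and that urlencode writes spaces as + in the query string.

```python
from urllib.parse import quote, urlencode


def build_url(base: str, segments: list[str], params: dict[str, str]) -> str:
    """Join encoded path segments with literal slashes, then append the query."""
    # Encode each segment on its own so the "/" separators stay literal.
    path = "/".join(quote(segment, safe="") for segment in segments)
    query = urlencode(params)
    url = f"{base}/{path}"
    return f"{url}?{query}" if query else url


url = build_url(
    "/products",
    ["electronics & gadgets", "USB-C cable (6ft)"],
    {"sort": "price asc"},
)
print(url)
# /products/electronics%20%26%20gadgets/USB-C%20cable%20%286ft%29?sort=price+asc
```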

Decoding URL Components

const encoded = "hello%20world%3F%26%3D%23";
const decoded = decodeURIComponent(encoded);
// Result: "hello world?&=#"

Common Pitfalls

Security Best Practice: Always encode data for the correct output context. Use URL encoding for URLs, HTML encoding for HTML content, JavaScript encoding for JavaScript contexts, and CSS encoding for style attributes. Never rely on a single encoding function for all contexts.


Frequently Asked Questions

What is URL encoding?

URL encoding, also called percent-encoding, is a mechanism that converts characters into a format that can be safely transmitted in a URL. It replaces unsafe or reserved characters with a percent sign followed by two hexadecimal digits representing the character's byte value. For example, a space becomes %20 and a question mark becomes %3F.

Why do some URLs contain %20?

%20 is the URL-encoded representation of a space character. Spaces are not allowed in URLs because they can be ambiguous and break URL parsing. The percent sign followed by 20 represents the hexadecimal value 0x20, which is the ASCII code for a space. Some systems also use + for spaces in query strings, but %20 is the standard encoding.

What is the difference between URL encoding and HTML encoding?

URL encoding converts characters to percent-encoded format (%XX) for safe transmission in URLs. HTML encoding converts characters to entity references (&amp;, &lt;, etc.) for safe display in HTML documents. They serve different purposes: URL encoding is for URLs, HTML encoding is for HTML content. A value like < needs both: %3C in a URL and &lt; in HTML.

When should I use encodeURIComponent vs encodeURI?

Use encodeURI when encoding a complete URL, as it preserves URL structure characters like :, /, ?, &, and =. Use encodeURIComponent when encoding a single URL component value (like a query parameter), as it also encodes the characters that have structural meaning in URLs, leaving only letters, digits, and - _ . ! ~ * ' ( ) untouched. Always use encodeURIComponent for query parameter values.

Can URL encoding be used for security?

URL encoding alone is not a security measure. It is a transport mechanism, not encryption or authentication. In fact, double encoding attacks can bypass security filters by encoding malicious input multiple times. Always validate and sanitize input on the server side, use parameterized queries for databases, and apply proper output encoding for the target context (HTML, URL, JavaScript).