Complete Guide to HTML Entities
HTML entities are one of the most fundamental yet frequently misunderstood aspects of web development. They are the mechanism by which browsers can display characters that would otherwise be interpreted as HTML markup, and they play a critical role in web security. Whether you are encoding special characters to prevent cross-site scripting attacks, displaying mathematical symbols, or ensuring your content renders correctly across different encodings, understanding HTML entities is essential for every web developer.
What Are HTML Entities?
An HTML entity is a sequence of characters that represents a single
character in HTML. Entities begin with an ampersand
(&) and end with a semicolon (;). When
the browser encounters an entity, it replaces it with the
corresponding character during rendering. This mechanism allows you to
include characters in your HTML that would otherwise be impossible or
problematic to represent directly.
The need for entities arises from the fact that HTML uses certain
characters for its own syntax. The less-than sign (<)
starts a tag, the greater-than sign (>) ends a tag,
and the ampersand (&) starts an entity reference. If
you want to display these characters as visible content rather than
markup, you must encode them as entities.
Common HTML Entities
While HTML defines over 2,000 named entities, a small handful are used constantly in everyday web development. These are the entities you will encounter and use most frequently:
The Five Required Entities
| Character | Named Entity | Decimal Entity | Hex Entity | Usage |
|---|---|---|---|---|
| & | &amp; | &#38; | &#x26; | Must always be encoded to avoid ambiguity with entity references |
| < | &lt; | &#60; | &#x3C; | Must be encoded to prevent opening an HTML tag |
| > | &gt; | &#62; | &#x3E; | Must be encoded to prevent closing an HTML tag |
| " | &quot; | &#34; | &#x22; | Must be encoded inside double-quoted attribute values |
| ' | &apos; | &#39; | &#x27; | Must be encoded inside single-quoted attribute values |
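When a browser meets one of these references, it substitutes the character from the first column. The minimal JavaScript sketch below imitates that substitution for only the five required named entities plus the decimal and hexadecimal forms. decodeBasicEntities is a hypothetical helper; real HTML parsers know about 2,000 names and apply additional error-recovery rules that this sketch ignores.

```javascript
// Hypothetical helper: decodes only the five required named entities
// plus decimal (&#NNN;) and hexadecimal (&#xHHH;) numeric references.
const BASIC_NAMED = { amp: "&", lt: "<", gt: ">", quot: '"', apos: "'" };

// Browsers substitute U+FFFD for null, surrogate, or out-of-range code points.
function fromCodePointSafe(cp) {
  const valid = cp > 0 && cp <= 0x10ffff && !(cp >= 0xd800 && cp <= 0xdfff);
  return valid ? String.fromCodePoint(cp) : "\uFFFD";
}

function decodeBasicEntities(text) {
  // A single left-to-right pass, so "&amp;lt;" becomes "&lt;", not "<".
  return text.replace(
    /&(?:#(\d+)|#[xX]([0-9a-fA-F]+)|([a-zA-Z]+));/g,
    (match, dec, hex, name) => {
      if (dec !== undefined) return fromCodePointSafe(Number(dec));
      if (hex !== undefined) return fromCodePointSafe(parseInt(hex, 16));
      // Names outside the five (e.g. &copy; here) are left exactly as written.
      return Object.prototype.hasOwnProperty.call(BASIC_NAMED, name)
        ? BASIC_NAMED[name]
        : match;
    }
  );
}
```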
Frequently Used Symbol Entities
| Character | Named Entity | Description |
|---|---|---|
| (invisible) | &nbsp; | Non-breaking space (prevents line break) |
| © | &copy; | Copyright symbol |
| ® | &reg; | Registered trademark symbol |
| ™ | &trade; | Trademark symbol |
| — | &mdash; | Em dash (long dash) |
| – | &ndash; | En dash (short dash, for ranges) |
| « | &laquo; | Left-pointing double angle quotation mark |
| » | &raquo; | Right-pointing double angle quotation mark |
| • | &bull; | Bullet character |
| … | &hellip; | Horizontal ellipsis (three dots) |
Numeric vs Named Entities
HTML entities come in two forms: named and numeric. Understanding the difference helps you choose the right one for each situation.
Named Entities
Named entities use a human-readable word to identify the character.
For example, &lt; represents the less-than sign, and
&copy; represents the copyright symbol. Named
entities are easier to read and remember, which makes your HTML source
code more maintainable. However, named entities only exist for a
limited set of characters: approximately 2,000 out of the more than
140,000 characters in Unicode.
Numeric Entities
Numeric entities reference a character by its Unicode code point. They
come in two forms: decimal and hexadecimal. Decimal numeric entities
use the format &#NNN; where NNN is the decimal code
point. Hexadecimal entities use the format
&#xHHH; where HHH is the hexadecimal code point. For
example, the copyright symbol can be written as
&#169; (decimal) or
&#xA9; (hexadecimal).
Numeric entities can represent any Unicode character, making them essential for characters that do not have named entity references. If you need to display a rare CJK character, a historical script glyph, or an emoji that lacks a named entity, you must use the numeric form.
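Because a numeric reference is just the code point written in base 10 or base 16, it can be generated mechanically. This JavaScript sketch (numericEntities is a hypothetical name) produces both forms for each character in a string. Iterating with for...of walks the string by code point, so an emoji outside the Basic Multilingual Plane yields one reference rather than two broken surrogate halves.

```javascript
// Hypothetical helper: returns the decimal and hexadecimal numeric
// references for every character (code point) in a string.
function numericEntities(text) {
  const forms = [];
  for (const ch of text) { // for...of iterates by code point, not UTF-16 unit
    const cp = ch.codePointAt(0);
    forms.push({
      char: ch,
      decimal: `&#${cp};`,
      hex: `&#x${cp.toString(16).toUpperCase()};`,
    });
  }
  return forms;
}
```

For example, numericEntities("©") returns a single entry whose decimal form is &#169; and whose hex form is &#xA9;, the same two spellings given above.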
When to Use Each
- Use named entities for the five required characters (&amp;, &lt;, &gt;, &quot;, &apos;) and common symbols like &nbsp;, &copy;, and &mdash;. They are more readable and self-documenting.
- Use numeric entities when no named entity exists for the character you need, or when you are generating HTML programmatically and need a consistent, universal encoding method.
- Use hexadecimal numeric entities when working with Unicode code points, as hexadecimal is the standard notation in Unicode documentation and character charts.
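A programmatic encoder can combine these rules. The JavaScript sketch below shows one possible policy, not a standard one: named entities for &, <, >, and ", the numeric &#39; for the apostrophe (for the compatibility reason covered under Browser Compatibility), and hexadecimal references for every non-ASCII character, so the output is pure ASCII. encodeAsciiSafe is a hypothetical name.

```javascript
// Hypothetical helper: named entities for the reserved characters,
// hexadecimal numeric references for anything outside ASCII.
const RESERVED = new Map([
  ["&", "&amp;"],
  ["<", "&lt;"],
  [">", "&gt;"],
  ['"', "&quot;"],
  ["'", "&#39;"], // numeric form; &apos; was not defined in HTML 4
]);

function encodeAsciiSafe(text) {
  let out = "";
  for (const ch of text) { // iterate by code point so emoji stay whole
    const cp = ch.codePointAt(0);
    if (RESERVED.has(ch)) out += RESERVED.get(ch);
    else if (cp > 0x7f) out += `&#x${cp.toString(16).toUpperCase()};`;
    else out += ch; // plain ASCII passes through unchanged
  }
  return out;
}
```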
When to Use HTML Entities
Knowing when to use entities is just as important as knowing how. Here are the primary scenarios where entities are necessary or beneficial:
1. Displaying Reserved Characters
The most critical use case is encoding characters that HTML reserves
for its own syntax. If you want to display a literal less-than sign in
your content, you must write &lt;. If you write a
bare <, the browser will attempt to parse it as the
start of an HTML tag, which can break your page layout or create
security vulnerabilities.
2. Preventing Ambiguous Ampersands
An ampersand followed by a recognized entity name is decoded as that
entity. If you have an ampersand in your content that is not part of an
entity (such as in "Tom & Jerry"), you should encode it as &amp;.
An unencoded ampersand followed by text that happens to match an entity
name will be silently decoded. For example, in page text the string
"&copy=2024" is displayed as "©=2024", because browsers still recognize
a handful of legacy names such as &copy even without the trailing
semicolon. "AT&T" happens to render correctly only because &T is not an
entity name; writing AT&amp;T removes the guesswork.
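Stray ampersands are easy to miss in hand-written markup. A rough JavaScript lint check can flag every & that does not start a syntactically complete reference of the form &name;, &#NNN;, or &#xHHH;. findBareAmpersands is a hypothetical helper, and it does not verify that a name is a real entity, so treat each hit as something to review rather than a definite error.

```javascript
// Sticky (y) regex: a match must begin exactly at lastIndex.
const COMPLETE_REF = /&(?:[a-zA-Z][a-zA-Z0-9]*|#[0-9]+|#[xX][0-9a-fA-F]+);/y;

// Hypothetical lint helper: returns the index of each ampersand that is
// not followed by a syntactically complete character reference.
function findBareAmpersands(html) {
  const positions = [];
  for (let i = html.indexOf("&"); i !== -1; i = html.indexOf("&", i + 1)) {
    COMPLETE_REF.lastIndex = i;
    if (!COMPLETE_REF.test(html)) positions.push(i);
  }
  return positions;
}
```

Here findBareAmpersands("Tom & Jerry") returns [4], while "AT&amp;T" produces no hits.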
3. Including Non-Keyboard Characters
Characters like em dashes, copyright symbols, and mathematical
operators may be difficult to type on a standard keyboard. Entities
provide a reliable way to include these characters without depending
on your editor's input method or character map. For example,
&mdash; is often easier to type than finding the em
dash character on a keyboard.
4. Non-Breaking Spaces
The &nbsp; entity creates a non-breaking space, which
prevents the browser from wrapping text at that position. This is
essential for keeping related words together on the same line, such as
"10&nbsp;kg", "Chapter&nbsp;3", or brand names that should not be
split across lines. It is also commonly used to create visual spacing
in layouts where CSS is not appropriate.
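In generated text the same idea can be applied programmatically. This JavaScript sketch swaps the ordinary space between a number and a following unit for U+00A0, the character that &nbsp; produces. keepNumberWithUnit is a hypothetical name, and its pattern (a digit, one space, then a letter) is an illustrative assumption that will not catch cases such as "Chapter 3".

```javascript
// Hypothetical helper: replaces the space between a digit and a
// following letter with U+00A0 (NO-BREAK SPACE), so "10 kg" stays together.
function keepNumberWithUnit(text) {
  return text.replace(/(\d) (?=[A-Za-z])/g, "$1\u00A0");
}
```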
Security Implications: XSS Prevention
HTML entity encoding is one of the most important defenses against Cross-Site Scripting (XSS) attacks. XSS occurs when an attacker is able to inject executable code into a web page viewed by other users. Entity encoding prevents this by converting dangerous characters into their safe entity equivalents.
How Entity Encoding Prevents XSS
Consider a search page that displays the user's query in the results
heading. If a user searches for
<script>alert('xss')</script> and the page
renders this without encoding, the browser will execute the script.
However, if the output is entity-encoded, it becomes
&lt;script&gt;alert('xss')&lt;/script&gt;, which the browser displays as literal text rather than executing as
code.
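A minimal JavaScript sketch of that output encoding for the search-page scenario follows. escapeHtml and renderSearchHeading are hypothetical names and the heading markup is an assumption; in a real project your template engine's built-in escaping should normally do this job.

```javascript
// Hypothetical escaping helper. The ampersand is replaced first so the
// entities produced for the other characters are not encoded twice.
function escapeHtml(value) {
  return String(value)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Hypothetical template helper: the user's query becomes inert text.
function renderSearchHeading(query) {
  return `<h1>Results for ${escapeHtml(query)}</h1>`;
}
```

Passing the payload <script>alert('xss')</script> yields a heading containing &lt;script&gt;alert(&#39;xss&#39;)&lt;/script&gt;, which a browser shows as literal text.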
Context-Specific Encoding Rules
Entity encoding is effective in HTML content and attribute contexts, but different contexts require different encoding strategies:
- HTML content context: Encode <, >, and & as entities. This prevents injection of new HTML tags.
- HTML attribute context: Encode <, >, &, ", and ' as entities. Quotes must be encoded to prevent breaking out of the attribute value.
- JavaScript context: HTML entities do not work inside <script> tags. Use JavaScript string escaping (backslash escapes) instead, or use JSON.stringify() for data injection.
- CSS context: HTML entities do not work inside <style> tags. Use CSS escaping (backslash followed by hex code) for dynamic values.
- URL context: Use URL encoding (encodeURIComponent) for query parameters, not HTML entities.
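Contexts often nest. In a link built from user input, the query value sits inside a URL, and the URL sits inside an HTML attribute, so it needs both encodings in that order. This JavaScript sketch assumes a hypothetical /search endpoint with q and page parameters; searchLink and escapeHtml are illustrative names.

```javascript
// Hypothetical escaping helper for HTML content and quoted attributes.
function escapeHtml(value) {
  return String(value)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// URL context first: encodeURIComponent turns &, spaces, and double quotes
// into %XX escapes. Then HTML attribute context: the finished URL is
// escaped because it is placed inside href="...".
function searchLink(query) {
  const url = "/search?q=" + encodeURIComponent(query) + "&page=1";
  return `<a href="${escapeHtml(url)}">Search for ${escapeHtml(query)}</a>`;
}
```

For the query a&b, the href value becomes /search?q=a%26b&amp;page=1: %26 is the URL-level escape for the ampersand inside the value, and &amp; is the HTML-level escape for the separator between parameters.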
Entity Encoding in Different Contexts
In HTML Attributes
Attribute values require stricter encoding than HTML content because attributes use quotes as delimiters. A double-quoted attribute value must encode the double quote character, and a single-quoted attribute value must encode the single quote. The ampersand and angle brackets must also be encoded.
```html
<!-- Correct: entity-encoded attribute value -->
<a title="5 &gt; 3 is true">Example</a>
<!-- Incorrect: unencoded special characters -->
<a title="5 > 3 is true">Example</a>
```
In the incorrect example, the bare > is tolerated by HTML5 parsers,
which end a quoted attribute value only at its matching quote, so
modern browsers render both links the same. Encoding it anyway keeps a
single, simple rule for every attribute value: encode &, <, >, and the
quote characters.
In JavaScript
HTML entities are not processed inside
<script> tags. If you need to include special
characters in JavaScript code embedded in HTML, write them directly or
use JavaScript's own escape sequences:
```html
<!-- Wrong: entities don't work in script tags -->
<script>
var msg = "5 &gt; 3"; // This is the literal text "&gt;", not ">"
</script>
<!-- Correct: use JavaScript escapes -->
<script>
var msg = "5 \u003E 3"; // Unicode escape
var msg2 = "5 > 3"; // Or just use the character directly
</script>
```
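When injecting data from the server into inline JavaScript, a common
approach is to serialize it with JSON.stringify() on the server side
and place the result inside the script. This escapes quotes,
backslashes, and control characters, but it leaves < untouched, so a
string containing </script> would still close the script element
early. Escape < in the serialized output as well (for example as
\u003c), or pass the data through an HTML-escaped data- attribute
instead.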
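A JavaScript sketch of hardening the serialized output follows; serializeForInlineScript and window.__DATA__ are hypothetical names. Because JSON.stringify() does not escape <, the sketch rewrites <, >, and & as \u003c, \u003e, and \u0026. Those characters can only occur inside JSON strings, where \u escapes are valid, so the data parses back unchanged.

```javascript
// Hypothetical helper: JSON that is safe to place inside an inline
// <script>. Without the replacements, a string containing "</script>"
// would terminate the script element. data must be JSON-serializable.
function serializeForInlineScript(data) {
  return JSON.stringify(data)
    .replace(/</g, "\\u003c")
    .replace(/>/g, "\\u003e")
    .replace(/&/g, "\\u0026");
}

const payload = serializeForInlineScript({ note: "</script><script>alert(1)</script>" });
const page = `<script>window.__DATA__ = ${payload};</script>`;
```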
In CSS
Similarly, HTML entities are not processed inside
<style> tags. CSS has its own escape mechanism
using a backslash followed by the character's hexadecimal Unicode code
point:
```html
<!-- Wrong: entities don't work in style tags -->
<style>
.quote::before { content: "&ldquo;"; }
</style>
<!-- Correct: use CSS Unicode escapes -->
<style>
.quote::before { content: "\201C"; }
</style>
```
The CSS escape sequence uses a backslash followed by one to six hexadecimal digits. If fewer than six digits are used and the next character could be interpreted as a hex digit, add a space after the escape sequence.
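For dynamic values, a JavaScript sketch of that rule is below; cssStringEscape is a hypothetical name. It escapes everything except ASCII letters, digits, and spaces, and always ends each escape with a space, which sidesteps the hex-digit ambiguity at the cost of slightly longer output. The browser's CSS.escape() targets identifiers such as class names, a different context from quoted string values.

```javascript
// Hypothetical helper: escapes text for use inside a quoted CSS string.
// Each character other than [A-Za-z0-9 ] becomes "\" + hex code point
// + " "; the trailing space ends the escape and is not rendered.
function cssStringEscape(text) {
  let out = "";
  for (const ch of text) {
    out += /[A-Za-z0-9 ]/.test(ch)
      ? ch
      : "\\" + ch.codePointAt(0).toString(16).toUpperCase() + " ";
  }
  return out;
}

const rule = `.quote::before { content: "${cssStringEscape("\u201C")}"; }`;
// rule is: .quote::before { content: "\201C "; }
```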
Browser Compatibility
HTML entity support is excellent across all modern browsers. Four of
the required entities (&amp;, &lt;, &gt;, &quot;) have been supported
since the earliest browsers, and &apos; is supported by every modern
browser. Named entities for common symbols like &copy;, &nbsp;, and
&reg; also have universal support.
The main compatibility concern involves less common named entities
introduced in HTML5 and later specifications. Some older browsers may
not recognize entities like &nesim;, which produces ≂̸.
For maximum compatibility, use
numeric entities for uncommon characters, as numeric entity support
depends only on Unicode support, which is universal in modern
browsers.
One specific note: the &apos; entity was not part of
HTML 4 and was only defined in XHTML. While all modern browsers
support it, if you need to support very old browsers, use
&#39; instead for single quote encoding.
Practical Tips for Working with Entities
- Always encode ampersands first: When building an entity encoder, encode ampersands before other characters. If you encode angle brackets first, the resulting &lt; contains an ampersand that would then be double-encoded.
- Use a consistent encoding strategy: Decide whether your project will use named entities where available or numeric entities exclusively, and stick with that choice. Mixing styles makes source code harder to read.
- Let your templating engine handle encoding: Modern templating engines like React JSX, Vue templates, and Twig auto-encode output by default. Trust these mechanisms rather than manually encoding everything, and treat their opt-outs (React's dangerouslySetInnerHTML, Vue's v-html, Twig's raw filter) as places where safety becomes your responsibility.
- Do not double-encode: If content is already entity-encoded, encoding it again will produce visible entity references in the output (e.g., &amp;lt; displays as &lt; instead of <). This is a common bug when multiple encoding layers are applied.
- Test with special characters: Always test your encoding logic with strings that contain quotes, angle brackets, ampersands, and non-ASCII characters to verify correct behavior.
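Both the ordering bug and the double-encoding bug are easy to reproduce. This JavaScript sketch handles only & and < to stay short; the two function names are hypothetical.

```javascript
// Buggy order: "<" becomes "&lt;", then the ampersand pass turns that
// "&" into "&amp;", producing "&amp;lt;".
function escapeAmpLast(text) {
  return text.replace(/</g, "&lt;").replace(/&/g, "&amp;");
}

// Correct order: ampersands first, then the other characters.
function escapeAmpFirst(text) {
  return text.replace(/&/g, "&amp;").replace(/</g, "&lt;");
}

const wrongOrder = escapeAmpLast("a < b");               // "a &amp;lt; b"
const rightOrder = escapeAmpFirst("a < b");              // "a &lt; b"
const doubled = escapeAmpFirst(escapeAmpFirst("a < b")); // "a &amp;lt; b"
```

A browser displays wrongOrder and doubled as "a &lt; b", while rightOrder displays as "a < b".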
Frequently Asked Questions
What is the difference between named and numeric HTML entities?
Named entities use human-readable names like &amp; for ampersand or &lt; for less-than. Numeric entities use the character's Unicode code point, either in decimal (&#38;) or hexadecimal (&#x26;) form. Named entities are more readable but only exist for common characters. Numeric entities can represent any Unicode character.
Do HTML entities prevent XSS attacks?
HTML entity encoding is a key defense against XSS when outputting user-supplied data in HTML content and attributes. By encoding special characters like <, >, and &, you prevent attackers from injecting executable HTML or JavaScript. However, entity encoding alone is not sufficient in all contexts: you need context-specific encoding for JavaScript, CSS, and URL contexts.
When must I use HTML entities?
You must use HTML entities when you need to display characters that have special meaning in HTML: & (ampersand), < (less-than), > (greater-than), " (double quote in attributes), and ' (single quote in attributes). You should also use entities for characters that cannot be typed directly, such as copyright symbols, em dashes, and non-breaking spaces.
Are HTML entities still necessary with UTF-8?
Yes, HTML entities are still necessary even with UTF-8 encoding. While UTF-8 allows you to include most characters directly, the reserved characters must still be encoded wherever the parser would read them as markup: &, <, and > in content, plus the matching quote character inside attribute values. Entities are also useful for invisible characters like non-breaking spaces and zero-width spaces.
Can I use HTML entities in CSS or JavaScript?
HTML entities are not processed inside <style> or <script> tags. In CSS, use Unicode escape sequences like \2014 for an em dash. In JavaScript strings, use Unicode escapes like \u2014 or the actual UTF-8 character. HTML entities only work in HTML content and attribute values, not in scripting or styling contexts.