Understanding URL Encoding: A Complete Developer Guide
URL encoding — also called percent encoding — is how special characters are
represented safely inside a URL. Get it wrong and your links break, your form data gets
corrupted, and your API requests fail silently. This guide explains the rules, the edge cases,
and the right encoding function to use in JavaScript, PHP, and Python.
1. Why URL Encoding Exists
URLs can only contain a limited set of characters defined in
RFC 3986.
Characters outside this set — spaces, Unicode letters, symbols like &,
=, and # — must be encoded before they appear in a URL,
otherwise they would be interpreted as URL structure rather than data.
Percent encoding works by replacing each byte of the unsafe character with a
% sign followed by that byte's two-digit hexadecimal value:
Space → %20
& → %26
= → %3D
# → %23
/ → %2F
? → %3F
+ → %2B
The name “percent encoding” comes from that % prefix.
The RFC-correct term is percent encoding; “URL encoding” is the common name.
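As a quick check of the list above, the following Python sketch derives each escape from the character's code point and compares it with what the standard library's urllib.parse.quote produces. Python is used here only for illustration, and hex_escape is a hypothetical helper name, not part of any library.

```python
from urllib.parse import quote

# Characters from the list above, all single-byte ASCII.
chars = [" ", "&", "=", "#", "/", "?", "+"]

def hex_escape(ch: str) -> str:
    """Return '%' plus the two-digit uppercase hex value of an ASCII character."""
    return f"%{ord(ch):02X}"

table = {ch: hex_escape(ch) for ch in chars}

for ch, escaped in table.items():
    # safe="" stops quote() from leaving "/" unencoded, which is its default.
    assert quote(ch, safe="") == escaped
    print(repr(ch), "->", escaped)
```

The helper only covers single-byte characters; anything outside ASCII is first converted to UTF-8 bytes, and each byte gets its own escape.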
2. URL Structure — What Gets Encoded Where
A URL has several distinct components, each with its own encoding rules:
https://example.com:8080/search/web%20dev?q=hello+world&page=2#results
scheme    https
host      example.com
port      8080
path      /search/web%20dev
query     q=hello+world&page=2 (two key=value pairs)
fragment  results
| Component | Allowed unencoded | Must encode |
|---|---|---|
| Scheme (https) | a-z 0-9 + - . | Everything else |
| Host (example.com) | a-z 0-9 - . | Unicode domains use Punycode (xn--) |
| Path (/search/web%20dev) | a-z A-Z 0-9 - _ . ~ : @ ! $ & ' ( ) * + , ; = | Spaces, ?, #, and characters above ASCII 127 |
| Query key/value (q=hello+world) | a-z A-Z 0-9 - _ . ! ~ * ' ( ) | Spaces, &, =, #, and most symbols |
| Fragment (#results) | Same as query | Spaces and non-ASCII |
Key rule: Encode each path segment and each query parameter key/value separately.
Never encode the full URL at once — that encodes the structural ://,
?, and & delimiters too.
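A minimal Python sketch of this rule, rebuilding the example URL from its parts. The build_url helper and its parameters are hypothetical names chosen for illustration; only quote, urlencode, and urlunsplit come from the standard library.

```python
from urllib.parse import quote, urlencode, urlunsplit

def build_url(host: str, segments: list[str], params: dict[str, str], fragment: str = "") -> str:
    """Assemble an https URL, encoding each component on its own."""
    # Each segment is encoded separately; safe="" also encodes "/" inside a segment.
    path = "/" + "/".join(quote(seg, safe="") for seg in segments)
    # urlencode() encodes every key and value, then joins them with = and &.
    query = urlencode(params)
    return urlunsplit(("https", host, path, query, quote(fragment, safe="")))

url = build_url(
    "example.com:8080",
    ["search", "web dev"],
    {"q": "hello world", "page": "2"},
    "results",
)
print(url)  # https://example.com:8080/search/web%20dev?q=hello+world&page=2#results
```

Because the delimiters are added by the assembly step rather than passed through an encoder, a slash or ampersand inside a value can never be mistaken for structure.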
3. The %20 vs + Space Problem
Two different encodings exist for the space character, and mixing them up is one of the
most common URL bugs.
%20 — RFC 3986 Percent Encoding
%20 is the correct percent encoding of a space (ASCII 32 = 0x20).
It is valid in every part of a URL: path, query string, and fragment.
Use %20 in path segments and in modern APIs.
/files/my%20document.pdf ✅ correct path encoding
/search?q=hello%20world ✅ valid in query string too
+ — application/x-www-form-urlencoded
The + as space is only defined in the
application/x-www-form-urlencoded media type — the format HTML forms
use when submitted. It is not part of the RFC 3986 URL specification.
/search?q=hello+world ✅ fine in HTML form query strings
/files/my+document.pdf ❌ + means a literal plus sign in paths
When you decode a form-encoded query string, you must decode + as a space.
When you encode a path segment, never use + for spaces.
Rule: When in doubt, use %20. It is valid everywhere.
The + convention is a legacy of HTML forms and should not be used in REST APIs.
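The difference is easy to see side by side. The sketch below uses Python's urllib.parse, where quote follows the %20 convention and quote_plus follows the form convention; the sample string is made up for illustration.

```python
from urllib.parse import quote, quote_plus, unquote, unquote_plus

text = "hello world+1"

percent_form = quote(text, safe="")  # space -> %20, literal + -> %2B
form_form = quote_plus(text)         # space -> +,   literal + -> %2B

print(percent_form)  # hello%20world%2B1
print(form_form)     # hello+world%2B1

# Decoding: only unquote_plus() treats "+" as a space.
assert unquote("a+b") == "a+b"
assert unquote_plus("a+b") == "a b"
# Both decoders turn %20 back into a space.
assert unquote("a%20b") == unquote_plus("a%20b") == "a b"
```

Note that a literal plus sign is escaped as %2B in both conventions, which is what keeps it distinguishable from a space after decoding.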
4. JavaScript Encoding Functions
JavaScript has two built-in encoding functions that are frequently confused:
encodeURIComponent — Encode a Single Component
Encodes nearly every character except: A–Z a–z 0–9 - _ . ! ~ * ' ( ).
This is the function to use for query parameter keys and values, and for path segments.
encodeURIComponent('hello world') // "hello%20world"
encodeURIComponent('price=50&tax=5') // "price%3D50%26tax%3D5"
encodeURIComponent('café') // "caf%C3%A9"
// Building a query string correctly:
const q = 'C++ & algorithms';
const url = `/search?q=${encodeURIComponent(q)}`;
// → /search?q=C%2B%2B%20%26%20algorithms
encodeURI — Encode a Complete URL
Encodes a full URL but deliberately skips structural URL characters:
; / ? : @ & = + $ , # (plus everything encodeURIComponent leaves alone). Square brackets [ ] are encoded.
Use this only when you have a full URL that may contain spaces or non-ASCII characters.
encodeURI('https://example.com/my page?q=a&b=c')
// → "https://example.com/my%20page?q=a&b=c"
// ⚠️ encodeURI does NOT encode & = ? — they are kept as structure
encodeURI('https://example.com/?a=1&b=hello world')
// → "https://example.com/?a=1&b=hello%20world" (& and = untouched)
The Common Mistake
// ❌ Wrong: encodeURI leaves & and = unencoded, so a value containing them breaks the query
const badUrl = `/api?data=${encodeURI(userInput)}`;
// ✅ Correct: always use encodeURIComponent for values
const goodUrl = `/api?data=${encodeURIComponent(userInput)}`;
Quick rule: Use encodeURIComponent for query parameter values
and path segments. Only use encodeURI if you have a complete URL string that
may contain spaces.
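For readers who need the same behaviour outside JavaScript, the two functions can be approximated in Python by changing the set of characters that quote leaves alone. This is a rough sketch, not an exact port: the function names are hypothetical, it assumes Python 3.7 or later (where ~ is never encoded), and it does not reproduce JavaScript's URIError for malformed strings.

```python
from urllib.parse import quote

# Characters each JavaScript function leaves alone, beyond letters, digits,
# and "-", "_", ".", "~" (which quote() never encodes on Python 3.7+).
COMPONENT_SAFE = "!*'()"
URI_SAFE = COMPONENT_SAFE + ";/?:@&=+$,#"

def encode_uri_component(value: str) -> str:
    """Approximate encodeURIComponent: escape delimiters such as & = ? /."""
    return quote(value, safe=COMPONENT_SAFE)

def encode_uri(url: str) -> str:
    """Approximate encodeURI: keep URL delimiters, escape spaces and non-ASCII."""
    return quote(url, safe=URI_SAFE)

print(encode_uri_component("price=50&tax=5"))              # price%3D50%26tax%3D5
print(encode_uri("https://example.com/my page?q=a&b=c"))   # https://example.com/my%20page?q=a&b=c
```

Seen this way, the only difference between the two functions is the safe set, which is why using the wrong one silently leaves delimiters inside a value.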
5. PHP Encoding Functions
PHP has two main URL encoding functions and they encode spaces differently:
| Function | Space encoding | Standard | Use for |
|---|---|---|---|
| rawurlencode() | %20 | RFC 3986 | Path segments, modern APIs |
| urlencode() | + | HTML forms | Query strings for HTML forms |
| rawurldecode() | %20 → space | RFC 3986 | Decode path values |
| urldecode() | + → space | HTML forms | Decode query strings (note: $_GET already decoded) |
// Path segment — use rawurlencode
$filename = 'my document.pdf';
$url = '/files/' . rawurlencode($filename);
// → /files/my%20document.pdf
// Query string — use http_build_query (preferred) or urlencode
$params = ['q' => 'hello world', 'page' => 2];
$url = '/search?' . http_build_query($params);
// → /search?q=hello+world&page=2
// ✅ Always use http_build_query for query strings — handles all edge cases
$url = '/api/search?' . http_build_query([
'query' => $userInput,
'filter' => 'active&deleted', // & safely encoded to %26
]);
PHP tip: Never manually build query strings by concatenating values.
Always use http_build_query(). It handles encoding, delimiter placement,
array serialisation, and edge cases correctly.
6. Python Encoding Functions
Python's urllib.parse module covers all encoding needs:
from urllib.parse import quote, quote_plus, urlencode
# quote(): RFC 3986 percent encoding, space becomes %20
# Use for paths; "/" is left unencoded by default, so pass safe='' for a single path segment
quote('hello world') # 'hello%20world'
quote('café & résumé') # 'caf%C3%A9%20%26%20r%C3%A9sum%C3%A9'
# quote_plus() — encodes space as +
# Use for query string values (form encoding)
quote_plus('hello world') # 'hello+world'
quote_plus('a=1&b=2') # 'a%3D1%26b%3D2'
# urlencode() — builds whole query string from dict (uses quote_plus)
urlencode({'q': 'hello world', 'page': 2})
# → 'q=hello+world&page=2'
# Building a full URL safely:
base = 'https://api.example.com/search'
params = urlencode({'query': userInput, 'filter': 'a&b'})
url = f'{base}?{params}'
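One detail worth seeing in action is quote's safe parameter, which defaults to "/". The sketch below encodes the same string as a whole path and as a single segment; the file name is a made-up example.

```python
from urllib.parse import quote

segment = "reports/2024 Q1.pdf"  # hypothetical file name containing a slash

as_path = quote(segment)              # default safe="/" keeps the slash
as_segment = quote(segment, safe="")  # safe="" encodes the slash as %2F

print(as_path)     # reports/2024%20Q1.pdf
print(as_segment)  # reports%2F2024%20Q1.pdf
```

The default is convenient when the input is already a multi-segment path, but it lets a slash inside user-supplied data create an extra path level.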
7. Unicode and Non-ASCII Characters in URLs
Modern URLs can technically contain Unicode characters (called IRIs —
Internationalized Resource Identifiers), but when sending an HTTP request,
the path and query must be percent-encoded UTF-8 bytes.
// Character: é (U+00E9)
// UTF-8 bytes: 0xC3 0xA9
// Percent encoded: %C3%A9
encodeURIComponent('café') // JavaScript: "caf%C3%A9"
rawurlencode('café') // PHP: "caf%C3%A9"
urllib.parse.quote('café') # Python: 'caf%C3%A9'
All three languages encode the same characters to the same bytes, following UTF-8 +
RFC 3986. Multi-byte characters like Chinese, Arabic, Japanese, and Emoji require
multiple %xx sequences:
encodeURIComponent('日本語') // "%E6%97%A5%E6%9C%AC%E8%AA%9E"
encodeURIComponent('😊') // "%F0%9F%98%8A" (4 bytes for emoji)
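To make the byte-level process concrete, here is a hand-written encoder in Python that walks the UTF-8 bytes one at a time. It is a teaching sketch with a hypothetical name (percent_encode); in real code the standard library's quote does the same job, and the loop at the end checks that the two agree.

```python
from urllib.parse import quote

# RFC 3986 unreserved characters, which never need encoding.
UNRESERVED = set(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_.~"
)

def percent_encode(text: str) -> str:
    """Percent-encode every UTF-8 byte except RFC 3986 unreserved characters."""
    out = []
    for byte in text.encode("utf-8"):
        ch = chr(byte)
        # Bytes >= 0x80 belong to multi-byte sequences and are always escaped.
        if byte < 0x80 and ch in UNRESERVED:
            out.append(ch)
        else:
            out.append(f"%{byte:02X}")
    return "".join(out)

for sample in ["café", "日本語", "😊"]:
    encoded = percent_encode(sample)
    assert encoded == quote(sample, safe="")
    print(sample, len(sample.encode("utf-8")), "bytes ->", encoded)
```

Each non-ASCII character contributes one %xx sequence per UTF-8 byte, so the three-character string 日本語 expands to nine escapes.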
Internationalized Domain Names (IDN)
Domain names with non-ASCII characters use Punycode encoding, not
percent encoding. münchen.de becomes xn--mnchen-3ya.de.
Browsers handle this conversion automatically, but if you are building HTTP requests
programmatically, use your language's IDN library to convert before making the request.
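In Python, for example, the conversion can be sketched with the built-in idna codec. This is one possible approach rather than the only one: the built-in codec implements the older IDNA 2003 rules, and host_to_ascii is a hypothetical helper name.

```python
def host_to_ascii(host: str) -> str:
    """Convert a host name to its ASCII (Punycode) form using the built-in codec."""
    # The "idna" codec encodes each label and adds the xn-- prefix where needed.
    return host.encode("idna").decode("ascii")

print(host_to_ascii("münchen.de"))   # xn--mnchen-3ya.de
print(host_to_ascii("example.com"))  # example.com (plain ASCII is unchanged)

# The codec also converts back to the Unicode form.
assert b"xn--mnchen-3ya.de".decode("idna") == "münchen.de"
```

Only the host goes through this conversion; the path and query of the same URL still use percent-encoded UTF-8.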
8. Double Encoding — The Most Common Bug
Double encoding happens when you encode a value that is already encoded. The % of each
existing escape is itself encoded as %25, so every %20
becomes %2520.
// First encoding:
encodeURIComponent('hello world') // "hello%20world"
// Accidentally encoding again:
encodeURIComponent('hello%20world') // "hello%2520world" ❌ WRONG
// When decoded: "hello%20world" — not "hello world"
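To avoid double encoding, follow these rules:
- Only encode raw, unencoded values
- Never decode again a value that is already decoded, such as the result of
decodeURIComponent, urldecode(), urllib.parse.unquote(), or $_GET/$_POST in PHP; treat it as raw and encode it exactly once when building a URL
- If you cannot tell whether a value is encoded, decoding first and then re-encoding is a fallback, but it alters raw values that legitimately contain % sequences or +:
encode(decode(value))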
PHP note: $_GET and $_POST values are
already decoded by PHP. Never call urldecode() on them — it will
cause double-decode issues. Encode when building outgoing URLs; do not decode incoming ones.
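The effect is easy to reproduce. The Python sketch below encodes a value twice, shows that one decode only removes one layer, and then illustrates why decoding and re-encoding an unknown value is a fallback rather than a safe default; the sample strings are made up for illustration.

```python
from urllib.parse import quote, unquote

raw = "hello world"
once = quote(raw, safe="")    # hello%20world
twice = quote(once, safe="")  # hello%2520world

assert unquote(twice) == once          # one decode only undoes one layer
assert unquote(unquote(twice)) == raw  # two decodes are needed to recover the text

# A raw value that legitimately contains a percent sequence is changed
# by the extra decode, so the original text cannot be recovered.
literal = "%41 is the escape for A"
normalised = quote(unquote(literal), safe="")
print(normalised)  # A%20is%20the%20escape%20for%20A
```

Tracking whether a value is raw or encoded at each boundary of the program avoids having to guess.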
9. Encoding in API Design
When building or consuming REST APIs, follow these rules to avoid encoding bugs:
Query Parameter Values
// ❌ Manually concatenating: breaks when a value contains & or =
const badUrl = `/api/search?q=${query}&filter=${filter}`;
// ✅ Use URLSearchParams (browser/Node.js); it writes spaces as +
const params = new URLSearchParams({ q: query, filter });
const url = `/api/search?${params}`;
// ✅ Use http_build_query (PHP)
$url = '/api/search?' . http_build_query(['q' => $query, 'filter' => $filter]);
# ✅ Use urlencode (Python)
url = f'/api/search?{urlencode({"q": query, "filter": filter})}'
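A round trip is a simple way to confirm that a query builder is doing its job. The Python sketch below builds a query string from values that contain delimiters, then parses it back; the parameter names and values are made up for illustration.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

params = {"q": "C++ & algorithms", "filter": "a=b", "tag": ["x", "y"]}

# doseq=True expands the list into repeated tag=... pairs.
query = urlencode(params, doseq=True)
url = f"/api/search?{query}"
print(url)  # /api/search?q=C%2B%2B+%26+algorithms&filter=a%3Db&tag=x&tag=y

# Parsing the URL again recovers the original values, delimiters included.
parsed = parse_qs(urlsplit(url).query)
assert parsed == {"q": ["C++ & algorithms"], "filter": ["a=b"], "tag": ["x", "y"]}
```

parse_qs returns a list for every key because a key may legally appear more than once in a query string.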
Path Parameters
// ❌ Unencoded path segment: breaks with spaces or slashes in the value
const badUrl = `/api/users/${username}/profile`;
// ✅ Encoded path segment
const url = `/api/users/${encodeURIComponent(username)}/profile`;
Avoid Path Traversal
// ❌ Dangerous — if filename is "../../etc/passwd", this is a path traversal attack
$path = '/files/' . $userInput;
// ✅ Encode, then validate on the server that the resolved path stays in /files/
$path = '/files/' . rawurlencode($userInput);
// Server-side: validate realpath() stays within allowed directory
Security note: URL encoding is for data transport, not security.
Encoding a value does not make it safe to use in SQL queries, HTML output, or
file system paths. Always apply the appropriate escaping for each context
(SQL parameterization, HTML entity encoding, path validation).
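The server-side check mentioned above (the realpath() validation) could look like the following in Python. This is a sketch under assumptions: resolve_inside is a hypothetical helper, the incoming name is assumed to still be percent-encoded, and a real service would add its own permission and existence checks.

```python
from pathlib import Path
from urllib.parse import unquote

def resolve_inside(base_dir: Path, encoded_name: str) -> Path:
    """Decode a percent-encoded name and return its path, or raise if it escapes base_dir."""
    base = base_dir.resolve()
    # Decode first: "..%2F" only becomes "../" after decoding.
    candidate = (base / unquote(encoded_name)).resolve()
    try:
        # relative_to() raises ValueError when candidate is not under base.
        candidate.relative_to(base)
    except ValueError:
        raise ValueError(f"path escapes {base}: {encoded_name!r}") from None
    return candidate
```

For instance, resolve_inside(Path("/files"), "my%20document.pdf") returns a path under /files, while "..%2F..%2Fetc%2Fpasswd" raises ValueError because the resolved path lands outside the base directory.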
10. Quick Reference
| Task | JavaScript | PHP | Python |
|---|---|---|---|
| Encode a query param value | encodeURIComponent(v) | urlencode(v) | quote_plus(v) |
| Encode a path segment | encodeURIComponent(v) | rawurlencode(v) | quote(v, safe='') |
| Build a full query string | new URLSearchParams(obj) | http_build_query(arr) | urlencode(dict) |
| Decode a query param value | decodeURIComponent(v) (replace + with a space first for form data) | urldecode(v) | unquote_plus(v) |
| Decode a path segment | decodeURIComponent(v) | rawurldecode(v) | unquote(v) |
| Space in path | %20 | %20 | %20 |
| Space in query string | %20 or + | + | + |
Frequently Asked Questions
What is the difference between %20 and + in a URL?
Both represent a space. + means a space only in
application/x-www-form-urlencoded query strings (HTML form data).
%20 is the RFC 3986 percent encoding and is valid everywhere.
In path segments, always use %20.
What is the difference between encodeURI and encodeURIComponent?
encodeURIComponent encodes everything except A–Z a–z 0–9 - _ . ! ~ * ' ( ).
Use it for query parameter values and path segments.
encodeURI leaves URL structural characters (; / ? : @ & = + $ , #)
unencoded, but it does encode [ and ]. Use it only if encoding a complete URL string.
When should I use urlencode vs rawurlencode in PHP?
Use rawurlencode() for path segments (encodes space as %20, follows RFC 3986).
Use urlencode() or http_build_query() for query strings
(encodes space as +).
Why do some URLs show %2520?
%2520 is a double-encoded space. %25 encodes %,
so %2520 is literally the string %20 encoded — not a space.
This happens when you run an encoding function on a value that is already encoded.
Should I encode the entire URL or just the parts?
Always encode individual components (each path segment, each query value) separately,
then assemble the URL. Running a component encoder over a complete URL also encodes the structural
://, ?, and & characters, breaking the URL.
What characters are always safe in a URL without encoding?
Unreserved characters: A–Z a–z 0–9 - _ . ~.
Everything else should be encoded unless it serves its structural purpose in that URL component.