Understanding URL Encoding: A Complete Developer Guide

1. Why URL Encoding Exists

URLs can only contain a limited set of characters defined in RFC 3986. Characters outside this set, such as spaces and non-ASCII letters, must be encoded before they appear in a URL. Reserved characters such as &, =, and # are allowed, but only as delimiters; when they appear inside data they must be encoded too, otherwise they would be interpreted as URL structure rather than data.

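Percent encoding replaces each unsafe character with a % sign followed by the two-digit hexadecimal value of its byte. A non-ASCII character is first converted to UTF-8, and each resulting byte gets its own %XX triplet. For common ASCII characters: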

Space   →  %20
&       →  %26
=       →  %3D
#       →  %23
/       →  %2F
?       →  %3F
+       →  %2B

The name “percent encoding” comes from that % prefix. The RFC-correct term is percent encoding; “URL encoding” is the common name.
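The mechanism can be sketched in a few lines of Python. This is a minimal illustration, not a replacement for a library encoder: the helper name percent_encode and the choice to encode everything outside the unreserved set are assumptions made for the example.

```python
# Unreserved characters per RFC 3986; this sketch encodes everything else.
UNRESERVED = frozenset(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789-_.~"
)


def percent_encode(text: str) -> str:
    """Percent-encode every character outside the unreserved set (hypothetical helper)."""
    out = []
    for ch in text:
        if ch in UNRESERVED:
            out.append(ch)
        else:
            # One %XX triplet per UTF-8 byte of the character.
            out.extend(f"%{byte:02X}" for byte in ch.encode("utf-8"))
    return "".join(out)


print(percent_encode("a b&c=d"))  # a%20b%26c%3Dd
print(percent_encode("é"))        # %C3%A9
```

In real code, prefer the built-in functions covered in the language sections of this guide; the sketch only shows what they do under the hood.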

2. URL Structure — What Gets Encoded Where

A URL has several distinct components, each with its own encoding rules:

https://example.com:8080/search/web%20dev?q=hello+world&page=2#results
  │       │           │   │               │             │      │
scheme  host        port  path           query        query  fragment

Component	Allowed unencoded	Must encode
Scheme (https)	a-z 0-9 + - .	Everything else
Host (example.com)	a-z 0-9 - .	Unicode domains use Punycode (xn--)
Path (/search/web%20dev)	a-z A-Z 0-9 - _ . ~ : @ ! $ & ' ( ) * + , ; =	Spaces, ?, #, and characters above ASCII 127
Query key/value (q=hello+world)	a-z A-Z 0-9 - _ . ! ~ * ' ( )	Spaces, &, =, #, and most symbols
Fragment (#results)	Same as query	Spaces and non-ASCII

Key rule: Encode each path segment and each query parameter key/value separately. Never encode the full URL at once — that encodes the structural ://, ?, and & delimiters too.
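As a sketch of the key rule, the following Python assembles a URL from raw parts and encodes each part exactly once. The helper build_url and its parameters are hypothetical names chosen for this example.

```python
from urllib.parse import quote, urlencode


def build_url(base: str, segments: list[str], params: dict[str, str]) -> str:
    """Assemble a URL from raw parts, encoding each part once (hypothetical helper)."""
    # safe="" makes quote() encode "/" too, so a slash inside a segment stays data.
    path = "/".join(quote(segment, safe="") for segment in segments)
    # urlencode() encodes each key and value separately and joins them with "&".
    query = urlencode(params)
    url = f"{base}/{path}"
    return f"{url}?{query}" if query else url


print(build_url(
    "https://example.com",
    ["search", "web dev", "a/b"],
    {"q": "C++ & algorithms"},
))
# https://example.com/search/web%20dev/a%2Fb?q=C%2B%2B+%26+algorithms
```

The structural characters (://, /, ?, &, =) are added by the assembling code and never pass through an encoder, which is why they survive intact.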

3. The %20 vs + Space Problem

Two different encodings exist for the space character, and mixing them up is one of the most common URL bugs.

%20 — RFC 3986 Percent Encoding

%20 is the correct percent encoding of a space (ASCII 32 = 0x20). It is valid in every part of a URL: path, query string, and fragment. Use %20 in path segments and in modern APIs.

/files/my%20document.pdf       ✅ correct path encoding
/search?q=hello%20world        ✅ valid in query string too

+ — application/x-www-form-urlencoded

The + as space is only defined in the application/x-www-form-urlencoded media type — the format HTML forms use when submitted. It is not part of the RFC 3986 URL specification.

/search?q=hello+world          ✅ fine in HTML form query strings
/files/my+document.pdf         ❌ + means a literal plus sign in paths

When you decode a form-encoded query string, decode + as a space. When you encode a path segment, never use + for spaces.

Rule: When in doubt, use %20. It is valid everywhere. The + convention is a legacy of HTML forms and should not be used in REST APIs.
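The difference between the two conventions is easy to see with Python's urllib.parse functions. The sample strings below are made up for the illustration.

```python
from urllib.parse import quote, quote_plus, unquote, unquote_plus

raw = "hello world+1"

path_style = quote(raw, safe="")  # percent encoding: space -> %20, + -> %2B
form_style = quote_plus(raw)      # form encoding:    space -> +,   + -> %2B

print(path_style)  # hello%20world%2B1
print(form_style)  # hello+world%2B1

# The decoder has to match the convention that was used to encode.
print(unquote_plus(form_style))         # hello world+1
print(unquote("my+document.pdf"))       # my+document.pdf  (+ stays a literal plus)
print(unquote_plus("my+document.pdf"))  # my document.pdf  (wrong decoder for a path)
```

Both encoders turn a literal + into %2B, so a value encoded with either one round-trips safely as long as the matching decoder is used.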

4. JavaScript Encoding Functions

JavaScript has two built-in encoding functions that are frequently confused:

encodeURIComponent — Encode a Single Component

Encodes nearly every character except: A–Z a–z 0–9 - _ . ! ~ * ' ( ). This is the function to use for query parameter keys and values, and for path segments.

encodeURIComponent('hello world')      // "hello%20world"
encodeURIComponent('price=50&tax=5')   // "price%3D50%26tax%3D5"
encodeURIComponent('café')             // "caf%C3%A9"

// Building a query string correctly:
const q = 'C++ & algorithms';
const url = `/search?q=${encodeURIComponent(q)}`;
// → /search?q=C%2B%2B%20%26%20algorithms

encodeURI — Encode a Complete URL

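Encodes a full URL but deliberately leaves the characters that give a URL its structure unencoded: ; / ? : @ & = + $ , #, along with the same unreserved set that encodeURIComponent skips. Square brackets [ ] are encoded, which matters for IPv6 host literals. Use this only when you have a full URL that may contain spaces or non-ASCII characters.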

encodeURI('https://example.com/my page?q=a&b=c')
// → "https://example.com/my%20page?q=a&b=c"

// ⚠️ encodeURI does NOT encode & = ? — they are kept as structure
encodeURI('https://example.com/?a=1&b=hello world')
// → "https://example.com/?a=1&b=hello%20world"  (& and = untouched)

The Common Mistake

// ❌ Wrong: encodeURI leaves & and = unencoded, so values containing them break the query
const badUrl = `/api?data=${encodeURI(userInput)}`;

// ✅ Correct: always use encodeURIComponent for values
const url = `/api?data=${encodeURIComponent(userInput)}`;

Quick rule: Use encodeURIComponent for query parameter values and path segments. Only use encodeURI if you have a complete URL string that may contain spaces.

5. PHP Encoding Functions

PHP has two main URL encoding functions and they encode spaces differently:

Function	Space encoding	Standard	Use for
rawurlencode()	%20	RFC 3986	Path segments, modern APIs
urlencode()	+	HTML forms	Query strings for HTML forms
rawurldecode()	%20 → space	RFC 3986	Decode path values
urldecode()	+ → space	HTML forms	Decode query strings (note: $_GET already decoded)

// Path segment — use rawurlencode
$filename = 'my document.pdf';
$url = '/files/' . rawurlencode($filename);
// → /files/my%20document.pdf

// Query string — use http_build_query (preferred) or urlencode
$params = ['q' => 'hello world', 'page' => 2];
$url = '/search?' . http_build_query($params);
// → /search?q=hello+world&page=2

// ✅ Always use http_build_query for query strings — handles all edge cases
$url = '/api/search?' . http_build_query([
    'query' => $userInput,
    'filter' => 'active&deleted',   // & safely encoded to %26
]);

PHP tip: Never manually build query strings by concatenating values. Always use http_build_query(). It handles encoding, delimiter placement, array serialisation, and edge cases correctly.

6. Python Encoding Functions

Python's urllib.parse module covers all encoding needs:

from urllib.parse import quote, quote_plus, urlencode, urlparse, urljoin

# quote() — RFC 3986, encodes space as %20
# Use for path segments (pass safe='' to also encode '/', which quote() leaves unencoded by default)
quote('hello world')           # 'hello%20world'
quote('café & résumé')         # 'caf%C3%A9%20%26%20r%C3%A9sum%C3%A9'

# quote_plus() — encodes space as +
# Use for query string values (form encoding)
quote_plus('hello world')      # 'hello+world'
quote_plus('a=1&b=2')          # 'a%3D1%26b%3D2'

# urlencode() — builds whole query string from dict (uses quote_plus)
urlencode({'q': 'hello world', 'page': 2})
# → 'q=hello+world&page=2'

# Building a full URL safely:
base = 'https://api.example.com/search'
params = urlencode({'query': userInput, 'filter': 'a&b'})
url = f'{base}?{params}'
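Two details of these functions are worth seeing in action: urlencode() can be switched from + to %20 with its quote_via parameter, and quote() leaves / unencoded unless safe is overridden. The parameter names and values in this sketch are made up.

```python
from urllib.parse import quote, urlencode

params = {"q": "hello world", "tag": ["a&b", "c d"]}

# Default: form encoding, spaces become "+"; doseq=True expands list values.
print(urlencode(params, doseq=True))
# q=hello+world&tag=a%26b&tag=c+d

# quote_via=quote switches spaces to %20.
print(urlencode(params, doseq=True, quote_via=quote))
# q=hello%20world&tag=a%26b&tag=c%20d

# quote() leaves "/" unencoded by default; safe="" encodes it as well.
print(quote("reports/2024 q1"))           # reports/2024%20q1
print(quote("reports/2024 q1", safe=""))  # reports%2F2024%20q1
```

Use the default when the receiving side expects form encoding, and quote_via=quote when it expects %20 for spaces.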

7. Unicode and Non-ASCII Characters in URLs

Modern URLs can technically contain Unicode characters (called IRIs — Internationalized Resource Identifiers), but when sending an HTTP request, the path and query must be percent-encoded UTF-8 bytes.

// Character: é (U+00E9)
// UTF-8 bytes: 0xC3 0xA9
// Percent encoded: %C3%A9

encodeURIComponent('café')    // "caf%C3%A9"
rawurlencode('café')          // "caf%C3%A9"
urllib.parse.quote('café')    // "caf%C3%A9"

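All three functions turn non-ASCII characters into the same percent-encoded UTF-8 bytes. They differ only on a few ASCII characters: encodeURIComponent leaves ! * ' ( ) unencoded, and Python's quote leaves / unencoded unless you pass safe=''. Multi-byte characters such as Chinese, Arabic, and Japanese text or emoji require multiple %xx sequences: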

encodeURIComponent('日本語')   // "%E6%97%A5%E6%9C%AC%E8%AA%9E"
encodeURIComponent('😊')      // "%F0%9F%98%8A"  (4 bytes for emoji)

Internationalized Domain Names (IDN)

Domain names with non-ASCII characters use Punycode encoding, not percent encoding. münchen.de becomes xn--mnchen-3ya.de. Browsers handle this conversion automatically, but if you are building HTTP requests programmatically, use your language's IDN library to convert before making the request.
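As a minimal sketch of the split between host and path encoding, Python's built-in idna codec can produce the Punycode form of a host. That codec implements the older IDNA 2003 rules, so treat it as an illustration; a dedicated IDN library may be the better choice for production. The path value is made up.

```python
from urllib.parse import quote

host = "münchen.de"
path_segment = "straße 1"

# Host: Punycode via the built-in "idna" codec (IDNA 2003 rules).
ascii_host = host.encode("idna").decode("ascii")

# Path: percent-encoded UTF-8 bytes, as for any other non-ASCII data.
url = f"https://{ascii_host}/{quote(path_segment, safe='')}"

print(ascii_host)  # xn--mnchen-3ya.de
print(url)         # https://xn--mnchen-3ya.de/stra%C3%9Fe%201
```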

8. Double Encoding — The Most Common Bug

Double encoding happens when you encode a value that is already encoded. The % of each existing escape is itself encoded as %25, so every %20 becomes %2520.

// First encoding:
encodeURIComponent('hello world')     // "hello%20world"

// Accidentally encoding again:
encodeURIComponent('hello%20world')   // "hello%2520world"  ❌ WRONG

// When decoded: "hello%20world" — not "hello world"

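To avoid double encoding, follow these rules:

Only encode raw, unencoded values, and encode each value exactly once
Treat values returned by decodeURIComponent, urldecode(), urllib.parse.unquote(), or read from $_GET/$_POST in PHP as raw: they are already decoded, so never decode them again, and encode them once if they go into a new URL
If you cannot tell whether a value is already encoded, encode(decode(value)) is a last resort; it corrupts raw values that legitimately contain a sequence such as %20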

PHP note: $_GET and $_POST values are already decoded by PHP. Never call urldecode() on them — it will cause double-decode issues. Encode when building outgoing URLs; do not decode incoming ones.
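The same effect can be reproduced in Python, together with the reason why decoding "just in case" before encoding is risky. The sample values are made up for the illustration.

```python
from urllib.parse import quote, unquote

raw = "hello world"
once = quote(raw, safe="")
twice = quote(once, safe="")  # encoding an already-encoded value

print(once)   # hello%20world
print(twice)  # hello%2520world

# One decode only undoes one layer of encoding.
print(unquote(twice))           # hello%20world
print(unquote(unquote(twice)))  # hello world

# Why decode-then-encode is only a heuristic: a raw value may
# legitimately contain the characters "%20".
literal = "50%20off"
print(quote(literal, safe=""))           # 50%2520off  (correct: preserves the literal text)
print(quote(unquote(literal), safe=""))  # 50%20off    (wrong: receiver will see "50 off")
```

Tracking whether a value is raw or encoded at each boundary is more reliable than trying to detect its state after the fact.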

9. Encoding in API Design

When building or consuming REST APIs, follow these rules to avoid encoding bugs:

Query Parameter Values

// ❌ Manually concatenating: breaks when a value contains & or =
const unsafeUrl = `/api/search?q=${query}&filter=${filter}`;

// ✅ Use URLSearchParams (browser/Node.js)
const params = new URLSearchParams({ q: query, filter });
const url = `/api/search?${params}`;

// ✅ Use http_build_query (PHP)
$url = '/api/search?' . http_build_query(['q' => $query, 'filter' => $filter]);

# ✅ Use urlencode (Python)
url = f'/api/search?{urlencode({"q": query, "filter": filter})}'

Path Parameters

// ❌ Unencoded path segment: breaks with spaces or slashes in the value
const unsafeUrl = `/api/users/${username}/profile`;

// ✅ Encoded path segment
const url = `/api/users/${encodeURIComponent(username)}/profile`;

Avoid Path Traversal

// ❌ Dangerous — if filename is "../../etc/passwd", this is a path traversal attack
$path = '/files/' . $userInput;

// ✅ Encode, then validate on the server that the resolved path stays in /files/
$path = '/files/' . rawurlencode($userInput);
// Server-side: validate realpath() stays within allowed directory
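The server-side check mentioned above could look roughly like this in Python. It is a sketch under assumptions: the base directory /srv/files and the helper name resolve_inside are hypothetical, and Path.is_relative_to requires Python 3.9 or later.

```python
from pathlib import Path
from urllib.parse import unquote


def resolve_inside(base_dir: str, encoded_name: str) -> Path:
    """Decode a path value and reject it if it resolves outside base_dir (hypothetical helper)."""
    base = Path(base_dir).resolve()
    # Decode once, then resolve ".." components and symlinks before checking.
    candidate = (base / unquote(encoded_name)).resolve()
    if not candidate.is_relative_to(base):
        raise ValueError("path escapes base directory")
    return candidate


print(resolve_inside("/srv/files", "my%20report.pdf").name)  # my report.pdf

try:
    resolve_inside("/srv/files", "..%2F..%2Fetc%2Fpasswd")
except ValueError as err:
    print(f"rejected: {err}")  # rejected: path escapes base directory
```

The check runs on the decoded, resolved path because an encoded ../ sequence looks harmless until the server decodes it.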

Security note: URL encoding is for data transport, not security. Encoding a value does not make it safe to use in SQL queries, HTML output, or file system paths. Always apply the appropriate escaping for each context (SQL parameterization, HTML entity encoding, path validation).

10. Quick Reference

Task	JavaScript	PHP	Python
Encode a query param value	encodeURIComponent(v)	urlencode(v)	quote_plus(v)
Encode a path segment	encodeURIComponent(v)	rawurlencode(v)	quote(v, safe='')
Build a full query string	new URLSearchParams(obj)	http_build_query(arr)	urlencode(dict)
Decode a query param value	decodeURIComponent(v) (leaves + as-is; URLSearchParams also decodes +)	urldecode(v)	unquote_plus(v)
Decode a path segment	decodeURIComponent(v)	rawurldecode(v)	unquote(v)
Space in path	%20	%20	%20
Space in query string	%20 or +	+	+

Frequently Asked Questions

What is the difference between %20 and + in a URL?

Both represent a space, but + means a space only in application/x-www-form-urlencoded data (HTML form submissions and query strings built that way). %20 is the RFC 3986 percent encoding and is valid everywhere. In path segments, always use %20.

What is the difference between encodeURI and encodeURIComponent?

encodeURIComponent encodes everything except A–Z a–z 0–9 - _ . ! ~ * ' ( ). Use it for query parameter values and path segments. encodeURI additionally leaves the URL structural characters ; / ? : @ & = + $ , # unencoded. Use it only when encoding a complete URL string.

When should I use urlencode vs rawurlencode in PHP?

Use rawurlencode() for path segments (encodes space as %20, follows RFC 3986). Use urlencode() or http_build_query() for query strings (encodes space as +).

Why do some URLs show %2520?

%2520 is a double-encoded space. %25 encodes %, so %2520 is literally the string %20 encoded — not a space. This happens when you run an encoding function on a value that is already encoded.

Should I encode the entire URL or just the parts?

Always encode individual components (each path segment, each query key and value) separately, then assemble the URL. Running a component encoder such as encodeURIComponent over a complete URL also encodes the structural ://, ?, and & characters, breaking the URL.

What characters are always safe in a URL without encoding?

Unreserved characters: A–Z a–z 0–9 - _ . ~. Everything else should be encoded unless it serves its structural purpose in that URL component.