Understanding URL Encoding: A Complete Developer Guide

URL encoding, also called percent encoding, is how special characters are represented safely inside a URL. Get it wrong and your links break, your form data gets corrupted, and your API requests fail silently. This guide explains the rules, the edge cases, and the right encoding function to use in JavaScript, PHP, and Python.

1. Why URL Encoding Exists

URLs can only contain a limited set of characters defined in RFC 3986. Characters outside this set — spaces, Unicode letters, symbols like &, =, and # — must be encoded before they appear in a URL, otherwise they would be interpreted as URL structure rather than data.

Percent encoding replaces each unsafe byte with a % sign followed by that byte's two-digit hexadecimal value. Non-ASCII characters are first converted to UTF-8 bytes, so a single character can become several %XX sequences:

Space   →  %20
&       →  %26
=       →  %3D
#       →  %23
/       →  %2F
?       →  %3F
+       →  %2B

The name “percent encoding” comes from that % prefix. The RFC-correct term is percent encoding; “URL encoding” is the common name.
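
As a rough sketch of the mechanism described above, the following Python snippet percent-encodes a string by hand. The helper name percent_encode is made up for illustration, and it encodes everything outside the RFC 3986 unreserved set; in real code urllib.parse.quote does this job.

```python
# Hypothetical helper for illustration only; urllib.parse.quote is the real tool.
UNRESERVED = frozenset(
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    b"abcdefghijklmnopqrstuvwxyz"
    b"0123456789-_.~"
)


def percent_encode(text: str) -> str:
    """Percent-encode every byte that is not an RFC 3986 unreserved character."""
    out = []
    for byte in text.encode("utf-8"):
        if byte in UNRESERVED:
            out.append(chr(byte))        # safe byte: copy as-is
        else:
            out.append(f"%{byte:02X}")   # unsafe byte: % plus two hex digits
    return "".join(out)


print(percent_encode("a b&c=d"))  # a%20b%26c%3Dd
```

Each output matches the table above: the space becomes %20, & becomes %26, and = becomes %3D.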

2. URL Structure — What Gets Encoded Where

A URL has several distinct components, each with its own encoding rules:

https://example.com:8080/search/web%20dev?q=hello+world&page=2#results
  │       │           │   │               │             │      │
scheme  host        port  path           query        query  fragment

Component | Allowed unencoded | Must encode
--- | --- | ---
Scheme (https) | a-z 0-9 + - . | Nothing: other characters are not permitted, and percent encoding is not allowed in the scheme
Host (example.com) | a-z 0-9 - . | Unicode domains use Punycode (xn--)
Path (/search/web%20dev) | a-z A-Z 0-9 - _ . ~ : @ ! $ & ' ( ) * + , ; = | Spaces, ?, #, %, a / that is data rather than a separator, and non-ASCII characters
Query key/value (q=hello+world) | a-z A-Z 0-9 - _ . ! ~ * ' ( ) | Spaces, &, =, #, +, %, and most other symbols
Fragment (#results) | Same as query | Spaces and non-ASCII characters

Key rule: Encode each path segment and each query parameter key/value separately. Never run a component encoder over the full URL at once, because that also encodes the structural ://, ?, and & delimiters.
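
A minimal Python sketch of the key rule, assuming a hypothetical helper called build_url and the example host from the diagram. Each path segment and each query pair is encoded on its own, and only then joined with the structural delimiters.

```python
from urllib.parse import quote, urlencode


def build_url(base: str, segments: list[str], params: dict[str, str]) -> str:
    """Encode each path segment and query pair separately, then join them."""
    # safe="" so that a "/" inside a segment becomes %2F instead of a separator
    path = "/".join(quote(seg, safe="") for seg in segments)
    # quote_via=quote makes urlencode emit %20 rather than + for spaces
    query = urlencode(params, quote_via=quote)
    return f"{base}/{path}?{query}"


url = build_url(
    "https://example.com:8080",
    ["search", "web dev"],
    {"q": "hello world", "page": "2"},
)
print(url)  # https://example.com:8080/search/web%20dev?q=hello%20world&page=2
```

The delimiters (://, /, ?, &, =) are written by the helper itself and never pass through an encoder, which is why they survive intact.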

3. The %20 vs + Space Problem

Two different encodings exist for the space character, and mixing them up is one of the most common URL bugs.

%20 — RFC 3986 Percent Encoding

%20 is the correct percent encoding of a space (ASCII 32 = 0x20). It is valid in every part of a URL: path, query string, and fragment. Use %20 in path segments and in modern APIs.

/files/my%20document.pdf       ✅ correct path encoding
/search?q=hello%20world        ✅ valid in query string too

+ — application/x-www-form-urlencoded

The + as space is only defined in the application/x-www-form-urlencoded media type — the format HTML forms use when submitted. It is not part of the RFC 3986 URL specification.

/search?q=hello+world          ✅ fine in HTML form query strings
/files/my+document.pdf         ❌ + means a literal plus sign in paths

When you decode a form-encoded query string, decode + as a space. When you encode a path segment, never use + for spaces.

Rule: When in doubt, use %20. It is valid everywhere. The + convention is a legacy of HTML forms: it is only understood in query strings that both sides treat as form-encoded, and it never means a space in a path.
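
The mismatch can be reproduced with Python's urllib.parse, used here only as a convenient way to show both conventions side by side. The sample value is arbitrary.

```python
from urllib.parse import quote, quote_plus, unquote, unquote_plus

raw = "C++ tips"

path_form = quote(raw, safe="")   # C%2B%2B%20tips  (RFC 3986 style)
form_form = quote_plus(raw)       # C%2B%2B+tips    (form style)

# Matching decoders restore the original text.
assert unquote(path_form) == raw
assert unquote_plus(form_form) == raw

# Mismatched decoders corrupt it.
print(unquote(form_form))               # C+++tips  (the + that meant space stays a plus)
print(unquote_plus("/files/a+b.pdf"))   # /files/a b.pdf  (a literal + became a space)
```

Note that the %20 form decodes correctly with either decoder, which is the practical reason behind the "when in doubt, use %20" rule.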

4. JavaScript Encoding Functions

JavaScript has two built-in encoding functions that are frequently confused:

encodeURIComponent — Encode a Single Component

Encodes nearly every character except: A–Z a–z 0–9 - _ . ! ~ * ' ( ). This is the function to use for query parameter keys and values, and for path segments.

encodeURIComponent('hello world')      // "hello%20world"
encodeURIComponent('price=50&tax=5')   // "price%3D50%26tax%3D5"
encodeURIComponent('café')             // "caf%C3%A9"

// Building a query string correctly:
const q = 'C++ & algorithms';
const url = `/search?q=${encodeURIComponent(q)}`;
// → /search?q=C%2B%2B%20%26%20algorithms

encodeURI — Encode a Complete URL

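Encodes a full URL but deliberately leaves the reserved characters ; / ? : @ & = + $ , # alone, along with everything encodeURIComponent leaves alone. Square brackets [ ] are encoded, which matters for IPv6 hosts. Use this only when you have a full URL that may contain spaces or non-ASCII characters.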

encodeURI('https://example.com/my page?q=a&b=c')
// → "https://example.com/my%20page?q=a&b=c"

// ⚠️ encodeURI does NOT encode & = ? — they are kept as structure
encodeURI('https://example.com/?a=1&b=hello world')
// → "https://example.com/?a=1&b=hello%20world"  (& and = untouched)

The Common Mistake

// ❌ Wrong — encodeURI leaves & and = unencoded, breaking query values with those chars
const url = `/api?data=${encodeURI(userInput)}`;

// ✅ Correct — always use encodeURIComponent for values
const url = `/api?data=${encodeURIComponent(userInput)}`;

Quick rule: Use encodeURIComponent for query parameter values and path segments. Only use encodeURI if you have a complete URL string that may contain spaces.
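
To see what the common mistake does on the receiving side, here is a rough Python reproduction. The two helper functions are hypothetical stand-ins that approximate the JavaScript functions by passing their documented unescaped characters as safe; they are not exact ports.

```python
from urllib.parse import parse_qs, quote


def encode_uri_like(s: str) -> str:
    """Approximation of encodeURI: reserved characters are left alone."""
    return quote(s, safe=";/?:@&=+$,#!*'()")


def encode_component_like(s: str) -> str:
    """Approximation of encodeURIComponent: reserved characters are encoded."""
    return quote(s, safe="!*'()")


user_input = "rock&roll=1"

wrong = "data=" + encode_uri_like(user_input)
right = "data=" + encode_component_like(user_input)

# The server sees two parameters in the first case and one in the second.
print(parse_qs(wrong))  # {'data': ['rock'], 'roll': ['1']}
print(parse_qs(right))  # {'data': ['rock&roll=1']}
```

With the full-URL encoder, the user's & and = leak into the query structure and invent a second parameter.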

5. PHP Encoding Functions

PHP has two main URL encoding functions and they encode spaces differently:

Function | Space encoding | Standard | Use for
--- | --- | --- | ---
rawurlencode() | %20 | RFC 3986 | Path segments, modern APIs
urlencode() | + | HTML forms | Query strings for HTML forms
rawurldecode() | %20 → space | RFC 3986 | Decode path values
urldecode() | + → space | HTML forms | Decode query strings (note: $_GET already decoded)

// Path segment — use rawurlencode
$filename = 'my document.pdf';
$url = '/files/' . rawurlencode($filename);
// → /files/my%20document.pdf

// Query string — use http_build_query (preferred) or urlencode
$params = ['q' => 'hello world', 'page' => 2];
$url = '/search?' . http_build_query($params);
// → /search?q=hello+world&page=2

// ✅ Always use http_build_query for query strings — handles all edge cases
$url = '/api/search?' . http_build_query([
    'query' => $userInput,
    'filter' => 'active&deleted',   // & safely encoded to %26
]);

PHP tip: Never manually build query strings by concatenating values. Always use http_build_query(). It handles encoding, delimiter placement, array serialisation, and edge cases correctly.

6. Python Encoding Functions

Python's urllib.parse module covers all encoding needs:

from urllib.parse import quote, quote_plus, urlencode

# quote(): RFC 3986 style, encodes space as %20
# Use for path segments; "/" is left unencoded by default, so pass safe=''
quote('hello world')           # 'hello%20world'
quote('café & résumé')         # 'caf%C3%A9%20%26%20r%C3%A9sum%C3%A9'
quote('a/b', safe='')          # 'a%2Fb'

# quote_plus() — encodes space as +
# Use for query string values (form encoding)
quote_plus('hello world')      # 'hello+world'
quote_plus('a=1&b=2')          # 'a%3D1%26b%3D2'

# urlencode() — builds whole query string from dict (uses quote_plus)
urlencode({'q': 'hello world', 'page': 2})
# → 'q=hello+world&page=2'

# Building a full URL safely (user_input stands for any untrusted string):
base = 'https://api.example.com/search'
user_input = 'C++ & more'
params = urlencode({'query': user_input, 'filter': 'a&b'})
url = f'{base}?{params}'
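
One detail worth checking when using quote() for path segments: by default it treats "/" as safe. A short sketch with an arbitrary sample segment shows the difference that safe="" makes.

```python
from urllib.parse import quote

segment = "reports/2024 Q1"   # one logical segment that happens to contain a slash

default_form = quote(segment)          # "/" left alone
strict_form = quote(segment, safe="")  # "/" encoded as data

print(default_form)  # reports/2024%20Q1
print(strict_form)   # reports%2F2024%20Q1
```

The default is convenient when quoting a whole path, but for a single segment that may contain a slash, safe="" keeps the slash from being read as a separator.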

7. Unicode and Non-ASCII Characters in URLs

Modern URLs can technically contain Unicode characters (called IRIs — Internationalized Resource Identifiers), but when sending an HTTP request, the path and query must be percent-encoded UTF-8 bytes.

// Character: é (U+00E9)
// UTF-8 bytes: 0xC3 0xA9
// Percent encoded: %C3%A9

encodeURIComponent('café')    // "caf%C3%A9"
rawurlencode('café')          // "caf%C3%A9"
urllib.parse.quote('café')    // "caf%C3%A9"

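All three functions produce the same output for non-ASCII text, because each one percent-encodes the character's UTF-8 bytes. They differ only in which ASCII punctuation they leave unencoded. Multi-byte characters such as Chinese, Arabic, Japanese, and emoji require multiple %xx sequences: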

encodeURIComponent('日本語')   // "%E6%97%A5%E6%9C%AC%E8%AA%9E"
encodeURIComponent('😊')      // "%F0%9F%98%8A"  (4 bytes for emoji)
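
The byte counts can be checked directly. This Python sketch uses a small illustrative helper, utf8_percent (not a library function), and compares its output with urllib.parse.quote.

```python
from urllib.parse import quote, unquote


def utf8_percent(ch: str) -> str:
    """Percent-encode every UTF-8 byte of ch (illustrative helper)."""
    return "".join(f"%{b:02X}" for b in ch.encode("utf-8"))


for ch in ["é", "日", "😊"]:
    # For non-ASCII input the manual result matches quote().
    assert utf8_percent(ch) == quote(ch, safe="")
    print(f"U+{ord(ch):04X}", len(ch.encode("utf-8")), utf8_percent(ch))
# U+00E9 2 %C3%A9
# U+65E5 3 %E6%97%A5
# U+1F60A 4 %F0%9F%98%8A

# Decoding reverses the process: bytes first, then UTF-8 text.
assert unquote("caf%C3%A9") == "café"
```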

Internationalized Domain Names (IDN)

Domain names with non-ASCII characters use Punycode encoding, not percent encoding. münchen.de becomes xn--mnchen-3ya.de. Browsers handle this conversion automatically, but if you are building HTTP requests programmatically, use your language's IDN library to convert before making the request.
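
As one possible way to do that conversion, Python ships a built-in "idna" codec. It implements the older IDNA 2003 rules, so treat this as a sketch rather than a complete solution for every domain.

```python
# The built-in "idna" codec follows IDNA 2003; newer rules need a third-party library.
host = "münchen.de"

ascii_host = host.encode("idna").decode("ascii")
print(ascii_host)  # xn--mnchen-3ya.de

# The conversion is reversible.
round_trip = ascii_host.encode("ascii").decode("idna")
print(round_trip)  # münchen.de
```

Only the host goes through Punycode; the path and query of the same URL still use percent encoding.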

8. Double Encoding — The Most Common Bug

Double encoding happens when you encode a value that is already encoded. The % of each existing escape is itself encoded as %25, so every %20 becomes %2520.

// First encoding:
encodeURIComponent('hello world')     // "hello%20world"

// Accidentally encoding again:
encodeURIComponent('hello%20world')   // "hello%2520world"  ❌ WRONG

// When decoded: "hello%20world" — not "hello world"

To avoid double encoding, follow these rules:

  • Only encode raw, unencoded values, and encode them exactly once, at the point where you build the URL
  • Values returned by decodeURIComponent, urldecode(), urllib.parse.unquote(), or read from $_GET/$_POST in PHP are already decoded: treat them as raw, and never decode them a second time
  • If you cannot tell whether a value is already encoded, encode(decode(value)) is a fallback, but it corrupts raw values that contain a literal % followed by two hex digits

PHP note: $_GET and $_POST values are already decoded by PHP. Never call urldecode() on them, because that causes double-decode issues. Encode when building outgoing URLs; do not decode incoming ones.
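
The same bug and the caveat about decoding first can be reproduced in Python. The sample strings are arbitrary.

```python
from urllib.parse import quote, unquote

raw = "hello world"
once = quote(raw, safe="")     # hello%20world
twice = quote(once, safe="")   # hello%2520world  (double encoded)

# One decode of the double-encoded value yields the encoded text, not the raw text.
assert unquote(twice) == once
assert unquote(unquote(twice)) == raw

# Caveat for "decode first, then re-encode": a raw value may legitimately
# contain the characters % 2 0, and decoding it changes the data.
literal = "100%20off"                # what the user actually typed
print(unquote(literal))              # 100 off
print(quote(literal, safe=""))       # 100%2520off  (the correct single encoding)
```

In the last line %2520 is the right answer, because the raw value really did contain a percent sign. This is why tracking whether a value is raw or encoded beats guessing.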

9. Encoding in API Design

When building or consuming REST APIs, follow these rules to avoid encoding bugs:

Query Parameter Values

// ❌ Manually concatenating — breaks when value contains & or =
const url = `/api/search?q=${query}&filter=${filter}`;

// ✅ Use URLSearchParams (browser/Node.js)
const params = new URLSearchParams({ q: query, filter });
const url = `/api/search?${params}`;

// ✅ Use http_build_query (PHP)
$url = '/api/search?' . http_build_query(['q' => $query, 'filter' => $filter]);

// ✅ Use urlencode (Python)
url = f'/api/search?{urlencode({"q": query, "filter": filter})}'
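
A quick round trip shows why the builder functions are safer than concatenation. This sketch uses Python's urlencode and parse_qs with made-up values that contain & and =.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

params = {"q": "tom & jerry", "filter": "a=b"}

url = "/api/search?" + urlencode(params)
print(url)  # /api/search?q=tom+%26+jerry&filter=a%3Db

# Parsing the query gives back exactly the values that went in.
parsed = parse_qs(urlsplit(url).query)
print(parsed)  # {'q': ['tom & jerry'], 'filter': ['a=b']}
```

parse_qs returns a list per key because a key may repeat in a query string.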

Path Parameters

// ❌ Unencoded path segment — breaks with spaces or slashes in value
const url = `/api/users/${username}/profile`;

// ✅ Encoded path segment
const url = `/api/users/${encodeURIComponent(username)}/profile`;

Avoid Path Traversal

// ❌ Dangerous: if $userInput is "../../etc/passwd", this is a path traversal attack
$path = '/files/' . $userInput;

// ✅ Encode, then validate on the server that the resolved path stays in /files/
$path = '/files/' . rawurlencode($userInput);
// Server-side: validate realpath() stays within allowed directory

Security note: URL encoding is for data transport, not security. Encoding a value does not make it safe to use in SQL queries, HTML output, or file system paths. Always apply the appropriate escaping for each context (SQL parameterization, HTML entity encoding, path validation).
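
The server-side path check mentioned above might look like the following Python sketch. The function name is_inside and the /srv/files directory are assumptions for illustration; the idea is to resolve the requested path and confirm it still sits under the allowed directory.

```python
import os.path
from urllib.parse import unquote


def is_inside(base_dir: str, requested: str) -> bool:
    """Return True if requested, resolved against base_dir, stays inside base_dir."""
    base = os.path.realpath(base_dir)
    target = os.path.realpath(os.path.join(base, requested))
    return os.path.commonpath([base, target]) == base


# Percent encoding does not neutralise traversal: the server decodes it back.
encoded = "..%2F..%2Fetc%2Fpasswd"
decoded = unquote(encoded)   # ../../etc/passwd

print(is_inside("/srv/files", "report.pdf"))  # True
print(is_inside("/srv/files", decoded))       # False
```

The check runs on the decoded value, after the framework has undone the percent encoding, which is the form the file system will actually see.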

10. Quick Reference

Task | JavaScript | PHP | Python
--- | --- | --- | ---
Encode a query param value | encodeURIComponent(v) | urlencode(v) | quote_plus(v)
Encode a path segment | encodeURIComponent(v) | rawurlencode(v) | quote(v, safe='')
Build a full query string | new URLSearchParams(obj).toString() | http_build_query(arr) | urlencode(dict)
Decode a query param value | decodeURIComponent(v) | urldecode(v) | unquote_plus(v)
Decode a path segment | decodeURIComponent(v) | rawurldecode(v) | unquote(v)
Space in path | %20 | %20 | %20
Space in query string | %20 or + | + | +

Note: decodeURIComponent does not convert + to a space. For form-encoded input in JavaScript, read values through URLSearchParams instead.

Frequently Asked Questions

What is the difference between %20 and + in a URL?

Both represent a space. + is only valid in application/x-www-form-urlencoded query strings (HTML form data). %20 is the RFC 3986 percent encoding and is valid everywhere. In path segments, always use %20.

What is the difference between encodeURI and encodeURIComponent?

encodeURIComponent encodes everything except A–Z a–z 0–9 - _ . ! ~ * ' ( ). Use it for query parameter values and path segments. encodeURI additionally leaves the reserved characters ; / ? : @ & = + $ , # unencoded (but it does encode [ and ]). Use it only if encoding a complete URL string.

When should I use urlencode vs rawurlencode in PHP?

Use rawurlencode() for path segments (encodes space as %20, follows RFC 3986). Use urlencode() or http_build_query() for query strings (encodes space as +).

Why do some URLs show %2520?

%2520 is a double-encoded space. %25 encodes %, so %2520 is literally the string %20 encoded — not a space. This happens when you run an encoding function on a value that is already encoded.

Should I encode the entire URL or just the parts?

Always encode individual components (each path segment, each query value) separately, then assemble the URL. Encoding a complete URL also encodes the structural ://, ?, and & characters, breaking the URL.

What characters are always safe in a URL without encoding?

Unreserved characters: A–Z a–z 0–9 - _ . ~. Everything else should be encoded unless it serves its structural purpose in that URL component.