URL Encoding Standards: Understanding RFC 3986 for UTM Parameters
Master the RFC 3986 URL encoding standard to ensure your UTM parameters are properly formatted. Learn the official rules and avoid common encoding mistakes.
"I thought I understood URL encoding until I read RFC 3986. Turns out, most of what I 'knew' was wrong. Understanding the actual spec saved us from countless tracking errors."
This revelation came to Marcus Chen, a senior developer, after debugging why seemingly identical URLs behaved differently across browsers. The answer was in RFC 3986—the official URL encoding standard.
What is RFC 3986?
RFC 3986 is the Internet Engineering Task Force (IETF) standard that defines the syntax of Uniform Resource Identifiers (URIs), including URLs.
Published: January 2005
Replaces: RFC 2396 (1998); updates RFC 1738 (1994)
Status: Internet Standard (STD 66)
Official document: https://tools.ietf.org/html/rfc3986
Why it matters: Following RFC 3986 ensures your URLs work consistently across all browsers, servers, and analytics platforms.
Table of contents
- What is RFC 3986?
- URL Character Categories (Per RFC 3986)
- Unreserved Characters (Never Need Encoding)
- Reserved Characters (Have Special Meaning)
- All Other Characters (Must Be Encoded)
- Percent-Encoding Rules (RFC 3986 Section 2.1)
- The Format
- Case Sensitivity
- Normalization
- Query String Specifics (RFC 3986 Section 3.4)
- Common RFC 3986 Compliance Issues
- Issue 1: Invalid Percent-Encoding
- Issue 2: Over-Encoding
- Issue 3: Using Reserved Characters Without Encoding
- Issue 4: Using Spaces
- RFC 3986 Compliance Checker
- Best Practices for RFC 3986 Compliance
- 1. Use Only Unreserved Characters
- 2. If You Must Encode, Do It Properly
- 3. Validate Against RFC 3986
- Quick Reference
- Character Sets
- Encoding Format
- FAQ
- Q: Do I really need to follow RFC 3986?
- Q: What happens if I violate RFC 3986?
- Q: Is + the same as %20 for spaces?
- Q: Should I use %2D for hyphens?
- Q: Can I use lowercase hex digits (%2f instead of %2F)?
- Q: Are there any UTF-8 characters I can use without encoding?
- Q: What's the maximum length for a URL per RFC 3986?
- Q: Should I normalize URLs before comparing them?
URL Character Categories (Per RFC 3986)
Unreserved Characters (Never Need Encoding)
Always safe to use as-is:
A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
a b c d e f g h i j k l m n o p q r s t u v w x y z
0 1 2 3 4 5 6 7 8 9
- . _ ~
Example:
✅ These values need no encoding:
utm_campaign=summer-sale-2024
utm_source=email_newsletter
utm_content=header.banner
utm_term=project~management
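A quick way to confirm a value stays inside the unreserved set is a single regular expression. This is a minimal sketch; the helper name `isUnreservedOnly` is made up for illustration and is not part of any library.

```javascript
// RFC 3986 unreserved set: ALPHA / DIGIT / "-" / "." / "_" / "~"
const UNRESERVED_ONLY = /^[A-Za-z0-9\-._~]+$/;

// Hypothetical helper: true when a value needs no percent-encoding at all.
function isUnreservedOnly(value) {
  return UNRESERVED_ONLY.test(value);
}

console.log(isUnreservedOnly('summer-sale-2024'));   // true
console.log(isUnreservedOnly('project~management')); // true
console.log(isUnreservedOnly('Q&A Webinar'));        // false
```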
Reserved Characters (Have Special Meaning)
These control URL structure:
: / ? # [ ] @ ! $ & ' ( ) * + , ; =
Must be percent-encoded when used in parameter values:
❌ WRONG (reserved chars in value):
utm_campaign=Q&A Webinar
utm_source=partner.com/blog
✅ CORRECT (avoided, not encoded):
utm_campaign=qa-webinar
utm_source=partner-com-blog
✅ ALSO CORRECT (properly encoded):
utm_campaign=Q%26A%20Webinar
utm_source=partner.com%2Fblog
(But why bother? Just use clean values)
All Other Characters (Must Be Encoded)
Any character not in the unreserved or reserved sets:
Spaces, non-ASCII characters (é, ñ, 中), control characters, etc.
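For non-ASCII characters, "encoded" means each UTF-8 byte of the character becomes its own %XX triplet. The sketch below spells that out by hand so the mechanics are visible; `percentEncodeChar` is a hypothetical helper, and in real code the built-in `encodeURIComponent` already does this.

```javascript
// Hypothetical helper: turn one character into its UTF-8 bytes,
// then write each byte as an uppercase %XX triplet.
function percentEncodeChar(char) {
  const bytes = new TextEncoder().encode(char);
  return Array.from(bytes, byte =>
    '%' + byte.toString(16).toUpperCase().padStart(2, '0')
  ).join('');
}

console.log(percentEncodeChar(' '));   // %20
console.log(percentEncodeChar('é'));   // %C3%A9
console.log(percentEncodeChar('中'));  // %E4%B8%AD

// The built-in produces the same result for these characters.
console.log(encodeURIComponent('é') === percentEncodeChar('é')); // true
```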
Percent-Encoding Rules (RFC 3986 Section 2.1)
The Format
Syntax: % followed by two hexadecimal digits
Valid hex digits:
0 1 2 3 4 5 6 7 8 9 A B C D E F
(Also lowercase: a b c d e f)
Examples:
✅ VALID:
%20 (space)
%2F (forward slash)
%3A (colon)
%C3%A9 (é in UTF-8)
❌ INVALID:
%2G (G is not hex)
%XY (X and Y not hex)
%2 (incomplete)
% (no digits)
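Malformed sequences are not just a style problem: JavaScript's own `decodeURIComponent` throws a `URIError` when it meets one. A small wrapper makes that visible without crashing; `safeDecode` is a hypothetical name used only for this sketch.

```javascript
// Hypothetical helper: decode a value, or return null when the
// percent-encoding is malformed (decodeURIComponent throws URIError).
function safeDecode(value) {
  try {
    return decodeURIComponent(value);
  } catch (err) {
    if (err instanceof URIError) return null;
    throw err; // anything else is unexpected, so let it surface
  }
}

console.log(safeDecode('caf%C3%A9')); // café
console.log(safeDecode('sale%2G'));   // null (G is not a hex digit)
console.log(safeDecode('email%'));    // null (no digits after %)
```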
Case Sensitivity
RFC 3986 Section 2.1:
"For consistency, URI producers and normalizers should use uppercase hexadecimal digits for all percent-encodings."
Meaning:
✅ PREFERRED:
%2F (uppercase)
%3A (uppercase)
%20 (digits only, so case does not apply)
⚠️ WORKS BUT NOT RECOMMENDED:
%2f (lowercase)
%3a (lowercase)
%20 (no difference, only has digits)
Practical impact:
- Both work in modern browsers
- Uppercase is what RFC 3986 recommends
- Be consistent
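If existing links mix both cases, the hex digits can be uppercased without touching the rest of the value. A minimal sketch, assuming the input only contains well-formed triplets; `uppercasePercentEncoding` is a hypothetical helper name.

```javascript
// Hypothetical helper: uppercase the hex digits of each %XX triplet,
// leaving every other character exactly as it was.
function uppercasePercentEncoding(value) {
  return value.replace(/%[0-9A-Fa-f]{2}/g, triplet => triplet.toUpperCase());
}

console.log(uppercasePercentEncoding('partner.com%2fblog')); // partner.com%2Fblog
console.log(uppercasePercentEncoding('abc%3adef'));          // abc%3Adef
```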
Normalization
RFC 3986 Section 6.2.2.1 (Case Normalization) says the hexadecimal digits within a percent-encoding triplet are case-insensitive and should be normalized to uppercase.
Section 2.3 adds a rule that Section 6.2.2.2 (Percent-Encoding Normalization) then applies:
"For consistency, percent-encoded octets in the ranges of ALPHA (%41-%5A and %61-%7A), DIGIT (%30-%39), hyphen (%2D), period (%2E), underscore (%5F), or tilde (%7E) should not be created by URI producers and, when found in a URI, should be decoded to their corresponding unreserved characters by URI normalizers."
Translation: Don't encode characters that don't need encoding!
❌ UNNECESSARY ENCODING:
utm_campaign=summer%2Dsale (encoding hyphen, which is unreserved)
utm_source=email%5Fnewsletter (encoding underscore, which is unreserved)
✅ CORRECT (no encoding needed):
utm_campaign=summer-sale
utm_source=email_newsletter
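A normalizer along these lines decodes only the triplets that stand for unreserved characters and leaves everything else encoded. This is a sketch under the assumption that the input is already well-formed; `decodeUnreserved` is a hypothetical helper.

```javascript
// Hypothetical helper: decode %XX only when it represents an
// unreserved character (A-Z a-z 0-9 - . _ ~); keep all other triplets.
function decodeUnreserved(value) {
  return value.replace(/%[0-9A-Fa-f]{2}/g, triplet => {
    const char = String.fromCharCode(parseInt(triplet.slice(1), 16));
    return /[A-Za-z0-9\-._~]/.test(char) ? char : triplet;
  });
}

console.log(decodeUnreserved('summer%2Dsale'));      // summer-sale
console.log(decodeUnreserved('email%5Fnewsletter')); // email_newsletter
console.log(decodeUnreserved('Q%26A%20Webinar'));    // Q%26A%20Webinar (unchanged)
```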
Query String Specifics (RFC 3986 Section 3.4)
Query component syntax:
query = *( pchar / "/" / "?" )
pchar = unreserved / pct-encoded / sub-delims / ":" / "@"
In plain English:
Query strings (the part after ?) can contain:
- Unreserved characters (A-Z, a-z, 0-9, -, ., _, ~)
- Percent-encoded sequences (%XX)
- Sub-delimiters (! $ & ' ( ) * + , ; =)
- Colon (:) and at-sign (@)
- Slash (/) and question mark (?)
But for UTM parameters, stick to unreserved characters only.
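The grammar above translates almost directly into a regular expression. The sketch below checks a whole query string (without the leading ?) against it; `isValidQuery` is a hypothetical name. Note how much the grammar permits: a string can be syntactically valid and still carry characters such as & or = that will confuse parameter parsing when they appear inside a value.

```javascript
// query = *( pchar / "/" / "?" )
// pchar = unreserved / pct-encoded / sub-delims / ":" / "@"
const QUERY_PATTERN = /^(?:[A-Za-z0-9\-._~!$&'()*+,;=:@\/?]|%[0-9A-Fa-f]{2})*$/;

// Hypothetical helper: true when the string matches the RFC 3986 query grammar.
function isValidQuery(query) {
  return QUERY_PATTERN.test(query);
}

console.log(isValidQuery('utm_source=email&utm_campaign=summer-sale')); // true
console.log(isValidQuery('utm_campaign=Summer Sale'));                  // false (raw space)
console.log(isValidQuery('utm_campaign=sale%2G'));                      // false (bad triplet)
```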
Common RFC 3986 Compliance Issues
Issue 1: Invalid Percent-Encoding
Non-compliant:
❌ utm_campaign=sale%2Gspecial (%2G - G is not hex)
❌ utm_source=email% (incomplete encoding)
❌ utm_medium=social%ZZ (%ZZ - ZZ not hex)
RFC 3986 violation: Section 2.1 requires two hexadecimal digits after %.
Compliant:
✅ utm_campaign=sale-special (no encoding needed)
✅ utm_source=email (no encoding needed)
✅ utm_medium=social (no encoding needed)
Issue 2: Over-Encoding
Non-compliant (technically works but violates normalization):
❌ utm_campaign=summer%2Dsale%2D2024
(Encoding hyphens, which are unreserved)
RFC 3986 normalization: Section 2.3 says URI producers should not percent-encode unreserved characters, and Section 6.2.2.2 has normalizers decode them.
Compliant:
✅ utm_campaign=summer-sale-2024
Issue 3: Using Reserved Characters Without Encoding
Non-compliant:
❌ utm_campaign=Q&A Webinar
(& is reserved, breaks parameter parsing)
❌ utm_source=partner.com/blog
(/ is reserved, breaks interpretation)
RFC 3986 requirement: Reserved characters in data must be percent-encoded.
Compliant:
✅ BEST (avoid encoding):
utm_campaign=qa-webinar
utm_source=partner-com-blog
✅ ALSO VALID (but unnecessarily complex):
utm_campaign=Q%26A%20Webinar
utm_source=partner.com%2Fblog
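To see what "breaks parameter parsing" means in practice, here is how the standard `URLSearchParams` parser reads the two forms. It is used as a stand-in; an analytics platform's own parser is assumed, not known, to split on & the same way.

```javascript
// Unencoded: the & ends the value early and starts a new parameter.
const broken = new URLSearchParams('utm_campaign=Q&A Webinar&utm_source=partner');
console.log(broken.get('utm_campaign')); // Q
console.log(broken.has('A Webinar'));    // true (a stray parameter was created)

// Encoded: the full value survives.
const encoded = new URLSearchParams('utm_campaign=Q%26A%20Webinar&utm_source=partner');
console.log(encoded.get('utm_campaign')); // Q&A Webinar
```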
Issue 4: Using Spaces
Non-compliant:
❌ utm_campaign=Summer Sale 2024
(Space is not unreserved, must be encoded)
Two encoding options:
Option 1: %20 (RFC 3986 percent-encoding)
utm_campaign=Summer%20Sale%202024
Option 2: + (an application/x-www-form-urlencoded convention; RFC 3986 itself treats + as a literal plus sign)
utm_campaign=Summer+Sale+2024
But the BEST option:
✅ PREFERRED (no encoding needed):
utm_campaign=summer-sale-2024
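The difference between the two space encodings shows up as soon as two decoders disagree. The sketch below compares the built-in percent-decoder with the form-style `URLSearchParams` parser; which of the two a given analytics tool resembles is an assumption you would need to verify.

```javascript
// Percent-decoding only: + stays a plus sign.
console.log(decodeURIComponent('Summer%20Sale')); // Summer Sale
console.log(decodeURIComponent('Summer+Sale'));   // Summer+Sale

// Form-style parsing: both + and %20 become a space.
console.log(new URLSearchParams('c=Summer+Sale').get('c'));   // Summer Sale
console.log(new URLSearchParams('c=Summer%20Sale').get('c')); // Summer Sale

// A real plus sign therefore has to be written as %2B.
console.log(new URLSearchParams('c=1%2B1').get('c')); // 1+1

// The two built-in encoders also disagree on output.
console.log(encodeURIComponent('Summer Sale'));                    // Summer%20Sale
console.log(new URLSearchParams({ c: 'Summer Sale' }).toString()); // c=Summer+Sale
```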
RFC 3986 Compliance Checker
function isRFC3986Compliant(url) {
  const issues = [];
  const utmParams = ['utm_source', 'utm_medium', 'utm_campaign', 'utm_content', 'utm_term'];
  try {
    const urlObj = new URL(url);
    // Inspect the raw query string: searchParams.get() returns decoded
    // values, which would hide the percent-encoding these checks look at.
    const segments = urlObj.search.slice(1).split('&').filter(Boolean);
    segments.forEach((segment, index) => {
      const eq = segment.indexOf('=');
      if (eq === -1) {
        // Heuristic: a bare segment right after a UTM parameter usually means
        // the value held an unencoded "&" (e.g. utm_campaign=Q&A)
        const previous = index > 0 ? segments[index - 1].split('=')[0] : '';
        if (utmParams.includes(previous)) {
          issues.push({
            param: previous,
            issue: 'Reserved characters in value (should be encoded or avoided)',
            detail: `Found: & (value was cut off before "${segment}")`,
            severity: 'ERROR'
          });
        }
        return;
      }
      const param = segment.slice(0, eq);
      const value = segment.slice(eq + 1);
      if (!utmParams.includes(param) || !value) return;
      // Check 1: Invalid percent encoding
      // Must be % followed by exactly 2 hex digits
      const invalidEncoding = value.match(/%(?![0-9A-Fa-f]{2}).{0,2}/g);
      if (invalidEncoding) {
        issues.push({
          param,
          issue: 'Invalid percent-encoding (RFC 3986 Section 2.1)',
          detail: `Found: ${invalidEncoding.join(', ')}`,
          severity: 'ERROR'
        });
      }
      // Check 2: Unnecessarily encoded unreserved characters
      // DIGIT %30-%39, ALPHA %41-%5A and %61-%7A, plus - . _ ~
      const unnecessaryEncoding = value.match(/%(?:2D|2E|5F|7E|3[0-9]|4[1-9A-F]|5[0-9A]|6[1-9A-F]|7[0-9A])/gi);
      if (unnecessaryEncoding) {
        issues.push({
          param,
          issue: 'Unreserved characters unnecessarily encoded (RFC 3986 Section 2.3)',
          detail: `Found: ${unnecessaryEncoding.join(', ')}`,
          severity: 'WARNING'
        });
      }
      // Check 3: Lowercase hex in encoding (should be uppercase per RFC)
      // Requires at least one a-f letter, so %20 is not flagged
      const lowercaseHex = value.match(/%(?:[a-f][0-9A-Fa-f]|[0-9A-Fa-f][a-f])/g);
      if (lowercaseHex) {
        issues.push({
          param,
          issue: 'Lowercase hexadecimal in encoding (should be uppercase)',
          detail: `Found: ${lowercaseHex.join(', ')}`,
          severity: 'INFO'
        });
      }
      // Check 4: Reserved characters not encoded
      const reservedChars = value.match(/[:\/?#\[\]@!$&'()*+,;=]/g);
      if (reservedChars) {
        issues.push({
          param,
          issue: 'Reserved characters in value (should be encoded or avoided)',
          detail: `Found: ${[...new Set(reservedChars)].join(', ')}`,
          severity: 'ERROR'
        });
      }
    });
  } catch (e) {
    issues.push({
      param: 'URL',
      issue: 'Malformed URL',
      detail: e.message,
      severity: 'ERROR'
    });
  }
  return {
    compliant: issues.filter(i => i.severity === 'ERROR').length === 0,
    issues
  };
}
// Usage
const testUrls = [
  'https://example.com?utm_campaign=summer-sale-2024', // Compliant
  'https://example.com?utm_campaign=summer%2Dsale', // Warning (unnecessary encoding)
  'https://example.com?utm_campaign=Q&A', // Error (unencoded &)
  'https://example.com?utm_campaign=sale%2G', // Error (invalid hex)
];
testUrls.forEach(url => {
  const result = isRFC3986Compliant(url);
  console.log(`\nURL: ${url}`);
  console.log(`Compliant: ${result.compliant}`);
  if (result.issues.length > 0) {
    console.log('Issues:');
    result.issues.forEach(issue => {
      console.log(`  [${issue.severity}] ${issue.param}: ${issue.issue}`);
      if (issue.detail) console.log(`    ${issue.detail}`);
    });
  }
});
Best Practices for RFC 3986 Compliance
1. Use Only Unreserved Characters
Simplest approach:
ALLOWED IN UTM VALUES:
a-z (lowercase letters)
A-Z (uppercase letters, but use lowercase for consistency)
0-9 (numbers)
- (hyphen)
_ (underscore)
. (period)
~ (tilde, though rarely needed)
Example:
utm_source=email-newsletter
utm_medium=paid-social
utm_campaign=summer-sale-2024
utm_content=header-banner-v2
utm_term=project-management-software
No encoding needed, 100% RFC 3986 compliant.
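Campaign names usually start life as human-readable labels, so a small converter helps enforce the unreserved-only rule. The sketch below is one possible approach; `toUtmValue` is a hypothetical helper, and its choices (lowercasing, hyphens as separators, stripping accents) are assumptions rather than anything RFC 3986 requires. It keeps periods because they are unreserved, so its output can differ slightly from hand-written values such as qa-webinar.

```javascript
// Hypothetical helper: reduce an arbitrary label to unreserved characters only.
function toUtmValue(label) {
  return label
    .normalize('NFKD')                // split accented letters into base + combining mark
    .replace(/[\u0300-\u036f]/g, '')  // drop the combining marks (é -> e)
    .toLowerCase()
    .replace(/[^a-z0-9._~-]+/g, '-')  // anything outside the unreserved set becomes a hyphen
    .replace(/-{2,}/g, '-')           // collapse runs of hyphens
    .replace(/^-+|-+$/g, '');         // trim leading and trailing hyphens
}

console.log(toUtmValue('Summer Sale 2024')); // summer-sale-2024
console.log(toUtmValue('Q&A Webinar'));      // q-a-webinar
console.log(toUtmValue('Café Düsseldorf!')); // cafe-dusseldorf
```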
2. If You Must Encode, Do It Properly
Use standard library functions:
// JavaScript
const encoded = encodeURIComponent('value with spaces');
// Produces: value%20with%20spaces
# Python
from urllib.parse import quote
# safe='' also encodes "/", which quote() leaves alone by default
encoded = quote('value with spaces', safe='')
# Produces: value%20with%20spaces
// PHP
$encoded = rawurlencode('value with spaces');
// Produces: value%20with%20spaces
Never encode manually:
❌ DON'T:
"I'll just add %20 for spaces"
Result: Often leads to %2G, %XY type mistakes
✅ DO:
Use encodeURIComponent() or equivalent
Result: Always correct
3. Validate Against RFC 3986
Before deploying:
function validateRFC3986(url) {
  // Parse URL (throws on malformed input)
  const urlObj = new URL(url);
  // Check each parameter against the raw query string;
  // searchParams would hand back already-decoded values
  const pairs = urlObj.search.slice(1).split('&').filter(Boolean);
  for (const pair of pairs) {
    const eq = pair.indexOf('=');
    const key = eq === -1 ? pair : pair.slice(0, eq);
    const value = eq === -1 ? '' : pair.slice(eq + 1);
    // Unreserved: a-zA-Z0-9-._~
    // Percent-encoded: %[0-9A-Fa-f]{2}
    // Everything else should be encoded
    const validPattern = /^[a-zA-Z0-9\-._~%]*$/;
    if (!validPattern.test(value)) {
      throw new Error(`Parameter ${key} contains characters that need encoding: ${value}`);
    }
    // Check percent-encoding format: every % needs two hex digits after it
    const badSequences = value.match(/%(?![0-9A-Fa-f]{2}).{0,2}/g) || [];
    if (badSequences.length > 0) {
      throw new Error(`Invalid percent encoding in ${key}: ${badSequences.join(', ')}`);
    }
  }
  return true;
}
Quick Reference
Character Sets
| Set | Characters | Use in UTM |
|---|---|---|
| Unreserved | a-z A-Z 0-9 - . _ ~ | ✅ Always safe, no encoding |
| Reserved | : / ? # [ ] @ ! $ & ' ( ) * + , ; = | ❌ Avoid or encode |
| Others | Spaces, unicode, etc. | ❌ Avoid or encode |
Encoding Format
| Valid | Avoid | Reason |
|---|---|---|
| %20 | %2G | G is not hex |
| %2F | %2f | Works, but uppercase preferred |
| %3A | %3 | Incomplete (needs 2 digits) |
| %C3%A9 | % | No digits after % |
FAQ
Q: Do I really need to follow RFC 3986?
A: If you want your URLs to work consistently across all platforms, yes. Most modern systems expect RFC 3986 compliance.
Q: What happens if I violate RFC 3986?
A: Unpredictable behavior. Some browsers/servers handle it gracefully, others don't. Why risk it?
Q: Is + the same as %20 for spaces?
A: Only in the application/x-www-form-urlencoded context (HTML forms and most query-string parsers). RFC 3986 itself treats + as a literal plus sign, so %20 is the safer choice. Better: avoid spaces entirely by using hyphens.
Q: Should I use %2D for hyphens?
A: No. Hyphens are unreserved and should never be encoded. Use - directly.
Q: Can I use lowercase hex digits (%2f instead of %2F)?
A: Both work, but RFC 3986 recommends uppercase. Choose one and be consistent.
Q: Are there any UTF-8 characters I can use without encoding?
A: Only the ASCII unreserved characters (A-Z, a-z, 0-9, -, _, ., ~). Every other character has to be percent-encoded as its UTF-8 bytes (but better: transliterate to ASCII).
Q: What's the maximum length for a URL per RFC 3986?
A: RFC 3986 doesn't specify a maximum. In practice the limits come from browsers, servers, and tools, and roughly 2,000 characters is a common conservative ceiling. Keep UTM values concise.
Q: Should I normalize URLs before comparing them?
A: Yes, per RFC 3986 Section 6. Decode unnecessary percent-encodings to literal characters, uppercase the hex digits in the remaining ones, lowercase the scheme and host, remove default ports, etc.
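Part of that normalization comes for free from the standard `URL` class, which follows the WHATWG URL rules rather than RFC 3986 itself. The sketch below shows what it does and does not handle; treating its output as a comparison key is an assumption, since percent-encodings in the path and query are left as written.

```javascript
// Scheme and host are lowercased and the default port is dropped;
// path and query keep their original case.
const parsed = new URL('HTTPS://Example.COM:443/Landing?utm_source=Email');
console.log(parsed.href); // https://example.com/Landing?utm_source=Email

// Percent-encodings in the query are NOT normalized by the parser,
// so %2d and - still compare as different strings here.
const raw = new URL('https://example.com/?c=summer%2dsale');
console.log(raw.search); // ?c=summer%2dsale
```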
Ensure your UTM parameters comply with RFC 3986 and work everywhere. UTMGuard validates your URLs against official standards and catches compliance issues before they cause tracking problems. Start your free audit today.