URL Syntax | Updated 2025

URL Encoding Standards: Understanding RFC 3986 for UTM Parameters

Master the RFC 3986 URL encoding standard to ensure your UTM parameters are properly formatted. Learn the official rules and avoid common encoding mistakes.


"I thought I understood URL encoding until I read RFC 3986. Turns out, most of what I 'knew' was wrong. Understanding the actual spec saved us from countless tracking errors."

This revelation came to Marcus Chen, a senior developer, after debugging why seemingly identical URLs behaved differently across browsers. The answer was in RFC 3986—the official URL encoding standard.

What is RFC 3986?

RFC 3986 is the Internet Engineering Task Force (IETF) standard that defines the syntax of Uniform Resource Identifiers (URIs), including URLs.

Published: January 2005
Obsoletes: RFC 2732, RFC 2396, RFC 1808
Updates: RFC 1738
Status: Internet Standard (STD 66)
Official document: https://tools.ietf.org/html/rfc3986

Why it matters: Following RFC 3986 helps your URLs behave consistently across browsers, servers, and analytics platforms, because all of them parse URLs according to rules derived from it.


URL Character Categories (Per RFC 3986)

Unreserved Characters (Never Need Encoding)

Always safe to use as-is:

Code
A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
a b c d e f g h i j k l m n o p q r s t u v w x y z
0 1 2 3 4 5 6 7 8 9
- . _ ~

Example:

Code
✅ These values need no encoding:
utm_campaign=summer-sale-2024
utm_source=email_newsletter
utm_content=header.banner
utm_term=project~management
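
A single regular expression is enough to confirm that a value stays inside the unreserved set. The sketch below is illustrative only; the helper name isUnreservedOnly is hypothetical and does not come from RFC 3986.

```javascript
// Hypothetical helper: true when every character is in the RFC 3986
// unreserved set (ALPHA / DIGIT / "-" / "." / "_" / "~").
function isUnreservedOnly(value) {
  return /^[A-Za-z0-9\-._~]*$/.test(value);
}

console.log(isUnreservedOnly('summer-sale-2024'));   // true
console.log(isUnreservedOnly('project~management')); // true
console.log(isUnreservedOnly('Q&A Webinar'));        // false
```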

Reserved Characters (Have Special Meaning)

These control URL structure:

Code
: / ? # [ ] @ ! $ & ' ( ) * + , ; =

Must be percent-encoded when used in parameter values:

Code
❌ WRONG (reserved chars in value):
utm_campaign=Q&A Webinar
utm_source=partner.com/blog

✅ CORRECT (avoided, not encoded):
utm_campaign=qa-webinar
utm_source=partner-com-blog

✅ ALSO CORRECT (properly encoded):
utm_campaign=Q%26A%20Webinar
utm_source=partner.com%2Fblog
(But why bother? Just use clean values)
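
If the original text has to be kept, the built-in encodeURIComponent produces the encoded forms shown above. This is a small sketch using the sample values from this section; the variable names are only for illustration.

```javascript
// encodeURIComponent is built in; the inputs are the sample values from this section.
const campaign = encodeURIComponent('Q&A Webinar');
const source = encodeURIComponent('partner.com/blog');

console.log(campaign); // Q%26A%20Webinar
console.log(source);   // partner.com%2Fblog

// Decoding restores the original text.
console.log(decodeURIComponent(campaign)); // Q&A Webinar
```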

All Other Characters (Must Be Encoded)

Any character not in the unreserved or reserved sets:

Code
Spaces, non-ASCII characters (é, ñ, 中), control characters, etc.
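
Non-ASCII characters are encoded byte by byte after conversion to UTF-8, which is why one character can become several %XX sequences. The sketch below makes that visible; percentEncodeBytes is a hypothetical helper written for this illustration, not a standard function.

```javascript
// Hypothetical helper: percent-encode every UTF-8 byte of a string,
// to show where sequences such as %C3%A9 come from.
function percentEncodeBytes(text) {
  const bytes = new TextEncoder().encode(text); // UTF-8 bytes
  return Array.from(
    bytes,
    b => '%' + b.toString(16).toUpperCase().padStart(2, '0')
  ).join('');
}

console.log(percentEncodeBytes('\u00E9'));  // %C3%A9      (é, 2 bytes)
console.log(percentEncodeBytes('\u4E2D'));  // %E4%B8%AD   (中, 3 bytes)
console.log(percentEncodeBytes(' '));       // %20

// The built-in encoder yields the same bytes for non-ASCII input.
console.log(encodeURIComponent('\u00F1')); // %C3%B1      (ñ)
```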

Percent-Encoding Rules (RFC 3986 Section 2.1)

The Format

Syntax: % followed by two hexadecimal digits

Valid hex digits:

Code
0 1 2 3 4 5 6 7 8 9 A B C D E F
(Also lowercase: a b c d e f)

Examples:

Code
✅ VALID:
%20 (space)
%2F (forward slash)
%3A (colon)
%C3%A9 (é in UTF-8)

❌ INVALID:
%2G (G is not hex)
%XY (X and Y not hex)
%2 (incomplete)
% (no digits)
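
The format rule translates directly into a pattern: every % must be followed by exactly two hexadecimal digits. The sketch below lists the sequences that break that rule; findMalformedTriplets is a hypothetical name chosen for this example.

```javascript
// Hypothetical helper: return every "%" that is NOT followed by two hex digits,
// together with up to two characters after it for context.
function findMalformedTriplets(value) {
  return value.match(/%(?![0-9A-Fa-f]{2}).{0,2}/g) || [];
}

console.log(findMalformedTriplets('a%20b%2Fc')); // []
console.log(findMalformedTriplets('sale%2G'));   // [ '%2G' ]
console.log(findMalformedTriplets('x%2'));       // [ '%2' ]
console.log(findMalformedTriplets('email%'));    // [ '%' ]
```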


Case Sensitivity

RFC 3986 Section 2.1:

"For consistency, URI producers and normalizers should use uppercase hexadecimal digits for all percent-encodings."

Meaning:

Code
✅ PREFERRED:
%2F (uppercase)
%3A (uppercase)
%20 (digits only, so case does not apply)

⚠️ WORKS BUT NOT RECOMMENDED:
%2f (lowercase)
%3a (lowercase)
%20 (no difference, only has digits)

Practical impact:

  • Both work in modern browsers
  • Uppercase is the standard
  • Be consistent
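
One way to stay consistent is to uppercase the hex digits of every existing triplet before a URL is published. This is a minimal sketch; uppercasePercentEncoding is a hypothetical helper, and it deliberately leaves everything outside valid %XX sequences untouched.

```javascript
// Hypothetical helper: uppercase the hex digits of each valid %XX triplet.
function uppercasePercentEncoding(value) {
  return value.replace(/%[0-9A-Fa-f]{2}/g, triplet => triplet.toUpperCase());
}

console.log(uppercasePercentEncoding('partner.com%2fblog')); // partner.com%2Fblog
console.log(uppercasePercentEncoding('caf%c3%a9'));          // caf%C3%A9
console.log(uppercasePercentEncoding('summer-sale'));        // summer-sale
```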

Normalization

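RFC 3986 Section 6.2.2.1 (case normalization) says the hexadecimal digits in a percent-encoding triplet are case-insensitive and should be normalized to uppercase A-F.

Section 2.3 adds:

"For consistency, percent-encoded octets in the ranges of ALPHA (%41-%5A and %61-%7A), DIGIT (%30-%39), hyphen (%2D), period (%2E), underscore (%5F), or tilde (%7E) should not be created by URI producers and, when found in a URI, should be decoded to their corresponding unreserved characters by URI normalizers."

Section 6.2.2.2 (percent-encoding normalization) applies that rule: normalizers should decode any percent-encoded octet that corresponds to an unreserved character.

Translation: Don't encode characters that don't need encoding!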

Code
❌ UNNECESSARY ENCODING:
utm_campaign=summer%2Dsale (encoding hyphen, which is unreserved)
utm_source=email%5Fnewsletter (encoding underscore, which is unreserved)

✅ CORRECT (no encoding needed):
utm_campaign=summer-sale
utm_source=email_newsletter
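
The normalization step can be sketched as a replace that decodes a triplet only when it stands for an unreserved character. The function name decodeUnreserved is hypothetical, and the sketch assumes the input contains only well-formed %XX sequences.

```javascript
// Hypothetical helper: decode %XX only when it represents an unreserved
// character (letter, digit, "-", ".", "_", "~"); leave every other triplet as-is.
function decodeUnreserved(value) {
  return value.replace(/%[0-9A-Fa-f]{2}/g, triplet => {
    const char = String.fromCharCode(parseInt(triplet.slice(1), 16));
    return /[A-Za-z0-9\-._~]/.test(char) ? char : triplet;
  });
}

console.log(decodeUnreserved('summer%2Dsale'));      // summer-sale
console.log(decodeUnreserved('email%5Fnewsletter')); // email_newsletter
console.log(decodeUnreserved('Q%26A%20Webinar'));    // Q%26A%20Webinar (reserved stays encoded)
```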

Query String Specifics (RFC 3986 Section 3.4)

Query component syntax:

Code
query = *( pchar / "/" / "?" )
pchar = unreserved / pct-encoded / sub-delims / ":" / "@"

In plain English:

Query strings (the part after ?) can contain:

  • Unreserved characters (A-Z, a-z, 0-9, -, ., _, ~)
  • Percent-encoded sequences (%XX)
  • Sub-delimiters (!, $, &, ', (, ), *, +, ,, ;, =)
  • Colon (:) and at-sign (@)

But for UTM parameters, stick to unreserved characters only.
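
The query grammar can be approximated with one regular expression, which also shows why syntactic validity is not enough for UTM values. This is a sketch under the assumption that the input is the query without the leading ?; QUERY_PATTERN and isValidQuery are hypothetical names.

```javascript
// Hypothetical translation of the RFC 3986 query rule:
// unreserved / pct-encoded / sub-delims / ":" / "@" / "/" / "?"
const QUERY_PATTERN = /^(?:[A-Za-z0-9\-._~!$&'()*+,;=:@\/?]|%[0-9A-Fa-f]{2})*$/;

function isValidQuery(query) {
  return QUERY_PATTERN.test(query);
}

console.log(isValidQuery('utm_source=email&utm_campaign=summer-sale')); // true
console.log(isValidQuery('utm_campaign=Summer Sale'));                  // false (raw space)
console.log(isValidQuery('utm_campaign=sale%2G'));                      // false (bad triplet)

// Valid syntax, but a parser reads it as utm_campaign=Q plus a stray "A".
console.log(isValidQuery('utm_campaign=Q&A'));                          // true
```

The last case is the reason to prefer unreserved characters: the grammar accepts sub-delimiters such as &, but query parsers give them meaning.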

Common RFC 3986 Compliance Issues

Issue 1: Invalid Percent-Encoding

Non-compliant:

Code
❌ utm_campaign=sale%2Gspecial (%2G - G is not hex)
❌ utm_source=email% (incomplete encoding)
❌ utm_medium=social%ZZ (%ZZ - ZZ not hex)

RFC 3986 violation: Section 2.1 requires two hexadecimal digits after %.

Compliant:

Code
✅ utm_campaign=sale-special (no encoding needed)
✅ utm_source=email (no encoding needed)
✅ utm_medium=social (no encoding needed)

Issue 2: Over-Encoding

Non-compliant (technically works but violates normalization):

Code
❌ utm_campaign=summer%2Dsale%2D2024
   (Encoding hyphens, which are unreserved)

RFC 3986 normalization: Section 2.3 says URI producers should not percent-encode unreserved characters, and Section 6.2.2.2 tells normalizers to decode them when found.

Compliant:

Code
✅ utm_campaign=summer-sale-2024

Issue 3: Using Reserved Characters Without Encoding

Non-compliant:

Code
❌ utm_campaign=Q&A Webinar
   (& is reserved, breaks parameter parsing)

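❌ utm_source=partner.com/blog
   (/ is reserved; RFC 3986 allows it in a query, but encoding or avoiding it removes any ambiguity)

RFC 3986 requirement (Section 2.2): Data that would conflict with a reserved character's role as a delimiter must be percent-encoded.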

Compliant:

Code
✅ BEST (avoid encoding):
utm_campaign=qa-webinar
utm_source=partner-com-blog

✅ ALSO VALID (but unnecessarily complex):
utm_campaign=Q%26A%20Webinar
utm_source=partner.com%2Fblog

Issue 4: Using Spaces

Non-compliant:

Code
❌ utm_campaign=Summer Sale 2024
   (Space is not unreserved, must be encoded)

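Two encodings appear in practice. Only %20 is defined by RFC 3986; + is the application/x-www-form-urlencoded convention used by HTML forms, and RFC 3986 itself treats + as a literal character: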

Code
Option 1: %20
utm_campaign=Summer%20Sale%202024

Option 2: + (application/x-www-form-urlencoded)
utm_campaign=Summer+Sale+2024

But the BEST option:

Code
✅ PREFERRED (no encoding needed):
utm_campaign=summer-sale-2024
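
The difference between the two space encodings shows up in the standard JavaScript APIs, which do not agree with each other. The sketch below uses only built-ins; the variable names are illustrative.

```javascript
const label = 'Summer Sale 2024';

// encodeURIComponent follows the percent-encoding route.
const percentForm = encodeURIComponent(label);
console.log(percentForm); // Summer%20Sale%202024

// URLSearchParams serializes as application/x-www-form-urlencoded, so spaces become "+".
const formForm = new URLSearchParams({ utm_campaign: label }).toString();
console.log(formForm); // utm_campaign=Summer+Sale+2024

// The decoders differ too: decodeURIComponent leaves "+" alone...
console.log(decodeURIComponent('Summer+Sale+2024')); // Summer+Sale+2024

// ...while URLSearchParams turns it back into a space.
console.log(new URLSearchParams(formForm).get('utm_campaign')); // Summer Sale 2024
```

A value without spaces sidesteps the mismatch entirely.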

RFC 3986 Compliance Checker

Javascript
function isRFC3986Compliant(url) {
  const issues = [];
  const utmParams = ['utm_source', 'utm_medium', 'utm_campaign', 'utm_content', 'utm_term'];

  try {
    new URL(url); // Throws if the URL is malformed

    // Work on the raw query string: searchParams.get() returns decoded
    // values, which would hide the encoding problems checked below.
    const withoutFragment = url.split('#')[0];
    const queryStart = withoutFragment.indexOf('?');
    const rawQuery = queryStart === -1 ? '' : withoutFragment.slice(queryStart + 1);

    rawQuery.split('&').filter(Boolean).forEach(pair => {
      const eq = pair.indexOf('=');

      // Check 1 (heuristic): a segment without "=" usually means an
      // unencoded & split a value in two
      if (eq === -1) {
        issues.push({
          param: 'query',
          issue: 'Segment without "=" (possible unencoded & in a value)',
          detail: `Found: ${pair}`,
          severity: 'ERROR'
        });
        return;
      }

      const param = pair.slice(0, eq);
      const value = pair.slice(eq + 1);
      if (!utmParams.includes(param) || !value) return;

      // Check 2: Invalid percent-encoding
      // Every % must be followed by exactly 2 hex digits
      const invalidEncoding = value.match(/%(?![0-9A-Fa-f]{2}).{0,2}/g);
      if (invalidEncoding) {
        issues.push({
          param,
          issue: 'Invalid percent-encoding (RFC 3986 Section 2.1)',
          detail: `Found: ${invalidEncoding.join(', ')}`,
          severity: 'ERROR'
        });
      }

      // Check 3: Unnecessarily encoded unreserved characters
      // (- . _ ~, digits %30-%39, letters %41-%5A and %61-%7A)
      const unnecessaryEncoding = value.match(/%(?:2D|2E|5F|7E|3[0-9]|4[1-9A-F]|5[0-9A]|6[1-9A-F]|7[0-9A])/gi);
      if (unnecessaryEncoding) {
        issues.push({
          param,
          issue: 'Unreserved characters unnecessarily encoded (RFC 3986 Section 2.3)',
          detail: `Found: ${unnecessaryEncoding.join(', ')}`,
          severity: 'WARNING'
        });
      }

      // Check 4: Lowercase hex letters in a valid triplet (uppercase preferred)
      const lowercaseHex = (value.match(/%[0-9A-Fa-f]{2}/g) || []).filter(t => /[a-f]/.test(t));
      if (lowercaseHex.length > 0) {
        issues.push({
          param,
          issue: 'Lowercase hexadecimal in encoding (should be uppercase)',
          detail: `Found: ${lowercaseHex.join(', ')}`,
          severity: 'INFO'
        });
      }

      // Check 5: Reserved characters or spaces left unencoded
      const reservedChars = value.match(/[=?\[\]@!$'()*+,; ]/g);
      if (reservedChars) {
        issues.push({
          param,
          issue: 'Reserved characters or spaces in value (should be encoded or avoided)',
          detail: `Found: ${[...new Set(reservedChars)].join(', ')}`,
          severity: 'ERROR'
        });
      }
    });

  } catch (e) {
    issues.push({
      param: 'URL',
      issue: 'Malformed URL',
      detail: e.message,
      severity: 'ERROR'
    });
  }

  return {
    compliant: issues.filter(i => i.severity === 'ERROR').length === 0,
    issues
  };
}

// Usage
const testUrls = [
  'https://example.com?utm_campaign=summer-sale-2024',  // Compliant
  'https://example.com?utm_campaign=summer%2Dsale',     // Warning (unnecessary encoding)
  'https://example.com?utm_campaign=Q&A',               // Error (unencoded &)
  'https://example.com?utm_campaign=sale%2G',           // Error (invalid hex)
];

testUrls.forEach(url => {
  const result = isRFC3986Compliant(url);
  console.log(`\nURL: ${url}`);
  console.log(`Compliant: ${result.compliant}`);
  if (result.issues.length > 0) {
    console.log('Issues:');
    result.issues.forEach(issue => {
      console.log(`  [${issue.severity}] ${issue.param}: ${issue.issue}`);
      if (issue.detail) console.log(`    ${issue.detail}`);
    });
  }
});

Best Practices for RFC 3986 Compliance

1. Use Only Unreserved Characters

Simplest approach:

Code
ALLOWED IN UTM VALUES:
a-z (lowercase letters)
A-Z (uppercase letters, but use lowercase for consistency)
0-9 (numbers)
- (hyphen)
_ (underscore)
. (period)
~ (tilde, though rarely needed)

Example:

Code
utm_source=email-newsletter
utm_medium=paid-social
utm_campaign=summer-sale-2024
utm_content=header-banner-v2
utm_term=project-management-software

No encoding needed, 100% RFC 3986 compliant.
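
Campaign names usually start life as free text, so a small converter helps enforce the unreserved-only rule at the source. The sketch below is one possible approach, not a standard; toUtmValue is a hypothetical name, and stripping accents via Unicode normalization is an assumption that only suits Latin-script input.

```javascript
// Hypothetical helper: turn free text into a lowercase, unreserved-only UTM value.
function toUtmValue(text) {
  return text
    .normalize('NFD')                 // split accented letters into base + combining mark
    .replace(/[\u0300-\u036f]/g, '')  // drop the combining marks
    .toLowerCase()
    .replace(/[^a-z0-9._~-]+/g, '-')  // anything outside the unreserved set becomes a hyphen
    .replace(/-+/g, '-')              // collapse repeated hyphens
    .replace(/^-|-$/g, '');           // trim a hyphen at either end
}

console.log(toUtmValue('Summer Sale 2024'));  // summer-sale-2024
console.log(toUtmValue('Q&A Webinar'));       // q-a-webinar
console.log(toUtmValue('Caf\u00E9 Promo!'));  // cafe-promo
```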

2. If You Must Encode, Do It Properly

Use standard library functions:

Javascript
// JavaScript
const encoded = encodeURIComponent('value with spaces');
// Produces: value%20with%20spaces

# Python
from urllib.parse import quote
encoded = quote('value with spaces')
# Produces: value%20with%20spaces
# Note: quote() leaves "/" unencoded by default; pass safe='' to encode it too

// PHP
$encoded = rawurlencode('value with spaces');
// Produces: value%20with%20spaces

Never encode manually:

Code
❌ DON'T:
"I'll just add %20 for spaces"
Result: Often leads to %2G, %XY type mistakes

✅ DO:
Use encodeURIComponent() or equivalent
Result: Well-formed %XX sequences every time (note that encodeURIComponent leaves ! ' ( ) * unencoded)
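
One caveat with the JavaScript built-in: encodeURIComponent does not encode ! ' ( ) *, although RFC 3986 lists them as reserved. A common workaround is a thin wrapper that encodes those five as well. The name encodeRFC3986 below is hypothetical; treat this as a sketch rather than a canonical implementation.

```javascript
// Hypothetical wrapper: encodeURIComponent plus the five reserved
// characters it leaves untouched (! ' ( ) *).
function encodeRFC3986(value) {
  return encodeURIComponent(value).replace(
    /[!'()*]/g,
    c => '%' + c.charCodeAt(0).toString(16).toUpperCase()
  );
}

console.log(encodeURIComponent("Kids' Sale (50%)!")); // Kids'%20Sale%20(50%25)!
console.log(encodeRFC3986("Kids' Sale (50%)!"));      // Kids%27%20Sale%20%2850%25%29%21
```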

3. Validate Against RFC 3986

Before deploying:

Javascript
function validateRFC3986(url) {
  // Throws if the URL is malformed
  new URL(url);

  // Read the raw query string: searchParams would return decoded values
  const withoutFragment = url.split('#')[0];
  const queryStart = withoutFragment.indexOf('?');
  const rawQuery = queryStart === -1 ? '' : withoutFragment.slice(queryStart + 1);

  // Check each parameter
  rawQuery.split('&').filter(Boolean).forEach(pair => {
    const eq = pair.indexOf('=');
    const key = eq === -1 ? pair : pair.slice(0, eq);
    const value = eq === -1 ? '' : pair.slice(eq + 1);

    // Unreserved: a-zA-Z0-9-._~
    // "%" is allowed here and its format is checked separately below
    // Everything else should be encoded
    const validPattern = /^[a-zA-Z0-9\-._~%]*$/;
    if (!validPattern.test(value)) {
      throw new Error(`Parameter ${key} contains characters that need encoding: ${value}`);
    }

    // Check percent-encoding format: % must be followed by exactly 2 hex digits
    const invalidSequences = value.match(/%(?![0-9A-Fa-f]{2}).{0,2}/g) || [];
    if (invalidSequences.length > 0) {
      throw new Error(`Invalid percent encoding in ${key}: ${invalidSequences.join(', ')}`);
    }
  });

  return true;
}

Quick Reference

Character Sets

Set | Characters | Use in UTM
Unreserved | a-z A-Z 0-9 - . _ ~ | ✅ Always safe, no encoding
Reserved | : / ? # [ ] @ ! $ & ' ( ) * + , ; = | ❌ Avoid or encode
Others | Spaces, unicode, etc. | ❌ Avoid or encode

Encoding Format

Valid | Invalid | Reason
%20 | %2G | G is not hex
%2F | %2f | Works, but uppercase preferred
%3A | %3 | Incomplete (needs 2 digits)
%C3%A9 | % | No digits after %


FAQ

Q: Do I really need to follow RFC 3986?

A: If you want your URLs to work consistently across all platforms, yes. Most modern systems expect RFC 3986 compliance.

Q: What happens if I violate RFC 3986?

A: Unpredictable behavior. Some browsers/servers handle it gracefully, others don't. Why risk it?

Q: Is + the same as %20 for spaces?

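A: Only in the application/x-www-form-urlencoded context (HTML forms and the query parsers that follow that convention). RFC 3986 itself treats + as a literal sub-delimiter, so %20 is the more universal choice. Better: avoid spaces entirely with hyphens.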

Q: Should I use %2D for hyphens?

A: No. Hyphens are unreserved and should never be encoded. Use - directly.

Q: Can I use lowercase hex digits (%2f instead of %2F)?

A: Both work, but RFC 3986 recommends uppercase. Choose one and be consistent.

Q: Are there any UTF-8 characters I can use without encoding?

A: Only ASCII unreserved characters (A-Z, a-z, 0-9, -, _, ., ~). All other UTF-8 characters technically need percent-encoding (but better: transliterate to ASCII).

Q: What's the maximum length for a URL per RFC 3986?

A: RFC 3986 doesn't specify a maximum. In practice, browsers, servers, and intermediaries impose their own limits, and about 2,000 characters is a commonly cited safe ceiling. Keep UTM values concise.

Q: Should I normalize URLs before comparing them?

A: Yes, per RFC 3986 Section 6. Decode unnecessary percent-encodings to literal characters, uppercase the hex digits in the remaining percent-encodings, lowercase the scheme and host, remove default ports, etc.
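
Part of that normalization comes for free with the built-in URL class, which lowercases the scheme and host and drops default ports when it parses. The sketch below relies on that behavior; normalizeForComparison is a hypothetical name, and it does not decode unnecessary percent-encodings, which would need a separate step.

```javascript
// Hypothetical helper: let the URL parser normalize scheme, host, and default port.
function normalizeForComparison(url) {
  return new URL(url).href;
}

const a = normalizeForComparison('HTTP://Example.COM:80/landing?utm_source=email');
const b = normalizeForComparison('http://example.com/landing?utm_source=email');

console.log(a);       // http://example.com/landing?utm_source=email
console.log(a === b); // true
```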


Ensure your UTM parameters comply with RFC 3986 and work everywhere. UTMGuard validates your URLs against official standards and catches compliance issues before they cause tracking problems. Start your free audit today.
