HTML URL Encoding


Introduction to HTML URL Encoding

Definition of URL Encoding:

Imagine the internet as a highway where cars (data) travel. URL encoding is a way of making sure that all the cars can travel this highway safely. It takes characters that are not allowed in a URL, or that have a special meaning there, such as spaces, plus signs (+), and question marks (?), and replaces each one with a percent sign (%) followed by two hexadecimal digits. For example, a space becomes %20.

Why We Need URL Encoding:

When you type a website address (URL) into your browser, it sends a request to the website's server. The server then sends back the website's content to your browser. However, some characters, like spaces or special symbols, can cause problems for the server.

For example, a space in a URL might be interpreted by the server as the end of the URL. Characters such as ? and & also have their own meaning inside a URL: ? starts the query string and & separates its parameters. Using them as ordinary data can lead to errors or unexpected results. URL encoding prevents these problems by converting such characters into %XX codes that the server decodes back into the original characters.

When to Use URL Encoding:

You should use URL encoding whenever you need to send data in a URL that contains special characters. This includes:

  • When you're adding parameters to a URL (like when you're searching for something on a website)
  • When you're creating a link to a specific page
  • When you're sending data from one page to another

Example:

If you want to search for "How to write a good URL" on Google, the URL could be:

https://www.google.com/search?q=How%20to%20write%20a%20good%20URL

In this URL, the space character has been URL encoded as "%20". This ensures that the server will correctly interpret the URL and send back the search results.
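As a rough sketch, assuming Python 3 and its standard `urllib.parse` module, the search URL above can be built by percent-encoding only the search text and appending it after `q=`. The variable names are illustrative.

```python
from urllib.parse import quote

search_text = "How to write a good URL"

# Encode only the data (the search text). With safe="", every character
# except letters, digits and - . _ ~ is percent-encoded.
encoded_text = quote(search_text, safe="")

# "?q=" is part of the URL structure, so it is added without encoding.
url = "https://www.google.com/search?q=" + encoded_text
print(url)  # https://www.google.com/search?q=How%20to%20write%20a%20good%20URL
```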


Basic Concepts of URL Encoding

ASCII and Unicode:

ASCII (American Standard Code for Information Interchange)

  • A code that represents characters, numbers, and symbols using 7-bit binary numbers.
  • Each character is assigned a unique number from 0 to 127.
  • Common characters like letters, digits, and punctuation have ASCII codes.

Unicode

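  • A much larger character set than ASCII. Its code points range from U+0000 to U+10FFFF and are stored using encodings such as UTF-8, UTF-16, and UTF-32, which use 8-, 16-, or 32-bit units.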
  • Covers a wider range of characters, including many languages, mathematical symbols, and emojis.
  • Each character has a unique Unicode code point, which is the numerical representation of the character.

Converting Characters to ASCII Codes

  • You can use a character to ASCII converter tool to find the ASCII code for a character.
  • For example, the ASCII code for the letter "A" is 65.

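URL-Safe Characters

  • A set of characters that can appear in a URL (web address) as-is: letters, digits, hyphens (-), periods (.), underscores (_), and tildes (~). RFC 3986 calls these the "unreserved" characters.
  • Reserved characters such as ampersands (&), equal signs (=), question marks (?), and slashes (/) are allowed, but they have special meanings in a URL, so they must be encoded when they appear inside data.
  • Other characters, such as spaces and most other punctuation, need to be encoded before using them in URLs.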
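A small check of which characters need encoding, assuming Python 3.7 or later (where `urllib.parse.quote` treats ~ as unreserved). The sample string is made up for illustration.

```python
import string
from urllib.parse import quote

# Unreserved characters per RFC 3986: these never need percent-encoding.
UNRESERVED = set(string.ascii_letters + string.digits + "-._~")

sample = "a-b_c.d~e&f=g h"

# With safe="", quote() encodes everything outside the unreserved set.
encoded = quote(sample, safe="")
print(encoded)  # a-b_c.d~e%26f%3Dg%20h

# Collect the characters that quote() left unchanged.
kept = [ch for ch in sample if quote(ch, safe="") == ch]
print(kept)
```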

Example:

To convert the word "Apple" to ASCII codes:

  • A -> 65
  • p -> 112
  • p -> 112
  • l -> 108
  • e -> 101

Therefore, "Apple" is stored as the sequence of ASCII codes 65, 112, 112, 108, 101, one code per character.
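The same conversion can be sketched in Python 3 with the built-in `ord()` and `chr()` functions. The last step shows the codes in hexadecimal, which is the form percent-encoding uses.

```python
word = "Apple"

# ord() gives a character's code point; for ASCII characters it equals
# the ASCII code.
codes = [ord(ch) for ch in word]
print(codes)  # [65, 112, 112, 108, 101]

# chr() turns the codes back into characters.
restored = "".join(chr(code) for code in codes)
print(restored)  # Apple

# The same codes as two-digit hexadecimal numbers.
hex_codes = [f"{code:02X}" for code in codes]
print(hex_codes)  # ['41', '70', '70', '6C', '65']
```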


Types of URL Encoding

Percent-Encoding (Standard Encoding)

Computers store text as numbers, and a URL may only contain a limited set of characters. Some characters are not allowed at all (such as spaces), and others have a special meaning in a URL (such as ? and &). To include such characters as ordinary data, we use percent-encoding.

Percent-encoding replaces each of these characters with a % sign followed by two hexadecimal digits that give the character's byte value. For example, a space (byte 32, hex 20) becomes %20. The receiver decodes these codes to restore the original characters.
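A minimal round-trip sketch, assuming Python 3's `urllib.parse.quote` and `unquote`. The text "100% sure?" is just an illustrative value containing a percent sign, a space, and a question mark.

```python
from urllib.parse import quote, unquote

text = "100% sure?"

# "%" -> %25, space -> %20, "?" -> %3F
encoded = quote(text, safe="")
print(encoded)  # 100%25%20sure%3F

# Decoding restores the original text exactly.
decoded = unquote(encoded)
print(decoded)  # 100% sure?

# The two hex digits are the character's code in base 16.
print(f"%{ord('?'):02X}")  # %3F
```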

Base64 Encoding

Base64 encoding is a way to convert any type of data, including binary data, into a string of text characters that can be easily transmitted or stored. It uses a set of 64 characters: uppercase and lowercase letters, digits, and the symbols + and /, with = used for padding.

For example, the word "Hello" is encoded as "SGVsbG8=". The encoded string is about one third larger than the original data, but it contains only plain text characters. Because + and / have special meanings in URLs, a URL-safe variant replaces them with - and _.
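A short sketch using Python 3's standard `base64` module. The two bytes 0xFB 0xFF are an arbitrary choice that happens to produce both + and / in standard Base64, so the difference from the URL-safe alphabet is visible.

```python
import base64

# Standard Base64 of the bytes of "Hello": 5 bytes become 8 characters.
hello_b64 = base64.b64encode(b"Hello").decode("ascii")
print(hello_b64)  # SGVsbG8=

# These two bytes map to "+" and "/" in the standard alphabet.
data = bytes([0xFB, 0xFF])
standard = base64.b64encode(data).decode("ascii")
url_safe = base64.urlsafe_b64encode(data).decode("ascii")
print(standard)  # +/8=
print(url_safe)  # -_8=

# Decoding the URL-safe form gives back the original bytes.
assert base64.urlsafe_b64decode(url_safe) == data
```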

Unicode Transformation Formats (UTFs)

UTFs are a set of standards that define how Unicode characters are stored as bytes. UTF-8 is the most common UTF. It uses a variable-length encoding, where each character is represented by one to four bytes, and ASCII characters keep their single-byte ASCII values.

By using UTF-8, we can represent every Unicode character, covering all major written languages, without needing separate encodings for different languages. When a non-ASCII character is placed in a URL, its UTF-8 bytes are what get percent-encoded.
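To see the variable length in practice, a minimal Python 3 sketch can encode a few sample characters (chosen arbitrarily) with `str.encode("utf-8")` and count the bytes.

```python
samples = ["A", "é", "€", "😀"]

for ch in samples:
    utf8 = ch.encode("utf-8")
    # Code point in U+XXXX form, number of bytes, and the bytes in hex.
    print(f"{ch}  U+{ord(ch):04X}  {len(utf8)} byte(s)  {utf8.hex().upper()}")

byte_counts = {ch: len(ch.encode("utf-8")) for ch in samples}
print(byte_counts)  # {'A': 1, 'é': 2, '€': 3, '😀': 4}
```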


Encoding Process

Steps to Encode URLs

What is URL Encoding?

When you type a website address into your browser, it sends a request to a server. However, some characters, such as spaces and special symbols, can cause problems for the server. To prevent these issues, special characters are encoded, which means they are converted into a format that the server can understand.

How to Encode URLs

To encode a URL, you can use an online tool or do it manually. Here are the steps to do it manually:

  • Identify the special characters: Identify the characters in each data value (for example, a query parameter value or a path segment) that need to be encoded. These include spaces, punctuation marks, and any non-ASCII characters; only letters, digits, and - . _ ~ can stay as they are. Common ones include the space and ! " # $ % & ' ( ) + , / : ; = ? @ [ ]
  • Find the encoded value: Find the percent-encoded value for each special character. Percent-encoding uses a % followed by two hexadecimal digits representing the ASCII value of the character (for a non-ASCII character, each byte of its UTF-8 encoding is encoded this way). Here are some examples: Space: %20, Exclamation mark: %21, Double quote: %22, Hash: %23, Dollar sign: %24, Percent: %25, Ampersand: %26, Single quote: %27, Left parenthesis: %28, Right parenthesis: %29, Plus: %2B, Comma: %2C, Slash: %2F, Colon: %3A, Semicolon: %3B, Equals: %3D, Question mark: %3F, At symbol: %40, Left bracket: %5B, Right bracket: %5D.
  • Replace the character with its encoded value: Replace each special character in the data value with its corresponding percent-encoded value. Leave characters that act as URL delimiters unchanged, such as the ? that starts the query string and the & and = that separate parameters.

    Example

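    Let's encode the URL https://example.com/search?query=hello world&category=books manually.

    • Identify the special characters: the space in the value hello world. The & separates the two parameters query and category, so it is a delimiter and stays as-is.
    • Find the encoded value: %20 for the space.
    • Replace the character with its encoded value:
      • Original: https://example.com/search?query=hello world&category=books
      • Encoded: https://example.com/search?query=hello%20world&category=books

    If an & appeared inside a value, for example category=books & magazines, it would be encoded as %26 (category=books%20%26%20magazines) so that it is not read as a parameter separator.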
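The manual steps can be sketched in Python 3 as a small function, here called `percent_encode` (a hypothetical helper, not a library API). It keeps the RFC 3986 unreserved characters and replaces every other UTF-8 byte with %XX. Each name and value is encoded separately, and the & and = delimiters are added afterwards, so the & between parameters is not encoded.

```python
from urllib.parse import quote

# RFC 3986 unreserved characters, which are never encoded.
UNRESERVED = frozenset(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~"
)


def percent_encode(value: str) -> str:
    """Percent-encode one URL component, such as a single query value."""
    parts = []
    # Work byte by byte on the UTF-8 form so non-ASCII text is handled too.
    for byte in value.encode("utf-8"):
        ch = chr(byte)
        if ch in UNRESERVED:
            parts.append(ch)  # safe character: keep as-is
        else:
            parts.append(f"%{byte:02X}")  # "%" plus two uppercase hex digits
    return "".join(parts)


params = {"query": "hello world", "category": "books"}

# Encode names and values, then join them with the "=" and "&" delimiters.
query_string = "&".join(
    f"{percent_encode(name)}={percent_encode(value)}"
    for name, value in params.items()
)
url = "https://example.com/search?" + query_string
print(url)  # https://example.com/search?query=hello%20world&category=books

# An "&" inside a value is data, so it is encoded as %26.
print(percent_encode("books & magazines"))  # books%20%26%20magazines
```

For comparison, `urllib.parse.quote(value, safe="")` should give the same result for each individual component.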

Best Practices for Encoding

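  • Encode every special character that appears inside a data value, such as a query parameter value or a path segment.
  • Use UTF-8 encoding for international characters.
  • Do not encode reserved characters such as "/" or "?" when they serve as delimiters in the URL's structure.
  • Test your encoded URLs to ensure they work correctly.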
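These practices can be illustrated with a short Python 3 sketch that encodes only the data parts and then assembles the URL around them. The host, path, and parameters are made-up examples, and `urlencode` is given `quote_via=quote` so that spaces become %20 rather than +.

```python
from urllib.parse import quote, urlencode

# Encode each data part on its own. safe="" would also encode any "/"
# inside the folder name, so it cannot be mistaken for a path separator.
folder = quote("summer photos", safe="")
query = urlencode({"q": "café & bar", "page": "2"}, quote_via=quote)

# The delimiters "/", "?", "&" and "=" are added unencoded.
url = "https://example.com/albums/" + folder + "?" + query
print(url)
# https://example.com/albums/summer%20photos?q=caf%C3%A9%20%26%20bar&page=2

# Encoding an already-encoded value again turns "%" into "%25".
once = quote("hello world")  # hello%20world
twice = quote(once)          # hello%2520world (double-encoded, wrong)
print(once, twice)
```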

Conclusion:

URL encoding is an important step for sending requests to servers. By understanding the steps and using the best practices, you can ensure that your URLs are properly formatted and will be processed correctly.


Decoding Process

Steps to Decode URLs with Encoded Spaces

URLs (Uniform Resource Locators) are like addresses that help us find specific web pages. Spaces are not allowed in URLs, so a space is encoded as the three-character sequence '%20'. (In query strings produced by HTML forms, a space may also appear as '+'.)

Decoding URLs with Encoded Spaces:

  1. Identify the encoded space: Look for '%20' in the URL.
  2. Replace '%20' with a space: Simply change '%20' to a regular space ' '.
  3. Paste the decoded URL: Copy and paste the decoded URL into your web browser's address bar.

Example:

Encoded URL: https://www.example.com/welcome%20page
Decoded URL: https://www.example.com/welcome page
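Decoding can be sketched with Python 3's `urllib.parse.unquote`, which reverses every %XX sequence rather than only %20. `unquote_plus` additionally treats + as a space, as in form-style query strings.

```python
from urllib.parse import unquote, unquote_plus

encoded_url = "https://www.example.com/welcome%20page"
decoded_url = unquote(encoded_url)
print(decoded_url)  # https://www.example.com/welcome page

# unquote() decodes any %XX sequence, including multi-byte UTF-8 ones.
print(unquote("caf%C3%A9%21"))  # café!

# "+" is left alone by unquote() but means a space to unquote_plus().
print(unquote("hello+world"))       # hello+world
print(unquote_plus("hello+world"))  # hello world
```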

Common Mistakes to Avoid:

  • Decoding in the wrong direction: When decoding, replace '%20' with a space, not a space with '%20'.
  • Decoding the wrong part: Only replace the '%20' sequences. Don't change other characters.
  • Treating raw spaces as encoded: If you see raw spaces in a URL (e.g., https://www.example.com/welcome page), there is nothing to decode. Such a URL is not valid as written, and browsers encode those spaces as %20 before sending the request.

Common URL Encoding Scenarios

Encoding Non-English Characters

When we type in our native language, such as Hindi or Arabic, our computers use a standard called "Unicode" to represent these characters. However, a URL may only contain a limited set of ASCII characters, so these characters cannot be placed in a URL directly.

To solve this problem, each character is first converted to bytes using UTF-8, and each byte is then percent-encoded. For example, é becomes the two bytes C3 A9, which are written as %C3%A9. This makes sure that the data is transmitted and processed correctly by all systems.
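A minimal Python 3 sketch of the two steps (UTF-8 bytes, then percent-encoding), using `urllib.parse.quote`, which encodes text as UTF-8 by default. The sample words are arbitrary.

```python
from urllib.parse import quote, unquote

for word in ["café", "日本", "नमस्ते"]:
    utf8_bytes = word.encode("utf-8")  # step 1: characters -> bytes
    encoded = quote(word, safe="")     # step 2: each byte -> %XX
    print(word, utf8_bytes.hex().upper(), encoded)
    assert unquote(encoded) == word    # decoding restores the text

print(quote("café", safe=""))  # caf%C3%A9
print(quote("日本", safe=""))  # %E6%97%A5%E6%9C%AC
```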

Encoding Query Parameters

Query parameters are used to pass information from a web form or search bar to a web server. When we enter information into a form, such as our name or email address, the data is sent to the server along with the URL in the form of query parameters (for example, ?name=Ana&city=Paris).

However, if a value contains characters such as spaces, & or =, or non-English characters, the web server may not interpret it correctly; an unencoded & would even split one value into two parameters. To prevent this, each value is encoded before it is sent: non-English characters are converted to UTF-8 bytes and then percent-encoded, producing a format that is compatible with most web servers.
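A sketch of encoding and decoding form data with Python 3's `urllib.parse.urlencode` and `parse_qs`. By default `urlencode` writes spaces as +, similar to how browsers encode HTML form data. The field names and values are invented for the example.

```python
from urllib.parse import parse_qs, urlencode

form_data = {"name": "Zoë & Ana", "city": "Zürich"}

# Each value is encoded as UTF-8 and percent-encoded; "&" inside a value
# becomes %26 and spaces become "+".
query = urlencode(form_data)
print(query)  # name=Zo%C3%AB+%26+Ana&city=Z%C3%BCrich

# The receiving side splits on "&" and "=" first, then decodes each piece.
print(parse_qs(query))  # {'name': ['Zoë & Ana'], 'city': ['Zürich']}
```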

Encoding File Names

When we save a file on our computer, the system uses a file name to identify and locate it. However, when a file name that contains spaces or non-English characters is used in a URL, for example in a download link, those characters cannot appear in the URL as-is.

To avoid such issues, the file name is percent-encoded when it is placed in the URL, just like any other path segment. The file itself keeps its original name, and the server decodes the URL to find it. This helps to avoid broken links and errors when working with files containing non-English characters.
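A minimal Python 3 sketch, assuming a hypothetical download address https://example.com/downloads/ and an arbitrary file name, shows how a file name becomes one encoded path segment.

```python
from urllib.parse import quote, unquote

file_name = "résumé 2024.pdf"

# Encode the name as a single path segment; safe="" also encodes any "/"
# so it cannot be mistaken for a directory separator.
segment = quote(file_name, safe="")
download_url = "https://example.com/downloads/" + segment
print(download_url)  # https://example.com/downloads/r%C3%A9sum%C3%A9%202024.pdf

# The server decodes the segment to recover the original name.
print(unquote(segment))  # résumé 2024.pdf
```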


Benefits and Limitations of URL Encoding

Advantages of URL Encoding

  • Ensures Compatibility: URL encoding replaces special characters (like spaces, slashes, and ampersands) with their corresponding codes, making URLs readable by web browsers and servers.
  • Prevents Misinterpretation: Without URL encoding, special characters could be mistaken for commands or other information, leading to errors. Encoding makes sure they are treated as part of the URL.
  • Facilitates Data Transmission: URL encoding lets arbitrary text, including reserved and non-English characters, be carried inside a URL without changing the URL's meaning. It is not encryption, however: it offers no security, and anyone can decode it.

Limitations of URL Encoding

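  • Character Length Limitations: Each encoded byte takes three characters (%XX), and non-ASCII characters need two to four bytes in UTF-8, so one character can become up to twelve characters. This makes URLs longer, and many browsers and servers limit URL length.
  • Difficult to Read: URL-encoded URLs can be difficult for humans to read and understand, as they contain special characters and codes.
  • Potential for Errors: URL encoding can introduce errors if special characters are not encoded properly, or if a value is encoded twice (so that %20 becomes %2520).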
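To put numbers on the length limitation, a short Python 3 sketch can compare string lengths before and after `urllib.parse.quote`. The sample strings are arbitrary.

```python
from urllib.parse import quote

lengths = {}
for text in ["hello", "hello world", "日本"]:
    encoded = quote(text, safe="")
    lengths[text] = (len(text), len(encoded))
    print(f"{text!r}: {len(text)} -> {len(encoded)} characters ({encoded})")
# 'hello': 5 -> 5 characters (hello)
# 'hello world': 11 -> 13 characters (hello%20world)
# '日本': 2 -> 18 characters (%E6%97%A5%E6%9C%AC)
```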

Alternatives to URL Encoding

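  • Query Parameters: Instead of placing data in the URL path, it can be passed as name=value pairs in the query string, separated by the "&" symbol (e.g., "example.com?query=my+name"). The values still need encoding; in this form-style encoding a space is written as "+".
  • Path Parameters: If a value is limited to URL-safe characters, such as a "slug" made of letters, digits, and hyphens, it can be passed in the path without needing URL encoding (e.g., "example.com/user/my-name").
  • Base64 Encoding: Base64 encoding can be used to encode binary data or other non-URL-friendly characters into a string of letters, digits, and symbols. For URLs, the URL-safe variant (which uses - and _ instead of + and /) avoids further encoding, although its = padding may still need to be encoded or removed.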