URL Encode/Decode Tool

This tool encodes or decodes a string of text. For websites to work worldwide, URIs need to be encoded in a uniform way. To map the wide range of characters used around the world onto the 60 or so characters that are allowed in a URI, a two-step process is used:

  1. First, the character string is converted into a sequence of bytes using UTF-8 encoding.
  2. Then, any byte that is not an ASCII letter or digit (or another character allowed in a URI) is converted to %HH, where HH is the hexadecimal value of the byte.

For example, the string François would be encoded as Fran%C3%A7ois.

(The "ç" is encoded in UTF-8 as two bytes, C3 and A7 in hexadecimal, which are then written as the three-character sequences "%C3" and "%A7" respectively.) This process can make a URI rather long (up to 12 ASCII characters for a single Unicode character, since UTF-8 uses up to four bytes per character), but the intention is that browsers only need to display the decoded form. Many other protocols can send UTF-8 without the %HH escaping.
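The two steps described above can be sketched in a few lines of Python. This is a minimal illustration, not the tool's actual implementation: the helper name percent_encode is hypothetical, and the sketch assumes that only the unreserved characters of RFC 3986 (letters, digits, and - _ . ~) are left unescaped.

```python
from urllib.parse import quote

# Assumption: only these characters are passed through unescaped.
UNRESERVED = (
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789-_.~"
)

def percent_encode(text: str) -> str:
    """Hypothetical helper: UTF-8 encode, then escape bytes outside UNRESERVED."""
    out = []
    for byte in text.encode("utf-8"):   # step 1: characters -> bytes
        ch = chr(byte)
        if ch in UNRESERVED:
            out.append(ch)              # allowed character, keep as-is
        else:
            out.append(f"%{byte:02X}")  # step 2: byte -> %HH
    return "".join(out)

print(percent_encode("François"))   # Fran%C3%A7ois
print(quote("François", safe=""))   # standard library gives the same result
```

In practice the standard library function urllib.parse.quote does the same job; the hand-written loop is shown only to make the two steps visible.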

What does URL Encoding Mean?

URL encoding means that certain characters in a URL are replaced with one or more character triplets, each consisting of a percent character followed by two hexadecimal digits. The two hexadecimal digits of each triplet represent the numeric value of the replaced character.
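As a small illustration of how a triplet relates to a character's numeric value, the following Python sketch builds triplets for single ASCII characters and reads one back. The helper name triplet_for is hypothetical and only handles ASCII input.

```python
from urllib.parse import unquote

def triplet_for(ch: str) -> str:
    """Hypothetical helper: percent triplet for one ASCII character."""
    return f"%{ord(ch):02X}"

print(triplet_for("&"))        # %26
print(triplet_for(" "))        # %20

# Reading a triplet back: the two hex digits are the character's code.
code = int("26", 16)
print(code, chr(code))         # 38 &

# The standard library decodes whole strings the same way.
print(unquote("a%26b%20c"))    # a&b c
```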

However, the term URL encoding is somewhat inaccurate, because the encoding is not restricted to URLs; it applies to URIs in general, which also include URNs. For this reason, the term percent-encoding is preferred.

Which Characters Are Allowed in a URL?

The characters allowed in a URI are either reserved or unreserved (or a percent character as part of a percent-encoding). Reserved characters are those that sometimes have a special meaning, while unreserved characters have no such meaning. With percent-encoding, characters that would otherwise not be allowed are represented using allowed characters. The sets of reserved and unreserved characters, and the circumstances under which certain reserved characters have special meaning, have changed slightly with each revision of the specifications that govern URIs and URI schemes.

According to RFC 3986, the characters in a URL must be taken from a defined set of reserved and unreserved ASCII characters. No other characters are allowed in a URL.

Unreserved characters can be encoded, but should not be. The unreserved characters are:

A B C D E F G H I J K L M N O P Q R S T U V W X Y Z a b c d e f g h i j k l m n o p q r s t u v w x y z 0 1 2 3 4 5 6 7 8 9 - _ . ~

Reserved characters, by contrast, have to be encoded, but only in certain situations. The reserved characters are:

! * ' ( ) ; : @ & = + $ , / ? # [ ]

The percent character itself is not part of the reserved set in RFC 3986. Because it introduces every escape, a literal percent sign is always written as %25.
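The difference between the two sets can be seen with Python's urllib.parse.quote, whose safe parameter controls which reserved characters are left alone. This is only a sketch; the sample strings are made up for illustration.

```python
from urllib.parse import quote

# Unreserved characters are never escaped.
print(quote("AZaz09-_.~", safe=""))          # AZaz09-_.~

# Reserved characters inside a data value are escaped so they cannot
# be mistaken for delimiters (safe="" means "escape all of them").
print(quote("rock&roll=50%/50%?", safe=""))  # rock%26roll%3D50%25%2F50%25%3F

# In a path, "/" acts as a delimiter and is kept (quote's default safe="/").
print(quote("/docs/a b.txt"))                # /docs/a%20b.txt
```

Whether a reserved character needs escaping therefore depends on where it appears: as a delimiter it stays, as data it is encoded.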

Encoding/Decoding a Piece of Text

RFC 3986 does not specify which character encoding table should be used for non-ASCII characters (e.g. the umlauts ä, ö, ü). Since URL encoding uses a pair of hexadecimal digits, and a pair of hexadecimal digits is equivalent to 8 bits, it would theoretically be possible to use one of the 8-bit code pages for non-ASCII characters (e.g. ISO-8859-1 for umlauts).

However, many languages have their own 8-bit code page, and handling all of these different code pages would be cumbersome. Some languages, such as Chinese, do not fit into an 8-bit code page at all. Therefore, the UTF-8 character encoding (defined in RFC 3629) is used for non-ASCII characters.
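The effect of the chosen character encoding can be illustrated with the encoding parameter of Python's urllib.parse.quote and unquote. The sketch below is only a demonstration of the ambiguity; it says nothing about how any particular server interprets such URLs.

```python
from urllib.parse import quote, unquote

# The same character produces different escapes under different encodings.
print(quote("ä", encoding="iso-8859-1"))  # %E4      (one byte)
print(quote("ä", encoding="utf-8"))       # %C3%A4   (two bytes)

# Chinese characters do not exist in ISO-8859-1 at all.
try:
    quote("中", encoding="iso-8859-1")
except UnicodeEncodeError:
    print("not representable in ISO-8859-1")
print(quote("中"))                         # %E4%B8%AD (UTF-8 is the default)

# The decoder must use the same table as the encoder.
print(unquote("%E4", encoding="iso-8859-1"))  # ä
print(unquote("%E4") == "\ufffd")             # True: invalid as UTF-8
```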

The URL encoder/decoder takes this into account and offers a choice between the ASCII character encoding table and the UTF-8 character encoding table. If you choose the ASCII table, a warning message is displayed when the encoded or decoded text contains non-ASCII characters.
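A choice of this kind could look roughly like the Python sketch below. The function name encode_with_table, the wording of the warning, and the fallback to UTF-8 are all assumptions made for illustration; the text does not say how the tool actually behaves after showing the warning.

```python
from urllib.parse import quote

def encode_with_table(text: str, table: str = "utf-8"):
    """Hypothetical sketch: return (encoded_text, warning_or_None)."""
    warning = None
    if table == "ascii" and not text.isascii():
        warning = "input contains non-ASCII characters"
        table = "utf-8"  # assumption: fall back to UTF-8 instead of failing
    return quote(text, safe="", encoding=table), warning

print(encode_with_table("hello world", "ascii"))  # ('hello%20world', None)
print(encode_with_table("für", "ascii"))
```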

When and Why Would You Use URL Encoding?

When data that has been entered into HTML forms is submitted, the form field names and values are encoded and sent to the server in an HTTP request message using method GET or POST, or, historically, via email. The encoding used by default is based on a very early version of the general URI percent-encoding rules, with a number of modifications such as newline normalization and replacing spaces with "+" instead of "%20". The MIME type of data encoded this way is application/x-www-form-urlencoded, and it is currently defined (still in a very outdated manner) in the HTML and XForms specifications. In addition, the CGI specification contains rules for how web servers decode data of this type and make it available to applications.
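The form encoding described above is available in Python as urllib.parse.urlencode. The sketch below uses made-up field names and values; it shows the "+" for spaces and the round trip back to the original values, but does not cover newline normalization.

```python
from urllib.parse import urlencode, parse_qs, quote, quote_plus

# Hypothetical form fields.
fields = {"name": "François Dupont", "note": "a&b=c"}

body = urlencode(fields)
print(body)  # name=Fran%C3%A7ois+Dupont&note=a%26b%3Dc

# Form encoding writes a space as "+", generic percent-encoding as "%20".
print(quote("a b"), quote_plus("a b"))  # a%20b a+b

# Decoding restores the original values (each name maps to a list).
print(parse_qs(body))  # {'name': ['François Dupont'], 'note': ['a&b=c']}
```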

When sent in an HTTP GET request, application/x-www-form-urlencoded data is included in the query component of the request URI. When sent in an HTTP POST request or via email, the data is placed in the body of the message, and the name of the media type is included in the message's Content-Type header.
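The two placements can be compared by building (but not sending) requests with Python's urllib.request. The URL https://example.com/search and the field names are placeholders chosen for this sketch.

```python
from urllib.parse import urlencode
from urllib.request import Request

data = urlencode({"q": "url encoding", "lang": "en"})

# GET: the encoded data becomes the query component of the URI.
get_req = Request("https://example.com/search?" + data)

# POST: the same data goes into the body, with the media type in a header.
post_req = Request(
    "https://example.com/search",
    data=data.encode("ascii"),
    headers={"Content-Type": "application/x-www-form-urlencoded"},
    method="POST",
)

print(get_req.get_method(), get_req.full_url)
# Request stores header names capitalized as "Content-type".
print(post_req.get_method(), post_req.get_header("Content-type"), post_req.data)
```

No network traffic is generated here; the Request objects only show where the encoded data would end up.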