Why Are URLs Full of So Many Garbage Characters?

Published August 4, 2015

#?&%! Why do some URLs look like cuss words in comic books? We have the answer.

As you probably know, URLs (uniform resource locators) are basically “addresses” for websites. Type one into your browser, and you’re instantly reading news, watching videos, scheduling sketchy tête-à-têtes, and so on.

Anatomy of an URL

It helps to start with the basic structure of an URL. We’ll use http://www.gizmodo.com as our first example.

http:// is what’s called the “protocol,” which tells your computer how to interact with the server of the site you want to visit. In this case, HTTP tells your computer to expect to receive data that’s been structured for websites.

www.gizmodo.com — this part indicates the name of the server you want to interact with. Think of it like a street address or a phone number.

Now, let’s stop here for a sec. Back in the early days of the internet, these basic URL components were enough. At first, webpages were simple documents that linked to each other. Sean O’Connor, lead application engineer at URL-shortening site Bit.ly, recalls:

In this relatively simple world, only so much information is required to reference one page from another: what protocol should I use (http://), what server should I ask (www.example.com), and what document do I want on that server (/articles/cool-info.html).
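To see those three pieces pulled apart, here is a minimal sketch using Python's standard urllib.parse module. The URL is just O'Connor's example glued back together; note that Python calls the protocol the "scheme" and the server the "netloc."

```python
from urllib.parse import urlsplit

# O'Connor's three pieces, reassembled into one example URL.
parts = urlsplit("http://www.example.com/articles/cool-info.html")

print(parts.scheme)  # the protocol: http
print(parts.netloc)  # the server: www.example.com
print(parts.path)    # the document: /articles/cool-info.html
```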

However, as the web evolved, so did websites’ capabilities, and thus so did URLs. People wanted their computers to do more interesting, dynamic things beyond fetching static pages. And that’s when URLs started to get more detailed.

Anytime you see a “?” in an URL, for example, the characters that follow it are what’s called “query parameters.” With these extra bits of information, the server can respond dynamically, giving you a webpage based on what you want to see. It might automatically put your name into a field, or provide relevant links based on your web search.

Hence nowadays, links can be long and full of apparent gibberish. Indeed, there are so many different symbols that can now be included in URLs that the Internet Society ginned up a handy directory of them all.

URLs in the Modern World

Let’s look at what’s happening with some example URLs after the .com part of the address.

http://www.gizmodo.com/tag/gizexplains — Those slashes (/tag/gizexplains) organize the “path” of the request, or where to go within the many files hosted on the server that hosts gizmodo.com. The slashes mark hierarchy within the path, sort of like nested folders.

How about this one? I googled “I like Gizmodo” and here’s what popped up:

https://www.google.com/search?q=i%20like%20gizmodo&rct=j

Now this is where things start to get crazy, but the structure probably also looks familiar, no? This is the kind of URL that appears after you initiate a search. The parameters you set (like the keywords you search for) show up in the URL, and each parameter is separated from the next with an ampersand. (Remember, all the search parameters in a given URL follow a question mark.)
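If you want to poke at that search URL yourself, a rough sketch with Python's standard urllib.parse module looks like this. It only shows how the text of the URL breaks down; what Google actually does with each parameter is not something the URL can tell us.

```python
from urllib.parse import urlsplit, parse_qs

url = "https://www.google.com/search?q=i%20like%20gizmodo&rct=j"

# Everything after the question mark is the query string.
query = urlsplit(url).query
print(query)   # q=i%20like%20gizmodo&rct=j

# parse_qs splits the query into parameters and undoes the escaping.
# Values come back as lists because a key may appear more than once.
params = parse_qs(query)
print(params)  # {'q': ['i like gizmodo'], 'rct': ['j']}
```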

But, wait! What if your actual search query already has funky, non-alphanumeric characters in it? What happens to the URL then? Does it explode?

Nope. A harmless stand-in simply replaces the original character. So if you were to google “What is this?” the question mark would be swapped for a percent sign followed by a two-character code: %3F. We need the real question mark to signal in the URL that what follows are search parameters, remember? This is a process called escaping.

Here’s an example: ?term=what+is+this%3F&public=true

“In this case of the ‘what is this?’ value, the question mark would get confusing, given the meaning of question marks within URLs,” O’Connor explains. He continues:

Accordingly, there’s a process called escaping. When you escape, you replace a meaningful character with an alternate representation that won’t cause trouble, but that can be turned back into the original value. Examples of that here are replacing spaces with plus signs and replacing the value’s question mark with %3F.

You might see numbers in a search results URL, too. Like, “%20” sandwiched in between words. That’s a form of escaping, too — it represents a space.
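Here is a small sketch of escaping in action, again with Python's standard urllib.parse module. It shows both flavors mentioned above: quote_plus turns spaces into plus signs, while quote turns them into %20. Either way the question mark becomes %3F, and either way the original text can be recovered.

```python
from urllib.parse import quote, quote_plus, unquote_plus

value = "what is this?"

with_plus = quote_plus(value)   # spaces become plus signs
with_percent = quote(value)     # spaces become %20
print(with_plus)                # what+is+this%3F
print(with_percent)             # what%20is%20this%3F

# Unescaping turns either form back into the original value.
restored = unquote_plus(with_plus)
print(restored)                 # what is this?
```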

If you see any equals signs in an URL, they’re for separating keys from values in any key-value pairs, and ampersands separate different pairs. A key-value pair could be like, “page=5.” Here, we’re talking about the “page” of the website as a key, and “5” is the value, or fifth page.
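As an illustration, the sketch below builds the earlier ?term=...&public=true query from plain key-value pairs and then takes it apart again. It uses Python's standard urllib.parse module; the keys and values are just the ones from the example above, not anything a particular site requires.

```python
from urllib.parse import urlencode, parse_qsl

# Build a query string: "=" joins each key to its value, "&" joins the pairs.
query = urlencode({"term": "what is this?", "public": "true"})
print(query)   # term=what+is+this%3F&public=true

# Go the other way: split on "&" and "=" and unescape the values.
pairs = parse_qsl(query)
print(pairs)   # [('term', 'what is this?'), ('public', 'true')]

# The "page=5" pair from the text parses the same way.
page_pair = parse_qsl("page=5")
print(page_pair)  # [('page', '5')]
```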

&rct=j: Let’s look back at “I like Gizmodo.” In some cases, like this one, it may be impossible to figure out exactly what a given chunk of URL vomit means. “That being said, it is pretty common for parameters to be used for keeping track of information that only has meaning to the site that is using them,” O’Connor says. “Accordingly, they may not be publicly documented or explained.”

#section-result — Finally, the pound sign. (Or hashtag, depending on how old you are.) It’s a URL fragment and acts as a caboose. Says O’Connor: “Everything at the end of a URL after a hash sign is special in that it is never sent to the server and it is exclusively used by the web browser. Often this is used to refer to sections within a document but sometimes it can be used for other purposes.”
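A quick sketch of how the fragment can be separated from the rest of the URL, using Python's standard urllib.parse module. The address here is a made-up combination of the earlier example.com page and the #section-result fragment; the part without the fragment is what a browser would actually request from the server.

```python
from urllib.parse import urlsplit, urldefrag

# Hypothetical URL: the example.com page plus the fragment from the text.
url = "http://www.example.com/articles/cool-info.html#section-result"

# The fragment is everything after the hash sign.
fragment = urlsplit(url).fragment
print(fragment)        # section-result

# urldefrag returns the URL without its fragment, plus the fragment itself.
sent_to_server, kept_by_browser = urldefrag(url)
print(sent_to_server)  # http://www.example.com/articles/cool-info.html
```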

Static and Dynamic

Now that we’ve got that cleared up, you should know that URLs can be categorized into two types, according to how many crazy characters are included. The two types are static and dynamic.

Static URLs are those that contain only dots, slashes, dashes, and underscores. They tend to traffic better than dynamic URLs and rank higher in Google searches, since they’re easier to read and remember.

The wackier dynamic ones are a grab bag: question marks, ampersands, equal signs, exclamation points, asterisks, and other keyboard symbols snake their way into these navigation bars. These URLs are impossible to remember, totally unusable in branding campaigns, and generally see lower click-through rates.

I mean, obviously, no one is going to use a dynamic URL from a search query in some kind of marketing mission or plaster it on a business card. But people want to tweet specialized URLs to very specific content, or share it in a presentation, and pesky character limits get in the way. When you shrink URLs with Bitly or TinyURL or Google URL Shortener or Ow.ly, those services aren’t getting rid of the goofy characters in dynamic URLs; they simply store that information somewhere else. When a user clicks on the shortened link, they’re redirected to where the original, longer link leads.
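To make the "store it somewhere else" idea concrete, here is a toy sketch of a shortener. It is not how Bitly or any real service is built: the ToyShortener class, the encode_id helper, and the short.example domain are all invented for illustration, and a real service would keep its table in a database and answer with an HTTP redirect rather than a dictionary lookup.

```python
import itertools
import string

# 62 URL-friendly characters to build short codes from (an arbitrary choice).
ALPHABET = string.digits + string.ascii_letters


def encode_id(number):
    """Turn a positive counter value into a short base-62 code."""
    chars = []
    while number:
        number, remainder = divmod(number, len(ALPHABET))
        chars.append(ALPHABET[remainder])
    return "".join(reversed(chars)) or ALPHABET[0]


class ToyShortener:
    """Hypothetical in-memory shortener: the long URL is stored, not altered."""

    def __init__(self, base="https://short.example/"):
        self._base = base
        self._long_urls = {}                 # short code -> original URL
        self._counter = itertools.count(1)   # next unused id

    def shorten(self, long_url):
        code = encode_id(next(self._counter))
        self._long_urls[code] = long_url
        return self._base + code

    def resolve(self, short_url):
        # A real service would respond with a redirect to this address.
        code = short_url[len(self._base):]
        return self._long_urls[code]


shortener = ToyShortener()
long_url = "https://www.google.com/search?q=i%20like%20gizmodo&rct=j"
short_url = shortener.shorten(long_url)
print(short_url)                     # https://short.example/1
print(shortener.resolve(short_url))  # the original long URL, untouched
```

Notice that all the goofy characters survive the round trip. The short link is just a claim ticket for the long one.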

It’s a somewhat complicated system, but not one that’s going away or being replaced any time soon. (The Twitter Age’s link shorteners have been the closest thing to a revolution.) And in the future, our direct contact or familiarity with URLs might decrease further, especially since lots of content, like news articles, is shared on Facebook, and many people reach sites by browsing social media feeds. (Or in some cases, content is now being directly published on sites like Facebook, which really eliminates the need to manually type in an URL.)

In the near future, URLs could end up like phone numbers: They’re everywhere, we use them every day, but we’ll only know the important ones off the top of our heads.
