AJAX browsing is a beautiful thing; it provides the user with a much more responsive, fluid, and engaging browsing experience. But is it searchable? That is, can a search engine get the same content that would be delivered to users via an AJAX link? Google is very aware of the need to do something about this little conundrum, and has indeed offered a solution. It is, however, a bit ugly, so here is my slightly more elegant solution as seen on Callisto.fm and soon to be seen on another site I'm finishing.
The Problem
The user wants to navigate through the site, but why reload the entire page? Why interrupt playback of audio or video? Static elements such as the header, nav, and footer are already there, so why waste time and bandwidth serving them again?
Pulling content into the browser via AJAX is easy, but how do we make that same content visible to search engines? In effect, the following two URLs need to deliver the same content:
http://www.callisto.fm/#/browse/by/channel/
http://www.callisto.fm/browse/by/channel/
The Solution
The solution consists of three key steps:
1. Serve traditional (hash-free) relative URLs so that search engines can reach them
2. Use JavaScript to make selected URLs AJAX driven on the client side
3. Have the server respond with only the unique content for any AJAX request
Let's first take a look at a portion of the raw HTML the server sends for a plain ol' GET of the home page (http://www.callisto.fm/):
<ul class="nav main">
<li class="listen"><a id="btnListen" class="hash selected" href="/">Listen</a></li>
<li class="browse"><a id="btnBrowse" class="hash" href="/browse/by/channel/">Browse</a></li>
<li class="search"><a id="btnSearch" class="hash" href="/search/">Search</a></li>
</ul>
Notice that the href attributes of these anchors do not contain a hash. This allows search engines to reach each URL and get the appropriate content. That much is HTML 101, and we still need it, but as far as the user is concerned we also need those URLs to be AJAX driven.
Now notice that the anchors have the class "hash" assigned to them. We use jQuery and the following bit of JavaScript to insert the hash dynamically:
$('a.hash').each(function()
{
// prepend '/#' to the original href, e.g. '/search/' becomes '/#/search/'
this.href='/#'+$(this).attr('href');
});
I'm sure you've already figured out what this code is doing: it grabs every anchor with class="hash" and prepends the hash to its URL. Because search engines don't execute this JavaScript, they only ever see the plain URLs, while visitors using a browser get the hashed ones.
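To make the rewrite easier to reason about, here is the same transformation as a small pure function operating on href strings instead of DOM anchors. It is a sketch, not the code running on Callisto.fm: toHashHref() and convertHrefs() are hypothetical names, and the guard against converting an href twice is an addition that the jQuery loop above does not have.

```javascript
// Hypothetical helper: turn a root-relative href such as '/search/'
// into its AJAX form '/#/search/'. Hrefs that already start with '/#'
// are returned unchanged, so running the conversion twice is harmless.
function toHashHref(href) {
  if (href.startsWith('/#')) {
    return href; // already converted
  }
  return '/#' + href;
}

// Apply the conversion to a list of plain href strings, mirroring what
// the $('a.hash').each(...) loop does to real anchors in the page.
function convertHrefs(hrefs) {
  return hrefs.map(toHashHref);
}
```

For example, convertHrefs(['/', '/search/']) yields ['/#/', '/#/search/'], matching what the navigation anchors above end up pointing to.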
The Client-Side AJAX Request
Now that our hrefs have been modified to include the necessary hash, we need to implement the AJAX request and display the content. Take a look at the following code:
if("onhashchange" in window)
{
$(window).bind('hashchange',function(){Ajax.onHashChange();});
}
Modern browsers make this really easy by supporting the onhashchange event of the window object. (The final code will also support older browsers by polling window.location.) In the code above we bind a handler for the window's hashchange event that calls the onHashChange() method of the Ajax object. (The complete code is available at the end of this post.)
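The event-or-poll choice described above can be sketched independently of jQuery and the timer plugin. In this hypothetical watchHash() function, win is any window-like object (a real window in the browser, or a stub in tests), polling uses plain setInterval() rather than $.timer, and the returned stop() function is an assumption added so the watcher can be torn down.

```javascript
// Minimal sketch of the event-or-poll pattern. `win` must expose
// `location.hash`; `onChange` receives the new hash. The 20 ms default
// mirrors the polling interval used later in this post.
function watchHash(win, onChange, intervalMs = 20) {
  if ('onhashchange' in win) {
    // Native support: let the browser report hash changes.
    const listener = () => onChange(win.location.hash);
    win.addEventListener('hashchange', listener);
    return {
      mode: 'event',
      stop: () => win.removeEventListener('hashchange', listener),
    };
  }
  // Fallback for older browsers: compare the hash on a timer.
  let last = win.location.hash;
  const timer = setInterval(() => {
    if (win.location.hash !== last) {
      last = win.location.hash;
      onChange(last);
    }
  }, intervalMs);
  return { mode: 'poll', stop: () => clearInterval(timer) };
}
```

Keeping the last-seen hash inside the closure plays the same role as the Ajax.hash property in the complete code: a change is reported only once, however many times the timer fires.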
With this in place, our code fires any time the hash in the browser's address bar changes. It is this code that requests a hashed URL via AJAX:
onHashChange:function()
{
this.hash=decodeURIComponent(window.location.hash); // safari returns the hash encoded
if (this.hash!='') // only make a request if the hash is not empty
{
var page = this.hash.replace('#/','/');
$.ajax(
{
type: "GET",
url: page,
dataType: "html",
success:function(data, stat, Xhr)
{
$('#content').html(data);
var title=Xhr.getResponseHeader("X-XHR-Title"); // null if the server did not send the header
if (title) document.title=title;
}
});
}
}
Here we are only making a request if the hash is not empty. The success callback is where the content gets displayed. In this example we are injecting the new HTML into an element with id="content". (Note: it is possible to make that dynamic using custom HTTP headers). Now that our HTML has been requested and inserted, we need to update the browser's title bar, and we do that with a custom HTTP header called "X-XHR-Title". Setting those server-side is easy enough as you'll see below.
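Because getResponseHeader() returns null for a header the server didn't send, assigning its result straight to document.title would display the literal text "null". One way to guard against that is a helper that applies defaults. readXhrMeta() below is hypothetical, and X-XHR-Container is the optional container header used in the complete code at the end of this post.

```javascript
// Hypothetical helper: collect the custom response headers used by this
// technique. `xhr` can be anything with getResponseHeader(name), such as
// a jqXHR or an XMLHttpRequest; both return null for a missing header,
// in which case the matching value from `defaults` is used instead.
function readXhrMeta(xhr, defaults) {
  const title = xhr.getResponseHeader('X-XHR-Title');
  const container = xhr.getResponseHeader('X-XHR-Container');
  return {
    title: title ?? defaults.title,             // e.g. keep the current title
    container: container ?? defaults.container, // e.g. '#content'
  };
}
```

In the success callback this might be called as readXhrMeta(Xhr, {title: document.title, container: '#content'}), after which the HTML goes into $(meta.container) and document.title is set to meta.title.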
So, given this URL:
http://www.callisto.fm/#/browse/by/channel/
the hash value would be #/browse/by/channel/, which onHashChange() turns into the path:
/browse/by/channel/
thus the AJAX request would perform a GET for:
http://www.callisto.fm/browse/by/channel/
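The URL-to-URL mapping in this example can be checked outside the browser. This sketch uses the standard URL class (available in modern browsers and in Node.js). hashUrlToPlainUrl() is a hypothetical name, and it mirrors the decode-then-replace steps of onHashChange() rather than reproducing that method exactly.

```javascript
// Sketch: map an AJAX permalink to the plain URL that the AJAX request
// (and a search engine) fetches. Returns null when there is no hash.
function hashUrlToPlainUrl(href) {
  const url = new URL(href);
  // url.hash includes the leading '#' (or is '' when absent); decode it
  // first, since some browsers report the hash percent-encoded.
  const hash = decodeURIComponent(url.hash);
  if (hash === '') {
    return null; // nothing to load via AJAX
  }
  const path = hash.replace('#/', '/');
  return new URL(path, url.origin).href;
}
```

For the example above, hashUrlToPlainUrl('http://www.callisto.fm/#/browse/by/channel/') returns 'http://www.callisto.fm/browse/by/channel/'.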
The Server-Side Response to an AJAX Request
Herein lies the trick that ties it all together. A normal request for the previous URL would deliver the entire page: header, nav, footer and all. But we don't want those elements delivered again; we only want the content unique to that page, so we need to make the server aware of AJAX requests and modify our output accordingly.
The Zend Framework makes this very easy, as I'm sure other frameworks like Symfony or Rails do, so it's mainly the concept I'm speaking to here.
In a framework that uses an MVC approach and layouts, the unique content of each page is the view, and the more static elements such as the header and footer are part of the layout. So when handling an AJAX request, we simply disable the layout and let the view be rendered and served all by itself. Voilà. That's it. Here it is in PHP, as part of a Zend Framework-based application:
if ($this->getRequest()->isXmlHttpRequest())
{
$this->getHelper('layout')->disableLayout();
}
This small but crucial piece usually goes in App_Controller_Action::init(), but it could just as easily go in the bootstrap or a front controller plugin.
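For readers not on Zend Framework, here is the same concept sketched in JavaScript for a Node.js back end. It assumes the request headers arrive as a plain object with lower-cased names (as Node's http module provides them). jQuery's $.ajax adds "X-Requested-With: XMLHttpRequest" to same-origin requests, which is the header Zend's isXmlHttpRequest() checks. The layout() function is hypothetical and stands in for whatever wraps a view in the shared header, nav and footer.

```javascript
// True when the request came from jQuery's $.ajax (same-origin),
// which sends "X-Requested-With: XMLHttpRequest".
function isXmlHttpRequest(headers) {
  return headers['x-requested-with'] === 'XMLHttpRequest';
}

// Serve only the view for AJAX requests; wrap it in the layout
// (header, nav, footer) for ordinary page loads and crawlers.
// `layout` is a hypothetical function: (viewHtml) => fullPageHtml.
function renderPage(headers, viewHtml, layout) {
  return isXmlHttpRequest(headers) ? viewHtml : layout(viewHtml);
}
```

Inside an http.createServer() handler this would be called as renderPage(req.headers, html, layout), so one route serves both the full page and the AJAX fragment.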
As for the title HTTP header, have a look at this method of App_Controller_Action:
protected function setAjaxTitle($title)
{
// strip everything outside printable ASCII (0x20-0x7E), including CR/LF, which can break delivery and display of the header
$title = preg_replace('/[^\x20-\x7E]+/', '', $title);
// set the response header
$this->getResponse()->setHeader('X-XHR-Title',$title,true);
}
And you would call this function from a controller action, setting the title specifically for the associated view.
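If the back end happens to be JavaScript rather than PHP, the filter in setAjaxTitle() translates directly. This is a sketch with a hypothetical name, sanitizeTitle(). Like the PHP version, it keeps only printable ASCII, so it also strips accented letters, which may or may not be what you want.

```javascript
// Keep printable ASCII (0x20-0x7E) and drop everything else, including
// CR and LF, which must never appear inside an HTTP header value.
function sanitizeTitle(title) {
  return title.replace(/[^\x20-\x7E]+/g, '');
}
```

For example, sanitizeTitle('Caf\u00e9 Radio\r\n') returns 'Caf Radio'.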
One Last Thing—The Bow on Top
All of this works perfectly well if the user enters the site from the home page. But users need to be able to enter from any URL, including those AJAX permalinks they're sure to bookmark. So how do we get the site to behave properly in those cases? Take a look at the Ajax.init() function below:
// run when page is first loaded from the server, not when content is pulled via AJAX
init:function()
{
this.convertLinks();
this.hash=decodeURIComponent(window.location.hash); // get the initial hash value
if("onhashchange" in window)
{
$(window).bind('hashchange',function(){Ajax.onHashChange();});
}
else
{
// requires the jquery timer plugin
this.HashPoller=$.timer(20,function() // check the address every 20 milliseconds
{
if(Ajax.hash != decodeURIComponent(window.location.hash))
{
Ajax.onHashChange();
}
});
}
if(this.hash=='')
{
// load the home page
window.location.hash='#/';
}
else
{
// there WAS a hash value when the page was first loaded. The user came in through an AJAX permalink
this.onHashChange();
}
}
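Here we check whether the browser's address bar already contained a hash when the page was first loaded. If it did, the user came in through an AJAX permalink, so we call the hashchange handler directly; if it didn't, we set the hash to #/, which in turn fires the handler and loads the home page.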
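The decision init() makes on first load can also be written as a pure function, which makes the two entry paths explicit. entryPlan() is a hypothetical name used only for illustration; the real init() performs these actions directly instead of returning a description of them.

```javascript
// Hypothetical pure version of the first-load decision in Ajax.init().
// Given the raw location.hash, return either the hash to set (which then
// fires the hashchange handler) or the path to fetch immediately.
function entryPlan(rawHash) {
  const hash = decodeURIComponent(rawHash);
  if (hash === '') {
    return { action: 'setHash', hash: '#/' }; // no hash: load the home page
  }
  // The user arrived via an AJAX permalink: fetch that page now.
  return { action: 'load', path: hash.replace('#/', '/') };
}
```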
Ah, but what if the user enters through an indexed page such as http://www.callisto.fm/browse/by/channel/ ? In that case, to keep things nice and neat, we want to redirect them to the home page with their requested page included as the hash. However, because search engines need to crawl these pages, we cannot issue that redirect via a server-generated HTTP Location header; we must have the client redirect itself.
To do this, the server side needs to check two things: that the page is not the home page (or any other non-redirected page), and that the request is not an AJAX request. If both conditions hold, the server appends a dynamically generated JavaScript call that performs the redirect. Here is the server-side code as found in the dispatchLoopStartup method of a front controller plugin (note: you'll need Mojito_JsInit):
if (!$this->getRequest()->isXmlHttpRequest())
{
// an array of page requests that are never redirected to a hashed URL. Always the home page and those that are served via modal or iframe
$nonRedirects = array ('index/index','account/signup','auth/login');
$requestedPage = $this->getRequest()->getControllerName().'/'.$this->getRequest()->getActionName();
if (!in_array($requestedPage,$nonRedirects))
{
Mojito_JsInit::getInstance()
->addMethod('Ajax.redirect','/#'.$this->getRequest()->getRequestUri()) // add the javascript call to redirect to the hashed url
->lock(); // prevent further methods from being added on this request
}
}
and the related JavaScript component:
redirect:function(url)
{
window.location.href=url;
}
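Stripped of Zend Framework and Mojito_JsInit, the server-side check comes down to a small pure function. This is a sketch in JavaScript under those assumptions: clientRedirectTarget() is a hypothetical name, the list mirrors $nonRedirects above, and the controller/action names in the test values are illustrative placeholders for whatever your router produces.

```javascript
// Mirrors $nonRedirects: pages that are never redirected to a hashed URL.
const NON_REDIRECTS = ['index/index', 'account/signup', 'auth/login'];

// Return the hashed URL the browser should be sent to, or null when the
// page should be served as-is. The full page is still rendered either
// way, so crawlers always receive the real content.
function clientRedirectTarget(controller, action, requestUri, isXhr) {
  if (isXhr) {
    return null; // AJAX requests get the bare view, never a redirect
  }
  if (NON_REDIRECTS.includes(controller + '/' + action)) {
    return null; // home page, modals and iframes are served directly
  }
  return '/#' + requestUri; // handed to Ajax.redirect() on the client
}
```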
And there you have it. Take a look at http://callisto.fm and see for yourself. Every URL that is served is also fully visible to search engines with exactly the same content, all while keeping the URLs nice and pretty. And yes, those AJAX pageviews are still tracked in Google Analytics by calling pageTracker._trackPageview(). The JavaScript engine for this is posted below; remember to include the jQuery timer plugin. As for the server side, you'll need to piece that together yourself, but I have every confidence that you will!
var Ajax =
{
HashPoller:null,
hash:'',
init:function()
{
this.convertLinks();
this.hash=decodeURIComponent(window.location.hash);
if ("onhashchange" in window) // use the onhashchange event
{
$(window).bind('hashchange',function(){Ajax.onHashChange();});
}
else this.HashPoller=$.timer(20,function() // otherwise poll the address
{
if (Ajax.hash!=decodeURIComponent(window.location.hash))
{
Ajax.onHashChange();
}
});
if (this.hash=='')
{
window.location.hash='#/';
}
else
{
this.onHashChange();
}
},
convertLinks:function(parent)
{
// use parent arg to define a parent element, thereby limiting the scope of anchor manipulation
parent=(parent!==undefined) ? parent+' ' : '';
var selector=parent+'a.hash';
$(selector).each(function()
{
this.href='/#'+$(this).attr('href');
});
},
onHashChange:function()
{
this.hash=decodeURIComponent(window.location.hash); // safari returns hash encoded
if (this.hash!='')
{
var page = this.hash.replace('#/','/');
$.ajax(
{
type: "GET",
url: page,
dataType: "html",
success:function(data, stat, Xhr)
{
// set the X-XHR-Container header server side to determine which element will receive the requested content;
// if it is absent, fall back to a hard-coded jQuery selector such as '#content'
var container=Xhr.getResponseHeader("X-XHR-Container") || '#content';
var $container = $(container);
$container.html(data);
var title=Xhr.getResponseHeader("X-XHR-Title"); // null if the server did not send the header
if (title) document.title=title;
// fire google analytics
if (typeof window.pageTracker=='object') pageTracker._trackPageview(page);
// woopra analytics
if (typeof window.woopraTracker=='object') woopraTracker.track(page,document.title);
// in case the content contains the FBML fb:like tag
if (typeof window.FB=='object') FB.XFBML.parse($container[0]);
}
});
}
},
redirect:function(url) // used when the client entered through other than the home page
{
window.location.href=url;
}
};