Search engine friendly web pages help spiders and people
June 4, 2008
Whilst a website’s design should be aesthetically pleasing, it mustn’t compromise on functionality. Functionality includes ease of use, compliance with disability accessibility legislation, copy that conveys your message and search engine visibility. These aspects are inter-related. Designing for one will help with the others. So making a website search engine friendly will contribute to its usability, accessibility and readability.
Navigation
If images are used for navigation, there should be a text version as well. Alternatively, an image replacement technique such as the Gilder/Levin method can be used to display the images ‘over’ the text. Not only do text links make it easier for the spiders to crawl the site, they are also an opportunity to use keywords, and they are more accessible to assistive technologies such as screen readers. If images are used, they must include alternative text (an alt attribute) that properly describes the image.
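As a rough illustration, the alt text requirement is easy to check automatically. The sketch below uses Python's standard `html.parser` to list navigation images whose alt text is absent or blank. The markup and image paths are hypothetical, and treating an empty alt as missing is an assumption made here because a navigation image needs a description.

```python
from html.parser import HTMLParser


class AltTextAudit(HTMLParser):
    """Flags <img> elements whose alt text is absent or blank."""

    def __init__(self) -> None:
        super().__init__()
        self.missing_alt: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attributes = dict(attrs)
        alt = attributes.get("alt")
        # Assumption: absent and blank alt are both flagged, since a
        # navigation image needs a description screen readers can use.
        if alt is None or not alt.strip():
            self.missing_alt.append(attributes.get("src") or "(no src)")


def images_missing_alt(html: str) -> list[str]:
    audit = AltTextAudit()
    audit.feed(html)
    audit.close()
    return audit.missing_alt


# Hypothetical image-based navigation menu.
nav = """
<ul>
  <li><a href="/services/"><img src="/img/services.png" alt="Our services"></a></li>
  <li><a href="/contact/"><img src="/img/contact.png"></a></li>
</ul>
"""
print(images_missing_alt(nav))  # ['/img/contact.png']
```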
Site map
Ideally, a text navigation system should link to every page on the site. The key is to strike a balance between keeping the structure shallow for the sake of the spiders and keeping it logical (as deep as it needs to be) for the sake of usability. Setting out the whole navigation on one page (or a series of pages for larger sites) in the form of a site map solves the problem, and is also a tremendous aid to usability.
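One way to see why a site map helps is to measure "click depth": the minimum number of links a spider must follow from the home page to reach each page. The sketch below computes it with a breadth-first search over a hypothetical link graph, first for a deep but logical hierarchy and then with a site map page linked from the home page.

```python
from collections import deque


def click_depths(links: dict[str, list[str]], home: str = "/") -> dict[str, int]:
    """Breadth-first search: minimum number of clicks from home to each page."""
    depths = {home: 0}
    queue = deque([home])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depths:
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths


# Hypothetical site: a logical, but deep, product hierarchy.
site = {
    "/": ["/products/", "/about/"],
    "/products/": ["/products/garden/"],
    "/products/garden/": ["/products/garden/tools/"],
    "/products/garden/tools/": ["/products/garden/tools/spade/"],
}
print(max(click_depths(site).values()))  # 4

# Add a site map page, linked from the home page, that links to every page.
pages = set(site) | {target for targets in site.values() for target in targets}
site["/"] = site["/"] + ["/sitemap/"]
site["/sitemap/"] = sorted(pages - {"/"})
print(max(click_depths(site).values()))  # 2
```

The hierarchy itself is unchanged for human visitors; the site map simply gives the spiders a shortcut to every page.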
Frames
Frames split a page into separate, self-contained HTML documents (e.g. navigation, banner, footer, body content, etc.), which are ‘bolted’ together in the browser to make a page. Whilst the spiders are usually capable of following links from one frame to another, a single frame returned in the search results will be out of context, because the surrounding frames will be missing. Additionally, accessibility tools may struggle with frames. They are best avoided.
Flash
Flash content offers many benefits in terms of the visual and audio experience that can be delivered on a web page. And the spiders can, to an extent, read Flash content. But they cannot interpret that content to the same extent as pure HTML. The best solution is to blend Flash elements into an HTML page rather than building a complete Flash page. Then, where necessary (e.g. for Flash navigation), include text equivalents.
JavaScript
Spiders cannot easily interpret JavaScript. This is not a problem where JavaScript functionality does not need to be indexed, but it may be a problem where JavaScript is used for, say, navigation: the spiders would not be able to follow the links to crawl the rest of the site. An alternative text-based navigation system would be needed to solve the problem.
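To make the point concrete, here is a deliberately naive spider, sketched with Python's `html.parser`, that only follows plain `<a href>` links. It is a simplified model rather than how any particular search engine works; the `openPage` function and the page paths are hypothetical. Links that exist only inside scripts give it nowhere to go.

```python
from html.parser import HTMLParser


class SimpleSpider(HTMLParser):
    """A deliberately naive crawler: it only follows plain <a href> links."""

    def __init__(self) -> None:
        super().__init__()
        self.followable: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        # Script-driven links ('javascript:' URLs, or href="#" with an
        # onclick handler) give this spider no URL to follow, so skip them.
        if href and not href.strip().lower().startswith(("javascript:", "#")):
            self.followable.append(href)


def followable_links(html: str) -> list[str]:
    spider = SimpleSpider()
    spider.feed(html)
    spider.close()
    return spider.followable


# Hypothetical menu: two script-driven links and one plain text link.
menu = """
<a href="javascript:openPage('products')">Products</a>
<a href="#" onclick="location.href='/about/'">About</a>
<a href="/contact/">Contact</a>
"""
print(followable_links(menu))  # ['/contact/']
```

Only the plain link survives, which is why a text-based alternative is needed alongside script-driven navigation.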
Dynamic content
Content that is generated dynamically (e.g. blogs, portals, site publishing systems, shopping carts, etc.) can be crawled and read by the spiders. But it can present a number of different problems.
Content management systems populate pages from templates, so the resulting pages may lack optimised elements such as title tags or heading tags. (However, there are plenty of good content management systems that deal with this effortlessly, so choose carefully.)
Shopping carts generally rely on cookies or session ids to pass vital information from one page to another. The spiders cannot read cookies and may not follow URLs containing session ids. If they do follow them, there is a risk of duplicate content problems when different session ids return the same data. Some shopping cart systems are better equipped than others to deal with these situations.
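The duplicate content risk arises because two URLs differing only in their session id point to the same data. A minimal sketch of normalising such URLs with Python's `urllib.parse` is shown below; the parameter names in `SESSION_PARAMS` are assumptions for illustration, so check which ones your own cart actually uses.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed session parameter names; adjust to match your shopping cart.
SESSION_PARAMS = {"sessionid", "sid", "phpsessid"}


def strip_session_id(url: str) -> str:
    """Remove session-id query parameters so equivalent URLs compare equal."""
    parts = urlsplit(url)
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in SESSION_PARAMS
    ]
    # Sort the remaining parameters so their order does not matter either.
    return urlunsplit(parts._replace(query=urlencode(sorted(query))))


a = "http://www.example.com/cart/item?id=42&sessionid=A1B2C3"
b = "http://www.example.com/cart/item?sessionid=Z9Y8X7&id=42"
print(strip_session_id(a))  # http://www.example.com/cart/item?id=42
print(strip_session_id(a) == strip_session_id(b))  # True
```

A cart that emits links in this normalised form (keeping the session in a cookie instead) gives spiders one URL per product rather than one per visit.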
Duplicate content
Where substantially the same content is presented under more than one URL, the search engine may treat it as duplicate content and frown upon it. Sometimes the reason is genuine (e.g. printer-friendly versions of originals, shopping cart pages, several addresses for the same page where only one is canonical, etc.) and sometimes it is a crude spamming attempt. The search engines may not necessarily penalise duplicate content, but they may be deterred from indexing a whole site if there is too much of it. One way to get round the problem is to redirect duplicate pages to safe pages, usually with a ‘301 Moved Permanently’ HTTP status code.
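How the redirect is set up depends on the web server, but the mechanics are the same everywhere: the duplicate URL answers with status 301 and a `Location` header naming the preferred page. The sketch below shows this as a minimal Python WSGI application. The entries in `REDIRECTS` are hypothetical, and the `call` helper exists only to exercise the app without running a server.

```python
from urllib.parse import urljoin
from wsgiref.util import application_uri, setup_testing_defaults

# Hypothetical duplicates, each mapped to the preferred ("safe") URL.
REDIRECTS = {
    "/index.html": "/",
    "/about-us/": "/about/",
}


def app(environ, start_response):
    """Minimal WSGI app: 301-redirect known duplicate URLs, serve the rest."""
    target = REDIRECTS.get(environ.get("PATH_INFO", "/"))
    if target is not None:
        # Build an absolute Location from the request's scheme and host.
        location = urljoin(application_uri(environ), target)
        start_response("301 Moved Permanently", [("Location", location)])
        return [b""]
    start_response("200 OK", [("Content-Type", "text/html; charset=utf-8")])
    return [b"<h1>Page content</h1>"]


def call(path):
    """Invoke the app directly with a fake request, without running a server."""
    environ = {"PATH_INFO": path}
    setup_testing_defaults(environ)  # fills in host 127.0.0.1, scheme http, etc.
    captured = {}

    def start_response(status, headers):
        captured["status"] = status
        captured["headers"] = dict(headers)

    app(environ, start_response)
    return captured["status"], captured["headers"].get("Location")


print(call("/index.html"))  # ('301 Moved Permanently', 'http://127.0.0.1/')
print(call("/about/"))      # ('200 OK', None)
```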
Robots.txt and robots <meta> tag
Control which pages the spiders crawl by placing a robots.txt file in the website’s root directory. This is a way of withholding pages that are not vital to search engine results and might otherwise be penalised (e.g. duplicate content pages). The robots.txt file is a plain text file whose ‘Disallow’ lines exclude specified pages for all spiders or for named ones. But take care not to accidentally disallow the whole site (Disallow: /).
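Before uploading a robots.txt file, it is worth checking what it actually blocks. Python’s standard library includes `urllib.robotparser`, which applies these rules; the sketch below uses it to test a hypothetical file that keeps all spiders out of printer-friendly duplicates and keeps one named spider (the made-up “ExampleBot”) out of the shopping cart too. The last lines show the accident to avoid.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: the named spider gets its own, stricter rules.
ROBOTS_TXT = """\
User-agent: ExampleBot
Disallow: /print/
Disallow: /cart/

User-agent: *
Disallow: /print/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

print(parser.can_fetch("AnyBot", "http://www.example.com/print/about.html"))  # False
print(parser.can_fetch("AnyBot", "http://www.example.com/cart/"))             # True
print(parser.can_fetch("ExampleBot", "http://www.example.com/cart/"))         # False

# The mistake to avoid: a single slash disallows the whole site.
careless = RobotFileParser()
careless.parse(["User-agent: *", "Disallow: /"])
print(careless.can_fetch("AnyBot", "http://www.example.com/"))  # False
```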
An alternative to robots.txt is to include a robots meta tag in the <head> section of the pages concerned.
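A robots meta tag takes the form `<meta name="robots" content="noindex, follow">`, where `noindex` asks spiders not to index the page and `follow` allows them to follow its links. The sketch below, using Python's `html.parser`, shows how a simple reader might pull those directives out of a page. It is an illustration only, not how any search engine actually processes the tag, and the sample page is hypothetical.

```python
from html.parser import HTMLParser


class RobotsMetaReader(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: set[str] = set()

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "meta" and (attributes.get("name") or "").lower() == "robots":
            content = attributes.get("content") or ""
            # Directives are comma-separated and case-insensitive.
            self.directives.update(
                part.strip().lower() for part in content.split(",") if part.strip()
            )


def robots_directives(html: str) -> set[str]:
    reader = RobotsMetaReader()
    reader.feed(html)
    reader.close()
    return reader.directives


# Hypothetical printer-friendly duplicate that should stay out of the index.
page = """<html><head>
<title>Printer-friendly version</title>
<meta name="robots" content="noindex, follow">
</head><body>...</body></html>"""
print(sorted(robots_directives(page)))  # ['follow', 'noindex']
```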
CSS
It’s a widely held myth that using Cascading Style Sheets will assist the spiders in reading a page because the page has been massively ‘de-cluttered’ by removing all the presentational markup. The spiders are perfectly capable of finding and reading content regardless of the markup. Having said that, CSS is essential for many other reasons such as conformance with web standards, accessibility and ease of maintenance.
Filed in: Search Engine Optimisation



