Google sitelinks and site search in results
* Originally posted on crawlscore.com but moved here as the blog on crawlscore.com is no more *
I just happened to search for DeviantART rather than typing it into my address bar, and got the usual Google sitelinks, but I also got a Google site search link - that's a new one on me.
The timing couldn't have been better - right after my recent post about semantic search, Google have come out with this:

It's a great idea, but try to search for something... anything. It really doesn't work for me. Google is a fantastic search engine, it really is, but to make a site search work properly it needs to be able to do more than just crawl pages - it needs context, it needs to know more about the site structure.
When I searched for "bot" I got results from the FAQ and all sorts of other stuff - what I didn't get was a list of images tagged "bot". If I use the site search on DeviantART itself I get a great set of results, which is exactly what I expected.
I'm sure Google and all the others are working hard on creating this "context" about each site to get better results but it's an incredibly difficult task - I know this because we've tried.
You can get reasonable results on some sites by doing deeper crawls, trying different scoring algorithms, directory pattern matching, keywords, dynamic weighting, etc., but you just know that the sites you really want to crawl and index properly are either using server-side sessions or JavaScript, or have a weird site structure that blows your all-encompassing master plan out of the water.
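To give a rough idea of what directory pattern matching combined with keyword weighting can look like, here is a minimal Python sketch. The URL patterns, the weights and the function names are all made up for illustration and are not taken from Nutch or any real crawler; it simply counts keyword hits in the page text and scales them by a weight chosen from the URL path.

```python
import re
from urllib.parse import urlparse

# Hypothetical path patterns and weights, chosen only for illustration.
# The first matching pattern wins.
PATH_WEIGHTS = [
    (re.compile(r"^/tag/"), 3.0),          # tag listings: boost
    (re.compile(r"^/art/"), 2.0),          # individual items: mild boost
    (re.compile(r"^/(faq|help)/"), 0.25),  # help pages: demote
]


def path_weight(url):
    """Return the weight for the first pattern matching the URL path."""
    path = urlparse(url).path
    for pattern, weight in PATH_WEIGHTS:
        if pattern.search(path):
            return weight
    return 1.0  # neutral weight for unrecognised paths


def score(url, text, query):
    """Keyword frequency in the page text, scaled by the path weight."""
    words = re.findall(r"\w+", text.lower())
    hits = words.count(query.lower())
    return hits * path_weight(url)


def rank(pages, query):
    """Rank (url, text) pairs by score, dropping pages that score zero."""
    scored = [(score(url, text, query), url) for url, text in pages]
    scored.sort(key=lambda item: -item[0])
    return [url for s, url in scored if s > 0]
```

With weights like these, a tag page that mentions "bot" once would outrank an FAQ page that mentions it three times. The catch is the one described above: the patterns have to be hand-tuned per site, and they fall over as soon as the site structure doesn't follow them.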
Even all the very, very clever people at Google haven't been able to crack those problems yet, so I guess you might think we were wasting our time, but we wanted to understand the problem and we do, very much so.
For our experiments we've been using Nutch (for which we've created some new hi-res logos). We'd strongly encourage anyone who really wants to understand search engines to have a go with Nutch. Yes, it's quirky in places but you really begin to get a feel for the problems that search engines face.
Anyway, back to the story: if Google had created a crawler specific to DeviantART, I'm sure they would have ended up with something that produced very similar results to DeviantART's internal search engine, but for Google to create a site-specific crawler for every one of the billions of sites out there might take a while...
If they worked with the webmaster community to structure pages properly (the meta data I was droning on about in my previous post), then it could have a massive impact on the quality, relevance and delivery of search results.
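As a small illustration of why page-level meta data helps, here is a hedged Python sketch of a crawler reading tags straight from `<meta>` elements instead of guessing them from the body text. The class and function names are hypothetical, and it assumes the page publishes a comma-separated `keywords` meta tag, which plenty of real sites don't.

```python
from html.parser import HTMLParser


class MetaExtractor(HTMLParser):
    """Collects <meta name="..." content="..."> pairs from a page."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attrs = dict(attrs)
        name = attrs.get("name")
        content = attrs.get("content")
        if name and content:
            self.meta[name.lower()] = content


def extract_meta(html):
    """Return a dict of meta name -> content for the given HTML."""
    parser = MetaExtractor()
    parser.feed(html)
    return parser.meta


def page_keywords(html):
    """Split the (assumed) comma-separated keywords tag into clean terms."""
    raw = extract_meta(html).get("keywords", "")
    return [k.strip().lower() for k in raw.split(",") if k.strip()]
```

If a page declared its keywords as "bot, robot", a search for "bot" could match on the declared tag rather than on wherever the word happens to appear in the text.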
So thanks Google for giving me a great example of why we need semantic crawlers and structured web pages :)