Monday 14 May 2007


Uninventing the search engine

Don't you hate it when stuff just works? It's predictable and boring, and if you ask me, anything falling into this category should be sabotaged immediately to spice things up a little.
Many web coders clearly share my view because this is precisely what they've been doing with their previously accurate, efficient and dependable search engines.

Take Google's Image Search, for example. Imagine your typical day: you're surfing the web when a sudden impulse to track down a picture of Spider-man wrestling a T-Rex grips you with full force. You visit Google Image Search and type in the keywords 'spiderman', 'wrestles' and 'trex'. Now, you wouldn't imagine there would be all that many depictions of such a scene, so it would be reasonable to expect fewer than half a dozen hits at most. Well, you'd be wrong; Google supposedly indexes 1030 images of the web-shooting wonder getting down and dirty with the "last and largest known carnosaur".

It's curious that amongst these 'hits' are images of King Kong, random politicians, The Simpsons, Bambi, fish corpses and Wacko Jacko's face embedded in a slice of toast, but none of them remotely resemble what I actually searched for.

The fact that there are lots of pictures containing isolated wrestlers, spidermen and dinosaurs might indicate that Google has applied the OR Boolean search operator to my query rather than the more useful AND one. This isn't the case, however; if you click on the 'Advanced Image Search' link you'll see that the keywords are automatically entered into the "find results related to all of the words" box to demonstrate which kind of search I performed prior to reaching this page. Just to confirm, clicking the search button again at this point returns exactly the same set of irrelevant flotsam.

It could be that I'll never find out for certain whether Spider-Man (yes, I know that's the correct way to write it) ever unleashed the Pumphandle Michinoku Driver II on a 43-foot-long, 7.5-tonne 'tyrant lizard king'. It's no laughing matter.

That's just a drop in the ocean. All kinds of search engines are falling prey to Boolean vandalism: software repositories, forums, recipe databases... the list is endless. The Digg coders are prime suspects. Try searching Digg for stories involving two of the most famous bedfellows of all, the 'llama' and the 'blancmange'. Go on, guess how many hits there are for this keyword combo. 89! That's eighty-nine, EIGHTY-NINE!

It appears that, contrary to the norm, you can crowbar the AND operator in between them to narrow down the field, but why isn't this the default setting to begin with, as it is with Google (well, Google's text search, anyway)? You wouldn't dial 999 to report a crime and, when asked "Which service do you require?", reply "Police... OR a florist, please. Either will do." So why would it make sense in any other context?
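For anyone wondering why the choice of default operator makes such a difference, here's a toy illustration in Python. It filters a handful of made-up titles (invented for this example, not real Digg stories) two ways: AND, where every keyword must appear, and OR, where any single keyword will do. The function names and the naive whitespace-based word matching are assumptions for illustration only; this isn't how Google or Digg actually implement search.

```python
def matches_all(text, keywords):
    """AND semantics: every keyword must appear as a word in the text."""
    words = set(text.lower().split())  # naive whitespace tokenisation
    return all(k.lower() in words for k in keywords)


def matches_any(text, keywords):
    """OR semantics: at least one keyword must appear as a word in the text."""
    words = set(text.lower().split())
    return any(k.lower() in words for k in keywords)


def search(docs, keywords, mode="and"):
    """Return the docs that match the keywords under the chosen operator."""
    match = matches_all if mode == "and" else matches_any
    return [d for d in docs if match(d, keywords)]


# Hypothetical titles, made up for this example
docs = [
    "llama eats blancmange",
    "llama grazing on a hillside",
    "strawberry blancmange recipe",
    "custard cream biscuits",
]

print(search(docs, ["llama", "blancmange"], mode="and"))
print(search(docs, ["llama", "blancmange"], mode="or"))
```

With AND, only the single title mentioning both a llama and a blancmange comes back; with OR, every title mentioning either one does, which is exactly how a handful of genuine hits balloons into dozens of irrelevant ones.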

5 comments:

Anonymous said...

At least it wasn't Jesus in there. Google isn't as effective as it used to be, and Microsoft's attempts have so far been a failure. I suppose all the people trying to get hits have buggered it up. I wonder if it's unfixable.

I reckon we'll eventually see a kind of Napster for links, where people share them and you can have a trust network to filter out the dimwits.

dreamkatcha said...

People playing the system for profit or hits helps to explain why Google is becoming less useful, but what I don't get is why web masters who have complete control over their sites are apparently flushing functioning search technology down the toilet.

For instance, another trend I've noticed is that sites will force you to use 'OR' queries which skew the results so much they are worthless, and then offer a link to the same search run through Google.

The best example I know of is MacUpdate, one of the two biggest Mac software databases. A search for 'wysiwyg' and 'html' returns 306 results.

I'd know without sifting through them that this is bogus because there are very few WYSIWYG HTML editors for the Mac.

The text at the top reads "Relevant Search — Best results first!" which is also a lie because BBEdit is the fourth 'hit' and that is a text editor, not a visual web page design tool. The author of the software isn't the one pretending because the acronym WYSIWYG doesn't feature in the description at all.

Incidentally, the description of the fifth result, FontCard, contains neither keyword.

The advanced search menu has a 'required words' field, and guess what? Typing the same keywords into that returns an identical set of 306 results.

Next up: the MacRumors forum (and hundreds of others that use the vBulletin platform). This one really takes the biscuit. If you search for the keywords 'shuffle', 'custard' and 'cream' and opt to search thread titles only, you get 491 results spread across 20 pages. Unsurprisingly enough, none of them include all three words in the title.

But wait, what's this? You can also search the forum via Google. Try that and you get no hits at all. Isn't that novel? Result! Well you know what I mean.

Napster for genuine search results is so ridiculous it will probably happen. ;)

Anonymous said...

There is, of course, the other driving force, which is paid links. Company X pays Google/MS/whoever for more prominence for certain themes or keywords. That explains some more of the stupid results, but I still don't think it's enough.

People thought search engines were ridiculous before they became popular. Come to think of it maybe they are right.

dreamkatcha said...

I've developed 'sponsored link' blindness. Any appearing at the very top of the page or in the right-hand column may as well not be there for me.

At least Google disclose which sites have paid to be there. There are bound to be other search engines that give more prominence to customers without making the relationship clear.

Something which should have been glaringly obvious to me is that you can make money by adding a customised Google search box to your site. I've even got my own Google Co-Op search engine for Amiga games (with the ads disabled), yet it didn't occur to me until you jogged my memory that this would explain why many webmasters would rather you used Google's technology than their own in-house search engine... in which case, why not ditch the latter altogether?

Yeah, I bet it would be funny to travel back to a time before search engines had been invented and sit in on a discussion between a programmer and a layman. Why would anyone need a search engine when we've got libraries?

dreamkatcha said...

Joel Mueller of MacUpdate sees the light!

The influence I exert over the web is staggering. ;)