Tuesday, October 16, 2007

Early feedback from the blogosphere

One of the greatest things about having Behold online is the amount of feedback that users give. For example, Lee Stranahan uses Behold to add high quality Creative Commons photos to his blog posts to enhance readers' experience. A number of Flickr users recently spotted that Behold occasionally does not keep up with changing image licenses; Behold now tries to address this by re-indexing images from Flickr more frequently. Behold was also recently mentioned in the SMARTBoard podcast, where Ben Hazzard and Joan Badger discussed online tools that are useful for teachers. They highlighted the simple user interface and the ability to narrow down search results through Behold's image content analysis. In a similar vein, Anthony Evans uses Behold's visual filtering to find images that are suitable for educational purposes. All of this is enormously useful feedback that will be used to improve the search experience further. You can also leave your comments by going to Behold and clicking on the 'Chat to the developer' link at the bottom of the page, or by sending an e-mail to

Friday, September 21, 2007

New address: Behold.cc

Behold can now be found at the brand spanking new address behold.cc. Why .cc? Not because Behold has moved to the Cocos Islands in the Indian Ocean. Instead, it was picked to reflect Behold's focus on indexing high quality Creative Commons images :-). However, given the previous experience with moving the site about (see below), all the old links will continue working for quite some time, until the PageRank issue is fully resolved.

Thursday, September 20, 2007

Zero PageRank. A guide to annihilating your site's ranking in 3 easy steps

This post is for webmasters. Recently, I noticed that Behold's PageRank went from 5 to zero. I was very surprised by this and decided to investigate. Previously, I had heard that this is one way that Google can penalise your site for trying to 'spam' it, that is, to deceive it into thinking that the site is more popular/linked to than it really is. Although I had never had this intention, I decided to find out what might have raised Google's alarm. My first port of call was Google's own webmaster guidelines. I found nothing in my actions that could have violated these guidelines, with the possible exception of serving what appeared to be duplicate content from different sub-domains. However, it then dawned on me that Google was not penalising me, but, most likely, I did so myself. Here is what happened:

A long time ago, when Behold did not have a dedicated server, I set up a website at www.beholdsearch.com. The index page used a meta-refresh tag to redirect to the location at which Behold happened to be hosted at the time. Soon I obtained the first dedicated server for Behold. I named it go.beholdsearch.com and changed the meta-refresh tag at www.beholdsearch.com to an HTTP 301 permanent redirect to this address. Nine months later, I purchased a more powerful server to host what is now the Flickr version of the search engine. I named this server photo.beholdsearch.com. Noticing that the Flickr service had become much more popular than the university image search that was still located at go.beholdsearch.com, I changed the 301 redirect from www.beholdsearch.com to point to photo.beholdsearch.com instead. Soon afterwards I discontinued the university search and closed the old go.beholdsearch.com domain.

Bad idea. It looks like you should never change a 301 redirect once you have it in place, and not just because it goes against the very idea of a permanent redirect. My best guess is that this is all to do with the way PageRank is assigned. While no one knows how Google really works, it is likely that when Google sees a 301 redirect from site A to site B, it associates all the links pointing to site A with site B. In other words, it transfers the PageRank from A to B. Site B starts showing up in search results instead of A. However, when you change the redirect so that A now points to C, A has no more PageRank to give. Otherwise one could keep redirecting to new sites and increasing their PageRank at will. Meanwhile, B is no longer associated with A. If B itself is not linked to from anywhere (and why would other people link to it when it's easier to link to and remember the www version of the site?), on subsequent crawls Google realises this and removes all the PageRank from B that was previously handed over to it from A. So you end up with no PageRank at all on any of your landing pages, old or new. It will now take some time for Behold to regain its old PageRank. If you have to change your redirects, it looks like it is much safer to do so with the appropriately named 'temporary redirect' (HTTP 302).
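For webmasters who want to see the difference on the wire, here is a minimal sketch of a redirecting front page written with Python's standard http.server module. It is not how Behold's servers are actually configured; the port, the CURRENT_TARGET constant and the helper names are assumptions made up for illustration. The only point is that a landing page whose destination may change again answers with 302 rather than 301.

```python
from http import HTTPStatus
from http.server import BaseHTTPRequestHandler, HTTPServer

# Illustrative target only: wherever the site happens to live right now.
CURRENT_TARGET = "http://photo.beholdsearch.com/"


def redirect_status(permanent: bool) -> HTTPStatus:
    """Return 301 for a move that will never change, 302 otherwise."""
    return HTTPStatus.MOVED_PERMANENTLY if permanent else HTTPStatus.FOUND


class RedirectHandler(BaseHTTPRequestHandler):
    # Assumption: the destination may move again, so keep this False.
    permanent = False

    def do_GET(self):
        # Send only a status line and a Location header; no body is needed.
        self.send_response(redirect_status(self.permanent))
        self.send_header("Location", CURRENT_TARGET)
        self.end_headers()

    # HEAD requests get the same status and headers as GET.
    do_HEAD = do_GET


def serve(port: int = 8080) -> None:
    """Run the redirecting server until interrupted (hypothetical port)."""
    HTTPServer(("", port), RedirectHandler).serve_forever()
```

Calling serve() and then requesting http://localhost:8080/ should return a 302 response with a Location header pointing at CURRENT_TARGET. Setting permanent to True switches it to a 301, which, going by the experience above, is best reserved for a destination you never intend to change.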

Wednesday, September 12, 2007

Getty's $49 per image price plan

Getty Images has announced a new price plan that allows most of their images to be used online for just $49 per image. This is a drastic price reduction, considering that previously this would cost as much as $200 per image. Perhaps this is the first sign of the image sales market reacting to the increasing availability of free high quality images on sites like Flickr.

Searching for creative commons content

The number of searches has risen sharply this week, thanks to a blog post by Cameron Parkins on creativecommons.org describing Behold's mission of finding high quality images that can be freely used. He rightly points out the importance of online resources such as Flickr: "Flickr is not only inspiring in terms of the sheer amount of photos available, but even more so in terms for its ability to allow interesting and innovative resources, such as Behold, to be built." I could not agree more. It is the openness of sites like Flickr that makes Behold's job possible. It is encouraging that Behold's approach to image search has found resonance within the Creative Commons community.

Thursday, August 09, 2007

Behold's mission

Major news! Behold is no longer just a research project. It now has a mission statement that concerns you, the user. It reads: "to offer search over images of highest possible quality that are freely available on the Internet". To this end Behold has now indexed over 1 million high quality images from Flickr and is giving you the option to search them using tags as well as through Behold's image analysis. While Flickr has nearly 400 million images, only a small proportion of them are professional, artistic and aesthetically pleasing. Behold's aim is to bring you only images of this calibre. Of course, many of you want to use such images on your websites and in your printed materials. By utilising Flickr's license information, Behold can restrict search results to contain only the images that can be freely used.

Behold can continue growing and offering you more high-quality images, thanks to a parallel computation service provided by Amazon Elastic Compute Cloud. In the meantime, work will continue on improving Behold's visual search capability to enhance your search experience.

Tuesday, June 12, 2007

Revamp

Two weeks ago Google quietly slipped face-filtering into its image search, as reported by GoogleBlogoscoped. This is a very good sign: search engine companies are now paying attention to image content analysis. Meanwhile, Behold has once again been upgraded. A separate option has been added to search around 100,000 photos from a Flickr group called the Unofficial JPEG Magazine, and the user interface has been simplified. There are also two new demo videos on the new 'about' page. Stay tuned for more updates!