Google tricks
October 24, 2007
Introduction
Search engines are good. Google has been the undisputed king of search engines for quite some time now, and even though MSN is trying its best to win over surfers with a new engine, a massive advertising budget and a new CSS-based layout, Google is still on top. Yahoo and Ask Jeeves have always been alternatives, and still are, while the big shots of the ’90s, AltaVista and WebCrawler, have taken a big step down.
This article will demonstrate how you can build your own search engine using Google’s API service and some PHP magic. Google has opened up its search results through a free API that anyone can use; the only drawbacks are a limit of 1,000 queries per day and a maximum of 10 results per query.
And while we’re at it, we might as well do it right this time, using semantic XHTML markup and accessible forms. This also makes it possible to add your own search engine to your site, with Google’s powerful back-end serving up the results.
The retro-cool Google
Google is loved by many and hated by few. It’s light, fast, (almost) ad-free and secure. But a quick look under the hood makes any web developer wonder why the biggest of them all is still using markup that belongs to the ’90s: inaccessible forms, tables for layout, markup errors, unescaped ampersands in URLs, old-school <b> tags and so on. To change all this, you could write them an email explaining your concerns about their code. Or you could create your own search engine, built on the same reliable back-end but re-coded with modern front-end technologies. We will do the latter in this article.
Google API
As mentioned before, Google released its search results to the public some time ago. All you have to do is sign up for a Google account at https://www.google.com/accounts/NewAccount if you don’t already have one. Once signed in, you can request an API key; if you have problems obtaining one, see Google’s help pages for more info. This key is exactly what it sounds like – a key to their massive database of basically every web page available on the world wide wisdom. We will use this key in the example, so make sure you have one before you continue.
PHP and NuSOAP
We will use PHP to create the program. PHP is an open-source, server-side scripting language whose syntax borrows heavily from C and Perl. Ask your host if they support it (all good hosts do) or install it on your local machine. Besides PHP itself, we will use a library called NuSOAP, a set of PHP classes for working with SOAP web services, which is exactly what we need to query the Google API and parse the results. You can download the package at http://sourceforge.net/projects/nusoap/. Once downloaded, unzip the .zip archive and locate the folder named “lib”. Inside it you will find several .php files; these are the files we will use to access the NuSOAP library.
Getting Started
To get started with the tutorial, make sure you go through these steps:
- Get an API key at https://www.google.com/accounts/NewAccount
- Create a new directory called “search” in your web root
- Download the NuSOAP library at http://sourceforge.net/projects/nusoap/
- Unzip the NuSOAP .zip archive, locate the lib folder and copy all files in the folder into the search folder you just created
- Read the next section
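Before moving on, it can help to confirm that the NuSOAP files really ended up in the search folder. The snippet below is a minimal sketch, assuming the contents of lib were copied flat into that folder; missingFiles() is a hypothetical helper, not part of NuSOAP.

```php
<?php
// Hypothetical helper: lists which required NuSOAP files are missing from
// $dir. Assumes the files from NuSOAP's lib folder were copied flat into it.
function missingFiles($dir, array $required = ['nusoap.php'])
{
    $missing = [];
    foreach ($required as $file) {
        if (!is_file($dir . DIRECTORY_SEPARATOR . $file)) {
            $missing[] = $file;
        }
    }
    return $missing;
}

// Usage, from a script placed in the search folder:
// $missing = missingFiles(__DIR__);
// echo $missing ? 'Missing: ' . implode(', ', $missing) : 'NuSOAP found';
```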
Google did wrong – let’s make it right
Semantic markup and web standards are the way of the future. It’s like any tool or technological advance – first comes availability, then perfection. Google’s markup is quite far from perfection, but we know better. After a quick analysis of the search results page, I decided that a definition list would be the best way to code the results, since it maps onto the typical fields of each result: a title followed by descriptions. So this is basically what we want:
<form>
<p>[search_form]</p>
</form>
<p class="results">[results_info]</p>
<dl>
<dt><a href="[url]">[title]</a></dt>
<dd>[description]</dd>
<dd class="url">[url] - [size]</dd>
[ ... repeated ... ]
</dl>
<p class="nav">[page_nav]</p>
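To make the skeleton concrete, here is a minimal sketch of a function that fills in the definition list from an array of results. It assumes each result is an associative array with the keys URL, title, snippet and cachedSize (adjust them to whatever your copy of the program actually returns); renderResults() and its fallback strings are hypothetical. The URL is escaped, while title and snippet are treated as markup that has already been cleaned up.

```php
<?php
// Hypothetical renderer for the <dl> skeleton. Assumes each result has the
// keys 'URL', 'title', 'snippet' and 'cachedSize'. Title and snippet are
// assumed to be already-cleaned markup; only the URL is escaped here.
function renderResults(array $results, $noTitle = 'Untitled', $noText = 'No description available.')
{
    // An empty <dl> is not valid XHTML, so output nothing at all instead.
    if (count($results) === 0) {
        return '';
    }
    $html = "<dl>\n";
    foreach ($results as $r) {
        $url   = htmlspecialchars($r['URL'], ENT_QUOTES);
        $title = $r['title'] !== '' ? $r['title'] : $noTitle;
        $text  = $r['snippet'] !== '' ? $r['snippet'] : $noText;
        $html .= "<dt><a href=\"$url\">$title</a></dt>\n";
        $html .= "<dd>$text</dd>\n";
        $html .= "<dd class=\"url\">$url - {$r['cachedSize']}</dd>\n";
    }
    return $html . "</dl>\n";
}
```

Because every result produces exactly one dt followed by its dd elements, the structure stays valid no matter how many results come back.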
Getting the PHP stuff working
Now it’s time to download the index.php sample file, which contains all the PHP code needed to create the search engine. Once downloaded, upload it to the search folder in your root directory, or open it in your favourite text editor. Let’s take a look at the general sections of the PHP file:
- Preferences
- Main program
- Sub programs
- XHTML rendering
1. Preferences
These are the preferences. Modify them as you wish, but once you have entered your Google API key, leave it unchanged.
- $pref->text – defines what text should be displayed if Google doesn’t find any text.
- $pref->title – defines what title should be displayed if Google doesn’t find any title.
- $pref->key – this is where you enter your Google API key; read the previous section for more info on how to obtain one.
- $pref->results – sets the number of results displayed on one page. The Google API returns a maximum of 10 results.
- $pref->start – sets the starting item to be displayed. If set to “Auto”, a page navigation will appear at the bottom.
- $pref->safe – toggles the SafeSearch filter on or off. “false” means off and “true” means on.
- $pref->filter – toggles Google’s filter for similar pages on the same domain on or off. “false” means off and “true” means on.
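The preferences above end up as arguments to Google’s search call. The sketch below shows one way to express that mapping, assuming the parameter names of the API’s doGoogleSearch operation; buildSearchParams() is a hypothetical helper and the $pref values are placeholders. The actual NuSOAP call is only shown as a comment, since it needs a valid key and the service endpoint.

```php
<?php
// Sketch of the preferences as a plain object, mirroring the $pref fields
// described above. All values are placeholders; replace YOUR-API-KEY.
$pref = new stdClass();
$pref->text    = 'No description available.'; // fallback description
$pref->title   = 'Untitled';                  // fallback title
$pref->key     = 'YOUR-API-KEY';
$pref->results = 10;     // the API returns at most 10 results per query
$pref->start   = 'auto'; // automatic page navigation
$pref->safe    = false;  // SafeSearch off
$pref->filter  = true;   // filter similar pages from the same domain

// Hypothetical helper: builds the parameter array for a doGoogleSearch call.
// The parameter names are assumed to follow that SOAP operation.
function buildSearchParams($pref, $query, $offset)
{
    return [
        'key'        => $pref->key,
        'q'          => $query,
        'start'      => (int) $offset,
        'maxResults' => min(10, (int) $pref->results), // enforce the 10-result cap
        'filter'     => (bool) $pref->filter,
        'restrict'   => '',  // no country/topic restriction
        'safeSearch' => (bool) $pref->safe,
        'lr'         => '',  // no language restriction
        'ie'         => '',  // encoding parameters left empty
        'oe'         => '',
    ];
}

// With nusoap.php loaded, the array would be sent roughly like this, where
// $endpoint stands for the SOAP endpoint given in Google's API documentation
// (later NuSOAP releases name the class nusoap_client instead of soapclient):
// $client = new soapclient($endpoint);
// $data = $client->call('doGoogleSearch', buildSearchParams($pref, $query, 0),
//                       'urn:GoogleSearch', 'urn:GoogleSearch');
```

Clamping maxResults in one place means a preference set above 10 cannot produce a request the API would refuse.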
2. Main program
The main program contains four statements:
require_once('nusoap.php');
$query = isset($_GET['q']) ? $_GET['q'] : '';
$results = getResults($query);
$title = getTitle($query);
The first line includes the NuSOAP lib into the document. The second line collects the query string. The third collects the search results in a variable called $results and the fourth line collects the document title into $title.
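On the first visit there is no q parameter in the URL, so the query has to be read defensively. The sketch below pulls that step and a simple title rule into two small functions, assuming the title should just be the query followed by a site name; readQuery() and makeTitle() are hypothetical stand-ins, not the functions from the sample file.

```php
<?php
// Hypothetical helper: reads the search query from a GET-style array,
// returning an empty string when no query was sent.
function readQuery(array $get)
{
    return isset($get['q']) ? trim((string) $get['q']) : '';
}

// Hypothetical stand-in for getTitle(): the title depends on whether a
// search has been made and, if so, what was searched for. The result is
// plain text; it is escaped when it is echoed into the page.
function makeTitle($query, $site = 'Search')
{
    if ($query === '') {
        return $site;
    }
    return $query . ' - ' . $site;
}

// Usage inside index.php:
// $query = readQuery($_GET);
// $title = makeTitle($query);
```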
3. Sub programs
I won’t go into every detail of these functions, but here is a quick review. getTitle() is very simple: it sets the page title depending on whether a search has been made and, if so, what was searched for. getResults() takes the $query string, fetches the Google results using the NuSOAP functions, then returns the data. The markup is cleaned up with a few regular expressions using preg_replace() in order to remove unwanted tags, escape ampersands correctly and convert the <b> tags to proper XHTML. The function also contains an advanced page and results calculator, used if you have set $pref->start to “auto”.
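The two techniques mentioned here, regex cleanup and page calculation, can be sketched on their own. cleanMarkup() and pageOffsets() are hypothetical names, not the functions in index.php; converting <b> to <strong> and turning <br> tags into spaces are assumptions about what “proper XHTML” and “unwanted tags” mean in this context, and the page arithmetic assumes 10 results per page.

```php
<?php
// Hypothetical cleanup helper in the spirit of getResults(): fixes the
// markup that comes back from the API before it is rendered.
function cleanMarkup($html)
{
    // Escape bare ampersands, leaving entities such as &amp; or &#39; alone.
    $html = preg_replace('/&(?!(?:[a-zA-Z][a-zA-Z0-9]*|#[0-9]+|#x[0-9a-fA-F]+);)/', '&amp;', $html);
    // Replace presentational <b>...</b> with <strong>...</strong>.
    $html = preg_replace('#<(/?)b>#i', '<$1strong>', $html);
    // Assumption: <br> tags in snippets are unwanted, so turn them into spaces.
    $html = preg_replace('#<br\s*/?>#i', ' ', $html);
    return $html;
}

// Hypothetical page calculator: from the total number of hits, the current
// offset and the page size, work out the previous/next offsets (null when
// there is no such page) and the 1-based current page number.
function pageOffsets($total, $start, $perPage = 10)
{
    $prev = $start > 0 ? max(0, $start - $perPage) : null;
    $next = ($start + $perPage < $total) ? $start + $perPage : null;
    return ['prev' => $prev, 'next' => $next, 'page' => intdiv($start, $perPage) + 1];
}
```

The negative lookahead in the first pattern is what keeps the cleanup safe to run on text that already contains correct entities: only ampersands that do not start an entity are rewritten.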
4. XHTML rendering
The last section will look familiar to any web developer. A few things should be noted here:
- The default style is my childish attempt to imitate Google’s well-known interface. You can change it to whatever you like.
- Please note the <title><?php echo $title; ?></title> line: it means PHP takes care of the title.
- The form also contains some PHP commands; leave them where they are or experiment as you wish.
- The most important line is <?php echo $results; ?>. This is where the data from the PHP program comes in.
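To see how these lines fit together, here is a minimal sketch of the rendering step wrapped in a function, so it can be tried in isolation. It shows only the relevant fragments; renderPage() and the form markup (with a label for accessibility) are hypothetical rather than copied from the sample file. Title and query are escaped on output, while $results is echoed unchanged because the program has already built it as markup.

```php
<?php
// Hypothetical wrapper around the XHTML rendering fragments. The title and
// the query are escaped here; $results is already markup and is echoed as-is.
function renderPage($title, $query, $results)
{
    $safeTitle = htmlspecialchars($title, ENT_QUOTES);
    $safeQuery = htmlspecialchars($query, ENT_QUOTES);
    ob_start();
    ?>
<title><?php echo $safeTitle; ?></title>
<form action="index.php" method="get">
<p><label for="q">Search</label>
<input type="text" name="q" id="q" value="<?php echo $safeQuery; ?>" />
<input type="submit" value="Search" /></p>
</form>
<?php echo $results; ?>
<?php
    return ob_get_clean();
}
```

Escaping the query when it is written back into the input field keeps a search containing quotes or angle brackets from breaking the form markup.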
Wrapping it all up
Once you have configured the preferences, save the file, point your browser to http://www.yoursite.com/search/index.php and start googling. Or have a look at our example demo. You can alter and refine the PHP code depending on your skill level, or you can stick to modifying the XHTML rendering in the fourth section of the file. Either way, you will have a fully standards-compliant, valid and highly customizable Google search engine at your disposal, ready to use on your site or wherever you like. Remember that the Google API only allows 1,000 queries per day, so if the search results come back empty, you might need to cool off and take a walk.
Entry filed under: PHP articles.