Hi everybody
I'll continue introducing iSearch, which was developed as a tool for fetching results from any search engine, such as Google, live.com, Yandex, etc.
iSearch consists of a SearchEngine superclass and several subclasses, each of which works against one search engine:
GoogleSearch(SearchEngine)
YouTubeSearch(SearchEngine)
MsnSearch(SearchEngine)
YahooSearch(SearchEngine)
YandexSearch(SearchEngine)
TorrentzSearch(SearchEngine)
MininovaSearch(SearchEngine)
ScrapeTorrentSearch(SearchEngine)
BaiduSearch(SearchEngine)
Figator(SearchEngine)
Here is an example that fetches all the Google results for a query like 'hello':
from iSearch import GoogleSearch
# Iterating over the object walks through every result, page by page
a = GoogleSearch("hello")
for i in a:
    print(i)
You can also get only 10 results:
a = GoogleSearch("Hello")
a.getNResults(10)
It's very simple!
Making your own search wrapper is not very difficult either. Below you can see the GoogleSearch implementation:
Search pages usually carry the query string and the page number in the URL, so you have to mark where they go
with the keywords "{query}" and "{startvar}".
"{query}" is replaced with the first parameter of the constructor, the query.
"{startvar}" is replaced with the result offset; in the Google example it starts at 0 and grows in steps of 100 with no upper bound.
So, to control "{startvar}" you define two attributes:
- self.startIndex : the first index used in {startvar}
- self.increment : the value added to the last index to get the next page of results
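To make the role of these two attributes concrete, here is a minimal sketch of how the page URLs could be generated. It is not taken from iSearch: the function name page_urls, the example.com template, and the URL-encoding of the query with quote_plus are all assumptions made for illustration.

```python
from itertools import islice

try:
    from urllib.parse import quote_plus  # Python 3
except ImportError:
    from urllib import quote_plus  # Python 2


def page_urls(template, query, start_index=0, increment=100):
    """Yield one URL per result page, without an upper bound.

    template must contain the "{query}" and "{startvar}" keywords.
    """
    encoded = quote_plus(query)  # assumption: the query is URL-encoded
    index = start_index
    while True:
        yield template.replace("{query}", encoded).replace("{startvar}", str(index))
        index += increment


# Hypothetical URL template, for illustration only.
TEMPLATE = "http://search.example.com/?q={query}&start={startvar}"

# Take just the first three page URLs from the endless generator.
first_three = list(islice(page_urls(TEMPLATE, "hello world"), 3))
for url in first_three:
    print(url)
```

With start_index=0 and increment=100, the offsets in the generated URLs are 0, 100 and 200, which matches the stepping described for the Google example.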
And finally you have to define two regular expressions:
- self.urlRegexp : regular expression that matches the desired information in the page (you must put that information in parentheses; see the Python re package for more information)
- self.nextRegexp : regular expression that matches the "Next" link, or anything else that reveals there is another page with more results. When this regular expression does not match, SearchEngine finishes.
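Putting the pieces together, the loop described above could look roughly like the sketch below. It is not the iSearch source: MiniSearchEngine, ExampleSearch, the fetch method, the example.com URL and the HTML patterns are hypothetical stand-ins, and the attributes are set at class level for brevity. Only the names startIndex, increment, urlRegexp and nextRegexp come from the description above.

```python
import re


class MiniSearchEngine(object):
    """Hypothetical stand-in for a SearchEngine-like base class."""

    url = ""          # template containing {query} and {startvar}
    startIndex = 0    # first value used for {startvar}
    increment = 10    # added to the index to reach the next page
    urlRegexp = r""   # group 1 captures the wanted information
    nextRegexp = r""  # matches only when another page exists

    def __init__(self, query):
        self.query = query

    def fetch(self, url):
        """Return the page body for url; supplied by subclasses."""
        raise NotImplementedError

    def __iter__(self):
        index = self.startIndex
        while True:
            page_url = (self.url.replace("{query}", self.query)
                                .replace("{startvar}", str(index)))
            body = self.fetch(page_url)
            # Yield the parenthesized group of every match on this page.
            for match in re.finditer(self.urlRegexp, body):
                yield match.group(1)
            # No "Next" marker means this was the last page.
            if not re.search(self.nextRegexp, body):
                return
            index += self.increment


class ExampleSearch(MiniSearchEngine):
    """Hypothetical wrapper for an imaginary engine with 2 results per page."""

    url = "http://search.example.com/?q={query}&start={startvar}"
    startIndex = 0
    increment = 2
    urlRegexp = r'<a class="result" href="([^"]+)"'
    nextRegexp = r">Next<"

    # Canned pages keyed by start index, so the sketch runs offline.
    PAGES = {
        "0": '<a class="result" href="http://a.example/">A</a>'
             '<a class="result" href="http://b.example/">B</a>'
             '<a href="?start=2">Next</a>',
        "2": '<a class="result" href="http://c.example/">C</a>',
    }

    def fetch(self, url):
        return self.PAGES[url.rsplit("=", 1)[1]]


results = list(ExampleSearch("hello"))
print(results)
```

In this sketch a new wrapper only has to override the five attributes; a real wrapper would replace fetch with an actual HTTP request.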
You can get iSearch from http://proxystrike.googlecode.com/svn/trunk/iSearch.py
Obviously you can use iSearch in many different ways. For example, below you can see a simple crawler built on iSearch, inheriting from SearchEngine of course:

You can find extended documentation in the Proxystrike Wiki.
Enjoy it!
