June 16, 2009

iSearch tool

Hi everybody

I'll continue by introducing iSearch, which was developed as a tool to fetch results from any search engine, such as Google, live.com, Yandex, etc.

iSearch consists of a SearchEngine superclass and several subclasses, one for each supported search engine:

GoogleSearch(SearchEngine)
YouTubeSearch(SearchEngine)
MsnSearch(SearchEngine)
YahooSearch(SearchEngine)
YandexSearch(SearchEngine)
TorrentzSearch(SearchEngine)
MininovaSearch(SearchEngine)
ScrapeTorrentSearch(SearchEngine)
BaiduSearch(SearchEngine)
Figator(SearchEngine)

Here is an example that fetches all the Google results for a query like 'hello':


from iSearch import *

a=GoogleSearch("hello")
for i in a:
    print i

You can also get only 10 results:

a=GoogleSearch("Hello")
a.getNResults(10)

It's very simple!

So writing your own search wrapper is not very difficult. Below you can see the GoogleSearch implementation:


Search pages usually carry the query string and the page number in the URL, so you have to mark where they go
with the keywords "{query}" and "{startvar}".

"{query}" will be replaced with the first parameter of the constructor, the query.
"{startvar}" will be replaced with the result offset, which in the Google example starts at 0 and grows without limit in steps of 100.

So, to control "{startvar}" you define two attributes:

  • self.startIndex : the first index used in {startvar}
  • self.increment : value to add to the last index to get the next page results
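
As a rough illustration of how the two placeholders and the two attributes fit together, here is a small standalone sketch in Python 3. It is not taken from iSearch: the function build_page_urls, the example.com URL and the URL-encoding of the query are all assumptions made for the example.

```python
from urllib.parse import quote_plus

# Hypothetical URL template; the real engines' URLs are not shown here.
TEMPLATE = "http://search.example.com/find?q={query}&start={startvar}&num=100"


def build_page_urls(template, query, start_index, increment, pages):
    """Expand the template for the first `pages` result pages."""
    urls = []
    for page in range(pages):
        # Offset of the first result on this page.
        start = start_index + page * increment
        # Assumption: the query is URL-encoded before substitution.
        url = template.replace("{query}", quote_plus(query))
        url = url.replace("{startvar}", str(start))
        urls.append(url)
    return urls


for url in build_page_urls(TEMPLATE, "hello world", 0, 100, 3):
    print(url)
```

With a start index of 0 and an increment of 100, the three URLs ask for results starting at 0, 100 and 200.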

And finally you have to define two regular expressions:
  • self.urlRegexp : RegExp to match the desired information on the page (you must put that information in parentheses; see the Python re package for more information)
  • self.nextRegexp : RegExp to match the "Next" link or anything else that reveals there is another page with more results. When that regular expression does not match, SearchEngine finishes.
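
The pagination logic described above can be sketched roughly as follows, in Python 3. This is a toy stand-in and not the real SearchEngine code: the class FakeEngine, the attribute self.url, the HTML snippets and the PAGES dictionary (which replaces the HTTP requests so the example runs offline) are all hypothetical.

```python
import re


class FakeEngine:
    """Toy stand-in for a SearchEngine subclass; not the real iSearch code."""

    def __init__(self, query, pages):
        self.query = query
        self.pages = pages  # hypothetical: maps URL -> HTML instead of doing HTTP
        self.url = "http://search.example.com/find?q={query}&start={startvar}"
        self.startIndex = 0   # first value used for {startvar}
        self.increment = 2    # added to the index to reach the next page
        self.urlRegexp = r'<a class="hit" href="([^"]+)"'  # group = wanted data
        self.nextRegexp = r">Next<"  # present only when more pages exist

    def __iter__(self):
        start = self.startIndex
        while True:
            url = self.url.replace("{query}", self.query)
            url = url.replace("{startvar}", str(start))
            html = self.pages.get(url, "")
            # Yield every parenthesized match found on this page.
            for match in re.findall(self.urlRegexp, html):
                yield match
            # Stop as soon as the "Next" marker is missing.
            if not re.search(self.nextRegexp, html):
                break
            start += self.increment


BASE = "http://search.example.com/find?q=hello&start="
PAGES = {
    BASE + "0": '<a class="hit" href="http://a.example/">A</a>'
                '<a class="hit" href="http://b.example/">B</a>'
                '<a href="?start=2">Next</a>',
    BASE + "2": '<a class="hit" href="http://c.example/">C</a>',
}

results = list(FakeEngine("hello", PAGES))
print(results)
```

The first fake page contains a "Next" link, so the loop moves on to offset 2; the second page has none, so iteration stops there.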

You can get iSearch from http://proxystrike.googlecode.com/svn/trunk/iSearch.py

Obviously you can use iSearch in many different ways. For example, below you can see a simple crawler built on iSearch, inheriting from SearchEngine of course:



You can see extended documentation in the ProxyStrike wiki

Enjoy it!

June 3, 2009

Introducing tools

Hi everybody, 

In the next posts I'd like to explain some very useful modules that I've developed to improve my tools.

Below you can see a summary:
reqresp: Library to work with HTTP requests and responses
TextParser: Library to parse text
iSearch: Library to perform searches using search engines
console: Library for easily creating simple command-line interfaces

To begin with, I'll introduce reqresp:
You can access the API in the ProxyStrike wiki

Below you can see an example of how to use the reqresp module:


I hope this is useful for you. Enjoy it.