This weblick tool basically scratches an itch I have had for quite some time. I have asked myself plenty of times: what information do some high-profile sites simply give away? By give away, I actually mean accidentally leak. Things such as headers, cookies, HTML comments and SSL certificates all have plenty of opportunity to contain information an administrator or developer may not have thought through. With Weblick, I hope to scrape all of this information into a database for later analysis.
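To make the idea of "leaked" information a little more concrete, here is a minimal sketch of pulling HTML comments out of a page body using only the standard library. It is an illustration, not necessarily how Weblick does it; the names `CommentCollector` and `extract_comments` and the sample page are made up for this example.

```python
from html.parser import HTMLParser


class CommentCollector(HTMLParser):
    """Collect the text of every HTML comment in a page."""

    def __init__(self):
        super().__init__()
        self.comments = []

    def handle_comment(self, data):
        # HTMLParser calls this once for each <!-- ... --> block.
        self.comments.append(data.strip())


def extract_comments(html):
    """Return a list of comment strings found in the given HTML text."""
    collector = CommentCollector()
    collector.feed(html)
    collector.close()
    return collector.comments


# Hypothetical page body, used only for illustration.
sample = "<html><!-- TODO: remove debug endpoint /admin-test --><body>Hi</body></html>"
print(extract_comments(sample))
```

A forgotten comment like the one in the sample is exactly the kind of detail that is easy to overlook on a live site.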
I suggest you create a new Python virtual environment. This will allow you to install all of the required dependencies without touching your operating system's base Python installation.
Weblick can support many database backends as it makes use of the peewee ORM. At the moment though, only SQLite and MySQL/MariaDB have been tested. Theoretically PostgreSQL should work too, but some work is needed to add support for it to this tool.
- Clone the repository with:
$ git clone https://github.com/leonjza/weblick.git
This will leave you with a new directory called
Weblick has a few dependencies that need to be resolved. All of these are defined in the requirements.txt file.
Recommended: Create a new Python virtual environment with $ virtualenv env in the weblick directory. Once this is finished, source the new environment with $ source env/bin/activate. Your Python interpreter will now be the one in your newly created environment.
- Install the required dependencies with:
$ pip install -r requirements.txt
If you are going to be using the MySQL/MariaDB backend, prepare a database and credentials so that Weblick can create tables and insert and update rows there. Update the [mysql] section in the settings.ini file too.
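As a rough sketch of how a [mysql] section in an INI file can be read from Python, the snippet below uses the standard library `configparser`. The key names (`host`, `user`, `password`, `database`) and the helper `load_mysql_settings` are assumptions for illustration; check the settings.ini shipped with the repository for the real keys.

```python
import configparser

# Hypothetical contents; the real settings.ini may use different key names.
EXAMPLE_SETTINGS = """
[mysql]
host = localhost
user = weblick
password = changeme
database = weblick
"""


def load_mysql_settings(text):
    """Parse INI text and return the [mysql] section as a plain dict."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    if not parser.has_section("mysql"):
        raise KeyError("settings are missing a [mysql] section")
    return dict(parser["mysql"])


settings = load_mysql_settings(EXAMPLE_SETTINGS)
print(settings["host"], settings["database"])
```

Failing early on a missing section makes a misconfigured file obvious before any database connection is attempted.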
For the default SQLite driver, no configuration should be needed. The database file for SQLite will live in the
- With the database configured in the settings.ini file, create the schema with:
$ python lick.py setupdb
This tool was written to use the Alexa Top 1 Million data export.
- Download the source data to the
$ curl -O http://s3.amazonaws.com/alexa-static/top-1m.csv.zip
- Extract the downloaded
$ unzip top-1m.csv.zip
Note: If you prefer to have this csv somewhere else, just update the
That should be it. You should now be able to run it with $ python lick.py and watch your database grow!
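For a feel of the input such a run works through, the sketch below reads a CSV of `rank,domain` rows (the layout of the Alexa export, which has no header row). It is an assumption-laden illustration rather than Weblick's own loader: `read_ranked_domains` is a hypothetical helper and the sample rows use placeholder domains.

```python
import csv
import io


def read_ranked_domains(handle, limit=None):
    """Yield (rank, domain) tuples from a 'rank,domain' CSV file object."""
    yielded = 0
    for row in csv.reader(handle):
        if limit is not None and yielded >= limit:
            break
        if len(row) != 2:
            continue  # skip blank or malformed lines
        rank, domain = row
        yield int(rank), domain.strip()
        yielded += 1


# Placeholder rows in the same shape as the export; not real ranking data.
SAMPLE_CSV = "1,example.com\n2,example.org\n\n3,example.net\n"

for rank, domain in read_ranked_domains(io.StringIO(SAMPLE_CSV), limit=2):
    print(rank, domain)
```

With the real file, the same function can be fed `open("top-1m.csv", newline="")`, and `limit` keeps a first test run small.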
A web component exists that allows one to view some information about URLs. To run the web interface, simply run:
$ python web.py
Future / TODO
With all of the information gathered, I am thinking of making it possible to alert when things have changed. For example:
- New / Missing cookies
- New / Missing HTTP headers
- New / Missing comments in HTML sources
- SSL certificate expiry / changes
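The change checks listed above could be sketched roughly as follows: compare the names seen in two scans with set differences, and flag certificates that expire soon. Nothing here exists in Weblick yet; `diff_names`, `expires_within`, the 30-day threshold and the sample header names are all hypothetical.

```python
from datetime import datetime, timedelta


def diff_names(previous, current):
    """Report which names appeared or vanished between two scans."""
    previous, current = set(previous), set(current)
    return {
        "new": sorted(current - previous),
        "missing": sorted(previous - current),
    }


def expires_within(not_after, now, days=30):
    """Return True if a certificate's expiry date falls within `days` of `now`."""
    return not_after - now <= timedelta(days=days)


# Hypothetical header names from two scans of the same site.
old_headers = ["Server", "X-Powered-By", "Set-Cookie"]
new_headers = ["Server", "Set-Cookie", "Strict-Transport-Security"]
changes = diff_names(old_headers, new_headers)
print(changes)
```

The same `diff_names` helper would work for cookie names or HTML comments, since each is just a collection of strings per scan.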
I should also make it possible to pass a custom CSV as a command-line argument.
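One possible shape for that option, sketched with the standard library `argparse`. The `--csv` flag and its default of `top-1m.csv` are assumptions for illustration only; lick.py does not currently accept this argument.

```python
import argparse


def build_parser():
    """Build a parser with a hypothetical --csv option."""
    parser = argparse.ArgumentParser(
        description="Sketch of a custom CSV option (not part of Weblick today)."
    )
    parser.add_argument(
        "--csv",
        default="top-1m.csv",  # assumed default file name
        help="path to a CSV of rank,domain rows to process",
    )
    return parser


# Parse an explicit argument list so the example runs without a real CLI call.
args = build_parser().parse_args(["--csv", "my-sites.csv"])
print(args.csv)
```

Keeping the current file as the default would mean existing invocations keep working unchanged.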