geocoding and spatial database queries using python with django and postgis

Python is an excellent language for developing GIS applications.  It's very easy to create a web service that queries a spatial database and returns a result.  Sometimes that sort of thing is called a location based service.  They are common in mobile applications, including iPhone apps.  For our purposes, latitude and longitude are just inputs to a function.

Geocoding is the act of turning street addresses into coordinates on a map.  A geocoding service takes an address or a more generic reference to streets and returns a latitude and longitude that can be plotted on a map.  Reverse geocoding takes a point and returns an address, but we won't be going into that here.

Google provides an excellent geocoding service that has a flexible address matching algorithm and returns accurate results.  It can be used in python by way of the geopy module.  The module can also query other geocoding services like yahoo and microsoft, but google does a good job and works well with the Google Maps API and base layer.

The syntax to look up a location by street address is simple.  This example also uses the geocoded address to look up the Portland neighborhood in which the point is located.  To do this we must have a geodjango model loaded in postgis or another spatial database (or geodatabase, as they are sometimes called).

from django.contrib.gis.geos import Point
from models import Neighborhood
from geopy import geocoders

api_key = 'AAJJDJDJSJ....'
address = '1221 SW 4th Ave, Portland, OR'  # any street address will do
g = geocoders.Google(api_key)
place, (lat, lon) = g.geocode(address)

# build a point (note the lon, lat ordering) and find the
# neighborhood polygon that contains it
pnt = Point(lon, lat)
n = Neighborhood.objects.get(poly__intersects=pnt)
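For completeness, here is a minimal sketch of the sort of geodjango model the query above assumes.  Everything except the poly field is hypothetical; the lookup only requires a polygon field named poly.

# hypothetical minimal Neighborhood model for the lookup above
from django.contrib.gis.db import models

class Neighborhood(models.Model):
    name = models.CharField(max_length=100)  # hypothetical attribute
    poly = models.PolygonField(srid=4326)    # neighborhood boundary in lat/lon
    objects = models.GeoManager()            # enables spatial lookups like poly__intersects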

installing bucardo for multi-master postgres database replication

In my quest for postgres replication/clustering nirvana I've played with pgpool, slony, rubyrep and now bucardo.  Our requirement is painless bidirectional synchronization between two postgresql databases that serve a django application.  Bucardo seems awesome, but there are some problems with the install scripts.  I hope to work them out in a future post.  In the meantime, here are some notes about installing bucardo on ubuntu 9.10.

Bucardo is written in perl and seems to be a little more complicated than rubyrep to get up and running.  I'll be testing it out on a virtualbox vm running ubuntu 9.10 karmic koala.  This is postgres 8.4, for the record.

The first prerequisite for bucardo is the DBIx::Safe perl module.  Unfortunately there isn't a package available in the default 9.10 repos, but it does look like it's out for lucid.  I'm just going to install DBIx::Safe from source.  It and bucardo require the DBD::Pg perl module, which is available from the ubuntu repo.  I'll install that with:

sudo aptitude install libdbd-pg-perl

The bucardo directions have instructions for installing DBIx::Safe from source, so I'll leave that up to them.  The directions also state that DBIx::Sage is a prereq, but I couldn't find that.  The bucardo build did not complain about lacking that module, so I'm hoping we're good to go.

After getting and building bucardo, the next step is to add the new functions to your database.  Bucardo uses stored procedures written in perl to handle the replication.  The instructions use yum to install this from a package on redhat/centos style servers.  For ubuntu, we'll do:

sudo aptitude install postgresql-plperl-8.4

The next step is to run the bucardo_ctl install script.  It prompts you for database settings, which is kind of a hassle; rubyrep was much more straightforward.  It also creates a bucardo superuser on the database.  The script prompts you for the password for this user, but there doesn't seem to be a step to set the password.  I opened up a psql session and ran the following:

alter user bucardo with password 'CHGME';

Definitely change that password to something a little more secure.  While you are at it, make sure you are being secure with your db connections.  You don't want port 5432 open to the world, and please use SSL (see the pg_hba.conf sketch at the end of this section).

The bucardo_ctl script gave me an error about not having the plperl language available.  I tried various things and got tired of running the script, so I installed the bucardo functions to the database with the following command:

psql DBNAME -h localhost -f /usr/local/share/bucardo/bucardo.schema

The next step is to add the database to be replicated to bucardo.  bucardo_ctl had a little trouble authenticating; I needed to specify the hostname.  You can do it like this:

bucardo_ctl --dbhost=localhost --dbpass=BUCARDO_PASSWORD add database DBNAME

For some reason bucardo had a lot of trouble installing on Ubuntu.  I kept getting an error about missing plperlu, even though the package was installed on the server and I could manually add the language to a database.  There seem to be a few errors in the bucardo_ctl perl script.  I had to make two changes to get it to work: comment out the exit command after the bucardo.schema file is installed, and deal with a secret hardcoded db password at the end of the install:

5465 if ($res !~ m{Pl/PerlU was successfully installed}) {
5466     warn "\nINSTALLATION FAILED! ($res)\n\n";
5467     warn "Installation cannot proceed unless the Pl/PerlU language is available\n";
5468     warn "This is usually available as a separate package\n";
5469     warn "For example, you might try: yum install postgresql-plperl\n";
5470     warn "If compiling from source, add the --with-perl option to your ./configure command\n\n";
5471     # exit 1;
5472 }
.......
5481 $port and $BDSN .= ";port=$port";
5482 my $default_bucardo_password = 'replicate';
5483 $dbh = DBI->connect($BDSN, 'bucardo', $default_bucardo_password, {AutoCommit=>0,RaiseError=>1,PrintError=>0});

Now, let us create a few databases, populate and then sync them with bucardo.  We'll be using pgbench to generate some test data.

createdb db_left
createdb db_right
pgbench -i db_left
pgbench -i db_right

We'll be starting with two identical databases, populated with some data.  One of the tables does not have a primary key defined, but bucardo can handle that.  I'm not really interested in that functionality, so I'm not going to worry about it here.  To add the databases to replication, execute the following commands:

bucardo_ctl add database db_left --dbhost=localhost --dbpass=PASS
bucardo_ctl add database db_right --dbhost=localhost --dbpass=PASS
bucardo_ctl add all tables
bucardo_ctl add all sequences
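As an aside, here is the sort of pg_hba.conf rule I had in mind when I said to lock down port 5432 and use SSL.  This is just a sketch; the file location and network address are assumptions that will vary with your setup.

# /etc/postgresql/8.4/main/pg_hba.conf (hypothetical excerpt)
# require SSL and md5 password auth for the bucardo user over TCP
hostssl    all    bucardo    192.168.1.0/24    md5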

using rubyrep to replicate a postgresql database on ubuntu 9.10 (karmic koala)

Update: I was mistaken about rubyrep's ability to support continuous replication.  See my new post for an update.

For my current project, we need to replicate a postgresql 8.4 database between several sites.  We are looking at bucardo and rubyrep.  Both projects support a multiple master architecture.  I'm going to give rubyrep a shot, and then try bucardo.  I'll be using ubuntu 9.10 karmic koala, but we'll be using debian in production.  We'll be replicating a fairly large django database between two sites.

Slony-I

Slony isn't the only technology out there for doing replication of a postgres database.  It looks like two relatively new projects are being developed that might be better for certain cases.  We couldn't use slony because it does not support bidirectional synchronization.  Slony only allows you to have a single master and multiple slaves.  For our purposes we have to have multiple masters.

Setup

rubyrep is obviously written in ruby, and you can choose from standard and jruby versions.  The java ruby variant is supposed to be a bit faster, but I just want to see how it works.  To test, I'll be installing it for two databases on a single server.  Assuming you have ruby installed, you can install rubyrep with the following command:
sudo gem install rubyrep
This installed version 1.1.1 of rubyrep.  The next step is to generate a stub config file using the rubyrep command.  The tutorial does not specify the path, and gem didn't add it to the default path.  For my setup the rubyrep script can be called like:
/var/lib/gems/1.8/bin/rubyrep generate bh_rubyrep.conf
This will generate a sample config file that you can edit to specify your database and server settings.  My databases are both on localhost for this test.  I uncommented the following to specify syncing all tables:
config.include_tables /./ # regexp matching all tables in the database
The /./ is just a regex that matches every table name.
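For reference, a filled-in config file looks roughly like the following.  The connection settings here are placeholders; substitute your own databases and credentials.

# hypothetical rubyrep config with placeholder connection settings
RR::Initializer::run do |config|
  config.left = {
    :adapter  => 'postgresql',
    :database => 'bh_left',      # placeholder database name
    :username => 'postgres',     # placeholder credentials
    :password => 'secret',
    :host     => 'localhost'
  }

  config.right = {
    :adapter  => 'postgresql',
    :database => 'bh_right',
    :username => 'postgres',
    :password => 'secret',
    :host     => 'localhost'
  }

  # regexp matching all tables in the database
  config.include_tables /./
end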

Execution

The first step is to do a scan of the databases:
/var/lib/gems/1.8/bin/rubyrep scan -c bh_rubyrep.conf
I haven't created the right hand database, so I got a fatal error stating it didn't exist.  I created the second database and ran the command again, but it gave no output.  It should list all the tables with a count of mismatched records.  Unfortunately my right hand database is empty.  I dumped the left hand database and restored it to the right.
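The dump and restore can be done in one shot with a pipe; the database names below are hypothetical stand-ins for mine:

pg_dump bh_left | psql bh_right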
I ran the scan again and got a nice list of tables with zero mismatches.  We are going to be replicating the database of a relatively large postgres application.  With both databases running under a virtualbox vm on a macbook pro, it took 1 min 24 seconds to scan an 18mb database.  All tables have primary keys defined, which helps our cause.
eknuth@eknuth-vubuntu:~/rubyrep$ time /var/lib/gems/1.8/bin/rubyrep scan -c bh_rubyrep.conf
purchasing_purchaseancillary 100% .........................   0
orders_orderline 100% .........................   0
orders_orderstatus 100% .........................   0
...
auth_user 100% .........................   0
navigation_pinnedtab 100% .........................   0
navigation_tab 100% .........................   0
real 1m23.323s
user 1m4.708s
sys 0m8.533s
To test the synchronization, I deleted all the records from one of the tables and ran the sync command:
time /var/lib/gems/1.8/bin/rubyrep sync -c bh_rubyrep.conf
rubyrep picked up the missing 8000 records and recreated them in the right hand database.  Very slick, and it did not take measurably longer than the scan.

Conclusions

I'm very impressed with the ease of setup.  It really was incredibly painless, in keeping with rubyrep's slogan.  There are more complicated sync and conflict resolution rules you can set up, but the basic config was done very quickly.
Unlike slony, which uses triggers to sync the data, rubyrep would be the sort of thing you run periodically out of a cron job.  I'm not sure if that will work for us.
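If we did go the cron route, a hypothetical crontab entry might look like this; the schedule, config path and log location are all assumptions:

# run a rubyrep sync every five minutes and append output to a log
*/5 * * * * /var/lib/gems/1.8/bin/rubyrep sync -c /home/eknuth/rubyrep/bh_rubyrep.conf >> /var/log/rubyrep.log 2>&1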
rubyrep seems like an excellent way to do postgres replication.  It would also allow you to replicate data between mysql and postgres, which is kinda cool.
I also think that rubyrep is probably one of the best postgres data comparison tools available.  It seems well suited for that type of use.