id	summary	reporter	owner	description	type	status	priority	milestone	component	resolution	keywords	cc
153	look into MySQL multimaster replication	geofft		"If we really aim to be split between W20 and W91 in a way that adds ''useful'' redundancy, we need to do this. As far as I can tell, we have two options:
 * Do [http://onlamp.com/pub/a/onlamp/2006/04/20/advanced-mysql-replication.html circular replication], i.e., A replicates to B, but B also replicates to A, even though you're not supposed to. This requires some hackery with auto_increment columns, and possibly other things, and given that we don't control users' code, I don't know how I feel about the reliability of this.
 * Set up [http://dev.mysql.com/doc/refman/5.1/en/mysql-cluster.html MySQL Cluster]. It uses its own database type (NDB) and [http://dev.mysql.com/doc/refman/5.1/en/mysql-cluster-overview-requirements.html by default requires all the memory ever] to store the entire database in memory, although apparently you can [http://dev.mysql.com/doc/refman/5.1/en/mysql-cluster-disk-data.html move non-indexed columns to disk] in 5.1. I am highly unsure whether that helps us enough, and taking up memory would be a violation of the principle that unused sites cost us only a couple megabytes on disk. But, if it works, [http://dev.mysql.com/doc/refman/5.1/en/mysql-cluster-replication-multi-master.html multimaster replication between clusters] is definitely supported. So we can stick two SQL servers in W20 and two in W91, set each pair up as a cluster, and replicate between them.
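
For reference, the auto_increment hackery in the circular option usually means giving each master the same `auto_increment_increment` and a distinct `auto_increment_offset`, so the IDs they generate never collide. Below is a rough Python sketch of that interleaving. It is only a model, not MySQL code: the helper names are made up for illustration, the masters A and B are hypothetical, and it assumes empty tables (the real server works out the next value from what is already in the table).

```python
# Hypothetical illustration, not MySQL code: models how the server variables
# auto_increment_increment and auto_increment_offset interleave generated IDs.
from itertools import islice


def auto_increment_ids(offset, increment):
    '''Yield the IDs a server with these settings would hand out, in order.'''
    value = offset
    while True:
        yield value
        value += increment


def first_ids(offset, increment, count):
    '''Return the first `count` IDs for the given settings as a list.'''
    return list(islice(auto_increment_ids(offset, increment), count))


# Two masters (call them A and B): same increment, distinct offsets.
ids_a = first_ids(offset=1, increment=2, count=5)
ids_b = first_ids(offset=2, increment=2, count=5)

# The two sequences never overlap, which is the whole point of the trick.
assert set(ids_a).isdisjoint(ids_b)
```

This only covers IDs the server generates itself. User code that inserts explicit key values, or that assumes consecutive IDs, is not protected by it, which is the reliability worry above.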

Arguably there's also the approach of running a single VM synchronized between the two locations on two hypervisors. [http://www.vmware.com/products/fault-tolerance/ VMware has a commercial implementation of this], and [http://blog.xen.org/index.php/2008/12/04/xen-fault-tolerance-kemari-released/ Xen has an experimental version] that's also being [http://www.mail-archive.com/kvm@vger.kernel.org/msg25022.html ported to KVM]. However, in addition to being ever so slightly sketched by the technology, I'm not completely happy with this solution because it only protects us from hardware failures, not software crashes. But I suspect we are more at risk from hardware trouble than software, and this one may actually be the least disruptive and least sketchy option. Note that we'd probably want to continue having a primary and a secondary (we'd just want two instances of each), which helps us a good bit with software failures, but doesn't also solve #151 the way that true, reliable MMR would."	defect	new	normal		web