So what's everyone using to push databases around these days?

Ray Wong rayw at rayw.net
Fri Feb 15 18:50:02 PST 2008



Hey, so with old reference points like DFS, the Google File System, and
multicast rsync floating around, what's everyone using to copy
decent-sized directories of data around these days?

I've got a site with a semi-typical data cycle (one or more updates per
day) filling a couple hundred gigs of space (MySQL in several database
dirs).  It goes out to a few dozen distributed targets; let's constrain
the parallelism requirement to between 5 and 50 hosts.
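
For context, the plain-unicast baseline is just N copies run in
parallel, something like the sketch below (hosts.txt, the paths, and
the cap of 10 are placeholders for the real setup):

    # fan out one rsync per target host, at most 10 in flight at a time
    # hosts.txt holds one target hostname per line (placeholder names)
    xargs -P 10 -I HOST \
        rsync -a --delete /data/mysql/ HOST:/data/mysql/ < hosts.txt

It works, but every byte leaves the originator once per target, which
is exactly the cost I'm hoping multicast would remove.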

What I'd really like is a multicast scp that just blasts the data to
all targets at once (some bandwidth limiting on the ssh side would be a
nice option too).  Something like mrsync is entirely too fragile for my
liking: the first target gets all the data, yet it's also the host used
to decide what to send everywhere else on a re-sync, so it can end up
being the only host with the files in place, ensuring no one else ever
gets them.
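
On the bandwidth-limiting side at least, the stock unicast tools
already have knobs; rough sketches, with host1 and the paths as
placeholders:

    # scp takes a rate cap directly: -l is in Kbit/s
    scp -l 8192 -r /data/mysql host1:/data/

    # rsync's equivalent is --bwlimit, in KBytes/s
    rsync -a --bwlimit=1024 /data/mysql/ host1:/data/mysql/

Neither helps with the fan-out problem, of course; they just keep a
single stream polite.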

rdist over ssh is okayish, but the lack of a multicast option concerns
me.  I could easily handle doing a unicast recovery pass for any host
that missed a multicast copy, but with no multicast at all it seems
like I'd lose a lot of performance at the data originator.
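
For concreteness, what I mean by rdist over ssh is roughly this
(hostnames and paths are placeholders):

    # Distfile -- push the MySQL dirs to each target
    HOSTS = ( db1 db2 db3 )
    FILES = ( /data/mysql )

    ${FILES} -> ${HOSTS}
            install ;

    # invoke with ssh as the transport instead of the old rsh path
    rdist -P /usr/bin/ssh -f Distfile

Still strictly unicast, one stream per host in ${HOSTS}.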

And rsync, well, it's just painfully slow. :)
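
To be fair, part of that slowness is the delta algorithm running
rolling checksums over hundred-gig database files; on a fast LAN that
scan is often the bottleneck rather than the wire.  A sketch of the
usual workaround (host1 and the paths are placeholders):

    # -W / --whole-file skips the delta-transfer checksums entirely,
    # which tends to win when the network is faster than the disk/CPU
    rsync -aW --delete /data/mysql/ host1:/data/mysql/

It's still one full unicast stream per target, though, so it doesn't
fix the fan-out.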

So, what's everyone else doing?  I haven't really felt like I was
breaking any new ground since Postini or UltraDNS, so I'll bet this is
old hat to quite a few of you.  Is there an obvious solution I'm missing?


