network thruput

Michael T. Halligan michael at halligan.org
Wed Jun 1 19:01:52 PDT 2005


>>One quick suggestion. quadruple your # of switches. I've built beowulf 
>>clusters where each server has
>>    
>>
>
>changing hw is not an option .. 
>	esp since reconfig the network or switches is not occuring
>	either
>
>  
>
>>4-8 nics, each connected to a different switch, and then bonded, 
>>creating a 400 or 800mb (in your case
>>4gb or 8gb) network.
>>    
>>
>
>yes.. the suggestion was to channel bond to go faster.. but
>it's not much of an improvement, since not all machines
>are bonded
>  
>
That's a shame, bonding is fun!
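
For what it's worth, the numbers I quoted are just link speed times NIC
count. A quick sketch of that arithmetic (the function name is made up for
illustration, and this is only the nominal ceiling; what a single copy
actually sees depends on the bonding mode and the switches):

```python
def bonded_capacity_mbit(nics: int, link_mbit: int) -> int:
    """Nominal aggregate capacity of a bond, ignoring protocol overhead."""
    return nics * link_mbit

for nics in (4, 8):
    fast_e = bonded_capacity_mbit(nics, 100)    # 100Mb links
    gig_e = bonded_capacity_mbit(nics, 1000)    # gigE links
    print(f"{nics} NICs: {fast_e} Mbit/s on 100Mb links, {gig_e} Mbit/s on gigE")
```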

>> If you have the spare hardware,
>>    
>>
>
>no spares ... they run 24x7 environment .. no test machines, no
>hotswap live backup servers either .. yikes
>
>  
>
Sigh. Companies never learn. Hardware is so cheap compared to downtime 
or man hours.




>>When you say they're getting a paltry 5-10MB/s at best, are you saying 
>>all of the servers are at the
>>same time, or at any given time?
>>    
>>
>
>that's the average thruput at any random given time between the
>supposedly high performance cluster nodes
>	- measured with say copying 100MB or 500MB files between
>	any random node at any random time
>  
>

Strange. 10MB/s seems really slow, even with slow hardware and bad 
disks. My thoughts might be skewed, though: for the past year
everything I've played with has been brand new, largely overbuilt,
and very fast.
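
If you want a number that has nothing to do with disks at all, a
memory-to-memory TCP test is one way to get it. Below is a rough sketch,
not a replacement for a proper benchmark tool. The function names are my
own, and as written it runs sender and sink in one process over loopback
so it can be tried anywhere; between two cluster nodes you would run the
sink part on the far end and point the sender at it.

```python
import socket
import threading
import time

def _sink(server: socket.socket) -> None:
    """Accept one connection and discard everything it sends."""
    conn, _ = server.accept()
    with conn:
        while conn.recv(1 << 16):
            pass

def measure_throughput(host: str = "127.0.0.1", total_mb: int = 100) -> float:
    """Push total_mb MiB of zeros through a TCP socket and return MiB/s."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind((host, 0))              # port 0: let the OS pick a free port
    server.listen(1)
    port = server.getsockname()[1]
    sink = threading.Thread(target=_sink, args=(server,), daemon=True)
    sink.start()

    chunk = bytes(1 << 20)              # 1 MiB of zeros, held in memory
    start = time.perf_counter()
    with socket.create_connection((host, port)) as client:
        for _ in range(total_mb):
            client.sendall(chunk)
    sink.join()                         # wait until the sink has drained it all
    elapsed = time.perf_counter() - start
    server.close()
    return total_mb / elapsed

if __name__ == "__main__":
    print(f"{measure_throughput():.1f} MiB/s")
```

For scale, gigE tops out at a nominal 125MB/s, so 5-10MB/s is under a
tenth of the wire.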

>	- to get rid of disk latency issues, we used node1:/dev/loop
>	copying into nodexx:/dev/loop and its the same ... which
>	means the ultra-360 disks is fast enough to keep up on the
>	gigE lan
>
>  
>
>>Beyond that, the stacked switch setup could be bad if that means switch 
>>10 has to traverse all of the
>>other switches in order to get to switch 9.
>>    
>>
>
>that's the stack i am trying to break up ... to get rid of all the netbios
>packets from the cluster ( there is nothing a windoze box needs to do
>on the cluster )
>  
>

But does NetBIOS really create that much chatter?
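
The reason I ask: a share of packets is not the same thing as a share of
bandwidth, since broadcast chatter tends to be small packets. Here is a
sketch of counting it both ways. The sample data is entirely made up, the
function name is mine, and it only looks at the destination port (137, 138
and 139 being the usual NetBIOS name, datagram and session ports), so
treat it as an illustration rather than a measurement.

```python
NETBIOS_PORTS = {137, 138, 139}  # name service, datagram, session

def netbios_share(packets):
    """packets: iterable of (dst_port, size_bytes).

    Returns (share of packets, share of bytes) that went to NetBIOS ports.
    """
    total_pkts = total_bytes = nb_pkts = nb_bytes = 0
    for port, size in packets:
        total_pkts += 1
        total_bytes += size
        if port in NETBIOS_PORTS:
            nb_pkts += 1
            nb_bytes += size
    pkt_share = nb_pkts / total_pkts if total_pkts else 0.0
    byte_share = nb_bytes / total_bytes if total_bytes else 0.0
    return pkt_share, byte_share

# Hypothetical capture: 90 small name-service packets, 10 full-size data packets
sample = [(137, 92)] * 90 + [(2049, 1500)] * 10
pkt_share, byte_share = netbios_share(sample)
print(f"{pkt_share:.0%} of packets, {byte_share:.0%} of bytes")
```

With numbers like those, 90% of the packets works out to only about a
third of the bytes.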

>	- netbios packets are about 90% of all packets on the wire
>
>  
>
>>Another thing I'd do is collect some good stats to show to the PHB's .. 
>>Setup NTOP for a week and
>>show them that it's windows chatter eating up all the bandwidth. If 
>>    
>>
>
>already showed the traffic pattern ... but to no avail ... :-)
>
>hard to convince PhD with managerial authority that they're not
>quite up to puff with network design and topology issues
>	- push too hard, and one is on the streets ya know
>
>  
>
>>they're manageable switches,
>>setup cacti to graph them via snmp.
>>    
>>
>
>cacti seems too complicated for me ... :-)
>
>i like something simple like ... to show what is clogging the network
>
>	90% netbios packets
>	5%  tcpip ( data )  not dns, arp, http, smtp, etc..
>	5%  misc
>
>  
>

Cacti is terribly simple. You install the software, point it at your 
switches, enter the SNMP information, set up devices, set up graphs for
those devices' interfaces, and then add them to your device tree. Add a
cron job to poll every 5 minutes, and you've got graphs!
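
Under the hood, a graph like that is just the difference between two
readings of an interface octet counter divided by the polling interval.
A minimal sketch of that step, assuming you already have the two counter
values from SNMP (the function name is mine, and this is not Cacti's
actual code):

```python
def octets_per_second(prev: int, curr: int, interval_s: float,
                      counter_bits: int = 32) -> float:
    """Average byte rate between two counter samples.

    Assumes the counter wrapped at most once between the samples.
    """
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    delta = curr - prev
    if delta < 0:                       # counter rolled over past its maximum
        delta += 1 << counter_bits
    return delta / interval_s

# Two samples taken 300 seconds apart, with a 32-bit wrap in between
print(octets_per_second(4294967000, 1000, 300))
```

One caveat: at gigE line rate a 32-bit octet counter can wrap in roughly
34 seconds, so with 5 minute polling you want the 64-bit counters
(ifHCInOctets and friends) if the switches support them.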


>>Might also be worth digging in to 
>>see if you're having any
>>type of arp or broadcast storms, perhaps a screwed up vlan.
>>    
>>
>
>i was hoping to see dns/arp issues but thats not the case here ..
>

>>$150k? Ouch.
>>    
>>
>
>they're very proud to own that $150K tape library...
>that i will not touch ... not even for $500/hr... no way ...
>
>tapes are a disaster waiting to happen in my book and i rather
>not be restoring from tape or making tape backups, and besides,
>they have another to take care of that for them
>
>  
>
See, I prefer tape if I'm doing a full restore and the full restore
is from only one tape. Streaming like that tends to be a lot faster,
for HUGE amounts of sequential data, than hard drives, which lose time
to seeks. Unfortunately, it's more like "go get this 20k file that's
90% into your 200GB tape" .. Ugh.
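
To put a rough number on that "Ugh": a back-of-the-envelope sketch,
assuming the drive has to pass over the tape at a constant rate to reach
the file. The 40MB/s figure is an assumption for illustration (drives can
usually locate faster than they read), and the function name is mine.

```python
def seconds_to_reach(offset_fraction: float, tape_gb: float,
                     rate_mb_s: float) -> float:
    """Time to get to a file offset_fraction of the way into a tape."""
    return offset_fraction * tape_gb * 1000 / rate_mb_s

# 90% into a 200GB tape at an assumed 40MB/s
wait = seconds_to_reach(0.9, 200, 40)
print(f"about {wait / 60:.0f} minutes before the 20k file even starts")
```

Under those assumptions that is over an hour of waiting for a 20k file.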


>>For $20k nowadays you can get a 40 tape lto2 library that 
>>has 200GB (uncompressed)
>>    
>>
>
>we're looking at 3TB of data ..  still pretty small systems actually
>
>  
>
3TB of data in one backup?  That's annoying. I hate having to back up any
individual system that's more than
the size of my largest tape.


>>I'm starting to give up on Tape to be honest.
>>    
>>
>
>:-) congrats .. :-)
>
>i think after one or few full restores from tapes that someone
>else did, i think one will no longer be "tape happy" and prefer
>a more reliable way to restore from full backups ( bare metal restore )
>where you have to restore in 5 seconds because the whole company
>is shutdown until it is back up and online ...
>	- i will always prefer to have live warm-swap backups systems
>	even if i have to bring in my own 2GB - 5GB of disks
>	for those that are willing to pay my fees w/o discounts
>
>  
>
>>The value of tape and disk always goes back and forth,
>>    
>>
>
>yes... depending on the situation
>
>
>fun stuff...
>
>c ya
>
What I find more annoying about tapes is that tape libraries suck.
They're finicky: cables go bad, their moving parts break, tapes get
stuck. The only thing worse is the terrible complexity of backup
software. Every time I have to set up or work with NetBackup, I remember
just how much I hate Veritas' very existence.




-------------------
BitPusher, LLC
http://www.bitpusher.com/
1.888.9PUSHER
(415) 724.7998 - Mobile




