Hi-cap storage question

Michael T. Halligan michael at halligan.org
Fri Jun 18 18:30:20 PDT 2004


Mark,

One thought on tape drives: I've seen a few 768-tape StorageTek
libraries with something like seven Quantum drives (possibly
upgradeable to eleven) go on eBay for about $45k. Add another $25k
worth of SuperDLT drives to bring it to seven SDLT 200/400s, plus
about $10k worth of tape, and a full load works out to roughly 150TB
native (around 300TB at 2:1 compression), streaming at about 360GB/hour
uncompressed, with archival capacity limited only by how many tapes
you cycle through the slots. All for the low, low price of about $80k.
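
A quick sanity check on those numbers, in Python. The per-drive
transfer rate is my guess at mid-2004 SDLT speeds; the rest comes
straight from the figures above:

# Back-of-the-envelope math for the StorageTek library idea above.
SLOTS = 768
DRIVES = 7
NATIVE_GB_PER_TAPE = 200      # 400GB assuming 2:1 compression
DRIVE_MB_PER_SEC = 15         # assumed native streaming rate per drive

capacity_tb = SLOTS * NATIVE_GB_PER_TAPE / 1000
gb_per_hour = DRIVES * DRIVE_MB_PER_SEC * 3600 / 1000

print(f"Full tape load: ~{capacity_tb:.0f}TB native "
      f"(~{capacity_tb * 2:.0f}TB at 2:1 compression)")
print(f"All drives streaming: ~{gb_per_hour:.0f}GB/hour")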


In terms of live-access needs, you're only keeping about 200GB online
right now.

Some questions that come to mind:

1. How much data are you generating per hour? (Quick math on that below.)
2. How much turn-around time do you need to access archived data?
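
To put question 1 in perspective, here's the retention window the
current live tier gives you, using the ~1TB-every-20-days figure from
Mark's post:

def retention_days(live_gb: float, gen_gb_per_hour: float) -> float:
    """Days of data the live tier holds before FIFO rotation kicks in."""
    return live_gb / (gen_gb_per_hour * 24)

rate = 1000 / (20 * 24)   # ~1TB every 20 days, in GB/hour
print(f"Generation rate: ~{rate:.1f}GB/hour")
print(f"200GB live tier lasts ~{retention_days(200, rate):.0f} days")

That's only about four days of live data; doubling the generation rate
halves it, which is why these two questions drive the whole design.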


Hardware costs drop every six months or so, and there's usually a
meaningful jump in drive speed and storage capacity about once a year.

3. Can you space out your purchasing/buildout to grow as you do? For
example, buy 42 73GB 15k drives now (three 14-disk JBODs); in a year,
purchase 42 146GB 15k drives; and theoretically in 18 months to 2
years, 42 ~300GB drives, and so on. (Rough math below.)
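
Roughly what that staged buildout buys in raw capacity, assuming the
doubling holds (the timing and the post-146GB drive size are
speculation on my part):

SPINDLES = 42  # three 14-disk JBODs; spares/parity ignored here

for months, gb_per_drive in [(0, 73), (12, 146), (21, 300)]:
    total_tb = SPINDLES * gb_per_drive / 1000
    print(f"month {months:2d}: {SPINDLES} x {gb_per_drive}GB 15k drives "
          f"= ~{total_tb:.1f}TB raw")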

If you actually have to use these flat files to do anything useful, the
more spindles the better, especially if you're not planning on going
with a database.

There's also the option of building an archive.org-like cluster:
throw hundreds of cheap servers with low-cost (but decent-quality)
RAID controllers and commodity PC drives at the problem, then layer
something like Lustre, GFS, or AFS on top to access it all. Or write
an intelligent "load-balancer" that decides where data goes and can
migrate it between stable and unstable, or fast and slow, nodes (or
clusters of nodes) as needs develop.
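
A minimal sketch of that placement idea, assuming simple hash-based
file-to-node mapping (the node names and tier labels are made up for
illustration):

import hashlib

# Hypothetical tier membership: fast 15k SCSI nodes for live data,
# cheap IDE nodes for archival.
TIERS = {
    "fast": ["node01", "node02", "node03"],
    "slow": ["node10", "node11", "node12", "node13"],
}

def place(path: str, tier: str) -> str:
    """Deterministically map a file path to a node within a tier."""
    nodes = TIERS[tier]
    h = int(hashlib.md5(path.encode()).hexdigest(), 16)
    return nodes[h % len(nodes)]

def migrate(path: str):
    """Where a file lives now (fast tier) and where it ages out to."""
    return place(path, "fast"), place(path, "slow")

print(place("/data/2004/06/collector.dat", "fast"))
print(migrate("/data/2004/06/collector.dat"))

Swapping the modulo for consistent hashing would keep migrations small
as nodes are added or retired, which matters once you're at hundreds
of boxes.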



> Okay, here's one for you high-capacity folks:
> 
> We're looking at our storage requirements right now.
> 
> We have two needs:  short-term storage and access on live media, and
> long-term storage on archival media.
> 
> Simple, right?
> 
> Well, maybe not.
> 
> You see, the amount of data is a bit larger than what most people are
> used to dealing with.  It's roughly an exabyte.
> 
> We won't be generating it all at once, but that's our
> short-to-medium-term need.  Right now, today, we're storing/archiving
> a terabyte every 20 days, and getting by with hardware RAID (only
> keeping about 100-200GB live, and rotating the available partitions on a
> FIFO basis) and LTO1 jukeboxes.
> 
> We expect that need to double within a year, and within two to three
> years, jump exponentially.
> 
> Right now, the vast bulk of the data is stored in flat files, which
> does not present a problem for us at the moment, either in terms of
> access (it's write-only during collection) or inode count (the number
> of files is small and finite).  Moving to a database isn't out of the
> question, but if we did, "free" is a very good word.
> 
> Thoughts on the physical storage issue?
> 
> 
> 
> 
> -- 
> Mark C. Langston                                    Sr. Unix SysAdmin
> mark at bitshift.org                                       mark at seti.org
> Systems & Network Admin                                SETI Institute
> http://bitshift.org                               http://www.seti.org
> 

-------------------
Michael T. Halligan
Chief Geek
Halligan Infrastructure Designs.
http://www.halligan.org/
3158 Mission St. #3
San Francisco, CA 94110
(415) 724.7998 - Mobile



