No, this isn't a Small Is Beautiful article. It's about "Small is Practical".
Let me begin with an anecdote.
Back in the early 1980s I worked for a UNIX shop as a kernel programmer. I wrote many device drivers for many platforms. It was a true "How I fought with hardware and software but kept my sanity" stage of life and was very interesting. One of the 'toys' was an early VAX-780 with an early version of BSD 4.x. No, really, we had TCP in 4.1c before 4.2. But the hardware of the VAX and the PDP-11, like other hardware I'd worked on, was limited by today's standards. We had a whopping 4 Megabytes of memory in the PDP-11/44 that the company ran on. It supported 40 users doing development, building compilers and cross compiling for other platforms. We shifted across to the VAX as it proved its stability and its performance improved as Bill Joy played software leap-frog with Dave Cutler - but that's another story.
One of the reasons those machines could handle such a load was the hardware. The PDP-11 had all dual port memory. That is, the disk and the CPU could access memory at the same time. Well, not the same byte, but pretty close. It had to do with the granularity of the memory. Back then, semiconductor memory, the fast stuff, came in small chips, perhaps 1K bits or, if you could pay even more, 4K bits. The memory board was arranged so that the chips gave 512 byte wide slices. (This corresponded to a sector on the disk. We'll get to that in a moment.) The board was built with two busses, one on the 'CPU' side and one on the 'peripheral' side.
Imagine for a moment a board that holds just 512 bytes. The 'switch' on it allows either the CPU or the peripheral side to access the memory. Now imagine all of memory like this - switchable between the CPU and the peripheral side in 512 byte slices.
Now we come to the disk. The disk controllers on the DEC computers were pretty smart. One controller I used extensively could control eight drives ("spindles"). However, it had only one data transfer path to memory. But once it started a transfer it could keep going for any number of sectors.
With this shared memory arrangement, the CPU could be executing code in one part of memory while the disk controller was doing a transfer in another part. The magic number of 512 bytes meant that any sector could be laid down anywhere in memory. Since the PDP-11 didn't have virtual memory, programs and data had to be 'swapped' in and out in order to manage multi-programming. Having the code and the data contiguous in memory meant that a swap could be accomplished with a single disk transfer while the CPU was executing code elsewhere. Making sure that the two operations didn't clash was the job of the operating system.
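The clash check can be pictured with a minimal Python sketch. This is an illustration only, not how any DEC operating system actually did it: the names `SLICE`, `slices_touched` and `transfers_clash` are invented here, and the only thing taken from the description above is the 512 byte slice size.

```python
SLICE = 512  # bytes per switchable memory slice, as described above

def slices_touched(start, length):
    """Return the range of slice numbers covered by [start, start + length)."""
    if length <= 0:
        return range(0)
    first = start // SLICE
    last = (start + length - 1) // SLICE
    return range(first, last + 1)

def transfers_clash(cpu_start, cpu_len, dma_start, dma_len):
    """True if the CPU's region and the disk transfer share any slice."""
    cpu = slices_touched(cpu_start, cpu_len)
    dma = slices_touched(dma_start, dma_len)
    if len(cpu) == 0 or len(dma) == 0:
        return False
    # Two contiguous ranges overlap when each starts before the other ends.
    return cpu.start < dma.stop and dma.start < cpu.stop
```

With this model, code running in the first 4096 bytes does not clash with a swap into the region starting at byte 4096, but it would if the running code spilled even one byte into the next slice.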
A moment's thought leads to the conclusion that no matter how many address/data multiplex channels you stack on a board, this only works if your memory chips are no larger than 4K bits. That corresponds to 512 8-bit bytes. Once you get to larger chips you can't do this any more.
Perhaps you can see the fallacy. OK, it's not really a fallacy. It comes down to a side issue.
You see, if we are going for the contiguous transfer swap-in/out, then it doesn't matter if we are multiplexing at a larger granularity: 1K bytes, 4K bytes, 8K bytes. Where it does matter is the file system.
The UNIX file system was traditionally based on 512 byte allocation blocks. For various reasons this was quite efficient in disk space utilization, since UNIX of that time used many smaller files - things like very short ("one line") shell scripts - and many programs were very small. But very often the file system would get fragmented and the mid-sized files would end up with their 512 byte blocks scattered over the disk, resulting in a very poor access time.
Some people experimented with larger block sizes for the file system, and with experimentation it was found that 4K blocking was very effective. Now 4K - 4096 - just happens to be 8 times 512.
Back to the VAX. Back to those smart controllers. By this time the 4K file system blocks had been settled on as the standard for VAX UNIX. We now had more hardware to play with. We had 8 spindles and ... can you see where this is going?
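The trade-off between the two block sizes is easy to put numbers on. The Python sketch below is only illustrative: the function names and the three file sizes are made up for the example, and it counts nothing but data blocks (no inodes or indirect blocks).

```python
def blocks_needed(file_size, block_size):
    """Number of allocation blocks a file of file_size bytes occupies."""
    return -(-file_size // block_size)  # ceiling division

def slack(file_size, block_size):
    """Bytes allocated but unused in the file's last block."""
    return blocks_needed(file_size, block_size) * block_size - file_size

# Hypothetical sizes: a one-line script, a small program, a mid-sized file.
for size in (40, 3000, 20000):
    for bs in (512, 4096):
        print(f"{size:6d} bytes, {bs:4d} byte blocks: "
              f"{blocks_needed(size, bs):3d} blocks, {slack(size, bs):4d} wasted")
```

The one-line script wastes far more space in a 4K block, which is why 512 bytes suited a disk full of tiny files. The mid-sized file, on the other hand, drops from 40 separately placeable pieces to 5, so there is much less for fragmentation to scatter.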
Yes, "striping". Later called RAID.
So by the mid 1980s I was playing with taking those 4K file system buffers and spreading them across 8 physical drives in various ways. Was it faster to do a transfer of 8 sectors to or from one spindle or break up the buffer across all 8 spindles? What about staggering the access?
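The layouts in question can be written down as simple mappings from a 4K block number to (drive, sector) pairs. This Python sketch is a reconstruction for illustration, not the code I ran back then: the function names are invented, and `staggered` is just one possible reading of "staggering the access".

```python
SECTOR = 512                          # bytes per disk sector
BLOCK = 4096                          # bytes per file system block
SPINDLES = 8                          # drives on the one controller
SECTORS_PER_BLOCK = BLOCK // SECTOR   # 8

def one_spindle(block_no):
    """Whole block on one drive as 8 consecutive sectors; blocks take drives in turn."""
    drive = block_no % SPINDLES
    first = (block_no // SPINDLES) * SECTORS_PER_BLOCK
    return [(drive, first + i) for i in range(SECTORS_PER_BLOCK)]

def striped(block_no):
    """One sector of the block on each drive, all at the same sector number."""
    return [(drive, block_no) for drive in range(SPINDLES)]

def staggered(block_no):
    """As striped, but the drive that holds the first sector rotates per block."""
    return [((block_no + i) % SPINDLES, block_no) for i in range(SPINDLES)]
```

Each function returns the eight sectors of a block in order. The first layout needs one seek and one long transfer; the other two need eight short transfers that can overlap their seeks but still have to share the controller's single data path to memory, which is exactly what made the question worth measuring.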
No, the real problem was convincing management to give me 8 drives and drive packs. The PDP-11 had two RM80s - less than 250 Megabytes total. I was asking for eight! And the new 600 Megabyte Winchester drives. Why, my bosses asked, do you want nearly 20 times the total storage needs of the whole company?
This ought to have brought to mind Thomas Watson - on the one hand:
and on the other
"Every time we've moved ahead in IBM, it was because someone was willing to take a chance, put his head on the block, and try something new."
And time moves. In one sense our machines have grown slower even though the CPUs are faster. We've given up on a lot of techniques and are slowly rediscovering them in a new context.
What we really want are faster drives to keep up with our faster CPUs. What we are getting is bigger drives.
We ought to get faster drives for a very simple reason. If the density of the recording doubles, that means twice as much data is passing under the drive head. That's without adding platters or faster rotation. Yes, we get faster drives, but what's delivered into memory hasn't matched the actual speedup. What we've really got, what we've been sold on, is more capacity.
But right now I can drive over to my local parts outlet and pick up a 250 Gigabyte 7200rpm SATA drive for under C$100. It will probably be cheaper in the USA. My old bosses would freak out! No wonder ISPs can offer new customers 200GB of storage. They probably buy in bulk, and how many of their customers are actually going to use that much space? A few, yes, but it will be a Poisson distribution. All my blogs, sources, binaries of development work, database, and email only add up to about 100 Megabytes.
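The "simple reason" is just arithmetic: at a fixed rotation speed one track's worth of data passes under the head per revolution. A back-of-envelope Python sketch, where the function name and the track sizes are hypothetical and not taken from any real drive:

```python
def sustained_rate(bytes_per_track, rpm):
    """Bytes per second passing under one head: one track per revolution."""
    return bytes_per_track * rpm / 60

# Hypothetical track sizes; only the ratio between them matters here.
base = sustained_rate(bytes_per_track=500_000, rpm=7200)
dense = sustained_rate(bytes_per_track=1_000_000, rpm=7200)

print(f"speedup from doubling the data per track: {dense / base:.1f}x")
```

The catch is that this only holds when the extra density goes along the track. Density gained by squeezing more tracks onto the platter adds capacity without putting any more data under the head per revolution.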
My main worry with a 250 Gigabyte drive would be backups.
"Sufficient unto the day...." as the Gospel of Matthew has it.