Thursday, May 21, 2009

Super fast "du"

Have you ever tried to get a sense of where your disk space went? I have.

First I'd run something like "df -h" and it would tell me:
Used: 490G, Available: 20M.

Then I'd run "du -ch --max-depth=1 > disk_use" to get a sense of where the bad guys are. Then I'd do "cat disk_use | grep G" and proceed recursively into the biggest directories.

This thing works, but it's really painful, because at the top level "du -ch" has to traverse the whole directory structure and compute the *exact total*. That takes too much time. I don't care about exact numbers. All I want is an estimate of where the biggest chunks of data are. *This estimate is really easy to get.*

We shall randomly select individual data pages (I mean inodes), find their respective top level directories, and add +1 to the count of each top level directory we hit. If we do that for ~10K pages, we'll get a very good estimate of which top level directories contain the most data. If you want to get numbers in KB, simply normalize the counts (make them sum to 1) and multiply by the total used space.
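
Here is a minimal sketch of just the counting and normalizing step, in Python. It assumes the hard part is already done: something has picked pages at random and mapped each one to its top level directory. How to do that sampling depends on the filesystem and is not shown. The function name estimate_usage and the sample data are made up for illustration.

```python
from collections import Counter


def estimate_usage(sampled_dirs, total_used_kb):
    """Turn per-directory sample hits into estimated sizes in KB.

    sampled_dirs: one top level directory name per randomly sampled page
                  (hypothetical input; the sampling itself is not shown).
    total_used_kb: total used space, e.g. as reported by "df".
    """
    counts = Counter(sampled_dirs)      # the "+1 per hit" step
    n = sum(counts.values())
    if n == 0:
        return {}
    # Normalize counts so they sum to 1, then scale by the total used space.
    return {d: total_used_kb * c / n for d, c in counts.items()}


# Made-up sample: 10 pages, 6 landed under "home", 3 under "var", 1 under "usr".
samples = ["home"] * 6 + ["var"] * 3 + ["usr"] * 1
estimates = estimate_usage(samples, total_used_kb=1000)

# Print the biggest directories first.
for d, kb in sorted(estimates.items(), key=lambda kv: -kv[1]):
    print(f"{d}\t{kb:.0f} KB")
```

As a rough sanity check on the ~10K figure: if the samples are independent and uniform over used pages, a directory holding a fraction p of the data is estimated with standard error sqrt(p(1-p)/n), which for n = 10,000 is at most about half a percentage point of the total.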

If you have some free time, please build it.

Monday, May 11, 2009

This insane world

Ok. Let's all buy organic and save the world. Doesn't work.

I got some tea from http://www.ineeka.com/.

It's all nice pictures, "cultivating consciousness", ..., nice stuff. The problem is that the tea comes in a metal box. Think about it.

One metal box for 20 tea bags?! How is that sustainable?

(the tea is good, though)