The crazy thoughts: Super fast "du"

Have you ever tried to figure out where your disk space went? I have.

First I'd run something like "df -h", and it would tell me:
Used: 490G, Available: 20M.

Then I'd run "du -ch --max-depth=1 > disk_use" to get a sense of where the bad guys are. Then I'd do "cat disk_use | grep G" and proceed recursively.

This works, but it's really painful, because at the top level "du -ch" has to traverse the whole directory structure and compute the *exact total*. That takes too much time. I don't care about exact numbers. All I want is an estimate of where the biggest chunks of data are. *This estimate is really easy to get.*

We shall randomly select individual data pages (I mean inodes), find their respective top-level directories, and add +1 to each one. If we do that for ~10K pages, we'll get a very good estimate of which top-level directories contain the most data. If you want numbers in KB, simply normalize the counts (make them sum to 1) and multiply by the total used space.
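Here is a rough sketch of the counting and normalizing step in Python. It does not touch a real disk: mapping a random page back to the file that owns it needs filesystem-specific tooling that I'm not showing here. Instead, the sampling is simulated from a made-up table of true directory sizes, so the directory names, the sizes, and the function names sample_hits and estimate_usage are all assumptions for illustration.

```python
import random
from collections import Counter


def sample_hits(true_sizes_kb, n_samples, rng):
    """Simulate picking n_samples pages uniformly at random.

    true_sizes_kb is a hypothetical stand-in for the disk: it maps each
    top-level directory to its real size. Picking a directory with
    probability proportional to its size is equivalent to picking a
    uniformly random page and looking up which directory owns it.
    """
    dirs = list(true_sizes_kb)
    weights = [true_sizes_kb[d] for d in dirs]
    picks = rng.choices(dirs, weights=weights, k=n_samples)
    return Counter(picks)  # +1 per sampled page for its directory


def estimate_usage(hits, total_used_kb):
    """Normalize hit counts to fractions and scale by total used space."""
    n = sum(hits.values())
    return {d: total_used_kb * count / n for d, count in hits.items()}


if __name__ == "__main__":
    # Made-up numbers, not measurements from any real machine.
    true_sizes_kb = {"home": 300_000, "var": 150_000, "usr": 40_000, "tmp": 10_000}
    total_used_kb = sum(true_sizes_kb.values())  # what "df" would report

    hits = sample_hits(true_sizes_kb, 10_000, random.Random())
    estimate = estimate_usage(hits, total_used_kb)
    for d in sorted(estimate, key=estimate.get, reverse=True):
        print(f"{d:6s} estimated {estimate[d]:10.0f} KB, true {true_sizes_kb[d]:10d} KB")
```

With 10K samples, the standard error of each directory's fraction is at most sqrt(0.25 / 10000) = 0.005, i.e. half a percent of the used space, which is plenty for finding the bad guys.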

If you have some free time, please do it.

Thursday, May 21, 2009