Archive for April 2013

Handling many files in one directory   Leave a comment

The Assignment

You have a directory with gazillion files. Since most filesystems are not very efficient with many files in one directory, it is advisable to spread them among a hierarchy of directories. Write a program (or script) which handles a directory with many files and spreads them in an efficient hierarchy.

Does that sounds like a University assignment or something? Yes, it does.

Well apparently such a situation just happened to me in real life. Searching across the internet I couldn’t find anything too useful. And I will stand corrected if there is something which already deals with that problem. Post ahead if so.

And yes, thank god I’m using Unix (Linux), don’t even want to think what one would do on Windows.

The Situation

An application was spooling many files to the same directory, generating up to a million files in the same directory. I’m sorry I cannot disclose any more information about it, but lets just say it is a well known open source application.
Access to these files was obviously fast having ext4 and dir_index, but the directory index is too big to actually list files or do anything else without clogging everything in the system. And we need these files.

So we’ve decided to model the files in a way that’ll be more efficient for browsing and we can then handle it from there.

The Solution

After implementing something pretty quick and dirty for the situation, to mitigate the pain, I’ve sat down and wrote something a bit more generic. I’m happy to introduce the spread_files.sh utility.
What does it take care of:

  • Reading the directory index just once
  • Hierarchy depth as parameter
  • Stacking up to X files per mv command
  • Has recursion in Bash!!
  • Obviously the best solution would be to never get to that situation, however if you do, feel free to use spread_files.sh.