Ninja Merge   Leave a comment

Recently I was presented with the following situation at work:

  • Your input is a handful of directories, filled with files, some of them are a “sort of a copy” of the other
  • Your output should be one directory with all the files from the source directories merged into it
  • The caveat is – if any of the files collide, you must mark them somehow for inspection

So that sounds pretty simple, isn’t it? In my case the input was millions of files. I’m not sure about the exact number, it doesn’t matter. The best solution for this problem is to never get to this situation, however sometimes you just inherit stuff like that at a new work place.

The Solution

We needed a ninja. I called it It is a Bash wrapper for rsync that will merge directories one by one into a destination directory and handle the collisions for you using a checksum function (md5 was “good enough” for that task).

Get here:

It even has unit tests and the works. All that you have to do is specify:

  • A list of source directories
  • A destination directory
  • A directory to store the collisions

If a path collided, you might end up with something like that in your collision directory:

$ cd collision_directory && find . -type f

Sorting the collisions is something you’ll have to do manually. Sorry!

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

%d bloggers like this: