Using Perl to compare two directories
If you ever need to compare two directories to find files that are in one but missing from the other, Perl can be a good tool for the job. Read the contents of each directory into its own array, then build a hash that uses the entries of the second directory as keys (this should be the smaller directory). Next, for each file in the first directory, use the hash to check whether that file also exists in the second directory. An unless statement lets you push any file that isn't in the hash onto an 'unmatched' array.
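Here is a minimal sketch of just the lookup step, with no directories involved. It assumes two hard-coded lists stand in for the two readdir results; the file names are hypothetical. The hash is built from the second list, and grep keeps only the names from the first list that are not keys in it.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical listings standing in for the contents of two directories.
my @first  = ('a.jpg', 'b.jpg', 'c.jpg', 'd.jpg');
my @second = ('b.jpg', 'd.jpg');

# Build the lookup table: every name in the second list becomes a hash key.
my %seen;
$seen{$_} = 1 for @second;

# Keep only the names from the first list that are not keys in %seen.
my @unmatched = grep { !$seen{$_} } @first;

print "$_\n" for @unmatched;    # prints a.jpg, then c.jpg
```

Each lookup in %seen is a single hash access, so the work grows with the combined size of the two lists rather than with their product, which is what you would get by comparing every name against every other name.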
Below is my commented script. It prints the number of entries in each directory and then lists the files missing from the second (smaller) directory:
#!/usr/bin/perl -w
use strict;

my @unmatched;

# Load the paths of the directories you want to compare.
# List the larger directory first.
opendir(DOCS1, "/Volumes/webserver/Inetpub/wwwroot/sitedirectory/bc")
    or die "Cannot open the first directory: $!";
my @firstDir = readdir DOCS1;
closedir DOCS1;

opendir(DOCS2, "/Volumes/Users/Admin/Sites/profileImages")
    or die "Cannot open the second directory: $!";
my @secondDir = readdir DOCS2;
closedir DOCS2;

# Assign the directory lengths to variables.
# readdir also returns the "." and ".." entries, so each count includes them.
my $length1 = @firstDir;
my $length2 = @secondDir;

# Call the sub.
compareDirs();

# Print the results.
print "Docs1 directory has $length1 entries\n";
print "Docs2 directory has $length2 entries\n";
# An array in boolean context is true only when it has elements.
if (@unmatched) {
    print "Items in the Docs1 directory not in the Docs2 directory:\n";
    foreach my $unmatcheditem (@unmatched) {
        print $unmatcheditem . "\n";
    }
}

# Heart and soul of the script:
# put every item from the second directory into the %seen hash as a key.
# Each item from the first directory is then looked up in %seen, and
# anything that isn't found is pushed onto @unmatched.
sub compareDirs {
    my %seen = ();
    # Build the lookup table.
    foreach my $item (@secondDir) { $seen{$item} = 1 }
    # Find only the elements of @firstDir that are not in @secondDir.
    foreach my $item (@firstDir) {
        unless ($seen{$item}) {
            # It's not in %seen, so add it to @unmatched.
            push(@unmatched, $item);
        }
    }
}
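If you want to reuse the comparison, one option is to wrap it in subroutines that take the directory paths as arguments instead of relying on global arrays. The sketch below is a restructured version under a few assumptions: the sub names list_dir, missing_from and touch are hypothetical, "." and ".." are filtered out, and the demo builds two throwaway directories with File::Temp so it runs anywhere.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);

# Return the entries of a directory, skipping the "." and ".." entries
# that readdir always includes.
sub list_dir {
    my ($dir) = @_;
    opendir(my $dh, $dir) or die "Cannot open $dir: $!";
    my @names = grep { $_ ne '.' && $_ ne '..' } readdir $dh;
    closedir $dh;
    return @names;
}

# Return, sorted, the names that appear in $first_dir but not in $second_dir.
sub missing_from {
    my ($first_dir, $second_dir) = @_;
    my %seen;
    $seen{$_} = 1 for list_dir($second_dir);    # lookup table of second-directory names
    my @missing = grep { !$seen{$_} } list_dir($first_dir);
    return sort @missing;
}

# Demo helper (hypothetical): create an empty file.
sub touch {
    my ($path) = @_;
    open(my $fh, '>', $path) or die "Cannot create $path: $!";
    close $fh;
}

# Demo only: two temporary directories with hypothetical file names,
# removed automatically when the script exits.
my $big   = tempdir(CLEANUP => 1);
my $small = tempdir(CLEANUP => 1);
touch(File::Spec->catfile($big, $_)) for 'a.jpg', 'b.jpg', 'c.jpg';
touch(File::Spec->catfile($small, 'b.jpg'));

my @unmatched = missing_from($big, $small);
print "Items in $big not in $small:\n";
print "$_\n" for @unmatched;
```

To point it at real directories, you could replace the demo section with a call such as missing_from($ARGV[0], $ARGV[1]) and pass the two paths on the command line.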
You can run the script from the command line in Terminal, or, if you have BBEdit or TextWrangler, you can use the run command in the Unix menu. I like to use BBEdit to run the script because it opens a new window with the output. If I want, I can save the list and use it to run additional scripts on the directory, such as copy scripts.
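As an example of such a follow-up, here is a sketch that copies each file named in a list from one directory into another using the core File::Copy module. It assumes the list is already in an array; in the demo the names and the temporary directories are hypothetical, and copy_missing is an illustrative helper name, not something from the original script.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Copy qw(copy);
use File::Spec;
use File::Temp qw(tempdir);

# Copy each named file from $from_dir into $to_dir.
# Returns the number of files that were copied successfully.
sub copy_missing {
    my ($from_dir, $to_dir, @names) = @_;
    my $copied = 0;
    for my $name (@names) {
        my $src = File::Spec->catfile($from_dir, $name);
        next unless -f $src;    # skip subdirectories and anything that is not a plain file
        if (copy($src, File::Spec->catfile($to_dir, $name))) {
            $copied++;
        } else {
            warn "Copy of $src failed: $!\n";
        }
    }
    return $copied;
}

# Demo only: in practice @unmatched would come from the comparison script.
my $src_dir = tempdir(CLEANUP => 1);
my $dst_dir = tempdir(CLEANUP => 1);
my @unmatched = ('a.jpg', 'c.jpg');
for my $name (@unmatched) {
    open(my $fh, '>', File::Spec->catfile($src_dir, $name)) or die "Cannot create $name: $!";
    print $fh "sample\n";
    close $fh;
}

my $count = copy_missing($src_dir, $dst_dir, @unmatched);
print "Copied $count file(s) to $dst_dir\n";
```

copy() overwrites a file of the same name in the destination, so it is worth running the comparison first, as above, if you only want to fill in what is missing.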
December 30th, 2005 at 2:20 am
You can achieve pretty much the same effect by running diff on these two directories like so:
diff -r firstDir secondDir
As an added bonus, if the same file exists in both directories but the contents differ, you will be notified as well.
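For comparison with diff, the hash technique can also report names found in only one of the two directories, in both directions, roughly like the "Only in" lines GNU diff prints. This is a sketch on hypothetical hard-coded listings; only_in_each is an illustrative name. Unlike diff -r, it compares file names only, never looks at file contents, and does not recurse into subdirectories.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Compare two lists of names in both directions.
# Returns two array refs: names only in the first list, names only in the second.
sub only_in_each {
    my ($first, $second) = @_;
    my (%in_first, %in_second);
    $in_first{$_}  = 1 for @$first;
    $in_second{$_} = 1 for @$second;
    my @only_first  = grep { !$in_second{$_} } @$first;
    my @only_second = grep { !$in_first{$_} }  @$second;
    return ([sort @only_first], [sort @only_second]);
}

# Hypothetical listings standing in for two readdir results.
my @first  = ('a.jpg', 'b.jpg', 'c.jpg');
my @second = ('b.jpg', 'd.jpg');

my ($only_first, $only_second) = only_in_each(\@first, \@second);
print "Only in first: $_\n"  for @$only_first;     # a.jpg, c.jpg
print "Only in second: $_\n" for @$only_second;    # d.jpg
```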
January 1st, 2006 at 8:10 am
~Editor's note - Thanks. Actually, I was wondering if diff could be used. I guess I used an air gun to put a wall hanger up :) - Brad