Sorting a list of email addresses with Perl (virtusertable)

I’ve always had a mental block when sorting with Perl, but I was asked to write a quick script to sort out the /etc/mail/virtusertable file for our sendmail setup. The format of the table is as follows:

<email address> (whitespace) <destination>

The whitespace can be a tab or a space, and the destination can be either a local mailbox or another email address. The file may also contain hashed-out (commented) lines, which sendmail ignores but which we still need to keep. The output needs to be sorted by domain and then by email address. Simple, huh?
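
To make the format concrete, here is a minimal sketch of how a single line could be classified and split. The helper name parse_line and the 'entry' / 'comment' / 'blank' labels are made up for illustration and are not part of the script built in this post.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: classify one virtusertable line.
# Returns ('blank', $line), ('comment', $line),
# or ('entry', $address, $destination).
sub parse_line {
    my ($line) = @_;
    chomp $line;
    return ('blank',   $line) if $line =~ /^\s*$/;
    return ('comment', $line) if $line =~ /^\s*#/;

    # split ' ' skips leading whitespace and splits on runs of tabs/spaces;
    # the limit of 2 keeps the rest of the line together as the destination.
    my ($address, $destination) = split ' ', $line, 2;
    return ('entry', $address, $destination);
}

for my $line ("111\@aaaaaa.com\t some_username\n", "# 222\@aaaaaa.com\t old_user\n") {
    my @fields = parse_line($line);
    print join(' | ', @fields), "\n";
}
# first line printed: entry | 111@aaaaaa.com | some_username
```

The second sample line comes back tagged as a comment, which is the hook you would need if hashed-out lines are to be kept rather than sorted as if they were addresses.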

The Perl bit...

The plan is to read in the file named on the command line, get it into an array and sort the array, so I’ll build the test file bit by bit. We’ll be calling the Perl script in the format ./scriptname input_file. I’m not using real email addresses on a blog, so I’ve created a test file to demonstrate, which looks like:

111@aaaaaa.com	 some_username
222@aaaaaa.com	 some_username
333@aaaaaa.com	 some_username
444@aaaaaa.com	 some_username
555@aaaaaa.com	 some_username
666@aaaaaa.com	 some_username
111@bbbbbb.com	 some_username
222@bbbbbb.com	 some_username
...

I’ve created this example virtusertable with the following script:

#! /usr/bin/perl -w

use strict;

foreach my $domain ('a' .. 'f'){
   foreach (1 .. 6){
      print "$_" x 3 . '@' . $domain x 6 . ".com\t some_username\n";
   }
}

Ok, let’s start with getting the file into the program and into the array:

#! /usr/bin/perl -w

use strict;

my @virtusertable_array = <>;

foreach (@virtusertable_array) {print}

There is an obvious issue with this: the list is already sorted, so we need to shuffle it to give us something to sort! We can either write a Fisher-Yates shuffle function or just be lazy and use the shuffle function from List::Util. The modified script to create the list follows:

#! /usr/bin/perl -w

use strict;
use List::Util 'shuffle';

my @output;

foreach my $domain ('a' .. 'f'){
   foreach (1 .. 6){
      push(@output, "$_" x 3 . '@' . $domain x 6 . ".com\t some_username\n");
   }
}

@output = shuffle(@output);
foreach (@output){print}

My output now looks like this:

444@ffffff.com	 some_username
333@cccccc.com	 some_username
444@aaaaaa.com	 some_username
333@aaaaaa.com	 some_username
222@eeeeee.com	 some_username
333@bbbbbb.com	 some_username
222@bbbbbb.com	 some_username
...
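
For anyone who would rather not be lazy, the Fisher-Yates shuffle mentioned above can be hand-rolled in a few lines. This is only a sketch: the sub name fisher_yates_shuffle and the example.com addresses are placeholders, and List::Util's shuffle remains the simpler choice.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Fisher-Yates shuffle: walk the array from the last index down,
# swapping each element with a randomly chosen element at or before it.
# Takes an array reference and shuffles it in place.
sub fisher_yates_shuffle {
    my ($array) = @_;
    for (my $i = $#{$array}; $i > 0; $i--) {
        my $j = int rand($i + 1);                  # random index in 0 .. $i
        @{$array}[$i, $j] = @{$array}[$j, $i];     # swap via array slices
    }
    return;
}

my @lines = map { "$_\@example.com\t some_username\n" } 1 .. 6;
fisher_yates_shuffle(\@lines);
print @lines;
```

The order printed differs from run to run, but every line appears exactly once.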

Perl sorting

Sorting an array in Perl is pretty easy: just use the keyword ‘sort’ for an ‘ASCII-betical’ sort. We can modify the foreach line in our script as follows:

foreach (sort @virtusertable_array) {print}

and we get the following:

111@aaaaaa.com	 some_username
111@bbbbbb.com	 some_username
111@cccccc.com	 some_username
111@dddddd.com	 some_username
111@eeeeee.com	 some_username
111@ffffff.com	 some_username
222@aaaaaa.com	 some_username
222@bbbbbb.com	 some_username
...

Not exactly what is needed, as we want everything sorted by domain first, then by the user part of the address. To take the sorting to the next level we need to supply our own comparison block, which uses the special variables $a and $b that sort sets to the two elements being compared. This is not a tutorial on sorting, but the basics are these:

ASCIIbetical sort (both the same):
@sorted = sort @unsorted;
@sorted = sort {$a cmp $b} @unsorted;

Numerical sort:
@sorted = sort {$a <=> $b} @unsorted;

Alphabetical (case-insensitive) sort:
@sorted = sort {lc($a) cmp lc($b)} @unsorted;
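
To see how the three comparisons differ, here is a small self-contained demonstration. The sample lists are made up purely for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @numbers = (10, 9, 2, 33);
my @words   = qw(banana apple Cherry);

my @as_strings = sort @numbers;                      # default: string comparison
my @as_numbers = sort { $a <=> $b } @numbers;        # numeric comparison
my @by_code    = sort { $a cmp $b } @words;          # uppercase sorts before lowercase
my @by_letter  = sort { lc($a) cmp lc($b) } @words;  # case-insensitive

print "@as_strings\n";   # 10 2 33 9
print "@as_numbers\n";   # 2 9 10 33
print "@by_code\n";      # Cherry apple banana
print "@by_letter\n";    # apple banana Cherry
```

Note how the default sort puts 10 before 2, because it compares the values as strings, character by character.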

That’s interesting but still not going to help us. We need to extract the domain name from the email address and compare on that, then do the comparison on the user part. A couple of functions should do the job, as we don’t want to modify the actual data. The functions are:

sub domain {
   my $addy = shift;
   $addy =~ /\@([^\s]*)\s+/;
   $count_domain ++;
   return $1;
}

sub user {
   my $addy = shift;
   $addy =~ /^([^\s]*)\@/;
   $count_user ++;
   return $1;
}

My functions may be a bit lazy, but I have control over what is entered into the file! Also note the counters ($count_domain and $count_user), which will provide some interesting data later on. Now that we have the functions, we just need to add the sort to the main program:

my @sorted_virtusertable = sort { &domain($a) cmp &domain($b) || &user($a) cmp &user($b) } @virtusertable_array;

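Note the ‘||’ (or), which adds the second-level sort on the user part: when two domains compare equal, cmp returns 0 (false), so the comparison of the user parts decides the order. The whole script should look like this now: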

#! /usr/bin/perl -w

use strict;

my ($count_domain, $count_user);

sub domain {
        my $addy = shift;
        $addy =~ /\@([^\s]*)\s+/;
        $count_domain ++;
        return $1;
}

sub user {
        my $addy = shift;
        $addy =~ /^([^\s]*)\@/;
        $count_user ++;
        return $1;
}

my @virtusertable_array = <>;

foreach (sort { &domain($a) cmp &domain($b) || &user($a) cmp &user($b) } @virtusertable_array) {print}

print "sub domain called: $count_domain times\nsub user called $count_user times\n";

OK, so I added a bit to count the number of times the functions were called. I got:

sub domain called: 298 times
sub user called 124 times

Staggering, isn’t it? Well, as you would expect, there is a way round this using an intermediate table and references, but that’s for another day!
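
As a taste of what that might look like, here is a sketch of one well-known approach, usually called the Schwartzian transform: extract the sort keys once per line into a temporary list of array references, sort on the cached keys, then unwrap the original lines. Whether this is exactly the technique hinted at above is an assumption, and the names keys_for, sort_virtusertable and $extractions are invented for this example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $extractions = 0;    # how many times a line gets parsed

# Return (domain, user) for an address line. Lines that do not look like
# an address get empty keys, so they sort first instead of causing warnings.
sub keys_for {
    my ($line) = @_;
    $extractions++;
    return ($2, $1) if $line =~ /^([^\s\@]*)\@(\S*)/;
    return ('', '');
}

sub sort_virtusertable {
    my @lines = @_;
    return
        map  { $_->[0] }                                       # 3. unwrap the original line
        sort { $a->[1] cmp $b->[1] || $a->[2] cmp $b->[2] }    # 2. compare the cached keys
        map  { [ $_, keys_for($_) ] }                          # 1. build [line, domain, user]
        @lines;
}

my @sample = (
    "222\@bbbbbb.com\t some_username\n",
    "111\@bbbbbb.com\t some_username\n",
    "222\@aaaaaa.com\t some_username\n",
);
my @sorted = sort_virtusertable(@sample);
print @sorted;
print "lines parsed: $extractions\n";    # one parse per input line
```

Each line is parsed exactly once, however many comparisons the sort performs, at the cost of holding one small array reference per line in memory while sorting.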
