Tuesday, January 12, 2016

GAL Cleanup Part 1 - Expire old contacts

Over the 8 years that I've worked here, we've managed to virtually triple the number of contacts we host in our Global Address List. At this point, we have 90,000 mailboxes and 60,000 mail-enabled contacts. I suspect that the vast majority of these contacts have not been sent to in several years! Heck, considering the volatile work environment, it's more than likely that a good portion of them are no longer valid: a customer at another company leaves, and we never delete their contact.

Phase 1:

This got me thinking. Read in the primary SMTP address from 100 or so contacts. Search the message tracking logs (on the servers hosting Internet-bound email) for anything going to each contact. For each contact I touch, I would mark Custom Attribute 9 with the date/time. If I found an entry in the tracking logs, I would also mark Custom Attribute 10. A relatively easy script to write.

# This is too easy!!
Get-MailContact -ResultSize 100 | ForEach-Object {
    Get-MessageTrackingLog -Start (Get-Date).AddDays(-7) -Recipients $_.ExternalEmailAddress.AddressString
}

Unfortunately, that TOOK FOREVER!
  1. We have 12 hub transports that send out email to the Internet. This means to effectively scan for a single recipient, I'd need to scan all 12 servers. 
  2. We process probably a million messages each day going out to the Internet.
  3. As I said, we have 60,000 mail enabled contacts.
  4. We keep tracking logs back 30 days.
If I let it run in this state, we'd only get through about 1/30th of the contacts each day, so a full pass would take a month.
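
For reference, the full Phase 1 idea (stamp every contact touched, search every hub) could be sketched roughly as below. This is only a sketch, assuming the Exchange Management Shell is loaded. The function name Update-ContactUsageStamp, its parameters, and the timestamp format are made up for illustration, and enumerating the hubs with Get-TransportServer is an assumption about how the "scan all 12 servers" part would be done.

```powershell
# Hypothetical helper: stamps CA9 on every contact examined and CA10 on
# contacts that show up as a recipient in the tracking logs.
function Update-ContactUsageStamp {
    param(
        [int]$SampleSize = 100,   # how many contacts to examine per run
        [int]$DaysBack   = 7      # how far back to search the tracking logs
    )

    $start = (Get-Date).AddDays(-$DaysBack)
    $stamp = (Get-Date).ToString('yyyy-MM-dd HH:mm')   # assumed stamp format
    $hubs  = @(Get-TransportServer)                    # assumption: search every hub

    Get-MailContact -ResultSize $SampleSize | ForEach-Object {
        $contact = $_
        $address = $contact.ExternalEmailAddress.AddressString

        # One log entry is enough to prove the contact is still being used.
        $hit = $hubs | ForEach-Object {
            Get-MessageTrackingLog -Server $_.Name -Start $start -Recipients $address -ResultSize 1
        } | Select-Object -First 1

        # CA9 = contact was examined, CA10 = mail was actually sent to it.
        Set-MailContact -Identity $contact.Identity -CustomAttribute9 $stamp
        if ($hit) {
            Set-MailContact -Identity $contact.Identity -CustomAttribute10 $stamp
        }
    }
}

# Usage (needs a real Exchange session):
#   Update-ContactUsageStamp -SampleSize 100 -DaysBack 7
```

Note that every contact still costs one tracking-log search per hub, which is exactly why this approach crawls.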

Phase 2:

I noticed that the Recipients parameter on Get-MessageTrackingLog uses OR logic. I could technically pack a big handful of recipients into that parameter and search for them all at once. I'll start with twenty recipients per search.
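
A minimal sketch of the batching idea: chop the address list into groups of twenty and hand each group to a single search. Split-IntoBatches is a hypothetical helper name, and $addresses in the usage comment is assumed to already hold the contacts' SMTP addresses.

```powershell
# Hypothetical helper: splits a list of addresses into fixed-size batches.
function Split-IntoBatches {
    param(
        [string[]]$Items,
        [int]$BatchSize = 20
    )

    for ($i = 0; $i -lt $Items.Count; $i += $BatchSize) {
        $end = [Math]::Min($i + $BatchSize, $Items.Count) - 1
        # The leading comma keeps each batch together as one array on the pipeline.
        , @($Items[$i..$end])
    }
}

# Usage (needs a real Exchange session):
#   foreach ($batch in (Split-IntoBatches -Items $addresses -BatchSize 20)) {
#       Get-MessageTrackingLog -Start (Get-Date).AddDays(-7) -Recipients $batch
#   }
```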

These are not the results I was hoping for...
Evidently, I've stumbled onto a 'known bug' with the cmdlet. Your search has to be under a certain number of characters (256, IIRC). Once you exceed that, it fails. The only workaround is to reduce the number of entries in your query. At one point, I had reduced the number of recipients to only 5 addresses and the script was still failing. What next? Only 2 people at a time? Not much of a time savings.

Phase 3:

While walking out to my car that night, I was discussing this project with a co-worker. While explaining the concept to him, I came up with an interesting idea: scan the message tracking logs for non-mailbox recipients. OK, it sounds worse, but it pays off.
  1. Create a giant string of every accepted domain. This will be used to filter out every mailbox recipient.
  2. Find and fine-tune a 'directory searcher' function to validate that an email address is in the GAL.
So here's my basic process. On each hub transport:
  • read the message tracking logs and spit out all recipients
  • keep only the recipients whose email domain does not match (-notmatch) any internal domain
  • check the GAL to see if a contact exists for the recipient
  • Get-MailContact, then put today's date in CA10 (Custom Attribute 10).
Now, some contacts appear to get messages hourly as part of scheduled tasks, so I created a second filter to skip contacts that have already been touched.
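
The process above could be sketched along these lines. Treat it as a rough outline under several assumptions, not the script I actually ran: the function names are made up, the search is limited to SEND events (an assumption), and Get-MailContact -Identity stands in for the 'directory searcher' lookup to keep the example short. The "already touched" filter is done twice here: a hash table skips repeats within one run, and the CA10 check skips contacts already stamped today.

```powershell
# Hypothetical helper: turns a list of accepted domains into one regex
# that matches any address ending in one of those domains.
function New-InternalDomainPattern {
    param([string[]]$Domains)
    $escaped = $Domains | ForEach-Object { [regex]::Escape($_) }
    '@(' + ($escaped -join '|') + ')$'
}

# Hypothetical helper: reads one hub's tracking log and stamps CA10 on
# every mail contact that received something.
function Update-ContactsFromTrackingLog {
    param(
        [string]$Server,
        [datetime]$Start,
        [string]$InternalPattern
    )

    $today = (Get-Date).ToString('yyyy-MM-dd')   # assumed stamp format
    $seen  = @{}                                 # addresses already handled this run

    Get-MessageTrackingLog -Server $Server -Start $Start -EventId SEND -ResultSize Unlimited |
        ForEach-Object { $_.Recipients } |
        Where-Object { $_ -notmatch $InternalPattern } |
        ForEach-Object {
            $address = $_.ToLower()
            if ($seen.ContainsKey($address)) { return }   # skip repeats
            $seen[$address] = $true

            # Stand-in for the directory searcher: is this address a contact in the GAL?
            $contact = Get-MailContact -Identity $address -ErrorAction SilentlyContinue
            if ($contact -and $contact.CustomAttribute10 -ne $today) {
                Set-MailContact -Identity $contact.Identity -CustomAttribute10 $today
            }
        }
}

# Usage (needs a real Exchange session):
#   $domains = Get-AcceptedDomain | ForEach-Object { $_.DomainName.ToString() }
#   $pattern = New-InternalDomainPattern -Domains $domains
#   Get-TransportServer | ForEach-Object {
#       Update-ContactsFromTrackingLog -Server $_.Name -Start (Get-Date).AddDays(-1) -InternalPattern $pattern
#   }
```

One caveat with the pattern builder: a wildcard accepted domain such as *.example.com would be escaped literally, so it would need special handling.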
