Scenario: I was given a list of 15,000 email addresses and asked if they were still valid in our Exchange environment.
Easy method: I ran a simple VBScript that does an LDAP query against each email address. This worked great, except that it took close to 5 seconds per email address to query our environment. (~20 hrs!) The over-all run time was going to be too extreme.
Next method: Read all entries into a datalist using the tricks from Microsoft Scripting Guys. The script would read each object (primary;proxy) into seperate records, then search each one. This ran for 2 hours before the server logged me off. Even if I deleted entries that I found, it still took too long.
Final method: Read each email address and append it to a string for each letter (for example "administrator@example.com;author@example.com" went in A) accessible via an array. Now I have a 50ish (a..z,0..9,!#$%^) row array containing unsorted lists. The script just needs to find the correct row, and see if the email address exists there.
The entire process took approximately 14-15 minutes. This inclides 12 minutes to read in all 250k email addresses from Active Directory(AD) and then sort them. Processing it creates a NEW csv containing the original line, then appends TRUE or FALSE. If it finds the email address, it puts a TRUE, otherwise it puts FALSE. It does very little clean-up of the email addresses it reads from the CSV, but that could be improved rather easily.
| Attachment | Size |
|---|---|
| Find_Duplicates_in_File_v2.zip | 2.64 KB |
Comments
Post new comment