The purpose of this program is to monitor the domain registration activity of certain companies. To do this, we periodically download the zone file from a Verisign FTP location; this file contains all domains registered to date and their corresponding nameservers. To determine which domains belong to a given company, we check for specific nameservers. I will store the nameservers to be checked in a text file called [login to view URL], one per line. The zone file is published twice a day, but the time at which this happens varies. The program will therefore need to log in to the FTP server at a fixed interval and check whether the file's last-modified date/time has changed. If it has, it's time for action: the zone file, in .gz format, is downloaded. The file is about 500 MB compressed and 2 GB decompressed.
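A minimal Python sketch of the polling step, assuming the server supports the FTP `MDTM` command (RFC 3659) for reading a file's last-modified time. The host name, file path and polling interval below are hypothetical placeholders, since the real Verisign location is not given here, and `check_for_update`, `download` and `poll_forever` are illustrative names.

```python
import ftplib
import time
from datetime import datetime, timezone
from typing import Optional

# Hypothetical placeholders: the real Verisign host and path are not specified.
FTP_HOST = "ftp.example.invalid"
ZONE_PATH = "com.zone.gz"
POLL_SECONDS = 15 * 60  # assumed polling interval


def parse_mdtm(response: str) -> datetime:
    """Turn an MDTM reply such as '213 20240101123456' into a UTC datetime."""
    code, _, stamp = response.partition(" ")
    if code != "213":
        raise ValueError(f"unexpected MDTM reply: {response!r}")
    # RFC 3659 allows optional fractional seconds ("...56.789"); drop them.
    stamp = stamp.strip().split(".")[0]
    return datetime.strptime(stamp, "%Y%m%d%H%M%S").replace(tzinfo=timezone.utc)


def check_for_update(ftp, path: str, last_seen: Optional[datetime]) -> Optional[datetime]:
    """Return the file's timestamp if it differs from last_seen, else None."""
    modified = parse_mdtm(ftp.sendcmd(f"MDTM {path}"))
    return None if modified == last_seen else modified


def download(ftp, path: str, dest: str) -> None:
    """Stream the file to disk in 64 KiB blocks instead of holding ~500 MB in memory."""
    with open(dest, "wb") as fh:
        ftp.retrbinary(f"RETR {path}", fh.write, blocksize=1 << 16)


def poll_forever(user: str, password: str, dest: str) -> None:
    """Log in at a fixed interval; download only when the zone file has changed."""
    last_seen: Optional[datetime] = None
    while True:
        with ftplib.FTP(FTP_HOST) as ftp:
            ftp.login(user, password)
            changed = check_for_update(ftp, ZONE_PATH, last_seen)
            if changed is not None:
                download(ftp, ZONE_PATH, dest)
                last_seen = changed
                # ... hand `dest` over to the analysis step here ...
        time.sleep(POLL_SECONDS)
```

Keeping `last_seen` only in memory means a restart triggers one extra download; persisting it to a small file would avoid that.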
As soon as the file has been downloaded, it needs to be analyzed. On the first run, all nameservers (from the [login to view URL] file) and their corresponding domains have to be stored in a database or file(s). From the second run onward, what matters are the changes to this database/file(s): the old list is compared to the new one, and for each nameserver an email listing the newly registered domains is sent to the address(es), one per line, specified in [login to view URL].
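One way to handle the first-run/next-run logic, sketched in Python under the assumption that the per-nameserver snapshot fits comfortably in a JSON file (it only holds domains for the watched nameservers, not the whole zone). `diff_and_store`, `build_message`, the sender address and the subject line are hypothetical; the spec does not fix a subject for unfiltered nameservers.

```python
import json
from email.message import EmailMessage
from pathlib import Path
from typing import Dict, Iterable, List, Set

Snapshot = Dict[str, Set[str]]  # nameserver label -> domains using it


def read_list(path: Path) -> List[str]:
    """Read a one-entry-per-line file (nameservers, recipients), skipping blank lines."""
    return [line.strip() for line in path.read_text().splitlines() if line.strip()]


def new_domains(old: Snapshot, new: Snapshot) -> Dict[str, List[str]]:
    """Per nameserver, the domains present in this run but absent in the previous one."""
    added = {ns: sorted(domains - old.get(ns, set())) for ns, domains in new.items()}
    return {ns: domains for ns, domains in added.items() if domains}


def diff_and_store(snapshot_path: Path, current: Snapshot) -> Dict[str, List[str]]:
    """Save the current snapshot; return new domains, or nothing on the first run."""
    first_run = not snapshot_path.exists()
    old: Snapshot = {}
    if not first_run:
        raw = json.loads(snapshot_path.read_text())
        old = {ns: set(domains) for ns, domains in raw.items()}
    snapshot_path.write_text(json.dumps({ns: sorted(d) for ns, d in current.items()}))
    return {} if first_run else new_domains(old, current)


def build_message(nameserver: str, domains: List[str], sender: str,
                  recipients: Iterable[str]) -> EmailMessage:
    """One mail per nameserver, listing its newly registered domains one per line."""
    msg = EmailMessage()
    msg["Subject"] = f"New domains on {nameserver}"  # assumed subject format
    msg["From"] = sender
    msg["To"] = ", ".join(recipients)
    msg.set_content("\n".join(domains))
    return msg
```

Each message could then be sent with `smtplib.SMTP(host).send_message(msg)`. One consequence of diffing per nameserver: a nameserver added to the list later has all of its domains reported as new the first time it appears.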
A problem is that some companies use their host's nameservers instead of their own (such as [login to view URL] or ns.google.com). It's even worse when a single host hosts multiple companies I want to monitor. A certain subset of results will thus have to be filtered, so I will put the nameservers from [login to view URL] that need extra filtering in another file, servers.txt. The best way to filter is probably to retrieve the whois record and check the registrant email address. As the central InterNIC database lags 24-48 hours behind, the script will need to query the whois server of the registrar at which the domain was registered. I will build a profile for every nameserver in servers.txt. Take the host Interland as an example. Assuming this host hosts two companies to monitor, I will make two files:
*[login to view URL]: contains the whois servers to be checked:
[login to view URL] (for company A)
[login to view URL] (for company B)
*[login to view URL]: contains the registrant email addresses to be matched:
companyA [at] [login to view URL]
companyB [at] [login to view URL]
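Fetching a registrar's whois record needs nothing more than the plain whois protocol (RFC 3912): open TCP port 43, send the domain name followed by CRLF, and read until the server closes the connection. The Python sketch below assumes the registrar's output contains a line such as `Registrant Email: ...`; field names differ between registrars, so the pattern is illustrative and may need adjusting per server. The server and email lists would come from the two profile files; `lookup` is a hypothetical helper name.

```python
import re
import socket
from typing import Optional

# Assumed field layout; registrars differ, so this pattern may need per-server tweaks.
REGISTRANT_EMAIL = re.compile(
    r"^[ \t]*Registrant[ \t]+E-?mail[ \t]*:[ \t]*(\S+@\S+)",
    re.IGNORECASE | re.MULTILINE,
)


def whois_query(server: str, domain: str, timeout: float = 15.0) -> str:
    """Plain whois lookup (RFC 3912): send the domain plus CRLF to port 43, read to EOF."""
    with socket.create_connection((server, 43), timeout=timeout) as sock:
        sock.sendall(domain.encode("ascii") + b"\r\n")
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # server closed the connection: response complete
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def registrant_email(whois_text: str) -> Optional[str]:
    """Extract the registrant email from a whois response, lowercased, or None."""
    match = REGISTRANT_EMAIL.search(whois_text)
    return match.group(1).lower() if match else None


def lookup(server: str, domain: str) -> Optional[str]:
    """Query one registrar whois server and return the registrant email it reports."""
    return registrant_email(whois_query(server, domain))
```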
When multiple companies share one host, each domain found for that nameserver by the other part of the script has to be checked against multiple whois databases for multiple registrant emails until a match is found. If there is no match, the next domain is checked. If there is a match, the newly registered domain is stored temporarily, along with the nameserver and the registrant email address. After the results for a nameserver have been filtered, one email is sent per registrant email address, in this format:
Subject: NAMESERVER (in this case INTERLAND) // registrant email address
Body: the found domains.
Then the results for the next nameserver from [login to view URL] are filtered.
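The matching loop can be kept independent of the network code by passing the lookup in as a function. In this hypothetical Python sketch, `lookup(server, domain)` returns the registrant email found on that whois server (or `None`), and emails are compared case-insensitively, which is an assumption rather than part of the spec. `mails_for` produces the subject and body in the format above.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator, List, Optional, Tuple

# Hypothetical: lookup(server, domain) -> registrant email on that server, or None.
Lookup = Callable[[str, str], Optional[str]]


def filter_nameserver(domains: Iterable[str], whois_servers: List[str],
                      wanted_emails: Iterable[str], lookup: Lookup) -> Dict[str, List[str]]:
    """Group a nameserver's new domains by matching registrant email; drop the rest."""
    wanted = {e.lower() for e in wanted_emails}
    hits: Dict[str, List[str]] = defaultdict(list)
    for domain in domains:
        for server in whois_servers:
            email = lookup(server, domain)
            if email is not None and email.lower() in wanted:
                hits[email.lower()].append(domain)
                break  # match found: no need to query the remaining servers
        # if no server matched, the domain is simply skipped
    return dict(hits)


def mails_for(nameserver: str, hits: Dict[str, List[str]]) -> Iterator[Tuple[str, str]]:
    """Yield (subject, body) pairs: one mail per registrant email address."""
    for email, found in sorted(hits.items()):
        yield f"{nameserver.upper()} // {email}", "\n".join(found)
```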
Speed and smoothness are everything in this script. First of all, I don't want the results for 'characteristic' nameservers like [login to view URL] to lag behind those that need to be filtered. The two kinds of processing therefore need to run separately, either one after the other or simultaneously, whichever is best. Furthermore, most whois servers allow only one connection at a time and only a limited number of queries per time window. Although it's likely that fewer than 100 domains need to be checked each day, I can get around this limit by using multiple IPs.
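A Python sketch of both requirements: a per-server limiter that allows one connection at a time and spaces queries by a minimum interval, plus a helper that runs the unfiltered and filtered jobs side by side so the former never waits on whois. The interval value is a placeholder, since real limits vary by registrar and are not given; the clock and sleep functions are injectable only to make the timing testable. `PerServerLimiter` and `run_side_by_side` are hypothetical names.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, TypeVar

T = TypeVar("T")


class PerServerLimiter:
    """One in-flight query per whois server, at least `min_interval` seconds apart."""

    def __init__(self, min_interval: float,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.min_interval = min_interval  # placeholder: real limits vary per registrar
        self._clock = clock
        self._sleep = sleep
        self._locks: Dict[str, threading.Lock] = {}
        self._last: Dict[str, float] = {}
        self._guard = threading.Lock()

    def _lock_for(self, server: str) -> threading.Lock:
        with self._guard:  # create each server's lock exactly once
            if server not in self._locks:
                self._locks[server] = threading.Lock()
            return self._locks[server]

    def run(self, server: str, query: Callable[[], T]) -> T:
        with self._lock_for(server):  # serialises connections to this server
            last = self._last.get(server)
            if last is not None:
                wait = self.min_interval - (self._clock() - last)
                if wait > 0:
                    self._sleep(wait)
            try:
                return query()
            finally:
                self._last[server] = self._clock()  # spacing counts from query end


def run_side_by_side(*jobs: Callable[[], T]) -> List[T]:
    """Run e.g. the unfiltered and the whois-filtered pipelines concurrently."""
    with ThreadPoolExecutor(max_workers=max(1, len(jobs))) as pool:
        futures = [pool.submit(job) for job in jobs]
        return [f.result() for f in futures]  # re-raises any job's exception
```

For the multiple-IP approach, `socket.create_connection` accepts a `source_address=(local_ip, 0)` argument, so each worker could bind to a different local address, assuming the machine has several configured.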
When all is done, the downloaded zone file can be deleted.
Note: when referring to nameservers, I will only put the middle part of the hostname in the files: for [login to view URL] this would mean lycos, for [login to view URL] this would mean google.
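Matching on the middle part means the zone file can be scanned line by line straight from the .gz archive, without writing the 2 GB decompressed file to disk. This Python sketch assumes standard DNS master-file syntax (RFC 1035): lines of the form `OWNER [TTL] [CLASS] NS HOST`, a blank owner field meaning "same owner as the previous line", and names without a trailing dot being relative to the current `$ORIGIN`. The default origin `com.` and reading "middle part" as the second-to-last label are assumptions.

```python
import gzip
from collections import defaultdict
from typing import Dict, Iterable, Set

CLASSES = {"IN", "CH", "HS", "CS"}


def qualify(name: str, origin: str) -> str:
    """Make a master-file name absolute: '@' is the origin, no trailing dot means relative."""
    if name == "@":
        return origin
    return name if name.endswith(".") else f"{name}.{origin}"


def middle_label(host: str) -> str:
    """'NS1.LYCOS.COM.' -> 'lycos' (assumed: the second-to-last label, lowercased)."""
    labels = host.rstrip(".").lower().split(".")
    return labels[-2] if len(labels) >= 2 else labels[0]


def parse_zone(lines: Iterable[str], watched: Iterable[str],
               origin: str = "com.") -> Dict[str, Set[str]]:
    """Map each watched middle label to the set of domains delegated to it."""
    wanted = {w.lower() for w in watched}
    result: Dict[str, Set[str]] = defaultdict(set)
    owner = None
    for raw in lines:
        line = raw.split(";", 1)[0].rstrip()  # drop comments and the trailing newline
        if not line.strip():
            continue
        if line.startswith("$"):
            parts = line.split()
            if parts[0].upper() == "$ORIGIN" and len(parts) > 1:
                origin = parts[1]
            continue  # other directives such as $TTL are irrelevant here
        tokens = line.split()
        if not line[0].isspace():  # a leading blank means "same owner as before"
            owner, tokens = tokens[0], tokens[1:]
        if owner is None:
            continue
        idx = 0
        while idx < len(tokens) and (tokens[idx].isdigit() or tokens[idx].upper() in CLASSES):
            idx += 1  # skip the optional TTL and class fields
        if idx + 1 >= len(tokens) or tokens[idx].upper() != "NS":
            continue  # not an NS record
        label = middle_label(qualify(tokens[idx + 1], origin))
        if label in wanted:
            result[label].add(qualify(owner, origin).rstrip(".").lower())
    return dict(result)


def parse_zone_file(path: str, watched: Iterable[str],
                    origin: str = "com.") -> Dict[str, Set[str]]:
    """Stream the compressed zone file; nothing is decompressed to disk."""
    with gzip.open(path, "rt", encoding="ascii", errors="replace") as fh:
        return parse_zone(fh, watched, origin)
```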
This is a relatively simple script that can be completed in one or two days. The usual offers, like $1000, are simply far too high. To give you an indication: I previously had this script done in parts and in different languages, at a total cost of about $270.