Email extraction script

Completed. Posted Jul 8, 2005. Payment on delivery.

We offer free email-based services that are used mostly by honest people. Occasionally, however, our services are inadvertently used to commit crimes; when that happens, we always cooperate with the legal authorities to bring those responsible to the appropriate legal process.

We normally satisfy subpoena requests by hand without any problem, but this particular one demands multiple passes through roughly 9 GB of email data, and the Court needs the results in less than a week or I go to jail. We need software that will identify and extract certain emails from a large set of archives.

Thus, your task will be to write the following software (which will become our sole property), test it, run it, and then spot-check the results by hand. It's conceivable that the software will take hours or possibly days to run, so we don't have a whole lot of time. If you don't have the availability to work on this *now*, please don't bother bidding. Period.

**_Please note:_ A decision has now been made on this bid request. We are working with RentACoder to resolve the purchase issue, but please do not submit any more bids! Thank you for your patience with this issue.**

## Deliverables

The script will accept, as a command-line parameter, the name of a configuration file, e.g., config.pl. That file, in turn, will contain the following variables. (The format can be different; I'm just assuming here that you'll simply load the file as a subordinate script.)

```perl
# Input directory containing a set of
# gzipped email archives: (READ-ONLY!)
$input_dir = '/home/d4b/subpoena/input';

# Directory to store current work files:
$work_dir = '/home/d4b/subpoena/work';

# File containing email addresses of perpetrators,
# stored one per line:
$perps = '/home/d4b/subpoena/[url removed, login to view]';

# Primary output to contain relevant emails:
$evidence = '/home/d4b/subpoena/[url removed, login to view]';

# Secondary output containing summary of work done:
$log = '/home/d4b/subpoena/[url removed, login to view]';
```
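The spec leaves the config format open. Assuming the config file is plain Perl as shown, a minimal sketch of loading it as a subordinate script might look like this. `load_config` is a hypothetical helper name, and the check that all five variables are set is our own addition rather than part of the request.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: run the config file named on the command line
# (e.g. config.pl) as a subordinate Perl script and return its settings.
sub load_config {
    my ($file) = @_;
    # "do" searches @INC for relative names, so use an absolute path.
    my $path = File::Spec->rel2abs($file);

    # The config assigns package variables in main; "our" makes them
    # visible here and "local" clears any values left by an earlier call.
    our ($input_dir, $work_dir, $perps, $evidence, $log);
    local ($input_dir, $work_dir, $perps, $evidence, $log);

    my $ret = do $path;
    die "Cannot parse $path: $@\n" if $@;
    die "Cannot read $path: $!\n" unless defined $ret;

    my %cfg = (
        input_dir => $input_dir, work_dir => $work_dir, perps => $perps,
        evidence  => $evidence,  log      => $log,
    );
    for my $key (sort keys %cfg) {
        die "\$$key is not set in $path\n" unless defined $cfg{$key};
    }
    return \%cfg;
}
```

A driver would then begin with something like `my $cfg = load_config($ARGV[0] // die "usage: $0 config.pl\n");`. The `rel2abs` call matters because `do` on a bare name such as `config.pl` searches `@INC`, and on Perl 5.26 and later the current directory is no longer in `@INC`.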

After incorporating the above variables, continue:

```
For each $input_file in the $input_dir:
{
    Record the current date and time to the $log,
    prepended with "TIME: ".

    Record the $input_file filename to the $log,
    prepended with "FILE: ".

    COPY the $input_file to the $work_dir.
    # Under NO circumstances should you gunzip or
    # otherwise modify the files in the $input_dir;
    # those files cannot be touched, and you should
    # assume that the $input_dir is read-only!

    The $input_file will be in gzip format; gunzip it.

    For each $perp (email address) in the $perps file:
    {
        Record the $perp email address to the $log,
        prepended with "PERP: ".

        Start reading the $input_file into a buffer
        of 100 lines, i.e., you will never need to
        "go backward" further than 100 lines back.

        Search for "^To: .*$perp" and "^CC: .*$perp",
        i.e., lines which begin with either token and
        contain the perp's email address. This can
        be done as one pass or two, whichever is easier.

        When each instance is found:
        {
            Reach "backwards" through the buffer for
            the most recent (i.e., closest) instance of
            "^From " (i.e., a line which begins with the
            token "From" followed by a space, NOT a colon!),
            or the unlikely-but-safe beginning of buffer.

            Output the found "^From " line to the $log.

            Output every line, starting with the
            "^From " line, up to but NOT including the next
            "^From " line (or end of file), to the $evidence
            file. This will output the entire email
            which was received by the perpetrator in question.

            Continue searching.
        }
    }

    Delete the $input_file in the $work_dir.
}

(end)
```
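The inner part of the pseudo-code (the 100-line buffer, the `To:`/`CC:` match, the backward search for `From `, and copying through to the next `From ` line) could be sketched in Perl roughly as follows. This is one possible reading of the spec, not a definitive implementation: `extract_for_perp` is a hypothetical name, the case-insensitive match (so `Cc:` is caught as well as `CC:`) is an assumption, and after each hit the sketch skips ahead to the next message, so a message that names the perp on both a `To:` and a `CC:` line is written only once.

```perl
use strict;
use warnings;

use constant BUFFER_LINES => 100;   # spec: never look back more than 100 lines

# Hypothetical helper: scan one gunzipped mbox file for messages whose
# To: or CC: line contains $perp. Each matching message is written whole
# to $evidence_fh and its "From " line to $log_fh. Returns the hit count.
sub extract_for_perp {
    my ($mbox_path, $perp, $evidence_fh, $log_fh) = @_;
    # \Q...\E escapes regex metacharacters such as "." and "+" in the
    # address. /i is an assumption so that "Cc:" matches as well as "CC:".
    my $hit = qr/^(?:To|CC): .*\Q$perp\E/i;

    open my $in, '<', $mbox_path or die "Cannot open $mbox_path: $!\n";
    my @buffer;        # rolling window of the most recent lines
    my $matches = 0;

    while (defined(my $line = <$in>)) {
        push @buffer, $line;
        shift @buffer if @buffer > BUFFER_LINES;
        next unless $line =~ $hit;

        # Reach backwards for the closest "From " line (a space, not a
        # colon), falling back to the beginning of the buffer.
        my $start = 0;
        for my $i (reverse 0 .. $#buffer) {
            if ($buffer[$i] =~ /^From /) { $start = $i; last }
        }
        print {$log_fh} $buffer[$start];
        print {$evidence_fh} @buffer[$start .. $#buffer];
        $matches++;

        # Copy the rest of the message up to, but not including, the next
        # "From " line (or end of file), then resume searching from there.
        @buffer = ();
        while (defined(my $rest = <$in>)) {
            if ($rest =~ /^From /) { @buffer = ($rest); last }
            print {$evidence_fh} $rest;
        }
    }
    close $in;
    return $matches;
}
```

It would be called once per perp per unpacked archive, right after the `PERP:` line is logged, e.g. `extract_for_perp($plain_file, $perp, $evidence_fh, $log_fh)` with both handles opened in append mode.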

* * *

**Updates:**

1. We have a cron job that scoops up the /var/mail/whatever file when it reaches about 10 MB (every few hours), renames it, and gzips it. Thus, the $input_dir archive contains a series of such files.

2. It struck me that I need to require a signed non-disclosure agreement from the winning coder, seeing as there are 9 GB (!) of personal emails. :-)

3. We will give the winning coder an account on the machine that holds these archives. The data should never leave that machine, even for "tests".

4. That last line of the pseudo-code is critical: make sure you delete your $work_dir copy of the $input_file each time! We do *not* have 9 GB of free disk space on that server!!
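The outer loop, together with the read-only rule for `$input_dir` and the disk-space warning in update 4, might be handled like this. Again this is a sketch under assumptions: `process_archives` and the `$scan` callback are hypothetical names, files are processed in name order (we assume the cron job's renaming sorts sensibly), and a copy without a `.gz` suffix gets a hypothetical `.txt` suffix once unpacked. `gunzip` here is the functional interface of the core `IO::Uncompress::Gunzip` module, not the command-line tool.

```perl
use strict;
use warnings;
use File::Copy qw(copy);
use File::Spec;
use IO::Uncompress::Gunzip qw(gunzip $GunzipError);

# Hypothetical driver for the outer loop: for each archive in $input_dir,
# log TIME:/FILE:, copy it to $work_dir, gunzip the COPY, run $scan on the
# plain-text file, then delete every work file. $input_dir is only read.
sub process_archives {
    my ($input_dir, $work_dir, $log_fh, $scan) = @_;
    opendir my $dh, $input_dir or die "Cannot open $input_dir: $!\n";
    # Name order is an assumption about how the rotated files are named.
    my @files = sort { $a cmp $b }
                grep { my $p = File::Spec->catfile($input_dir, $_); -f $p }
                readdir $dh;
    closedir $dh;

    for my $name (@files) {
        print {$log_fh} "TIME: ", scalar(localtime), "\n";
        print {$log_fh} "FILE: $name\n";

        my $src  = File::Spec->catfile($input_dir, $name);
        my $copy = File::Spec->catfile($work_dir, $name);
        copy($src, $copy) or die "Cannot copy $src: $!\n";

        (my $plain = $copy) =~ s/\.gz\z//;
        $plain .= '.txt' if $plain eq $copy;     # hypothetical suffix
        gunzip($copy => $plain) or die "gunzip $copy failed: $GunzipError\n";
        unlink $copy or die "Cannot delete $copy: $!\n";

        # Run the per-perp passes, but delete the work file even if they die.
        my $ok  = eval { $scan->($plain); 1 };
        my $err = $@;
        unlink $plain or die "Cannot delete $plain: $!\n";
        die $err unless $ok;
    }
    return scalar @files;
}
```

Wrapping the scan in `eval` means the unpacked copy is removed even if a pass fails, so an aborted run does not leave a large file sitting in `$work_dir`.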

* * *

1) Complete and fully-functional working program(s) in executable form as well as complete source code of all work done.

2) Deliverables must be in ready-to-run condition, as follows (depending on the nature of the deliverables):

a) For web sites or other server-side deliverables intended to exist in only one place in the Buyer's environment: deliverables must be installed by the Seller, in ready-to-run condition, in the Buyer's environment.

b) For all others, including desktop software or software the Buyer intends to distribute: a software installation package that will install the software in ready-to-run condition on the platform(s) specified in this bid request.

3) All deliverables will be considered "work made for hire" under U.S. Copyright law. Buyer will receive exclusive and complete copyrights to all work purchased. (No GPL, GNU, 3rd party components, etc. unless all copyright ramifications are explained AND AGREED TO by the buyer on the site per the coder's Seller Legal Agreement).

* * *

The bid-closing notice at the top of this page was sent as a broadcast message to all bidders on Sunday, Jul 10, 2005, 9:42:43 AM.

## Platform

Linux

Engineering, Linux, MySQL, Odd Jobs, Perl, PHP, Project Management, Software Architecture, Software Testing, UNIX

Project ID: #3798079

About the project

9 bids. Remote project.

Awarded to:

ruslanb

See private message.

$41.65 USD in 3 days
(11 reviews)
Rating: 2.9

9 freelancers are bidding on this job, with an average bid of $124. The remaining bids are listed below; each proposal reads "See private message."

| Bidder | Bid | Delivery | Reviews | Rating |
| --- | --- | --- | --- | --- |
| jthoma | $127.50 USD | 3 days | 188 | 6.4 |
| coste | $212.50 USD | 3 days | 7 | 3.5 |
| pabst | $212.50 USD | 3 days | 5 | 3.3 |
| bcexelbi | $199.75 USD | 3 days | 4 | 2.4 |
| piotrzielinski | $42.50 USD | 3 days | 3 | 1.9 |
| perlsourcerer | $85.00 USD | 3 days | 8 | 2.0 |
| krzysztofa | $110.50 USD | 3 days | 0 | 0.0 |
| vw1562249vw | $85.00 USD | 3 days | 0 | 0.0 |