Python web scraping and data munging script

Cancelled. Posted Nov 26, 2010. Paid on delivery.

I am looking for a Python developer to create a script that downloads data from a web page.

The same script must also extract similarly formatted tabular data from an HTML page saved to disk and from an XLS file containing similar data in a worksheet. The extracted data is saved to disk as a CSV text file.

Usage:

scriptname --code=abc --source=internet --filename=/path/to/file

Arguments:

*code*: Required. Determines which data to download (if the source is internet). If data is read from a file, it identifies what the data in the file refers to.

*filename*: Optional. Required only if the source is NOT internet.

*source*: Optional. Valid choices are 'internet' (default), 'xlsfile', or 'htmlfile'.
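The argument rules above could be sketched with the standard `argparse` module. This is only an illustration under assumptions: the function names `build_parser` and `parse_args` are hypothetical, and the check that `--filename` is present for file sources is one reading of the description, not a confirmed requirement.

```python
import argparse

# The three source values named in the argument description.
VALID_SOURCES = ("internet", "xlsfile", "htmlfile")


def build_parser():
    """Build a parser for --code, --source and --filename (hypothetical helper)."""
    parser = argparse.ArgumentParser(prog="scriptname")
    parser.add_argument("--code", required=True,
                        help="identifies the data to download or the data in the file")
    parser.add_argument("--source", choices=VALID_SOURCES, default="internet",
                        help="where the data comes from (default: internet)")
    parser.add_argument("--filename", default=None,
                        help="path to the local file; needed unless source is internet")
    return parser


def parse_args(argv=None):
    """Parse argv and enforce that file sources come with a filename."""
    parser = build_parser()
    args = parser.parse_args(argv)
    # Assumption: a missing filename for a file source is treated as a usage error.
    if args.source != "internet" and not args.filename:
        parser.error("--filename is required when --source is not 'internet'")
    return args
```

Calling `parse_args(["--code=abc"])` would fall back to the internet source, while `--source=xlsfile` without `--filename` exits with a usage error.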

**Notes**

Sample URL pages:

a. [url removed, login to view]

b. [url removed, login to view]

c. special case of the two above: [url removed, login to view]

Sample xls files to be parsed have been attached

Sample html files to be parsed have been attached

The format of the data stored in the local XLS and HTML files is similar to the HTML page here: [url removed, login to view] (an earlier version of the sample URL pages, and very similar to them).

The script saves data to a file using the following naming convention and column headers (in the CSV file):
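Since the saved HTML files hold tabular data, one possible starting point is a small extractor built on the standard library's `html.parser`. This is a minimal sketch: the class `TableExtractor` and function `extract_rows` are hypothetical names, and it assumes the data sits in plain, non-nested `<tr>`/`<td>`/`<th>` markup, which the real pages may not follow.

```python
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # cells of the row being read, or None outside a row
        self._cell = None   # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
            self._cell = None
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None and self._row is not None:
            # Join the fragments and collapse runs of whitespace.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:
                self.rows.append(self._row)
            self._row = None
            self._cell = None


def extract_rows(html_text):
    """Return all table rows found in html_text as lists of strings."""
    parser = TableExtractor()
    parser.feed(html_text)
    parser.close()
    return parser.rows
```

The XLS case would need a third-party reader and is not covered here. Nested tables or cells with unusual markup would also need extra handling.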

output file name: [url removed, login to view]

output file format (CSV headings):

Type, price, Strike, AQ Bid, AQ Ask, Bid Size, Bid, Ask, Ask Size, Last, Vol, Time (CET), Open, High, Low, Day volume, Total premium, O.I., Settl.

## Deliverables

1) All deliverables will be considered "work made for hire" under U.S. Copyright law. Employer will receive exclusive and complete copyrights to all work purchased. (No 3rd party components unless all copyright ramifications are explained AND AGREED TO by the employer on the site per the worker's Worker Legal Agreement).

## Platform

Cross platform

Python

Project ID: #3877581
