Web scrap project

In progress. Posted 5 years ago. Paid on delivery.

script language: PHP

front end: html/javascript

database: mysql

NO FRAMEWORKS

Table Structure:

ALL TABLES:

id (auto insert)

createddate (auto insert)

modifieddate (auto update)

Table A (entities)

name varchar

state varchar (2-letter US state code)

type varchar (city, county, school district, university, college)

url varchar

Table B (url)

datasource (url or query where the data came from)

url varchar

googleposition

maxlayers (defaults to 2)

statusname (values would include "found match", "no match")

Table C (curl data)

url varchar

retrievedhtml largetext

match varchar

statusname (values would include "undefined" (default), "good", "bad", "review")

The 1st script will:

1 parse the LEA_NAME column for unique values for "school district" names from here - [login to view URL], get the state, school district name, & url. 25,000 results

2 parse the "county names" from here - [login to view URL], get state name, convert it to 2 letter code, 3,098 records

3 parse the "city names" from here - [login to view URL] grab city name, usps (state). 29,000 records.

4 parse the "US college/university" from here - [login to view URL] - & grab college/university name, & url. 2,073 records.

5 populate table A with the name, type, & 2-letter state code, skipping duplicates. Convert any full state name to its 2-letter code.
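The normalize-and-deduplicate step above could look roughly like the sketch below. The posting asks for PHP; this Python version only illustrates the logic. The `STATE_CODES` mapping is deliberately partial, and the function names and the choice to skip rows with an unrecognized state are assumptions, not part of the posting.

```python
# Partial, illustrative mapping; a real run would need every state plus DC.
STATE_CODES = {
    "alabama": "AL",
    "california": "CA",
    "new york": "NY",
    "texas": "TX",
}


def to_state_code(state):
    """Return the 2-letter code for a state name or code, or None if unknown."""
    cleaned = " ".join(state.split()).lower()
    if len(cleaned) == 2 and cleaned.upper() in STATE_CODES.values():
        return cleaned.upper()
    return STATE_CODES.get(cleaned)


def dedupe_entities(rows):
    """Normalize (name, state, type) rows and drop duplicates, keeping order."""
    seen = set()
    unique = []
    for name, state, entity_type in rows:
        code = to_state_code(state)
        if code is None:
            continue  # unknown state: skipped here; a real script might log it
        # Assumption: two rows are duplicates when name (case-insensitive),
        # state code, and type all match.
        key = (name.strip().lower(), code, entity_type)
        if key not in seen:
            seen.add(key)
            unique.append((name.strip(), code, entity_type))
    return unique
```

In MySQL, the same duplicate rule could also be enforced by a unique index over name, state, and type, so that reruns of the one-time script stay harmless.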

The 1st script will be a one time script, run from linux cli.

The 2nd script will:

1 Loop through table A, & attempt to find the url that matches with a google search, if one was not present from the datasource. The logic must skip certain false positives such as a domain with the word "weather" or "census" or "zillow" or "google" in it or url with ".jpg" or ".asp"

2 populate the record in table B, with :

datasource = the url of the data source above

url = url (skip duplicate)

statusname = null

googleposition = 1-20 (first page of google results only)
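A minimal sketch of the false-positive filter and the 1-20 position rule described above, in Python for illustration only (the posting specifies PHP). It assumes the search results arrive as an ordered list of absolute URLs; how they are fetched from Google is out of scope here. The function names are hypothetical. One assumption worth flagging: ".jpg" and ".asp" are checked against the URL path only, so that a host such as www.aspen.gov is not rejected by accident.

```python
from urllib.parse import urlparse

BLOCKED_DOMAIN_WORDS = ("weather", "census", "zillow", "google")
BLOCKED_PATH_PARTS = (".jpg", ".asp")  # note: ".asp" also matches ".aspx"


def is_false_positive(url):
    """True if the domain contains a blocked word or the path a blocked extension."""
    parsed = urlparse(url.lower())
    if any(word in parsed.netloc for word in BLOCKED_DOMAIN_WORDS):
        return True
    return any(part in parsed.path for part in BLOCKED_PATH_PARTS)


def filter_results(urls, max_position=20):
    """Return (googleposition, url) pairs for acceptable, non-duplicate URLs."""
    kept = []
    seen = set()
    for position, url in enumerate(urls[:max_position], start=1):
        if is_false_positive(url) or url in seen:
            continue
        seen.add(url)
        kept.append((position, url))
    return kept
```

Keeping the blocked words in two tuples makes it cheap to add exemptions later and rerun the script, which the posting anticipates.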

The 2nd script will return 35,000 - 200,000 results.

The 2nd script will run periodically from the linux cli, on a crontab, & will be rerun in the future when additional exemptions are added.

The 2nd script should be multithreaded, & should be able to max out a connection of more than 100mb/second
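The multithreading requirement can be pictured as a worker pool that runs many lookups at once. The Python sketch below is illustrative only; `run_in_parallel` and its `max_workers` default are hypothetical, and the right pool size for a given connection would have to be found by measurement. In PHP, which has no built-in threads in a typical CLI setup, the `curl_multi_*` functions are one common way to issue requests concurrently.

```python
from concurrent.futures import ThreadPoolExecutor


def run_in_parallel(items, worker, max_workers=32):
    """Apply worker to each item on a thread pool.

    Returns (item, result_or_exception) pairs in input order.
    """
    def safe(item):
        try:
            return item, worker(item)
        except Exception as exc:  # one failed lookup should not stop the batch
            return item, exc

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(safe, items))
```

Returning the exception instead of raising it lets a cron run finish the batch and record failures for a later retry.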

The 3rd php script will:

1 Loop through table B, use curl to retrieve the web page

2 Loop through each of the child pages, for the value in the maxlayers column

3 Look for a particular pattern of text, including a case insensitive search for "bids" "request for proposal" "rfp" "rfq" "request for bids" "proposals"

4 Compare the curl returned html against the keywords

if there is a match - insert a record into table C with the url (skip non-unique urls), the retrieved html, & the keyword that caused the match, & update table B statusname to "found match"

if there is no match, increment table B maxlayers by 1, & update the statusname to "no match"
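The keyword comparison in steps 3 and 4 could be sketched as a single case-insensitive regular expression. This is Python for illustration only (the posting specifies PHP, where `preg_match_all` with the `i` flag plays the same role). Matching on whole words is an assumption, added so that "forbids" does not count as "bids"; the function name is hypothetical.

```python
import re

# Keyword list taken from the posting.
KEYWORDS = ("request for proposal", "request for bids", "proposals", "bids", "rfp", "rfq")

KEYWORD_RE = re.compile(
    r"\b(" + "|".join(re.escape(k) for k in KEYWORDS) + r")\b",
    re.IGNORECASE,
)


def find_keywords(html):
    """Return distinct matched keywords, lowercased, in order of first appearance."""
    found = []
    for match in KEYWORD_RE.finditer(html):
        keyword = match.group(1).lower()
        if keyword not in found:
            found.append(keyword)
    return found
```

An empty list corresponds to the "no match" branch; otherwise the first entry (or the whole list, joined) is a candidate for the match column in table C. The search runs over the raw html, so text inside tags and attributes also counts.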

Each record from table B may have multiple records in table C
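Because one start URL can yield several matching child pages, the crawl has to follow links down to the maxlayers depth. A rough Python sketch follows (illustration only; the posting specifies PHP with curl). It rests on two assumptions that are not in the posting: layer 0 is the start page itself, and only links on the same host are followed. `fetch` is a hypothetical callable that takes a URL and returns html, or None on failure.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url, fetch, max_layers=2):
    """Breadth-first crawl; returns a dict of visited url -> html."""
    host = urlparse(start_url).netloc
    pages = {}
    frontier = [start_url]
    for _layer in range(max_layers + 1):  # layer 0 is the start page
        next_frontier = []
        for url in frontier:
            if url in pages:
                continue  # already visited on this or an earlier layer
            html = fetch(url)
            if html is None:
                continue
            pages[url] = html
            collector = LinkCollector()
            collector.feed(html)
            for href in collector.links:
                child = urljoin(url, href).split("#")[0]  # drop fragments
                if urlparse(child).netloc == host and child not in pages:
                    next_frontier.append(child)
        frontier = next_frontier
    return pages
```

Each entry in the returned dict can then be checked against the keywords, producing zero or more table C rows per table B record.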

The 3rd script will be run periodically from the linux cli

The 3rd script should be multithreaded, & should be able to max out a connection of more than 100mb/second

The 4th script will be ANDROID PHONE FRIENDLY:

1 Define an sql query which should return the top 10 rows from table C, sorted by modifieddate ASC, WHERE statusname != "bad" AND statusname != "good"

2 Provide a simple html table view front end to review each of the urls, which should have columns for all values from table B.

3 An additional column will show an update status button, which when pressed shows the statusname values (as buttons) from table C, which when pressed, update the record in table C
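The review query and status update behind these steps can be tried out with a small in-memory database. The sketch below uses Python with SQLite purely as a stand-in for PHP and MySQL. It assumes the filter is meant to apply to table C's statusname column, since table C has no type column. The column the posting calls match is named `matched_keyword` here, because MATCH is a reserved word in MySQL and would need backtick quoting. Table and function names are hypothetical.

```python
import sqlite3

REVIEW_QUERY = """
    SELECT id, url, matched_keyword, statusname
    FROM table_c
    WHERE statusname NOT IN ('good', 'bad')
    ORDER BY modifieddate ASC
    LIMIT 10
"""

ALLOWED_STATUSES = ("undefined", "good", "bad", "review")


def open_demo_db():
    """Create an in-memory stand-in for table C."""
    conn = sqlite3.connect(":memory:")
    conn.execute("""
        CREATE TABLE table_c (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            url TEXT UNIQUE,
            matched_keyword TEXT,
            statusname TEXT NOT NULL DEFAULT 'undefined',
            modifieddate TEXT NOT NULL
        )""")
    return conn


def rows_to_review(conn):
    """Return up to 10 rows that still need a good/bad decision, oldest first."""
    return conn.execute(REVIEW_QUERY).fetchall()


def set_status(conn, row_id, status):
    """Record the reviewer's decision and refresh modifieddate."""
    if status not in ALLOWED_STATUSES:
        raise ValueError("unknown status: " + status)
    conn.execute(
        "UPDATE table_c SET statusname = ?, modifieddate = datetime('now') WHERE id = ?",
        (status, row_id),
    )
```

Validating the status against a fixed list before the update keeps a tampered button value from writing arbitrary text into the table.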

The 4th script view is intended for a quality check employee to review all results, & log whether each url matches our ultimate criteria or not.

MySQL PHP Web Scraping

Project ID: #18227988

About the project

15 proposals. Remote project.

15 freelancers are bidding on this job, with an average bid of $550

mingxiao2008

Hello, dear. How are you? I have checked your project description and am ready to discuss the project with you now. I am experienced in PHP, web scraping, and MySQL. I will work very hard and best for y...

$500 USD in 10 days
(81 reviews)
8.1
schoudhary1553

Hello Sir, I am the expert freelancer here. I am in 6th position throughout the world for delivering quality work. I have delivered more than 400 projects here with 100% client satisfaction. I have more than 5...

$600 USD in 10 days
(102 reviews)
7.1
bestit4u

Hi. I am very interested in your project, because I have much experience in such projects. I have good skills in programming languages including C/C++, C#, Java, PHP, ASP.NET, Python, and VB.NET. So I have expert and s...

$555 USD in 10 days
(114 reviews)
7.1
Mickelson

Hi, nice to meet you. I'm a scraping expert. My past work: YouTube comment scraping; real estate property list to CSV; job-site content to CSV; and scraping posts from Facebook, Twitter, and Instagram using Scrapy. In add...

$500 USD in 10 days
(119 reviews)
6.9
lightingdavid

Hey, how are you? I have reviewed "Web scrap project". I have good skills for this (MySQL, PHP, Web Scraping). I have been working in this field for 7 years. While we contract and work in our jobs, I will get paid o...

$500 USD in 10 days
(145 reviews)
6.3
saad2038

Hi, I can help you write scripts that parse the pages and save the data in the database based on the conditions and rules that you describe in the project description. I've read the description carefully that we need to wr...

$1000 USD in 10 days
(55 reviews)
6.3
naishodayo

How are you today? I am a super expert in this area. If you contact me, I can show you my past work too. Please contact me. Thank you.

$555 USD in 10 days
(4 reviews)
4.8
extravagantweb

I can scrape any data you require. Please contact me and we can discuss getting started, I'm eager to begin working for you.

$333 USD in 10 days
(6 reviews)
4.3
reosoftwares6

Hi, hope you are doing well! Thanks for sharing your project requirements with us. It will be our great pleasure to work on your project. I have checked your requirements, and yes, we can do it, because we already work on si...

$616 USD in 7 days
(0 reviews)
0.0