This should be very easy for someone who has done this before. Basically, we need a simple Python script that extracts a list of items from one web page (a few fields per item, including the item's detail URL) and then extracts additional details from each item's detail URL. The source consists of one index page plus one detail page per item. There are fewer than 100 items.
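For anyone bidding, here is a minimal sketch of that two-stage flow (index page first, then one request per detail page) using only the Python standard library. It assumes the index page links to each detail page through `<a class="item-link" href="...">`. That class name and the helper names `DetailLinkParser`, `fetch` and `crawl` are hypothetical and would need to be adapted to the real markup.

```python
import time
import urllib.request
from html.parser import HTMLParser
from urllib.parse import urljoin

# Hypothetical markup: assumes each item on the index page links to its
# detail page via <a class="item-link" href="...">. Adjust to the real site.
ITEM_LINK_CLASS = "item-link"


class DetailLinkParser(HTMLParser):
    """Collect the href of every <a> tag that carries ITEM_LINK_CLASS."""

    def __init__(self, link_class=ITEM_LINK_CLASS):
        super().__init__()
        self.link_class = link_class
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attrs = dict(attrs)  # attrs arrives as a list of (name, value) pairs
        classes = (attrs.get("class") or "").split()
        if self.link_class in classes and attrs.get("href"):
            self.links.append(attrs["href"])


def fetch(url):
    """Download one page; assumes the site serves UTF-8 text."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return resp.read().decode("utf-8", errors="replace")


def crawl(index_url, fetch_page=fetch, delay=1.0):
    """Stage 1: read the index page. Stage 2: fetch each item's detail page.

    Returns {detail_url: html}. With fewer than 100 items, a sequential loop
    with a short pause between requests is enough.
    """
    parser = DetailLinkParser()
    parser.feed(fetch_page(index_url))
    pages = {}
    for href in parser.links:
        detail_url = urljoin(index_url, href)  # resolve relative links
        if detail_url in pages:                # skip duplicate links
            continue
        pages[detail_url] = fetch_page(detail_url)
        time.sleep(delay)                      # be polite to the server
    return pages
```

This sketch only collects the detail links; the other index-page fields (name, short description, price) would be parsed from the same index HTML in stage 1. Passing `fetch_page` as a parameter keeps the crawl logic testable without network access.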
I'd like this done in the next few days.
Please let me know if you have any questions.
This is what the scraper's final JSON should look like:
{
  "items": {
    "item": [
      {
        "id": "1",
        "name": "Item Name 1",
        "description_short": "Sample item",
        "price": "$100",
        "detail_url": "http://some/page",
        "description_long": "Several sentences will be extracted here.",
        "features": [
          "Feature 1",
          "Feature 2",
          "Feature 3"
        ],
        "need": [
          "Need 1",
          "Need 2",
          "Need 3"
        ],
        "faq": [
          {
            "id": "0",
            "question": "question 1",
            "answer": "answer 1"
          },
          {
            "id": "1",
            "question": "question 2",
            "answer": "answer 2"
          },
          {
            "id": "2",
            "question": "question 3",
            "answer": "answer 3"
          }
        ]
      },
      {
        "id": "2",
        "name": "Item Name 2",
        "description_short": "Sample item",
        "price": "$100",
        "detail_url": "http://some/page",
        "description_long": "Several sentences will be extracted here.",
        "features": [
          "Feature 1",
          "Feature 2",
          "Feature 3"
        ],
        "need": [
          "Need 1",
          "Need 2",
          "Need 3"
        ],
        "faq": [
          {
            "id": "0",
            "question": "question 1",
            "answer": "answer 1"
          },
          {
            "id": "1",
            "question": "question 2",
            "answer": "answer 2"
          },
          {
            "id": "2",
            "question": "question 3",
            "answer": "answer 3"
          }
        ]
      },
      {
        "id": "3",
        "name": "Item Name 3",
        "description_short": "Sample item",
        "price": "$100",
        "detail_url": "http://some/page",
        "description_long": "Several sentences will be extracted here.",
        "features": [
          "Feature 1",
          "Feature 2",
          "Feature 3"
        ],
        "need": [
          "Need 1",
          "Need 2",
          "Need 3"
        ],
        "faq": [
          {
            "id": "0",
            "question": "question 1",
            "answer": "answer 1"
          },
          {
            "id": "1",
            "question": "question 2",
            "answer": "answer 2"
          },
          {
            "id": "2",
            "question": "question 3",
            "answer": "answer 3"
          }
        ]
      }
    ]
  }
}
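A minimal sketch of assembling scraped fields into this structure, assuming the per-item parsing has already produced plain dicts. The function names `build_output` and `write_json`, the default filename `items.json`, and the input record layout (FAQ entries as `(question, answer)` pairs) are hypothetical. The ids follow the sample: item ids start at "1", FAQ ids start at "0", and both are stored as strings.

```python
import json


def build_output(records):
    """Wrap scraped records in the {"items": {"item": [...]}} shape.

    Hypothetical input: each record is a dict with the scalar fields plus
    "features", "need" and "faq" (a list of (question, answer) pairs).
    """
    items = []
    for n, rec in enumerate(records, start=1):
        items.append({
            "id": str(n),  # item ids start at "1" in the sample
            "name": rec["name"],
            "description_short": rec["description_short"],
            "price": rec["price"],  # kept as the display string, e.g. "$100"
            "detail_url": rec["detail_url"],
            "description_long": rec["description_long"],
            "features": list(rec.get("features", [])),
            "need": list(rec.get("need", [])),
            "faq": [
                {"id": str(i), "question": q, "answer": a}  # faq ids start at "0"
                for i, (q, a) in enumerate(rec.get("faq", []))
            ],
        })
    return {"items": {"item": items}}


def write_json(records, path="items.json"):
    """Write the final JSON file (hypothetical default filename)."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(build_output(records), f, indent=4, ensure_ascii=False)
```

Keeping the ids as strings and the price as its display text matches the sample exactly; `ensure_ascii=False` keeps any non-ASCII characters from the pages readable instead of escaping them.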