Text Analysis Project using PySpark ML

Closed. Posted 5 years ago. Paid on delivery.

I want someone to do a theme analysis of around 5 million comments from a video sharing website, using the PySpark ML library as the main tool. I will provide the dataset. The work environment should be Databricks Community Edition (you can create an account for free), and the deliverable is a Databricks notebook.

The data is at “video_creator – commentor_id – comment” granularity. What I want you to do is the following:

1. Remove comments that are not written in English.

2. For each commentor_id, concatenate all of his/her comments into one feature called “all_comments”. That is, aggregate the dataset to “commentor_id – all_comments” granularity.

3. Transform the “all_comments” feature using the Word2Vec module of the PySpark ML library (not the MLlib library, as I want to do everything using DataFrames).

4. Do a clustering of the transformed “all_comments” feature using the LDA module of PySpark ML.

5. Generate the most frequent words for each cluster identified in step 4. I will interpret the results myself, so you don’t need to worry about that.
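Steps 1 and 2 could be sketched roughly as follows. This is only an illustration under stated assumptions: the column names `commentor_id` and `comment` are guessed from the granularity described above, and `looks_english` is a crude stand-in heuristic (share of ASCII letters), not a real language detector. Other Latin-script languages would pass it, so a proper language identification library may be needed in practice.

```python
import re

# Crude heuristic (an assumption, not a real language detector): treat a
# comment as English when most of its letters are plain ASCII letters.
_ASCII_LETTER = re.compile(r"[A-Za-z]")


def looks_english(text, threshold=0.9):
    """Return True when at least `threshold` of the letters are ASCII."""
    if not text:
        return False
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return False
    ascii_letters = sum(1 for ch in letters if _ASCII_LETTER.match(ch))
    return ascii_letters / len(letters) >= threshold


def aggregate_comments(df):
    """Steps 1 and 2: keep English-looking comments, then build all_comments.

    Assumes `df` has string columns `commentor_id` and `comment`
    (hypothetical names, not confirmed by the posting).
    """
    # Imported lazily so the pure-Python helper above works without Spark.
    from pyspark.sql import functions as F
    from pyspark.sql.types import BooleanType

    is_english = F.udf(looks_english, BooleanType())
    return (
        df.filter(is_english(F.col("comment")))
          .groupBy("commentor_id")
          .agg(F.concat_ws(" ", F.collect_list("comment")).alias("all_comments"))
    )
```

In a Databricks notebook, `aggregate_comments(raw_df)` would return one row per commentor_id, where `raw_df` is a hypothetical name for the loaded dataset.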

So overall, it’s a straightforward task of data cleaning, aggregation, and application of standard PySpark ML modules.
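A minimal sketch of how steps 3 to 5 might be wired together with the DataFrame-based `pyspark.ml` API is shown below. It assumes an input DataFrame with `commentor_id` and `all_comments` columns. One deliberate deviation from the brief: Spark's LDA is documented as taking term-count vectors, so the sketch feeds it `CountVectorizer` output and computes the Word2Vec vectors alongside, rather than running LDA on the Word2Vec output. All parameter values (`num_topics`, `vectorSize`, `vocabSize`, and so on) are placeholders, and the helper names are hypothetical.

```python
def dominant_topic(distribution):
    """Return the index of the largest weight in a topic distribution."""
    return max(range(len(distribution)), key=lambda i: distribution[i])


def fit_topics(aggregated, num_topics=10, top_n=20):
    """Steps 3 to 5 on a DataFrame with commentor_id and all_comments."""
    # Imported lazily so dominant_topic can be used without Spark installed.
    from pyspark.ml import Pipeline
    from pyspark.ml.clustering import LDA
    from pyspark.ml.feature import (CountVectorizer, RegexTokenizer,
                                    StopWordsRemover, Word2Vec)
    from pyspark.sql import Window, functions as F
    from pyspark.sql.types import IntegerType

    pipeline = Pipeline(stages=[
        # Split on non-word characters; RegexTokenizer lowercases by default.
        RegexTokenizer(inputCol="all_comments", outputCol="tokens",
                       pattern=r"\W+"),
        StopWordsRemover(inputCol="tokens", outputCol="words"),
        # Step 3: DataFrame-based Word2Vec (one averaged vector per row).
        Word2Vec(inputCol="words", outputCol="w2v",
                 vectorSize=100, minCount=5),
        # Assumption: LDA is given term counts, not the Word2Vec vectors.
        CountVectorizer(inputCol="words", outputCol="counts",
                        vocabSize=50000),
        # Step 4: LDA writes its output to the "topicDistribution" column.
        LDA(k=num_topics, featuresCol="counts", maxIter=20),
    ])
    scored = pipeline.fit(aggregated).transform(aggregated)

    # Assign each commentor to the topic with the highest weight.
    to_cluster = F.udf(dominant_topic, IntegerType())
    word_counts = (
        scored.withColumn("cluster", to_cluster("topicDistribution"))
              .select("cluster", F.explode("words").alias("word"))
              .groupBy("cluster", "word").count()
    )
    # Step 5: keep the top_n most frequent words within each cluster.
    rank = F.row_number().over(
        Window.partitionBy("cluster").orderBy(F.desc("count")))
    return word_counts.withColumn("rank", rank).filter(F.col("rank") <= top_n)
```

Counting words per dominant topic keeps step 5 as literal word frequencies; the fitted LDA model's `describeTopics` method would give the highest-weighted terms per topic instead, which is a different ranking.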

I estimate this project will take 2 to 3 hours of programming for someone good at Python and PySpark. I hope to get the project done in 3 days; up to 6 days is acceptable. If you place a bid, I will share the link to the data file with you. I have no instructions beyond the five steps listed above.

Data Science, Python, Spark

Project ID: #17903811

About the project

7 proposals. Remote project.

7 freelancers are bidding on this job, with an average bid of $271

shivampanchal

I have good hands-on experience with advanced R and Python, BI tools and technologies, AI, and Big Data. I have good knowledge of DL/ML algorithms and have also developed dashboards and web applications. My area of e…

$250 USD in 3 days
(66 reviews)
7.0
DarkKnight2206

Hello! I am a Python developer. I looked at your project and it seems interesting. I have all the skills required for this project. Ping me to discuss in detail.

$140 USD in 2 days
(34 reviews)
5.6
suyashdhoot

Hi, I am a very experienced statistician, data scientist and academic writer. I have completed several PhD-level thesis projects involving advanced statistical analysis of data. I have worked with data from several comp…

$500 USD in 3 days
(31 reviews)
5.9
raghavajay3

Kindly let's discuss over chat.

$222 USD in 6 days
(34 reviews)
4.8
james880606

Hello. I have read your job description carefully. I have 7 years of Python experience. I would like to discuss the project with you via chat. Thank you, James.

$155 USD in 3 days
(3 reviews)
2.4
psdhillon

I have been working as a data scientist for more than 4 years, during which I implemented numerous machine learning algorithms to solve varied business problems. Moreover, to gain other domain expertise, I have been activ…

$388 USD in 7 days
(2 reviews)
2.2