I have created all the functions in [login to view URL] using pandas DataFrames. I first loaded the data into a PySpark DataFrame, converted it back to a pandas DataFrame, and then wrote all the functions against pandas.
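To make the current setup concrete, this is a minimal sketch of the pattern described above; the file name "dataset.csv" is a placeholder, since the real script and data are in the attachments:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("adhoc_pandas_bridge").getOrCreate()

    # Load with Spark, then immediately drop down to pandas for the actual work.
    spark_df = spark.read.csv("dataset.csv", header=True, inferSchema=True)
    pandas_df = spark_df.toPandas()  # everything after this point is plain pandas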
This solution is ad hoc; I do not know PySpark.
So I need help converting all the functions into pure PySpark functions. The data can be found on Kaggle - [login to view URL]
I have also attached the dataset along with the main Python files. There are only 7-8 functions to convert to PySpark. You should use only the PySpark DataFrame API for all transformations, and finally save the processed DataFrame to CSV and Parquet, as I did in [login to view URL] - a rough sketch of the target shape is below.
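The sketch below only illustrates the general shape I expect (read, transform with the DataFrame API, write to CSV and Parquet); the column names "amount" and "category" and the transformation itself are hypothetical, since the real functions and schema are in the attached files:

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("pure_pyspark_pipeline").getOrCreate()

    df = spark.read.csv("dataset.csv", header=True, inferSchema=True)

    # Example transformation kept entirely in the PySpark DataFrame API
    # (placeholder logic - the real functions should be translated one-to-one).
    processed = (
        df.dropna()
          .withColumn("amount", F.col("amount").cast("double"))
          .groupBy("category")
          .agg(F.sum("amount").alias("total_amount"))
    )

    # Persist the processed DataFrame to both formats, as the original script does.
    processed.write.mode("overwrite").option("header", True).csv("output_csv")
    processed.write.mode("overwrite").parquet("output_parquet")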
I have tested with Spark version 3.2.1.
An expert should be able to do this within 1 hour, or 2 hours at most.
I worked with PySpark 3 years ago and have carried out countless projects and processes in it; currently, at my company, I am migrating all the code from PySpark to Python on GCP.