Machine Learning with Big Data: Introduction
Performing machine learning with Spark MLlib requires some specific pre-processing, which depends on the type of task:
- In the case of most classification and regression algorithms, you want to get your data into a column of type Double to represent the label and a column of type Vector (either dense or sparse) to represent the features.
- In the case of recommendation, you want to get your data into a column of users, a column of items (say movies or books), and a column of ratings.
- In the case of unsupervised learning, a column of type Vector (either dense or sparse) is needed to represent the features.
- In the case of graph analytics, you will want a DataFrame of vertices and a DataFrame of edges.
In this series, we will look at how to perform each of these steps in PySpark. We will also look at several options for operationalizing the ML model in Databricks.