Build a model with SynapseML

This article shows how to build a machine learning model with SynapseML and how SynapseML simplifies complex machine learning tasks. You build a training pipeline with a featurization stage and a LightGBM regression stage that predicts ratings from book review text. You then use a prebuilt model through SynapseML to transform the same data in one step.

Prerequisites

Prepare resources

Set up the tools and resources you need to build the model and pipeline.

  1. Create a new notebook.
  2. Attach your notebook to a lakehouse. In Explorer, expand Lakehouses, and then select Add.
  3. Get an Azure AI services key by following the instructions in Quickstart: Create a multi-service resource for Azure AI services.
  4. Create an Azure Key Vault instance and add your Azure AI services key to the key vault as a secret.
  5. Record your key vault name and secret name. You need this information to run the one-step transform later in this article.

Set up the environment

In your notebook, import SynapseML libraries and initialize your Spark session.

from pyspark.sql import SparkSession
from synapse.ml.core.platform import *

spark = SparkSession.builder.getOrCreate()

Load a dataset

Load your dataset and split it into train and test sets.

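train, test = (
    # Read the public book review sample from Azure Blob Storage.
    spark.read.parquet(
        "wasbs://publicwasb@mmlspark.blob.core.windows.net/BookReviewsFromAmazon10K.parquet"
    )
    .limit(1000)  # keep the example small
    .cache()
    .randomSplit([0.8, 0.2])  # roughly 80% train, 20% test
)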

display(train)
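
The split sizes from randomSplit are approximate, because each row is assigned to a split at random rather than by counting rows. The following plain-Python sketch illustrates that idea on stand-in data. It is a conceptual illustration under that assumption, not Spark's implementation, and the `random_split` helper and its `seed` argument are hypothetical names introduced here.

```python
import random


def random_split(rows, weights, seed=None):
    """Assign each row to one split by drawing a uniform number per row."""
    total = sum(weights)
    # Cumulative upper bounds of each split's share, e.g. [0.8, 1.0].
    bounds = []
    running = 0.0
    for weight in weights:
        running += weight / total
        bounds.append(running)
    bounds[-1] = 1.0  # guard against floating-point drift

    rng = random.Random(seed)
    splits = [[] for _ in weights]
    for row in rows:
        draw = rng.random()  # uniform in [0.0, 1.0)
        for i, upper in enumerate(bounds):
            if draw < upper:
                splits[i].append(row)
                break
    return splits


rows = list(range(1000))  # stand-in for the 1,000 review rows
train_rows, test_rows = random_split(rows, [0.8, 0.2], seed=42)
print(len(train_rows), len(test_rows))
```

In the notebook itself, passing a seed, for example `.randomSplit([0.8, 0.2], seed=42)`, should make the split repeatable across runs.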

Create the training pipeline

Create a pipeline that featurizes the review text with TextFeaturizer from the synapse.ml.featurize.text library and predicts a rating with the LightGBMRegressor estimator.

from pyspark.ml import Pipeline
from synapse.ml.featurize.text import TextFeaturizer
from synapse.ml.lightgbm import LightGBMRegressor

model = Pipeline(
    stages=[
        TextFeaturizer(inputCol="text", outputCol="features"),
        LightGBMRegressor(featuresCol="features", labelCol="rating", dataTransferMode="bulk")
    ]
).fit(train)
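
The featurization stage has to turn free text into a numeric vector before the regressor can use it. One common way to do that is the hashing trick over term counts. The sketch below shows that technique in plain Python. Treat it as an assumption about the general approach, not a description of what TextFeaturizer does internally. The `hashed_term_frequencies` helper, the tiny `num_features` value, and the sample review are all hypothetical.

```python
import zlib
from collections import Counter


def hashed_term_frequencies(text, num_features=16):
    """Map a text to a sparse {index: count} vector with the hashing trick."""
    tokens = text.lower().split()  # naive whitespace tokenizer
    counts = Counter()
    for token in tokens:
        # crc32 is deterministic across runs, unlike Python's built-in hash().
        index = zlib.crc32(token.encode("utf-8")) % num_features
        counts[index] += 1
    return dict(counts)


review = "great book great story"  # hypothetical review text
vector = hashed_term_frequencies(review)
print(vector)
```

Each token lands in a fixed bucket, so the vector length does not depend on vocabulary size. Distinct tokens can collide in the same bucket, which is why real feature spaces are much larger than 16.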

Predict the output of the test data

Call the transform method on the fitted model to predict ratings for the test data and display the output as a dataframe.

display(model.transform(test))
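
To judge how close the predicted ratings are to the actual ones, a regression metric such as root mean squared error (RMSE) can help. The article does not include an evaluation step, so the following is only a minimal sketch of the metric on hypothetical values. The `rmse` helper and the sample numbers are illustrative.

```python
import math


def rmse(labels, predictions):
    """Root mean squared error between true labels and predictions."""
    if not labels or len(labels) != len(predictions):
        raise ValueError("labels and predictions must be non-empty and equal in length")
    squared_errors = [(y - p) ** 2 for y, p in zip(labels, predictions)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


ratings = [5.0, 3.0, 4.0, 1.0]    # hypothetical true ratings
predicted = [4.5, 3.5, 4.0, 2.0]  # hypothetical model outputs
score = rmse(ratings, predicted)
print(round(score, 3))
```

On the Spark dataframe itself, `RegressionEvaluator` from `pyspark.ml.evaluation` with `labelCol="rating"` and `metricName="rmse"` computes the same metric, assuming the predictions are in the default `prediction` column.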

Use Azure AI services to transform data in one step

Alternatively, for these kinds of tasks that have a prebuilt solution, you can use SynapseML's integration with Azure AI services to transform your data in one step. Run the following code with these replacements:

  • Replace <secret-name> with the name of your Azure AI services key secret.
  • Replace <key-vault-name> with the name of your key vault.

from synapse.ml.services import TextSentiment
from synapse.ml.core.platform import find_secret

model = TextSentiment(
    textCol="text",
    outputCol="sentiment",
    subscriptionKey=find_secret("<secret-name>", "<key-vault-name>")
).setLocation("eastus")

display(model.transform(test))