This article shows how to build a machine learning model with SynapseML and how SynapseML simplifies complex machine learning tasks. You build a training pipeline with a featurization stage and a LightGBM regression stage that predicts ratings from book review text. You then use a prebuilt model through SynapseML to handle the same kind of task in one step.
Prerequisites
Get a Microsoft Fabric subscription. Or, sign up for a free Microsoft Fabric trial.
Sign in to Microsoft Fabric.
Use the experience switcher on the bottom left side of your home page to switch to Fabric.
Prepare resources
Set up the tools and resources you need to build the model and pipeline.
- Create a new notebook.
- Attach your notebook to a lakehouse. In Explorer, expand Lakehouses, and then select Add.
- Get an Azure AI services key by following the instructions in Quickstart: Create a multi-service resource for Azure AI services.
- Create an Azure Key Vault instance and add your Azure AI services key to the key vault as a secret.
- Record your key vault name and secret name. You need this information to run the one-step transform later in this article.
Set up the environment
In your notebook, import SynapseML libraries and initialize your Spark session.
from pyspark.sql import SparkSession
from synapse.ml.core.platform import *
spark = SparkSession.builder.getOrCreate()
Load a dataset
Load the book review dataset, limit it to 1,000 rows, and split it into train and test sets.
train, test = (
    spark.read.parquet(
        "wasbs://publicwasb@mmlspark.blob.core.windows.net/BookReviewsFromAmazon10K.parquet"
    )
    .limit(1000)
    .cache()
    .randomSplit([0.8, 0.2])
)
display(train)
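Because randomSplit assigns each row to a split independently, the 0.8 and 0.2 weights are targets rather than exact proportions, and the split can differ between runs unless you pass a seed. The following pure-Python sketch illustrates that idea under simplified assumptions. The weighted_split helper is hypothetical and written only for this illustration; it isn't a Spark or SynapseML API and doesn't reproduce Spark's implementation.

```python
import random
from itertools import accumulate


def weighted_split(rows, weights, seed=None):
    """Assign each row to one split by drawing a uniform number per row.

    Illustrative only: mirrors the idea behind DataFrame.randomSplit,
    not Spark's actual implementation.
    """
    total = sum(weights)
    # Cumulative upper bounds of each split after normalizing the weights.
    bounds = list(accumulate(w / total for w in weights))
    rng = random.Random(seed)
    splits = [[] for _ in weights]
    for row in rows:
        draw = rng.random()
        for index, upper in enumerate(bounds):
            # The last split catches any draw not claimed earlier.
            if draw < upper or index == len(bounds) - 1:
                splits[index].append(row)
                break
    return splits


train_rows, test_rows = weighted_split(range(1000), [0.8, 0.2], seed=42)
# The sizes are close to 800 and 200, but not guaranteed to match exactly.
print(len(train_rows), len(test_rows))
```

In the notebook, passing a seed, for example randomSplit([0.8, 0.2], seed=42), should make the split repeatable as long as the underlying data and its partitioning don't change.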
Create the training pipeline
Create a pipeline that featurizes the data by using TextFeaturizer from the synapse.ml.featurize.text library and predicts a rating by using the LightGBMRegressor estimator.
from pyspark.ml import Pipeline
from synapse.ml.featurize.text import TextFeaturizer
from synapse.ml.lightgbm import LightGBMRegressor
model = Pipeline(
    stages=[
        TextFeaturizer(inputCol="text", outputCol="features"),
        LightGBMRegressor(featuresCol="features", labelCol="rating", dataTransferMode="bulk"),
    ]
).fit(train)
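To give a sense of what a text featurization stage produces, here's a minimal pure-Python sketch of the hashing trick, a common way to turn tokens into a fixed-length numeric vector. It's an illustration under simplifying assumptions, not SynapseML's implementation: the hash_features helper and the num_features value are hypothetical, and TextFeaturizer has its own tokenization, hashing, and weighting options.

```python
import re
import zlib


def hash_features(text, num_features=16):
    """Map text to a fixed-length vector of token counts via hashing."""
    vector = [0] * num_features
    # Simplified tokenizer: lowercase, then keep runs of letters and digits.
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    for token in tokens:
        # crc32 is stable across Python processes, unlike the built-in hash().
        bucket = zlib.crc32(token.encode("utf-8")) % num_features
        vector[bucket] += 1
    return vector


features = hash_features("Great book, great story")
# The vector always has num_features entries, whatever the text length.
print(features)
```

Every review maps to a vector of the same length, which is what lets a downstream regressor train on free-form text. Different tokens can collide in the same bucket, so real featurizers typically use a much larger feature space than this sketch.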
Predict the output of the test data
Call the transform method on the model to predict ratings for the test data, and display the output as a DataFrame.
display(model.transform(test))
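One way to judge predictions like these is the root mean squared error (RMSE) between the rating label and the predicted value. The sketch below computes it in plain Python on made-up numbers. The rmse helper is hypothetical and the sample values are illustrative only; they aren't output from this pipeline.

```python
import math


def rmse(labels, predictions):
    """Return the root mean squared error between two equal-length sequences."""
    if len(labels) != len(predictions):
        raise ValueError("labels and predictions must have the same length")
    if not labels:
        raise ValueError("at least one value is required")
    squared_errors = [(y - p) ** 2 for y, p in zip(labels, predictions)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Made-up ratings and predictions, for illustration only.
sample_ratings = [5.0, 3.0, 4.0, 1.0]
sample_predictions = [4.0, 3.0, 5.0, 2.0]
error = rmse(sample_ratings, sample_predictions)
print(round(error, 3))
```

For Spark DataFrames, RegressionEvaluator from pyspark.ml.evaluation offers the same metric without collecting the data to the driver.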
Use Azure AI services to transform data in one step
Alternatively, for tasks like this one that have a prebuilt solution, you can use SynapseML's integration with Azure AI services to transform your data in one step. Run the following code with these replacements:
- Replace <secret-name> with the name of your Azure AI services key secret.
- Replace <key-vault-name> with the name of your key vault.
from synapse.ml.services import TextSentiment
from synapse.ml.core.platform import find_secret
model = TextSentiment(
    textCol="text",
    outputCol="sentiment",
    subscriptionKey=find_secret("<secret-name>", "<key-vault-name>"),
).setLocation("eastus")  # Use the region of your Azure AI services resource.
display(model.transform(test))