Transform and enrich data with AI functions (preview)

Important

This feature is in preview.

With Microsoft Fabric, all business professionals (from developers to analysts) can get more value from their enterprise data through generative AI, using experiences like Copilot and Fabric data agents. With a new set of AI functions for data engineering, Fabric users can also apply industry-leading large language models (LLMs) to transform and enrich data.

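AI functions use the power of generative AI for summarization, classification, text generation, and more. With a single line of code, users can:

  • Calculate similarity with ai.similarity.
  • Categorize text with ai.classify.
  • Detect sentiment with ai.analyze_sentiment.
  • Extract entities with ai.extract.
  • Fix grammar with ai.fix_grammar.
  • Summarize text with ai.summarize.
  • Translate text with ai.translate.
  • Answer custom user prompts with ai.generate_response.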

You can incorporate these functions into data-science and data-engineering workflows, whether you work with pandas or Spark. They require no detailed configuration, no complex infrastructure management, and no specific technical expertise.

Prerequisites

Note

  • AI functions are supported in Fabric Runtime 1.3 and later.
  • AI functions use the gpt-4o-mini (2024-07-18) model by default. Learn more about billing and consumption rates.
  • Most of the AI functions are optimized for use on English-language texts.

Getting started with AI functions

When you use AI functions with pandas, the OpenAI package must be installed. In the Fabric Python environment (as opposed to PySpark), the AI functions package must also be installed.

No installation is required when you use PySpark, because AI functions are preinstalled in the PySpark environment.

The following code cells include all the necessary installation commands.

# The pandas AI functions require OpenAI version 1.30 or later. This cell pins version 1.30.
%pip install -q --force-reinstall openai==1.30 2>/dev/null

# AI functions are preinstalled on the Fabric PySpark runtime.
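Because the install command above suppresses error output, it can be useful to confirm which OpenAI version actually ended up in the environment. The following is a minimal sketch that uses only the standard library (importlib.metadata); parse_version and meets_minimum are hypothetical helpers written for this example, and they compare only the numeric release segments. If the packaging library is available, packaging.version.Version is a more robust choice.

```python
from importlib.metadata import PackageNotFoundError, version


def parse_version(text):
    """Turn a version string like '1.30.0' into a tuple of ints, e.g. (1, 30, 0).

    Only the leading numeric release segments are kept; suffixes such as
    'rc1' or '.post1' are ignored, which is enough for a minimum-version check.
    """
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if ch.isdigit():
                digits += ch
            else:
                break
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(installed, minimum="1.30"):
    """Return True if the installed version is at least the minimum version."""
    a, b = parse_version(installed), parse_version(minimum)
    width = max(len(a), len(b))
    # Pad with zeros so that '1.30' and '1.30.0' compare as equal.
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


try:
    installed = version("openai")
    print(f"openai {installed}: meets 1.30 minimum = {meets_minimum(installed)}")
except PackageNotFoundError:
    print("openai is not installed in this environment.")
```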

The following code cell imports the AI functions library and its dependencies for use with pandas. It also imports tqdm, an optional Python library that displays progress bars to track the status of every AI function call.

# Required imports
import synapse.ml.aifunc as aifunc
import pandas as pd
import openai

# Optional import for progress bars
from tqdm.auto import tqdm
tqdm.pandas()

Apply AI functions

Each of the following functions invokes the built-in AI endpoint in Fabric to transform and enrich data with a single line of code. You can apply AI functions to pandas DataFrames or Spark DataFrames; the samples in this article use pandas.

Tip

Learn how to customize the configuration of AI functions.
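Every AI function call goes to the built-in endpoint and is billed by consumption, so a column with many repeated values can cost more than necessary. The sketch below is a general pandas pattern, not a feature of the AI functions library: apply_to_unique is a hypothetical helper that runs a function once per distinct value and maps the results back onto every row. The fn argument stands in for any Series-level AI function; a trivial stub is used here so the example runs outside Fabric.

```python
import pandas as pd


def apply_to_unique(series, fn):
    """Run fn once over the distinct non-null values of series and broadcast the results.

    fn must accept a pandas Series and return a Series of the same length,
    for example a Series-level AI function in a Fabric notebook.
    """
    # Collect each distinct value once; nulls are skipped and stay null in the output.
    unique_values = pd.Series(series.dropna().unique())
    results = fn(unique_values)
    # Build a value -> result lookup, then map it back onto every original row.
    lookup = dict(zip(unique_values, results))
    return series.map(lookup)


# Stub standing in for an AI function; it records how many values each call receives.
calls = []


def fake_ai_function(series):
    calls.append(len(series))
    return series.str.upper()


df_demo = pd.DataFrame({"text": ["hi", "hi", "bye", None]})
df_demo["out"] = apply_to_unique(df_demo["text"], fake_ai_function)
print(df_demo)
print(calls)  # [2]: one call covering the 2 distinct values

# In a Fabric notebook, you might pass a Series-level AI function instead, e.g.:
# df["corrections"] = apply_to_unique(df["text"], lambda s: s.ai.fix_grammar())
```

This trades one pass over the data for fewer endpoint calls; it only helps when the function's output for a value doesn't depend on the rest of the row.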

Calculate similarity with ai.similarity

The ai.similarity function invokes AI to compare input text values with a single common text value, or with pairwise text values in another column. The output similarity scores are relative, and they range from -1 (opposites) to 1 (identical). A score of 0 indicates that the values are unrelated in meaning. Get detailed instructions about the use of ai.similarity.

Sample usage

# This code uses AI. Always review output for mistakes. 
# Read terms: https://azure.microsoft.com/support/legal/preview-supplemental-terms/.

df = pd.DataFrame([ 
        ("Bill Gates", "Microsoft"), 
        ("Satya Nadella", "Toyota"), 
        ("Joan of Arc", "Nike") 
    ], columns=["names", "companies"])
    
df["similarity"] = df["names"].ai.similarity(df["companies"])
display(df)
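The -1 to 1 scale is the same range as cosine similarity, a common way to compare text embeddings. To make the endpoints of the scale concrete, the following sketch computes cosine similarity for small, made-up vectors. It illustrates the scale only; this article doesn't say how ai.similarity computes its scores, so don't read the sketch as its implementation.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between vectors u and v, in the range [-1, 1]."""
    dot = sum(a * b for a, b in zip(u, v))
    denom = math.sqrt(sum(a * a for a in u) * sum(b * b for b in v))
    if denom == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / denom


# Toy 3-dimensional vectors, made up for illustration (not real embeddings).
a = [1.0, 2.0, 0.0]
print(cosine_similarity(a, [2.0, 4.0, 0.0]))    # same direction: 1.0
print(cosine_similarity(a, [-1.0, -2.0, 0.0]))  # opposite direction: -1.0
print(cosine_similarity(a, [-2.0, 1.0, 0.0]))   # orthogonal (unrelated): 0.0
```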

Categorize text with ai.classify

The ai.classify function invokes AI to categorize input text according to custom labels you choose. For more detailed instructions about the use of ai.classify, see this article.

Sample usage

# This code uses AI. Always review output for mistakes. 
# Read terms: https://azure.microsoft.com/support/legal/preview-supplemental-terms/.

df = pd.DataFrame([
        "This duvet, lovingly hand-crafted from all-natural fabric, is perfect for a good night's sleep.",
        "Tired of friends judging your baking? With these handy-dandy measuring cups, you'll create culinary delights.",
        "Enjoy this *BRAND NEW CAR!* A compact SUV perfect for the professional commuter!"
    ], columns=["descriptions"])

df["category"] = df["descriptions"].ai.classify("kitchen", "bedroom", "garage", "other")
display(df)
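Because the categories are generated by a model, it's worth confirming that every returned value is one of the labels you passed in. A minimal pandas sketch: find_unexpected_labels is a hypothetical helper, and the category values below are hand-typed stand-ins, not real ai.classify output.

```python
import pandas as pd

LABELS = ["kitchen", "bedroom", "garage", "other"]


def find_unexpected_labels(categories, allowed):
    """Return the rows whose value is missing or not one of the allowed labels."""
    # Nulls are never "in" the allowed list, so they are flagged too.
    mask = ~categories.isin(allowed)
    return categories[mask]


# Hand-typed stand-in values; in a notebook, use df["category"] from ai.classify.
category = pd.Series(["bedroom", "kitchen", "Garage", None])
print(find_unexpected_labels(category, LABELS))
```

The comparison is case-sensitive, so a value like "Garage" is flagged; normalize case first if that's acceptable for your labels.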

Detect sentiment with ai.analyze_sentiment

The ai.analyze_sentiment function invokes AI to identify whether the emotional state expressed by input text is positive, negative, mixed, or neutral. If AI can't make this determination, the output is left blank. For more detailed instructions about the use of ai.analyze_sentiment, see this article.

Sample usage

# This code uses AI. Always review output for mistakes. 
# Read terms: https://azure.microsoft.com/support/legal/preview-supplemental-terms/.

df = pd.DataFrame([
        "The cleaning spray permanently stained my beautiful kitchen counter. Never again!",
        "I used this sunscreen on my vacation to Florida, and I didn't get burned at all. Would recommend.",
        "I'm torn about this speaker system. The sound was high quality, though it didn't connect to my roommate's phone.",
        "The umbrella is OK, I guess."
    ], columns=["reviews"])

df["sentiment"] = df["reviews"].ai.analyze_sentiment()
display(df)
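Since ai.analyze_sentiment leaves the output blank when it can't determine a sentiment, count the blanks explicitly when you summarize results. In this sketch, the sentiment values are hand-typed stand-ins rather than real model output, and both nulls and empty strings are treated as blank, because this article doesn't specify which representation a blank takes.

```python
import pandas as pd

# Hand-typed stand-in values; in a notebook, use df["sentiment"] from ai.analyze_sentiment.
sentiment = pd.Series(["negative", "positive", "mixed", "neutral", None, ""])

# Replace nulls and empty strings with an explicit "(blank)" marker so they appear in the counts.
is_present = sentiment.notna() & (sentiment != "")
normalized = sentiment.where(is_present, "(blank)")
counts = normalized.value_counts()
print(counts)
```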

Extract entities with ai.extract

The ai.extract function invokes AI to scan input text and extract specific types of information that are designated by labels you choose (for example, locations or names). For more detailed instructions about the use of ai.extract, see this article.

Sample usage

# This code uses AI. Always review output for mistakes. 
# Read terms: https://azure.microsoft.com/support/legal/preview-supplemental-terms/.

df = pd.DataFrame([
        "MJ Lee lives in Tucson, AZ, and works as a software engineer for Microsoft.",
        "Kris Turner, a nurse at NYU Langone, is a resident of Jersey City, New Jersey."
    ], columns=["descriptions"])

df_entities = df["descriptions"].ai.extract("name", "profession", "city")
display(df_entities)
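To keep each extracted entity next to the text it came from, you can join the two DataFrames column-wise. This sketch assumes that ai.extract returns a DataFrame with one column per label and the same index as the input Series; the sample above assigns the result to a DataFrame-like name, but check the detailed article to confirm. The df_entities values here are hand-typed stand-ins.

```python
import pandas as pd

df = pd.DataFrame(
    [
        "MJ Lee lives in Tucson, AZ, and works as a software engineer for Microsoft.",
        "Kris Turner, a nurse at NYU Langone, is a resident of Jersey City, New Jersey.",
    ],
    columns=["descriptions"],
)

# Hand-typed stand-in for df["descriptions"].ai.extract("name", "profession", "city").
df_entities = pd.DataFrame(
    {
        "name": ["MJ Lee", "Kris Turner"],
        "profession": ["software engineer", "nurse"],
        "city": ["Tucson", "Jersey City"],
    },
    index=df.index,
)

# Align on the shared index and place the entity columns beside the source text.
combined = pd.concat([df, df_entities], axis=1)
print(combined)
```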

Fix grammar with ai.fix_grammar

The ai.fix_grammar function invokes AI to correct the spelling, grammar, and punctuation of input text. For more detailed instructions about the use of ai.fix_grammar, see this article.

Sample usage

# This code uses AI. Always review output for mistakes. 
# Read terms: https://azure.microsoft.com/support/legal/preview-supplemental-terms/.

df = pd.DataFrame([
        "There are an error here.",
        "She and me go weigh back. We used to hang out every weeks.",
        "The big picture are right, but you're details is all wrong."
    ], columns=["text"])

df["corrections"] = df["text"].ai.fix_grammar()
display(df)
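When you review the corrections, it helps to see exactly which words changed. This sketch uses the standard library's difflib.SequenceMatcher to list word-level edits between an original string and its corrected version; word_changes is a hypothetical helper, and the corrected string is a hand-typed stand-in for ai.fix_grammar output.

```python
import difflib


def word_changes(original, corrected):
    """List (old_words, new_words) pairs for every word-level change."""
    a, b = original.split(), corrected.split()
    matcher = difflib.SequenceMatcher(a=a, b=b)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # Opcodes other than "equal" are replacements, deletions, or insertions.
        if tag != "equal":
            changes.append((" ".join(a[i1:i2]), " ".join(b[j1:j2])))
    return changes


# Hand-typed stand-in for a corrected value.
print(word_changes("There are an error here.", "There is an error here."))  # [('are', 'is')]

# In a notebook, you could apply it row by row, e.g.:
# [word_changes(o, c) for o, c in zip(df["text"], df["corrections"])]
```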

Summarize text with ai.summarize

The ai.summarize function invokes AI to generate summaries of input text (either values from a single column of a DataFrame, or row values across all the columns). For more detailed instructions about the use of ai.summarize, see this article.

Sample usage

# This code uses AI. Always review output for mistakes.
# Read terms: https://azure.microsoft.com/support/legal/preview-supplemental-terms/.

df = pd.DataFrame([
        ("Microsoft Teams", "2017",
        """
        The ultimate messaging app for your organization—a workspace for real-time 
        collaboration and communication, meetings, file and app sharing, and even the 
        occasional emoji! All in one place, all in the open, all accessible to everyone.
        """),
        ("Microsoft Fabric", "2023",
        """
        An enterprise-ready, end-to-end analytics platform that unifies data movement, 
        data processing, ingestion, transformation, and report building into a seamless, 
        user-friendly SaaS experience. Transform raw data into actionable insights.
        """)
    ], columns=["product", "release_year", "description"])

df["summaries"] = df["description"].ai.summarize()
display(df)
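The description values in this sample are triple-quoted, so they carry leading newlines and indentation. If you'd rather send compact text, or you want to preview what a row-level summary across all the columns has to work with, you can collapse the whitespace and join each row's values yourself. The rows below are shortened versions of the sample rows, row_text is a hypothetical column name, and joining with "; " is an arbitrary choice that isn't necessarily how ai.summarize combines columns.

```python
import pandas as pd

df = pd.DataFrame(
    [
        ("Microsoft Teams", "2017", "\n    The ultimate messaging app\n    for your organization.\n"),
        ("Microsoft Fabric", "2023", "\n    An end-to-end analytics\n    platform.\n"),
    ],
    columns=["product", "release_year", "description"],
)

# Collapse runs of spaces and newlines into single spaces and trim the ends.
df["description"] = df["description"].str.split().str.join(" ")

# Join every column of a row into one labeled string.
df["row_text"] = df.apply(
    lambda row: "; ".join(f"{col}: {val}" for col, val in row.items()),
    axis=1,
)
print(df["row_text"].tolist())
```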

Translate text with ai.translate

The ai.translate function invokes AI to translate input text to a new language of your choice. For more detailed instructions about the use of ai.translate, see this article.

Sample usage

# This code uses AI. Always review output for mistakes. 
# Read terms: https://azure.microsoft.com/support/legal/preview-supplemental-terms/.

df = pd.DataFrame([
        "Hello! How are you doing today?", 
        "Tell me what you'd like to know, and I'll do my best to help.", 
        "The only thing we have to fear is fear itself."
    ], columns=["text"])

df["translations"] = df["text"].ai.translate("spanish")
display(df)
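To translate the same column into several languages, you can call the function once per target language and store each result in its own column. In this sketch, translate_many is a hypothetical helper, and its translate argument is whatever callable does the work: in a Fabric notebook that could be lambda s, lang: s.ai.translate(lang), matching the sample above. A trivial stub stands in for it here so the example runs anywhere.

```python
import pandas as pd


def translate_many(df, column, languages, translate):
    """Return a copy of df with one translations_<language> column per target language.

    translate(series, language) must return a Series aligned with series.
    """
    out = df.copy()
    for language in languages:
        out[f"translations_{language}"] = translate(df[column], language)
    return out


# Stub standing in for the AI call: it only tags each value with the target language.
def fake_translate(series, language):
    return "[" + language + "] " + series


df = pd.DataFrame(["Hello! How are you doing today?"], columns=["text"])
result = translate_many(df, "text", ["spanish", "french"], fake_translate)
print(result.columns.tolist())

# In a Fabric notebook, you might write:
# result = translate_many(df, "text", ["spanish", "french"], lambda s, lang: s.ai.translate(lang))
```

Each target language is a separate pass over the column, so the number of AI calls grows with the number of languages.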

Answer custom user prompts with ai.generate_response

The ai.generate_response function invokes AI to generate custom text based on your own instructions. For more detailed instructions about the use of ai.generate_response, see this article.

Sample usage

# This code uses AI. Always review output for mistakes. 
# Read terms: https://azure.microsoft.com/support/legal/preview-supplemental-terms/.

df = pd.DataFrame([
        "Scarves",
        "Snow pants",
        "Ski goggles"
    ], columns=["product"])

df["response"] = df.ai.generate_response("Write a short, punchy email subject line for a winter sale.")
display(df)
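The prompt asks for a "short, punchy" subject line, but generated text doesn't always follow length instructions and may arrive with extra whitespace or wrapping quotation marks. A minimal post-processing sketch: tidy_responses is a hypothetical helper, the 60-character limit is an assumed threshold for illustration, and the responses are hand-typed stand-ins for ai.generate_response output.

```python
import pandas as pd

MAX_SUBJECT_CHARS = 60  # assumed limit for illustration; choose one that suits your use


def tidy_responses(responses, max_chars=MAX_SUBJECT_CHARS):
    """Trim whitespace and wrapping quotes, then flag values longer than max_chars."""
    cleaned = responses.str.strip().str.strip("\"'")
    return pd.DataFrame({"subject": cleaned, "too_long": cleaned.str.len() > max_chars})


# Hand-typed stand-ins for df["response"] from ai.generate_response.
responses = pd.Series([
    '"Wrap Up Warm: Scarves Up to 40% Off!"',
    "  Snow Pants Sale Starts Now  ",
    "Don't miss our winter sale on ski goggles, the perfect gift for every skier and snowboarder in your life this season",
])
print(tidy_responses(responses))
```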