A dataset is the foundation of AI testing in Business Central. Since AI tests are inherently data-driven, datasets let you simulate various user scenarios and interactions. By using diverse and comprehensive datasets, you can thoroughly evaluate AI features to ensure they meet high standards for correctness, safety, and accuracy.
Create a dataset
Tip
The full source code for the example used in this article can be found in the Marketing Text Simple demo project.
AI tests in Business Central rely on datasets defined in either JSONL or YAML format. These datasets contain both test input and expected data values used by the AI Test Tool.
Define a JSONL dataset
While there's no rigid schema required, the AI Test Tool supports certain common elements, such as `test_setup` and `expected_data`. Using these keywords helps create a consistent structure.
Here's an example of a valid JSONL dataset:
```json
{"test_setup": {"item_no": "C-10000", "description": "Contoso Coffee Machine", "uom": "PCS"}, "expected_data": {"tagline_max_length": 20}}
{"test_setup": {"item_no": "C-10001", "description": "Contoso Toaster", "uom": "PCS"}, "expected_data": {"tagline_max_length": 20}}
{"test_setup": {"item_no": "C-10002", "description": "Contoso Microwave Oven", "uom": "PCS"}, "expected_data": {"tagline_max_length": 20}}
```
Each line represents a distinct test case with inputs and expected outputs.
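Because each line must be a self-contained JSON object, malformed lines or missing keys are easy to catch before running tests. The following is a minimal validation sketch (it isn't part of the AI Test Tool; the function name and the choice to treat `test_setup` and `expected_data` as required are assumptions based on the convention above):

```python
import json

def validate_jsonl_dataset(path: str) -> list[dict]:
    """Parse a JSONL dataset and check that each test case contains
    the conventional test_setup and expected_data keys."""
    cases = []
    with open(path, encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines between cases
            case = json.loads(line)  # raises ValueError on malformed JSON
            for key in ("test_setup", "expected_data"):
                if key not in case:
                    raise ValueError(f"line {line_no}: missing '{key}'")
            cases.append(case)
    return cases
```

Running this over a dataset before committing it gives fast feedback on typos that would otherwise surface as confusing test failures.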
Define a YAML dataset
You can also define the same dataset in YAML format for improved readability:
```yaml
tests:
  - test_setup:
      item_no: "C-10000"
      description: "Contoso Coffee Machine"
      uom: "PCS"
    expected_data:
      tagline_max_length: 20
  - test_setup:
      item_no: "C-10001"
      description: "Contoso Toaster"
      uom: "PCS"
    expected_data:
      tagline_max_length: 20
  - test_setup:
      item_no: "C-10002"
      description: "Contoso Microwave Oven"
      uom: "PCS"
    expected_data:
      tagline_max_length: 20
```
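Both formats describe the same list of test cases, so converting between them is mechanical. As a sketch of that equivalence (using plain Python dicts to mirror the YAML list, so no YAML library is assumed), each entry serializes to one JSONL line:

```python
import json

# The same three test cases as Python dicts, mirroring the YAML list above.
cases = [
    {"test_setup": {"item_no": "C-10000", "description": "Contoso Coffee Machine", "uom": "PCS"},
     "expected_data": {"tagline_max_length": 20}},
    {"test_setup": {"item_no": "C-10001", "description": "Contoso Toaster", "uom": "PCS"},
     "expected_data": {"tagline_max_length": 20}},
    {"test_setup": {"item_no": "C-10002", "description": "Contoso Microwave Oven", "uom": "PCS"},
     "expected_data": {"tagline_max_length": 20}},
]

# Each case becomes one line: the JSONL form of the YAML dataset.
jsonl = "\n".join(json.dumps(case) for case in cases)
print(jsonl)
```

Choose whichever format is easier to maintain; YAML tends to be more readable for hand-edited datasets, while JSONL is simpler to generate and append to programmatically.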
Get data for your tests
When creating AI tests, the data you use is as important as the AI features you're testing. Quality, consistency, and realism of data are critical for ensuring that your tests are comprehensive and meaningful.
Tip
For other considerations when creating datasets, see Best practices.
Sources of data for your tests
Public datasets:
- There are many publicly available datasets that you can use for AI testing.
Synthetic data:
- In cases where real-world data is difficult to obtain or too sensitive, you can generate synthetic data.
- Synthetic data can be especially useful for testing edge cases or generating large volumes of data quickly.
Customer or internal data:
- If you have access to anonymized customer data or internal business datasets, this can be a valuable source for realistic AI testing.
- Ensure that the data is appropriately anonymized and that you comply with privacy regulations.
Crowdsourced data:
- Certain platforms allow you to gather custom data by using crowdsourcing.
Simulated data from domain experts:
- In certain domains, domain experts can provide valuable insights into generating realistic and relevant test data.
- This approach is helpful when real-world data isn't readily available or too sensitive to share.
More tips for collecting test data
- Start with small datasets: Especially when testing new AI features, begin with small, manageable datasets to avoid overwhelming your testing process.
- Incrementally increase complexity: As you refine your tests, increase the complexity of the datasets to better simulate real-world scenarios.
- Document data sources: Always document the origins of your test data, including any transformations made, to ensure traceability and transparency in your testing process.
Related information
Business Central Copilot Test Toolkit
Build the Copilot capability in AL
Test the Copilot capability in AL
AI Test Tool
Write AI tests
Best practices for testing the Copilot capability