Edit

Share via


Manage indexing in Azure Cosmos DB for MongoDB

Azure Cosmos DB for MongoDB lets you use indexing to speed up query performance. This article shows you how to manage and optimize indexes for faster data retrieval and better efficiency.

Indexing for MongoDB server version 3.6 and higher

Azure Cosmos DB for MongoDB server version 3.6+ automatically indexes the _id field and the shard key (only in sharded collections). The API enforces the uniqueness of the _id field per shard key.

The API for MongoDB works differently from Azure Cosmos DB for NoSQL, which indexes all fields by default.

Editing indexing policy

Edit your indexing policy in Data Explorer in the Azure portal. Add single field and wildcard indexes from the indexing policy editor in Data Explorer:

Screenshot of indexing policy editor in Azure Cosmos DB for MongoDB.

Note

You can't create compound indexes using the indexing policy editor in the Data Explorer.

Index types

Single field

Create an index on any single field. The sort order of the single field index doesn't matter. Use the following command to create an index on the field name:

db.coll.createIndex({name:1})

Create the same single field index on name in the Azure portal:

Screenshot of adding a name index in the indexing policy editor.

A query uses multiple single field indexes where available. Create up to 500 single field indexes per collection.

Compound indexes (MongoDB server version 3.6+)

In the API for MongoDB, use compound indexes with queries that sort multiple fields at once. For queries with multiple filters that don't need to sort, create multiple single field indexes instead of a compound index to save on indexing costs.

A compound index or single field indexes for each field in the compound index result in the same performance for filtering in queries.

Compound indexes on nested fields aren't supported by default because of limitations with arrays. If a nested field doesn't have an array, the index works as intended. If a nested field has an array anywhere on the path, that value is ignored in the index.

For example, a compound index containing people.dylan.age works in this case because there's no array on the path:

{
  "people": {
    "dylan": {
      "name": "Dylan",
      "age": "25"
    },
    "reed": {
      "name": "Reed",
      "age": "30"
    }
  }
}

The same compound index doesn't work in this case because there's an array in the path:

{
  "people": [
    {
      "name": "Dylan",
      "age": "25"
    },
    {
      "name": "Reed",
      "age": "30"
    }
  ]
}

Enable this feature for your database account by enabling the 'EnableUniqueCompoundNestedDocs' capability.

Note

You can't create compound indexes on arrays.

The following command creates a compound index on the fields name and age:

db.coll.createIndex({name:1,age:1})

You can use compound indexes to sort efficiently on multiple fields at once, as shown in the following example:

db.coll.find().sort({name:1,age:1})

You can also use the preceding compound index to efficiently sort on a query with the opposite sort order on all fields. Here's an example:

db.coll.find().sort({name:-1,age:-1})

However, the sequence of the paths in the compound index must exactly match the query. Here's an example of a query that would require an extra compound index:

db.coll.find().sort({age:1,name:1})

Multikey indexes

Azure Cosmos DB creates multikey indexes to index content in arrays. If you index a field with an array value, Azure Cosmos DB automatically indexes each element in the array.

Geospatial indexes

Many geospatial operators benefit from geospatial indexes. Azure Cosmos DB for MongoDB supports 2dsphere indexes. The API doesn't support 2d indexes yet.

Here's an example of creating a geospatial index on the location field:

db.coll.createIndex({ location : "2dsphere" })

Text indexes

Azure Cosmos DB for MongoDB doesn't support text indexes. For text search queries on strings, use Azure AI Search integration with Azure Cosmos DB.

Wildcard indexes

Use wildcard indexes to support queries against unknown fields. Imagine a collection that has data about families.

Here's part of an example document in that collection:

"children": [
  {
    "firstName": "Henriette Thaulow",
    "grade": "5"
  }
]

Here's another example with a different set of properties in children:

"children": [
  {
    "familyName": "Merriam",
    "givenName": "Jesse",
    "pets": [
      { "givenName": "Goofy" },
      { "givenName": "Shadow" }
    ]
  },
  {
    "familyName": "Merriam",
    "givenName": "John",
  }
]

Documents in this collection can have many different properties. To index all data in the children array, create separate indexes for each property or create one wildcard index for the entire children array.

Create a wildcard index

Use the following command to create a wildcard index on any properties within children:

db.coll.createIndex({"children.$**" : 1})
  • Unlike in MongoDB, wildcard indexes can support multiple fields in query predicates. There's no difference in query performance if you use a single wildcard index instead of creating a separate index for each property.

Create the following index types using wildcard syntax:

  • Single field
  • Geospatial

Indexing all properties

Create a wildcard index on all fields with the following command:

db.coll.createIndex( { "$**" : 1 } )

Create wildcard indexes using Data Explorer in the Azure portal:

Add wildcard index in indexing policy editor

Note

If you're just starting development, start with a wildcard index on all fields. This approach simplifies development and makes it easier to optimize queries.

Documents with many fields can have a high Request Unit (RU) charge for writes and updates. If you have a write-heavy workload, use individually indexed paths instead of wildcards.

Limitations

Wildcard indexes don't support any of the following index types or properties:

  • Compound

  • TTL

  • Unique

  • Unlike in MongoDB, in Azure Cosmos DB for MongoDB you can't use wildcard indexes for:

  • Creating a wildcard index that includes multiple specific fields

    db.coll.createIndex(
      { "$**" : 1 },
      { "wildcardProjection " :
        {
          "children.givenName" : 1,
          "children.grade" : 1
        }
      }
    )
    
  • Creating a wildcard index that excludes multiple specific fields

    db.coll.createIndex(
      { "$**" : 1 },
      { "wildcardProjection" :
        {
          "children.givenName" : 0,
          "children.grade" : 0
        }
      }
    )
    

As an alternative, create multiple wildcard indexes.

Index properties

The following operations are common for accounts that use wire protocol version 4.0 and earlier versions. Learn more about supported indexes and indexed properties.

Unique indexes

Unique indexes help make sure that two or more documents don't have the same value for indexed fields.

Run the following command to create a unique index on the student_id field:

db.coll.createIndex( { "student_id" : 1 }, {unique:true} )
{
  "_t" : "CreateIndexesResponse",
  "ok" : 1,
  "createdCollectionAutomatically" : false,
  "numIndexesBefore" : 1,
  "numIndexesAfter" : 4
}

For sharded collections, provide the shard (partition) key to create a unique index. All unique indexes on a sharded collection are compound indexes, and one of the fields is the shard key. The shard key should be the first field in the index definition.

Run the following commands to create a sharded collection named coll (with university as the shard key) and a unique index on the student_id and university fields:

db.runCommand({shardCollection: db.coll._fullName, key: { university: "hashed"}});
{
  "_t" : "ShardCollectionResponse",
  "ok" : 1,
  "collectionsharded" : "test.coll"
}
db.coll.createIndex( { "university" : 1, "student_id" : 1 }, {unique:true});
{
  "_t" : "CreateIndexesResponse",
  "ok" : 1,
  "createdCollectionAutomatically" : false,
  "numIndexesBefore" : 3,
  "numIndexesAfter" : 4
}

If you omit the "university":1 clause in the preceding example, you see the following error message:

cannot create unique index over {student_id : 1.0} with shard key pattern { university : 1.0 }

Limitations

Create unique indexes while the collection is empty.

Unique indexes on nested fields aren't supported by default because of limitations with arrays. If your nested field doesn't have an array, the index works as intended. If your nested field has an array anywhere on the path, that value is ignored in the unique index, and uniqueness isn't preserved for that value.

For example, a unique index on people.tom.age works in this case because there's no array on the path:

{
  "people": {
    "tom": {
      "age": "25"
    },
    "mark": {
      "age": "30"
    }
  }
}

But doesn't work in this case because there's an array in the path:

{
  "people": {
    "tom": [
      {
        "age": "25"
      }
    ],
    "mark": [
      {
        "age": "30"
      }
    ]
  }
}

This feature can be enabled for your database account by enabling the 'EnableUniqueCompoundNestedDocs' capability.

TTL indexes

To let documents expire in a collection, create a time-to-live (TTL) index. A TTL index is an index on the _ts field with an expireAfterSeconds value.

Example:

db.coll.createIndex({"_ts":1}, {expireAfterSeconds: 10})

The preceding command deletes any documents in the db.coll collection that were modified more than 10 seconds ago.

Note

The _ts field is specific to Azure Cosmos DB and isn't accessible from MongoDB clients. It's a reserved (system) property that contains the time stamp of the document's last modification.

Track index progress

Version 3.6+ of Azure Cosmos DB for MongoDB supports the currentOp() command to track index progress on a database instance. This command returns a document with information about in-progress operations on a database instance. Use the currentOp command to track all in-progress operations in native MongoDB. In Azure Cosmos DB for MongoDB, this command only tracks the index operation.

Here are some examples of how to use the currentOp command to track index progress:

  • Get index progress for a collection:

    db.currentOp({"command.createIndexes": <collectionName>, "command.$db": <databaseName>})
    
  • Get index progress for all collections in a database:

    db.currentOp({"command.$db": <databaseName>})
    
  • Get index progress for all databases and collections in an Azure Cosmos DB account:

    db.currentOp({"command.createIndexes": { $exists : true } })
    

Examples of index progress output

Index progress details show the percentage of progress for the current index operation. Here are examples of the output document format for different stages of index progress:

  • An index operation on a "foo" collection and "bar" database that's 60 percent complete has the following output document. The Inprog[0].progress.total field shows 100 as the target completion percentage.

    {
      "inprog": [
        {
          ...
          "command": {
            "createIndexes": foo
            "indexes": [],
            "$db": bar
          },
          "msg": "Index Build (background) Index Build (background): 60 %",
          "progress": {
            "done": 60,
            "total": 100
          },
          ...
        }
      ],
      "ok": 1
    }
    
  • If an index operation just started on a "foo" collection and "bar" database, the output document can show 0 percent progress until it reaches a measurable level.

    {
      "inprog": [
        {
          ...
          "command": {
            "createIndexes": foo
            "indexes": [],
            "$db": bar
          },
          "msg": "Index Build (background) Index Build (background): 0 %",
          "progress": {
            "done": 0,
            "total": 100
          },
          ...
        }
      ],
      "ok": 1
    }
    
  • When the index operation finishes, the output document shows empty inprog operations.

    {
      "inprog" : [],
      "ok" : 1
    }
    

Background index updates

Index updates always run in the background, no matter what value you set for the Background index property. Because index updates use Request Units (RUs) at a lower priority than other database actions, index changes don't cause downtime for writes, updates, or deletes.

Adding a new index doesn't affect read availability. Queries use new indexes only after the index transformation finishes. During the transformation, the query engine keeps using existing indexes, so you see similar read performance as before you start the indexing change. Adding new indexes doesn't risk incomplete or inconsistent query results.

If you remove indexes and immediately run queries that filter on those dropped indexes, results can be inconsistent and incomplete until the index transformation finishes. The query engine doesn't provide consistent or complete results for queries that filter on newly removed indexes. Most developers don't drop indexes and then immediately query them, so this situation is unlikely.

Note

You can track index progress.

reIndex command

The reIndex command recreates all indexes on a collection. In rare cases, running the reIndex command can fix query performance or other index issues in your collection. If you're experiencing indexing issues, try recreating the indexes with the reIndex command.

Run the reIndex command using the following syntax:

db.runCommand({ reIndex: <collection> })

Use the following syntax to check if running the reIndex command improves query performance in your collection:

db.runCommand({"customAction":"GetCollection",collection:<collection>, showIndexes:true})

Sample output:

{
  "database": "myDB",
  "collection": "myCollection",
  "provisionedThroughput": 400,
  "indexes": [
    {
      "v": 1,
      "key": {
        "_id": 1
      },
      "name": "_id_",
      "ns": "myDB.myCollection",
      "requiresReIndex": true
    },
    {
      "v": 1,
      "key": {
        "b.$**": 1
      },
      "name": "b.$**_1",
      "ns": "myDB.myCollection",
      "requiresReIndex": true
    }
  ],
  "ok": 1
}

If reIndex improves query performance, requiresReIndex is true. If reIndex doesn't improve query performance, this property is omitted.

Migrate collections with indexes

You can only create unique indexes when the collection has no documents. Popular MongoDB migration tools try to create unique indexes after importing the data. To work around this issue, manually create the corresponding collections and unique indexes instead of letting the migration tool try. You achieve this behavior for mongorestore by using the --noIndexRestore flag in the command line.

Indexing for MongoDB version 3.2

Indexing features and defaults differ for Azure Cosmos DB accounts that use version 3.2 of the MongoDB wire protocol. Check your account's version at feature-support-36.md#protocol-support, and upgrade to version 3.6 at upgrade-version.md.

If you're using version 3.2, this section highlights key differences from versions 3.6 and later.

Dropping default indexes (version 3.2)

Unlike versions 3.6 and later, Azure Cosmos DB for MongoDB version 3.2 indexes every property by default. Use the following command to drop these default indexes for a collection (coll):

db.coll.dropIndexes()
{ "_t" : "DropIndexesResponse", "ok" : 1, "nIndexesWas" : 3 }

After you drop the default indexes, add more indexes as you do in version 3.6 and later.

Compound indexes (version 3.2)

Compound indexes reference multiple fields in a document. To create a compound index, upgrade to version 3.6 or 4.0 at upgrade-version.md.

Wildcard indexes (version 3.2)

To create a wildcard index, upgrade to version 4.0 or 3.6 at upgrade-version.md.

Next steps