Frequently asked questions about Azure Cosmos DB for Apache Gremlin

Gremlin queries

How to evaluate the efficiency of Gremlin queries

The executionProfile() preview step can be used to provide an analysis of the query execution plan. This step needs to be added to the end of any Gremlin query. For example, you can add the step to the end of a g.V('example').out('relationship') query resulting in g.V('example').out('relationship').executionProfile().

The output of the profile shows how much time is spent obtaining the vertex objects, the edge objects, and the size of the working data set. This output is related to the standard cost measurements for Azure Cosmos DB queries.
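As a minimal sketch of the rule above, the following Java helper appends the step to a query string before the query is submitted. The class and method names (ProfiledQuery, withExecutionProfile) are hypothetical and not part of any Gremlin driver; only the executionProfile() step itself comes from the service.

```java
// Hypothetical helper, not part of any Gremlin driver.
final class ProfiledQuery {
    private static final String PROFILE_STEP = ".executionProfile()";

    // Returns the query with executionProfile() as its final step.
    static String withExecutionProfile(String query) {
        String trimmed = query.trim();
        // Avoid appending the step twice if the caller already added it.
        if (trimmed.endsWith(PROFILE_STEP)) {
            return trimmed;
        }
        return trimmed + PROFILE_STEP;
    }

    public static void main(String[] args) {
        // Prints: g.V('example').out('relationship').executionProfile()
        System.out.println(withExecutionProfile("g.V('example').out('relationship')"));
    }
}
```

The resulting string would then be submitted like any other Gremlin query.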

Other frequently asked questions

How are RU/s charged when running queries on a graph database?

All graph objects (vertices and edges) are stored as JSON documents in the backend. A Gremlin query can modify one or many graph objects at a time, and the cost depends directly on the vertices and edges processed by the query. This process works the same way as it does for all other Azure Cosmos DB APIs.

The RU charge depends on the working data set of the traversal, not the result set. Consider an example where a query obtains a single vertex as a result but needs to traverse several other objects along the way. In this example, the cost is based on all the graph objects involved in computing the result vertex.

What's the maximum scale that a graph database can have in Azure Cosmos DB for Apache Gremlin?

Azure Cosmos DB uses horizontal partitioning to automatically scale storage and throughput as needed. The number of partitions in a container determines its maximum throughput and storage capacity. For optimal performance at scale, follow the specific guidelines for API for Gremlin containers. To learn more about partitioning and best practices, see the partitioning in Azure Cosmos DB article.

For C#/.NET development, should I use the Microsoft.Azure.Graphs package or Gremlin.NET?

Azure Cosmos DB for Apache Gremlin uses the open-source drivers as the main connectors for the service, so the recommended option is to use the drivers supported by Apache. For C#/.NET development, that driver is Gremlin.NET.

How can I protect against injection attacks using Gremlin drivers?

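Most native Apache Gremlin drivers allow the option to provide a dictionary of parameters for query execution. Passing user input as parameter values, rather than concatenating it into the query text, keeps that input from being interpreted as Gremlin steps. This functionality is supported in both the Gremlin.Net and gremlin (Node.js) libraries.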
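The following Java sketch illustrates the idea of keeping the query text and the user input separate. The ParameterizedQuery class, the vertexById method, and the vertexId parameter name are hypothetical and chosen for illustration. The block only builds the template and the parameter dictionary; it doesn't connect to a database.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical holder for a query template and its parameter dictionary.
final class ParameterizedQuery {
    final String template;
    final Map<String, Object> bindings;

    private ParameterizedQuery(String template, Map<String, Object> bindings) {
        this.template = template;
        this.bindings = bindings;
    }

    // Builds a vertex lookup where the user-supplied id travels as a
    // parameter value instead of being concatenated into the query text.
    static ParameterizedQuery vertexById(String userSuppliedId) {
        Map<String, Object> bindings = new LinkedHashMap<>();
        bindings.put("vertexId", userSuppliedId);
        return new ParameterizedQuery("g.V(vertexId)", bindings);
    }

    public static void main(String[] args) {
        // Input that would alter the query if it were concatenated as text.
        ParameterizedQuery q = vertexById("x').drop().V('y");
        System.out.println(q.template); // the query text is unchanged by the input
        System.out.println(q.bindings); // the input appears only as a value
    }
}
```

With the Apache TinkerPop Java driver, the two parts would typically be passed together through the submit overload that accepts a parameter map, for example client.submit(q.template, q.bindings).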

Why am I getting the "Gremlin Query Compilation Error: Unable to find any method" error?

Azure Cosmos DB for Apache Gremlin supports a subset of the Gremlin surface area. For details on supported steps, see the Gremlin support article.

To resolve this error, rewrite your Gremlin queries using the supported steps, as Azure Cosmos DB provides all essential Gremlin functionality.

Why am I getting the "WebSocketException: The server returned status code '200' when status code '101' was expected" error?

This error is likely thrown when the wrong endpoint is being used.

The endpoint that generates this error has the following pattern: https://<account-name>.documents.azure.com:443/. This endpoint is actually the documents endpoint for your graph database.

The correct endpoint to use is the Gremlin endpoint, which has the following format: https://<account-name>.gremlin.cosmosdb.azure.com:443/.
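As a rough illustration of the two patterns above, this Java sketch detects a documents endpoint and rewrites it to the Gremlin endpoint for the same account. The GremlinEndpoint class and the account name myaccount are hypothetical; the host suffixes are taken from the formats described above.

```java
import java.net.URI;

// Hypothetical helper for spotting the wrong endpoint in configuration.
final class GremlinEndpoint {
    private static final String DOCUMENTS_SUFFIX = ".documents.azure.com";
    private static final String GREMLIN_SUFFIX = ".gremlin.cosmosdb.azure.com";

    // True when the URL points at the documents endpoint.
    static boolean isDocumentsEndpoint(String url) {
        String host = URI.create(url).getHost();
        return host != null && host.endsWith(DOCUMENTS_SUFFIX);
    }

    // Rewrites a documents endpoint to the Gremlin endpoint of the same
    // account. Any other URL is returned unchanged.
    static String toGremlinEndpoint(String url) {
        if (!isDocumentsEndpoint(url)) {
            return url;
        }
        String host = URI.create(url).getHost();
        String account = host.substring(0, host.length() - DOCUMENTS_SUFFIX.length());
        return "https://" + account + GREMLIN_SUFFIX + ":443/";
    }

    public static void main(String[] args) {
        // "myaccount" is a placeholder account name.
        System.out.println(toGremlinEndpoint("https://myaccount.documents.azure.com:443/"));
    }
}
```

A check like this could run at startup so that a misconfigured endpoint is reported before the WebSocket handshake fails.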

Why am I getting the "RequestRateIsTooLarge" error?

This error means that the allocated request units per second (RU/s) aren't enough to serve the query. For example, this error can occur when you run a query that obtains all vertices:

g.V()

This query attempts to retrieve all vertices from the graph, so its cost in request units (RUs) is at least the number of vertices in the graph. To run this query, increase the RU/s setting accordingly.

Why do my Gremlin driver connections get dropped eventually?

A Gremlin connection is made through a WebSocket connection. Although WebSocket connections don't have a specific time to live, Azure Cosmos DB for Apache Gremlin will terminate idle connections after 30 minutes of inactivity.
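One way an application might account for this limit is to track when it last sent a request and treat the connection as suspect once 30 minutes have passed. The Java sketch below shows only that bookkeeping. The IdleConnectionTracker class is hypothetical, it doesn't open or close any connection, and reconnecting is left to the caller.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical bookkeeping class; it doesn't manage a real connection.
final class IdleConnectionTracker {
    // Idle limit described above for Azure Cosmos DB for Apache Gremlin.
    static final Duration IDLE_LIMIT = Duration.ofMinutes(30);

    private Instant lastRequest;

    IdleConnectionTracker(Instant createdAt) {
        this.lastRequest = createdAt;
    }

    // Call each time a query is sent over the connection.
    void recordRequest(Instant now) {
        lastRequest = now;
    }

    // True when the connection has been idle for at least the limit,
    // so the service may already have closed it.
    boolean mayHaveBeenDropped(Instant now) {
        return Duration.between(lastRequest, now).compareTo(IDLE_LIMIT) >= 0;
    }

    public static void main(String[] args) {
        Instant start = Instant.now();
        IdleConnectionTracker tracker = new IdleConnectionTracker(start);
        // Simulate checking 45 minutes after the last request.
        System.out.println(tracker.mayHaveBeenDropped(start.plus(Duration.ofMinutes(45))));
    }
}
```

Passing the current time in as an argument keeps the class easy to test without waiting for real time to elapse.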

Why can't I use fluent API calls in the native Gremlin drivers?

Azure Cosmos DB for Apache Gremlin doesn't support fluent API calls yet. Fluent API calls rely on an internal formatting feature called bytecode support, which Azure Cosmos DB for Apache Gremlin currently doesn't provide. For this reason, the latest Gremlin-JavaScript driver also isn't supported.

How do I find the request unit charge for a query?

You can find the request unit (RU) charge for an Azure Cosmos DB for Apache Gremlin query using one of a few methods:

  • Use the Azure portal

    1. Sign in to the Azure portal.

    2. Create a new Azure Cosmos DB for Apache Gremlin account and seed it with data, or select an existing account that already contains data.

    3. Go to the Data Explorer pane, and then select the container you want to work on.

    4. Enter a valid query, and then select Execute Gremlin Query.

    5. Select Query Stats to display the actual request charge for the request you executed.

  • Use the .NET software development kit (SDK)

    1. Run a query to get the result as an object of type ResultSet<>:

       ResultSet<dynamic> results = client.SubmitAsync<dynamic>("g.V().count()").Result;

    2. Get the request charge from the results using the StatusAttributes property and the x-ms-request-charge key:

       double requestCharge = (double)results.StatusAttributes["x-ms-request-charge"];

  • Use the Java SDK

    1. Run a query to get the result as an object of type ResultSet:

       ResultSet results = client.submit("g.V().count()");

    2. Get the request charge from the results using the statusAttributes method, which returns a future, and the x-ms-request-charge key:

       Map<String, Object> attributes = results.statusAttributes().get();
       Double requestCharge = (Double) attributes.get("x-ms-request-charge");

The request charge is available under the x-ms-request-charge key in the response headers returned by the API for Gremlin.
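When a workload issues several queries, it can help to read the charge from each response and add the charges up. The Java sketch below does this over plain maps shaped like the status attributes described above. The RequestCharges class is hypothetical, and the sample values in main are made up for illustration rather than measured.

```java
import java.util.List;
import java.util.Map;

// Hypothetical helper that reads RU charges from status-attribute maps.
final class RequestCharges {
    static final String CHARGE_KEY = "x-ms-request-charge";

    // Returns the RU charge from one response's attributes, or 0.0 if absent.
    static double chargeOf(Map<String, Object> attributes) {
        Object value = attributes.get(CHARGE_KEY);
        // Accept any numeric type rather than assuming Double.
        return value instanceof Number ? ((Number) value).doubleValue() : 0.0;
    }

    // Sums the RU charges across several responses.
    static double totalCharge(List<Map<String, Object>> responses) {
        double total = 0.0;
        for (Map<String, Object> attributes : responses) {
            total += chargeOf(attributes);
        }
        return total;
    }

    public static void main(String[] args) {
        // Made-up attribute maps standing in for two query responses.
        Map<String, Object> first = Map.of(CHARGE_KEY, 2.5);
        Map<String, Object> second = Map.of(CHARGE_KEY, 4.0);
        System.out.println(totalCharge(List.of(first, second)));
    }
}
```

In a real application, each map would come from the driver's status attributes for a completed query.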