Performing bulk operations with GraphQL
With the GraphQL Admin API, you can use bulk operations to asynchronously fetch data in bulk. The API is designed to reduce complexity when dealing with pagination of large volumes of data. You can bulk query any connection field that's defined by the GraphQL Admin API schema.
Instead of manually paginating results and managing a client-side throttle, you can run a bulk query operation. Shopify’s infrastructure does the hard work of executing your query, and then provides you with a URL where you can download all of the data.
The GraphQL Admin API supports querying a single top-level field, and then selecting the fields that you want returned. You can also nest connections, such as variants on products.
Apps are limited to running a single bulk operation at a time per shop. When the operation is complete, the results are delivered in the form of a JSONL file that Shopify makes available at a URL.
Bulk query overview
The complete flow for running bulk queries is covered later, but below are some small code snippets that you can use to get started quickly.
Step 1. Submit a query
Run a bulkOperationRunQuery mutation and specify what information you want from Shopify.
The following mutation queries the products connection and returns each product's ID and title.
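As a minimal sketch of such a mutation, the Ruby snippet below holds the mutation document in a string and wraps it in a JSON request body. The constant and helper names are illustrative, and sending the request (endpoint, access token, HTTP client) is deliberately left out.

```ruby
require "json"

# Bulk mutation that queries the products connection for each product's ID and title.
# The inner query is passed as a triple-quoted (multi-line) GraphQL string.
BULK_PRODUCTS_MUTATION = <<~GRAPHQL
  mutation {
    bulkOperationRunQuery(
      query: """
      {
        products {
          edges {
            node {
              id
              title
            }
          }
        }
      }
      """
    ) {
      bulkOperation {
        id
        status
      }
      userErrors {
        field
        message
      }
    }
  }
GRAPHQL

# Hypothetical helper: wraps a GraphQL document in the JSON body that a
# GraphQL-over-HTTP POST request typically carries.
def graphql_request_body(document)
  JSON.generate({ query: document })
end

puts graphql_request_body(BULK_PRODUCTS_MUTATION)
```

Keeping the document in a heredoc avoids having to escape the triple quotes by hand; JSON.generate takes care of escaping when the request body is built.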
Step 2. Poll your operation’s status
While the operation is running, you need to poll to see its progress using the currentBulkOperation field. The objectCount field increments to indicate the operation's progress, and the status field returns whether the operation is completed.
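A polling query along these lines might look like the sketch below. The selected fields are the ones this guide mentions; the list of in-progress statuses and the helper name are assumptions for illustration, so check the status reference for the full set.

```ruby
# Query for the bulk operation of the authenticated app and shop.
CURRENT_BULK_OPERATION_QUERY = <<~GRAPHQL
  {
    currentBulkOperation {
      id
      status
      errorCode
      objectCount
      url
    }
  }
GRAPHQL

# Statuses treated as "still in progress" in this sketch (an assumption).
IN_PROGRESS_STATUSES = %w[CREATED RUNNING CANCELING].freeze

# Takes the currentBulkOperation object from a parsed response (a Hash).
def still_running?(operation)
  IN_PROGRESS_STATUSES.include?(operation["status"])
end

puts still_running?({ "status" => "RUNNING" })
puts still_running?({ "status" => "COMPLETED" })
```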
Step 3. Retrieve your data
When an operation is completed, a JSONL output file is available for download at the URL specified in the url field.
Bulk query workflow
Below is the high-level workflow for creating a bulk query:
You can use a new or existing query, but it should potentially return a lot of data. Connection-based queries work best.
Test the query by using the Shopify GraphiQL app.
Write a new mutation document for bulkOperationRunQuery.
Include the query as the value for the query argument in the mutation.
Run the mutation.
Poll the bulk operation until the status field shows that the operation is no longer running. You can use the objectCount field to check the operation's progress.
Download the JSONL file at the URL provided in the url field.
Identify a potential bulk query
Identify a new or existing query that could return a lot of data and would benefit from being a bulk operation. Queries that use pagination to get all pages of results are the most common candidates.
The example query below retrieves some basic information from a store's first 50 products that were created on or after January 1, 2019.
Write a bulk operation
To turn the query above into a bulk query, use the bulkOperationRunQuery mutation. It’s easiest to begin with a skeleton mutation without the query filled in.
- The triple quotes (""") define a multi-line string in GraphQL.
- The bulk operation's ID is returned so you can poll the operation.
- The userErrors field is returned to retrieve any error messages.
Paste the original sample query into the mutation, and then make a couple of minor optional changes:
- The first argument is optional and ignored if present, so it can be removed.
- The pageInfo fields are also optional and ignored if present, so they can be removed.
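The wrapping step can be sketched as a small helper that embeds any query in the mutation. The helper name is hypothetical, and the sample query (with its created_at search filter and selected fields) is only an approximation of the example described earlier, with the first argument and pageInfo fields already removed.

```ruby
# Hypothetical helper: embeds a GraphQL query in a bulkOperationRunQuery mutation.
def build_bulk_mutation(query)
  # A triple-quoted GraphQL string can't itself contain an unescaped """.
  raise ArgumentError, 'query must not contain """' if query.include?('"""')

  <<~GRAPHQL
    mutation {
      bulkOperationRunQuery(
        query: """
        #{query.strip}
        """
      ) {
        bulkOperation {
          id
          status
        }
        userErrors {
          field
          message
        }
      }
    }
  GRAPHQL
end

# Assumed sample query: products created on or after January 1, 2019.
PRODUCTS_QUERY = <<~GRAPHQL
  {
    products(query: "created_at:>=2019-01-01") {
      edges {
        node {
          id
          title
        }
      }
    }
  }
GRAPHQL

puts build_bulk_mutation(PRODUCTS_QUERY)
```

The indentation of the embedded query in the output is uneven, which GraphQL ignores.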
If the mutation is successful, then the response looks similar to the example below:
Poll a running bulk operation
Because bulk operations are asynchronous and long running, you need to poll to find out when one is completed. The easiest way to do that is to query the currentBulkOperation field.
The field returns the latest bulk operation created (regardless of its status) for the authenticated app and shop. If you want to look up a specific operation by ID, then you can use the
You can adjust your polling intervals based on the amount of data that you expect. For example, if you’re currently making pagination queries manually and it takes one hour to fetch all product data, then that can serve as a rough estimate for the bulk operation time. In this situation, a polling interval of one minute would probably be better than every 10 seconds.
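A polling loop with a configurable interval could be sketched as follows. The fetch callable stands in for whatever code runs the currentBulkOperation query and returns the operation as a Hash; it, the in-progress status list, and the parameter names are assumptions for illustration.

```ruby
# Polls until the operation leaves an in-progress status.
# `fetch` is a hypothetical callable returning the operation as a Hash;
# `sleeper` is injectable so the loop can be exercised without waiting.
def wait_for_bulk_operation(fetch, interval_seconds: 60, max_polls: 120, sleeper: method(:sleep))
  max_polls.times do
    operation = fetch.call
    return operation unless %w[CREATED RUNNING CANCELING].include?(operation["status"])

    sleeper.call(interval_seconds)
  end
  raise "bulk operation still running after #{max_polls} polls"
end

# Usage with a stubbed fetcher (made-up data) that completes on the third poll.
responses = [
  { "status" => "RUNNING", "objectCount" => "10" },
  { "status" => "RUNNING", "objectCount" => "250" },
  { "status" => "COMPLETED", "objectCount" => "400" }
]
result = wait_for_bulk_operation(-> { responses.shift }, sleeper: ->(_seconds) {})
puts result["status"]
```

The max_polls cap is a design choice in this sketch so that a stuck operation can't block the caller indefinitely.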
To learn about the other possible operation statuses, refer to the
Check an operation's progress
Although polling is useful for checking whether an operation is complete, you can also use it to check the operation's progress by using the objectCount field. This field provides you with a running total of all the objects processed by your bulk operation. You can use the object count to validate your assumptions about how much data should be returned.
For example, if you’re trying to query all products created in a single month and the object count exceeds your expected number, then it might be a sign that your query conditions are wrong. In that case, you might want to cancel your current operation and run a new one with a different query.
Download result data
Result data is available only after an operation has finished running.
If an operation completes successfully, the url field contains a URL where you can download the data. If an operation fails but some data was retrieved before the failure occurred, then a partially complete output file is available at the URL specified in the
In either case, the returned URLs are signed (authenticated) and expire after one week.
Now that you've downloaded the data, it's time to parse it according to the JSONL format.
The JSONL data format
Normal (non-bulk) GraphQL responses are JSON. The response structure mirrors the query structure, which results in a single JSON object with many nested objects. Most standard JSON parsers require the entire string or file to be read into memory, which can cause issues when the responses are large.
Since bulk operations are specifically designed to fetch large datasets, we’ve chosen the JSON Lines (JSONL) format for the response data so that clients have more flexibility in how they consume the data. JSONL is similar to JSON, but each line is its own valid JSON object. To avoid issues with memory consumption, the file can be parsed one line at a time by using file streaming functionality, which most languages have.
Each line in the file is a node object returned in a connection. If a node has a nested connection, then each child node is extracted into its own object on the next line. For example, a bulk operation might use the following query to retrieve a list of products and their nested variants:
In the JSONL results, each product object is followed by each of its variant objects on a new line. The order of each connection type is preserved and all nested connections appear after their parents in the file; however, child nodes might not appear directly after their parent. For example, in the following results, the product variant with title 52 is a child of the first product in the file.
Because nested connections are no longer nested in the response data structure, the results contain the __parentId field, which is a reference to an object's parent. This field doesn’t exist in the API schema, so you can't explicitly query it. It's included automatically in bulk operation results.
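To illustrate how __parentId can be used, the sketch below rebuilds the parent/child nesting while reading one line at a time. The sample data is made up, the function name is hypothetical, and the approach assumes that every node in the query selects its id.

```ruby
require "json"
require "stringio"

# Made-up sample output: the last variant belongs to the first product
# even though it doesn't directly follow it.
SAMPLE_JSONL = <<~JSONL
  {"id":"gid://shopify/Product/1","title":"Shirt"}
  {"id":"gid://shopify/ProductVariant/11","title":"Small","__parentId":"gid://shopify/Product/1"}
  {"id":"gid://shopify/Product/2","title":"Hat"}
  {"id":"gid://shopify/ProductVariant/12","title":"Large","__parentId":"gid://shopify/Product/1"}
JSONL

# Reads one line at a time and returns the top-level entries, each with
# its children attached. Assumes every node has an "id".
def build_tree(io)
  index = {}
  roots = []
  io.each_line do |line|
    next if line.strip.empty?

    object = JSON.parse(line)
    entry = { node: object, children: [] }
    index[object["id"]] = entry
    parent_id = object["__parentId"]
    if parent_id
      # Parents appear before their children, so the lookup succeeds.
      index.fetch(parent_id)[:children] << entry
    else
      roots << entry
    end
  end
  roots
end

build_tree(StringIO.new(SAMPLE_JSONL)).each do |entry|
  titles = entry[:children].map { |child| child[:node]["title"] }
  puts "#{entry[:node]["title"]}: #{titles.join(", ")}"
end
```

Note that this keeps an index of every object in memory, which trades away some of the streaming benefit; for very large files you might write children to a datastore instead.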
Most programming languages can read a file one line at a time, which avoids loading the entire file into memory. Take advantage of this when working with JSONL data files.
Here's a simple example in Ruby to demonstrate the proper way of loading and parsing a JSONL file:
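A minimal sketch of the two approaches, assuming the downloaded file has been saved locally (the method names and the temporary file are illustrative):

```ruby
require "json"
require "tempfile"

# Good: streams the file and parses a single line at a time.
def parse_streaming(path)
  count = 0
  File.foreach(path) do |line|
    JSON.parse(line)
    count += 1
  end
  count
end

# Bad: reads the entire file into memory before parsing.
def parse_all_at_once(path)
  File.read(path).each_line.map { |line| JSON.parse(line) }.length
end

# Made-up two-line file to exercise both versions.
Tempfile.create("bulk.jsonl") do |file|
  file.write(%({"id":"gid://shopify/Product/1"}\n{"id":"gid://shopify/Product/2"}\n))
  file.flush
  puts parse_streaming(file.path)
  puts parse_all_at_once(file.path)
end
```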
To demonstrate the difference using a 100MB JSONL file, the "good" version would consume only 2.5MB of memory while the "bad" version would consume 100MB (equal to the file size).
Bulk operations can fail for any of the reasons that a regular GraphQL query would fail, such as not having permission to query a field. For this reason, we encourage you to run the query normally first to make sure that it works. You'll get much better error feedback than when a query fails within a bulk operation.
If a bulk operation does fail, then its status field returns FAILED and the errorCode field contains one of the following codes:
- ACCESS_DENIED: there are missing access scopes. Run the query normally (outside of a bulk operation) to get more details on which field is causing the issue.
- INTERNAL_SERVER_ERROR: something went wrong on our server and we've been notified of the error. These errors might be intermittent, so you can try again.
- TIMEOUT: one or more query timeouts occurred during execution. Try removing some fields from your query so that it can run successfully. These timeouts might be intermittent, so you can try again.
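One way to act on these codes is to map each one to a next step, as in the sketch below. The function name and the returned symbols are illustrative choices, not part of the API.

```ruby
# Maps a finished operation (a Hash) to a suggested next step,
# following the error code descriptions above.
def next_step_for(operation)
  return :none unless operation["status"] == "FAILED"

  case operation["errorCode"]
  when "ACCESS_DENIED" then :run_query_normally_to_find_missing_scope
  when "INTERNAL_SERVER_ERROR" then :retry
  when "TIMEOUT" then :simplify_query_or_retry
  else :check_error_code_reference
  end
end

puts next_step_for({ "status" => "FAILED", "errorCode" => "TIMEOUT" })
```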
To learn about the other possible operation error codes, refer to the
Canceling an operation
To cancel an in-progress bulk operation, use the bulkOperationCancel mutation with the operation ID.
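A cancel mutation could be built as in the sketch below. The helper name and the example ID are hypothetical, and the selected return fields mirror those used for bulkOperationRunQuery, so confirm them against the API reference.

```ruby
require "json"

# Hypothetical helper: builds a bulkOperationCancel mutation for an operation ID.
def build_cancel_mutation(operation_id)
  # to_json quotes and escapes the ID so that it forms a valid string literal.
  <<~GRAPHQL
    mutation {
      bulkOperationCancel(id: #{operation_id.to_json}) {
        bulkOperation {
          status
        }
        userErrors {
          field
          message
        }
      }
    }
  GRAPHQL
end

# Made-up operation ID for illustration.
puts build_cancel_mutation("gid://shopify/BulkOperation/123")
```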
Currently, you can run only one bulk operation per shop at any given time. To run a subsequent bulk operation for a shop, you need to either cancel a running operation or wait for it to finish.
A bulk operation query needs to include a connection. If your query doesn’t use a connection, then it should be executed as a normal synchronous GraphQL query.
Bulk operations have some additional restrictions:
- Maximum of one top-level field in the query.
- Maximum of five total connections in the query.
- Maximum of two levels deep for nested connections. For example, the following is invalid because there are three levels of nested connections:
- nodes fields can't be used.
The bulkOperationRunQuery mutation validates the supplied queries and provides errors by using the userErrors field.
Given the flexibility of GraphQL queries, it’s hard to provide exhaustive examples of what is and isn’t allowed, so try some queries and see what works. If you find useful queries that aren’t yet supported, then let us know on the forums so we can collect common use cases.