Combining data from multiple microservices using GraphQL


Introduction

Note: this post assumes some familiarity with both the microservice architectural style and GraphQL APIs. If you’re not familiar with either, check out this in-depth tutorial on GraphQL and/or this primer on microservices.

Microservice-based application architectures have been increasing in popularity since their inception, largely because they’re widely held to solve several problems with monolithic architectures: they’re more scalable, more reliable and easier to maintain.

It’s a similar story for GraphQL as an API architecture. Since it was open-sourced in 2015, it has become extremely popular in application development because it improves on REST in several key ways: it solves under- and over-fetching, makes your APIs self-documenting, and, again, is easier to maintain.

At JDLT we’ve fully embraced both GraphQL and microservices for all these reasons and more, and as such we’ve learned a lot about how the two interact with each other, what challenges are posed by combining them, and how they bring out the best in each other. That’s what I’m going to talk about in this post.

The benefits and the conflict

One of the benefits of GraphQL is that it allows you to serve your whole API from a single endpoint. Rather than hitting dozens of endpoints, each doing something different and identified by a unique path, your client applications and users can hit a single endpoint with a query that defines exactly what they want from it. This means you don’t have to worry about different developers using different naming conventions for paths, or about making multiple requests just to assemble one data set. It also significantly simplifies API management.

Let’s say we have a social media app with a RESTful API that includes a /users/<id> endpoint. Hitting it returns a user object containing the user’s name and the id of their best friend. If we want to show the logged-in user her best friend’s name in our UI, one option is to make two requests to /users/<id>: the first with the logged-in user’s id, and the second with her best friend’s id, which we only learn from the response to the first call. This is slow (HTTP requests are often the slowest part of an application) and it involves more code than we want to write. We want to write code that fetches the data we need, not code that fetches part of it, waits for the response, and then uses that response to request the rest.
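The cost of that approach is easy to see in code. Below is a minimal sketch of the two-request version; the in-memory `USERS` map, the `bestFriendId` field and the `getUser` helper (a stand-in for a real `fetch` of `/users/<id>`) are all hypothetical.

```javascript
// Hypothetical data standing in for whatever sits behind GET /users/<id>.
const USERS = {
  "1": { id: "1", name: "Alice", bestFriendId: "2" },
  "2": { id: "2", name: "Bob", bestFriendId: "1" },
}

// Stand-in for one HTTP round trip to GET /users/<id>. A real client would do
// something like: const res = await fetch(`https://your-api.com/users/${id}`)
async function getUser(id) {
  const user = USERS[id]
  if (!user) throw new Error(`User ${id} not found`)
  return { ...user }
}

// The two requests must run one after the other: the best friend's id is
// only known once the first response has come back.
async function loadBestFriendName(loggedInUserId) {
  const me = await getUser(loggedInUserId) // request 1
  const bestFriend = await getUser(me.bestFriendId) // request 2
  return bestFriend.name
}

loadBestFriendName("1").then((name) => console.log(name)) // prints "Bob"
```

With GraphQL the same data can be requested in a single round trip, with a query along the lines of `{ user(id: "1") { name bestFriend { name } } }` (the `user(id:)` field is likewise an assumed schema).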

Our other option is to design our API endpoints around the UI views they provide data for, i.e. to build an endpoint that returns not only a user’s data but also some data about that user’s best friend. The problem here is that our ability to change the UI is now tied directly to our ability to change the API: even if we just want to add the best friend’s profile picture to this view, we have to update the relevant endpoint to return it. This might not slow the application down, but it certainly slows development down.

With GraphQL, our API knows how to retrieve a user for a given id, and it also knows that users have names and bestFriends. It even knows that bestFriends are in fact users and therefore it can retrieve a name (or even a bestFriend!) for the bestFriend of a given user. When our UI changes, we update our query to ask for the relevant data, but the API remains unchanged.
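As a toy model of why the API stays put, the sketch below mimics how a GraphQL server resolves a nested selection one field at a time. It is not graphql-js: the selection is a plain object rather than parsed query text, and `USERS`, `userResolvers` and `resolveSelection` are hypothetical. What it does show is that because `bestFriend` resolves to another user, the same resolvers work at any depth.

```javascript
// Hypothetical user store.
const USERS = {
  "1": { id: "1", name: "Alice", bestFriendId: "2", avatarUrl: "/img/alice.png" },
  "2": { id: "2", name: "Bob", bestFriendId: "3", avatarUrl: "/img/bob.png" },
  "3": { id: "3", name: "Carol", bestFriendId: "1", avatarUrl: "/img/carol.png" },
}

// Field resolvers for the User type. bestFriend returns another User, so the
// same resolvers apply again at the next level of nesting.
const userResolvers = {
  name: (user) => user.name,
  profilePicture: (user) => user.avatarUrl,
  bestFriend: (user) => USERS[user.bestFriendId] ?? null,
}

// Walk a selection like { name: true, bestFriend: { name: true } } and build a
// response with exactly that shape: only the requested fields come back.
function resolveSelection(user, selection) {
  const result = {}
  for (const [field, subSelection] of Object.entries(selection)) {
    const resolve = userResolvers[field]
    if (!resolve) throw new Error(`Unknown field on User: ${field}`)
    const value = resolve(user)
    if (subSelection === true || value === null) {
      result[field] = value
    } else {
      result[field] = resolveSelection(value, subSelection)
    }
  }
  return result
}

// Roughly what the query { user(id: "1") { name bestFriend { name } } } asks for.
const response = resolveSelection(USERS["1"], { name: true, bestFriend: { name: true } })
console.log(JSON.stringify(response)) // {"name":"Alice","bestFriend":{"name":"Bob"}}
```

If the UI later wants the best friend’s profile picture as well, only the selection changes (`bestFriend: { name: true, profilePicture: true }`); the resolvers, like the API, stay as they are.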

So, we’re sold on the benefits of a single API endpoint from which we can request whatever data we want. But, on the face of it, this seems to conflict with the fundamental concept of microservices: if they’re self-contained individual services, they define their own APIs; but if they all define their own APIs then we can’t access all of our data from a single endpoint.

And we really do want our microservices to be self-contained. Perhaps checking the name of a user’s best friend is the most popular feature of our application. In that case, our users service, which contains all the code to retrieve that information, will get used more than, for example, our email service, which might notify users when it’s their best friend’s birthday. So we probably want more instances of our users service running than of our email service, so that nobody has their request for their best friend’s name dropped and we’re not paying for availability our email service doesn’t need. But if the two are part of the same monolithic application (so that one API can access them both), this seems impossible, because a monolith can only be scaled as a single unit.

The solution

The solution to this problem is schema stitching, or, more specifically, GraphQL remote schema stitching. GraphQL schema stitching involves taking two separate GraphQL schemas (the type definitions that describe what a GraphQL API can do) and combining them into one. This is handy even in a monolithic application, because it lets two different services define their APIs in their respective directories, which we can then stitch together and expose as a single API.

GraphQL remote schema stitching is exactly the same, except the schemas we’re stitching together aren’t just coming from different directories on the same server, they’re coming from different servers (in our case, different microservices).

Once we know how to do that, we can create a new microservice that exists purely to stitch together the schemas from all our other microservices and expose them as one GraphQL endpoint, which can call any and all of our microservices. In other words, a dedicated GraphQL microservice does the job of combining data from all the others.

Now we have a single GraphQL endpoint for our whole application, and our microservices are still totally self-contained.

The implementation

To implement this, we use two NPM packages from Apollo: graphql-tools and apollo-link-http. We’ll also need node-fetch, because HttpLink needs a fetch implementation to talk to other servers and Node has not always provided one built in. To create our new GraphQL microservice, which we’ll call graphql-server, we need to be able to do three things:

‘Introspect’ the schemas of our other microservices, i.e. send each one a GraphQL request that simply returns information about its schema
Turn the responses from the introspection queries into a schema which graphql-server can execute by delegating incoming requests to the relevant microservice
Merge all those new schemas together

Fortunately, graphql-tools has all the tools we need for each of these things!

import {
  introspectSchema,
  makeRemoteExecutableSchema,
  mergeSchemas,
} from 'graphql-tools'
import { HttpLink } from 'apollo-link-http'
import fetch from 'node-fetch'

Step 1

First, we introspect our remote schema. This is really simple: we just call introspectSchema, passing in instructions about how to find the remote schema. And guess what, Apollo have a tool for creating those instructions too.

Per the graphql-tools docs, "Apollo Links are chainable 'units' that you can snap together to define how each GraphQL request is handled by your GraphQL client."

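An HttpLink is a specific type of Apollo Link that sends each GraphQL request over HTTP to a given endpoint, so it effectively tells introspectSchema where to find the remote schema. On the server we pass it node-fetch as its fetch implementation: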

// Top-level await needs an ES module; otherwise run this inside an async function
const schemaLink = new HttpLink({ uri: 'https://your-api.com/graphql', fetch })
const schemaDefinition = await introspectSchema(schemaLink)
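Introspection itself is nothing exotic: it is an ordinary GraphQL query against the built-in `__schema` field, sent over HTTP like any other request. The sketch below shows roughly what goes over the wire, using the `fetch` built into Node 18+ instead of an Apollo Link. The trimmed query and the helpers `buildIntrospectionRequest` and `fetchRootQueryFieldNames` are hypothetical; `introspectSchema` sends a much fuller query and turns the result into a schema for you.

```javascript
// A deliberately trimmed introspection query: it asks only for the root Query
// type's name and field names. The query introspectSchema sends is far fuller
// (arguments, all types, directives, and so on).
const TRIMMED_INTROSPECTION_QUERY = `
  query TrimmedIntrospection {
    __schema {
      queryType {
        name
        fields { name }
      }
    }
  }
`

// Build roughly the kind of POST request a GraphQL-over-HTTP client sends:
// a JSON body whose "query" property holds the GraphQL document.
function buildIntrospectionRequest(uri) {
  return {
    url: uri,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ query: TRIMMED_INTROSPECTION_QUERY }),
    },
  }
}

// Send the request with the fetch built into Node 18+ and return the names of
// the service's root query fields.
async function fetchRootQueryFieldNames(uri) {
  const { url, init } = buildIntrospectionRequest(uri)
  const response = await fetch(url, init)
  if (!response.ok) throw new Error(`HTTP ${response.status} from ${uri}`)
  const { data, errors } = await response.json()
  if (errors) throw new Error(errors.map((e) => e.message).join("; "))
  return data.__schema.queryType.fields.map((field) => field.name)
}
```

Calling `fetchRootQueryFieldNames('https://your-api.com/graphql')` against a live service would resolve to the names of its top-level query fields.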

Step 2

With step 1 complete, we already have everything we need for step 2, turning information about a remote schema into a schema which graphql-server can expose to client applications.

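const remoteSchema = makeRemoteExecutableSchema({
  schema: schemaDefinition, // type definitions introspected in step 1
  link: schemaLink, // how incoming requests are forwarded to the remote service
})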

Step 3

remoteSchema is now a fully fledged GraphQL schema, ready to receive requests. Obviously we’ll need to repeat steps one and two for each remote schema (i.e. each of our microservices), and we’ll assume the resulting schemas are in an array called remoteSchemas. All that remains is to merge them together.

const mergedSchema = mergeSchemas({ schemas: remoteSchemas })

mergedSchema is a single schema which contains all the individual schemas from steps one and two, meaning it can execute any query or mutation that any of the remote schemas could execute. And it will do so by delegating the query or mutation to the microservice whose schema originally defined it.
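The delegation idea can be seen in isolation with a toy gateway that records which service defined each root field and forwards requests accordingly. This is a simplified model, not how graphql-tools implements `mergeSchemas` (which works on whole GraphQL schemas and forwards sub-queries over the links from step 2); the two services, their field names, and the choice to reject duplicate field names are assumptions made for illustration, and everything is synchronous for brevity.

```javascript
// Hypothetical services. In the real architecture each would be a remote
// schema reached over HTTP; here each root field (query or mutation) is just
// a synchronous function so the routing logic stays visible.
const usersService = {
  name: "users",
  rootFields: {
    user: ({ id }) => ({ id, name: id === "1" ? "Alice" : "Bob" }),
  },
}
const emailService = {
  name: "email",
  rootFields: {
    sendBirthdayEmail: ({ userId }) => ({ queued: true, userId }),
  },
}

// Merge: remember which service defined each root field. Rejecting duplicate
// field names outright is a choice made for this sketch.
function mergeServices(services) {
  const owners = new Map()
  for (const service of services) {
    for (const field of Object.keys(service.rootFields)) {
      if (owners.has(field)) {
        throw new Error(`"${field}" is defined by both ${owners.get(field).name} and ${service.name}`)
      }
      owners.set(field, service)
    }
  }
  // Delegate: forward each requested field to the service that owns it.
  return function execute(field, args) {
    const owner = owners.get(field)
    if (!owner) throw new Error(`Unknown root field "${field}"`)
    return { servedBy: owner.name, result: owner.rootFields[field](args) }
  }
}

const execute = mergeServices([usersService, emailService])
console.log(JSON.stringify(execute("user", { id: "1" })))
// {"servedBy":"users","result":{"id":"1","name":"Alice"}}
```

Each service keeps sole ownership of its fields, which is what lets the gateway stay thin while the services are scaled and deployed independently.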

And that’s all there is to it! Just create a GraphQL server with mergedSchema, expose it on an HTTP endpoint, and you’re good to go.
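To make that last step concrete, here is a minimal sketch of the HTTP side using only Node’s built-in `http` module. It follows the common GraphQL-over-HTTP convention of a POST with a JSON body containing `query` and optional `variables`, answered with JSON containing `data` and/or `errors`. The `execute` callback, `handleGraphQLRequest`, `startServer` and port 4000 are all assumptions for illustration; in practice a ready-made server such as Apollo Server would normally take care of this.

```javascript
// Turn a raw request body into a status code and JSON response body.
// `execute` is the assumed bridge to the merged schema: it receives
// { query, variables, operationName } and resolves to { data, errors }.
async function handleGraphQLRequest(rawBody, execute) {
  let payload
  try {
    payload = JSON.parse(rawBody)
  } catch {
    return { status: 400, body: JSON.stringify({ errors: [{ message: "Body must be JSON" }] }) }
  }
  if (!payload || typeof payload.query !== "string") {
    return { status: 400, body: JSON.stringify({ errors: [{ message: "Missing query" }] }) }
  }
  try {
    const result = await execute({
      query: payload.query,
      variables: payload.variables ?? {},
      operationName: payload.operationName,
    })
    return { status: 200, body: JSON.stringify(result) }
  } catch (err) {
    return { status: 500, body: JSON.stringify({ errors: [{ message: err.message }] }) }
  }
}

// Serve the handler on POST /graphql using Node's built-in http module.
// The dynamic import works whether this file runs as an ES module or CommonJS.
async function startServer(execute, port = 4000) {
  const { createServer } = await import("node:http")
  const server = createServer((req, res) => {
    if (req.method !== "POST" || req.url !== "/graphql") {
      res.writeHead(404).end()
      return
    }
    let rawBody = ""
    req.setEncoding("utf8")
    req.on("data", (chunk) => {
      rawBody += chunk
    })
    req.on("end", async () => {
      const { status, body } = await handleGraphQLRequest(rawBody, execute)
      res.writeHead(status, { "Content-Type": "application/json" }).end(body)
    })
  })
  return new Promise((resolve) => server.listen(port, () => resolve(server)))
}
```

To serve mergedSchema, the callback could wrap graphql-js, for example `startServer(({ query, variables, operationName }) => graphql({ schema: mergedSchema, source: query, variableValues: variables, operationName }))` with `graphql` imported from the `graphql` package. Keeping HTTP handling separate from execution also makes the handler easy to test with a stub executor.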

Summary

We’ve covered a lot of ground here and achieved a lot of cool stuff, so let’s recap. We have:

Introspected the GraphQL schema of each of our microservices and created links to them, which will be used to delegate requests to them
Turned all that data and those links into GraphQL schemas which are executable remotely
Merged the new schemas together

…all of which allows us to:

Expose a single GraphQL endpoint that provides access to the functionality of our whole application
Scale our microservices separately
Develop any given microservice without affecting any other

…all at the same time!

We’ve been using this sort of architecture in production for about 18 months now and we’re improving and refining it all the time. For our clients’ software and our own internal applications, we’ve found that the GraphQL microservices architecture has helped us develop quick and reliable software quickly and reliably.