GraphQL Design

You know the basics of GraphQL, but are you still unsure how to get business content and API structures right?

GraphQL is indeed an attractive data API for applications (and people). It works as a provider that exposes data APIs to applications, based on a defined schema. Obviously, good schema quality is essential.
We are also looking at servers producing data from (m)any different sources, legacy or new. Garbage in equals garbage out. Consequently, GraphQL API design may lead you into resolving data discovery and unification issues (quality, metadata, business alignment, etc.).
The schema is a "data graph" containing related concepts in a network organized as a directed graph. Query results use the same names as the schema, and the resulting data are structured as hierarchical trees.

This means that GraphQL data API designers face some challenges:
  • Alignment with business terminology and definitions
  • Understanding complex schemas, structured as graphs
  • Correctly exposing the structure of the relationships inherent in the exposed data
  • Handling traversals of many-to-many relationships in order to produce a result tree
  • Resolving data quality issues
Graph Data Modeling is clearly highly relevant for GraphQL developers. Concept and property graph models can significantly ease both the analysis of the data at hand and the organization of the resulting API schema and query structures.

What is GraphQL -- and why is Design important?

GraphQL is getting a lot of interest. It is a Facebook open source project, which has its primary information site here: [http://graphql.org/](http://graphql.org/).

My interest is the relationship between GraphQL and design, because that relationship is certainly very real. In graphql.org’s own words:

“Describe what’s possible with a type system. GraphQL APIs are organized in terms of types and fields, not endpoints. Access the full capabilities of your data from a single endpoint. GraphQL uses types to ensure Apps only ask for what’s possible and provide clear and helpful errors. Apps can use types to avoid writing manual parsing code.”

The gist of GraphQL can be seen in this example (from graphql.org):
[Figure: a GraphQL query (left) and part of the corresponding schema (right), from graphql.org]
The context of the example above is Star Wars metadata. What you see to the right is part of a GraphQL schema. What you see on the left could well be a query against the API, and the resulting data set will have exactly that structure.
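To make the point about shared structure concrete, here is a minimal Python sketch of why a result mirrors its query. It is a toy field selector, not a GraphQL engine; the `select` helper, the `hero` record and its field names are assumptions made up for illustration, loosely following the Star Wars theme.

```python
# Illustrative sketch only: a toy field selector, not a GraphQL engine.
# The data and field names are assumptions made up for this example.

def select(data, selection):
    """Return only the requested fields, keeping the shape of the selection.

    `selection` maps a field name to None (a leaf field) or to a nested
    selection (a sub-object or a list of sub-objects).
    """
    if isinstance(data, list):
        # Apply the same selection to every element of a list field.
        return [select(item, selection) for item in data]
    result = {}
    for field, sub_selection in selection.items():
        value = data[field]
        result[field] = value if sub_selection is None else select(value, sub_selection)
    return result


# Hypothetical source record with more fields than the client asks for.
hero = {
    "name": "Luke Skywalker",
    "height": 1.72,
    "mass": 77,
    "friends": [
        {"name": "Han Solo", "height": 1.8},
        {"name": "Leia Organa", "height": 1.5},
    ],
}

# Roughly corresponds to the query: { name friends { name } }
query = {"name": None, "friends": {"name": None}}

result = select(hero, query)
print(result)
```

The printed result contains only `name` and, for each friend, `name`: the nesting of the answer follows the nesting of the question, which is the property a GraphQL client relies on.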
GraphQL was created at Facebook in 2012, was released as open source in 2015, and belongs to the part of the software architecture universe that goes by the name of APIs these days. In Facebook's own terms: "... GraphQL, a query language created by Facebook in 2012 for describing the capabilities and requirements of data models for client‐server applications" (GraphQL on GitHub).
The graphql.org site does a nice job of explaining, so I will not repeat all of that here. William Lyon of Neo4j has written an excellent overview of GraphQL, which is available as a "refcard" from DZone: GraphQL Refcard (login required).

GraphQL Concepts

There are a number of concepts defined in the GraphQL context. The following concept model lays out all the important ones:
[Figure: concept model of the GraphQL schema concepts]
(Refer to the GraphQL Schema introduction for all the schema details.)

A few words on the diagram style above:
  • It is a directed graph (of the concept map category)
  • Relationships are named
  • Relationships may be:
    • One-to-one (no arrowheads)
    • One-to-many (arrow)
    • Many-to-many (double-headed arrow, not found in the diagram above).
This diagramming style is exactly the same as found on other pages of graphdatamodeling.com. And that is the point. This kind of modeling lends itself well to the GraphQL universe, because GraphQL is, well, based on graphs!
NB: Some concepts ("Mutation", "Input", "Function" and "Subscription") are application / server construction services (updates, inserts, DB mapping and push services), which are also part of the schema. However, since the focus here is on the structure and meaning of the exposed model in its own right, these "DML" aspects of GraphQL are not considered here.

Positioning the Graph for Generation of Trees

Basically, GraphQL developers face the same challenge that data warehouse (DW) developers do: garbage in = garbage out! You have to think about data quality and related issues.
But there is a fresh challenge in GraphQL: the schema describes a graph (a network), whereas queries result in trees only.
Let us have a look, taking an Email data model as an example:
[Figure: Email data model as a property graph]
The data model above is a property graph, and it can be traversed in any direction. However, although a GraphQL schema is a graph, the result of each and every *query* is a tree.

Much depends on what your "root query" looks like. It determines the "perspective" of the possible queries. A schema should serve a particular application's needs.

If all queries in a particular schema should start at the Originator level (because that is what the application needs), then we can construct API query result trees according to, for example, this pattern:
[Figure: query result tree pattern rooted at Originator]
This implies that, for example, Address-level or Keyword-level properties will be redundantly available in the trees, since they must be denormalized into a lower level or, alternatively, into a GraphQL list construct.
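The redundancy can be illustrated with a small Python sketch that unfolds a graph into a tree rooted at an Originator. Everything in it is an assumption for illustration: the node ids, the properties, the `sent` and `tagged_with` relationship names, and the helper functions are hypothetical and are not taken from the actual Email data model.

```python
# Illustrative sketch only: node names, properties and relationships below
# are assumptions, not the actual Email data model from the figure.

# A tiny property graph: nodes keyed by id, plus named, directed edges.
nodes = {
    "o1": {"label": "Originator", "name": "Alice"},
    "e1": {"label": "Email", "subject": "Budget"},
    "e2": {"label": "Email", "subject": "Forecast"},
    "k1": {"label": "Keyword", "word": "finance"},
}
edges = [
    ("o1", "sent", "e1"),
    ("o1", "sent", "e2"),
    ("e1", "tagged_with", "k1"),  # many-to-many: one keyword, many emails
    ("e2", "tagged_with", "k1"),
]


def children(node_id, relationship):
    """Follow outgoing edges of one relationship type from a node."""
    return [dst for src, rel, dst in edges if src == node_id and rel == relationship]


def originator_tree(originator_id):
    """Unfold the graph into a tree rooted at an Originator."""
    return {
        "name": nodes[originator_id]["name"],
        "emails": [
            {
                "subject": nodes[email_id]["subject"],
                # The keyword node is copied into every email that uses it.
                "keywords": [nodes[k]["word"] for k in children(email_id, "tagged_with")],
            }
            for email_id in children(originator_id, "sent")
        ],
    }


tree = originator_tree("o1")
print(tree)
```

The graph holds a single "finance" keyword node, but the tree shows it once under each email. That duplication is the price of presenting a network as a hierarchy, and the list under `keywords` plays the role of the GraphQL list construct.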

See more about these issues and much more in Visual Design of GraphQL Data - The Book!

10 Tips for Prettifying Graph Data Content

Since you may be developing GraphQL on top of data from sources of variable data quality, you may have to do some data discovery and unification on the resolver side. There is (too) much information about that on the Internet and in books (including two of mine). In the GraphQL context, you should pay attention to these 10 most important issues:

  • Include business names in the API
  • Identity and uniqueness
  • Presenting the keys
  • Presenting state changes
  • Presenting versions of data
  • Presenting dates and times
  • Presenting relationships and missing references
  • Which objects and which relationships
  • Presenting the right level of detail
  • Good relationships
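
Three of the tips above (business names, dates and times, missing references) can be sketched as a resolver-side clean-up step. This is only one possible approach under stated assumptions: the source column names, the business names, the date format of the source and the `SALES_REPS` lookup table are all invented for the example.

```python
# Illustrative sketch only: the source column names, the business names and
# the lookup table are assumptions invented for this example.
from datetime import datetime

# Hypothetical mapping from cryptic source columns to business names.
BUSINESS_NAMES = {"CUST_NO": "customerNumber", "ORD_DT": "orderDate", "SLS_REP": "salesRep"}

# Hypothetical reference data; sales rep 99 is deliberately missing.
SALES_REPS = {7: "Jane Doe"}


def prettify(row):
    """Turn one raw source row into an API-friendly dictionary."""
    out = {}
    for column, value in row.items():
        name = BUSINESS_NAMES.get(column, column)
        if column == "ORD_DT":
            # Present dates in one unambiguous format (ISO 8601).
            value = datetime.strptime(value, "%d.%m.%Y").date().isoformat()
        elif column == "SLS_REP":
            # Make a missing reference explicit instead of failing or hiding it.
            value = SALES_REPS.get(value)
        out[name] = value
    return out


pretty = prettify({"CUST_NO": 1001, "ORD_DT": "03.02.2017", "SLS_REP": 99})
print(pretty)
```

In a GraphQL schema, the missing sales rep would then surface as a nullable field, so the client sees the gap rather than a broken query.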

How much work is really necessary on the resolver side depends on issues that are partly out of your control:

  • The quality of the data sources themselves (structure, meaning and content)
  • Conflicts arising from unification of data from multiple sources (both upstream and downstream)

Final Comments

The GraphQL approach has many benefits that seasoned data professionals will admire. It has good potential to be long-lasting; self-describing, structured result sets are good for everybody. The legacy technologies for interfacing with data were as good as they could be at the time they came about, but that is not good enough today. GraphQL is still young, but maturing, and everyone could benefit from having graph visualizations in there. The same goes for a visual, interactive version of GraphiQL for end users!

Oh, and remember: **Information is based on trust, and if business people do not trust or understand the data presented to them, they will stop using it!**

Be prepared to do the additional work, if necessary, based on circumstances. Make sure what you deliver is visual and pretty. Then you are good to go.

Read more in the book Visual Design of GraphQL Data - Getting Business Meaning and API Structures right.

If you want to, or need to, understand more about GraphQL data design issues, look no further than this book:
[Image: book cover]
All the details of Graph Data Modeling are explained in this book:
[Image: book cover]