GraphQL Design

You know the basics of GraphQL, but are you still unsure how to get business content and API structures right?

GraphQL is indeed an attractive data API for applications (and people).

The GraphQL Schema is pivotal to the success of a GraphQL API. Most development projects involve many stakeholders. There are the developers, of course, and there are business experts as well as application owners, who are to consume the content of the API. This means that the schema is not only a scope contract, but also the authoritative source of structure and meaning of the context covered by the API. There are many other contexts, where complex structures and semantics must be communicated effectively between a number of people with various backgrounds. And the trick invariably turns out to be: Use good visualizations!

The main proposition of the book Visual Design of GraphQL Data - A Practical Introduction with Legacy Data and Neo4j is graph visualization: GraphQL Schema structure and meaning must be visualized, and the book shows you how. Since the schema is a “data graph” containing related concepts in a network organized as a directed graph, the increasingly popular property graph paradigm is very appropriate for visualizing GraphQL structures and semantics.

The second theme of the book is quality. GraphQL APIs can be used in many constellations, possibly including legacy data and/or externally sourced data. Quality must be ensured in all cases, both on the definitional level (business terminology etc.) and on the data content level (meaningful presentation of the data in business-friendly formats). The book summarizes the 10 most important focal areas of such quality assurance.

Visual Design of GraphQL Data - A Practical Introduction with Legacy Data and Neo4j contains simple guidelines based on lessons learned from real-life data discovery and unification. These help developers and architects achieve good quality in the resulting API designs. And the visual techniques help in producing convincing visual communication about the structure of the API designs.
Spending time on schema quality means that developers work from sharp definitions, which in turn leads to greater productivity and well-structured applications.

What is GraphQL -- and why is Design important?

GraphQL is getting a lot of interest. GraphQL is a Facebook Open Source project, which has its primary information site here: [http://graphql.org/](http://graphql.org/).

My interest is the relationship between GraphQL and design, because that relationship is certainly very real. In graphql.org’s own words:

“Describe what’s possible with a type system. GraphQL APIs are organized in terms of types and fields, not endpoints. Access the full capabilities of your data from a single endpoint. GraphQL uses types to ensure Apps only ask for what’s possible and provide clear and helpful errors. Apps can use types to avoid writing manual parsing code.”

The gist of GraphQL can be seen in this example (from graphql.org):
GraphiQL Query Result
The context of the above is Star Wars metadata. And what you see to the right is actually part of a GraphQL Schema. What you see on the left could well be a query towards the API -- and the resulting set of data will share exactly that (data) structure.
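The idea that a query and its result share the same structure can be sketched in a few lines of plain Python. This is only an illustration, not how a GraphQL server is implemented: the `select` helper, the dict-based query notation and the sample data are assumptions made for this sketch (the character names follow the Star Wars example on graphql.org).

```python
# Illustrative data only; a real API would fetch this through resolvers.
HERO = {
    "name": "R2-D2",
    "appearsIn": ["NEWHOPE", "EMPIRE", "JEDI"],
    "friends": [
        {"name": "Luke Skywalker", "appearsIn": ["NEWHOPE", "EMPIRE", "JEDI"]},
        {"name": "Han Solo", "appearsIn": ["NEWHOPE", "EMPIRE", "JEDI"]},
    ],
}


def select(value, selection):
    """Return only the requested fields, preserving the shape of the selection.

    `selection` maps a field name to None (a scalar leaf) or to a nested
    selection. Lists are handled by applying the selection to every item.
    """
    if isinstance(value, list):
        return [select(item, selection) for item in value]
    result = {}
    for field, sub_selection in selection.items():
        if field not in value:
            # Comparable to a type-system error: the field is not in the schema.
            raise KeyError(f"Cannot query field '{field}'")
        if sub_selection is None:
            result[field] = value[field]
        else:
            result[field] = select(value[field], sub_selection)
    return result


# Roughly corresponds to the query: { hero { name friends { name } } }
query = {"hero": {"name": None, "friends": {"name": None}}}
result = select({"hero": HERO}, query)
print(result)
```

The result contains exactly the fields named in the query, nested in the same way, and asking for a field that does not exist fails with a clear error instead of returning unexpected data.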
GraphQL was created at Facebook in 2012 and belongs in the part of the software architecture universe that is talked about as APIs these days. In Facebook's own terms: "... GraphQL, a query language created by Facebook in 2012 for describing the capabilities and requirements of data models for client‐server applications" (GraphQL on GitHub).
The graphql.org site does a nice job of explaining, so I will not repeat all of that here. William Lyon of Neo4j has made an excellent overview of GraphQL, which is available as a "refcard" from DZone: GraphQL Refcard (login required).

GraphQL Concepts

There are a number of concepts defined in the GraphQL context. The following concept model lays out all the important ones:
GraphQL Concept Map
(Refer to this for all the schema details: GraphQL Schema introduction).

A few words on the diagram style above:
  • It is a directed graph (of the concept map category)
  • Relationships are named
  • Relationships may be:
    • One-to-one (no arrowheads)
    • One-to-many (arrow)
    • Many-to-many (double-headed arrow -- not found in the diagram above).
This diagramming style is exactly the same as found on other pages of graphdatamodeling.com. And that is the point. This kind of modeling lends itself well to the GraphQL universe, because GraphQL is, well, based on graphs!
NB: Some concepts ("Mutation", "Input", "Function" and "Subscription") are application / server construction services (updates, inserts, DB mapping and push services), which are also part of the schema. However, since the focus here is on the structure and meaning of the exposed model in its own right, these "DML" aspects of GraphQL are not considered here.
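To illustrate how closely the diagram conventions listed above (named, directed relationships that are either one-to-one or one-to-many) correspond to schema text, here is a minimal Python sketch that renders such relationships as GraphQL type definitions. The `Relationship` class, the `to_sdl` function and the Customer / Order / Invoice concepts are hypothetical and not taken from the concept map; the only GraphQL facts relied upon are that a list field is written `[Type]` and that `!` marks non-null.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Relationship:
    source: str    # concept the relationship starts from
    name: str      # relationship name, used as the field name
    target: str    # concept the relationship points to
    to_many: bool  # True for one-to-many (arrow), False for one-to-one


# Hypothetical concepts, chosen only for illustration.
RELATIONSHIPS = [
    Relationship("Customer", "orders", "Order", to_many=True),
    Relationship("Order", "invoice", "Invoice", to_many=False),
]


def to_sdl(relationships):
    """Render each source concept as a GraphQL object type in schema text."""
    fields_by_type = {}
    for rel in relationships:
        # One-to-many becomes a list field, one-to-one a single object field.
        field_type = f"[{rel.target}!]!" if rel.to_many else rel.target
        fields_by_type.setdefault(rel.source, []).append(
            f"  {rel.name}: {field_type}"
        )
    blocks = []
    for type_name, fields in fields_by_type.items():
        blocks.append("type " + type_name + " {\n" + "\n".join(fields) + "\n}")
    return "\n\n".join(blocks)


sdl = to_sdl(RELATIONSHIPS)
print(sdl)
```

The sketch leaves out scalar properties and concepts without outgoing relationships (here Invoice), so its output is not a complete schema. A many-to-many relationship would simply show up as a list field on both sides.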

Positioning the Graph for Generation of Trees

Basically, GraphQL developers face the same challenge that data warehouse (DW) developers do: Garbage in = Garbage out! You have to think about data quality and related issues.
But there is a new fresh challenge in GraphQL: The schema describes a graph (a network) whereas the queries result in trees only.
Let us have a look, taking an Email data model as an example:
Email Property Graph Example
The data model above is a property graph, and it can be traversed in any direction. However, although a GraphQL schema is a graph, the result of each and every *query* is a tree.

Much depends on what your "root query" looks like. It determines the "perspective" of the possible queries. A schema should be serving a particular application's needs.

If queries in a particular schema should all start at the Originator level (because that is what the application needs), then we can construct API query result trees according to a pattern such as this:
Email GraphQL Paths
This implies that e.g. Address-level or Keyword-level properties will be redundantly available in the trees, since they must be denormalized into a lower level or, alternatively, into a GraphQL list construct.
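The redundancy described above can be made concrete with a small Python sketch that unfolds a graph into a tree, starting from a chosen root. The node identifiers, property names and relationship names are assumptions invented for this example; they are not read off the email model shown in the figure.

```python
# Hypothetical email graph: one originator, two emails, one shared address.
NODES = {
    "originator:1": {"name": "Alice"},
    "email:1": {"subject": "Budget"},
    "email:2": {"subject": "Forecast"},
    "address:1": {"address": "bob@example.com"},
}
EDGES = {
    "originator:1": [("emails", "email:1"), ("emails", "email:2")],
    "email:1": [("recipients", "address:1")],
    "email:2": [("recipients", "address:1")],
}


def unfold(node_id, path=()):
    """Unfold the graph into a tree rooted at node_id.

    A node that is reachable along several paths is copied into each
    branch. Nodes already on the current path are skipped, so cycles
    in the graph cannot cause endless recursion.
    """
    path = path + (node_id,)
    tree = dict(NODES[node_id])
    for relationship, target in EDGES.get(node_id, []):
        if target in path:
            continue
        tree.setdefault(relationship, []).append(unfold(target, path))
    return tree


tree = unfold("originator:1")
print(tree)
```

In the graph, the address exists once; in the tree, it appears under both emails. Choosing a different root (say, the address) would produce a different tree from the same graph, which is why the choice of root query matters so much.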

Taking Legacy Data into GraphQL

Taking existing data into GraphQL can be a daunting task. Even if you get the design right, there are many surprises and possibly "loose steps" on the bridge:
Pitfalls GraphQL Existing Data

10 Tips for Prettifying Graph Data Content

Since you may be developing GraphQL on top of data from sources of variable data quality, you may have to do some data discovery and unification on the resolver side. There is (too) much information about data preparation and ETL on the Internet and in books (including two of mine). In the GraphQL context you should pay attention to these 10 most important issues, which are explained in the book:

  • Include business names in the API
  • Identity and uniqueness
  • Presenting the keys
  • Presenting state changes
  • Presenting versions of data
  • Presenting dates and times
  • Presenting relationships and missing references
  • Which objects and which relationships
  • Presenting the right level of detail
  • Good relationships

How much work is necessary on the resolver side really depends on issues that are partly out of your control:

  • The quality of the data sources by themselves (structure, meaning and content)
  • Conflicts arising from unification of data from multiple sources (both upstream and downstream)
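As a rough illustration of what such resolver-side work can look like, the following Python sketch touches on three of the tips above: business names, dates, and missing references. Everything in it is hypothetical: the legacy column names, the `resolve_customer` function, the YYYYMMDD date format and the sample row are assumptions for this sketch, not examples taken from the book.

```python
from datetime import datetime

# Hypothetical lookup of sales representatives by legacy identifier.
SALES_REPS = {"R7": {"name": "Jane Doe"}}


def resolve_customer(row):
    """Map one legacy row (a dict) to a business-friendly API object."""
    customer = {
        # Business names instead of cryptic column names.
        "customerNumber": row["CUST_NO"].strip(),
        "customerName": row["CUST_NM"].strip().title(),
        # Legacy date assumed stored as YYYYMMDD text; present it as ISO 8601.
        "createdDate": datetime.strptime(row["CRT_DT"], "%Y%m%d")
        .date()
        .isoformat(),
    }
    # Missing reference: expose None rather than a dangling legacy key.
    customer["salesRepresentative"] = SALES_REPS.get(row.get("SLS_REP_ID"))
    return customer


legacy_row = {
    "CUST_NO": " 00042 ",
    "CUST_NM": "ACME TRADING",
    "CRT_DT": "20170315",
    "SLS_REP_ID": "R9",  # refers to a representative that does not exist
}
customer = resolve_customer(legacy_row)
print(customer)
```

Whether a missing reference should be presented as null, as a placeholder object, or as an error is a design decision that depends on the application and should be agreed with the business side.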

Using GraphQL together with Graph Databases

Since GraphQL schemas are indeed directed graphs, it is natural to consider a graph database as the underlying data source (or as part of the platform, anyway).

This GraphQL book explains in quite some detail how to:

  • Use GraphQL together with an existing graph database in Neo4j
  • Use GraphQL together with a new graph database in Neo4j
NB: You might want to consider generating a graph data model from a UML ERD or XSD metadata file in XML; see Metadata Recycling into Graph Data Models.

Final Comments

The GraphQL approach has many benefits that seasoned data professionals will admire. It has good potential to be long-lasting; self-describing, structured result sets are good for everybody. The legacy technologies for interfacing with data were as good as they could be at the time they came about, but that is not good enough today. GraphQL is still young, but maturing, and everyone could benefit from having graph visualizations in there. The same goes for a visual, interactive version of GraphiQL, for end-users!

Oh, and remember: **Information is based on trust, and if business people do not trust or understand the data presented to them, they will stop using it!**

Be prepared to do the additional work, if necessary, based on circumstances. Make sure what you deliver is visual and pretty. Then you are good to go.

Read more in the book Visual Design of GraphQL Data - A Practical Introduction with Legacy Data and Neo4j.

If you want to, or need to, understand more about the GraphQL Data Design issues, look no further than to this book:
Visual Design GraphQL Data
All the details of Graph Data Modeling are explained in this book:
Graph Data Modeling NoSQL SQL