DEV Community

Steve Coffman
Authorization (authz) and GraphQL

Hi! I maintain gqlgen, a popular GraphQL library for Go. I've noticed many people stumbling on GraphQL authorization, regardless of programming language. This is partly because they don't know what the options are, what tradeoffs those options carry, or even what questions to ask when seeking advice.

Hopefully, this article will clarify choices, and help you make informed decisions.

Securing dynamic data access is hard

GraphQL APIs are appealing because they are flexible, but this makes adding authorization difficult. In the REST world, you can (at a minimum) authorize individual endpoints, but in GraphQL, you have to find a way to authorize each query and mutation generically.

In addition to the client/server behavior changes, GraphQL also introduces features like federation that make it possible to deploy many different GraphQL services and expose them via a single unified API to clients. The big issue you face when building authorization in a distributed architecture is not having local access to all of the data required to make authorization decisions.

The correct solution is (as always) dependent on your expected scale and complexity budget.

Authorization models in GraphQL

Imagine a user/subject with an ID like id_1234567890 that makes an HTTP POST request with a GraphQL mutation to rename a District with ID of "47b3f42a-65c2-46b1-93d5-47ff4e92cf5b". That looks like this:

mutation {
  updateDistrict(input: {name: "Test", id: "47b3f42a-65c2-46b1-93d5-47ff4e92cf5b"}) {
    id
  }
}

How could you handle authorization (allow or deny) for this request in different authorization (authz) models?

  • Role-Based Access Control (RBAC)

    • Only the subject’s role determines whether to allow or deny access. e.g. DevAdmin Role (“Can Edit District”)
  • Attribute-Based Access Control (ABAC)

    • The user/subject, action, and resource are combined to determine whether to allow or deny access. e.g. userID + updateDistrict + district:ID would be allowed for users who have the Attribute of owner for the district:ID of 47b3f42a-65c2-46b1-93d5-47ff4e92cf5b
  • Policy-Based Access Control (PBAC)

    • Like ABAC but the rules are kept in policy documents. Attributes in rules can be on subject, object, or action.

      [request_definition]
      r = sub, obj, act
      
      [policy_definition]
      p = sub, obj, act
      
      [policy_effect]
      e = some(where (p.eft == allow))
      
      [matchers]
      m = r.sub == r.obj.Owner
      
  • Relationship-Based Access Control (ReBAC)

    • Relationship-based access control (ReBAC) is both a subset of ABAC and a superset of RBAC. ReBAC builds a relationship graph between subjects and objects via relations. These relations include data ownership, parent-child relationships, groups, and hierarchies (or relation chains). Google’s Zanzibar proved this could be extremely low latency at a huge scale. Zanzibar’s relations are built out of tuples defined like:

      tuple := (object, relation, user)
      object := namespace:id
      user :=  object | (object, relation)
      relation := string
      namespace := string
      id := string
      

      For instance, "User id_1234567890 owns District 47b3f42a-65c2-46b1-93d5-47ff4e92cf5b" is represented as a tuple of <resource>#<relation>@<subject>, like:

      district:47b3f42a-65c2-46b1-93d5-47ff4e92cf5b#owner@user:id_1234567890
      

      A policy that operates on the graph of such tuples can combine a check_permission rule and a check_relation rule, similar to:

      allowed {
        ds.check_permission({
          "subject": {"id": input.user.id},
          "permission": "can-edit",
          "object": {"key": input.resource.district_id, "type": "district"},
        })
      }
      
      allowed {
        district = ds.object({"key": input.resource.district_id, "type": "district"})
      
        ds.check_relation({
          "subject": {"id": input.user.id},
          "relation": {"name": "manager_of", "type": "user"},
          "object": {"id": district.properties.owner_id},
        })
      }
      
  • Graph-Based Access Control (GBAC)

    Similar to how ReBAC builds a restricted graph, GBAC uses arbitrary queries of an unrestricted graph. For instance, in Dgraph you can add schema directives with @auth rules that are arbitrary GraphQL queries:

    mutation {
      updateDistrict @auth(
        query: { rule: """
            query($USER: String!) {
                queryDistrict {
                    owner(filter: { username: { eq: $USER } }) {
                        __typename
                    }
                }
            }"""})(input: {name: "Test", id: "47b3f42a-65c2-46b1-93d5-47ff4e92cf5b"}) {
        id
      }
    }
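
To make the differences concrete, here is a minimal Go sketch of how the updateDistrict request might be checked under three of the models above. All type and function names (Request, District, Tuple, allowRBAC, allowABAC, allowReBAC, exampleRequest) are hypothetical, the role-to-action table is invented for illustration, and the ReBAC check only looks for a direct tuple instead of walking relation chains the way a Zanzibar-style system would.

```go
package main

import "fmt"

// Request is a hypothetical authorization input for the example mutation.
type Request struct {
	UserID     string
	Roles      []string
	Action     string
	DistrictID string
}

// District carries the single resource attribute the ABAC rule needs.
type District struct {
	ID      string
	OwnerID string
}

// Tuple mirrors the <resource>#<relation>@<subject> form.
type Tuple struct {
	Object   string
	Relation string
	Subject  string
}

// allowRBAC looks only at the subject's roles.
func allowRBAC(r Request, roleActions map[string][]string) bool {
	for _, role := range r.Roles {
		for _, action := range roleActions[role] {
			if action == r.Action {
				return true
			}
		}
	}
	return false
}

// allowABAC combines subject, action, and a resource attribute (the owner).
func allowABAC(r Request, d District) bool {
	return r.Action == "updateDistrict" && d.ID == r.DistrictID && d.OwnerID == r.UserID
}

// allowReBAC checks for a direct "owner" tuple between district and user.
// Assumption: no relation chains or group expansion are evaluated here.
func allowReBAC(r Request, tuples map[Tuple]bool) bool {
	return tuples[Tuple{
		Object:   "district:" + r.DistrictID,
		Relation: "owner",
		Subject:  "user:" + r.UserID,
	}]
}

// exampleRequest builds the request described in the article.
func exampleRequest() Request {
	return Request{
		UserID:     "id_1234567890",
		Roles:      []string{"DevAdmin"},
		Action:     "updateDistrict",
		DistrictID: "47b3f42a-65c2-46b1-93d5-47ff4e92cf5b",
	}
}

func main() {
	req := exampleRequest()
	roleActions := map[string][]string{"DevAdmin": {"updateDistrict"}}
	district := District{ID: req.DistrictID, OwnerID: req.UserID}
	tuples := map[Tuple]bool{
		{Object: "district:" + req.DistrictID, Relation: "owner", Subject: "user:" + req.UserID}: true,
	}
	fmt.Println("RBAC:", allowRBAC(req, roleActions))
	fmt.Println("ABAC:", allowABAC(req, district))
	fmt.Println("ReBAC:", allowReBAC(req, tuples))
}
```

Note that only the RBAC check can be answered from the subject alone; the other two need data about the district, which is where the distributed-data problem mentioned earlier starts to bite.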
    

Ad hoc ACLs or Centralized System?

An access control list (ACL) is a list of rules that specify which users or systems are granted or denied access to a particular object or system resource.

For our example mutation, would we want the access control list of rules for it to be unique (bespoke or ad hoc), or compose it from some standard rules? These are some common ones that might be standardized:

  • OpenAccess is an ACL that is a no-op: it marks that this resolver is open-access. (Add a comment if it’s not obvious why!)

  • IsLoggedIn checks if the request is made by some sort of authenticated user.

  • IsCurrentUser checks that the given user (target) and the current user (actor) are the same.

  • ActorHasPermission checks that the actor has a specified permission.

  • IsManagedByActor checks that the current user (actor) manages the resource (target).

  • ValidatesSecret checks that the request includes the correct shared secret.
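
One way to standardize rules like these is to make each one a small function and compose them. The following Go sketch is only an illustration under assumptions: the Rule type, the Actor struct, the AnyOf combinator, and the loggedIn/anonymous helpers are all hypothetical, and the actor is assumed to have been placed in the context by earlier authentication code.

```go
package main

import (
	"context"
	"fmt"
)

type actorKey struct{}

// Actor is a hypothetical authenticated caller stored in the request context.
type Actor struct {
	ID          string
	Permissions map[string]bool
}

// Rule is one reusable ACL check.
type Rule func(ctx context.Context) bool

// OpenAccess is a no-op rule that documents an intentionally open resolver.
func OpenAccess() Rule { return func(context.Context) bool { return true } }

// IsLoggedIn passes when any authenticated actor is present.
func IsLoggedIn() Rule {
	return func(ctx context.Context) bool {
		_, ok := ctx.Value(actorKey{}).(Actor)
		return ok
	}
}

// IsCurrentUser passes when the actor and the target user are the same.
func IsCurrentUser(targetID string) Rule {
	return func(ctx context.Context) bool {
		a, ok := ctx.Value(actorKey{}).(Actor)
		return ok && a.ID == targetID
	}
}

// ActorHasPermission passes when the actor holds the named permission.
func ActorHasPermission(permission string) Rule {
	return func(ctx context.Context) bool {
		a, ok := ctx.Value(actorKey{}).(Actor)
		return ok && a.Permissions[permission]
	}
}

// AnyOf allows the request when at least one rule passes.
func AnyOf(rules ...Rule) Rule {
	return func(ctx context.Context) bool {
		for _, rule := range rules {
			if rule(ctx) {
				return true
			}
		}
		return false
	}
}

// loggedIn builds a context for an actor with the given permissions.
func loggedIn(id string, permissions ...string) context.Context {
	set := make(map[string]bool, len(permissions))
	for _, p := range permissions {
		set[p] = true
	}
	return context.WithValue(context.Background(), actorKey{}, Actor{ID: id, Permissions: set})
}

// anonymous builds a context with no actor.
func anonymous() context.Context { return context.Background() }

func main() {
	canEdit := AnyOf(IsCurrentUser("id_42"), ActorHasPermission("can-edit-district"))
	fmt.Println(canEdit(loggedIn("id_1234567890", "can-edit-district")))
	fmt.Println(canEdit(loggedIn("id_1234567890")))
	fmt.Println(canEdit(anonymous()))
}
```

Composing from a fixed vocabulary keeps each resolver's ACL readable at a glance, at the cost of occasionally needing a bespoke rule for an unusual case.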

Roles vs. Scopes

Scopes are usually associated with API access. An API defines what scopes are available (what services it provides). For example, a user account management API might define scopes like read:user, create:user, update:user. These are the capabilities the API provides, but not necessarily what any given user can do. In the Role & Scope model, Roles are defined, and users are given a Role. Individual Scopes are associated with a given Role, combining all these elements together. For example, you might have:

Role: Audit, API: user_manager, Scopes: read:user
Role: Access Control, API:  user_manager, Scopes: create:user, update:user, read:user

So if Alice has the Audit role and Bob has the Access Control role, a GraphQL query of users might require the read:user scope and allow them both, while a mutation to edit a user might require the update:user scope, so only Bob should be allowed to make that request.
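
The Alice and Bob example can be sketched in a few lines of Go. This is a minimal illustration, not a recommended storage layout: the roleScopes and userRoles maps and the hasScope function are hypothetical names, and a real system would load the assignments from a user store or token claims.

```go
package main

import "fmt"

// roleScopes mirrors the Role/Scopes listing for the user_manager API.
var roleScopes = map[string][]string{
	"Audit":          {"read:user"},
	"Access Control": {"create:user", "update:user", "read:user"},
}

// userRoles is hypothetical role assignment data.
var userRoles = map[string]string{
	"Alice": "Audit",
	"Bob":   "Access Control",
}

// hasScope reports whether the user's role grants the required scope.
// Unknown users and unknown roles grant nothing.
func hasScope(user, required string) bool {
	for _, scope := range roleScopes[userRoles[user]] {
		if scope == required {
			return true
		}
	}
	return false
}

func main() {
	for _, user := range []string{"Alice", "Bob"} {
		fmt.Printf("%s read:user=%v update:user=%v\n",
			user, hasScope(user, "read:user"), hasScope(user, "update:user"))
	}
}
```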

Where to perform authorization in a GraphQL architecture?

There’s a great article on https://www.osohq.com/post/graphql-authorization that you should just read. Assuming that authentication (authn) has already happened and the user has some sort of authenticated identity token (by the way, this is an excellent article on the different token types), where should you perform authorization (authz)?

  • GraphQL resolver

    Often people put code into the beginning of their resolver like this:

    if !(acl.IsCurrentUser(ctx, userId) ||
        acl.ActorHasPermission(ctx, capabilities.CanChangeUserData, acl.UserScope(userId)) ||
        acl.IsManagedByActor(ctx, userId)) {
        // return an unauthorized error
    }
    
  • Directives

    You could add directives to your schema like in WunderGraph or Dgraph:

    mutation @rbac(requireMatchAll: [superadmin]) {
      updateDistrict(input: {name: "Test", id: "47b3f42a-65c2-46b1-93d5-47ff4e92cf5b"}) {
        id
      }
    }
    
  • Middleware

    Middleware can decouple your authorization logic from your schema as with GraphQL Shield. Middleware authorization works best for rules that apply to your whole schema at once i.e. every query and resolver, since middleware doesn’t have more specific information for more complicated domain rules. Good middleware rules could be used to filter out invalid tokens, reject non-safelisted queries, or to use calculated query complexity scores and reject overly expensive queries (e.g. see compgen).

  • Data Access Layer

    Since GraphQL can return partial results, authorization for reading data can be pushed below resolvers. If you are using PostgreSQL you can even use row and column-based access to push it all the way out of your app into your database! However, authorizing this deep is awkward:

    • without request and user-specific variables
    • for write access control
    • if your data is in more than one place.
  • Federated Gateway


    Like middleware, this is either a great or terrible place for authorization. If there is no way to bypass a federated gateway, and you have only a few simple rules that need no domain or application-specific information, then it can greatly simplify things. However, for more complicated needs (e.g. ABAC), trying to do authorization at the gateway is an example of an overambitious API gateway (https://www.thoughtworks.com/radar/platforms/overambitious-api-gateways), which encourages designs that are difficult to test and deploy. You can still use the gateway like middleware to filter out obvious bad actors and invalid tokens, reject non-safelisted queries, or use calculated query complexity scores to reject overly expensive queries.

  • External Authorization System

    Policy engines like SpiceDB, OpenFGA, Ory Keto, or Open Policy Agent (OPA) let you put your ReBAC rules in an external system and reference them from your queries. The main benefit of the centralized relationships model is that it makes it possible to manage authorization centrally. This means that development teams can create new applications and add new relationships without needing to update any application code.

    However, the downside is that you are constraining your application to use a very specific data model and you need to design your application around that data store.

    At a certain scale, the balance tips towards centralization.

  • Identity token claims

    Tokens are great since HTTP is stateless. If you have a few roles (or claims), you can put them in the token, and your authorization logic is done. A claim of mutation:* and query:* would give full access, just as a role of admin would. However, tokens travel in HTTP headers (cookies are headers too), and you are limited in the size of any one header, as well as the total size of all your headers together.

    Microsoft specifically calls out Windows authentication (NTLM/Kerberos/Negotiate) as not supported on HTTP/2 due to HPACK performance issues that those large headers cause.

    HTTP/2's header compression (HPACK) defaults to a 4 KB dynamic table (SETTINGS_HEADER_TABLE_SIZE), and servers and proxies add their own limits on header size. Beyond those limits, plan to set your relationship status to "It's Complicated", as you will need to negotiate increasing them with all your intervening infrastructure (load balancers, proxies, CDNs).

    • nginx: 8 KB max for a single header
    • Envoy (used in Istio): 60 KB for headers
    • Node.js: 16 KB for headers
    • Traefik: re-uses Go's net/http, 1 MB for headers
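
As an illustration of the coarse, schema-wide checks that suit middleware, here is a Go sketch using only net/http. It rejects requests whose Authorization header is missing, malformed, or larger than a budget before the GraphQL handler runs. The names requireBearer, statusFor, and bigToken, the /query path, and the 4 KB budget are all assumptions for this example, and the sketch deliberately does not verify the token itself, which should be left to a real token library.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

// maxAuthHeaderBytes is an assumed budget for the Authorization header.
const maxAuthHeaderBytes = 4 << 10

// requireBearer rejects requests with a missing, malformed, or oversized
// Authorization header. It does NOT validate the token's signature or claims.
func requireBearer(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		h := r.Header.Get("Authorization")
		switch {
		case !strings.HasPrefix(h, "Bearer ") || len(h) == len("Bearer "):
			http.Error(w, "missing bearer token", http.StatusUnauthorized)
		case len(h) > maxAuthHeaderBytes:
			http.Error(w, "token too large", http.StatusRequestHeaderFieldsTooLarge)
		default:
			next.ServeHTTP(w, r)
		}
	})
}

// statusFor sends one request through the middleware and returns the status.
func statusFor(authHeader string) int {
	// Stand-in for the real GraphQL handler.
	graphqlHandler := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusOK)
	})
	body := strings.NewReader(`{"query":"{ __typename }"}`)
	req := httptest.NewRequest(http.MethodPost, "/query", body)
	if authHeader != "" {
		req.Header.Set("Authorization", authHeader)
	}
	rec := httptest.NewRecorder()
	requireBearer(graphqlHandler).ServeHTTP(rec, req)
	return rec.Code
}

// bigToken builds a bearer header with an n-byte token, for demonstration.
func bigToken(n int) string { return "Bearer " + strings.Repeat("x", n) }

func main() {
	fmt.Println(statusFor(""))             // 401: no token
	fmt.Println(statusFor("Bearer abc"))   // 200: passed to the GraphQL handler
	fmt.Println(statusFor(bigToken(5000))) // 431: header over budget
}
```

Checking the header size in your own middleware gives you an early, explicit failure instead of an opaque rejection from whichever proxy has the smallest limit.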

More Reading

I cannot stress enough that you should read Patrick O'Dougherty's GraphQL Authorization Patterns and Thomas Ptacek's API Tokens: A Tedious Survey.

Conclusion: Wait, what is your advice?

Start with RBAC, and use schema directives, codegen and identity tokens. Then you can grow out of it to more complicated setups.

Move as much authorization as possible into the schema, so everything is more transparent and visible. It helps you to set good boundaries.
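
As a rough picture of that starting point, here is a Go sketch of a role-checking directive. The function signature is modeled on how gqlgen calls directive implementations (context, parent object, next resolver, then the directive's arguments), but the Resolver type is redeclared locally so the example has no third-party imports. The @hasRole directive, the rolesKey context key, and the withRoles and tryUpdate helpers are hypothetical; roles are assumed to have been copied from identity token claims into the context by earlier middleware.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// Resolver has the same shape as gqlgen's graphql.Resolver, redeclared here
// so the sketch stays standard-library only.
type Resolver func(ctx context.Context) (interface{}, error)

type rolesKey struct{}

var errForbidden = errors.New("forbidden")

// withRoles stores roles that upstream middleware would read from token claims.
func withRoles(ctx context.Context, roles ...string) context.Context {
	return context.WithValue(ctx, rolesKey{}, roles)
}

// HasRole is a hypothetical implementation of a schema directive such as
//
//	directive @hasRole(role: String!) on FIELD_DEFINITION
//
// It runs before the resolver and calls next only when the role is present.
func HasRole(ctx context.Context, obj interface{}, next Resolver, role string) (interface{}, error) {
	roles, _ := ctx.Value(rolesKey{}).([]string)
	for _, r := range roles {
		if r == role {
			return next(ctx)
		}
	}
	return nil, errForbidden
}

// updateDistrict stands in for the generated resolver call.
func updateDistrict(ctx context.Context) (interface{}, error) {
	return "47b3f42a-65c2-46b1-93d5-47ff4e92cf5b", nil
}

// tryUpdate runs the directive for a caller holding the given roles.
func tryUpdate(roles ...string) error {
	ctx := withRoles(context.Background(), roles...)
	_, err := HasRole(ctx, nil, updateDistrict, "DevAdmin")
	return err
}

func main() {
	fmt.Println(tryUpdate("DevAdmin")) // <nil>
	fmt.Println(tryUpdate("Audit"))    // forbidden
}
```

Because the required role lives in the schema next to the field, anyone reading the schema can see who may call updateDistrict without opening the resolver code.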

Parting Questions

What are your favorite authorization schema directives? What is your favorite token claim, attribute, policy, or role?
