Context For The Problem
I work on a tool called the Field Tasking Manager (Field-TM) at the Humanitarian OpenStreetMap Team.
The goal is to coordinate field mapping activities: both enriching OpenStreetMap data with field-verified info, and collecting field data for development and humanitarian contexts.
Underneath, we use two tools from ODK: Collect and Central.
Collect is used on mobile devices to send data to the Central server.
However, we need a way to be notified that a user has submitted new data from their phone, triggering an update from ODK Central --> Field-TM.
There are three obvious approaches for this type of behaviour:
- Pull via polling: polling of the ODK APIs at an interval (simple, requires no change to ODK, but resource inefficient).
- Push/pull via websocket: bi-directional communication, but potentially complex to implement, and requires full development control of both services (meaning significant changes to ODK).
- Push via webhook: a change in ODK Central triggers a call to a remote API, including POST data (simple and efficient, requiring no persistent connection).
Using a webhook could potentially involve changes to the ODK codebase, but we can work around that with the solution I will describe below!
There are of course other application communication methods that are out of scope of this article: event driven messaging, Redis pub/sub, gRPC, etc.
Postgres LISTEN / NOTIFY
Having not previously used the LISTEN / NOTIFY functionality in Postgres, I stumbled across this excellent article by @brojonat, which in turn references this article by @brandur.
They describe a 'notifier pattern' for Postgres, where a connection is made to establish a 'listener'. Events in Postgres (such as a data insert) can be configured to send a notification to this 'listener', including the data that was inserted, or other data from the database.
(Image: the Postgres notifier pattern; credit to @brandur's post linked above.)
This sounds perfect for a webhook service!
It could be achieved like this:
- Create a connection to Postgres, with a LISTEN for events.
- Create a TRIGGER in Postgres, to NOTIFY the listener and submit data.
- Parse the data in our service, then send a POST request containing the data to a remote API endpoint (the webhook).
Ideally we could deploy this small service alongside ODK Central, watching the Postgres database and triggering when new submissions are made.
This would mean no changes are required to the codebase of the application we develop the webhook for, making this a performant yet non-intrusive method, implemented as middleware.
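To make the pattern concrete, here is a minimal sketch of such a listener in Go, using the jackc/pgx driver. The connection string is a placeholder, the 'odk-events' channel name matches the trigger shown later in this post, and error handling is pared down for brevity:

package main

import (
	"context"
	"log"

	"github.com/jackc/pgx/v5"
)

func main() {
	ctx := context.Background()

	// Open a dedicated connection for listening (placeholder credentials)
	conn, err := pgx.Connect(ctx, "postgres://user:pass@localhost:5432/odk")
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close(ctx)

	// Register interest in the channel our trigger will NOTIFY
	if _, err := conn.Exec(ctx, `LISTEN "odk-events"`); err != nil {
		log.Fatal(err)
	}

	for {
		// Blocks until a NOTIFY arrives on the channel
		n, err := conn.WaitForNotification(ctx)
		if err != nil {
			log.Fatal(err)
		}
		log.Printf("received on %s: %s", n.Channel, n.Payload)
	}
}

The notifying side can be any session, or in our case a TRIGGER function calling pg_notify, as shown in the implementation below.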
Why Golang?
I would say that I am far from proficient in Golang at this point, with Python being my preferred backend language.
Python has the excellent psycopg library for interfacing with Postgres, and it has some great documentation on how to use LISTEN / NOTIFY in Python.
However, as a language, Golang has many advantages for this type of project:
- High performance / resource usage ratio, with a low memory footprint.
- Statically compiled binary that can be distributed anywhere. Simply download the binary / executable and run, without worrying about dependencies, versions, etc.
- Has a very comprehensive standard library, meaning it should require little maintenance going forward (minimal dependency upgrades).
- Reasonably simple to use. Of course, it has a few extra concepts compared to Python, such as pointers, composition, and goroutines, but it is not a huge lift to learn.
Alternative Approaches
It should be noted there are alternative approaches to achieve the same goals, which may or may not be better depending on your requirements.
Three notable examples:
- Instead of implementing a custom listener/notifier, it's possible to use existing solutions, such as this Express app that can be configured via JSON to listen for events and trigger webhooks. See the related blog.
- You could possibly call the webhook directly from a Postgres trigger using pgsql-http, as long as the API has no authentication requirements.
- There is also this very neat logical-replication-based approach that achieves something similar.
The Implementation
Right, now let's dig into the code.
I won't replicate much here, as it's all available on the linked code repo, but will attempt to explain how the tool works.
- Here we use the github.com/jackc/pgx/v5/pgxpool package (the only external dependency used) as the Postgres database driver. The lib/pq package might look more official at first glance, but it is community-led and effectively in maintenance mode only.
- This is where we implement the LISTEN / NOTIFY functionality, as described in the previous section / blogs.
- Here we have a reasonably complex TRIGGER function, with a CASE statement to handle three different scenarios.
- This is the core logic of what event we want to trigger the webhook. Your implementation could be significantly simpler.
- We are using the ODK Central functionality called audit logs here, essentially an event stream we can hook into and trigger our notifications from. The function below is attached to the audit log table via a standard CREATE TRIGGER, so it runs whenever an audit row is inserted or updated.
CREATE OR REPLACE FUNCTION new_audit_log() RETURNS trigger AS
$$
DECLARE
js jsonb;
action_type text;
result_data jsonb;
BEGIN
-- Serialize the NEW row into JSONB
SELECT to_jsonb(NEW.*) INTO js;
-- Add the DML action (INSERT/UPDATE)
js := jsonb_set(js, '{dml_action}', to_jsonb(TG_OP));
-- Extract the action type from the NEW row
action_type := NEW.action;
-- Handle different action types with a CASE statement
CASE action_type
WHEN 'entity.update.version' THEN
SELECT entity_defs.data
INTO result_data
FROM entity_defs
WHERE entity_defs.id = (NEW.details->>'entityDefId')::int;
-- Merge the entity details into the JSON data key
js := jsonb_set(js, '{data}', result_data, true);
-- Notify the odk-events queue
PERFORM pg_notify('odk-events', js::text);
WHEN 'submission.create' THEN
SELECT jsonb_build_object('xml', submission_defs.xml)
INTO result_data
FROM submission_defs
WHERE submission_defs.id = (NEW.details->>'submissionDefId')::int;
-- Merge the submission XML into the JSON data key
js := jsonb_set(js, '{data}', result_data, true);
-- Notify the odk-events queue
PERFORM pg_notify('odk-events', js::text);
WHEN 'submission.update' THEN
SELECT jsonb_build_object('instanceId', submission_defs."instanceId")
INTO result_data
FROM submission_defs
WHERE submission_defs.id = (NEW.details->>'submissionDefId')::int;
-- Extract 'reviewState' from 'details' and set it in 'data'
js := jsonb_set(js, '{data}', jsonb_build_object('reviewState', js->'details'->>'reviewState'), true);
-- Remove 'reviewState' from 'details'
js := jsonb_set(js, '{details}', (js->'details')::jsonb - 'reviewState', true);
-- Merge the instanceId into the existing 'details' key in JSON
js := jsonb_set(js, '{details}', (js->'details') || result_data, true);
-- Notify the odk-events queue
PERFORM pg_notify('odk-events', js::text);
ELSE
-- Skip pg_notify for unsupported actions & insert as normal
RETURN NEW;
END CASE;
RETURN NEW;
END;
$$ LANGUAGE plpgsql;
- The code to parse the TRIGGER output into a Go struct, then marshal JSON for sending to the webhook.
- Send a request with the given JSON payload, to the configured webhook URL.
- Handle authentication on the API by means of an X-API-Key request header.
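As a rough sketch of those last steps, assuming a struct whose fields mirror the trigger payload shown above (the actual repo code may differ in naming and detail), parsing and forwarding could look like this:

package webhook

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// OdkEvent mirrors (part of) the JSON payload built by the trigger above.
type OdkEvent struct {
	Action    string          `json:"action"`
	DmlAction string          `json:"dml_action"`
	Details   json.RawMessage `json:"details"`
	Data      json.RawMessage `json:"data"`
}

// sendToWebhook parses a NOTIFY payload and forwards it to the
// configured webhook URL, authenticating via an X-API-Key header.
func sendToWebhook(payload, webhookURL, apiKey string) error {
	var event OdkEvent
	if err := json.Unmarshal([]byte(payload), &event); err != nil {
		return fmt.Errorf("parse notification: %w", err)
	}

	body, err := json.Marshal(event)
	if err != nil {
		return fmt.Errorf("marshal event: %w", err)
	}

	req, err := http.NewRequest(http.MethodPost, webhookURL, bytes.NewReader(body))
	if err != nil {
		return err
	}
	req.Header.Set("Content-Type", "application/json")
	req.Header.Set("X-API-Key", apiKey)

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()

	if resp.StatusCode >= 300 {
		return fmt.Errorf("webhook returned status %d", resp.StatusCode)
	}
	return nil
}

Each payload received by the listener would be handed to something like sendToWebhook for delivery.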
The Outcome
The final code repository can be found here.
The final binary is ~10MB in size (~15MB when distributed in a minimal container image), and consumes only ~5MB of memory at idle.
Of course, this holds a connection open to the Postgres database (cycled continually to remain alive), but overall I am extremely happy with this approach: a small, lightweight service running alongside ODK.
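As a sketch of that connection cycling (using pgxpool, as in the repo, but with placeholder connection details and simplified retry logic; the repo's actual reconnection handling may differ):

package main

import (
	"context"
	"log"
	"time"

	"github.com/jackc/pgx/v5/pgxpool"
)

// listen acquires a pooled connection, issues LISTEN, and blocks
// handling notifications until the connection drops.
func listen(ctx context.Context, pool *pgxpool.Pool) error {
	conn, err := pool.Acquire(ctx)
	if err != nil {
		return err
	}
	defer conn.Release()

	if _, err := conn.Exec(ctx, `LISTEN "odk-events"`); err != nil {
		return err
	}
	for {
		n, err := conn.Conn().WaitForNotification(ctx)
		if err != nil {
			return err // connection lost; the caller reconnects
		}
		log.Printf("event: %s", n.Payload) // hand off to the webhook sender here
	}
}

func main() {
	ctx := context.Background()
	pool, err := pgxpool.New(ctx, "postgres://user:pass@localhost:5432/odk")
	if err != nil {
		log.Fatal(err)
	}
	defer pool.Close()

	// Cycle the listening connection forever, pausing briefly between retries
	for {
		if err := listen(ctx, pool); err != nil {
			log.Printf("listener dropped: %v; retrying in 5s", err)
			time.Sleep(5 * time.Second)
		}
	}
}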
Forum discussion around the approach can be found here.
The repo issue and linked PRs related to implementation with Field-TM can be found here.