Peter Strøiman

Posted on Feb 8

I created a headless browser in Go. Here's what I learned

#go #tdd #webdev #javascript

I have more than 20 years of web development experience, using both server-side rendering (SSR) and React. React is a great framework for advanced UIs, but it brings a lot of complexity compared to SSR.

When I watched The Primeagen's introduction to HTMX, I had a eureka moment: a tool that enables the same smooth UX as React for the most part, but with the simplicity of SSR. And when HTMX doesn't cut it, a hybrid is a very plausible solution. PWAs are not possible, but who really needs that (and is willing to pay for the complexity)?

I started learning HTMX, but as a long-time TDD practitioner, I wanted to write tests that provide fast feedback and help drive the implementation of behaviour.

Testing HTTP servers in Go is extremely easy, as an HTTP server is just a function; easily callable from test code. This makes testing the HTTP layer like testing any other part of the Go codebase, supporting mocking of dependencies as necessary.

But request/response verification of an HTMX application is tightly coupled to implementation details instead of behaviour and responsibility. The application behaviour is the result of a choreography of attributes in the HTML, backend route handlers, and the response headers and response bodies intended to trigger a specific behaviour in the browser.

I searched for what people used to test HTMX applications, and the answer was: browser automation, mostly Playwright.

Such a tool has so much overhead that it discourages a TDD process. In my mind, there is only one way, a headless browser with an interface in the same language as your backend code, allowing tests to mock dependencies. And there wasn't one for Go.

So I decided to build my own: Gost-DOM.

Eliminating the unknowns

To get something like this working, I knew that there were two major uncertainties I had to address.

Parse HTML into a DOM
Execute JavaScript

I started exploring both options, and after just two days, I had an early prototype that was able to parse an HTML page with a <script> tag, and execute it.

It gave me confidence to believe that I could pull it off. But I hadn't fully grasped the scope of the project.

Parsing HTML

I'm no stranger to language parsers that generate an AST from source code. But HTML parsing has a unique twist. You cannot just exit with a non-zero exit code if the HTML is malformed. It must generate a DOM, and it's clearly specified how.

I decided to defer the problem of malformed content for the first prototype, and I quickly had a prototype that could parse a simple document into a DOM. I started implementing specific rules, such as executing scripts when they are mounted. Well, that was about it at the time.

While I was getting feedback on the actual DOM implementation in the golang subreddit, a friendly user mentioned I might want to consider using the x/net/html package. This is an HTML parser that already handles all the weird rules for malformed HTML.

While the types themselves couldn't work as the DOM in my scenario, and the parser didn't handle the DOM tree construction rules, it did handle the HTML parsing rules, including dealing with malformed HTML.

I decided to discard my own parser, and parse HTML into the x/net/html structure, then iterate that tree to build my own.

This solution has limitations. For example, document.write doesn't work, because the input stream has already been consumed by the time it's executed. But the solution is perfect for the first version of the tool. The priority is to support testing modern web applications, and the solution allowed me to quickly turn to other problems.

Embedding and extending V8

As fun as it would be to write a JavaScript engine, I would never have gotten to the point I am at now, had I written my own. But Go supports linking to C code, so embedding V8 was a definite possibility, and I'm confident it has the features needed in a browser context.

I searched for anything about embedding V8 into Go, and I found more than I was looking for. A project already existed that exposes V8 to Go: v8go. It hadn't been maintained for some time, but tommie's fork had not just been kept up to date with the latest V8 versions. It automatically pulls the latest version from the Chromium sources!

I added tommie's version to my project, but soon learned that many essential features of V8 weren't exposed to Go code. When the DOM itself is implemented in Go, I need objects in V8 to wrap Go objects, and I need to build the prototype hierarchy in Go. The V8 features essential for this use case were not available in v8go.

I have added these in my own fork. Most of my changes have been merged into tommie's fork now; a few still only exist in my fork.

Learning CGo

To add the functionality, I needed to learn the V8 API, as well as CGo, the part of Go that allows calling C code, and calling back from C code.

As Rogchap, the original author of v8go, had already created the framework, I could start with little knowledge and copy/paste his code, adapting it to the new V8 API functions that needed to be exposed. But the header files had a weird structure, which I realised was due to another problem.

Go and C++ are like oil and water

Go cannot call C++ code, just C code. Rogchap had implemented a solution using preprocessor directives. When header files are compiled as part of the Go compilation, it sees C compatible code, and when they are part of the C++ compilation, it appears as C++ classes.

The Go types only store pointers, so the types the Go compiler sees don't need to be complete. Go just needs to know that they are pointer types. The following snippet of a header file shows the general approach:

#ifdef __cplusplus
namespace v8 {
// Forward declaration, the header file only uses pointers to this, so 
// the header file doesn't actually _need_ to include v8's "isolate.h"
class Isolate;
}
// Define a v8Isolate type, which C++ code sees as the v8 Isolate class
typedef v8::Isolate v8Isolate;

extern "C" { // The following functions should have C-calling conventions.
#else
// For C code, v8Isolate is defined as a struct
typedef struct v8Isolate v8Isolate;
#endif

// There is a C-compatible function, NewIsolate() which returns a pointer to
// a V8 Isolate. C++ compilation will know what it is, the v8::Isolate class.
// Go code will just know that it's a pointer, and that's all that Go code
// needs to know. It's still a nominal type, so Go will still check that you 
// use the right types.
extern v8Isolate* NewIsolate();
#ifdef __cplusplus
}  // extern "C"
#endif

I can safely say that if Rogchap hadn't created this, my project wouldn't have had a V8 engine. I have written C++ and assembler code, so I understand the fundamentals of C compilation and linking, as well as C++ scoping rules and the use of smart pointers for resource management, which V8 relies heavily on. But it's been 20 years since I last wrote C++. I don't think I would have discovered this pattern on my own.

Passing Go objects to C

In order for an object in JavaScript to wrap an object in the host, the V8 object needs to keep some reference to the host object.

For this purpose, V8 has the concept of an external value which can hold a generic pointer. But Go pointers aren't safe to pass to C for this usage. Go's garbage collector can move objects in memory, updating pointer variables accordingly. So any pointer held on to by C code would become invalid.

Go provides two solutions for this.

runtime.Pinner can prevent an object from being moved by the garbage collector, making pointers "safe" to pass to C code.
cgo.Handle creates an integer (uintptr) handle for the object. Internally, handles are generated from an atomically incrementing number, and a synchronised map converts a handle back to the object.

Of course, both mechanisms prevent the object from being garbage collected, so you need to explicitly clean up the object after use.
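The handle idea can be sketched in pure Go. This is not the runtime/cgo source, only a minimal illustration of the mechanism described above, an atomic counter plus a synchronised map. The names handle, newHandle, value, delete, and element are made up for the example.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// handle is an opaque integer that can be given to foreign code in place of
// a Go pointer. (Illustrative type, mimicking the idea behind cgo.Handle.)
type handle uintptr

var (
	lastHandle uintptr  // incremented atomically for every new handle
	handles    sync.Map // maps handle -> Go value
)

// newHandle registers v and returns an integer that identifies it.
// The map entry keeps v alive until delete is called.
func newHandle(v any) handle {
	h := handle(atomic.AddUintptr(&lastHandle, 1))
	handles.Store(h, v)
	return h
}

// value looks up the Go value for a handle; ok is false after delete.
func (h handle) value() (v any, ok bool) {
	return handles.Load(h)
}

// delete releases the handle so the value can be garbage collected again.
func (h handle) delete() {
	handles.Delete(h)
}

// element is a stand-in for a host object that a JavaScript object wraps.
type element struct{ tagName string }

func main() {
	h := newHandle(&element{tagName: "form"})
	if v, ok := h.value(); ok {
		fmt.Println(v.(*element).tagName)
	}
	h.delete() // forgetting this would leak the element
}
```

The integer is safe to store on the C side because it never refers to Go memory directly; the cost is the explicit delete call.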

v8go didn't use either of these two mechanisms, but had its own system, which works exactly like the cgo handle.

A mistake was made

It took some time before I realised I had made a mistake. Before I started, I had already made a technical decision based on my existing knowledge: Integrate V8. It never occurred to me that there could be other options. There were two alternate options I should have considered, and one I should have picked:

SpiderMonkey, the JavaScript engine for Firefox.
Goja, a JavaScript engine implemented in Go.

Could SpiderMonkey have been a better choice than V8? V8 isn't exactly easy to use outside C++ due to its heavy reliance on C++ style resource management.

But Goja! A JavaScript engine in pure Go! If I had known about this up front, I would have used it instead.

I started adding support for Goja as an alternate engine. It's not ready yet, but when it catches up with the V8 integration, I will switch to Goja as the default engine, eliminating the need for CGo.

Integrating with Goja is significantly easier than integrating with V8.

V8 support will not go away, as I should be able to safely rely on it being updated with the latest JavaScript features.

Implementing an Object-Oriented API

To build a browser, you have to implement the DOM specification. And that specification is very 90s style object-oriented in its design. It is full of circular references, and subclasses overriding the behaviour of superclasses.

Everything in the DOM is a node, and you can append children to nodes. Children all know about their parent. Specialised nodes could be Text, Comment, and Element, which again has an HTMLElement subclass, further broken down into HTMLBodyElement, HTMLFormElement, HTMLAnchorElement, etc.

A first solution could be something like:

type Node interface {
    AppendChild(Node) Node
    // Unexported; we need to call it to maintain the tree, but client
    // code shouldn't.
    setParent(Node)
}

type Element interface {
    Node
    // methods that exist on element
}

type node struct {
    parent     Node
    childNodes []Node
}

type element struct {
    node
    tagName string
}

func (n *node) AppendChild(child Node) Node {
    n.childNodes = append(n.childNodes, child)
    // n is the embedded *node here, even when called on an *element.
    child.setParent(n)
    return child
}

func (n *node) setParent(p Node) {
    n.parent = p
}

But this code doesn't work.

The type *node is a valid Node implementation, which makes *element one too as it embeds node. But when you call AppendChild on an element, it is a method on the embedded node that is executed. This method updates the parent of the new node to itself, the receiver value. Which is the embedded node, not the element.

So the DOM API, which is beyond my control, isn't a good fit for Go.

A solution. Was it the right one?

The solution I chose is to add a new method, SetSelf, to Node, and every single type that embeds a node must call it with a reference to itself as a Node.

type node struct {
    self       Node
    parent     Node
    childNodes []Node
}

func (n *node) SetSelf(self Node) {
    n.self = self
}

func (n *node) AppendChild(child Node) Node {
    n.childNodes = append(n.childNodes, child)
    // Pass the outermost type (e.g. the *element), not the embedded *node.
    child.setParent(n.self)
    return child
}

func NewElement(tagName string) Element {
    result := &element{newNode(), tagName}
    result.SetSelf(result)
    return result
}

The element tells the embedded node what its own Node implementation is, pointing to the element, not the embedded node. So for an element, the Node passed to SetSelf can still be cast back to Element. And for any method on Node where *element has provided its own implementation, it is now the method on *element that is called, not the one on *node.
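Putting the pieces together, the following is a self-contained sketch of the pattern. The Parent and TagName methods are additions for the sake of the demonstration and are not taken from the snippets above; the real Gost-DOM interfaces may look different.

```go
package main

import "fmt"

type Node interface {
	AppendChild(Node) Node
	Parent() Node // added for the demo, to inspect the result
	SetSelf(Node)
	setParent(Node)
}

type Element interface {
	Node
	TagName() string // added for the demo
}

type node struct {
	self       Node
	parent     Node
	childNodes []Node
}

func (n *node) SetSelf(self Node) { n.self = self }
func (n *node) Parent() Node     { return n.parent }
func (n *node) setParent(p Node) { n.parent = p }

func (n *node) AppendChild(child Node) Node {
	n.childNodes = append(n.childNodes, child)
	// Using n here instead of n.self would store the embedded *node,
	// and the type assertion in main would fail.
	child.setParent(n.self)
	return child
}

type element struct {
	node
	tagName string
}

func (e *element) TagName() string { return e.tagName }

func NewElement(tagName string) Element {
	e := &element{tagName: tagName}
	e.SetSelf(e) // easy to forget, which is the weakness of this design
	return e
}

func main() {
	parent := NewElement("div")
	child := NewElement("span")
	parent.AppendChild(child)

	// The parent can be converted back to an Element because the node
	// stored its "self" rather than the embedded struct.
	if p, ok := child.Parent().(Element); ok {
		fmt.Println("parent tag:", p.TagName())
	} else {
		fmt.Println("parent is not an Element")
	}
}
```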

I'm not really happy with this. The necessity of the call is not very obvious from the code, and if a specialised node type forgets to call it, the system will fail in ways that do little to reveal why.

And as I don't want to have to implement every HTML element in the same package, the method also needs to be exported (what other languages call public).

I am wondering if a Strategy Pattern would have been a better solution, but that presents other challenges, like how methods and attributes on specialised elements are exposed. E.g., a <form> element is represented by an HTMLFormElement, and it has specific properties and methods, e.g., a method property and a requestSubmit method. And the HTMLTemplateElement hides all its children in a document fragment, accessed through a content attribute.

But so far, the code works as intended, and perhaps a different solution will present later?

p.s. If you have actual experience building a rendering engine, I'd like to hear from you, about your experiences.

"You Know Nothing"

I have been in the software development industry for 25 years, and most of that has been web development. I am surprised at how much I didn't know.

There are special elements with special behaviour. For example, the <template> element doesn't have any children. Instead, its children in the HTML are added to a document fragment accessible through the content property. Yet they are rendered when reading outerHTML. This means that the HTMLTemplateElement needs to override outerHTML (object-oriented API).

Attributes aren't just attributes

The first good half of my career was server-side rendering of HTML (really the only option back then). Later, I have written a lot of React using the JSX syntax, which is almost indistinguishable from HTML.

So I was rather surprised when I learned that attributes on elements aren't just attributes. After all, for both SSR and React, I write attributes in an HTML-like style that doesn't reveal any difference, except for a few with different names. Yet there are two types of attributes: data attributes (which the HTML specification calls content attributes) and IDL attributes. The attributes in the HTML are data attributes, and are accessible in the DOM using getAttribute and setAttribute.

The IDL attributes are exposed as properties on the JavaScript element objects. They often reflect an identically named data attribute, but they can implement specific behaviour.

For example, the HTMLFormElement has a method IDL attribute. This will always return either "get" or "post", no matter the value of the data attribute. The default value is "get", and any invalid value will result in "get".

const form = document.createElement("form")
form.getAttribute("method")
// null
form.method
// "get"
form.method = "invalid"
form.getAttribute("method")
// "invalid"
form.method
// "get"
form.setAttribute("method", "post")
form.getAttribute("method")
// "post"
form.method
// "post"
form.setAttribute("method", "pOsT")
form.getAttribute("method")
// "pOsT"
form.method
// "post"

Implementing IDL attributes isn't too problematic; they often expose behaviour I need anyway; e.g., both the method and action IDL attributes on a form implement logic necessary for form posting anyway.

But I was rather surprised I didn't know of this distinction.
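On the Go side, the behaviour in the JavaScript session above could be modelled roughly like this. It is only a sketch that follows the description in this post (anything other than "post" reads back as "get"); the formElement type and its methods are hypothetical and not the actual Gost-DOM API.

```go
package main

import (
	"fmt"
	"strings"
)

// formElement is a hypothetical, stripped-down form element that only
// stores its attributes as written in the HTML.
type formElement struct {
	attributes map[string]string
}

func newFormElement() *formElement {
	return &formElement{attributes: make(map[string]string)}
}

// GetAttribute returns the raw attribute value; ok is false when the
// attribute is absent (JavaScript would see null).
func (f *formElement) GetAttribute(name string) (value string, ok bool) {
	value, ok = f.attributes[name]
	return value, ok
}

// SetAttribute stores the raw value without any normalisation.
func (f *formElement) SetAttribute(name, value string) {
	f.attributes[name] = value
}

// Method models the "method" IDL attribute as described in this post:
// "post" in any letter case yields "post"; a missing or unrecognised
// value yields the default "get".
func (f *formElement) Method() string {
	raw, _ := f.GetAttribute("method")
	if strings.ToLower(raw) == "post" {
		return "post"
	}
	return "get"
}

// SetMethod models assignment to the IDL attribute: the value is written
// to the underlying attribute unchanged.
func (f *formElement) SetMethod(value string) {
	f.SetAttribute("method", value)
}

func main() {
	form := newFormElement()
	form.SetMethod("invalid")
	raw, _ := form.GetAttribute("method")
	fmt.Println(raw, form.Method())
}
```

The normalisation lives in the getter, so the raw attribute always round-trips exactly as it was written.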

Some IDL attributes also have different names than their data attribute counterparts. For example, all aria- data attributes have camel-cased IDL attribute names, e.g., ariaLabel. And the data attributes for and class are reserved JavaScript words, so the corresponding IDL attributes are named htmlFor and className.

React developers will undoubtedly recognise the latter two.

Where's the event loop?

I was browsing through the V8 API documentation, searching for the place to register a callback handling errors caused by setTimeout or setInterval callbacks; but nothing like that existed.

That's because those functions aren't in V8 at all because they are actually not part of the ECMAScript specification.

Browsers and Node.js just happen to implement these functions. I did know that the two don't actually have the same interface; they differ in the type of the returned handle. The handle is a number in a browser, while it's an object in Node.js.

In hindsight, that alone should have been the dead giveaway that it's not implemented by V8. If Chrome and Node.js both use V8, and a global function returns different types, then it's probably not supplied by V8 itself.

So to write a headless browser, you need to write an event loop.
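At its simplest, such an event loop is a queue of callbacks that the host drains after script execution. The sketch below is a hypothetical, minimal version in Go, not Gost-DOM's implementation: it ignores delays entirely and reports panics from callbacks to an error handler, which stands in for the error callback I was looking for in V8.

```go
package main

import "fmt"

// eventLoop is a hypothetical, minimal task queue.
type eventLoop struct {
	tasks []func()
}

// SetTimeout queues a callback. This sketch has no notion of time, so
// every callback simply runs in the order it was queued.
func (l *eventLoop) SetTimeout(callback func()) {
	l.tasks = append(l.tasks, callback)
}

// Run executes tasks in FIFO order until the queue is empty, including
// tasks that were queued by other tasks while running.
func (l *eventLoop) Run(onError func(err any)) {
	for len(l.tasks) > 0 {
		task := l.tasks[0]
		l.tasks = l.tasks[1:]
		runTask(task, onError)
	}
}

// runTask isolates a single task so that one failing callback is
// reported to onError and does not stop the loop.
func runTask(task func(), onError func(err any)) {
	defer func() {
		if r := recover(); r != nil {
			onError(r)
		}
	}()
	task()
}

func main() {
	loop := &eventLoop{}
	loop.SetTimeout(func() {
		fmt.Println("first")
		loop.SetTimeout(func() { fmt.Println("queued by first") })
	})
	loop.SetTimeout(func() { panic("callback failed") })
	loop.SetTimeout(func() { fmt.Println("second") })
	loop.Run(func(err any) { fmt.Println("error:", err) })
}
```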

Autogenerating Code

Much of the behaviour of the DOM is specified by Web IDL specifications. After the first couple of JavaScript wrappers were written by hand, I started to investigate the possibility of autogenerating this code. I found webref, a repository that contains a collection of all the IDL files and is automatically updated. I added this as a submodule to start consuming those files.

Over time, that process led to two new Go packages that have been made available separately, as they have general-purpose use:

gost-dom/webref - A package that exposes a subset of the IDL data as native Go types.
gost-dom/generators - Helper types for code generation. This is a layer on top of jennifer, providing an interface that lends itself better to composition.

Cutting corners

It's a long tradition in JavaScript to use polyfills to backport new browser features to older browsers. The same polyfills can also be used to quickly add functionality to Gost, and many already exist. Currently, XPathEvaluator is a pure JavaScript implementation using slightly modified code from jsdom¹.

Many smaller functions, and constants, are only implemented in JavaScript code. For example, node type constants are in JavaScript space:

Node.ELEMENT_NODE = 1;
Node.ATTRIBUTE_NODE = 2;
Node.TEXT_NODE = 3;
Node.CDATA_SECTION_NODE = 4;
// etc

For CSS selector queries, I use a Go package, CSS. This works with x/net/html, the types that were used during HTML parsing. So when performing a CSS query, I recreate an "x" HTML tree from the Gost DOM, run the query on that tree, and translate the result back to the original elements.

Unfortunately, the library doesn't support checking if an element itself matches a CSS selector. So eventually I had to create my own matcher for Element.matches. But that is a minimal implementation, only used here, and only supporting the selectors that HTMX uses.

The really cool part!

As mentioned in the beginning, testing an HTTP server in Go is just a matter of calling an HTTP handler function.

A headless browser in Go can do the same. From day one, I added the ability to bypass the TCP layer completely, connecting directly to the HTTP handler implementation.

b := browser.NewFromHandler(myserver.RootHttpHandler)
// The host is ignored, but cookies don't work if it's not there.
window := b.Open("http://localhost:1234/")
window.Document().GetElementById("login-link").Click()

There's no overhead of the TCP stack, and no hassle of managing startup and shutdown. You can run all tests in parallel with full isolation, if your own code supports isolation. Each browser has its own isolated V8 instance.

The state of Gost

I recently released version 0.1. This release signifies that:

This has very limited functionality, but enough to support some real usage patterns.
The API is not guaranteed to be stable.

But it is now in a state where it covers basic scenarios like a "login flow":

Navigate to the "index".
Click an HTMX boosted link that should navigate to a page requiring authentication, responding with an hx-push-url header and the login page.
Verify that the current location is the login page, and the history has a new entry.
Fill out the form (setting the "value" data attribute on the input fields).
Click the form's submit button (<button type="submit"> or <input type="submit">, both work).
Verify that the user is now redirected to the page they were trying to access, the location is updated, and the history has a new entry.

Any JavaScript/DOM feature that is not directly affected by this flow isn't supported yet. E.g., there is a setTimeout function, but it disregards the timeout value; the callback is scheduled for immediate execution. There isn't yet a setInterval.

What's up.

The currently planned next feature is to have a proper event loop supporting setInterval, as well as the clear function. And to support time travel in order to test behaviour that requires time to pass, without actually having to wait in the test. Very much like ~~lolex~~ sinon fake timers.

But that shouldn't take too long to implement. I'll just inject fake timers on the JavaScript side.

But user feedback can to a large degree change priorities. If real users struggle with specific problems due to lack of support, that will be a natural focus of attention.

Please go use it

I believe that this would be an extremely useful tool for anyone using Go for web development with SSR and JavaScript, in particular HTMX. And as the popularity of this stack increases, this tool becomes even more relevant.

In theory, this should work with React apps as well; but the current focus is HTMX support.

I invite everyone to try it out. Use it for the simple cases, as explained above.

And please, spread the word.

You can find Gost here.

¹ jsdom doesn't have an XPathEvaluator class, but it does implement the global XPath evaluation functions. My adaptation was to write a class using those functions.
