How we apply Git to solve document collaboration and version control pains

Barbal was inspired by how software teams collaborate on code, and it delivers those benefits to professionals working on documents in traditional industries. Under the hood, we’re powered by the popular Git version control system.

To mark the 500th pull request on our own codebase, we thought it timely to explore how Barbal uses software engineering approaches to make collaborating on documents painless. Along the way we explain our opinionated approach and where we depart from what you might expect.

When we looked at how engineers, consultants and lawyers were working together on complex documents, we saw similar trends:

  • Documents are highly structured and require consistency of style and numbering throughout
  • The content and presentation of documents are decoupled and under the control of different people
  • Rigorous approaches to review and approvals are applied before documents can be shared externally
  • Professionals don’t like using real-time collaboration; they prefer working in private and sharing their changes when their ideas are fully formed
  • Version control is essential, both for efficiency and to manage risk and liability in case of dispute
  • Often teams need to look back at how decisions were taken in the preparation of documents
  • People often have no control over who they will be required to collaborate with externally or receive comments from, and frequently it’s more people than expected

Barbal provides an intuitive document editor that addresses these needs. It is built on Git, the version control system that has solved these challenges for software engineers.

Git was released in 2005 to address the pains software teams faced when working on source code. It has become the de facto version control system for software teams, used by the likes of Google, Microsoft, Facebook, Twitter, LinkedIn and Netflix to reliably manage their codebases.

In this article, I explore how Barbal makes the best use of Git for collaborating on documents whilst providing a user-friendly tool with little-to-no learning curve. I assume you have a basic understanding of how Git works. If not, this brilliant blog on Lawtomated explains it for a non-software audience, and this video from the Git project gives a good introduction to what it looks like.

How we apply Git for documents

Principles

First and foremost, our users don’t need to know that we’re powered by Git, or even what Git is. Barbal is operated solely through a graphical interface in the web browser. We run Git as a dependency on our Google Cloud Platform infrastructure and interact with it via APIs. Barbal is designed to be intuitive, with no training required, so we have abstracted and reframed much of the functionality, and many of the concepts, that software people might expect or be familiar with.

Whilst teams use tools like GitHub, GitLab or Bitbucket to manage their codebases, developers are most familiar with Git as a command line tool for day-to-day work when writing code. For all its strengths, Git has a reasonable learning curve, and until a certain level of Git mastery is attained, developers can get themselves into a myriad of different “pickles”. To overcome these issues, we apply very opinionated ways of working, to the point where giving Git-savvy users direct access to the repositories would risk breaking them.

A key challenge for us in building Barbal was remaining inspired by how Git works, while allowing ourselves the latitude to depart from it when it makes sense to do so. When deciding when and how to be opinionated in using Git concepts and functionality, our primary focus is on helping people reach consensus. Barbal is now quite abstracted from the core Git library: we have replaced many of its plain-text-oriented aspects with new algorithms that can cope with rich text.

Barbal isn’t distributed in the way software teams would be familiar with, but we do allow authorised users to pull local copies for backup purposes.

If you haven’t yet seen a demo of Barbal, it would be worth watching it now and referring back to it as you read this.

Document format

Barbal supports rich text documents, so we store them as HTML. We clean the DOM at several stages of the save and merge cycle to strip any unsupported nesting or attributes.

For the types of documents we work with, HTML offers several benefits over both Markdown and .odt/.docx. For instance, it is much more standardised than Markdown and allows more complicated document functions like in-line diff’ing (tracked changes), paragraph classes and cross-referencing between parts of documents. But we don’t need to support the complicated layout and in-line markup of .docx. With translation libraries like Pandoc available, we expect that at some point we will support bi-directional translation, but for now all work is undertaken within Barbal.
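As a rough illustration of the DOM cleaning mentioned above, the sketch below uses Python’s standard-library `html.parser` to rebuild an HTML fragment from an allowlist: unsupported tags are unwrapped (their text survives), `script` and `style` are dropped along with their contents, and unsupported attributes are discarded. The allowlists are hypothetical, not Barbal’s actual schema, and the sketch does not enforce nesting rules, which a real cleaner would also need.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical allowlists for illustration; not Barbal's real schema.
ALLOWED_TAGS = {"p", "h1", "h2", "h3", "ul", "ol", "li",
                "table", "tr", "td", "strong", "em", "a"}
ALLOWED_ATTRS = {"p": {"class"}, "a": {"href"}}
DROP_WITH_CONTENT = {"script", "style"}


class Cleaner(HTMLParser):
    """Rebuilds HTML, keeping only allowlisted tags and attributes."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0  # > 0 while inside a tag dropped with its content

    def handle_starttag(self, tag, attrs):
        if tag in DROP_WITH_CONTENT:
            self.skip_depth += 1
            return
        if self.skip_depth or tag not in ALLOWED_TAGS:
            return  # unsupported tag: unwrap it; its text is kept by handle_data
        keep = ALLOWED_ATTRS.get(tag, set())
        parts = [tag] + [f'{name}="{escape(value)}"'
                         for name, value in attrs
                         if name in keep and value is not None]
        self.out.append("<" + " ".join(parts) + ">")

    def handle_endtag(self, tag):
        if tag in DROP_WITH_CONTENT:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif not self.skip_depth and tag in ALLOWED_TAGS:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip_depth:
            # Entities were decoded by the parser, so re-escape the text.
            self.out.append(escape(data, quote=False))


def clean_html(fragment: str) -> str:
    cleaner = Cleaner()
    cleaner.feed(fragment)
    cleaner.close()
    return "".join(cleaner.out)
```

For example, `clean_html('<p class="clause" style="color:red">Hello <span>world</span></p>')` keeps the paragraph and its class but drops the inline style and the `span` wrapper.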

Branching strategy

Git itself is unopinionated about branching strategy, and tags and similar features only widen the options. Whilst norms have emerged around feature branching, when dev teams start working together there are decisions to be taken around branch naming, release management, pull request review and approvals, etc. In some ways, the task management and Git GUI tools you use will lead you towards some of these answers, as Git management tools are often opinionated themselves.

We tried introducing feature branching for document development, but found that this doesn’t reflect the way people work with documents. People flit between different sections, spotting errors and introducing ideas in an ad hoc way. When reviewing changes, people need to see the net effect of all the changes not just individual features or bugfixes as would be the case in a software pull request.

Barbal maintains a master branch (we call branches “copies”; see the dictionary at the end of this article), a separate branch for each team, and then one for each collaborator within a team. When a team or the master branch has new changes, we automatically push those out to all the branches on lower tiers. Users have control of their own copies, so we don’t make any changes without their permission.

When Git is represented graphically, it looks like the London Underground map, with all branches running in parallel and the occasional fork and merge.

Typical Git representation with two branches

We think of Barbal much like a multi-tiered fountain where changes go up and down until all outstanding edits are resolved.

The Barbal version control approach resembles a fountain with three tiers

If a team wants to mimic feature branches, it can do so by creating teams for focus areas, which is often the case with specialist working groups. Because documents are typically structured around topic areas, this keeps parts of the document separate until the team is ready to share, avoiding potential conflicts.
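The tiered copies can be pictured as a tree. The minimal sketch below assumes each copy is a node, that new saves on a copy are offered to every copy on the tiers beneath it, and that they wait in a pending list until the copy’s owner accepts them (our reading of “we don’t make any changes without their permission”). `Copy`, `push_down` and `accept` are hypothetical names for illustration, not Barbal’s internals.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class Copy:
    """One node in the tiers of copies: master, team or individual."""
    name: str
    parent: Optional[Copy] = field(default=None, repr=False)
    children: List[Copy] = field(default_factory=list, repr=False)
    saves: List[str] = field(default_factory=list)    # changes accepted into this copy
    pending: List[str] = field(default_factory=list)  # pushed down, awaiting the owner

    def add_child(self, name: str) -> Copy:
        child = Copy(name, parent=self)
        self.children.append(child)
        return child


def push_down(copy: Copy, save: str) -> None:
    """Offer a new save to every copy on the tiers below `copy`."""
    for child in copy.children:
        child.pending.append(save)
        push_down(child, save)


def accept(copy: Copy) -> None:
    """The owner approves everything waiting for their copy."""
    copy.saves.extend(copy.pending)
    copy.pending.clear()
```

In use, a save on the master copy is pushed down through a team copy to an individual’s copy; each owner then accepts it in their own time, so nothing lands in a copy without its owner’s say-so.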

Merge conflicts

We love conflicts. Or, rather, how we handle conflicts sets us apart from any other product we’ve seen. The typical first step taken by most projects, whether in Git or traditional document email tennis, is conflict avoidance. This is a) a fallacy, and b) a collaboration bottleneck.

In code, a line represents a single piece of logic or instruction, and functions should be kept completely separate. In prose, a line (read: paragraph) can contain multiple statements and concepts within a single block of text. Even if you separate out work into features, edits are going to clash. Git’s merging is line-based and has no concept of the structure of the DOM, so its native merge algorithms can easily slice tables in two or make lists behave badly.

We have written our own merging algorithms that understand both the structure of sentences and the HTML DOM. Conflicts are handled in-line, because our core strategy is never to block anyone from working. We use HTML syntax to mark up conflicts, which lets them move around the workspace without bringing work to a halt. Users can easily make out the nature of a conflict, comment and discuss the best way forward, then resolve it as agreed.

Example of how merge conflicts are handled in Barbal
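To make the idea of non-blocking, in-line conflicts concrete, here is a deliberately simplified sentence-level three-way merge. It is not Barbal’s algorithm, which also understands the DOM. It assumes plain-text paragraphs and that edits change sentences in place without adding or removing them; when the sentence counts differ, it falls back to merging the whole paragraph as one unit. A clash is never raised as an error: both versions are written into hypothetical `<span class="conflict">` markup so the document stays editable.

```python
import re
from html import escape

# Naive sentence boundary: whitespace after ., ! or ?
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def split_sentences(paragraph: str) -> list:
    return [s for s in SENTENCE_END.split(paragraph.strip()) if s]


def conflict_markup(ours: str, theirs: str) -> str:
    # Hypothetical in-line markup: both versions stay in the document.
    return ('<span class="conflict">'
            f'<span class="ours">{escape(ours)}</span>'
            f'<span class="theirs">{escape(theirs)}</span>'
            '</span>')


def merge_unit(base: str, ours: str, theirs: str) -> str:
    """Three-way merge of one unit of text (a sentence or a paragraph)."""
    if ours == theirs or theirs == base:
        return escape(ours)
    if ours == base:
        return escape(theirs)
    return conflict_markup(ours, theirs)  # both sides changed it differently


def merge_paragraph(base: str, ours: str, theirs: str) -> str:
    b, o, t = (split_sentences(x) for x in (base, ours, theirs))
    if len(b) == len(o) == len(t):
        parts = [merge_unit(*triple) for triple in zip(b, o, t)]
    else:
        # Sentences were added or removed; this sketch does not align them,
        # so it merges the paragraph as a single unit instead.
        parts = [merge_unit(base.strip(), ours.strip(), theirs.strip())]
    return "<p>" + " ".join(parts) + "</p>"
```

Because the unit of comparison is the sentence rather than the line, two people editing different sentences of the same paragraph merge cleanly, and only a genuine clash on the same sentence produces conflict markup.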

Issue tracking and pull requests

Tickets in an issue tracker and feature branches in code are only loosely coupled. They bear a strong resemblance, but it’s not unheard of for feature branches to have no issue, or for issues to be resolved without a corresponding branch.

As mentioned, we do not support feature branching. But issue tracking is clearly an important part of how work is planned, executed and ultimately approved, and it’s a capability we wanted to give Barbal users.

In traditional word processors, tracked changes show where edits are made in a kind of build-as-you-type diff. Users like this because it lets them quickly see where their changes have been made and find those needles in the haystack.

Barbal has a classic issue tracker, much the same as GitHub’s, but to put a positive (or at least neutral) spin on issues we call them Proposals. Authors can tag individual tracked changes across a document with links to Proposals so that they can flip between conceptual discussions about the ideas and the details of the actual drafting. Proposals show extracts of the document with the changes, so non-authors can quickly see what the changes are and make comments without wading through the whole document.

Proposals therefore also serve as pull requests: rather than approving individual tracked changes (which is also supported), reviewers can approve changes to a document in bulk via the associated Proposal.

Screenshot of proposals in Barbal's editor
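The link between tracked changes and Proposals can be sketched as a small data model: each tracked change may carry a Proposal id, a Proposal’s extracts are just its linked changes, and approving the Proposal approves every linked change in one step, much like merging a pull request. The class, field and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TrackedChange:
    change_id: str
    before: str
    after: str
    proposal_id: Optional[str] = None  # set when an author tags the change
    approved: bool = False


def proposal_extracts(proposal_id: str, changes: List[TrackedChange]) -> List[str]:
    """Show only the changes linked to a Proposal, for non-authors to review."""
    return [f"{c.before} -> {c.after}" for c in changes if c.proposal_id == proposal_id]


def approve_proposal(proposal_id: str, changes: List[TrackedChange]) -> List[str]:
    """Approve every linked change in bulk and return the ids approved."""
    approved = []
    for change in changes:
        if change.proposal_id == proposal_id and not change.approved:
            change.approved = True
            approved.append(change.change_id)
    return approved
```

Changes that are not tagged with the Proposal are left untouched and can still be approved individually.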

Benefits of applying Git for documents

Collaboration

Barbal’s mission is to help professionals collaborate and reach consensus faster. Everything we do is geared around this, so naturally we selected Git on this basis.

Moreover, we were inspired not only by how Git allows internal teams to collaborate, but by how it supports highly structured collaboration across organisations, where everyone keeps control of what happens to their own copy. Taken to its logical extreme, as with open-source projects, Git allows people to collaborate with people they have never heard of or otherwise interacted with. It’s a powerful idea, and it runs counter to how we manage documents today with legacy editors, where we limit the number of collaborators to avoid merging chaos, version chaos and the risks that come with them.

That everyone has control of their own copy, whether they own the master version or are just a minor contributor, is the key to supporting collaboration without first having to build trust.

Version Control

Version control is more than just making sure everyone is working with the latest changes.

With Git it means having several versions of the same document in circulation at the same time without causing an administrative nightmare. Imagine a fixed published version along with a draft revision that’s out for consultation, whilst teams continue to work and share new edits internally. Being powered by Git means that we can merge the latest changes in any direction at the click of a button with the full provenance of each edit preserved for scrutiny.

It means being able to look back and see how a document evolved over time, which team made which edits and how the discussion unfolded. Unlike a vanilla Git implementation, we sometimes abstract away a contributor’s details: when sharing documents with the other side in a negotiation, for example, the changes were made by the organisation, not the individual.

Forking documents

One of the most powerful aspects of Git version control is that a codebase can exist in separate repositories simultaneously whilst supporting merging between them. It means that two products can wander off in different directions, but their shared heritage means that features can easily be brought across from one codebase to the other.

For knowledge or advisory businesses serving a portfolio of clients with similar work, this unlocks new revenue opportunities. It gives them the means to truly productise their knowledge, hand-finishing the outputs for different clients while still being able to push changes out when, for instance, legislation changes. Not only does this add a scalable capability built on automating tedious admin, it also lets them move away from per-hour billing towards subscription-based models. We call this capability Knowledge as a Service.
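The way a shared heritage lets an update flow into client forks can be sketched under simple assumptions: documents are dicts of paragraph id to text, and each fork remembers which template version it was taken from. Paragraphs the client never hand-finished take the template update; paragraphs both sides changed are flagged for a person rather than overwritten. `update_fork` is a hypothetical helper, not a Barbal or Git API.

```python
from typing import Dict, List, Optional, Tuple

Doc = Dict[str, str]  # paragraph id -> text


def update_fork(template_old: Doc, template_new: Doc, fork: Doc) -> Tuple[Doc, List[str]]:
    """Carry template changes into a client fork, paragraph by paragraph.

    Returns the updated fork and the ids of paragraphs needing review.
    """
    updated = dict(fork)
    needs_review = []
    for pid in sorted(set(template_old) | set(template_new)):
        old: Optional[str] = template_old.get(pid)
        new: Optional[str] = template_new.get(pid)
        if old == new:
            continue  # the template did not change this paragraph
        if fork.get(pid) == old:
            # The client never hand-finished this paragraph: apply the update.
            if new is None:
                updated.pop(pid, None)
            else:
                updated[pid] = new
        elif fork.get(pid) != new:
            needs_review.append(pid)  # both sides changed it: flag for a person
    return updated, needs_review
```

When, say, a tax rate changes in the template, every fork that still carries the original wording is updated automatically, while forks where the clause was tailored are flagged for review instead.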

Collaborating externally

Also leveraging the forking capabilities of Git for documents, Barbal supports collaboration across organisational boundaries. Businesses shouldn’t be exposing their internal discussions about tricky technical or commercial matters, especially where privileged legal advice is sought. So not only can we create a hierarchy of teams within a workspace, we can also create a hierarchy or transactional workflow across organisations; squashing the intermediate changes as they’re transferred between repositories so each only has access to the net changes and information they require.
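Squashing for external sharing can be illustrated by comparing only the starting state with the final state, so intermediate drafts and the names of individual authors never leave the organisation. This sketch assumes the same paragraph-id-to-text model; `Save` and `squash_for_export` are hypothetical, and a real export would also have to strip comments and internal Proposals.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Save:
    author: str
    snapshot: Dict[str, str]  # paragraph id -> text after this save


def squash_for_export(base: Dict[str, str], saves: List[Save], organisation: str) -> dict:
    """Collapse a run of internal saves into one net change set.

    Only the difference between `base` and the final snapshot is kept,
    and it is attributed to the organisation rather than any individual.
    """
    final = saves[-1].snapshot if saves else base
    changes = {}
    for pid in sorted(set(base) | set(final)):
        before, after = base.get(pid), final.get(pid)
        if before != after:
            changes[pid] = (before, after)  # None means absent on that side
    return {"author": organisation, "changes": changes}
```

An intermediate wording that was tried and abandoned internally simply does not appear in the export; the other side sees one net change from the organisation.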

Audit history

As raised several times throughout, collaborative technology needs to do more than remove the friction of working together today. In litigious, complicated or contentious areas, knowing why a document says what it does and how decisions were reached is crucial.

Whilst the version control capabilities of Git allow a timeline of changes to be maintained, Barbal augments that by preserving the comment history and Proposals in the issue tracker.

Imagine three years after a contract was signed being able to click on a paragraph in a specification and see the full history of its authorship and negotiation. It’s the sort of capability that will accelerate the resolution of disputes and solve many organisational knowledge management conundrums.
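A paragraph’s history can be recovered by walking the saves in order and recording each point where its text changed, in the spirit of `git blame` but at paragraph level. The `Save` record, its fields and the optional Proposal link are assumptions for illustration.

```python
from typing import Dict, List, NamedTuple, Optional


class Save(NamedTuple):
    author: str
    date: str                       # ISO date string, for illustration
    snapshot: Dict[str, str]        # paragraph id -> text
    proposal: Optional[str] = None  # hypothetical link to a Proposal


def paragraph_history(saves: List[Save], paragraph_id: str) -> List[dict]:
    """Return one entry per save that created, edited or deleted the paragraph."""
    history = []
    previous = None
    for save in saves:
        text = save.snapshot.get(paragraph_id)
        if text != previous:
            history.append({
                "author": save.author,
                "date": save.date,
                "text": text,  # None means the paragraph was deleted
                "proposal": save.proposal,
            })
            previous = text
    return history
```

Saves that did not touch the paragraph are skipped, so the result reads as a concise timeline of who changed it, when, and under which Proposal.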

Conclusions

In the two years since we launched our first prototype for Barbal, we’ve had all sorts of people use it. They all tend to be experts in their field, but their technical literacy ranges from just about confident with MS Office to cutting code with the latest web frameworks. We’ve found that across the board there’s an unsolved challenge to be addressed, and that Git provides an excellent foundation to build upon. We’ve also found some severe usability challenges with Git, which we’ve had to develop a lot of opinionated and proprietary approaches to overcome.

Collaboration involves people and so is, by its very nature, a messy problem to solve. Consensus, getting people with opposing worldviews to find a middle ground they can both stand behind, is even more so. We’ve launched our beta platform and had over 500 pull requests on our own codebase to get to where we are, but we’re only at the beginning of our journey in helping professionals collaborate and reach consensus faster.

If you’d like to speak with me, have a demo or explore how Barbal can help your organisation, I’d be delighted. Please book a meeting here.

Git to Barbal dictionary

Git term          Barbal term
Branch            Copy
Code              Document
Commit            Save
Diff              Tracked changes view
Issue             Theme or Proposal
Merge             Merge
Merge conflict    Conflicting changes
Repository        Workspace