In software development we constantly use tools that let us go back to the way things were. The undo shortcut in every editor is committed to muscle memory. Any number of changes to a file can be reverted before it is saved. Once saved, a file can be committed to a version control system like Git, where any number of changes can coexist, neatly organized into their own branches. The ability to quickly and safely revert to the way things were a second ago, an hour ago, or three months ago is absolutely key to our ability to fearlessly build the features our stakeholders need.
On one recent project we used the Rails web framework backed by a Postgres database. The input form elements we built give our users normal undo abilities in each field, and all of the changed fields in a form can be discarded with a cancel button. For users of most web applications, this is about where the ability to revert ends.
As the needs of our application have grown, we've added aggregations and reporting based on the data entered into those forms. Some users have different needs, so there is configuration enabling different aggregations, reports, and even different workflows for entering data. Each feature increases the power and usefulness of the application, but those benefits come with additional costs. Some configuration changes are very difficult to walk back. Seeing the results of some reports requires entering a lot of data. As the cost of reverting grows, our users become preoccupied with "What If?" questions before making any change. We wanted the ability to try out sweeping changes, answer the "What If?" questions, and then get back to the status quo, or even compare two paths forward.
First we considered using our disaster recovery backup and restore procedures to create a separate database for each scenario. The application is multi-tenant and each scenario would be specific to one tenant; inefficient, but not a deal breaker. The ops automation for restoring a database, however, required a login with appropriate permissions and two-factor authentication. Letting a user independently start a scenario would require a system account with permission to create and destroy entire databases in our cloud via API calls originating from the application. The security implications were a little concerning.
Next we looked into making a smaller, more targeted copy of the data within our single Postgres database. This was daunting in its own right. Primary key uniqueness constraints and foreign key constraints ruled out copying rows in place within each table. Loading entire data sets into ActiveRecord and saving them as if they were new would be impossibly slow.
Rails can connect to a number of different databases, but that flexibility comes at the expense of not supporting all of the features that make Postgres unique. With a little digging we found a number of incredibly useful tools.
CREATE UNLOGGED TABLE destination_table (LIKE source_table INCLUDING ALL)
LIKE does the heavy lifting of replicating our existing table definitions. UNLOGGED saves a fair bit of time by skipping the write-ahead log (WAL). For these hypothetical scenarios speed is more important than crash resilience, so this is an acceptable trade-off.
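As a concrete sketch, with a hypothetical table name standing in for one of our application tables:

```sql
-- widgets is a hypothetical table name. INCLUDING ALL copies columns,
-- defaults, NOT NULL and CHECK constraints, and indexes. LIKE does not
-- copy foreign key constraints, which means the copies can be loaded
-- in any order.
CREATE UNLOGGED TABLE widgets_copy (LIKE widgets INCLUDING ALL);
```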
INSERT INTO destination_table SELECT * FROM source_table
The bulk of the data is copied by selecting it directly out of the existing tables, unchanged. Since the data comes from a query, we can use a WHERE clause to filter out anything not relevant to the scenario. All of the copying happens inside the database server, without a round trip through our Rails application server.
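For a multi-tenant application like ours, the filtering looks something like this (table and column names are hypothetical):

```sql
-- Copy only the rows belonging to the scenario's tenant.
INSERT INTO widgets_copy
SELECT * FROM widgets
WHERE tenant_id = 42;
```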
With the ability to copy tables comes the problem of how to organize them. Table name prefixing or suffixing schemes would quickly become unmanageable for the number of tables and the arbitrary number of copies we want to support. Postgres comes through again with "schemas". The term is a bit overloaded, but for our use case it mostly means namespaces for our tables. One brilliant detail is the "search path": queries prefer tables in the first schema listed, but fall back to same-named tables in later schemas. This lets us skip copying any tables we don't intend to change during the lifetime of the scenario.
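Put together, the schema and search path pieces look something like this (the scenario schema name is hypothetical):

```sql
-- A namespace for one scenario's cloned tables.
CREATE SCHEMA scenario_42;

-- Clone a table into the scenario's namespace.
CREATE UNLOGGED TABLE scenario_42.widgets
  (LIKE public.widgets INCLUDING ALL);

-- Unqualified table names now resolve against scenario_42 first,
-- falling back to public for any tables we never cloned.
SET search_path TO scenario_42, public;
```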
Our data copying and organization needs were met. The last piece of the puzzle was a switching mechanism that lets a user enter and leave the hypothetical scenario copy of our application at will. Enter Postgres roles! At first read these seem like user accounts for the database, but they go much deeper than that. Roles can be created to manage permissions for groups of other roles, and access to a role can be granted and even assumed in the middle of a session! We associated each new scenario schema with a scenario-specific role. An around_action in our Rails application controller then sets the active role to match the scenario the user selected, and the application transparently uses the newly cloned tables.
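The switching pattern can be sketched in plain Ruby. This is not our exact implementation: the class and method names are made up, the stub connection stands in for ActiveRecord::Base.connection so the flow can be exercised without a running Postgres, and setting the search path alongside the role is one assumed way to wire a scenario's schema to the request.

```ruby
# StubConnection records the SQL it is asked to run, standing in for
# a real ActiveRecord connection.
class StubConnection
  attr_reader :statements

  def initialize
    @statements = []
  end

  def execute(sql)
    @statements << sql
  end
end

class ScenarioSwitcher
  def initialize(connection)
    @connection = connection
  end

  # In the real app this body would live in an around_action: assume
  # the scenario's role and search path for the duration of the
  # request, then reset so the pooled connection is clean for the
  # next request.
  def use_scenario(role_name, schema_name)
    @connection.execute(%(SET ROLE "#{role_name}"))
    @connection.execute(%(SET search_path TO "#{schema_name}", public))
    yield
  ensure
    @connection.execute("RESET ROLE")
    @connection.execute("RESET search_path")
  end
end

conn = StubConnection.new
ScenarioSwitcher.new(conn).use_scenario("scenario_42_role", "scenario_42") do
  # the controller action runs here, reading and writing cloned tables
end
```

The `ensure` matters with pooled connections: a checked-in connection must not carry a scenario's role or search path into an unrelated request.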
Putting it all together
With some UI work for managing scenarios, a user can select a scenario and instantly enter a complete and functioning alternate universe! They can enter new data, change configuration, and see the results of reports without worrying about how to get back to the way things were.
This whole feature was motivated by a stream of “What If?” questions.
“What if I type in a different number?”
“What if we aggregate these numbers differently?”
“What if we change the reports we’ve been using for years?”
“What if our application users could branch their work like we do with source code?”
“What if we explore beyond the database features our framework supports natively?”
While I’m not sure this was the only or even the best way to build this feature, I’m satisfied with the way this scenario played out.