When Vendor Licensing Changes Force Architectural Decisions

After months of investigating issues with a new version of one of our third-party tools, we were finally ready to deploy. Then we discovered our vendor had introduced processor-based licensing that made most of our production servers non-compliant. Here’s how a modernization effort forced us into emergency architectural changes.

The Application

The application I work on is called ANODE (ANalysis On DEmand). It’s a distributed system that runs and displays the results of outage reliability, contingency, and transfer limit analysis studies using models we build that attempt to accurately predict the topology and conditions on the grid for each hour or day in a study.

ANODE uses two third-party tools to do this. We use the first tool to build and solve the initial model from EMS (Energy Management System) snapshot data, reset certain values to their defaults (like whether a circuit breaker is open or closed), and apply future topology changes.

We use the second tool to build models for each hour or day in the study by applying generation, load, and equipment limit changes to the initial model and then run different kinds of studies against them. The vast majority of the server processing we need is for running these studies.

Modernization Efforts

We’ve been stuck on an old version of the model-building tool for a long time. The first upgrade attempt started in fall 2023. Every time we tried to upgrade to a newer version, we found issues in the study results that seemed to trace back to the initial model. None of the electrical engineers we worked with could figure out why.

This old version of the model-building tool comes with significant drawbacks. Most notably, the API used to access the tool only works in Python 2.7, which went EOL on January 1st, 2020. This forces us to use the last supported version of a lot of Python packages, prevents us from switching to tools that only work in Python 3, like Dramatiq, and keeps us from using more modern language features like type hints.

This spring, after version upgrades to both third-party tools and some code changes on our end, we fixed all the study result issues we’d been seeing and were finally ready to deploy to production.

Deployment

When I went to install the new version of the application and model-building tool on one of our production servers, I discovered a major problem. The tool would start up, perform a quick system check, detect that the server had too many processors, and shut down. It turned out the vendor had introduced high-performance computing (HPC) licenses in the new version of the tool, and most of our production servers now exceeded the limits of our licensing agreement. Servers that had been running our application using the old version of the tool for half a decade were suddenly unusable without a license upgrade.
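In hindsight, a pre-flight check in our own deployment scripts would have surfaced this before install day. The following is a minimal sketch, not the vendor's actual check: the limit of 16 is a made-up number, the function names are hypothetical, and `os.cpu_count()` reports logical processors, which may not match how a vendor counts them.

```python
import os

# Hypothetical limit; the real number comes from the licensing agreement.
LICENSED_MAX_PROCESSORS = 16


def exceeds_license(processor_count, limit=LICENSED_MAX_PROCESSORS):
    """Return True if a host has more processors than the license allows."""
    return processor_count > limit


def preflight_message(limit=LICENSED_MAX_PROCESSORS):
    """Describe whether the current host would pass the processor check."""
    # os.cpu_count() can return None on some platforms, so fall back to 1.
    count = os.cpu_count() or 1
    if exceeds_license(count, limit):
        return "FAIL: %d processors exceeds licensed limit of %d" % (count, limit)
    return "OK: %d processors within licensed limit of %d" % (count, limit)


if __name__ == "__main__":
    print(preflight_message())
```

Running something like this against every target server during the upgrade investigation would have flagged the non-compliant hosts months earlier.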

Initial Response

When I escalated the issue to our product owner, his initial response was, “Pay them whatever they want.” The application was too important to let licensing issues block our modernization efforts. But getting a quote proved to be a challenge. It took six weeks before the vendor provided pricing information for the HPC licenses, and when they did, the price was a dramatic increase. At this point, the product owner’s opinion of the vendor had soured: “They are not being a good partner.”

Exploring Our Options

After a few meetings discussing our options, we narrowed them down to three:

Option 1: Negotiation

The product owner had a connection at the vendor, so he reached out in the hope that they could help us get a better price. We hoped this would buy us a year, so we could deploy the new version of the application now and take our time working on a longer-term solution. This option was only partially successful: the vendor agreed to lower their initial quote, but even the reduced price was too expensive for our budget.

Option 2: Build Our Own

One of my coworkers, who has expertise in both electrical engineering and software engineering, built a prototype for a model-building tool that would take EMS snapshots and output models in different model types (bus-branch and node-breaker) and file formats. The prototype took him a few weeks to build and was close to producing working models, but the product owner ultimately rejected it as too risky. The application had been using the third-party model-building tool for over a decade, so it was dependable, and he was worried about getting bitten by edge cases that we couldn’t solve with the new tool.

Option 3: Split The Code Out Into A Service

Most, but not all, of our production servers exceeded the processor limit. We still had six smaller servers that didn’t. And we were already running parts of our application as a distributed system. If we split all uses of the model-building tool out into a service and only ran the service on the smaller servers, we could continue using our existing licenses.
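The core of this idea is a placement rule: model-building work may only land on hosts that satisfy the license. Here is a rough sketch of that rule, assuming a simple round-robin assignment. The host names, processor counts, and limit are invented for illustration, and this is not ANODE's actual scheduling code.

```python
from dataclasses import dataclass
from itertools import cycle


@dataclass(frozen=True)
class Host:
    name: str
    processors: int


def compliant_hosts(hosts, limit):
    """Keep only the hosts whose processor count is within the license limit."""
    return [host for host in hosts if host.processors <= limit]


def assign_jobs(job_ids, hosts, limit):
    """Round-robin model-building jobs across license-compliant hosts only."""
    eligible = compliant_hosts(hosts, limit)
    if not eligible:
        raise RuntimeError("no host satisfies the license limit")
    rotation = cycle(eligible)
    return {job_id: next(rotation).name for job_id in job_ids}


if __name__ == "__main__":
    # Hypothetical inventory: one large study server, two small ones.
    inventory = [Host("big-01", 64), Host("small-01", 8), Host("small-02", 8)]
    print(assign_jobs(["build-a", "build-b", "build-c"], inventory, limit=16))
```

The large servers stay free for running studies, which is where the vast majority of our processing goes anyway.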

This is the solution we ended up using. It wasn’t without significant challenges. The code that used the model-building tool was deeply coupled with the application logic because no one had considered that we’d have to either stop using it or change how we used it. But this refactoring was welcomed as part of the modernization effort.

Lessons Learned

The most important takeaway from this is that external dependencies can quickly become huge liabilities. Make sure the tools you use are still being actively developed and supported, monitor licensing agreements for changes that could impact you, and decouple your application code from those external dependencies so it’s relatively easy to pivot if necessary.
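One common way to get that decoupling is to have application code depend on a small interface rather than on the vendor API directly. The sketch below assumes Python 3 and uses `typing.Protocol`. Every class and method name in it is hypothetical, and the vendor wrapper is left as a placeholder because the real API isn't shown here.

```python
from typing import Protocol


class ModelBuilder(Protocol):
    """The only surface the application is allowed to depend on."""

    def build_initial_model(self, snapshot_path: str) -> str:
        ...


class VendorModelBuilder:
    """Placeholder for a wrapper around the third-party tool."""

    def build_initial_model(self, snapshot_path: str) -> str:
        raise NotImplementedError("call the vendor tool here")


class FakeModelBuilder:
    """Stand-in for tests, or for an in-house replacement."""

    def build_initial_model(self, snapshot_path: str) -> str:
        return snapshot_path + ".model"


def prepare_study(builder: ModelBuilder, snapshot_path: str) -> str:
    # Application logic only knows about the ModelBuilder interface,
    # so swapping implementations does not touch this function.
    return builder.build_initial_model(snapshot_path)


if __name__ == "__main__":
    print(prepare_study(FakeModelBuilder(), "ems_snapshot"))
```

With a seam like this in place, moving the tool behind a service, or replacing it outright, means writing one new class instead of untangling application logic.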
