Taming Names in Software Development

Good names

What is a name? A name is a label, a handle, a pointer in your brain’s memory. A complex idea neatly encapsulated. A name lets you refer to “the economy” or “dogfooding” mid-sentence without needing a three-paragraph essay to explain the term.

If you think of software development as just carving up data into boxes and labeling them, it becomes clear why Naming Things is one of the two hard problems in computer science. Your brain has only so much space in working memory, and a good name makes the most of it. A good name is succinct, evocative, fitting. It reduces cognitive load and stands out in your mind. Bad names are obscure, misleading, fuzzy or outright lies.

In software, really good names are meaningful, descriptive, short, consistent, and distinct. You will notice that ‘descriptive’ and ‘short’ are diametrically opposed. As are ‘consistent’ and ‘distinct’. There is no solution, only tradeoffs.

Descriptive names are safe, legible, clear. They tell you exactly what you’re dealing with, bring you up to speed, and don’t require you to be an expert in the codebase or a mind reader. I understand exactly what BasicReviewableFlaggedPostSerializer is on my first time seeing it. But they can also be bulky and unwieldy.

Short names are easy to use, easy to scan, pithy and convenient. They use abbreviations and shorthands to get out of your way so you can focus on the logic. pc_auth_token is so much easier to say than premium_customer_http_authentication_token. But short names can also be confusing and opaque.

Consistent names communicate patterns, placing your variable in a familiar larger context. Distinctive names stand apart, safeguarding you from conflating different concepts based on surface-level similarities while avoiding uninformative and generic Foo.bar or “DataHandler” style names.

Balancing these opposing principles is what makes good naming so hard. The amount of knowledge conveyed in a single word is what makes good naming so powerful. The exact balance will depend on the size of the codebase and developer team, the domain complexity, frequency of use and many other factors.

Personally, I favor descriptive and conventional by default, and reserve shorter names for oft-repeated variables and classes.

Conventions to the rescue!

Luckily, people much smarter than us have thought long and hard about this problem, and developed some really excellent guidelines.

Conventions communicate intent, both by form and content. For example, Ruby naming conventions recommend classes be written in PascalCase (form), and preferably as nouns, concrete and thing-y (content). So you can see User or CustomerAccount and recognize them as classes. Ruby methods, on the other hand, should be snake_case, and preferably unabbreviated verbs (e.g. publish, invite_user, find_all). A method ending in an exclamation mark, like archive!, warns that it typically modifies data when called. A question mark à la archived?, on the other hand, implies the method will return a boolean true or false. Modules, constants, foreign keys, singulars, plurals, cases, all come together to create a unique style, a shorthand that makes them recognizable. By providing consistently varying forms, conventions can make names distinct and expressive without adding length.
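
To make those forms concrete, here is a minimal sketch. The class, its constant, and its methods are hypothetical and exist only to show how form alone hints at what each name is.

```ruby
# Hypothetical example class; none of these names come from a real codebase.
class CustomerAccount            # PascalCase noun: reads as a class
  MAX_LOGIN_ATTEMPTS = 5         # SCREAMING_SNAKE_CASE: reads as a constant

  def initialize
    @archived = false
  end

  # Trailing "?" signals a predicate that returns true or false.
  def archived?
    @archived
  end

  # Trailing "!" signals that calling this changes the object's state.
  def archive!
    @archived = true
    self
  end
end

account = CustomerAccount.new
account.archived?   # false before archiving
account.archive!
account.archived?   # true afterwards
```

Without reading a single method body, you can already guess which call is safe to make in a conditional and which one has consequences.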

A foolish consistency is the hobgoblin of little minds

Ralph Waldo Emerson – Not a programmer

Occasionally conventions directly clash with our good naming principles. Rubyists often use k and v for key and value. Certainly short, but only meaningful if you know or intuit the convention. In JavaScript, you’ll often see i, j and subsequent letters as iteration variables. i is not descriptive, and j is somehow even less so. But they at least communicate “here’s an iteration variable, and here’s the second one”; sometimes that’s all you need. If you’re going to use meaningless variable names, at least use the ones everyone else is using.

Know the naming conventions for your language or framework and follow them. Better yet, install a good linter to make adherence easy and automatic.

Make your own conventions

Your code has its own needs. Patterns of logic specific to it. Think about ways to make those immediately evident. Consider namespaces to indicate hierarchies. Use prefixes like user_ and admin_ to group related classes together. Use suffixes like _job or _spec to identify jobs and tests. These organize your codebase, signpost intentions, and are often required to interface with popular libraries.
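
As a rough illustration of namespaces plus suffixes, here is a small sketch. The Admin and Billing modules and the classes inside them are made up for the example, and the Job suffix is the class-name counterpart of a _job file suffix.

```ruby
# Hypothetical classes: the namespace shows hierarchy, the suffix shows role.
module Admin
  class UserExportJob
    def perform
      :exported
    end
  end
end

module Billing
  class InvoiceReminderJob
    def perform
      :reminded
    end
  end

  class Invoice; end
end

# Because every job ends in "Job", finding them all is a simple filter.
job_classes = [Admin, Billing].flat_map do |namespace|
  namespace.constants
           .map { |const_name| namespace.const_get(const_name) }
           .select { |const| const.is_a?(Class) && const.name.end_with?("Job") }
end

job_names = job_classes.map(&:name).sort
```

The same suffix that helps a human skim a directory listing also lets tooling pick out every job without a hand-maintained registry.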

Be consistent

A good linter can check your syntax and name forms, but you’ll have to watch out for semantic consistency yourself. If you have a function like searchKeyword(needle, haystack), don’t make searchName(haystack, needle) in a different class. (I’m looking at you, PHP.)
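
One way to make argument order hard to get wrong, sketched in Ruby with hypothetical search_keyword and search_name helpers, is to use keyword arguments so that every call site names its own parameters.

```ruby
# Hypothetical helpers. Keyword arguments make each call site self-describing,
# so the two functions cannot silently disagree about argument order.
def search_keyword(needle:, haystack:)
  haystack.include?(needle)
end

def search_name(needle:, haystack:)
  haystack.any? { |name| name.casecmp?(needle) }
end

# The order at the call site no longer matters, only the names do.
keyword_found = search_keyword(needle: "mold", haystack: "name molds")
name_found    = search_name(haystack: %w[Ada Grace], needle: "grace")
```

This does not replace picking one order and sticking to it, but it turns a silent mix-up into something you can see at a glance.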

Consistency is your friend. For style, consistency beats out clever every time. I couldn’t care less whether you use dd/mm/yyyy or mm/dd/yyyy, just don’t mix them both together. Consistency is predictable, and predictable leads to less buggy code. If I look at your codebase and see fetchValue(), getValue(), and retrieveValue(), I’ll have no idea whether they differ. If they’re the same, name them the same. If they’re different, make sure they’re always different in the exact same way.
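
A quick way to spot this kind of drift is to group method names by their subject and see where several synonymous verbs compete. The sketch below assumes a hand-picked list of synonym verbs and a hypothetical inconsistent_verbs helper; neither comes from any library.

```ruby
# Assumed synonym set; adjust it to the verbs your project treats as equivalent.
SYNONYM_VERBS = %w[fetch get retrieve load].freeze

# Groups method names by what follows the verb and returns only the groups
# where more than one name exists for the same subject.
def inconsistent_verbs(method_names)
  pattern = /\A(#{SYNONYM_VERBS.join('|')})_?(.+)\z/i
  groups = Hash.new { |hash, key| hash[key] = [] }

  method_names.each do |name|
    match = pattern.match(name)
    next unless match

    groups[match[2].downcase] << name
  end

  groups.select { |_subject, names| names.size > 1 }
end

suspects = inconsistent_verbs(%w[fetchValue getValue retrieveValue saveValue getUser])
# suspects is {"value" => ["fetchValue", "getValue", "retrieveValue"]}
```

Anything the helper reports is only a candidate: the three methods may genuinely differ, in which case the names should say how.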

Consistency is the last refuge of the unimaginative

Oscar Wilde – Definitely not a programmer

Name molds – A Consistency Tool

One tool you can use in your quest for consistency is name molds. Felienne Hermans has a nice video explaining them, or you can read D. Feitelson’s paper “How Developers Choose Names” if you want the full experience.

The crux of the idea is that any given variable could be written any number of ways. Creating a “mold” that dictates a certain naming structure will make your codebase more consistent. Let’s go back to date formats for a second. They can be written all sorts of ways: February 24th, 1997; 02/24/97; 24/02/97; Feb 24, ’97. But say you want to change the day on a date like “01/01/01”. Out of context, it’s unclear if the first “01” is the month or the day. We need an established arbitrary pattern like dd/mm/yyyy to know which is which. A name mold establishes a set pattern for variable names in much the same way.

Let’s take the simple example of a variable representing the minimum permitted length for a message. How should it be written?

min_message_length
minimum_message_len
message_lgth_min
minimum_msg_len
msg_lgth_min

We could come up with over a dozen reasonable, valid variable names, even limiting ourselves to the exact same three words. The odds of two developers naming a variable exactly the same are low. The odds of 8 different devs on two different teams doing so are almost non-existent. And the more names there are, the more mental energy it takes to figure out if they’re actually the same thing or subtly different.

But if we create a simple name mold of adjective/noun/unabbreviated_measurement, we can reliably get minimum_message_length, or at least a lot closer to it.

Maybe your project will establish preferred abbreviations (e.g. “len” not “lgth”) or say adjectives go before nouns (active_user, blocked_user). A little time thinking about it will save a lot of time renaming things later.
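
To show how mechanical a mold can be, here is a small sketch that normalizes the five candidate names from earlier. The abbreviation table, the word order, and the apply_mold helper are all assumptions made up for this example.

```ruby
# Assumed project preferences: map abbreviations to the agreed full word.
PREFERRED_WORDS = {
  "min"  => "minimum",
  "msg"  => "message",
  "len"  => "length",
  "lgth" => "length"
}.freeze

# Assumed mold: adjective first, then noun, then unabbreviated measurement.
MOLD_ORDER = { "minimum" => 0, "maximum" => 0, "message" => 1, "length" => 2 }.freeze

# Expands abbreviations, then sorts the words into the mold's order.
# The original index is part of the sort key so unknown words keep their order.
def apply_mold(name)
  words = name.split("_").map { |word| PREFERRED_WORDS.fetch(word, word) }
  words.each_with_index
       .sort_by { |word, index| [MOLD_ORDER.fetch(word, MOLD_ORDER.size), index] }
       .map(&:first)
       .join("_")
end

candidates = %w[min_message_length minimum_message_len message_lgth_min minimum_msg_len msg_lgth_min]
canonical = candidates.map { |name| apply_mold(name) }.uniq
# canonical is ["minimum_message_length"]
```

Five spellings collapse into one. In practice the mold usually lives in a style guide rather than in code, but writing it down this precisely is a good test of whether the rule is actually unambiguous.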

Acronyms, Initialisms and Abbreviations – Shortening Tools

Abbreviations sacrifice clarity for conciseness. See the “short names” section we covered above and decide whether it’s worth it in your case.

Acronyms are nifty. We’d never get anywhere if we wrote “Application Programming Interface Key” instead of API key. Laser and radar aren’t even acronyms anymore, they’re just straight-up words. But tech has more acronyms than you can shake a stick at. We are occasionally too eager to hack 90% of the letters off a phrase and pretend that somehow makes it easy to understand.

Shorten wordy names that are frequently referenced, but don’t overdo it. Write a glossary with their definitions and/or put the full meaning in comments above the class.
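
If the glossary is machine-readable, you can check names against it. The sketch below uses a hypothetical GLOSSARY hash and a hypothetical undocumented_acronyms helper, and it treats any run of two or more capitals as an acronym, which is only a rough heuristic.

```ruby
# Hypothetical glossary; in practice it might live in a docs file.
GLOSSARY = {
  "API"   => "Application Programming Interface",
  "HABTM" => "has_and_belongs_to_many"
}.freeze

# Returns acronyms (runs of two or more capitals not followed by a lowercase
# letter) that appear in the names but have no glossary entry yet.
def undocumented_acronyms(names)
  names.flat_map { |name| name.scan(/[A-Z]{2,}(?![a-z])/) }.uniq - GLOSSARY.keys
end

missing = undocumented_acronyms(%w[HABTM_Tags APIKey SSOProvider UserStat])
# missing is ["SSO"]
```

Anything that shows up in the result is a prompt to either add a glossary entry or spell the name out.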

Further Reading

I know that rules are made to be broken. But if ever you feel stifled by these arbitrary conventions, remember this. Which side of the road you drive on is an arbitrary rule too. arbitrary != unimportant

If this hasn’t been enough for you, check out Chapter 2 of Clean Code “Meaningful Names”.

What is Name Complexity?

We’ve talked about good names, now let’s talk about… not bad names, per se. Just difficult cases. Systems where names get complicated. I’ll term this “Name Complexity”.

Name complexity is when your codebase has 28 distinct acronyms and, if asked, no one is sure what 13 of them actually stand for. It’s jargon, frequently changing terminology, naming collisions, and competing naming standards between frontend and backend. Semantic Drift. It’s when poorly understood classes gradually outgrow their original purpose.

It’s when a core entity is known by different names to different teams. “This is the AssociatedGroup model, but in the database it’s usr_team_id_no. And actually, the client company calls them ‘Franchise Partners’. Oh, except for their marketing team, who rebranded it in the last sales push from ‘Posses’ to ‘Business Cadres’ or something.”

It’s when two completely unrelated entities share an exact name or attribute name. “There’s a project_status database table and a ProjectStatus react component but they’re not the same project and definitely not the same status.”

You probably understand what I’m talking about.

Risk factors

Name complexity builds up over time, like any other sort of technical debt. Sometimes it’s a gradual drift as code accumulates features and unforeseen uses. Sometimes it’s a sharp break caused by a rebranding or a change at the company. And like any other form of debt, it scales with codebase size, company size, and business domain complexity. A new 3-developer project is very different from a 10-year-old health system that needs to refresh its ontology with updated disease names.

What’s the big deal?

Well, (you might protest) things change, stuff gets renamed, but it’s still basically the same thing. “A rose by any other name” and all that. What’s the problem?

The problem is increased cognitive overhead, developer time wasted deciphering outdated terminology, burnout and buggy code. That last one, buggy code, is especially bad. A common source of bugs is when what you think should happen is badly mismatched with what will happen. Deceitful names are dangerous.

Once I wrote a memorable bug by calling deleteResource() and assuming it would delete the resource. Silly me! I spent the afternoon hunting all over the codebase for the logic flagging a resource as deleted. I naively assumed that logic would live in deleteResource(). No? Well maybe sqlSetResourceDeleted()? Huh. sqlCoreDeleted()? Nada… Ah… there it is, right in prepResourceOperation(). Of course, why didn’t I think of that!?

Remember when I said bad names could be outright lies? A badly-named deleteResource() function will lie right to your face and prepResourceOperation() will stand there not saying a single word.

Coding with that sort of mental overhead and cognitive juggling is tantamount to playing Simon Says on Opposite Day while patting your head and rubbing your stomach. You might eventually succeed, but it’s needlessly difficult and way more error-prone. It makes it harder for new devs to contribute to the codebase, and reduces the likelihood anyone will be brave enough to refactor that old overgrown module gathering moss.

Putting it into practice. Let’s do a Name Audit.

Ok, so we’ve talked a lot, let’s try it out. I’m going to dive into the open-source Rails forum app Discourse for the first time, analyze its name complexity, look at its conventions, and get a sense of the naming patterns I should follow.

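If you want to analyze a non-Rails app, you can follow along at the database level with something like the query below, which lists every distinct column name in the public schema. (To list the tables instead, query information_schema.tables.)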



-- PostgreSQL
SELECT distinct column_name
FROM information_schema.columns
WHERE table_schema = 'public'
ORDER BY column_name ASC;

First, let’s get a list of all our models.


# In Rails console
Rails.application.eager_load!

# Can use ApplicationRecord.descendants on Rails 5 and up
> ActiveRecord::Base.descendants.collect(&:name)
=>
["SiteSetting",
"User",
"DeletedChatUser",
"PushSubscription",
"UserChatChannelMembership",
"ChatChannel",
"DirectMessageChannel",
"CategoryChannel",
"ChatChannelArchive",
...
]

> ActiveRecord::Base.descendants.collect(&:name).count
=> 202

202 models. A nice round number. Let’s sort so we can dig into “Chat” a bit more.

> ActiveRecord::Base.descendants.collect(&:name).sort
[...
"CategoryTagStat",
"CategoryUser",
"ChatChannel",
"ChatChannelArchive",
"ChatDraft",
"ChatMention",
"ChatMessage",
"ChatMessageReaction",
"ChatMessageRevision",
"ChatUpload",
"ChatWebhookEvent",
"ChildTheme",
"ColorScheme",
...
]

Looks like we have good naming patterns here. Lots of nouns, seems to be 9 Chat-related models at first alphabetical glance. Let’s take a second glance.

> ActiveRecord::Base.descendants.collect(&:name).grep(/chat/i)
=> ["DeletedChatUser",
"UserChatChannelMembership",
"ChatChannel",
"ChatChannelArchive",
"ChatDraft",
"ChatMessage",
"ChatMessageReaction",
"ChatMessageRevision",
"ChatMention",
"ChatUpload",
"ChatWebhookEvent",
"IncomingChatWebhook",
"ReviewableChatMessage"]
> ActiveRecord::Base.descendants.collect(&:name).grep(/chat/i).count
=> 13

So there are actually 13 models with “chat” in them, something our simple alphabetical sort wouldn’t see. More digging shows 18 models with “Post” in the name, 21 with “Topic”, and a whopping 47 models with “User”.
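
Rather than grepping for one word at a time, you can tally every word at once. This is a sketch that runs on a plain array of names: here it uses only the 13 chat models listed above, so swap in the full list from the Rails console to reproduce the other counts. It assumes Ruby 2.7 or newer for tally.

```ruby
# Sample: the 13 chat-related model names from the output above.
model_names = %w[
  DeletedChatUser UserChatChannelMembership ChatChannel ChatChannelArchive
  ChatDraft ChatMessage ChatMessageReaction ChatMessageRevision ChatMention
  ChatUpload ChatWebhookEvent IncomingChatWebhook ReviewableChatMessage
]

# Split each PascalCase name into its words, count each word once per model,
# then sort by descending count (ties broken alphabetically).
word_counts = model_names
              .flat_map { |name| name.scan(/[A-Z][a-z0-9]*/).uniq }
              .tally
              .sort_by { |word, count| [-count, word] }

top_words = word_counts.first(3)
# top_words is [["Chat", 13], ["Message", 4], ["Channel", 3]]
```

The most frequent words are a decent first guess at the core entities of the domain.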

They seem to follow the common AdjectiveNoun name mold, if ReviewableFlaggedPost, ReviewableQueuedPost, and ReviewableUser are anything to go by.

Let’s see if there are any acronyms.

# Check for Acronyms in model names
ActiveRecord::Base.descendants.collect(&:name).grep(/([A-Z]){2}/)
=> ["HABTM_WebHooks",
"HABTM_WebHooks",
"HABTM_WebHookEventTypes",
"HABTM_Groups",
"HABTM_Categories",
"HABTM_Tags",
"HABTM_WebHooks"]

Only one! That’s not too bad. If I don’t know what HABTM means, a quick google search reveals has_and_belongs_to_many, an associative relationship. Excellent.

Looking around I see a couple abbreviations that I don’t immediately understand, like AllowedPmUser, but nothing I couldn’t learn quickly with minimal effort.

Hang on… What’s “Stat” mean in UserStat? Status? No… There’s already a UserStatus. Statistics, maybe?

> ActiveRecord::Base.descendants.collect(&:name).grep(/stat/i)
  => ["CategoryTagStat",
  "PostStat",
  "UserStat",
  "UserStatus",
  "MiniScheduler::Stat"]

Ah, yes. The code confirms “Stat” is short for “Statistics”.

Let’s look at attribute names, which correspond to database column names. At a total of 1919, there are too many to view here, but let’s sort and filter out duplicates.

# Count all attribute names
> ActiveRecord::Base.descendants.collect(&:attribute_names).flatten.count
=> 1919

# Count distinct attribute names
> ActiveRecord::Base.descendants.collect(&:attribute_names).flatten.uniq.count
=> 758

#  Get distinct attribute names
ActiveRecord::Base.descendants.collect(&:attribute_names).flatten.uniq.sort
=> ["about_url",
"access_control_post_id",
"acting_user_id",
"action",
"action_code",
"action_type",
"active",
"admin",
"admin_only",
"agreed_at",
"agreed_by_id",
"all_score",
"all_topics_wiki",
"allow_badges",
"allow_channel_wide_mentions",
"allow_global_tags",
...

Now we have an easy reference we can use to check for terminology, in-use abbreviations, or name molds! Whenever we’re adding a table or new field name, we can do a quick scan and stay consistent.

# Get attributes with digits
ActiveRecord::Base.descendants.collect(&:attribute_names).flatten.uniq.sort.grep(/\d/)
=> ["content_sha1",
"day_0_end_time",
"day_0_start_time",
"day_1_end_time",
"day_1_start_time",
"day_2_end_time",
"day_2_start_time",
"day_3_end_time",
"day_3_start_time",
"day_4_end_time",
"day_4_start_time",
"day_5_end_time",
"day_5_start_time",
"day_6_end_time",
"day_6_start_time",
"featured_user1_id",
"featured_user2_id",
"featured_user3_id",
"featured_user4_id",
"include_tl0_in_digests",
"original_sha1",
"sha1"]

There’s a little confusion on whether day_1 or user1 is better, but all in all, this project does a fantastic job with consistency. From what we’ve seen in our brief foray, it does an excellent job of managing name complexity.
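
The day_1 versus user1 split is easy to surface automatically. Here is a rough sketch over a sample of the attribute names above, where the :separated and :attached labels are just names I picked for the two styles.

```ruby
# Sample of the digit-bearing attribute names from the output above.
attribute_names = %w[
  content_sha1 day_0_end_time day_1_start_time
  featured_user1_id featured_user2_id include_tl0_in_digests
]

# Classify each name by whether a digit directly follows an underscore
# ("day_1") or is attached to the end of a word ("user1").
numbering_styles = attribute_names.group_by do |name|
  name.match?(/_\d/) ? :separated : :attached
end

separated = numbering_styles[:separated]
# separated is ["day_0_end_time", "day_1_start_time"]
```

Note that names like content_sha1 land in the attached group even though the digit there is part of an algorithm name rather than a counter, so the output still needs a human eye.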

Lessons learned

What does this teach us?

A project with good naming conventions can be surprisingly easy to navigate and explore. Methods say what they mean, classes are logically named and grouped.

Try looking at your project or codebase with the above tools and see how many unexplained acronyms, inconsistent numbering systems, or poorly named domains you can find. Imagine explaining everything to a newly onboarded developer and see which ones are hard to justify. Remember: the better your names, the easier life gets for everyone.

Even a brand new codebase can seem familiar and understandable if it follows naming standards, framework conventions, and signposts intent with excellent variable names. Naming might be hard, but it pays off.

Comments (5)

  1. Bravo, this is the hardest thing in software, and few will ever recognize that! Well written.

    Do note, one of the things we can do is avoid worrying about it while writing code. Leave ‘naming’ to the ‘editing’ phase thereby separating cognitive load a bit. Also, while coding, things change a lot, so no sense in undue hardship. Make it work, then clean it up.

    1. Been there, done that and became very very careful about it. The clean-up could easily introduce new bugs. For example replacing all occurrences of “xyz” with “xxx” might have unintended side effects (for example “xyzabc” might become “xxxabc”). You would need to do thorough regression tests as well. Also, your manager could well say “it works and it’s urgent so skip the clean-up for now”

  2. We should think about a naming convention right from the beginning, and we should stick to it. No excuses. Easy to understand and easy to follow for every team member. Countless projects tried to fix things later. This never worked in practice. Who is counting all the projects that didn’t do it properly right from the start? And never ever fixed these “tiny” things?

  3. I have a theory about the origin of variables named i, j, etc. for iteration variables. In Fortran, by default, any variable that started with those letters (from i to n) denoted an integer. Back in the day, we all used i as the iteration variable in a DO loop, just because it was the first of that set. If you needed another one, then j is the logical choice. And so on…
