Austin Vance for Focused

Posted on • Originally published at x.com

Why Big Classes Get Bigger: Understanding Preferential Attachment in Your Code

The Law of Preferential Attachment: Why your User class keeps growing

Software often models the natural world. One pattern I have been thinking about is why our biggest components continue to grow. In natural systems there is a phenomenon known as preferential attachment, which describes how new elements in a system tend to collect at the most connected nodes in the network. Zoologists observed this phenomenon in taxonomy, where the largest classes, orders, and families tend to collect newly named organisms.

The pull of the common class

This is exactly what happens in our codebases. Someone once said there are two hard things in computer science: “naming things, cache invalidation, and off-by-one errors.” I love the joke, but it hurts every time. Poorly named components create a gravitational pull that forces the next developer between a rock and a hard place: either comb through the component to understand its purpose and then rename it to something more specific, or take the path of least resistance and add the new functionality, further expanding the component's responsibility. Each time the second option wins, the feedback loop tightens and the component attracts more and more “like” functionality.

Consider a component named UserHandler. The original intent could be to manage a user's authentication, but because of the vague name, the component could also logically handle anything user-related. The UserHandler soon contains preferences, notifications, and social connections along with the original intention, authentication. Each addition makes the next one even easier to justify, moving the component closer and closer to the “god class”.

Yes, there’s math

There’s actually some math to back this up. Preferential attachment produces a power-law distribution: the probability of new functionality being added to a component grows with the functionality already in that component. If F(c) represents the functionality in component c, then the probability P(c) of adding complexity to that component is:

$$P(c) = \frac{F(c)^\alpha}{\sum_{i} F(i)^\alpha}$$

Where $\alpha$ represents the strength of the preferential attachment and the sum runs over every component i in the system.
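As a rough sketch, the formula translates almost directly into code. The helper name `attachmentProbabilities` and the two made-up component sizes are assumptions for illustration only; "functionality" is approximated here by a single size number per component.

```typescript
// Sketch of the formula above: P(c) = F(c)^alpha / sum_i F(i)^alpha.
// `attachmentProbabilities` is a hypothetical helper name, not from the article.
function attachmentProbabilities(sizes: number[], alpha: number): number[] {
  // Weight each component by its size raised to the attachment strength.
  const weights = sizes.map((size) => Math.pow(size, alpha));
  const total = weights.reduce((sum, weight) => sum + weight, 0);
  if (total === 0) {
    throw new Error("at least one component must have a non-zero size");
  }
  // Normalize so the probabilities sum to 1.
  return weights.map((weight) => weight / total);
}

// Two made-up components, one three times the size of the other.
const linear = attachmentProbabilities([10, 30], 1); // [0.25, 0.75]
const squared = attachmentProbabilities([10, 30], 2); // [0.1, 0.9]
console.log(linear, squared);
```

With a strength of 1 the larger component is three times as likely to receive the next change; with a strength of 2 it is nine times as likely, which is the compounding effect the exponent captures.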

Imagine a system where we have three components:

* A - 100 lines of code
* B - 200 lines of code
* C - 1000 lines of code

If the strength of preferential attachment is α = 2 (a value often quoted for power laws, used here for illustration), then we end up with

$$P(A) = \frac{100^2}{100^2 + 200^2 + 1000^2} = \frac{10{,}000}{1{,}050{,}000} \approx 0.0095$$

vs.

$$P(C) = \frac{1000^2}{100^2 + 200^2 + 1000^2} = \frac{1{,}000{,}000}{1{,}050{,}000} \approx 0.952$$

C is ten times the size of A, yet it is not 10x but 100x more likely to receive the next addition. That makes it clearer why some classes continue to grow while most remain small.
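The feedback loop can also be simulated. This is a minimal sketch under a simplifying assumption that is mine, not the article's: every change adds a fixed number of lines to exactly one component, chosen with probability proportional to its size raised to the attachment strength. The names `pickComponent` and `simulateGrowth` are hypothetical.

```typescript
// Pick a component index with probability proportional to size^alpha.
// `r` is a random number in [0, 1).
function pickComponent(sizes: number[], alpha: number, r: number): number {
  const weights = sizes.map((size) => Math.pow(size, alpha));
  const total = weights.reduce((sum, weight) => sum + weight, 0);
  // Walk the cumulative weights until we pass the point r * total.
  let remaining = r * total;
  for (let i = 0; i < weights.length; i++) {
    remaining -= weights[i];
    if (remaining < 0) return i;
  }
  return weights.length - 1; // guards against floating-point rounding
}

// Apply `changes` additions of `linesPerChange` lines each and return the new sizes.
function simulateGrowth(
  initialSizes: number[],
  alpha: number,
  changes: number,
  linesPerChange: number,
  random: () => number = Math.random // expected to return values in [0, 1)
): number[] {
  const sizes = initialSizes.slice(); // leave the caller's array untouched
  for (let step = 0; step < changes; step++) {
    const index = pickComponent(sizes, alpha, random());
    sizes[index] += linesPerChange; // the chosen component gets bigger...
    // ...which raises its weight for every later step: the feedback loop.
  }
  return sizes;
}

// Components A, B, and C from the example, after 500 changes of 10 lines each.
const grown = simulateGrowth([100, 200, 1000], 2, 500, 10);
console.log(grown);
```

Results vary from run to run because the choice is random, but C tends to absorb most of the new lines while A and B barely move. Passing a custom `random` function makes a run reproducible.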

There is no strength in numbers

If it is not painfully obvious, the larger the component, the more impact it has on the efficiency of future development. Large components become immovable objects in an architecture, shaping the system's evolution. That shape imposes a significant cost on the system's ability to evolve.

Consider testing. As a component attracts more and more functionality, its dependencies grow as well, creating more and more potential interactions that must be validated every time the component changes. A change in one part of the system will eventually cascade to the “God Component”, and the changes to that component will in turn require testing and updates in other, seemingly unrelated parts of the code.

The answer to preferential attachment is more easily said than done: just name things better. It’s always easier to bring components together than it is to untangle them and refactor them apart.

Using the UserHandler example above:

interface UserHandler {
    // Authentication (the original intent)
    authenticate(credentials: Credentials): User;
    validateSession(token: string): boolean;
    revokeAccess(userId: string): boolean;
    // Preferences
    fetchPreferences(userId: string): UserPreferences;
    updatePreferences(userId: string, preferences: UserPreferences): UserPreferences;
    getDefaultPreferences(): UserPreferences;
    // Notifications
    getNotificationSettings(userId: string): NotificationSettings;
    updateNotificationSettings(userId: string, settings: NotificationSettings): NotificationSettings;
    // Social connections
    addSocial(userId: string, connectionId: string): string;
    removeSocial(userId: string, connectionId: string): string;
    listSocial(userId: string): string[];
}

This component clearly does a lot. Once every method is implemented, it could easily run to hundreds of lines, with dependencies on databases, caches, and other whole components.

The only constant is change

There are four distinct concerns here. Each could have its own class or component, which gives clearer names and an obvious place for new functionality to go:

interface AuthenticationService {
  authenticate(credentials: Credentials): User;
  validateSession(token: string): boolean;
  revokeAccess(userId: string): boolean;
}

interface PreferenceService {
  fetch(userId: string): UserPreferences;
  update(userId: string, preferences: UserPreferences): UserPreferences;
  getDefaults(): UserPreferences;
}

interface NotificationService {
  getSettings(userId: string): NotificationSettings;
  updateSettings(userId: string, settings: NotificationSettings): NotificationSettings;
}

interface SocialService {
  addSocial(userId: string, connectionId: string): string;
  removeSocial(userId: string, connectionId: string): string;
  listSocial(userId: string): string[];
}

This form is clearer: each name signals the behavior of its component, which makes the components easier to test and makes it easier for developers to grok the context.
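To illustrate the testing point, here is a minimal sketch of one of the narrow interfaces backed by an in-memory implementation. The fields of `UserPreferences` and the class name `InMemoryPreferenceService` are assumptions made up for this example; the article does not define either.

```typescript
// Hypothetical shape; the article does not say what UserPreferences contains.
interface UserPreferences {
  theme: "light" | "dark";
  emailDigest: boolean;
}

interface PreferenceService {
  fetch(userId: string): UserPreferences;
  update(userId: string, preferences: UserPreferences): UserPreferences;
  getDefaults(): UserPreferences;
}

// An in-memory stand-in that needs no database, cache, or other component.
class InMemoryPreferenceService implements PreferenceService {
  private readonly store = new Map<string, UserPreferences>();

  fetch(userId: string): UserPreferences {
    // Fall back to the defaults for users who never saved preferences.
    const stored = this.store.get(userId) ?? this.getDefaults();
    return { ...stored }; // return a copy so callers cannot mutate the store
  }

  update(userId: string, preferences: UserPreferences): UserPreferences {
    this.store.set(userId, { ...preferences });
    return this.fetch(userId);
  }

  getDefaults(): UserPreferences {
    return { theme: "light", emailDigest: true };
  }
}

const preferences = new InMemoryPreferenceService();
preferences.update("user-1", { theme: "dark", emailDigest: false });
console.log(preferences.fetch("user-1")); // the saved preferences
console.log(preferences.fetch("user-2")); // the defaults
```

Because the interface covers a single concern, a test only has to set up preferences. Nothing about authentication, notifications, or social connections needs to be faked.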

Build for change

Our job as developers is to build code that enables change in a system. By creating clear boundaries through naming, we avoid the pull of preferential attachment, making new components easier to build and old functionality easier to change.

Just as natural systems tend toward entropy, software systems tend toward complexity, and the science supports it. Our role is to manage that complexity by introducing patterns and constraints ensuring sustainable and predictable growth of the system over time.


Shameless plug
If you've got a "god class" or "god service" and need help decomposing and untangling your largest components or systems, drop me a note or check out focused.io. We love working with legacy code and helping it welcome the future!
