DEV Community

How 4 lines of Java code end up in 518772 lines in production.

Brian Vermeer 🧑🏼‍🎓🧑🏼‍💻 on August 24, 2020

A few weeks ago I had the opportunity to give a presentation for the Dutch Java Conference JSpring. The talk was about Java dependency management. ...

Michiel Hendriks • Aug 25 '20 • Edited

The problem with many of those Spring Boot starters is that they include a lot of dependencies you might not even use. It gets even worse when you also use an alternative starter package, for example spring-boot-starter-jetty next to your spring-boot-starter-web. If you don't exclude spring-boot-starter-tomcat from spring-boot-starter-web, you have even more dependency cruft.

The way Spring organized the Spring Boot packages, you basically have two huge packages with a lot of optional dependencies that get auto-magically activated when you include another dependency (this is all in spring-boot-autoconfigure). Those starter packages don't contain any code. If spring-boot-autoconfigure was split up into small packages for each supported library, the end result would be much smaller.

Another popular obese library is Google's Guava. As a library it's poorly designed, and also quite a dependency hell with their really active removal of deprecated features. The library itself is a massive 2.7MiB. In most cases people only use a really small subset of this library.

David Dal Busco • Aug 24 '20

Your article made me think "npm, maven, same same 😅". Interesting to notice that, regardless of whether it is front- or backend, somehow and to some extent the same problems might be faced.

Thank you for sharing the experiment.

P.S.: The URL of Snyk in "Scanning your application using Snyk..." is not valid, you might want to correct it, just in case 😉.

Jing Xue • Aug 24 '20

Your conclusion is perfectly valid, but I'm not sure your experiment really illustrates it.

If you statically decompile all the dependencies, you end up with essentially the entire codebase of all of them, but I'll bet only a small portion of it would actually be loaded into memory, and only a fraction of that would actually be executed. And that is true not only for your placeholder app, but for most typical applications today as well, because libraries tend to declare all the optional or feature-specific dependencies. From a library developer's point of view, that is a sensible approach, because they would rather have a user waste some disk space than run into errors out of the box due to missing dependencies.

Runtime coverage would probably give you a more accurate picture of how much of the dependencies is actually relevant.
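
As a rough illustration of that difference between what ships and what is loaded, here is a minimal sketch that asks the running JVM how many classes it has loaded, using the standard `ClassLoadingMXBean`. The class name `LoadedClassProbe` is made up for this example, and the number is only a coarse proxy: it says what was loaded, not what was executed, so it is not a substitute for real coverage tooling.

```java
import java.lang.management.ClassLoadingMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper, not part of the original article.
class LoadedClassProbe {

    /** Returns the number of classes currently loaded in this JVM. */
    static int loadedClassCount() {
        ClassLoadingMXBean bean = ManagementFactory.getClassLoadingMXBean();
        return bean.getLoadedClassCount();
    }

    /** Returns the total number of classes loaded since the JVM started. */
    static long totalLoadedClassCount() {
        return ManagementFactory.getClassLoadingMXBean().getTotalLoadedClassCount();
    }

    public static void main(String[] args) {
        System.out.println("Classes loaded right now: " + loadedClassCount());
        System.out.println("Classes loaded since start: " + totalLoadedClassCount());
    }
}
```

Calling something like this from inside the application after it has served a few requests, and comparing the result with the number of class files in the jar, would give a first impression of how much of the packaged code the JVM actually touched.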

Brian Vermeer 🧑🏼‍🎓🧑🏼‍💻 • Aug 25 '20 • Edited

Hi Jing,

I see what you mean and I understand your point of view. When using a large framework like Spring Boot, do you actually know what is in memory? Many things are available by default, and changing a single property or adding a single line of code might have a domino effect. Not to mention the use of reflection in Spring.

In addition, although a class may not yet be loaded in memory, the class is available and can be loaded on demand. As a developer, you should be aware of this as you are responsible for the complete binary.
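
To make the "available and can be loaded on demand" part concrete, here is a minimal sketch that loads a class purely by its name at runtime. `OnDemandLoading` and `isAvailable` are hypothetical names for this illustration, and `java.util.zip.CRC32` just stands in for any class that sits on the classpath without being referenced in your own code.

```java
// Hypothetical helper, not part of the original article.
class OnDemandLoading {

    /** Returns true if a class with this name can be loaded from the classpath. */
    static boolean isAvailable(String className) {
        try {
            // initialize=false: load the class without running its static initializers
            Class.forName(className, false, OnDemandLoading.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Nothing in this program references CRC32 at compile time,
        // yet it can be loaded because it is available to the class loader.
        System.out.println(isAvailable("java.util.zip.CRC32"));
        System.out.println(isAvailable("com.example.DoesNotExist"));
    }
}
```

The same mechanism is what makes every class in every bundled dependency reachable by name, whether or not your own code ever mentions it.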

Small example (and this use case has happened before): if a vulnerability is found over time that lets me inject SpEL into an existing endpoint (e.g. by adding a specific malicious header), all the classes are available to me.
The same holds for deserialization / unmarshalling problems and potential code injection.
Unfortunately, vulnerabilities are found over time (although hopefully none will be).

In this example you are right. Nothing much happens and no vulnerabilities have been found yet. However, adding a bit of code and / or another library might already change this.
I could also go into dependency hell and the collisions that can come up when underlying libraries need a specific version, but I believe you get my point.

Thanks for reading the article and thanks for your comment, I honestly appreciate it.

Dave • Aug 25 '20 • Edited

So, your analysis tells you the contents of the fat jar... it's called a fat jar for a reason (and there are things we can do to minimise its size - including but not limited to the use of modules & jlink, or graalvm etc).

The fact that we can optimise the fat jar for file size should tell you that not all of the fat jar is loaded into memory, but it does point out one concern with Spring. Spring makes an awful lot of assumptions - it's a very opinionated framework, and unfortunately, there's no way for Spring, at build time, to know what dependencies you will want to use, so it gives you everything.

This "feature" of Spring is both what makes it so popular, and so risky. A Junior will love Spring for the convenience, but Seniors and above are often wary of it - not because of the jar file size, but because of the potential attack profile (did someone set an environment variable in Prod exposing too many metrics to the outside world?)

My question: why do you care about the size of the jar? I mean, sure, if you're deploying in any one of the cloud platforms, scalability time is impacted by network transfer time. But why, in this demo, did you care? What point were you trying to make, when anyone experienced with Spring knows that the Jar files are huge by default?

Would it be more productive, in that demo session, to respond with "hey, jar files are huge, but here's one way we can minimise their size..."?

I can hear the Spring devs in their office now: "Some guy just proved to the internet that we remove boilerplate and let him just write the business code he needs! Awesome."

Brian Vermeer 🧑🏼‍🎓🧑🏼‍💻 • Aug 25 '20

Spring is not the issue, I honestly love spring-boot!

The issue is knowing what you are using and why.
Being aware that there is a risk in blindly importing dependencies.
You are responsible for more than just the code you write, from both a security and a maintenance point of view.

I'm sorry if that was not clear to you from the article.

Dave • Aug 25 '20

I'm honestly going off Spring, and have been for the last few years - mostly because of the opinionation issue.

The security aspect can be somewhat mitigated by having your own dependency repository hosted internally, with appropriate security controls, but the upgrade path (and trusting 3rd party developers both in terms of security and bugs) is troublesome no matter what.

That said, blind dependency importing & monitoring changes isn't unique to Spring, hell, with Maven I can write custom plugins to jump into the build phases & inject whatever code I want.

Ultimately, measuring the fat jar size is probably not the best way to illustrate the number of dependencies - many of them will be in the jar, but never executed because they're not referenced from any other code - they're just flotsam & jetsam.
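
For reference, counting what is physically inside a jar takes only a few lines with the standard `java.util.jar.JarFile` API. This is a sketch, not the method used in the article (which decompiled the dependencies and counted lines); `JarClassCounter` is a made-up name, and it only counts `.class` entries directly inside the given jar. A Spring Boot fat jar keeps its dependencies as nested jars, so those would have to be extracted and counted one by one.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.util.jar.JarFile;

// Hypothetical helper, not part of the original article.
class JarClassCounter {

    /** Counts the entries ending in .class directly inside the given jar. */
    static long countClasses(Path jar) throws IOException {
        try (JarFile jarFile = new JarFile(jar.toFile())) {
            return jarFile.stream()
                    .filter(entry -> entry.getName().endsWith(".class"))
                    .count();
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java JarClassCounter.java <path-to-jar>");
            return;
        }
        System.out.println(countClasses(Path.of(args[0])) + " class files");
    }
}
```

A number like this describes what ships, not what runs, which is exactly the gap being discussed here.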

Maybe a better way would be to spin up something like SonarQube with an appropriate rule set (not the default), write 99.999% test coverage, and then look at the SonarQube report to see if it flags up issues (since one thing it does do is an OWASP scan).

Sergiy Yevtushenko • Aug 24 '20

Spring should not be used, not just for such a simple app, but in general.

Brian Vermeer 🧑🏼‍🎓🧑🏼‍💻 • Aug 24 '20

I think you are missing the point here.
This is merely an example to illustrate awareness.

Sergiy Yevtushenko • Aug 24 '20

Well, I realize very well how much code we're adding with dependencies. My comment just expands on the "using Spring Boot for a hello world application is overkill" sentence from your post.

Jan Wedel • Aug 30 '20

This is an interesting experiment you did there and yes, it shows some downsides. We also use something similar to Snyk to manage our dependencies. This is actually an important point for JS/npm and other tools as well.

But it’s a trade-off as always. You could have mentioned a couple of the positive aspects as well:

Reduced boiler plate code
Reduced risk of security issues in boiler plate code. I saw people configuring servers and HTTP clients wrong so often.
Improved development speed
A production ready application with HTTP server, monitoring endpoints. If you create all that code by hand in Java or any other language, you’ll create a buggy and vastly insecure application for sure.
The fact that we use code developed by others is actually also a pro in itself, because it’s open source, developed by people collectively much smarter than we are and tested by millions in production.

That said, I can sleep very well with the 518772 lines 😉

John Mercier • Aug 26 '20

This is an awesome idea! Thanks for sharing. I agree that the example is overkill but I also believe it points to a problem with modularity. It would be interesting to know if the spring dependencies are bloated or if it is tomcat. Could the modules be broken down further so they are not so big for trivial applications like this?

I think addressing this issue is one goal for Java Modules and building applications with jlink.

Some may not see the jar size as a big deal, but as rates of deployment increase I can see it being a problem. I imagine deploying 10 times a day is not a big deal, but deploying millions of times a day is.