This is a follow-up to The proper care and feeding of your Gradle build, in which I wrote about how to use the Dependency Analysis Gradle Plugin to help maintain a healthy build for your Android, Java, and Kotlin projects. In this first of a series of posts, we will discuss how that plugin works.1
To begin, we will take a look at bytecode analysis with the ASM library, and how it is essential in the detection of unused dependencies. In future posts, you can expect discussions of source code parsing with ANTLR, introspecting jars for capabilities such as service loading and annotation processing, dependency management with Gradle APIs, and more.
When is a dependency unused?
This is a complex question. Let’s start by inverting it: when is a dependency used? And what even is a dependency? Let’s define it like so:
Dependency. A jar (Java library) or aar (Android library) that is on the compile and/or runtime classpath of your project. It provides zero or more of the following: .class files, Java resources, Android resources, or Android components (such as Activities, Services, etc.). It may be external or another module in your project.
From here on, I will use “dependency” and “library” interchangeably.
Now that we know what a dependency is, we can express what a used dependency is:
Used dependency. A library required to compile your project, or which your project requires at runtime.
And therefore an unused dependency is one that your project doesn’t need at compile or runtime.
Knowing if a dependency is used for compilation
Not to belabor the point, but this is a complex question. To know if a dependency is used during compilation, we fundamentally need to know two things:
- What does the dependency produce? What is in the dependency’s compiled bytecode?
- What does your project consume? What is in your project’s compiled bytecode?
The plugin’s primary focus at this time is analysis of compile-time dependencies. It does do some limited analysis of runtime dependencies (such as noting whether a dependency provides Java ServiceLoaders), but otherwise elides this domain. As such, you may see incorrect advice (“remove X dependency, it’s unused”) if that dependency is only used at runtime. An example of this is if the dependency is only used via reflection. (This issue highlights one such case.)
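To make the reflection case concrete, here is a hypothetical example (the class name is invented for illustration): a class loaded reflectively never appears as a class reference in the consumer’s bytecode, only as a string constant, so bytecode analysis alone cannot see the usage.

// Hypothetical example: "com.example.SomeImpl" ships in a dependency that our
// code never references directly. Its only use is via reflection, so the
// compiled bytecode contains a string constant, not a class reference, and
// class-reference analysis will report the dependency as unused.
fun loadImplementation(): Any {
  val clazz = Class.forName("com.example.SomeImpl")
  return clazz.getDeclaredConstructor().newInstance()
}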
Analyzing the producers (libraries)
The DependencyReportTask is what analyzes the bytecode of all dependencies, whether they’ve been directly declared or have been brought in transitively from a directly declared dependency.2 Here we rely on the ASM library3 for some basic bytecode analysis. You can see the analysis in its full glory at asm.kt, but here I will present a simplified view.
private fun analyzeJar(artifact: Artifact): AnalyzedJar {
  val zipFile = ZipFile(artifact.file)
  val analyzedClasses = zipFile.entries().toList()
    .filter { it.name.endsWith(".class") }
    .map { classEntry ->
      // Visit each class file in the jar with our ClassVisitor implementation
      val visitor = MyClassVisitor()
      val reader = zipFile
        .getInputStream(classEntry)
        .use { ClassReader(it.readBytes()) }
      reader.accept(visitor, 0)
      visitor // "return"
    }
    .map { it.getAnalyzedClass() }
  return AnalyzedJar(analyzedClasses)
}
The signature tells the story: given an artifact, return an analyzed jar. The Artifact in this case is a custom data type that wraps a jar file on disk. Starting with that jar, we iterate over the class files it contains, visiting each one with an implementation of ASM's ClassVisitor; in this case, MyClassVisitor. This visitor produces something called an “analyzed class”, which is a representation of a compiled class file that contains all the things we care about. We then return an AnalyzedJar, which is a wrapper around a set of AnalyzedClass instances. The visitor implementation looks like this:
class MyClassVisitor : ClassVisitor(ASM8) {

  fun getAnalyzedClass(): AnalyzedClass {
    return AnalyzedClass(
      className = className,
      superClassName = superClassName,
      retentionPolicy = retentionPolicy,
      isAnnotation = isAnnotation,
      hasNoMembers = fieldCount == 0 && methodCount == 0,
      access = access,
      methods = methods,
      innerClasses = innerClasses,
      constantClasses = constantClasses
    )
  }

  override fun visit(
    version: Int, access: Int,
    name: String, signature: String?, superName: String?,
    interfaces: Array<out String>?
  ) {
    className = name
    superClassName = superName
    if (interfaces?.contains("java/lang/annotation/Annotation") == true) {
      isAnnotation = true
    }
    this.access = Access.fromInt(access)
  }
}
where the remainder of the implementation is elided for brevity, but may be seen at the link above. Even so, the parameter names give a sense of the things that matter, and they also hint at how ASM works: it parses the bytecode so we don’t have to, visiting each node in the class file and extracting its information so that we, the toolmakers, can operate on that information at a higher level.
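For a rough idea of what the elided parts of MyClassVisitor might look like, here is a hypothetical sketch (not the plugin’s actual code; see asm.kt for that). It assumes fieldCount, methodCount, and a mutable methods set are fields on the class, and simply records members as ASM calls back into the visitor:

// Hypothetical sketch of elided overrides inside MyClassVisitor.
// The real implementation in asm.kt records considerably more detail.
override fun visitField(
  access: Int, name: String, descriptor: String,
  signature: String?, value: Any?
): FieldVisitor? {
  fieldCount++
  return null // no need to inspect the field any further
}

override fun visitMethod(
  access: Int, name: String, descriptor: String,
  signature: String?, exceptions: Array<out String>?
): MethodVisitor? {
  methodCount++
  methods.add(name)
  return null // returning null skips visiting the method body
}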
Analyzing the consumer (our project!)
The next step is to understand what our project, the “consumer,” uses at compile time. This is done by ClassListAnalysisTask, which takes as its primary input — you guessed it — a list of class files. The output of this task is a set of strings: the fully-qualified class names of the class references that appear in our project’s bytecode. This also uses ASM, albeit with a different ClassVisitor implementation, because here we have a different need. We’re no longer looking at capabilities, but rather at raw strings representing class references that appear in our bytecode. It might help here to understand that a Java class’s compiled bytecode doesn’t contain “import statements” in the way that our Java source does. Instead, every class reference is fully-qualified at the use-site. With that in mind, we simply need to visit every node in the class file — members, method bodies, annotations, type annotations, and the superclass (which may just be "java/lang/Object") — and extract the type references.
I will not include the implementation in this post for the sake of brevity but, as always, you may take a look at the source here.
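That said, to make the idea concrete, here is a hypothetical, heavily simplified class-reference collector. The names and the coverage are my own; the plugin’s real visitor handles many more locations, such as annotations and method bodies.

import org.objectweb.asm.ClassVisitor
import org.objectweb.asm.FieldVisitor
import org.objectweb.asm.Opcodes.ASM8

// Hypothetical sketch: collect the internal names of classes referenced by a
// class file's superclass, interfaces, and field descriptors.
class ClassReferenceCollector : ClassVisitor(ASM8) {

  val classReferences = sortedSetOf<String>()

  override fun visit(
    version: Int, access: Int, name: String,
    signature: String?, superName: String?, interfaces: Array<out String>?
  ) {
    superName?.let { classReferences.add(it) }
    interfaces?.forEach { classReferences.add(it) }
  }

  override fun visitField(
    access: Int, name: String, descriptor: String,
    signature: String?, value: Any?
  ): FieldVisitor? {
    addObjectTypes(descriptor)
    return null
  }

  // A JVM descriptor embeds object types as "Lcom/example/Foo;"; pull them out.
  private fun addObjectTypes(descriptor: String) {
    var start = descriptor.indexOf('L')
    while (start != -1) {
      val end = descriptor.indexOf(';', start)
      if (end == -1) break
      classReferences.add(descriptor.substring(start + 1, end))
      start = descriptor.indexOf('L', end)
    }
  }
}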
ABI analysis
Computing a project's ABI (or application binary interface) also involves bytecode analysis. However, in order to keep this discussion focused, I am deferring discussion of ABI computation to a future article.
Tying it together
We know what the producers produce and what the consumers consume. These two inputs are fed into the DependencyMisuseTask (forgive me, I am often colorful in my naming conventions), which iterates over all the produced classes and looks to see if they are used by the consumer. Those that aren’t used are therefore, well, unused. Armed with this information, the plugin can now emit the advice “dependency X is not used by your project, so you should remove it!”
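As a hypothetical sketch of that final comparison (the type and property names here are my assumptions, not the plugin’s actual model): a dependency is unused when none of the classes it provides appear among the class references in our project’s bytecode.

// Hypothetical sketch of the core comparison. Assumes AnalyzedJar exposes its
// set of AnalyzedClasses and that we've already collected the consumer's
// class references as fully-qualified names.
data class UnusedAdvice(val dependency: String, val unused: Boolean)

fun findUnusedDependencies(
  analyzedJars: Map<String, AnalyzedJar>, // producer side: dependency -> analyzed jar
  classReferences: Set<String>            // consumer side: classes our bytecode references
): List<UnusedAdvice> = analyzedJars.map { (dependency, jar) ->
  val providedClasses = jar.analyzedClasses.map { it.className }.toSet()
  UnusedAdvice(dependency, unused = providedClasses.none { it in classReferences })
}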
Example console output against an Android project:
> Task :app:aggregateAdvice
Unused dependencies which should be removed:
- implementation("androidx.appcompat:appcompat:1.1.0-rc01")
- implementation("androidx.core:core-ktx:1.0.1")
- implementation("com.google.dagger:dagger-android-support:2.24")
- implementation(project(":entities"))
- implementation("org.jetbrains.kotlin:kotlin-stdlib-jdk8:1.3.72")
Transitively used dependencies that should be declared directly as indicated:
- implementation("org.jetbrains.kotlin:kotlin-stdlib:1.3.72")
- implementation("javax.inject:javax.inject:1")
- implementation("com.google.dagger:dagger:2.24")
Existing dependencies which should be modified to be as indicated:
- api("com.squareup.moshi:moshi:1.8.0") (was implementation)
- api("com.squareup.retrofit2:converter-moshi:2.5.0") (was implementation)
- api(project(":entities")) (was implementation)
Dependencies which could be compile-only:
- compileOnly("androidx.annotation:annotation:1.1.0") (was implementation)
Conclusion
In this post we sketched out how to use ASM to analyze the bytecode of a project and its dependencies, and how that information can be used to determine whether a dependency is used. We included links to source code that solves this problem quite thoroughly.
So that’s it, in a nutshell. Tune in next time for a discussion of source code parsing with ANTLR.
Extra credit: a note on performance and profiling
One of the most complicated parts of the “dependency misuse” algorithm alluded to above is the recursive function I have laconically named “relate”, which is so dense that I eventually had to write 26 lines of KDoc for it just to remember how it worked. There was a point early on in the plugin’s development when Square was unable to complete an analysis pass because the plugin just spun for hours and hours; inspection of a CPU profile told us that one hotspot was this relate function.
(Yeah, that's pretty clear!)
I instrumented the algorithm and saw that the same node in the dependency graph was, in the case of even a small project, visited potentially thousands of times (and each time beyond the first was duplicative). I was able to use a very simple caching strategy to skip nodes that had already been analyzed, and managed to reduce the workload by 99%. This very simple fix means that, for most real-world projects (up to and including Square’s gargantua), performance is excellent and should not be a blocker for adoption in your project.
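The fix amounts to simple memoization of the graph traversal. Here is a hypothetical sketch (the real relate function is considerably more involved):

// Hypothetical sketch of the caching strategy: a node reachable via many
// paths in the dependency graph is analyzed only once.
class Node(val name: String, val dependencies: List<Node> = emptyList())

fun relate(node: Node, visited: MutableSet<Node> = mutableSetOf()) {
  if (!visited.add(node)) return // already analyzed; skip the duplicate work
  // ... per-node analysis elided ...
  node.dependencies.forEach { relate(it, visited) }
}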
Profiles are powerful things.
Special thanks
Thanks once again to Stéphane Nicolas for providing feedback on an early draft, along with the profiles I discussed above.
Endnotes
1 For more information about the plugin’s capabilities, please see the wiki.
2 In general, every module in your project is the beneficiary of a graph of dependencies. At the top level are all the dependencies you've declared directly, but below those are the transitive dependencies the top-level (or direct) dependencies depend on; these are often available for direct use, despite not being declared by your project.
3 In the tradition of fine open source libraries, no one knows what the name means.