We Analyzed 60,678 Libraries on Github – Here are the Top 100

 ● 14th Apr 2015


What are the top Java libraries used by some of the most popular projects on Github? We analyzed 60,678 dependencies to find out.

We like backing up everything we say with data, which is why some people claim we’re not that fun at parties. Obviously, they’re going to the wrong parties. In this post we’ve looked into 60,678 dependency entries covering 11,939 unique Java libraries that are used by the top 5,216 Java projects on Github, and extracted the top 100 to a list. Or, as we like to call it, a fun way to spend a rainy weekend.

There’s a tension between new rising technologies and good ol’ tried and tested libraries that we all like to use. The new libraries and frameworks tend to generate more buzz, up to a point where it seems everybody is using them and you’re left behind the curve. This is often NOT the case, and this post brings the numbers to prove it.

Without Further Ado: The Top 20 Java Libraries

Github Top 20 Java Libraries

The Main Insights From the Top 100 Libraries List

The Unexpected

Hadoop Blows Spark Out of the Water – Hadoop comes in at #42 with no mention of Apache Spark in the top 100 list whatsoever. Apache Zookeeper made it to #75, helping maintain Hadoop clusters and keeping the elephants at bay.

Log4j is 2x More Popular Than Logback – Log4j, which is used in 16.76% of the projects we examined, clearly outruns Logback, which serves as the logging engine in only 8.45% of the top projects.

SQL > MongoDB > PostgreSQL – The Java SQL connector came in at #27, MongoDB showed up at #87, and PostgreSQL barely made the list at #97.

ElasticSearch has the Most Justified Buzz Around a Java Library – ElasticSearch, the search server based on Apache Lucene (which made #90 in the list), the E in the ELK stack, and a personal favorite of ours, is the library with the most justified buzz we have on the list.

And… The Usual Suspects

JUnit is the Undisputed King of Java Libraries – With 3,345 entries, JUnit is used by 64% of Github’s top Java projects. Together with spring-test on the Spring front and testng, it makes up the top 3 Java testing libraries that we saw in the top 20 list.

SLF4J is the Most Popular Logging Library – Whether you’re using Log4j, Logback or any other logging engine, with 1,184 entries, over 22% of Github’s top Java projects are using slf4j as their logging facade.

14 Out of the Top 100 Libraries are Coming From the Spring Framework – The most popular framework among the top 100 libraries (even more than apache-commons which has 12 libraries in the top 100), with spring-context as its most popular library.

Google Guava Rocks the Charts as the #4 Most Popular Java Library – With 815 entries, Guava shows up in 15.6% of Github’s top Java projects. We actually love using Guava here at OverOps as well and recently published a post about some of its useful yet lesser known features.

apache-commons is Really Common Coming in at #5 – With its top representative holding 659 entries (12.63%) in Github’s top Java projects and 12 of its libraries in the top 100, apache-commons continues to justify its name.

Mockito is the Most Popular Java Mocking Framework – 559 entries (10.72%) show that mocking makes it big in Java, ranking as the 7th most popular library.

Developers Love Using joda-time – This comes as no surprise, but it’s interesting to see the joda-time library by Stephen Colebourne reach 18th place.
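
The percentages quoted in this section line up with each library's entry count divided by the 5,216 projects in the dataset. Here is a minimal sketch of that arithmetic; the class and method names (UsageShare, percentOfProjects) are made up for illustration and are not part of any tooling used for this post.

```java
// Hypothetical helper for illustration only.
class UsageShare {

    // Number of projects in the dataset, as stated in the post.
    static final int TOTAL_PROJECTS = 5216;

    // Percentage of the dataset's projects that a given entry count represents.
    static double percentOfProjects(int entries) {
        return 100.0 * entries / TOTAL_PROJECTS;
    }
}
```

For example, UsageShare.percentOfProjects(3345) comes out at roughly 64.13, which matches the 64% quoted for JUnit, and UsageShare.percentOfProjects(559) gives roughly 10.72, the figure quoted for Mockito.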

5 More Entries Worth Mentioning

#65 – Bukkit – The only gaming library in the top 100 list, you guessed it right, Minecraft servers.
#66 – Jetty – Because Netty didn’t make it to the list.
#81 – PowerMock – A fresh entry to the top 100 list; the project states that “it can be used to solve testing problems that are normally considered difficult or even impossible to test”.
#90 – Google Protobuf – A language-neutral, platform-neutral, extensible way of serializing structured data for use in communications protocols, data storage, and more.
#100 – AssertJ – One of the popular new testing libraries, rising in popularity over the last year, included in the new version of Dropwizard, and accepting migrations from FEST Assert.

Top 100 Libraries by Type

Github Top 100 Java Libraries by Type

To get a better sense of the types of libraries that gather the most attention from the Java community, we’ve plotted the top 100 by type and their number of uses in Github’s most popular Java projects.

These numbers, where do they come from?

Let’s add some context to the stats. For starters, we pulled the top 25,000 Java projects from Github by stars. In the second step we extracted the ones that use either Maven or Ivy for dependency management, to gain quick access to their pom.xml / ivy.xml dependencies; this left us with 5,216 projects. Now that we had thousands of XML dependency files on hand, it was time to get a beer. Once we ran out of beer, we crunched the data and got a total of 60,678 records of libraries in use, with 11,939 unique libraries on hand. This means the average Github project in our dataset uses 11.6 external libraries. To make the analysis easier, we’ve processed the stats for the top 100 libraries by the number of Github projects they appear in, and added some classification by the type of library just for the heck of it.
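
The extraction and ranking steps described above can be sketched in a few lines of Java. This is a rough illustration under stated assumptions, not necessarily the script used for this post: the names DependencyCounter, librariesIn, projectCounts and top are hypothetical, only Maven pom.xml content is handled (no ivy.xml), a library is identified by its groupId:artifactId pair, and every dependency element in the file is counted, including ones declared under dependencyManagement or inside plugins.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical helper for illustration; not the tooling used for the post.
class DependencyCounter {

    // Returns the distinct "groupId:artifactId" pairs declared in one pom.xml.
    static Set<String> librariesIn(String pomXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // The files come from arbitrary repositories, so refuse DOCTYPE declarations.
            factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(pomXml.getBytes(StandardCharsets.UTF_8)));
            Set<String> libraries = new LinkedHashSet<>();
            NodeList deps = doc.getElementsByTagName("dependency");
            for (int i = 0; i < deps.getLength(); i++) {
                Element dep = (Element) deps.item(i);
                String group = childText(dep, "groupId");
                String artifact = childText(dep, "artifactId");
                if (group != null && artifact != null) {
                    libraries.add(group + ":" + artifact);
                }
            }
            return libraries;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse pom.xml", e);
        }
    }

    // Reads a direct child element only, so nested <exclusion> entries are ignored.
    private static String childText(Element parent, String tag) {
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element && tag.equals(((Element) n).getTagName())) {
                return n.getTextContent().trim();
            }
        }
        return null;
    }

    // Counts, for each library, how many projects declare it at least once.
    static Map<String, Integer> projectCounts(List<String> poms) {
        Map<String, Integer> counts = new HashMap<>();
        for (String pom : poms) {
            for (String library : librariesIn(pom)) {
                counts.merge(library, 1, Integer::sum);
            }
        }
        return counts;
    }

    // Returns the n most widely used libraries, most popular first (ties sorted by name).
    static List<Map.Entry<String, Integer>> top(Map<String, Integer> counts, int n) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort(Map.Entry.<String, Integer>comparingByValue().reversed()
                .thenComparing(Map.Entry.<String, Integer>comparingByKey()));
        return entries.subList(0, Math.min(n, entries.size()));
    }
}
```

Feeding the contents of each project's pom.xml into projectCounts and then calling top(counts, 100) would give a ranking by number of projects, which is the measure used for the top 100 list. Counting a library once per project, rather than once per declaration, keeps multi-module builds from inflating the numbers; that is a design choice of this sketch.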

The raw data is available right here and you’re welcome to take a look and make sure we didn’t miss any interesting insights. Although the beer drinking phase was an essential part of this research, the numbers are accurate.

Further Reading

Another interesting analysis comes from apiwave, who looked into the top Java APIs by the number of classes that use them in each client project. The analysis was inspired by a previous post we published in November 2014.

And what about the top tools Java developers use?

We’ve got you covered right here: The Top 15 Tools Java Developers Use After Major Releases

Seeing anything that we missed in the data? Please let us know in the comments section below.


This post is now in Spanish.

Alex is the Director of Product Marketing at OverOps. As an engineer-turned-marketer, he is passionate about transforming complex topics into simple narratives and using his experience to help software engineers navigate their way through the crowded DevOps landscape.
