Evaluating Open Source Packages for Beginners

When searching for an open source solution, we often find multiple solutions. How do we decide to select one over another?

As developers, our time is precious, and fiddling around with an unknown package may have ramifications later if we decide to use it. This can be particularly troublesome if that package does not work as expected or has other problems.

When examining a package, I use specific criteria that the package should include.

This discussion applies only to projects with all source code supplied. A project may or may not have further dependencies.

These guidelines apply to any programming language. We will concentrate on Python as the primary implementation language.

Before You Start Searching

Spec out your target requirements. This may or may not include formal specifications. I usually have a general idea of how I want to use the software. In particular, how flexible are my requirements?

The more requirements you include, the fewer packages meet the requirements.

Two Parts: Immediate and Fine Tuning

This discussion has two sections:

  • Immediate impressions. When looking at packages, how do I quickly determine whether one deserves a closer look?
  • A deeper insight. Let’s get our hands a bit dirtier.

Immediate Impressions

Do I know what I want?

You’ve found a title or short blurb about a package. Get an overall impression.

What goals do you want from a package? What major points do you need? How flexible is each of the goals?

What does this package do for me?

Look for a summary at the top of the documentation that answers the question: “What will this software do for me?” If I can’t easily answer that question, I usually move on to the next package.

That package summary clarifies the author’s intent:

  • The reason this package exists
  • The problem it solves. Without a statement of why the software exists, the struggle to evaluate it may just not be worth the trouble.
  • How the project solved a problem the authors had

I’ve looked at some packages that did not state what the package does, what problem it addresses, or why I should use it. I assume such a package was written by a hobbyist, and I just move on.

How old is this project? Has it been updated recently?

The age of a project, and the date of its last update, hint at how much support you can expect.

But, just because a project is old and not updated for several years does not mean a project cannot be useful. The project may be stable with no recent bugs.

An old project may also be abandoned. This implies little if any maintenance.

Versions

The project should state what versions of Python were targeted with the current release.

If your product’s Python version is incompatible with the version that package targets, you may create unnecessary problems for yourself.

For example, if the version uses Python 2.X and your development system is Python 3.7, you may require conversion.

If you convert that package to 3.7, I encourage you to contribute the update back to the project on GitHub/GitLab.
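As a rough sketch of the version check described above: the snippet below reads the `Requires-Python` field that an installed distribution declares (if any) and compares the running interpreter against a minimum version. The function names are illustrative, and it assumes Python 3.8 or later, where `importlib.metadata` is in the standard library.

```python
import sys
from importlib import metadata
from typing import Optional, Tuple


def declared_python_requirement(dist_name: str) -> Optional[str]:
    """Return the Requires-Python string of an installed distribution.

    Returns None when the distribution is not installed or declares nothing.
    """
    try:
        return metadata.metadata(dist_name).get("Requires-Python")
    except metadata.PackageNotFoundError:
        return None


def interpreter_at_least(minimum: Tuple[int, int]) -> bool:
    """True if the running interpreter is at least (major, minor)."""
    return sys.version_info[:2] >= minimum
```

A package that declares nothing at all is itself a small warning sign: you will have to find out by trial which interpreters it supports.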

API Versioning

Under semantic versioning, only a major version bump should break the API; minor and patch releases should stay backward compatible. Does the project follow this rule? For instance, if version 2.3.1 uses history.track(data, filter) as an API call and 2.3.2 requires an additional parameter, such as history.track(data, filter, exclude), then the newer release has broken the API: existing two-argument calls will fail against it. Ask yourself if you can live with this. Of particular concern: will the authors continue to break the API in future releases?
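To make the breakage concrete, here is a minimal sketch. The functions are hypothetical stand-ins for the history.track call above; the third parameter is called `exclude` here because `except` is a reserved word in Python and cannot be a parameter name. The function bodies are invented purely for illustration.

```python
def track_v1(data, filter):
    """Original two-argument call (hypothetical behavior: keep matching items)."""
    return [item for item in data if filter(item)]


def track_breaking(data, filter, exclude):
    """New REQUIRED parameter: every existing two-argument caller now fails."""
    return [item for item in data if filter(item) and item not in exclude]


def track_compatible(data, filter, exclude=()):
    """New OPTIONAL parameter: existing callers keep working unchanged."""
    return [item for item in data if filter(item) and item not in exclude]
```

Calling `track_breaking(data, is_even)` with the old two-argument form raises a `TypeError`, while `track_compatible(data, is_even)` behaves exactly like the original. A project that adds parameters the second way is showing some care for its users.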

Popularity: How many downloads?

While the number of downloads does not tell you how many people actually use the project, it does indicate the level of interest in it. A large download count also makes it more likely, though not certain, that many people use the project in practice.

Forking numbers also indicate community interest in possibly modifying or customizing the code to their own needs. Developers may be interested in bug fixes as well.

Examples, Please!

I always look for runnable examples. This indicates to me that the package developers actually tried their code on something more than just unit tests.

Lack of runnable examples raises a red flag. Does this actually work or is this just abandoned or still in development?

  • Do examples exist? Runnable examples?
  • Do they even run properly?
  • How configurable are the examples? Configuration may or may not be useful.
  • How much of the package do the examples demonstrate? Coverage utilities can answer this.
  • Does an example exist similar to your needs?
  • Are they explained well enough so that you can hack on them and tweak them in “interesting” ways? I frequently explore the code by modifying and extending examples.
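One quick way to answer "Do they even run properly?" is to run every example and see which ones exit cleanly. The sketch below assumes the examples are standalone `*.py` scripts in a single directory and need no arguments or input; the function name and the 60-second timeout are my own choices.

```python
import subprocess
import sys
from pathlib import Path
from typing import Dict


def run_examples(examples_dir: str, timeout: float = 60.0) -> Dict[str, bool]:
    """Run each *.py file in examples_dir with the current interpreter.

    Returns a mapping of file name -> True if the script exited with status 0.
    """
    results = {}
    for script in sorted(Path(examples_dir).glob("*.py")):
        try:
            completed = subprocess.run(
                [sys.executable, str(script)],
                cwd=str(script.parent),   # examples often use relative paths
                capture_output=True,      # keep the example's output off the console
                timeout=timeout,
            )
            results[script.name] = completed.returncode == 0
        except subprocess.TimeoutExpired:
            results[script.name] = False  # treat a hung example as a failure
    return results
```

A clean exit only shows that the example did not crash. You still need to read its output to judge whether it did anything useful.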

Runnable examples provide fertile ground for hacking. By running an example close to what I want, I am nearer to hacking that code and getting something I can use.

Consider checking in your hacked example with appropriate documentation so that others may use it.

Additionally, others, including the author, may comment on your changes, and their comments may reveal misinterpretations on your part. This benefits both you and your community.

When I encounter a package with no runnable examples, I am strongly tempted to ignore that package.

Deeper Insights

Now you’ve found a package that may or may not fit your requirements. A deeper examination helps to match your requirements to the package offerings.

Dependencies

That project may require dependencies that need special care. Potentially more difficult would be a dependency requirement that is written in C/C++ while your project is written in Python. Most higher-level languages provide access to C/C++ but this may involve tricky interfacing.

Each dependency requires examination with the same set of guidelines as described in this blog. Some dependencies such as video, audio, graphics, and fonts require a rather large footprint of memory and CPU cycles to effectively use their power. While not necessarily a bad thing, users must be aware.

Older C/C++ code in particular may rely upon older versions of Linux libraries whilst your code uses the current library versions. Resolving these conflicts may require more effort than it is worth.
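For a Python package that is already installed, the standard library can list the dependencies it declares, which gives you the list of further packages to examine. This is a sketch only: the helper names are mine, the name parsing is a simple regular expression rather than a full requirement parser, and the result includes dependencies declared for optional extras.

```python
import re
from importlib import metadata
from typing import List

# A requirement string starts with the project name, e.g. "requests>=2.0".
_NAME = re.compile(r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)")


def requirement_name(requirement: str) -> str:
    """Extract the project name from a requirement string."""
    match = _NAME.match(requirement)
    if match is None:
        raise ValueError(f"cannot parse requirement: {requirement!r}")
    return match.group(1)


def direct_dependencies(dist_name: str) -> List[str]:
    """Names of the dependencies an installed distribution declares.

    Returns an empty list if the distribution is not installed or has none.
    """
    try:
        declared = metadata.requires(dist_name) or []
    except metadata.PackageNotFoundError:
        return []
    return sorted({requirement_name(req) for req in declared})
```

Calling `direct_dependencies` again on each returned name walks one level deeper into the dependency tree.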

Project Naming – Find web-references?

Suppose you like a particular project and want to search the web for reviews, tutorials, examples, problem resolution, etc. Can you find these easily?

Take the excellent Go language from Google. Searching for go yields an enormous amount of cruft: definitions of `go`, the excellent game of Go, and the Go language. Google and the community have become accustomed to using `golang` in searches.

Naming Consistency

Watch for consistent naming. Inconsistent naming appears everywhere. For example, when working with files a common inconsistency appears as `filename` and `file_name`. This can further lead to `outfile_name` or `out_filename` or `out_file_name`.

Inconsistent naming presents a burden on working memory: “How do I remember these different names and do they all refer to the same thing?”

Another common example is the use of `index`, `ndx`, `indices`.
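A small script can flag the first kind of inconsistency automatically. The sketch below collects variable and parameter names from Python source and groups those that differ only by underscores or letter case, such as `filename` and `file_name`. It is only a heuristic with a name I made up: it will not notice abbreviations like `index` versus `ndx`.

```python
import ast
from collections import defaultdict
from typing import Dict, Set


def naming_variants(source: str) -> Dict[str, Set[str]]:
    """Group identifiers that differ only by underscores or letter case."""
    groups = defaultdict(set)
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Name):      # variables that are read or assigned
            name = node.id
        elif isinstance(node, ast.arg):     # function parameters
            name = node.arg
        else:
            continue
        # Normalize: drop underscores and lowercase, so variants collide.
        groups[name.replace("_", "").lower()].add(name)
    # Keep only the groups that contain more than one spelling.
    return {key: names for key, names in groups.items() if len(names) > 1}
```

Running this over a few modules of a candidate package gives a quick feel for how disciplined the naming is.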

Reviews

How many reviews exist for that project? Are they just fluff reviews to fill the publishers’ pages or do they get down-and-dirty?

Be aware that some sites pay writers for good reviews. If a review presents only a rosy picture and does not dwell on problems or negative aspects, view it with a jaundiced eye.

Documentation

Some packages have sparse to no documentation. Time spent investigating these packages could delay finding a better package for your needs. When, not if, but when, you run into problems and questions, will the documentation at least indicate a possible solution?

It’s an old saying that in the software industry, industrial-strength documentation is poor documentation.

Tests

Do tests even exist? A decent test provides a template for usage. Tests demonstrate the current implementation and usage of the code. A good test beats documentation since a test exercises the code as it currently exists. Documentation, on the other hand, does not necessarily explain current code behavior.

When I begin examining a repository after I deem it a possibility, I compile if necessary, and then run the tests.

A major problem with most popular languages arises if concurrency gets used to implement portions of the code. Testing concurrency is fraught with dangers!

Code Coverage

Along with the tests, the coverage demonstrates the completeness of testing. Missing or low coverage raises a red flag. Most modern high-level languages have code coverage utilities.

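Warning: A high percentage of code coverage does not necessarily indicate good tests. Coverage measures which lines ran during testing, not whether the tests checked the results. A test that calls a function without asserting anything about its output still counts toward coverage, so read a few of the tests before trusting the number.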

Lint

All popular languages have some sort of linting utilities. I rarely see a project released with lint results or a lint configuration. Running a linter on the project generally provides easy insights into the author’s coding style. And, yes, most linters spew unnecessary messages that can be safely ignored.

Complexity and other Analytics

Other static analytics such as complexity, coupling, size, cohesion, etc. exist for all popular languages. Each language has its own ecosystem. Use them and understand that your own codebase will improve by the use of these utilities.
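To give a feel for what such a metric measures, here is a rough approximation of a per-function complexity count for Python source, using only the standard library. It is not a faithful implementation of any published metric: it simply counts a few kinds of branch points and adds one, and a nested function's branches are also counted in its enclosing function. Dedicated tools do this more carefully.

```python
import ast
from typing import Dict

# Node types treated as branch points in this rough count.
_BRANCH_NODES = (
    ast.If, ast.For, ast.While, ast.ExceptHandler, ast.IfExp, ast.comprehension,
)


def rough_complexity(source: str) -> Dict[str, int]:
    """Map each function name to 1 + the number of branch points inside it."""
    scores = {}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            branches = sum(
                isinstance(child, _BRANCH_NODES) for child in ast.walk(node)
            )
            scores[node.name] = 1 + branches
    return scores
```

Functions with scores far above the rest of the codebase are good places to start reading critically.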

Change Log

A change log indicates the direction of evolution of a product. As the initial release gets new features, refactoring, bug fixes, the change log documents this evolution.

A change log often also indicates the number of different developers on that project. The larger the number of developers, the greater your chances of finding help if you need it. The dates in the change log demonstrate the currency of the project.
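If the change log records release dates in ISO form (for example a heading such as `## [1.1.0] - 2021-05-04`, which is an assumption about the format, not a universal rule), a few lines of code can report how stale the project is. The function names are illustrative.

```python
import re
from datetime import date
from typing import Optional

# Matches ISO-style dates such as 2021-05-04 anywhere in the text.
_ISO_DATE = re.compile(r"\b(\d{4})-(\d{2})-(\d{2})\b")


def latest_changelog_date(changelog_text: str) -> Optional[date]:
    """Return the most recent valid ISO date found in the text, or None."""
    found = []
    for year, month, day in _ISO_DATE.findall(changelog_text):
        try:
            found.append(date(int(year), int(month), int(day)))
        except ValueError:
            continue  # skip impossible dates such as 2021-13-40
    return max(found, default=None)


def days_since(last_change: date, today: Optional[date] = None) -> int:
    """Number of days between last_change and today (defaults to now)."""
    return ((today or date.today()) - last_change).days
```

As noted earlier, a long gap since the last release can mean either a stable project or an abandoned one, so treat the number as a prompt for further digging.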

Forums

What are others saying about this code?

Use a search engine to find various forums. Then look at the discussions. Notice if people report bugs or request clarification. Do questions get answered in a timely manner? Is the tone of the answers relatively civil or hostile?

Most popular high level languages have dedicated forums for that language. Search for the target package and see what pops up.

General Clean Code Practices

Use your knowledge of clean code to review the code base. Do modules/functions/variables have reasonable names? Functions too long? Functions and modules well commented? Does test documentation demonstrate decent use of the code? Can you understand the tests well enough to use them as a model for your code?

Well written tests can replace documentation. Tests refine the documentation such that if there were no documentation, you could develop your project using only the tests. Tests definitively define the operations of your code. Documentation may become outdated but tests will always speak their truth.

Community

Community support and participation around an open source project indicate interest amongst its users. No matter how perfect a project, someone needs to answer questions and discuss various aspects of that code. Make sure that code will be around or you’ll find yourself burdened with the unknowns.

Licensing

There is a lot to be said about software licensing. I could make an entire blog post about software licensing. I’m not a lawyer and can’t give you legal advice. However, I can tell you what I look for when choosing open-source software. Make sure any open-source software you’re using in a commercial project has a license. You may see a license description in the README, or there may be a LICENSE file in the repository.
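Checking for a license file is easy to automate when you are screening several repositories. The sketch below looks in the top level of a checked-out repository for commonly used file names; the list of names is my own assumption and is not exhaustive. It does not tell you what the license permits.

```python
from pathlib import Path
from typing import List

# Common base names for license files (assumed list, not exhaustive).
_LICENSE_STEMS = {"license", "licence", "copying", "unlicense"}


def find_license_files(repo_dir: str) -> List[str]:
    """Return the names of likely license files in the top level of repo_dir."""
    return sorted(
        entry.name
        for entry in Path(repo_dir).iterdir()
        if entry.is_file() and entry.name.split(".")[0].lower() in _LICENSE_STEMS
    )
```

An empty result means you should look in the README or ask the authors before using the code.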

Just because code is available on GitHub doesn’t mean you can use it in your project. You should be aware of the implications of using open-source software. Aaron Kalin gave a great talk at the 8th Light University about software licensing; I highly recommend spending 30 minutes viewing his presentation. It will give you a good idea of what to look for as far as licensing in open-source software and what those licenses mean to you.

Links

A Critique of Software Defect Prediction Models delves into various problems of code coverage and why your code coverage might be suboptimal.

McCabe’s Cyclomatic Complexity and Why We Don’t Use It. In nearly every audience we present our analysis and quality management tool to, there is at least one person asking whether we also measure Cyclomatic Complexity. After all, almost every existing software quality tool calculates this metric and so it is very well known. Much to their surprise, our answer is usually no and I want to explain our rationale in this post.

How to Evaluate Open Source Software / Free Software (OSS/FS) Programs. This paper describes a general process for evaluating programs, with specific information on how to evaluate Open Source Software / Free Software (OSS/FS) programs. This process is designed so that you can compare OSS/FS programs side-by-side with proprietary programs and other OSS/FS programs, and determine which one (if any) best meets your needs.

About Cecil McGregor

As a software developer with many years experience, I am offering some of the many insights learned through the school of hard knocks. My passion is writing software! And I like to write superior software. I try to follow current trends and techniques as I apply them to my everyday work.