Dan's Musings

Which Programming Language Should I Use? Part I

The question is ubiquitous and relevant, and it gets highly opinionated answers.

Usually, the answer is "whichever one you know best", but what if you know lots of them?

Here is a guide, based on my opinions and observations, on what differentiates programming languages, in order from most important to least.

This post is the first part of the guide and covers the most important considerations. Later parts will be more of a guided tour of the zoo we call programming languages.

Decision Point: Memory (and CPU) Management

The first and most important design element of any language is memory management. To a lesser extent, this is also true of CPU management (e.g., are threads a thing? Is there an event loop involved?).

How does the language manage memory, CPU, and resources in general?

Before you start screaming "that's not a language thing, that's a platform thing": YES, IT IS a language thing. For example, Java doesn't have a notion of a destructor built into the language, because it doesn't need one. The approach to memory management (and to resources in general) is the most pervasive design element of a language, the one with the most far-reaching, language-specific consequences.

A language's memory management often determines what the language is used for. It is more important than CPU management, as most programs only ever comprise one execution path. However, all programs must deal with memory. Languages generally fall into one of three memory management categories: programmer/compiler, garbage collected, or reference counted.

Programmer/compiler languages are well-suited to systems-level projects -- building a browser, database, kernel module, or programming embedded systems. However, it is laborious for the programmer to deal with memory and CPU management. Even if the compiler deals with it, as in Rust, it significantly adds to the complexity of the language. It is more difficult to program in these languages.

Garbage collected languages which also provide cleaner constructs for managing CPU are much easier to use. Many of the most popular languages fall in this camp. Golang is perhaps the best poster child for this category, as that language provides abstractions around both CPU and memory management, supporting green threads by default. The downside of this category is that it is much more difficult to call code which is not written in that language. Golang and Java are largely walled gardens in this way. Calling into C or Rust code is more difficult to do.

To provide a nice middle ground, the reference counted languages were born. In this category we find languages such as Python, Perl, and Swift. Ruby is often grouped with them because it is similarly easy to extend in C, although its main implementation uses a tracing collector rather than reference counts. These are languages which attempt to handle memory in a way that makes it easy to call systems code, most of which has been written in C. They work by counting references to an object. When there are no more references to something, the memory for that thing is automatically reclaimed. This process makes calling into C code much, much easier, as this strategy is much more compatible with the nature of how C (and friends) track memory.

One drawback is that it is difficult to make reference counting thread-safe. Many languages that employ it also employ a mechanism known as a Global Interpreter Lock (GIL), which ensures only one thread of execution can run at a time. Another drawback is that tracking references is computationally expensive. Together, a GIL and the extra time taken to track references introduce considerable overhead. These languages tend to be slow as a result, with the silver lining that since they can call C code, parts of a program can be sped up considerably by rewriting them in a systems language.
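As a rough illustration of the counting described above, here is a minimal Python sketch. It assumes CPython, which reclaims an object as soon as its last reference goes away; other Python implementations may reclaim it later. The `Buffer` class and the `demo` function are hypothetical names made up for this example.

```python
import weakref


class Buffer:
    """Hypothetical stand-in for any object that owns memory."""


def demo():
    events = []
    buf = Buffer()
    # Ask to be notified at the moment the object is reclaimed.
    weakref.finalize(buf, events.append, "reclaimed")

    alias = buf   # a second reference to the same object
    del buf       # one reference (alias) still remains
    still_alive = (events == [])

    del alias     # the last reference is gone
    reclaimed_now = (events == ["reclaimed"])
    return still_alive, reclaimed_now
```

Under the CPython assumption, both flags come back true: nothing happens while `alias` still points at the object, and the memory is released immediately when the last name is deleted, with no separate collection pass involved.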

This slowdown highlights that CPU management is also very important. How to make parallelism possible is usually the question, and different design decisions have different pros and cons. There are two primary constructs in parallel CPU management. The first is actual OS-level separation of execution paths via threads, which the operating system schedules onto the CPU. The second is cooperative scheduling, where different logical execution paths share a CPU and yield it to other paths when it is convenient.
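The two constructs can be put side by side in a short Python sketch using only the standard library. The function names `run_cooperative` and `run_threaded` and the worker names "a" and "b" are made up for illustration; this is one way to show the difference, not the only one.

```python
import asyncio
import threading


def run_cooperative():
    """Two logical paths share one thread and yield to each other."""
    log = []

    async def worker(name):
        for i in range(3):
            log.append(f"{name}{i}")
            await asyncio.sleep(0)  # explicitly hand the CPU to the other path

    async def main():
        await asyncio.gather(worker("a"), worker("b"))

    asyncio.run(main())
    return log


def run_threaded():
    """Two OS-level threads; the operating system decides who runs when."""
    log = []
    lock = threading.Lock()

    def worker(name):
        for i in range(3):
            with lock:  # protect the shared list from concurrent access
                log.append(f"{name}{i}")

    threads = [threading.Thread(target=worker, args=(n,)) for n in ("a", "b")]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log
```

In the cooperative version the interleaving is decided by the code itself, at each `await`. In the threaded version the order of entries can differ from run to run, which is why the shared list is guarded by a lock.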

Most languages allow the programmer to choose between cooperative scheduling and thread-based scheduling. There are notable exceptions. Golang erases the distinction between cooperative execution and threads, letting the compiler and runtime choose which one is most appropriate for the code written. JavaScript explicitly runs on an event loop, where cooperative scheduling is embraced. However, JavaScript code is single-threaded; apart from spinning up separate workers, it cannot employ OS-level threads. This makes JavaScript really fast for small workloads (millions of tiny JSON blobs, for example), but it doesn't have great performance on large batch workloads due to its inability to grab the CPU and hold onto it.

Decision Point: Type System

Whether or not a language is explicitly typed determines how easily and quickly a program can be written and expanded in that language.

There is a tension between changing a program and adding to it. Small changes can introduce large effects, so programmers do them carefully and with forethought. By contrast, there is no profit from having no code, so programmers want real speed when starting out.

Type systems hinder the programmer from making changes. Often, design patterns and techniques have to be developed solely for the purpose of working around the type system in order to make changes. However, they also protect the programmer from making incorrect changes. Thus, they help a lot when the program is large and becoming larger, but hinder initial development. Therefore, languages with a type system scale better.

Most code bases that are worked on by more than a "pizza-sized" team, or that have more than 100k lines of code, are written in typed languages. They are often old, both because they take a long time to write and because they last longer.

Popular typed languages include Java, Golang, C, and Rust.

Popular untyped languages (more precisely, dynamically typed ones) include Python, Ruby, and JavaScript.

Some languages try to bridge this gap with gradual typing, such as Python's type hints checked by mypy. This brings the benefits of a type system to languages which chose to go without one, but it also introduces the drawbacks of a type system. As a result, gradual typing has had mixed success in deployment.
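To make the trade-off concrete, here is a small Python sketch with type annotations. The function `total_price` and its parameters are hypothetical. The point is that Python itself does not enforce the annotations at run time, while a static checker such as mypy is expected to reject the second call before the program ever runs.

```python
def total_price(quantity: int, unit_price: float) -> float:
    """Return the price of `quantity` items at `unit_price` each."""
    return quantity * unit_price


# A correct call: an int times a float.
correct = total_price(3, 2.0)

# An incorrect call: "3" is a str, not an int. Without a checker this
# still runs, and str * int silently repeats the string.
wrong = total_price("3", 2)
```

Here `correct` is `6.0`, while `wrong` is the string `"33"`, a bug that only surfaces later, far from where it was introduced. This is the protection a type system buys, and the annotations are the price paid for it.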

The type system is one of the main reasons a language choice has such far-reaching effects on our projects.

Think Long and Hard

The hardest part about the above considerations is that it is difficult to get away from them while staying in the same language. It's not something you can choose now and then fix later (without splitting off functionality into other codebases). They affect the life of the codebase.

How to deal with this reality is an open question in our field. Making the right choice about which language to use is important. Think long and hard.