Saturday, March 27, 2021

Exploring a large pile of someone else's code

I am occasionally approached by students who are currently learning to program and are sometimes taken aback by assignments that look like "here is a mostly working app - go fix this and that to make it actually work". What they are looking for is some simple and reliable way to approach that huge pile of code that someone else has written and make sense of it, so that eventually it is possible to address the problems they are asked to fix.

Funnily enough, this is not a student-only problem - any working professional faces it whenever they join a project that has already been running for some time without them. So, for the sake of everyone in that position, here are several things that I like to bear in mind whenever I run into a new code base.

1. Entry points

Literally any program has entry points - a limited number of code sections which connect the application to the external world. A CLI utility will normally have just one - something akin to a main procedure that gets executed whenever a user runs the utility. A web backend application will have a number of HTTP endpoints/routes which get fired whenever a corresponding request arrives (plus, again, the main function that performs the setup). A frontend application will expose a number of screens/pages available under particular routes. For example, in a React application with a router these would be some larger components mounted at particular routes. Or, in the same React application, you can treat the root component (which in many cases bears the proud and meaningless name App) as another such entry point. All of these are points in the code which get invoked by some external action and start the journey of data and control flow through the code.

For any kind of program you will be able to locate a number of such "entrances", and these serve as a good place to start exploring the codebase for a couple of reasons. First, they are usually few in number, which helps you concentrate. Second, and more important, when you start there you have the benefit of understanding what happens on one side of the entry point - there is either a user or another system that performs some actions against the program being explored and thus 'calls' these entry points. This gives your exploration a better sense of direction - from an entry point you can only go deeper. Plus, it allows you to experiment - you can be that external thing: run the program a couple of times or throw a couple of requests at it and see how it behaves.

2. Database / persistent storage

Many applications (here I can't use that nice term "all") utilize some kind of storage - maybe they save some stuff to files or manipulate records in a database. In either case, the storage can aid your exploration. Here you get a chance to understand what kind of data the program manipulates and how it chooses to represent and store it. You may also experiment: throw a request at a web app and see how it affects the database.

The goal here is to grasp the program's data model and see how it is utilized - that illuminates a lot about the program itself. You would want to see which classes represent which kinds of data, which functions create or read which records, and who calls them. Manipulating data is what most of the code is about, so understanding what data is at hand inevitably helps you understand the code.
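The "throw a request and see what changes" experiment can be sketched in a few lines. The example below assumes the application stores its data in SQLite, which may well not be the case for your project; the helper names row_counts and changed_tables are hypothetical. The idea is simply to snapshot the row counts of all tables, perform one action against the program, snapshot again and compare.

```python
import sqlite3


def row_counts(conn: sqlite3.Connection) -> dict[str, int]:
    """Map every user-defined table to its current number of rows."""
    tables = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' AND name NOT LIKE 'sqlite_%'"
        )
    ]
    # Table names come from the schema itself, so quoting them is enough here.
    return {
        table: conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0]
        for table in tables
    }


def changed_tables(before: dict[str, int], after: dict[str, int]) -> dict[str, int]:
    """Return the row-count difference for every table whose count changed."""
    return {
        table: after[table] - before.get(table, 0)
        for table in after
        if after[table] != before.get(table, 0)
    }
```

A typical session would be: call row_counts, send one request to the app (or click one button), call row_counts again and pass both snapshots to changed_tables. Row counts only reveal inserts and deletes, so updates to existing records need a closer look at the tables involved.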

3. Find yourself a simple challenge

When it comes to action, find yourself a very simple task that actually involves changing some code, and do it. Maybe the original set of problems that you came here to solve includes something simple. If it doesn't, just invent one. When you're dealing with a frontend app, such a simple task can be tweaking the order of controls or changing the colors of some UI elements. If you work with an HTTP backend, try adding an optional filter to a GET request handler, or preventing updates to records based on some criterion that you control.

The common thread here is to keep this first task trivial, without even aiming to do something meaningful. That's a good idea because at this stage you're still focused on getting a feel for the code, learning to navigate it and validating that your changes actually have some effect. Once you're done with that you'll have solid ground under your feet: the code will feel more familiar, you'll develop some understanding of what's available to you, and you'll feel more confident about solving more complicated issues.
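To show just how small such a first task can be, here is a sketch of the "optional filter on a GET handler" idea, stripped of any web framework. Everything in it (the TASKS list, the list_tasks function, the status parameter) is hypothetical; in a real project the handler would already exist and you would only add the parameter and the filtering line.

```python
from typing import Optional

# Stand-in for whatever the real handler reads from storage (hypothetical data).
TASKS = [
    {"id": 1, "title": "Write report", "status": "open"},
    {"id": 2, "title": "Fix login bug", "status": "done"},
    {"id": 3, "title": "Update docs", "status": "open"},
]


def list_tasks(status: Optional[str] = None) -> list[dict]:
    """Body of a GET /tasks handler: all tasks, or only those with the given status."""
    if status is None:
        # No filter supplied: behave exactly as the handler did before the change.
        return list(TASKS)
    return [task for task in TASKS if task["status"] == status]
```

The change is deliberately boring, but making it forces you to find the handler, learn how query parameters reach it, and confirm that your edit shows up in the response.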

4. Breadth, then depth

Here is another thing that I keep returning to again and again. When facing something large, you'd normally want to study what it is at a higher level first, without going into details. If we speak of the entry points, identify all of them before trying to make sense of the implementation details of any particular one. When you're exploring the data model, list all the tables / collections and how they are related to each other before studying in depth what each of them stores and how.

This is useful because a broad understanding of what the whole thing deals with will give you better ideas about the implementation details when you dive into them - having some context is always valuable. On top of that, many bits of code will work with several higher level concepts at once, so it's useful to have a rough understanding of what these are in the first place. If you get an overview of the landscape first, exploring the details will feel more like connecting the dots than wandering in the dark - don't get yourself lost too quickly.
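For the data model, a breadth-first overview can be as simple as listing every table together with the tables it points to. The sketch below again assumes SQLite and relies on its `PRAGMA foreign_key_list` statement; the function name schema_overview is made up for this example. Other databases expose the same information through their own catalog views.

```python
import sqlite3


def schema_overview(conn: sqlite3.Connection) -> dict[str, list[str]]:
    """Map each user-defined table to the sorted list of tables it references."""
    tables = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
        )
    ]
    overview = {}
    for table in tables:
        # Each row is (id, seq, table, from, to, on_update, on_delete, match);
        # index 2 is the name of the referenced table.
        foreign_keys = conn.execute(f'PRAGMA foreign_key_list("{table}")').fetchall()
        overview[table] = sorted({fk[2] for fk in foreign_keys})
    return overview
```

The result is a tiny map of the landscape: which tables exist and how they hang together. Only declared foreign keys show up, so relations that the application maintains purely in code will be missing from it.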

5. Use tests

If you're lucky enough, the project that you're coming to will have some tests, which will serve as a superweapon for studying it. Tests encode the developers' assumptions about the code, so just reading them carefully may shed light on a lot of things. Moreover, you can change them and see what happens. And if that's not enough, you can actually play with the code itself, trying to break specific things - tests will show you whether your assumptions about what breaks what are correct.

If there are no tests your position is weaker, but there is still a way to go - try writing a couple of unit tests for some bits of logic on your own. It turns out that testing someone else's code is the second best way to understand it, so don't ignore that. Just don't take it too far - your goal here is not to ensure 100% test coverage, but rather to make a couple of assumptions about the code that you're studying and prove them right (or wrong).
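Here is roughly what "making a couple of assumptions and checking them" can look like with Python's built-in unittest module. The apply_discount function is a made-up stand-in for a piece of inherited logic; in practice you would import the real function instead of defining it. Each test states one guess about the behaviour, and a failing test is just as informative as a passing one.

```python
import unittest


def apply_discount(price: float, customer_type: str) -> float:
    """Stand-in for inherited code whose behaviour we want to pin down (hypothetical)."""
    if customer_type == "vip":
        return round(price * 0.8, 2)
    if price > 100:
        return round(price * 0.95, 2)
    return price


class ApplyDiscountAssumptions(unittest.TestCase):
    """Each test records one assumption made while reading the code."""

    def test_vip_gets_twenty_percent_off(self):
        self.assertEqual(apply_discount(50, "vip"), 40.0)

    def test_regular_small_order_is_unchanged(self):
        self.assertEqual(apply_discount(50, "regular"), 50)

    def test_regular_large_order_gets_five_percent_off(self):
        self.assertEqual(apply_discount(200, "regular"), 190.0)
```

Saved in a file, these can be run with `python -m unittest`. Naming the tests after the assumption rather than the function keeps them readable as notes about what you have learned.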

6. Refactor

I said testing is the second best way to make sense of unfamiliar bits of code. The best one is refactoring. In practical terms that means taking the part of the code that you're interested in - a function or a class - and trying to rework it in any way that you find plausible, with the main goal of making every step of it clearer to you. Don't get me wrong here - I am far from claiming that it is easy to make code better when you barely understand what's going on in it. The point is to try to rewrite something, understand it better as a result, and then just discard all of your changes. If you already have tests to support you, the effect will be tremendous - essentially you will reimplement the sections of code which you intended to understand, and it is quite hard (although possible) to do that without developing a degree of understanding on the way. (If you don't have tests, you know what to do first.)
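A throwaway refactoring of this kind might look like the sketch below. Both functions are invented for the example: summarize_original plays the role of the code you found, and summarize_refactored is the rewrite made only to understand it. Comparing the two on the same inputs is the safety net that tells you whether your reading of the original was right.

```python
from collections import defaultdict


def summarize_original(orders):
    """The code as found: terse names and nested conditions (hypothetical)."""
    r = {}
    for o in orders:
        if o["status"] != "cancelled":
            if o["customer"] in r:
                r[o["customer"]] = r[o["customer"]] + o["total"]
            else:
                r[o["customer"]] = o["total"]
    return r


def summarize_refactored(orders):
    """Exploratory rewrite: total of non-cancelled orders per customer."""
    totals = defaultdict(int)
    for order in orders:
        if order["status"] == "cancelled":
            continue  # cancelled orders do not count towards the total
        totals[order["customer"]] += order["total"]
    return dict(totals)
```

Once the two versions agree on a handful of representative inputs, you have learned what the function does, and the rewrite can be discarded as described above.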

7. Write it down

The task of fixing a couple of bugs or implementing a feature in an unfamiliar repository is akin to entering a labyrinth in the hope of finding a couple of gems in there, when you know what the gems should look like but have zero idea about the place itself. If the maze is fancy enough, finding all the gems will involve a lot of going back and forth and getting lost sometimes. To make it easier, draw yourself a map while you're traveling. Write down what you're after, record your assumptions, chart the course, pin down whichever discoveries you make on your way, and note what doesn't work and why.

This may sound like a lot of extra work and a major distraction, but it is not as time consuming as it sounds, while the benefits are substantial. One reason to do it is that it helps you get back on track when you feel lost after trying several things that don't work. Another is that taking time to make notes will help you avoid the rush of just trying to hack a couple of things together. When you're familiar with the codebase, doing things quickly can be fine - your deep understanding helps you quickly make the right decisions about what to do where. However, while you're still getting comfortable with the program, a more thoughtful, slow but steady approach will be more apt and will also equip you with deeper knowledge for whatever comes next.

8. Be brave and humble

Above all, don't be afraid to start, and start small. Don't take on a big challenge all at once - break it down into smaller, easier to approach problems and handle them one by one. As long as you are taking steps, you will eventually arrive somewhere. And if you take small and carefully planned steps, you will get there sooner and will see that it's the right place.