How To Write Cleaner and Maintainable Code


From Simple To Solid

There are many different things that separate great code from a code that “does the job”, from the style guidelines to the maintainability and scalability of the code.

In this post, we will work through a task and iteratively improve the code while asking ourselves some key questions that everyone should ask themselves while writing code.


Task

Write a program that recursively scans a directory.
For every .txt file, open it and check if the word “Monkey” appears in it.
If it does, print the file name and line number, otherwise do nothing.

Simple Solution

The task is pretty straightforward and simple, scanning files and applying a rule to them shouldn’t take too many lines of code.
Here’s the most simple solution I could come up with

import os

def directory_scanner(root_dir, word='Monkey'):
    for root, dirs, files in os.walk(root_dir):
        for file in files:
            if file.endswith('.txt'):
                with open(os.path.join(root, file), 'r') as f:
                    for idx, line in enumerate(f):
                        if word in line:
                            print(f'File: {file}, Line: {idx}')

The above solution is the kind of solution that only “does the job”, it is custom-made to the specific task, and upon changing or adding new requirements, we might have to rewrite it.

Here are a few questions that will guide us to improve this code

  • What if we would want to scan other files than the .txt ones?
  • What if we would want to search for multiple words?
  • What if we would want to add another rule? for example, printing the file name and line index if we found a line that is over 50 words.
  • What if we would want to change the action (printing of file name and line index) if a rule passes? and that every rule will have its own action?

There are probably more questions you can ask yourself, but I think we have enough on our plate at the moment.

All these questions should lead us to write a better code than the one we did, so let’s iterate on it one more time and see how to improve it.

If you are in the mood I encourage you to try and improve the code by yourself and compare it with what I am about to show — you might even do it better.

Not Too Simple, Not That Solid

In this section, we are going to improve our code based on the question asked in the previous section.

Another good point for improvement that we didn’t cover, is that everything is done in a single function.

As a rule of thumb, we would want a function to do exactly one thing, and ours scanned the directory, read files, and applied a rule to a file which is just too many responsibilities.

In this section, instead of splitting the code into a few functions, I will split it into 2 classes, one which will define a rule and the action to make if it passes, and another class that given a list of rules, will scan the directory and apply them on every file.

class FindRule():
    def __init__(self, to_find: str) -> None:
        self.to_find = to_find

    def on_success_action(self, line_index: int, file: str) -> None:
        print(f"File: {file}, Line: {line_index}")

    def apply_rule(self, line_index: int, line: str, file: str) -> None:
        if self.find in line.split():
            self.on_success_action(line, file)

The above class encapsulates all the functionalities needed from a rule, which means that the directory scanner class will just have to accept such object and call the apply_rule function without having to know how the rule is applied and what to do in case it passes. 

The directory scanner class will get a root directory, a file extension filter, and a rule to apply. given this information, we will scan the root directory and apply the rule on every file that passes that file extension filter.

class DirectoryScanner:
    def __init__(self, root_dir: str, rules: List[FindRule] = None,
                 file_filter: str = None):
        self.root_dir = root_dir
        self.rules = rules
        self.file_filter = file_filter

    def scan(self):
        for root, dirs, files in os.walk(self.root_dir):
            for file in files:
                if self.file_filter:
                    if file.endswith(self.file_filter):
                        yield os.path.join(root, file)
                else:
                    yield os.path.join(root, file)

    def apply_rules(self):
        if not self.rules:
            return

        for file in self.scan():
            with open(file, 'r') as file_obj:
                for line_index, line in enumerate(file_obj):
                    for rule in self.rules:
                        rule.apply_rule(line_index, line, file)

In this implementation of the task, we definitely improved the first solution but it is still lacking.

Although we support multiple rules, we only support one type of rule, as mentioned in the previous section, if we would want a different rule to apply we would have to think of a better packing of rules.

Moreover, the DirectoryScanner class has more than a single responsibility, and it shouldn’t be coupled with the rules, as the directory scanning functionality could be useful for other tasks that don’t include the rules.

With the above in mind, I will improve the implementation in the next section, and as in the previous section if you’re in the mood I encourage you to try and improve it on your own.

Solid Solution

In this section, I am going to break the DirectoryScanner from the previous section into two more classes so that each will have a single responsibility.

In addition, I will modify the rule class so that we could potentially support multiple different types of rules.

class DirectoryScanner:
def __init__(self, root_dir: str, file_extensions: List[str] =
None):
self.root_dir = root_dir
self.file_extensions = file_extensions

def scan_directory(self):
for root, dirs, files in os.walk(self.root_dir):
for file in files:
if self.file_extensions:
if file.endswith(tuple(self.file_extensions)):
yield os.path.join(root, file)
else:
yield os.path.join(root, file)

The current implementation of DirectoryScanner is only responsible to scan the directory and yield a path of a file and is totally decoupled from the rules we might want to apply.

Bear in mind that this implementation of DirectoryScanner is also decoupled from the action we might want to perform on a file.
In our case, we want to run through its lines, but it might not always be the case.

In order to run through every file’s lines, I created a LineGenerator class that given a directory scanner object, it will use its scan functionality and yield a line.

class LineGenerator:
def __init__(self, dir_scanner: DirectoryScanner):
self.dir_scanner = dir_scanner

def __call__(self, *args, **kwargs):
for file in self.dir_scanner.scan_directory():
with open(file, 'r') as file_obj:
for line_index, line in enumerate(file_obj):
yield FileLine(line, line_index, file)

As you might have noticed, I am using a FileLine class that I created in order to encapsulate the relevant information I need while generating a line.

class FileLine:
def __init__(self, line: str, line_index: int, file: str):
self._line = line
self._line_index = line_index
self._file = file

@property
def line(self):
return self._line

@property
def line_index(self):
return self._line_index

@property
def file(self):
return self._file

Now, we will implement a class that, given a list of rules and a line generator, will have the responsibility to apply rules on a generated line.

class RulesChecker:
def __init__(self, rules: List[Rule], line_generator:
LineGenerator):
self.rules = rules
self.line_generator = line_generator

def __call__(self, *args, **kwargs):
for file_line in self.line_generator():
for rule in self.rules:
rule.apply_rule(file_line.line_index,
file_line.line, file_line.file)

After breaking down the DirectoryScanner class, let’s work towards accepting different types of rules.

As you might have noticed, in the RulesChecker class I used a class Rule (in bold) that I haven’t defined yet.

This class will be an abstract class, and every type of rule will implement its interface.

from abc import ABC, abstractmethod

class Rule(ABC)

    @abstractmethod
    def apply_rule(self, line_index: int, line: str, file:           str) -> None:
        pass

    @abstractmethod    
    def on_success_action(self, line_index: int, file: str) -> None:
        pass

By inheriting from the above class, a rule will have to implement both abstract methods.

Here are the minor changes we need to make to the FindRule class we defined earlier

class FindRule(Rule):
    def __init__(self, find: str) -> None:
        self.find = find

    def on_success_action(self, line_index: int, file: str) -> None:
        print(f"File: {file}, Line: {line_index}")

    def apply_rule(self, line_index: int, line: str, file:str) -> 
                                                               None:
        if self.find in line.split():
            on_success_action(line_index,line, file)

Now, when needed to add a new type of a rule, we basically create a class that inherits from Rule and implement both abstract methods.

By creating the abstract class Rule we were able to package all types of rules under the same class that RulesChecker will be able to use.


In this article, we started from the most straightforward solution and improved our code iteratively while asking ourselves some important questions.

It might not be the perfect solution to the given task, and you might think of different approaches or things that I did wrong — in that case, please don’t keep it to yourself, I encourage you to put on your code reviewer hat and share your thoughts.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

%d bloggers like this: