Part 3: Functions, loops and conditionals

Functions

Python functions allow us to wrap commonly used pieces of code into reusable methods.

We have already used several built-in functions, including print() and len().

Defining a function

Python functions definitions are indicated by the def keyword. Past the initial def line, all function code is indented by 4 spaces (some editors will convert a tab keystroke to 4 spaces for you). This indentation indicates that this code all belongs to the function.

def get_sum_of_protein_lengths(protein_lengths):
    '''Takes a dictionary of protein lengths and returns the total combined length.
    
    Args:
        protein_lengths (dict): A dictionary mapping gene names to protein lengths.
    Returns:
        int: Combined length of all proteins.
    '''
    lengths = protein_lengths.values()
    summed_lengths = sum(lengths)
    return summed_lengths

Note the extensive description in the above example (surrounded by triple-quotes)!

This is known as a doc-string (documentation string), and is an excellent way to document your code. Any function without a docstring is effectively useless!

Calling (or using) a function

Functions are called using ( ) brackets, with arguments provided within the brackets as required.

protein_length_dict = {'PF3D7_0731500': 1504, 
                       'PF3D7_1133400': 2056,
                       'PF3D7_0707300': 746}

get_sum_of_protein_lengths(protein_length_dict)

4306

Returning a value from a function

Functions can return values (which are usually assigned to a variable), but sometimes they don’t return anything at all!

The returned value is indicated by the return statement within the function definition — the value to be returned is written directly after the return statement.

Even if your function is not returning anything, it is good practice to conclude your function with a return statement. If you are not returning a value, you can just write return or return None.

Note that once a function reaches the return statement, it does not execute any more code — return essentially tells the function that is has finished running, and needs to return a value!

def print_gene_summary(gene_list):
    '''Print a formatted summary of all genes.
    
    Prints number of genes in list and number of unique genes.
    
    Args:
        gene_list (list): A list of genes.
    Returns:
        None
    '''
    print(f"There are {len(gene_list)} genes, with {len(set(gene_list))} unique genes.")
    return 


gene_list = ['PF3D7_0731500', 'PF3D7_1133400', 'PF3D7_0707300', 'PF3D7_0712300', 'PF3D7_0712300']
print_gene_summary(gene_list)

There are a total of 5 genes, with 4 unique genes.

Note that while the above function does something, it doesn’t return anything we can save into a variable. If we try and save the output into a variable:

function_output = print_gene_summary(gene_list)

There are a total of 5 genes, with 4 unique genes.

print(function_output)

None

Importing functions from other packages

It is also possible to import functions from other files or packages. This means we don’t have to define (def) the function within our code, we can just import it and use it. We will discuss imports in more detail later…

Loops and iterators

When doing programming, we often need to perform the same action on a whole set of objects/data/values.

Programming loops enable us to do this efficiently (without writing out almost identical blocks of code many times).

The main loop used in Python is the for loop. (Note that this is a little different to for loops in some other languages)

The for loop iterates through an iteratable object (i.e. a list, tuple, set etc), letting you run code on each item in the iteratable.

For example, let’s iterate over a list of numbers, printing each number in the list:

for index in [1, 2, 3, 4, 5]:
    print(index)

1
2
3
4
5

We could also iterate over the list of gene names we created earlier:

for gene in gene_list:
    print(gene)

PF3D7_0731500
PF3D7_1133400
PF3D7_0707300
PF3D7_0712300
PF3D7_0712300

Note that the loop variable can be named whatever you like, but it is best to choose a descriptive name.

Don’t do this:

for banana in gene_list:
    print(banana)

PF3D7_0731500
PF3D7_1133400
PF3D7_0707300
PF3D7_0712300
PF3D7_0712300

A commonly used pattern is to loop a certain number of times using the range function:

The range(n) function returns a sequence of numbers, starting at 0 by default, and returning a total of n values.

for i in range(10):
    print(f"This is iteration #{i}")

This is iteration #0
This is iteration #1
This is iteration #2
This is iteration #3
This is iteration #4
This is iteration #5
This is iteration #6
This is iteration #7
This is iteration #8
This is iteration #9

Another type of loop is the while loop. This allows us to loop until a condition is met:

i = 10

while i > 1:
    i = i - 1
    print(i)

9
8
7
6
5
4
3
2
1

One danger of using a while loop is the potential to loop forever!

Although while loops can be useful in some circumstances, there are often better options. If you think you need to use a while loop, consider if there is another (better) way of looping. Can you use a for loop instead?

A note about indentation

You may note the particular indentation being used in the above examples. In Python, indentation is used to define a block of code. In the above examples, all indented code after the for statement is part of the for loop. Code that is no longer indented indicates that we have reached the end of the for loop. Note that we can also have multiple levels of indentation. What do you think the following code will do?

for i in [1, 2, 3]:
    for j in [6, 7, 8]:
        print(i * j)

Conditionals

We often want to perform a task only if a given condition is met.

In Python, we do this with an if block (with an optional elif and else blocks):

read_depth = 561

if read_depth < 10:
    print("We have low read coverage")
elif read_depth < 100:
    print("We have acceptable read coverage")
else:
    print("We have high read coverage")

We have high read coverage

read_depth = 49

if read_depth < 10:
    print("We have low read coverage")
elif read_depth < 100:
    print("We have acceptable read coverage")
else:
    print("We have high read coverage")

We have acceptable read coverage

Combined with for loops, if blocks allow us to create complex rules for processing data.

However, there are often better ways of filtering and processing data using dedicated packages such as numpy or pandas — more on this later!

Next Lesson: Imports and Packages →

← Previous Lesson: Variables and data types