Python Intro Day 2

17 Sep 2021

A blog post for Washington State University’s Python Working Group Blog

This post is the second in a two-part series of introductory Python talks for the start of the semester. The majority of this talk is based on Software Carpentry's Python Novice Inflammation lesson.

—

This week Sarah Murphy (sarah.y.murphy@wsu.edu) gave the second in a two-part series of introductory talks. A recording of this talk can be found here. Instructions on how to download the required packages and the data used in this lesson can be found here.

Introduction to Jupyter Notebooks

Jupyter Notebook and Jupyter Lab are two Python coding interfaces. They both take the "notebook" approach, meaning they can combine markdown, or text, with code. They have similar interfaces and can be used in the same way, so I will be referring to the two collectively as Jupyter for the rest of this talk. These come as part of the Anaconda set of tools, meaning if you have Anaconda downloaded, you also have Jupyter.

When you open Jupyter Lab or Notebook from your Anaconda Navigator, a file browser interface will open within a web browser. Your default web browser only serves as a way to display Jupyter. You do not need an internet connection to use it, and nothing is uploaded. To create a new notebook, navigate to where you would like it to reside in your file system, then click "New" or "+". This will bring up a menu asking what type of file you would like to create. Select "Python 3" under the "Notebook" heading. Your new notebook is a .ipynb file, or an IPython Notebook file.

When you open your notebook, you will see an empty space to type. This is referred to as a cell. Cells can be either code cells or markdown cells. Code cells are for just that, code. Markdown cells allow you to write text with the ability to format it using markdown. If you don't know markdown, that's okay, you don't need to. You can switch between the two types of cells using the dropdown menu at the top of the notebook.

In our first code cell, we can code in Python just as we did with Spyder on day 1.

a = 15
print(a) # Prints the value stored in a (15)

To run your code cell, or to format text in markdown cells, you can either hit the "run" button at the top of the notebook or hold shift and press enter. Either option will run your current cell, display any output below the cell, and create a new one.

Revisiting Types

During last week's talk, there were some questions about types, so this week we took some time to dig into this a bit more.

a = 50
type(a) # Shows that a is an integer

When we check the type of a variable that contains a number without a decimal, it returns 'int', indicating that it is an integer. If we want to convert this integer to a floating point number or a string, we can do that with the following commands. Note that each command returns a new value and leaves a itself unchanged unless you assign the result back to a.

float(a) # Gives a as a floating point number (50.0)
str(a) # Gives a as a string ('50')

It is important to be aware of types when doing math. A number stored as a string cannot be used in calculations.

a = 50 # Defining a as an integer
b = 27 # Defining b as an integer
b = str(b) # Converting b to a string

a + b # This will give you an error because the variable b is a string
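
One way around this error is to convert the string back to a number before doing the math. The following is a minimal sketch that reuses a and b from above; the names total and joined are only for illustration.

```python
a = 50          # An integer
b = str(27)     # The string '27'

# Convert b back to an integer before adding
total = a + int(b)
print(total)    # 77

# Adding two strings joins them instead of doing arithmetic
joined = str(a) + b
print(joined)   # 5027
```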

Introduction to Loops

First, let's define a list of odd numbers:

odds = [1, 3, 5]
print(odds)

We can print out the entire list at once by printing it out as we have above. If we wanted to only select one of those values, we can indicate the location of the value using square brackets. Note that Python starts counting at zero, not one, so the first place in the list is considered index number zero in Python.

print(odds[0]) # This will give you 1
print(odds[1]) # This will give you 3
print(odds[2]) # This will give you 5

If we want to print all the values out one by one, we could do it as we have above. But what if we change the variable 'odds'? What would happen if we tried to print the three values but our list is only two values long?

odds = [1, 3]

print(odds[0]) # This will still give you 1
print(odds[1]) # This will still give you 3
print(odds[2]) # This will now give you an error

This means that if we make any changes to the length of the original list, we will either get an error or we won't see all of the values in the list. To avoid this, we can utilize loops!

A loop can be defined using the following formula:

for variable in collection:

Here, collection is your list and variable is a placeholder name of your choosing. It doesn't matter what you pick for variable, as long as it isn't the name of another variable within your code. After typing this line, which must end with a colon, hit return and Jupyter will automatically indent your next line. Everything within the loop must share this indentation so that Python understands it should be grouped together. To end the loop, just continue to write your code without the indentation.

odds = [1, 3, 5]

for num in odds:
    print(num) # This will print each value on its own line

In our case, the collection is 'odds', the list we defined above, and 'num' is the placeholder variable. The loop sets 'num' equal to the first value in the 'odds' list, executes the indented code (in this case print(num)), then returns to the top of the loop, sets 'num' to the next value in 'odds', and executes the indented code again. This repeats until the list runs out of values. All variables defined outside the loop can be used within it, and all variables defined within the loop can be used outside of it.
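
The point about variables inside and outside the loop can be checked directly. Below is a short sketch; the names prefix and message are illustrative and not part of the lesson.

```python
odds = [1, 3, 5]
prefix = 'Odd number:'  # Defined outside the loop

for num in odds:
    message = prefix + ' ' + str(num)  # Defined inside the loop, uses prefix
    print(message)

# After the loop ends, num and message still hold their last values
print(num)      # 5
print(message)  # Odd number: 5
```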

We can do math within a loop, too:

names = ['Python', 'R', 'MATLAB'] # Create a list of strings
length = 0 # Create a variable for us to do math with, we will start our addition at zero

# Below is the loop:
for value in names: # Set "value" equal to each string in "names"
    print('The current name is:', value) # Print what the variable "value" currently is set to
    length = length + 1 # Add 1 to the length
    print('The new length is', length) # Print the current length

The above code should give you the following output:

The current name is: Python
The new length is 1
The current name is: R
The new length is 2
The current name is: MATLAB
The new length is 3

Each time the loop ran the line 'length = length + 1', length increased by one, showing that we can build upon a variable within a loop.
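
Since the loop adds one for every item in the list, the final value of length should match what Python's built-in len function reports. A quick check, with the print statements left out for brevity:

```python
names = ['Python', 'R', 'MATLAB']
length = 0

for value in names:
    length = length + 1  # Add 1 for each item in the list

print(length)      # 3
print(len(names))  # 3, the built-in function gives the same answer
```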

Plotting Refresher

Before we can start importing data and plotting it, we need to import the libraries numpy and pyplot from matplotlib by using the following import commands. Note that we are setting both equal to an alias (shortened nicknames to use in your code, in this case np and plt, respectively).

import numpy as np
import matplotlib.pyplot as plt

We imported numpy so we can import the data. We will use the 'np.loadtxt' command with two arguments, 'fname' and 'delimiter'. The argument 'fname' is the path to your data and 'delimiter' refers to the character being used to separate values within your csv or text file. In this case, our delimiter is a comma.

data = np.loadtxt(fname='inflammation-01.csv', delimiter = ',')
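
If you don't have the data file handy, you can try np.loadtxt on a small stand-in. The sketch below assumes a made-up two-row table held in memory with io.StringIO, which np.loadtxt reads like a file. The values and the names csv_text and sample are hypothetical.

```python
import io
import numpy as np

# Hypothetical stand-in for a small CSV file: 2 rows, 3 comma-separated columns
csv_text = io.StringIO("0,1,2\n3,4,5\n")

sample = np.loadtxt(fname=csv_text, delimiter=',')
print(sample.shape)  # (2, 3): two rows, three columns
print(sample.dtype)  # float64: loadtxt reads numbers as floats by default
```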

Now, we can plot the mean of the data just as we did last week.

plt.figure(figsize = (5, 5)) # Create one figure with the size 5 x 5
plt.plot(np.mean(data, axis = 0)) # Plot the mean of data, averaged over axis 0 (down the rows), giving one value per column
plt.title('Average Data') # Give the plot a title
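
To see what the axis argument does, it can help to try np.mean on a tiny array where the answer is easy to verify by hand. The array below is made up for illustration; col_means and row_means are hypothetical names.

```python
import numpy as np

# A small made-up array: 2 rows, 3 columns
sample = np.array([[1.0, 2.0, 3.0],
                   [3.0, 4.0, 5.0]])

col_means = np.mean(sample, axis=0)  # Average down the rows: one value per column
row_means = np.mean(sample, axis=1)  # Average across the columns: one value per row

print(col_means)  # [2. 3. 4.]
print(row_means)  # [2. 4.]
```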

Using Glob

We can import the library 'glob' to read in multiple files at once. The glob.glob command searches for all files matching the pattern specified in parentheses. For example, below we are printing all files that have the extension .csv in the directory the notebook is in.

import glob
print(glob.glob('*.csv'))

Glob doesn't guarantee any specific order, so you can sort the list using the sorted command.

filenames = sorted(glob.glob('*.csv')) # Save a sorted list of all the .csv files in this directory to the variable 'filenames'
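
To try this without touching your own files, the sketch below creates a few empty files in a temporary folder and then searches it. The file names are made up for illustration, and the folder is deleted automatically at the end.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    # Create three empty, hypothetical files: two .csv and one .txt
    for name in ['data-02.csv', 'data-01.csv', 'notes.txt']:
        open(os.path.join(folder, name), 'w').close()

    # Match only the .csv files in that folder, then sort the results
    matches = sorted(glob.glob(os.path.join(folder, '*.csv')))
    found = [os.path.basename(path) for path in matches]

print(found)  # ['data-01.csv', 'data-02.csv']
```

The .txt file is skipped because it does not match the pattern, and sorted puts the remaining names in alphabetical order.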

Creating images from multiple files with loops

We can combine everything we've learned today by looping through a list of files and creating a figure of each.

filenames = sorted(glob.glob('*.csv'))

for filename in filenames: # Set 'filename' to each string in 'filenames', one by one
    print(filename) # Print the current filename
    data = np.loadtxt(fname = filename, delimiter = ',') # Load the current file
    plt.figure(figsize = (5, 5))
    plt.plot(np.mean(data, axis = 0))
    plt.title('Average Data')
    plt.show() # Show the plot

References

Project Jupyter
Anaconda
Programming with Python - Storing Multiple Values in Lists Software Carpentry lesson
Programming with Python - Repeating Actions with Loops Software Carpentry lesson
Programming with Python - Analyzing Data from Multiple Files Software Carpentry lesson