Installation

  1. Prepare for the Unix/Bash Shell section by following these instructions: https://carpentries.github.io/workshop-template/#shell
    • Note that if you’re on a Mac, you don’t have to do anything, but if you’re on a Windows computer, you do need to download additional software.
    • If you watch the video, the presenter refers to a different website for the download link and the text editor link. This confused me at first, but everything she refers to can be found on the same page as the video (https://carpentries.github.io/workshop-template/#shell).
  2. Download Anaconda Navigator (for Python) using these instructions: https://carpentries.github.io/workshop-template/#python
    • A download is required for all operating systems for this step.
    • This includes everything you need to run Python on your computer.

Unix Shell

Useful Commands

Command                      | What it does
pwd                          | Print the current working directory
cd [path]                    | Change the current directory to the directory specified in [path]
ls [path]                    | List the contents of the directory specified in [path]. If no path is given, list the current directory’s contents.
cp [file to copy] [new name] | Copy a file from the name/location specified in [file to copy] to the name/location specified in [new name]
mkdir [name]                 | Create a new folder named [name]
mv [old] [new]               | Move the file specified in [old] to the new name or location specified in [new]. This is how you both move and rename files.
rm [file]                    | Remove the file specified in [file]
.                            | Current directory or folder
..                           | Directory or folder above the current one
man [command]                | Bring up the help page for the command specified in [command]
clear                        | Clear the screen
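
Since the rest of these notes use Python, here is a rough sketch of what the file commands above do, written with Python's standard library instead of the shell. It is only an illustration, not part of the workshop material: the file and folder names (notes.txt, copy.txt, data, renamed.txt) are made up, and everything happens inside a temporary scratch directory so no real files are touched.

```python
import os
import shutil
import tempfile
from pathlib import Path

# Work inside a throwaway directory so nothing on your computer is changed.
with tempfile.TemporaryDirectory() as scratch:
    start = Path.cwd()                            # pwd
    os.chdir(scratch)                             # cd [path]

    Path("data").mkdir()                          # mkdir data
    Path("notes.txt").write_text("hello\n")       # create a file to work with

    shutil.copy("notes.txt", "copy.txt")          # cp notes.txt copy.txt
    shutil.move("copy.txt", "data/renamed.txt")   # mv copy.txt data/renamed.txt

    top_level = sorted(p.name for p in Path(".").iterdir())    # ls
    in_data = sorted(p.name for p in Path("data").iterdir())   # ls data
    print(top_level)
    print(in_data)

    Path("notes.txt").unlink()                    # rm notes.txt
    after_rm = sorted(p.name for p in Path(".").iterdir())     # ls
    print(after_rm)

    os.chdir(start)                               # go back before the scratch directory is deleted
```

Note that a single mv both moved copy.txt into the data folder and renamed it, which matches the description of mv in the table.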

R vs Python

Key Differences

R | Python
Designed for CSV/text files, with recent packages being developed for other formats | Many formats can be used easily
Optimized for statistical analysis of large datasets and data exploration | Optimized for general use (machine learning, data analysis, and web applications)
Plotting has been strongly developed to communicate statistical analysis | Many plotting libraries with lots of flexibility
Uses RStudio (note that you must install R separately from RStudio) |

What to Use

  • Both are GREAT languages to learn
  • Use whatever the people around you are already using; it’ll be easier to learn and collaborate!
  • If you’re thinking you’ll do a lot of statistical analysis, R might be better.
  • If you’re going to be working with a lot of NetCDF files, Python might be better.

There are ways to use them together, and once you learn one, it’ll be significantly easier to learn the other, so don’t stress about this choice!


Introduction to Python

Included in your Anaconda download:

  1. Anaconda Navigator - Central hub for all things Python
  2. Jupyter (Notebook or Lab) - Combines Markdown and Code
  3. Spyder - Code IDE

Basic Rules of Python

  • Case sensitive
  • Indentation matters (the tabs or spaces at the start of a line)
  • Variables are assigned with =
    • Cannot start with a number
    • Can include letters, digits, and underscores
  • Some words are reserved by Python and cannot be used as variable names
    • and, if, else, break, import, and more
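
The rules above can be checked directly in Python. This is a small sketch rather than part of the original lesson; the variable names (weight, Weight, lighter) are made up for illustration. It uses the standard library's keyword module and the built-in string method isidentifier().

```python
import keyword

# Case sensitive: these are two different variables.
weight = 60
Weight = 75
print(weight, Weight)

# str.isidentifier() reports whether a string follows the naming rules.
print("weight_kg2".isidentifier())   # letters, digits, and underscores
print("2weight".isidentifier())      # starts with a digit

# keyword.iskeyword() reports whether a word is reserved by Python.
print(keyword.iskeyword("import"))
print(keyword.iskeyword("weight"))
print(len(keyword.kwlist), "reserved words in this version of Python")

# Indentation marks which lines belong to the if and else branches.
if weight < Weight:
    lighter = "weight"
else:
    lighter = "Weight"
print(lighter)
```

Printing keyword.kwlist shows the full list of reserved words for the Python version you have installed.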

Our First Python Code

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
Created on Fri Sep 10 11:58:43 2021

@author: smurphy
"""

weight_kg = 60

# Defining variables
## Includes letters, digits, and underscores
## Case sensitive
## 0weight and 1weight don't work; a name can't start with a number

# floating point number
floatingnumber = 60.2
floatingnumber_2 = 60.0

# integer
integernumber = 60

# strings
stringnumber = '20'

weight_lb = 2.2 * weight_kg

# print() already puts a space between the items it is given
print('The weight in kg is', weight_kg)

print(type(weight_kg))

print(type(stringnumber))
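
As a possible next step (a sketch, not part of the original script), the lines below print weight_lb, which the script above calculates but never displays, and show why the type of a variable matters. The name converted is made up for this example.

```python
weight_kg = 60
stringnumber = '20'

# weight_lb is calculated from weight_kg, then printed.
weight_lb = 2.2 * weight_kg
print('The weight in lb is', weight_lb)

# Multiplying an integer by a floating point number gives a floating point number.
print(type(weight_lb))

# A string has to be converted before it can be used as a number.
converted = int(stringnumber)
print(converted + 1)

# Multiplying a string repeats the text instead of doing arithmetic.
print(stringnumber * 2)
```

The last line prints 2020 rather than 40, because '20' is text, not a number.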

Vocabulary

Some important vocabulary. These definitions are taken directly from “Summary of Basic Commands – The Unix Shell” and “Glossary – Programming with Python”.

  • absolute path - A path that refers to a particular location in a file system. Absolute paths are usually written with respect to the file system’s root directory, and begin with either ‘/’ (on Unix) or ‘\’ (on Microsoft Windows).
  • argument - A value given to a function or program when it runs. The term is often used interchangeably (and inconsistently) with parameter.
  • assertion - An expression which is supposed to be true at a particular point in a program. Programmers typically put assertions in their code to check for errors; if the assertion fails (i.e., if the expression evaluates as false), the program halts and produces an error message.
  • assign - To give a value a name by associating a variable with it.
  • body (of a function) - The statements that are executed when a function runs.
  • case-insensitive - Treating text as if upper and lower case characters of the same letter were the same.
  • case-sensitive - Treating text as if upper and lower case characters of the same letter are different.
  • command-line interface - A user interface based on typing commands.
  • comment - A remark in a program that is intended to help human readers understand what is going on, but is ignored by the computer. Comments in Python, R, and the Unix shell start with a ‘#’ character and run to the end of the line; comments in SQL start with ‘--’, and other languages have other conventions.
  • conditional statement - A statement in a program that might or might not be executed depending on whether a test is true or false.
  • comma-separated values (CSV) - A common textual representation for tables in which the values in each row are separated by commas.
  • current working directory - The directory that relative paths are calculated from; equivalently, the place where files referenced by name only are searched for. Every process has a current working directory. The current working directory is usually referred to using the shorthand notation ‘.’ (pronounced “dot”).
  • delimiter - A character or characters used to separate individual values, such as the commas between columns in a CSV file.
  • file system - A set of files, directories, and I/O devices (such as keyboards and screens). A file system may be spread across many physical devices, or many file systems may be stored on a single physical device; the operating system manages access.
  • filename extension - The portion of a file’s name that comes after the final “.” character. By convention this identifies the file’s type.
  • filter - A program that transforms a stream of data. Many Unix command-line tools are written as filters: they read data from standard input, process it, and write the result to standard output.
  • floating-point number - A number containing a fractional part and an exponent.
  • for loop - A loop that is executed once for each value in some kind of set, list, or range.
  • function - A named group of instructions that is executed when the function’s name is used in the code.
  • graphical user interface - A user interface based on selecting items and actions from a graphical display, usually controlled by using a mouse.
  • home directory - The default directory associated with an account on a computer system. By convention, all of a user’s files are stored in or below her home directory.
  • import - To load a library into a program.
  • in-place operators - An operator such as ‘+=’ that provides a shorthand notation for the common case in which the variable being assigned to is also an operand on the right hand side of the assignment. For example, the statement ‘x += 3’ means the same thing as ‘x = x + 3’.
  • index - A subscript that specifies the location of a single value in a collection, such as a single pixel in an image.
  • inner loop - A loop that is inside another loop.
  • integer - A whole number, such as -12343.
  • library - A family of code units (functions, classes, variables) that implement a set of related tasks.
  • local variable - A variable defined inside of a function, that exists only in the scope of that function, meaning it cannot be accessed by code outside of the function.
  • loop - A set of instructions to be executed multiple times. Consists of a loop body and (usually) a condition for exiting the loop.
  • loop body - The set of statements or commands that are repeated inside a for loop or while loop.
  • loop variable - The variable that keeps track of the progress of the loop.
  • operating system - Software that manages interactions between users, hardware, and software processes. Common examples are Linux, macOS, and Windows.
  • option - A way to specify an argument or setting to a command-line program. By convention Unix applications use a dash followed by a single letter, such as ‘-v’, or two dashes followed by a word, such as ‘--verbose’, while DOS applications use a slash, such as ‘/V’. Depending on the application, an option may be followed by a single argument, as in ‘-o /tmp/output.txt’.
  • outer loop - A loop that contains another loop.
  • parameter - A variable named in a function’s declaration that is used to hold a value passed into the call.
  • parent directory - The directory that “contains” the one in question. Every directory in a file system except the root directory has a parent. A directory’s parent is usually referred to using the shorthand notation ‘..’ (pronounced “dot dot”).
  • path - A description that specifies the location of a file or directory within a file system.
  • pipe - A connection from the output of one program to the input of another. When two or more programs are connected in this way, they are called a “pipeline”.
  • regular expression - A pattern that specifies a set of character strings. REs are most often used to find sequences of characters in strings.
  • relative path - A path that specifies the location of a file or directory with respect to the current working directory. Any path that does not begin with a separator character (‘/’ or ‘\’) is a relative path.
  • root directory - The top-most directory in a file system. Its name is ‘/’ on Unix (including Linux and macOS) and ‘\’ on Microsoft Windows.
  • shell/command shell - A command-line interface such as Bash (the Bourne-Again Shell) or the Microsoft Windows DOS shell that allows a user to interact with the operating system.
  • shell script - A set of shell commands stored in a file for re-use. A shell script is a program executed by the shell; the name “script” is used for historical reasons.
  • string - Short for “character string”, a sequence of zero or more characters.
  • syntax - The rules that define how code must be written for a computer to understand.
  • syntax error - A programming error that occurs when statements are in an order or contain characters not expected by the programming language.
  • sub-directory - A directory contained within another directory.
  • tab completion - A feature provided by many interactive systems in which pressing the Tab key triggers automatic completion of the current word or command.
  • variable - A name in a program that is associated with a value or a collection of values.
  • while loop - A loop that keeps executing as long as some condition is true.
  • wildcard - A character used in pattern matching. In the Unix shell, the wildcard ‘*’ matches zero or more characters, so that ‘*.txt’ matches all files whose names end in ‘.txt’.
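
To see several of the Python terms above in one place, here is a short sketch. It is not taken from the lessons; the function name mean_of_positive and the sample numbers are made up for illustration. The comments point out where each vocabulary word appears.

```python
def mean_of_positive(values):          # 'values' is a parameter
    """Return the mean of the positive numbers in values."""
    # The indented lines below are the body of the function.
    total = 0                          # local variable: exists only inside the function
    count = 0
    for value in values:               # for loop; 'value' is the loop variable
        if value > 0:                  # conditional statement
            total += value             # in-place operator, same as total = total + value
            count += 1
    assert count > 0, "need at least one positive value"   # assertion
    return total / count


readings = [4, -1, 6, 2]               # assign a list to the variable 'readings'
print(readings[0])                     # index 0 refers to the first item
result = mean_of_positive(readings)    # 'readings' is the argument
print(result)

countdown = 3
while countdown > 0:                   # while loop: runs as long as the condition is true
    countdown -= 1
print(countdown)
```

Calling mean_of_positive([-1, -2]) makes the assertion fail, so the program halts with an error message, as described in the assertion entry.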