Installation
- Prepare for the Unix/Bash Shell section by following these
instructions: https://carpentries.github.io/workshop-template/#shell
- Note that if you’re on a Mac, you don’t have to do anything, but if
you’re on a Windows computer, you do need to download additional
software.
- If you watch the video, the presenter refers to a different website
(for the download link and the text editor link). This confused me at
first, but everything she refers to in the video can be found on
the same page as the video (https://carpentries.github.io/workshop-template/#shell).
- Download Anaconda Navigator (for Python) using these instructions:
https://carpentries.github.io/workshop-template/#python
- This download is required on all operating systems
- It includes everything you need to run Python on your
computer
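Once Anaconda is installed, one quick way to confirm that Python runs is a short script like the sketch below. It uses only the standard library, so it should work in Spyder, Jupyter, or a terminal. The variable name `version` is only illustrative.

```python
import platform
import sys

# Ask the running interpreter for its version number (a string such as "3.x.y").
version = platform.python_version()
print("Python version:", version)

# Show which Python program is actually being used.
# With Anaconda, this path usually points inside the Anaconda folder.
print("Interpreter path:", sys.executable)
```

If the version printed starts with 3, the install is ready for the rest of these notes.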
Unix Shell
Useful Commands
| Command | Description |
| --- | --- |
| pwd | Print current working directory |
| cd [path] | Change current directory to the directory specified in [path] |
| ls [path] | List all contents of the directory specified in [path]. If no path is specified, it lists the current directory’s contents. |
| cp [file to copy] [new name] | Copy a file from the name/location specified in [file to copy] to the name/location specified in [new name] |
| mkdir [name] | Create a new folder named [name] |
| mv [old] [new] | Move a file specified in [old] to a new name or location specified in [new]. This is how you both move and rename files. |
| rm [file] | Remove the file specified in [file] |
| . | Current directory or folder |
| .. | Directory or folder above the current one |
| man [command] | Bring up the help page for the command specified in [command] |
| clear | Clear screen |
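Since the rest of these notes use Python, here is a rough sketch of how some of the same file operations look from Python, using the standard `pathlib`, `shutil`, and `tempfile` modules. The folder and file names (`data`, `notes.txt`, and so on) are made up for illustration, and everything happens in a temporary folder that is deleted at the end.

```python
import shutil
import tempfile
from pathlib import Path

# pwd: print the current working directory
print(Path.cwd())

# Work inside a throwaway folder so nothing on your computer is touched.
with tempfile.TemporaryDirectory() as scratch:
    base = Path(scratch)

    # mkdir data
    data_dir = base / "data"
    data_dir.mkdir()

    # Create a small file to work with (hypothetical example file).
    notes = data_dir / "notes.txt"
    notes.write_text("hello\n")

    # cp data/notes.txt data/notes_copy.txt
    copied = data_dir / "notes_copy.txt"
    shutil.copy(notes, copied)

    # mv data/notes_copy.txt data/renamed.txt
    renamed = data_dir / "renamed.txt"
    copied.rename(renamed)

    # rm data/notes.txt
    notes.unlink()

    # ls data
    listing = sorted(p.name for p in data_dir.iterdir())
    print(listing)
```

After the copy, rename, and remove steps, only `renamed.txt` is left in the folder, which is what the final listing shows.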
R vs Python
Key Differences
| R | Python |
| --- | --- |
| Designed for CSV/text files, with recent packages being developed for other formats | Many formats can be used easily |
| Optimized for statistical analysis of large datasets and data exploration | Optimized for general use (machine learning, data analysis, and web applications) |
| Plotting has been strongly developed to communicate statistical analysis | Many plotting libraries with lots of flexibility |
| Uses RStudio (note that you must install R separately from RStudio) | |
What to Use
- Both are GREAT languages to learn
- Use whatever the people around you are already using; it’ll be
easier to learn and collaborate!
- If you’re thinking you’ll do a lot of statistical analysis, R might
be better.
- If you’re going to be working with a lot of NetCDF files, Python
might be better.
There are ways to use them together, and once you learn
one, it’ll be significantly easier to learn the other, so don’t stress
about this choice!
Introduction to Python
Included in your Anaconda download:
- Anaconda Navigator - Central hub for all things Python
- Jupyter (Notebook or Lab) - Combines Markdown and Code
- Spyder - Code IDE
Basic Rules of Python
- Case sensitive
- Indentation (tabs and spaces) matters
- Variables are assigned with =
- Variable names cannot start with a number
- Variable names can include letters, digits, and underscores
- Some words cannot be used as variable names (and, if, else, break,
import, and more)
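The rules above can be checked directly in Python. This is a small sketch using the built-in `str.isidentifier()` method and the standard `keyword` module; the example names (`weight`, `2weight`, and so on) are made up for illustration.

```python
import keyword

# Case sensitive: these are two different variables.
weight = 60
Weight = 75
print(weight, Weight)

# isidentifier() checks the naming rules:
# letters, digits, and underscores only, and no digit at the start.
candidates = ["weight_kg", "weight2", "2weight", "weight-kg"]
valid = {name: name.isidentifier() for name in candidates}
print(valid)

# iskeyword() reports reserved words that cannot be used as variable names.
words = ["and", "if", "else", "break", "import", "weight"]
reserved = [word for word in words if keyword.iskeyword(word)]
print(reserved)

# Indentation matters: the indented line belongs to the if statement.
if weight < Weight:
    print("this line only runs when the test is true")
```

The full list of reserved words is available as `keyword.kwlist`.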
Our First Python Code
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
Created on Fri Sep 10 11:58:43 2021
@author: smurphy
"""
weight_kg = 60
# Defining variables
## Includes letters, digits, and underscores
## Case sensitive
## 0weight, 1weight don't work: names can't start with a number
# floating point number
floatingnumber = 60.2
floatingnumber_2 = 60.0
# integer
integernumber = 60
# strings
stringnumber = '20'
weight_lb = 2.2 * weight_kg
print('The weight in kg is', weight_kg)
print(type(weight_kg))
print(type(stringnumber))
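As a follow-up sketch to the code above, this shows why the type matters: the string '20' behaves like text until it is converted with the built-in `int()` or `float()`. The names `joined`, `as_integer`, and `as_float` are only illustrative.

```python
weight_kg = 60
stringnumber = '20'

# A string of digits is text, not a number: "+" joins strings together.
joined = stringnumber + stringnumber
print(joined)                    # 2020

# int() and float() convert the text into numbers you can do arithmetic with.
as_integer = int(stringnumber)
as_float = float(stringnumber)
print(as_integer + weight_kg)    # 80
print(type(as_float))            # <class 'float'>

# Multiplying an integer by a floating point number gives a floating point number.
weight_lb = 2.2 * weight_kg
print(type(weight_lb))           # <class 'float'>
```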
Vocabulary
Some important vocabulary. These definitions are taken directly from
the “Summary of Basic Commands” page of The Unix Shell and the
“Glossary” page of Programming with Python.
- absolute path - A path that refers to a particular
location in a file system. Absolute paths are usually written with
respect to the file system’s root directory, and begin with either ‘/’
(on Unix) or ‘\’ (on Microsoft Windows).
- argument - A value given to a function or program
when it runs. The term is often used interchangeably (and
inconsistently) with parameter.
- assertion - An expression which is supposed to be
true at a particular point in a program. Programmers typically put
assertions in their code to check for errors; if the assertion fails
(i.e., if the expression evaluates as false), the program halts and
produces an error message.
- assign - To give a value a name by associating a
variable with it.
- body (of a function) - The statements that are
executed when a function runs.
- case-insensitive - Treating text as if upper and
lower case characters of the same letter were the same.
- case-sensitive - Treating text as if upper and
lower case characters of the same letter are different.
- command-line interface - A user interface based on
typing commands.
- comment - A remark in a program that is intended to
help human readers understand what is going on, but is ignored by the
computer. Comments in Python, R, and the Unix shell start with a ‘#’
character and run to the end of the line; comments in SQL start with
‘--’, and other languages have other conventions.
- conditional statement - A statement in a program
that might or might not be executed depending on whether a test is true
or false.
- comma-separated values (CSV) - A common textual
representation for tables in which the values in each row are separated
by commas.
- current working directory - The directory that
relative paths are calculated from; equivalently, the place where files
referenced by name only are searched for. Every process has a current
working directory. The current working directory is usually referred to
using the shorthand notation ‘.’ (pronounced “dot”).
- delimiter - A character or characters used to
separate individual values, such as the commas between columns in a CSV
file.
- file system - A set of files, directories, and I/O
devices (such as keyboards and screens). A file system may be spread
across many physical devices, or many file systems may be stored on a
single physical device; the operating system manages access.
- filename extension - The portion of a file’s name
that comes after the final “.” character. By convention this identifies
the file’s type.
- filter - A program that transforms a stream of
data. Many Unix command-line tools are written as filters: they read
data from standard input, process it, and write the result to standard
output.
- floating-point number - A number containing a
fractional part and an exponent.
- for loop - A loop that is executed once for each
value in some kind of set, list, or range.
- function - A named group of instructions that is
executed when the function’s name is used in the code.
- graphical user interface - A user interface based
on selecting items and actions from a graphical display, usually
controlled by using a mouse.
- home directory - The default directory associated
with an account on a computer system. By convention, all of a user’s
files are stored in or below her home directory.
- import - To load a library into a program.
- in-place operators - An operator such as ‘+=’ that
provides a shorthand notation for the common case in which the variable
being assigned to is also an operand on the right hand side of the
assignment. For example, the statement ‘x += 3’ means the same thing as
‘x = x + 3’.
- index - A subscript that specifies the location of
a single value in a collection, such as a single pixel in an image.
- inner loop - A loop that is inside another
loop.
- integer - A whole number, such as -12343.
- library - A family of code units (functions,
classes, variables) that implement a set of related tasks.
- local variable - A variable defined inside of a
function, that exists only in the scope of that function, meaning it
cannot be accessed by code outside of the function.
- loop - A set of instructions to be executed
multiple times. Consists of a loop body and (usually) a condition for
exiting the loop.
- loop body - The set of statements or commands that
are repeated inside a for loop or while loop.
- loop variable - The variable that keeps track of
the progress of the loop.
- operating system - Software that manages
interactions between users, hardware, and software processes. Common
examples are Linux, macOS, and Windows.
- option - A way to specify an argument or setting to
a command-line program. By convention Unix applications use a dash
followed by a single letter, such as ‘-v’, or two dashes followed by a
word, such as ‘--verbose’, while DOS applications use a slash, such as
‘/V’. Depending on the application, an option may be followed by a
single argument, as in ‘-o /tmp/output.txt’.
- outer loop - A loop that contains another
loop.
- parameter - A variable named in a function’s
declaration that is used to hold a value passed into the call.
- parent directory - The directory that “contains”
the one in question. Every directory in a file system except the root
directory has a parent. A directory’s parent is usually referred to
using the shorthand notation ‘..’ (pronounced “dot dot”).
- path - A description that specifies the location of
a file or directory within a file system.
- pipe - A connection from the output of one program
to the input of another. When two or more programs are connected in this
way, they are called a “pipeline”.
- regular expression - A pattern that specifies a set
of character strings. REs are most often used to find sequences of
characters in strings.
- relative path - A path that specifies the location
of a file or directory with respect to the current working directory.
Any path that does not begin with a separator character (‘/’ or ‘\’) is a
relative path.
- root directory - The top-most directory in a file
system. Its name is ‘/’ on Unix (including Linux and macOS) and ‘\’ on
Microsoft Windows.
- shell/command shell - A command-line interface such
as Bash (the Bourne-Again Shell) or the Microsoft Windows DOS shell that
allows a user to interact with the operating system.
- shell script - A set of shell commands stored in a
file for re-use. A shell script is a program executed by the shell; the
name “script” is used for historical reasons.
- string - Short for “character string”, a sequence
of zero or more characters.
- syntax - The rules that define how code must be
written for a computer to understand.
- syntax error - A programming error that occurs when
statements are in an order or contain characters not expected by the
programming language.
- sub-directory - A directory contained within
another directory.
- tab completion - A feature provided by many
interactive systems in which pressing the Tab key triggers automatic
completion of the current word or command.
- variable - A name in a program that is associated
with a value or a collection of values.
- while loop - A loop that keeps executing as long as
some condition is true.
- wildcard - A character used in pattern matching. In
the Unix shell, the wildcard ‘*’ matches zero or more characters, so
that ‘*.txt’ matches all files whose names end in ‘.txt’.
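Several of the Python terms above are easier to remember when you see them together. The sketch below labels each one in a comment; the function `total_weight` and the list `samples` are made-up examples, not part of the lessons.

```python
def total_weight(weights_kg):          # weights_kg is a parameter
    """Add up a list of weights in kg."""
    # assertion: halt with an error message if the list is empty.
    assert len(weights_kg) > 0, "need at least one weight"

    total = 0                          # local variable: exists only inside the function
    for weight in weights_kg:          # for loop; weight is the loop variable
        total += weight                # in-place operator: same as total = total + weight
    return total


samples = [60, 72.5, 81]               # integers and a floating-point number
result = total_weight(samples)         # samples is the argument passed to the function
print(result)                          # 213.5

first = samples[0]                     # index 0 selects the first value
print(first)                           # 60

# while loop: keeps executing as long as the condition is true.
countdown = 3
while countdown > 0:
    countdown -= 1
print(countdown)                       # 0
```

Calling `total_weight([])` would trigger the assertion and stop the program with the message "need at least one weight".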