Users are expected to follow our code of conduct: https://docs.carpentries.org/topic_folders/policies/code-of-conduct.html

All content is publicly available under the Creative Commons Attribution License: https://creativecommons.org/licenses/by/4.0/


Etherpad
https://pad.carpentries.org/2019-03-08-UCSF-Python

Please fill out the pre-workshop survey
https://www.surveymonkey.com/r/swc_pre_workshop_v1?workshop_id=2019-03-08-UCSF-Python

Courses Website
https://courses.ucsf.edu/course/index.php?categoryid=499

# Day 1 BASH Shell:
    
Download the data-shell.zip file for the Unix section and unzip it (you can leave it in your downloads folder)
https://swcarpentry.github.io/shell-novice/setup.html

whoami = unix command that prints your username
Geoff recommended creating a separate "system administrator" (root) account and logging into your own account for everyday tasks, to minimize the risk of messing things up

pwd = print working directory (shows the path to the directory you are currently in)

ls = list directory (will show contents of the pwd)
ls -F = distinguishes between directories and files
    `-F` is a "command flag" which changes the behavior of the command
    Also sometimes called an option
    If you have a long list, e.g. in Downloads, `ls d*` lists only the items whose names begin with d
    
    ls -a = show everything, including hidden files (names that start with .)
    . = refers to where you are
    .. = refers to the directory above you
    ~ = shorthand for the absolute path of your home directory
    
    ls -l = table-like listing which shows a bunch of other information about files/directories
    
Flags can be combined, so `ls -a -l` is a command to show ALL files in that tabular format.

cd = change directory
    `cd Downloads` means change into the directory called "Downloads" which is a sub-directory of the pwd
    cd without any argument takes you to your home directory
    `cd ..` takes you to the parent of the present working directory.
    e.g. `cd ../writing` takes you UP one directory and then back down into the writing directory.  This is called a "Relative Path" because it's relative to your present working directory.
    
    The Unix file system is a tree that branches downwards from the root directory (/)

Tab completion allows you to hit the <Tab> key while you're typing a command or a directory name (or a few other things) and BASH will fill in the rest IF it's unique/unambiguous.
If your screen flashes and does not tab-complete, Unix is telling you that several things start with what you have typed so far. Press <Tab> <Tab> quickly to ask Unix for all the options, then you can pick!
<Up>/<Down> keys will move through your history and pull up previous lines

CTRL+A (hold the <Control> key and press <A>) jumps to the beginning of your current command

   
Exercise #1:
Start in the home directory and navigate to the writing folder (subdirectory of data-shell) in three different ways:
    1) step by step, checking where you are at each step
    2) use a relative path
    3) use an absolute path
   
   
nano is a text editor that works on the terminal, rather than opening up a window

other text editors: vi/vim or emacs, but you will need some practice to get used to them first
   
cat = shows the contents of a file

example: `rm thesis.txt` deletes the file thesis.txt (rm means remove)

   
   rm = remove or delete (CAREFUL!  don't do this unless you're SURE)
   cp = copy
   mv = move or rename (Also be careful with this; you can overwrite files without meaning to.)
   mkdir = make directory
   
   You can type `ls Thesis` (ls followed by the name of a directory) to see what's in that directory without going into it
   
   
Come back at 11:20, thanks!


How to color code your shell: 
command prompt ls output color coding: 
> nano ~/.bash_profile
add:




Ctrl-O for save
Ctrl-X for exit
> source ~/.bash_profile
> ls

rm -r = remove a directory and everything inside of it (-r stands for "recursive")

Finding things commands:
grep = global regular expression print (search the contents of a file for a particular phrase or pattern)
find = find files on your computer based on various criteria

man grep = pull up the manual for grep and view its full documentation (man = manual)
to exit the manual, hit q for Quit
to stop a running process and get back to the command line, press Ctrl+C
grep -E '^.o' haiku.txt  means print every line of haiku.txt whose second character is o (^ = start of the line, . = any single character)
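
For anyone who wants to try the same pattern later in Python, here is a minimal sketch using the built-in re module. The three lines of text are made up and only stand in for haiku.txt; this is not how grep itself is implemented.

```python
import re

# Made-up lines standing in for the contents of haiku.txt.
lines = [
    "Today it is not working",
    "Software is like that.",
    "Yesterday it worked",
]

# ^ anchors to the start of the line, . matches any one character, o is literal.
pattern = re.compile(r"^.o")

# Keep only the lines whose second character is "o", like grep does.
matches = [line for line in lines if pattern.search(line)]
print(matches)
```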

Exercise:
    Use grep to identify a word or phrase that shows up in all directories under 'data-shell/writing'
    HINT: the '-l' (dash lowercase-L) flag lists the names of the files without showing their contents
    
wc = word count (will give you lines, words, characters from a given file or list of files)
wc -l *.pdb = (dash lowercase-L) will list the number of lines in each file that ends with ".pdb"
wc -l *.pdb > lengths.txt = save all of the file lengths of .pdb files to a new file named lengths.txt.


# Python Day 1

((( Archived Jupyter notebook can be found at https://gboushey.github.io/2019-03-08-UCSF-Python/files/day-1.ipynb  )))

$ # Close your terminals and your jupyter notebook windows
$ # Download the gapminder data (link on pad)
$ # Unzip the gapminder data on your Desktop
$ # Open a terminal window and navigate into that directory
$ # "~/Desktop/data/"

https://swcarpentry.github.io/python-novice-gapminder/files/python-novice-gapminder-data.zip

If you can't run the code on a line in jupyter notebook, you may have accidentally set the cell type to something other than "code"
To fix this, go to Cell->Cell Type->Code

variable names can't have spaces or dashes; use an underscore instead (Python reads a dash as a minus sign)

when naming variables, underscores, spacing, and capitalization all matter
e.g. My_Name and my_name are two different variables, and my name (with a space) is a syntax error; if Python reports a NameError or SyntaxError, check your variable name first

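naming conventions:
    my.name will not work in python; a period means something different (it looks up something inside an object or module)
    start with a letter (an underscore is also allowed); no dashes, no spaces; digits are OK anywhere except the first character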
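
A minimal sketch of the naming rules; the variable names and values are made up for illustration.

```python
# Valid names use letters, digits, and underscores, and do not start with a digit.
my_name = "Ada"
My_Name = "Grace"   # a different variable: names are case-sensitive
sample2 = 42        # digits are fine after the first character

print(my_name, My_Name, sample2)

# str.isidentifier() reports whether a string could be used as a variable name.
for candidate in ["my_name", "my name", "my-name", "my.name", "2sample"]:
    print(candidate, candidate.isidentifier())
```

Only the first candidate in the loop is reported as a usable name.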

You can enter more than one line of code in a cell in a jupyter notebook

python runs code in order, from the top of the cell to the bottom

when coding, describe explicitly what the code should do and predict the result before you run it; then compare the actual result with your prediction, and think about how to test that the code is really doing what you want

to see the names (including variables) you can use:
dir()
whos will list the variables: name, type, and value (you can also write %whos; the % means it is an IPython/Jupyter notebook shortcut, so it may not be available in other Python environments)

variables in python have a specific type: integer, string (text), float, boolean (lots of others)
a "type error" (TypeError) happens when you try an operation that python does not define for those data types (like adding an integer to a string)

running the python code "Some text" + "Some other text" concatenates (joins) the two quoted strings

if variable is an integer but you want to change it to a string, use function str()
To change a string to an int, use int()
note that the string must make sense as an int! int("100") works, int("one hundred") doesn't
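
A short sketch of the type rules and conversions; the variable names and values are made up for illustration.

```python
count = 100
label = "samples: "

# Adding a string and an integer is not defined, so Python raises TypeError.
try:
    label + count
except TypeError as err:
    print("TypeError:", err)

# Convert explicitly so both sides have the same type.
message = label + str(count)
print(message)

number = int("100") + 1
print(number)

# A string that does not look like an integer raises ValueError.
try:
    int("one hundred")
except ValueError as err:
    print("ValueError:", err)
```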

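print(a, b) will put a space between a and b. when a and b are both strings, the output is the same as print(a + " " + b); the comma form also works for other types, while + needs str() first.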
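
A small sketch comparing the two ways of printing; the values are made up.

```python
a = "Hello"
b = "world"

print(a, b)          # print inserts a space between its arguments
print(a + " " + b)   # same output, with the space added by hand

# The comma form accepts non-string values; + needs an explicit str().
n = 3
print(a, n)
joined = a + " " + str(n)
print(joined)
```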

python is zero-indexed: when you create a list, the first item is at index 0
This varies language to language - R, for example, indexes at "1" rather than "0"

use del to delete a variable or an element of a list by index
del my_var deletes the variable
del my_list[5] deletes the item at index 5 in a list
an index of -1 accesses the last element of a list: my_list[-1] will get the last element

You can multiply a list by an integer n, and get n copies of that list
['Carl'] * 3 will generate ['Carl', 'Carl', 'Carl']
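
The list operations from this part of the notes, sketched on a made-up list of names:

```python
names = ["Ada", "Grace", "Carl", "Linus"]   # made-up example data

first = names[0]    # index 0 is the first item
last = names[-1]    # index -1 is the last item
print(first, last)

del names[1]        # removes the item at index 1 ("Grace")
print(names)

repeated = ["Carl"] * 3   # three copies of the one-item list
print(repeated)

temp = 5
del temp            # temp no longer exists; using it now raises NameError
```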

Anyone who is interested can later read about "Parhyale hawaiensis, a marine amphipod crustacean" - both hard to pronounce, and hard to spell! (: ~ Rebecca

Challenge: write a single line of python code that will print the first character of the last list item:
    One solution is:
        print(challenge_list[-1][0])
        If we use the length of challenge_list, we can also reach the last item that way: challenge_list[len(challenge_list) - 1][0]
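
Both approaches, sketched on a hypothetical list (the list used in the workshop is not recorded in these notes):

```python
# Hypothetical list; not the one used during the workshop.
challenge_list = ["apple", "banana", "cherry"]

print(challenge_list[-1][0])                        # last item, then its first character
print(challenge_list[len(challenge_list) - 1][0])   # same item, reached via the length
```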


Code in python that we can reuse comes as libraries, packages, modules etc.
As a jupyter shortcut, use ? followed by the library or function name. ?math.log will print information about the function: its input parameters and output
dir(math) will show the names of the functions available in the math library
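
A minimal sketch of importing and exploring the math library. The ? shortcut only works in Jupyter/IPython, so this sketch uses dir(), which works anywhere.

```python
import math

print(math.log(math.e))     # natural logarithm by default
print(math.log(100, 10))    # an optional second argument sets the base

# dir() returns a list of names; skip the private ones that start with "_".
public_names = [name for name in dir(math) if not name.startswith("_")]
print(public_names[:5])
```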

when citing python in a scientific paper, remember to cite the python version number and the third-party libraries you used (with their versions); ideally, share your code too

libraries helpful for tabular data:
    pandas, conventionally shortened to pd
    matplotlib.pyplot, which Byron likes to shorten to plt
    seaborn, which Byron likes to shorten to sns
    
You can view csv and other text files in Jupyter Notebook (click on the file name in your notebook "Home" window). Keep in mind, this will load the whole file, so be careful about opening very large files. In that case, you're better off using the unix head utility in a terminal to view just the first few lines

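data.loc[0] will give the row labelled 0 (the first row when the dataframe has the default integer index), while data.loc[0:2] will give rows 0, 1 and 2, because .loc slices include the end label; use data.iloc[0:2] to get only row 0 and row 1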
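
A small sketch of label-based versus position-based slicing. The table is made up (it is not the gapminder data) and uses the default integer index 0, 1, 2, 3.

```python
import pandas as pd

# Made-up values for illustration only.
data = pd.DataFrame({
    "country": ["Albania", "Austria", "Belgium", "Bulgaria"],
    "gdp": [1601.0, 6137.0, 8343.0, 2444.0],
})

first_row = data.loc[0]          # the row labelled 0
by_label = data.loc[0:2]         # label slice: the end label is included
by_position = data.iloc[0:2]     # position slice: the end position is excluded

print(len(by_label), len(by_position))
```

If the dataframe is indexed by something else (for example country names), .loc takes those labels rather than integers.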

plt.xticks(rotation=90) will rotate the x-axis tick labels by 90 degrees

# Day 2 

Continuing with Python

((( Archived Jupyter notebook can be found at https://gboushey.github.io/2019-03-08-UCSF-Python/files/day-2.ipynb  )))

Github

** in terminal, type history to see all the commands you wrote thus far

python sandwich demo: https://github.com/jgu1/python_sandwich

What to do next? A few students asked me how I went on to teach myself R and Python after having attended a SWC workshop - here are some suggestions: https://docdro.id/OqryuuT
The PDF books referenced in that file aren't included, but feel free to reach out to me via email (rebecca.jaszczak@ucsf.edu) and I can help.

Message from Jialiang (helper): please do not hesitate to contact me at jialiang.gu@ucsf.edu if you find another coding issue in your daily life, we can quickly chat and solve, we can probably collaborate over dataset, too.