Welcome to The Carpentries Etherpad!
This pad is synchronized as you type, so that everyone viewing this page sees the same text. This allows you to collaborate seamlessly on documents.
Use of this service is restricted to members of The Carpentries community; this is not for general purpose use (for that, try etherpad.wikimedia.org).
Users are expected to follow our code of conduct: https://docs.carpentries.org/topic_folders/policies/code-of-conduct.html
All content is publicly available under the Creative Commons Attribution License: https://creativecommons.org/licenses/by/4.0/
Etherpad
https://pad.carpentries.org/2019-03-08-UCSF-Python
Please fill out the pre-workshop survey
https://www.surveymonkey.com/r/swc_pre_workshop_v1?workshop_id=2019-03-08-UCSF-Python
Courses Website
https://courses.ucsf.edu/course/index.php?categoryid=499
# Day 1 BASH Shell:
Download the data-shell.zip file for the Unix section and unzip it (you can leave it in your downloads folder)
https://swcarpentry.github.io/shell-novice/setup.html
"whoami" unix command
Geoff recommended creating a separate "system administrator" account for root-level tasks, then logging into your own account for your usual tasks, to minimize the risk of messing things up
pwd = print working directory (shows the path to the directory you are currently in)
ls = list directory (will show contents of the pwd)
ls -F = distinguishes between directories and files
`-F` is a "command flag" which changes the behavior of the command
Also sometimes called an option
If you have a long list, e.g. in Downloads, `ls d*` = list only the items whose names begin with d
ls -a = how you can see everything
. = refers to where you are
.. = refers to directory above you
~ = refers to the home directory absolute path
ls -l = table-like listing which shows a bunch of other information about files/directories
Flags can be combined, so `ls -a -l` is a command to show ALL files in that tabular format.
cd = change directory
`cd Downloads` means change into the directory called "Downloads" which is a sub-directory of the pwd
cd without any argument takes you to your home directory
`cd ..` takes you to the parent of the present working directory.
e.g. `cd ../writing` takes you UP one directory and then back down into the writing directory. This is called a "Relative Path" because it's relative to your present working directory.
Unix is a tree, branches downwards
Tab completion allows you to hit the <Tab> key while you're typing a command or a directory name (or a few other things) and BASH will fill in the rest IF it's unique/unambiguous.
If your screen flashes and does not tab-complete, Unix is telling you that there are several things starting with what you have typed so far. Press <tab> <tab> quickly to ask Unix for all the options - then you can pick!
<Up>/<Down> keys will move through your history and pull up previous lines
CTRL+A (hold the <Control> key and press <A>) jumps to the beginning of your current command
Exercise #1:
Start in the home directory and navigate to the writing folder (subdirectory of data-shell) in three different ways:
1) step by step, checking where you are at each step
2) use a relative path
3) use an absolute path
nano is a text editor that works on the terminal, rather than opening up a window
other text editors: vi/vim or emacs, but you will need to get used to them first
cat #shows the content of a file
- cat (short for "concatenate") is one of the most frequently used commands on Linux/Unix-like operating systems. It lets you create single or multiple files, view the contents of a file, concatenate files, and redirect output to the terminal or to files
example: rm thesis.txt deletes the file thesis.txt (rm means remove)
rm = remove or delete (CAREFUL! don't do this unless you're SURE)
cp = copy
mv = change name , move (Also be careful with this; you can overwrite files without meaning to.)
mkdir = make directory
You can just type ls Thesis - so ls and the name of the directory to see what's in it without going into it
Come back at 11:20, thanks!
How to color code your shell:
command prompt ls output color coding:
> nano ~/.bash_profile
add:
- alias ls='ls -G'
- export CLICOLOR=1
- export LSCOLORS=Gxfxcxdxbxegedabagacad
- export PS1="\e[0;33m[\u@\h \W]\$ \e[m "
Ctrl-O for save
Ctrl-X for exit
> source ~/.bash_profile
> ls
rm -r = remove a directory and everything inside of it (-r stands for "recursive")
Finding things commands:
grep = global regular expression print (search the contents of a file for a particular phrase or pattern)
find = find files on your computer based on various criteria
man grep = pull up the manual for how to use grep, view full documentation
man - manual
when in the manual and want to exit , hit q for Quit
to stop process from running and get back to command line, ctrl + c
grep -E '^.o' haiku.txt means find all lines in haiku.txt whose second character is o (^ anchors the match to the start of the line, . matches any single character)
Exercise:
Use grep to identify a word or phrase that shows up in all directories under 'data-shell/writing'
HINT: the '-l' (dash lowercase-L) flag lists the names of the files without showing their contents
wc = word count (will give you lines, words, characters from a given file or list of files)
wc -l *.pdb = (dash lowercase-L) will list the number of lines in each file that ends with ".pdb"
wc -l *.pdb > lengths.txt = save all of the file lengths of .pdb files to a new file named lengths.txt.
# Python Day 1
((( Archived Jupyter notebook can be found at https://gboushey.github.io/2019-03-08-UCSF-Python/files/day-1.ipynb )))
$ # Close your terminals and your jupyter notebook windows
$ # Download the gapminder data (link on pad)
$ # Unzip the gapminder data on your Desktop
$ # Open a terminal window and navigate into that directory
$ # "~/Desktop/data/"
https://swcarpentry.github.io/python-novice-gapminder/files/python-novice-gapminder-data.zip
If you can't run the code on a line in jupyter notebook, you may have accidentally set the cell type to something other than "code"
To fix this, go to Cell->Cell Type->Code
variable names can't have spaces or dashes. use an underscore instead (python would read a dash as a minus sign)
when naming variables, underscores, spacing, and capitalization matter
e.g. My_Name is not the same variable as my_name, and my name (with a space) is not a valid name at all; python will report a syntax error or a name error, which is your cue to check the variable name
naming conventions:
my.name will not work in python. periods mean something different
try to start with a letter, no dashes, no spaces, ok to combine with number
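A minimal sketch of the naming rules above. The variable names and values are made up for illustration; str.isidentifier() is used here only as a quick way to check whether a string would be a legal name.

```python
# Names are case-sensitive: these are two different variables.
my_name = "Ada"
My_Name = "Grace"
print(my_name == My_Name)   # False

# Letters, digits, and underscores are fine, as long as the name
# does not start with a digit.
sample_2 = 10

# Check which candidate names python would accept.
# Spaces, dashes, periods, and a leading digit all make a name invalid.
candidates = ["my_name", "sample_2", "my name", "my-name", "my.name", "2sample"]
valid = {name: name.isidentifier() for name in candidates}
for name, ok in valid.items():
    print(name, ok)
```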
You can enter more than one line of code in a cell in a jupyter notebook
python runs code in order, from top to bottom
when coding, describe explicitly what the code should do and predict the result before you run it, then compare with the actual result; think about how to test that the code is really doing what you want
to see the variables you can use
dir()
whos will list the variables: name, type, and value (you can use %whos - the % means it is an ipython or jupyter notebook shortcut, and it may not be available in all python environments)
variables in python have a specific type: integer, string (text), float, boolean (lots of others)
a "type error" happens when you try to do an operation that isn't available in python for those data types (like adding an integer to a string)
running python code "Some text" + "Some other text" would concatenate the text in quotes together
if variable is an integer but you want to change it to a string, use function str()
To change a string to an int, use int()
note that the string must make sense as an int! int("100") works, int("one hundred") doesn't
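A short sketch of the type error and the str()/int() conversions described above. The variable names and values are illustrative only.

```python
count = 100
label = "Count: "

# Adding a string and an integer raises a TypeError.
try:
    message = label + count
except TypeError as err:
    print("TypeError:", err)

# Convert the integer to a string first, then concatenate.
message = label + str(count)
print(message)   # Count: 100

# Convert a string of digits to an integer so it can be used in arithmetic.
total = int("100") + 1
print(total)     # 101

# A string that does not look like a number cannot be converted.
try:
    int("one hundred")
except ValueError as err:
    print("ValueError:", err)
```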
print(a, b) will put a space between a and b. when a and b are both strings, it's equivalent to print(a + " " + b).
python is zero indexed .... when you create a list, python considers the first item as zero
This varies language to language - R, for example, indexes at "1" rather than "0"
use del to delete a variable or an element of a list by index
del my_var deletes the variable
del my_list[5] deletes the item at index 5 in a list
-1 accesses the last element of a list: my_list[-1] will get the last element
You can multiply a list by an integer n, and get n copies of that list
['Carl'] * 3 will generate ['Carl', 'Carl', 'Carl']
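The list operations above, sketched on a made-up six-item list (the list contents are an assumption for illustration):

```python
my_list = ["a", "b", "c", "d", "e", "f"]

print(my_list[0])    # a  (the first item is at index 0)
print(my_list[-1])   # f  (index -1 is the last item)

# Delete the item at index 5 (the sixth item).
del my_list[5]
print(my_list)       # ['a', 'b', 'c', 'd', 'e']

# Multiplying a list by an integer repeats it.
repeated = ["Carl"] * 3
print(repeated)      # ['Carl', 'Carl', 'Carl']

# Deleting a variable removes the name entirely;
# using my_var after this line would raise a NameError.
my_var = 42
del my_var
```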
Anyone who is interested can later read about "Parhyale hawaiensis, a marine amphipod crustacean" - both hard to pronounce, and hard to spell! (: ~ Rebecca
With a single line of python code, write a line that will print the first character in the last list item:
One solution is:
challenge_list[-1][0]
If we know the length of challenge_list, we could also specify the last item that way.
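Both approaches, sketched on a hypothetical list. The contents of the workshop's actual challenge_list are not recorded in these notes, so the values below are an assumption.

```python
# Hypothetical stand-in for the workshop's challenge_list.
challenge_list = ["apple", "banana", "cherry"]

# Negative index: take the last item, then its first character.
first_char = challenge_list[-1][0]
print(first_char)        # c

# Same result using the length of the list:
# the last item sits at index len(challenge_list) - 1.
first_char_by_len = challenge_list[len(challenge_list) - 1][0]
print(first_char_by_len) # c
```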
Code in python that we can reuse = libraries, packages, modules etc
- bring these in using the command "import"
- to read more, check the reference documentation or search online; for example, for math, visit https://docs.python.org/3/library/math.html or use the help() command
As a jupyter shortcut, use ? followed by the library/method. ?math.log will print information about the method, input parameters, output
dir(math) will show the function names available on the math library
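A small sketch of importing a library and exploring it, using the standard math module. The particular functions called here are just examples.

```python
import math

print(math.sqrt(16))      # 4.0
print(math.log10(1000))   # 3.0

# dir(math) lists everything the module provides; names starting
# with an underscore are internal, so filter them out.
public_names = [name for name in dir(math) if not name.startswith("_")]
print(public_names[:5])

# The docstring is the text that help(math.log) and ?math.log are built from.
print(math.log.__doc__)
```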
when citing python in a scientific paper, remember to cite the version number and which libraries and third-party packages you used; ultimately you can also share your code
libraries helpful for tabular data:
pandas, conventionally shortened as pd
matplotlib.pyplot, which Byron likes to shorten as plt
seaborn, which Byron likes to shorten as sns
You can view csv and other text files in Jupyter Notebook (click on the file name in your notebook "Home" window). Keep in mind, this will load the whole file, so be careful about opening very large files. In that case, you're better off using unix and using the head utility to view the first few lines
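data.loc[0] will give the row with index label 0 (the first row when the dataframe has the default integer index). data.loc[0:2] slices by label and includes the end label, so it gives rows 0, 1, and 2; data.iloc[0:2] slices by position and gives only row 0 and row 1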
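A quick sketch of row selection on a tiny made-up dataframe (the column names and values are assumptions, not the gapminder data). Note that label-based .loc slices include the end label, while position-based .iloc slices exclude the end position.

```python
import pandas as pd

# Made-up table with the default integer index 0, 1, 2, 3.
data = pd.DataFrame(
    {"country": ["A", "B", "C", "D"], "gdp": [1.0, 2.0, 3.0, 4.0]}
)

first_row = data.loc[0]        # row with index label 0, returned as a Series
by_label = data.loc[0:2]       # labels 0, 1, and 2 (end label included)
by_position = data.iloc[0:2]   # positions 0 and 1 (end position excluded)

print(first_row["country"])    # A
print(len(by_label))           # 3
print(len(by_position))        # 2
```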
plt.xticks(rotation=90) will rotate the labels
# Day 2
Continuing with Python
((( Archived Jupyter notebook can be found at https://gboushey.github.io/2019-03-08-UCSF-Python/files/day-2.ipynb )))
Github
- how to configure username:
- git config --global user.name "whatever your name is"
- how to configure email:
- git config --global user.email "insert email here"
- note: this email will be publicly visible on the web, and needs to match the email you used to sign up for github
- how github interacts with your local version:
- local <--> git <--> github
- -->add
- -->commit
- workflow:
- nano ________(whatever file you are using)___
- edit it
- git add (name of file)
- git commit -m "brief note of change"
- git push origin master
- git status --> use this to check status throughout workflow process
- how to bring in changes from the remote version to your local version: git pull origin master
- how to fix merge errors between local and remote versions
- 1. git add (file name)
- 2. git commit -m "insert comment about change"
- 3. git push origin master, reading error message can help you troubleshoot
- 4. open file in nano to see what has changed, resolve conflict
- 5. try again
- .gitignore : use this for things you want to keep out of the repository because they don't belong there, e.g. a massive data file or a config file you don't want to push
** in terminal, type history to see all the commands you wrote thus far
python sandwich demo: https://github.com/jgu1/python_sandwich
What to do next? A few students asked me how I went on to teach myself R and Python after having attended a SWC workshop - here are some suggestions: https://docdro.id/OqryuuT
The PDF books referenced in that file aren't included, but feel free to reach out to me via email (rebecca.jaszczak@ucsf.edu) and I can help.
Message from Jialiang (helper): please do not hesitate to contact me at jialiang.gu@ucsf.edu if you find another coding issue in your daily life, we can quickly chat and solve, we can probably collaborate over dataset, too.