diff --git a/project/docs/1_Getting_Started/getting_started_intro_to_python/getting_started_intro_to_python.ipynb b/project/docs/1_Getting_Started/getting_started_intro_to_python/getting_started_intro_to_python.ipynb index 5adfeecc..72a076d4 100644 --- a/project/docs/1_Getting_Started/getting_started_intro_to_python/getting_started_intro_to_python.ipynb +++ b/project/docs/1_Getting_Started/getting_started_intro_to_python/getting_started_intro_to_python.ipynb @@ -621,7 +621,7 @@ }, { "cell_type": "code", - "execution_count": 35, + "execution_count": null, "metadata": {}, "outputs": [ { @@ -638,7 +638,7 @@ "y = \"Morning\"\n", "\n", "print(x + y)\n", - "print(3 * x )" + "print(3 * x)" ] }, { @@ -1011,7 +1011,452 @@ "To do so, we can use the functions **all** and **any**. These functions will test all the booleans we're working on at the same time.\n", "\n", "- `all(bools)` will tell us if all the booleans in `bools` are `True`. If they are all true, Python will return `True`. If even one of them is not true, Python will return `False`. \n", - "- `any(bools)` will tell us if we have any booleans that are `True`. If even one of our booleans is true, Python will return `True` when running this command." + "- `any(bools)` will tell us if we have any booleans that are `True`. If even one of our booleans is true, Python will return `True` when running this command.\n", + "\n", + "## Part 4: Collections\n", + "\n", + "### Lists\n", + "\n", + "Just as within R, Python allows us to store data in **lists**. We can create lists easily using the following syntax:\n", + "\n", + "`[item1, item2, ..., itemN]`\n", + "\n", + "Each item can be of any type, meaning we can have integers, floats, strings, and booleans all in our list. \n", + "\n", + "Let's create some lists below and check out their type!" 
+ ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "[2, 3, 5, 7, 11]\n" + ] + }, + { + "data": { + "text/plain": [ + "list" + ] + }, + "execution_count": 1, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# we'll create a list of the first 5 prime numbers\n", + "# and store the list in \"prime\"\n", + "\n", + "prime = [2, 3, 5, 7, 11]\n", + "\n", + "print(prime)\n", + "type(prime)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### What can we do with our lists?\n", + "\n", + "To access a specific item of a list, we can **index** the list. The syntax of indexing a list called `mylist` is as follows:\n", + "\n", + "`mylist[i]`\n", + "\n", + "where `i` is an integer. What we are doing here is selecting an element of the **collection** `mylist`. Let's try this with the list we made above." + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "3" + ] + }, + "execution_count": 2, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "prime[1]" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Notice that `prime[1]` returned the second number in our list rather than the first (2). This is because **Python starts counting at ZERO!** Thus to access the first element of our list, we must write `mylist[0]`. \n", + "\n", + "We can also determine how many items are in a list using the function `len`. 
" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "5" + ] + }, + "execution_count": 3, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "len(prime)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "What do you think will happen if we try to index a number higher than the number of items in a list? Try it below!" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [ + { + "ename": "IndexError", + "evalue": "list index out of range", + "output_type": "error", + "traceback": [ + "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", + "\u001b[0;31mIndexError\u001b[0m Traceback (most recent call last)", + "Cell \u001b[0;32mIn[4], line 1\u001b[0m\n\u001b[0;32m----> 1\u001b[0m \u001b[43mprime\u001b[49m\u001b[43m[\u001b[49m\u001b[38;5;241;43m6\u001b[39;49m\u001b[43m]\u001b[49m\n", + "\u001b[0;31mIndexError\u001b[0m: list index out of range" + ] + } + ], + "source": [ + "prime[6]" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We can us the keyword `in` to check if a list contains an specific entry." + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "True" + ] + }, + "execution_count": 5, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "7 in prime" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "False" + ] + }, + "execution_count": 6, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "3.14 in prime" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "There are a few other common operations we might want to do with lists. 
We can:\n", + "\n", + "- Reverse the list with `mylist.reverse()`\n", + "- Sort the list with `mylist.sort()`\n", + " - Note that to do this, all of the elements in our list need to be either numbers (integers or floats), or all strings. If our list is all strings, `mylist.sort()` will sort the items in alphabetical order. \n", + "- Append another element to the end of the list using `mylist.append(element)`\n", + " - Note that if we append another list to the list, we add the list itself to the end rather than the numbers or strings in that list. We can *combine* lists instead using `mylist.extend()`. \n", + "\n", + "#### Types of Lists\n", + "\n", + "As mentioned before, our lists don't have to contain only one type of variable. Let's do this to the list `prime` by replacing the value `7` with `7.0`. " + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": {}, + "outputs": [], + "source": [ + "prime = [2, 3, 5, 7.0, 11]" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Not only can we have different types of numbers within a list, we can have strings as well!" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "['two', 3, 5, 'seven', 11]\n" + ] + }, + { + "data": { + "text/plain": [ + "list" + ] + }, + "execution_count": 9, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "prime = [\"two\", 3, 5, \"seven\", 11]\n", + "print(prime)\n", + "type(prime)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We can also look at the types of individual elements within the list by using the indexing we learned earlier as well as the `type()` function. 
" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "type(prime[0]) = , type(prime[1]) = \n" + ] + } + ], + "source": [ + "print(f\"type(prime[0]) = {type(prime[0])}, type(prime[1]) = {type(prime[1])}\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Although this is technically not a problem to do (i.e., Python will let you make lists with strings and numbers), be warned that you may run into some additional complications. For example, how would you sort a list that has both letters and numbers? Therefore, we suggest being extra careful. \n", + "\n", + "\n", + "## The `range` function\n", + "\n", + "One function we use frequently is the `range` function. There are three main versions of this.\n", + "\n", + "1. `range(n)` goes from $0$ to $n-1$. \n", + "2. `range(a, n)` goes from $a$ to $n-1$.\n", + "3. `range(a, n, d)` goes from $a$ to $n-1$, at the interval `d`. \n", + "\n", + "When use the `range` function to define an object, the type of that object will be `range`. 
For example," + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "range(5, 10)\n" + ] + }, + { + "data": { + "text/plain": [ + "range" + ] + }, + "execution_count": 11, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "r = range(5, 10, 1)\n", + "\n", + "print(r)\n", + "type(r)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Linking this back to what we were looking at before, we can turn `range`s into lists!" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "[5, 6, 7, 8, 9]" + ] + }, + "execution_count": 12, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "list(r)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Tuples\n", + "\n", + "Tuples are another concept we will look at. These are very similar to lists, as they hold a collection of items.\n", + "\n", + "The main differences between tuples and lists are:\n", + "\n", + "- Tuples are created using parentheses instead of square brackets -- ( ) instead of [ ]. \n", + "- Tuples are *immutable*, meaning they can't be changed after they are created.\n", + "- Tuples are connected to multiple return values, which we will see later. \n", + "\n", + "Lists can be converted to tuples by using the `tuple()` function on a list. Tuples can be converted to lists by using the `list()` function.\n", + "\n", + "As we did with lists, we can pick out a specific item in our tuple by using `name_of_tuple[N]`, where `N` is an integer. Don't forget, Python starts counting at **zero**!\n", + "\n", + "## Lists vs Tuples\n", + "\n", + "Which one should we use? 
It depends on many factors.\n", + "\n", + "- What types of objects you are storing;\n", + "- Whether we want to reorder elements;\n", + "- Whether we want to add elements later on.\n", + "\n", + "Let's look at an example to understand this. \n", + "\n", + "Say we have an individual, Jerry, and some information on him: his age, weight, height, and favourite ice cream flavour. In this case, we would want to use a *tuple*, because the order between the elements is meaningless (i.e. we could have put Jerry's height before his weight), and adding more data would require a reinterpretation of the whole data structure (i.e. we don't know what this new element would mean!). " + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "('Jerry', 2024, 80, 183, 'vanilla')\n" + ] + } + ], + "source": [ + "jerry_2024 = (\"Jerry\", 2024, 80, 183, \"vanilla\")\n", + "print(jerry_2024)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "However, if we are looking at Jerry's favourite ice cream flavour over his whole life, a *list* would be a lot more useful! Adding a new element at the end, say Jerry's favourite ice cream flavour the next year, would make sense and would not change the meaning of the dataset." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "jerry_2023_2025 = [\"strawberry\", \"vanilla\", \"mint\"]\n", + "print(jerry_2023_2025)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Some things might be best as tuples *and* lists! For example, what if we wanted Jerry's age, weight, height, and favourite ice cream flavour *over many years*?" 
+ ] + }, + { + "cell_type": "code", + "execution_count": 14, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "[(2023, 'strawberry'), (2024, 'vanilla'), (2025, 'mint')]\n" + ] + } + ], + "source": [ + "jerry = [(2023, \"strawberry\"), (2024, \"vanilla\"), (2025, \"mint\")]\n", + "print(jerry)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Overall, we suggest *always* using a list unless you *need* to use a tuple. You would need to use a tuple if:\n", + "\n", + "- the order of each element *cannot* change;\n", + "- the actual values of each element *cannot* change;\n", + "- or you want to use the collection as a key in a `dict`, which we will see soon. " ] }, { @@ -1033,6 +1478,11 @@ "To install a package, we simply run the code `import package`. To access and open the package that we've installed in our session, we run the code `package.function_name`. Let's do this below with the package `sys`, which helps Python work with our computer." ] }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [] + }, { "cell_type": "code", "execution_count": 19, diff --git a/project/docs/1_Getting_Started/getting_started_intro_to_python/getting_started_intro_to_python.qmd b/project/docs/1_Getting_Started/getting_started_intro_to_python/getting_started_intro_to_python.qmd index 7184f363..6464788e 100644 --- a/project/docs/1_Getting_Started/getting_started_intro_to_python/getting_started_intro_to_python.qmd +++ b/project/docs/1_Getting_Started/getting_started_intro_to_python/getting_started_intro_to_python.qmd @@ -3,17 +3,7 @@ title: 0.4 - Introduction to Python author: COMET Team
_Jane Platt_ date: TBD description: 'This notebook introduces you to the fundamental concepts in Python. It might be a little complex for a start, but it covers basically all of the fundamental syntax you need to know in later notebooks. Don''t get overwhelmed! Remember, you can always review this later!' -categories: - - basics - - getting started - - data types - - data structures - - introduction - - dataframes - - variables - - operations - - functions - - python +categories: [basics, getting started, data types, data structures, introduction, dataframes, variables, operations, functions, Python] format: html: default ipynb: @@ -42,400 +32,537 @@ jupyter: python3 In this notebook, we will be introducing **Python**, which is a programming language designed for use in data science and analysis. If you are familiar with other programming languages, such as R, this will likely be very familiar to you - if this is your first time coding, don't be intimidated! Try to play around with the examples and exercises as you work through this notebook; it's easiest to learn to program in Python by trying things for yourself. -## Part 1: Basic Programming Concepts +## Part 0: Basics -To begin, we'll go over the most basic concepts used in Python. Specifically, we will look at variable assignment and functions. Later we will go over the different things that can go into functions and variables, but it will be important to first grasp how they work. +This section will go over some Python basics. If you have already coded before, say in Stata or R, then this section will be very straightforward. If not, don't worry! The point of this section is simply to learn some basic elements of coding and get a bit more comfortable with the language. -### 1.1 Variable Assignment +### 0.1: Variables -Variable assignment is fairly self explanatory: we will assign a value to a variable that we can then use for our analysis. +In all coding languages, we will often work with **variables**. 
These variables store information of many different types. For example, as we saw before, they can hold strings, integers, real numbers... -To assign a value to a variable, we put the variable name on the left and the value on the right, as such: +To create a variable, we need to use an assignment statement: ```{python} -variable = "value" +new_variable = 1 ``` -Now, Python will remember this assignment for the rest of our Python session (the time we're running code without closing Python). If you quit and close the session, Python will not remember that we assigned the value `value` to `variable`. - -Now if we run `variable`, it'll output `value` just like we told it to! +If we reassign a new value to our variable, the original value will be overwritten and lost. ```{python} -variable +new_variable = 10 ``` -If we don't assign a value to our object, Python will return an error. Try it for yourself below. +We can use the variable in the creation of other variables as well! ```{python} -x +number_100 = new_variable * new_variable ``` -This is because we haven't *defined* our variable. When Python reads our code, it reads from right to left. So it will first read the value we picked (originally `value`), and then the computed value is stored in our object (`variable`). In the previous case, we didn't define `x` (there is no value attached, no equal sign), so Python was not able to interpret what we meant by "x". - -We can change how we define different variables as much as we would like. For example, if we were to run: +This also works for strings and is called **concatenating**. See below: ```{python} -variable = 10 +string_var = "Good" +string_var + "Morning!" ``` -we've now changed our variable to store the value `10`. Notice that we can change the type of variable we are working with (from string to numeric). 
Now if we run `variable`, we will see a different output: +There are some basic rules for naming your variables: -```{python} -variable -``` +- Spaces are not allowed +- Only letters, numbers, and underscores are allowed +- Variable names cannot start with a number -Python only keeps the most recent assignment, so if you change the value your variable holds, Python won't keep track of what you had put before. That's why it is useful to keep track of what you're working on using **comments** within your code. +## Part 1: Manipulating Data -### 1.2 Comments +To begin, we'll go over the most basic concepts used in Python. We'll look at how data is stored in objects, what we can do to these objects, and how to work with larger data structures. -Comments are a common concept found in all coding languages. These are little notes that we are able to leave in our code that won't affect our outputs, but that allow us to keep track of what we've been doing. +### 1.1: Object Types -To leave a comment in Python, just include `#` at the beginning of your comment. Python ignores everything that comes *after* a #. See some examples below: +As always when learning a new language, it's important to get a good grasp of the different **object types** in Python and how to use them. Whenever we work with Python, we will be manipulating different kinds of information, which is referred to as an "object". In Python, objects can contain both data and functions we can use on our data. These functions are sometimes called **methods**. -```{python} -first_name = "your first name" # Enter your first name here! -last_name = "your last name" # Enter your last name here! +Python has 3 main data types. The **type** of data will determine what kinds of methods and functions we can perform. Therefore, it will be important to keep track of what type of data we are working with! -# colour = "pick a colour" +The three types of data in Python are: -print(first_name + ' ' + last_name) -``` +1. 
**Numbers**, which can either be *integers* (known as `int` in Python) or *floating point numbers* (known as `float` in Python). Integers can only take on the values of integers, and floats can be any real number. The easiest way to tell the difference between these two types of numbers is the presence of a decimal point. Integers do not have decimals, floats do. +2. **Strings**, which store all text data (known as `str` in Python). For data to be stored as a string, it must be written within quotation marks. Both `" "` and `' '` work! +3. **Booleans**, which simply denote true or false. They will be useful later on, when we start working with operations. +4. **Methods**, which are *functions* we can use on our data. -Notice how the comment we left to the right of the first two lines do not affect the code. We are able to print `first_name` and `last_name` without any issues. Try printing `colour`, though. +To figure out what type of object we are working with, we can use the `type()` function. We will see some examples in a bit. -print(colour) +We can take a look at what an object holds, in terms of data and methods, by typing `.` after the object's name and hitting `tab`. This is typically called *tab completion* or *introspection*. Doing this will list out a few different options that you can scroll through. See below: -Notice how this doesn't work? Recall that Python doesn't read anything to the right of the `#`. Because we put the `#` at the beginning of the line, Python did not read the line where we created the variable `colour`. If we wanted to create this variable, we would have to remove the `#`. Try doing that and see what pops up when you run `print(colour)`! +![Introspection](media/introspection.png) -## Part 2: Objects and Types +#### Working with Numbers -As always when learning a new language, it's important to get a good grasp of the different **object types** in Python and how to use them. 
Whenever we work with Python, we will be manipulating different kinds of information, which is referred to as an "object". In Python, objects can contain both data and functions we can use on our data. These functions are sometimes called **methods**. - -We can take a look at what an object holds, oth in terms of data and methods, by typing `.` after the object's name and hitting `tab`. This is typically called *tab completion* or *introspection*. Doing this will list out a few different options that you can scroll through. See below: +The first thing we can do in Python is basic math. When our variables are numbers (either `int` or `float`), we can easily use Python to add, subtract, and divide numbers (there are many more things Python can do!). Whenever possible, Python will return an integer type for operations between integers, but any operation involving a float will result in a float. -![](media/introspection.png) +Python also allows us to do multiple operations in one line. As you would expect, it follows the standard order of operations PEMDAS (parentheses, exponents, multiplication, division, addition, subtraction). -Try doing this yourself below. Try to find the method `split`. Once you have found it, you can use the method by adding parentheses after it. +If we want to use other math functions, such as taking the sine or cosine of a number, we need to use the `math` package. We will look at this more in detail later, but to use a package, we write the package name followed by a `.` and then we can click [TAB] to look through the available functions. So, if we wanted to take sin(2), we would do the following: ```{python} -full_name = "enter your name" -full_name -``` +import math # don't forget to import the package first! -Can you tell what this method does? +math.sin(2) +``` -Another common thing we will want to do with variables and objects is find out their **type**. 
The **type** of data will determine what kinds of methods and functions we can perform. We will go over each type in depth at a later stage. To figure out what type of object we are working with, we can use the `type()` function. This is a simple function where the input (what goes in between the parentheses) is the object we want to study. Let's try below with the variable `first_name`. +The last mathematical operations we will look at are floor and modulus division. These relate to concepts we learned when first going through division: quotients and remainders. Let's say we are dividing `x` by `y`, two numbers. -```{python} -type(first_name) -``` +- Floor division: `x // y` will return the quotient, or the number of whole times the divisor goes into the dividend. +- Modulus division: `x % y` will return the remainder. -The variable `first_name` is a **string** (`str`). We will look at what that means a bit later. +#### Working with Strings -### 2.1. Numbers +Some of the arithmetic operations we learned above can also be used for strings. +- We can put two strings together by using `+` +- We can repeat a string `n` times by writing `n*variable` -The first type of data we will look at is **numbers**. Python has two types of numbers. +Note that if we were to try to use `*`, `-`, or `/` between two strings, Python would return an error. -1. Integers, known as `int` in Python, can only take the values of integers: `{... -157, ..., -1, 0, 14, 267, ...}`. -2. Floating Point Numbers, known as `float`, can be any real number, such as `-1.5435, 3.14, 12, -10234.4573835`. +There are a ton of different non-arithmetic operations we can perform on strings. We won't be able to look at them all in this notebook, but again if you want to take a look, simply type a `.` after the variable and press [TAB]. You can then scroll through some of the methods. -The easiest way to tell the difference between these two types of numbers is the presence of a decimal point. 
Integers do not have decimals, floats do. Below, we will create some `int` and `float` variables. +We'll list out a few of the essential methods below. These are ones that you will frequently see when working with textual data. ```{python} -x_int = 1 -x_float = 1.0 -z_int = 123 -z_float = 1230.5 -z_float2 = 1_230.5 # Note that if we want to use commas to separate larger numbers - # (e.g. to indicate the thousands), we use an underscore _ instead of a comma. +x = "Good Morning" + +# Again, you don't need to write print() each time. We are only doing this so that you can see each output and compare +print(x.lower()) # makes everything lower case. don't forget the extra parentheses!! +print(x.upper()) # makes everything upper case. don't forget the extra parentheses!! +print(x.count("o")) # counts how many times a particular string appears +print(x.count("ing")) ``` -### 2.2 Strings +The next operation we'll look at involves repeating strings. -Strings store all text data. For data to be stored as a string, it must be written within quotation marks. For example, +Let's say we have a dataset that contains many dates and the weather, and we want to write a sentence saying "Month Day, Year was X", where "X" would be the weather. We can do this easily with **string formatting**, which allows us to use a generic **placeholder** to replace each individual value of the variable. To do so, we would replace the specific value we are thinking of with the variable name in squiggly brackets `{}` when we are writing out the sentence. Let's take a look at an example. ```{python} -"this is a string" # quotation marks! -'this is also a string' # ' ' also works! -this is not a string # do you see the syntax warning? +month = "January" +day = "24th" +year = "2025" +X = "sunny" + +# we include f at the beginning for Python to interpolate what is between the `{}` +sentence = f"{month} {day}, {year} was {X}." 
+print(sentence) ``` -As we saw above when learning the `type()` function, Python identifies strings as `str`. You can see this again below: +We can also use string formatting for calculations. To do so, we'd substitute the calculation we want to do into the `{}`. + +We can do this for basic math as such: ```{python} -type(first_name) +print(f"{5}**2 = {5**2}") ``` -### 2.3 Booleans +And we can do this within sentences as well. Let's say we want Canada's 2024 GDP ($2.515 trillion) in billions. Instead of typing out the number, we could just write + +```{python} +GDP = 2.515 +string = f"Canada's GDP was ${GDP * 1_000} billion in 2024" +print(string) +``` -Booleans are the last data type we will look at. Booleans simply denote true or false. They will be useful later on, when we start working with operations. For now, let's create some booleans and see what they look like. +If we want to reuse a string we created a format for earlier, we can use the method `format`. To do this though, we do **not** put `f` before the string. For example, ```{python} -x = True # notice how we aren't putting True in quotation marks. What do you think would happen if we did? -y = False +sentence = "{month} {day}, {year} was {X}." + +sentence.format(month = "February", day = "28th", year = 2016, X = "rainy") ``` +There's a lot more we can do with string formatting. If you want to learn more, you can find information [here](https://docs.python.org/3/library/string.html). + +#### Working with Booleans + +Most of the time, booleans will be created through comparison operations. For example, we might want a variable that evaluates if someone is older than 18. We would thus create a boolean which is `True` if the age of the individual is greater than 18. 
+ +For two variables `x` and `y`, we can do the following comparisons: + +- Greater than: `x > y` +- Less than: `x < y` +- Equal to: `x == y` +- Greater than or equal to: `x >= y` +- Less than or equal to: `x <= y` + +Sometimes, we will prefer to determine if a statement is **not true** or **not false**. This is called **negating** a statement. We can negate a boolean in Python by adding the keyword `not` before `True` or `False`. + +We might also want to do multiple comparisons. For example, we might want to look at all individuals who are employed **and** have children, or we might want to look at individuals with cars **or** bicycles. We can do this in Python using the *mathematical* **ands** and **ors**. This means that + +- `a and b` is `True` only when **both** `a` **AND** `b` are `True` +- `a or b` is `True` when **at least one** of `a` or `b` is `True` + +Using the examples above: +- The statement "We are studying individuals who are employed *and* have children" means we will only include individuals that are both employed *AND* who have children +- The statement "We are studying people with cars *or* bicycles" means we will look at everyone who has a car, everyone who has a bicycle, and everyone who has a car and a bicycle. + +We can also process *all* our booleans at once! + +To do so, we can use the functions **all** and **any**. These functions will test all the booleans we're working on at the same time. + +- `all(bools)` will tell us if all the booleans in `bools` are `True`. If they are all true, Python will return `True`. If even one of them is not true, Python will return `False`. +- `any(bools)` will tell us if we have any booleans that are `True`. If even one of our booleans is true, Python will return `True` when running this command. + +#### Switching between Object Types + +Sometimes, we will want to transform our variables from one type to another. + +To convert a variable to a string, we use `str()`. 
+ +To convert a variable to an integer, we use `int()`. + +To convert a variable to a float, we use `float()`. + +### 1.2: Collections + +Often, when we are working with many different variables, we will need to store all of our objects in a larger, more complex form, called **collections** in Python. Typically, we will store these objects in either a `list[]` or a `tuple()`. These two object types contain *ordered* collections of items. + +The main differences between tuples and lists are: + +- Tuples are created using parentheses instead of square brackets -- ( ) instead of [ ]. +- Tuples are *immutable*, meaning they can't be changed after they are created. +- Tuples are connected to multiple return values, which we will see later. + +Lists can be converted to tuples by using the `tuple()` function on a list. Tuples can be converted to lists by using the `list()` function. + +#### Working with Lists + +Let's take a look at some of the main operations we can do with lists. + +- Indexing: we can **index** a list called `mylist` by using the command `mylist[i]`, where `i` is an integer. What we are doing here is selecting an element of the **collection** `mylist`. It's important to note that **Python starts counting at ZERO!** Thus to access the first element of our list, we must write `mylist[0]`. + - We can combine indexing with the `type()` function to access the type of a specific entry within a list. +- Counting: we can determine how many items are in a list using the function `len`. +- Containing: we can use the keyword `in` to check if a list contains a specific entry. +- Reversing: we can reverse the list with `mylist.reverse()`. +- Sorting: we can sort the list with `mylist.sort()`. + - Note that to do this, all of the elements in our list need to be either numbers (integers or floats), or all strings. If our list is all strings, `mylist.sort()` will sort the items in alphabetical order. 
+- Appending: add another element to the end of the list using `mylist.append(element)`. + - Note that if we append another list to our list, we add the list itself to the end rather than the numbers or strings in that list. We can *combine* lists instead using `mylist.extend()`. +- Range: the `range()` function is frequently used in three ways: + - `range(n)` goes from $0$ to $n-1$. + - `range(a, n)` goes from $a$ to $n-1$. + - `range(a, n, d)` goes from $a$ up to at most $n-1$, in steps of `d`. + We can also save objects as ranges. + +Although Python will let you make lists with multiple object types (e.g. strings and numbers), be warned that you may run into some additional complications. For example, how would you sort a list that has both letters and numbers? Therefore, we suggest being extra careful. + +#### Working with Tuples + +As with lists, we can index tuples. The syntax is the same, and remember: **Python starts counting at ZERO!** + +#### Lists or Tuples? + +Which one should we use? It depends on many factors. + +- What types of objects you are storing; +- Whether we want to reorder elements; +- Whether we want to add elements later on. + +Let's look at an example to understand this. + +Say we have an individual, Jerry, and some information on him: his age, weight, height, and favourite ice cream flavour. In this case, we would want to use a *tuple*, because the order between the elements is meaningless (i.e. we could have put Jerry's height before his weight), and adding more data would require a reinterpretation of the whole data structure (i.e. we don't know what this new element would mean!). + ```{python} -type(x) +jerry_2024 = ("Jerry", 2024, 80, 183, "vanilla") +print(jerry_2024) ``` +However, if we are looking at Jerry's favourite ice cream flavour over his whole life, a *list* would be a lot more useful!
Adding a new element at the end, say Jerry's favourite ice cream flavour the next year, would make sense and would not change the meaning of the dataset. + ```{python} -x +jerry_2023_2025 = ["strawberry", "vanilla", "mint"] +print(jerry_2023_2025) ``` +Some things might be best as tuples *and* lists! For example, what if we wanted Jerry's age, weight, height, and favourite ice cream flavour *over many years*? + ```{python} -y +jerry = [(2023, "strawberry"), (2024, "vanilla"), (2025, "mint")] +print(jerry) ``` -## 3. Operations +Overall, we suggest *always* using a list unless you *need* to use a tuple. You would need to use a tuple if: -We'll now take a look at different operations or functions we can run in Python. Many functions are type-dependent, meaning they only work for certain variable types, so we will go through each type and their associated functions. +- the order of each element *cannot* change; +- the actual values of each element *cannot* change; +- or you want to use the collection as a key in a `dict`, which we will see soon. -### 3.1. Operations on Numbers +#### `zip` and `enumerate` -#### Python as a calculator +`zip` and `enumerate` are two useful functions that combine lists and tuples. -The first thing we can do in Python is basic math. When our variables are numbers (either `int` or `float`), we can easily use Python to add, subtract, and divide numbers (there are many more things Python can do!). Let's give this a try below. +The `zip` function allows us to combine collections into a sequence of tuples, where each tuple pairs up the entries found at the same position in each collection. This is best understood through an example. + +Let's say we have two lists, one containing years and the other containing GDPs for each year. ```{python} -a = 12 -b = 2 - -print(a + b) # We ask Python to print all the answers so we can see everything. -print(a - b) # Otherwise, Python would only show us the most recent result. -print(a * b) -print(a / b) -print(a ** b) # Note that Python uses ** for exponentiation.
+gdp = [2.161, 2.142, 2.515] +year = [2022, 2023, 2024] +z = zip(year, gdp) +# let's see what's inside by converting the zip to a list +list(z) ``` -Did you notice that Python returned an integer for almost every operation? It did however convert the answer to a float for the division operation. Whenever possible, Python will return an integer type for operations between integers, but any operation involving a float will result in a float. See below +Notice now we have a list where each item is a tuple! Each tuple contains one entry from each of the two collections we passed into the zip function. Note that the first entry from `gdp` is matched with the first entry of `year`, and so forth. We can access an element of the resulting list and then unpack the tuple directly into variables. + +The `enumerate` function assigns an index to each item in the collection we put in the function. See the example below: ```{python} -a = 12.0 -b = 2 - -print(a + b) -print(a - b) -print(a * b) -print(a / b) -print(a ** b) +e = enumerate(["a", "b", "c"]) +list(e) ``` -Python also allows us to do multiple operations in one line. As you would expect, it follows the standard order of operations PEMDAS (parentheses, exponents, multiplication, division, addition, subtraction). We'll try this below - what do you think c and d will be? You can check your answers by using the `print()` function. +See that the first element of each tuple is the *index* of the second element of that tuple. -```{python} -a = 4 -b = 7 +#### Associative Collections - Dictionaries -c = a - b / b -d = (a - b) / b -``` +Dictionaries associate keys with values, similarly to how our dictionaries associate words to definitions. To create a dictionary, do the following: -#### Other math functions +`example_dictionary = {key1: value1, key2: value2, key3: value3}`. Note the use of curly brackets here. What is crucial to recall when using dictionaries is that the syntax relies on pairs of keys and values, each separated by commas.
The key is typically a string, whereas the value can be anything. -If we want to use other math functions, such as taking the sine or cosine of a number, we need to use the `math` package. As we learned earlier, to use a package, we write the package name (or our nickname for it) followed by a `.` and then we can click [TAB] to look through the available functions. So, if we wanted to take sin(2), we would do the following: +To find the value of a particular key, we use `d[k]`, where `d` is the dictionary and `k` is the particular key we want to know the value for. We can add new items to the dictionary using `d[new_key] = new_value`. -```{python} -import math # don't forget to import the package first! +There are a few common operations we will use with dictionaries. -math.sin(2) -``` +- `update(newdict)` adds the key-value pairs from `newdict` to the dictionary, replacing the values of any keys that already exist; this is how we combine two dictionaries +- `keys()` returns the keys, so we can check if a key is in the dictionary or loop through the keys +- `values()` returns the values, so we can check if a value is in the dictionary or loop through the values +- `items()` loops through the key and value pairs in a dictionary +- `len(dict)` gives us the number of key-value pairs in the dictionary +- `list(dict.keys())` lists out all the keys +- `list(dict.values())` lists out all the values +- `get(key)` gets the value associated with `key`, returning `None` (or a default we provide) if the key is not in the dictionary. -#### Floor and Modulus Division +## Part 2: Flow Control -The last mathematical operation we will look at is floor and modulus division. This relates to division concepts we learned when initially going through division: quotients and remainders. Let's say we are dividing `x` by `y`, two numbers. +Flow control statements tell Python which lines of code to run or not to run and when. To do this, we will use **booleans** which we discussed before. -- Floor division: `x // y` will return the quotient, or the number of times the divisor goes into the dividend.
-- Modulus division: `x % y` will return the remainder. +In general, flow statements will look like this: -Let's try it below. Can you figure out what the operations will return if x is 44 and y is 7? +`In this case:` +` Do this action` -```{python} -x = 44 -y = 7 +Typically, to define the case(s), we use our booleans to set a condition, followed by a colon (:). Then, on an indented line, we'd write out the action we want completed. If the condition is evaluated as `True`, Python will execute the action, whereas if the condition is evaluated as `False`, Python will skip the action line. It's important to remember the indentation, because this determines which lines belong to the flow statement. If you do not indent the action, Python will assume there is no action to do following the condition, and the code will stop with an `IndentationError`. -#print(x // y) #uncomment to check your answers -#print(x % y) -``` +### 2.1: Types of Flow Control Statements -### 3.2. Operations on Strings +|Statement | Meaning| +|---------|------------| +|`if` |run *if* the condition is fulfilled| +|`elif`| run *if* no previous conditions were met *and* this condition is met (i.e., else if)| +|`else` |run if no condition is met -- no need to specify a condition| +|`while`|run *while* the condition is true| +|`for` |run the code in a loop *for* this many times| +|`try` |*try* this and run the *except* action if there is an error| -#### Arithmetics +#### 2.1.1: `if` statements -Some of the arithmetic operations we learned above can also be used for strings. -- We can put two strings together by using `+` -- We can repeat a string `n` times by writing `n*variable` +The most basic flow control statement we will see is the `if` statement. + +`if` statements begin with a statement that will be evaluated as true or false. **If** the statement is *true*, Python will do the action, and **if** the statement is *false*, Python will skip this action. -Let's try this below!
+Let's look at this with a very simple example. For this example, we want to create a statement that evaluates whether someone is a student or not. ```{python} -x = "Good" -y = "Morning" +student = input('Are you a student? (Yes or No)') # here we are creating a variable student to hold the user's input in a string -print(x + y) -print(3 * x ) +if student == 'Yes' or student == 'yes': # we are using an if statement: if the variable student has the input yes + print('Good luck in your classes!') # print a response ``` -Note that if we were to try to use `*`, `-`, or `/` with strings, Python would return an error. You can see for yourself below. +This statement will only produce an output if the user is a student. If they are not and type anything other than 'yes', the program ends. We can change this though, using what is called an `else` statement. -```{python} -#x * y # uncomment and run these lines to see the error message -#x - y -#x / y -``` +#### 2.1.2: `else` statements -#### String methods (operations) +`else` statements are slightly different from `if` statements. They don't require a condition: this means they run when nothing else works, no matter what. Following the example we did above, we have the block -There are a ton of different operations we can perform on strings. We won't be able to look at the all in this notebook, but again if you want to take a look, simply type a `.` after the variable and press [TAB]. You can then scroll through some of the methods. +`if someone is a student:` +` perform this action` -We'll list out a few of the essential methods below. These are ones that you will frequently see when working with textual data. +We can now add to this an `else` statement + +`else:` +` perform this action`. + +What this is doing is telling Python that if the user inputs anything other than 'yes' in the variable `student` (as we defined before), we'd like Python to perform a new action. Let's see it in action! 
```{python} -x = "Good Morning" +# we'll use the same code from above +student = input('Are you a student? (Yes or No)') # here we are creating a variable student to hold the user's input in a string -# Again, you don't need to write print() each time. We are only doing this so that you can see each output and compare -print(x.lower()) # makes everything lower case. don't forget the extra parentheses!! -print(x.upper()) # makes everything upper case. don't forget the extra parentheses!! -print(x.count("o")) # counts how many times a particular string appears -print(x.count("ing")) +if student == 'Yes' or student == 'yes': # we are using an if statement: if the variable student has the input yes + print('Good luck in your classes!') # print a response +else: + print('What do you do?') ``` -#### String Formatting +#### 2.1.3: `elif` statements -The next operation we'll look at involves repeating strings. +Sometimes, we might not want the second condition to be run no matter the input: we might want the second condition to be run *specifically if another condition is satisfied*. In this case, we will want to use an *else if* statement, which is written as `elif` in Python. -Let's say we have a dataset that contains many dates and the weather, and we want to write a sentence saying "Month Day, Year was X", where "Month" would be the month, "Day" would be the date, "Year" would be the year and "X" would be the weather. We can do this easily with **string formatting**, which allows us to use a generic **placeholder** to replace each individual value of the variable. To do so, we would replace the specific value we are thinking of with the variable name, in squiggly brackets `{}`, when we are writing out the sentence. Let's take a look at an example. +The `elif` statement will always come after an `if` statement and before `else` statements. You can include as many `elif` statements as you would like. 
Combining the three sections before, `elif` statements will look like this: -```{python} -month = "January" -day = "24th" -year = "2025" -X = "sunny" +`if condition A is True:` +` perform action A` +`elif condition B is True:` +` perform action B` +`else:` +` perform action C` -sentence = f"{month} {day}, {year} was {X}." -print(sentence) -``` +Remember, `else` statements do not take conditions! -The `f` at the beginning of the string allows Python to interpolate what is between the `{}`. +>**Note**: `elif` statements are read from top to bottom. This means that Python will only run the action for the first `if` or `elif` condition that is true. All of the following `elif` statements will be **ignored**, even if they are true. This means that only **one** action is executed when using `elif` statements. -We can also use string formatting for calculations. To do so, we'd substitute the calculation we want to do into the `{}`. +#### 2.1.4: `while` loops -We can do this for basic math as such: +`while` loops are commands that are run repeatedly for as long as a certain condition is evaluated as true, stopping once it is evaluated as false. They are run as follows: -```{python} -print(f"{5}**2 = {5**2}") -``` +`while this condition is True:` +` do this action` -And we can do this within sentences as well. Let's say we want Canada's 2024 GDP ($2.515 trillion) in billions. Instead of typing out the number, we could just write +What Python will do is evaluate the condition, and if the condition is true, it will run the action. Then, Python returns to the condition and evaluates that same condition again, continuing to do the action until the condition is evaluated as false. + +Let's look at an example of this. We'll ask users to guess a number, and loop "guess again" until they get the number right.
```{python} -GDP = 2.515 -string = f"Canada's GDP was ${GDP * 1_000} billion in 2024" -print(string) -``` +# let's set the number to 8 +secret_number = str(8) -If we want to reuse a string we created a format for earlier, we can use the method `format`. To do this though, we do **not** put `f` before the string. For example, +guess = input('pick a number from 1 to 10') -```{python} -sentence = "{month} {day}, {year} was {X}." +# now, we check the users answer! +while guess != secret_number: # while the guess is not equal to the secret number + guess = input('guess again!') # allow the user to change the number -sentence.format(month = "February", day = "28th", year = 2016, X = "rainy") +print('Correct!') ``` -There's a lot more we can do with string formatting. If you want to learn more, you can find information [here](https://docs.python.org/3/library/string.html). +>**Note**: Beware of infinite loops! These arise when the condition in the `while` loop *never* evaluates as False. To stop these loops from running forever, you'll want to interrupt the kernel. Click on **Kernel** and then **Interrupt** the kernel. -### 3.3. Operations on Booleans +#### 2.1.5: `for` loops using `range()` -#### Comparisons +Let's say we want to repeat a loop a specific number of times. One way to do this is using the `while` loops we were discussing before. See below: -Most of the time, booleans will be created through comparison operations. For example, we might want a variable that evaluates if someone is older than 18. We would thus create a boolean which is `True` if the age of the individual is greater than 18. 
+```{python} +# this loop will print out numbers from 0 to 4 +i = 0 # this counts how many loops have been completed -For two variables `x` and `y`, we can do the following comparisons: +while i < 5: + print(i) + i = i + 1 +``` -- Greater than: `x > y ` -- Less than: `x < y` -- Equal to: `x == y` -- Greater than or equal to: `x >= y` -- Less than or equal to: `x <= y` +However, there is a simpler way to do this using `for` and `range()`! In general, this will look like this: -We'll try this below with a simple example. Feel free to change the values to see for yourself! +`for i in range(j): ` +` do this action` -```{python} -x = 1 -y = 2 +where `i` is a generic variable for counting and `j` is the number of times you want to repeat the action. Remember, **Python starts at zero**, so the starting value of `i` is always 0, and `i` increases by one on each pass until it reaches `j - 1`. `i` and `j` are placeholders: you can use any variable name in place of `i` and any whole number in place of `j`. Let's look at an example below: -print("x > y is", x > y) -print("x < y is", x < y) -print("x <= y is", x <= y) +```{python} +for num in range(4): + print(num) ``` -#### Negation +#### 2.1.6: `continue`, `break`, `try`, and `except` -Sometimes, we will prefer to determine if a statement is **not true** or **not false**. This is called **negating** a statement. We can negate a boolean in Python by adding the statement `not` before `True` or `False`. +Four other flow commands we will work with are: -```{python} -print(not True) -print(not False) -``` +- `continue`: this immediately skips the rest of the current pass and moves on to the next iteration of the loop +- `break`: this immediately ends and exits the loop +- `try`: this is useful when we are running code that may have an error and we want to test the workflow +- `except`: this is used with `try` to run an alternative action if the code in the `try` statement raises an error -#### Multiple Comparisons using **and** / **or** +## Part 3: Functions -Sometimes, we will want to do multiple comparisons.
For example, we might want to look at all individuals who are employed **and** have children, or we might want to individuals with cars **or** bicycles. We can do this in Python using the *mathematical* **ands** and **ors**. This means that +Functions are another important part of any coding language. We touched upon some basic functions above (like adding two numbers), but here, we will dive into more complex functions that we can use with our data. There are 3 big reasons why we might want to use a function. -- `a` and `b` are `True` only when **both** `a` **AND** `b` are `True` -- `a` or `b` is `True` when **at least one** of `a` or `b` is `True` +1. Reusability: We might have a set of tasks we need to do multiple times. It is easier to write a function to do these tasks and have the function run over the data rather than manually redoing each task. +2. Organization: It will be useful to keep different operations organized and separated to keep track. +3. Sharing: It is easier to share functions than to share each individual step of code. Functions can be run on different datasets, so this allows others to use them. -Using the examples above: -- The statement "We are studying individuals who are employed *and* have children" means we will only include individuals that are both employed *AND* who have children -- The statement "We are studying people with cars *or* bicycle" means we will look at everyone who has a car, everyone who has a bicycle, and everyone who has a car and a bicycle. +### 3.1: How to Write a Function -Let's test these out below. Try thinking of the answers yourself before running the code! +Functions will all follow the same syntax. ```{python} -True and False # do you think this will be true or false? +def function_name(inputs): + # step 1 + # step 2 + # ... + return outputs ``` -```{python} -True and True -``` +Here, `def` tells Python that we are defining a new function, and `return` tells Python what to show us. 
+ +Here is a simple example in which we calculate the average of some numbers. Note that we are naming the function `mean`, the inputs will be a list `numbers`, and the function has 3 steps. It will return `answer`. ```{python} -True or False +def mean(numbers): + total = sum(numbers) + N = len(numbers) + answer = total / N + + return answer ``` +We can then **call** the `mean` function by doing the following: + ```{python} -False or False +x = [1, 2, 3, 4] +avg = mean(x) ``` +Here, we are assigning values to `x` which will represent the input, `numbers`, and calling the output `avg`. + +It's important to note that we can have many inputs in a function. +Additionally, it will be important to keep track of our *indentations*. This will determine what is and what is not a part of our function. + +### 3.2: Variable Scope + +Notice how when we defined the function, the input was `numbers` and the output was `answer`, whereas when we called the function the input was `x` and the output `avg`. This has to do with **variable scope**. + +In Python, functions have scopes for variables. This means that regardless of what we call the input or the output (which we called `x` and `avg`), the function itself reads the input as `numbers` and the output as `answer`. It also means that the `numbers` and `answer` we refer to in the function **only exist within the function**. See below: + ```{python} -# we can chain multiple comparisons! -# order of operations will still apply -True and (False or True) +#print(numbers) +#print(answer) ``` -#### **all** and **any** +What this tells us is that if we want to use the output of a function, we need to save it under a new variable. We cannot manipulate `answer` without assigning it to a new variable. For example, we assigned it to `avg`. We don't have to do this - we can still see the output without assigning it to something else. However, if we wanted to use the answer for another operation, we would need to save it.
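As a rough illustration of the scope rule described above, the sketch below redefines a `mean` function like the one in this section and then tries to read `answer` from outside it. It assumes a fresh session in which no variable called `answer` has been created, and it uses `try` and `except` only so the error message is printed instead of stopping the code.

```python
def mean(numbers):
    total = sum(numbers)            # `total` only exists inside the function
    answer = total / len(numbers)   # so does `answer`
    return answer

avg = mean([1, 2, 3, 4])  # save the returned value under a new name
print(avg)

try:
    print(answer)  # assumption: `answer` was never defined outside the function
except NameError as err:
    print("NameError:", err)
```

The value we saved in `avg` is available for later operations, while looking up `answer` outside the function raises a `NameError`.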
-As we saw, we can use **and** and **or** to process two booleans at a time. We can also process *all* our booleans at once! +This also applies to the other intermediate steps within the function. For example, we can't access `total` and `N` - Python will return an error. -To do so, we can use the functions **all** and **any**. These functions will test all the booleans we're working on at the same time. +This means that we can reuse names. `total` can be used for both a part of the function, and as the name we assign to the output. -- `all(bools)` will tell us if all the booleans in `bools` are `True`. If they are all true, Python will return `True`. If even one of them is not true, Python will return `False`. -- `any(bools)` will tell us if we have any booleans that are `True`. If even one of our booleans is true, Python will return `True` when running this command. +```{python} +def mean(numbers): + total = sum(numbers) + N = len(numbers) + answer = total / N + + return answer + +y = [10, 11, 15, 523] +total = mean(y) +``` -## 3. Packages +## Part 4: Packages Similarly to R, Python has a host of packages that we can use that contain different functions and tools. Some examples of packages are: - pandas, which implements the tools necessary to do scalable data analysis. - - matplotlib, which contains visualization tools. - - requests and urllib, which allow Python to interface with the internet. We'll be using packages all throughout the Python modules, so it will be important to learn how to install them and how to open them. To install a package, we simply run the code `import package`. To access and open the package that we've installed in our session, we run the code `package.function_name`. Let's do this below with the package `sys`, which helps Python work with our computer. 
+ ```{python} import sys sys.version # We want to find the Python version our computer is using @@ -446,18 +573,15 @@ Some packages have fairly long names, so Python has allowed us to abbreviate the Typically, people use the following nicknames for packages: - import pandas as pd - - import numpy as np - - import matplotlib as mpl - - import datetime as dt In theory, you can abbreviate the packages to any nickname you'd like, but for simplicity and comprehensibility, we recommend using the common nicknames listed above. -## Part 3: Dealing with Errors and Getting Help +## Part 5: Dealing with Errors and Getting Help -### 3.1. Errors +### 5.1: Errors Sometimes in our analysis we can run into errors in our code. This happens to everyone - don't worry - it's not a reason to panic. Understanding the nature of the error we are confronted with can be a helpful first step to finding a solution. There are two common types of errors: @@ -467,7 +591,7 @@ Sometimes in our analysis we can run into errors in our code. This happens to ev Now that we have all of these terms and tools at our disposal, we can begin to load in data and operate on it using what we’ve learned. -### 3.2. Getting Help +### 5.2: Getting Help If you are ever running a function and get stuck, are not sure what the function does, or need a refresher on what the inputs are, Python has a way to get help.