Exercise 1: Warmup

Learning Goals

When you’ve finished this exercise, you’ll have some basic experience with RStudio, Python, data frames, and plotting.

Getting Help

If you haven’t yet done this week’s tutorials, I recommend finishing them before starting this activity. You may want to have them open for reference too.

If you have any questions about the assignment, please post them on Perusall!

Getting started

  1. Launch RStudio by clicking on the link in Moodle and starting a new Session.
  2. If you haven’t already done so, create a new project for your class exercises; call it data202-exercises and place it in the rprojects folder in your home folder.
  3. Create an ex01 folder in your project (the New Folder button is on the toolbar of the Files pane).
  4. Create a new Quarto document called ex01.qmd in the folder you just created.
  5. Make sure the Source mode is selected in the top left corner of the editor pane. (The next step won’t work in Visual mode, but you can switch back to Visual mode afterwards.)
  6. Expand the block below, click the Copy button in the top-right, then Select All in your ex01.qmd document and paste the copied text, replacing all existing text in the document.
---
title: "Title Goes Here"
author: "Author Name(s) Goes Here"
format:
  html:
    embed-resources: true
    code-tools: true
    code-fold: true
---

# Load Packages

```{python}
import pandas as pd
import plotly.express as px
```


```{python}
#| include: false
# DATA 202 hack for displaying plotly within RStudio:
if 'r' in globals() and r['.Platform$GUI'] == "RStudio" and r['suppressMessages(requireNamespace("htmltools"))']:
  r[".GlobalEnv$to_html <- function(x) { print(htmltools::HTML(x)) }"] and None
  def show_plot(p): r.to_html(p._repr_html_())
else:
  def show_plot(p): return p
# End hack
```


# Read Data

```{python}
daily_rides = pd.read_csv("https://calvin-data-science.github.io/data202/data/bikeshare/day_simple.csv", parse_dates=["date"])
```

# Exercise 1

Replace this sentence with your short answer about rows and what each row represents.

# Exercise 2

The following plot shows the total number of rides each day.

```{python}
# your code here
```

# Exercise 3

The following plot shows the total number of rides for weekdays vs weekends.

```{python}
# your code here
```



# Exercise 4

Replace this sentence with your short answer about interpreting the graph.

# Reflections

Write a sentence or two of your reflections on this exercise here.

The document contains placeholders for the following exercises.

The Data

Let’s imagine we’re hired by the administrators of the Capital Bikeshare program in Washington, D.C., to help them understand and predict the hourly demand for rental bikes. We will be working with this data in several activities throughout this semester.

This understanding will help them plan the number of bikes that need to be available at different parts of the system at different times, to avoid the case where someone wants a bike but the station is empty, or someone wants to return a bike but the station is full.

The data for this problem were collected from the Capital Bikeshare program over the course of two years (2011 and 2012). Researchers at the University of Porto obtained it from the Capital Bikeshare data website, processed the data, and augmented it with extra information, as described on this page. It has been simplified slightly for this assignment.

Load and Explore Data

The data is in a CSV (comma-separated values) file that we’ve pre-wrangled for you (you’ll do your own wrangling later). The following code block, already included in the template, uses the read_csv function in pandas to load the data into a data frame called daily_rides.

```{python}
daily_rides = pd.read_csv("https://calvin-data-science.github.io/data202/data/bikeshare/day_simple.csv", parse_dates=["date"])
```
Note

The parse_dates argument tells pandas to convert the date column to a date object, rather than leaving it as a string. (CSV files don’t usually specify the data type of each column, so we need to tell pandas to interpret those as dates and not as strings like “2011-01-01”.)
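
To see what parse_dates changes, here is a minimal sketch on a tiny made-up CSV held in memory. The column names and values (date, visitors) are invented for illustration and are not taken from the bikeshare file.

```python
import io
import pandas as pd

# A tiny made-up CSV (hypothetical values, not the bikeshare data).
csv_text = "date,visitors\n2011-01-01,10\n2011-01-02,12\n"

# Without parse_dates, the date column is read in as plain text.
as_strings = pd.read_csv(io.StringIO(csv_text))

# With parse_dates, pandas converts the column to datetime values.
as_dates = pd.read_csv(io.StringIO(csv_text), parse_dates=["date"])

print(pd.api.types.is_datetime64_any_dtype(as_strings["date"]))  # False
print(pd.api.types.is_datetime64_any_dtype(as_dates["date"]))    # True

# Date-aware operations only work on the converted column.
print(as_dates["date"].dt.year.tolist())  # [2011, 2011]
```

Having real dates matters for plotting: a date axis can then be spaced and labeled by time, rather than treating each date as an unrelated text label.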

The variables and their descriptions are given below:

Note: the original dataset also includes rides by users who didn’t register for a Capital Bikeshare account. That data has been removed for this first assignment to keep things simple.

  • date
  • day_of_week: an integer between 0 and 6 inclusive. In a later exercise you will decode which number represents Monday, etc.
  • workingday: either “weekday” or “weekend” (which actually includes holidays too).
  • total_rides: the number of rides that day by users who registered for a Capital Bikeshare account
  1. Write a sentence answering the following two questions: How many rows does the dataset contain? What does each row represent?
Important

Write the answer outside a code chunk, like this:

There are 5 rows in the dataset. Each row represents a different type of fruit.

Not like this:

There are 5 rows in the dataset

And not like this:

```{python}
# There are 5 rows in the dataset
```

Once you’re confident with this, you can try using an asis-output code chunk; see the class notes.

Click Render to check that you can view the rendered output of your document.

  2. Create a scatter plot showing the total number of rides each day. Sample code is provided below, but you will need to fill in the blanks.
```{python}
show_plot(
  px.scatter(
    daily_rides,
    x="____", y="____",
    trendline="lowess",
    labels={"date": "____", "____": "Number of Rides"},
    ))
```

Let’s deconstruct this code:

  • show_plot() is a hack we’re using in this class to display plots within the RStudio IDE. Once this bug in RStudio is fixed we can remove this hack.
  • px.scatter is the Plotly Express (px) function we’ll use to easily construct a scatter plot. We always specify the data frame as the first argument.
  • The next arguments define the mappings between the variables in the dataset and the aesthetics of the plot (e.g. x and y coordinates, colors, etc.).
  • The trendline argument tells Plotly Express to add a smoothing trend line to the plot. (Try removing this argument to see what happens.) Lowess is a smooth curve; you can try switching it to ols to see a linear trend line.
  • Finally, the labels argument lets us make the visualization more hospitable by specifying the labels for the x and y axes. We pass a dictionary with the column names as keys and the labels as values.

The result should look like:

Click Render to check that you can view the rendered output of your document.

  3. Add a new code chunk that creates the same plot again, but add a mapping of workingday to the color aesthetic. Your result should look like:

You should start by copying and pasting from your previous code chunk.

Don’t forget to separate arguments with commas (,).

  4. Write a one- or two-sentence interpretation of the graph, focusing on the following question: How does the number of rides compare for weekdays vs weekends? Based on this, make a guess about what Capital Bikeshare riders use the bikes for.

Click Render to check that you can view the rendered output of your document.

Reflection

At the end of your document, write a sentence or two of your overall reflections on this exercise. You may write whatever you want, but you might perhaps respond to one or two of these questions:

  • Was anything unclear about this assignment?
  • How hard was it for you? Where did you get “stuck”?
  • How long did it take you?
  • What questions or uncertainties remain?
  • What skills do you think you’ll need more practice with?
  • Did you try anything out of curiosity that you weren’t specifically asked to do?
Note

We’ll respond to these reflections in class, but only in an overall sense and only once we’ve had a chance to review them. If you have a question that needs a response, please post it on EdStem.

Submitting

First, make sure that your qmd file is free of template text. (Did you change the title and author names? Did you remove the “your code here” lines? Did you delete any “replace this line” text?)

Make sure that your qmd file renders successfully by clicking Render. Spot-check that your most recent change is reflected in the rendered output. Then, submit your qmd file to Moodle. Here’s how:

How to export your document from the Posit server
  1. Find the Files pane in the lower-right pane of RStudio.
  2. Find your .qmd file.
  3. Tick the checkbox next to the file name.
  4. Under the More button (a gear icon), choose Export.
  5. Click the Download button.
  6. Upload the resulting file to Moodle (in some browsers, you can drag and drop it onto the Moodle page).

Going Further

Once you’ve finished this exercise, you might try:

  1. Normally we’ll leave code displayed, but you might try hiding the code for some of the plots.
  2. Can you use code to get the number of rows, instead of typing it out? See “Putting computed results in the document” in the notes.
  3. Can you try out some of the other options for px.scatter? Maybe make the dots smaller? See the plotly reference materials linked to from the notes.
  4. Can you reproduce the Gapminder “health and wealth” plot from the video?
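
For the second suggestion above (getting the number of rows with code), here is one possible sketch. It uses a made-up fruits data frame, echoing the fruit example earlier, rather than daily_rides. The class notes describe how to place a computed value into your document’s text; this only shows how to compute it.

```python
import pandas as pd

# Made-up data for illustration only.
fruits = pd.DataFrame({
    "fruit": ["apple", "banana", "cherry", "grape", "mango"],
    "color": ["red", "yellow", "red", "purple", "orange"],
})

# len() of a data frame gives its number of rows.
n_rows = len(fruits)

# .shape gives (rows, columns) as a pair.
n_rows_again, n_cols = fruits.shape

print(f"There are {n_rows} rows in the dataset.")
# prints: There are 5 rows in the dataset.
```

Computing the number this way means the sentence stays correct even if the data file is updated later.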