Exercise 1: Warmup
Learning Goals
When you’ve finished this exercise, you’ll have some basic experience with RStudio, Python, data frames, and plotting.
Getting Help
If you haven’t yet done this week’s tutorials, I recommend finishing them before starting this activity. You may want to have them open for reference too.
If you have any questions about the assignment, please post them on Perusall!
Getting started
- Launch RStudio by clicking on the link in Moodle and starting a new Session.
- If you haven’t already done so, create a new project for your class exercises; call it `data202-exercises` and place it in the `rprojects` folder in your home folder.
- Create an `ex01` folder in your project (the New Folder button is on the toolbar of the Files pane).
- Create a new Quarto document called `ex01.qmd` in the folder you just created.
- Make sure the Source mode is selected in the top left corner of the editor pane. (The next step won’t work in the Visual mode, but you can switch back to Visual mode afterwards.)
- Expand the block below, click the Copy button in the top-right, then Select All in your `ex01.qmd` document and paste the copied text, replacing all existing text in the document.
---
title: "Title Goes Here"
author: "Author Name(s) Goes Here"
format:
  html:
    embed-resources: true
    code-tools: true
    code-fold: true
---
# Load Packages
```{python}
import pandas as pd
import plotly.express as px
```
```{python}
#| include: false
# DATA 202 hack for displaying plotly within RStudio:
if 'r' in globals() and r['.Platform$GUI'] == "RStudio" and r['suppressMessages(requireNamespace("htmltools"))']:
    r[".GlobalEnv$to_html <- function(x) { print(htmltools::HTML(x)) }"] and None
    def show_plot(p): r.to_html(p._repr_html_())
else:
    def show_plot(p): return p
# End hack
```
# Read Data
```{python}
daily_rides = pd.read_csv("https://calvin-data-science.github.io/data202/data/bikeshare/day_simple.csv", parse_dates=["date"])
```
# Exercise 1
Replace this sentence with your short answer about rows and what each row represents.
# Exercise 2
The following plot shows the total number of rides each day.
```{python}
# your code here
```
# Exercise 3
The following plot shows the total number of rides for weekdays vs weekends.
```{python}
# your code here
```
# Exercise 4
Replace this sentence with your short answer about interpreting the graph.
# Reflections
Write a sentence or two of your reflections on this exercise here.
The document contains placeholders for the following exercises.
The Data
Let’s imagine we’re hired by the administrators of the Capital Bikeshare program in Washington, D.C., to help them understand and predict the hourly demand for rental bikes. We will be working with this data in several activities throughout this semester.
This understanding will help them plan the number of bikes that need to be available at different parts of the system at different times, to avoid the case where someone wants a bike but the station is empty, or someone wants to return a bike but the station is full.
The data for this problem were collected from the Capital Bikeshare program over the course of two years (2011 and 2012). Researchers at the University of Porto obtained it from the Capital Bikeshare data website, processed the data, and augmented it with extra information, as described on this page. It has been simplified slightly for this assignment.
Load and Explore Data
The data is in a CSV (comma separated values) file that we’ve pre-wrangled for you (you’ll do your own wrangling later). The following code block, already included, uses the `read_csv` function in pandas to load the data into a data frame called `daily_rides`.
The `parse_dates` argument tells pandas to convert the `date` column to a date object, rather than leaving it as a string. (CSV files don’t usually specify the data type of each column, so we need to tell pandas to interpret those values as dates and not as strings like “2011-01-01”.)
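To see what `parse_dates` changes, here is a minimal sketch that reads a tiny made-up CSV twice, once without and once with the argument. The CSV text and the names `csv_text`, `as_strings`, and `as_dates` are assumptions for illustration only; they are not part of the assignment data.

```python
import io
import pandas as pd

# A tiny made-up CSV standing in for the real file (hypothetical values).
csv_text = "date,total_rides\n2011-01-01,10\n2011-01-02,20\n"

# Without parse_dates, the date column is read as plain text.
as_strings = pd.read_csv(io.StringIO(csv_text))

# With parse_dates, pandas converts the column to datetime values.
as_dates = pd.read_csv(io.StringIO(csv_text), parse_dates=["date"])

# Compare the column types that pandas chose in each case.
print(as_strings["date"].dtype)
print(as_dates["date"].dtype)

# Date-specific operations only work on the parsed version.
print(as_dates["date"].dt.year.tolist())
```

Having real dates matters for plotting: a date axis can then be spaced and labeled by time rather than treated as a list of unrelated text labels.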
The variables and their descriptions are given below:
- `date`
- `day_of_week`: an integer between 0 and 6 inclusive. In a later exercise you will decode which number represents Monday, etc.
- `workingday`: either “weekday” or “weekend” (which actually includes holidays too).
- `total_rides`: the number of rides that day by users who registered for a Capital Bikeshare account
Note: the original dataset also includes rides by users who didn’t register for a Capital Bikeshare account. That data has been removed for this first assignment to keep things simple.
- Write a sentence answering the following two questions: How many rows does the dataset contain? What does each row represent?
Write the answer outside a code chunk, like this:
There are 5 rows in the dataset. Each row represents a different type of fruit.
Not like this:
There are 5 rows in the dataset
And not like this:
```{python}
# There are 5 rows in the dataset
```
Once you’re confident with this, you can try using an `asis`-output code chunk; see the class notes.
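If you would rather have pandas count the rows than count them yourself, the sketch below shows two common ways. It uses a made-up table matching the fruit example above; the data frame `fruit` and its columns are hypothetical, so swap in your own data frame when you try it.

```python
import pandas as pd

# Made-up table matching the fruit example (hypothetical data).
fruit = pd.DataFrame({
    "name": ["apple", "banana", "cherry", "grape", "mango"],
    "color": ["red", "yellow", "red", "purple", "orange"],
})

n_rows = len(fruit)          # len() of a data frame is its number of rows
n_rows_alt = fruit.shape[0]  # shape is a (rows, columns) pair

print(f"There are {n_rows} rows in the dataset.")
```

Looking at the first few rows (for example with `fruit.head()`) is a quick way to work out what a single row represents.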
Click Render to check that you can view the rendered output of your document.
- Create a scatter plot showing the total number of rides each day. Sample code is provided below, but you will need to fill in the blanks.
```{python}
show_plot(
    px.scatter(
        daily_rides,
        x="____", y="____",
        trendline="lowess",
        labels={"date": "____", "____": "Number of Rides"},
    )
)
```
Let’s deconstruct this code:
- `show_plot()` is a hack we’re using in this class to display plots within the RStudio IDE. Once this bug in RStudio is fixed we can remove this hack.
- `px.scatter` is the Plotly Express (`px`) function we’ll use to easily construct a scatter plot. We always specify the data frame as the first argument.
- The next arguments define the mappings between the variables in the dataset and the aesthetics of the plot (e.g. x and y coordinates, colors, etc.).
- The `trendline` argument tells Plotly Express to add a smoothing trend line to the plot. (Try removing this argument to see what happens.) Lowess is a smooth curve; you can try switching it to `ols` to see a linear trend line.
- Finally, the `labels` argument lets us make the visualization more hospitable by specifying the labels for the x and y axes. We pass a dictionary with the column names as keys and the labels as values.
The result should look like:
Click Render to check that you can view the rendered output of your document.
- Add a new code chunk that creates the same plot again, but add a mapping of `workingday` to the `color` aesthetic. Your result should look like:
You should start by copying and pasting from your previous code chunk.
Don’t forget to separate arguments with commas (`,`).
- Write a one- or two-sentence interpretation of the graph, focusing on the following question: How does the number of rides compare for weekdays vs weekends? Based on this, make a guess about what Capital Bikeshare riders use the bikes for.
Click Render to check that you can view the rendered output of your document.
Reflection
At the end of your document, write a sentence or two of your overall reflections on this exercise. You may write whatever you want, but you might perhaps respond to one or two of these questions:
- Was anything unclear about this assignment?
- How hard was it for you? Where did you get “stuck”?
- How long did it take you?
- What questions or uncertainties remain?
- What skills do you think you’ll need more practice with?
- Did you try anything out of curiosity that you weren’t specifically asked to do?
We’ll respond to these reflections in class, but only in an overall sense and only once we’ve had a chance to review them. If you have a question that needs a response, please post it on EdStem.
Submitting
First, make sure that your `qmd` file is free of template text. (Did you change the title and author names? Did you remove the “your code here” lines? Did you delete any “Replace this sentence” text?)
Make sure that your `qmd` file renders successfully by clicking Render. Spot-check that your most recent change is reflected in the rendered output. Then, submit your `qmd` file to Moodle. Here’s how:
- Find the Files pane in the lower-right pane of RStudio.
- Find your `.qmd` file.
- Tick the checkbox next to the file name.
- Under the More button (a gear icon), choose Export.
- Click the Download button.
- Upload the resulting file to Moodle (in some browsers, you can drag and drop it onto the Moodle page).
Going Further
Once you’ve finished this exercise, you might try:
- Normally we’ll leave code displayed, but you might try hiding the code for some of the plots.
- Can you use code to get the number of rows, instead of typing it out? See “Putting computed results in the document” in the notes.
- Can you try out some of the other options for `px.scatter`? Maybe make the dots smaller? See the plotly reference materials linked to from the notes.
- Can you reproduce the Gapminder “health and wealth” plot from the video?