W3.2 Visualizing

Review

Retrieval

Fill in the table using variables from the health and wealth gapminder plot.

Variable    Type    Visual Cue
gdpPercap   ___     ___
lifeExp     ___     ___
continent   ___     ___
pop         ___     ___
country     ___     ___
year        ___     ___

Q&A

Why does data type matter?

Is date continuous or discrete?

Composing a plot

  • What’s the data? “Each row is a ___”
  • What is the coordinate system? (What’s x and y?)
  • What graphical symbols are used? (dot? bar? line?)
  • What data variables are mapped to what visual cues (aesthetics)?
    • What scales are used? (Any transformations?)
    • What guides are shown? (What labels for values?)
  • What labels and annotations are added?

Grammar of Graphics (2005)

Concisely describe the components of a graphic

In Plotly…

e.g., px.scatter(data, ...):

  • Aesthetics (visual cues): x, y, color, symbol, size, animation_frame, text
  • Scale: log_x, log_y, range_x, range_y
  • Guides: labels
  • Faceting: facet_row, facet_col, facet_col_wrap
  • Other context: title, hover_name (tooltip), etc.

Other details can be customized with update_layout, update_traces, etc.

(
  px.scatter(votes, x='year', y='percent_yes', color='country',
    facet_col='issue', facet_col_wrap=3, facet_col_spacing=.1,  # faceting
    trendline='lowess',  # trendline
    labels={"percent_yes": "% Yes Votes", "year": "Year", "country": "Country"},  # labels
    title="Percentage of 'Yes' votes in the UN General Assembly",  # plot title
  ).update_traces(marker_size=2)  # sets size directly (not based on data)
)

(See notes for details.)

The Palmer Penguins

import pandas as pd
import plotly.express as px

penguins = pd.read_csv("https://raw.githubusercontent.com/allisonhorst/palmerpenguins/main/inst/extdata/penguins.csv")
penguins.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 344 entries, 0 to 343
Data columns (total 8 columns):
 #   Column             Non-Null Count  Dtype  
---  ------             --------------  -----  
 0   species            344 non-null    object 
 1   island             344 non-null    object 
 2   bill_length_mm     342 non-null    float64
 3   bill_depth_mm      342 non-null    float64
 4   flipper_length_mm  342 non-null    float64
 5   body_mass_g        342 non-null    float64
 6   sex                333 non-null    object 
 7   year               344 non-null    int64  
dtypes: float64(4), int64(1), object(3)
memory usage: 21.6+ KB
px.scatter(
    penguins,
    x='bill_depth_mm', y='bill_length_mm', color='species',
    title="Bill length vs depth",
    labels={
        "bill_depth_mm": "Bill Depth (mm)",
        "bill_length_mm": "Bill Length (mm)",
        "species": "Species"
    },
)

Color vision deficiency 1

  1. Redundant encoding: use two visual cues to encode the same variable
px.scatter(penguins,
    x='bill_depth_mm', y='bill_length_mm', color='species', 
    symbol='species',
    title="Bill length vs depth",
    labels={"bill_depth_mm": "Bill Depth (mm)", "bill_length_mm": "Bill Length (mm)", "species": "Species"},
)

Color vision deficiency 2

  1. Redundant encoding: use two visual cues to encode the same variable
  2. Color palette
px.scatter(penguins,
    x='bill_depth_mm', y='bill_length_mm', color='species', 
    symbol='species',
    title="Bill length vs depth",
    labels={"bill_depth_mm": "Bill Depth (mm)", "bill_length_mm": "Bill Length (mm)", "species": "Species"},
    color_discrete_sequence=px.colors.qualitative.Safe
)

Mapping vs Setting

Mapping

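# penguins2 is not defined on these slides; it is assumed here to be penguins
# with missing values removed (e.g. penguins.dropna()), since size= needs
# a value for every row.
px.scatter(
    penguins2, x='bill_depth_mm', y='bill_length_mm', 
    color='species', size='body_mass_g', size_max=6,
)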

Setting

(
  px.scatter(penguins2, x='bill_depth_mm', y='bill_length_mm')
  .update_traces(marker_color='red', marker_size=4)
)

Mapping vs Setting Defined

  • Mapping: the color, size, etc. of each point is determined by the value of a variable in the data
    • goes into the initial plot function (px.scatter())
    • Plotly computes the actual value using its scales
  • Setting: the color, size, etc. of the points is fixed, not based on any variable in the data
    • in Plotly we update the “traces” after creating the plot
    • Plotly uses the value you give it directly (sets the color to literally “red”)
    • e.g., “make all the points smaller”

Faceting (Small Multiples)

px.scatter(
    penguins,
    x='bill_depth_mm', y='bill_length_mm', color='species', 
    facet_col="island",
    width=1000
)

Visualization Design Questions

Bill depth vs length?

px.scatter(
    penguins,
    x='bill_depth_mm', y='bill_length_mm', 
    trendline='ols',
)

Simpson’s Paradox

px.scatter(
    penguins,
    x='bill_depth_mm', y='bill_length_mm', 
    trendline='ols',
    color='species',
)
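
The two plots above differ only in `color='species'`, which makes Plotly fit a separate trendline for each group. A pooled trend can point in the opposite direction from every per-group trend. The sketch below shows the arithmetic in plain Python on made-up numbers (the groups "A" and "B" and their values are hypothetical, not penguin measurements); `ols_slope` is a helper written for this example, not a Plotly function.

```python
def ols_slope(xs, ys):
    """Least-squares slope of y on x (the slope of an OLS trendline)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


# Made-up (depth, length) values for two hypothetical groups.
groups = {
    "A": ([18.0, 19.0, 20.0], [38.0, 39.0, 40.0]),
    "B": ([14.0, 15.0, 16.0], [46.0, 47.0, 48.0]),
}

# Slope fitted separately within each group.
within = {name: ols_slope(xs, ys) for name, (xs, ys) in groups.items()}

# Slope fitted to all points at once, ignoring the grouping.
all_x = [x for xs, _ in groups.values() for x in xs]
all_y = [y for _, ys in groups.values() for y in ys]
pooled = ols_slope(all_x, all_y)

print("within-group slopes:", within)
print("pooled slope:", pooled)
```

Within each group the slope is positive, yet the pooled slope is negative because group B sits at lower depth and higher length than group A.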

Most commonly observed species?

px.pie(
    penguins,
    names='species',
    title="Penguin Species",
)
px.bar(
    penguins,
    x='species',
    title="Penguin Species",
)

Even slightly better

px.bar(
  penguins, x='species', height=400,
)
px.histogram(
  penguins, x='species', height=400,
)
  • px.bar: draws one mark for each row and stacks them (notice the little gaps within each bar)
  • px.histogram: counts how many rows have each species and draws a single mark for each species
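
The bookkeeping behind that difference can be sketched in plain Python. This is an illustration of the idea on a made-up species column, not Plotly's actual implementation.

```python
from collections import Counter

# Made-up species column, one entry per row of a hypothetical table.
species = ["Adelie", "Gentoo", "Adelie", "Chinstrap", "Adelie", "Gentoo"]

# px.bar-style: one mark per row, each of height 1, stacked at its x position.
marks_per_row = [(name, 1) for name in species]

# px.histogram-style: aggregate first, then draw one mark per category.
marks_per_category = Counter(species)

# Both give the same total height per species; only the number of marks differs.
stacked_heights = Counter()
for name, height in marks_per_row:
    stacked_heights[name] += height

print(len(marks_per_row), "marks when drawing one per row")
print(len(marks_per_category), "marks when counting first")
```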

Pie charts are useful for

  • Evaluating whether one category is more or less than half
  • Making the same judgment for groups of adjacent categories
  • Cueing “part of a whole”

But they are not useful for comparing categories with each other.

For that, just use a bar chart.

Your Turn

Extended activities in the plotting tutorial