Each row in this dataset represents a faculty type, and the columns are the years for which we have data. The values are the percentage of hires of that faculty type in each year.
| | faculty_type | 1975 | 1989 | 1993 | 1995 | 1999 | 2001 | 2003 | 2005 | 2007 | 2009 | 2011 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Full-Time Tenured Faculty | 29.0 | 27.6 | 25.0 | 24.8 | 21.8 | 20.3 | 19.3 | 17.8 | 17.2 | 16.8 | 16.7 |
| 1 | Full-Time Tenure-Track Faculty | 16.1 | 11.4 | 10.2 | 9.6 | 8.9 | 9.2 | 8.8 | 8.2 | 8.0 | 7.6 | 7.4 |
| 2 | Full-Time Non-Tenure-Track Faculty | 10.3 | 14.1 | 13.6 | 13.6 | 15.2 | 15.5 | 15.0 | 14.8 | 14.9 | 15.1 | 15.4 |
| 3 | Part-Time Faculty | 24.0 | 30.4 | 33.1 | 33.2 | 35.5 | 36.0 | 37.0 | 39.3 | 40.5 | 41.1 | 41.3 |
| 4 | Graduate Student Employees | 20.5 | 16.5 | 18.1 | 18.8 | 18.7 | 19.0 | 20.0 | 19.9 | 19.5 | 19.4 | 19.3 |
What are the variables in this plot?
If the long data will have a row for each year/faculty type combination, and there are 5 faculty types and 11 years of data, how many rows will the data have?
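A quick way to check the answer is to rebuild the wide table and multiply the number of faculty types by the number of year columns. This is only a sketch: the DataFrame is typed in by hand from the table above rather than read from the original CSV, and the names `years`, `rows`, `n_types`, `n_years`, and `expected_rows` are introduced here for illustration.

```python
import pandas as pd

# Year column names, kept as strings as they would be in a CSV header.
years = ["1975", "1989", "1993", "1995", "1999", "2001",
         "2003", "2005", "2007", "2009", "2011"]

# One list of percentages per faculty type, copied from the table above.
rows = {
    "Full-Time Tenured Faculty": [29.0, 27.6, 25.0, 24.8, 21.8, 20.3, 19.3, 17.8, 17.2, 16.8, 16.7],
    "Full-Time Tenure-Track Faculty": [16.1, 11.4, 10.2, 9.6, 8.9, 9.2, 8.8, 8.2, 8.0, 7.6, 7.4],
    "Full-Time Non-Tenure-Track Faculty": [10.3, 14.1, 13.6, 13.6, 15.2, 15.5, 15.0, 14.8, 14.9, 15.1, 15.4],
    "Part-Time Faculty": [24.0, 30.4, 33.1, 33.2, 35.5, 36.0, 37.0, 39.3, 40.5, 41.1, 41.3],
    "Graduate Student Employees": [20.5, 16.5, 18.1, 18.8, 18.7, 19.0, 20.0, 19.9, 19.5, 19.4, 19.3],
}

# Build the wide table: one row per faculty type, one column per year.
staff = pd.DataFrame.from_dict(rows, orient="index", columns=years)
staff = staff.rename_axis("faculty_type").reset_index()

n_types = len(staff)             # rows in the wide table
n_years = len(years)             # year columns in the wide table
expected_rows = n_types * n_years

print(n_types, n_years, expected_rows)
```

Each cell of the wide table becomes one row of the long table, so the long data should have 5 × 11 = 55 rows.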
staff_long = staff.melt(
    id_vars=["faculty_type"],
    var_name="year",
    value_name="percentage"
)
staff_long

| | faculty_type | year | percentage |
|---|---|---|---|
| 0 | Full-Time Tenured Faculty | 1975 | 29.0 |
| 1 | Full-Time Tenure-Track Faculty | 1975 | 16.1 |
| 2 | Full-Time Non-Tenure-Track Faculty | 1975 | 10.3 |
| 3 | Part-Time Faculty | 1975 | 24.0 |
| 4 | Graduate Student Employees | 1975 | 20.5 |
| 5 | Full-Time Tenured Faculty | 1989 | 27.6 |
| 6 | Full-Time Tenure-Track Faculty | 1989 | 11.4 |
| 7 | Full-Time Non-Tenure-Track Faculty | 1989 | 14.1 |
| 8 | Part-Time Faculty | 1989 | 30.4 |
| 9 | Graduate Student Employees | 1989 | 16.5 |
| 10 | Full-Time Tenured Faculty | 1993 | 25.0 |
| 11 | Full-Time Tenure-Track Faculty | 1993 | 10.2 |
| 12 | Full-Time Non-Tenure-Track Faculty | 1993 | 13.6 |
| 13 | Part-Time Faculty | 1993 | 33.1 |
| 14 | Graduate Student Employees | 1993 | 18.1 |
| 15 | Full-Time Tenured Faculty | 1995 | 24.8 |
| 16 | Full-Time Tenure-Track Faculty | 1995 | 9.6 |
| 17 | Full-Time Non-Tenure-Track Faculty | 1995 | 13.6 |
| 18 | Part-Time Faculty | 1995 | 33.2 |
| 19 | Graduate Student Employees | 1995 | 18.8 |
| 20 | Full-Time Tenured Faculty | 1999 | 21.8 |
| 21 | Full-Time Tenure-Track Faculty | 1999 | 8.9 |
| 22 | Full-Time Non-Tenure-Track Faculty | 1999 | 15.2 |
| 23 | Part-Time Faculty | 1999 | 35.5 |
| 24 | Graduate Student Employees | 1999 | 18.7 |
| 25 | Full-Time Tenured Faculty | 2001 | 20.3 |
| 26 | Full-Time Tenure-Track Faculty | 2001 | 9.2 |
| 27 | Full-Time Non-Tenure-Track Faculty | 2001 | 15.5 |
| 28 | Part-Time Faculty | 2001 | 36.0 |
| 29 | Graduate Student Employees | 2001 | 19.0 |
| 30 | Full-Time Tenured Faculty | 2003 | 19.3 |
| 31 | Full-Time Tenure-Track Faculty | 2003 | 8.8 |
| 32 | Full-Time Non-Tenure-Track Faculty | 2003 | 15.0 |
| 33 | Part-Time Faculty | 2003 | 37.0 |
| 34 | Graduate Student Employees | 2003 | 20.0 |
| 35 | Full-Time Tenured Faculty | 2005 | 17.8 |
| 36 | Full-Time Tenure-Track Faculty | 2005 | 8.2 |
| 37 | Full-Time Non-Tenure-Track Faculty | 2005 | 14.8 |
| 38 | Part-Time Faculty | 2005 | 39.3 |
| 39 | Graduate Student Employees | 2005 | 19.9 |
| 40 | Full-Time Tenured Faculty | 2007 | 17.2 |
| 41 | Full-Time Tenure-Track Faculty | 2007 | 8.0 |
| 42 | Full-Time Non-Tenure-Track Faculty | 2007 | 14.9 |
| 43 | Part-Time Faculty | 2007 | 40.5 |
| 44 | Graduate Student Employees | 2007 | 19.5 |
| 45 | Full-Time Tenured Faculty | 2009 | 16.8 |
| 46 | Full-Time Tenure-Track Faculty | 2009 | 7.6 |
| 47 | Full-Time Non-Tenure-Track Faculty | 2009 | 15.1 |
| 48 | Part-Time Faculty | 2009 | 41.1 |
| 49 | Graduate Student Employees | 2009 | 19.4 |
| 50 | Full-Time Tenured Faculty | 2011 | 16.7 |
| 51 | Full-Time Tenure-Track Faculty | 2011 | 7.4 |
| 52 | Full-Time Non-Tenure-Track Faculty | 2011 | 15.4 |
| 53 | Part-Time Faculty | 2011 | 41.3 |
| 54 | Graduate Student Employees | 2011 | 19.3 |
Why does that say “sum of percentage”?
Hm, what’s going on here?
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 55 entries, 0 to 54
Data columns (total 3 columns):
 #   Column        Non-Null Count  Dtype
---  ------        --------------  -----
 0   faculty_type  55 non-null     object
 1   year          55 non-null     object
 2   percentage    55 non-null     float64
dtypes: float64(1), object(2)
memory usage: 1.4+ KB
array(['1975', '1989', '1993', '1995', '1999', '2001', '2003', '2005',
       '2007', '2009', '2011'], dtype=object)
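The `year` values are strings because they started out as column names in the CSV, and column names are read in as text. Melting moved those names into the `year` column without changing their type.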
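One possible fix, assuming we want the x-axis to treat years as numbers, is to cast the column to integers. The small frame below is a hand-built stand-in for `staff_long` so the sketch runs on its own; the real data would come from the `melt` call above.

```python
import pandas as pd

# Stand-in for staff_long: the years are strings, as they are after melting.
staff_long = pd.DataFrame({
    "faculty_type": ["Part-Time Faculty"] * 3,
    "year": ["1975", "1989", "1993"],
    "percentage": [24.0, 30.4, 33.1],
})

# Convert the year strings to integers so they behave as numbers.
staff_long["year"] = staff_long["year"].astype(int)

# Confirm the new dtype and values.
print(staff_long["year"].dtype)
print(staff_long["year"].unique())
```

With integer years, a line plot can space the points by the actual gaps between years (for example, the 14 years between 1975 and 1989) instead of treating each year as an evenly spaced category. `pd.to_numeric(staff_long["year"])` is an alternative that does the same job here.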
px.line(
    staff_long,
    x="year", y="percentage", color="faculty_type",
    markers=True,
    labels={"year": "Year", "percentage": "Percentage of hires", "faculty_type": "Faculty type"}
)