﻿ More On Slopegraphs – Data Science Austria

# More On Slopegraphs

About a week ago Bob Rudis created a nice blog
post

that I saw on my R Bloggers feed that
simultaneously:

1. Threw a bit of “shade” on the ToS for Axios (well done Bob)
2. Showed how to use `EtherCalc` as a data entry tool
3. And, most importantly to me, showed how to make great use of a slopegraph

I happened to be on vacation at the time but as soon as I got back and caught up
I vowed to follow up since slopegraphs have always fascinated
me

and I happened to write a function to make them, about a year
back
. I
wanted to look at Bob’s post in detail after very quickly agreeing with his
premise that it was a much better choice than a “dumbbell chart”. So this post

This post assumes that you’ve read the earlier posts.

Let’s quickly recreate the dataset Bob created in `EtherCalc` keeping it simple
using the `str` function and choosing to make it a dataframe not a tibble.

``````library(tidyverse)
# install.packages("devtools")
# devtools::install_github("ibecav/CGPfunctions")
library(CGPfunctions) thedata <- structure(list( topic = structure(c(6L, 3L, 5L, 4L, 11L, 13L, 2L, 8L, 9L, 12L, 7L, 1L, 14L, 10L), .Label = c("Arts & entertainment", "Business", "Climate change", "Economics", "Education", "Health care", "Immigration", "National Security", "Politics", "Religion", "Science", "Sports", "Technology", "U.S. foreign policy"), class = "factor"), actually_read = c(7L, 5L, 11L, 6L, 10L, 14L, 13L, 1L, 2L, 3L, 4L, 8L, 9L, 12L), say_want_covered = c(1L, 2L, 3L, 4L, 7L, 8L, 11L, 5L, 10L, 14L, 6L, 13L, 9L, 12L)), class = "data.frame", row.names = c(NA, -14L))
thedata```
```
``````## topic actually_read say_want_covered
## 1 Health care 7 1
## 2 Climate change 5 2
## 3 Education 11 3
## 4 Economics 6 4
## 5 Science 10 7
## 6 Technology 14 8
## 8 National Security 1 5
## 9 Politics 2 10
## 10 Sports 3 14
## 11 Immigration 4 6
## 12 Arts & entertainment 8 13
## 13 U.S. foreign policy 9 9
## 14 Religion 12 12```
```

#### Making slopegraphs easy

When you look at Bob’s post there’s actually a lot of code in there to make a
very nice graphic. Being extraordinarily lazy I wrote my function to get a
slopegraph with the least amount of work possible. The first step, which is
unavoidable if you want to make use of `newggslopegraph`, though is to reshape
the data into a “longer” format. We’ll use `reshape2::melt` and keep the `topic`
column but collapse the other two columns into a factor called `Saydo` and put the
actual “rank” into a column called `Rank`. Since “actually_read” and
“say_want_covered” are now factor levels instead of column names we can use
`forcats::fct_recode` to make them much nicer built in labels when we make our
plot. Voila a new dataframe called `temp`.

``````temp <- reshape2::melt(data = thedata, id = "topic", variable.name = "Saydo", value.name = "Rank")
temp\$Saydo <- forcats::fct_recode(temp\$Saydo, "Actually read" = "actually_read", "Say they want" = "say_want_covered")
temp```
```
``````## topic Saydo Rank
## 1 Health care Actually read 7
## 2 Climate change Actually read 5
## 3 Education Actually read 11
## 4 Economics Actually read 6
## 5 Science Actually read 10
## 6 Technology Actually read 14
## 8 National Security Actually read 1
## 9 Politics Actually read 2
## 10 Sports Actually read 3
## 11 Immigration Actually read 4
## 12 Arts & entertainment Actually read 8
## 13 U.S. foreign policy Actually read 9
## 14 Religion Actually read 12
## 15 Health care Say they want 1
## 16 Climate change Say they want 2
## 17 Education Say they want 3
## 18 Economics Say they want 4
## 19 Science Say they want 7
## 20 Technology Say they want 8
## 21 Business Say they want 11
## 22 National Security Say they want 5
## 23 Politics Say they want 10
## 24 Sports Say they want 14
## 25 Immigration Say they want 6
## 26 Arts & entertainment Say they want 13
## 27 U.S. foreign policy Say they want 9
## 28 Religion Say they want 12```
```

Once we get the data in the right shape I tried to make `newggslopegraph` as
simple and intuitive as possible. I love working with `ggplot` but I will admit
it can get quite complex. So to create the default plot all we need to do is:

````newggslopegraph(dataframe = temp, Times = Saydo, Measurement = Rank, Grouping = topic)`
```
````## ## Converting 'Saydo' to an ordered factor`
```

That was pretty painless wasn’t it? But clearly there’s a lot of room for
tweaking! Let’s make it better!

#### Tweaking

Whole books can and have been written just on the issue of graphic design so I’m
not going to try and summarize it all in one little blog post. I will however,
for the impatient reader, immediately take care of a few key things:

1. Titles, subtitles and captions are important! Don’t ignore them or give
them short change. You’ll notice that since we didn’t initially specify them,
placeholders appear. That’s to be shameless about making you think about them
even if you eventually decide to turn them “off” (read the
doco
)

2. The default is that every line is it’s own color. That’s seldom a good choice
for telling a story unless the number of topics (a.k.a. `Groups`) is very
small. For now let’s make them all “black” and come back to this in a bit.

3. By default `Measurement` is treated as a real number so the highest values
are on the top of the graph. Makes more sense here to reverse the scale and
put the highest ranked “1” at he top. `ReverseYAxis = TRUE`. If we needed or
wanted to `ReverseXAxis = TRUE` might be useful.

Our second attempt looks like this:

````newggslopegraph(dataframe = temp, Times = Saydo, Measurement = Rank, Grouping = topic, ReverseYAxis = TRUE, Title = "14 Topics Ranked by What Americans Read vs Want Covered", SubTitle = "'Read' rank from Parse.ly May 2019 data.\n'Want covered' rank from Axios/SurveyMonkey poll conducted May 17-20, 2019", Caption = "Source: Axios \nMakeover by @hrbrmstr", LineColor = "black" )`
```

Alright, that’s looking a little bit better for basic layout. But it doesn’t yet
tell the reader a story and focus their attention on the message we want to
convey. To be honest I’m not a huge fan of adding a lot of annotations to a plot
so let’s first try to catch the readers attention by using color selectively.

#### Emphasizing the “slope” in slopegraph

As the name implies slopegraphs get the reader to attend to relative differences
in slope, right now our choice of “black” as the only color is marginally better
than our original multicolor mess but still falls far short of conveying a
message. The `LineColor` parameter is quite flexible. The default is suitable
for a small number of topics, a single color can be the right choice on occasion,
but we can also pass it a character vector of colors that is as customized as we
like. For example `LineColor = c("black", "red")` would recycle the colors red
and black to create an alternating pattern. We could even build a named list
that associates a color to each of the topic areas if we desired (see the
vignette

for an example). But right now, that is too much effort and I’d like to
handle this by algorithm not by manually entry.

As a start point let’s assume we’d like to get the reader to focus on
understanding which topics increase in rank, decrease in rank or stay the same.
We’ll color increase as black, decreases as red and things that remain level as
light gray. We can accomplish that through a series of `pipes` and `dplyr`
verbs.

``````colorvect <- temp %>% group_by(topic) %>% summarise(difference = diff(Rank)) %>% mutate(whatcolor = case_when( difference == 0 ~ "light gray", difference > 0 ~ "red", difference < 0 ~ "black" )) %>% select(topic, whatcolor) %>% tibble::deframe()
colorvect```
```
````## Arts & entertainment Business Climate change ## "red" "black" "black" ## Economics Education Health care ## "black" "black" "black" ## Immigration National Security Politics ## "red" "red" "red" ## Religion Science Sports ## "light gray" "black" "red" ## Technology U.S. foreign policy ## "black" "light gray"`
```

Each topic now has a color assigned, and it’s trivial to pass our color vector
to `newggslopegraph`. While we’re at it we can showcase some of the other
formatting options, like changing font sizes for the labels. `DataLabelPadding`
is important if you are likely to have datapoints close together (see the
vignette
for the cancer data) but in this case we can be more generous since
ranks won’t overlap.

``````newggslopegraph(dataframe = temp, Times = Saydo, Measurement = Rank, Grouping = topic, ReverseYAxis = TRUE, DataTextSize = 3.5, YTextSize = 4, XTextSize = 16, DataLabelPadding = .2, Title = "Topic Rankings Compared Between\nWhat Americans Actually Read vs Want Covered", SubTitle = "'Actually Read' rank from Parse.ly May 2019 data.\n'Want covered' rank from Axios/SurveyMonkey poll conducted May 17-20, 2019", Caption = "Source: Axios \nMakeover by @hrbrmstr", LineColor = colorvect
)```
```

Very nice looking, but I think it is still too crowded with colors. Let’s adjust
our coloring to highlight only the larger rank differences. It’s a matter of
personal taste but easy to adjust our little script and test, rinse and repeat
until we’re happy. Let’s adjust so that changes of greater than 4 or less than 4
are highlighted and the rest are gray.

````colorvect <- temp %>% group_by(topic) %>% summarise(difference = diff(Rank)) %>% mutate(whatcolor = case_when( difference >= 4 ~ "red", difference <= -4 ~ "black", TRUE ~ "light gray" )) %>% select(topic, whatcolor) %>% tibble::deframe()`
```

Then we can run the same lines into `newggslopegraph`.

``````newggslopegraph(dataframe = temp, Times = Saydo, Measurement = Rank, Grouping = topic, ReverseYAxis = TRUE, DataTextSize = 3.5, YTextSize = 4, XTextSize = 16, DataLabelPadding = .2, Title = "Topic Rankings Compared Between\nWhat Americans Actually Read vs Want Covered", SubTitle = "'Actually Read' rank from Parse.ly May 2019 data.\n'Want covered' rank from Axios/SurveyMonkey poll conducted May 17-20, 2019", Caption = "Source: Axios \nMakeover by @hrbrmstr", LineColor = colorvect
)```
```

Personally, I think that even 7 topics may be too much, but hopefully you’re
getting the point that while we’re not losing any information, we’re making it
easier for the reader to focus on the big changes in the data. It’s easy to
discern the pattern whether it’s answering a simple question, such as what is the
number one thing they say they want to read about (Health care), or a more
complex question such as which topic has the biggest disparity (Sports).

#### Use titles, subtitles and captions well

One thing we can do to make our message clearer is make better use of the title
and subtitle areas. It seems simple but is too often forgotten. While we’re at
it I’ll highlight a couple of new capabilities I added to the function:

1. The ability to choose from a select number of `ggplot` themes. In this case Bob
Rudis `ipsum_rc` theme.

2. Control the justification of the titles and subtitles and caption.

But the most important change here IMHO is simply choosing words for the title
and subtitle that convey what we want to look for in the plot or think about.

````newggslopegraph(dataframe = temp, Times = Saydo, Measurement = Rank, Grouping = topic, ReverseYAxis = TRUE, DataTextSize = 3.5, YTextSize = 3.2, XTextSize = 14, DataLabelPadding = .2, Title = "Americans Don't Actually Read the News They Say They Want", SubTitle = "Many sharp differences in rankings in both directions. Hypocrisy, laziness or gratification?", Caption = "Source: Rud.is \nMakeover by @hrbrmstr", LineColor = colorvect, ThemeChoice = "ipsum", TitleTextSize = 18, SubTitleTextSize = 12, SubTitleJustify = "right")`
```

The same plot in Wall Street Journal style (`wsj`)

``````newggslopegraph(dataframe = temp, Times = Saydo, Measurement = Rank, Grouping = topic, ReverseYAxis = TRUE, ReverseXAxis = TRUE, DataTextSize = 3.5, YTextSize = 4, XTextSize = 13, DataLabelPadding = .2, Title = "Americans Don't Actually Read the News They Say They Want", SubTitle = "Many sharp differences in rankings in both directions.\nHypocrisy or laziness or gratification?", Caption = "Source: Rud.is \nMakeover by @hrbrmstr", LineColor = colorvect, ThemeChoice = "wsj", TitleTextSize = 15, CaptionTextSize = 6, SubTitleTextSize = 11, SubTitleJustify = "right"
)```
```

I’m not actually sure I like it better at all but simply demonstrating capability

#### A final example

In Bob’s blog post he demonstrated how to add lines and arrows and text to add
annotation to his plot. I’m of the mindset that less is more and too much
annotation can be a distraction not an aid to telling our story. As you can see
from his code it is also relatively complex using native `ggplot::geom_*`’s to
place things exactly right.

I actually find `cowplot` easier to use for simple annotation. In the example
below I’ll shift to the `gdocs` theme. Save the plot and then use cowplot to add
one important factoid!

``````p <- newggslopegraph(dataframe = temp, Times = Saydo, Measurement = Rank, Grouping = topic, ReverseYAxis = TRUE, DataTextSize = 3.5, YTextSize = 4, XTextSize = 14, DataLabelPadding = .2, Title = "Americans Don't Actually Read the News They Say They Want", SubTitle = "Many sharp differences in rankings in both directions. Hypocrisy, laziness or gratification?", Caption = "Source: Rud.is \nMakeover by @hrbrmstr", LineColor = colorvect, ThemeChoice = "gdocs", TitleTextSize = 16, TitleJustify = "center", SubTitleTextSize = 12, SubTitleJustify = "center"
) cowplot::ggdraw(p) + cowplot::draw_label(label = "Reading about sports shows the\nlargest difference in ranking --\n11 places!", colour = "dark blue", size = 10, y = .10, x = .53, fontface = "italic")```
```

R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more…