Statistics: introduction to R


10. Data visualization principles

This series contains interesting information about what makes a plot informative or uninformative. The series is optional.

We have already provided some rules to follow as we created plots for our examples. Here, we aim to provide some general principles we can use as a guide for effective data visualization. Much of this section is based on a talk by Karl Broman titled “Creating Effective Figures and Tables”. It includes some of the figures from that talk, which were made with code that Karl makes available on his GitHub repository, and it also draws on class notes from Peter Aldhous’ Introduction to Data Visualization course. Following Karl’s approach, we show some examples of plot styles we should avoid, explain how to improve them, and use these as motivation for a list of principles. We compare and contrast plots that follow these principles with those that don’t.

The principles are mostly based on research into how humans detect patterns and make visual comparisons. The preferred approaches are those that best fit the way our brains process visual information. When deciding on a visualization approach, it is also important to keep our goal in mind. We may be comparing a number of quantities small enough to view at once, describing distributions for categories or numeric values, comparing the data from two groups, or describing the relationship between two variables. As a final note, we want to emphasize that for a data scientist it is important to adapt and optimize graphs to the audience. For example, an exploratory plot made for ourselves will be different from a chart intended to communicate a finding to a general audience.

We will be using these libraries:

library(tidyverse)
library(dslabs)
library(gridExtra)
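
As a small illustration of adapting a plot to its audience, the sketch below draws the same comparison twice: once as a quick exploratory plot and once as a version meant for readers. It is only a sketch under stated assumptions. It uses the USArrests data set that ships with base R as a stand-in for the course data, it loads only ggplot2 (which is part of the tidyverse), and the names arrests, top10, p_quick and p_final are chosen here for illustration.

```r
library(ggplot2)

# Assumption: USArrests (built into R) stands in for the course data.
# Its Murder column holds murder arrests per 100,000 residents in 1973.
arrests <- data.frame(state = rownames(USArrests),
                      murder = USArrests$Murder)

# Keep the ten states with the highest rate so the bars stay readable.
top10 <- head(arrests[order(-arrests$murder), ], 10)

# Exploratory version: default alphabetical order and default labels.
p_quick <- ggplot(top10, aes(state, murder)) +
  geom_col() +
  coord_flip()

# Audience version: bars ordered by value, with descriptive labels.
p_final <- ggplot(top10, aes(reorder(state, murder), murder)) +
  geom_col() +
  coord_flip() +
  labs(x = NULL,
       y = "Murder arrests per 100,000 residents",
       title = "Ten US states with the highest murder arrest rates, 1973")
```

Printing p_quick or p_final draws the corresponding plot. Both use bars that start at 0, so bar length is proportional to the rate. The second version differs only in ordering and labelling, which is often most of what separates an exploratory plot from one meant for an audience.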
10.1. Encoding data using visual cues
10.2. Know when to include 0
10.3. Do not distort quantities
10.4. Order categories by a meaningful value
10.5. Show the data
10.6. Ease comparisons
10.7. Think of the color blind
10.8. Plots for two variables
10.9. Encoding a third variable
10.10. Avoid pseudo-three-dimensional plots
10.11. Avoid too many significant digits
10.12. Know your audience
10.13.1. Bad Plots
10.13.2. Reordering
10.13.3. Boxplot of Murder Rates
10.13.4. 3D Plots
10.14. Case study: vaccines and infectious diseases
10.15.1. Smallpox Tileplot
10.15.2. Smallpox Time Series
10.15.3. Comparing Diseases