RankingProject
packageThis vignette introduces the
RankingProject
package, which accompanies the articles “A Primer on
Visualizations for Comparing Populations, Including the Issue of
Overlapping Confidence Intervals” (Wright, Klein, and Wieczorek,
2019, The American Statistician) and “A Joint Confidence Region for
an Overall Ranking of Populations” (Klein, Wright, and Wieczorek,
2020, Journal of the Royal Statistical Society: Series C). For
instructions on exactly replicating the article’s main figures, please
see the Primer
and Joint
vignettes:
vignette("primer", package = "RankingProject")
and
vignette("joint", package = "RankingProject")
The package provides functions for plotting ranked tables of data side-by-side with their plots. The available visualizations include shaded columns plots, adjusted confidence intervals, and related plots intended for making correct inferences about one-to-many or many-to-many comparisons.
First, we load the package and the TravelTime2011
dataset used in the paper. This dataset contains 51 rows (estimates for
each of the 50 US states and Washington, D.C.) and 7 columns. The
variables Estimate.2dec
and SE.2dec
are
estimates of the mean and its standard error for the mean travel time
(in minutes) to work of workers 16 years and over who did not work at
home, from the 2011 American Community Survey (ACS).
We also create string versions of our estimates and their standard errors, for printing out tables with a consistent number of digits.
## Rank State Estimate.2dec SE.2dec Abbreviation Region FIPS
## 1 1 South Dakota 16.86 0.28 SD MIDWEST 46
## 2 2 North Dakota 16.91 0.36 ND MIDWEST 38
## 3 3 Nebraska 18.06 0.19 NE MIDWEST 31
## 4 4 Wyoming 18.10 0.50 WY WEST 56
## 5 5 Montana 18.18 0.32 MT WEST 30
## 6 6 Alaska 18.39 0.33 AK PACIFIC 2
# Format estimates and SEs into strings with 2 digits past the decimal
USdata$Estimate.Print = formatC(USdata$Estimate.2dec,
format = 'f', digits = 2)
# For SEs, also drop the leading 0
USdata$SE.Print = substring(formatC(USdata$SE.2dec,
format = 'f', digits = 2),
first = 2)
Next, we set up several list-type objects to contain parameters
needed for the tables and plots. We will use Colorado (CO) as the
reference area. In particular, to plot individual confidence intervals,
we set plotType="individual"
inside
plotParList
.
# Set Colorado as the reference area
refAbbr <- "CO"
refRow <- which(USdata$Abbreviation==refAbbr)
# Set up parameter lists for table function and plot function
tableParList <- with(USdata,
list(ranks = Rank, names = State,
est = Estimate.Print, se = SE.Print,
placeType = "State"))
plotParList <- with(USdata,
list(est = Estimate.2dec, se = SE.2dec,
names = Abbreviation, refName = refAbbr,
confLevel = .90, cex = 0.6,
plotType = "individual"))
Finally, we create a simple plot of individual 90% confidence intervals (CIs), alongside the ranking table:
However, such individual 90% confidence intervals are not explicitly
designed for making comparisons. As noted in the article, when reading
individual CIs it is tempting to conclude that states with overlapping
intervals are not significantly different at the α = 0.10 significance level, but
this is not necessarily true. The RankingProject
package
includes several other plot types that are specifically designed for
making inferences from a plot appropriately.
Notes: if we set plotParList$refName=NULL
for the
individual CIs plot above, the state abbreviations will flip sides
around the median state instead of around the reference state. And if we
wish, we can set plotParList$legendText <- NA
to remove
the default legend.
First, we can use plotParList$plotType="columns"
to
create a grid or “shaded columns plot” that simply shows which states
are significantly different from one another. We start with significance
level α = 0.10 and apply a
“demi-Bonferroni” correction for 50 comparisons between one reference
state and all other states. Then we perform these 50 one-to-many tests
within each column of the plot, testing the reference state (labeled at
the bottom of the column) against each other state at significance level
α/50 = 0.002.
(The intended audience behind this demi-Bonferroni correction is a reader who already has one reference state in mind, such as his or her home state, and does not care about comparisons that exclude this particular state. For more general all-to-all comparisons, the 50-way demi-Bonferroni correction could be replaced with a ${51 \choose 2}$-way full-Bonferroni correction.)
# Shaded Columns plot
plotParList$plotType <- "columns"
# Specify where to position the "Reference State:" text,
# and adjust column widths from their defaults
tableParList <- c(tableParList,
list(columnsPlotRefLine = .7, col2 = .55, col3 = .8))
# Use abbreviations instead of full names in the table, to save space
tableParList$names <- USdata$Abbreviation
# Use default plotting character size
plotParList$cex <- NULL
# Actually draw the figure
RankPlotWithTable(tableParList = tableParList, plotParList = plotParList,
tableWidthProp = 2/7)
# Reset defaults for future plots
tableParList[c("columnsPlotRefLine", "col2", "col3")] <- NULL
# Use full state names in the table for future plots
tableParList$names <- USdata$State
The vertical lines highlighting Colorado are optional and could be
removed by setting plotParList$refName=NULL
.
We can improve the first plot (of individual CIs) by performing a Goldstein-Healy adjustment, as summarized in the article and originally proposed by Goldstein and Healy (1995). Using this adjustment, instead of plotting actual 90% CIs, we plot CIs at a different confidence level which is chosen (based on the standard errors of all the estimates) to achieve an “average significance level” of 0.10. This lets us actually perform inferences by checking for overlap of intervals. If two intervals do (do not) overlap, they are not (are) significantly different at this “average significance level.” Usual CIs do not have this property.
# Set smaller plot text and lower x-axis label for future plots
plotParList$cex <- 0.6
plotParList$thetaLine <- 1.5
# Goldstein-Healy adjusted CIs
plotParList$plotType <- "individual"
plotParList$GH <- TRUE
RankPlotWithTable(tableParList = tableParList, plotParList = plotParList)
We could show both the usual CIs and the Goldstein-Healy-adjusted
versions on a single plot, by using two-tiered error bars. Here, the
inner tier (between the cross bars) shows the Goldstein-Healy
adjustment, and the outer tier (all the way to the ends of the
intervals) shows the individual 90% CIs. When we set
plotParList$tiers=2
the plot function will always show
individual confidence intervals on one tier, and Goldstein-Healy and/or
Bonferroni adjustments on another tier. The legend will automatically
show which tier is which (since adjusted intervals could either all be
shorter or all be longer than the un-adjusted intervals, depending on
which adjustments are made).
# Double-tiered GH plot:
# inner tiers are GH CIs,
# outer tiers are usual 90% CIs
plotParList$tiers <- 2
# Legend auto-positioning is poor with line breaks in legend text;
# we can improve it by controlling (X,Y) manually
plotParList$legendX <- 12
plotParList$legendY <- 53
RankPlotWithTable(tableParList = tableParList, plotParList = plotParList)
However, the Goldstein-Healy adjustment allows for visual inspection of overlap, but does not correct for multiple comparisons. If we have one reference state in mind, we may wish to use a demi-Bonferroni correction for comparing one state against the other 50:
# Double-tiered GH + Bonferroni plot:
# inner tiers are usual 90% CIs,
# outer tiers are 50-way demi-Bonferroni-corrected GH CIs
plotParList$multcomp.scope <- "demi"
RankPlotWithTable(tableParList = tableParList, plotParList = plotParList)
Finally, if we do not have a reference state in mind, we may prefer to use a full-Bonferroni correction, for all ${51 \choose 2}$ possible pairwise comparisons:
# Double-tiered GH + Bonferroni plot:
# inner tiers are usual 90% CIs,
# outer tiers are (51 choose 2)-way full-Bonferroni-corrected GH CIs
plotParList$multcomp.scope <- "full"
RankPlotWithTable(tableParList = tableParList, plotParList = plotParList)
Instead of using Goldstein and Healy (1995)’s adjustment to confidence levels, we may prefer to directly plot the differences between the reference state and all others. If the 90% CI for such a difference does (does not) contain 0, the difference is not (is) statistically significant. Again, we use a demi-Bonferroni correction for the 50 possible comparisons against the reference state.
# CIs for differences from reference state
plotParList$plotType <- "difference"
plotParList$lwdBold <- 2
RankPlotWithTable(tableParList = tableParList, plotParList = plotParList,
annotRefName = USdata$Rank[refRow],
annotRefRank = USdata$State[refRow])
Finally, when we plot the CIs for differences above, we cannot show a confidence interval for the reference state’s estimate itself. Almond et al. (2000) proposed a variant plot of “comparison intervals,” which does allows us to show the reference state’s confidence interval as well. Here, all the other states’ intervals are designed such that their difference from the reference is (is not) statistically significant if they do not (do) overlap with the reference state’s individual CI. However, these other states’ “comparison intervals” are designed for this particular comparison, and they should not be interpreted as standard confidence intervals.
# Comparison intervals (Almond et al. 2000)
plotParList$plotType <- "comparison"
RankPlotWithTable(tableParList = tableParList, plotParList = plotParList)
Please see the Primer
vignette for instructions on using
the tikzDevice
package to create figures with LaTeX-style
text, instead of default R-style text:
vignette("primer", package = "RankingProject")
By default, RankPlotWithTable()
uses the built-in
functions RankTable()
and RankPlot()
to create
the table and plot within each figure. If these functions are not
flexible enough to show what the user would like to see (e.g. if we
wanted to have more or fewer than 4 columns in the ranking table), we
can write a modified version of either of these functions. This new
function can be plugged into RankPlotWithTable()
using the
tableFunction
and plotFunction
arguments.