| Title: | The Ranking Project: Visualizations for Comparing Populations |
|---|---|
| Description: | Functions to generate plots and tables for comparing independently-sampled populations. Companion package to "A Primer on Visualizations for Comparing Populations, Including the Issue of Overlapping Confidence Intervals" by Wright, Klein, and Wieczorek (2019) <DOI:10.1080/00031305.2017.1392359> and "A Joint Confidence Region for an Overall Ranking of Populations" by Klein, Wright, and Wieczorek (2020) <DOI:10.1111/rssc.12402>. |
| Authors: | Jerzy Wieczorek [cre, aut] |
| Maintainer: | Jerzy Wieczorek <[email protected]> |
| License: | GPL-2 |
| Version: | 0.4.0.9002 |
| Built: | 2025-02-28 05:17:08 UTC |
| Source: | https://github.com/civilstat/rankingproject |
See the Intro vignette (html) for an overview and examples: vignette("intro", package = "RankingProject").
See the Primer vignette (pdf) for code that replicates the main figures from the 2019 article: vignette("primer", package = "RankingProject").
See the Joint vignette (pdf) for code that replicates the main figures from the 2020 article: vignette("joint", package = "RankingProject").
The "comparison" plots are based on figures and S code from Almond et al. (2000). The present package does not contain a direct modification of their S code, but draws inspiration from it. Their script was originally hosted at Statlib at http://stat.cmu.edu/S/comprB and may still be found at Statlib mirrors such as http://ftp.uni-bayreuth.de/math/statlib/S/comprB.
The code for the "columns" plots is directly based on R's stats::heatmap() function, with minor modifications to remove dendrograms and allow the heatmap to be placed inside a larger layout().
Almond, R.G., Lewis, C., Tukey, J.W., and Yan, D. (2000). "Displays for Comparing a Given State to Many Others," The American Statistician, vol. 54, no. 2, 89-93, DOI:10.1080/00031305.2000.10474517.
Klein, M., Wright, T., and Wieczorek, J. (2020). "A Joint Confidence Region for an Overall Ranking of Populations," Journal of the Royal Statistical Society: Series C, vol. 69, no. 3, 589-606, DOI:10.1111/rssc.12402.
Wright, T., Klein, M., and Wieczorek, J. (2019). "A Primer on Visualizations for Comparing Populations, Including the Issue of Overlapping Confidence Intervals," The American Statistician, vol. 73, no. 2, 165-178, DOI:10.1080/00031305.2017.1392359.
RankPlot creates a figure with a plot of ranking data, choosing from among several options for showing uncertainty in the ranked estimates. This function is meant for use within RankPlotWithTable, which draws a ranking table aligned with this plot of the data in one combined figure.
    RankPlot(
      est,
      se,
      names,
      refName = NULL,
      confLevel = 0.9,
      plotType = c("individual", "difference", "comparison", "columns"),
      tiers = 1,
      GH = FALSE,
      multcomp.scope = ifelse(plotType == "individual", "none", "demi"),
      multcomp.type = c("bonferroni", "independence"),
      tikzText = FALSE,
      cex = 1,
      tickWidth = NULL,
      rangeFactor = 1.2,
      textPad = 0,
      legendX = "topleft",
      legendY = NULL,
      legendText = NULL,
      lwdReg = 1,
      lwdBold = 3,
      thetaLine = 1,
      xlim = NULL,
      Bonferroni
    )
est, se |
Vectors containing the point estimate and its standard error for each area. |
names |
Vector containing the name of each area. Abbreviations may be preferable to full names (e.g. "CO" instead of "Colorado") since these names will be displayed directly on the plot. |
refName |
String containing the name of the reference area;
must be one of the values in |
confLevel |
Number between 0 and 1: confidence level for individual
(uncorrected) hypothesis tests and/or confidence intervals. E.g. with
|
plotType |
Which type of ranking plot to use. See vignettes for examples and details.
|
tiers |
Numeric, either 1 for usual confidence intervals,
or 2 for two-tiered intervals. 2 can only be used with
|
GH |
Logical, for whether or not to plot adjusted
confidence intervals at an "average" |
multcomp.scope |
Whether to correct for multiple comparisons,
and if so, for how many
(by a correction to the confidence level of the tests or intervals).
|
multcomp.type |
(Only used if |
tikzText |
Logical, for whether or not to format text for tikz plotting. |
cex |
Character expansion factor for the points used to plot each area's point estimate, and for the text used to plot each area's name next to its interval. |
tickWidth |
Numeric height of the cross-bars on interval endpoints
(or inner tiers, if |
rangeFactor |
Numeric multiple by which to expand the range of the data
when setting the x-axis limits. The function tries to leave sufficient room
for plotting margins of error and names next to each area,
but sometimes it may help to adjust |
textPad |
Numeric amount by which to shift the text of |
legendX, legendY |
The x and y co-ordinates used to position the legend;
see |
legendText |
String, or string vector, with legend text. By default,
each plot type adds informative legend text, but the user may override.
To remove legends entirely, set |
lwdReg |
Positive number for the line width of regular lines.
Used for all intervals when |
lwdBold |
Positive number for the line width of bold lines.
Used for intervals significantly different from the reference area
when |
thetaLine |
Number for how many lines below bottom axis to display
"theta" or other default x-axis labels (which depend on |
xlim |
Vector of 2 numbers for x-axis limits. If |
Bonferroni |
Deprecated name for the |
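To make the roles of est, se, and confLevel concrete, the following base-R sketch computes the two kinds of intervals that the "individual" and "difference" plot types are described as showing. It is an illustration under stated assumptions, not the package's internal code: the toy estimates are made up, a normal approximation and independent samples are assumed, and treating the correction as a Bonferroni adjustment over m = n - 1 comparisons with the reference area is only one possible reading of a "demi" correction.

```r
# Hypothetical toy inputs (not taken from the package's datasets)
est <- c(A = 24.1, B = 25.0, C = 27.3)
se  <- c(A = 0.20, B = 0.15, C = 0.30)
confLevel <- 0.90
alpha <- 1 - confLevel

# Individual, uncorrected intervals: est +/- z * se
z <- qnorm(1 - alpha / 2)
individual <- cbind(lower = est - z * se, upper = est + z * se)

# Intervals for differences between each area and a reference area.
# For independent samples the SE of a difference is sqrt(se_i^2 + se_ref^2).
ref    <- "B"
others <- setdiff(names(est), ref)
m      <- length(others)              # assumed number of comparisons: n - 1
z_bonf <- qnorm(1 - alpha / (2 * m))  # Bonferroni-style adjusted critical value

diff_est <- est[others] - est[ref]
diff_se  <- sqrt(se[others]^2 + se[ref]^2)
difference <- cbind(lower = diff_est - z_bonf * diff_se,
                    upper = diff_est + z_bonf * diff_se)

print(round(individual, 2))
print(round(difference, 2))
```

An area whose difference interval excludes 0 would be flagged as significantly different from the reference under these assumptions.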
Users may wish to modify this code and write their own plot function, which can be swapped into plotFunction within RankPlotWithTable. Be aware that RankPlotWithTable uses layout to arrange the table and plot side-by-side, so layout cannot be used within a new plotFunction.
See Goldstein and Healy (1995) for details on the "average" confidence level procedure used when GH = TRUE.
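The idea behind such an "average" confidence level can be sketched in base R. Following the general approach of Goldstein and Healy, the sketch below finds a single multiplier z such that, averaged over all pairs of areas, non-overlap of the intervals est +/- z * se corresponds to a two-sided test of the difference at level alpha. The function name gh_multiplier and its interface are hypothetical, independence and normality are assumed, and this is not necessarily how RankPlot computes its intervals.

```r
# Hypothetical helper: average-level interval multiplier for a vector of SEs
gh_multiplier <- function(se, alpha = 0.10) {
  pairs <- utils::combn(length(se), 2)   # all unordered pairs (i, j)
  s_i <- se[pairs[1, ]]
  s_j <- se[pairs[2, ]]
  # Intervals i and j fail to overlap when
  #   |est_i - est_j| > z * (s_i + s_j),
  # and under the null the difference has SD sqrt(s_i^2 + s_j^2).
  ratio <- (s_i + s_j) / sqrt(s_i^2 + s_j^2)
  # Average pairwise probability of non-overlap under the null, minus alpha
  avg_error <- function(z) mean(2 * pnorm(-z * ratio)) - alpha
  uniroot(avg_error, interval = c(0, 10), tol = 1e-10)$root
}

# With equal standard errors the multiplier reduces to qnorm(1 - alpha/2) / sqrt(2)
z_equal <- gh_multiplier(rep(1, 5), alpha = 0.05)   # about 1.39
```

With unequal standard errors no single multiplier is exact for every pair, which is why the level is matched only on average.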
See Almond et al. (2000) for details on the "comparison intervals" procedure.
Almond, R.G., Lewis, C., Tukey, J.W., and Yan, D. (2000). "Displays for Comparing a Given State to Many Others," The American Statistician, vol. 54, no. 2, 89-93.
Goldstein, H. and Healy, M.J.R. (1995). "The Graphical Presentation of a Collection of Means," Journal of the Royal Statistical Society: Series A, vol. 158, no. 1, 175-177.
See also RankPlotWithTable and RankTable.
    # Plot of 90% confidence intervals for differences
    # between each state and Colorado, with demi-Bonferroni correction,
    # for US states' mean travel times to work, from the 2011 ACS
    data(TravelTime2011)
    with(TravelTime2011,
         RankPlot(est = Estimate.2dec, se = SE.2dec,
                  names = Abbreviation, refName = "CO",
                  confLevel = 0.90, cex = 0.6,
                  plotType = "difference"))
RankPlotWithTable aligns a table of ranking data with a plot of the data, in one combined figure. See RankTable and RankPlot for details about the default table and plot functions, including arguments that can be passed to those functions.
    RankPlotWithTable(
      tableParList,
      plotParList,
      tableFunction = RankTable,
      plotFunction = RankPlot,
      tableWidthProp = 3/8,
      tikzText = FALSE,
      annotRefName = NULL,
      annotRefRank = NULL,
      annotX = 0
    )
tableParList |
A required named list of arguments that will be passed
to |
plotParList |
A required named list of arguments that will be passed
to |
tableFunction |
The function to use for plotting a table of the data
on the left-hand side of the layout. Default is |
plotFunction |
The function to use for plotting a figure of the data
on the right-hand side of the layout. Default is |
tableWidthProp |
A number between 0 and 1, for what proportion of the
layout's width should be used to plot the table. The remaining proportion
|
tikzText |
Logical, formats text for tikz plotting if |
annotRefName, annotRefRank |
Optional rank and name of the reference
area, for adding an extra
annotation below the figure created by |
annotX |
A number, showing where on the x-axis to center the annotation
if |
Users may write their own table and plot functions to swap into tableFunction and plotFunction. Be aware that RankPlotWithTable uses layout to arrange the table and plot side-by-side, so layout cannot be used within either tableFunction or plotFunction. This can also cause trouble for using the lattice package within plotFunction.
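As a rough sketch of what a user-written plot function might look like, the base-graphics function below draws a point and a confidence bar for each area without calling layout. The name simpleIntervalPlot, its argument list, and the toy inputs are hypothetical. Whether it can be dropped in as plotFunction depends on how RankPlotWithTable forwards the elements of plotParList and aligns rows with the table, which is assumed rather than verified here; the catch-all `...` is included so that extra arguments would be ignored.

```r
# Hypothetical replacement plot function: points with +/- z * se bars.
# Uses only plot(), segments(), and text(), so it never calls layout().
simpleIntervalPlot <- function(est, se, names, confLevel = 0.90, ...) {
  z <- qnorm(1 - (1 - confLevel) / 2)
  n <- length(est)
  y <- rev(seq_len(n))                 # first area is drawn at the top
  lower <- est - z * se
  upper <- est + z * se
  plot(est, y, xlim = range(lower, upper), yaxt = "n",
       xlab = "Estimate", ylab = "", pch = 16)
  segments(lower, y, upper, y)         # one horizontal bar per area
  text(upper, y, labels = names, pos = 4, cex = 0.7, xpd = TRUE)
  invisible(data.frame(names = names, lower = lower, upper = upper))
}

# Stand-alone call on a null device, using made-up inputs
pdf(NULL)
out <- simpleIntervalPlot(est = c(24.1, 25.0, 27.3),
                          se = c(0.20, 0.15, 0.30),
                          names = c("A", "B", "C"))
invisible(dev.off())
```

Returning the interval endpoints invisibly makes the function easy to check without inspecting the drawing.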
    # Table with plot of individual 90% confidence intervals
    # for US states' mean travel times to work, from the 2011 ACS
    data(TravelTime2011)
    tableParList <- with(TravelTime2011,
      list(ranks = Rank, names = State,
           est = Estimate.2dec, se = SE.2dec,
           placeType = "State"))
    plotParList <- with(TravelTime2011,
      list(est = Estimate.2dec, se = SE.2dec,
           names = Abbreviation,
           confLevel = .90, plotType = "individual", cex = 0.6))
    RankPlotWithTable(tableParList = tableParList, plotParList = plotParList)

    # Illustrating the use of annotRefName and annotRefRank:
    # Table with plot of 90% confidence intervals for differences
    # between each state and Colorado, with demi-Bonferroni correction
    plotParList$plotType <- "difference"
    plotParList$refName <- "CO"
    RankPlotWithTable(tableParList = tableParList, plotParList = plotParList,
      annotRefName = "Colorado",
      annotRefRank = TravelTime2011$Rank[which(TravelTime2011$Abbreviation == "CO")])
RankTable creates a figure with a table of ranking data. This may not look very good plotted on its own. Rather, it is meant for use within RankPlotWithTable, which draws this table aligned with a plot of the data in one combined figure.
    RankTable(
      ranks,
      names,
      est,
      se,
      placeType = "State",
      col1 = 0.15,
      col2 = 0.6,
      col3 = 0.85,
      col4 = 1,
      textPos = 2,
      titleCex = 0.9,
      titleLift = 1.5,
      contentCex = 0.7,
      columnsPlotRefLine = NULL,
      tikzText = FALSE
    )
ranks |
Vector containing the rank of each area. |
names |
Vector containing the name of each area. |
est, se |
Vectors containing the point estimate and its standard error
for each area.
See vignettes for examples of using |
placeType |
String, naming the type of places or units being ranked. |
col1, col2, col3, col4 |
Numeric values between 0 and 1,
showing where each column's right-hand-side endpoint is
along the table's width. In other words, |
textPos |
Passed to |
titleCex |
Character expansion factor for column titles. |
titleLift |
Numeric value for how many row-heights to raise column titles above top row of column contents. |
contentCex |
Character expansion factor for column contents (all column text except the titles). |
columnsPlotRefLine |
Optional numeric value. If not NULL, how many row-heights below bottom row of column contents to print the phrase "Reference State:" (or "Reference <placeType>:") as a label for bottom row of columns plot. |
tikzText |
Logical, for whether or not to format text for tikz plotting. |
This function is currently hardcoded to give a table with four columns, with given column names. Users may wish to modify this code and write their own table function, which can be swapped into tableFunction within RankPlotWithTable. Be aware that RankPlotWithTable uses layout to arrange the table and plot side-by-side, so layout cannot be used within a new tableFunction.
See also RankPlotWithTable and RankPlot.
    # Table of US states' mean travel times to work, from the 2011 ACS
    data(TravelTime2011)
    # Just as inside RankPlotWithTable(),
    # we have to set par(xpd=TRUE)
    # and adjust the plotting margins
    oldpar <- par(no.readonly = TRUE)
    oldmar <- par('mar')
    par(xpd=TRUE, mar=c(oldmar[1],0,oldmar[3],0))
    with(TravelTime2011,
         RankTable(ranks = Rank, names = State,
                   est = Estimate.2dec, se = SE.2dec,
                   placeType = "State"))
    par(oldpar)
A dataset containing the estimated mean travel time (in minutes) to work of workers 16 years and over who did not work at home (henceforth "mean travel time to work"), and its estimated standard error, for each of the 51 states (including Washington, D.C.), from the 2011 American Community Survey.
TravelTime2011
A data frame with 51 rows and 7 variables:
state rank, by estimated mean travel time, where 1 is lowest travel time and 51 is highest
full name of the state
estimated mean travel time, in minutes
estimated standard error of the estimated mean travel time, in minutes
postal abbreviation of the state
factor variable for geographic region of the state: Northeast, South, Midwest, West, Pacific
Federal Information Processing Standard (FIPS) code of the state; may be useful for linking with other datasets
A dataset containing the estimated mean travel time (in minutes) to work of workers 16 years and over who did not work at home (henceforth "mean travel time to work"), and its estimated Margin of Error at the 90% confidence level, for each of the 51 states (including Washington, D.C.), from the 2011 American Community Survey.
TravelTime2011.1dec
A data frame with 51 rows and 7 variables:
state rank, by estimated mean travel time, where 1 is lowest travel time and 51 is highest
full name of the state
estimated mean travel time, in minutes
estimated Margin of Error (at the 90% confidence level) of the estimated mean travel time, in minutes
postal abbreviation of the state
factor variable for geographic region of the state: Northeast, South, Midwest, West, Pacific
Federal Information Processing Standard (FIPS) code of the state; may be useful for linking with other datasets
Due to rounding, some ranks are tied in this version of the data. Also note that this dataset reports Margins of Error (MoEs) instead of standard errors.
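Because the plotting and table functions take standard errors through their se argument, MoEs would presumably need converting before use. A minimal sketch, assuming the MoE is the half-width of a normal-approximation interval at the stated confidence level, so that SE = MoE / z with z = qnorm(0.95), roughly 1.645, for 90%. The helper name moe_to_se and the toy values are hypothetical; the dataset's column names are not used here.

```r
# Hypothetical helper: convert margins of error to standard errors,
# assuming MoE = z * SE under a normal approximation.
moe_to_se <- function(moe, confLevel = 0.90) {
  z <- qnorm(1 - (1 - confLevel) / 2)   # about 1.645 when confLevel = 0.90
  moe / z
}

# Made-up MoEs in minutes, for illustration only
moe <- c(0.2, 0.3, 0.1)
se  <- moe_to_se(moe)
print(round(se, 3))
```

Because this version of the data is rounded, standard errors recovered in this way would inherit the rounding of the published MoEs.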