Package 'RankingProject'

Title: The Ranking Project: Visualizations for Comparing Populations
Description: Functions to generate plots and tables for comparing independently-sampled populations. Companion package to "A Primer on Visualizations for Comparing Populations, Including the Issue of Overlapping Confidence Intervals" by Wright, Klein, and Wieczorek (2019) <DOI:10.1080/00031305.2017.1392359> and "A Joint Confidence Region for an Overall Ranking of Populations" by Klein, Wright, and Wieczorek (2020) <DOI:10.1111/rssc.12402>.
Authors: Jerzy Wieczorek [cre, aut], Joel Beard [ctb], Adam Hall [ctb], Andy Liaw [ctb], Robert Gentleman [ctb], Martin Maechler [ctb]
Maintainer: Jerzy Wieczorek <[email protected]>
License: GPL-2
Version: 0.4.0.9002
Built: 2025-02-28 05:17:08 UTC
Source: https://github.com/civilstat/rankingproject


The Ranking Project: Visualizations for Comparing Populations

Description

Functions to generate plots and tables for comparing independently-sampled populations. Companion package to "A Primer on Visualizations for Comparing Populations, Including the Issue of Overlapping Confidence Intervals" by Wright, Klein, and Wieczorek (2019) <DOI:10.1080/00031305.2017.1392359> and "A Joint Confidence Region for an Overall Ranking of Populations" by Klein, Wright, and Wieczorek (2020) <DOI:10.1111/rssc.12402>. See the Intro vignette (html) for an overview and examples: vignette("intro", package = "RankingProject"). See the Primer vignette (pdf) for code which replicates the main figures from the 2019 article: vignette("primer", package = "RankingProject"). See the Joint vignette (pdf) for code which replicates the main figures from the 2020 article: vignette("joint", package = "RankingProject").

Details

The "comparison" plots are based on figures and S code from Almond et al. (2000). The present package does not contain a direct modification of their S code, but draws inspiration from it. Their script was originally hosted at Statlib at http://stat.cmu.edu/S/comprB and may still be found at Statlib mirrors such as http://ftp.uni-bayreuth.de/math/statlib/S/comprB.

The code for the "columns" plots is directly based on R's stats::heatmap() function, with minor modifications to remove dendrograms and allow the heatmap to be placed inside a larger layout().

References

Almond, R.G., Lewis, C., Tukey, J.W., and Yan, D. (2000). "Displays for Comparing a Given State to Many Others," The American Statistician, vol. 54, no. 2, 89-93, DOI:10.1080/00031305.2000.10474517.

Klein, M., Wright, T., and Wieczorek, J. (2020). "A Joint Confidence Region for an Overall Ranking of Populations," Journal of the Royal Statistical Society: Series C, vol. 69, no. 3, 589-606, DOI:10.1111/rssc.12402.

Wright, T., Klein, M., and Wieczorek, J. (2019). "A Primer on Visualizations for Comparing Populations, Including the Issue of Overlapping Confidence Intervals," The American Statistician, vol. 73, no. 2, 165-178, DOI:10.1080/00031305.2017.1392359.


Figure containing a plot of ranking data.

Description

RankPlot creates a figure with a plot of ranking data, from among several options for showing uncertainty in the ranked estimates. This function is meant for use within RankPlotWithTable, which draws a ranking table aligned with this plot of the data in one combined figure.

Usage

RankPlot(
  est,
  se,
  names,
  refName = NULL,
  confLevel = 0.9,
  plotType = c("individual", "difference", "comparison", "columns"),
  tiers = 1,
  GH = FALSE,
  multcomp.scope = ifelse(plotType == "individual", "none", "demi"),
  multcomp.type = c("bonferroni", "independence"),
  tikzText = FALSE,
  cex = 1,
  tickWidth = NULL,
  rangeFactor = 1.2,
  textPad = 0,
  legendX = "topleft",
  legendY = NULL,
  legendText = NULL,
  lwdReg = 1,
  lwdBold = 3,
  thetaLine = 1,
  xlim = NULL,
  Bonferroni
)

Arguments

est, se

Vectors containing the point estimate and its standard error for each area.

names

Vector containing the name of each area. Abbreviations may be preferable to full names (e.g. "CO" instead of "Colorado") since these names will be displayed directly on the plot.

refName

String containing the name of the reference area; must be one of the values in names. Required for plotType = c("difference", "comparison"). Optional for plotType = "individual" (where it only determines the row above/below which the names are plotted to the right/left of the intervals; if unspecified, defaults to median rank); or for plotType = "columns" (where it selects one column to be highlighted by vertical lines, if specified).

confLevel

Number between 0 and 1: confidence level for individual (uncorrected) hypothesis tests and/or confidence intervals. E.g. with plotType = "individual", confLevel = 0.9 will plot individual 90% confidence intervals. If using GH = TRUE and/or multcomp.scope != "none", the Goldstein-Healy and/or Bonferroni/Independence corrections will be applied to the confLevel baseline.

plotType

Which type of ranking plot to use. See vignettes for examples and details.

  • "individual" is used for usual individual confidence intervals, with or without Goldstein-Healy adjustment and/or (demi or full) Bonferroni/Independence corrections.

  • "difference" shows confidence intervals for the differences between the reference area refName and all other areas.

  • "comparison" also compares the reference area refName to all others, but using the "comparison intervals" of Almond et al. (2000).

  • "columns" plots a grid of shaded columns, where each column uses shading to report demi-Bonferroni/Independence-corrected significance tests for comparing the reference area (labeled at the bottom of the column) with all other areas.
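
As a rough illustration of the idea behind plotType = "difference", the sketch below computes an interval for the difference between each area and a reference area. It assumes independently sampled areas, so that the standard error of a difference is sqrt(se_i^2 + se_ref^2). The toy data and the helper name diff_intervals are hypothetical, no multiple-comparison correction is applied, and this is not the package's internal code.

```r
# Hypothetical toy data: three areas with estimates and standard errors
est <- c(A = 20.1, B = 24.5, C = 27.3)
se  <- c(A = 0.30, B = 0.40, C = 0.25)

# Sketch: uncorrected intervals for est[i] - est[ref], assuming independence
diff_intervals <- function(est, se, refName, confLevel = 0.9) {
  z       <- stats::qnorm(1 - (1 - confLevel) / 2)
  others  <- setdiff(names(est), refName)
  d       <- est[others] - est[refName]
  se_diff <- sqrt(se[others]^2 + se[refName]^2)
  data.frame(area  = others,
             diff  = unname(d),
             lower = unname(d - z * se_diff),
             upper = unname(d + z * se_diff),
             row.names = NULL)
}

ints <- diff_intervals(est, se, refName = "B")
# An interval that excludes 0 suggests a significant difference from the reference
ints$significant <- ints$lower > 0 | ints$upper < 0
print(ints)
```

In the actual plots, a demi correction would widen these intervals before significance is judged.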

tiers

Numeric, either 1 for usual confidence intervals, or 2 for two-tiered intervals. 2 can only be used with plotType = "individual", when either GH = TRUE or multcomp.scope != "none" or both. In that case, the "inner tiers" run between each interval's cross-bars, and the "outer tiers" run past the cross-bars all the way to the ends of each interval. One of the tiers will show uncorrected confLevel*100% confidence intervals, and the other tier will show the Goldstein-Healy and/or Bonferroni/Independence adjusted intervals. A legend will show which tier is which; usually Goldstein-Healy alone gives shorter intervals (inner tier), but Bonferroni/Independence corrections make them into longer intervals (outer tier).

GH

Logical, for whether or not to plot adjusted confidence intervals at an "average" confLevel*100% confidence level as in Goldstein and Healy (1995). Can only be used with plotType = "individual".
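
The sketch below is one reading of the "average" confidence level idea in Goldstein and Healy (1995), not the package's implementation. It assumes normal, independent estimates and searches for a single multiplier z such that, averaged over all pairs of areas, the probability that two intervals est ± z*se fail to overlap when the true values are equal is 1 - confLevel. The function name gh_multiplier is hypothetical.

```r
# Sketch: a common interval multiplier giving an average pairwise error rate
# of 1 - confLevel, in the spirit of Goldstein and Healy (1995)
gh_multiplier <- function(se, confLevel = 0.9) {
  alpha <- 1 - confLevel
  pairs <- utils::combn(length(se), 2)
  se_i  <- se[pairs[1, ]]
  se_j  <- se[pairs[2, ]]
  # Two intervals fail to overlap when |est_i - est_j| > z * (se_i + se_j);
  # under equal true values the difference has SD sqrt(se_i^2 + se_j^2)
  ratio <- (se_i + se_j) / sqrt(se_i^2 + se_j^2)
  avg_error <- function(z) mean(2 * stats::pnorm(-z * ratio)) - alpha
  stats::uniroot(avg_error, interval = c(0, 10), tol = 1e-10)$root
}

# With equal standard errors the multiplier reduces to qnorm(1 - alpha/2) / sqrt(2)
z_gh <- gh_multiplier(rep(1, 5), confLevel = 0.9)
print(z_gh)
```

Because this multiplier is smaller than the usual qnorm(1 - alpha/2), the resulting intervals are shorter than the uncorrected ones, which is consistent with the description of the inner tier under tiers = 2.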

multcomp.scope

Whether to correct for multiple comparisons, and if so, for how many (by a correction to the confidence level of the tests or intervals). "none" performs no correction; "demi" corrects for comparing one reference area to all n-1 other areas; and "full" corrects for comparing all possible choose(n, 2) pairs of areas. Also use the multcomp.type argument to specify whether the correction should rely on Bonferroni (default) or on an assumption of Independence. If GH = TRUE, the Goldstein-Healy adjustment is performed first, and any Bonferroni/Independence correction is applied afterwards. Settings "none" and "full" can only be used with plotType = "individual"; all other plot types use the setting "demi".

multcomp.type

(Only used if multcomp.scope != "none".) Whether multiple comparison corrections should use a Bonferroni correction ("bonferroni") or an independence-based correction ("independence"). See Section 4 of Klein, Wright, and Wieczorek (2020), "A Joint Confidence Region for an Overall Ranking of Populations," for the difference between these two corrections.
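
To make the scope and type options concrete, the following sketch adjusts an individual confidence level for m simultaneous comparisons, with m = n-1 for "demi" and m = choose(n, 2) for "full". It uses the textbook Bonferroni formula and the usual independence-based (Šidák-style) formula; the helper name corrected_level is hypothetical, and the package's internal computation may differ in detail.

```r
# Sketch: adjust an individual confidence level for m simultaneous comparisons
corrected_level <- function(confLevel, n,
                            scope = c("none", "demi", "full"),
                            type  = c("bonferroni", "independence")) {
  scope <- match.arg(scope)
  type  <- match.arg(type)
  # Number of comparisons implied by the scope
  m <- switch(scope, none = 1, demi = n - 1, full = choose(n, 2))
  alpha <- 1 - confLevel
  if (type == "bonferroni") {
    1 - alpha / m          # split the error rate evenly across m comparisons
  } else {
    confLevel^(1 / m)      # joint coverage of m independent intervals equals confLevel
  }
}

# 51 areas, one reference area compared with the other 50
demi_bonf  <- corrected_level(0.9, n = 51, scope = "demi", type = "bonferroni")
demi_indep <- corrected_level(0.9, n = 51, scope = "demi", type = "independence")
print(c(bonferroni = demi_bonf, independence = demi_indep))
```

The independence-based level is never larger than the Bonferroni level, so it gives slightly shorter intervals when the independence assumption is reasonable.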

tikzText

Logical, for whether or not to format text for tikz plotting.

cex

Character expansion factor for the points used to plot each area's point estimate, and for the text used to plot each area's name next to its interval.

tickWidth

Numeric height of the cross-bars on interval endpoints (or inner tiers, if tiers = 2). The function tries to leave a reasonable amount of space between intervals plotted in different rows, but sometimes it may help to adjust tickWidth manually.

rangeFactor

Numeric multiple by which to expand the range of the data when setting the x-axis limits. The function tries to leave sufficient room for plotting margins of error and names next to each area, but sometimes it may help to adjust rangeFactor manually.

textPad

Numeric amount by which to shift the text of names past the interval endpoints when plotting. Positive values shift outwards (towards the edges of the plot); negative values shift inwards.

legendX, legendY

The x and y co-ordinates used to position the legend; see legend for details on specifying x by keyword.

legendText

String, or string vector, with legend text. By default, each plot type adds informative legend text, but the user may override. To remove legends entirely, set legendText=NA.

lwdReg

Positive number for the line width of regular lines. Used for all intervals when plotType = "individual", or for intervals not significantly different from the reference area when plotType = c("difference", "comparison").

lwdBold

Positive number for the line width of bold lines. Used for intervals significantly different from the reference area when plotType = c("difference", "comparison").

thetaLine

Number for how many lines below bottom axis to display "theta" or other default x-axis labels (which depend on plotType).

xlim

Vector of 2 numbers for x-axis limits. If NULL, will be automatically set using range of data expanded by rangeFactor.

Bonferroni

Deprecated name for the multcomp.scope argument.

Details

Users may wish to modify this code and write their own plot function, which can be swapped into plotFunction within RankPlotWithTable. Be aware that RankPlotWithTable uses layout to arrange the table and plot side-by-side, so layout cannot be used within a new plotFunction.

See Goldstein and Healy (1995) for details on the "average" confidence level procedure used when GH = TRUE. See Almond et al. (2000) for details on the "comparison intervals" procedure.

References

Almond, R.G., Lewis, C., Tukey, J.W., and Yan, D. (2000). "Displays for Comparing a Given State to Many Others," The American Statistician, vol. 54, no. 2, 89-93.

Goldstein, H. and Healy, M.J.R. (1995). "The Graphical Presentation of a Collection of Means," Journal of the Royal Statistical Society: Series A, vol. 158, no. 1, 175-177.

See Also

RankPlotWithTable and RankTable.

Examples

# Plot of 90% confidence intervals for differences
# between each state and Colorado, with demi-Bonferroni correction,
# for US states' mean travel times to work, from the 2011 ACS
data(TravelTime2011)
with(TravelTime2011,
     RankPlot(est = Estimate.2dec, se = SE.2dec,
              names = Abbreviation, refName = "CO",
              confLevel = 0.90, cex = 0.6,
              plotType = "difference"))

Figure containing aligned table and plot of ranking data.

Description

RankPlotWithTable aligns a table of ranking data with a plot of the data, in one combined figure. See RankTable and RankPlot for details about the default table and plot functions, including arguments that can be passed to those functions.

Usage

RankPlotWithTable(
  tableParList,
  plotParList,
  tableFunction = RankTable,
  plotFunction = RankPlot,
  tableWidthProp = 3/8,
  tikzText = FALSE,
  annotRefName = NULL,
  annotRefRank = NULL,
  annotX = 0
)

Arguments

tableParList

A required named list of arguments that will be passed to tableFunction using do.call(). The default tableFunction is RankTable, which requires at least these four arguments: ranks, names, est, se.

plotParList

A required named list of arguments that will be passed to plotFunction using do.call(). The default plotFunction is RankPlot, which requires at least these three arguments: est, se, names.
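
The do.call() mechanism mentioned above can be sketched in isolation. The stand-in function demoPlotFunction and the list demoParList are hypothetical and draw nothing; they only show how the names in a parameter list are matched to a function's arguments.

```r
# Hypothetical stand-in for a plot function: it just reports what it received
demoPlotFunction <- function(est, se, names, confLevel = 0.9, ...) {
  sprintf("%d areas at confidence level %.2f", length(est), confLevel)
}

# A named list whose element names match the function's argument names
demoParList <- list(est = c(20.1, 24.5), se = c(0.3, 0.4),
                    names = c("A", "B"), confLevel = 0.95)

# do.call() unpacks the list into named arguments, so this is equivalent to
# demoPlotFunction(est = ..., se = ..., names = ..., confLevel = 0.95)
msg <- do.call(demoPlotFunction, demoParList)
print(msg)
```

Any argument left out of the list falls back to the function's own default, which is why only the required arguments need to appear in tableParList and plotParList.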

tableFunction

The function to use for plotting a table of the data on the left-hand side of the layout. Default is RankTable.

plotFunction

The function to use for plotting a figure of the data on the right-hand side of the layout. Default is RankPlot.

tableWidthProp

A number between 0 and 1, for what proportion of the layout's width should be used to plot the table. The remaining proportion 1-tableWidthProp is used to plot the figure.

tikzText

Logical, formats text for tikz plotting if TRUE.

annotRefName, annotRefRank

Optional name and rank of the reference area, for adding an extra annotation below the figure created by plotFunction. The annotation is centered at annotX on the x-axis (0 by default), so it is mainly useful when plotType = "difference". The annotation is drawn only if both annotRefName and annotRefRank are provided.

annotX

A number, showing where on the x-axis to center the annotation if annotRefName and annotRefRank are not NULL.

Details

Users may write their own table and plot functions to swap into tableFunction and plotFunction. Be aware that RankPlotWithTable uses layout to arrange the table and plot side-by-side, so layout cannot be used within either tableFunction or plotFunction. This can also cause trouble for using the lattice package within plotFunction.
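
The side-by-side arrangement described here can be approximated with base graphics as follows. This is only a sketch of how layout() splits a device into a table panel and a plot panel of relative widths tableWidthProp and 1-tableWidthProp; the panel contents are placeholders, and the package's actual internals may differ.

```r
# Open a null PDF device so the sketch runs without a display
grDevices::pdf(NULL)

tableWidthProp <- 3/8  # same value as the documented default

# One row, two panels: panel 1 (table) on the left, panel 2 (plot) on the right
n_panels <- graphics::layout(matrix(1:2, nrow = 1),
                             widths = c(tableWidthProp, 1 - tableWidthProp))

# Each new high-level plot fills the next panel in turn
graphics::plot.new()
graphics::text(0.5, 0.5, "table goes here")
graphics::plot(1:10, main = "plot goes here")

invisible(grDevices::dev.off())
```

A second call to layout() inside either panel function would redefine the panels for the whole device, which is why custom table and plot functions should avoid it.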

See Also

RankPlot and RankTable.

Examples

# Table with plot of individual 90% confidence intervals
# for US states' mean travel times to work, from the 2011 ACS
data(TravelTime2011)
tableParList <- with(TravelTime2011,
  list(ranks = Rank, names = State,
       est = Estimate.2dec, se = SE.2dec,
       placeType = "State"))
plotParList <- with(TravelTime2011,
  list(est = Estimate.2dec, se = SE.2dec,
       names = Abbreviation,
       confLevel = .90, plotType = "individual", cex = 0.6))
RankPlotWithTable(tableParList = tableParList,
  plotParList = plotParList)

# Illustrating the use of annotRefName and annotRefRank:
# Table with plot of 90% confidence intervals for differences
# between each state and Colorado, with demi-Bonferroni correction
plotParList$plotType <- "difference"
plotParList$refName <- "CO"
RankPlotWithTable(tableParList = tableParList,
  plotParList = plotParList, annotRefName = "Colorado",
  annotRefRank = TravelTime2011$Rank[which(TravelTime2011$Abbreviation == "CO")])

Figure containing a table of ranking data.

Description

RankTable creates a figure with a table of ranking data. This may not look very good plotted on its own. Rather, it is meant for use within RankPlotWithTable, which draws this table aligned with a plot of the data in one combined figure.

Usage

RankTable(
  ranks,
  names,
  est,
  se,
  placeType = "State",
  col1 = 0.15,
  col2 = 0.6,
  col3 = 0.85,
  col4 = 1,
  textPos = 2,
  titleCex = 0.9,
  titleLift = 1.5,
  contentCex = 0.7,
  columnsPlotRefLine = NULL,
  tikzText = FALSE
)

Arguments

ranks

Vector containing the rank of each area.

names

Vector containing the name of each area.

est, se

Vectors containing the point estimate and its standard error for each area. See vignettes for examples of using formatC to turn the numeric estimates or SEs into strings, for printing with a consistent number of decimal places.
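
As a minimal sketch of the formatC approach mentioned above (the values are hypothetical, and the vignettes may use different settings), numeric estimates can be turned into strings with a fixed number of decimal places before they are placed in the table:

```r
# Hypothetical estimates with differing numbers of decimal places
est <- c(24.5, 9.123, 31)

# format = "f" requests fixed notation; digits = 2 keeps two decimal places
est_str <- formatC(est, format = "f", digits = 2)
print(est_str)
```

Every resulting string has exactly two decimal places, so right-justified columns line up on the decimal point.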

placeType

String, naming the type of places or units being ranked.

col1, col2, col3, col4

Numeric values between 0 and 1, showing where each column's right-hand-side endpoint is along the table's width. In other words, colJ should be the fraction of the table's total width at which the Jth column should end, assuming the default right-aligned columns (textPos = 2). Use col4 = 1 unless you want the table to be narrower than the space available, or unless you switch to centered or left-aligned columns.

textPos

Passed to pos argument of text. Default of 2 ensures each column of text is right-justified.

titleCex

Character expansion factor for column titles.

titleLift

Numeric value for how many row-heights to raise column titles above top row of column contents.

contentCex

Character expansion factor for column contents (all column text except the titles).

columnsPlotRefLine

Optional numeric value. If not NULL, how many row-heights below bottom row of column contents to print the phrase "Reference State:" (or "Reference <placeType>:") as a label for bottom row of columns plot.

tikzText

Logical, for whether or not to format text for tikz plotting.

Details

This function is currently hardcoded to give a table with four columns, with given column names. Users may wish to modify this code and write their own table function, which can be swapped into tableFunction within RankPlotWithTable. Be aware that RankPlotWithTable uses layout to arrange the table and plot side-by-side, so layout cannot be used within a new tableFunction.

See Also

RankPlotWithTable and RankPlot.

Examples

# Table of US states' mean travel times to work, from the 2011 ACS
data(TravelTime2011)
# Just as inside RankPlotWithTable(),
# we have to set par(xpd=TRUE)
# and adjust the plotting margins
oldpar <- par(no.readonly = TRUE)
oldmar <- par('mar')
par(xpd=TRUE, mar=c(oldmar[1],0,oldmar[3],0))
with(TravelTime2011,
     RankTable(ranks = Rank, names = State,
               est = Estimate.2dec, se = SE.2dec,
               placeType = "State"))
par(oldpar)

Mean travel times to work, from 2011 ACS.

Description

A dataset containing the estimated mean travel time (in minutes) to work of workers 16 years and over who did not work at home (henceforth "mean travel time to work"), and its estimated standard error, for each of the 51 states (including Washington, D.C.), from the 2011 American Community Survey.

Usage

TravelTime2011

Format

A data frame with 51 rows and 7 variables:

Rank

state rank, by estimated mean travel time, where 1 is lowest travel time and 51 is highest

State

full name of the state

Estimate.2dec

estimated mean travel time, in minutes

SE.2dec

estimated standard error of the estimated mean travel time, in minutes

Abbreviation

postal abbreviation of the state

Region

factor variable for geographic region of the state: Northeast, South, Midwest, West, Pacific

FIPS

Federal Information Processing Standard (FIPS) code of the state; may be useful for linking with other datasets

Source

https://www.census.gov/


Mean travel times to work, from 2011 ACS, rounded to 1 decimal place.

Description

A dataset containing the estimated mean travel time (in minutes) to work of workers 16 years and over who did not work at home (henceforth "mean travel time to work"), and its estimated Margin of Error at the 90% confidence level, for each of the 51 states (including Washington, D.C.), from the 2011 American Community Survey.

Usage

TravelTime2011.1dec

Format

A data frame with 51 rows and 7 variables:

Rank

state rank, by estimated mean travel time, where 1 is lowest travel time and 51 is highest

State

full name of the state

Estimate.1dec

estimated mean travel time, in minutes

MOE.1dec

estimated Margin of Error (at the 90% confidence level) of the estimated mean travel time, in minutes

Abbreviation

postal abbreviation of the state

Region

factor variable for geographic region of the state: Northeast, South, Midwest, West, Pacific

FIPS

Federal Information Processing Standard (FIPS) code of the state; may be useful for linking with other datasets

Details

Due to rounding, some ranks are tied in this version of the data. Also note that this dataset reports Margins of Error (MoEs) instead of standard errors.
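
Since RankPlot and RankTable take standard errors, margins of error would need to be converted first. The sketch below assumes the MoE is a normal-theory half-width at the 90% level, using the multiplier 1.645 that the Census Bureau documents for ACS margins of error; the helper name moe_to_se and the toy values are hypothetical.

```r
# Sketch: convert a 90% margin of error to a standard error.
# Assumes MOE = z * SE with z = 1.645 (approximately qnorm(0.95)).
moe_to_se <- function(moe, z = 1.645) moe / z

moe <- c(0.2, 0.5, 0.1)  # hypothetical MOE values, in minutes
se  <- moe_to_se(moe)
print(round(se, 3))
```

With the packaged data, the same helper could be applied to the MOE.1dec column of TravelTime2011.1dec.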

Source

https://www.census.gov/