Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bug in ordered multinomial models with missing categories #28

Closed
dmphillippo opened this issue Jan 9, 2024 · 2 comments
Closed

Bug in ordered multinomial models with missing categories #28

dmphillippo opened this issue Jan 9, 2024 · 2 comments

Comments

@dmphillippo
Copy link
Owner

In ordered multinomial models where studies do not observe all categories, data points could be assigned the wrong category in a couple of scenarios:

  • Where a study does not observe the first category; absolute estimates in this study would be wrong
  • Where a study does not observe a middle category; relative and absolute estimates in this study would be wrong

For example, consider the HTA psoriasis data with categories removed from some studies:

Code
library(multinma)
library(dplyr)
library(tidyr)
library(ggplot2)

options(mc.cores = parallel::detectCores())

# Remove some categories
hta_dat <- hta_psoriasis %>% 
  mutate(PASI50 = ifelse(studyc == "Elewski", NA, PASI50),
         PASI75 = ifelse(studyc == "Leonardi", NA, PASI75))

hta_net <- set_agd_arm(hta_dat,
                       study = paste(studyc, year),
                       trt = trtc,
                       r = multi(r0 = sample_size - rowSums(cbind(PASI50, PASI75, PASI90), na.rm = TRUE),
                                 PASI50, PASI75, PASI90,
                                 inclusive = FALSE,
                                 type = "ordered"))
hta_net

hta_fit_FE <- nma(hta_net,
                  trt_effects = "fixed",
                  link = "probit",
                  prior_intercept = normal(scale = 100),
                  prior_trt = normal(scale = 10),
                  prior_aux = flat())

hta_fit_FE

predict(hta_fit_FE, 
        baseline = distr(qnorm, mean = -1.097, sd = 123^-0.5), 
        type = "response")

# Plot predictions against observed data
hta_pred_FE <- predict(hta_fit_FE, type = "response")

hta_pred_mod <- 
  hta_pred_FE %>% 
  as_tibble() %>% 
  mutate(method = "NMA",
         outcomef = recode_factor(gsub("^pred\\[.+: (.+), PASI(50|75|90)\\]$", "\\2", parameter), 
                                  "50" = "PASI 50",
                                  "75" = "PASI 75",
                                  "90" = "PASI 90"),
         .trt = factor(gsub("^pred\\[.+: (.+?)(, PASI(50|75|90))?\\]$", "\\1", parameter), 
                       levels = levels(hta_net$treatments)),
         estimate = mean, conf.low = `2.5%`, conf.high = `97.5%`)

# Get observed proportions
get_binom_ci <- function(r, n) {
  if (is.na(r)) return(tibble(estimate = NA_real_, conf.low = NA_real_, conf.high = NA_real_))
  bt <- binom.test(r, n)
  tibble(estimate = bt$estimate,
         conf.low = bt$conf.int[1],
         conf.high = bt$conf.int[2])
}

hta_pred_obs <- 
  hta_dat %>%
  mutate(studyc = paste(studyc, year)) %>% 
  pivot_longer(matches("PASI(50|75|90)"), names_to = "outcome", values_to = "value",
               names_pattern = "(PASI(?:50|75|90))") %>% 
  rowwise() %>% #group_by(studyc, trtc, outcome) %>%
  summarise(studyc, trtc, outcome, get_binom_ci(value, sample_size))%>% 
  mutate(method = "Observed", 
         outcomef = recode_factor(outcome,
                                  PASI50 = "PASI 50",
                                  PASI75 = "PASI 75", 
                                  PASI90 = "PASI 90"),
         .trt = factor(trtc, levels = levels(hta_net$treatments)),
         .study = factor(studyc, levels = levels(hta_net$studies)))

hta_pred_all <- bind_rows(hta_pred_mod, hta_pred_obs) %>% 
  mutate(methodf = factor(method, levels = c("NMA", "Observed")))

# Plot
ggplot(hta_pred_all,
       aes(x = estimate*100, xmin = conf.low*100, xmax = conf.high*100, 
           y = forcats::fct_rev(.trt), shape = methodf, colour = methodf)) +
  geom_errorbar(width = 0, position = position_dodge(width = 0.75), size = 0.3) +
  geom_point(position = position_dodge(width = 0.75), fill = "white") +
  xlab("Percent achieving PASI outcome") + ylab("Treatment") +
  coord_cartesian(xlim = c(0, 100)) +
  facet_grid(.study~outcomef) +
  scale_shape_manual("Method", values = c(`NMA` = 16, Observed = 21)) +
  scale_colour_manual("Method", values = c(`NMA` = "#00468B", Observed = "#8B0000")) +
  theme_multinma()

The output plot with the current package version (0.5.1) with the issue looks like this, whereas the fixed version with the correct categories looks like this.

The issue is particularly noticeable in this example for studies omitting the first category (PASI 50).

dmphillippo added a commit that referenced this issue Jan 9, 2024
Data points could be assigned the wrong category in a couple of scenarios:
* Where a study does not observe the first category; absolute estimates in this study would be wrong
* Where a study does not observe a middle category; relative and absolute estimates in this study would be wrong
@dmphillippo
Copy link
Owner Author

Fix merged into development branch, for next CRAN release.

@dmphillippo
Copy link
Owner Author

Now on CRAN, with the release of version 0.6.0

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant