Theoretical Justification for Conjoint Ensembles

Derek Miller edited this page Oct 30, 2018 · 2 revisions

Introduction to the Problem

Consider the ideal conjoint study. A respondent begins the survey that collects their choice data. She reads all of the instructions beforehand, carefully considers each alternative and corresponding features, and chooses the alternative that maximizes her utility. This behavior is constant throughout every question and she remains focused and attentive until the last choice task is completed.

This ideal is rarely realized. When respondents deviate from it in systematic ways, we say that a pathology is present in the data. Researchers have proposed various strategies to compensate for the effects of data pathologies.

Solution 1: Use a tailored model to correct each pathology

In this approach, a researcher notices a data pathology in a discrete choice experiment and derives a new model to handle its effects. In the context of conjoint studies, this approach has several disadvantages.

  • To account for the pathology, tailored models must become more complex, both technically and conceptually.
  • Tailored models are more computationally intensive than standard models.
  • Tailored models are not robust to pathologies in the data other than those accounted for by the model.

Solution 2: Use a Bayesian mixture model to account for all pathologies

Ideally, we could take a principled Bayesian approach by carefully considering data pathologies and incorporating their effects into a prior probability distribution. In this framework, a mixture prior can account for multiple data pathologies and their interactions. Additionally, we can still use the multinomial logit model; accounting for the pathologies in the prior means that we don't have to change the structure of the model. With this, we get accurate inference on all of the parameters jointly and the resulting predictions are more reliable. Of course, the ideal case is unrealistic and presents a variety of problems.

  • If there are only a few relatively minor pathologies, a mixture approach is best (although, based on our empirical research, its results probably would not differ much from those of the standard multinomial logit). However, if a pathology is nontrivial, the computational cost rises sharply. Accounting for more than one pathology may even make the model computationally intractable.
  • A Bayesian mixture model assumes that the pathologies are known beforehand and can each be incorporated into the prior in a way that does not conflict with the structure of the others.
  • Adding structure for pathologies that are not actually present is undesirable but largely unavoidable.

We are therefore left with a method that is complete in scope but computationally unrealistic, and whose cost does not scale linearly as pathologies are added.
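
To make the structure of Solution 2 concrete, the block below writes out the multinomial logit choice probability and a mixture prior over its coefficients. The notation is an assumption introduced here for illustration and does not come from the page: respondent i, choice task t, J alternatives with feature vectors x, coefficients β, and K prior components (one per pathology) with weights π. Placing the mixture on respondent-level coefficients is also an assumption.

```latex
% Multinomial logit: probability that respondent i picks alternative j in task t
P(y_{it} = j \mid \beta_i)
  = \frac{\exp\!\left(x_{itj}^{\top} \beta_i\right)}
         {\sum_{l=1}^{J} \exp\!\left(x_{itl}^{\top} \beta_i\right)}

% Mixture prior: the likelihood above is unchanged; pathologies enter only here
p(\beta_i) = \sum_{k=1}^{K} \pi_k \, p_k(\beta_i),
\qquad \pi_k \ge 0, \quad \sum_{k=1}^{K} \pi_k = 1
```

Written this way, each additional pathology adds a component to the prior rather than a new term to the likelihood.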

Solution 3: Use a stacked ensemble to approximate the ideal mixture model

Assuming that the ideal model is a mixture, we can approximate that mixture with a stacked ensemble of submodels that account for the data pathologies. The idea is that if each submodel has a prior that can identify a pathology in the data set, its predictions will account for the effects of that pathology. On its own, each submodel may give substandard results. When combined, however, the ensemble can outperform every individual submodel. We prefer stacking of predictive distributions as the ensemble method (Yao et al., 2018).
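
The following is a minimal sketch of how stacking weights could be computed, not the implementation used in this project. It assumes each submodel has already produced held-out (for example, leave-one-out) predictive probabilities for the choices that were actually observed, and it picks simplex weights that maximize the log score of the weighted combination. The function names and the numbers in `pred_probs` are hypothetical. The optimizer is a simple EM-style fixed-point update chosen to keep the example dependency-free; a general-purpose constrained optimizer would serve equally well.

```python
import math


def log_score(pred_probs, weights):
    """Sum over observations of the log of the weighted predictive probability."""
    return sum(
        math.log(sum(w * p for w, p in zip(weights, row)))
        for row in pred_probs
    )


def stacking_weights(pred_probs, n_iter=1000, tol=1e-12):
    """Find simplex weights that maximize log_score.

    pred_probs[i][k] is the held-out probability that submodel k assigned
    to the choice actually observed for observation i (assumed > 0).
    """
    n_obs = len(pred_probs)
    n_models = len(pred_probs[0])
    weights = [1.0 / n_models] * n_models  # start from equal weights

    for _ in range(n_iter):
        updated = [0.0] * n_models
        for row in pred_probs:
            mix = sum(w * p for w, p in zip(weights, row))
            for k in range(n_models):
                # Share of this observation's mixture probability owed to model k
                updated[k] += weights[k] * row[k] / mix
        updated = [u / n_obs for u in updated]

        change = max(abs(u - w) for u, w in zip(updated, weights))
        weights = updated
        if change < tol:
            break
    return weights


# Hypothetical held-out probabilities: 6 observed choices, 3 submodels.
pred_probs = [
    [0.70, 0.20, 0.40],
    [0.60, 0.30, 0.35],
    [0.10, 0.80, 0.30],
    [0.20, 0.70, 0.45],
    [0.50, 0.50, 0.40],
    [0.80, 0.10, 0.30],
]

weights = stacking_weights(pred_probs)
print("stacking weights:", weights)
print("stacked log score:", log_score(pred_probs, weights))
```

In this toy data, no single submodel predicts every observation well, so the weighted combination scores higher than any submodel alone, which is the behavior the ensemble relies on.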

This stacked ensemble does come with some disadvantages. For example, the accuracy of our inferences is not guaranteed. However, the approach scales with the number of pathologies, is easier to understand conceptually, is fairly easy to implement, and gives good empirical results (pending). The ensemble will continue to improve as the prior of each submodel is tailored to identify the pathological substructures in the data and as more pathology models are stacked with the existing ones. For more information, see our paper (pending).
