Web-based surveys are cheaper to run than postal, telephone or face-to-face surveys, and data collection can be quicker and easier to administer. However, they come with their own challenges. There are two main problems. The first is coverage: not all of the UK population is online (13% of the UK population has never used the internet). This could be overcome by using mixed-mode data collection (ref), and it is not a problem at all when the target population is internet users only. The second, and by far the bigger, problem is sampling. Generalisability of results can only be claimed if the sample is a probability sample, and this is extremely difficult to achieve for web-based surveys of the entire population, because no sampling frame exists, and none would exist even if the whole population were online.
There are two scenarios in which a web-based survey can use a probability-based sample:
- Sampling a closed population: you have a list of email addresses for the population (for example, lists of company employees, university staff members, or magazine subscribers), you draw a sample from that list, and the sampled members are sent the web survey link.
- Sampling the general population by post: the Postcode Address File (PAF) is used as a frame for general population random probability sampling. Sample members are contacted by post and asked to fill in an online questionnaire if they have internet access, or are posted a paper-based questionnaire if they do not. This could reduce cost, but not as much as using a non-probability sample.
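The first scenario, drawing a probability sample from a closed list, can be sketched in Python. This is only an illustration under stated assumptions: the email addresses are made up, and the function name `draw_simple_random_sample` is hypothetical rather than part of any survey package. It shows a simple random sample, in which every member of the frame has the same known chance of selection.

```python
import random


def draw_simple_random_sample(frame, n, seed=None):
    """Draw n distinct members from a sampling frame, each with equal probability."""
    if n > len(frame):
        raise ValueError("sample size cannot exceed the size of the frame")
    rng = random.Random(seed)  # seeded so the draw can be reproduced and audited
    return rng.sample(frame, n)  # sampling without replacement


# Hypothetical sampling frame: a made-up staff mailing list of 500 addresses.
frame = [f"staff{i:03d}@example.org" for i in range(1, 501)]

sample = draw_simple_random_sample(frame, n=50, seed=42)

# Because the frame is complete and the draw is random, the inclusion
# probability of every member is known: n / N.
inclusion_probability = len(sample) / len(frame)
print(len(sample), inclusion_probability)
```

The known inclusion probability is what separates this design from a convenience sample, where the chance of any given person taking part is unknown.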
Non-probability sampling usually takes the form of an opportunity (convenience) sample: the web-based survey is open to everyone and anyone can participate. Participants are recruited via pop-up surveys, online advertisements or direct emails sent to samples selected from lists constructed from various sources. The problem is that statistical inference is difficult, and respondents are more likely to be people with strong opinions about the subject. However, this type of sampling can be valuable for hard-to-reach (although internet-connected) populations. Under certain assumptions, convenience samples can also be used for model-based inference (ref). Others argue that although a convenience sample is less robust, any survey with a low response rate is not a true random probability sample either, and similar biases can be found there too.
Several approaches have been suggested to limit the problems associated with a convenience sample:
- Sample matching or quota sampling: people are selected non-randomly according to some fixed quota. This is often used by online panels in marketing research. There are two types:
  - “Proportional quota sampling – you want to represent the major characteristics of the population by sampling a proportional amount of each. For instance, if you know the population has 40% women and 60% men, and that you want a total sample size of 100, you will continue sampling until you get those percentages and then you will stop. So, if you’ve already got the 40 women for your sample, but not the sixty men, you will continue to sample men but even if legitimate women respondents come along, you will not sample them because you have already “met your quota.” The problem here (as in much purposive sampling) is that you have to decide the specific characteristics on which you will base the quota. Will it be by gender, age, education, race, religion, etc.?
  - Nonproportional quota sampling is a bit less restrictive. In this method, you specify the minimum number of sampled units you want in each category. Here, you’re not concerned with having numbers that match the proportions in the population. Instead, you simply want to have enough to assure that you will be able to talk about even small groups in the population. This method is the nonprobabilistic analogue of stratified random sampling in that it is typically used to assure that smaller groups are adequately represented in your sample.” (Taken directly from Source)
- Weighting: post-stratification weighting is a common approach to removing sample bias from a convenience sample and is used by market research companies. The implicit assumption is that the people who were surveyed in a particular demographic group are representative of the people in that group who were not surveyed. Propensity score adjustment is discussed in a recently published book, Online Panel Research: A Data Quality Perspective (page 274), whose authors claim that it can correct the deficiencies of non-probability web-based surveys and make the results projectable to the entire population (more on propensity weighting here too). However, some argue that this is not valid and is problematic (ref). Others have used linear weighting, multiplicative weighting and, more generally, calibration to reduce bias (ref, page 276). It is still not clear how successful these weighting techniques are or which biases they address (a research group is looking into this).
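Both suggestions can be illustrated with a short Python sketch. Everything here is an assumption made for illustration: the respondent stream is made up, the function names `fill_proportional_quotas` and `post_stratification_weights` are hypothetical, and the 40% women / 60% men split is simply the figure used in the quoted example. The first function accepts respondents in arrival order until each quota is met. The second computes basic post-stratification weights as population share divided by sample share.

```python
from collections import Counter


def fill_proportional_quotas(respondents, quotas, key):
    """Accept respondents in arrival order until every group's quota is met."""
    accepted = []
    counts = Counter()
    for person in respondents:
        group = person[key]
        if counts[group] < quotas.get(group, 0):
            accepted.append(person)
            counts[group] += 1
        # Stop as soon as all quotas are full; later arrivals are turned away.
        if all(counts[g] >= q for g, q in quotas.items()):
            break
    return accepted


def post_stratification_weights(sample, population_shares, key):
    """Weight for each group = population share / share in the sample."""
    counts = Counter(person[key] for person in sample)
    n = len(sample)
    return {g: population_shares[g] / (counts[g] / n) for g in counts}


# Made-up stream of volunteers in which women are over-represented (2 in 3).
stream = [{"id": i, "gender": "man" if i % 3 == 0 else "woman"} for i in range(300)]

# Proportional quota sample of 100: 40 women and 60 men.
quota_sample = fill_proportional_quotas(stream, {"woman": 40, "man": 60}, "gender")
quota_counts = Counter(p["gender"] for p in quota_sample)

# Alternative: keep the first 150 volunteers and reweight them afterwards.
convenience_sample = stream[:150]
weights = post_stratification_weights(
    convenience_sample, {"woman": 0.4, "man": 0.6}, "gender"
)
weighted_women = sum(
    weights[p["gender"]] for p in convenience_sample if p["gender"] == "woman"
)
weighted_total = sum(weights[p["gender"]] for p in convenience_sample)
print(quota_counts, weights, weighted_women / weighted_total)
```

Either way, the sample is only made to match the population on the chosen characteristic. Within each group the respondents are still volunteers, so the results rest on the assumption described above that they resemble the group members who did not take part.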
There is a huge need for a probability-based web panel in the UK that researchers can use for survey-based research (ref), and researchers are currently looking into the possibility of creating such a panel (ref). The online panels currently available in the UK (all non-probability based), including their costs, are listed here. In summary, probabilistic or random sampling methods are preferred over non-probabilistic ones in research because they are more accurate and rigorous. However, there may be circumstances (especially in applied social research) where random sampling is not feasible, practical or theoretically sensible, in which case a convenience sample is used (ref).