How do you clean survey data?

How to separate the signal from the noise in your survey: four tips to keep in mind when analyzing the data

You designed an online survey, distributed it, and collected a critical mass of interesting data. Is your work done? Unfortunately (and, as we'll see, fortunately), no. Before you can make strategic decisions based on the information gathered, perhaps the most important phase of a survey is still ahead: data analysis, which lets you separate the signal from the noise and reach the real findings.

The goal of this cleaning process is to get the most out of the responses collected in the online survey. Skipping it is risky: you may miss key information and undermine the credibility of the conclusions you draw.


To decide which respondents should be excluded from the analysis, and to review the nature of their responses, here are four criteria to consider when curating the data:

“Lazy” responses

Respondents who answer only part of the survey can bias the overall results, either because they were not qualified to respond or because they were not truly motivated by the form. Also, if many respondents failed to complete the online survey, this may be a sign of problems in its design.

This is why it is worth taking a few minutes to evaluate survey completion and draw some conclusions: how many people left one or more questions unanswered? Which question was skipped most often?

This data lets you make decisions before the analysis, such as omitting the responses of the laziest respondents or, if necessary, removing a question from the analysis.
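The completion check described above can be sketched in a few lines of Python. This is only an illustration under assumptions: the `responses` list, the question names `q1` to `q3` and the 50% cutoff in `MIN_COMPLETION` are all made up for the example, and your survey tool's export will look different.

```python
from collections import Counter

# Hypothetical export: one dict per respondent, None marks an unanswered question.
responses = [
    {"id": 1, "q1": "yes", "q2": 3, "q3": "Great taste"},
    {"id": 2, "q1": "no", "q2": None, "q3": None},
    {"id": 3, "q1": "yes", "q2": 2, "q3": None},
    {"id": 4, "q1": None, "q2": None, "q3": None},
]
questions = ["q1", "q2", "q3"]


def is_missing(value):
    """Treat None and blank strings as unanswered."""
    return value is None or (isinstance(value, str) and not value.strip())


def completion_rate(row):
    """Share of questions this respondent answered, between 0 and 1."""
    answered = sum(1 for q in questions if not is_missing(row.get(q)))
    return answered / len(questions)


# How many people left one or more questions unanswered?
incomplete = [r for r in responses if completion_rate(r) < 1.0]

# Which question was skipped most often?
skipped = Counter(q for r in responses for q in questions if is_missing(r.get(q)))
most_skipped = skipped.most_common(1)

# Keep only respondents above an assumed completion cutoff.
MIN_COMPLETION = 0.5
kept = [r for r in responses if completion_rate(r) >= MIN_COMPLETION]
```

The right cutoff depends on your survey: a short form may justify keeping only complete responses, while a long one may tolerate a few gaps.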

Meaningless responses

When it comes to mandatory open-ended questions, there is always someone who fills the text box with randomly typed letters, such as “jdgsgre”. Clearly, this information will not get you very far in your analysis.

We recommend browsing the participants tab to detect these responses as early as possible and remove them. If possible, tag the responses that carry a signal, so you can later filter on the tag and exclude the noisy comments.
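If you export the comments, a rough heuristic can help you spot keyboard mashing before you review it by hand. The sketch below is an assumption-laden example, not a definitive detector: the function `looks_like_noise`, its thresholds and the sample `comments` are invented for illustration, and the rule (few vowels or long consonant runs) only makes sense for English-like text.

```python
import re

VOWELS = set("aeiou")


def looks_like_noise(text, min_vowel_ratio=0.25, max_consonant_run=4):
    """Flag text with very few vowels or an unusually long run of consonants.

    Thresholds are assumptions; flagged comments should be reviewed manually.
    """
    lowered = text.lower()
    letters = [c for c in lowered if c.isalpha()]
    if len(letters) < 3:
        # Too short to judge; flag it so a person can take a look.
        return True
    vowel_ratio = sum(c in VOWELS for c in letters) / len(letters)
    run_pattern = r"[b-df-hj-np-tv-z]{%d,}" % (max_consonant_run + 1)
    has_long_run = re.search(run_pattern, lowered) is not None
    return vowel_ratio < min_vowel_ratio or has_long_run


# Hypothetical open-ended answers keyed by respondent id.
comments = {
    101: "I like the strong taste",
    102: "jdgsgre",
    103: "asdfghjkl",
    104: "Too expensive for me",
}

# Tag each comment instead of deleting it, so the decision stays reversible.
tags = {rid: ("noise" if looks_like_noise(text) else "signal")
        for rid, text in comments.items()}
flagged = sorted(rid for rid, tag in tags.items() if tag == "noise")
```

A heuristic like this produces false positives (real words with few vowels, very short answers), so treat the tags as a shortlist for review and not as an automatic delete list.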

Implausible responses

Have you heard of outliers? These are responses that fall far outside the range of the other participants' answers. In other words, unrealistic responses.

Let’s look at an example. Suppose you are running a survey on coffee consumption habits and you ask: how many coffees do you usually drink per day? What do we do with those who confess to drinking 50 coffees per day (more than three per hour over a 16-hour day)?

Those responses will certainly distort the results of your online survey, so we need to trim them.

Just as we recommended in the previous case, you can use filtering tools to select these responses and, once you have identified them, remove them.
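One common way to select such responses is the interquartile range (IQR) rule, which flags values far above or below the middle half of the data. The source does not prescribe a method, so take this as one possible sketch: the `cups_per_day` data is invented, and the 1.5 multiplier is the conventional choice, not a requirement.

```python
import statistics

# Hypothetical answers to "How many coffees do you usually drink per day?",
# keyed by respondent id.
cups_per_day = {1: 1, 2: 2, 3: 3, 4: 2, 5: 0, 6: 4, 7: 50, 8: 3, 9: 1, 10: 2}

# Quartiles of the answers; the middle value (the median) is not needed here.
q1, _, q3 = statistics.quantiles(cups_per_day.values(), n=4)
iqr = q3 - q1

# Conventional fences: 1.5 * IQR beyond the first and third quartiles.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

outliers = {rid: cups for rid, cups in cups_per_day.items()
            if cups < lower_fence or cups > upper_fence}
kept = {rid: cups for rid, cups in cups_per_day.items() if rid not in outliers}

# Compare the average before and after trimming to see the distortion.
mean_before = statistics.mean(cups_per_day.values())
mean_after = statistics.mean(kept.values())
```

In this made-up sample, the single answer of 50 is the only one flagged, and removing it brings the average from 6.8 coffees per day down to 2. When you know the plausible range in advance, a simple fixed limit (for instance, discarding anything above a maximum you consider realistic) may be easier to justify than a statistical rule.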

Responses from outsiders

What happens when the person who responds to the online survey does not match the target profile? Continuing with the previous example, you want to get to know the world of coffee lovers, but the person who responds does not drink any coffee. The basic technique for screening out these participants is to place a closed-ended qualifying question at the beginning of the form, to determine whether or not the respondent meets the criteria of the defined target. In this case, the closed-ended question would be: Do you drink at least one coffee a day? YES/NO.

However, if you forgot to include that filter question in your online form, you can still add a personal data field for each respondent after the fact (for example, sociodemographic data collected at the end of the survey). You can then filter on that field to focus on the responses from the target you are interested in studying.
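Both kinds of filter are easy to apply to exported data. The following sketch assumes a hypothetical export in which `drinks_coffee_daily` holds the answer to the qualifying question and `age_group` is a personal data field. Neither name comes from a real tool.

```python
# Hypothetical export: the screening answer plus one sociodemographic field.
respondents = [
    {"id": 1, "drinks_coffee_daily": "YES", "age_group": "25-34"},
    {"id": 2, "drinks_coffee_daily": "no", "age_group": "35-44"},
    {"id": 3, "drinks_coffee_daily": " yes ", "age_group": "45-54"},
    {"id": 4, "drinks_coffee_daily": None, "age_group": "25-34"},
]


def qualifies(row):
    """True only when the respondent answered YES to the qualifying question."""
    answer = row.get("drinks_coffee_daily")
    return isinstance(answer, str) and answer.strip().upper() == "YES"


def in_segment(row, field, allowed):
    """True when a personal data field holds one of the allowed values."""
    return row.get(field) in allowed


# Filter 1: keep only respondents who meet the target criteria.
target = [r for r in respondents if qualifies(r)]

# Filter 2: narrow the target further using personal data (assumed segment).
young_target = [r for r in target
                if in_segment(r, "age_group", {"18-24", "25-34"})]
```

Note that a missing answer to the qualifying question is treated here as "does not qualify". That is a conservative assumption you may want to revisit if many respondents skipped it.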

Cleaning up online survey data is a necessary and very rewarding practice. Focusing resources on this process will allow you to optimize information, separate the wheat from the chaff and, ultimately, make better decisions.
