ODK form design tips that improve data quality

Aug 17, 2015

The team at Nafundi has been designing ODK forms for almost a decade. In that time, we've learned a few form design techniques that have improved data quality for our clients. In this post, we share a few of these techniques with our fellow form designers.

Boost accountability with timestamps and GPS location

When running a campaign, it's important to know that each surveyor spent roughly the same amount of time on each survey and collected the data in the correct location. For example, if, in an hour-long randomized survey, one surveyor has five-minute surveys clustered near a local bar, that surveyor needs follow-up! Adding timestamps and GPS location is a great way to track time spent and identify outliers that need follow-up. The example below shows how to implement this in XLSForm.

| type | name | label | required |
| --- | --- | --- | --- |
| start | start_time | | |
| end | end_time | | |
| geopoint | gps_location | Where are you? | false() |

When using timestamps, note that the start time is measured when the form was first opened and the end time is measured the last time the form was saved (and not when the form was first finished). If a surveyor opens a survey at 10 am, marks the form as finished and saves it at 11 am, the time difference is 1 hour. If the surveyor then eats lunch at 2 pm, edits that saved form, makes one change, and saves that change, the time difference is now 4 hours.
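To make the arithmetic above concrete, here is a minimal Python sketch for checking durations after export. It assumes the exported rows carry start_time and end_time as ISO 8601 strings; the function names and the 30-minute threshold are hypothetical and should be adapted to your own survey.

```python
from datetime import datetime, timedelta


def survey_duration(start_time: str, end_time: str) -> timedelta:
    """Return end minus start for two ISO 8601 timestamp strings."""
    return datetime.fromisoformat(end_time) - datetime.fromisoformat(start_time)


def flag_short_surveys(rows, minimum=timedelta(minutes=30)):
    """Return the rows whose duration is below the (assumed) minimum."""
    return [
        row
        for row in rows
        if survey_duration(row["start_time"], row["end_time"]) < minimum
    ]
```

Because the end time moves every time the form is saved, a long duration may only mean the form was edited later. Short durations are the more reliable signal.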

When using GPS locations, it is best practice not to make the question required. GPS devices can take up to 15 minutes to get a lock (especially if you don't have a clear view of the sky). Don't prevent surveyors from completing a form because they can't get GPS coordinates. Instead, strongly encourage them to try, but train them to skip the question if they end up waiting too long.
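Since the GPS question is optional, any location check on the collected data has to tolerate blanks. The sketch below is one way to do that in Python. It assumes the geopoint is exported as a space-separated "latitude longitude altitude accuracy" string; the helper names and the 1 km threshold are hypothetical.

```python
import math


def parse_geopoint(value):
    """Return (lat, lon) from a 'lat lon alt accuracy' string, or None if blank."""
    if not value or not value.strip():
        return None
    parts = value.split()
    return float(parts[0]), float(parts[1])


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) pairs in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (
        math.sin((lat2 - lat1) / 2) ** 2
        + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2
    )
    return 2 * 6371.0 * math.asin(math.sqrt(h))  # 6371 km: mean Earth radius


def is_off_site(value, site, max_km=1.0):
    """True or False when a location was recorded, None when it was skipped."""
    point = parse_geopoint(value)
    if point is None:
        return None  # skipped question: nothing to judge
    return haversine_km(point, site) > max_km
```

Returning None for a skipped question keeps "no GPS" separate from "wrong place", so surveyors who followed their training are not flagged.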

Use constraints for impossible values and relevance for improbable ones

Most ODK form designers know about constraints, and they are very useful in controlling data quality. For example, ages should never be less than zero. But what about the unlikely ages of 95 or 100? Should those be forbidden too? Of course not!

Rather than force surveyors to enter bad data or reject interesting outliers, consider using relevance (how ODK does branching) to show a note as a warning for improbable values. In the example below, the only constraint is that age has to be greater than or equal to 0. For improbable values, we add a follow-on note, a warning, that is only relevant if the age is greater than 95.

| type | name | label | relevance | constraint |
| --- | --- | --- | --- | --- |
| integer | age_years | What is your age in years? | | ${age_years} >= 0 |
| note | age_warning | Ages over 95 are unusual. If you have confirmed this age is correct, please continue. If the age is not correct, please swipe back and change your answer. | ${age_years} > 95 | |
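The same two-tier rule can be written out in Python, for example to audit data that has already been collected. This is only an illustrative sketch that mirrors the two expressions in the form; check_age and its return labels are hypothetical names, not part of ODK.

```python
def check_age(age_years: int) -> str:
    """Classify an age the way the form does: block, warn, or accept."""
    if not age_years >= 0:
        return "rejected"  # mirrors the constraint: impossible value
    if age_years > 95:
        return "warning"  # mirrors the note's relevance: improbable value
    return "accepted"
```

Note that an age of exactly 95 passes without a warning, because the relevance expression uses a strict greater-than.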

Relevance is also a very useful strategy for increasing data quality in strongly coded surveys. For example, if a survey wants to capture a participant's organization, a select_one list of the top 20 organizations keeps the data easy to enter and analyze. But to capture the wide range of organizations that a participant can work for, add a 21st option called Other and a follow-up free-text question that is only relevant when Other is selected.

| type | name | label | relevance |
| --- | --- | --- | --- |
| select_one organizations | organization | What organization do you work for? | |
| text | organization_other | It looks like your organization was not in the previous list. Please specify the organization you work for. | ${organization} = 'other' |
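At analysis time, the coded answer and the free-text follow-up usually need to be folded back into one value. Here is a minimal Python sketch of that step, assuming exported rows are dictionaries keyed by the question names above; both function names are hypothetical.

```python
from collections import Counter


def resolve_organization(row):
    """Return the typed answer when 'other' was chosen, else the coded choice."""
    if row.get("organization") == "other":
        typed = (row.get("organization_other") or "").strip()
        return typed or "other"  # fall back if the follow-up was left blank
    return row.get("organization")


def common_other_answers(rows, top=5):
    """Count free-text answers (case-insensitively) among 'other' responses."""
    typed = [
        resolve_organization(row).lower()
        for row in rows
        if row.get("organization") == "other"
    ]
    return Counter(typed).most_common(top)
```

Free-text answers that keep recurring are good candidates for promotion into the coded list in the next version of the form.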

Cascading selects, previous values, and more

We've put the remaining techniques, such as cascading selects to reduce manual entry and printing previous values for context, into an annotated Excel file that provides practical examples in the XLSForm format. Download ODK Form Design Tips today and start improving your data quality!
