Summary of strengths and weaknesses of each package in the context of analysing open-ended questions
A summary comparison of ATLAS.ti, MAXQDA, NVivo and QDA Miner in the context of the analysis of open-ended questions from surveys.
This section draws some comparisons between the four selected CAQDAS packages as an aid for users who may be considering which program to use for analysing the responses to open-ended survey questions. There is no question of identifying which package is the ‘best’, because each has different strengths and weaknesses, which may interact differently with the very wide range of circumstances covered by the description “open-ended survey questions” (OEQs). In view of the time that it can take to become familiar with a new software program, we would suggest that if you have been using one of these packages already then that would probably be the best one for you to consider first. All four programs have been used successfully with the trial data, so there are few critical weaknesses.
The particular packages examined here are ATLAS.ti 6, MAXQDA 2010, NVivo 8, and QDA Miner 3.2. Of these only QDA Miner could be described as having been designed for this particular application. The other three programs were all designed initially for mainstream qualitative data analysis, generally involving a moderate number of lengthy text documents. Here we are considering the problems arising from analysing a large number of short texts, which are typical with OEQs.
In our view, a key advantage of using a CAQDAS program to analyse this sort of data is the availability of semi-automation tools, such as word frequency counts, text searches, and autocoding. These tools can be combined in an inductive approach to derive concepts from the words used in the responses and identify accurately the cases that have used those concepts. Further benefit may be gained when data about the respondents, such as socio-demographic and other relevant variables, is combined with the thematic codes to reveal potential patterns of co-occurrence. But the real power of CAQDAS programs is the facility to keep close to the words recorded in the data capture process at all stages of the analysis, so that emerging ideas of possible relationships in the data can be tested against readings of the language used. All of these features are combined in an iterative process where the analyst uses a variety of program routines all guided by judgment decisions and interpretations based on skill and experience. Thus there is no single procedural path of steps to follow to achieve a ‘correct’ analysis, and it follows that differing styles of working practice between individual analysts will integrate differently with the programs reviewed here.
Presentation of the data and ease of reading the texts
As mentioned in several places in this website, a key analytical decision with OEQ data is whether to read and code the responses on a case by case or question by question basis. The case by case approach is central to qualitative analytical work and so all of the programs handle that efficiently. But in survey situations it will often be more useful to analyse all of the responses to a single question together in order to make effective comparisons between them, and for this approach the packages have different strengths and weaknesses.
In this respect MAXQDA has a clear advantage over the other three programs: it can display all of the responses to one question in one part of its main window and all of the responses by one case in another part, with the two panels linked interactively, so that selecting a response in the question panel causes the same response to be shown in the case panel within the context of that case’s other answers. QDA Miner is almost as flexible, but the user has to configure text search routines to generate the required lists within a separate output window, which is then linked interactively with the data appearing in the main window. NVivo and ATLAS.ti are both less flexible in this respect: these programs have a different architecture, and the data needs to be organized into documents whose presentation style is then used unaltered when those documents are read within the software.
As the analysis proceeds and codes are attached to the responses to signify the presence of specific concepts in the texts, it becomes important to be able to see where codes have or have not been applied. All of the programs use coloured coding markers in a margin, but NVivo is less flexible than the other three: in that program the user has no control over the colour of any particular code, and has to specify which coding stripes are to be displayed at any one time. In ATLAS.ti, MAXQDA and QDA Miner it is possible to define a separate colour for each code as it is created (or common colours for different groups of codes), and when the response texts are displayed in the main working window all the codes that have been applied to any paragraph are shown by default. (NVivo v9 has added some user-controlled colour functionality for codes.)
Use of semi-automation tools
In most qualitative analysis work it would be normal practice for codes to be applied entirely under the manual control of the analyst, often by using a combination of mouse highlighting and clicking to select carefully identified passages of text and then to apply one or more codes to that marked segment. All of these programs handle that way of working with similar efficiency. But, as commented above, in the survey situation it can be advantageous to apply some codes automatically on the basis of the presence of specific words or phrases. We have identified three basic steps where computer power can be harnessed by the analyst to speed up this process for OEQ data.
All four programs have a word frequency function, although in MAXQDA and QDA Miner this is only available through an additional module of the package (MAXDictio in MAXQDA+, and WordStat with QDA Miner). In ATLAS.ti and NVivo the word frequency (and subsequent text searching) tool can only be restricted to the responses of a single question if those responses have been imported or assigned to the project in a single document, without any other question responses in the same document. QDA Miner’s WordStat module also includes a “Phrase finder”, which searches for and counts groups of words occurring together in the data, and this can be very effective with OEQ responses. It is often useful to be able to exclude trivial or unhelpful words through the application of a ‘stop-list’. NVivo has a fixed list of words to exclude for each language (which can be effectively turned off by setting the language to ‘none’), whereas the other three programs all allow users to create their own stop-lists, if required, separately from a default list.
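None of these packages exposes its word frequency tool through a scripting interface, so the following is purely a generic illustration of the underlying technique; all of the response texts and the stop-list here are invented. A word frequency count over a set of responses, with trivial words excluded, might be sketched in Python as:

```python
import re
from collections import Counter

# Hypothetical responses to one open-ended question
responses = [
    "The staff were very helpful and friendly",
    "Helpful staff, but the waiting time was too long",
    "Long waiting time; otherwise fine",
]

# A user-defined stop-list of trivial words to exclude from the count
stop_list = {"the", "and", "but", "was", "were", "too", "very", "a"}

word_counts = Counter(
    word
    for response in responses
    for word in re.findall(r"[a-z']+", response.lower())
    if word not in stop_list
)

print(word_counts.most_common(5))
```

Words such as “staff” and “waiting” then surface as candidate indicators of themes, which is exactly the inductive use of frequency counts described above.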
When a word has been identified as occurring frequently in the responses, and thus may be an indicator of a common theme or concept, it will be necessary to read a sample of those instances in order to judge whether it has been used with sufficient consistency to be the basis for a code. QDA Miner in its WordStat module has a “Key Word in Context” (or KWIC) function, which lists all of the instances and allows the list to be sorted alphabetically according to either the word preceding or the word following the key word; this can be very useful with OEQ responses. MAXQDA has very good integration of the word frequency and text searching functions, using separate windows for each which can be kept open and re-used easily, but each instance is displayed separately in its full context in the main window. In NVivo it is possible to view all of the instances in which a word has been used as a list derived by selecting that word from the word frequency report, but that list only shows a fixed number of words before and after the key word, which makes it slightly less useful, although it can be linked interactively with the source data via a context menu. ATLAS.ti does not integrate its word frequency tool with other functions, so separate text searches have to be run to explore the usage of any particular word in the data.
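The KWIC idea itself is simple to sketch generically; this is an illustration of the technique rather than WordStat’s actual implementation, and the sample responses are invented. Each occurrence of the key word is listed with its immediate neighbours, sorted by the preceding or following word:

```python
import re

# Hypothetical responses to one open-ended question
responses = [
    "Helpful staff but a long waiting time",
    "The waiting time was fine",
]

def kwic(texts, keyword, sort_by="following"):
    """List each occurrence of `keyword` with the words either side of it,
    sorted by the word preceding or following it (the KWIC idea)."""
    hits = []
    for text in texts:
        tokens = re.findall(r"[A-Za-z']+", text)
        for i, token in enumerate(tokens):
            if token.lower() == keyword:
                left = tokens[i - 1].lower() if i > 0 else ""
                right = tokens[i + 1].lower() if i + 1 < len(tokens) else ""
                hits.append((left, keyword, right))
    key = (lambda h: h[0]) if sort_by == "preceding" else (lambda h: h[2])
    return sorted(hits, key=key)

print(kwic(responses, "waiting", sort_by="preceding"))
```

Sorting by the preceding or following word groups similar usages together, which is what makes this display so quick to scan when judging whether a word has been used consistently.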
The third semi-automation tool is autocoding. By this we mean using a single command in the program to attach a particular code to all of the responses that match some specified criterion, most often those that include a specified word. This is most useful when it can be done in conjunction with a text search so that the analyst has the opportunity to check at least some of the selected responses immediately prior to the coding process. ATLAS.ti does not integrate the text search and autocode routine, but does allow the autocode to be run with user intervention at each ‘hit’ (to select code or skip). NVivo uses different options within a saved query to show a ‘preview’ list of search finds or to code all of the search finds, but this makes it difficult to exclude a small number of unwanted ‘hits’. MAXQDA and QDA Miner both make it possible for the analyst to exclude the unwanted ‘hits’ at the viewing stage and then to apply the code to all of the remaining ‘hits’.
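The logic of autocoding with analyst-excluded hits, as supported by MAXQDA and QDA Miner, can be illustrated generically; this is a sketch of the technique, not any package’s actual routine, and the function name, responses and code names are all hypothetical:

```python
def autocode(responses, keyword, code, codings, exclude=()):
    """Attach `code` to every response containing `keyword`, skipping any
    hits the analyst has excluded after previewing them.  `codings` maps
    each code name to the set of response indices it has been applied to."""
    hits = [i for i, text in enumerate(responses) if keyword in text.lower()]
    coded = [i for i in hits if i not in exclude]
    codings.setdefault(code, set()).update(coded)
    return hits  # the full hit list, for the analyst's review

# Hypothetical data: the analyst previews the hits, then excludes hit 1
responses = ["Great service", "No service at all", "Friendly staff"]
codings = {}
hits = autocode(responses, "service", "SERVICE", codings, exclude={1})
print(hits, codings)
```

The key design point is the review step: the search returns all hits, but only those the analyst has not excluded receive the code, mirroring the preview-then-code workflow described above.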
A further aspect of the autocoding process is the decision as to how much text to code in the automation process. In MAXQDA and QDA Miner, the two relational database programs, the option to code the whole paragraph should generally capture the whole of each relevant response, with the identity of the respondent always attached to any paragraph. In ATLAS.ti, provided the data has been prepared in the way we suggest, the option to code until the next “Multi Hard Return” effectively captures the whole response and the case identity. We have found it more difficult to find a combination of data preparation and query parameters to achieve this in NVivo: although the paragraph setting in the query does capture all of the response text, and matrix coding queries can relate the thematic codes to the attributes in the casebook after autocoding by paragraph, it is harder to locate a specific hit manually in the source document.
In some ways the analysis of OEQs may be seen as akin to content analysis, and there are more sophisticated procedures and tools available in that approach. QDA Miner with its WordStat module has a considerable array of further tools that are not matched in the other three programs reviewed here. However, MAXQDA with its MAXDictio module does have a dictionary function that may be roughly equivalent to the thesaurus function in the main QDA Miner program. These facilitate analysis by allowing users to build up sets of words that are likely to signify the presence of identified thematic concepts in response texts. This approach is laborious at first but is very powerful and effective if a similar analysis has to be repeated on different data, such as repeated surveys.
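The dictionary (or thesaurus) idea can also be sketched generically; this illustrates the technique only, and the code names and indicator words are invented, not taken from either package. Each thematic code is mapped to a set of words taken to signify it, and a response receives every code whose indicator words it contains:

```python
import re

# Hypothetical dictionary: each thematic code mapped to its indicator words
dictionary = {
    "SERVICE": {"staff", "helpful", "service"},
    "DELAY": {"waiting", "long", "queue", "delay"},
}

def dictionary_code(response, dictionary):
    """Return the set of codes whose indicator words appear in the response."""
    words = set(re.findall(r"[a-z']+", response.lower()))
    return {code for code, indicators in dictionary.items() if words & indicators}

print(dictionary_code("A long wait in the queue", dictionary))
```

Building the word sets is the laborious part; once built, the same dictionary can be re-applied to new data, which is why this approach pays off for repeated surveys.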
Looking for patterns amongst thematic codes and personal attributes
Most CAQDAS programs were not originally developed with quantitative analysis techniques in mind, because the conditions of sample size and randomness needed for quantitative abstraction are generally not found in qualitative research. However, some matrix tables are now available, and their use can be justified in the OEQ situation, where the remainder of the survey is quantitative.
It is anticipated that a common output required from such analysis will be a table of the frequencies with which the thematic codes have been applied to the responses to certain questions. With the data preparation strategies recommended in this section of this website, we have been able to generate these in report format and as a spreadsheet export in all four programs.
Another possible output is the export of thematic code frequencies at the case level (i.e. separately for each case) so that these can be imported into a more sophisticated statistics package for further analysis in conjunction with other data collected in the survey. This can be achieved effectively in QDA Miner, MAXQDA and NVivo (in increasing order of processing time when the number of respondents is large), but is not possible in ATLAS.ti when our suggested method of data preparation has been used.
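The shape of such a case-level export can be illustrated generically; this is a sketch of the technique, not any package’s actual output format, and the codings are invented. Each case becomes a row and each thematic code a 0/1 column, ready for import into a statistics package:

```python
import csv
import io

def export_case_matrix(case_ids, codings, out):
    """Write one row per case with a 0/1 column for each thematic code.
    `codings` maps each code name to the set of case ids it was applied to."""
    codes = sorted(codings)
    writer = csv.writer(out)
    writer.writerow(["case_id"] + codes)
    for cid in case_ids:
        writer.writerow([cid] + [int(cid in codings[c]) for c in codes])

# Hypothetical codings for three cases
codings = {"SERVICE": {1, 3}, "DELAY": {2, 3}}
buf = io.StringIO()
export_case_matrix([1, 2, 3], codings, buf)
print(buf.getvalue())
```

A file in this layout can be merged with the closed-question survey data by case id, which is the point of exporting at the case level rather than as aggregate frequencies.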
We have identified effective ways of incorporating personal attributes, such as socio-demographic data, into the project datasets in the four CAQDAS programs, and these can be used effectively to create crosstabulation tables of thematic codes against the set of values for any one attribute in all of these programs. For ATLAS.ti this involves a fairly extensive workaround at the data preparation stage, while the other three programs can all import such data directly from a spreadsheet layout. All of the programs have facilities to extract all of the segments in the response texts that match a selected combination of theme and attribute value, and this should assist in the interpretation of any quantitative patterns that may be observed in the tables.
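The crosstabulation itself can be sketched generically; this illustrates the underlying computation only, and the codings and attribute values are invented. For each thematic code, the coded cases are counted within each value of one attribute:

```python
from collections import defaultdict

def crosstab(codings, attributes):
    """Count, for each thematic code, how many coded cases fall in each
    value of a single attribute (e.g. gender).  `codings` maps code names
    to sets of case ids; `attributes` maps case ids to attribute values."""
    table = defaultdict(lambda: defaultdict(int))
    for code, cases in codings.items():
        for cid in cases:
            table[code][attributes[cid]] += 1
    return table

# Hypothetical data: two codes, three cases, one attribute
codings = {"SERVICE": {1, 2}, "DELAY": {2, 3}}
attributes = {1: "female", 2: "male", 3: "female"}
table = crosstab(codings, attributes)
print({code: dict(values) for code, values in table.items()})
```

Any apparent pattern in such a table should then be checked against the underlying response texts, which is exactly the retrieval facility the packages provide.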
Ease of use
This sort of comparison is quite subjective and opinions may vary on the comments below. However, it is important to make some effort to consider this aspect, because of the iterative approach which seems to be necessary with this sort of data. To analyse OEQs effectively will require the use of several different tools in the chosen software, the application of human judgment, and the development of skills to move rapidly between these.
We have found NVivo to be quite laborious because of the extensive use of dialog screens and the need to save exploratory queries; it can do all of the tasks we require, but it does take a lot of user effort to work out how to achieve the desired effects. ATLAS.ti can also require an investment of time to learn, and it is not ideally suited to the quantitative aspects of these tasks, but its large area of working screen is useful when the analysis has to be done by reading and manual coding. MAXQDA is probably the easiest of the four programs reviewed here to learn, and many of its functions seem to operate as one intuitively expects, as well as integrating well with each other. QDA Miner is the most sophisticated and the most powerful when the number of cases is very large (but it is considerably more expensive than the others); however, the sheer range of options and facilities can be daunting for the novice user, so it does require some investment of time and effort to identify what is useful and meaningful in relation to your particular data.
Subsequent to much of the work on which these web pages are based, NVivo and ATLAS.ti introduced new versions of their software with special “Survey Import” functions added. These have not been included in the comments above, or generally in other sections of this website. Our preliminary review of both of these new routines indicates that they are only marginally helpful, because in both programs the data imported with the new routine is not readily available for analysis with the semi-automation tools on the basis of one question at a time. The new routine in ATLAS.ti v6.2 does create a project file from which one can output the code frequencies on a case by case basis (the one expected output that was not possible with the way we have recommended preparing the data for earlier versions of ATLAS.ti), but before that stage is reached the user will either have to apply all codes manually, or separate each question’s responses into a separate Hermeneutic Unit if autocoding is to be used while keeping each question distinct. There is a similar problem in NVivo v9, as the data imported with its Survey Import routine is treated as a single source. The potential workaround for both new routines, of using separate projects for each question, will probably work satisfactorily at one level, but may cause understanding of the broader phenomena being studied to become fragmented and so be less effective.
Page created on 15th August 2011, written by Graham Hughes and Christina Silver