Team working using CAQDAS packages
This information resource is for sociology researchers using, or planning to use, Computer Assisted Qualitative Data Analysis (CAQDAS) software packages for collaborative projects. This type of software is also referred to as qualitative data analysis software (QDAS). The general protocols for data processing and preparing for analytic work below derive from our experience of using the software and supporting a wide range of team-based projects.
Collaborative qualitative research: the extra dimensions of project management
Conducting research in team situations can pose a number of challenges. Here we raise broad considerations as starting points in planning to use CAQDAS packages within team projects. The dynamics of research teams can vary quite significantly, as can the role of software within them. For example, you may be involved in a local participatory project, a multi-disciplinary intervention evaluation or a cross-national study comprising many complex components. Whatever the situation, qualitative software may play an important role.
The needs of particular projects will be quite different as will the way your chosen CAQDAS package is used. Our aim is to clarify processes and provide constructive ideas about the management of collaborative work in the context of team working. See our software reviews for specific information about collaborative processes using individual software packages.
In team research there should ideally be a CAQDAS co-ordinator who is responsible for systematising everyone’s use of a qualitative software package. Therefore while these protocols are useful for everyone in a team or even an individual, it is the co-ordinator who might operationalise some of the ideas coming from all the protocol sections at this resource, into a set of conventions to be followed by team members.
Key strategic issues in collaborative qualitative research are the research project design, the issue of multiple users of CAQDAS packages within a team project and the logistics of collaboration.
Research project design
Assuming that different teams of researchers are all contributing in some way to the research project, are they each pursuing similar research processes at individual sites? There could be disciplinary or methodological differences to each team’s contributions. Will it be relevant for just one team, several, or all teams to use a CAQDAS package to assist in the analysis? Are both within case and cross case analyses intended? Project design, though not merely revolving around what the CAQDAS package can do, can be reflected in structures and displays within the software project. Elements of the project can be isolated and interrogated while comparisons across the different elements can also be achieved once they are drawn together.
Multiple users within a team project
Most CAQDAS packages are single user software programs in which (although the terminology differs) the user creates a ‘project’ file and assigns or imports all the data files relevant to the research into that ‘project’. The project becomes a container or linking device for all further work. Different team members cannot usually feed into one ‘project’ from different work stations at the same time, although Transana, a video analysis package, is the exception to this rule in that it provides a specific multi-user version designed for team situations. Many planning issues have to be addressed because of this one circumstance. Separate projects can be ‘merged’ or one project (or several, in turn) can be imported into another. In some software packages specific parts of projects can be imported into another project. Team projects often seek to follow an incremental process of merging work repeatedly. Such a process might require certain early safeguards to be sure that the repeated merges work each time. Basic measures, such as working inside the software with data files whose text is never edited or which are read-only, might be necessary.
Logistics of collaboration
Partly arising from the above but possibly independent of it are the dimensions and logistics of collaboration. Final reporting by each partner may be largely independent and site specific, as in within case analysis. There might be a long term resolve to combine and compare in the form of journal articles and conference papers, and so perhaps this will not impinge on day to day decisions during analysis. In the case of cross case analysis the findings of individual members could be brought together to be compared and analysed at points during the process. The latter will obviously involve more planning and collaborative negotiation and agreement. This would especially be the case in international projects where the potential for mismatches in confidence and competency levels with software is high.
General planning stage issues with CAQDAS use in mind include the numbers and types of project partners, the geographical distribution of team members, co-ordinating and planning for the use of CAQDAS packages, considering the kinds of data to be analysed, management of the project and the creation of protocols.
Numbers and types of partners
The greater the number of partners the more co-ordination and communication there has to be. The issue is further complicated by the use of CAQDAS packages and the differing levels of software and analytic experience amongst teams and their members. CAQDAS packages can assist in communication and co-analysis issues and may provide increased transparency and greater potential for rigorous enquiry. However it is wise not to underestimate the challenges of planning for the use of a CAQDAS package. If the teams are cross-disciplinary it is difficult to predict or generalise how close the co-ordination of individual processes of analysis needs to be.
If working from different countries, the time teams spend together will be much more difficult to arrange and will out of necessity be much more limited. Whatever the spread of the collaboration, co-ordination of training in CAQDAS, the negotiations of analysis and brainstorming in the light of CAQDAS use will all be necessary aspects of team preparation. If the same package is to be used by all members of the team so that work can be merged, there would be a minimum amount of time for a team to spend together to talk about data, analysis related issues and software related issues (with CAQDAS training included). Videoconferencing type tools, such as access grid node technology, Google talk or Skype could be investigated to improve the regularity and efficacy of electronic face to face team discussions.
Co-ordination and planning
Based on the factors mentioned above a suggested minimum time allowance would be 3 days for the co-ordination of and planning for the use of CAQDAS packages. The ideal format would be a sandwich structure, similar to the following:
- Day 1 - CAQDAS package training part 1: overview and considerations for planning, step by step familiarisation with basic tools
- Day 2 - Team planning and brainstorming: to include data collection issues, discussion of analytic strategies and development of data transcription protocols
- Day 3 - Software part 2: further step by step training in the context of team meeting and decisions
For teams who are not widely geographically dispersed the days could be at suitable intervals though the ideal would be for days 1 and 2 to be held on consecutive days.
Which software package and which version?
If the same software is being used by all partners, take extra measures to ensure that everyone also uses the same version number. Do not take it for granted that software which is already in situ for all collaboration partners will be the same. The version number of the CAQDAS package may be crucial if and when the software ‘project’ files are to be transferred to research colleagues and merged. A CAQDAS ‘project’ file may not transfer down to an older version of the software.
The extent to which backward compatibility is possible will vary with the individual software package and the size of the update.
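A co-ordinator who collects the installed version from each site could check the list against the agreed version before any project files are exchanged. The following is a minimal sketch only: the site names, the version strings and the function name `version_mismatches` are hypothetical, and the versions are assumed to have been gathered by hand.

```python
def version_mismatches(installed: dict[str, str], agreed: str) -> dict[str, str]:
    """Return the sites whose installed version differs from the agreed one."""
    return {site: version for site, version in installed.items() if version != agreed}


# Hypothetical sites and version strings, for illustration only.
installed = {"Site A": "9.0", "Site B": "9.0", "Site C": "8.0"}
print(version_mismatches(installed, agreed="9.0"))
```

Any site reported here would need to change its version before its project files are merged with the others.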
What kinds of data will be analysed?
The type of dataset may be a clear indicator of what software is to be used. You may decide the use of software is not appropriate for some of the data because of methodological or ethical concerns. A complex mixed dataset may suggest the use of more than one software package. See sections on multimedia, survey data and data integration. Do you just have video data? Do you have video and textual data? Take a look at the comparative reviews of CAQDAS software.
Managing how the software is used
Regardless of who the overall research ‘leader’ is, ideally a separate CAQDAS co-ordinator with the appropriate skills should be appointed to take responsibility for planning the use of the software, organising the discussion and communication of findings and merging (if this happens) software ‘projects’. Ideally each sub-team also needs to have one person who can ensure that information is cascaded to the larger team and that protocols are followed.
Creation of teamworking protocols
In the team context it will be helpful for all team members to agree on protocols to ensure a consistency of approach (if this is required by the project design). Protocols can take the form of rigid agreements on formats and processes, or they can begin with an agreed framework that allows for certain areas of flexibility as analysis proceeds. Even that flexibility can be systematised to a degree so that other team members can see what, where and who is departing from the norm, so that it is noticed and subsequently discussed. Deliberate departures from the agreed norm might provide valuable insights for all. See "Teamworking protocols: introducing consistency with flexibility" below.
We have outlined the general considerations at the outset of planning for the use of a CAQDAS package in the team situation. The protocols below are for data processing and the preparation for analytic work. See also our software specific guidance for collaborative processes.
Teamworking protocols: introducing consistency with flexibility
These generic protocols are intended to enable efficient co-ordinated planning in the management of data. Collaborative project management and individual research projects can make use of these protocols (though they are more important for teams than for individuals). In some instances detailed minimal protocols are included to enable efficient planning. At other times more general advice is given since software specific considerations influence decisions to be made early in the process.
The production and use of protocols early in the planning and execution of data collection and then analysis will save time and confusion. Protocols which are followed in data preparation may make some structural coding processes faster and more efficient, e.g. via the auto coding of repeated questions, headings or speaker identifiers within or across multiple data files.
Both minimal and optimal formatting are referred to below. Optimal formatting is only desirable if data is structured, either with repeated headers across files or speaker identifiers as in group interviews or focus groups, or for very structured interview frameworks which have been adhered to in a standardised way for each respondent.
This section covers minimal data formatting protocols to enable efficient conversion of files into optimal format later if required.
- Minimal data formatting works for any software package
- Optimal formats may not be necessary – but if they are required for efficient processing they will vary for each CAQDAS package
- Optimal formatting information will be located in individual software information
- Optimal is easy to arrive at (if you have at least followed the minimal strategy first)
When we talk about optimal formats for use in CAQDAS packages – we are assuming we want these so that auto-coding operations can speed up certain types of organisational/analytic work in the data. The problem with optimal formats is that they are usually software specific, so an optimal format applied for NVivo data will not be relevant for ATLAS.ti or MAXqda.
What do we mean by minimal and optimal in this context?
The problem with an overly fussy protocol issued say to various transcribers is that it puts the onus on the transcriber (who might not be deeply involved in the project) to get it right. If on the other hand the transcriber is given a relatively simple minimal but strict protocol, it will be easier to convert all the files that are submitted by the transcribers to a more complex or optimal protocol if this is necessary later on. See table 1.
The files produced by the relatively simple minimal protocols can later be converted to a more complicated format in Word, using Edit/Find/Replace together with the Format and Special options (under More>>). If one person is responsible for this, a smooth and speedy process can be developed and applied to every file. The format required may be different for different CAQDAS software.
A minimal protocol is also one which is useable in CAQDAS without doing anything further. It may not be the most efficient format in terms of being able to use all the autocoding tools but significant qualitative work can proceed. Table 1 below illustrates whether minimal or optimal strategies are preferable.
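The same kind of minimal-to-structured conversion can also be scripted on a plain-text copy of a transcript rather than done by hand in Word. The sketch below is an illustration under assumptions, not a format required by any particular package: it assumes identifiers of the form `QU.1` or `RESP.001` at the start of a line, and the target format (exactly one empty line before each identifier) is hypothetical.

```python
import re

# Assumed identifier convention: a text prefix, a full stop, then an ID,
# e.g. "QU.2" or "RESP.001", appearing at the start of a line.
BREAKS_BEFORE_IDENTIFIER = re.compile(r"\n+(?=(?:QU|RESP)\.\w+)")


def separate_sections(text: str) -> str:
    """Put exactly one empty line before every identifier line."""
    # Any run of hard returns directly before an identifier becomes two
    # hard returns, i.e. one visible empty line.
    return BREAKS_BEFORE_IDENTIFIER.sub("\n\n", text).lstrip("\n")


minimal = (
    "QU.1 How did you find the service?\n"
    "RESP.001 It was fine.\n"
    "\n\n"
    "QU.2 And the staff?\n"
    "RESP.001 Helpful.\n"
)
print(separate_sections(minimal))
```

Because the identifiers are strict and uniform, one pattern is enough to convert every file in the same way. Running the function twice gives the same result as running it once.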
Minimal strategies for data transcription and handling - do’s and don'ts
Try not to transcribe data without having first considered these minimal notes. Many of them are common sense but are useful seen together.
Starting the transcription
- Do not transcribe or input qualitative data into a table or into Excel (unless it is survey data and so is likely to be already in such a format – see notes below) – if data are in tables this will limit freedom to select small amounts of text and will prevent the use of certain types of coding and quick coding. You will have to convert tables to text in order to get rid of the table format.
Certain software packages (MAXqda and QDA Miner) can take survey data, i.e. the open ended questions, almost direct from Excel. If you have open-ended questions from survey data, use some of these minimal suggestions but, even better, go to the survey data section. The instructions that follow here are not for survey data.
- Do - (for other qualitative data, interview transcripts, field notes etc.) - always use MS Word or similar (Word files can be used directly within some packages, or can be converted to rich text format if necessary)
- Do - make use of paragraph breaks – see table 1, also insert full stops to create sentences where it seems useful. (Some software MAXqda, ATLAS.ti, QDA Miner, recognise sentences as units of text – so this is useful)
- Do not – imagine that double spacing of data will be helpful in CAQDAS packages! You will want to see as much data as possible at any one time
- Do – use single spacing for the version of your file which will be used in the CAQDAS package
Identifiers, questions, headers
- Do not - only use numerals or only single letters to identify question numbers (or respondent numbers or respondent names) - they will be difficult to alter on a global basis using Edit/Find/Replace tools – always use an exclusive text prefix with another character – preferably a full stop. So....
- Do - always precede numbered questions or respondent identifiers with something exclusively replaceable, e.g. RESP. or QU. The full stop/period together with the text makes this string of characters exclusively findable and replaceable later should you need to use Find/Replace tools in Word. For example, never just use Q (because later searching for Q may find Q’s used in words in the text itself rather than as a question ID); for respondent IDs, RESP.001 is much better than just R or even R1, R2 etc. (see the section on survey data/focus group data)
- Do - if you need to, add individual identifiers but only after the above, e.g. after RESP.AL. or R..AL. QU.2.
- Do not - use colons < : > as part of a speaker/question identifier because in at least one major CAQDAS package they are illegal characters when incorporated in a ‘text search’, and you may use this tool to find places where speaker identifiers etc. occur.
- Do - adhere to similar minimal identifiers if transcribing a focus group with identified individual speakers (see our tips for optimal formats).
- Do not - separate question ID’s and respondent ID’s from their relevant text or speaker section by more than one hard return. Clear empty white lines are a sign that there is more than one hard return. These are appropriate only where you want clear separation between the end of one speaker’s text and the beginning of another section or speaker.
- Do - always make the spelling, spacing etc. of repeating speaker identifiers, question headers, section headers and topic headers absolutely uniform throughout the text, not an inconsistent mixture of several formats, e.g. not a mix of Q.1, QU1 and QU 1; choose just one format and stick to it.
- Do - use Word tools to do the checking etc; get spelling right if this is a concern in Word. You may be allowed to edit your data freely. However there is usually no spell checker or easy Edit/Find and Replace tools in the CAQDAS packages. These tools are invaluable in Word and should be used to clean up your data before the file is assigned/imported. Also using the same tools...
- Do - in Word anonymise data where necessary, using Edit/ Find and Replace. If anonymisation is a requirement during analysis, do not leave this until your data is in a CAQDAS package. This is particularly important for teams about to share data and share CAQDAS project work.
- Do - get data as finished as possible in Word – before import/assignment to the CAQDAS package
- Do - create one set of finished data files for use in the CAQDAS package and keep them in a separate folder well away from other versions of the data which can be opened in e.g. Word for other types of work
- Do not (if working in teams) under any circumstances allow the editing or changing of data files or documents once they have been assigned or imported into the CAQDAS package. This is because you may eventually wish to merge multiple software project files from different team members together, and if the project files to be merged will at any time contain the same data files (e.g. same individual transcript files) what is wanted is that these individual files also merge. You do not for instance, in the eventual merged project want to end up with multiple versions of the same transcript. Coding and interrogation of the data can continue within the CAQDAS package but the individual data files should remain physically the same.
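The advice above on uniform identifiers can be checked mechanically before files are assigned or imported. The following sketch assumes a plain-text copy of a transcript and assumes that the agreed format is `QU.` followed by a number; the patterns and function names are illustrative, and a real project would adapt them to its own conventions.

```python
import re
from collections import Counter

# Loose pattern: Q or QU, an optional full stop, an optional space, then a
# number, at the start of a line. It deliberately catches inconsistent variants.
LOOSE = re.compile(r"^(QU?)(\.?)( ?)(\d+)", re.MULTILINE)
# Strict pattern: the single agreed format, e.g. "QU.1" (an assumption here).
STRICT = re.compile(r"QU\.\d+")


def identifier_shapes(text: str) -> Counter:
    """Count each identifier shape found, with the number shown as N."""
    shapes = Counter()
    for match in LOOSE.finditer(text):
        prefix, stop, space, _number = match.groups()
        shapes[f"{prefix}{stop}{space}N"] += 1
    return shapes


def nonconforming(text: str) -> list[str]:
    """List the identifiers that do not follow the agreed strict format."""
    return [m.group(0) for m in LOOSE.finditer(text) if not STRICT.fullmatch(m.group(0))]


transcript = (
    "QU.1 First question\n"
    "Q.2 Second question\n"
    "QU3 Third question\n"
    "QU 4 Fourth question\n"
    "Quite a long answer follows\n"
)
print(identifier_shapes(transcript))
print(nonconforming(transcript))
```

More than one shape in the count suggests that a transcriber has drifted from the protocol, and the list shows which identifiers to correct in Word before import.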
Each software has different routines and mechanisms for merging work. The abiding principle for most of them however is that working on the same overall project by multiple users using the same CAQDAS package, has to happen on different workstations in isolation. Eventually to see all project work together in one CAQDAS project, the work of others has to be merged or imported.
Ideally if different teams or researchers are using the same data (to start with or eventually) there should be one set of data files used by everyone (possibly copied between different machines or sites) which remain unaltered (in terms of editing) once they begin to be used in the CAQDAS package. New files can be added to this set at any time using the same principles. After progressive collating of different datasets or merging project work in CAQDAS, the likelihood that all team members will eventually have the same data incorporated in the CAQDAS package increases over time.
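One way to confirm that copies of the shared data files at different sites have remained unaltered is to compare checksums. The sketch below is a suggestion rather than a feature of any CAQDAS package: it assumes the finished data files sit together in one folder, and the function names and file name are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


def build_manifest(folder: Path) -> dict[str, str]:
    """Map each file name in the folder to the SHA-256 digest of its bytes."""
    return {
        path.name: hashlib.sha256(path.read_bytes()).hexdigest()
        for path in sorted(folder.iterdir())
        if path.is_file()
    }


def changed_files(reference: dict[str, str], current: dict[str, str]) -> list[str]:
    """List files present in both manifests whose content differs."""
    shared = reference.keys() & current.keys()
    return sorted(name for name in shared if reference[name] != current[name])


# Demonstration in a temporary folder with a made-up data file.
with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    data_file = folder / "T1_F14_FCR_RDH.txt"
    data_file.write_text("RESP.001 original text", encoding="utf-8")
    reference = build_manifest(folder)

    data_file.write_text("RESP.001 edited text", encoding="utf-8")
    print(changed_files(reference, build_manifest(folder)))
```

The co-ordinator could keep the reference manifest made when the files were first distributed, and each team could compare its own folder against it before a merge.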
Table 1. Different broad types of data, with an indication of where more than minimal formatting may be useful (note: flags mean words, identifiers or codified abbreviations which might be useful to insert in the transcript for searching and recovery later)
Broad categories of data
Types of textual data sources
Useful flags or structures – general comments
Any free text, literature etc.
As it comes - a few choices to be made
Paragraphs based on Speaker Identifiers to include their relevant speaker sections. Sentences.
Might be more structured see below*
As it comes.
In depth interviews
Speaker ID’s, speaker sections. Sentences, paragraphs.
Consistency with transcription of recurring speaker ID’s within files, speaker sections. Paragraphs contain the speaker section… Speaker ID no further than the line above.
Minimal, then optimal OR straight to optimal
Consistency with transcription of respondent IDs, question numbers. See Special Sections on Survey Data Integration
Minimal then optimal OR straight to optimal
Consistency with transcription of recurring question structure across interview files
Minimal then optimal OR straight to optimal
Consistency with transcription of Headings based on contexts, dates, situations.
Minimal then optimal OR straight to optimal
Dates, daily records
Minimal then optimal
Later... differing auto coding tools in each software package offer ways to quickly code for different units of text structures in the data – e.g. NVivo uses heading levels, ATLAS.ti uses two types of paragraph breaks (single or multiple hard returns), MAXqda uses paragraphs and sentences. For example, in Word before data import, it may be good to use Edit/Find/Replace/Special or /Format tools to replace all the QU. with the necessary format – putting in heading levels (NVivo), or putting in or taking out paragraph breaks (ATLAS.ti, MAXqda) – e.g. at or after all occurrences of RESP.
Earlier... a similar logic to the above could be applied to all sorts of shorthand expressions at the transcription stage with the Edit/Find/Replace tools in mind and so...
Anonymisation... can be achieved quickly in Word if names are exclusive and therefore easily replaceable text string abbreviations are used when necessary...
Early uniform decisions for teams... teams are vulnerable to difficulties later if the right measures are not agreed and applied early enough to protect data.
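The anonymisation note above relies on names being exclusive strings that can be replaced globally. The following is a rough sketch of that replacement on a plain-text copy of a transcript, as an alternative to Edit/Find/Replace in Word. The names, the codes and the function name `anonymise` are invented for illustration.

```python
import re


def anonymise(text: str, replacements: dict[str, str]) -> str:
    """Replace each whole-word occurrence of a name with its agreed code."""
    if not replacements:
        return text
    # Longest names first, so that "Anna Lewis" is tried before "Anna".
    names = sorted(replacements, key=len, reverse=True)
    pattern = re.compile(r"\b(?:" + "|".join(re.escape(n) for n in names) + r")\b")
    return pattern.sub(lambda match: replacements[match.group(0)], text)


# Fictional names and hypothetical codes.
codes = {"Anna Lewis": "RESP.AL", "Anna": "RESP.AL", "Redditch": "PLACE.1"}
text = "Anna Lewis moved to Redditch. Annabel visits Anna weekly."
print(anonymise(text, codes))
```

Matching whole words only means that a longer word containing a name, such as "Annabel" here, is left alone. This is the same reason the protocol asks for exclusive identifiers in the first place.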
The guidelines for developing consistent and efficient file naming protocols below are relevant to individuals and teams using CAQDAS software for project organisation and data preparation. It is important to get file naming protocols in place which are useful, informative and not too space consuming. In the team context it is important to put in place a file-naming protocol which everyone sticks to.
File naming protocols could reflect such things as:
- Which team generated the data
- The respondent’s most important details
- The type of data
In generating file naming protocols, consider the following:
- Always abbreviate
- How you define what is useful will vary with the project. Similar protocols for codifying file naming can also be used in speaker identifiers
- The more files you have collected, the more complex the dataset, the more there is need of good file naming
- Long file names are often long without telling you anything interesting about the file
In an individual's project about carers and clients, or family carers and the family members they care for, file names might be:
|FCR_F13_LTD||(Family carer respondent, family 13, Litchfield)|
|FCR_F14_RDH||(Family Carer respondent, family 14, Redditch)|
|CFson_F14_RDH||(Cared-for son, family 14, Redditch)|
|CFsis_F09_SOL||(Cared-for sister, family 09, Solihull)|
|CR_03_RDH||(Carer respondent no.3, Redditch)|
|CX_04_WBR||(Client respondent no. 4, West Bromwich)|
This is only one example. In some packages the names by default can be listed alphabetically – or this can be arranged, so the first prefix might be the important organising factor if you want files to ‘sit together’ in a list. Above the role of the respondent is the organising factor. If for instance you wanted the family members to sit together in a list – use the Family identifier (e.g. "F14") as the initial prefix (not all clients are featured as part of a family unit because they are not cared for by family members).
In a team project an additional prefix at the beginning would identify the team who have collected/worked on the data e.g. T1 – the secondary ordering here is by family identifier e.g. F09 where family is relevant (most of the professional carers are not family-specific).
|T1_F14_FCR_RDH||(team 1, family 14, family carer, Redditch)|
|T1_F14_CFson_RDH||(team 1, family 14, cared for son, Redditch)|
|T1_F09_FCR_SOL||(team 1, family 09, family Carer, Solihull)|
|T1_F09_CFsis_SOL||(team 1, family 09, cared for sister, Solihull)|
|T1_CR_03_RDH||(team 1, carer respondent no.3, Redditch)|
|T2_CX_04_WBR||(team 2, client respondent no. 4, West Bromwich)|
|T2_FCR_F13_LTD||(team 2, family carer respondent, Family 13 Litchfield)|
File naming is one way of showing a limited number of demographics or key features of each data file or respondent. Usually though there is a practical limit to the size of the file name because too much scrolling left and right is needed to see the entire file name in the panes which list them.
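A codified file name can also be read back into its parts by a short script, which is one way to check that everyone is following the protocol. This sketch assumes the underscore-separated conventions of the examples above (a `T` plus number for team, an `F` plus number for family, a bare number for respondent number, and the site code last). The function name and the returned field names are hypothetical.

```python
import re


def parse_name(name: str) -> dict[str, str]:
    """Split a codified file name into team, family, role, number and site."""
    parts = name.split("_")
    fields = {"team": "", "family": "", "role": "", "number": "", "site": parts[-1]}
    for part in parts[:-1]:
        if re.fullmatch(r"T\d+", part):
            fields["team"] = part
        elif re.fullmatch(r"F\d+", part):
            fields["family"] = part
        elif part.isdigit():
            fields["number"] = part
        else:
            # Anything else is treated as the role abbreviation, e.g. FCR.
            fields["role"] = part
    return fields


names = ["T1_CR_03_RDH", "T1_F14_FCR_RDH", "T1_F09_CFsis_SOL", "T1_F14_CFson_RDH"]

# List the files so that members of the same family sit together.
for name in sorted(names, key=lambda n: (parse_name(n)["family"], n)):
    print(name, parse_name(name))
```

Because each part is recognised by its shape rather than its position, the sketch tolerates names in which the role and family identifiers appear in a different order. A team protocol would normally fix one order.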
Further organisation: you can assign additional categories to data files using different tools in CAQDAS packages. These will provide varying interrogation functionalities to allow filtering or comparing across subsets etc.
Using CAQDAS tools for early grouping of files, data and respondents will facilitate subsequent data interrogation. Data organisation using different functions within each software will be useful eventually since it will provide a whole range of different ways in which data can be interrogated, compared and contrasted. CAQDAS packages provide different means and levels of organisation.
At an overarching level within the CAQDAS project, you may be offered folders or textgroups to organise the basic ‘listing’ of files. These are not universal in software and are often ignored, but they do offer a way of filtering the files that you are looking at at any given moment, or of scoping queries. Such structures are most useful for separating different types of data, so that tasks which are useful for some types of data but inappropriate for others can be accurately scoped.
Further and more detailed organisation will depend on the type of data. The choice of organisational tools employed to categorise data will vary. Whole files (e.g. semi-structured interview files) can usually be categorised by ‘attributes’ (though these are variably named), and this is likely to be used at least for minimal ‘sorting’ of data. Attributes become most useful, though, where a greater variety of quantitative or descriptive information about qualitative files needs to be incorporated into the project work for purposes of later interrogation. Though such information can be created in a step by step way in the software, it can also be imported via a spreadsheet - see table below (detailed format differs for each software).
Other types of data, for instance individual speakers in a focus group (where they have been identified in the transcript) may be organised by coding e.g. using a code for each person or a code representing the female speaker sections etc.
If survey data (open ended questions) are involved, see the detailed information on survey data sections (their data preparation and the incorporation of additional quantitative data) below.
In the team context organisation of data is one of the areas that can be planned ahead to great advantage. A common, structured table distributed for uniform completion by all team members provides another building block towards an agreed understanding of the basic structural bases on which interrogation might happen later. Such a table might be formed from information arising from the team structure, from research questions, from known relevant facts about respondents or contexts of the data. A generic example is included below. The individual format or additional information required in each software (were that table to be imported via a special attribute import) would differ with each software.
|RESPONDENT||Resrch TEAM||ROLE||GENDER||DISORDER||GP contact||PCHT||MATRON||Family ID|
|T1_F14_FCR_RDH||Team 1||Fam carer||female||n-a||freq||RDH3-MH||no||F14|
|T1_F14_CFson_RDH||Team 1||Cared son||male||M.H||freq||RDH3-MH||no||F14|
Remarks about the above (fictional exemplar) table containing attributes/variables
- Some of the information is duplicated in file name and table – the file name is visually informative, the attribute value in the table will be functionally useful as a set of factors to interrogate the data by
- Each column comprises a potential basis for qualitative cross tabulation/comparison across subsets
- Where possible labels are kept abbreviated – for instance PCHT will involve far less scrolling in a table than primary care health team, as will the codified health team number in the values
- Sensitive data: where anonymisation is required – this table may fall short – since we might guess from various contextual clues that RDH stands for Redditch –the PCHT value labels, e.g. RDH3-MH might incorporate a degree of codified anonymisation which would be difficult but not impossible to work out. So, the researcher or research project manager should attend to anonymisation if required, early, before too much inconsistent information is proliferated!
- Whatever the function used in software to apply this information, an agreed table provides a useful up-front planning device for teams involved in collaborative research
Similar tools in CAQDAS packages use different terminology:
Use Excel for any additional data, e.g. demographic, quantitative data, that you may have alongside the qualitative data (this may be imported into CAQDAS packages by different means and the relevant data categorised using the differing tools within each software tool).
High level listing, filtering: if offered by software use folders or similar structures (these are not vital) to separate e.g. different types of data.
Parts of a data file which need categorising e.g. by socio-demographic variables for interrogation later – use coding - code relevant parts with e.g. ‘female’ code. Enables later queries e.g. looking for where the code ‘communication difficulties’ co-occurs with places in data where ‘females’ have spoken.
Whole files which need categorising in any way (or many ways), possibly for more complex interrogation (comparing across subsets, querying within subset combinations, e.g. ‘single females’ in their ‘70’s’ who talk about ‘day-care’) – use Attributes or one of the alternates used in above box together with any relevant topic/concept code/s.
For teams: the main ‘CAQDAS co-ordinator’ in a collaborative project should develop a spreadsheet containing the potential attributes and then copy it to all teams for completion to provide a consistent base level template of data organisation. This can be applied to an individual team’s project in the CAQDAS package later if necessary. Or it can be retained by the co-ordinator and applied to the aggregated ‘all in one’ project as produced by the merging of the teams’ work.
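The template spreadsheet described above could be generated as a CSV file with one row per data file and empty cells for each team to complete. This is a sketch under assumptions: the column names are taken from the fictional exemplar table, the function names are hypothetical, and the import format that each CAQDAS package actually requires will differ.

```python
import csv
import io

# Column headings from the fictional exemplar attribute table.
COLUMNS = ["RESPONDENT", "Resrch TEAM", "ROLE", "GENDER", "DISORDER",
           "GP contact", "PCHT", "MATRON", "Family ID"]


def attribute_template(file_names: list[str]) -> str:
    """Return CSV text with one row per data file and empty attribute cells."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, restval="", lineterminator="\n")
    writer.writeheader()
    for name in file_names:
        writer.writerow({"RESPONDENT": name})
    return buffer.getvalue()


def incomplete_rows(csv_text: str) -> list[str]:
    """List respondents whose row still has at least one empty cell."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row["RESPONDENT"] for row in reader if any(value == "" for value in row.values())]


template = attribute_template(["T1_F14_FCR_RDH", "T1_F14_CFson_RDH"])
print(template)
print(incomplete_rows(template))
```

When the completed sheets come back, the second function gives the co-ordinator a quick list of rows that still need attention before the attributes are applied to the merged project.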
Various coding tools can be used as a means for developing consistency from the outset of a project. Users and CAQDAS data co-ordinators can work together to create codes at the start of a project. Together, they can agree an overall coding scheme, the way the codes will be applied and the units of measurement.
The idea that there can be protocols for the development of coding schemes is somewhat anachronistic, since the development and refinement of coding schemes and coding processes are very individual to the demands of each project. For those reasons, this section primarily offers generic ideas for managing the process of developing a coding scheme but mainly for those engaged in collaborative research. Further, the material provided here is particularly aimed at those projects in which more than one team member will engage in the use of a CAQDAS package (the same package, hopefully!). This section assumes that a coding approach to the management of data is being used, but there may be other qualitative approaches not revolving around a coding approach, where the emphasis is rather on the use of memos, annotations etc. See protocol 5 on memo systems below for more information and ideas for the management of memos.
The possibilities are too numerous to cover all the different models of team research. Here are some statements and discussion points which may help to contextualise your requirements.
Statements and reminders about the nature of team work when using CAQDAS packages
- Different team members cannot usually feed into one ‘project’ from different work stations at the same time, although Transana, a video analysis package, is the exception to this rule in that it provides a specific multi-user version designed for team situations. A special ‘server’ version of NVivo 9 (released in October 2010) also promises simultaneous accumulating work on one project from different machines. For the purposes of this document, it is assumed that this facility is not yet widely available.
- Many planning issues have to be addressed because of the above circumstance. Separate projects can be ‘merged’ or one project (or several, in turn) can be imported into another. In some software packages specific parts of projects can be imported into another project (see software specific sections).
- Data preparation protocols need to be considered, as does the need to reach agreement on coding strategies.
- If the data handled by each team differ greatly in type and rationale, the codes may also differ in nature. Coding agreement between teams is then not a factor, though merging the different CAQDAS projects might still be desirable.
- Merging projects in CAQDAS packages will merge codes which have the same name and are in the same hierarchical position. If they are sub-codes, they have to sit under the same top-level code.
- Teams that feel confident in their use of software often seek to follow an incremental process of merging work repeatedly.
- Any process involving the merging of different work in one CAQDAS project might require certain early safeguards to be sure that the repeated merges work each time.
- Basic measures might be necessary, such as working inside the software only with data files whose text is never edited, or which are read-only (see the protocols on general data preparation).
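The merging rule stated above (codes merge only when both name and hierarchical position match) can be pictured with a small model. This is a sketch under stated assumptions, not the behaviour of any particular package: codes are represented as tuples giving the full path from the top-level code, and the code names and segment identifiers are hypothetical.

```python
def merge_code_sets(project_a, project_b):
    """Merge two projects' codes, keyed by full hierarchical path.

    Each project maps a code path (a tuple from the top-level code down
    to the code itself) to the set of coded segment identifiers.
    """
    merged = {path: set(segments) for path, segments in project_a.items()}
    for path, segments in project_b.items():
        # Same path: the coded segments are combined under one code.
        # Different path: the code is added alongside the existing ones.
        merged.setdefault(path, set()).update(segments)
    return merged

team_a = {
    ("Services", "day-care"): {"int01:p4", "int02:p9"},
    ("Problems", "communication"): {"int01:p7"},
}
team_b = {
    ("Services", "day-care"): {"int05:p2"},
    # Same name, different parent: stays a separate code after merging.
    ("Staff", "communication"): {"int06:p3"},
}

merged = merge_code_sets(team_a, team_b)
for path in sorted(merged):
    print(" > ".join(path), sorted(merged[path]))
```

In this model the two ‘day-care’ codes combine because their paths are identical, while the two ‘communication’ codes remain separate because they sit under different top-level codes.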
Discussion points: how are you dividing work?
- Each team member (or sub-team) is handling different case studies (different data, similar or same codes)
- Each team member (or sub-team) is analysing the same files but each from a different perspective or focusing on different aspects (different codes)
- Each team member (or sub-team) is analysing different files from similar perspectives and aspects (same codes)
- Each team member (or sub-team) is analysing different types of data which require different ideas and approaches to coding
Teams may keep each project separate and write up separately. However, if the separate projects (built from different collections of individual data files) are merged into one larger project, the new project is likely to contain most or all of everybody’s files from that point. If this project is then redistributed among the team for further work, coding etc. on these files can continue, but each team member will have to be careful not to alter the content of any file in any way. Future merges must be able to reference only one version of each file; otherwise conflicts or inaccuracies will occur during the merging of work. See the cautions in this respect in the data preparation protocols.
Coding scheme and coding agreement
For the purposes of the information below, an assumption is made that the model being pursued is one where coding agreement of some sort is required, so that a comparison across cases or teamwork can be made. Team projects often make pragmatic agreements about coding schemes up front. Project design and research questions impact on these decisions and others (see also protocol 3 on grouping data and files for interrogation). Much negotiation has to occur between the team members about what codes might be relevant, what is meant by each code etc. Nothing can be taken for granted in terms of different researchers’ understandings of an important term or code name. Other approaches to the generation of codes such as are included in Grounded Theory or its various derivatives can be accommodated. Generally however in such a case, if like is to be compared to like and some effort made to address individual differences in subjectivity, the initial codes will need to be refined into fewer, agreed categories. The agreed coding framework will be used for the coding of further data. This does not mean that there can be no flexibility during the rest of the coding processes, but that flexibility needs to be systematic.
Once a coding scheme is agreed upon, the style of applying codes to text matters. There should be some commonality in the amount of context included around a relevant nugget of text. Though it is possible to merely code a phrase or even just a word, the use of CAQDAS lends itself better to the inclusion of useful additional context. This is difficult to regulate but the general principle is more important to agree on than the detail. The inclusion of more context also enables the application of more than one code (there is no limit to the number of codes allowed) at the same position or at overlapping segments when needed.
Generic strategies: creating up-front coding scheme (a priori)
Agreement will be reached on the bare coding frame, probably independently of data. The teams might be allowed flexibility to add codes later. Usually a hierarchical (or quasi) coding scheme will be created.
In an originating (master?) project – other structures could also be created… Folders (if appropriate), Attributes, Memo titles. All these structures and how they are made useful in the context of the project might reflect project design, and the research questions. See protocol 3 file/data grouping for interrogation.
In an originating (master?) CAQDAS project, create the coding scheme inside the CAQDAS package. Add the agreed definition for each code in the appropriate place attached to that code. Include useful instructions about the rough amount of context to be included in coding actions for this code, and whether to look out for the co-occurrence of other codes at the same or overlapping text.
If teams are to be allowed to add their own new codes, either ask them to create a new labelled hierarchical area, e.g. called ‘Team B new codes’, with new codes as sub-nodes underneath, or ask them to create new codes within the appropriate existing hierarchy but initialise (prefix) each new code, e.g. ‘AL-communication problem’ or ‘TmB communication problem’. Clearly describe/define each new code. The prefix differentiates a new code from the same code accidentally created by someone else, so that accidental merging of new (apparently identical) codes does not occur until the co-ordinator has checked the levels of similarity, interpretation, application etc. The similar codes can be merged together by the co-ordinator if appropriate at that stage.
New memos created by a team should follow the same initialised naming protocol.
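The prefixing convention also gives the co-ordinator a simple way to spot new codes that may deserve comparison. The sketch below assumes every new code name is written as prefix, hyphen, then the code name; that separator is an assumption made for this example, and `prefixed` and `codes_to_review` are hypothetical helper names.

```python
from collections import defaultdict

def prefixed(team_prefix, code_name):
    """Return a new code name marked with the creating team's prefix."""
    return f"{team_prefix}-{code_name}"

def codes_to_review(new_codes):
    """Group prefixed codes by their base name.

    Returns only base names created by more than one team, which the
    co-ordinator may want to compare before deciding whether to merge.
    """
    by_base = defaultdict(list)
    for name in new_codes:
        _prefix, _sep, base = name.partition("-")
        by_base[base.strip().lower()].append(name)
    return {base: names for base, names in by_base.items() if len(names) > 1}

new_codes = [
    prefixed("AL", "communication problem"),
    prefixed("TmB", "communication problem"),
    prefixed("TmB", "transport"),
]
print(codes_to_review(new_codes))
```

The result only flags candidates. Whether two similarly named codes really mean the same thing remains a judgement for the co-ordinator and the team.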
The coding agreement may be part of the development process of the original coding frame. As an exercise to check on shared coding style and understanding of the up-front coding frame, all team members code up three of the same data files (if appropriate to the type of data collected by the separate teams), either in or out of the software (agree on a method). To do it in the software is a useful shared familiarisation process for all concerned and could be part of the early team meetings and CAQDAS training mentioned in the collaborative working: extra dimensions of project management section above.
Agree on a method of marking up hard copy so that it is clear how much data is coded and what code/s are relevant at any point. Clearly mark any new codes developed which haven’t already been agreed – get together and discuss.
An easy equivalent of the above is to copy the master CAQDAS project developed above (with the three files imported) to each team member. Each member codes up the three files independently, using the suggestions in the flexibility section above, if necessary, for the development of new codes, and then prints off each coded file with the codes indicated in the margin. These are software specific steps not included here, but available from Help menus.
Get together with printouts in order to come to understandings of difference and pragmatic agreement on how to code further data independently. Discussions could include:
- Comparing similarly coded segments of data
- Are researchers interpreting the codes in the same way?
- Are they interpreting coded segments in the same way?
- Clarifying similarly labelled but different (new) codes
- Comparing the style of coding - the way that segments have been coded, and if that needs to change re surrounding context
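If paragraph-numbered versions of the shared files are used, the comparison can be prepared before the meeting with a simple listing of where two researchers coded differently. This is only a sketch to support discussion, not a measure of agreement: it assumes each researcher's coding has been transcribed by hand into a mapping from paragraph number to the codes applied, and the code names are hypothetical.

```python
def compare_coding(coder_a, coder_b):
    """Compare two researchers' codes paragraph by paragraph.

    Each argument maps a paragraph number to the set of codes applied.
    Returns the paragraphs where the two sets differ, with the codes
    used by only one of the researchers.
    """
    differences = {}
    for paragraph in sorted(set(coder_a) | set(coder_b)):
        codes_a = coder_a.get(paragraph, set())
        codes_b = coder_b.get(paragraph, set())
        if codes_a != codes_b:
            differences[paragraph] = {
                "only_a": sorted(codes_a - codes_b),
                "only_b": sorted(codes_b - codes_a),
            }
    return differences

researcher_1 = {1: {"conflict"}, 2: {"day-care", "coping strategies"}, 3: {"conflict"}}
researcher_2 = {1: {"conflict"}, 2: {"day-care"}, 4: {"AL-communication problem"}}

for paragraph, detail in compare_coding(researcher_1, researcher_2).items():
    print(paragraph, detail)
```

Paragraphs that appear in the listing are starting points for discussion. A difference may reflect interpretation, coding style or simply the amount of context each researcher included.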
Generic strategies: a grounded approach to the generation of coding scheme
Grounded Theory, originating in Glaser and Strauss (1967) and subsequently debated and developed in Strauss and Corbin (1994, 1998) and Charmaz (2006), is much referenced but much adapted in its application. This is how Lewins and Silver (2007) summarise the first Open Coding processes of Grounded Theory: "...the first coding phase in which small segments of data (perhaps a word, line, sentence or paragraph) are considered in detail and compared with one another. This process usually generates large numbers of codes from the data level, which encapsulate what is seen to be ‘going on’. These codes may be descriptive or more conceptual in nature". From the same volume a summary of the second Axial coding process begins: "... Axial coding is a more abstract process. It refers to the second pass through the data when the codes generated by open coding are reconsidered. Code labels and the data linked to them are rethought in terms of similarity and difference. Similar codes may be grouped together, merged into higher level categories, or sub-divided into more detailed ones". Whether or not these two processes are followed exactly, the team will follow a process something like them in developing and agreeing a coding frame which will be used for the bulk of the data.
Generating codes could start with individual team members freely coding up one or maybe two of the same files in a grounded way creating codes inductively. See suggestions for the in-software and out-of-software approaches described above for the marking up of hard copy.
It is unusual for a team project not to have objectives which determine some elements of the coding frame, and this may be reflected in the way the grounded codes are developed into higher concepts or categories, or those objective-driven codes may feature in an a priori / up-front area of the coding frame alongside the inductive, grounded codes.
As discussed earlier, generate a minimal master project if coding is to be done in the software, and copy the minimal project (including any a priori codes) to other members. The process of agreeing and refining a partially developed coding frame, and of making some progress towards common understandings, will be messier than in the earlier, mainly a priori approach. It is possible that more discussion points will be required.
After, say, the first file is coded, discussion and perusal of work so far could include:
- Comparing the codes developed – their meanings, the interpretation of text where they have been applied
- Clarifying similarly labelled but different codes
- Comparing similarly coded segments of data
- Comparing the style of coding - the way that segments have been coded, and if that needs to change re surrounding context
Significant progress can be made during this discussion towards the collapsing of several detailed but similarly defined codes into fewer, agreed codes. This is aided by having printouts of each researcher’s list of codes. Working with an OHP for this process can be useful. This first discussion (if a really inductive process has been followed) will generate a great deal of varied opinions and perceived priorities. It is really important that sufficient time is allowed for such discussion. It is generally understood that analytic processes which incorporate grounded approaches to code generation, and in particular Grounded Theory, are iterative, and this can be reflected by the processes that the team undergoes to reach agreement on a coding frame. So, it is likely the team will undergo this process for a second and third carefully sampled file (to reveal differences) before researchers feel that they have the basis of an agreed coding frame which can be used and applied to the bulk of data while still allowing further flexibility in the creation of new codes.
After the creation of an agreed coding frame, if teams are to be allowed to add their own new codes, either ask them to create a new labelled hierarchical area – e.g. called ‘Team B new codes’, with new codes as sub-nodes underneath. Or create new codes within appropriate existing hierarchy but initialise (prefix) each new code to differentiate from the accidental creation by someone else of the same code, so that accidental merging of new (apparently the same) codes does not occur until the co-ordinator has checked the levels of similarity e.g. ‘AL-communication problem’ or ‘TmB communication problem’. Clearly describe/define new code.
New memos created by a team should follow the same naming protocol. Once meanings etc. are found to be the same for any similar codes, the co-ordinator can manually merge codes in the software.
The guidelines below are for developing and using systematic memo systems, including memo naming and employing an agreed structure of memos.
Memo tools are for all types of users
Memo tools in CAQDAS packages all differ in what other options are enabled with and within them. Generally they are places to analyse, write and keep track in a richer way than is afforded by coding on its own. Additional devices associated with memos in CAQDAS mean that hyperlinks can be made from memos to text, or codes. Entries can be dated to track progress. They can be grouped, graphically displayed, listed and exported in different ways. They can be used for different purposes at different stages. Early on, they can be used as a location for dialogue and discussion between team members e.g. about code development. Later they can be used as: "consolidating memos..." where progress on a theme or topic is reviewed so far: "...with recommendations on how to move the analysis forward" (Di Gregorio and Davidson 2008). We do not assume here that you are using a coding method to analyse data and indeed, the memo tools may be your principal way of managing the analysis. Some of the protocols suggested may indeed relate to coding processes, the issue of themes identified in the data, the generation of codes and categories, but the use of coding itself can be ignored by a team or individual whose main task is to unpack or deconstruct data. The memo tool (accompanied by any text based annotation tools, not covered here) will serve as a good writing location, with numerous linking devices to related data and ways to organise and export notes.
Sharing memos – different situations
The creation of memos, how they are named and what they are used for is ultimately always up to the individual user in whatever circumstance. However in the collaborative context there are now three distinct situations which might influence practicalities of the creation, editing and particularly the labelling of memos.
- Where a CAQDAS (qualitative) package is being used by all, but the merging of whole projects will not occur. In other words researchers are working separately and communication happens only via meetings, the exchange of output (see teamworking protocol 6 on output and storage) and other communications to arrive at a cross-case (along with their own within-case) analysis.
- Where researchers are working separately (hopefully using good protocols to manage work in a systematic way) but will eventually, or repeatedly, merge the software ‘projects’ they are working on into one larger project, so that cross-case analysis can occur within the CAQDAS package.
- Now, particularly with new packages like Dedoose and NVivo 9 Server version, team members can work concurrently on the same project from different work stations (given the right conditions for each software – e.g. working on the same server with NVivo 9 Server). Constraints which apply to the management of memos in 1 and 2 above, do not apply to the same extent for these packages, though different working practices and considerations may be necessary.
Though there may be similar general management principles which could be applied, the discussion that follows is not designed to address the logistics of work in 3.
Software specificity and varied ways of working with both packages make it difficult to generalise. Some of the suggested protocols would be advisable for both 1 and 2; others are particularly important for 2. This is made clear in each set of conventions below.
The varied role of memos and their relative importance
As suggested in the second paragraph, the memo tool is one aspect of CAQDAS packages which is of use to different varieties of qualitative analysis, those using interpretive approaches and also for those using traditions of discourse analysis. Our experience in teaching and supporting different types of collaborative project suggests that at one extreme the memo is hardly used at all because researchers are simply ‘coding-up’ the data for further analysis and write-up by particular members of the team. Other teams and their researchers are in full control of the analysis of their own projects and the memo tools get used variably if there is time or if there is an awareness of how the memo integrated with the software can be more useful than normal note-making outside the software. At the other extreme the memo becomes the real vehicle of communicating analysis between teams in situations 1-3 above. We sometimes recommend for instance that multinational collaborations using coding methodologies for different data in different languages and alphabet systems adopt a particular approach of cycling between the coding process in their own language and heavy use of the memo tools in the common language. Sharing the memos then becomes the cornerstone of sharing analysis.
Exporting memos can be achieved in different ways in different packages. Ultimately you can always copy the contents of a memo and paste them into a word processing application.
In situation 1 there are few constraints beyond careful naming of the memos before sharing (see protocol 6 on outputs and storage), so that other colleagues can store and identify them easily. See more ideas about the use and creation of memos for all to use in ‘Shaping a framework of memos’ below (see protocol 5 on memo systems).
Constraints when merging projects
In situation 2, the CAQDAS memo sharing process becomes more challenging because in the context of merging whole projects, various models of merge are possible and different things will happen to memos depending on the software and the model of merging that has been chosen. A standard principle in some software provides that the merging of two projects which have two memos with the same name (but probably with different content), may cause one of the memos to be dropped unless specific actions are taken. Sometimes the software will allow you to ‘merge’ or ‘add’ the second same-named memo – see the software specific support for collaborative work section above for details on these facilities. In some packages, only one memo is allowed in certain places – thus only the memo in the open project will be saved. Following on from that principle, where the facility to add a (possibly renamed with a number suffix) memo is provided, a special memo naming protocol will avoid confusion about who created which memo. For instance a memo in team A’s project about coping strategies could be called team A-coping strategies – and thus both Team A’s memo and team B’s similar memo will both be obvious in the merge.
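The effect of same-named memos during a merge can be pictured with a toy model. This is a sketch only: it does not reproduce the merge behaviour of any specific package, `merge_memos` is a hypothetical function, and the numbered suffix format is an assumption. It models the two outcomes described above (the incoming memo is dropped, or it is added under a suffixed name) and shows how team-prefixed names avoid the clash altogether.

```python
def merge_memos(open_project, incoming, keep_both=False):
    """Toy model of merging memos from an incoming project into the open one.

    Memos are held as {name: text}. When names clash, the open project's
    memo is kept; the incoming memo is either dropped or, if keep_both is
    True, added under a numbered suffix.
    """
    merged = dict(open_project)
    for name, text in incoming.items():
        if name not in merged:
            merged[name] = text
        elif keep_both:
            suffix = 2
            while f"{name} ({suffix})" in merged:
                suffix += 1
            merged[f"{name} ({suffix})"] = text
        # Otherwise the incoming same-named memo is silently lost.
    return merged

team_a = {"coping strategies": "Team A notes"}
team_b = {"coping strategies": "Team B notes"}
print(merge_memos(team_a, team_b))                  # Team B's memo is lost
print(merge_memos(team_a, team_b, keep_both=True))  # kept, but authorship is unclear

# With the team-prefixed naming protocol, both memos survive and are identifiable.
named_a = {"team A-coping strategies": "Team A notes"}
named_b = {"team B-coping strategies": "Team B notes"}
print(merge_memos(named_a, named_b))
```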
The next challenge in situation 2 comes where the model of work between teams involves repeated merges – each time with incremental analysis happening on each newly merged project (the new merged, ‘master’ project distributed back to all team members and again merged and redistributed again and so on). If new work has been done around those memos, to avoid doubling and quadrupling the number of (possibly newly suffixed) memos some action needs to be taken by the CAQDAS co-ordinator. Possible actions will depend on where further memo editing is taking place.
Has the CAQDAS co-ordinator absorbed all of the memos on coping strategies and created one new synthesised memo, maybe calling it just "coping strategies"? If so, it would be good to delete all the separate team memos on the subject before redistributing the project so they can recreate new ones for further work as necessary.
Alternatively, the CAQDAS co-ordinator could retain all the separate memos for the redistributed merged project so that respective team memos can be added to by the respective teams.
Make additions look new!
But before the next merge, each team would need to delete the other team's memos from their own project to avoid conflicts. The CAQDAS co-ordinator will also need to delete old team memos in his or her ‘master’ project before the next merge.
The complications associated with merging projects will keep arising for both old and newly synthesised memos. The same logical process has to be applied before each merge to avoid the loss of important memos.
It can be seen that repeated merges are possible and the challenges caused by conflicts continually recur. A co-ordinator needs a clear picture of what is necessary at each stage of the process and has to be very proactive in ensuring that a project which makes heavy use of memo tools adheres to set procedures. The co-ordinator will find this task easier if he or she has complete control of the merging process.
Merging processes are quite different in each software tool, although the above principles will apply similarly in each tool. For all merging (e.g. ATLAS.ti) or importing of one project into another (e.g. NVivo 8 and 9), or exporting and importing teamwork (e.g. MAXqda) it is vital that the project into which another project is being merged is backed up first, or that the merge happens to a new copy of the project. This is because the merge will make changes to the originating project and in some cases these changes cannot be undone easily.
If you are the CAQDAS co-ordinator, experiment with merging in small ways with dummy or ‘baby’ projects – in this case with one or two memos created in each project, (with same names but different content) – just to see what happens in actuality. This will increase confidence in the whole process once it becomes a more serious exercise with real projects.
Shaping a framework of memos
Where it is conceived in the project design that much work will be done in memos, it is useful if some memo types and memos are defined in the originating software project as a framework for everyone’s later attention. Di Gregorio and Davidson (2008) refer to a software shell, where the building of several structures in the early version of a software project serves to reflect project design, research questions and the multiple aspects of enquiry which feed the research questions. In the helpful exemplar projects they use, researchers (or team leaders) have structured memos in various ways to encourage reflection in particular ways. Although there are usually other ways to group memos, the naming conventions used heavily in the output and storage protocol and the file naming protocol provided here also apply to memos and how they are listed. Possible options for types of memos and memo name prefixes could include:
- Code - for the discussion of codes or themes
- Methods - for the discussion of methodological concerns
- Log - for the day to day tracking of various processes
- Theory - for the discussion of extant or new theories
For example, TM-A code-coping strategies.
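A naming convention of this kind is easy to check mechanically. The sketch below assumes the pattern suggested by the example above: a team identifier without spaces, a space, the memo type, a hyphen, then the topic. The lower-case type names and the helper names `memo_name` and `parse_memo_name` are assumptions made for illustration.

```python
# Memo types taken from the list above, written in lower case (an assumption).
MEMO_TYPES = ("code", "methods", "log", "theory")

def memo_name(team, memo_type, topic):
    """Build a memo name such as 'TM-A code-coping strategies'."""
    if memo_type not in MEMO_TYPES:
        raise ValueError(f"unknown memo type: {memo_type}")
    return f"{team} {memo_type}-{topic}"

def parse_memo_name(name):
    """Split a memo name into (team, memo_type, topic), or raise ValueError."""
    team, _space, rest = name.partition(" ")
    memo_type, _hyphen, topic = rest.partition("-")
    if memo_type not in MEMO_TYPES or not topic:
        raise ValueError(f"memo name does not follow the convention: {name}")
    return team, memo_type, topic

name = memo_name("TM-A", "code", "coping strategies")
print(name)
print(parse_memo_name(name))
```

A co-ordinator could run such a check over an exported list of memo titles before a merge to find memos that were named outside the agreed convention.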
Summarising tips for memos in the collaborative context
- Whatever the logistics or distribution of the team project under way, memos provide a location for dialogue and negotiation, especially early on
- Naming and management of the memos should be carefully co-ordinated if merging of software projects is to occur
- The creation of set-piece memos in an early project (for distribution to the rest of the team) is a way of emphasising the need for attention and reflection on specific issues later
The output is information or data which is exported from the qualitative software. The output can be read, discussed or analysed further; or imported into another software application. Most of what is discussed below is relevant to teams or individuals.
If teams are sharing whole software projects by means of ‘merging’ two or more project files, this is the ultimate form of export and subsequent import. These elements are discussed in software specific support for collaborative work (below).
For some types of collaborative project, exchanging output or ‘reports’, say, exported selections of data (e.g. coded data) may be one of the main methods of sharing information between different teams and members of research projects. A variety of output forms would assist that process and these are discussed below. Individually the outputs may seem like common sense but collectively they create a system that works when you have a mixed team of individuals who may have varied experience and expertise with computer usage.
The direction of flow need not be just from team researcher to project leader/co-ordinator – but in all directions. This is especially true if there are few meetings between teams. There may be key outputs which help the whole team to come to a rounded and agreed view on ways of handling and analysing the data, though cross-case results may eventually differ. Sharing data can be achieved by sending files or generating html files which can be viewed in a web browser. Remember to consider ethical and security constraints when placing data on a public resource.
Types of output files
We do not discuss all possible types of output but cover the generally useful requirements for qualitative output that might need to be exchanged in a collaborative project.
Some software packages provide instant output in html format in the form of tables which include qualitative segments, frequency information and cross tab matrices. See notes in software specific support for collaborative work.
- Code lists – together with definitions/descriptions of codes. These might be anything from a tentative first stab at generating shared ideas of what codes will be required in a standardised coding scheme, the standardised scheme itself, or the standardised scheme plus additional codes created to cater for new discoveries later on.
- Coded data – the segments of data that have been coded to specific topics and themes. These will be essential output for cross case analysis if time allows, and especially if merging of whole software ‘projects’ is not part of the plan!
- Summarised coded data output – typically summarising frequency of codes and the documents they feature in.
- Results of queries and searches – usually similar to the above, just further on in the process. These might include tables – which can be exported for opening in Excel, rather than to the usual Word processing software.
- Memos – each team member may need to summarise progress or provide interpretive accounts of major themes or individual respondents. All qualitative software (CAQDAS) packages provide memo tools; memos can be exported/printed or reintegrated as data if required. A CAQDAS co-ordinator or project leader can use efficiently generated memos from the various teams’ projects as the way to produce cross case analysis. Give clear and early instructions regarding the requirement for memos for later submission to the project leader or CAQDAS co-ordinator. There are some very good references to this topic in Di Gregorio and Davidson (2008). See protocol 5: memo systems, expectations and naming above for more information.
- Maps and models – these are closely associated with memo writing and provide ways of stepping back from the data and expressing connections in various contexts. They are less likely to be used in collaborative projects, simply because of other constraints such as varied expertise, time, and a possible lack of usefulness to the team as a whole. In the team context they must be carefully used or they simply become a distraction from the main demands of the project. They can be saved as files but more simply any model or map can be copied and pasted to e.g. a Word file to help to explain an idea or theory. They can be integrated into other output files, e.g. coded data. Detailed information regarding reasons and starting points for models can be found in Lewins and Silver (2007).
- Documents or source files – these can be output, exported or printed at any time and a paragraph numbered version generated from the qualitative software will be useful for various reasons. See contents of output section below for more information.
The labelling and storage of various output files
In a collaborative project there are some basic steps to improve the management of information.
- Naming the exported files/reports: the consistent and systematic naming of reports by the different team members, though only a detail, will enable the individual creator or other colleagues to know exactly where to look for exported files and how to understand the labels assigned to them (see also protocol 2 on file naming). Though they need not be the same protocols as you design for the data files themselves prior to import, the naming protocols for exported data, e.g. coded data, could have a similar logic applied to them. Choose a protocol which suits the type of output. The ideas below are based on the numerical/alphabetic listing of files in any folder.
- Folders are good sorting places but... it is possible to over-complicate sub and sub-sub folder structures as it may increase clicking and the dangers of accidentally moving or losing all files within a folder. You could opt to keep all the different types of output file within one folder say TeamA/Output/.... and still sort them the way you want by prefixing each file. Including a separate sub-folder under the output folder which houses paragraph numbered file versions of every source file (as generated from the qualitative software) is an excellent idea. You can always choose to export individual source files from the software.
- Prefixing files: Use different abbreviated prefixes for different types of output. It is good to abbreviate the type of output, since although it is important for systematising a list it is the least interesting bit of information in the title and therefore should take up the least room. It does mean however that all of each type of output are listed together by default. For example:
MA_ (Maps or Models)
For all coded data – use prefix CD_ <nameofcode> plus a date stamp (date of file generation). It is a mistake to simply rely on the date of the file in ‘details’ since you may open the file and change or annotate it at any time. So an example would be: CD_conflict_12-03-10.
The American date convention (month-day-year, so that 12-03-10 is 3 December 2010) is sometimes good, since listing by month works better than listing by day. Thus, if you have more than one output file based on ‘conflict’ coded data, alphanumeric sorting will list the earliest by month first. Or, if such work spreads over several years, put the year first, i.e. 10-12-03. If files are instead listed by day of the month, it can take extra peering and searching to find the latest version if you have several, e.g., ‘conflict’ files.
In a large collaborative project, before sending a file or compressed folder via the internet, consider adding a further codified part to the file name, e.g. TeamA_CD_conflict_12-03-10.
- Consistency: whatever your naming protocol (the above suggestions are only ideas) – it will only work if all members of the team stick rigidly to it, especially if you are all sharing the same network drive. Always use standard abbreviations. Always use standard prefixes. Even if files are accidentally saved in the wrong folders (a very easy thing to do!) – a search can be done for them because you know exactly what you are searching for. Such rules will be equally effective for individuals. We have all lost work due to naming files inconsistently in haste.
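The prefix and date stamp convention described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `output_name`, the `PREFIXES` table and the `year_first` switch are hypothetical, and only the CD_ and MA_ prefixes and the example file names come from the text.

```python
from datetime import date

# Abbreviated prefixes per output type. CD and MA come from the text above;
# any further entries would be your own team's convention.
PREFIXES = {"coded_data": "CD", "maps": "MA"}


def output_name(kind, label, generated, team=None, year_first=False):
    """Build a file name such as CD_conflict_12-03-10 (month-day-year)."""
    # Month-first sorts well within one year; year-first suits multi-year work.
    stamp = generated.strftime("%y-%m-%d" if year_first else "%m-%d-%y")
    parts = [PREFIXES[kind], label, stamp]
    if team:
        parts.insert(0, team)  # extra codified part for files sent between teams
    return "_".join(parts)


if __name__ == "__main__":
    generated = date(2010, 12, 3)
    print(output_name("coded_data", "conflict", generated))
    print(output_name("coded_data", "conflict", generated, team="TeamA"))
    print(output_name("coded_data", "conflict", generated, year_first=True))
```

Generating names from one shared helper, rather than typing them by hand, is one way of making it easier for everyone on a shared drive to stick to the same abbreviations.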
Contents of output
It will be important to include extra information in any output sent to other colleagues in the wider research project:
- Basic information about which team/researcher generated the data (this might be in the file name itself, but ideally could be in the content as well; this will depend on the number of teams and overall complication of the project).
- Name of the code: it may seem obvious, but especially for printouts, check that the name of the code appears in the ‘content’ of the output. It is not included by default in NVivo for instance (but is an option for inclusion). You waste a lot of paper in printing out coded data if you then forget and cannot identify what code is represented in the printout. The name is included by default in some packages, like ATLAS.ti and MAXqda.
- Inclusion of annotations or comments or memos: most coded output allows the inclusion of various extra bits of information. This may be important for transparency, for the transfer of interpretive understanding between colleagues and, not least, as a reminder of subtle but important insights that might otherwise be forgotten or overlooked. For instance, in NVivo you can ‘annotate’ segments of data, and if annotated segments are included in coded output the annotations can optionally be included too. In MAXqda, code memos can be included when printing out. Similarly for ATLAS.ti, the comments linked to the code in the code list can be included. The CAQDAS co-ordinator can encourage the use of such tools ahead of time, if he or she can see a long term purpose behind them in enriching the shared output. Even without the use of such tools within the qualitative software, the user can edit the output either in Word or, in software like ATLAS.ti or QDA Miner, in a special Edit window just prior to export.
- Contextual location: name of speaker or source data files featured: by default, any coded data output file includes the name of each source file above each segment or group of segments from that source, e.g. an interview. It is not so easy, however, to trace who said what when the segments come from a focus group or group interview. When the speaker identifier has been physically included in the coding process it is easy to see who said what, but there is no easy way to guarantee that you will know who said something when the coded segment comes from within a paragraph of speech. While still in the software it is easy to find out, via hyperlinks from the coded data window back to the source text. In the output, however, if it is important to know ‘who said what’ then the only strategy is to be sure that you can match the position of the segment with the position in the source file. To help in this respect, it is a must not only to include paragraph numbers in the coded output, but also to have a paragraph numbered version of the source file which has been generated in the qualitative software itself. This version can be kept either as a file or hard copy. Keep a separate collection of the paragraph numbered file versions of your source data in a special sub-folder under output.
In order to be useful these must be generated in the qualitative software itself, so that paragraph numbers match with those of the coded outputs.
- Contextual location within each source: paragraph numbers are included in ATLAS.ti and MAXqda by default, but you have to opt to include paragraph numbers in some other packages. This can be useful in terms of being able to relate individual segments to their actual location in the original source data (as above, regarding group interviews).
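Matching a coded segment back to a speaker via paragraph numbers can be illustrated with a short Python sketch. It assumes, purely for illustration, that the paragraph numbered source file is plain text in which each paragraph starts with its number and an upper-case speaker identifier ("12: ANNA: ..."). Real packages lay out their numbered exports differently, so the pattern would need adjusting, and the sample transcript is made up.

```python
import re

# Assumed line layout of a paragraph numbered export: "12: ANNA: text ..."
# This layout is hypothetical; adjust the pattern to your package's export.
LINE = re.compile(r"^(\d+):\s*([A-Z][A-Z0-9_ ]*):\s*(.*)$")


def speakers_by_paragraph(numbered_text):
    """Map each paragraph number to the speaker identifier that opens it."""
    lookup = {}
    for line in numbered_text.splitlines():
        match = LINE.match(line.strip())
        if match:
            lookup[int(match.group(1))] = match.group(2).strip()
    return lookup


# Invented focus group excerpt, used only to exercise the function.
SOURCE = """1: MODERATOR: How did the meeting go?
2: ANNA: It was tense from the start. Nobody wanted to speak first.
3: BEN: I disagree, I thought people were just tired."""

if __name__ == "__main__":
    lookup = speakers_by_paragraph(SOURCE)
    # A coded segment reported as coming from paragraph 2 can now be attributed.
    print(lookup[2])
```

The lookup only works if the numbered source file and the coded output were both generated by the same qualitative software, so that the paragraph numbers agree.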
Sharing and sending output
Sending textual output does not present any sort of problem, since the files are relatively small and can just be attached to emails. For larger files, experiment by right clicking and (e.g. in Windows) choosing send to compressed (zipped) folder. Whether you are sending a folder (you must use this option to send a folder) or just one large file, the resultant file might be of a suitable size to send by email. Always check the size by looking at the details of the compressed file. Different mail servers will have different limits. When output includes graphic files, jpg files or audiovisuals, the size of the output can quickly exceed what is reasonable to send by email, and there may be other options you can use to send large files, such as:
- Various local file sharing environments and ftp facilities if already in place.
- Special ‘P2P’ (person to person) web tools can be utilised. Investigate arrangements such as those offered by Hightail (formerly YouSendIt), a subscription facility which allows you to send large files. Free services of a similar nature may be available on the internet.
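The compress-then-check-the-size step described above can also be scripted. The sketch below uses only the Python standard library; the function names and the 10 MB limit are assumptions for illustration, since mail servers differ in what they accept.

```python
import os
import shutil
import tempfile


def zip_folder(folder, archive_base):
    """Compress a whole folder into archive_base + '.zip' and return its path."""
    return shutil.make_archive(archive_base, "zip", root_dir=folder)


def fits_limit(path, limit_mb):
    """True if the file at path is no larger than limit_mb megabytes."""
    return os.path.getsize(path) <= limit_mb * 1024 * 1024


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as work:
        # Stand-in for a real output folder, created only for this demo.
        out = os.path.join(work, "Output")
        os.mkdir(out)
        with open(os.path.join(out, "CD_conflict_12-03-10.txt"), "w") as fh:
            fh.write("coded segments...\n")
        archive = zip_folder(out, os.path.join(work, "TeamA_Output"))
        # 10 MB is an arbitrary example limit, not a recommendation.
        print(archive, fits_limit(archive, limit_mb=10))
```

The archive is written next to the folder being compressed rather than inside it, so that the zip file does not end up including itself.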
Project leaders, CAQDAS co-ordinators or individual users should satisfy themselves regarding the security of any web-based transfer of material.
Finally, in combination with the above note on security, the co-ordinator should be aware of any variety in the levels of security required for different uses of the data in respect of output, and ensure that protocols include instructions for anonymisation if necessary, at any or all of the required stages of data transcription / output / dissemination.
Summarising tips for output in collaborative projects
- As a project leader or co-ordinator be aware of what types of output will be most useful for all, and make a clear short protocol of absolute requirements based on the above for each.
- Encourage the involvement and joint understandings of researchers at different stages with appropriate output.
Team projects usually have to opt for high levels of organisation and co-ordination. All the items above will impact on the effective management of the project. However much more important is that project design and research questions should impact absolutely on how you apply these or similar planning exercises. The sections that follow on from this introduction are separate documents, but really they should be seen as interconnecting project management threads, all of which will allow CAQDAS packages to be used efficiently and to maximum potential.
Software specific support for collaborative work
We have distilled some of the major aspects of working collaboratively with a range of software packages. We have included: ATLAS.ti; Dedoose; MAXqda; NVivo and NVivo 9 Server; QDA Miner; and Transana. Each software package is discussed mainly in the context of collaborative work, which might take place in different ways. For system requirements and other information, see the website for each software package. No pricing information is provided here, except to mention free or low cost instances or unusual pricing mechanisms.
There are three main models of collaborative work in the context of CAQDAS (or QDAS) use.
1. Merging software projects and the work done, after working individually
Our main task is to describe if and how parts of projects or full project files can be merged or imported into each other, so that eventually multiple projects can be integrated. The usual principles considered as standard in the merging of projects are the ability to merge individual objects which are the same (and are in the same place, e.g. identical codes in an identical folder or hierarchy) and, if required, to add new things: codes, memos, data files (source documents such as transcripts, field notes, audiovisual data). It should be stressed that significant planning needs to occur with these models of collaboration to enable successful merging of work.
Tip: Before using any of these facilities on real project work it is important to experiment with small projects created just for the purpose. Software included in this category: ATLAS.ti, MAXqda, NVivo, QDA Miner.
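The general merge principle described above (identical objects in the same place are merged, new objects are added) can be illustrated with a small Python sketch. It does not reproduce how any of the packages store or merge projects; the data structure, a dictionary keyed by (location, code name) holding sets of coded segments, is an assumption made for the example.

```python
def merge_codes(host, incoming):
    """Merge two code systems keyed by (location, code name).

    Codes with the same name in the same place are treated as identical and
    their coded segments are pooled; codes found only in `incoming` are added.
    """
    merged = {key: set(segments) for key, segments in host.items()}
    for key, segments in incoming.items():
        merged.setdefault(key, set()).update(segments)
    return merged


# Invented example data: segments are (source file, paragraph number) pairs.
host = {("Themes/Work", "conflict"): {("int01", 12)}}
incoming = {
    ("Themes/Work", "conflict"): {("int02", 5)},  # same code, same place
    ("Themes/Home", "conflict"): {("int02", 9)},  # same name, different place
}

if __name__ == "__main__":
    for key, segments in sorted(merge_codes(host, incoming).items()):
        print(key, sorted(segments))
```

In this sketch the ‘conflict’ code filed under a different location is not merged but added as a separate code, which is the kind of outcome that planning a shared code structure in advance is meant to avoid.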
2. Working in serial and exporting work
In the context of sharing work however, much can be achieved simply by sending the whole project to another researcher for his/her contribution, or exporting parts of qualitative project work to other users. These aspects of teamwork are relatively unproblematic though working in serial requires careful controls. Included under each software heading are some of the major or special types of output and their format. For the synchronous use of software at point 3, the exchange of output is less important since researchers are potentially communicating and comparing ideas in the live software in real-time.
3. Synchronous working by multiple users
This model of collaborative work is newer, enabling multiple researchers to work simultaneously from several different workstations. In a sense there is less to say about this model of work, since it is similar in most respects to an individual working with the software. Much will depend on the numbers of researchers and the dynamics of communication rather than the careful planning required for sharing and merging work at model 1 above. Software included in this category: NVivo 9 Server, Transana, Dedoose.
Sharing and merging work initially done separately
Initial statements about the principles of sharing work:
- It is not possible to work synchronously from separate workstations on the same MAXqda project (at time of publishing)
- There are several quite different ways to share or combine work or merge projects data in MAXqda – see points 1 to 5 below
- Similar functions to MAXqda 2007 and v2
- MAXqda, originally designed to work with textual data, also supports limited tasks with video
Ways of working collaboratively
1. Merging parts of projects
Export/import teamwork – in order to merge parts of a project (suitable for one off or repeated merging of work). This means the coding and memo work that has been done at a document, a folder of documents or the whole document system can be exported, then imported into another project. The user can pick and choose how much data needs merging. This would be the most useful option in a team project where the intention is for multiple team members to work with MAXqda, and each team member would either work with different parts of the data set (e.g. different cases or different types of data) or work with the same data but analyse different aspects of it, all with the eventual intention of aggregating the separate work into one MAXqda project. The process allows the importer of the teamwork file to choose which objects should be included: coded segments, variables (attributes), memos, external links. Some objects in MAXqda have only one memo; the host project’s memos will have priority and thus it is important to be sure which project does the importing.
2. Merge whole projects
This approach is more suitable for one off merging of work e.g. towards end of coding process - not to be considered without experimentation on copies or dummy projects.
3. Working in serial
This can be a fruitful way of generating early coding agreement. Each researcher can create and use their own coloured codes (filtering the code margin display so that others’ colours cannot be seen until everyone is ready to compare all contributions side by side).
4. Individual parts of the infrastructure
Individual parts of the infrastructure of a project can be exported/imported to form part of the background structure of another or new project, e.g. attributes, code system. Complicated lexical (text) search expressions (i.e. the way the search is built) can be saved as a file to be loaded in other projects to standardise particular explorations.
5. Sharing / distributing output files
Sharing / distributing output files e.g. retrieved coded segments (file or printout), coded segments with attached memos (printout only), the current document browser with full coloured code mapping in margin - see figure below (printout only), documents with paragraph numbering (file/print). Tabular files, which are instantly viewable in html, include the qualitative segments, frequency information, varied cross tab matrices, and frequency tables of codes, words, memos plus content, and attributes; they are therefore quickly viewed and could be mounted on web pages for easy viewing.
These options provide much variety and flexibility in the way merging work can be achieved; the varied range of one click tabular outputs (including coded qualitative segments) to html is excellent. See the MAXqda website for more information and online tutorials.
Initial statements about collaborative working in ATLAS.ti:
- It is not possible to work from separate workstations on the same ATLAS.ti Hermeneutic Unit (HU, a term similar to ‘project’)
- ATLAS.ti uses an external database, so the HU reads the data from an external folder – this means that multiple users can synchronously work with the same ‘documents or dataset’ from different workstations as long as they can ‘read’ from the same network drive while working in ATLAS.ti
- In addition to working in serial on the same HU, one after the other, there are a variety of ways for multiple researchers to combine work or merge HUs or parts of them – see points 1 to 4 below
- Most options described below are also available similarly in ATLAS.ti 5
- ATLAS.ti was originally designed to work with text but was the first of such software to encompass direct analysis of multimedia files
Ways of working collaboratively
1. A merge tool can merge parts of other HUs or the whole HU
A merge tool can merge parts of other HUs or the whole HU into the target or ‘open’ HU. It provides four macro options for choosing basic models of merge, and default micro options (which can be changed) for fine tuning particular actions. This pane is illustrated below.
For instance, if you choose same PDs (source data files) and same codes, it will understand that if you have been coding the same data, any coding done by the other researcher will be added in and merged (unified) with the host’s coding, but it will also add any other documents (and codes, coding) that happen to be in the other researcher’s HU. Memos are ignored by default, but in the micro options they can be added or unified. The unify option is effective since it adds any additional new text to the existing duplicate memo, together with an annotation of where and when it was merged. For any of the macro options chosen on the left, you have the option to choose Ignore for any of the micro options, thus leaving out as many aspects as you want from the merge.
2. Special support for working in serial
In ATLAS.ti (extras menu) there is a user editor and user administration window allowing the setting up of new users and the ability to assign them full or restricted rights to change the HU.
3. Individual aspects of the infrastructure can be exported / imported
To set up an HU, individual elements can be transferred, e.g. the codes list can be exported (in XML) and then imported to a new project. Attribute style spreadsheets can be imported from another project to organise document families (subsets of documents), as long as they have the correct information and format (one option: the spreadsheet template can be created by exporting document families from e.g. the master or co-ordinator’s project into Excel). Exclusive super-codes (i.e. devices which act like codes but are combinations of codes from previously saved search expressions) are listed in the main codes list. Thus the distribution of a relatively empty HU, with a full list of codes and potential standardised queries embodied in the super-codes, can be another way of starting team members off with advanced query structures already in place.
4. Sharing and distributing output files from ATLAS.ti
Output files include documents with or without paragraph numbers plus efficient code mapping in margin (similar to illustration included with MAXqda details) (print only for code mapping). There is the usual capacity for coded outputs and for summarised lists with comments included in order to share code definitions etc. (file/print). The whole HU can be exported to clickable html files for web browsing, or can be configured using style sheets and XML export. The HU can also be exported direct to SPSS if the overall research project has a strong quantitative or mixed methods element that would benefit.
The external database is potentially more awkward for moving whole projects around, but the copy bundle tool makes backing up and HU transferral easier. The one-stop merge tool provides great flexibility, and it handles the unification of new work in existing memos well for second or later merges.
ATLAS.ti does not have hierarchical structure in its main listing of codes, but a quasi or cosmetic structure can be imposed by prefixing codes so that when alphabetically sorted they sit together much as in a hierarchy of related codes.
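The effect of prefixing codes can be shown with a short Python sketch. The code names and the underscore separator are invented for the example; any consistent prefix convention would behave the same way under alphabetical sorting.

```python
from itertools import groupby

# Invented code names using a PREFIX_name convention.
codes = ["WORK_conflict", "HOME_chores", "WORK_pay", "HOME_conflict", "WORK_hours"]


def group_by_prefix(code_names, sep="_"):
    """Sort codes alphabetically and group them by the prefix before `sep`."""
    ordered = sorted(code_names)
    return {
        prefix: list(members)
        for prefix, members in groupby(ordered, key=lambda name: name.split(sep, 1)[0])
    }


if __name__ == "__main__":
    for prefix, members in group_by_prefix(codes).items():
        print(prefix, members)
```

Because the list is flat, the grouping exists only through the sort order, which is why the structure is described above as quasi or cosmetic rather than a true hierarchy.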
ATLAS.ti is developed by Scientific Software, Berlin.
Initial statements about the software and merging projects in QDA Miner:
- It is not possible to work from separate workstations on the same QDA Miner project at the same time
- In addition to working in serial on the same project, one after the other, there are several ways to combine work or merge projects in QDA Miner (see points 1 to 3 below)
- QDA Miner was originally designed as the qualitative, coding module of a suite of software which included SimStat (statistical analysis of content), and WordStat (text-mining)
- Data which can be handled directly by QDA Miner includes text, pdf, graphics and spreadsheet data
Ways of working collaboratively
1. Merging projects
The merge feature will match cases and append to a master project the codings done to documents or images by someone else. It will also add new codes, new cases and new variables (if found), outputs stored in the report manager, as well as a command log of what the other person has been doing.
2. Special support for working in serial on one project or passing projects to users
QDA Miner has a special teamwork menu. Via this, the originator has the potential of full control of other users’ range of actions in the project, e.g. whether they can edit text, add or delete cases, modify variables, view codings created by other users etc. A duplicate project option will allow the administrator of a project to create copies of the project (with the option to rename it, typically to add the coder's name). This command can export the whole project, but if a filtering condition is active it will only export what is currently shown (allowing the other team member only a portion of the project). A send by email command will: a) duplicate the project; b) compress the project into a single ZIP file; c) create an email message with the ZIP file as an attachment and default text and title; d) keep a backup of the file that was sent (optional). The recipient can also use the same function to send the project back to the administrator. The full project may also be saved into a single WPZ file and read by anybody who has the free QDA Miner reader. This free reader allows one to do text search and coding retrieval on a project in its original format or in the portable WPZ file format. This means potentially that writing up can be achieved by members of the team who are non-experts with the software.
3. Individual aspects of the infrastructure can be exported / imported
QDA Miner has a special facility via which a spreadsheet of, say, survey data including quantitative information and open ended questions can actually initiate the creation of the project, in the process creating the cases and their text fields and assigning the descriptive variables to the cases direct from the spreadsheet. This means that in a mixed methods project, the team responsible for the quantitative research can feed directly into the qualitative part of the research. A codebook can also be imported from another project.
4. Sharing and distributing data via output
Most qualitative results and retrievals are organised into tables which can be exported. All tables (coding retrieval, memos, text search, crosstab searches across variables, command log, etc.) can be saved to disk in various formats including html, Excel, XML and SPSS. All graphics can be saved into BMP, JPG or PNG files. A special Report Manager is a location in the software where results, charts and memos can be selected and stored for export. The output manager can create a single HTML, RTF or MS Word file with all graphics, images, text segments, quotes, memos and tables stored. The full project may also be saved into a single WPZ file and read by anybody who has the free QDA Miner reader. This free reader allows one to do text search and coding retrieval on a project in its original format or in the portable WPZ file format.
QDA Miner has unique options for collaboration and mixed methods support, making the transferral of projects between team members as easy as it could possibly be.
QDA Miner is developed by Provalis Research Inc., Montreal, Canada.
Initial statements about the software and merging projects in NVivo 8 and NVivo 9:
- It is not possible to work from separate workstations on the same NVivo project at the same time (except with the Server version of NVivo 9)
- In addition to working in serial on the same project, one after the other, there are several ways to combine work or merge projects in NVivo (see points 1 to 3 below)
- Similar functions in NVivo 7 and 8 (no Server)
- NVivo was originally designed to work with textual data, now supports direct, similar functionality with multimedia (graphics, audio-visual) and spreadsheet datasets
Ways of working collaboratively
1. Merging whole or selected parts of projects
Merging whole or selected parts of projects is possible by importing one project into another – see illustration below.
Options to merge or create new duplicate objects are included. This means that if you choose the merge option, it detects objects that have the same label and the same content (e.g. memos) and merges them. If the system detects different content then it will create a duplicate item. In contrast – choosing the merge option for duplicate codes might be important. If two codes are the same i.e. same label, same hierarchy and same folder but are linked to different data segments then the codes will merge and the different segments will be added to the one code. This is usually the outcome required by the teams.
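The merge-or-duplicate behaviour described above for objects such as memos can be mimicked in a short Python sketch. This illustrates the rule only and is not NVivo's implementation: memos are represented as a dictionary of label to text, and the " (2)" suffix given to duplicates is a made-up naming scheme.

```python
def merge_memos(host, incoming):
    """Merge memo dictionaries (label -> text) under a merge-or-duplicate rule."""
    merged = dict(host)
    for label, text in incoming.items():
        if label not in merged:
            merged[label] = text  # new object: simply added
        elif merged[label] != text:
            # Same label, different content: keep both as separate items.
            # The " (2)" suffix is a hypothetical naming scheme for this sketch.
            merged[f"{label} (2)"] = text
        # Same label and same content: nothing to do, the objects merge.
    return merged


if __name__ == "__main__":
    # Invented example memos.
    host = {"Design": "notes", "Sampling": "first draft"}
    incoming = {"Design": "notes", "Sampling": "revised", "Ethics": "new memo"}
    for label, text in merge_memos(host, incoming).items():
        print(label, "->", text)
```

Trying the rule out on dummy data like this mirrors the advice elsewhere in this resource to experiment on small projects before merging real work.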
2. Merging the structures only
Merging only the structures of a project by importing one project into another, while ‘excluding content’, is shown in the above illustration. You can pro-actively check any import options offered in order to create the structural framework for the new related project. The options available include: nodes, node and source classifications, attributes etc., sets, folders and relationship types. Coding comparison reports can be generated on merges of early coded documents.
3. Individual structures
Individual structures such as classification sheets (with attributes and values) can be imported from related projects.
4. Sharing and distributing output files
Sharing and distributing output files e.g. retrieved coded segments with partial code mapping in margin; documents with options to include partial mapping of codes in the margin*, paragraph numbering, annotations and see also links (added as end-notes) (file/print); full or partial user-defined tables containing codes lists, memos, documents etc. Some output can be viewed in html via exporting coded data or the results of some queries. Exported coded audio visual clips are viewable in html.
With the Server version, NVivo enables synchronous work in real time. It has good variety and flexibility in merging options and excellent options for transferring a project framework (without content). *Coded margin output is not effective, as code mapping is usually printed on a separate page, but it is analytically useful since the coded margin is also available in coded retrieval panes.
NVivo is developed by QSR International, Doncaster, Victoria, Australia.
Synchronous work by multiple users
Some initial statements about NVivo 9 Server:
- NVivo 9 and NVivo Server 9 are separate software products. To work with NVivo projects stored on a server, an institution needs licenses for both NVivo 9 and NVivo Server 9. Each team member needs an NVivo 9 client access license (CAL), and a license is also required to install NVivo Server 9 onto the server.
- Licensing is based on the number of users and the hardware that NVivo Server 9 will be installed on. You can choose to allocate licenses to specific users, or to concurrent (floating) users, or a combination of both. NVivo Server 9 licensing overview will help with this process.
How collaborative work is enabled
Logistics of the Server element
NVivo Server is installed with its own SQL Server. The SQL Server database is created during installation. By ‘Server’ QSR mean within a local area network. Access via the internet might be possible, but only via ‘thin client’ software such as MS remote desktop. Get further information from QSR about this aspect. Licenses and user access to the server can be managed using the browser-based NVivo Server Manager. Project administration enables monitoring, maintaining, backing up and keeping track of all projects in a secure location. The Server installation means that projects can be much greater in size than in the standalone NVivo 9 situation, allowing more data and the embedding (internal database) of larger audio and video files (up to 100 MB) in your projects.
Managing a project
NVivo Server also provides for the control of who accesses the project and how. For example, the project supervisor has full rights to change any aspect of the project, and can give a colleague not directly involved with the project ‘read only’ access. Each user is authenticated according to their existing network accounts.
Tracking progress by users
Users with CALs can log in to any Server project for which they have permissions, and work normally (or they can work on their own stand-alone project).
A project event log (also available in the NVivo 9 standalone version) lets researchers see the changes that have been made to the project, for example which team member deleted an interview, analysed or ‘coded’ information to a specific theme, or edited a model. You can choose whether to use the event log by turning it on (or off) at any time, and you can filter the log, sort it or export it. Use it to check on the contributions made by each team member and to prompt discussions with your team if necessary.
Collaboration and analysis
Collaboration and analysis is carried out using the normal functionality in NVivo and because changes can be observed in real-time by users who are working on the same project, conventions can be used for communication and discussion areas in the software.
NVivo 9 Server has a complicated licensing system and requires expert installation. At this stage there is every reason to suppose that the synchronous context will add considerable benefits. Claims about the size of project it can handle efficiently are not yet thoroughly proven.
See NVivo 9 on the QSR International website for developer details.
Some initial statements about working with Transana:
- Transana is low cost open source software and was originally designed completely around the use and transcription of video (manual transcription, not voice recognition). It does not function without video. It was designed principally for individual use but is now available in multi-user format.
- Transana itself has three central components: the application program itself, any video or audio files that you are analysing, and a database that contains the analysis, including transcripts.
- The single-user version of Transana is designed for projects where all the video analysis is conducted on a single computer. Only one person at a time can work on it.
- The multi-user version of Transana (Transana-MU) can split up these components. It allows users on different computers (and in different physical locations) to make changes to the same analysis files (the Transana database).
- If you only want to share video or audio data, and synchronous ‘share access’ to the Transana database is not required, then you do not need Transana-MU. Since these files don't change there is no barrier to everyone having his or her own copy. It is necessary to be clear about this, since set-up logistics for the multi user version are not trivial.
- If the people who are collaborating want to be able to keep their data in the same Transana database, they should use Transana-MU.
How collaborative work is enabled in Transana-MU
1. Constraints of sharing video over the internet
Synchronous collaboration and the simultaneous viewing of video data require each distinct location (i.e. each separate computer network) to have its own copy of each video. This is because video files are large and access times for video over the Internet are slow: an hour of video in the most compressed usable format takes about 650MB of disk space. Internet traffic-related delays would be frustrating, and most streaming video formats do not provide the capacity to select an arbitrary location in a video file the way Transana requires. This problem is solved by each location having a set of the video files.
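The 650MB per hour figure quoted above gives a rough basis for planning storage and transfer at each location. The Python sketch below is simple arithmetic; the function names, the decimal conversion (1 GB taken as 1000 MB) and the example of 40 hours over an 8 Mbit/s link are assumptions chosen for illustration, not figures from Transana.

```python
MB_PER_HOUR = 650  # figure quoted in the text for the most compressed usable format


def storage_gb(hours_of_video):
    """Approximate disk space in GB (1 GB taken as 1000 MB)."""
    return hours_of_video * MB_PER_HOUR / 1000


def transfer_hours(hours_of_video, link_mbit_per_s):
    """Approximate time in hours to move the video over a link of the given speed."""
    megabits = hours_of_video * MB_PER_HOUR * 8  # 8 bits per byte
    return megabits / link_mbit_per_s / 3600


if __name__ == "__main__":
    # 40 hours of video and an 8 Mbit/s link are arbitrary example values.
    print(storage_gb(40), "GB")
    print(round(transfer_hours(40, 8), 1), "hours to transfer")
```

Estimates like these ignore protocol overhead and shared network load, so real transfers would be expected to take longer.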
2. Transferring video between locations
The Storage Resource Broker (SRB): to help solve problems related to distributing video to each location, the SRB is a server that facilitates secure, encrypted transfer of video between locations. In addition, if you have a lot of video, the SRB can ease storage problems. With a centralised repository holding all the video, each location can decide which videos it wants to keep a local copy of and pull down videos as needed, so that no location needs enough storage to hold the whole collection all the time.
3. Collaborative working
Multiple transcripts for each video can be made and synchronised with the video. Analysis centres around a series of videos, the creation of collections and clips, and the assignment of searchable keywords to the transcript sections of the clips. A researcher can view collections and clips and make changes to the database (for example, updating keywords or a transcript) at the same time as someone else is doing so, even if that person is at a different location. Teams can share analytic markup with distant colleagues to facilitate collaborative analysis, view graphical and text-based reports about analytic coding, and engage in complex data mining and hypothesis testing across large video collections.
Transana is the obvious choice of qualitative software tool if the data in a project (or part of a project) is primarily video - particularly if there are large quantities of data. The ability to create multiple transcripts/accounts/notes/summaries for each video is unusual and offers teams (or individuals) the chance to focus on different aspects of an activity for comparison. The logistics of simultaneous viewing have been thoroughly mastered via the SRB.
Transana was originally created by Chris Fassnacht. It is now developed and maintained by David K. Woods at the Wisconsin Center for Education Research, University of Wisconsin-Madison.
Initial statements about Dedoose:
- Dedoose is a web 2.0 application which is specifically designed for simultaneous collaborative work, but is also used by individual researchers. Web 2.0 applications are run from the originator’s website, thus there is nothing to download. Access to your project once created is via the internet.
- Designed to serve the management and analysis of qualitative and particularly mixed method research.
- Security: access to a project is protected by encrypted transmission and password protection. Nightly back-ups are made. The encryption protocols are those of the US National Security Agency.
- It has an unusual sliding subscription pricing mechanism, with a minimum of one month's use, to ease the costs of stop-and-start working rhythms.
How collaborative work is enabled in Dedoose
1. Working jointly
Researchers can tap into a project easily at different times, simply by logging on. Dedoose has a project chat system that people can use to communicate in real time while they are online in a project together. The memo system is another tool that can be used for communication among team members. Team members can effectively be working on the same document at the same time. When a task is actioned (e.g. an excerpt is made), it is submitted to the joint project for others to see when the page is refreshed, i.e. when the user comes out of that document, page or window.
2. Qualitative analysis
Text can be split into excerpts, which can overlap or be embedded within one another (they are marked in different colours in the text and can be listed). Each excerpt can be coded or tagged to numerous themes or topics, using a weighting device if necessary to scale the strength or emphasis expressed by a respondent. The occurrence, frequency and relative weight of codes can be visualised in various interactive charts. Mixed methods: Dedoose has all the capacity of the first group of software above to integrate quantitative with qualitative data, e.g. demographic data about respondents, via the application of descriptors, either manually or by importing material from spreadsheets. It also deals with quantitative data in their own right.
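For readers who find it easier to think about this in terms of data, the ideas of overlapping or embedded excerpts and weighted codes can be sketched in a few lines of Python. This is only an illustration under our own assumptions: the names Excerpt and summarise_codes, the character offsets and the 1 to 5 weights are all hypothetical, and none of this reflects how Dedoose stores or processes a project internally.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Excerpt:
    """A span of text in a document, identified by character offsets (hypothetical model)."""
    start: int  # inclusive offset
    end: int    # exclusive offset
    # Maps a code (theme) name to an illustrative weight, e.g. on a 1-5 scale.
    codes: dict = field(default_factory=dict)

    def overlaps(self, other):
        """True if the two excerpts share at least one character."""
        return self.start < other.end and other.start < self.end

    def contains(self, other):
        """True if `other` is embedded entirely within this excerpt."""
        return self.start <= other.start and other.end <= self.end


def summarise_codes(excerpts):
    """Return {code: (frequency, mean weight)} across all excerpts."""
    weights = defaultdict(list)
    for excerpt in excerpts:
        for code, weight in excerpt.codes.items():
            weights[code].append(weight)
    return {code: (len(ws), sum(ws) / len(ws)) for code, ws in weights.items()}


# Invented example data: three excerpts from one transcript.
excerpts = [
    Excerpt(0, 120, {"workload": 4, "support": 2}),
    Excerpt(80, 200, {"workload": 2}),   # overlaps the first excerpt
    Excerpt(90, 110, {"support": 5}),    # embedded within both of the others
]
summary = summarise_codes(excerpts)
```

A summary of this kind (how often each code occurs and its average weight) is the sort of information that the interactive charts present visually.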
Dedoose is the ultimate in simplicity: there is no installation and there are few logistical concerns except for the normal constraints of internet use. It offers basic coding and advanced interrogation tools, together with sophisticated interactive visualisation tools to promote discussion, analysis and presentation. The software would be easy for the whole team to learn online, with effective video tutorials and ready support from the developers.
Dedoose was developed by Drs. Tom Weisner and Eli Lieber of the University of California, Los Angeles.
di Gregorio, S. and Davidson, J. (2008) Qualitative Research for Software Users. UK: McGraw Hill, Open University Press.
Glaser, B. G. and Strauss, A. (1967) The Discovery of Grounded Theory. Chicago: Aldine.
Lewins, A. and Silver, C. (2007) Using Software in Qualitative Research: A Step-by-Step Guide. London: Sage Publications.
Strauss, A. and Corbin, J. (2015) The Basics of Qualitative Research: Techniques and Procedures for Developing Grounded Theory, 4th edition. Thousand Oaks: Sage Publications.
Detailed, practical, step-by-step help for using ATLAS.ti, MAXqda and NVivo is included in Lewins and Silver (2007). See also the online help menus in the different CAQDAS packages.
Face-to-face training is available at training events held by the CAQDAS Networking Project.
Di Gregorio and Davidson (2008) provide an in-depth study of project design and CAQDAS. They use many project exemplars to illustrate how planning for CAQDAS and project design come together effectively.