Towards an international language for incontinence‐associated dermatitis (IAD): design and evaluation of psychometric properties of the Ghent Global IAD Categorization Tool (GLOBIAD) in 30 countries

Incontinence‐associated dermatitis (IAD) is a specific type of irritant contact dermatitis with different severity levels. An internationally accepted instrument to assess the severity of IAD in adults, with established diagnostic accuracy, agreement and reliability, is needed to support clinical practice and research.

Objectives To design the Ghent Global IAD Categorization Tool (GLOBIAD) and evaluate its psychometric properties.
Methods The design was based on expert consultation using a three-round Delphi procedure with 34 experts from 13 countries. The instrument was tested using IAD photographs, which reflected different severity levels, in a sample of 823 healthcare professionals from 30 countries. Measures for diagnostic accuracy (sensitivity and specificity), agreement, interrater reliability (multirater Fleiss kappa) and intrarater reliability (Cohen's kappa) were assessed.
Results The GLOBIAD consists of two categories based on the presence of persistent redness (category 1) and skin loss (category 2), both of which are subdivided based on the presence of clinical signs of infection. The agreement for differentiating between category 1 and category 2 was 0.86 [95% confidence interval (CI) 0.86-0.87], with a sensitivity of 90% and a specificity of 84%. The overall agreement was 0.55 (95% CI 0.55-0.56). The Fleiss kappa for differentiating between category 1 and category 2 was 0.65 (95% CI 0.65-0.65). The overall Fleiss kappa was 0.41 (95% CI 0.41-0.41). The Cohen's kappa for differentiating between category 1 and category 2 was 0.76 (95% CI 0.75-0.77). The overall Cohen's kappa was 0.61 (95% CI 0.59-0.62).
Conclusions The development of the GLOBIAD is a major step towards a better systematic assessment of IAD in clinical practice and research worldwide. However, further validation is needed.
What's already known about this topic?
• Incontinence-associated dermatitis (IAD) is an irritant contact dermatitis in adults with incontinence.
• Ten IAD severity categorization instruments have been developed, some of which have been found to be time-consuming and (linguistically) complex when used in clinical practice.
• A universal IAD classification system is needed to guide practice, inform educational platforms and support research.
What does this study add?
• The Ghent Global IAD Categorization Tool is based on input from international experts and was psychometrically tested by 823 healthcare professionals from 30 countries.
• The accuracy of differentiating between a diagnosis of erythema vs. skin loss was high when IAD was classified based on images.
• The identification of clinical signs of infection can be prone to error.
The prevention and treatment of diaper dermatitis in babies and small infants has been a recognized topic of dermatological research and practice for decades. 1 This cutaneous problem not only occurs in paediatric patients, but is also common in adults and is widely accepted as incontinence-associated dermatitis (IAD). 2 IAD is a specific type of irritant contact dermatitis caused by prolonged contact of the skin with urine or faeces, and is characterized by erythema and oedema of the perianal or genital skin. In some cases, the clinical picture is accompanied by bullae, erosion or secondary cutaneous infection. 3 The aetiology of IAD is complex and multifactorial. 4 Excessive skin surface moisture resulting in skin maceration, chemical irritation and physical irritation increases the skin surface pH and enhances the permeability of the skin, thereby compromising the skin barrier function. 5 Therefore, the skin is more permeable to irritants and pathogens. 6 The most common microorganisms associated with IAD are Candida albicans, Escherichia coli and Clostridium difficile from the gastrointestinal tract, and Pseudomonas aeruginosa and Staphylococcus aureus from the perineal skin. [3][4][5]7 The epidemiology of IAD varies across different countries, healthcare settings and patient populations. The incidence of IAD is between 3.4% and 50% and the prevalence of IAD is estimated to be between 5.7% and 27%, with the highest prevalence in acute care settings. 3,8 While certain patient populations may be more vulnerable to IAD, wide variations in the prevalence of IAD could be explained by the lack of internationally agreed diagnostic criteria to differentiate IAD from other skin conditions such as superficial pressure ulcers. 9
In line with the National Pressure Ulcer Advisory Panel (NPUAP) and the European Pressure Ulcer Advisory Panel (EPUAP) pressure ulcer classification system, the systematic assessment of IAD using a valid and reliable international classification tool is recommended. 9 A recent Cochrane review revealed a substantial heterogeneity of reported outcomes and instruments in IAD research. 10 To date, 10 IAD-related instruments have been developed, [11][12][13][14][15][16][17][18][19][20] three of which were developed for IAD risk assessment, 14,19,20 nine for describing the severity of IAD, [11][12][13][15][16][17][18][19][20] and two instruments have been developed for the classification and treatment of IAD. 18,19 Five instruments propose global assessment and categorize IAD as mild, moderate or severe, 13,[15][16][17][18][19][20] whereas the other instruments use a (cumulative) scoring system to delineate the severity or risk on a continuum or dimension. 11,[13][14][15][16] Four instruments assess patient-specific symptoms, such as pain and burning. 11,12,19,20 An ideal instrument should measure IAD consistently and accurately. 21 Content validity was assessed by experts in only four instruments. [14][15][16]20 The psychometric properties of five instruments were tested through the assessment of patients 14,22 or photographs. 15,16,23 In addition, several instruments 11,13,17 were found to be time-consuming and complex when used in clinical practice. 24 Therefore, in 2015 an international expert panel proposed a simplified IAD severity categorization tool. 25 It included the following three categories: 'no redness and skin intact' (at risk, category 0), 'red but skin intact' (category 1) and 'red with skin breakdown' (signs can include vesicles, denudation and/or skin infection) (category 2). 25 However, this classification was not developed in a formal way and its psychometric properties have not been tested. 
The aim of this study was to develop this tool further and to evaluate its psychometric properties.

Materials and methods
A two-phase psychometric instrument development and validation study was conducted. Phase I included the design and content validation, and phase II included the evaluation of the psychometric properties of the instrument.

Phase I: instrument design and content validation
The initial version of the simplified tool was used for content validation. To achieve consensus on the content validity of the tool, the Delphi method was used to allow a panel of experts to provide feedback on the tool and present arguments in order to justify their viewpoints. The panel consisted of 34 experts from different fields of IAD expertise [clinical (n = 17), research (n = 21) and education (n = 11)] from Australia (n = 2), Austria (n = 4), Belgium (n = 4), Czech Republic (n = 1), France (n = 1), Germany (n = 1), Norway (n = 1), Italy (n = 2), South Africa (n = 1), Spain (n = 13), Turkey (n = 1), the U.K. (n = 2) and the U.S.A. (n = 1). In the first round, the expert panel was sent an invitation in an e-mail that included a link to an online survey (software package LimeSurvey®; http://limesurvey.org). The experts were asked whether they agreed with, or had any comments on, the proposed purpose, structure (e.g. number of items) and categories of the tool. Next, the experts were asked whether they had any comments concerning the definitions or the proposed diagnostic criteria for the three categories, and whether they had any additional comments. After the first round, the results were summarized and presented to the participants. In the second and third rounds, the participants were asked whether they agreed with, or had any comments on, the revised tool.

Phase II: evaluation of psychometric properties
The aim of this phase was to examine diagnostic accuracy of the instrument in addition to interrater and intrarater reliability and agreement. A total of 34 photographs were selected by two experts in IAD diagnostics who had extensive expertise in research and clinical practice (D.B. and S.S.). An online survey was developed (LimeSurvey) and translated into the 14 languages of the 30 participating countries by native speakers with extensive content expertise. Back translation was not performed. The survey included information on the procedure and confidentiality, demographic questions, the tool and the photographs. Diagnostic accuracy was measured by comparing the ratings of the participants with those of the two experts (reference standard). Interrater reliability and agreement was examined for the ratings of the participants. Intrarater reliability and agreement, with a 1-week interval between ratings, was examined for all participants.

Participants
An online survey was set up between January and March 2017 in a convenience sample of healthcare professionals. Participants were recruited in Australia, Austria, Belgium, Canada, Croatia, Czech Republic, Denmark, France, Germany, Hungary, Italy, Norway, Portugal, Saudi Arabia, Slovakia, Spain, the Netherlands, Turkey, the U.K. and the U.S.A. The call to participate, including the link to the online survey, was sent by e-mail to the EPUAP, the NPUAP, the European Wound Management Association, the Pan Pacific Pressure Injury Alliance (representing Wounds Australia, New Zealand Wound Care Society, Hong Kong Enterostomal Therapist Society and Wound Healing Society Singapore), the Wound, Ostomy and Continence Nurses Society, Wounds Canada, the Canadian Association for Enterostomal Therapy and the Wound Healing Association of Southern Africa. The wound care organizations disseminated the call by publishing an announcement on their websites or by e-mailing members.

Photographs
In total, 34 photographs of IAD were selected and categorized by two experts in IAD diagnostics (Table S1; see Supporting Information). This set of photographs included two photographs from patients with darkly pigmented skin. The sample size calculation was performed in the statistical software package R 26 using the function CI4Cats in the kappaSize R library (version 1.1) 27,28 to determine the number of photographs needed to study the interrater reliability with four outcome categories. The confidence interval (CI) approach was used to estimate the sample size for kappa calculation (K). A minimum of 33 photographs was required, based on an anticipated K-value of 0.8 (based on previous research), 29 an expected lower bound for a one-sided 95% CI of 0.7, and the prevalence rates per category (i.e. the estimated prevalence in daily practice: category 1A = 25%, category 1B = 15%, category 2A = 30%, category 2B = 30%).

Ethical considerations
The procedure was approved by the ethics committee of Ghent University Hospital (B670201627633). All participants received full information before the start of the study. In the questionnaires, the purpose and procedure were fully explained, and anonymity and confidentiality were assured. The return of a completed questionnaire was taken as an indication of consent to participate.

Data analysis
Diagnostic accuracy, agreement and reliability were calculated. The primary outcome measure was the four-category classification of the 34 photographs according to the Ghent Global IAD Categorization Tool (GLOBIAD) based on persistent redness, skin loss and clinical signs of infection. As secondary outcome measures, two binary measures were considered: firstly, the classification for persistent redness or skin loss; and secondly, the classification for cases with or without clinical signs of infection.
Summary measures of overall and specific agreement for all levels of the outcome measures were calculated. The summary measures were the estimated mean with 95% CI, the estimated median value and the interquartile range, and the 2.5th and 97.5th percentiles of the characteristic, based on the evaluations of the individual raters with respect to the reference standard. The diagnostic accuracy for secondary outcome measures was assessed using summary measures for sensitivity and specificity for each rater with respect to the reference standard.
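The per-rater measures described above can be illustrated with a minimal sketch. It is not the authors' R code. It assumes the usual definition of the proportion of specific agreement (twice the number of joint assignments to a category divided by the total number of times either party used that category), and it assumes that category 2 (skin loss) is treated as the positive class for sensitivity and specificity; the source states neither choice explicitly. The function names and ratings are hypothetical.

```python
def overall_agreement(rater, reference):
    """Proportion of photographs on which the rater matches the reference standard."""
    return sum(r == g for r, g in zip(rater, reference)) / len(reference)


def specific_agreement(rater, reference, category):
    """Assumed definition: 2 * joint uses / (rater uses + reference uses)."""
    both = sum(r == category and g == category for r, g in zip(rater, reference))
    used = rater.count(category) + reference.count(category)
    return 2 * both / used if used else float("nan")


def sensitivity_specificity(rater, reference, positive):
    """Sensitivity and specificity of one rater for a binary outcome.

    Raises ZeroDivisionError if the reference contains no positive
    or no negative cases.
    """
    pairs = list(zip(rater, reference))
    tp = sum(r == positive and g == positive for r, g in pairs)
    fn = sum(r != positive and g == positive for r, g in pairs)
    tn = sum(r != positive and g != positive for r, g in pairs)
    fp = sum(r == positive and g != positive for r, g in pairs)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical ratings of six photographs (illustrative, not study data).
reference = ["1A", "1B", "2A", "2B", "2A", "1A"]
rater = ["1A", "1A", "2A", "2A", "2B", "2A"]

# Secondary outcome: collapse the four categories to the main category (1 vs 2).
reference_main = [label[0] for label in reference]
rater_main = [label[0] for label in rater]

print(overall_agreement(rater, reference))
print(specific_agreement(rater, reference, "2A"))
print(sensitivity_specificity(rater_main, reference_main, positive="2"))
```

Repeating these calculations for every rater and then summarizing the resulting values (mean, median, percentiles) would correspond to the summary measures reported in the study.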
The interrater reliability and agreement among raters was assessed using Fleiss kappa for multiple raters. 30 The scores of the reference standard were not included in the multirater Fleiss kappa. The intrarater reliability and agreement was examined by comparing the first and second ratings of the same photographs for participants who participated twice within 1 week. No feedback was provided between the test and retest. The photographs were presented in a random order to reduce potential bias. Summary measures of Cohen's kappa, overall agreement and specific agreement for all levels of the outcome measures were calculated for each individual rater.
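As an illustration of the multirater statistic, the following minimal sketch implements the standard Fleiss kappa formula for a fixed number of raters per photograph. It is an assumption-laden stand-in for the R routine used in the study: it does not compute confidence intervals, and the helper names and ratings are hypothetical.

```python
def to_counts(ratings, categories):
    """Turn per-photograph label lists into a photographs x categories count table."""
    return [[photo.count(c) for c in categories] for photo in ratings]


def fleiss_kappa(counts):
    """Fleiss kappa; counts[i][j] = number of raters placing photograph i in category j.

    Assumes every photograph was rated by the same number of raters (at least two)
    and that the raters did not all use a single category (which would divide by zero).
    """
    n_subjects = len(counts)
    n_raters = sum(counts[0])
    n_categories = len(counts[0])

    # Share of all assignments that fell into each category.
    p_j = [
        sum(row[j] for row in counts) / (n_subjects * n_raters)
        for j in range(n_categories)
    ]
    # Observed agreement among rater pairs for each photograph.
    p_i = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_observed = sum(p_i) / n_subjects
    p_expected = sum(p * p for p in p_j)
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical ratings: four photographs, five raters each (not study data).
categories = ["1A", "1B", "2A", "2B"]
ratings = [
    ["1A", "1A", "1A", "1A", "1B"],
    ["1B", "1A", "1B", "1B", "1A"],
    ["2A", "2A", "2A", "2B", "2A"],
    ["2B", "2A", "2B", "2B", "2B"],
]
print(round(fleiss_kappa(to_counts(ratings, categories)), 3))
```

In this sketch the reference standard is simply left out of `ratings`, mirroring the decision not to include the experts' scores in the multirater kappa.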
The criteria for the K coefficient by Landis and Koch were used to interpret the results (< 0.00 = poor, 0.00-0.20 = slight, 0.21-0.40 = fair, 0.41-0.60 = moderate, 0.61-0.80 = substantial, and 0.81-0.99 = almost perfect). 31 All measures were calculated in R version 3.4.1. 26 The concordance function in the R-library raters version 2.0.1 was used to obtain Fleiss kappa and 95% CIs, and the kappa2 function in the interrater reliability and agreement R-library version 0.84 was used to obtain the Cohen's kappa.
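The test-retest statistic and its interpretation can be sketched as follows. This is a minimal, unweighted Cohen's kappa written for illustration; it is not the R function used in the study and it omits confidence intervals. The cut-offs in `landis_koch` follow the bands quoted above, and all names and ratings are hypothetical.

```python
def cohen_kappa(first, second):
    """Unweighted Cohen's kappa between two rating rounds of the same photographs.

    Raises ZeroDivisionError if both rounds use one identical category throughout.
    """
    n = len(first)
    categories = set(first) | set(second)
    p_observed = sum(a == b for a, b in zip(first, second)) / n
    # Chance agreement from the marginal category frequencies of each round.
    p_expected = sum(
        (first.count(c) / n) * (second.count(c) / n) for c in categories
    )
    return (p_observed - p_expected) / (1 - p_expected)


def landis_koch(kappa):
    """Map a kappa value to the Landis and Koch label quoted in the text."""
    if kappa < 0:
        return "poor"
    for upper, label in [
        (0.20, "slight"),
        (0.40, "fair"),
        (0.60, "moderate"),
        (0.80, "substantial"),
    ]:
        if kappa <= upper:
            return label
    return "almost perfect"


# Hypothetical first and second ratings by one participant (not study data).
first_round = ["1A", "1A", "2A", "2B"]
second_round = ["1A", "1B", "2A", "2B"]

kappa = cohen_kappa(first_round, second_round)
print(round(kappa, 3), landis_koch(kappa))
```

Applied to the values reported in the abstract, `landis_koch(0.41)` falls in the moderate band and `landis_koch(0.61)` in the substantial band.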

Results

Instrument design and content validation
The tool that emerged after the third Delphi round can be found in Figure S1 (see Supporting Information). An overview of the instrument design process is presented in Figure 1.
A first point of discussion was the purpose of the instrument. Several experts emphasized the need for a simplified and clear tool to classify IAD. The twofold purpose of the instrument was approved after the second Delphi round. During the Delphi procedure, different items were added to the categories (such as a range of clinical signs of infection). A number of items were included in a glossary of terms in order to enhance clarity. These terms were defined according to the terminology of the International League of Dermatological Societies and approved in the third Delphi round. 32 As determined by the experts, the addition of pain as one of the signs of inflammation, and other patient symptoms, emerged as very important factors to be included in each category. A final point of discussion was the inclusion or exclusion of category 0, which described patients who had intact skin but were considered to be at risk. After the second Delphi round, it was decided that category 0 should be deleted in order to be in line with existing disease classifications currently used in medicine. The absence of a condition is rarely classified and would cause difficulties during psychometric evaluation.
The GLOBIAD consists of the following two main categories: persistent redness (category 1) and skin loss (category 2). Each category is subdivided into IAD (A) without and (B) with clinical signs of infection. Next to these critical criteria, additional criteria are given. Each category is visualized with characteristic images. Category 1A is displayed in Figure 2.
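The two-by-two structure of the categories can be written down as a small decision rule. The sketch below is only an illustration of that structure, assuming that IAD is already present (so intact skin implies persistent redness); it ignores the additional criteria and is not a clinical tool. The function name is hypothetical.

```python
def globiad_category(skin_loss: bool, signs_of_infection: bool) -> str:
    """Return the GLOBIAD label from the two critical criteria.

    Assumes IAD is present: without skin loss the finding is persistent
    redness (category 1); with skin loss it is category 2. The suffix is
    A without and B with clinical signs of infection.
    """
    main = "2" if skin_loss else "1"
    sub = "B" if signs_of_infection else "A"
    return main + sub


print(globiad_category(skin_loss=False, signs_of_infection=False))  # 1A
print(globiad_category(skin_loss=True, signs_of_infection=True))    # 2B
```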

General characteristics of the participants
A total of 823 participants (84.6% women) completed the first step and 463 completed the second step (Table 1). More detailed information about the countries where the participants worked can be found in Table S2 (see Supporting Information).

Diagnostic accuracy and agreement
The diagnostic accuracy and agreement between participants and the reference standard is presented in Table 2. The average overall agreement ranged from 0.55 (95% CI 0.55-0.56) for all categories, to 0.64 (95% CI 0.64-0.65) for differentiating between categories A and B, to 0.86 (95% CI 0.86-0.87) for differentiating between categories 1 and 2. The lowest mean specific agreement was found for categories 1B and 2B [0.47 (95% CI 0.45-0.48) and 0.47 (95% CI 0.46-0.48), respectively]. The highest mean specific agreement was found for category 1A (0.72, 95% CI 0.71-0.73). A mean sensitivity of 90% (95% CI 0.89-0.91) and a mean specificity of 84% (95% CI 0.83-0.85) was found for classifying categories 1 and 2. Sensitivity and specificity for classifying categories A and B was much lower. A higher overall agreement was found in participants who described themselves as experts, ranging from 0.61 for all categories, to 0.70 for differentiating between categories A and B, to 0.88 for differentiating category 1 from category 2.

Figure (flowchart of the evaluation procedure): International dissemination of survey (34 photographs). Step 1: diagnostic accuracy, overall proportion of agreement, proportion of specific agreement, interrater reliability. Test-retest procedure with 1-week interval. Step 2: intrarater reliability, overall proportion of agreement, proportion of specific agreement.

Discussion
IAD is highly prevalent among individuals with urinary and/or faecal incontinence. 3 The heterogeneity of reported outcomes and instruments points towards a need for standardized classification. 10 The aim of this study was to develop the GLOBIAD and evaluate its psychometric properties; the input from a group of international experts and clinicians was used to create an internationally agreed description of IAD and standardize the documentation for clinical practice and research.
The content and face validity of the GLOBIAD were supported by international expert review and input. The key diagnostic criteria for IAD are persistent redness, skin loss and clinical signs of infection. The agreement among experts after the Delphi process was 100%. IAD was classified as persistent redness or skin loss, two of the most distinguishing features of IAD according to the opinions of 34 international experts. The clinical presentation of skin loss and erythema could be explained by the underlying pathophysiology of IAD. [3][4][5] The presence of erythema and skin loss is also consistently reflected in all available IAD assessment tools. [11][12][13][14][15][16][17][18][19][20] The assessment of clinical signs of infection was considered important and clinically relevant by the experts when categorizing IAD, as this affects the choice of intervention. This is in line with the high prevalence of cutaneous infections (between 19% and 63%). 6,7,[33][34][35] Finally, because the purpose of the tool did not include risk assessment, category 0 was deleted.
In this study, the diagnostic accuracy and reliability of GLOBIAD was examined in an international sample of 823 healthcare professionals. Sensitivity and specificity estimates indicated a high degree of diagnostic accuracy for distinguishing between intact but erythematous skin and skin loss when healthcare professionals applied this tool to the presented images. There seemed to be a lower degree of diagnostic accuracy when assessing clinical signs of infection. Local signs indicating an infection include erythema, warmth, swelling, purulent exudate and pain, 36 some of which cannot be assessed using photographs. As it is difficult to diagnose wound infection based on clinical observation alone, a (semi-)quantitative swab of the wound could be considered. 36,37 However, this technique is time-consuming, expensive and of limited accuracy. 38 The correct and early detection of clinical signs of infection by a healthcare professional is crucial in the management of IAD. 38,39 Inadequate treatment can cause delayed wound healing, prolonged hospitalization and an increase in costs. 40 The results of the interrater reliability estimates can be interpreted in a similar way. The participants were better able to distinguish between intact and eroded skin compared with identifying signs of infection. For content validity reasons, it was decided that the clinical signs of infection should be included in the final tool. Intrarater reliability and agreement across all four categories was 'substantial' according to the proposed interpretation by Landis and Koch. However, this might be too low to be used for individual clinical decision-making, as one may expect an almost perfect agreement when diagnosing the severity of IAD. 41 The strengths of the study were the sound content and the face validation by a large group of international stakeholders, which will facilitate and contribute to the global dissemination of the tool. This study had limitations. 
The use of photographs provides only a two-dimensional perspective, and important clinical signs of infection like warmth, swelling, pain and itching were not detectable. Further validation in clinical practice (including patients affected by IAD) and other methods for validity testing are required. In addition, it is also well known that the 'base rate' (Table S1; see Supporting Information) influences the reliability estimates. 41 As the number of images with clinical signs of infection was lower (based on an estimated prevalence in clinical practice), sensitivity and specificity, and reliability may have been affected. In addition, there were only two images of darkly pigmented skin. This may limit the applicability of the results to all skin phototypes. Translations were carried out by native speakers with extensive content experience in the field of IAD but back translation was not performed. 42 IAD and pressure ulcers are frequently classified incorrectly. 9,29,43 In this study, a higher interrater agreement and reliability was found among more experienced and more highly educated clinicians. The correct classification of IAD requires a profound knowledge and clear understanding of the pathophysiology, signs and symptoms of this condition. 43 The reliability of IAD assessment and the level of correct scoring will improve when sufficient and adequate education and training are provided. 43 The GLOBIAD was developed as a simple, straightforward and time-saving instrument that can be easily implemented by educators. 24 More research is needed to evaluate the reliability of GLOBIAD and to find out whether better classification skills would improve IAD prevention and treatment.