Context
In 2015, the updated Prostate Imaging Reporting and Data System version 2 (PI-RADSv2) for the detection of prostate cancer (PCa) was established. Since then, several studies assessing the value of PI-RADSv2 have been published.
Objective
To review the diagnostic performance of PI-RADSv2 for the detection of PCa.
Evidence acquisition
MEDLINE and EMBASE databases were searched up to December 7, 2016. We included diagnostic accuracy studies that used PI-RADSv2 for PCa detection, using prostatectomy or biopsy as the reference standard. The methodological quality was assessed by two independent reviewers using the Quality Assessment of Diagnostic Accuracy Studies-2 tool. Sensitivity and specificity of all studies were calculated. Results were pooled and plotted in a hierarchical summary receiver operating characteristic plot with further exploration using meta-regression and multiple subgroup analyses. Head-to-head comparison between PI-RADSv1 and PI-RADSv2 was performed for available studies.
Evidence synthesis
Twenty-one studies (3857 patients) were included. The pooled sensitivity was 0.89 (95% confidence interval [CI] 0.86–0.92) with specificity of 0.73 (95% CI 0.60–0.83) for PCa detection. The proportion of patients with PCa, magnetic field strength, and reference standard were significant factors affecting heterogeneity (p < 0.01). Multiple subgroup analyses showed consistent results. In six studies performing head-to-head comparison, PI-RADSv2 demonstrated higher pooled sensitivity of 0.95 (95% CI 0.85–0.98) compared with 0.88 (95% CI 0.80–0.93) for PI-RADSv1 (p = 0.04). However, the pooled specificity was not significantly different (0.73 [95% CI 0.47–0.89] vs 0.75 [95% CI 0.36–0.94], respectively; p = 0.90).
Conclusions
PI-RADSv2 shows good performance for the detection of PCa. PI-RADSv2 has higher pooled sensitivity than PI-RADSv1 without significantly different specificity.
Patient summary
We reviewed all previous studies using Prostate Imaging Reporting and Data System version 2 (PI-RADSv2) for prostate cancer detection. We found that the updated PI-RADSv2 shows significant improvement compared with the original PI-RADSv1.
Prostate cancer (PCa) is the second leading cause of cancer-related mortality in Western men [1]. With recent technological advances and growing availability, multiparametric magnetic resonance imaging (mpMRI) is increasingly being used to guide several aspects of PCa management, including detection, staging, and treatment planning [2]. Despite abundant evidence in the literature of the high accuracy of mpMRI for PCa diagnosis, its widespread acceptance has been hampered by several factors, including difficulty of interpretation, the lack of standardized interpretation criteria (ie, the use of Likert scales based on the radiologist's subjective level of suspicion for PCa), and the resulting substantial inter-reader variability [3 4].
To bring standardization to the evaluation and reporting of mpMRI of the prostate, the European Society of Urogenital Radiology (ESUR) published a guideline termed the Prostate Imaging Reporting and Data System (PI-RADS) in 2012 [5]. PI-RADS was based on expert consensus and provides a detailed scoring system for each MRI sequence (T2-weighted imaging [T2WI], diffusion-weighted imaging [DWI], dynamic contrast-enhanced MRI [DCE-MRI], and MR spectroscopy) for the presence of clinically significant PCa (csPCa). Several investigators have validated the accuracy and reproducibility of PI-RADS, and a recent meta-analysis reported pooled sensitivity and specificity of 0.78 and 0.79, respectively [6]. However, as there was no guideline for generating an overall score, research groups derived it in different ways: some summed the scores from each sequence (ranging from 3 to 15), whereas others assigned an overall score of 1–5 [7 8]. Furthermore, emerging data questioned the value of curve-type analysis of DCE-MRI [9]. In addition, investigators suggested that, rather than all sequences being weighted equally, some sequences should carry more weight in determining the likelihood of PCa (ie, DWI in the peripheral zone [PZ] and T2WI in the transition zone [TZ]) [10].
To address these issues, the ESUR and American College of Radiology recently released the updated PI-RADS version 2 (PI-RADSv2) [11]. The main changes from PI-RADSv1 to PI-RADSv2 are (1) the introduction of dominant sequences according to zonal anatomy (DWI for the PZ and T2WI for the TZ), (2) a limited contribution of DCE-MRI, which is scored only as the presence or absence of early focal enhancement, and (3) the generation of an overall score (1–5) that integrates findings across all MRI sequences.
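To make the dominant-sequence principle concrete, the following is a minimal sketch of how an overall PI-RADSv2 category could be derived for a single lesion. It is a simplified reading of the PI-RADSv2 assessment rules [11], not a complete implementation: we assume that DWI determines the category in the PZ, with a positive DCE-MRI upgrading a DWI score of 3 to 4, and that T2WI determines the category in the TZ, with a DWI score of 5 upgrading a T2WI score of 3 to 4. The function name and arguments are hypothetical, and cases with inadequate or missing sequences are not handled.

```python
# Simplified, hypothetical sketch of PI-RADSv2 dominant-sequence scoring for one lesion.
# Assumptions: zone is "PZ" or "TZ"; T2WI and DWI scores are integers 1-5;
# DCE-MRI is recorded only as positive/negative for early focal enhancement.
# Inadequate or missing sequences are NOT handled.

def pirads_v2_overall(zone: str, t2w: int, dwi: int, dce_positive: bool) -> int:
    """Return an overall PI-RADSv2 assessment category (1-5) for one lesion."""
    for name, score in (("T2WI", t2w), ("DWI", dwi)):
        if not 1 <= score <= 5:
            raise ValueError(f"{name} score must be between 1 and 5, got {score}")
    if zone == "PZ":
        # DWI is dominant in the peripheral zone; a positive DCE only upgrades DWI 3 to 4.
        return 4 if (dwi == 3 and dce_positive) else dwi
    if zone == "TZ":
        # T2WI is dominant in the transition zone; DWI 5 upgrades an equivocal T2WI 3 to 4.
        return 4 if (t2w == 3 and dwi == 5) else t2w
    raise ValueError(f"zone must be 'PZ' or 'TZ', got {zone!r}")


# An equivocal PZ lesion (DWI 3) with early focal enhancement is upgraded.
print(pirads_v2_overall("PZ", t2w=2, dwi=3, dce_positive=True))   # 4
# In the TZ, DCE-MRI plays no role; T2WI drives the category.
print(pirads_v2_overall("TZ", t2w=3, dwi=4, dce_positive=True))   # 3
```

In this sketch, DCE-MRI can never lower a score and never acts on its own, which mirrors the limited role for DCE-MRI described above.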
Since then, several studies assessing the value of PI-RADSv2 have been published. However, the diagnostic performance of this new scoring system has not been evaluated systematically. Therefore, the purpose of our study was to assess the diagnostic performance of PI-RADSv2 for the detection of PCa. In addition, we aimed to compare the diagnostic performance of PI-RADSv1 and PI-RADSv2 in studies available for head-to-head comparison.
This meta-analysis was performed and reported according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines [12].
A computerized search of MEDLINE and EMBASE up to December 7, 2016, was performed in order to identify studies evaluating the diagnostic performance of PI-RADSv2 for the detection of PCa. The search query combined synonyms for PCa, MRI, and PI-RADS as follows: (prostate cancer OR prostatic cancer OR prostate neoplasm OR prostatic neoplasm OR prostate tumor OR prostatic tumor OR prostate carcinoma OR prostatic carcinoma OR PCa) AND (magnetic resonance imaging OR MRI OR MR) AND (prostate imaging reporting and data system OR pi-rads OR pi rads OR pirads). Bibliographies of identified articles were also screened to expand the scope of search. Our search was limited to publications in English.
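The search string follows a simple pattern: synonyms within each concept are combined with OR, and the three concepts (PCa, MRI, and PI-RADS) are combined with AND. The sketch below rebuilds the query quoted above from its term lists. The variable and function names are hypothetical, and database-specific field tags or syntax for MEDLINE and EMBASE are not shown because the text does not report them.

```python
# Hypothetical reconstruction of the Boolean query from its three concept groups.
PCA_TERMS = [
    "prostate cancer", "prostatic cancer", "prostate neoplasm", "prostatic neoplasm",
    "prostate tumor", "prostatic tumor", "prostate carcinoma", "prostatic carcinoma", "PCa",
]
MRI_TERMS = ["magnetic resonance imaging", "MRI", "MR"]
PIRADS_TERMS = ["prostate imaging reporting and data system", "pi-rads", "pi rads", "pirads"]


def build_query(*concept_groups):
    """OR the synonyms within each concept, then AND the concepts together."""
    return " AND ".join("(" + " OR ".join(group) + ")" for group in concept_groups)


query = build_query(PCA_TERMS, MRI_TERMS, PIRADS_TERMS)
print(query)
```

Keeping the synonyms in lists makes it easy to audit or extend a concept without disturbing the overall AND/OR structure.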
Studies were included if they satisfied all of the following PICOS criteria [12]: (1) they included patients with suspected or diagnosed PCa; (2) for the index test, mpMRI of the prostate including all required sequences (T2WI, DWI, and DCE-MRI) was performed and assessed with the PI-RADSv2 scoring system; (3) for comparison, a reference standard based on histopathological examination of radical prostatectomy or biopsy specimens was used; (4) results were reported in sufficient detail to reconstruct 2 × 2 tables and determine sensitivity and specificity at specified cutoff values for PI-RADSv2; and (5) they were original articles.
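Criterion (4) requires that each study's results can be reduced to a 2 × 2 table of true positives (TP), false positives (FP), false negatives (FN), and true negatives (TN). Sensitivity is then TP/(TP + FN) and specificity is TN/(TN + FP). The sketch below computes both from a hypothetical table, not taken from any included study, and attaches Wilson score intervals; the text does not state which interval method was used for individual studies, so the Wilson interval and the function names are our assumptions.

```python
import math


def wilson_interval(successes, total, z=1.96):
    """Wilson score interval for a binomial proportion (about 95% for z = 1.96)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    denom = 1 + z ** 2 / total
    centre = (p + z ** 2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z ** 2 / (4 * total ** 2)) / denom
    return centre - half, centre + half


def diagnostic_accuracy(tp, fp, fn, tn):
    """Sensitivity and specificity (with Wilson intervals) from one 2 x 2 table."""
    return {
        "sensitivity": tp / (tp + fn),
        "sensitivity_ci": wilson_interval(tp, tp + fn),
        "specificity": tn / (tn + fp),
        "specificity_ci": wilson_interval(tn, tn + fp),
    }


# Hypothetical 2 x 2 table: 100 patients with and 100 without the target condition.
result = diagnostic_accuracy(tp=89, fp=27, fn=11, tn=73)
print(result["sensitivity"], result["specificity"])  # 0.89 0.73
```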
Studies were excluded if any of the following criteria were met: (1) fewer than 10 patients; (2) review articles, guidelines, consensus statements, letters, editorials, and conference abstracts; (3) use of PI-RADSv1 only for the evaluation of prostate mpMRI; (4) a focus on topics other than the use of PI-RADSv2 for diagnosing PCa (eg, staging and prediction of biochemical recurrence); and (5) an overlapping patient population.
Two reviewers (S.W. and C.H.S., with 3 yr of experience in performing systematic reviews and meta-analyses) independently evaluated the eligibility of the studies identified in the literature search. Any disagreements between the two reviewers were resolved by consensus via discussion with a third reviewer (S.Y.K.).
We extracted the following data regarding study design and results from the selected studies using a standardized form:
Study characteristics—authors, year of publication, country and institution of origin, duration of patient recruitment, and study design (prospective vs retrospective and consecutive or not)
Demographic and clinical characteristics—sample size, number of patients with PCa, patient age, prostate-specific antigen (PSA) level and Gleason score, number of previous biopsies, and PCa diagnosis prior to mpMRI
Technical characteristics of mpMRI—scanner model and manufacturer, magnetic field strength (1.5 vs 3 T), coil type (endorectal vs pelvic phased array), and specific sequences used (T2WI, DWI, DCE, or MR spectroscopy)
Interpretation of mpMRI—number of reviewers and experience in prostate mpMRI, independent or consensus reading, and blinding to clinicopathological information
Reference standard—type of reference standard (radical prostatectomy, targeted biopsy, or systematic biopsy), interval between MRI and pathology, outcomes assessed (any PCa vs csPCa), definition of csPCa, separate analysis for the PZ and TZ, and type of analysis (per patient vs per lesion). Studies assessing “clinically significant”, “aggressive”, or “high-grade” PCa were all considered to assess csPCa; however, only studies that used the definition provided by the PI-RADSv2 guideline (Gleason score ≥7 [including 3 + 4], volume ≥0.5 ml, or extraprostatic extension) were considered to have no concern for applicability.
Diagnostic performance of PI-RADSv2 including criteria or cutoff values (in case of multiple readers, the results of the most experienced reader were extracted for this meta-analysis)
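The PI-RADSv2 definition of csPCa used to judge applicability (Gleason score ≥7, including 3 + 4; tumor volume ≥0.5 ml; or extraprostatic extension) is a simple disjunction, so any one criterion suffices. The sketch below encodes it as a predicate. The record type and its field names are hypothetical, and it assumes a single tumor focus described by its primary and secondary Gleason patterns.

```python
from dataclasses import dataclass


@dataclass
class HistologyFinding:
    """Hypothetical record describing one tumor focus at pathology."""
    gleason_primary: int
    gleason_secondary: int
    volume_ml: float
    extraprostatic_extension: bool


def is_cspca_pirads_v2(finding: HistologyFinding) -> bool:
    # csPCa if ANY criterion holds: Gleason score >= 7 (3 + 4 counts),
    # volume >= 0.5 ml, or extraprostatic extension.
    gleason_score = finding.gleason_primary + finding.gleason_secondary
    return (gleason_score >= 7
            or finding.volume_ml >= 0.5
            or finding.extraprostatic_extension)


print(is_cspca_pirads_v2(HistologyFinding(3, 4, 0.2, False)))  # True (Gleason 3 + 4)
print(is_cspca_pirads_v2(HistologyFinding(3, 3, 0.2, False)))  # False
```

Several included studies used narrower definitions, such as Gleason score ≥7 alone or Gleason score ≥7 (4 + 3); under such definitions a small Gleason 3 + 3 focus with extraprostatic extension would not count as csPCa, which is why those studies were flagged for applicability.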
The methodological quality of the included studies was assessed using the tailored questionnaires and criteria of the Quality Assessment of Diagnostic Accuracy Studies-2 (QUADAS-2) tool [13]. Data extraction and quality assessment were performed independently by two reviewers (S.W. and C.H.S.), and all disagreements were resolved by consensus through discussion with the third reviewer (S.Y.K.).
The diagnostic performance of PI-RADSv2 for the detection of PCa was the primary outcome for this meta-analysis. In addition, a comparison between the diagnostic performance of PI-RADSv2 and that of PI-RADSv1 using studies that reported head-to-head comparison data of the two PI-RADS versions was considered a secondary outcome.
Pooled estimates of sensitivity and specificity were calculated using hierarchical logistic regression modeling, including bivariate modeling and hierarchical summary receiver operating characteristic (HSROC) modeling [14]. For graphical presentation of the results, an HSROC curve with 95% confidence and prediction regions was plotted. Publication bias was evaluated using Deeks' funnel plot, and statistical significance was tested with Deeks' asymmetry test [15].
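The bivariate and HSROC models used here (via metandi, midas, and mada) pool sensitivity and specificity jointly and account for their correlation; reproducing them is beyond a short example. As a deliberately simpler stand-in, the sketch below pools a single margin (eg, sensitivity across studies) with a univariate DerSimonian-Laird random-effects model on the logit scale. Because it ignores the correlation between sensitivity and specificity, its estimates will generally differ from those of the bivariate model. The function names, the 0.5 continuity correction, and the example counts are our assumptions.

```python
import math


def logit(p):
    return math.log(p / (1 - p))


def inv_logit(x):
    return 1 / (1 + math.exp(-x))


def pool_proportion_dl(events, totals, z=1.96):
    """Univariate DerSimonian-Laird random-effects pooling on the logit scale.

    events/totals: eg, true positives and diseased patients per study.
    Returns (pooled estimate, lower CI, upper CI, tau^2).
    """
    ys, vs = [], []
    for e, n in zip(events, totals):
        if e == 0 or e == n:
            # 0.5 continuity correction so that 0% or 100% studies have a finite logit
            e, n = e + 0.5, n + 1.0
        ys.append(logit(e / n))
        vs.append(1 / e + 1 / (n - e))  # approximate variance of the logit proportion
    w = [1 / v for v in vs]
    sw = sum(w)
    y_fixed = sum(wi * yi for wi, yi in zip(w, ys)) / sw
    q = sum(wi * (yi - y_fixed) ** 2 for wi, yi in zip(w, ys))
    df = len(ys) - 1
    c = sw - sum(wi ** 2 for wi in w) / sw
    # Method-of-moments between-study variance, truncated at zero
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    w_re = [1 / (v + tau2) for v in vs]
    y_re = sum(wi * yi for wi, yi in zip(w_re, ys)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))
    return inv_logit(y_re), inv_logit(y_re - z * se), inv_logit(y_re + z * se), tau2


# Hypothetical true-positive counts and numbers of diseased patients in three studies.
estimate, lower, upper, tau2 = pool_proportion_dl([45, 80, 30], [50, 90, 40])
print(round(estimate, 3), round(lower, 3), round(upper, 3), round(tau2, 3))
```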
We performed meta-regression analyses to investigate the cause of heterogeneity. The following covariates were considered for the bivariate model: (1) proportion of patients with PCa (>50% vs ≤50%), (2) magnetic field strength (3 vs 1.5 T), (3) use of an endorectal coil, (4) cutoff value (≥4 vs ≥3), (5) reference standard (radical prostatectomy vs biopsy), and (6) type of analysis (per patient vs per lesion). In addition, multiple subgroup analyses were performed for cutoff value, outcome, and previous biopsy history to assess various clinical settings: (1) a cutoff value of ≥4 for all studies, (2) a cutoff value of ≥3 for all studies, (3) a cutoff value of ≥4 for determining any PCa, (4) a cutoff value of ≥3 for determining any PCa, (5) a cutoff value of ≥4 for determining csPCa, (6) a cutoff value of ≥3 for determining csPCa, (7) a cutoff value of ≥4 in studies using per-patient analysis, (8) a cutoff value of ≥4 in studies using per-lesion analysis, (9) studies analyzing PZ PCa, (10) studies analyzing TZ PCa, (11) patients without previous biopsies, and (12) patients with previous biopsies. The “metandi” and “midas” modules in Stata 10.0 (StataCorp LP, College Station, TX, USA) and the “mada” package in R software version 3.2.1 (R Foundation for Statistical Computing, Vienna, Austria) were used for statistical analyses, with p < 0.05 considered statistically significant.
A systematic literature search initially identified 287 articles. After removal of 46 duplicates, screening of the 241 titles and abstracts yielded 105 potentially eligible articles. After full-text review, 84 studies were excluded for the following reasons: not in the field of interest (n = 80, including 68 studies that used only PI-RADSv1), insufficient data to reconstruct 2 × 2 tables (n = 2), and shared study population with other studies (n = 2). Ultimately, 21 original articles including a total of 3857 patients assessing the diagnostic performance of PI-RADSv2 were included in the meta-analysis [16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36]. No additional studies were identified by screening the bibliographies of these 21 studies. Among them, 15 studies including 3099 patients dealt with PI-RADSv2 alone, whereas six studies including 758 patients provided a head-to-head comparison between PI-RADSv1 and PI-RADSv2 [16 20 21 28 32 33]. The detailed study selection process is described in Fig. 1.
Patient characteristics are shown in Table 1. The size of the study population ranged from 49 to 456 patients, with the percentage of patients with PCa ranging from 37% to 100%. The median patient age was 62–69.6 yr, the median PSA was 3.97–15 ng/ml, and Gleason scores ranged from 5 to 10. In seven studies, all or some of the patients had already been diagnosed with PCa prior to MRI [16 17 22 26 27 32 35]. Biopsy had been performed before MRI in seven studies [16 21 22 23 26 27 35], all patients were biopsy-naïve in three studies [18 25 34], both patient types were included in three studies [28 32 33], and data on previous biopsy were not reported in the remaining eight studies.
First author (year) | Country | Institution | Duration of patient recruitment | Patients (n) | Patients with PCa (n) | Age, median (yr) | Age, range (yr) | PSA, median (ng/ml) | PSA, range (ng/ml) | Gleason score, median | Gleason score, range | No. of previous biopsies | PCa diagnosis before MRI |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Auer (2016) [16] | Austria | Medical University of Innsbruck, Medizinische Hochschule Hannover | NR | 50 | NR | 63 a | NR | 7.3 a | NR | 7 (3 + 4) | 6–10 | ≥1 | Yes (all) |
Baldisserotto (2016) [17] | Brazil | Pontifícia Universidade Católica do Rio Grande do Sul | 2013.6–2015.6 | 54 | 33 | 65.9 a | 53–81 | 8.4 a | 3–31 | 7 (3 + 4) | 6–9 | NR | Yes (some) |
De Visschere (2017) [18] | Belgium | Ghent University Hospital | 2011.5–2014.12 | 245 | 144 | 66 | 44–85 | 9 | 1.4–935.5 | 7 (3 + 4) | ≥6 | 0 | No |
El-Samei (2016) [19] | Egypt | Al-Azher University, El-Minia University | 2014.5–2015.10 | 55 | 38 | 62 a | 51–79 | NR | NR | 7 | 6–10 | NR | No |
Feng (2016) [20] | China | Tongji Hospital | 2013.6–2015.7 | 401 | 150 | 64.4 a | 34–88 | 10.7 | 0.2–1763 | 7 (4 + 3) | ≤6–≥8 | NR | NR |
Kasel-Seibert (2016) [21] | Germany | University Hospital Jena | 2013.7–2015.3 | 82 | 31 | 65 | 48–81 | 13 | 1–111 | 7 (3 + 4) | 6–9 | 1–5 | No |
Lin (2016) [22] | Brazil, Taiwan | Ribeirão Preto School of Medicine, China Medical University, Ribeirão Preto School of Medicine | 2011.5–2014.6 | 49 | 49 | 63 | 46–73 | 13.27 | 1.8–41.4 | | | ≥1 | Yes (all) |
Martorana (2016) [23] | Italy | University of Modena and Reggio Emilia, Perugia University | 2014.1–2016.2 | 157 | 79 | 65 | 47–79 b | 10.7 | 1–75 | 6 | 6–8 | 1–3 | No |
Mertan (2016) [24] | USA | National Cancer Institute | 2015.3–2015.9 | 62 | 38 | 65.5 | 50.3–76.6 | 7.1 | 0.5–863 | 7 (3 + 4) | 6–9 | NR | NR |
Muller (2015) [25] | USA, the Netherlands | National Cancer Institute, AMC University Hospital, Edward Hébert School of Medicine | 2011.12–2014.5 | 94 | 94 | 62 | 37–79 | 8.51 | 0.7–51.1 | 7 (3 + 4) | 6–10 | 0 | No |
Park (2016) [26] | South Korea | Samsung Medical Center | 2012.1–2014.12 | 456 | 456 | 65 | 46–81 | 3.97 | 3.8–4.2 | NR | ≥6 | ≥1 | Yes (all) |
Park (2016) [27] | South Korea | Yonsei University College of Medicine | 2012.1–2013.3 | 425 | 425 | NR | NR | NR | 1.4–156.9 | 5–6 | 5–10 | ≥1 | Yes (all) |
Polanec (2016) [28] | Austria | Medical University of Vienna, Confraternität Vienna | 2011.6–2015.9 | 65 | NR | 65.3 a | 62.3–87.4 | 10.8 a | 4.2–74.5 | 7 (4 + 3) | 6–9 | ≥0 | No |
Rastinehad (2015) [29] | USA | Icahn School of Medicine at Mount Sinai, Fox Chase Cancer Center, National Institutes of Health, Hofstra North Shore LIJ School of Medicine | 2012.2–2014.11 | 312 | 202 | 65.1 | 60.3–70.3 b | 7.3 | 5.0–11.4 b | NR | NR | NR | No |
Rosenkrantz (2016) [30] | USA | NYU Langone Medical Center | 2013.9–2015.2 | 343 | 134 | 64 | NR | 5.8 | NR | 7 (3 + 4) | 6–9 | NR | NR |
Stanzione (2016) [31] | Italy | University “Federico II”, Ospedale S. Maria delle Grazie | NR | 82 | 34 | 65 a | 43–84 | 8.8 a | NR | 7 (3 + 4) | 6–9 | NR | No |
Tan (2016) [32] | USA, Taiwan | UCLA, Cathay General Hospital, Hoag Hospital | 2013.3–2016.12 | 106 | 63 | 66.5 | 43–79 | 7.9 | 5.6–10.6 | 7 (3 + 4) | 6–10 | ≥0 | Yes (some) |
Tewes (2016) [33] | Germany | Hannover Medical School, Klinikum der Region Hannover | 2012.12–2014.12 | 54 | 31 | 69.6 a | NR | 8.7 a | NR | 6 | 6–9 | ≥0 | No |
Washino (2017) [34] | Japan | Jichi Medical University Saitama Medical Center | 2010.6–2014.4 | 288 | 159 | 69 | 64–74 b | 7.5 | 5.5–11 b | NR | NR | 0 | No |
Woo (2016) [35] | South Korea | Seoul National University College of Medicine | 2011.1–2013.12 | 105 | 105 | 69 | 49–79 | 8.22 | 0.9–44.2 | 7 | 6–7 | ≥1 | Yes (all) |
Zhao (2016) [36] | China | Peking University First Hospital | 2010.11–2013.12 | 372 | 185 | 68.5 a | NR | 15 a | NR | 7 | 6–10 | NR | No |
Characteristics of the studies are summarized in Table 2. MRI was performed using 3-T scanners in 16 studies [16 17 18 20 24 25 26 27 28 29 30 31 32 33 35 36], 1.5-T scanners in four studies [19 21 22 23], and either 3- or 1.5-T scanners in one study [34]. Endorectal coils were used in seven studies [19 21 22 23 24 25 29]. In all studies, the mpMRI protocol consisted of T2WI, DWI, and DCE-MRI. The reference standard was radical prostatectomy in five studies [16 22 26 27 35], a combination of systematic and targeted biopsies in seven studies [17 20 23 25 29 34 36], and targeted biopsy alone in seven studies [19 21 24 28 30 32 33]; in the remaining two studies, the reference standard was not consistent throughout the study population [18 31]. PI-RADSv2 scoring was performed by one to five radiologists, either in consensus or independently. The radiologists' level of experience was heterogeneous, ranging from 4 to 22 yr in prostate imaging. In most studies, the readers were blinded; however, in three studies, the radiologists were aware that the patients had biopsy-proven PCa [22 26 35], and five studies were not explicit regarding blinding [18 19 30 32 36]. In the majority of the studies, the interval between MRI and the reference standard was less than 6 mo; however, the details were not reported in 10 studies [16 17 18 24 27 28 29 30 32 33]. PCa was assessed separately according to zonal anatomy in seven studies [20 24 25 28 30 31 33]. However, in one study [31], only PCa in the PZ could be evaluated, as no detailed data were provided in the article and an attempt to contact the authors for further information was unsuccessful. Regarding the outcome assessed, eight studies evaluated any cancer [17 19 20 23 25 28 31 33], eight evaluated clinically significant cancer [18 22 26 27 30 34 35 36], and five evaluated both [16 21 24 29 32].
With regard to cutoff values, 13 studies used ≥4 [16 18 19 20 21 24 25 26 27 28 30 31 35], four studies used ≥3 [32 33 34 36], and four studies used both [17 22 23 29]. The location of PCa was reported separately for the PZ and TZ in six studies [20 24 25 28 30 33].
First author (year) | Design | Consecutive enrollment | Reference standard | MRI–reference standard interval | No. of readers | Experience (yr) | Blinding | Magnet strength (T) | Vendor | Model | Endorectal coil | PI-RADSv2 application | Cutoff values | Localization | Type of analysis | Outcome assessed | Definition of csPCa |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Auer (2016) [16] | Retrospective | Yes | RP | NR | 1 | >5 | Yes | 3 | Siemens | Skyra | No | Strict | 4 | Whole | Lesion | Any + csPCa | GS ≥7 (4 + 3) |
Baldisserotto (2016) [17] | Retrospective | Yes | STRUSGB + targeted MRI–TRUS biopsy (cognitive) or RP | NR | 2, independent | >10/1 | Yes | 3 | GE | Signa HDxt | No | Strict | 3, 4 | Whole | Patient | Any | |
De Visschere (2017) [18] | Retrospective | NR | STRUSGB or RP | NR | NR | NR | NR | 3 | Siemens | Trio | No | Strict | 4 | Whole | Patient | csPCa | GS ≥7, ≥0.5 cc, or EPE |
El-Samei (2016) [19] | NR | NR | Targeted MRI–TRUS biopsy | ≤2 wk | NR | NR | NR | 1.5 | Philips | Gyroscan | Yes | Strict | 4 | Whole | Lesion | Any | |
Feng (2016) [20] | Retrospective | Yes | STRUSGB + targeted MRI–TRUS biopsy (cognitive) | 1–35 d | 2, consensus | 5/4 | Yes | 3 | Siemens | Skyra | No | Strict | 4 | PZ, TZ | Patient | Any | |
Kasel-Seibert (2016) [21] | Retrospective | NR | Targeted MRGB | ≤7 d | 2, independent | 10/<1 | Yes | 1.5 | Siemens | Avanto | Yes | Strict | 4 | Whole | Lesion | Any + csPCa | GS ≥7 |
Lin (2016) [22] | Retrospective | NR | RP | ≤6 mo | 2, independent | 12/5 | Yes a | 1.5 | Philips | Achieva | Yes | Strict | 3, 4 | Whole | Lesion | csPCa | GS ≥7 and >0.5 cc |
Martorana (2016) [23] | Retrospective | Yes | TTB + targeted MRI–TRUS biopsy | 30–78 | 2, consensus | 4/4 | Yes | 1.5 | Philips | Achieva | Yes | Strict | 3, 4 | Whole | Lesion | Any | GS ≥7, >0.5 cc, or EPE |
Mertan (2016) [24] | Prospective | Yes | STRUSGB + targeted MRI–TRUS biopsy | NR | 1 | >8 | Yes | 3 | Philips | Achieva | Yes | Strict | 4 | PZ, TZ | Lesion | Any + csPCa | GS ≥7 |
Muller (2015) [25] | Retrospective | Yes | STRUSGB + targeted MRI–TRUS biopsy | ≤6 wk | 5, independent | 12/7/1/1/0.5 | Yes | 3 | Philips | Achieva | Yes | Strict | 4 | PZ, TZ | Lesion | Any | GS ≥7 (4 + 3) |
Park (2016) [26] | Retrospective | Yes | RP | 21–48 | 2, independent | 14/3 | Yes a | 3 | Philips | Achieva | No | Strict | 4 | Whole | Patient | csPCa | GS ≥7, ≥0.5 cc, or EPE |
Park (2016) [27] | Retrospective | NR | RP | NR | 2, independent | 9/4 | Yes | 3 | GE, Philips, Siemens | Discovery MR750, Achieva, Trio | No | Strict | 4 | Whole | Patient | csPCa | GS ≥7, ≥0.5 cc, or EPE |
Polanec (2016) [28] | Retrospective | Yes | Targeted MRGB | NR | 2, independent | NR/NR | Yes | 3 | Siemens | Trio | No | Strict | 4 | PZ, TZ | Lesion | Any | |
Rastinehad (2015) [29] | Prospective | NR | STRUSGB + targeted MRI–TRUS biopsy | NR | 3, consensus | NR | Yes | 3 | Siemens | Verio | Yes | Calculated b | 3, 4 | Whole | Patient | Any + csPCa | Epstein's criteria c for systematic and GS ≥7 or >0.5 cc for targeted biopsy |
Rosenkrantz (2016) [30] | Retrospective | NR | Targeted MRI–TRUS biopsy | NR | 2, independent | 8/3 | NR | 3 | Siemens | Biograph mMR/Prisma/Skyra/Trio | No | Strict | 4 | PZ, TZ | Lesion | csPCa | GS ≥7 |
Stanzione (2016) [31] | Prospective | NR | STRUSGB (±targeted TRUSGB) | 20–30 d | 2, independent | 14/10 | Yes | 3 | Siemens | Trio | No | Strict | 4 | PZ | Patient | Any | |
Tan (2016) [32] | Retrospective | Yes | Targeted MRGB | NR | 3, consensus | 16/10 | NR | 3 | Siemens | Skyra, Trio, Verio | No | Strict | 3 | Whole | Lesion | Any + csPCa | GS ≥7 |
Tewes (2016) [33] | Retrospective | Yes | Targeted MRGB | NR | 2, independent | 5/2 | Yes | 3 | Siemens | Skyra | No | Strict | 3 d | PZ, TZ | Patient | Any | |
Washino (2017) [34] | Retrospective | NR | TTB + targeted MRI–TRUS biopsy (cognitive) | 0.5–1.6 mo | 1 | 14 | Yes | 1.5 or 3 | Toshiba | Excelart Vantage/Vantage Titan 3T | No | Strict | 3 | Whole | Patient | csPCa | GS ≥7 or maximum core length ≥4 mm |
Woo (2016) [35] | Retrospective | NR | RP | ≤6 mo | 2, consensus | 22/10 | Yes a | 3 | Philips, Siemens | Ingenia, Verio/Trio | No | Strict | 4 | Whole | Patient | csPCa | GS ≥7 |
Zhao (2016) [36] | Retrospective | NR | STRUSGB + targeted MRI–TRUS biopsy | ≤3 mo | 2, independent | NR | NR | 3 | NR | NR | No | Strict | 3 | Whole | Patient | csPCa | GS ≥7 |
a Blinded but aware that patients had biopsy-proven PCa.
b PI-RADSv2 scores were generated from reports based on PI-RADSv1 or a simplified qualitative system.
c Epstein's criteria = Gleason pattern ≥4, or Gleason 3 + 3 disease with core length ≥50% and/or >2 cores positive.
d For transition zone, cutoff value = 4.
csPCa = clinically significant prostate cancer; GS = Gleason score; EPE = extraprostatic extension; MRGB = magnetic resonance imaging-guided biopsy; MRI = magnetic resonance imaging; MRI–TRUS = fusion of magnetic resonance imaging and transrectal ultrasound images; NR = not reported; PCa = prostate cancer; PI-RADSv2 = Prostate Imaging Reporting and Data System version 2; PZ = peripheral zone; RP = radical prostatectomy; STRUSGB = systematic transrectal ultrasound-guided biopsy; TTB = transperineal template biopsy; TZ = transition zone.

Overall, the quality of the studies was not high, mainly because of the patient selection domain (Fig. 2). In the patient selection domain, there was generally a high risk of bias, as all but four of the studies were retrospective [16 17 18 20 21 22 23 25 26 27 28 30 32 33 34 35 36]. Seven studies were considered to have high concern for applicability, as all or some of the patients had a pathological diagnosis of PCa prior to MRI [16 17 22 26 27 32 35]. In the index test domain, there was a high risk of bias in nine studies: in three, the reviewers were aware that patients had biopsy-proven PCa [22 26 35], and in the other six, the cutoff value for determining PCa was not specified prior to interpretation [16 20 29 31 33 34]. Only one study had concern for applicability, as its PI-RADSv2 scores were indirectly generated from existing clinical radiological reports based on PI-RADSv1 or an in-house scoring system [29]. In the reference standard domain, eight studies had a high risk of bias. Seven were based on either systematic biopsy alone or targeted biopsy alone [18 19 21 28 30 32 33]; in one study, targeted biopsy was performed on lesions that were suspicious on ultrasonography rather than on MRI [31].
Studies in which radical prostatectomy or systematic plus targeted biopsy (MRI guided, MRI–transrectal ultrasound fusion, or cognitive) was used as the reference standard were considered to have a low risk of bias. In 10 studies, the definition of clinically significant cancer did not follow that of the PI-RADSv2 guidelines, and these studies were therefore considered to have high concern for applicability [16 21 22 24 29 30 32 34 35 36]. In the flow and timing domain, two studies had a high risk of bias, as not all patients received the same reference standard [17 18].
The sensitivity and specificity of the individual studies ranged from 73% to 100% and from 8% to 100%, respectively. Cochran's Q test revealed substantial heterogeneity (p < 0.001). The Higgins I² statistic demonstrated substantial heterogeneity in sensitivity (I² = 85.55%) and considerable heterogeneity in specificity (I² = 95.30%). The coupled forest plot of sensitivity and specificity demonstrated the absence of a threshold effect (Fig. 3). The Spearman correlation coefficient between the sensitivity and the false-positive rate was 0.45 (95% confidence interval [CI] 0.023–0.738), also indicating the lack of a threshold effect.
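For readers who want to see how these heterogeneity statistics arise, the sketch below computes Cochran's Q for one margin (eg, sensitivity) on the logit scale with inverse-variance weights, derives I² = (Q − df)/Q truncated at zero, and computes a Spearman rank correlation (average ranks for ties), which is commonly used to probe for a threshold effect between sensitivity and the false-positive rate (1 − specificity). This is an illustrative sketch, not the midas implementation; function names and the 0.5 continuity correction are our assumptions, and no study data are reproduced.

```python
import math


def cochran_q_i2(events, totals):
    """Cochran's Q and Higgins I^2 for logit proportions with inverse-variance weights."""
    ys, ws = [], []
    for e, n in zip(events, totals):
        if e == 0 or e == n:
            e, n = e + 0.5, n + 1.0  # continuity correction for 0% or 100%
        ys.append(math.log(e / (n - e)))
        ws.append(1 / (1 / e + 1 / (n - e)))
    y_bar = sum(w * y for w, y in zip(ws, ys)) / sum(ws)
    q = sum(w * (y - y_bar) ** 2 for w, y in zip(ws, ys))
    df = len(ys) - 1
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    return q, i2


def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation as the Pearson correlation of average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0 or sy == 0:
        raise ValueError("correlation undefined when all values are tied")
    return cov / (sx * sy)


# Hypothetical per-study sensitivities and false-positive rates.
sens = [0.95, 0.80, 0.90, 0.70]
fpr = [0.40, 0.20, 0.30, 0.10]
print(spearman(sens, fpr))
```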
For all 21 studies combined, the pooled sensitivity was 0.89 (95% CI 0.86–0.92) with specificity of 0.73 (95% CI 0.60–0.83; Fig. 3). In the HSROC curve, there was a large difference between the 95% confidence region and the 95% prediction region, indicating heterogeneity between the studies (Fig. 4). The area under the HSROC curve was 0.91 (95% CI 0.88–0.93). According to Deeks' funnel plot, the likelihood of publication bias was low, with a p value of 0.75 for the slope coefficient (Fig. 5).
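Deeks' asymmetry test regresses each study's log diagnostic odds ratio, ln(DOR) = ln(TP·TN/(FP·FN)), on 1/sqrt(ESS), weighting by the effective sample size ESS = 4·n1·n2/(n1 + n2), where n1 and n2 are the numbers of patients with and without the target condition; a slope that differs significantly from zero suggests funnel-plot asymmetry. The sketch below computes the weighted least-squares slope, its standard error, and the t statistic. To stay within the standard library it does not compute the p value, which would come from a t distribution with k − 2 degrees of freedom (eg, scipy.stats.t.sf). The function name, the 0.5 continuity correction, and the example counts are hypothetical.

```python
import math


def deeks_asymmetry(tp, fp, fn, tn):
    """Weighted regression of ln(DOR) on 1/sqrt(ESS), weighted by ESS.

    Returns (slope, standard error of slope, t statistic, degrees of freedom).
    """
    xs, ys, ws = [], [], []
    for a, b, c, d in zip(tp, fp, fn, tn):
        if 0 in (a, b, c, d):
            # 0.5 continuity correction so that ln(DOR) is finite
            a, b, c, d = a + 0.5, b + 0.5, c + 0.5, d + 0.5
        n_diseased, n_nondiseased = a + c, b + d
        ess = 4 * n_diseased * n_nondiseased / (n_diseased + n_nondiseased)
        xs.append(1 / math.sqrt(ess))
        ys.append(math.log((a * d) / (b * c)))
        ws.append(ess)
    k = len(ys)
    if k < 3:
        raise ValueError("need at least three studies")
    sw = sum(ws)
    x_bar = sum(w * x for w, x in zip(ws, xs)) / sw
    y_bar = sum(w * y for w, y in zip(ws, ys)) / sw
    sxx = sum(w * (x - x_bar) ** 2 for w, x in zip(ws, xs))
    if sxx == 0:
        raise ValueError("all studies have the same effective sample size")
    sxy = sum(w * (x - x_bar) * (y - y_bar) for w, x, y in zip(ws, xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    rss = sum(w * (y - intercept - slope * x) ** 2 for w, x, y in zip(ws, xs, ys))
    df = k - 2
    se = math.sqrt(rss / df / sxx)
    if se == 0:
        raise ValueError("perfect fit; t statistic is undefined")
    return slope, se, slope / se, df


# Hypothetical studies in which smaller studies report larger DORs.
slope, se, t, df = deeks_asymmetry(
    tp=[90, 45, 19, 180], fp=[20, 5, 1, 50], fn=[10, 5, 1, 20], tn=[80, 45, 19, 150]
)
print(round(slope, 2), round(se, 2), round(t, 2), df)
```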
For the six studies that provided a head-to-head comparison between PI-RADSv1 and PI-RADSv2, PI-RADSv2 demonstrated higher pooled sensitivity of 0.95 (95% CI 0.85–0.98) compared with 0.88 (95% CI 0.80–0.93) for PI-RADSv1 (p = 0.04). However, the pooled specificity did not differ significantly between the two versions of PI-RADS: 0.73 (95% CI 0.47–0.89) for v2 and 0.75 (95% CI 0.36–0.94) for v1 (p = 0.90).
As we found considerable heterogeneity among the included studies, meta-regression analyses were performed (Supplementary Table 1). Among several potential variables, the proportion of patients with PCa, magnetic field strength, and reference standard were significant factors affecting heterogeneity (p < 0.01 for all). However, among these three, only specificity according to the proportion of patients with PCa showed a statistically significant and clinically meaningful difference: 0.65 (95% CI 0.52–0.78) in studies with ≤50% of patients with PCa versus 0.86 (95% CI 0.75–0.97) in studies with >50% of patients with PCa. Otherwise, no clinically meaningful differences were seen: for magnetic field strength (3 vs 1.5 T), sensitivity was 0.90 (95% CI 0.86–0.94) versus 0.89 (95% CI 0.81–0.97; p = 0.03) and specificity was 0.73 (95% CI 0.59–0.86) versus 0.72 (95% CI 0.44–1.00; p = 0.81); for reference standard (radical prostatectomy vs biopsy), sensitivity was 0.89 (95% CI 0.83–0.95) versus 0.91 (95% CI 0.88–0.94; p < 0.01) and specificity was 0.65 (95% CI 0.37–0.94) versus 0.73 (95% CI 0.58–0.87; p = 0.48). Other variables, including cutoff value, use of an endorectal coil, and type of analysis (per patient vs per lesion), were not significant factors (p = 0.32–0.70).
As four studies used both ≥3 and ≥4 as cutoff values [17 22 23 29] and five studies assessed both any cancer and clinically significant cancer as outcomes [16 21 24 29 32], multiple subgroup analyses were performed to assess various clinical settings (Supplementary Table 2). Regarding cutoff values, the pooled sensitivity was 0.89 (95% CI 0.84–0.92) with specificity of 0.74 (95% CI 0.58–0.85) for the 17 studies using ≥4 [16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 35], whereas these were 0.95 (95% CI 0.89–0.97) and 0.47 (95% CI 0.21–0.74) for the eight studies using ≥3 [17 22 23 29 32 33 34 36]. Stratifying the studies according to the outcome assessed yielded the following results: (1) cutoff of ≥4 for any PCa, sensitivity of 0.89 (95% CI 0.83–0.93) with specificity of 0.80 (95% CI 0.62–0.90); (2) cutoff of ≥3 for any PCa, sensitivity of 0.96 (95% CI 0.93–0.98) with specificity of 0.49 (95% CI 0.29–0.70); (3) csPCa regardless of cutoff value, sensitivity of 0.89 (95% CI 0.84–0.92) with specificity of 0.64 (95% CI 0.46–0.78); (4) cutoff of ≥4 for csPCa, sensitivity of 0.90 (95% CI 0.85–0.94) with specificity of 0.62 (95% CI 0.45–0.77); and (5) cutoff of ≥3 for csPCa, sensitivity of 0.96 (95% CI 0.87–0.99) with specificity of 0.29 (95% CI 0.05–0.77). When the studies using a cutoff value of ≥4 were assessed separately according to the type of analysis, per-patient analysis in eight studies [17 18 20 26 27 29 31 35] yielded pooled sensitivity of 0.89 (95% CI 0.81–0.93) with specificity of 0.76 (95% CI 0.60–0.88), whereas per-lesion analysis in nine studies [16 19 21 22 23 24 25 28 30] yielded pooled sensitivity of 0.87 (95% CI 0.83–0.91) with specificity of 0.70 (95% CI 0.44–0.88).
According to the location of PCa, the pooled sensitivity was 0.93 (95% CI 0.87–0.96) with specificity of 0.68 (95% CI 0.43–0.86) in the seven studies analyzing PZ cancers [20 24 25 28 30 31 33]. In the same studies, excluding that by Stanzione et al [31], TZ cancers were also analyzed, and the pooled sensitivity and specificity were 0.88 (95% CI 0.77–0.94) and 0.75 (95% CI 0.59–0.86), respectively.
Studies including patients without previous biopsies yielded sensitivity of 0.82 (95% CI 0.72–0.90) and specificity of 0.75 (95% CI 0.65–0.83), whereas the diagnostic performance values were 0.87 (95% CI 0.80–0.92) and 0.71 (95% CI 0.42–0.89) in studies including patients with a history of previous biopsy.
In our meta-analysis, we assessed the diagnostic accuracy of PI-RADSv2 for detecting PCa. The pooled sensitivity and specificity of all 21 studies were 0.89 (95% CI 0.86–0.92) and 0.73 (95% CI 0.60–0.83), respectively. Compared with the only two existing meta-analyses of mpMRI for detecting PCa, our data suggest a trend toward higher sensitivity and lower specificity. In the study by de Rooij et al [37], which evaluated seven studies using a combination of T2WI, DWI, and DCE-MRI, the pooled sensitivity and specificity were 0.74 (95% CI 0.66–0.81) and 0.88 (95% CI 0.82–0.92), respectively. In a more recent meta-analysis by Hamoen et al [6], which analyzed 14 studies using PI-RADSv1, the pooled sensitivity and specificity were 0.78 (95% CI 0.70–0.84) and 0.79 (95% CI 0.68–0.86), respectively. However, comparisons across these three meta-analyses are only indirect. To address this issue, we separately assessed the subgroup of studies that applied both PI-RADSv1 and PI-RADSv2. In this head-to-head comparison, PI-RADSv2 demonstrated higher pooled sensitivity than PI-RADSv1 (0.95 vs 0.88, p = 0.04) without a statistically significant difference in specificity (0.73 vs 0.75, p = 0.90). This increase in sensitivity compared with its predecessor may imply that the revisions made in PI-RADSv2, including the introduction of dominant sequences according to zonal anatomy, the limited contribution of DCE-MRI secondary to DWI and T2WI, and specific guidelines for deriving an integrated overall score, were in fact on the right track. In particular, we speculate that the use of dominant sequences, that is, DWI for the PZ and T2WI for the TZ, may have been crucial for improving sensitivity without a loss of specificity, as suggested by Baur et al [10].
Considering that one of the main intentions behind PI-RADS was to standardize the reporting of mpMRI in order to decrease variability and promote widespread acceptance and implementation in daily practice, it was encouraging that nearly all (20 of 21) studies applied PI-RADSv2 strictly according to the published guidelines [11]. Only one study derived PI-RADSv2 scores from existing clinical radiological reports that were based on PI-RADSv1 or an in-house scoring system [29]. This is an improvement over prior studies using PI-RADSv1, in which investigators used varying methods to determine the overall score (an overall five-point score or the sum of the scores from each modality) [6]. Still, the cutoff value for detecting PCa needs further clarification. Among the studies included in our meta-analysis, cutoff values were predefined in only six, while the majority (15/21) were exploratory in nature, testing multiple criteria. A cutoff value of ≥4 yielded generally good sensitivity (0.89) and specificity (0.74), whereas ≥3 yielded excellent sensitivity (0.95) but poor specificity (0.47). These results may be taken into consideration when developing the next update of PI-RADS. For instance, a cutoff of ≥4 may be adequate for general use, whereas ≥3 could be reserved for situations in which a higher cancer detection rate is clinically required (eg, a persistently high PSA level despite a previously negative biopsy).
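The trade-off between the two cutoffs follows from how a five-point score is dichotomized: lowering the threshold from ≥4 to ≥3 turns every score-3 finding into a test positive, so sensitivity can only rise or stay the same while specificity can only fall or stay the same. The minimal Python sketch here illustrates this on a small hypothetical set of scores and reference-standard labels; the function names and the data are illustrative assumptions, not taken from any included study.

```python
from typing import Sequence, Tuple


def two_by_two(scores: Sequence[int], has_cancer: Sequence[bool],
               cutoff: int) -> Tuple[int, int, int, int]:
    """Dichotomize PI-RADSv2 scores: a score >= cutoff counts as test-positive.

    Returns (tp, fp, fn, tn) against the reference-standard labels.
    """
    tp = fp = fn = tn = 0
    for score, cancer in zip(scores, has_cancer):
        positive = score >= cutoff
        if positive and cancer:
            tp += 1
        elif positive:
            fp += 1
        elif cancer:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def sensitivity_specificity(tp: int, fp: int, fn: int, tn: int) -> Tuple[float, float]:
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical toy cohort of 12 lesions (illustrative only, not study data)
scores = [5, 4, 4, 3, 3, 2, 5, 3, 2, 1, 4, 3]
has_cancer = [True, True, False, True, False, False,
              True, False, False, False, True, False]

for cutoff in (3, 4):
    sens, spec = sensitivity_specificity(*two_by_two(scores, has_cancer, cutoff))
    print(f"cutoff >={cutoff}: sensitivity {sens:.2f}, specificity {spec:.2f}")
```

In this toy example, moving from ≥4 to ≥3 reclassifies the score-3 lesions as positive, which recovers the one missed cancer but adds three false positives.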
In the current study, subgroup analyses were performed to account for differences in outcomes (any cancer vs clinically significant cancer). Results did not differ significantly between the two outcomes, regardless of whether a cutoff of ≥3 or ≥4 was used. However, the definition of clinically significant cancer varied among the 13 studies assessing it. Only three studies defined csPCa strictly according to the PI-RADSv2 guidelines (Gleason score ≥7 [3 + 4], volume ≥0.5 ml, or extraprostatic extension) [11]; most others used one or two of the three criteria. Restricting the analysis to these three studies might have yielded more robust results; however, including all available studies was pragmatic and provides a general overview of the existing literature, as this is the first meta-analysis of studies on PI-RADSv2.
In this meta-analysis, we also examined the technical aspects of MRI. Meta-regression analyses revealed that the use of an endorectal coil was not a statistically significant factor. Furthermore, although magnet strength (3 vs 1.5 T) was a statistically significant factor, the difference did not appear clinically meaningful (sensitivity of 0.90 vs 0.89, respectively; p = 0.03). Although both issues have been debated in the past, 3 and 1.5 T are now both well established, and the overall benefit of an endorectal coil is not evident [38 39]. The PI-RADSv2 guidelines currently allow either option, and our results provide additional evidence to support this.
Regarding the methods of analysis, the studies varied considerably in reference standard and type of analysis. Radical prostatectomy was the reference standard in five studies, whereas the remaining studies relied on biopsy, including combinations of systematic and targeted biopsies. The possibility of PCa despite negative biopsy results in these biopsy-based studies should be kept in mind. In addition, outcomes were reported per patient in 11 studies and per lesion in 10. Per-lesion analysis also reflects how well the disease is localized; however, type of analysis was not a significant factor in the meta-regression analysis.
Our meta-analysis had some limitations. Nearly all studies were retrospective, resulting in a high risk of bias for patient selection. Pooling data from predominantly retrospective studies may have increased the estimated diagnostic sensitivity [40]. However, a meta-analysis of only the three prospective studies would have been technically unfeasible and not representative of the existing literature on PI-RADSv2. Moreover, we used validated methods for the systematic review and reported the data according to standard reporting guidelines, including PRISMA and the Handbook for Diagnostic Test Accuracy Reviews published by the Cochrane Collaboration [12 41]. Another limitation is the considerable heterogeneity in our pooled analysis, which limits the generalizability of our summary estimates. To explore this heterogeneity, we performed meta-regression and multiple subgroup analyses, which identified the proportion of patients with PCa, the magnetic field strength, and the reference standard as significant factors. In particular, the reference standard varied, including radical prostatectomy and a combination of systematic and targeted biopsies (ie, MRI guided, MRI–transrectal ultrasound fusion, or cognitive). The use of various definitions of clinically significant cancer also needs to be emphasized. Meta-regression and subgroup analyses may explain some of the heterogeneity, but a portion remains unexplained. Another important limitation is the small number of studies available for head-to-head comparison between PI-RADSv1 and PI-RADSv2. Nevertheless, a statistically significant difference in sensitivity between the two versions was demonstrated using the six studies published to date.
PI-RADSv2 shows good performance for the detection of PCa with pooled sensitivity of 0.89 and specificity of 0.73. PI-RADSv2 has higher pooled sensitivity compared with PI-RADSv1 without significantly different specificity.
Author contributions: Sang Youn Kim had full access to all the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis.
Study concept and design: Woo, Suh, S.Y. Kim.
Acquisition of data: Woo, Suh, S.Y. Kim.
Analysis and interpretation of data: Woo, Suh, S.Y. Kim.
Drafting of the manuscript: Woo.
Critical revision of the manuscript for important intellectual content: Suh, S.Y. Kim, Cho, S.H. Kim.
Statistical analysis: Suh.
Obtaining funding: None.
Administrative, technical, or material support: None.
Supervision: S.Y. Kim, Cho, S.H. Kim.
Other: None.
Financial disclosures: Sang Youn Kim certifies that all conflicts of interest, including specific financial interests and relationships and affiliations relevant to the subject matter or materials discussed in the manuscript (eg, employment/affiliation, grants or funding, consultancies, honoraria, stock ownership or options, expert testimony, royalties, or patents filed, received, or pending), are the following: None.
Funding/Support and role of the sponsor: None.
Prostate cancer (PCa) is the second leading cause of cancer-related mortality in Western men [1]. With recent technological advances and growing availability, multiparametric magnetic resonance imaging (mpMRI) is increasingly used to guide several aspects of PCa management, including detection, staging, and treatment planning [2]. Despite abundant evidence in the literature of the high accuracy of mpMRI for PCa diagnosis, widespread acceptance has been hampered by several factors, including difficulty of interpretation, lack of standardized interpretation criteria (ie, use of Likert scales based on the radiologist's subjective level of suspicion for PCa), and the resulting substantial inter-reader variability [3 4].
To bring standardization to the evaluation and reporting of mpMRI of the prostate, the European Society of Urogenital Radiology (ESUR) published a guideline termed Prostate Imaging Reporting and Data System (PI-RADS) in 2012 [5] . PI-RADS was generated based on expert consensus and provides a detailed scoring system for each MRI sequence (T2-weighted imaging [T2WI], diffusion-weighted imaging [DWI], dynamic contrast-enhanced MRI [DCE-MRI], and MR spectroscopy) for the presence of clinically significant PCa (csPCa). Several investigators have validated the accuracy and reproducibility of PI-RADS, and a recent meta-analysis reported pooled sensitivity and specificity of 0.78 and 0.79, respectively [6] . However, as there was no guideline for the generation of an overall score, different research groups utilized various measures for this purpose—some used a sum of the scores from each sequence (ranging from 3 to 15), whereas others used an overall score of 1–5 [7 8] . Furthermore, emerging data questioned the value of curve-type analysis of DCE-MRI [9] . In addition, investigators suggested that some sequences may be more important in determining the likelihood of PCa (ie, DWI in the peripheral zone [PZ] and T2WI in the transition zone [TZ]) rather than equal weighting for all sequences [10] .
To address these issues, the ESUR and the American College of Radiology recently released the updated PI-RADS version 2 (PI-RADSv2) [11]. The main changes from PI-RADSv1 to PI-RADSv2 are as follows: (1) introduction of dominant sequences according to zonal anatomy (DWI for the PZ and T2WI for the TZ), (2) a limited role for DCE-MRI, which is scored only as the presence or absence of early focal enhancement, and (3) generation of an overall score (1–5) integrating findings across all MRI sequences.
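The dominant-sequence logic in change (1) and the restricted role of DCE-MRI in change (2) can be sketched as a small function. This is a simplified reading of the 2015 PI-RADSv2 rules for illustration only: it ignores the fallback rules for technically inadequate DWI or DCE-MRI and any later amendments, the function name and arguments are hypothetical, and the guideline document [11] remains the authoritative source.

```python
def pirads_v2_overall(zone: str, t2w: int, dwi: int, dce_positive: bool) -> int:
    """Overall PI-RADSv2 assessment category for one lesion (simplified sketch).

    zone: "PZ" or "TZ"; t2w and dwi: sequence scores 1-5;
    dce_positive: early focal enhancement present on DCE-MRI.
    """
    for name, score in (("t2w", t2w), ("dwi", dwi)):
        if not 1 <= score <= 5:
            raise ValueError(f"{name} score must be between 1 and 5, got {score}")
    if zone == "PZ":
        # DWI is dominant; a positive DCE upgrades only an equivocal DWI score of 3 to 4
        return 4 if dwi == 3 and dce_positive else dwi
    if zone == "TZ":
        # T2WI is dominant; a DWI score of 5 upgrades an equivocal T2WI score of 3 to 4
        return 4 if t2w == 3 and dwi == 5 else t2w
    raise ValueError("zone must be 'PZ' or 'TZ'")


print(pirads_v2_overall("PZ", t2w=2, dwi=3, dce_positive=True))
print(pirads_v2_overall("TZ", t2w=3, dwi=5, dce_positive=False))
```

In this sketch DCE-MRI never changes a TZ score and changes a PZ score only when DWI is equivocal, which mirrors the limited contribution of DCE-MRI described in change (2).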
Since then, several studies assessing the value of PI-RADSv2 have been published. However, the diagnostic performance of this new scoring system has not been evaluated systematically. Therefore, the purpose of our study was to assess the diagnostic performance of PI-RADSv2 for the detection of PCa. In addition, we aimed to compare the diagnostic performance of PI-RADSv1 and PI-RADSv2 in studies available for head-to-head comparison.
This meta-analysis was performed and written according to the preferred reporting items for systematic reviews and meta-analyses (PRISMA) guidelines [12] .
A computerized search of MEDLINE and EMBASE up to December 7, 2016, was performed in order to identify studies evaluating the diagnostic performance of PI-RADSv2 for the detection of PCa. The search query combined synonyms for PCa, MRI, and PI-RADS as follows: (prostate cancer OR prostatic cancer OR prostate neoplasm OR prostatic neoplasm OR prostate tumor OR prostatic tumor OR prostate carcinoma OR prostatic carcinoma OR PCa) AND (magnetic resonance imaging OR MRI OR MR) AND (prostate imaging reporting and data system OR pi-rads OR pi rads OR pirads). Bibliographies of identified articles were also screened to expand the scope of search. Our search was limited to publications in English.
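The search string is an AND-combination of three OR-blocks of synonyms. A small sketch that assembles it from the synonym lists reported above follows; the variable and function names are hypothetical, and database-specific syntax (field tags, phrase quoting, or MeSH/Emtree mapping) was not reported and is not modeled here.

```python
# Synonym groups as listed in the search strategy; variable names are hypothetical
pca_terms = ["prostate cancer", "prostatic cancer", "prostate neoplasm",
             "prostatic neoplasm", "prostate tumor", "prostatic tumor",
             "prostate carcinoma", "prostatic carcinoma", "PCa"]
mri_terms = ["magnetic resonance imaging", "MRI", "MR"]
pirads_terms = ["prostate imaging reporting and data system", "pi-rads",
                "pi rads", "pirads"]


def or_block(terms):
    """Join synonyms with OR and wrap them in parentheses."""
    return "(" + " OR ".join(terms) + ")"


# A record must match at least one synonym from every group
query = " AND ".join(or_block(group) for group in (pca_terms, mri_terms, pirads_terms))
print(query)
```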
Studies were included if they satisfied all the following requirements according to the PICOS criteria [12] : (1) included patients with suspected or diagnosed PCa; (2) for index test, mpMRI of the prostate including all required sequences of T2WI, DWI, and DCE-MRI was performed and assessed with a PI-RADSv2 scoring system; (3) for comparison, a reference standard based on the histopathological examination of radical prostatectomy or biopsy was used; (4) results were reported in sufficient detail for the reconstruction of 2 × 2 tables and determination of sensitivity and specificity at specified cutoff values for evaluating the diagnostic performance of PI-RADSv2; and (5) studies had to be original articles.
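Inclusion criterion (4) requires that each study's results can be reconstructed as a 2 × 2 table. A minimal sketch of how sensitivity and specificity follow from such a table is shown here. The study does not state which interval method was used for individual studies; the Wilson score interval used here is an assumption, and the example counts are hypothetical.

```python
import math
from typing import Tuple


def wilson_ci(successes: int, total: int, z: float = 1.959964) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion (95% by default)."""
    p = successes / total
    denom = 1 + z ** 2 / total
    centre = (p + z ** 2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z ** 2 / (4 * total ** 2)) / denom
    return centre - half, centre + half


def accuracy_from_2x2(tp: int, fp: int, fn: int, tn: int):
    """Sensitivity and specificity, each with a Wilson 95% CI, from one 2 x 2 table."""
    sens = tp / (tp + fn)   # true positives among patients/lesions with cancer
    spec = tn / (tn + fp)   # true negatives among patients/lesions without cancer
    return (sens, wilson_ci(tp, tp + fn)), (spec, wilson_ci(tn, tn + fp))


# Hypothetical table: 45 TP, 20 FP, 5 FN, 80 TN
(sens, sens_ci), (spec, spec_ci) = accuracy_from_2x2(45, 20, 5, 80)
print(f"sensitivity {sens:.2f} ({sens_ci[0]:.2f}-{sens_ci[1]:.2f})")
print(f"specificity {spec:.2f} ({spec_ci[0]:.2f}-{spec_ci[1]:.2f})")
```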
Studies were excluded if any of the following criteria were met: (1) studies involving <10 patients; (2) review articles, guidelines, consensus statements, letters, editorials, and conference abstracts; (3) studies using only PI-RADSv1 for the evaluation of mpMRI of the prostate; (4) studies focusing on topics other than using the PI-RADSv2 system for diagnosing PCa (eg, staging and prediction of biochemical recurrence); and (5) studies with overlapping patient populations.
Two reviewers (S.W. and C.H.S., each with 3 yr of experience in performing systematic reviews and meta-analyses) independently evaluated the eligibility of the studies identified in the literature search. Disagreements between the two reviewers were resolved by consensus through discussion with a third reviewer (S.Y.K.).
We extracted the following data regarding study design and results from the selected studies using a standardized form:
Study characteristics—authors, year of publication, country and institution of origin, duration of patient recruitment, and study design (prospective vs retrospective and consecutive or not)
Demographic and clinical characteristics—sample size, number of patients with PCa, patient age, prostate-specific antigen (PSA) level and Gleason score, number of previous biopsies, and PCa diagnosis prior to mpMRI
Technical characteristics of mpMRI—scanner model and manufacturer, magnetic field strength (1.5 vs 3 T), coil type (endorectal vs pelvic phased array), and specific sequences used (T2WI, DWI, DCE, or MR spectroscopy)
Interpretation of mpMRI—number of reviewers and experience in prostate mpMRI, independent or consensus reading, and blinding to clinicopathological information
Reference standard—type of reference standard (radical prostatectomy, targeted biopsy, or systematic biopsy), interval between MRI and pathology, outcomes assessed (any PCa vs csPCa), definition of csPCa (studies assessing “clinically significant”, “aggressive”, or “high-grade” PCa were all considered to assess csPCa; however, only studies that used the definition provided by the PI-RADSv2 guideline [Gleason score ≥7 (3 + 4), volume ≥0.5 ml, or extraprostatic extension] were considered to have no concern for applicability), separate analysis for the PZ and TZ, and type of analysis (per patient vs per lesion)
Diagnostic performance of PI-RADSv2 including criteria or cutoff values (in case of multiple readers, the results of the most experienced reader were extracted for this meta-analysis)
The methodological quality of the included studies was assessed using tailored questionnaires and criteria provided by Quality Assessment of Diagnostic Accuracy Studies-2 [13] . Data extraction and quality assessment were performed independently by two reviewers (S.W. and C.H.S.). All disagreements were resolved by consensus through discussion with the third reviewer (S.Y.K.).
The diagnostic performance of PI-RADSv2 for the detection of PCa was the primary outcome for this meta-analysis. In addition, a comparison between the diagnostic performance of PI-RADSv2 and that of PI-RADSv1 using studies that reported head-to-head comparison data of the two PI-RADS versions was considered a secondary outcome.
Pooled estimates of sensitivity and specificity were calculated using hierarchical logistic regression modeling, including bivariate modeling and hierarchical summary receiver operating characteristic (HSROC) modeling [14]. For graphical presentation of the results, an HSROC curve with 95% confidence and prediction regions was plotted. Publication bias was evaluated using Deeks' funnel plot, and statistical significance was tested with Deeks' asymmetry test [15].
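The bivariate and HSROC models fit sensitivity and specificity jointly, with a between-study correlation, and are not reproduced here. As a simpler stand-in that shows the core idea of random-effects pooling on the logit scale, the following sketch applies the DerSimonian–Laird estimator separately to sensitivity and to specificity. This univariate approach is a swapped-in technique, not the method used in the study; it ignores the correlation that the bivariate model captures, so its results will generally differ from those of metandi or mada. The 0.5 continuity correction and the per-study tables are assumptions for illustration.

```python
import math
from typing import List, Sequence, Tuple


def logit(p: float) -> float:
    return math.log(p / (1 - p))


def inv_logit(x: float) -> float:
    return 1 / (1 + math.exp(-x))


def dl_pool(events: Sequence[int], totals: Sequence[int]) -> Tuple[float, float, float]:
    """DerSimonian-Laird random-effects pooling of proportions on the logit scale.

    Returns (pooled proportion, lower 95% limit, upper 95% limit).
    """
    y: List[float] = []
    v: List[float] = []
    for e, n in zip(events, totals):
        e_c, n_c = e + 0.5, n + 1.0            # add 0.5 to both the event and non-event cells
        y.append(logit(e_c / n_c))
        v.append(1 / e_c + 1 / (n_c - e_c))    # approximate variance of the logit
    w = [1 / vi for vi in v]
    y_fixed = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    q = sum(wi * (yi - y_fixed) ** 2 for wi, yi in zip(w, y))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (len(y) - 1)) / c) if c > 0 else 0.0  # between-study variance
    w_re = [1 / (vi + tau2) for vi in v]
    mu = sum(wi * yi for wi, yi in zip(w_re, y)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))
    return inv_logit(mu), inv_logit(mu - 1.96 * se), inv_logit(mu + 1.96 * se)


# Hypothetical per-study 2 x 2 tables (tp, fp, fn, tn), not data from the included studies
tables = [(45, 20, 5, 80), (30, 10, 6, 54), (60, 35, 4, 41)]
pooled_sens = dl_pool([tp for tp, fp, fn, tn in tables], [tp + fn for tp, fp, fn, tn in tables])
pooled_spec = dl_pool([tn for tp, fp, fn, tn in tables], [tn + fp for tp, fp, fn, tn in tables])
print("sensitivity %.2f (%.2f-%.2f)" % pooled_sens)
print("specificity %.2f (%.2f-%.2f)" % pooled_spec)
```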
We performed meta-regression analyses to investigate the cause of heterogeneity. The following covariates were considered for the bivariate model: (1) proportion of patients with PCa (>50% vs ≤50%), (2) magnet strength of MRI (3 vs 1.5 T), (3) use of endorectal coil, (4) cutoff value (≥4 vs ≥3), (5) reference standard (radical prostatectomy vs biopsy), and (6) type of analysis (per patient vs per lesion). In addition, multiple subgroup analyses were performed for cutoff value, outcome, and previous biopsy history to assess various clinical settings: (1) a cutoff value of ≥4 for all studies, (2) a cutoff value of ≥3 for all studies, (3) a cutoff value of ≥4 for determining any PCa, (4) a cutoff value of ≥3 for determining any PCa, (5) a cutoff value of ≥4 for determining csPCa, (6) a cutoff value of ≥3 for determining csPCa, (7) a cutoff value of ≥4 in studies using per-patient analysis, (8) a cutoff value of ≥4 in studies using per-lesion analysis, (9) studies analyzing PZ PCa, (10) studies analyzing TZ PCa, (11) patients without previous biopsies, and (12) patients with previous biopsies. The “metandi” and “midas” modules in Stata 10.0 (StataCorp LP, College Station, TX, USA) and “mada” package in R software version 3.2.1 (R Foundation for Statistical Computing, Vienna, Austria) were used for statistical analyses, with p < 0.05 signifying statistical significance.
A systematic literature search initially identified 287 articles. After removing 46 duplicates, screening of the 241 titles and abstracts yielded 105 potentially eligible articles. Full-text reviews were performed, and 84 studies were excluded for the following reasons: not in the field of interest (n = 80, including 68 studies that used only PI-RADSv1), insufficient data to reconstruct 2 × 2 tables (n = 2), and shared study population with other studies (n = 2). Ultimately, 21 original articles including a total of 3857 patients assessing the diagnostic performance of PI-RADSv2 were included in the meta-analysis [16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36]. No additional studies were identified by screening the bibliographies of these 21 studies. Among them, 15 studies including 3099 patients dealt with PI-RADSv2 alone, whereas six studies including 758 patients provided a head-to-head comparison between PI-RADSv1 and PI-RADSv2 [16 20 21 28 32 33]. The detailed study selection process is described in Fig. 1.
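The study selection counts can be checked for internal consistency with a few lines of arithmetic. The sketch here only restates numbers reported in this paragraph and adds no new data.

```python
# Counts restated from the study selection paragraph
identified = 287
duplicates = 46
screened = identified - duplicates          # titles and abstracts screened
full_text_reviewed = 105
excluded = {
    "not in the field of interest": 80,
    "insufficient data for 2 x 2 tables": 2,
    "shared study population": 2,
}
included = full_text_reviewed - sum(excluded.values())

v2_only = {"studies": 15, "patients": 3099}
head_to_head = {"studies": 6, "patients": 758}

print(screened, sum(excluded.values()), included)
print(v2_only["patients"] + head_to_head["patients"])
```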
Patient characteristics are shown in Table 1. The study population ranged from 49 to 456 patients, and the percentage of patients with PCa ranged from 37% to 100%. The median age ranged from 62 to 69.6 yr, the median PSA level from 3.97 to 15 ng/ml, and Gleason scores from 5 to 10. In seven studies, all or some of the patients had already been diagnosed with PCa before MRI [16 17 22 26 27 32 35]. Biopsy was performed before MRI in seven studies [16 21 22 23 26 27 35], all patients were biopsy-naïve in three studies [18 25 34], both patient types were included in three studies [28 32 33], and data regarding previous biopsy were not reported in the remaining eight studies.
First author (year) | Country | Institution | Duration of patient recruitment | Patients (n) | Patients with PCa (n) | Age, median (yr) | Age, range (yr) | PSA, median (ng/ml) | PSA, range (ng/ml) | Gleason score, median | Gleason score, range | No. of previous biopsies | PCa diagnosis before MRI |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Auer (2016) [16] | Austria | Medical University of Innsbruck, Medizinische Hochschule Hannover | NR | 50 | NR | 63 a | NR | 7.3 a | NR | 7 (3 + 4) | 6–10 | ≥1 | Yes (all) |
Baldisserotto (2016) [17] | Brazil | Pontifícia Universidade Católica do Rio Grande do Sul | 2013.6–2015.6 | 54 | 33 | 65.9 a | 53–81 | 8.4 a | 3–31 | 7 (3 + 4) | 6–9 | NR | Yes (some) |
De Visschere (2017) [18] | Belgium | Ghent University Hospital | 2011.5–2014.12 | 245 | 144 | 66 | 44–85 | 9 | 1.4–935.5 | 7 (3 + 4) | ≥6 | 0 | No |
El-Samei (2016) [19] | Egypt | Al-Azher University, El-Minia University | 2014.5–2015.10 | 55 | 38 | 62 a | 51–79 | NR | NR | 7 | 6–10 | NR | No |
Feng (2016) [20] | China | Tongji Hospital | 2013.6–2015.7 | 401 | 150 | 64.4 a | 34–88 | 10.7 | 0.2–1763 | 7 (4 + 3) | ≤6–≥8 | NR | NR |
Kasel-Seibert (2016) [21] | Germany | University Hospital Jena | 2013.7–2015.3 | 82 | 31 | 65 | 48–81 | 13 | 1–111 | 7 (3 + 4) | 6–9 | 1–5 | No |
Lin (2016) [22] | Brazil, Taiwan | Ribeirão Preto School of Medicine, China Medical University, Ribeirão Preto School of Medicine | 2011.5–2014.6 | 49 | 49 | 63 | 46–73 | 13.27 | 1.8–41.4 | | | ≥1 | Yes (all) |
Martorana (2016) [23] | Italy | University of Modena and Reggio Emilia, Perugia University | 2014.1–2016.2 | 157 | 79 | 65 | 47–79 b | 10.7 | 1–75 | 6 | 6–8 | 1–3 | No |
Mertan (2016) [24] | USA | National Cancer Institute | 2015.3–2015.9 | 62 | 38 | 65.5 | 50.3–76.6 | 7.1 | 0.5–863 | 7 (3 + 4) | 6–9 | NR | NR |
Muller (2015) [25] | USA, the Netherlands | National Cancer Institute, AMC University Hospital, Edward Hébert School of Medicine | 2011.12–2014.5 | 94 | 94 | 62 | 37–79 | 8.51 | 0.7–51.1 | 7 (3 + 4) | 6–10 | 0 | No |
Park (2016) [26] | South Korea | Samsung Medical Center | 2012.1–2014.12 | 456 | 456 | 65 | 46–81 | 3.97 | 3.8–4.2 | NR | ≥6 | ≥1 | Yes (all) |
Park (2016) [27] | South Korea | Yonsei University College of Medicine | 2012.1–2013.3 | 425 | 425 | NR | NR | NR | 1.4–156.9 | 5–6 | 5–10 | ≥1 | Yes (all) |
Polanec (2016) [28] | Austria | Medical University of Vienna, Confraternität Vienna | 2011.6–2015.9 | 65 | NR | 65.3 a | 62.3–87.4 | 10.8 a | 4.2–74.5 | 7 (4 + 3) | 6–9 | ≥0 | No |
Rastinehad (2015) [29] | USA | Icahn School of Medicine at Mount Sinai, Fox Chase Cancer Center, National Institutes of Health, Hofstra North Shore LIJ School of Medicine | 2012.2–2014.11 | 312 | 202 | 65.1 | 60.3–70.3 b | 7.3 | 5.0–11.4 b | NR | NR | NR | No |
Rosenkrantz (2016) [30] | USA | NYU Langone Medical Center | 2013.9–2015.2 | 343 | 134 | 64 | NR | 5.8 | NR | 7 (3 + 4) | 6–9 | NR | NR |
Stanzione (2016) [31] | Italy | University “Federico II”, Ospedale S. Maria delle Grazie | NR | 82 | 34 | 65 a | 43–84 | 8.8 a | NR | 7 (3 + 4) | 6–9 | NR | No |
Tan (2016) [32] | USA, Taiwan | UCLA, Cathay General Hospital, Hoag Hospital | 2013.3–2016.12 | 106 | 63 | 66.5 | 43–79 | 7.9 | 5.6–10.6 | 7 (3 + 4) | 6–10 | ≥0 | Yes (some) |
Tewes (2016) [33] | Germany | Hannover Medical School, Klinikum der Region Hannover | 2012.12–2014.12 | 54 | 31 | 69.6 a | NR | 8.7 a | NR | 6 | 6–9 | ≥0 | No |
Washino (2017) [34] | Japan | Jichi Medical University Saitama Medical Center | 2010.6–2014.4 | 288 | 159 | 69 | 64–74 b | 7.5 | 5.5–11 b | NR | NR | 0 | No |
Woo (2016) [35] | South Korea | Seoul National University College of Medicine | 2011.1–2013.12 | 105 | 105 | 69 | 49–79 | 8.22 | 0.9–44.2 | 7 | 6–7 | ≥1 | Yes (all) |
Zhao (2016) [36] | China | Peking University First Hospital | 2010.11–2013.12 | 372 | 185 | 68.5 a | NR | 15 a | NR | 7 | 6–10 | NR | No |
Characteristics of the studies are summarized in Table 2. MRI was performed using 3-T scanners in 16 studies [16 17 18 20 24 25 26 27 28 29 30 31 32 33 35 36], 1.5-T scanners in four studies [19 21 22 23], and either 3 or 1.5 T in one study [34]. Endorectal coils were used in seven studies [19 21 22 23 24 25 29]. In all studies, the mpMRI protocol consisted of T2WI, DWI, and DCE-MRI. The reference standard was radical prostatectomy in five studies [16 22 26 27 35], a combination of systematic and targeted biopsies in seven studies [17 20 23 25 29 34 36], and targeted biopsy alone in seven studies [19 21 24 28 30 32 33]; the reference standard was not consistent throughout the study population in two studies [18 31]. PI-RADSv2 scoring was performed by one to five radiologists, either in consensus or independently. The level of experience of the radiologists was heterogeneous, ranging from 4 to 22 yr of experience in prostate MRI. In most studies, the readers were blinded; however, in three studies, the radiologists were aware that the patients had biopsy-proven PCa [22 26 35], and five studies were not explicit regarding blinding [18 19 30 32 36]. In the majority of the studies, the interval between MRI and the reference standard was less than 6 mo; however, the interval was not reported in 10 studies [16 17 18 24 27 28 29 30 32 33]. PCa was assessed separately according to zonal anatomy in seven studies [20 24 25 28 30 31 33]. However, in one of these studies [31], only PCa in the PZ could be evaluated, as no detailed TZ data were provided in the article and an attempt to contact the authors for further information was unsuccessful. Regarding the outcome assessed, eight studies evaluated any cancer [17 19 20 23 25 28 31 33], eight evaluated clinically significant cancer [18 22 26 27 30 34 35 36], and five evaluated both [16 21 24 29 32].
With regard to cutoff values, 13 studies used ≥4 [16 18 19 20 21 24 25 26 27 28 30 31 35], four studies used ≥3 [32 33 34 36], and four studies used both [17 22 23 29]. The location of PCa was reported separately for the PZ and TZ in six studies [20 24 25 28 30 33].
First author (year) | Design | Consecutive enrollment | Reference standard | MRI–reference standard interval | No. of readers | Experience (yr) | Blinding | Magnet strength (T) | Vendor | Model | Endorectal coil | PI-RADSv2 application | Cutoff values | Localization | Type of analysis | Outcome assessed | Definition of csPCa |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Auer (2016) [16] | Retrospective | Yes | RP | NR | 1 | >5 | Yes | 3 | Siemens | Skyra | No | Strict | 4 | Whole | Lesion | Any + csPCa | GS ≥7 (4 + 3) |
Baldisserotto (2016) [17] | Retrospective | Yes | STRUSGB + targeted MRI–TRUS biopsy (cognitive) or RP | NR | 2 Independent | >10/1 | Yes | 3 | GE | Signa HDxt | No | Strict | 3, 4 | Whole | Patient | Any | |
De Visschere (2017) [18] | Retrospective | NR | STRUSGB or RP | NR | NR | NR | NR | 3 | Siemens | Trio | No | Strict | 4 | Whole | Patient | csPCa | GS ≥7, ≥0.5 cc, or EPE |
El-Samei (2016) [19] | NR | NR | Targeted MRI–TRUS biopsy | ≤2 wk | NR | NR | NR | 1.5 | Philips | Gyroscan | Yes | Strict | 4 | Whole | Lesion | Any | |
Feng (2016) [20] | Retrospective | Yes | STRUSGB + targeted MRI–TRUS biopsy (cognitive) | 1–35 d | 2 Consensus | 5/4 | Yes | 3 | Siemens | Skyra | No | Strict | 4 | PZ, TZ | Patient | Any | |
Kasel-Seibert (2016) [21] | Retrospective | NR | Targeted MRGB | ≤7 d | 2 Independent | 10/<1 | Yes | 1.5 | Siemens | Avanto | Yes | Strict | 4 | Whole | Lesion | Any + csPCa | GS ≥7 |
Lin (2016) [22] | Retrospective | NR | RP | ≤6 mo | 2 Independent | 12/5 | Yes a | 1.5 | Philips | Achieva | Yes | Strict | 3, 4 | Whole | Lesion | csPCa | GS ≥7 and >0.5 cc |
Martorana (2016) [23] | Retrospective | Yes | TTB + targeted MRI–TRUS biopsy | 30–78 | 2 Consensus | 4/4 | Yes | 1.5 | Philips | Achieva | Yes | Strict | 3, 4 | Whole | Lesion | Any | GS ≥7, >0.5 cc, or EPE |
Mertan (2016) [24] | Prospective | Yes | STRUSGB + targeted MRI–TRUS biopsy | NR | 1 | >8 | Yes | 3 | Philips | Achieva | Yes | Strict | 4 | PZ, TZ | Lesion | Any + csPCa | GS ≥7 |
Muller (2015) [25] | Retrospective | Yes | STRUSGB + targeted MRI–TRUS biopsy | ≤6 wk | 5 Independent | 12/7/1/1/0.5 | Yes | 3 | Philips | Achieva | Yes | Strict | 4 | PZ, TZ | Lesion | Any | GS ≥7 (4 + 3) |
Park (2016) [26] | Retrospective | Yes | RP | 21–48 | 2 Independent | 14/3 | Yes a | 3 | Philips | Achieva | No | Strict | 4 | Whole | Patient | csPCa | GS ≥7, ≥0.5 cc, or EPE |
Park (2016) [27] | Retrospective | NR | RP | NR | 2 Independent | 9/4 | Yes | 3 | GE, Philips, Siemens | Discovery MR750, Achieva, Trio | No | Strict | 4 | Whole | Patient | csPCa | GS ≥7, ≥0.5 cc, or EPE |
Polanec (2016) [28] | Retrospective | Yes | Targeted MRGB | NR | 2 Independent | NR/NR | Yes | 3 | Siemens | Trio | No | Strict | 4 | PZ, TZ | Lesion | Any | |
Rastinehad (2015) [29] | Prospective | NR | STRUSGB + targeted MRI–TRUS biopsy | NR | 3 Consensus | NR | Yes | 3 | Siemens | Verio | Yes | Calculated b | 3, 4 | Whole | Patient | Any + csPCa | Epstein's criteria c for systematic and GS ≥7 or >0.5 cc for targeted biopsy |
Rosenkrantz (2016) [30] | Retrospective | NR | Targeted MRI–TRUS biopsy | NR | 2 Independent | 8/3 | NR | 3 | Siemens | Biograph mMR/Prisma/Skyra/Trio | No | Strict | 4 | PZ, TZ | Lesion | csPCa | GS ≥7 |
Stanzione (2016) [31] | Prospective | NR | STRUSGB (±targeted TRUSGB) | 20–30 d | 2 Independent | 14/10 | Yes | 3 | Siemens | Trio | No | Strict | 4 | PZ | Patient | Any | |
Tan (2016) [32] | Retrospective | Yes | Targeted MRGB | NR | 3 Consensus | 16/10 | NR | 3 | Siemens | Skyra, Trio, Verio | No | Strict | 3 | Whole | Lesion | Any + csPCa | GS ≥7 |
Tewes (2016) [33] | Retrospective | Yes | Targeted MRGB | NR | 2 Independent | 5/2 | Yes | 3 | Siemens | Skyra | No | Strict | 3 d | PZ, TZ | Patient | Any | |
Washino (2017) [34] | Retrospective | NR | TTB + targeted MRI–TRUS biopsy (cognitive) | 0.5–1.6 mo | 1 | 14 | Yes | 1.5 or 3 | Toshiba | Excelart Vantage/Vantage Titan 3T | No | Strict | 3 | Whole | Patient | csPCa | GS ≥7 or maximum core length ≥4 mm |
Woo (2016) [35] | Retrospective | NR | RP | ≤6 mo | 2 Consensus | 22/10 | Yes a | 3 | Philips, Siemens | Ingenia, Verio/Trio | No | Strict | 4 | Whole | Patient | csPCa | GS ≥7 |
Zhao (2016) [36] | Retrospective | NR | STRUSGB + targeted MRI–TRUS biopsy | ≤3 mo | 2 Independent | NR | NR | 3 | NR | NR | No | Strict | 3 | Whole | Patient | csPCa | GS ≥7 |
a Blinded but aware that patients had biopsy-proven PCa.
b PI-RADSv2 scores were generated from reports based on PI-RADSv1 and simplified qualitative system.
c Epstein's criteria = Gleason pattern ≥4, or Gleason 3 + 3 disease with core length ≥50% and/or >2 cores positive.
d For transition zone, cutoff value = 4.
csPCa = clinically significant prostate cancer; GS = Gleason score; EPE = extraprostatic extension; MRGB = magnetic resonance imaging-guided biopsy; MRI = magnetic resonance imaging; MRI–TRUS = fusion of magnetic resonance imaging and transrectal ultrasound images; NR = not reported; PCa = prostate cancer; PI-RADSv2 = Prostate Imaging Reporting and Data System version 2; PZ = peripheral zone; RP = radical prostatectomy; STRUSGB = systematic transrectal ultrasound-guided biopsy; TTB = transperineal template biopsy; TZ = transition zone.
Overall, the quality of the studies was not considered high, mainly because of the patient selection domain (Fig. 2). Regarding the patient selection domain, there was generally a high risk of bias, as all but four of the studies were retrospective [16 17 18 20 21 22 23 25 26 27 28 30 32 33 34 35 36]. Seven studies were considered to have high concern for applicability, as all or some of the patients had a pathological diagnosis of PCa prior to MRI [16 17 22 26 27 32 35]. Regarding the index test domain, there was a high risk of bias in nine studies. In three of these, the reviewers were aware that patients had biopsy-proven PCa [22 26 35]. In the other six studies, the cutoff value for determining PCa was not specified prior to interpretation [16 20 29 31 33 34]. Only one study had concern for applicability, as PI-RADSv2 scores were indirectly generated from existing clinical radiological reports based on PI-RADSv1 or an in-house scoring system [29]. Regarding the reference standard domain, eight studies had a high risk of bias. Seven were based on either systematic biopsy alone or targeted biopsy alone [18 19 21 28 30 32 33]; in one study, targeted biopsy was performed, but on lesions that were suspicious on ultrasonography rather than on MRI [31].
Studies in which radical prostatectomy or systematic plus targeted biopsy (MRI guided, MRI–transrectal ultrasound fusion, or cognitive) was used as the reference standard were considered to have a low risk of bias. In 10 studies, the definition of clinically significant cancer did not follow that described in the PI-RADSv2 guidelines, and these studies were therefore rated as having high concern for applicability [16 21 22 24 29 30 32 34 35 36]. Regarding the flow and timing domain, two studies had a high risk of bias, as not all patients received the same reference standard [17 18].
The sensitivity and specificity of individual studies ranged from 73% to 100% and from 8% to 100%, respectively. The Q test revealed substantial heterogeneity (p < 0.001). The Higgins I² statistic indicated substantial heterogeneity in sensitivity (I² = 85.55%) and considerable heterogeneity in specificity (I² = 95.30%). The coupled forest plot of sensitivity and specificity demonstrated the absence of a threshold effect (Fig. 3). The Spearman correlation coefficient between the sensitivity and the false-positive rate was 0.45 (95% confidence interval [CI] 0.023–0.738), also indicating the lack of a threshold effect.
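The heterogeneity and threshold-effect statistics above can be sketched in a few lines of Python. This is an illustrative, univariate version under stated assumptions: logit sensitivity per study with inverse-variance weights, a 0.5 continuity correction applied only when a cell is zero, and Spearman's rho computed as the Pearson correlation of average ranks. It is not the implementation in the Stata `midas` module used for this review, and the 2 × 2 counts are hypothetical.

```python
import math

# Hypothetical (tp, fp, fn, tn) counts for four studies -- NOT data from this review.
studies = [(45, 10, 5, 40), (30, 20, 4, 46), (80, 5, 12, 60), (25, 15, 1, 19)]


def logit_with_var(events, non_events):
    """Logit of a proportion and its approximate variance.

    A 0.5 continuity correction is added only when a cell is zero (an assumption).
    """
    if events == 0 or non_events == 0:
        events, non_events = events + 0.5, non_events + 0.5
    return math.log(events / non_events), 1 / events + 1 / non_events


def q_and_i2(estimates, variances):
    """Cochran's Q around the inverse-variance (fixed-effect) mean, and Higgins' I^2 in %."""
    weights = [1 / v for v in variances]
    pooled = sum(w * y for w, y in zip(weights, estimates)) / sum(weights)
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, estimates))
    df = len(estimates) - 1
    i2 = 100 * max(0.0, (q - df) / q) if q > 0 else 0.0
    return q, i2


def ranks(values):
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def spearman(x, y):
    """Spearman's rho as the Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


if __name__ == "__main__":
    logit_sens = [logit_with_var(tp, fn) for tp, fp, fn, tn in studies]
    q, i2 = q_and_i2([y for y, _ in logit_sens], [v for _, v in logit_sens])
    sens = [tp / (tp + fn) for tp, fp, fn, tn in studies]
    fpr = [fp / (fp + tn) for tp, fp, fn, tn in studies]
    print(f"Q = {q:.2f}, I^2 = {i2:.1f}%, Spearman(sens, FPR) = {spearman(sens, fpr):.2f}")
```

A positive correlation between sensitivity and false-positive rate across studies is the usual signature of a threshold effect, which is why the two are ranked against each other here.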
For all 21 studies combined, the pooled sensitivity was 0.89 (95% CI 0.86–0.92) with specificity of 0.73 (95% CI 0.60–0.83; Fig. 3). In the HSROC curve, there was a large difference between the 95% confidence region and the 95% prediction region, indicating heterogeneity between the studies (Fig. 4). The area under the HSROC curve was 0.91 (95% CI 0.88–0.93). According to Deeks' funnel plot, the likelihood of publication bias was low, with a p value of 0.75 for the slope coefficient (Fig. 5).
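For orientation, the sketch below pools logit sensitivities with a univariate DerSimonian–Laird random-effects model and back-transforms the result to a proportion with a 95% CI. This is a deliberately simpler stand-in: the estimates in this review came from the bivariate/HSROC models described in the Methods, which pool sensitivity and specificity jointly and account for their correlation, so the numbers will generally differ. The function names and the input counts are hypothetical.

```python
import math


def dersimonian_laird(estimates, variances):
    """Univariate DerSimonian-Laird random-effects pooling; returns (pooled, se, tau2)."""
    w = [1 / v for v in variances]
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, estimates)) / sw
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, estimates))
    df = len(estimates) - 1
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0  # between-study variance
    w_star = [1 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_star, estimates)) / sum(w_star)
    return pooled, math.sqrt(1 / sum(w_star)), tau2


def expit(x):
    """Inverse of the logit transform."""
    return 1 / (1 + math.exp(-x))


def pooled_sensitivity(pairs, z=1.96):
    """Pool (tp, fn) pairs on the logit scale; return (estimate, lower, upper) as proportions."""
    logits, variances = [], []
    for tp, fn in pairs:
        if tp == 0 or fn == 0:  # continuity correction only when needed (an assumption)
            tp, fn = tp + 0.5, fn + 0.5
        logits.append(math.log(tp / fn))
        variances.append(1 / tp + 1 / fn)
    pooled, se, _ = dersimonian_laird(logits, variances)
    return expit(pooled), expit(pooled - z * se), expit(pooled + z * se)


# Hypothetical (tp, fn) pairs -- NOT data from this review.
print(pooled_sensitivity([(45, 5), (30, 4), (80, 12), (25, 1)]))
```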
For the six studies that provided a head-to-head comparison between PI-RADSv1 and PI-RADSv2, PI-RADSv2 demonstrated higher pooled sensitivity of 0.95 (95% CI 0.85–0.98) compared with 0.88 (95% CI 0.80–0.93) for PI-RADSv1 (p = 0.04). However, the pooled specificity did not differ significantly between the two versions of PI-RADS: 0.73 (95% CI 0.47–0.89) for v2 and 0.75 (95% CI 0.36–0.94) for v1 (p = 0.90).
As we found considerable heterogeneity among the included studies, meta-regression analyses were performed (Supplementary Table 1). Among several potential variables, the proportion of patients with PCa, magnetic field strength, and reference standard were significant factors affecting heterogeneity (p < 0.01 for all). However, among these three, only specificity according to the proportion of patients with PCa showed a difference that was both statistically significant and clinically meaningful: 0.65 (95% CI 0.52–0.78) in studies with ≤50% of patients with PCa versus 0.86 (95% CI 0.75–0.97) in studies with >50% of patients with PCa. Otherwise, no clinically meaningful differences were seen: for magnet strength (3 vs 1.5 T), sensitivity of 0.90 (95% CI 0.86–0.94) versus 0.89 (95% CI 0.81–0.97, p = 0.03) and specificity of 0.73 (95% CI 0.59–0.86) versus 0.72 (95% CI 0.44–1.00, p = 0.81); for reference standard (radical prostatectomy vs biopsy), sensitivity of 0.89 (95% CI 0.83–0.95) versus 0.91 (95% CI 0.88–0.94, p < 0.01) and specificity of 0.65 (95% CI 0.37–0.94) versus 0.73 (95% CI 0.58–0.87, p = 0.48). Other variables, including the cutoff value, use of an endorectal coil, and type of analysis (per patient vs per lesion), were not significant factors (p = 0.32–0.70).
Because four studies used both ≥3 and ≥4 as cutoff values [17 22 23 29] and five studies assessed both any cancer and clinically significant cancer as outcomes [16 21 24 29 32], multiple subgroup analyses were performed to assess various clinical settings (Supplementary Table 2). Regarding cutoff values, the pooled sensitivity was 0.89 (95% CI 0.84–0.92) with specificity of 0.74 (95% CI 0.58–0.85) for 17 studies using ≥4 [16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 35], whereas these were 0.95 (95% CI 0.89–0.97) and 0.47 (95% CI 0.21–0.74) in eight studies using ≥3 [17 22 23 29 32 33 34 36]. Stratifying the studies according to the outcome assessed yielded the following results: (1) cutoff of ≥4 for determining any PCa, sensitivity of 0.89 (95% CI 0.83–0.93) with specificity of 0.80 (95% CI 0.62–0.90); (2) cutoff of ≥3 for determining any PCa, sensitivity of 0.96 (95% CI 0.93–0.98) with specificity of 0.49 (95% CI 0.29–0.70); (3) csPCa regardless of cutoff value, sensitivity of 0.89 (95% CI 0.84–0.92) with specificity of 0.64 (95% CI 0.46–0.78); (4) cutoff of ≥4 for determining csPCa, sensitivity of 0.90 (95% CI 0.85–0.94) with specificity of 0.62 (95% CI 0.45–0.77); and (5) cutoff of ≥3 for determining csPCa, sensitivity of 0.96 (95% CI 0.87–0.99) with specificity of 0.29 (95% CI 0.05–0.77). When the studies using a cutoff value of ≥4 were assessed separately according to the type of analysis, per-patient analysis in eight studies [17 18 20 26 27 29 31 35] yielded pooled sensitivity of 0.89 (95% CI 0.81–0.93) with specificity of 0.76 (95% CI 0.60–0.88), whereas per-lesion analysis in nine studies [16 19 21 22 23 24 25 28 30] yielded pooled sensitivity of 0.87 (95% CI 0.83–0.91) with specificity of 0.70 (95% CI 0.44–0.88).
Based on the localization of PCa, the pooled sensitivity was 0.93 (95% CI 0.87–0.96) with specificity of 0.68 (95% CI 0.43–0.86) in seven studies analyzing PZ cancers [20 24 25 28 30 31 33]. For TZ cancers, which were analyzed in the same studies except that by Stanzione et al [31], the pooled sensitivity and specificity were 0.88 (95% CI 0.77–0.94) and 0.75 (95% CI 0.59–0.86), respectively.
Studies including patients without previous biopsies yielded sensitivity of 0.82 (95% CI 0.72–0.90) and specificity of 0.75 (95% CI 0.65–0.83), whereas the diagnostic performance values were 0.87 (95% CI 0.80–0.92) and 0.71 (95% CI 0.42–0.89) in studies including patients with a history of previous biopsy.
In our meta-analysis, we assessed the diagnostic accuracy of PI-RADSv2 for detecting PCa. The pooled sensitivity and specificity of all 21 studies were 0.89 (95% CI 0.86–0.92) and 0.73 (95% CI 0.60–0.83), respectively. Compared with the only two existing meta-analyses of mpMRI for detecting PCa, our data suggest a trend toward higher sensitivity and lower specificity. In the study by de Rooij et al [37], which evaluated seven studies using a combination of T2WI, DWI, and DCE-MRI, the pooled sensitivity and specificity were 0.74 (95% CI 0.66–0.81) and 0.88 (95% CI 0.82–0.92), respectively. In a more recent meta-analysis by Hamoen et al [6], which analyzed 14 studies using PI-RADSv1, the pooled sensitivity and specificity were 0.78 (95% CI 0.70–0.84) and 0.79 (95% CI 0.68–0.86), respectively. However, comparing results across these three meta-analyses provides only an indirect comparison. To address this issue, we separately assessed the subgroup of studies that used both PI-RADSv1 and PI-RADSv2. In this head-to-head comparison, PI-RADSv2 demonstrated higher pooled sensitivity (0.95) than PI-RADSv1 (0.88, p = 0.04) without a statistically significant difference in specificity (0.73 vs 0.75, p = 0.90). This gain in sensitivity over its predecessor may imply that the revisions undertaken in developing PI-RADSv2, including the introduction of dominant sequences according to zonal anatomy, the limited contribution of DCE-MRI relative to DWI and T2WI, and specific guidance for deriving an integrated overall score, were in fact on the right track. In particular, we speculate that the use of dominant sequences, that is, DWI for the PZ and T2WI for the TZ, may have been crucial for the improved sensitivity without a loss in specificity, as suggested by Baur et al [10].
Considering that one of the main intentions behind PI-RADS was to standardize the reporting of mpMRI, thereby decreasing variability and promoting widespread acceptance and implementation in daily practice, it was promising to find that nearly all (20 of 21) studies applied PI-RADSv2 strictly according to the published guidelines [11]. Only one study derived PI-RADSv2 scores from existing clinical radiology reports that were based on PI-RADSv1 or an in-house scoring system [29]. This is an improvement over prior studies using PI-RADSv1, in which investigators used varying methods to determine the overall score (an overall five-point score or the sum of the scores from each sequence) [6]. Still, further clarification is needed regarding the cutoff value for detecting PCa. Among the studies included in our meta-analysis, cutoff values were predefined in only six, whereas the majority (15/21) were exploratory in nature, testing multiple criteria. A cutoff value of ≥4 yielded generally good sensitivity (0.89) and specificity (0.74), whereas ≥3 yielded excellent sensitivity (0.95) but poor specificity (0.47). These results may be taken into consideration when the next update of PI-RADS is developed. For instance, ≥4 may be adequate for general use of PI-RADS, whereas ≥3 could be indicated when a higher cancer detection rate is clinically required (eg, a persistently high PSA level despite a previous negative biopsy).
In the current study, subgroup analyses were performed to account for differences in outcomes (any cancer vs clinically significant cancer). There was no significant difference between the two outcomes, irrespective of whether a cutoff of ≥3 or ≥4 was used. However, the definition of clinically significant cancer varied among the 13 studies that assessed it. Only three studies defined csPCa strictly according to the PI-RADSv2 guidelines (Gleason score ≥7 [3 + 4], volume >0.5 ml, or extraprostatic extension) [11]; most others used one or two of the three criteria. Restricting the analysis to these three studies might have provided more robust results; however, including all available studies was pragmatic and gives a general overview of the existing literature, as this is the first meta-analysis of studies dealing with PI-RADSv2.
In this meta-analysis, we also examined technical aspects of MRI. Meta-regression analyses revealed that the use of an endorectal coil was not a statistically significant factor. Furthermore, although magnet strength showed a statistically significant difference between 3 and 1.5 T, this difference was not clinically meaningful (sensitivity of 0.90 vs 0.89, respectively; p = 0.03). Although both issues have been debated in the past, 3 and 1.5 T are now well established, and the overall benefit of using an endorectal coil is not evident [38 39]. The PI-RADSv2 guidelines currently accept either option, and the results of our study provide additional evidence to support this.
Regarding the methods of analysis, the studies varied considerably in reference standard and type of analysis. Radical prostatectomy was the reference standard in five studies, whereas the remaining studies were based on biopsy (targeted, systematic, or a combination); in these biopsy-based studies, the possibility of PCa despite negative biopsy results should be kept in mind. In addition, roughly half of the studies reported outcomes per patient (n = 11) and the other half per lesion (n = 10). Per-lesion analysis additionally reflects how well MRI localizes the disease; however, type of analysis was not a significant factor in the meta-regression analysis.
Our meta-analysis had some limitations. Nearly all studies were retrospective in design, resulting in a high risk of bias for patient selection. Pooling data from predominantly retrospective studies may have led to increased diagnostic sensitivity [40]. However, a meta-analysis restricted to the three prospective studies was technically unfeasible, and its results would not have been representative of the existing literature on PI-RADSv2. Furthermore, we used validated methods for the systematic review and reported the data according to standard reporting guidelines, including PRISMA and the guidelines of the Handbook for Diagnostic Test Accuracy Reviews published by the Cochrane Collaboration [12 41]. Another limitation is the considerable heterogeneity in our pooled analysis, which limits the general applicability of our summary estimates. To explore this heterogeneity, we performed meta-regression and multiple subgroup analyses, which identified the proportion of patients with PCa, the magnetic field strength, and the reference standard as significant factors. In particular, the reference standard encompassed various methods, including radical prostatectomy and a combination of systematic and targeted biopsies (ie, MRI guided, MRI–transrectal ultrasound fusion, or cognitive). The use of various definitions of clinically significant cancer also needs to be emphasized. Our meta-regression and subgroup analyses may explain some of the heterogeneity, but a portion remains unexplained. Another important limitation is the small number of studies available for head-to-head comparison between PI-RADSv1 and PI-RADSv2. Nevertheless, a statistically significant difference in sensitivity between the two versions was evident from the only six such studies published to date.
PI-RADSv2 shows good performance for the detection of PCa with pooled sensitivity of 0.89 and specificity of 0.73. PI-RADSv2 has higher pooled sensitivity compared with PI-RADSv1 without significantly different specificity.
Author contributions: Sang Youn Kim had full access to all the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis.
Study concept and design: Woo, Suh, S.Y. Kim.
Acquisition of data: Woo, Suh, S.Y. Kim.
Analysis and interpretation of data: Woo, Suh, S.Y. Kim.
Drafting of the manuscript: Woo.
Critical revision of the manuscript for important intellectual content: Suh, S.Y. Kim, Cho, S.H. Kim.
Statistical analysis: Suh.
Obtaining funding: None.
Administrative, technical, or material support: None.
Supervision: S.Y. Kim, Cho, S.H. Kim.
Other: None.
Financial disclosures: Sang Youn Kim certifies that all conflicts of interest, including specific financial interests and relationships and affiliations relevant to the subject matter or materials discussed in the manuscript (eg, employment/affiliation, grants or funding, consultancies, honoraria, stock ownership or options, expert testimony, royalties, or patents filed, received, or pending), are the following: None.
Funding/Support and role of the sponsor: None.
Prostate cancer (PCa) is the second leading cause of cancer-related mortality in Western men [1]. With recent technological advances and growing availability, multiparametric magnetic resonance imaging (mpMRI) is increasingly used to guide several aspects of PCa management, including detection, staging, and treatment planning [2]. Despite abundant evidence in the literature reporting high accuracy of mpMRI for PCa diagnosis, widespread acceptance has been hampered by several factors, including difficulty of interpretation, lack of standardized interpretation criteria (ie, reliance on Likert scales based on the radiologist's subjective level of suspicion for PCa), and the resulting substantial inter-reader variability [3 4].
To bring standardization to the evaluation and reporting of prostate mpMRI, the European Society of Urogenital Radiology (ESUR) published a guideline termed Prostate Imaging Reporting and Data System (PI-RADS) in 2012 [5]. PI-RADS was based on expert consensus and provides a detailed scoring system for each MRI sequence (T2-weighted imaging [T2WI], diffusion-weighted imaging [DWI], dynamic contrast-enhanced MRI [DCE-MRI], and MR spectroscopy) to rate the likelihood of clinically significant PCa (csPCa). Several investigators have validated the accuracy and reproducibility of PI-RADS, and a recent meta-analysis reported pooled sensitivity and specificity of 0.78 and 0.79, respectively [6]. However, as there was no guideline for generating an overall score, different research groups used various approaches: some summed the scores from each sequence (ranging from 3 to 15), whereas others assigned an overall score of 1–5 [7 8]. Furthermore, emerging data questioned the value of curve-type analysis of DCE-MRI [9]. In addition, investigators suggested that some sequences may be more important than others in determining the likelihood of PCa (ie, DWI in the peripheral zone [PZ] and T2WI in the transition zone [TZ]), arguing against equal weighting of all sequences [10].
To address these issues, the ESUR and the American College of Radiology recently released the updated PI-RADS version 2 (PI-RADSv2) [11]. The main changes from PI-RADSv1 to PI-RADSv2 are as follows: (1) introduction of dominant sequences according to zonal anatomy (DWI for the PZ and T2WI for the TZ), (2) a limited contribution of DCE-MRI, which is scored merely as the presence or absence of early focal enhancement, and (3) generation of an overall score (1–5) integrating findings across all MRI sequences.
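To make the dominant-sequence logic concrete, the sketch below derives an overall category from per-sequence scores. It reflects our reading of the PI-RADSv2 assessment rules [11]: in the PZ, the DWI score is carried over unless DWI is 3 and DCE is positive (upgraded to 4); in the TZ, the T2WI score is carried over unless T2WI is 3 and DWI is 5 (upgraded to 4). The function name `pirads_v2_overall` is hypothetical, and the guideline's handling of non-diagnostic DWI or absent DCE is deliberately left out; the guideline itself remains the authority for clinical use.

```python
def pirads_v2_overall(zone: str, t2w: int, dwi: int, dce_positive: bool) -> int:
    """Overall PI-RADSv2 category (1-5) from per-sequence scores.

    Sketch only: assumes all sequences are diagnostic; the guideline's rules
    for inadequate DWI or missing DCE are not modelled.
    """
    for name, score in (("t2w", t2w), ("dwi", dwi)):
        if not 1 <= score <= 5:
            raise ValueError(f"{name} score must be 1-5, got {score}")
    if zone == "PZ":
        # DWI is dominant; a positive DCE upgrades only an equivocal DWI score of 3.
        return 4 if dwi == 3 and dce_positive else dwi
    if zone == "TZ":
        # T2WI is dominant; DWI 5 upgrades only an equivocal T2WI score of 3.
        return 4 if t2w == 3 and dwi == 5 else t2w
    raise ValueError(f"zone must be 'PZ' or 'TZ', got {zone!r}")


# A PZ lesion with DWI 3 and early focal enhancement reaches category 4 only via DCE.
category = pirads_v2_overall("PZ", t2w=2, dwi=3, dce_positive=True)
print(category, category >= 4)
```

The example shows why DCE-MRI still matters under PI-RADSv2 despite its reduced role: it decides whether an equivocal PZ lesion crosses a ≥4 cutoff.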
Since then, several studies assessing the value of PI-RADSv2 have been published. However, the diagnostic performance of this new scoring system has not been evaluated systematically. Therefore, the purpose of our study was to assess the diagnostic performance of PI-RADSv2 for the detection of PCa. In addition, we aimed to compare the diagnostic performance of PI-RADSv1 and PI-RADSv2 in studies available for head-to-head comparison.
This meta-analysis was performed and written according to the preferred reporting items for systematic reviews and meta-analyses (PRISMA) guidelines [12] .
A computerized search of MEDLINE and EMBASE up to December 7, 2016, was performed in order to identify studies evaluating the diagnostic performance of PI-RADSv2 for the detection of PCa. The search query combined synonyms for PCa, MRI, and PI-RADS as follows: (prostate cancer OR prostatic cancer OR prostate neoplasm OR prostatic neoplasm OR prostate tumor OR prostatic tumor OR prostate carcinoma OR prostatic carcinoma OR PCa) AND (magnetic resonance imaging OR MRI OR MR) AND (prostate imaging reporting and data system OR pi-rads OR pi rads OR pirads). Bibliographies of identified articles were also screened to expand the scope of search. Our search was limited to publications in English.
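The search string is the conjunction of three OR-blocks of synonyms. A small sketch that assembles it is shown below; the term lists are copied from the query above, while the helper names are hypothetical. Database-specific syntax (phrase quoting, field tags, or MeSH/Emtree mapping) was not reported and is not modeled.

```python
# Synonym groups copied from the search query above.
PCA_TERMS = [
    "prostate cancer", "prostatic cancer", "prostate neoplasm", "prostatic neoplasm",
    "prostate tumor", "prostatic tumor", "prostate carcinoma", "prostatic carcinoma", "PCa",
]
MRI_TERMS = ["magnetic resonance imaging", "MRI", "MR"]
PIRADS_TERMS = ["prostate imaging reporting and data system", "pi-rads", "pi rads", "pirads"]


def or_block(terms):
    """Join synonyms with OR and wrap them in parentheses."""
    return "(" + " OR ".join(terms) + ")"


def build_query(*groups):
    """AND together one OR-block per concept."""
    return " AND ".join(or_block(group) for group in groups)


query = build_query(PCA_TERMS, MRI_TERMS, PIRADS_TERMS)
print(query)
```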
Studies were included if they satisfied all the following requirements according to the PICOS criteria [12] : (1) included patients with suspected or diagnosed PCa; (2) for index test, mpMRI of the prostate including all required sequences of T2WI, DWI, and DCE-MRI was performed and assessed with a PI-RADSv2 scoring system; (3) for comparison, a reference standard based on the histopathological examination of radical prostatectomy or biopsy was used; (4) results were reported in sufficient detail for the reconstruction of 2 × 2 tables and determination of sensitivity and specificity at specified cutoff values for evaluating the diagnostic performance of PI-RADSv2; and (5) studies had to be original articles.
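Criterion (4) requires that each study report enough detail to rebuild a 2 × 2 table at a specified cutoff. The sketch below shows that step for hypothetical per-patient data: a PI-RADSv2 score at or above the cutoff counts as test-positive, and sensitivity and specificity follow from the table. The class, function, and data are illustrative; per-lesion analyses would tabulate lesions instead of patients.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Counts of a dichotomized index test against the reference standard."""
    tp: int
    fp: int
    fn: int
    tn: int

    @property
    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)


def table_at_cutoff(scores, has_cancer, cutoff):
    """Treat a PI-RADSv2 score >= cutoff as test-positive and cross-tabulate."""
    tp = fp = fn = tn = 0
    for score, cancer in zip(scores, has_cancer):
        positive = score >= cutoff
        if positive and cancer:
            tp += 1
        elif positive:
            fp += 1
        elif cancer:
            fn += 1
        else:
            tn += 1
    return TwoByTwo(tp, fp, fn, tn)


# Hypothetical per-patient data: PI-RADSv2 score and reference-standard result.
scores = [1, 2, 3, 3, 4, 4, 5, 5, 2, 3]
has_cancer = [False, False, False, True, False, True, True, True, False, True]

for cutoff in (3, 4):
    t = table_at_cutoff(scores, has_cancer, cutoff)
    print(f">= {cutoff}: {t}, sensitivity {t.sensitivity:.2f}, specificity {t.specificity:.2f}")
```

With this toy data, lowering the cutoff from ≥4 to ≥3 raises sensitivity from 0.60 to 1.00 and lowers specificity from 0.80 to 0.60, the same direction (though not the magnitude) as the pooled cutoff results of this review.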
Studies were excluded if any of the following criteria were met: (1) studies involving <10 patients; (2) review articles, guidelines, consensus statements, letters, editorials, and conference abstracts; (3) studies using only PI-RADSv1 for the evaluation of mpMRI of the prostate; (4) studies focusing on topics other than using the PI-RADSv2 system for diagnosing PCa (eg, staging and prediction of biochemical recurrence); and (5) studies with overlapping patient populations.
Two reviewers (S.W. and C.H.S., with 3 yr of experience in performing systematic reviews and meta-analyses) independently evaluated the eligibility of the selected studies from the literature. Disagreements, if present between the two reviewers, were resolved by consensus via discussion with a third reviewer (S.Y.K.).
We extracted the following data regarding study design and results from the selected studies using a standardized form:
Study characteristics—authors, year of publication, country and institution of origin, duration of patient recruitment, and study design (prospective vs retrospective and consecutive or not)
Demographic and clinical characteristics—sample size, number of patients with PCa, patient age, prostate-specific antigen (PSA) level and Gleason score, number of previous biopsies, and PCa diagnosis prior to mpMRI
Technical characteristics of mpMRI—scanner model and manufacturer, magnetic field strength (1.5 vs 3 T), coil type (endorectal vs pelvic phased array), and specific sequences used (T2WI, DWI, DCE, or MR spectroscopy)
Interpretation of mpMRI—number of reviewers and experience in prostate mpMRI, independent or consensus reading, and blinding to clinicopathological information
Reference standard—type of reference standard (radical prostatectomy, targeted biopsy, or systematic biopsy), interval between MRI and pathology, outcomes assessed (any PCa vs csPCa), definition of csPCa (studies assessing “clinically significant”, “aggressive”, or “high-grade” PCa were all considered to assess csPCa; however, only studies that used the definition provided by the PI-RADSv2 guideline [Gleason score ≥7 [3 + 4], volume >0.5 ml, or extraprostatic extension] were considered to have low concern for applicability), separate analysis for the PZ and TZ, and type of analysis (per patient vs per lesion)
Diagnostic performance of PI-RADSv2 including criteria or cutoff values (in case of multiple readers, the results of the most experienced reader were extracted for this meta-analysis)
The methodological quality of the included studies was assessed using tailored questionnaires and criteria provided by Quality Assessment of Diagnostic Accuracy Studies-2 [13] . Data extraction and quality assessment were performed independently by two reviewers (S.W. and C.H.S.). All disagreements were resolved by consensus through discussion with the third reviewer (S.Y.K.).
The diagnostic performance of PI-RADSv2 for the detection of PCa was the primary outcome for this meta-analysis. In addition, a comparison between the diagnostic performance of PI-RADSv2 and that of PI-RADSv1 using studies that reported head-to-head comparison data of the two PI-RADS versions was considered a secondary outcome.
Pooled estimates of sensitivity and specificity were calculated using hierarchical logistic regression modeling including bivariate modeling and hierarchical summary receiver operating characteristic (HSROC) modeling [14] . For graphical presentation of the results, an HSROC curve with 95% confidence region and prediction region was plotted. Publication bias was evaluated using the Deeks’ funnel plot, and statistical significance was tested with the Deeks’ asymmetry test [15] .
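Deeks' asymmetry test regresses the log diagnostic odds ratio on the inverse square root of the effective sample size, weighting each study by that effective sample size [15]. A minimal sketch is shown below, assuming ESS = 4·n1·n2/(n1 + n2) with n1 diseased and n2 non-diseased participants, and a 0.5 continuity correction only for tables containing a zero cell. It returns the slope and its t statistic; a two-sided p value could then be obtained from a t distribution with k − 2 degrees of freedom (for example, with `scipy.stats.t.sf`). It is not the code used in this review, and the counts are hypothetical.

```python
import math


def ln_dor(tp, fp, fn, tn):
    """Log diagnostic odds ratio; adds 0.5 to every cell only if any cell is zero."""
    if 0 in (tp, fp, fn, tn):
        tp, fp, fn, tn = (c + 0.5 for c in (tp, fp, fn, tn))
    return math.log((tp * tn) / (fp * fn))


def effective_sample_size(n_diseased, n_healthy):
    return 4 * n_diseased * n_healthy / (n_diseased + n_healthy)


def weighted_regression(x, y, w):
    """Weighted least squares for y = a + b*x; returns (a, b, standard error of b)."""
    sw = sum(w)
    xm = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ym = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - xm) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xm) * (yi - ym) for wi, xi, yi in zip(w, x, y))
    b = sxy / sxx
    a = ym - b * xm
    rss = sum(wi * (yi - a - b * xi) ** 2 for wi, xi, yi in zip(w, x, y))
    sigma2 = rss / (len(x) - 2)  # residual variance on k - 2 degrees of freedom
    return a, b, math.sqrt(sigma2 / sxx)


def deeks_test(tables):
    """Slope, t statistic, and df for ln(DOR) regressed on 1/sqrt(ESS), weighted by ESS."""
    x, y, w = [], [], []
    for tp, fp, fn, tn in tables:
        ess = effective_sample_size(tp + fn, fp + tn)
        x.append(1 / math.sqrt(ess))
        y.append(ln_dor(tp, fp, fn, tn))
        w.append(ess)
    _, slope, se = weighted_regression(x, y, w)
    return slope, slope / se, len(tables) - 2


# Hypothetical (tp, fp, fn, tn) counts -- NOT data from this review.
tables = [(45, 10, 5, 40), (30, 20, 4, 46), (80, 5, 12, 60), (25, 15, 1, 19)]
slope, t_stat, df = deeks_test(tables)
print(f"slope = {slope:.3f}, t = {t_stat:.2f} on {df} df")
```

A slope near zero, as implied by the reported p value of 0.75, means the diagnostic odds ratio does not change systematically with study size.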
We performed meta-regression analyses to investigate the cause of heterogeneity. The following covariates were considered for the bivariate model: (1) proportion of patients with PCa (>50% vs ≤50%), (2) magnet strength of MRI (3 vs 1.5 T), (3) use of endorectal coil, (4) cutoff value (≥4 vs ≥3), (5) reference standard (radical prostatectomy vs biopsy), and (6) type of analysis (per patient vs per lesion). In addition, multiple subgroup analyses were performed for cutoff value, outcome, and previous biopsy history to assess various clinical settings: (1) a cutoff value of ≥4 for all studies, (2) a cutoff value of ≥3 for all studies, (3) a cutoff value of ≥4 for determining any PCa, (4) a cutoff value of ≥3 for determining any PCa, (5) a cutoff value of ≥4 for determining csPCa, (6) a cutoff value of ≥3 for determining csPCa, (7) a cutoff value of ≥4 in studies using per-patient analysis, (8) a cutoff value of ≥4 in studies using per-lesion analysis, (9) studies analyzing PZ PCa, (10) studies analyzing TZ PCa, (11) patients without previous biopsies, and (12) patients with previous biopsies. The “metandi” and “midas” modules in Stata 10.0 (StataCorp LP, College Station, TX, USA) and “mada” package in R software version 3.2.1 (R Foundation for Statistical Computing, Vienna, Austria) were used for statistical analyses, with p < 0.05 signifying statistical significance.
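Each meta-regression covariate is a study-level indicator. The sketch below codes the binary covariates for three studies using values transcribed from Tables 1 and 2 (Feng [20], Kasel-Seibert [21], and Lin [22]). The coding scheme and the `Study` class are assumptions for illustration; in particular, the PCa proportion is computed here from patient counts in Table 1, and the review does not state how it was derived for per-lesion studies. The cutoff covariate is omitted because it can vary within a study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Study:
    label: str
    n_patients: int
    n_with_pca: int
    field_strength_t: float
    endorectal_coil: bool
    reference_is_rp: bool  # radical prostatectomy (True) vs biopsy (False)
    per_lesion: bool       # per-lesion (True) vs per-patient (False) analysis


def covariate_row(s: Study) -> dict:
    """Binary covariates in the spirit of the meta-regression (hypothetical coding)."""
    return {
        "pca_proportion_gt_50": int(s.n_with_pca / s.n_patients > 0.5),
        "field_3t": int(s.field_strength_t == 3.0),
        "endorectal_coil": int(s.endorectal_coil),
        "reference_rp": int(s.reference_is_rp),
        "per_lesion": int(s.per_lesion),
    }


# Values transcribed from Tables 1 and 2.
feng = Study("Feng (2016) [20]", 401, 150, 3.0, False, False, False)
kasel_seibert = Study("Kasel-Seibert (2016) [21]", 82, 31, 1.5, True, False, True)
lin = Study("Lin (2016) [22]", 49, 49, 1.5, True, True, True)

for study in (feng, kasel_seibert, lin):
    print(study.label, covariate_row(study))
```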
A systematic literature search initially identified 287 articles. After removing 46 duplicates, screening of the 241 titles and abstracts yielded 105 potentially eligible articles. Full-text reviews were performed, and 84 studies were excluded for the following reasons: not in the field of interest (n = 80, including 68 studies that used only PI-RADSv1), insufficient data to reconstruct 2 × 2 tables (n = 2), and shared study population with other studies (n = 2). Ultimately, 21 original articles including a total of 3857 patients assessing the diagnostic performance of PI-RADSv2 were included in the meta-analysis [16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36]. No additional studies were identified via screening the bibliographies of these 21 studies. Among them, 15 studies including 3099 patients dealt with PI-RADSv2 alone, whereas six studies including 758 patients provided a head-to-head comparison between PI-RADSv1 and PI-RADSv2 [16 20 21 28 32 33]. The detailed study selection process is described in Fig. 1.
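The counts in the selection flow can be checked with simple arithmetic. The sketch below uses only the numbers reported above; the number excluded at title and abstract screening (241 − 105 = 136) is derived rather than stated in the text.

```python
# Counts as reported in the study-selection paragraph (Fig. 1).
identified = 287
duplicates = 46
screened = identified - duplicates  # titles and abstracts screened
full_text_reviewed = 105
excluded_full_text = {
    "not in the field of interest": 80,  # includes 68 PI-RADSv1-only studies
    "insufficient data for 2 x 2 tables": 2,
    "shared study population": 2,
}
included = full_text_reviewed - sum(excluded_full_text.values())

# Patients split by study type.
patients_v2_only = 3099      # 15 studies
patients_head_to_head = 758  # 6 studies

print(
    f"screened={screened}, excluded at title/abstract stage={screened - full_text_reviewed}, "
    f"included={included}, patients={patients_v2_only + patients_head_to_head}"
)
```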
Patient characteristics are shown in Table 1 . The size of the study population ranged from 49 to 456 patients, with the percentage of those with PCa ranging from 37% to 100%. The patients had a median age of 62–69.6 yr, median PSA of 3.97–15 ng/ml, and a Gleason score ranging from 5 to 10. Patients had already been diagnosed with PCa prior to MRI in all or some of the study populations in seven studies [16 17 22 26 27 32 35] . Biopsy was performed before MRI in seven studies [16 21 22 23 26 27 35] , all patients were biopsy-naïve in three studies [18 25 34] , both patient types were included in three studies [28 32 33] , and data regarding previous biopsy were not reported in the remaining eight studies.
First author (year) | Country | Institution | Duration of patient recruitment | Patients (n) | Patients with PCa (n) | Age, median (yr) | Age, range (yr) | PSA, median (ng/ml) | PSA, range (ng/ml) | Gleason score, median | Gleason score, range | No. of previous biopsies | PCa diagnosis before MRI |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Auer (2016) [16] | Austria | Medical University of Innsbruck, Medizinische Hochschule Hannover | NR | 50 | NR | 63 a | NR | 7.3 a | NR | 7 (3 + 4) | 6–10 | ≥1 | Yes (all) |
Baldisserotto (2016) [17] | Brazil | Pontifícia Universidade Católica do Rio Grande do Sul | 2013.6–2015.6 | 54 | 33 | 65.9 a | 53–81 | 8.4 a | 3–31 | 7 (3 + 4) | 6–9 | NR | Yes (some) |
De Visschere (2017) [18] | Belgium | Ghent University Hospital | 2011.5–2014.12 | 245 | 144 | 66 | 44–85 | 9 | 1.4–935.5 | 7 (3 + 4) | ≥6 | 0 | No |
El-Samei (2016) [19] | Egypt | Al-Azher University, El-Minia University | 2014.5–2015.10 | 55 | 38 | 62 a | 51–79 | NR | NR | 7 | 6–10 | NR | No |
Feng (2016) [20] | China | Tongji Hospital | 2013.6–2015.7 | 401 | 150 | 64.4 a | 34–88 | 10.7 | 0.2–1763 | 7 (4 + 3) | ≤6–≥8 | NR | NR |
Kasel-Seibert (2016) [21] | Germany | University Hospital Jena | 2013.7–2015.3 | 82 | 31 | 65 | 48–81 | 13 | 1–111 | 7 (3 + 4) | 6–9 | 1–5 | No |
Lin (2016) [22] | Brazil, Taiwan | Ribeirão Preto School of Medicine, China Medical University, Ribeirão Preto School of Medicine | 2011.5–2014.6 | 49 | 49 | 63 | 46–73 | 13.27 | 1.8–41.4 |  |  | ≥1 | Yes (all) |
Martorana (2016) [23] | Italy | University of Modena and Reggio Emilia, Perugia University | 2014.1–2016.2 | 157 | 79 | 65 | 47–79 b | 10.7 | 1–75 | 6 | 6–8 | 1–3 | No |
Mertan (2016) [24] | USA | National Cancer Institute | 2015.3–2015.9 | 62 | 38 | 65.5 | 50.3–76.6 | 7.1 | 0.5–863 | 7 (3 + 4) | 6–9 | NR | NR |
Muller (2015) [25] | USA, the Netherlands | National Cancer Institute, AMC University Hospital, Edward Hébert School of Medicine | 2011.12–2014.5 | 94 | 94 | 62 | 37–79 | 8.51 | 0.7–51.1 | 7 (3 + 4) | 6–10 | 0 | No |
Park (2016) [26] | South Korea | Samsung Medical Center | 2012.1–2014.12 | 456 | 456 | 65 | 46–81 | 3.97 | 3.8–4.2 | NR | ≥6 | ≥1 | Yes (all) |
Park (2016) [27] | South Korea | Yonsei University College of Medicine | 2012.1–2013.3 | 425 | 425 | NR | NR | NR | 1.4–156.9 | 5–6 | 5–10 | ≥1 | Yes (all) |
Polanec (2016) [28] | Austria | Medical University of Vienna, Confraternität Vienna | 2011.6–2015.9 | 65 | NR | 65.3 a | 62.3–87.4 | 10.8 a | 4.2–74.5 | 7 (4 + 3) | 6–9 | ≥0 | No |
Rastinehad (2015) [29] | USA | Icahn School of Medicine at Mount Sinai, Fox Chase Cancer Center, National Institutes of Health, Hofstra North Shore LIJ School of Medicine | 2012.2–2014.11 | 312 | 202 | 65.1 | 60.3–70.3 b | 7.3 | 5.0–11.4 b | NR | NR | NR | No |
Rosenkrantz (2016) [30] | USA | NYU Langone Medical Center | 2013.9–2015.2 | 343 | 134 | 64 | NR | 5.8 | NR | 7 (3 + 4) | 6–9 | NR | NR |
Stanzione (2016) [31] | Italy | University “Federico II”, Ospedale S. Maria delle Grazie | NR | 82 | 34 | 65 a | 43–84 | 8.8 a | NR | 7 (3 + 4) | 6–9 | NR | No |
Tan (2016) [32] | USA, Taiwan | UCLA, Cathay General Hospital, Hoag Hospital | 2013.3–2016.12 | 106 | 63 | 66.5 | 43–79 | 7.9 | 5.6–10.6 | 7 (3 + 4) | 6–10 | ≥0 | Yes (some) |
Tewes (2016) [33] | Germany | Hannover Medical School, Klinikum der Region Hannover | 2012.12–2014.12 | 54 | 31 | 69.6 a | NR | 8.7 a | NR | 6 | 6–9 | ≥0 | No |
Washino (2017) [34] | Japan | Jichi Medical University Saitama Medical Center | 2010.6–2014.4 | 288 | 159 | 69 | 64–74 b | 7.5 | 5.5–11 b | NR | NR | 0 | No |
Woo (2016) [35] | South Korea | Seoul National University College of Medicine | 2011.1–2013.12 | 105 | 105 | 69 | 49–79 | 8.22 | 0.9–44.2 | 7 | 6–7 | ≥1 | Yes (all) |
Zhao (2016) [36] | China | Peking University First Hospital | 2010.11–2013.12 | 372 | 185 | 68.5 a | NR | 15 a | NR | 7 | 6–10 | NR | No |
Characteristics of the studies are summarized in Table 2. MRI was performed using 3-T scanners in 16 studies [16 17 18 20 24 25 26 27 28 29 30 31 32 33 35 36], 1.5-T scanners in four studies [19 21 22 23], and either 3 or 1.5 T in one study [34]. Endorectal coils were used in seven studies [19 21 22 23 24 25 29]. In all studies, the mpMRI protocol consisted of T2WI, DWI, and DCE-MRI. The reference standard was radical prostatectomy in five studies [16 22 26 27 35], a combination of systematic and targeted biopsies in seven studies [17 20 23 25 29 34 36], and targeted biopsy alone in seven studies [19 21 24 28 30 32 33]; in two studies, the reference standard was not consistent throughout the study population [18 31]. PI-RADSv2 scoring was performed by one to five radiologists, either in consensus or independently. The radiologists' experience was heterogeneous, ranging from 4 to 22 yr in prostate imaging. In most studies, the readers were blinded; however, in three studies, the radiologists were aware that the patients had biopsy-proven PCa [22 26 35], and five studies were not explicit regarding blinding [18 19 30 32 36]. In most studies, the interval between MRI and the reference standard was less than 6 mo; however, the interval was not reported in 10 studies [16 17 18 24 27 28 29 30 32 33]. PCa was separately assessed according to zonal anatomy in seven studies [20 24 25 28 30 31 33]. However, in one study [31], only PCa in the PZ could be evaluated, as no detailed data were provided in the article and an attempt to contact the authors for further information was unsuccessful. Regarding the outcome assessed, eight studies evaluated any cancer [17 19 20 23 25 28 31 33], eight evaluated clinically significant cancer [18 22 26 27 30 34 35 36], and five evaluated both [16 21 24 29 32].
With regard to cutoff values, 13 studies used ≥4 [16 18 19 20 21 24 25 26 27 28 30 31 35], four studies used ≥3 [32 33 34 36], and four studies used both [17 22 23 29]. The location of PCa was reported separately for the PZ and TZ in six studies [20 24 25 28 30 33].
First author (year) | Design | Consecutive enrollment | Reference standard | MRI–reference standard interval | No. of readers | Reader experience (yr) | Reader blinding | Magnet strength (T) | Vendor | Model | Endorectal coil | PI-RADSv2 application | Cutoff values | Localization | Type of analysis | Outcome assessed | Definition of csPCa |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Auer (2016) [16] | Retrospective | Yes | RP | NR | 1 | >5 | Yes | 3 | Siemens | Skyra | No | Strict | 4 | Whole | Lesion | Any + csPCa | GS ≥7 (4 + 3) |
Baldisserotto (2016) [17] | Retrospective | Yes | STRUSGB + targeted MRI–TRUS biopsy (cognitive) or RP | NR | 2, independent | >10/1 | Yes | 3 | GE | Signa HDxt | No | Strict | 3, 4 | Whole | Patient | Any | |
De Visschere (2017) [18] | Retrospective | NR | STRUSGB or RP | NR | NR | NR | NR | 3 | Siemens | Trio | No | Strict | 4 | Whole | Patient | csPCa | GS ≥7, ≥0.5 cc, or EPE |
El-Samei (2016) [19] | NR | NR | Targeted MRI–TRUS biopsy | ≤2 wk | NR | NR | NR | 1.5 | Philips | Gyroscan | Yes | Strict | 4 | Whole | Lesion | Any | |
Feng (2016) [20] | Retrospective | Yes | STRUSGB + targeted MRI–TRUS biopsy (cognitive) | 1–35 d | 2, consensus | 5/4 | Yes | 3 | Siemens | Skyra | No | Strict | 4 | PZ, TZ | Patient | Any | |
Kasel-Seibert (2016) [21] | Retrospective | NR | Targeted MRGB | ≤7 d | 2, independent | 10/<1 | Yes | 1.5 | Siemens | Avanto | Yes | Strict | 4 | Whole | Lesion | Any + csPCa | GS ≥7 |
Lin (2016) [22] | Retrospective | NR | RP | ≤6 mo | 2, independent | 12/5 | Yesᵃ | 1.5 | Philips | Achieva | Yes | Strict | 3, 4 | Whole | Lesion | csPCa | GS ≥7 and >0.5 cc |
Martorana (2016) [23] | Retrospective | Yes | TTB + targeted MRI–TRUS biopsy | 30–78 | 2, consensus | 4/4 | Yes | 1.5 | Philips | Achieva | Yes | Strict | 3, 4 | Whole | Lesion | Any | GS ≥7, >0.5 cc, or EPE |
Mertan (2016) [24] | Prospective | Yes | STRUSGB + targeted MRI–TRUS biopsy | NR | 1 | >8 | Yes | 3 | Philips | Achieva | Yes | Strict | 4 | PZ, TZ | Lesion | Any + csPCa | GS ≥7 |
Muller (2015) [25] | Retrospective | Yes | STRUSGB + targeted MRI–TRUS biopsy | ≤6 wk | 5, independent | 12/7/1/1/0.5 | Yes | 3 | Philips | Achieva | Yes | Strict | 4 | PZ, TZ | Lesion | Any | GS ≥7 (4 + 3) |
Park (2016) [26] | Retrospective | Yes | RP | 21–48 | 2, independent | 14/3 | Yesᵃ | 3 | Philips | Achieva | No | Strict | 4 | Whole | Patient | csPCa | GS ≥7, ≥0.5 cc, or EPE |
Park (2016) [27] | Retrospective | NR | RP | NR | 2, independent | 9/4 | Yes | 3 | GE, Philips, Siemens | Discovery MR750, Achieva, Trio | No | Strict | 4 | Whole | Patient | csPCa | GS ≥7, ≥0.5 cc, or EPE |
Polanec (2016) [28] | Retrospective | Yes | Targeted MRGB | NR | 2, independent | NR/NR | Yes | 3 | Siemens | Trio | No | Strict | 4 | PZ, TZ | Lesion | Any | |
Rastinehad (2015) [29] | Prospective | NR | STRUSGB + targeted MRI–TRUS biopsy | NR | 3, consensus | NR | Yes | 3 | Siemens | Verio | Yes | Calculatedᵇ | 3, 4 | Whole | Patient | Any + csPCa | Epstein's criteriaᶜ for systematic and GS ≥7 or >0.5 cc for targeted biopsy |
Rosenkrantz (2016) [30] | Retrospective | NR | Targeted MRI–TRUS biopsy | NR | 2, independent | 8/3 | NR | 3 | Siemens | Biograph mMR/Prisma/Skyra/Trio | No | Strict | 4 | PZ, TZ | Lesion | csPCa | GS ≥7 |
Stanzione (2016) [31] | Prospective | NR | STRUSGB (±targeted TRUSGB) | 20–30 d | 2, independent | 14/10 | Yes | 3 | Siemens | Trio | No | Strict | 4 | PZ | Patient | Any | |
Tan (2016) [32] | Retrospective | Yes | Targeted MRGB | NR | 3, consensus | 16/10 | NR | 3 | Siemens | Skyra, Trio, Verio | No | Strict | 3 | Whole | Lesion | Any + csPCa | GS ≥7 |
Tewes (2016) [33] | Retrospective | Yes | Targeted MRGB | NR | 2, independent | 5/2 | Yes | 3 | Siemens | Skyra | No | Strict | 3ᵈ | PZ, TZ | Patient | Any | |
Washino (2017) [34] | Retrospective | NR | TTB + targeted MRI–TRUS biopsy (cognitive) | 0.5–1.6 mo | 1 | 14 | Yes | 1.5 or 3 | Toshiba | Excelart Vantage/Vantage Titan 3T | No | Strict | 3 | Whole | Patient | csPCa | GS ≥7 or maximum core length ≥4 mm |
Woo (2016) [35] | Retrospective | NR | RP | ≤6 mo | 2, consensus | 22/10 | Yesᵃ | 3 | Philips, Siemens | Ingenia, Verio/Trio | No | Strict | 4 | Whole | Patient | csPCa | GS ≥7 |
Zhao (2016) [36] | Retrospective | NR | STRUSGB + targeted MRI–TRUS biopsy | ≤3 mo | 2, independent | NR | NR | 3 | NR | NR | No | Strict | 3 | Whole | Patient | csPCa | GS ≥7 |
ᵃ Blinded but aware that patients had biopsy-proven PCa.
ᵇ PI-RADSv2 scores were generated from reports based on PI-RADSv1 and a simplified qualitative system.
ᶜ Epstein's criteria = Gleason pattern ≥4, or Gleason 3 + 3 disease with core length ≥50% and/or >2 cores positive.
ᵈ For the transition zone, the cutoff value was 4.
csPCa = clinically significant prostate cancer; GS = Gleason score; EPE = extraprostatic extension; MRGB = magnetic resonance imaging-guided biopsy; MRI = magnetic resonance imaging; MRI–TRUS = fusion of magnetic resonance imaging and transrectal ultrasound images; NR = not reported; PCa = prostate cancer; PI-RADSv2 = Prostate Imaging Reporting and Data System version 2; PZ = peripheral zone; RP = radical prostatectomy; STRUSGB = systematic transrectal ultrasound-guided biopsy; TTB = transperineal template biopsy; TZ = transition zone.
Overall, the quality of the studies was not considered high, mainly because of the patient selection domain (Fig. 2). In the patient selection domain, the risk of bias was generally high, as all but four of the studies were retrospective [16 17 18 20 21 22 23 25 26 27 28 30 32 33 34 35 36]. Seven studies were considered to have high concern for applicability, as all or some of the patients had a pathological diagnosis of PCa prior to MRI [16 17 22 26 27 32 35]. In the index test domain, nine studies had a high risk of bias. In three of these nine studies, the readers were aware that the patients had biopsy-proven PCa [22 26 35]. In the other six studies, the cutoff value for determining PCa was not specified prior to interpretation [16 20 29 31 33 34]. Only one study raised concern for applicability, as its PI-RADSv2 scores were indirectly generated from existing clinical radiological reports based on PI-RADSv1 or an in-house scoring system [29]. In the reference standard domain, eight studies had a high risk of bias. Seven of these used either systematic biopsy alone or targeted biopsy alone [18 19 21 28 30 32 33]; in the remaining study, targeted biopsy was performed on lesions that were suspicious on ultrasonography rather than on MRI [31].
Studies in which radical prostatectomy or systematic plus targeted biopsy (MRI guided, MRI–transrectal ultrasound fusion, or cognitive) was used as the reference standard were considered to have a low risk of bias. In 10 studies, the definition of clinically significant cancer did not follow that of the PI-RADSv2 guidelines; these studies were therefore judged to have high concern for applicability [16 21 22 24 29 30 32 34 35 36]. In the flow and timing domain, two studies had a high risk of bias because not all patients received the same reference standard [17 18].
The sensitivity and specificity of the individual studies ranged from 73% to 100% and from 8% to 100%, respectively. The Q-test revealed substantial heterogeneity (p < 0.001). The Higgins I² statistic indicated substantial heterogeneity in sensitivity (I² = 85.55%) and considerable heterogeneity in specificity (I² = 95.30%). The coupled forest plot of sensitivity and specificity demonstrated the absence of a threshold effect (Fig. 3). The Spearman correlation coefficient between the sensitivity and the false-positive rate was 0.45 (95% confidence interval [CI] 0.023–0.738), also indicating the lack of a threshold effect.
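As a rough illustration of the heterogeneity and threshold-effect checks described above, the following stdlib-only Python sketch computes Cochran's Q and Higgins' I² for logit-transformed per-study sensitivities (fixed-effect, inverse-variance weights) and Spearman's rank correlation between per-study sensitivity and false-positive rate. The function names, the 0.5 continuity correction, and the 2 × 2 counts are hypothetical; the Stata and R modules used in the study may compute these quantities differently, so this sketch is not expected to reproduce the reported values.

```python
import math


def logit_and_variance(events: int, total: int) -> tuple[float, float]:
    """Logit of events/total and its approximate (delta-method) variance.

    Adds 0.5 to both cells only when one of them is zero (an assumed
    convention; software packages differ in how they handle zero cells).
    """
    a, b = float(events), float(total - events)
    if a == 0 or b == 0:
        a, b = a + 0.5, b + 0.5
    return math.log(a / b), 1.0 / a + 1.0 / b


def cochran_q_i2(events: list[int], totals: list[int]) -> tuple[float, float]:
    """Cochran's Q and Higgins' I^2 (in %) for logit-transformed proportions."""
    pairs = [logit_and_variance(x, n) for x, n in zip(events, totals)]
    weights = [1.0 / var for _, var in pairs]
    pooled = sum(w * y for w, (y, _) in zip(weights, pairs)) / sum(weights)
    q = sum(w * (y - pooled) ** 2 for w, (y, _) in zip(weights, pairs))
    df = len(pairs) - 1
    # I^2 = (Q - df) / Q, truncated at zero
    i2 = 100.0 * max(0.0, (q - df) / q) if q > 0 else 0.0
    return q, i2


def average_ranks(values: list[float]) -> list[float]:
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def spearman(x: list[float], y: list[float]) -> float:
    """Spearman's rho computed as the Pearson correlation of average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


# Hypothetical per-study 2 x 2 counts: (TP, FN, FP, TN)
studies = [(45, 5, 20, 30), (88, 12, 15, 85), (30, 1, 40, 10), (60, 15, 5, 70)]
sens = [tp / (tp + fn) for tp, fn, _, _ in studies]
fpr = [fp / (fp + tn) for _, _, fp, tn in studies]
q, i2 = cochran_q_i2([tp for tp, *_ in studies], [tp + fn for tp, fn, *_ in studies])
print(f"Q = {q:.2f}, I^2 = {i2:.1f}%")
print(f"Spearman rho (sensitivity vs FPR) = {spearman(sens, fpr):.2f}")
```

The same I² function applies to specificity by passing true negatives and the number of patients without PCa per study.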
For all 21 studies combined, the pooled sensitivity was 0.89 (95% CI 0.86–0.92) with specificity of 0.73 (95% CI 0.60–0.83; Fig. 3 ). In the HSROC curve, there was a large difference between the 95% confidence region and the 95% prediction region, thus indicating heterogeneity between the studies ( Fig. 4 ). The area under the HSROC curve was 0.91 (95% CI 0.88–0.93). According to the Deeks’ funnel plot, the likelihood of publication bias was low, with a p value of 0.75 for the slope coefficient ( Fig. 5 ).
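The Deeks' asymmetry test mentioned above regresses the log diagnostic odds ratio (DOR) of each study on 1/√ESS, weighting by the effective sample size ESS = 4·n₁·n₂/(n₁ + n₂), where n₁ and n₂ are the numbers of patients with and without PCa; a slope that differs significantly from zero suggests funnel plot asymmetry. The stdlib-only Python sketch below shows that calculation under stated assumptions: the function names and data are hypothetical, and adding 0.5 to all cells of tables containing a zero is an assumed convention. It returns the slope, its standard error, the t statistic, and the degrees of freedom; the two-sided p value would then be read from a t distribution with k − 2 degrees of freedom (for example, via `scipy.stats.t.sf`).

```python
import math


def weighted_slope(x: list[float], y: list[float], w: list[float]) -> tuple[float, float]:
    """Weighted least-squares slope of y on x and its standard error."""
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    intercept = ybar - slope * xbar
    rss = sum(wi * (yi - intercept - slope * xi) ** 2 for wi, xi, yi in zip(w, x, y))
    # Residual variance uses k - 2 degrees of freedom (intercept + slope)
    se = math.sqrt(rss / (len(x) - 2) / sxx)
    return slope, se


def deeks_test(tables: list[tuple[int, int, int, int]]) -> tuple[float, float, float, int]:
    """Regress ln(DOR) on 1/sqrt(ESS), weighted by ESS.

    tables: per-study (TP, FN, FP, TN). Returns (slope, se, t, df).
    """
    x, y, w = [], [], []
    for tp, fn, fp, tn in tables:
        n_diseased, n_healthy = tp + fn, fp + tn
        ess = 4 * n_diseased * n_healthy / (n_diseased + n_healthy)
        if 0 in (tp, fn, fp, tn):
            # Assumed continuity correction so that ln(DOR) stays finite
            tp, fn, fp, tn = (c + 0.5 for c in (tp, fn, fp, tn))
        x.append(1 / math.sqrt(ess))
        y.append(math.log(tp * tn / (fn * fp)))
        w.append(ess)
    slope, se = weighted_slope(x, y, w)
    t = slope / se if se > 0 else float("nan")
    return slope, se, t, len(x) - 2


# Hypothetical per-study counts (TP, FN, FP, TN)
example = [(40, 5, 10, 45), (90, 10, 30, 70), (25, 0, 8, 17), (150, 20, 40, 160)]
slope, se, t, df = deeks_test(example)
print(f"slope = {slope:.2f}, SE = {se:.2f}, t = {t:.2f}, df = {df}")
```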
For the six studies that provided a head-to-head comparison between PI-RADSv1 and PI-RADSv2, PI-RADSv2 demonstrated higher pooled sensitivity of 0.95 (95% CI 0.85–0.98) compared with 0.88 (95% CI 0.80–0.93) for PI-RADSv1 ( p = 0.04). However, the pooled specificity did not show a significant difference between the two versions of PI-RADS: 0.73 (95% CI 0.47–0.89) for v2 and 0.75 (95% CI 0.36–0.94) for v1 ( p = 0.90).
Because we found considerable heterogeneity among the included studies, we performed meta-regression analyses (Supplementary Table 1). Among the potential covariates, the proportion of patients with PCa, magnetic field strength, and reference standard were significant factors affecting heterogeneity (p < 0.01 for all). However, of these three, only the proportion of patients with PCa produced a difference that was both statistically significant and clinically meaningful, in specificity: 0.65 (95% CI 0.52–0.78) in studies in which ≤50% of patients had PCa versus 0.86 (95% CI 0.75–0.97) in studies in which >50% had PCa. The differences for the other two factors were not clinically meaningful: for magnet strength (3 vs 1.5 T), sensitivity was 0.90 (95% CI 0.86–0.94) versus 0.89 (95% CI 0.81–0.97; p = 0.03) and specificity was 0.73 (95% CI 0.59–0.86) versus 0.72 (95% CI 0.44–1.00; p = 0.81); for reference standard (radical prostatectomy vs biopsy), sensitivity was 0.89 (95% CI 0.83–0.95) versus 0.91 (95% CI 0.88–0.94; p < 0.01) and specificity was 0.65 (95% CI 0.37–0.94) versus 0.73 (95% CI 0.58–0.87; p = 0.48). Other covariates, including cutoff value, use of an endorectal coil, and type of analysis (per patient vs per lesion), were not significant factors (p = 0.32–0.70).
Because four studies used both ≥3 and ≥4 as cutoff values [17 22 23 29] and five studies assessed both any cancer and clinically significant cancer as outcomes [16 21 24 29 32], we performed multiple subgroup analyses to assess various clinical settings (Supplementary Table 2). Regarding cutoff values, the pooled sensitivity and specificity were 0.89 (95% CI 0.84–0.92) and 0.74 (95% CI 0.58–0.85) in 17 studies using ≥4 [16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 35], and 0.95 (95% CI 0.89–0.97) and 0.47 (95% CI 0.21–0.74) in eight studies using ≥3 [17 22 23 29 32 33 34 36]. Stratifying the studies according to the outcome assessed yielded the following results: (1) cutoff of ≥4 for determining any PCa, sensitivity of 0.89 (95% CI 0.83–0.93) with specificity of 0.80 (95% CI 0.62–0.90); (2) cutoff of ≥3 for determining any PCa, sensitivity of 0.96 (95% CI 0.93–0.98) with specificity of 0.49 (95% CI 0.29–0.70); (3) csPCa regardless of cutoff value, sensitivity of 0.89 (95% CI 0.84–0.92) with specificity of 0.64 (95% CI 0.46–0.78); (4) cutoff of ≥4 for determining csPCa, sensitivity of 0.90 (95% CI 0.85–0.94) with specificity of 0.62 (95% CI 0.45–0.77); and (5) cutoff of ≥3 for determining csPCa, sensitivity of 0.96 (95% CI 0.87–0.99) with specificity of 0.29 (95% CI 0.05–0.77). When the studies using a cutoff value of ≥4 were assessed separately according to the type of analysis, per-patient analysis in eight studies [17 18 20 26 27 29 31 35] yielded a pooled sensitivity of 0.89 (95% CI 0.81–0.93) with specificity of 0.76 (95% CI 0.60–0.88), whereas per-lesion analysis in nine studies [16 19 21 22 23 24 25 28 30] yielded a pooled sensitivity of 0.87 (95% CI 0.83–0.91) with specificity of 0.70 (95% CI 0.44–0.88).
By localization, the pooled sensitivity and specificity for PZ cancers in seven studies [20 24 25 28 30 31 33] were 0.93 (95% CI 0.87–0.96) and 0.68 (95% CI 0.43–0.86), respectively. For TZ cancers, which were analyzed in the same studies except that by Stanzione et al [31], the pooled sensitivity and specificity were 0.88 (95% CI 0.77–0.94) and 0.75 (95% CI 0.59–0.86), respectively.
Studies including patients without previous biopsies yielded sensitivity of 0.82 (95% CI 0.72–0.90) and specificity of 0.75 (95% CI 0.65–0.83), whereas the diagnostic performance values were 0.87 (95% CI 0.80–0.92) and 0.71 (95% CI 0.42–0.89) in studies including patients with a history of previous biopsy.
In our meta-analysis, we assessed the diagnostic accuracy of PI-RADSv2 for detecting PCa. The pooled sensitivity and specificity of all 21 studies were 0.89 (95% CI 0.86–0.92) and 0.73 (95% CI 0.60–0.83), respectively. Compared with the only two existing meta-analyses of mpMRI for detecting PCa, our data suggest a trend toward higher sensitivity and lower specificity. In the study by de Rooij et al [37], which evaluated seven studies using a combination of T2WI, DWI, and DCE-MRI, the pooled sensitivity and specificity were 0.74 (95% CI 0.66–0.81) and 0.88 (95% CI 0.82–0.92), respectively. In a more recent meta-analysis by Hamoen et al [6], which analyzed 14 studies using PI-RADSv1, the pooled sensitivity and specificity were 0.78 (95% CI 0.70–0.84) and 0.79 (95% CI 0.68–0.86), respectively. However, comparing these three meta-analyses provides only indirect evidence. To address this issue, we separately assessed a subgroup of studies using both PI-RADSv1 and PI-RADSv2. In this head-to-head comparison, PI-RADSv2 demonstrated higher pooled sensitivity than PI-RADSv1 (0.95 vs 0.88, p = 0.04) without a statistically significant difference in specificity (0.73 vs 0.75, p = 0.90). This gain in sensitivity over its predecessor may imply that the revisions made in developing PI-RADSv2, including the introduction of dominant sequences according to zonal anatomy, the limited role of DCE-MRI secondary to DWI and T2WI, and specific guidelines for deriving an integrated overall score, were indeed on the right track. In particular, we speculate that the use of dominant sequences, that is, DWI for the PZ and T2WI for the TZ, may have been crucial for the improved sensitivity without a loss in specificity, as suggested by Baur et al [10].
Considering that one of the main aims of PI-RADS was to standardize the reporting of mpMRI in order to decrease variability and promote widespread acceptance and implementation in daily practice, it was encouraging that nearly all (20 of 21) studies applied PI-RADSv2 strictly according to the published guidelines [11]. Only one study derived PI-RADSv2 scores from existing clinical radiological reports that were based on PI-RADSv1 or an in-house scoring system [29]. This is an improvement over prior studies using PI-RADSv1, in which investigators used varying methods to determine the overall score (an overall five-point score or the sum of the scores from each sequence) [6]. Still, the cutoff value for detecting PCa needs further clarification. In the studies included in our meta-analysis, cutoff values were predefined in only six studies, while the majority (15/21) were exploratory, testing multiple criteria. A cutoff value of ≥4 yielded generally good sensitivity (0.89) and specificity (0.74), whereas ≥3 yielded excellent sensitivity (0.95) but poor specificity (0.47). These results may be considered in developing the next update of PI-RADS. For instance, a cutoff of ≥4 may be adequate for general use, whereas ≥3 could be reserved for situations in which a higher cancer detection rate is clinically required (eg, a persistently high PSA level despite a previously negative biopsy).
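The sensitivity/specificity trade-off between the ≥3 and ≥4 cutoffs follows directly from how an ordinal PI-RADSv2 score is dichotomized into a 2 × 2 table. The minimal Python sketch below illustrates this with ten hypothetical patients (the scores, disease labels, and function name are invented for illustration and are not data from any included study): lowering the cutoff converts some false negatives into true positives, but also some true negatives into false positives.

```python
def sens_spec_at_cutoff(scores: list[int], has_pca: list[bool], cutoff: int) -> tuple[float, float]:
    """Sensitivity and specificity when a score >= cutoff is called positive."""
    tp = fn = fp = tn = 0
    for score, diseased in zip(scores, has_pca):
        positive = score >= cutoff
        if diseased and positive:
            tp += 1
        elif diseased:
            fn += 1
        elif positive:
            fp += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical PI-RADSv2 scores: five patients with PCa, five without
SCORES = [5, 4, 4, 3, 2, 4, 3, 3, 2, 1]
HAS_PCA = [True] * 5 + [False] * 5

for cutoff in (4, 3):
    sens, spec = sens_spec_at_cutoff(SCORES, HAS_PCA, cutoff)
    print(f">= {cutoff}: sensitivity {sens:.2f}, specificity {spec:.2f}")
# >= 4: sensitivity 0.60, specificity 0.80
# >= 3: sensitivity 0.80, specificity 0.40
```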
In the current study, subgroup analyses were performed to account for differences in outcomes (any cancer vs clinically significant cancer). There was no significant difference between the two outcomes, irrespective of whether a cutoff of ≥3 or ≥4 was used. However, the definition of clinically significant cancer varied among the 13 studies that assessed it. Only three studies defined csPCa strictly according to the PI-RADSv2 guidelines (Gleason score ≥7 [including 3 + 4], volume ≥0.5 ml, or extraprostatic extension) [11]; most others used one or two of these three criteria. Including only these three studies might have yielded more robust results; however, including all available studies was pragmatic and provides a general overview of the existing literature, as this is the first meta-analysis of studies on PI-RADSv2.
In this meta-analysis, we also examined technical aspects of MRI. Meta-regression analyses revealed that the use of an endorectal coil was not a statistically significant factor. Furthermore, although magnet strength showed a statistically significant difference between 3 and 1.5 T, this difference was not clinically meaningful (sensitivity of 0.90 vs 0.89, respectively; p = 0.03). Although both issues have been debated in the past, 3- and 1.5-T scanners are now both well established, and the overall benefit of an endorectal coil is not evident [38 39]. The PI-RADSv2 guidelines currently accept either field strength, with or without an endorectal coil, and our results provide additional evidence to support this.
Regarding the methods of analysis, the studies varied in reference standard and type of analysis. Radical prostatectomy was the reference standard in five studies, whereas the remaining studies relied mainly on biopsy, either targeted alone or combined with systematic biopsy. The possibility of PCa being missed despite negative biopsy results in the latter group should be kept in mind. In addition, outcomes were reported per patient in about half of the studies (n = 11) and per lesion in the rest (n = 10). Per-lesion analysis also reflects the ability to localize the disease; however, type of analysis was not a significant factor in the meta-regression analysis.
Our meta-analysis had some limitations. Nearly all studies were retrospective, resulting in a high risk of bias for patient selection. Pooling data from predominantly retrospective studies may have inflated diagnostic sensitivity [40]. However, a meta-analysis restricted to the only three prospective studies was not technically feasible, and its results would not have been representative of the existing literature on PI-RADSv2. Moreover, we used validated methods for the systematic review and reported the data according to standard reporting guidelines, including PRISMA and the Handbook for Diagnostic Test Accuracy Reviews published by the Cochrane Collaboration [12 41]. Another limitation is the considerable heterogeneity in our pooled analysis, which limits the general applicability of our summary estimates. To explore this heterogeneity, we performed meta-regression and multiple subgroup analyses, which identified the proportion of patients with PCa, the magnetic field strength, and the reference standard as significant factors. In particular, the reference standard comprised various methods, including radical prostatectomy and combinations of systematic and targeted biopsies (ie, MRI guided, MRI–transrectal ultrasound fusion, or cognitive). The use of various definitions of clinically significant cancer also needs to be emphasized. Our meta-regression and subgroup analyses may explain some of the heterogeneity, but a portion remains unexplained. Another important limitation is the small number of studies available for the head-to-head comparison between PI-RADSv1 and PI-RADSv2. Nevertheless, we were able to demonstrate a statistically significant difference in sensitivity between the two versions using the six studies published to date.
PI-RADSv2 shows good performance for the detection of PCa with pooled sensitivity of 0.89 and specificity of 0.73. PI-RADSv2 has higher pooled sensitivity compared with PI-RADSv1 without significantly different specificity.
Author contributions: Sang Youn Kim had full access to all the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis.
Study concept and design: Woo, Suh, S.Y. Kim.
Acquisition of data: Woo, Suh, S.Y. Kim.
Analysis and interpretation of data: Woo, Suh, S.Y. Kim.
Drafting of the manuscript: Woo.
Critical revision of the manuscript for important intellectual content: Suh, S.Y. Kim, Cho, S.H. Kim.
Statistical analysis: Suh.
Obtaining funding: None.
Administrative, technical, or material support: None.
Supervision: S.Y. Kim, Cho, S.H. Kim.
Other: None.
Financial disclosures: Sang Youn Kim certifies that all conflicts of interest, including specific financial interests and relationships and affiliations relevant to the subject matter or materials discussed in the manuscript (eg, employment/affiliation, grants or funding, consultancies, honoraria, stock ownership or options, expert testimony, royalties, or patents filed, received, or pending), are the following: None.
Funding/Support and role of the sponsor: None.
Prostate cancer (PCa) is the second leading cause of cancer-related mortality in Western men [1]. With recent technological advancements and growing availability, multiparametric magnetic resonance imaging (mpMRI) is increasingly being used to guide several aspects of PCa management, including detection, staging, and treatment planning [2]. Despite abundant evidence in the literature reporting high accuracy of mpMRI for PCa diagnosis, its widespread acceptance has been hampered by several factors, including difficulty of interpretation, lack of standardized interpretation criteria (ie, use of Likert scales based on the radiologist's subjective level of suspicion for PCa), and the resulting substantial inter-reader variability [3 4].
To bring standardization to the evaluation and reporting of prostate mpMRI, the European Society of Urogenital Radiology (ESUR) published a guideline termed Prostate Imaging Reporting and Data System (PI-RADS) in 2012 [5]. PI-RADS was based on expert consensus and provides a detailed scoring system for each MRI sequence (T2-weighted imaging [T2WI], diffusion-weighted imaging [DWI], dynamic contrast-enhanced MRI [DCE-MRI], and MR spectroscopy) for the presence of clinically significant PCa (csPCa). Several investigators have validated the accuracy and reproducibility of PI-RADS, and a recent meta-analysis reported pooled sensitivity and specificity of 0.78 and 0.79, respectively [6]. However, as there was no guideline for generating an overall score, research groups used various approaches: some used the sum of the scores from each sequence (ranging from 3 to 15), whereas others used an overall score of 1–5 [7 8]. Furthermore, emerging data questioned the value of curve-type analysis of DCE-MRI [9]. In addition, investigators suggested that, rather than weighting all sequences equally, certain sequences should be considered more important in determining the likelihood of PCa (ie, DWI in the peripheral zone [PZ] and T2WI in the transition zone [TZ]) [10].
To address these issues, the ESUR and the American College of Radiology recently released the updated PI-RADS version 2 (PI-RADSv2) [11]. The main changes from PI-RADSv1 to PI-RADSv2 are (1) the introduction of dominant sequences according to zonal anatomy (DWI for the PZ and T2WI for the TZ), (2) a limited role for DCE-MRI, which is assessed only as the presence or absence of early focal enhancement, and (3) the generation of an overall score (1–5) integrating findings across all MRI sequences.
Since then, several studies assessing the value of PI-RADSv2 have been published. However, the diagnostic performance of this new scoring system has not been evaluated systematically. Therefore, the purpose of our study was to assess the diagnostic performance of PI-RADSv2 for the detection of PCa. In addition, we aimed to compare the diagnostic performance of PI-RADSv1 and PI-RADSv2 in studies available for head-to-head comparison.
This meta-analysis was performed and written according to the preferred reporting items for systematic reviews and meta-analyses (PRISMA) guidelines [12] .
A computerized search of MEDLINE and EMBASE up to December 7, 2016, was performed in order to identify studies evaluating the diagnostic performance of PI-RADSv2 for the detection of PCa. The search query combined synonyms for PCa, MRI, and PI-RADS as follows: (prostate cancer OR prostatic cancer OR prostate neoplasm OR prostatic neoplasm OR prostate tumor OR prostatic tumor OR prostate carcinoma OR prostatic carcinoma OR PCa) AND (magnetic resonance imaging OR MRI OR MR) AND (prostate imaging reporting and data system OR pi-rads OR pi rads OR pirads). Bibliographies of identified articles were also screened to expand the scope of search. Our search was limited to publications in English.
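To make the structure of the search string explicit (three OR-groups of synonyms joined by AND), the following Python sketch rebuilds the query quoted above from term lists. The function name and the optional phrase quoting are hypothetical conveniences; the text does not state how phrases, field tags, or truncation were entered in MEDLINE or EMBASE, so any database-specific syntax would need to be checked against each interface.

```python
PCA_TERMS = [
    "prostate cancer", "prostatic cancer", "prostate neoplasm", "prostatic neoplasm",
    "prostate tumor", "prostatic tumor", "prostate carcinoma", "prostatic carcinoma", "PCa",
]
MRI_TERMS = ["magnetic resonance imaging", "MRI", "MR"]
PIRADS_TERMS = ["prostate imaging reporting and data system", "pi-rads", "pi rads", "pirads"]


def build_query(*groups: list[str], quote_phrases: bool = False) -> str:
    """OR the synonyms within each group, then AND the groups together.

    quote_phrases (hypothetical option) wraps multiword terms in double quotes,
    which many databases require for exact-phrase matching.
    """
    def fmt(term: str) -> str:
        return f'"{term}"' if quote_phrases and " " in term else term

    return " AND ".join("(" + " OR ".join(fmt(t) for t in group) + ")" for group in groups)


print(build_query(PCA_TERMS, MRI_TERMS, PIRADS_TERMS))
```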
Studies were included if they satisfied all the following requirements according to the PICOS criteria [12] : (1) included patients with suspected or diagnosed PCa; (2) for index test, mpMRI of the prostate including all required sequences of T2WI, DWI, and DCE-MRI was performed and assessed with a PI-RADSv2 scoring system; (3) for comparison, a reference standard based on the histopathological examination of radical prostatectomy or biopsy was used; (4) results were reported in sufficient detail for the reconstruction of 2 × 2 tables and determination of sensitivity and specificity at specified cutoff values for evaluating the diagnostic performance of PI-RADSv2; and (5) studies had to be original articles.
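Inclusion criterion (4) requires that each study's results can be reconstructed as a 2 × 2 table. The Python sketch below shows how sensitivity, specificity, and the diagnostic odds ratio follow from such a table, with a per-study 95% interval. The class and function names are hypothetical, the counts are invented (chosen only so the point estimates are easy to check), and the Wilson score interval is our illustrative choice; the software used in the study may report a different interval type.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Index test (PI-RADSv2 at a given cutoff) versus the reference standard."""
    tp: int  # test positive, PCa on reference standard
    fp: int  # test positive, no PCa
    fn: int  # test negative, PCa
    tn: int  # test negative, no PCa

    @property
    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    @property
    def diagnostic_odds_ratio(self) -> float:
        return (self.tp * self.tn) / (self.fp * self.fn)


def wilson_ci(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (illustrative choice)."""
    p = successes / n
    denom = 1 + z ** 2 / n
    center = (p + z ** 2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return center - half, center + half


table = TwoByTwo(tp=89, fp=27, fn=11, tn=73)  # hypothetical study
lo, hi = wilson_ci(table.tp, table.tp + table.fn)
print(f"sensitivity {table.sensitivity:.2f} (95% CI {lo:.2f}-{hi:.2f})")
print(f"specificity {table.specificity:.2f}, DOR {table.diagnostic_odds_ratio:.1f}")
```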
Studies were excluded if any of the following criteria were met: (1) studies involving <10 patients; (2) review articles, guidelines, consensus statements, letters, editorials, and conference abstracts; (3) studies using only PI-RADSv1 for the evaluation of mpMRI of the prostate; (4) studies focusing on topics other than the use of PI-RADSv2 for diagnosing PCa (eg, staging and prediction of biochemical recurrence); and (5) studies with overlapping patient populations.
Two reviewers (S.W. and C.H.S., with 3 yr of experience in performing systematic reviews and meta-analyses) independently evaluated the eligibility of the selected studies from the literature. Disagreements, if present between the two reviewers, were resolved by consensus via discussion with a third reviewer (S.Y.K.).
We extracted the following data regarding study design and results from the selected studies using a standardized form:
Study characteristics—authors, year of publication, country and institution of origin, duration of patient recruitment, and study design (prospective vs retrospective and consecutive or not)
Demographic and clinical characteristics—sample size, number of patients with PCa, patient age, prostate-specific antigen (PSA) level and Gleason score, number of previous biopsies, and PCa diagnosis prior to mpMRI
Technical characteristics of mpMRI—scanner model and manufacturer, magnetic field strength (1.5 vs 3 T), coil type (endorectal vs pelvic phased array), and specific sequences used (T2WI, DWI, DCE, or MR spectroscopy)
Interpretation of mpMRI—number of reviewers and experience in prostate mpMRI, independent or consensus reading, and blinding to clinicopathological information
Reference standard—type of reference standard (radical prostatectomy, targeted biopsy, or systematic biopsy), interval between MRI and pathology, outcomes assessed (any PCa vs csPCa), definition of csPCa (studies assessing “clinically significant”, “aggressive”, or “high-grade” PCa were all considered to assess csPCa; however, only studies that used the definition provided by the PI-RADSv2 guideline [Gleason score ≥7 (including 3 + 4), volume ≥0.5 ml, or extraprostatic extension] were considered to have low concern for applicability), separate analysis for the PZ and TZ, and type of analysis (per patient vs per lesion)
Diagnostic performance of PI-RADSv2 including criteria or cutoff values (in case of multiple readers, the results of the most experienced reader were extracted for this meta-analysis)
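Because the extracted csPCa definitions varied, the same tumor can count as clinically significant in one study and not in another. The following Python sketch encodes the PI-RADSv2 definition (Gleason score ≥7 including 3 + 4, volume ≥0.5 ml, or extraprostatic extension) alongside the "GS ≥7" definition used by several included studies (Table 2). The class and function names are hypothetical, and real pathology data would need additional handling (for example, multifocal tumors or biopsy-only volume estimates) that is not modeled here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tumor:
    primary_gleason: int
    secondary_gleason: int
    volume_ml: float
    extraprostatic_extension: bool


def is_cspca_pirads_v2(t: Tumor) -> bool:
    """PI-RADSv2 definition: GS >= 7 (including 3 + 4), volume >= 0.5 ml, or EPE."""
    return (
        t.primary_gleason + t.secondary_gleason >= 7
        or t.volume_ml >= 0.5
        or t.extraprostatic_extension
    )


def is_cspca_gleason_only(t: Tumor) -> bool:
    """Narrower definition used by several included studies: GS >= 7 only."""
    return t.primary_gleason + t.secondary_gleason >= 7


# A 0.8-ml Gleason 3 + 3 tumor is csPCa under PI-RADSv2 but not under "GS >= 7"
example = Tumor(primary_gleason=3, secondary_gleason=3, volume_ml=0.8, extraprostatic_extension=False)
print(is_cspca_pirads_v2(example), is_cspca_gleason_only(example))
```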
The methodological quality of the included studies was assessed using tailored questionnaires and criteria provided by Quality Assessment of Diagnostic Accuracy Studies-2 [13] . Data extraction and quality assessment were performed independently by two reviewers (S.W. and C.H.S.). All disagreements were resolved by consensus through discussion with the third reviewer (S.Y.K.).
The diagnostic performance of PI-RADSv2 for the detection of PCa was the primary outcome for this meta-analysis. In addition, a comparison between the diagnostic performance of PI-RADSv2 and that of PI-RADSv1 using studies that reported head-to-head comparison data of the two PI-RADS versions was considered a secondary outcome.
Pooled estimates of sensitivity and specificity were calculated using hierarchical logistic regression modeling including bivariate modeling and hierarchical summary receiver operating characteristic (HSROC) modeling [14] . For graphical presentation of the results, an HSROC curve with 95% confidence region and prediction region was plotted. Publication bias was evaluated using the Deeks’ funnel plot, and statistical significance was tested with the Deeks’ asymmetry test [15] .
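The bivariate and HSROC models used here jointly model logit sensitivity and logit specificity with between-study random effects and their correlation, which requires mixed-model fitting (as done by the "metandi", "midas", and "mada" tools named below). As a much simpler stand-in that only conveys the idea of random-effects pooling on the logit scale, the Python sketch below applies the univariate DerSimonian–Laird method to one proportion at a time. This is a swapped-in technique, not the study's model, and its estimates will generally differ from bivariate results; the function names, the continuity correction, and the example counts are hypothetical.

```python
import math


def logit_and_variance(events: int, total: int) -> tuple[float, float]:
    """Logit proportion and variance; 0.5 added to both cells if one is zero (assumed)."""
    a, b = float(events), float(total - events)
    if a == 0 or b == 0:
        a, b = a + 0.5, b + 0.5
    return math.log(a / b), 1.0 / a + 1.0 / b


def expit(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def dersimonian_laird(events: list[int], totals: list[int], z: float = 1.96):
    """Univariate random-effects pooled proportion, 95% CI, and tau^2 (logit scale)."""
    ys, vs = zip(*(logit_and_variance(x, n) for x, n in zip(events, totals)))
    w = [1.0 / v for v in vs]
    fixed = sum(wi * yi for wi, yi in zip(w, ys)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, ys))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    # Method-of-moments between-study variance, truncated at zero
    tau2 = max(0.0, (q - (len(ys) - 1)) / c) if c > 0 else 0.0
    w_re = [1.0 / (v + tau2) for v in vs]
    mu = sum(wi * yi for wi, yi in zip(w_re, ys)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    return expit(mu), expit(mu - z * se), expit(mu + z * se), tau2


# Hypothetical true positives and numbers of patients with PCa in four studies
est, lo, hi, tau2 = dersimonian_laird([50, 90, 70, 95], [100, 100, 100, 100])
print(f"pooled sensitivity {est:.2f} (95% CI {lo:.2f}-{hi:.2f}), tau^2 = {tau2:.2f}")
```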
We performed meta-regression analyses to investigate the cause of heterogeneity. The following covariates were considered for the bivariate model: (1) proportion of patients with PCa (>50% vs ≤50%), (2) magnet strength of MRI (3 vs 1.5 T), (3) use of endorectal coil, (4) cutoff value (≥4 vs ≥3), (5) reference standard (radical prostatectomy vs biopsy), and (6) type of analysis (per patient vs per lesion). In addition, multiple subgroup analyses were performed for cutoff value, outcome, and previous biopsy history to assess various clinical settings: (1) a cutoff value of ≥4 for all studies, (2) a cutoff value of ≥3 for all studies, (3) a cutoff value of ≥4 for determining any PCa, (4) a cutoff value of ≥3 for determining any PCa, (5) a cutoff value of ≥4 for determining csPCa, (6) a cutoff value of ≥3 for determining csPCa, (7) a cutoff value of ≥4 in studies using per-patient analysis, (8) a cutoff value of ≥4 in studies using per-lesion analysis, (9) studies analyzing PZ PCa, (10) studies analyzing TZ PCa, (11) patients without previous biopsies, and (12) patients with previous biopsies. The “metandi” and “midas” modules in Stata 10.0 (StataCorp LP, College Station, TX, USA) and “mada” package in R software version 3.2.1 (R Foundation for Statistical Computing, Vienna, Austria) were used for statistical analyses, with p < 0.05 signifying statistical significance.
A systematic literature search initially identified 287 articles. After removing 46 duplicates, screening of the 241 titles and abstracts yielded 105 potentially eligible articles. Full-text reviews were performed, and 84 studies were excluded for the following reasons: not in the field of interest (n = 80, including 68 studies that used only PI-RADSv1), insufficient data to reconstruct 2 × 2 tables (n = 2), and shared study population with other studies (n = 2). Ultimately, 21 original articles including a total of 3857 patients assessing the diagnostic performance of PI-RADSv2 were included in the meta-analysis [16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36]. No additional studies were identified by screening the bibliographies of these 21 studies. Among them, 15 studies including 3099 patients dealt with PI-RADSv2 alone, whereas six studies including 758 patients provided a head-to-head comparison between PI-RADSv1 and PI-RADSv2 [16 20 21 28 32 33]. The detailed study selection process is described in Fig. 1.
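The study selection counts above can be checked for internal consistency with simple arithmetic, as in the following Python sketch. All numbers are taken from the paragraph above; the variable names are illustrative, and the 136 records excluded at title/abstract screening is derived from those numbers rather than reported directly.

```python
identified = 287
duplicates = 46
screened = identified - duplicates                    # titles and abstracts screened
full_text_reviewed = 105
excluded_at_screening = screened - full_text_reviewed  # implied by the numbers above

full_text_exclusions = {
    "not in the field of interest": 80,
    "insufficient data for 2 x 2 tables": 2,
    "shared study population": 2,
}
included = full_text_reviewed - sum(full_text_exclusions.values())

studies_v2_only, studies_head_to_head = 15, 6
patients_v2_only, patients_head_to_head = 3099, 758
total_patients = patients_v2_only + patients_head_to_head

print(screened, excluded_at_screening, included, total_patients)  # 241 136 21 3857
```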
Patient characteristics are shown in Table 1. The size of the study population ranged from 49 to 456 patients, and the percentage of patients with PCa ranged from 37% to 100%. The median patient age was 62–69.6 yr, the median PSA was 3.97–15 ng/ml, and Gleason scores ranged from 5 to 10. In seven studies, all or some of the patients had already been diagnosed with PCa before MRI [16 17 22 26 27 32 35]. Patients had undergone biopsy before MRI in seven studies [16 21 22 23 26 27 35], all patients were biopsy-naïve in three studies [18 25 34], both types of patients were included in three studies [28 32 33], and data on previous biopsy were not reported in the remaining eight studies.
First author (year) | Country | Institution | Duration of patient recruitment | Patients (n) | Patients with PCa (n) | Age, median (yr) | Age, range (yr) | PSA, median (ng/ml) | PSA, range (ng/ml) | Gleason score, median | Gleason score, range | No. of previous biopsies | PCa diagnosis before MRI |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Auer (2016) [16] | Austria | Medical University of Innsbruck, Medizinische Hochschule Hannover | NR | 50 | NR | 63 a | NR | 7.3 a | NR | 7 (3 + 4) | 6–10 | ≥1 | Yes (all) |
Baldisserotto (2016) [17] | Brazil | Pontifícia Universidade Católica do Rio Grande do Sul | 2013.6–2015.6 | 54 | 33 | 65.9 a | 53–81 | 8.4 a | 3–31 | 7 (3 + 4) | 6–9 | NR | Yes (some) |
De Visschere (2017) [18] | Belgium | Ghent University Hospital | 2011.5–2014.12 | 245 | 144 | 66 | 44–85 | 9 | 1.4–935.5 | 7 (3 + 4) | ≥6 | 0 | No |
El-Samei (2016) [19] | Egypt | Al-Azhar University, El-Minia University | 2014.5–2015.10 | 55 | 38 | 62 a | 51–79 | NR | NR | 7 | 6–10 | NR | No |
Feng (2016) [20] | China | Tongji Hospital | 2013.6–2015.7 | 401 | 150 | 64.4 a | 34–88 | 10.7 | 0.2–1763 | 7 (4 + 3) | ≤6–≥8 | NR | NR |
Kasel-Seibert (2016) [21] | Germany | University Hospital Jena | 2013.7–2015.3 | 82 | 31 | 65 | 48–81 | 13 | 1–111 | 7 (3 + 4) | 6–9 | 1–5 | No |
Lin (2016) [22] | Brazil, Taiwan | Ribeirão Preto School of Medicine, China Medical University, Ribeirão Preto School of Medicine | 2011.5–2014.6 | 49 | 49 | 63 | 46–73 | 13.27 | 1.8–41.4 | | | ≥1 | Yes (all) |
Martorana (2016) [23] | Italy | University of Modena and Reggio Emilia, Perugia University | 2014.1–2016.2 | 157 | 79 | 65 | 47–79 b | 10.7 | 1–75 | 6 | 6–8 | 1–3 | No |
Mertan (2016) [24] | USA | National Cancer Institute | 2015.3–2015.9 | 62 | 38 | 65.5 | 50.3–76.6 | 7.1 | 0.5–863 | 7 (3 + 4) | 6–9 | NR | NR |
Muller (2015) [25] | USA, the Netherlands | National Cancer Institute, AMC University Hospital, Edward Hébert School of Medicine | 2011.12–2014.5 | 94 | 94 | 62 | 37–79 | 8.51 | 0.7–51.1 | 7 (3 + 4) | 6–10 | 0 | No |
Park (2016) [26] | South Korea | Samsung Medical Center | 2012.1–2014.12 | 456 | 456 | 65 | 46–81 | 3.97 | 3.8–4.2 | NR | ≥6 | ≥1 | Yes (all) |
Park (2016) [27] | South Korea | Yonsei University College of Medicine | 2012.1–2013.3 | 425 | 425 | NR | NR | NR | 1.4–156.9 | 5–6 | 5–10 | ≥1 | Yes (all) |
Polanec (2016) [28] | Austria | Medical University of Vienna, Confraternität Vienna | 2011.6–2015.9 | 65 | NR | 65.3 a | 62.3–87.4 | 10.8 a | 4.2–74.5 | 7 (4 + 3) | 6–9 | ≥0 | No |
Rastinehad (2015) [29] | USA | Icahn School of Medicine at Mount Sinai, Fox Chase Cancer Center, National Institutes of Health, Hofstra North Shore LIJ School of Medicine | 2012.2–2014.11 | 312 | 202 | 65.1 | 60.3–70.3 b | 7.3 | 5.0–11.4 b | NR | NR | NR | No |
Rosenkrantz (2016) [30] | USA | NYU Langone Medical Center | 2013.9–2015.2 | 343 | 134 | 64 | NR | 5.8 | NR | 7 (3 + 4) | 6–9 | NR | NR |
Stanzione (2016) [31] | Italy | University “Federico II”, Ospedale S. Maria delle Grazie | NR | 82 | 34 | 65 a | 43–84 | 8.8 a | NR | 7 (3 + 4) | 6–9 | NR | No |
Tan (2016) [32] | USA, Taiwan | UCLA, Cathay General Hospital, Hoag Hospital | 2013.3–2016.12 | 106 | 63 | 66.5 | 43–79 | 7.9 | 5.6–10.6 | 7 (3 + 4) | 6–10 | ≥0 | Yes (some) |
Tewes (2016) [33] | Germany | Hannover Medical School, Klinikum der Region Hannover | 2012.12–2014.12 | 54 | 31 | 69.6 a | NR | 8.7 a | NR | 6 | 6–9 | ≥0 | No |
Washino (2017) [34] | Japan | Jichi Medical University Saitama Medical Center | 2010.6–2014.4 | 288 | 159 | 69 | 64–74 b | 7.5 | 5.5–11 b | NR | NR | 0 | No |
Woo (2016) [35] | South Korea | Seoul National University College of Medicine | 2011.1–2013.12 | 105 | 105 | 69 | 49–79 | 8.22 | 0.9–44.2 | 7 | 6–7 | ≥1 | Yes (all) |
Zhao (2016) [36] | China | Peking University First Hospital | 2010.11–2013.12 | 372 | 185 | 68.5 a | NR | 15 a | NR | 7 | 6–10 | NR | No |
Characteristics of the studies are summarized in Table 2. MRI was performed using 3-T scanners in 16 studies [16 17 18 20 24 25 26 27 28 29 30 31 32 33 35 36], 1.5-T scanners in four studies [19 21 22 23], and either 3- or 1.5-T scanners in one study [34]. Endorectal coils were used in seven studies [19 21 22 23 24 25 29]. In all studies, the mpMRI protocol consisted of T2WI, DWI, and DCE-MRI. The reference standard was radical prostatectomy in five studies [16 22 26 27 35], a combination of systematic and targeted biopsies in seven studies [17 20 23 25 29 34 36], and targeted biopsy alone in seven studies [19 21 24 28 30 32 33]; in two studies, the reference standard was not consistent throughout the study population [18 31]. PI-RADSv2 scoring was performed by one to five radiologists, either in consensus or independently. The radiologists' experience was heterogeneous, ranging from 4 to 22 yr in prostate imaging. In most studies, the readers were blinded; however, in three studies, the radiologists were aware that the patients had biopsy-proven PCa [22 26 35], and five studies did not state explicitly whether the readers were blinded [18 19 30 32 36]. In the majority of studies, the interval between MRI and the reference standard was less than 6 mo; however, this interval was not reported in 10 studies [16 17 18 24 27 28 29 30 32 33]. PCa was assessed separately according to zonal anatomy in seven studies [20 24 25 28 30 31 33]. However, in one study [31], only PCa in the PZ could be evaluated, as the article did not provide detailed data and the attempt to contact the authors for further information was unsuccessful. Regarding the outcome assessed, eight studies evaluated any cancer [17 19 20 23 25 28 31 33], eight evaluated clinically significant cancer [18 22 26 27 30 34 35 36], and five evaluated both [16 21 24 29 32].
With regard to cutoff values, 13 studies used ≥4 [16 18 19 20 21 24 25 26 27 28 30 31 35], four studies used ≥3 [32 33 34 36], and four studies used both [17 22 23 29]. The location of PCa was reported separately for the PZ and TZ in six studies [20 24 25 28 30 33].
First author (year) | Design | Consecutive enrollment | Reference standard | MRI–reference standard interval | No. of readers | Experience (yr) | Blinding | Magnet strength (T) | Vendor | Model | Endorectal coil | PI-RADSv2 application | Cutoff values | Localization | Type of analysis | Outcome assessed | Definition of csPCa |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Auer (2016) [16] | Retrospective | Yes | RP | NR | 1 | >5 | Yes | 3 | Siemens | Skyra | No | Strict | 4 | Whole | Lesion | Any + csPCa | GS ≥7 (4 + 3) |
Baldisserotto (2016) [17] | Retrospective | Yes | STRUSGB + targeted MRI–TRUS biopsy (cognitive) or RP | NR | 2 Independent | >10/1 | Yes | 3 | GE | SignaHDxt | No | Strict | 3, 4 | Whole | Patient | Any | |
De Visschere (2017) [18] | Retrospective | NR | STRUSGB or RP | NR | NR | NR | NR | 3 | Siemens | Trio | No | Strict | 4 | Whole | Patient | csPCa | GS ≥7, ≥0.5 cc, or EPE |
El-Samei (2016) [19] | NR | NR | Targeted MRI–TRUS biopsy | ≤2 wk | NR | NR | NR | 1.5 | Philips | Gyroscan | Yes | Strict | 4 | Whole | Lesion | Any | |
Feng (2016) [20] | Retrospective | Yes | STRUSGB + targeted MRI–TRUS biopsy (cognitive) | 1–35 d | 2 Consensus | 5/4 | Yes | 3 | Siemens | Skyra | No | Strict | 4 | PZ, TZ | Patient | Any | |
Kasel-Seibert (2016) [21] | Retrospective | NR | Targeted MRGB | ≤7 d | 2 Independent | 10/<1 | Yes | 1.5 | Siemens | Avanto | Yes | Strict | 4 | Whole | Lesion | Any + csPCa | GS ≥7 |
Lin (2016) [22] | Retrospective | NR | RP | ≤6 mo | 2 Independent | 12/5 | Yes a | 1.5 | Philips | Achieva | Yes | Strict | 3, 4 | Whole | Lesion | csPCa | GS ≥7 and >0.5 cc |
Martorana (2016) [23] | Retrospective | Yes | TTB + targeted MRI–TRUS biopsy | 30–78 | 2 Consensus | 4/4 | Yes | 1.5 | Philips | Achieva | Yes | Strict | 3, 4 | Whole | Lesion | Any | GS ≥7, >0.5 cc, or EPE |
Mertan (2016) [24] | Prospective | Yes | STRUSGB + targeted MRI–TRUS biopsy | NR | 1 | >8 | Yes | 3 | Philips | Achieva | Yes | Strict | 4 | PZ, TZ | Lesion | Any + csPCa | GS ≥7 |
Muller (2015) [25] | Retrospective | Yes | STRUSGB + targeted MRI–TRUS biopsy | ≤6 wk | 5 Independent | 12/7/1/1/0.5 | Yes | 3 | Philips | Achieva | Yes | Strict | 4 | PZ, TZ | Lesion | Any | GS ≥7 (4 + 3) |
Park (2016) [26] | Retrospective | Yes | RP | 21–48 | 2 Independent | 14/3 | Yes a | 3 | Philips | Achieva | No | Strict | 4 | Whole | Patient | csPCa | GS ≥7, ≥0.5 cc, or EPE |
Park (2016) [27] | Retrospective | NR | RP | NR | 2 Independent | 9/4 | Yes | 3 | GE, Philips, Siemens | DiscoveryMR750, Achieva, Trio | No | Strict | 4 | Whole | Patient | csPCa | GS ≥7, ≥0.5 cc, or EPE |
Polanec (2016) [28] | Retrospective | Yes | Targeted MRGB | NR | 2 Independent | NR/NR | Yes | 3 | Siemens | Trio | No | Strict | 4 | PZ, TZ | Lesion | Any | |
Rastinehad (2015) [29] | Prospective | NR | STRUSGB + targeted MRI–TRUS biopsy | NR | 3 Consensus | NR | Yes | 3 | Siemens | Verio | Yes | Calculated b | 3, 4 | Whole | Patient | Any + csPCa | Epstein's criteria c for systematic and GS ≥7 or >0.5 cc for targeted biopsy |
Rosenkrantz (2016) [30] | Retrospective | NR | Targeted MRI–TRUS biopsy | NR | 2 Independent | 8/3 | NR | 3 | Siemens | Biograph mMR/Prisma/Skyra/Trio | No | Strict | 4 | PZ, TZ | Lesion | csPCa | GS ≥7 |
Stanzione (2016) [31] | Prospective | NR | STRUSGB (±targeted TRUSGB) | 20–30 d | 2 Independent | 14/10 | Yes | 3 | Siemens | Trio | No | Strict | 4 | PZ | Patient | Any | |
Tan (2016) [32] | Retrospective | Yes | Targeted MRGB | NR | 3 Consensus | 16/10 | NR | 3 | Siemens | Skyra, Trio, Verio | No | Strict | 3 | Whole | Lesion | Any + csPCa | GS ≥7 |
Tewes (2016) [33] | Retrospective | Yes | Targeted MRGB | NR | 2 Independent | 5/2 | Yes | 3 | Siemens | Skyra | No | Strict | 3 d | PZ, TZ | Patient | Any | |
Washino (2017) [34] | Retrospective | NR | TTB + targeted MRI–TRUS biopsy (cognitive) | 0.5–1.6 mo | 1 | 14 | Yes | 1.5 or 3 | Toshiba | Excelart Vantage/Vantage Titan 3T | No | Strict | 3 | Whole | Patient | csPCa | GS ≥7 or maximum core length ≥4 mm |
Woo (2016) [35] | Retrospective | NR | RP | ≤6 mo | 2 Consensus | 22/10 | Yes a | 3 | Philips, Siemens | Ingenia, Verio/Trio | No | Strict | 4 | Whole | Patient | csPCa | GS ≥7 |
Zhao (2016) [36] | Retrospective | NR | STRUSGB + targeted MRI–TRUS biopsy | ≤3 mo | 2 Independent | NR | NR | 3 | NR | NR | No | Strict | 3 | Whole | Patient | csPCa | GS ≥7 |
a Blinded but aware that patients had biopsy-proven PCa.
b PI-RADSv2 scores were generated from reports based on PI-RADSv1 and simplified qualitative system.
c Epstein's criteria = Gleason pattern ≥4, or Gleason 3 + 3 disease with core length ≥50% and/or >2 cores positive.
d For transition zone, cutoff value = 4.
csPCa = clinically significant prostate cancer; GS = Gleason score; EPE = extraprostatic extension; MRGB = magnetic resonance imaging-guided biopsy; MRI = magnetic resonance imaging; MRI–TRUS = fusion of magnetic resonance imaging and transrectal ultrasound images; NR = not reported; PCa = prostate cancer; PI-RADSv2 = Prostate Imaging Reporting and Data System version 2; PZ = peripheral zone; RP = radical prostatectomy; STRUSGB = systematic transrectal ultrasound-guided biopsy; TTB = transperineal template biopsy; TZ = transition zone.
Overall, the methodological quality of the studies was not high, mainly because of the patient selection domain (Fig. 2). In this domain, the risk of bias was generally high, as all but four of the studies were retrospective [16 17 18 20 21 22 23 25 26 27 28 30 32 33 34 35 36]. Seven studies were considered to have high concern for applicability, as all or some of the patients had a pathological diagnosis of PCa before MRI [16 17 22 26 27 32 35]. In the index test domain, nine studies had a high risk of bias: in three of them, the readers were aware that patients had biopsy-proven PCa [22 26 35], and in the other six, the cutoff value for determining PCa was not specified before interpretation [16 20 29 31 33 34]. Only one study raised concern for applicability in this domain, as its PI-RADSv2 scores were indirectly generated from existing clinical radiological reports based on PI-RADSv1 or an in-house scoring system [29]. In the reference standard domain, eight studies had a high risk of bias: seven used either systematic biopsy alone or targeted biopsy alone [18 19 21 28 30 32 33], and in one study, targeted biopsy was directed at lesions suspicious on ultrasonography rather than on MRI [31].
Studies in which radical prostatectomy or systematic plus targeted biopsy (MRI guided, MRI–transrectal ultrasound fusion, or cognitive) was used as the reference standard were considered to have a low risk of bias. In 10 studies, the definition of clinically significant cancer did not conform to that of the PI-RADSv2 guidelines, raising high concern for applicability [16 21 22 24 29 30 32 34 35 36]. In the flow and timing domain, two studies had a high risk of bias because not all patients received the same reference standard [17 18].
The sensitivity and specificity of individual studies ranged from 73% to 100% and from 8% to 100%, respectively. The Q-test indicated substantial heterogeneity (p < 0.001). The Higgins I² statistic demonstrated substantial heterogeneity for sensitivity (I² = 85.55%) and considerable heterogeneity for specificity (I² = 95.30%). The coupled forest plot of sensitivity and specificity demonstrated the absence of a threshold effect (Fig. 3). The Spearman correlation coefficient between the sensitivity and the false-positive rate was 0.45 (95% confidence interval [CI] 0.023–0.738), also indicating the lack of a threshold effect.
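The heterogeneity and threshold-effect statistics above were obtained with the Stata and R modules named in the methods. As a rough, independent illustration of what they measure, the sketch below computes Cochran's Q and Higgins' I² for logit-transformed proportions (for example, per-study sensitivities) and the Spearman correlation between sensitivity and false-positive rate. It is a univariate sketch under stated assumptions (inverse-variance weights, a 0.5 continuity correction added to every study, and average ranks for ties), it is not expected to reproduce the reported values exactly, and the study counts in the usage lines are hypothetical.

```python
from __future__ import annotations

import math


def logit_and_var(events: int, total: int, cc: float = 0.5) -> tuple[float, float]:
    """Logit of a proportion and its approximate variance.

    Assumption: cc is added to both events and non-events in every study.
    """
    x = events + cc
    y = total - events + cc
    return math.log(x / y), 1 / x + 1 / y


def cochran_q_i2(events: list[int], totals: list[int]) -> tuple[float, float]:
    """Cochran's Q and Higgins' I^2 (%) using inverse-variance weights."""
    stats = [logit_and_var(e, n) for e, n in zip(events, totals)]
    weights = [1 / v for _, v in stats]
    pooled = sum(w * y for w, (y, _) in zip(weights, stats)) / sum(weights)
    q = sum(w * (y - pooled) ** 2 for w, (y, _) in zip(weights, stats))
    df = len(stats) - 1
    i2 = 0.0 if q <= df else 100 * (q - df) / q  # I^2 is truncated at 0
    return q, i2


def average_ranks(values: list[float]) -> list[float]:
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(a: list[float], b: list[float]) -> float:
    """Spearman's rho computed as the Pearson correlation of the ranks."""
    ra, rb = average_ranks(a), average_ranks(b)
    ma, mb = sum(ra) / len(ra), sum(rb) / len(rb)
    cov = sum((x - ma) * (y - mb) for x, y in zip(ra, rb))
    sa = math.sqrt(sum((x - ma) ** 2 for x in ra))
    sb = math.sqrt(sum((y - mb) ** 2 for y in rb))
    return cov / (sa * sb)


# Hypothetical per-study counts (TP, FN, FP, TN), for illustration only.
studies = [(45, 5, 20, 30), (90, 10, 10, 90), (28, 2, 40, 10), (60, 15, 5, 70)]
sens = [tp / (tp + fn) for tp, fn, _, _ in studies]
fpr = [fp / (fp + tn) for _, _, fp, tn in studies]
q, i2 = cochran_q_i2([s[0] for s in studies], [s[0] + s[1] for s in studies])
print(f"Q = {q:.2f}, I2 = {i2:.1f}%, Spearman rho = {spearman(sens, fpr):.2f}")
```

The truncation of I² at zero follows the usual convention when Q does not exceed its degrees of freedom.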
For all 21 studies combined, the pooled sensitivity was 0.89 (95% CI 0.86–0.92) with specificity of 0.73 (95% CI 0.60–0.83; Fig. 3). In the HSROC curve, there was a large difference between the 95% confidence region and the 95% prediction region, indicating heterogeneity between the studies (Fig. 4). The area under the HSROC curve was 0.91 (95% CI 0.88–0.93). According to Deeks’ funnel plot, the likelihood of publication bias was low, with a p value of 0.75 for the slope coefficient (Fig. 5).
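Deeks' funnel plot asymmetry test regresses each study's log diagnostic odds ratio (DOR) against the inverse square root of its effective sample size (ESS), weighting by ESS; a slope far from zero suggests small-study effects such as publication bias. The sketch below shows that regression under stated assumptions: a 0.5 continuity correction is added to every cell, and the study counts are hypothetical. It returns the slope and its t statistic, which would be compared with a t distribution on k − 2 degrees of freedom (k studies) to obtain a p value like the 0.75 reported above. It is not the midas implementation.

```python
import math


def deeks_point(tp, fn, fp, tn, cc=0.5):
    """Return (ln DOR, 1/sqrt(ESS), ESS) for one study's 2 x 2 table."""
    ln_dor = math.log((tp + cc) * (tn + cc) / ((fp + cc) * (fn + cc)))
    n_diseased, n_healthy = tp + fn, fp + tn
    ess = 4 * n_diseased * n_healthy / (n_diseased + n_healthy)
    return ln_dor, 1 / math.sqrt(ess), ess


def weighted_regression(x, y, w):
    """Weighted least squares fit y = a + b*x; returns (a, b, SE of b)."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    b = sxy / sxx
    a = my - b * mx
    rss = sum(wi * (yi - a - b * xi) ** 2 for wi, xi, yi in zip(w, x, y))
    se_b = math.sqrt(rss / (len(x) - 2) / sxx)
    return a, b, se_b


def deeks_test(tables):
    """Slope and t statistic of Deeks' regression (df = len(tables) - 2)."""
    points = [deeks_point(*t) for t in tables]
    y = [p[0] for p in points]
    x = [p[1] for p in points]
    w = [p[2] for p in points]
    _, slope, se = weighted_regression(x, y, w)
    return slope, slope / se


# Hypothetical (TP, FN, FP, TN) tables, for illustration only.
tables = [(45, 5, 20, 30), (90, 10, 10, 90), (28, 2, 40, 10), (60, 15, 5, 70)]
slope, t_stat = deeks_test(tables)
print(f"slope = {slope:.2f}, t = {t_stat:.2f} on {len(tables) - 2} df")
```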
For the six studies that provided a head-to-head comparison between PI-RADSv1 and PI-RADSv2, PI-RADSv2 demonstrated higher pooled sensitivity of 0.95 (95% CI 0.85–0.98) compared with 0.88 (95% CI 0.80–0.93) for PI-RADSv1 (p = 0.04). However, the pooled specificity did not differ significantly between the two versions of PI-RADS: 0.73 (95% CI 0.47–0.89) for v2 and 0.75 (95% CI 0.36–0.94) for v1 (p = 0.90).
As considerable heterogeneity was found among the included studies, meta-regression analyses were performed (Supplementary Table 1). Among the candidate covariates, the proportion of patients with PCa, magnetic field strength, and reference standard were significant factors affecting heterogeneity (p < 0.01 for all). However, of these three, only specificity according to the proportion of patients with PCa showed a difference that was both statistically significant and clinically meaningful: 0.65 (95% CI 0.52–0.78) in studies with ≤50% of patients with PCa versus 0.86 (95% CI 0.75–0.97) in studies with >50% of patients with PCa. The other differences were not clinically meaningful: for magnetic field strength (3 vs 1.5 T), sensitivity was 0.90 (95% CI 0.86–0.94) versus 0.89 (95% CI 0.81–0.97; p = 0.03) and specificity was 0.73 (95% CI 0.59–0.86) versus 0.72 (95% CI 0.44–1.00; p = 0.81); for reference standard (radical prostatectomy vs biopsy), sensitivity was 0.89 (95% CI 0.83–0.95) versus 0.91 (95% CI 0.88–0.94; p < 0.01) and specificity was 0.65 (95% CI 0.37–0.94) versus 0.73 (95% CI 0.58–0.87; p = 0.48). Other variables, including cutoff value, use of an endorectal coil, and type of analysis (per patient vs per lesion), were not significant factors (p = 0.32–0.70).
Because four studies used both ≥3 and ≥4 as cutoff values [17 22 23 29] and five studies assessed both any cancer and clinically significant cancer as outcomes [16 21 24 29 32], multiple subgroup analyses were performed to assess various clinical settings (Supplementary Table 2). Regarding cutoff values, the pooled sensitivity was 0.89 (95% CI 0.84–0.92) with specificity of 0.74 (95% CI 0.58–0.85) for 17 studies using ≥4 [16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 35], whereas these were 0.95 (95% CI 0.89–0.97) and 0.47 (95% CI 0.21–0.74) in eight studies using ≥3 [17 22 23 29 32 33 34 36]. Stratifying the studies by the outcome assessed yielded the following results: (1) cutoff of ≥4 for any PCa, sensitivity of 0.89 (95% CI 0.83–0.93) with specificity of 0.80 (95% CI 0.62–0.90); (2) cutoff of ≥3 for any PCa, sensitivity of 0.96 (95% CI 0.93–0.98) with specificity of 0.49 (95% CI 0.29–0.70); (3) csPCa regardless of cutoff value, sensitivity of 0.89 (95% CI 0.84–0.92) with specificity of 0.64 (95% CI 0.46–0.78); (4) cutoff of ≥4 for csPCa, sensitivity of 0.90 (95% CI 0.85–0.94) with specificity of 0.62 (95% CI 0.45–0.77); and (5) cutoff of ≥3 for csPCa, sensitivity of 0.96 (95% CI 0.87–0.99) with specificity of 0.29 (95% CI 0.05–0.77). When the studies using a cutoff value of ≥4 were assessed separately by type of analysis, per-patient analysis in eight studies [17 18 20 26 27 29 31 35] yielded a pooled sensitivity of 0.89 (95% CI 0.81–0.93) with specificity of 0.76 (95% CI 0.60–0.88), whereas per-lesion analysis in nine studies [16 19 21 22 23 24 25 28 30] yielded a pooled sensitivity of 0.87 (95% CI 0.83–0.91) with specificity of 0.70 (95% CI 0.44–0.88).
By localization of PCa, the pooled sensitivity was 0.93 (95% CI 0.87–0.96) with specificity of 0.68 (95% CI 0.43–0.86) in seven studies analyzing PZ cancers [20 24 25 28 30 31 33]. For TZ cancers, analyzed in the same studies except that by Stanzione et al [31], which reported only PZ cancers, the pooled sensitivity and specificity were 0.88 (95% CI 0.77–0.94) and 0.75 (95% CI 0.59–0.86), respectively.
Studies including patients without previous biopsies yielded sensitivity of 0.82 (95% CI 0.72–0.90) and specificity of 0.75 (95% CI 0.65–0.83), whereas the diagnostic performance values were 0.87 (95% CI 0.80–0.92) and 0.71 (95% CI 0.42–0.89) in studies including patients with a history of previous biopsy.
In our meta-analysis, we assessed the diagnostic accuracy of PI-RADSv2 for detecting PCa. The pooled sensitivity and specificity of all 21 studies were 0.89 (95% CI 0.86–0.92) and 0.73 (95% CI 0.60–0.83), respectively. Compared with the only two existing meta-analyses of mpMRI for detecting PCa, our data suggest a trend toward higher sensitivity and lower specificity. In the study by de Rooij et al [37], which evaluated seven studies using a combination of T2WI, DWI, and DCE-MRI, the pooled sensitivity and specificity were 0.74 (95% CI 0.66–0.81) and 0.88 (95% CI 0.82–0.92), respectively. In a more recent meta-analysis by Hamoen et al [6], which analyzed 14 studies using PI-RADSv1, the pooled sensitivity and specificity were 0.78 (95% CI 0.70–0.84) and 0.79 (95% CI 0.68–0.86), respectively. However, comparisons across these three meta-analyses are only indirect. To address this issue, we separately assessed the subgroup of studies that used both PI-RADSv1 and PI-RADSv2. In this head-to-head comparison, PI-RADSv2 demonstrated higher pooled sensitivity (0.95) than PI-RADSv1 (0.88, p = 0.04) without a statistically significant difference in specificity (0.73 vs 0.75, p = 0.90). This gain in sensitivity over its predecessor may imply that the revisions undertaken during the development of PI-RADSv2 (the introduction of dominant sequences according to zonal anatomy, the limited contribution of DCE-MRI secondary to DWI and T2WI, and specific guidelines for deriving an integrated overall score) were, in fact, on the right track. In particular, we speculate that the use of dominant sequences, that is, DWI for the PZ and T2WI for the TZ, may have been crucial for the improved sensitivity without a loss in specificity, as suggested by Baur et al [10].
Considering that one of the main intentions behind PI-RADS was to standardize reporting of mpMRI in order to decrease variability and promote widespread acceptance and implementation in daily practice, it was promising to find that nearly all (20 of 21) studies applied PI-RADSv2 strictly according to the published guidelines [11]. Only one study derived PI-RADSv2 scores from existing clinical radiological reports based on PI-RADSv1 or an in-house scoring system [29]. This is an improvement over prior studies using PI-RADSv1, in which investigators used varying methods to determine the overall score (an overall five-point score or the sum of the scores from each sequence) [6]. Still, the cutoff value for detecting PCa needs further clarification. In the studies included in our meta-analysis, cutoff values were predefined in only six studies, while the majority (15/21) were exploratory in nature, testing multiple criteria. With a cutoff value of ≥4, sensitivity (0.89) and specificity (0.74) were generally good, whereas a cutoff of ≥3 yielded excellent sensitivity (0.95) but poor specificity (0.47). These results may be taken into consideration when developing the next update of PI-RADS. For instance, the former may be adequate for general use of PI-RADS, whereas the latter could be indicated when a higher cancer detection rate is clinically required (eg, a persistently high PSA level despite a previously negative biopsy).
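To make the trade-off between the two cutoffs more concrete, the pooled point estimates above can be converted into positive and negative likelihood ratios, LR+ = sensitivity / (1 − specificity) and LR− = (1 − sensitivity) / specificity. These ratios were not part of the reported analysis, and computing them from pooled sensitivity and specificity only approximates likelihood ratios derived within a bivariate model, so the sketch below is purely illustrative and uses only the point estimates stated in the text.

```python
from __future__ import annotations


def likelihood_ratios(sensitivity: float, specificity: float) -> tuple[float, float]:
    """Return (LR+, LR-) from point estimates of sensitivity and specificity."""
    lr_pos = sensitivity / (1 - specificity)
    lr_neg = (1 - sensitivity) / specificity
    return lr_pos, lr_neg


# Pooled point estimates for the two cutoffs, as reported in the text.
pooled = {">=4": (0.89, 0.74), ">=3": (0.95, 0.47)}
for cutoff, (sens, spec) in pooled.items():
    lr_pos, lr_neg = likelihood_ratios(sens, spec)
    print(f"cutoff {cutoff}: LR+ = {lr_pos:.2f}, LR- = {lr_neg:.2f}")
```

With these inputs, the ≥4 cutoff gives an LR+ of about 3.4 and an LR− of about 0.15, whereas the ≥3 cutoff gives about 1.8 and 0.11. This illustrates why ≥3 rules out disease slightly better but rules it in much less well.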
In the current study, subgroup analyses were performed to account for differences in outcomes (any cancer vs clinically significant cancer). Results did not differ significantly between the two outcomes, irrespective of whether a cutoff of ≥3 or ≥4 was used. However, the definition of clinically significant cancer varied among the 13 studies that assessed it. Only three studies defined csPCa strictly according to the PI-RADSv2 guidelines (Gleason score ≥7 [3 + 4], volume ≥0.5 ml, or extraprostatic extension) [11]; most others used one or two of the three criteria. Including only these three studies might have provided more robust results; however, including all available studies was pragmatic and provides a general overview of the existing literature, as this is the first meta-analysis of studies dealing with PI-RADSv2.
In this meta-analysis, we also examined technical aspects of MRI. Meta-regression analyses revealed that the use of an endorectal coil was not a statistically significant factor. Furthermore, although magnetic field strength showed a statistically significant difference between 3 and 1.5 T, the difference was not clinically meaningful (sensitivity of 0.90 vs 0.89, respectively; p = 0.03). Although both issues have been debated in the past, both 3 and 1.5 T are now well established, and the overall benefit of using an endorectal coil is not evident [38 39]. The PI-RADSv2 guidelines currently endorse either approach, and the results of our study provide additional evidence to support this.
Regarding the methods of analysis, the studies were heterogeneous in reference standard and type of analysis. Radical prostatectomy was the reference standard in five studies, whereas the majority used biopsy-based reference standards, most often a combination of systematic and targeted biopsies. In the biopsy-based studies, the possibility of PCa despite negative biopsy results should be kept in mind. In addition, roughly half of the studies reported outcomes on a per-patient basis (n = 11) and half on a per-lesion basis (n = 10). Per-lesion analysis additionally reflects performance in localizing the disease; however, type of analysis was not a significant factor in the meta-regression analysis.
Our meta-analysis has some limitations. Nearly all studies were retrospective, resulting in a high risk of bias for patient selection. Pooling data from predominantly retrospective studies may have inflated diagnostic sensitivity [40]. However, a meta-analysis restricted to the three prospective studies would have been technically unfeasible, and its results would not have been representative of the existing literature on PI-RADSv2. Nevertheless, we used validated methods for the systematic review and reported the data according to standard reporting guidelines, including PRISMA and the Handbook for Diagnostic Test Accuracy Reviews published by the Cochrane Collaboration [12 41]. Another limitation is the considerable heterogeneity in our pooled analysis, which limits the general applicability of our summary estimates. To explore this heterogeneity, we performed meta-regression and multiple subgroup analyses, which identified the proportion of patients with PCa, the magnetic field strength, and the reference standard as significant factors. In particular, the reference standard comprised various methods, including radical prostatectomy and combinations of systematic and targeted biopsies (ie, MRI guided, MRI–transrectal ultrasound fusion, or cognitive). The use of various definitions of clinically significant cancer also needs to be emphasized. Our meta-regression and subgroup analyses may explain some of the heterogeneity, but a portion remains unexplained. Another important limitation is the small number of studies available for head-to-head comparison between PI-RADSv1 and PI-RADSv2. Even so, we were able to demonstrate a statistically significant difference in sensitivity between the two versions using the six studies published to date.
PI-RADSv2 shows good performance for the detection of PCa with pooled sensitivity of 0.89 and specificity of 0.73. PI-RADSv2 has higher pooled sensitivity compared with PI-RADSv1 without significantly different specificity.
Author contributions: Sang Youn Kim had full access to all the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis.
Study concept and design: Woo, Suh, S.Y. Kim.
Acquisition of data: Woo, Suh, S.Y. Kim.
Analysis and interpretation of data: Woo, Suh, S.Y. Kim.
Drafting of the manuscript: Woo.
Critical revision of the manuscript for important intellectual content: Suh, S.Y. Kim, Cho, S.H. Kim.
Statistical analysis: Suh.
Obtaining funding: None.
Administrative, technical, or material support: None.
Supervision: S.Y. Kim, Cho, S.H. Kim.
Other: None.
Financial disclosures: Sang Youn Kim certifies that all conflicts of interest, including specific financial interests and relationships and affiliations relevant to the subject matter or materials discussed in the manuscript (eg, employment/affiliation, grants or funding, consultancies, honoraria, stock ownership or options, expert testimony, royalties, or patents filed, received, or pending), are the following: None.
Funding/Support and role of the sponsor: None.
Prostate cancer (PCa) is the second leading cause of cancer-related mortality in Western men [1]. With recent technological advances and growing availability, multiparametric magnetic resonance imaging (mpMRI) is increasingly used to guide several aspects of PCa management, including detection, staging, and treatment planning [2]. Despite abundant evidence in the literature of the high accuracy of mpMRI for PCa diagnosis, widespread acceptance has been hampered by several factors, including difficulty of interpretation, lack of standardized interpretation criteria (ie, use of Likert scales based on the radiologist's subjective level of suspicion for PCa), and the resulting substantial inter-reader variability [3 4].
To bring standardization to the evaluation and reporting of mpMRI of the prostate, the European Society of Urogenital Radiology (ESUR) published a guideline termed Prostate Imaging Reporting and Data System (PI-RADS) in 2012 [5] . PI-RADS was generated based on expert consensus and provides a detailed scoring system for each MRI sequence (T2-weighted imaging [T2WI], diffusion-weighted imaging [DWI], dynamic contrast-enhanced MRI [DCE-MRI], and MR spectroscopy) for the presence of clinically significant PCa (csPCa). Several investigators have validated the accuracy and reproducibility of PI-RADS, and a recent meta-analysis reported pooled sensitivity and specificity of 0.78 and 0.79, respectively [6] . However, as there was no guideline for the generation of an overall score, different research groups utilized various measures for this purpose—some used a sum of the scores from each sequence (ranging from 3 to 15), whereas others used an overall score of 1–5 [7 8] . Furthermore, emerging data questioned the value of curve-type analysis of DCE-MRI [9] . In addition, investigators suggested that some sequences may be more important in determining the likelihood of PCa (ie, DWI in the peripheral zone [PZ] and T2WI in the transition zone [TZ]) rather than equal weighting for all sequences [10] .
To address these issues, the ESUR and the American College of Radiology recently released the updated PI-RADS version 2 (PI-RADSv2) [11]. The main changes from PI-RADSv1 to PI-RADSv2 are as follows: (1) introduction of dominant sequences according to zonal anatomy (DWI for the PZ and T2WI for the TZ), (2) a limited role for DCE-MRI, which is assessed only as the presence or absence of early focal enhancement, and (3) generation of an overall score (1–5) integrating findings across all MRI sequences.
Since then, several studies assessing the value of PI-RADSv2 have been published. However, the diagnostic performance of this new scoring system has not been evaluated systematically. Therefore, the purpose of our study was to assess the diagnostic performance of PI-RADSv2 for the detection of PCa. In addition, we aimed to compare the diagnostic performance of PI-RADSv1 and PI-RADSv2 in studies available for head-to-head comparison.
This meta-analysis was performed and written according to the preferred reporting items for systematic reviews and meta-analyses (PRISMA) guidelines [12] .
A computerized search of MEDLINE and EMBASE up to December 7, 2016, was performed in order to identify studies evaluating the diagnostic performance of PI-RADSv2 for the detection of PCa. The search query combined synonyms for PCa, MRI, and PI-RADS as follows: (prostate cancer OR prostatic cancer OR prostate neoplasm OR prostatic neoplasm OR prostate tumor OR prostatic tumor OR prostate carcinoma OR prostatic carcinoma OR PCa) AND (magnetic resonance imaging OR MRI OR MR) AND (prostate imaging reporting and data system OR pi-rads OR pi rads OR pirads). Bibliographies of identified articles were also screened to expand the scope of search. Our search was limited to publications in English.
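The Boolean query above combines three synonym groups, with OR inside each group and AND between groups. The short sketch below assembles the same string from the synonym lists. Database-specific syntax, such as phrase quoting or field tags for MEDLINE (PubMed) or EMBASE, is not given in the text and is deliberately omitted, so the string would likely need adapting before being run in either database.

```python
from __future__ import annotations

PCA_TERMS = [
    "prostate cancer", "prostatic cancer", "prostate neoplasm",
    "prostatic neoplasm", "prostate tumor", "prostatic tumor",
    "prostate carcinoma", "prostatic carcinoma", "PCa",
]
MRI_TERMS = ["magnetic resonance imaging", "MRI", "MR"]
PIRADS_TERMS = [
    "prostate imaging reporting and data system", "pi-rads", "pi rads", "pirads",
]


def or_group(terms: list[str]) -> str:
    """Join synonyms with OR and wrap them in parentheses."""
    return "(" + " OR ".join(terms) + ")"


def build_query(*groups: list[str]) -> str:
    """Require one term from every group by joining the groups with AND."""
    return " AND ".join(or_group(g) for g in groups)


query = build_query(PCA_TERMS, MRI_TERMS, PIRADS_TERMS)
print(query)
```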
Studies were included if they satisfied all the following requirements according to the PICOS criteria [12] : (1) included patients with suspected or diagnosed PCa; (2) for index test, mpMRI of the prostate including all required sequences of T2WI, DWI, and DCE-MRI was performed and assessed with a PI-RADSv2 scoring system; (3) for comparison, a reference standard based on the histopathological examination of radical prostatectomy or biopsy was used; (4) results were reported in sufficient detail for the reconstruction of 2 × 2 tables and determination of sensitivity and specificity at specified cutoff values for evaluating the diagnostic performance of PI-RADSv2; and (5) studies had to be original articles.
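Criterion (4) requires that a 2 × 2 table (true positives, false positives, false negatives, true negatives) can be reconstructed at a stated cutoff. The minimal sketch below assumes that a study reports how many patients with and without PCa on the reference standard received each PI-RADSv2 score; the counts are hypothetical. It derives the table and the resulting sensitivity and specificity for a cutoff of ≥3 or ≥4, treating scores at or above the cutoff as test positive.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class TwoByTwo:
    tp: int
    fp: int
    fn: int
    tn: int

    @property
    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)


def table_at_cutoff(pos_by_score: dict[int, int],
                    neg_by_score: dict[int, int],
                    cutoff: int) -> TwoByTwo:
    """Map PI-RADSv2 score -> patient count, with and without PCa on the
    reference standard; scores >= cutoff are treated as test positive."""
    tp = sum(n for s, n in pos_by_score.items() if s >= cutoff)
    fn = sum(n for s, n in pos_by_score.items() if s < cutoff)
    fp = sum(n for s, n in neg_by_score.items() if s >= cutoff)
    tn = sum(n for s, n in neg_by_score.items() if s < cutoff)
    return TwoByTwo(tp, fp, fn, tn)


# Hypothetical per-score counts, for illustration only.
pca = {1: 1, 2: 3, 3: 6, 4: 30, 5: 60}
no_pca = {1: 20, 2: 30, 3: 25, 4: 15, 5: 10}
for cutoff in (3, 4):
    t = table_at_cutoff(pca, no_pca, cutoff)
    print(cutoff, t, round(t.sensitivity, 2), round(t.specificity, 2))
```

With these hypothetical counts, raising the cutoff from ≥3 to ≥4 lowers sensitivity from 0.96 to 0.90 and raises specificity from 0.50 to 0.75. This is the same direction of trade-off seen in the pooled subgroup results.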
Studies were excluded if any of the following criteria were met: (1) studies involving <10 patients; (2) review articles, guidelines, consensus statements, letters, editorials, and conference abstracts; (3) studies using only PI-RADSv1 for the evaluation of mpMRI of the prostate; (4) studies focusing on topics other than using the PI-RADSv2 system for diagnosing PCa (eg, staging and prediction of biochemical recurrence); and (5) studies with overlapping patient populations.
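Inclusion criterion (4) above requires that each study's results can be reduced to a 2 × 2 table of index test versus reference standard. The Python sketch below shows that reduction and the reverse step of reconstructing cell counts from reported sensitivity, specificity, and group sizes. The class and method names are hypothetical, the counts are invented for illustration, and rounding to the nearest integer is an assumption that may not recover the original counts when rates were reported with limited precision.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Index test (PI-RADSv2 at a given cutoff) cross-tabulated against the reference standard."""
    tp: int  # test positive, reference standard positive
    fp: int  # test positive, reference standard negative
    fn: int  # test negative, reference standard positive
    tn: int  # test negative, reference standard negative

    @property
    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    @classmethod
    def from_reported(cls, sensitivity: float, specificity: float,
                      n_positive: int, n_negative: int) -> "TwoByTwo":
        """Rebuild cell counts from reported rates and reference-standard group sizes."""
        tp = round(sensitivity * n_positive)
        tn = round(specificity * n_negative)
        return cls(tp=tp, fp=n_negative - tn, fn=n_positive - tp, tn=tn)


# Hypothetical study: 50 reference-positive and 50 reference-negative cases.
table = TwoByTwo.from_reported(sensitivity=0.90, specificity=0.60, n_positive=50, n_negative=50)
print(table, f"sensitivity={table.sensitivity:.2f}", f"specificity={table.specificity:.2f}")
```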
Two reviewers (S.W. and C.H.S., with 3 yr of experience in performing systematic reviews and meta-analyses) independently evaluated the eligibility of the studies identified in the literature search. Any disagreements between the two reviewers were resolved by consensus through discussion with a third reviewer (S.Y.K.).
We extracted the following data regarding study design and results from the selected studies using a standardized form:
Study characteristics—authors, year of publication, country and institution of origin, duration of patient recruitment, and study design (prospective vs retrospective and consecutive or not)
Demographic and clinical characteristics—sample size, number of patients with PCa, patient age, prostate-specific antigen (PSA) level and Gleason score, number of previous biopsies, and PCa diagnosis prior to mpMRI
Technical characteristics of mpMRI—scanner model and manufacturer, magnetic field strength (1.5 vs 3 T), coil type (endorectal vs pelvic phased array), and specific sequences used (T2WI, DWI, DCE, or MR spectroscopy)
Interpretation of mpMRI—number of reviewers and experience in prostate mpMRI, independent or consensus reading, and blinding to clinicopathological information
Reference standard—type of reference standard (radical prostatectomy, targeted biopsy, or systematic biopsy), interval between MRI and pathology, outcomes assessed (any PCa vs csPCa), definition of csPCa (studies assessing “clinically significant”, “aggressive”, or “high-grade” PCa were all considered to assess csPCa; however, only studies that used the definition provided by the PI-RADSv2 guideline [Gleason score ≥7 (including 3 + 4), volume ≥0.5 ml, or extraprostatic extension] were considered not to have concern for applicability), separate analysis for the PZ and TZ, and type of analysis (per patient vs per lesion)
Diagnostic performance of PI-RADSv2 including criteria or cutoff values (in case of multiple readers, the results of the most experienced reader were extracted for this meta-analysis)
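Two of the extraction rules above are simple decision rules: whether a study's outcome matches the PI-RADSv2 definition of csPCa, and which reader's results to extract when several readers are reported. The Python sketch below makes both explicit. The `Lesion` fields and function names are hypothetical; the thresholds follow the PI-RADSv2 guideline definition (Gleason score ≥7 including 3 + 4, volume ≥0.5 ml, or extraprostatic extension), and breaking ties between equally experienced readers by taking the first listed is our assumption, since the text does not specify a rule.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Lesion:
    gleason_primary: int
    gleason_secondary: int
    volume_ml: float
    extraprostatic_extension: bool


def is_cspca_pirads_v2(lesion: Lesion) -> bool:
    """csPCa per PI-RADSv2: GS >= 7 (incl. 3 + 4), and/or volume >= 0.5 ml, and/or EPE."""
    gleason_sum = lesion.gleason_primary + lesion.gleason_secondary
    return gleason_sum >= 7 or lesion.volume_ml >= 0.5 or lesion.extraprostatic_extension


def most_experienced_reader(reader_years: dict[str, float]) -> str:
    """Return the reader ID with the most years of experience (ties: first listed)."""
    return max(reader_years, key=lambda reader: reader_years[reader])


print(is_cspca_pirads_v2(Lesion(3, 4, 0.2, False)))       # Gleason 3 + 4 qualifies
print(most_experienced_reader({"reader_A": 12, "reader_B": 5}))
```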
The methodological quality of the included studies was assessed using tailored questionnaires and criteria provided by Quality Assessment of Diagnostic Accuracy Studies-2 [13] . Data extraction and quality assessment were performed independently by two reviewers (S.W. and C.H.S.). All disagreements were resolved by consensus through discussion with the third reviewer (S.Y.K.).
The diagnostic performance of PI-RADSv2 for the detection of PCa was the primary outcome for this meta-analysis. In addition, a comparison between the diagnostic performance of PI-RADSv2 and that of PI-RADSv1 using studies that reported head-to-head comparison data of the two PI-RADS versions was considered a secondary outcome.
Pooled estimates of sensitivity and specificity were calculated using hierarchical logistic regression modeling including bivariate modeling and hierarchical summary receiver operating characteristic (HSROC) modeling [14] . For graphical presentation of the results, an HSROC curve with 95% confidence region and prediction region was plotted. Publication bias was evaluated using the Deeks’ funnel plot, and statistical significance was tested with the Deeks’ asymmetry test [15] .
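Deeks' test regresses each study's log diagnostic odds ratio (DOR) on 1/√ESS, where the effective sample size is ESS = 4·n₁·n₀/(n₁ + n₀) for n₁ reference-positive and n₀ reference-negative cases, weighting each study by its ESS; a slope that differs from zero suggests funnel-plot asymmetry. The Python sketch below illustrates that regression with the standard library only. It assumes a 0.5 continuity correction added to every cell (software differs on when this is applied) and returns the slope with its t statistic on k − 2 degrees of freedom rather than a p value, because the standard library has no t distribution. It is a generic illustration, not the `midas` implementation used in the analysis, and the example 2 × 2 tables are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Study:
    tp: int
    fp: int
    fn: int
    tn: int


def ln_dor(s: Study, cc: float = 0.5) -> float:
    """Log diagnostic odds ratio with a continuity correction added to every cell."""
    tp, fp, fn, tn = (x + cc for x in (s.tp, s.fp, s.fn, s.tn))
    return math.log((tp * tn) / (fp * fn))


def effective_sample_size(s: Study) -> float:
    """ESS = 4 * n_diseased * n_nondiseased / (n_diseased + n_nondiseased)."""
    n1, n0 = s.tp + s.fn, s.fp + s.tn
    return 4 * n1 * n0 / (n1 + n0)


def weighted_linear_fit(x, y, w):
    """Weighted least squares y = a + b*x; returns (a, b, standard error of b)."""
    sw = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    b = sxy / sxx
    a = y_bar - b * x_bar
    rss = sum(wi * (yi - a - b * xi) ** 2 for wi, xi, yi in zip(w, x, y))
    se_b = math.sqrt(rss / (len(x) - 2) / sxx)
    return a, b, se_b


def deeks_test(studies):
    """Return (slope, t statistic); compare t with a t distribution on k - 2 df."""
    ess = [effective_sample_size(s) for s in studies]
    x = [1 / math.sqrt(e) for e in ess]
    y = [ln_dor(s) for s in studies]
    _, slope, se = weighted_linear_fit(x, y, ess)
    return slope, slope / se


# Hypothetical 2 x 2 tables, not data from the included studies.
studies = [Study(40, 12, 5, 60), Study(25, 10, 4, 30), Study(90, 30, 8, 150), Study(15, 6, 3, 20)]
slope, t_stat = deeks_test(studies)
print(f"slope={slope:.3f}, t={t_stat:.3f}, df={len(studies) - 2}")
```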
We performed meta-regression analyses to investigate the cause of heterogeneity. The following covariates were considered for the bivariate model: (1) proportion of patients with PCa (>50% vs ≤50%), (2) magnet strength of MRI (3 vs 1.5 T), (3) use of endorectal coil, (4) cutoff value (≥4 vs ≥3), (5) reference standard (radical prostatectomy vs biopsy), and (6) type of analysis (per patient vs per lesion). In addition, multiple subgroup analyses were performed for cutoff value, outcome, and previous biopsy history to assess various clinical settings: (1) a cutoff value of ≥4 for all studies, (2) a cutoff value of ≥3 for all studies, (3) a cutoff value of ≥4 for determining any PCa, (4) a cutoff value of ≥3 for determining any PCa, (5) a cutoff value of ≥4 for determining csPCa, (6) a cutoff value of ≥3 for determining csPCa, (7) a cutoff value of ≥4 in studies using per-patient analysis, (8) a cutoff value of ≥4 in studies using per-lesion analysis, (9) studies analyzing PZ PCa, (10) studies analyzing TZ PCa, (11) patients without previous biopsies, and (12) patients with previous biopsies. The “metandi” and “midas” modules in Stata 10.0 (StataCorp LP, College Station, TX, USA) and “mada” package in R software version 3.2.1 (R Foundation for Statistical Computing, Vienna, Austria) were used for statistical analyses, with p < 0.05 signifying statistical significance.
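The subgroup estimates in this analysis come from the bivariate/HSROC model, which handles sensitivity and specificity jointly. As a simpler illustration of how one subgroup estimate (for example, sensitivity in studies using a cutoff of ≥4) is pooled, the Python sketch below swaps in univariate DerSimonian–Laird random-effects pooling on the logit scale, applied to one proportion at a time. This simplification ignores the correlation between sensitivity and specificity and will not reproduce the bivariate estimates; the 0.5 continuity correction, the 1.96 normal quantile, and the per-study counts are all assumptions of the sketch.

```python
import math


def expit(x: float) -> float:
    return 1 / (1 + math.exp(-x))


def pool_proportion_dl(events, totals, z=1.96):
    """DerSimonian-Laird random-effects pooling of proportions on the logit scale.

    Returns (pooled estimate, lower CI, upper CI, tau^2).
    """
    ys, vs = [], []
    for e, n in zip(events, totals):
        e_cc, ne_cc = e + 0.5, n - e + 0.5  # continuity-corrected events and non-events
        ys.append(math.log(e_cc / ne_cc))   # logit of the corrected proportion
        vs.append(1 / e_cc + 1 / ne_cc)     # approximate variance of that logit
    w = [1 / v for v in vs]
    sw = sum(w)
    y_fixed = sum(wi * yi for wi, yi in zip(w, ys)) / sw
    q = sum(wi * (yi - y_fixed) ** 2 for wi, yi in zip(w, ys))  # Cochran's Q
    c = sw - sum(wi * wi for wi in w) / sw
    tau2 = max(0.0, (q - (len(ys) - 1)) / c) if c > 0 else 0.0  # between-study variance
    w_re = [1 / (v + tau2) for v in vs]
    y_re = sum(wi * yi for wi, yi in zip(w_re, ys)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))
    return expit(y_re), expit(y_re - z * se), expit(y_re + z * se), tau2


# Hypothetical per-study (true positives, reference-positive patients), grouped by cutoff.
subgroups = {
    ">=4": [(45, 50), (80, 92), (30, 36)],
    ">=3": [(48, 50), (88, 92), (34, 36)],
}
for label, rows in subgroups.items():
    tps, n_pos = zip(*rows)
    est, lo, hi, tau2 = pool_proportion_dl(tps, n_pos)
    print(f"cutoff {label}: sensitivity {est:.2f} (95% CI {lo:.2f}-{hi:.2f}), tau^2={tau2:.3f}")
```

Specificity would be pooled the same way, passing true negatives and reference-negative counts.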
A systematic literature search initially identified 287 articles. After removing 46 duplicates, screening of the 241 titles and abstracts yielded 105 potentially eligible articles. Full-text reviews were performed, and 84 studies were excluded for the following reasons: not in the field of interest (n = 80, including 68 studies that used only PI-RADSv1), insufficient data to reconstruct 2 × 2 tables (n = 2), and shared study population with other studies (n = 2). Ultimately, 21 original articles including a total of 3857 patients assessing the diagnostic performance of PI-RADSv2 were included in the meta-analysis [16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36]. No additional studies were identified via screening the bibliographies of these 21 studies. Among them, 15 studies including 3099 patients dealt with PI-RADSv2 alone, whereas six studies including 758 patients provided a head-to-head comparison between PI-RADSv1 and PI-RADSv2 [16 20 21 28 32 33]. The detailed study selection process is described in Fig. 1.
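The counts in the selection flow can be cross-checked with simple arithmetic. The Python sketch below (function and key names are hypothetical) derives the number of records excluded at title/abstract screening, which the text does not state explicitly (241 − 105 = 136), and confirms that the included studies and patients add up across the PI-RADSv2-only and head-to-head sets.

```python
def prisma_counts(identified, duplicates, full_text_assessed, full_text_exclusions):
    """Derive the remaining PRISMA flow counts from the reported ones."""
    screened = identified - duplicates
    excluded_on_title_abstract = screened - full_text_assessed
    included = full_text_assessed - sum(full_text_exclusions.values())
    return {
        "screened": screened,
        "excluded_on_title_abstract": excluded_on_title_abstract,
        "included": included,
    }


flow = prisma_counts(
    identified=287,
    duplicates=46,
    full_text_assessed=105,
    full_text_exclusions={
        "not in field of interest": 80,
        "insufficient data for 2 x 2 tables": 2,
        "shared study population": 2,
    },
)
print(flow)

# Included studies split into PI-RADSv2-only and head-to-head (v1 vs v2) sets: (studies, patients).
v2_only, head_to_head = (15, 3099), (6, 758)
assert v2_only[0] + head_to_head[0] == flow["included"] == 21
assert v2_only[1] + head_to_head[1] == 3857
```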
Patient characteristics are shown in Table 1. Study population size ranged from 49 to 456 patients, and the percentage of patients with PCa ranged from 37% to 100%. Median patient age ranged from 62 to 69.6 yr, median PSA from 3.97 to 15 ng/ml, and Gleason scores ranged from 5 to 10. In seven studies, all or some of the patients had already been diagnosed with PCa before MRI [16 17 22 26 27 32 35]. Biopsy had been performed before MRI in seven studies [16 21 22 23 26 27 35], all patients were biopsy naïve in three studies [18 25 34], both patient types were included in three studies [28 32 33], and data regarding previous biopsy were not reported in the remaining eight studies.
First author (year) | Country | Institution | Duration of patient recruitment | Patients (n) | Patients with PCa (n) | Age, median (yr) | Age, range (yr) | PSA, median (ng/ml) | PSA, range (ng/ml) | Gleason score, median | Gleason score, range | No. of previous biopsies | PCa diagnosis before MRI |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Auer (2016) [16] | Austria | Medical University of Innsbruck, Medizinische Hochschule Hannover | NR | 50 | NR | 63 a | NR | 7.3 a | NR | 7 (3 + 4) | 6–10 | ≥1 | Yes (all) |
Baldisserotto (2016) [17] | Brazil | Pontifícia Universidade Católica do Rio Grande do Sul | 2013.6–2015.6 | 54 | 33 | 65.9 a | 53–81 | 8.4 a | 3–31 | 7 (3 + 4) | 6–9 | NR | Yes (some) |
De Visschere (2017) [18] | Belgium | Ghent University Hospital | 2011.5–2014.12 | 245 | 144 | 66 | 44–85 | 9 | 1.4–935.5 | 7 (3 + 4) | ≥6 | 0 | No |
El-Samei (2016) [19] | Egypt | Al-Azher University, El-Minia University | 2014.5–2015.10 | 55 | 38 | 62 a | 51–79 | NR | NR | 7 | 6–10 | NR | No |
Feng (2016) [20] | China | Tongji Hospital | 2013.6–2015.7 | 401 | 150 | 64.4 a | 34–88 | 10.7 | 0.2–1763 | 7 (4 + 3) | ≤6–≥8 | NR | NR |
Kasel-Seibert (2016) [21] | Germany | University Hospital Jena | 2013.7–2015.3 | 82 | 31 | 65 | 48–81 | 13 | 1–111 | 7 (3 + 4) | 6–9 | 1–5 | No |
Lin (2016) [22] | Brazil, Taiwan | Ribeirão Preto School of Medicine, China Medical University, Ribeirão Preto School of Medicine | 2011.5–2014.6 | 49 | 49 | 63 | 46–73 | 13.27 | 1.8–41.4 | | | ≥1 | Yes (all) |
Martorana (2016) [23] | Italy | University of Modena and Reggio Emilia, Perugia University | 2014.1–2016.2 | 157 | 79 | 65 | 47–79 b | 10.7 | 1–75 | 6 | 6–8 | 1–3 | No |
Mertan (2016) [24] | USA | National Cancer Institute | 2015.3–2015.9 | 62 | 38 | 65.5 | 50.3–76.6 | 7.1 | 0.5–863 | 7 (3 + 4) | 6–9 | NR | NR |
Muller (2015) [25] | USA, the Netherlands | National Cancer Institute, AMC University Hospital, Edward Hébert School of Medicine | 2011.12–2014.5 | 94 | 94 | 62 | 37–79 | 8.51 | 0.7–51.1 | 7 (3 + 4) | 6–10 | 0 | No |
Park (2016) [26] | South Korea | Samsung Medical Center | 2012.1–2014.12 | 456 | 456 | 65 | 46–81 | 3.97 | 3.8–4.2 | NR | ≥6 | ≥1 | Yes (all) |
Park (2016) [27] | South Korea | Yonsei University College of Medicine | 2012.1–2013.3 | 425 | 425 | NR | NR | NR | 1.4–156.9 | 5–6 | 5–10 | ≥1 | Yes (all) |
Polanec (2016) [28] | Austria | Medical University of Vienna, Confraternität Vienna | 2011.6–2015.9 | 65 | NR | 65.3 a | 62.3–87.4 | 10.8 a | 4.2–74.5 | 7 (4 + 3) | 6–9 | ≥0 | No |
Rastinehad (2015) [29] | USA | Icahn School of Medicine at Mount Sinai, Fox Chase Cancer Center, National Institutes of Health, Hofstra North Shore LIJ School of Medicine | 2012.2–2014.11 | 312 | 202 | 65.1 | 60.3–70.3 b | 7.3 | 5.0–11.4 b | NR | NR | NR | No |
Rosenkrantz (2016) [30] | USA | NYU Langone Medical Center | 2013.9–2015.2 | 343 | 134 | 64 | NR | 5.8 | NR | 7 (3 + 4) | 6–9 | NR | NR |
Stanzione (2016) [31] | Italy | University “Federico II”, Ospedale S. Maria delle Grazie | NR | 82 | 34 | 65 a | 43–84 | 8.8 a | NR | 7 (3 + 4) | 6–9 | NR | No |
Tan (2016) [32] | USA, Taiwan | UCLA, Cathay General Hospital, Hoag Hospital | 2013.3–2016.12 | 106 | 63 | 66.5 | 43–79 | 7.9 | 5.6–10.6 | 7 (3 + 4) | 6–10 | ≥0 | Yes (some) |
Tewes (2016) [33] | Germany | Hannover Medical School, Klinikum der Region Hannover | 2012.12–2014.12 | 54 | 31 | 69.6 a | NR | 8.7 a | NR | 6 | 6–9 | ≥0 | No |
Washino (2017) [34] | Japan | Jichi Medical University Saitama Medical Center | 2010.6–2014.4 | 288 | 159 | 69 | 64–74 b | 7.5 | 5.5–11 b | NR | NR | 0 | No |
Woo (2016) [35] | South Korea | Seoul National University College of Medicine | 2011.1–2013.12 | 105 | 105 | 69 | 49–79 | 8.22 | 0.9–44.2 | 7 | 6–7 | ≥1 | Yes (all) |
Zhao (2016) [36] | China | Peking University First Hospital | 2010.11–2013.12 | 372 | 185 | 68.5 a | NR | 15 a | NR | 7 | 6–10 | NR | No |
Characteristics of the studies are summarized in Table 2. MRI was performed using 3-T scanners in 16 studies [16 17 18 20 24 25 26 27 28 29 30 31 32 33 35 36], 1.5-T scanners in four studies [19 21 22 23], and either 3 or 1.5 T in one study [34]. Endorectal coils were used in seven studies [19 21 22 23 24 25 29]. In all studies, the mpMRI protocol consisted of T2WI, DWI, and DCE-MRI. The reference standard was radical prostatectomy in five studies [16 22 26 27 35], a combination of systematic and targeted biopsies in seven studies [17 20 23 25 29 34 36], and targeted biopsy alone in seven studies [19 21 24 28 30 32 33]; in two studies, the reference standard was not consistent throughout the study population [18 31]. PI-RADSv2 scoring was performed by one to five radiologists, either in consensus or independently. The radiologists' level of experience was heterogeneous, ranging from 4 to 22 yr in prostate MRI. In most studies, the readers were blinded; however, in three studies, the radiologists were aware that the patients had biopsy-proven PCa [22 26 35], and five studies did not state explicitly whether readers were blinded [18 19 30 32 36]. In most studies, the interval between MRI and the reference standard was less than 6 mo; however, this interval was not reported in 10 studies [16 17 18 24 27 28 29 30 32 33]. PCa was assessed separately according to zonal anatomy in seven studies [20 24 25 28 30 31 33]. However, in one of these [31], only PCa in the PZ could be evaluated, as no detailed data were provided in the article and our attempt to contact the authors for further information was unsuccessful. Regarding the outcome assessed, eight studies evaluated any cancer [17 19 20 23 25 28 31 33], eight evaluated clinically significant cancer [18 22 26 27 30 34 35 36], and five evaluated both [16 21 24 29 32].
With regard to cutoff values, 13 studies used ≥4 [16 18 19 20 21 24 25 26 27 28 30 31 35], four studies used ≥3 [32 33 34 36], and four studies used both [17 22 23 29]. The location of PCa was reported separately for the PZ and TZ in six studies [20 24 25 28 30 33].
First author (year) | Design | Consecutive enrollment | Reference standard | MRI–reference standard interval | No. of readers | Experience (yr) | Blinding | Magnet strength (T) | Vendor | Model | Endorectal coil | PI-RADSv2 application | Cutoff values | Localization | Type of analysis | Outcome assessed | Definition of csPCa |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Auer (2016) [16] | Retrospective | Yes | RP | NR | 1 | >5 | Yes | 3 | Siemens | Skyra | No | Strict | 4 | Whole | Lesion | Any + csPCa | GS ≥7 (4 + 3) |
Baldisserotto (2016) [17] | Retrospective | Yes | STRUSGB + targeted MRI–TRUS biopsy (cognitive) or RP | NR | 2 Independent | >10/1 | Yes | 3 | GE | Signa HDxt | No | Strict | 3, 4 | Whole | Patient | Any | |
De Visschere (2017) [18] | Retrospective | NR | STRUSGB or RP | NR | NR | NR | NR | 3 | Siemens | Trio | No | Strict | 4 | Whole | Patient | csPCa | GS ≥7, ≥0.5 cc, or EPE |
El-Samei (2016) [19] | NR | NR | Targeted MRI–TRUS biopsy | ≤2 wk | NR | NR | NR | 1.5 | Philips | Gyroscan | Yes | Strict | 4 | Whole | Lesion | Any | |
Feng (2016) [20] | Retrospective | Yes | STRUSGB + targeted MRI–TRUS biopsy (cognitive) | 1–35 d | 2 Consensus | 5/4 | Yes | 3 | Siemens | Skyra | No | Strict | 4 | PZ, TZ | Patient | Any | |
Kasel-Seibert (2016) [21] | Retrospective | NR | Targeted MRGB | ≤7 d | 2 Independent | 10/<1 | Yes | 1.5 | Siemens | Avanto | Yes | Strict | 4 | Whole | Lesion | Any + csPCa | GS ≥7 |
Lin (2016) [22] | Retrospective | NR | RP | ≤6 mo | 2 Independent | 12/5 | Yes a | 1.5 | Philips | Achieva | Yes | Strict | 3, 4 | Whole | Lesion | csPCa | GS ≥7 and >0.5 cc |
Martorana (2016) [23] | Retrospective | Yes | TTB + targeted MRI–TRUS biopsy | 30–78 | 2 Consensus | 4/4 | Yes | 1.5 | Philips | Achieva | Yes | Strict | 3, 4 | Whole | Lesion | Any | GS ≥7, >0.5 cc, or EPE |
Mertan (2016) [24] | Prospective | Yes | STRUSGB + targeted MRI–TRUS biopsy | NR | 1 | >8 | Yes | 3 | Philips | Achieva | Yes | Strict | 4 | PZ, TZ | Lesion | Any + csPCa | GS ≥7 |
Muller (2015) [25] | Retrospective | Yes | STRUSGB + targeted MRI–TRUS biopsy | ≤6 wk | 5 Independent | 12/7/1/1/0.5 | Yes | 3 | Philips | Achieva | Yes | Strict | 4 | PZ, TZ | Lesion | Any | GS ≥7 (4 + 3) |
Park (2016) [26] | Retrospective | Yes | RP | 21–48 | 2 Independent | 14/3 | Yes a | 3 | Philips | Achieva | No | Strict | 4 | Whole | Patient | csPCa | GS ≥7, ≥0.5 cc, or EPE |
Park (2016) [27] | Retrospective | NR | RP | NR | 2 Independent | 9/4 | Yes | 3 | GE, Philips, Siemens | Discovery MR750, Achieva, Trio | No | Strict | 4 | Whole | Patient | csPCa | GS ≥7, ≥0.5 cc, or EPE |
Polanec (2016) [28] | Retrospective | Yes | Targeted MRGB | NR | 2 Independent | NR/NR | Yes | 3 | Siemens | Trio | No | Strict | 4 | PZ, TZ | Lesion | Any | |
Rastinehad (2015) [29] | Prospective | NR | STRUSGB + targeted MRI–TRUS biopsy | NR | 3 Consensus | NR | Yes | 3 | Siemens | Verio | Yes | Calculated b | 3, 4 | Whole | Patient | Any + csPCa | Epstein's criteria c for systematic and GS ≥7 or >0.5 cc for targeted biopsy |
Rosenkrantz (2016) [30] | Retrospective | NR | Targeted MRI–TRUS biopsy | NR | 2 Independent | 8/3 | NR | 3 | Siemens | Biograph mMR/Prisma/Skyra/Trio | No | Strict | 4 | PZ, TZ | Lesion | csPCa | GS ≥7 |
Stanzione (2016) [31] | Prospective | NR | STRUSGB (±targeted TRUSGB) | 20–30 d | 2 Independent | 14/10 | Yes | 3 | Siemens | Trio | No | Strict | 4 | PZ | Patient | Any | |
Tan (2016) [32] | Retrospective | Yes | Targeted MRGB | NR | 3 Consensus | 16/10 | NR | 3 | Siemens | Skyra, Trio, Verio | No | Strict | 3 | Whole | Lesion | Any + csPCa | GS ≥7 |
Tewes (2016) [33] | Retrospective | Yes | Targeted MRGB | NR | 2 Independent | 5/2 | Yes | 3 | Siemens | Skyra | No | Strict | 3 d | PZ, TZ | Patient | Any | |
Washino (2017) [34] | Retrospective | NR | TTB + targeted MRI–TRUS biopsy (cognitive) | 0.5–1.6 mo | 1 | 14 | Yes | 1.5 or 3 | Toshiba | Excelart Vantage/Vantage Titan 3T | No | Strict | 3 | Whole | Patient | csPCa | GS ≥7 or maximum core length ≥4 mm |
Woo (2016) [35] | Retrospective | NR | RP | ≤6 mo | 2 Consensus | 22/10 | Yes a | 3 | Philips, Siemens | Ingenia, Verio/Trio | No | Strict | 4 | Whole | Patient | csPCa | GS ≥7 |
Zhao (2016) [36] | Retrospective | NR | STRUSGB + targeted MRI–TRUS biopsy | ≤3 mo | 2 Independent | NR | NR | 3 | NR | NR | No | Strict | 3 | Whole | Patient | csPCa | GS ≥7 |
a Blinded but aware that patients had biopsy-proven PCa.
b PI-RADSv2 scores were generated from reports based on PI-RADSv1 and simplified qualitative system.
c Epstein's criteria = Gleason pattern ≥4, or Gleason 3 + 3 disease with core length ≥50% and/or >2 cores positive.
d For transition zone, cutoff value = 4.
csPCa = clinically significant prostate cancer; GS = Gleason score; EPE = extraprostatic extension; MRGB = magnetic resonance imaging-guided biopsy; MRI = magnetic resonance imaging; MRI–TRUS = fusion of magnetic resonance imaging and transrectal ultrasound images; NR = not reported; PCa = prostate cancer; PI-RADSv2 = Prostate Imaging Reporting and Data System version 2; PZ = peripheral zone; RP = radical prostatectomy; STRUSGB = systematic transrectal ultrasound-guided biopsy; TRUSGB = transrectal ultrasound-guided biopsy; TTB = transperineal template biopsy; TZ = transition zone.
Overall, the quality of the studies was not considered high, mainly because of the patient selection domain (Fig. 2). In the patient selection domain, there was generally a high risk of bias, as all but four of the studies were retrospective [16 17 18 20 21 22 23 25 26 27 28 30 32 33 34 35 36]. Seven studies were considered to have high concern for applicability, as all or some of the patients had a pathological diagnosis of PCa before MRI [16 17 22 26 27 32 35]. In the index test domain, there was a high risk of bias in nine studies. In three of these, the readers were aware that patients had biopsy-proven PCa [22 26 35]; in the other six, the cutoff value for determining PCa was not specified before interpretation [16 20 29 31 33 34]. Only one study had concern for applicability, as PI-RADSv2 scores were indirectly derived from existing clinical radiological reports based on PI-RADSv1 or an in-house scoring system [29]. In the reference standard domain, eight studies had a high risk of bias. Seven were based on systematic biopsy alone or targeted biopsy alone [18 19 21 28 30 32 33]; in one study, targeted biopsy was directed at lesions that were suspicious on ultrasonography rather than on MRI [31].
Studies in which radical prostatectomy or systematic plus targeted biopsy (MRI-guided, MRI–transrectal ultrasound fusion, or cognitive) was used as the reference standard were considered to have a low risk of bias. In 10 studies, the definition of clinically significant cancer did not follow that of the PI-RADSv2 guidelines, and these studies were therefore judged to have high concern for applicability [16 21 22 24 29 30 32 34 35 36]. In the flow and timing domain, two studies had a high risk of bias, as not all patients received the same reference standard [17 18].
The sensitivity and specificity of individual studies were 73–100% and 8–100%, respectively. The Q test revealed substantial heterogeneity (p < 0.001). The Higgins I² statistic demonstrated substantial heterogeneity in terms of sensitivity (I² = 85.55%) and considerable heterogeneity in terms of specificity (I² = 95.30%). The coupled forest plot of sensitivity and specificity demonstrated the absence of a threshold effect (Fig. 3). The Spearman correlation coefficient between the sensitivity and the false-positive rate was 0.45 (95% confidence interval [CI] 0.023–0.738), also indicating the lack of a threshold effect.
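The two heterogeneity checks in this paragraph can be illustrated briefly. Higgins I² expresses the share of total variation attributable to between-study heterogeneity, I² = max(0, (Q − df)/Q) × 100%, where Q is Cochran's Q and df = k − 1 for k studies; the Spearman correlation between sensitivity and false-positive rate (1 − specificity) is a common screen for a threshold effect. The Python sketch below is a generic illustration with hypothetical inputs and average ranks for ties, not the `midas` routine used in the analysis.

```python
import math


def cochran_q(estimates, variances):
    """Cochran's Q: inverse-variance weighted squared deviations from the fixed-effect mean."""
    w = [1 / v for v in variances]
    mean = sum(wi * yi for wi, yi in zip(w, estimates)) / sum(w)
    return sum(wi * (yi - mean) ** 2 for wi, yi in zip(w, estimates))


def i_squared(q, df):
    """Higgins I^2 in percent, truncated at 0 when Q <= df."""
    if q <= 0:
        return 0.0
    return max(0.0, (q - df) / q) * 100


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman rho as the Pearson correlation of the average ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical per-study sensitivities and specificities.
sens = [0.85, 0.90, 0.95, 0.88, 0.93]
spec = [0.80, 0.70, 0.40, 0.75, 0.60]
fpr = [1 - s for s in spec]
print(f"Spearman rho (sensitivity vs FPR) = {spearman(sens, fpr):.2f}")
print(f"I^2 for Q = 40 on 4 df: {i_squared(40.0, 4):.1f}%")
```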
For all 21 studies combined, the pooled sensitivity was 0.89 (95% CI 0.86–0.92) with specificity of 0.73 (95% CI 0.60–0.83; Fig. 3 ). In the HSROC curve, there was a large difference between the 95% confidence region and the 95% prediction region, thus indicating heterogeneity between the studies ( Fig. 4 ). The area under the HSROC curve was 0.91 (95% CI 0.88–0.93). According to the Deeks’ funnel plot, the likelihood of publication bias was low, with a p value of 0.75 for the slope coefficient ( Fig. 5 ).
For the six studies that provided a head-to-head comparison between PI-RADSv1 and PI-RADSv2, PI-RADSv2 demonstrated higher pooled sensitivity of 0.95 (95% CI 0.85–0.98) compared with 0.88 (95% CI 0.80–0.93) for PI-RADSv1 ( p = 0.04). However, the pooled specificity did not show a significant difference between the two versions of PI-RADS: 0.73 (95% CI 0.47–0.89) for v2 and 0.75 (95% CI 0.36–0.94) for v1 ( p = 0.90).
As we found considerable heterogeneity among the included studies, meta-regression analyses were performed (Supplementary Table 1). Among several potential variables, proportion of patients with PCa, magnetic field strength, and reference standard were significant factors affecting the heterogeneity (p < 0.01 for all). However, among these three, only specificity according to the proportion of patients with PCa showed a difference that was both statistically significant and clinically meaningful: 0.65 (95% CI 0.52–0.78) in studies with ≤50% of patients with PCa versus 0.86 (95% CI 0.75–0.97) in studies with >50% of patients with PCa. Otherwise, no clinically meaningful differences were seen: for magnet strength (3 vs 1.5 T), sensitivity of 0.90 (95% CI 0.86–0.94) versus 0.89 (95% CI 0.81–0.97, p = 0.03) and specificity of 0.73 (95% CI 0.59–0.86) versus 0.72 (95% CI 0.44–1.00, p = 0.81); for reference standard (radical prostatectomy vs biopsy), sensitivity of 0.89 (95% CI 0.83–0.95) versus 0.91 (95% CI 0.88–0.94, p < 0.01) and specificity of 0.65 (95% CI 0.37–0.94) versus 0.73 (95% CI 0.58–0.87, p = 0.48). Other variables, including the cutoff value, use of endorectal coil, and type of analysis (per patient vs per lesion), were not significant factors (p = 0.32–0.70).
Because four studies used both ≥3 and ≥4 as cutoff values [17 22 23 29] and five studies assessed both any cancer and clinically significant cancer [16 21 24 29 32], multiple subgroup analyses were performed to assess various clinical settings (Supplementary Table 2). Regarding cutoff values, the pooled sensitivity was 0.89 (95% CI 0.84–0.92) with specificity of 0.74 (95% CI 0.58–0.85) for 17 studies using ≥4 [16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 35], whereas these values were 0.95 (95% CI 0.89–0.97) and 0.47 (95% CI 0.21–0.74) in eight studies using ≥3 [17 22 23 29 32 33 34 36]. When we stratified studies according to the outcome assessed, the results were as follows: (1) cutoff of ≥4 for determining any PCa, sensitivity of 0.89 (95% CI 0.83–0.93) with specificity of 0.80 (95% CI 0.62–0.90); (2) cutoff of ≥3 for determining any PCa, sensitivity of 0.96 (95% CI 0.93–0.98) with specificity of 0.49 (95% CI 0.29–0.70); (3) csPCa regardless of cutoff value, sensitivity of 0.89 (95% CI 0.84–0.92) with specificity of 0.64 (95% CI 0.46–0.78); (4) cutoff of ≥4 for determining csPCa, sensitivity of 0.90 (95% CI 0.85–0.94) with specificity of 0.62 (95% CI 0.45–0.77); and (5) cutoff of ≥3 for determining csPCa, sensitivity of 0.96 (95% CI 0.87–0.99) with specificity of 0.29 (95% CI 0.05–0.77). When the studies using a cutoff value of ≥4 were assessed separately according to the type of analysis, per-patient analysis in eight studies [17 18 20 26 27 29 31 35] yielded pooled sensitivity of 0.89 (95% CI 0.81–0.93) with specificity of 0.76 (95% CI 0.60–0.88), whereas per-lesion analysis in nine studies [16 19 21 22 23 24 25 28 30] yielded pooled sensitivity of 0.87 (95% CI 0.83–0.91) with specificity of 0.70 (95% CI 0.44–0.88).
Based on the localization of PCa, the pooled sensitivity was 0.93 (95% CI 0.87–0.96) with specificity of 0.68 (95% CI 0.43–0.86) in seven studies analyzing PZ cancers [20 24 25 28 30 31 33]. For TZ cancers, analyzed in the same studies except that by Stanzione et al [31], the pooled sensitivity and specificity were 0.88 (95% CI 0.77–0.94) and 0.75 (95% CI 0.59–0.86), respectively.
Studies including patients without previous biopsies yielded sensitivity of 0.82 (95% CI 0.72–0.90) and specificity of 0.75 (95% CI 0.65–0.83), whereas the diagnostic performance values were 0.87 (95% CI 0.80–0.92) and 0.71 (95% CI 0.42–0.89) in studies including patients with a history of previous biopsy.
In our meta-analysis, we assessed the diagnostic accuracy of PI-RADSv2 for detecting PCa. The pooled sensitivity and specificity of all 21 studies were 0.89 (95% CI 0.86–0.92) and 0.73 (95% CI 0.60–0.83), respectively. Compared with the only two existing meta-analyses of mpMRI for detecting PCa, our data suggest a trend toward higher sensitivity and lower specificity. In the study by de Rooij et al [37], which evaluated seven studies using a combination of T2WI, DWI, and DCE-MRI, the pooled sensitivity and specificity were 0.74 (95% CI 0.66–0.81) and 0.88 (95% CI 0.82–0.92), respectively. In a more recent meta-analysis by Hamoen et al [6], which analyzed 14 studies using PI-RADSv1, the pooled sensitivity and specificity were 0.78 (95% CI 0.70–0.84) and 0.79 (95% CI 0.68–0.86), respectively. However, comparisons across these three meta-analyses are only indirect. To address this issue, we separately assessed the subgroup of studies that used both PI-RADSv1 and PI-RADSv2. In this head-to-head comparison, PI-RADSv2 demonstrated higher pooled sensitivity (0.95) than PI-RADSv1 (0.88, p = 0.04) without a statistically significant difference in specificity (0.73 vs 0.75, p = 0.90). This increase in sensitivity over its predecessor may imply that the revisions made in developing PI-RADSv2, including the introduction of dominant sequences according to zonal anatomy, the limited contribution of DCE-MRI secondary to DWI and T2WI, and specific guidelines for deriving an integrated overall score, were on the right track. In particular, we speculate that the use of dominant sequences, that is, DWI for the PZ and T2WI for the TZ, may have been crucial for the improved sensitivity without a loss in specificity, as suggested by Baur et al [10].
Considering that one of the main aims of PI-RADS was to standardize mpMRI reporting in order to decrease variability and promote widespread acceptance and implementation in daily practice, it was encouraging that nearly all (20 of 21) studies applied PI-RADSv2 strictly according to the published guidelines [11]. Only one study derived PI-RADSv2 scores from existing clinical radiological reports that were based on PI-RADSv1 or an in-house scoring system [29]. This is an improvement over prior studies using PI-RADSv1, in which investigators used varying methods to determine the overall score (an overall five-point score or the sum of the scores from each modality) [6]. Still, the cutoff value for detecting PCa needs further clarification. In the studies included in our meta-analysis, cutoff values were predefined in only six studies, while the majority (15/21) were exploratory, testing multiple criteria. A cutoff value of ≥4 yielded generally good sensitivity (0.89) and specificity (0.74), whereas ≥3 yielded excellent sensitivity (0.95) but poor specificity (0.47). These results may be considered when developing the next update of PI-RADS. For instance, ≥4 may be adequate for general use of PI-RADS, whereas ≥3 could be proposed when a higher cancer detection rate is clinically required (eg, a persistently high PSA level despite a previous negative biopsy).
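The trade-off between the ≥3 and ≥4 cutoffs follows directly from dichotomizing the five-point score: lowering the cutoff reclassifies score-3 cases as positive, gaining true positives at the cost of false positives. The Python sketch below shows this with hypothetical score distributions (not data from the included studies); the function and variable names are ours.

```python
# Hypothetical counts of patients by highest PI-RADSv2 score; index 0 is score 1, index 4 is score 5.
WITH_PCA = [2, 3, 10, 40, 45]       # 100 patients with PCa on the reference standard
WITHOUT_PCA = [30, 25, 20, 15, 10]  # 100 patients without PCa


def sens_spec_at_cutoff(with_pca, without_pca, cutoff):
    """Call a score positive when it is >= cutoff; return (sensitivity, specificity)."""
    tp = sum(with_pca[cutoff - 1:])     # scores cutoff..5 among patients with PCa
    fn = sum(with_pca[:cutoff - 1])     # scores 1..cutoff-1 among patients with PCa
    fp = sum(without_pca[cutoff - 1:])  # scores cutoff..5 among patients without PCa
    tn = sum(without_pca[:cutoff - 1])  # scores 1..cutoff-1 among patients without PCa
    return tp / (tp + fn), tn / (tn + fp)


for cutoff in (3, 4):
    sens, spec = sens_spec_at_cutoff(WITH_PCA, WITHOUT_PCA, cutoff)
    print(f"cutoff >={cutoff}: sensitivity {sens:.2f}, specificity {spec:.2f}")
```

With these invented counts, moving from ≥4 to ≥3 adds the 10 score-3 cancers as true positives and the 20 score-3 non-cancers as false positives, mirroring the direction (not the magnitude) of the pooled results.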
In the current study, subgroup analyses were performed to account for differences in outcomes (any cancer vs clinically significant cancer). Results did not differ significantly between the two outcomes, whether a cutoff of ≥3 or ≥4 was used. However, the definition of clinically significant cancer varied among the 13 studies assessing it. Only three studies defined csPCa strictly according to the PI-RADSv2 guidelines (Gleason score ≥7 [including 3 + 4], volume ≥0.5 ml, or extraprostatic extension) [11]; most others used one or two of the three criteria. Restricting the analysis to these three studies might have yielded more robust results; however, including all available studies was pragmatic and provides a general overview of the existing literature, as this is the first meta-analysis of studies dealing with PI-RADSv2.
In this meta-analysis, we also examined technical aspects of MRI. Meta-regression analyses revealed that the use of an endorectal coil was not a statistically significant factor. Furthermore, although magnet strength showed a statistically significant difference between 3 and 1.5 T, this difference was not clinically meaningful (sensitivity of 0.90 vs 0.89, respectively; p = 0.03). Although both issues have been debated in the past, both 3 and 1.5 T are now well established, and the overall benefit of an endorectal coil is not evident [38 39]. The PI-RADSv2 guidelines currently accept either option, and the results of our study provide additional evidence to support this.
Regarding the methods of analysis, the included studies were heterogeneous in reference standard and type of analysis. Radical prostatectomy was the reference standard in five studies, while most of the others relied on biopsy, either targeted alone or combined with systematic biopsy. In the latter group, PCa may have been present despite negative biopsy results. In addition, roughly half of the studies reported outcomes per patient (n = 11) and the other half per lesion (n = 10). Per-lesion analysis additionally reflects the ability to localize disease; however, type of analysis was not a significant factor in the meta-regression analysis.
Our meta-analysis had some limitations. Nearly all studies were retrospective, resulting in a high risk of bias for patient selection. Pooling data from predominantly retrospective studies may have led to increased diagnostic sensitivity [40]. In addition, a meta-analysis of only the three prospective studies was not only technically unfeasible but would also not have been representative of the existing literature on PI-RADSv2. Nevertheless, we used validated methods for the systematic review and reported the data according to standard reporting guidelines, including PRISMA and the Handbook for Diagnostic Test Accuracy Reviews published by the Cochrane Collaboration [12 41]. Another limitation is the considerable heterogeneity in our pooled analysis, which limits the general applicability of our summary estimates. To explore this heterogeneity, we performed meta-regression and multiple subgroup analyses, which identified the proportion of patients with PCa, the magnetic field strength, and the reference standard as significant factors. In particular, the reference standard included various methods, including radical prostatectomy and a combination of systematic and targeted (MRI-guided, MRI–transrectal ultrasound fusion, or cognitive) biopsies. Furthermore, the use of various definitions of clinically significant cancer needs to be emphasized. Our meta-regression and subgroup analyses may explain some of the heterogeneity, but a portion remains unexplained. Another important limitation is the small number of studies available for head-to-head comparison between PI-RADSv1 and PI-RADSv2. Nevertheless, we were able to demonstrate a statistically significant difference in sensitivity between the two versions using the six studies published to date.
PI-RADSv2 shows good performance for the detection of PCa with pooled sensitivity of 0.89 and specificity of 0.73. PI-RADSv2 has higher pooled sensitivity compared with PI-RADSv1 without significantly different specificity.
Author contributions: Sang Youn Kim had full access to all the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis.
Study concept and design: Woo, Suh, S.Y. Kim.
Acquisition of data: Woo, Suh, S.Y. Kim.
Analysis and interpretation of data: Woo, Suh, S.Y. Kim.
Drafting of the manuscript: Woo.
Critical revision of the manuscript for important intellectual content: Suh, S.Y. Kim, Cho, S.H. Kim.
Statistical analysis: Suh.
Obtaining funding: None.
Administrative, technical, or material support: None.
Supervision: S.Y. Kim, Cho, S.H. Kim.
Other: None.
Financial disclosures: Sang Youn Kim certifies that all conflicts of interest, including specific financial interests and relationships and affiliations relevant to the subject matter or materials discussed in the manuscript (eg, employment/affiliation, grants or funding, consultancies, honoraria, stock ownership or options, expert testimony, royalties, or patents filed, received, or pending), are the following: None.
Funding/Support and role of the sponsor: None.
Prostate cancer (PCa) is the second leading cause of cancer-related mortality in Western men [1]. With recent technological advances and growing availability, multiparametric magnetic resonance imaging (mpMRI) is increasingly used to guide several aspects of PCa management, including detection, staging, and treatment planning [2]. Despite abundant evidence in the literature reporting high accuracy of mpMRI for PCa diagnosis, widespread acceptance has been hampered by several factors, including difficulty of interpretation, lack of standardized interpretation criteria (ie, reliance on Likert scales based on the radiologist's subjective level of suspicion for PCa), and the resulting substantial inter-reader variability [3 4].
To bring standardization to the evaluation and reporting of mpMRI of the prostate, the European Society of Urogenital Radiology (ESUR) published a guideline termed Prostate Imaging Reporting and Data System (PI-RADS) in 2012 [5]. PI-RADS was generated based on expert consensus and provides a detailed scoring system for each MRI sequence (T2-weighted imaging [T2WI], diffusion-weighted imaging [DWI], dynamic contrast-enhanced MRI [DCE-MRI], and MR spectroscopy) for the presence of clinically significant PCa (csPCa). Several investigators have validated the accuracy and reproducibility of PI-RADS, and a recent meta-analysis reported pooled sensitivity and specificity of 0.78 and 0.79, respectively [6]. However, as there was no guideline for generating an overall score, different research groups used various measures for this purpose: some used a sum of the scores from each sequence (ranging from 3 to 15), whereas others used an overall score of 1–5 [7 8]. Furthermore, emerging data questioned the value of curve-type analysis of DCE-MRI [9]. In addition, investigators suggested that certain sequences should carry more weight in determining the likelihood of PCa (ie, DWI in the peripheral zone [PZ] and T2WI in the transition zone [TZ]) instead of all sequences being weighted equally [10].
To address these issues, the ESUR and the American College of Radiology recently released the updated PI-RADS version 2 (PI-RADSv2) [11]. The main changes from PI-RADSv1 to PI-RADSv2 are the following: (1) introduction of dominant sequences according to zonal anatomy (DWI for the PZ and T2WI for the TZ), (2) restriction of the contribution of DCE-MRI to the presence or absence of early focal enhancement, and (3) generation of an overall score (1–5) integrating findings across all MRI sequences.
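The dominant-sequence logic of change (1) can be illustrated with a minimal Python sketch. It assumes the zone-specific rules of the 2015 PI-RADSv2 document (in the PZ, the DWI score determines the category, and a DWI score of 3 is upgraded to 4 when DCE-MRI shows early focal enhancement; in the TZ, the T2WI score determines the category, and a T2WI score of 3 is upgraded to 4 when the DWI score is 5) and ignores special situations such as nondiagnostic DWI or DCE-MRI. The function name `pirads_v2_overall` is hypothetical and not part of any published software.

```python
from typing import Literal

Zone = Literal["PZ", "TZ"]


def pirads_v2_overall(zone: Zone, t2wi: int, dwi: int, dce_positive: bool) -> int:
    """Overall PI-RADSv2 category (1-5) from per-sequence scores.

    Simplified sketch: assumes diagnostic-quality DWI and DCE-MRI and applies
    only the zone-specific dominant-sequence rules of PI-RADSv2 (2015).
    """
    for name, score in (("t2wi", t2wi), ("dwi", dwi)):
        if not 1 <= score <= 5:
            raise ValueError(f"{name} score must be 1-5, got {score}")

    if zone == "PZ":
        # DWI is dominant in the peripheral zone; a positive DCE-MRI
        # (early focal enhancement) only upgrades a DWI score of 3 to 4.
        return 4 if dwi == 3 and dce_positive else dwi
    if zone == "TZ":
        # T2WI is dominant in the transition zone; a DWI score of 5 upgrades
        # a T2WI score of 3 to 4. DCE-MRI does not affect the TZ category.
        return 4 if t2wi == 3 and dwi == 5 else t2wi
    raise ValueError(f"unknown zone: {zone}")


# Example: PZ lesion with DWI score 3 and early focal enhancement.
print(pirads_v2_overall("PZ", t2wi=2, dwi=3, dce_positive=True))  # 4
```

Under these assumptions, a TZ lesion scored 3 on T2WI stays in category 3 regardless of DCE-MRI findings unless DWI is scored 5, which reflects how DCE-MRI was relegated to a secondary role.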
Since its release, several studies assessing the value of PI-RADSv2 have been published. However, the diagnostic performance of this new scoring system has not been evaluated systematically. Therefore, the purpose of our study was to assess the diagnostic performance of PI-RADSv2 for the detection of PCa. In addition, we aimed to compare the diagnostic performance of PI-RADSv1 and PI-RADSv2 in studies available for head-to-head comparison.
This meta-analysis was performed and written according to the preferred reporting items for systematic reviews and meta-analyses (PRISMA) guidelines [12] .
A computerized search of MEDLINE and EMBASE up to December 7, 2016, was performed in order to identify studies evaluating the diagnostic performance of PI-RADSv2 for the detection of PCa. The search query combined synonyms for PCa, MRI, and PI-RADS as follows: (prostate cancer OR prostatic cancer OR prostate neoplasm OR prostatic neoplasm OR prostate tumor OR prostatic tumor OR prostate carcinoma OR prostatic carcinoma OR PCa) AND (magnetic resonance imaging OR MRI OR MR) AND (prostate imaging reporting and data system OR pi-rads OR pi rads OR pirads). Bibliographies of identified articles were also screened to expand the scope of search. Our search was limited to publications in English.
Studies were included if they satisfied all the following requirements according to the PICOS criteria [12] : (1) included patients with suspected or diagnosed PCa; (2) for index test, mpMRI of the prostate including all required sequences of T2WI, DWI, and DCE-MRI was performed and assessed with a PI-RADSv2 scoring system; (3) for comparison, a reference standard based on the histopathological examination of radical prostatectomy or biopsy was used; (4) results were reported in sufficient detail for the reconstruction of 2 × 2 tables and determination of sensitivity and specificity at specified cutoff values for evaluating the diagnostic performance of PI-RADSv2; and (5) studies had to be original articles.
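Criterion (4) can be made concrete with a minimal Python sketch of how sensitivity and specificity follow from a reconstructed 2 × 2 table at a given PI-RADSv2 cutoff. The class name `TwoByTwo` and the counts are hypothetical and chosen only for illustration; they are not data from any included study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Index test (positive if PI-RADSv2 score >= cutoff) vs reference standard."""
    tp: int  # test positive, PCa on the reference standard
    fp: int  # test positive, no PCa
    fn: int  # test negative, PCa
    tn: int  # test negative, no PCa

    @property
    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)


# Hypothetical counts for illustration only (not from any included study).
table = TwoByTwo(tp=89, fp=27, fn=11, tn=73)
print(f"sensitivity={table.sensitivity:.2f}, specificity={table.specificity:.2f}")
# prints: sensitivity=0.89, specificity=0.73
```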
Studies were excluded if any of the following criteria were met: (1) studies involving <10 patients; (2) review articles, guidelines, consensus statements, letters, editorials, and conference abstracts; (3) studies using only PI-RADSv1 for the evaluation of mpMRI of the prostate; (4) studies focusing on topics other than using the PI-RADSv2 system for diagnosing PCa (ie, staging and prediction of biochemical recurrence); and (5) studies with overlapping patient population.
Two reviewers (S.W. and C.H.S., with 3 yr of experience in performing systematic reviews and meta-analyses) independently evaluated the eligibility of the selected studies from the literature. Disagreements, if present between the two reviewers, were resolved by consensus via discussion with a third reviewer (S.Y.K.).
We extracted the following data regarding study design and results from the selected studies using a standardized form:
Study characteristics—authors, year of publication, country and institution of origin, duration of patient recruitment, and study design (prospective vs retrospective and consecutive or not)
Demographic and clinical characteristics—sample size, number of patients with PCa, patient age, prostate-specific antigen (PSA) level and Gleason score, number of previous biopsies, and PCa diagnosis prior to mpMRI
Technical characteristics of mpMRI—scanner model and manufacturer, magnetic field strength (1.5 vs 3 T), coil type (endorectal vs pelvic phased array), and specific sequences used (T2WI, DWI, DCE, or MR spectroscopy)
Interpretation of mpMRI—number of reviewers and experience in prostate mpMRI, independent or consensus reading, and blinding to clinicopathological information
Reference standard—type of reference standard (radical prostatectomy, targeted biopsy, or systematic biopsy), interval between MRI and pathology, outcomes assessed (any PCa vs csPCa), definition of csPCa (studies assessing “clinically significant”, “aggressive”, or “high-grade” PCa were all considered to assess csPCa; however, only studies that used the definition provided by the PI-RADSv2 guideline [Gleason score ≥7 (including 3 + 4), volume ≥0.5 ml, or extraprostatic extension] were considered to have no concern for applicability), separate analysis for the PZ and TZ, and type of analysis (per patient vs per lesion)
Diagnostic performance of PI-RADSv2 including criteria or cutoff values (in case of multiple readers, the results of the most experienced reader were extracted for this meta-analysis)
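Because applicability was judged against the PI-RADSv2 definition of csPCa, the same tumor can be counted differently depending on the definition a study used. The following minimal Python sketch contrasts the PI-RADSv2 definition with a Gleason-only definition (Gleason score ≥7) used by several included studies; the function names and input fields (primary and secondary Gleason patterns, tumor volume in ml, and an extraprostatic extension flag) are hypothetical.

```python
def is_cspca_pirads_v2(gleason_primary: int, gleason_secondary: int,
                       volume_ml: float, extraprostatic_extension: bool) -> bool:
    """csPCa per the PI-RADSv2 guideline: Gleason score >= 7 (including 3 + 4),
    and/or tumor volume >= 0.5 ml, and/or extraprostatic extension."""
    gleason_score = gleason_primary + gleason_secondary
    return gleason_score >= 7 or volume_ml >= 0.5 or extraprostatic_extension


def is_cspca_gleason_only(gleason_primary: int, gleason_secondary: int) -> bool:
    """Narrower definition used by several included studies: Gleason score >= 7."""
    return gleason_primary + gleason_secondary >= 7


# Hypothetical Gleason 3 + 3 tumor of 0.8 ml without extraprostatic extension.
print(is_cspca_pirads_v2(3, 3, volume_ml=0.8, extraprostatic_extension=False))  # True
print(is_cspca_gleason_only(3, 3))  # False
```

Under these assumptions, a small-volume Gleason 3 + 3 tumor is negative under both definitions, whereas a larger 3 + 3 tumor is counted as csPCa only under the PI-RADSv2 definition.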
The methodological quality of the included studies was assessed using tailored questionnaires and criteria provided by Quality Assessment of Diagnostic Accuracy Studies-2 [13] . Data extraction and quality assessment were performed independently by two reviewers (S.W. and C.H.S.). All disagreements were resolved by consensus through discussion with the third reviewer (S.Y.K.).
The diagnostic performance of PI-RADSv2 for the detection of PCa was the primary outcome for this meta-analysis. In addition, a comparison between the diagnostic performance of PI-RADSv2 and that of PI-RADSv1 using studies that reported head-to-head comparison data of the two PI-RADS versions was considered a secondary outcome.
Pooled estimates of sensitivity and specificity were calculated using hierarchical logistic regression modeling including bivariate modeling and hierarchical summary receiver operating characteristic (HSROC) modeling [14] . For graphical presentation of the results, an HSROC curve with 95% confidence region and prediction region was plotted. Publication bias was evaluated using the Deeks’ funnel plot, and statistical significance was tested with the Deeks’ asymmetry test [15] .
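Deeks' asymmetry test, as commonly described, regresses each study's log diagnostic odds ratio (lnDOR) on 1/√ESS, weighted by the effective sample size ESS = 4·n₁·n₂/(n₁ + n₂), where n₁ and n₂ are the numbers of patients with and without the target condition; a slope significantly different from zero suggests funnel plot asymmetry. The Python sketch below is a simplified illustration and not the `midas` implementation used in this study: it assumes a 0.5 continuity correction in every cell and returns only the slope and its t statistic, which would then be referred to a t distribution with k − 2 degrees of freedom (k = number of studies). The function name `deeks_slope` is hypothetical.

```python
import math
from typing import Sequence, Tuple

Study = Tuple[int, int, int, int]  # (tp, fp, fn, tn)


def deeks_slope(studies: Sequence[Study]) -> Tuple[float, float]:
    """ESS-weighted regression of lnDOR on 1/sqrt(ESS); returns (slope, t statistic)."""
    xs, ys, ws = [], [], []
    for tp, fp, fn, tn in studies:
        n_diseased = tp + fn
        n_healthy = fp + tn
        ess = 4 * n_diseased * n_healthy / (n_diseased + n_healthy)
        # Assumed 0.5 continuity correction keeps lnDOR finite when a cell is zero.
        ln_dor = math.log(((tp + 0.5) * (tn + 0.5)) / ((fp + 0.5) * (fn + 0.5)))
        xs.append(1 / math.sqrt(ess))
        ys.append(ln_dor)
        ws.append(ess)

    k = len(xs)
    if k < 3:
        raise ValueError("need at least three studies")
    sw = sum(ws)
    x_bar = sum(w * x for w, x in zip(ws, xs)) / sw
    y_bar = sum(w * y for w, y in zip(ws, ys)) / sw
    sxx = sum(w * (x - x_bar) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - x_bar) * (y - y_bar) for w, x, y in zip(ws, xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # Weighted residual variance with k - 2 degrees of freedom gives the slope's SE.
    rss = sum(w * (y - intercept - slope * x) ** 2 for w, x, y in zip(ws, xs, ys))
    se_slope = math.sqrt(rss / (k - 2) / sxx)
    return slope, slope / se_slope


# Hypothetical (tp, fp, fn, tn) counts for four studies, for illustration only.
example = [(18, 2, 2, 18), (40, 15, 10, 35), (70, 20, 30, 80), (150, 60, 50, 140)]
slope, t_stat = deeks_slope(example)
print(f"slope={slope:.2f}, t={t_stat:.2f} (df={len(example) - 2})")
```

In this invented example the smallest study has the highest lnDOR, so the slope is positive, the pattern that an asymmetry test is designed to flag.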
We performed meta-regression analyses to investigate the cause of heterogeneity. The following covariates were considered for the bivariate model: (1) proportion of patients with PCa (>50% vs ≤50%), (2) magnet strength of MRI (3 vs 1.5 T), (3) use of endorectal coil, (4) cutoff value (≥4 vs ≥3), (5) reference standard (radical prostatectomy vs biopsy), and (6) type of analysis (per patient vs per lesion). In addition, multiple subgroup analyses were performed for cutoff value, outcome, and previous biopsy history to assess various clinical settings: (1) a cutoff value of ≥4 for all studies, (2) a cutoff value of ≥3 for all studies, (3) a cutoff value of ≥4 for determining any PCa, (4) a cutoff value of ≥3 for determining any PCa, (5) a cutoff value of ≥4 for determining csPCa, (6) a cutoff value of ≥3 for determining csPCa, (7) a cutoff value of ≥4 in studies using per-patient analysis, (8) a cutoff value of ≥4 in studies using per-lesion analysis, (9) studies analyzing PZ PCa, (10) studies analyzing TZ PCa, (11) patients without previous biopsies, and (12) patients with previous biopsies. The “metandi” and “midas” modules in Stata 10.0 (StataCorp LP, College Station, TX, USA) and “mada” package in R software version 3.2.1 (R Foundation for Statistical Computing, Vienna, Austria) were used for statistical analyses, with p < 0.05 signifying statistical significance.
A systematic literature search initially identified 287 articles. After removing 46 duplicates, screening of the 241 titles and abstracts yielded 105 potentially eligible articles. Full-text reviews were performed, and 84 studies were excluded for the following reasons: not in the field of interest (n = 80, including 68 studies that used only PI-RADSv1), insufficient data to reconstruct 2 × 2 tables (n = 2), and shared study population with other studies (n = 2). Ultimately, 21 original articles including a total of 3857 patients assessing the diagnostic performance of PI-RADSv2 were included in the meta-analysis [16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36]. No additional studies were identified via screening the bibliographies of these 21 studies. Among them, 15 studies including 3099 patients dealt with PI-RADSv2 alone, whereas six studies including 758 patients provided a head-to-head comparison between PI-RADSv1 and PI-RADSv2 [16 20 21 28 32 33]. The detailed study selection process is described in Fig. 1.
Patient characteristics are shown in Table 1 . The size of the study population ranged from 49 to 456 patients, with the percentage of those with PCa ranging from 37% to 100%. The patients had a median age of 62–69.6 yr, median PSA of 3.97–15 ng/ml, and a Gleason score ranging from 5 to 10. Patients had already been diagnosed with PCa prior to MRI in all or some of the study populations in seven studies [16 17 22 26 27 32 35] . Biopsy was performed before MRI in seven studies [16 21 22 23 26 27 35] , all patients were biopsy-naïve in three studies [18 25 34] , both patient types were included in three studies [28 32 33] , and data regarding previous biopsy were not reported in the remaining eight studies.
First author (year) | Country | Institution | Duration of patient recruitment | Patients (n) | Patients with PCa (n) | Age, median (yr) | Age, range (yr) | PSA, median (ng/ml) | PSA, range (ng/ml) | Gleason score, median | Gleason score, range | No. of previous biopsies | PCa diagnosis before MRI |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Auer (2016) [16] | Austria | Medical University of Innsbruck, Medizinische Hochschule Hannover | NR | 50 | NR | 63 a | NR | 7.3 a | NR | 7 (3 + 4) | 6–10 | ≥1 | Yes (all) |
Baldisserotto (2016) [17] | Brazil | Pontifícia Universidade Católica do Rio Grande do Sul | 2013.6–2015.6 | 54 | 33 | 65.9 a | 53–81 | 8.4 a | 3–31 | 7 (3 + 4) | 6–9 | NR | Yes (some) |
De Visschere (2017) [18] | Belgium | Ghent University Hospital | 2011.5–2014.12 | 245 | 144 | 66 | 44–85 | 9 | 1.4–935.5 | 7 (3 + 4) | ≥6 | 0 | No |
El-Samei (2016) [19] | Egypt | Al-Azher University, El-Minia University | 2014.5–2015.10 | 55 | 38 | 62 a | 51–79 | NR | NR | 7 | 6–10 | NR | No |
Feng (2016) [20] | China | Tongji Hospital | 2013.6–2015.7 | 401 | 150 | 64.4 a | 34–88 | 10.7 | 0.2–1763 | 7 (4 + 3) | ≤6–≥8 | NR | NR |
Kasel-Seibert (2016) [21] | Germany | University Hospital Jena | 2013.7–2015.3 | 82 | 31 | 65 | 48–81 | 13 | 1–111 | 7 (3 + 4) | 6–9 | 1–5 | No |
Lin (2016) [22] | Brazil, Taiwan | Ribeirão Preto School of Medicine, China Medical University, Ribeirão Preto School of Medicine | 2011.5–2014.6 | 49 | 49 | 63 | 46–73 | 13.27 | 1.8–41.4 | | | ≥1 | Yes (all) |
Martorana (2016) [23] | Italy | University of Modena and Reggio Emilia, Perugia University | 2014.1–2016.2 | 157 | 79 | 65 | 47–79 b | 10.7 | 1–75 | 6 | 6–8 | 1–3 | No |
Mertan (2016) [24] | USA | National Cancer Institute | 2015.3–2015.9 | 62 | 38 | 65.5 | 50.3–76.6 | 7.1 | 0.5–863 | 7 (3 + 4) | 6–9 | NR | NR |
Muller (2015) [25] | USA, the Netherlands | National Cancer Institute, AMC University Hospital, Edward Hébert School of Medicine | 2011.12–2014.5 | 94 | 94 | 62 | 37–79 | 8.51 | 0.7–51.1 | 7 (3 + 4) | 6–10 | 0 | No |
Park (2016) [26] | South Korea | Samsung Medical Center | 2012.1–2014.12 | 456 | 456 | 65 | 46–81 | 3.97 | 3.8–4.2 | NR | ≥6 | ≥1 | Yes (all) |
Park (2016) [27] | South Korea | Yonsei University College of Medicine | 2012.1–2013.3 | 425 | 425 | NR | NR | NR | 1.4–156.9 | 5–6 | 5–10 | ≥1 | Yes (all) |
Polanec (2016) [28] | Austria | Medical University of Vienna, Confraternität Vienna | 2011.6–2015.9 | 65 | NR | 65.3 a | 62.3–87.4 | 10.8 a | 4.2–74.5 | 7 (4 + 3) | 6–9 | ≥0 | No |
Rastinehad (2015) [29] | USA | Icahn School of Medicine at Mount Sinai, Fox Chase Cancer Center, National Institutes of Health, Hofstra North Shore LIJ School of Medicine | 2012.2–2014.11 | 312 | 202 | 65.1 | 60.3–70.3 b | 7.3 | 5.0–11.4 b | NR | NR | NR | No |
Rosenkrantz (2016) [30] | USA | NYU Langone Medical Center | 2013.9–2015.2 | 343 | 134 | 64 | NR | 5.8 | NR | 7 (3 + 4) | 6–9 | NR | NR |
Stanzione (2016) [31] | Italy | University “Federico II”, Ospedale S. Maria delle Grazie | NR | 82 | 34 | 65 a | 43–84 | 8.8 a | NR | 7 (3 + 4) | 6–9 | NR | No |
Tan (2016) [32] | USA, Taiwan | UCLA, Cathay General Hospital, Hoag Hospital | 2013.3–2016.12 | 106 | 63 | 66.5 | 43–79 | 7.9 | 5.6–10.6 | 7 (3 + 4) | 6–10 | ≥0 | Yes (some) |
Tewes (2016) [33] | Germany | Hannover Medical School, Klinikum der Region Hannover | 2012.12–2014.12 | 54 | 31 | 69.6 a | NR | 8.7 a | NR | 6 | 6–9 | ≥0 | No |
Washino (2017) [34] | Japan | Jichi Medical University Saitama Medical Center | 2010.6–2014.4 | 288 | 159 | 69 | 64–74 b | 7.5 | 5.5–11 b | NR | NR | 0 | No |
Woo (2016) [35] | South Korea | Seoul National University College of Medicine | 2011.1–2013.12 | 105 | 105 | 69 | 49–79 | 8.22 | 0.9–44.2 | 7 | 6–7 | ≥1 | Yes (all) |
Zhao (2016) [36] | China | Peking University First Hospital | 2010.11–2013.12 | 372 | 185 | 68.5 a | NR | 15 a | NR | 7 | 6–10 | NR | No |
Characteristics of the studies are summarized in Table 2. MRI was performed using 3-T scanners in 16 studies [16 17 18 20 24 25 26 27 28 29 30 31 32 33 35 36], 1.5-T scanners in four studies [19 21 22 23], and either 3 or 1.5 T in one study [34]. Endorectal coils were used in seven studies [19 21 22 23 24 25 29]. In all studies, the mpMRI protocol consisted of T2WI, DWI, and DCE-MRI. The reference standard was radical prostatectomy in five studies [16 22 26 27 35], a combination of systematic and targeted biopsies in seven studies [17 20 23 25 29 34 36], and targeted biopsy only in seven studies [19 21 24 28 30 32 33]; in two studies, the reference standard was not consistent throughout the study population [18 31]. PI-RADSv2 scoring was performed by one to five radiologists, either in consensus or independently. The level of experience of the radiologists was heterogeneous, with the most experienced reader in each study having 4–22 yr of experience in prostate MRI. In most studies, the readers were blinded; however, in three studies, the radiologists were aware that the patients had biopsy-proven PCa [22 26 35], and five studies were not explicit regarding blinding [18 19 30 32 36]. In the majority of the studies, the interval between MRI and the reference standard was less than 6 mo; however, the interval was not reported in 10 studies [16 17 18 24 27 28 29 30 32 33]. PCa was separately assessed according to zonal anatomy in seven studies [20 24 25 28 30 31 33]. However, in one study [31], only PCa in the PZ could be evaluated, as no detailed TZ data were provided in the article and an attempt to obtain further information from the authors was unsuccessful. Regarding the outcome assessed, eight studies evaluated any cancer [17 19 20 23 25 28 31 33], eight evaluated clinically significant cancer [18 22 26 27 30 34 35 36], and five evaluated both [16 21 24 29 32].
With regard to cutoff values, 13 studies used ≥4 [16 18 19 20 21 24 25 26 27 28 30 31 35], four studies used ≥3 [32 33 34 36], and four studies used both [17 22 23 29]. The location of PCa was reported separately for the PZ and TZ in six studies [20 24 25 28 30 33].
First author (year) | Design | Consecutive enrollment | Reference standard | MRI–reference standard interval | No. of readers | Experience (yr) | Blinding | Magnet strength (T) | Vendor | Model | Endorectal coil | PI-RADSv2 application | Cutoff values | Localization | Type of analysis | Outcome assessed | Definition of csPCa |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Auer (2016) [16] | Retrospective | Yes | RP | NR | 1 | >5 | Yes | 3 | Siemens | Skyra | No | Strict | 4 | Whole | Lesion | Any + csPCa | GS ≥7 (4 + 3) |
Baldisserotto (2016) [17] | Retrospective | Yes | STRUSGB + targeted MRI–TRUS biopsy (cognitive) or RP | NR | 2 Independent | >10/1 | Yes | 3 | GE | Signa HDxt | No | Strict | 3, 4 | Whole | Patient | Any | |
De Visschere (2017) [18] | Retrospective | NR | STRUSGB or RP | NR | NR | NR | NR | 3 | Siemens | Trio | No | Strict | 4 | Whole | Patient | csPCa | GS ≥7, ≥0.5 cc, or EPE |
El-Samei (2016) [19] | NR | NR | Targeted MRI–TRUS biopsy | ≤2 wk | NR | NR | NR | 1.5 | Philips | Gyroscan | Yes | Strict | 4 | Whole | Lesion | Any | |
Feng (2016) [20] | Retrospective | Yes | STRUSGB + targeted MRI–TRUS biopsy (cognitive) | 1–35 d | 2 Consensus | 5/4 | Yes | 3 | Siemens | Skyra | No | Strict | 4 | PZ, TZ | Patient | Any | |
Kasel-Seibert (2016) [21] | Retrospective | NR | Targeted MRGB | ≤7 d | 2 Independent | 10/<1 | Yes | 1.5 | Siemens | Avanto | Yes | Strict | 4 | Whole | Lesion | Any + csPCa | GS ≥7 |
Lin (2016) [22] | Retrospective | NR | RP | ≤6 mo | 2 Independent | 12/5 | Yes a | 1.5 | Philips | Achieva | Yes | Strict | 3, 4 | Whole | Lesion | csPCa | GS ≥7 and >0.5 cc |
Martorana (2016) [23] | Retrospective | Yes | TTB + targeted MRI–TRUS biopsy | 30–78 | 2 Consensus | 4/4 | Yes | 1.5 | Philips | Achieva | Yes | Strict | 3, 4 | Whole | Lesion | Any | GS ≥7, >0.5 cc, or EPE |
Mertan (2016) [24] | Prospective | Yes | STRUSGB + targeted MRI–TRUS biopsy | NR | 1 | >8 | Yes | 3 | Philips | Achieva | Yes | Strict | 4 | PZ, TZ | Lesion | Any + csPCa | GS ≥7 |
Muller (2015) [25] | Retrospective | Yes | STRUSGB + targeted MRI–TRUS biopsy | ≤6 wk | 5 Independent | 12/7/1/1/0.5 | Yes | 3 | Philips | Achieva | Yes | Strict | 4 | PZ, TZ | Lesion | Any | GS ≥7 (4 + 3) |
Park (2016) [26] | Retrospective | Yes | RP | 21–48 | 2 Independent | 14/3 | Yes a | 3 | Philips | Achieva | No | Strict | 4 | Whole | Patient | csPCa | GS ≥7, ≥0.5 cc, or EPE |
Park (2016) [27] | Retrospective | NR | RP | NR | 2 Independent | 9/4 | Yes | 3 | GE, Philips, Siemens | Discovery MR750, Achieva, Trio | No | Strict | 4 | Whole | Patient | csPCa | GS ≥7, ≥0.5 cc, or EPE |
Polanec (2016) [28] | Retrospective | Yes | Targeted MRGB | NR | 2 Independent | NR/NR | Yes | 3 | Siemens | Trio | No | Strict | 4 | PZ, TZ | Lesion | Any | |
Rastinehad (2015) [29] | Prospective | NR | STRUSGB + targeted MRI–TRUS biopsy | NR | 3 Consensus | NR | Yes | 3 | Siemens | Verio | Yes | Calculated b | 3, 4 | Whole | Patient | Any + csPCa | Epstein's criteria c for systematic and GS ≥7 or >0.5 cc for targeted biopsy |
Rosenkrantz (2016) [30] | Retrospective | NR | Targeted MRI–TRUS biopsy | NR | 2 Independent | 8/3 | NR | 3 | Siemens | Biograph mMR/Prisma/Skyra/Trio | No | Strict | 4 | PZ, TZ | Lesion | csPCa | GS ≥7 |
Stanzione (2016) [31] | Prospective | NR | STRUSGB (±targeted TRUSGB) | 20–30 d | 2 Independent | 14/10 | Yes | 3 | Siemens | Trio | No | Strict | 4 | PZ | Patient | Any | |
Tan (2016) [32] | Retrospective | Yes | Targeted MRGB | NR | 3 Consensus | 16/10 | NR | 3 | Siemens | Skyra, Trio, Verio | No | Strict | 3 | Whole | Lesion | Any + csPCa | GS ≥7 |
Tewes (2016) [33] | Retrospective | Yes | Targeted MRGB | NR | 2 Independent | 5/2 | Yes | 3 | Siemens | Skyra | No | Strict | 3 d | PZ, TZ | Patient | Any | |
Washino (2017) [34] | Retrospective | NR | TTB + targeted MRI–TRUS biopsy (cognitive) | 0.5–1.6 mo | 1 | 14 | Yes | 1.5 or 3 | Toshiba | Excelart Vantage/Vantage Titan 3T | No | Strict | 3 | Whole | Patient | csPCa | GS ≥7 or maximum core length ≥4 mm |
Woo (2016) [35] | Retrospective | NR | RP | ≤6 mo | 2 Consensus | 22/10 | Yes a | 3 | Philips, Siemens | Ingenia, Verio/Trio | No | Strict | 4 | Whole | Patient | csPCa | GS ≥7 |
Zhao (2016) [36] | Retrospective | NR | STRUSGB + targeted MRI–TRUS biopsy | ≤3 mo | 2 Independent | NR | NR | 3 | NR | NR | No | Strict | 3 | Whole | Patient | csPCa | GS ≥7 |
a Blinded but aware that patients had biopsy-proven PCa.
b PI-RADSv2 scores were generated from reports based on PI-RADSv1 and simplified qualitative system.
c Epstein's criteria = Gleason pattern ≥4, or Gleason 3 + 3 disease with core length ≥50% and/or >2 cores positive.
d For transition zone, cutoff value = 4.
csPCa = clinically significant prostate cancer; GS = Gleason score; EPE = extraprostatic extension; MRGB = magnetic resonance imaging-guided biopsy; MRI = magnetic resonance imaging; MRI–TRUS = fusion of magnetic resonance imaging and transrectal ultrasound images; NR = not reported; PCa = prostate cancer; PI-RADSv2 = Prostate Imaging Reporting and Data System version 2; PZ = peripheral zone; RP = radical prostatectomy; STRUSGB = systematic transrectal ultrasound-guided biopsy; TTB = transperineal template biopsy; TZ = transition zone.
Overall, the quality of the studies was not considered high, mainly because of the patient selection domain (Fig. 2). Regarding the patient selection domain, there was generally a high risk of bias, as all but four of the studies were retrospective [16 17 18 20 21 22 23 25 26 27 28 30 32 33 34 35 36]. Seven studies were considered to have high concern for applicability, as all or some of the patients had a pathological diagnosis of PCa prior to MRI [16 17 22 26 27 32 35]. Regarding the index test domain, there was a high risk of bias in nine studies. In three of these nine studies, reviewers were aware that patients had biopsy-proven PCa [22 26 35]. In the other six studies, the cutoff value for determining PCa was not specified prior to interpretation [16 20 29 31 33 34]. Only one study had concern for applicability, as PI-RADSv2 scores were indirectly generated from existing clinical radiological reports based on PI-RADSv1 or an in-house scoring system [29]. Regarding the reference standard domain, eight studies had a high risk of bias. Seven used either systematic biopsy alone or targeted biopsy alone [18 19 21 28 30 32 33]; in one study, targeted biopsy was performed on lesions that were suspicious on ultrasonography rather than on MRI [31].
Studies in which radical prostatectomy or systematic plus targeted biopsy (MRI guided, MRI–transrectal ultrasound fusion, or cognitive) was used as the reference standard were considered to have a low risk of bias. In 10 studies, the definition of clinically significant cancer did not conform to that described in the PI-RADSv2 guidelines, and these studies therefore showed high concern for applicability [16 21 22 24 29 30 32 34 35 36]. Regarding the flow and timing domain, two studies had a high risk of bias, as patients did not receive the same reference standard [17 18].
The sensitivity and specificity of individual studies were 73–100% and 8–100%, respectively. The Q test revealed substantial heterogeneity (p < 0.001). The Higgins I² statistic demonstrated substantial heterogeneity in terms of sensitivity (I² = 85.55%) and considerable heterogeneity in terms of specificity (I² = 95.30%). The coupled forest plot of sensitivity and specificity demonstrated the absence of a threshold effect (Fig. 3). The Spearman correlation coefficient between the sensitivity and the false-positive rate was 0.45 (95% confidence interval [CI] 0.023–0.738), also indicating the lack of a threshold effect.
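The heterogeneity and threshold-effect statistics above can be sketched in Python as follows. This is an assumption-laden illustration rather than the computation performed by the Stata and R modules used here: it computes Cochran's Q and Higgins' I² = max(0, (Q − df)/Q) × 100% for logit-transformed sensitivities with inverse-variance weights and a 0.5 continuity correction, and Spearman's rank correlation (average ranks for ties), which would be applied to the study-level sensitivities and false-positive rates (1 − specificity). All function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def cochran_q_i2(events: Sequence[int], totals: Sequence[int]) -> Tuple[float, float]:
    """Cochran's Q and Higgins' I^2 (%) for logit-transformed proportions."""
    logits, weights = [], []
    for e, n in zip(events, totals):
        a, b = e + 0.5, n - e + 0.5  # continuity-corrected counts
        logits.append(math.log(a / b))
        weights.append(1 / (1 / a + 1 / b))  # inverse of the logit variance
    pooled = sum(w * y for w, y in zip(weights, logits)) / sum(weights)
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, logits))
    df = len(logits) - 1
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return q, i2


def _ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho as the Pearson correlation of the ranks."""
    rx, ry = _ranks(x), _ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


# Hypothetical per-study values, for illustration only.
sens = [0.95, 0.88, 0.91, 0.80]
fpr = [0.40, 0.20, 0.35, 0.10]
print(f"Spearman rho(sensitivity, FPR) = {spearman(sens, fpr):.2f}")
```

A positive rank correlation between sensitivity and the false-positive rate is the pattern usually read as a threshold effect, because studies that call more findings positive gain sensitivity at the cost of specificity.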
For all 21 studies combined, the pooled sensitivity was 0.89 (95% CI 0.86–0.92) with specificity of 0.73 (95% CI 0.60–0.83; Fig. 3 ). In the HSROC curve, there was a large difference between the 95% confidence region and the 95% prediction region, thus indicating heterogeneity between the studies ( Fig. 4 ). The area under the HSROC curve was 0.91 (95% CI 0.88–0.93). According to the Deeks’ funnel plot, the likelihood of publication bias was low, with a p value of 0.75 for the slope coefficient ( Fig. 5 ).
For the six studies that provided a head-to-head comparison between PI-RADSv1 and PI-RADSv2, PI-RADSv2 demonstrated higher pooled sensitivity of 0.95 (95% CI 0.85–0.98) compared with 0.88 (95% CI 0.80–0.93) for PI-RADSv1 ( p = 0.04). However, the pooled specificity did not show a significant difference between the two versions of PI-RADS: 0.73 (95% CI 0.47–0.89) for v2 and 0.75 (95% CI 0.36–0.94) for v1 ( p = 0.90).
As we found considerable heterogeneity among the included studies, meta-regression analyses were performed (Supplementary Table 1). Among several potential variables, the proportion of patients with PCa, magnetic field strength, and reference standard were significant factors affecting heterogeneity (p < 0.01 for all). However, among these three, only specificity according to the proportion of patients with PCa showed a difference that was both statistically significant and clinically meaningful: 0.65 (95% CI 0.52–0.78) in studies with ≤50% of patients with PCa versus 0.86 (95% CI 0.75–0.97) in studies with >50% of patients with PCa. Otherwise, no clinically meaningful differences were seen: for magnet strength (3 vs 1.5 T), sensitivity of 0.90 (95% CI 0.86–0.94) versus 0.89 (95% CI 0.81–0.97, p = 0.03) and specificity of 0.73 (95% CI 0.59–0.86) versus 0.72 (95% CI 0.44–1.00, p = 0.81); for reference standard (radical prostatectomy vs biopsy), sensitivity of 0.89 (95% CI 0.83–0.95) versus 0.91 (95% CI 0.88–0.94, p < 0.01) and specificity of 0.65 (95% CI 0.37–0.94) versus 0.73 (95% CI 0.58–0.87, p = 0.48). Other variables, including the cutoff value, use of endorectal coil, and type of analysis (per patient vs per lesion), were not significant factors (p = 0.32–0.70).
As four studies used both ≥3 and ≥4 as cutoff values [17 22 23 29] and five studies assessed both any cancer and clinically significant cancer as outcomes [16 21 24 29 32], multiple subgroup analyses were performed to assess various clinical settings (Supplementary Table 2). Regarding cutoff values, the pooled sensitivity was 0.89 (95% CI 0.84–0.92) with specificity of 0.74 (95% CI 0.58–0.85) for 17 studies using ≥4 [16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 35], whereas these were 0.95 (95% CI 0.89–0.97) and 0.47 (95% CI 0.21–0.74) in eight studies using ≥3 [17 22 23 29 32 33 34 36]. When we stratified studies according to the outcome assessed, the results were as follows: (1) cutoff of ≥4 for determining any PCa, sensitivity of 0.89 (95% CI 0.83–0.93) with specificity of 0.80 (95% CI 0.62–0.90); (2) cutoff of ≥3 for determining any PCa, sensitivity of 0.96 (95% CI 0.93–0.98) with specificity of 0.49 (95% CI 0.29–0.70); (3) for determining csPCa regardless of cutoff values, sensitivity of 0.89 (95% CI 0.84–0.92) with specificity of 0.64 (95% CI 0.46–0.78); (4) cutoff of ≥4 for determining csPCa, sensitivity of 0.90 (95% CI 0.85–0.94) with specificity of 0.62 (95% CI 0.45–0.77); and (5) cutoff of ≥3 for determining csPCa, sensitivity of 0.96 (95% CI 0.87–0.99) with specificity of 0.29 (95% CI 0.05–0.77). When the studies using a cutoff value of ≥4 were assessed separately according to the type of analysis, per-patient analysis in eight studies [17 18 20 26 27 29 31 35] yielded pooled sensitivity of 0.89 (95% CI 0.81–0.93) with specificity of 0.76 (95% CI 0.60–0.88), whereas per-lesion analysis in nine studies [16 19 21 22 23 24 25 28 30] yielded pooled sensitivity of 0.87 (95% CI 0.83–0.91) with specificity of 0.70 (95% CI 0.44–0.88).
Based on the localization of PCa, the pooled sensitivity was 0.93 (95% CI 0.87–0.96) with specificity of 0.68 (95% CI 0.43–0.86) in seven studies analyzing PZ cancers [20 24 25 28 30 31 33]. For TZ cancers, analyzed in the same studies except that by Stanzione et al [31], the pooled sensitivity and specificity were 0.88 (95% CI 0.77–0.94) and 0.75 (95% CI 0.59–0.86), respectively.
Studies including patients without previous biopsies yielded sensitivity of 0.82 (95% CI 0.72–0.90) and specificity of 0.75 (95% CI 0.65–0.83), whereas the diagnostic performance values were 0.87 (95% CI 0.80–0.92) and 0.71 (95% CI 0.42–0.89) in studies including patients with a history of previous biopsy.
In our meta-analysis, we assessed the diagnostic accuracy of PI-RADSv2 for detecting PCa. The pooled sensitivity and specificity of all 21 studies were 0.89 (95% CI 0.86–0.92) and 0.73 (95% CI 0.60–0.83), respectively. When comparing our data with the only two existing meta-analyses using mpMRI for detecting PCa, a trend toward higher sensitivity and lower specificity can be inferred. In the study by de Rooij et al [37] , which evaluated seven studies using a combination of T2WI, DWI, and DCE-MRI, the pooled sensitivity and specificity were 0.74 (95% CI 0.66–0.81) and 0.88 (95% CI 0.82–0.92), respectively. In a more recent meta-analysis by Hamoen et al [6] which analyzed 14 studies using PI-RADSv1, the pooled sensitivity and specificity were 0.78 (95% CI 0.70–0.84) and 0.79 (95% CI 0.68–0.86), respectively. However, the comparison between the three studies merely provided an indirect comparison. In order to address this issue, we separately assessed a subgroup of studies using both PI-RADSv1 and PI-RADSv2. In a head-to-head comparison between them, PI-RADSv2 demonstrated higher pooled sensitivity (0.95) compared with PI-RADSv1 (0.88, p = 0.04) without a statistically significant difference in specificity (0.73 vs 0.75, p = 0.90). This increase in sensitivity compared with its predecessor may imply that the revisions undertaken during the development of PI-RADSv2, including the introduction of dominant sequences according to zonal anatomy, limited contribution of DCE-MRI secondary to DWI and T2WI, and specific guidelines for deriving an integrated overall score, were, in fact, on the right track. Especially, we speculate that the use of dominant sequences, that is, DWI for the PZ and DCE-MRI for the TZ, may have been crucial for the improved sensitivity without a loss in specificity, as suggested by Baur et al [10] .
Considering that one of the main intentions behind PI-RADS was to standardize the reporting of mpMRI in order to decrease variability and promote widespread acceptance and implementation in daily practice, it was encouraging to find that nearly all (20 of 21) studies applied PI-RADSv2 strictly according to the published guidelines [11]. Only one study derived PI-RADSv2 scores from existing clinical radiology reports that were based on PI-RADSv1 or an in-house scoring system [29]. This is an improvement over prior studies using PI-RADSv1, in which investigators used varying methods to determine the overall score (an overall five-point score or the sum of the scores from each modality) [6]. Still, the cutoff value for detecting PCa needs further clarification. Among the studies included in our meta-analysis, cutoff values were predefined in only six studies, while the majority (15/21) were exploratory in nature, testing multiple criteria. With a cutoff value of ≥4, sensitivity (0.89) and specificity (0.74) were generally good, whereas a cutoff of ≥3 yielded excellent sensitivity (0.95) but poor specificity (0.47). These results may be taken into consideration when developing the next update of PI-RADS. For instance, a cutoff of ≥4 may be adequate for general use, whereas ≥3 could be reserved for situations in which a higher cancer detection rate is clinically required (eg, a persistently high PSA level despite a previously negative biopsy).
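The trade-off between the ≥3 and ≥4 cutoffs follows directly from how a five-point score is dichotomized. The sketch below uses a purely hypothetical distribution of PI-RADSv2 scores (not data from any included study) to show that lowering the cutoff moves score-3 cases into the test-positive group, raising sensitivity at the cost of specificity. The function name `sens_spec_at_cutoff` is illustrative only.

```python
def sens_spec_at_cutoff(cancer_counts: dict[int, int],
                        benign_counts: dict[int, int],
                        cutoff: int) -> tuple[float, float]:
    """Dichotomize PI-RADS scores: a score >= cutoff counts as test positive."""
    tp = sum(n for score, n in cancer_counts.items() if score >= cutoff)
    fn = sum(cancer_counts.values()) - tp
    fp = sum(n for score, n in benign_counts.items() if score >= cutoff)
    tn = sum(benign_counts.values()) - fp
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical number of patients per PI-RADSv2 score (1-5); illustrative only.
cancer = {1: 2, 2: 3, 3: 10, 4: 40, 5: 45}
benign = {1: 20, 2: 30, 3: 25, 4: 20, 5: 5}

for cutoff in (3, 4):
    sens, spec = sens_spec_at_cutoff(cancer, benign, cutoff)
    print(f">={cutoff}: sensitivity {sens:.2f}, specificity {spec:.2f}")
```

With these made-up counts, ≥3 gives a sensitivity of 0.95 and a specificity of 0.50, and ≥4 gives 0.85 and 0.75; the numbers only illustrate the direction of the trade-off, not the pooled estimates reported here.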
In the current study, subgroup analyses were performed to account for differences in outcome (any cancer vs clinically significant cancer). There was no significant difference between the two outcomes, irrespective of whether a cutoff of ≥3 or ≥4 was used. However, the definition of clinically significant cancer varied among the 13 studies that assessed it. Only three studies defined csPCa strictly according to the PI-RADSv2 guidelines (Gleason score ≥7 [including 3 + 4], volume ≥0.5 ml, or extraprostatic extension) [11]; most others used one or two of the three criteria. Restricting the analysis to these three studies might have produced more robust results; however, including all available studies was pragmatic and provides a general overview of the existing literature, as this is the first meta-analysis of studies on PI-RADSv2.
In this meta-analysis, we also examined technical aspects of MRI. Meta-regression analyses revealed that the use of an endorectal coil was not a statistically significant factor. Furthermore, although magnet strength (3 vs 1.5 T) showed a statistically significant difference, this difference did not appear to be clinically meaningful (sensitivity of 0.90 vs 0.89, respectively; p = 0.03). Although there has been debate over these two issues in the past, both 3 and 1.5 T are now well established, and the overall benefit of using an endorectal coil is not evident [38 39]. The PI-RADSv2 guidelines currently allow either option, and the results of our study provide additional evidence to support this.
Regarding the methods of analysis, there was significant heterogeneity in the reference standard and the type of analysis. Radical prostatectomy was the reference standard in five studies, while the remaining studies relied mainly on biopsy (systematic, targeted, or both). The possibility of PCa despite negative biopsy results in the latter group should be kept in mind. In addition, roughly half of the studies reported outcomes on a per-patient basis (n = 11) and the other half on a per-lesion basis (n = 10). Per-lesion analysis additionally takes into account the ability to localize disease; however, the type of analysis was not a significant factor in the meta-regression analysis.
Our meta-analysis had some limitations. Nearly all studies were retrospective, resulting in a high risk of bias for patient selection. Pooling data from predominantly retrospective studies may have inflated the pooled sensitivity [40]. In addition, a meta-analysis of only the three prospective studies was not only technically unfeasible but would also not have been representative of the existing literature on PI-RADSv2. Nevertheless, we used validated methods for the systematic review and reported the data according to standard reporting guidelines, including PRISMA and the Handbook for Diagnostic Test Accuracy Reviews published by the Cochrane Collaboration [12 41]. Another limitation is the considerable heterogeneity in our pooled analysis, which limits the general applicability of our summary estimates. To explore this heterogeneity, we performed meta-regression and multiple subgroup analyses, which identified the proportion of patients with PCa, the magnetic field strength, and the reference standard as significant factors. In particular, the reference standard included various methods, including radical prostatectomy and a combination of systematic and targeted biopsies (ie, MRI guided, MRI–transrectal ultrasound fusion, or cognitive). Furthermore, the use of various definitions of clinically significant cancer needs to be emphasized. Our meta-regression and subgroup analyses may explain some of the heterogeneity, but a portion remains unexplained. Another important limitation is the small number of studies available for the head-to-head comparison between PI-RADSv1 and PI-RADSv2. However, using the only six such studies published to date, we were able to demonstrate a statistically significant difference in sensitivity between the two versions.
PI-RADSv2 shows good performance for the detection of PCa with pooled sensitivity of 0.89 and specificity of 0.73. PI-RADSv2 has higher pooled sensitivity compared with PI-RADSv1 without significantly different specificity.
Author contributions: Sang Youn Kim had full access to all the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis.
Study concept and design: Woo, Suh, S.Y. Kim.
Acquisition of data: Woo, Suh, S.Y. Kim.
Analysis and interpretation of data: Woo, Suh, S.Y. Kim.
Drafting of the manuscript: Woo.
Critical revision of the manuscript for important intellectual content: Suh, S.Y. Kim, Cho, S.H. Kim.
Statistical analysis: Suh.
Obtaining funding: None.
Administrative, technical, or material support: None.
Supervision: S.Y. Kim, Cho, S.H. Kim.
Other: None.
Financial disclosures: Sang Youn Kim certifies that all conflicts of interest, including specific financial interests and relationships and affiliations relevant to the subject matter or materials discussed in the manuscript (eg, employment/affiliation, grants or funding, consultancies, honoraria, stock ownership or options, expert testimony, royalties, or patents filed, received, or pending), are the following: None.
Funding/Support and role of the sponsor: None.
Prostate cancer (PCa) is the second leading cause of cancer-related mortality in Western men [1]. With recent technological advances and growing availability, multiparametric magnetic resonance imaging (mpMRI) is increasingly being used to guide several aspects of PCa management, including detection, staging, and treatment planning [2]. Despite abundant evidence in the literature reporting high accuracy of mpMRI for PCa diagnosis, widespread acceptance has been hampered by several factors, including difficulty of interpretation, lack of standardized criteria for interpretation (ie, use of Likert scales based on the radiologist's subjective level of suspicion for PCa), and the resulting substantial inter-reader variability [3 4].
To bring standardization to the evaluation and reporting of prostate mpMRI, the European Society of Urogenital Radiology (ESUR) published a guideline termed Prostate Imaging Reporting and Data System (PI-RADS) in 2012 [5]. PI-RADS was generated based on expert consensus and provides a detailed scoring system for each MRI sequence (T2-weighted imaging [T2WI], diffusion-weighted imaging [DWI], dynamic contrast-enhanced MRI [DCE-MRI], and MR spectroscopy) that rates the likelihood of clinically significant PCa (csPCa). Several investigators have validated the accuracy and reproducibility of PI-RADS, and a recent meta-analysis reported pooled sensitivity and specificity of 0.78 and 0.79, respectively [6]. However, because there was no guideline for generating an overall score, different research groups used various approaches: some used the sum of the scores from each sequence (ranging from 3 to 15), whereas others used an overall score of 1–5 [7 8]. Furthermore, emerging data questioned the value of curve-type analysis of DCE-MRI [9]. In addition, investigators suggested that, rather than all sequences being weighted equally, some sequences may be more important in determining the likelihood of PCa (ie, DWI in the peripheral zone [PZ] and T2WI in the transition zone [TZ]) [10].
To address these issues, the ESUR and the American College of Radiology recently released the updated PI-RADS version 2 (PI-RADSv2) [11]. The main changes from PI-RADSv1 to PI-RADSv2 are as follows: (1) introduction of dominant sequences according to zonal anatomy (DWI for the PZ and T2WI for the TZ), (2) limiting the contribution of DCE-MRI to the presence or absence of early focal enhancement, and (3) generation of an overall score (1–5) integrating findings across all MRI sequences.
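The dominant-sequence logic in changes (1) and (2) can be expressed as a small decision function. The sketch below is a simplified reading of the PI-RADSv2 rules as they are commonly summarized: in the PZ, the DWI score decides, and a DWI score of 3 is upgraded to 4 when DCE-MRI shows early focal enhancement; in the TZ, the T2WI score decides, and a T2WI score of 3 is upgraded to 4 when the DWI score is 5. It is an assumption-laden illustration, not a substitute for the guideline [11]; it ignores, for example, how to score when DWI is non-diagnostic, and `pirads_v2_overall` is a hypothetical name.

```python
def pirads_v2_overall(zone: str, t2w: int, dwi: int, dce_positive: bool) -> int:
    """Simplified PI-RADSv2 overall assessment category (1-5) for one lesion.

    Hypothetical helper: omits guideline details such as scoring when DWI
    is non-diagnostic.
    """
    for name, score in (("t2w", t2w), ("dwi", dwi)):
        if not 1 <= score <= 5:
            raise ValueError(f"{name} score must be between 1 and 5, got {score}")
    if zone == "PZ":
        # DWI is the dominant sequence; DCE-MRI only upgrades a DWI score of 3.
        return 4 if (dwi == 3 and dce_positive) else dwi
    if zone == "TZ":
        # T2WI is dominant; a DWI score of 5 upgrades a T2WI score of 3.
        # DCE-MRI does not contribute in the TZ.
        return 4 if (t2w == 3 and dwi == 5) else t2w
    raise ValueError(f"zone must be 'PZ' or 'TZ', got {zone!r}")


print(pirads_v2_overall("PZ", t2w=2, dwi=3, dce_positive=True))  # 4
print(pirads_v2_overall("TZ", t2w=3, dwi=4, dce_positive=True))  # 3
```

The second call shows change (2) in practice: a positive DCE-MRI result leaves a TZ lesion unchanged, because DCE-MRI is consulted only for PZ lesions with a DWI score of 3.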
Since then, several studies assessing the value of PI-RADSv2 have been published. However, the diagnostic performance of this new scoring system has not been evaluated systematically. Therefore, the purpose of our study was to assess the diagnostic performance of PI-RADSv2 for the detection of PCa. In addition, we aimed to compare the diagnostic performance of PI-RADSv1 and PI-RADSv2 in studies available for head-to-head comparison.
This meta-analysis was performed and written according to the preferred reporting items for systematic reviews and meta-analyses (PRISMA) guidelines [12] .
A computerized search of MEDLINE and EMBASE up to December 7, 2016, was performed in order to identify studies evaluating the diagnostic performance of PI-RADSv2 for the detection of PCa. The search query combined synonyms for PCa, MRI, and PI-RADS as follows: (prostate cancer OR prostatic cancer OR prostate neoplasm OR prostatic neoplasm OR prostate tumor OR prostatic tumor OR prostate carcinoma OR prostatic carcinoma OR PCa) AND (magnetic resonance imaging OR MRI OR MR) AND (prostate imaging reporting and data system OR pi-rads OR pi rads OR pirads). Bibliographies of identified articles were also screened to expand the scope of search. Our search was limited to publications in English.
Studies were included if they satisfied all the following requirements according to the PICOS criteria [12] : (1) included patients with suspected or diagnosed PCa; (2) for index test, mpMRI of the prostate including all required sequences of T2WI, DWI, and DCE-MRI was performed and assessed with a PI-RADSv2 scoring system; (3) for comparison, a reference standard based on the histopathological examination of radical prostatectomy or biopsy was used; (4) results were reported in sufficient detail for the reconstruction of 2 × 2 tables and determination of sensitivity and specificity at specified cutoff values for evaluating the diagnostic performance of PI-RADSv2; and (5) studies had to be original articles.
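Inclusion criterion (4) requires enough detail to rebuild a 2 × 2 table. The sketch below shows how per-study sensitivity and specificity follow from such a table. The counts are hypothetical, and the Wilson score interval is our choice for illustration; it is not necessarily the interval used by the included studies or by the software named in this article.

```python
import math


def wilson_ci(successes, total, z=1.96):
    """Wilson score interval for a binomial proportion (about 95% for z=1.96)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    denom = 1 + z ** 2 / total
    centre = (p + z ** 2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z ** 2 / (4 * total ** 2)) / denom
    return centre - half, centre + half


def diagnostic_accuracy(tp, fp, fn, tn):
    """Sensitivity and specificity with Wilson intervals from one 2 x 2 table."""
    return {
        "sensitivity": tp / (tp + fn),          # true positives / all with cancer
        "sensitivity_ci": wilson_ci(tp, tp + fn),
        "specificity": tn / (tn + fp),          # true negatives / all without cancer
        "specificity_ci": wilson_ci(tn, tn + fp),
    }


# Hypothetical table, not taken from any included study.
print(diagnostic_accuracy(tp=89, fp=27, fn=11, tn=73))
```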
Studies were excluded if any of the following criteria were met: (1) studies involving <10 patients; (2) review articles, guidelines, consensus statements, letters, editorials, and conference abstracts; (3) studies using only PI-RADSv1 for the evaluation of mpMRI of the prostate; (4) studies focusing on topics other than using the PI-RADSv2 system for diagnosing PCa (eg, staging and prediction of biochemical recurrence); and (5) studies with overlapping patient populations.
Two reviewers (S.W. and C.H.S., with 3 yr of experience in performing systematic reviews and meta-analyses) independently evaluated the eligibility of the studies identified in the literature search. Disagreements between the two reviewers were resolved by consensus through discussion with a third reviewer (S.Y.K.).
We extracted the following data regarding study design and results from the selected studies using a standardized form:
Study characteristics—authors, year of publication, country and institution of origin, duration of patient recruitment, and study design (prospective vs retrospective and consecutive or not)
Demographic and clinical characteristics—sample size, number of patients with PCa, patient age, prostate-specific antigen (PSA) level and Gleason score, number of previous biopsies, and PCa diagnosis prior to mpMRI
Technical characteristics of mpMRI—scanner model and manufacturer, magnetic field strength (1.5 vs 3 T), coil type (endorectal vs pelvic phased array), and specific sequences used (T2WI, DWI, DCE, or MR spectroscopy)
Interpretation of mpMRI—number of reviewers and experience in prostate mpMRI, independent or consensus reading, and blinding to clinicopathological information
Reference standard—type of reference standard (radical prostatectomy, targeted biopsy, or systematic biopsy), interval between MRI and pathology, outcomes assessed (any PCa vs csPCa), definition of csPCa (studies assessing “clinically significant”, “aggressive”, or “high-grade” PCa were all considered to assess csPCa; however, only studies that used the definition provided by the PI-RADSv2 guideline [Gleason score ≥7 (including 3 + 4), volume ≥0.5 ml, or extraprostatic extension] were considered to have low concern for applicability), separate analysis for the PZ and TZ, and type of analysis (per patient vs per lesion)
Diagnostic performance of PI-RADSv2 including criteria or cutoff values (in case of multiple readers, the results of the most experienced reader were extracted for this meta-analysis)
The methodological quality of the included studies was assessed using tailored questionnaires and criteria provided by Quality Assessment of Diagnostic Accuracy Studies-2 [13] . Data extraction and quality assessment were performed independently by two reviewers (S.W. and C.H.S.). All disagreements were resolved by consensus through discussion with the third reviewer (S.Y.K.).
The diagnostic performance of PI-RADSv2 for the detection of PCa was the primary outcome for this meta-analysis. In addition, a comparison between the diagnostic performance of PI-RADSv2 and that of PI-RADSv1 using studies that reported head-to-head comparison data of the two PI-RADS versions was considered a secondary outcome.
Pooled estimates of sensitivity and specificity were calculated using hierarchical logistic regression modeling including bivariate modeling and hierarchical summary receiver operating characteristic (HSROC) modeling [14] . For graphical presentation of the results, an HSROC curve with 95% confidence region and prediction region was plotted. Publication bias was evaluated using the Deeks’ funnel plot, and statistical significance was tested with the Deeks’ asymmetry test [15] .
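Deeks' test regresses the log diagnostic odds ratio (DOR) of each study on 1/√ESS, where ESS is the effective sample size 4·n₁·n₂/(n₁ + n₂) (n₁ patients with cancer, n₂ without), weighting each study by its ESS; a slope that differs from zero suggests funnel-plot asymmetry. The sketch below implements that regression in plain Python under stated assumptions: 0.5 is added to every cell of every study for simplicity, and the p value is left to a t distribution with k − 2 degrees of freedom rather than computed. It is not the code of the Stata `midas` module used in this study, and `deeks_regression` is a hypothetical name.

```python
import math


def deeks_regression(tables, continuity=0.5):
    """Deeks-style funnel plot asymmetry regression (illustrative sketch).

    tables: iterable of (tp, fp, fn, tn) per study.
    Fits ln(DOR) = a + b / sqrt(ESS) by weighted least squares with
    weights ESS and returns (a, b, se_b, t_b). A p value for b would
    come from a t distribution with k - 2 degrees of freedom.
    """
    xs, ys, ws = [], [], []
    for tp, fp, fn, tn in tables:
        n_dis, n_nondis = tp + fn, fp + tn
        ess = 4 * n_dis * n_nondis / (n_dis + n_nondis)  # effective sample size
        a, b, c, d = (v + continuity for v in (tp, fp, fn, tn))
        ys.append(math.log((a * d) / (b * c)))  # ln diagnostic odds ratio
        xs.append(1 / math.sqrt(ess))
        ws.append(ess)
    k = len(xs)
    if k < 3:
        raise ValueError("need at least three studies")
    sw = sum(ws)
    xbar = sum(w * x for w, x in zip(ws, xs)) / sw
    ybar = sum(w * y for w, y in zip(ws, ys)) / sw
    sxx = sum(w * (x - xbar) ** 2 for w, x in zip(ws, xs))
    if sxx == 0:
        raise ValueError("all studies have the same effective sample size")
    sxy = sum(w * (x - xbar) * (y - ybar) for w, x, y in zip(ws, xs, ys))
    slope = sxy / sxx
    intercept = ybar - slope * xbar
    rss = sum(w * (y - intercept - slope * x) ** 2 for w, x, y in zip(ws, xs, ys))
    se_slope = math.sqrt(rss / (k - 2) / sxx)
    t_slope = slope / se_slope if se_slope > 0 else float("nan")
    return intercept, slope, se_slope, t_slope
```

As a sanity check, studies that differ only in size but share the same DOR (with `continuity=0.0`) give a slope of zero, which is what a perfectly symmetric funnel plot implies.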
We performed meta-regression analyses to investigate the cause of heterogeneity. The following covariates were considered for the bivariate model: (1) proportion of patients with PCa (>50% vs ≤50%), (2) magnet strength of MRI (3 vs 1.5 T), (3) use of endorectal coil, (4) cutoff value (≥4 vs ≥3), (5) reference standard (radical prostatectomy vs biopsy), and (6) type of analysis (per patient vs per lesion). In addition, multiple subgroup analyses were performed for cutoff value, outcome, and previous biopsy history to assess various clinical settings: (1) a cutoff value of ≥4 for all studies, (2) a cutoff value of ≥3 for all studies, (3) a cutoff value of ≥4 for determining any PCa, (4) a cutoff value of ≥3 for determining any PCa, (5) a cutoff value of ≥4 for determining csPCa, (6) a cutoff value of ≥3 for determining csPCa, (7) a cutoff value of ≥4 in studies using per-patient analysis, (8) a cutoff value of ≥4 in studies using per-lesion analysis, (9) studies analyzing PZ PCa, (10) studies analyzing TZ PCa, (11) patients without previous biopsies, and (12) patients with previous biopsies. The “metandi” and “midas” modules in Stata 10.0 (StataCorp LP, College Station, TX, USA) and “mada” package in R software version 3.2.1 (R Foundation for Statistical Computing, Vienna, Austria) were used for statistical analyses, with p < 0.05 signifying statistical significance.
A systematic literature search initially identified 287 articles. After removing 46 duplicates, screening of the 241 titles and abstracts yielded 105 potentially eligible articles. Full-text reviews were performed, and 84 studies were excluded for the following reasons: not in the field of interest (n = 80, including 68 studies that used only PI-RADSv1), insufficient data to reconstruct 2 × 2 tables (n = 2), and shared study population with other studies (n = 2). Ultimately, 21 original articles including a total of 3857 patients assessing the diagnostic performance of PI-RADSv2 were included in the meta-analysis [16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36]. No additional studies were identified by screening the bibliographies of these 21 studies. Among them, 15 studies including 3099 patients dealt with PI-RADSv2 alone, whereas six studies including 758 patients provided a head-to-head comparison between PI-RADSv1 and PI-RADSv2 [16 20 21 28 32 33]. The detailed study selection process is described in Fig. 1.
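The counts in the selection process can be cross-checked with simple arithmetic. The snippet below only re-adds numbers stated in this paragraph; the variable names are ours.

```python
identified = 287
duplicates = 46
screened = identified - duplicates        # titles and abstracts screened
potentially_eligible = 105                # retained for full-text review
full_text_exclusions = {
    "not in the field of interest": 80,   # includes 68 PI-RADSv1-only studies
    "insufficient data for 2 x 2 tables": 2,
    "shared study population": 2,
}
included = potentially_eligible - sum(full_text_exclusions.values())

# (number of studies, number of patients) by study type
v2_only = (15, 3099)
head_to_head = (6, 758)
total_studies = v2_only[0] + head_to_head[0]
total_patients = v2_only[1] + head_to_head[1]

print(screened, included, total_studies, total_patients)  # 241 21 21 3857
```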
Patient characteristics are shown in Table 1 . The size of the study population ranged from 49 to 456 patients, with the percentage of those with PCa ranging from 37% to 100%. The patients had a median age of 62–69.6 yr, median PSA of 3.97–15 ng/ml, and a Gleason score ranging from 5 to 10. Patients had already been diagnosed with PCa prior to MRI in all or some of the study populations in seven studies [16 17 22 26 27 32 35] . Biopsy was performed before MRI in seven studies [16 21 22 23 26 27 35] , all patients were biopsy-naïve in three studies [18 25 34] , both patient types were included in three studies [28 32 33] , and data regarding previous biopsy were not reported in the remaining eight studies.
First author (year) | Country | Institution | Duration of patient recruitment | Patients (n) | Patients with PCa (n) | Age, median (yr) | Age, range (yr) | PSA, median (ng/ml) | PSA, range (ng/ml) | Gleason score, median | Gleason score, range | No. of previous biopsies | PCa diagnosis before MRI |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Auer (2016) [16] | Austria | Medical University of Innsbruck, Medizinische Hochschule Hannover | NR | 50 | NR | 63 a | NR | 7.3 a | NR | 7 (3 + 4) | 6–10 | ≥1 | Yes (all) |
Baldisserotto (2016) [17] | Brazil | Pontifícia Universidade Católica do Rio Grande do Sul | 2013.6–2015.6 | 54 | 33 | 65.9 a | 53–81 | 8.4 a | 3–31 | 7 (3 + 4) | 6–9 | NR | Yes (some) |
De Visschere (2017) [18] | Belgium | Ghent University Hospital | 2011.5–2014.12 | 245 | 144 | 66 | 44–85 | 9 | 1.4–935.5 | 7 (3 + 4) | ≥6 | 0 | No |
El-Samei (2016) [19] | Egypt | Al-Azher University, El-Minia University | 2014.5–2015.10 | 55 | 38 | 62 a | 51–79 | NR | NR | 7 | 6–10 | NR | No |
Feng (2016) [20] | China | Tongji Hospital | 2013.6–2015.7 | 401 | 150 | 64.4 a | 34–88 | 10.7 | 0.2–1763 | 7 (4 + 3) | ≤6–≥8 | NR | NR |
Kasel-Seibert (2016) [21] | Germany | University Hospital Jena | 2013.7–2015.3 | 82 | 31 | 65 | 48–81 | 13 | 1–111 | 7 (3 + 4) | 6–9 | 1–5 | No |
Lin (2016) [22] | Brazil, Taiwan | Ribeirão Preto School of Medicine, China Medical University, Ribeirão Preto School of Medicine | 2011.5–2014.6 | 49 | 49 | 63 | 46–73 | 13.27 | 1.8–41.4 | | | ≥1 | Yes (all) |
Martorana (2016) [23] | Italy | University of Modena and Reggio Emilia, Perugia University | 2014.1–2016.2 | 157 | 79 | 65 | 47–79 b | 10.7 | 1–75 | 6 | 6–8 | 1–3 | No |
Mertan (2016) [24] | USA | National Cancer Institute | 2015.3–2015.9 | 62 | 38 | 65.5 | 50.3–76.6 | 7.1 | 0.5–863 | 7 (3 + 4) | 6–9 | NR | NR |
Muller (2015) [25] | USA, the Netherlands | National Cancer Institute, AMC University Hospital, Edward Hébert School of Medicine | 2011.12–2014.5 | 94 | 94 | 62 | 37–79 | 8.51 | 0.7–51.1 | 7 (3 + 4) | 6–10 | 0 | No |
Park (2016) [26] | South Korea | Samsung Medical Center | 2012.1–2014.12 | 456 | 456 | 65 | 46–81 | 3.97 | 3.8–4.2 | NR | ≥6 | ≥1 | Yes (all) |
Park (2016) [27] | South Korea | Yonsei University College of Medicine | 2012.1–2013.3 | 425 | 425 | NR | NR | NR | 1.4–156.9 | 5–6 | 5–10 | ≥1 | Yes (all) |
Polanec (2016) [28] | Austria | Medical University of Vienna, Confraternität Vienna | 2011.6–2015.9 | 65 | NR | 65.3 a | 62.3–87.4 | 10.8 a | 4.2–74.5 | 7 (4 + 3) | 6–9 | ≥0 | No |
Rastinehad (2015) [29] | USA | Icahn School of Medicine at Mount Sinai, Fox Chase Cancer Center, National Institutes of Health, Hofstra North Shore LIJ School of Medicine | 2012.2–2014.11 | 312 | 202 | 65.1 | 60.3–70.3 b | 7.3 | 5.0–11.4 b | NR | NR | NR | No |
Rosenkrantz (2016) [30] | USA | NYU Langone Medical Center | 2013.9–2015.2 | 343 | 134 | 64 | NR | 5.8 | NR | 7 (3 + 4) | 6–9 | NR | NR |
Stanzione (2016) [31] | Italy | University “Federico II”, Ospedale S. Maria delle Grazie | NR | 82 | 34 | 65 a | 43–84 | 8.8 a | NR | 7 (3 + 4) | 6–9 | NR | No |
Tan (2016) [32] | USA, Taiwan | UCLA, Cathay General Hospital, Hoag Hospital | 2013.3–2016.12 | 106 | 63 | 66.5 | 43–79 | 7.9 | 5.6–10.6 | 7 (3 + 4) | 6–10 | ≥0 | Yes (some) |
Tewes (2016) [33] | Germany | Hannover Medical School, Klinikum der Region Hannover | 2012.12–2014.12 | 54 | 31 | 69.6 a | NR | 8.7 a | NR | 6 | 6–9 | ≥0 | No |
Washino (2017) [34] | Japan | Jichi Medical University Saitama Medical Center | 2010.6–2014.4 | 288 | 159 | 69 | 64–74 b | 7.5 | 5.5–11 b | NR | NR | 0 | No |
Woo (2016) [35] | South Korea | Seoul National University College of Medicine | 2011.1–2013.12 | 105 | 105 | 69 | 49–79 | 8.22 | 0.9–44.2 | 7 | 6–7 | ≥1 | Yes (all) |
Zhao (2016) [36] | China | Peking University First Hospital | 2010.11–2013.12 | 372 | 185 | 68.5 a | NR | 15 a | NR | 7 | 6–10 | NR | No |
Characteristics of the studies are summarized in Table 2. MRI was performed using 3-T scanners in 16 studies [16 17 18 20 24 25 26 27 28 29 30 31 32 33 35 36], 1.5-T scanners in four studies [19 21 22 23], and either 3 or 1.5 T in one study [34]. Endorectal coils were used in seven studies [19 21 22 23 24 25 29]. In all studies, the mpMRI protocol consisted of T2WI, DWI, and DCE-MRI. The reference standard was radical prostatectomy in five studies [16 22 26 27 35], a combination of systematic and targeted biopsies in seven studies [17 20 23 25 29 34 36], and only targeted biopsy in seven studies [19 21 24 28 30 32 33]; the reference standard was not consistent throughout the study population in two studies [18 31]. PI-RADSv2 scoring was performed by one to five radiologists, either in consensus or independently. The level of experience of the radiologists was heterogeneous, ranging from 4 to 22 yr of experience in prostate MRI. In most studies, the readers were blinded; however, in three studies, the radiologists were aware that the patients had biopsy-proven PCa [22 26 35], and five studies were not explicit regarding blinding [18 19 30 32 36]. In the majority of the studies, the interval between MRI and the reference standard was less than 6 mo; however, the details were not reported in 10 studies [16 17 18 24 27 28 29 30 32 33]. PCa was separately assessed according to zonal anatomy in seven studies [20 24 25 28 30 31 33]. However, in one study [31], only PCa in the PZ could be evaluated, as no detailed data were provided in the article and the attempt to contact the authors for further information was unsuccessful. Regarding the outcome assessed, eight studies evaluated any cancer [17 19 20 23 25 28 31 33], eight evaluated clinically significant cancer [18 22 26 27 30 34 35 36], and five evaluated both [16 21 24 29 32].
With regard to cutoff values, 13 studies used ≥4 [16 18 19 20 21 24 25 26 27 28 30 31 35], four studies used ≥3 [32 33 34 36], and four studies used both [17 22 23 29]. The location of PCa was separately reported for the PZ and TZ in six studies [20 24 25 28 30 33].
First author (year) | Design | Consecutive enrollment | Reference standard | MRI–reference standard interval | No. of readers (reading mode) | Reader experience (yr) | Reader blinding | Magnet strength (T) | Vendor | Model | Endorectal coil | PI-RADSv2 application | Cutoff values | Localization | Type of analysis | Outcome assessed | Definition of csPCa |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Auer (2016) [16] | Retrospective | Yes | RP | NR | 1 | >5 | Yes | 3 | Siemens | Skyra | No | Strict | 4 | Whole | Lesion | Any + csPCa | GS ≥7 (4 + 3) |
Baldisserotto (2016) [17] | Retrospective | Yes | STRUSGB + targeted MRI–TRUS biopsy (cognitive) or RP | NR | 2 Independent | >10/1 | Yes | 3 | GE | Signa HDxt | No | Strict | 3, 4 | Whole | Patient | Any | |
De Visschere (2017) [18] | Retrospective | NR | STRUSGB or RP | NR | NR | NR | NR | 3 | Siemens | Trio | No | Strict | 4 | Whole | Patient | csPCa | GS ≥7, ≥0.5 cc, or EPE |
El-Samei (2016) [19] | NR | NR | Targeted MRI–TRUS biopsy | ≤2 wk | NR | NR | NR | 1.5 | Philips | Gyroscan | Yes | Strict | 4 | Whole | Lesion | Any | |
Feng (2016) [20] | Retrospective | Yes | STRUSGB + targeted MRI–TRUS biopsy (cognitive) | 1–35 d | 2 Consensus | 5/4 | Yes | 3 | Siemens | Skyra | No | Strict | 4 | PZ, TZ | Patient | Any | |
Kasel-Seibert (2016) [21] | Retrospective | NR | Targeted MRGB | ≤7 d | 2 Independent | 10/<1 | Yes | 1.5 | Siemens | Avanto | Yes | Strict | 4 | Whole | Lesion | Any + csPCa | GS ≥7 |
Lin (2016) [22] | Retrospective | NR | RP | ≤6 mo | 2 Independent | 12/5 | Yes a | 1.5 | Philips | Achieva | Yes | Strict | 3, 4 | Whole | Lesion | csPCa | GS ≥7 and >0.5 cc |
Martorana (2016) [23] | Retrospective | Yes | TTB + targeted MRI–TRUS biopsy | 30–78 | 2 Consensus | 4/4 | Yes | 1.5 | Philips | Achieva | Yes | Strict | 3, 4 | Whole | Lesion | Any | GS ≥7, >0.5 cc, or EPE |
Mertan (2016) [24] | Prospective | Yes | STRUSGB + targeted MRI–TRUS biopsy | NR | 1 | >8 | Yes | 3 | Philips | Achieva | Yes | Strict | 4 | PZ, TZ | Lesion | Any + csPCa | GS ≥7 |
Muller (2015) [25] | Retrospective | Yes | STRUSGB + targeted MRI–TRUS biopsy | ≤6 wk | 5 Independent | 12/7/1/1/0.5 | Yes | 3 | Philips | Achieva | Yes | Strict | 4 | PZ, TZ | Lesion | Any | GS ≥7 (4 + 3) |
Park (2016) [26] | Retrospective | Yes | RP | 21–48 | 2 Independent | 14/3 | Yes a | 3 | Philips | Achieva | No | Strict | 4 | Whole | Patient | csPCa | GS ≥7, ≥0.5 cc, or EPE |
Park (2016) [27] | Retrospective | NR | RP | NR | 2 Independent | 9/4 | Yes | 3 | GE, Philips, Siemens | Discovery MR750, Achieva, Trio | No | Strict | 4 | Whole | Patient | csPCa | GS ≥7, ≥0.5 cc, or EPE |
Polanec (2016) [28] | Retrospective | Yes | Targeted MRGB | NR | 2 Independent | NR/NR | Yes | 3 | Siemens | Trio | No | Strict | 4 | PZ, TZ | Lesion | Any | |
Rastinehad (2015) [29] | Prospective | NR | STRUSGB + targeted MRI–TRUS biopsy | NR | 3 Consensus | NR | Yes | 3 | Siemens | Verio | Yes | Calculated b | 3, 4 | Whole | Patient | Any + csPCa | Epstein's criteria c for systematic and GS ≥7 or >0.5 cc for targeted biopsy |
Rosenkrantz (2016) [30] | Retrospective | NR | Targeted MRI–TRUS biopsy | NR | 2 Independent | 8/3 | NR | 3 | Siemens | Biograph mMR/Prisma/Skyra/Trio | No | Strict | 4 | PZ, TZ | Lesion | csPCa | GS ≥7 |
Stanzione (2016) [31] | Prospective | NR | STRUSGB (±targeted TRUSGB) | 20–30 d | 2 Independent | 14/10 | Yes | 3 | Siemens | Trio | No | Strict | 4 | PZ | Patient | Any | |
Tan (2016) [32] | Retrospective | Yes | Targeted MRGB | NR | 3 Consensus | 16/10 | NR | 3 | Siemens | Skyra, Trio, Verio | No | Strict | 3 | Whole | Lesion | Any + csPCa | GS ≥7 |
Tewes (2016) [33] | Retrospective | Yes | Targeted MRGB | NR | 2 Independent | 5/2 | Yes | 3 | Siemens | Skyra | No | Strict | 3 d | PZ, TZ | Patient | Any | |
Washino (2017) [34] | Retrospective | NR | TTB + targeted MRI–TRUS biopsy (cognitive) | 0.5–1.6 mo | 1 | 14 | Yes | 1.5 or 3 | Toshiba | Excelart Vantage/Vantage Titan 3T | No | Strict | 3 | Whole | Patient | csPCa | GS ≥7 or maximum core length ≥4 mm |
Woo (2016) [35] | Retrospective | NR | RP | ≤6 mo | 2 Consensus | 22/10 | Yes a | 3 | Philips, Siemens | Ingenia, Verio/Trio | No | Strict | 4 | Whole | Patient | csPCa | GS ≥7 |
Zhao (2016) [36] | Retrospective | NR | STRUSGB + targeted MRI–TRUS biopsy | ≤3 mo | 2 Independent | NR | NR | 3 | NR | NR | No | Strict | 3 | Whole | Patient | csPCa | GS ≥7 |
a Blinded but aware that patients had biopsy-proven PCa.
b PI-RADSv2 scores were generated from reports based on PI-RADSv1 and simplified qualitative system.
c Epstein's criteria = Gleason pattern ≥4, or Gleason 3 + 3 disease with core length ≥50% and/or >2 cores positive.
d For transition zone, cutoff value = 4.
csPCa = clinically significant prostate cancer; GS = Gleason score; EPE = extraprostatic extension; MRGB = magnetic resonance imaging-guided biopsy; MRI = magnetic resonance imaging; MRI–TRUS = fusion of magnetic resonance imaging and transrectal ultrasound images; NR = not reported; PCa = prostate cancer; PI-RADSv2 = Prostate Imaging Reporting and Data System version 2; PZ = peripheral zone; RP = radical prostatectomy; STRUSGB = systematic transrectal ultrasound-guided biopsy; TTB = transperineal template biopsy; TZ = transition zone.

Overall, the quality of the studies was not considered high, mainly because of the patient selection domain (Fig. 2). Regarding the patient selection domain, there was generally a high risk of bias, as all but four of the studies were retrospective [16 17 18 20 21 22 23 25 26 27 28 30 32 33 34 35 36]. Seven studies were considered to have high concern for applicability, as all or some of the patients had a pathological diagnosis of PCa prior to MRI [16 17 22 26 27 32 35]. Regarding the index test domain, there was a high risk of bias in nine studies. In three of these nine studies, reviewers were aware that patients had biopsy-proven PCa [22 26 35]. In the other six studies, the cutoff value for determining PCa was not specified prior to interpretation [16 20 29 31 33 34]. Only one study had concern for applicability, as PI-RADSv2 scores were indirectly generated from existing clinical radiological reports based on PI-RADSv1 or an in-house scoring system [29]. Regarding the reference standard domain, eight studies had a high risk of bias. Seven were based on either only systematic biopsy or only targeted biopsy [18 19 21 28 30 32 33]; in one study, targeted biopsy was performed, but on lesions that were suspicious on ultrasonography rather than on MRI [31].
Studies in which radical prostatectomy or systematic plus targeted biopsy (MRI guided, MRI–transrectal ultrasound fusion, or cognitive) was used as the reference standard were considered to have a low risk of bias. In 10 studies, the definition of clinically significant cancer did not conform to that described in the PI-RADSv2 guidelines, and these studies were therefore considered to have high concern for applicability [16 21 22 24 29 30 32 34 35 36]. Regarding the flow and timing domain, two studies had a high risk of bias, as patients did not receive the same reference standard [17 18].
The sensitivity and specificity of individual studies ranged from 73% to 100% and from 8% to 100%, respectively. The Q-test revealed substantial heterogeneity (p < 0.001). The Higgins I² statistic demonstrated substantial heterogeneity in sensitivity (I² = 85.55%) and considerable heterogeneity in specificity (I² = 95.30%). The coupled forest plot of sensitivity and specificity demonstrated the absence of a threshold effect (Fig. 3). The Spearman correlation coefficient between the sensitivity and the false-positive rate was 0.45 (95% confidence interval [CI] 0.023–0.738), also indicating the lack of a threshold effect.
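Both heterogeneity summaries in this paragraph have simple closed forms. Higgins I² is max(0, (Q − df)/Q) × 100, where Q is Cochran's statistic and df = k − 1 for k studies, and the threshold-effect check is a Spearman rank correlation between per-study sensitivity and false-positive rate (1 − specificity). The sketch below computes Q on the logit scale with inverse-variance weights and a 0.5 continuity correction; these choices are our assumptions for illustration, and the univariate values the sketch produces need not match the I² values reported by the `midas` module in this study. All function names are hypothetical.

```python
import math


def cochran_q_logit(events, totals, continuity=0.5):
    """Cochran's Q for per-study proportions on the logit scale.

    Uses inverse-variance weights 1 / (1/a + 1/b) with a continuity
    correction added to both cells (an assumption for illustration).
    """
    thetas, weights = [], []
    for e, n in zip(events, totals):
        a, b = e + continuity, (n - e) + continuity
        thetas.append(math.log(a / b))
        weights.append(1 / (1 / a + 1 / b))
    pooled = sum(w * t for w, t in zip(weights, thetas)) / sum(weights)
    return sum(w * (t - pooled) ** 2 for w, t in zip(weights, thetas))


def i_squared(q, k):
    """Higgins I^2 in percent: max(0, (Q - df) / Q) * 100 with df = k - 1."""
    if q <= 0:
        return 0.0
    return max(0.0, (q - (k - 1)) / q) * 100


def _average_ranks(values):
    """Ranks starting at 1; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation, computed as the Pearson correlation of the ranks."""
    rx, ry = _average_ranks(x), _average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)
```

For a threshold-effect check, one would call `spearman(sensitivities, [1 - s for s in specificities])` on the per-study estimates; for I², `i_squared(cochran_q_logit(tp_counts, diseased_counts), k)` gives the univariate value for sensitivity.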
For all 21 studies combined, the pooled sensitivity was 0.89 (95% CI 0.86–0.92) with specificity of 0.73 (95% CI 0.60–0.83; Fig. 3 ). In the HSROC curve, there was a large difference between the 95% confidence region and the 95% prediction region, thus indicating heterogeneity between the studies ( Fig. 4 ). The area under the HSROC curve was 0.91 (95% CI 0.88–0.93). According to the Deeks’ funnel plot, the likelihood of publication bias was low, with a p value of 0.75 for the slope coefficient ( Fig. 5 ).
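Deeks' asymmetry test regresses each study's log diagnostic odds ratio (ln DOR) on 1/√ESS, where ESS = 4·n₁·n₂/(n₁ + n₂) is the effective sample size for n₁ patients with and n₂ without the disease, weighting each study by its ESS; a slope that differs significantly from zero suggests publication bias. The sketch below assumes numpy and scipy are available, uses hypothetical 2 × 2 tables in (TP, FP, FN, TN) order, and applies a 0.5 continuity correction only to tables with a zero cell. It follows the published description of the test and is not the Stata midas implementation used in this review.

```python
import math

import numpy as np
from scipy import stats


def ln_dor_and_ess(tp: int, fp: int, fn: int, tn: int) -> tuple[float, float]:
    """Log diagnostic odds ratio and effective sample size of one 2x2 table."""
    n_diseased, n_healthy = tp + fn, fp + tn
    ess = 4.0 * n_diseased * n_healthy / (n_diseased + n_healthy)
    if 0 in (tp, fp, fn, tn):
        # 0.5 continuity correction only when a zero cell makes the DOR undefined.
        tp, fp, fn, tn = (x + 0.5 for x in (tp, fp, fn, tn))
    return math.log((tp * tn) / (fp * fn)), ess


def deeks_test(tables: list[tuple[int, int, int, int]]) -> tuple[float, float]:
    """Regress ln(DOR) on 1/sqrt(ESS), weighted by ESS; return (slope, two-sided p)."""
    y, ess = map(np.array, zip(*(ln_dor_and_ess(*t) for t in tables)))
    x = 1.0 / np.sqrt(ess)
    X = np.column_stack([np.ones_like(x), x])
    W = np.diag(ess)
    xtwx_inv = np.linalg.inv(X.T @ W @ X)
    beta = xtwx_inv @ X.T @ W @ y  # [intercept, slope]
    resid = y - X @ beta
    dof = len(y) - 2
    sigma2 = float(resid @ W @ resid) / dof  # weighted residual variance
    se_slope = math.sqrt(sigma2 * xtwx_inv[1, 1])
    t_stat = beta[1] / se_slope
    return float(beta[1]), float(2.0 * stats.t.sf(abs(t_stat), dof))


# Hypothetical (TP, FP, FN, TN) tables from five studies.
tables = [(45, 10, 5, 40), (80, 30, 20, 70), (28, 4, 2, 26),
          (60, 25, 30, 75), (120, 40, 15, 110)]
slope, p = deeks_test(tables)
print(f"slope = {slope:.2f}, p = {p:.2f}")
```

A large p value for the slope, as in the p = 0.75 reported above, is read as no evidence of funnel plot asymmetry; with few studies the test has limited power.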
For the six studies that provided a head-to-head comparison between PI-RADSv1 and PI-RADSv2, PI-RADSv2 demonstrated higher pooled sensitivity of 0.95 (95% CI 0.85–0.98) compared with 0.88 (95% CI 0.80–0.93) for PI-RADSv1 ( p = 0.04). However, the pooled specificity did not show a significant difference between the two versions of PI-RADS: 0.73 (95% CI 0.47–0.89) for v2 and 0.75 (95% CI 0.36–0.94) for v1 ( p = 0.90).
As we found considerable heterogeneity among the included studies, meta-regression analyses were performed (Supplementary Table 1). Among several potential variables, the proportion of patients with PCa, magnetic field strength, and reference standard were significant factors affecting the heterogeneity (p < 0.01 for all). However, among these three, only specificity according to the proportion of patients with PCa showed a difference that was both statistically significant and clinically meaningful: 0.65 (95% CI 0.52–0.78) in studies with ≤50% of patients with PCa versus 0.86 (95% CI 0.75–0.97) in studies with >50%. Otherwise, no clinically meaningful differences were seen: for magnetic field strength (3 vs 1.5 T), sensitivity of 0.90 (95% CI 0.86–0.94) versus 0.89 (95% CI 0.81–0.97, p = 0.03) and specificity of 0.73 (95% CI 0.59–0.86) versus 0.72 (95% CI 0.44–1.00, p = 0.81); for reference standard (radical prostatectomy vs biopsy), sensitivity of 0.89 (95% CI 0.83–0.95) versus 0.91 (95% CI 0.88–0.94, p < 0.01) and specificity of 0.65 (95% CI 0.37–0.94) versus 0.73 (95% CI 0.58–0.87, p = 0.48). Other variables, including the cutoff value, use of an endorectal coil, and type of analysis (per patient vs per lesion), were not significant factors (p = 0.32–0.70).
As four studies used both ≥3 and ≥4 as cutoff values [17 22 23 29] and five assessed both any cancer and clinically significant cancer as outcomes [16 21 24 29 32], multiple subgroup analyses were performed to assess various clinical settings (Supplementary Table 2). Regarding cutoff values, the pooled sensitivity was 0.89 (95% CI 0.84–0.92) with specificity of 0.74 (95% CI 0.58–0.85) for 17 studies using ≥4 [16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 35], whereas these were 0.95 (95% CI 0.89–0.97) and 0.47 (95% CI 0.21–0.74) in eight studies using ≥3 [17 22 23 29 32 33 34 36]. When we stratified studies according to the outcome assessed, the following results were obtained: (1) cutoff of ≥4 for determining any PCa, sensitivity of 0.89 (95% CI 0.83–0.93) with specificity of 0.80 (95% CI 0.62–0.90); (2) cutoff of ≥3 for determining any PCa, sensitivity of 0.96 (95% CI 0.93–0.98) with specificity of 0.49 (95% CI 0.29–0.70); (3) csPCa regardless of cutoff value, sensitivity of 0.89 (95% CI 0.84–0.92) with specificity of 0.64 (95% CI 0.46–0.78); (4) cutoff of ≥4 for determining csPCa, sensitivity of 0.90 (95% CI 0.85–0.94) with specificity of 0.62 (95% CI 0.45–0.77); and (5) cutoff of ≥3 for determining csPCa, sensitivity of 0.96 (95% CI 0.87–0.99) with specificity of 0.29 (95% CI 0.05–0.77). When the studies using a cutoff value of ≥4 were assessed separately according to the type of analysis, per-patient analysis in eight studies [17 18 20 26 27 29 31 35] yielded pooled sensitivity of 0.89 (95% CI 0.81–0.93) with specificity of 0.76 (95% CI 0.60–0.88), whereas per-lesion analysis in nine studies [16 19 21 22 23 24 25 28 30] yielded pooled sensitivity of 0.87 (95% CI 0.83–0.91) with specificity of 0.70 (95% CI 0.44–0.88).
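The trade-off between the ≥3 and ≥4 cutoffs follows from how a five-point score is dichotomized: lowering the threshold turns equivocal scores of 3 into positive calls, gaining sensitivity at the cost of specificity. The sketch below uses hypothetical per-patient PI-RADSv2 scores and reference-standard results to compute both measures at each cutoff.

```python
def sens_spec_at_cutoff(scores: list[int], has_cancer: list[bool],
                        cutoff: int) -> tuple[float, float]:
    """Sensitivity and specificity when a score >= cutoff is called positive."""
    tp = fn = fp = tn = 0
    for score, diseased in zip(scores, has_cancer):
        positive = score >= cutoff
        if diseased and positive:
            tp += 1
        elif diseased:
            fn += 1
        elif positive:
            fp += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical scores (1-5) and reference-standard results for ten patients.
scores = [5, 4, 3, 4, 2, 3, 5, 1, 3, 4]
has_cancer = [True, True, True, False, False, False, True, False, True, False]
for cutoff in (3, 4):
    sens, spec = sens_spec_at_cutoff(scores, has_cancer, cutoff)
    print(f">= {cutoff}: sensitivity {sens:.2f}, specificity {spec:.2f}")
# >= 3: sensitivity 1.00, specificity 0.40
# >= 4: sensitivity 0.60, specificity 0.60
```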
Based on the localization of PCa, the pooled sensitivity was 0.93 (95% CI 0.87–0.96) with specificity of 0.68 (95% CI 0.43–0.86) in seven studies analyzing PZ cancers [20 24 25 28 30 31 33]. In the six of these studies that also analyzed TZ cancers (ie, all except that by Stanzione et al [31]), the pooled sensitivity and specificity for TZ cancers were 0.88 (95% CI 0.77–0.94) and 0.75 (95% CI 0.59–0.86), respectively.
Studies including patients without previous biopsies yielded sensitivity of 0.82 (95% CI 0.72–0.90) and specificity of 0.75 (95% CI 0.65–0.83), whereas the diagnostic performance values were 0.87 (95% CI 0.80–0.92) and 0.71 (95% CI 0.42–0.89) in studies including patients with a history of previous biopsy.
In our meta-analysis, we assessed the diagnostic accuracy of PI-RADSv2 for detecting PCa. The pooled sensitivity and specificity of all 21 studies were 0.89 (95% CI 0.86–0.92) and 0.73 (95% CI 0.60–0.83), respectively. Compared with the only two existing meta-analyses of mpMRI for detecting PCa, our data suggest a trend toward higher sensitivity and lower specificity. In the study by de Rooij et al [37], which evaluated seven studies using a combination of T2WI, DWI, and DCE-MRI, the pooled sensitivity and specificity were 0.74 (95% CI 0.66–0.81) and 0.88 (95% CI 0.82–0.92), respectively. In a more recent meta-analysis by Hamoen et al [6], which analyzed 14 studies using PI-RADSv1, the pooled sensitivity and specificity were 0.78 (95% CI 0.70–0.84) and 0.79 (95% CI 0.68–0.86), respectively. However, comparisons across these three meta-analyses are only indirect. To address this issue, we separately assessed the subgroup of studies that used both PI-RADSv1 and PI-RADSv2. In this head-to-head comparison, PI-RADSv2 demonstrated higher pooled sensitivity (0.95) than PI-RADSv1 (0.88, p = 0.04), without a statistically significant difference in specificity (0.73 vs 0.75, p = 0.90). This gain in sensitivity over its predecessor may imply that the revisions undertaken during the development of PI-RADSv2, including the introduction of dominant sequences according to zonal anatomy, the limited contribution of DCE-MRI secondary to DWI and T2WI, and specific guidelines for deriving an integrated overall score, were, in fact, on the right track. In particular, we speculate that the use of dominant sequences, that is, DWI for the PZ and T2WI for the TZ, may have been crucial for the improved sensitivity without a loss in specificity, as suggested by Baur et al [10].
Considering that one of the main intentions behind PI-RADS was to standardize reporting of mpMRI in order to decrease variability and promote widespread acceptance and implementation in daily practice, it was promising to find that nearly all (20 of 21) studies applied PI-RADSv2 strictly according to the published guidelines [11]. Only one study derived PI-RADSv2 scores from existing clinical radiological reports that were based on PI-RADSv1 or an in-house scoring system [29]. This is an improvement over prior studies using PI-RADSv1, in which investigators used varying methods to determine the overall score (an overall five-point score or the sum of the scores from each sequence) [6]. Still, the cutoff value for detecting PCa requires further clarification. In the studies included in our meta-analysis, cutoff values were predefined in only six studies, while the majority (15/21) were exploratory in nature, testing multiple criteria. A cutoff value of ≥4 yielded generally good sensitivity (0.89) and specificity (0.74), whereas ≥3 yielded excellent sensitivity (0.95) but poor specificity (0.47). These results may be taken into consideration when developing the next update of PI-RADS. For instance, ≥4 may be adequate for general use, whereas ≥3 could be reserved for situations in which a higher cancer detection rate is clinically required (eg, a persistently high PSA level despite a previously negative biopsy).
In the current study, subgroup analyses were performed to account for differences in outcomes (any cancer vs clinically significant cancer). There was no significant difference between the two outcomes irrespective of whether the ≥3 or ≥4 criterion was used. However, the definition of clinically significant cancer varied among the 13 studies that assessed it. Only three studies defined csPCa strictly according to the PI-RADSv2 guidelines (Gleason score ≥7 [3 + 4], volume ≥0.5 ml, or extraprostatic extension) [11], whereas most others used one or two of the three criteria. Including only these three studies may have provided more robust results; however, including all available studies was pragmatic and provides a general overview of the existing literature, as this is the first meta-analysis of studies dealing with PI-RADSv2.
In this meta-analysis, we also examined technical aspects of MRI. Meta-regression analyses revealed that the use of an endorectal coil was not a statistically significant factor. Furthermore, although magnetic field strength (3 vs 1.5 T) was associated with a statistically significant difference, this difference was not clinically meaningful (sensitivity of 0.90 vs 0.89, respectively; p = 0.03). Although there had been debate over these two issues in the past, both 3 and 1.5 T are now well established, and the overall benefit of using an endorectal coil is not evident [38 39]. The PI-RADSv2 guidelines currently allow either option, and the results of our study provide additional evidence to support this.
The studies also differed in the reference standard and the type of analysis. Radical prostatectomy was the reference standard in five studies, while the majority relied on a combination of systematic and targeted biopsies; the possibility of PCa being present despite negative biopsy results in the latter group should be kept in mind. In addition, roughly half of the studies reported outcomes per patient (n = 11) and the remainder per lesion (n = 10). Per-lesion analysis additionally reflects how accurately the disease is localized; however, type of analysis was not a significant factor in the meta-regression analysis.
Our meta-analysis had some limitations. Nearly all studies were retrospective in design, resulting in a high risk of bias for patient selection. Pooling data from predominantly retrospective studies may have inflated the pooled sensitivity [40]. However, a meta-analysis restricted to the three prospective studies was not technically feasible, and its results would not have been representative of the existing literature on PI-RADSv2. In addition, we used validated methods for the systematic review and reported the data according to standard reporting guidelines, including PRISMA and the Handbook for Diagnostic Test Accuracy Reviews published by the Cochrane Collaboration [12 41]. Another limitation is the considerable heterogeneity in our pooled analysis, which limits the general applicability of our summary estimates. To explore this heterogeneity, we performed meta-regression and multiple subgroup analyses, which identified the proportion of patients with PCa, the magnetic field strength, and the reference standard as significant factors. In particular, the reference standard included various methods, including radical prostatectomy and a combination of systematic and targeted biopsies (ie, MRI guided, MRI–transrectal ultrasound fusion, or cognitive). The use of various definitions of clinically significant cancer also needs to be emphasized. Our meta-regression and subgroup analyses may explain some of the heterogeneity, but a portion remains unexplained. Another important limitation is the small number of studies available for head-to-head comparison between PI-RADSv1 and PI-RADSv2. Nevertheless, we were able to demonstrate a statistically significant difference in sensitivity between the two versions using the six studies published to date.
PI-RADSv2 shows good performance for the detection of PCa with pooled sensitivity of 0.89 and specificity of 0.73. PI-RADSv2 has higher pooled sensitivity compared with PI-RADSv1 without significantly different specificity.
Author contributions: Sang Youn Kim had full access to all the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis.
Study concept and design: Woo, Suh, S.Y. Kim.
Acquisition of data: Woo, Suh, S.Y. Kim.
Analysis and interpretation of data: Woo, Suh, S.Y. Kim.
Drafting of the manuscript: Woo.
Critical revision of the manuscript for important intellectual content: Suh, S.Y. Kim, Cho, S.H. Kim.
Statistical analysis: Suh.
Obtaining funding: None.
Administrative, technical, or material support: None.
Supervision: S.Y. Kim, Cho, S.H. Kim.
Other: None.
Financial disclosures: Sang Youn Kim certifies that all conflicts of interest, including specific financial interests and relationships and affiliations relevant to the subject matter or materials discussed in the manuscript (eg, employment/affiliation, grants or funding, consultancies, honoraria, stock ownership or options, expert testimony, royalties, or patents filed, received, or pending), are the following: None.
Funding/Support and role of the sponsor: None.
Prostate cancer (PCa) is the second leading cause of cancer-related mortality in Western men [1]. With recent technological advancements and growing availability, multiparametric magnetic resonance imaging (mpMRI) is increasingly being used to guide several aspects of PCa management, including detection, staging, and treatment planning [2]. Despite abundant evidence in the literature reporting high accuracy of mpMRI for PCa diagnosis, widespread acceptance has been hampered by several factors, including difficulty of interpretation, lack of standardized interpretation criteria (ie, use of Likert scales based on the radiologist's subjective level of suspicion for PCa), and the resulting substantial inter-reader variability [3 4].
To bring standardization to the evaluation and reporting of mpMRI of the prostate, the European Society of Urogenital Radiology (ESUR) published a guideline termed Prostate Imaging Reporting and Data System (PI-RADS) in 2012 [5] . PI-RADS was generated based on expert consensus and provides a detailed scoring system for each MRI sequence (T2-weighted imaging [T2WI], diffusion-weighted imaging [DWI], dynamic contrast-enhanced MRI [DCE-MRI], and MR spectroscopy) for the presence of clinically significant PCa (csPCa). Several investigators have validated the accuracy and reproducibility of PI-RADS, and a recent meta-analysis reported pooled sensitivity and specificity of 0.78 and 0.79, respectively [6] . However, as there was no guideline for the generation of an overall score, different research groups utilized various measures for this purpose—some used a sum of the scores from each sequence (ranging from 3 to 15), whereas others used an overall score of 1–5 [7 8] . Furthermore, emerging data questioned the value of curve-type analysis of DCE-MRI [9] . In addition, investigators suggested that some sequences may be more important in determining the likelihood of PCa (ie, DWI in the peripheral zone [PZ] and T2WI in the transition zone [TZ]) rather than equal weighting for all sequences [10] .
To address these issues, the ESUR and American College of Radiology recently released the updated PI-RADS version 2 (PI-RADSv2) [11]. The main changes from PI-RADSv1 to PI-RADSv2 are the following: (1) introduction of dominant sequences according to zonal anatomy (DWI for the PZ and T2WI for the TZ), (2) a limited role for DCE-MRI, which is assessed only as the presence or absence of early focal enhancement, and (3) generation of an overall score (1–5) integrating findings across all MRI sequences.
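The dominant-sequence logic can be sketched as a small function. This is a simplified reading of the PI-RADSv2 rules, assuming diagnostic-quality T2WI and DWI: in the PZ, the DWI score sets the overall category, and a positive DCE-MRI upgrades a DWI score of 3 to 4; in the TZ, the T2WI score sets the category, DCE-MRI is not used, and a DWI score of 5 upgrades a T2WI score of 3 to 4. The function name and argument layout are hypothetical, and cases such as inadequate or missing sequences are not handled.

```python
def pirads_v2_overall(zone: str, t2w: int, dwi: int, dce_positive: bool) -> int:
    """Simplified PI-RADSv2 overall category (1-5) from sequence scores.

    Hypothetical helper; assumes diagnostic-quality T2WI and DWI.
    """
    for name, score in (("T2WI", t2w), ("DWI", dwi)):
        if not 1 <= score <= 5:
            raise ValueError(f"{name} score must be 1-5, got {score}")
    if zone == "PZ":
        # DWI is dominant; a positive DCE upgrades an equivocal DWI 3 to 4.
        return 4 if dwi == 3 and dce_positive else dwi
    if zone == "TZ":
        # T2WI is dominant and DCE is not used; DWI 5 upgrades an equivocal T2WI 3 to 4.
        return 4 if t2w == 3 and dwi == 5 else t2w
    raise ValueError("zone must be 'PZ' or 'TZ'")


print(pirads_v2_overall("PZ", t2w=2, dwi=3, dce_positive=True))  # 4
print(pirads_v2_overall("TZ", t2w=3, dwi=4, dce_positive=True))  # 3
```

The second call shows the reduced role of DCE-MRI: in the TZ, a positive DCE result does not change the category.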
Since then, several studies assessing the value of PI-RADSv2 have been published. However, the diagnostic performance of this new scoring system has not been evaluated systematically. Therefore, the purpose of our study was to assess the diagnostic performance of PI-RADSv2 for the detection of PCa. In addition, we aimed to compare the diagnostic performance of PI-RADSv1 and PI-RADSv2 in studies available for head-to-head comparison.
This meta-analysis was performed and written according to the preferred reporting items for systematic reviews and meta-analyses (PRISMA) guidelines [12] .
A computerized search of MEDLINE and EMBASE up to December 7, 2016, was performed in order to identify studies evaluating the diagnostic performance of PI-RADSv2 for the detection of PCa. The search query combined synonyms for PCa, MRI, and PI-RADS as follows: (prostate cancer OR prostatic cancer OR prostate neoplasm OR prostatic neoplasm OR prostate tumor OR prostatic tumor OR prostate carcinoma OR prostatic carcinoma OR PCa) AND (magnetic resonance imaging OR MRI OR MR) AND (prostate imaging reporting and data system OR pi-rads OR pi rads OR pirads). Bibliographies of identified articles were also screened to expand the scope of search. Our search was limited to publications in English.
Studies were included if they satisfied all the following requirements according to the PICOS criteria [12] : (1) included patients with suspected or diagnosed PCa; (2) for index test, mpMRI of the prostate including all required sequences of T2WI, DWI, and DCE-MRI was performed and assessed with a PI-RADSv2 scoring system; (3) for comparison, a reference standard based on the histopathological examination of radical prostatectomy or biopsy was used; (4) results were reported in sufficient detail for the reconstruction of 2 × 2 tables and determination of sensitivity and specificity at specified cutoff values for evaluating the diagnostic performance of PI-RADSv2; and (5) studies had to be original articles.
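When a study reports only sensitivity, specificity, and the number of patients with and without cancer, the 2 × 2 table can often be back-calculated. The helper below is a hypothetical sketch of that arithmetic, not the extraction procedure documented for this review; because reported proportions are usually rounded (and Python's round() rounds halves to even), the reconstructed counts should be checked against any counts given in the article.

```python
def reconstruct_2x2(sensitivity: float, specificity: float,
                    n_diseased: int, n_healthy: int) -> tuple[int, int, int, int]:
    """Back-calculate (TP, FP, FN, TN) from reported accuracy and group sizes.

    Hypothetical helper: reported proportions are usually rounded, so the
    counts are approximate and should be checked against the article.
    """
    for name, value in (("sensitivity", sensitivity), ("specificity", specificity)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1, got {value}")
    tp = round(sensitivity * n_diseased)  # round() rounds halves to even
    tn = round(specificity * n_healthy)
    return tp, n_healthy - tn, n_diseased - tp, tn


# Hypothetical study: 100 patients with PCa, 200 without.
print(reconstruct_2x2(0.89, 0.73, 100, 200))  # (89, 54, 11, 146)
```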
Studies were excluded if any of the following criteria were met: (1) studies involving <10 patients; (2) review articles, guidelines, consensus statements, letters, editorials, and conference abstracts; (3) studies using only PI-RADSv1 for the evaluation of mpMRI of the prostate; (4) studies focusing on topics other than using the PI-RADSv2 system for diagnosing PCa (eg, staging and prediction of biochemical recurrence); and (5) studies with overlapping patient populations.
Two reviewers (S.W. and C.H.S., with 3 yr of experience in performing systematic reviews and meta-analyses) independently evaluated the eligibility of the selected studies from the literature. Disagreements, if present between the two reviewers, were resolved by consensus via discussion with a third reviewer (S.Y.K.).
We extracted the following data regarding study design and results from the selected studies using a standardized form:
Study characteristics—authors, year of publication, country and institution of origin, duration of patient recruitment, and study design (prospective vs retrospective and consecutive or not)
Demographic and clinical characteristics—sample size, number of patients with PCa, patient age, prostate-specific antigen (PSA) level and Gleason score, number of previous biopsies, and PCa diagnosis prior to mpMRI
Technical characteristics of mpMRI—scanner model and manufacturer, magnetic field strength (1.5 vs 3 T), coil type (endorectal vs pelvic phased array), and specific sequences used (T2WI, DWI, DCE, or MR spectroscopy)
Interpretation of mpMRI—number of reviewers and experience in prostate mpMRI, independent or consensus reading, and blinding to clinicopathological information
Reference standard—type of reference standard (radical prostatectomy, targeted biopsy, or systematic biopsy), interval between MRI and pathology, outcomes assessed (any PCa vs csPCa), definition of csPCa (studies assessing “clinically significant”, “aggressive”, or “high-grade” PCa were all considered to assess csPCa; however, only studies that used the definition provided by the PI-RADSv2 guideline [Gleason score ≥7 (3 + 4), volume ≥0.5 ml, or extraprostatic extension] were considered not to have concern for applicability), separate analysis for the PZ and TZ, and type of analysis (per patient vs per lesion)
Diagnostic performance of PI-RADSv2 including criteria or cutoff values (in case of multiple readers, the results of the most experienced reader were extracted for this meta-analysis)
The methodological quality of the included studies was assessed using tailored questionnaires and criteria provided by Quality Assessment of Diagnostic Accuracy Studies-2 [13] . Data extraction and quality assessment were performed independently by two reviewers (S.W. and C.H.S.). All disagreements were resolved by consensus through discussion with the third reviewer (S.Y.K.).
The diagnostic performance of PI-RADSv2 for the detection of PCa was the primary outcome for this meta-analysis. In addition, a comparison between the diagnostic performance of PI-RADSv2 and that of PI-RADSv1 using studies that reported head-to-head comparison data of the two PI-RADS versions was considered a secondary outcome.
Pooled estimates of sensitivity and specificity were calculated using hierarchical logistic regression modeling including bivariate modeling and hierarchical summary receiver operating characteristic (HSROC) modeling [14] . For graphical presentation of the results, an HSROC curve with 95% confidence region and prediction region was plotted. Publication bias was evaluated using the Deeks’ funnel plot, and statistical significance was tested with the Deeks’ asymmetry test [15] .
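For intuition about pooling, the sketch below applies a univariate DerSimonian–Laird random-effects model to logit sensitivities. This is a deliberately simpler, swapped-in technique: unlike the bivariate and HSROC models used in this review, it pools sensitivity on its own and ignores its correlation with specificity, so its estimates will generally differ. The study counts are hypothetical, and 0.5 is added to each cell.

```python
import math


def dersimonian_laird(logits: list[float], variances: list[float]) -> tuple[float, float, float]:
    """Random-effects pooled logit, its standard error, and tau^2 (DerSimonian-Laird)."""
    w = [1.0 / v for v in variances]
    fixed = sum(wi * yi for wi, yi in zip(w, logits)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, logits))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (len(logits) - 1)) / c)  # between-study variance
    w_star = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_star, logits)) / sum(w_star)
    return pooled, math.sqrt(1.0 / sum(w_star)), tau2


def inv_logit(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


# Hypothetical (true positives, patients with PCa); 0.5 added to each cell.
counts = [(45, 50), (80, 100), (28, 30), (60, 90)]
logits = [math.log((tp + 0.5) / (n - tp + 0.5)) for tp, n in counts]
variances = [1 / (tp + 0.5) + 1 / (n - tp + 0.5) for tp, n in counts]
pooled, se, tau2 = dersimonian_laird(logits, variances)
low, high = pooled - 1.96 * se, pooled + 1.96 * se
print(f"pooled sensitivity {inv_logit(pooled):.2f} "
      f"(95% CI {inv_logit(low):.2f}-{inv_logit(high):.2f}), tau^2 = {tau2:.3f}")
```

Pooling on the logit scale keeps the back-transformed estimate and its confidence interval within 0 to 1.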
We performed meta-regression analyses to investigate the cause of heterogeneity. The following covariates were considered for the bivariate model: (1) proportion of patients with PCa (>50% vs ≤50%), (2) magnet strength of MRI (3 vs 1.5 T), (3) use of endorectal coil, (4) cutoff value (≥4 vs ≥3), (5) reference standard (radical prostatectomy vs biopsy), and (6) type of analysis (per patient vs per lesion). In addition, multiple subgroup analyses were performed for cutoff value, outcome, and previous biopsy history to assess various clinical settings: (1) a cutoff value of ≥4 for all studies, (2) a cutoff value of ≥3 for all studies, (3) a cutoff value of ≥4 for determining any PCa, (4) a cutoff value of ≥3 for determining any PCa, (5) a cutoff value of ≥4 for determining csPCa, (6) a cutoff value of ≥3 for determining csPCa, (7) a cutoff value of ≥4 in studies using per-patient analysis, (8) a cutoff value of ≥4 in studies using per-lesion analysis, (9) studies analyzing PZ PCa, (10) studies analyzing TZ PCa, (11) patients without previous biopsies, and (12) patients with previous biopsies. The “metandi” and “midas” modules in Stata 10.0 (StataCorp LP, College Station, TX, USA) and “mada” package in R software version 3.2.1 (R Foundation for Statistical Computing, Vienna, Austria) were used for statistical analyses, with p < 0.05 signifying statistical significance.
A systematic literature search initially identified 287 articles. After removing 46 duplicates, screening of the 241 titles and abstracts yielded 105 potentially eligible articles. Full-text reviews were performed, and 84 studies were excluded for the following reasons: not in the field of interest (n = 80, including 68 studies that used only PI-RADSv1), insufficient data to reconstruct 2 × 2 tables (n = 2), and shared study population with other studies (n = 2). Ultimately, 21 original articles including a total of 3857 patients assessing the diagnostic performance of PI-RADSv2 were included in the meta-analysis [16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36]. No additional studies were identified by screening the bibliographies of these 21 studies. Among them, 15 studies including 3099 patients dealt with PI-RADSv2 alone, whereas six studies including 758 patients provided a head-to-head comparison between PI-RADSv1 and PI-RADSv2 [16 20 21 28 32 33]. The detailed study selection process is described in Fig. 1.
Patient characteristics are shown in Table 1 . The size of the study population ranged from 49 to 456 patients, with the percentage of those with PCa ranging from 37% to 100%. The patients had a median age of 62–69.6 yr, median PSA of 3.97–15 ng/ml, and a Gleason score ranging from 5 to 10. Patients had already been diagnosed with PCa prior to MRI in all or some of the study populations in seven studies [16 17 22 26 27 32 35] . Biopsy was performed before MRI in seven studies [16 21 22 23 26 27 35] , all patients were biopsy-naïve in three studies [18 25 34] , both patient types were included in three studies [28 32 33] , and data regarding previous biopsy were not reported in the remaining eight studies.
First author (year) | Country | Institution | Duration of patient recruitment | Patients (n) | Patients with PCa (n) | Age, median (yr) | Age, range (yr) | PSA, median (ng/ml) | PSA, range (ng/ml) | Gleason score, median | Gleason score, range | No. of previous biopsies | PCa diagnosis before MRI |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Auer (2016) [16] | Austria | Medical University of Innsbruck, Medizinische Hochschule Hannover | NR | 50 | NR | 63 a | NR | 7.3 a | NR | 7 (3 + 4) | 6–10 | ≥1 | Yes (all) |
Baldisserotto (2016) [17] | Brazil | Pontifícia Universidade Católica do Rio Grande do Sul | 2013.6–2015.6 | 54 | 33 | 65.9 a | 53–81 | 8.4 a | 3–31 | 7 (3 + 4) | 6–9 | NR | Yes (some) |
De Visschere (2017) [18] | Belgium | Ghent University Hospital | 2011.5–2014.12 | 245 | 144 | 66 | 44–85 | 9 | 1.4–935.5 | 7 (3 + 4) | ≥6 | 0 | No |
El-Samei (2016) [19] | Egypt | Al-Azher University, El-Minia University | 2014.5–2015.10 | 55 | 38 | 62 a | 51–79 | NR | NR | 7 | 6–10 | NR | No |
Feng (2016) [20] | China | Tongji Hospital | 2013.6–2015.7 | 401 | 150 | 64.4 a | 34–88 | 10.7 | 0.2–1763 | 7 (4 + 3) | ≤6–≥8 | NR | NR |
Kasel-Seibert (2016) [21] | Germany | University Hospital Jena | 2013.7–2015.3 | 82 | 31 | 65 | 48–81 | 13 | 1–111 | 7 (3 + 4) | 6–9 | 1–5 | No |
Lin (2016) [22] | Brazil, Taiwan | Ribeirão Preto School of Medicine, China Medical University, Ribeirão Preto School of Medicine | 2011.5–2014.6 | 49 | 49 | 63 | 46–73 | 13.27 | 1.8–41.4 | | | ≥1 | Yes (all) |
Martorana (2016) [23] | Italy | University of Modena and Reggio Emilia, Perugia University | 2014.1–2016.2 | 157 | 79 | 65 | 47–79 b | 10.7 | 1–75 | 6 | 6–8 | 1–3 | No |
Mertan (2016) [24] | USA | National Cancer Institute | 2015.3–2015.9 | 62 | 38 | 65.5 | 50.3–76.6 | 7.1 | 0.5–863 | 7 (3 + 4) | 6–9 | NR | NR |
Muller (2015) [25] | USA, the Netherlands | National Cancer Institute, AMC University Hospital, Edward Hébert School of Medicine | 2011.12–2014.5 | 94 | 94 | 62 | 37–79 | 8.51 | 0.7–51.1 | 7 (3 + 4) | 6–10 | 0 | No |
Park (2016) [26] | South Korea | Samsung Medical Center | 2012.1–2014.12 | 456 | 456 | 65 | 46–81 | 3.97 | 3.8–4.2 | NR | ≥6 | ≥1 | Yes (all) |
Park (2016) [27] | South Korea | Yonsei University College of Medicine | 2012.1–2013.3 | 425 | 425 | NR | NR | NR | 1.4–156.9 | 5–6 | 5–10 | ≥1 | Yes (all) |
Polanec (2016) [28] | Austria | Medical University of Vienna, Confraternität Vienna | 2011.6–2015.9 | 65 | NR | 65.3 a | 62.3–87.4 | 10.8 a | 4.2–74.5 | 7 (4 + 3) | 6–9 | ≥0 | No |
Rastinehad (2015) [29] | USA | Icahn School of Medicine at Mount Sinai, Fox Chase Cancer Center, National Institutes of Health, Hofstra North Shore LIJ School of Medicine | 2012.2–2014.11 | 312 | 202 | 65.1 | 60.3–70.3 b | 7.3 | 5.0–11.4 b | NR | NR | NR | No |
Rosenkrantz (2016) [30] | USA | NYU Langone Medical Center | 2013.9–2015.2 | 343 | 134 | 64 | NR | 5.8 | NR | 7 (3 + 4) | 6–9 | NR | NR |
Stanzione (2016) [31] | Italy | University “Federico II”, Ospedale S. Maria delle Grazie | NR | 82 | 34 | 65 a | 43–84 | 8.8 a | NR | 7 (3 + 4) | 6–9 | NR | No |
Tan (2016) [32] | USA, Taiwan | UCLA, Cathay General Hospital, Hoag Hospital | 2013.3–2016.12 | 106 | 63 | 66.5 | 43–79 | 7.9 | 5.6–10.6 | 7 (3 + 4) | 6–10 | ≥0 | Yes (some) |
Tewes (2016) [33] | Germany | Hannover Medical School, Klinikum der Region Hannover | 2012.12–2014.12 | 54 | 31 | 69.6 a | NR | 8.7 a | NR | 6 | 6–9 | ≥0 | No |
Washino (2017) [34] | Japan | Jichi Medical University Saitama Medical Center | 2010.6–2014.4 | 288 | 159 | 69 | 64–74 b | 7.5 | 5.5–11 b | NR | NR | 0 | No |
Woo (2016) [35] | South Korea | Seoul National University College of Medicine | 2011.1–2013.12 | 105 | 105 | 69 | 49–79 | 8.22 | 0.9–44.2 | 7 | 6–7 | ≥1 | Yes (all) |
Zhao (2016) [36] | China | Peking University First Hospital | 2010.11–2013.12 | 372 | 185 | 68.5 a | NR | 15 a | NR | 7 | 6–10 | NR | No |
Characteristics of the studies are summarized in Table 2. MRI was performed using 3-T scanners in 16 studies [16 17 18 20 24 25 26 27 28 29 30 31 32 33 35 36], 1.5-T scanners in four studies [19 21 22 23], and either 3 or 1.5 T in one study [34]. Endorectal coils were used in seven studies [19 21 22 23 24 25 29]. In all studies, the mpMRI protocol consisted of T2WI, DWI, and DCE-MRI. The reference standard was radical prostatectomy in five studies [16 22 26 27 35], a combination of systematic and targeted biopsies in seven studies [17 20 23 25 29 34 36], and only targeted biopsy in seven studies [19 21 24 28 30 32 33]; the reference standard was not consistent throughout the study population in two studies [18 31]. PI-RADSv2 scoring was performed by one to five radiologists, either in consensus or independently. The level of experience of the radiologists was heterogeneous, ranging from 4 to 22 yr of experience in prostate MRI. In most studies, the readers were blinded; however, in three studies, the radiologists were aware that the patients had biopsy-proven PCa [22 26 35], and five studies were not explicit regarding blinding [18 19 30 32 36]. In the majority of the studies, the interval between MRI and the reference standard was less than 6 mo; however, the details were not reported in 10 studies [16 17 18 24 27 28 29 30 32 33]. PCa was separately assessed according to zonal anatomy in seven studies [20 24 25 28 30 31 33]. However, in one study [31], only PCa in the PZ could be evaluated, as no detailed data were provided in the article and the attempt to contact the authors for further information was unsuccessful. Regarding the outcome assessed, eight studies evaluated any cancer [17 19 20 23 25 28 31 33], eight evaluated clinically significant cancer [18 22 26 27 30 34 35 36], and five evaluated both [16 21 24 29 32].
With regard to cutoff values, 13 studies used ≥4 [16 18 19 20 21 24 25 26 27 28 30 31 35], four studies used ≥3 [32 33 34 36], and four studies used both [17 22 23 29]. The location of PCa was reported separately for the PZ and TZ in six studies [20 24 25 28 30 33].
First author (year) | Design | Consecutive enrollment | Reference standard | MRI–reference standard interval | No. of readers | Reader experience (yr) | Blinding | Magnet strength (T) | Vendor | Model | Endorectal coil | PI-RADSv2 application | Cutoff values | Localization | Type of analysis | Outcome assessed | Definition of csPCa |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Auer (2016) [16] | Retrospective | Yes | RP | NR | 1 | >5 | Yes | 3 | Siemens | Skyra | No | Strict | 4 | Whole | Lesion | Any + csPCa | GS ≥7 (4 + 3) |
Baldisserotto (2016) [17] | Retrospective | Yes | STRUSGB + targeted MRI–TRUS biopsy (cognitive) or RP | NR | 2 Independent | >10/1 | Yes | 3 | GE | Signa HDxt | No | Strict | 3, 4 | Whole | Patient | Any | |
De Visschere (2017) [18] | Retrospective | NR | STRUSGB or RP | NR | NR | NR | NR | 3 | Siemens | Trio | No | Strict | 4 | Whole | Patient | csPCa | GS ≥7, ≥0.5 cc, or EPE |
El-Samei (2016) [19] | NR | NR | Targeted MRI–TRUS biopsy | ≤2 wk | NR | NR | NR | 1.5 | Philips | Gyroscan | Yes | Strict | 4 | Whole | Lesion | Any | |
Feng (2016) [20] | Retrospective | Yes | STRUSGB + targeted MRI–TRUS biopsy (cognitive) | 1–35 d | 2 Consensus |
5/4 | Yes | 3 | Siemens | Skyra | No | Strict | 4 | PZ, TZ | Patient | Any | |
Kasel-Seibert (2016) [21] | Retrospective | NR | Targeted MRGB | ≤7 d | 2 Independent |
10/<1 | Yes | 1.5 | Siemens | Avanto | Yes | Strict | 4 | Whole | Lesion | Any + csPCa | GS ≥7 |
Lin (2016) [22] | Retrospective | NR | RP | ≤6 mo | 2 Independent |
12/5 | Yes a | 1.5 | Phillips | Achieva | Yes | Strict | 3, 4 | Whole | Lesion | csPCa | GS ≥7 and >0.5 cc |
Martorana (2016) [23] | Retrospective | Yes | TTB + targeted MRI–TRUS biopsy | 30–78 | 2 Consensus |
4/4 | Yes | 1.5 | Phillips | Achieva | Yes | Strict | 3, 4 | Whole | Lesion | Any | GS ≥7, >0.5 cc, or EPE |
Mertan (2016) [24] | Prospective | Yes | STRUSGB + targeted MRI–TRUS biopsy | NR | 1 | >8 | Yes | 3 | Phillips | Achieva | Yes | Strict | 4 | PZ, TZ | Lesion | Any + csPCa | GS ≥7 |
Muller (2015) [25] | Retrospective | Yes | STRUSGB + targeted MRI–TRUS biopsy | ≤6 wk | 5 Independent |
12/7/1/1/0.5 | Yes | 3 | Phillips | Achieva | Yes | Strict | 4 | PZ, TZ | Lesion | Any | GS ≥7 (4 + 3) |
Park (2016) [26] | Retrospective | Yes | RP | 21–48 | 2 Independent |
14/3 | Yes a | 3 | Phillips | Achieva | No | Strict | 4 | Whole | Patient | csPCa | GS ≥7, ≥0.5 cc, or EPE |
Park (2016) [27] | Retrospective | NR | RP | NR | 2 Independent |
9/4 | Yes | 3 | GE, Philips, Siemens | DiscoveryMR750, Achieva, Trio | No | Strict | 4 | Whole | Patient | csPCa | GS ≥7, ≥0.5 cc, or EPE |
Polanec (2016) [28] | Retrospective | Yes | Targeted MRGB | NR | 2 Independent |
NR/NR | Yes | 3 | Siemens | Trio | No | Strict | 4 | PZ, TZ | Lesion | Any | |
Rastinehad (2015) [29] | Prospective | NR | STRUSGB + targeted MRI–TRUS biopsy | NR | 3 Consensus |
NR | Yes | 3 | Siemens | Verio | Yes | Calculated b | 3, 4 | Whole | Patient | Any + csPCa | Epstein's criteria c for systematic and GS ≥7 or >0.5 cc for targeted biopsy |
Rosenkrantz (2016) [30] | Retrospective | NR | Targeted MRI–TRUS biopsy | NR | 2 Independent |
8/3 | NR | 3 | Siemens | biobraphmMR/Prisma/Skyra/Trio | No | Strict | 4 | PZ, TZ | Lesion | csPCa | GS ≥7 |
Stanzione (2016) [31] | Prospective | NR | STRUSGB (±targeted TRUSGB) | 20–30 d | 2 Independent |
14/10 | Yes | 3 | Siemens | Trio | No | Strict | 4 | PZ | Patient | Any | |
Tan (2016) [32] | Retrospective | Yes | Targeted MRGB | NR | 3 consensus |
16/10 | NR | 3 | Siemens | Skyra, Trio, Verio | No | Strict | 3 | Whole | Lesion | Any + csPCa | GS ≥7 |
Tewes (2016) [33] | Retrospective | Yes | Targeted MRGB | NR | 2 Independent |
5/2 | Yes | 3 | Siemens | Skyra | No | Strict | 3 d | PZ, TZ | Patient | Any | |
Washino (2017) [34] | Retrospective | NR | TTB + targeted MRI–TRUS biopsy (cognitive) | 0.5–1.6 mo | 1 | 14 | Yes | 1.5 or 3 | Toshiba | Excelart Vantage/Vantage Titan 3T | No | Strict | 3 | Whole | Patient | csPCa | GS ≥7 or maximum core length ≥4 mm |
Woo (2016) [35] | Retrospective | NR | RP | ≤6 mo | 2 Consensus |
22/10 | Yes a | 3 | Philips, Siemens | Ingenia, Verio/Trio | No | Strict | 4 | Whole | Patient | csPCa | GS ≥7 |
Zhao (2016) [36] | Retrospective | NR | STRUSGB + targeted MRI–TRUS biopsy | ≤3 mo | 2 Independent |
NR | NR | 3 | NR | NR | No | Strict | 3 | Whole | Patient | csPCa | GS ≥7 |
a Blinded but aware that patients had biopsy-proven PCa.
b PI-RADSv2 scores were generated from reports based on PI-RADSv1 and simplified qualitative system.
c Epstein's criteria = Gleason pattern ≥4, or Gleason 3 + 3 disease with core length ≥50% and/or >2 cores positive.
d For transition zone, cutoff value = 4.
csPCa = clinically significant prostate cancer; GS = Gleason score; EPE = extraprostatic extension; MRGB = magnetic resonance imaging-guided biopsy; MRI = magnetic resonance imaging; MRI–TRUS = fusion of magnetic resonance imaging and transrectal ultrasound images; NR = not reported; PCa = prostate cancer; PI-RADSv2 = Prostate Imaging Reporting and Data System version 2; PZ = peripheral zone; RP = radical prostatectomy; STRUSGB = systematic transrectal ultrasound-guided biopsy; TTB = transperineal template biopsy; TZ = transition zone.
Overall, the quality of the studies was not considered high, mainly owing to the patient selection domain (Fig. 2). In the patient selection domain, there was generally a high risk of bias, as all but four of the studies were retrospective [16 17 18 20 21 22 23 25 26 27 28 30 32 33 34 35 36]. Seven studies were considered to have high concern for applicability, as all or some of the patients had a pathological diagnosis of PCa prior to MRI [16 17 22 26 27 32 35]. In the index test domain, nine studies had a high risk of bias: in three of these, the reviewers were aware that patients had biopsy-proven PCa [22 26 35]; in the other six, the cutoff value for determining PCa was not specified prior to interpretation [16 20 29 31 33 34]. Only one study had concern for applicability, as PI-RADSv2 scores were indirectly generated from existing clinical radiological reports based on PI-RADSv1 or an in-house scoring system [29]. In the reference standard domain, eight studies had a high risk of bias. Seven were based on either systematic biopsy alone or targeted biopsy alone [18 19 21 28 30 32 33]; in one study, targeted biopsy was performed on lesions that were suspicious on ultrasonography rather than on MRI [31].
Studies in which radical prostatectomy or a systematic plus targeted biopsy (MRI guided, MRI–transrectal ultrasound fusion, or cognitive) was used as the reference standard were considered to have a low risk of bias. In 10 studies, the definition of clinically significant cancer did not follow that given in the PI-RADSv2 guidelines, and these studies were therefore judged to have high concern for applicability [16 21 22 24 29 30 32 34 35 36]. In the flow and timing domain, two studies had a high risk of bias, as not all patients received the same reference standard [17 18].
The sensitivity and specificity of individual studies ranged from 73% to 100% and from 8% to 100%, respectively. The Q-test revealed substantial heterogeneity (p < 0.001). The Higgins I² statistic demonstrated substantial heterogeneity in sensitivity (I² = 85.55%) and considerable heterogeneity in specificity (I² = 95.30%). The coupled forest plot of sensitivity and specificity demonstrated the absence of a threshold effect (Fig. 3). The Spearman correlation coefficient between sensitivity and the false-positive rate was 0.45 (95% confidence interval [CI] 0.023–0.738), also indicating the lack of a threshold effect.
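The heterogeneity and threshold-effect statistics can be illustrated with a minimal sketch in plain Python. It assumes per-study proportions (eg, true positives over patients with PCa) transformed to the logit scale with a 0.5 continuity correction and inverse-variance weights; the function names are hypothetical, and the Stata and R modules used in this study may compute these quantities differently. Cochran's Q is the weighted sum of squared deviations from the pooled logit, and I² = max(0, (Q − df)/Q) × 100. Spearman's rho between per-study sensitivity and false-positive rate (1 − specificity) is the usual threshold-effect check; because it is rank based, it is unchanged by a logit transformation.

```python
import math


def logit_with_variance(events, total):
    """Logit of a proportion and its approximate variance (0.5 continuity correction)."""
    x, n = events + 0.5, total + 1.0
    return math.log(x / (n - x)), 1.0 / x + 1.0 / (n - x)


def cochran_q_and_i2(events, totals):
    """Cochran's Q and Higgins' I² (in %) for per-study proportions on the logit scale."""
    pairs = [logit_with_variance(x, n) for x, n in zip(events, totals)]
    weights = [1.0 / v for _, v in pairs]  # inverse-variance weights
    pooled = sum(w * y for w, (y, _) in zip(weights, pairs)) / sum(weights)
    q = sum(w * (y - pooled) ** 2 for w, (y, _) in zip(weights, pairs))
    df = len(pairs) - 1
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return q, i2


def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks


def spearman_rho(xs, ys):
    """Spearman's rho computed as the Pearson correlation of average ranks."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    num = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    den = math.sqrt(sum((a - mx) ** 2 for a in rx) * sum((b - my) ** 2 for b in ry))
    return num / den
```

Given lists `sens` and `spec` of per-study estimates, `spearman_rho(sens, [1 - s for s in spec])` would correspond to the threshold-effect check described above.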
For all 21 studies combined, the pooled sensitivity was 0.89 (95% CI 0.86–0.92) with specificity of 0.73 (95% CI 0.60–0.83; Fig. 3 ). In the HSROC curve, there was a large difference between the 95% confidence region and the 95% prediction region, thus indicating heterogeneity between the studies ( Fig. 4 ). The area under the HSROC curve was 0.91 (95% CI 0.88–0.93). According to the Deeks’ funnel plot, the likelihood of publication bias was low, with a p value of 0.75 for the slope coefficient ( Fig. 5 ).
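Deeks’ test can be sketched as a regression of the log diagnostic odds ratio (lnDOR) on 1/√ESS, weighted by ESS, where ESS = 4·n₁·n₂/(n₁ + n₂) is the effective sample size computed from the numbers of patients with (n₁) and without (n₂) the disease; a slope that differs significantly from zero suggests funnel-plot asymmetry. The minimal Python sketch below assumes a 0.5 continuity correction for the odds ratio; `deeks_asymmetry` is a hypothetical helper and not the `midas` implementation.

```python
import math


def deeks_asymmetry(tables):
    """Deeks' funnel-plot asymmetry regression (sketch).

    tables: one (tp, fp, fn, tn) tuple per study, at least three studies.
    Regresses ln(DOR) on 1/sqrt(ESS) with ESS as weights and returns
    (slope, standard error of slope, t statistic, degrees of freedom).
    """
    if len(tables) < 3:
        raise ValueError("need at least three studies")
    xs, ys, ws = [], [], []
    for tp, fp, fn, tn in tables:
        n_diseased, n_healthy = tp + fn, fp + tn
        ess = 4.0 * n_diseased * n_healthy / (n_diseased + n_healthy)  # effective sample size
        a, b, c, d = tp + 0.5, fp + 0.5, fn + 0.5, tn + 0.5  # continuity correction
        ys.append(math.log((a * d) / (b * c)))  # ln diagnostic odds ratio
        xs.append(1.0 / math.sqrt(ess))
        ws.append(ess)
    sw = sum(ws)
    x_bar = sum(w * x for w, x in zip(ws, xs)) / sw
    y_bar = sum(w * y for w, y in zip(ws, ys)) / sw
    sxx = sum(w * (x - x_bar) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - x_bar) * (y - y_bar) for w, x, y in zip(ws, xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    df = len(tables) - 2
    # Weighted residual variance, then the standard error of the slope
    rss = sum(w * (y - intercept - slope * x) ** 2 for w, x, y in zip(ws, xs, ys))
    se = math.sqrt(rss / df / sxx)
    return slope, se, slope / se, df
```

The two-sided p value for the slope would come from Student's t distribution with `df` degrees of freedom, for example `2 * scipy.stats.t.sf(abs(t), df)` if SciPy is available.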
For the six studies that provided a head-to-head comparison between PI-RADSv1 and PI-RADSv2, PI-RADSv2 demonstrated higher pooled sensitivity of 0.95 (95% CI 0.85–0.98) compared with 0.88 (95% CI 0.80–0.93) for PI-RADSv1 ( p = 0.04). However, the pooled specificity did not show a significant difference between the two versions of PI-RADS: 0.73 (95% CI 0.47–0.89) for v2 and 0.75 (95% CI 0.36–0.94) for v1 ( p = 0.90).
As considerable heterogeneity was found among the included studies, meta-regression analyses were performed (Supplementary Table 1). Among several potential variables, the proportion of patients with PCa, magnetic field strength, and reference standard were significant factors affecting heterogeneity (p < 0.01 for all). However, among these three, only specificity according to the proportion of patients with PCa showed a statistically significant and clinically meaningful difference: 0.65 (95% CI 0.52–0.78) in studies in which ≤50% of patients had PCa versus 0.86 (95% CI 0.75–0.97) in studies in which >50% had PCa. Otherwise, no clinically meaningful differences were seen: for magnet strength (3 vs 1.5 T), sensitivity of 0.90 (95% CI 0.86–0.94) versus 0.89 (95% CI 0.81–0.97, p = 0.03) and specificity of 0.73 (95% CI 0.59–0.86) versus 0.72 (95% CI 0.44–1.00, p = 0.81); for reference standard (radical prostatectomy vs biopsy), sensitivity of 0.89 (95% CI 0.83–0.95) versus 0.91 (95% CI 0.88–0.94, p < 0.01) and specificity of 0.65 (95% CI 0.37–0.94) versus 0.73 (95% CI 0.58–0.87, p = 0.48). Other variables, including the cutoff value, use of an endorectal coil, and type of analysis (per patient vs per lesion), were not significant factors (p = 0.32–0.70).
As four studies used both ≥3 and ≥4 as cutoff values [17 22 23 29] and five studies assessed both any cancer and clinically significant cancer as outcomes [16 21 24 29 32], multiple subgroup analyses were performed to assess various clinical settings (Supplementary Table 2). Regarding cutoff values, the pooled sensitivity was 0.89 (95% CI 0.84–0.92) with specificity of 0.74 (95% CI 0.58–0.85) for 17 studies using ≥4 [16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 35], whereas these were 0.95 (95% CI 0.89–0.97) and 0.47 (95% CI 0.21–0.74) in eight studies using ≥3 [17 22 23 29 32 33 34 36]. When studies were stratified according to the outcome assessed, the results were as follows: (1) cutoff of ≥4 for determining any PCa, sensitivity of 0.89 (95% CI 0.83–0.93) with specificity of 0.80 (95% CI 0.62–0.90); (2) cutoff of ≥3 for determining any PCa, sensitivity of 0.96 (95% CI 0.93–0.98) with specificity of 0.49 (95% CI 0.29–0.70); (3) for determining csPCa regardless of cutoff value, sensitivity of 0.89 (95% CI 0.84–0.92) with specificity of 0.64 (95% CI 0.46–0.78); (4) cutoff of ≥4 for determining csPCa, sensitivity of 0.90 (95% CI 0.85–0.94) with specificity of 0.62 (95% CI 0.45–0.77); and (5) cutoff of ≥3 for determining csPCa, sensitivity of 0.96 (95% CI 0.87–0.99) with specificity of 0.29 (95% CI 0.05–0.77). When the studies using a cutoff value of ≥4 were separately assessed according to the type of analysis, per-patient analysis in eight studies [17 18 20 26 27 29 31 35] yielded pooled sensitivity of 0.89 (95% CI 0.81–0.93) with specificity of 0.76 (95% CI 0.60–0.88), whereas per-lesion analysis in nine studies [16 19 21 22 23 24 25 28 30] yielded pooled sensitivity of 0.87 (95% CI 0.83–0.91) with specificity of 0.70 (95% CI 0.44–0.88).
Based on the localization of PCa, the pooled sensitivity was 0.93 (95% CI 0.87–0.96) with specificity of 0.68 (95% CI 0.43–0.86) in seven studies analyzing PZ cancers [20 24 25 28 30 31 33]. For TZ cancers, which were analyzed in the same studies except that by Stanzione et al [31], the pooled sensitivity and specificity were 0.88 (95% CI 0.77–0.94) and 0.75 (95% CI 0.59–0.86), respectively.
Studies including patients without previous biopsies yielded sensitivity of 0.82 (95% CI 0.72–0.90) and specificity of 0.75 (95% CI 0.65–0.83), whereas the diagnostic performance values were 0.87 (95% CI 0.80–0.92) and 0.71 (95% CI 0.42–0.89) in studies including patients with a history of previous biopsy.
In our meta-analysis, we assessed the diagnostic accuracy of PI-RADSv2 for detecting PCa. The pooled sensitivity and specificity of all 21 studies were 0.89 (95% CI 0.86–0.92) and 0.73 (95% CI 0.60–0.83), respectively. Compared with the only two existing meta-analyses of mpMRI for detecting PCa, our data suggest a trend toward higher sensitivity and lower specificity. In the study by de Rooij et al [37], which evaluated seven studies using a combination of T2WI, DWI, and DCE-MRI, the pooled sensitivity and specificity were 0.74 (95% CI 0.66–0.81) and 0.88 (95% CI 0.82–0.92), respectively. In a more recent meta-analysis by Hamoen et al [6], which analyzed 14 studies using PI-RADSv1, the pooled sensitivity and specificity were 0.78 (95% CI 0.70–0.84) and 0.79 (95% CI 0.68–0.86), respectively. However, a comparison across these three meta-analyses is only indirect. To address this issue, we separately assessed a subgroup of studies using both PI-RADSv1 and PI-RADSv2. In this head-to-head comparison, PI-RADSv2 demonstrated higher pooled sensitivity (0.95) than PI-RADSv1 (0.88, p = 0.04) without a statistically significant difference in specificity (0.73 vs 0.75, p = 0.90). This increase in sensitivity over its predecessor may imply that the revisions undertaken in developing PI-RADSv2, including the introduction of dominant sequences according to zonal anatomy, the limited contribution of DCE-MRI secondary to DWI and T2WI, and specific guidelines for deriving an integrated overall score, were, in fact, on the right track. In particular, we speculate that the use of dominant sequences, that is, DWI for the PZ and T2WI for the TZ, may have been crucial for the improved sensitivity without a loss in specificity, as suggested by Baur et al [10].
Considering that one of the main intentions behind PI-RADS was to standardize the reporting of mpMRI in order to decrease variability and promote widespread acceptance and implementation in daily practice, it was promising that nearly all (20 of 21) studies applied PI-RADSv2 strictly according to the published guidelines [11]. Only one study derived PI-RADSv2 scores from existing clinical radiological reports that were based on PI-RADSv1 or an in-house scoring system [29]. This is an improvement over prior studies using PI-RADSv1, in which investigators used varying methods to determine the overall score (an overall five-point score or the sum of the scores from each sequence) [6]. Still, the cutoff value for detecting PCa needs further clarification. In the studies included in our meta-analysis, cutoff values were predefined in only six studies, while the majority (15/21) were exploratory, testing multiple criteria. A cutoff value of ≥4 yielded generally good sensitivity (0.89) and specificity (0.74), whereas ≥3 yielded excellent sensitivity (0.95) but poor specificity (0.47). These results may be taken into consideration in the next update of PI-RADS. For instance, the former may be adequate for general use of PI-RADS, whereas the latter could be indicated when a higher cancer detection rate is clinically required (eg, a persistently high PSA level despite a previously negative biopsy).
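The trade-off between the two cutoffs can be made concrete with simple arithmetic on the pooled estimates. The sketch below assumes a hypothetical cohort of 100 men with PCa and 100 without and applies the pooled sensitivity and specificity directly, which ignores between-study heterogeneity; `expected_counts` is an illustrative name. On these numbers, a cutoff of ≥4 misses about 11 of 100 cancers and calls about 26 of 100 men without cancer positive, whereas ≥3 misses about 5 but calls about 53 positive.

```python
def expected_counts(sensitivity, specificity, n_with_pca, n_without_pca):
    """Expected 2 x 2 cell counts for a cohort, given a test's sensitivity and specificity."""
    tp = sensitivity * n_with_pca
    fp = (1.0 - specificity) * n_without_pca
    return {"TP": tp, "FN": n_with_pca - tp, "FP": fp, "TN": n_without_pca - fp}


# Pooled estimates for each PI-RADSv2 cutoff: (sensitivity, specificity)
POOLED_BY_CUTOFF = {">=4": (0.89, 0.74), ">=3": (0.95, 0.47)}

if __name__ == "__main__":
    for cutoff, (sens, spec) in POOLED_BY_CUTOFF.items():
        c = expected_counts(sens, spec, n_with_pca=100, n_without_pca=100)
        print(f"PI-RADSv2 {cutoff}: {c['FN']:.0f} of 100 cancers missed, "
              f"{c['FP']:.0f} of 100 men without cancer called positive")
```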
In the current study, subgroup analyses were performed to account for differences in outcomes (any cancer vs clinically significant cancer). There was no significant difference between the two outcomes, whether a cutoff of ≥3 or ≥4 was used. However, the definition of clinically significant cancer varied among the 13 studies that assessed it. Only three studies defined csPCa strictly according to the PI-RADSv2 guidelines (Gleason score ≥7 [3 + 4], volume ≥0.5 ml, or extraprostatic extension) [11]; most others used one or two of the three criteria. Restricting the analysis to these three studies might have provided more robust results; however, including all available studies was pragmatic and provides a general overview of the existing literature, as this is the first meta-analysis of studies dealing with PI-RADSv2.
We also examined technical aspects of MRI. Meta-regression analyses revealed that the use of an endorectal coil was not a statistically significant factor. Furthermore, although magnet strength showed a statistically significant difference between 3 and 1.5 T, the difference was not clinically meaningful (sensitivity of 0.90 vs 0.89, respectively; p = 0.03). Although both issues have been debated in the past, both 3 and 1.5 T are now well established, and the overall benefit of an endorectal coil is not evident [38 39]. The PI-RADSv2 guidelines currently accept either option, and the results of our study provide additional evidence to support this.
Regarding the methods of analysis, the studies differed in reference standard and type of analysis. Radical prostatectomy was the reference standard in five studies, whereas the remaining studies were based on biopsy (systematic, targeted, or both). The possibility of PCa despite negative biopsy results in the latter group should be kept in mind. In addition, roughly half of the studies reported outcomes on a per-patient basis (n = 11) and the rest on a per-lesion basis (n = 10). Per-lesion analysis also takes into account the ability to localize the disease; however, type of analysis was not a significant factor in the meta-regression analysis.
Our meta-analysis had some limitations. Nearly all studies were retrospective, resulting in a high risk of bias for patient selection. Pooling data from predominantly retrospective studies may have led to higher diagnostic sensitivity [40]. However, a meta-analysis restricted to the only three prospective studies was technically unfeasible, and its results would not have been representative of the existing literature on PI-RADSv2. Moreover, we used validated methods for the systematic review and reported the data according to standard reporting guidelines, including PRISMA and the Handbook for Diagnostic Test Accuracy Reviews published by the Cochrane Collaboration [12 41]. Another limitation is the considerable heterogeneity in our pooled analysis, which limits the general applicability of our summary estimates. To explore this heterogeneity, we performed meta-regression and multiple subgroup analyses, which identified the proportion of patients with PCa, the magnetic field strength, and the reference standard as significant factors. In particular, the reference standard included various methods, including radical prostatectomy and a combination of systematic and targeted biopsies (ie, MRI guided, MRI–transrectal ultrasound fusion, or cognitive). The use of various definitions of clinically significant cancer also needs to be emphasized. Our meta-regression and subgroup analyses may explain some of the heterogeneity, but a portion remains unexplained. Another important limitation is the small number of studies available for head-to-head comparison between PI-RADSv1 and PI-RADSv2. Nevertheless, a statistically significant difference in sensitivity between the two versions was demonstrated using the six studies published to date.
PI-RADSv2 shows good performance for the detection of PCa with pooled sensitivity of 0.89 and specificity of 0.73. PI-RADSv2 has higher pooled sensitivity compared with PI-RADSv1 without significantly different specificity.
Author contributions: Sang Youn Kim had full access to all the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis.
Study concept and design: Woo, Suh, S.Y. Kim.
Acquisition of data: Woo, Suh, S.Y. Kim.
Analysis and interpretation of data: Woo, Suh, S.Y. Kim.
Drafting of the manuscript: Woo.
Critical revision of the manuscript for important intellectual content: Suh, S.Y. Kim, Cho, S.H. Kim.
Statistical analysis: Suh.
Obtaining funding: None.
Administrative, technical, or material support: None.
Supervision: S.Y. Kim, Cho, S.H. Kim.
Other: None.
Financial disclosures: Sang Youn Kim certifies that all conflicts of interest, including specific financial interests and relationships and affiliations relevant to the subject matter or materials discussed in the manuscript (eg, employment/affiliation, grants or funding, consultancies, honoraria, stock ownership or options, expert testimony, royalties, or patents filed, received, or pending), are the following: None.
Funding/Support and role of the sponsor: None.
Prostate cancer (PCa) is the second leading cause of cancer-related mortality in Western men [1]. With recent technological advances and growing availability, multiparametric magnetic resonance imaging (mpMRI) is increasingly used to guide several aspects of PCa management, including detection, staging, and treatment planning [2]. Despite abundant evidence in the literature reporting high accuracy of mpMRI for PCa diagnosis, widespread acceptance has been hampered by several factors, including difficulty of interpretation, lack of standardized interpretation criteria (ie, use of Likert scales based on the radiologist's subjective level of suspicion for PCa), and the resulting substantial inter-reader variability [3 4].
To bring standardization to the evaluation and reporting of mpMRI of the prostate, the European Society of Urogenital Radiology (ESUR) published a guideline termed Prostate Imaging Reporting and Data System (PI-RADS) in 2012 [5] . PI-RADS was generated based on expert consensus and provides a detailed scoring system for each MRI sequence (T2-weighted imaging [T2WI], diffusion-weighted imaging [DWI], dynamic contrast-enhanced MRI [DCE-MRI], and MR spectroscopy) for the presence of clinically significant PCa (csPCa). Several investigators have validated the accuracy and reproducibility of PI-RADS, and a recent meta-analysis reported pooled sensitivity and specificity of 0.78 and 0.79, respectively [6] . However, as there was no guideline for the generation of an overall score, different research groups utilized various measures for this purpose—some used a sum of the scores from each sequence (ranging from 3 to 15), whereas others used an overall score of 1–5 [7 8] . Furthermore, emerging data questioned the value of curve-type analysis of DCE-MRI [9] . In addition, investigators suggested that some sequences may be more important in determining the likelihood of PCa (ie, DWI in the peripheral zone [PZ] and T2WI in the transition zone [TZ]) rather than equal weighting for all sequences [10] .
To address these issues, the ESUR and American College of Radiology recently released the updated PI-RADS version 2 (PI-RADSv2) [11]. The main changes from PI-RADSv1 to PI-RADSv2 are the following: (1) introduction of dominant sequences according to zonal anatomy (DWI for the PZ and T2WI for the TZ), (2) a limited contribution of DCE-MRI, which is assessed only as the presence or absence of early focal enhancement, and (3) generation of an overall score (1–5) integrating findings across all MRI sequences.
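The dominant-sequence principle can be sketched as a small scoring function. This is a minimal illustration of the PI-RADSv2 rules for combining sequence scores under stated simplifications, not a substitute for the guideline: in the PZ, the DWI score determines the category and a positive DCE upgrades a DWI score of 3 to 4, whereas in the TZ, the T2WI score determines the category and a DWI score of 5 upgrades a T2WI score of 3 to 4. `pirads_v2_overall` is a hypothetical name, and the guideline's rules for inadequate or missing sequences are omitted.

```python
from typing import Optional


def pirads_v2_overall(zone: str, t2w: int, dwi: int, dce_positive: Optional[bool] = None) -> int:
    """Overall PI-RADSv2 category (1-5) for a lesion from its sequence scores (sketch).

    Peripheral zone (PZ): DWI is dominant; a positive DCE upgrades DWI 3 to 4.
    Transition zone (TZ): T2WI is dominant; DWI 5 upgrades T2WI 3 to 4.
    Rules for inadequate or missing sequences are intentionally omitted.
    """
    if t2w not in range(1, 6) or dwi not in range(1, 6):
        raise ValueError("T2WI and DWI scores must be integers from 1 to 5")
    if zone == "PZ":
        return 4 if dwi == 3 and dce_positive else dwi
    if zone == "TZ":
        return 4 if t2w == 3 and dwi == 5 else t2w
    raise ValueError("zone must be 'PZ' or 'TZ'")
```

For example, `pirads_v2_overall("PZ", t2w=2, dwi=3, dce_positive=True)` returns 4, while the same lesion with a negative DCE remains 3.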
Since then, several studies assessing the value of PI-RADSv2 have been published. However, the diagnostic performance of this new scoring system has not been evaluated systematically. Therefore, the purpose of our study was to assess the diagnostic performance of PI-RADSv2 for the detection of PCa. In addition, we aimed to compare the diagnostic performance of PI-RADSv1 and PI-RADSv2 in studies available for head-to-head comparison.
This meta-analysis was performed and written according to the preferred reporting items for systematic reviews and meta-analyses (PRISMA) guidelines [12] .
A computerized search of MEDLINE and EMBASE up to December 7, 2016, was performed in order to identify studies evaluating the diagnostic performance of PI-RADSv2 for the detection of PCa. The search query combined synonyms for PCa, MRI, and PI-RADS as follows: (prostate cancer OR prostatic cancer OR prostate neoplasm OR prostatic neoplasm OR prostate tumor OR prostatic tumor OR prostate carcinoma OR prostatic carcinoma OR PCa) AND (magnetic resonance imaging OR MRI OR MR) AND (prostate imaging reporting and data system OR pi-rads OR pi rads OR pirads). Bibliographies of identified articles were also screened to expand the scope of search. Our search was limited to publications in English.
Studies were included if they satisfied all the following requirements according to the PICOS criteria [12] : (1) included patients with suspected or diagnosed PCa; (2) for index test, mpMRI of the prostate including all required sequences of T2WI, DWI, and DCE-MRI was performed and assessed with a PI-RADSv2 scoring system; (3) for comparison, a reference standard based on the histopathological examination of radical prostatectomy or biopsy was used; (4) results were reported in sufficient detail for the reconstruction of 2 × 2 tables and determination of sensitivity and specificity at specified cutoff values for evaluating the diagnostic performance of PI-RADSv2; and (5) studies had to be original articles.
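Criterion (4) amounts to dichotomizing the PI-RADSv2 score at a cutoff and cross-tabulating it against the histopathological reference standard. A minimal sketch, assuming per-patient (or per-lesion) scores and binary reference-standard outcomes as input; the function names and toy data are hypothetical.

```python
from typing import Sequence, Tuple


def two_by_two(scores: Sequence[int], has_pca: Sequence[bool], cutoff: int) -> Tuple[int, int, int, int]:
    """Dichotomize PI-RADSv2 scores at `cutoff` (score >= cutoff is test positive).

    Returns (tp, fp, fn, tn) against the reference standard.
    """
    tp = fp = fn = tn = 0
    for score, disease in zip(scores, has_pca):
        positive = score >= cutoff
        if positive and disease:
            tp += 1
        elif positive:
            fp += 1
        elif disease:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def sensitivity_specificity(tp: int, fp: int, fn: int, tn: int) -> Tuple[float, float]:
    """Sensitivity = TP/(TP + FN); specificity = TN/(TN + FP)."""
    return tp / (tp + fn), tn / (tn + fp)
```

With toy scores [5, 4, 3, 2, 4, 1] for three patients with PCa followed by three without, a cutoff of ≥4 gives (tp, fp, fn, tn) = (2, 1, 1, 2), that is, sensitivity and specificity of 2/3 each, whereas ≥3 raises sensitivity to 1.0 at the same specificity.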
Studies were excluded if any of the following criteria were met: (1) studies involving <10 patients; (2) review articles, guidelines, consensus statements, letters, editorials, and conference abstracts; (3) studies using only PI-RADSv1 for the evaluation of mpMRI of the prostate; (4) studies focusing on topics other than using the PI-RADSv2 system for diagnosing PCa (eg, staging and prediction of biochemical recurrence); and (5) studies with overlapping patient populations.
Two reviewers (S.W. and C.H.S., with 3 yr of experience in performing systematic reviews and meta-analyses) independently evaluated the eligibility of the studies identified in the literature search. Any disagreements between the two reviewers were resolved by consensus through discussion with a third reviewer (S.Y.K.).
We extracted the following data regarding study design and results from the selected studies using a standardized form:
Study characteristics—authors, year of publication, country and institution of origin, duration of patient recruitment, and study design (prospective vs retrospective and consecutive or not)
Demographic and clinical characteristics—sample size, number of patients with PCa, patient age, prostate-specific antigen (PSA) level and Gleason score, number of previous biopsies, and PCa diagnosis prior to mpMRI
Technical characteristics of mpMRI—scanner model and manufacturer, magnetic field strength (1.5 vs 3 T), coil type (endorectal vs pelvic phased array), and specific sequences used (T2WI, DWI, DCE, or MR spectroscopy)
Interpretation of mpMRI—number of reviewers and experience in prostate mpMRI, independent or consensus reading, and blinding to clinicopathological information
Reference standard—type of reference standard (radical prostatectomy, targeted biopsy, or systematic biopsy), interval between MRI and pathology, outcomes assessed (any PCa vs csPCa), definition of csPCa (studies assessing “clinically significant”, “aggressive”, or “high-grade” PCa were all considered to assess csPCa; however, only studies that used the definition provided by the PI-RADSv2 guideline [Gleason score ≥7 (3 + 4), volume ≥0.5 ml, or extraprostatic extension] were considered not to have concern for applicability), separate analysis for the PZ and TZ, and type of analysis (per patient vs per lesion)
Diagnostic performance of PI-RADSv2 including criteria or cutoff values (in case of multiple readers, the results of the most experienced reader were extracted for this meta-analysis)
The methodological quality of the included studies was assessed using tailored questionnaires and criteria provided by Quality Assessment of Diagnostic Accuracy Studies-2 [13] . Data extraction and quality assessment were performed independently by two reviewers (S.W. and C.H.S.). All disagreements were resolved by consensus through discussion with the third reviewer (S.Y.K.).
The diagnostic performance of PI-RADSv2 for the detection of PCa was the primary outcome for this meta-analysis. In addition, a comparison between the diagnostic performance of PI-RADSv2 and that of PI-RADSv1 using studies that reported head-to-head comparison data of the two PI-RADS versions was considered a secondary outcome.
Pooled estimates of sensitivity and specificity were calculated using hierarchical logistic regression modeling including bivariate modeling and hierarchical summary receiver operating characteristic (HSROC) modeling [14] . For graphical presentation of the results, an HSROC curve with 95% confidence region and prediction region was plotted. Publication bias was evaluated using the Deeks’ funnel plot, and statistical significance was tested with the Deeks’ asymmetry test [15] .
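The bivariate and HSROC models pool logit sensitivity and logit specificity jointly, allowing for their between-study correlation, and are fitted by maximum likelihood in dedicated modules such as metandi and mada. As a much simpler, univariate illustration that is not the model used in this study, the sketch below pools a single proportion (eg, per-study sensitivity) with the DerSimonian–Laird random-effects estimator on the logit scale, assuming a 0.5 continuity correction; `dersimonian_laird_logit` is a hypothetical helper.

```python
import math


def dersimonian_laird_logit(events, totals, z=1.96):
    """Random-effects (DerSimonian-Laird) pooled proportion with an approximate 95% CI.

    events/totals: per-study counts, eg, true positives and patients with PCa
    for sensitivity. Pooling is done on the logit scale with a 0.5 continuity
    correction. Returns (pooled, lower, upper, tau2).
    """
    if len(events) < 2:
        raise ValueError("need at least two studies")
    ys, vs = [], []
    for x, n in zip(events, totals):
        a, b = x + 0.5, n - x + 0.5      # corrected events and non-events
        ys.append(math.log(a / b))       # logit proportion
        vs.append(1.0 / a + 1.0 / b)     # its approximate within-study variance
    w = [1.0 / v for v in vs]
    y_fixed = sum(wi * yi for wi, yi in zip(w, ys)) / sum(w)
    q = sum(wi * (yi - y_fixed) ** 2 for wi, yi in zip(w, ys))
    c = sum(w) - sum(wi * wi for wi in w) / sum(w)
    tau2 = max(0.0, (q - (len(ys) - 1)) / c)  # between-study variance
    w_re = [1.0 / (v + tau2) for v in vs]
    y_re = sum(wi * yi for wi, yi in zip(w_re, ys)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))

    def expit(t):
        return 1.0 / (1.0 + math.exp(-t))

    return expit(y_re), expit(y_re - z * se), expit(y_re + z * se), tau2
```

Pooling sensitivity and specificity separately in this way ignores their correlation, which is precisely what the bivariate model adds.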
We performed meta-regression analyses to investigate the cause of heterogeneity. The following covariates were considered for the bivariate model: (1) proportion of patients with PCa (>50% vs ≤50%), (2) magnet strength of MRI (3 vs 1.5 T), (3) use of endorectal coil, (4) cutoff value (≥4 vs ≥3), (5) reference standard (radical prostatectomy vs biopsy), and (6) type of analysis (per patient vs per lesion). In addition, multiple subgroup analyses were performed for cutoff value, outcome, and previous biopsy history to assess various clinical settings: (1) a cutoff value of ≥4 for all studies, (2) a cutoff value of ≥3 for all studies, (3) a cutoff value of ≥4 for determining any PCa, (4) a cutoff value of ≥3 for determining any PCa, (5) a cutoff value of ≥4 for determining csPCa, (6) a cutoff value of ≥3 for determining csPCa, (7) a cutoff value of ≥4 in studies using per-patient analysis, (8) a cutoff value of ≥4 in studies using per-lesion analysis, (9) studies analyzing PZ PCa, (10) studies analyzing TZ PCa, (11) patients without previous biopsies, and (12) patients with previous biopsies. The “metandi” and “midas” modules in Stata 10.0 (StataCorp LP, College Station, TX, USA) and “mada” package in R software version 3.2.1 (R Foundation for Statistical Computing, Vienna, Austria) were used for statistical analyses, with p < 0.05 signifying statistical significance.
A systematic literature search initially identified 287 articles. After removing 46 duplicates, screening of the 241 titles and abstracts yielded 105 potentially eligible articles. Full-text reviews were performed, and 84 studies were excluded for the following reasons: not in the field of interest (n = 80, including 68 studies that used only PI-RADSv1), insufficient data to reconstruct 2 × 2 tables (n = 2), and shared study population with other studies (n = 2). Ultimately, 21 original articles including a total of 3857 patients assessing the diagnostic performance of PI-RADSv2 were included in the meta-analysis [16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36]. No additional studies were identified via screening the bibliographies of these 21 studies. Among them, 15 studies including 3099 patients dealt with PI-RADSv2 alone, whereas six studies including 758 patients provided a head-to-head comparison between PI-RADSv1 and PI-RADSv2 [16 20 21 28 32 33]. The detailed study selection process is described in Fig. 1.
Patient characteristics are shown in Table 1 . The size of the study population ranged from 49 to 456 patients, with the percentage of those with PCa ranging from 37% to 100%. The patients had a median age of 62–69.6 yr, median PSA of 3.97–15 ng/ml, and a Gleason score ranging from 5 to 10. Patients had already been diagnosed with PCa prior to MRI in all or some of the study populations in seven studies [16 17 22 26 27 32 35] . Biopsy was performed before MRI in seven studies [16 21 22 23 26 27 35] , all patients were biopsy-naïve in three studies [18 25 34] , both patient types were included in three studies [28 32 33] , and data regarding previous biopsy were not reported in the remaining eight studies.
First author (year) | Country | Institution | Duration of patient recruitment | Patients (n) | Patients with PCa (n) | Age, median (yr) | Age, range (yr) | PSA, median (ng/ml) | PSA, range (ng/ml) | Gleason score, median | Gleason score, range | No. of previous biopsies | PCa diagnosis before MRI |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Auer (2016) [16] | Austria | Medical University of Innsbruck, Medizinische Hochschule Hannover | NR | 50 | NR | 63 a | NR | 7.3 a | NR | 7 (3 + 4) | 6–10 | ≥1 | Yes (all) |
Baldisserotto (2016) [17] | Brazil | Pontifícia Universidade Católica do Rio Grande do Sul | 2013.6–2015.6 | 54 | 33 | 65.9 a | 53–81 | 8.4 a | 3–31 | 7 (3 + 4) | 6–9 | NR | Yes (some) |
De Visschere (2017) [18] | Belgium | Ghent University Hospital | 2011.5–2014.12 | 245 | 144 | 66 | 44–85 | 9 | 1.4–935.5 | 7 (3 + 4) | ≥6 | 0 | No |
El-Samei (2016) [19] | Egypt | Al-Azher University, El-Minia University | 2014.5–2015.10 | 55 | 38 | 62 a | 51–79 | NR | NR | 7 | 6–10 | NR | No |
Feng (2016) [20] | China | Tongji Hospital | 2013.6–2015.7 | 401 | 150 | 64.4 a | 34–88 | 10.7 | 0.2–1763 | 7 (4 + 3) | ≤6–≥8 | NR | NR |
Kasel-Seibert (2016) [21] | Germany | University Hospital Jena | 2013.7–2015.3 | 82 | 31 | 65 | 48–81 | 13 | 1–111 | 7 (3 + 4) | 6–9 | 1–5 | No |
Lin (2016) [22] | Brazil, Taiwan | Ribeirão Preto School of Medicine, China Medical University, Ribeirão Preto School of Medicine | 2011.5–2014.6 | 49 | 49 | 63 | 46–73 | 13.27 | 1.8–41.4 | | | ≥1 | Yes (all) |
Martorana (2016) [23] | Italy | University of Modena and Reggio Emilia, Perugia University | 2014.1–2016.2 | 157 | 79 | 65 | 47–79 b | 10.7 | 1–75 | 6 | 6–8 | 1–3 | No |
Mertan (2016) [24] | USA | National Cancer Institute | 2015.3–2015.9 | 62 | 38 | 65.5 | 50.3–76.6 | 7.1 | 0.5–863 | 7 (3 + 4) | 6–9 | NR | NR |
Muller (2015) [25] | USA, the Netherlands | National Cancer Institute, AMC University Hospital, Edward Hébert School of Medicine | 2011.12–2014.5 | 94 | 94 | 62 | 37–79 | 8.51 | 0.7–51.1 | 7 (3 + 4) | 6–10 | 0 | No |
Park (2016) [26] | South Korea | Samsung Medical Center | 2012.1–2014.12 | 456 | 456 | 65 | 46–81 | 3.97 | 3.8–4.2 | NR | ≥6 | ≥1 | Yes (all) |
Park (2016) [27] | South Korea | Yonsei University College of Medicine | 2012.1–2013.3 | 425 | 425 | NR | NR | NR | 1.4–156.9 | 5–6 | 5–10 | ≥1 | Yes (all) |
Polanec (2016) [28] | Austria | Medical University of Vienna, Confraternität Vienna | 2011.6–2015.9 | 65 | NR | 65.3 a | 62.3–87.4 | 10.8 a | 4.2–74.5 | 7 (4 + 3) | 6–9 | ≥0 | No |
Rastinehad (2015) [29] | USA | Icahn School of Medicine at Mount Sinai, Fox Chase Cancer Center, National Institutes of Health, Hofstra North Shore LIJ School of Medicine | 2012.2–2014.11 | 312 | 202 | 65.1 | 60.3–70.3 b | 7.3 | 5.0–11.4 b | NR | NR | NR | No |
Rosenkrantz (2016) [30] | USA | NYU Langone Medical Center | 2013.9–2015.2 | 343 | 134 | 64 | NR | 5.8 | NR | 7 (3 + 4) | 6–9 | NR | NR |
Stanzione (2016) [31] | Italy | University “Federico II”, Ospedale S. Maria delle Grazie | NR | 82 | 34 | 65 a | 43–84 | 8.8 a | NR | 7 (3 + 4) | 6–9 | NR | No |
Tan (2016) [32] | USA, Taiwan | UCLA, Cathay General Hospital, Hoag Hospital | 2013.3–2016.12 | 106 | 63 | 66.5 | 43–79 | 7.9 | 5.6–10.6 | 7 (3 + 4) | 6–10 | ≥0 | Yes (some) |
Tewes (2016) [33] | Germany | Hannover Medical School, Klinikum der Region Hannover | 2012.12–2014.12 | 54 | 31 | 69.6 a | NR | 8.7 a | NR | 6 | 6–9 | ≥0 | No |
Washino (2017) [34] | Japan | Jichi Medical University Saitama Medical Center | 2010.6–2014.4 | 288 | 159 | 69 | 64–74 b | 7.5 | 5.5–11 b | NR | NR | 0 | No |
Woo (2016) [35] | South Korea | Seoul National University College of Medicine | 2011.1–2013.12 | 105 | 105 | 69 | 49–79 | 8.22 | 0.9–44.2 | 7 | 6–7 | ≥1 | Yes (all) |
Zhao (2016) [36] | China | Peking University First Hospital | 2010.11–2013.12 | 372 | 185 | 68.5 a | NR | 15 a | NR | 7 | 6–10 | NR | No |
Characteristics of the studies are summarized in Table 2. MRI was performed using 3-T scanners in 16 studies [16 17 18 20 24 25 26 27 28 29 30 31 32 33 35 36], 1.5-T scanners in four studies [19 21 22 23], and either 3 or 1.5 T in one study [34]. Endorectal coils were used in seven studies [19 21 22 23 24 25 29]. In all studies, the mpMRI protocol consisted of T2WI, DWI, and DCE-MRI. The reference standard was radical prostatectomy in five studies [16 22 26 27 35], a combination of systematic and targeted biopsies in seven studies [17 20 23 25 29 34 36], and targeted biopsy alone in seven studies [19 21 24 28 30 32 33]; in two studies, the reference standard was not consistent throughout the study population [18 31]. PI-RADSv2 scoring was performed by one to five radiologists, either in consensus or independently. The radiologists' level of experience was heterogeneous, ranging from 4 to 22 yr in prostate mpMRI. In most studies, the readers were blinded; however, in three studies, the radiologists were aware that the patients had biopsy-proven PCa [22 26 35], and five studies were not explicit regarding blinding [18 19 30 32 36]. In the majority of the studies, the interval between MRI and the reference standard was less than 6 mo; however, the details were not reported in 10 studies [16 17 18 24 27 28 29 30 32 33]. PCa was assessed separately according to zonal anatomy in seven studies [20 24 25 28 30 31 33]. However, in one study [31], only PCa in the PZ could be evaluated, as no detailed data were provided in the article and the attempt to contact the authors for further information was unsuccessful. Regarding the outcome assessed, eight studies evaluated any cancer [17 19 20 23 25 28 31 33], eight evaluated clinically significant cancer [18 22 26 27 30 34 35 36], and five evaluated both [16 21 24 29 32].
With regard to cutoff values, 13 studies used ≥4 [16 18 19 20 21 24 25 26 27 28 30 31 35], four studies used ≥3 [32 33 34 36], and four studies used both [17 22 23 29]. The location of PCa was reported separately for the PZ and TZ in six studies [20 24 25 28 30 33].
First author (year) | Design | Consecutive enrollment | Reference standard | MRI–reference standard interval | No. of readers | Experience (yr) | Blinding | Magnet strength (T) | Vendor | Model | Endorectal coil | PI-RADSv2 application | Cutoff values | Localization | Type of analysis | Outcome assessed | Definition of csPCa |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Auer (2016) [16] | Retrospective | Yes | RP | NR | 1 | >5 | Yes | 3 | Siemens | Skyra | No | Strict | 4 | Whole | Lesion | Any + csPCa | GS ≥7 (4 + 3) |
Baldisserotto (2016) [17] | Retrospective | Yes | STRUSGB + targeted MRI–TRUS biopsy (cognitive) or RP | NR | 2 Independent | >10/1 | Yes | 3 | GE | Signa HDxt | No | Strict | 3, 4 | Whole | Patient | Any | |
De Visschere (2017) [18] | Retrospective | NR | STRUSGB or RP | NR | NR | NR | NR | 3 | Siemens | Trio | No | Strict | 4 | Whole | Patient | csPCa | GS ≥7, ≥0.5 cc, or EPE |
El-Samei (2016) [19] | NR | NR | Targeted MRI–TRUS biopsy | ≤2 wk | NR | NR | NR | 1.5 | Philips | Gyroscan | Yes | Strict | 4 | Whole | Lesion | Any | |
Feng (2016) [20] | Retrospective | Yes | STRUSGB + targeted MRI–TRUS biopsy (cognitive) | 1–35 d | 2 Consensus | 5/4 | Yes | 3 | Siemens | Skyra | No | Strict | 4 | PZ, TZ | Patient | Any | |
Kasel-Seibert (2016) [21] | Retrospective | NR | Targeted MRGB | ≤7 d | 2 Independent | 10/<1 | Yes | 1.5 | Siemens | Avanto | Yes | Strict | 4 | Whole | Lesion | Any + csPCa | GS ≥7 |
Lin (2016) [22] | Retrospective | NR | RP | ≤6 mo | 2 Independent | 12/5 | Yes a | 1.5 | Philips | Achieva | Yes | Strict | 3, 4 | Whole | Lesion | csPCa | GS ≥7 and >0.5 cc |
Martorana (2016) [23] | Retrospective | Yes | TTB + targeted MRI–TRUS biopsy | 30–78 | 2 Consensus | 4/4 | Yes | 1.5 | Philips | Achieva | Yes | Strict | 3, 4 | Whole | Lesion | Any | GS ≥7, >0.5 cc, or EPE |
Mertan (2016) [24] | Prospective | Yes | STRUSGB + targeted MRI–TRUS biopsy | NR | 1 | >8 | Yes | 3 | Philips | Achieva | Yes | Strict | 4 | PZ, TZ | Lesion | Any + csPCa | GS ≥7 |
Muller (2015) [25] | Retrospective | Yes | STRUSGB + targeted MRI–TRUS biopsy | ≤6 wk | 5 Independent | 12/7/1/1/0.5 | Yes | 3 | Philips | Achieva | Yes | Strict | 4 | PZ, TZ | Lesion | Any | GS ≥7 (4 + 3) |
Park (2016) [26] | Retrospective | Yes | RP | 21–48 | 2 Independent | 14/3 | Yes a | 3 | Philips | Achieva | No | Strict | 4 | Whole | Patient | csPCa | GS ≥7, ≥0.5 cc, or EPE |
Park (2016) [27] | Retrospective | NR | RP | NR | 2 Independent | 9/4 | Yes | 3 | GE, Philips, Siemens | Discovery MR750, Achieva, Trio | No | Strict | 4 | Whole | Patient | csPCa | GS ≥7, ≥0.5 cc, or EPE |
Polanec (2016) [28] | Retrospective | Yes | Targeted MRGB | NR | 2 Independent | NR/NR | Yes | 3 | Siemens | Trio | No | Strict | 4 | PZ, TZ | Lesion | Any | |
Rastinehad (2015) [29] | Prospective | NR | STRUSGB + targeted MRI–TRUS biopsy | NR | 3 Consensus | NR | Yes | 3 | Siemens | Verio | Yes | Calculated b | 3, 4 | Whole | Patient | Any + csPCa | Epstein's criteria c for systematic and GS ≥7 or >0.5 cc for targeted biopsy |
Rosenkrantz (2016) [30] | Retrospective | NR | Targeted MRI–TRUS biopsy | NR | 2 Independent | 8/3 | NR | 3 | Siemens | Biograph mMR/Prisma/Skyra/Trio | No | Strict | 4 | PZ, TZ | Lesion | csPCa | GS ≥7 |
Stanzione (2016) [31] | Prospective | NR | STRUSGB (±targeted TRUSGB) | 20–30 d | 2 Independent | 14/10 | Yes | 3 | Siemens | Trio | No | Strict | 4 | PZ | Patient | Any | |
Tan (2016) [32] | Retrospective | Yes | Targeted MRGB | NR | 3 Consensus | 16/10 | NR | 3 | Siemens | Skyra, Trio, Verio | No | Strict | 3 | Whole | Lesion | Any + csPCa | GS ≥7 |
Tewes (2016) [33] | Retrospective | Yes | Targeted MRGB | NR | 2 Independent | 5/2 | Yes | 3 | Siemens | Skyra | No | Strict | 3 d | PZ, TZ | Patient | Any | |
Washino (2017) [34] | Retrospective | NR | TTB + targeted MRI–TRUS biopsy (cognitive) | 0.5–1.6 mo | 1 | 14 | Yes | 1.5 or 3 | Toshiba | Excelart Vantage/Vantage Titan 3T | No | Strict | 3 | Whole | Patient | csPCa | GS ≥7 or maximum core length ≥4 mm |
Woo (2016) [35] | Retrospective | NR | RP | ≤6 mo | 2 Consensus | 22/10 | Yes a | 3 | Philips, Siemens | Ingenia, Verio/Trio | No | Strict | 4 | Whole | Patient | csPCa | GS ≥7 |
Zhao (2016) [36] | Retrospective | NR | STRUSGB + targeted MRI–TRUS biopsy | ≤3 mo | 2 Independent | NR | NR | 3 | NR | NR | No | Strict | 3 | Whole | Patient | csPCa | GS ≥7 |
a Blinded but aware that patients had biopsy-proven PCa.
b PI-RADSv2 scores were generated from reports based on PI-RADSv1 and simplified qualitative system.
c Epstein's criteria = Gleason pattern ≥4, or Gleason 3 + 3 disease with core length ≥50% and/or >2 cores positive.
d For transition zone, cutoff value = 4.
csPCa = clinically significant prostate cancer; GS = Gleason score; EPE = extraprostatic extension; MRGB = magnetic resonance imaging-guided biopsy; MRI = magnetic resonance imaging; MRI–TRUS = fusion of magnetic resonance imaging and transrectal ultrasound images; NR = not reported; PCa = prostate cancer; PI-RADSv2 = Prostate Imaging Reporting and Data System version 2; PZ = peripheral zone; RP = radical prostatectomy; STRUSGB = systematic transrectal ultrasound-guided biopsy; TTB = transperineal template biopsy; TZ = transition zone.
Overall, the quality of the studies was not considered high, mainly because of the patient selection domain (Fig. 2). Regarding patient selection, there was generally a high risk of bias, as all but four of the studies were retrospective [16 17 18 20 21 22 23 25 26 27 28 30 32 33 34 35 36]. Seven studies were considered to have high concern for applicability, as all or some of the patients had a pathological diagnosis of PCa prior to MRI [16 17 22 26 27 32 35]. Regarding the index test domain, there was a high risk of bias in nine studies. In three of these, the readers were aware that the patients had biopsy-proven PCa [22 26 35]. In the other six, the cutoff value for determining PCa was not specified prior to interpretation [16 20 29 31 33 34]. Only one study had concern for applicability, as PI-RADSv2 scores were indirectly generated from existing clinical radiological reports based on PI-RADSv1 or an in-house scoring system [29]. Regarding the reference standard domain, eight studies had a high risk of bias. Seven used only systematic biopsy or only targeted biopsy [18 19 21 28 30 32 33]; in one study, targeted biopsy was performed on lesions that were suspicious on ultrasonography rather than on MRI [31].
Studies in which radical prostatectomy or systematic plus targeted biopsy (MRI guided, MRI–transrectal ultrasound fusion, or cognitive) was used as the reference standard were considered to have a low risk of bias. In 10 studies, the definition of clinically significant cancer did not conform to that of the PI-RADSv2 guidelines, and these studies were therefore considered to have high concern for applicability [16 21 22 24 29 30 32 34 35 36]. Regarding the flow and timing domain, two studies had a high risk of bias, as not all patients received the same reference standard [17 18].
The sensitivity and specificity of individual studies ranged from 73% to 100% and from 8% to 100%, respectively. Cochran's Q test revealed substantial heterogeneity (p < 0.001). The Higgins I² statistic indicated substantial heterogeneity in sensitivity (I² = 85.55%) and considerable heterogeneity in specificity (I² = 95.30%). The coupled forest plot of sensitivity and specificity showed no evidence of a threshold effect (Fig. 3). The Spearman correlation coefficient between sensitivity and the false-positive rate was 0.45 (95% confidence interval [CI] 0.023–0.738), also indicating the lack of a threshold effect.
For all 21 studies combined, the pooled sensitivity was 0.89 (95% CI 0.86–0.92) with specificity of 0.73 (95% CI 0.60–0.83; Fig. 3). In the HSROC plot, the 95% prediction region was much larger than the 95% confidence region, indicating heterogeneity between the studies (Fig. 4). The area under the HSROC curve was 0.91 (95% CI 0.88–0.93). Deeks' funnel plot indicated a low likelihood of publication bias, with a p value of 0.75 for the slope coefficient (Fig. 5).
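Deeks' funnel plot asymmetry test regresses the log diagnostic odds ratio (DOR) on 1/√ESS, weighting each study by its effective sample size ESS = 4·n₁·n₀/(n₁ + n₀), where n₁ and n₀ are the numbers of patients with and without cancer; a slope that differs significantly from zero suggests asymmetry. The sketch below is a minimal illustration of that description with hypothetical function names and counts. It stops at the t statistic; a p value could be obtained from a t distribution with k − 2 degrees of freedom (eg, with `scipy.stats.t.sf`).

```python
import math


def log_dor(tp, fp, fn, tn, cc=0.5):
    """Log diagnostic odds ratio with a continuity correction on every cell."""
    return math.log(((tp + cc) * (tn + cc)) / ((fp + cc) * (fn + cc)))


def effective_sample_size(n_diseased, n_nondiseased):
    """Deeks' effective sample size: 4 * n1 * n0 / (n1 + n0)."""
    return 4.0 * n_diseased * n_nondiseased / (n_diseased + n_nondiseased)


def weighted_regression(x, y, w):
    """Weighted least-squares line y = a + b*x; returns (b, a, SE of b)."""
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    intercept = ybar - slope * xbar
    # Weighted residual variance with k - 2 degrees of freedom
    rss = sum(wi * (yi - intercept - slope * xi) ** 2 for wi, xi, yi in zip(w, x, y))
    se = math.sqrt(rss / (len(x) - 2) / sxx)
    return slope, intercept, se


def deeks_test(studies):
    """Regress ln(DOR) on 1/sqrt(ESS), weighted by ESS; return (slope, SE, t)."""
    x, y, w = [], [], []
    for tp, fp, fn, tn in studies:
        ess = effective_sample_size(tp + fn, fp + tn)
        x.append(1.0 / math.sqrt(ess))
        y.append(log_dor(tp, fp, fn, tn))
        w.append(ess)
    slope, _, se = weighted_regression(x, y, w)
    t = slope / se if se > 0 else float("nan")
    return slope, se, t


if __name__ == "__main__":
    # Hypothetical (tp, fp, fn, tn) counts, NOT data from the included studies
    studies = [(45, 6, 5, 20), (30, 12, 10, 40), (88, 2, 4, 9), (20, 5, 3, 30)]
    slope, se, t = deeks_test(studies)
    print(f"slope = {slope:.2f}, SE = {se:.2f}, t = {t:.2f} on {len(studies) - 2} df")
```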
For the six studies that provided a head-to-head comparison between PI-RADSv1 and PI-RADSv2, PI-RADSv2 demonstrated higher pooled sensitivity of 0.95 (95% CI 0.85–0.98) compared with 0.88 (95% CI 0.80–0.93) for PI-RADSv1 ( p = 0.04). However, the pooled specificity did not show a significant difference between the two versions of PI-RADS: 0.73 (95% CI 0.47–0.89) for v2 and 0.75 (95% CI 0.36–0.94) for v1 ( p = 0.90).
As we found considerable heterogeneity among the included studies, meta-regression analyses were performed (Supplementary Table 1). Among several potential variables, proportion of patients with PCa, magnetic field strength, and reference standard were significant factors affecting the heterogeneity (p < 0.01 for all). However, among these three, only the difference in specificity according to the proportion of patients with PCa was both statistically significant and clinically meaningful: 0.65 (95% CI 0.52–0.78) in studies with ≤50% of patients with PCa versus 0.86 (95% CI 0.75–0.97) in studies with >50% of patients with PCa. No clinically meaningful differences were seen for the other two factors: for magnet strength (3 vs 1.5 T), sensitivity was 0.90 (95% CI 0.86–0.94) versus 0.89 (95% CI 0.81–0.97, p = 0.03) and specificity was 0.73 (95% CI 0.59–0.86) versus 0.72 (95% CI 0.44–1.00, p = 0.81); for reference standard (radical prostatectomy vs biopsy), sensitivity was 0.89 (95% CI 0.83–0.95) versus 0.91 (95% CI 0.88–0.94, p < 0.01) and specificity was 0.65 (95% CI 0.37–0.94) versus 0.73 (95% CI 0.58–0.87, p = 0.48). Other variables, including the cutoff value, use of an endorectal coil, and type of analysis (per patient vs per lesion), were not significant factors (p = 0.32–0.70).
As four studies used both ≥3 and ≥4 as cutoff values [17 22 23 29] and five studies assessed both any cancer and clinically significant cancer as outcomes [16 21 24 29 32], multiple subgroup analyses were performed to assess various clinical settings (Supplementary Table 2). Regarding cutoff values, the pooled sensitivity was 0.89 (95% CI 0.84–0.92) with specificity of 0.74 (95% CI 0.58–0.85) for 17 studies using ≥4 [16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 35], whereas these were 0.95 (95% CI 0.89–0.97) and 0.47 (95% CI 0.21–0.74) in eight studies using ≥3 [17 22 23 29 32 33 34 36]. Stratifying the studies according to the outcome assessed yielded the following results: (1) cutoff of ≥4 for determining any PCa, sensitivity of 0.89 (95% CI 0.83–0.93) with specificity of 0.80 (95% CI 0.62–0.90); (2) cutoff of ≥3 for determining any PCa, sensitivity of 0.96 (95% CI 0.93–0.98) with specificity of 0.49 (95% CI 0.29–0.70); (3) csPCa regardless of cutoff value, sensitivity of 0.89 (95% CI 0.84–0.92) with specificity of 0.64 (95% CI 0.46–0.78); (4) cutoff of ≥4 for determining csPCa, sensitivity of 0.90 (95% CI 0.85–0.94) with specificity of 0.62 (95% CI 0.45–0.77); and (5) cutoff of ≥3 for determining csPCa, sensitivity of 0.96 (95% CI 0.87–0.99) with specificity of 0.29 (95% CI 0.05–0.77). When the studies using a cutoff value of ≥4 were assessed separately according to the type of analysis, per-patient analysis in eight studies [17 18 20 26 27 29 31 35] yielded pooled sensitivity of 0.89 (95% CI 0.81–0.93) with specificity of 0.76 (95% CI 0.60–0.88), whereas per-lesion analysis in nine studies [16 19 21 22 23 24 25 28 30] yielded pooled sensitivity of 0.87 (95% CI 0.83–0.91) with specificity of 0.70 (95% CI 0.44–0.88).
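The trade-off between the ≥3 and ≥4 cutoffs follows from how a five-point score is dichotomized: lowering the threshold turns some false negatives into true positives, but also some true negatives into false positives. A minimal sketch with hypothetical per-patient scores (not data from the included studies); the function name is our own:

```python
def accuracy_at_cutoff(scores, has_cancer, cutoff):
    """Dichotomize PI-RADS scores (score >= cutoff is positive).

    Returns ((TP, FP, FN, TN), sensitivity, specificity).
    """
    tp = fp = fn = tn = 0
    for score, cancer in zip(scores, has_cancer):
        positive = score >= cutoff
        if positive and cancer:
            tp += 1
        elif positive:
            fp += 1
        elif cancer:
            fn += 1
        else:
            tn += 1
    return (tp, fp, fn, tn), tp / (tp + fn), tn / (tn + fp)


if __name__ == "__main__":
    # Hypothetical scores and reference-standard results
    scores = [5, 4, 4, 3, 3, 2, 5, 3, 2, 1]
    has_cancer = [True, True, False, True, False, False, True, False, False, False]
    for cutoff in (4, 3):
        table, sens, spec = accuracy_at_cutoff(scores, has_cancer, cutoff)
        print(f">={cutoff}: (TP, FP, FN, TN) = {table}, "
              f"sensitivity {sens:.2f}, specificity {spec:.2f}")
```

With these made-up data, lowering the cutoff from ≥4 to ≥3 raises sensitivity from 0.75 to 1.00 while specificity falls from 0.83 to 0.50, the same direction (though not the same magnitude) as the pooled estimates above.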
Based on the localization of PCa, the pooled sensitivity was 0.93 (95% CI 0.87–0.96) with specificity of 0.68 (95% CI 0.43–0.86) in seven studies analyzing PZ cancers [20 24 25 28 30 31 33]. For TZ cancers, analyzed in the same studies except that by Stanzione et al [31], the pooled sensitivity and specificity were 0.88 (95% CI 0.77–0.94) and 0.75 (95% CI 0.59–0.86), respectively.
Studies including patients without previous biopsies yielded sensitivity of 0.82 (95% CI 0.72–0.90) and specificity of 0.75 (95% CI 0.65–0.83), whereas the diagnostic performance values were 0.87 (95% CI 0.80–0.92) and 0.71 (95% CI 0.42–0.89) in studies including patients with a history of previous biopsy.
In our meta-analysis, we assessed the diagnostic accuracy of PI-RADSv2 for detecting PCa. The pooled sensitivity and specificity of all 21 studies were 0.89 (95% CI 0.86–0.92) and 0.73 (95% CI 0.60–0.83), respectively. Compared with the only two existing meta-analyses of mpMRI for detecting PCa, our data suggest a trend toward higher sensitivity and lower specificity. In the study by de Rooij et al [37], which evaluated seven studies using a combination of T2WI, DWI, and DCE-MRI, the pooled sensitivity and specificity were 0.74 (95% CI 0.66–0.81) and 0.88 (95% CI 0.82–0.92), respectively. In a more recent meta-analysis by Hamoen et al [6], which analyzed 14 studies using PI-RADSv1, the pooled sensitivity and specificity were 0.78 (95% CI 0.70–0.84) and 0.79 (95% CI 0.68–0.86), respectively. However, comparing these three meta-analyses provides only indirect evidence. To address this issue, we separately assessed a subgroup of studies using both PI-RADSv1 and PI-RADSv2. In this head-to-head comparison, PI-RADSv2 demonstrated higher pooled sensitivity (0.95) than PI-RADSv1 (0.88, p = 0.04) without a statistically significant difference in specificity (0.73 vs 0.75, p = 0.90). This gain in sensitivity over its predecessor may imply that the revisions undertaken during the development of PI-RADSv2, including the introduction of dominant sequences according to zonal anatomy, the limited contribution of DCE-MRI secondary to DWI and T2WI, and specific guidelines for deriving an integrated overall score, were in fact on the right track. In particular, we speculate that the use of dominant sequences, that is, DWI for the PZ and T2WI for the TZ, may have been crucial for the improved sensitivity without a loss in specificity, as suggested by Baur et al [10].
Considering that one of the main intentions behind PI-RADS was to standardize reporting of mpMRI in order to decrease variability and promote widespread acceptance and implementation in daily practice, it was promising to find that nearly all (20 of 21) studies applied PI-RADSv2 strictly according to the published guidelines [11]. Only one study derived PI-RADSv2 scores from existing clinical radiological reports that were based on PI-RADSv1 or an in-house scoring system [29]. This is an improvement over prior studies using PI-RADSv1, in which investigators used varying methods to determine the overall score (overall five-point score or sum of the scores from each modality) [6]. Still, the cutoff value for detecting PCa needs further clarification. Among the studies included in our meta-analysis, cutoff values were predefined in only six, while the majority (15/21) were exploratory in nature, testing multiple criteria. A cutoff value of ≥4 yielded generally good sensitivity (0.89) and specificity (0.74), whereas ≥3 yielded excellent sensitivity (0.95) but poor specificity (0.47). These results may be taken into consideration when developing the next update of PI-RADS. For instance, the former may be adequate for general use, whereas the latter could be reserved for situations in which a higher cancer detection rate is clinically required (eg, a persistently high PSA level despite a previously negative biopsy).
In the current study, subgroup analyses were performed to account for differences in outcomes (any cancer vs clinically significant cancer). Diagnostic performance did not differ significantly between the two outcomes, regardless of whether a cutoff of ≥3 or ≥4 was used. However, the definition of clinically significant cancer varied among the 13 studies assessing it. Only three studies defined csPCa strictly according to the PI-RADSv2 guidelines (Gleason score ≥7 [including 3 + 4], volume ≥0.5 ml, or extraprostatic extension) [11]. Most others used one or two of the three criteria. Including only those three studies might have provided more robust results; however, including all available studies was pragmatic and provides a general overview of the existing literature, as this is the first meta-analysis of studies dealing with PI-RADSv2.
In this meta-analysis, we also examined the technical aspects of MRI. Meta-regression analyses revealed that the use of an endorectal coil was not a statistically significant factor. Furthermore, although magnet strength showed a statistically significant difference between 3 and 1.5 T, this difference was not clinically meaningful (sensitivity of 0.90 vs 0.89, respectively; p = 0.03). Although both issues have been debated in the past, 3 and 1.5 T are now both well established, and the overall benefit of an endorectal coil is not evident [38 39]. The PI-RADSv2 guidelines currently accept either option in both respects, and the results of our study provide additional evidence to support this.
The included studies also varied considerably in reference standard and type of analysis. Radical prostatectomy was the reference standard in five studies, while most others were based on biopsy (targeted biopsy alone or a combination of systematic and targeted biopsies). The possibility of PCa despite negative biopsy results in the latter group should be kept in mind. In addition, the studies were roughly evenly divided between per-patient (n = 11) and per-lesion (n = 10) analyses. Per-lesion analysis also takes into account the ability to localize the disease; however, type of analysis was not a significant factor in the meta-regression analysis.
Our meta-analysis had some limitations. Nearly all studies were retrospective, resulting in a high risk of bias for patient selection. Pooling data from predominantly retrospective studies may have inflated the estimated diagnostic sensitivity [40]. However, a meta-analysis restricted to the three prospective studies was technically unfeasible, and its results would not have been representative of the existing literature on PI-RADSv2. In addition, we used validated methods for the systematic review and reported the data according to standard reporting guidelines, including PRISMA and the Handbook for Diagnostic Test Accuracy Reviews published by the Cochrane Collaboration [12 41]. Another limitation is the considerable heterogeneity in our pooled analysis, which may limit the general applicability of our summary estimates. To explore this heterogeneity, we performed meta-regression and multiple subgroup analyses, which identified the proportion of patients with PCa, the magnetic field strength, and the reference standard as significant factors. In particular, the reference standard encompassed various methods, including radical prostatectomy and combinations of systematic and targeted biopsies (ie, MRI guided, MRI–transrectal ultrasound fusion, or cognitive). The use of various definitions of clinically significant cancer also needs to be emphasized. Our meta-regression and subgroup analyses may explain some of the heterogeneity, but a portion remains unexplained. Another important limitation is the small number of studies available for the head-to-head comparison between PI-RADSv1 and PI-RADSv2. Nevertheless, the six studies published to date were sufficient to show a statistically significant difference in sensitivity between the two versions.
PI-RADSv2 shows good performance for the detection of PCa with pooled sensitivity of 0.89 and specificity of 0.73. PI-RADSv2 has higher pooled sensitivity compared with PI-RADSv1 without significantly different specificity.
Author contributions: Sang Youn Kim had full access to all the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis.
Study concept and design: Woo, Suh, S.Y. Kim.
Acquisition of data: Woo, Suh, S.Y. Kim.
Analysis and interpretation of data: Woo, Suh, S.Y. Kim.
Drafting of the manuscript: Woo.
Critical revision of the manuscript for important intellectual content: Suh, S.Y. Kim, Cho, S.H. Kim.
Statistical analysis: Suh.
Obtaining funding: None.
Administrative, technical, or material support: None.
Supervision: S.Y. Kim, Cho, S.H. Kim.
Other: None.
Financial disclosures: Sang Youn Kim certifies that all conflicts of interest, including specific financial interests and relationships and affiliations relevant to the subject matter or materials discussed in the manuscript (eg, employment/affiliation, grants or funding, consultancies, honoraria, stock ownership or options, expert testimony, royalties, or patents filed, received, or pending), are the following: None.
Funding/Support and role of the sponsor: None.
Prostate cancer (PCa) is the second leading cause of cancer-related mortality in Western men [1]. With recent technological advances and growing availability, multiparametric magnetic resonance imaging (mpMRI) is increasingly used to guide several aspects of PCa management, including detection, staging, and treatment planning [2]. Despite abundant evidence in the literature of the high accuracy of mpMRI for PCa diagnosis, widespread acceptance has been hampered by several factors, including difficulty of interpretation, lack of standardized interpretation criteria (ie, use of Likert scales based on the radiologist's subjective level of suspicion for PCa), and the resulting substantial inter-reader variability [3 4].
To bring standardization to the evaluation and reporting of mpMRI of the prostate, the European Society of Urogenital Radiology (ESUR) published a guideline termed Prostate Imaging Reporting and Data System (PI-RADS) in 2012 [5] . PI-RADS was generated based on expert consensus and provides a detailed scoring system for each MRI sequence (T2-weighted imaging [T2WI], diffusion-weighted imaging [DWI], dynamic contrast-enhanced MRI [DCE-MRI], and MR spectroscopy) for the presence of clinically significant PCa (csPCa). Several investigators have validated the accuracy and reproducibility of PI-RADS, and a recent meta-analysis reported pooled sensitivity and specificity of 0.78 and 0.79, respectively [6] . However, as there was no guideline for the generation of an overall score, different research groups utilized various measures for this purpose—some used a sum of the scores from each sequence (ranging from 3 to 15), whereas others used an overall score of 1–5 [7 8] . Furthermore, emerging data questioned the value of curve-type analysis of DCE-MRI [9] . In addition, investigators suggested that some sequences may be more important in determining the likelihood of PCa (ie, DWI in the peripheral zone [PZ] and T2WI in the transition zone [TZ]) rather than equal weighting for all sequences [10] .
To address these issues, the ESUR and the American College of Radiology recently released the updated PI-RADS version 2 (PI-RADSv2) [11]. The main changes from PI-RADSv1 to PI-RADSv2 are the following: (1) introduction of dominant sequences according to zonal anatomy (DWI for the PZ and T2WI for the TZ), (2) a limited contribution of DCE-MRI, which is assessed only as the presence or absence of early focal enhancement, and (3) generation of an overall score (1–5) integrating findings across all MRI sequences.
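The dominant-sequence logic in change (1) can be written as a small decision rule. The sketch below is our simplified reading of the 2015 PI-RADSv2 assessment rules: in the PZ, the DWI score sets the category and a positive DCE-MRI upgrades only a DWI score of 3 to 4; in the TZ, the T2WI score sets the category and a DWI score of 5 upgrades only a T2WI score of 3 to 4, with DCE-MRI playing no role. It deliberately ignores nondiagnostic sequences and later revisions (PI-RADS v2.1), and the function name is hypothetical.

```python
def pirads_v2_overall(zone, t2w, dwi, dce_positive):
    """Overall PI-RADSv2 category (1-5) from per-sequence scores (simplified).

    zone: "PZ" or "TZ"; t2w and dwi: integer scores 1-5;
    dce_positive: True if early focal enhancement is present.
    """
    for score in (t2w, dwi):
        if score not in (1, 2, 3, 4, 5):
            raise ValueError(f"sequence scores must be 1-5, got {score}")
    if zone == "PZ":
        # DWI is dominant; a positive DCE upgrades only a DWI score of 3 to 4
        return 4 if dwi == 3 and dce_positive else dwi
    if zone == "TZ":
        # T2WI is dominant; DWI = 5 upgrades only a T2WI score of 3 to 4 (DCE unused)
        return 4 if t2w == 3 and dwi == 5 else t2w
    raise ValueError(f"zone must be 'PZ' or 'TZ', got {zone!r}")


if __name__ == "__main__":
    print(pirads_v2_overall("PZ", t2w=2, dwi=3, dce_positive=True))  # 4
```

Encoding the rule this way makes explicit why the overall score no longer depends on equal weighting or summation of sequence scores, as in some PI-RADSv1 studies.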
Since then, several studies assessing the value of PI-RADSv2 have been published. However, the diagnostic performance of this new scoring system has not been evaluated systematically. Therefore, the purpose of our study was to assess the diagnostic performance of PI-RADSv2 for the detection of PCa. In addition, we aimed to compare the diagnostic performance of PI-RADSv1 and PI-RADSv2 in studies available for head-to-head comparison.
This meta-analysis was performed and written according to the preferred reporting items for systematic reviews and meta-analyses (PRISMA) guidelines [12] .
A computerized search of MEDLINE and EMBASE up to December 7, 2016, was performed in order to identify studies evaluating the diagnostic performance of PI-RADSv2 for the detection of PCa. The search query combined synonyms for PCa, MRI, and PI-RADS as follows: (prostate cancer OR prostatic cancer OR prostate neoplasm OR prostatic neoplasm OR prostate tumor OR prostatic tumor OR prostate carcinoma OR prostatic carcinoma OR PCa) AND (magnetic resonance imaging OR MRI OR MR) AND (prostate imaging reporting and data system OR pi-rads OR pi rads OR pirads). Bibliographies of identified articles were also screened to expand the search. Our search was limited to publications in English.
Studies were included if they satisfied all the following requirements according to the PICOS criteria [12] : (1) included patients with suspected or diagnosed PCa; (2) for index test, mpMRI of the prostate including all required sequences of T2WI, DWI, and DCE-MRI was performed and assessed with a PI-RADSv2 scoring system; (3) for comparison, a reference standard based on the histopathological examination of radical prostatectomy or biopsy was used; (4) results were reported in sufficient detail for the reconstruction of 2 × 2 tables and determination of sensitivity and specificity at specified cutoff values for evaluating the diagnostic performance of PI-RADSv2; and (5) studies had to be original articles.
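Criterion (4) relies on rebuilding a 2 × 2 table when a study reports only summary accuracy. A common back-calculation, sketched here under the assumption that the numbers of patients (or lesions) with and without cancer are reported, multiplies sensitivity and specificity by those group sizes and rounds to whole counts. The function name and the numbers in the usage example are hypothetical.

```python
import math


def reconstruct_2x2(sensitivity, specificity, n_diseased, n_nondiseased):
    """Back-calculate (TP, FP, FN, TN) from reported accuracy and group sizes.

    Counts are rounded half-up to the nearest integer, so the rebuilt table
    reproduces the reported values only up to rounding.
    """
    if not (0.0 <= sensitivity <= 1.0 and 0.0 <= specificity <= 1.0):
        raise ValueError("sensitivity and specificity must lie in [0, 1]")
    tp = math.floor(sensitivity * n_diseased + 0.5)
    tn = math.floor(specificity * n_nondiseased + 0.5)
    fn = n_diseased - tp
    fp = n_nondiseased - tn
    return tp, fp, fn, tn


if __name__ == "__main__":
    # Hypothetical study: 150 patients with cancer, 251 without
    print(reconstruct_2x2(0.9, 0.8, 150, 251))  # (135, 50, 15, 201)
```

Rounding half-up with `math.floor(x + 0.5)` avoids Python's default round-half-to-even behavior, which can shift a count by one when the product lands exactly on .5.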
Studies were excluded if any of the following criteria were met: (1) studies involving <10 patients; (2) review articles, guidelines, consensus statements, letters, editorials, and conference abstracts; (3) studies using only PI-RADSv1 for the evaluation of mpMRI of the prostate; (4) studies focusing on topics other than using the PI-RADSv2 system for diagnosing PCa (eg, staging and prediction of biochemical recurrence); and (5) studies with overlapping patient populations.
Two reviewers (S.W. and C.H.S., with 3 yr of experience in performing systematic reviews and meta-analyses) independently evaluated the eligibility of the studies identified in the literature search. Any disagreements between the two reviewers were resolved by consensus via discussion with a third reviewer (S.Y.K.).
We extracted the following data regarding study design and results from the selected studies using a standardized form:
Study characteristics—authors, year of publication, country and institution of origin, duration of patient recruitment, and study design (prospective vs retrospective and consecutive or not)
Demographic and clinical characteristics—sample size, number of patients with PCa, patient age, prostate-specific antigen (PSA) level and Gleason score, number of previous biopsies, and PCa diagnosis prior to mpMRI
Technical characteristics of mpMRI—scanner model and manufacturer, magnetic field strength (1.5 vs 3 T), coil type (endorectal vs pelvic phased array), and specific sequences used (T2WI, DWI, DCE, or MR spectroscopy)
Interpretation of mpMRI—number of reviewers and experience in prostate mpMRI, independent or consensus reading, and blinding to clinicopathological information
Reference standard—type of reference standard (radical prostatectomy, targeted biopsy, or systematic biopsy), interval between MRI and pathology, outcomes assessed (any PCa vs csPCa), definition of csPCa (studies assessing “clinically significant”, “aggressive”, or “high-grade” PCa were all considered to assess csPCa; however, only studies that used the definition provided by the PI-RADSv2 guideline [Gleason score ≥7 (including 3 + 4), volume ≥0.5 ml, or extraprostatic extension] were considered to have low concern for applicability), separate analysis for the PZ and TZ, and type of analysis (per patient vs per lesion)
Diagnostic performance of PI-RADSv2 including criteria or cutoff values (in case of multiple readers, the results of the most experienced reader were extracted for this meta-analysis)
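The applicability judgement in the data-extraction list hinges on how csPCa is defined. As a hypothetical illustration (the `Lesion` fields and function names are ours, not taken from any study), the sketch below encodes the PI-RADSv2 definition (Gleason score ≥7 including 3 + 4, volume ≥0.5 ml, or extraprostatic extension) alongside the narrower "Gleason score ≥7 only" definition used by several included studies, showing how the same lesion can be classified differently.

```python
from dataclasses import dataclass


@dataclass
class Lesion:
    """Hypothetical pathology record for one tumour focus."""
    gleason_primary: int             # e.g. 3 in "3 + 4"
    gleason_secondary: int           # e.g. 4 in "3 + 4"
    volume_ml: float                 # tumour volume in ml (= cc)
    extraprostatic_extension: bool


def cspca_pirads_v2(lesion: Lesion) -> bool:
    """csPCa per PI-RADSv2: Gleason score >= 7 (including 3 + 4),
    and/or volume >= 0.5 ml, and/or extraprostatic extension."""
    return (
        lesion.gleason_primary + lesion.gleason_secondary >= 7
        or lesion.volume_ml >= 0.5
        or lesion.extraprostatic_extension
    )


def cspca_gleason_only(lesion: Lesion) -> bool:
    """Narrower definition used by some studies: Gleason score >= 7 only."""
    return lesion.gleason_primary + lesion.gleason_secondary >= 7


# A large Gleason 3 + 3 focus: significant under PI-RADSv2, not under "GS >= 7 only"
large_gs6 = Lesion(3, 3, volume_ml=0.6, extraprostatic_extension=False)
print(cspca_pirads_v2(large_gs6), cspca_gleason_only(large_gs6))  # True False
```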
The methodological quality of the included studies was assessed using tailored questionnaires and criteria provided by the Quality Assessment of Diagnostic Accuracy Studies-2 (QUADAS-2) tool [13]. Data extraction and quality assessment were performed independently by two reviewers (S.W. and C.H.S.). All disagreements were resolved by consensus through discussion with the third reviewer (S.Y.K.).
The diagnostic performance of PI-RADSv2 for the detection of PCa was the primary outcome for this meta-analysis. In addition, a comparison between the diagnostic performance of PI-RADSv2 and that of PI-RADSv1 using studies that reported head-to-head comparison data of the two PI-RADS versions was considered a secondary outcome.
Pooled estimates of sensitivity and specificity were calculated using hierarchical logistic regression modeling, including bivariate modeling and hierarchical summary receiver operating characteristic (HSROC) modeling [14]. For graphical presentation of the results, an HSROC curve with 95% confidence and prediction regions was plotted. Publication bias was evaluated using Deeks' funnel plot, and statistical significance was tested with Deeks' asymmetry test [15].
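The review pooled sensitivity and specificity jointly with bivariate and HSROC models, which account for the correlation between the two measures across studies. Those models require mixed-model fitting and are not reproduced here. As a deliberately simpler stand-in, the sketch below pools logit-transformed sensitivity alone with a DerSimonian–Laird univariate random-effects model; it is not the authors' method and ignores the sensitivity–specificity correlation. The 0.5 continuity correction and the example counts are assumptions.

```python
import math


def inv_logit(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def dersimonian_laird(estimates, variances):
    """Univariate random-effects pooling; returns (pooled, se, tau2)."""
    w = [1.0 / v for v in variances]
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, estimates)) / sum_w
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, estimates))
    df = len(estimates) - 1
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    # Method-of-moments between-study variance, truncated at zero
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    w_star = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_star, estimates)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    return pooled, se, tau2


def pooled_sensitivity(tp_fn_pairs, correction=0.5):
    """Pool per-study sensitivity on the logit scale; returns (sens, lo, hi, tau2)."""
    logits, variances = [], []
    for tp, fn in tp_fn_pairs:
        # 0.5 is added to both cells of every study (assumed) so zero cells stay finite
        a, b = tp + correction, fn + correction
        logits.append(math.log(a / b))
        variances.append(1.0 / a + 1.0 / b)
    pooled, se, tau2 = dersimonian_laird(logits, variances)
    return (inv_logit(pooled), inv_logit(pooled - 1.96 * se),
            inv_logit(pooled + 1.96 * se), tau2)


# Hypothetical (TP, FN) counts for three studies
print(pooled_sensitivity([(95, 5), (60, 40), (80, 20)]))
```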
We performed meta-regression analyses to investigate the cause of heterogeneity. The following covariates were considered for the bivariate model: (1) proportion of patients with PCa (>50% vs ≤50%), (2) magnet strength of MRI (3 vs 1.5 T), (3) use of endorectal coil, (4) cutoff value (≥4 vs ≥3), (5) reference standard (radical prostatectomy vs biopsy), and (6) type of analysis (per patient vs per lesion). In addition, multiple subgroup analyses were performed for cutoff value, outcome, and previous biopsy history to assess various clinical settings: (1) a cutoff value of ≥4 for all studies, (2) a cutoff value of ≥3 for all studies, (3) a cutoff value of ≥4 for determining any PCa, (4) a cutoff value of ≥3 for determining any PCa, (5) a cutoff value of ≥4 for determining csPCa, (6) a cutoff value of ≥3 for determining csPCa, (7) a cutoff value of ≥4 in studies using per-patient analysis, (8) a cutoff value of ≥4 in studies using per-lesion analysis, (9) studies analyzing PZ PCa, (10) studies analyzing TZ PCa, (11) patients without previous biopsies, and (12) patients with previous biopsies. The “metandi” and “midas” modules in Stata 10.0 (StataCorp LP, College Station, TX, USA) and “mada” package in R software version 3.2.1 (R Foundation for Statistical Computing, Vienna, Austria) were used for statistical analyses, with p < 0.05 signifying statistical significance.
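The covariates above were entered into a bivariate model. To make the underlying idea concrete, the sketch below shows a much simpler univariate, fixed-effect (inverse-variance) meta-regression of a logit-scale accuracy measure on one binary covariate, such as 3 T versus 1.5 T. With a single binary covariate, the regression coefficient reduces to the difference between the two inverse-variance-weighted subgroup means. The function name and inputs are hypothetical, and this is not the model the authors fitted.

```python
import math


def binary_meta_regression(estimates, variances, covariate):
    """Fixed-effect meta-regression on one 0/1 covariate.

    Returns (beta, se, two_sided_p), where beta is the difference in
    inverse-variance-weighted means (covariate = 1 minus covariate = 0).
    """
    sum_w = {0: 0.0, 1: 0.0}
    sum_wy = {0: 0.0, 1: 0.0}
    for y, v, x in zip(estimates, variances, covariate):
        w = 1.0 / v
        sum_w[x] += w
        sum_wy[x] += w * y
    if sum_w[0] == 0 or sum_w[1] == 0:
        raise ValueError("both covariate levels need at least one study")
    beta = sum_wy[1] / sum_w[1] - sum_wy[0] / sum_w[0]
    se = math.sqrt(1.0 / sum_w[0] + 1.0 / sum_w[1])
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p value
    return beta, se, p


# Hypothetical logit-sensitivities, variances, and a 3 T (1) vs 1.5 T (0) flag
print(binary_meta_regression([1.8, 2.1, 2.4, 2.2], [0.10, 0.08, 0.12, 0.09], [0, 0, 1, 1]))
```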
A systematic literature search initially identified 287 articles. After removing 46 duplicates, screening of the 241 titles and abstracts yielded 105 potentially eligible articles. Full-text reviews were performed, and 84 studies were excluded for the following reasons: not in the field of interest (n = 80, including 68 studies that used only PI-RADSv1), insufficient data to reconstruct 2 × 2 tables (n = 2), and shared study population with other studies (n = 2). Ultimately, 21 original articles including a total of 3857 patients assessing the diagnostic performance of PI-RADSv2 were included in the meta-analysis [16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36]. No additional studies were identified by screening the bibliographies of these 21 studies. Among them, 15 studies including 3099 patients dealt with PI-RADSv2 alone, whereas six studies including 758 patients provided a head-to-head comparison between PI-RADSv1 and PI-RADSv2 [16 20 21 28 32 33]. The detailed study selection process is described in Fig. 1.
Patient characteristics are shown in Table 1. The size of the study population ranged from 49 to 456 patients, with the percentage of those with PCa ranging from 37% to 100%. The patients had a median age of 62–69.6 yr, median PSA of 3.97–15 ng/ml, and a Gleason score ranging from 5 to 10. Patients had already been diagnosed with PCa prior to MRI in all or some of the study population in seven studies [16 17 22 26 27 32 35]. Biopsy was performed before MRI in seven studies [16 21 22 23 26 27 35], all patients were biopsy-naïve in three studies [18 25 34], both patient types were included in three studies [28 32 33], and data regarding previous biopsy were not reported in the remaining eight studies.
First author (year) | Country | Institution | Duration of patient recruitment | Patients (n) | Patients with PCa (n) | Age, median (yr) | Age, range (yr) | PSA, median (ng/ml) | PSA, range (ng/ml) | Gleason score, median | Gleason score, range | No. of previous biopsies | PCa diagnosis before MRI |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Auer (2016) [16] | Austria | Medical University of Innsbruck, Medizinische Hochschule Hannover | NR | 50 | NR | 63 a | NR | 7.3 a | NR | 7 (3 + 4) | 6–10 | ≥1 | Yes (all) |
Baldisserotto (2016) [17] | Brazil | Pontifícia Universidade Católica do Rio Grande do Sul | 2013.6–2015.6 | 54 | 33 | 65.9 a | 53–81 | 8.4 a | 3–31 | 7 (3 + 4) | 6–9 | NR | Yes (some) |
De Visschere (2017) [18] | Belgium | Ghent University Hospital | 2011.5–2014.12 | 245 | 144 | 66 | 44–85 | 9 | 1.4–935.5 | 7 (3 + 4) | ≥6 | 0 | No |
El-Samei (2016) [19] | Egypt | Al-Azher University, El-Minia University | 2014.5–2015.10 | 55 | 38 | 62 a | 51–79 | NR | NR | 7 | 6–10 | NR | No |
Feng (2016) [20] | China | Tongji Hospital | 2013.6–2015.7 | 401 | 150 | 64.4 a | 34–88 | 10.7 | 0.2–1763 | 7 (4 + 3) | ≤6–≥8 | NR | NR |
Kasel-Seibert (2016) [21] | Germany | University Hospital Jena | 2013.7–2015.3 | 82 | 31 | 65 | 48–81 | 13 | 1–111 | 7 (3 + 4) | 6–9 | 1–5 | No |
Lin (2016) [22] | Brazil, Taiwan | Ribeirão Preto School of Medicine, China Medical University, Ribeirão Preto School of Medicine | 2011.5–2014.6 | 49 | 49 | 63 | 46–73 | 13.27 | 1.8–41.4 |  |  | ≥1 | Yes (all) |
Martorana (2016) [23] | Italy | University of Modena and Reggio Emilia, Perugia University | 2014.1–2016.2 | 157 | 79 | 65 | 47–79 b | 10.7 | 1–75 | 6 | 6–8 | 1–3 | No |
Mertan (2016) [24] | USA | National Cancer Institute | 2015.3–2015.9 | 62 | 38 | 65.5 | 50.3–76.6 | 7.1 | 0.5–863 | 7 (3 + 4) | 6–9 | NR | NR |
Muller (2015) [25] | USA, the Netherlands | National Cancer Institute, AMC University Hospital, Edward Hébert School of Medicine | 2011.12–2014.5 | 94 | 94 | 62 | 37–79 | 8.51 | 0.7–51.1 | 7 (3 + 4) | 6–10 | 0 | No |
Park (2016) [26] | South Korea | Samsung Medical Center | 2012.1–2014.12 | 456 | 456 | 65 | 46–81 | 3.97 | 3.8–4.2 | NR | ≥6 | ≥1 | Yes (all) |
Park (2016) [27] | South Korea | Yonsei University College of Medicine | 2012.1–2013.3 | 425 | 425 | NR | NR | NR | 1.4–156.9 | 5–6 | 5–10 | ≥1 | Yes (all) |
Polanec (2016) [28] | Austria | Medical University of Vienna, Confraternität Vienna | 2011.6–2015.9 | 65 | NR | 65.3 a | 62.3–87.4 | 10.8 a | 4.2–74.5 | 7 (4 + 3) | 6–9 | ≥0 | No |
Rastinehad (2015) [29] | USA | Icahn School of Medicine at Mount Sinai, Fox Chase Cancer Center, National Institutes of Health, Hofstra North Shore LIJ School of Medicine | 2012.2–2014.11 | 312 | 202 | 65.1 | 60.3–70.3 b | 7.3 | 5.0–11.4 b | NR | NR | NR | No |
Rosenkrantz (2016) [30] | USA | NYU Langone Medical Center | 2013.9–2015.2 | 343 | 134 | 64 | NR | 5.8 | NR | 7 (3 + 4) | 6–9 | NR | NR |
Stanzione (2016) [31] | Italy | University “Federico II”, Ospedale S. Maria delle Grazie | NR | 82 | 34 | 65 a | 43–84 | 8.8 a | NR | 7 (3 + 4) | 6–9 | NR | No |
Tan (2016) [32] | USA, Taiwan | UCLA, Cathay General Hospital, Hoag Hospital | 2013.3–2016.12 | 106 | 63 | 66.5 | 43–79 | 7.9 | 5.6–10.6 | 7 (3 + 4) | 6–10 | ≥0 | Yes (some) |
Tewes (2016) [33] | Germany | Hannover Medical School, Klinikum der Region Hannover | 2012.12–2014.12 | 54 | 31 | 69.6 a | NR | 8.7 a | NR | 6 | 6–9 | ≥0 | No |
Washino (2017) [34] | Japan | Jichi Medical University Saitama Medical Center | 2010.6–2014.4 | 288 | 159 | 69 | 64–74 b | 7.5 | 5.5–11 b | NR | NR | 0 | No |
Woo (2016) [35] | South Korea | Seoul National University College of Medicine | 2011.1–2013.12 | 105 | 105 | 69 | 49–79 | 8.22 | 0.9–44.2 | 7 | 6–7 | ≥1 | Yes (all) |
Zhao (2016) [36] | China | Peking University First Hospital | 2010.11–2013.12 | 372 | 185 | 68.5 a | NR | 15 a | NR | 7 | 6–10 | NR | No |
Characteristics of the studies are summarized in Table 2. MRI was performed using 3-T scanners in 16 studies [16 17 18 20 24 25 26 27 28 29 30 31 32 33 35 36], 1.5-T scanners in four studies [19 21 22 23], and either 3 or 1.5 T in one study [34]. Endorectal coils were used in seven studies [19 21 22 23 24 25 29]. In all studies, the mpMRI protocol consisted of T2WI, DWI, and DCE-MRI. The reference standard was radical prostatectomy in five studies [16 22 26 27 35], a combination of systematic and targeted biopsies in seven studies [17 20 23 25 29 34 36], and targeted biopsy alone in seven studies [19 21 24 28 30 32 33]; in two studies, the reference standard was not consistent across the study population [18 31]. PI-RADSv2 scoring was performed by one to five radiologists, either in consensus or independently. The radiologists' experience in prostate MRI was heterogeneous, ranging from 4 to 22 yr. In most studies, the readers were blinded; however, in three studies, the radiologists were aware that the patients had biopsy-proven PCa [22 26 35], and five studies did not state whether readers were blinded [18 19 30 32 36]. In the majority of the studies, the interval between MRI and the reference standard was less than 6 mo; however, this interval was not reported in 10 studies [16 17 18 24 27 28 29 30 32 33]. PCa was assessed separately according to zonal anatomy in seven studies [20 24 25 28 30 31 33]. However, in one study [31], only PCa in the PZ could be evaluated, as no detailed data were provided in the article and our attempt to contact the authors for further information was unsuccessful. Regarding the outcome assessed, eight studies evaluated any cancer [17 19 20 23 25 28 31 33], eight evaluated clinically significant cancer [18 22 26 27 30 34 35 36], and five evaluated both [16 21 24 29 32].
With regard to cutoff values, 13 studies used ≥4 [16 18 19 20 21 24 25 26 27 28 30 31 35], four studies used ≥3 [32 33 34 36], and four studies used both [17 22 23 29]. The location of PCa was reported separately for the PZ and TZ in six studies [20 24 25 28 30 33].
First author (year) | Design | Consecutive enrollment | Reference standard | MRI–reference standard interval | No. of readers | Reader experience (yr) | Blinding | Magnet strength (T) | Vendor | Model | Endorectal coil | PI-RADSv2 application | Cutoff values | Localization | Type of analysis | Outcome assessed | Definition of csPCa |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Auer (2016) [16] | Retrospective | Yes | RP | NR | 1 | >5 | Yes | 3 | Siemens | Skyra | No | Strict | 4 | Whole | Lesion | Any + csPCa | GS ≥7 (4 + 3) |
Baldisserotto (2016) [17] | Retrospective | Yes | STRUSGB + targeted MRI–TRUS biopsy (cognitive) or RP | NR | 2 Independent | >10/1 | Yes | 3 | GE | Signa HDxt | No | Strict | 3, 4 | Whole | Patient | Any | |
De Visschere (2017) [18] | Retrospective | NR | STRUSGB or RP | NR | NR | NR | NR | 3 | Siemens | Trio | No | Strict | 4 | Whole | Patient | csPCa | GS ≥7, ≥0.5 cc, or EPE |
El-Samei (2016) [19] | NR | NR | Targeted MRI–TRUS biopsy | ≤2 wk | NR | NR | NR | 1.5 | Philips | Gyroscan | Yes | Strict | 4 | Whole | Lesion | Any | |
Feng (2016) [20] | Retrospective | Yes | STRUSGB + targeted MRI–TRUS biopsy (cognitive) | 1–35 d | 2 Consensus | 5/4 | Yes | 3 | Siemens | Skyra | No | Strict | 4 | PZ, TZ | Patient | Any | |
Kasel-Seibert (2016) [21] | Retrospective | NR | Targeted MRGB | ≤7 d | 2 Independent | 10/<1 | Yes | 1.5 | Siemens | Avanto | Yes | Strict | 4 | Whole | Lesion | Any + csPCa | GS ≥7 |
Lin (2016) [22] | Retrospective | NR | RP | ≤6 mo | 2 Independent | 12/5 | Yes a | 1.5 | Philips | Achieva | Yes | Strict | 3, 4 | Whole | Lesion | csPCa | GS ≥7 and >0.5 cc |
Martorana (2016) [23] | Retrospective | Yes | TTB + targeted MRI–TRUS biopsy | 30–78 | 2 Consensus | 4/4 | Yes | 1.5 | Philips | Achieva | Yes | Strict | 3, 4 | Whole | Lesion | Any | GS ≥7, >0.5 cc, or EPE |
Mertan (2016) [24] | Prospective | Yes | STRUSGB + targeted MRI–TRUS biopsy | NR | 1 | >8 | Yes | 3 | Philips | Achieva | Yes | Strict | 4 | PZ, TZ | Lesion | Any + csPCa | GS ≥7 |
Muller (2015) [25] | Retrospective | Yes | STRUSGB + targeted MRI–TRUS biopsy | ≤6 wk | 5 Independent | 12/7/1/1/0.5 | Yes | 3 | Philips | Achieva | Yes | Strict | 4 | PZ, TZ | Lesion | Any | GS ≥7 (4 + 3) |
Park (2016) [26] | Retrospective | Yes | RP | 21–48 | 2 Independent | 14/3 | Yes a | 3 | Philips | Achieva | No | Strict | 4 | Whole | Patient | csPCa | GS ≥7, ≥0.5 cc, or EPE |
Park (2016) [27] | Retrospective | NR | RP | NR | 2 Independent | 9/4 | Yes | 3 | GE, Philips, Siemens | Discovery MR750, Achieva, Trio | No | Strict | 4 | Whole | Patient | csPCa | GS ≥7, ≥0.5 cc, or EPE |
Polanec (2016) [28] | Retrospective | Yes | Targeted MRGB | NR | 2 Independent | NR/NR | Yes | 3 | Siemens | Trio | No | Strict | 4 | PZ, TZ | Lesion | Any | |
Rastinehad (2015) [29] | Prospective | NR | STRUSGB + targeted MRI–TRUS biopsy | NR | 3 Consensus | NR | Yes | 3 | Siemens | Verio | Yes | Calculated b | 3, 4 | Whole | Patient | Any + csPCa | Epstein's criteria c for systematic and GS ≥7 or >0.5 cc for targeted biopsy |
Rosenkrantz (2016) [30] | Retrospective | NR | Targeted MRI–TRUS biopsy | NR | 2 Independent | 8/3 | NR | 3 | Siemens | Biograph mMR/Prisma/Skyra/Trio | No | Strict | 4 | PZ, TZ | Lesion | csPCa | GS ≥7 |
Stanzione (2016) [31] | Prospective | NR | STRUSGB (±targeted TRUSGB) | 20–30 d | 2 Independent | 14/10 | Yes | 3 | Siemens | Trio | No | Strict | 4 | PZ | Patient | Any | |
Tan (2016) [32] | Retrospective | Yes | Targeted MRGB | NR | 3 Consensus | 16/10 | NR | 3 | Siemens | Skyra, Trio, Verio | No | Strict | 3 | Whole | Lesion | Any + csPCa | GS ≥7 |
Tewes (2016) [33] | Retrospective | Yes | Targeted MRGB | NR | 2 Independent | 5/2 | Yes | 3 | Siemens | Skyra | No | Strict | 3 d | PZ, TZ | Patient | Any | |
Washino (2017) [34] | Retrospective | NR | TTB + targeted MRI–TRUS biopsy (cognitive) | 0.5–1.6 mo | 1 | 14 | Yes | 1.5 or 3 | Toshiba | Excelart Vantage/Vantage Titan 3T | No | Strict | 3 | Whole | Patient | csPCa | GS ≥7 or maximum core length ≥4 mm |
Woo (2016) [35] | Retrospective | NR | RP | ≤6 mo | 2 Consensus | 22/10 | Yes a | 3 | Philips, Siemens | Ingenia, Verio/Trio | No | Strict | 4 | Whole | Patient | csPCa | GS ≥7 |
Zhao (2016) [36] | Retrospective | NR | STRUSGB + targeted MRI–TRUS biopsy | ≤3 mo | 2 Independent | NR | NR | 3 | NR | NR | No | Strict | 3 | Whole | Patient | csPCa | GS ≥7 |
a Blinded but aware that patients had biopsy-proven PCa.
b PI-RADSv2 scores were generated from reports based on PI-RADSv1 and simplified qualitative system.
c Epstein's criteria = Gleason pattern ≥4, or Gleason 3 + 3 disease with core length ≥50% and/or >2 cores positive.
d For transition zone, cutoff value = 4.
csPCa = clinically significant prostate cancer; GS = Gleason score; EPE = extraprostatic extension; MRGB = magnetic resonance imaging-guided biopsy; MRI = magnetic resonance imaging; MRI–TRUS = fusion of magnetic resonance imaging and transrectal ultrasound images; NR = not reported; PCa = prostate cancer; PI-RADSv2 = Prostate Imaging Reporting and Data System version 2; PZ = peripheral zone; RP = radical prostatectomy; STRUSGB = systematic transrectal ultrasound-guided biopsy; TTB = transperineal template biopsy; TZ = transition zone.
Overall, the methodological quality of the studies was not high, mainly because of the patient selection domain (Fig. 2). In the patient selection domain, the risk of bias was generally high because all but four of the studies were retrospective [16 17 18 20 21 22 23 25 26 27 28 30 32 33 34 35 36]. Seven studies were considered to have high concern for applicability, as all or some of the patients had a pathological diagnosis of PCa prior to MRI [16 17 22 26 27 32 35]. In the index test domain, nine studies had a high risk of bias. In three of these nine studies, reviewers were aware that patients had biopsy-proven PCa [22 26 35]. In the other six studies, the cutoff value for determining PCa was not specified prior to interpretation [16 20 29 31 33 34]. Only one study had concern for applicability, as its PI-RADSv2 scores were indirectly generated from existing clinical radiology reports based on PI-RADSv1 or an in-house scoring system [29]. In the reference standard domain, eight studies had a high risk of bias. Seven were based on systematic biopsy alone or targeted biopsy alone [18 19 21 28 30 32 33]; in one study, targeted biopsy was performed, but on lesions that were suspicious on ultrasonography rather than on MRI [31].
Studies in which radical prostatectomy or a systematic plus targeted biopsy (MRI guided, MRI–transrectal ultrasound fusion, or cognitive) was used as the reference standard were considered to have a low risk of bias. In 10 studies, the definition of clinically significant cancer did not follow that of the PI-RADSv2 guidelines, and these studies were therefore judged to have high concern for applicability [16 21 22 24 29 30 32 34 35 36]. In the flow and timing domain, two studies had a high risk of bias because not all patients received the same reference standard [17 18].
The sensitivity and specificity of individual studies ranged from 73% to 100% and from 8% to 100%, respectively. Cochran's Q test revealed significant heterogeneity (p < 0.001). The Higgins I² statistic demonstrated substantial heterogeneity in sensitivity (I² = 85.55%) and considerable heterogeneity in specificity (I² = 95.30%). The coupled forest plot of sensitivity and specificity demonstrated the absence of a threshold effect (Fig. 3). The Spearman correlation coefficient between sensitivity and the false-positive rate was 0.45 (95% confidence interval [CI] 0.023–0.738), also indicating the lack of a threshold effect.
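The heterogeneity and threshold-effect statistics reported above can be computed from study-level data along the following lines. Cochran's Q is the inverse-variance-weighted sum of squared deviations from the fixed-effect mean; I² = max(0, (Q − df)/Q) expresses the share of variability beyond chance; and the threshold-effect check is a Spearman rank correlation between sensitivity and the false-positive rate (1 − specificity). This is a minimal sketch: the function names and example inputs are hypothetical, and the review's own values were obtained with Stata and R.

```python
import math


def cochran_q_i2(estimates, variances):
    """Return Cochran's Q, its degrees of freedom, and I^2 (as a percentage)."""
    w = [1.0 / v for v in variances]
    fixed = sum(wi * yi for wi, yi in zip(w, estimates)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, estimates))
    df = len(estimates) - 1
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    return q, df, 100.0 * i2


def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman correlation = Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


# Hypothetical per-study sensitivity and specificity
sens = [0.95, 0.88, 0.80, 0.92, 0.75]
spec = [0.60, 0.70, 0.85, 0.55, 0.90]
fpr = [1 - s for s in spec]
print(spearman(sens, fpr))
```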
For all 21 studies combined, the pooled sensitivity was 0.89 (95% CI 0.86–0.92) with specificity of 0.73 (95% CI 0.60–0.83; Fig. 3). In the HSROC curve, there was a large difference between the 95% confidence region and the 95% prediction region, indicating heterogeneity between the studies (Fig. 4). The area under the HSROC curve was 0.91 (95% CI 0.88–0.93). According to Deeks' funnel plot, the likelihood of publication bias was low, with a p value of 0.75 for the slope coefficient (Fig. 5).
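Deeks' test regresses the log diagnostic odds ratio (DOR) of each study on 1/√ESS, weighting by the effective sample size ESS = 4·n₁·n₂/(n₁ + n₂), where n₁ and n₂ are the numbers of diseased and non-diseased subjects; a slope that differs significantly from zero suggests funnel-plot asymmetry. The sketch below implements that weighted regression in plain Python as an illustration. Adding 0.5 to every cell is our assumption, the example tables are hypothetical, and the function returns the t statistic rather than a p value (it can be compared with a t distribution on n − 2 degrees of freedom, for example with `scipy.stats.t.sf` if SciPy is available).

```python
import math


def deeks_test(tables):
    """Deeks' funnel-plot asymmetry regression.

    tables: iterable of (tp, fp, fn, tn) per study.
    Returns (slope, se_slope, t_statistic, degrees_of_freedom).
    """
    xs, ys, ws = [], [], []
    for tp, fp, fn, tn in tables:
        # 0.5 added to every cell (assumption) to keep log(DOR) finite
        a, b, c, d = (cell + 0.5 for cell in (tp, fp, fn, tn))
        ln_dor = math.log((a * d) / (b * c))
        n_diseased, n_healthy = tp + fn, fp + tn
        ess = 4 * n_diseased * n_healthy / (n_diseased + n_healthy)
        xs.append(1 / math.sqrt(ess))
        ys.append(ln_dor)
        ws.append(ess)
    # Weighted least squares of ln(DOR) on 1/sqrt(ESS), with weights = ESS
    sum_w = sum(ws)
    x_bar = sum(w * x for w, x in zip(ws, xs)) / sum_w
    y_bar = sum(w * y for w, y in zip(ws, ys)) / sum_w
    sxx = sum(w * (x - x_bar) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - x_bar) * (y - y_bar) for w, x, y in zip(ws, xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    df = len(xs) - 2
    resid_ss = sum(w * (y - intercept - slope * x) ** 2 for w, x, y in zip(ws, xs, ys))
    se_slope = math.sqrt(resid_ss / df / sxx)
    return slope, se_slope, slope / se_slope, df


# Hypothetical tables in which smaller studies report larger DORs
example = deeks_test([(10, 1, 1, 10), (50, 10, 10, 50), (200, 60, 60, 200)])
print(example)
```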
For the six studies that provided a head-to-head comparison between PI-RADSv1 and PI-RADSv2, PI-RADSv2 demonstrated a higher pooled sensitivity of 0.95 (95% CI 0.85–0.98) compared with 0.88 (95% CI 0.80–0.93) for PI-RADSv1 (p = 0.04). However, the pooled specificity did not differ significantly between the two versions of PI-RADS: 0.73 (95% CI 0.47–0.89) for v2 and 0.75 (95% CI 0.36–0.94) for v1 (p = 0.90).
Because considerable heterogeneity was found among the included studies, meta-regression analyses were performed (Supplementary Table 1). Among several potential variables, the proportion of patients with PCa, magnetic field strength, and reference standard were significant factors affecting heterogeneity (p < 0.01 for all). However, among these three, only specificity according to the proportion of patients with PCa showed a difference that was both statistically significant and clinically meaningful: 0.65 (95% CI 0.52–0.78) in studies with ≤50% of patients with PCa versus 0.86 (95% CI 0.75–0.97) in studies with >50% of patients with PCa. The other differences were not clinically meaningful: for magnet strength (3 vs 1.5 T), sensitivity was 0.90 (95% CI 0.86–0.94) versus 0.89 (95% CI 0.81–0.97; p = 0.03) and specificity was 0.73 (95% CI 0.59–0.86) versus 0.72 (95% CI 0.44–1.00; p = 0.81); for reference standard (radical prostatectomy vs biopsy), sensitivity was 0.89 (95% CI 0.83–0.95) versus 0.91 (95% CI 0.88–0.94; p < 0.01) and specificity was 0.65 (95% CI 0.37–0.94) versus 0.73 (95% CI 0.58–0.87; p = 0.48). Other variables, including the cutoff value, use of an endorectal coil, and type of analysis (per patient vs per lesion), were not significant factors (p = 0.32–0.70).
Because four studies used both ≥3 and ≥4 as cutoff values [17 22 23 29] and five studies assessed both any cancer and clinically significant cancer as outcomes [16 21 24 29 32], multiple subgroup analyses were performed to assess various clinical settings (Supplementary Table 2). Regarding cutoff values, the pooled sensitivity was 0.89 (95% CI 0.84–0.92) with specificity of 0.74 (95% CI 0.58–0.85) for 17 studies using ≥4 [16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 35], whereas these were 0.95 (95% CI 0.89–0.97) and 0.47 (95% CI 0.21–0.74) for eight studies using ≥3 [17 22 23 29 32 33 34 36]. When we stratified studies according to the outcome assessed, the results were as follows: (1) for a cutoff of ≥4 for determining any PCa, sensitivity of 0.89 (95% CI 0.83–0.93) with specificity of 0.80 (95% CI 0.62–0.90); (2) for a cutoff of ≥3 for determining any PCa, sensitivity of 0.96 (95% CI 0.93–0.98) with specificity of 0.49 (95% CI 0.29–0.70); (3) for determining csPCa regardless of cutoff value, sensitivity of 0.89 (95% CI 0.84–0.92) with specificity of 0.64 (95% CI 0.46–0.78); (4) for a cutoff of ≥4 for determining csPCa, sensitivity of 0.90 (95% CI 0.85–0.94) with specificity of 0.62 (95% CI 0.45–0.77); and (5) for a cutoff of ≥3 for determining csPCa, sensitivity of 0.96 (95% CI 0.87–0.99) with specificity of 0.29 (95% CI 0.05–0.77). When the studies using a cutoff value of ≥4 were assessed separately according to the type of analysis, per-patient analysis in eight studies [17 18 20 26 27 29 31 35] yielded a pooled sensitivity of 0.89 (95% CI 0.81–0.93) with specificity of 0.76 (95% CI 0.60–0.88), whereas per-lesion analysis in nine studies [16 19 21 22 23 24 25 28 30] yielded a pooled sensitivity of 0.87 (95% CI 0.83–0.91) with specificity of 0.70 (95% CI 0.44–0.88).
Based on the localization of PCa, the pooled sensitivity was 0.93 (95% CI 0.87–0.96) with specificity of 0.68 (95% CI 0.43–0.86) in seven studies analyzing PZ cancers [20 24 25 28 30 31 33]. For TZ cancers, analyzed in the same studies except that by Stanzione et al [31], the pooled sensitivity and specificity were 0.88 (95% CI 0.77–0.94) and 0.75 (95% CI 0.59–0.86), respectively.
Studies including patients without previous biopsies yielded sensitivity of 0.82 (95% CI 0.72–0.90) and specificity of 0.75 (95% CI 0.65–0.83), whereas the diagnostic performance values were 0.87 (95% CI 0.80–0.92) and 0.71 (95% CI 0.42–0.89) in studies including patients with a history of previous biopsy.
In our meta-analysis, we assessed the diagnostic accuracy of PI-RADSv2 for detecting PCa. The pooled sensitivity and specificity of all 21 studies were 0.89 (95% CI 0.86–0.92) and 0.73 (95% CI 0.60–0.83), respectively. Compared with the only two existing meta-analyses of mpMRI for detecting PCa, our data suggest a trend toward higher sensitivity and lower specificity. In the study by de Rooij et al [37], which evaluated seven studies using a combination of T2WI, DWI, and DCE-MRI, the pooled sensitivity and specificity were 0.74 (95% CI 0.66–0.81) and 0.88 (95% CI 0.82–0.92), respectively. In a more recent meta-analysis by Hamoen et al [6], which analyzed 14 studies using PI-RADSv1, the pooled sensitivity and specificity were 0.78 (95% CI 0.70–0.84) and 0.79 (95% CI 0.68–0.86), respectively. However, comparing results across the three meta-analyses provides only indirect evidence. To address this issue, we separately assessed the subgroup of studies that used both PI-RADSv1 and PI-RADSv2. In this head-to-head comparison, PI-RADSv2 demonstrated higher pooled sensitivity (0.95) than PI-RADSv1 (0.88, p = 0.04), without a statistically significant difference in specificity (0.73 vs 0.75, p = 0.90). This gain in sensitivity over its predecessor may imply that the revisions made during the development of PI-RADSv2, including the introduction of dominant sequences according to zonal anatomy, the limited role of DCE-MRI as secondary to DWI and T2WI, and specific guidelines for deriving an integrated overall score, were, in fact, on the right track. In particular, we speculate that the use of dominant sequences, that is, DWI for the PZ and T2WI for the TZ, may have been crucial for the improved sensitivity without a loss in specificity, as suggested by Baur et al [10].
Considering that one of the main intentions behind PI-RADS was to standardize the reporting of mpMRI in order to decrease variability and promote widespread acceptance and implementation in daily practice, it was promising to find that nearly all (20 of 21) studies applied PI-RADSv2 strictly according to the published guidelines [11]. Only one study derived PI-RADSv2 scores from existing clinical radiology reports based on PI-RADSv1 or an in-house scoring system [29]. This is an improvement over prior studies using PI-RADSv1, in which investigators used varying methods to determine the overall score (overall five-point score or sum of the scores from each modality) [6]. Still, the cutoff value for detecting PCa needs further clarification. In the studies included in our meta-analysis, cutoff values were predefined in only six studies, while the majority (15/21) were exploratory in nature, testing multiple criteria. A cutoff value of ≥4 yielded generally good sensitivity (0.89) and specificity (0.74), whereas ≥3 yielded excellent sensitivity (0.95) but poor specificity (0.47). These results may be taken into consideration when developing the next version of PI-RADS. For instance, a cutoff of ≥4 may be adequate for general use, whereas ≥3 could be reserved for situations in which a higher cancer detection rate is clinically required (eg, a persistently high PSA level despite a previously negative biopsy).
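The trade-off between the ≥3 and ≥4 cutoffs follows directly from dichotomizing the five-point PI-RADSv2 score. The sketch below computes sensitivity and specificity at both thresholds for a small, entirely hypothetical set of scores and biopsy results (the function name is ours): lowering the threshold turns some false negatives into true positives, but also turns some true negatives into false positives.

```python
def sens_spec_at_cutoff(scores, has_cancer, cutoff):
    """Call a case positive when its PI-RADSv2 score is >= cutoff."""
    tp = sum(1 for s, y in zip(scores, has_cancer) if s >= cutoff and y)
    fn = sum(1 for s, y in zip(scores, has_cancer) if s < cutoff and y)
    fp = sum(1 for s, y in zip(scores, has_cancer) if s >= cutoff and not y)
    tn = sum(1 for s, y in zip(scores, has_cancer) if s < cutoff and not y)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical overall scores (1-5) and reference-standard results, not study data
scores = [5, 4, 4, 3, 3, 2, 5, 3, 2, 1, 4, 3]
has_cancer = [True, True, True, True, False, False,
              True, False, False, False, False, True]

for cutoff in (3, 4):
    sens, spec = sens_spec_at_cutoff(scores, has_cancer, cutoff)
    print(f"PI-RADSv2 >= {cutoff}: sensitivity {sens:.2f}, specificity {spec:.2f}")
# PI-RADSv2 >= 3: sensitivity 1.00, specificity 0.50
# PI-RADSv2 >= 4: sensitivity 0.67, specificity 0.83
```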
In the current study, subgroup analyses were performed to account for differences in outcomes (any cancer vs clinically significant cancer). There was no significant difference between the two outcomes, regardless of whether a cutoff of ≥3 or ≥4 was used. However, the definition of clinically significant cancer differed among the 13 studies that assessed it. Only three studies defined csPCa strictly according to the PI-RADSv2 guidelines (Gleason score ≥7 [including 3 + 4], volume ≥0.5 ml, or extraprostatic extension) [11]. Most others used one or two of the three criteria. Restricting the analysis to these three studies might have provided more robust results; however, including all available studies was pragmatic and provides a general overview of the existing literature, as this is the first meta-analysis of studies dealing with PI-RADSv2.
We also examined the technical aspects of MRI in this meta-analysis. Meta-regression analyses revealed that the use of an endorectal coil was not a statistically significant factor. Furthermore, although magnet strength showed a statistically significant difference between 3 and 1.5 T, this difference was not clinically meaningful (sensitivity of 0.90 vs 0.89, respectively; p = 0.03). Although both issues have been debated in the past, 3 and 1.5 T are now both well established, and the overall benefit of using an endorectal coil is not evident [38 39]. The PI-RADSv2 guidelines currently allow either option, and the results of our study provide additional evidence to support this.
Regarding the methods of analysis, the studies varied considerably in reference standard and type of analysis. Radical prostatectomy was the reference standard in five studies, while the majority were based on biopsy, either systematic plus targeted or targeted alone. In the latter group, the possibility of PCa despite negative biopsy results should be kept in mind. In addition, roughly half of the studies reported outcomes per patient (n = 11) and the other half per lesion (n = 10). Per-lesion analysis also takes into account the ability to localize the disease; however, the type of analysis was not a significant factor in the meta-regression analysis.
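The difference between per-patient and per-lesion analysis can be illustrated with a toy example. The sketch below uses hypothetical data and an assumed aggregation rule: a patient is test-positive if any lesion reaches the cutoff and disease-positive if any lesion harbours cancer. Patient P6 has a benign score-4 lesion and a missed score-2 cancer; per lesion this counts as one false positive and one false negative, whereas per patient it counts as a true positive because localization is ignored.

```python
from collections import defaultdict

# Hypothetical lesion-level records: (patient_id, PI-RADSv2 score, cancer on pathology)
lesions = [
    ("P1", 5, True), ("P1", 3, False),
    ("P2", 4, False),
    ("P3", 2, False), ("P3", 4, True),
    ("P4", 3, True),
    ("P5", 2, False),
    ("P6", 4, False), ("P6", 2, True),  # high score on the wrong lesion
]


def sens_spec(pairs, cutoff):
    """Sensitivity and specificity for (score, cancer) pairs at score >= cutoff."""
    tp = fn = fp = tn = 0
    for score, cancer in pairs:
        positive = score >= cutoff
        if cancer and positive:
            tp += 1
        elif cancer:
            fn += 1
        elif positive:
            fp += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp)


def to_patient_level(records):
    """Assumed rule: highest score per patient; cancer if any lesion is cancer."""
    highest = defaultdict(int)
    diseased = defaultdict(bool)
    for patient, score, cancer in records:
        highest[patient] = max(highest[patient], score)
        diseased[patient] = diseased[patient] or cancer
    return [(highest[p], diseased[p]) for p in highest]


per_lesion = sens_spec([(s, c) for _, s, c in lesions], cutoff=4)
per_patient = sens_spec(to_patient_level(lesions), cutoff=4)
print(per_lesion, per_patient)  # (0.5, 0.6) (0.75, 0.5)
```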
Our meta-analysis has some limitations. First, nearly all studies were retrospective, resulting in a high risk of bias for patient selection. Pooling data from predominantly retrospective studies may have led to increased diagnostic sensitivity [40]. However, restricting the analysis to the three prospective studies was technically unfeasible, and the derived results would not have been representative of the existing literature on PI-RADSv2. We also used validated methods for the systematic review and reported the data according to standard reporting guidelines, including PRISMA and the Handbook for Diagnostic Test Accuracy Reviews published by the Cochrane Collaboration [12 41]. Second, considerable heterogeneity was present in our pooled analysis, which limits the general applicability of our summary estimates. To explore this heterogeneity, we performed meta-regression and multiple subgroup analyses, which identified the proportion of patients with PCa, the magnetic field strength, and the reference standard as significant factors. In particular, the reference standard encompassed various methods, including radical prostatectomy and combinations of systematic and targeted biopsies (ie, MRI guided, MRI–transrectal ultrasound fusion, or cognitive). The use of various definitions of clinically significant cancer also needs to be emphasized. Our meta-regression and subgroup analyses may explain some of the heterogeneity, but a portion remains unexplained. Third, only a small number of studies were available for the head-to-head comparison between PI-RADSv1 and PI-RADSv2. Nevertheless, using the six studies published to date, we were able to demonstrate a statistically significant difference in sensitivity between the two versions.
PI-RADSv2 shows good performance for the detection of PCa with pooled sensitivity of 0.89 and specificity of 0.73. PI-RADSv2 has higher pooled sensitivity compared with PI-RADSv1 without significantly different specificity.
Author contributions: Sang Youn Kim had full access to all the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis.
Study concept and design: Woo, Suh, S.Y. Kim.
Acquisition of data: Woo, Suh, S.Y. Kim.
Analysis and interpretation of data: Woo, Suh, S.Y. Kim.
Drafting of the manuscript: Woo.
Critical revision of the manuscript for important intellectual content: Suh, S.Y. Kim, Cho, S.H. Kim.
Statistical analysis: Suh.
Obtaining funding: None.
Administrative, technical, or material support: None.
Supervision: S.Y. Kim, Cho, S.H. Kim.
Other: None.
Financial disclosures: Sang Youn Kim certifies that all conflicts of interest, including specific financial interests and relationships and affiliations relevant to the subject matter or materials discussed in the manuscript (eg, employment/affiliation, grants or funding, consultancies, honoraria, stock ownership or options, expert testimony, royalties, or patents filed, received, or pending), are the following: None.
Funding/Support and role of the sponsor: None.
Prostate cancer (PCa) is the second leading cause of cancer-related mortality in Western men [1]. With recent technological advancements and growing availability, multiparametric magnetic resonance imaging (mpMRI) is increasingly being used to guide several aspects of PCa management, including detection, staging, and treatment planning [2]. Despite abundant evidence in the literature reporting high accuracy of mpMRI for PCa diagnosis, its widespread acceptance has been hampered by several factors, including difficulty of interpretation, lack of standardized criteria for interpretation (ie, use of Likert scales based on the radiologist's subjective level of suspicion for PCa), and the resulting substantial inter-reader variability [3 4].
To bring standardization to the evaluation and reporting of mpMRI of the prostate, the European Society of Urogenital Radiology (ESUR) published a guideline termed Prostate Imaging Reporting and Data System (PI-RADS) in 2012 [5]. PI-RADS was developed from expert consensus and provides a detailed scoring system for each MRI sequence (T2-weighted imaging [T2WI], diffusion-weighted imaging [DWI], dynamic contrast-enhanced MRI [DCE-MRI], and MR spectroscopy) for the presence of clinically significant PCa (csPCa). Several investigators have validated the accuracy and reproducibility of PI-RADS, and a recent meta-analysis reported pooled sensitivity and specificity of 0.78 and 0.79, respectively [6]. However, as there was no guideline for generating an overall score, different research groups used various approaches: some used the sum of the scores from each sequence (ranging from 3 to 15), whereas others used an overall score of 1–5 [7 8]. Furthermore, emerging data questioned the value of curve-type analysis of DCE-MRI [9]. In addition, investigators suggested that, rather than all sequences being weighted equally, some sequences may be more important in determining the likelihood of PCa (ie, DWI in the peripheral zone [PZ] and T2WI in the transition zone [TZ]) [10].
To address these issues, the ESUR and the American College of Radiology recently released the updated PI-RADS version 2 (PI-RADSv2) [11]. The main changes from PI-RADSv1 to PI-RADSv2 are the following: (1) introduction of dominant sequences according to zonal anatomy (DWI for the PZ and T2WI for the TZ), (2) a limited contribution of DCE-MRI, reduced to the presence or absence of early focal enhancement, and (3) generation of an overall score (1–5) integrating findings across all MRI sequences.
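The zone-dependent scoring described above can be expressed as a short decision rule. The following Python sketch is a simplified reading of the published PI-RADSv2 rules, assuming that in the PZ a DWI score of 3 is upgraded to 4 when DCE-MRI is positive, and that in the TZ a T2WI score of 3 is upgraded to 4 when the DWI score is 5. The function name `pirads_v2_overall` is hypothetical, and special cases (eg, nondiagnostic DWI) are deliberately omitted.

```python
def pirads_v2_overall(zone: str, t2w: int, dwi: int, dce_positive: bool) -> int:
    """Hypothetical sketch of PI-RADSv2 overall-score assignment (simplified).

    zone: "PZ" (peripheral zone) or "TZ" (transition zone)
    t2w, dwi: sequence scores on a 1-5 scale
    dce_positive: presence of early focal enhancement on DCE-MRI
    """
    for score in (t2w, dwi):
        if not 1 <= score <= 5:
            raise ValueError("sequence scores must be in 1..5")
    if zone == "PZ":
        # Assumed rule: DWI is dominant; DCE-MRI only upgrades a DWI score of 3.
        return 4 if (dwi == 3 and dce_positive) else dwi
    if zone == "TZ":
        # Assumed rule: T2WI is dominant; DWI only upgrades a T2WI score of 3.
        # DCE-MRI plays no role in the TZ in this sketch.
        return 4 if (t2w == 3 and dwi == 5) else t2w
    raise ValueError("zone must be 'PZ' or 'TZ'")


# An equivocal PZ lesion (DWI 3) with focal early enhancement is upgraded to 4.
print(pirads_v2_overall("PZ", t2w=2, dwi=3, dce_positive=True))
# A TZ lesion keeps its T2WI score of 3 unless DWI scores 5.
print(pirads_v2_overall("TZ", t2w=3, dwi=4, dce_positive=True))
```

Unlike the summed PI-RADSv1 scores (3–15), the output under this sketch is always a single 1–5 category driven by the zone's dominant sequence.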
Since then, several studies assessing the value of PI-RADSv2 have been published. However, the diagnostic performance of this new scoring system has not been evaluated systematically. Therefore, the purpose of our study was to assess the diagnostic performance of PI-RADSv2 for the detection of PCa. In addition, we aimed to compare the diagnostic performance of PI-RADSv1 and PI-RADSv2 in studies available for head-to-head comparison.
This meta-analysis was performed and written according to the preferred reporting items for systematic reviews and meta-analyses (PRISMA) guidelines [12] .
A computerized search of MEDLINE and EMBASE up to December 7, 2016, was performed in order to identify studies evaluating the diagnostic performance of PI-RADSv2 for the detection of PCa. The search query combined synonyms for PCa, MRI, and PI-RADS as follows: (prostate cancer OR prostatic cancer OR prostate neoplasm OR prostatic neoplasm OR prostate tumor OR prostatic tumor OR prostate carcinoma OR prostatic carcinoma OR PCa) AND (magnetic resonance imaging OR MRI OR MR) AND (prostate imaging reporting and data system OR pi-rads OR pi rads OR pirads). Bibliographies of identified articles were also screened to expand the scope of search. Our search was limited to publications in English.
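Structurally, the search query consists of three OR-groups of synonyms joined with AND. The sketch below rebuilds that Boolean string from the synonym lists given above. Database-specific syntax (field tags, phrase quoting, MeSH or Emtree headings) is not modeled, and the helper names `or_group` and `build_query` are hypothetical rather than part of the authors' workflow.

```python
# Synonym groups copied from the search strategy described in the text.
PCA_TERMS = [
    "prostate cancer", "prostatic cancer", "prostate neoplasm", "prostatic neoplasm",
    "prostate tumor", "prostatic tumor", "prostate carcinoma", "prostatic carcinoma", "PCa",
]
MRI_TERMS = ["magnetic resonance imaging", "MRI", "MR"]
PIRADS_TERMS = ["prostate imaging reporting and data system", "pi-rads", "pi rads", "pirads"]


def or_group(terms):
    """Join synonyms with OR and wrap the group in parentheses."""
    return "(" + " OR ".join(terms) + ")"


def build_query(*groups):
    """Combine the OR-groups with AND."""
    return " AND ".join(or_group(g) for g in groups)


query = build_query(PCA_TERMS, MRI_TERMS, PIRADS_TERMS)
print(query)
```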
Studies were included if they satisfied all the following requirements according to the PICOS criteria [12] : (1) included patients with suspected or diagnosed PCa; (2) for index test, mpMRI of the prostate including all required sequences of T2WI, DWI, and DCE-MRI was performed and assessed with a PI-RADSv2 scoring system; (3) for comparison, a reference standard based on the histopathological examination of radical prostatectomy or biopsy was used; (4) results were reported in sufficient detail for the reconstruction of 2 × 2 tables and determination of sensitivity and specificity at specified cutoff values for evaluating the diagnostic performance of PI-RADSv2; and (5) studies had to be original articles.
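Criterion (4) amounts to dichotomizing ordinal PI-RADSv2 scores at a cutoff and cross-tabulating the result against the reference standard. The sketch below assumes per-patient data with one score and one binary histopathology result per patient; the data and the names `TwoByTwo` and `dichotomize` are hypothetical and do not come from any included study.

```python
from dataclasses import dataclass


@dataclass
class TwoByTwo:
    tp: int  # score >= cutoff and cancer on the reference standard
    fp: int  # score >= cutoff but no cancer
    fn: int  # score < cutoff but cancer present
    tn: int  # score < cutoff and no cancer

    @property
    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)


def dichotomize(scores, has_cancer, cutoff):
    """Build a 2 x 2 table by calling a PI-RADS score >= cutoff 'positive'."""
    tp = fp = fn = tn = 0
    for score, cancer in zip(scores, has_cancer):
        positive = score >= cutoff
        if positive and cancer:
            tp += 1
        elif positive:
            fp += 1
        elif cancer:
            fn += 1
        else:
            tn += 1
    return TwoByTwo(tp, fp, fn, tn)


# Hypothetical per-patient data (not from any included study).
scores = [5, 4, 4, 3, 3, 2, 5, 1, 3, 4]
cancer = [True, True, False, True, False, False, True, False, False, True]
for cutoff in (3, 4):
    table = dichotomize(scores, cancer, cutoff)
    print(cutoff, table, round(table.sensitivity, 2), round(table.specificity, 2))
```

With these made-up data, lowering the cutoff from ≥4 to ≥3 converts one false negative into a true positive but two true negatives into false positives, illustrating the direction of the sensitivity/specificity trade-off between the two cutoffs.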
Studies were excluded if any of the following criteria were met: (1) studies involving <10 patients; (2) review articles, guidelines, consensus statements, letters, editorials, and conference abstracts; (3) studies using only PI-RADSv1 for the evaluation of mpMRI of the prostate; (4) studies focusing on topics other than the use of PI-RADSv2 for diagnosing PCa (eg, staging and prediction of biochemical recurrence); and (5) studies with overlapping patient populations.
Two reviewers (S.W. and C.H.S., with 3 yr of experience in performing systematic reviews and meta-analyses) independently evaluated the eligibility of the studies identified in the literature search. Any disagreements between the two reviewers were resolved by consensus through discussion with a third reviewer (S.Y.K.).
We extracted the following data regarding study design and results from the selected studies using a standardized form:
Study characteristics—authors, year of publication, country and institution of origin, duration of patient recruitment, and study design (prospective vs retrospective and consecutive or not)
Demographic and clinical characteristics—sample size, number of patients with PCa, patient age, prostate-specific antigen (PSA) level and Gleason score, number of previous biopsies, and PCa diagnosis prior to mpMRI
Technical characteristics of mpMRI—scanner model and manufacturer, magnetic field strength (1.5 vs 3 T), coil type (endorectal vs pelvic phased array), and specific sequences used (T2WI, DWI, DCE, or MR spectroscopy)
Interpretation of mpMRI—number of reviewers and experience in prostate mpMRI, independent or consensus reading, and blinding to clinicopathological information
Reference standard—type of reference standard (radical prostatectomy, targeted biopsy, or systematic biopsy), interval between MRI and pathology, outcomes assessed (any PCa vs csPCa), definition of csPCa (studies assessing "clinically significant", "aggressive", or "high-grade" PCa were all considered to assess csPCa; however, only studies that used the definition provided by the PI-RADSv2 guideline [Gleason score ≥7 (3 + 4), volume ≥0.5 ml, or extraprostatic extension] were considered to have no concern for applicability), separate analysis for the PZ and TZ, and type of analysis (per patient vs per lesion)
Diagnostic performance of PI-RADSv2 including criteria or cutoff values (in case of multiple readers, the results of the most experienced reader were extracted for this meta-analysis)
The methodological quality of the included studies was assessed using tailored questionnaires and criteria provided by the Quality Assessment of Diagnostic Accuracy Studies-2 tool [13]. Data extraction and quality assessment were performed independently by two reviewers (S.W. and C.H.S.). All disagreements were resolved by consensus through discussion with the third reviewer (S.Y.K.).
The diagnostic performance of PI-RADSv2 for the detection of PCa was the primary outcome for this meta-analysis. In addition, a comparison between the diagnostic performance of PI-RADSv2 and that of PI-RADSv1 using studies that reported head-to-head comparison data of the two PI-RADS versions was considered a secondary outcome.
Pooled estimates of sensitivity and specificity were calculated using hierarchical logistic regression modeling including bivariate modeling and hierarchical summary receiver operating characteristic (HSROC) modeling [14] . For graphical presentation of the results, an HSROC curve with 95% confidence region and prediction region was plotted. Publication bias was evaluated using the Deeks’ funnel plot, and statistical significance was tested with the Deeks’ asymmetry test [15] .
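To make the pooling step more concrete, the sketch below pools one accuracy measure (eg, sensitivity) across studies on the logit scale. It is a deliberate simplification: it uses a univariate DerSimonian–Laird random-effects model, whereas the bivariate and HSROC models used here pool sensitivity and specificity jointly and account for their correlation (as implemented, for example, in metandi for Stata or the `reitsma()` function of the R mada package). The function name and the example counts are hypothetical.

```python
import math


def logit(p: float) -> float:
    return math.log(p / (1 - p))


def inv_logit(x: float) -> float:
    return 1 / (1 + math.exp(-x))


def pool_logit_proportions(events, totals):
    """Univariate DerSimonian-Laird pooling of proportions on the logit scale.

    events[i] / totals[i] is, eg, TP / (TP + FN) for study i (sensitivity).
    A 0.5 continuity correction is added to events and non-events so that
    logit(0) and logit(1) never occur.
    Returns (pooled estimate, 95% CI lower, 95% CI upper, tau^2).
    """
    y, v = [], []
    for e, n in zip(events, totals):
        e_c, n_c = e + 0.5, n + 1.0            # corrected events and total
        y.append(logit(e_c / n_c))
        v.append(1 / e_c + 1 / (n_c - e_c))    # within-study variance of the logit
    w = [1 / vi for vi in v]                    # inverse-variance (fixed-effect) weights
    y_fe = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    q = sum(wi * (yi - y_fe) ** 2 for wi, yi in zip(w, y))  # Cochran's Q
    df = len(y) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)               # between-study variance
    w_re = [1 / (vi + tau2) for vi in v]        # random-effects weights
    y_re = sum(wi * yi for wi, yi in zip(w_re, y)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))
    return inv_logit(y_re), inv_logit(y_re - 1.96 * se), inv_logit(y_re + 1.96 * se), tau2


# Hypothetical per-study true positives and diseased totals (TP + FN).
sens, lo, hi, tau2 = pool_logit_proportions([45, 30, 88], [50, 36, 95])
print(f"pooled sensitivity {sens:.2f} (95% CI {lo:.2f}-{hi:.2f}), tau^2 = {tau2:.3f}")
```

Applying the same function to TN / (TN + FP) would give a pooled specificity, but only the bivariate model captures the negative correlation between the two that arises from differing thresholds across studies.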
We performed meta-regression analyses to investigate the cause of heterogeneity. The following covariates were considered for the bivariate model: (1) proportion of patients with PCa (>50% vs ≤50%), (2) magnet strength of MRI (3 vs 1.5 T), (3) use of endorectal coil, (4) cutoff value (≥4 vs ≥3), (5) reference standard (radical prostatectomy vs biopsy), and (6) type of analysis (per patient vs per lesion). In addition, multiple subgroup analyses were performed for cutoff value, outcome, and previous biopsy history to assess various clinical settings: (1) a cutoff value of ≥4 for all studies, (2) a cutoff value of ≥3 for all studies, (3) a cutoff value of ≥4 for determining any PCa, (4) a cutoff value of ≥3 for determining any PCa, (5) a cutoff value of ≥4 for determining csPCa, (6) a cutoff value of ≥3 for determining csPCa, (7) a cutoff value of ≥4 in studies using per-patient analysis, (8) a cutoff value of ≥4 in studies using per-lesion analysis, (9) studies analyzing PZ PCa, (10) studies analyzing TZ PCa, (11) patients without previous biopsies, and (12) patients with previous biopsies. The “metandi” and “midas” modules in Stata 10.0 (StataCorp LP, College Station, TX, USA) and “mada” package in R software version 3.2.1 (R Foundation for Statistical Computing, Vienna, Austria) were used for statistical analyses, with p < 0.05 signifying statistical significance.
A systematic literature search initially identified 287 articles. After removal of 46 duplicates, screening of the 241 titles and abstracts yielded 105 potentially eligible articles. Full-text reviews were performed, and 84 studies were excluded for the following reasons: not in the field of interest (n = 80, including 68 studies that used only PI-RADSv1), insufficient data to reconstruct 2 × 2 tables (n = 2), and shared study population with other studies (n = 2). Ultimately, 21 original articles including a total of 3857 patients assessing the diagnostic performance of PI-RADSv2 were included in the meta-analysis [16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36]. No additional studies were identified by screening the bibliographies of these 21 studies. Among them, 15 studies including 3099 patients dealt with PI-RADSv2 alone, whereas six studies including 758 patients provided a head-to-head comparison between PI-RADSv1 and PI-RADSv2 [16 20 21 28 32 33]. The detailed study selection process is described in Fig. 1.
Patient characteristics are shown in Table 1 . The size of the study population ranged from 49 to 456 patients, with the percentage of those with PCa ranging from 37% to 100%. The patients had a median age of 62–69.6 yr, median PSA of 3.97–15 ng/ml, and a Gleason score ranging from 5 to 10. Patients had already been diagnosed with PCa prior to MRI in all or some of the study populations in seven studies [16 17 22 26 27 32 35] . Biopsy was performed before MRI in seven studies [16 21 22 23 26 27 35] , all patients were biopsy-naïve in three studies [18 25 34] , both patient types were included in three studies [28 32 33] , and data regarding previous biopsy were not reported in the remaining eight studies.
First author (year) | Country | Institution | Duration of patient recruitment | Patients (n) | Patients with PCa (n) | Age, median (yr) | Age, range (yr) | PSA, median (ng/ml) | PSA, range (ng/ml) | Gleason score, median | Gleason score, range | No. of previous biopsies | PCa diagnosis before MRI |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Auer (2016) [16] | Austria | Medical University of Innsbruck, Medizinische Hochschule Hannover | NR | 50 | NR | 63 a | NR | 7.3 a | NR | 7 (3 + 4) | 6–10 | ≥1 | Yes (all) |
Baldisserotto (2016) [17] | Brazil | Pontifícia Universidade Católica do Rio Grande do Sul | 2013.6–2015.6 | 54 | 33 | 65.9 a | 53–81 | 8.4 a | 3–31 | 7 (3 + 4) | 6–9 | NR | Yes (some) |
De Visschere (2017) [18] | Belgium | Ghent University Hospital | 2011.5–2014.12 | 245 | 144 | 66 | 44–85 | 9 | 1.4–935.5 | 7 (3 + 4) | ≥6 | 0 | No |
El-Samei (2016) [19] | Egypt | Al-Azhar University, El-Minia University | 2014.5–2015.10 | 55 | 38 | 62 a | 51–79 | NR | NR | 7 | 6–10 | NR | No |
Feng (2016) [20] | China | Tongji Hospital | 2013.6–2015.7 | 401 | 150 | 64.4 a | 34–88 | 10.7 | 0.2–1763 | 7 (4 + 3) | ≤6–≥8 | NR | NR |
Kasel-Seibert (2016) [21] | Germany | University Hospital Jena | 2013.7–2015.3 | 82 | 31 | 65 | 48–81 | 13 | 1–111 | 7 (3 + 4) | 6–9 | 1–5 | No |
Lin (2016) [22] | Brazil, Taiwan | Ribeirão Preto School of Medicine, China Medical University, Ribeirão Preto School of Medicine | 2011.5–2014.6 | 49 | 49 | 63 | 46–73 | 13.27 | 1.8–41.4 | | | ≥1 | Yes (all) |
Martorana (2016) [23] | Italy | University of Modena and Reggio Emilia, Perugia University | 2014.1–2016.2 | 157 | 79 | 65 | 47–79 b | 10.7 | 1–75 | 6 | 6–8 | 1–3 | No |
Mertan (2016) [24] | USA | National Cancer Institute | 2015.3–2015.9 | 62 | 38 | 65.5 | 50.3–76.6 | 7.1 | 0.5–863 | 7 (3 + 4) | 6–9 | NR | NR |
Muller (2015) [25] | USA, the Netherlands | National Cancer Institute, AMC University Hospital, Edward Hébert School of Medicine | 2011.12–2014.5 | 94 | 94 | 62 | 37–79 | 8.51 | 0.7–51.1 | 7 (3 + 4) | 6–10 | 0 | No |
Park (2016) [26] | South Korea | Samsung Medical Center | 2012.1–2014.12 | 456 | 456 | 65 | 46–81 | 3.97 | 3.8–4.2 | NR | ≥6 | ≥1 | Yes (all) |
Park (2016) [27] | South Korea | Yonsei University College of Medicine | 2012.1–2013.3 | 425 | 425 | NR | NR | NR | 1.4–156.9 | 5–6 | 5–10 | ≥1 | Yes (all) |
Polanec (2016) [28] | Austria | Medical University of Vienna, Confraternität Vienna | 2011.6–2015.9 | 65 | NR | 65.3 a | 62.3–87.4 | 10.8 a | 4.2–74.5 | 7 (4 + 3) | 6–9 | ≥0 | No |
Rastinehad (2015) [29] | USA | Icahn School of Medicine at Mount Sinai, Fox Chase Cancer Center, National Institutes of Health, Hofstra North Shore LIJ School of Medicine | 2012.2–2014.11 | 312 | 202 | 65.1 | 60.3–70.3 b | 7.3 | 5.0–11.4 b | NR | NR | NR | No |
Rosenkrantz (2016) [30] | USA | NYU Langone Medical Center | 2013.9–2015.2 | 343 | 134 | 64 | NR | 5.8 | NR | 7 (3 + 4) | 6–9 | NR | NR |
Stanzione (2016) [31] | Italy | University “Federico II”, Ospedale S. Maria delle Grazie | NR | 82 | 34 | 65 a | 43–84 | 8.8 a | NR | 7 (3 + 4) | 6–9 | NR | No |
Tan (2016) [32] | USA, Taiwan | UCLA, Cathay General Hospital, Hoag Hospital | 2013.3–2016.12 | 106 | 63 | 66.5 | 43–79 | 7.9 | 5.6–10.6 | 7 (3 + 4) | 6–10 | ≥0 | Yes (some) |
Tewes (2016) [33] | Germany | Hannover Medical School, Klinikum der Region Hannover | 2012.12–2014.12 | 54 | 31 | 69.6 a | NR | 8.7 a | NR | 6 | 6–9 | ≥0 | No |
Washino (2017) [34] | Japan | Jichi Medical University Saitama Medical Center | 2010.6–2014.4 | 288 | 159 | 69 | 64–74 b | 7.5 | 5.5–11 b | NR | NR | 0 | No |
Woo (2016) [35] | South Korea | Seoul National University College of Medicine | 2011.1–2013.12 | 105 | 105 | 69 | 49–79 | 8.22 | 0.9–44.2 | 7 | 6–7 | ≥1 | Yes (all) |
Zhao (2016) [36] | China | Peking University First Hospital | 2010.11–2013.12 | 372 | 185 | 68.5 a | NR | 15 a | NR | 7 | 6–10 | NR | No |
Characteristics of the studies are summarized in Table 2. MRI was performed using 3-T scanners in 16 studies [16 17 18 20 24 25 26 27 28 29 30 31 32 33 35 36], 1.5-T scanners in four studies [19 21 22 23], and either 3 or 1.5 T in one study [34]. Endorectal coils were used in seven studies [19 21 22 23 24 25 29]. In all studies, the mpMRI protocol consisted of T2WI, DWI, and DCE-MRI. The reference standard was radical prostatectomy in five studies [16 22 26 27 35], a combination of systematic and targeted biopsies in seven studies [17 20 23 25 29 34 36], and targeted biopsy alone in seven studies [19 21 24 28 30 32 33]; the reference standard was not consistent throughout the study population in two studies [18 31]. PI-RADSv2 scoring was performed by one to five radiologists, either in consensus or independently. The level of experience of the radiologists was heterogeneous, ranging from 4 to 22 yr of experience in prostate MRI. In most studies, the readers were blinded; however, in three studies, the radiologists were aware that the patients had biopsy-proven PCa [22 26 35], and five studies were not explicit regarding blinding [18 19 30 32 36]. In the majority of the studies, the interval between MRI and the reference standard was less than 6 mo; however, it was not reported in 10 studies [16 17 18 24 27 28 29 30 32 33]. PCa was assessed separately according to zonal anatomy in seven studies [20 24 25 28 30 31 33]. However, in one study [31], only PCa in the PZ could be evaluated, as no detailed data were provided in the article and the attempt to contact the authors for further information was unsuccessful. Regarding the outcome assessed, eight studies evaluated any cancer [17 19 20 23 25 28 31 33], eight evaluated clinically significant cancer [18 22 26 27 30 34 35 36], and five evaluated both [16 21 24 29 32].
With regard to cutoff values, 13 studies used ≥4 [16 18 19 20 21 24 25 26 27 28 30 31 35], four studies used ≥3 [32 33 34 36], and four studies used both [17 22 23 29]. The location of PCa was reported separately for the PZ and TZ in six studies [20 24 25 28 30 33].
First author (year) | Design | Consecutive enrollment | Reference standard | MRI–reference standard interval | No. of readers | Reader experience (yr) | Blinding | Magnet strength (T) | Vendor | Model | Endorectal coil | PI-RADSv2 application | Cutoff values | Localization | Type of analysis | Outcome assessed | Definition of csPCa |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Auer (2016) [16] | Retrospective | Yes | RP | NR | 1 | >5 | Yes | 3 | Siemens | Skyra | No | Strict | 4 | Whole | Lesion | Any + csPCa | GS ≥7 (4 + 3) |
Baldisserotto (2016) [17] | Retrospective | Yes | STRUSGB + targeted MRI–TRUS biopsy (cognitive) or RP | NR | 2 Independent | >10/1 | Yes | 3 | GE | Signa HDxt | No | Strict | 3, 4 | Whole | Patient | Any | |
De Visschere (2017) [18] | Retrospective | NR | STRUSGB or RP | NR | NR | NR | NR | 3 | Siemens | Trio | No | Strict | 4 | Whole | Patient | csPCa | GS ≥7, ≥0.5 cc, or EPE |
El-Samei (2016) [19] | NR | NR | Targeted MRI–TRUS biopsy | ≤2 wk | NR | NR | NR | 1.5 | Philips | Gyroscan | Yes | Strict | 4 | Whole | Lesion | Any | |
Feng (2016) [20] | Retrospective | Yes | STRUSGB + targeted MRI–TRUS biopsy (cognitive) | 1–35 d | 2 Consensus | 5/4 | Yes | 3 | Siemens | Skyra | No | Strict | 4 | PZ, TZ | Patient | Any | |
Kasel-Seibert (2016) [21] | Retrospective | NR | Targeted MRGB | ≤7 d | 2 Independent | 10/<1 | Yes | 1.5 | Siemens | Avanto | Yes | Strict | 4 | Whole | Lesion | Any + csPCa | GS ≥7 |
Lin (2016) [22] | Retrospective | NR | RP | ≤6 mo | 2 Independent | 12/5 | Yes a | 1.5 | Philips | Achieva | Yes | Strict | 3, 4 | Whole | Lesion | csPCa | GS ≥7 and >0.5 cc |
Martorana (2016) [23] | Retrospective | Yes | TTB + targeted MRI–TRUS biopsy | 30–78 | 2 Consensus | 4/4 | Yes | 1.5 | Philips | Achieva | Yes | Strict | 3, 4 | Whole | Lesion | Any | GS ≥7, >0.5 cc, or EPE |
Mertan (2016) [24] | Prospective | Yes | STRUSGB + targeted MRI–TRUS biopsy | NR | 1 | >8 | Yes | 3 | Philips | Achieva | Yes | Strict | 4 | PZ, TZ | Lesion | Any + csPCa | GS ≥7 |
Muller (2015) [25] | Retrospective | Yes | STRUSGB + targeted MRI–TRUS biopsy | ≤6 wk | 5 Independent | 12/7/1/1/0.5 | Yes | 3 | Philips | Achieva | Yes | Strict | 4 | PZ, TZ | Lesion | Any | GS ≥7 (4 + 3) |
Park (2016) [26] | Retrospective | Yes | RP | 21–48 | 2 Independent | 14/3 | Yes a | 3 | Philips | Achieva | No | Strict | 4 | Whole | Patient | csPCa | GS ≥7, ≥0.5 cc, or EPE |
Park (2016) [27] | Retrospective | NR | RP | NR | 2 Independent | 9/4 | Yes | 3 | GE, Philips, Siemens | Discovery MR750, Achieva, Trio | No | Strict | 4 | Whole | Patient | csPCa | GS ≥7, ≥0.5 cc, or EPE |
Polanec (2016) [28] | Retrospective | Yes | Targeted MRGB | NR | 2 Independent | NR/NR | Yes | 3 | Siemens | Trio | No | Strict | 4 | PZ, TZ | Lesion | Any | |
Rastinehad (2015) [29] | Prospective | NR | STRUSGB + targeted MRI–TRUS biopsy | NR | 3 Consensus | NR | Yes | 3 | Siemens | Verio | Yes | Calculated b | 3, 4 | Whole | Patient | Any + csPCa | Epstein's criteria c for systematic and GS ≥7 or >0.5 cc for targeted biopsy |
Rosenkrantz (2016) [30] | Retrospective | NR | Targeted MRI–TRUS biopsy | NR | 2 Independent | 8/3 | NR | 3 | Siemens | Biograph mMR/Prisma/Skyra/Trio | No | Strict | 4 | PZ, TZ | Lesion | csPCa | GS ≥7 |
Stanzione (2016) [31] | Prospective | NR | STRUSGB (±targeted TRUSGB) | 20–30 d | 2 Independent | 14/10 | Yes | 3 | Siemens | Trio | No | Strict | 4 | PZ | Patient | Any | |
Tan (2016) [32] | Retrospective | Yes | Targeted MRGB | NR | 3 Consensus | 16/10 | NR | 3 | Siemens | Skyra, Trio, Verio | No | Strict | 3 | Whole | Lesion | Any + csPCa | GS ≥7 |
Tewes (2016) [33] | Retrospective | Yes | Targeted MRGB | NR | 2 Independent | 5/2 | Yes | 3 | Siemens | Skyra | No | Strict | 3 d | PZ, TZ | Patient | Any | |
Washino (2017) [34] | Retrospective | NR | TTB + targeted MRI–TRUS biopsy (cognitive) | 0.5–1.6 mo | 1 | 14 | Yes | 1.5 or 3 | Toshiba | Excelart Vantage/Vantage Titan 3T | No | Strict | 3 | Whole | Patient | csPCa | GS ≥7 or maximum core length ≥4 mm |
Woo (2016) [35] | Retrospective | NR | RP | ≤6 mo | 2 Consensus | 22/10 | Yes a | 3 | Philips, Siemens | Ingenia, Verio/Trio | No | Strict | 4 | Whole | Patient | csPCa | GS ≥7 |
Zhao (2016) [36] | Retrospective | NR | STRUSGB + targeted MRI–TRUS biopsy | ≤3 mo | 2 Independent | NR | NR | 3 | NR | NR | No | Strict | 3 | Whole | Patient | csPCa | GS ≥7 |
a Blinded but aware that patients had biopsy-proven PCa.
b PI-RADSv2 scores were generated from reports based on PI-RADSv1 and simplified qualitative system.
c Epstein's criteria = Gleason pattern ≥4, or Gleason 3 + 3 disease with core length ≥50% and/or >2 cores positive.
d For transition zone, cutoff value = 4.
csPCa = clinically significant prostate cancer; GS = Gleason score; EPE = extraprostatic extension; MRGB = magnetic resonance imaging-guided biopsy; MRI = magnetic resonance imaging; MRI–TRUS = fusion of magnetic resonance imaging and transrectal ultrasound images; NR = not reported; PCa = prostate cancer; PI-RADSv2 = Prostate Imaging Reporting and Data System version 2; PZ = peripheral zone; RP = radical prostatectomy; STRUSGB = systematic transrectal ultrasound-guided biopsy; TTB = transperineal template biopsy; TZ = transition zone.
Overall, the quality of the studies was not considered high, mainly because of the patient selection domain (Fig. 2). In the patient selection domain, there was generally a high risk of bias, as all but four of the studies were retrospective [16 17 18 20 21 22 23 25 26 27 28 30 32 33 34 35 36]. Seven studies were considered to have high concern for applicability, as all or some of the patients had a pathological diagnosis of PCa prior to MRI [16 17 22 26 27 32 35]. In the index test domain, there was a high risk of bias in nine studies. In three of these, reviewers were aware that patients had biopsy-proven PCa [22 26 35]. In the other six, the cutoff value for determining PCa was not specified prior to interpretation [16 20 29 31 33 34]. Only one study had concern for applicability, as PI-RADSv2 scores were indirectly generated from existing clinical radiological reports based on PI-RADSv1 or an in-house scoring system [29]. In the reference standard domain, eight studies had a high risk of bias. Seven were based on either systematic biopsy alone or targeted biopsy alone [18 19 21 28 30 32 33]; in one study, targeted biopsy was performed, but on lesions that were suspicious on ultrasonography rather than on MRI [31].
Studies in which radical prostatectomy or systematic plus targeted biopsy (MRI guided, MRI–transrectal ultrasound fusion, or cognitive) was used as the reference standard were considered to have a low risk of bias. In 10 studies, the definition of clinically significant cancer did not follow that described in the PI-RADSv2 guidelines, and these studies therefore showed high concern for applicability [16 21 22 24 29 30 32 34 35 36]. In the flow and timing domain, two studies had a high risk of bias, as not all patients received the same reference standard [17 18].
The sensitivity and specificity of individual studies ranged from 73% to 100% and from 8% to 100%, respectively. The Q-test revealed substantial heterogeneity (p < 0.001). The Higgins I² statistic demonstrated substantial heterogeneity in terms of sensitivity (I² = 85.55%) and considerable heterogeneity in terms of specificity (I² = 95.30%). The coupled forest plot of sensitivity and specificity demonstrated the absence of a threshold effect (Fig. 3). The Spearman correlation coefficient between the sensitivity and the false-positive rate was 0.45 (95% confidence interval [CI] 0.023–0.738), also indicating the lack of a threshold effect.
For all 21 studies combined, the pooled sensitivity was 0.89 (95% CI 0.86–0.92) with specificity of 0.73 (95% CI 0.60–0.83; Fig. 3 ). In the HSROC curve, there was a large difference between the 95% confidence region and the 95% prediction region, thus indicating heterogeneity between the studies ( Fig. 4 ). The area under the HSROC curve was 0.91 (95% CI 0.88–0.93). According to the Deeks’ funnel plot, the likelihood of publication bias was low, with a p value of 0.75 for the slope coefficient ( Fig. 5 ).
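Deeks' test, used above to assess publication bias, regresses the log diagnostic odds ratio on 1/√ESS, where ESS is the effective sample size 4·n₁·n₂/(n₁ + n₂) (n₁ diseased, n₂ nondiseased), with each study weighted by its ESS; a slope that differs from zero suggests funnel-plot asymmetry. The sketch below returns the slope and its t statistic with n − 2 degrees of freedom; converting t to a p value (eg, with a t-distribution survival function from SciPy) is omitted to keep the example dependency-free. The function name and the 2 × 2 counts are hypothetical.

```python
import math


def deeks_test(tables):
    """Sketch of Deeks' funnel-plot asymmetry regression.

    tables: list of (tp, fp, fn, tn); 0.5 is added to every cell for the DOR.
    Regresses ln(DOR) on 1/sqrt(ESS) by weighted least squares with weights ESS.
    Returns (slope, t statistic of the slope, residual degrees of freedom).
    """
    x, y, w = [], [], []
    for tp, fp, fn, tn in tables:
        n1, n2 = tp + fn, fp + tn                 # diseased, nondiseased
        ess = 4 * n1 * n2 / (n1 + n2)             # effective sample size
        tp_, fp_, fn_, tn_ = (c + 0.5 for c in (tp, fp, fn, tn))
        y.append(math.log((tp_ * tn_) / (fp_ * fn_)))
        x.append(1 / math.sqrt(ess))
        w.append(ess)
    sw = sum(w)
    xm = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ym = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - xm) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xm) * (yi - ym) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    intercept = ym - slope * xm
    df = len(x) - 2
    # Weighted residual variance gives the standard error of the slope.
    rss = sum(wi * (yi - intercept - slope * xi) ** 2 for wi, xi, yi in zip(w, x, y))
    se_slope = math.sqrt(rss / df / sxx)
    return slope, slope / se_slope, df


# Hypothetical 2 x 2 tables (tp, fp, fn, tn), not taken from the included studies.
example_tables = [(45, 10, 5, 40), (30, 8, 6, 30), (88, 20, 7, 60), (20, 5, 3, 25)]
print(deeks_test(example_tables))
```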
For the six studies that provided a head-to-head comparison between PI-RADSv1 and PI-RADSv2, PI-RADSv2 demonstrated higher pooled sensitivity of 0.95 (95% CI 0.85–0.98) compared with 0.88 (95% CI 0.80–0.93) for PI-RADSv1 ( p = 0.04). However, the pooled specificity did not show a significant difference between the two versions of PI-RADS: 0.73 (95% CI 0.47–0.89) for v2 and 0.75 (95% CI 0.36–0.94) for v1 ( p = 0.90).
As we found considerable heterogeneity among the included studies, meta-regression analyses were performed (Supplementary Table 1). Among several potential variables, the proportion of patients with PCa, magnetic field strength, and reference standard were significant factors affecting the heterogeneity (p < 0.01 for all). However, among these three, only the difference in specificity according to the proportion of patients with PCa was both statistically significant and clinically meaningful: 0.65 (95% CI 0.52–0.78) in studies with ≤50% of patients with PCa versus 0.86 (95% CI 0.75–0.97) in studies with >50% of patients with PCa. No other clinically meaningful differences were seen: for magnet strength (3 vs 1.5 T), sensitivity was 0.90 (95% CI 0.86–0.94) versus 0.89 (95% CI 0.81–0.97, p = 0.03) and specificity was 0.73 (95% CI 0.59–0.86) versus 0.72 (95% CI 0.44–1.00, p = 0.81); for reference standard (radical prostatectomy vs biopsy), sensitivity was 0.89 (95% CI 0.83–0.95) versus 0.91 (95% CI 0.88–0.94, p < 0.01) and specificity was 0.65 (95% CI 0.37–0.94) versus 0.73 (95% CI 0.58–0.87, p = 0.48). Other variables, including the cutoff value, use of an endorectal coil, and type of analysis (per patient vs per lesion), were not significant factors (p = 0.32–0.70).
As four studies used both ≥3 and ≥4 as cutoff values [17 22 23 29] and five studies assessed both any cancer and clinically significant cancer as outcomes [16 21 24 29 32], multiple subgroup analyses were performed to assess various clinical settings (Supplementary Table 2). Regarding cutoff values, the pooled sensitivity was 0.89 (95% CI 0.84–0.92) with specificity of 0.74 (95% CI 0.58–0.85) for 17 studies using ≥4 [16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 35], whereas these were 0.95 (95% CI 0.89–0.97) and 0.47 (95% CI 0.21–0.74) for eight studies using ≥3 [17 22 23 29 32 33 34 36]. Stratifying studies according to the outcome assessed yielded the following results: (1) cutoff of ≥4 for determining any PCa, sensitivity of 0.89 (95% CI 0.83–0.93) with specificity of 0.80 (95% CI 0.62–0.90); (2) cutoff of ≥3 for determining any PCa, sensitivity of 0.96 (95% CI 0.93–0.98) with specificity of 0.49 (95% CI 0.29–0.70); (3) csPCa regardless of cutoff value, sensitivity of 0.89 (95% CI 0.84–0.92) with specificity of 0.64 (95% CI 0.46–0.78); (4) cutoff of ≥4 for determining csPCa, sensitivity of 0.90 (95% CI 0.85–0.94) with specificity of 0.62 (95% CI 0.45–0.77); and (5) cutoff of ≥3 for determining csPCa, sensitivity of 0.96 (95% CI 0.87–0.99) with specificity of 0.29 (95% CI 0.05–0.77). When the studies using a cutoff value of ≥4 were assessed separately according to the type of analysis, per-patient analysis in eight studies [17 18 20 26 27 29 31 35] yielded a pooled sensitivity of 0.89 (95% CI 0.81–0.93) with specificity of 0.76 (95% CI 0.60–0.88), whereas per-lesion analysis in nine studies [16 19 21 22 23 24 25 28 30] yielded a pooled sensitivity of 0.87 (95% CI 0.83–0.91) with specificity of 0.70 (95% CI 0.44–0.88).
Based on the localization of PCa, the pooled sensitivity was 0.93 (95% CI 0.87–0.96) with specificity of 0.68 (95% CI 0.43–0.86) in seven studies analyzing PZ cancers [20 24 25 28 30 31 33]. For TZ cancers, analyzed in the same studies except that by Stanzione et al [31], the pooled sensitivity and specificity were 0.88 (95% CI 0.77–0.94) and 0.75 (95% CI 0.59–0.86), respectively.
Studies including patients without previous biopsies yielded sensitivity of 0.82 (95% CI 0.72–0.90) and specificity of 0.75 (95% CI 0.65–0.83), whereas the diagnostic performance values were 0.87 (95% CI 0.80–0.92) and 0.71 (95% CI 0.42–0.89) in studies including patients with a history of previous biopsy.
In our meta-analysis, we assessed the diagnostic accuracy of PI-RADSv2 for detecting PCa. The pooled sensitivity and specificity of all 21 studies were 0.89 (95% CI 0.86–0.92) and 0.73 (95% CI 0.60–0.83), respectively. Compared with the only two existing meta-analyses of mpMRI for detecting PCa, our data suggest a trend toward higher sensitivity and lower specificity. In the study by de Rooij et al [37], which evaluated seven studies using a combination of T2WI, DWI, and DCE-MRI, the pooled sensitivity and specificity were 0.74 (95% CI 0.66–0.81) and 0.88 (95% CI 0.82–0.92), respectively. In a more recent meta-analysis by Hamoen et al [6], which analyzed 14 studies using PI-RADSv1, the pooled sensitivity and specificity were 0.78 (95% CI 0.70–0.84) and 0.79 (95% CI 0.68–0.86), respectively. However, comparing the three meta-analyses provides only indirect evidence. To address this issue, we separately assessed the subgroup of studies that applied both PI-RADSv1 and PI-RADSv2. In this head-to-head comparison, PI-RADSv2 demonstrated higher pooled sensitivity (0.95) than PI-RADSv1 (0.88, p = 0.04), without a statistically significant difference in specificity (0.73 vs 0.75, p = 0.90). This increase in sensitivity over its predecessor may imply that the revisions undertaken during the development of PI-RADSv2, including the introduction of dominant sequences according to zonal anatomy, the limited contribution of DCE-MRI relative to DWI and T2WI, and specific guidelines for deriving an integrated overall score, were in fact on the right track. In particular, we speculate that the use of dominant sequences, that is, DWI for the PZ and T2WI for the TZ, may have been crucial for improving sensitivity without a loss in specificity, as suggested by Baur et al [10].
Considering that one of the main intentions behind PI-RADS was to standardize the reporting of mpMRI in order to decrease variability and promote widespread acceptance and implementation in daily practice, it was encouraging to find that nearly all (20 of 21) studies applied PI-RADSv2 strictly according to the published guidelines [11]. Only one study derived PI-RADSv2 scores from existing clinical radiology reports that were based on PI-RADSv1 or an in-house scoring system [29]. This is an improvement over prior studies using PI-RADSv1, in which investigators used varying methods to determine the overall score (an overall five-point score or the sum of the scores from each sequence) [6]. Still, further clarification is needed regarding the cutoff value for detecting PCa. Among the studies included in our meta-analysis, cutoff values were predefined in only six, while the majority (15/21) were exploratory in nature, testing multiple criteria. A cutoff value of ≥4 yielded generally good sensitivity (0.89) and specificity (0.74), whereas ≥3 yielded excellent sensitivity (0.95) but poor specificity (0.47). These results may be taken into consideration when developing the next update of PI-RADS. For instance, a cutoff of ≥4 may be adequate for general use, whereas ≥3 could be reserved for settings in which a higher cancer detection rate is clinically required (eg, a persistently high PSA level despite a previously negative biopsy).
In the current study, subgroup analyses were performed to account for differences in outcome (any cancer vs clinically significant cancer). There was no significant difference between the two outcomes, whether a cutoff of ≥3 or ≥4 was used. However, the definition of clinically significant cancer differed among the 13 studies that assessed it. Only three studies defined csPCa strictly according to the PI-RADSv2 guidelines (Gleason score ≥7 [including 3 + 4], volume ≥0.5 ml, or extraprostatic extension) [11], whereas most others used one or two of the three criteria. Including only these three studies might have provided more robust results; however, including all available studies was more pragmatic and provides a general overview of the existing literature, as this is the first meta-analysis of studies on PI-RADSv2.
In this meta-analysis, we also examined technical aspects of MRI. Meta-regression analyses revealed that the use of an endorectal coil was not a statistically significant factor. Furthermore, although magnet strength (3 vs 1.5 T) showed a statistically significant difference, this difference was not clinically meaningful (sensitivity of 0.90 vs 0.89, respectively; p = 0.03). Although both issues have been debated in the past, 3- and 1.5-T scanners are now well established, and the overall benefit of an endorectal coil is not evident [38 39]. The PI-RADSv2 guidelines currently accept either option, and the results of our study provide additional evidence to support this.
Regarding the methods of analysis, there was significant heterogeneity in the reference standard and the type of analysis. Radical prostatectomy was the reference standard in five studies, while most of the remaining studies relied on biopsy, either targeted biopsy alone or targeted combined with systematic biopsy. In biopsy-based studies, cancer may have been present despite negative biopsy results, which should be kept in mind. In addition, the studies were roughly evenly split between per-patient (n = 11) and per-lesion (n = 10) reporting of outcomes. Per-lesion analysis additionally takes into account the ability to localize the disease; however, the type of analysis was not a significant factor in the meta-regression analysis.
Our meta-analysis had some limitations. Nearly all studies were retrospective, resulting in a high risk of bias for patient selection. Pooling data from predominantly retrospective studies may have increased the estimated diagnostic sensitivity [40]. However, a meta-analysis of only the three prospective studies was not technically feasible, and its results would not have been representative of the existing literature on PI-RADSv2. Moreover, we used validated methods for the systematic review and reported the data according to standard reporting guidelines, including PRISMA and the Handbook for Diagnostic Test Accuracy Reviews published by the Cochrane Collaboration [12 41]. Another limitation is the considerable heterogeneity in our pooled analysis, which limits the general applicability of our summary estimates. To explore this heterogeneity, we performed meta-regression and multiple subgroup analyses, which identified the proportion of patients with PCa, the magnetic field strength, and the reference standard as significant factors. In particular, the reference standard included various methods, including radical prostatectomy and a combination of systematic and targeted biopsies (ie, MRI guided, MRI–transrectal ultrasound fusion, or cognitive). The various definitions used for clinically significant cancer should also be emphasized. Our meta-regression and subgroup analyses may explain some of the heterogeneity, but a portion remains unexplained. Another important limitation is the small number of studies available for head-to-head comparison between PI-RADSv1 and PI-RADSv2. Nevertheless, we were able to demonstrate a statistically significant difference in sensitivity between the two versions using the only six such studies published to date.
PI-RADSv2 shows good performance for the detection of PCa with pooled sensitivity of 0.89 and specificity of 0.73. PI-RADSv2 has higher pooled sensitivity compared with PI-RADSv1 without significantly different specificity.
Author contributions: Sang Youn Kim had full access to all the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis.
Study concept and design: Woo, Suh, S.Y. Kim.
Acquisition of data: Woo, Suh, S.Y. Kim.
Analysis and interpretation of data: Woo, Suh, S.Y. Kim.
Drafting of the manuscript: Woo.
Critical revision of the manuscript for important intellectual content: Suh, S.Y. Kim, Cho, S.H. Kim.
Statistical analysis: Suh.
Obtaining funding: None.
Administrative, technical, or material support: None.
Supervision: S.Y. Kim, Cho, S.H. Kim.
Other: None.
Financial disclosures: Sang Youn Kim certifies that all conflicts of interest, including specific financial interests and relationships and affiliations relevant to the subject matter or materials discussed in the manuscript (eg, employment/affiliation, grants or funding, consultancies, honoraria, stock ownership or options, expert testimony, royalties, or patents filed, received, or pending), are the following: None.
Funding/Support and role of the sponsor: None.
Prostate cancer (PCa) is the second leading cause of cancer-related mortality in Western men [1]. With recent technological advances and growing availability, multiparametric magnetic resonance imaging (mpMRI) is increasingly used to guide several aspects of PCa management, including detection, staging, and treatment planning [2]. Despite abundant evidence in the literature of the high accuracy of mpMRI for PCa diagnosis, its widespread acceptance has been hampered by several factors, including difficulty of interpretation, lack of standardized interpretation criteria (ie, reliance on Likert scales based on the radiologist's subjective level of suspicion for PCa), and the resulting substantial inter-reader variability [3 4].
To bring standardization to the evaluation and reporting of mpMRI of the prostate, the European Society of Urogenital Radiology (ESUR) published a guideline termed the Prostate Imaging Reporting and Data System (PI-RADS) in 2012 [5]. PI-RADS was based on expert consensus and provides a detailed scoring system for each MRI sequence (T2-weighted imaging [T2WI], diffusion-weighted imaging [DWI], dynamic contrast-enhanced MRI [DCE-MRI], and MR spectroscopy) for the presence of clinically significant PCa (csPCa). Several investigators have validated the accuracy and reproducibility of PI-RADS, and a recent meta-analysis reported pooled sensitivity and specificity of 0.78 and 0.79, respectively [6]. However, as there was no guideline for generating an overall score, research groups used various approaches for this purpose: some used the sum of the scores from each sequence (ranging from 3 to 15), whereas others used an overall score of 1–5 [7 8]. Furthermore, emerging data questioned the value of curve-type analysis of DCE-MRI [9]. In addition, investigators suggested that certain sequences may be more important than others in determining the likelihood of PCa (ie, DWI in the peripheral zone [PZ] and T2WI in the transition zone [TZ]), rather than all sequences being weighted equally [10].
To address these issues, the ESUR and the American College of Radiology recently released the updated PI-RADS version 2 (PI-RADSv2) [11]. The main changes from PI-RADSv1 to PI-RADSv2 are as follows: (1) introduction of dominant sequences according to zonal anatomy (DWI for the PZ and T2WI for the TZ), (2) limited contribution of DCE-MRI, which is assessed only as the presence or absence of early focal enhancement, and (3) generation of an overall score (1–5) integrating findings across all MRI sequences.
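The dominant-sequence logic of changes (1) and (2) can be made concrete with a short sketch. The Python function below is a simplified illustration of the PI-RADSv2 zonal assessment rules, not software used in this study: it assumes that all sequences are diagnostically adequate, it ignores the guideline's fallback rules for inadequate DWI or DCE-MRI, and the function and parameter names are hypothetical. Later revisions of PI-RADS adjust some of these rules, so the sketch reflects version 2 only.

```python
def pirads_v2_overall(zone: str, t2w: int, dwi: int, dce_positive: bool) -> int:
    """Simplified PI-RADSv2 overall assessment for one lesion (sketch).

    Assumes T2WI, DWI and DCE-MRI are all diagnostically adequate;
    the guideline's fallback rules for inadequate sequences are not modelled.
    """
    for name, score in (("t2w", t2w), ("dwi", dwi)):
        if not 1 <= score <= 5:
            raise ValueError(f"{name} score must be 1-5, got {score}")

    if zone == "PZ":
        # Peripheral zone: DWI is the dominant sequence. A positive DCE
        # (early focal enhancement) only upgrades an equivocal DWI score of 3.
        return 4 if (dwi == 3 and dce_positive) else dwi

    if zone == "TZ":
        # Transition zone: T2WI is the dominant sequence. A DWI score of 5
        # only upgrades an equivocal T2WI score of 3; DCE plays no role here.
        return 4 if (t2w == 3 and dwi == 5) else t2w

    raise ValueError(f"zone must be 'PZ' or 'TZ', got {zone!r}")


print(pirads_v2_overall("PZ", t2w=2, dwi=3, dce_positive=True))  # 4
print(pirads_v2_overall("TZ", t2w=3, dwi=3, dce_positive=True))  # 3
```

In the PZ, a lesion with a DWI score of 3 and early focal enhancement is upgraded to an overall score of 4, whereas in the TZ the same DCE finding does not change the score.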
Since then, several studies assessing the value of PI-RADSv2 have been published. However, the diagnostic performance of this new scoring system has not been evaluated systematically. Therefore, the purpose of our study was to assess the diagnostic performance of PI-RADSv2 for the detection of PCa. In addition, we aimed to compare the diagnostic performance of PI-RADSv1 and PI-RADSv2 in studies available for head-to-head comparison.
This meta-analysis was performed and written according to the preferred reporting items for systematic reviews and meta-analyses (PRISMA) guidelines [12] .
A computerized search of MEDLINE and EMBASE up to December 7, 2016, was performed in order to identify studies evaluating the diagnostic performance of PI-RADSv2 for the detection of PCa. The search query combined synonyms for PCa, MRI, and PI-RADS as follows: (prostate cancer OR prostatic cancer OR prostate neoplasm OR prostatic neoplasm OR prostate tumor OR prostatic tumor OR prostate carcinoma OR prostatic carcinoma OR PCa) AND (magnetic resonance imaging OR MRI OR MR) AND (prostate imaging reporting and data system OR pi-rads OR pi rads OR pirads). Bibliographies of identified articles were also screened to expand the scope of search. Our search was limited to publications in English.
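For reproducibility, the Boolean structure of this query (synonyms combined with OR within each concept, concepts combined with AND) can be generated programmatically. The sketch below is illustrative only: the helper name is hypothetical, and database-specific syntax such as field tags, phrase quoting, or EMBASE Emtree terms is not modelled.

```python
# Synonym groups taken from the search strategy described above.
PCA_TERMS = ["prostate cancer", "prostatic cancer", "prostate neoplasm",
             "prostatic neoplasm", "prostate tumor", "prostatic tumor",
             "prostate carcinoma", "prostatic carcinoma", "PCa"]
MRI_TERMS = ["magnetic resonance imaging", "MRI", "MR"]
PIRADS_TERMS = ["prostate imaging reporting and data system", "pi-rads",
                "pi rads", "pirads"]


def build_query(*groups):
    """OR the synonyms within each group, then AND the groups together."""
    return " AND ".join("(" + " OR ".join(group) + ")" for group in groups)


query = build_query(PCA_TERMS, MRI_TERMS, PIRADS_TERMS)
print(query)
```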
Studies were included if they satisfied all the following requirements according to the PICOS criteria [12] : (1) included patients with suspected or diagnosed PCa; (2) for index test, mpMRI of the prostate including all required sequences of T2WI, DWI, and DCE-MRI was performed and assessed with a PI-RADSv2 scoring system; (3) for comparison, a reference standard based on the histopathological examination of radical prostatectomy or biopsy was used; (4) results were reported in sufficient detail for the reconstruction of 2 × 2 tables and determination of sensitivity and specificity at specified cutoff values for evaluating the diagnostic performance of PI-RADSv2; and (5) studies had to be original articles.
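Criterion (4) requires that a 2 × 2 table can be reconstructed at a given PI-RADSv2 cutoff. The sketch below shows how scores are dichotomized (a score at or above the cutoff is treated as test-positive) and how sensitivity and specificity follow from the resulting table. The scores and outcomes are hypothetical toy data, not data from any included study, and the class and function names are illustrative.

```python
from dataclasses import dataclass


@dataclass
class TwoByTwo:
    tp: int  # true positives
    fp: int  # false positives
    fn: int  # false negatives
    tn: int  # true negatives

    @property
    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)


def two_by_two(scores, has_cancer, cutoff):
    """Cross-tabulate PI-RADS scores against the reference standard.

    A score >= cutoff is treated as test-positive.
    """
    if len(scores) != len(has_cancer):
        raise ValueError("scores and has_cancer must have the same length")
    tp = fp = fn = tn = 0
    for score, cancer in zip(scores, has_cancer):
        positive = score >= cutoff
        if positive and cancer:
            tp += 1
        elif positive:
            fp += 1
        elif cancer:
            fn += 1
        else:
            tn += 1
    return TwoByTwo(tp, fp, fn, tn)


# Hypothetical toy data: one PI-RADSv2 score and one reference result per patient.
scores = [5, 4, 3, 2, 4, 3, 1, 5]
has_cancer = [True, True, True, False, False, False, False, True]

for cutoff in (4, 3):
    t = two_by_two(scores, has_cancer, cutoff)
    print(f">={cutoff}: {t}, sens={t.sensitivity:.2f}, spec={t.specificity:.2f}")
```

With these toy data, lowering the cutoff from ≥4 to ≥3 raises sensitivity (0.75 to 1.00) and lowers specificity (0.75 to 0.50), the same direction of trade-off as between the two cutoffs in the pooled results.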
Studies were excluded if any of the following criteria were met: (1) studies involving <10 patients; (2) review articles, guidelines, consensus statements, letters, editorials, and conference abstracts; (3) studies using only PI-RADSv1 for the evaluation of mpMRI of the prostate; (4) studies focusing on topics other than the use of PI-RADSv2 for diagnosing PCa (eg, staging and prediction of biochemical recurrence); and (5) studies with overlapping patient populations.
Two reviewers (S.W. and C.H.S., with 3 yr of experience in performing systematic reviews and meta-analyses) independently evaluated the eligibility of the selected studies from the literature. Disagreements, if present between the two reviewers, were resolved by consensus via discussion with a third reviewer (S.Y.K.).
We extracted the following data regarding study design and results from the selected studies using a standardized form:
Study characteristics: authors, year of publication, country and institution of origin, duration of patient recruitment, and study design (prospective vs retrospective and consecutive or not)
Demographic and clinical characteristics: sample size, number of patients with PCa, patient age, prostate-specific antigen (PSA) level and Gleason score, number of previous biopsies, and PCa diagnosis prior to mpMRI
Technical characteristics of mpMRI: scanner model and manufacturer, magnetic field strength (1.5 vs 3 T), coil type (endorectal vs pelvic phased array), and specific sequences used (T2WI, DWI, DCE, or MR spectroscopy)
Interpretation of mpMRI: number of reviewers and experience in prostate mpMRI, independent or consensus reading, and blinding to clinicopathological information
Reference standard: type of reference standard (radical prostatectomy, targeted biopsy, or systematic biopsy), interval between MRI and pathology, outcomes assessed (any PCa vs csPCa), definition of csPCa (studies assessing “clinically significant”, “aggressive”, or “high-grade” PCa were all considered to assess csPCa; however, only studies that used the definition provided by the PI-RADSv2 guideline [Gleason score ≥7 (including 3 + 4), volume ≥0.5 ml, or extraprostatic extension] were considered to have no concern for applicability), separate analysis for the PZ and TZ, and type of analysis (per patient vs per lesion)
Diagnostic performance of PI-RADSv2: results for each reported criterion or cutoff value (in case of multiple readers, the results of the most experienced reader were extracted for this meta-analysis)
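Because the definition of csPCa varied across studies, it may help to express the PI-RADSv2 definition (Gleason score ≥7 including 3 + 4, and/or volume ≥0.5 ml, and/or extraprostatic extension) as a predicate and contrast it with the stricter Gleason threshold reported by some included studies (GS ≥7 [4 + 3]). This is an illustrative sketch: the function names are hypothetical, and only Gleason patterns, tumor volume, and extraprostatic extension are considered.

```python
def is_cspca_pirads_v2(gleason_primary, gleason_secondary, volume_ml, epe):
    """PI-RADSv2 definition: Gleason score >= 7 (including 3 + 4),
    and/or tumor volume >= 0.5 ml, and/or extraprostatic extension."""
    gleason_score = gleason_primary + gleason_secondary
    return gleason_score >= 7 or volume_ml >= 0.5 or epe


def is_gs_4_plus_3_or_higher(gleason_primary, gleason_secondary):
    """Stricter threshold: Gleason 4 + 3 or any score >= 8 (3 + 4 excluded)."""
    gleason_score = gleason_primary + gleason_secondary
    return gleason_score >= 8 or (gleason_score == 7 and gleason_primary >= 4)


# A small Gleason 3 + 4 tumor counts as csPCa under PI-RADSv2
# but not under the stricter GS >= 7 (4 + 3) threshold.
print(is_cspca_pirads_v2(3, 4, volume_ml=0.2, epe=False))  # True
print(is_gs_4_plus_3_or_higher(3, 4))                       # False
```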
The methodological quality of the included studies was assessed using tailored questionnaires and criteria provided by Quality Assessment of Diagnostic Accuracy Studies-2 [13] . Data extraction and quality assessment were performed independently by two reviewers (S.W. and C.H.S.). All disagreements were resolved by consensus through discussion with the third reviewer (S.Y.K.).
The diagnostic performance of PI-RADSv2 for the detection of PCa was the primary outcome for this meta-analysis. In addition, a comparison between the diagnostic performance of PI-RADSv2 and that of PI-RADSv1 using studies that reported head-to-head comparison data of the two PI-RADS versions was considered a secondary outcome.
Pooled estimates of sensitivity and specificity were calculated using hierarchical logistic regression modeling including bivariate modeling and hierarchical summary receiver operating characteristic (HSROC) modeling [14] . For graphical presentation of the results, an HSROC curve with 95% confidence region and prediction region was plotted. Publication bias was evaluated using the Deeks’ funnel plot, and statistical significance was tested with the Deeks’ asymmetry test [15] .
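The bivariate and HSROC models used here (via the metandi, midas, and mada tools) jointly model sensitivity and specificity with correlated random effects, which is beyond a short example. As a simpler, swapped-in illustration of random-effects pooling, the sketch below applies the univariate DerSimonian–Laird method to proportions on the logit scale. It pools sensitivity (true positives / diseased) or specificity (true negatives / non-diseased) separately, ignores the correlation between them, and would not reproduce the estimates reported in this study; the function names and example counts are hypothetical.

```python
import math


def logit(p):
    return math.log(p / (1 - p))


def inv_logit(x):
    return 1 / (1 + math.exp(-x))


def pool_logit_dl(events, totals):
    """DerSimonian-Laird random-effects pooling of proportions (logit scale).

    Returns (pooled proportion, 95% CI low, 95% CI high, tau^2, I^2 in %).
    """
    ys, vs = [], []
    for x, n in zip(events, totals):
        if x == 0 or x == n:               # continuity correction for 0% or 100%
            x, n = x + 0.5, n + 1.0
        ys.append(logit(x / n))
        vs.append(1 / x + 1 / (n - x))     # approximate variance of logit(p)

    w = [1 / v for v in vs]                # fixed-effect (inverse-variance) weights
    y_fixed = sum(wi * yi for wi, yi in zip(w, ys)) / sum(w)
    q = sum(wi * (yi - y_fixed) ** 2 for wi, yi in zip(w, ys))  # Cochran's Q
    df = len(ys) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0             # between-study variance

    w_re = [1 / (v + tau2) for v in vs]    # random-effects weights
    y_re = sum(wi * yi for wi, yi in zip(w_re, ys)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return (inv_logit(y_re), inv_logit(y_re - 1.96 * se),
            inv_logit(y_re + 1.96 * se), tau2, i2)


# Hypothetical true positives / diseased patients in three studies.
print(pool_logit_dl([45, 80, 27], [50, 90, 30]))
```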
We performed meta-regression analyses to investigate the cause of heterogeneity. The following covariates were considered for the bivariate model: (1) proportion of patients with PCa (>50% vs ≤50%), (2) magnet strength of MRI (3 vs 1.5 T), (3) use of endorectal coil, (4) cutoff value (≥4 vs ≥3), (5) reference standard (radical prostatectomy vs biopsy), and (6) type of analysis (per patient vs per lesion). In addition, multiple subgroup analyses were performed for cutoff value, outcome, and previous biopsy history to assess various clinical settings: (1) a cutoff value of ≥4 for all studies, (2) a cutoff value of ≥3 for all studies, (3) a cutoff value of ≥4 for determining any PCa, (4) a cutoff value of ≥3 for determining any PCa, (5) a cutoff value of ≥4 for determining csPCa, (6) a cutoff value of ≥3 for determining csPCa, (7) a cutoff value of ≥4 in studies using per-patient analysis, (8) a cutoff value of ≥4 in studies using per-lesion analysis, (9) studies analyzing PZ PCa, (10) studies analyzing TZ PCa, (11) patients without previous biopsies, and (12) patients with previous biopsies. The “metandi” and “midas” modules in Stata 10.0 (StataCorp LP, College Station, TX, USA) and “mada” package in R software version 3.2.1 (R Foundation for Statistical Computing, Vienna, Austria) were used for statistical analyses, with p < 0.05 signifying statistical significance.
A systematic literature search initially identified 287 articles. After removal of 46 duplicates, screening of the remaining 241 titles and abstracts yielded 105 potentially eligible articles. After full-text review, 84 studies were excluded for the following reasons: not in the field of interest (n = 80, including 68 studies that used only PI-RADSv1), insufficient data to reconstruct 2 × 2 tables (n = 2), and study population shared with other studies (n = 2). Ultimately, 21 original articles including a total of 3857 patients assessing the diagnostic performance of PI-RADSv2 were included in the meta-analysis [16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36]. No additional studies were identified by screening the bibliographies of these 21 studies. Among them, 15 studies including 3099 patients dealt with PI-RADSv2 alone, whereas six studies including 758 patients provided a head-to-head comparison between PI-RADSv1 and PI-RADSv2 [16 20 21 28 32 33]. The detailed study selection process is described in Fig. 1.
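The selection counts above can be cross-checked with simple arithmetic. The snippet below recomputes the screening totals from the reported numbers and the overall and head-to-head patient totals from the per-study sample sizes listed in Table 1; the variable names are illustrative.

```python
# Study-selection counts reported in the text.
identified, duplicates, full_text_reviewed = 287, 46, 105
excluded = {"not in field of interest": 80,
            "insufficient data for 2 x 2 tables": 2,
            "shared study population": 2}

screened = identified - duplicates
included = full_text_reviewed - sum(excluded.values())

# Sample sizes from Table 1, keyed by reference number.
patients = {16: 50, 17: 54, 18: 245, 19: 55, 20: 401, 21: 82, 22: 49,
            23: 157, 24: 62, 25: 94, 26: 456, 27: 425, 28: 65, 29: 312,
            30: 343, 31: 82, 32: 106, 33: 54, 34: 288, 35: 105, 36: 372}
head_to_head_refs = {16, 20, 21, 28, 32, 33}

total_patients = sum(patients.values())
head_to_head_patients = sum(patients[r] for r in head_to_head_refs)

print(screened, included, len(patients), total_patients,
      head_to_head_patients, total_patients - head_to_head_patients)
# 241 21 21 3857 758 3099
```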
Patient characteristics are shown in Table 1 . The size of the study population ranged from 49 to 456 patients, with the percentage of those with PCa ranging from 37% to 100%. The patients had a median age of 62–69.6 yr, median PSA of 3.97–15 ng/ml, and a Gleason score ranging from 5 to 10. Patients had already been diagnosed with PCa prior to MRI in all or some of the study populations in seven studies [16 17 22 26 27 32 35] . Biopsy was performed before MRI in seven studies [16 21 22 23 26 27 35] , all patients were biopsy-naïve in three studies [18 25 34] , both patient types were included in three studies [28 32 33] , and data regarding previous biopsy were not reported in the remaining eight studies.
First author (year) | Country | Institution | Duration of patient recruitment | Patients (n) | Patients with PCa (n) | Age, median (yr) | Age, range (yr) | PSA, median (ng/ml) | PSA, range (ng/ml) | Gleason score, median | Gleason score, range | No. of previous biopsies | PCa diagnosis before MRI |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Auer (2016) [16] | Austria | Medical University of Innsbruck, Medizinische Hochschule Hannover | NR | 50 | NR | 63 a | NR | 7.3 a | NR | 7 (3 + 4) | 6–10 | ≥1 | Yes (all) |
Baldisserotto (2016) [17] | Brazil | Pontifícia Universidade Católica do Rio Grande do Sul | 2013.6–2015.6 | 54 | 33 | 65.9 a | 53–81 | 8.4 a | 3–31 | 7 (3 + 4) | 6–9 | NR | Yes (some) |
De Visschere (2017) [18] | Belgium | Ghent University Hospital | 2011.5–2014.12 | 245 | 144 | 66 | 44–85 | 9 | 1.4–935.5 | 7 (3 + 4) | ≥6 | 0 | No |
El-Samei (2016) [19] | Egypt | Al-Azher University, El-Minia University | 2014.5–2015.10 | 55 | 38 | 62 a | 51–79 | NR | NR | 7 | 6–10 | NR | No |
Feng (2016) [20] | China | Tongji Hospital | 2013.6–2015.7 | 401 | 150 | 64.4 a | 34–88 | 10.7 | 0.2–1763 | 7 (4 + 3) | ≤6–≥8 | NR | NR |
Kasel-Seibert (2016) [21] | Germany | University Hospital Jena | 2013.7–2015.3 | 82 | 31 | 65 | 48–81 | 13 | 1–111 | 7 (3 + 4) | 6–9 | 1–5 | No |
Lin (2016) [22] | Brazil, Taiwan | Ribeirão Preto School of Medicine, China Medical University, Ribeirão Preto School of Medicine | 2011.5–2014.6 | 49 | 49 | 63 | 46–73 | 13.27 | 1.8–41.4 | | | ≥1 | Yes (all) |
Martorana (2016) [23] | Italy | University of Modena and Reggio Emilia, Perugia University | 2014.1–2016.2 | 157 | 79 | 65 | 47–79 b | 10.7 | 1–75 | 6 | 6–8 | 1–3 | No |
Mertan (2016) [24] | USA | National Cancer Institute | 2015.3–2015.9 | 62 | 38 | 65.5 | 50.3–76.6 | 7.1 | 0.5–863 | 7 (3 + 4) | 6–9 | NR | NR |
Muller (2015) [25] | USA, the Netherlands | National Cancer Institute, AMC University Hospital, Edward Hébert School of Medicine | 2011.12–2014.5 | 94 | 94 | 62 | 37–79 | 8.51 | 0.7–51.1 | 7 (3 + 4) | 6–10 | 0 | No |
Park (2016) [26] | South Korea | Samsung Medical Center | 2012.1–2014.12 | 456 | 456 | 65 | 46–81 | 3.97 | 3.8–4.2 | NR | ≥6 | ≥1 | Yes (all) |
Park (2016) [27] | South Korea | Yonsei University College of Medicine | 2012.1–2013.3 | 425 | 425 | NR | NR | NR | 1.4–156.9 | 5–6 | 5–10 | ≥1 | Yes (all) |
Polanec (2016) [28] | Austria | Medical University of Vienna, Confraternität Vienna | 2011.6–2015.9 | 65 | NR | 65.3 a | 62.3–87.4 | 10.8 a | 4.2–74.5 | 7 (4 + 3) | 6–9 | ≥0 | No |
Rastinehad (2015) [29] | USA | Icahn School of Medicine at Mount Sinai, Fox Chase Cancer Center, National Institutes of Health, Hofstra North Shore LIJ School of Medicine | 2012.2–2014.11 | 312 | 202 | 65.1 | 60.3–70.3 b | 7.3 | 5.0–11.4 b | NR | NR | NR | No |
Rosenkrantz (2016) [30] | USA | NYU Langone Medical Center | 2013.9–2015.2 | 343 | 134 | 64 | NR | 5.8 | NR | 7 (3 + 4) | 6–9 | NR | NR |
Stanzione (2016) [31] | Italy | University “Federico II”, Ospedale S. Maria delle Grazie | NR | 82 | 34 | 65 a | 43–84 | 8.8 a | NR | 7 (3 + 4) | 6–9 | NR | No |
Tan (2016) [32] | USA, Taiwan | UCLA, Cathay General Hospital, Hoag Hospital | 2013.3–2016.12 | 106 | 63 | 66.5 | 43–79 | 7.9 | 5.6–10.6 | 7 (3 + 4) | 6–10 | ≥0 | Yes (some) |
Tewes (2016) [33] | Germany | Hannover Medical School, Klinikum der Region Hannover | 2012.12–2014.12 | 54 | 31 | 69.6 a | NR | 8.7 a | NR | 6 | 6–9 | ≥0 | No |
Washino (2017) [34] | Japan | Jichi Medical University Saitama Medical Center | 2010.6–2014.4 | 288 | 159 | 69 | 64–74 b | 7.5 | 5.5–11 b | NR | NR | 0 | No |
Woo (2016) [35] | South Korea | Seoul National University College of Medicine | 2011.1–2013.12 | 105 | 105 | 69 | 49–79 | 8.22 | 0.9–44.2 | 7 | 6–7 | ≥1 | Yes (all) |
Zhao (2016) [36] | China | Peking University First Hospital | 2010.11–2013.12 | 372 | 185 | 68.5 a | NR | 15 a | NR | 7 | 6–10 | NR | No |
Characteristics of the studies are summarized in Table 2. MRI was performed using 3-T scanners in 16 studies [16 17 18 20 24 25 26 27 28 29 30 31 32 33 35 36], 1.5-T scanners in four studies [19 21 22 23], and either 3- or 1.5-T scanners in one study [34]. Endorectal coils were used in seven studies [19 21 22 23 24 25 29]. In all studies, the mpMRI protocol consisted of T2WI, DWI, and DCE-MRI. The reference standard was radical prostatectomy in five studies [16 22 26 27 35], a combination of systematic and targeted biopsies in seven studies [17 20 23 25 29 34 36], and targeted biopsy only in seven studies [19 21 24 28 30 32 33]; the reference standard was not consistent across the study population in two studies [18 31]. PI-RADSv2 scoring was performed by one to five radiologists, either in consensus or independently. Reader experience was heterogeneous, with the most experienced reader in each study having 4 to 22 yr of experience in prostate MRI. In most studies, the readers were blinded; however, in three studies, the radiologists were aware that the patients had biopsy-proven PCa [22 26 35], and five studies were not explicit regarding blinding [18 19 30 32 36]. In the majority of studies, the interval between MRI and the reference standard was less than 6 mo; however, this interval was not reported in 10 studies [16 17 18 24 27 28 29 30 32 33]. PCa was assessed separately according to zonal anatomy in seven studies [20 24 25 28 30 31 33]. However, in one of these studies [31], only PCa in the PZ could be evaluated, as no detailed data were provided in the article and our attempt to contact the authors for further information was unsuccessful. Regarding the outcome assessed, eight studies evaluated any cancer [17 19 20 23 25 28 31 33], eight evaluated clinically significant cancer [18 22 26 27 30 34 35 36], and five evaluated both [16 21 24 29 32].
With regard to cutoff values, 13 studies used ≥4 [16 18 19 20 21 24 25 26 27 28 30 31 35], four studies used ≥3 [32 33 34 36], and four studies used both [17 22 23 29]. The location of PCa was reported separately for the PZ and TZ in six studies [20 24 25 28 30 33].
First author (year) | Design | Consecutive enrollment | Reference standard | MRI–reference standard interval | No. of readers | Experience (yr) | Blinding | Magnet strength (T) | Vendor | Model | Endorectal coil | PI-RADSv2 application | Cutoff values | Localization | Type of analysis | Outcome assessed | Definition of csPCa |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Auer (2016) [16] | Retrospective | Yes | RP | NR | 1 | >5 | Yes | 3 | Siemens | Skyra | No | Strict | 4 | Whole | Lesion | Any + csPCa | GS ≥7 (4 + 3) |
Baldisserotto (2016) [17] | Retrospective | Yes | STRUSGB + targeted MRI–TRUS biopsy (cognitive) or RP | NR | 2 Independent | >10/1 | Yes | 3 | GE | SignaHDxt | No | Strict | 3, 4 | Whole | Patient | Any | |
De Visschere (2017) [18] | Retrospective | NR | STRUSGB or RP | NR | NR | NR | NR | 3 | Siemens | Trio | No | Strict | 4 | Whole | Patient | csPCa | GS ≥7, ≥0.5 cc, or EPE |
El-Samei (2016) [19] | NR | NR | Targeted MRI–TRUS biopsy | ≤2 wk | NR | NR | NR | 1.5 | Philips | Gyroscan | Yes | Strict | 4 | Whole | Lesion | Any | |
Feng (2016) [20] | Retrospective | Yes | STRUSGB + targeted MRI–TRUS biopsy (cognitive) | 1–35 d | 2 Consensus | 5/4 | Yes | 3 | Siemens | Skyra | No | Strict | 4 | PZ, TZ | Patient | Any | |
Kasel-Seibert (2016) [21] | Retrospective | NR | Targeted MRGB | ≤7 d | 2 Independent | 10/<1 | Yes | 1.5 | Siemens | Avanto | Yes | Strict | 4 | Whole | Lesion | Any + csPCa | GS ≥7 |
Lin (2016) [22] | Retrospective | NR | RP | ≤6 mo | 2 Independent | 12/5 | Yes a | 1.5 | Philips | Achieva | Yes | Strict | 3, 4 | Whole | Lesion | csPCa | GS ≥7 and >0.5 cc |
Martorana (2016) [23] | Retrospective | Yes | TTB + targeted MRI–TRUS biopsy | 30–78 | 2 Consensus | 4/4 | Yes | 1.5 | Philips | Achieva | Yes | Strict | 3, 4 | Whole | Lesion | Any | GS ≥7, >0.5 cc, or EPE |
Mertan (2016) [24] | Prospective | Yes | STRUSGB + targeted MRI–TRUS biopsy | NR | 1 | >8 | Yes | 3 | Philips | Achieva | Yes | Strict | 4 | PZ, TZ | Lesion | Any + csPCa | GS ≥7 |
Muller (2015) [25] | Retrospective | Yes | STRUSGB + targeted MRI–TRUS biopsy | ≤6 wk | 5 Independent | 12/7/1/1/0.5 | Yes | 3 | Philips | Achieva | Yes | Strict | 4 | PZ, TZ | Lesion | Any | GS ≥7 (4 + 3) |
Park (2016) [26] | Retrospective | Yes | RP | 21–48 | 2 Independent | 14/3 | Yes a | 3 | Philips | Achieva | No | Strict | 4 | Whole | Patient | csPCa | GS ≥7, ≥0.5 cc, or EPE |
Park (2016) [27] | Retrospective | NR | RP | NR | 2 Independent | 9/4 | Yes | 3 | GE, Philips, Siemens | DiscoveryMR750, Achieva, Trio | No | Strict | 4 | Whole | Patient | csPCa | GS ≥7, ≥0.5 cc, or EPE |
Polanec (2016) [28] | Retrospective | Yes | Targeted MRGB | NR | 2 Independent | NR/NR | Yes | 3 | Siemens | Trio | No | Strict | 4 | PZ, TZ | Lesion | Any | |
Rastinehad (2015) [29] | Prospective | NR | STRUSGB + targeted MRI–TRUS biopsy | NR | 3 Consensus | NR | Yes | 3 | Siemens | Verio | Yes | Calculated b | 3, 4 | Whole | Patient | Any + csPCa | Epstein's criteria c for systematic and GS ≥7 or >0.5 cc for targeted biopsy |
Rosenkrantz (2016) [30] | Retrospective | NR | Targeted MRI–TRUS biopsy | NR | 2 Independent | 8/3 | NR | 3 | Siemens | Biograph mMR/Prisma/Skyra/Trio | No | Strict | 4 | PZ, TZ | Lesion | csPCa | GS ≥7 |
Stanzione (2016) [31] | Prospective | NR | STRUSGB (±targeted TRUSGB) | 20–30 d | 2 Independent | 14/10 | Yes | 3 | Siemens | Trio | No | Strict | 4 | PZ | Patient | Any | |
Tan (2016) [32] | Retrospective | Yes | Targeted MRGB | NR | 3 Consensus | 16/10 | NR | 3 | Siemens | Skyra, Trio, Verio | No | Strict | 3 | Whole | Lesion | Any + csPCa | GS ≥7 |
Tewes (2016) [33] | Retrospective | Yes | Targeted MRGB | NR | 2 Independent | 5/2 | Yes | 3 | Siemens | Skyra | No | Strict | 3 d | PZ, TZ | Patient | Any | |
Washino (2017) [34] | Retrospective | NR | TTB + targeted MRI–TRUS biopsy (cognitive) | 0.5–1.6 mo | 1 | 14 | Yes | 1.5 or 3 | Toshiba | Excelart Vantage/Vantage Titan 3T | No | Strict | 3 | Whole | Patient | csPCa | GS ≥7 or maximum core length ≥4 mm |
Woo (2016) [35] | Retrospective | NR | RP | ≤6 mo | 2 Consensus | 22/10 | Yes a | 3 | Philips, Siemens | Ingenia, Verio/Trio | No | Strict | 4 | Whole | Patient | csPCa | GS ≥7 |
Zhao (2016) [36] | Retrospective | NR | STRUSGB + targeted MRI–TRUS biopsy | ≤3 mo | 2 Independent | NR | NR | 3 | NR | NR | No | Strict | 3 | Whole | Patient | csPCa | GS ≥7 |
a Blinded but aware that patients had biopsy-proven PCa.
b PI-RADSv2 scores were generated from reports based on PI-RADSv1 and simplified qualitative system.
c Epstein's criteria = Gleason pattern ≥4, or Gleason 3 + 3 disease with core length ≥50% and/or >2 cores positive.
d For transition zone, cutoff value = 4.
csPCa = clinically significant prostate cancer; GS = Gleason score; EPE = extraprostatic extension; MRGB = magnetic resonance imaging-guided biopsy; MRI = magnetic resonance imaging; MRI–TRUS = fusion of magnetic resonance imaging and transrectal ultrasound images; NR = not reported; PCa = prostate cancer; PI-RADSv2 = Prostate Imaging Reporting and Data System version 2; PZ = peripheral zone; RP = radical prostatectomy; STRUSGB = systematic transrectal ultrasound-guided biopsy; TTB = transperineal template biopsy; TZ = transition zone.
Overall, the quality of the studies was not considered high, mainly because of the patient selection domain (Fig. 2). Regarding patient selection, there was generally a high risk of bias, as all but four of the studies were retrospective [16 17 18 20 21 22 23 25 26 27 28 30 32 33 34 35 36]. Seven studies were considered to have high concern for applicability, as all or some of the patients had a pathological diagnosis of PCa prior to MRI [16 17 22 26 27 32 35]. Regarding the index test domain, there was a high risk of bias in nine studies. In three of these, reviewers were aware that patients had biopsy-proven PCa [22 26 35]. In the other six, the cutoff value for determining PCa was not specified prior to interpretation [16 20 29 31 33 34]. Only one study had concern for applicability, as PI-RADSv2 scores were indirectly generated from existing clinical radiology reports based on PI-RADSv1 or an in-house scoring system [29]. Regarding the reference standard domain, eight studies had a high risk of bias. Seven were based on either systematic biopsy only or targeted biopsy only [18 19 21 28 30 32 33]; in one study, targeted biopsy was performed, but on lesions that were suspicious on ultrasonography rather than on MRI [31].
Studies in which radical prostatectomy or systematic plus targeted biopsy (MRI guided, MRI–transrectal ultrasound fusion, or cognitive) was used as the reference standard were considered to have a low risk of bias. In 10 studies, the definition of clinically significant cancer did not follow that described in the PI-RADSv2 guidelines, and these studies therefore showed high concern for applicability [16 21 22 24 29 30 32 34 35 36]. Regarding the flow and timing domain, two studies had a high risk of bias, as not all patients received the same reference standard [17 18].
The sensitivity and specificity of individual studies were 73–100% and 8–100%, respectively. The Q-test revealed substantial heterogeneity (p < 0.001). The Higgins I² statistic demonstrated substantial heterogeneity in sensitivity (I² = 85.55%) and considerable heterogeneity in specificity (I² = 95.30%). The coupled forest plot of sensitivity and specificity demonstrated the absence of a threshold effect (Fig. 3). The Spearman correlation coefficient between sensitivity and the false-positive rate was 0.45 (95% confidence interval [CI] 0.023–0.738), also indicating the lack of a threshold effect.
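The threshold-effect check above relies on the Spearman rank correlation between per-study sensitivity and false-positive rate (1 − specificity); the direction and strength of this correlation are used to judge whether a threshold effect is present. A minimal, dependency-free sketch is shown below. It computes only the point estimate (no confidence interval), handles ties with average ranks, and uses hypothetical function names and invented three-study example values.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman's rho as the Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Invented per-study estimates for illustration only.
sensitivity = [0.90, 0.85, 0.95]
specificity = [0.70, 0.80, 0.50]
false_positive_rate = [1 - s for s in specificity]
print(spearman(sensitivity, false_positive_rate))
```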
For all 21 studies combined, the pooled sensitivity was 0.89 (95% CI 0.86–0.92) with a specificity of 0.73 (95% CI 0.60–0.83; Fig. 3). In the HSROC curve, the 95% prediction region was considerably wider than the 95% confidence region, indicating heterogeneity between the studies (Fig. 4). The area under the HSROC curve was 0.91 (95% CI 0.88–0.93). According to the Deeks’ funnel plot, the likelihood of publication bias was low, with a p value of 0.75 for the slope coefficient (Fig. 5).
For the six studies that provided a head-to-head comparison between PI-RADSv1 and PI-RADSv2, PI-RADSv2 demonstrated higher pooled sensitivity of 0.95 (95% CI 0.85–0.98) compared with 0.88 (95% CI 0.80–0.93) for PI-RADSv1 ( p = 0.04). However, the pooled specificity did not show a significant difference between the two versions of PI-RADS: 0.73 (95% CI 0.47–0.89) for v2 and 0.75 (95% CI 0.36–0.94) for v1 ( p = 0.90).
As we found considerable heterogeneity among the included studies, meta-regression analyses were performed (Supplementary Table 1). Among several potential variables, the proportion of patients with PCa, magnetic field strength, and reference standard were significant factors affecting heterogeneity (p < 0.01 for all). However, among these three, only specificity according to the proportion of patients with PCa showed a difference that was both statistically significant and clinically meaningful: 0.65 (95% CI 0.52–0.78) in studies with ≤50% of patients with PCa versus 0.86 (95% CI 0.75–0.97) in studies with >50% of patients with PCa. Otherwise, no clinically meaningful differences were seen. For magnet strength (3 vs 1.5 T), sensitivity was 0.90 (95% CI 0.86–0.94) versus 0.89 (95% CI 0.81–0.97; p = 0.03) and specificity was 0.73 (95% CI 0.59–0.86) versus 0.72 (95% CI 0.44–1.00; p = 0.81). For reference standard (radical prostatectomy vs biopsy), sensitivity was 0.89 (95% CI 0.83–0.95) versus 0.91 (95% CI 0.88–0.94; p < 0.01) and specificity was 0.65 (95% CI 0.37–0.94) versus 0.73 (95% CI 0.58–0.87; p = 0.48). Other variables, including the cutoff value, use of an endorectal coil, and type of analysis (per patient vs per lesion), were not significant factors (p = 0.32–0.70).
As four studies used both ≥3 and ≥4 as cutoff values [17 22 23 29] and five studies assessed both any cancer and clinically significant cancer as outcomes [16 21 24 29 32], multiple subgroup analyses were performed to assess various clinical settings (Supplementary Table 2). Regarding cutoff values, the pooled sensitivity was 0.89 (95% CI 0.84–0.92) with a specificity of 0.74 (95% CI 0.58–0.85) for 17 studies using ≥4 [16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 35], whereas these values were 0.95 (95% CI 0.89–0.97) and 0.47 (95% CI 0.21–0.74) for eight studies using ≥3 [17 22 23 29 32 33 34 36]. Stratifying studies according to the outcome assessed yielded the following results:
(1) a cutoff of ≥4 for determining any PCa gave a sensitivity of 0.89 (95% CI 0.83–0.93) with a specificity of 0.80 (95% CI 0.62–0.90);
(2) a cutoff of ≥3 for determining any PCa gave a sensitivity of 0.96 (95% CI 0.93–0.98) with a specificity of 0.49 (95% CI 0.29–0.70);
(3) determining csPCa regardless of cutoff value gave a sensitivity of 0.89 (95% CI 0.84–0.92) with a specificity of 0.64 (95% CI 0.46–0.78);
(4) a cutoff of ≥4 for determining csPCa gave a sensitivity of 0.90 (95% CI 0.85–0.94) with a specificity of 0.62 (95% CI 0.45–0.77); and
(5) a cutoff of ≥3 for determining csPCa gave a sensitivity of 0.96 (95% CI 0.87–0.99) with a specificity of 0.29 (95% CI 0.05–0.77).
When the studies using a cutoff value of ≥4 were assessed separately according to the type of analysis, per-patient analysis in eight studies [17 18 20 26 27 29 31 35] yielded a pooled sensitivity of 0.89 (95% CI 0.81–0.93) with a specificity of 0.76 (95% CI 0.60–0.88), whereas per-lesion analysis in nine studies [16 19 21 22 23 24 25 28 30] yielded a pooled sensitivity of 0.87 (95% CI 0.83–0.91) with a specificity of 0.70 (95% CI 0.44–0.88).
By location, the pooled sensitivity was 0.93 (95% CI 0.87–0.96) with a specificity of 0.68 (95% CI 0.43–0.86) in seven studies analyzing PZ cancers [20 24 25 28 30 31 33]. For TZ cancers, in the same studies excluding that by Stanzione et al [31], the pooled sensitivity and specificity were 0.88 (95% CI 0.77–0.94) and 0.75 (95% CI 0.59–0.86), respectively.
Studies including patients without previous biopsies yielded sensitivity of 0.82 (95% CI 0.72–0.90) and specificity of 0.75 (95% CI 0.65–0.83), whereas the diagnostic performance values were 0.87 (95% CI 0.80–0.92) and 0.71 (95% CI 0.42–0.89) in studies including patients with a history of previous biopsy.
In our meta-analysis, we assessed the diagnostic accuracy of PI-RADSv2 for detecting PCa. The pooled sensitivity and specificity of all 21 studies were 0.89 (95% CI 0.86–0.92) and 0.73 (95% CI 0.60–0.83), respectively. Compared with the only two existing meta-analyses of mpMRI for detecting PCa, our data suggest a trend toward higher sensitivity and lower specificity. In the study by de Rooij et al [37], which evaluated seven studies using a combination of T2WI, DWI, and DCE-MRI, the pooled sensitivity and specificity were 0.74 (95% CI 0.66–0.81) and 0.88 (95% CI 0.82–0.92), respectively. In a more recent meta-analysis by Hamoen et al [6], which analyzed 14 studies using PI-RADSv1, the pooled sensitivity and specificity were 0.78 (95% CI 0.70–0.84) and 0.79 (95% CI 0.68–0.86), respectively. However, a comparison across these three meta-analyses is only indirect. To address this issue, we separately assessed the subgroup of studies that used both PI-RADSv1 and PI-RADSv2. In this head-to-head comparison, PI-RADSv2 demonstrated higher pooled sensitivity (0.95) than PI-RADSv1 (0.88; p = 0.04) without a statistically significant difference in specificity (0.73 vs 0.75; p = 0.90). This gain in sensitivity over its predecessor suggests that the revisions made for PI-RADSv2 were, in fact, on the right track. These revisions included dominant sequences according to zonal anatomy, a limited contribution of DCE-MRI secondary to DWI and T2WI, and specific guidelines for deriving an integrated overall score. In particular, we speculate that the use of dominant sequences, that is, DWI for the PZ and T2WI for the TZ, may have been crucial for the improved sensitivity without a loss in specificity, as suggested by Baur et al [10].
One of the main intentions behind PI-RADS was to standardize the reporting of mpMRI, in order to decrease variability and promote widespread acceptance and implementation in daily practice. It was therefore encouraging to find that nearly all (20 of 21) studies applied PI-RADSv2 strictly according to the published guidelines [11]. Only one study derived PI-RADSv2 scores from existing clinical radiology reports that were based on PI-RADSv1 or an in-house scoring system [29]. This is an improvement compared with prior studies using PI-RADSv1, in which investigators used varying methods to determine the overall score (an overall five-point score or the sum of the scores from each modality) [6]. Still, the cutoff value for detecting PCa needs further clarification. Among the studies included in our meta-analysis, cutoff values were predefined in only six, while the majority (15/21) were exploratory, testing multiple criteria. With a cutoff value of ≥4, both sensitivity (0.89) and specificity (0.74) were generally good, whereas a cutoff of ≥3 yielded excellent sensitivity (0.95) but poor specificity (0.47). These results may be taken into consideration when developing the next update of PI-RADS. For instance, a cutoff of ≥4 may be adequate for general use, whereas ≥3 could be reserved for situations in which a higher cancer detection rate is clinically required (eg, a persistently high PSA level despite a previous negative biopsy).
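The trade-off between the two cutoffs follows directly from how PI-RADSv2 scores are dichotomized into positive and negative calls. The Python sketch below uses hypothetical scores for eight patients, not study data, and a helper name chosen for illustration.

```python
def two_by_two(scores, has_cancer, cutoff):
    """Dichotomize PI-RADS scores at `cutoff` (score >= cutoff is positive)
    and return (TP, FP, FN, TN) against the reference standard."""
    tp = fp = fn = tn = 0
    for score, cancer in zip(scores, has_cancer):
        positive = score >= cutoff
        if positive and cancer:
            tp += 1
        elif positive:
            fp += 1
        elif cancer:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn

# Hypothetical readings for eight patients (not study data).
scores = [2, 3, 3, 4, 5, 2, 3, 4]
has_cancer = [False, False, True, True, True, False, True, False]

for cutoff in (3, 4):
    tp, fp, fn, tn = two_by_two(scores, has_cancer, cutoff)
    print(f">={cutoff}: sensitivity {tp / (tp + fn):.2f}, specificity {tn / (tn + fp):.2f}")
# >=3: sensitivity 1.00, specificity 0.50
# >=4: sensitivity 0.50, specificity 0.75
```

In this toy example, lowering the cutoff from ≥4 to ≥3 turns two false negatives into true positives and one true negative into a false positive. That is the same direction of change as the pooled estimates.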
In the current study, subgroup analyses were performed to account for differences in outcomes (any cancer vs clinically significant cancer). There was no significant difference between the two outcomes, irrespective of whether a cutoff of ≥3 or ≥4 was used. However, the definition of clinically significant cancer differed among the 13 studies assessing this outcome. Only three studies defined csPCa strictly according to the PI-RADSv2 guidelines (Gleason score ≥7 [3 + 4], volume ≥0.5 ml, or extraprostatic extension) [11]; most others used one or two of the three criteria. Restricting the analysis to these three studies might have yielded more robust results. However, including all available studies was pragmatic and provides a general overview of the existing literature, as this is the first meta-analysis of studies dealing with PI-RADSv2.
In this meta-analysis, we also examined technical aspects of MRI. Meta-regression analyses revealed that the use of an endorectal coil was not a statistically significant factor. Furthermore, although magnet strength showed a statistically significant difference between 3 and 1.5 T, this difference was not clinically meaningful (sensitivity of 0.90 vs 0.89, respectively; p = 0.03). Although there has been debate over these two issues in the past, both 3 and 1.5 T are now well established, and the overall benefit of using an endorectal coil is not evident [38 39]. The PI-RADSv2 guidelines currently accept either option, and the results of our study provide additional evidence to support this.
Regarding the methods of analysis, the studies varied considerably in reference standard and type of analysis. Radical prostatectomy was the reference standard in five studies, whereas the majority were based on biopsy, either a combination of systematic and targeted biopsies or targeted biopsy alone. The possibility of PCa despite negative biopsy results in the latter group should be kept in mind. In addition, the studies were split roughly evenly between per-patient (n = 11) and per-lesion (n = 10) reporting of outcomes. Per-lesion analysis also takes into account how well the disease is localized; however, the type of analysis was not a significant factor in the meta-regression analysis.
Our meta-analysis had some limitations. Nearly all studies were retrospective in design, resulting in a high risk of bias for patient selection. Pooling data from predominantly retrospective studies may have inflated diagnostic sensitivity [40]. However, a meta-analysis of only the three prospective studies was technically unfeasible, and its results would not have been representative of the existing literature on PI-RADSv2. Moreover, we used validated methods for the systematic review and reported the data according to standard reporting guidelines, including PRISMA and the Handbook for Diagnostic Test Accuracy Reviews published by the Cochrane Collaboration [12 41]. Another limitation is the considerable heterogeneity in our pooled analysis, which limits the general applicability of our summary estimates. To explore this heterogeneity, we performed meta-regression and multiple subgroup analyses. These identified the proportion of patients with PCa, the magnetic field strength, and the reference standard as significant factors. In particular, the reference standard included various methods, ranging from radical prostatectomy to combinations of systematic and targeted biopsies (ie, MRI guided, MRI–transrectal ultrasound fusion, or cognitive). The use of various definitions of clinically significant cancer also needs to be emphasized. Our meta-regression and subgroup analyses may explain some of the heterogeneity, but a portion remains unexplained. Another important limitation is the small number of studies available for head-to-head comparison between PI-RADSv1 and PI-RADSv2. Nevertheless, we were able to demonstrate a statistically significant difference in sensitivity between the two versions using the six studies published to date.
PI-RADSv2 shows good performance for the detection of PCa with pooled sensitivity of 0.89 and specificity of 0.73. PI-RADSv2 has higher pooled sensitivity compared with PI-RADSv1 without significantly different specificity.
Author contributions: Sang Youn Kim had full access to all the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis.
Study concept and design: Woo, Suh, S.Y. Kim.
Acquisition of data: Woo, Suh, S.Y. Kim.
Analysis and interpretation of data: Woo, Suh, S.Y. Kim.
Drafting of the manuscript: Woo.
Critical revision of the manuscript for important intellectual content: Suh, S.Y. Kim, Cho, S.H. Kim.
Statistical analysis: Suh.
Obtaining funding: None.
Administrative, technical, or material support: None.
Supervision: S.Y. Kim, Cho, S.H. Kim.
Other: None.
Financial disclosures: Sang Youn Kim certifies that all conflicts of interest, including specific financial interests and relationships and affiliations relevant to the subject matter or materials discussed in the manuscript (eg, employment/affiliation, grants or funding, consultancies, honoraria, stock ownership or options, expert testimony, royalties, or patents filed, received, or pending), are the following: None.
Funding/Support and role of the sponsor: None.
Prostate cancer (PCa) is the second leading cause of cancer-related mortality in Western men [1]. With recent technological advances and growing availability, multiparametric magnetic resonance imaging (mpMRI) is increasingly being used to guide several aspects of PCa management, including detection, staging, and treatment planning [2]. Despite abundant evidence in the literature reporting high accuracy of mpMRI for PCa diagnosis, widespread acceptance has been hampered by several factors. These include difficulty of interpretation, lack of standardized criteria for interpretation (ie, use of Likert scales based on the radiologist's subjective level of suspicion for PCa), and the resulting substantial inter-reader variability [3 4].
To bring standardization to the evaluation and reporting of mpMRI of the prostate, the European Society of Urogenital Radiology (ESUR) published a guideline termed Prostate Imaging Reporting and Data System (PI-RADS) in 2012 [5] . PI-RADS was generated based on expert consensus and provides a detailed scoring system for each MRI sequence (T2-weighted imaging [T2WI], diffusion-weighted imaging [DWI], dynamic contrast-enhanced MRI [DCE-MRI], and MR spectroscopy) for the presence of clinically significant PCa (csPCa). Several investigators have validated the accuracy and reproducibility of PI-RADS, and a recent meta-analysis reported pooled sensitivity and specificity of 0.78 and 0.79, respectively [6] . However, as there was no guideline for the generation of an overall score, different research groups utilized various measures for this purpose—some used a sum of the scores from each sequence (ranging from 3 to 15), whereas others used an overall score of 1–5 [7 8] . Furthermore, emerging data questioned the value of curve-type analysis of DCE-MRI [9] . In addition, investigators suggested that some sequences may be more important in determining the likelihood of PCa (ie, DWI in the peripheral zone [PZ] and T2WI in the transition zone [TZ]) rather than equal weighting for all sequences [10] .
To address these issues, the ESUR and American College of Radiology recently released the updated PI-RADS version 2 (PI-RADSv2) [11]. The main changes from PI-RADSv1 to PI-RADSv2 are as follows: (1) introduction of dominant sequences according to zonal anatomy (DWI for the PZ and T2WI for the TZ), (2) restriction of the contribution of DCE-MRI to the presence or absence of early focal enhancement, and (3) generation of an overall score (1–5) integrating findings across all MRI sequences.
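To illustrate how the dominant-sequence rule produces a single overall score, the following Python sketch encodes our reading of the PI-RADSv2 (2015) assessment for one lesion. In the PZ, the DWI score is carried over unless DWI is 3 and DCE-MRI is positive, in which case the score is upgraded to 4. In the TZ, the T2WI score is carried over unless T2WI is 3 and DWI is 5, which likewise upgrades the score to 4. The function name and signature are hypothetical. The guideline's fallbacks for nondiagnostic DWI or DCE-MRI are deliberately not modeled.

```python
def pirads_v2_overall(zone: str, t2w: int, dwi: int, dce_positive: bool) -> int:
    """Simplified PI-RADSv2 (2015) overall assessment for a single lesion.

    Assumes T2WI, DWI, and DCE-MRI are all diagnostic; the guideline's
    rules for inadequate sequences are not implemented here.
    """
    for name, score in (("t2w", t2w), ("dwi", dwi)):
        if not 1 <= score <= 5:
            raise ValueError(f"{name} score must be 1-5, got {score}")
    if zone == "PZ":
        # DWI is dominant; DCE-MRI only upgrades an equivocal DWI score of 3.
        return 4 if dwi == 3 and dce_positive else dwi
    if zone == "TZ":
        # T2WI is dominant; a DWI score of 5 upgrades an equivocal T2WI score of 3.
        return 4 if t2w == 3 and dwi == 5 else t2w
    raise ValueError("zone must be 'PZ' or 'TZ'")

print(pirads_v2_overall("PZ", t2w=2, dwi=3, dce_positive=True))  # 4
```

Because DCE-MRI enters only as a yes/no upgrade of an equivocal PZ score, it cannot change a clearly negative or clearly positive DWI assessment. This is the "limited contribution" described in change (2).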
Since then, several studies assessing the value of PI-RADSv2 have been published. However, the diagnostic performance of this new scoring system has not been evaluated systematically. Therefore, the purpose of our study was to assess the diagnostic performance of PI-RADSv2 for the detection of PCa. In addition, we aimed to compare the diagnostic performance of PI-RADSv1 and PI-RADSv2 in studies available for head-to-head comparison.
This meta-analysis was performed and written according to the preferred reporting items for systematic reviews and meta-analyses (PRISMA) guidelines [12] .
A computerized search of MEDLINE and EMBASE up to December 7, 2016, was performed in order to identify studies evaluating the diagnostic performance of PI-RADSv2 for the detection of PCa. The search query combined synonyms for PCa, MRI, and PI-RADS as follows: (prostate cancer OR prostatic cancer OR prostate neoplasm OR prostatic neoplasm OR prostate tumor OR prostatic tumor OR prostate carcinoma OR prostatic carcinoma OR PCa) AND (magnetic resonance imaging OR MRI OR MR) AND (prostate imaging reporting and data system OR pi-rads OR pi rads OR pirads). Bibliographies of identified articles were also screened to expand the scope of search. Our search was limited to publications in English.
Studies were included if they satisfied all the following requirements according to the PICOS criteria [12] : (1) included patients with suspected or diagnosed PCa; (2) for index test, mpMRI of the prostate including all required sequences of T2WI, DWI, and DCE-MRI was performed and assessed with a PI-RADSv2 scoring system; (3) for comparison, a reference standard based on the histopathological examination of radical prostatectomy or biopsy was used; (4) results were reported in sufficient detail for the reconstruction of 2 × 2 tables and determination of sensitivity and specificity at specified cutoff values for evaluating the diagnostic performance of PI-RADSv2; and (5) studies had to be original articles.
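Requirement (4) is what makes pooling possible. Each study must yield a 2 × 2 table of true-positive, false-positive, false-negative, and true-negative counts at a stated cutoff. A minimal Python sketch of the per-study calculation follows. The counts are hypothetical. The Wilson score interval is assumed here as one common choice for a per-study 95% CI, because the text does not specify which interval was used for individual studies.

```python
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class TwoByTwo:
    """Counts from one study at one PI-RADSv2 cutoff."""
    tp: int
    fp: int
    fn: int
    tn: int

    @property
    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

def wilson_interval(successes: int, n: int, z: float = 1.96):
    """Wilson score interval for a binomial proportion (z = 1.96 for 95%)."""
    p = successes / n
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return centre - half, centre + half

# Hypothetical counts for one study (not taken from the included studies).
table = TwoByTwo(tp=89, fp=27, fn=11, tn=73)
print(round(table.sensitivity, 2), round(table.specificity, 2))  # 0.89 0.73
print(wilson_interval(table.tp, table.tp + table.fn))
```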
Studies were excluded if any of the following criteria were met: (1) studies involving <10 patients; (2) review articles, guidelines, consensus statements, letters, editorials, and conference abstracts; (3) studies using only PI-RADSv1 for the evaluation of mpMRI of the prostate; (4) studies focusing on topics other than using the PI-RADSv2 system for diagnosing PCa (eg, staging and prediction of biochemical recurrence); and (5) studies with overlapping patient populations.
Two reviewers (S.W. and C.H.S., with 3 yr of experience in performing systematic reviews and meta-analyses) independently evaluated the eligibility of the selected studies. Disagreements between the two reviewers were resolved by consensus via discussion with a third reviewer (S.Y.K.).
We extracted the following data regarding study design and results from the selected studies using a standardized form:
Study characteristics—authors, year of publication, country and institution of origin, duration of patient recruitment, and study design (prospective vs retrospective and consecutive or not)
Demographic and clinical characteristics—sample size, number of patients with PCa, patient age, prostate-specific antigen (PSA) level and Gleason score, number of previous biopsies, and PCa diagnosis prior to mpMRI
Technical characteristics of mpMRI—scanner model and manufacturer, magnetic field strength (1.5 vs 3 T), coil type (endorectal vs pelvic phased array), and specific sequences used (T2WI, DWI, DCE, or MR spectroscopy)
Interpretation of mpMRI—number of reviewers and experience in prostate mpMRI, independent or consensus reading, and blinding to clinicopathological information
Reference standard—type of reference standard (radical prostatectomy, targeted biopsy, or systematic biopsy), interval between MRI and pathology, outcomes assessed (any PCa vs csPCa), definition of csPCa (studies assessing “clinically significant”, “aggressive”, or “high-grade” PCa were all considered to assess csPCa; however, only studies that used the definition provided by the PI-RADSv2 guideline, ie, Gleason score ≥7 [3 + 4], volume ≥0.5 ml, or extraprostatic extension, were considered not to have concern for applicability), separate analysis for the PZ and TZ, and type of analysis (per patient vs per lesion)
Diagnostic performance of PI-RADSv2 including criteria or cutoff values (in case of multiple readers, the results of the most experienced reader were extracted for this meta-analysis)
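The definition of csPCa determines how lesions are labeled in each 2 × 2 table, so differing definitions can change study-level results. The Python sketch below contrasts the PI-RADSv2 definition summarized in the extraction criteria with a narrower Gleason-only definition. The function names and the example lesion are hypothetical.

```python
def cspca_pirads_v2(gleason_primary, gleason_secondary, volume_ml, epe):
    """PI-RADSv2 definition as extracted in this review:
    Gleason score >= 7 (including 3 + 4), volume >= 0.5 ml, or EPE."""
    return gleason_primary + gleason_secondary >= 7 or volume_ml >= 0.5 or epe

def cspca_gleason_only(gleason_primary, gleason_secondary, volume_ml, epe):
    """Narrower definition (Gleason score >= 7 only); volume and EPE are
    accepted but ignored so both functions share one signature."""
    return gleason_primary + gleason_secondary >= 7

# Hypothetical lesion: Gleason 3 + 3, 0.6 ml, organ confined.
lesion = dict(gleason_primary=3, gleason_secondary=3, volume_ml=0.6, epe=False)
print(cspca_pirads_v2(**lesion), cspca_gleason_only(**lesion))  # True False
```

The same lesion counts as a true positive under one definition and as a negative under the other. This is why studies deviating from the PI-RADSv2 definition were flagged for applicability.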
The methodological quality of the included studies was assessed using tailored questionnaires and criteria provided by the Quality Assessment of Diagnostic Accuracy Studies-2 (QUADAS-2) tool [13]. Data extraction and quality assessment were performed independently by two reviewers (S.W. and C.H.S.). All disagreements were resolved by consensus through discussion with the third reviewer (S.Y.K.).
The diagnostic performance of PI-RADSv2 for the detection of PCa was the primary outcome for this meta-analysis. In addition, a comparison between the diagnostic performance of PI-RADSv2 and that of PI-RADSv1 using studies that reported head-to-head comparison data of the two PI-RADS versions was considered a secondary outcome.
Pooled estimates of sensitivity and specificity were calculated using hierarchical logistic regression modeling including bivariate modeling and hierarchical summary receiver operating characteristic (HSROC) modeling [14] . For graphical presentation of the results, an HSROC curve with 95% confidence region and prediction region was plotted. Publication bias was evaluated using the Deeks’ funnel plot, and statistical significance was tested with the Deeks’ asymmetry test [15] .
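Deeks' asymmetry test regresses each study's log diagnostic odds ratio (lnDOR) on 1/√ESS, weighting by ESS. ESS is the effective sample size 4·n₁·n₂/(n₁ + n₂), where n₁ and n₂ are the numbers of patients with and without disease. A slope that differs significantly from zero suggests funnel-plot asymmetry. The Python sketch below re-implements that regression under stated assumptions. A 0.5 continuity correction is added to all cells when any cell is zero. The p value, which comes from a t distribution with k − 2 degrees of freedom (eg, via scipy.stats.t.sf), is left out to keep the block dependency-free. The sketch is an illustration, not the "midas" implementation used in this study.

```python
import math

def log_dor_and_ess(tp, fp, fn, tn):
    """Log diagnostic odds ratio and effective sample size for one 2x2 table."""
    n_diseased, n_healthy = tp + fn, fp + tn
    ess = 4 * n_diseased * n_healthy / (n_diseased + n_healthy)
    if 0 in (tp, fp, fn, tn):
        # Assumed continuity correction: add 0.5 to every cell if any cell is zero.
        tp, fp, fn, tn = (c + 0.5 for c in (tp, fp, fn, tn))
    return math.log((tp * tn) / (fp * fn)), ess

def weighted_linregress(x, y, w):
    """Weighted least squares fit y = a + b*x; returns (b, a, standard error of b)."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    rss = sum(wi * (yi - intercept - slope * xi) ** 2 for wi, xi, yi in zip(w, x, y))
    se_slope = math.sqrt(rss / (len(x) - 2) / sxx)
    return slope, intercept, se_slope

def deeks_asymmetry(tables):
    """Deeks' regression of lnDOR on 1/sqrt(ESS), weighted by ESS.

    Returns (slope, t statistic, degrees of freedom)."""
    log_dors, ess = zip(*(log_dor_and_ess(*t) for t in tables))
    x = [1 / math.sqrt(e) for e in ess]
    slope, _, se = weighted_linregress(x, list(log_dors), list(ess))
    t_stat = slope / se if se > 0 else float("nan")
    return slope, t_stat, len(tables) - 2

# Hypothetical (TP, FP, FN, TN) tables, not data from the included studies.
studies = [(40, 10, 5, 45), (30, 12, 6, 50), (20, 5, 4, 30), (50, 20, 8, 60)]
print(deeks_asymmetry(studies))
```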
We performed meta-regression analyses to investigate the cause of heterogeneity. The following covariates were considered for the bivariate model: (1) proportion of patients with PCa (>50% vs ≤50%), (2) magnet strength of MRI (3 vs 1.5 T), (3) use of endorectal coil, (4) cutoff value (≥4 vs ≥3), (5) reference standard (radical prostatectomy vs biopsy), and (6) type of analysis (per patient vs per lesion). In addition, multiple subgroup analyses were performed for cutoff value, outcome, and previous biopsy history to assess various clinical settings: (1) a cutoff value of ≥4 for all studies, (2) a cutoff value of ≥3 for all studies, (3) a cutoff value of ≥4 for determining any PCa, (4) a cutoff value of ≥3 for determining any PCa, (5) a cutoff value of ≥4 for determining csPCa, (6) a cutoff value of ≥3 for determining csPCa, (7) a cutoff value of ≥4 in studies using per-patient analysis, (8) a cutoff value of ≥4 in studies using per-lesion analysis, (9) studies analyzing PZ PCa, (10) studies analyzing TZ PCa, (11) patients without previous biopsies, and (12) patients with previous biopsies. The “metandi” and “midas” modules in Stata 10.0 (StataCorp LP, College Station, TX, USA) and “mada” package in R software version 3.2.1 (R Foundation for Statistical Computing, Vienna, Austria) were used for statistical analyses, with p < 0.05 signifying statistical significance.
A systematic literature search initially identified 287 articles. After removal of 46 duplicates, screening of the 241 titles and abstracts yielded 105 potentially eligible articles. After full-text review, 84 studies were excluded for the following reasons: not in the field of interest (n = 80, including 68 studies that used only PI-RADSv1), insufficient data to reconstruct 2 × 2 tables (n = 2), and study population shared with other studies (n = 2). Ultimately, 21 original articles including a total of 3857 patients assessing the diagnostic performance of PI-RADSv2 were included in the meta-analysis [16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36]. No additional studies were identified by screening the bibliographies of these 21 studies. Among them, 15 studies including 3099 patients dealt with PI-RADSv2 alone, whereas six studies including 758 patients provided a head-to-head comparison between PI-RADSv1 and PI-RADSv2 [16 20 21 28 32 33]. The detailed study selection process is described in Fig. 1.
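The study-selection counts and sample sizes are internally consistent. A short Python check makes the arithmetic explicit, using only numbers reported in this section and in Table 1. The dictionary keys are reference numbers, chosen here purely for illustration.

```python
# Counts taken from the text and Table 1 of this review.
identified, duplicates = 287, 46
full_text_reviewed = 105
excluded = {"not in field of interest": 80, "insufficient 2x2 data": 2, "overlapping population": 2}

screened = identified - duplicates
included = full_text_reviewed - sum(excluded.values())

# Sample size per study (reference number -> patients), from Table 1.
patients = {16: 50, 17: 54, 18: 245, 19: 55, 20: 401, 21: 82, 22: 49, 23: 157,
            24: 62, 25: 94, 26: 456, 27: 425, 28: 65, 29: 312, 30: 343, 31: 82,
            32: 106, 33: 54, 34: 288, 35: 105, 36: 372}
head_to_head = [16, 20, 21, 28, 32, 33]

total = sum(patients.values())
h2h_total = sum(patients[ref] for ref in head_to_head)
print(screened, included, len(patients), total, h2h_total, total - h2h_total)
# 241 21 21 3857 758 3099
```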
Patient characteristics are shown in Table 1 . The size of the study population ranged from 49 to 456 patients, with the percentage of those with PCa ranging from 37% to 100%. The patients had a median age of 62–69.6 yr, median PSA of 3.97–15 ng/ml, and a Gleason score ranging from 5 to 10. Patients had already been diagnosed with PCa prior to MRI in all or some of the study populations in seven studies [16 17 22 26 27 32 35] . Biopsy was performed before MRI in seven studies [16 21 22 23 26 27 35] , all patients were biopsy-naïve in three studies [18 25 34] , both patient types were included in three studies [28 32 33] , and data regarding previous biopsy were not reported in the remaining eight studies.
First author (year) | Country | Institution | Duration of patient recruitment | Patients (n) | Patients with PCa (n) | Age, median (yr) | Age, range (yr) | PSA, median (ng/ml) | PSA, range (ng/ml) | Gleason score, median | Gleason score, range | No. of previous biopsies | PCa diagnosis before MRI |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Auer (2016) [16] | Austria | Medical University of Innsbruck, Medizinische Hochschule Hannover | NR | 50 | NR | 63 a | NR | 7.3 a | NR | 7 (3 + 4) | 6–10 | ≥1 | Yes (all) |
Baldisserotto (2016) [17] | Brazil | Pontifícia Universidade Católica do Rio Grande do Sul | 2013.6–2015.6 | 54 | 33 | 65.9 a | 53–81 | 8.4 a | 3–31 | 7 (3 + 4) | 6–9 | NR | Yes (some) |
De Visschere (2017) [18] | Belgium | Ghent University Hospital | 2011.5–2014.12 | 245 | 144 | 66 | 44–85 | 9 | 1.4–935.5 | 7 (3 + 4) | ≥6 | 0 | No |
El-Samei (2016) [19] | Egypt | Al-Azhar University, El-Minia University | 2014.5–2015.10 | 55 | 38 | 62 a | 51–79 | NR | NR | 7 | 6–10 | NR | No |
Feng (2016) [20] | China | Tongji Hospital | 2013.6–2015.7 | 401 | 150 | 64.4 a | 34–88 | 10.7 | 0.2–1763 | 7 (4 + 3) | ≤6–≥8 | NR | NR |
Kasel-Seibert (2016) [21] | Germany | University Hospital Jena | 2013.7–2015.3 | 82 | 31 | 65 | 48–81 | 13 | 1–111 | 7 (3 + 4) | 6–9 | 1–5 | No |
Lin (2016) [22] | Brazil, Taiwan | Ribeirão Preto School of Medicine, China Medical University, Ribeirão Preto School of Medicine | 2011.5–2014.6 | 49 | 49 | 63 | 46–73 | 13.27 | 1.8–41.4 | ≥1 | Yes (all) | ||
Martorana (2016) [23] | Italy | University of Modena and Reggio Emilia, Perugia University | 2014.1–2016.2 | 157 | 79 | 65 | 47–79 b | 10.7 | 1–75 | 6 | 6–8 | 1–3 | No |
Mertan (2016) [24] | USA | National Cancer Institute | 2015.3–2015.9 | 62 | 38 | 65.5 | 50.3–76.6 | 7.1 | 0.5–863 | 7 (3 + 4) | 6–9 | NR | NR |
Muller (2015) [25] | USA, the Netherlands | National Cancer Institute, AMC University Hospital, Edward Hébert School of Medicine | 2011.12–2014.5 | 94 | 94 | 62 | 37–79 | 8.51 | 0.7–51.1 | 7 (3 + 4) | 6–10 | 0 | No |
Park (2016) [26] | South Korea | Samsung Medical Center | 2012.1–2014.12 | 456 | 456 | 65 | 46–81 | 3.97 | 3.8–4.2 | NR | ≥6 | ≥1 | Yes (all) |
Park (2016) [27] | South Korea | Yonsei University College of Medicine | 2012.1–2013.3 | 425 | 425 | NR | NR | NR | 1.4–156.9 | 5–6 | 5–10 | ≥1 | Yes (all) |
Polanec (2016) [28] | Austria | Medical University of Vienna, Confraternität Vienna | 2011.6–2015.9 | 65 | NR | 65.3 a | 62.3–87.4 | 10.8 a | 4.2–74.5 | 7 (4 + 3) | 6–9 | ≥0 | No |
Rastinehad (2015) [29] | USA | Icahn School of Medicine at Mount Sinai, Fox Chase Cancer Center, National Institutes of Health, Hofstra North Shore LIJ School of Medicine | 2012.2–2014.11 | 312 | 202 | 65.1 | 60.3–70.3 b | 7.3 | 5.0–11.4 b | NR | NR | NR | No |
Rosenkrantz (2016) [30] | USA | NYU Langone Medical Center | 2013.9–2015.2 | 343 | 134 | 64 | NR | 5.8 | NR | 7 (3 + 4) | 6–9 | NR | NR |
Stanzione (2016) [31] | Italy | University “Federico II”, Ospedale S. Maria delle Grazie | NR | 82 | 34 | 65 a | 43–84 | 8.8 a | NR | 7 (3 + 4) | 6–9 | NR | No |
Tan (2016) [32] | USA, Taiwan | UCLA, Cathay General Hospital, Hoag Hospital | 2013.3–2016.12 | 106 | 63 | 66.5 | 43–79 | 7.9 | 5.6–10.6 | 7 (3 + 4) | 6–10 | ≥0 | Yes (some) |
Tewes (2016) [33] | Germany | Hannover Medical School, Klinikum der Region Hannover | 2012.12–2014.12 | 54 | 31 | 69.6 a | NR | 8.7 a | NR | 6 | 6–9 | ≥0 | No |
Washino (2017) [34] | Japan | Jichi Medical University Saitama Medical Center | 2010.6–2014.4 | 288 | 159 | 69 | 64–74 b | 7.5 | 5.5–11 b | NR | NR | 0 | No |
Woo (2016) [35] | South Korea | Seoul National University College of Medicine | 2011.1–2013.12 | 105 | 105 | 69 | 49–79 | 8.22 | 0.9–44.2 | 7 | 6–7 | ≥1 | Yes (all) |
Zhao (2016) [36] | China | Peking University First Hospital | 2010.11–2013.12 | 372 | 185 | 68.5 a | NR | 15 a | NR | 7 | 6–10 | NR | No |
Characteristics of the studies are summarized in Table 2. MRI was performed using 3-T scanners in 16 studies [16 17 18 20 24 25 26 27 28 29 30 31 32 33 35 36], 1.5-T scanners in four studies [19 21 22 23], and either 3- or 1.5-T scanners in one study [34]. Endorectal coils were used in seven studies [19 21 22 23 24 25 29]. In all studies, the mpMRI protocol consisted of T2WI, DWI, and DCE-MRI. The reference standard was radical prostatectomy in five studies [16 22 26 27 35], a combination of systematic and targeted biopsies in seven studies [17 20 23 25 29 34 36], and targeted biopsy alone in seven studies [19 21 24 28 30 32 33]; in two studies, the reference standard was not consistent throughout the study population [18 31]. PI-RADSv2 scoring was performed by one to five radiologists, either in consensus or independently. The readers' experience was heterogeneous, ranging from 4 to 22 yr of experience in prostate MRI. In most studies, the readers were blinded; however, in three studies, the radiologists were aware that the patients had biopsy-proven PCa [22 26 35], and five studies did not state whether readers were blinded [18 19 30 32 36]. In the majority of the studies, the interval between MRI and the reference standard was less than 6 mo; however, this interval was not reported in 10 studies [16 17 18 24 27 28 29 30 32 33]. PCa was assessed separately according to zonal anatomy in seven studies [20 24 25 28 30 31 33]. However, in one of these studies [31], only PCa in the PZ could be evaluated, as the article did not provide detailed TZ data and an attempt to contact the authors for further information was unsuccessful. Regarding the outcome assessed, eight studies evaluated any cancer [17 19 20 23 25 28 31 33], eight evaluated clinically significant cancer [18 22 26 27 30 34 35 36], and five evaluated both [16 21 24 29 32].
With regard to cutoff values, 13 studies used ≥4 [16 18 19 20 21 24 25 26 27 28 30 31 35], four studies used ≥3 [32 33 34 36], and four studies used both [17 22 23 29]. The location of PCa was reported separately for the PZ and TZ in six studies [20 24 25 28 30 33].
Study | Reader | MRI | |||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
First author (year) | Design | Consecutive enrollment | Reference standard | MRI–reference standard interval | No. of readers | Experience (yr) | Blinding | Magnet strength (T) | Vendor | Model | Endorectal coil | PI-RADSv2 application | Cutoff values | Localization | Type of analysis | Outcome assessed | Definition of csPCa |
Auer (2016) [16] | Retrospective | Yes | RP | NR | 1 | >5 | Yes | 3 | Siemens | Skyra | No | Strict | 4 | Whole | Lesion | Any + csPCa | GS ≥7 (4 + 3) |
Baldisserotto (2016) [17] | Retrospective | Yes | STRUSGB + targeted MRI–TRUS biopsy (cognitive) or RP | NR | 2 (independent) | >10/1 | Yes | 3 | GE | Signa HDxt | No | Strict | 3, 4 | Whole | Patient | Any | |
De Visschere (2017) [18] | Retrospective | NR | STRUSGB or RP | NR | NR | NR | NR | 3 | Siemens | Trio | No | Strict | 4 | Whole | Patient | csPCa | GS ≥7, ≥0.5 cc, or EPE |
El-Samei (2016) [19] | NR | NR | Targeted MRI–TRUS biopsy | ≤2 wk | NR | NR | NR | 1.5 | Philips | Gyroscan | Yes | Strict | 4 | Whole | Lesion | Any | |
Feng (2016) [20] | Retrospective | Yes | STRUSGB + targeted MRI–TRUS biopsy (cognitive) | 1–35 d | 2 Consensus |
5/4 | Yes | 3 | Siemens | Skyra | No | Strict | 4 | PZ, TZ | Patient | Any | |
Kasel-Seibert (2016) [21] | Retrospective | NR | Targeted MRGB | ≤7 d | 2 Independent |
10/<1 | Yes | 1.5 | Siemens | Avanto | Yes | Strict | 4 | Whole | Lesion | Any + csPCa | GS ≥7 |
Lin (2016) [22] | Retrospective | NR | RP | ≤6 mo | 2 Independent |
12/5 | Yes a | 1.5 | Phillips | Achieva | Yes | Strict | 3, 4 | Whole | Lesion | csPCa | GS ≥7 and >0.5 cc |
Martorana (2016) [23] | Retrospective | Yes | TTB + targeted MRI–TRUS biopsy | 30–78 | 2 Consensus |
4/4 | Yes | 1.5 | Phillips | Achieva | Yes | Strict | 3, 4 | Whole | Lesion | Any | GS ≥7, >0.5 cc, or EPE |
Mertan (2016) [24] | Prospective | Yes | STRUSGB + targeted MRI–TRUS biopsy | NR | 1 | >8 | Yes | 3 | Phillips | Achieva | Yes | Strict | 4 | PZ, TZ | Lesion | Any + csPCa | GS ≥7 |
Muller (2015) [25] | Retrospective | Yes | STRUSGB + targeted MRI–TRUS biopsy | ≤6 wk | 5 Independent |
12/7/1/1/0.5 | Yes | 3 | Phillips | Achieva | Yes | Strict | 4 | PZ, TZ | Lesion | Any | GS ≥7 (4 + 3) |
Park (2016) [26] | Retrospective | Yes | RP | 21–48 | 2 Independent |
14/3 | Yes a | 3 | Phillips | Achieva | No | Strict | 4 | Whole | Patient | csPCa | GS ≥7, ≥0.5 cc, or EPE |
Park (2016) [27] | Retrospective | NR | RP | NR | 2 Independent |
9/4 | Yes | 3 | GE, Philips, Siemens | DiscoveryMR750, Achieva, Trio | No | Strict | 4 | Whole | Patient | csPCa | GS ≥7, ≥0.5 cc, or EPE |
Polanec (2016) [28] | Retrospective | Yes | Targeted MRGB | NR | 2 Independent |
NR/NR | Yes | 3 | Siemens | Trio | No | Strict | 4 | PZ, TZ | Lesion | Any | |
Rastinehad (2015) [29] | Prospective | NR | STRUSGB + targeted MRI–TRUS biopsy | NR | 3 Consensus |
NR | Yes | 3 | Siemens | Verio | Yes | Calculated b | 3, 4 | Whole | Patient | Any + csPCa | Epstein's criteria c for systematic and GS ≥7 or >0.5 cc for targeted biopsy |
Rosenkrantz (2016) [30] | Retrospective | NR | Targeted MRI–TRUS biopsy | NR | 2 Independent |
8/3 | NR | 3 | Siemens | biobraphmMR/Prisma/Skyra/Trio | No | Strict | 4 | PZ, TZ | Lesion | csPCa | GS ≥7 |
Stanzione (2016) [31] | Prospective | NR | STRUSGB (±targeted TRUSGB) | 20–30 d | 2 Independent |
14/10 | Yes | 3 | Siemens | Trio | No | Strict | 4 | PZ | Patient | Any | |
Tan (2016) [32] | Retrospective | Yes | Targeted MRGB | NR | 3 consensus |
16/10 | NR | 3 | Siemens | Skyra, Trio, Verio | No | Strict | 3 | Whole | Lesion | Any + csPCa | GS ≥7 |
Tewes (2016) [33] | Retrospective | Yes | Targeted MRGB | NR | 2 Independent |
5/2 | Yes | 3 | Siemens | Skyra | No | Strict | 3 d | PZ, TZ | Patient | Any | |
Washino (2017) [34] | Retrospective | NR | TTB + targeted MRI–TRUS biopsy (cognitive) | 0.5–1.6 mo | 1 | 14 | Yes | 1.5 or 3 | Toshiba | Excelart Vantage/Vantage Titan 3T | No | Strict | 3 | Whole | Patient | csPCa | GS ≥7 or maximum core length ≥4 mm |
Woo (2016) [35] | Retrospective | NR | RP | ≤6 mo | 2 Consensus |
22/10 | Yes a | 3 | Philips, Siemens | Ingenia, Verio/Trio | No | Strict | 4 | Whole | Patient | csPCa | GS ≥7 |
Zhao (2016) [36] | Retrospective | NR | STRUSGB + targeted MRI–TRUS biopsy | ≤3 mo | 2 Independent |
NR | NR | 3 | NR | NR | No | Strict | 3 | Whole | Patient | csPCa | GS ≥7 |
a Blinded but aware that patients had biopsy-proven PCa.
b PI-RADSv2 scores were generated from reports based on PI-RADSv1 and simplified qualitative system.
c Epstein's criteria = Gleason pattern ≥4, or Gleason 3 + 3 disease with core length ≥50% and/or >2 cores positive.
d For transition zone, cutoff value = 4.
csPCa = clinically significant prostate cancer; GS = Gleason score; EPE = extraprostatic extension; MRGB = magnetic resonance imaging-guided biopsy; MRI = magnetic resonance imaging; MRI–TRUS = fusion of magnetic resonance imaging and transrectal ultrasound images; NR = not reported; PCa = prostate cancer; PI-RADSv2 = Prostate Imaging Reporting and Data System version 2; PZ = peripheral zone; RP = radical prostatectomy; STRUSGB = systematic transrectal ultrasound-guided biopsy; TTB = transperineal template biopsy; TZ = transition zone.

Overall, the quality of the studies was not considered high, mainly because of the patient selection domain (Fig. 2). In this domain, the risk of bias was generally high, as all but four of the studies were retrospective [16 17 18 20 21 22 23 25 26 27 28 30 32 33 34 35 36]. Seven studies were judged to have high concern for applicability, as all or some of the patients had a pathological diagnosis of PCa prior to MRI [16 17 22 26 27 32 35]. In the index test domain, nine studies had a high risk of bias: in three, the reviewers were aware that patients had biopsy-proven PCa [22 26 35], and in the other six, the cutoff value for determining PCa was not specified prior to interpretation [16 20 29 31 33 34]. Only one study raised concern for applicability in this domain, as PI-RADSv2 scores were indirectly generated from existing clinical radiological reports based on PI-RADSv1 or an in-house scoring system [29]. In the reference standard domain, eight studies had a high risk of bias. Seven of these used either systematic biopsy alone or targeted biopsy alone [18 19 21 28 30 32 33]; in the remaining study, targeted biopsy was performed, but on lesions that were suspicious on ultrasonography rather than on MRI [31].
Studies in which radical prostatectomy or systematic plus targeted biopsy (MRI guided, MRI–transrectal ultrasound fusion, or cognitive) was used as the reference standard were considered to have a low risk of bias. In 10 studies, the definition of clinically significant cancer did not follow that of the PI-RADSv2 guidelines; these studies were therefore judged to have high concern for applicability [16 21 22 24 29 30 32 34 35 36]. In the flow and timing domain, two studies had a high risk of bias, as not all patients received the same reference standard [17 18].
The sensitivity and specificity of individual studies were 73–100% and 8–100%, respectively. The Q-test revealed substantial heterogeneity (p < 0.001). The Higgins I² statistic indicated substantial heterogeneity for sensitivity (I² = 85.55%) and considerable heterogeneity for specificity (I² = 95.30%). The coupled forest plot of sensitivity and specificity demonstrated the absence of a threshold effect (Fig. 3). The Spearman correlation coefficient between the sensitivity and the false-positive rate was 0.45 (95% confidence interval [CI] 0.023–0.738), also indicating the lack of a threshold effect.
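The heterogeneity and threshold-effect checks described above can be illustrated with a short, dependency-free Python sketch. This is not the authors' code: the values reported above were obtained with the Stata and R software listed in the statistical methods. As a simplifying assumption, the sketch computes a univariate Cochran's Q and Higgins' I² on logit sensitivity with inverse-variance weights and a 0.5 continuity correction, plus the Spearman correlation between sensitivity and false-positive rate (FPR) that is used to probe for a threshold effect. All function names and the 2 × 2 counts are hypothetical.

```python
import math


def logit_and_variance(events, nonevents):
    """Logit of a proportion and its approximate variance.

    Adds 0.5 to both cells (continuity correction) so that studies with
    100% sensitivity or specificity do not produce infinite logits.
    """
    a, b = events + 0.5, nonevents + 0.5
    return math.log(a / b), 1 / a + 1 / b


def cochran_q_i2(estimates, variances):
    """Cochran's Q and Higgins' I^2 (in %) with inverse-variance weights."""
    weights = [1 / v for v in variances]
    pooled = sum(w * y for w, y in zip(weights, estimates)) / sum(weights)
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, estimates))
    df = len(estimates) - 1
    i2 = 100 * max(0.0, (q - df) / q) if q > 0 else 0.0
    return q, i2


def average_ranks(values):
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def spearman_rho(x, y):
    """Spearman correlation = Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


if __name__ == "__main__":
    # Hypothetical per-study 2x2 tables: (TP, FP, FN, TN)
    studies = [(45, 10, 5, 40), (30, 20, 4, 46), (80, 5, 12, 60), (25, 30, 1, 20)]
    logits = [logit_and_variance(tp, fn) for tp, fp, fn, tn in studies]
    q, i2 = cochran_q_i2([y for y, _ in logits], [v for _, v in logits])
    sens = [tp / (tp + fn) for tp, fp, fn, tn in studies]
    fpr = [fp / (fp + tn) for tp, fp, fn, tn in studies]
    print(f"Q = {q:.2f}, I^2 = {i2:.1f}%")
    print(f"Spearman rho(sensitivity, FPR) = {spearman_rho(sens, fpr):.2f}")
```

The Spearman function assumes that neither input is constant; with all-identical values its denominator would be zero.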
For all 21 studies combined, the pooled sensitivity was 0.89 (95% CI 0.86–0.92) with specificity of 0.73 (95% CI 0.60–0.83; Fig. 3 ). In the HSROC curve, there was a large difference between the 95% confidence region and the 95% prediction region, thus indicating heterogeneity between the studies ( Fig. 4 ). The area under the HSROC curve was 0.91 (95% CI 0.88–0.93). According to the Deeks’ funnel plot, the likelihood of publication bias was low, with a p value of 0.75 for the slope coefficient ( Fig. 5 ).
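Deeks' asymmetry test regresses each study's log diagnostic odds ratio (DOR) on 1/√ESS, where ESS is the effective sample size 4·n₁·n₀/(n₁ + n₀) (n₁ = patients with cancer, n₀ = patients without), weighting by ESS; a slope near zero is consistent with little funnel-plot asymmetry. The sketch below, a minimal illustration rather than the software actually used, computes the weighted least-squares slope and its t statistic in plain Python. A two-sided p value would follow from a t distribution with k − 2 degrees of freedom (for example, `2 * scipy.stats.t.sf(abs(t), k - 2)`), omitted here to keep the block dependency-free. The function name, the 0.5 continuity correction, and the data are assumptions.

```python
import math


def deeks_test(tables):
    """Deeks' funnel-plot asymmetry regression for diagnostic accuracy studies.

    tables: iterable of (TP, FP, FN, TN) per study (needs >= 3 studies).
    Regresses ln(DOR) on 1/sqrt(ESS) by weighted least squares with weights
    ESS = 4 * n1 * n0 / (n1 + n0), where n1 = diseased and n0 = non-diseased.
    Returns (intercept, slope, t statistic of the slope).
    """
    xs, ys, ws = [], [], []
    for tp, fp, fn, tn in tables:
        a, b, c, d = tp + 0.5, fp + 0.5, fn + 0.5, tn + 0.5  # continuity correction
        ys.append(math.log((a * d) / (b * c)))               # ln(DOR)
        n1, n0 = tp + fn, fp + tn
        ess = 4 * n1 * n0 / (n1 + n0)
        xs.append(1 / math.sqrt(ess))
        ws.append(ess)
    sw = sum(ws)
    xbar = sum(w * x for w, x in zip(ws, xs)) / sw
    ybar = sum(w * y for w, y in zip(ws, ys)) / sw
    sxx = sum(w * (x - xbar) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - xbar) * (y - ybar) for w, x, y in zip(ws, xs, ys))
    slope = sxy / sxx
    intercept = ybar - slope * xbar
    # Weighted residual variance on k - 2 degrees of freedom, then SE of the slope
    rss = sum(w * (y - intercept - slope * x) ** 2 for w, x, y in zip(ws, xs, ys))
    se_slope = math.sqrt(rss / (len(xs) - 2) / sxx)
    return intercept, slope, slope / se_slope


if __name__ == "__main__":
    # Hypothetical per-study 2x2 tables: (TP, FP, FN, TN)
    tables = [(45, 10, 5, 40), (30, 20, 4, 46), (80, 5, 12, 60), (25, 30, 1, 20)]
    intercept, slope, t = deeks_test(tables)
    print(f"slope = {slope:.2f}, t = {t:.2f}")
```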
For the six studies that provided a head-to-head comparison between PI-RADSv1 and PI-RADSv2, PI-RADSv2 demonstrated higher pooled sensitivity of 0.95 (95% CI 0.85–0.98) compared with 0.88 (95% CI 0.80–0.93) for PI-RADSv1 ( p = 0.04). However, the pooled specificity did not show a significant difference between the two versions of PI-RADS: 0.73 (95% CI 0.47–0.89) for v2 and 0.75 (95% CI 0.36–0.94) for v1 ( p = 0.90).
As we found considerable heterogeneity among the included studies, meta-regression analyses were performed (Supplementary Table 1). Among several potential variables, the proportion of patients with PCa, magnetic field strength, and reference standard were significant factors affecting heterogeneity (p < 0.01 for all). However, among these three, only specificity according to the proportion of patients with PCa showed a difference that was both statistically significant and clinically meaningful: 0.65 (95% CI 0.52–0.78) in studies with ≤50% of patients with PCa versus 0.86 (95% CI 0.75–0.97) in studies with >50% of patients with PCa. Otherwise, no clinically meaningful differences were seen: for magnet strength (3 vs 1.5 T), sensitivity of 0.90 (95% CI 0.86–0.94) versus 0.89 (95% CI 0.81–0.97; p = 0.03) and specificity of 0.73 (95% CI 0.59–0.86) versus 0.72 (95% CI 0.44–1.00; p = 0.81); for reference standard (radical prostatectomy vs biopsy), sensitivity of 0.89 (95% CI 0.83–0.95) versus 0.91 (95% CI 0.88–0.94; p < 0.01) and specificity of 0.65 (95% CI 0.37–0.94) versus 0.73 (95% CI 0.58–0.87; p = 0.48). Other variables, including the cutoff value, use of an endorectal coil, and type of analysis (per patient vs per lesion), were not significant factors (p = 0.32–0.70).
As four studies used both ≥3 and ≥4 as cutoff values [17 22 23 29] and five studies assessed both any cancer and clinically significant cancer as outcomes [16 21 24 29 32], multiple subgroup analyses were performed to assess various clinical settings (Supplementary Table 2). Regarding cutoff values, the pooled sensitivity was 0.89 (95% CI 0.84–0.92) with specificity of 0.74 (95% CI 0.58–0.85) for 17 studies using ≥4 [16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 35], whereas these were 0.95 (95% CI 0.89–0.97) and 0.47 (95% CI 0.21–0.74) in eight studies using ≥3 [17 22 23 29 32 33 34 36]. Stratifying the studies according to the outcome assessed yielded the following results: (1) cutoff of ≥4 for determining any PCa, sensitivity of 0.89 (95% CI 0.83–0.93) with specificity of 0.80 (95% CI 0.62–0.90); (2) cutoff of ≥3 for determining any PCa, sensitivity of 0.96 (95% CI 0.93–0.98) with specificity of 0.49 (95% CI 0.29–0.70); (3) csPCa regardless of cutoff value, sensitivity of 0.89 (95% CI 0.84–0.92) with specificity of 0.64 (95% CI 0.46–0.78); (4) cutoff of ≥4 for determining csPCa, sensitivity of 0.90 (95% CI 0.85–0.94) with specificity of 0.62 (95% CI 0.45–0.77); and (5) cutoff of ≥3 for determining csPCa, sensitivity of 0.96 (95% CI 0.87–0.99) with specificity of 0.29 (95% CI 0.05–0.77). When the studies using a cutoff value of ≥4 were assessed separately according to the type of analysis, per-patient analysis in eight studies [17 18 20 26 27 29 31 35] yielded a pooled sensitivity of 0.89 (95% CI 0.81–0.93) with specificity of 0.76 (95% CI 0.60–0.88), whereas per-lesion analysis in nine studies [16 19 21 22 23 24 25 28 30] yielded a pooled sensitivity of 0.87 (95% CI 0.83–0.91) with specificity of 0.70 (95% CI 0.44–0.88).
Based on the localization of PCa, the pooled sensitivity was 0.93 (95% CI 0.87–0.96) with specificity of 0.68 (95% CI 0.43–0.86) in the seven studies analyzing PZ cancers [20 24 25 28 30 31 33]. For TZ cancers, analyzed in the same studies except that by Stanzione et al [31], the pooled sensitivity and specificity were 0.88 (95% CI 0.77–0.94) and 0.75 (95% CI 0.59–0.86), respectively.
Studies including patients without previous biopsies yielded sensitivity of 0.82 (95% CI 0.72–0.90) and specificity of 0.75 (95% CI 0.65–0.83), whereas the diagnostic performance values were 0.87 (95% CI 0.80–0.92) and 0.71 (95% CI 0.42–0.89) in studies including patients with a history of previous biopsy.
In our meta-analysis, we assessed the diagnostic accuracy of PI-RADSv2 for detecting PCa. The pooled sensitivity and specificity of all 21 studies were 0.89 (95% CI 0.86–0.92) and 0.73 (95% CI 0.60–0.83), respectively. Compared with the only two existing meta-analyses of mpMRI for detecting PCa, our data suggest a trend toward higher sensitivity and lower specificity. In the study by de Rooij et al [37], which evaluated seven studies using a combination of T2WI, DWI, and DCE-MRI, the pooled sensitivity and specificity were 0.74 (95% CI 0.66–0.81) and 0.88 (95% CI 0.82–0.92), respectively. In a more recent meta-analysis by Hamoen et al [6], which analyzed 14 studies using PI-RADSv1, the pooled sensitivity and specificity were 0.78 (95% CI 0.70–0.84) and 0.79 (95% CI 0.68–0.86), respectively. However, comparisons across these three meta-analyses are only indirect. To address this issue, we separately assessed the subgroup of studies that applied both PI-RADSv1 and PI-RADSv2. In this head-to-head comparison, PI-RADSv2 demonstrated higher pooled sensitivity (0.95) than PI-RADSv1 (0.88, p = 0.04), without a statistically significant difference in specificity (0.73 vs 0.75, p = 0.90). This gain in sensitivity over its predecessor may imply that the revisions undertaken during the development of PI-RADSv2, including the introduction of dominant sequences according to zonal anatomy, the limited contribution of DCE-MRI secondary to DWI and T2WI, and specific guidelines for deriving an integrated overall score, were, in fact, on the right track. In particular, we speculate that the use of dominant sequences, that is, DWI for the PZ and T2WI for the TZ, may have been crucial for the improved sensitivity without a loss in specificity, as suggested by Baur et al [10].
Considering that one of the main intentions behind PI-RADS was to standardize the reporting of mpMRI, in order to decrease variability and bring about widespread acceptance and implementation in daily practice, it is encouraging that nearly all (20 of 21) studies applied PI-RADSv2 strictly according to the published guidelines [11]. Only one study derived PI-RADSv2 scores from existing clinical radiological reports that were based on PI-RADSv1 or an in-house scoring system [29]. This is an improvement over prior studies using PI-RADSv1, in which investigators used varying methods to determine the overall score (an overall five-point score or the sum of the scores from each modality) [6]. Still, the cutoff value for detecting PCa needs further clarification. In the studies included in our meta-analysis, cutoff values were predefined in only six studies, while the majority (15/21) were exploratory in nature, testing multiple criteria. A cutoff value of ≥4 yielded generally good sensitivity (0.89) and specificity (0.74), whereas ≥3 yielded excellent sensitivity (0.95) but poor specificity (0.47). These results may be taken into consideration in the next update of PI-RADS. For instance, the former may be adequate for general use, whereas the latter could be reserved for situations in which a higher cancer detection rate is clinically required (eg, a persistently high PSA level despite a previously negative biopsy).
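Why lowering the cutoff trades specificity for sensitivity can be seen by dichotomizing one set of scores at each threshold. The minimal sketch below uses made-up overall PI-RADSv2 scores and reference-standard labels with hypothetical function names; it is not derived from any included study.

```python
def two_by_two(scores, has_cancer, cutoff):
    """Dichotomize PI-RADS scores: a case is test-positive if score >= cutoff.

    Returns (TP, FP, FN, TN) against the reference-standard labels.
    """
    tp = fp = fn = tn = 0
    for score, cancer in zip(scores, has_cancer):
        positive = score >= cutoff
        if positive and cancer:
            tp += 1
        elif positive:
            fp += 1
        elif cancer:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def sensitivity_specificity(tp, fp, fn, tn):
    return tp / (tp + fn), tn / (tn + fp)


if __name__ == "__main__":
    # Hypothetical overall PI-RADSv2 scores and reference-standard results
    scores = [5, 4, 3, 2, 4, 3, 3, 2, 1, 5]
    has_cancer = [True, True, True, False, False, False, False, False, False, True]
    for cutoff in (4, 3):
        sens, spec = sensitivity_specificity(*two_by_two(scores, has_cancer, cutoff))
        print(f">={cutoff}: sensitivity {sens:.2f}, specificity {spec:.2f}")
```

With these made-up data, moving from ≥4 to ≥3 turns the one score-3 cancer from a false negative into a true positive and the two score-3 benign cases from true negatives into false positives, so sensitivity rises from 0.75 to 1.00 while specificity falls from 0.83 to 0.50.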
In the current study, subgroup analyses were performed to account for differences in outcome (any cancer vs clinically significant cancer). There was no significant difference between the two outcomes, irrespective of whether a cutoff of ≥3 or ≥4 was used. However, the definition of clinically significant cancer differed among the 13 studies assessing it. Only three studies defined csPCa strictly according to the PI-RADSv2 guidelines (Gleason score ≥7 [3 + 4], volume ≥0.5 ml, or extraprostatic extension) [11]; most others used one or two of the three criteria. Restricting the analysis to those three studies might have provided more robust results; however, including all available studies was more pragmatic and gives a general overview of the existing literature, as this is the first meta-analysis of studies on PI-RADSv2.
We also examined technical aspects of MRI. Meta-regression analyses revealed that the use of an endorectal coil was not a statistically significant factor. Furthermore, although magnet strength showed a statistically significant difference between 3 and 1.5 T, the difference was not clinically meaningful (sensitivity of 0.90 vs 0.89, respectively; p = 0.03). Although both issues have been debated in the past, 3 and 1.5 T are now both well established, and the overall benefit of using an endorectal coil is not evident [38 39]. The PI-RADSv2 guidelines currently allow either approach, and the results of our study provide additional evidence to support this.
The studies also varied considerably in their reference standard and type of analysis. Radical prostatectomy was the reference standard in five studies, whereas the remaining studies relied on biopsy-based reference standards, including combinations of systematic and targeted biopsies. In biopsy-based studies, the possibility of PCa despite negative biopsy results should be kept in mind. In addition, roughly half of the studies reported outcomes per patient (n = 11) and the other half per lesion (n = 10). Per-lesion analysis additionally reflects how well the disease is localized; however, type of analysis was not a significant factor in the meta-regression analysis.
Our meta-analysis had some limitations. Nearly all studies were retrospective, resulting in a high risk of bias for patient selection. Pooling data from predominantly retrospective studies may have inflated diagnostic sensitivity [40]. However, a meta-analysis restricted to the three prospective studies was not technically feasible, and its results would not have been representative of the existing literature on PI-RADSv2. That said, we used validated methods for the systematic review and reported the data according to standard reporting guidelines, including PRISMA and the Handbook for Diagnostic Test Accuracy Reviews published by the Cochrane Collaboration [12 41]. Another limitation is the considerable heterogeneity in our pooled analysis, which limits the general applicability of our summary estimates. To explore this heterogeneity, we performed meta-regression and multiple subgroup analyses, which identified the proportion of patients with PCa, the magnetic field strength, and the reference standard as significant factors. In particular, the reference standard comprised various methods, including radical prostatectomy and combinations of systematic and targeted biopsies (ie, MRI guided, MRI–transrectal ultrasound fusion, or cognitive). The use of various definitions of clinically significant cancer also needs to be emphasized. Our meta-regression and subgroup analyses may explain some of the heterogeneity, but a portion remains unexplained. Another important limitation is the small number of studies available for head-to-head comparison between PI-RADSv1 and PI-RADSv2. Nevertheless, we were able to detect a statistically significant difference in sensitivity between the two versions using the six such studies published to date.
PI-RADSv2 shows good performance for the detection of PCa with pooled sensitivity of 0.89 and specificity of 0.73. PI-RADSv2 has higher pooled sensitivity compared with PI-RADSv1 without significantly different specificity.
Author contributions: Sang Youn Kim had full access to all the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis.
Study concept and design: Woo, Suh, S.Y. Kim.
Acquisition of data: Woo, Suh, S.Y. Kim.
Analysis and interpretation of data: Woo, Suh, S.Y. Kim.
Drafting of the manuscript: Woo.
Critical revision of the manuscript for important intellectual content: Suh, S.Y. Kim, Cho, S.H. Kim.
Statistical analysis: Suh.
Obtaining funding: None.
Administrative, technical, or material support: None.
Supervision: S.Y. Kim, Cho, S.H. Kim.
Other: None.
Financial disclosures: Sang Youn Kim certifies that all conflicts of interest, including specific financial interests and relationships and affiliations relevant to the subject matter or materials discussed in the manuscript (eg, employment/affiliation, grants or funding, consultancies, honoraria, stock ownership or options, expert testimony, royalties, or patents filed, received, or pending), are the following: None.
Funding/Support and role of the sponsor: None.
Prostate cancer (PCa) is the second leading cause of cancer-related mortality in Western men [1]. With recent technological advances and growing availability, multiparametric magnetic resonance imaging (mpMRI) is increasingly being used to guide several aspects of PCa management, including detection, staging, and treatment planning [2]. Despite abundant evidence in the literature reporting high accuracy of mpMRI for PCa diagnosis, widespread acceptance has been hampered by several factors, including difficulty of interpretation, lack of standardized interpretation criteria (ie, use of Likert scales based on the radiologist's subjective level of suspicion for PCa), and the resulting substantial inter-reader variability [3 4].
To bring standardization to the evaluation and reporting of mpMRI of the prostate, the European Society of Urogenital Radiology (ESUR) published a guideline termed Prostate Imaging Reporting and Data System (PI-RADS) in 2012 [5] . PI-RADS was generated based on expert consensus and provides a detailed scoring system for each MRI sequence (T2-weighted imaging [T2WI], diffusion-weighted imaging [DWI], dynamic contrast-enhanced MRI [DCE-MRI], and MR spectroscopy) for the presence of clinically significant PCa (csPCa). Several investigators have validated the accuracy and reproducibility of PI-RADS, and a recent meta-analysis reported pooled sensitivity and specificity of 0.78 and 0.79, respectively [6] . However, as there was no guideline for the generation of an overall score, different research groups utilized various measures for this purpose—some used a sum of the scores from each sequence (ranging from 3 to 15), whereas others used an overall score of 1–5 [7 8] . Furthermore, emerging data questioned the value of curve-type analysis of DCE-MRI [9] . In addition, investigators suggested that some sequences may be more important in determining the likelihood of PCa (ie, DWI in the peripheral zone [PZ] and T2WI in the transition zone [TZ]) rather than equal weighting for all sequences [10] .
To address these issues, the ESUR and the American College of Radiology recently released the updated PI-RADS version 2 (PI-RADSv2) [11]. The main changes from PI-RADSv1 to PI-RADSv2 are the following: (1) introduction of dominant sequences according to zonal anatomy (DWI for the PZ and T2WI for the TZ), (2) a limited role for DCE-MRI, which is assessed only as the presence or absence of early focal enhancement, and (3) generation of an overall score (1–5) integrating findings across all MRI sequences.
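The zone-dependent logic of PI-RADSv2 can be made concrete with a small function. This is a simplified sketch of the v2 decision rules for a lesion with diagnostically adequate T2WI and DWI: in the PZ, the DWI score determines the overall category, and a DWI score of 3 is upgraded to 4 when DCE-MRI is positive; in the TZ, the T2WI score determines the category, and a T2WI score of 3 is upgraded to 4 when the DWI score is 5. Handling of inadequate sequences and any changes in later PI-RADS revisions are deliberately left out, and the function name and signature are hypothetical.

```python
def pirads_v2_overall(zone, t2w, dwi, dce_positive):
    """Overall PI-RADSv2 assessment category (1-5) for one lesion.

    Simplified sketch: assumes diagnostically adequate T2WI and DWI.
    zone: "PZ" (peripheral zone) or "TZ" (transition zone)
    t2w, dwi: sequence scores 1-5; dce_positive: early focal enhancement present.
    """
    for name, score in (("t2w", t2w), ("dwi", dwi)):
        if score not in range(1, 6):
            raise ValueError(f"{name} score must be an integer 1-5, got {score!r}")
    if zone == "PZ":
        # DWI is dominant; positive DCE upgrades an equivocal DWI score of 3 to 4
        return 4 if dwi == 3 and dce_positive else dwi
    if zone == "TZ":
        # T2WI is dominant; a DWI score of 5 upgrades a T2WI score of 3 to 4
        return 4 if t2w == 3 and dwi == 5 else t2w
    raise ValueError(f"zone must be 'PZ' or 'TZ', got {zone!r}")


if __name__ == "__main__":
    print(pirads_v2_overall("PZ", t2w=2, dwi=3, dce_positive=True))   # DCE upgrade
    print(pirads_v2_overall("TZ", t2w=3, dwi=5, dce_positive=False))  # DWI upgrade
```

DCE-MRI does not enter the TZ branch at all, which mirrors the limited contribution of DCE-MRI described in point (2).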
Since then, several studies assessing the value of PI-RADSv2 have been published. However, the diagnostic performance of this new scoring system has not been evaluated systematically. Therefore, the purpose of our study was to assess the diagnostic performance of PI-RADSv2 for the detection of PCa. In addition, we aimed to compare the diagnostic performance of PI-RADSv1 and PI-RADSv2 in studies available for head-to-head comparison.
This meta-analysis was performed and written according to the preferred reporting items for systematic reviews and meta-analyses (PRISMA) guidelines [12] .
A computerized search of MEDLINE and EMBASE up to December 7, 2016, was performed in order to identify studies evaluating the diagnostic performance of PI-RADSv2 for the detection of PCa. The search query combined synonyms for PCa, MRI, and PI-RADS as follows: (prostate cancer OR prostatic cancer OR prostate neoplasm OR prostatic neoplasm OR prostate tumor OR prostatic tumor OR prostate carcinoma OR prostatic carcinoma OR PCa) AND (magnetic resonance imaging OR MRI OR MR) AND (prostate imaging reporting and data system OR pi-rads OR pi rads OR pirads). Bibliographies of identified articles were also screened to expand the scope of search. Our search was limited to publications in English.
Studies were included if they satisfied all the following requirements according to the PICOS criteria [12] : (1) included patients with suspected or diagnosed PCa; (2) for index test, mpMRI of the prostate including all required sequences of T2WI, DWI, and DCE-MRI was performed and assessed with a PI-RADSv2 scoring system; (3) for comparison, a reference standard based on the histopathological examination of radical prostatectomy or biopsy was used; (4) results were reported in sufficient detail for the reconstruction of 2 × 2 tables and determination of sensitivity and specificity at specified cutoff values for evaluating the diagnostic performance of PI-RADSv2; and (5) studies had to be original articles.
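Criterion (4) matters because pooled analysis needs per-study counts, not only percentages. When a study reports sensitivity, specificity, and the numbers of patients (or lesions) with and without cancer, the 2 × 2 table can often be back-calculated, as in the minimal sketch below. The function name and example values are hypothetical, and rounding means the recovered counts reproduce the published percentages only approximately.

```python
def reconstruct_2x2(sensitivity, specificity, n_diseased, n_nondiseased):
    """Back-calculate (TP, FP, FN, TN) from reported accuracy and group sizes.

    Counts are rounded to the nearest integer; note that Python's round()
    rounds exact halves to the nearest even integer.
    """
    tp = round(sensitivity * n_diseased)
    tn = round(specificity * n_nondiseased)
    return tp, n_nondiseased - tn, n_diseased - tp, tn


if __name__ == "__main__":
    # Hypothetical study: 40 patients with PCa and 60 without
    tp, fp, fn, tn = reconstruct_2x2(0.90, 0.75, 40, 60)
    print(tp, fp, fn, tn)
    print(tp / (tp + fn), tn / (tn + fp))  # recovered sensitivity, specificity
```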
Studies were excluded if any of the following criteria were met: (1) studies involving <10 patients; (2) review articles, guidelines, consensus statements, letters, editorials, and conference abstracts; (3) studies using only PI-RADSv1 for the evaluation of mpMRI of the prostate; (4) studies focusing on topics other than using the PI-RADSv2 system for diagnosing PCa (eg, staging and prediction of biochemical recurrence); and (5) studies with overlapping patient populations.
Two reviewers (S.W. and C.H.S., with 3 yr of experience in performing systematic reviews and meta-analyses) independently evaluated the eligibility of the selected studies. Disagreements between the two reviewers were resolved by consensus after discussion with a third reviewer (S.Y.K.).
We extracted the following data regarding study design and results from the selected studies using a standardized form:
Study characteristics—authors, year of publication, country and institution of origin, duration of patient recruitment, and study design (prospective vs retrospective and consecutive or not)
Demographic and clinical characteristics—sample size, number of patients with PCa, patient age, prostate-specific antigen (PSA) level and Gleason score, number of previous biopsies, and PCa diagnosis prior to mpMRI
Technical characteristics of mpMRI—scanner model and manufacturer, magnetic field strength (1.5 vs 3 T), coil type (endorectal vs pelvic phased array), and specific sequences used (T2WI, DWI, DCE, or MR spectroscopy)
Interpretation of mpMRI—number of reviewers and experience in prostate mpMRI, independent or consensus reading, and blinding to clinicopathological information
Reference standard—type of reference standard (radical prostatectomy, targeted biopsy, or systematic biopsy), interval between MRI and pathology, outcomes assessed (any PCa vs csPCa), definition of csPCa (studies assessing “clinically significant”, “aggressive”, or “high-grade” PCa were all considered to assess csPCa; however, only studies that used the definition provided by the PI-RADSv2 guideline, ie, Gleason score ≥7 [3 + 4], volume ≥0.5 ml, or extraprostatic extension, were considered to have no concern for applicability), separate analysis for the PZ and TZ, and type of analysis (per patient vs per lesion)
Diagnostic performance of PI-RADSv2 including criteria or cutoff values (in case of multiple readers, the results of the most experienced reader were extracted for this meta-analysis)
The methodological quality of the included studies was assessed using tailored questionnaires and criteria provided by Quality Assessment of Diagnostic Accuracy Studies-2 [13] . Data extraction and quality assessment were performed independently by two reviewers (S.W. and C.H.S.). All disagreements were resolved by consensus through discussion with the third reviewer (S.Y.K.).
The diagnostic performance of PI-RADSv2 for the detection of PCa was the primary outcome for this meta-analysis. In addition, a comparison between the diagnostic performance of PI-RADSv2 and that of PI-RADSv1 using studies that reported head-to-head comparison data of the two PI-RADS versions was considered a secondary outcome.
Pooled estimates of sensitivity and specificity were calculated using hierarchical logistic regression modeling including bivariate modeling and hierarchical summary receiver operating characteristic (HSROC) modeling [14] . For graphical presentation of the results, an HSROC curve with 95% confidence region and prediction region was plotted. Publication bias was evaluated using the Deeks’ funnel plot, and statistical significance was tested with the Deeks’ asymmetry test [15] .
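The bivariate and HSROC models pool sensitivity and specificity jointly and account for their correlation, which requires specialized software such as the Stata and R packages listed in the next paragraph. To illustrate only the underlying idea of random-effects pooling on the logit scale, the sketch below pools sensitivity alone with the univariate DerSimonian-Laird estimator. This is a deliberately simpler, swapped-in technique, not the bivariate model used in this meta-analysis, and it will generally not reproduce its estimates; function names, the 0.5 continuity correction, and the data are assumptions.

```python
import math


def dersimonian_laird(estimates, variances):
    """Univariate random-effects pooling (DerSimonian-Laird); needs >= 2 studies.

    Returns (pooled estimate, its standard error, between-study variance tau^2).
    """
    w = [1 / v for v in variances]
    sw = sum(w)
    fixed = sum(wi * y for wi, y in zip(w, estimates)) / sw
    q = sum(wi * (y - fixed) ** 2 for wi, y in zip(w, estimates))
    c = sw - sum(wi * wi for wi in w) / sw
    tau2 = max(0.0, (q - (len(estimates) - 1)) / c)
    w_re = [1 / (v + tau2) for v in variances]  # random-effects weights
    pooled = sum(wi * y for wi, y in zip(w_re, estimates)) / sum(w_re)
    return pooled, math.sqrt(1 / sum(w_re)), tau2


def inv_logit(x):
    return 1 / (1 + math.exp(-x))


def pooled_sensitivity(tables, z=1.96):
    """Pooled sensitivity with approximate 95% CI from (TP, FP, FN, TN) tables."""
    logits, variances = [], []
    for tp, fp, fn, tn in tables:
        a, c = tp + 0.5, fn + 0.5  # continuity correction
        logits.append(math.log(a / c))
        variances.append(1 / a + 1 / c)
    est, se, _ = dersimonian_laird(logits, variances)
    return inv_logit(est), inv_logit(est - z * se), inv_logit(est + z * se)


if __name__ == "__main__":
    # Hypothetical per-study 2x2 tables: (TP, FP, FN, TN)
    tables = [(45, 10, 5, 40), (30, 20, 4, 46), (80, 5, 12, 60), (25, 30, 1, 20)]
    sens, lo, hi = pooled_sensitivity(tables)
    print(f"pooled sensitivity {sens:.2f} (95% CI {lo:.2f}-{hi:.2f})")
```

Pooling on the logit scale and back-transforming keeps the estimate and its interval inside 0 to 1, which is also why the hierarchical models work with logit-transformed sensitivity and specificity.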
We performed meta-regression analyses to investigate the cause of heterogeneity. The following covariates were considered for the bivariate model: (1) proportion of patients with PCa (>50% vs ≤50%), (2) magnet strength of MRI (3 vs 1.5 T), (3) use of endorectal coil, (4) cutoff value (≥4 vs ≥3), (5) reference standard (radical prostatectomy vs biopsy), and (6) type of analysis (per patient vs per lesion). In addition, multiple subgroup analyses were performed for cutoff value, outcome, and previous biopsy history to assess various clinical settings: (1) a cutoff value of ≥4 for all studies, (2) a cutoff value of ≥3 for all studies, (3) a cutoff value of ≥4 for determining any PCa, (4) a cutoff value of ≥3 for determining any PCa, (5) a cutoff value of ≥4 for determining csPCa, (6) a cutoff value of ≥3 for determining csPCa, (7) a cutoff value of ≥4 in studies using per-patient analysis, (8) a cutoff value of ≥4 in studies using per-lesion analysis, (9) studies analyzing PZ PCa, (10) studies analyzing TZ PCa, (11) patients without previous biopsies, and (12) patients with previous biopsies. The “metandi” and “midas” modules in Stata 10.0 (StataCorp LP, College Station, TX, USA) and “mada” package in R software version 3.2.1 (R Foundation for Statistical Computing, Vienna, Austria) were used for statistical analyses, with p < 0.05 signifying statistical significance.
A systematic literature search initially identified 287 articles. After removing 46 duplicates, screening of the 241 titles and abstracts yielded 105 potentially eligible articles. Full-text reviews were performed, and 84 studies were excluded for the following reasons: not in the field of interest (n = 80, including 68 studies that used only PI-RADSv1), insufficient data to reconstruct 2 × 2 tables (n = 2), and shared study population with other studies (n = 2). Ultimately, 21 original articles including a total of 3857 patients assessing the diagnostic performance of PI-RADSv2 were included in the meta-analysis [16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36]. No additional studies were identified via screening the bibliographies of these 21 studies. Among them, 15 studies including 3099 patients dealt with PI-RADSv2 alone, whereas six studies including 758 patients provided a head-to-head comparison between PI-RADSv1 and PI-RADSv2 [16 20 21 28 32 33]. The detailed study selection process is described in Fig. 1.
Patient characteristics are shown in Table 1 . The size of the study population ranged from 49 to 456 patients, with the percentage of those with PCa ranging from 37% to 100%. The patients had a median age of 62–69.6 yr, median PSA of 3.97–15 ng/ml, and a Gleason score ranging from 5 to 10. Patients had already been diagnosed with PCa prior to MRI in all or some of the study populations in seven studies [16 17 22 26 27 32 35] . Biopsy was performed before MRI in seven studies [16 21 22 23 26 27 35] , all patients were biopsy-naïve in three studies [18 25 34] , both patient types were included in three studies [28 32 33] , and data regarding previous biopsy were not reported in the remaining eight studies.
First author (year) | Origin | Duration of patient recruitment | Patients ( n ) | Patients with PCa ( n ) | Age (yr) | PSA (ng/ml) | Gleason score | No. of previous biopsies | PCa diagnosis before MRI | ||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Country | Institution | Median | Range | Median | Range | Median | Range | ||||||
Auer (2016) [16] | Austria | Medical University of Innsbruck, Medizinische Hochschule Hannover | NR | 50 | NR | 63 a | NR | 7.3 a | NR | 7 (3 + 4) | 6–10 | ≥1 | Yes (all) |
Baldisserotto (2016) [17] | Brazil | Pontifícia Universidade Católica do Rio Grande do Sul | 2013.6–2015.6 | 54 | 33 | 65.9 a | 53–81 | 8.4 a | 3–31 | 7 (3 + 4) | 6–9 | NR | Yes (some) |
De Visschere (2017) [18] | Belgium | Ghent University Hospital | 2011.5–2014.12 | 245 | 144 | 66 | 44–85 | 9 | 1.4–935.5 | 7 (3 + 4) | ≥6 | 0 | No |
El-Samei (2016) [19] | Egypt | Al-Azhar University, El-Minia University | 2014.5–2015.10 | 55 | 38 | 62 a | 51–79 | NR | NR | 7 | 6–10 | NR | No |
Feng (2016) [20] | China | Tongji Hospital | 2013.6–2015.7 | 401 | 150 | 64.4 a | 34–88 | 10.7 | 0.2–1763 | 7 (4 + 3) | ≤6–≥8 | NR | NR |
Kasel-Seibert (2016) [21] | Germany | University Hospital Jena | 2013.7–2015.3 | 82 | 31 | 65 | 48–81 | 13 | 1–111 | 7 (3 + 4) | 6–9 | 1–5 | No |
Lin (2016) [22] | Brazil, Taiwan | Ribeirão Preto School of Medicine, China Medical University, Ribeirão Preto School of Medicine | 2011.5–2014.6 | 49 | 49 | 63 | 46–73 | 13.27 | 1.8–41.4 |  |  | ≥1 | Yes (all) |
Martorana (2016) [23] | Italy | University of Modena and Reggio Emilia, Perugia University | 2014.1–2016.2 | 157 | 79 | 65 | 47–79 b | 10.7 | 1–75 | 6 | 6–8 | 1–3 | No |
Mertan (2016) [24] | USA | National Cancer Institute | 2015.3–2015.9 | 62 | 38 | 65.5 | 50.3–76.6 | 7.1 | 0.5–863 | 7 (3 + 4) | 6–9 | NR | NR |
Muller (2015) [25] | USA, the Netherlands | National Cancer Institute, AMC University Hospital, Edward Hébert School of Medicine | 2011.12–2014.5 | 94 | 94 | 62 | 37–79 | 8.51 | 0.7–51.1 | 7 (3 + 4) | 6–10 | 0 | No |
Park (2016) [26] | South Korea | Samsung Medical Center | 2012.1–2014.12 | 456 | 456 | 65 | 46–81 | 3.97 | 3.8–4.2 | NR | ≥6 | ≥1 | Yes (all) |
Park (2016) [27] | South Korea | Yonsei University College of Medicine | 2012.1–2013.3 | 425 | 425 | NR | NR | NR | 1.4–156.9 | 5–6 | 5–10 | ≥1 | Yes (all) |
Polanec (2016) [28] | Austria | Medical University of Vienna, Confraternität Vienna | 2011.6–2015.9 | 65 | NR | 65.3 a | 62.3–87.4 | 10.8 a | 4.2–74.5 | 7 (4 + 3) | 6–9 | ≥0 | No |
Rastinehad (2015) [29] | USA | Icahn School of Medicine at Mount Sinai, Fox Chase Cancer Center, National Institutes of Health, Hofstra North Shore LIJ School of Medicine | 2012.2–2014.11 | 312 | 202 | 65.1 | 60.3–70.3 b | 7.3 | 5.0–11.4 b | NR | NR | NR | No |
Rosenkrantz (2016) [30] | USA | NYU Langone Medical Center | 2013.9–2015.2 | 343 | 134 | 64 | NR | 5.8 | NR | 7 (3 + 4) | 6–9 | NR | NR |
Stanzione (2016) [31] | Italy | University “Federico II”, Ospedale S. Maria delle Grazie | NR | 82 | 34 | 65 a | 43–84 | 8.8 a | NR | 7 (3 + 4) | 6–9 | NR | No |
Tan (2016) [32] | USA, Taiwan | UCLA, Cathay General Hospital, Hoag Hospital | 2013.3–2016.12 | 106 | 63 | 66.5 | 43–79 | 7.9 | 5.6–10.6 | 7 (3 + 4) | 6–10 | ≥0 | Yes (some) |
Tewes (2016) [33] | Germany | Hannover Medical School, Klinikum der Region Hannover | 2012.12–2014.12 | 54 | 31 | 69.6 a | NR | 8.7 a | NR | 6 | 6–9 | ≥0 | No |
Washino (2017) [34] | Japan | Jichi Medical University Saitama Medical Center | 2010.6–2014.4 | 288 | 159 | 69 | 64–74 b | 7.5 | 5.5–11 b | NR | NR | 0 | No |
Woo (2016) [35] | South Korea | Seoul National University College of Medicine | 2011.1–2013.12 | 105 | 105 | 69 | 49–79 | 8.22 | 0.9–44.2 | 7 | 6–7 | ≥1 | Yes (all) |
Zhao (2016) [36] | China | Peking University First Hospital | 2010.11–2013.12 | 372 | 185 | 68.5 a | NR | 15 a | NR | 7 | 6–10 | NR | No |
Characteristics of the studies are summarized in Table 2 . MRI was performed using 3-T scanners in 16 studies [16 17 18 20 24 25 26 27 28 29 30 31 32 33 35 36] , 1.5-T scanners in four studies [19 21 22 23] , and either 3- or 1.5-T scanners in one study [34] . Endorectal coils were used in seven studies [19 21 22 23 24 25 29] . In all studies, the mpMRI protocol consisted of T2WI, DWI, and DCE-MRI. The reference standard was radical prostatectomy in five studies [16 22 26 27 35] , a combination of systematic and targeted biopsies in seven studies [17 20 23 25 29 34 36] , and targeted biopsy alone in seven studies [19 21 24 28 30 32 33] ; the reference standard was not consistent throughout the study population in two studies [18 31] . PI-RADSv2 scoring was performed by one to five radiologists, either in consensus or independently. The radiologists' level of experience was heterogeneous, ranging from 4 to 22 yr of experience in prostate MRI. In most studies, the readers were blinded; however, in three studies, the radiologists were aware that the patients had biopsy-proven PCa [22 26 35] , and five studies were not explicit regarding blinding [18 19 30 32 36] . In most studies, the interval between MRI and the reference standard was less than 6 mo; however, this interval was not reported in 10 studies [16 17 18 24 27 28 29 30 32 33] . PCa was assessed separately according to zonal anatomy in seven studies [20 24 25 28 30 31 33] . However, in one of these studies [31] , only PCa in the PZ could be evaluated, as no detailed data were provided in the article and our attempt to contact the authors for further information was unsuccessful. Regarding the outcome assessed, eight studies evaluated any cancer [17 19 20 23 25 28 31 33] , eight evaluated clinically significant cancer [18 22 26 27 30 34 35 36] , and five evaluated both [16 21 24 29 32] .
With regard to cutoff values, 13 studies used ≥4 [16 18 19 20 21 24 25 26 27 28 30 31 35] , four studies used ≥3 [32 33 34 36] , and four studies used both [17 22 23 29] . The location of PCa was reported separately for the PZ and TZ in six studies [20 24 25 28 30 33] .
First author (year) | Design | Consecutive enrollment | Reference standard | MRI–reference standard interval | No. of readers | Experience (yr) | Blinding | Magnet strength (T) | Vendor | Model | Endorectal coil | PI-RADSv2 application | Cutoff values | Localization | Type of analysis | Outcome assessed | Definition of csPCa |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Auer (2016) [16] | Retrospective | Yes | RP | NR | 1 | >5 | Yes | 3 | Siemens | Skyra | No | Strict | 4 | Whole | Lesion | Any + csPCa | GS ≥7 (4 + 3) |
Baldisserotto (2016) [17] | Retrospective | Yes | STRUSGB + targeted MRI–TRUS biopsy (cognitive) or RP | NR | 2 Independent | >10/1 | Yes | 3 | GE | Signa HDxt | No | Strict | 3, 4 | Whole | Patient | Any | |
De Visschere (2017) [18] | Retrospective | NR | STRUSGB or RP | NR | NR | NR | NR | 3 | Siemens | Trio | No | Strict | 4 | Whole | Patient | csPCa | GS ≥7, ≥0.5 cc, or EPE |
El-Samei (2016) [19] | NR | NR | Targeted MRI–TRUS biopsy | ≤2 wk | NR | NR | NR | 1.5 | Philips | Gyroscan | Yes | Strict | 4 | Whole | Lesion | Any | |
Feng (2016) [20] | Retrospective | Yes | STRUSGB + targeted MRI–TRUS biopsy (cognitive) | 1–35 d | 2 Consensus | 5/4 | Yes | 3 | Siemens | Skyra | No | Strict | 4 | PZ, TZ | Patient | Any | |
Kasel-Seibert (2016) [21] | Retrospective | NR | Targeted MRGB | ≤7 d | 2 Independent | 10/<1 | Yes | 1.5 | Siemens | Avanto | Yes | Strict | 4 | Whole | Lesion | Any + csPCa | GS ≥7 |
Lin (2016) [22] | Retrospective | NR | RP | ≤6 mo | 2 Independent | 12/5 | Yes a | 1.5 | Philips | Achieva | Yes | Strict | 3, 4 | Whole | Lesion | csPCa | GS ≥7 and >0.5 cc |
Martorana (2016) [23] | Retrospective | Yes | TTB + targeted MRI–TRUS biopsy | 30–78 | 2 Consensus | 4/4 | Yes | 1.5 | Philips | Achieva | Yes | Strict | 3, 4 | Whole | Lesion | Any | GS ≥7, >0.5 cc, or EPE |
Mertan (2016) [24] | Prospective | Yes | STRUSGB + targeted MRI–TRUS biopsy | NR | 1 | >8 | Yes | 3 | Philips | Achieva | Yes | Strict | 4 | PZ, TZ | Lesion | Any + csPCa | GS ≥7 |
Muller (2015) [25] | Retrospective | Yes | STRUSGB + targeted MRI–TRUS biopsy | ≤6 wk | 5 Independent | 12/7/1/1/0.5 | Yes | 3 | Philips | Achieva | Yes | Strict | 4 | PZ, TZ | Lesion | Any | GS ≥7 (4 + 3) |
Park (2016) [26] | Retrospective | Yes | RP | 21–48 | 2 Independent | 14/3 | Yes a | 3 | Philips | Achieva | No | Strict | 4 | Whole | Patient | csPCa | GS ≥7, ≥0.5 cc, or EPE |
Park (2016) [27] | Retrospective | NR | RP | NR | 2 Independent | 9/4 | Yes | 3 | GE, Philips, Siemens | Discovery MR750, Achieva, Trio | No | Strict | 4 | Whole | Patient | csPCa | GS ≥7, ≥0.5 cc, or EPE |
Polanec (2016) [28] | Retrospective | Yes | Targeted MRGB | NR | 2 Independent | NR/NR | Yes | 3 | Siemens | Trio | No | Strict | 4 | PZ, TZ | Lesion | Any | |
Rastinehad (2015) [29] | Prospective | NR | STRUSGB + targeted MRI–TRUS biopsy | NR | 3 Consensus | NR | Yes | 3 | Siemens | Verio | Yes | Calculated b | 3, 4 | Whole | Patient | Any + csPCa | Epstein's criteria c for systematic and GS ≥7 or >0.5 cc for targeted biopsy |
Rosenkrantz (2016) [30] | Retrospective | NR | Targeted MRI–TRUS biopsy | NR | 2 Independent | 8/3 | NR | 3 | Siemens | Biograph mMR/Prisma/Skyra/Trio | No | Strict | 4 | PZ, TZ | Lesion | csPCa | GS ≥7 |
Stanzione (2016) [31] | Prospective | NR | STRUSGB (±targeted TRUSGB) | 20–30 d | 2 Independent | 14/10 | Yes | 3 | Siemens | Trio | No | Strict | 4 | PZ | Patient | Any | |
Tan (2016) [32] | Retrospective | Yes | Targeted MRGB | NR | 3 Consensus | 16/10 | NR | 3 | Siemens | Skyra, Trio, Verio | No | Strict | 3 | Whole | Lesion | Any + csPCa | GS ≥7 |
Tewes (2016) [33] | Retrospective | Yes | Targeted MRGB | NR | 2 Independent | 5/2 | Yes | 3 | Siemens | Skyra | No | Strict | 3 d | PZ, TZ | Patient | Any | |
Washino (2017) [34] | Retrospective | NR | TTB + targeted MRI–TRUS biopsy (cognitive) | 0.5–1.6 mo | 1 | 14 | Yes | 1.5 or 3 | Toshiba | Excelart Vantage/Vantage Titan 3T | No | Strict | 3 | Whole | Patient | csPCa | GS ≥7 or maximum core length ≥4 mm |
Woo (2016) [35] | Retrospective | NR | RP | ≤6 mo | 2 Consensus | 22/10 | Yes a | 3 | Philips, Siemens | Ingenia, Verio/Trio | No | Strict | 4 | Whole | Patient | csPCa | GS ≥7 |
Zhao (2016) [36] | Retrospective | NR | STRUSGB + targeted MRI–TRUS biopsy | ≤3 mo | 2 Independent | NR | NR | 3 | NR | NR | No | Strict | 3 | Whole | Patient | csPCa | GS ≥7 |
a Blinded but aware that patients had biopsy-proven PCa.
b PI-RADSv2 scores were generated from reports based on PI-RADSv1 and simplified qualitative system.
c Epstein's criteria = Gleason pattern ≥4, or Gleason 3 + 3 disease with core length ≥50% and/or >2 cores positive.
d For transition zone, cutoff value = 4.
csPCa = clinically significant prostate cancer; GS = Gleason score; EPE = extraprostatic extension; MRGB = magnetic resonance imaging-guided biopsy; MRI = magnetic resonance imaging; MRI–TRUS = fusion of magnetic resonance imaging and transrectal ultrasound images; NR = not reported; PCa = prostate cancer; PI-RADSv2 = Prostate Imaging Reporting and Data System version 2; PZ = peripheral zone; RP = radical prostatectomy; STRUSGB = systematic transrectal ultrasound-guided biopsy; TTB = transperineal template biopsy; TZ = transition zone.
Overall, the quality of the included studies was not high, mainly because of the patient selection domain ( Fig. 2 ). Regarding patient selection, there was generally a high risk of bias, as all but four of the studies were retrospective [16 17 18 20 21 22 23 25 26 27 28 30 32 33 34 35 36] . Seven studies were considered to have high concern for applicability, as all or some of the patients had a pathological diagnosis of PCa before MRI [16 17 22 26 27 32 35] . Regarding the index test domain, there was a high risk of bias in nine studies. In three of these nine studies, the readers were aware that patients had biopsy-proven PCa [22 26 35] ; in the other six, the cutoff value for determining PCa was not specified before interpretation [16 20 29 31 33 34] . Only one study had concern for applicability, as PI-RADSv2 scores were indirectly generated from existing clinical radiology reports based on PI-RADSv1 or an in-house scoring system [29] . Regarding the reference standard domain, eight studies had a high risk of bias. Seven were based on either systematic biopsy alone or targeted biopsy alone [18 19 21 28 30 32 33] ; in one study, targeted biopsy was performed, but on lesions that were suspicious on ultrasonography rather than on MRI [31] .
Studies in which radical prostatectomy or a systematic plus targeted biopsy (MRI guided, MRI–transrectal ultrasound fusion, or cognitive) was used as the reference standard were considered to have a low risk of bias. In 10 studies, the definition of clinically significant cancer did not follow that described in the PI-RADSv2 guidelines, and these studies were therefore considered to have high concern for applicability [16 21 22 24 29 30 32 34 35 36] . Regarding the flow and timing domain, two studies had a high risk of bias, as not all patients received the same reference standard [17 18] .
The sensitivity and specificity of individual studies ranged from 73% to 100% and from 8% to 100%, respectively. Cochran's Q test revealed substantial heterogeneity ( p < 0.001). The Higgins I² statistic demonstrated substantial heterogeneity in sensitivity ( I² = 85.55%) and considerable heterogeneity in specificity ( I² = 95.30%). The coupled forest plot of sensitivity and specificity demonstrated the absence of a threshold effect ( Fig. 3 ). The Spearman correlation coefficient between the sensitivity and the false-positive rate was 0.45 (95% confidence interval [CI] 0.023–0.738), also indicating the lack of a threshold effect.
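The heterogeneity and threshold-effect statistics above rest on standard formulas: Higgins' I² = max(0, (Q − df)/Q) × 100%, with df = k − 1 for k studies, and the threshold-effect check is a Spearman rank correlation between per-study sensitivity and false-positive rate (1 − specificity), where a strongly positive correlation would suggest a threshold effect. The following is a minimal stdlib-only Python sketch under those assumptions; the function names are hypothetical, and the example values are made up rather than taken from the included studies.

```python
import math

def i_squared(q, k):
    """Higgins' I^2 in percent from Cochran's Q and the number of studies k."""
    if q <= 0:
        return 0.0
    return max(0.0, (q - (k - 1)) / q) * 100.0

def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = mean_rank
        start = end + 1
    return ranks

def spearman(x, y):
    """Spearman rank correlation: the Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)

# Hypothetical per-study estimates (not the data of this meta-analysis)
sensitivity = [0.80, 0.90, 0.95, 0.85]
specificity = [0.70, 0.60, 0.40, 0.75]
fpr = [1 - s for s in specificity]  # false-positive rate

print(round(i_squared(q=20.0, k=11), 1))     # 50.0
print(round(spearman(sensitivity, fpr), 2))  # 0.8
```

Because Spearman's coefficient depends only on ranks, it is unaffected by whether sensitivity and false-positive rate are logit-transformed first. This sketch returns only the point estimate, not the confidence interval reported above.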
For all 21 studies combined, the pooled sensitivity was 0.89 (95% CI 0.86–0.92) with specificity of 0.73 (95% CI 0.60–0.83; Fig. 3 ). In the HSROC curve, there was a large difference between the 95% confidence region and the 95% prediction region, thus indicating heterogeneity between the studies ( Fig. 4 ). The area under the HSROC curve was 0.91 (95% CI 0.88–0.93). According to the Deeks’ funnel plot, the likelihood of publication bias was low, with a p value of 0.75 for the slope coefficient ( Fig. 5 ).
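Deeks' funnel plot asymmetry test regresses each study's log diagnostic odds ratio on 1/√ESS, where ESS = 4·n₁·n₂/(n₁ + n₂) is the effective sample size (n₁ patients with and n₂ without the target condition), weighting each study by its ESS; a slope that differs significantly from zero suggests publication bias. The sketch below is a minimal stdlib-only illustration of that weighted regression, not the software used in this study. The 0.5 continuity correction, the function names, and the example tables are our assumptions, and the p value would come from a t distribution with k − 2 degrees of freedom, which is not computed here.

```python
import math

def ln_dor(tp, fp, fn, tn):
    """Log diagnostic odds ratio; adds 0.5 to every cell if any cell is zero."""
    if 0 in (tp, fp, fn, tn):
        tp, fp, fn, tn = tp + 0.5, fp + 0.5, fn + 0.5, tn + 0.5
    return math.log((tp * tn) / (fp * fn))

def effective_sample_size(n_diseased, n_healthy):
    return 4 * n_diseased * n_healthy / (n_diseased + n_healthy)

def wls_line(x, y, w):
    """Weighted least-squares fit of y = a + b*x; returns (a, b, standard error of b)."""
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    b = sxy / sxx
    a = ybar - b * xbar
    rss = sum(wi * (yi - a - b * xi) ** 2 for wi, xi, yi in zip(w, x, y))
    se_b = math.sqrt(rss / (len(x) - 2) / sxx)  # residual variance on k - 2 df
    return a, b, se_b

def deeks_test(tables):
    """tables: (TP, FP, FN, TN) per study. Returns (slope, SE, t statistic)."""
    x, y, w = [], [], []
    for tp, fp, fn, tn in tables:
        ess = effective_sample_size(tp + fn, fp + tn)
        x.append(1 / math.sqrt(ess))
        y.append(ln_dor(tp, fp, fn, tn))
        w.append(ess)
    _, slope, se = wls_line(x, y, w)
    t = slope / se if se > 0 else math.nan
    return slope, se, t

# Hypothetical 2 x 2 tables, for illustration only
studies = [(90, 10, 10, 90), (45, 5, 5, 45), (20, 5, 5, 20), (80, 30, 20, 70)]
slope, se, t = deeks_test(studies)
print(f"slope = {slope:.3f}, SE = {se:.3f}, t = {t:.2f} on {len(studies) - 2} df")
```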
For the six studies that provided a head-to-head comparison between PI-RADSv1 and PI-RADSv2, PI-RADSv2 demonstrated higher pooled sensitivity of 0.95 (95% CI 0.85–0.98) compared with 0.88 (95% CI 0.80–0.93) for PI-RADSv1 ( p = 0.04). However, the pooled specificity did not show a significant difference between the two versions of PI-RADS: 0.73 (95% CI 0.47–0.89) for v2 and 0.75 (95% CI 0.36–0.94) for v1 ( p = 0.90).
As we found considerable heterogeneity among the included studies, meta-regression analyses were performed (Supplementary Table 1 ). Among several potential variables, the proportion of patients with PCa, magnetic field strength, and reference standard were significant factors affecting heterogeneity ( p < 0.01 for all). However, among these three, only specificity according to the proportion of patients with PCa showed a difference that was both statistically significant and clinically meaningful: 0.65 (95% CI 0.52–0.78) in studies in which ≤50% of patients had PCa versus 0.86 (95% CI 0.75–0.97) in studies in which >50% had PCa. The other two factors showed no clinically meaningful differences: for magnet strength (3 vs 1.5 T), sensitivity was 0.90 (95% CI 0.86–0.94) versus 0.89 (95% CI 0.81–0.97, p = 0.03) and specificity was 0.73 (95% CI 0.59–0.86) versus 0.72 (95% CI 0.44–1.00, p = 0.81); for reference standard (radical prostatectomy vs biopsy), sensitivity was 0.89 (95% CI 0.83–0.95) versus 0.91 (95% CI 0.88–0.94, p < 0.01) and specificity was 0.65 (95% CI 0.37–0.94) versus 0.73 (95% CI 0.58–0.87, p = 0.48). Other variables, including the cutoff value, use of an endorectal coil, and type of analysis (per patient vs per lesion), were not significant factors ( p = 0.32–0.70).
Because four studies used both ≥3 and ≥4 as cutoff values [17 22 23 29] and five studies assessed both any cancer and clinically significant cancer as outcomes [16 21 24 29 32] , multiple subgroup analyses were performed to assess various clinical settings (Supplementary Table 2 ). Regarding cutoff values, the pooled sensitivity was 0.89 (95% CI 0.84–0.92) with specificity of 0.74 (95% CI 0.58–0.85) for the 17 studies using ≥4 [16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 35] , whereas these were 0.95 (95% CI 0.89–0.97) and 0.47 (95% CI 0.21–0.74) for the eight studies using ≥3 [17 22 23 29 32 33 34 36] . Stratifying studies according to the outcome assessed yielded the following results: (1) cutoff of ≥4 for any PCa, sensitivity of 0.89 (95% CI 0.83–0.93) with specificity of 0.80 (95% CI 0.62–0.90); (2) cutoff of ≥3 for any PCa, sensitivity of 0.96 (95% CI 0.93–0.98) with specificity of 0.49 (95% CI 0.29–0.70); (3) csPCa regardless of cutoff value, sensitivity of 0.89 (95% CI 0.84–0.92) with specificity of 0.64 (95% CI 0.46–0.78); (4) cutoff of ≥4 for csPCa, sensitivity of 0.90 (95% CI 0.85–0.94) with specificity of 0.62 (95% CI 0.45–0.77); and (5) cutoff of ≥3 for csPCa, sensitivity of 0.96 (95% CI 0.87–0.99) with specificity of 0.29 (95% CI 0.05–0.77). When the studies using a cutoff value of ≥4 were assessed separately according to the type of analysis, per-patient analysis in eight studies [17 18 20 26 27 29 31 35] yielded pooled sensitivity of 0.89 (95% CI 0.81–0.93) with specificity of 0.76 (95% CI 0.60–0.88), whereas per-lesion analysis in nine studies [16 19 21 22 23 24 25 28 30] yielded pooled sensitivity of 0.87 (95% CI 0.83–0.91) with specificity of 0.70 (95% CI 0.44–0.88).
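The pooled estimates in this study come from a bivariate/HSROC approach, which models sensitivity and specificity jointly. As a simpler, univariate illustration of how a subgroup estimate can be pooled, the sketch below applies DerSimonian–Laird random-effects pooling to logit-transformed proportions (for sensitivity, TP out of TP + FN per study). It is a swapped-in technique and will not reproduce the bivariate estimates reported above; the function name, the 0.5 continuity correction, and the example counts are hypothetical.

```python
import math

def logit(p):
    return math.log(p / (1 - p))

def expit(x):
    return 1 / (1 + math.exp(-x))

def dl_pool_proportion(events, totals, z=1.96):
    """DerSimonian-Laird random-effects pooling of proportions on the logit scale.

    events/totals are per-study counts, e.g. TP and TP + FN for sensitivity.
    Returns (pooled proportion, lower CI bound, upper CI bound, tau^2).
    """
    y, v = [], []
    for e, n in zip(events, totals):
        if e == 0 or e == n:          # 0.5 continuity correction at the boundaries
            e, n = e + 0.5, n + 1.0
        y.append(logit(e / n))
        v.append(1 / e + 1 / (n - e))  # approximate within-study variance of the logit
    w = [1 / vi for vi in v]
    sw = sum(w)
    y_fixed = sum(wi * yi for wi, yi in zip(w, y)) / sw
    q = sum(wi * (yi - y_fixed) ** 2 for wi, yi in zip(w, y))  # Cochran's Q
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - (len(y) - 1)) / c) if c > 0 else 0.0  # between-study variance
    w_re = [1 / (vi + tau2) for vi in v]                      # random-effects weights
    mu = sum(wi * yi for wi, yi in zip(w_re, y)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))
    return expit(mu), expit(mu - z * se), expit(mu + z * se), tau2

# Hypothetical subgroup: true positives and cancer-positive totals in three studies
print(dl_pool_proportion([45, 88, 30], [50, 100, 31]))
```

Calling the function once per subgroup (for example, studies using ≥3 vs ≥4) gives separate pooled estimates; a joint model is still needed to respect the correlation between sensitivity and specificity.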
Based on the localization of PCa, the pooled sensitivity was 0.93 (95% CI 0.87–0.96) with specificity of 0.68 (95% CI 0.43–0.86) in the seven studies analyzing PZ cancers [20 24 25 28 30 31 33] . In the same studies excluding that by Stanzione et al [31] , which assessed only PZ cancers, the pooled sensitivity and specificity for TZ cancers were 0.88 (95% CI 0.77–0.94) and 0.75 (95% CI 0.59–0.86), respectively.
Studies including patients without previous biopsies yielded sensitivity of 0.82 (95% CI 0.72–0.90) and specificity of 0.75 (95% CI 0.65–0.83), whereas the diagnostic performance values were 0.87 (95% CI 0.80–0.92) and 0.71 (95% CI 0.42–0.89) in studies including patients with a history of previous biopsy.
In our meta-analysis, we assessed the diagnostic accuracy of PI-RADSv2 for detecting PCa. The pooled sensitivity and specificity of all 21 studies were 0.89 (95% CI 0.86–0.92) and 0.73 (95% CI 0.60–0.83), respectively. Compared with the only two existing meta-analyses of mpMRI for detecting PCa, our data suggest a trend toward higher sensitivity and lower specificity. In the study by de Rooij et al [37] , which evaluated seven studies using a combination of T2WI, DWI, and DCE-MRI, the pooled sensitivity and specificity were 0.74 (95% CI 0.66–0.81) and 0.88 (95% CI 0.82–0.92), respectively. In a more recent meta-analysis by Hamoen et al [6] , which analyzed 14 studies using PI-RADSv1, the pooled sensitivity and specificity were 0.78 (95% CI 0.70–0.84) and 0.79 (95% CI 0.68–0.86), respectively. However, comparisons across these three meta-analyses are only indirect. To address this issue, we separately assessed the subgroup of studies that applied both PI-RADSv1 and PI-RADSv2. In this head-to-head comparison, PI-RADSv2 demonstrated higher pooled sensitivity (0.95) than PI-RADSv1 (0.88, p = 0.04), without a statistically significant difference in specificity (0.73 vs 0.75, p = 0.90). This increase in sensitivity over its predecessor may imply that the revisions undertaken during the development of PI-RADSv2, including the introduction of dominant sequences according to zonal anatomy, the limited contribution of DCE-MRI secondary to DWI and T2WI, and specific guidelines for deriving an integrated overall score, were, in fact, on the right track. In particular, we speculate that the use of dominant sequences, that is, DWI for the PZ and T2WI for the TZ, may have been crucial for the improved sensitivity without a loss in specificity, as suggested by Baur et al [10] .
Considering that one of the main intentions behind PI-RADS was to standardize the reporting of mpMRI in order to decrease variability and promote widespread acceptance and implementation in daily practice, it was promising to find that nearly all (20 of 21) studies applied PI-RADSv2 strictly according to the published guidelines [11] . Only one study derived PI-RADSv2 scores from existing clinical radiology reports that were based on PI-RADSv1 or an in-house scoring system [29] . This is an improvement over prior studies using PI-RADSv1, in which investigators used varying methods to determine the overall score (an overall five-point score or the sum of the scores from each modality) [6] . Still, the cutoff value for detecting PCa needs further clarification. In the studies included in our meta-analysis, cutoff values were predefined in only six studies, while the majority (15/21) were exploratory in nature, testing multiple criteria. With a cutoff value of ≥4, sensitivity (0.89) and specificity (0.74) were both good, whereas a cutoff of ≥3 yielded excellent sensitivity (0.95) but poor specificity (0.47). These results may be taken into consideration when the next update of PI-RADS is developed. For instance, a cutoff of ≥4 may be adequate for general use of PI-RADS, whereas ≥3 could be indicated when a higher cancer detection rate is clinically required (eg, a persistently high PSA level despite a previously negative biopsy).
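The trade-off between cutoffs of ≥3 and ≥4 follows directly from how a five-point score is dichotomized: lowering the cutoff turns every score-3 finding into a positive call, which can only raise (or leave unchanged) sensitivity and lower (or leave unchanged) specificity. The sketch below illustrates this on a small, made-up set of per-patient scores; the data and the function name are hypothetical.

```python
def sens_spec_at_cutoff(scores, has_cancer, cutoff):
    """Sensitivity and specificity when scores >= cutoff are called positive."""
    tp = sum(1 for s, c in zip(scores, has_cancer) if c and s >= cutoff)
    fn = sum(1 for s, c in zip(scores, has_cancer) if c and s < cutoff)
    fp = sum(1 for s, c in zip(scores, has_cancer) if not c and s >= cutoff)
    tn = sum(1 for s, c in zip(scores, has_cancer) if not c and s < cutoff)
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical PI-RADSv2 overall scores and reference-standard results for 10 patients
scores = [5, 4, 3, 4, 2, 3, 1, 3, 5, 2]
has_cancer = [True, True, True, False, False, False, False, True, True, False]

for cutoff in (4, 3):
    sens, spec = sens_spec_at_cutoff(scores, has_cancer, cutoff)
    print(f"cutoff >= {cutoff}: sensitivity {sens:.2f}, specificity {spec:.2f}")
# cutoff >= 4: sensitivity 0.60, specificity 0.80
# cutoff >= 3: sensitivity 1.00, specificity 0.60
```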
In the current study, subgroup analyses were performed to account for differences in outcomes (any cancer vs clinically significant cancer). There was no significant difference between the two outcomes, irrespective of whether a cutoff of ≥3 or ≥4 was used. However, the definition of clinically significant cancer differed among the 13 studies that assessed it. Only three studies defined csPCa strictly according to the PI-RADSv2 guidelines (Gleason score ≥7 [3 + 4], volume ≥0.5 ml, or extraprostatic extension) [11] ; most others used one or two of the three criteria. Including only these three studies might have provided more robust results; however, including all available studies was pragmatic and provides a general overview of the existing literature, as this is the first meta-analysis of studies dealing with PI-RADSv2.
In this meta-analysis, we also examined the technical aspects of MRI. Meta-regression analyses revealed that the use of an endorectal coil was not a statistically significant factor. Furthermore, although magnet strength showed a statistically significant difference between 3 and 1.5 T, this difference was not clinically meaningful (sensitivity of 0.90 vs 0.89, respectively; p = 0.03). Although there has been debate over these two issues in the past, both 3 and 1.5 T are now well established, and the overall benefit of an endorectal coil is not evident [38 39] . The PI-RADSv2 guidelines currently endorse either option, and the results of our study provide additional evidence to support this.
Regarding the methods of analysis, the studies varied considerably in reference standard and type of analysis. Radical prostatectomy was the reference standard in five studies, while the remaining studies relied on biopsy (systematic, targeted, or both); in the latter group, the possibility of PCa despite negative biopsy results should be kept in mind. In addition, roughly half of the studies reported outcomes on a per-patient basis ( n = 11) and half on a per-lesion basis ( n = 10). Per-lesion analysis additionally accounts for the ability to localize the disease; however, the type of analysis was not a significant factor in the meta-regression analysis.
Our meta-analysis had some limitations. Nearly all studies were retrospective, resulting in a high risk of bias for patient selection, and pooling data from predominantly retrospective studies may have led to increased diagnostic sensitivity [40] . Restricting the analysis to the three prospective studies was not technically feasible, and the derived results would not have been representative of the existing literature on PI-RADSv2. Nevertheless, we used validated methods for the systematic review and reported the data according to standard reporting guidelines, including PRISMA and the guidelines of the Handbook for Diagnostic Test Accuracy Reviews published by the Cochrane Collaboration [12 41] . Another limitation is the considerable heterogeneity in our pooled analysis, which affects the general applicability of our summary estimates. To explore this heterogeneity, we performed meta-regression and multiple subgroup analyses, which identified the proportion of patients with PCa, the magnetic field strength, and the reference standard as significant factors. In particular, the reference standard comprised various methods, such as radical prostatectomy and a combination of systematic and targeted biopsies (ie, MRI guided, MRI–transrectal ultrasound fusion, or cognitive). The use of various definitions of clinically significant cancer also needs to be emphasized. Our meta-regression and subgroup analyses may explain some of the heterogeneity, but a portion remains unexplained. Another important limitation is the small number of studies available for head-to-head comparison between PI-RADSv1 and PI-RADSv2. Nevertheless, we were able to demonstrate a statistically significant difference in sensitivity between the two versions using the six studies published to date.
PI-RADSv2 shows good performance for the detection of PCa with pooled sensitivity of 0.89 and specificity of 0.73. PI-RADSv2 has higher pooled sensitivity compared with PI-RADSv1 without significantly different specificity.
Author contributions: Sang Youn Kim had full access to all the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis.
Study concept and design: Woo, Suh, S.Y. Kim.
Acquisition of data: Woo, Suh, S.Y. Kim.
Analysis and interpretation of data: Woo, Suh, S.Y. Kim.
Drafting of the manuscript: Woo.
Critical revision of the manuscript for important intellectual content: Suh, S.Y. Kim, Cho, S.H. Kim.
Statistical analysis: Suh.
Obtaining funding: None.
Administrative, technical, or material support: None.
Supervision: S.Y. Kim, Cho, S.H. Kim.
Other: None.
Financial disclosures: Sang Youn Kim certifies that all conflicts of interest, including specific financial interests and relationships and affiliations relevant to the subject matter or materials discussed in the manuscript (eg, employment/affiliation, grants or funding, consultancies, honoraria, stock ownership or options, expert testimony, royalties, or patents filed, received, or pending), are the following: None.
Funding/Support and role of the sponsor: None.
Prostate cancer (PCa) is the second leading cause of cancer-related mortality in Western men [1] . With recent technological advances and growing availability, multiparametric magnetic resonance imaging (mpMRI) is increasingly used to guide several aspects of PCa management, including detection, staging, and treatment planning [2] . Despite abundant evidence in the literature reporting high accuracy of mpMRI for PCa diagnosis, widespread acceptance has been hampered by several factors, including difficulty of interpretation, lack of standardized criteria for interpretation (ie, use of Likert scales based on the radiologist's subjective level of suspicion for PCa), and the resulting substantial inter-reader variability [3 4] .
To bring standardization to the evaluation and reporting of mpMRI of the prostate, the European Society of Urogenital Radiology (ESUR) published a guideline termed Prostate Imaging Reporting and Data System (PI-RADS) in 2012 [5] . PI-RADS was generated based on expert consensus and provides a detailed scoring system for each MRI sequence (T2-weighted imaging [T2WI], diffusion-weighted imaging [DWI], dynamic contrast-enhanced MRI [DCE-MRI], and MR spectroscopy) for the presence of clinically significant PCa (csPCa). Several investigators have validated the accuracy and reproducibility of PI-RADS, and a recent meta-analysis reported pooled sensitivity and specificity of 0.78 and 0.79, respectively [6] . However, as there was no guideline for the generation of an overall score, different research groups utilized various measures for this purpose—some used a sum of the scores from each sequence (ranging from 3 to 15), whereas others used an overall score of 1–5 [7 8] . Furthermore, emerging data questioned the value of curve-type analysis of DCE-MRI [9] . In addition, investigators suggested that some sequences may be more important in determining the likelihood of PCa (ie, DWI in the peripheral zone [PZ] and T2WI in the transition zone [TZ]) rather than equal weighting for all sequences [10] .
To address these issues, the ESUR and the American College of Radiology recently released the updated PI-RADS version 2 (PI-RADSv2) [11] . The main changes from PI-RADSv1 to PI-RADSv2 are the following: (1) introduction of dominant sequences according to zonal anatomy (DWI for the PZ and T2WI for the TZ), (2) limiting the contribution of DCE-MRI to the presence or absence of early focal enhancement, and (3) generation of an overall score (1–5) integrating findings across all MRI sequences.
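Changes (1) and (3) can be expressed as a small decision rule. The sketch below reflects our reading of the PI-RADSv2 guideline [11] : in the PZ, the DWI score determines the overall score, and a positive DCE result upgrades only an equivocal DWI score of 3 to 4; in the TZ, the T2WI score determines the overall score, and a DWI score of 5 upgrades only an equivocal T2WI score of 3 to 4. It is illustrative only (for example, it ignores cases with inadequate DWI), the function name is hypothetical, and the guideline itself remains the authoritative source.

```python
def pirads_v2_overall(zone, t2w, dwi, dce_positive):
    """Overall PI-RADSv2 assessment category (1-5) from per-sequence scores.

    zone: "PZ" or "TZ"; t2w and dwi: integer scores 1-5;
    dce_positive: True if early focal enhancement is present.
    """
    if not (1 <= t2w <= 5 and 1 <= dwi <= 5):
        raise ValueError("T2WI and DWI scores must be between 1 and 5")
    if zone == "PZ":
        # DWI is dominant; a positive DCE result upgrades only an equivocal DWI score of 3
        return 4 if dwi == 3 and dce_positive else dwi
    if zone == "TZ":
        # T2WI is dominant; a DWI score of 5 upgrades only an equivocal T2WI score of 3
        return 4 if t2w == 3 and dwi == 5 else t2w
    raise ValueError("zone must be 'PZ' or 'TZ'")

print(pirads_v2_overall("PZ", t2w=2, dwi=3, dce_positive=True))  # 4
print(pirads_v2_overall("TZ", t2w=3, dwi=4, dce_positive=True))  # 3
```

Note how DCE-MRI never changes a TZ score in this rule and affects a PZ score only at DWI = 3, which is the "limited contribution" described in change (2).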
Since then, several studies assessing the value of PI-RADSv2 have been published. However, the diagnostic performance of this new scoring system has not been evaluated systematically. Therefore, the purpose of our study was to assess the diagnostic performance of PI-RADSv2 for the detection of PCa. In addition, we aimed to compare the diagnostic performance of PI-RADSv1 and PI-RADSv2 in studies available for head-to-head comparison.
This meta-analysis was performed and written according to the preferred reporting items for systematic reviews and meta-analyses (PRISMA) guidelines [12] .
A computerized search of MEDLINE and EMBASE up to December 7, 2016, was performed in order to identify studies evaluating the diagnostic performance of PI-RADSv2 for the detection of PCa. The search query combined synonyms for PCa, MRI, and PI-RADS as follows: (prostate cancer OR prostatic cancer OR prostate neoplasm OR prostatic neoplasm OR prostate tumor OR prostatic tumor OR prostate carcinoma OR prostatic carcinoma OR PCa) AND (magnetic resonance imaging OR MRI OR MR) AND (prostate imaging reporting and data system OR pi-rads OR pi rads OR pirads). Bibliographies of identified articles were also screened to expand the scope of search. Our search was limited to publications in English.
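Because the query consists of three synonym groups, with terms joined by OR within each group and the groups joined by AND, it can be regenerated programmatically, which makes it easier to audit or extend. The sketch below rebuilds the exact string given above; the helper and constant names are hypothetical, and database-specific syntax (phrase quoting, field tags, MeSH or Emtree terms) is deliberately not modeled.

```python
# Synonym groups copied from the search strategy described above
PCA_TERMS = [
    "prostate cancer", "prostatic cancer", "prostate neoplasm", "prostatic neoplasm",
    "prostate tumor", "prostatic tumor", "prostate carcinoma", "prostatic carcinoma", "PCa",
]
MRI_TERMS = ["magnetic resonance imaging", "MRI", "MR"]
PIRADS_TERMS = ["prostate imaging reporting and data system", "pi-rads", "pi rads", "pirads"]

def build_query(*groups):
    """OR the terms within each group, then AND the parenthesized groups together."""
    return " AND ".join("(" + " OR ".join(group) + ")" for group in groups)

query = build_query(PCA_TERMS, MRI_TERMS, PIRADS_TERMS)
print(query)
```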
Studies were included if they satisfied all the following requirements according to the PICOS criteria [12] : (1) included patients with suspected or diagnosed PCa; (2) for index test, mpMRI of the prostate including all required sequences of T2WI, DWI, and DCE-MRI was performed and assessed with a PI-RADSv2 scoring system; (3) for comparison, a reference standard based on the histopathological examination of radical prostatectomy or biopsy was used; (4) results were reported in sufficient detail for the reconstruction of 2 × 2 tables and determination of sensitivity and specificity at specified cutoff values for evaluating the diagnostic performance of PI-RADSv2; and (5) studies had to be original articles.
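Criterion (4) matters because many studies report sensitivity and specificity together with the numbers of patients (or lesions) with and without cancer, from which the 2 × 2 counts can be recovered. The sketch below shows this reconstruction and the reverse calculation; rounding half-up to the nearest whole count is our assumption, and the function names and example numbers are hypothetical.

```python
def reconstruct_2x2(sensitivity, specificity, n_positive, n_negative):
    """Recover (TP, FP, FN, TN) from reported accuracy and reference-standard group sizes.

    n_positive / n_negative: cases with / without PCa on the reference standard.
    Counts are rounded half-up to the nearest integer.
    """
    tp = int(sensitivity * n_positive + 0.5)
    tn = int(specificity * n_negative + 0.5)
    return tp, n_negative - tn, n_positive - tp, tn

def sens_spec(tp, fp, fn, tn):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    return tp / (tp + fn), tn / (tn + fp)

table = reconstruct_2x2(0.90, 0.70, n_positive=100, n_negative=50)
print(table)              # (90, 15, 10, 35)
print(sens_spec(*table))  # (0.9, 0.7)
```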
Studies were excluded if any of the following criteria were met: (1) studies involving <10 patients; (2) review articles, guidelines, consensus statements, letters, editorials, and conference abstracts; (3) studies using only PI-RADSv1 for the evaluation of mpMRI of the prostate; (4) studies focusing on topics other than using the PI-RADSv2 system for diagnosing PCa (eg, staging and prediction of biochemical recurrence); and (5) studies with overlapping patient populations.
Two reviewers (S.W. and C.H.S., with 3 yr of experience in performing systematic reviews and meta-analyses) independently evaluated the eligibility of the selected studies from the literature. Disagreements, if present between the two reviewers, were resolved by consensus via discussion with a third reviewer (S.Y.K.).
We extracted the following data regarding study design and results from the selected studies using a standardized form:
Study characteristics—authors, year of publication, country and institution of origin, duration of patient recruitment, and study design (prospective vs retrospective and consecutive or not)
Demographic and clinical characteristics—sample size, number of patients with PCa, patient age, prostate-specific antigen (PSA) level and Gleason score, number of previous biopsies, and PCa diagnosis prior to mpMRI
Technical characteristics of mpMRI—scanner model and manufacturer, magnetic field strength (1.5 vs 3 T), coil type (endorectal vs pelvic phased array), and specific sequences used (T2WI, DWI, DCE, or MR spectroscopy)
Interpretation of mpMRI—number of reviewers and experience in prostate mpMRI, independent or consensus reading, and blinding to clinicopathological information
Reference standard—type of reference standard (radical prostatectomy, targeted biopsy, or systematic biopsy), interval between MRI and pathology, outcomes assessed (any PCa vs csPCa), definition of csPCa (studies assessing “clinically significant”, “aggressive”, or “high-grade” PCa were all considered to assess csPCa; however, only studies that used the definition provided by the PI-RADSv2 guideline [Gleason score ≥7 [3 + 4], volume ≥0.5 ml, or extraprostatic extension] were considered to have low concern for applicability), separate analysis for the PZ and TZ, and type of analysis (per patient vs per lesion)
Diagnostic performance of PI-RADSv2 including criteria or cutoff values (in case of multiple readers, the results of the most experienced reader were extracted for this meta-analysis)
The methodological quality of the included studies was assessed using tailored questionnaires and criteria provided by the Quality Assessment of Diagnostic Accuracy Studies-2 (QUADAS-2) tool [13]. Data extraction and quality assessment were performed independently by two reviewers (S.W. and C.H.S.). All disagreements were resolved by consensus through discussion with the third reviewer (S.Y.K.).
The diagnostic performance of PI-RADSv2 for the detection of PCa was the primary outcome for this meta-analysis. In addition, a comparison between the diagnostic performance of PI-RADSv2 and that of PI-RADSv1 using studies that reported head-to-head comparison data of the two PI-RADS versions was considered a secondary outcome.
Pooled estimates of sensitivity and specificity were calculated using hierarchical logistic regression modeling including bivariate modeling and hierarchical summary receiver operating characteristic (HSROC) modeling [14] . For graphical presentation of the results, an HSROC curve with 95% confidence region and prediction region was plotted. Publication bias was evaluated using the Deeks’ funnel plot, and statistical significance was tested with the Deeks’ asymmetry test [15] .
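The pooled estimates themselves came from bivariate and HSROC models (Stata `metandi`/`midas` and the R `mada` package), which model logit sensitivity and logit specificity jointly together with their correlation. As a simpler, self-contained illustration of random-effects pooling on the logit scale, the sketch below applies the univariate DerSimonian–Laird estimator to a single proportion (eg, sensitivity from TP out of TP + FN). It ignores the correlation between sensitivity and specificity, so it is a stand-in for intuition only and will not reproduce the bivariate estimates. The function name and the choice to add 0.5 to every study's cells are assumptions.

```python
import math
from typing import Sequence, Tuple


def expit(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def dl_pool_proportion(events: Sequence[int], totals: Sequence[int],
                       z: float = 1.959964) -> Tuple[float, float, float, float]:
    """Univariate DerSimonian-Laird pooling of proportions on the logit scale.

    Returns (pooled proportion, lower 95% limit, upper 95% limit, tau^2).
    """
    ys, vs = [], []
    for x, n in zip(events, totals):
        # 0.5 continuity correction, applied to every study here for simplicity,
        # keeps the logit finite when x == 0 or x == n
        a, b = x + 0.5, n - x + 0.5
        ys.append(math.log(a / b))      # logit of the corrected proportion
        vs.append(1.0 / a + 1.0 / b)    # approximate within-study variance
    w = [1.0 / v for v in vs]
    y_fixed = sum(wi * yi for wi, yi in zip(w, ys)) / sum(w)
    q = sum(wi * (yi - y_fixed) ** 2 for wi, yi in zip(w, ys))  # Cochran's Q
    df = len(ys) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0   # between-study variance
    w_re = [1.0 / (v + tau2) for v in vs]             # random-effects weights
    y_re = sum(wi * yi for wi, yi in zip(w_re, ys)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    return expit(y_re), expit(y_re - z * se), expit(y_re + z * se), tau2


if __name__ == "__main__":
    # Hypothetical per-study true positives and diseased totals (not study data)
    print(dl_pool_proportion([45, 30, 48], [50, 50, 50]))
```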
We performed meta-regression analyses to investigate the cause of heterogeneity. The following covariates were considered for the bivariate model: (1) proportion of patients with PCa (>50% vs ≤50%), (2) magnet strength of MRI (3 vs 1.5 T), (3) use of endorectal coil, (4) cutoff value (≥4 vs ≥3), (5) reference standard (radical prostatectomy vs biopsy), and (6) type of analysis (per patient vs per lesion). In addition, multiple subgroup analyses were performed for cutoff value, outcome, and previous biopsy history to assess various clinical settings: (1) a cutoff value of ≥4 for all studies, (2) a cutoff value of ≥3 for all studies, (3) a cutoff value of ≥4 for determining any PCa, (4) a cutoff value of ≥3 for determining any PCa, (5) a cutoff value of ≥4 for determining csPCa, (6) a cutoff value of ≥3 for determining csPCa, (7) a cutoff value of ≥4 in studies using per-patient analysis, (8) a cutoff value of ≥4 in studies using per-lesion analysis, (9) studies analyzing PZ PCa, (10) studies analyzing TZ PCa, (11) patients without previous biopsies, and (12) patients with previous biopsies. The “metandi” and “midas” modules in Stata 10.0 (StataCorp LP, College Station, TX, USA) and “mada” package in R software version 3.2.1 (R Foundation for Statistical Computing, Vienna, Austria) were used for statistical analyses, with p < 0.05 signifying statistical significance.
A systematic literature search initially identified 287 articles. After removing 46 duplicates, screening of the 241 titles and abstracts yielded 105 potentially eligible articles. Full-text reviews were performed, and 84 studies were excluded for the following reasons: not in the field of interest (n = 80, including 68 studies that used only PI-RADSv1), insufficient data to reconstruct 2 × 2 tables (n = 2), and shared study population with other studies (n = 2). Ultimately, 21 original articles including a total of 3857 patients assessing the diagnostic performance of PI-RADSv2 were included in the meta-analysis [16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36]. No additional studies were identified via screening the bibliographies of these 21 studies. Among them, 15 studies including 3099 patients dealt with PI-RADSv2 alone, whereas six studies including 758 patients provided a head-to-head comparison between PI-RADSv1 and PI-RADSv2 [16 20 21 28 32 33]. The detailed study selection process is described in Fig. 1.
Patient characteristics are shown in Table 1 . The size of the study population ranged from 49 to 456 patients, with the percentage of those with PCa ranging from 37% to 100%. The patients had a median age of 62–69.6 yr, median PSA of 3.97–15 ng/ml, and a Gleason score ranging from 5 to 10. Patients had already been diagnosed with PCa prior to MRI in all or some of the study populations in seven studies [16 17 22 26 27 32 35] . Biopsy was performed before MRI in seven studies [16 21 22 23 26 27 35] , all patients were biopsy-naïve in three studies [18 25 34] , both patient types were included in three studies [28 32 33] , and data regarding previous biopsy were not reported in the remaining eight studies.
First author (year) | Country | Institution | Duration of patient recruitment | Patients (n) | Patients with PCa (n) | Age, median (yr) | Age, range (yr) | PSA, median (ng/ml) | PSA, range (ng/ml) | Gleason score, median | Gleason score, range | No. of previous biopsies | PCa diagnosis before MRI |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Auer (2016) [16] | Austria | Medical University of Innsbruck, Medizinische Hochschule Hannover | NR | 50 | NR | 63 a | NR | 7.3 a | NR | 7 (3 + 4) | 6–10 | ≥1 | Yes (all) |
Baldisserotto (2016) [17] | Brazil | Pontifícia Universidade Católica do Rio Grande do Sul | 2013.6–2015.6 | 54 | 33 | 65.9 a | 53–81 | 8.4 a | 3–31 | 7 (3 + 4) | 6–9 | NR | Yes (some) |
De Visschere (2017) [18] | Belgium | Ghent University Hospital | 2011.5–2014.12 | 245 | 144 | 66 | 44–85 | 9 | 1.4–935.5 | 7 (3 + 4) | ≥6 | 0 | No |
El-Samei (2016) [19] | Egypt | Al-Azher University, El-Minia University | 2014.5–2015.10 | 55 | 38 | 62 a | 51–79 | NR | NR | 7 | 6–10 | NR | No |
Feng (2016) [20] | China | Tongji Hospital | 2013.6–2015.7 | 401 | 150 | 64.4 a | 34–88 | 10.7 | 0.2–1763 | 7 (4 + 3) | ≤6–≥8 | NR | NR |
Kasel-Seibert (2016) [21] | Germany | University Hospital Jena | 2013.7–2015.3 | 82 | 31 | 65 | 48–81 | 13 | 1–111 | 7 (3 + 4) | 6–9 | 1–5 | No |
Lin (2016) [22] | Brazil, Taiwan | Ribeirão Preto School of Medicine, China Medical University, Ribeirão Preto School of Medicine | 2011.5–2014.6 | 49 | 49 | 63 | 46–73 | 13.27 | 1.8–41.4 |  |  | ≥1 | Yes (all) |
Martorana (2016) [23] | Italy | University of Modena and Reggio Emilia, Perugia University | 2014.1–2016.2 | 157 | 79 | 65 | 47–79 b | 10.7 | 1–75 | 6 | 6–8 | 1–3 | No |
Mertan (2016) [24] | USA | National Cancer Institute | 2015.3–2015.9 | 62 | 38 | 65.5 | 50.3–76.6 | 7.1 | 0.5–863 | 7 (3 + 4) | 6–9 | NR | NR |
Muller (2015) [25] | USA, the Netherlands | National Cancer Institute, AMC University Hospital, Edward Hébert School of Medicine | 2011.12–2014.5 | 94 | 94 | 62 | 37–79 | 8.51 | 0.7–51.1 | 7 (3 + 4) | 6–10 | 0 | No |
Park (2016) [26] | South Korea | Samsung Medical Center | 2012.1–2014.12 | 456 | 456 | 65 | 46–81 | 3.97 | 3.8–4.2 | NR | ≥6 | ≥1 | Yes (all) |
Park (2016) [27] | South Korea | Yonsei University College of Medicine | 2012.1–2013.3 | 425 | 425 | NR | NR | NR | 1.4–156.9 | 5–6 | 5–10 | ≥1 | Yes (all) |
Polanec (2016) [28] | Vienna | Medical University of Vienna, Confraternität Vienna | 2011.6–2015.9 | 65 | NR | 65.3 a | 62.3–87.4 | 10.8 a | 4.2–74.5 | 7 (4 + 3) | 6–9 | ≥0 | No |
Rastinehad (2015) [29] | USA | Icahn School of Medicine at Mount Sinai, Fox Chase Cancer Center, National Institutes of Health, Hofstra North Shore LIJ School of Medicine | 2012.2–2014.11 | 312 | 202 | 65.1 | 60.3–70.3 b | 7.3 | 5.0–11.4 b | NR | NR | NR | No |
Rosenkrantz (2016) [30] | USA | NYU Langone Medical Center | 2013.9–2015.2 | 343 | 134 | 64 | NR | 5.8 | NR | 7 (3 + 4) | 6–9 | NR | NR |
Stanzione (2016) [31] | Italy | University “Federico II”, Ospedale S. Maria delle Grazie | NR | 82 | 34 | 65 a | 43–84 | 8.8 a | NR | 7 (3 + 4) | 6–9 | NR | No |
Tan (2016) [32] | USA, Taiwan | UCLA, Cathay General Hospital, Hoag Hospital | 2013.3–2016.12 | 106 | 63 | 66.5 | 43–79 | 7.9 | 5.6–10.6 | 7 (3 + 4) | 6–10 | ≥0 | Yes (some) |
Tewes (2016) [33] | Germany | Hannover Medical School, Klinikum der Region Hannover | 2012.12–2014.12 | 54 | 31 | 69.6 a | NR | 8.7 a | NR | 6 | 6–9 | ≥0 | No |
Washino (2017) [34] | Japan | Jichi Medical University Saitama Medical Center | 2010.6–2014.4 | 288 | 159 | 69 | 64–74 b | 7.5 | 5.5–11 b | NR | NR | 0 | No |
Woo (2016) [35] | South Korea | Seoul National University College of Medicine | 2011.1–2013.12 | 105 | 105 | 69 | 49–79 | 8.22 | 0.9–44.2 | 7 | 6–7 | ≥1 | Yes (all) |
Zhao (2016) [36] | China | Peking University First Hospital | 2010.11–2013.12 | 372 | 185 | 68.5 a | NR | 15 a | NR | 7 | 6–10 | NR | No |
Characteristics of the studies are summarized in Table 2. MRI was performed using 3-T scanners in 16 studies [16 17 18 20 24 25 26 27 28 29 30 31 32 33 35 36], 1.5-T scanners in four studies [19 21 22 23], and either 3 or 1.5 T in one study [34]. Endorectal coils were used in seven studies [19 21 22 23 24 25 29]. In all studies, the mpMRI protocol consisted of T2WI, DWI, and DCE-MRI. The reference standard was radical prostatectomy in five studies [16 22 26 27 35], a combination of systematic and targeted biopsies in seven studies [17 20 23 25 29 34 36], and targeted biopsy alone in seven studies [19 21 24 28 30 32 33]; in the remaining two studies, the reference standard was not consistent throughout the study population [18 31]. PI-RADSv2 scoring was performed by one to five radiologists, either in consensus or independently. The radiologists' experience in prostate imaging was heterogeneous, ranging from 4 to 22 yr. In most studies, the readers were blinded; however, in three studies, the radiologists were aware that the patients had biopsy-proven PCa [22 26 35], and five studies were not explicit regarding blinding [18 19 30 32 36]. In most studies, the interval between MRI and the reference standard was less than 6 mo; however, this interval was not reported in 10 studies [16 17 18 24 27 28 29 30 32 33]. PCa was assessed separately according to zonal anatomy in seven studies [20 24 25 28 30 31 33]. However, in one of these studies [31], only PCa in the PZ could be evaluated, as detailed data were not provided in the article and our attempt to contact the authors for further information was unsuccessful. Regarding the outcome assessed, eight studies evaluated any cancer [17 19 20 23 25 28 31 33], eight evaluated clinically significant cancer [18 22 26 27 30 34 35 36], and five evaluated both [16 21 24 29 32].
With regard to cutoff values, 13 studies used ≥4 [16 18 19 20 21 24 25 26 27 28 30 31 35], four studies used ≥3 [32 33 34 36], and four studies used both [17 22 23 29]. The location of PCa was reported separately for the PZ and TZ in six studies [20 24 25 28 30 33].
First author (year) | Design | Consecutive enrollment | Reference standard | MRI–reference standard interval | No. of readers | Experience (yr) | Blinding | Magnet strength (T) | Vendor | Model | Endorectal coil | PI-RADSv2 application | Cutoff values | Localization | Type of analysis | Outcome assessed | Definition of csPCa |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Auer (2016) [16] | Retrospective | Yes | RP | NR | 1 | >5 | Yes | 3 | Siemens | Skyra | No | Strict | 4 | Whole | Lesion | Any + csPCa | GS ≥7 (4 + 3) |
Baldisserotto (2016) [17] | Retrospective | Yes | STRUSGB + targeted MRI–TRUS biopsy (cognitive) or RP | NR | 2 Independent | >10/1 | Yes | 3 | GE | Signa HDxt | No | Strict | 3, 4 | Whole | Patient | Any |  |
De Visschere (2017) [18] | Retrospective | NR | STRUSGB or RP | NR | NR | NR | NR | 3 | Siemens | Trio | No | Strict | 4 | Whole | Patient | csPCa | GS ≥7, ≥0.5 cc, or EPE |
El-Samei (2016) [19] | NR | NR | Targeted MRI–TRUS biopsy | ≤2 wk | NR | NR | NR | 1.5 | Philips | Gyroscan | Yes | Strict | 4 | Whole | Lesion | Any |  |
Feng (2016) [20] | Retrospective | Yes | STRUSGB + targeted MRI–TRUS biopsy (cognitive) | 1–35 d | 2 Consensus | 5/4 | Yes | 3 | Siemens | Skyra | No | Strict | 4 | PZ, TZ | Patient | Any |  |
Kasel-Seibert (2016) [21] | Retrospective | NR | Targeted MRGB | ≤7 d | 2 Independent | 10/<1 | Yes | 1.5 | Siemens | Avanto | Yes | Strict | 4 | Whole | Lesion | Any + csPCa | GS ≥7 |
Lin (2016) [22] | Retrospective | NR | RP | ≤6 mo | 2 Independent | 12/5 | Yes a | 1.5 | Philips | Achieva | Yes | Strict | 3, 4 | Whole | Lesion | csPCa | GS ≥7 and >0.5 cc |
Martorana (2016) [23] | Retrospective | Yes | TTB + targeted MRI–TRUS biopsy | 30–78 | 2 Consensus | 4/4 | Yes | 1.5 | Philips | Achieva | Yes | Strict | 3, 4 | Whole | Lesion | Any | GS ≥7, >0.5 cc, or EPE |
Mertan (2016) [24] | Prospective | Yes | STRUSGB + targeted MRI–TRUS biopsy | NR | 1 | >8 | Yes | 3 | Philips | Achieva | Yes | Strict | 4 | PZ, TZ | Lesion | Any + csPCa | GS ≥7 |
Muller (2015) [25] | Retrospective | Yes | STRUSGB + targeted MRI–TRUS biopsy | ≤6 wk | 5 Independent | 12/7/1/1/0.5 | Yes | 3 | Philips | Achieva | Yes | Strict | 4 | PZ, TZ | Lesion | Any | GS ≥7 (4 + 3) |
Park (2016) [26] | Retrospective | Yes | RP | 21–48 | 2 Independent | 14/3 | Yes a | 3 | Philips | Achieva | No | Strict | 4 | Whole | Patient | csPCa | GS ≥7, ≥0.5 cc, or EPE |
Park (2016) [27] | Retrospective | NR | RP | NR | 2 Independent | 9/4 | Yes | 3 | GE, Philips, Siemens | Discovery MR750, Achieva, Trio | No | Strict | 4 | Whole | Patient | csPCa | GS ≥7, ≥0.5 cc, or EPE |
Polanec (2016) [28] | Retrospective | Yes | Targeted MRGB | NR | 2 Independent | NR/NR | Yes | 3 | Siemens | Trio | No | Strict | 4 | PZ, TZ | Lesion | Any |  |
Rastinehad (2015) [29] | Prospective | NR | STRUSGB + targeted MRI–TRUS biopsy | NR | 3 Consensus | NR | Yes | 3 | Siemens | Verio | Yes | Calculated b | 3, 4 | Whole | Patient | Any + csPCa | Epstein's criteria c for systematic and GS ≥7 or >0.5 cc for targeted biopsy |
Rosenkrantz (2016) [30] | Retrospective | NR | Targeted MRI–TRUS biopsy | NR | 2 Independent | 8/3 | NR | 3 | Siemens | Biograph mMR/Prisma/Skyra/Trio | No | Strict | 4 | PZ, TZ | Lesion | csPCa | GS ≥7 |
Stanzione (2016) [31] | Prospective | NR | STRUSGB (±targeted TRUSGB) | 20–30 d | 2 Independent | 14/10 | Yes | 3 | Siemens | Trio | No | Strict | 4 | PZ | Patient | Any |  |
Tan (2016) [32] | Retrospective | Yes | Targeted MRGB | NR | 3 Consensus | 16/10 | NR | 3 | Siemens | Skyra, Trio, Verio | No | Strict | 3 | Whole | Lesion | Any + csPCa | GS ≥7 |
Tewes (2016) [33] | Retrospective | Yes | Targeted MRGB | NR | 2 Independent | 5/2 | Yes | 3 | Siemens | Skyra | No | Strict | 3 d | PZ, TZ | Patient | Any |  |
Washino (2017) [34] | Retrospective | NR | TTB + targeted MRI–TRUS biopsy (cognitive) | 0.5–1.6 mo | 1 | 14 | Yes | 1.5 or 3 | Toshiba | Excelart Vantage/Vantage Titan 3T | No | Strict | 3 | Whole | Patient | csPCa | GS ≥7 or maximum core length ≥4 mm |
Woo (2016) [35] | Retrospective | NR | RP | ≤6 mo | 2 Consensus | 22/10 | Yes a | 3 | Philips, Siemens | Ingenia, Verio/Trio | No | Strict | 4 | Whole | Patient | csPCa | GS ≥7 |
Zhao (2016) [36] | Retrospective | NR | STRUSGB + targeted MRI–TRUS biopsy | ≤3 mo | 2 Independent | NR | NR | 3 | NR | NR | No | Strict | 3 | Whole | Patient | csPCa | GS ≥7 |
a Blinded but aware that patients had biopsy-proven PCa.
b PI-RADSv2 scores were generated from reports based on PI-RADSv1 and simplified qualitative system.
c Epstein's criteria = Gleason pattern ≥4, or Gleason 3 + 3 disease with core length ≥50% and/or >2 cores positive.
d For transition zone, cutoff value = 4.
csPCa = clinically significant prostate cancer; GS = Gleason score; EPE = extraprostatic extension; MRGB = magnetic resonance imaging-guided biopsy; MRI = magnetic resonance imaging; MRI–TRUS = fusion of magnetic resonance imaging and transrectal ultrasound images; NR = not reported; PCa = prostate cancer; PI-RADSv2 = Prostate Imaging Reporting and Data System version 2; PZ = peripheral zone; RP = radical prostatectomy; STRUSGB = systematic transrectal ultrasound-guided biopsy; TTB = transperineal template biopsy; TZ = transition zone.
Overall, the quality of the studies was not considered high, mainly because of the patient selection domain (Fig. 2). In this domain, the risk of bias was generally high, as all but four of the studies were retrospective [16 17 18 20 21 22 23 25 26 27 28 30 32 33 34 35 36]. Seven studies were considered to have high concern for applicability, as all or some of the patients had a pathological diagnosis of PCa prior to MRI [16 17 22 26 27 32 35]. Regarding the index test domain, nine studies had a high risk of bias. In three of these, reviewers were aware that patients had biopsy-proven PCa [22 26 35]; in the other six, the cutoff value for determining PCa was not specified prior to interpretation [16 20 29 31 33 34]. Only one study had concern for applicability, as PI-RADSv2 scores were indirectly generated from existing clinical radiological reports based on PI-RADSv1 or an in-house scoring system [29]. Regarding the reference standard domain, eight studies had a high risk of bias: seven were based on either systematic biopsy alone or targeted biopsy alone [18 19 21 28 30 32 33], and in one study, targeted biopsy was performed on lesions that were suspicious on ultrasonography rather than on MRI [31].
Studies in which radical prostatectomy or systematic plus targeted biopsy (MRI guided, MRI–transrectal ultrasound fusion, or cognitive) was used as the reference standard were considered to have a low risk of bias. In 10 studies, the definition of clinically significant cancer did not follow that of the PI-RADSv2 guidelines; these studies therefore had high concern for applicability [16 21 22 24 29 30 32 34 35 36]. Regarding the flow and timing domain, two studies had a high risk of bias, as not all patients received the same reference standard [17 18].
The sensitivity and specificity of individual studies ranged from 73% to 100% and from 8% to 100%, respectively. Cochran's Q test revealed substantial heterogeneity (p < 0.001). The Higgins I² statistic indicated substantial heterogeneity in sensitivity (I² = 85.55%) and considerable heterogeneity in specificity (I² = 95.30%). The coupled forest plot of sensitivity and specificity demonstrated the absence of a threshold effect (Fig. 3). The Spearman correlation coefficient between the sensitivity and the false-positive rate was 0.45 (95% confidence interval [CI] 0.023–0.738), also indicating the lack of a threshold effect.
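The I² values above follow the usual definition I² = max(0, (Q − df)/Q) × 100%, where Q is Cochran's statistic and df is the number of studies minus one. The minimal sketch below assumes that study-level effects (eg, logit sensitivities) and their within-study variances are already available; the function name and the example inputs are hypothetical.

```python
from typing import Sequence, Tuple


def cochran_q_i2(effects: Sequence[float], variances: Sequence[float]) -> Tuple[float, int, float]:
    """Return (Q, df, I^2 in percent) for inverse-variance weighted study effects."""
    w = [1.0 / v for v in variances]                       # inverse-variance weights
    pooled = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    q = sum(wi * (yi - pooled) ** 2 for wi, yi in zip(w, effects))
    df = len(effects) - 1
    # I^2: share of total variation attributed to between-study heterogeneity
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return q, df, i2


if __name__ == "__main__":
    print(cochran_q_i2([0.0, 2.0, 4.0], [1.0, 1.0, 1.0]))
```

Truncating negative values at zero means that when Q does not exceed its degrees of freedom, all observed variation is attributed to chance.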
For all 21 studies combined, the pooled sensitivity was 0.89 (95% CI 0.86–0.92) with specificity of 0.73 (95% CI 0.60–0.83; Fig. 3 ). In the HSROC curve, there was a large difference between the 95% confidence region and the 95% prediction region, thus indicating heterogeneity between the studies ( Fig. 4 ). The area under the HSROC curve was 0.91 (95% CI 0.88–0.93). According to the Deeks’ funnel plot, the likelihood of publication bias was low, with a p value of 0.75 for the slope coefficient ( Fig. 5 ).
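Deeks' test regresses each study's log diagnostic odds ratio (ln DOR) on 1/√ESS, where ESS = 4·n₁·n₂/(n₁ + n₂) is the effective sample size (n₁ diseased, n₂ non-diseased), weighting by ESS; a slope that differs from zero suggests funnel-plot asymmetry. The sketch below computes the slope and its t statistic with plain weighted least squares. Assumptions: 0.5 is added to every cell, the function names and example tables are hypothetical, and the two-sided p value would come from a t distribution with k − 2 degrees of freedom (for example via `scipy.stats.t.sf`, which is not used here so that the block has no dependencies).

```python
import math
from typing import Iterable, Sequence, Tuple


def weighted_linregress(x: Sequence[float], y: Sequence[float],
                        w: Sequence[float]) -> Tuple[float, float, float]:
    """Weighted least squares fit of y = a + b*x; returns (a, b, standard error of b)."""
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    b = sxy / sxx
    a = ybar - b * xbar
    rss = sum(wi * (yi - a - b * xi) ** 2 for wi, xi, yi in zip(w, x, y))
    se_b = math.sqrt(rss / (len(x) - 2) / sxx)
    return a, b, se_b


def deeks_regression(tables: Iterable[Tuple[int, int, int, int]]) -> Tuple[float, float]:
    """tables: (tp, fp, fn, tn) per study. Returns (slope, t statistic of slope)."""
    x, y, w = [], [], []
    for tp, fp, fn, tn in tables:
        n_dis, n_non = tp + fn, fp + tn
        ess = 4.0 * n_dis * n_non / (n_dis + n_non)      # effective sample size
        ln_dor = math.log(((tp + 0.5) * (tn + 0.5)) / ((fp + 0.5) * (fn + 0.5)))
        x.append(1.0 / math.sqrt(ess))
        y.append(ln_dor)
        w.append(ess)
    _, slope, se = weighted_linregress(x, y, w)
    return slope, slope / se


if __name__ == "__main__":
    demo = [(40, 10, 5, 45), (20, 8, 4, 30), (90, 20, 10, 80), (15, 3, 2, 10)]
    print(deeks_regression(demo))
```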
For the six studies that provided a head-to-head comparison between PI-RADSv1 and PI-RADSv2, PI-RADSv2 demonstrated higher pooled sensitivity of 0.95 (95% CI 0.85–0.98) compared with 0.88 (95% CI 0.80–0.93) for PI-RADSv1 ( p = 0.04). However, the pooled specificity did not show a significant difference between the two versions of PI-RADS: 0.73 (95% CI 0.47–0.89) for v2 and 0.75 (95% CI 0.36–0.94) for v1 ( p = 0.90).
As we found considerable heterogeneity among the included studies, meta-regression analyses were performed (Supplementary Table 1). Among several potential variables, proportion of patients with PCa, magnetic field strength, and reference standard were significant factors affecting the heterogeneity (p < 0.01 for all). However, among these three, only specificity according to the proportion of patients with PCa showed a difference that was both statistically significant and clinically meaningful: 0.65 (95% CI 0.52–0.78) in studies with ≤50% of patients with PCa versus 0.86 (95% CI 0.75–0.97) in studies with >50%. The remaining differences were not clinically meaningful: for magnet strength (3 vs 1.5 T), sensitivity of 0.90 (95% CI 0.86–0.94) versus 0.89 (95% CI 0.81–0.97, p = 0.03) and specificity of 0.73 (95% CI 0.59–0.86) versus 0.72 (95% CI 0.44–1.00, p = 0.81); for reference standard (radical prostatectomy vs biopsy), sensitivity of 0.89 (95% CI 0.83–0.95) versus 0.91 (95% CI 0.88–0.94, p < 0.01) and specificity of 0.65 (95% CI 0.37–0.94) versus 0.73 (95% CI 0.58–0.87, p = 0.48). Other variables, including the cutoff value, use of endorectal coil, and type of analysis (per patient vs per lesion), were not significant factors (p = 0.32–0.70).
As four studies used both ≥3 and ≥4 as cutoff values [17 22 23 29] and five assessed both any cancer and clinically significant cancer as outcomes [16 21 24 29 32], multiple subgroup analyses were performed to assess various clinical settings (Supplementary Table 2). Regarding cutoff values, the pooled sensitivity was 0.89 (95% CI 0.84–0.92) with specificity of 0.74 (95% CI 0.58–0.85) for 17 studies using ≥4 [16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 35], whereas these were 0.95 (95% CI 0.89–0.97) and 0.47 (95% CI 0.21–0.74) for eight studies using ≥3 [17 22 23 29 32 33 34 36]. Stratifying studies according to the outcome assessed yielded the following results: (1) cutoff of ≥4 for determining any PCa, sensitivity of 0.89 (95% CI 0.83–0.93) with specificity of 0.80 (95% CI 0.62–0.90); (2) cutoff of ≥3 for determining any PCa, sensitivity of 0.96 (95% CI 0.93–0.98) with specificity of 0.49 (95% CI 0.29–0.70); (3) csPCa regardless of cutoff value, sensitivity of 0.89 (95% CI 0.84–0.92) with specificity of 0.64 (95% CI 0.46–0.78); (4) cutoff of ≥4 for determining csPCa, sensitivity of 0.90 (95% CI 0.85–0.94) with specificity of 0.62 (95% CI 0.45–0.77); and (5) cutoff of ≥3 for determining csPCa, sensitivity of 0.96 (95% CI 0.87–0.99) with specificity of 0.29 (95% CI 0.05–0.77). When the studies using a cutoff value of ≥4 were assessed separately according to the type of analysis, per-patient analysis in eight studies [17 18 20 26 27 29 31 35] yielded pooled sensitivity of 0.89 (95% CI 0.81–0.93) with specificity of 0.76 (95% CI 0.60–0.88), whereas per-lesion analysis in nine studies [16 19 21 22 23 24 25 28 30] yielded pooled sensitivity of 0.87 (95% CI 0.83–0.91) with specificity of 0.70 (95% CI 0.44–0.88).
Based on the localization of PCa, the pooled sensitivity was 0.93 (95% CI 0.87–0.96) with specificity of 0.68 (95% CI 0.43–0.86) in seven studies analyzing PZ cancers [20 24 25 28 30 31 33]. For TZ cancers, analyzed in the same studies except that by Stanzione et al [31], the pooled sensitivity and specificity were 0.88 (95% CI 0.77–0.94) and 0.75 (95% CI 0.59–0.86), respectively.
Studies including patients without previous biopsies yielded sensitivity of 0.82 (95% CI 0.72–0.90) and specificity of 0.75 (95% CI 0.65–0.83), whereas the corresponding values were 0.87 (95% CI 0.80–0.92) and 0.71 (95% CI 0.42–0.89) in studies including patients with a history of previous biopsy.
In our meta-analysis, we assessed the diagnostic accuracy of PI-RADSv2 for detecting PCa. The pooled sensitivity and specificity of all 21 studies were 0.89 (95% CI 0.86–0.92) and 0.73 (95% CI 0.60–0.83), respectively. Compared with the only two existing meta-analyses of mpMRI for detecting PCa, our data suggest a trend toward higher sensitivity and lower specificity. In the study by de Rooij et al [37], which evaluated seven studies using a combination of T2WI, DWI, and DCE-MRI, the pooled sensitivity and specificity were 0.74 (95% CI 0.66–0.81) and 0.88 (95% CI 0.82–0.92), respectively. In a more recent meta-analysis by Hamoen et al [6], which analyzed 14 studies using PI-RADSv1, the pooled sensitivity and specificity were 0.78 (95% CI 0.70–0.84) and 0.79 (95% CI 0.68–0.86), respectively. However, comparisons across these three meta-analyses are only indirect. To address this issue, we separately assessed the subgroup of studies that used both PI-RADSv1 and PI-RADSv2. In this head-to-head comparison, PI-RADSv2 demonstrated higher pooled sensitivity (0.95) than PI-RADSv1 (0.88, p = 0.04) without a statistically significant difference in specificity (0.73 vs 0.75, p = 0.90). This increase in sensitivity may imply that the revisions undertaken during the development of PI-RADSv2, including the introduction of dominant sequences according to zonal anatomy, the limited role of DCE-MRI as secondary to DWI and T2WI, and specific guidelines for deriving an integrated overall score, were, in fact, on the right track. In particular, we speculate that the use of dominant sequences, that is, DWI for the PZ and T2WI for the TZ, may have been crucial for the improved sensitivity without a loss in specificity, as suggested by Baur et al [10].
Considering that one of the main intentions behind PI-RADS was to standardize the reporting of mpMRI in order to decrease variability and promote widespread acceptance and implementation in daily practice, it was encouraging to find that nearly all (20 of 21) studies applied PI-RADSv2 strictly according to the published guidelines [11]. Only one study derived PI-RADSv2 scores from existing clinical radiological reports that were based on PI-RADSv1 or an in-house scoring system [29]. This is an improvement over prior studies using PI-RADSv1, in which investigators used varying methods to determine the overall score (overall five-point score or sum of the scores from each modality) [6]. Still, the cutoff value for detecting PCa needs further clarification. In the studies included in our meta-analysis, cutoff values were predefined in only six studies, while the majority (15/21) were exploratory in nature, testing multiple criteria. A cutoff value of ≥4 yielded generally good sensitivity (0.89) and specificity (0.74), whereas ≥3 yielded excellent sensitivity (0.95) but poor specificity (0.47). These results may be taken into consideration when developing the next update of PI-RADS. For instance, a cutoff of ≥4 may be adequate for general use, whereas ≥3 could be reserved for situations in which a higher cancer detection rate is clinically required (eg, persistently high PSA level despite a previously negative biopsy).
In the current study, subgroup analyses were performed to account for differences in outcomes (any cancer vs clinically significant cancer). Results did not differ significantly between the two outcomes, irrespective of whether a cutoff of ≥3 or ≥4 was used. However, the definition of clinically significant cancer varied among the 13 studies assessing it. Only three studies defined csPCa strictly according to the PI-RADSv2 guidelines (Gleason score ≥7 [3 + 4], volume ≥0.5 ml, or extraprostatic extension) [11]; most others used one or two of the three criteria. Restricting the analysis to these three studies might have provided more robust results; however, including all available studies was pragmatic and provides a general overview of the existing literature, as this is the first meta-analysis of studies dealing with PI-RADSv2.
We also examined the technical aspects of MRI. Meta-regression analyses revealed that the use of an endorectal coil was not a statistically significant factor. Furthermore, although magnetic field strength showed a statistically significant difference between 3 and 1.5 T, this difference was not clinically meaningful (sensitivity of 0.90 vs 0.89, respectively; p = 0.03). Although these two issues have been debated in the past, 3 and 1.5 T are now both well established, and the overall benefit of an endorectal coil is not evident [38 39]. The PI-RADSv2 guidelines currently accept either option, and the results of our study provide additional evidence to support this.
The included studies also varied considerably in reference standard and type of analysis. Radical prostatectomy was the reference standard in five studies, whereas the others relied on biopsy (systematic, targeted, or both). In these biopsy-based studies, the possibility of PCa despite negative biopsy results should be kept in mind. In addition, outcomes were reported per patient in 11 studies and per lesion in 10. Per-lesion analysis also takes into account how accurately the disease is localized; however, type of analysis was not a significant factor in the meta-regression analysis.
Our meta-analysis had some limitations. Nearly all studies were retrospective, resulting in a high risk of bias for patient selection. Pooling data from predominantly retrospective studies may have led to higher estimated diagnostic sensitivity [40]. However, a meta-analysis restricted to the three prospective studies would have been technically unfeasible, and its results would not have been representative of the existing literature on PI-RADSv2. We also used validated methods for the systematic review and reported the data according to standard reporting guidelines, including PRISMA and the Cochrane Handbook for Diagnostic Test Accuracy Reviews [12 41]. Another limitation is the considerable heterogeneity in our pooled analysis, which limits the general applicability of our summary estimates. To explore this heterogeneity, we performed meta-regression and multiple subgroup analyses, which identified the proportion of patients with PCa, magnetic field strength, and reference standard as significant factors. In particular, the reference standard encompassed various methods, including radical prostatectomy and a combination of systematic and targeted biopsies (ie, MRI guided, MRI–transrectal ultrasound fusion, or cognitive). The use of various definitions of clinically significant cancer also needs to be emphasized. Our meta-regression and subgroup analyses may explain some of the heterogeneity, but a portion remains unexplained. Another important limitation is the small number of studies available for the head-to-head comparison between PI-RADSv1 and PI-RADSv2. Nevertheless, we were able to demonstrate a statistically significant difference in sensitivity between the two versions using the six studies published to date.
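To make the ≥4 versus ≥3 trade-off concrete, the arithmetic below applies the pooled estimates reported in this meta-analysis (sensitivity 0.89 and specificity 0.74 for ≥4; 0.95 and 0.47 for ≥3) to a hypothetical cohort of 1000 men with an assumed PCa prevalence of 50%. The cohort size and prevalence are illustrative assumptions, not findings of this study, and the function name `expected_counts` is hypothetical.

```python
def expected_counts(sens: float, spec: float, prevalence: float, n: int = 1000) -> dict:
    """Expected 2 x 2 cell counts when a test with the given sensitivity and
    specificity is applied to a hypothetical cohort of n patients."""
    if not 0.0 <= prevalence <= 1.0:
        raise ValueError("prevalence must lie in [0, 1]")
    diseased = n * prevalence
    healthy = n - diseased
    return {
        "tp": sens * diseased,          # cancers detected
        "fn": (1 - sens) * diseased,    # cancers missed
        "fp": (1 - spec) * healthy,     # false-positive results
        "tn": spec * healthy,           # correctly negative results
    }


if __name__ == "__main__":
    for label, sens, spec in (("PI-RADS >= 4", 0.89, 0.74), ("PI-RADS >= 3", 0.95, 0.47)):
        counts = expected_counts(sens, spec, prevalence=0.5)  # 50% prevalence is an assumption
        print(label, {k: round(v) for k, v in counts.items()})
```

Under these assumptions, moving from ≥4 to ≥3 detects 30 additional cancers (475 vs 445) at the cost of 135 additional false-positive results (265 vs 130).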
PI-RADSv2 shows good performance for the detection of PCa with pooled sensitivity of 0.89 and specificity of 0.73. PI-RADSv2 has higher pooled sensitivity compared with PI-RADSv1 without significantly different specificity.
Author contributions: Sang Youn Kim had full access to all the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis.
Study concept and design: Woo, Suh, S.Y. Kim.
Acquisition of data: Woo, Suh, S.Y. Kim.
Analysis and interpretation of data: Woo, Suh, S.Y. Kim.
Drafting of the manuscript: Woo.
Critical revision of the manuscript for important intellectual content: Suh, S.Y. Kim, Cho, S.H. Kim.
Statistical analysis: Suh.
Obtaining funding: None.
Administrative, technical, or material support: None.
Supervision: S.Y. Kim, Cho, S.H. Kim.
Other: None.
Financial disclosures: Sang Youn Kim certifies that all conflicts of interest, including specific financial interests and relationships and affiliations relevant to the subject matter or materials discussed in the manuscript (eg, employment/affiliation, grants or funding, consultancies, honoraria, stock ownership or options, expert testimony, royalties, or patents filed, received, or pending), are the following: None.
Funding/Support and role of the sponsor: None.
Prostate cancer (PCa) is the second leading cause of cancer-related mortality in Western men [1]. With recent technological advances and growing availability, multiparametric magnetic resonance imaging (mpMRI) is increasingly used to guide several aspects of PCa management, including detection, staging, and treatment planning [2]. Despite abundant evidence in the literature that mpMRI is highly accurate for PCa diagnosis, its widespread acceptance has been hampered by several factors, including difficulty of interpretation, lack of standardized interpretation criteria (ie, use of Likert scales based on the radiologist's subjective level of suspicion for PCa), and the resulting substantial inter-reader variability [3 4].
To bring standardization to the evaluation and reporting of prostate mpMRI, the European Society of Urogenital Radiology (ESUR) published a guideline termed Prostate Imaging Reporting and Data System (PI-RADS) in 2012 [5]. PI-RADS was based on expert consensus and provides a detailed scoring system for each MRI sequence (T2-weighted imaging [T2WI], diffusion-weighted imaging [DWI], dynamic contrast-enhanced MRI [DCE-MRI], and MR spectroscopy) that rates the likelihood of clinically significant PCa (csPCa). Several investigators have validated the accuracy and reproducibility of PI-RADS, and a recent meta-analysis reported pooled sensitivity and specificity of 0.78 and 0.79, respectively [6]. However, because there was no guideline for generating an overall score, research groups used different approaches: some summed the scores from each sequence (ranging from 3 to 15), whereas others assigned an overall score of 1–5 [7 8]. Furthermore, emerging data questioned the value of curve-type analysis of DCE-MRI [9]. In addition, investigators suggested that some sequences may be more important than others in determining the likelihood of PCa (ie, DWI in the peripheral zone [PZ] and T2WI in the transition zone [TZ]), arguing against equal weighting of all sequences [10].
To address these issues, the ESUR and the American College of Radiology recently released the updated PI-RADS version 2 (PI-RADSv2) [11]. The main changes from PI-RADSv1 to PI-RADSv2 are (1) introduction of dominant sequences according to zonal anatomy (DWI for the PZ and T2WI for the TZ), (2) restriction of the contribution of DCE-MRI to the presence or absence of early focal enhancement, and (3) generation of an overall score (1–5) integrating findings across all MRI sequences.
Since its release, several studies assessing the value of PI-RADSv2 have been published. However, the diagnostic performance of this new scoring system has not yet been evaluated systematically. Therefore, the purpose of our study was to assess the diagnostic performance of PI-RADSv2 for the detection of PCa. In addition, we aimed to compare the diagnostic performance of PI-RADSv1 and PI-RADSv2 in studies that allowed a head-to-head comparison.
This meta-analysis was performed and written according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines [12].
A computerized search of MEDLINE and EMBASE up to December 7, 2016, was performed in order to identify studies evaluating the diagnostic performance of PI-RADSv2 for the detection of PCa. The search query combined synonyms for PCa, MRI, and PI-RADS as follows: (prostate cancer OR prostatic cancer OR prostate neoplasm OR prostatic neoplasm OR prostate tumor OR prostatic tumor OR prostate carcinoma OR prostatic carcinoma OR PCa) AND (magnetic resonance imaging OR MRI OR MR) AND (prostate imaging reporting and data system OR pi-rads OR pi rads OR pirads). Bibliographies of identified articles were also screened to expand the scope of search. Our search was limited to publications in English.
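The search string above is an AND of three OR-blocks, one per concept (PCa, MRI, PI-RADS), so a record must match at least one synonym from every block. A minimal Python sketch that assembles the same Boolean string from the synonym lists printed above follows. The helper names are hypothetical, and the database-specific syntax actually used in MEDLINE and EMBASE (field tags, phrase quoting, truncation) is not reported, so this reproduces only the logical structure of the query.

```python
# Concept blocks copied from the search query described above.
PCA_TERMS = [
    "prostate cancer", "prostatic cancer", "prostate neoplasm",
    "prostatic neoplasm", "prostate tumor", "prostatic tumor",
    "prostate carcinoma", "prostatic carcinoma", "PCa",
]
MRI_TERMS = ["magnetic resonance imaging", "MRI", "MR"]
PIRADS_TERMS = [
    "prostate imaging reporting and data system", "pi-rads", "pi rads", "pirads",
]


def or_block(terms: list[str]) -> str:
    """Join the synonyms of one concept with OR and wrap them in parentheses."""
    return "(" + " OR ".join(terms) + ")"


def build_query(*blocks: list[str]) -> str:
    """Require every concept to match by joining the OR-blocks with AND."""
    return " AND ".join(or_block(block) for block in blocks)


query = build_query(PCA_TERMS, MRI_TERMS, PIRADS_TERMS)
print(query)
```

Keeping each concept as its own list makes it easy to add a synonym to one block without disturbing the AND structure between blocks.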
Studies were included if they satisfied all the following requirements according to the PICOS criteria [12] : (1) included patients with suspected or diagnosed PCa; (2) for index test, mpMRI of the prostate including all required sequences of T2WI, DWI, and DCE-MRI was performed and assessed with a PI-RADSv2 scoring system; (3) for comparison, a reference standard based on the histopathological examination of radical prostatectomy or biopsy was used; (4) results were reported in sufficient detail for the reconstruction of 2 × 2 tables and determination of sensitivity and specificity at specified cutoff values for evaluating the diagnostic performance of PI-RADSv2; and (5) studies had to be original articles.
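Criterion (4) requires that each study's results can be rearranged into a 2 × 2 table of the index test result against the reference standard. A minimal Python sketch of that reconstruction follows, assuming a study reports sensitivity, specificity, and the numbers of reference-standard-positive and reference-standard-negative units (patients or lesions). The class and function names and the example counts are hypothetical, and rounding to the nearest integer is an assumed convention, not a rule stated by the authors.

```python
from dataclasses import dataclass


@dataclass
class TwoByTwo:
    tp: int  # PI-RADSv2 positive, reference standard positive
    fp: int  # PI-RADSv2 positive, reference standard negative
    fn: int  # PI-RADSv2 negative, reference standard positive
    tn: int  # PI-RADSv2 negative, reference standard negative

    @property
    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)


def reconstruct(sens: float, spec: float, n_pos: int, n_neg: int) -> TwoByTwo:
    """Back-calculate cell counts from reported sensitivity and specificity.

    n_pos and n_neg are the numbers of reference-standard-positive and
    -negative units. Python's round() sends exact halves to the even integer.
    """
    tp = round(sens * n_pos)
    tn = round(spec * n_neg)
    return TwoByTwo(tp=tp, fp=n_neg - tn, fn=n_pos - tp, tn=tn)


# Hypothetical study: 100 cancers, 200 non-cancers, sensitivity 0.89, specificity 0.73.
table = reconstruct(0.89, 0.73, n_pos=100, n_neg=200)
print(table)
print(round(table.sensitivity, 2), round(table.specificity, 2))
```

Recomputing sensitivity and specificity from the rebuilt cells is a quick consistency check: if they do not match the published values after rounding, the reported denominators are probably not the ones the study used.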
Studies were excluded if any of the following criteria were met: (1) studies involving <10 patients; (2) review articles, guidelines, consensus statements, letters, editorials, and conference abstracts; (3) studies using only PI-RADSv1 for the evaluation of prostate mpMRI; (4) studies focusing on topics other than using the PI-RADSv2 system for diagnosing PCa (eg, staging and prediction of biochemical recurrence); and (5) studies with overlapping patient populations.
Two reviewers (S.W. and C.H.S., each with 3 yr of experience in performing systematic reviews and meta-analyses) independently evaluated the eligibility of the studies selected from the literature. Any disagreements between the two reviewers were resolved by consensus via discussion with a third reviewer (S.Y.K.).
We extracted the following data regarding study design and results from the selected studies using a standardized form:
Study characteristics—authors, year of publication, country and institution of origin, duration of patient recruitment, and study design (prospective vs retrospective and consecutive or not)
Demographic and clinical characteristics—sample size, number of patients with PCa, patient age, prostate-specific antigen (PSA) level and Gleason score, number of previous biopsies, and PCa diagnosis prior to mpMRI
Technical characteristics of mpMRI—scanner model and manufacturer, magnetic field strength (1.5 vs 3 T), coil type (endorectal vs pelvic phased array), and specific sequences used (T2WI, DWI, DCE, or MR spectroscopy)
Interpretation of mpMRI—number of reviewers and experience in prostate mpMRI, independent or consensus reading, and blinding to clinicopathological information
Reference standard—type of reference standard (radical prostatectomy, targeted biopsy, or systematic biopsy), interval between MRI and pathology, outcomes assessed (any PCa vs csPCa), definition of csPCa (studies assessing “clinically significant”, “aggressive”, or “high-grade” PCa were all considered to assess csPCa; however, only studies that used the definition provided by the PI-RADSv2 guideline [Gleason score ≥7 [3 + 4], volume ≥0.5 ml, or extraprostatic extension] were considered to have low concern for applicability), separate analysis for the PZ and TZ, and type of analysis (per patient vs per lesion)
Diagnostic performance of PI-RADSv2 including criteria or cutoff values (in case of multiple readers, the results of the most experienced reader were extracted for this meta-analysis)
The methodological quality of the included studies was assessed using tailored questionnaires and criteria provided by the Quality Assessment of Diagnostic Accuracy Studies-2 (QUADAS-2) tool [13]. Data extraction and quality assessment were performed independently by two reviewers (S.W. and C.H.S.). All disagreements were resolved by consensus through discussion with the third reviewer (S.Y.K.).
The diagnostic performance of PI-RADSv2 for the detection of PCa was the primary outcome for this meta-analysis. In addition, a comparison between the diagnostic performance of PI-RADSv2 and that of PI-RADSv1 using studies that reported head-to-head comparison data of the two PI-RADS versions was considered a secondary outcome.
Pooled estimates of sensitivity and specificity were calculated using hierarchical logistic regression modeling including bivariate modeling and hierarchical summary receiver operating characteristic (HSROC) modeling [14] . For graphical presentation of the results, an HSROC curve with 95% confidence region and prediction region was plotted. Publication bias was evaluated using the Deeks’ funnel plot, and statistical significance was tested with the Deeks’ asymmetry test [15] .
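Deeks' test regresses the natural log of each study's diagnostic odds ratio (DOR) against 1/√ESS, where ESS = 4·n₁·n₀/(n₁ + n₀) is the effective sample size for n₁ diseased and n₀ non-diseased units, weighting each study by its ESS; a slope that differs from zero suggests funnel-plot asymmetry. A minimal standard-library Python sketch under those assumptions follows. It returns the slope, its t statistic, and the degrees of freedom; the two-sided p value would then come from a t distribution with k − 2 degrees of freedom (for example, 2 × `scipy.stats.t.sf(abs(t), df)` in SciPy). It is not the `midas` implementation the authors used, the function name is hypothetical, and the 0.5 continuity correction for zero cells is an assumed convention.

```python
import math


def deeks_test(studies):
    """Deeks' funnel-plot asymmetry regression.

    studies: list of (tp, fp, fn, tn) tuples, one per study.
    Returns (slope, t statistic, degrees of freedom) for the regression of
    ln(DOR) on 1/sqrt(ESS), weighted by ESS.
    """
    xs, ys, ws = [], [], []
    for tp, fp, fn, tn in studies:
        if 0 in (tp, fp, fn, tn):
            # Assumed convention: add 0.5 to every cell when any cell is zero.
            tp, fp, fn, tn = tp + 0.5, fp + 0.5, fn + 0.5, tn + 0.5
        n_dis, n_non = tp + fn, fp + tn
        ess = 4 * n_dis * n_non / (n_dis + n_non)  # effective sample size
        xs.append(1 / math.sqrt(ess))
        ys.append(math.log((tp * tn) / (fp * fn)))  # ln diagnostic odds ratio
        ws.append(ess)
    sw = sum(ws)
    x_bar = sum(w * x for w, x in zip(ws, xs)) / sw
    y_bar = sum(w * y for w, y in zip(ws, ys)) / sw
    sxx = sum(w * (x - x_bar) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - x_bar) * (y - y_bar) for w, x, y in zip(ws, xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    df = len(studies) - 2
    # Weighted residual variance gives the standard error of the slope.
    rss = sum(w * (y - intercept - slope * x) ** 2 for w, x, y in zip(ws, xs, ys))
    se_slope = math.sqrt(rss / df / sxx)
    return slope, slope / se_slope, df


# Hypothetical 2 x 2 tables (tp, fp, fn, tn); not data from the included studies.
studies = [(45, 10, 5, 40), (30, 12, 4, 20), (80, 30, 10, 90),
           (20, 5, 3, 15), (60, 25, 6, 70), (25, 0, 2, 18)]
slope, t_stat, df = deeks_test(studies)
print(f"slope={slope:.3f}, t={t_stat:.3f}, df={df}")
```

The t statistic is unchanged if all weights are multiplied by a constant, so it does not matter whether the ESS weights are normalized before fitting.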
We performed meta-regression analyses to investigate the cause of heterogeneity. The following covariates were considered for the bivariate model: (1) proportion of patients with PCa (>50% vs ≤50%), (2) magnet strength of MRI (3 vs 1.5 T), (3) use of endorectal coil, (4) cutoff value (≥4 vs ≥3), (5) reference standard (radical prostatectomy vs biopsy), and (6) type of analysis (per patient vs per lesion). In addition, multiple subgroup analyses were performed for cutoff value, outcome, and previous biopsy history to assess various clinical settings: (1) a cutoff value of ≥4 for all studies, (2) a cutoff value of ≥3 for all studies, (3) a cutoff value of ≥4 for determining any PCa, (4) a cutoff value of ≥3 for determining any PCa, (5) a cutoff value of ≥4 for determining csPCa, (6) a cutoff value of ≥3 for determining csPCa, (7) a cutoff value of ≥4 in studies using per-patient analysis, (8) a cutoff value of ≥4 in studies using per-lesion analysis, (9) studies analyzing PZ PCa, (10) studies analyzing TZ PCa, (11) patients without previous biopsies, and (12) patients with previous biopsies. The “metandi” and “midas” modules in Stata 10.0 (StataCorp LP, College Station, TX, USA) and “mada” package in R software version 3.2.1 (R Foundation for Statistical Computing, Vienna, Austria) were used for statistical analyses, with p < 0.05 signifying statistical significance.
A systematic literature search initially identified 287 articles. After removing 46 duplicates, screening of the 241 titles and abstracts yielded 105 potentially eligible articles. Full-text reviews were performed, and 84 studies were excluded for the following reasons: not in the field of interest (n = 80, including 68 studies that used only PI-RADSv1), insufficient data to reconstruct 2 × 2 tables (n = 2), and shared study population with other studies (n = 2). Ultimately, 21 original articles including a total of 3857 patients assessing the diagnostic performance of PI-RADSv2 were included in the meta-analysis [16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36]. No additional studies were identified via screening the bibliographies of these 21 studies. Among them, 15 studies including 3099 patients dealt with PI-RADSv2 alone, whereas six studies including 758 patients provided a head-to-head comparison between PI-RADSv1 and PI-RADSv2 [16 20 21 28 32 33]. The detailed study selection process is described in Fig. 1.
Patient characteristics are shown in Table 1. Study populations ranged from 49 to 456 patients, and the percentage of patients with PCa ranged from 37% to 100%. The median patient age was 62–69.6 yr, the median PSA level was 3.97–15 ng/ml, and Gleason scores ranged from 5 to 10. In seven studies, all or some patients had already been diagnosed with PCa before MRI [16 17 22 26 27 32 35]. Biopsy had been performed before MRI in seven studies [16 21 22 23 26 27 35], all patients were biopsy-naïve in three studies [18 25 34], both patient types were included in three studies [28 32 33], and previous biopsy status was not reported in the remaining eight studies.
First author (year) | Country | Institution | Duration of patient recruitment | Patients (n) | Patients with PCa (n) | Age (yr), median | Age (yr), range | PSA (ng/ml), median | PSA (ng/ml), range | Gleason score, median | Gleason score, range | No. of previous biopsies | PCa diagnosis before MRI |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Auer (2016) [16] | Austria | Medical University of Innsbruck, Medizinische Hochschule Hannover | NR | 50 | NR | 63 a | NR | 7.3 a | NR | 7 (3 + 4) | 6–10 | ≥1 | Yes (all) |
Baldisserotto (2016) [17] | Brazil | Pontifícia Universidade Católica do Rio Grande do Sul | 2013.6–2015.6 | 54 | 33 | 65.9 a | 53–81 | 8.4 a | 3–31 | 7 (3 + 4) | 6–9 | NR | Yes (some) |
De Visschere (2017) [18] | Belgium | Ghent University Hospital | 2011.5–2014.12 | 245 | 144 | 66 | 44–85 | 9 | 1.4–935.5 | 7 (3 + 4) | ≥6 | 0 | No |
El-Samei (2016) [19] | Egypt | Al-Azher University, El-Minia University | 2014.5–2015.10 | 55 | 38 | 62 a | 51–79 | NR | NR | 7 | 6–10 | NR | No |
Feng (2016) [20] | China | Tongji Hospital | 2013.6–2015.7 | 401 | 150 | 64.4 a | 34–88 | 10.7 | 0.2–1763 | 7 (4 + 3) | ≤6–≥8 | NR | NR |
Kasel-Seibert (2016) [21] | Germany | University Hospital Jena | 2013.7–2015.3 | 82 | 31 | 65 | 48–81 | 13 | 1–111 | 7 (3 + 4) | 6–9 | 1–5 | No |
Lin (2016) [22] | Brazil, Taiwan | Ribeirão Preto School of Medicine, China Medical University, Ribeirão Preto School of Medicine | 2011.5–2014.6 | 49 | 49 | 63 | 46–73 | 13.27 | 1.8–41.4 | | | ≥1 | Yes (all) |
Martorana (2016) [23] | Italy | University of Modena and Reggio Emilia, Perugia University | 2014.1–2016.2 | 157 | 79 | 65 | 47–79 b | 10.7 | 1–75 | 6 | 6–8 | 1–3 | No |
Mertan (2016) [24] | USA | National Cancer Institute | 2015.3–2015.9 | 62 | 38 | 65.5 | 50.3–76.6 | 7.1 | 0.5–863 | 7 (3 + 4) | 6–9 | NR | NR |
Muller (2015) [25] | USA, the Netherlands | National Cancer Institute, AMC University Hospital, Edward Hébert School of Medicine | 2011.12–2014.5 | 94 | 94 | 62 | 37–79 | 8.51 | 0.7–51.1 | 7 (3 + 4) | 6–10 | 0 | No |
Park (2016) [26] | South Korea | Samsung Medical Center | 2012.1–2014.12 | 456 | 456 | 65 | 46–81 | 3.97 | 3.8–4.2 | NR | ≥6 | ≥1 | Yes (all) |
Park (2016) [27] | South Korea | Yonsei University College of Medicine | 2012.1–2013.3 | 425 | 425 | NR | NR | NR | 1.4–156.9 | 5–6 | 5–10 | ≥1 | Yes (all) |
Polanec (2016) [28] | Austria | Medical University of Vienna, Confraternität Vienna | 2011.6–2015.9 | 65 | NR | 65.3 a | 62.3–87.4 | 10.8 a | 4.2–74.5 | 7 (4 + 3) | 6–9 | ≥0 | No |
Rastinehad (2015) [29] | USA | Icahn School of Medicine at Mount Sinai, Fox Chase Cancer Center, National Institutes of Health, Hofstra North Shore LIJ School of Medicine | 2012.2–2014.11 | 312 | 202 | 65.1 | 60.3–70.3 b | 7.3 | 5.0–11.4 b | NR | NR | NR | No |
Rosenkrantz (2016) [30] | USA | NYU Langone Medical Center | 2013.9–2015.2 | 343 | 134 | 64 | NR | 5.8 | NR | 7 (3 + 4) | 6–9 | NR | NR |
Stanzione (2016) [31] | Italy | University “Federico II”, Ospedale S. Maria delle Grazie | NR | 82 | 34 | 65 a | 43–84 | 8.8 a | NR | 7 (3 + 4) | 6–9 | NR | No |
Tan (2016) [32] | USA, Taiwan | UCLA, Cathay General Hospital, Hoag Hospital | 2013.3–2016.12 | 106 | 63 | 66.5 | 43–79 | 7.9 | 5.6–10.6 | 7 (3 + 4) | 6–10 | ≥0 | Yes (some) |
Tewes (2016) [33] | Germany | Hannover Medical School, Klinikum der Region Hannover | 2012.12–2014.12 | 54 | 31 | 69.6 a | NR | 8.7 a | NR | 6 | 6–9 | ≥0 | No |
Washino (2017) [34] | Japan | Jichi Medical University Saitama Medical Center | 2010.6–2014.4 | 288 | 159 | 69 | 64–74 b | 7.5 | 5.5–11 b | NR | NR | 0 | No |
Woo (2016) [35] | South Korea | Seoul National University College of Medicine | 2011.1–2013.12 | 105 | 105 | 69 | 49–79 | 8.22 | 0.9–44.2 | 7 | 6–7 | ≥1 | Yes (all) |
Zhao (2016) [36] | China | Peking University First Hospital | 2010.11–2013.12 | 372 | 185 | 68.5 a | NR | 15 a | NR | 7 | 6–10 | NR | No |
Characteristics of the studies are summarized in Table 2. MRI was performed using 3-T scanners in 16 studies [16 17 18 20 24 25 26 27 28 29 30 31 32 33 35 36], 1.5-T scanners in four studies [19 21 22 23], and either 3 or 1.5 T in one study [34]. Endorectal coils were used in seven studies [19 21 22 23 24 25 29]. In all studies, the mpMRI protocol consisted of T2WI, DWI, and DCE-MRI. The reference standard was radical prostatectomy in five studies [16 22 26 27 35], a combination of systematic and targeted biopsies in seven studies [17 20 23 25 29 34 36], and targeted biopsy alone in seven studies [19 21 24 28 30 32 33]; in two studies, the reference standard was not consistent throughout the study population [18 31]. PI-RADSv2 scoring was performed by one to five radiologists, either in consensus or independently. The radiologists' experience was heterogeneous, ranging from 4 to 22 yr in prostate MRI. In most studies, the readers were blinded; however, in three studies, the radiologists were aware that the patients had biopsy-proven PCa [22 26 35], and five studies did not state whether the readers were blinded [18 19 30 32 36]. In the majority of the studies, the interval between MRI and the reference standard was less than 6 mo; however, the interval was not reported in 10 studies [16 17 18 24 27 28 29 30 32 33]. PCa was assessed separately according to zonal anatomy in seven studies [20 24 25 28 30 31 33]. However, in one of these [31], only PCa in the PZ could be evaluated, as detailed data were not provided in the article and our attempt to obtain further information from the authors was unsuccessful. Regarding the outcome assessed, eight studies evaluated any cancer [17 19 20 23 25 28 31 33], eight evaluated clinically significant cancer [18 22 26 27 30 34 35 36], and five evaluated both [16 21 24 29 32].
With regard to cutoff values, 13 studies used ≥4 [16 18 19 20 21 24 25 26 27 28 30 31 35], four studies used ≥3 [32 33 34 36], and four studies used both [17 22 23 29]. The location of PCa was reported separately for the PZ and TZ in six studies [20 24 25 28 30 33].
First author (year) | Design | Consecutive enrollment | Reference standard | MRI–reference standard interval | No. of readers | Experience (yr) | Blinding | Magnet strength (T) | Vendor | Model | Endorectal coil | PI-RADSv2 application | Cutoff values | Localization | Type of analysis | Outcome assessed | Definition of csPCa |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Auer (2016) [16] | Retrospective | Yes | RP | NR | 1 | >5 | Yes | 3 | Siemens | Skyra | No | Strict | 4 | Whole | Lesion | Any + csPCa | GS ≥7 (4 + 3) |
Baldisserotto (2016) [17] | Retrospective | Yes | STRUSGB + targeted MRI–TRUS biopsy (cognitive) or RP | NR | 2 Independent | >10/1 | Yes | 3 | GE | SignaHDxt | No | Strict | 3, 4 | Whole | Patient | Any | |
De Visschere (2017) [18] | Retrospective | NR | STRUSGB or RP | NR | NR | NR | NR | 3 | Siemens | Trio | No | Strict | 4 | Whole | Patient | csPCa | GS ≥7, ≥0.5 cc, or EPE |
El-Samei (2016) [19] | NR | NR | Targeted MRI–TRUS biopsy | ≤2 wk | NR | NR | NR | 1.5 | Philips | Gyroscan | Yes | Strict | 4 | Whole | Lesion | Any | |
Feng (2016) [20] | Retrospective | Yes | STRUSGB + targeted MRI–TRUS biopsy (cognitive) | 1–35 d | 2 Consensus | 5/4 | Yes | 3 | Siemens | Skyra | No | Strict | 4 | PZ, TZ | Patient | Any | |
Kasel-Seibert (2016) [21] | Retrospective | NR | Targeted MRGB | ≤7 d | 2 Independent | 10/<1 | Yes | 1.5 | Siemens | Avanto | Yes | Strict | 4 | Whole | Lesion | Any + csPCa | GS ≥7 |
Lin (2016) [22] | Retrospective | NR | RP | ≤6 mo | 2 Independent | 12/5 | Yes a | 1.5 | Philips | Achieva | Yes | Strict | 3, 4 | Whole | Lesion | csPCa | GS ≥7 and >0.5 cc |
Martorana (2016) [23] | Retrospective | Yes | TTB + targeted MRI–TRUS biopsy | 30–78 | 2 Consensus | 4/4 | Yes | 1.5 | Philips | Achieva | Yes | Strict | 3, 4 | Whole | Lesion | Any | GS ≥7, >0.5 cc, or EPE |
Mertan (2016) [24] | Prospective | Yes | STRUSGB + targeted MRI–TRUS biopsy | NR | 1 | >8 | Yes | 3 | Philips | Achieva | Yes | Strict | 4 | PZ, TZ | Lesion | Any + csPCa | GS ≥7 |
Muller (2015) [25] | Retrospective | Yes | STRUSGB + targeted MRI–TRUS biopsy | ≤6 wk | 5 Independent | 12/7/1/1/0.5 | Yes | 3 | Philips | Achieva | Yes | Strict | 4 | PZ, TZ | Lesion | Any | GS ≥7 (4 + 3) |
Park (2016) [26] | Retrospective | Yes | RP | 21–48 | 2 Independent | 14/3 | Yes a | 3 | Philips | Achieva | No | Strict | 4 | Whole | Patient | csPCa | GS ≥7, ≥0.5 cc, or EPE |
Park (2016) [27] | Retrospective | NR | RP | NR | 2 Independent | 9/4 | Yes | 3 | GE, Philips, Siemens | DiscoveryMR750, Achieva, Trio | No | Strict | 4 | Whole | Patient | csPCa | GS ≥7, ≥0.5 cc, or EPE |
Polanec (2016) [28] | Retrospective | Yes | Targeted MRGB | NR | 2 Independent | NR/NR | Yes | 3 | Siemens | Trio | No | Strict | 4 | PZ, TZ | Lesion | Any | |
Rastinehad (2015) [29] | Prospective | NR | STRUSGB + targeted MRI–TRUS biopsy | NR | 3 Consensus | NR | Yes | 3 | Siemens | Verio | Yes | Calculated b | 3, 4 | Whole | Patient | Any + csPCa | Epstein's criteria c for systematic and GS ≥7 or >0.5 cc for targeted biopsy |
Rosenkrantz (2016) [30] | Retrospective | NR | Targeted MRI–TRUS biopsy | NR | 2 Independent | 8/3 | NR | 3 | Siemens | Biograph mMR/Prisma/Skyra/Trio | No | Strict | 4 | PZ, TZ | Lesion | csPCa | GS ≥7 |
Stanzione (2016) [31] | Prospective | NR | STRUSGB (±targeted TRUSGB) | 20–30 d | 2 Independent | 14/10 | Yes | 3 | Siemens | Trio | No | Strict | 4 | PZ | Patient | Any | |
Tan (2016) [32] | Retrospective | Yes | Targeted MRGB | NR | 3 Consensus | 16/10 | NR | 3 | Siemens | Skyra, Trio, Verio | No | Strict | 3 | Whole | Lesion | Any + csPCa | GS ≥7 |
Tewes (2016) [33] | Retrospective | Yes | Targeted MRGB | NR | 2 Independent | 5/2 | Yes | 3 | Siemens | Skyra | No | Strict | 3 d | PZ, TZ | Patient | Any | |
Washino (2017) [34] | Retrospective | NR | TTB + targeted MRI–TRUS biopsy (cognitive) | 0.5–1.6 mo | 1 | 14 | Yes | 1.5 or 3 | Toshiba | Excelart Vantage/Vantage Titan 3T | No | Strict | 3 | Whole | Patient | csPCa | GS ≥7 or maximum core length ≥4 mm |
Woo (2016) [35] | Retrospective | NR | RP | ≤6 mo | 2 Consensus | 22/10 | Yes a | 3 | Philips, Siemens | Ingenia, Verio/Trio | No | Strict | 4 | Whole | Patient | csPCa | GS ≥7 |
Zhao (2016) [36] | Retrospective | NR | STRUSGB + targeted MRI–TRUS biopsy | ≤3 mo | 2 Independent | NR | NR | 3 | NR | NR | No | Strict | 3 | Whole | Patient | csPCa | GS ≥7 |
a Blinded but aware that patients had biopsy-proven PCa.
b PI-RADSv2 scores were generated from reports based on PI-RADSv1 and simplified qualitative system.
c Epstein's criteria = Gleason pattern ≥4, or Gleason 3 + 3 disease with core length ≥50% and/or >2 cores positive.
d For transition zone, cutoff value = 4.
csPCa = clinically significant prostate cancer; GS = Gleason score; EPE = extraprostatic extension; MRGB = magnetic resonance imaging-guided biopsy; MRI = magnetic resonance imaging; MRI–TRUS = fusion of magnetic resonance imaging and transrectal ultrasound images; NR = not reported; PCa = prostate cancer; PI-RADSv2 = Prostate Imaging Reporting and Data System version 2; PZ = peripheral zone; RP = radical prostatectomy; STRUSGB = systematic transrectal ultrasound-guided biopsy; TTB = transperineal template biopsy; TZ = transition zone.
Overall, the quality of the studies was not considered high, mainly because of the patient selection domain (Fig. 2). In this domain, there was generally a high risk of bias, as all but four of the studies were retrospective [16 17 18 20 21 22 23 25 26 27 28 30 32 33 34 35 36]. Seven studies were considered to have high concern for applicability, as all or some of the patients had a pathological diagnosis of PCa before MRI [16 17 22 26 27 32 35]. Regarding the index test domain, nine studies had a high risk of bias: in three, the readers were aware that patients had biopsy-proven PCa [22 26 35], and in the other six, the cutoff value for determining PCa was not specified before interpretation [16 20 29 31 33 34]. Only one study had concern for applicability, as PI-RADSv2 scores were indirectly generated from existing clinical radiological reports based on PI-RADSv1 or an in-house scoring system [29]. Regarding the reference standard domain, eight studies had a high risk of bias. Seven were based on either systematic biopsy alone or targeted biopsy alone [18 19 21 28 30 32 33]; in one study, targeted biopsy was performed, but on lesions that were suspicious on ultrasonography rather than on MRI [31].
Studies in which radical prostatectomy or systematic plus targeted biopsy (MRI guided, MRI–transrectal ultrasound fusion, or cognitive) was used as the reference standard were considered to have a low risk of bias. In 10 studies, the definition of clinically significant cancer did not follow that of the PI-RADSv2 guidelines, and these studies therefore showed high concern for applicability [16 21 22 24 29 30 32 34 35 36]. Regarding the flow and timing domain, two studies had a high risk of bias, as not all patients received the same reference standard [17 18].
The sensitivity and specificity of individual studies ranged from 73% to 100% and from 8% to 100%, respectively. The Q-test revealed substantial heterogeneity (p < 0.001). The Higgins I² statistic demonstrated substantial heterogeneity for sensitivity (I² = 85.55%) and considerable heterogeneity for specificity (I² = 95.30%). The coupled forest plots of sensitivity and specificity did not suggest a threshold effect (Fig. 3). The Spearman correlation coefficient between sensitivity and the false-positive rate was 0.45 (95% confidence interval [CI] 0.023–0.738), also indicating the lack of a threshold effect.
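The heterogeneity and threshold-effect checks in this paragraph rest on two short formulas: Higgins' I² = (Q − df)/Q × 100 (truncated at 0), computed from Cochran's Q, and Spearman's rank correlation between per-study sensitivity and false-positive rate (1 − specificity). A minimal standard-library Python sketch follows, assuming Q is computed on the logit scale with inverse-variance weights. The software the authors used may compute Q differently, the function names are hypothetical, and the example tables are invented, so the output does not reproduce the reported I² values.

```python
import math


def logit_with_var(events, non_events):
    """Logit of a proportion and its approximate variance (0.5 added if a cell is 0)."""
    if events == 0 or non_events == 0:
        events, non_events = events + 0.5, non_events + 0.5
    return math.log(events / non_events), 1 / events + 1 / non_events


def cochran_q_i2(estimates, variances):
    """Cochran's Q and Higgins' I^2 (in %) around the inverse-variance pooled mean."""
    w = [1 / v for v in variances]
    pooled = sum(wi * e for wi, e in zip(w, estimates)) / sum(w)
    q = sum(wi * (e - pooled) ** 2 for wi, e in zip(w, estimates))
    df = len(estimates) - 1
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return q, i2


def ranks(values):
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            r[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return r


def spearman(a, b):
    """Spearman's rho as the Pearson correlation of the ranks."""
    ra, rb = ranks(a), ranks(b)
    ma, mb = sum(ra) / len(ra), sum(rb) / len(rb)
    cov = sum((x - ma) * (y - mb) for x, y in zip(ra, rb))
    return cov / math.sqrt(sum((x - ma) ** 2 for x in ra) * sum((y - mb) ** 2 for y in rb))


# Hypothetical 2 x 2 tables (tp, fp, fn, tn).
studies = [(45, 10, 5, 40), (30, 12, 4, 20), (80, 30, 10, 90), (20, 5, 3, 15)]
logits = [logit_with_var(tp, fn) for tp, fp, fn, tn in studies]
q, i2 = cochran_q_i2([e for e, _ in logits], [v for _, v in logits])
sens = [tp / (tp + fn) for tp, fp, fn, tn in studies]
fpr = [fp / (fp + tn) for tp, fp, fn, tn in studies]
print(f"Q={q:.2f}, I2={i2:.1f}%, Spearman rho={spearman(sens, fpr):.2f}")
```

A strongly positive rho between sensitivity and false-positive rate is the usual signature of studies operating at different thresholds along the same ROC curve.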
For all 21 studies combined, the pooled sensitivity was 0.89 (95% CI 0.86–0.92) with specificity of 0.73 (95% CI 0.60–0.83; Fig. 3 ). In the HSROC curve, there was a large difference between the 95% confidence region and the 95% prediction region, thus indicating heterogeneity between the studies ( Fig. 4 ). The area under the HSROC curve was 0.91 (95% CI 0.88–0.93). According to the Deeks’ funnel plot, the likelihood of publication bias was low, with a p value of 0.75 for the slope coefficient ( Fig. 5 ).
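The pooled estimates above come from the bivariate/HSROC model fitted with the `metandi` and `midas` modules in Stata and the `mada` package in R, which model sensitivity and specificity jointly. To show how pooling on the logit scale works, a simpler univariate DerSimonian–Laird random-effects sketch in standard-library Python follows. This is a swapped-in technique: it pools one proportion at a time and ignores the correlation between sensitivity and specificity, so it will not reproduce the reported 0.89 (95% CI 0.86–0.92). The function names and example counts are hypothetical.

```python
import math


def dersimonian_laird(estimates, variances):
    """Random-effects pooled estimate, its standard error, and tau^2 (DerSimonian-Laird)."""
    w = [1 / v for v in variances]
    sw = sum(w)
    fixed = sum(wi * e for wi, e in zip(w, estimates)) / sw
    q = sum(wi * (e - fixed) ** 2 for wi, e in zip(w, estimates))
    df = len(estimates) - 1
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c)  # between-study variance
    w_star = [1 / (v + tau2) for v in variances]
    pooled = sum(wi * e for wi, e in zip(w_star, estimates)) / sum(w_star)
    return pooled, math.sqrt(1 / sum(w_star)), tau2


def inv_logit(x):
    return 1 / (1 + math.exp(-x))


def pooled_proportion(events_list, totals_list):
    """Pool proportions (eg, per-study sensitivities) on the logit scale.

    Returns (point estimate, lower 95% limit, upper 95% limit).
    """
    est, var = [], []
    for e, n in zip(events_list, totals_list):
        ne = n - e
        if e == 0 or ne == 0:
            e, ne = e + 0.5, ne + 0.5  # continuity correction
        est.append(math.log(e / ne))
        var.append(1 / e + 1 / ne)
    pooled, se, _ = dersimonian_laird(est, var)
    return tuple(inv_logit(pooled + k * 1.96 * se) for k in (0, -1, 1))


# Hypothetical true positives and reference-standard-positive totals per study.
tp = [45, 30, 80, 20, 60]
n_diseased = [50, 34, 90, 23, 66]
point, lower, upper = pooled_proportion(tp, n_diseased)
print(f"pooled sensitivity {point:.2f} (95% CI {lower:.2f}-{upper:.2f})")
```

Back-transforming the logit-scale interval keeps the limits inside 0–1, which is one reason diagnostic meta-analyses pool on the logit scale rather than on raw proportions.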
For the six studies that provided a head-to-head comparison between PI-RADSv1 and PI-RADSv2, PI-RADSv2 demonstrated higher pooled sensitivity of 0.95 (95% CI 0.85–0.98) compared with 0.88 (95% CI 0.80–0.93) for PI-RADSv1 ( p = 0.04). However, the pooled specificity did not show a significant difference between the two versions of PI-RADS: 0.73 (95% CI 0.47–0.89) for v2 and 0.75 (95% CI 0.36–0.94) for v1 ( p = 0.90).
Because considerable heterogeneity was found among the included studies, meta-regression analyses were performed (Supplementary Table 1). Among several potential variables, the proportion of patients with PCa, magnetic field strength, and reference standard were significant factors affecting heterogeneity (p < 0.01 for all). However, among these three, only specificity according to the proportion of patients with PCa showed a difference that was both statistically significant and clinically meaningful: 0.65 (95% CI 0.52–0.78) in studies with ≤50% of patients with PCa versus 0.86 (95% CI 0.75–0.97) in studies with >50% of patients with PCa. No other clinically meaningful differences were seen: for magnet strength (3 vs 1.5 T), sensitivity of 0.90 (95% CI 0.86–0.94) versus 0.89 (95% CI 0.81–0.97, p = 0.03) and specificity of 0.73 (95% CI 0.59–0.86) versus 0.72 (95% CI 0.44–1.00, p = 0.81); for reference standard (radical prostatectomy vs biopsy), sensitivity of 0.89 (95% CI 0.83–0.95) versus 0.91 (95% CI 0.88–0.94, p < 0.01) and specificity of 0.65 (95% CI 0.37–0.94) versus 0.73 (95% CI 0.58–0.87, p = 0.48). Other variables, including the cutoff value, use of an endorectal coil, and type of analysis (per patient vs per lesion), were not significant factors (p = 0.32–0.70).
Because four studies used both ≥3 and ≥4 as cutoff values [17 22 23 29] and five studies assessed both any cancer and clinically significant cancer as outcomes [16 21 24 29 32], multiple subgroup analyses were performed in order to assess various clinical settings (Supplementary Table 2). Regarding cutoff values, the pooled sensitivity was 0.89 (95% CI 0.84–0.92) with specificity of 0.74 (95% CI 0.58–0.85) for the 17 studies using ≥4 [16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 35], whereas these were 0.95 (95% CI 0.89–0.97) and 0.47 (95% CI 0.21–0.74) for the eight studies using ≥3 [17 22 23 29 32 33 34 36]. When we stratified studies according to the outcome assessed, the results were as follows: (1) cutoff of ≥4 for determining any PCa, sensitivity of 0.89 (95% CI 0.83–0.93) with specificity of 0.80 (95% CI 0.62–0.90); (2) cutoff of ≥3 for determining any PCa, sensitivity of 0.96 (95% CI 0.93–0.98) with specificity of 0.49 (95% CI 0.29–0.70); (3) for determining csPCa regardless of cutoff value, sensitivity of 0.89 (95% CI 0.84–0.92) with specificity of 0.64 (95% CI 0.46–0.78); (4) cutoff of ≥4 for determining csPCa, sensitivity of 0.90 (95% CI 0.85–0.94) with specificity of 0.62 (95% CI 0.45–0.77); and (5) cutoff of ≥3 for determining csPCa, sensitivity of 0.96 (95% CI 0.87–0.99) with specificity of 0.29 (95% CI 0.05–0.77). When the studies using a cutoff value of ≥4 were assessed separately according to the type of analysis, per-patient analysis in eight studies [17 18 20 26 27 29 31 35] yielded pooled sensitivity of 0.89 (95% CI 0.81–0.93) with specificity of 0.76 (95% CI 0.60–0.88), whereas per-lesion analysis in nine studies [16 19 21 22 23 24 25 28 30] yielded pooled sensitivity of 0.87 (95% CI 0.83–0.91) with specificity of 0.70 (95% CI 0.44–0.88).
Based on the localization of PCa, the pooled sensitivity was 0.93 (95% CI 0.87–0.96) with specificity of 0.68 (95% CI 0.43–0.86) in the seven studies analyzing PZ cancers [20 24 25 28 30 31 33]. For TZ cancers, analyzed in the same studies except that by Stanzione et al [31], the pooled sensitivity and specificity were 0.88 (95% CI 0.77–0.94) and 0.75 (95% CI 0.59–0.86), respectively.
Studies including patients without previous biopsies yielded sensitivity of 0.82 (95% CI 0.72–0.90) and specificity of 0.75 (95% CI 0.65–0.83), whereas the diagnostic performance values were 0.87 (95% CI 0.80–0.92) and 0.71 (95% CI 0.42–0.89) in studies including patients with a history of previous biopsy.
In our meta-analysis, we assessed the diagnostic accuracy of PI-RADSv2 for detecting PCa. The pooled sensitivity and specificity of all 21 studies were 0.89 (95% CI 0.86–0.92) and 0.73 (95% CI 0.60–0.83), respectively. Compared with the only two existing meta-analyses of mpMRI for detecting PCa, our data suggest a trend toward higher sensitivity and lower specificity. In the study by de Rooij et al [37], which evaluated seven studies using a combination of T2WI, DWI, and DCE-MRI, the pooled sensitivity and specificity were 0.74 (95% CI 0.66–0.81) and 0.88 (95% CI 0.82–0.92), respectively. In a more recent meta-analysis by Hamoen et al [6], which analyzed 14 studies using PI-RADSv1, the pooled sensitivity and specificity were 0.78 (95% CI 0.70–0.84) and 0.79 (95% CI 0.68–0.86), respectively. However, comparisons across these three meta-analyses are only indirect. To address this issue, we separately assessed the subgroup of studies that used both PI-RADSv1 and PI-RADSv2. In this head-to-head comparison, PI-RADSv2 demonstrated higher pooled sensitivity (0.95) than PI-RADSv1 (0.88, p = 0.04), without a statistically significant difference in specificity (0.73 vs 0.75, p = 0.90). This gain in sensitivity over its predecessor may imply that the revisions undertaken in developing PI-RADSv2 (the introduction of dominant sequences according to zonal anatomy, the limited contribution of DCE-MRI secondary to DWI and T2WI, and specific guidelines for deriving an integrated overall score) were, in fact, on the right track. In particular, we speculate that the use of dominant sequences, that is, DWI for the PZ and T2WI for the TZ, may have been crucial for the improved sensitivity without a loss in specificity, as suggested by Baur et al [10].
Considering that one of the main intentions behind PI-RADS was to standardize the reporting of mpMRI, thereby decreasing variability and encouraging widespread acceptance and implementation in daily practice, it is promising that nearly all (20 of 21) studies applied PI-RADSv2 strictly according to the published guidelines [11]. Only one study derived PI-RADSv2 scores from existing clinical radiology reports that were based on PI-RADSv1 or an in-house scoring system [29]. This is an improvement over prior studies using PI-RADSv1, in which investigators determined the overall score in varying ways (an overall five-point score or the sum of the scores from each sequence) [6]. Still, the cutoff value for detecting PCa needs further clarification. Among the studies included in our meta-analysis, cutoff values were predefined in only six, while the majority (15/21) were exploratory, testing multiple criteria. A cutoff value of ≥4 yielded generally good sensitivity (0.89) and specificity (0.74), whereas ≥3 yielded excellent sensitivity (0.95) but poor specificity (0.47). These results may be taken into consideration when developing the next version of PI-RADS. For instance, a cutoff of ≥4 may be adequate for general use, whereas ≥3 could be reserved for settings in which a higher cancer detection rate is clinically required (eg, a persistently high PSA level despite a previously negative biopsy).
In the current study, subgroup analyses were performed to account for differences in outcomes (any cancer vs clinically significant cancer). Diagnostic performance did not differ significantly between the two outcomes, whether the cutoff was ≥3 or ≥4. However, the definition of clinically significant cancer varied among the 13 studies that assessed it. Only three studies defined csPCa strictly according to the PI-RADSv2 guidelines (Gleason score ≥7 [3 + 4], volume ≥0.5 ml, or extraprostatic extension) [11]; most others used one or two of the three criteria. Restricting the analysis to those three studies might have provided more robust results; however, including all available studies was pragmatic and gives a general overview of the existing literature, as this is the first meta-analysis of studies using PI-RADSv2.
In this meta-analysis, we also examined technical aspects of MRI. Meta-regression analyses revealed that the use of an endorectal coil was not a statistically significant factor. Furthermore, although magnet strength (3 vs 1.5 T) showed a statistically significant difference, it was not clinically meaningful (sensitivity of 0.90 vs 0.89, respectively; p = 0.03). Although both issues have been debated in the past, 3 and 1.5 T are now well established, and the overall benefit of an endorectal coil is not evident [38 39]. The PI-RADSv2 guidelines currently accept either option, and the results of our study provide additional evidence to support this.
Regarding the methods of analysis, there was significant heterogeneity in the reference standard and type of analysis. Radical prostatectomy was the reference standard in five studies, while the remaining studies relied mainly on biopsy (combined systematic and targeted biopsy, or targeted biopsy alone). In the latter group, the possibility of PCa despite negative biopsy results should be kept in mind. In addition, the studies were split roughly evenly between per-patient (n = 11) and per-lesion (n = 10) reporting of outcomes. Per-lesion analysis is known to take into account performance in localizing the disease; however, the type of analysis was not a significant factor in the meta-regression analysis.
Our meta-analysis had some limitations. Nearly all studies were retrospective, resulting in a high risk of bias for patient selection, and pooling data from predominantly retrospective studies may have inflated diagnostic sensitivity [40]. However, a meta-analysis restricted to the three prospective studies would have been technically unfeasible, and its results would not have been representative of the existing literature on PI-RADSv2. Nevertheless, we used validated methods for the systematic review and reported the data according to standard reporting guidelines, including PRISMA and the Handbook for Diagnostic Test Accuracy Reviews published by the Cochrane Collaboration [12 41]. Another limitation is the considerable heterogeneity in our pooled analysis, which limits the general applicability of our summary estimates. To explore this heterogeneity, we performed meta-regression and multiple subgroup analyses, which identified the proportion of patients with PCa, the magnetic field strength, and the reference standard as significant factors. In particular, the reference standard varied, ranging from radical prostatectomy to combinations of systematic and targeted biopsies (ie, MRI guided, MRI–transrectal ultrasound fusion, or cognitive). The varying definitions of clinically significant cancer also need to be emphasized. Meta-regression and subgroup analyses may explain some of the heterogeneity, but a portion remains unexplained. Finally, only a small number of studies were available for the head-to-head comparison between PI-RADSv1 and PI-RADSv2. Nonetheless, the six studies published to date were sufficient to demonstrate a statistically significant difference in sensitivity between the two versions.
PI-RADSv2 shows good performance for the detection of PCa with pooled sensitivity of 0.89 and specificity of 0.73. PI-RADSv2 has higher pooled sensitivity compared with PI-RADSv1 without significantly different specificity.
Author contributions: Sang Youn Kim had full access to all the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis.
Study concept and design: Woo, Suh, S.Y. Kim.
Acquisition of data: Woo, Suh, S.Y. Kim.
Analysis and interpretation of data: Woo, Suh, S.Y. Kim.
Drafting of the manuscript: Woo.
Critical revision of the manuscript for important intellectual content: Suh, S.Y. Kim, Cho, S.H. Kim.
Statistical analysis: Suh.
Obtaining funding: None.
Administrative, technical, or material support: None.
Supervision: S.Y. Kim, Cho, S.H. Kim.
Other: None.
Financial disclosures: Sang Youn Kim certifies that all conflicts of interest, including specific financial interests and relationships and affiliations relevant to the subject matter or materials discussed in the manuscript (eg, employment/affiliation, grants or funding, consultancies, honoraria, stock ownership or options, expert testimony, royalties, or patents filed, received, or pending), are the following: None.
Funding/Support and role of the sponsor: None.
Prostate cancer (PCa) is the second leading cause of cancer-related mortality in Western men [1]. With recent technological advances and growing availability, multiparametric magnetic resonance imaging (mpMRI) is increasingly being used to guide several aspects of PCa management, including detection, staging, and treatment planning [2]. Despite abundant evidence in the literature reporting high accuracy of mpMRI for PCa diagnosis, widespread acceptance has been hampered by several factors, including difficulty of interpretation, lack of standardized interpretation criteria (ie, use of Likert scales based on the radiologist's subjective level of suspicion for PCa), and the resulting substantial inter-reader variability [3 4].
To bring standardization to the evaluation and reporting of mpMRI of the prostate, the European Society of Urogenital Radiology (ESUR) published a guideline termed Prostate Imaging Reporting and Data System (PI-RADS) in 2012 [5] . PI-RADS was generated based on expert consensus and provides a detailed scoring system for each MRI sequence (T2-weighted imaging [T2WI], diffusion-weighted imaging [DWI], dynamic contrast-enhanced MRI [DCE-MRI], and MR spectroscopy) for the presence of clinically significant PCa (csPCa). Several investigators have validated the accuracy and reproducibility of PI-RADS, and a recent meta-analysis reported pooled sensitivity and specificity of 0.78 and 0.79, respectively [6] . However, as there was no guideline for the generation of an overall score, different research groups utilized various measures for this purpose—some used a sum of the scores from each sequence (ranging from 3 to 15), whereas others used an overall score of 1–5 [7 8] . Furthermore, emerging data questioned the value of curve-type analysis of DCE-MRI [9] . In addition, investigators suggested that some sequences may be more important in determining the likelihood of PCa (ie, DWI in the peripheral zone [PZ] and T2WI in the transition zone [TZ]) rather than equal weighting for all sequences [10] .
To address these issues, the ESUR and the American College of Radiology recently released the updated PI-RADS version 2 (PI-RADSv2) [11]. The main changes from PI-RADSv1 to PI-RADSv2 are as follows: (1) introduction of dominant sequences according to zonal anatomy (DWI for the PZ and T2WI for the TZ), (2) limiting the contribution of DCE-MRI to the presence or absence of early focal enhancement, and (3) generation of an overall score (1–5) integrating the findings across all MRI sequences.
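The zone-dependent combination of sequence scores can be made concrete with a minimal Python sketch, based on our reading of the published PI-RADSv2 rules: in the PZ the DWI score is dominant and a positive DCE upgrades a DWI score of 3 to 4, whereas in the TZ the T2WI score is dominant and a DWI score of 5 upgrades a T2WI score of 3 to 4. The function name `pirads_v2_overall` is hypothetical, and the sketch deliberately omits the rules for inadequate or missing sequences.

```python
from typing import Literal

Zone = Literal["PZ", "TZ"]


def pirads_v2_overall(zone: Zone, t2w: int, dwi: int, dce_positive: bool) -> int:
    """Combine per-sequence scores (1-5) into a PI-RADSv2 overall category.

    Simplified sketch: assumes all sequences are diagnostic (no "X" scores).
    """
    for name, score in (("t2w", t2w), ("dwi", dwi)):
        if not 1 <= score <= 5:
            raise ValueError(f"{name} score must be 1-5, got {score}")
    if zone == "PZ":
        # DWI is the dominant sequence; positive DCE upgrades DWI 3 to 4.
        return 4 if dwi == 3 and dce_positive else dwi
    if zone == "TZ":
        # T2WI is the dominant sequence; DWI 5 upgrades T2WI 3 to 4.
        return 4 if t2w == 3 and dwi == 5 else t2w
    raise ValueError(f"unknown zone: {zone}")
```

In this sketch, DCE never changes a TZ category and only ever moves a PZ lesion from 3 to 4, which reflects the limited, secondary role given to DCE-MRI in the second change listed above.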
Since then, several studies assessing the value of PI-RADSv2 have been published. However, the diagnostic performance of this new scoring system has not been evaluated systematically. Therefore, the purpose of our study was to assess the diagnostic performance of PI-RADSv2 for the detection of PCa. In addition, we aimed to compare the diagnostic performance of PI-RADSv1 and PI-RADSv2 in studies available for head-to-head comparison.
This meta-analysis was performed and written according to the preferred reporting items for systematic reviews and meta-analyses (PRISMA) guidelines [12] .
A computerized search of MEDLINE and EMBASE up to December 7, 2016, was performed in order to identify studies evaluating the diagnostic performance of PI-RADSv2 for the detection of PCa. The search query combined synonyms for PCa, MRI, and PI-RADS as follows: (prostate cancer OR prostatic cancer OR prostate neoplasm OR prostatic neoplasm OR prostate tumor OR prostatic tumor OR prostate carcinoma OR prostatic carcinoma OR PCa) AND (magnetic resonance imaging OR MRI OR MR) AND (prostate imaging reporting and data system OR pi-rads OR pi rads OR pirads). Bibliographies of identified articles were also screened to expand the scope of search. Our search was limited to publications in English.
Studies were included if they satisfied all the following requirements according to the PICOS criteria [12] : (1) included patients with suspected or diagnosed PCa; (2) for index test, mpMRI of the prostate including all required sequences of T2WI, DWI, and DCE-MRI was performed and assessed with a PI-RADSv2 scoring system; (3) for comparison, a reference standard based on the histopathological examination of radical prostatectomy or biopsy was used; (4) results were reported in sufficient detail for the reconstruction of 2 × 2 tables and determination of sensitivity and specificity at specified cutoff values for evaluating the diagnostic performance of PI-RADSv2; and (5) studies had to be original articles.
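Criterion (4) requires that each study's results can be reduced to a 2 × 2 table at a given PI-RADSv2 cutoff. The minimal Python sketch below shows that reduction, assuming per-case overall scores and a binary reference-standard label are available (most included studies report only aggregate counts). It also illustrates how moving the cutoff between ≥4 and ≥3 trades specificity for sensitivity. `TwoByTwo` and `dichotomize` are hypothetical names, and the eight cases are made up.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class TwoByTwo:
    """Counts of a 2 x 2 diagnostic table."""
    tp: int
    fp: int
    fn: int
    tn: int

    @property
    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)


def dichotomize(scores: Sequence[int], has_cancer: Sequence[bool], cutoff: int) -> TwoByTwo:
    """Call a case MRI-positive when its PI-RADS score is >= cutoff."""
    if len(scores) != len(has_cancer):
        raise ValueError("scores and has_cancer must have the same length")
    tp = fp = fn = tn = 0
    for score, diseased in zip(scores, has_cancer):
        positive = score >= cutoff
        if positive and diseased:
            tp += 1
        elif positive:
            fp += 1
        elif diseased:
            fn += 1
        else:
            tn += 1
    return TwoByTwo(tp, fp, fn, tn)


# Toy, made-up cases (not taken from any included study)
scores = [5, 4, 3, 3, 2, 4, 1, 3]
cancer = [True, True, True, False, False, False, False, True]
for c in (4, 3):
    t = dichotomize(scores, cancer, c)
    print(f">= {c}: sensitivity {t.sensitivity:.2f}, specificity {t.specificity:.2f}")
# >= 4: sensitivity 0.50, specificity 0.75
# >= 3: sensitivity 1.00, specificity 0.50
```

On these toy cases, lowering the cutoff raises sensitivity at the cost of specificity, the same direction of trade-off as that reported for the pooled ≥4 and ≥3 subgroups.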
Studies were excluded if any of the following criteria were met: (1) studies involving <10 patients; (2) review articles, guidelines, consensus statements, letters, editorials, and conference abstracts; (3) studies using only PI-RADSv1 for the evaluation of mpMRI of the prostate; (4) studies focusing on topics other than the use of PI-RADSv2 for diagnosing PCa (eg, staging and prediction of biochemical recurrence); and (5) studies with overlapping patient populations.
Two reviewers (S.W. and C.H.S., with 3 yr of experience in performing systematic reviews and meta-analyses) independently evaluated the eligibility of the selected studies from the literature. Disagreements, if present between the two reviewers, were resolved by consensus via discussion with a third reviewer (S.Y.K.).
We extracted the following data regarding study design and results from the selected studies using a standardized form:
Study characteristics—authors, year of publication, country and institution of origin, duration of patient recruitment, and study design (prospective vs retrospective and consecutive or not)
Demographic and clinical characteristics—sample size, number of patients with PCa, patient age, prostate-specific antigen (PSA) level and Gleason score, number of previous biopsies, and PCa diagnosis prior to mpMRI
Technical characteristics of mpMRI—scanner model and manufacturer, magnetic field strength (1.5 vs 3 T), coil type (endorectal vs pelvic phased array), and specific sequences used (T2WI, DWI, DCE, or MR spectroscopy)
Interpretation of mpMRI—number of reviewers and experience in prostate mpMRI, independent or consensus reading, and blinding to clinicopathological information
Reference standard—type of reference standard (radical prostatectomy, targeted biopsy, or systematic biopsy), interval between MRI and pathology, outcomes assessed (any PCa vs csPCa), definition of csPCa (studies assessing “clinically significant”, “aggressive”, or “high-grade” PCa were all considered to assess csPCa; however, only studies that used the definition provided by the PI-RADSv2 guideline [Gleason score ≥7 (3 + 4), volume ≥0.5 ml, or extraprostatic extension] were considered to have low concern for applicability), separate analysis for the PZ and TZ, and type of analysis (per patient vs per lesion)
Diagnostic performance of PI-RADSv2 including criteria or cutoff values (in case of multiple readers, the results of the most experienced reader were extracted for this meta-analysis)
The methodological quality of the included studies was assessed using tailored questionnaires and criteria provided by Quality Assessment of Diagnostic Accuracy Studies-2 [13] . Data extraction and quality assessment were performed independently by two reviewers (S.W. and C.H.S.). All disagreements were resolved by consensus through discussion with the third reviewer (S.Y.K.).
The diagnostic performance of PI-RADSv2 for the detection of PCa was the primary outcome for this meta-analysis. In addition, a comparison between the diagnostic performance of PI-RADSv2 and that of PI-RADSv1 using studies that reported head-to-head comparison data of the two PI-RADS versions was considered a secondary outcome.
Pooled estimates of sensitivity and specificity were calculated using hierarchical logistic regression modeling including bivariate modeling and hierarchical summary receiver operating characteristic (HSROC) modeling [14] . For graphical presentation of the results, an HSROC curve with 95% confidence region and prediction region was plotted. Publication bias was evaluated using the Deeks’ funnel plot, and statistical significance was tested with the Deeks’ asymmetry test [15] .
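The bivariate and HSROC models jointly model logit sensitivity and logit specificity with a between-study correlation and are normally fitted with dedicated software (here, metandi, midas, and mada). As a much simpler stand-in, for illustration only, the Python sketch below pools a single proportion (eg, per-study sensitivity) with a univariate DerSimonian–Laird random-effects model on the logit scale. This is not the model used in this meta-analysis, and it ignores the correlation between sensitivity and specificity. The function names and the 0.5 continuity correction are our assumptions.

```python
import math
from typing import Sequence, Tuple


def logit(p: float) -> float:
    return math.log(p / (1 - p))


def inv_logit(x: float) -> float:
    return 1 / (1 + math.exp(-x))


def pool_proportion_dl(events: Sequence[int], totals: Sequence[int]) -> Tuple[float, float, float, float]:
    """Univariate DerSimonian-Laird random-effects pooling on the logit scale.

    events[i] / totals[i] is one study's proportion, eg TP / (TP + FN) for
    sensitivity. A 0.5 continuity correction is added to every study so that
    proportions of 0 or 1 stay finite. Returns (pooled proportion,
    95% CI lower, 95% CI upper, tau^2 on the logit scale).
    """
    y, v = [], []
    for x, n in zip(events, totals):
        x_c, n_c = x + 0.5, n + 1.0
        y.append(logit(x_c / n_c))
        v.append(1 / x_c + 1 / (n_c - x_c))  # approximate variance of the logit

    # Fixed-effect (inverse-variance) estimate and Cochran's Q
    w = [1 / vi for vi in v]
    sw = sum(w)
    y_fixed = sum(wi * yi for wi, yi in zip(w, y)) / sw
    q = sum(wi * (yi - y_fixed) ** 2 for wi, yi in zip(w, y))

    # DerSimonian-Laird moment estimate of the between-study variance
    df = len(y) - 1
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0

    # Random-effects estimate with weights 1 / (v_i + tau^2)
    w_re = [1 / (vi + tau2) for vi in v]
    y_re = sum(wi * yi for wi, yi in zip(w_re, y)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))
    return inv_logit(y_re), inv_logit(y_re - 1.96 * se), inv_logit(y_re + 1.96 * se), tau2
```

A hierarchical bivariate fit differs mainly in estimating both margins at once, so that a study's sensitivity and specificity inform each other through the correlation term; the univariate version is shown only because it fits in a few lines.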
We performed meta-regression analyses to investigate the cause of heterogeneity. The following covariates were considered for the bivariate model: (1) proportion of patients with PCa (>50% vs ≤50%), (2) magnet strength of MRI (3 vs 1.5 T), (3) use of endorectal coil, (4) cutoff value (≥4 vs ≥3), (5) reference standard (radical prostatectomy vs biopsy), and (6) type of analysis (per patient vs per lesion). In addition, multiple subgroup analyses were performed for cutoff value, outcome, and previous biopsy history to assess various clinical settings: (1) a cutoff value of ≥4 for all studies, (2) a cutoff value of ≥3 for all studies, (3) a cutoff value of ≥4 for determining any PCa, (4) a cutoff value of ≥3 for determining any PCa, (5) a cutoff value of ≥4 for determining csPCa, (6) a cutoff value of ≥3 for determining csPCa, (7) a cutoff value of ≥4 in studies using per-patient analysis, (8) a cutoff value of ≥4 in studies using per-lesion analysis, (9) studies analyzing PZ PCa, (10) studies analyzing TZ PCa, (11) patients without previous biopsies, and (12) patients with previous biopsies. The “metandi” and “midas” modules in Stata 10.0 (StataCorp LP, College Station, TX, USA) and “mada” package in R software version 3.2.1 (R Foundation for Statistical Computing, Vienna, Austria) were used for statistical analyses, with p < 0.05 signifying statistical significance.
A systematic literature search initially identified 287 articles. After removing 46 duplicates, screening of the 241 titles and abstracts yielded 105 potentially eligible articles. Full-text reviews were performed, and 84 studies were excluded for the following reasons: not in the field of interest (n = 80, including 68 studies that used only PI-RADSv1), insufficient data to reconstruct 2 × 2 tables (n = 2), and shared study population with other studies (n = 2). Ultimately, 21 original articles including a total of 3857 patients assessing the diagnostic performance of PI-RADSv2 were included in the meta-analysis [16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36]. No additional studies were identified by screening the bibliographies of these 21 studies. Among them, 15 studies including 3099 patients dealt with PI-RADSv2 alone, whereas six studies including 758 patients provided a head-to-head comparison between PI-RADSv1 and PI-RADSv2 [16 20 21 28 32 33]. The detailed study selection process is described in Fig. 1.
Patient characteristics are shown in Table 1 . The size of the study population ranged from 49 to 456 patients, with the percentage of those with PCa ranging from 37% to 100%. The patients had a median age of 62–69.6 yr, median PSA of 3.97–15 ng/ml, and a Gleason score ranging from 5 to 10. Patients had already been diagnosed with PCa prior to MRI in all or some of the study populations in seven studies [16 17 22 26 27 32 35] . Biopsy was performed before MRI in seven studies [16 21 22 23 26 27 35] , all patients were biopsy-naïve in three studies [18 25 34] , both patient types were included in three studies [28 32 33] , and data regarding previous biopsy were not reported in the remaining eight studies.
First author (year) | Country | Institution | Duration of patient recruitment | Patients (n) | Patients with PCa (n) | Age, median (yr) | Age, range (yr) | PSA, median (ng/ml) | PSA, range (ng/ml) | Gleason score, median | Gleason score, range | No. of previous biopsies | PCa diagnosis before MRI |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Auer (2016) [16] | Austria | Medical University of Innsbruck, Medizinische Hochschule Hannover | NR | 50 | NR | 63 a | NR | 7.3 a | NR | 7 (3 + 4) | 6–10 | ≥1 | Yes (all) |
Baldisserotto (2016) [17] | Brazil | Pontifícia Universidade Católica do Rio Grande do Sul | 2013.6–2015.6 | 54 | 33 | 65.9 a | 53–81 | 8.4 a | 3–31 | 7 (3 + 4) | 6–9 | NR | Yes (some) |
De Visschere (2017) [18] | Belgium | Ghent University Hospital | 2011.5–2014.12 | 245 | 144 | 66 | 44–85 | 9 | 1.4–935.5 | 7 (3 + 4) | ≥6 | 0 | No |
El-Samei (2016) [19] | Egypt | Al-Azher University, El-Minia University | 2014.5–2015.10 | 55 | 38 | 62 a | 51–79 | NR | NR | 7 | 6–10 | NR | No |
Feng (2016) [20] | China | Tongji Hospital | 2013.6–2015.7 | 401 | 150 | 64.4 a | 34–88 | 10.7 | 0.2–1763 | 7 (4 + 3) | ≤6–≥8 | NR | NR |
Kasel-Seibert (2016) [21] | Germany | University Hospital Jena | 2013.7–2015.3 | 82 | 31 | 65 | 48–81 | 13 | 1–111 | 7 (3 + 4) | 6–9 | 1–5 | No |
Lin (2016) [22] | Brazil, Taiwan | Ribeirão Preto School of Medicine, China Medical University, Ribeirão Preto School of Medicine | 2011.5–2014.6 | 49 | 49 | 63 | 46–73 | 13.27 | 1.8–41.4 | | | ≥1 | Yes (all) |
Martorana (2016) [23] | Italy | University of Modena and Reggio Emilia, Perugia University | 2014.1–2016.2 | 157 | 79 | 65 | 47–79 b | 10.7 | 1–75 | 6 | 6–8 | 1–3 | No |
Mertan (2016) [24] | USA | National Cancer Institute | 2015.3–2015.9 | 62 | 38 | 65.5 | 50.3–76.6 | 7.1 | 0.5–863 | 7 (3 + 4) | 6–9 | NR | NR |
Muller (2015) [25] | USA, the Netherlands | National Cancer Institute, AMC University Hospital, Edward Hébert School of Medicine | 2011.12–2014.5 | 94 | 94 | 62 | 37–79 | 8.51 | 0.7–51.1 | 7 (3 + 4) | 6–10 | 0 | No |
Park (2016) [26] | South Korea | Samsung Medical Center | 2012.1–2014.12 | 456 | 456 | 65 | 46–81 | 3.97 | 3.8–4.2 | NR | ≥6 | ≥1 | Yes (all) |
Park (2016) [27] | South Korea | Yonsei University College of Medicine | 2012.1–2013.3 | 425 | 425 | NR | NR | NR | 1.4–156.9 | 5–6 | 5–10 | ≥1 | Yes (all) |
Polanec (2016) [28] | Austria | Medical University of Vienna, Confraternität Vienna | 2011.6–2015.9 | 65 | NR | 65.3 a | 62.3–87.4 | 10.8 a | 4.2–74.5 | 7 (4 + 3) | 6–9 | ≥0 | No |
Rastinehad (2015) [29] | USA | Icahn School of Medicine at Mount Sinai, Fox Chase Cancer Center, National Institutes of Health, Hofstra North Shore LIJ School of Medicine | 2012.2–2014.11 | 312 | 202 | 65.1 | 60.3–70.3 b | 7.3 | 5.0–11.4 b | NR | NR | NR | No |
Rosenkrantz (2016) [30] | USA | NYU Langone Medical Center | 2013.9–2015.2 | 343 | 134 | 64 | NR | 5.8 | NR | 7 (3 + 4) | 6–9 | NR | NR |
Stanzione (2016) [31] | Italy | University “Federico II”, Ospedale S. Maria delle Grazie | NR | 82 | 34 | 65 a | 43–84 | 8.8 a | NR | 7 (3 + 4) | 6–9 | NR | No |
Tan (2016) [32] | USA, Taiwan | UCLA, Cathay General Hospital, Hoag Hospital | 2013.3–2016.12 | 106 | 63 | 66.5 | 43–79 | 7.9 | 5.6–10.6 | 7 (3 + 4) | 6–10 | ≥0 | Yes (some) |
Tewes (2016) [33] | Germany | Hannover Medical School, Klinikum der Region Hannover | 2012.12–2014.12 | 54 | 31 | 69.6 a | NR | 8.7 a | NR | 6 | 6–9 | ≥0 | No |
Washino (2017) [34] | Japan | Jichi Medical University Saitama Medical Center | 2010.6–2014.4 | 288 | 159 | 69 | 64–74 b | 7.5 | 5.5–11 b | NR | NR | 0 | No |
Woo (2016) [35] | South Korea | Seoul National University College of Medicine | 2011.1–2013.12 | 105 | 105 | 69 | 49–79 | 8.22 | 0.9–44.2 | 7 | 6–7 | ≥1 | Yes (all) |
Zhao (2016) [36] | China | Peking University First Hospital | 2010.11–2013.12 | 372 | 185 | 68.5 a | NR | 15 a | NR | 7 | 6–10 | NR | No |
Characteristics of the studies are summarized in Table 2. MRI was performed using 3-T scanners in 16 studies [16 17 18 20 24 25 26 27 28 29 30 31 32 33 35 36], 1.5-T scanners in four studies [19 21 22 23], and either 3 or 1.5 T in one study [34]. Endorectal coils were used in seven studies [19 21 22 23 24 25 29]. In all studies, the mpMRI protocol consisted of T2WI, DWI, and DCE-MRI. The reference standard was radical prostatectomy in five studies [16 22 26 27 35], a combination of systematic and targeted biopsies in seven studies [17 20 23 25 29 34 36], and targeted biopsy only in seven studies [19 21 24 28 30 32 33]; in two studies, the reference standard was not consistent throughout the study population [18 31]. PI-RADSv2 scoring was performed by one to five radiologists, either in consensus or independently. The radiologists' experience was heterogeneous, ranging from 4 to 22 yr in prostate imaging. In most studies, the readers were blinded; however, in three studies, the radiologists were aware that the patients had biopsy-proven PCa [22 26 35], and five studies were not explicit regarding blinding [18 19 30 32 36]. In the majority of the studies, the interval between MRI and the reference standard was less than 6 mo; however, the details were not reported in 10 studies [16 17 18 24 27 28 29 30 32 33]. PCa was separately assessed according to zonal anatomy in seven studies [20 24 25 28 30 31 33]. However, in one study [31], only PCa in the PZ could be evaluated, as no detailed data were provided in the article and the attempt to contact the authors for further information was unsuccessful. Regarding the outcome assessed, eight studies evaluated any cancer [17 19 20 23 25 28 31 33], eight evaluated clinically significant cancer [18 22 26 27 30 34 35 36], and five evaluated both [16 21 24 29 32].
With regard to cutoff values, 13 studies used ≥4 [16 18 19 20 21 24 25 26 27 28 30 31 35], four studies used ≥3 [32 33 34 36], and four studies used both [17 22 23 29]. The location of PCa was separately reported for the PZ and TZ in six studies [20 24 25 28 30 33].
First author (year) | Design | Consecutive enrollment | Reference standard | MRI–reference standard interval | No. of readers | Reader experience (yr) | Reader blinding | Magnet strength (T) | Vendor | Model | Endorectal coil | PI-RADSv2 application | Cutoff values | Localization | Type of analysis | Outcome assessed | Definition of csPCa |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Auer (2016) [16] | Retrospective | Yes | RP | NR | 1 | >5 | Yes | 3 | Siemens | Skyra | No | Strict | 4 | Whole | Lesion | Any + csPCa | GS ≥7 (4 + 3) |
Baldisserotto (2016) [17] | Retrospective | Yes | STRUSGB + targeted MRI–TRUS biopsy (cognitive) or RP | NR | 2 Independent | >10/1 | Yes | 3 | GE | Signa HDxt | No | Strict | 3, 4 | Whole | Patient | Any | |
De Visschere (2017) [18] | Retrospective | NR | STRUSGB or RP | NR | NR | NR | NR | 3 | Siemens | Trio | No | Strict | 4 | Whole | Patient | csPCa | GS ≥7, ≥0.5 cc, or EPE |
El-Samei (2016) [19] | NR | NR | Targeted MRI–TRUS biopsy | ≤2 wk | NR | NR | NR | 1.5 | Philips | Gyroscan | Yes | Strict | 4 | Whole | Lesion | Any | |
Feng (2016) [20] | Retrospective | Yes | STRUSGB + targeted MRI–TRUS biopsy (cognitive) | 1–35 d | 2 Consensus | 5/4 | Yes | 3 | Siemens | Skyra | No | Strict | 4 | PZ, TZ | Patient | Any | |
Kasel-Seibert (2016) [21] | Retrospective | NR | Targeted MRGB | ≤7 d | 2 Independent | 10/<1 | Yes | 1.5 | Siemens | Avanto | Yes | Strict | 4 | Whole | Lesion | Any + csPCa | GS ≥7 |
Lin (2016) [22] | Retrospective | NR | RP | ≤6 mo | 2 Independent | 12/5 | Yes a | 1.5 | Philips | Achieva | Yes | Strict | 3, 4 | Whole | Lesion | csPCa | GS ≥7 and >0.5 cc |
Martorana (2016) [23] | Retrospective | Yes | TTB + targeted MRI–TRUS biopsy | 30–78 | 2 Consensus | 4/4 | Yes | 1.5 | Philips | Achieva | Yes | Strict | 3, 4 | Whole | Lesion | Any | GS ≥7, >0.5 cc, or EPE |
Mertan (2016) [24] | Prospective | Yes | STRUSGB + targeted MRI–TRUS biopsy | NR | 1 | >8 | Yes | 3 | Philips | Achieva | Yes | Strict | 4 | PZ, TZ | Lesion | Any + csPCa | GS ≥7 |
Muller (2015) [25] | Retrospective | Yes | STRUSGB + targeted MRI–TRUS biopsy | ≤6 wk | 5 Independent | 12/7/1/1/0.5 | Yes | 3 | Philips | Achieva | Yes | Strict | 4 | PZ, TZ | Lesion | Any | GS ≥7 (4 + 3) |
Park (2016) [26] | Retrospective | Yes | RP | 21–48 | 2 Independent | 14/3 | Yes a | 3 | Philips | Achieva | No | Strict | 4 | Whole | Patient | csPCa | GS ≥7, ≥0.5 cc, or EPE |
Park (2016) [27] | Retrospective | NR | RP | NR | 2 Independent | 9/4 | Yes | 3 | GE, Philips, Siemens | Discovery MR750, Achieva, Trio | No | Strict | 4 | Whole | Patient | csPCa | GS ≥7, ≥0.5 cc, or EPE |
Polanec (2016) [28] | Retrospective | Yes | Targeted MRGB | NR | 2 Independent | NR/NR | Yes | 3 | Siemens | Trio | No | Strict | 4 | PZ, TZ | Lesion | Any | |
Rastinehad (2015) [29] | Prospective | NR | STRUSGB + targeted MRI–TRUS biopsy | NR | 3 Consensus | NR | Yes | 3 | Siemens | Verio | Yes | Calculated b | 3, 4 | Whole | Patient | Any + csPCa | Epstein's criteria c for systematic and GS ≥7 or >0.5 cc for targeted biopsy |
Rosenkrantz (2016) [30] | Retrospective | NR | Targeted MRI–TRUS biopsy | NR | 2 Independent | 8/3 | NR | 3 | Siemens | Biograph mMR/Prisma/Skyra/Trio | No | Strict | 4 | PZ, TZ | Lesion | csPCa | GS ≥7 |
Stanzione (2016) [31] | Prospective | NR | STRUSGB (±targeted TRUSGB) | 20–30 d | 2 Independent | 14/10 | Yes | 3 | Siemens | Trio | No | Strict | 4 | PZ | Patient | Any | |
Tan (2016) [32] | Retrospective | Yes | Targeted MRGB | NR | 3 Consensus | 16/10 | NR | 3 | Siemens | Skyra, Trio, Verio | No | Strict | 3 | Whole | Lesion | Any + csPCa | GS ≥7 |
Tewes (2016) [33] | Retrospective | Yes | Targeted MRGB | NR | 2 Independent | 5/2 | Yes | 3 | Siemens | Skyra | No | Strict | 3 d | PZ, TZ | Patient | Any | |
Washino (2017) [34] | Retrospective | NR | TTB + targeted MRI–TRUS biopsy (cognitive) | 0.5–1.6 mo | 1 | 14 | Yes | 1.5 or 3 | Toshiba | Excelart Vantage/Vantage Titan 3T | No | Strict | 3 | Whole | Patient | csPCa | GS ≥7 or maximum core length ≥4 mm |
Woo (2016) [35] | Retrospective | NR | RP | ≤6 mo | 2 Consensus | 22/10 | Yes a | 3 | Philips, Siemens | Ingenia, Verio/Trio | No | Strict | 4 | Whole | Patient | csPCa | GS ≥7 |
Zhao (2016) [36] | Retrospective | NR | STRUSGB + targeted MRI–TRUS biopsy | ≤3 mo | 2 Independent | NR | NR | 3 | NR | NR | No | Strict | 3 | Whole | Patient | csPCa | GS ≥7 |
a Blinded but aware that patients had biopsy-proven PCa.
b PI-RADSv2 scores were generated from reports based on PI-RADSv1 and simplified qualitative system.
c Epstein's criteria = Gleason pattern ≥4, or Gleason 3 + 3 disease with core length ≥50% and/or >2 cores positive.
d For transition zone, cutoff value = 4.
csPCa = clinically significant prostate cancer; GS = Gleason score; EPE = extraprostatic extension; MRGB = magnetic resonance imaging-guided biopsy; MRI = magnetic resonance imaging; MRI–TRUS = fusion of magnetic resonance imaging and transrectal ultrasound images; NR = not reported; PCa = prostate cancer; PI-RADSv2 = Prostate Imaging Reporting and Data System version 2; PZ = peripheral zone; RP = radical prostatectomy; STRUSGB = systematic transrectal ultrasound-guided biopsy; TTB = transperineal template biopsy; TZ = transition zone.
Overall, the quality of the included studies was not high, mainly because of the patient selection domain (Fig. 2). In this domain, the risk of bias was generally high, as all but four of the studies were retrospective [16 17 18 20 21 22 23 25 26 27 28 30 32 33 34 35 36]. Seven studies were considered to have high concern for applicability, as all or some of the patients had a pathological diagnosis of PCa prior to MRI [16 17 22 26 27 32 35]. Regarding the index test domain, nine studies had a high risk of bias: in three, the reviewers were aware that the patients had biopsy-proven PCa [22 26 35], and in the other six, the cutoff value for determining PCa was not specified prior to interpretation [16 20 29 31 33 34]. Only one study had concern for applicability, as PI-RADSv2 scores were indirectly generated from existing clinical radiology reports based on PI-RADSv1 or an in-house scoring system [29]. Regarding the reference standard domain, eight studies had a high risk of bias. Seven were based on either systematic biopsy alone or targeted biopsy alone [18 19 21 28 30 32 33]; in one study, targeted biopsy was performed on lesions that were suspicious on ultrasonography rather than on MRI [31].
Studies in which radical prostatectomy or systematic plus targeted biopsy (MRI guided, MRI–transrectal ultrasound fusion, or cognitive) was used as the reference standard were considered to have a low risk of bias. In 10 studies, the definition of clinically significant cancer did not follow that described in the PI-RADSv2 guidelines, and these studies therefore had high concern for applicability [16 21 22 24 29 30 32 34 35 36]. Regarding the flow and timing domain, two studies had a high risk of bias, as patients did not receive the same reference standard [17 18].
The sensitivity and specificity of individual studies ranged from 73% to 100% and from 8% to 100%, respectively. Cochran's Q test revealed substantial heterogeneity (p < 0.001). The Higgins I² statistic indicated substantial heterogeneity in sensitivity (I² = 85.55%) and considerable heterogeneity in specificity (I² = 95.30%). The coupled forest plot of sensitivity and specificity demonstrated the absence of a threshold effect (Fig. 3). The Spearman correlation coefficient between sensitivity and the false-positive rate was 0.45 (95% confidence interval [CI] 0.023–0.738), also indicating the lack of a threshold effect.
For all 21 studies combined, the pooled sensitivity was 0.89 (95% CI 0.86–0.92) with specificity of 0.73 (95% CI 0.60–0.83; Fig. 3 ). In the HSROC plot, the 95% prediction region was much wider than the 95% confidence region, indicating heterogeneity between the studies ( Fig. 4 ). The area under the HSROC curve was 0.91 (95% CI 0.88–0.93). According to Deeks’ funnel plot, the likelihood of publication bias was low, with a p value of 0.75 for the slope coefficient ( Fig. 5 ).
For the six studies that provided a head-to-head comparison between PI-RADSv1 and PI-RADSv2, PI-RADSv2 demonstrated higher pooled sensitivity of 0.95 (95% CI 0.85–0.98) compared with 0.88 (95% CI 0.80–0.93) for PI-RADSv1 ( p = 0.04). However, the pooled specificity did not show a significant difference between the two versions of PI-RADS: 0.73 (95% CI 0.47–0.89) for v2 and 0.75 (95% CI 0.36–0.94) for v1 ( p = 0.90).
As considerable heterogeneity was found among the included studies, meta-regression analyses were performed (Supplementary Table 1 ). Among the candidate variables, the proportion of patients with PCa, magnetic field strength, and reference standard were significant factors affecting heterogeneity ( p < 0.01 for all). However, of these three, only specificity according to the proportion of patients with PCa showed a difference that was both statistically significant and clinically meaningful: 0.65 (95% CI 0.52–0.78) in studies in which ≤50% of patients had PCa versus 0.86 (95% CI 0.75–0.97) in studies in which >50% had PCa. The other differences were not clinically meaningful: for magnetic field strength (3 vs 1.5 T), sensitivity was 0.90 (95% CI 0.86–0.94) versus 0.89 (95% CI 0.81–0.97; p = 0.03) and specificity 0.73 (95% CI 0.59–0.86) versus 0.72 (95% CI 0.44–1.00; p = 0.81); for reference standard (radical prostatectomy vs biopsy), sensitivity was 0.89 (95% CI 0.83–0.95) versus 0.91 (95% CI 0.88–0.94; p < 0.01) and specificity 0.65 (95% CI 0.37–0.94) versus 0.73 (95% CI 0.58–0.87; p = 0.48). Other variables, including the cutoff value, use of an endorectal coil, and type of analysis (per patient vs per lesion), were not significant factors ( p = 0.32–0.70).
Because four studies used both ≥3 and ≥4 as cutoff values [17 22 23 29] and five assessed both any cancer and clinically significant cancer as outcomes [16 21 24 29 32] , multiple subgroup analyses were performed to assess various clinical settings (Supplementary Table 2 ). Regarding cutoff values, the pooled sensitivity was 0.89 (95% CI 0.84–0.92) with specificity of 0.74 (95% CI 0.58–0.85) for the 17 studies using ≥4 [16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 35] , whereas these were 0.95 (95% CI 0.89–0.97) and 0.47 (95% CI 0.21–0.74) for the eight studies using ≥3 [17 22 23 29 32 33 34 36] . Stratifying studies according to the outcome assessed yielded the following results: (1) cutoff of ≥4 for any PCa, sensitivity of 0.89 (95% CI 0.83–0.93) with specificity of 0.80 (95% CI 0.62–0.90); (2) cutoff of ≥3 for any PCa, sensitivity of 0.96 (95% CI 0.93–0.98) with specificity of 0.49 (95% CI 0.29–0.70); (3) csPCa regardless of cutoff value, sensitivity of 0.89 (95% CI 0.84–0.92) with specificity of 0.64 (95% CI 0.46–0.78); (4) cutoff of ≥4 for csPCa, sensitivity of 0.90 (95% CI 0.85–0.94) with specificity of 0.62 (95% CI 0.45–0.77); and (5) cutoff of ≥3 for csPCa, sensitivity of 0.96 (95% CI 0.87–0.99) with specificity of 0.29 (95% CI 0.05–0.77). When the studies using a cutoff value of ≥4 were assessed separately according to the type of analysis, per-patient analysis in eight studies [17 18 20 26 27 29 31 35] yielded pooled sensitivity of 0.89 (95% CI 0.81–0.93) with specificity of 0.76 (95% CI 0.60–0.88), whereas per-lesion analysis in nine studies [16 19 21 22 23 24 25 28 30] yielded pooled sensitivity of 0.87 (95% CI 0.83–0.91) with specificity of 0.70 (95% CI 0.44–0.88).
Based on the localization of PCa, the pooled sensitivity was 0.93 (95% CI 0.87–0.96) with specificity of 0.68 (95% CI 0.43–0.86) in the seven studies analyzing PZ cancers [20 24 25 28 30 31 33] . For TZ cancers, which were analyzed in the same studies except for that by Stanzione et al [31] , the pooled sensitivity and specificity were 0.88 (95% CI 0.77–0.94) and 0.75 (95% CI 0.59–0.86), respectively.
Studies including patients without previous biopsies yielded sensitivity of 0.82 (95% CI 0.72–0.90) and specificity of 0.75 (95% CI 0.65–0.83), whereas the diagnostic performance values were 0.87 (95% CI 0.80–0.92) and 0.71 (95% CI 0.42–0.89) in studies including patients with a history of previous biopsy.
In our meta-analysis, we assessed the diagnostic accuracy of PI-RADSv2 for detecting PCa. The pooled sensitivity and specificity of all 21 studies were 0.89 (95% CI 0.86–0.92) and 0.73 (95% CI 0.60–0.83), respectively. Compared with the only two existing meta-analyses of mpMRI for detecting PCa, our data suggest a trend toward higher sensitivity and lower specificity. In the study by de Rooij et al [37] , which evaluated seven studies using a combination of T2WI, DWI, and DCE-MRI, the pooled sensitivity and specificity were 0.74 (95% CI 0.66–0.81) and 0.88 (95% CI 0.82–0.92), respectively. In a more recent meta-analysis by Hamoen et al [6] , which analyzed 14 studies using PI-RADSv1, the pooled sensitivity and specificity were 0.78 (95% CI 0.70–0.84) and 0.79 (95% CI 0.68–0.86), respectively. However, comparisons across the three meta-analyses are only indirect. To address this issue, we separately assessed the subgroup of studies that used both PI-RADSv1 and PI-RADSv2. In this head-to-head comparison, PI-RADSv2 demonstrated higher pooled sensitivity (0.95) than PI-RADSv1 (0.88, p = 0.04) without a statistically significant difference in specificity (0.73 vs 0.75, p = 0.90). This gain in sensitivity over its predecessor may imply that the revisions undertaken during the development of PI-RADSv2, including the introduction of dominant sequences according to zonal anatomy, the secondary role of DCE-MRI relative to DWI and T2WI, and specific guidelines for deriving an integrated overall score, were in fact on the right track. In particular, we speculate that the use of dominant sequences, that is, DWI for the PZ and T2WI for the TZ, may have been crucial for improving sensitivity without a loss in specificity, as suggested by Baur et al [10] .
Considering that one of the main intentions behind PI-RADS was to standardize the reporting of mpMRI, in order to decrease variability and promote widespread acceptance and implementation in daily practice, it was encouraging that nearly all (20 of 21) studies applied PI-RADSv2 strictly according to the published guidelines [11] . Only one study derived PI-RADSv2 scores from existing clinical radiological reports that were based on PI-RADSv1 or an in-house scoring system [29] . This is an improvement over prior studies using PI-RADSv1, in which investigators used varying methods to determine the overall score (an overall five-point score or the sum of the scores for each sequence) [6] . Still, the cutoff value for detecting PCa needs further clarification. Among the studies included in our meta-analysis, cutoff values were predefined in only six, while the majority (15/21) were exploratory, testing multiple criteria. With a cutoff value of ≥4, sensitivity (0.89) and specificity (0.74) were generally good, whereas ≥3 yielded excellent sensitivity (0.95) but poor specificity (0.47). These results may be taken into consideration when developing the next update of PI-RADS. For instance, a cutoff of ≥4 may be adequate for general use, whereas ≥3 could be reserved for situations in which a higher cancer detection rate is clinically required (eg, a persistently high PSA level despite a previously negative biopsy).
In the current study, subgroup analyses were performed to account for differences in outcome (any cancer vs clinically significant cancer). There was no significant difference between the two outcomes, irrespective of whether a cutoff of ≥3 or ≥4 was used. However, the definition of clinically significant cancer varied among the 13 studies that assessed it. Only three studies defined csPCa strictly according to the PI-RADSv2 guidelines (Gleason score ≥7 [3 + 4], volume ≥0.5 ml, or extraprostatic extension) [11] ; most others used one or two of the three criteria. Including only those three studies may have provided more robust results; however, including all available studies was pragmatic and provides a general overview of the existing literature, as this is the first meta-analysis of studies on PI-RADSv2.
In this meta-analysis, we also examined technical aspects of MRI. Meta-regression analyses revealed that the use of an endorectal coil was not a statistically significant factor. Furthermore, although magnetic field strength showed a statistically significant difference between 3 and 1.5 T, the difference was not clinically meaningful (sensitivity of 0.90 vs 0.89, respectively; p = 0.03). Although both issues have been debated in the past, 3 and 1.5 T are now both well established, and the overall benefit of an endorectal coil is not evident [38 39] . The PI-RADSv2 guidelines currently accept either option, and the results of our study provide additional evidence to support this.
Regarding the methods of analysis, the studies varied considerably in reference standard and type of analysis. Radical prostatectomy was the reference standard in five studies, while the remaining studies were based mainly on biopsy, often combining systematic and targeted biopsies; in studies using biopsy as the reference standard, PCa may have been missed despite negative biopsy results. In addition, roughly half of the studies reported outcomes per patient ( n = 11) and the other half per lesion ( n = 10). Per-lesion analysis also takes into account how well the disease is localized; however, type of analysis was not a significant factor in the meta-regression analysis.
Our meta-analysis had some limitations. Nearly all studies were retrospective, resulting in a high risk of bias for patient selection, and pooling data from predominantly retrospective studies may have inflated the estimated diagnostic sensitivity [40] . However, a meta-analysis restricted to the three prospective studies was technically unfeasible, and its results would not have been representative of the existing literature on PI-RADSv2. Moreover, we used validated methods for the systematic review and reported the data according to standard reporting guidelines, including PRISMA and the Handbook for Diagnostic Test Accuracy Reviews published by the Cochrane Collaboration [12 41] . Another limitation is the considerable heterogeneity in our pooled analysis, which limits the general applicability of our summary estimates. To explore this heterogeneity, we performed meta-regression and multiple subgroup analyses, which identified the proportion of patients with PCa, the magnetic field strength, and the reference standard as significant factors. In particular, the reference standard comprised various methods, including radical prostatectomy and combinations of systematic and targeted biopsies (ie, MRI guided, MRI–transrectal ultrasound fusion, or cognitive). The use of various definitions of clinically significant cancer should also be emphasized. Our meta-regression and subgroup analyses may explain some of the heterogeneity, but a portion remains unexplained. A further important limitation is the small number of studies available for the head-to-head comparison between PI-RADSv1 and PI-RADSv2. Nevertheless, we were able to demonstrate a statistically significant difference in sensitivity between the two versions using the six studies published to date.
PI-RADSv2 shows good performance for the detection of PCa with pooled sensitivity of 0.89 and specificity of 0.73. PI-RADSv2 has higher pooled sensitivity compared with PI-RADSv1 without significantly different specificity.
Author contributions: Sang Youn Kim had full access to all the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis.
Study concept and design: Woo, Suh, S.Y. Kim.
Acquisition of data: Woo, Suh, S.Y. Kim.
Analysis and interpretation of data: Woo, Suh, S.Y. Kim.
Drafting of the manuscript: Woo.
Critical revision of the manuscript for important intellectual content: Suh, S.Y. Kim, Cho, S.H. Kim.
Statistical analysis: Suh.
Obtaining funding: None.
Administrative, technical, or material support: None.
Supervision: S.Y. Kim, Cho, S.H. Kim.
Other: None.
Financial disclosures: Sang Youn Kim certifies that all conflicts of interest, including specific financial interests and relationships and affiliations relevant to the subject matter or materials discussed in the manuscript (eg, employment/affiliation, grants or funding, consultancies, honoraria, stock ownership or options, expert testimony, royalties, or patents filed, received, or pending), are the following: None.
Funding/Support and role of the sponsor: None.
Prostate cancer (PCa) is the second leading cause of cancer-related mortality in Western men [1] . With recent technological advances and growing availability, multiparametric magnetic resonance imaging (mpMRI) is increasingly used to guide several aspects of PCa management, including detection, staging, and treatment planning [2] . Despite abundant evidence in the literature reporting high accuracy of mpMRI for PCa diagnosis, widespread acceptance has been hampered by several factors, including the difficulty of interpretation, the lack of standardized interpretation criteria (ie, reliance on Likert scales based on the radiologist's subjective level of suspicion for PCa), and the resulting substantial inter-reader variability [3 4] .
To bring standardization to the evaluation and reporting of mpMRI of the prostate, the European Society of Urogenital Radiology (ESUR) published a guideline termed Prostate Imaging Reporting and Data System (PI-RADS) in 2012 [5] . PI-RADS was based on expert consensus and provides a detailed scoring system for each MRI sequence (T2-weighted imaging [T2WI], diffusion-weighted imaging [DWI], dynamic contrast-enhanced MRI [DCE-MRI], and MR spectroscopy) for the presence of clinically significant PCa (csPCa). Several investigators have validated the accuracy and reproducibility of PI-RADS, and a recent meta-analysis reported pooled sensitivity and specificity of 0.78 and 0.79, respectively [6] . However, as there was no guideline for deriving an overall score, research groups used different approaches: some used the sum of the scores from each sequence (ranging from 3 to 15), whereas others used an overall score of 1–5 [7 8] . Furthermore, emerging data questioned the value of curve-type analysis of DCE-MRI [9] . In addition, investigators suggested that, rather than all sequences being weighted equally, some sequences may be more important in determining the likelihood of PCa (ie, DWI in the peripheral zone [PZ] and T2WI in the transition zone [TZ]) [10] .
To address these issues, the ESUR and the American College of Radiology recently released the updated PI-RADS version 2 (PI-RADSv2) [11] . The main changes from PI-RADSv1 to PI-RADSv2 are: (1) the introduction of dominant sequences according to zonal anatomy (DWI for the PZ and T2WI for the TZ), (2) a limited contribution of DCE-MRI, which is scored only as the presence or absence of early focal enhancement, and (3) the generation of an overall score (1–5) integrating findings across all MRI sequences.
Since then, several studies assessing the value of PI-RADSv2 have been published. However, the diagnostic performance of this new scoring system has not been evaluated systematically. Therefore, the purpose of our study was to assess the diagnostic performance of PI-RADSv2 for the detection of PCa. In addition, we aimed to compare the diagnostic performance of PI-RADSv1 and PI-RADSv2 in studies available for head-to-head comparison.
This meta-analysis was performed and written according to the preferred reporting items for systematic reviews and meta-analyses (PRISMA) guidelines [12] .
A computerized search of MEDLINE and EMBASE up to December 7, 2016, was performed in order to identify studies evaluating the diagnostic performance of PI-RADSv2 for the detection of PCa. The search query combined synonyms for PCa, MRI, and PI-RADS as follows: (prostate cancer OR prostatic cancer OR prostate neoplasm OR prostatic neoplasm OR prostate tumor OR prostatic tumor OR prostate carcinoma OR prostatic carcinoma OR PCa) AND (magnetic resonance imaging OR MRI OR MR) AND (prostate imaging reporting and data system OR pi-rads OR pi rads OR pirads). Bibliographies of identified articles were also screened to expand the scope of search. Our search was limited to publications in English.
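The Boolean structure of this search (synonyms joined with OR within each concept, and the three concepts joined with AND) can be reproduced with the following Python sketch, which rebuilds the query string quoted above. The helper names are hypothetical, and database-specific syntax such as phrase quotation, field tags, or controlled-vocabulary terms for MEDLINE and EMBASE is deliberately not modeled.

```python
# Synonym groups taken from the search described above, one list per concept.
CANCER_TERMS = [
    "prostate cancer", "prostatic cancer", "prostate neoplasm",
    "prostatic neoplasm", "prostate tumor", "prostatic tumor",
    "prostate carcinoma", "prostatic carcinoma", "PCa",
]
MRI_TERMS = ["magnetic resonance imaging", "MRI", "MR"]
PIRADS_TERMS = [
    "prostate imaging reporting and data system", "pi-rads", "pi rads", "pirads",
]


def or_group(terms: list[str]) -> str:
    """Join the synonyms for one concept with OR and wrap them in parentheses."""
    return "(" + " OR ".join(terms) + ")"


def build_query(*concepts: list[str]) -> str:
    """Combine concept groups with AND, so a record must match every concept."""
    return " AND ".join(or_group(terms) for terms in concepts)


query = build_query(CANCER_TERMS, MRI_TERMS, PIRADS_TERMS)
print(query)
```

Keeping each concept as its own list makes it easy to add a synonym to one concept without disturbing the AND structure of the whole query.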
Studies were included if they satisfied all the following requirements according to the PICOS criteria [12] : (1) included patients with suspected or diagnosed PCa; (2) for index test, mpMRI of the prostate including all required sequences of T2WI, DWI, and DCE-MRI was performed and assessed with a PI-RADSv2 scoring system; (3) for comparison, a reference standard based on the histopathological examination of radical prostatectomy or biopsy was used; (4) results were reported in sufficient detail for the reconstruction of 2 × 2 tables and determination of sensitivity and specificity at specified cutoff values for evaluating the diagnostic performance of PI-RADSv2; and (5) studies had to be original articles.
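Criterion (4) amounts to dichotomizing the five-point PI-RADSv2 score at a cutoff and cross-tabulating the result against the reference standard. The Python sketch below does this for cutoffs of ≥3 and ≥4; the class and function names are hypothetical, and the per-category counts are made up solely to show how lowering the cutoff trades specificity for sensitivity.

```python
from dataclasses import dataclass


@dataclass
class TwoByTwo:
    tp: int  # score >= cutoff, cancer on the reference standard
    fp: int  # score >= cutoff, no cancer
    fn: int  # score < cutoff, cancer
    tn: int  # score < cutoff, no cancer

    @property
    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)


def dichotomize(cancer_by_score: dict[int, int],
                benign_by_score: dict[int, int],
                cutoff: int) -> TwoByTwo:
    """Treat a PI-RADSv2 score >= cutoff as a positive test result."""
    tp = sum(n for s, n in cancer_by_score.items() if s >= cutoff)
    fn = sum(n for s, n in cancer_by_score.items() if s < cutoff)
    fp = sum(n for s, n in benign_by_score.items() if s >= cutoff)
    tn = sum(n for s, n in benign_by_score.items() if s < cutoff)
    return TwoByTwo(tp, fp, fn, tn)


# Hypothetical counts per PI-RADSv2 category (1-5); not taken from any study.
cancer = {1: 1, 2: 3, 3: 6, 4: 30, 5: 60}
benign = {1: 20, 2: 40, 3: 25, 4: 10, 5: 5}

for cutoff in (3, 4):
    t = dichotomize(cancer, benign, cutoff)
    print(f">={cutoff}: sens={t.sensitivity:.2f}, spec={t.specificity:.2f}")
```

With these made-up counts, ≥3 gives a sensitivity of 0.96 and a specificity of 0.60, whereas ≥4 gives 0.90 and 0.85, mirroring the direction of the trade-off reported for the pooled data.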
Studies were excluded if any of the following criteria were met: (1) studies involving <10 patients; (2) review articles, guidelines, consensus statements, letters, editorials, and conference abstracts; (3) studies using only PI-RADSv1 for the evaluation of mpMRI of the prostate; (4) studies focusing on topics other than the use of PI-RADSv2 for diagnosing PCa (eg, staging and prediction of biochemical recurrence); and (5) studies with overlapping patient populations.
Two reviewers (S.W. and C.H.S., with 3 yr of experience in performing systematic reviews and meta-analyses) independently evaluated the eligibility of the studies identified in the literature search. Any disagreements between the two reviewers were resolved by consensus through discussion with a third reviewer (S.Y.K.).
We extracted the following data regarding study design and results from the selected studies using a standardized form:
Study characteristics—authors, year of publication, country and institution of origin, duration of patient recruitment, and study design (prospective vs retrospective and consecutive or not)
Demographic and clinical characteristics—sample size, number of patients with PCa, patient age, prostate-specific antigen (PSA) level and Gleason score, number of previous biopsies, and PCa diagnosis prior to mpMRI
Technical characteristics of mpMRI—scanner model and manufacturer, magnetic field strength (1.5 vs 3 T), coil type (endorectal vs pelvic phased array), and specific sequences used (T2WI, DWI, DCE, or MR spectroscopy)
Interpretation of mpMRI—number of reviewers and experience in prostate mpMRI, independent or consensus reading, and blinding to clinicopathological information
Reference standard—type of reference standard (radical prostatectomy, targeted biopsy, or systematic biopsy), interval between MRI and pathology, outcomes assessed (any PCa vs csPCa), definition of csPCa (studies assessing “clinically significant”, “aggressive”, or “high-grade” PCa were all considered to assess csPCa; however, only studies that used the definition provided by the PI-RADSv2 guideline [Gleason score ≥7 [3 + 4], volume ≥0.5 ml, or extraprostatic extension] were considered to have low concern for applicability), separate analysis for the PZ and TZ, and type of analysis (per patient vs per lesion)
Diagnostic performance of PI-RADSv2 including criteria or cutoff values (in case of multiple readers, the results of the most experienced reader were extracted for this meta-analysis)
The methodological quality of the included studies was assessed using tailored questionnaires and criteria provided by Quality Assessment of Diagnostic Accuracy Studies-2 [13] . Data extraction and quality assessment were performed independently by two reviewers (S.W. and C.H.S.). All disagreements were resolved by consensus through discussion with the third reviewer (S.Y.K.).
The diagnostic performance of PI-RADSv2 for the detection of PCa was the primary outcome for this meta-analysis. In addition, a comparison between the diagnostic performance of PI-RADSv2 and that of PI-RADSv1 using studies that reported head-to-head comparison data of the two PI-RADS versions was considered a secondary outcome.
Pooled estimates of sensitivity and specificity were calculated using hierarchical logistic regression modeling, including bivariate modeling and hierarchical summary receiver operating characteristic (HSROC) modeling [14] . For graphical presentation of the results, an HSROC curve with 95% confidence and prediction regions was plotted. Publication bias was evaluated using Deeks’ funnel plot, and statistical significance was tested with Deeks’ asymmetry test [15] .
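Deeks' test regresses the log diagnostic odds ratio (DOR) of each study against 1/√ESS, where the effective sample size is ESS = 4·n1·n2/(n1 + n2) for n1 diseased and n2 non-diseased patients, weighting each study by its ESS; a slope that differs significantly from zero suggests funnel-plot asymmetry. The Python sketch below computes the slope and its t statistic with plain weighted least squares. The 0.5 continuity correction, the function names, and the counts are assumptions, and this is not the Stata implementation used in the analysis. A two-sided p value would then come from a t distribution with k - 2 degrees of freedom (for instance, `2 * scipy.stats.t.sf(abs(t), df)` if SciPy is available).

```python
import math


def effective_sample_size(n_diseased: int, n_healthy: int) -> float:
    """ESS = 4 * n1 * n2 / (n1 + n2), the study size measure used in Deeks' test."""
    return 4 * n_diseased * n_healthy / (n_diseased + n_healthy)


def log_dor(tp: float, fp: float, fn: float, tn: float) -> float:
    """Log diagnostic odds ratio; adds 0.5 to every cell if any cell is zero."""
    if 0 in (tp, fp, fn, tn):
        tp, fp, fn, tn = (c + 0.5 for c in (tp, fp, fn, tn))
    return math.log((tp * tn) / (fp * fn))


def weighted_linear_fit(x, y, w):
    """Weighted least squares fit of y = a + b*x (needs at least 3 points).

    Returns (a, b, se_b). The residual variance uses n - 2 degrees of
    freedom, so rescaling all weights by a constant leaves se_b unchanged.
    """
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    b = sxy / sxx
    a = my - b * mx
    rss = sum(wi * (yi - a - b * xi) ** 2 for wi, xi, yi in zip(w, x, y))
    se_b = math.sqrt(rss / (len(x) - 2) / sxx)
    return a, b, se_b


def deeks_test(studies):
    """Regress ln(DOR) on 1/sqrt(ESS), weighted by ESS.

    studies: iterable of (tp, fp, fn, tn). Returns (slope, t statistic, df).
    """
    x, y, w = [], [], []
    for tp, fp, fn, tn in studies:
        ess = effective_sample_size(tp + fn, fp + tn)
        x.append(1 / math.sqrt(ess))
        y.append(log_dor(tp, fp, fn, tn))
        w.append(ess)
    _, slope, se = weighted_linear_fit(x, y, w)
    return slope, slope / se, len(x) - 2


# Hypothetical (tp, fp, fn, tn) counts; not data from the included studies.
studies = [(45, 10, 5, 40), (80, 30, 8, 30), (30, 4, 6, 60), (55, 25, 2, 18), (20, 6, 3, 25)]
slope, t_stat, df = deeks_test(studies)
print(f"slope = {slope:.2f}, t = {t_stat:.2f} on {df} df")
```

Regressing on 1/√ESS rather than on the standard error of the log DOR is what distinguishes Deeks' approach from funnel-plot tests designed for treatment effects, where the standard error is less tightly linked to the DOR itself.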
We performed meta-regression analyses to investigate the sources of heterogeneity. The following covariates were considered for the bivariate model: (1) proportion of patients with PCa (>50% vs ≤50%), (2) magnetic field strength (3 vs 1.5 T), (3) use of an endorectal coil, (4) cutoff value (≥4 vs ≥3), (5) reference standard (radical prostatectomy vs biopsy), and (6) type of analysis (per patient vs per lesion). In addition, multiple subgroup analyses were performed for cutoff value, outcome, type of analysis, tumor location, and previous biopsy history to assess various clinical settings: (1) a cutoff value of ≥4 for all studies, (2) a cutoff value of ≥3 for all studies, (3) a cutoff value of ≥4 for determining any PCa, (4) a cutoff value of ≥3 for determining any PCa, (5) a cutoff value of ≥4 for determining csPCa, (6) a cutoff value of ≥3 for determining csPCa, (7) a cutoff value of ≥4 in studies using per-patient analysis, (8) a cutoff value of ≥4 in studies using per-lesion analysis, (9) studies analyzing PZ PCa, (10) studies analyzing TZ PCa, (11) patients without previous biopsies, and (12) patients with previous biopsies. The “metandi” and “midas” modules in Stata 10.0 (StataCorp LP, College Station, TX, USA) and the “mada” package in R software version 3.2.1 (R Foundation for Statistical Computing, Vienna, Austria) were used for statistical analyses, with p < 0.05 signifying statistical significance.
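As a much-simplified illustration of how a binary covariate such as the proportion of patients with PCa (>50% vs ≤50%) enters a meta-regression, the Python sketch below pools logit specificities within each covariate level using inverse-variance (fixed-effect) weights and tests the difference with a z statistic. This univariate fixed-effect comparison is a swapped-in technique, not the bivariate model used here: it ignores both between-study variance and the correlation between sensitivity and specificity, so it will not reproduce the reported estimates. The function names, the 0.5 continuity correction, and the counts are assumptions.

```python
import math


def logit_and_var(events: float, non_events: float) -> tuple[float, float]:
    """Logit of a proportion and its approximate variance (0.5 added to zero cells)."""
    if events == 0 or non_events == 0:
        events, non_events = events + 0.5, non_events + 0.5
    return math.log(events / non_events), 1 / events + 1 / non_events


def inv_logit(x: float) -> float:
    return 1 / (1 + math.exp(-x))


def subgroup_specificity_test(studies):
    """Fixed-effect comparison of pooled logit specificity between covariate levels.

    studies: iterable of (tn, fp, covariate) tuples with covariate in {0, 1}.
    Returns (pooled specificity per level, difference in logits, z statistic).
    """
    sum_wy = {0: 0.0, 1: 0.0}  # sum of weight * logit(specificity) per level
    sum_w = {0: 0.0, 1: 0.0}   # sum of inverse-variance weights per level
    for tn, fp, covariate in studies:
        y, v = logit_and_var(tn, fp)
        sum_wy[covariate] += y / v
        sum_w[covariate] += 1 / v
    pooled = {g: sum_wy[g] / sum_w[g] for g in (0, 1)}
    diff = pooled[1] - pooled[0]
    se = math.sqrt(1 / sum_w[0] + 1 / sum_w[1])
    return {g: inv_logit(m) for g, m in pooled.items()}, diff, diff / se


# Hypothetical studies: (true negatives, false positives, 1 if >50% of patients had PCa).
studies = [(60, 40, 0), (45, 20, 0), (80, 20, 1), (50, 8, 1)]
spec, diff, z = subgroup_specificity_test(studies)
print(f"specificity <=50%: {spec[0]:.2f}, >50%: {spec[1]:.2f}, z = {z:.2f}")
```

With a single binary covariate, a fixed-effect meta-regression coefficient equals this difference in weighted group means, which is why the sketch can avoid a general regression routine.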
A systematic literature search initially identified 287 articles. After removing 46 duplicates, screening of the 241 titles and abstracts yielded 105 potentially eligible articles. Full-text reviews were performed, and 84 studies were excluded for the following reasons: not in the field of interest ( n = 80, including 68 studies that used only PI-RADSv1), insufficient data to reconstruct 2 × 2 tables ( n = 2), and shared study population with other studies ( n = 2). Ultimately, 21 original articles including a total of 3857 patients assessing the diagnostic performance of PI-RADSv2 were included in the meta-analysis [16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36] . No additional studies were identified via screening the bibliographies of these 21 studies. Among them, 15 studies including 3099 patients dealt with PI-RADSv2 alone, whereas six studies including 758 patients provided a head-to-head comparison between PI-RADSv1 and PI-RADSv2 [16 20 21 28 32 33] . The detailed study selection process is described in Fig. 1 .
Patient characteristics are shown in Table 1 . The size of the study population ranged from 49 to 456 patients, and the percentage of patients with PCa ranged from 37% to 100%. Across studies, the median age ranged from 62 to 69.6 yr, the median PSA from 3.97 to 15 ng/ml, and Gleason scores from 5 to 10. In seven studies, all or some of the patients had already been diagnosed with PCa before MRI [16 17 22 26 27 32 35] . Biopsy had been performed before MRI in seven studies [16 21 22 23 26 27 35] , all patients were biopsy-naïve in three studies [18 25 34] , both patient types were included in three studies [28 32 33] , and data regarding previous biopsy were not reported in the remaining eight studies.
First author (year) | Country | Institution | Duration of patient recruitment | Patients (n) | Patients with PCa (n) | Age (yr), median | Age (yr), range | PSA (ng/ml), median | PSA (ng/ml), range | Gleason score, median | Gleason score, range | No. of previous biopsies | PCa diagnosis before MRI
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Auer (2016) [16] | Austria | Medical University of Innsbruck, Medizinische Hochschule Hannover | NR | 50 | NR | 63 a | NR | 7.3 a | NR | 7 (3 + 4) | 6–10 | ≥1 | Yes (all) |
Baldisserotto (2016) [17] | Brazil | Pontifícia Universidade Católica do Rio Grande do Sul | 2013.6–2015.6 | 54 | 33 | 65.9 a | 53–81 | 8.4 a | 3–31 | 7 (3 + 4) | 6–9 | NR | Yes (some) |
De Visschere (2017) [18] | Belgium | Ghent University Hospital | 2011.5–2014.12 | 245 | 144 | 66 | 44–85 | 9 | 1.4–935.5 | 7 (3 + 4) | ≥6 | 0 | No |
El-Samei (2016) [19] | Egypt | Al-Azher University, El-Minia University | 2014.5–2015.10 | 55 | 38 | 62 a | 51–79 | NR | NR | 7 | 6–10 | NR | No |
Feng (2016) [20] | China | Tongji Hospital | 2013.6–2015.7 | 401 | 150 | 64.4 a | 34–88 | 10.7 | 0.2–1763 | 7 (4 + 3) | ≤6–≥8 | NR | NR |
Kasel-Seibert (2016) [21] | Germany | University Hospital Jena | 2013.7–2015.3 | 82 | 31 | 65 | 48–81 | 13 | 1–111 | 7 (3 + 4) | 6–9 | 1–5 | No |
Lin (2016) [22] | Brazil, Taiwan | Ribeirão Preto School of Medicine, China Medical University, Ribeirão Preto School of Medicine | 2011.5–2014.6 | 49 | 49 | 63 | 46–73 | 13.27 | 1.8–41.4 | | | ≥1 | Yes (all) |
Martorana (2016) [23] | Italy | University of Modena and Reggio Emilia, Perugia University | 2014.1–2016.2 | 157 | 79 | 65 | 47–79 b | 10.7 | 1–75 | 6 | 6–8 | 1–3 | No |
Mertan (2016) [24] | USA | National Cancer Institute | 2015.3–2015.9 | 62 | 38 | 65.5 | 50.3–76.6 | 7.1 | 0.5–863 | 7 (3 + 4) | 6–9 | NR | NR |
Muller (2015) [25] | USA, the Netherlands | National Cancer Institute, AMC University Hospital, Edward Hébert School of Medicine | 2011.12–2014.5 | 94 | 94 | 62 | 37–79 | 8.51 | 0.7–51.1 | 7 (3 + 4) | 6–10 | 0 | No |
Park (2016) [26] | South Korea | Samsung Medical Center | 2012.1–2014.12 | 456 | 456 | 65 | 46–81 | 3.97 | 3.8–4.2 | NR | ≥6 | ≥1 | Yes (all) |
Park (2016) [27] | South Korea | Yonsei University College of Medicine | 2012.1–2013.3 | 425 | 425 | NR | NR | NR | 1.4–156.9 | 5–6 | 5–10 | ≥1 | Yes (all) |
Polanec (2016) [28] | Vienna | Medical University of Vienna, Confraternität Vienna | 2011.6–2015.9 | 65 | NR | 65.3 a | 62.3–87.4 | 10.8 a | 4.2–74.5 | 7 (4 + 3) | 6–9 | ≥0 | No |
Rastinehad (2015) [29] | USA | Icahn School of Medicine at Mount Sinai, Fox Chase Cancer Center, National Institutes of Health, Hofstra North Shore LIJ School of Medicine | 2012.2–2014.11 | 312 | 202 | 65.1 | 60.3–70.3 b | 7.3 | 5.0–11.4 b | NR | NR | NR | No |
Rosenkrantz (2016) [30] | USA | NYU Langone Medical Center | 2013.9–2015.2 | 343 | 134 | 64 | NR | 5.8 | NR | 7 (3 + 4) | 6–9 | NR | NR |
Stanzione (2016) [31] | Italy | University “Federico II”, Ospedale S. Maria delle Grazie | NR | 82 | 34 | 65 a | 43–84 | 8.8 a | NR | 7 (3 + 4) | 6–9 | NR | No |
Tan (2016) [32] | USA, Taiwan | UCLA, Cathay General Hospital, Hoag Hospital | 2013.3–2016.12 | 106 | 63 | 66.5 | 43–79 | 7.9 | 5.6–10.6 | 7 (3 + 4) | 6–10 | ≥0 | Yes (some) |
Tewes (2016) [33] | Germany | Hannover Medical School, Klinikum der Region Hannover | 2012.12–2014.12 | 54 | 31 | 69.6 a | NR | 8.7 a | NR | 6 | 6–9 | ≥0 | No |
Washino (2017) [34] | Japan | Jichi Medical University Saitama Medical Center | 2010.6–2014.4 | 288 | 159 | 69 | 64–74 b | 7.5 | 5.5–11 b | NR | NR | 0 | No |
Woo (2016) [35] | South Korea | Seoul National University College of Medicine | 2011.1–2013.12 | 105 | 105 | 69 | 49–79 | 8.22 | 0.9–44.2 | 7 | 6–7 | ≥1 | Yes (all) |
Zhao (2016) [36] | China | Peking University First Hospital | 2010.11–2013.12 | 372 | 185 | 68.5 a | NR | 15 a | NR | 7 | 6–10 | NR | No |
Characteristics of the studies are summarized in Table 2 . MRI was performed using 3-T scanners in 16 studies [16 17 18 20 24 25 26 27 28 29 30 31 32 33 35 36] , 1.5-T scanners in four studies [19 21 22 23] , and either 3- or 1.5-T scanners in one study [34] . Endorectal coils were used in seven studies [19 21 22 23 24 25 29] . In all studies, the mpMRI protocol consisted of T2WI, DWI, and DCE-MRI. The reference standard was radical prostatectomy in five studies [16 22 26 27 35] , a combination of systematic and targeted biopsies in seven studies [17 20 23 25 29 34 36] , and targeted biopsy only in seven studies [19 21 24 28 30 32 33] ; in two studies, the reference standard was not consistent throughout the study population [18 31] . PI-RADSv2 scoring was performed by one to five radiologists, either in consensus or independently. The radiologists' experience was heterogeneous, ranging from 4 to 22 yr in prostate imaging. In most studies, the readers were blinded; however, in three studies, the radiologists were aware that the patients had biopsy-proven PCa [22 26 35] , and five studies were not explicit regarding blinding [18 19 30 32 36] . In most studies, the interval between MRI and the reference standard was less than 6 mo; however, this interval was not reported in 10 studies [16 17 18 24 27 28 29 30 32 33] . PCa was assessed separately according to zonal anatomy in seven studies [20 24 25 28 30 31 33] . However, in one of these studies [31] , only PCa in the PZ could be evaluated, as no detailed data were provided in the article and the attempt to contact the authors for further information was unsuccessful. Regarding the outcome assessed, eight studies evaluated any cancer [17 19 20 23 25 28 31 33] , eight evaluated clinically significant cancer [18 22 26 27 30 34 35 36] , and five evaluated both [16 21 24 29 32] . 
With regard to cutoff values, 13 studies used ≥4 [16 18 19 20 21 24 25 26 27 28 30 31 35] , four studies used ≥3 [32 33 34 36] , and four studies used both [17 22 23 29] . The location of PCa was reported separately for the PZ and TZ in six studies [20 24 25 28 30 33] .
First author (year) | Design | Consecutive enrollment | Reference standard | MRI–reference standard interval | No. of readers | Experience (yr) | Blinding | Magnet strength (T) | Vendor | Model | Endorectal coil | PI-RADSv2 application | Cutoff values | Localization | Type of analysis | Outcome assessed | Definition of csPCa
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Auer (2016) [16] | Retrospective | Yes | RP | NR | 1 | >5 | Yes | 3 | Siemens | Skyra | No | Strict | 4 | Whole | Lesion | Any + csPCa | GS ≥7 (4 + 3) |
Baldisserotto (2016) [17] | Retrospective | Yes | STRUSGB + targeted MRI–TRUS biopsy (cognitive) or RP | NR | 2 Independent | >10/1 | Yes | 3 | GE | SignaHDxt | No | Strict | 3, 4 | Whole | Patient | Any | |
De Visschere (2017) [18] | Retrospective | NR | STRUSGB or RP | NR | NR | NR | NR | 3 | Siemens | Trio | No | Strict | 4 | Whole | Patient | csPCa | GS ≥7, ≥0.5 cc, or EPE |
El-Samei (2016) [19] | NR | NR | Targeted MRI–TRUS biopsy | ≤2 wk | NR | NR | NR | 1.5 | Philips | Gyroscan | Yes | Strict | 4 | Whole | Lesion | Any | |
Feng (2016) [20] | Retrospective | Yes | STRUSGB + targeted MRI–TRUS biopsy (cognitive) | 1–35 d | 2 Consensus | 5/4 | Yes | 3 | Siemens | Skyra | No | Strict | 4 | PZ, TZ | Patient | Any | |
Kasel-Seibert (2016) [21] | Retrospective | NR | Targeted MRGB | ≤7 d | 2 Independent | 10/<1 | Yes | 1.5 | Siemens | Avanto | Yes | Strict | 4 | Whole | Lesion | Any + csPCa | GS ≥7 |
Lin (2016) [22] | Retrospective | NR | RP | ≤6 mo | 2 Independent | 12/5 | Yes a | 1.5 | Philips | Achieva | Yes | Strict | 3, 4 | Whole | Lesion | csPCa | GS ≥7 and >0.5 cc |
Martorana (2016) [23] | Retrospective | Yes | TTB + targeted MRI–TRUS biopsy | 30–78 | 2 Consensus | 4/4 | Yes | 1.5 | Philips | Achieva | Yes | Strict | 3, 4 | Whole | Lesion | Any | GS ≥7, >0.5 cc, or EPE |
Mertan (2016) [24] | Prospective | Yes | STRUSGB + targeted MRI–TRUS biopsy | NR | 1 | >8 | Yes | 3 | Philips | Achieva | Yes | Strict | 4 | PZ, TZ | Lesion | Any + csPCa | GS ≥7 |
Muller (2015) [25] | Retrospective | Yes | STRUSGB + targeted MRI–TRUS biopsy | ≤6 wk | 5 Independent | 12/7/1/1/0.5 | Yes | 3 | Philips | Achieva | Yes | Strict | 4 | PZ, TZ | Lesion | Any | GS ≥7 (4 + 3) |
Park (2016) [26] | Retrospective | Yes | RP | 21–48 | 2 Independent | 14/3 | Yes a | 3 | Philips | Achieva | No | Strict | 4 | Whole | Patient | csPCa | GS ≥7, ≥0.5 cc, or EPE |
Park (2016) [27] | Retrospective | NR | RP | NR | 2 Independent | 9/4 | Yes | 3 | GE, Philips, Siemens | DiscoveryMR750, Achieva, Trio | No | Strict | 4 | Whole | Patient | csPCa | GS ≥7, ≥0.5 cc, or EPE |
Polanec (2016) [28] | Retrospective | Yes | Targeted MRGB | NR | 2 Independent | NR/NR | Yes | 3 | Siemens | Trio | No | Strict | 4 | PZ, TZ | Lesion | Any | |
Rastinehad (2015) [29] | Prospective | NR | STRUSGB + targeted MRI–TRUS biopsy | NR | 3 Consensus | NR | Yes | 3 | Siemens | Verio | Yes | Calculated b | 3, 4 | Whole | Patient | Any + csPCa | Epstein's criteria c for systematic and GS ≥7 or >0.5 cc for targeted biopsy |
Rosenkrantz (2016) [30] | Retrospective | NR | Targeted MRI–TRUS biopsy | NR | 2 Independent |
8/3 | NR | 3 | Siemens | biobraphmMR/Prisma/Skyra/Trio | No | Strict | 4 | PZ, TZ | Lesion | csPCa | GS ≥7 |
Stanzione (2016) [31] | Prospective | NR | STRUSGB (±targeted TRUSGB) | 20–30 d | 2 Independent |
14/10 | Yes | 3 | Siemens | Trio | No | Strict | 4 | PZ | Patient | Any | |
Tan (2016) [32] | Retrospective | Yes | Targeted MRGB | NR | 3 consensus |
16/10 | NR | 3 | Siemens | Skyra, Trio, Verio | No | Strict | 3 | Whole | Lesion | Any + csPCa | GS ≥7 |
Tewes (2016) [33] | Retrospective | Yes | Targeted MRGB | NR | 2 Independent |
5/2 | Yes | 3 | Siemens | Skyra | No | Strict | 3 d | PZ, TZ | Patient | Any | |
Washino (2017) [34] | Retrospective | NR | TTB + targeted MRI–TRUS biopsy (cognitive) | 0.5–1.6 mo | 1 | 14 | Yes | 1.5 or 3 | Toshiba | Excelart Vantage/Vantage Titan 3T | No | Strict | 3 | Whole | Patient | csPCa | GS ≥7 or maximum core length ≥4 mm |
Woo (2016) [35] | Retrospective | NR | RP | ≤6 mo | 2 Consensus |
22/10 | Yes a | 3 | Philips, Siemens | Ingenia, Verio/Trio | No | Strict | 4 | Whole | Patient | csPCa | GS ≥7 |
Zhao (2016) [36] | Retrospective | NR | STRUSGB + targeted MRI–TRUS biopsy | ≤3 mo | 2 Independent |
NR | NR | 3 | NR | NR | No | Strict | 3 | Whole | Patient | csPCa | GS ≥7 |
a Blinded but aware that patients had biopsy-proven PCa.
b PI-RADSv2 scores were generated from reports based on PI-RADSv1 and a simplified qualitative system.
c Epstein's criteria = Gleason pattern ≥4, or Gleason 3 + 3 disease with core length ≥50% and/or >2 cores positive.
d For transition zone, cutoff value = 4.
csPCa = clinically significant prostate cancer; GS = Gleason score; EPE = extraprostatic extension; MRGB = magnetic resonance imaging-guided biopsy; MRI = magnetic resonance imaging; MRI–TRUS = fusion of magnetic resonance imaging and transrectal ultrasound images; NR = not reported; PCa = prostate cancer; PI-RADSv2 = Prostate Imaging Reporting and Data System version 2; PZ = peripheral zone; RP = radical prostatectomy; STRUSGB = systematic transrectal ultrasound-guided biopsy; TRUSGB = transrectal ultrasound-guided biopsy; TTB = transperineal template biopsy; TZ = transition zone.

Overall, the quality of the studies was not considered high, mainly because of the patient selection domain (Fig. 2). In this domain, the risk of bias was generally high, as all but four of the studies were retrospective [16 17 18 20 21 22 23 25 26 27 28 30 32 33 34 35 36]. Seven studies were considered to raise high concern for applicability, as all or some of the patients had a pathological diagnosis of PCa before MRI [16 17 22 26 27 32 35]. In the index test domain, nine studies had a high risk of bias: in three, the readers were aware that the patients had biopsy-proven PCa [22 26 35], and in the other six, the cutoff value for determining PCa was not specified before interpretation [16 20 29 31 33 34]. Only one study raised concern for applicability in this domain, as its PI-RADSv2 scores were generated indirectly from existing clinical radiology reports based on PI-RADSv1 or an in-house scoring system [29]. In the reference standard domain, eight studies had a high risk of bias. Seven relied on either systematic biopsy alone or targeted biopsy alone [18 19 21 28 30 32 33]; in the remaining study, targeted biopsy was performed on lesions that were suspicious on ultrasonography rather than on MRI [31].
Studies in which radical prostatectomy or a systematic plus targeted biopsy (MRI guided, MRI–transrectal ultrasound fusion, or cognitive) was used as the reference standard were considered to have a low risk of bias. In 10 studies, the definition of clinically significant cancer did not follow the definition in the PI-RADSv2 guidelines and therefore raised high concern for applicability [16 21 22 24 29 30 32 34 35 36]. In the flow and timing domain, two studies had a high risk of bias because not all patients received the same reference standard [17 18].
The sensitivity and specificity of individual studies ranged from 73% to 100% and from 8% to 100%, respectively. Cochran's Q test revealed substantial heterogeneity (p < 0.001). The Higgins I² statistic indicated substantial heterogeneity in sensitivity (I² = 85.55%) and considerable heterogeneity in specificity (I² = 95.30%). The coupled forest plots of sensitivity and specificity showed no threshold effect (Fig. 3). The Spearman correlation coefficient between sensitivity and the false-positive rate was 0.45 (95% confidence interval [CI] 0.023–0.738), also indicating the lack of a threshold effect.
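The heterogeneity statistics above can be illustrated with a minimal sketch. It computes a univariate, fixed-effect (inverse-variance) Cochran's Q and Higgins' I² on logit-transformed sensitivities, plus the Spearman correlation between sensitivity and false-positive rate. Assumptions: a 0.5 continuity correction is added to every study, the 2×2 counts are invented, and the helper names (`logit_with_var`, `cochran_q_and_i2`, `spearman`) are hypothetical. Bivariate meta-analysis software may compute these quantities differently, so this is not a reproduction of the review's figures.

```python
import math

def logit_with_var(events, non_events, cc=0.5):
    """Logit of a proportion and its approximate (delta-method) variance.
    Assumption: a 0.5 continuity correction is added to every study."""
    a, b = events + cc, non_events + cc
    return math.log(a / b), 1 / a + 1 / b

def cochran_q_and_i2(estimates, variances):
    """Fixed-effect (inverse-variance) Cochran's Q and Higgins' I^2 in percent."""
    weights = [1 / v for v in variances]
    pooled = sum(w * y for w, y in zip(weights, estimates)) / sum(weights)
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, estimates))
    df = len(estimates) - 1
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return q, i2

def ranks(values):
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            r[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return r

def spearman(x, y):
    """Spearman's rho = Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)

# Hypothetical 2x2 counts (TP, FN, FP, TN); these are NOT data from the review.
studies = [(45, 5, 10, 40), (30, 2, 20, 18), (60, 12, 8, 70), (25, 1, 30, 10)]

sens_logit = [logit_with_var(tp, fn) for tp, fn, _, _ in studies]
q, i2 = cochran_q_and_i2([y for y, _ in sens_logit], [v for _, v in sens_logit])
sens = [tp / (tp + fn) for tp, fn, _, _ in studies]
fpr = [fp / (fp + tn) for _, _, fp, tn in studies]
rho = spearman(sens, fpr)
print(f"Q = {q:.2f}, I^2 = {i2:.1f}%, Spearman rho(sens, FPR) = {rho:.2f}")
```

By the usual convention, a strong positive correlation between sensitivity and false-positive rate is the signature of a threshold effect, which is why this correlation is examined alongside the forest plots.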
For all 21 studies combined, the pooled sensitivity was 0.89 (95% CI 0.86–0.92) with specificity of 0.73 (95% CI 0.60–0.83; Fig. 3). In the HSROC curve, the 95% prediction region was much larger than the 95% confidence region, indicating heterogeneity between the studies (Fig. 4). The area under the HSROC curve was 0.91 (95% CI 0.88–0.93). According to Deeks' funnel plot asymmetry test, the likelihood of publication bias was low (p = 0.75 for the slope coefficient; Fig. 5).
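Deeks' funnel plot asymmetry test regresses the log diagnostic odds ratio (lnDOR) of each study on 1/√ESS, weighted by the effective sample size ESS = 4·n₁·n₂/(n₁ + n₂), where n₁ and n₂ are the numbers of diseased and non-diseased patients; a slope near zero suggests no small-study asymmetry. The sketch below implements that regression with the standard library only. Assumptions: a 0.5 continuity correction is added to every study, the counts are invented, and the function names are hypothetical. It returns the t statistic rather than a p value; the p value would come from a t distribution on n − 2 degrees of freedom (eg, SciPy's `scipy.stats.t.sf`, if available).

```python
import math

def wls_slope(x, y, w):
    """Weighted least-squares slope of y on x and its standard error."""
    sw = sum(w)
    xm = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ym = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - xm) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xm) * (yi - ym) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    intercept = ym - slope * xm
    rss = sum(wi * (yi - intercept - slope * xi) ** 2 for wi, xi, yi in zip(w, x, y))
    se = math.sqrt(rss / (len(x) - 2) / sxx)  # residual variance on n - 2 df
    return slope, se

def deeks_test(studies, cc=0.5):
    """Deeks' asymmetry test. studies: list of (tp, fn, fp, tn).
    Returns (slope, t statistic, degrees of freedom)."""
    x, y, w = [], [], []
    for tp, fn, fp, tn in studies:
        n_dis, n_nondis = tp + fn, fp + tn
        ess = 4 * n_dis * n_nondis / (n_dis + n_nondis)  # effective sample size
        ln_dor = math.log(((tp + cc) * (tn + cc)) / ((fn + cc) * (fp + cc)))
        x.append(1 / math.sqrt(ess))
        y.append(ln_dor)
        w.append(ess)
    slope, se = wls_slope(x, y, w)
    t = slope / se if se > 0 else math.inf
    return slope, t, len(studies) - 2

# Hypothetical 2x2 counts (TP, FN, FP, TN); these are NOT data from the review.
studies = [(45, 5, 10, 40), (30, 2, 20, 18), (60, 12, 8, 70),
           (25, 1, 30, 10), (80, 9, 15, 60)]
slope, t, df = deeks_test(studies)
print(f"slope = {slope:.2f}, t = {t:.2f} on {df} df")
```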
For the six studies that provided a head-to-head comparison between PI-RADSv1 and PI-RADSv2, PI-RADSv2 demonstrated higher pooled sensitivity of 0.95 (95% CI 0.85–0.98) compared with 0.88 (95% CI 0.80–0.93) for PI-RADSv1 ( p = 0.04). However, the pooled specificity did not show a significant difference between the two versions of PI-RADS: 0.73 (95% CI 0.47–0.89) for v2 and 0.75 (95% CI 0.36–0.94) for v1 ( p = 0.90).
As we found considerable heterogeneity among the included studies, we performed meta-regression analyses (Supplementary Table 1). Among the candidate variables, the proportion of patients with PCa, magnetic field strength, and reference standard were significant factors affecting heterogeneity (p < 0.01 for all). However, of these three, only the proportion of patients with PCa produced a difference that was both statistically significant and clinically meaningful, namely in specificity: 0.65 (95% CI 0.52–0.78) in studies in which ≤50% of patients had PCa versus 0.86 (95% CI 0.75–0.97) in studies in which >50% had PCa. The other two factors showed no clinically meaningful differences: for magnet strength (3 vs 1.5 T), sensitivity was 0.90 (95% CI 0.86–0.94) versus 0.89 (95% CI 0.81–0.97; p = 0.03) and specificity was 0.73 (95% CI 0.59–0.86) versus 0.72 (95% CI 0.44–1.00; p = 0.81); for reference standard (radical prostatectomy vs biopsy), sensitivity was 0.89 (95% CI 0.83–0.95) versus 0.91 (95% CI 0.88–0.94; p < 0.01) and specificity was 0.65 (95% CI 0.37–0.94) versus 0.73 (95% CI 0.58–0.87; p = 0.48). Other variables, including the cutoff value, use of an endorectal coil, and type of analysis (per patient vs per lesion), were not significant factors (p = 0.32–0.70).
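For a binary covariate such as "≤50% vs >50% of patients with PCa", the idea behind the meta-regression above can be sketched as pooling logit specificity separately in each group and testing the difference with a z statistic. This is a simplified, univariate fixed-effect analogue, not the bivariate meta-regression presumably used in the review, and it will not reproduce the reported estimates. Assumptions: a 0.5 continuity correction for every study, invented (TN, FP) counts, and hypothetical function names.

```python
import math
from statistics import NormalDist

def logit_spec_with_var(tn, fp, cc=0.5):
    """Logit specificity and its approximate variance for one study."""
    tn, fp = tn + cc, fp + cc
    return math.log(tn / fp), 1 / tn + 1 / fp

def pooled_logit(estimates, variances):
    """Inverse-variance pooled estimate and its variance."""
    w = [1 / v for v in variances]
    return sum(wi * yi for wi, yi in zip(w, estimates)) / sum(w), 1 / sum(w)

def expit(x):
    return 1 / (1 + math.exp(-x))

def compare_subgroups(group_a, group_b):
    """group_*: lists of (tn, fp) per study.
    Returns pooled specificity in each group and a two-sided p value."""
    pooled = []
    for group in (group_a, group_b):
        ys, vs = zip(*(logit_spec_with_var(tn, fp) for tn, fp in group))
        pooled.append(pooled_logit(ys, vs))
    (ya, va), (yb, vb) = pooled
    z = (ya - yb) / math.sqrt(va + vb)
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return expit(ya), expit(yb), p

# Hypothetical (TN, FP) counts; these are NOT data from the review.
low_prevalence = [(40, 20), (55, 30), (35, 15)]
high_prevalence = [(30, 4), (22, 5), (18, 2)]
spec_low, spec_high, p = compare_subgroups(low_prevalence, high_prevalence)
print(f"specificity {spec_low:.2f} vs {spec_high:.2f}, p = {p:.3f}")
```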
Because four studies used both ≥3 and ≥4 as cutoff values [17 22 23 29] and five studies assessed both any cancer and clinically significant cancer as outcomes [16 21 24 29 32], we performed multiple subgroup analyses to reflect different clinical settings (Supplementary Table 2). Regarding cutoff values, the pooled sensitivity was 0.89 (95% CI 0.84–0.92) with specificity of 0.74 (95% CI 0.58–0.85) for the 17 studies using ≥4 [16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 35], whereas these were 0.95 (95% CI 0.89–0.97) and 0.47 (95% CI 0.21–0.74) for the eight studies using ≥3 [17 22 23 29 32 33 34 36]. Stratifying the studies according to the outcome assessed yielded the following results: (1) cutoff of ≥4 for any PCa, sensitivity of 0.89 (95% CI 0.83–0.93) with specificity of 0.80 (95% CI 0.62–0.90); (2) cutoff of ≥3 for any PCa, sensitivity of 0.96 (95% CI 0.93–0.98) with specificity of 0.49 (95% CI 0.29–0.70); (3) csPCa regardless of cutoff value, sensitivity of 0.89 (95% CI 0.84–0.92) with specificity of 0.64 (95% CI 0.46–0.78); (4) cutoff of ≥4 for csPCa, sensitivity of 0.90 (95% CI 0.85–0.94) with specificity of 0.62 (95% CI 0.45–0.77); and (5) cutoff of ≥3 for csPCa, sensitivity of 0.96 (95% CI 0.87–0.99) with specificity of 0.29 (95% CI 0.05–0.77). When the studies using a cutoff value of ≥4 were assessed separately according to the type of analysis, per-patient analysis in eight studies [17 18 20 26 27 29 31 35] yielded a pooled sensitivity of 0.89 (95% CI 0.81–0.93) with specificity of 0.76 (95% CI 0.60–0.88), whereas per-lesion analysis in nine studies [16 19 21 22 23 24 25 28 30] yielded a pooled sensitivity of 0.87 (95% CI 0.83–0.91) with specificity of 0.70 (95% CI 0.44–0.88).
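The sensitivity/specificity trade-off between the ≥3 and ≥4 cutoffs follows directly from how a PI-RADS score is dichotomized: lowering the cutoff reclassifies every score-3 case as test-positive. The sketch below makes this concrete for a single cohort. The score-level counts are invented for illustration and the function name is hypothetical.

```python
def sens_spec_at_cutoff(counts_by_score, cutoff):
    """counts_by_score maps a PI-RADS score (1-5) to
    (n_with_cancer, n_without_cancer). A case is test-positive
    when its score is >= cutoff."""
    tp = sum(c for s, (c, _) in counts_by_score.items() if s >= cutoff)
    fn = sum(c for s, (c, _) in counts_by_score.items() if s < cutoff)
    fp = sum(n for s, (_, n) in counts_by_score.items() if s >= cutoff)
    tn = sum(n for s, (_, n) in counts_by_score.items() if s < cutoff)
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical cohort (NOT data from the review): 80 with cancer, 100 without.
counts = {1: (1, 20), 2: (3, 40), 3: (6, 25), 4: (30, 12), 5: (40, 3)}

for cutoff in (4, 3):
    sens, spec = sens_spec_at_cutoff(counts, cutoff)
    print(f"PI-RADS >= {cutoff}: sensitivity {sens:.3f}, specificity {spec:.2f}")
```

With these invented counts, moving from ≥4 to ≥3 raises sensitivity from 0.875 to 0.95 and lowers specificity from 0.85 to 0.60, the same direction of change as in the pooled estimates.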
By tumor location, the pooled sensitivity was 0.93 (95% CI 0.87–0.96) with specificity of 0.68 (95% CI 0.43–0.86) in the seven studies analyzing PZ cancers [20 24 25 28 30 31 33]. In the same studies, excluding that by Stanzione et al [31], which analyzed only PZ cancers, the pooled sensitivity and specificity for TZ cancers were 0.88 (95% CI 0.77–0.94) and 0.75 (95% CI 0.59–0.86), respectively.
Studies including patients without previous biopsies yielded a sensitivity of 0.82 (95% CI 0.72–0.90) and specificity of 0.75 (95% CI 0.65–0.83), whereas the corresponding values were 0.87 (95% CI 0.80–0.92) and 0.71 (95% CI 0.42–0.89) in studies including patients with a history of previous biopsy.
In our meta-analysis, we assessed the diagnostic accuracy of PI-RADSv2 for detecting PCa. The pooled sensitivity and specificity of all 21 studies were 0.89 (95% CI 0.86–0.92) and 0.73 (95% CI 0.60–0.83), respectively. Compared with the only two existing meta-analyses of mpMRI for detecting PCa, our data suggest a trend toward higher sensitivity and lower specificity. In the study by de Rooij et al [37], which evaluated seven studies using a combination of T2WI, DWI, and DCE-MRI, the pooled sensitivity and specificity were 0.74 (95% CI 0.66–0.81) and 0.88 (95% CI 0.82–0.92), respectively. In a more recent meta-analysis by Hamoen et al [6], which analyzed 14 studies using PI-RADSv1, the pooled sensitivity and specificity were 0.78 (95% CI 0.70–0.84) and 0.79 (95% CI 0.68–0.86), respectively. However, comparing the three meta-analyses provides only an indirect comparison. To address this issue, we separately assessed the subgroup of studies that applied both PI-RADSv1 and PI-RADSv2. In this head-to-head comparison, PI-RADSv2 demonstrated higher pooled sensitivity (0.95) than PI-RADSv1 (0.88, p = 0.04) without a statistically significant difference in specificity (0.73 vs 0.75, p = 0.90). This gain in sensitivity over its predecessor may imply that the revisions made during the development of PI-RADSv2 were, in fact, on the right track; these included the introduction of dominant sequences according to zonal anatomy, a limited role for DCE-MRI secondary to DWI and T2WI, and specific guidelines for deriving an integrated overall score. In particular, we speculate that the use of dominant sequences, that is, DWI for the PZ and T2WI for the TZ, may have been crucial for the improved sensitivity without a loss in specificity, as suggested by Baur et al [10].
Considering that one of the main intentions behind PI-RADS was to standardize the reporting of mpMRI in order to decrease variability and promote widespread acceptance and implementation in daily practice, it was encouraging to find that nearly all (20 of 21) studies applied PI-RADSv2 strictly according to the published guidelines [11]. Only one study derived PI-RADSv2 scores from existing clinical radiology reports based on PI-RADSv1 or an in-house scoring system [29]. This is an improvement over prior studies using PI-RADSv1, in which investigators used varying methods to determine the overall score (an overall five-point score or the sum of the scores from each modality) [6]. Still, the cutoff value for detecting PCa needs further clarification. Among the studies included in our meta-analysis, cutoff values were predefined in only six, while the majority (15/21) were exploratory, testing multiple criteria. A cutoff value of ≥4 yielded generally good sensitivity (0.89) and specificity (0.74), whereas ≥3 yielded excellent sensitivity (0.95) but poor specificity (0.47). These results may be taken into consideration when the next update of PI-RADS is developed. For instance, ≥4 may be adequate for general use, whereas ≥3 could be reserved for situations in which a higher cancer detection rate is clinically required (eg, a persistently high PSA level despite a previously negative biopsy).
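What the two cutoffs mean for an individual patient depends on the pre-test probability of cancer. The sketch below applies Bayes' theorem to the pooled estimates quoted above (≥4: 0.89/0.74; ≥3: 0.95/0.47) to obtain positive and negative predictive values. The 40% prevalence is a hypothetical value chosen only for illustration, and treating pooled estimates as exact is a simplification.

```python
def predictive_values(sens, spec, prevalence):
    """PPV and NPV from sensitivity, specificity, and pre-test prevalence."""
    tp = sens * prevalence
    fp = (1 - spec) * (1 - prevalence)
    tn = spec * (1 - prevalence)
    fn = (1 - sens) * prevalence
    return tp / (tp + fp), tn / (tn + fn)

prevalence = 0.40  # hypothetical pre-test probability, not from the review
for label, sens, spec in [(">=4", 0.89, 0.74), (">=3", 0.95, 0.47)]:
    ppv, npv = predictive_values(sens, spec, prevalence)
    print(f"cutoff {label}: PPV {ppv:.2f}, NPV {npv:.2f}")
```

At this assumed prevalence, ≥4 gives a PPV of about 0.70 and an NPV of about 0.91, whereas ≥3 gives a PPV of about 0.54 and an NPV of about 0.93: the lower cutoff buys a modest gain in NPV at a noticeable cost in PPV.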
In the current study, subgroup analyses were performed to account for differences in outcomes (any cancer vs clinically significant cancer). Results did not differ significantly between the two outcomes, regardless of whether a cutoff of ≥3 or ≥4 was used. However, the definition of clinically significant cancer varied among the 13 studies that assessed it. Only three studies defined csPCa strictly according to the PI-RADSv2 guidelines (Gleason score ≥7 [including 3 + 4], volume ≥0.5 ml, or extraprostatic extension) [11]; most others used one or two of the three criteria. Including only those three studies might have provided more robust results; however, including all available studies was pragmatic and provides a general overview of the existing literature, as this is the first meta-analysis of studies on PI-RADSv2.
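To see why differing csPCa definitions matter, the sketch below encodes the PI-RADSv2 definition (Gleason score ≥7 including 3 + 4, and/or volume ≥0.5 ml, and/or extraprostatic extension) next to the narrower "GS ≥7" rule that several included studies used. The `Lesion` record and both function names are hypothetical, introduced only for illustration.

```python
from dataclasses import dataclass

@dataclass
class Lesion:
    """Hypothetical pathology record for one tumour focus."""
    primary_gleason: int        # predominant Gleason pattern
    secondary_gleason: int      # secondary Gleason pattern
    volume_ml: float            # tumour volume in ml (cc)
    extraprostatic_extension: bool

def is_cspca_pirads_v2(lesion: Lesion) -> bool:
    """PI-RADSv2 definition: GS >= 7 (including 3 + 4), and/or
    volume >= 0.5 ml, and/or extraprostatic extension."""
    gleason_score = lesion.primary_gleason + lesion.secondary_gleason
    return (gleason_score >= 7
            or lesion.volume_ml >= 0.5
            or lesion.extraprostatic_extension)

def is_cspca_gleason_only(lesion: Lesion) -> bool:
    """Narrower rule used by several studies: GS >= 7 only."""
    return lesion.primary_gleason + lesion.secondary_gleason >= 7

# A 0.8-ml Gleason 3 + 3 tumour without EPE is significant under PI-RADSv2
# but not under the Gleason-only rule.
bulky_low_grade = Lesion(3, 3, 0.8, False)
print(is_cspca_pirads_v2(bulky_low_grade), is_cspca_gleason_only(bulky_low_grade))
```

The same reference-standard lesion can therefore count as a true positive in one study and a false positive in another, which is one source of the heterogeneity discussed here.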
In this meta-analysis, we also examined technical aspects of MRI. Meta-regression analyses revealed that the use of an endorectal coil was not a statistically significant factor. Furthermore, although magnet strength (3 vs 1.5 T) was associated with a statistically significant difference, this difference was not clinically meaningful (sensitivity of 0.90 vs 0.89, respectively; p = 0.03). Although both issues have been debated in the past, 3- and 1.5-T imaging are now both well established, and the overall benefit of an endorectal coil is not evident [38 39]. The PI-RADSv2 guidelines currently accept either option, and the results of our study provide additional evidence to support this.
Regarding the methods of analysis, the studies were heterogeneous in reference standard and type of analysis. Radical prostatectomy was the reference standard in five studies, whereas the others were based on biopsy (systematic and/or targeted). In these biopsy-based studies, the possibility of PCa despite negative biopsy results should be kept in mind. In addition, roughly half of the studies reported outcomes per patient (n = 11) and the other half per lesion (n = 10). Per-lesion analysis additionally reflects how well the disease is localized; however, type of analysis was not a significant factor in the meta-regression analysis.
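The difference between per-patient and per-lesion analysis can be shown with a toy example: per patient, a positive MRI counts as a true positive whenever the patient has cancer anywhere, even if the suspicious lesion is not the cancerous one, whereas per lesion the same situation yields one false positive and one false negative. The sketch below assumes lesion-level data of the form (PI-RADS score, cancer present); the data structure, function names, and the single patient are hypothetical. Real per-lesion analyses also differ in how true-negative lesions are defined, which this sketch does not model.

```python
def classify(test_pos, ref_pos):
    """Map a test result and a reference result to a 2x2 cell."""
    if test_pos:
        return "tp" if ref_pos else "fp"
    return "fn" if ref_pos else "tn"

def per_lesion_counts(lesions_by_patient, cutoff=4):
    """Each lesion is scored on its own."""
    counts = dict.fromkeys(("tp", "fp", "fn", "tn"), 0)
    for lesions in lesions_by_patient.values():
        for score, cancer in lesions:
            counts[classify(score >= cutoff, cancer)] += 1
    return counts

def per_patient_counts(lesions_by_patient, cutoff=4):
    """A patient is test-positive if any lesion scores >= cutoff and
    reference-positive if any lesion harbours cancer."""
    counts = dict.fromkeys(("tp", "fp", "fn", "tn"), 0)
    for lesions in lesions_by_patient.values():
        test_pos = any(score >= cutoff for score, _ in lesions)
        ref_pos = any(cancer for _, cancer in lesions)
        counts[classify(test_pos, ref_pos)] += 1
    return counts

# Hypothetical patient: a benign PI-RADS 5 lesion and a PI-RADS 2 cancer elsewhere.
data = {"A": [(5, False), (2, True)]}
print(per_patient_counts(data))  # patient level: counted as a true positive
print(per_lesion_counts(data))   # lesion level: one false positive, one false negative
```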
Our meta-analysis had some limitations. Nearly all studies were retrospective, resulting in a high risk of bias for patient selection. Pooling data from predominantly retrospective studies may have led to increased diagnostic sensitivity [40]. However, with only three prospective studies, a meta-analysis restricted to them was technically unfeasible, and its results would not have been representative of the existing literature on PI-RADSv2. Nevertheless, we used validated methods for the systematic review and reported the data according to standard reporting guidelines, including PRISMA and the Cochrane Collaboration's Handbook for Diagnostic Test Accuracy Reviews [12 41]. Another limitation is the considerable heterogeneity in our pooled analysis, which limits the general applicability of our summary estimates. To explore this heterogeneity, we performed meta-regression and multiple subgroup analyses, which identified the proportion of patients with PCa, the magnetic field strength, and the reference standard as significant factors. In particular, the reference standard comprised various methods, including radical prostatectomy and combinations of systematic and targeted biopsies (ie, MRI guided, MRI–transrectal ultrasound fusion, or cognitive). The use of various definitions of clinically significant cancer also needs to be emphasized. Our meta-regression and subgroup analyses may explain some of the heterogeneity, but a portion remains unexplained. Another important limitation is the small number of studies available for the head-to-head comparison between PI-RADSv1 and PI-RADSv2. Nonetheless, using the only six such studies published to date, we were able to demonstrate a statistically significant difference in sensitivity between the two versions.
PI-RADSv2 shows good performance for the detection of PCa with pooled sensitivity of 0.89 and specificity of 0.73. PI-RADSv2 has higher pooled sensitivity compared with PI-RADSv1 without significantly different specificity.
Author contributions: Sang Youn Kim had full access to all the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis.
Study concept and design: Woo, Suh, S.Y. Kim.
Acquisition of data: Woo, Suh, S.Y. Kim.
Analysis and interpretation of data: Woo, Suh, S.Y. Kim.
Drafting of the manuscript: Woo.
Critical revision of the manuscript for important intellectual content: Suh, S.Y. Kim, Cho, S.H. Kim.
Statistical analysis: Suh.
Obtaining funding: None.
Administrative, technical, or material support: None.
Supervision: S.Y. Kim, Cho, S.H. Kim.
Other: None.
Financial disclosures: Sang Youn Kim certifies that all conflicts of interest, including specific financial interests and relationships and affiliations relevant to the subject matter or materials discussed in the manuscript (eg, employment/affiliation, grants or funding, consultancies, honoraria, stock ownership or options, expert testimony, royalties, or patents filed, received, or pending), are the following: None.
Funding/Support and role of the sponsor: None.