Diagnostic Performance of Prostate Imaging Reporting and Data System Version 2 for Detection of Prostate Cancer: A Systematic Review and Diagnostic Meta-analysis

  • Sungmin Woo 1,
  • Chong Hyun Suh 2,
  • Sang Youn Kim 1,
  • Jeong Yeon Cho 1,
  • Seung Hyup Kim 1
1 Department of Radiology, Seoul National University College of Medicine, Seoul, Korea
2 Department of Radiology and Research Institute of Radiology, University of Ulsan College of Medicine, Asan Medical Center, Seoul, Republic of Korea

Take home message

In this meta-analysis, we looked at all previous studies that used PI-RADSv2. PI-RADSv2 shows good performance for the detection of prostate cancer with pooled sensitivity of 0.89 and specificity of 0.73. It has higher sensitivity than PI-RADSv1 without significantly different specificity.

Publication: European Urology, Volume 72, Issue 2, August 2017, Pages 177-188

PII: S0302-2838(17)30067-2

DOI: 10.1016/j.eururo.2017.01.042

Context

In 2015, the updated Prostate Imaging Reporting and Data System version 2 (PI-RADSv2) for the detection of prostate cancer (PCa) was established. Since then, several studies assessing the value of PI-RADSv2 have been published.


Objective

To review the diagnostic performance of PI-RADSv2 for the detection of PCa.


Evidence acquisition

MEDLINE and EMBASE databases were searched up to December 7, 2016. We included diagnostic accuracy studies that used PI-RADSv2 for PCa detection, using prostatectomy or biopsy as the reference standard. The methodological quality was assessed by two independent reviewers using the Quality Assessment of Diagnostic Accuracy Studies-2 tool. Sensitivity and specificity of all studies were calculated. Results were pooled and plotted in a hierarchical summary receiver operating characteristic plot with further exploration using meta-regression and multiple subgroup analyses. Head-to-head comparison between PI-RADSv1 and PI-RADSv2 was performed for available studies.

Evidence synthesis

Twenty-one studies (3857 patients) were included. The pooled sensitivity was 0.89 (95% confidence interval [CI] 0.86–0.92) with specificity of 0.73 (95% CI 0.60–0.83) for PCa detection. Proportion of patients with PCa, magnetic field strength, and reference standard were significant factors affecting heterogeneity ( p < 0.01). Multiple subgroup analyses showed consistent results. In six studies performing head-to-head comparison, PI-RADSv2 demonstrated higher pooled sensitivity of 0.95 (95% CI 0.85–0.98) compared with 0.88 (95% CI 0.80–0.93) for PI-RADSv1 ( p = 0.04). However, the pooled specificity was not significantly different (0.73 [95% CI 0.47–0.89] vs 0.75 [95% CI 0.36–0.94], respectively; p = 0.90).

Conclusions

PI-RADSv2 shows good performance for the detection of PCa. PI-RADSv2 has higher pooled sensitivity than PI-RADSv1 without significantly different specificity.

Patient summary

We reviewed all previous studies using Prostate Imaging Reporting and Data System version 2 (PI-RADSv2) for prostate cancer detection. We found that the updated PI-RADSv2 shows significant improvement compared with the original PI-RADSv1.

Prostate cancer (PCa) is the second leading cause of cancer-related mortality in Western men [1]. With recent technological advancements and growing availability, multiparametric magnetic resonance imaging (mpMRI) is increasingly being used to guide several aspects of PCa management, including detection, staging, and treatment planning [2]. Despite abundant evidence in the literature reporting high accuracy of mpMRI for PCa diagnosis, widespread acceptance has been hampered by several factors, including difficulty of interpretation, lack of standardized criteria for interpretation (ie, use of Likert scales based on the radiologist's subjective level of suspicion for PCa), and the resulting substantial inter-reader variability [3 4].

To bring standardization to the evaluation and reporting of mpMRI of the prostate, the European Society of Urogenital Radiology (ESUR) published a guideline termed Prostate Imaging Reporting and Data System (PI-RADS) in 2012 [5] . PI-RADS was generated based on expert consensus and provides a detailed scoring system for each MRI sequence (T2-weighted imaging [T2WI], diffusion-weighted imaging [DWI], dynamic contrast-enhanced MRI [DCE-MRI], and MR spectroscopy) for the presence of clinically significant PCa (csPCa). Several investigators have validated the accuracy and reproducibility of PI-RADS, and a recent meta-analysis reported pooled sensitivity and specificity of 0.78 and 0.79, respectively [6] . However, as there was no guideline for the generation of an overall score, different research groups utilized various measures for this purpose—some used a sum of the scores from each sequence (ranging from 3 to 15), whereas others used an overall score of 1–5 [7 8] . Furthermore, emerging data questioned the value of curve-type analysis of DCE-MRI [9] . In addition, investigators suggested that some sequences may be more important in determining the likelihood of PCa (ie, DWI in the peripheral zone [PZ] and T2WI in the transition zone [TZ]) rather than equal weighting for all sequences [10] .

To address these issues, the ESUR and American College of Radiology recently released the updated PI-RADS version 2 (PI-RADSv2) [11]. The main changes from PI-RADSv1 to PI-RADSv2 are the following: (1) introduction of dominant sequences according to zonal anatomy (DWI for the PZ and T2WI for the TZ), (2) limited contribution of DCE-MRI data, reduced to the mere presence or absence of early focal enhancement, and (3) generation of an overall score (1–5) integrating findings across all MRI sequences.

Since then, several studies assessing the value of PI-RADSv2 have been published. However, the diagnostic performance of this new scoring system has not been evaluated systematically. Therefore, the purpose of our study was to assess the diagnostic performance of PI-RADSv2 for the detection of PCa. In addition, we aimed to compare the diagnostic performance of PI-RADSv1 and PI-RADSv2 in studies available for head-to-head comparison.

This meta-analysis was performed and written according to the preferred reporting items for systematic reviews and meta-analyses (PRISMA) guidelines [12] .

2.1 Literature search

A computerized search of MEDLINE and EMBASE up to December 7, 2016, was performed in order to identify studies evaluating the diagnostic performance of PI-RADSv2 for the detection of PCa. The search query combined synonyms for PCa, MRI, and PI-RADS as follows: (prostate cancer OR prostatic cancer OR prostate neoplasm OR prostatic neoplasm OR prostate tumor OR prostatic tumor OR prostate carcinoma OR prostatic carcinoma OR PCa) AND (magnetic resonance imaging OR MRI OR MR) AND (prostate imaging reporting and data system OR pi-rads OR pi rads OR pirads). Bibliographies of identified articles were also screened to expand the scope of search. Our search was limited to publications in English.

2.2 Study selection

2.2.1 Inclusion criteria

Studies were included if they satisfied all the following requirements according to the PICOS criteria [12] : (1) included patients with suspected or diagnosed PCa; (2) for index test, mpMRI of the prostate including all required sequences of T2WI, DWI, and DCE-MRI was performed and assessed with a PI-RADSv2 scoring system; (3) for comparison, a reference standard based on the histopathological examination of radical prostatectomy or biopsy was used; (4) results were reported in sufficient detail for the reconstruction of 2 × 2 tables and determination of sensitivity and specificity at specified cutoff values for evaluating the diagnostic performance of PI-RADSv2; and (5) studies had to be original articles.

2.2.2 Exclusion criteria

Studies were excluded if any of the following criteria were met: (1) studies involving <10 patients; (2) review articles, guidelines, consensus statements, letters, editorials, and conference abstracts; (3) studies using only PI-RADSv1 for the evaluation of mpMRI of the prostate; (4) studies focusing on topics other than using the PI-RADSv2 system for diagnosing PCa (ie, staging and prediction of biochemical recurrence); and (5) studies with overlapping patient population.

Two reviewers (S.W. and C.H.S., with 3 yr of experience in performing systematic reviews and meta-analyses) independently evaluated the eligibility of the selected studies from the literature. Disagreements, if present between the two reviewers, were resolved by consensus via discussion with a third reviewer (S.Y.K.).

2.3 Data extraction and quality assessment

We extracted the following data regarding study design and results from the selected studies using a standardized form:

  1. Study characteristics: authors, year of publication, country and institution of origin, duration of patient recruitment, and study design (prospective vs retrospective and consecutive or not)
  2. Demographic and clinical characteristics: sample size, number of patients with PCa, patient age, prostate-specific antigen (PSA) level and Gleason score, number of previous biopsies, and PCa diagnosis prior to mpMRI
  3. Technical characteristics of mpMRI: scanner model and manufacturer, magnetic field strength (1.5 vs 3 T), coil type (endorectal vs pelvic phased array), and specific sequences used (T2WI, DWI, DCE, or MR spectroscopy)
  4. Interpretation of mpMRI: number of reviewers and experience in prostate mpMRI, independent or consensus reading, and blinding to clinicopathological information
  5. Reference standard: type of reference standard (radical prostatectomy, targeted biopsy, or systematic biopsy), interval between MRI and pathology, outcomes assessed (any PCa vs csPCa), definition of csPCa (studies assessing “clinically significant”, “aggressive”, or “high-grade” PCa were all considered to assess csPCa; however, only studies that used the definition as provided by the PI-RADSv2 guideline [Gleason score ≥7 (3 + 4), volume >0.5 ml, or extraprostatic extension] were considered not to have concern for applicability), separate analysis for the PZ and TZ, and type of analysis (per patient vs per lesion)
  6. Diagnostic performance of PI-RADSv2, including criteria or cutoff values (in case of multiple readers, the results of the most experienced reader were extracted for this meta-analysis)

The methodological quality of the included studies was assessed using tailored questionnaires and criteria provided by Quality Assessment of Diagnostic Accuracy Studies-2 [13] . Data extraction and quality assessment were performed independently by two reviewers (S.W. and C.H.S.). All disagreements were resolved by consensus through discussion with the third reviewer (S.Y.K.).

2.4 Data synthesis and analysis

The diagnostic performance of PI-RADSv2 for the detection of PCa was the primary outcome for this meta-analysis. In addition, a comparison between the diagnostic performance of PI-RADSv2 and that of PI-RADSv1 using studies that reported head-to-head comparison data of the two PI-RADS versions was considered a secondary outcome.

Pooled estimates of sensitivity and specificity were calculated using hierarchical logistic regression modeling including bivariate modeling and hierarchical summary receiver operating characteristic (HSROC) modeling [14] . For graphical presentation of the results, an HSROC curve with 95% confidence region and prediction region was plotted. Publication bias was evaluated using the Deeks’ funnel plot, and statistical significance was tested with the Deeks’ asymmetry test [15] .
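As a rough illustration of the inputs to such a model, the sketch below computes per-study sensitivity and specificity from a 2 × 2 table, together with their logit transforms and approximate within-study variances. It is not the bivariate or HSROC model itself, which was fitted with the Stata and R packages named in this section; the function names, the 0.5 continuity correction, and the example counts are assumptions made for illustration only.

```python
import math


def sens_spec(tp: int, fp: int, fn: int, tn: int) -> tuple[float, float]:
    """Sensitivity and specificity from the four cells of a 2 x 2 table."""
    return tp / (tp + fn), tn / (tn + fp)


def logit_and_variance(events: int, non_events: int) -> tuple[float, float]:
    """Logit of a proportion and its approximate variance.

    A 0.5 continuity correction is added to both cells (an assumption of
    this sketch) so that proportions of 0 or 1 remain finite.
    """
    a = events + 0.5
    b = non_events + 0.5
    return math.log(a / b), 1.0 / a + 1.0 / b


# Hypothetical study, not taken from any included article:
# 89 true positives, 27 false positives, 11 false negatives, 73 true negatives.
sensitivity, specificity = sens_spec(tp=89, fp=27, fn=11, tn=73)
logit_sens, var_sens = logit_and_variance(89, 11)
logit_spec, var_spec = logit_and_variance(73, 27)
print(f"sensitivity={sensitivity:.2f}, specificity={specificity:.2f}")
```

A hierarchical model would then pool the study-level logit pairs while accounting for both their within-study variances and the between-study correlation of sensitivity and specificity.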

We performed meta-regression analyses to investigate the cause of heterogeneity. The following covariates were considered for the bivariate model: (1) proportion of patients with PCa (>50% vs ≤50%), (2) magnet strength of MRI (3 vs 1.5 T), (3) use of endorectal coil, (4) cutoff value (≥4 vs ≥3), (5) reference standard (radical prostatectomy vs biopsy), and (6) type of analysis (per patient vs per lesion). In addition, multiple subgroup analyses were performed for cutoff value, outcome, and previous biopsy history to assess various clinical settings: (1) a cutoff value of ≥4 for all studies, (2) a cutoff value of ≥3 for all studies, (3) a cutoff value of ≥4 for determining any PCa, (4) a cutoff value of ≥3 for determining any PCa, (5) a cutoff value of ≥4 for determining csPCa, (6) a cutoff value of ≥3 for determining csPCa, (7) a cutoff value of ≥4 in studies using per-patient analysis, (8) a cutoff value of ≥4 in studies using per-lesion analysis, (9) studies analyzing PZ PCa, (10) studies analyzing TZ PCa, (11) patients without previous biopsies, and (12) patients with previous biopsies. The “metandi” and “midas” modules in Stata 10.0 (StataCorp LP, College Station, TX, USA) and “mada” package in R software version 3.2.1 (R Foundation for Statistical Computing, Vienna, Austria) were used for statistical analyses, with p < 0.05 signifying statistical significance.
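The role of the cutoff value in these subgroup analyses can be sketched as follows: the same set of PI-RADSv2 overall scores produces a different 2 × 2 table depending on whether ≥4 or ≥3 is called positive. The function name and the toy data are hypothetical and serve only to illustrate the dichotomization step.

```python
def two_by_two(scores, has_cancer, cutoff):
    """Count (tp, fp, fn, tn) when a score >= cutoff is called test positive."""
    tp = fp = fn = tn = 0
    for score, cancer in zip(scores, has_cancer):
        positive = score >= cutoff
        if positive and cancer:
            tp += 1
        elif positive:
            fp += 1
        elif cancer:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


# Hypothetical PI-RADSv2 overall scores (1-5) and reference-standard results.
scores = [5, 4, 3, 3, 2, 1, 4, 2]
has_cancer = [True, True, True, False, False, False, False, True]

for cutoff in (4, 3):
    tp, fp, fn, tn = two_by_two(scores, has_cancer, cutoff)
    print(f"cutoff >={cutoff}: sensitivity={tp / (tp + fn):.2f}, "
          f"specificity={tn / (tn + fp):.2f}")
```

In this toy data set, lowering the cutoff from ≥4 to ≥3 raises sensitivity from 0.50 to 0.75 and lowers specificity from 0.75 to 0.50.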

3.1 Literature search

A systematic literature search initially identified 287 articles. After removing 46 duplicates, screening of the 241 titles and abstracts yielded 105 potentially eligible articles. Full-text reviews were performed, and 84 studies were excluded for the following reasons: not in the field of interest (n = 80, including 68 studies that used only PI-RADSv1), insufficient data to reconstruct 2 × 2 tables (n = 2), and shared study population with other studies (n = 2). Ultimately, 21 original articles including a total of 3857 patients assessing the diagnostic performance of PI-RADSv2 were included in the meta-analysis [16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36]. No additional studies were identified via screening the bibliographies of these 21 studies. Among them, 15 studies including 3099 patients dealt with PI-RADSv2 alone, whereas six studies including 758 patients provided a head-to-head comparison between PI-RADSv1 and PI-RADSv2 [16 20 21 28 32 33]. The detailed study selection process is described in Fig. 1.
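The counts reported in this paragraph can be cross-checked with simple arithmetic. The sketch below uses only numbers stated in the text; the variable names are illustrative.

```python
# Study selection counts as reported in the text.
identified = 287
duplicates = 46
screened = identified - duplicates  # titles and abstracts screened

full_text = 105  # potentially eligible articles reviewed in full
excluded = {
    "not in the field of interest": 80,
    "insufficient data for 2 x 2 tables": 2,
    "shared study population": 2,
}
included = full_text - sum(excluded.values())

# Patients by study type, as reported in the text.
patients = {"PI-RADSv2 alone (15 studies)": 3099, "head-to-head (6 studies)": 758}

print(screened, included, sum(patients.values()))
```

The three printed values should match the 241 screened records, 21 included studies, and 3857 patients stated above.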

Fig. 1
PRISMA flow diagram showing study selection process for meta-analysis. a Included original articles for qualitative and quantitative analyses are references [16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36]. MRI = magnetic resonance imaging; PI-RADS = Prostate Imaging Reporting and Data System; PRISMA = Preferred Reporting Items for Systematic Reviews and Meta-analyses.

3.2 Characteristics of included studies

Patient characteristics are shown in Table 1 . The size of the study population ranged from 49 to 456 patients, with the percentage of those with PCa ranging from 37% to 100%. The patients had a median age of 62–69.6 yr, median PSA of 3.97–15 ng/ml, and a Gleason score ranging from 5 to 10. Patients had already been diagnosed with PCa prior to MRI in all or some of the study populations in seven studies [16 17 22 26 27 32 35] . Biopsy was performed before MRI in seven studies [16 21 22 23 26 27 35] , all patients were biopsy-naïve in three studies [18 25 34] , both patient types were included in three studies [28 32 33] , and data regarding previous biopsy were not reported in the remaining eight studies.

First author (year) | Country | Institution | Duration of patient recruitment | Patients (n) | Patients with PCa (n) | Age, median (yr) | Age, range (yr) | PSA, median (ng/ml) | PSA, range (ng/ml) | Gleason score, median | Gleason score, range | No. of previous biopsies | PCa diagnosis before MRI
Auer (2016) [16] | Austria | Medical University of Innsbruck, Medizinische Hochschule Hannover | NR | 50 | NR | 63 a | NR | 7.3 a | NR | 7 (3 + 4) | 6–10 | ≥1 | Yes (all)
Baldisserotto (2016) [17] | Brazil | Pontifícia Universidade Católica do Rio Grande do Sul | 2013.6–2015.6 | 54 | 33 | 65.9 a | 53–81 | 8.4 a | 3–31 | 7 (3 + 4) | 6–9 | NR | Yes (some)
De Visschere (2017) [18] | Belgium | Ghent University Hospital | 2011.5–2014.12 | 245 | 144 | 66 | 44–85 | 9 | 1.4–935.5 | 7 (3 + 4) | ≥6 | 0 | No
El-Samei (2016) [19] | Egypt | Al-Azher University, El-Minia University | 2014.5–2015.10 | 55 | 38 | 62 a | 51–79 | NR | NR | 7 | 6–10 | NR | No
Feng (2016) [20] | China | Tongji Hospital | 2013.6–2015.7 | 401 | 150 | 64.4 a | 34–88 | 10.7 | 0.2–1763 | 7 (4 + 3) | ≤6–≥8 | NR | NR
Kasel-Seibert (2016) [21] | Germany | University Hospital Jena | 2013.7–2015.3 | 82 | 31 | 65 | 48–81 | 13 | 1–111 | 7 (3 + 4) | 6–9 | 1–5 | No
Lin (2016) [22] | Brazil, Taiwan | Ribeirão Preto School of Medicine, China Medical University, Ribeirão Preto School of Medicine | 2011.5–2014.6 | 49 | 49 | 63 | 46–73 | 13.27 | 1.8–41.4 | | | ≥1 | Yes (all)
Martorana (2016) [23] | Italy | University of Modena and Reggio Emilia, Perugia University | 2014.1–2016.2 | 157 | 79 | 65 | 47–79 b | 10.7 | 1–75 | 6 | 6–8 | 1–3 | No
Mertan (2016) [24] | USA | National Cancer Institute | 2015.3–2015.9 | 62 | 38 | 65.5 | 50.3–76.6 | 7.1 | 0.5–863 | 7 (3 + 4) | 6–9 | NR | NR
Muller (2015) [25] | USA, the Netherlands | National Cancer Institute, AMC University Hospital, Edward Hébert School of Medicine | 2011.12–2014.5 | 94 | 94 | 62 | 37–79 | 8.51 | 0.7–51.1 | 7 (3 + 4) | 6–10 | 0 | No
Park (2016) [26] | South Korea | Samsung Medical Center | 2012.1–2014.12 | 456 | 456 | 65 | 46–81 | 3.97 | 3.8–4.2 | NR | ≥6 | ≥1 | Yes (all)
Park (2016) [27] | South Korea | Yonsei University College of Medicine | 2012.1–2013.3 | 425 | 425 | NR | NR | NR | 1.4–156.9 | 5–6 | 5–10 | ≥1 | Yes (all)
Polanec (2016) [28] | Austria | Medical University of Vienna, Confraternität Vienna | 2011.6–2015.9 | 65 | NR | 65.3 a | 62.3–87.4 | 10.8 a | 4.2–74.5 | 7 (4 + 3) | 6–9 | ≥0 | No
Rastinehad (2015) [29] | USA | Icahn School of Medicine at Mount Sinai, Fox Chase Cancer Center, National Institutes of Health, Hofstra North Shore LIJ School of Medicine | 2012.2–2014.11 | 312 | 202 | 65.1 | 60.3–70.3 b | 7.3 | 5.0–11.4 b | NR | NR | NR | No
Rosenkrantz (2016) [30] | USA | NYU Langone Medical Center | 2013.9–2015.2 | 343 | 134 | 64 | NR | 5.8 | NR | 7 (3 + 4) | 6–9 | NR | NR
Stanzione (2016) [31] | Italy | University “Federico II”, Ospedale S. Maria delle Grazie | NR | 82 | 34 | 65 a | 43–84 | 8.8 a | NR | 7 (3 + 4) | 6–9 | NR | No
Tan (2016) [32] | USA, Taiwan | UCLA, Cathay General Hospital, Hoag Hospital | 2013.3–2016.12 | 106 | 63 | 66.5 | 43–79 | 7.9 | 5.6–10.6 | 7 (3 + 4) | 6–10 | ≥0 | Yes (some)
Tewes (2016) [33] | Germany | Hannover Medical School, Klinikum der Region Hannover | 2012.12–2014.12 | 54 | 31 | 69.6 a | NR | 8.7 a | NR | 6 | 6–9 | ≥0 | No
Washino (2017) [34] | Japan | Jichi Medical University Saitama Medical Center | 2010.6–2014.4 | 288 | 159 | 69 | 64–74 b | 7.5 | 5.5–11 b | NR | NR | 0 | No
Woo (2016) [35] | South Korea | Seoul National University College of Medicine | 2011.1–2013.12 | 105 | 105 | 69 | 49–79 | 8.22 | 0.9–44.2 | 7 | 6–7 | ≥1 | Yes (all)
Zhao (2016) [36] | China | Peking University First Hospital | 2010.11–2013.12 | 372 | 185 | 68.5 a | NR | 15 a | NR | 7 | 6–10 | NR | No

a Mean.

b Interquartile range.

MRI = magnetic resonance imaging; NR = not reported; PCa = prostate cancer; PSA = prostate-specific antigen.

Table 1. Patient characteristics

Characteristics of the studies are summarized in Table 2. MRI was performed using 3-T scanners in 16 studies [16 17 18 20 24 25 26 27 28 29 30 31 32 33 35 36], 1.5-T scanners in four studies [19 21 22 23], and either 3 or 1.5 T in one study [34]. Endorectal coils were used in seven studies [19 21 22 23 24 25 29]. In all studies, the mpMRI protocol consisted of T2WI, DWI, and DCE-MRI. The reference standard was radical prostatectomy in five studies [16 22 26 27 35], a combination of systematic and targeted biopsies in seven studies [17 20 23 25 29 34 36], and only targeted biopsy in seven studies [19 21 24 28 30 32 33]; the reference standard was not consistent throughout the study population in two studies [18 31]. PI-RADSv2 scoring was performed by one to five radiologists, either in consensus or independently. The level of experience of the radiologists was heterogeneous, ranging from 4 to 22 yr of experience in prostate imaging. In most studies, the readers were blinded; however, in three studies, the radiologists were aware that the patients had biopsy-proven PCa [22 26 35], and five studies were not explicit regarding blinding [18 19 30 32 36]. In the majority of the studies, the interval between MRI and the reference standard was less than 6 mo; however, the details were not reported in 10 studies [16 17 18 24 27 28 29 30 32 33]. PCa was separately assessed according to zonal anatomy in seven studies [20 24 25 28 30 31 33]. However, in one study [31], only PCa in the PZ could be evaluated, as no detailed data were provided in the article and the attempt to contact the authors for provision of further information was unsuccessful. Regarding the outcome assessed, eight studies evaluated any cancer [17 19 20 23 25 28 31 33], eight evaluated clinically significant cancer [18 22 26 27 30 34 35 36], and five evaluated both [16 21 24 29 32]. With regard to cutoff values, 13 studies used ≥4 [16 18 19 20 21 24 25 26 27 28 30 31 35], four studies used ≥3 [32 33 34 36], and four studies used both [17 22 23 29]. The location of PCa was separately reported by the PZ and TZ in six studies [20 24 25 28 30 33].

First author (year) | Design | Consecutive enrollment | Reference standard | MRI–reference standard interval | No. of readers (reading mode) | Reader experience (yr) | Reader blinding | MRI magnet strength (T) | MRI vendor | MRI model | Endorectal coil | PI-RADSv2 application | Cutoff values | Localization | Type of analysis | Outcome assessed | Definition of csPCa
Auer (2016) [16] | Retrospective | Yes | RP | NR | 1 | >5 | Yes | 3 | Siemens | Skyra | No | Strict | 4 | Whole | Lesion | Any + csPCa | GS ≥7 (4 + 3)
Baldisserotto (2016) [17] | Retrospective | Yes | STRUSGB + targeted MRI–TRUS biopsy (cognitive) or RP | NR | 2 (independent) | >10/1 | Yes | 3 | GE | Signa HDxt | No | Strict | 3, 4 | Whole | Patient | Any |
De Visschere (2017) [18] | Retrospective | NR | STRUSGB or RP | NR | NR | NR | NR | 3 | Siemens | Trio | No | Strict | 4 | Whole | Patient | csPCa | GS ≥7, ≥0.5 cc, or EPE
El-Samei (2016) [19] | NR | NR | Targeted MRI–TRUS biopsy | ≤2 wk | NR | NR | NR | 1.5 | Philips | Gyroscan | Yes | Strict | 4 | Whole | Lesion | Any |
Feng (2016) [20] | Retrospective | Yes | STRUSGB + targeted MRI–TRUS biopsy (cognitive) | 1–35 d | 2 (consensus) | 5/4 | Yes | 3 | Siemens | Skyra | No | Strict | 4 | PZ, TZ | Patient | Any |
Kasel-Seibert (2016) [21] | Retrospective | NR | Targeted MRGB | ≤7 d | 2 (independent) | 10/<1 | Yes | 1.5 | Siemens | Avanto | Yes | Strict | 4 | Whole | Lesion | Any + csPCa | GS ≥7
Lin (2016) [22] | Retrospective | NR | RP | ≤6 mo | 2 (independent) | 12/5 | Yes a | 1.5 | Philips | Achieva | Yes | Strict | 3, 4 | Whole | Lesion | csPCa | GS ≥7 and >0.5 cc
Martorana (2016) [23] | Retrospective | Yes | TTB + targeted MRI–TRUS biopsy | 30–78 | 2 (consensus) | 4/4 | Yes | 1.5 | Philips | Achieva | Yes | Strict | 3, 4 | Whole | Lesion | Any | GS ≥7, >0.5 cc, or EPE
Mertan (2016) [24] | Prospective | Yes | STRUSGB + targeted MRI–TRUS biopsy | NR | 1 | >8 | Yes | 3 | Philips | Achieva | Yes | Strict | 4 | PZ, TZ | Lesion | Any + csPCa | GS ≥7
Muller (2015) [25] | Retrospective | Yes | STRUSGB + targeted MRI–TRUS biopsy | ≤6 wk | 5 (independent) | 12/7/1/1/0.5 | Yes | 3 | Philips | Achieva | Yes | Strict | 4 | PZ, TZ | Lesion | Any | GS ≥7 (4 + 3)
Park (2016) [26] | Retrospective | Yes | RP | 21–48 | 2 (independent) | 14/3 | Yes a | 3 | Philips | Achieva | No | Strict | 4 | Whole | Patient | csPCa | GS ≥7, ≥0.5 cc, or EPE
Park (2016) [27] | Retrospective | NR | RP | NR | 2 (independent) | 9/4 | Yes | 3 | GE, Philips, Siemens | Discovery MR750, Achieva, Trio | No | Strict | 4 | Whole | Patient | csPCa | GS ≥7, ≥0.5 cc, or EPE
Polanec (2016) [28] | Retrospective | Yes | Targeted MRGB | NR | 2 (independent) | NR/NR | Yes | 3 | Siemens | Trio | No | Strict | 4 | PZ, TZ | Lesion | Any |
Rastinehad (2015) [29] | Prospective | NR | STRUSGB + targeted MRI–TRUS biopsy | NR | 3 (consensus) | NR | Yes | 3 | Siemens | Verio | Yes | Calculated b | 3, 4 | Whole | Patient | Any + csPCa | Epstein's criteria c for systematic and GS ≥7 or >0.5 cc for targeted biopsy
Rosenkrantz (2016) [30] | Retrospective | NR | Targeted MRI–TRUS biopsy | NR | 2 (independent) | 8/3 | NR | 3 | Siemens | Biograph mMR/Prisma/Skyra/Trio | No | Strict | 4 | PZ, TZ | Lesion | csPCa | GS ≥7
Stanzione (2016) [31] | Prospective | NR | STRUSGB (±targeted TRUSGB) | 20–30 d | 2 (independent) | 14/10 | Yes | 3 | Siemens | Trio | No | Strict | 4 | PZ | Patient | Any |
Tan (2016) [32] | Retrospective | Yes | Targeted MRGB | NR | 3 (consensus) | 16/10 | NR | 3 | Siemens | Skyra, Trio, Verio | No | Strict | 3 | Whole | Lesion | Any + csPCa | GS ≥7
Tewes (2016) [33] | Retrospective | Yes | Targeted MRGB | NR | 2 (independent) | 5/2 | Yes | 3 | Siemens | Skyra | No | Strict | 3 d | PZ, TZ | Patient | Any |
Washino (2017) [34] | Retrospective | NR | TTB + targeted MRI–TRUS biopsy (cognitive) | 0.5–1.6 mo | 1 | 14 | Yes | 1.5 or 3 | Toshiba | Excelart Vantage/Vantage Titan 3T | No | Strict | 3 | Whole | Patient | csPCa | GS ≥7 or maximum core length ≥4 mm
Woo (2016) [35] | Retrospective | NR | RP | ≤6 mo | 2 (consensus) | 22/10 | Yes a | 3 | Philips, Siemens | Ingenia, Verio/Trio | No | Strict | 4 | Whole | Patient | csPCa | GS ≥7
Zhao (2016) [36] | Retrospective | NR | STRUSGB + targeted MRI–TRUS biopsy | ≤3 mo | 2 (independent) | NR | NR | 3 | NR | NR | No | Strict | 3 | Whole | Patient | csPCa | GS ≥7

a Blinded but aware that patients had biopsy-proven PCa.

b PI-RADSv2 scores were generated from reports based on PI-RADSv1 and simplified qualitative system.

c Epstein's criteria = Gleason pattern ≥4, or Gleason 3 + 3 disease with core length ≥50% and/or >2 cores positive.

d For transition zone, cutoff value = 4.

csPCa = clinically significant prostate cancer; GS = Gleason score; EPE = extraprostatic extension; MRGB = magnetic resonance imaging-guided biopsy; MRI = magnetic resonance imaging; MRI–TRUS = fusion of magnetic resonance imaging and transrectal ultrasound images; NR = not reported; PCa = prostate cancer; PI-RADSv2 = Prostate Imaging Reporting and Data System version 2; PZ = peripheral zone; RP = radical prostatectomy; STRUSGB = systematic transrectal ultrasound-guided biopsy; TTB = transperineal template biopsy; TZ = transition zone.

Table 2. Study characteristics

3.3 Quality assessment

Overall, the quality of the studies was not considered high, mainly due to the patient selection domain (Fig. 2). Regarding the patient selection domain, there was generally a high risk of bias as all but four of the studies were retrospective in nature [16 17 18 20 21 22 23 25 26 27 28 30 32 33 34 35 36]. Seven studies were considered to have high concern for applicability, as all or some of the patients had a pathological diagnosis of PCa prior to MRI [16 17 22 26 27 32 35]. Regarding the index test domain, there was a high risk of bias in nine studies. In three of these nine studies, reviewers were aware that patients had biopsy-proven PCa [22 26 35]. In the other six studies, the cutoff value for determining PCa was not specified prior to interpretation [16 20 29 31 33 34]. Only one study had concern for applicability, as PI-RADSv2 scores were indirectly generated from existing clinical radiological reports based on PI-RADSv1 or an in-house scoring system [29]. Regarding the reference standard domain, eight studies had a high risk of bias. Seven were based on either only systematic biopsy or targeted biopsy [18 19 21 28 30 32 33]; in one study, targeted biopsy was performed, but on lesions that were suspicious on ultrasonography, and not on MRI [31]. Those in which radical prostatectomy or a systematic plus targeted biopsy (MRI guided, MRI–transrectal ultrasound fusion, or cognitive) was used as the reference standard were considered to have a low risk of bias. In 10 studies, the definition of clinically significant cancer did not follow that described in the PI-RADSv2 guidelines, and these studies therefore showed high concern for applicability [16 21 22 24 29 30 32 34 35 36]. Regarding the flow and timing domain, two studies had a high risk of bias as patients did not receive the same reference standard [17 18].

Fig. 2
Grouped bar charts show (A) risk of bias and (B) concerns for applicability of 21 included studies using QUADAS-2. QUADAS-2 = Quality Assessment of Diagnostic Accuracy Studies-2.

3.4 Diagnostic accuracy of PI-RADSv2

The sensitivity and specificity of individual studies were 73–100% and 8–100%, respectively. The Q-test revealed that substantial heterogeneity was present (p < 0.001). The Higgins I² statistic demonstrated substantial heterogeneity in terms of the sensitivity (I² = 85.55%) and considerable heterogeneity in terms of the specificity (I² = 95.30%). The coupled forest plot of the sensitivity and specificity demonstrated the absence of a threshold effect (Fig. 3). The Spearman correlation coefficient between the sensitivity and the false-positive rate was 0.45 (95% confidence interval [CI] 0.023–0.738), also indicating the lack of a threshold effect.
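For reference, the Higgins I² statistic is conventionally derived from Cochran's Q as I² = 100% × (Q − df)/Q, truncated at zero, where df is the number of studies minus one. The sketch below assumes this standard definition; the Q value in the example is hypothetical, because the Q statistics themselves are not reported in the text.

```python
def i_squared(q: float, n_studies: int) -> float:
    """Higgins I^2 (in percent) from Cochran's Q and the number of studies."""
    df = n_studies - 1
    if q <= 0:
        return 0.0
    # Truncate at zero: Q below its degrees of freedom means no excess heterogeneity.
    return max(0.0, 100.0 * (q - df) / q)


# Hypothetical Q for 21 studies (df = 20).
print(i_squared(q=100.0, n_studies=21))  # 80.0
```

Under this definition, higher values indicate that a larger share of the observed variation reflects between-study differences rather than chance.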

Fig. 3
Coupled forest plots of pooled sensitivity and specificity. Numbers are pooled estimates with 95% CI in parentheses. Corresponding heterogeneity statistics are provided at bottom right corners. Horizontal lines indicate 95% CIs. CI = confidence interval.

For all 21 studies combined, the pooled sensitivity was 0.89 (95% CI 0.86–0.92) with specificity of 0.73 (95% CI 0.60–0.83; Fig. 3 ). In the HSROC curve, there was a large difference between the 95% confidence region and the 95% prediction region, thus indicating heterogeneity between the studies ( Fig. 4 ). The area under the HSROC curve was 0.91 (95% CI 0.88–0.93). According to the Deeks’ funnel plot, the likelihood of publication bias was low, with a p value of 0.75 for the slope coefficient ( Fig. 5 ).
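To make the pooled estimates more tangible, the following sketch converts the pooled sensitivity and specificity into likelihood ratios and a diagnostic odds ratio. These derived values are not reported in the article, and computing them from pooled point estimates is only an approximation; the function name is illustrative.

```python
def likelihood_ratios(sensitivity: float, specificity: float) -> tuple[float, float]:
    """Positive and negative likelihood ratios from sensitivity and specificity."""
    lr_pos = sensitivity / (1.0 - specificity)
    lr_neg = (1.0 - sensitivity) / specificity
    return lr_pos, lr_neg


# Pooled estimates for all 21 studies, as reported in the text.
lr_pos, lr_neg = likelihood_ratios(sensitivity=0.89, specificity=0.73)
dor = lr_pos / lr_neg  # diagnostic odds ratio

print(f"LR+ = {lr_pos:.2f}, LR- = {lr_neg:.2f}, DOR = {dor:.1f}")
```

With these inputs the positive likelihood ratio is roughly 3.3 and the negative likelihood ratio roughly 0.15, which fits a test that is better at ruling out than at ruling in disease.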

Fig. 4
Hierarchical summary receiver operating characteristic curve of the diagnostic performance of PI-RADSv2 for detecting prostate cancer. HSROC = hierarchical summary receiver operating characteristic; PI-RADSv2 = Prostate Imaging Reporting and Data System version 2.

Fig. 5
Deeks’ funnel plot. A p value of 0.75 suggests that the likelihood of publication bias is low.

For the six studies that provided a head-to-head comparison between PI-RADSv1 and PI-RADSv2, PI-RADSv2 demonstrated higher pooled sensitivity of 0.95 (95% CI 0.85–0.98) compared with 0.88 (95% CI 0.80–0.93) for PI-RADSv1 ( p = 0.04). However, the pooled specificity did not show a significant difference between the two versions of PI-RADS: 0.73 (95% CI 0.47–0.89) for v2 and 0.75 (95% CI 0.36–0.94) for v1 ( p = 0.90).

3.5 Heterogeneity exploration

As we found considerable heterogeneity among the included studies, meta-regression analyses were performed (Supplementary Table 1). Among several potential variables, proportion of patients with PCa, magnetic field strength, and reference standard were significant factors affecting the heterogeneity (p < 0.01 for all). However, among these three, only specificity according to the proportion of patients with PCa showed a statistically significant and clinically meaningful difference: 0.65 (95% CI 0.52–0.78) in studies with ≤50% of patients with PCa versus 0.86 (95% CI 0.75–0.97) in studies with >50% of patients with PCa. Otherwise, no clinically meaningful differences were seen: for magnet strength (3 vs 1.5 T), sensitivity of 0.90 (95% CI 0.86–0.94) versus 0.89 (95% CI 0.81–0.97, p = 0.03) and specificity of 0.73 (95% CI 0.59–0.86) versus 0.72 (95% CI 0.44–1.00, p = 0.81); for reference standard (radical prostatectomy vs biopsy), sensitivity of 0.89 (95% CI 0.83–0.95) versus 0.91 (95% CI 0.88–0.94, p < 0.01) and specificity of 0.65 (95% CI 0.37–0.94) versus 0.73 (95% CI 0.58–0.87, p = 0.48). Other variables, including the cutoff value, use of endorectal coil, and type of analysis (per patient vs per lesion), were not significant factors (p = 0.32–0.70).

3.6 Subgroup analysis

As four studies used both ≥3 and ≥4 as cutoff values [17 22 23 29] and five studies determined both any cancer and clinically significant cancer as outcomes [16 21 24 29 32], multiple subgroup analyses were performed in order to assess various clinical settings (Supplementary Table 2). Regarding cutoff values, the pooled sensitivity was 0.89 (95% CI 0.84–0.92) with specificity of 0.74 (95% CI 0.58–0.85) for 17 studies using ≥4 [16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 35], whereas these were 0.95 (95% CI 0.89–0.97) and 0.47 (95% CI 0.21–0.74) in eight studies using ≥3 [17 22 23 29 32 33 34 36]. Stratifying the studies according to the outcome assessed yielded the following results: (1) cutoff of ≥4 for determining any PCa, sensitivity of 0.89 (95% CI 0.83–0.93) with specificity of 0.80 (95% CI 0.62–0.90); (2) cutoff of ≥3 for determining any PCa, sensitivity of 0.96 (95% CI 0.93–0.98) with specificity of 0.49 (95% CI 0.29–0.70); (3) for determining csPCa regardless of cutoff values, sensitivity of 0.89 (95% CI 0.84–0.92) with specificity of 0.64 (95% CI 0.46–0.78); (4) cutoff of ≥4 for determining csPCa, sensitivity of 0.90 (95% CI 0.85–0.94) with specificity of 0.62 (95% CI 0.45–0.77); and (5) cutoff of ≥3 for determining csPCa, sensitivity of 0.96 (95% CI 0.87–0.99) with specificity of 0.29 (95% CI 0.05–0.77). When the studies using a cutoff value of ≥4 were separately assessed according to the type of analysis, per-patient analysis in eight studies [17 18 20 26 27 29 31 35] yielded pooled sensitivity of 0.89 (95% CI 0.81–0.93) with specificity of 0.76 (95% CI 0.60–0.88), whereas per-lesion analysis in nine studies [16 19 21 22 23 24 25 28 30] yielded pooled sensitivity of 0.87 (95% CI 0.83–0.91) with specificity of 0.70 (95% CI 0.44–0.88).

Based on the localization of PCa, the pooled sensitivity was 0.93 (95% CI 0.87–0.96) with specificity of 0.68 (95% CI 0.43–0.86) in seven studies analyzing PZ cancers [20 24 25 28 30 31 33]. In the six of these studies that also analyzed TZ cancers (all except that by Stanzione et al [31]), the pooled sensitivity and specificity for TZ cancers were 0.88 (95% CI 0.77–0.94) and 0.75 (95% CI 0.59–0.86), respectively.

Studies including patients without previous biopsies yielded sensitivity of 0.82 (95% CI 0.72–0.90) and specificity of 0.75 (95% CI 0.65–0.83), whereas the diagnostic performance values were 0.87 (95% CI 0.80–0.92) and 0.71 (95% CI 0.42–0.89) in studies including patients with a history of previous biopsy.

3.7 Discussion

In our meta-analysis, we assessed the diagnostic accuracy of PI-RADSv2 for detecting PCa. The pooled sensitivity and specificity of all 21 studies were 0.89 (95% CI 0.86–0.92) and 0.73 (95% CI 0.60–0.83), respectively. When comparing our data with the only two existing meta-analyses using mpMRI for detecting PCa, a trend toward higher sensitivity and lower specificity can be inferred. In the study by de Rooij et al [37], which evaluated seven studies using a combination of T2WI, DWI, and DCE-MRI, the pooled sensitivity and specificity were 0.74 (95% CI 0.66–0.81) and 0.88 (95% CI 0.82–0.92), respectively. In a more recent meta-analysis by Hamoen et al [6], which analyzed 14 studies using PI-RADSv1, the pooled sensitivity and specificity were 0.78 (95% CI 0.70–0.84) and 0.79 (95% CI 0.68–0.86), respectively. However, setting these three meta-analyses side by side provides only an indirect comparison. To address this issue, we separately assessed a subgroup of studies using both PI-RADSv1 and PI-RADSv2. In this head-to-head comparison, PI-RADSv2 demonstrated higher pooled sensitivity (0.95) compared with PI-RADSv1 (0.88, p = 0.04) without a statistically significant difference in specificity (0.73 vs 0.75, p = 0.90). This increase in sensitivity compared with its predecessor may imply that the revisions undertaken during the development of PI-RADSv2, including the introduction of dominant sequences according to zonal anatomy, limited contribution of DCE-MRI secondary to DWI and T2WI, and specific guidelines for deriving an integrated overall score, were, in fact, on the right track. In particular, we speculate that the use of dominant sequences, that is, DWI for the PZ and T2WI for the TZ, may have been crucial for the improved sensitivity without a loss in specificity, as suggested by Baur et al [10].

Considering that one of the main intentions for the generation of PI-RADS was to standardize reporting of mpMRI in order to decrease variability and bring about widespread acceptance and implementation in daily practice, it was promising to find that nearly all (20 of 21) studies used PI-RADSv2 strictly according to published guidelines [11] . Only one study formed PI-RADSv2 scores from existing clinical radiological reports that were based on PI-RADSv1 or an in-house scoring system [29] . This is an improvement when compared with prior studies conducted using PI-RADSv1, where investigators used varying methods in determining the overall score (overall five-point score or sum of the scores from each modality) [6] . Still, there is a need for further clarification regarding the cutoff value for detecting PCa. In the studies included in our meta-analysis, cutoff values were predefined in only six studies, while the majority (15/21) were exploratory in nature, testing multiple criteria. When using a cutoff value of ≥4, sensitivity (0.89) and specificity (0.74) were generally good, whereas using ≥3 yielded excellent sensitivity (0.95) and poor specificity (0.47). These results may be taken into consideration when generating the next updated PI-RADS. For instance, using the former may be adequate for general use of PI-RADS, whereas the latter could be proposed to be indicated when a higher cancer detection rate is clinically required (ie, persistently high PSA level despite a previously negative biopsy).
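As a rough illustration of the trade-off between the two cutoffs, the pooled point estimates quoted above can be converted into likelihood ratios. The sketch below is not part of the original analysis: it applies the textbook definitions LR+ = sensitivity / (1 − specificity) and LR− = (1 − sensitivity) / specificity to the pooled values only, ignores their confidence intervals and the correlation between sensitivity and specificity, and the function name `likelihood_ratios` is introduced here for illustration.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-) from a sensitivity/specificity pair (textbook definitions)."""
    lr_pos = sensitivity / (1.0 - specificity)
    lr_neg = (1.0 - sensitivity) / specificity
    return lr_pos, lr_neg


# Pooled point estimates reported in the text for each PI-RADSv2 cutoff.
POOLED = {
    ">=4": (0.89, 0.74),
    ">=3": (0.95, 0.47),
}

for cutoff, (se, sp) in POOLED.items():
    lr_pos, lr_neg = likelihood_ratios(se, sp)
    print(f"cutoff {cutoff}: LR+ = {lr_pos:.2f}, LR- = {lr_neg:.2f}")
```

Under these assumptions, the ≥4 cutoff gives a clearly higher positive likelihood ratio, whereas the ≥3 cutoff gives a somewhat lower negative likelihood ratio, which is in line with the suggestion that ≥3 be reserved for settings in which a higher detection rate is required.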

In the current study, subgroup analyses were performed to account for differences in outcomes (any cancer vs clinically significant cancer). There was no significant difference for using either outcome irrespective of whether the criteria of ≥3 or ≥4 were used. However, the definition of clinically significant cancer was different among the 13 studies. Only three studies defined csPCa strictly according to the PI-RADSv2 guidelines (Gleason score ≥7 [including 3 + 4], volume >0.5 ml, or extraprostatic extension) [11]. Most others used one or two of the three criteria. Including only the former three studies may have provided more robust results; however, including all available studies was more pragmatic and presents a general overview of the existing literature, as this is the first meta-analysis of studies dealing with PI-RADSv2.

In this meta-analysis, we looked into the technical aspects of MRI. Meta-regression analyses revealed that the use of endorectal coil was not a statistically significant factor. Furthermore, although magnet strength showed a statistically significant difference between 3 and 1.5 T, this difference was not clinically meaningful (sensitivity of 0.90 vs 0.89, respectively; p = 0.03). Although there had been debate over these two issues in the past, both 3 and 1.5 T are now well established, and the overall benefit of using an endorectal coil is not evident [38 39]. The PI-RADSv2 guidelines currently accept either field strength and either coil setup, and the results of our study provide additional evidence to support this.

Regarding the methods of analysis in the studies, there was significant heterogeneity regarding reference standard and type of analysis. Radical prostatectomy was the reference standard in five studies, while the majority were based on a combination of systematic and targeted biopsies. The possibility of PCa despite negative biopsy results in the latter group should be kept in mind. In addition, approximately half of the studies reported outcomes in a per-patient manner (n = 11) and the other half in a per-lesion manner (n = 10). Per-lesion analysis is known to take into account the performance of localizing the disease; however, this was not shown to be a significant factor in the meta-regression analysis.

Our meta-analysis had some limitations. Nearly all studies were retrospective in design, resulting in a high risk of bias for patient selection. Pooling data from predominantly retrospective studies may have inflated diagnostic sensitivity [40]. However, performing a meta-analysis using only the three prospective studies was technically unfeasible, and the derived results would not have been representative of the existing literature on PI-RADSv2. In addition, we used validated methods for the systematic review and reported the data using standard reporting guidelines, including PRISMA and the guidelines of the Handbook for Diagnostic Test Accuracy Reviews published by the Cochrane Collaboration [12 41]. Another limitation is the considerable heterogeneity in our pooled analysis, which affects the general applicability of our summary estimates. To explore the heterogeneity of our data, we performed meta-regression and multiple subgroup analyses. According to the analyses, the proportion of patients with PCa, the magnetic field strength, and the reference standard were significant factors affecting the heterogeneity. In particular, the reference standard comprised various methods, including radical prostatectomy and a combination of systematic and targeted biopsies (ie, MRI guided, MRI-transrectal ultrasound fusion, or cognitive). Furthermore, the fact that various definitions were used for clinically significant cancer needs to be emphasized. Our meta-regression and subgroup analyses may explain some of the heterogeneity, but a portion remains unexplained. Another important limitation is the small number of included studies for head-to-head comparison between PI-RADSv1 and PI-RADSv2. Nevertheless, we were able to demonstrate a statistically significant difference in sensitivity between the two versions using the six studies published to date.

PI-RADSv2 shows good performance for the detection of PCa with pooled sensitivity of 0.89 and specificity of 0.73. PI-RADSv2 has higher pooled sensitivity compared with PI-RADSv1 without significantly different specificity.

Author contributions: Sang Youn Kim had full access to all the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis.

Study concept and design: Woo, Suh, S.Y. Kim.

Acquisition of data: Woo, Suh, S.Y. Kim.

Analysis and interpretation of data: Woo, Suh, S.Y. Kim.

Drafting of the manuscript: Woo.

Critical revision of the manuscript for important intellectual content: Suh, S.Y. Kim, Cho, S.H. Kim.

Statistical analysis: Suh.

Obtaining funding: None.

Administrative, technical, or material support: None.

Supervision: S.Y. Kim, Cho, S.H. Kim.

Other: None.

Financial disclosures: Sang Youn Kim certifies that all conflicts of interest, including specific financial interests and relationships and affiliations relevant to the subject matter or materials discussed in the manuscript (eg, employment/affiliation, grants or funding, consultancies, honoraria, stock ownership or options, expert testimony, royalties, or patents filed, received, or pending), are the following: None.

Funding/Support and role of the sponsor: None.

  • [1] American Cancer Society. Key statistics for prostate cancer. http://www.cancer.org/cancer/prostatecancer/detailedguide/prostate-cancer-key-statistics
  • [2] C.M. Hoeks, J.O. Barentsz, T. Hambrock, et al. Prostate cancer: multiparametric MR imaging for detection, localization, and staging. Radiology. 2011;261:46-66
  • [3] O. Ruprecht, P. Weisser, B. Bodelle, H. Ackermann, T.J. Vogl. MRI of the prostate: interobserver agreement compared with histopathologic outcome after radical prostatectomy. Eur J Radiol. 2012;81:456-460
  • [4] L. Dickinson, H.U. Ahmed, C. Allen, et al. Scoring systems used for the interpretation and reporting of multiparametric MRI for prostate cancer detection, localization, and characterization: could standardization lead to improved utilization of imaging within the diagnostic pathway? J Magn Reson Imaging. 2013;37:48-58
  • [5] J.O. Barentsz, J. Richenberg, R. Clements, et al. ESUR prostate MR guidelines 2012. Eur Radiol. 2012;22:746-757
  • [6] E.H. Hamoen, M. de Rooij, J.A. Witjes, J.O. Barentsz, M.M. Rovers. Use of the Prostate Imaging Reporting and Data System (PI-RADS) for prostate cancer detection with multiparametric magnetic resonance imaging: a diagnostic meta-analysis. Eur Urol. 2015;67:1112-1121
  • [7] M. Abd-Alazeez, H.U. Ahmed, M. Arya, et al. The accuracy of multiparametric MRI in men with negative biopsy and elevated PSA level—can it rule out clinically significant prostate cancer? Urol Oncol. 2014;32:45.e17-22
  • [8] A.B. Rosenkrantz, S. Kim, R.P. Lim, et al. Prostate cancer localization using multiparametric MR imaging: comparison of Prostate Imaging Reporting and Data System (PI-RADS) and Likert scales. Radiology. 2013;269:482-492
  • [9] B.G. Hansford, Y. Peng, Y. Jiang, et al. Dynamic contrast-enhanced MR imaging curve-type analysis: is it helpful in the differentiation of prostate cancer from healthy peripheral zone? Radiology. 2015;275:448-457
  • [10] A.D. Baur, A. Maxeiner, T. Franiel, et al. Evaluation of the Prostate Imaging Reporting and Data System for the detection of prostate cancer by the results of targeted biopsy of the prostate. Invest Radiol. 2014;49:411-420
  • [11] J.C. Weinreb, J.O. Barentsz, P.L. Choyke, et al. PI-RADS Prostate Imaging-Reporting and Data System: 2015, version 2. Eur Urol. 2016;69:16-40
  • [12] A. Liberati, D.G. Altman, J. Tetzlaff, et al. The PRISMA statement for reporting systematic reviews and meta-analyses of studies that evaluate health care interventions: explanation and elaboration. J Clin Epidemiol. 2009;62:e1-e34
  • [13] P. Whiting, A.W. Rutjes, J.B. Reitsma, P.M. Bossuyt, J. Kleijnen. The development of QUADAS: a tool for the quality assessment of studies of diagnostic accuracy included in systematic reviews. BMC Med Res Methodol. 2003;3:25
  • [14] C.H. Suh, S.H. Park. Successful publication of systematic review and meta-analysis of studies evaluating diagnostic test accuracy. Korean J Radiol. 2016;17:5-6
  • [15] J.J. Deeks, P. Macaskill, L. Irwig. The performance of tests of publication bias and other sample size effects in systematic reviews of diagnostic test accuracy was assessed. J Clin Epidemiol. 2005;58:882-893
  • [16] T. Auer, M. Edlinger, J. Bektic, et al. Performance of PI-RADS version 1 versus version 2 regarding the relation with histopathological results. World J Urol. 2017;35:687-693
  • [17] M. Baldisserotto, E.J. Neto, G. Carvalhal, et al. Validation of PI-RADS v.2 for prostate cancer diagnosis with MRI at 3T using an external phased-array coil. J Magn Reson Imaging. 2016;44:1354-1359
  • [18] P. De Visschere, N. Lumen, P. Ost, K. Decaestecker, E. Pattyn, G. Villeirs. Dynamic contrast-enhanced imaging has limited added value over T2-weighted imaging and diffusion-weighted imaging when using PI-RADSv2 for diagnosis of clinically significant prostate cancer in patients with elevated PSA. Clin Radiol. 2017;72:23-32
  • [19] El-Samei HAE-KA, M.F. Amin, E.E. Hassan. Assessment of the accuracy of multi-parametric MRI with PI-RADS 2.0 scoring system in the discrimination of suspicious prostatic focal lesions. Egypt J Radiol Nucl Med. 2016;47:1075-1082
  • [20] Z.Y. Feng, L. Wang, X.D. Min, S.G. Wang, G.P. Wang, J. Cai. Prostate cancer detection with multiparametric magnetic resonance imaging: Prostate Imaging Reporting and Data System version 1 versus version 2. Chin Med J. 2016;129:2451-2459
  • [21] M. Kasel-Seibert, T. Lehmann, R. Aschenbach, et al. Assessment of PI-RADS v2 for the detection of prostate cancer. Eur J Radiol. 2016;85:726-731
  • [22] W.C. Lin, V.F. Muglia, G.E. Silva, S. Chodraui Filho, R.B. Reis, A.C. Westphalen. Multiparametric MRI of the prostate: diagnostic performance and interreader agreement of two scoring systems. Br J Radiol. 2016;89:20151056
  • [23] E. Martorana, G.M. Pirola, M. Scialpi, et al. Lesion volume predicts prostate cancer risk and aggressiveness: validation of its value alone and matched with Prostate Imaging Reporting and Data System score. BJU Int. In press. doi:10.1111/bju.13649
  • [24] F.V. Mertan, M.D. Greer, J.H. Shih, et al. Prospective evaluation of the Prostate Imaging Reporting and Data System version 2 for prostate cancer detection. J Urol. 2016;196:690-696
  • [25] B.G. Muller, J.H. Shih, S. Sankineni, et al. Prostate cancer: interobserver agreement and accuracy with the revised Prostate Imaging Reporting and Data System at multiparametric MR imaging. Radiology. 2015;277:741-750
  • [26] J.J. Park, B.K. Park. Role of PI-RADSv2 with multiparametric MRI in determining who needs active surveillance or definitive treatment according to PRIAS. J Magn Reson Imaging. 2017;45:1753-1759
  • [27] S.Y. Park, D.C. Jung, Y.T. Oh, et al. Prostate cancer: PI-RADS version 2 helps preoperatively predict clinically significant cancers. Radiology. 2016;280:108-116
  • [28] S. Polanec, T.H. Helbich, H. Bickel, et al. Head-to-head comparison of PI-RADS v2 and PI-RADS v1. Eur J Radiol. 2016;85:1125-1131
  • [29] A.R. Rastinehad, N. Waingankar, B. Turkbey, et al. Comparison of multiparametric MRI scoring systems and the impact on cancer detection in patients undergoing MR US fusion guided prostate biopsies. PLoS One. 2015;10:e0143404
  • [30] A.B. Rosenkrantz, J.S. Babb, S.S. Taneja, J.M. Ream. Proposed adjustments to PI-RADS version 2 decision rules: impact on prostate cancer detection. Radiology. In press. doi:10.1148/radiol.2016161124
  • [31] A. Stanzione, M. Imbriaco, S. Cocozza, et al. Biparametric 3T magnetic resonance imaging for prostatic cancer detection in a biopsy-naive patient population: a further improvement of PI-RADS v2? Eur J Radiol. 2016;85:2269-2274
  • [32] N. Tan, W.C. Lin, P. Khoshnoodi, et al. In-Bore 3-T MR-guided transrectal targeted prostate biopsy: Prostate Imaging Reporting and Data System version 2-based diagnostic performance for detection of prostate cancer. Radiology. In press. doi:10.1148/radiol.2016152827
  • [33] S. Tewes, N. Mokov, D. Hartung, et al. Standardized reporting of prostate MRI: comparison of the Prostate Imaging Reporting and Data System (PI-RADS) version 1 and version 2. PLoS One. 2016;11:e0162879
  • [34] S. Washino, T. Okochi, K. Saito, et al. Combination of PI-RADS score and PSA density predicts biopsy outcome in biopsy naive patients. BJU Int. 2017;119:225-233
  • [35] S. Woo, S.Y. Kim, J. Lee, S.H. Kim, J.Y. Cho. PI-RADS version 2 for prediction of pathological downgrading after radical prostatectomy: a preliminary study in patients with biopsy-proven Gleason Score 7 (3 + 4) prostate cancer. Eur Radiol. 2016;26:3580-3587
  • [36] C. Zhao, G. Gao, D. Fang, et al. The efficiency of multiparametric magnetic resonance imaging (mpMRI) using PI-RADS Version 2 in the diagnosis of clinically significant prostate cancer. Clin Imaging. 2016;40:885-888
  • [37] M. de Rooij, E.H. Hamoen, J.J. Futterer, J.O. Barentsz, M.M. Rovers. Accuracy of multiparametric MRI for prostate cancer detection: a meta-analysis. Am J Roentgenol. 2014;202:343-351
  • [38] Z.K. Shah, S.N. Elias, R. Abaza, et al. Performance comparison of 1.5-T endorectal coil MRI with 3.0-T nonendorectal coil MRI in patients with prostate cancer. Acad Radiol. 2015;22:467-474
  • [39] Y. Mazaheri, H.A. Vargas, G. Nyman, O. Akin, H. Hricak. Image artifacts on prostate diffusion-weighted magnetic resonance imaging: trade-offs at 1.5 Tesla and 3.0 Tesla. Acad Radiol. 2013;20:1041-1047
  • [40] J.J. Deeks. Systematic reviews in health care: systematic reviews of evaluations of diagnostic and screening tests. BMJ. 2001;323:157-162
  • [41] J.J.W.S. Deeks, C. Davenport. Guide to the contents of a Cochrane diagnostic test accuracy protocol. J.J. Deeks, P.M. Bossuyt, C. Gatsonis. Cochrane handbook for systematic reviews of diagnostic test accuracy version 1.0.0. Chapter 4. (The Cochrane Collaboration, 2013) http://srdta.cochrane.org/

Prostate cancer (PCa) is the second leading cause of cancer-related mortality in Western men [1]. With recent technological advancements and growing availability, multiparametric magnetic resonance imaging (mpMRI) is increasingly being used to guide several aspects of PCa management, including detection, staging, and treatment planning [2]. Despite abundant evidence in the literature reporting high accuracy of mpMRI for PCa diagnosis, widespread acceptance has been hampered by several factors, including difficulty of interpretation, lack of standardized criteria for interpretation (ie, use of Likert scales based on the radiologist's subjective level of suspicion for PCa), and the resulting substantial inter-reader variability [3 4].

To bring standardization to the evaluation and reporting of mpMRI of the prostate, the European Society of Urogenital Radiology (ESUR) published a guideline termed Prostate Imaging Reporting and Data System (PI-RADS) in 2012 [5] . PI-RADS was generated based on expert consensus and provides a detailed scoring system for each MRI sequence (T2-weighted imaging [T2WI], diffusion-weighted imaging [DWI], dynamic contrast-enhanced MRI [DCE-MRI], and MR spectroscopy) for the presence of clinically significant PCa (csPCa). Several investigators have validated the accuracy and reproducibility of PI-RADS, and a recent meta-analysis reported pooled sensitivity and specificity of 0.78 and 0.79, respectively [6] . However, as there was no guideline for the generation of an overall score, different research groups utilized various measures for this purpose—some used a sum of the scores from each sequence (ranging from 3 to 15), whereas others used an overall score of 1–5 [7 8] . Furthermore, emerging data questioned the value of curve-type analysis of DCE-MRI [9] . In addition, investigators suggested that some sequences may be more important in determining the likelihood of PCa (ie, DWI in the peripheral zone [PZ] and T2WI in the transition zone [TZ]) rather than equal weighting for all sequences [10] .

To address these issues, the ESUR and American College of Radiology recently released the updated PI-RADS version 2 (PI-RADSv2) [11] . The main changes from PI-RADSv1 to PI-RADSv2 are the following: (1) introduction of dominant sequences according to zonal anatomy (DWI for the PZ and T2WI for the TZ), (2) limited contribution of DCE-MRI data as merely presence and absence of early focal enhancement, and (3) generation of an overall score (1–5) integrating findings across all MRI sequences.

Since then, several studies assessing the value of PI-RADSv2 have been published. However, the diagnostic performance of this new scoring system has not been evaluated systematically. Therefore, the purpose of our study was to assess the diagnostic performance of PI-RADSv2 for the detection of PCa. In addition, we aimed to compare the diagnostic performance of PI-RADSv1 and PI-RADSv2 in studies available for head-to-head comparison.

This meta-analysis was performed and written according to the preferred reporting items for systematic reviews and meta-analyses (PRISMA) guidelines [12] .

2.1 Literature search

A computerized search of MEDLINE and EMBASE up to December 7, 2016, was performed in order to identify studies evaluating the diagnostic performance of PI-RADSv2 for the detection of PCa. The search query combined synonyms for PCa, MRI, and PI-RADS as follows: (prostate cancer OR prostatic cancer OR prostate neoplasm OR prostatic neoplasm OR prostate tumor OR prostatic tumor OR prostate carcinoma OR prostatic carcinoma OR PCa) AND (magnetic resonance imaging OR MRI OR MR) AND (prostate imaging reporting and data system OR pi-rads OR pi rads OR pirads). Bibliographies of identified articles were also screened to expand the scope of search. Our search was limited to publications in English.
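The boolean structure of this query can be sketched in a few lines of Python. This is only an illustration of how the three synonym groups combine; the helper name `build_query` is hypothetical, and database-specific field tags or syntax for MEDLINE and EMBASE are not reproduced here because the text does not state them.

```python
# Synonym groups copied from the search strategy described above.
PCA_TERMS = [
    "prostate cancer", "prostatic cancer", "prostate neoplasm",
    "prostatic neoplasm", "prostate tumor", "prostatic tumor",
    "prostate carcinoma", "prostatic carcinoma", "PCa",
]
MRI_TERMS = ["magnetic resonance imaging", "MRI", "MR"]
PIRADS_TERMS = [
    "prostate imaging reporting and data system",
    "pi-rads", "pi rads", "pirads",
]


def build_query(*groups):
    """Join terms within each group with OR, then join the groups with AND."""
    return " AND ".join("(" + " OR ".join(group) + ")" for group in groups)


query = build_query(PCA_TERMS, MRI_TERMS, PIRADS_TERMS)
print(query)
```

Keeping each concept in its own list makes it easy to see that a record must match at least one synonym from every group to be retrieved.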

2.2 Study selection

2.2.1 Inclusion criteria

Studies were included if they satisfied all the following requirements according to the PICOS criteria [12] : (1) included patients with suspected or diagnosed PCa; (2) for index test, mpMRI of the prostate including all required sequences of T2WI, DWI, and DCE-MRI was performed and assessed with a PI-RADSv2 scoring system; (3) for comparison, a reference standard based on the histopathological examination of radical prostatectomy or biopsy was used; (4) results were reported in sufficient detail for the reconstruction of 2 × 2 tables and determination of sensitivity and specificity at specified cutoff values for evaluating the diagnostic performance of PI-RADSv2; and (5) studies had to be original articles.
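Criterion (4) requires that a 2 × 2 table can be rebuilt at a given cutoff. A minimal sketch of that reconstruction is shown below, assuming each case is available as a pair of PI-RADSv2 score and histopathological result. The eight cases are invented for illustration and do not come from any included study, and the names `TwoByTwo`, `two_by_two`, `sensitivity`, and `specificity` are hypothetical.

```python
from typing import Iterable, NamedTuple, Tuple


class TwoByTwo(NamedTuple):
    tp: int  # score >= cutoff, cancer present
    fp: int  # score >= cutoff, cancer absent
    fn: int  # score <  cutoff, cancer present
    tn: int  # score <  cutoff, cancer absent


def two_by_two(cases: Iterable[Tuple[int, bool]], cutoff: int) -> TwoByTwo:
    """Cross-tabulate (score, has_cancer) pairs at the given score cutoff."""
    tp = fp = fn = tn = 0
    for score, has_cancer in cases:
        test_positive = score >= cutoff
        if test_positive and has_cancer:
            tp += 1
        elif test_positive:
            fp += 1
        elif has_cancer:
            fn += 1
        else:
            tn += 1
    return TwoByTwo(tp, fp, fn, tn)


def sensitivity(t: TwoByTwo) -> float:
    return t.tp / (t.tp + t.fn)


def specificity(t: TwoByTwo) -> float:
    return t.tn / (t.tn + t.fp)


# Invented example cases: (PI-RADSv2 score, cancer on histopathology).
CASES = [(5, True), (4, True), (3, True), (5, True),
         (4, False), (3, False), (2, False), (1, False)]

for cutoff in (3, 4):
    table = two_by_two(CASES, cutoff)
    print(cutoff, table, sensitivity(table), specificity(table))
```

In this toy data set, lowering the cutoff from ≥4 to ≥3 raises sensitivity and lowers specificity, which is the same direction of trade-off examined in the subgroup analyses.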

2.2.2 Exclusion criteria

Studies were excluded if any of the following criteria were met: (1) studies involving <10 patients; (2) review articles, guidelines, consensus statements, letters, editorials, and conference abstracts; (3) studies using only PI-RADSv1 for the evaluation of mpMRI of the prostate; (4) studies focusing on topics other than using the PI-RADSv2 system for diagnosing PCa (ie, staging and prediction of biochemical recurrence); and (5) studies with overlapping patient population.

Two reviewers (S.W. and C.H.S., with 3 yr of experience in performing systematic reviews and meta-analyses) independently evaluated the eligibility of the selected studies from the literature. Disagreements, if present between the two reviewers, were resolved by consensus via discussion with a third reviewer (S.Y.K.).

2.3 Data extraction and quality assessment

We extracted the following data regarding study design and results from the selected studies using a standardized form:

  1. Study characteristics: authors, year of publication, country and institution of origin, duration of patient recruitment, and study design (prospective vs retrospective and consecutive or not)

  2. Demographic and clinical characteristics: sample size, number of patients with PCa, patient age, prostate-specific antigen (PSA) level and Gleason score, number of previous biopsies, and PCa diagnosis prior to mpMRI

  3. Technical characteristics of mpMRI: scanner model and manufacturer, magnetic field strength (1.5 vs 3 T), coil type (endorectal vs pelvic phased array), and specific sequences used (T2WI, DWI, DCE, or MR spectroscopy)

  4. Interpretation of mpMRI: number of reviewers and experience in prostate mpMRI, independent or consensus reading, and blinding to clinicopathological information

  5. Reference standard: type of reference standard (radical prostatectomy, targeted biopsy, or systematic biopsy), interval between MRI and pathology, outcomes assessed (any PCa vs csPCa), definition of csPCa (studies assessing “clinically significant”, “aggressive”, or “high-grade” PCa were all considered to assess csPCa; however, only studies that used the definition provided by the PI-RADSv2 guideline [Gleason score ≥7 (including 3 + 4), volume >0.5 ml, or extraprostatic extension] were considered not to have concern for applicability), separate analysis for the PZ and TZ, and type of analysis (per patient vs per lesion)

  6. Diagnostic performance of PI-RADSv2, including criteria or cutoff values (in case of multiple readers, the results of the most experienced reader were extracted for this meta-analysis)

The methodological quality of the included studies was assessed using tailored questionnaires and criteria provided by Quality Assessment of Diagnostic Accuracy Studies-2 [13] . Data extraction and quality assessment were performed independently by two reviewers (S.W. and C.H.S.). All disagreements were resolved by consensus through discussion with the third reviewer (S.Y.K.).

2.4 Data synthesis and analysis

The diagnostic performance of PI-RADSv2 for the detection of PCa was the primary outcome for this meta-analysis. In addition, a comparison between the diagnostic performance of PI-RADSv2 and that of PI-RADSv1 using studies that reported head-to-head comparison data of the two PI-RADS versions was considered a secondary outcome.

Pooled estimates of sensitivity and specificity were calculated using hierarchical logistic regression modeling including bivariate modeling and hierarchical summary receiver operating characteristic (HSROC) modeling [14] . For graphical presentation of the results, an HSROC curve with 95% confidence region and prediction region was plotted. Publication bias was evaluated using the Deeks’ funnel plot, and statistical significance was tested with the Deeks’ asymmetry test [15] .
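For orientation, the bivariate random-effects model is commonly written in the following form. This is the standard textbook formulation and is given here as an assumption about the general approach; the exact parameterization fitted by the software is not stated in the text. The symbols ($n_{1i}$ and $n_{0i}$ for the numbers of diseased and nondiseased units in study $i$, $\mu_A$, $\mu_B$, $\sigma_A^2$, $\sigma_B^2$, $\sigma_{AB}$) are notation introduced for this sketch.

```latex
\begin{aligned}
\mathrm{TP}_i &\sim \mathrm{Binomial}(n_{1i},\, \mathrm{Se}_i), \qquad
\mathrm{TN}_i \sim \mathrm{Binomial}(n_{0i},\, \mathrm{Sp}_i), \\
\begin{pmatrix} \operatorname{logit}(\mathrm{Se}_i) \\ \operatorname{logit}(\mathrm{Sp}_i) \end{pmatrix}
&\sim \mathcal{N}\!\left(
\begin{pmatrix} \mu_A \\ \mu_B \end{pmatrix},
\begin{pmatrix} \sigma_A^2 & \sigma_{AB} \\ \sigma_{AB} & \sigma_B^2 \end{pmatrix}
\right), \\
\widehat{\mathrm{Se}} &= \operatorname{logit}^{-1}(\hat{\mu}_A), \qquad
\widehat{\mathrm{Sp}} = \operatorname{logit}^{-1}(\hat{\mu}_B).
\end{aligned}
```

The covariance term $\sigma_{AB}$ is what lets the model account for the usual negative correlation between sensitivity and specificity across studies, for example when studies use different cutoffs.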

We performed meta-regression analyses to investigate the cause of heterogeneity. The following covariates were considered for the bivariate model: (1) proportion of patients with PCa (>50% vs ≤50%), (2) magnet strength of MRI (3 vs 1.5 T), (3) use of endorectal coil, (4) cutoff value (≥4 vs ≥3), (5) reference standard (radical prostatectomy vs biopsy), and (6) type of analysis (per patient vs per lesion). In addition, multiple subgroup analyses were performed for cutoff value, outcome, and previous biopsy history to assess various clinical settings: (1) a cutoff value of ≥4 for all studies, (2) a cutoff value of ≥3 for all studies, (3) a cutoff value of ≥4 for determining any PCa, (4) a cutoff value of ≥3 for determining any PCa, (5) a cutoff value of ≥4 for determining csPCa, (6) a cutoff value of ≥3 for determining csPCa, (7) a cutoff value of ≥4 in studies using per-patient analysis, (8) a cutoff value of ≥4 in studies using per-lesion analysis, (9) studies analyzing PZ PCa, (10) studies analyzing TZ PCa, (11) patients without previous biopsies, and (12) patients with previous biopsies. The “metandi” and “midas” modules in Stata 10.0 (StataCorp LP, College Station, TX, USA) and “mada” package in R software version 3.2.1 (R Foundation for Statistical Computing, Vienna, Austria) were used for statistical analyses, with p < 0.05 signifying statistical significance.

3.1 Literature search

A systematic literature search initially identified 287 articles. After removing 46 duplicates, screening of the 241 titles and abstracts yielded 105 potentially eligible articles. Full-text reviews were performed, and 84 studies were excluded for the following reasons: not in the field of interest (n = 80, including 68 studies that used only PI-RADSv1), insufficient data to reconstruct 2 × 2 tables (n = 2), and shared study population with other studies (n = 2). Ultimately, 21 original articles including a total of 3857 patients assessing the diagnostic performance of PI-RADSv2 were included in the meta-analysis [16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36]. No additional studies were identified via screening the bibliographies of these 21 studies. Among them, 15 studies including 3099 patients dealt with PI-RADSv2 alone, whereas six studies including 758 patients provided a head-to-head comparison between PI-RADSv1 and PI-RADSv2 [16 20 21 28 32 33]. The detailed study selection process is described in Fig. 1.
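The counts in this paragraph can be cross-checked with simple arithmetic. The short sketch below uses only the numbers stated in the text; the variable names are chosen here for illustration.

```python
# Counts reported in the literature-search paragraph.
identified = 287
duplicates = 46
screened = identified - duplicates          # titles and abstracts screened

full_text_reviewed = 105
excluded = {
    "not in the field of interest": 80,
    "insufficient data for 2 x 2 tables": 2,
    "shared study population": 2,
}
included = full_text_reviewed - sum(excluded.values())

# Split of the included studies by design.
v2_only = {"studies": 15, "patients": 3099}
head_to_head = {"studies": 6, "patients": 758}
total_studies = v2_only["studies"] + head_to_head["studies"]
total_patients = v2_only["patients"] + head_to_head["patients"]

print(screened, included, total_studies, total_patients)
```

All four derived values agree with the figures reported in the text (241 screened records, 21 included studies, and 3857 patients).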


Fig. 1
PRISMA flow diagram showing study selection process for meta-analysis. a Included original articles for qualitative and quantitative analyses are references 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 . MRI = magnetic resonance imaging; PI-RADS = Prostate Imaging Reporting and Data System; PRISMA = Preferred Reporting Items for Systematic Reviews and Meta-analyses.

3.2 Characteristics of included studies

Patient characteristics are shown in Table 1 . The size of the study population ranged from 49 to 456 patients, with the percentage of those with PCa ranging from 37% to 100%. The patients had a median age of 62–69.6 yr, median PSA of 3.97–15 ng/ml, and a Gleason score ranging from 5 to 10. Patients had already been diagnosed with PCa prior to MRI in all or some of the study populations in seven studies [16 17 22 26 27 32 35] . Biopsy was performed before MRI in seven studies [16 21 22 23 26 27 35] , all patients were biopsy-naïve in three studies [18 25 34] , both patient types were included in three studies [28 32 33] , and data regarding previous biopsy were not reported in the remaining eight studies.

First author (year) Origin Duration of patient recruitment Patients ( n ) Patients with PCa ( n ) Age (yr) PSA (ng/ml) Gleason score No. of previous biopsies PCa diagnosis before MRI
Country Institution Median Range Median Range Median Range
Auer (2016) [16] Austria Medical University of Innsbruck, Medizinische Hochschule Hannover NR 50 NR 63 a NR 7.3 a NR 7 (3 + 4) 6–10 ≥1 Yes (all)
Baldisserotto (2016) [17] Brazil Pontifícia Universidade Católica do Rio Grande do Sul 2013.6–2015.6 54 33 65.9 a 53–81 8.4 a 3–31 7 (3 + 4) 6–9 NR Yes (some)
De Visschere (2017) [18] Belgium Ghent University Hospital 2011.5–2014.12 245 144 66 44–85 9 1.4–935.5 7 (3 + 4) ≥6 0 No
El-Samei (2016) [19] Egypt Al-Azher University, El-Minia University 2014.5–2015.10 55 38 62 a 51–79 NR NR 7 6–10 NR No
Feng (2016) [20] China Tongji Hospital 2013.6–2015.7 401 150 64.4 a 34–88 10.7 0.2–1763 7 (4 + 3) ≤6–≥8 NR NR
Kasel-Seibert (2016) [21] Germany University Hospital Jena 2013.7–2015.3 82 31 65 48–81 13 1–111 7 (3 + 4) 6–9 1–5 No
Lin (2016) [22] Brazil, Taiwan Ribeirão Preto School of Medicine, China Medical University, Ribeirão Preto School of Medicine 2011.5–2014.6 49 49 63 46–73 13.27 1.8–41.4 ≥1 Yes (all)
Martorana (2016) [23] Italy University of Modena and Reggio Emilia, Perugia University 2014.1–2016.2 157 79 65 47–79 b 10.7 1–75 6 6–8 1–3 No
Mertan (2016) [24] USA National Cancer Institute 2015.3–2015.9 62 38 65.5 50.3–76.6 7.1 0.5–863 7 (3 + 4) 6–9 NR NR
Muller (2015) [25] USA, the Netherlands National Cancer Institute, AMC University Hospital, Edward Hébert School of Medicine 2011.12–2014.5 94 94 62 37–79 8.51 0.7–51.1 7 (3 + 4) 6–10 0 No
Park (2016) [26] South Korea Samsung Medical Center 2012.1–2014.12 456 456 65 46–81 3.97 3.8–4.2 NR ≥6 ≥1 Yes (all)
Park (2016) [27] South Korea Yonsei University College of Medicine 2012.1–2013.3 425 425 NR NR NR 1.4–156.9 5–6 5–10 ≥1 Yes (all)
Polanec (2016) [28] Austria Medical University of Vienna, Confraternität Vienna 2011.6–2015.9 65 NR 65.3 a 62.3–87.4 10.8 a 4.2–74.5 7 (4 + 3) 6–9 ≥0 No
Rastinehad (2015) [29] USA Icahn School of Medicine at Mount Sinai, Fox Chase Cancer Center, National Institutes of Health, Hofstra North Shore LIJ School of Medicine 2012.2–2014.11 312 202 65.1 60.3–70.3 b 7.3 5.0–11.4 b NR NR NR No
Rosenkrantz (2016) [30] USA NYU Langone Medical Center 2013.9–2015.2 343 134 64 NR 5.8 NR 7 (3 + 4) 6–9 NR NR
Stanzione (2016) [31] Italy University “Federico II”, Ospedale S. Maria delle Grazie NR 82 34 65 a 43–84 8.8 a NR 7 (3 + 4) 6–9 NR No
Tan (2016) [32] USA, Taiwan UCLA, Cathay General Hospital, Hoag Hospital 2013.3–2016.12 106 63 66.5 43–79 7.9 5.6–10.6 7 (3 + 4) 6–10 ≥0 Yes (some)
Tewes (2016) [33] Germany Hannover Medical School, Klinikum der Region Hannover 2012.12–2014.12 54 31 69.6 a NR 8.7 a NR 6 6–9 ≥0 No
Washino (2017) [34] Japan Jichi Medical University Saitama Medical Center 2010.6–2014.4 288 159 69 64–74 b 7.5 5.5–11 b NR NR 0 No
Woo (2016) [35] South Korea Seoul National University College of Medicine 2011.1–2013.12 105 105 69 49–79 8.22 0.9–44.2 7 6–7 ≥1 Yes (all)
Zhao (2016) [36] China Peking University First Hospital 2010.11–2013.12 372 185 68.5 a NR 15 a NR 7 6–10 NR No

a Mean.

b Interquartile range.

MRI = magnetic resonance imaging; NR = not reported; PCa = prostate cancer; PSA = prostate-specific antigen.

Table 1 Patient characteristics

Characteristics of the studies are summarized in Table 2. MRI was performed using 3-T scanners in 16 studies [16 17 18 20 24 25 26 27 28 29 30 31 32 33 35 36], 1.5-T scanners in four studies [19 21 22 23], and either 3 or 1.5 T in one study [34]. Endorectal coils were used in seven studies [19 21 22 23 24 25 29]. In all studies, the mpMRI protocol consisted of T2WI, DWI, and DCE-MRI. The reference standard was radical prostatectomy in five studies [16 22 26 27 35], a combination of systematic and targeted biopsies in seven studies [17 20 23 25 29 34 36], and only targeted biopsy in seven studies [19 21 24 28 30 32 33]; the reference standard was not consistent throughout the study population in two studies [18 31]. PI-RADSv2 scoring was performed by one to five radiologists, either in consensus or independently. The level of experience of the radiologists was heterogeneous, ranging from 4 to 22 yr of experience in prostate imaging. In most studies, the readers were blinded; however, in three studies, the radiologists were aware that the patients had biopsy-proven PCa [22 26 35], and five studies were not explicit regarding blinding [18 19 30 32 36]. In the majority of the studies, the interval between MRI and the reference standard was less than 6 mo; however, the details were not reported in 10 studies [16 17 18 24 27 28 29 30 32 33]. PCa was separately assessed according to zonal anatomy in seven studies [20 24 25 28 30 31 33]. However, in one study [31], only PCa in the PZ could be evaluated, as no detailed data were provided in the article and the attempt to contact the authors for further information was unsuccessful. Regarding the outcome assessed, eight studies evaluated any cancer [17 19 20 23 25 28 31 33], eight evaluated clinically significant cancer [18 22 26 27 30 34 35 36], and five evaluated both [16 21 24 29 32]. With regard to cutoff values, 13 studies used ≥4 [16 18 19 20 21 24 25 26 27 28 30 31 35], four studies used ≥3 [32 33 34 36], and four studies used both [17 22 23 29]. The location of PCa was reported separately for the PZ and TZ in six studies [20 24 25 28 30 33].

Study Reader MRI
First author (year) Design Consecutive enrollment Reference standard MRI–reference standard interval No. of readers Experience (yr) Blinding Magnet strength (T) Vendor Model Endorectal coil PI-RADSv2 application Cutoff values Localization Type of analysis Outcome assessed Definition of csPCa
Auer (2016) [16] Retrospective Yes RP NR 1 >5 Yes 3 Siemens Skyra No Strict 4 Whole Lesion Any + csPCa GS ≥7 (4 + 3)
Baldisserotto (2016) [17] Retrospective Yes STRUSGB + targeted MRI–TRUS biopsy (cognitive) or RP NR 2 (independent) >10/1 Yes 3 GE Signa HDxt No Strict 3, 4 Whole Patient Any
De Visschere (2017) [18] Retrospective NR STRUSGB or RP NR NR NR NR 3 Siemens Trio No Strict 4 Whole Patient csPCa GS ≥7, ≥0.5 cc, or EPE
El-Samei (2016) [19] NR NR Targeted MRI–TRUS biopsy ≤2 wk NR NR NR 1.5 Philips Gyroscan Yes Strict 4 Whole Lesion Any
Feng (2016) [20] Retrospective Yes STRUSGB + targeted MRI–TRUS biopsy (cognitive) 1–35 d 2 (consensus) 5/4 Yes 3 Siemens Skyra No Strict 4 PZ, TZ Patient Any
Kasel-Seibert (2016) [21] Retrospective NR Targeted MRGB ≤7 d 2 (independent) 10/<1 Yes 1.5 Siemens Avanto Yes Strict 4 Whole Lesion Any + csPCa GS ≥7
Lin (2016) [22] Retrospective NR RP ≤6 mo 2 (independent) 12/5 Yes a 1.5 Philips Achieva Yes Strict 3, 4 Whole Lesion csPCa GS ≥7 and >0.5 cc
Martorana (2016) [23] Retrospective Yes TTB + targeted MRI–TRUS biopsy 30–78 2 (consensus) 4/4 Yes 1.5 Philips Achieva Yes Strict 3, 4 Whole Lesion Any GS ≥7, >0.5 cc, or EPE
Mertan (2016) [24] Prospective Yes STRUSGB + targeted MRI–TRUS biopsy NR 1 >8 Yes 3 Philips Achieva Yes Strict 4 PZ, TZ Lesion Any + csPCa GS ≥7
Muller (2015) [25] Retrospective Yes STRUSGB + targeted MRI–TRUS biopsy ≤6 wk 5 (independent) 12/7/1/1/0.5 Yes 3 Philips Achieva Yes Strict 4 PZ, TZ Lesion Any GS ≥7 (4 + 3)
Park (2016) [26] Retrospective Yes RP 21–48 2 (independent) 14/3 Yes a 3 Philips Achieva No Strict 4 Whole Patient csPCa GS ≥7, ≥0.5 cc, or EPE
Park (2016) [27] Retrospective NR RP NR 2 (independent) 9/4 Yes 3 GE, Philips, Siemens Discovery MR750, Achieva, Trio No Strict 4 Whole Patient csPCa GS ≥7, ≥0.5 cc, or EPE
Polanec (2016) [28] Retrospective Yes Targeted MRGB NR 2 (independent) NR/NR Yes 3 Siemens Trio No Strict 4 PZ, TZ Lesion Any
Rastinehad (2015) [29] Prospective NR STRUSGB + targeted MRI–TRUS biopsy NR 3 (consensus) NR Yes 3 Siemens Verio Yes Calculated b 3, 4 Whole Patient Any + csPCa Epstein's criteria c for systematic and GS ≥7 or >0.5 cc for targeted biopsy
Rosenkrantz (2016) [30] Retrospective NR Targeted MRI–TRUS biopsy NR 2 (independent) 8/3 NR 3 Siemens Biograph mMR/Prisma/Skyra/Trio No Strict 4 PZ, TZ Lesion csPCa GS ≥7
Stanzione (2016) [31] Prospective NR STRUSGB (±targeted TRUSGB) 20–30 d 2 (independent) 14/10 Yes 3 Siemens Trio No Strict 4 PZ Patient Any
Tan (2016) [32] Retrospective Yes Targeted MRGB NR 3 (consensus) 16/10 NR 3 Siemens Skyra, Trio, Verio No Strict 3 Whole Lesion Any + csPCa GS ≥7
Tewes (2016) [33] Retrospective Yes Targeted MRGB NR 2 (independent) 5/2 Yes 3 Siemens Skyra No Strict 3 d PZ, TZ Patient Any
Washino (2017) [34] Retrospective NR TTB + targeted MRI–TRUS biopsy (cognitive) 0.5–1.6 mo 1 14 Yes 1.5 or 3 Toshiba Excelart Vantage/Vantage Titan 3T No Strict 3 Whole Patient csPCa GS ≥7 or maximum core length ≥4 mm
Woo (2016) [35] Retrospective NR RP ≤6 mo 2 (consensus) 22/10 Yes a 3 Philips, Siemens Ingenia, Verio/Trio No Strict 4 Whole Patient csPCa GS ≥7
Zhao (2016) [36] Retrospective NR STRUSGB + targeted MRI–TRUS biopsy ≤3 mo 2 (independent) NR NR 3 NR NR No Strict 3 Whole Patient csPCa GS ≥7

a Blinded but aware that patients had biopsy-proven PCa.

b PI-RADSv2 scores were generated from reports based on PI-RADSv1 and simplified qualitative system.

c Epstein's criteria = Gleason pattern ≥4, or Gleason 3 + 3 disease with core length ≥50% and/or >2 cores positive.

d For transition zone, cutoff value = 4.

csPCa = clinically significant prostate cancer; GS = Gleason score; EPE = extraprostatic extension; MRGB = magnetic resonance imaging-guided biopsy; MRI = magnetic resonance imaging; MRI–TRUS = fusion of magnetic resonance imaging and transrectal ultrasound images; NR = not reported; PCa = prostate cancer; PI-RADSv2 = Prostate Imaging Reporting and Data System version 2; PZ = peripheral zone; RP = radical prostatectomy; STRUSGB = systematic transrectal ultrasound-guided biopsy; TTB = transperineal template biopsy; TZ = transition zone.

Table 2 Study characteristics

3.3 Quality assessment

Overall, the quality of the studies was not considered high, mainly due to the patient selection domain (Fig. 2). Regarding the patient selection domain, there was generally a high risk of bias as all but four of the studies were retrospective in nature [16 17 18 20 21 22 23 25 26 27 28 30 32 33 34 35 36]. Seven studies were considered to have high concern for applicability, as all or some of the patients had a pathological diagnosis of PCa prior to MRI [16 17 22 26 27 32 35]. Regarding the index test domain, there was a high risk of bias in nine studies. In three of the nine studies, reviewers were aware that patients had biopsy-proven PCa [22 26 35]. In the other six studies, the cutoff value for determining PCa was not specified prior to interpretation [16 20 29 31 33 34]. Only one study had concern for applicability, as PI-RADSv2 scores were indirectly generated from existing clinical radiological reports based on PI-RADSv1 or an in-house scoring system [29]. Regarding the reference standard domain, eight studies had a high risk of bias. Seven were based on either only systematic biopsy or targeted biopsy [18 19 21 28 30 32 33]; in one study, targeted biopsy was performed, but on lesions that were suspicious on ultrasonography, and not on MRI [31]. Those in which radical prostatectomy or a systematic plus targeted biopsy (MRI guided, MRI–transrectal ultrasound fusion, or cognitive) was used as the reference standard were considered to have a low risk of bias. In 10 studies, the definition of clinically significant cancer did not abide by that described in the PI-RADSv2 guidelines, and these studies therefore showed high concern for applicability [16 21 22 24 29 30 32 34 35 36]. Regarding the flow and timing domain, two studies had a high risk of bias as patients did not receive the same reference standard [17 18].

Fig. 2
Grouped bar charts show (A) risk of bias and (B) concerns for applicability of 21 included studies using QUADAS-2. QUADAS-2 = Quality Assessment of Diagnostic Accuracy Studies-2.

3.4 Diagnostic accuracy of PI-RADSv2

The sensitivity and specificity of individual studies were 73–100% and 8–100%, respectively. The Q-test revealed that substantial heterogeneity was present (p < 0.001). The Higgins I² statistics demonstrated substantial heterogeneity in terms of the sensitivity (I² = 85.55%) and considerable heterogeneity in terms of the specificity (I² = 95.30%). The coupled forest plot of the sensitivity and specificity demonstrated the absence of a threshold effect (Fig. 3). The Spearman correlation coefficient between the sensitivity and the false-positive rate was 0.45 (95% confidence interval [CI] 0.023–0.738), also indicating the lack of a threshold effect.
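The two heterogeneity checks described above can be sketched in a few lines. This is a minimal illustration, assuming the standard definition of Higgins I² from Cochran's Q and a rank correlation with average ranks for ties; the function names and the per-study values are hypothetical and are not taken from the included studies.

```python
def i_squared(q, k):
    """Higgins I^2 (in %) from Cochran's Q computed over k studies."""
    df = k - 1
    if q <= 0:
        return 0.0
    return max(0.0, (q - df) / q) * 100.0


def _ranks(values):
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    rx, ry = _ranks(x), _ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (vx * vy) ** 0.5


# Hypothetical per-study values, for illustration only.
sensitivity = [0.78, 0.85, 0.91, 0.88, 0.96]
false_positive_rate = [0.15, 0.40, 0.22, 0.35, 0.60]

print(f"I^2 for Q = 100 over 21 studies: {i_squared(100.0, 21):.1f}%")
print(f"Spearman rho: {spearman(sensitivity, false_positive_rate):.2f}")
```

A strongly positive correlation between sensitivity and false-positive rate would point to a threshold effect, which is why the authors read a modest coefficient as evidence against one.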

Fig. 3
Coupled forest plots of pooled sensitivity and specificity. Numbers are pooled estimates with 95% CI in parentheses. Corresponding heterogeneity statistics are provided at bottom right corners. Horizontal lines indicate 95% CIs. CI = confidence interval.

For all 21 studies combined, the pooled sensitivity was 0.89 (95% CI 0.86–0.92) with specificity of 0.73 (95% CI 0.60–0.83; Fig. 3 ). In the HSROC curve, there was a large difference between the 95% confidence region and the 95% prediction region, thus indicating heterogeneity between the studies ( Fig. 4 ). The area under the HSROC curve was 0.91 (95% CI 0.88–0.93). According to the Deeks’ funnel plot, the likelihood of publication bias was low, with a p value of 0.75 for the slope coefficient ( Fig. 5 ).

Fig. 4
Hierarchical summary receiver operating characteristic curve of the diagnostic performance of PI-RADSv2 for detecting prostate cancer. HSROC = hierarchical summary receiver operating characteristic; PI-RADSv2 = Prostate Imaging Reporting and Data System version 2.

Fig. 5
Deeks’ funnel plot. A p value of 0.75 suggests that the likelihood of publication bias is low.
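The regression behind a Deeks' funnel plot can be sketched as follows. This is a simplified illustration, assuming the usual formulation (log diagnostic odds ratio regressed on the inverse square root of the effective sample size, weighted by effective sample size); the function name, the choice to apply a 0.5 continuity correction only when a cell is zero, and the 2×2 tables are assumptions, not the authors' code or data. The p value for the slope is not computed here.

```python
import math


def deeks_regression(tables):
    """Weighted least-squares fit of ln(DOR) on 1/sqrt(ESS).

    tables: iterable of (tp, fp, fn, tn) counts, one tuple per study.
    Returns (intercept, slope); weights are the effective sample sizes.
    """
    xs, ys, ws = [], [], []
    for tp, fp, fn, tn in tables:
        a, b, c, d = tp, fp, fn, tn
        if 0 in (a, b, c, d):
            # Assumed convention: continuity correction only for zero cells.
            a, b, c, d = (v + 0.5 for v in (a, b, c, d))
        ln_dor = math.log((a * d) / (b * c))
        n_diseased, n_healthy = tp + fn, fp + tn
        ess = 4 * n_diseased * n_healthy / (n_diseased + n_healthy)
        xs.append(1 / math.sqrt(ess))
        ys.append(ln_dor)
        ws.append(ess)

    total_w = sum(ws)
    mean_x = sum(w * x for w, x in zip(ws, xs)) / total_w
    mean_y = sum(w * y for w, y in zip(ws, ys)) / total_w
    sxx = sum(w * (x - mean_x) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - mean_x) * (y - mean_y) for w, x, y in zip(ws, xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Hypothetical 2x2 tables (tp, fp, fn, tn), for illustration only.
studies = [(45, 12, 5, 38), (80, 30, 10, 80), (30, 4, 6, 20), (120, 25, 20, 135)]
intercept, slope = deeks_regression(studies)
print(f"intercept = {intercept:.3f}, slope = {slope:.3f}")
```

A slope close to zero means that small and large studies report similar diagnostic odds ratios, which is the pattern interpreted as a low likelihood of publication bias.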

For the six studies that provided a head-to-head comparison between PI-RADSv1 and PI-RADSv2, PI-RADSv2 demonstrated higher pooled sensitivity of 0.95 (95% CI 0.85–0.98) compared with 0.88 (95% CI 0.80–0.93) for PI-RADSv1 ( p = 0.04). However, the pooled specificity did not show a significant difference between the two versions of PI-RADS: 0.73 (95% CI 0.47–0.89) for v2 and 0.75 (95% CI 0.36–0.94) for v1 ( p = 0.90).

3.5 Heterogeneity exploration

As we found considerable heterogeneity among the included studies, meta-regression analyses were performed (Supplementary Table 1). Among several potential variables, proportion of patients with PCa, magnetic field strength, and reference standard were significant factors affecting the heterogeneity (p < 0.01 for all). However, among these three, only specificity according to the proportion of patients with PCa showed a statistically significant and clinically meaningful difference: 0.65 (95% CI 0.52–0.78) in studies with ≤50% of patients with PCa versus 0.86 (95% CI 0.75–0.97) in studies with >50% of patients with PCa. Otherwise, no clinically meaningful differences were seen: for magnet strength (3 vs 1.5 T), sensitivity of 0.90 (95% CI 0.86–0.94) versus 0.89 (95% CI 0.81–0.97, p = 0.03) and specificity of 0.73 (95% CI 0.59–0.86) versus 0.72 (95% CI 0.44–1.00, p = 0.81); for reference standard (radical prostatectomy vs biopsy), sensitivity of 0.89 (95% CI 0.83–0.95) versus 0.91 (95% CI 0.88–0.94, p < 0.01) and specificity of 0.65 (95% CI 0.37–0.94) versus 0.73 (95% CI 0.58–0.87, p = 0.48). Other variables, including the cutoff value, use of endorectal coil, and type of analysis (per patient vs per lesion), were not significant factors (p = 0.32–0.70).

3.6 Subgroup analysis

As four studies used both ≥3 and ≥4 as cutoff values [17 22 23 29] and five determined both any cancer and clinically significant cancer as outcomes [16 21 24 29 32], multiple subgroup analyses were performed in order to assess various clinical settings (Supplementary Table 2). Regarding cutoff values, the pooled sensitivity was 0.89 (95% CI 0.84–0.92) with specificity of 0.74 (95% CI 0.58–0.85) for 17 studies using ≥4 [16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 35], whereas these were 0.95 (95% CI 0.89–0.97) and 0.47 (95% CI 0.21–0.74) in eight studies using ≥3 [17 22 23 29 32 33 34 36]. When we stratified studies according to the outcome assessed, the results were as follows: (1) cutoff of ≥4 for determining any PCa, sensitivity of 0.89 (95% CI 0.83–0.93) with specificity of 0.80 (95% CI 0.62–0.90); (2) cutoff of ≥3 for determining any PCa, sensitivity of 0.96 (95% CI 0.93–0.98) with specificity of 0.49 (95% CI 0.29–0.70); (3) for determining csPCa regardless of cutoff values, sensitivity of 0.89 (95% CI 0.84–0.92) with specificity of 0.64 (95% CI 0.46–0.78); (4) cutoff of ≥4 for determining csPCa, sensitivity of 0.90 (95% CI 0.85–0.94) with specificity of 0.62 (95% CI 0.45–0.77); and (5) cutoff of ≥3 for determining csPCa, sensitivity of 0.96 (95% CI 0.87–0.99) with specificity of 0.29 (95% CI 0.05–0.77). When the studies using a cutoff value of ≥4 were separately assessed according to the type of analysis, per-patient analysis in eight studies [17 18 20 26 27 29 31 35] yielded pooled sensitivity of 0.89 (95% CI 0.81–0.93) with specificity of 0.76 (95% CI 0.60–0.88), whereas per-lesion analysis in nine studies [16 19 21 22 23 24 25 28 30] yielded pooled sensitivity of 0.87 (95% CI 0.83–0.91) with specificity of 0.70 (95% CI 0.44–0.88).

Based on the localization of PCa, the pooled sensitivity was 0.93 (95% CI 0.87–0.96) with specificity of 0.68 (95% CI 0.43–0.86) in seven studies analyzing PZ cancers [20 24 25 28 30 31 33]. In the same studies, except for that by Stanzione et al [31], analysis of TZ cancers yielded pooled sensitivity and specificity of 0.88 (95% CI 0.77–0.94) and 0.75 (95% CI 0.59–0.86), respectively.

Studies including patients without previous biopsies yielded sensitivity of 0.82 (95% CI 0.72–0.90) and specificity of 0.75 (95% CI 0.65–0.83), whereas the diagnostic performance values were 0.87 (95% CI 0.80–0.92) and 0.71 (95% CI 0.42–0.89) in studies including patients with a history of previous biopsy.

3.7 Discussion

In our meta-analysis, we assessed the diagnostic accuracy of PI-RADSv2 for detecting PCa. The pooled sensitivity and specificity of all 21 studies were 0.89 (95% CI 0.86–0.92) and 0.73 (95% CI 0.60–0.83), respectively. When comparing our data with the only two existing meta-analyses using mpMRI for detecting PCa, a trend toward higher sensitivity and lower specificity can be inferred. In the study by de Rooij et al [37], which evaluated seven studies using a combination of T2WI, DWI, and DCE-MRI, the pooled sensitivity and specificity were 0.74 (95% CI 0.66–0.81) and 0.88 (95% CI 0.82–0.92), respectively. In a more recent meta-analysis by Hamoen et al [6], which analyzed 14 studies using PI-RADSv1, the pooled sensitivity and specificity were 0.78 (95% CI 0.70–0.84) and 0.79 (95% CI 0.68–0.86), respectively. However, the comparison between the three studies provides only an indirect comparison. In order to address this issue, we separately assessed a subgroup of studies using both PI-RADSv1 and PI-RADSv2. In a head-to-head comparison between them, PI-RADSv2 demonstrated higher pooled sensitivity (0.95) compared with PI-RADSv1 (0.88, p = 0.04) without a statistically significant difference in specificity (0.73 vs 0.75, p = 0.90). This increase in sensitivity compared with its predecessor may imply that the revisions undertaken during the development of PI-RADSv2, including the introduction of dominant sequences according to zonal anatomy, a limited contribution of DCE-MRI secondary to DWI and T2WI, and specific guidelines for deriving an integrated overall score, were, in fact, on the right track. In particular, we speculate that the use of dominant sequences, that is, DWI for the PZ and T2WI for the TZ, may have been crucial for the improved sensitivity without a loss in specificity, as suggested by Baur et al [10].

Considering that one of the main intentions for the generation of PI-RADS was to standardize reporting of mpMRI in order to decrease variability and bring about widespread acceptance and implementation in daily practice, it was promising to find that nearly all (20 of 21) studies used PI-RADSv2 strictly according to published guidelines [11]. Only one study formed PI-RADSv2 scores from existing clinical radiological reports that were based on PI-RADSv1 or an in-house scoring system [29]. This is an improvement when compared with prior studies conducted using PI-RADSv1, where investigators used varying methods in determining the overall score (overall five-point score or sum of the scores from each modality) [6]. Still, there is a need for further clarification regarding the cutoff value for detecting PCa. In the studies included in our meta-analysis, cutoff values were predefined in only six studies, while the majority (15/21) were exploratory in nature, testing multiple criteria. When using a cutoff value of ≥4, sensitivity (0.89) and specificity (0.74) were generally good, whereas using ≥3 yielded excellent sensitivity (0.95) and poor specificity (0.47). These results may be taken into consideration when generating the next updated PI-RADS. For instance, the former may be adequate for general use of PI-RADS, whereas the latter could be proposed for settings in which a higher cancer detection rate is clinically required (eg, persistently high PSA level despite a previously negative biopsy).
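The trade-off between the two cutoffs can be made concrete by converting the pooled subgroup estimates quoted above into likelihood ratios and post-test probabilities. This is an illustrative sketch only: the likelihood ratios and post-test probabilities are not reported in the meta-analysis, the function names are hypothetical, and the pretest probability of 0.40 is an arbitrary assumed value.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-) for a test with the given accuracy."""
    lr_positive = sensitivity / (1 - specificity)
    lr_negative = (1 - sensitivity) / specificity
    return lr_positive, lr_negative


def post_test_probability(pretest_probability, likelihood_ratio):
    """Bayes' rule in odds form: post-test odds = pretest odds x LR."""
    odds = pretest_probability / (1 - pretest_probability) * likelihood_ratio
    return odds / (1 + odds)


# Pooled (sensitivity, specificity) per cutoff, as quoted in the text.
POOLED = {">=4": (0.89, 0.74), ">=3": (0.95, 0.47)}

# Assumed pretest probability of PCa; hypothetical, not from the source.
PRETEST = 0.40

for cutoff, (sens, spec) in POOLED.items():
    lr_pos, lr_neg = likelihood_ratios(sens, spec)
    p_pos = post_test_probability(PRETEST, lr_pos)
    p_neg = post_test_probability(PRETEST, lr_neg)
    print(
        f"cutoff {cutoff}: LR+ = {lr_pos:.2f}, LR- = {lr_neg:.2f}, "
        f"P(PCa | positive) = {p_pos:.2f}, P(PCa | negative) = {p_neg:.2f}"
    )
```

Under these assumptions, the lower cutoff gives a smaller negative likelihood ratio, so a negative result rules out cancer more firmly, but its positive likelihood ratio is also smaller, so a positive result is less informative. That is the pattern behind the suggestion to reserve ≥3 for settings where missing a cancer is the main concern.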

In the current study, subgroup analyses were performed to account for differences in outcomes (any cancer vs clinically significant cancer). There was no significant difference between the two outcomes, irrespective of whether the criterion of ≥3 or ≥4 was used. However, the definition of clinically significant cancer differed among the 13 studies. Only three studies defined csPCa strictly according to the PI-RADSv2 guidelines (Gleason score ≥7 [3 + 4], volume >0.5 ml, or extraprostatic extension) [11]. Most others used one or two of the three criteria. Including only the former three studies may have provided more robust results; however, including all available studies was not only pragmatic but also presents a general overview of the existing literature, as this is the first meta-analysis of studies dealing with PI-RADSv2.

In this meta-analysis, we looked into the technical aspects of MRI. Meta-regression analyses revealed that the use of endorectal coil was not a statistically significant factor. Furthermore, although magnet strength showed a statistically significant difference between 3 and 1.5 T, this difference was not clinically meaningful (sensitivity of 0.90 vs 0.89, respectively; p = 0.03). Although there had been debate over these two issues in the past, both 3 and 1.5 T are now well established, and the overall benefit of using an endorectal coil is not evident [38 39]. The PI-RADSv2 guidelines currently recommend either usage, and the results of our study provide additional evidence to support this.

Regarding the methods of analysis in the studies, there was significant heterogeneity regarding reference standard and type of analysis. Radical prostatectomy was the reference standard in five studies, while the majority were based on a combination of systematic and targeted biopsies. The possibility of PCa despite negative biopsy results in the latter group should be kept in mind. In addition, approximately half of the studies reported outcomes in a per-patient manner (n = 11) and the other half in a per-lesion manner (n = 10). Per-lesion analysis is known to take into account the performance of localizing the disease; however, this was not shown to be a significant factor in the meta-regression analysis.

Our meta-analysis had some limitations. Nearly all studies were retrospective in study design, resulting in a high risk of bias for patient selection. It is possible that pooling data from predominantly retrospective studies may have led to increased diagnostic sensitivity [40]. However, not only was performing a meta-analysis using only three prospective studies technically unfeasible, but the derived results would also not be representative of the existing literature on PI-RADSv2. Nevertheless, we used validated methods for the systematic review and reported the data using standard reporting guidelines, including PRISMA and the guidelines of the Handbook for Diagnostic Test Accuracy Reviews published by the Cochrane Collaboration [12 41]. Another limitation is the considerable heterogeneity in our pooled analysis, which affects the general applicability of our summary estimates. To explore the heterogeneity of our data, we performed meta-regression and multiple subgroup analyses. According to the analyses, the proportion of patients with PCa, the magnetic field strength, and the reference standard were significant factors affecting the heterogeneity. In particular, the reference standard included various methods, including radical prostatectomy and a combination of systematic and targeted biopsies (ie, MRI guided, MRI–transrectal ultrasound fusion, or cognitive). Furthermore, the fact that various definitions were used for clinically significant cancer needs to be emphasized. Our meta-regression and subgroup analyses may explain some of the heterogeneity, but a portion remains unexplained. Another important limitation is the small number of included studies for head-to-head comparison between PI-RADSv1 and PI-RADSv2. However, we were able to demonstrate a statistically significant difference in sensitivity between the two versions using the only six such studies published to date.

PI-RADSv2 shows good performance for the detection of PCa with pooled sensitivity of 0.89 and specificity of 0.73. PI-RADSv2 has higher pooled sensitivity compared with PI-RADSv1 without significantly different specificity.

Author contributions: Sang Youn Kim had full access to all the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis.

Study concept and design: Woo, Suh, S.Y. Kim.

Acquisition of data: Woo, Suh, S.Y. Kim.

Analysis and interpretation of data: Woo, Suh, S.Y. Kim.

Drafting of the manuscript: Woo.

Critical revision of the manuscript for important intellectual content: Suh, S.Y. Kim, Cho, S.H. Kim.

Statistical analysis: Suh.

Obtaining funding: None.

Administrative, technical, or material support: None.

Supervision: S.Y. Kim, Cho, S.H. Kim.

Other: None.

Financial disclosures: Sang Youn Kim certifies that all conflicts of interest, including specific financial interests and relationships and affiliations relevant to the subject matter or materials discussed in the manuscript (eg, employment/affiliation, grants or funding, consultancies, honoraria, stock ownership or options, expert testimony, royalties, or patents filed, received, or pending), are the following: None.

Funding/Support and role of the sponsor: None.

  • [1] American Cancer Society. Key statistics for prostate cancer. http://www.cancer.org/cancer/prostatecancer/detailedguide/prostate-cancer-key-statistics
  • [2] C.M. Hoeks, J.O. Barentsz, T. Hambrock, et al. Prostate cancer: multiparametric MR imaging for detection, localization, and staging. Radiology. 2011;261:46-66
  • [3] O. Ruprecht, P. Weisser, B. Bodelle, H. Ackermann, T.J. Vogl. MRI of the prostate: interobserver agreement compared with histopathologic outcome after radical prostatectomy. Eur J Radiol. 2012;81:456-460
  • [4] L. Dickinson, H.U. Ahmed, C. Allen, et al. Scoring systems used for the interpretation and reporting of multiparametric MRI for prostate cancer detection, localization, and characterization: could standardization lead to improved utilization of imaging within the diagnostic pathway? J Magn Reson Imaging. 2013;37:48-58
  • [5] J.O. Barentsz, J. Richenberg, R. Clements, et al. ESUR prostate MR guidelines 2012. Eur Radiol. 2012;22:746-757
  • [6] E.H. Hamoen, M. de Rooij, J.A. Witjes, J.O. Barentsz, M.M. Rovers. Use of the Prostate Imaging Reporting and Data System (PI-RADS) for prostate cancer detection with multiparametric magnetic resonance imaging: a diagnostic meta-analysis. Eur Urol. 2015;67:1112-1121
  • [7] M. Abd-Alazeez, H.U. Ahmed, M. Arya, et al. The accuracy of multiparametric MRI in men with negative biopsy and elevated PSA level – can it rule out clinically significant prostate cancer? Urol Oncol. 2014;32:45.e17-22
  • [8] A.B. Rosenkrantz, S. Kim, R.P. Lim, et al. Prostate cancer localization using multiparametric MR imaging: comparison of Prostate Imaging Reporting and Data System (PI-RADS) and Likert scales. Radiology. 2013;269:482-492
  • [9] B.G. Hansford, Y. Peng, Y. Jiang, et al. Dynamic contrast-enhanced MR imaging curve-type analysis: is it helpful in the differentiation of prostate cancer from healthy peripheral zone? Radiology. 2015;275:448-457
  • [10] A.D. Baur, A. Maxeiner, T. Franiel, et al. Evaluation of the Prostate Imaging Reporting and Data System for the detection of prostate cancer by the results of targeted biopsy of the prostate. Invest Radiol. 2014;49:411-420
  • [11] J.C. Weinreb, J.O. Barentsz, P.L. Choyke, et al. PI-RADS Prostate Imaging-Reporting and Data System: 2015, version 2. Eur Urol. 2016;69:16-40
  • [12] A. Liberati, D.G. Altman, J. Tetzlaff, et al. The PRISMA statement for reporting systematic reviews and meta-analyses of studies that evaluate health care interventions: explanation and elaboration. J Clin Epidemiol. 2009;62:e1-e34
  • [13] P. Whiting, A.W. Rutjes, J.B. Reitsma, P.M. Bossuyt, J. Kleijnen. The development of QUADAS: a tool for the quality assessment of studies of diagnostic accuracy included in systematic reviews. BMC Med Res Methodol. 2003;3:25
  • [14] C.H. Suh, S.H. Park. Successful publication of systematic review and meta-analysis of studies evaluating diagnostic test accuracy. Korean J Radiol. 2016;17:5-6
  • [15] J.J. Deeks, P. Macaskill, L. Irwig. The performance of tests of publication bias and other sample size effects in systematic reviews of diagnostic test accuracy was assessed. J Clin Epidemiol. 2005;58:882-893
  • [16] T. Auer, M. Edlinger, J. Bektic, et al. Performance of PI-RADS version 1 versus version 2 regarding the relation with histopathological results. World J Urol. 2017;35:687-693
  • [17] M. Baldisserotto, E.J. Neto, G. Carvalhal, et al. Validation of PI-RADS v.2 for prostate cancer diagnosis with MRI at 3T using an external phased-array coil. J Magn Reson Imaging. 2016;44:1354-1359
  • [18] P. De Visschere, N. Lumen, P. Ost, K. Decaestecker, E. Pattyn, G. Villeirs. Dynamic contrast-enhanced imaging has limited added value over T2-weighted imaging and diffusion-weighted imaging when using PI-RADSv2 for diagnosis of clinically significant prostate cancer in patients with elevated PSA. Clin Radiol. 2017;72:23-32
  • [19] H.A.E.-K.A. El-Samei, M.F. Amin, E.E. Hassan. Assessment of the accuracy of multi-parametric MRI with PI-RADS 2.0 scoring system in the discrimination of suspicious prostatic focal lesions. Egypt J Radiol Nucl Med. 2016;47:1075-1082
  • [20] Z.Y. Feng, L. Wang, X.D. Min, S.G. Wang, G.P. Wang, J. Cai. Prostate cancer detection with multiparametric magnetic resonance imaging: Prostate Imaging Reporting and Data System version 1 versus version 2. Chin Med J. 2016;129:2451-2459
  • [21] M. Kasel-Seibert, T. Lehmann, R. Aschenbach, et al. Assessment of PI-RADS v2 for the detection of prostate cancer. Eur J Radiol. 2016;85:726-731
  • [22] W.C. Lin, V.F. Muglia, G.E. Silva, S. Chodraui Filho, R.B. Reis, A.C. Westphalen. Multiparametric MRI of the prostate: diagnostic performance and interreader agreement of two scoring systems. Br J Radiol. 2016;89:20151056
  • [23] E. Martorana, G.M. Pirola, M. Scialpi, et al. Lesion volume predicts prostate cancer risk and aggressiveness: validation of its value alone and matched with Prostate Imaging Reporting and Data System score. BJU Int. In press. doi:10.1111/bju.13649
  • [24] F.V. Mertan, M.D. Greer, J.H. Shih, et al. Prospective evaluation of the Prostate Imaging Reporting and Data System version 2 for prostate cancer detection. J Urol. 2016;196:690-696
  • [25] B.G. Muller, J.H. Shih, S. Sankineni, et al. Prostate cancer: interobserver agreement and accuracy with the revised Prostate Imaging Reporting and Data System at multiparametric MR imaging. Radiology. 2015;277:741-750
  • [26] J.J. Park, B.K. Park. Role of PI-RADSv2 with multiparametric MRI in determining who needs active surveillance or definitive treatment according to PRIAS. J Magn Reson Imaging. 2017;45:1753-1759
  • [27] S.Y. Park, D.C. Jung, Y.T. Oh, et al. Prostate cancer: PI-RADS version 2 helps preoperatively predict clinically significant cancers. Radiology. 2016;280:108-116
  • [28] S. Polanec, T.H. Helbich, H. Bickel, et al. Head-to-head comparison of PI-RADS v2 and PI-RADS v1. Eur J Radiol. 2016;85:1125-1131
  • [29] A.R. Rastinehad, N. Waingankar, B. Turkbey, et al. Comparison of multiparametric MRI scoring systems and the impact on cancer detection in patients undergoing MR US fusion guided prostate biopsies. PLoS One. 2015;10:e0143404
  • [30] A.B. Rosenkrantz, J.S. Babb, S.S. Taneja, J.M. Ream. Proposed adjustments to PI-RADS version 2 decision rules: impact on prostate cancer detection. Radiology. In press. doi:10.1148/radiol.2016161124
  • [31] A. Stanzione, M. Imbriaco, S. Cocozza, et al. Biparametric 3T magnetic resonance imaging for prostatic cancer detection in a biopsy-naive patient population: a further improvement of PI-RADS v2? Eur J Radiol. 2016;85:2269-2274
  • [32] N. Tan, W.C. Lin, P. Khoshnoodi, et al. In-Bore 3-T MR-guided transrectal targeted prostate biopsy: Prostate Imaging Reporting and Data System version 2-based diagnostic performance for detection of prostate cancer. Radiology. In press. doi:10.1148/radiol.2016152827
  • [33] S. Tewes, N. Mokov, D. Hartung, et al. Standardized reporting of prostate MRI: comparison of the Prostate Imaging Reporting and Data System (PI-RADS) version 1 and version 2. PLoS One. 2016;11:e0162879
  • [34] S. Washino, T. Okochi, K. Saito, et al. Combination of PI-RADS score and PSA density predicts biopsy outcome in biopsy naive patients. BJU Int. 2017;119:225-233
  • [35] S. Woo, S.Y. Kim, J. Lee, S.H. Kim, J.Y. Cho. PI-RADS version 2 for prediction of pathological downgrading after radical prostatectomy: a preliminary study in patients with biopsy-proven Gleason Score 7 (3 + 4) prostate cancer. Eur Radiol. 2016;26:3580-3587
  • [36] C. Zhao, G. Gao, D. Fang, et al. The efficiency of multiparametric magnetic resonance imaging (mpMRI) using PI-RADS Version 2 in the diagnosis of clinically significant prostate cancer. Clin Imaging. 2016;40:885-888
  • [37] M. de Rooij, E.H. Hamoen, J.J. Futterer, J.O. Barentsz, M.M. Rovers. Accuracy of multiparametric MRI for prostate cancer detection: a meta-analysis. Am J Roentgenol. 2014;202:343-351
  • [38] Z.K. Shah ,S.N. Elias ,R. Abaza ,et al. Performance comparison of 1.5-T endorectal coil MRI with 3.0-T nonendorectal coil MRI in patients with prostate cancer. Acad Radiol. 2015;22 :467-474 Crossref
  • [39] Y. Mazaheri ,H.A. Vargas ,G. Nyman ,O. Akin ,H. Hricak. Image artifacts on prostate diffusion-weighted magnetic resonance imaging: trade-offs at 1.5 Tesla and 3.0 Tesla. Acad Radiol. 2013;20 :1041-1047 Crossref
  • [40] J.J. Deeks. Systematic reviews in health care: systematic reviews of evaluations of diagnostic and screening tests. BMJ. 2001;323 :157-162 Crossref
  • [41] J.J.W.S. Deeks ,C. Davenport. Guide to the contents of a Cochrane diagnostic test accuracy protocol. J.J. Deeks, P.M. Bossuyt, C. Gatsonis. Cochrane handbook for systematic reviews of diagnostic test accuracy version 1.0.0. Chapter 4. (The Cochrane Collaboration, 2013) http://srdta.cochrane.org/

Prostate cancer (PCa) is the second leading cause of cancer-related mortality in Western men [1] . With recent technological advancements and growing availability, multiparametric magnetic resonance imaging (mpMRI) is increasingly being used to guide several aspects of PCa management, including detection, staging, and treatment planning [2] . Despite abundant evidence in the literature reporting high accuracy of mpMRI for PCa diagnosis, widespread acceptance has been hampered by several factors, including difficulty of interpretation, lack of standardized criteria for interpretation (ie, use of Likert scales based on the radiologist's subjective level of suspicion for PCa), and the resulting substantial inter-reader variability [3 4] .

To bring standardization to the evaluation and reporting of mpMRI of the prostate, the European Society of Urogenital Radiology (ESUR) published a guideline termed Prostate Imaging Reporting and Data System (PI-RADS) in 2012 [5] . PI-RADS was generated based on expert consensus and provides a detailed scoring system for each MRI sequence (T2-weighted imaging [T2WI], diffusion-weighted imaging [DWI], dynamic contrast-enhanced MRI [DCE-MRI], and MR spectroscopy) for the presence of clinically significant PCa (csPCa). Several investigators have validated the accuracy and reproducibility of PI-RADS, and a recent meta-analysis reported pooled sensitivity and specificity of 0.78 and 0.79, respectively [6] . However, as there was no guideline for the generation of an overall score, different research groups utilized various measures for this purpose—some used a sum of the scores from each sequence (ranging from 3 to 15), whereas others used an overall score of 1–5 [7 8] . Furthermore, emerging data questioned the value of curve-type analysis of DCE-MRI [9] . In addition, investigators suggested that some sequences may be more important in determining the likelihood of PCa (ie, DWI in the peripheral zone [PZ] and T2WI in the transition zone [TZ]) rather than equal weighting for all sequences [10] .

To address these issues, the ESUR and American College of Radiology recently released the updated PI-RADS version 2 (PI-RADSv2) [11] . The main changes from PI-RADSv1 to PI-RADSv2 are the following: (1) introduction of dominant sequences according to zonal anatomy (DWI for the PZ and T2WI for the TZ), (2) limited contribution of DCE-MRI data, reduced to merely the presence or absence of early focal enhancement, and (3) generation of an overall score (1–5) integrating findings across all MRI sequences.

Since then, several studies assessing the value of PI-RADSv2 have been published. However, the diagnostic performance of this new scoring system has not been evaluated systematically. Therefore, the purpose of our study was to assess the diagnostic performance of PI-RADSv2 for the detection of PCa. In addition, we aimed to compare the diagnostic performance of PI-RADSv1 and PI-RADSv2 in studies available for head-to-head comparison.

This meta-analysis was performed and written according to the preferred reporting items for systematic reviews and meta-analyses (PRISMA) guidelines [12] .

2.1 Literature search

A computerized search of MEDLINE and EMBASE up to December 7, 2016, was performed in order to identify studies evaluating the diagnostic performance of PI-RADSv2 for the detection of PCa. The search query combined synonyms for PCa, MRI, and PI-RADS as follows: (prostate cancer OR prostatic cancer OR prostate neoplasm OR prostatic neoplasm OR prostate tumor OR prostatic tumor OR prostate carcinoma OR prostatic carcinoma OR PCa) AND (magnetic resonance imaging OR MRI OR MR) AND (prostate imaging reporting and data system OR pi-rads OR pi rads OR pirads). Bibliographies of identified articles were also screened to expand the scope of search. Our search was limited to publications in English.
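The structure of the search query above (synonyms joined with OR inside each group, groups joined with AND) can be sketched in a few lines of Python. This is only an illustration of how the published string is composed; the helper names `or_group` and `build_query` are hypothetical, and database-specific field tags or truncation syntax are not modeled.

```python
# Synonym groups copied from the search strategy described in the text.
PCA_TERMS = [
    "prostate cancer", "prostatic cancer", "prostate neoplasm",
    "prostatic neoplasm", "prostate tumor", "prostatic tumor",
    "prostate carcinoma", "prostatic carcinoma", "PCa",
]
MRI_TERMS = ["magnetic resonance imaging", "MRI", "MR"]
PIRADS_TERMS = [
    "prostate imaging reporting and data system",
    "pi-rads", "pi rads", "pirads",
]


def or_group(terms):
    """Join synonyms with OR and wrap the group in parentheses."""
    return "(" + " OR ".join(terms) + ")"


def build_query(*groups):
    """Combine the OR-groups with AND (hypothetical helper)."""
    return " AND ".join(or_group(group) for group in groups)


query = build_query(PCA_TERMS, MRI_TERMS, PIRADS_TERMS)
print(query)
```

Keeping each concept in its own list makes it easy to see which synonyms were searched and to rerun the same strategy at a later date.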

2.2 Study selection

2.2.1 Inclusion criteria

Studies were included if they satisfied all the following requirements according to the PICOS criteria [12] : (1) included patients with suspected or diagnosed PCa; (2) for index test, mpMRI of the prostate including all required sequences of T2WI, DWI, and DCE-MRI was performed and assessed with a PI-RADSv2 scoring system; (3) for comparison, a reference standard based on the histopathological examination of radical prostatectomy or biopsy was used; (4) results were reported in sufficient detail for the reconstruction of 2 × 2 tables and determination of sensitivity and specificity at specified cutoff values for evaluating the diagnostic performance of PI-RADSv2; and (5) studies had to be original articles.
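Criterion (4) requires that a 2 × 2 table can be rebuilt at a stated cutoff. As a minimal sketch, assuming per-lesion PI-RADSv2 scores and a binary reference-standard result are available, the table and the resulting sensitivity and specificity can be derived as follows. The scores and outcomes below are invented purely for illustration and do not come from any included study.

```python
def two_by_two(scores, has_cancer, cutoff):
    """Cross-tabulate PI-RADSv2 scores (1-5) against the reference standard.

    A score >= cutoff counts as a positive index test.
    Returns (tp, fp, fn, tn).
    """
    tp = fp = fn = tn = 0
    for score, cancer in zip(scores, has_cancer):
        positive = score >= cutoff
        if positive and cancer:
            tp += 1
        elif positive and not cancer:
            fp += 1
        elif not positive and cancer:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def sensitivity_specificity(tp, fp, fn, tn):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical lesions: PI-RADSv2 score and histopathology result.
scores = [5, 4, 4, 3, 3, 2, 2, 1, 4, 3]
has_cancer = [True, True, True, True, False, False, False, False, False, True]

for cutoff in (4, 3):
    table = two_by_two(scores, has_cancer, cutoff)
    sens, spec = sensitivity_specificity(*table)
    print(f"cutoff >= {cutoff}: table={table}, sens={sens:.2f}, spec={spec:.2f}")
```

In this toy data set, lowering the cutoff from ≥4 to ≥3 raises sensitivity and lowers specificity, which is why the cutoff must be fixed before a study's results can be pooled.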

2.2.2 Exclusion criteria

Studies were excluded if any of the following criteria were met: (1) studies involving <10 patients; (2) review articles, guidelines, consensus statements, letters, editorials, and conference abstracts; (3) studies using only PI-RADSv1 for the evaluation of mpMRI of the prostate; (4) studies focusing on topics other than using the PI-RADSv2 system for diagnosing PCa (eg, staging and prediction of biochemical recurrence); and (5) studies with overlapping patient populations.

Two reviewers (S.W. and C.H.S., with 3 yr of experience in performing systematic reviews and meta-analyses) independently evaluated the eligibility of the selected studies from the literature. Disagreements, if present between the two reviewers, were resolved by consensus via discussion with a third reviewer (S.Y.K.).

2.3 Data extraction and quality assessment

We extracted the following data regarding study design and results from the selected studies using a standardized form:

  1. Study characteristics: authors, year of publication, country and institution of origin, duration of patient recruitment, and study design (prospective vs retrospective and consecutive or not)

  2. Demographic and clinical characteristics: sample size, number of patients with PCa, patient age, prostate-specific antigen (PSA) level and Gleason score, number of previous biopsies, and PCa diagnosis prior to mpMRI

  3. Technical characteristics of mpMRI: scanner model and manufacturer, magnetic field strength (1.5 vs 3 T), coil type (endorectal vs pelvic phased array), and specific sequences used (T2WI, DWI, DCE, or MR spectroscopy)

  4. Interpretation of mpMRI: number of reviewers and experience in prostate mpMRI, independent or consensus reading, and blinding to clinicopathological information

  5. Reference standard: type of reference standard (radical prostatectomy, targeted biopsy, or systematic biopsy), interval between MRI and pathology, outcomes assessed (any PCa vs csPCa), definition of csPCa (studies assessing “clinically significant”, “aggressive”, or “high-grade” PCa were all considered to assess csPCa; however, only studies that used the definition provided by the PI-RADSv2 guideline [Gleason score ≥7 [3 + 4], volume ≥0.5 ml, or extraprostatic extension] were considered not to have concern for applicability), separate analysis for the PZ and TZ, and type of analysis (per patient vs per lesion)

  6. Diagnostic performance of PI-RADSv2, including criteria or cutoff values (in case of multiple readers, the results of the most experienced reader were extracted for this meta-analysis)

The methodological quality of the included studies was assessed using tailored questionnaires and criteria provided by Quality Assessment of Diagnostic Accuracy Studies-2 [13] . Data extraction and quality assessment were performed independently by two reviewers (S.W. and C.H.S.). All disagreements were resolved by consensus through discussion with the third reviewer (S.Y.K.).

2.4 Data synthesis and analysis

The diagnostic performance of PI-RADSv2 for the detection of PCa was the primary outcome for this meta-analysis. In addition, a comparison between the diagnostic performance of PI-RADSv2 and that of PI-RADSv1 using studies that reported head-to-head comparison data of the two PI-RADS versions was considered a secondary outcome.

Pooled estimates of sensitivity and specificity were calculated using hierarchical logistic regression modeling including bivariate modeling and hierarchical summary receiver operating characteristic (HSROC) modeling [14] . For graphical presentation of the results, an HSROC curve with 95% confidence region and prediction region was plotted. Publication bias was evaluated using the Deeks’ funnel plot, and statistical significance was tested with the Deeks’ asymmetry test [15] .
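For readers unfamiliar with Deeks' approach, the regression behind the funnel plot can be sketched as below. The sketch assumes the commonly described formulation: the log diagnostic odds ratio of each study is regressed on the inverse square root of its effective sample size (ESS), weighted by ESS, and a slope far from zero suggests small-study effects. The 0.5 continuity correction, the function name `deeks_regression`, and the example counts are assumptions for illustration; the p value reported by the "midas" module is not reproduced here.

```python
import math


def deeks_regression(studies):
    """Weighted regression of ln(DOR) on 1/sqrt(ESS).

    studies: iterable of (tp, fp, fn, tn) tuples.
    Returns (intercept, slope). Requires at least two distinct ESS values.
    """
    xs, ys, ws = [], [], []
    for tp, fp, fn, tn in studies:
        if 0 in (tp, fp, fn, tn):
            # Assumed convention: add 0.5 to every cell if any cell is zero.
            tp, fp, fn, tn = (v + 0.5 for v in (tp, fp, fn, tn))
        n_diseased, n_healthy = tp + fn, fp + tn
        ess = 4 * n_diseased * n_healthy / (n_diseased + n_healthy)
        xs.append(1 / math.sqrt(ess))
        ys.append(math.log((tp * tn) / (fp * fn)))
        ws.append(ess)

    w_sum = sum(ws)
    x_bar = sum(w * x for w, x in zip(ws, xs)) / w_sum
    y_bar = sum(w * y for w, y in zip(ws, ys)) / w_sum
    sxx = sum(w * (x - x_bar) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - x_bar) * (y - y_bar) for w, x, y in zip(ws, xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# Hypothetical studies with identical accuracy but different sizes:
# the slope should be (numerically) zero, i.e. no funnel asymmetry.
example = [(90, 20, 10, 80), (45, 10, 5, 40), (180, 40, 20, 160)]
intercept, slope = deeks_regression(example)
print(f"intercept={intercept:.3f}, slope={slope:.3e}")
```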

We performed meta-regression analyses to investigate the cause of heterogeneity. The following covariates were considered for the bivariate model: (1) proportion of patients with PCa (>50% vs ≤50%), (2) magnet strength of MRI (3 vs 1.5 T), (3) use of endorectal coil, (4) cutoff value (≥4 vs ≥3), (5) reference standard (radical prostatectomy vs biopsy), and (6) type of analysis (per patient vs per lesion). In addition, multiple subgroup analyses were performed for cutoff value, outcome, and previous biopsy history to assess various clinical settings: (1) a cutoff value of ≥4 for all studies, (2) a cutoff value of ≥3 for all studies, (3) a cutoff value of ≥4 for determining any PCa, (4) a cutoff value of ≥3 for determining any PCa, (5) a cutoff value of ≥4 for determining csPCa, (6) a cutoff value of ≥3 for determining csPCa, (7) a cutoff value of ≥4 in studies using per-patient analysis, (8) a cutoff value of ≥4 in studies using per-lesion analysis, (9) studies analyzing PZ PCa, (10) studies analyzing TZ PCa, (11) patients without previous biopsies, and (12) patients with previous biopsies. The “metandi” and “midas” modules in Stata 10.0 (StataCorp LP, College Station, TX, USA) and “mada” package in R software version 3.2.1 (R Foundation for Statistical Computing, Vienna, Austria) were used for statistical analyses, with p < 0.05 signifying statistical significance.

3.1 Literature search

A systematic literature search initially identified 287 articles. After removing 46 duplicates, screening of the 241 titles and abstracts yielded 105 potentially eligible articles. Full-text reviews were performed, and 84 studies were excluded for the following reasons: not in the field of interest ( n = 80, including 68 studies that used only PI-RADSv1), insufficient data to reconstruct 2 × 2 tables ( n = 2), and shared study population with other studies ( n = 2). Ultimately, 21 original articles including a total of 3857 patients assessing the diagnostic performance of PI-RADSv2 were included in the meta-analysis [16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36] . No additional studies were identified via screening the bibliographies of these 21 studies. Among them, 15 studies including 3099 patients dealt with PI-RADSv2 alone, whereas six studies including 758 patients provided a head-to-head comparison between PI-RADSv1 and PI-RADSv2 [16 20 21 28 32 33] . The detailed study selection process is described in Fig. 1 .
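The counts in this selection process can be checked with simple arithmetic. The short script below only restates the numbers reported above and verifies that they are internally consistent; the variable names are illustrative.

```python
# Counts as reported in the study selection paragraph.
identified = 287
duplicates = 46
screened = identified - duplicates          # titles and abstracts screened

full_text_reviewed = 105
excluded = {
    "not in the field of interest": 80,
    "insufficient data for 2 x 2 tables": 2,
    "shared study population": 2,
}
included = full_text_reviewed - sum(excluded.values())

# (number of studies, number of patients) per group of included studies.
v2_only = (15, 3099)
head_to_head = (6, 758)
total_studies = v2_only[0] + head_to_head[0]
total_patients = v2_only[1] + head_to_head[1]

print(screened, sum(excluded.values()), included, total_studies, total_patients)
```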

Fig. 1
PRISMA flow diagram showing study selection process for meta-analysis. a Included original articles for qualitative and quantitative analyses are references [16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36] . MRI = magnetic resonance imaging; PI-RADS = Prostate Imaging Reporting and Data System; PRISMA = Preferred Reporting Items for Systematic Reviews and Meta-analyses.

3.2 Characteristics of included studies

Patient characteristics are shown in Table 1 . The size of the study population ranged from 49 to 456 patients, with the percentage of those with PCa ranging from 37% to 100%. The patients had a median age of 62–69.6 yr, median PSA of 3.97–15 ng/ml, and a Gleason score ranging from 5 to 10. Patients had already been diagnosed with PCa prior to MRI in all or some of the study populations in seven studies [16 17 22 26 27 32 35] . Biopsy was performed before MRI in seven studies [16 21 22 23 26 27 35] , all patients were biopsy-naïve in three studies [18 25 34] , both patient types were included in three studies [28 32 33] , and data regarding previous biopsy were not reported in the remaining eight studies.

First author (year) | Country | Institution | Duration of patient recruitment | Patients ( n ) | Patients with PCa ( n ) | Age, median (yr) | Age, range (yr) | PSA, median (ng/ml) | PSA, range (ng/ml) | Gleason score, median | Gleason score, range | No. of previous biopsies | PCa diagnosis before MRI
Auer (2016) [16] | Austria | Medical University of Innsbruck, Medizinische Hochschule Hannover | NR | 50 | NR | 63 a | NR | 7.3 a | NR | 7 (3 + 4) | 6–10 | ≥1 | Yes (all)
Baldisserotto (2016) [17] | Brazil | Pontifícia Universidade Católica do Rio Grande do Sul | 2013.6–2015.6 | 54 | 33 | 65.9 a | 53–81 | 8.4 a | 3–31 | 7 (3 + 4) | 6–9 | NR | Yes (some)
De Visschere (2017) [18] | Belgium | Ghent University Hospital | 2011.5–2014.12 | 245 | 144 | 66 | 44–85 | 9 | 1.4–935.5 | 7 (3 + 4) | ≥6 | 0 | No
El-Samei (2016) [19] | Egypt | Al-Azher University, El-Minia University | 2014.5–2015.10 | 55 | 38 | 62 a | 51–79 | NR | NR | 7 | 6–10 | NR | No
Feng (2016) [20] | China | Tongji Hospital | 2013.6–2015.7 | 401 | 150 | 64.4 a | 34–88 | 10.7 | 0.2–1763 | 7 (4 + 3) | ≤6–≥8 | NR | NR
Kasel-Seibert (2016) [21] | Germany | University Hospital Jena | 2013.7–2015.3 | 82 | 31 | 65 | 48–81 | 13 | 1–111 | 7 (3 + 4) | 6–9 | 1–5 | No
Lin (2016) [22] | Brazil, Taiwan | Ribeirão Preto School of Medicine, China Medical University, Ribeirão Preto School of Medicine | 2011.5–2014.6 | 49 | 49 | 63 | 46–73 | 13.27 | 1.8–41.4 | | | ≥1 | Yes (all)
Martorana (2016) [23] | Italy | University of Modena and Reggio Emilia, Perugia University | 2014.1–2016.2 | 157 | 79 | 65 | 47–79 b | 10.7 | 1–75 | 6 | 6–8 | 1–3 | No
Mertan (2016) [24] | USA | National Cancer Institute | 2015.3–2015.9 | 62 | 38 | 65.5 | 50.3–76.6 | 7.1 | 0.5–863 | 7 (3 + 4) | 6–9 | NR | NR
Muller (2015) [25] | USA, the Netherlands | National Cancer Institute, AMC University Hospital, Edward Hébert School of Medicine | 2011.12–2014.5 | 94 | 94 | 62 | 37–79 | 8.51 | 0.7–51.1 | 7 (3 + 4) | 6–10 | 0 | No
Park (2016) [26] | South Korea | Samsung Medical Center | 2012.1–2014.12 | 456 | 456 | 65 | 46–81 | 3.97 | 3.8–4.2 | NR | ≥6 | ≥1 | Yes (all)
Park (2016) [27] | South Korea | Yonsei University College of Medicine | 2012.1–2013.3 | 425 | 425 | NR | NR | NR | 1.4–156.9 | 5–6 | 5–10 | ≥1 | Yes (all)
Polanec (2016) [28] | Austria | Medical University of Vienna, Confraternität Vienna | 2011.6–2015.9 | 65 | NR | 65.3 a | 62.3–87.4 | 10.8 a | 4.2–74.5 | 7 (4 + 3) | 6–9 | ≥0 | No
Rastinehad (2015) [29] | USA | Icahn School of Medicine at Mount Sinai, Fox Chase Cancer Center, National Institutes of Health, Hofstra North Shore LIJ School of Medicine | 2012.2–2014.11 | 312 | 202 | 65.1 | 60.3–70.3 b | 7.3 | 5.0–11.4 b | NR | NR | NR | No
Rosenkrantz (2016) [30] | USA | NYU Langone Medical Center | 2013.9–2015.2 | 343 | 134 | 64 | NR | 5.8 | NR | 7 (3 + 4) | 6–9 | NR | NR
Stanzione (2016) [31] | Italy | University “Federico II”, Ospedale S. Maria delle Grazie | NR | 82 | 34 | 65 a | 43–84 | 8.8 a | NR | 7 (3 + 4) | 6–9 | NR | No
Tan (2016) [32] | USA, Taiwan | UCLA, Cathay General Hospital, Hoag Hospital | 2013.3–2016.12 | 106 | 63 | 66.5 | 43–79 | 7.9 | 5.6–10.6 | 7 (3 + 4) | 6–10 | ≥0 | Yes (some)
Tewes (2016) [33] | Germany | Hannover Medical School, Klinikum der Region Hannover | 2012.12–2014.12 | 54 | 31 | 69.6 a | NR | 8.7 a | NR | 6 | 6–9 | ≥0 | No
Washino (2017) [34] | Japan | Jichi Medical University Saitama Medical Center | 2010.6–2014.4 | 288 | 159 | 69 | 64–74 b | 7.5 | 5.5–11 b | NR | NR | 0 | No
Woo (2016) [35] | South Korea | Seoul National University College of Medicine | 2011.1–2013.12 | 105 | 105 | 69 | 49–79 | 8.22 | 0.9–44.2 | 7 | 6–7 | ≥1 | Yes (all)
Zhao (2016) [36] | China | Peking University First Hospital | 2010.11–2013.12 | 372 | 185 | 68.5 a | NR | 15 a | NR | 7 | 6–10 | NR | No

a Mean.

b Interquartile range.

MRI = magnetic resonance imaging; NR = not reported; PCa = prostate cancer; PSA = prostate-specific antigen.

Table 1. Patient characteristics

Characteristics of the studies are summarized in Table 2 . MRI was performed using 3-T scanners in 16 studies [16 17 18 20 24 25 26 27 28 29 30 31 32 33 35 36] , 1.5-T scanners in four studies [19 21 22 23] , and either 3 or 1.5 T in one study [34] . Endorectal coils were used in seven studies [19 21 22 23 24 25 29] . In all studies, the mpMRI protocol consisted of T2WI, DWI, and DCE-MRI. The reference standard was radical prostatectomy in five studies [16 22 26 27 35] , a combination of systematic and targeted biopsies in seven studies [17 20 23 25 29 34 36] , and only targeted biopsy in seven studies [19 21 24 28 30 32 33] ; the reference standard was not consistent throughout the study population in two studies [18 31] . PI-RADSv2 scoring was performed by one to five radiologists, either in consensus or independently. The radiologists' level of experience in prostate imaging was heterogeneous, ranging from 4 to 22 yr. In most studies, the readers were blinded; however, in three studies, the radiologists were aware that the patients had biopsy-proven PCa [22 26 35] , and five studies were not explicit regarding blinding [18 19 30 32 36] . In the majority of the studies, the interval between MRI and the reference standard was less than 6 mo; however, the details were not reported in 10 studies [16 17 18 24 27 28 29 30 32 33] . PCa was separately assessed according to zonal anatomy in seven studies [20 24 25 28 30 31 33] . However, in one study [31] , only PCa in the PZ could be evaluated, as no detailed data were provided in the article and the attempt to contact the authors for further information was unsuccessful; the location of PCa was therefore separately reported for both the PZ and TZ in six studies [20 24 25 28 30 33] . Regarding the outcome assessed, eight studies evaluated any cancer [17 19 20 23 25 28 31 33] , eight evaluated clinically significant cancer [18 22 26 27 30 34 35 36] , and five evaluated both [16 21 24 29 32] . With regard to cutoff values, 13 studies used ≥4 [16 18 19 20 21 24 25 26 27 28 30 31 35] , four studies used ≥3 [32 33 34 36] , and four studies used both [17 22 23 29] .

First author (year) | Design | Consecutive enrollment | Reference standard | MRI–reference standard interval | No. of readers (reading) | Experience (yr) | Blinding | Magnet strength (T) | Vendor | Model | Endorectal coil | PI-RADSv2 application | Cutoff values | Localization | Type of analysis | Outcome assessed | Definition of csPCa
Auer (2016) [16] | Retrospective | Yes | RP | NR | 1 | >5 | Yes | 3 | Siemens | Skyra | No | Strict | 4 | Whole | Lesion | Any + csPCa | GS ≥7 (4 + 3)
Baldisserotto (2016) [17] | Retrospective | Yes | STRUSGB + targeted MRI–TRUS biopsy (cognitive) or RP | NR | 2 (independent) | >10/1 | Yes | 3 | GE | Signa HDxt | No | Strict | 3, 4 | Whole | Patient | Any |
De Visschere (2017) [18] | Retrospective | NR | STRUSGB or RP | NR | NR | NR | NR | 3 | Siemens | Trio | No | Strict | 4 | Whole | Patient | csPCa | GS ≥7, ≥0.5 cc, or EPE
El-Samei (2016) [19] | NR | NR | Targeted MRI–TRUS biopsy | ≤2 wk | NR | NR | NR | 1.5 | Philips | Gyroscan | Yes | Strict | 4 | Whole | Lesion | Any |
Feng (2016) [20] | Retrospective | Yes | STRUSGB + targeted MRI–TRUS biopsy (cognitive) | 1–35 d | 2 (consensus) | 5/4 | Yes | 3 | Siemens | Skyra | No | Strict | 4 | PZ, TZ | Patient | Any |
Kasel-Seibert (2016) [21] | Retrospective | NR | Targeted MRGB | ≤7 d | 2 (independent) | 10/<1 | Yes | 1.5 | Siemens | Avanto | Yes | Strict | 4 | Whole | Lesion | Any + csPCa | GS ≥7
Lin (2016) [22] | Retrospective | NR | RP | ≤6 mo | 2 (independent) | 12/5 | Yes a | 1.5 | Philips | Achieva | Yes | Strict | 3, 4 | Whole | Lesion | csPCa | GS ≥7 and >0.5 cc
Martorana (2016) [23] | Retrospective | Yes | TTB + targeted MRI–TRUS biopsy | 30–78 | 2 (consensus) | 4/4 | Yes | 1.5 | Philips | Achieva | Yes | Strict | 3, 4 | Whole | Lesion | Any | GS ≥7, >0.5 cc, or EPE
Mertan (2016) [24] | Prospective | Yes | STRUSGB + targeted MRI–TRUS biopsy | NR | 1 | >8 | Yes | 3 | Philips | Achieva | Yes | Strict | 4 | PZ, TZ | Lesion | Any + csPCa | GS ≥7
Muller (2015) [25] | Retrospective | Yes | STRUSGB + targeted MRI–TRUS biopsy | ≤6 wk | 5 (independent) | 12/7/1/1/0.5 | Yes | 3 | Philips | Achieva | Yes | Strict | 4 | PZ, TZ | Lesion | Any | GS ≥7 (4 + 3)
Park (2016) [26] | Retrospective | Yes | RP | 21–48 | 2 (independent) | 14/3 | Yes a | 3 | Philips | Achieva | No | Strict | 4 | Whole | Patient | csPCa | GS ≥7, ≥0.5 cc, or EPE
Park (2016) [27] | Retrospective | NR | RP | NR | 2 (independent) | 9/4 | Yes | 3 | GE, Philips, Siemens | Discovery MR750, Achieva, Trio | No | Strict | 4 | Whole | Patient | csPCa | GS ≥7, ≥0.5 cc, or EPE
Polanec (2016) [28] | Retrospective | Yes | Targeted MRGB | NR | 2 (independent) | NR/NR | Yes | 3 | Siemens | Trio | No | Strict | 4 | PZ, TZ | Lesion | Any |
Rastinehad (2015) [29] | Prospective | NR | STRUSGB + targeted MRI–TRUS biopsy | NR | 3 (consensus) | NR | Yes | 3 | Siemens | Verio | Yes | Calculated b | 3, 4 | Whole | Patient | Any + csPCa | Epstein's criteria c for systematic and GS ≥7 or >0.5 cc for targeted biopsy
Rosenkrantz (2016) [30] | Retrospective | NR | Targeted MRI–TRUS biopsy | NR | 2 (independent) | 8/3 | NR | 3 | Siemens | Biograph mMR/Prisma/Skyra/Trio | No | Strict | 4 | PZ, TZ | Lesion | csPCa | GS ≥7
Stanzione (2016) [31] | Prospective | NR | STRUSGB (±targeted TRUSGB) | 20–30 d | 2 (independent) | 14/10 | Yes | 3 | Siemens | Trio | No | Strict | 4 | PZ | Patient | Any |
Tan (2016) [32] | Retrospective | Yes | Targeted MRGB | NR | 3 (consensus) | 16/10 | NR | 3 | Siemens | Skyra, Trio, Verio | No | Strict | 3 | Whole | Lesion | Any + csPCa | GS ≥7
Tewes (2016) [33] | Retrospective | Yes | Targeted MRGB | NR | 2 (independent) | 5/2 | Yes | 3 | Siemens | Skyra | No | Strict | 3 d | PZ, TZ | Patient | Any |
Washino (2017) [34] | Retrospective | NR | TTB + targeted MRI–TRUS biopsy (cognitive) | 0.5–1.6 mo | 1 | 14 | Yes | 1.5 or 3 | Toshiba | Excelart Vantage/Vantage Titan 3T | No | Strict | 3 | Whole | Patient | csPCa | GS ≥7 or maximum core length ≥4 mm
Woo (2016) [35] | Retrospective | NR | RP | ≤6 mo | 2 (consensus) | 22/10 | Yes a | 3 | Philips, Siemens | Ingenia, Verio/Trio | No | Strict | 4 | Whole | Patient | csPCa | GS ≥7
Zhao (2016) [36] | Retrospective | NR | STRUSGB + targeted MRI–TRUS biopsy | ≤3 mo | 2 (independent) | NR | NR | 3 | NR | NR | No | Strict | 3 | Whole | Patient | csPCa | GS ≥7

a Blinded but aware that patients had biopsy-proven PCa.

b PI-RADSv2 scores were generated from reports based on PI-RADSv1 and simplified qualitative system.

c Epstein's criteria = Gleason pattern ≥4, or Gleason 3 + 3 disease with core length ≥50% and/or >2 cores positive.

d For transition zone, cutoff value = 4.

csPCa = clinically significant prostate cancer; GS = Gleason score; EPE = extraprostatic extension; MRGB = magnetic resonance imaging-guided biopsy; MRI = magnetic resonance imaging; MRI–TRUS = fusion of magnetic resonance imaging and transrectal ultrasound images; NR = not reported; PCa = prostate cancer; PI-RADSv2 = Prostate Imaging Reporting and Data System version 2; PZ = peripheral zone; RP = radical prostatectomy; STRUSGB = systematic transrectal ultrasound-guided biopsy; TTB = transperineal template biopsy; TZ = transition zone.

Table 2. Study characteristics

3.3 Quality assessment

Overall, the quality of the studies was not considered high, mainly due to the patient selection domain ( Fig. 2 ). Regarding the patient selection domain, there was generally a high risk of bias as all but four of the studies were retrospective in nature [16 17 18 20 21 22 23 25 26 27 28 30 32 33 34 35 36] . Seven studies were considered to have high concern for applicability, as all or some of the patients had a pathological diagnosis of PCa prior to MRI [16 17 22 26 27 32 35] . Regarding the index test domain, there was a high risk of bias in nine studies. In three of these nine studies, reviewers were aware that patients had biopsy-proven PCa [22 26 35] . In the other six studies, the cutoff value for determining PCa was not specified prior to interpretation [16 20 29 31 33 34] . Only one study had concern for applicability, as PI-RADSv2 scores were indirectly generated from existing clinical radiological reports based on PI-RADSv1 or an in-house scoring system [29] . Regarding the reference standard domain, eight studies had a high risk of bias. Seven were based on only systematic biopsy or only targeted biopsy [18 19 21 28 30 32 33] ; in one study, targeted biopsy was performed, but on lesions that were suspicious on ultrasonography rather than on MRI [31] . Studies in which radical prostatectomy or a systematic plus targeted biopsy (MRI guided, MRI–transrectal ultrasound fusion, or cognitive) was used as the reference standard were considered to have a low risk of bias. In 10 studies, the definition of clinically significant cancer did not conform to that described in the PI-RADSv2 guidelines, and these studies therefore showed high concern for applicability [16 21 22 24 29 30 32 34 35 36] . Regarding the flow and timing domain, two studies had a high risk of bias as patients did not receive the same reference standard [17 18] .

Fig. 2
Grouped bar charts show (A) risk of bias and (B) concerns for applicability of 21 included studies using QUADAS-2. QUADAS-2 = Quality Assessment of Diagnostic Accuracy Studies-2.

3.4 Diagnostic accuracy of PI-RADSv2

The sensitivity and specificity of individual studies were 73–100% and 8–100%, respectively. The Q-test revealed substantial heterogeneity ( p < 0.001). The Higgins I² statistic demonstrated substantial heterogeneity in sensitivity ( I² = 85.55%) and considerable heterogeneity in specificity ( I² = 95.30%). The coupled forest plot of the sensitivity and specificity demonstrated the absence of a threshold effect ( Fig. 3 ). The Spearman correlation coefficient between the sensitivity and the false-positive rate was 0.45 (95% confidence interval [CI] 0.023–0.738), also indicating the lack of a threshold effect.
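The two diagnostics used here can be illustrated with a short sketch. It assumes a simple univariate formulation: Cochran's Q and Higgins' I² computed on logit-transformed proportions with inverse-variance weights, and a Spearman rank correlation between sensitivity and false-positive rate, where a strong positive correlation would point to a threshold effect. The continuity correction, the function names, and the example inputs are assumptions for illustration; this is not the exact computation performed by the Stata or R modules used in the analysis.

```python
import math


def q_and_i_squared(events, totals):
    """Cochran's Q and Higgins' I^2 (%) for proportions on the logit scale."""
    thetas, weights = [], []
    for x, n in zip(events, totals):
        x_adj, n_adj = x + 0.5, n + 1.0  # assumed correction, keeps logits finite
        thetas.append(math.log(x_adj / (n_adj - x_adj)))
        weights.append(1.0 / (1.0 / x_adj + 1.0 / (n_adj - x_adj)))
    pooled = sum(w * t for w, t in zip(weights, thetas)) / sum(weights)
    q = sum(w * (t - pooled) ** 2 for w, t in zip(weights, thetas))
    df = len(thetas) - 1
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return q, i2


def ranks(values):
    """Ranks starting at 1, with ties given their average rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def spearman(x, y):
    """Spearman rank correlation (Pearson correlation of the ranks)."""
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var = sum((a - mx) ** 2 for a in rx) * sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(var)


# Hypothetical inputs: true positives / diseased per study, and per-study
# sensitivity with the matching false-positive rate (1 - specificity).
q, i2 = q_and_i_squared(events=[10, 90], totals=[100, 100])
rho = spearman([0.80, 0.90, 0.95], [0.10, 0.20, 0.40])
print(f"Q={q:.1f}, I2={i2:.1f}%, rho={rho:.2f}")
```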

Fig. 3
Coupled forest plots of pooled sensitivity and specificity. Numbers are pooled estimates with 95% CI in parentheses. Corresponding heterogeneity statistics are provided at bottom right corners. Horizontal lines indicate 95% CIs. CI = confidence interval.

For all 21 studies combined, the pooled sensitivity was 0.89 (95% CI 0.86–0.92) with specificity of 0.73 (95% CI 0.60–0.83; Fig. 3 ). In the HSROC curve, there was a large difference between the 95% confidence region and the 95% prediction region, thus indicating heterogeneity between the studies ( Fig. 4 ). The area under the HSROC curve was 0.91 (95% CI 0.88–0.93). According to the Deeks’ funnel plot, the likelihood of publication bias was low, with a p value of 0.75 for the slope coefficient ( Fig. 5 ).

Fig. 4
Hierarchical summary receiver operating characteristic curve of the diagnostic performance of PI-RADSv2 for detecting prostate cancer. HSROC = hierarchical summary receiver operating characteristic; PI-RADSv2 = Prostate Imaging Reporting and Data System version 2.

Fig. 5
Deeks’ funnel plot. A p value of 0.75 suggests that the likelihood of publication bias is low.

For the six studies that provided a head-to-head comparison between PI-RADSv1 and PI-RADSv2, PI-RADSv2 demonstrated higher pooled sensitivity of 0.95 (95% CI 0.85–0.98) compared with 0.88 (95% CI 0.80–0.93) for PI-RADSv1 ( p = 0.04). However, the pooled specificity did not show a significant difference between the two versions of PI-RADS: 0.73 (95% CI 0.47–0.89) for v2 and 0.75 (95% CI 0.36–0.94) for v1 ( p = 0.90).

3.5 Heterogeneity exploration

As we found considerable heterogeneity among the included studies, meta-regression analyses were performed (Supplementary Table 1 ). Among several potential variables, proportion of patients with PCa, magnetic field strength, and reference standard were significant factors affecting the heterogeneity ( p < 0.01 for all). However, among these three, only specificity according to the proportion of patients with PCa showed a statistically significant and clinically meaningful difference: 0.65 (95% CI 0.52–0.78) in studies with ≤50% of patients with PCa versus 0.86 (95% CI 0.75–0.97) in studies with >50% of patients with PCa. Otherwise, no clinically meaningful differences were seen: for magnet strength (3 vs 1.5 T), sensitivity of 0.90 (95% CI 0.86–0.94) versus 0.89 (95% CI 0.81–0.97, p = 0.03) and specificity of 0.73 (95% CI 0.59–0.86) versus 0.72 (95% CI 0.44–1.00, p = 0.81); for reference standard (radical prostatectomy vs biopsy), sensitivity of 0.89 (95% CI 0.83–0.95) versus 0.91 (95% CI 0.88–0.94, p < 0.01) and specificity of 0.65 (95% CI 0.37–0.94) versus 0.73 (95% CI 0.58–0.87, p = 0.48). Other variables, including the cutoff value, use of endorectal coil, and type of analysis (per patient vs per lesion), were not significant factors ( p = 0.32–0.70).

3.6 Subgroup analysis

As four studies used both ≥3 and ≥4 as cutoff values [17 22 23 29] and five studies determined both any cancer and clinically significant cancer as outcomes [16 21 24 29 32] , multiple subgroup analyses were performed in order to assess various clinical settings (Supplementary Table 2 ). Regarding cutoff values, the pooled sensitivity was 0.89 (95% CI 0.84–0.92) with specificity of 0.74 (95% CI 0.58–0.85) for 17 studies using ≥4 [16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 35] , whereas these were 0.95 (95% CI 0.89–0.97) and 0.47 (95% CI 0.21–0.74) in eight studies using ≥3 [17 22 23 29 32 33 34 36] . When we stratified studies according to the outcome assessed, we obtained the following results: (1) cutoff of ≥4 for determining any PCa, sensitivity of 0.89 (95% CI 0.83–0.93) with specificity of 0.80 (95% CI 0.62–0.90); (2) cutoff of ≥3 for determining any PCa, sensitivity of 0.96 (95% CI 0.93–0.98) with specificity of 0.49 (95% CI 0.29–0.70); (3) for determining csPCa regardless of cutoff values, sensitivity of 0.89 (95% CI 0.84–0.92) with specificity of 0.64 (95% CI 0.46–0.78); (4) cutoff of ≥4 for determining csPCa, sensitivity of 0.90 (95% CI 0.85–0.94) with specificity of 0.62 (95% CI 0.45–0.77); and (5) cutoff of ≥3 for determining csPCa, sensitivity of 0.96 (95% CI 0.87–0.99) with specificity of 0.29 (95% CI 0.05–0.77). When the studies using a cutoff value of ≥4 were separately assessed according to the type of analysis, per-patient analysis in eight studies [17 18 20 26 27 29 31 35] yielded pooled sensitivity of 0.89 (95% CI 0.81–0.93) with specificity of 0.76 (95% CI 0.60–0.88), whereas per-lesion analysis in nine studies [16 19 21 22 23 24 25 28 30] yielded pooled sensitivity of 0.87 (95% CI 0.83–0.91) with specificity of 0.70 (95% CI 0.44–0.88).

Based on the localization of PCa, the pooled sensitivity was 0.93 (95% CI 0.87–0.96) with specificity of 0.68 (95% CI 0.43–0.86) in seven studies analyzing PZ cancers [20 24 25 28 30 31 33]. In the same studies, excluding that by Stanzione et al [31], the pooled sensitivity and specificity for TZ cancers were 0.88 (95% CI 0.77–0.94) and 0.75 (95% CI 0.59–0.86), respectively.

Studies including patients without previous biopsies yielded sensitivity of 0.82 (95% CI 0.72–0.90) and specificity of 0.75 (95% CI 0.65–0.83), whereas the diagnostic performance values were 0.87 (95% CI 0.80–0.92) and 0.71 (95% CI 0.42–0.89) in studies including patients with a history of previous biopsy.

3.7 Discussion

In our meta-analysis, we assessed the diagnostic accuracy of PI-RADSv2 for detecting PCa. The pooled sensitivity and specificity of all 21 studies were 0.89 (95% CI 0.86–0.92) and 0.73 (95% CI 0.60–0.83), respectively. When comparing our data with the only two existing meta-analyses using mpMRI for detecting PCa, a trend toward higher sensitivity and lower specificity can be inferred. In the study by de Rooij et al [37], which evaluated seven studies using a combination of T2WI, DWI, and DCE-MRI, the pooled sensitivity and specificity were 0.74 (95% CI 0.66–0.81) and 0.88 (95% CI 0.82–0.92), respectively. In a more recent meta-analysis by Hamoen et al [6], which analyzed 14 studies using PI-RADSv1, the pooled sensitivity and specificity were 0.78 (95% CI 0.70–0.84) and 0.79 (95% CI 0.68–0.86), respectively. However, setting these three meta-analyses side by side provides only an indirect comparison. In order to address this issue, we separately assessed a subgroup of studies using both PI-RADSv1 and PI-RADSv2. In a head-to-head comparison between them, PI-RADSv2 demonstrated higher pooled sensitivity (0.95) compared with PI-RADSv1 (0.88, p = 0.04) without a statistically significant difference in specificity (0.73 vs 0.75, p = 0.90). This increase in sensitivity compared with its predecessor may imply that the revisions undertaken during the development of PI-RADSv2, including the introduction of dominant sequences according to zonal anatomy, limited contribution of DCE-MRI secondary to DWI and T2WI, and specific guidelines for deriving an integrated overall score, were, in fact, on the right track. In particular, we speculate that the use of dominant sequences, that is, DWI for the PZ and T2WI for the TZ, may have been crucial for the improved sensitivity without a loss in specificity, as suggested by Baur et al [10].

Considering that one of the main intentions for the generation of PI-RADS was to standardize reporting of mpMRI in order to decrease variability and bring about widespread acceptance and implementation in daily practice, it was promising to find that nearly all (20 of 21) studies used PI-RADSv2 strictly according to published guidelines [11]. Only one study formed PI-RADSv2 scores from existing clinical radiological reports that were based on PI-RADSv1 or an in-house scoring system [29]. This is an improvement when compared with prior studies conducted using PI-RADSv1, where investigators used varying methods in determining the overall score (overall five-point score or sum of the scores from each modality) [6]. Still, there is a need for further clarification regarding the cutoff value for detecting PCa. In the studies included in our meta-analysis, cutoff values were predefined in only six studies, while the majority (15/21) were exploratory in nature, testing multiple criteria. When using a cutoff value of ≥4, sensitivity (0.89) and specificity (0.74) were generally good, whereas using ≥3 yielded excellent sensitivity (0.95) and poor specificity (0.47). These results may be taken into consideration when generating the next updated PI-RADS. For instance, the former may be adequate for general use of PI-RADS, whereas the latter could be reserved for settings in which a higher cancer detection rate is clinically required (eg, persistently high PSA level despite a previously negative biopsy).
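The trade-off between the two cutoffs can be made more concrete with a short calculation. The sketch below is illustrative only: it takes the pooled point estimates quoted above and derives likelihood ratios and the Youden index, none of which are reported in this meta-analysis. The function name `likelihood_ratios` is hypothetical, and the calculation ignores the confidence intervals around the pooled estimates.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-, Youden index) for one sensitivity/specificity pair."""
    lr_pos = sensitivity / (1 - specificity)   # how much a positive result raises the odds
    lr_neg = (1 - sensitivity) / specificity   # how much a negative result lowers the odds
    youden = sensitivity + specificity - 1     # single-number summary of both
    return lr_pos, lr_neg, youden


# Pooled point estimates for the two cutoffs, as quoted in the text.
CUTOFFS = {">=4": (0.89, 0.74), ">=3": (0.95, 0.47)}

for label, (sens, spec) in CUTOFFS.items():
    lr_pos, lr_neg, youden = likelihood_ratios(sens, spec)
    print(f"cutoff {label}: LR+ = {lr_pos:.2f}, LR- = {lr_neg:.2f}, Youden = {youden:.2f}")
```

Under these point estimates, a cutoff of ≥4 gives a positive likelihood ratio of about 3.4 against about 1.8 for ≥3, while the negative likelihood ratios are similar (about 0.15 and 0.11). This is consistent with the reading that ≥3 mainly costs specificity.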

In the current study, subgroup analyses were performed to account for differences in outcomes (any cancer vs clinically significant cancer). There was no significant difference for using either outcome irrespective of whether the criteria of ≥3 or ≥4 were used. However, the definition of clinically significant cancer differed among the 13 studies. Only three studies defined csPCa strictly according to the PI-RADSv2 guidelines (Gleason score ≥7 [3 + 4], volume >0.5 ml, or extraprostatic extension) [11]. Most others used one or two of the three criteria. Including only the former three studies may have provided more robust results; yet it was not only pragmatic to include all available studies, but this approach would present a general overview of the existing literature, as it is the first meta-analysis of studies currently dealing with PI-RADSv2.

In this meta-analysis, we looked into the technical aspects of MRI. Meta-regression analyses revealed that the use of endorectal coil was not a statistically significant factor. Furthermore, although magnet strength showed a statistically significant difference between 3 and 1.5 T, this difference was not clinically meaningful (sensitivity of 0.90 vs 0.89, respectively; p = 0.03). Although there had been debate over these two issues in the past, both 3 and 1.5 T are now well established, and the overall benefit of using an endorectal coil is not evident [38 39]. The PI-RADSv2 guidelines currently recommend either usage, and the results of our study provide additional evidence to support this.

Regarding the methods of analysis in the studies, there was significant heterogeneity regarding reference standard and type of analysis. Radical prostatectomy was the reference standard in five studies, while the majority were based on a combination of systematic and targeted biopsies. The possibility of PCa despite negative biopsy results in the latter group should be kept in mind. In addition, approximately half of the studies reported outcomes in a per-patient manner (n = 11) and the other half in a per-lesion manner (n = 10). Per-lesion analysis is known to take into account the performance of localizing the disease; however, this was not shown to be a significant factor in the meta-regression analysis.

Our meta-analysis had some limitations. Nearly all studies were retrospective in design, resulting in a high risk of bias for patient selection. It is possible that pooling data from predominantly retrospective studies may have led to increased diagnostic sensitivity [40]. However, not only was performing a meta-analysis using only three prospective studies technically unfeasible, but the derived results would also not be representative of the existing literature on PI-RADSv2. Furthermore, we used validated methods for the systematic review and reported the data using standard reporting guidelines, including PRISMA and the guidelines of the Handbook for Diagnostic Test Accuracy Reviews published by the Cochrane Collaboration [12 41]. Another limitation is the considerable heterogeneity in our pooled analysis, which affects the general applicability of our summary estimates. To explore the heterogeneity of our data, we performed meta-regression and multiple subgroup analyses. According to the analyses, the proportion of patients with PCa, the magnetic field strength, and the reference standard were significant factors affecting the heterogeneity. In particular, the reference standard included various methods, including radical prostatectomy and a combination of systematic and targeted biopsies (ie, MRI guided, MRI-transrectal ultrasound fusion, or cognitive). Furthermore, the fact that various definitions were used for clinically significant cancer needs to be emphasized. Our meta-regression and subgroup analyses may explain some of the heterogeneity, but a portion remains unexplained. Another important limitation is the small number of included studies for head-to-head comparison between PI-RADSv1 and PI-RADSv2. However, we were able to demonstrate a statistically significant difference in sensitivity between the two versions using only the six studies published to date.

PI-RADSv2 shows good performance for the detection of PCa with pooled sensitivity of 0.89 and specificity of 0.73. PI-RADSv2 has higher pooled sensitivity compared with PI-RADSv1 without significantly different specificity.

Author contributions: Sang Youn Kim had full access to all the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis.

Study concept and design: Woo, Suh, S.Y. Kim.

Acquisition of data: Woo, Suh, S.Y. Kim.

Analysis and interpretation of data: Woo, Suh, S.Y. Kim.

Drafting of the manuscript: Woo.

Critical revision of the manuscript for important intellectual content: Suh, S.Y. Kim, Cho, S.H. Kim.

Statistical analysis: Suh.

Obtaining funding: None.

Administrative, technical, or material support: None.

Supervision: S.Y. Kim, Cho, S.H. Kim.

Other: None.

Financial disclosures: Sang Youn Kim certifies that all conflicts of interest, including specific financial interests and relationships and affiliations relevant to the subject matter or materials discussed in the manuscript (eg, employment/affiliation, grants or funding, consultancies, honoraria, stock ownership or options, expert testimony, royalties, or patents filed, received, or pending), are the following: None.

Funding/Support and role of the sponsor: None.

  • [1] American Cancer Society. Key statistics for prostate cancer. http://www.cancer.org/cancer/prostatecancer/detailedguide/prostate-cancer-key-statistics
  • [2] C.M. Hoeks, J.O. Barentsz, T. Hambrock, et al. Prostate cancer: multiparametric MR imaging for detection, localization, and staging. Radiology. 2011;261:46-66
  • [3] O. Ruprecht, P. Weisser, B. Bodelle, H. Ackermann, T.J. Vogl. MRI of the prostate: interobserver agreement compared with histopathologic outcome after radical prostatectomy. Eur J Radiol. 2012;81:456-460
  • [4] L. Dickinson, H.U. Ahmed, C. Allen, et al. Scoring systems used for the interpretation and reporting of multiparametric MRI for prostate cancer detection, localization, and characterization: could standardization lead to improved utilization of imaging within the diagnostic pathway? J Magn Reson Imaging. 2013;37:48-58
  • [5] J.O. Barentsz, J. Richenberg, R. Clements, et al. ESUR prostate MR guidelines 2012. Eur Radiol. 2012;22:746-757
  • [6] E.H. Hamoen, M. de Rooij, J.A. Witjes, J.O. Barentsz, M.M. Rovers. Use of the Prostate Imaging Reporting and Data System (PI-RADS) for prostate cancer detection with multiparametric magnetic resonance imaging: a diagnostic meta-analysis. Eur Urol. 2015;67:1112-1121
  • [7] M. Abd-Alazeez, H.U. Ahmed, M. Arya, et al. The accuracy of multiparametric MRI in men with negative biopsy and elevated PSA level: can it rule out clinically significant prostate cancer? Urol Oncol. 2014;32:45.e17–22
  • [8] A.B. Rosenkrantz, S. Kim, R.P. Lim, et al. Prostate cancer localization using multiparametric MR imaging: comparison of Prostate Imaging Reporting and Data System (PI-RADS) and Likert scales. Radiology. 2013;269:482-492
  • [9] B.G. Hansford, Y. Peng, Y. Jiang, et al. Dynamic contrast-enhanced MR imaging curve-type analysis: is it helpful in the differentiation of prostate cancer from healthy peripheral zone? Radiology. 2015;275:448-457
  • [10] A.D. Baur, A. Maxeiner, T. Franiel, et al. Evaluation of the Prostate Imaging Reporting and Data System for the detection of prostate cancer by the results of targeted biopsy of the prostate. Invest Radiol. 2014;49:411-420
  • [11] J.C. Weinreb, J.O. Barentsz, P.L. Choyke, et al. PI-RADS Prostate Imaging-Reporting and Data System: 2015, version 2. Eur Urol. 2016;69:16-40
  • [12] A. Liberati, D.G. Altman, J. Tetzlaff, et al. The PRISMA statement for reporting systematic reviews and meta-analyses of studies that evaluate health care interventions: explanation and elaboration. J Clin Epidemiol. 2009;62:e1-e34
  • [13] P. Whiting, A.W. Rutjes, J.B. Reitsma, P.M. Bossuyt, J. Kleijnen. The development of QUADAS: a tool for the quality assessment of studies of diagnostic accuracy included in systematic reviews. BMC Med Res Methodol. 2003;3:25
  • [14] C.H. Suh, S.H. Park. Successful publication of systematic review and meta-analysis of studies evaluating diagnostic test accuracy. Korean J Radiol. 2016;17:5-6
  • [15] J.J. Deeks, P. Macaskill, L. Irwig. The performance of tests of publication bias and other sample size effects in systematic reviews of diagnostic test accuracy was assessed. J Clin Epidemiol. 2005;58:882-893
  • [16] T. Auer, M. Edlinger, J. Bektic, et al. Performance of PI-RADS version 1 versus version 2 regarding the relation with histopathological results. World J Urol. 2017;35:687-693
  • [17] M. Baldisserotto, E.J. Neto, G. Carvalhal, et al. Validation of PI-RADS v.2 for prostate cancer diagnosis with MRI at 3T using an external phased-array coil. J Magn Reson Imaging. 2016;44:1354-1359
  • [18] P. De Visschere, N. Lumen, P. Ost, K. Decaestecker, E. Pattyn, G. Villeirs. Dynamic contrast-enhanced imaging has limited added value over T2-weighted imaging and diffusion-weighted imaging when using PI-RADSv2 for diagnosis of clinically significant prostate cancer in patients with elevated PSA. Clin Radiol. 2017;72:23-32
  • [19] El-Samei HAE-KA, M.F. Amin, E.E. Hassan. Assessment of the accuracy of multi-parametric MRI with PI-RADS 2.0 scoring system in the discrimination of suspicious prostatic focal lesions. Egypt J Radiol Nucl Med. 2016;47:1075-1082
  • [20] Z.Y. Feng, L. Wang, X.D. Min, S.G. Wang, G.P. Wang, J. Cai. Prostate cancer detection with multiparametric magnetic resonance imaging: Prostate Imaging Reporting and Data System version 1 versus version 2. Chin Med J. 2016;129:2451-2459
  • [21] M. Kasel-Seibert, T. Lehmann, R. Aschenbach, et al. Assessment of PI-RADS v2 for the detection of prostate cancer. Eur J Radiol. 2016;85:726-731
  • [22] W.C. Lin, V.F. Muglia, G.E. Silva, S. Chodraui Filho, R.B. Reis, A.C. Westphalen. Multiparametric MRI of the prostate: diagnostic performance and interreader agreement of two scoring systems. Br J Radiol. 2016;89:20151056
  • [23] Martorana E, Pirola GM, Scialpi M, et al. Lesion volume predicts prostate cancer risk and aggressiveness: validation of its value alone and matched with Prostate Imaging Reporting and Data System score. BJU Int. In press. doi:10.1111/bju.13649
  • [24] F.V. Mertan, M.D. Greer, J.H. Shih, et al. Prospective evaluation of the Prostate Imaging Reporting and Data System version 2 for prostate cancer detection. J Urol. 2016;196:690-696
  • [25] B.G. Muller, J.H. Shih, S. Sankineni, et al. Prostate cancer: interobserver agreement and accuracy with the revised Prostate Imaging Reporting and Data System at multiparametric MR imaging. Radiology. 2015;277:741-750
  • [26] J.J. Park, B.K. Park. Role of PI-RADSv2 with multiparametric MRI in determining who needs active surveillance or definitive treatment according to PRIAS. J Magn Reson Imaging. 2017;45:1753-1759
  • [27] S.Y. Park, D.C. Jung, Y.T. Oh, et al. Prostate cancer: PI-RADS version 2 helps preoperatively predict clinically significant cancers. Radiology. 2016;280:108-116
  • [28] S. Polanec, T.H. Helbich, H. Bickel, et al. Head-to-head comparison of PI-RADS v2 and PI-RADS v1. Eur J Radiol. 2016;85:1125-1131
  • [29] A.R. Rastinehad, N. Waingankar, B. Turkbey, et al. Comparison of multiparametric MRI scoring systems and the impact on cancer detection in patients undergoing MR US fusion guided prostate biopsies. PLoS One. 2015;10:e0143404
  • [30] Rosenkrantz AB, Babb JS, Taneja SS, Ream JM. Proposed adjustments to PI-RADS version 2 decision rules: impact on prostate cancer detection. Radiology. In press. doi:10.1148/radiol.2016161124
  • [31] A. Stanzione, M. Imbriaco, S. Cocozza, et al. Biparametric 3T magnetic resonance imaging for prostatic cancer detection in a biopsy-naive patient population: a further improvement of PI-RADS v2? Eur J Radiol. 2016;85:2269-2274
  • [32] Tan N, Lin WC, Khoshnoodi P, et al. In-Bore 3-T MR-guided transrectal targeted prostate biopsy: Prostate Imaging Reporting and Data System version 2-based diagnostic performance for detection of prostate cancer. Radiology. In press. doi:10.1148/radiol.2016152827
  • [33] S. Tewes, N. Mokov, D. Hartung, et al. Standardized reporting of prostate MRI: comparison of the Prostate Imaging Reporting and Data System (PI-RADS) version 1 and version 2. PLoS One. 2016;11:e0162879
  • [34] S. Washino, T. Okochi, K. Saito, et al. Combination of PI-RADS score and PSA density predicts biopsy outcome in biopsy naive patients. BJU Int. 2017;119:225-233
  • [35] S. Woo, S.Y. Kim, J. Lee, S.H. Kim, J.Y. Cho. PI-RADS version 2 for prediction of pathological downgrading after radical prostatectomy: a preliminary study in patients with biopsy-proven Gleason Score 7 (3 + 4) prostate cancer. Eur Radiol. 2016;26:3580-3587
  • [36] C. Zhao, G. Gao, D. Fang, et al. The efficiency of multiparametric magnetic resonance imaging (mpMRI) using PI-RADS Version 2 in the diagnosis of clinically significant prostate cancer. Clin Imaging. 2016;40:885-888
  • [37] M. de Rooij, E.H. Hamoen, J.J. Futterer, J.O. Barentsz, M.M. Rovers. Accuracy of multiparametric MRI for prostate cancer detection: a meta-analysis. Am J Roentgenol. 2014;202:343-351
  • [38] Z.K. Shah, S.N. Elias, R. Abaza, et al. Performance comparison of 1.5-T endorectal coil MRI with 3.0-T nonendorectal coil MRI in patients with prostate cancer. Acad Radiol. 2015;22:467-474
  • [39] Y. Mazaheri, H.A. Vargas, G. Nyman, O. Akin, H. Hricak. Image artifacts on prostate diffusion-weighted magnetic resonance imaging: trade-offs at 1.5 Tesla and 3.0 Tesla. Acad Radiol. 2013;20:1041-1047
  • [40] J.J. Deeks. Systematic reviews in health care: systematic reviews of evaluations of diagnostic and screening tests. BMJ. 2001;323:157-162
  • [41] J.J. Deeks, S. Wisniewski, C. Davenport. Guide to the contents of a Cochrane diagnostic test accuracy protocol. In: J.J. Deeks, P.M. Bossuyt, C. Gatsonis (Eds.). Cochrane handbook for systematic reviews of diagnostic test accuracy version 1.0.0. Chapter 4. (The Cochrane Collaboration, 2013) http://srdta.cochrane.org/

Prostate cancer (PCa) is the second leading cause of cancer-related mortality in Western men [1]. With recent technological advancements and growing availability, multiparametric magnetic resonance imaging (mpMRI) is increasingly being used for guiding several aspects of PCa management, including detection, staging, and treatment planning [2]. Despite abundant evidence in the literature reporting high accuracy of mpMRI for PCa diagnosis, widespread acceptance has been hampered by several factors including difficulty of interpretation, lack of standardized criteria for interpretation (ie, use of Likert scales based on the radiologist's subjective level of suspicion for PCa), and resulting substantial inter-reader variability [3 4].

To bring standardization to the evaluation and reporting of mpMRI of the prostate, the European Society of Urogenital Radiology (ESUR) published a guideline termed Prostate Imaging Reporting and Data System (PI-RADS) in 2012 [5] . PI-RADS was generated based on expert consensus and provides a detailed scoring system for each MRI sequence (T2-weighted imaging [T2WI], diffusion-weighted imaging [DWI], dynamic contrast-enhanced MRI [DCE-MRI], and MR spectroscopy) for the presence of clinically significant PCa (csPCa). Several investigators have validated the accuracy and reproducibility of PI-RADS, and a recent meta-analysis reported pooled sensitivity and specificity of 0.78 and 0.79, respectively [6] . However, as there was no guideline for the generation of an overall score, different research groups utilized various measures for this purpose—some used a sum of the scores from each sequence (ranging from 3 to 15), whereas others used an overall score of 1–5 [7 8] . Furthermore, emerging data questioned the value of curve-type analysis of DCE-MRI [9] . In addition, investigators suggested that some sequences may be more important in determining the likelihood of PCa (ie, DWI in the peripheral zone [PZ] and T2WI in the transition zone [TZ]) rather than equal weighting for all sequences [10] .

To address these issues, the ESUR and American College of Radiology recently released the updated PI-RADS version 2 (PI-RADSv2) [11]. The main changes from PI-RADSv1 to PI-RADSv2 are the following: (1) introduction of dominant sequences according to zonal anatomy (DWI for the PZ and T2WI for the TZ), (2) limited contribution of DCE-MRI data, reduced to the presence or absence of early focal enhancement, and (3) generation of an overall score (1–5) integrating findings across all MRI sequences.

Since then, several studies assessing the value of PI-RADSv2 have been published. However, the diagnostic performance of this new scoring system has not been evaluated systematically. Therefore, the purpose of our study was to assess the diagnostic performance of PI-RADSv2 for the detection of PCa. In addition, we aimed to compare the diagnostic performance of PI-RADSv1 and PI-RADSv2 in studies available for head-to-head comparison.

This meta-analysis was performed and written according to the preferred reporting items for systematic reviews and meta-analyses (PRISMA) guidelines [12] .

2.1 Literature search

A computerized search of MEDLINE and EMBASE up to December 7, 2016, was performed in order to identify studies evaluating the diagnostic performance of PI-RADSv2 for the detection of PCa. The search query combined synonyms for PCa, MRI, and PI-RADS as follows: (prostate cancer OR prostatic cancer OR prostate neoplasm OR prostatic neoplasm OR prostate tumor OR prostatic tumor OR prostate carcinoma OR prostatic carcinoma OR PCa) AND (magnetic resonance imaging OR MRI OR MR) AND (prostate imaging reporting and data system OR pi-rads OR pi rads OR pirads). Bibliographies of identified articles were also screened to expand the scope of search. Our search was limited to publications in English.
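The structure of the search string (synonyms joined by OR within each concept, concepts joined by AND) can be sketched in a few lines. The snippet below only reassembles the query text quoted above; the helper names `or_group` and `build_query` are hypothetical, and database-specific field tags or syntax for MEDLINE and EMBASE are not shown because the text does not specify them.

```python
# Synonym groups copied from the search strategy described in the text.
CANCER_TERMS = [
    "prostate cancer", "prostatic cancer", "prostate neoplasm",
    "prostatic neoplasm", "prostate tumor", "prostatic tumor",
    "prostate carcinoma", "prostatic carcinoma", "PCa",
]
MRI_TERMS = ["magnetic resonance imaging", "MRI", "MR"]
PIRADS_TERMS = [
    "prostate imaging reporting and data system",
    "pi-rads", "pi rads", "pirads",
]


def or_group(terms):
    """Join the synonyms of one concept with OR and wrap them in parentheses."""
    return "(" + " OR ".join(terms) + ")"


def build_query(*groups):
    """Combine the parenthesised concept groups with AND."""
    return " AND ".join(or_group(group) for group in groups)


query = build_query(CANCER_TERMS, MRI_TERMS, PIRADS_TERMS)
print(query)
```

Keeping each concept as its own list makes it easy to add a synonym without disturbing the Boolean structure.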

2.2 Study selection

2.2.1 Inclusion criteria

Studies were included if they satisfied all the following requirements according to the PICOS criteria [12] : (1) included patients with suspected or diagnosed PCa; (2) for index test, mpMRI of the prostate including all required sequences of T2WI, DWI, and DCE-MRI was performed and assessed with a PI-RADSv2 scoring system; (3) for comparison, a reference standard based on the histopathological examination of radical prostatectomy or biopsy was used; (4) results were reported in sufficient detail for the reconstruction of 2 × 2 tables and determination of sensitivity and specificity at specified cutoff values for evaluating the diagnostic performance of PI-RADSv2; and (5) studies had to be original articles.
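Criterion (4) requires that each study allow a 2 × 2 table to be rebuilt. As a minimal sketch of what that reconstruction yields, the code below computes sensitivity and specificity from the four cell counts. The counts in the usage line are hypothetical and do not come from any included study. The Wilson score interval is one common choice for a per-study confidence interval and is an assumption here, since the text does not state which interval was used.

```python
import math


def wilson_interval(successes, total, z=1.96):
    """Wilson score interval for a binomial proportion (z = 1.96 for 95%)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    denom = 1 + z ** 2 / total
    centre = (p + z ** 2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z ** 2 / (4 * total ** 2)) / denom
    return centre - half, centre + half


def sens_spec(tp, fp, fn, tn):
    """Sensitivity and specificity, each with its interval, from a 2 x 2 table."""
    sensitivity = tp / (tp + fn)   # true positives among all with disease
    specificity = tn / (tn + fp)   # true negatives among all without disease
    return {
        "sensitivity": (sensitivity, wilson_interval(tp, tp + fn)),
        "specificity": (specificity, wilson_interval(tn, tn + fp)),
    }


# Hypothetical counts, chosen only to illustrate the calculation.
result = sens_spec(tp=89, fp=27, fn=11, tn=73)
for name, (estimate, (low, high)) in result.items():
    print(f"{name}: {estimate:.2f} (95% CI {low:.2f}-{high:.2f})")
```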

2.2.2 Exclusion criteria

Studies were excluded if any of the following criteria were met: (1) studies involving <10 patients; (2) review articles, guidelines, consensus statements, letters, editorials, and conference abstracts; (3) studies using only PI-RADSv1 for the evaluation of mpMRI of the prostate; (4) studies focusing on topics other than using the PI-RADSv2 system for diagnosing PCa (ie, staging and prediction of biochemical recurrence); and (5) studies with overlapping patient population.

Two reviewers (S.W. and C.H.S., with 3 yr of experience in performing systematic reviews and meta-analyses) independently evaluated the eligibility of the selected studies from the literature. Disagreements, if present between the two reviewers, were resolved by consensus via discussion with a third reviewer (S.Y.K.).

2.3 Data extraction and quality assessment

We extracted the following data regarding study design and results from the selected studies using a standardized form:

  • 1. Study characteristics: authors, year of publication, country and institution of origin, duration of patient recruitment, and study design (prospective vs retrospective and consecutive or not)
  • 2. Demographic and clinical characteristics: sample size, number of patients with PCa, patient age, prostate-specific antigen (PSA) level and Gleason score, number of previous biopsies, and PCa diagnosis prior to mpMRI
  • 3. Technical characteristics of mpMRI: scanner model and manufacturer, magnetic field strength (1.5 vs 3 T), coil type (endorectal vs pelvic phased array), and specific sequences used (T2WI, DWI, DCE, or MR spectroscopy)
  • 4. Interpretation of mpMRI: number of reviewers and experience in prostate mpMRI, independent or consensus reading, and blinding to clinicopathological information
  • 5. Reference standard: type of reference standard (radical prostatectomy, targeted biopsy, or systematic biopsy), interval between MRI and pathology, outcomes assessed (any PCa vs csPCa), definition of csPCa (studies assessing “clinically significant”, “aggressive”, or “high-grade” PCa were all considered to assess csPCa; however, only studies that used the definition as provided by the PI-RADSv2 guideline [Gleason score ≥7 (3 + 4), volume >0.5 ml, or extraprostatic extension] were considered not to have concern for applicability), separate analysis for the PZ and TZ, and type of analysis (per patient vs per lesion)
  • 6. Diagnostic performance of PI-RADSv2 including criteria or cutoff values (in case of multiple readers, the results of the most experienced reader were extracted for this meta-analysis)

The methodological quality of the included studies was assessed using tailored questionnaires and criteria provided by Quality Assessment of Diagnostic Accuracy Studies-2 [13] . Data extraction and quality assessment were performed independently by two reviewers (S.W. and C.H.S.). All disagreements were resolved by consensus through discussion with the third reviewer (S.Y.K.).

2.4 Data synthesis and analysis

The diagnostic performance of PI-RADSv2 for the detection of PCa was the primary outcome for this meta-analysis. In addition, a comparison between the diagnostic performance of PI-RADSv2 and that of PI-RADSv1 using studies that reported head-to-head comparison data of the two PI-RADS versions was considered a secondary outcome.

Pooled estimates of sensitivity and specificity were calculated using hierarchical logistic regression modeling including bivariate modeling and hierarchical summary receiver operating characteristic (HSROC) modeling [14] . For graphical presentation of the results, an HSROC curve with 95% confidence region and prediction region was plotted. Publication bias was evaluated using the Deeks’ funnel plot, and statistical significance was tested with the Deeks’ asymmetry test [15] .
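To give a feel for random-effects pooling on the logit scale, the sketch below pools a single proportion (for example, sensitivity) across studies with the DerSimonian-Laird estimator. This is a deliberately simplified stand-in and not the method used in this meta-analysis: the bivariate and HSROC models fit sensitivity and specificity jointly and account for their correlation, which a univariate pool ignores. The function name `pool_logit_dl` and the study counts are hypothetical.

```python
import math


def inv_logit(x):
    """Map a logit back to a proportion."""
    return 1 / (1 + math.exp(-x))


def pool_logit_dl(events, totals):
    """DerSimonian-Laird random-effects pool of proportions on the logit scale.

    Returns (estimate, lower, upper) on the proportion scale.
    """
    logits, variances = [], []
    for e, n in zip(events, totals):
        # A 0.5 continuity correction keeps the logit finite for 0% or 100%.
        a, b = e + 0.5, n - e + 0.5
        logits.append(math.log(a / b))
        variances.append(1 / a + 1 / b)

    w = [1 / v for v in variances]                      # fixed-effect weights
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, logits)) / sw
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, logits))
    c = sw - sum(wi ** 2 for wi in w) / sw
    k = len(logits)
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0  # between-study variance

    w_re = [1 / (v + tau2) for v in variances]          # random-effects weights
    mu = sum(wi * yi for wi, yi in zip(w_re, logits)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))
    return inv_logit(mu), inv_logit(mu - 1.96 * se), inv_logit(mu + 1.96 * se)


# Hypothetical per-study counts: true positives and patients with disease.
true_positives = [45, 80, 120]
diseased = [50, 94, 134]
estimate, low, high = pool_logit_dl(true_positives, diseased)
print(f"pooled sensitivity {estimate:.2f} (95% CI {low:.2f}-{high:.2f})")
```

For an actual diagnostic meta-analysis, a bivariate implementation such as the Stata and R modules named in this section would be the appropriate tool.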

We performed meta-regression analyses to investigate the cause of heterogeneity. The following covariates were considered for the bivariate model: (1) proportion of patients with PCa (>50% vs ≤50%), (2) magnet strength of MRI (3 vs 1.5 T), (3) use of endorectal coil, (4) cutoff value (≥4 vs ≥3), (5) reference standard (radical prostatectomy vs biopsy), and (6) type of analysis (per patient vs per lesion). In addition, multiple subgroup analyses were performed for cutoff value, outcome, and previous biopsy history to assess various clinical settings: (1) a cutoff value of ≥4 for all studies, (2) a cutoff value of ≥3 for all studies, (3) a cutoff value of ≥4 for determining any PCa, (4) a cutoff value of ≥3 for determining any PCa, (5) a cutoff value of ≥4 for determining csPCa, (6) a cutoff value of ≥3 for determining csPCa, (7) a cutoff value of ≥4 in studies using per-patient analysis, (8) a cutoff value of ≥4 in studies using per-lesion analysis, (9) studies analyzing PZ PCa, (10) studies analyzing TZ PCa, (11) patients without previous biopsies, and (12) patients with previous biopsies. The “metandi” and “midas” modules in Stata 10.0 (StataCorp LP, College Station, TX, USA) and “mada” package in R software version 3.2.1 (R Foundation for Statistical Computing, Vienna, Austria) were used for statistical analyses, with p < 0.05 signifying statistical significance.
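The first covariate, the proportion of patients with PCa dichotomized at 50%, can be illustrated from the patient counts in Table 1. The sketch below is an approximation under stated assumptions: it uses per-patient counts only, omits the two studies whose PCa count is not reported, and may not match how the covariate was derived for per-lesion studies. The names `TABLE1` and `prevalence_group` are hypothetical.

```python
# (patients, patients with PCa) per study, copied from Table 1.
# Auer [16] and Polanec [28] are omitted because the PCa count is not reported.
TABLE1 = {
    "Baldisserotto [17]": (54, 33),
    "De Visschere [18]": (245, 144),
    "El-Samei [19]": (55, 38),
    "Feng [20]": (401, 150),
    "Kasel-Seibert [21]": (82, 31),
    "Lin [22]": (49, 49),
    "Martorana [23]": (157, 79),
    "Mertan [24]": (62, 38),
    "Muller [25]": (94, 94),
    "Park [26]": (456, 456),
    "Park [27]": (425, 425),
    "Rastinehad [29]": (312, 202),
    "Rosenkrantz [30]": (343, 134),
    "Stanzione [31]": (82, 34),
    "Tan [32]": (106, 63),
    "Tewes [33]": (54, 31),
    "Washino [34]": (288, 159),
    "Woo [35]": (105, 105),
    "Zhao [36]": (372, 185),
}


def prevalence_group(n_patients, n_pca, threshold=0.5):
    """Label a study by whether its PCa proportion exceeds the threshold."""
    return ">50%" if n_pca / n_patients > threshold else "<=50%"


groups = {">50%": [], "<=50%": []}
for study, (n_patients, n_pca) in TABLE1.items():
    groups[prevalence_group(n_patients, n_pca)].append(study)

for label, studies in groups.items():
    print(f"{label}: {len(studies)} studies")
```

Two studies sit very close to the threshold (79/157 and 185/372), which shows how sensitive a dichotomized covariate can be to small differences in counts.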

3.1 Literature search

A systematic literature search initially identified 287 articles. After removing 46 duplicates, screening of the 241 titles and abstracts yielded 105 potentially eligible articles. Full-text reviews were performed, and 84 studies were excluded for the following reasons: not in the field of interest (n = 80, including 68 studies that used only PI-RADSv1), insufficient data to reconstruct 2 × 2 tables (n = 2), and shared study population with other studies (n = 2). Ultimately, 21 original articles including a total of 3857 patients assessing the diagnostic performance of PI-RADSv2 were included in the meta-analysis [16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36]. No additional studies were identified via screening the bibliographies of these 21 studies. Among them, 15 studies including 3099 patients dealt with PI-RADSv2 alone, whereas six studies including 758 patients provided a head-to-head comparison between PI-RADSv1 and PI-RADSv2 [16 20 21 28 32 33]. The detailed study selection process is described in Fig. 1.
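The counts in the selection process can be checked with simple arithmetic. The snippet below uses only the numbers reported in the paragraph above; the variable names are illustrative.

```python
# Counts as reported in the study selection paragraph.
identified = 287
duplicates = 46
screened = identified - duplicates            # titles and abstracts screened

full_text = 105                               # potentially eligible articles
excluded = {
    "not in the field of interest": 80,
    "insufficient data for 2 x 2 tables": 2,
    "shared study population": 2,
}
included = full_text - sum(excluded.values())

studies = {"PI-RADSv2 alone": 15, "head-to-head v1 vs v2": 6}
patients = {"PI-RADSv2 alone": 3099, "head-to-head v1 vs v2": 758}

# Each check mirrors one statement in the text.
assert screened == 241
assert sum(excluded.values()) == 84
assert included == 21 == sum(studies.values())
assert sum(patients.values()) == 3857

print(f"screened={screened}, included={included}, patients={sum(patients.values())}")
```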

Fig. 1
PRISMA flow diagram showing study selection process for meta-analysis. a Included original articles for qualitative and quantitative analyses are references 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 . MRI = magnetic resonance imaging; PI-RADS = Prostate Imaging Reporting and Data System; PRISMA = Preferred Reporting Items for Systematic Reviews and Meta-analyses.

3.2 Characteristics of included studies

Patient characteristics are shown in Table 1 . The size of the study population ranged from 49 to 456 patients, with the percentage of those with PCa ranging from 37% to 100%. The patients had a median age of 62–69.6 yr, median PSA of 3.97–15 ng/ml, and a Gleason score ranging from 5 to 10. Patients had already been diagnosed with PCa prior to MRI in all or some of the study populations in seven studies [16 17 22 26 27 32 35] . Biopsy was performed before MRI in seven studies [16 21 22 23 26 27 35] , all patients were biopsy-naïve in three studies [18 25 34] , both patient types were included in three studies [28 32 33] , and data regarding previous biopsy were not reported in the remaining eight studies.

| First author (year) | Country | Institution | Duration of patient recruitment | Patients (n) | Patients with PCa (n) | Age, median (yr) | Age, range (yr) | PSA, median (ng/ml) | PSA, range (ng/ml) | Gleason score, median | Gleason score, range | No. of previous biopsies | PCa diagnosis before MRI |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Auer (2016) [16] | Austria | Medical University of Innsbruck, Medizinische Hochschule Hannover | NR | 50 | NR | 63 a | NR | 7.3 a | NR | 7 (3 + 4) | 6–10 | ≥1 | Yes (all) |
| Baldisserotto (2016) [17] | Brazil | Pontifícia Universidade Católica do Rio Grande do Sul | 2013.6–2015.6 | 54 | 33 | 65.9 a | 53–81 | 8.4 a | 3–31 | 7 (3 + 4) | 6–9 | NR | Yes (some) |
| De Visschere (2017) [18] | Belgium | Ghent University Hospital | 2011.5–2014.12 | 245 | 144 | 66 | 44–85 | 9 | 1.4–935.5 | 7 (3 + 4) | ≥6 | 0 | No |
| El-Samei (2016) [19] | Egypt | Al-Azher University, El-Minia University | 2014.5–2015.10 | 55 | 38 | 62 a | 51–79 | NR | NR | 7 | 6–10 | NR | No |
| Feng (2016) [20] | China | Tongji Hospital | 2013.6–2015.7 | 401 | 150 | 64.4 a | 34–88 | 10.7 | 0.2–1763 | 7 (4 + 3) | ≤6–≥8 | NR | NR |
| Kasel-Seibert (2016) [21] | Germany | University Hospital Jena | 2013.7–2015.3 | 82 | 31 | 65 | 48–81 | 13 | 1–111 | 7 (3 + 4) | 6–9 | 1–5 | No |
| Lin (2016) [22] | Brazil, Taiwan | Ribeirão Preto School of Medicine, China Medical University, Ribeirão Preto School of Medicine | 2011.5–2014.6 | 49 | 49 | 63 | 46–73 | 13.27 | 1.8–41.4 | | | ≥1 | Yes (all) |
| Martorana (2016) [23] | Italy | University of Modena and Reggio Emilia, Perugia University | 2014.1–2016.2 | 157 | 79 | 65 | 47–79 b | 10.7 | 1–75 | 6 | 6–8 | 1–3 | No |
| Mertan (2016) [24] | USA | National Cancer Institute | 2015.3–2015.9 | 62 | 38 | 65.5 | 50.3–76.6 | 7.1 | 0.5–863 | 7 (3 + 4) | 6–9 | NR | NR |
| Muller (2015) [25] | USA, the Netherlands | National Cancer Institute, AMC University Hospital, Edward Hébert School of Medicine | 2011.12–2014.5 | 94 | 94 | 62 | 37–79 | 8.51 | 0.7–51.1 | 7 (3 + 4) | 6–10 | 0 | No |
| Park (2016) [26] | South Korea | Samsung Medical Center | 2012.1–2014.12 | 456 | 456 | 65 | 46–81 | 3.97 | 3.8–4.2 | NR | ≥6 | ≥1 | Yes (all) |
| Park (2016) [27] | South Korea | Yonsei University College of Medicine | 2012.1–2013.3 | 425 | 425 | NR | NR | NR | 1.4–156.9 | 5–6 | 5–10 | ≥1 | Yes (all) |
| Polanec (2016) [28] | Austria | Medical University of Vienna, Confraternität Vienna | 2011.6–2015.9 | 65 | NR | 65.3 a | 62.3–87.4 | 10.8 a | 4.2–74.5 | 7 (4 + 3) | 6–9 | ≥0 | No |
| Rastinehad (2015) [29] | USA | Icahn School of Medicine at Mount Sinai, Fox Chase Cancer Center, National Institutes of Health, Hofstra North Shore LIJ School of Medicine | 2012.2–2014.11 | 312 | 202 | 65.1 | 60.3–70.3 b | 7.3 | 5.0–11.4 b | NR | NR | NR | No |
| Rosenkrantz (2016) [30] | USA | NYU Langone Medical Center | 2013.9–2015.2 | 343 | 134 | 64 | NR | 5.8 | NR | 7 (3 + 4) | 6–9 | NR | NR |
| Stanzione (2016) [31] | Italy | University “Federico II”, Ospedale S. Maria delle Grazie | NR | 82 | 34 | 65 a | 43–84 | 8.8 a | NR | 7 (3 + 4) | 6–9 | NR | No |
| Tan (2016) [32] | USA, Taiwan | UCLA, Cathay General Hospital, Hoag Hospital | 2013.3–2016.12 | 106 | 63 | 66.5 | 43–79 | 7.9 | 5.6–10.6 | 7 (3 + 4) | 6–10 | ≥0 | Yes (some) |
| Tewes (2016) [33] | Germany | Hannover Medical School, Klinikum der Region Hannover | 2012.12–2014.12 | 54 | 31 | 69.6 a | NR | 8.7 a | NR | 6 | 6–9 | ≥0 | No |
| Washino (2017) [34] | Japan | Jichi Medical University Saitama Medical Center | 2010.6–2014.4 | 288 | 159 | 69 | 64–74 b | 7.5 | 5.5–11 b | NR | NR | 0 | No |
| Woo (2016) [35] | South Korea | Seoul National University College of Medicine | 2011.1–2013.12 | 105 | 105 | 69 | 49–79 | 8.22 | 0.9–44.2 | 7 | 6–7 | ≥1 | Yes (all) |
| Zhao (2016) [36] | China | Peking University First Hospital | 2010.11–2013.12 | 372 | 185 | 68.5 a | NR | 15 a | NR | 7 | 6–10 | NR | No |

a Mean.

b Interquartile range.

MRI = magnetic resonance imaging; NR = not reported; PCa = prostate cancer; PSA = prostate-specific antigen.

Table 1 Patient characteristics

Characteristics of the studies are summarized in Table 2. MRI was performed using 3-T scanners in 16 studies [16 17 18 20 24 25 26 27 28 29 30 31 32 33 35 36], 1.5-T scanners in four studies [19 21 22 23], and either 3 or 1.5 T in one study [34]. Endorectal coils were used in seven studies [19 21 22 23 24 25 29]. In all studies, the mpMRI protocol consisted of T2WI, DWI, and DCE-MRI. The reference standard was radical prostatectomy in five studies [16 22 26 27 35], a combination of systematic and targeted biopsies in seven studies [17 20 23 25 29 34 36], and only targeted biopsy in seven studies [19 21 24 28 30 32 33]; the reference standard was not consistent throughout the study population in two studies [18 31]. PI-RADSv2 scoring was performed by one to five radiologists, either in consensus or independently. The level of experience of the radiologists was heterogeneous, ranging from 4 to 22 yr of experience in prostate imaging. In most studies, the readers were blinded; however, in three studies, the radiologists were aware that the patients had biopsy-proven PCa [22 26 35], and five studies were not explicit regarding blinding [18 19 30 32 36]. In the majority of the studies, the interval between MRI and the reference standard was less than 6 mo; however, the details were not reported in 10 studies [16 17 18 24 27 28 29 30 32 33]. PCa was separately assessed according to zonal anatomy in seven studies [20 24 25 28 30 31 33]. However, in one study [31], only PCa in the PZ could be evaluated, as no detailed data were provided in the article and the attempt to contact the authors for further information was unsuccessful. Regarding the outcome assessed, eight studies evaluated any cancer [17 19 20 23 25 28 31 33], eight evaluated clinically significant cancer [18 22 26 27 30 34 35 36], and five evaluated both [16 21 24 29 32]. With regard to cutoff values, 13 studies used ≥4 [16 18 19 20 21 24 25 26 27 28 30 31 35], four studies used ≥3 [32 33 34 36], and four studies used both [17 22 23 29]. The location of PCa was reported separately for the PZ and TZ in six studies [20 24 25 28 30 33].

First author (year) | Design | Consecutive enrollment | Reference standard | MRI–reference standard interval | No. of readers (reading mode) | Reader experience (yr) | Reader blinding | Magnet strength (T) | MRI vendor | MRI model | Endorectal coil | PI-RADSv2 application | Cutoff values | Localization | Type of analysis | Outcome assessed | Definition of csPCa
Auer (2016) [16] | Retrospective | Yes | RP | NR | 1 | >5 | Yes | 3 | Siemens | Skyra | No | Strict | 4 | Whole | Lesion | Any + csPCa | GS ≥7 (4 + 3)
Baldisserotto (2016) [17] | Retrospective | Yes | STRUSGB + targeted MRI–TRUS biopsy (cognitive) or RP | NR | 2 (independent) | >10/1 | Yes | 3 | GE | Signa HDxt | No | Strict | 3, 4 | Whole | Patient | Any |
De Visschere (2017) [18] | Retrospective | NR | STRUSGB or RP | NR | NR | NR | NR | 3 | Siemens | Trio | No | Strict | 4 | Whole | Patient | csPCa | GS ≥7, ≥0.5 cc, or EPE
El-Samei (2016) [19] | NR | NR | Targeted MRI–TRUS biopsy | ≤2 wk | NR | NR | NR | 1.5 | Philips | Gyroscan | Yes | Strict | 4 | Whole | Lesion | Any |
Feng (2016) [20] | Retrospective | Yes | STRUSGB + targeted MRI–TRUS biopsy (cognitive) | 1–35 d | 2 (consensus) | 5/4 | Yes | 3 | Siemens | Skyra | No | Strict | 4 | PZ, TZ | Patient | Any |
Kasel-Seibert (2016) [21] | Retrospective | NR | Targeted MRGB | ≤7 d | 2 (independent) | 10/<1 | Yes | 1.5 | Siemens | Avanto | Yes | Strict | 4 | Whole | Lesion | Any + csPCa | GS ≥7
Lin (2016) [22] | Retrospective | NR | RP | ≤6 mo | 2 (independent) | 12/5 | Yes a | 1.5 | Philips | Achieva | Yes | Strict | 3, 4 | Whole | Lesion | csPCa | GS ≥7 and >0.5 cc
Martorana (2016) [23] | Retrospective | Yes | TTB + targeted MRI–TRUS biopsy | 30–78 | 2 (consensus) | 4/4 | Yes | 1.5 | Philips | Achieva | Yes | Strict | 3, 4 | Whole | Lesion | Any | GS ≥7, >0.5 cc, or EPE
Mertan (2016) [24] | Prospective | Yes | STRUSGB + targeted MRI–TRUS biopsy | NR | 1 | >8 | Yes | 3 | Philips | Achieva | Yes | Strict | 4 | PZ, TZ | Lesion | Any + csPCa | GS ≥7
Muller (2015) [25] | Retrospective | Yes | STRUSGB + targeted MRI–TRUS biopsy | ≤6 wk | 5 (independent) | 12/7/1/1/0.5 | Yes | 3 | Philips | Achieva | Yes | Strict | 4 | PZ, TZ | Lesion | Any | GS ≥7 (4 + 3)
Park (2016) [26] | Retrospective | Yes | RP | 21–48 | 2 (independent) | 14/3 | Yes a | 3 | Philips | Achieva | No | Strict | 4 | Whole | Patient | csPCa | GS ≥7, ≥0.5 cc, or EPE
Park (2016) [27] | Retrospective | NR | RP | NR | 2 (independent) | 9/4 | Yes | 3 | GE, Philips, Siemens | Discovery MR750, Achieva, Trio | No | Strict | 4 | Whole | Patient | csPCa | GS ≥7, ≥0.5 cc, or EPE
Polanec (2016) [28] | Retrospective | Yes | Targeted MRGB | NR | 2 (independent) | NR/NR | Yes | 3 | Siemens | Trio | No | Strict | 4 | PZ, TZ | Lesion | Any |
Rastinehad (2015) [29] | Prospective | NR | STRUSGB + targeted MRI–TRUS biopsy | NR | 3 (consensus) | NR | Yes | 3 | Siemens | Verio | Yes | Calculated b | 3, 4 | Whole | Patient | Any + csPCa | Epstein's criteria c for systematic and GS ≥7 or >0.5 cc for targeted biopsy
Rosenkrantz (2016) [30] | Retrospective | NR | Targeted MRI–TRUS biopsy | NR | 2 (independent) | 8/3 | NR | 3 | Siemens | Biograph mMR/Prisma/Skyra/Trio | No | Strict | 4 | PZ, TZ | Lesion | csPCa | GS ≥7
Stanzione (2016) [31] | Prospective | NR | STRUSGB (±targeted TRUSGB) | 20–30 d | 2 (independent) | 14/10 | Yes | 3 | Siemens | Trio | No | Strict | 4 | PZ | Patient | Any |
Tan (2016) [32] | Retrospective | Yes | Targeted MRGB | NR | 3 (consensus) | 16/10 | NR | 3 | Siemens | Skyra, Trio, Verio | No | Strict | 3 | Whole | Lesion | Any + csPCa | GS ≥7
Tewes (2016) [33] | Retrospective | Yes | Targeted MRGB | NR | 2 (independent) | 5/2 | Yes | 3 | Siemens | Skyra | No | Strict | 3 d | PZ, TZ | Patient | Any |
Washino (2017) [34] | Retrospective | NR | TTB + targeted MRI–TRUS biopsy (cognitive) | 0.5–1.6 mo | 1 | 14 | Yes | 1.5 or 3 | Toshiba | Excelart Vantage/Vantage Titan 3T | No | Strict | 3 | Whole | Patient | csPCa | GS ≥7 or maximum core length ≥4 mm
Woo (2016) [35] | Retrospective | NR | RP | ≤6 mo | 2 (consensus) | 22/10 | Yes a | 3 | Philips, Siemens | Ingenia, Verio/Trio | No | Strict | 4 | Whole | Patient | csPCa | GS ≥7
Zhao (2016) [36] | Retrospective | NR | STRUSGB + targeted MRI–TRUS biopsy | ≤3 mo | 2 (independent) | NR | NR | 3 | NR | NR | No | Strict | 3 | Whole | Patient | csPCa | GS ≥7

a Blinded but aware that patients had biopsy-proven PCa.

b PI-RADSv2 scores were generated from reports based on PI-RADSv1 and simplified qualitative system.

c Epstein's criteria = Gleason pattern ≥4, or Gleason 3 + 3 disease with core length ≥50% and/or >2 cores positive.

d For transition zone, cutoff value = 4.

csPCa = clinically significant prostate cancer; GS = Gleason score; EPE = extraprostatic extension; MRGB = magnetic resonance imaging-guided biopsy; MRI = magnetic resonance imaging; MRI–TRUS = fusion of magnetic resonance imaging and transrectal ultrasound images; NR = not reported; PCa = prostate cancer; PI-RADSv2 = Prostate Imaging Reporting and Data System version 2; PZ = peripheral zone; RP = radical prostatectomy; STRUSGB = systematic transrectal ultrasound-guided biopsy; TTB = transperineal template biopsy; TZ = transition zone.

Table 2 Study characteristics

3.3 Quality assessment

Overall, the quality of the studies was not considered high, mainly due to the patient selection domain (Fig. 2). Regarding the patient selection domain, there was generally a high risk of bias as all but four of the studies were retrospective in nature [16 17 18 20 21 22 23 25 26 27 28 30 32 33 34 35 36]. Seven studies were considered to have high concern for applicability, as all or some of the patients had a pathological diagnosis of PCa prior to MRI [16 17 22 26 27 32 35]. Regarding the index test domain, there was a high risk of bias in nine studies. In three of these nine studies, reviewers were aware that patients had biopsy-proven PCa [22 26 35]. In the other six studies, the cutoff value for determining PCa was not specified prior to interpretation [16 20 29 31 33 34]. Only one study had concern for applicability, as PI-RADSv2 scores were indirectly generated from existing clinical radiological reports based on PI-RADSv1 or an in-house scoring system [29]. Regarding the reference standard domain, eight studies had a high risk of bias. Seven were based on either only systematic biopsy or only targeted biopsy [18 19 21 28 30 32 33]; in one study, targeted biopsy was performed, but on lesions that were suspicious on ultrasonography rather than on MRI [31]. Studies in which radical prostatectomy or a systematic plus targeted biopsy (MRI guided, MRI–transrectal ultrasound fusion, or cognitive) was used as the reference standard were considered to have a low risk of bias. In 10 studies, the definition of clinically significant cancer did not follow that described in the PI-RADSv2 guidelines, and these studies therefore showed high concern for applicability [16 21 22 24 29 30 32 34 35 36]. Regarding the flow and timing domain, two studies had a high risk of bias as patients did not all receive the same reference standard [17 18].

Fig. 2
Grouped bar charts show (A) risk of bias and (B) concerns for applicability of 21 included studies using QUADAS-2. QUADAS-2 = Quality Assessment of Diagnostic Accuracy Studies-2.

3.4 Diagnostic accuracy of PI-RADSv2

The sensitivity and specificity of individual studies were 73–100% and 8–100%, respectively. The Q-test revealed that substantial heterogeneity was present (p < 0.001). The Higgins I² statistic demonstrated substantial heterogeneity in terms of the sensitivity (I² = 85.55%) and considerable heterogeneity in terms of the specificity (I² = 95.30%). The coupled forest plot of the sensitivity and specificity demonstrated the absence of a threshold effect (Fig. 3). The Spearman correlation coefficient between the sensitivity and the false-positive rate was 0.45 (95% confidence interval [CI] 0.023–0.738), also indicating the lack of a threshold effect.
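The two heterogeneity checks described above can be illustrated with a short sketch. The Python below is not the authors' code. It assumes the standard definition I² = max(0, (Q − df)/Q) × 100% and computes a Spearman rank correlation between per-study sensitivity and false-positive rate. The function names and the four-study input are hypothetical and chosen only for illustration.

```python
from statistics import mean


def i_squared(q: float, df: int) -> float:
    """Higgins I² (in %) from Cochran's Q and its degrees of freedom."""
    if q <= 0:
        return 0.0
    return max(0.0, (q - df) / q) * 100.0


def _average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rho = Pearson correlation of the rank vectors."""
    rx, ry = _average_ranks(x), _average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical per-study values, NOT taken from the included studies.
sensitivity = [0.80, 0.85, 0.90, 0.95]
false_positive_rate = [0.10, 0.20, 0.15, 0.40]

rho = spearman(sensitivity, false_positive_rate)
print(f"Spearman rho = {rho:.2f}")
print(f"I² for Q = 40 with 20 df = {i_squared(40, 20):.1f}%")
```

A strongly positive rho would suggest that studies trade sensitivity against specificity by using different thresholds; the interpretation of any particular value remains a judgment call.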

Fig. 3
Coupled forest plots of pooled sensitivity and specificity. Numbers are pooled estimates with 95% CI in parentheses. Corresponding heterogeneity statistics are provided at bottom right corners. Horizontal lines indicate 95% CIs. CI = confidence interval.

For all 21 studies combined, the pooled sensitivity was 0.89 (95% CI 0.86–0.92) with specificity of 0.73 (95% CI 0.60–0.83; Fig. 3). In the HSROC curve, there was a large difference between the 95% confidence region and the 95% prediction region, thus indicating heterogeneity between the studies (Fig. 4). The area under the HSROC curve was 0.91 (95% CI 0.88–0.93). According to the Deeks’ funnel plot, the likelihood of publication bias was low, with a p value of 0.75 for the slope coefficient (Fig. 5).
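As a rough worked illustration of what the pooled estimates imply, the following Python sketch derives likelihood ratios and predictive values from the pooled sensitivity (0.89) and specificity (0.73). These derived quantities are not reported in this article, the 40% prevalence is a hypothetical input, and given the heterogeneity between studies the output should be read as an illustration rather than a clinical estimate.

```python
def likelihood_ratios(sens: float, spec: float):
    """Return (LR+, LR-) for a test with the given sensitivity and specificity."""
    return sens / (1 - spec), (1 - sens) / spec


def predictive_values(sens: float, spec: float, prevalence: float):
    """Return (PPV, NPV) from Bayes' theorem for an assumed prevalence."""
    tp = sens * prevalence
    fn = (1 - sens) * prevalence
    fp = (1 - spec) * (1 - prevalence)
    tn = spec * (1 - prevalence)
    return tp / (tp + fp), tn / (tn + fn)


POOLED_SENS = 0.89          # pooled estimate from the text
POOLED_SPEC = 0.73          # pooled estimate from the text
ASSUMED_PREVALENCE = 0.40   # hypothetical, for illustration only

lr_pos, lr_neg = likelihood_ratios(POOLED_SENS, POOLED_SPEC)
ppv, npv = predictive_values(POOLED_SENS, POOLED_SPEC, ASSUMED_PREVALENCE)

print(f"LR+ = {lr_pos:.2f}, LR- = {lr_neg:.2f}")
print(f"PPV = {ppv:.2f}, NPV = {npv:.2f} at prevalence {ASSUMED_PREVALENCE:.0%}")
```

Changing `ASSUMED_PREVALENCE` shows how strongly the predictive values depend on the proportion of patients with PCa in the tested population.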

Fig. 4
Hierarchical summary receiver operating characteristic curve of the diagnostic performance of PI-RADSv2 for detecting prostate cancer. HSROC = hierarchical summary receiver operating characteristic; PI-RADSv2 = Prostate Imaging Reporting and Data System version 2.

Fig. 5
Deeks’ funnel plot. A p value of 0.75 suggests that the likelihood of publication bias is low.
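Deeks' funnel plot asymmetry test [15] regresses the log diagnostic odds ratio on the inverse square root of the effective sample size (ESS), weighting each study by its ESS; a slope coefficient close to zero suggests little small-study effect. The Python sketch below computes only that slope. The 0.5 continuity correction applied to every cell, the function names, and the example 2×2 counts are assumptions for illustration, not data from the included studies, and the p value (which requires a t distribution) is omitted.

```python
import math


def effective_sample_size(n_diseased: int, n_nondiseased: int) -> float:
    """ESS = 4 * n1 * n2 / (n1 + n2)."""
    return 4 * n_diseased * n_nondiseased / (n_diseased + n_nondiseased)


def log_dor(tp: int, fp: int, fn: int, tn: int) -> float:
    """Log diagnostic odds ratio with a 0.5 continuity correction (assumed)."""
    return math.log(((tp + 0.5) * (tn + 0.5)) / ((fp + 0.5) * (fn + 0.5)))


def weighted_slope(x, y, w) -> float:
    """Slope of the weighted least-squares line of y on x with weights w."""
    total = sum(w)
    mean_x = sum(wi * xi for wi, xi in zip(w, x)) / total
    mean_y = sum(wi * yi for wi, yi in zip(w, y)) / total
    sxy = sum(wi * (xi - mean_x) * (yi - mean_y) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - mean_x) ** 2 for wi, xi in zip(w, x))
    return sxy / sxx


def deeks_slope(tables) -> float:
    """tables: iterable of (TP, FP, FN, TN) counts, one tuple per study."""
    x, y, w = [], [], []
    for tp, fp, fn, tn in tables:
        ess = effective_sample_size(tp + fn, fp + tn)
        x.append(1 / math.sqrt(ess))
        y.append(log_dor(tp, fp, fn, tn))
        w.append(ess)
    return weighted_slope(x, y, w)


# Hypothetical 2x2 tables (TP, FP, FN, TN), for illustration only.
example_tables = [(45, 10, 5, 40), (90, 30, 10, 70), (30, 5, 8, 60), (150, 40, 20, 190)]
print(f"Deeks slope = {deeks_slope(example_tables):.2f}")
```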

For the six studies that provided a head-to-head comparison between PI-RADSv1 and PI-RADSv2, PI-RADSv2 demonstrated higher pooled sensitivity of 0.95 (95% CI 0.85–0.98) compared with 0.88 (95% CI 0.80–0.93) for PI-RADSv1 ( p = 0.04). However, the pooled specificity did not show a significant difference between the two versions of PI-RADS: 0.73 (95% CI 0.47–0.89) for v2 and 0.75 (95% CI 0.36–0.94) for v1 ( p = 0.90).

3.5 Heterogeneity exploration

As we found considerable heterogeneity among the included studies, meta-regression analyses were performed (Supplementary Table 1). Among several potential variables, the proportion of patients with PCa, magnetic field strength, and reference standard were significant factors affecting the heterogeneity (p < 0.01 for all). However, among these three, only specificity according to the proportion of patients with PCa showed a statistically significant and clinically meaningful difference: 0.65 (95% CI 0.52–0.78) in studies with ≤50% of patients with PCa versus 0.86 (95% CI 0.75–0.97) in studies with >50% of patients with PCa. Otherwise, no clinically meaningful differences were seen: for magnet strength (3 vs 1.5 T), sensitivity of 0.90 (95% CI 0.86–0.94) versus 0.89 (95% CI 0.81–0.97, p = 0.03) and specificity of 0.73 (95% CI 0.59–0.86) versus 0.72 (95% CI 0.44–1.00, p = 0.81); for reference standard (radical prostatectomy vs biopsy), sensitivity of 0.89 (95% CI 0.83–0.95) versus 0.91 (95% CI 0.88–0.94, p < 0.01) and specificity of 0.65 (95% CI 0.37–0.94) versus 0.73 (95% CI 0.58–0.87, p = 0.48). Other variables, including the cutoff value, use of endorectal coil, and type of analysis (per patient vs per lesion), were not significant factors (p = 0.32–0.70).

3.6 Subgroup analysis

Because four studies used both ≥3 and ≥4 as cutoff values [17 22 23 29] and five studies assessed both any cancer and clinically significant cancer as outcomes [16 21 24 29 32], multiple subgroup analyses were performed to assess various clinical settings (Supplementary Table 2). Regarding cutoff values, the pooled sensitivity was 0.89 (95% CI 0.84–0.92) with specificity of 0.74 (95% CI 0.58–0.85) for 17 studies using ≥4 [16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 35], whereas these were 0.95 (95% CI 0.89–0.97) and 0.47 (95% CI 0.21–0.74) in eight studies using ≥3 [17 22 23 29 32 33 34 36]. When we stratified studies according to the outcome assessed, the results were as follows: (1) cutoff of ≥4 for determining any PCa, sensitivity of 0.89 (95% CI 0.83–0.93) with specificity of 0.80 (95% CI 0.62–0.90); (2) cutoff of ≥3 for determining any PCa, sensitivity of 0.96 (95% CI 0.93–0.98) with specificity of 0.49 (95% CI 0.29–0.70); (3) for determining csPCa regardless of cutoff values, sensitivity of 0.89 (95% CI 0.84–0.92) with specificity of 0.64 (95% CI 0.46–0.78); (4) cutoff of ≥4 for determining csPCa, sensitivity of 0.90 (95% CI 0.85–0.94) with specificity of 0.62 (95% CI 0.45–0.77); and (5) cutoff of ≥3 for determining csPCa, sensitivity of 0.96 (95% CI 0.87–0.99) with specificity of 0.29 (95% CI 0.05–0.77). When the studies using a cutoff value of ≥4 were separately assessed according to the type of analysis, per-patient analysis in eight studies [17 18 20 26 27 29 31 35] yielded pooled sensitivity of 0.89 (95% CI 0.81–0.93) with specificity of 0.76 (95% CI 0.60–0.88), whereas per-lesion analysis in nine studies [16 19 21 22 23 24 25 28 30] yielded pooled sensitivity of 0.87 (95% CI 0.83–0.91) with specificity of 0.70 (95% CI 0.44–0.88).

Based on the localization of PCa, the pooled sensitivity was 0.93 (95% CI 0.87–0.96) with specificity of 0.68 (95% CI 0.43–0.86) in seven studies analyzing PZ cancers [20 24 25 28 30 31 33]. In the six of these studies that also analyzed TZ cancers (all except that by Stanzione et al [31]), the pooled sensitivity and specificity were 0.88 (95% CI 0.77–0.94) and 0.75 (95% CI 0.59–0.86), respectively.

Studies including patients without previous biopsies yielded sensitivity of 0.82 (95% CI 0.72–0.90) and specificity of 0.75 (95% CI 0.65–0.83), whereas the diagnostic performance values were 0.87 (95% CI 0.80–0.92) and 0.71 (95% CI 0.42–0.89) in studies including patients with a history of previous biopsy.

3.7 Discussion

In our meta-analysis, we assessed the diagnostic accuracy of PI-RADSv2 for detecting PCa. The pooled sensitivity and specificity of all 21 studies were 0.89 (95% CI 0.86–0.92) and 0.73 (95% CI 0.60–0.83), respectively. When comparing our data with the only two existing meta-analyses using mpMRI for detecting PCa, a trend toward higher sensitivity and lower specificity can be inferred. In the study by de Rooij et al [37], which evaluated seven studies using a combination of T2WI, DWI, and DCE-MRI, the pooled sensitivity and specificity were 0.74 (95% CI 0.66–0.81) and 0.88 (95% CI 0.82–0.92), respectively. In a more recent meta-analysis by Hamoen et al [6], which analyzed 14 studies using PI-RADSv1, the pooled sensitivity and specificity were 0.78 (95% CI 0.70–0.84) and 0.79 (95% CI 0.68–0.86), respectively. However, setting these three meta-analyses side by side provides only an indirect comparison. To address this issue, we separately assessed a subgroup of studies using both PI-RADSv1 and PI-RADSv2. In this head-to-head comparison, PI-RADSv2 demonstrated higher pooled sensitivity (0.95) compared with PI-RADSv1 (0.88, p = 0.04) without a statistically significant difference in specificity (0.73 vs 0.75, p = 0.90). This increase in sensitivity compared with its predecessor may imply that the revisions undertaken during the development of PI-RADSv2, including the introduction of dominant sequences according to zonal anatomy, the limited contribution of DCE-MRI secondary to DWI and T2WI, and specific guidelines for deriving an integrated overall score, were, in fact, on the right track. In particular, we speculate that the use of dominant sequences, that is, DWI for the PZ and T2WI for the TZ, may have been crucial for the improved sensitivity without a loss in specificity, as suggested by Baur et al [10].

Considering that one of the main intentions behind PI-RADS was to standardize reporting of mpMRI in order to decrease variability and bring about widespread acceptance and implementation in daily practice, it was promising to find that nearly all (20 of 21) studies used PI-RADSv2 strictly according to published guidelines [11]. Only one study formed PI-RADSv2 scores from existing clinical radiological reports that were based on PI-RADSv1 or an in-house scoring system [29]. This is an improvement when compared with prior studies conducted using PI-RADSv1, where investigators used varying methods in determining the overall score (overall five-point score or sum of the scores from each modality) [6]. Still, there is a need for further clarification regarding the cutoff value for detecting PCa. In the studies included in our meta-analysis, cutoff values were predefined in only six studies, while the majority (15/21) were exploratory in nature, testing multiple criteria. When using a cutoff value of ≥4, sensitivity (0.89) and specificity (0.74) were generally good, whereas using ≥3 yielded excellent sensitivity (0.95) and poor specificity (0.47). These results may be taken into consideration when generating the next updated PI-RADS. For instance, the former may be adequate for general use of PI-RADS, whereas the latter could be indicated when a higher cancer detection rate is clinically required (eg, persistently high PSA level despite a previously negative biopsy).
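The trade-off between the two cutoff values can be made concrete with simple arithmetic. The Python sketch below converts the pooled estimates quoted above (≥4: sensitivity 0.89, specificity 0.74; ≥3: sensitivity 0.95, specificity 0.47) into expected counts per 1000 examined patients. The 50% prevalence and the function name are hypothetical choices for illustration, and the calculation ignores the uncertainty and heterogeneity around the pooled values.

```python
def expected_counts(sens: float, spec: float, prevalence: float, n: int = 1000):
    """Expected 2x2 counts for n patients at an assumed prevalence."""
    diseased = round(n * prevalence)
    healthy = n - diseased
    tp = round(sens * diseased)
    tn = round(spec * healthy)
    return {"TP": tp, "FN": diseased - tp, "FP": healthy - tn, "TN": tn}


# Pooled (sensitivity, specificity) per cutoff, as quoted in the text.
CUTOFFS = {">=4": (0.89, 0.74), ">=3": (0.95, 0.47)}
ASSUMED_PREVALENCE = 0.5  # hypothetical, for illustration only

results = {
    name: expected_counts(sens, spec, ASSUMED_PREVALENCE)
    for name, (sens, spec) in CUTOFFS.items()
}
for name, counts in results.items():
    print(name, counts)
```

Under this assumed prevalence, lowering the cutoff from ≥4 to ≥3 would detect 30 additional cancers per 1000 patients at the cost of 135 additional false-positive results.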

In the current study, subgroup analyses were performed to account for differences in outcomes (any cancer vs clinically significant cancer). There was no significant difference between the two outcomes, irrespective of whether a cutoff of ≥3 or ≥4 was used. However, the definition of clinically significant cancer differed among the 13 studies that assessed it. Only three studies defined csPCa strictly according to the PI-RADSv2 guidelines (Gleason score ≥7 [3 + 4], volume ≥0.5 ml, or extraprostatic extension) [11]. Most others used one or two of the three criteria. Including only the former three studies might have provided more robust results; however, including all available studies was more pragmatic and gives a general overview of the existing literature, as this is the first meta-analysis of studies dealing with PI-RADSv2.

In this meta-analysis, we also looked into the technical aspects of MRI. Meta-regression analyses revealed that the use of an endorectal coil was not a statistically significant factor. Furthermore, although the difference between 3 and 1.5 T was statistically significant, it was not clinically meaningful (sensitivity of 0.90 vs 0.89, respectively; p = 0.03). Although there had been debate over these two issues in the past, both 3 and 1.5 T are now well established, and the overall benefit of using an endorectal coil is not evident [38 39]. The PI-RADSv2 guidelines currently accept either option, and the results of our study provide additional evidence to support this.

Regarding the methods of analysis in the studies, there was significant heterogeneity regarding reference standard and type of analysis. Radical prostatectomy was the reference standard in five studies, while the majority were based on a combination of systematic and targeted biopsies. The possibility of PCa despite negative biopsy results in the latter group should be kept in mind. In addition, approximately half of the studies reported outcomes in a per-patient manner (n = 11) and the other half in a per-lesion manner (n = 10). Per-lesion analysis is known to take into account the performance of localizing the disease; however, this was not shown to be a significant factor in the meta-regression analysis.

Our meta-analysis had some limitations. Nearly all studies were retrospective in design, resulting in a high risk of bias for patient selection. Pooling data from predominantly retrospective studies may have led to increased diagnostic sensitivity [40]. However, a meta-analysis restricted to the three prospective studies was technically unfeasible, and its results would not have been representative of the existing literature on PI-RADSv2. Moreover, we used validated methods for the systematic review and reported the data using standard reporting guidelines, including PRISMA and the guidelines of the Handbook for Diagnostic Test Accuracy Reviews published by the Cochrane Collaboration [12 41]. Another limitation is the considerable heterogeneity in our pooled analysis, which affects the general applicability of our summary estimates. To explore the heterogeneity of our data, we performed meta-regression and multiple subgroup analyses. According to these analyses, the proportion of patients with PCa, the magnetic field strength, and the reference standard were significant factors affecting the heterogeneity. In particular, the reference standard included various methods, including radical prostatectomy and a combination of systematic and targeted biopsies (ie, MRI guided, MRI–transrectal ultrasound fusion, or cognitive). Furthermore, the fact that various definitions were used for clinically significant cancer needs to be emphasized. Our meta-regression and subgroup analyses may explain some of the heterogeneity, but a portion remains unexplained. Another important limitation is the small number of studies available for the head-to-head comparison between PI-RADSv1 and PI-RADSv2. Nevertheless, we were able to demonstrate a statistically significant difference in sensitivity between the two versions using the six studies published to date.

PI-RADSv2 shows good performance for the detection of PCa with pooled sensitivity of 0.89 and specificity of 0.73. PI-RADSv2 has higher pooled sensitivity compared with PI-RADSv1 without significantly different specificity.

Author contributions: Sang Youn Kim had full access to all the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis.

Study concept and design: Woo, Suh, S.Y. Kim.

Acquisition of data: Woo, Suh, S.Y. Kim.

Analysis and interpretation of data: Woo, Suh, S.Y. Kim.

Drafting of the manuscript: Woo.

Critical revision of the manuscript for important intellectual content: Suh, S.Y. Kim, Cho, S.H. Kim.

Statistical analysis: Suh.

Obtaining funding: None.

Administrative, technical, or material support: None.

Supervision: S.Y. Kim, Cho, S.H. Kim.

Other: None.

Financial disclosures: Sang Youn Kim certifies that all conflicts of interest, including specific financial interests and relationships and affiliations relevant to the subject matter or materials discussed in the manuscript (eg, employment/affiliation, grants or funding, consultancies, honoraria, stock ownership or options, expert testimony, royalties, or patents filed, received, or pending), are the following: None.

Funding/Support and role of the sponsor: None.

  • [1] American Cancer Society. Key statistics for prostate cancer. http://www.cancer.org/cancer/prostatecancer/detailedguide/prostate-cancer-key-statistics
  • [2] C.M. Hoeks, J.O. Barentsz, T. Hambrock, et al. Prostate cancer: multiparametric MR imaging for detection, localization, and staging. Radiology. 2011;261:46-66
  • [3] O. Ruprecht, P. Weisser, B. Bodelle, H. Ackermann, T.J. Vogl. MRI of the prostate: interobserver agreement compared with histopathologic outcome after radical prostatectomy. Eur J Radiol. 2012;81:456-460
  • [4] L. Dickinson, H.U. Ahmed, C. Allen, et al. Scoring systems used for the interpretation and reporting of multiparametric MRI for prostate cancer detection, localization, and characterization: could standardization lead to improved utilization of imaging within the diagnostic pathway? J Magn Reson Imaging. 2013;37:48-58
  • [5] J.O. Barentsz, J. Richenberg, R. Clements, et al. ESUR prostate MR guidelines 2012. Eur Radiol. 2012;22:746-757
  • [6] E.H. Hamoen, M. de Rooij, J.A. Witjes, J.O. Barentsz, M.M. Rovers. Use of the Prostate Imaging Reporting and Data System (PI-RADS) for prostate cancer detection with multiparametric magnetic resonance imaging: a diagnostic meta-analysis. Eur Urol. 2015;67:1112-1121
  • [7] M. Abd-Alazeez, H.U. Ahmed, M. Arya, et al. The accuracy of multiparametric MRI in men with negative biopsy and elevated PSA level—can it rule out clinically significant prostate cancer? Urol Oncol. 2014;32:45.e17-22
  • [8] A.B. Rosenkrantz, S. Kim, R.P. Lim, et al. Prostate cancer localization using multiparametric MR imaging: comparison of Prostate Imaging Reporting and Data System (PI-RADS) and Likert scales. Radiology. 2013;269:482-492
  • [9] B.G. Hansford, Y. Peng, Y. Jiang, et al. Dynamic contrast-enhanced MR imaging curve-type analysis: is it helpful in the differentiation of prostate cancer from healthy peripheral zone? Radiology. 2015;275:448-457
  • [10] A.D. Baur, A. Maxeiner, T. Franiel, et al. Evaluation of the Prostate Imaging Reporting and Data System for the detection of prostate cancer by the results of targeted biopsy of the prostate. Invest Radiol. 2014;49:411-420
  • [11] J.C. Weinreb, J.O. Barentsz, P.L. Choyke, et al. PI-RADS Prostate Imaging-Reporting and Data System: 2015, version 2. Eur Urol. 2016;69:16-40
  • [12] A. Liberati, D.G. Altman, J. Tetzlaff, et al. The PRISMA statement for reporting systematic reviews and meta-analyses of studies that evaluate health care interventions: explanation and elaboration. J Clin Epidemiol. 2009;62:e1-e34
  • [13] P. Whiting, A.W. Rutjes, J.B. Reitsma, P.M. Bossuyt, J. Kleijnen. The development of QUADAS: a tool for the quality assessment of studies of diagnostic accuracy included in systematic reviews. BMC Med Res Methodol. 2003;3:25
  • [14] C.H. Suh, S.H. Park. Successful publication of systematic review and meta-analysis of studies evaluating diagnostic test accuracy. Korean J Radiol. 2016;17:5-6
  • [15] J.J. Deeks, P. Macaskill, L. Irwig. The performance of tests of publication bias and other sample size effects in systematic reviews of diagnostic test accuracy was assessed. J Clin Epidemiol. 2005;58:882-893
  • [16] T. Auer, M. Edlinger, J. Bektic, et al. Performance of PI-RADS version 1 versus version 2 regarding the relation with histopathological results. World J Urol. 2017;35:687-693
  • [17] M. Baldisserotto, E.J. Neto, G. Carvalhal, et al. Validation of PI-RADS v.2 for prostate cancer diagnosis with MRI at 3T using an external phased-array coil. J Magn Reson Imaging. 2016;44:1354-1359
  • [18] P. De Visschere, N. Lumen, P. Ost, K. Decaestecker, E. Pattyn, G. Villeirs. Dynamic contrast-enhanced imaging has limited added value over T2-weighted imaging and diffusion-weighted imaging when using PI-RADSv2 for diagnosis of clinically significant prostate cancer in patients with elevated PSA. Clin Radiol. 2017;72:23-32
  • [19] El-Samei HAE-KA, M.F. Amin, E.E. Hassan. Assessment of the accuracy of multi-parametric MRI with PI-RADS 2.0 scoring system in the discrimination of suspicious prostatic focal lesions. Egypt J Radiol Nucl Med. 2016;47:1075-1082
  • [20] Z.Y. Feng, L. Wang, X.D. Min, S.G. Wang, G.P. Wang, J. Cai. Prostate cancer detection with multiparametric magnetic resonance imaging: Prostate Imaging Reporting and Data System version 1 versus version 2. Chin Med J. 2016;129:2451-2459
  • [21] M. Kasel-Seibert, T. Lehmann, R. Aschenbach, et al. Assessment of PI-RADS v2 for the detection of prostate cancer. Eur J Radiol. 2016;85:726-731
  • [22] W.C. Lin, V.F. Muglia, G.E. Silva, S. Chodraui Filho, R.B. Reis, A.C. Westphalen. Multiparametric MRI of the prostate: diagnostic performance and interreader agreement of two scoring systems. Br J Radiol. 2016;89:20151056
  • [23] Martorana E, Pirola GM, Scialpi M, et al. Lesion volume predicts prostate cancer risk and aggressiveness: validation of its value alone and matched with Prostate Imaging Reporting and Data System score. BJU Int. In press. doi:10.1111/bju.13649
  • [24] F.V. Mertan, M.D. Greer, J.H. Shih, et al. Prospective evaluation of the Prostate Imaging Reporting and Data System version 2 for prostate cancer detection. J Urol. 2016;196:690-696
  • [25] B.G. Muller, J.H. Shih, S. Sankineni, et al. Prostate cancer: interobserver agreement and accuracy with the revised Prostate Imaging Reporting and Data System at multiparametric MR imaging. Radiology. 2015;277:741-750
  • [26] J.J. Park, B.K. Park. Role of PI-RADSv2 with multiparametric MRI in determining who needs active surveillance or definitive treatment according to PRIAS. J Magn Reson Imaging. 2017;45:1753-1759
  • [27] S.Y. Park, D.C. Jung, Y.T. Oh, et al. Prostate cancer: PI-RADS version 2 helps preoperatively predict clinically significant cancers. Radiology. 2016;280:108-116
  • [28] S. Polanec, T.H. Helbich, H. Bickel, et al. Head-to-head comparison of PI-RADS v2 and PI-RADS v1. Eur J Radiol. 2016;85:1125-1131
  • [29] A.R. Rastinehad, N. Waingankar, B. Turkbey, et al. Comparison of multiparametric MRI scoring systems and the impact on cancer detection in patients undergoing MR US fusion guided prostate biopsies. PLoS One. 2015;10:e0143404
  • [30] Rosenkrantz AB, Babb JS, Taneja SS, Ream JM. Proposed adjustments to PI-RADS version 2 decision rules: impact on prostate cancer detection. Radiology. In press. doi:10.1148/radiol.2016161124
  • [31] A. Stanzione, M. Imbriaco, S. Cocozza, et al. Biparametric 3T magnetic resonance imaging for prostatic cancer detection in a biopsy-naive patient population: a further improvement of PI-RADS v2? Eur J Radiol. 2016;85:2269-2274
  • [32] Tan N, Lin WC, Khoshnoodi P, et al. In-Bore 3-T MR-guided transrectal targeted prostate biopsy: Prostate Imaging Reporting and Data System version 2-based diagnostic performance for detection of prostate cancer. Radiology. In press. doi:10.1148/radiol.2016152827
  • [33] S. Tewes, N. Mokov, D. Hartung, et al. Standardized reporting of prostate MRI: comparison of the Prostate Imaging Reporting and Data System (PI-RADS) version 1 and version 2. PLoS One. 2016;11:e0162879
  • [34] S. Washino, T. Okochi, K. Saito, et al. Combination of PI-RADS score and PSA density predicts biopsy outcome in biopsy naive patients. BJU Int. 2017;119:225-233
  • [35] S. Woo, S.Y. Kim, J. Lee, S.H. Kim, J.Y. Cho. PI-RADS version 2 for prediction of pathological downgrading after radical prostatectomy: a preliminary study in patients with biopsy-proven Gleason Score 7 (3 + 4) prostate cancer. Eur Radiol. 2016;26:3580-3587
  • [36] C. Zhao, G. Gao, D. Fang, et al. The efficiency of multiparametric magnetic resonance imaging (mpMRI) using PI-RADS Version 2 in the diagnosis of clinically significant prostate cancer. Clin Imaging. 2016;40:885-888
  • [37] M. de Rooij, E.H. Hamoen, J.J. Futterer, J.O. Barentsz, M.M. Rovers. Accuracy of multiparametric MRI for prostate cancer detection: a meta-analysis. Am J Roentgenol. 2014;202:343-351
  • [38] Z.K. Shah ,S.N. Elias ,R. Abaza ,et al. Performance comparison of 1.5-T endorectal coil MRI with 3.0-T nonendorectal coil MRI in patients with prostate cancer. Acad Radiol. 2015;22 :467-474 Crossref
  • [39] Y. Mazaheri ,H.A. Vargas ,G. Nyman ,O. Akin ,H. Hricak. Image artifacts on prostate diffusion-weighted magnetic resonance imaging: trade-offs at 1.5 Tesla and 3.0 Tesla. Acad Radiol. 2013;20 :1041-1047 Crossref
  • [40] J.J. Deeks. Systematic reviews in health care: systematic reviews of evaluations of diagnostic and screening tests. BMJ. 2001;323 :157-162 Crossref
  • [41] J.J.W.S. Deeks ,C. Davenport. Guide to the contents of a Cochrane diagnostic test accuracy protocol. J.J. Deeks, P.M. Bossuyt, C. Gatsonis. Cochrane handbook for systematic reviews of diagnostic test accuracy version 1.0.0. Chapter 4. (The Cochrane Collaboration, 2013) http://srdta.cochrane.org/

Prostate cancer (PCa) is the second leading cause of cancer-related mortality in Western men [1]. With recent technological advances and growing availability, multiparametric magnetic resonance imaging (mpMRI) is increasingly being used to guide several aspects of PCa management, including detection, staging, and treatment planning [2]. Despite abundant evidence in the literature reporting high accuracy of mpMRI for PCa diagnosis, widespread acceptance has been hampered by several factors, including difficulty of interpretation, lack of standardized criteria for interpretation (ie, use of Likert scales based on the radiologist's subjective level of suspicion for PCa), and the resulting substantial inter-reader variability [3 4].

To bring standardization to the evaluation and reporting of mpMRI of the prostate, the European Society of Urogenital Radiology (ESUR) published a guideline termed Prostate Imaging Reporting and Data System (PI-RADS) in 2012 [5] . PI-RADS was generated based on expert consensus and provides a detailed scoring system for each MRI sequence (T2-weighted imaging [T2WI], diffusion-weighted imaging [DWI], dynamic contrast-enhanced MRI [DCE-MRI], and MR spectroscopy) for the presence of clinically significant PCa (csPCa). Several investigators have validated the accuracy and reproducibility of PI-RADS, and a recent meta-analysis reported pooled sensitivity and specificity of 0.78 and 0.79, respectively [6] . However, as there was no guideline for the generation of an overall score, different research groups utilized various measures for this purpose—some used a sum of the scores from each sequence (ranging from 3 to 15), whereas others used an overall score of 1–5 [7 8] . Furthermore, emerging data questioned the value of curve-type analysis of DCE-MRI [9] . In addition, investigators suggested that some sequences may be more important in determining the likelihood of PCa (ie, DWI in the peripheral zone [PZ] and T2WI in the transition zone [TZ]) rather than equal weighting for all sequences [10] .

To address these issues, the ESUR and American College of Radiology recently released the updated PI-RADS version 2 (PI-RADSv2) [11] . The main changes from PI-RADSv1 to PI-RADSv2 are the following: (1) introduction of dominant sequences according to zonal anatomy (DWI for the PZ and T2WI for the TZ), (2) limited contribution of DCE-MRI data as merely presence and absence of early focal enhancement, and (3) generation of an overall score (1–5) integrating findings across all MRI sequences.

Since then, several studies assessing the value of PI-RADSv2 have been published. However, the diagnostic performance of this new scoring system has not been evaluated systematically. Therefore, the purpose of our study was to assess the diagnostic performance of PI-RADSv2 for the detection of PCa. In addition, we aimed to compare the diagnostic performance of PI-RADSv1 and PI-RADSv2 in studies available for head-to-head comparison.

This meta-analysis was performed and written according to the preferred reporting items for systematic reviews and meta-analyses (PRISMA) guidelines [12] .

2.1 Literature search

A computerized search of MEDLINE and EMBASE up to December 7, 2016, was performed in order to identify studies evaluating the diagnostic performance of PI-RADSv2 for the detection of PCa. The search query combined synonyms for PCa, MRI, and PI-RADS as follows: (prostate cancer OR prostatic cancer OR prostate neoplasm OR prostatic neoplasm OR prostate tumor OR prostatic tumor OR prostate carcinoma OR prostatic carcinoma OR PCa) AND (magnetic resonance imaging OR MRI OR MR) AND (prostate imaging reporting and data system OR pi-rads OR pi rads OR pirads). Bibliographies of identified articles were also screened to expand the scope of search. Our search was limited to publications in English.
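As a minimal sketch of how the three synonym groups combine into the Boolean query quoted above, the snippet below joins each group with OR and the groups with AND. The variable and function names are hypothetical, and database-specific field tags or syntax (which the text does not describe) are deliberately left out.

```python
# Synonym groups copied from the search strategy described in the text.
pca_terms = [
    "prostate cancer", "prostatic cancer", "prostate neoplasm",
    "prostatic neoplasm", "prostate tumor", "prostatic tumor",
    "prostate carcinoma", "prostatic carcinoma", "PCa",
]
mri_terms = ["magnetic resonance imaging", "MRI", "MR"]
pirads_terms = [
    "prostate imaging reporting and data system",
    "pi-rads", "pi rads", "pirads",
]


def or_group(terms):
    """Join synonyms with OR and wrap the group in parentheses."""
    return "(" + " OR ".join(terms) + ")"


# The three concept groups are combined with AND.
query = " AND ".join(or_group(g) for g in (pca_terms, mri_terms, pirads_terms))
print(query)
```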

2.2 Study selection

2.2.1 Inclusion criteria

Studies were included if they satisfied all the following requirements according to the PICOS criteria [12] : (1) included patients with suspected or diagnosed PCa; (2) for index test, mpMRI of the prostate including all required sequences of T2WI, DWI, and DCE-MRI was performed and assessed with a PI-RADSv2 scoring system; (3) for comparison, a reference standard based on the histopathological examination of radical prostatectomy or biopsy was used; (4) results were reported in sufficient detail for the reconstruction of 2 × 2 tables and determination of sensitivity and specificity at specified cutoff values for evaluating the diagnostic performance of PI-RADSv2; and (5) studies had to be original articles.
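Requirement (4) can be illustrated with a short sketch of how sensitivity and specificity follow from a reconstructed 2 × 2 table. The class name and the counts are assumptions made for illustration only; the counts are not taken from any included study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Counts for one study at one PI-RADSv2 cutoff (hypothetical container)."""
    tp: int  # index test positive, reference standard positive
    fp: int  # index test positive, reference standard negative
    fn: int  # index test negative, reference standard positive
    tn: int  # index test negative, reference standard negative

    def sensitivity(self) -> float:
        # Proportion of reference-positive cases that the index test detects.
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        # Proportion of reference-negative cases that the index test clears.
        return self.tn / (self.tn + self.fp)


# Made-up counts for illustration only.
example = TwoByTwo(tp=45, fp=10, fn=5, tn=40)
print(f"sensitivity = {example.sensitivity():.2f}")  # 45 / (45 + 5)
print(f"specificity = {example.specificity():.2f}")  # 40 / (40 + 10)
```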

2.2.2 Exclusion criteria

Studies were excluded if any of the following criteria were met: (1) studies involving <10 patients; (2) review articles, guidelines, consensus statements, letters, editorials, and conference abstracts; (3) studies using only PI-RADSv1 for the evaluation of mpMRI of the prostate; (4) studies focusing on topics other than using the PI-RADSv2 system for diagnosing PCa (eg, staging and prediction of biochemical recurrence); and (5) studies with overlapping patient populations.

Two reviewers (S.W. and C.H.S., with 3 yr of experience in performing systematic reviews and meta-analyses) independently evaluated the eligibility of the selected studies from the literature. Disagreements, if present between the two reviewers, were resolved by consensus via discussion with a third reviewer (S.Y.K.).

2.3 Data extraction and quality assessment

We extracted the following data regarding study design and results from the selected studies using a standardized form:

  1. Study characteristics: authors, year of publication, country and institution of origin, duration of patient recruitment, and study design (prospective vs retrospective and consecutive or not)
  2. Demographic and clinical characteristics: sample size, number of patients with PCa, patient age, prostate-specific antigen (PSA) level and Gleason score, number of previous biopsies, and PCa diagnosis prior to mpMRI
  3. Technical characteristics of mpMRI: scanner model and manufacturer, magnetic field strength (1.5 vs 3 T), coil type (endorectal vs pelvic phased array), and specific sequences used (T2WI, DWI, DCE, or MR spectroscopy)
  4. Interpretation of mpMRI: number of reviewers and experience in prostate mpMRI, independent or consensus reading, and blinding to clinicopathological information
  5. Reference standard: type of reference standard (radical prostatectomy, targeted biopsy, or systematic biopsy), interval between MRI and pathology, outcomes assessed (any PCa vs csPCa), definition of csPCa (studies assessing “clinically significant”, “aggressive”, or “high-grade” PCa were all considered to assess csPCa; however, only studies that used the definition as provided by the PI-RADSv2 guideline [Gleason score ≥7 (3 + 4), volume >0.5 ml, or extraprostatic extension] were considered not to have concern for applicability), separate analysis for the PZ and TZ, and type of analysis (per patient vs per lesion)
  6. Diagnostic performance of PI-RADSv2, including criteria or cutoff values (in case of multiple readers, the results of the most experienced reader were extracted for this meta-analysis)

The methodological quality of the included studies was assessed using tailored questionnaires and criteria provided by Quality Assessment of Diagnostic Accuracy Studies-2 [13] . Data extraction and quality assessment were performed independently by two reviewers (S.W. and C.H.S.). All disagreements were resolved by consensus through discussion with the third reviewer (S.Y.K.).

2.4 Data synthesis and analysis

The diagnostic performance of PI-RADSv2 for the detection of PCa was the primary outcome for this meta-analysis. In addition, a comparison between the diagnostic performance of PI-RADSv2 and that of PI-RADSv1 using studies that reported head-to-head comparison data of the two PI-RADS versions was considered a secondary outcome.

Pooled estimates of sensitivity and specificity were calculated using hierarchical logistic regression modeling including bivariate modeling and hierarchical summary receiver operating characteristic (HSROC) modeling [14] . For graphical presentation of the results, an HSROC curve with 95% confidence region and prediction region was plotted. Publication bias was evaluated using the Deeks’ funnel plot, and statistical significance was tested with the Deeks’ asymmetry test [15] .
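The bivariate and HSROC models named above model sensitivity and specificity jointly and are normally fitted with dedicated software. The sketch below is a deliberately simpler stand-in, not the model used in this meta-analysis: it pools a single proportion (for example, sensitivity) on the logit scale with DerSimonian-Laird random-effects weights. The function names and the 0.5 continuity correction are assumptions made for illustration.

```python
import math


def logit_and_variance(events: int, total: int) -> tuple[float, float]:
    """Logit of a proportion and its approximate variance.

    A 0.5 continuity correction is added to both cells so that
    proportions of exactly 0 or 1 remain finite.
    """
    a = events + 0.5
    b = (total - events) + 0.5
    return math.log(a / b), 1.0 / a + 1.0 / b


def pool_random_effects(counts: list[tuple[int, int]]) -> float:
    """Pooled proportion from (events, total) pairs, one pair per study."""
    ys, vs = zip(*(logit_and_variance(e, n) for e, n in counts))
    w = [1.0 / v for v in vs]  # inverse-variance (fixed-effect) weights
    fixed = sum(wi * yi for wi, yi in zip(w, ys)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, ys))  # Cochran's Q
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    # Between-study variance (method of moments), truncated at zero.
    tau2 = 0.0 if len(ys) < 2 or c <= 0 else max(0.0, (q - (len(ys) - 1)) / c)
    w_re = [1.0 / (v + tau2) for v in vs]  # random-effects weights
    pooled_logit = sum(wi * yi for wi, yi in zip(w_re, ys)) / sum(w_re)
    return 1.0 / (1.0 + math.exp(-pooled_logit))  # back-transform to a proportion


# Made-up (true positives, diseased patients) pairs for illustration only.
pooled_sensitivity = pool_random_effects([(45, 50), (80, 100), (27, 30)])
print(f"pooled sensitivity = {pooled_sensitivity:.3f}")
```

Pooling sensitivity and specificity separately in this way ignores their correlation across studies, which is the main reason hierarchical models are preferred for diagnostic accuracy data.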

We performed meta-regression analyses to investigate the cause of heterogeneity. The following covariates were considered for the bivariate model: (1) proportion of patients with PCa (>50% vs ≤50%), (2) magnet strength of MRI (3 vs 1.5 T), (3) use of endorectal coil, (4) cutoff value (≥4 vs ≥3), (5) reference standard (radical prostatectomy vs biopsy), and (6) type of analysis (per patient vs per lesion). In addition, multiple subgroup analyses were performed for cutoff value, outcome, and previous biopsy history to assess various clinical settings: (1) a cutoff value of ≥4 for all studies, (2) a cutoff value of ≥3 for all studies, (3) a cutoff value of ≥4 for determining any PCa, (4) a cutoff value of ≥3 for determining any PCa, (5) a cutoff value of ≥4 for determining csPCa, (6) a cutoff value of ≥3 for determining csPCa, (7) a cutoff value of ≥4 in studies using per-patient analysis, (8) a cutoff value of ≥4 in studies using per-lesion analysis, (9) studies analyzing PZ PCa, (10) studies analyzing TZ PCa, (11) patients without previous biopsies, and (12) patients with previous biopsies. The “metandi” and “midas” modules in Stata 10.0 (StataCorp LP, College Station, TX, USA) and “mada” package in R software version 3.2.1 (R Foundation for Statistical Computing, Vienna, Austria) were used for statistical analyses, with p < 0.05 signifying statistical significance.

3.1 Literature search

A systematic literature search initially identified 287 articles. After removing 46 duplicates, screening of the 241 titles and abstracts yielded 105 potentially eligible articles. Full-text reviews were performed, and 84 studies were excluded for the following reasons: not in the field of interest (n = 80, including 68 studies that used only PI-RADSv1), insufficient data to reconstruct 2 × 2 tables (n = 2), and shared study population with other studies (n = 2). Ultimately, 21 original articles including a total of 3857 patients assessing the diagnostic performance of PI-RADSv2 were included in the meta-analysis [16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36]. No additional studies were identified via screening the bibliographies of these 21 studies. Among them, 15 studies including 3099 patients dealt with PI-RADSv2 alone, whereas six studies including 758 patients provided a head-to-head comparison between PI-RADSv1 and PI-RADSv2 [16 20 21 28 32 33]. The detailed study selection process is described in Fig. 1.
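The counts in this selection process can be cross-checked with simple arithmetic. The snippet below is only a bookkeeping sketch using the numbers reported in the paragraph above; the variable names are hypothetical.

```python
# Numbers as reported in the literature search paragraph.
identified = 287
duplicates = 46
screened = identified - duplicates  # titles and abstracts screened

full_text_reviewed = 105
excluded = {
    "not in the field of interest": 80,
    "insufficient data to reconstruct 2 x 2 tables": 2,
    "shared study population": 2,
}
included = full_text_reviewed - sum(excluded.values())

# Breakdown of included studies and patients.
studies = {"PI-RADSv2 alone": 15, "head-to-head v1 vs v2": 6}
patients = {"PI-RADSv2 alone": 3099, "head-to-head v1 vs v2": 758}

print(screened, included, sum(studies.values()), sum(patients.values()))
```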

Fig. 1
PRISMA flow diagram showing study selection process for meta-analysis. a Included original articles for qualitative and quantitative analyses are references [16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36]. MRI = magnetic resonance imaging; PI-RADS = Prostate Imaging Reporting and Data System; PRISMA = Preferred Reporting Items for Systematic Reviews and Meta-analyses.

3.2 Characteristics of included studies

Patient characteristics are shown in Table 1 . The size of the study population ranged from 49 to 456 patients, with the percentage of those with PCa ranging from 37% to 100%. The patients had a median age of 62–69.6 yr, median PSA of 3.97–15 ng/ml, and a Gleason score ranging from 5 to 10. Patients had already been diagnosed with PCa prior to MRI in all or some of the study populations in seven studies [16 17 22 26 27 32 35] . Biopsy was performed before MRI in seven studies [16 21 22 23 26 27 35] , all patients were biopsy-naïve in three studies [18 25 34] , both patient types were included in three studies [28 32 33] , and data regarding previous biopsy were not reported in the remaining eight studies.

First author (year) | Country | Institution | Duration of patient recruitment | Patients (n) | Patients with PCa (n) | Age, median (yr) | Age, range (yr) | PSA, median (ng/ml) | PSA, range (ng/ml) | Gleason score, median | Gleason score, range | No. of previous biopsies | PCa diagnosis before MRI
Auer (2016) [16] | Austria | Medical University of Innsbruck, Medizinische Hochschule Hannover | NR | 50 | NR | 63 a | NR | 7.3 a | NR | 7 (3 + 4) | 6–10 | ≥1 | Yes (all)
Baldisserotto (2016) [17] | Brazil | Pontifícia Universidade Católica do Rio Grande do Sul | 2013.6–2015.6 | 54 | 33 | 65.9 a | 53–81 | 8.4 a | 3–31 | 7 (3 + 4) | 6–9 | NR | Yes (some)
De Visschere (2017) [18] | Belgium | Ghent University Hospital | 2011.5–2014.12 | 245 | 144 | 66 | 44–85 | 9 | 1.4–935.5 | 7 (3 + 4) | ≥6 | 0 | No
El-Samei (2016) [19] | Egypt | Al-Azher University, El-Minia University | 2014.5–2015.10 | 55 | 38 | 62 a | 51–79 | NR | NR | 7 | 6–10 | NR | No
Feng (2016) [20] | China | Tongji Hospital | 2013.6–2015.7 | 401 | 150 | 64.4 a | 34–88 | 10.7 | 0.2–1763 | 7 (4 + 3) | ≤6–≥8 | NR | NR
Kasel-Seibert (2016) [21] | Germany | University Hospital Jena | 2013.7–2015.3 | 82 | 31 | 65 | 48–81 | 13 | 1–111 | 7 (3 + 4) | 6–9 | 1–5 | No
Lin (2016) [22] | Brazil, Taiwan | Ribeirão Preto School of Medicine, China Medical University, Ribeirão Preto School of Medicine | 2011.5–2014.6 | 49 | 49 | 63 | 46–73 | 13.27 | 1.8–41.4 |  |  | ≥1 | Yes (all)
Martorana (2016) [23] | Italy | University of Modena and Reggio Emilia, Perugia University | 2014.1–2016.2 | 157 | 79 | 65 | 47–79 b | 10.7 | 1–75 | 6 | 6–8 | 1–3 | No
Mertan (2016) [24] | USA | National Cancer Institute | 2015.3–2015.9 | 62 | 38 | 65.5 | 50.3–76.6 | 7.1 | 0.5–863 | 7 (3 + 4) | 6–9 | NR | NR
Muller (2015) [25] | USA, the Netherlands | National Cancer Institute, AMC University Hospital, Edward Hébert School of Medicine | 2011.12–2014.5 | 94 | 94 | 62 | 37–79 | 8.51 | 0.7–51.1 | 7 (3 + 4) | 6–10 | 0 | No
Park (2016) [26] | South Korea | Samsung Medical Center | 2012.1–2014.12 | 456 | 456 | 65 | 46–81 | 3.97 | 3.8–4.2 | NR | ≥6 | ≥1 | Yes (all)
Park (2016) [27] | South Korea | Yonsei University College of Medicine | 2012.1–2013.3 | 425 | 425 | NR | NR | NR | 1.4–156.9 | 5–6 | 5–10 | ≥1 | Yes (all)
Polanec (2016) [28] | Austria | Medical University of Vienna, Confraternität Vienna | 2011.6–2015.9 | 65 | NR | 65.3 a | 62.3–87.4 | 10.8 a | 4.2–74.5 | 7 (4 + 3) | 6–9 | ≥0 | No
Rastinehad (2015) [29] | USA | Icahn School of Medicine at Mount Sinai, Fox Chase Cancer Center, National Institutes of Health, Hofstra North Shore LIJ School of Medicine | 2012.2–2014.11 | 312 | 202 | 65.1 | 60.3–70.3 b | 7.3 | 5.0–11.4 b | NR | NR | NR | No
Rosenkrantz (2016) [30] | USA | NYU Langone Medical Center | 2013.9–2015.2 | 343 | 134 | 64 | NR | 5.8 | NR | 7 (3 + 4) | 6–9 | NR | NR
Stanzione (2016) [31] | Italy | University “Federico II”, Ospedale S. Maria delle Grazie | NR | 82 | 34 | 65 a | 43–84 | 8.8 a | NR | 7 (3 + 4) | 6–9 | NR | No
Tan (2016) [32] | USA, Taiwan | UCLA, Cathay General Hospital, Hoag Hospital | 2013.3–2016.12 | 106 | 63 | 66.5 | 43–79 | 7.9 | 5.6–10.6 | 7 (3 + 4) | 6–10 | ≥0 | Yes (some)
Tewes (2016) [33] | Germany | Hannover Medical School, Klinikum der Region Hannover | 2012.12–2014.12 | 54 | 31 | 69.6 a | NR | 8.7 a | NR | 6 | 6–9 | ≥0 | No
Washino (2017) [34] | Japan | Jichi Medical University Saitama Medical Center | 2010.6–2014.4 | 288 | 159 | 69 | 64–74 b | 7.5 | 5.5–11 b | NR | NR | 0 | No
Woo (2016) [35] | South Korea | Seoul National University College of Medicine | 2011.1–2013.12 | 105 | 105 | 69 | 49–79 | 8.22 | 0.9–44.2 | 7 | 6–7 | ≥1 | Yes (all)
Zhao (2016) [36] | China | Peking University First Hospital | 2010.11–2013.12 | 372 | 185 | 68.5 a | NR | 15 a | NR | 7 | 6–10 | NR | No

a Mean.

b Interquartile range.

MRI = magnetic resonance imaging; NR = not reported; PCa = prostate cancer; PSA = prostate-specific antigen.

Table 1. Patient characteristics

Characteristics of the studies are summarized in Table 2. MRI was performed using 3-T scanners in 16 studies [16 17 18 20 24 25 26 27 28 29 30 31 32 33 35 36], 1.5-T scanners in four studies [19 21 22 23], and either 3 or 1.5 T in one study [34]. Endorectal coils were used in seven studies [19 21 22 23 24 25 29]. In all studies, the mpMRI protocol consisted of T2WI, DWI, and DCE-MRI. The reference standard was radical prostatectomy in five studies [16 22 26 27 35], a combination of systematic and targeted biopsies in seven studies [17 20 23 25 29 34 36], and only targeted biopsy in seven studies [19 21 24 28 30 32 33]; the reference standard was not consistent throughout the study population in two studies [18 31]. PI-RADSv2 scoring was performed by one to five radiologists, either in consensus or independently. The level of experience of the radiologists was heterogeneous, ranging from 4 to 22 yr of experience in prostate mpMRI. In most studies, the readers were blinded; however, in three studies, the radiologists were aware that the patients had biopsy-proven PCa [22 26 35], and five studies were not explicit regarding blinding [18 19 30 32 36]. In the majority of the studies, the interval between MRI and the reference standard was less than 6 mo; however, the details were not reported in 10 studies [16 17 18 24 27 28 29 30 32 33]. PCa was separately assessed according to zonal anatomy in seven studies [20 24 25 28 30 31 33]. However, in one study [31], only PCa in the PZ could be evaluated, as no detailed data were provided in the article and the attempt to contact the authors for provision of further information was unsuccessful. Regarding the outcome assessed, seven studies evaluated any cancer [17 19 20 23 25 28 31 33], eight evaluated clinically significant cancer [18 22 26 27 30 34 35 36], and six evaluated both [16 21 24 29 32]. With regard to cutoff values, 13 studies used ≥4 [16 18 19 20 21 24 25 26 27 28 30 31 35], four studies used ≥3 [32 33 34 36], and four studies used both [17 22 23 29]. The location of PCa was reported separately for the PZ and TZ in six studies [20 24 25 28 30 33].

First author (year) | Design | Consecutive enrollment | Reference standard | MRI–reference standard interval | No. of readers (reading) | Experience (yr) | Blinding | Magnet strength (T) | Vendor | Model | Endorectal coil | PI-RADSv2 application | Cutoff values | Localization | Type of analysis | Outcome assessed | Definition of csPCa
Auer (2016) [16] | Retrospective | Yes | RP | NR | 1 | >5 | Yes | 3 | Siemens | Skyra | No | Strict | 4 | Whole | Lesion | Any + csPCa | GS ≥7 (4 + 3)
Baldisserotto (2016) [17] | Retrospective | Yes | STRUSGB + targeted MRI–TRUS biopsy (cognitive) or RP | NR | 2 (independent) | >10/1 | Yes | 3 | GE | SignaHDxt | No | Strict | 3, 4 | Whole | Patient | Any |
De Visschere (2017) [18] | Retrospective | NR | STRUSGB or RP | NR | NR | NR | NR | 3 | Siemens | Trio | No | Strict | 4 | Whole | Patient | csPCa | GS ≥7, ≥0.5 cc, or EPE
El-Samei (2016) [19] | NR | NR | Targeted MRI–TRUS biopsy | ≤2 wk | NR | NR | NR | 1.5 | Philips | Gyroscan | Yes | Strict | 4 | Whole | Lesion | Any |
Feng (2016) [20] | Retrospective | Yes | STRUSGB + targeted MRI–TRUS biopsy (cognitive) | 1–35 d | 2 (consensus) | 5/4 | Yes | 3 | Siemens | Skyra | No | Strict | 4 | PZ, TZ | Patient | Any |
Kasel-Seibert (2016) [21] | Retrospective | NR | Targeted MRGB | ≤7 d | 2 (independent) | 10/<1 | Yes | 1.5 | Siemens | Avanto | Yes | Strict | 4 | Whole | Lesion | Any + csPCa | GS ≥7
Lin (2016) [22] | Retrospective | NR | RP | ≤6 mo | 2 (independent) | 12/5 | Yes a | 1.5 | Philips | Achieva | Yes | Strict | 3, 4 | Whole | Lesion | csPCa | GS ≥7 and >0.5 cc
Martorana (2016) [23] | Retrospective | Yes | TTB + targeted MRI–TRUS biopsy | 30–78 | 2 (consensus) | 4/4 | Yes | 1.5 | Philips | Achieva | Yes | Strict | 3, 4 | Whole | Lesion | Any | GS ≥7, >0.5 cc, or EPE
Mertan (2016) [24] | Prospective | Yes | STRUSGB + targeted MRI–TRUS biopsy | NR | 1 | >8 | Yes | 3 | Philips | Achieva | Yes | Strict | 4 | PZ, TZ | Lesion | Any + csPCa | GS ≥7
Muller (2015) [25] | Retrospective | Yes | STRUSGB + targeted MRI–TRUS biopsy | ≤6 wk | 5 (independent) | 12/7/1/1/0.5 | Yes | 3 | Philips | Achieva | Yes | Strict | 4 | PZ, TZ | Lesion | Any | GS ≥7 (4 + 3)
Park (2016) [26] | Retrospective | Yes | RP | 21–48 | 2 (independent) | 14/3 | Yes a | 3 | Philips | Achieva | No | Strict | 4 | Whole | Patient | csPCa | GS ≥7, ≥0.5 cc, or EPE
Park (2016) [27] | Retrospective | NR | RP | NR | 2 (independent) | 9/4 | Yes | 3 | GE, Philips, Siemens | DiscoveryMR750, Achieva, Trio | No | Strict | 4 | Whole | Patient | csPCa | GS ≥7, ≥0.5 cc, or EPE
Polanec (2016) [28] | Retrospective | Yes | Targeted MRGB | NR | 2 (independent) | NR/NR | Yes | 3 | Siemens | Trio | No | Strict | 4 | PZ, TZ | Lesion | Any |
Rastinehad (2015) [29] | Prospective | NR | STRUSGB + targeted MRI–TRUS biopsy | NR | 3 (consensus) | NR | Yes | 3 | Siemens | Verio | Yes | Calculated b | 3, 4 | Whole | Patient | Any + csPCa | Epstein's criteria c for systematic and GS ≥7 or >0.5 cc for targeted biopsy
Rosenkrantz (2016) [30] | Retrospective | NR | Targeted MRI–TRUS biopsy | NR | 2 (independent) | 8/3 | NR | 3 | Siemens | Biograph mMR/Prisma/Skyra/Trio | No | Strict | 4 | PZ, TZ | Lesion | csPCa | GS ≥7
Stanzione (2016) [31] | Prospective | NR | STRUSGB (±targeted TRUSGB) | 20–30 d | 2 (independent) | 14/10 | Yes | 3 | Siemens | Trio | No | Strict | 4 | PZ | Patient | Any |
Tan (2016) [32] | Retrospective | Yes | Targeted MRGB | NR | 3 (consensus) | 16/10 | NR | 3 | Siemens | Skyra, Trio, Verio | No | Strict | 3 | Whole | Lesion | Any + csPCa | GS ≥7
Tewes (2016) [33] | Retrospective | Yes | Targeted MRGB | NR | 2 (independent) | 5/2 | Yes | 3 | Siemens | Skyra | No | Strict | 3 d | PZ, TZ | Patient | Any |
Washino (2017) [34] | Retrospective | NR | TTB + targeted MRI–TRUS biopsy (cognitive) | 0.5–1.6 mo | 1 | 14 | Yes | 1.5 or 3 | Toshiba | Excelart Vantage/Vantage Titan 3T | No | Strict | 3 | Whole | Patient | csPCa | GS ≥7 or maximum core length ≥4 mm
Woo (2016) [35] | Retrospective | NR | RP | ≤6 mo | 2 (consensus) | 22/10 | Yes a | 3 | Philips, Siemens | Ingenia, Verio/Trio | No | Strict | 4 | Whole | Patient | csPCa | GS ≥7
Zhao (2016) [36] | Retrospective | NR | STRUSGB + targeted MRI–TRUS biopsy | ≤3 mo | 2 (independent) | NR | NR | 3 | NR | NR | No | Strict | 3 | Whole | Patient | csPCa | GS ≥7

a Blinded but aware that patients had biopsy-proven PCa.

b PI-RADSv2 scores were generated from reports based on PI-RADSv1 and simplified qualitative system.

c Epstein's criteria = Gleason pattern ≥4, or Gleason 3 + 3 disease with core length ≥50% and/or >2 cores positive.

d For transition zone, cutoff value = 4.

csPCa = clinically significant prostate cancer; GS = Gleason score; EPE = extraprostatic extension; MRGB = magnetic resonance imaging-guided biopsy; MRI = magnetic resonance imaging; MRI–TRUS = fusion of magnetic resonance imaging and transrectal ultrasound images; NR = not reported; PCa = prostate cancer; PI-RADSv2 = Prostate Imaging Reporting and Data System version 2; PZ = peripheral zone; RP = radical prostatectomy; STRUSGB = systematic transrectal ultrasound-guided biopsy; TTB = transperineal template biopsy; TZ = transition zone.

Table 2. Study characteristics

3.3 Quality assessment

Overall, the quality of the studies was not considered high, mainly due to the patient selection domain (Fig. 2). Regarding the patient selection domain, there was generally a high risk of bias, as all but four of the studies were retrospective in nature [16 17 18 20 21 22 23 25 26 27 28 30 32 33 34 35 36]. Seven studies were considered to have high concern for applicability, as all or some of the patients had a pathological diagnosis of PCa prior to MRI [16 17 22 26 27 32 35]. Regarding the index test domain, there was a high risk of bias in nine studies. In three of the nine studies, reviewers were aware that patients had biopsy-proven PCa [22 26 35]. In the other six studies, the cutoff value for determining PCa was not specified prior to interpretation [16 20 29 31 33 34]. Only one study had concern for applicability, as PI-RADSv2 scores were indirectly generated from existing clinical radiological reports based on PI-RADSv1 or an in-house scoring system [29]. Regarding the reference standard domain, eight studies had a high risk of bias. Seven were based on either systematic biopsy only or targeted biopsy only [18 19 21 28 30 32 33]; in one study, targeted biopsy was performed, but on lesions that were suspicious on ultrasonography and not on MRI [31]. Studies in which radical prostatectomy or a systematic plus targeted biopsy (MRI guided, MRI–transrectal ultrasound fusion, or cognitive) was used as the reference standard were considered to have a low risk of bias. In 10 studies, the definition of clinically significant cancer did not follow that described in the PI-RADSv2 guidelines, and these studies therefore showed high concern for applicability [16 21 22 24 29 30 32 34 35 36]. Regarding the flow and timing domain, two studies had a high risk of bias, as patients did not receive the same reference standard [17 18].

Fig. 2
Grouped bar charts show (A) risk of bias and (B) concerns for applicability of 21 included studies using QUADAS-2. QUADAS-2 = Quality Assessment of Diagnostic Accuracy Studies-2.

3.4 Diagnostic accuracy of PI-RADSv2

The sensitivity and specificity of individual studies were 73–100% and 8–100%, respectively. The Q-test revealed that substantial heterogeneity was present (p < 0.001). The Higgins I² statistic demonstrated substantial heterogeneity in terms of the sensitivity (I² = 85.55%) and considerable heterogeneity in terms of the specificity (I² = 95.30%). The coupled forest plot of the sensitivity and specificity demonstrated the absence of a threshold effect (Fig. 3). The Spearman correlation coefficient between the sensitivity and the false-positive rate was 0.45 (95% confidence interval [CI] 0.023–0.738), also indicating the lack of a threshold effect.
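For readers unfamiliar with these heterogeneity measures, the sketch below shows the textbook relationship between Cochran's Q and the Higgins I² statistic for study-level estimates with known variances. It is an illustration under that standard definition, with hypothetical function names and made-up inputs; it does not reproduce the values reported here, which were computed by the statistical packages named in the methods.

```python
def cochran_q(estimates, variances):
    """Cochran's Q: weighted squared deviations from the fixed-effect mean."""
    weights = [1.0 / v for v in variances]
    pooled = sum(w * y for w, y in zip(weights, estimates)) / sum(weights)
    return sum(w * (y - pooled) ** 2 for w, y in zip(weights, estimates))


def i_squared(q, n_studies):
    """Higgins I² in percent: share of variation beyond chance, floored at 0."""
    df = n_studies - 1
    if q <= 0:
        return 0.0
    return max(0.0, (q - df) / q) * 100.0


# Made-up logit-scale estimates and variances for illustration only.
estimates = [2.2, 1.4, 2.9, 1.1]
variances = [0.10, 0.08, 0.20, 0.05]
q = cochran_q(estimates, variances)
print(f"Q = {q:.2f}, I2 = {i_squared(q, len(estimates)):.1f}%")
```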

Fig. 3
Coupled forest plots of pooled sensitivity and specificity. Numbers are pooled estimates with 95% CI in parentheses. Corresponding heterogeneity statistics are provided at bottom right corners. Horizontal lines indicate 95% CIs. CI = confidence interval.

For all 21 studies combined, the pooled sensitivity was 0.89 (95% CI 0.86–0.92) with specificity of 0.73 (95% CI 0.60–0.83; Fig. 3 ). In the HSROC curve, there was a large difference between the 95% confidence region and the 95% prediction region, thus indicating heterogeneity between the studies ( Fig. 4 ). The area under the HSROC curve was 0.91 (95% CI 0.88–0.93). According to the Deeks’ funnel plot, the likelihood of publication bias was low, with a p value of 0.75 for the slope coefficient ( Fig. 5 ).

Fig. 4
Hierarchical summary receiver operating characteristic curve of the diagnostic performance of PI-RADSv2 for detecting prostate cancer. HSROC = hierarchical summary receiver operating characteristic; PI-RADSv2 = Prostate Imaging Reporting and Data System version 2.

Fig. 5
Deeks’ funnel plot. A p value of 0.75 suggests that the likelihood of publication bias is low.

For the six studies that provided a head-to-head comparison between PI-RADSv1 and PI-RADSv2, PI-RADSv2 demonstrated higher pooled sensitivity of 0.95 (95% CI 0.85–0.98) compared with 0.88 (95% CI 0.80–0.93) for PI-RADSv1 ( p = 0.04). However, the pooled specificity did not show a significant difference between the two versions of PI-RADS: 0.73 (95% CI 0.47–0.89) for v2 and 0.75 (95% CI 0.36–0.94) for v1 ( p = 0.90).

3.5 Heterogeneity exploration

As we found considerable heterogeneity among the included studies, meta-regression analyses were performed (Supplementary Table 1). Among several potential variables, proportion of patients with PCa, magnetic field strength, and reference standard were significant factors affecting the heterogeneity (p < 0.01 for all). However, among these three, only specificity according to the proportion of patients with PCa showed a statistically significant and clinically meaningful difference: 0.65 (95% CI 0.52–0.78) in studies with ≤50% of patients with PCa versus 0.86 (95% CI 0.75–0.97) in studies with >50% of patients with PCa. Otherwise, no clinically meaningful differences were seen: for magnet strength (3 vs 1.5 T), sensitivity of 0.90 (95% CI 0.86–0.94) versus 0.89 (95% CI 0.81–0.97, p = 0.03) and specificity of 0.73 (95% CI 0.59–0.86) versus 0.72 (95% CI 0.44–1.00, p = 0.81); for reference standard (radical prostatectomy vs biopsy), sensitivity of 0.89 (95% CI 0.83–0.95) versus 0.91 (95% CI 0.88–0.94, p < 0.01) and specificity of 0.65 (95% CI 0.37–0.94) versus 0.73 (95% CI 0.58–0.87, p = 0.48). Other variables, including the cutoff value, use of endorectal coil, and type of analysis (per patient vs per lesion), were not significant factors (p = 0.32–0.70).

3.6 Subgroup analysis

As four studies used both ≥3 and ≥4 as cutoff values [17 22 23 29] and several determined both any cancer and clinically significant cancer as outcomes [16 21 24 29 32], multiple subgroup analyses were performed in order to assess various clinical settings (Supplementary Table 2). Regarding cutoff values, the pooled sensitivity was 0.89 (95% CI 0.84–0.92) with specificity of 0.74 (95% CI 0.58–0.85) for 17 studies using ≥4 [16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 35], whereas these were 0.95 (95% CI 0.89–0.97) and 0.47 (95% CI 0.21–0.74) in eight studies using ≥3 [17 22 23 29 32 33 34 36]. When we stratified studies according to the outcome assessed, the results were as follows: (1) cutoff of ≥4 for determining any PCa, sensitivity of 0.89 (95% CI 0.83–0.93) with specificity of 0.80 (95% CI 0.62–0.90); (2) cutoff of ≥3 for determining any PCa, sensitivity of 0.96 (95% CI 0.93–0.98) with specificity of 0.49 (95% CI 0.29–0.70); (3) for determining csPCa regardless of cutoff values, sensitivity of 0.89 (95% CI 0.84–0.92) with specificity of 0.64 (95% CI 0.46–0.78); (4) cutoff of ≥4 for determining csPCa, sensitivity of 0.90 (95% CI 0.85–0.94) with specificity of 0.62 (95% CI 0.45–0.77); and (5) cutoff of ≥3 for determining csPCa, sensitivity of 0.96 (95% CI 0.87–0.99) with specificity of 0.29 (95% CI 0.05–0.77). When the studies using a cutoff value of ≥4 were separately assessed according to the type of analysis, per-patient analysis in eight studies [17 18 20 26 27 29 31 35] yielded pooled sensitivity of 0.89 (95% CI 0.81–0.93) with specificity of 0.76 (95% CI 0.60–0.88), whereas per-lesion analysis in nine studies [16 19 21 22 23 24 25 28 30] yielded pooled sensitivity of 0.87 (95% CI 0.83–0.91) with specificity of 0.70 (95% CI 0.44–0.88).

Based on the localization of PCa, the pooled sensitivity was 0.93 (95% CI 0.87–0.96) with specificity of 0.68 (95% CI 0.43–0.86) in seven studies analyzing PZ cancers [20 24 25 28 30 31 33]. The same studies, except for that by Stanzione et al [31], also analyzed TZ cancers, for which the pooled sensitivity and specificity were 0.88 (95% CI 0.77–0.94) and 0.75 (95% CI 0.59–0.86), respectively.

Studies including patients without previous biopsies yielded sensitivity of 0.82 (95% CI 0.72–0.90) and specificity of 0.75 (95% CI 0.65–0.83), whereas the diagnostic performance values were 0.87 (95% CI 0.80–0.92) and 0.71 (95% CI 0.42–0.89) in studies including patients with a history of previous biopsy.

3.7 Discussion

In our meta-analysis, we assessed the diagnostic accuracy of PI-RADSv2 for detecting PCa. The pooled sensitivity and specificity of all 21 studies were 0.89 (95% CI 0.86–0.92) and 0.73 (95% CI 0.60–0.83), respectively. When our data are compared with the only two existing meta-analyses of mpMRI for detecting PCa, a trend toward higher sensitivity and lower specificity can be inferred. In the study by de Rooij et al [37], which evaluated seven studies using a combination of T2WI, DWI, and DCE-MRI, the pooled sensitivity and specificity were 0.74 (95% CI 0.66–0.81) and 0.88 (95% CI 0.82–0.92), respectively. In a more recent meta-analysis by Hamoen et al [6], which analyzed 14 studies using PI-RADSv1, the pooled sensitivity and specificity were 0.78 (95% CI 0.70–0.84) and 0.79 (95% CI 0.68–0.86), respectively. However, setting the three meta-analyses side by side provides only an indirect comparison. To address this issue, we separately assessed a subgroup of studies using both PI-RADSv1 and PI-RADSv2. In this head-to-head comparison, PI-RADSv2 demonstrated higher pooled sensitivity (0.95) than PI-RADSv1 (0.88, p = 0.04) without a statistically significant difference in specificity (0.73 vs 0.75, p = 0.90). This increase in sensitivity over its predecessor may imply that the revisions undertaken during the development of PI-RADSv2, including the introduction of dominant sequences according to zonal anatomy, limited contribution of DCE-MRI secondary to DWI and T2WI, and specific guidelines for deriving an integrated overall score, were on the right track. In particular, we speculate that the use of dominant sequences, that is, DWI for the PZ and T2WI for the TZ, may have been crucial for the improved sensitivity without a loss in specificity, as suggested by Baur et al [10].

Considering that one of the main intentions behind PI-RADS was to standardize reporting of mpMRI in order to decrease variability and bring about widespread acceptance and implementation in daily practice, it was promising to find that nearly all (20 of 21) studies used PI-RADSv2 strictly according to published guidelines [11]. Only one study formed PI-RADSv2 scores from existing clinical radiological reports that were based on PI-RADSv1 or an in-house scoring system [29]. This is an improvement over prior studies conducted using PI-RADSv1, in which investigators used varying methods to determine the overall score (overall five-point score or sum of the scores from each modality) [6]. Still, there is a need for further clarification regarding the cutoff value for detecting PCa. In the studies included in our meta-analysis, cutoff values were predefined in only six studies, while the majority (15/21) were exploratory in nature, testing multiple criteria. When using a cutoff value of ≥4, sensitivity (0.89) and specificity (0.74) were generally good, whereas using ≥3 yielded excellent sensitivity (0.95) and poor specificity (0.47). These results may be taken into consideration when generating the next updated PI-RADS. For instance, the former may be adequate for general use of PI-RADS, whereas the latter could be indicated when a higher cancer detection rate is clinically required (eg, persistently high PSA level despite a previously negative biopsy).
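The trade-off between the two cutoff values can be made concrete with a small worked example. The sketch below applies the pooled sensitivity and specificity reported above to a hypothetical cohort of 1000 men with an assumed PCa prevalence of 50%; the cohort size, the prevalence, and the helper name `expected_counts` are illustrative assumptions, and because the two pooled estimates come from different sets of studies, the output is only a rough illustration.

```python
def expected_counts(sensitivity, specificity, n=1000, prevalence=0.5):
    """Expected 2 x 2 counts for a hypothetical cohort (illustrative helper)."""
    diseased = round(n * prevalence)      # men with PCa (assumed prevalence)
    healthy = n - diseased                # men without PCa
    tp = round(sensitivity * diseased)    # cancers flagged by the index test
    tn = round(specificity * healthy)     # non-cancers correctly cleared
    return {"TP": tp, "FN": diseased - tp, "FP": healthy - tn, "TN": tn}

# Pooled estimates from the subgroup analysis by cutoff value.
cutoff_4 = expected_counts(sensitivity=0.89, specificity=0.74)
cutoff_3 = expected_counts(sensitivity=0.95, specificity=0.47)

print("cutoff >=4:", cutoff_4)
print("cutoff >=3:", cutoff_3)
```

Under these assumptions, lowering the cutoff from ≥4 to ≥3 would find 30 additional cancers per 1000 men at the cost of 135 additional false-positive results.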

In the current study, subgroup analyses were performed to account for differences in outcomes (any cancer vs clinically significant cancer). There was no significant difference between the two outcomes, irrespective of whether the criterion of ≥3 or ≥4 was used. However, the definition of clinically significant cancer differed among the 13 studies. Only three studies defined csPCa strictly according to the PI-RADSv2 guidelines (Gleason score ≥7 [including 3 + 4], volume >0.5 ml, or extraprostatic extension) [11]. Most others used one or two of the three criteria. Including only the former three studies might have provided more robust results; however, including all available studies was both pragmatic and better suited to presenting a general overview of the existing literature, as this is the first meta-analysis of studies dealing with PI-RADSv2.

In this meta-analysis, we looked into the technical aspects of MRI. Meta-regression analyses revealed that the use of endorectal coil was not a statistically significant factor. Furthermore, although magnet strength showed a statistically significant difference between 3 and 1.5 T, this difference did not prove to be clinically meaningful (sensitivity of 0.90 vs 0.89, respectively; p = 0.03). Although there had been debate over these two issues in the past, both 3 and 1.5 T are now well established, and the overall benefit of using an endorectal coil is not evident [38 39]. The PI-RADSv2 guidelines currently recommend either usage, and the results of our study provide additional evidence to support this.

Regarding the methods of analysis in the studies, there was significant heterogeneity in reference standard and type of analysis. Radical prostatectomy was the reference standard in five studies, while the majority were based on a combination of systematic and targeted biopsies. The possibility of PCa despite negative biopsy results in the latter group should be kept in mind. In addition, approximately half of the studies reported outcomes in a per-patient manner (n = 11) and half in a per-lesion manner (n = 10). Per-lesion analysis is known to take into account the performance of localizing the disease; however, this was not shown to be a significant factor in the meta-regression analysis.

Our meta-analysis had some limitations. Nearly all studies were retrospective in design, resulting in a high risk of bias for patient selection. Pooling data from predominantly retrospective studies may have led to increased diagnostic sensitivity [40]. However, performing a meta-analysis using only the three prospective studies was not technically feasible, and the derived results would not have been representative of the existing literature on PI-RADSv2. Furthermore, we used validated methods for the systematic review and reported the data using standard reporting guidelines, including PRISMA and the guidelines of the Handbook for Diagnostic Test Accuracy Reviews published by the Cochrane Collaboration [12 41]. Another limitation is the considerable heterogeneity in our pooled analysis, which affects the general applicability of our summary estimates. To explore the heterogeneity of our data, we performed meta-regression and multiple subgroup analyses. According to the analyses, the proportion of patients with PCa, the magnetic field strength, and the reference standard were significant factors affecting the heterogeneity. In particular, the reference standard included various methods, including radical prostatectomy and a combination of systematic and targeted biopsies (ie, MRI guided, MRI-transrectal ultrasound fusion, or cognitive). Furthermore, the fact that various definitions were used for clinically significant cancer needs to be emphasized. Our meta-regression and subgroup analyses may explain some of the heterogeneity, but a portion remains unexplained. Another important limitation is the small number of included studies for head-to-head comparison between PI-RADSv1 and PI-RADSv2. However, we were able to demonstrate a statistically significant difference in sensitivity between the two versions using only the six studies published to date.

PI-RADSv2 shows good performance for the detection of PCa with pooled sensitivity of 0.89 and specificity of 0.73. PI-RADSv2 has higher pooled sensitivity compared with PI-RADSv1 without significantly different specificity.

Author contributions: Sang Youn Kim had full access to all the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis.

Study concept and design: Woo, Suh, S.Y. Kim.

Acquisition of data: Woo, Suh, S.Y. Kim.

Analysis and interpretation of data: Woo, Suh, S.Y. Kim.

Drafting of the manuscript: Woo.

Critical revision of the manuscript for important intellectual content: Suh, S.Y. Kim, Cho, S.H. Kim.

Statistical analysis: Suh.

Obtaining funding: None.

Administrative, technical, or material support: None.

Supervision: S.Y. Kim, Cho, S.H. Kim.

Other: None.

Financial disclosures: Sang Youn Kim certifies that all conflicts of interest, including specific financial interests and relationships and affiliations relevant to the subject matter or materials discussed in the manuscript (eg, employment/affiliation, grants or funding, consultancies, honoraria, stock ownership or options, expert testimony, royalties, or patents filed, received, or pending), are the following: None.

Funding/Support and role of the sponsor: None.

  • [1] American Cancer Society. Key statistics for prostate cancer. http://www.cancer.org/cancer/prostatecancer/detailedguide/prostate-cancer-key-statistics
  • [2] C.M. Hoeks, J.O. Barentsz, T. Hambrock, et al. Prostate cancer: multiparametric MR imaging for detection, localization, and staging. Radiology. 2011;261:46-66
  • [3] O. Ruprecht, P. Weisser, B. Bodelle, H. Ackermann, T.J. Vogl. MRI of the prostate: interobserver agreement compared with histopathologic outcome after radical prostatectomy. Eur J Radiol. 2012;81:456-460
  • [4] L. Dickinson, H.U. Ahmed, C. Allen, et al. Scoring systems used for the interpretation and reporting of multiparametric MRI for prostate cancer detection, localization, and characterization: could standardization lead to improved utilization of imaging within the diagnostic pathway? J Magn Reson Imaging. 2013;37:48-58
  • [5] J.O. Barentsz, J. Richenberg, R. Clements, et al. ESUR prostate MR guidelines 2012. Eur Radiol. 2012;22:746-757
  • [6] E.H. Hamoen, M. de Rooij, J.A. Witjes, J.O. Barentsz, M.M. Rovers. Use of the Prostate Imaging Reporting and Data System (PI-RADS) for prostate cancer detection with multiparametric magnetic resonance imaging: a diagnostic meta-analysis. Eur Urol. 2015;67:1112-1121
  • [7] M. Abd-Alazeez, H.U. Ahmed, M. Arya, et al. The accuracy of multiparametric MRI in men with negative biopsy and elevated PSA level – can it rule out clinically significant prostate cancer? Urol Oncol. 2014;32:45.e17-22
  • [8] A.B. Rosenkrantz, S. Kim, R.P. Lim, et al. Prostate cancer localization using multiparametric MR imaging: comparison of Prostate Imaging Reporting and Data System (PI-RADS) and Likert scales. Radiology. 2013;269:482-492
  • [9] B.G. Hansford, Y. Peng, Y. Jiang, et al. Dynamic contrast-enhanced MR imaging curve-type analysis: is it helpful in the differentiation of prostate cancer from healthy peripheral zone? Radiology. 2015;275:448-457
  • [10] A.D. Baur, A. Maxeiner, T. Franiel, et al. Evaluation of the Prostate Imaging Reporting and Data System for the detection of prostate cancer by the results of targeted biopsy of the prostate. Invest Radiol. 2014;49:411-420
  • [11] J.C. Weinreb, J.O. Barentsz, P.L. Choyke, et al. PI-RADS Prostate Imaging-Reporting and Data System: 2015, version 2. Eur Urol. 2016;69:16-40
  • [12] A. Liberati, D.G. Altman, J. Tetzlaff, et al. The PRISMA statement for reporting systematic reviews and meta-analyses of studies that evaluate health care interventions: explanation and elaboration. J Clin Epidemiol. 2009;62:e1-e34
  • [13] P. Whiting, A.W. Rutjes, J.B. Reitsma, P.M. Bossuyt, J. Kleijnen. The development of QUADAS: a tool for the quality assessment of studies of diagnostic accuracy included in systematic reviews. BMC Med Res Methodol. 2003;3:25
  • [14] C.H. Suh, S.H. Park. Successful publication of systematic review and meta-analysis of studies evaluating diagnostic test accuracy. Korean J Radiol. 2016;17:5-6
  • [15] J.J. Deeks, P. Macaskill, L. Irwig. The performance of tests of publication bias and other sample size effects in systematic reviews of diagnostic test accuracy was assessed. J Clin Epidemiol. 2005;58:882-893
  • [16] T. Auer, M. Edlinger, J. Bektic, et al. Performance of PI-RADS version 1 versus version 2 regarding the relation with histopathological results. World J Urol. 2017;35:687-693
  • [17] M. Baldisserotto, E.J. Neto, G. Carvalhal, et al. Validation of PI-RADS v.2 for prostate cancer diagnosis with MRI at 3T using an external phased-array coil. J Magn Reson Imaging. 2016;44:1354-1359
  • [18] P. De Visschere, N. Lumen, P. Ost, K. Decaestecker, E. Pattyn, G. Villeirs. Dynamic contrast-enhanced imaging has limited added value over T2-weighted imaging and diffusion-weighted imaging when using PI-RADSv2 for diagnosis of clinically significant prostate cancer in patients with elevated PSA. Clin Radiol. 2017;72:23-32
  • [19] El-Samei HAE-KA, M.F. Amin, E.E. Hassan. Assessment of the accuracy of multi-parametric MRI with PI-RADS 2.0 scoring system in the discrimination of suspicious prostatic focal lesions. Egypt J Radiol Nucl Med. 2016;47:1075-1082
  • [20] Z.Y. Feng, L. Wang, X.D. Min, S.G. Wang, G.P. Wang, J. Cai. Prostate cancer detection with multiparametric magnetic resonance imaging: Prostate Imaging Reporting and Data System version 1 versus version 2. Chin Med J. 2016;129:2451-2459
  • [21] M. Kasel-Seibert, T. Lehmann, R. Aschenbach, et al. Assessment of PI-RADS v2 for the detection of prostate cancer. Eur J Radiol. 2016;85:726-731
  • [22] W.C. Lin, V.F. Muglia, G.E. Silva, S. Chodraui Filho, R.B. Reis, A.C. Westphalen. Multiparametric MRI of the prostate: diagnostic performance and interreader agreement of two scoring systems. Br J Radiol. 2016;89:20151056
  • [23] E. Martorana, G.M. Pirola, M. Scialpi, et al. Lesion volume predicts prostate cancer risk and aggressiveness: validation of its value alone and matched with Prostate Imaging Reporting and Data System score. BJU Int. In press. doi:10.1111/bju.13649
  • [24] F.V. Mertan, M.D. Greer, J.H. Shih, et al. Prospective evaluation of the Prostate Imaging Reporting and Data System version 2 for prostate cancer detection. J Urol. 2016;196:690-696
  • [25] B.G. Muller, J.H. Shih, S. Sankineni, et al. Prostate cancer: interobserver agreement and accuracy with the revised Prostate Imaging Reporting and Data System at multiparametric MR imaging. Radiology. 2015;277:741-750
  • [26] J.J. Park, B.K. Park. Role of PI-RADSv2 with multiparametric MRI in determining who needs active surveillance or definitive treatment according to PRIAS. J Magn Reson Imaging. 2017;45:1753-1759
  • [27] S.Y. Park, D.C. Jung, Y.T. Oh, et al. Prostate cancer: PI-RADS version 2 helps preoperatively predict clinically significant cancers. Radiology. 2016;280:108-116
  • [28] S. Polanec, T.H. Helbich, H. Bickel, et al. Head-to-head comparison of PI-RADS v2 and PI-RADS v1. Eur J Radiol. 2016;85:1125-1131
  • [29] A.R. Rastinehad, N. Waingankar, B. Turkbey, et al. Comparison of multiparametric MRI scoring systems and the impact on cancer detection in patients undergoing MR US fusion guided prostate biopsies. PLoS One. 2015;10:e0143404
  • [30] A.B. Rosenkrantz, J.S. Babb, S.S. Taneja, J.M. Ream. Proposed adjustments to PI-RADS version 2 decision rules: impact on prostate cancer detection. Radiology. In press. doi:10.1148/radiol.2016161124
  • [31] A. Stanzione, M. Imbriaco, S. Cocozza, et al. Biparametric 3T magnetic resonance imaging for prostatic cancer detection in a biopsy-naive patient population: a further improvement of PI-RADS v2? Eur J Radiol. 2016;85:2269-2274
  • [32] N. Tan, W.C. Lin, P. Khoshnoodi, et al. In-Bore 3-T MR-guided transrectal targeted prostate biopsy: Prostate Imaging Reporting and Data System version 2-based diagnostic performance for detection of prostate cancer. Radiology. In press. doi:10.1148/radiol.2016152827
  • [33] S. Tewes, N. Mokov, D. Hartung, et al. Standardized reporting of prostate MRI: comparison of the Prostate Imaging Reporting and Data System (PI-RADS) version 1 and version 2. PLoS One. 2016;11:e0162879
  • [34] S. Washino, T. Okochi, K. Saito, et al. Combination of PI-RADS score and PSA density predicts biopsy outcome in biopsy naive patients. BJU Int. 2017;119:225-233
  • [35] S. Woo, S.Y. Kim, J. Lee, S.H. Kim, J.Y. Cho. PI-RADS version 2 for prediction of pathological downgrading after radical prostatectomy: a preliminary study in patients with biopsy-proven Gleason Score 7 (3 + 4) prostate cancer. Eur Radiol. 2016;26:3580-3587
  • [36] C. Zhao, G. Gao, D. Fang, et al. The efficiency of multiparametric magnetic resonance imaging (mpMRI) using PI-RADS Version 2 in the diagnosis of clinically significant prostate cancer. Clin Imaging. 2016;40:885-888
  • [37] M. de Rooij, E.H. Hamoen, J.J. Futterer, J.O. Barentsz, M.M. Rovers. Accuracy of multiparametric MRI for prostate cancer detection: a meta-analysis. Am J Roentgenol. 2014;202:343-351
  • [38] Z.K. Shah, S.N. Elias, R. Abaza, et al. Performance comparison of 1.5-T endorectal coil MRI with 3.0-T nonendorectal coil MRI in patients with prostate cancer. Acad Radiol. 2015;22:467-474
  • [39] Y. Mazaheri, H.A. Vargas, G. Nyman, O. Akin, H. Hricak. Image artifacts on prostate diffusion-weighted magnetic resonance imaging: trade-offs at 1.5 Tesla and 3.0 Tesla. Acad Radiol. 2013;20:1041-1047
  • [40] J.J. Deeks. Systematic reviews in health care: systematic reviews of evaluations of diagnostic and screening tests. BMJ. 2001;323:157-162
  • [41] J.J.W.S. Deeks, C. Davenport. Guide to the contents of a Cochrane diagnostic test accuracy protocol. J.J. Deeks, P.M. Bossuyt, C. Gatsonis. Cochrane handbook for systematic reviews of diagnostic test accuracy version 1.0.0. Chapter 4. (The Cochrane Collaboration, 2013) http://srdta.cochrane.org/

Prostate cancer (PCa) is the second leading cause of cancer-related mortality in Western men [1]. With recent technological advancements and growing availability, multiparametric magnetic resonance imaging (mpMRI) is increasingly being used to guide several aspects of PCa management, including detection, staging, and treatment planning [2]. Despite abundant evidence in the literature reporting high accuracy of mpMRI for PCa diagnosis, widespread acceptance has been hampered by several factors, including difficulty of interpretation, lack of standardized criteria for interpretation (ie, use of Likert scales based on the radiologist's subjective level of suspicion for PCa), and the resulting substantial inter-reader variability [3 4].

To bring standardization to the evaluation and reporting of mpMRI of the prostate, the European Society of Urogenital Radiology (ESUR) published a guideline termed Prostate Imaging Reporting and Data System (PI-RADS) in 2012 [5] . PI-RADS was generated based on expert consensus and provides a detailed scoring system for each MRI sequence (T2-weighted imaging [T2WI], diffusion-weighted imaging [DWI], dynamic contrast-enhanced MRI [DCE-MRI], and MR spectroscopy) for the presence of clinically significant PCa (csPCa). Several investigators have validated the accuracy and reproducibility of PI-RADS, and a recent meta-analysis reported pooled sensitivity and specificity of 0.78 and 0.79, respectively [6] . However, as there was no guideline for the generation of an overall score, different research groups utilized various measures for this purpose—some used a sum of the scores from each sequence (ranging from 3 to 15), whereas others used an overall score of 1–5 [7 8] . Furthermore, emerging data questioned the value of curve-type analysis of DCE-MRI [9] . In addition, investigators suggested that some sequences may be more important in determining the likelihood of PCa (ie, DWI in the peripheral zone [PZ] and T2WI in the transition zone [TZ]) rather than equal weighting for all sequences [10] .

To address these issues, the ESUR and American College of Radiology recently released the updated PI-RADS version 2 (PI-RADSv2) [11] . The main changes from PI-RADSv1 to PI-RADSv2 are the following: (1) introduction of dominant sequences according to zonal anatomy (DWI for the PZ and T2WI for the TZ), (2) limited contribution of DCE-MRI data as merely presence and absence of early focal enhancement, and (3) generation of an overall score (1–5) integrating findings across all MRI sequences.

Since then, several studies assessing the value of PI-RADSv2 have been published. However, the diagnostic performance of this new scoring system has not been evaluated systematically. Therefore, the purpose of our study was to assess the diagnostic performance of PI-RADSv2 for the detection of PCa. In addition, we aimed to compare the diagnostic performance of PI-RADSv1 and PI-RADSv2 in studies available for head-to-head comparison.

This meta-analysis was performed and written according to the preferred reporting items for systematic reviews and meta-analyses (PRISMA) guidelines [12] .

2.1 Literature search

A computerized search of MEDLINE and EMBASE up to December 7, 2016, was performed in order to identify studies evaluating the diagnostic performance of PI-RADSv2 for the detection of PCa. The search query combined synonyms for PCa, MRI, and PI-RADS as follows: (prostate cancer OR prostatic cancer OR prostate neoplasm OR prostatic neoplasm OR prostate tumor OR prostatic tumor OR prostate carcinoma OR prostatic carcinoma OR PCa) AND (magnetic resonance imaging OR MRI OR MR) AND (prostate imaging reporting and data system OR pi-rads OR pi rads OR pirads). Bibliographies of identified articles were also screened to expand the scope of search. Our search was limited to publications in English.
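The Boolean structure of this query can be sketched in a few lines of code. The following is a minimal illustration that assembles the query string from the three synonym groups listed above; the helper name `build_query` and the plain OR/AND text syntax are assumptions for illustration, as the database-specific syntax entered into MEDLINE and EMBASE is not described here.

```python
# Synonym groups copied from the search strategy described above.
PCA_TERMS = [
    "prostate cancer", "prostatic cancer", "prostate neoplasm",
    "prostatic neoplasm", "prostate tumor", "prostatic tumor",
    "prostate carcinoma", "prostatic carcinoma", "PCa",
]
MRI_TERMS = ["magnetic resonance imaging", "MRI", "MR"]
PIRADS_TERMS = [
    "prostate imaging reporting and data system",
    "pi-rads", "pi rads", "pirads",
]


def build_query(*groups):
    """Join terms within a group with OR and the groups with AND (hypothetical helper)."""
    return " AND ".join("(" + " OR ".join(group) + ")" for group in groups)


query = build_query(PCA_TERMS, MRI_TERMS, PIRADS_TERMS)
print(query)
```

Keeping the synonym groups as separate lists makes it straightforward to add a term to one concept without disturbing the others.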

2.2 Study selection

2.2.1 Inclusion criteria

Studies were included if they satisfied all the following requirements according to the PICOS criteria [12] : (1) included patients with suspected or diagnosed PCa; (2) for index test, mpMRI of the prostate including all required sequences of T2WI, DWI, and DCE-MRI was performed and assessed with a PI-RADSv2 scoring system; (3) for comparison, a reference standard based on the histopathological examination of radical prostatectomy or biopsy was used; (4) results were reported in sufficient detail for the reconstruction of 2 × 2 tables and determination of sensitivity and specificity at specified cutoff values for evaluating the diagnostic performance of PI-RADSv2; and (5) studies had to be original articles.
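Criterion (4) relies on the standard relationship between a 2 × 2 table and the accuracy measures. The sketch below shows that calculation for a single study; the class name `TwoByTwo` and the counts are hypothetical and are not taken from any included article.

```python
from dataclasses import dataclass


@dataclass
class TwoByTwo:
    """2 x 2 table of index test result versus reference standard."""
    tp: int  # PI-RADSv2 positive, cancer on pathology
    fp: int  # PI-RADSv2 positive, no cancer on pathology
    fn: int  # PI-RADSv2 negative, cancer on pathology
    tn: int  # PI-RADSv2 negative, no cancer on pathology

    def sensitivity(self) -> float:
        # Proportion of reference-standard positives detected by the index test.
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        # Proportion of reference-standard negatives correctly called negative.
        return self.tn / (self.tn + self.fp)


# Hypothetical counts for one study at a single cutoff value.
example = TwoByTwo(tp=45, fp=10, fn=5, tn=40)
print(f"sensitivity = {example.sensitivity():.2f}")  # 45 / 50
print(f"specificity = {example.specificity():.2f}")  # 40 / 50
```

A study that reports results at both cutoff values (≥3 and ≥4) would contribute one such table per cutoff.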

2.2.2 Exclusion criteria

Studies were excluded if any of the following criteria were met: (1) studies involving <10 patients; (2) review articles, guidelines, consensus statements, letters, editorials, and conference abstracts; (3) studies using only PI-RADSv1 for the evaluation of mpMRI of the prostate; (4) studies focusing on topics other than using the PI-RADSv2 system for diagnosing PCa (eg, staging and prediction of biochemical recurrence); and (5) studies with overlapping patient populations.

Two reviewers (S.W. and C.H.S., with 3 yr of experience in performing systematic reviews and meta-analyses) independently evaluated the eligibility of the selected studies from the literature. Disagreements, if present between the two reviewers, were resolved by consensus via discussion with a third reviewer (S.Y.K.).

2.3 Data extraction and quality assessment

We extracted the following data regarding study design and results from the selected studies using a standardized form:

  1. Study characteristics: authors, year of publication, country and institution of origin, duration of patient recruitment, and study design (prospective vs retrospective and consecutive or not)

  2. Demographic and clinical characteristics: sample size, number of patients with PCa, patient age, prostate-specific antigen (PSA) level and Gleason score, number of previous biopsies, and PCa diagnosis prior to mpMRI

  3. Technical characteristics of mpMRI: scanner model and manufacturer, magnetic field strength (1.5 vs 3 T), coil type (endorectal vs pelvic phased array), and specific sequences used (T2WI, DWI, DCE, or MR spectroscopy)

  4. Interpretation of mpMRI: number of reviewers and experience in prostate mpMRI, independent or consensus reading, and blinding to clinicopathological information

  5. Reference standard: type of reference standard (radical prostatectomy, targeted biopsy, or systematic biopsy), interval between MRI and pathology, outcomes assessed (any PCa vs csPCa), definition of csPCa (studies assessing “clinically significant”, “aggressive”, or “high-grade” PCa were all considered to assess csPCa; however, only studies that used the definition provided by the PI-RADSv2 guideline [Gleason score ≥7 (including 3 + 4), volume >0.5 ml, or extraprostatic extension] were considered not to have concern for applicability), separate analysis for the PZ and TZ, and type of analysis (per patient vs per lesion)

  6. Diagnostic performance of PI-RADSv2, including criteria or cutoff values (in case of multiple readers, the results of the most experienced reader were extracted for this meta-analysis)

The methodological quality of the included studies was assessed using tailored questionnaires and criteria provided by Quality Assessment of Diagnostic Accuracy Studies-2 [13] . Data extraction and quality assessment were performed independently by two reviewers (S.W. and C.H.S.). All disagreements were resolved by consensus through discussion with the third reviewer (S.Y.K.).

2.4 Data synthesis and analysis

The diagnostic performance of PI-RADSv2 for the detection of PCa was the primary outcome for this meta-analysis. In addition, a comparison between the diagnostic performance of PI-RADSv2 and that of PI-RADSv1 using studies that reported head-to-head comparison data of the two PI-RADS versions was considered a secondary outcome.

Pooled estimates of sensitivity and specificity were calculated using hierarchical logistic regression modeling including bivariate modeling and hierarchical summary receiver operating characteristic (HSROC) modeling [14] . For graphical presentation of the results, an HSROC curve with 95% confidence region and prediction region was plotted. Publication bias was evaluated using the Deeks’ funnel plot, and statistical significance was tested with the Deeks’ asymmetry test [15] .
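To give a sense of what pooling on the logit scale involves, the sketch below uses a deliberately simplified stand-in: fixed-effect inverse-variance pooling of a single proportion. It is not the bivariate or HSROC model used in this meta-analysis, which additionally models between-study variance and the correlation between sensitivity and specificity. The function name `pool_logit` and the counts are hypothetical, and the sketch assumes no study has a zero cell (which would require a continuity correction).

```python
import math


def logit(p):
    return math.log(p / (1 - p))


def inv_logit(x):
    return 1 / (1 + math.exp(-x))


def pool_logit(events, totals):
    """Fixed-effect inverse-variance pooling of proportions on the logit scale.

    Simplified illustration only: ignores between-study heterogeneity and
    assumes 0 < events < totals for every study.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for e, n in zip(events, totals):
        variance = 1 / e + 1 / (n - e)   # approximate variance of logit(e / n)
        weight = 1 / variance
        weighted_sum += weight * logit(e / n)
        weight_total += weight
    return inv_logit(weighted_sum / weight_total)


# Hypothetical true-positive counts and numbers of diseased patients in two studies.
pooled_sensitivity = pool_logit(events=[45, 80], totals=[50, 100])
print(f"pooled sensitivity = {pooled_sensitivity:.3f}")
```

The pooled value lies between the two study-level sensitivities (0.90 and 0.80) and closer to the estimate from the study with the smaller logit variance.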

We performed meta-regression analyses to investigate the cause of heterogeneity. The following covariates were considered for the bivariate model: (1) proportion of patients with PCa (>50% vs ≤50%), (2) magnet strength of MRI (3 vs 1.5 T), (3) use of endorectal coil, (4) cutoff value (≥4 vs ≥3), (5) reference standard (radical prostatectomy vs biopsy), and (6) type of analysis (per patient vs per lesion). In addition, multiple subgroup analyses were performed for cutoff value, outcome, and previous biopsy history to assess various clinical settings: (1) a cutoff value of ≥4 for all studies, (2) a cutoff value of ≥3 for all studies, (3) a cutoff value of ≥4 for determining any PCa, (4) a cutoff value of ≥3 for determining any PCa, (5) a cutoff value of ≥4 for determining csPCa, (6) a cutoff value of ≥3 for determining csPCa, (7) a cutoff value of ≥4 in studies using per-patient analysis, (8) a cutoff value of ≥4 in studies using per-lesion analysis, (9) studies analyzing PZ PCa, (10) studies analyzing TZ PCa, (11) patients without previous biopsies, and (12) patients with previous biopsies. The “metandi” and “midas” modules in Stata 10.0 (StataCorp LP, College Station, TX, USA) and “mada” package in R software version 3.2.1 (R Foundation for Statistical Computing, Vienna, Austria) were used for statistical analyses, with p < 0.05 signifying statistical significance.

3.1 Literature search

A systematic literature search initially identified 287 articles. After removing 46 duplicates, screening of the 241 titles and abstracts yielded 105 potentially eligible articles. Full-text reviews were performed, and 84 studies were excluded for the following reasons: not in the field of interest (n = 80, including 68 studies that used only PI-RADSv1), insufficient data to reconstruct 2 × 2 tables (n = 2), and shared study population with other studies (n = 2). Ultimately, 21 original articles including a total of 3857 patients assessing the diagnostic performance of PI-RADSv2 were included in the meta-analysis [16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36]. No additional studies were identified via screening the bibliographies of these 21 studies. Among them, 15 studies including 3099 patients dealt with PI-RADSv2 alone, whereas six studies including 758 patients provided a head-to-head comparison between PI-RADSv1 and PI-RADSv2 [16 20 21 28 32 33]. The detailed study selection process is described in Fig. 1.
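The counts in the selection process can be checked for internal consistency with simple arithmetic. The short sketch below does so using only the numbers reported above; the variable names are illustrative.

```python
# Counts reported in the study selection process.
identified = 287
duplicates = 46
screened = identified - duplicates          # titles and abstracts screened

full_text_reviewed = 105
exclusions = {
    "not in the field of interest": 80,
    "insufficient data for 2 x 2 tables": 2,
    "shared study population": 2,
}
included = full_text_reviewed - sum(exclusions.values())

# (number of studies, number of patients) by type of study.
groups = {
    "PI-RADSv2 alone": (15, 3099),
    "head-to-head v1 vs v2": (6, 758),
}
total_studies = sum(n_studies for n_studies, _ in groups.values())
total_patients = sum(n_patients for _, n_patients in groups.values())

print(screened, included, total_studies, total_patients)
```

All four derived values agree with the totals stated in the text (241 records screened, 21 studies included, 3857 patients).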


Fig. 1
PRISMA flow diagram showing study selection process for meta-analysis. a Included original articles for qualitative and quantitative analyses are references [16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36]. MRI = magnetic resonance imaging; PI-RADS = Prostate Imaging Reporting and Data System; PRISMA = Preferred Reporting Items for Systematic Reviews and Meta-analyses.

3.2 Characteristics of included studies

Patient characteristics are shown in Table 1 . The size of the study population ranged from 49 to 456 patients, with the percentage of those with PCa ranging from 37% to 100%. The patients had a median age of 62–69.6 yr, median PSA of 3.97–15 ng/ml, and a Gleason score ranging from 5 to 10. Patients had already been diagnosed with PCa prior to MRI in all or some of the study populations in seven studies [16 17 22 26 27 32 35] . Biopsy was performed before MRI in seven studies [16 21 22 23 26 27 35] , all patients were biopsy-naïve in three studies [18 25 34] , both patient types were included in three studies [28 32 33] , and data regarding previous biopsy were not reported in the remaining eight studies.

First author (year) Origin Duration of patient recruitment Patients ( n ) Patients with PCa ( n ) Age (yr) PSA (ng/ml) Gleason score No. of previous biopsies PCa diagnosis before MRI
Country Institution Median Range Median Range Median Range
Auer (2016) [16] Austria Medical University of Innsbruck, Medizinische Hochschule Hannover NR 50 NR 63 a NR 7.3 a NR 7 (3 + 4) 6–10 ≥1 Yes (all)
Baldisserotto (2016) [17] Brazil Pontifícia Universidade Católica do Rio Grande do Sul 2013.6–2015.6 54 33 65.9 a 53–81 8.4 a 3–31 7 (3 + 4) 6–9 NR Yes (some)
De Visschere (2017) [18] Belgium Ghent University Hospital 2011.5–2014.12 245 144 66 44–85 9 1.4–935.5 7 (3 + 4) ≥6 0 No
El-Samei (2016) [19] Egypt Al-Azher University, El-Minia University 2014.5–2015.10 55 38 62 a 51–79 NR NR 7 6–10 NR No
Feng (2016) [20] China Tongji Hospital 2013.6–2015.7 401 150 64.4 a 34–88 10.7 0.2–1763 7 (4 + 3) ≤6–≥8 NR NR
Kasel-Seibert (2016) [21] Germany University Hospital Jena 2013.7–2015.3 82 31 65 48–81 13 1–111 7 (3 + 4) 6–9 1–5 No
Lin (2016) [22] Brazil, Taiwan Ribeirão Preto School of Medicine, China Medical University, Ribeirão Preto School of Medicine 2011.5–2014.6 49 49 63 46–73 13.27 1.8–41.4 ≥1 Yes (all)
Martorana (2016) [23] Italy University of Modena and Reggio Emilia, Perugia University 2014.1–2016.2 157 79 65 47–79 b 10.7 1–75 6 6–8 1–3 No
Mertan (2016) [24] USA National Cancer Institute 2015.3–2015.9 62 38 65.5 50.3–76.6 7.1 0.5–863 7 (3 + 4) 6–9 NR NR
Muller (2015) [25] USA, the Netherlands National Cancer Institute, AMC University Hospital, Edward Hébert School of Medicine 2011.12–2014.5 94 94 62 37–79 8.51 0.7–51.1 7 (3 + 4) 6–10 0 No
Park (2016) [26] South Korea Samsung Medical Center 2012.1–2014.12 456 456 65 46–81 3.97 3.8–4.2 NR ≥6 ≥1 Yes (all)
Park (2016) [27] South Korea Yonsei University College of Medicine 2012.1–2013.3 425 425 NR NR NR 1.4–156.9 5–6 5–10 ≥1 Yes (all)
Polanec (2016) [28] Vienna Medical University of Vienna, Confraternität Vienna 2011.6–2015.9 65 NR 65.3 a 62.3–87.4 10.8 a 4.2–74.5 7 (4 + 3) 6–9 ≥0 No
Rastinehad (2015) [29] USA Icahn School of Medicine at Mount Sinai, Fox Chase Cancer Center, National Institutes of Health, Hofstra North Shore LIJ School of Medicine 2012.2–2014.11 312 202 65.1 60.3–70.3 b 7.3 5.0–11.4 b NR NR NR No
Rosenkrantz (2016) [30] USA NYU Langone Medical Center 2013.9–2015.2 343 134 64 NR 5.8 NR 7 (3 + 4) 6–9 NR NR
Stanzione (2016) [31] Italy University “Federico II”, Ospedale S. Maria delle Grazie NR 82 34 65 a 43–84 8.8 a NR 7 (3 + 4) 6–9 NR No
Tan (2016) [32] USA, Taiwan UCLA, Cathay General Hospital, Hoag Hospital 2013.3–2016.12 106 63 66.5 43–79 7.9 5.6–10.6 7 (3 + 4) 6–10 ≥0 Yes (some)
Tewes (2016) [33] Germany Hannover Medical School, Klinikum der Region Hannover 2012.12–2014.12 54 31 69.6 a NR 8.7 a NR 6 6–9 ≥0 No
Washino (2017) [34] Japan Jichi Medical University Saitama Medical Center 2010.6–2014.4 288 159 69 64–74 b 7.5 5.5–11 b NR NR 0 No
Woo (2016) [35] South Korea Seoul National University College of Medicine 2011.1–2013.12 105 105 69 49–79 8.22 0.9–44.2 7 6–7 ≥1 Yes (all)
Zhao (2016) [36] China Peking University First Hospital 2010.11–2013.12 372 185 68.5 a NR 15 a NR 7 6–10 NR No

a Mean.

b Interquartile range.

MRI = magnetic resonance imaging; NR = not reported; PCa = prostate cancer; PSA = prostate-specific antigen.

Table 1. Patient characteristics

Characteristics of the studies are summarized in Table 2. MRI was performed using 3-T scanners in 16 studies [16 17 18 20 24 25 26 27 28 29 30 31 32 33 35 36], 1.5-T scanners in four studies [19 21 22 23], and either 3 or 1.5 T in one study [34]. Endorectal coils were used in seven studies [19 21 22 23 24 25 29]. In all studies, the mpMRI protocol consisted of T2WI, DWI, and DCE-MRI. The reference standard was radical prostatectomy in five studies [16 22 26 27 35], a combination of systematic and targeted biopsies in seven studies [17 20 23 25 29 34 36], and only targeted biopsy in seven studies [19 21 24 28 30 32 33]; the reference standard was not consistent throughout the study population in two studies [18 31]. PI-RADSv2 scoring was performed by one to five radiologists, either in consensus or independently. The level of experience of the radiologists was heterogeneous, ranging from 4 to 22 yr of experience in prostate imaging. In most studies, the readers were blinded; however, in three studies, the radiologists were aware that the patients had biopsy-proven PCa [22 26 35], and five studies were not explicit regarding blinding [18 19 30 32 36]. In the majority of the studies, the interval between MRI and the reference standard was less than 6 mo; however, the details were not reported in 10 studies [16 17 18 24 27 28 29 30 32 33]. PCa was separately assessed according to zonal anatomy in seven studies [20 24 25 28 30 31 33]; the location of PCa was reported separately for the PZ and TZ in six of these [20 24 25 28 30 33]. In the remaining study [31], only PCa in the PZ could be evaluated, as no detailed data were provided in the article and the attempt to contact the authors for further information was unsuccessful. Regarding the outcome assessed, eight studies evaluated any cancer [17 19 20 23 25 28 31 33], eight evaluated clinically significant cancer [18 22 26 27 30 34 35 36], and five evaluated both [16 21 24 29 32]. With regard to cutoff values, 13 studies used ≥4 [16 18 19 20 21 24 25 26 27 28 30 31 35], four studies used ≥3 [32 33 34 36], and four studies used both [17 22 23 29].

Study Reader MRI
First author (year) Design Consecutive enrollment Reference standard MRI–reference standard interval No. of readers Experience (yr) Blinding Magnet strength (T) Vendor Model Endorectal coil PI-RADSv2 application Cutoff values Localization Type of analysis Outcome assessed Definition of csPCa
Auer (2016) [16] Retrospective Yes RP NR 1 >5 Yes 3 Siemens Skyra No Strict 4 Whole Lesion Any + csPCa GS ≥7 (4 + 3)
Baldisserotto (2016) [17] Retrospective Yes STRUSGB + targeted MRI–TRUS biopsy (cognitive) or RP NR 2 (independent) >10/1 Yes 3 GE Signa HDxt No Strict 3, 4 Whole Patient Any
De Visschere (2017) [18] Retrospective NR STRUSGB or RP NR NR NR NR 3 Siemens Trio No Strict 4 Whole Patient csPCa GS ≥7, ≥0.5 cc, or EPE
El-Samei (2016) [19] NR NR Targeted MRI–TRUS biopsy ≤2 wk NR NR NR 1.5 Philips Gyroscan Yes Strict 4 Whole Lesion Any
Feng (2016) [20] Retrospective Yes STRUSGB + targeted MRI–TRUS biopsy (cognitive) 1–35 d 2 (consensus) 5/4 Yes 3 Siemens Skyra No Strict 4 PZ, TZ Patient Any
Kasel-Seibert (2016) [21] Retrospective NR Targeted MRGB ≤7 d 2 (independent) 10/<1 Yes 1.5 Siemens Avanto Yes Strict 4 Whole Lesion Any + csPCa GS ≥7
Lin (2016) [22] Retrospective NR RP ≤6 mo 2 (independent) 12/5 Yes a 1.5 Philips Achieva Yes Strict 3, 4 Whole Lesion csPCa GS ≥7 and >0.5 cc
Martorana (2016) [23] Retrospective Yes TTB + targeted MRI–TRUS biopsy 30–78 2 (consensus) 4/4 Yes 1.5 Philips Achieva Yes Strict 3, 4 Whole Lesion Any GS ≥7, >0.5 cc, or EPE
Mertan (2016) [24] Prospective Yes STRUSGB + targeted MRI–TRUS biopsy NR 1 >8 Yes 3 Philips Achieva Yes Strict 4 PZ, TZ Lesion Any + csPCa GS ≥7
Muller (2015) [25] Retrospective Yes STRUSGB + targeted MRI–TRUS biopsy ≤6 wk 5 (independent) 12/7/1/1/0.5 Yes 3 Philips Achieva Yes Strict 4 PZ, TZ Lesion Any GS ≥7 (4 + 3)
Park (2016) [26] Retrospective Yes RP 21–48 2 (independent) 14/3 Yes a 3 Philips Achieva No Strict 4 Whole Patient csPCa GS ≥7, ≥0.5 cc, or EPE
Park (2016) [27] Retrospective NR RP NR 2 (independent) 9/4 Yes 3 GE, Philips, Siemens Discovery MR750, Achieva, Trio No Strict 4 Whole Patient csPCa GS ≥7, ≥0.5 cc, or EPE
Polanec (2016) [28] Retrospective Yes Targeted MRGB NR 2 (independent) NR/NR Yes 3 Siemens Trio No Strict 4 PZ, TZ Lesion Any
Rastinehad (2015) [29] Prospective NR STRUSGB + targeted MRI–TRUS biopsy NR 3 (consensus) NR Yes 3 Siemens Verio Yes Calculated b 3, 4 Whole Patient Any + csPCa Epstein's criteria c for systematic and GS ≥7 or >0.5 cc for targeted biopsy
Rosenkrantz (2016) [30] Retrospective NR Targeted MRI–TRUS biopsy NR 2 (independent) 8/3 NR 3 Siemens Biograph mMR/Prisma/Skyra/Trio No Strict 4 PZ, TZ Lesion csPCa GS ≥7
Stanzione (2016) [31] Prospective NR STRUSGB (±targeted TRUSGB) 20–30 d 2 (independent) 14/10 Yes 3 Siemens Trio No Strict 4 PZ Patient Any
Tan (2016) [32] Retrospective Yes Targeted MRGB NR 3 (consensus) 16/10 NR 3 Siemens Skyra, Trio, Verio No Strict 3 Whole Lesion Any + csPCa GS ≥7
Tewes (2016) [33] Retrospective Yes Targeted MRGB NR 2 (independent) 5/2 Yes 3 Siemens Skyra No Strict 3 d PZ, TZ Patient Any
Washino (2017) [34] Retrospective NR TTB + targeted MRI–TRUS biopsy (cognitive) 0.5–1.6 mo 1 14 Yes 1.5 or 3 Toshiba Excelart Vantage/Vantage Titan 3T No Strict 3 Whole Patient csPCa GS ≥7 or maximum core length ≥4 mm
Woo (2016) [35] Retrospective NR RP ≤6 mo 2 (consensus) 22/10 Yes a 3 Philips, Siemens Ingenia, Verio/Trio No Strict 4 Whole Patient csPCa GS ≥7
Zhao (2016) [36] Retrospective NR STRUSGB + targeted MRI–TRUS biopsy ≤3 mo 2 (independent) NR NR 3 NR NR No Strict 3 Whole Patient csPCa GS ≥7

a Blinded but aware that patients had biopsy-proven PCa.

b PI-RADSv2 scores were generated from reports based on PI-RADSv1 and simplified qualitative system.

c Epstein's criteria = Gleason pattern ≥4, or Gleason 3 + 3 disease with core length ≥50% and/or >2 cores positive.

d For transition zone, cutoff value = 4.

csPCa = clinically significant prostate cancer; GS = Gleason score; EPE = extraprostatic extension; MRGB = magnetic resonance imaging-guided biopsy; MRI = magnetic resonance imaging; MRI–TRUS = fusion of magnetic resonance imaging and transrectal ultrasound images; NR = not reported; PCa = prostate cancer; PI-RADSv2 = Prostate Imaging Reporting and Data System version 2; PZ = peripheral zone; RP = radical prostatectomy; STRUSGB = systematic transrectal ultrasound-guided biopsy; TTB = transperineal template biopsy; TZ = transition zone.

Table 2. Study characteristics

3.3 Quality assessment

Overall, the quality of the studies was not considered high, mainly due to the patient selection domain (Fig. 2). Regarding the patient selection domain, there was generally a high risk of bias as all but four of the studies were retrospective in nature [16 17 18 20 21 22 23 25 26 27 28 30 32 33 34 35 36]. Seven studies were considered to have high concern for applicability, as all or some of the patients had a pathological diagnosis of PCa prior to MRI [16 17 22 26 27 32 35]. Regarding the index test domain, there was a high risk of bias in nine studies. In three of these nine studies, reviewers were aware that patients had biopsy-proven PCa [22 26 35]. In the other six studies, the cutoff value for determining PCa was not specified prior to interpretation [16 20 29 31 33 34]. Only one study had concern for applicability, as PI-RADSv2 scores were indirectly generated from existing clinical radiological reports based on PI-RADSv1 or an in-house scoring system [29]. Regarding the reference standard domain, eight studies had a high risk of bias. Seven were based on either only systematic biopsy or only targeted biopsy [18 19 21 28 30 32 33]; in one study, targeted biopsy was performed, but on lesions that were suspicious on ultrasonography rather than on MRI [31]. Studies in which radical prostatectomy or a systematic plus targeted biopsy (MRI guided, MRI–transrectal ultrasound fusion, or cognitive) was used as the reference standard were considered to have a low risk of bias. In 10 studies, the definition of clinically significant cancer did not abide by that described in the PI-RADSv2 guidelines, and these studies therefore showed high concern for applicability [16 21 22 24 29 30 32 34 35 36]. Regarding the flow and timing domain, two studies had a high risk of bias as patients did not receive the same reference standard [17 18].


Fig. 2
Grouped bar charts show (A) risk of bias and (B) concerns for applicability of 21 included studies using QUADAS-2. QUADAS-2 = Quality Assessment of Diagnostic Accuracy Studies-2.

3.4 Diagnostic accuracy of PI-RADSv2

The sensitivity and specificity of individual studies were 73–100% and 8–100%, respectively. The Q-test revealed that substantial heterogeneity was present (p < 0.001). The Higgins I² statistic demonstrated substantial heterogeneity in terms of the sensitivity (I² = 85.55%) and considerable heterogeneity in terms of the specificity (I² = 95.30%). The coupled forest plot of the sensitivity and specificity demonstrated the absence of a threshold effect (Fig. 3). The Spearman correlation coefficient between the sensitivity and the false-positive rate was 0.45 (95% confidence interval [CI] 0.023–0.738), also indicating the lack of a threshold effect.
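For readers less familiar with the Higgins I² statistic quoted above, it is conventionally derived from Cochran's Q as I² = max(0, (Q − df)/Q) × 100%, where df is the number of studies minus one. The following is a minimal Python sketch of that standard formula. The function name and the example Q value are illustrative assumptions; the article reports I² but not Q itself.

```python
def higgins_i2(q: float, df: int) -> float:
    """Return Higgins I^2 (in percent) from Cochran's Q and its degrees of freedom."""
    if q <= 0:
        # No observed dispersion: treat heterogeneity as zero.
        return 0.0
    # I^2 is truncated at zero when Q is smaller than its expected value (df).
    return max(0.0, (q - df) / q) * 100.0


# Hypothetical input (NOT from the article): Q = 100 across 21 studies, so df = 20.
example_i2 = higgins_i2(q=100.0, df=20)
print(f"I^2 = {example_i2:.1f}%")
```

A value of Q close to df gives an I² near zero, whereas Q far above df pushes I² toward 100%, which is the pattern reported here for both sensitivity and specificity.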


Fig. 3
Coupled forest plots of pooled sensitivity and specificity. Numbers are pooled estimates with 95% CI in parentheses. Corresponding heterogeneity statistics are provided at bottom right corners. Horizontal lines indicate 95% CIs. CI = confidence interval.

For all 21 studies combined, the pooled sensitivity was 0.89 (95% CI 0.86–0.92) with specificity of 0.73 (95% CI 0.60–0.83; Fig. 3 ). In the HSROC curve, there was a large difference between the 95% confidence region and the 95% prediction region, thus indicating heterogeneity between the studies ( Fig. 4 ). The area under the HSROC curve was 0.91 (95% CI 0.88–0.93). According to the Deeks’ funnel plot, the likelihood of publication bias was low, with a p value of 0.75 for the slope coefficient ( Fig. 5 ).
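To make the pooled estimates easier to interpret, the sketch below converts a sensitivity and specificity pair into likelihood ratios and, for an assumed pretest probability, into predictive values. The pooled values 0.89 and 0.73 are taken from the text; the 50% pretest probability, the function names, and the constant names are illustrative assumptions, and the derived figures are not results reported by the article. Given the heterogeneity described above, they should be read as rough orientation only.

```python
def likelihood_ratios(sens: float, spec: float) -> tuple[float, float]:
    """Return (LR+, LR-) for a test with the given sensitivity and specificity."""
    lr_pos = sens / (1.0 - spec)
    lr_neg = (1.0 - sens) / spec
    return lr_pos, lr_neg


def predictive_values(sens: float, spec: float, prevalence: float) -> tuple[float, float]:
    """Return (PPV, NPV) via Bayes' rule for a given pretest probability."""
    tp = sens * prevalence                    # true-positive fraction
    fn = (1.0 - sens) * prevalence            # false-negative fraction
    fp = (1.0 - spec) * (1.0 - prevalence)    # false-positive fraction
    tn = spec * (1.0 - prevalence)            # true-negative fraction
    return tp / (tp + fp), tn / (tn + fn)


POOLED_SENS = 0.89          # pooled sensitivity reported in the text
POOLED_SPEC = 0.73          # pooled specificity reported in the text
ASSUMED_PREVALENCE = 0.50   # hypothetical pretest probability, not from the article

lr_pos, lr_neg = likelihood_ratios(POOLED_SENS, POOLED_SPEC)
ppv, npv = predictive_values(POOLED_SENS, POOLED_SPEC, ASSUMED_PREVALENCE)
print(f"LR+ = {lr_pos:.2f}, LR- = {lr_neg:.2f}")
print(f"PPV = {ppv:.2f}, NPV = {npv:.2f} at prevalence {ASSUMED_PREVALENCE:.0%}")
```

Changing `ASSUMED_PREVALENCE` shows how strongly the predictive values depend on the proportion of patients with PCa, which varied from 37% to 100% across the included studies.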


Fig. 4
Hierarchical summary receiver operating characteristic curve of the diagnostic performance of PI-RADSv2 for detecting prostate cancer. HSROC = hierarchical summary receiver operating characteristic; PI-RADSv2 = Prostate Imaging Reporting and Data System version 2.


Fig. 5
Deeks’ funnel plot. A p value of 0.75 suggests that the likelihood of publication bias is low.

For the six studies that provided a head-to-head comparison between PI-RADSv1 and PI-RADSv2, PI-RADSv2 demonstrated higher pooled sensitivity of 0.95 (95% CI 0.85–0.98) compared with 0.88 (95% CI 0.80–0.93) for PI-RADSv1 ( p = 0.04). However, the pooled specificity did not show a significant difference between the two versions of PI-RADS: 0.73 (95% CI 0.47–0.89) for v2 and 0.75 (95% CI 0.36–0.94) for v1 ( p = 0.90).

3.5 Heterogeneity exploration

As considerable heterogeneity was found among the included studies, meta-regression analyses were performed (Supplementary Table 1). Among several potential variables, the proportion of patients with PCa, magnetic field strength, and reference standard were significant factors affecting the heterogeneity (p < 0.01 for all). However, among these three, only specificity according to the proportion of patients with PCa showed a statistically significant and clinically meaningful difference: 0.65 (95% CI 0.52–0.78) in studies with ≤50% of patients with PCa versus 0.86 (95% CI 0.75–0.97) in studies with >50% of patients with PCa. Otherwise, no clinically meaningful differences were seen: for magnet strength (3 vs 1.5 T), sensitivity of 0.90 (95% CI 0.86–0.94) versus 0.89 (95% CI 0.81–0.97, p = 0.03) and specificity of 0.73 (95% CI 0.59–0.86) versus 0.72 (95% CI 0.44–1.00, p = 0.81); for reference standard (radical prostatectomy vs biopsy), sensitivity of 0.89 (95% CI 0.83–0.95) versus 0.91 (95% CI 0.88–0.94, p < 0.01) and specificity of 0.65 (95% CI 0.37–0.94) versus 0.73 (95% CI 0.58–0.87, p = 0.48). Other variables, including the cutoff value, use of endorectal coil, and type of analysis (per patient vs per lesion), were not significant factors (p = 0.32–0.70).

3.6 Subgroup analysis

As four studies used both ≥3 and ≥4 as cutoff values [17 22 23 29] and five studies assessed both any cancer and clinically significant cancer as outcomes [16 21 24 29 32], multiple subgroup analyses were performed in order to assess various clinical settings (Supplementary Table 2). Regarding cutoff values, the pooled sensitivity was 0.89 (95% CI 0.84–0.92) with specificity of 0.74 (95% CI 0.58–0.85) for 17 studies using ≥4 [16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 35], whereas these were 0.95 (95% CI 0.89–0.97) and 0.47 (95% CI 0.21–0.74) in eight studies using ≥3 [17 22 23 29 32 33 34 36]. When we stratified studies according to the outcome assessed, the results were as follows: (1) cutoff of ≥4 for determining any PCa, sensitivity of 0.89 (95% CI 0.83–0.93) with specificity of 0.80 (95% CI 0.62–0.90); (2) cutoff of ≥3 for determining any PCa, sensitivity of 0.96 (95% CI 0.93–0.98) with specificity of 0.49 (95% CI 0.29–0.70); (3) for determining csPCa regardless of cutoff values, sensitivity of 0.89 (95% CI 0.84–0.92) with specificity of 0.64 (95% CI 0.46–0.78); (4) cutoff of ≥4 for determining csPCa, sensitivity of 0.90 (95% CI 0.85–0.94) with specificity of 0.62 (95% CI 0.45–0.77); and (5) cutoff of ≥3 for determining csPCa, sensitivity of 0.96 (95% CI 0.87–0.99) with specificity of 0.29 (95% CI 0.05–0.77). When the studies using a cutoff value of ≥4 were separately assessed according to the type of analysis, per-patient analysis in eight studies [17 18 20 26 27 29 31 35] yielded pooled sensitivity of 0.89 (95% CI 0.81–0.93) with specificity of 0.76 (95% CI 0.60–0.88), whereas per-lesion analysis in nine studies [16 19 21 22 23 24 25 28 30] yielded pooled sensitivity of 0.87 (95% CI 0.83–0.91) with specificity of 0.70 (95% CI 0.44–0.88).
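The trade-off between the two cutoff values can be summarized with a single-number index. The sketch below computes Youden's J (sensitivity + specificity − 1) for the pooled estimates at ≥4 and ≥3 quoted above. Youden's J is not used in the article; it is introduced here purely as an illustration, and the dictionary and function names are hypothetical.

```python
def youden_j(sens: float, spec: float) -> float:
    """Youden's J statistic: 0 means no better than chance, 1 means a perfect test."""
    return sens + spec - 1.0


# Pooled (sensitivity, specificity) pairs taken from the subgroup analysis in the text.
POOLED_BY_CUTOFF = {
    "PI-RADSv2 >=4": (0.89, 0.74),
    "PI-RADSv2 >=3": (0.95, 0.47),
}

youden_by_cutoff = {
    name: youden_j(sens, spec) for name, (sens, spec) in POOLED_BY_CUTOFF.items()
}

for name, j in youden_by_cutoff.items():
    print(f"{name}: J = {j:.2f}")
```

Under this equal weighting of sensitivity and specificity, the ≥4 cutoff scores higher. The index ignores the clinical cost of a missed cancer, so it does not by itself argue against using ≥3 where a higher detection rate is the priority.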

Based on the localization of PCa, the pooled sensitivity was 0.93 (95% CI 0.87–0.96) with specificity of 0.68 (95% CI 0.43–0.86) in seven studies analyzing PZ cancers [20 24 25 28 30 31 33]. In the six of these studies that also analyzed TZ cancers (all except that by Stanzione et al [31]), the pooled sensitivity and specificity for TZ cancers were 0.88 (95% CI 0.77–0.94) and 0.75 (95% CI 0.59–0.86), respectively.

Studies including patients without previous biopsies yielded sensitivity of 0.82 (95% CI 0.72–0.90) and specificity of 0.75 (95% CI 0.65–0.83), whereas the diagnostic performance values were 0.87 (95% CI 0.80–0.92) and 0.71 (95% CI 0.42–0.89) in studies including patients with a history of previous biopsy.

3.7 Discussion

In our meta-analysis, we assessed the diagnostic accuracy of PI-RADSv2 for detecting PCa. The pooled sensitivity and specificity of all 21 studies were 0.89 (95% CI 0.86–0.92) and 0.73 (95% CI 0.60–0.83), respectively. When comparing our data with the only two existing meta-analyses using mpMRI for detecting PCa, a trend toward higher sensitivity and lower specificity can be inferred. In the study by de Rooij et al [37], which evaluated seven studies using a combination of T2WI, DWI, and DCE-MRI, the pooled sensitivity and specificity were 0.74 (95% CI 0.66–0.81) and 0.88 (95% CI 0.82–0.92), respectively. In a more recent meta-analysis by Hamoen et al [6], which analyzed 14 studies using PI-RADSv1, the pooled sensitivity and specificity were 0.78 (95% CI 0.70–0.84) and 0.79 (95% CI 0.68–0.86), respectively. However, setting the three meta-analyses side by side provides only an indirect comparison. In order to address this issue, we separately assessed a subgroup of studies using both PI-RADSv1 and PI-RADSv2. In a head-to-head comparison between them, PI-RADSv2 demonstrated higher pooled sensitivity (0.95) compared with PI-RADSv1 (0.88, p = 0.04) without a statistically significant difference in specificity (0.73 vs 0.75, p = 0.90). This increase in sensitivity compared with its predecessor may imply that the revisions undertaken during the development of PI-RADSv2, including the introduction of dominant sequences according to zonal anatomy, limited contribution of DCE-MRI secondary to DWI and T2WI, and specific guidelines for deriving an integrated overall score, were, in fact, on the right track. In particular, we speculate that the use of dominant sequences, that is, DWI for the PZ and T2WI for the TZ, may have been crucial for the improved sensitivity without a loss in specificity, as suggested by Baur et al [10].

Considering that one of the main intentions for the generation of PI-RADS was to standardize reporting of mpMRI in order to decrease variability and bring about widespread acceptance and implementation in daily practice, it was promising to find that nearly all (20 of 21) studies used PI-RADSv2 strictly according to published guidelines [11]. Only one study formed PI-RADSv2 scores from existing clinical radiological reports that were based on PI-RADSv1 or an in-house scoring system [29]. This is an improvement when compared with prior studies conducted using PI-RADSv1, where investigators used varying methods in determining the overall score (overall five-point score or sum of the scores from each modality) [6]. Still, there is a need for further clarification regarding the cutoff value for detecting PCa. In the studies included in our meta-analysis, cutoff values were predefined in only six studies, while the majority (15/21) were exploratory in nature, testing multiple criteria. When using a cutoff value of ≥4, sensitivity (0.89) and specificity (0.74) were generally good, whereas using ≥3 yielded excellent sensitivity (0.95) and poor specificity (0.47). These results may be taken into consideration when generating the next updated PI-RADS. For instance, a cutoff of ≥4 may be adequate for general use of PI-RADS, whereas a cutoff of ≥3 could be proposed when a higher cancer detection rate is clinically required (eg, persistently high PSA level despite a previously negative biopsy).

In the current study, subgroup analyses were performed to account for differences in outcomes (any cancer vs clinically significant cancer). There was no significant difference between the two outcomes, irrespective of whether a cutoff of ≥3 or ≥4 was used. However, the definition of clinically significant cancer differed among the 13 studies that assessed it. Only three studies defined csPCa strictly according to the PI-RADSv2 guidelines (Gleason score ≥7 [including 3 + 4], volume ≥0.5 ml, or extraprostatic extension) [11]. Most others used one or two of the three criteria. Including only the former three studies may have provided more robust results; however, including all available studies was both more pragmatic and better suited to presenting a general overview of the existing literature, as this is the first meta-analysis of studies dealing with PI-RADSv2.

In this meta-analysis, we looked into the technical aspects of MRI. Meta-regression analyses revealed that the use of endorectal coil was not a statistically significant factor. Furthermore, although magnet strength showed a statistically significant difference between 3 and 1.5 T, this difference was not clinically meaningful (sensitivity of 0.90 vs 0.89, p = 0.03, respectively). Although there had been debate over these two issues in the past, both 3 and 1.5 T are now well established, and the overall benefit of using an endorectal coil is not evident [38 39]. The PI-RADSv2 guidelines currently recommend either usage, and the results of our study provide additional evidence to support this.

Regarding the methods of analysis in the studies, there was significant heterogeneity regarding reference standard and type of analysis. Radical prostatectomy was the reference standard in five studies, while the majority were based on a combination of systematic and targeted biopsies. The possibility of PCa despite negative biopsy results in the latter group should be kept in mind. In addition, the studies were split approximately evenly between those reporting outcomes in a per-patient (n = 11) and those reporting in a per-lesion (n = 10) manner. Per-lesion analysis is known to take into account the performance of localizing the disease; however, this was not shown to be a significant factor in the meta-regression analysis.

Our meta-analysis had some limitations. Nearly all studies were retrospective in study design, resulting in a high risk of bias for patient selection. It is possible that pooling data from predominantly retrospective studies may have led to increased diagnostic sensitivity [40]. However, not only was performing a meta-analysis using only three prospective studies technically unfeasible, but the derived results would also not be representative of the existing literature on PI-RADSv2. Nevertheless, we used validated methods for the systematic review and reported the data using standard reporting guidelines, including PRISMA and the guidelines of the Handbook for Diagnostic Test Accuracy Reviews published by the Cochrane Collaboration [12 41]. Another limitation is the considerable heterogeneity in our pooled analysis, which affects the general applicability of our summary estimates. To explore the heterogeneity of our data, we performed meta-regression and multiple subgroup analyses. According to the analyses, the proportion of patients with PCa, the magnetic field strength, and the reference standard were significant factors affecting the heterogeneity. In particular, the reference standard included various methods, including radical prostatectomy and a combination of systematic and targeted biopsies (ie, MRI guided, MRI–transrectal ultrasound fusion, or cognitive). Furthermore, the fact that various definitions were used for clinically significant cancer needs to be emphasized. Our meta-regression and subgroup analyses may explain some of the heterogeneity, but a portion remains unexplained. Another important limitation is the small number of included studies for head-to-head comparison between PI-RADSv1 and PI-RADSv2. However, we were able to demonstrate a statistically significant difference in sensitivity between the two versions using only the six studies published to date.

PI-RADSv2 shows good performance for the detection of PCa with pooled sensitivity of 0.89 and specificity of 0.73. PI-RADSv2 has higher pooled sensitivity compared with PI-RADSv1 without significantly different specificity.

Author contributions: Sang Youn Kim had full access to all the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis.

Study concept and design: Woo, Suh, S.Y. Kim.

Acquisition of data: Woo, Suh, S.Y. Kim.

Analysis and interpretation of data: Woo, Suh, S.Y. Kim.

Drafting of the manuscript: Woo.

Critical revision of the manuscript for important intellectual content: Suh, S.Y. Kim, Cho, S.H. Kim.

Statistical analysis: Suh.

Obtaining funding: None.

Administrative, technical, or material support: None.

Supervision: S.Y. Kim, Cho, S.H. Kim.

Other: None.

Financial disclosures: Sang Youn Kim certifies that all conflicts of interest, including specific financial interests and relationships and affiliations relevant to the subject matter or materials discussed in the manuscript (eg, employment/affiliation, grants or funding, consultancies, honoraria, stock ownership or options, expert testimony, royalties, or patents filed, received, or pending), are the following: None.

Funding/Support and role of the sponsor: None.

  • [1] American Cancer Society. Key statistics for prostate cancer. http://www.cancer.org/cancer/prostatecancer/detailedguide/prostate-cancer-key-statistics .
  • [2] C.M. Hoeks, J.O. Barentsz, T. Hambrock, et al. Prostate cancer: multiparametric MR imaging for detection, localization, and staging. Radiology. 2011;261:46-66
  • [3] O. Ruprecht, P. Weisser, B. Bodelle, H. Ackermann, T.J. Vogl. MRI of the prostate: interobserver agreement compared with histopathologic outcome after radical prostatectomy. Eur J Radiol. 2012;81:456-460
  • [4] L. Dickinson, H.U. Ahmed, C. Allen, et al. Scoring systems used for the interpretation and reporting of multiparametric MRI for prostate cancer detection, localization, and characterization: could standardization lead to improved utilization of imaging within the diagnostic pathway? J Magn Reson Imaging. 2013;37:48-58
  • [5] J.O. Barentsz, J. Richenberg, R. Clements, et al. ESUR prostate MR guidelines 2012. Eur Radiol. 2012;22:746-757
  • [6] E.H. Hamoen, M. de Rooij, J.A. Witjes, J.O. Barentsz, M.M. Rovers. Use of the Prostate Imaging Reporting and Data System (PI-RADS) for prostate cancer detection with multiparametric magnetic resonance imaging: a diagnostic meta-analysis. Eur Urol. 2015;67:1112-1121
  • [7] M. Abd-Alazeez, H.U. Ahmed, M. Arya, et al. The accuracy of multiparametric MRI in men with negative biopsy and elevated PSA level—can it rule out clinically significant prostate cancer? Urol Oncol. 2014;32:45.e17–22
  • [8] A.B. Rosenkrantz, S. Kim, R.P. Lim, et al. Prostate cancer localization using multiparametric MR imaging: comparison of Prostate Imaging Reporting and Data System (PI-RADS) and Likert scales. Radiology. 2013;269:482-492
  • [9] B.G. Hansford, Y. Peng, Y. Jiang, et al. Dynamic contrast-enhanced MR imaging curve-type analysis: is it helpful in the differentiation of prostate cancer from healthy peripheral zone? Radiology. 2015;275:448-457
  • [10] A.D. Baur, A. Maxeiner, T. Franiel, et al. Evaluation of the Prostate Imaging Reporting and Data System for the detection of prostate cancer by the results of targeted biopsy of the prostate. Invest Radiol. 2014;49:411-420
  • [11] J.C. Weinreb, J.O. Barentsz, P.L. Choyke, et al. PI-RADS Prostate Imaging-Reporting and Data System: 2015, version 2. Eur Urol. 2016;69:16-40
  • [12] A. Liberati, D.G. Altman, J. Tetzlaff, et al. The PRISMA statement for reporting systematic reviews and meta-analyses of studies that evaluate health care interventions: explanation and elaboration. J Clin Epidemiol. 2009;62:e1-e34
  • [13] P. Whiting, A.W. Rutjes, J.B. Reitsma, P.M. Bossuyt, J. Kleijnen. The development of QUADAS: a tool for the quality assessment of studies of diagnostic accuracy included in systematic reviews. BMC Med Res Methodol. 2003;3:25
  • [14] C.H. Suh, S.H. Park. Successful publication of systematic review and meta-analysis of studies evaluating diagnostic test accuracy. Korean J Radiol. 2016;17:5-6
  • [15] J.J. Deeks, P. Macaskill, L. Irwig. The performance of tests of publication bias and other sample size effects in systematic reviews of diagnostic test accuracy was assessed. J Clin Epidemiol. 2005;58:882-893
  • [16] T. Auer, M. Edlinger, J. Bektic, et al. Performance of PI-RADS version 1 versus version 2 regarding the relation with histopathological results. World J Urol. 2017;35:687-693
  • [17] M. Baldisserotto, E.J. Neto, G. Carvalhal, et al. Validation of PI-RADS v.2 for prostate cancer diagnosis with MRI at 3T using an external phased-array coil. J Magn Reson Imaging. 2016;44:1354-1359
  • [18] P. De Visschere, N. Lumen, P. Ost, K. Decaestecker, E. Pattyn, G. Villeirs. Dynamic contrast-enhanced imaging has limited added value over T2-weighted imaging and diffusion-weighted imaging when using PI-RADSv2 for diagnosis of clinically significant prostate cancer in patients with elevated PSA. Clin Radiol. 2017;72:23-32
  • [19] El-Samei HAE-KA, M.F. Amin, E.E. Hassan. Assessment of the accuracy of multi-parametric MRI with PI-RADS 2.0 scoring system in the discrimination of suspicious prostatic focal lesions. Egypt J Radiol Nucl Med. 2016;47:1075-1082
  • [20] Z.Y. Feng, L. Wang, X.D. Min, S.G. Wang, G.P. Wang, J. Cai. Prostate cancer detection with multiparametric magnetic resonance imaging: Prostate Imaging Reporting and Data System version 1 versus version 2. Chin Med J. 2016;129:2451-2459
  • [21] M. Kasel-Seibert, T. Lehmann, R. Aschenbach, et al. Assessment of PI-RADS v2 for the detection of prostate cancer. Eur J Radiol. 2016;85:726-731
  • [22] W.C. Lin, V.F. Muglia, G.E. Silva, S. Chodraui Filho, R.B. Reis, A.C. Westphalen. Multiparametric MRI of the prostate: diagnostic performance and interreader agreement of two scoring systems. Br J Radiol. 2016;89:20151056
  • [23] E. Martorana, G.M. Pirola, M. Scialpi, et al. Lesion volume predicts prostate cancer risk and aggressiveness: validation of its value alone and matched with Prostate Imaging Reporting and Data System score. BJU Int. In press. doi:10.1111/bju.13649
  • [24] F.V. Mertan, M.D. Greer, J.H. Shih, et al. Prospective evaluation of the Prostate Imaging Reporting and Data System version 2 for prostate cancer detection. J Urol. 2016;196:690-696
  • [25] B.G. Muller, J.H. Shih, S. Sankineni, et al. Prostate cancer: interobserver agreement and accuracy with the revised Prostate Imaging Reporting and Data System at multiparametric MR imaging. Radiology. 2015;277:741-750
  • [26] J.J. Park, B.K. Park. Role of PI-RADSv2 with multiparametric MRI in determining who needs active surveillance or definitive treatment according to PRIAS. J Magn Reson Imaging. 2017;45:1753-1759
  • [27] S.Y. Park, D.C. Jung, Y.T. Oh, et al. Prostate cancer: PI-RADS version 2 helps preoperatively predict clinically significant cancers. Radiology. 2016;280:108-116
  • [28] S. Polanec, T.H. Helbich, H. Bickel, et al. Head-to-head comparison of PI-RADS v2 and PI-RADS v1. Eur J Radiol. 2016;85:1125-1131
  • [29] A.R. Rastinehad, N. Waingankar, B. Turkbey, et al. Comparison of multiparametric MRI scoring systems and the impact on cancer detection in patients undergoing MR US fusion guided prostate biopsies. PLoS One. 2015;10:e0143404
  • [30] A.B. Rosenkrantz, J.S. Babb, S.S. Taneja, J.M. Ream. Proposed adjustments to PI-RADS version 2 decision rules: impact on prostate cancer detection. Radiology. In press. doi:10.1148/radiol.2016161124
  • [31] A. Stanzione, M. Imbriaco, S. Cocozza, et al. Biparametric 3T magnetic resonance imaging for prostatic cancer detection in a biopsy-naive patient population: a further improvement of PI-RADS v2? Eur J Radiol. 2016;85:2269-2274
  • [32] N. Tan, W.C. Lin, P. Khoshnoodi, et al. In-Bore 3-T MR-guided transrectal targeted prostate biopsy: Prostate Imaging Reporting and Data System version 2-based diagnostic performance for detection of prostate cancer. Radiology. In press. doi:10.1148/radiol.2016152827
  • [33] S. Tewes, N. Mokov, D. Hartung, et al. Standardized reporting of prostate MRI: comparison of the Prostate Imaging Reporting and Data System (PI-RADS) version 1 and version 2. PLoS One. 2016;11:e0162879
  • [34] S. Washino, T. Okochi, K. Saito, et al. Combination of PI-RADS score and PSA density predicts biopsy outcome in biopsy naive patients. BJU Int. 2017;119:225-233
  • [35] S. Woo, S.Y. Kim, J. Lee, S.H. Kim, J.Y. Cho. PI-RADS version 2 for prediction of pathological downgrading after radical prostatectomy: a preliminary study in patients with biopsy-proven Gleason Score 7 (3 + 4) prostate cancer. Eur Radiol. 2016;26:3580-3587
  • [36] C. Zhao, G. Gao, D. Fang, et al. The efficiency of multiparametric magnetic resonance imaging (mpMRI) using PI-RADS Version 2 in the diagnosis of clinically significant prostate cancer. Clin Imaging. 2016;40:885-888
  • [37] M. de Rooij, E.H. Hamoen, J.J. Futterer, J.O. Barentsz, M.M. Rovers. Accuracy of multiparametric MRI for prostate cancer detection: a meta-analysis. Am J Roentgenol. 2014;202:343-351
  • [38] Z.K. Shah ,S.N. Elias ,R. Abaza ,et al. Performance comparison of 1.5-T endorectal coil MRI with 3.0-T nonendorectal coil MRI in patients with prostate cancer. Acad Radiol. 2015;22 :467-474 Crossref
  • [39] Y. Mazaheri ,H.A. Vargas ,G. Nyman ,O. Akin ,H. Hricak. Image artifacts on prostate diffusion-weighted magnetic resonance imaging: trade-offs at 1.5 Tesla and 3.0 Tesla. Acad Radiol. 2013;20 :1041-1047 Crossref
  • [40] J.J. Deeks. Systematic reviews in health care: systematic reviews of evaluations of diagnostic and screening tests. BMJ. 2001;323 :157-162 Crossref
  • [41] J.J.W.S. Deeks ,C. Davenport. Guide to the contents of a Cochrane diagnostic test accuracy protocol. J.J. Deeks, P.M. Bossuyt, C. Gatsonis. Cochrane handbook for systematic reviews of diagnostic test accuracy version 1.0.0. Chapter 4. (The Cochrane Collaboration, 2013) http://srdta.cochrane.org/

Prostate cancer (PCa) is the second leading cause of cancer-related mortality in Western men [1] . With recent technological advancements and growing availability, multiparametric magnetic resonance imaging (mpMRI) is increasingly being used to guide several aspects of PCa management, including detection, staging, and treatment planning [2] . Despite abundant evidence in the literature reporting high accuracy of mpMRI for PCa diagnosis, widespread acceptance has been hampered by several factors including difficulty of interpretation, lack of standardized criteria for interpretation (ie, use of Likert scales based on the radiologist's subjective level of suspicion for PCa), and resulting substantial inter-reader variability [3 4] .

To bring standardization to the evaluation and reporting of mpMRI of the prostate, the European Society of Urogenital Radiology (ESUR) published a guideline termed Prostate Imaging Reporting and Data System (PI-RADS) in 2012 [5] . PI-RADS was generated based on expert consensus and provides a detailed scoring system for each MRI sequence (T2-weighted imaging [T2WI], diffusion-weighted imaging [DWI], dynamic contrast-enhanced MRI [DCE-MRI], and MR spectroscopy) for the presence of clinically significant PCa (csPCa). Several investigators have validated the accuracy and reproducibility of PI-RADS, and a recent meta-analysis reported pooled sensitivity and specificity of 0.78 and 0.79, respectively [6] . However, as there was no guideline for the generation of an overall score, different research groups utilized various measures for this purpose—some used a sum of the scores from each sequence (ranging from 3 to 15), whereas others used an overall score of 1–5 [7 8] . Furthermore, emerging data questioned the value of curve-type analysis of DCE-MRI [9] . In addition, investigators suggested that some sequences may be more important in determining the likelihood of PCa (ie, DWI in the peripheral zone [PZ] and T2WI in the transition zone [TZ]) rather than equal weighting for all sequences [10] .

To address these issues, the ESUR and American College of Radiology recently released the updated PI-RADS version 2 (PI-RADSv2) [11] . The main changes from PI-RADSv1 to PI-RADSv2 are the following: (1) introduction of dominant sequences according to zonal anatomy (DWI for the PZ and T2WI for the TZ), (2) limited contribution of DCE-MRI data, reduced to merely the presence or absence of early focal enhancement, and (3) generation of an overall score (1–5) integrating findings across all MRI sequences.

Since then, several studies assessing the value of PI-RADSv2 have been published. However, the diagnostic performance of this new scoring system has not been evaluated systematically. Therefore, the purpose of our study was to assess the diagnostic performance of PI-RADSv2 for the detection of PCa. In addition, we aimed to compare the diagnostic performance of PI-RADSv1 and PI-RADSv2 in studies available for head-to-head comparison.

This meta-analysis was performed and written according to the preferred reporting items for systematic reviews and meta-analyses (PRISMA) guidelines [12] .

2.1 Literature search

A computerized search of MEDLINE and EMBASE up to December 7, 2016, was performed in order to identify studies evaluating the diagnostic performance of PI-RADSv2 for the detection of PCa. The search query combined synonyms for PCa, MRI, and PI-RADS as follows: (prostate cancer OR prostatic cancer OR prostate neoplasm OR prostatic neoplasm OR prostate tumor OR prostatic tumor OR prostate carcinoma OR prostatic carcinoma OR PCa) AND (magnetic resonance imaging OR MRI OR MR) AND (prostate imaging reporting and data system OR pi-rads OR pi rads OR pirads). Bibliographies of identified articles were also screened to expand the scope of search. Our search was limited to publications in English.

2.2 Study selection

2.2.1 Inclusion criteria

Studies were included if they satisfied all the following requirements according to the PICOS criteria [12] : (1) included patients with suspected or diagnosed PCa; (2) for index test, mpMRI of the prostate including all required sequences of T2WI, DWI, and DCE-MRI was performed and assessed with a PI-RADSv2 scoring system; (3) for comparison, a reference standard based on the histopathological examination of radical prostatectomy or biopsy was used; (4) results were reported in sufficient detail for the reconstruction of 2 × 2 tables and determination of sensitivity and specificity at specified cutoff values for evaluating the diagnostic performance of PI-RADSv2; and (5) studies had to be original articles.
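Inclusion criterion (4) requires that a 2 × 2 table can be reconstructed for each study. As a minimal sketch of what that reconstruction yields, the snippet below derives sensitivity and specificity from the four cells of such a table. The class name `TwoByTwo` and the counts are hypothetical illustrations and are not taken from any included study.

```python
from dataclasses import dataclass


@dataclass
class TwoByTwo:
    """Cells of a diagnostic 2 x 2 table (index test vs reference standard)."""
    tp: int  # index test positive, reference standard positive
    fp: int  # index test positive, reference standard negative
    fn: int  # index test negative, reference standard positive
    tn: int  # index test negative, reference standard negative

    def sensitivity(self) -> float:
        # Proportion of reference-positive cases that the index test detects.
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        # Proportion of reference-negative cases that the index test clears.
        return self.tn / (self.tn + self.fp)


# Hypothetical counts, for illustration only.
example = TwoByTwo(tp=45, fp=10, fn=5, tn=40)
print(example.sensitivity(), example.specificity())
```

A study that reports only a sensitivity or an area under the curve, without enough detail to recover all four cells at a stated cutoff, could not be represented this way and would fail criterion (4).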

2.2.2 Exclusion criteria

Studies were excluded if any of the following criteria were met: (1) studies involving <10 patients; (2) review articles, guidelines, consensus statements, letters, editorials, and conference abstracts; (3) studies using only PI-RADSv1 for the evaluation of mpMRI of the prostate; (4) studies focusing on topics other than using the PI-RADSv2 system for diagnosing PCa (eg, staging and prediction of biochemical recurrence); and (5) studies with overlapping patient population.

Two reviewers (S.W. and C.H.S., with 3 yr of experience in performing systematic reviews and meta-analyses) independently evaluated the eligibility of the selected studies from the literature. Disagreements, if present between the two reviewers, were resolved by consensus via discussion with a third reviewer (S.Y.K.).

2.3 Data extraction and quality assessment

We extracted the following data regarding study design and results from the selected studies using a standardized form:

  • 1. Study characteristics: authors, year of publication, country and institution of origin, duration of patient recruitment, and study design (prospective vs retrospective and consecutive or not)

  • 2. Demographic and clinical characteristics: sample size, number of patients with PCa, patient age, prostate-specific antigen (PSA) level and Gleason score, number of previous biopsies, and PCa diagnosis prior to mpMRI

  • 3. Technical characteristics of mpMRI: scanner model and manufacturer, magnetic field strength (1.5 vs 3 T), coil type (endorectal vs pelvic phased array), and specific sequences used (T2WI, DWI, DCE, or MR spectroscopy)

  • 4. Interpretation of mpMRI: number of reviewers and experience in prostate mpMRI, independent or consensus reading, and blinding to clinicopathological information

  • 5. Reference standard: type of reference standard (radical prostatectomy, targeted biopsy, or systematic biopsy), interval between MRI and pathology, outcomes assessed (any PCa vs csPCa), definition of csPCa (studies assessing “clinically significant”, “aggressive”, or “high-grade” PCa were all considered to assess csPCa; however, only studies that used the definition as provided by the PI-RADSv2 guideline [Gleason score ≥7 [3 + 4], volume ≥0.5 ml, or extraprostatic extension] were considered not to have concern for applicability), separate analysis for the PZ and TZ, and type of analysis (per patient vs per lesion)

  • 6. Diagnostic performance of PI-RADSv2 including criteria or cutoff values (in case of multiple readers, the results of the most experienced reader were extracted for this meta-analysis)

The methodological quality of the included studies was assessed using tailored questionnaires and criteria provided by Quality Assessment of Diagnostic Accuracy Studies-2 [13] . Data extraction and quality assessment were performed independently by two reviewers (S.W. and C.H.S.). All disagreements were resolved by consensus through discussion with the third reviewer (S.Y.K.).

2.4 Data synthesis and analysis

The diagnostic performance of PI-RADSv2 for the detection of PCa was the primary outcome for this meta-analysis. In addition, a comparison between the diagnostic performance of PI-RADSv2 and that of PI-RADSv1 using studies that reported head-to-head comparison data of the two PI-RADS versions was considered a secondary outcome.

Pooled estimates of sensitivity and specificity were calculated using hierarchical logistic regression modeling including bivariate modeling and hierarchical summary receiver operating characteristic (HSROC) modeling [14] . For graphical presentation of the results, an HSROC curve with 95% confidence region and prediction region was plotted. Publication bias was evaluated using the Deeks’ funnel plot, and statistical significance was tested with the Deeks’ asymmetry test [15] .

We performed meta-regression analyses to investigate the cause of heterogeneity. The following covariates were considered for the bivariate model: (1) proportion of patients with PCa (>50% vs ≤50%), (2) magnet strength of MRI (3 vs 1.5 T), (3) use of endorectal coil, (4) cutoff value (≥4 vs ≥3), (5) reference standard (radical prostatectomy vs biopsy), and (6) type of analysis (per patient vs per lesion). In addition, multiple subgroup analyses were performed for cutoff value, outcome, and previous biopsy history to assess various clinical settings: (1) a cutoff value of ≥4 for all studies, (2) a cutoff value of ≥3 for all studies, (3) a cutoff value of ≥4 for determining any PCa, (4) a cutoff value of ≥3 for determining any PCa, (5) a cutoff value of ≥4 for determining csPCa, (6) a cutoff value of ≥3 for determining csPCa, (7) a cutoff value of ≥4 in studies using per-patient analysis, (8) a cutoff value of ≥4 in studies using per-lesion analysis, (9) studies analyzing PZ PCa, (10) studies analyzing TZ PCa, (11) patients without previous biopsies, and (12) patients with previous biopsies. The “metandi” and “midas” modules in Stata 10.0 (StataCorp LP, College Station, TX, USA) and “mada” package in R software version 3.2.1 (R Foundation for Statistical Computing, Vienna, Austria) were used for statistical analyses, with p < 0.05 signifying statistical significance.

3.1 Literature search

A systematic literature search initially identified 287 articles. After removing 46 duplicates, screening of the 241 titles and abstracts yielded 105 potentially eligible articles. Full-text reviews were performed, and 84 studies were excluded for the following reasons: not in the field of interest ( n = 80, including 68 studies that used only PI-RADSv1), insufficient data to reconstruct 2 × 2 tables ( n = 2), and shared study population with other studies ( n = 2). Ultimately, 21 original articles including a total of 3857 patients assessing the diagnostic performance of PI-RADSv2 were included in the meta-analysis [16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36] . No additional studies were identified via screening the bibliographies of these 21 studies. Among them, 15 studies including 3099 patients dealt with PI-RADSv2 alone, whereas six studies including 758 patients provided a head-to-head comparison between PI-RADSv1 and PI-RADSv2 [16 20 21 28 32 33] . The detailed study selection process is described in Fig. 1 .
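The counts in the selection process can be checked for internal consistency with a few lines of arithmetic. The snippet below uses only the numbers reported in the paragraph above; the variable names are illustrative.

```python
# Counts reported in the literature search paragraph.
identified = 287
duplicates = 46
screened = identified - duplicates            # titles and abstracts screened

full_text_reviewed = 105
excluded = {
    "not in the field of interest": 80,
    "insufficient data for 2 x 2 tables": 2,
    "shared study population": 2,
}
included = full_text_reviewed - sum(excluded.values())

# Patients in PI-RADSv2-only studies and in head-to-head studies.
patients_v2_only = 3099
patients_head_to_head = 758
total_patients = patients_v2_only + patients_head_to_head

print(screened, included, total_patients)
```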


Fig. 1
PRISMA flow diagram showing study selection process for meta-analysis. a Included original articles for qualitative and quantitative analyses are references 16–36. MRI = magnetic resonance imaging; PI-RADS = Prostate Imaging Reporting and Data System; PRISMA = Preferred Reporting Items for Systematic Reviews and Meta-analyses.

3.2 Characteristics of included studies

Patient characteristics are shown in Table 1 . The size of the study population ranged from 49 to 456 patients, with the percentage of those with PCa ranging from 37% to 100%. The patients had a median age of 62–69.6 yr, median PSA of 3.97–15 ng/ml, and a Gleason score ranging from 5 to 10. Patients had already been diagnosed with PCa prior to MRI in all or some of the study populations in seven studies [16 17 22 26 27 32 35] . Biopsy was performed before MRI in seven studies [16 21 22 23 26 27 35] , all patients were biopsy-naïve in three studies [18 25 34] , both patient types were included in three studies [28 32 33] , and data regarding previous biopsy were not reported in the remaining eight studies.

| First author (year) | Country | Institution | Duration of patient recruitment | Patients ( n ) | Patients with PCa ( n ) | Age, median (yr) | Age, range (yr) | PSA, median (ng/ml) | PSA, range (ng/ml) | Gleason score, median | Gleason score, range | No. of previous biopsies | PCa diagnosis before MRI |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Auer (2016) [16] | Austria | Medical University of Innsbruck, Medizinische Hochschule Hannover | NR | 50 | NR | 63 a | NR | 7.3 a | NR | 7 (3 + 4) | 6–10 | ≥1 | Yes (all) |
| Baldisserotto (2016) [17] | Brazil | Pontifícia Universidade Católica do Rio Grande do Sul | 2013.6–2015.6 | 54 | 33 | 65.9 a | 53–81 | 8.4 a | 3–31 | 7 (3 + 4) | 6–9 | NR | Yes (some) |
| De Visschere (2017) [18] | Belgium | Ghent University Hospital | 2011.5–2014.12 | 245 | 144 | 66 | 44–85 | 9 | 1.4–935.5 | 7 (3 + 4) | ≥6 | 0 | No |
| El-Samei (2016) [19] | Egypt | Al-Azher University, El-Minia University | 2014.5–2015.10 | 55 | 38 | 62 a | 51–79 | NR | NR | 7 | 6–10 | NR | No |
| Feng (2016) [20] | China | Tongji Hospital | 2013.6–2015.7 | 401 | 150 | 64.4 a | 34–88 | 10.7 | 0.2–1763 | 7 (4 + 3) | ≤6–≥8 | NR | NR |
| Kasel-Seibert (2016) [21] | Germany | University Hospital Jena | 2013.7–2015.3 | 82 | 31 | 65 | 48–81 | 13 | 1–111 | 7 (3 + 4) | 6–9 | 1–5 | No |
| Lin (2016) [22] | Brazil, Taiwan | Ribeirão Preto School of Medicine, China Medical University, Ribeirão Preto School of Medicine | 2011.5–2014.6 | 49 | 49 | 63 | 46–73 | 13.27 | 1.8–41.4 | | | ≥1 | Yes (all) |
| Martorana (2016) [23] | Italy | University of Modena and Reggio Emilia, Perugia University | 2014.1–2016.2 | 157 | 79 | 65 | 47–79 b | 10.7 | 1–75 | 6 | 6–8 | 1–3 | No |
| Mertan (2016) [24] | USA | National Cancer Institute | 2015.3–2015.9 | 62 | 38 | 65.5 | 50.3–76.6 | 7.1 | 0.5–863 | 7 (3 + 4) | 6–9 | NR | NR |
| Muller (2015) [25] | USA, the Netherlands | National Cancer Institute, AMC University Hospital, Edward Hébert School of Medicine | 2011.12–2014.5 | 94 | 94 | 62 | 37–79 | 8.51 | 0.7–51.1 | 7 (3 + 4) | 6–10 | 0 | No |
| Park (2016) [26] | South Korea | Samsung Medical Center | 2012.1–2014.12 | 456 | 456 | 65 | 46–81 | 3.97 | 3.8–4.2 | NR | ≥6 | ≥1 | Yes (all) |
| Park (2016) [27] | South Korea | Yonsei University College of Medicine | 2012.1–2013.3 | 425 | 425 | NR | NR | NR | 1.4–156.9 | 5–6 | 5–10 | ≥1 | Yes (all) |
| Polanec (2016) [28] | Austria | Medical University of Vienna, Confraternität Vienna | 2011.6–2015.9 | 65 | NR | 65.3 a | 62.3–87.4 | 10.8 a | 4.2–74.5 | 7 (4 + 3) | 6–9 | ≥0 | No |
| Rastinehad (2015) [29] | USA | Icahn School of Medicine at Mount Sinai, Fox Chase Cancer Center, National Institutes of Health, Hofstra North Shore LIJ School of Medicine | 2012.2–2014.11 | 312 | 202 | 65.1 | 60.3–70.3 b | 7.3 | 5.0–11.4 b | NR | NR | NR | No |
| Rosenkrantz (2016) [30] | USA | NYU Langone Medical Center | 2013.9–2015.2 | 343 | 134 | 64 | NR | 5.8 | NR | 7 (3 + 4) | 6–9 | NR | NR |
| Stanzione (2016) [31] | Italy | University “Federico II”, Ospedale S. Maria delle Grazie | NR | 82 | 34 | 65 a | 43–84 | 8.8 a | NR | 7 (3 + 4) | 6–9 | NR | No |
| Tan (2016) [32] | USA, Taiwan | UCLA, Cathay General Hospital, Hoag Hospital | 2013.3–2016.12 | 106 | 63 | 66.5 | 43–79 | 7.9 | 5.6–10.6 | 7 (3 + 4) | 6–10 | ≥0 | Yes (some) |
| Tewes (2016) [33] | Germany | Hannover Medical School, Klinikum der Region Hannover | 2012.12–2014.12 | 54 | 31 | 69.6 a | NR | 8.7 a | NR | 6 | 6–9 | ≥0 | No |
| Washino (2017) [34] | Japan | Jichi Medical University Saitama Medical Center | 2010.6–2014.4 | 288 | 159 | 69 | 64–74 b | 7.5 | 5.5–11 b | NR | NR | 0 | No |
| Woo (2016) [35] | South Korea | Seoul National University College of Medicine | 2011.1–2013.12 | 105 | 105 | 69 | 49–79 | 8.22 | 0.9–44.2 | 7 | 6–7 | ≥1 | Yes (all) |
| Zhao (2016) [36] | China | Peking University First Hospital | 2010.11–2013.12 | 372 | 185 | 68.5 a | NR | 15 a | NR | 7 | 6–10 | NR | No |

a Mean.

b Interquartile range.

MRI = magnetic resonance imaging; NR = not reported; PCa = prostate cancer; PSA = prostate-specific antigen.

Table 1. Patient characteristics

Characteristics of the studies are summarized in Table 2 . MRI was performed using 3-T scanners in 16 studies [16 17 18 20 24 25 26 27 28 29 30 31 32 33 35 36] , 1.5-T scanners in four studies [19 21 22 23] , and either 3 or 1.5 T in one study [34] . Endorectal coils were used in seven studies [19 21 22 23 24 25 29] . In all studies, the mpMRI protocol consisted of T2WI, DWI, and DCE-MRI. The reference standard was radical prostatectomy in five studies [16 22 26 27 35] , a combination of systematic and targeted biopsies in seven studies [17 20 23 25 29 34 36] , and only targeted biopsy in seven studies [19 21 24 28 30 32 33] ; the reference standard was not consistent throughout the study population in two studies [18 31] . PI-RADSv2 scoring was performed by one to five radiologists, either in consensus or independently. The level of experience of the radiologists was heterogeneous, ranging from 4 to 22 yr of experience in prostate MRI. In most studies, the readers were blinded; however, in three studies, the radiologists were aware that the patients had biopsy-proven PCa [22 26 35] , and five studies were not explicit regarding blinding [18 19 30 32 36] . In the majority of the studies, the interval between MRI and the reference standard was less than 6 mo; however, the details were not reported in 10 studies [16 17 18 24 27 28 29 30 32 33] . PCa was separately assessed according to zonal anatomy in seven studies [20 24 25 28 30 31 33] . However, in one study [31] , only PCa in the PZ could be evaluated, as no detailed data were provided in the article and the attempt to contact the authors for provision of further information was unsuccessful. Regarding the outcome assessed, eight studies evaluated any cancer [17 19 20 23 25 28 31 33] , eight evaluated clinically significant cancer [18 22 26 27 30 34 35 36] , and five evaluated both [16 21 24 29 32] . With regard to cutoff values, 13 studies used ≥4 [16 18 19 20 21 24 25 26 27 28 30 31 35] , four studies used ≥3 [32 33 34 36] , and four studies used both [17 22 23 29] . The location of PCa was separately reported by the PZ and TZ in six studies [20 24 25 28 30 33] .

| First author (year) | Design | Consecutive enrollment | Reference standard | MRI–reference standard interval | No. of readers (reading) | Reader experience (yr) | Blinding | Magnet strength (T) | Vendor | Model | Endorectal coil | PI-RADSv2 application | Cutoff values | Localization | Type of analysis | Outcome assessed | Definition of csPCa |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Auer (2016) [16] | Retrospective | Yes | RP | NR | 1 | >5 | Yes | 3 | Siemens | Skyra | No | Strict | 4 | Whole | Lesion | Any + csPCa | GS ≥7 (4 + 3) |
| Baldisserotto (2016) [17] | Retrospective | Yes | STRUSGB + targeted MRI–TRUS biopsy (cognitive) or RP | NR | 2, independent | >10/1 | Yes | 3 | GE | SignaHDxt | No | Strict | 3, 4 | Whole | Patient | Any | |
| De Visschere (2017) [18] | Retrospective | NR | STRUSGB or RP | NR | NR | NR | NR | 3 | Siemens | Trio | No | Strict | 4 | Whole | Patient | csPCa | GS ≥7, ≥0.5 cc, or EPE |
| El-Samei (2016) [19] | NR | NR | Targeted MRI–TRUS biopsy | ≤2 wk | NR | NR | NR | 1.5 | Philips | Gyroscan | Yes | Strict | 4 | Whole | Lesion | Any | |
| Feng (2016) [20] | Retrospective | Yes | STRUSGB + targeted MRI–TRUS biopsy (cognitive) | 1–35 d | 2, consensus | 5/4 | Yes | 3 | Siemens | Skyra | No | Strict | 4 | PZ, TZ | Patient | Any | |
| Kasel-Seibert (2016) [21] | Retrospective | NR | Targeted MRGB | ≤7 d | 2, independent | 10/<1 | Yes | 1.5 | Siemens | Avanto | Yes | Strict | 4 | Whole | Lesion | Any + csPCa | GS ≥7 |
| Lin (2016) [22] | Retrospective | NR | RP | ≤6 mo | 2, independent | 12/5 | Yes a | 1.5 | Philips | Achieva | Yes | Strict | 3, 4 | Whole | Lesion | csPCa | GS ≥7 and >0.5 cc |
| Martorana (2016) [23] | Retrospective | Yes | TTB + targeted MRI–TRUS biopsy | 30–78 | 2, consensus | 4/4 | Yes | 1.5 | Philips | Achieva | Yes | Strict | 3, 4 | Whole | Lesion | Any | GS ≥7, >0.5 cc, or EPE |
| Mertan (2016) [24] | Prospective | Yes | STRUSGB + targeted MRI–TRUS biopsy | NR | 1 | >8 | Yes | 3 | Philips | Achieva | Yes | Strict | 4 | PZ, TZ | Lesion | Any + csPCa | GS ≥7 |
| Muller (2015) [25] | Retrospective | Yes | STRUSGB + targeted MRI–TRUS biopsy | ≤6 wk | 5, independent | 12/7/1/1/0.5 | Yes | 3 | Philips | Achieva | Yes | Strict | 4 | PZ, TZ | Lesion | Any | GS ≥7 (4 + 3) |
| Park (2016) [26] | Retrospective | Yes | RP | 21–48 | 2, independent | 14/3 | Yes a | 3 | Philips | Achieva | No | Strict | 4 | Whole | Patient | csPCa | GS ≥7, ≥0.5 cc, or EPE |
| Park (2016) [27] | Retrospective | NR | RP | NR | 2, independent | 9/4 | Yes | 3 | GE, Philips, Siemens | DiscoveryMR750, Achieva, Trio | No | Strict | 4 | Whole | Patient | csPCa | GS ≥7, ≥0.5 cc, or EPE |
| Polanec (2016) [28] | Retrospective | Yes | Targeted MRGB | NR | 2, independent | NR/NR | Yes | 3 | Siemens | Trio | No | Strict | 4 | PZ, TZ | Lesion | Any | |
| Rastinehad (2015) [29] | Prospective | NR | STRUSGB + targeted MRI–TRUS biopsy | NR | 3, consensus | NR | Yes | 3 | Siemens | Verio | Yes | Calculated b | 3, 4 | Whole | Patient | Any + csPCa | Epstein's criteria c for systematic and GS ≥7 or >0.5 cc for targeted biopsy |
| Rosenkrantz (2016) [30] | Retrospective | NR | Targeted MRI–TRUS biopsy | NR | 2, independent | 8/3 | NR | 3 | Siemens | Biograph mMR/Prisma/Skyra/Trio | No | Strict | 4 | PZ, TZ | Lesion | csPCa | GS ≥7 |
| Stanzione (2016) [31] | Prospective | NR | STRUSGB (±targeted TRUSGB) | 20–30 d | 2, independent | 14/10 | Yes | 3 | Siemens | Trio | No | Strict | 4 | PZ | Patient | Any | |
| Tan (2016) [32] | Retrospective | Yes | Targeted MRGB | NR | 3, consensus | 16/10 | NR | 3 | Siemens | Skyra, Trio, Verio | No | Strict | 3 | Whole | Lesion | Any + csPCa | GS ≥7 |
| Tewes (2016) [33] | Retrospective | Yes | Targeted MRGB | NR | 2, independent | 5/2 | Yes | 3 | Siemens | Skyra | No | Strict | 3 d | PZ, TZ | Patient | Any | |
| Washino (2017) [34] | Retrospective | NR | TTB + targeted MRI–TRUS biopsy (cognitive) | 0.5–1.6 mo | 1 | 14 | Yes | 1.5 or 3 | Toshiba | Excelart Vantage/Vantage Titan 3T | No | Strict | 3 | Whole | Patient | csPCa | GS ≥7 or maximum core length ≥4 mm |
| Woo (2016) [35] | Retrospective | NR | RP | ≤6 mo | 2, consensus | 22/10 | Yes a | 3 | Philips, Siemens | Ingenia, Verio/Trio | No | Strict | 4 | Whole | Patient | csPCa | GS ≥7 |
| Zhao (2016) [36] | Retrospective | NR | STRUSGB + targeted MRI–TRUS biopsy | ≤3 mo | 2, independent | NR | NR | 3 | NR | NR | No | Strict | 3 | Whole | Patient | csPCa | GS ≥7 |

a Blinded but aware that patients had biopsy-proven PCa.

b PI-RADSv2 scores were generated from reports based on PI-RADSv1 and simplified qualitative system.

c Epstein's criteria = Gleason pattern ≥4, or Gleason 3 + 3 disease with core length ≥50% and/or >2 cores positive.

d For transition zone, cutoff value = 4.

csPCa = clinically significant prostate cancer; GS = Gleason score; EPE = extraprostatic extension; MRGB = magnetic resonance imaging-guided biopsy; MRI = magnetic resonance imaging; MRI–TRUS = fusion of magnetic resonance imaging and transrectal ultrasound images; NR = not reported; PCa = prostate cancer; PI-RADSv2 = Prostate Imaging Reporting and Data System version 2; PZ = peripheral zone; RP = radical prostatectomy; STRUSGB = systematic transrectal ultrasound-guided biopsy; TTB = transperineal template biopsy; TZ = transition zone.

Table 2. Study characteristics

3.3 Quality assessment

Overall, the quality of the studies was not considered high, mainly due to the patient selection domain ( Fig. 2 ). Regarding the patient selection domain, there was generally a high risk of bias as all but four of the studies were retrospective in nature [16 17 18 20 21 22 23 25 26 27 28 30 32 33 34 35 36] . Seven studies were considered to have high concern for applicability, as all or some of the patients had a pathological diagnosis of PCa prior to MRI [16 17 22 26 27 32 35] . Regarding the index test domain, there was a high risk of bias in nine studies. In three of nine studies, reviewers were aware that patients had biopsy-proven PCa [22 26 35] . In the other six studies, the cutoff value for determining PCa was not specified prior to interpretation [16 20 29 31 33 34] . Only one study had concern for applicability, as PI-RADSv2 scores were indirectly generated from existing clinical radiological reports based on PI-RADSv1 or an in-house scoring system [29] . Regarding the reference standard domain, eight studies had a high risk of bias. Seven were based on either only systematic biopsy or targeted biopsy [18 19 21 28 30 32 33] ; in one study, targeted biopsy was performed, but on lesions that were suspicious on ultrasonography, and not on MRI [31] . Those in which radical prostatectomy or a systematic plus targeted biopsy (MRI guided, MRI–transrectal ultrasound fusion, or cognitive) was used as the reference standard were considered to have a low risk of bias. In 10 studies, the definition of clinically significant cancer did not abide by those described in the PI-RADSv2 guidelines, and therefore showed high concern for applicability [16 21 22 24 29 30 32 34 35 36] . Regarding the flow and timing domain, two studies had a high risk of bias as patients did not receive the same reference standard [17 18] .


Fig. 2
Grouped bar charts show (A) risk of bias and (B) concerns for applicability of 21 included studies using QUADAS-2. QUADAS-2 = Quality Assessment of Diagnostic Accuracy Studies-2.

3.4 Diagnostic accuracy of PI-RADSv2

The sensitivity and specificity of individual studies were 73–100% and 8–100%, respectively. The Q-test revealed that substantial heterogeneity was present ( p < 0.001). The Higgins I² statistic demonstrated substantial heterogeneity in terms of the sensitivity ( I² = 85.55%) and considerable heterogeneity in terms of the specificity ( I² = 95.30%). The coupled forest plot of the sensitivity and specificity demonstrated the absence of a threshold effect ( Fig. 3 ). The Spearman correlation coefficient between the sensitivity and the false-positive rate was 0.45 (95% confidence interval [CI] 0.023–0.738), also indicating the lack of a threshold effect.
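For readers unfamiliar with the heterogeneity statistics quoted above, the following is a minimal sketch of how Cochran's Q and the Higgins I² statistic can be computed for a set of proportions such as per-study sensitivities. It is a generic univariate illustration on the logit scale with inverse-variance weights, and it is not the computation performed by the Stata modules used in this analysis. The function name, the 0.5 continuity correction, and the input counts are assumptions of this sketch.

```python
import math


def cochran_q_i2(events, totals):
    """Cochran's Q and Higgins I^2 (%) for proportions on the logit scale.

    `events` and `totals` are per-study counts, for example true positives
    and all reference-positive cases when assessing sensitivity.
    """
    thetas, weights = [], []
    for e, n in zip(events, totals):
        # Assumption of this sketch: 0.5 continuity correction on both cells.
        pos, neg = e + 0.5, (n - e) + 0.5
        thetas.append(math.log(pos / neg))         # logit of the proportion
        weights.append(1 / (1 / pos + 1 / neg))    # inverse of its variance
    pooled = sum(w * t for w, t in zip(weights, thetas)) / sum(weights)
    q = sum(w * (t - pooled) ** 2 for w, t in zip(weights, thetas))
    df = len(thetas) - 1
    # I^2 is the share of total variation attributable to heterogeneity.
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return q, i2


# Hypothetical counts, for illustration only.
q, i2 = cochran_q_i2(events=[45, 10], totals=[50, 50])
```

When every study reports the same proportion, Q is zero and I² is 0%. I² approaches 100% as the between-study spread grows relative to the within-study sampling error.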


Fig. 3
Coupled forest plots of pooled sensitivity and specificity. Numbers are pooled estimates with 95% CI in parentheses. Corresponding heterogeneity statistics are provided at bottom right corners. Horizontal lines indicate 95% CIs. CI = confidence interval.

For all 21 studies combined, the pooled sensitivity was 0.89 (95% CI 0.86–0.92) with specificity of 0.73 (95% CI 0.60–0.83; Fig. 3 ). In the HSROC curve, there was a large difference between the 95% confidence region and the 95% prediction region, thus indicating heterogeneity between the studies ( Fig. 4 ). The area under the HSROC curve was 0.91 (95% CI 0.88–0.93). According to the Deeks’ funnel plot, the likelihood of publication bias was low, with a p value of 0.75 for the slope coefficient ( Fig. 5 ).


Fig. 4
Hierarchical summary receiver operating characteristic curve of the diagnostic performance of PI-RADSv2 for detecting prostate cancer. HSROC = hierarchical summary receiver operating characteristic; PI-RADSv2 = Prostate Imaging Reporting and Data System version 2.


Fig. 5
Deeks’ funnel plot. A p value of 0.75 suggests that the likelihood of publication bias is low.

For the six studies that provided a head-to-head comparison between PI-RADSv1 and PI-RADSv2, PI-RADSv2 demonstrated higher pooled sensitivity of 0.95 (95% CI 0.85–0.98) compared with 0.88 (95% CI 0.80–0.93) for PI-RADSv1 ( p = 0.04). However, the pooled specificity did not show a significant difference between the two versions of PI-RADS: 0.73 (95% CI 0.47–0.89) for v2 and 0.75 (95% CI 0.36–0.94) for v1 ( p = 0.90).

3.5 Heterogeneity exploration

As we found considerable heterogeneity among the included studies, meta-regression analyses were performed (Supplementary Table 1 ). Among several potential variables, proportion of patients with PCa, magnetic field strength, and reference standard were significant factors affecting the heterogeneity ( p < 0.01 for all). However, among these three, only specificity according to the proportion of patients with PCa showed a statistically significant and clinically meaningful difference: 0.65 (95% CI 0.52–0.78) in studies with ≤50% of patients with PCa versus 0.86 (95% CI 0.75–0.97) in studies with >50% of patients with PCa. Otherwise, no clinically meaningful differences were seen: for magnet strength (3 vs 1.5 T), sensitivity of 0.90 (95% CI 0.86–0.94) versus 0.89 (95% CI 0.81–0.97, p = 0.03) and specificity of 0.73 (95% CI 0.59–0.86) versus 0.72 (95% CI 0.44–1.00, p = 0.81); for reference standard (radical prostatectomy vs biopsy), sensitivity of 0.89 (95% CI 0.83–0.95) versus 0.91 (95% CI 0.88–0.94, p < 0.01) and specificity of 0.65 (95% CI 0.37–0.94) versus 0.73 (95% CI 0.58–0.87, p = 0.48). Other variables, including the cutoff value, use of endorectal coil, and type of analysis (per patient vs per lesion), were not significant factors ( p = 0.32–0.70).

3.6 Subgroup analysis

As four studies used both ≥3 and ≥4 as cutoff values [17 22 23 29] and five studies determined both any cancer and clinically significant cancer as outcomes [16 21 24 29 32] , multiple subgroup analyses were performed in order to assess various clinical settings (Supplementary Table 2 ). Regarding cutoff values, the pooled sensitivity was 0.89 (95% CI 0.84–0.92) with specificity of 0.74 (95% CI 0.58–0.85) for 17 studies using ≥4 [16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 35] , whereas these were 0.95 (95% CI 0.89–0.97) and 0.47 (95% CI 0.21–0.74) in eight studies using ≥3 [17 22 23 29 32 33 34 36] . When we stratified studies according to the outcome assessed, the results were as follows: (1) cutoff of ≥4 for determining any PCa, sensitivity of 0.89 (95% CI 0.83–0.93) with specificity of 0.80 (95% CI 0.62–0.90); (2) cutoff of ≥3 for determining any PCa, sensitivity of 0.96 (95% CI 0.93–0.98) with specificity of 0.49 (95% CI 0.29–0.70); (3) for determining csPCa regardless of cutoff values, sensitivity of 0.89 (95% CI 0.84–0.92) with specificity of 0.64 (95% CI 0.46–0.78); (4) cutoff of ≥4 for determining csPCa, sensitivity of 0.90 (95% CI 0.85–0.94) with specificity of 0.62 (95% CI 0.45–0.77); and (5) cutoff of ≥3 for determining csPCa, sensitivity of 0.96 (95% CI 0.87–0.99) with specificity of 0.29 (95% CI 0.05–0.77). When the studies using a cutoff value of ≥4 were separately assessed according to the type of analysis, per-patient analysis in eight studies [17 18 20 26 27 29 31 35] yielded pooled sensitivity of 0.89 (95% CI 0.81–0.93) with specificity of 0.76 (95% CI 0.60–0.88), whereas per-lesion analysis in nine studies [16 19 21 22 23 24 25 28 30] yielded pooled sensitivity of 0.87 (95% CI 0.83–0.91) with specificity of 0.70 (95% CI 0.44–0.88).

Based on the localization of PCa, the pooled sensitivity was 0.93 (95% CI 0.87–0.96) with specificity of 0.68 (95% CI 0.43–0.86) in seven studies analyzing PZ cancers [20 24 25 28 30 31 33]. In the six of these studies that also analyzed TZ cancers (all except that by Stanzione et al [31]), the pooled sensitivity and specificity were 0.88 (95% CI 0.77–0.94) and 0.75 (95% CI 0.59–0.86), respectively.

Studies including patients without previous biopsies yielded sensitivity of 0.82 (95% CI 0.72–0.90) and specificity of 0.75 (95% CI 0.65–0.83), whereas the diagnostic performance values were 0.87 (95% CI 0.80–0.92) and 0.71 (95% CI 0.42–0.89) in studies including patients with a history of previous biopsy.

3.7 Discussion

In our meta-analysis, we assessed the diagnostic accuracy of PI-RADSv2 for detecting PCa. The pooled sensitivity and specificity of all 21 studies were 0.89 (95% CI 0.86–0.92) and 0.73 (95% CI 0.60–0.83), respectively. When comparing our data with the only two existing meta-analyses using mpMRI for detecting PCa, a trend toward higher sensitivity and lower specificity can be inferred. In the study by de Rooij et al [37], which evaluated seven studies using a combination of T2WI, DWI, and DCE-MRI, the pooled sensitivity and specificity were 0.74 (95% CI 0.66–0.81) and 0.88 (95% CI 0.82–0.92), respectively. In a more recent meta-analysis by Hamoen et al [6], which analyzed 14 studies using PI-RADSv1, the pooled sensitivity and specificity were 0.78 (95% CI 0.70–0.84) and 0.79 (95% CI 0.68–0.86), respectively. However, setting these three meta-analyses side by side provides only an indirect comparison. To address this issue, we separately assessed the subgroup of studies that used both PI-RADSv1 and PI-RADSv2. In this head-to-head comparison, PI-RADSv2 demonstrated higher pooled sensitivity (0.95) than PI-RADSv1 (0.88, p = 0.04) without a statistically significant difference in specificity (0.73 vs 0.75, p = 0.90). This increase in sensitivity compared with its predecessor may imply that the revisions undertaken during the development of PI-RADSv2, including the introduction of dominant sequences according to zonal anatomy, limited contribution of DCE-MRI secondary to DWI and T2WI, and specific guidelines for deriving an integrated overall score, were, in fact, on the right track. In particular, we speculate that the use of dominant sequences, that is, DWI for the PZ and T2WI for the TZ, may have been crucial for the improved sensitivity without a loss in specificity, as suggested by Baur et al [10].

Considering that one of the main intentions for the generation of PI-RADS was to standardize reporting of mpMRI in order to decrease variability and bring about widespread acceptance and implementation in daily practice, it was promising to find that nearly all (20 of 21) studies used PI-RADSv2 strictly according to published guidelines [11] . Only one study formed PI-RADSv2 scores from existing clinical radiological reports that were based on PI-RADSv1 or an in-house scoring system [29] . This is an improvement when compared with prior studies conducted using PI-RADSv1, where investigators used varying methods in determining the overall score (overall five-point score or sum of the scores from each modality) [6] . Still, there is a need for further clarification regarding the cutoff value for detecting PCa. In the studies included in our meta-analysis, cutoff values were predefined in only six studies, while the majority (15/21) were exploratory in nature, testing multiple criteria. When using a cutoff value of ≥4, sensitivity (0.89) and specificity (0.74) were generally good, whereas using ≥3 yielded excellent sensitivity (0.95) and poor specificity (0.47). These results may be taken into consideration when generating the next updated PI-RADS. For instance, using the former may be adequate for general use of PI-RADS, whereas the latter could be proposed to be indicated when a higher cancer detection rate is clinically required (ie, persistently high PSA level despite a previously negative biopsy).
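The trade-off between the two cutoff values can be illustrated with a small sketch. The scores and cancer labels below are hypothetical and chosen only to show the mechanism; `sens_spec` is an illustrative helper and is not part of any PI-RADS software or of the analyses reported here.

```python
def sens_spec(scores, has_cancer, cutoff):
    """Return (sensitivity, specificity) when scores >= cutoff count as positive."""
    pairs = list(zip(scores, has_cancer))
    tp = sum(1 for s, c in pairs if c and s >= cutoff)
    fn = sum(1 for s, c in pairs if c and s < cutoff)
    tn = sum(1 for s, c in pairs if not c and s < cutoff)
    fp = sum(1 for s, c in pairs if not c and s >= cutoff)
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical PI-RADSv2 overall scores (1-5); NOT data from the included studies.
# First six entries have cancer, last six do not.
scores = [5, 5, 4, 4, 3, 2, 4, 3, 3, 2, 2, 1]
has_cancer = [True] * 6 + [False] * 6

sens4, spec4 = sens_spec(scores, has_cancer, cutoff=4)
sens3, spec3 = sens_spec(scores, has_cancer, cutoff=3)
print(f"cutoff >=4: sensitivity {sens4:.2f}, specificity {spec4:.2f}")
print(f"cutoff >=3: sensitivity {sens3:.2f}, specificity {spec3:.2f}")
```

Lowering the cutoff can only keep or raise sensitivity and keep or lower specificity, which matches the direction of the pooled estimates for ≥3 versus ≥4.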

In the current study, subgroup analyses were performed to account for differences in outcomes (any cancer vs clinically significant cancer). There was no significant difference for using either outcome irrespective of whether the criteria of ≥3 or ≥4 were used. However, the definition of clinically significant cancer was different among the 13 studies. Only three studies defined csPCa strictly according to the PI-RADSv2 guidelines (Gleason score ≥7 [3 + 4], volume >0.5 ml, or extraprostatic extension) [11]. Most others used one or two of the three criteria. Including only the former three studies may have provided more robust results; however, including all available studies was both pragmatic and better suited to presenting a general overview of the existing literature, as this is the first meta-analysis of studies dealing with PI-RADSv2.

In this meta-analysis, we looked into the technical aspects of MRI. Meta-regression analyses revealed that the use of endorectal coil was not a statistically significant factor. Furthermore, although magnet strength showed a statistically significant difference between 3 and 1.5 T, this difference was not clinically meaningful (sensitivity of 0.90 vs 0.89, respectively; p = 0.03). Although there had been debate over these two issues in the past, both 3 and 1.5 T are now well established, and the overall benefit of using an endorectal coil is not evident [38 39]. The PI-RADSv2 guidelines currently recommend either usage, and the results of our study provide additional evidence to support this.

Regarding the methods of analysis in the studies, there was significant heterogeneity regarding reference standard and type of analysis. Radical prostatectomy was the reference standard in five studies, while the majority were based on a combination of systematic and targeted biopsies. The possibility of PCa despite negative biopsy results in the latter group should be kept in mind. In addition, approximately half of the studies reported outcomes in a per-patient manner (n = 11) and the other half in a per-lesion manner (n = 10). Per-lesion analysis is known to take into account the performance of localizing the disease; however, this was not shown to be a significant factor in the meta-regression analysis.

Our meta-analysis had some limitations. Nearly all studies were retrospective in study design, resulting in a high risk of bias for patient selection. It is possible that pooling data from predominantly retrospective studies may have led to increased diagnostic sensitivity [40]. In addition, a meta-analysis restricted to the three prospective studies was technically unfeasible, and its results would not have been representative of the existing literature on PI-RADSv2. Nevertheless, we used validated methods for the systematic review and reported the data using standard reporting guidelines, including PRISMA and the guidelines of the Handbook for Diagnostic Test Accuracy Reviews published by the Cochrane Collaboration [12 41]. Another limitation is considerable heterogeneity in our pooled analysis, which affects the general applicability of our summary estimates. To explore the heterogeneity of our data, we performed meta-regression and multiple subgroup analyses. According to the analyses, the proportion of patients with PCa, the magnetic field strength, and the reference standard were significant factors affecting the heterogeneity. In particular, the reference standard included various methods, including radical prostatectomy and a combination of systematic and targeted biopsies (ie, MRI guided, MRI-transrectal ultrasound fusion, or cognitive). Furthermore, the fact that various definitions were used for clinically significant cancer needs to be emphasized. Our meta-regression and subgroup analyses may explain some of the heterogeneity, but a portion remains unexplained. Another important limitation is the small number of included studies for head-to-head comparison between PI-RADSv1 and PI-RADSv2. However, we were able to demonstrate a statistically significant difference in sensitivity between the two versions using the six studies published to date.

PI-RADSv2 shows good performance for the detection of PCa with pooled sensitivity of 0.89 and specificity of 0.73. PI-RADSv2 has higher pooled sensitivity compared with PI-RADSv1 without significantly different specificity.

Author contributions: Sang Youn Kim had full access to all the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis.

Study concept and design: Woo, Suh, S.Y. Kim.

Acquisition of data: Woo, Suh, S.Y. Kim.

Analysis and interpretation of data: Woo, Suh, S.Y. Kim.

Drafting of the manuscript: Woo.

Critical revision of the manuscript for important intellectual content: Suh, S.Y. Kim, Cho, S.H. Kim.

Statistical analysis: Suh.

Obtaining funding: None.

Administrative, technical, or material support: None.

Supervision: S.Y. Kim, Cho, S.H. Kim.

Other: None.

Financial disclosures: Sang Youn Kim certifies that all conflicts of interest, including specific financial interests and relationships and affiliations relevant to the subject matter or materials discussed in the manuscript (eg, employment/affiliation, grants or funding, consultancies, honoraria, stock ownership or options, expert testimony, royalties, or patents filed, received, or pending), are the following: None.

Funding/Support and role of the sponsor: None.

  • [1] American Cancer Society. Key statistics for prostate cancer. http://www.cancer.org/cancer/prostatecancer/detailedguide/prostate-cancer-key-statistics
  • [2] C.M. Hoeks, J.O. Barentsz, T. Hambrock, et al. Prostate cancer: multiparametric MR imaging for detection, localization, and staging. Radiology. 2011;261:46-66
  • [3] O. Ruprecht, P. Weisser, B. Bodelle, H. Ackermann, T.J. Vogl. MRI of the prostate: interobserver agreement compared with histopathologic outcome after radical prostatectomy. Eur J Radiol. 2012;81:456-460
  • [4] L. Dickinson, H.U. Ahmed, C. Allen, et al. Scoring systems used for the interpretation and reporting of multiparametric MRI for prostate cancer detection, localization, and characterization: could standardization lead to improved utilization of imaging within the diagnostic pathway? J Magn Reson Imaging. 2013;37:48-58
  • [5] J.O. Barentsz, J. Richenberg, R. Clements, et al. ESUR prostate MR guidelines 2012. Eur Radiol. 2012;22:746-757
  • [6] E.H. Hamoen, M. de Rooij, J.A. Witjes, J.O. Barentsz, M.M. Rovers. Use of the Prostate Imaging Reporting and Data System (PI-RADS) for prostate cancer detection with multiparametric magnetic resonance imaging: a diagnostic meta-analysis. Eur Urol. 2015;67:1112-1121
  • [7] M. Abd-Alazeez, H.U. Ahmed, M. Arya, et al. The accuracy of multiparametric MRI in men with negative biopsy and elevated PSA level—can it rule out clinically significant prostate cancer? Urol Oncol. 2014;32:45.e17-22
  • [8] A.B. Rosenkrantz, S. Kim, R.P. Lim, et al. Prostate cancer localization using multiparametric MR imaging: comparison of Prostate Imaging Reporting and Data System (PI-RADS) and Likert scales. Radiology. 2013;269:482-492
  • [9] B.G. Hansford, Y. Peng, Y. Jiang, et al. Dynamic contrast-enhanced MR imaging curve-type analysis: is it helpful in the differentiation of prostate cancer from healthy peripheral zone? Radiology. 2015;275:448-457
  • [10] A.D. Baur, A. Maxeiner, T. Franiel, et al. Evaluation of the Prostate Imaging Reporting and Data System for the detection of prostate cancer by the results of targeted biopsy of the prostate. Invest Radiol. 2014;49:411-420
  • [11] J.C. Weinreb, J.O. Barentsz, P.L. Choyke, et al. PI-RADS Prostate Imaging-Reporting and Data System: 2015, version 2. Eur Urol. 2016;69:16-40
  • [12] A. Liberati, D.G. Altman, J. Tetzlaff, et al. The PRISMA statement for reporting systematic reviews and meta-analyses of studies that evaluate health care interventions: explanation and elaboration. J Clin Epidemiol. 2009;62:e1-e34
  • [13] P. Whiting, A.W. Rutjes, J.B. Reitsma, P.M. Bossuyt, J. Kleijnen. The development of QUADAS: a tool for the quality assessment of studies of diagnostic accuracy included in systematic reviews. BMC Med Res Methodol. 2003;3:25
  • [14] C.H. Suh, S.H. Park. Successful publication of systematic review and meta-analysis of studies evaluating diagnostic test accuracy. Korean J Radiol. 2016;17:5-6
  • [15] J.J. Deeks, P. Macaskill, L. Irwig. The performance of tests of publication bias and other sample size effects in systematic reviews of diagnostic test accuracy was assessed. J Clin Epidemiol. 2005;58:882-893
  • [16] T. Auer, M. Edlinger, J. Bektic, et al. Performance of PI-RADS version 1 versus version 2 regarding the relation with histopathological results. World J Urol. 2017;35:687-693
  • [17] M. Baldisserotto, E.J. Neto, G. Carvalhal, et al. Validation of PI-RADS v.2 for prostate cancer diagnosis with MRI at 3T using an external phased-array coil. J Magn Reson Imaging. 2016;44:1354-1359
  • [18] P. De Visschere, N. Lumen, P. Ost, K. Decaestecker, E. Pattyn, G. Villeirs. Dynamic contrast-enhanced imaging has limited added value over T2-weighted imaging and diffusion-weighted imaging when using PI-RADSv2 for diagnosis of clinically significant prostate cancer in patients with elevated PSA. Clin Radiol. 2017;72:23-32
  • [19] H.A.E.-K.A. El-Samei, M.F. Amin, E.E. Hassan. Assessment of the accuracy of multi-parametric MRI with PI-RADS 2.0 scoring system in the discrimination of suspicious prostatic focal lesions. Egypt J Radiol Nucl Med. 2016;47:1075-1082
  • [20] Z.Y. Feng, L. Wang, X.D. Min, S.G. Wang, G.P. Wang, J. Cai. Prostate cancer detection with multiparametric magnetic resonance imaging: Prostate Imaging Reporting and Data System version 1 versus version 2. Chin Med J. 2016;129:2451-2459
  • [21] M. Kasel-Seibert, T. Lehmann, R. Aschenbach, et al. Assessment of PI-RADS v2 for the detection of prostate cancer. Eur J Radiol. 2016;85:726-731
  • [22] W.C. Lin, V.F. Muglia, G.E. Silva, S. Chodraui Filho, R.B. Reis, A.C. Westphalen. Multiparametric MRI of the prostate: diagnostic performance and interreader agreement of two scoring systems. Br J Radiol. 2016;89:20151056
  • [23] E. Martorana, G.M. Pirola, M. Scialpi, et al. Lesion volume predicts prostate cancer risk and aggressiveness: validation of its value alone and matched with Prostate Imaging Reporting and Data System score. BJU Int. In press. doi:10.1111/bju.13649
  • [24] F.V. Mertan, M.D. Greer, J.H. Shih, et al. Prospective evaluation of the Prostate Imaging Reporting and Data System version 2 for prostate cancer detection. J Urol. 2016;196:690-696
  • [25] B.G. Muller, J.H. Shih, S. Sankineni, et al. Prostate cancer: interobserver agreement and accuracy with the revised Prostate Imaging Reporting and Data System at multiparametric MR imaging. Radiology. 2015;277:741-750
  • [26] J.J. Park, B.K. Park. Role of PI-RADSv2 with multiparametric MRI in determining who needs active surveillance or definitive treatment according to PRIAS. J Magn Reson Imaging. 2017;45:1753-1759
  • [27] S.Y. Park, D.C. Jung, Y.T. Oh, et al. Prostate cancer: PI-RADS version 2 helps preoperatively predict clinically significant cancers. Radiology. 2016;280:108-116
  • [28] S. Polanec, T.H. Helbich, H. Bickel, et al. Head-to-head comparison of PI-RADS v2 and PI-RADS v1. Eur J Radiol. 2016;85:1125-1131
  • [29] A.R. Rastinehad, N. Waingankar, B. Turkbey, et al. Comparison of multiparametric MRI scoring systems and the impact on cancer detection in patients undergoing MR US fusion guided prostate biopsies. PLoS One. 2015;10:e0143404
  • [30] A.B. Rosenkrantz, J.S. Babb, S.S. Taneja, J.M. Ream. Proposed adjustments to PI-RADS version 2 decision rules: impact on prostate cancer detection. Radiology. In press. doi:10.1148/radiol.2016161124
  • [31] A. Stanzione, M. Imbriaco, S. Cocozza, et al. Biparametric 3T magnetic resonance imaging for prostatic cancer detection in a biopsy-naive patient population: a further improvement of PI-RADS v2? Eur J Radiol. 2016;85:2269-2274
  • [32] N. Tan, W.C. Lin, P. Khoshnoodi, et al. In-Bore 3-T MR-guided transrectal targeted prostate biopsy: Prostate Imaging Reporting and Data System version 2-based diagnostic performance for detection of prostate cancer. Radiology. In press. doi:10.1148/radiol.2016152827
  • [33] S. Tewes, N. Mokov, D. Hartung, et al. Standardized reporting of prostate MRI: comparison of the Prostate Imaging Reporting and Data System (PI-RADS) version 1 and version 2. PLoS One. 2016;11:e0162879
  • [34] S. Washino, T. Okochi, K. Saito, et al. Combination of PI-RADS score and PSA density predicts biopsy outcome in biopsy naive patients. BJU Int. 2017;119:225-233
  • [35] S. Woo, S.Y. Kim, J. Lee, S.H. Kim, J.Y. Cho. PI-RADS version 2 for prediction of pathological downgrading after radical prostatectomy: a preliminary study in patients with biopsy-proven Gleason Score 7 (3 + 4) prostate cancer. Eur Radiol. 2016;26:3580-3587
  • [36] C. Zhao, G. Gao, D. Fang, et al. The efficiency of multiparametric magnetic resonance imaging (mpMRI) using PI-RADS Version 2 in the diagnosis of clinically significant prostate cancer. Clin Imaging. 2016;40:885-888
  • [37] M. de Rooij, E.H. Hamoen, J.J. Futterer, J.O. Barentsz, M.M. Rovers. Accuracy of multiparametric MRI for prostate cancer detection: a meta-analysis. Am J Roentgenol. 2014;202:343-351
  • [38] Z.K. Shah, S.N. Elias, R. Abaza, et al. Performance comparison of 1.5-T endorectal coil MRI with 3.0-T nonendorectal coil MRI in patients with prostate cancer. Acad Radiol. 2015;22:467-474
  • [39] Y. Mazaheri, H.A. Vargas, G. Nyman, O. Akin, H. Hricak. Image artifacts on prostate diffusion-weighted magnetic resonance imaging: trade-offs at 1.5 Tesla and 3.0 Tesla. Acad Radiol. 2013;20:1041-1047
  • [40] J.J. Deeks. Systematic reviews in health care: systematic reviews of evaluations of diagnostic and screening tests. BMJ. 2001;323:157-162
  • [41] J.J. Deeks, S. Wisniewski, C. Davenport. Guide to the contents of a Cochrane diagnostic test accuracy protocol. In: J.J. Deeks, P.M. Bossuyt, C. Gatsonis (Eds.). Cochrane handbook for systematic reviews of diagnostic test accuracy version 1.0.0. Chapter 4. (The Cochrane Collaboration, 2013) http://srdta.cochrane.org/

Prostate cancer (PCa) is the second leading cause of cancer-related mortality in Western men [1]. With recent technological advancements and growing availability, multiparametric magnetic resonance imaging (mpMRI) is increasingly being used for guiding several aspects of PCa management, including detection, staging, and treatment planning [2]. Despite abundant evidence in the literature reporting high accuracy of mpMRI for PCa diagnosis, widespread acceptance has been hampered by several factors including difficulty of interpretation, lack of standardized criteria for interpretation (ie, use of Likert scales based on the radiologist's subjective level of suspicion for PCa), and resulting substantial inter-reader variability [3 4].

To bring standardization to the evaluation and reporting of mpMRI of the prostate, the European Society of Urogenital Radiology (ESUR) published a guideline termed Prostate Imaging Reporting and Data System (PI-RADS) in 2012 [5] . PI-RADS was generated based on expert consensus and provides a detailed scoring system for each MRI sequence (T2-weighted imaging [T2WI], diffusion-weighted imaging [DWI], dynamic contrast-enhanced MRI [DCE-MRI], and MR spectroscopy) for the presence of clinically significant PCa (csPCa). Several investigators have validated the accuracy and reproducibility of PI-RADS, and a recent meta-analysis reported pooled sensitivity and specificity of 0.78 and 0.79, respectively [6] . However, as there was no guideline for the generation of an overall score, different research groups utilized various measures for this purpose—some used a sum of the scores from each sequence (ranging from 3 to 15), whereas others used an overall score of 1–5 [7 8] . Furthermore, emerging data questioned the value of curve-type analysis of DCE-MRI [9] . In addition, investigators suggested that some sequences may be more important in determining the likelihood of PCa (ie, DWI in the peripheral zone [PZ] and T2WI in the transition zone [TZ]) rather than equal weighting for all sequences [10] .

To address these issues, the ESUR and American College of Radiology recently released the updated PI-RADS version 2 (PI-RADSv2) [11] . The main changes from PI-RADSv1 to PI-RADSv2 are the following: (1) introduction of dominant sequences according to zonal anatomy (DWI for the PZ and T2WI for the TZ), (2) limited contribution of DCE-MRI data as merely presence and absence of early focal enhancement, and (3) generation of an overall score (1–5) integrating findings across all MRI sequences.

Since then, several studies assessing the value of PI-RADSv2 have been published. However, the diagnostic performance of this new scoring system has not been evaluated systematically. Therefore, the purpose of our study was to assess the diagnostic performance of PI-RADSv2 for the detection of PCa. In addition, we aimed to compare the diagnostic performance of PI-RADSv1 and PI-RADSv2 in studies available for head-to-head comparison.

This meta-analysis was performed and written according to the preferred reporting items for systematic reviews and meta-analyses (PRISMA) guidelines [12] .

2.1 Literature search

A computerized search of MEDLINE and EMBASE up to December 7, 2016, was performed in order to identify studies evaluating the diagnostic performance of PI-RADSv2 for the detection of PCa. The search query combined synonyms for PCa, MRI, and PI-RADS as follows: (prostate cancer OR prostatic cancer OR prostate neoplasm OR prostatic neoplasm OR prostate tumor OR prostatic tumor OR prostate carcinoma OR prostatic carcinoma OR PCa) AND (magnetic resonance imaging OR MRI OR MR) AND (prostate imaging reporting and data system OR pi-rads OR pi rads OR pirads). Bibliographies of identified articles were also screened to expand the scope of search. Our search was limited to publications in English.
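The Boolean structure of the search query can be sketched programmatically: synonyms are joined with OR inside each concept group, and the groups are combined with AND. The term lists below are copied from the query above; `build_query` is a hypothetical helper for illustration and is not part of the authors' workflow or of any database interface.

```python
def build_query(*groups):
    """Join synonyms with OR inside each group and combine the groups with AND."""
    return " AND ".join("(" + " OR ".join(terms) + ")" for terms in groups)

pca_terms = [
    "prostate cancer", "prostatic cancer",
    "prostate neoplasm", "prostatic neoplasm",
    "prostate tumor", "prostatic tumor",
    "prostate carcinoma", "prostatic carcinoma",
    "PCa",
]
mri_terms = ["magnetic resonance imaging", "MRI", "MR"]
pirads_terms = [
    "prostate imaging reporting and data system",
    "pi-rads", "pi rads", "pirads",
]

query = build_query(pca_terms, mri_terms, pirads_terms)
print(query)
```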

2.2 Study selection

2.2.1 Inclusion criteria

Studies were included if they satisfied all the following requirements according to the PICOS criteria [12] : (1) included patients with suspected or diagnosed PCa; (2) for index test, mpMRI of the prostate including all required sequences of T2WI, DWI, and DCE-MRI was performed and assessed with a PI-RADSv2 scoring system; (3) for comparison, a reference standard based on the histopathological examination of radical prostatectomy or biopsy was used; (4) results were reported in sufficient detail for the reconstruction of 2 × 2 tables and determination of sensitivity and specificity at specified cutoff values for evaluating the diagnostic performance of PI-RADSv2; and (5) studies had to be original articles.
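Criterion (4) requires that a 2 × 2 table can be reconstructed from each study. As a minimal sketch of what that table yields, the class below derives sensitivity and specificity from the four cell counts. The class name `TwoByTwo` and the example counts are hypothetical and serve only as an illustration.

```python
from dataclasses import dataclass

@dataclass
class TwoByTwo:
    tp: int  # index test positive, reference standard positive
    fp: int  # index test positive, reference standard negative
    fn: int  # index test negative, reference standard positive
    tn: int  # index test negative, reference standard negative

    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

# Hypothetical counts for illustration only; not taken from any included study.
table = TwoByTwo(tp=45, fp=10, fn=5, tn=40)
print(f"sensitivity = {table.sensitivity():.2f}")
print(f"specificity = {table.specificity():.2f}")
```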

2.2.2 Exclusion criteria

Studies were excluded if any of the following criteria were met: (1) studies involving <10 patients; (2) review articles, guidelines, consensus statements, letters, editorials, and conference abstracts; (3) studies using only PI-RADSv1 for the evaluation of mpMRI of the prostate; (4) studies focusing on topics other than using the PI-RADSv2 system for diagnosing PCa (ie, staging and prediction of biochemical recurrence); and (5) studies with overlapping patient population.

Two reviewers (S.W. and C.H.S., with 3 yr of experience in performing systematic reviews and meta-analyses) independently evaluated the eligibility of the selected studies from the literature. Disagreements, if present between the two reviewers, were resolved by consensus via discussion with a third reviewer (S.Y.K.).

2.3 Data extraction and quality assessment

We extracted the following data regarding study design and results from the selected studies using a standardized form:

  1. Study characteristics: authors, year of publication, country and institution of origin, duration of patient recruitment, and study design (prospective vs retrospective and consecutive or not)

  2. Demographic and clinical characteristics: sample size, number of patients with PCa, patient age, prostate-specific antigen (PSA) level and Gleason score, number of previous biopsies, and PCa diagnosis prior to mpMRI

  3. Technical characteristics of mpMRI: scanner model and manufacturer, magnetic field strength (1.5 vs 3 T), coil type (endorectal vs pelvic phased array), and specific sequences used (T2WI, DWI, DCE, or MR spectroscopy)

  4. Interpretation of mpMRI: number of reviewers and experience in prostate mpMRI, independent or consensus reading, and blinding to clinicopathological information

  5. Reference standard: type of reference standard (radical prostatectomy, targeted biopsy, or systematic biopsy), interval between MRI and pathology, outcomes assessed (any PCa vs csPCa), definition of csPCa (studies assessing “clinically significant”, “aggressive”, or “high-grade” PCa were all considered to assess csPCa; however, only studies that used the definition as provided by the PI-RADSv2 guideline [Gleason score ≥7 (3 + 4), volume >0.5 ml, or extraprostatic extension] were considered not to have concern for applicability), separate analysis for the PZ and TZ, and type of analysis (per patient vs per lesion)

  6. Diagnostic performance of PI-RADSv2 including criteria or cutoff values (in case of multiple readers, the results of the most experienced reader were extracted for this meta-analysis)

The methodological quality of the included studies was assessed using tailored questionnaires and criteria provided by Quality Assessment of Diagnostic Accuracy Studies-2 [13] . Data extraction and quality assessment were performed independently by two reviewers (S.W. and C.H.S.). All disagreements were resolved by consensus through discussion with the third reviewer (S.Y.K.).

2.4 Data synthesis and analysis

The diagnostic performance of PI-RADSv2 for the detection of PCa was the primary outcome for this meta-analysis. In addition, a comparison between the diagnostic performance of PI-RADSv2 and that of PI-RADSv1 using studies that reported head-to-head comparison data of the two PI-RADS versions was considered a secondary outcome.

Pooled estimates of sensitivity and specificity were calculated using hierarchical logistic regression modeling including bivariate modeling and hierarchical summary receiver operating characteristic (HSROC) modeling [14] . For graphical presentation of the results, an HSROC curve with 95% confidence region and prediction region was plotted. Publication bias was evaluated using the Deeks’ funnel plot, and statistical significance was tested with the Deeks’ asymmetry test [15] .
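To give a rough sense of how study-level proportions are combined on the logit scale, the sketch below pools sensitivities with a univariate DerSimonian-Laird random-effects model. This is a deliberate simplification and is not the bivariate or HSROC model used in this meta-analysis, which additionally models specificity and the correlation between the two measures. The function name `pool_logit_dl`, the continuity correction, and the input counts are all assumptions made for illustration.

```python
import math

def logit(p):
    return math.log(p / (1 - p))

def inv_logit(x):
    return 1 / (1 + math.exp(-x))

def pool_logit_dl(events, totals):
    """Pool proportions (e.g. TP out of diseased) with DerSimonian-Laird weights.

    Returns (pooled estimate, lower 95% limit, upper 95% limit).
    """
    y, v = [], []
    for e, n in zip(events, totals):
        e_c, n_c = e + 0.5, n + 1.0        # continuity correction (assumed choice)
        y.append(logit(e_c / n_c))
        v.append(1 / e_c + 1 / (n_c - e_c))  # approximate variance of the logit
    w = [1 / vi for vi in v]
    y_fixed = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    q = sum(wi * (yi - y_fixed) ** 2 for wi, yi in zip(w, y))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    k = len(y)
    tau2 = max(0.0, (q - (k - 1)) / c) if k > 1 and c > 0 else 0.0
    w_re = [1 / (vi + tau2) for vi in v]
    mu = sum(wi * yi for wi, yi in zip(w_re, y)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))
    return inv_logit(mu), inv_logit(mu - 1.96 * se), inv_logit(mu + 1.96 * se)

# Hypothetical true-positive counts and numbers of diseased patients.
pooled, lower, upper = pool_logit_dl([45, 80, 30], [50, 95, 32])
print(f"pooled sensitivity {pooled:.2f} (95% CI {lower:.2f}-{upper:.2f})")
```

Working on the logit scale keeps the pooled estimate and its confidence limits inside the interval from 0 to 1, which is one reason hierarchical logistic models are preferred for diagnostic accuracy data.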

We performed meta-regression analyses to investigate the cause of heterogeneity. The following covariates were considered for the bivariate model: (1) proportion of patients with PCa (>50% vs ≤50%), (2) magnet strength of MRI (3 vs 1.5 T), (3) use of endorectal coil, (4) cutoff value (≥4 vs ≥3), (5) reference standard (radical prostatectomy vs biopsy), and (6) type of analysis (per patient vs per lesion). In addition, multiple subgroup analyses were performed for cutoff value, outcome, and previous biopsy history to assess various clinical settings: (1) a cutoff value of ≥4 for all studies, (2) a cutoff value of ≥3 for all studies, (3) a cutoff value of ≥4 for determining any PCa, (4) a cutoff value of ≥3 for determining any PCa, (5) a cutoff value of ≥4 for determining csPCa, (6) a cutoff value of ≥3 for determining csPCa, (7) a cutoff value of ≥4 in studies using per-patient analysis, (8) a cutoff value of ≥4 in studies using per-lesion analysis, (9) studies analyzing PZ PCa, (10) studies analyzing TZ PCa, (11) patients without previous biopsies, and (12) patients with previous biopsies. The “metandi” and “midas” modules in Stata 10.0 (StataCorp LP, College Station, TX, USA) and “mada” package in R software version 3.2.1 (R Foundation for Statistical Computing, Vienna, Austria) were used for statistical analyses, with p < 0.05 signifying statistical significance.

3.1 Literature search

A systematic literature search initially identified 287 articles. After removing 46 duplicates, screening of the 241 titles and abstracts yielded 105 potentially eligible articles. Full-text reviews were performed, and 84 studies were excluded for the following reasons: not in the field of interest (n = 80, including 68 studies that used only PI-RADSv1), insufficient data to reconstruct 2 × 2 tables (n = 2), and shared study population with other studies (n = 2). Ultimately, 21 original articles assessing the diagnostic performance of PI-RADSv2, including a total of 3857 patients, were included in the meta-analysis [16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36]. No additional studies were identified via screening the bibliographies of these 21 studies. Among them, 15 studies including 3099 patients dealt with PI-RADSv2 alone, whereas six studies including 758 patients provided a head-to-head comparison between PI-RADSv1 and PI-RADSv2 [16 20 21 28 32 33]. The detailed study selection process is described in Fig. 1.
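The counts in the selection process can be checked with simple arithmetic. The sketch below uses only the numbers reported in the paragraph above; the variable names are illustrative.

```python
identified = 287
duplicates = 46
screened = identified - duplicates          # titles and abstracts screened

full_text_reviewed = 105
excluded_full_text = {
    "not in the field of interest": 80,
    "insufficient data to reconstruct 2 x 2 tables": 2,
    "shared study population": 2,
}
included = full_text_reviewed - sum(excluded_full_text.values())

# Split of included studies and patients by study type.
studies = {"PI-RADSv2 alone": 15, "head-to-head v1 vs v2": 6}
patients = {"PI-RADSv2 alone": 3099, "head-to-head v1 vs v2": 758}

print(screened, included, sum(studies.values()), sum(patients.values()))
```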


Fig. 1
PRISMA flow diagram showing study selection process for meta-analysis. a Included original articles for qualitative and quantitative analyses are references 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 . MRI = magnetic resonance imaging; PI-RADS = Prostate Imaging Reporting and Data System; PRISMA = Preferred Reporting Items for Systematic Reviews and Meta-analyses.

3.2 Characteristics of included studies

Patient characteristics are shown in Table 1 . The size of the study population ranged from 49 to 456 patients, with the percentage of those with PCa ranging from 37% to 100%. The patients had a median age of 62–69.6 yr, median PSA of 3.97–15 ng/ml, and a Gleason score ranging from 5 to 10. Patients had already been diagnosed with PCa prior to MRI in all or some of the study populations in seven studies [16 17 22 26 27 32 35] . Biopsy was performed before MRI in seven studies [16 21 22 23 26 27 35] , all patients were biopsy-naïve in three studies [18 25 34] , both patient types were included in three studies [28 32 33] , and data regarding previous biopsy were not reported in the remaining eight studies.

First author (year) Origin Duration of patient recruitment Patients ( n ) Patients with PCa ( n ) Age (yr) PSA (ng/ml) Gleason score No. of previous biopsies PCa diagnosis before MRI
Country Institution Median Range Median Range Median Range
Auer (2016) [16] Austria Medical University of Innsbruck, Medizinische Hochschule Hannover NR 50 NR 63 a NR 7.3 a NR 7 (3 + 4) 6–10 ≥1 Yes (all)
Baldisserotto (2016) [17] Brazil Pontifícia Universidade Católica do Rio Grande do Sul 2013.6–2015.6 54 33 65.9 a 53–81 8.4 a 3–31 7 (3 + 4) 6–9 NR Yes (some)
De Visschere (2017) [18] Belgium Ghent University Hospital 2011.5–2014.12 245 144 66 44–85 9 1.4–935.5 7 (3 + 4) ≥6 0 No
El-Samei (2016) [19] Egypt Al-Azher University, El-Minia University 2014.5–2015.10 55 38 62 a 51–79 NR NR 7 6–10 NR No
Feng (2016) [20] China Tongji Hospital 2013.6–2015.7 401 150 64.4 a 34–88 10.7 0.2–1763 7 (4 + 3) ≤6–≥8 NR NR
Kasel-Seibert (2016) [21] Germany University Hospital Jena 2013.7–2015.3 82 31 65 48–81 13 1–111 7 (3 + 4) 6–9 1–5 No
Lin (2016) [22] Brazil, Taiwan Ribeirão Preto School of Medicine, China Medical University, Ribeirão Preto School of Medicine 2011.5–2014.6 49 49 63 46–73 13.27 1.8–41.4 ≥1 Yes (all)
Martorana (2016) [23] Italy University of Modena and Reggio Emilia, Perugia University 2014.1–2016.2 157 79 65 47–79 b 10.7 1–75 6 6–8 1–3 No
Mertan (2016) [24] USA National Cancer Institute 2015.3–2015.9 62 38 65.5 50.3–76.6 7.1 0.5–863 7 (3 + 4) 6–9 NR NR
Muller (2015) [25] USA, the Netherlands National Cancer Institute, AMC University Hospital, Edward Hébert School of Medicine 2011.12–2014.5 94 94 62 37–79 8.51 0.7–51.1 7 (3 + 4) 6–10 0 No
Park (2016) [26] South Korea Samsung Medical Center 2012.1–2014.12 456 456 65 46–81 3.97 3.8–4.2 NR ≥6 ≥1 Yes (all)
Park (2016) [27] South Korea Yonsei University College of Medicine 2012.1–2013.3 425 425 NR NR NR 1.4–156.9 5–6 5–10 ≥1 Yes (all)
Polanec (2016) [28] Austria Medical University of Vienna, Confraternität Vienna 2011.6–2015.9 65 NR 65.3 a 62.3–87.4 10.8 a 4.2–74.5 7 (4 + 3) 6–9 ≥0 No
Rastinehad (2015) [29] USA Icahn School of Medicine at Mount Sinai, Fox Chase Cancer Center, National Institutes of Health, Hofstra North Shore LIJ School of Medicine 2012.2–2014.11 312 202 65.1 60.3–70.3 b 7.3 5.0–11.4 b NR NR NR No
Rosenkrantz (2016) [30] USA NYU Langone Medical Center 2013.9–2015.2 343 134 64 NR 5.8 NR 7 (3 + 4) 6–9 NR NR
Stanzione (2016) [31] Italy University “Federico II”, Ospedale S. Maria delle Grazie NR 82 34 65 a 43–84 8.8 a NR 7 (3 + 4) 6–9 NR No
Tan (2016) [32] USA, Taiwan UCLA, Cathay General Hospital, Hoag Hospital 2013.3–2016.12 106 63 66.5 43–79 7.9 5.6–10.6 7 (3 + 4) 6–10 ≥0 Yes (some)
Tewes (2016) [33] Germany Hannover Medical School, Klinikum der Region Hannover 2012.12–2014.12 54 31 69.6 a NR 8.7 a NR 6 6–9 ≥0 No
Washino (2017) [34] Japan Jichi Medical University Saitama Medical Center 2010.6–2014.4 288 159 69 64–74 b 7.5 5.5–11 b NR NR 0 No
Woo (2016) [35] South Korea Seoul National University College of Medicine 2011.1–2013.12 105 105 69 49–79 8.22 0.9–44.2 7 6–7 ≥1 Yes (all)
Zhao (2016) [36] China Peking University First Hospital 2010.11–2013.12 372 185 68.5 a NR 15 a NR 7 6–10 NR No

a Mean.

b Interquartile range.

MRI = magnetic resonance imaging; NR = not reported; PCa = prostate cancer; PSA = prostate-specific antigen.

Table 1 Patient characteristics

Characteristics of the studies are summarized in Table 2. MRI was performed using 3-T scanners in 16 studies [16 17 18 20 24 25 26 27 28 29 30 31 32 33 35 36], 1.5-T scanners in four studies [19 21 22 23], and either 3 or 1.5 T in one study [34]. Endorectal coils were used in seven studies [19 21 22 23 24 25 29]. In all studies, the mpMRI protocol consisted of T2WI, DWI, and DCE-MRI. The reference standard was radical prostatectomy in five studies [16 22 26 27 35], a combination of systematic and targeted biopsies in seven studies [17 20 23 25 29 34 36], and only targeted biopsy in seven studies [19 21 24 28 30 32 33]; the reference standard was not consistent throughout the study population in two studies [18 31]. PI-RADSv2 scoring was performed by one to five radiologists, either in consensus or independently. The level of experience of the radiologists was heterogeneous, ranging from 4 to 22 yr of experience in prostate imaging. In most studies, the readers were blinded; however, in three studies, the radiologists were aware that the patients had biopsy-proven PCa [22 26 35], and five studies were not explicit regarding blinding [18 19 30 32 36]. In the majority of the studies, the interval between MRI and the reference standard was less than 6 mo; however, the details were not reported in 10 studies [16 17 18 24 27 28 29 30 32 33]. PCa was separately assessed according to zonal anatomy in seven studies [20 24 25 28 30 31 33]. However, in one study [31], only PCa in the PZ could be evaluated, as no detailed data were provided in the article and the attempt to contact the authors for provision of further information was unsuccessful. Regarding the outcome assessed, eight studies evaluated any cancer [17 19 20 23 25 28 31 33], eight evaluated clinically significant cancer [18 22 26 27 30 34 35 36], and five evaluated both [16 21 24 29 32].
With regard to cutoff values, 13 studies used ≥4 [16 18 19 20 21 24 25 26 27 28 30 31 35], four studies used ≥3 [32 33 34 36], and four studies used both [17 22 23 29]. The location of PCa was separately reported by the PZ and TZ in six studies [20 24 25 28 30 33].

Study Reader MRI
First author (year) Design Consecutive enrollment Reference standard MRI–reference standard interval No. of readers Experience (yr) Blinding Magnet strength (T) Vendor Model Endorectal coil PI-RADSv2 application Cutoff values Localization Type of analysis Outcome assessed Definition of csPCa
Auer (2016) [16] Retrospective Yes RP NR 1 >5 Yes 3 Siemens Skyra No Strict 4 Whole Lesion Any + csPCa GS ≥7 (4 + 3)
Baldisserotto (2016) [17] Retrospective Yes STRUSGB + targeted MRI–TRUS biopsy (cognitive) or RP NR 2 (independent) >10/1 Yes 3 GE Signa HDxt No Strict 3, 4 Whole Patient Any
De Visschere (2017) [18] Retrospective NR STRUSGB or RP NR NR NR NR 3 Siemens Trio No Strict 4 Whole Patient csPCa GS ≥7, ≥0.5 cc, or EPE
El-Samei (2016) [19] NR NR Targeted MRI–TRUS biopsy ≤2 wk NR NR NR 1.5 Philips Gyroscan Yes Strict 4 Whole Lesion Any
Feng (2016) [20] Retrospective Yes STRUSGB + targeted MRI–TRUS biopsy (cognitive) 1–35 d 2 (consensus) 5/4 Yes 3 Siemens Skyra No Strict 4 PZ, TZ Patient Any
Kasel-Seibert (2016) [21] Retrospective NR Targeted MRGB ≤7 d 2 (independent) 10/<1 Yes 1.5 Siemens Avanto Yes Strict 4 Whole Lesion Any + csPCa GS ≥7
Lin (2016) [22] Retrospective NR RP ≤6 mo 2 (independent) 12/5 Yes a 1.5 Philips Achieva Yes Strict 3, 4 Whole Lesion csPCa GS ≥7 and >0.5 cc
Martorana (2016) [23] Retrospective Yes TTB + targeted MRI–TRUS biopsy 30–78 2 (consensus) 4/4 Yes 1.5 Philips Achieva Yes Strict 3, 4 Whole Lesion Any GS ≥7, >0.5 cc, or EPE
Mertan (2016) [24] Prospective Yes STRUSGB + targeted MRI–TRUS biopsy NR 1 >8 Yes 3 Philips Achieva Yes Strict 4 PZ, TZ Lesion Any + csPCa GS ≥7
Muller (2015) [25] Retrospective Yes STRUSGB + targeted MRI–TRUS biopsy ≤6 wk 5 (independent) 12/7/1/1/0.5 Yes 3 Philips Achieva Yes Strict 4 PZ, TZ Lesion Any GS ≥7 (4 + 3)
Park (2016) [26] Retrospective Yes RP 21–48 2 (independent) 14/3 Yes a 3 Philips Achieva No Strict 4 Whole Patient csPCa GS ≥7, ≥0.5 cc, or EPE
Park (2016) [27] Retrospective NR RP NR 2 (independent) 9/4 Yes 3 GE, Philips, Siemens Discovery MR750, Achieva, Trio No Strict 4 Whole Patient csPCa GS ≥7, ≥0.5 cc, or EPE
Polanec (2016) [28] Retrospective Yes Targeted MRGB NR 2 (independent) NR/NR Yes 3 Siemens Trio No Strict 4 PZ, TZ Lesion Any
Rastinehad (2015) [29] Prospective NR STRUSGB + targeted MRI–TRUS biopsy NR 3 (consensus) NR Yes 3 Siemens Verio Yes Calculated b 3, 4 Whole Patient Any + csPCa Epstein's criteria c for systematic and GS ≥7 or >0.5 cc for targeted biopsy
Rosenkrantz (2016) [30] Retrospective NR Targeted MRI–TRUS biopsy NR 2 (independent) 8/3 NR 3 Siemens Biograph mMR/Prisma/Skyra/Trio No Strict 4 PZ, TZ Lesion csPCa GS ≥7
Stanzione (2016) [31] Prospective NR STRUSGB (±targeted TRUSGB) 20–30 d 2 (independent) 14/10 Yes 3 Siemens Trio No Strict 4 PZ Patient Any
Tan (2016) [32] Retrospective Yes Targeted MRGB NR 3 (consensus) 16/10 NR 3 Siemens Skyra, Trio, Verio No Strict 3 Whole Lesion Any + csPCa GS ≥7
Tewes (2016) [33] Retrospective Yes Targeted MRGB NR 2 (independent) 5/2 Yes 3 Siemens Skyra No Strict 3 d PZ, TZ Patient Any
Washino (2017) [34] Retrospective NR TTB + targeted MRI–TRUS biopsy (cognitive) 0.5–1.6 mo 1 14 Yes 1.5 or 3 Toshiba Excelart Vantage/Vantage Titan 3T No Strict 3 Whole Patient csPCa GS ≥7 or maximum core length ≥4 mm
Woo (2016) [35] Retrospective NR RP ≤6 mo 2 (consensus) 22/10 Yes a 3 Philips, Siemens Ingenia, Verio/Trio No Strict 4 Whole Patient csPCa GS ≥7
Zhao (2016) [36] Retrospective NR STRUSGB + targeted MRI–TRUS biopsy ≤3 mo 2 (independent) NR NR 3 NR NR No Strict 3 Whole Patient csPCa GS ≥7

a Blinded but aware that patients had biopsy-proven PCa.

b PI-RADSv2 scores were generated from reports based on PI-RADSv1 and simplified qualitative system.

c Epstein's criteria = Gleason pattern ≥4, or Gleason 3 + 3 disease with core length ≥50% and/or >2 cores positive.

d For transition zone, cutoff value = 4.

csPCa = clinically significant prostate cancer; GS = Gleason score; EPE = extraprostatic extension; MRGB = magnetic resonance imaging-guided biopsy; MRI = magnetic resonance imaging; MRI–TRUS = fusion of magnetic resonance imaging and transrectal ultrasound images; NR = not reported; PCa = prostate cancer; PI-RADSv2 = Prostate Imaging Reporting and Data System version 2; PZ = peripheral zone; RP = radical prostatectomy; STRUSGB = systematic transrectal ultrasound-guided biopsy; TTB = transperineal template biopsy; TZ = transition zone.

Table 2 Study characteristics

3.3 Quality assessment

Overall, the quality of the studies was not considered high, mainly due to the patient selection domain (Fig. 2). Regarding the patient selection domain, there was generally a high risk of bias as all but four of the studies were retrospective in nature [16 17 18 20 21 22 23 25 26 27 28 30 32 33 34 35 36]. Seven studies were considered to have high concern for applicability, as all or some of the patients had a pathological diagnosis of PCa prior to MRI [16 17 22 26 27 32 35]. Regarding the index test domain, there was a high risk of bias in nine studies. In three of these nine studies, reviewers were aware that patients had biopsy-proven PCa [22 26 35]. In the other six studies, the cutoff value for determining PCa was not specified prior to interpretation [16 20 29 31 33 34]. Only one study had concern for applicability, as PI-RADSv2 scores were indirectly generated from existing clinical radiological reports based on PI-RADSv1 or an in-house scoring system [29]. Regarding the reference standard domain, eight studies had a high risk of bias. Seven were based on either only systematic biopsy or only targeted biopsy [18 19 21 28 30 32 33]; in one study, targeted biopsy was performed, but on lesions that were suspicious on ultrasonography, and not on MRI [31]. Studies in which radical prostatectomy or a systematic plus targeted biopsy (MRI guided, MRI–transrectal ultrasound fusion, or cognitive) was used as the reference standard were considered to have a low risk of bias. In 10 studies, the definition of clinically significant cancer did not abide by that described in the PI-RADSv2 guidelines, and these studies therefore showed high concern for applicability [16 21 22 24 29 30 32 34 35 36]. Regarding the flow and timing domain, two studies had a high risk of bias as patients did not receive the same reference standard [17 18].

Fig. 2
Grouped bar charts show (A) risk of bias and (B) concerns for applicability of 21 included studies using QUADAS-2. QUADAS-2 = Quality Assessment of Diagnostic Accuracy Studies-2.

3.4 Diagnostic accuracy of PI-RADSv2

The sensitivity and specificity of individual studies were 73–100% and 8–100%, respectively. The Q-test revealed that substantial heterogeneity was present (p < 0.001). The Higgins I² statistic demonstrated substantial heterogeneity in terms of the sensitivity (I² = 85.55%) and considerable heterogeneity in terms of the specificity (I² = 95.30%). The coupled forest plot of the sensitivity and specificity demonstrated the absence of a threshold effect (Fig. 3). The Spearman correlation coefficient between the sensitivity and the false-positive rate was 0.45 (95% confidence interval [CI] 0.023–0.738), also indicating the lack of a threshold effect.
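As an illustration of how the heterogeneity statistics above are defined, the following minimal sketch computes Cochran's Q and Higgins I² for logit-transformed sensitivities with inverse-variance weights. The study counts, the function names, and the 0.5 continuity correction are assumptions made for this example; they are not the data or the software used in this meta-analysis, which relied on hierarchical models.

```python
import math

def logit_sensitivity(tp, fn):
    """Logit of sensitivity and its approximate variance for one study.

    A 0.5 continuity correction (an assumption of this sketch) keeps the
    logit finite when a study has no false negatives.
    """
    tp_c, fn_c = tp + 0.5, fn + 0.5
    return math.log(tp_c / fn_c), 1.0 / tp_c + 1.0 / fn_c

def cochran_q_and_i2(effects, variances):
    """Return Cochran's Q and Higgins I^2 (in percent)."""
    weights = [1.0 / v for v in variances]
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    q = sum(w * (e - pooled) ** 2 for w, e in zip(weights, effects))
    df = len(effects) - 1
    # I^2 = (Q - df) / Q, truncated at zero
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return q, i2

# Hypothetical (true positive, false negative) counts, NOT from the included studies
studies = [(45, 5), (30, 10), (90, 4), (60, 20), (25, 1)]
effects, variances = zip(*(logit_sensitivity(tp, fn) for tp, fn in studies))
q, i2 = cochran_q_and_i2(effects, variances)
print(f"Q = {q:.2f}, I^2 = {i2:.1f}%")
```

The same calculation applies to specificity by passing true-negative and false-positive counts instead.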

Fig. 3
Coupled forest plots of pooled sensitivity and specificity. Numbers are pooled estimates with 95% CI in parentheses. Corresponding heterogeneity statistics are provided at bottom right corners. Horizontal lines indicate 95% CIs. CI = confidence interval.

For all 21 studies combined, the pooled sensitivity was 0.89 (95% CI 0.86–0.92) with specificity of 0.73 (95% CI 0.60–0.83; Fig. 3). In the HSROC curve, there was a large difference between the 95% confidence region and the 95% prediction region, indicating heterogeneity between the studies (Fig. 4). The area under the HSROC curve was 0.91 (95% CI 0.88–0.93). According to Deeks’ funnel plot, the likelihood of publication bias was low, with a p value of 0.75 for the slope coefficient (Fig. 5).
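For readers who prefer likelihood ratios, the pooled estimates can be converted with the standard definitions. The sketch below is illustrative only: the function name is hypothetical, and the derived values are simple arithmetic on the pooled point estimates rather than results reported in this meta-analysis. The diagnostic odds ratio is included because its logarithm is the quantity plotted in a Deeks’ funnel plot.

```python
def likelihood_ratios(sensitivity, specificity):
    """Positive LR, negative LR, and diagnostic odds ratio from one sens/spec pair."""
    lr_pos = sensitivity / (1.0 - specificity)
    lr_neg = (1.0 - sensitivity) / specificity
    return lr_pos, lr_neg, lr_pos / lr_neg

# Pooled point estimates for all 21 studies, as reported in the text
lr_pos, lr_neg, dor = likelihood_ratios(0.89, 0.73)
print(f"LR+ = {lr_pos:.2f}, LR- = {lr_neg:.2f}, DOR = {dor:.1f}")
```

Because these values ignore the uncertainty and between-study heterogeneity of the pooled estimates, they should be read as rough orientation only.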

Fig. 4
Hierarchical summary receiver operating characteristic curve of the diagnostic performance of PI-RADSv2 for detecting prostate cancer. HSROC = hierarchical summary receiver operating characteristic; PI-RADSv2 = Prostate Imaging Reporting and Data System version 2.

Fig. 5
Deeks’ funnel plot. A p value of 0.75 suggests that the likelihood of publication bias is low.

For the six studies that provided a head-to-head comparison between PI-RADSv1 and PI-RADSv2, PI-RADSv2 demonstrated higher pooled sensitivity of 0.95 (95% CI 0.85–0.98) compared with 0.88 (95% CI 0.80–0.93) for PI-RADSv1 ( p = 0.04). However, the pooled specificity did not show a significant difference between the two versions of PI-RADS: 0.73 (95% CI 0.47–0.89) for v2 and 0.75 (95% CI 0.36–0.94) for v1 ( p = 0.90).

3.5 Heterogeneity exploration

As we found considerable heterogeneity among the included studies, meta-regression analyses were performed (Supplementary Table 1). Among several potential variables, proportion of patients with PCa, magnetic field strength, and reference standard were significant factors affecting the heterogeneity (p < 0.01 for all). However, among these three, only specificity according to the proportion of patients with PCa showed a statistically significant and clinically meaningful difference: 0.65 (95% CI 0.52–0.78) in studies with ≤50% of patients with PCa versus 0.86 (95% CI 0.75–0.97) in studies with >50% of patients with PCa. Otherwise, no clinically meaningful differences were seen: for magnet strength (3 vs 1.5 T), sensitivity of 0.90 (95% CI 0.86–0.94) versus 0.89 (95% CI 0.81–0.97, p = 0.03) and specificity of 0.73 (95% CI 0.59–0.86) versus 0.72 (95% CI 0.44–1.00, p = 0.81); for reference standard (radical prostatectomy vs biopsy), sensitivity of 0.89 (95% CI 0.83–0.95) versus 0.91 (95% CI 0.88–0.94, p < 0.01) and specificity of 0.65 (95% CI 0.37–0.94) versus 0.73 (95% CI 0.58–0.87, p = 0.48). Other variables, including the cutoff value, use of endorectal coil, and type of analysis (per patient vs per lesion), were not significant factors (p = 0.32–0.70).

3.6 Subgroup analysis

As four studies used both ≥3 and ≥4 as cutoff values [17 22 23 29] and five determined both any cancer and clinically significant cancer as outcomes [16 21 24 29 32], multiple subgroup analyses were performed in order to assess various clinical settings (Supplementary Table 2). Regarding cutoff values, the pooled sensitivity was 0.89 (95% CI 0.84–0.92) with specificity of 0.74 (95% CI 0.58–0.85) for 17 studies using ≥4 [16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 35], whereas these were 0.95 (95% CI 0.89–0.97) and 0.47 (95% CI 0.21–0.74) in eight studies using ≥3 [17 22 23 29 32 33 34 36]. When we stratified studies according to the outcome assessed, the results were as follows: (1) cutoff of ≥4 for determining any PCa, sensitivity of 0.89 (95% CI 0.83–0.93) with specificity of 0.80 (95% CI 0.62–0.90); (2) cutoff of ≥3 for determining any PCa, sensitivity of 0.96 (95% CI 0.93–0.98) with specificity of 0.49 (95% CI 0.29–0.70); (3) for determining csPCa regardless of cutoff values, sensitivity of 0.89 (95% CI 0.84–0.92) with specificity of 0.64 (95% CI 0.46–0.78); (4) cutoff of ≥4 for determining csPCa, sensitivity of 0.90 (95% CI 0.85–0.94) with specificity of 0.62 (95% CI 0.45–0.77); and (5) cutoff of ≥3 for determining csPCa, sensitivity of 0.96 (95% CI 0.87–0.99) with specificity of 0.29 (95% CI 0.05–0.77). When the studies using a cutoff value of ≥4 were separately assessed according to the type of analysis, per-patient analysis in eight studies [17 18 20 26 27 29 31 35] yielded pooled sensitivity of 0.89 (95% CI 0.81–0.93) with specificity of 0.76 (95% CI 0.60–0.88), whereas per-lesion analysis in nine studies [16 19 21 22 23 24 25 28 30] yielded pooled sensitivity of 0.87 (95% CI 0.83–0.91) with specificity of 0.70 (95% CI 0.44–0.88).

Based on the localization of PCa, the pooled sensitivity was 0.93 (95% CI 0.87–0.96) with specificity of 0.68 (95% CI 0.43–0.86) in seven studies analyzing PZ cancers [20 24 25 28 30 31 33]. In the same studies, excluding that by Stanzione et al [31], the pooled sensitivity and specificity for TZ cancers were 0.88 (95% CI 0.77–0.94) and 0.75 (95% CI 0.59–0.86), respectively.

Studies including patients without previous biopsies yielded sensitivity of 0.82 (95% CI 0.72–0.90) and specificity of 0.75 (95% CI 0.65–0.83), whereas the diagnostic performance values were 0.87 (95% CI 0.80–0.92) and 0.71 (95% CI 0.42–0.89) in studies including patients with a history of previous biopsy.

3.7 Discussion

In our meta-analysis, we assessed the diagnostic accuracy of PI-RADSv2 for detecting PCa. The pooled sensitivity and specificity of all 21 studies were 0.89 (95% CI 0.86–0.92) and 0.73 (95% CI 0.60–0.83), respectively. When comparing our data with the only two existing meta-analyses using mpMRI for detecting PCa, a trend toward higher sensitivity and lower specificity can be inferred. In the study by de Rooij et al [37], which evaluated seven studies using a combination of T2WI, DWI, and DCE-MRI, the pooled sensitivity and specificity were 0.74 (95% CI 0.66–0.81) and 0.88 (95% CI 0.82–0.92), respectively. In a more recent meta-analysis by Hamoen et al [6], which analyzed 14 studies using PI-RADSv1, the pooled sensitivity and specificity were 0.78 (95% CI 0.70–0.84) and 0.79 (95% CI 0.68–0.86), respectively. However, setting the three meta-analyses side by side provides only an indirect comparison. To address this issue, we separately assessed a subgroup of studies using both PI-RADSv1 and PI-RADSv2. In a head-to-head comparison between them, PI-RADSv2 demonstrated higher pooled sensitivity (0.95) compared with PI-RADSv1 (0.88, p = 0.04) without a statistically significant difference in specificity (0.73 vs 0.75, p = 0.90). This increase in sensitivity compared with its predecessor may imply that the revisions undertaken during the development of PI-RADSv2, including the introduction of dominant sequences according to zonal anatomy, limited contribution of DCE-MRI secondary to DWI and T2WI, and specific guidelines for deriving an integrated overall score, were, in fact, on the right track. In particular, we speculate that the use of dominant sequences, that is, DWI for the PZ and T2WI for the TZ, may have been crucial for the improved sensitivity without a loss in specificity, as suggested by Baur et al [10].

Considering that one of the main intentions for the generation of PI-RADS was to standardize reporting of mpMRI in order to decrease variability and bring about widespread acceptance and implementation in daily practice, it was promising to find that nearly all (20 of 21) studies used PI-RADSv2 strictly according to published guidelines [11]. Only one study formed PI-RADSv2 scores from existing clinical radiological reports that were based on PI-RADSv1 or an in-house scoring system [29]. This is an improvement when compared with prior studies conducted using PI-RADSv1, where investigators used varying methods in determining the overall score (overall five-point score or sum of the scores from each modality) [6]. Still, there is a need for further clarification regarding the cutoff value for detecting PCa. In the studies included in our meta-analysis, cutoff values were predefined in only six studies, while the majority (15/21) were exploratory in nature, testing multiple criteria. When using a cutoff value of ≥4, sensitivity (0.89) and specificity (0.74) were generally good, whereas using ≥3 yielded excellent sensitivity (0.95) and poor specificity (0.47). These results may be taken into consideration when generating the next updated PI-RADS. For instance, the former may be adequate for general use of PI-RADS, whereas the latter could be indicated when a higher cancer detection rate is clinically required (eg, persistently high PSA level despite a previously negative biopsy).
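The practical consequence of this trade-off depends on disease prevalence. As a worked example, the sketch below converts the pooled estimates for the two cutoffs into positive and negative predictive values using Bayes' theorem. The 40% prevalence and the function name are assumptions chosen purely for illustration; predictive values were not pooled in this meta-analysis.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Positive and negative predictive values via Bayes' theorem."""
    tp = sensitivity * prevalence
    fp = (1.0 - specificity) * (1.0 - prevalence)
    tn = specificity * (1.0 - prevalence)
    fn = (1.0 - sensitivity) * prevalence
    return tp / (tp + fp), tn / (tn + fn)

ASSUMED_PREVALENCE = 0.40  # hypothetical value, not taken from the included studies

# Pooled (sensitivity, specificity) per cutoff, as reported in the text
pooled_by_cutoff = {">=4": (0.89, 0.74), ">=3": (0.95, 0.47)}

for cutoff, (sens, spec) in pooled_by_cutoff.items():
    ppv, npv = predictive_values(sens, spec, ASSUMED_PREVALENCE)
    print(f"cutoff {cutoff}: PPV = {ppv:.2f}, NPV = {npv:.2f}")
```

Under this assumed prevalence, lowering the cutoff from ≥4 to ≥3 raises the negative predictive value only slightly (about 0.91 to 0.93) while the positive predictive value falls from about 0.70 to 0.54, which is consistent with reserving the lower cutoff for settings in which missing a cancer is the main concern.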

In the current study, subgroup analyses were performed to account for differences in outcomes (any cancer vs clinically significant cancer). There was no significant difference between the two outcomes, irrespective of whether a cutoff of ≥3 or ≥4 was used. However, the definition of clinically significant cancer differed among the 13 studies that assessed it. Only three studies defined csPCa strictly according to the PI-RADSv2 guidelines (Gleason score ≥7 [including 3 + 4], volume ≥0.5 ml, or extraprostatic extension) [11]. Most others used one or two of the three criteria. Including only the former three studies might have provided more robust results; however, including all available studies was pragmatic and presents a general overview of the existing literature, as this is the first meta-analysis of studies dealing with PI-RADSv2.

In this meta-analysis, we also examined the technical aspects of MRI. Meta-regression analyses revealed that the use of an endorectal coil was not a statistically significant factor. Furthermore, although magnet strength showed a statistically significant difference between 3 and 1.5 T, this difference was not clinically meaningful (sensitivity of 0.90 vs 0.89, respectively; p = 0.03). Although there had been debate over these two issues in the past, both 3 and 1.5 T are now well established, and the overall benefit of using an endorectal coil is not evident [38 39]. The PI-RADSv2 guidelines currently recommend either usage, and the results of our study provide additional evidence to support this.

Regarding the methods of analysis in the studies, there was significant heterogeneity regarding reference standard and type of analysis. Radical prostatectomy was the reference standard in five studies, while the majority were based on a combination of systematic and targeted biopsies. The possibility of PCa despite negative biopsy results in the latter group should be kept in mind. In addition, approximately half of the studies reported outcomes in a per-patient manner (n = 11) and the other half in a per-lesion manner (n = 10). Per-lesion analysis is known to take into account the performance of localizing the disease; however, this was not shown to be a significant factor in the meta-regression analysis.

Our meta-analysis had some limitations. Nearly all studies were retrospective in study design, resulting in a high risk of bias for patient selection. It is possible that pooling data from predominantly retrospective studies may have led to increased diagnostic sensitivity [40]. However, performing a meta-analysis using only the three prospective studies was technically unfeasible, and the derived results would not have been representative of the existing literature on PI-RADSv2. In addition, we used validated methods for the systematic review and reported the data using standard reporting guidelines, including PRISMA and the guidelines of the Handbook for Diagnostic Test Accuracy Reviews published by the Cochrane Collaboration [12 41]. Another limitation is the considerable heterogeneity in our pooled analysis, which affects the general applicability of our summary estimates. To explore the heterogeneity of our data, we performed meta-regression and multiple subgroup analyses. According to these analyses, the proportion of patients with PCa, the magnetic field strength, and the reference standard were significant factors affecting the heterogeneity. In particular, the reference standard comprised various methods, including radical prostatectomy and a combination of systematic and targeted biopsies (ie, MRI guided, MRI–transrectal ultrasound fusion, or cognitive). Furthermore, the fact that various definitions were used for clinically significant cancer needs to be emphasized. Our meta-regression and subgroup analyses may explain some of the heterogeneity, but a portion remains unexplained. Another important limitation is the small number of studies included in the head-to-head comparison between PI-RADSv1 and PI-RADSv2. However, we were able to demonstrate a statistically significant difference in sensitivity between the two versions using only six studies, the only ones published to date.

PI-RADSv2 shows good performance for the detection of PCa with pooled sensitivity of 0.89 and specificity of 0.73. PI-RADSv2 has higher pooled sensitivity compared with PI-RADSv1 without significantly different specificity.

Author contributions: Sang Youn Kim had full access to all the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis.

Study concept and design: Woo, Suh, S.Y. Kim.

Acquisition of data: Woo, Suh, S.Y. Kim.

Analysis and interpretation of data: Woo, Suh, S.Y. Kim.

Drafting of the manuscript: Woo.

Critical revision of the manuscript for important intellectual content: Suh, S.Y. Kim, Cho, S.H. Kim.

Statistical analysis: Suh.

Obtaining funding: None.

Administrative, technical, or material support: None.

Supervision: S.Y. Kim, Cho, S.H. Kim.

Other: None.

Financial disclosures: Sang Youn Kim certifies that all conflicts of interest, including specific financial interests and relationships and affiliations relevant to the subject matter or materials discussed in the manuscript (eg, employment/affiliation, grants or funding, consultancies, honoraria, stock ownership or options, expert testimony, royalties, or patents filed, received, or pending), are the following: None.

Funding/Support and role of the sponsor: None.

  • [1] American Cancer Society. Key statistics for prostate cancer. http://www.cancer.org/cancer/prostatecancer/detailedguide/prostate-cancer-key-statistics
  • [2] C.M. Hoeks, J.O. Barentsz, T. Hambrock, et al. Prostate cancer: multiparametric MR imaging for detection, localization, and staging. Radiology. 2011;261:46-66
  • [3] O. Ruprecht, P. Weisser, B. Bodelle, H. Ackermann, T.J. Vogl. MRI of the prostate: interobserver agreement compared with histopathologic outcome after radical prostatectomy. Eur J Radiol. 2012;81:456-460
  • [4] L. Dickinson, H.U. Ahmed, C. Allen, et al. Scoring systems used for the interpretation and reporting of multiparametric MRI for prostate cancer detection, localization, and characterization: could standardization lead to improved utilization of imaging within the diagnostic pathway? J Magn Reson Imaging. 2013;37:48-58
  • [5] J.O. Barentsz, J. Richenberg, R. Clements, et al. ESUR prostate MR guidelines 2012. Eur Radiol. 2012;22:746-757
  • [6] E.H. Hamoen, M. de Rooij, J.A. Witjes, J.O. Barentsz, M.M. Rovers. Use of the Prostate Imaging Reporting and Data System (PI-RADS) for prostate cancer detection with multiparametric magnetic resonance imaging: a diagnostic meta-analysis. Eur Urol. 2015;67:1112-1121
  • [7] M. Abd-Alazeez, H.U. Ahmed, M. Arya, et al. The accuracy of multiparametric MRI in men with negative biopsy and elevated PSA level—can it rule out clinically significant prostate cancer? Urol Oncol. 2014;32:45.e17–22
  • [8] A.B. Rosenkrantz, S. Kim, R.P. Lim, et al. Prostate cancer localization using multiparametric MR imaging: comparison of Prostate Imaging Reporting and Data System (PI-RADS) and Likert scales. Radiology. 2013;269:482-492
  • [9] B.G. Hansford, Y. Peng, Y. Jiang, et al. Dynamic contrast-enhanced MR imaging curve-type analysis: is it helpful in the differentiation of prostate cancer from healthy peripheral zone? Radiology. 2015;275:448-457
  • [10] A.D. Baur, A. Maxeiner, T. Franiel, et al. Evaluation of the Prostate Imaging Reporting and Data System for the detection of prostate cancer by the results of targeted biopsy of the prostate. Invest Radiol. 2014;49:411-420
  • [11] J.C. Weinreb, J.O. Barentsz, P.L. Choyke, et al. PI-RADS Prostate Imaging-Reporting and Data System: 2015, version 2. Eur Urol. 2016;69:16-40
  • [12] A. Liberati, D.G. Altman, J. Tetzlaff, et al. The PRISMA statement for reporting systematic reviews and meta-analyses of studies that evaluate health care interventions: explanation and elaboration. J Clin Epidemiol. 2009;62:e1-e34
  • [13] P. Whiting, A.W. Rutjes, J.B. Reitsma, P.M. Bossuyt, J. Kleijnen. The development of QUADAS: a tool for the quality assessment of studies of diagnostic accuracy included in systematic reviews. BMC Med Res Methodol. 2003;3:25
  • [14] C.H. Suh, S.H. Park. Successful publication of systematic review and meta-analysis of studies evaluating diagnostic test accuracy. Korean J Radiol. 2016;17:5-6
  • [15] J.J. Deeks, P. Macaskill, L. Irwig. The performance of tests of publication bias and other sample size effects in systematic reviews of diagnostic test accuracy was assessed. J Clin Epidemiol. 2005;58:882-893
  • [16] T. Auer, M. Edlinger, J. Bektic, et al. Performance of PI-RADS version 1 versus version 2 regarding the relation with histopathological results. World J Urol. 2017;35:687-693
  • [17] M. Baldisserotto, E.J. Neto, G. Carvalhal, et al. Validation of PI-RADS v.2 for prostate cancer diagnosis with MRI at 3T using an external phased-array coil. J Magn Reson Imaging. 2016;44:1354-1359
  • [18] P. De Visschere, N. Lumen, P. Ost, K. Decaestecker, E. Pattyn, G. Villeirs. Dynamic contrast-enhanced imaging has limited added value over T2-weighted imaging and diffusion-weighted imaging when using PI-RADSv2 for diagnosis of clinically significant prostate cancer in patients with elevated PSA. Clin Radiol. 2017;72:23-32
  • [19] H.A.E.-K.A. El-Samei, M.F. Amin, E.E. Hassan. Assessment of the accuracy of multi-parametric MRI with PI-RADS 2.0 scoring system in the discrimination of suspicious prostatic focal lesions. Egypt J Radiol Nucl Med. 2016;47:1075-1082
  • [20] Z.Y. Feng, L. Wang, X.D. Min, S.G. Wang, G.P. Wang, J. Cai. Prostate cancer detection with multiparametric magnetic resonance imaging: Prostate Imaging Reporting and Data System version 1 versus version 2. Chin Med J. 2016;129:2451-2459
  • [21] M. Kasel-Seibert, T. Lehmann, R. Aschenbach, et al. Assessment of PI-RADS v2 for the detection of prostate cancer. Eur J Radiol. 2016;85:726-731
  • [22] W.C. Lin, V.F. Muglia, G.E. Silva, S. Chodraui Filho, R.B. Reis, A.C. Westphalen. Multiparametric MRI of the prostate: diagnostic performance and interreader agreement of two scoring systems. Br J Radiol. 2016;89:20151056
  • [23] E. Martorana, G.M. Pirola, M. Scialpi, et al. Lesion volume predicts prostate cancer risk and aggressiveness: validation of its value alone and matched with Prostate Imaging Reporting and Data System score. BJU Int. In press. doi:10.1111/bju.13649
  • [24] F.V. Mertan, M.D. Greer, J.H. Shih, et al. Prospective evaluation of the Prostate Imaging Reporting and Data System version 2 for prostate cancer detection. J Urol. 2016;196:690-696
  • [25] B.G. Muller, J.H. Shih, S. Sankineni, et al. Prostate cancer: interobserver agreement and accuracy with the revised Prostate Imaging Reporting and Data System at multiparametric MR imaging. Radiology. 2015;277:741-750
  • [26] J.J. Park, B.K. Park. Role of PI-RADSv2 with multiparametric MRI in determining who needs active surveillance or definitive treatment according to PRIAS. J Magn Reson Imaging. 2017;45:1753-1759
  • [27] S.Y. Park, D.C. Jung, Y.T. Oh, et al. Prostate cancer: PI-RADS version 2 helps preoperatively predict clinically significant cancers. Radiology. 2016;280:108-116
  • [28] S. Polanec, T.H. Helbich, H. Bickel, et al. Head-to-head comparison of PI-RADS v2 and PI-RADS v1. Eur J Radiol. 2016;85:1125-1131
  • [29] A.R. Rastinehad, N. Waingankar, B. Turkbey, et al. Comparison of multiparametric MRI scoring systems and the impact on cancer detection in patients undergoing MR US fusion guided prostate biopsies. PLoS One. 2015;10:e0143404
  • [30] A.B. Rosenkrantz, J.S. Babb, S.S. Taneja, J.M. Ream. Proposed adjustments to PI-RADS version 2 decision rules: impact on prostate cancer detection. Radiology. In press. doi:10.1148/radiol.2016161124
  • [31] A. Stanzione, M. Imbriaco, S. Cocozza, et al. Biparametric 3T magnetic resonance imaging for prostatic cancer detection in a biopsy-naive patient population: a further improvement of PI-RADS v2? Eur J Radiol. 2016;85:2269-2274
  • [32] N. Tan, W.C. Lin, P. Khoshnoodi, et al. In-Bore 3-T MR-guided transrectal targeted prostate biopsy: Prostate Imaging Reporting and Data System version 2-based diagnostic performance for detection of prostate cancer. Radiology. In press. doi:10.1148/radiol.2016152827
  • [33] S. Tewes, N. Mokov, D. Hartung, et al. Standardized reporting of prostate MRI: comparison of the Prostate Imaging Reporting and Data System (PI-RADS) version 1 and version 2. PLoS One. 2016;11:e0162879
  • [34] S. Washino, T. Okochi, K. Saito, et al. Combination of PI-RADS score and PSA density predicts biopsy outcome in biopsy naive patients. BJU Int. 2017;119:225-233
  • [35] S. Woo, S.Y. Kim, J. Lee, S.H. Kim, J.Y. Cho. PI-RADS version 2 for prediction of pathological downgrading after radical prostatectomy: a preliminary study in patients with biopsy-proven Gleason Score 7 (3 + 4) prostate cancer. Eur Radiol. 2016;26:3580-3587
  • [36] C. Zhao, G. Gao, D. Fang, et al. The efficiency of multiparametric magnetic resonance imaging (mpMRI) using PI-RADS Version 2 in the diagnosis of clinically significant prostate cancer. Clin Imaging. 2016;40:885-888
  • [37] M. de Rooij, E.H. Hamoen, J.J. Futterer, J.O. Barentsz, M.M. Rovers. Accuracy of multiparametric MRI for prostate cancer detection: a meta-analysis. Am J Roentgenol. 2014;202:343-351
  • [38] Z.K. Shah ,S.N. Elias ,R. Abaza ,et al. Performance comparison of 1.5-T endorectal coil MRI with 3.0-T nonendorectal coil MRI in patients with prostate cancer. Acad Radiol. 2015;22 :467-474 Crossref
  • [39] Y. Mazaheri ,H.A. Vargas ,G. Nyman ,O. Akin ,H. Hricak. Image artifacts on prostate diffusion-weighted magnetic resonance imaging: trade-offs at 1.5 Tesla and 3.0 Tesla. Acad Radiol. 2013;20 :1041-1047 Crossref
  • [40] J.J. Deeks. Systematic reviews in health care: systematic reviews of evaluations of diagnostic and screening tests. BMJ. 2001;323 :157-162 Crossref
  • [41] J.J.W.S. Deeks ,C. Davenport. Guide to the contents of a Cochrane diagnostic test accuracy protocol. J.J. Deeks, P.M. Bossuyt, C. Gatsonis. Cochrane handbook for systematic reviews of diagnostic test accuracy version 1.0.0. Chapter 4. (The Cochrane Collaboration, 2013) http://srdta.cochrane.org/

Prostate cancer (PCa) is the second leading cause of cancer-related mortality in Western men [1]. With recent technological advancements and growing availability, multiparametric magnetic resonance imaging (mpMRI) is increasingly being used to guide several aspects of PCa management, including detection, staging, and treatment planning [2]. Despite abundant evidence in the literature reporting high accuracy of mpMRI for PCa diagnosis, widespread acceptance has been hampered by several factors, including difficulty of interpretation, lack of standardized criteria for interpretation (eg, use of Likert scales based on the radiologist's subjective level of suspicion for PCa), and the resulting substantial inter-reader variability [3 4].

To bring standardization to the evaluation and reporting of mpMRI of the prostate, the European Society of Urogenital Radiology (ESUR) published a guideline termed Prostate Imaging Reporting and Data System (PI-RADS) in 2012 [5] . PI-RADS was generated based on expert consensus and provides a detailed scoring system for each MRI sequence (T2-weighted imaging [T2WI], diffusion-weighted imaging [DWI], dynamic contrast-enhanced MRI [DCE-MRI], and MR spectroscopy) for the presence of clinically significant PCa (csPCa). Several investigators have validated the accuracy and reproducibility of PI-RADS, and a recent meta-analysis reported pooled sensitivity and specificity of 0.78 and 0.79, respectively [6] . However, as there was no guideline for the generation of an overall score, different research groups utilized various measures for this purpose—some used a sum of the scores from each sequence (ranging from 3 to 15), whereas others used an overall score of 1–5 [7 8] . Furthermore, emerging data questioned the value of curve-type analysis of DCE-MRI [9] . In addition, investigators suggested that some sequences may be more important in determining the likelihood of PCa (ie, DWI in the peripheral zone [PZ] and T2WI in the transition zone [TZ]) rather than equal weighting for all sequences [10] .

To address these issues, the ESUR and American College of Radiology recently released the updated PI-RADS version 2 (PI-RADSv2) [11]. The main changes from PI-RADSv1 to PI-RADSv2 are the following: (1) introduction of dominant sequences according to zonal anatomy (DWI for the PZ and T2WI for the TZ), (2) limited contribution of DCE-MRI data, reduced to merely the presence or absence of early focal enhancement, and (3) generation of an overall score (1–5) integrating findings across all MRI sequences.

Since then, several studies assessing the value of PI-RADSv2 have been published. However, the diagnostic performance of this new scoring system has not been evaluated systematically. Therefore, the purpose of our study was to assess the diagnostic performance of PI-RADSv2 for the detection of PCa. In addition, we aimed to compare the diagnostic performance of PI-RADSv1 and PI-RADSv2 in studies available for head-to-head comparison.

This meta-analysis was performed and written according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines [12].

2.1 Literature search

A computerized search of MEDLINE and EMBASE up to December 7, 2016, was performed in order to identify studies evaluating the diagnostic performance of PI-RADSv2 for the detection of PCa. The search query combined synonyms for PCa, MRI, and PI-RADS as follows: (prostate cancer OR prostatic cancer OR prostate neoplasm OR prostatic neoplasm OR prostate tumor OR prostatic tumor OR prostate carcinoma OR prostatic carcinoma OR PCa) AND (magnetic resonance imaging OR MRI OR MR) AND (prostate imaging reporting and data system OR pi-rads OR pi rads OR pirads). Bibliographies of identified articles were also screened to expand the scope of search. Our search was limited to publications in English.
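As an illustration only, the Boolean structure of the search query quoted above can be sketched in Python. The synonym lists are copied from the text; the helper names `or_group` and `build_query` are hypothetical, and the sketch makes no assumption about the field tags or syntax that the MEDLINE and EMBASE interfaces actually required.

```python
# Synonym groups copied from the search strategy described in the text.
PCA_TERMS = [
    "prostate cancer", "prostatic cancer", "prostate neoplasm",
    "prostatic neoplasm", "prostate tumor", "prostatic tumor",
    "prostate carcinoma", "prostatic carcinoma", "PCa",
]
MRI_TERMS = ["magnetic resonance imaging", "MRI", "MR"]
PIRADS_TERMS = [
    "prostate imaging reporting and data system",
    "pi-rads", "pi rads", "pirads",
]


def or_group(terms):
    """Join the synonyms of one concept with OR and wrap them in parentheses."""
    return "(" + " OR ".join(terms) + ")"


def build_query(*groups):
    """Combine one parenthesized OR-group per concept with AND."""
    return " AND ".join(or_group(group) for group in groups)


query = build_query(PCA_TERMS, MRI_TERMS, PIRADS_TERMS)
print(query)
```

Keeping each concept as its own list makes it easy to see which synonyms were searched and to extend one concept without touching the others.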

2.2 Study selection

2.2.1 Inclusion criteria

Studies were included if they satisfied all the following requirements according to the PICOS criteria [12] : (1) included patients with suspected or diagnosed PCa; (2) for index test, mpMRI of the prostate including all required sequences of T2WI, DWI, and DCE-MRI was performed and assessed with a PI-RADSv2 scoring system; (3) for comparison, a reference standard based on the histopathological examination of radical prostatectomy or biopsy was used; (4) results were reported in sufficient detail for the reconstruction of 2 × 2 tables and determination of sensitivity and specificity at specified cutoff values for evaluating the diagnostic performance of PI-RADSv2; and (5) studies had to be original articles.
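Criterion (4) relies on reconstructing a 2 × 2 table for each study. A minimal sketch of how sensitivity and specificity follow from such a table is given below; the class name `TwoByTwo` and the counts are hypothetical and are not taken from any included study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Counts of one study at a fixed PI-RADSv2 cutoff (index test vs reference standard)."""
    tp: int  # index test positive, cancer on pathology
    fp: int  # index test positive, no cancer on pathology
    fn: int  # index test negative, cancer on pathology
    tn: int  # index test negative, no cancer on pathology

    @property
    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)


# Hypothetical counts, for illustration only.
example = TwoByTwo(tp=45, fp=12, fn=5, tn=38)
print(f"sensitivity={example.sensitivity:.2f}, specificity={example.specificity:.2f}")
```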

2.2.2 Exclusion criteria

Studies were excluded if any of the following criteria were met: (1) studies involving <10 patients; (2) review articles, guidelines, consensus statements, letters, editorials, and conference abstracts; (3) studies using only PI-RADSv1 for the evaluation of mpMRI of the prostate; (4) studies focusing on topics other than using the PI-RADSv2 system for diagnosing PCa (eg, staging and prediction of biochemical recurrence); and (5) studies with overlapping patient population.

Two reviewers (S.W. and C.H.S., with 3 yr of experience in performing systematic reviews and meta-analyses) independently evaluated the eligibility of the selected studies from the literature. Disagreements, if present between the two reviewers, were resolved by consensus via discussion with a third reviewer (S.Y.K.).

2.3 Data extraction and quality assessment

We extracted the following data regarding study design and results from the selected studies using a standardized form:

  • 1. Study characteristics: authors, year of publication, country and institution of origin, duration of patient recruitment, and study design (prospective vs retrospective and consecutive or not)

  • 2. Demographic and clinical characteristics: sample size, number of patients with PCa, patient age, prostate-specific antigen (PSA) level and Gleason score, number of previous biopsies, and PCa diagnosis prior to mpMRI

  • 3. Technical characteristics of mpMRI: scanner model and manufacturer, magnetic field strength (1.5 vs 3 T), coil type (endorectal vs pelvic phased array), and specific sequences used (T2WI, DWI, DCE, or MR spectroscopy)

  • 4. Interpretation of mpMRI: number of reviewers and experience in prostate mpMRI, independent or consensus reading, and blinding to clinicopathological information

  • 5. Reference standard: type of reference standard (radical prostatectomy, targeted biopsy, or systematic biopsy), interval between MRI and pathology, outcomes assessed (any PCa vs csPCa), definition of csPCa (studies assessing “clinically significant”, “aggressive”, or “high-grade” PCa were all considered to assess csPCa; however, only studies that used the definition provided by the PI-RADSv2 guideline [Gleason score ≥7 (including 3 + 4), volume >0.5 ml, or extraprostatic extension] were considered not to have concern for applicability), separate analysis for the PZ and TZ, and type of analysis (per patient vs per lesion)

  • 6. Diagnostic performance of PI-RADSv2, including criteria or cutoff values (in case of multiple readers, the results of the most experienced reader were extracted for this meta-analysis)

The methodological quality of the included studies was assessed using tailored questionnaires and criteria provided by Quality Assessment of Diagnostic Accuracy Studies-2 [13] . Data extraction and quality assessment were performed independently by two reviewers (S.W. and C.H.S.). All disagreements were resolved by consensus through discussion with the third reviewer (S.Y.K.).

2.4 Data synthesis and analysis

The diagnostic performance of PI-RADSv2 for the detection of PCa was the primary outcome for this meta-analysis. In addition, a comparison between the diagnostic performance of PI-RADSv2 and that of PI-RADSv1 using studies that reported head-to-head comparison data of the two PI-RADS versions was considered a secondary outcome.

Pooled estimates of sensitivity and specificity were calculated using hierarchical logistic regression modeling including bivariate modeling and hierarchical summary receiver operating characteristic (HSROC) modeling [14] . For graphical presentation of the results, an HSROC curve with 95% confidence region and prediction region was plotted. Publication bias was evaluated using the Deeks’ funnel plot, and statistical significance was tested with the Deeks’ asymmetry test [15] .
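The regression behind Deeks' funnel plot can be sketched as follows: the log diagnostic odds ratio of each study is regressed on the inverse square root of its effective sample size, weighted by the effective sample size, and a slope far from zero suggests asymmetry. This is a simplified stdlib sketch under stated assumptions, not the "midas" implementation used in the study; it returns only the slope and intercept (no p value), and the 2 × 2 counts and function names are hypothetical.

```python
import math


def ln_dor(tp, fp, fn, tn):
    """Log diagnostic odds ratio; 0.5 is added to every cell if any cell is zero."""
    if 0 in (tp, fp, fn, tn):
        tp, fp, fn, tn = (cell + 0.5 for cell in (tp, fp, fn, tn))
    return math.log((tp * tn) / (fp * fn))


def effective_sample_size(n_diseased, n_non_diseased):
    """ESS = 4 * n1 * n2 / (n1 + n2)."""
    return 4 * n_diseased * n_non_diseased / (n_diseased + n_non_diseased)


def weighted_line(x, y, w):
    """Weighted least-squares fit of y = intercept + slope * x."""
    total = sum(w)
    x_mean = sum(wi * xi for wi, xi in zip(w, x)) / total
    y_mean = sum(wi * yi for wi, yi in zip(w, y)) / total
    sxx = sum(wi * (xi - x_mean) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - x_mean) * (yi - y_mean) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean


def deeks_regression(tables):
    """tables: iterable of (tp, fp, fn, tn); returns (slope, intercept)."""
    x, y, w = [], [], []
    for tp, fp, fn, tn in tables:
        ess = effective_sample_size(tp + fn, fp + tn)
        x.append(1 / math.sqrt(ess))
        y.append(ln_dor(tp, fp, fn, tn))
        w.append(ess)
    return weighted_line(x, y, w)


# Hypothetical 2 x 2 tables, for illustration only.
HYPOTHETICAL_TABLES = [(45, 12, 5, 38), (90, 30, 10, 70), (20, 3, 4, 25), (150, 40, 20, 110)]
slope, intercept = deeks_regression(HYPOTHETICAL_TABLES)
print(f"slope={slope:.3f}, intercept={intercept:.3f}")
```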

We performed meta-regression analyses to investigate the cause of heterogeneity. The following covariates were considered for the bivariate model: (1) proportion of patients with PCa (>50% vs ≤50%), (2) magnet strength of MRI (3 vs 1.5 T), (3) use of endorectal coil, (4) cutoff value (≥4 vs ≥3), (5) reference standard (radical prostatectomy vs biopsy), and (6) type of analysis (per patient vs per lesion). In addition, multiple subgroup analyses were performed for cutoff value, outcome, and previous biopsy history to assess various clinical settings: (1) a cutoff value of ≥4 for all studies, (2) a cutoff value of ≥3 for all studies, (3) a cutoff value of ≥4 for determining any PCa, (4) a cutoff value of ≥3 for determining any PCa, (5) a cutoff value of ≥4 for determining csPCa, (6) a cutoff value of ≥3 for determining csPCa, (7) a cutoff value of ≥4 in studies using per-patient analysis, (8) a cutoff value of ≥4 in studies using per-lesion analysis, (9) studies analyzing PZ PCa, (10) studies analyzing TZ PCa, (11) patients without previous biopsies, and (12) patients with previous biopsies. The “metandi” and “midas” modules in Stata 10.0 (StataCorp LP, College Station, TX, USA) and “mada” package in R software version 3.2.1 (R Foundation for Statistical Computing, Vienna, Austria) were used for statistical analyses, with p < 0.05 signifying statistical significance.

3.1 Literature search

A systematic literature search initially identified 287 articles. After removing 46 duplicates, screening of the 241 titles and abstracts yielded 105 potentially eligible articles. Full-text reviews were performed, and 84 studies were excluded for the following reasons: not in the field of interest (n = 80, including 68 studies that used only PI-RADSv1), insufficient data to reconstruct 2 × 2 tables (n = 2), and shared study population with other studies (n = 2). Ultimately, 21 original articles including a total of 3857 patients assessing the diagnostic performance of PI-RADSv2 were included in the meta-analysis [16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36]. No additional studies were identified via screening the bibliographies of these 21 studies. Among them, 15 studies including 3099 patients dealt with PI-RADSv2 alone, whereas six studies including 758 patients provided a head-to-head comparison between PI-RADSv1 and PI-RADSv2 [16 20 21 28 32 33]. The detailed study selection process is described in Fig. 1.
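The counts in the selection process can be cross-checked with simple arithmetic. The short sketch below uses only the numbers stated in the paragraph above; the variable names are illustrative.

```python
# Counts taken from the study-selection paragraph.
identified = 287
duplicates = 46
screened = identified - duplicates

full_text_reviewed = 105
excluded_at_full_text = {
    "not in the field of interest": 80,
    "insufficient data for 2 x 2 tables": 2,
    "shared study population": 2,
}
included = full_text_reviewed - sum(excluded_at_full_text.values())

# (number of studies, number of patients) per group
groups = {
    "PI-RADSv2 alone": (15, 3099),
    "head-to-head PI-RADSv1 vs PI-RADSv2": (6, 758),
}
total_studies = sum(studies for studies, _ in groups.values())
total_patients = sum(patients for _, patients in groups.values())

print(screened, included, total_studies, total_patients)  # prints: 241 21 21 3857
```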


Fig. 1
PRISMA flow diagram showing study selection process for meta-analysis. a Included original articles for qualitative and quantitative analyses are references 16–36. MRI = magnetic resonance imaging; PI-RADS = Prostate Imaging Reporting and Data System; PRISMA = Preferred Reporting Items for Systematic Reviews and Meta-analyses.

3.2 Characteristics of included studies

Patient characteristics are shown in Table 1 . The size of the study population ranged from 49 to 456 patients, with the percentage of those with PCa ranging from 37% to 100%. The patients had a median age of 62–69.6 yr, median PSA of 3.97–15 ng/ml, and a Gleason score ranging from 5 to 10. Patients had already been diagnosed with PCa prior to MRI in all or some of the study populations in seven studies [16 17 22 26 27 32 35] . Biopsy was performed before MRI in seven studies [16 21 22 23 26 27 35] , all patients were biopsy-naïve in three studies [18 25 34] , both patient types were included in three studies [28 32 33] , and data regarding previous biopsy were not reported in the remaining eight studies.

First author (year) | Country | Institution | Duration of patient recruitment | Patients (n) | Patients with PCa (n) | Age (yr), median | Age (yr), range | PSA (ng/ml), median | PSA (ng/ml), range | Gleason score, median | Gleason score, range | No. of previous biopsies | PCa diagnosis before MRI
Auer (2016) [16] | Austria | Medical University of Innsbruck, Medizinische Hochschule Hannover | NR | 50 | NR | 63 a | NR | 7.3 a | NR | 7 (3 + 4) | 6–10 | ≥1 | Yes (all)
Baldisserotto (2016) [17] | Brazil | Pontifícia Universidade Católica do Rio Grande do Sul | 2013.6–2015.6 | 54 | 33 | 65.9 a | 53–81 | 8.4 a | 3–31 | 7 (3 + 4) | 6–9 | NR | Yes (some)
De Visschere (2017) [18] | Belgium | Ghent University Hospital | 2011.5–2014.12 | 245 | 144 | 66 | 44–85 | 9 | 1.4–935.5 | 7 (3 + 4) | ≥6 | 0 | No
El-Samei (2016) [19] | Egypt | Al-Azher University, El-Minia University | 2014.5–2015.10 | 55 | 38 | 62 a | 51–79 | NR | NR | 7 | 6–10 | NR | No
Feng (2016) [20] | China | Tongji Hospital | 2013.6–2015.7 | 401 | 150 | 64.4 a | 34–88 | 10.7 | 0.2–1763 | 7 (4 + 3) | ≤6–≥8 | NR | NR
Kasel-Seibert (2016) [21] | Germany | University Hospital Jena | 2013.7–2015.3 | 82 | 31 | 65 | 48–81 | 13 | 1–111 | 7 (3 + 4) | 6–9 | 1–5 | No
Lin (2016) [22] | Brazil, Taiwan | Ribeirão Preto School of Medicine, China Medical University, Ribeirão Preto School of Medicine | 2011.5–2014.6 | 49 | 49 | 63 | 46–73 | 13.27 | 1.8–41.4 | | | ≥1 | Yes (all)
Martorana (2016) [23] | Italy | University of Modena and Reggio Emilia, Perugia University | 2014.1–2016.2 | 157 | 79 | 65 | 47–79 b | 10.7 | 1–75 | 6 | 6–8 | 1–3 | No
Mertan (2016) [24] | USA | National Cancer Institute | 2015.3–2015.9 | 62 | 38 | 65.5 | 50.3–76.6 | 7.1 | 0.5–863 | 7 (3 + 4) | 6–9 | NR | NR
Muller (2015) [25] | USA, the Netherlands | National Cancer Institute, AMC University Hospital, Edward Hébert School of Medicine | 2011.12–2014.5 | 94 | 94 | 62 | 37–79 | 8.51 | 0.7–51.1 | 7 (3 + 4) | 6–10 | 0 | No
Park (2016) [26] | South Korea | Samsung Medical Center | 2012.1–2014.12 | 456 | 456 | 65 | 46–81 | 3.97 | 3.8–4.2 | NR | ≥6 | ≥1 | Yes (all)
Park (2016) [27] | South Korea | Yonsei University College of Medicine | 2012.1–2013.3 | 425 | 425 | NR | NR | NR | 1.4–156.9 | 5–6 | 5–10 | ≥1 | Yes (all)
Polanec (2016) [28] | Austria | Medical University of Vienna, Confraternität Vienna | 2011.6–2015.9 | 65 | NR | 65.3 a | 62.3–87.4 | 10.8 a | 4.2–74.5 | 7 (4 + 3) | 6–9 | ≥0 | No
Rastinehad (2015) [29] | USA | Icahn School of Medicine at Mount Sinai, Fox Chase Cancer Center, National Institutes of Health, Hofstra North Shore LIJ School of Medicine | 2012.2–2014.11 | 312 | 202 | 65.1 | 60.3–70.3 b | 7.3 | 5.0–11.4 b | NR | NR | NR | No
Rosenkrantz (2016) [30] | USA | NYU Langone Medical Center | 2013.9–2015.2 | 343 | 134 | 64 | NR | 5.8 | NR | 7 (3 + 4) | 6–9 | NR | NR
Stanzione (2016) [31] | Italy | University “Federico II”, Ospedale S. Maria delle Grazie | NR | 82 | 34 | 65 a | 43–84 | 8.8 a | NR | 7 (3 + 4) | 6–9 | NR | No
Tan (2016) [32] | USA, Taiwan | UCLA, Cathay General Hospital, Hoag Hospital | 2013.3–2016.12 | 106 | 63 | 66.5 | 43–79 | 7.9 | 5.6–10.6 | 7 (3 + 4) | 6–10 | ≥0 | Yes (some)
Tewes (2016) [33] | Germany | Hannover Medical School, Klinikum der Region Hannover | 2012.12–2014.12 | 54 | 31 | 69.6 a | NR | 8.7 a | NR | 6 | 6–9 | ≥0 | No
Washino (2017) [34] | Japan | Jichi Medical University Saitama Medical Center | 2010.6–2014.4 | 288 | 159 | 69 | 64–74 b | 7.5 | 5.5–11 b | NR | NR | 0 | No
Woo (2016) [35] | South Korea | Seoul National University College of Medicine | 2011.1–2013.12 | 105 | 105 | 69 | 49–79 | 8.22 | 0.9–44.2 | 7 | 6–7 | ≥1 | Yes (all)
Zhao (2016) [36] | China | Peking University First Hospital | 2010.11–2013.12 | 372 | 185 | 68.5 a | NR | 15 a | NR | 7 | 6–10 | NR | No

a Mean.

b Interquartile range.

MRI = magnetic resonance imaging; NR = not reported; PCa = prostate cancer; PSA = prostate-specific antigen.

Table 1. Patient characteristics

Characteristics of the studies are summarized in Table 2. MRI was performed using 3-T scanners in 16 studies [16 17 18 20 24 25 26 27 28 29 30 31 32 33 35 36], 1.5-T scanners in four studies [19 21 22 23], and either 3 or 1.5 T in one study [34]. Endorectal coils were used in seven studies [19 21 22 23 24 25 29]. In all studies, the mpMRI protocol consisted of T2WI, DWI, and DCE-MRI. The reference standard was radical prostatectomy in five studies [16 22 26 27 35], a combination of systematic and targeted biopsies in seven studies [17 20 23 25 29 34 36], and only targeted biopsy in seven studies [19 21 24 28 30 32 33]; the reference standard was not consistent throughout the study population in two studies [18 31]. PI-RADSv2 scoring was performed by one to five radiologists, either in consensus or independently. The level of experience of the radiologists was heterogeneous, ranging from 4 to 22 yr of experience in prostate mpMRI. In most studies, the readers were blinded; however, in three studies, the radiologists were aware that the patients had biopsy-proven PCa [22 26 35], and five studies were not explicit regarding blinding [18 19 30 32 36]. In the majority of the studies, the interval between MRI and the reference standard was less than 6 mo; however, the details were not reported in 10 studies [16 17 18 24 27 28 29 30 32 33]. PCa was separately assessed according to zonal anatomy in seven studies [20 24 25 28 30 31 33]. However, in one study [31], only PCa in the PZ could be evaluated, as no detailed data were provided in the article and the attempt to contact the authors for provision of further information was unsuccessful. Regarding the outcome assessed, eight studies evaluated any cancer [17 19 20 23 25 28 31 33], eight evaluated clinically significant cancer [18 22 26 27 30 34 35 36], and five evaluated both [16 21 24 29 32]. With regard to cutoff values, 13 studies used ≥4 [16 18 19 20 21 24 25 26 27 28 30 31 35], four studies used ≥3 [32 33 34 36], and four studies used both [17 22 23 29]. The location of PCa was separately reported by the PZ and TZ in six studies [20 24 25 28 30 33].

First author (year) | Design | Consecutive enrollment | Reference standard | MRI–reference standard interval | No. of readers (reading) | Reader experience (yr) | Blinding | Magnet strength (T) | Vendor | Model | Endorectal coil | PI-RADSv2 application | Cutoff values | Localization | Type of analysis | Outcome assessed | Definition of csPCa
Auer (2016) [16] | Retrospective | Yes | RP | NR | 1 | >5 | Yes | 3 | Siemens | Skyra | No | Strict | 4 | Whole | Lesion | Any + csPCa | GS ≥7 (4 + 3)
Baldisserotto (2016) [17] | Retrospective | Yes | STRUSGB + targeted MRI–TRUS biopsy (cognitive) or RP | NR | 2 (independent) | >10/1 | Yes | 3 | GE | Signa HDxt | No | Strict | 3, 4 | Whole | Patient | Any |
De Visschere (2017) [18] | Retrospective | NR | STRUSGB or RP | NR | NR | NR | NR | 3 | Siemens | Trio | No | Strict | 4 | Whole | Patient | csPCa | GS ≥7, ≥0.5 cc, or EPE
El-Samei (2016) [19] | NR | NR | Targeted MRI–TRUS biopsy | ≤2 wk | NR | NR | NR | 1.5 | Philips | Gyroscan | Yes | Strict | 4 | Whole | Lesion | Any |
Feng (2016) [20] | Retrospective | Yes | STRUSGB + targeted MRI–TRUS biopsy (cognitive) | 1–35 d | 2 (consensus) | 5/4 | Yes | 3 | Siemens | Skyra | No | Strict | 4 | PZ, TZ | Patient | Any |
Kasel-Seibert (2016) [21] | Retrospective | NR | Targeted MRGB | ≤7 d | 2 (independent) | 10/<1 | Yes | 1.5 | Siemens | Avanto | Yes | Strict | 4 | Whole | Lesion | Any + csPCa | GS ≥7
Lin (2016) [22] | Retrospective | NR | RP | ≤6 mo | 2 (independent) | 12/5 | Yes a | 1.5 | Philips | Achieva | Yes | Strict | 3, 4 | Whole | Lesion | csPCa | GS ≥7 and >0.5 cc
Martorana (2016) [23] | Retrospective | Yes | TTB + targeted MRI–TRUS biopsy | 30–78 | 2 (consensus) | 4/4 | Yes | 1.5 | Philips | Achieva | Yes | Strict | 3, 4 | Whole | Lesion | Any | GS ≥7, >0.5 cc, or EPE
Mertan (2016) [24] | Prospective | Yes | STRUSGB + targeted MRI–TRUS biopsy | NR | 1 | >8 | Yes | 3 | Philips | Achieva | Yes | Strict | 4 | PZ, TZ | Lesion | Any + csPCa | GS ≥7
Muller (2015) [25] | Retrospective | Yes | STRUSGB + targeted MRI–TRUS biopsy | ≤6 wk | 5 (independent) | 12/7/1/1/0.5 | Yes | 3 | Philips | Achieva | Yes | Strict | 4 | PZ, TZ | Lesion | Any | GS ≥7 (4 + 3)
Park (2016) [26] | Retrospective | Yes | RP | 21–48 | 2 (independent) | 14/3 | Yes a | 3 | Philips | Achieva | No | Strict | 4 | Whole | Patient | csPCa | GS ≥7, ≥0.5 cc, or EPE
Park (2016) [27] | Retrospective | NR | RP | NR | 2 (independent) | 9/4 | Yes | 3 | GE, Philips, Siemens | Discovery MR750, Achieva, Trio | No | Strict | 4 | Whole | Patient | csPCa | GS ≥7, ≥0.5 cc, or EPE
Polanec (2016) [28] | Retrospective | Yes | Targeted MRGB | NR | 2 (independent) | NR/NR | Yes | 3 | Siemens | Trio | No | Strict | 4 | PZ, TZ | Lesion | Any |
Rastinehad (2015) [29] | Prospective | NR | STRUSGB + targeted MRI–TRUS biopsy | NR | 3 (consensus) | NR | Yes | 3 | Siemens | Verio | Yes | Calculated b | 3, 4 | Whole | Patient | Any + csPCa | Epstein's criteria c for systematic and GS ≥7 or >0.5 cc for targeted biopsy
Rosenkrantz (2016) [30] | Retrospective | NR | Targeted MRI–TRUS biopsy | NR | 2 (independent) | 8/3 | NR | 3 | Siemens | Biograph mMR/Prisma/Skyra/Trio | No | Strict | 4 | PZ, TZ | Lesion | csPCa | GS ≥7
Stanzione (2016) [31] | Prospective | NR | STRUSGB (±targeted TRUSGB) | 20–30 d | 2 (independent) | 14/10 | Yes | 3 | Siemens | Trio | No | Strict | 4 | PZ | Patient | Any |
Tan (2016) [32] | Retrospective | Yes | Targeted MRGB | NR | 3 (consensus) | 16/10 | NR | 3 | Siemens | Skyra, Trio, Verio | No | Strict | 3 | Whole | Lesion | Any + csPCa | GS ≥7
Tewes (2016) [33] | Retrospective | Yes | Targeted MRGB | NR | 2 (independent) | 5/2 | Yes | 3 | Siemens | Skyra | No | Strict | 3 d | PZ, TZ | Patient | Any |
Washino (2017) [34] | Retrospective | NR | TTB + targeted MRI–TRUS biopsy (cognitive) | 0.5–1.6 mo | 1 | 14 | Yes | 1.5 or 3 | Toshiba | Excelart Vantage/Vantage Titan 3T | No | Strict | 3 | Whole | Patient | csPCa | GS ≥7 or maximum core length ≥4 mm
Woo (2016) [35] | Retrospective | NR | RP | ≤6 mo | 2 (consensus) | 22/10 | Yes a | 3 | Philips, Siemens | Ingenia, Verio/Trio | No | Strict | 4 | Whole | Patient | csPCa | GS ≥7
Zhao (2016) [36] | Retrospective | NR | STRUSGB + targeted MRI–TRUS biopsy | ≤3 mo | 2 (independent) | NR | NR | 3 | NR | NR | No | Strict | 3 | Whole | Patient | csPCa | GS ≥7

a Blinded but aware that patients had biopsy-proven PCa.

b PI-RADSv2 scores were generated from reports based on PI-RADSv1 and simplified qualitative system.

c Epstein's criteria = Gleason pattern ≥4, or Gleason 3 + 3 disease with core length ≥50% and/or >2 cores positive.

d For transition zone, cutoff value = 4.

csPCa = clinically significant prostate cancer; GS = Gleason score; EPE = extraprostatic extension; MRGB = magnetic resonance imaging-guided biopsy; MRI = magnetic resonance imaging; MRI–TRUS = fusion of magnetic resonance imaging and transrectal ultrasound images; NR = not reported; PCa = prostate cancer; PI-RADSv2 = Prostate Imaging Reporting and Data System version 2; PZ = peripheral zone; RP = radical prostatectomy; STRUSGB = systematic transrectal ultrasound-guided biopsy; TTB = transperineal template biopsy; TZ = transition zone.

Table 2. Study characteristics

3.3 Quality assessment

Overall, the quality of the studies was not considered high, mainly due to the patient selection domain (Fig. 2). Regarding the patient selection domain, there was generally a high risk of bias as all but four of the studies were retrospective in nature [16 17 18 20 21 22 23 25 26 27 28 30 32 33 34 35 36]. Seven studies were considered to have high concern for applicability, as all or some of the patients had a pathological diagnosis of PCa prior to MRI [16 17 22 26 27 32 35]. Regarding the index test domain, there was a high risk of bias in nine studies. In three of the nine studies, reviewers were aware that patients had biopsy-proven PCa [22 26 35]. In the other six studies, the cutoff value for determining PCa was not specified prior to interpretation [16 20 29 31 33 34]. Only one study had concern for applicability, as PI-RADSv2 scores were indirectly generated from existing clinical radiological reports based on PI-RADSv1 or an in-house scoring system [29]. Regarding the reference standard domain, eight studies had a high risk of bias. Seven were based on either only systematic biopsy or only targeted biopsy [18 19 21 28 30 32 33]; in one study, targeted biopsy was performed, but on lesions that were suspicious on ultrasonography, and not on MRI [31]. Studies in which radical prostatectomy or a systematic plus targeted biopsy (MRI guided, MRI–transrectal ultrasound fusion, or cognitive) was used as the reference standard were considered to have a low risk of bias. In 10 studies, the definition of clinically significant cancer did not abide by that described in the PI-RADSv2 guidelines, and these studies therefore showed high concern for applicability [16 21 22 24 29 30 32 34 35 36]. Regarding the flow and timing domain, two studies had a high risk of bias as patients did not receive the same reference standard [17 18].


Fig. 2
Grouped bar charts show (A) risk of bias and (B) concerns for applicability of 21 included studies using QUADAS-2. QUADAS-2 = Quality Assessment of Diagnostic Accuracy Studies-2.

3.4 Diagnostic accuracy of PI-RADSv2

The sensitivity and specificity of individual studies were 73–100% and 8–100%, respectively. The Q-test revealed that substantial heterogeneity was present (p < 0.001). The Higgins I² statistic demonstrated substantial heterogeneity in terms of the sensitivity (I² = 85.55%) and considerable heterogeneity in terms of the specificity (I² = 95.30%). The coupled forest plot of the sensitivity and specificity demonstrated the absence of a threshold effect (Fig. 3). The Spearman correlation coefficient between the sensitivity and the false-positive rate was 0.45 (95% confidence interval [CI] 0.023–0.738), also indicating the lack of a threshold effect.
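For readers unfamiliar with these heterogeneity measures, the sketch below shows how Cochran's Q and the Higgins I² statistic can be computed from study-level logit sensitivities with inverse-variance weights. It is a univariate simplification under stated assumptions, not the procedure run in Stata for this study, and the true-positive and false-negative counts are hypothetical.

```python
import math


def logit_sensitivity(tp, fn):
    """Logit of sensitivity and its approximate variance (0.5 added if a cell is zero)."""
    if tp == 0 or fn == 0:
        tp, fn = tp + 0.5, fn + 0.5
    return math.log(tp / fn), 1 / tp + 1 / fn


def cochran_q_i2(estimates, variances):
    """Return (Q, I2 in percent) using fixed-effect inverse-variance weights."""
    weights = [1 / v for v in variances]
    pooled = sum(w * e for w, e in zip(weights, estimates)) / sum(weights)
    q = sum(w * (e - pooled) ** 2 for w, e in zip(weights, estimates))
    df = len(estimates) - 1
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return q, i2


# Hypothetical (TP, FN) counts for four studies, for illustration only.
counts = [(45, 5), (80, 20), (30, 2), (120, 40)]
logits, variances = zip(*(logit_sensitivity(tp, fn) for tp, fn in counts))
q, i2 = cochran_q_i2(logits, variances)
print(f"Q={q:.2f}, I2={i2:.1f}%")
```

I² expresses the share of the total variation across studies that exceeds what chance alone would produce, which is why it is truncated at zero when Q falls below its degrees of freedom.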


Fig. 3
Coupled forest plots of pooled sensitivity and specificity. Numbers are pooled estimates with 95% CI in parentheses. Corresponding heterogeneity statistics are provided at bottom right corners. Horizontal lines indicate 95% CIs. CI = confidence intervals.

For all 21 studies combined, the pooled sensitivity was 0.89 (95% CI 0.86–0.92) with specificity of 0.73 (95% CI 0.60–0.83; Fig. 3 ). In the HSROC curve, there was a large difference between the 95% confidence region and the 95% prediction region, thus indicating heterogeneity between the studies ( Fig. 4 ). The area under the HSROC curve was 0.91 (95% CI 0.88–0.93). According to the Deeks’ funnel plot, the likelihood of publication bias was low, with a p value of 0.75 for the slope coefficient ( Fig. 5 ).
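To give a sense of what the pooled estimates mean for an individual patient, the sketch below converts them into likelihood ratios and post-test probabilities. The likelihood ratios are derived here from the pooled sensitivity and specificity and are not values reported in this section; the pretest probability of 0.30 is a purely hypothetical assumption.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (positive LR, negative LR)."""
    return sensitivity / (1 - specificity), (1 - sensitivity) / specificity


def post_test_probability(pretest_probability, likelihood_ratio):
    """Apply a likelihood ratio to a pretest probability via odds."""
    pretest_odds = pretest_probability / (1 - pretest_probability)
    post_odds = pretest_odds * likelihood_ratio
    return post_odds / (1 + post_odds)


POOLED_SENSITIVITY = 0.89  # pooled estimate from the text
POOLED_SPECIFICITY = 0.73  # pooled estimate from the text
PRETEST = 0.30             # hypothetical pretest probability, for illustration only

lr_pos, lr_neg = likelihood_ratios(POOLED_SENSITIVITY, POOLED_SPECIFICITY)
p_after_positive = post_test_probability(PRETEST, lr_pos)
p_after_negative = post_test_probability(PRETEST, lr_neg)
print(f"LR+={lr_pos:.2f}, LR-={lr_neg:.2f}")
print(f"post-test probability: positive={p_after_positive:.2f}, negative={p_after_negative:.2f}")
```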


Fig. 4
Hierarchical summary receiver operating characteristic curve of the diagnostic performance of PI-RADSv2 for detecting prostate cancer. HSROC = hierarchical summary receiver operating characteristic; PI-RADSv2 = Prostate Imaging Reporting and Data System version 2.


Fig. 5
Deeks’ funnel plot. A p value of 0.75 suggests that the likelihood of publication bias is low.

For the six studies that provided a head-to-head comparison between PI-RADSv1 and PI-RADSv2, PI-RADSv2 demonstrated higher pooled sensitivity of 0.95 (95% CI 0.85–0.98) compared with 0.88 (95% CI 0.80–0.93) for PI-RADSv1 ( p = 0.04). However, the pooled specificity did not show a significant difference between the two versions of PI-RADS: 0.73 (95% CI 0.47–0.89) for v2 and 0.75 (95% CI 0.36–0.94) for v1 ( p = 0.90).

3.5 Heterogeneity exploration

As we found considerable heterogeneity among the included studies, meta-regression analyses were performed (Supplementary Table 1). Among several potential variables, the proportion of patients with PCa, magnetic field strength, and reference standard were significant factors affecting the heterogeneity (p < 0.01 for all). However, among these three, only specificity according to the proportion of patients with PCa showed a statistically significant and clinically meaningful difference: 0.65 (95% CI 0.52–0.78) in studies with ≤50% of patients with PCa versus 0.86 (95% CI 0.75–0.97) in studies with >50% of patients with PCa. Otherwise, no clinically meaningful differences were seen: for magnet strength (3 vs 1.5 T), sensitivity of 0.90 (95% CI 0.86–0.94) versus 0.89 (95% CI 0.81–0.97, p = 0.03) and specificity of 0.73 (95% CI 0.59–0.86) versus 0.72 (95% CI 0.44–1.00, p = 0.81); for reference standard (radical prostatectomy vs biopsy), sensitivity of 0.89 (95% CI 0.83–0.95) versus 0.91 (95% CI 0.88–0.94, p < 0.01) and specificity of 0.65 (95% CI 0.37–0.94) versus 0.73 (95% CI 0.58–0.87, p = 0.48). Other variables, including the cutoff value, use of endorectal coil, and type of analysis (per patient vs per lesion), were not significant factors (p = 0.32–0.70).

3.6 Subgroup analysis

As four studies used both ≥3 and ≥4 as cutoff values [17 22 23 29] and five studies assessed both any cancer and clinically significant cancer as outcomes [16 21 24 29 32], multiple subgroup analyses were performed in order to assess various clinical settings (Supplementary Table 2). Regarding cutoff values, the pooled sensitivity was 0.89 (95% CI 0.84–0.92) with specificity of 0.74 (95% CI 0.58–0.85) for 17 studies using ≥4 [16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 35], whereas these were 0.95 (95% CI 0.89–0.97) and 0.47 (95% CI 0.21–0.74) in eight studies using ≥3 [17 22 23 29 32 33 34 36]. When we stratified studies according to the outcome assessed, the results were as follows: (1) cutoff of ≥4 for determining any PCa, sensitivity of 0.89 (95% CI 0.83–0.93) with specificity of 0.80 (95% CI 0.62–0.90); (2) cutoff of ≥3 for determining any PCa, sensitivity of 0.96 (95% CI 0.93–0.98) with specificity of 0.49 (95% CI 0.29–0.70); (3) for determining csPCa regardless of cutoff values, sensitivity of 0.89 (95% CI 0.84–0.92) with specificity of 0.64 (95% CI 0.46–0.78); (4) cutoff of ≥4 for determining csPCa, sensitivity of 0.90 (95% CI 0.85–0.94) with specificity of 0.62 (95% CI 0.45–0.77); and (5) cutoff of ≥3 for determining csPCa, sensitivity of 0.96 (95% CI 0.87–0.99) with specificity of 0.29 (95% CI 0.05–0.77). When the studies using a cutoff value of ≥4 were separately assessed according to the type of analysis, per-patient analysis in eight studies [17 18 20 26 27 29 31 35] yielded pooled sensitivity of 0.89 (95% CI 0.81–0.93) with specificity of 0.76 (95% CI 0.60–0.88), whereas per-lesion analysis in nine studies [16 19 21 22 23 24 25 28 30] yielded pooled sensitivity of 0.87 (95% CI 0.83–0.91) with specificity of 0.70 (95% CI 0.44–0.88).
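The trade-off between the two cutoff values can be made concrete with a short calculation. The sketch below takes the pooled sensitivity and specificity reported above for cutoffs of ≥4 and ≥3 and computes the Youden index (sensitivity + specificity − 1) together with the expected number of false positives per 100 men without cancer. The Youden index is used here only as an illustrative summary; it was not part of the reported analysis.

```python
# Pooled (sensitivity, specificity) per cutoff, as reported in the text.
POOLED_BY_CUTOFF = {
    ">=4": (0.89, 0.74),
    ">=3": (0.95, 0.47),
}


def youden_index(sensitivity, specificity):
    """Youden's J = sensitivity + specificity - 1."""
    return sensitivity + specificity - 1


def false_positives_per_100(specificity):
    """Expected false positives among 100 patients without cancer."""
    return (1 - specificity) * 100


summary = {
    cutoff: (youden_index(sens, spec), false_positives_per_100(spec))
    for cutoff, (sens, spec) in POOLED_BY_CUTOFF.items()
}
for cutoff, (j, fp100) in summary.items():
    print(f"cutoff {cutoff}: J={j:.2f}, false positives per 100 non-cancer={fp100:.0f}")
```

Lowering the cutoff from ≥4 to ≥3 gains a few percentage points of sensitivity at the cost of roughly doubling the false-positive rate, which is the pattern the subgroup results describe.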

Based on the localization of PCa, the pooled sensitivity was 0.93 (95% CI 0.87–0.96) with specificity of 0.68 (95% CI 0.43–0.86) in seven studies analyzing PZ cancers [20 24 25 28 30 31 33]. In the six of these studies that also analyzed TZ cancers (all except that by Stanzione et al [31]), the pooled sensitivity and specificity were 0.88 (95% CI 0.77–0.94) and 0.75 (95% CI 0.59–0.86), respectively.

Studies including patients without previous biopsies yielded sensitivity of 0.82 (95% CI 0.72–0.90) and specificity of 0.75 (95% CI 0.65–0.83), whereas the diagnostic performance values were 0.87 (95% CI 0.80–0.92) and 0.71 (95% CI 0.42–0.89) in studies including patients with a history of previous biopsy.

3.7 Discussion

In our meta-analysis, we assessed the diagnostic accuracy of PI-RADSv2 for detecting PCa. The pooled sensitivity and specificity of all 21 studies were 0.89 (95% CI 0.86–0.92) and 0.73 (95% CI 0.60–0.83), respectively. When comparing our data with the only two existing meta-analyses using mpMRI for detecting PCa, a trend toward higher sensitivity and lower specificity can be inferred. In the study by de Rooij et al [37], which evaluated seven studies using a combination of T2WI, DWI, and DCE-MRI, the pooled sensitivity and specificity were 0.74 (95% CI 0.66–0.81) and 0.88 (95% CI 0.82–0.92), respectively. In a more recent meta-analysis by Hamoen et al [6], which analyzed 14 studies using PI-RADSv1, the pooled sensitivity and specificity were 0.78 (95% CI 0.70–0.84) and 0.79 (95% CI 0.68–0.86), respectively. However, comparing these three meta-analyses provides only an indirect comparison. To address this issue, we separately assessed a subgroup of studies using both PI-RADSv1 and PI-RADSv2. In this head-to-head comparison, PI-RADSv2 demonstrated higher pooled sensitivity (0.95) than PI-RADSv1 (0.88, p = 0.04) without a statistically significant difference in specificity (0.73 vs 0.75, p = 0.90). This increase in sensitivity compared with its predecessor may imply that the revisions undertaken during the development of PI-RADSv2, including the introduction of dominant sequences according to zonal anatomy, limited contribution of DCE-MRI secondary to DWI and T2WI, and specific guidelines for deriving an integrated overall score, were, in fact, on the right track. In particular, we speculate that the use of dominant sequences, that is, DWI for the PZ and T2WI for the TZ, may have been crucial for the improved sensitivity without a loss in specificity, as suggested by Baur et al [10].

Considering that one of the main intentions for the generation of PI-RADS was to standardize reporting of mpMRI in order to decrease variability and bring about widespread acceptance and implementation in daily practice, it was promising to find that nearly all (20 of 21) studies used PI-RADSv2 strictly according to published guidelines [11] . Only one study formed PI-RADSv2 scores from existing clinical radiological reports that were based on PI-RADSv1 or an in-house scoring system [29] . This is an improvement when compared with prior studies conducted using PI-RADSv1, where investigators used varying methods in determining the overall score (overall five-point score or sum of the scores from each modality) [6] . Still, there is a need for further clarification regarding the cutoff value for detecting PCa. In the studies included in our meta-analysis, cutoff values were predefined in only six studies, while the majority (15/21) were exploratory in nature, testing multiple criteria. When using a cutoff value of ≥4, sensitivity (0.89) and specificity (0.74) were generally good, whereas using ≥3 yielded excellent sensitivity (0.95) and poor specificity (0.47). These results may be taken into consideration when generating the next updated PI-RADS. For instance, using the former may be adequate for general use of PI-RADS, whereas the latter could be proposed to be indicated when a higher cancer detection rate is clinically required (ie, persistently high PSA level despite a previously negative biopsy).
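
The trade-off between the two cutoffs can be made more concrete with likelihood ratios. The following is a minimal sketch, not part of the original analysis: the likelihood ratios are derived here from the pooled point estimates quoted above (ignoring their confidence intervals), and the pre-test probability of 0.40 as well as the function names are assumptions chosen purely for illustration.

```python
def likelihood_ratios(sensitivity, specificity):
    """Positive and negative likelihood ratios for a dichotomised test."""
    lr_pos = sensitivity / (1 - specificity)
    lr_neg = (1 - sensitivity) / specificity
    return lr_pos, lr_neg


def post_test_probability(pre_test, lr):
    """Convert a pre-test probability to a post-test probability via odds."""
    post_odds = pre_test / (1 - pre_test) * lr
    return post_odds / (1 + post_odds)


# Pooled (sensitivity, specificity) quoted in the text for each cutoff.
cutoffs = {">=4": (0.89, 0.74), ">=3": (0.95, 0.47)}
PRE_TEST = 0.40  # assumed pre-test probability, for illustration only

for name, (sens, spec) in cutoffs.items():
    lr_pos, lr_neg = likelihood_ratios(sens, spec)
    print(name,
          f"LR+ = {lr_pos:.2f}",
          f"LR- = {lr_neg:.2f}",
          f"P(PCa | negative MRI) = {post_test_probability(PRE_TEST, lr_neg):.2f}")
```

Under these assumptions, the lower cutoff gives a smaller negative likelihood ratio (better at ruling out disease) at the cost of a smaller positive likelihood ratio, which mirrors the sensitivity and specificity pattern described above.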

In the current study, subgroup analyses were performed to account for differences in outcomes (any cancer vs clinically significant cancer). There was no significant difference for using either outcome irrespective of whether the criteria of ≥3 or ≥4 were used. However, the definition of clinically significant cancer was different among the 13 studies. Only three studies defined csPCa strictly according to the PI-RADSv2 guidelines (Gleason score ≥7 [3 + 4], volume >0.5 ml, or extraprostatic extension) [11]. Most others used one or two of the three criteria. Including only the former three studies may have provided more robust results; yet including all available studies was not only pragmatic but also presents a general overview of the existing literature, as this is the first meta-analysis of studies dealing with PI-RADSv2.

In this meta-analysis, we looked into the technical aspects of MRI. Meta-regression analyses revealed that the use of an endorectal coil was not a statistically significant factor. Furthermore, although sensitivity differed statistically significantly between 3 and 1.5 T, the difference was not clinically meaningful (0.90 vs 0.89, respectively; p = 0.03). Although there had been debate over these two issues in the past, both 3 and 1.5 T are now well established, and the overall benefit of using an endorectal coil is not evident [38 39]. The PI-RADSv2 guidelines currently recommend either usage, and the results of our study provide additional evidence to support this.

Regarding the methods of analysis in the studies, there was significant heterogeneity regarding reference standard and type of analysis. Radical prostatectomy was the reference standard in five studies, while the majority were based on a combination of systematic and targeted biopsies. The possibility of PCa despite negative biopsy results in the latter group should be kept in mind. In addition, approximately half of the studies reported outcomes in a per-patient manner (n = 11) and the other half in a per-lesion manner (n = 10). Per-lesion analysis is known to take into account the performance of localizing the disease; however, this was not shown to be a significant factor in the meta-regression analysis.

Our meta-analysis had some limitations. Nearly all studies were retrospective in design, resulting in a high risk of bias for patient selection. It is possible that pooling data from predominantly retrospective studies may have led to increased diagnostic sensitivity [40]. In addition, not only was performing a meta-analysis using only the three prospective studies technically unfeasible, but the derived results would also not be representative of the existing literature on PI-RADSv2. Nevertheless, we used validated methods for the systematic review and reported the data using standard reporting guidelines, including PRISMA and the guidelines of the Handbook for Diagnostic Test Accuracy Reviews published by the Cochrane Collaboration [12 41]. Another limitation is the considerable heterogeneity in our pooled analysis, which affects the general applicability of our summary estimates. To explore the heterogeneity of our data, we performed meta-regression and multiple subgroup analyses. According to the analyses, the proportion of patients with PCa, the magnetic field strength, and the reference standard were significant factors affecting the heterogeneity. In particular, the reference standard included various methods, including radical prostatectomy and a combination of systematic and targeted biopsies (ie, MRI guided, MRI-transrectal ultrasound fusion, or cognitive). Furthermore, the fact that various definitions were used for clinically significant cancer needs to be emphasized. Our meta-regression and subgroup analyses may explain some of the heterogeneity, but a portion remains unexplained. Another important limitation is the small number of included studies for head-to-head comparison between PI-RADSv1 and PI-RADSv2. However, we were able to demonstrate a statistically significant difference in sensitivity between the two versions using the only six studies published to date.

PI-RADSv2 shows good performance for the detection of PCa with pooled sensitivity of 0.89 and specificity of 0.73. PI-RADSv2 has higher pooled sensitivity compared with PI-RADSv1 without significantly different specificity.

Author contributions: Sang Youn Kim had full access to all the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis.

Study concept and design: Woo, Suh, S.Y. Kim.

Acquisition of data: Woo, Suh, S.Y. Kim.

Analysis and interpretation of data: Woo, Suh, S.Y. Kim.

Drafting of the manuscript: Woo.

Critical revision of the manuscript for important intellectual content: Suh, S.Y. Kim, Cho, S.H. Kim.

Statistical analysis: Suh.

Obtaining funding: None.

Administrative, technical, or material support: None.

Supervision: S.Y. Kim, Cho, S.H. Kim.

Other: None.

Financial disclosures: Sang Youn Kim certifies that all conflicts of interest, including specific financial interests and relationships and affiliations relevant to the subject matter or materials discussed in the manuscript (eg, employment/affiliation, grants or funding, consultancies, honoraria, stock ownership or options, expert testimony, royalties, or patents filed, received, or pending), are the following: None.

Funding/Support and role of the sponsor: None.

  • [1] American Cancer Society. Key statistics for prostate cancer. http://www.cancer.org/cancer/prostatecancer/detailedguide/prostate-cancer-key-statistics
  • [2] C.M. Hoeks, J.O. Barentsz, T. Hambrock, et al. Prostate cancer: multiparametric MR imaging for detection, localization, and staging. Radiology. 2011;261:46-66
  • [3] O. Ruprecht, P. Weisser, B. Bodelle, H. Ackermann, T.J. Vogl. MRI of the prostate: interobserver agreement compared with histopathologic outcome after radical prostatectomy. Eur J Radiol. 2012;81:456-460
  • [4] L. Dickinson, H.U. Ahmed, C. Allen, et al. Scoring systems used for the interpretation and reporting of multiparametric MRI for prostate cancer detection, localization, and characterization: could standardization lead to improved utilization of imaging within the diagnostic pathway? J Magn Reson Imaging. 2013;37:48-58
  • [5] J.O. Barentsz, J. Richenberg, R. Clements, et al. ESUR prostate MR guidelines 2012. Eur Radiol. 2012;22:746-757
  • [6] E.H. Hamoen, M. de Rooij, J.A. Witjes, J.O. Barentsz, M.M. Rovers. Use of the Prostate Imaging Reporting and Data System (PI-RADS) for prostate cancer detection with multiparametric magnetic resonance imaging: a diagnostic meta-analysis. Eur Urol. 2015;67:1112-1121
  • [7] M. Abd-Alazeez, H.U. Ahmed, M. Arya, et al. The accuracy of multiparametric MRI in men with negative biopsy and elevated PSA level—can it rule out clinically significant prostate cancer? Urol Oncol. 2014;32:45.e17–22
  • [8] A.B. Rosenkrantz, S. Kim, R.P. Lim, et al. Prostate cancer localization using multiparametric MR imaging: comparison of Prostate Imaging Reporting and Data System (PI-RADS) and Likert scales. Radiology. 2013;269:482-492
  • [9] B.G. Hansford, Y. Peng, Y. Jiang, et al. Dynamic contrast-enhanced MR imaging curve-type analysis: is it helpful in the differentiation of prostate cancer from healthy peripheral zone? Radiology. 2015;275:448-457
  • [10] A.D. Baur, A. Maxeiner, T. Franiel, et al. Evaluation of the Prostate Imaging Reporting and Data System for the detection of prostate cancer by the results of targeted biopsy of the prostate. Invest Radiol. 2014;49:411-420
  • [11] J.C. Weinreb, J.O. Barentsz, P.L. Choyke, et al. PI-RADS Prostate Imaging-Reporting and Data System: 2015, version 2. Eur Urol. 2016;69:16-40
  • [12] A. Liberati, D.G. Altman, J. Tetzlaff, et al. The PRISMA statement for reporting systematic reviews and meta-analyses of studies that evaluate health care interventions: explanation and elaboration. J Clin Epidemiol. 2009;62:e1-e34
  • [13] P. Whiting, A.W. Rutjes, J.B. Reitsma, P.M. Bossuyt, J. Kleijnen. The development of QUADAS: a tool for the quality assessment of studies of diagnostic accuracy included in systematic reviews. BMC Med Res Methodol. 2003;3:25
  • [14] C.H. Suh, S.H. Park. Successful publication of systematic review and meta-analysis of studies evaluating diagnostic test accuracy. Korean J Radiol. 2016;17:5-6
  • [15] J.J. Deeks, P. Macaskill, L. Irwig. The performance of tests of publication bias and other sample size effects in systematic reviews of diagnostic test accuracy was assessed. J Clin Epidemiol. 2005;58:882-893
  • [16] T. Auer, M. Edlinger, J. Bektic, et al. Performance of PI-RADS version 1 versus version 2 regarding the relation with histopathological results. World J Urol. 2017;35:687-693
  • [17] M. Baldisserotto, E.J. Neto, G. Carvalhal, et al. Validation of PI-RADS v.2 for prostate cancer diagnosis with MRI at 3T using an external phased-array coil. J Magn Reson Imaging. 2016;44:1354-1359
  • [18] P. De Visschere, N. Lumen, P. Ost, K. Decaestecker, E. Pattyn, G. Villeirs. Dynamic contrast-enhanced imaging has limited added value over T2-weighted imaging and diffusion-weighted imaging when using PI-RADSv2 for diagnosis of clinically significant prostate cancer in patients with elevated PSA. Clin Radiol. 2017;72:23-32
  • [19] El-Samei HAE-KA, M.F. Amin, E.E. Hassan. Assessment of the accuracy of multi-parametric MRI with PI-RADS 2.0 scoring system in the discrimination of suspicious prostatic focal lesions. Egypt J Radiol Nucl Med. 2016;47:1075-1082
  • [20] Z.Y. Feng, L. Wang, X.D. Min, S.G. Wang, G.P. Wang, J. Cai. Prostate cancer detection with multiparametric magnetic resonance imaging: Prostate Imaging Reporting and Data System version 1 versus version 2. Chin Med J. 2016;129:2451-2459
  • [21] M. Kasel-Seibert, T. Lehmann, R. Aschenbach, et al. Assessment of PI-RADS v2 for the detection of prostate cancer. Eur J Radiol. 2016;85:726-731
  • [22] W.C. Lin, V.F. Muglia, G.E. Silva, S. Chodraui Filho, R.B. Reis, A.C. Westphalen. Multiparametric MRI of the prostate: diagnostic performance and interreader agreement of two scoring systems. Br J Radiol. 2016;89:20151056
  • [23] Martorana E, Pirola GM, Scialpi M, et al. Lesion volume predicts prostate cancer risk and aggressiveness: validation of its value alone and matched with Prostate Imaging Reporting and Data System score. BJU Int. In press. doi:10.1111/bju.13649
  • [24] F.V. Mertan, M.D. Greer, J.H. Shih, et al. Prospective evaluation of the Prostate Imaging Reporting and Data System version 2 for prostate cancer detection. J Urol. 2016;196:690-696
  • [25] B.G. Muller, J.H. Shih, S. Sankineni, et al. Prostate cancer: interobserver agreement and accuracy with the revised Prostate Imaging Reporting and Data System at multiparametric MR imaging. Radiology. 2015;277:741-750
  • [26] J.J. Park, B.K. Park. Role of PI-RADSv2 with multiparametric MRI in determining who needs active surveillance or definitive treatment according to PRIAS. J Magn Reson Imaging. 2017;45:1753-1759
  • [27] S.Y. Park, D.C. Jung, Y.T. Oh, et al. Prostate cancer: PI-RADS version 2 helps preoperatively predict clinically significant cancers. Radiology. 2016;280:108-116
  • [28] S. Polanec, T.H. Helbich, H. Bickel, et al. Head-to-head comparison of PI-RADS v2 and PI-RADS v1. Eur J Radiol. 2016;85:1125-1131
  • [29] A.R. Rastinehad, N. Waingankar, B. Turkbey, et al. Comparison of multiparametric MRI scoring systems and the impact on cancer detection in patients undergoing MR US fusion guided prostate biopsies. PLoS One. 2015;10:e0143404
  • [30] Rosenkrantz AB, Babb JS, Taneja SS, Ream JM. Proposed adjustments to PI-RADS version 2 decision rules: impact on prostate cancer detection. Radiology. In press. doi:10.1148/radiol.2016161124
  • [31] A. Stanzione, M. Imbriaco, S. Cocozza, et al. Biparametric 3T magnetic resonance imaging for prostatic cancer detection in a biopsy-naive patient population: a further improvement of PI-RADS v2? Eur J Radiol. 2016;85:2269-2274
  • [32] Tan N, Lin WC, Khoshnoodi P, et al. In-Bore 3-T MR-guided transrectal targeted prostate biopsy: Prostate Imaging Reporting and Data System version 2-based diagnostic performance for detection of prostate cancer. Radiology. In press. doi:10.1148/radiol.2016152827
  • [33] S. Tewes, N. Mokov, D. Hartung, et al. Standardized reporting of prostate MRI: comparison of the Prostate Imaging Reporting and Data System (PI-RADS) version 1 and version 2. PLoS One. 2016;11:e0162879
  • [34] S. Washino, T. Okochi, K. Saito, et al. Combination of PI-RADS score and PSA density predicts biopsy outcome in biopsy naive patients. BJU Int. 2017;119:225-233
  • [35] S. Woo, S.Y. Kim, J. Lee, S.H. Kim, J.Y. Cho. PI-RADS version 2 for prediction of pathological downgrading after radical prostatectomy: a preliminary study in patients with biopsy-proven Gleason Score 7 (3 + 4) prostate cancer. Eur Radiol. 2016;26:3580-3587
  • [36] C. Zhao, G. Gao, D. Fang, et al. The efficiency of multiparametric magnetic resonance imaging (mpMRI) using PI-RADS Version 2 in the diagnosis of clinically significant prostate cancer. Clin Imaging. 2016;40:885-888
  • [37] M. de Rooij, E.H. Hamoen, J.J. Futterer, J.O. Barentsz, M.M. Rovers. Accuracy of multiparametric MRI for prostate cancer detection: a meta-analysis. Am J Roentgenol. 2014;202:343-351
  • [38] Z.K. Shah, S.N. Elias, R. Abaza, et al. Performance comparison of 1.5-T endorectal coil MRI with 3.0-T nonendorectal coil MRI in patients with prostate cancer. Acad Radiol. 2015;22:467-474
  • [39] Y. Mazaheri, H.A. Vargas, G. Nyman, O. Akin, H. Hricak. Image artifacts on prostate diffusion-weighted magnetic resonance imaging: trade-offs at 1.5 Tesla and 3.0 Tesla. Acad Radiol. 2013;20:1041-1047
  • [40] J.J. Deeks. Systematic reviews in health care: systematic reviews of evaluations of diagnostic and screening tests. BMJ. 2001;323:157-162
  • [41] J.J. Deeks, S. Wisniewski, C. Davenport. Guide to the contents of a Cochrane diagnostic test accuracy protocol. In: J.J. Deeks, P.M. Bossuyt, C. Gatsonis, editors. Cochrane handbook for systematic reviews of diagnostic test accuracy version 1.0.0. Chapter 4. (The Cochrane Collaboration, 2013) http://srdta.cochrane.org/

Prostate cancer (PCa) is the second leading cause of cancer-related mortality in Western men [1]. With recent technological advancements and growing availability, multiparametric magnetic resonance imaging (mpMRI) is increasingly being used to guide several aspects of PCa management, including detection, staging, and treatment planning [2]. Despite abundant evidence in the literature reporting high accuracy of mpMRI for PCa diagnosis, widespread acceptance has been hampered by several factors, including difficulty of interpretation, lack of standardized criteria for interpretation (ie, use of Likert scales based on the radiologist's subjective level of suspicion for PCa), and the resulting substantial inter-reader variability [3 4].

To bring standardization to the evaluation and reporting of mpMRI of the prostate, the European Society of Urogenital Radiology (ESUR) published a guideline termed Prostate Imaging Reporting and Data System (PI-RADS) in 2012 [5] . PI-RADS was generated based on expert consensus and provides a detailed scoring system for each MRI sequence (T2-weighted imaging [T2WI], diffusion-weighted imaging [DWI], dynamic contrast-enhanced MRI [DCE-MRI], and MR spectroscopy) for the presence of clinically significant PCa (csPCa). Several investigators have validated the accuracy and reproducibility of PI-RADS, and a recent meta-analysis reported pooled sensitivity and specificity of 0.78 and 0.79, respectively [6] . However, as there was no guideline for the generation of an overall score, different research groups utilized various measures for this purpose—some used a sum of the scores from each sequence (ranging from 3 to 15), whereas others used an overall score of 1–5 [7 8] . Furthermore, emerging data questioned the value of curve-type analysis of DCE-MRI [9] . In addition, investigators suggested that some sequences may be more important in determining the likelihood of PCa (ie, DWI in the peripheral zone [PZ] and T2WI in the transition zone [TZ]) rather than equal weighting for all sequences [10] .

To address these issues, the ESUR and American College of Radiology recently released the updated PI-RADS version 2 (PI-RADSv2) [11] . The main changes from PI-RADSv1 to PI-RADSv2 are the following: (1) introduction of dominant sequences according to zonal anatomy (DWI for the PZ and T2WI for the TZ), (2) limited contribution of DCE-MRI data as merely presence and absence of early focal enhancement, and (3) generation of an overall score (1–5) integrating findings across all MRI sequences.

Since then, several studies assessing the value of PI-RADSv2 have been published. However, the diagnostic performance of this new scoring system has not been evaluated systematically. Therefore, the purpose of our study was to assess the diagnostic performance of PI-RADSv2 for the detection of PCa. In addition, we aimed to compare the diagnostic performance of PI-RADSv1 and PI-RADSv2 in studies available for head-to-head comparison.

This meta-analysis was performed and written according to the Preferred Reporting Items for Systematic Reviews and Meta-analyses (PRISMA) guidelines [12].

2.1 Literature search

A computerized search of MEDLINE and EMBASE up to December 7, 2016, was performed in order to identify studies evaluating the diagnostic performance of PI-RADSv2 for the detection of PCa. The search query combined synonyms for PCa, MRI, and PI-RADS as follows: (prostate cancer OR prostatic cancer OR prostate neoplasm OR prostatic neoplasm OR prostate tumor OR prostatic tumor OR prostate carcinoma OR prostatic carcinoma OR PCa) AND (magnetic resonance imaging OR MRI OR MR) AND (prostate imaging reporting and data system OR pi-rads OR pi rads OR pirads). Bibliographies of identified articles were also screened to expand the scope of search. Our search was limited to publications in English.
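
As a minimal sketch of how the three synonym groups above combine into one Boolean string, the query can be assembled programmatically. The helper name `build_query` and the variable names are illustrative assumptions; the terms themselves are copied from the search strategy described above, and database-specific field tags or limits are not modelled.

```python
# Synonym groups copied from the search strategy in the text.
PCA_TERMS = [
    "prostate cancer", "prostatic cancer", "prostate neoplasm",
    "prostatic neoplasm", "prostate tumor", "prostatic tumor",
    "prostate carcinoma", "prostatic carcinoma", "PCa",
]
MRI_TERMS = ["magnetic resonance imaging", "MRI", "MR"]
PIRADS_TERMS = [
    "prostate imaging reporting and data system",
    "pi-rads", "pi rads", "pirads",
]


def build_query(*groups):
    """Join terms within a group with OR and the groups with AND (hypothetical helper)."""
    return " AND ".join("(" + " OR ".join(group) + ")" for group in groups)


query = build_query(PCA_TERMS, MRI_TERMS, PIRADS_TERMS)
print(query)
```

Keeping each concept in its own list makes it easy to see that a record must match at least one term from every group to be retrieved.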

2.2 Study selection

2.2.1 Inclusion criteria

Studies were included if they satisfied all the following requirements according to the PICOS criteria [12] : (1) included patients with suspected or diagnosed PCa; (2) for index test, mpMRI of the prostate including all required sequences of T2WI, DWI, and DCE-MRI was performed and assessed with a PI-RADSv2 scoring system; (3) for comparison, a reference standard based on the histopathological examination of radical prostatectomy or biopsy was used; (4) results were reported in sufficient detail for the reconstruction of 2 × 2 tables and determination of sensitivity and specificity at specified cutoff values for evaluating the diagnostic performance of PI-RADSv2; and (5) studies had to be original articles.
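
Criterion (4) matters because sensitivity and specificity follow directly from the four cells of a 2 × 2 table. A minimal sketch is given below; the function name and the counts are hypothetical and are not taken from any included study.

```python
def diagnostic_accuracy(tp, fp, fn, tn):
    """Return (sensitivity, specificity) from the four cells of a 2 x 2 table."""
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one diseased and one non-diseased case")
    sensitivity = tp / (tp + fn)  # diseased cases correctly called positive
    specificity = tn / (tn + fp)  # non-diseased cases correctly called negative
    return sensitivity, specificity


# Hypothetical counts at one PI-RADSv2 cutoff, for illustration only.
sens, spec = diagnostic_accuracy(tp=80, fp=30, fn=10, tn=70)
print(f"sensitivity = {sens:.2f}, specificity = {spec:.2f}")
```

A study that reports only an area under the curve, or only one of the two measures without the underlying counts, cannot be converted into such a table and would therefore not meet this criterion.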

2.2.2 Exclusion criteria

Studies were excluded if any of the following criteria were met: (1) studies involving <10 patients; (2) review articles, guidelines, consensus statements, letters, editorials, and conference abstracts; (3) studies using only PI-RADSv1 for the evaluation of mpMRI of the prostate; (4) studies focusing on topics other than using the PI-RADSv2 system for diagnosing PCa (ie, staging and prediction of biochemical recurrence); and (5) studies with overlapping patient population.

Two reviewers (S.W. and C.H.S., with 3 yr of experience in performing systematic reviews and meta-analyses) independently evaluated the eligibility of the selected studies from the literature. Disagreements, if present between the two reviewers, were resolved by consensus via discussion with a third reviewer (S.Y.K.).

2.3 Data extraction and quality assessment

We extracted the following data regarding study design and results from the selected studies using a standardized form:

  1. Study characteristics: authors, year of publication, country and institution of origin, duration of patient recruitment, and study design (prospective vs retrospective and consecutive or not)
  2. Demographic and clinical characteristics: sample size, number of patients with PCa, patient age, prostate-specific antigen (PSA) level and Gleason score, number of previous biopsies, and PCa diagnosis prior to mpMRI
  3. Technical characteristics of mpMRI: scanner model and manufacturer, magnetic field strength (1.5 vs 3 T), coil type (endorectal vs pelvic phased array), and specific sequences used (T2WI, DWI, DCE, or MR spectroscopy)
  4. Interpretation of mpMRI: number of reviewers and experience in prostate mpMRI, independent or consensus reading, and blinding to clinicopathological information
  5. Reference standard: type of reference standard (radical prostatectomy, targeted biopsy, or systematic biopsy), interval between MRI and pathology, outcomes assessed (any PCa vs csPCa), definition of csPCa (studies assessing “clinically significant”, “aggressive”, or “high-grade” PCa were all considered to assess csPCa; however, only studies that used the definition provided by the PI-RADSv2 guideline [Gleason score ≥7 (3 + 4), volume >0.5 ml, or extraprostatic extension] were considered not to have concern for applicability), separate analysis for the PZ and TZ, and type of analysis (per patient vs per lesion)
  6. Diagnostic performance of PI-RADSv2, including criteria or cutoff values (in case of multiple readers, the results of the most experienced reader were extracted for this meta-analysis)

The methodological quality of the included studies was assessed using tailored questionnaires and criteria provided by Quality Assessment of Diagnostic Accuracy Studies-2 [13]. Data extraction and quality assessment were performed independently by two reviewers (S.W. and C.H.S.). All disagreements were resolved by consensus through discussion with the third reviewer (S.Y.K.).

2.4 Data synthesis and analysis

The diagnostic performance of PI-RADSv2 for the detection of PCa was the primary outcome for this meta-analysis. In addition, a comparison between the diagnostic performance of PI-RADSv2 and that of PI-RADSv1 using studies that reported head-to-head comparison data of the two PI-RADS versions was considered a secondary outcome.

Pooled estimates of sensitivity and specificity were calculated using hierarchical logistic regression modeling, including bivariate modeling and hierarchical summary receiver operating characteristic (HSROC) modeling [14]. For graphical presentation of the results, an HSROC curve with 95% confidence region and prediction region was plotted. Publication bias was evaluated using Deeks’ funnel plot, and statistical significance was tested with Deeks’ asymmetry test [15].
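
To illustrate the idea behind the funnel plot asymmetry test, the sketch below regresses the log diagnostic odds ratio on the inverse square root of the effective sample size, weighted by the effective sample size. This is a simplified pure-Python illustration and not the Stata routine used in the study; the function name, the 0.5 continuity correction, and the example tables are assumptions.

```python
import math


def deeks_regression(tables):
    """Weighted regression of ln(DOR) on 1/sqrt(ESS), with weights = ESS.

    tables: list of (tp, fp, fn, tn) tuples, one per study.
    Returns (slope, standard_error); a slope far from zero relative to its
    standard error suggests funnel plot asymmetry.
    """
    if len(tables) < 3:
        raise ValueError("need at least three studies")
    xs, ys, ws = [], [], []
    for tp, fp, fn, tn in tables:
        if 0 in (tp, fp, fn, tn):
            # Assumed continuity correction so the odds ratio stays finite.
            tp, fp, fn, tn = (c + 0.5 for c in (tp, fp, fn, tn))
        n_dis, n_non = tp + fn, fp + tn
        ess = 4 * n_dis * n_non / (n_dis + n_non)   # effective sample size
        xs.append(1 / math.sqrt(ess))
        ys.append(math.log((tp * tn) / (fp * fn)))  # ln diagnostic odds ratio
        ws.append(ess)

    k, sw = len(tables), sum(ws)
    x_bar = sum(w * x for w, x in zip(ws, xs)) / sw
    y_bar = sum(w * y for w, y in zip(ws, ys)) / sw
    sxx = sum(w * (x - x_bar) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - x_bar) * (y - y_bar) for w, x, y in zip(ws, xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    rss = sum(w * (y - intercept - slope * x) ** 2
              for w, x, y in zip(ws, xs, ys))
    se = math.sqrt(rss / (k - 2) / sxx)
    return slope, se


# Hypothetical 2 x 2 tables (tp, fp, fn, tn); not data from the included studies.
studies = [(90, 10, 10, 90), (40, 8, 5, 30), (25, 3, 6, 20), (150, 40, 20, 120)]
slope, se = deeks_regression(studies)
print(f"slope = {slope:.2f}, SE = {se:.2f}, "
      f"t = {slope / se:.2f} on {len(studies) - 2} df")
```

The ratio of slope to standard error would then be compared against a t distribution with k − 2 degrees of freedom; that final step is omitted here to keep the sketch free of external dependencies.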

We performed meta-regression analyses to investigate the cause of heterogeneity. The following covariates were considered for the bivariate model: (1) proportion of patients with PCa (>50% vs ≤50%), (2) magnet strength of MRI (3 vs 1.5 T), (3) use of endorectal coil, (4) cutoff value (≥4 vs ≥3), (5) reference standard (radical prostatectomy vs biopsy), and (6) type of analysis (per patient vs per lesion). In addition, multiple subgroup analyses were performed for cutoff value, outcome, and previous biopsy history to assess various clinical settings: (1) a cutoff value of ≥4 for all studies, (2) a cutoff value of ≥3 for all studies, (3) a cutoff value of ≥4 for determining any PCa, (4) a cutoff value of ≥3 for determining any PCa, (5) a cutoff value of ≥4 for determining csPCa, (6) a cutoff value of ≥3 for determining csPCa, (7) a cutoff value of ≥4 in studies using per-patient analysis, (8) a cutoff value of ≥4 in studies using per-lesion analysis, (9) studies analyzing PZ PCa, (10) studies analyzing TZ PCa, (11) patients without previous biopsies, and (12) patients with previous biopsies. The “metandi” and “midas” modules in Stata 10.0 (StataCorp LP, College Station, TX, USA) and “mada” package in R software version 3.2.1 (R Foundation for Statistical Computing, Vienna, Austria) were used for statistical analyses, with p < 0.05 signifying statistical significance.

3.1 Literature search

A systematic literature search initially identified 287 articles. After removing 46 duplicates, screening of the 241 titles and abstracts yielded 105 potentially eligible articles. Full-text reviews were performed, and 84 studies were excluded for the following reasons: not in the field of interest (n = 80, including 68 studies that used only PI-RADSv1), insufficient data to reconstruct 2 × 2 tables (n = 2), and shared study population with other studies (n = 2). Ultimately, 21 original articles including a total of 3857 patients assessing the diagnostic performance of PI-RADSv2 were included in the meta-analysis [16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36]. No additional studies were identified via screening the bibliographies of these 21 studies. Among them, 15 studies including 3099 patients dealt with PI-RADSv2 alone, whereas six studies including 758 patients provided a head-to-head comparison between PI-RADSv1 and PI-RADSv2 [16 20 21 28 32 33]. The detailed study selection process is described in Fig. 1.
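
The counts in this paragraph can be checked for internal consistency with a few lines of arithmetic. The sketch below uses only the numbers reported above; the variable names are illustrative.

```python
# Counts as reported in the literature search paragraph above.
identified = 287
duplicates = 46
potentially_eligible = 105
excluded_full_text = {
    "not in the field of interest": 80,
    "insufficient data for 2 x 2 tables": 2,
    "shared study population": 2,
}
# (number of studies, number of patients) per group of included studies
included_groups = {"PI-RADSv2 alone": (15, 3099), "head-to-head": (6, 758)}

screened = identified - duplicates
included = potentially_eligible - sum(excluded_full_text.values())
n_studies = sum(studies for studies, _ in included_groups.values())
n_patients = sum(patients for _, patients in included_groups.values())

print(screened, included, n_studies, n_patients)  # 241 21 21 3857
```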

Fig. 1
PRISMA flow diagram showing study selection process for meta-analysis. a Included original articles for qualitative and quantitative analyses are references [16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36]. MRI = magnetic resonance imaging; PI-RADS = Prostate Imaging Reporting and Data System; PRISMA = Preferred Reporting Items for Systematic Reviews and Meta-analyses.

3.2 Characteristics of included studies

Patient characteristics are shown in Table 1. The size of the study population ranged from 49 to 456 patients, with the percentage of those with PCa ranging from 37% to 100%. The patients had a median age of 62–69.6 yr, median PSA of 3.97–15 ng/ml, and a Gleason score ranging from 5 to 10. Patients had already been diagnosed with PCa prior to MRI in all or some of the study populations in seven studies [16 17 22 26 27 32 35]. Biopsy was performed before MRI in seven studies [16 21 22 23 26 27 35], all patients were biopsy-naïve in three studies [18 25 34], both patient types were included in three studies [28 32 33], and data regarding previous biopsy were not reported in the remaining eight studies.

| First author (year) | Country | Institution | Duration of patient recruitment | Patients (n) | Patients with PCa (n) | Age (yr), median | Age (yr), range | PSA (ng/ml), median | PSA (ng/ml), range | Gleason score, median | Gleason score, range | No. of previous biopsies | PCa diagnosis before MRI |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Auer (2016) [16] | Austria | Medical University of Innsbruck, Medizinische Hochschule Hannover | NR | 50 | NR | 63 a | NR | 7.3 a | NR | 7 (3 + 4) | 6–10 | ≥1 | Yes (all) |
| Baldisserotto (2016) [17] | Brazil | Pontifícia Universidade Católica do Rio Grande do Sul | 2013.6–2015.6 | 54 | 33 | 65.9 a | 53–81 | 8.4 a | 3–31 | 7 (3 + 4) | 6–9 | NR | Yes (some) |
| De Visschere (2017) [18] | Belgium | Ghent University Hospital | 2011.5–2014.12 | 245 | 144 | 66 | 44–85 | 9 | 1.4–935.5 | 7 (3 + 4) | ≥6 | 0 | No |
| El-Samei (2016) [19] | Egypt | Al-Azher University, El-Minia University | 2014.5–2015.10 | 55 | 38 | 62 a | 51–79 | NR | NR | 7 | 6–10 | NR | No |
| Feng (2016) [20] | China | Tongji Hospital | 2013.6–2015.7 | 401 | 150 | 64.4 a | 34–88 | 10.7 | 0.2–1763 | 7 (4 + 3) | ≤6–≥8 | NR | NR |
| Kasel-Seibert (2016) [21] | Germany | University Hospital Jena | 2013.7–2015.3 | 82 | 31 | 65 | 48–81 | 13 | 1–111 | 7 (3 + 4) | 6–9 | 1–5 | No |
| Lin (2016) [22] | Brazil, Taiwan | Ribeirão Preto School of Medicine, China Medical University, Ribeirão Preto School of Medicine | 2011.5–2014.6 | 49 | 49 | 63 | 46–73 | 13.27 | 1.8–41.4 | | | ≥1 | Yes (all) |
| Martorana (2016) [23] | Italy | University of Modena and Reggio Emilia, Perugia University | 2014.1–2016.2 | 157 | 79 | 65 | 47–79 b | 10.7 | 1–75 | 6 | 6–8 | 1–3 | No |
| Mertan (2016) [24] | USA | National Cancer Institute | 2015.3–2015.9 | 62 | 38 | 65.5 | 50.3–76.6 | 7.1 | 0.5–863 | 7 (3 + 4) | 6–9 | NR | NR |
| Muller (2015) [25] | USA, the Netherlands | National Cancer Institute, AMC University Hospital, Edward Hébert School of Medicine | 2011.12–2014.5 | 94 | 94 | 62 | 37–79 | 8.51 | 0.7–51.1 | 7 (3 + 4) | 6–10 | 0 | No |
| Park (2016) [26] | South Korea | Samsung Medical Center | 2012.1–2014.12 | 456 | 456 | 65 | 46–81 | 3.97 | 3.8–4.2 | NR | ≥6 | ≥1 | Yes (all) |
| Park (2016) [27] | South Korea | Yonsei University College of Medicine | 2012.1–2013.3 | 425 | 425 | NR | NR | NR | 1.4–156.9 | 5–6 | 5–10 | ≥1 | Yes (all) |
| Polanec (2016) [28] | Austria | Medical University of Vienna, Confraternität Vienna | 2011.6–2015.9 | 65 | NR | 65.3 a | 62.3–87.4 | 10.8 a | 4.2–74.5 | 7 (4 + 3) | 6–9 | ≥0 | No |
| Rastinehad (2015) [29] | USA | Icahn School of Medicine at Mount Sinai, Fox Chase Cancer Center, National Institutes of Health, Hofstra North Shore LIJ School of Medicine | 2012.2–2014.11 | 312 | 202 | 65.1 | 60.3–70.3 b | 7.3 | 5.0–11.4 b | NR | NR | NR | No |
| Rosenkrantz (2016) [30] | USA | NYU Langone Medical Center | 2013.9–2015.2 | 343 | 134 | 64 | NR | 5.8 | NR | 7 (3 + 4) | 6–9 | NR | NR |
| Stanzione (2016) [31] | Italy | University “Federico II”, Ospedale S. Maria delle Grazie | NR | 82 | 34 | 65 a | 43–84 | 8.8 a | NR | 7 (3 + 4) | 6–9 | NR | No |
| Tan (2016) [32] | USA, Taiwan | UCLA, Cathay General Hospital, Hoag Hospital | 2013.3–2016.12 | 106 | 63 | 66.5 | 43–79 | 7.9 | 5.6–10.6 | 7 (3 + 4) | 6–10 | ≥0 | Yes (some) |
| Tewes (2016) [33] | Germany | Hannover Medical School, Klinikum der Region Hannover | 2012.12–2014.12 | 54 | 31 | 69.6 a | NR | 8.7 a | NR | 6 | 6–9 | ≥0 | No |
| Washino (2017) [34] | Japan | Jichi Medical University Saitama Medical Center | 2010.6–2014.4 | 288 | 159 | 69 | 64–74 b | 7.5 | 5.5–11 b | NR | NR | 0 | No |
| Woo (2016) [35] | South Korea | Seoul National University College of Medicine | 2011.1–2013.12 | 105 | 105 | 69 | 49–79 | 8.22 | 0.9–44.2 | 7 | 6–7 | ≥1 | Yes (all) |
| Zhao (2016) [36] | China | Peking University First Hospital | 2010.11–2013.12 | 372 | 185 | 68.5 a | NR | 15 a | NR | 7 | 6–10 | NR | No |

a Mean.

b Interquartile range.

MRI = magnetic resonance imaging; NR = not reported; PCa = prostate cancer; PSA = prostate-specific antigen.

Table 1. Patient characteristics
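The reported range for the percentage of patients with PCa (37% to 100%) can be checked directly from the patient counts in Table 1. The short sketch below does this; the counts are copied from the table, the two studies that did not report the number of patients with PCa are omitted, and the variable names are illustrative only.

```python
# (patients, patients with PCa) copied from Table 1.
# Auer [16] and Polanec [28] are omitted because the PCa count is NR.
cohorts = {
    "Baldisserotto [17]": (54, 33),
    "De Visschere [18]": (245, 144),
    "El-Samei [19]": (55, 38),
    "Feng [20]": (401, 150),
    "Kasel-Seibert [21]": (82, 31),
    "Lin [22]": (49, 49),
    "Martorana [23]": (157, 79),
    "Mertan [24]": (62, 38),
    "Muller [25]": (94, 94),
    "Park [26]": (456, 456),
    "Park [27]": (425, 425),
    "Rastinehad [29]": (312, 202),
    "Rosenkrantz [30]": (343, 134),
    "Stanzione [31]": (82, 34),
    "Tan [32]": (106, 63),
    "Tewes [33]": (54, 31),
    "Washino [34]": (288, 159),
    "Woo [35]": (105, 105),
    "Zhao [36]": (372, 185),
}

# Proportion of patients with PCa in each study.
prevalence = {name: with_pca / total for name, (total, with_pca) in cohorts.items()}

lowest = min(prevalence, key=prevalence.get)
highest_value = max(prevalence.values())

print(f"lowest: {lowest} ({prevalence[lowest]:.0%})")
print(f"highest: {highest_value:.0%}")
```

The same proportions are what the meta-regression in Section 3.5 dichotomizes at 50%.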

Characteristics of the studies are summarized in Table 2. MRI was performed using 3-T scanners in 16 studies [16 17 18 20 24 25 26 27 28 29 30 31 32 33 35 36], 1.5-T scanners in four studies [19 21 22 23], and either 3 or 1.5 T in one study [34]. Endorectal coils were used in seven studies [19 21 22 23 24 25 29]. In all studies, the mpMRI protocol consisted of T2WI, DWI, and DCE-MRI. The reference standard was radical prostatectomy in five studies [16 22 26 27 35], a combination of systematic and targeted biopsies in seven studies [17 20 23 25 29 34 36], and only targeted biopsy in seven studies [19 21 24 28 30 32 33]; the reference standard was not consistent throughout the study population in two studies [18 31]. PI-RADSv2 scoring was performed by one to five radiologists, either in consensus or independently. The level of experience of the radiologists was heterogeneous, ranging from 4 to 22 yr of experience in prostate imaging. In most studies, the readers were blinded; however, in three studies, the radiologists were aware that the patients had biopsy-proven PCa [22 26 35], and five studies were not explicit regarding blinding [18 19 30 32 36]. In the majority of the studies, the interval between MRI and the reference standard was less than 6 mo; however, the details were not reported in 10 studies [16 17 18 24 27 28 29 30 32 33]. PCa was separately assessed according to zonal anatomy in seven studies [20 24 25 28 30 31 33]. However, in one study [31], only PCa in the PZ could be evaluated, as no detailed data were provided in the article and the attempt to contact the authors for provision of further information was unsuccessful. Regarding the outcome assessed, eight studies evaluated any cancer [17 19 20 23 25 28 31 33], eight evaluated clinically significant cancer [18 22 26 27 30 34 35 36], and five evaluated both [16 21 24 29 32]. With regard to cutoff values, 13 studies used ≥4 [16 18 19 20 21 24 25 26 27 28 30 31 35], four studies used ≥3 [32 33 34 36], and four studies used both [17 22 23 29]. The location of PCa was separately reported by the PZ and TZ in six studies [20 24 25 28 30 33].

| First author (year) | Design | Consecutive enrollment | Reference standard | MRI–reference standard interval | No. of readers (reading mode) | Reader experience (yr) | Reader blinding | Magnet strength (T) | Vendor | Model | Endorectal coil | PI-RADSv2 application | Cutoff values | Localization | Type of analysis | Outcome assessed | Definition of csPCa |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Auer (2016) [16] | Retrospective | Yes | RP | NR | 1 | >5 | Yes | 3 | Siemens | Skyra | No | Strict | 4 | Whole | Lesion | Any + csPCa | GS ≥7 (4 + 3) |
| Baldisserotto (2016) [17] | Retrospective | Yes | STRUSGB + targeted MRI–TRUS biopsy (cognitive) or RP | NR | 2 (independent) | >10/1 | Yes | 3 | GE | SignaHDxt | No | Strict | 3, 4 | Whole | Patient | Any |  |
| De Visschere (2017) [18] | Retrospective | NR | STRUSGB or RP | NR | NR | NR | NR | 3 | Siemens | Trio | No | Strict | 4 | Whole | Patient | csPCa | GS ≥7, ≥0.5 cc, or EPE |
| El-Samei (2016) [19] | NR | NR | Targeted MRI–TRUS biopsy | ≤2 wk | NR | NR | NR | 1.5 | Philips | Gyroscan | Yes | Strict | 4 | Whole | Lesion | Any |  |
| Feng (2016) [20] | Retrospective | Yes | STRUSGB + targeted MRI–TRUS biopsy (cognitive) | 1–35 d | 2 (consensus) | 5/4 | Yes | 3 | Siemens | Skyra | No | Strict | 4 | PZ, TZ | Patient | Any |  |
| Kasel-Seibert (2016) [21] | Retrospective | NR | Targeted MRGB | ≤7 d | 2 (independent) | 10/<1 | Yes | 1.5 | Siemens | Avanto | Yes | Strict | 4 | Whole | Lesion | Any + csPCa | GS ≥7 |
| Lin (2016) [22] | Retrospective | NR | RP | ≤6 mo | 2 (independent) | 12/5 | Yes a | 1.5 | Philips | Achieva | Yes | Strict | 3, 4 | Whole | Lesion | csPCa | GS ≥7 and >0.5 cc |
| Martorana (2016) [23] | Retrospective | Yes | TTB + targeted MRI–TRUS biopsy | 30–78 | 2 (consensus) | 4/4 | Yes | 1.5 | Philips | Achieva | Yes | Strict | 3, 4 | Whole | Lesion | Any | GS ≥7, >0.5 cc, or EPE |
| Mertan (2016) [24] | Prospective | Yes | STRUSGB + targeted MRI–TRUS biopsy | NR | 1 | >8 | Yes | 3 | Philips | Achieva | Yes | Strict | 4 | PZ, TZ | Lesion | Any + csPCa | GS ≥7 |
| Muller (2015) [25] | Retrospective | Yes | STRUSGB + targeted MRI–TRUS biopsy | ≤6 wk | 5 (independent) | 12/7/1/1/0.5 | Yes | 3 | Philips | Achieva | Yes | Strict | 4 | PZ, TZ | Lesion | Any | GS ≥7 (4 + 3) |
| Park (2016) [26] | Retrospective | Yes | RP | 21–48 | 2 (independent) | 14/3 | Yes a | 3 | Philips | Achieva | No | Strict | 4 | Whole | Patient | csPCa | GS ≥7, ≥0.5 cc, or EPE |
| Park (2016) [27] | Retrospective | NR | RP | NR | 2 (independent) | 9/4 | Yes | 3 | GE, Philips, Siemens | DiscoveryMR750, Achieva, Trio | No | Strict | 4 | Whole | Patient | csPCa | GS ≥7, ≥0.5 cc, or EPE |
| Polanec (2016) [28] | Retrospective | Yes | Targeted MRGB | NR | 2 (independent) | NR/NR | Yes | 3 | Siemens | Trio | No | Strict | 4 | PZ, TZ | Lesion | Any |  |
| Rastinehad (2015) [29] | Prospective | NR | STRUSGB + targeted MRI–TRUS biopsy | NR | 3 (consensus) | NR | Yes | 3 | Siemens | Verio | Yes | Calculated b | 3, 4 | Whole | Patient | Any + csPCa | Epstein's criteria c for systematic and GS ≥7 or >0.5 cc for targeted biopsy |
| Rosenkrantz (2016) [30] | Retrospective | NR | Targeted MRI–TRUS biopsy | NR | 2 (independent) | 8/3 | NR | 3 | Siemens | Biograph mMR/Prisma/Skyra/Trio | No | Strict | 4 | PZ, TZ | Lesion | csPCa | GS ≥7 |
| Stanzione (2016) [31] | Prospective | NR | STRUSGB (±targeted TRUSGB) | 20–30 d | 2 (independent) | 14/10 | Yes | 3 | Siemens | Trio | No | Strict | 4 | PZ | Patient | Any |  |
| Tan (2016) [32] | Retrospective | Yes | Targeted MRGB | NR | 3 (consensus) | 16/10 | NR | 3 | Siemens | Skyra, Trio, Verio | No | Strict | 3 | Whole | Lesion | Any + csPCa | GS ≥7 |
| Tewes (2016) [33] | Retrospective | Yes | Targeted MRGB | NR | 2 (independent) | 5/2 | Yes | 3 | Siemens | Skyra | No | Strict | 3 d | PZ, TZ | Patient | Any |  |
| Washino (2017) [34] | Retrospective | NR | TTB + targeted MRI–TRUS biopsy (cognitive) | 0.5–1.6 mo | 1 | 14 | Yes | 1.5 or 3 | Toshiba | Excelart Vantage/Vantage Titan 3T | No | Strict | 3 | Whole | Patient | csPCa | GS ≥7 or maximum core length ≥4 mm |
| Woo (2016) [35] | Retrospective | NR | RP | ≤6 mo | 2 (consensus) | 22/10 | Yes a | 3 | Philips, Siemens | Ingenia, Verio/Trio | No | Strict | 4 | Whole | Patient | csPCa | GS ≥7 |
| Zhao (2016) [36] | Retrospective | NR | STRUSGB + targeted MRI–TRUS biopsy | ≤3 mo | 2 (independent) | NR | NR | 3 | NR | NR | No | Strict | 3 | Whole | Patient | csPCa | GS ≥7 |

a Blinded but aware that patients had biopsy-proven PCa.

b PI-RADSv2 scores were generated from reports based on PI-RADSv1 and simplified qualitative system.

c Epstein's criteria = Gleason pattern ≥4, or Gleason 3 + 3 disease with core length ≥50% and/or >2 cores positive.

d For transition zone, cutoff value = 4.

csPCa = clinically significant prostate cancer; GS = Gleason score; EPE = extraprostatic extension; MRGB = magnetic resonance imaging-guided biopsy; MRI = magnetic resonance imaging; MRI–TRUS = fusion of magnetic resonance imaging and transrectal ultrasound images; NR = not reported; PCa = prostate cancer; PI-RADSv2 = Prostate Imaging Reporting and Data System version 2; PZ = peripheral zone; RP = radical prostatectomy; STRUSGB = systematic transrectal ultrasound-guided biopsy; TTB = transperineal template biopsy; TZ = transition zone.

Table 2. Study characteristics

3.3 Quality assessment

Overall, the quality of the studies was not considered high, mainly due to the patient selection domain (Fig. 2). Regarding the patient selection domain, there was generally a high risk of bias as all but four of the studies were retrospective in nature [16 17 18 20 21 22 23 25 26 27 28 30 32 33 34 35 36]. Seven studies were considered to have high concern for applicability, as all or some of the patients had a pathological diagnosis of PCa prior to MRI [16 17 22 26 27 32 35]. Regarding the index test domain, there was a high risk of bias in nine studies. In three of these nine studies, reviewers were aware that patients had biopsy-proven PCa [22 26 35]. In the other six studies, the cutoff value for determining PCa was not specified prior to interpretation [16 20 29 31 33 34]. Only one study had concern for applicability, as PI-RADSv2 scores were indirectly generated from existing clinical radiological reports based on PI-RADSv1 or an in-house scoring system [29]. Regarding the reference standard domain, eight studies had a high risk of bias. Seven were based on either only systematic biopsy or only targeted biopsy [18 19 21 28 30 32 33]; in one study, targeted biopsy was performed, but on lesions that were suspicious on ultrasonography, and not on MRI [31]. Studies in which radical prostatectomy or a systematic plus targeted biopsy (MRI guided, MRI–transrectal ultrasound fusion, or cognitive) was used as the reference standard were considered to have a low risk of bias. In 10 studies, the definition of clinically significant cancer did not abide by that described in the PI-RADSv2 guidelines, and these studies therefore showed high concern for applicability [16 21 22 24 29 30 32 34 35 36]. Regarding the flow and timing domain, two studies had a high risk of bias as patients did not receive the same reference standard [17 18].


Fig. 2
Grouped bar charts show (A) risk of bias and (B) concerns for applicability of 21 included studies using QUADAS-2. QUADAS-2 = Quality Assessment of Diagnostic Accuracy Studies-2.

3.4 Diagnostic accuracy of PI-RADSv2

The sensitivity and specificity of individual studies were 73–100% and 8–100%, respectively. The Q-test revealed that substantial heterogeneity was present (p < 0.001). The Higgins I² statistics demonstrated substantial heterogeneity in terms of the sensitivity (I² = 85.55%) and considerable heterogeneity in terms of the specificity (I² = 95.30%). The coupled forest plot of the sensitivity and specificity demonstrated the absence of a threshold effect (Fig. 3). The Spearman correlation coefficient between the sensitivity and the false-positive rate was 0.45 (95% confidence interval [CI] 0.023–0.738), also indicating the lack of a threshold effect.
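For readers unfamiliar with the Higgins statistic, it is conventionally derived from Cochran's Q and its degrees of freedom (number of studies minus one). The sketch below shows that standard relationship; the Q value used is hypothetical and chosen only for illustration, since the article reports the resulting percentages but not Q itself.

```python
def higgins_i2(q: float, n_studies: int) -> float:
    """Return the Higgins I^2 statistic (in percent) from Cochran's Q.

    I^2 = max(0, (Q - df) / Q) * 100, with df = n_studies - 1.
    """
    df = n_studies - 1
    if q <= 0:
        return 0.0
    return max(0.0, (q - df) / q) * 100.0


# Hypothetical Q for 21 studies (assumption; not a value from the article).
q_example = 140.0
print(f"I^2 = {higgins_i2(q_example, 21):.2f}%")
```

Values of Q at or below the degrees of freedom give 0%, which is why the function clamps at zero.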


Fig. 3
Coupled forest plots of pooled sensitivity and specificity. Numbers are pooled estimates with 95% CI in parentheses. Corresponding heterogeneity statistics are provided at bottom right corners. Horizontal lines indicate 95% CIs. CI = confidence interval.

For all 21 studies combined, the pooled sensitivity was 0.89 (95% CI 0.86–0.92) with specificity of 0.73 (95% CI 0.60–0.83; Fig. 3 ). In the HSROC curve, there was a large difference between the 95% confidence region and the 95% prediction region, thus indicating heterogeneity between the studies ( Fig. 4 ). The area under the HSROC curve was 0.91 (95% CI 0.88–0.93). According to the Deeks’ funnel plot, the likelihood of publication bias was low, with a p value of 0.75 for the slope coefficient ( Fig. 5 ).
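To put the pooled estimates in more clinical terms, likelihood ratios can be derived from them. The sketch below is a naive plug-in calculation from the pooled point estimates quoted above (0.89 and 0.73); it is not an output of the hierarchical model used in this meta-analysis, and the function name is illustrative.

```python
def likelihood_ratios(sensitivity: float, specificity: float) -> tuple[float, float]:
    """Return (LR+, LR-) for a test with the given sensitivity and specificity."""
    lr_pos = sensitivity / (1.0 - specificity)   # how much a positive result raises the odds
    lr_neg = (1.0 - sensitivity) / specificity   # how much a negative result lowers the odds
    return lr_pos, lr_neg


# Pooled point estimates for all 21 studies, taken from the text.
lr_pos, lr_neg = likelihood_ratios(0.89, 0.73)
print(f"LR+ = {lr_pos:.2f}, LR- = {lr_neg:.2f}")
```

Because confidence intervals are ignored here, the two ratios should be read as rough orientation only.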


Fig. 4
Hierarchical summary receiver operating characteristic curve of the diagnostic performance of PI-RADSv2 for detecting prostate cancer. HSROC = hierarchical summary receiver operating characteristic; PI-RADSv2 = Prostate Imaging Reporting and Data System version 2.


Fig. 5
Deeks’ funnel plot. A p value of 0.75 suggests that the likelihood of publication bias is low.
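In a Deeks' funnel plot, each study contributes one point: the log diagnostic odds ratio plotted against the inverse square root of the effective sample size, and the slope of the fitted regression line is then tested. The sketch below computes those two coordinates for a single study. The 2 × 2 counts are hypothetical, and adding 0.5 to every cell is an assumption made here for simplicity (conventions for the continuity correction vary).

```python
import math


def deeks_point(tp: int, fp: int, fn: int, tn: int) -> tuple[float, float]:
    """Return (ln DOR, 1/sqrt(ESS)) for one study's 2x2 table."""
    # Continuity-corrected cells, used only for the odds ratio (assumption).
    a, b, c, d = tp + 0.5, fp + 0.5, fn + 0.5, tn + 0.5
    ln_dor = math.log((a * d) / (b * c))

    # Effective sample size from the raw numbers of diseased and non-diseased.
    n_diseased = tp + fn
    n_healthy = fp + tn
    ess = 4.0 * n_diseased * n_healthy / (n_diseased + n_healthy)
    return ln_dor, 1.0 / math.sqrt(ess)


# Hypothetical study: 100 patients with PCa and 100 without.
ln_dor, inv_root_ess = deeks_point(tp=90, fp=30, fn=10, tn=70)
print(f"ln DOR = {ln_dor:.3f}, 1/sqrt(ESS) = {inv_root_ess:.4f}")
```

A slope close to zero across all studies, as suggested by the reported p value of 0.75, is what indicates a low likelihood of publication bias.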

For the six studies that provided a head-to-head comparison between PI-RADSv1 and PI-RADSv2, PI-RADSv2 demonstrated higher pooled sensitivity of 0.95 (95% CI 0.85–0.98) compared with 0.88 (95% CI 0.80–0.93) for PI-RADSv1 ( p = 0.04). However, the pooled specificity did not show a significant difference between the two versions of PI-RADS: 0.73 (95% CI 0.47–0.89) for v2 and 0.75 (95% CI 0.36–0.94) for v1 ( p = 0.90).

3.5 Heterogeneity exploration

As we found considerable heterogeneity among the included studies, meta-regression analyses were performed (Supplementary Table 1). Among several potential variables, the proportion of patients with PCa, magnetic field strength, and reference standard were significant factors affecting the heterogeneity (p < 0.01 for all). However, among these three, only specificity according to the proportion of patients with PCa showed a statistically significant and clinically meaningful difference: 0.65 (95% CI 0.52–0.78) in studies with ≤50% of patients with PCa versus 0.86 (95% CI 0.75–0.97) in studies with >50% of patients with PCa. Otherwise, no clinically meaningful differences were seen: for magnet strength (3 vs 1.5 T), sensitivity of 0.90 (95% CI 0.86–0.94) versus 0.89 (95% CI 0.81–0.97, p = 0.03) and specificity of 0.73 (95% CI 0.59–0.86) versus 0.72 (95% CI 0.44–1.00, p = 0.81); for reference standard (radical prostatectomy vs biopsy), sensitivity of 0.89 (95% CI 0.83–0.95) versus 0.91 (95% CI 0.88–0.94, p < 0.01) and specificity of 0.65 (95% CI 0.37–0.94) versus 0.73 (95% CI 0.58–0.87, p = 0.48). Other variables, including the cutoff value, use of endorectal coil, and type of analysis (per patient vs per lesion), were not significant factors (p = 0.32–0.70).

3.6 Subgroup analysis

As four studies used both ≥3 and ≥4 as cutoff values [17 22 23 29], and five studies assessed both any cancer and clinically significant cancer as outcomes [16 21 24 29 32], multiple subgroup analyses were performed in order to assess various clinical settings (Supplementary Table 2). Regarding cutoff values, the pooled sensitivity was 0.89 (95% CI 0.84–0.92) with specificity of 0.74 (95% CI 0.58–0.85) for 17 studies using ≥4 [16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 35], whereas these were 0.95 (95% CI 0.89–0.97) and 0.47 (95% CI 0.21–0.74) in eight studies using ≥3 [17 22 23 29 32 33 34 36]. When we stratified studies according to the outcome assessed, the results were as follows: (1) cutoff of ≥4 for determining any PCa, sensitivity of 0.89 (95% CI 0.83–0.93) with specificity of 0.80 (95% CI 0.62–0.90); (2) cutoff of ≥3 for determining any PCa, sensitivity of 0.96 (95% CI 0.93–0.98) with specificity of 0.49 (95% CI 0.29–0.70); (3) for determining csPCa regardless of cutoff values, sensitivity of 0.89 (95% CI 0.84–0.92) with specificity of 0.64 (95% CI 0.46–0.78); (4) cutoff of ≥4 for determining csPCa, sensitivity of 0.90 (95% CI 0.85–0.94) with specificity of 0.62 (95% CI 0.45–0.77); and (5) cutoff of ≥3 for determining csPCa, sensitivity of 0.96 (95% CI 0.87–0.99) with specificity of 0.29 (95% CI 0.05–0.77). When the studies using a cutoff value of ≥4 were separately assessed according to the type of analysis, per-patient analysis in eight studies [17 18 20 26 27 29 31 35] yielded pooled sensitivity of 0.89 (95% CI 0.81–0.93) with specificity of 0.76 (95% CI 0.60–0.88), whereas per-lesion analysis in nine studies [16 19 21 22 23 24 25 28 30] yielded pooled sensitivity of 0.87 (95% CI 0.83–0.91) with specificity of 0.70 (95% CI 0.44–0.88).

Based on the localization of PCa, the pooled sensitivity was 0.93 (95% CI 0.87–0.96) with specificity of 0.68 (95% CI 0.43–0.86) in seven studies analyzing PZ cancers [20 24 25 28 30 31 33]. The same studies, except for that by Stanzione et al [31], also analyzed TZ cancers; for these six studies, the pooled sensitivity and specificity were 0.88 (95% CI 0.77–0.94) and 0.75 (95% CI 0.59–0.86), respectively.

Studies including patients without previous biopsies yielded sensitivity of 0.82 (95% CI 0.72–0.90) and specificity of 0.75 (95% CI 0.65–0.83), whereas the diagnostic performance values were 0.87 (95% CI 0.80–0.92) and 0.71 (95% CI 0.42–0.89) in studies including patients with a history of previous biopsy.

3.7 Discussion

In our meta-analysis, we assessed the diagnostic accuracy of PI-RADSv2 for detecting PCa. The pooled sensitivity and specificity of all 21 studies were 0.89 (95% CI 0.86–0.92) and 0.73 (95% CI 0.60–0.83), respectively. When comparing our data with the only two existing meta-analyses using mpMRI for detecting PCa, a trend toward higher sensitivity and lower specificity can be inferred. In the study by de Rooij et al [37], which evaluated seven studies using a combination of T2WI, DWI, and DCE-MRI, the pooled sensitivity and specificity were 0.74 (95% CI 0.66–0.81) and 0.88 (95% CI 0.82–0.92), respectively. In a more recent meta-analysis by Hamoen et al [6], which analyzed 14 studies using PI-RADSv1, the pooled sensitivity and specificity were 0.78 (95% CI 0.70–0.84) and 0.79 (95% CI 0.68–0.86), respectively. However, setting the three meta-analyses side by side provides only an indirect comparison. In order to address this issue, we separately assessed a subgroup of studies using both PI-RADSv1 and PI-RADSv2. In a head-to-head comparison between them, PI-RADSv2 demonstrated higher pooled sensitivity (0.95) compared with PI-RADSv1 (0.88, p = 0.04) without a statistically significant difference in specificity (0.73 vs 0.75, p = 0.90). This increase in sensitivity compared with its predecessor may imply that the revisions undertaken during the development of PI-RADSv2, including the introduction of dominant sequences according to zonal anatomy, limited contribution of DCE-MRI secondary to DWI and T2WI, and specific guidelines for deriving an integrated overall score, were, in fact, on the right track. In particular, we speculate that the use of dominant sequences, that is, DWI for the PZ and T2WI for the TZ, may have been crucial for the improved sensitivity without a loss in specificity, as suggested by Baur et al [10].

Considering that one of the main intentions for the generation of PI-RADS was to standardize reporting of mpMRI in order to decrease variability and bring about widespread acceptance and implementation in daily practice, it was promising to find that nearly all (20 of 21) studies used PI-RADSv2 strictly according to published guidelines [11]. Only one study formed PI-RADSv2 scores from existing clinical radiological reports that were based on PI-RADSv1 or an in-house scoring system [29]. This is an improvement when compared with prior studies conducted using PI-RADSv1, where investigators used varying methods in determining the overall score (overall five-point score or sum of the scores from each modality) [6]. Still, there is a need for further clarification regarding the cutoff value for detecting PCa. In the studies included in our meta-analysis, cutoff values were predefined in only six studies, while the majority (15/21) were exploratory in nature, testing multiple criteria. When using a cutoff value of ≥4, sensitivity (0.89) and specificity (0.74) were generally good, whereas using ≥3 yielded excellent sensitivity (0.95) and poor specificity (0.47). These results may be taken into consideration when generating the next updated PI-RADS. For instance, the former may be adequate for general use of PI-RADS, whereas the latter could be indicated when a higher cancer detection rate is clinically required (eg, persistently high PSA level despite a previously negative biopsy).
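The trade-off between the two cutoff values can be illustrated by converting the pooled sensitivity and specificity into predictive values using Bayes' rule. The sketch below does so for a hypothetical pretest probability of 50%; this prevalence is an assumption chosen purely for illustration, and the calculation ignores the confidence intervals around the pooled estimates.

```python
def predictive_values(sens: float, spec: float, prev: float) -> tuple[float, float]:
    """Return (PPV, NPV) for a given sensitivity, specificity, and prevalence."""
    tp = sens * prev                  # true positives per unit population
    fp = (1.0 - spec) * (1.0 - prev)  # false positives
    fn = (1.0 - sens) * prev          # false negatives
    tn = spec * (1.0 - prev)          # true negatives
    return tp / (tp + fp), tn / (tn + fn)


# Pooled estimates from the subgroup analysis; prevalence is hypothetical.
assumed_prevalence = 0.50
cutoffs = {">=4": (0.89, 0.74), ">=3": (0.95, 0.47)}

results = {
    name: predictive_values(sens, spec, assumed_prevalence)
    for name, (sens, spec) in cutoffs.items()
}
for name, (ppv, npv) in results.items():
    print(f"cutoff {name}: PPV = {ppv:.2f}, NPV = {npv:.2f}")
```

Under this assumption, lowering the cutoff to ≥3 raises the negative predictive value at the cost of a lower positive predictive value, which matches the suggested use of ≥3 when missing a cancer is the main concern.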

In the current study, subgroup analyses were performed to account for differences in outcomes (any cancer vs clinically significant cancer). There was no significant difference between the two outcomes, irrespective of whether a cutoff of ≥3 or ≥4 was used. However, the definition of clinically significant cancer differed among the 13 studies. Only three studies defined csPCa strictly according to the PI-RADSv2 guidelines (Gleason score ≥7 [3 + 4], volume ≥0.5 ml, or extraprostatic extension) [11]. Most others used one or two of the three criteria. Including only the former three studies may have provided more robust results; however, including all available studies was not only pragmatic but also presents a general overview of the existing literature, as this is the first meta-analysis of studies dealing with PI-RADSv2.

In this meta-analysis, we looked into the technical aspects of MRI. Meta-regression analyses revealed that the use of endorectal coil was not a statistically significant factor. Furthermore, although magnet strength showed a statistically significant difference between 3 and 1.5 T, this difference was not clinically meaningful (sensitivity of 0.90 vs 0.89, respectively; p = 0.03). Although there had been debate over these two issues in the past, both 3 and 1.5 T are now well established, and the overall benefit of using an endorectal coil is not evident [38 39]. The PI-RADSv2 guidelines currently recommend either usage, and the results of our study provide additional evidence to support this.

Regarding the methods of analysis in the studies, there was significant heterogeneity regarding reference standard and type of analysis. Radical prostatectomy was the reference standard in five studies, while the majority were based on a combination of systematic and targeted biopsies. The possibility of PCa despite negative biopsy results in the latter group should be kept in mind. In addition, approximately half of the studies reported outcomes in a per-patient manner (n = 11) and the other half in a per-lesion manner (n = 10). Per-lesion analysis is known to take into account the performance of localizing the disease; however, this was not shown to be a significant factor in the meta-regression analysis.

Our meta-analysis had some limitations. Nearly all studies were retrospective in study design, resulting in a high risk of bias for patient selection. It is possible that pooling data from predominantly retrospective studies may have led to increased diagnostic sensitivity [40]. However, performing a meta-analysis using only the three prospective studies was technically unfeasible, and the derived results would not have been representative of the existing literature on PI-RADSv2. Nevertheless, we used validated methods for the systematic review and reported the data using standard reporting guidelines, including PRISMA and the guidelines of the Handbook for Diagnostic Test Accuracy Reviews published by the Cochrane Collaboration [12 41]. Another limitation is the considerable heterogeneity in our pooled analysis, which affected the general applicability of our summary estimates. To explore the heterogeneity of our data, we performed meta-regression and multiple subgroup analyses. According to the analyses, the proportion of patients with PCa, the magnetic field strength, and the reference standard were significant factors affecting the heterogeneity. In particular, the reference standard included various methods, including radical prostatectomy and a combination of systematic and targeted biopsies (ie, MRI guided, MRI–transrectal ultrasound fusion, or cognitive). Furthermore, the fact that various definitions were used for clinically significant cancer needs to be emphasized. Our meta-regression and subgroup analyses may explain some of the heterogeneity, but a portion remains unexplained. Another important limitation is the small number of included studies for head-to-head comparison between PI-RADSv1 and PI-RADSv2. However, we were able to demonstrate a statistically significant difference in sensitivity between the two versions using only the six studies published to date.

PI-RADSv2 shows good performance for the detection of PCa with pooled sensitivity of 0.89 and specificity of 0.73. PI-RADSv2 has higher pooled sensitivity compared with PI-RADSv1 without significantly different specificity.

Author contributions: Sang Youn Kim had full access to all the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis.

Study concept and design: Woo, Suh, S.Y. Kim.

Acquisition of data: Woo, Suh, S.Y. Kim.

Analysis and interpretation of data: Woo, Suh, S.Y. Kim.

Drafting of the manuscript: Woo.

Critical revision of the manuscript for important intellectual content: Suh, S.Y. Kim, Cho, S.H. Kim.

Statistical analysis: Suh.

Obtaining funding: None.

Administrative, technical, or material support: None.

Supervision: S.Y. Kim, Cho, S.H. Kim.

Other: None.

Financial disclosures: Sang Youn Kim certifies that all conflicts of interest, including specific financial interests and relationships and affiliations relevant to the subject matter or materials discussed in the manuscript (eg, employment/affiliation, grants or funding, consultancies, honoraria, stock ownership or options, expert testimony, royalties, or patents filed, received, or pending), are the following: None.

Funding/Support and role of the sponsor: None.

  • [1] American Cancer Society. Key statistics for prostate cancer. http://www.cancer.org/cancer/prostatecancer/detailedguide/prostate-cancer-key-statistics .
  • [2] C.M. Hoeks, J.O. Barentsz, T. Hambrock, et al. Prostate cancer: multiparametric MR imaging for detection, localization, and staging. Radiology. 2011;261:46-66
  • [3] O. Ruprecht, P. Weisser, B. Bodelle, H. Ackermann, T.J. Vogl. MRI of the prostate: interobserver agreement compared with histopathologic outcome after radical prostatectomy. Eur J Radiol. 2012;81:456-460
  • [4] L. Dickinson, H.U. Ahmed, C. Allen, et al. Scoring systems used for the interpretation and reporting of multiparametric MRI for prostate cancer detection, localization, and characterization: could standardization lead to improved utilization of imaging within the diagnostic pathway? J Magn Reson Imaging. 2013;37:48-58
  • [5] J.O. Barentsz, J. Richenberg, R. Clements, et al. ESUR prostate MR guidelines 2012. Eur Radiol. 2012;22:746-757
  • [6] E.H. Hamoen, M. de Rooij, J.A. Witjes, J.O. Barentsz, M.M. Rovers. Use of the Prostate Imaging Reporting and Data System (PI-RADS) for prostate cancer detection with multiparametric magnetic resonance imaging: a diagnostic meta-analysis. Eur Urol. 2015;67:1112-1121
  • [7] M. Abd-Alazeez, H.U. Ahmed, M. Arya, et al. The accuracy of multiparametric MRI in men with negative biopsy and elevated PSA level—can it rule out clinically significant prostate cancer? Urol Oncol. 2014;32:45.e17–22
  • [8] A.B. Rosenkrantz, S. Kim, R.P. Lim, et al. Prostate cancer localization using multiparametric MR imaging: comparison of Prostate Imaging Reporting and Data System (PI-RADS) and Likert scales. Radiology. 2013;269:482-492
  • [9] B.G. Hansford, Y. Peng, Y. Jiang, et al. Dynamic contrast-enhanced MR imaging curve-type analysis: is it helpful in the differentiation of prostate cancer from healthy peripheral zone? Radiology. 2015;275:448-457
  • [10] A.D. Baur, A. Maxeiner, T. Franiel, et al. Evaluation of the Prostate Imaging Reporting and Data System for the detection of prostate cancer by the results of targeted biopsy of the prostate. Invest Radiol. 2014;49:411-420
  • [11] J.C. Weinreb, J.O. Barentsz, P.L. Choyke, et al. PI-RADS Prostate Imaging-Reporting and Data System: 2015, version 2. Eur Urol. 2016;69:16-40
  • [12] A. Liberati, D.G. Altman, J. Tetzlaff, et al. The PRISMA statement for reporting systematic reviews and meta-analyses of studies that evaluate health care interventions: explanation and elaboration. J Clin Epidemiol. 2009;62:e1-e34
  • [13] P. Whiting, A.W. Rutjes, J.B. Reitsma, P.M. Bossuyt, J. Kleijnen. The development of QUADAS: a tool for the quality assessment of studies of diagnostic accuracy included in systematic reviews. BMC Med Res Methodol. 2003;3:25
  • [14] C.H. Suh, S.H. Park. Successful publication of systematic review and meta-analysis of studies evaluating diagnostic test accuracy. Korean J Radiol. 2016;17:5-6
  • [15] J.J. Deeks, P. Macaskill, L. Irwig. The performance of tests of publication bias and other sample size effects in systematic reviews of diagnostic test accuracy was assessed. J Clin Epidemiol. 2005;58:882-893
  • [16] T. Auer, M. Edlinger, J. Bektic, et al. Performance of PI-RADS version 1 versus version 2 regarding the relation with histopathological results. World J Urol. 2017;35:687-693
  • [17] M. Baldisserotto, E.J. Neto, G. Carvalhal, et al. Validation of PI-RADS v.2 for prostate cancer diagnosis with MRI at 3T using an external phased-array coil. J Magn Reson Imaging. 2016;44:1354-1359
  • [18] P. De Visschere, N. Lumen, P. Ost, K. Decaestecker, E. Pattyn, G. Villeirs. Dynamic contrast-enhanced imaging has limited added value over T2-weighted imaging and diffusion-weighted imaging when using PI-RADSv2 for diagnosis of clinically significant prostate cancer in patients with elevated PSA. Clin Radiol. 2017;72:23-32
  • [19] El-Samei HAE-KA, M.F. Amin, E.E. Hassan. Assessment of the accuracy of multi-parametric MRI with PI-RADS 2.0 scoring system in the discrimination of suspicious prostatic focal lesions. Egypt J Radiol Nucl Med. 2016;47:1075-1082
  • [20] Z.Y. Feng, L. Wang, X.D. Min, S.G. Wang, G.P. Wang, J. Cai. Prostate cancer detection with multiparametric magnetic resonance imaging: Prostate Imaging Reporting and Data System version 1 versus version 2. Chin Med J. 2016;129:2451-2459
  • [21] M. Kasel-Seibert, T. Lehmann, R. Aschenbach, et al. Assessment of PI-RADS v2 for the detection of prostate cancer. Eur J Radiol. 2016;85:726-731
  • [22] W.C. Lin, V.F. Muglia, G.E. Silva, S. Chodraui Filho, R.B. Reis, A.C. Westphalen. Multiparametric MRI of the prostate: diagnostic performance and interreader agreement of two scoring systems. Br J Radiol. 2016;89:20151056
  • [23] E. Martorana, G.M. Pirola, M. Scialpi, et al. Lesion volume predicts prostate cancer risk and aggressiveness: validation of its value alone and matched with Prostate Imaging Reporting and Data System score. BJU Int. In press. doi:10.1111/bju.13649
  • [24] F.V. Mertan, M.D. Greer, J.H. Shih, et al. Prospective evaluation of the Prostate Imaging Reporting and Data System version 2 for prostate cancer detection. J Urol. 2016;196:690-696
  • [25] B.G. Muller, J.H. Shih, S. Sankineni, et al. Prostate cancer: interobserver agreement and accuracy with the revised Prostate Imaging Reporting and Data System at multiparametric MR imaging. Radiology. 2015;277:741-750
  • [26] J.J. Park, B.K. Park. Role of PI-RADSv2 with multiparametric MRI in determining who needs active surveillance or definitive treatment according to PRIAS. J Magn Reson Imaging. 2017;45:1753-1759
  • [27] S.Y. Park, D.C. Jung, Y.T. Oh, et al. Prostate cancer: PI-RADS version 2 helps preoperatively predict clinically significant cancers. Radiology. 2016;280:108-116
  • [28] S. Polanec, T.H. Helbich, H. Bickel, et al. Head-to-head comparison of PI-RADS v2 and PI-RADS v1. Eur J Radiol. 2016;85:1125-1131
  • [29] A.R. Rastinehad, N. Waingankar, B. Turkbey, et al. Comparison of multiparametric MRI scoring systems and the impact on cancer detection in patients undergoing MR US fusion guided prostate biopsies. PLoS One. 2015;10:e0143404
  • [30] A.B. Rosenkrantz, J.S. Babb, S.S. Taneja, J.M. Ream. Proposed adjustments to PI-RADS version 2 decision rules: impact on prostate cancer detection. Radiology. In press. doi:10.1148/radiol.2016161124
  • [31] A. Stanzione, M. Imbriaco, S. Cocozza, et al. Biparametric 3T magnetic resonance imaging for prostatic cancer detection in a biopsy-naive patient population: a further improvement of PI-RADS v2? Eur J Radiol. 2016;85:2269-2274
  • [32] N. Tan, W.C. Lin, P. Khoshnoodi, et al. In-Bore 3-T MR-guided transrectal targeted prostate biopsy: Prostate Imaging Reporting and Data System version 2-based diagnostic performance for detection of prostate cancer. Radiology. In press. doi:10.1148/radiol.2016152827
  • [33] S. Tewes, N. Mokov, D. Hartung, et al. Standardized reporting of prostate MRI: comparison of the Prostate Imaging Reporting and Data System (PI-RADS) version 1 and version 2. PLoS One. 2016;11:e0162879
  • [34] S. Washino, T. Okochi, K. Saito, et al. Combination of PI-RADS score and PSA density predicts biopsy outcome in biopsy naive patients. BJU Int. 2017;119:225-233
  • [35] S. Woo, S.Y. Kim, J. Lee, S.H. Kim, J.Y. Cho. PI-RADS version 2 for prediction of pathological downgrading after radical prostatectomy: a preliminary study in patients with biopsy-proven Gleason Score 7 (3 + 4) prostate cancer. Eur Radiol. 2016;26:3580-3587
  • [36] C. Zhao, G. Gao, D. Fang, et al. The efficiency of multiparametric magnetic resonance imaging (mpMRI) using PI-RADS Version 2 in the diagnosis of clinically significant prostate cancer. Clin Imaging. 2016;40:885-888
  • [37] M. de Rooij, E.H. Hamoen, J.J. Futterer, J.O. Barentsz, M.M. Rovers. Accuracy of multiparametric MRI for prostate cancer detection: a meta-analysis. Am J Roentgenol. 2014;202:343-351
  • [38] Z.K. Shah ,S.N. Elias ,R. Abaza ,et al. Performance comparison of 1.5-T endorectal coil MRI with 3.0-T nonendorectal coil MRI in patients with prostate cancer. Acad Radiol. 2015;22 :467-474 Crossref
  • [39] Y. Mazaheri ,H.A. Vargas ,G. Nyman ,O. Akin ,H. Hricak. Image artifacts on prostate diffusion-weighted magnetic resonance imaging: trade-offs at 1.5 Tesla and 3.0 Tesla. Acad Radiol. 2013;20 :1041-1047 Crossref
  • [40] J.J. Deeks. Systematic reviews in health care: systematic reviews of evaluations of diagnostic and screening tests. BMJ. 2001;323 :157-162 Crossref
  • [41] J.J.W.S. Deeks ,C. Davenport. Guide to the contents of a Cochrane diagnostic test accuracy protocol. J.J. Deeks, P.M. Bossuyt, C. Gatsonis. Cochrane handbook for systematic reviews of diagnostic test accuracy version 1.0.0. Chapter 4. (The Cochrane Collaboration, 2013) http://srdta.cochrane.org/

Prostate cancer (PCa) is the second leading cause of cancer-related mortality in Western men [1]. With recent technological advancements and growing availability, multiparametric magnetic resonance imaging (mpMRI) is increasingly being used to guide several aspects of PCa management, including detection, staging, and treatment planning [2]. Despite abundant evidence in the literature reporting high accuracy of mpMRI for PCa diagnosis, widespread acceptance has been hampered by several factors, including difficulty of interpretation, lack of standardized criteria for interpretation (ie, use of Likert scales based on the radiologist's subjective level of suspicion for PCa), and the resulting substantial inter-reader variability [3 4].

To bring standardization to the evaluation and reporting of mpMRI of the prostate, the European Society of Urogenital Radiology (ESUR) published a guideline termed Prostate Imaging Reporting and Data System (PI-RADS) in 2012 [5] . PI-RADS was generated based on expert consensus and provides a detailed scoring system for each MRI sequence (T2-weighted imaging [T2WI], diffusion-weighted imaging [DWI], dynamic contrast-enhanced MRI [DCE-MRI], and MR spectroscopy) for the presence of clinically significant PCa (csPCa). Several investigators have validated the accuracy and reproducibility of PI-RADS, and a recent meta-analysis reported pooled sensitivity and specificity of 0.78 and 0.79, respectively [6] . However, as there was no guideline for the generation of an overall score, different research groups utilized various measures for this purpose—some used a sum of the scores from each sequence (ranging from 3 to 15), whereas others used an overall score of 1–5 [7 8] . Furthermore, emerging data questioned the value of curve-type analysis of DCE-MRI [9] . In addition, investigators suggested that some sequences may be more important in determining the likelihood of PCa (ie, DWI in the peripheral zone [PZ] and T2WI in the transition zone [TZ]) rather than equal weighting for all sequences [10] .

To address these issues, the ESUR and American College of Radiology recently released the updated PI-RADS version 2 (PI-RADSv2) [11]. The main changes from PI-RADSv1 to PI-RADSv2 are the following: (1) introduction of dominant sequences according to zonal anatomy (DWI for the PZ and T2WI for the TZ), (2) limited contribution of DCE-MRI data, reduced to merely the presence or absence of early focal enhancement, and (3) generation of an overall score (1–5) integrating findings across all MRI sequences.

Since then, several studies assessing the value of PI-RADSv2 have been published. However, the diagnostic performance of this new scoring system has not been evaluated systematically. Therefore, the purpose of our study was to assess the diagnostic performance of PI-RADSv2 for the detection of PCa. In addition, we aimed to compare the diagnostic performance of PI-RADSv1 and PI-RADSv2 in studies available for head-to-head comparison.

This meta-analysis was performed and written according to the preferred reporting items for systematic reviews and meta-analyses (PRISMA) guidelines [12] .

2.1 Literature search

A computerized search of MEDLINE and EMBASE up to December 7, 2016, was performed in order to identify studies evaluating the diagnostic performance of PI-RADSv2 for the detection of PCa. The search query combined synonyms for PCa, MRI, and PI-RADS as follows: (prostate cancer OR prostatic cancer OR prostate neoplasm OR prostatic neoplasm OR prostate tumor OR prostatic tumor OR prostate carcinoma OR prostatic carcinoma OR PCa) AND (magnetic resonance imaging OR MRI OR MR) AND (prostate imaging reporting and data system OR pi-rads OR pi rads OR pirads). Bibliographies of identified articles were also screened to expand the scope of search. Our search was limited to publications in English.
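The way the three synonym groups combine can be sketched in a few lines of Python. This is only an illustration of the Boolean structure of the query quoted above; the helper names `or_group` and `build_query` are hypothetical and do not correspond to any database interface, and field tags or database-specific syntax are deliberately left out.

```python
# Synonym groups copied from the search strategy described above.
PCA_TERMS = [
    "prostate cancer", "prostatic cancer", "prostate neoplasm",
    "prostatic neoplasm", "prostate tumor", "prostatic tumor",
    "prostate carcinoma", "prostatic carcinoma", "PCa",
]
MRI_TERMS = ["magnetic resonance imaging", "MRI", "MR"]
PIRADS_TERMS = [
    "prostate imaging reporting and data system",
    "pi-rads", "pi rads", "pirads",
]


def or_group(terms):
    """Join the synonyms for one concept with OR and wrap them in parentheses."""
    return "(" + " OR ".join(terms) + ")"


def build_query(*groups):
    """Combine concept groups with AND (hypothetical helper, not a database API)."""
    return " AND ".join(or_group(group) for group in groups)


QUERY = build_query(PCA_TERMS, MRI_TERMS, PIRADS_TERMS)
print(QUERY)
```

Keeping each concept in its own list makes it easy to see that a record must match at least one term from every group to be retrieved.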

2.2 Study selection

2.2.1 Inclusion criteria

Studies were included if they satisfied all the following requirements according to the PICOS criteria [12] : (1) included patients with suspected or diagnosed PCa; (2) for index test, mpMRI of the prostate including all required sequences of T2WI, DWI, and DCE-MRI was performed and assessed with a PI-RADSv2 scoring system; (3) for comparison, a reference standard based on the histopathological examination of radical prostatectomy or biopsy was used; (4) results were reported in sufficient detail for the reconstruction of 2 × 2 tables and determination of sensitivity and specificity at specified cutoff values for evaluating the diagnostic performance of PI-RADSv2; and (5) studies had to be original articles.
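Criterion (4) requires that a 2 × 2 table can be rebuilt for each study. As a minimal sketch of what that reconstruction yields, the Python snippet below derives sensitivity and specificity from the four cells. The class name `TwoByTwo` and the example counts are hypothetical and are not taken from any included study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Counts of one study: index test result versus reference standard."""
    tp: int  # test positive, cancer present
    fp: int  # test positive, cancer absent
    fn: int  # test negative, cancer present
    tn: int  # test negative, cancer absent

    def sensitivity(self) -> float:
        """Proportion of cancers that the index test flags as positive."""
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        """Proportion of cancer-free cases that the index test calls negative."""
        return self.tn / (self.tn + self.fp)


# Hypothetical counts, for illustration only.
example = TwoByTwo(tp=45, fp=10, fn=5, tn=40)
print(example.sensitivity(), example.specificity())
```

A study that reports only summary percentages without the underlying counts cannot be entered into such a table, which is why it would fail criterion (4).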

2.2.2 Exclusion criteria

Studies were excluded if any of the following criteria were met: (1) studies involving <10 patients; (2) review articles, guidelines, consensus statements, letters, editorials, and conference abstracts; (3) studies using only PI-RADSv1 for the evaluation of mpMRI of the prostate; (4) studies focusing on topics other than using the PI-RADSv2 system for diagnosing PCa (ie, staging and prediction of biochemical recurrence); and (5) studies with overlapping patient population.

Two reviewers (S.W. and C.H.S., with 3 yr of experience in performing systematic reviews and meta-analyses) independently evaluated the eligibility of the selected studies from the literature. Disagreements, if present between the two reviewers, were resolved by consensus via discussion with a third reviewer (S.Y.K.).

2.3 Data extraction and quality assessment

We extracted the following data regarding study design and results from the selected studies using a standardized form:

  1. Study characteristics: authors, year of publication, country and institution of origin, duration of patient recruitment, and study design (prospective vs retrospective and consecutive or not)
  2. Demographic and clinical characteristics: sample size, number of patients with PCa, patient age, prostate-specific antigen (PSA) level and Gleason score, number of previous biopsies, and PCa diagnosis prior to mpMRI
  3. Technical characteristics of mpMRI: scanner model and manufacturer, magnetic field strength (1.5 vs 3 T), coil type (endorectal vs pelvic phased array), and specific sequences used (T2WI, DWI, DCE, or MR spectroscopy)
  4. Interpretation of mpMRI: number of reviewers and experience in prostate mpMRI, independent or consensus reading, and blinding to clinicopathological information
  5. Reference standard: type of reference standard (radical prostatectomy, targeted biopsy, or systematic biopsy), interval between MRI and pathology, outcomes assessed (any PCa vs csPCa), definition of csPCa (studies assessing “clinically significant”, “aggressive”, or “high-grade” PCa were all considered to assess csPCa; however, only studies that used the definition as provided by the PI-RADSv2 guideline [Gleason score ≥7 (including 3 + 4), volume >0.5 ml, or extraprostatic extension] were considered not to have concern for applicability), separate analysis for the PZ and TZ, and type of analysis (per patient vs per lesion)
  6. Diagnostic performance of PI-RADSv2, including criteria or cutoff values (in case of multiple readers, the results of the most experienced reader were extracted for this meta-analysis)

The methodological quality of the included studies was assessed using tailored questionnaires and criteria provided by Quality Assessment of Diagnostic Accuracy Studies-2 [13] . Data extraction and quality assessment were performed independently by two reviewers (S.W. and C.H.S.). All disagreements were resolved by consensus through discussion with the third reviewer (S.Y.K.).

2.4 Data synthesis and analysis

The diagnostic performance of PI-RADSv2 for the detection of PCa was the primary outcome for this meta-analysis. In addition, a comparison between the diagnostic performance of PI-RADSv2 and that of PI-RADSv1 using studies that reported head-to-head comparison data of the two PI-RADS versions was considered a secondary outcome.

Pooled estimates of sensitivity and specificity were calculated using hierarchical logistic regression modeling including bivariate modeling and hierarchical summary receiver operating characteristic (HSROC) modeling [14] . For graphical presentation of the results, an HSROC curve with 95% confidence region and prediction region was plotted. Publication bias was evaluated using the Deeks’ funnel plot, and statistical significance was tested with the Deeks’ asymmetry test [15] .
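The regression behind Deeks' funnel plot can be sketched as follows. This is a simplified illustration and not the “midas” implementation used for the analysis: it regresses the log diagnostic odds ratio on the inverse square root of the effective sample size, weighted by the effective sample size, and returns only the fitted coefficients (the significance test on the slope is omitted). The function name `deeks_regression` and the rule of adding 0.5 only to studies with a zero cell are assumptions made for this sketch.

```python
import numpy as np


def deeks_regression(tables):
    """Return (intercept, slope) of the weighted regression behind Deeks' funnel plot.

    tables: iterable of (tp, fp, fn, tn) counts, one row per study.
    """
    counts = np.asarray(tables, dtype=float)
    n_diseased = counts[:, 0] + counts[:, 2]
    n_nondiseased = counts[:, 1] + counts[:, 3]
    # Effective sample size, used for both the x-axis and the weights.
    ess = 4.0 * n_diseased * n_nondiseased / (n_diseased + n_nondiseased)

    # Assumption: add 0.5 to every cell of a study only if it has a zero cell.
    corrected = counts.copy()
    corrected[(counts == 0).any(axis=1)] += 0.5
    tp, fp, fn, tn = corrected.T
    ln_dor = np.log((tp * tn) / (fp * fn))

    # Weighted least squares of ln(DOR) on 1/sqrt(ESS), with weights = ESS.
    x = 1.0 / np.sqrt(ess)
    design = np.column_stack([np.ones_like(x), x])
    root_w = np.sqrt(ess)
    coef, *_ = np.linalg.lstsq(design * root_w[:, None], ln_dor * root_w, rcond=None)
    return float(coef[0]), float(coef[1])
```

A slope close to zero means that small and large studies report similar diagnostic odds ratios, which is the pattern expected when publication bias is unlikely.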

We performed meta-regression analyses to investigate the cause of heterogeneity. The following covariates were considered for the bivariate model: (1) proportion of patients with PCa (>50% vs ≤50%), (2) magnet strength of MRI (3 vs 1.5 T), (3) use of endorectal coil, (4) cutoff value (≥4 vs ≥3), (5) reference standard (radical prostatectomy vs biopsy), and (6) type of analysis (per patient vs per lesion). In addition, multiple subgroup analyses were performed for cutoff value, outcome, and previous biopsy history to assess various clinical settings: (1) a cutoff value of ≥4 for all studies, (2) a cutoff value of ≥3 for all studies, (3) a cutoff value of ≥4 for determining any PCa, (4) a cutoff value of ≥3 for determining any PCa, (5) a cutoff value of ≥4 for determining csPCa, (6) a cutoff value of ≥3 for determining csPCa, (7) a cutoff value of ≥4 in studies using per-patient analysis, (8) a cutoff value of ≥4 in studies using per-lesion analysis, (9) studies analyzing PZ PCa, (10) studies analyzing TZ PCa, (11) patients without previous biopsies, and (12) patients with previous biopsies. The “metandi” and “midas” modules in Stata 10.0 (StataCorp LP, College Station, TX, USA) and “mada” package in R software version 3.2.1 (R Foundation for Statistical Computing, Vienna, Austria) were used for statistical analyses, with p < 0.05 signifying statistical significance.

3.1 Literature search

A systematic literature search initially identified 287 articles. After removing 46 duplicates, screening of the 241 titles and abstracts yielded 105 potentially eligible articles. Full-text reviews were performed, and 84 studies were excluded for the following reasons: not in the field of interest (n = 80, including 68 studies that used only PI-RADSv1), insufficient data to reconstruct 2 × 2 tables (n = 2), and shared study population with other studies (n = 2). Ultimately, 21 original articles including a total of 3857 patients assessing the diagnostic performance of PI-RADSv2 were included in the meta-analysis [16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36]. No additional studies were identified via screening the bibliographies of these 21 studies. Among them, 15 studies including 3099 patients dealt with PI-RADSv2 alone, whereas six studies including 758 patients provided a head-to-head comparison between PI-RADSv1 and PI-RADSv2 [16 20 21 28 32 33]. The detailed study selection process is described in Fig. 1.
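The counts in this selection process can be checked with simple arithmetic. The short Python sketch below only re-adds the numbers reported in the paragraph above; the variable names are illustrative.

```python
# Numbers taken from the study selection paragraph above.
identified = 287
duplicates = 46
screened = identified - duplicates  # titles and abstracts screened

full_text_reviewed = 105
excluded = {
    "not in the field of interest": 80,
    "insufficient data to reconstruct 2 x 2 tables": 2,
    "shared study population": 2,
}
included = full_text_reviewed - sum(excluded.values())

patients = {"PI-RADSv2 alone (15 studies)": 3099, "head-to-head (6 studies)": 758}
total_patients = sum(patients.values())

print(screened, included, total_patients)
```

The three printed values match the 241 screened records, 21 included studies, and 3857 patients stated in the text.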


Fig. 1
PRISMA flow diagram showing study selection process for meta-analysis. (a) Included original articles for qualitative and quantitative analyses are references [16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36]. MRI = magnetic resonance imaging; PI-RADS = Prostate Imaging Reporting and Data System; PRISMA = Preferred Reporting Items for Systematic Reviews and Meta-analyses.

3.2 Characteristics of included studies

Patient characteristics are shown in Table 1 . The size of the study population ranged from 49 to 456 patients, with the percentage of those with PCa ranging from 37% to 100%. The patients had a median age of 62–69.6 yr, median PSA of 3.97–15 ng/ml, and a Gleason score ranging from 5 to 10. Patients had already been diagnosed with PCa prior to MRI in all or some of the study populations in seven studies [16 17 22 26 27 32 35] . Biopsy was performed before MRI in seven studies [16 21 22 23 26 27 35] , all patients were biopsy-naïve in three studies [18 25 34] , both patient types were included in three studies [28 32 33] , and data regarding previous biopsy were not reported in the remaining eight studies.

| First author (year) | Country | Institution | Duration of patient recruitment | Patients (n) | Patients with PCa (n) | Age, median (yr) | Age, range (yr) | PSA, median (ng/ml) | PSA, range (ng/ml) | Gleason score, median | Gleason score, range | No. of previous biopsies | PCa diagnosis before MRI |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Auer (2016) [16] | Austria | Medical University of Innsbruck, Medizinische Hochschule Hannover | NR | 50 | NR | 63 a | NR | 7.3 a | NR | 7 (3 + 4) | 6–10 | ≥1 | Yes (all) |
| Baldisserotto (2016) [17] | Brazil | Pontifícia Universidade Católica do Rio Grande do Sul | 2013.6–2015.6 | 54 | 33 | 65.9 a | 53–81 | 8.4 a | 3–31 | 7 (3 + 4) | 6–9 | NR | Yes (some) |
| De Visschere (2017) [18] | Belgium | Ghent University Hospital | 2011.5–2014.12 | 245 | 144 | 66 | 44–85 | 9 | 1.4–935.5 | 7 (3 + 4) | ≥6 | 0 | No |
| El-Samei (2016) [19] | Egypt | Al-Azher University, El-Minia University | 2014.5–2015.10 | 55 | 38 | 62 a | 51–79 | NR | NR | 7 | 6–10 | NR | No |
| Feng (2016) [20] | China | Tongji Hospital | 2013.6–2015.7 | 401 | 150 | 64.4 a | 34–88 | 10.7 | 0.2–1763 | 7 (4 + 3) | ≤6–≥8 | NR | NR |
| Kasel-Seibert (2016) [21] | Germany | University Hospital Jena | 2013.7–2015.3 | 82 | 31 | 65 | 48–81 | 13 | 1–111 | 7 (3 + 4) | 6–9 | 1–5 | No |
| Lin (2016) [22] | Brazil, Taiwan | Ribeirão Preto School of Medicine, China Medical University, Ribeirão Preto School of Medicine | 2011.5–2014.6 | 49 | 49 | 63 | 46–73 | 13.27 | 1.8–41.4 |  |  | ≥1 | Yes (all) |
| Martorana (2016) [23] | Italy | University of Modena and Reggio Emilia, Perugia University | 2014.1–2016.2 | 157 | 79 | 65 | 47–79 b | 10.7 | 1–75 | 6 | 6–8 | 1–3 | No |
| Mertan (2016) [24] | USA | National Cancer Institute | 2015.3–2015.9 | 62 | 38 | 65.5 | 50.3–76.6 | 7.1 | 0.5–863 | 7 (3 + 4) | 6–9 | NR | NR |
| Muller (2015) [25] | USA, the Netherlands | National Cancer Institute, AMC University Hospital, Edward Hébert School of Medicine | 2011.12–2014.5 | 94 | 94 | 62 | 37–79 | 8.51 | 0.7–51.1 | 7 (3 + 4) | 6–10 | 0 | No |
| Park (2016) [26] | South Korea | Samsung Medical Center | 2012.1–2014.12 | 456 | 456 | 65 | 46–81 | 3.97 | 3.8–4.2 | NR | ≥6 | ≥1 | Yes (all) |
| Park (2016) [27] | South Korea | Yonsei University College of Medicine | 2012.1–2013.3 | 425 | 425 | NR | NR | NR | 1.4–156.9 | 5–6 | 5–10 | ≥1 | Yes (all) |
| Polanec (2016) [28] | Austria | Medical University of Vienna, Confraternität Vienna | 2011.6–2015.9 | 65 | NR | 65.3 a | 62.3–87.4 | 10.8 a | 4.2–74.5 | 7 (4 + 3) | 6–9 | ≥0 | No |
| Rastinehad (2015) [29] | USA | Icahn School of Medicine at Mount Sinai, Fox Chase Cancer Center, National Institutes of Health, Hofstra North Shore LIJ School of Medicine | 2012.2–2014.11 | 312 | 202 | 65.1 | 60.3–70.3 b | 7.3 | 5.0–11.4 b | NR | NR | NR | No |
| Rosenkrantz (2016) [30] | USA | NYU Langone Medical Center | 2013.9–2015.2 | 343 | 134 | 64 | NR | 5.8 | NR | 7 (3 + 4) | 6–9 | NR | NR |
| Stanzione (2016) [31] | Italy | University “Federico II”, Ospedale S. Maria delle Grazie | NR | 82 | 34 | 65 a | 43–84 | 8.8 a | NR | 7 (3 + 4) | 6–9 | NR | No |
| Tan (2016) [32] | USA, Taiwan | UCLA, Cathay General Hospital, Hoag Hospital | 2013.3–2016.12 | 106 | 63 | 66.5 | 43–79 | 7.9 | 5.6–10.6 | 7 (3 + 4) | 6–10 | ≥0 | Yes (some) |
| Tewes (2016) [33] | Germany | Hannover Medical School, Klinikum der Region Hannover | 2012.12–2014.12 | 54 | 31 | 69.6 a | NR | 8.7 a | NR | 6 | 6–9 | ≥0 | No |
| Washino (2017) [34] | Japan | Jichi Medical University Saitama Medical Center | 2010.6–2014.4 | 288 | 159 | 69 | 64–74 b | 7.5 | 5.5–11 b | NR | NR | 0 | No |
| Woo (2016) [35] | South Korea | Seoul National University College of Medicine | 2011.1–2013.12 | 105 | 105 | 69 | 49–79 | 8.22 | 0.9–44.2 | 7 | 6–7 | ≥1 | Yes (all) |
| Zhao (2016) [36] | China | Peking University First Hospital | 2010.11–2013.12 | 372 | 185 | 68.5 a | NR | 15 a | NR | 7 | 6–10 | NR | No |

a Mean.

b Interquartile range.

MRI = magnetic resonance imaging; NR = not reported; PCa = prostate cancer; PSA = prostate-specific antigen.

Table 1. Patient characteristics

Characteristics of the studies are summarized in Table 2. MRI was performed using 3-T scanners in 16 studies [16 17 18 20 24 25 26 27 28 29 30 31 32 33 35 36], 1.5-T scanners in four studies [19 21 22 23], and either 3 or 1.5 T in one study [34]. Endorectal coils were used in seven studies [19 21 22 23 24 25 29]. In all studies, the mpMRI protocol consisted of T2WI, DWI, and DCE-MRI. The reference standard was radical prostatectomy in five studies [16 22 26 27 35], a combination of systematic and targeted biopsies in seven studies [17 20 23 25 29 34 36], and only targeted biopsy in seven studies [19 21 24 28 30 32 33]; the reference standard was not consistent throughout the study population in two studies [18 31]. PI-RADSv2 scoring was performed by one to five radiologists, either in consensus or independently. The level of experience of the radiologists was heterogeneous, ranging from 4 to 22 yr of experience in prostate imaging. In most studies, the readers were blinded; however, in three studies, the radiologists were aware that the patients had biopsy-proven PCa [22 26 35], and five studies were not explicit regarding blinding [18 19 30 32 36]. In the majority of the studies, the interval between MRI and the reference standard was less than 6 mo; however, the details were not reported in 10 studies [16 17 18 24 27 28 29 30 32 33]. PCa was separately assessed according to zonal anatomy in seven studies [20 24 25 28 30 31 33]. However, in one study [31], only PCa in the PZ could be evaluated, as no detailed data were provided in the article and the attempt to contact the authors for provision of further information was unsuccessful. Regarding the outcome assessed, eight studies evaluated any cancer [17 19 20 23 25 28 31 33], eight evaluated clinically significant cancer [18 22 26 27 30 34 35 36], and five evaluated both [16 21 24 29 32]. With regard to cutoff values, 13 studies used ≥4 [16 18 19 20 21 24 25 26 27 28 30 31 35], four studies used ≥3 [32 33 34 36], and four studies used both [17 22 23 29]. The location of PCa was separately reported by the PZ and TZ in six studies [20 24 25 28 30 33].

| First author (year) | Design | Consecutive enrollment | Reference standard | MRI–reference standard interval | No. of readers (reading) | Experience (yr) | Blinding | Magnet strength (T) | Vendor | Model | Endorectal coil | PI-RADSv2 application | Cutoff values | Localization | Type of analysis | Outcome assessed | Definition of csPCa |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Auer (2016) [16] | Retrospective | Yes | RP | NR | 1 | >5 | Yes | 3 | Siemens | Skyra | No | Strict | 4 | Whole | Lesion | Any + csPCa | GS ≥7 (4 + 3) |
| Baldisserotto (2016) [17] | Retrospective | Yes | STRUSGB + targeted MRI–TRUS biopsy (cognitive) or RP | NR | 2 (independent) | >10/1 | Yes | 3 | GE | Signa HDxt | No | Strict | 3, 4 | Whole | Patient | Any |  |
| De Visschere (2017) [18] | Retrospective | NR | STRUSGB or RP | NR | NR | NR | NR | 3 | Siemens | Trio | No | Strict | 4 | Whole | Patient | csPCa | GS ≥7, ≥0.5 cc, or EPE |
| El-Samei (2016) [19] | NR | NR | Targeted MRI–TRUS biopsy | ≤2 wk | NR | NR | NR | 1.5 | Philips | Gyroscan | Yes | Strict | 4 | Whole | Lesion | Any |  |
| Feng (2016) [20] | Retrospective | Yes | STRUSGB + targeted MRI–TRUS biopsy (cognitive) | 1–35 d | 2 (consensus) | 5/4 | Yes | 3 | Siemens | Skyra | No | Strict | 4 | PZ, TZ | Patient | Any |  |
| Kasel-Seibert (2016) [21] | Retrospective | NR | Targeted MRGB | ≤7 d | 2 (independent) | 10/<1 | Yes | 1.5 | Siemens | Avanto | Yes | Strict | 4 | Whole | Lesion | Any + csPCa | GS ≥7 |
| Lin (2016) [22] | Retrospective | NR | RP | ≤6 mo | 2 (independent) | 12/5 | Yes a | 1.5 | Philips | Achieva | Yes | Strict | 3, 4 | Whole | Lesion | csPCa | GS ≥7 and >0.5 cc |
| Martorana (2016) [23] | Retrospective | Yes | TTB + targeted MRI–TRUS biopsy | 30–78 | 2 (consensus) | 4/4 | Yes | 1.5 | Philips | Achieva | Yes | Strict | 3, 4 | Whole | Lesion | Any | GS ≥7, >0.5 cc, or EPE |
| Mertan (2016) [24] | Prospective | Yes | STRUSGB + targeted MRI–TRUS biopsy | NR | 1 | >8 | Yes | 3 | Philips | Achieva | Yes | Strict | 4 | PZ, TZ | Lesion | Any + csPCa | GS ≥7 |
| Muller (2015) [25] | Retrospective | Yes | STRUSGB + targeted MRI–TRUS biopsy | ≤6 wk | 5 (independent) | 12/7/1/1/0.5 | Yes | 3 | Philips | Achieva | Yes | Strict | 4 | PZ, TZ | Lesion | Any | GS ≥7 (4 + 3) |
| Park (2016) [26] | Retrospective | Yes | RP | 21–48 | 2 (independent) | 14/3 | Yes a | 3 | Philips | Achieva | No | Strict | 4 | Whole | Patient | csPCa | GS ≥7, ≥0.5 cc, or EPE |
| Park (2016) [27] | Retrospective | NR | RP | NR | 2 (independent) | 9/4 | Yes | 3 | GE, Philips, Siemens | Discovery MR750, Achieva, Trio | No | Strict | 4 | Whole | Patient | csPCa | GS ≥7, ≥0.5 cc, or EPE |
| Polanec (2016) [28] | Retrospective | Yes | Targeted MRGB | NR | 2 (independent) | NR/NR | Yes | 3 | Siemens | Trio | No | Strict | 4 | PZ, TZ | Lesion | Any |  |
| Rastinehad (2015) [29] | Prospective | NR | STRUSGB + targeted MRI–TRUS biopsy | NR | 3 (consensus) | NR | Yes | 3 | Siemens | Verio | Yes | Calculated b | 3, 4 | Whole | Patient | Any + csPCa | Epstein's criteria c for systematic and GS ≥7 or >0.5 cc for targeted biopsy |
| Rosenkrantz (2016) [30] | Retrospective | NR | Targeted MRI–TRUS biopsy | NR | 2 (independent) | 8/3 | NR | 3 | Siemens | Biograph mMR/Prisma/Skyra/Trio | No | Strict | 4 | PZ, TZ | Lesion | csPCa | GS ≥7 |
| Stanzione (2016) [31] | Prospective | NR | STRUSGB (±targeted TRUSGB) | 20–30 d | 2 (independent) | 14/10 | Yes | 3 | Siemens | Trio | No | Strict | 4 | PZ | Patient | Any |  |
| Tan (2016) [32] | Retrospective | Yes | Targeted MRGB | NR | 3 (consensus) | 16/10 | NR | 3 | Siemens | Skyra, Trio, Verio | No | Strict | 3 | Whole | Lesion | Any + csPCa | GS ≥7 |
| Tewes (2016) [33] | Retrospective | Yes | Targeted MRGB | NR | 2 (independent) | 5/2 | Yes | 3 | Siemens | Skyra | No | Strict | 3 d | PZ, TZ | Patient | Any |  |
| Washino (2017) [34] | Retrospective | NR | TTB + targeted MRI–TRUS biopsy (cognitive) | 0.5–1.6 mo | 1 | 14 | Yes | 1.5 or 3 | Toshiba | Excelart Vantage/Vantage Titan 3T | No | Strict | 3 | Whole | Patient | csPCa | GS ≥7 or maximum core length ≥4 mm |
| Woo (2016) [35] | Retrospective | NR | RP | ≤6 mo | 2 (consensus) | 22/10 | Yes a | 3 | Philips, Siemens | Ingenia, Verio/Trio | No | Strict | 4 | Whole | Patient | csPCa | GS ≥7 |
| Zhao (2016) [36] | Retrospective | NR | STRUSGB + targeted MRI–TRUS biopsy | ≤3 mo | 2 (independent) | NR | NR | 3 | NR | NR | No | Strict | 3 | Whole | Patient | csPCa | GS ≥7 |

a Blinded but aware that patients had biopsy-proven PCa.

b PI-RADSv2 scores were generated from reports based on PI-RADSv1 and simplified qualitative system.

c Epstein's criteria = Gleason pattern ≥4, or Gleason 3 + 3 disease with core length ≥50% and/or >2 cores positive.

d For transition zone, cutoff value = 4.

csPCa = clinically significant prostate cancer; GS = Gleason score; EPE = extraprostatic extension; MRGB = magnetic resonance imaging-guided biopsy; MRI = magnetic resonance imaging; MRI–TRUS = fusion of magnetic resonance imaging and transrectal ultrasound images; NR = not reported; PCa = prostate cancer; PI-RADSv2 = Prostate Imaging Reporting and Data System version 2; PZ = peripheral zone; RP = radical prostatectomy; STRUSGB = systematic transrectal ultrasound-guided biopsy; TTB = transperineal template biopsy; TZ = transition zone.

Table 2. Study characteristics

3.3 Quality assessment

Overall, the quality of the studies was not considered high, mainly due to the patient selection domain (Fig. 2). Regarding the patient selection domain, there was generally a high risk of bias as all but four of the studies were retrospective in nature [16 17 18 20 21 22 23 25 26 27 28 30 32 33 34 35 36]. Seven studies were considered to have high concern for applicability, as all or some of the patients had a pathological diagnosis of PCa prior to MRI [16 17 22 26 27 32 35]. Regarding the index test domain, there was a high risk of bias in nine studies. In three of the nine studies, reviewers were aware that patients had biopsy-proven PCa [22 26 35]. In the other six studies, the cutoff value for determining PCa was not specified prior to interpretation [16 20 29 31 33 34]. Only one study had concern for applicability, as PI-RADSv2 scores were indirectly generated from existing clinical radiological reports based on PI-RADSv1 or an in-house scoring system [29]. Regarding the reference standard domain, eight studies had a high risk of bias. Seven were based on either only systematic biopsy or targeted biopsy [18 19 21 28 30 32 33]; in one study, targeted biopsy was performed, but on lesions that were suspicious on ultrasonography, and not on MRI [31]. Those in which radical prostatectomy or a systematic plus targeted biopsy (MRI guided, MRI–transrectal ultrasound fusion, or cognitive) was used as the reference standard were considered to have a low risk of bias. In 10 studies, the definition of clinically significant cancer did not abide by that described in the PI-RADSv2 guidelines, and therefore showed high concern for applicability [16 21 22 24 29 30 32 34 35 36]. Regarding the flow and timing domain, two studies had a high risk of bias as patients did not receive the same reference standard [17 18].


Fig. 2
Grouped bar charts show (A) risk of bias and (B) concerns for applicability of 21 included studies using QUADAS-2. QUADAS-2 = Quality Assessment of Diagnostic Accuracy Studies-2.

3.4 Diagnostic accuracy of PI-RADSv2

The sensitivity and specificity of individual studies were 73–100% and 8–100%, respectively. The Q-test revealed that substantial heterogeneity was present (p < 0.001). The Higgins I² statistic demonstrated substantial heterogeneity in terms of the sensitivity (I² = 85.55%) and considerable heterogeneity in terms of the specificity (I² = 95.30%). The coupled forest plot of the sensitivity and specificity demonstrated the absence of a threshold effect (Fig. 3). The Spearman correlation coefficient between the sensitivity and the false-positive rate was 0.45 (95% confidence interval [CI] 0.023–0.738), also indicating the lack of a threshold effect.
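The relationship between the Q-test and the Higgins I² statistic can be sketched as follows. This is a generic inverse-variance illustration on the logit scale, not the routine used by the Stata modules in the analysis; the function names and the 0.5 continuity correction are assumptions made for this sketch.

```python
import math


def logit_with_variance(events, nonevents):
    """Logit of a proportion and its approximate variance.

    Assumption: 0.5 is added to both counts so the logit stays finite.
    """
    e = events + 0.5
    n = nonevents + 0.5
    return math.log(e / n), 1.0 / e + 1.0 / n


def cochran_q_and_i2(estimates, variances):
    """Cochran's Q and Higgins I² (in percent) for per-study estimates."""
    weights = [1.0 / v for v in variances]
    pooled = sum(w * y for w, y in zip(weights, estimates)) / sum(weights)
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, estimates))
    df = len(estimates) - 1
    # I² is the share of total variation beyond what chance alone would give.
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return q, i2
```

For sensitivity, `events` would be the true positives and `nonevents` the false negatives of each study; for specificity, the true negatives and false positives. Values of I² above roughly 75% are conventionally read as considerable heterogeneity.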


Fig. 3
Coupled forest plots of pooled sensitivity and specificity. Numbers are pooled estimates with 95% CI in parentheses. Corresponding heterogeneity statistics are provided at bottom right corners. Horizontal lines indicate 95% CIs. CI = confidence interval.

For all 21 studies combined, the pooled sensitivity was 0.89 (95% CI 0.86–0.92) with specificity of 0.73 (95% CI 0.60–0.83; Fig. 3 ). In the HSROC curve, there was a large difference between the 95% confidence region and the 95% prediction region, thus indicating heterogeneity between the studies ( Fig. 4 ). The area under the HSROC curve was 0.91 (95% CI 0.88–0.93). According to the Deeks’ funnel plot, the likelihood of publication bias was low, with a p value of 0.75 for the slope coefficient ( Fig. 5 ).
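To give a sense of scale for these pooled estimates, the sketch below converts them into positive and negative likelihood ratios. These ratios are not reported in the analysis above; they are an illustrative back-of-the-envelope conversion from the point estimates only, which ignores the confidence intervals and the correlation between sensitivity and specificity.

```python
def likelihood_ratios(sensitivity, specificity):
    """Positive and negative likelihood ratios from a sensitivity/specificity pair."""
    lr_pos = sensitivity / (1.0 - specificity)
    lr_neg = (1.0 - sensitivity) / specificity
    return lr_pos, lr_neg


# Pooled point estimates for all 21 studies, as stated above.
lr_pos, lr_neg = likelihood_ratios(0.89, 0.73)
print(f"LR+ = {lr_pos:.2f}, LR- = {lr_neg:.2f}")  # LR+ = 3.30, LR- = 0.15
```

Read this way, a positive PI-RADSv2 result raises the odds of PCa about threefold, whereas a negative result lowers them to roughly one-seventh, under the stated simplifications.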


Fig. 4
Hierarchical summary receiver operating characteristic curve of the diagnostic performance of PI-RADSv2 for detecting prostate cancer. HSROC = hierarchical summary receiver operating characteristic; PI-RADSv2 = Prostate Imaging Reporting and Data System version 2.


Fig. 5
Deeks’ funnel plot. A p value of 0.75 suggests that the likelihood of publication bias is low.

For the six studies that provided a head-to-head comparison between PI-RADSv1 and PI-RADSv2, PI-RADSv2 demonstrated higher pooled sensitivity of 0.95 (95% CI 0.85–0.98) compared with 0.88 (95% CI 0.80–0.93) for PI-RADSv1 ( p = 0.04). However, the pooled specificity did not show a significant difference between the two versions of PI-RADS: 0.73 (95% CI 0.47–0.89) for v2 and 0.75 (95% CI 0.36–0.94) for v1 ( p = 0.90).

3.5 Heterogeneity exploration

As we found considerable heterogeneity among the included studies, meta-regression analyses were performed (Supplementary Table 1). Among several potential variables, proportion of patients with PCa, magnetic field strength, and reference standard were significant factors affecting the heterogeneity (p < 0.01 for all). However, among these three, only specificity according to the proportion of patients with PCa showed a statistically significant and clinically meaningful difference: 0.65 (95% CI 0.52–0.78) in studies with ≤50% of patients with PCa versus 0.86 (95% CI 0.75–0.97) in studies with >50% of patients with PCa. Otherwise, no clinically meaningful differences were seen: for magnet strength (3 vs 1.5 T), sensitivity of 0.90 (95% CI 0.86–0.94) versus 0.89 (95% CI 0.81–0.97, p = 0.03) and specificity of 0.73 (95% CI 0.59–0.86) versus 0.72 (95% CI 0.44–1.00, p = 0.81); for reference standard (radical prostatectomy vs biopsy), sensitivity of 0.89 (95% CI 0.83–0.95) versus 0.91 (95% CI 0.88–0.94, p < 0.01) and specificity of 0.65 (95% CI 0.37–0.94) versus 0.73 (95% CI 0.58–0.87, p = 0.48). Other variables, including the cutoff value, use of endorectal coil, and type of analysis (per patient vs per lesion), were not significant factors (p = 0.32–0.70).

3.6 Subgroup analysis

Because four studies used both ≥3 and ≥4 as cutoff values [17 22 23 29] and five studies assessed both any cancer and clinically significant cancer as outcomes [16 21 24 29 32], multiple subgroup analyses were performed in order to assess various clinical settings (Supplementary Table 2). Regarding cutoff values, the pooled sensitivity was 0.89 (95% CI 0.84–0.92) with specificity of 0.74 (95% CI 0.58–0.85) for 17 studies using ≥4 [16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 35], whereas these were 0.95 (95% CI 0.89–0.97) and 0.47 (95% CI 0.21–0.74) in eight studies using ≥3 [17 22 23 29 32 33 34 36]. When we stratified studies according to the outcome assessed, the results were as follows: (1) cutoff of ≥4 for determining any PCa, sensitivity of 0.89 (95% CI 0.83–0.93) with specificity of 0.80 (95% CI 0.62–0.90); (2) cutoff of ≥3 for determining any PCa, sensitivity of 0.96 (95% CI 0.93–0.98) with specificity of 0.49 (95% CI 0.29–0.70); (3) for determining csPCa regardless of cutoff values, sensitivity of 0.89 (95% CI 0.84–0.92) with specificity of 0.64 (95% CI 0.46–0.78); (4) cutoff of ≥4 for determining csPCa, sensitivity of 0.90 (95% CI 0.85–0.94) with specificity of 0.62 (95% CI 0.45–0.77); and (5) cutoff of ≥3 for determining csPCa, sensitivity of 0.96 (95% CI 0.87–0.99) with specificity of 0.29 (95% CI 0.05–0.77). When the studies using a cutoff value of ≥4 were separately assessed according to the type of analysis, per-patient analysis in eight studies [17 18 20 26 27 29 31 35] yielded pooled sensitivity of 0.89 (95% CI 0.81–0.93) with specificity of 0.76 (95% CI 0.60–0.88), whereas per-lesion analysis in nine studies [16 19 21 22 23 24 25 28 30] yielded pooled sensitivity of 0.87 (95% CI 0.83–0.91) with specificity of 0.70 (95% CI 0.44–0.88).

Based on the localization of PCa, the pooled sensitivity was 0.93 (95% CI 0.87–0.96) with specificity of 0.68 (95% CI 0.43–0.86) in seven studies analyzing PZ cancers [20 24 25 28 30 31 33]. The same studies, except for that by Stanzione et al [31], also analyzed TZ cancers; for these, the pooled sensitivity and specificity were 0.88 (95% CI 0.77–0.94) and 0.75 (95% CI 0.59–0.86), respectively.

Studies including patients without previous biopsies yielded sensitivity of 0.82 (95% CI 0.72–0.90) and specificity of 0.75 (95% CI 0.65–0.83), whereas the diagnostic performance values were 0.87 (95% CI 0.80–0.92) and 0.71 (95% CI 0.42–0.89) in studies including patients with a history of previous biopsy.

3.7 Discussion

In our meta-analysis, we assessed the diagnostic accuracy of PI-RADSv2 for detecting PCa. The pooled sensitivity and specificity of all 21 studies were 0.89 (95% CI 0.86–0.92) and 0.73 (95% CI 0.60–0.83), respectively. When comparing our data with the only two existing meta-analyses using mpMRI for detecting PCa, a trend toward higher sensitivity and lower specificity can be inferred. In the study by de Rooij et al [37], which evaluated seven studies using a combination of T2WI, DWI, and DCE-MRI, the pooled sensitivity and specificity were 0.74 (95% CI 0.66–0.81) and 0.88 (95% CI 0.82–0.92), respectively. In a more recent meta-analysis by Hamoen et al [6], which analyzed 14 studies using PI-RADSv1, the pooled sensitivity and specificity were 0.78 (95% CI 0.70–0.84) and 0.79 (95% CI 0.68–0.86), respectively. However, setting the three meta-analyses side by side provides only an indirect comparison. In order to address this issue, we separately assessed a subgroup of studies using both PI-RADSv1 and PI-RADSv2. In a head-to-head comparison between them, PI-RADSv2 demonstrated higher pooled sensitivity (0.95) compared with PI-RADSv1 (0.88, p = 0.04) without a statistically significant difference in specificity (0.73 vs 0.75, p = 0.90). This increase in sensitivity compared with its predecessor may imply that the revisions undertaken during the development of PI-RADSv2, including the introduction of dominant sequences according to zonal anatomy, limited contribution of DCE-MRI secondary to DWI and T2WI, and specific guidelines for deriving an integrated overall score, were, in fact, on the right track. In particular, we speculate that the use of dominant sequences, that is, DWI for the PZ and T2WI for the TZ, may have been crucial for the improved sensitivity without a loss in specificity, as suggested by Baur et al [10].

Considering that one of the main intentions for the generation of PI-RADS was to standardize reporting of mpMRI in order to decrease variability and bring about widespread acceptance and implementation in daily practice, it was promising to find that nearly all (20 of 21) studies used PI-RADSv2 strictly according to published guidelines [11] . Only one study formed PI-RADSv2 scores from existing clinical radiological reports that were based on PI-RADSv1 or an in-house scoring system [29] . This is an improvement when compared with prior studies conducted using PI-RADSv1, where investigators used varying methods in determining the overall score (overall five-point score or sum of the scores from each modality) [6] . Still, there is a need for further clarification regarding the cutoff value for detecting PCa. In the studies included in our meta-analysis, cutoff values were predefined in only six studies, while the majority (15/21) were exploratory in nature, testing multiple criteria. When using a cutoff value of ≥4, sensitivity (0.89) and specificity (0.74) were generally good, whereas using ≥3 yielded excellent sensitivity (0.95) and poor specificity (0.47). These results may be taken into consideration when generating the next updated PI-RADS. For instance, using the former may be adequate for general use of PI-RADS, whereas the latter could be proposed to be indicated when a higher cancer detection rate is clinically required (ie, persistently high PSA level despite a previously negative biopsy).
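The trade-off between the two cutoffs can be made more concrete with likelihood ratios and predictive values. The following is a minimal sketch that plugs the pooled estimates quoted above into the standard formulas; the prevalence of 0.5 is a hypothetical value chosen only for illustration, and applying these formulas to pooled summary estimates is an approximation rather than part of the original analysis.

```python
def likelihood_ratios(sens, spec):
    """Return (LR+, LR-) from sensitivity and specificity."""
    return sens / (1 - spec), (1 - sens) / spec


def predictive_values(sens, spec, prevalence):
    """Return (PPV, NPV) for a given disease prevalence (Bayes' rule)."""
    ppv = sens * prevalence / (sens * prevalence + (1 - spec) * (1 - prevalence))
    npv = spec * (1 - prevalence) / (spec * (1 - prevalence) + (1 - sens) * prevalence)
    return ppv, npv


# Pooled (sensitivity, specificity) per cutoff, as reported in the text.
POOLED = {">=4": (0.89, 0.74), ">=3": (0.95, 0.47)}
ASSUMED_PREVALENCE = 0.5  # hypothetical, not taken from the study

results = {}
for cutoff, (sens, spec) in POOLED.items():
    lr_pos, lr_neg = likelihood_ratios(sens, spec)
    ppv, npv = predictive_values(sens, spec, ASSUMED_PREVALENCE)
    results[cutoff] = {"lr_pos": lr_pos, "lr_neg": lr_neg, "ppv": ppv, "npv": npv}
    print(f"cutoff {cutoff}: LR+ {lr_pos:.2f}, LR- {lr_neg:.2f}, PPV {ppv:.2f}, NPV {npv:.2f}")
```

Under these assumptions, the ≥4 cutoff gives the larger positive likelihood ratio, whereas the ≥3 cutoff gives the higher negative predictive value, which mirrors the suggested use of ≥3 when missing a cancer is the main concern.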

In the current study, subgroup analyses were performed to account for differences in outcomes (any cancer vs clinically significant cancer). There was no significant difference for using either outcome irrespective of whether the criteria of ≥3 or ≥4 were used. However, the definition of clinically significant cancer was different among the 13 studies. Only three studies defined csPCa strictly according to the PI-RADSv2 guidelines (Gleason score ≥7 [3 + 4], volume ≥0.5 ml, or extraprostatic extension) [11]. Most others used one or two of the three criteria. Including only the former three studies may have provided more robust results; however, including all available studies was not only pragmatic but also presents a general overview of the existing literature, as this is the first meta-analysis of studies dealing with PI-RADSv2.

In this meta-analysis, we looked into the technical aspects of MRI. Meta-regression analyses revealed that the use of endorectal coil was not a statistically significant factor. Furthermore, although magnet strength showed statistically significant differences between 3 and 1.5 T, this did not prove to be clinically meaningful (sensitivity of 0.90 vs 0.89, p = 0.03, respectively). Although there had been debate over these two issues in the past, both 3 and 1.5 T are now well established, and the overall benefit of using an endorectal coil is not evident [38 39]. The PI-RADSv2 guidelines currently recommend either usage, and the results of our study provide additional evidence to support this.

Regarding the methods of analysis in the studies, there was significant heterogeneity regarding reference standard and type of analysis. Radical prostatectomy was the reference standard in five studies, while the majority were based on a combination of systematic and targeted biopsies. The possibility of PCa despite negative biopsy results in the latter group should be kept in mind. In addition, approximately half of the studies reported outcomes in a per-patient manner (n = 11) and the other half in a per-lesion manner (n = 10). Per-lesion analysis is known to take into account the performance of localizing the disease; however, this was not shown to be a significant factor in the meta-regression analysis.

Our meta-analysis had some limitations. Nearly all studies were retrospective in study design, resulting in a high risk of bias for patient selection. It is possible that pooling data from predominantly retrospective studies may have led to increased diagnostic sensitivity [40]. However, not only was performing a meta-analysis using only three prospective studies technically unfeasible, but the derived results would also not be representative of the existing literature on PI-RADSv2. Moreover, we used validated methods for the systematic review and reported the data using standard reporting guidelines, including PRISMA and the guidelines of the Handbook for Diagnostic Test Accuracy Reviews published by the Cochrane Collaboration [12 41]. Another limitation is considerable heterogeneity in our pooled analysis, which affected the general applicability of our summary estimates. To explore the heterogeneity of our data, we performed meta-regression and multiple subgroup analyses. According to the analyses, the proportion of patients with PCa, the magnetic field strength, and the reference standard were significant factors affecting the heterogeneity. In particular, the reference standard included various methods, including radical prostatectomy and a combination of systematic and targeted biopsies (ie, MRI guided, MRI-transrectal ultrasound fusion, or cognitive). Furthermore, the fact that various definitions were used for clinically significant cancer needs to be emphasized. Our analyses using meta-regression and subgroup analyses may explain some of the heterogeneity, but a portion remains unexplained. Another important limitation is the small number of included studies for head-to-head comparison between PI-RADSv1 and PI-RADSv2. However, we were able to demonstrate a statistically significant difference in sensitivity between the two versions using only the six studies that have been published until now.

PI-RADSv2 shows good performance for the detection of PCa with pooled sensitivity of 0.89 and specificity of 0.73. PI-RADSv2 has higher pooled sensitivity compared with PI-RADSv1 without significantly different specificity.

Author contributions: Sang Youn Kim had full access to all the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis.

Study concept and design: Woo, Suh, S.Y. Kim.

Acquisition of data: Woo, Suh, S.Y. Kim.

Analysis and interpretation of data: Woo, Suh, S.Y. Kim.

Drafting of the manuscript: Woo.

Critical revision of the manuscript for important intellectual content: Suh, S.Y. Kim, Cho, S.H. Kim.

Statistical analysis: Suh.

Obtaining funding: None.

Administrative, technical, or material support: None.

Supervision: S.Y. Kim, Cho, S.H. Kim.

Other: None.

Financial disclosures: Sang Youn Kim certifies that all conflicts of interest, including specific financial interests and relationships and affiliations relevant to the subject matter or materials discussed in the manuscript (eg, employment/affiliation, grants or funding, consultancies, honoraria, stock ownership or options, expert testimony, royalties, or patents filed, received, or pending), are the following: None.

Funding/Support and role of the sponsor: None.

  • [1] American Cancer Society. Key statistics for prostate cancer. http://www.cancer.org/cancer/prostatecancer/detailedguide/prostate-cancer-key-statistics
  • [2] C.M. Hoeks, J.O. Barentsz, T. Hambrock, et al. Prostate cancer: multiparametric MR imaging for detection, localization, and staging. Radiology. 2011;261:46-66
  • [3] O. Ruprecht, P. Weisser, B. Bodelle, H. Ackermann, T.J. Vogl. MRI of the prostate: interobserver agreement compared with histopathologic outcome after radical prostatectomy. Eur J Radiol. 2012;81:456-460
  • [4] L. Dickinson, H.U. Ahmed, C. Allen, et al. Scoring systems used for the interpretation and reporting of multiparametric MRI for prostate cancer detection, localization, and characterization: could standardization lead to improved utilization of imaging within the diagnostic pathway? J Magn Reson Imaging. 2013;37:48-58
  • [5] J.O. Barentsz, J. Richenberg, R. Clements, et al. ESUR prostate MR guidelines 2012. Eur Radiol. 2012;22:746-757
  • [6] E.H. Hamoen, M. de Rooij, J.A. Witjes, J.O. Barentsz, M.M. Rovers. Use of the Prostate Imaging Reporting and Data System (PI-RADS) for prostate cancer detection with multiparametric magnetic resonance imaging: a diagnostic meta-analysis. Eur Urol. 2015;67:1112-1121
  • [7] M. Abd-Alazeez, H.U. Ahmed, M. Arya, et al. The accuracy of multiparametric MRI in men with negative biopsy and elevated PSA level: can it rule out clinically significant prostate cancer? Urol Oncol. 2014;32:45.e17-22
  • [8] A.B. Rosenkrantz, S. Kim, R.P. Lim, et al. Prostate cancer localization using multiparametric MR imaging: comparison of Prostate Imaging Reporting and Data System (PI-RADS) and Likert scales. Radiology. 2013;269:482-492
  • [9] B.G. Hansford, Y. Peng, Y. Jiang, et al. Dynamic contrast-enhanced MR imaging curve-type analysis: is it helpful in the differentiation of prostate cancer from healthy peripheral zone? Radiology. 2015;275:448-457
  • [10] A.D. Baur, A. Maxeiner, T. Franiel, et al. Evaluation of the Prostate Imaging Reporting and Data System for the detection of prostate cancer by the results of targeted biopsy of the prostate. Invest Radiol. 2014;49:411-420
  • [11] J.C. Weinreb, J.O. Barentsz, P.L. Choyke, et al. PI-RADS Prostate Imaging-Reporting and Data System: 2015, version 2. Eur Urol. 2016;69:16-40
  • [12] A. Liberati, D.G. Altman, J. Tetzlaff, et al. The PRISMA statement for reporting systematic reviews and meta-analyses of studies that evaluate health care interventions: explanation and elaboration. J Clin Epidemiol. 2009;62:e1-e34
  • [13] P. Whiting, A.W. Rutjes, J.B. Reitsma, P.M. Bossuyt, J. Kleijnen. The development of QUADAS: a tool for the quality assessment of studies of diagnostic accuracy included in systematic reviews. BMC Med Res Methodol. 2003;3:25
  • [14] C.H. Suh, S.H. Park. Successful publication of systematic review and meta-analysis of studies evaluating diagnostic test accuracy. Korean J Radiol. 2016;17:5-6
  • [15] J.J. Deeks, P. Macaskill, L. Irwig. The performance of tests of publication bias and other sample size effects in systematic reviews of diagnostic test accuracy was assessed. J Clin Epidemiol. 2005;58:882-893
  • [16] T. Auer, M. Edlinger, J. Bektic, et al. Performance of PI-RADS version 1 versus version 2 regarding the relation with histopathological results. World J Urol. 2017;35:687-693
  • [17] M. Baldisserotto, E.J. Neto, G. Carvalhal, et al. Validation of PI-RADS v.2 for prostate cancer diagnosis with MRI at 3T using an external phased-array coil. J Magn Reson Imaging. 2016;44:1354-1359
  • [18] P. De Visschere, N. Lumen, P. Ost, K. Decaestecker, E. Pattyn, G. Villeirs. Dynamic contrast-enhanced imaging has limited added value over T2-weighted imaging and diffusion-weighted imaging when using PI-RADSv2 for diagnosis of clinically significant prostate cancer in patients with elevated PSA. Clin Radiol. 2017;72:23-32
  • [19] H.A.E.-K.A. El-Samei, M.F. Amin, E.E. Hassan. Assessment of the accuracy of multi-parametric MRI with PI-RADS 2.0 scoring system in the discrimination of suspicious prostatic focal lesions. Egypt J Radiol Nucl Med. 2016;47:1075-1082
  • [20] Z.Y. Feng, L. Wang, X.D. Min, S.G. Wang, G.P. Wang, J. Cai. Prostate cancer detection with multiparametric magnetic resonance imaging: Prostate Imaging Reporting and Data System version 1 versus version 2. Chin Med J. 2016;129:2451-2459
  • [21] M. Kasel-Seibert, T. Lehmann, R. Aschenbach, et al. Assessment of PI-RADS v2 for the detection of prostate cancer. Eur J Radiol. 2016;85:726-731
  • [22] W.C. Lin, V.F. Muglia, G.E. Silva, S. Chodraui Filho, R.B. Reis, A.C. Westphalen. Multiparametric MRI of the prostate: diagnostic performance and interreader agreement of two scoring systems. Br J Radiol. 2016;89:20151056
  • [23] E. Martorana, G.M. Pirola, M. Scialpi, et al. Lesion volume predicts prostate cancer risk and aggressiveness: validation of its value alone and matched with Prostate Imaging Reporting and Data System score. BJU Int. In press. doi:10.1111/bju.13649
  • [24] F.V. Mertan, M.D. Greer, J.H. Shih, et al. Prospective evaluation of the Prostate Imaging Reporting and Data System version 2 for prostate cancer detection. J Urol. 2016;196:690-696
  • [25] B.G. Muller, J.H. Shih, S. Sankineni, et al. Prostate cancer: interobserver agreement and accuracy with the revised Prostate Imaging Reporting and Data System at multiparametric MR imaging. Radiology. 2015;277:741-750
  • [26] J.J. Park, B.K. Park. Role of PI-RADSv2 with multiparametric MRI in determining who needs active surveillance or definitive treatment according to PRIAS. J Magn Reson Imaging. 2017;45:1753-1759
  • [27] S.Y. Park, D.C. Jung, Y.T. Oh, et al. Prostate cancer: PI-RADS version 2 helps preoperatively predict clinically significant cancers. Radiology. 2016;280:108-116
  • [28] S. Polanec, T.H. Helbich, H. Bickel, et al. Head-to-head comparison of PI-RADS v2 and PI-RADS v1. Eur J Radiol. 2016;85:1125-1131
  • [29] A.R. Rastinehad, N. Waingankar, B. Turkbey, et al. Comparison of multiparametric MRI scoring systems and the impact on cancer detection in patients undergoing MR US fusion guided prostate biopsies. PLoS One. 2015;10:e0143404
  • [30] A.B. Rosenkrantz, J.S. Babb, S.S. Taneja, J.M. Ream. Proposed adjustments to PI-RADS version 2 decision rules: impact on prostate cancer detection. Radiology. In press. doi:10.1148/radiol.2016161124
  • [31] A. Stanzione, M. Imbriaco, S. Cocozza, et al. Biparametric 3T magnetic resonance imaging for prostatic cancer detection in a biopsy-naive patient population: a further improvement of PI-RADS v2? Eur J Radiol. 2016;85:2269-2274
  • [32] N. Tan, W.C. Lin, P. Khoshnoodi, et al. In-Bore 3-T MR-guided transrectal targeted prostate biopsy: Prostate Imaging Reporting and Data System version 2-based diagnostic performance for detection of prostate cancer. Radiology. In press. doi:10.1148/radiol.2016152827
  • [33] S. Tewes, N. Mokov, D. Hartung, et al. Standardized reporting of prostate MRI: comparison of the Prostate Imaging Reporting and Data System (PI-RADS) version 1 and version 2. PLoS One. 2016;11:e0162879
  • [34] S. Washino, T. Okochi, K. Saito, et al. Combination of PI-RADS score and PSA density predicts biopsy outcome in biopsy naive patients. BJU Int. 2017;119:225-233
  • [35] S. Woo, S.Y. Kim, J. Lee, S.H. Kim, J.Y. Cho. PI-RADS version 2 for prediction of pathological downgrading after radical prostatectomy: a preliminary study in patients with biopsy-proven Gleason Score 7 (3 + 4) prostate cancer. Eur Radiol. 2016;26:3580-3587
  • [36] C. Zhao, G. Gao, D. Fang, et al. The efficiency of multiparametric magnetic resonance imaging (mpMRI) using PI-RADS Version 2 in the diagnosis of clinically significant prostate cancer. Clin Imaging. 2016;40:885-888
  • [37] M. de Rooij, E.H. Hamoen, J.J. Futterer, J.O. Barentsz, M.M. Rovers. Accuracy of multiparametric MRI for prostate cancer detection: a meta-analysis. Am J Roentgenol. 2014;202:343-351
  • [38] Z.K. Shah, S.N. Elias, R. Abaza, et al. Performance comparison of 1.5-T endorectal coil MRI with 3.0-T nonendorectal coil MRI in patients with prostate cancer. Acad Radiol. 2015;22:467-474
  • [39] Y. Mazaheri, H.A. Vargas, G. Nyman, O. Akin, H. Hricak. Image artifacts on prostate diffusion-weighted magnetic resonance imaging: trade-offs at 1.5 Tesla and 3.0 Tesla. Acad Radiol. 2013;20:1041-1047
  • [40] J.J. Deeks. Systematic reviews in health care: systematic reviews of evaluations of diagnostic and screening tests. BMJ. 2001;323:157-162
  • [41] J.J.W.S. Deeks, C. Davenport. Guide to the contents of a Cochrane diagnostic test accuracy protocol. J.J. Deeks, P.M. Bossuyt, C. Gatsonis. Cochrane handbook for systematic reviews of diagnostic test accuracy version 1.0.0. Chapter 4. (The Cochrane Collaboration, 2013) http://srdta.cochrane.org/

Prostate cancer (PCa) is the second leading cause of cancer-related mortality in Western men [1]. With recent technological advances and growing availability, multiparametric magnetic resonance imaging (mpMRI) is increasingly being used to guide several aspects of PCa management, including detection, staging, and treatment planning [2]. Despite abundant evidence in the literature reporting high accuracy of mpMRI for PCa diagnosis, widespread acceptance has been hampered by several factors including difficulty of interpretation, lack of standardized criteria for interpretation (ie, use of Likert scales based on the radiologist's subjective level of suspicion for PCa), and resulting substantial inter-reader variability [3 4].

To bring standardization to the evaluation and reporting of mpMRI of the prostate, the European Society of Urogenital Radiology (ESUR) published a guideline termed Prostate Imaging Reporting and Data System (PI-RADS) in 2012 [5] . PI-RADS was generated based on expert consensus and provides a detailed scoring system for each MRI sequence (T2-weighted imaging [T2WI], diffusion-weighted imaging [DWI], dynamic contrast-enhanced MRI [DCE-MRI], and MR spectroscopy) for the presence of clinically significant PCa (csPCa). Several investigators have validated the accuracy and reproducibility of PI-RADS, and a recent meta-analysis reported pooled sensitivity and specificity of 0.78 and 0.79, respectively [6] . However, as there was no guideline for the generation of an overall score, different research groups utilized various measures for this purpose—some used a sum of the scores from each sequence (ranging from 3 to 15), whereas others used an overall score of 1–5 [7 8] . Furthermore, emerging data questioned the value of curve-type analysis of DCE-MRI [9] . In addition, investigators suggested that some sequences may be more important in determining the likelihood of PCa (ie, DWI in the peripheral zone [PZ] and T2WI in the transition zone [TZ]) rather than equal weighting for all sequences [10] .

To address these issues, the ESUR and American College of Radiology recently released the updated PI-RADS version 2 (PI-RADSv2) [11] . The main changes from PI-RADSv1 to PI-RADSv2 are the following: (1) introduction of dominant sequences according to zonal anatomy (DWI for the PZ and T2WI for the TZ), (2) limited contribution of DCE-MRI data as merely presence and absence of early focal enhancement, and (3) generation of an overall score (1–5) integrating findings across all MRI sequences.

Since then, several studies assessing the value of PI-RADSv2 have been published. However, the diagnostic performance of this new scoring system has not been evaluated systematically. Therefore, the purpose of our study was to assess the diagnostic performance of PI-RADSv2 for the detection of PCa. In addition, we aimed to compare the diagnostic performance of PI-RADSv1 and PI-RADSv2 in studies available for head-to-head comparison.

This meta-analysis was performed and written according to the preferred reporting items for systematic reviews and meta-analyses (PRISMA) guidelines [12] .

2.1 Literature search

A computerized search of MEDLINE and EMBASE up to December 7, 2016, was performed in order to identify studies evaluating the diagnostic performance of PI-RADSv2 for the detection of PCa. The search query combined synonyms for PCa, MRI, and PI-RADS as follows: (prostate cancer OR prostatic cancer OR prostate neoplasm OR prostatic neoplasm OR prostate tumor OR prostatic tumor OR prostate carcinoma OR prostatic carcinoma OR PCa) AND (magnetic resonance imaging OR MRI OR MR) AND (prostate imaging reporting and data system OR pi-rads OR pi rads OR pirads). Bibliographies of identified articles were also screened to expand the scope of search. Our search was limited to publications in English.
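The structure of the search query (synonyms joined by OR within each concept, concepts joined by AND) can be sketched as follows. The term lists are copied from the query quoted above; the helper names `or_group` and `build_query` are hypothetical and are not part of any database interface.

```python
# Synonym groups copied from the search strategy described in the text.
CANCER_TERMS = [
    "prostate cancer", "prostatic cancer", "prostate neoplasm", "prostatic neoplasm",
    "prostate tumor", "prostatic tumor", "prostate carcinoma", "prostatic carcinoma", "PCa",
]
MRI_TERMS = ["magnetic resonance imaging", "MRI", "MR"]
PIRADS_TERMS = ["prostate imaging reporting and data system", "pi-rads", "pi rads", "pirads"]


def or_group(terms):
    """Join the synonyms of one concept with OR and wrap them in parentheses."""
    return "(" + " OR ".join(terms) + ")"


def build_query(*groups):
    """Combine the per-concept OR groups with AND."""
    return " AND ".join(or_group(group) for group in groups)


query = build_query(CANCER_TERMS, MRI_TERMS, PIRADS_TERMS)
print(query)
```

Keeping each concept as a separate list makes it straightforward to add or remove a synonym without disturbing the overall boolean structure.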

2.2 Study selection

2.2.1 Inclusion criteria

Studies were included if they satisfied all the following requirements according to the PICOS criteria [12] : (1) included patients with suspected or diagnosed PCa; (2) for index test, mpMRI of the prostate including all required sequences of T2WI, DWI, and DCE-MRI was performed and assessed with a PI-RADSv2 scoring system; (3) for comparison, a reference standard based on the histopathological examination of radical prostatectomy or biopsy was used; (4) results were reported in sufficient detail for the reconstruction of 2 × 2 tables and determination of sensitivity and specificity at specified cutoff values for evaluating the diagnostic performance of PI-RADSv2; and (5) studies had to be original articles.
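Criterion (4) requires enough detail to rebuild a 2 × 2 table for each study. As a minimal sketch of what that reconstruction yields, the class below derives sensitivity and specificity from the four cell counts; the class name `TwoByTwo` and the counts are hypothetical and do not come from any included study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Cell counts of a diagnostic 2 x 2 table (index test vs reference standard)."""
    tp: int  # index test positive, reference standard positive
    fp: int  # index test positive, reference standard negative
    fn: int  # index test negative, reference standard positive
    tn: int  # index test negative, reference standard negative

    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)


# Hypothetical counts chosen for illustration only.
table = TwoByTwo(tp=89, fp=27, fn=11, tn=73)
print(f"sensitivity {table.sensitivity():.2f}, specificity {table.specificity():.2f}")
```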

2.2.2 Exclusion criteria

Studies were excluded if any of the following criteria were met: (1) studies involving <10 patients; (2) review articles, guidelines, consensus statements, letters, editorials, and conference abstracts; (3) studies using only PI-RADSv1 for the evaluation of mpMRI of the prostate; (4) studies focusing on topics other than using the PI-RADSv2 system for diagnosing PCa (ie, staging and prediction of biochemical recurrence); and (5) studies with overlapping patient population.

Two reviewers (S.W. and C.H.S., with 3 yr of experience in performing systematic reviews and meta-analyses) independently evaluated the eligibility of the selected studies from the literature. Disagreements, if present between the two reviewers, were resolved by consensus via discussion with a third reviewer (S.Y.K.).

2.3 Data extraction and quality assessment

We extracted the following data regarding study design and results from the selected studies using a standardized form:

  1. Study characteristics: authors, year of publication, country and institution of origin, duration of patient recruitment, and study design (prospective vs retrospective and consecutive or not)

  2. Demographic and clinical characteristics: sample size, number of patients with PCa, patient age, prostate-specific antigen (PSA) level and Gleason score, number of previous biopsies, and PCa diagnosis prior to mpMRI

  3. Technical characteristics of mpMRI: scanner model and manufacturer, magnetic field strength (1.5 vs 3 T), coil type (endorectal vs pelvic phased array), and specific sequences used (T2WI, DWI, DCE, or MR spectroscopy)

  4. Interpretation of mpMRI: number of reviewers and experience in prostate mpMRI, independent or consensus reading, and blinding to clinicopathological information

  5. Reference standard: type of reference standard (radical prostatectomy, targeted biopsy, or systematic biopsy), interval between MRI and pathology, outcomes assessed (any PCa vs csPCa), definition of csPCa (studies assessing “clinically significant”, “aggressive”, or “high-grade” PCa were all considered to assess csPCa; however, only studies that used the definition as provided by the PI-RADSv2 guideline [Gleason score ≥7 (3 + 4), volume ≥0.5 ml, or extraprostatic extension] were considered not to have concern for applicability), separate analysis for the PZ and TZ, and type of analysis (per patient vs per lesion)

  6. Diagnostic performance of PI-RADSv2 including criteria or cutoff values (in case of multiple readers, the results of the most experienced reader were extracted for this meta-analysis)

The methodological quality of the included studies was assessed using tailored questionnaires and criteria provided by Quality Assessment of Diagnostic Accuracy Studies-2 [13] . Data extraction and quality assessment were performed independently by two reviewers (S.W. and C.H.S.). All disagreements were resolved by consensus through discussion with the third reviewer (S.Y.K.).

2.4 Data synthesis and analysis

The diagnostic performance of PI-RADSv2 for the detection of PCa was the primary outcome for this meta-analysis. In addition, a comparison between the diagnostic performance of PI-RADSv2 and that of PI-RADSv1 using studies that reported head-to-head comparison data of the two PI-RADS versions was considered a secondary outcome.

Pooled estimates of sensitivity and specificity were calculated using hierarchical logistic regression modeling including bivariate modeling and hierarchical summary receiver operating characteristic (HSROC) modeling [14] . For graphical presentation of the results, an HSROC curve with 95% confidence region and prediction region was plotted. Publication bias was evaluated using the Deeks’ funnel plot, and statistical significance was tested with the Deeks’ asymmetry test [15] .
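To illustrate the general idea of pooling on the logit scale, the sketch below uses a deliberately simpler technique than the one described above: a univariate DerSimonian-Laird random-effects model applied to one proportion at a time. It is not the bivariate or HSROC model used in this meta-analysis, which models sensitivity and specificity jointly together with their correlation. The function name `pool_proportion` and all counts are hypothetical.

```python
import math


def expit(x):
    """Inverse of the logit transform."""
    return 1 / (1 + math.exp(-x))


def pool_proportion(counts):
    """Univariate DerSimonian-Laird pooling of proportions on the logit scale.

    counts: list of (events, non_events), e.g. (TP, FN) for sensitivity.
    Returns (pooled estimate, lower 95% limit, upper 95% limit).
    """
    logits, variances = [], []
    for events, non_events in counts:
        if events == 0 or non_events == 0:
            # Continuity correction, applied only when a cell is empty.
            events, non_events = events + 0.5, non_events + 0.5
        logits.append(math.log(events / non_events))
        variances.append(1 / events + 1 / non_events)

    # Fixed-effect (inverse-variance) estimate, needed for Cochran's Q.
    w = [1 / v for v in variances]
    fixed = sum(wi * yi for wi, yi in zip(w, logits)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, logits))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    # Between-study variance, truncated at zero.
    tau2 = max(0.0, (q - (len(logits) - 1)) / c) if c > 0 else 0.0

    # Random-effects weights and pooled logit.
    w_re = [1 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_re, logits)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))
    return expit(pooled), expit(pooled - 1.96 * se), expit(pooled + 1.96 * se)


# Hypothetical (TP, FN) counts from three imaginary studies.
estimate, lower, upper = pool_proportion([(45, 5), (80, 20), (30, 3)])
print(f"pooled sensitivity {estimate:.2f} (95% CI {lower:.2f}-{upper:.2f})")
```

The same function could be applied to (TN, FP) counts for specificity, but doing so separately ignores the trade-off between the two measures across thresholds, which is the reason hierarchical models are preferred for diagnostic accuracy data.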

We performed meta-regression analyses to investigate the cause of heterogeneity. The following covariates were considered for the bivariate model: (1) proportion of patients with PCa (>50% vs ≤50%), (2) magnet strength of MRI (3 vs 1.5 T), (3) use of endorectal coil, (4) cutoff value (≥4 vs ≥3), (5) reference standard (radical prostatectomy vs biopsy), and (6) type of analysis (per patient vs per lesion). In addition, multiple subgroup analyses were performed for cutoff value, outcome, and previous biopsy history to assess various clinical settings: (1) a cutoff value of ≥4 for all studies, (2) a cutoff value of ≥3 for all studies, (3) a cutoff value of ≥4 for determining any PCa, (4) a cutoff value of ≥3 for determining any PCa, (5) a cutoff value of ≥4 for determining csPCa, (6) a cutoff value of ≥3 for determining csPCa, (7) a cutoff value of ≥4 in studies using per-patient analysis, (8) a cutoff value of ≥4 in studies using per-lesion analysis, (9) studies analyzing PZ PCa, (10) studies analyzing TZ PCa, (11) patients without previous biopsies, and (12) patients with previous biopsies. The “metandi” and “midas” modules in Stata 10.0 (StataCorp LP, College Station, TX, USA) and “mada” package in R software version 3.2.1 (R Foundation for Statistical Computing, Vienna, Austria) were used for statistical analyses, with p < 0.05 signifying statistical significance.
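As a small illustration of how the first covariate might be coded, the sketch below dichotomizes studies by the proportion of patients with PCa. The three example rows use the patient counts listed for those studies in Table 1; the function name `pca_group` is hypothetical, and this shows only the covariate coding, not the meta-regression itself.

```python
# (patients, patients with PCa) for three studies, as listed in Table 1.
STUDIES = {
    "Feng [20]": (401, 150),
    "De Visschere [18]": (245, 144),
    "Park [26]": (456, 456),
}


def pca_group(patients, with_pca, threshold=0.5):
    """Return the PCa proportion and its covariate level (>50% vs ≤50%)."""
    proportion = with_pca / patients
    level = ">50%" if proportion > threshold else "≤50%"
    return proportion, level


coded = {name: pca_group(*counts) for name, counts in STUDIES.items()}
for name, (proportion, level) in coded.items():
    print(f"{name}: {proportion:.1%} with PCa, covariate level {level}")
```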

3.1 Literature search

A systematic literature search initially identified 287 articles. After removing 46 duplicates, screening of the 241 titles and abstracts yielded 105 potentially eligible articles. Full-text reviews were performed, and 84 studies were excluded for the following reasons: not in the field of interest (n = 80, including 68 studies that used only PI-RADSv1), insufficient data to reconstruct 2 × 2 tables (n = 2), and shared study population with other studies (n = 2). Ultimately, 21 original articles including a total of 3857 patients assessing the diagnostic performance of PI-RADSv2 were included in the meta-analysis [16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36]. No additional studies were identified via screening the bibliographies of these 21 studies. Among them, 15 studies including 3099 patients dealt with PI-RADSv2 alone, whereas six studies including 758 patients provided a head-to-head comparison between PI-RADSv1 and PI-RADSv2 [16 20 21 28 32 33]. The detailed study selection process is described in Fig. 1.

Fig. 1. PRISMA flow diagram showing study selection process for meta-analysis. a Included original articles for qualitative and quantitative analyses are references [16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36]. MRI = magnetic resonance imaging; PI-RADS = Prostate Imaging Reporting and Data System; PRISMA = Preferred Reporting Items for Systematic Reviews and Meta-analyses.
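The counts reported for the study selection process can be checked for internal consistency with a few lines of arithmetic. The sketch below uses only numbers stated in the text; the variable names are illustrative.

```python
# Counts as reported in the literature search results.
identified = 287
duplicates = 46
screened = identified - duplicates            # titles and abstracts screened
full_text_reviewed = 105
excluded = {
    "not in the field of interest": 80,
    "insufficient data to reconstruct 2 x 2 tables": 2,
    "shared study population": 2,
}
included = full_text_reviewed - sum(excluded.values())

# Split of the included studies and patients.
studies = {"PI-RADSv2 alone": 15, "head-to-head v1 vs v2": 6}
patients = {"PI-RADSv2 alone": 3099, "head-to-head v1 vs v2": 758}

print(f"screened: {screened}, excluded: {sum(excluded.values())}, included: {included}")
print(f"studies: {sum(studies.values())}, patients: {sum(patients.values())}")
```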

3.2 Characteristics of included studies

Patient characteristics are shown in Table 1 . The size of the study population ranged from 49 to 456 patients, with the percentage of those with PCa ranging from 37% to 100%. The patients had a median age of 62–69.6 yr, median PSA of 3.97–15 ng/ml, and a Gleason score ranging from 5 to 10. Patients had already been diagnosed with PCa prior to MRI in all or some of the study populations in seven studies [16 17 22 26 27 32 35] . Biopsy was performed before MRI in seven studies [16 21 22 23 26 27 35] , all patients were biopsy-naïve in three studies [18 25 34] , both patient types were included in three studies [28 32 33] , and data regarding previous biopsy were not reported in the remaining eight studies.

| First author (year) | Country | Institution | Duration of patient recruitment | Patients (n) | Patients with PCa (n) | Age (yr), median | Age (yr), range | PSA (ng/ml), median | PSA (ng/ml), range | Gleason score, median | Gleason score, range | No. of previous biopsies | PCa diagnosis before MRI |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Auer (2016) [16] | Austria | Medical University of Innsbruck, Medizinische Hochschule Hannover | NR | 50 | NR | 63 a | NR | 7.3 a | NR | 7 (3 + 4) | 6–10 | ≥1 | Yes (all) |
| Baldisserotto (2016) [17] | Brazil | Pontifícia Universidade Católica do Rio Grande do Sul | 2013.6–2015.6 | 54 | 33 | 65.9 a | 53–81 | 8.4 a | 3–31 | 7 (3 + 4) | 6–9 | NR | Yes (some) |
| De Visschere (2017) [18] | Belgium | Ghent University Hospital | 2011.5–2014.12 | 245 | 144 | 66 | 44–85 | 9 | 1.4–935.5 | 7 (3 + 4) | ≥6 | 0 | No |
| El-Samei (2016) [19] | Egypt | Al-Azher University, El-Minia University | 2014.5–2015.10 | 55 | 38 | 62 a | 51–79 | NR | NR | 7 | 6–10 | NR | No |
| Feng (2016) [20] | China | Tongji Hospital | 2013.6–2015.7 | 401 | 150 | 64.4 a | 34–88 | 10.7 | 0.2–1763 | 7 (4 + 3) | ≤6–≥8 | NR | NR |
| Kasel-Seibert (2016) [21] | Germany | University Hospital Jena | 2013.7–2015.3 | 82 | 31 | 65 | 48–81 | 13 | 1–111 | 7 (3 + 4) | 6–9 | 1–5 | No |
| Lin (2016) [22] | Brazil, Taiwan | Ribeirão Preto School of Medicine, China Medical University, Ribeirão Preto School of Medicine | 2011.5–2014.6 | 49 | 49 | 63 | 46–73 | 13.27 | 1.8–41.4 | | | ≥1 | Yes (all) |
| Martorana (2016) [23] | Italy | University of Modena and Reggio Emilia, Perugia University | 2014.1–2016.2 | 157 | 79 | 65 | 47–79 b | 10.7 | 1–75 | 6 | 6–8 | 1–3 | No |
| Mertan (2016) [24] | USA | National Cancer Institute | 2015.3–2015.9 | 62 | 38 | 65.5 | 50.3–76.6 | 7.1 | 0.5–863 | 7 (3 + 4) | 6–9 | NR | NR |
| Muller (2015) [25] | USA, the Netherlands | National Cancer Institute, AMC University Hospital, Edward Hébert School of Medicine | 2011.12–2014.5 | 94 | 94 | 62 | 37–79 | 8.51 | 0.7–51.1 | 7 (3 + 4) | 6–10 | 0 | No |
| Park (2016) [26] | South Korea | Samsung Medical Center | 2012.1–2014.12 | 456 | 456 | 65 | 46–81 | 3.97 | 3.8–4.2 | NR | ≥6 | ≥1 | Yes (all) |
| Park (2016) [27] | South Korea | Yonsei University College of Medicine | 2012.1–2013.3 | 425 | 425 | NR | NR | NR | 1.4–156.9 | 5–6 | 5–10 | ≥1 | Yes (all) |
| Polanec (2016) [28] | Austria | Medical University of Vienna, Confraternität Vienna | 2011.6–2015.9 | 65 | NR | 65.3 a | 62.3–87.4 | 10.8 a | 4.2–74.5 | 7 (4 + 3) | 6–9 | ≥0 | No |
| Rastinehad (2015) [29] | USA | Icahn School of Medicine at Mount Sinai, Fox Chase Cancer Center, National Institutes of Health, Hofstra North Shore LIJ School of Medicine | 2012.2–2014.11 | 312 | 202 | 65.1 | 60.3–70.3 b | 7.3 | 5.0–11.4 b | NR | NR | NR | No |
| Rosenkrantz (2016) [30] | USA | NYU Langone Medical Center | 2013.9–2015.2 | 343 | 134 | 64 | NR | 5.8 | NR | 7 (3 + 4) | 6–9 | NR | NR |
| Stanzione (2016) [31] | Italy | University “Federico II”, Ospedale S. Maria delle Grazie | NR | 82 | 34 | 65 a | 43–84 | 8.8 a | NR | 7 (3 + 4) | 6–9 | NR | No |
| Tan (2016) [32] | USA, Taiwan | UCLA, Cathay General Hospital, Hoag Hospital | 2013.3–2016.12 | 106 | 63 | 66.5 | 43–79 | 7.9 | 5.6–10.6 | 7 (3 + 4) | 6–10 | ≥0 | Yes (some) |
| Tewes (2016) [33] | Germany | Hannover Medical School, Klinikum der Region Hannover | 2012.12–2014.12 | 54 | 31 | 69.6 a | NR | 8.7 a | NR | 6 | 6–9 | ≥0 | No |
| Washino (2017) [34] | Japan | Jichi Medical University Saitama Medical Center | 2010.6–2014.4 | 288 | 159 | 69 | 64–74 b | 7.5 | 5.5–11 b | NR | NR | 0 | No |
| Woo (2016) [35] | South Korea | Seoul National University College of Medicine | 2011.1–2013.12 | 105 | 105 | 69 | 49–79 | 8.22 | 0.9–44.2 | 7 | 6–7 | ≥1 | Yes (all) |
| Zhao (2016) [36] | China | Peking University First Hospital | 2010.11–2013.12 | 372 | 185 | 68.5 a | NR | 15 a | NR | 7 | 6–10 | NR | No |

a Mean.

b Interquartile range.

MRI = magnetic resonance imaging; NR = not reported; PCa = prostate cancer; PSA = prostate-specific antigen.

Table 1. Patient characteristics

Characteristics of the studies are summarized in Table 2. MRI was performed using 3-T scanners in 16 studies [16 17 18 20 24 25 26 27 28 29 30 31 32 33 35 36], 1.5-T scanners in four studies [19 21 22 23], and either 3 or 1.5 T in one study [34]. Endorectal coils were used in seven studies [19 21 22 23 24 25 29]. In all studies, the mpMRI protocol consisted of T2WI, DWI, and DCE-MRI. The reference standard was radical prostatectomy in five studies [16 22 26 27 35], a combination of systematic and targeted biopsies in seven studies [17 20 23 25 29 34 36], and only targeted biopsy in seven studies [19 21 24 28 30 32 33]; the reference standard was not consistent throughout the study population in two studies [18 31]. PI-RADSv2 scoring was performed by one to five radiologists, either in consensus or independently. The level of experience of the radiologists was heterogeneous, ranging from 4 to 22 yr of experience in prostate imaging. In most studies, the readers were blinded; however, in three studies, the radiologists were aware that the patients had biopsy-proven PCa [22 26 35], and five studies were not explicit regarding blinding [18 19 30 32 36]. In the majority of the studies, the interval between MRI and the reference standard was less than 6 mo; however, the details were not reported in 10 studies [16 17 18 24 27 28 29 30 32 33]. PCa was separately assessed according to zonal anatomy in seven studies [20 24 25 28 30 31 33]. However, in one study [31], only PCa in the PZ could be evaluated, as no detailed data were provided in the article and the attempt to contact the authors for provision of further information was unsuccessful. Regarding the outcome assessed, eight studies evaluated any cancer [17 19 20 23 25 28 31 33], eight evaluated clinically significant cancer [18 22 26 27 30 34 35 36], and five evaluated both [16 21 24 29 32].
With regard to cutoff values, 13 studies used ≥4 [16 18 19 20 21 24 25 26 27 28 30 31 35], four studies used ≥3 [32 33 34 36], and four studies used both [17 22 23 29]. The location of PCa was separately reported for the PZ and TZ in six studies [20 24 25 28 30 33].

Study Reader MRI
First author (year) Design Consecutive enrollment Reference standard MRI–reference standard interval No. of readers Experience (yr) Blinding Magnet strength (T) Vendor Model Endorectal coil PI-RADSv2 application Cutoff values Localization Type of analysis Outcome assessed Definition of csPCa
Auer (2016) [16] Retrospective Yes RP NR 1 >5 Yes 3 Siemens Skyra No Strict 4 Whole Lesion Any + csPCa GS ≥7 (4 + 3)
Baldisserotto (2016) [17] Retrospective Yes STRUSGB + targeted MRI–TRUS biopsy (cognitive) or RP NR 2 (independent) >10/1 Yes 3 GE Signa HDxt No Strict 3, 4 Whole Patient Any
De Visschere (2017) [18] Retrospective NR STRUSGB or RP NR NR NR NR 3 Siemens Trio No Strict 4 Whole Patient csPCa GS ≥7, ≥0.5 cc, or EPE
El-Samei (2016) [19] NR NR Targeted MRI–TRUS biopsy ≤2 wk NR NR NR 1.5 Philips Gyroscan Yes Strict 4 Whole Lesion Any
Feng (2016) [20] Retrospective Yes STRUSGB + targeted MRI–TRUS biopsy (cognitive) 1–35 d 2 (consensus) 5/4 Yes 3 Siemens Skyra No Strict 4 PZ, TZ Patient Any
Kasel-Seibert (2016) [21] Retrospective NR Targeted MRGB ≤7 d 2 (independent) 10/<1 Yes 1.5 Siemens Avanto Yes Strict 4 Whole Lesion Any + csPCa GS ≥7
Lin (2016) [22] Retrospective NR RP ≤6 mo 2 (independent) 12/5 Yes a 1.5 Philips Achieva Yes Strict 3, 4 Whole Lesion csPCa GS ≥7 and >0.5 cc
Martorana (2016) [23] Retrospective Yes TTB + targeted MRI–TRUS biopsy 30–78 2 (consensus) 4/4 Yes 1.5 Philips Achieva Yes Strict 3, 4 Whole Lesion Any GS ≥7, >0.5 cc, or EPE
Mertan (2016) [24] Prospective Yes STRUSGB + targeted MRI–TRUS biopsy NR 1 >8 Yes 3 Philips Achieva Yes Strict 4 PZ, TZ Lesion Any + csPCa GS ≥7
Muller (2015) [25] Retrospective Yes STRUSGB + targeted MRI–TRUS biopsy ≤6 wk 5 (independent) 12/7/1/1/0.5 Yes 3 Philips Achieva Yes Strict 4 PZ, TZ Lesion Any GS ≥7 (4 + 3)
Park (2016) [26] Retrospective Yes RP 21–48 2 (independent) 14/3 Yes a 3 Philips Achieva No Strict 4 Whole Patient csPCa GS ≥7, ≥0.5 cc, or EPE
Park (2016) [27] Retrospective NR RP NR 2 (independent) 9/4 Yes 3 GE, Philips, Siemens Discovery MR750, Achieva, Trio No Strict 4 Whole Patient csPCa GS ≥7, ≥0.5 cc, or EPE
Polanec (2016) [28] Retrospective Yes Targeted MRGB NR 2 (independent) NR/NR Yes 3 Siemens Trio No Strict 4 PZ, TZ Lesion Any
Rastinehad (2015) [29] Prospective NR STRUSGB + targeted MRI–TRUS biopsy NR 3 (consensus) NR Yes 3 Siemens Verio Yes Calculated b 3, 4 Whole Patient Any + csPCa Epstein's criteria c for systematic and GS ≥7 or >0.5 cc for targeted biopsy
Rosenkrantz (2016) [30] Retrospective NR Targeted MRI–TRUS biopsy NR 2 (independent) 8/3 NR 3 Siemens Biograph mMR/Prisma/Skyra/Trio No Strict 4 PZ, TZ Lesion csPCa GS ≥7
Stanzione (2016) [31] Prospective NR STRUSGB (±targeted TRUSGB) 20–30 d 2 (independent) 14/10 Yes 3 Siemens Trio No Strict 4 PZ Patient Any
Tan (2016) [32] Retrospective Yes Targeted MRGB NR 3 (consensus) 16/10 NR 3 Siemens Skyra, Trio, Verio No Strict 3 Whole Lesion Any + csPCa GS ≥7
Tewes (2016) [33] Retrospective Yes Targeted MRGB NR 2 (independent) 5/2 Yes 3 Siemens Skyra No Strict 3 d PZ, TZ Patient Any
Washino (2017) [34] Retrospective NR TTB + targeted MRI–TRUS biopsy (cognitive) 0.5–1.6 mo 1 14 Yes 1.5 or 3 Toshiba Excelart Vantage/Vantage Titan 3T No Strict 3 Whole Patient csPCa GS ≥7 or maximum core length ≥4 mm
Woo (2016) [35] Retrospective NR RP ≤6 mo 2 (consensus) 22/10 Yes a 3 Philips, Siemens Ingenia, Verio/Trio No Strict 4 Whole Patient csPCa GS ≥7
Zhao (2016) [36] Retrospective NR STRUSGB + targeted MRI–TRUS biopsy ≤3 mo 2 (independent) NR NR 3 NR NR No Strict 3 Whole Patient csPCa GS ≥7

a Blinded but aware that patients had biopsy-proven PCa.

b PI-RADSv2 scores were generated from reports based on PI-RADSv1 and simplified qualitative system.

c Epstein's criteria = Gleason pattern ≥4, or Gleason 3 + 3 disease with core length ≥50% and/or >2 cores positive.

d For transition zone, cutoff value = 4.

csPCa = clinically significant prostate cancer; GS = Gleason score; EPE = extraprostatic extension; MRGB = magnetic resonance imaging-guided biopsy; MRI = magnetic resonance imaging; MRI–TRUS = fusion of magnetic resonance imaging and transrectal ultrasound images; NR = not reported; PCa = prostate cancer; PI-RADSv2 = Prostate Imaging Reporting and Data System version 2; PZ = peripheral zone; RP = radical prostatectomy; STRUSGB = systematic transrectal ultrasound-guided biopsy; TTB = transperineal template biopsy; TZ = transition zone.

Table 2. Study characteristics

3.3 Quality assessment

Overall, the quality of the studies was not considered high, mainly due to the patient selection domain (Fig. 2). Regarding the patient selection domain, there was generally a high risk of bias as all but four of the studies were retrospective in nature [16 17 18 20 21 22 23 25 26 27 28 30 32 33 34 35 36]. Seven studies were considered to have high concern for applicability, as all or some of the patients had a pathological diagnosis of PCa prior to MRI [16 17 22 26 27 32 35]. Regarding the index test domain, there was a high risk of bias in nine studies. In three of the nine studies, reviewers were aware that patients had biopsy-proven PCa [22 26 35]. In the other six studies, the cutoff value for determining PCa was not specified prior to interpretation [16 20 29 31 33 34]. Only one study had concern for applicability, as PI-RADSv2 scores were indirectly generated from existing clinical radiological reports based on PI-RADSv1 or an in-house scoring system [29]. Regarding the reference standard domain, eight studies had a high risk of bias. Seven were based on either only systematic biopsy or targeted biopsy [18 19 21 28 30 32 33]; in one study, targeted biopsy was performed, but on lesions that were suspicious on ultrasonography, and not on MRI [31]. Those in which radical prostatectomy or a systematic plus targeted biopsy (MRI guided, MRI–transrectal ultrasound fusion, or cognitive) was used as the reference standard were considered to have a low risk of bias. In 10 studies, the definition of clinically significant cancer did not conform to that described in the PI-RADSv2 guidelines, and these studies therefore showed high concern for applicability [16 21 22 24 29 30 32 34 35 36]. Regarding the flow and timing domain, two studies had a high risk of bias as patients did not receive the same reference standard [17 18].

Fig. 2
Grouped bar charts show (A) risk of bias and (B) concerns for applicability of 21 included studies using QUADAS-2. QUADAS-2 = Quality Assessment of Diagnostic Accuracy Studies-2.

3.4 Diagnostic accuracy of PI-RADSv2

The sensitivity and specificity of individual studies were 73–100% and 8–100%, respectively. The Q-test revealed that substantial heterogeneity was present (p < 0.001). The Higgins I² statistic demonstrated substantial heterogeneity in terms of the sensitivity (I² = 85.55%) and considerable heterogeneity in terms of the specificity (I² = 95.30%). The coupled forest plot of the sensitivity and specificity demonstrated the absence of a threshold effect (Fig. 3). The Spearman correlation coefficient between the sensitivity and the false-positive rate was 0.45 (95% confidence interval [CI] 0.023–0.738), also indicating the lack of a threshold effect.
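
The relationship between the Q-test and the Higgins I² statistic can be illustrated with a short calculation. The following is a minimal sketch only, assuming a simple univariate approach with inverse-variance weights on logit-transformed proportions, which is not the bivariate model used for pooling in this meta-analysis. The function names and the example counts are hypothetical and are not taken from the included studies.

```python
import math

def logit_and_variance(events, total):
    """Logit of a proportion and its approximate variance.

    A 0.5 continuity correction is added to both cells (an assumption of
    this sketch) so that proportions of 0 or 1 remain finite.
    """
    a = events + 0.5
    b = total - events + 0.5
    return math.log(a / b), 1.0 / a + 1.0 / b

def cochran_q(estimates, variances):
    """Cochran's Q: weighted squared deviations from the fixed-effect mean."""
    weights = [1.0 / v for v in variances]
    pooled = sum(w * e for w, e in zip(weights, estimates)) / sum(weights)
    return sum(w * (e - pooled) ** 2 for w, e in zip(weights, estimates))

def i_squared(q, n_studies):
    """Higgins I² in percent: (Q - df) / Q, floored at zero."""
    df = n_studies - 1
    if q <= 0:
        return 0.0
    return max(0.0, (q - df) / q) * 100.0

# Hypothetical (true positives, patients with disease) pairs, illustration only.
studies = [(45, 50), (80, 100), (29, 30), (60, 90)]
logits, variances = zip(*(logit_and_variance(tp, n) for tp, n in studies))
q = cochran_q(logits, variances)
print(f"Q = {q:.2f}, I2 = {i_squared(q, len(studies)):.1f}%")
```

Under this definition, I² expresses the share of the observed variability that exceeds what chance alone would produce given the degrees of freedom, which is why values such as those reported above are read as heterogeneity between studies.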

Fig. 3
Coupled forest plots of pooled sensitivity and specificity. Numbers are pooled estimates with 95% CI in parentheses. Corresponding heterogeneity statistics are provided at bottom right corners. Horizontal lines indicate 95% CIs. CI = confidence interval.

For all 21 studies combined, the pooled sensitivity was 0.89 (95% CI 0.86–0.92) with specificity of 0.73 (95% CI 0.60–0.83; Fig. 3). In the HSROC curve, there was a large difference between the 95% confidence region and the 95% prediction region, thus indicating heterogeneity between the studies (Fig. 4). The area under the HSROC curve was 0.91 (95% CI 0.88–0.93). According to the Deeks’ funnel plot, the likelihood of publication bias was low, with a p value of 0.75 for the slope coefficient (Fig. 5).

Fig. 4
Hierarchical summary receiver operating characteristic curve of the diagnostic performance of PI-RADSv2 for detecting prostate cancer. HSROC = hierarchical summary receiver operating characteristic; PI-RADSv2 = Prostate Imaging Reporting and Data System version 2.

Fig. 5
Deeks’ funnel plot. A p value of 0.75 suggests that the likelihood of publication bias is low.
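
For readers unfamiliar with the slope coefficient in a Deeks’ funnel plot, the regression behind it can be sketched as follows. This is a minimal illustration, assuming the usual formulation in which the log diagnostic odds ratio is regressed on the inverse square root of the effective sample size, weighted by effective sample size. The function name, the rule of adding 0.5 to all cells when any cell is zero, and the example 2 × 2 counts are assumptions made for illustration and are not data from the included studies. The p value for the slope would follow from a t distribution with k − 2 degrees of freedom and is not computed here.

```python
import math

def deeks_regression(tables):
    """Weighted regression of ln(DOR) on 1/sqrt(ESS), with weights = ESS.

    tables: iterable of (tp, fp, fn, tn) counts, one tuple per study.
    Returns (slope, intercept, t_statistic_for_slope).
    """
    xs, ys, ws = [], [], []
    for tp, fp, fn, tn in tables:
        n_dis, n_non = tp + fn, fp + tn
        # Effective sample size: 4 * n1 * n2 / (n1 + n2).
        ess = 4.0 * n_dis * n_non / (n_dis + n_non)
        # Continuity correction only when a zero cell is present (assumption).
        if 0 in (tp, fp, fn, tn):
            tp, fp, fn, tn = (c + 0.5 for c in (tp, fp, fn, tn))
        xs.append(1.0 / math.sqrt(ess))
        ys.append(math.log((tp * tn) / (fp * fn)))
        ws.append(ess)

    k = len(xs)
    if k < 3:
        raise ValueError("need at least three studies")
    sw = sum(ws)
    x_bar = sum(w * x for w, x in zip(ws, xs)) / sw
    y_bar = sum(w * y for w, y in zip(ws, ys)) / sw
    sxx = sum(w * (x - x_bar) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - x_bar) * (y - y_bar) for w, x, y in zip(ws, xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # Weighted residual sum of squares and standard error of the slope.
    rss = sum(w * (y - intercept - slope * x) ** 2 for w, x, y in zip(ws, xs, ys))
    se = math.sqrt(rss / (k - 2) / sxx)
    t_stat = slope / se if se > 0 else float("nan")
    return slope, intercept, t_stat

# Hypothetical 2 x 2 tables (tp, fp, fn, tn), for illustration only.
example = [(45, 12, 5, 38), (80, 30, 20, 70), (29, 0, 1, 25), (60, 15, 30, 45)]
slope, intercept, t_stat = deeks_regression(example)
print(f"slope = {slope:.2f}, intercept = {intercept:.2f}, t = {t_stat:.2f}")
```

A slope close to zero means that smaller studies do not report systematically different diagnostic odds ratios from larger ones, which is the pattern interpreted above as a low likelihood of publication bias.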

For the six studies that provided a head-to-head comparison between PI-RADSv1 and PI-RADSv2, PI-RADSv2 demonstrated higher pooled sensitivity of 0.95 (95% CI 0.85–0.98) compared with 0.88 (95% CI 0.80–0.93) for PI-RADSv1 (p = 0.04). However, the pooled specificity did not show a significant difference between the two versions of PI-RADS: 0.73 (95% CI 0.47–0.89) for v2 and 0.75 (95% CI 0.36–0.94) for v1 (p = 0.90).

3.5 Heterogeneity exploration

As we found considerable heterogeneity among the included studies, meta-regression analyses were performed (Supplementary Table 1). Among several potential variables, the proportion of patients with PCa, magnetic field strength, and reference standard were significant factors affecting the heterogeneity (p < 0.01 for all). However, among these three, only specificity according to the proportion of patients with PCa showed a statistically significant and clinically meaningful difference: 0.65 (95% CI 0.52–0.78) in studies with ≤50% of patients with PCa versus 0.86 (95% CI 0.75–0.97) in studies with >50% of patients with PCa. Otherwise, no clinically meaningful differences were seen: for magnet strength (3 vs 1.5 T), sensitivity of 0.90 (95% CI 0.86–0.94) versus 0.89 (95% CI 0.81–0.97, p = 0.03) and specificity of 0.73 (95% CI 0.59–0.86) versus 0.72 (95% CI 0.44–1.00, p = 0.81); for reference standard (radical prostatectomy vs biopsy), sensitivity of 0.89 (95% CI 0.83–0.95) versus 0.91 (95% CI 0.88–0.94, p < 0.01) and specificity of 0.65 (95% CI 0.37–0.94) versus 0.73 (95% CI 0.58–0.87, p = 0.48). Other variables, including the cutoff value, use of endorectal coil, and type of analysis (per patient vs per lesion), were not significant factors (p = 0.32–0.70).

3.6 Subgroup analysis

As four studies used both ≥3 and ≥4 as cutoff values [17 22 23 29] and five studies determined both any cancer and clinically significant cancer as outcomes [16 21 24 29 32], multiple subgroup analyses were performed in order to assess various clinical settings (Supplementary Table 2). Regarding cutoff values, the pooled sensitivity was 0.89 (95% CI 0.84–0.92) with specificity of 0.74 (95% CI 0.58–0.85) for 17 studies using ≥4 [16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 35], whereas these were 0.95 (95% CI 0.89–0.97) and 0.47 (95% CI 0.21–0.74) in eight studies using ≥3 [17 22 23 29 32 33 34 36]. When we stratified studies according to the outcome assessed, the results were as follows: (1) cutoff of ≥4 for determining any PCa, sensitivity of 0.89 (95% CI 0.83–0.93) with specificity of 0.80 (95% CI 0.62–0.90); (2) cutoff of ≥3 for determining any PCa, sensitivity of 0.96 (95% CI 0.93–0.98) with specificity of 0.49 (95% CI 0.29–0.70); (3) for determining csPCa regardless of cutoff values, sensitivity of 0.89 (95% CI 0.84–0.92) with specificity of 0.64 (95% CI 0.46–0.78); (4) cutoff of ≥4 for determining csPCa, sensitivity of 0.90 (95% CI 0.85–0.94) with specificity of 0.62 (95% CI 0.45–0.77); and (5) cutoff of ≥3 for determining csPCa, sensitivity of 0.96 (95% CI 0.87–0.99) with specificity of 0.29 (95% CI 0.05–0.77). When the studies using a cutoff value of ≥4 were separately assessed according to the type of analysis, per-patient analysis in eight studies [17 18 20 26 27 29 31 35] yielded pooled sensitivity of 0.89 (95% CI 0.81–0.93) with specificity of 0.76 (95% CI 0.60–0.88), whereas per-lesion analysis in nine studies [16 19 21 22 23 24 25 28 30] yielded pooled sensitivity of 0.87 (95% CI 0.83–0.91) with specificity of 0.70 (95% CI 0.44–0.88).

Based on the localization of PCa, the pooled sensitivity was 0.93 (95% CI 0.87–0.96) with specificity of 0.68 (95% CI 0.43–0.86) in seven studies analyzing PZ cancers [20 24 25 28 30 31 33]. In the six of these studies that also analyzed TZ cancers (all except that by Stanzione et al [31]), the pooled sensitivity and specificity were 0.88 (95% CI 0.77–0.94) and 0.75 (95% CI 0.59–0.86), respectively.

Studies including patients without previous biopsies yielded sensitivity of 0.82 (95% CI 0.72–0.90) and specificity of 0.75 (95% CI 0.65–0.83), whereas the diagnostic performance values were 0.87 (95% CI 0.80–0.92) and 0.71 (95% CI 0.42–0.89) in studies including patients with a history of previous biopsy.

3.7 Discussion

In our meta-analysis, we assessed the diagnostic accuracy of PI-RADSv2 for detecting PCa. The pooled sensitivity and specificity of all 21 studies were 0.89 (95% CI 0.86–0.92) and 0.73 (95% CI 0.60–0.83), respectively. When comparing our data with the only two existing meta-analyses using mpMRI for detecting PCa, a trend toward higher sensitivity and lower specificity can be inferred. In the study by de Rooij et al [37], which evaluated seven studies using a combination of T2WI, DWI, and DCE-MRI, the pooled sensitivity and specificity were 0.74 (95% CI 0.66–0.81) and 0.88 (95% CI 0.82–0.92), respectively. In a more recent meta-analysis by Hamoen et al [6], which analyzed 14 studies using PI-RADSv1, the pooled sensitivity and specificity were 0.78 (95% CI 0.70–0.84) and 0.79 (95% CI 0.68–0.86), respectively. However, the comparison between the three studies merely provided an indirect comparison. In order to address this issue, we separately assessed a subgroup of studies using both PI-RADSv1 and PI-RADSv2. In a head-to-head comparison between them, PI-RADSv2 demonstrated higher pooled sensitivity (0.95) compared with PI-RADSv1 (0.88, p = 0.04) without a statistically significant difference in specificity (0.73 vs 0.75, p = 0.90). This increase in sensitivity compared with its predecessor may imply that the revisions undertaken during the development of PI-RADSv2, including the introduction of dominant sequences according to zonal anatomy, limited contribution of DCE-MRI secondary to DWI and T2WI, and specific guidelines for deriving an integrated overall score, were, in fact, on the right track. In particular, we speculate that the use of dominant sequences, that is, DWI for the PZ and T2WI for the TZ, may have been crucial for the improved sensitivity without a loss in specificity, as suggested by Baur et al [10].

Considering that one of the main intentions for the generation of PI-RADS was to standardize reporting of mpMRI in order to decrease variability and bring about widespread acceptance and implementation in daily practice, it was promising to find that nearly all (20 of 21) studies used PI-RADSv2 strictly according to published guidelines [11]. Only one study formed PI-RADSv2 scores from existing clinical radiological reports that were based on PI-RADSv1 or an in-house scoring system [29]. This is an improvement when compared with prior studies conducted using PI-RADSv1, where investigators used varying methods in determining the overall score (overall five-point score or sum of the scores from each modality) [6]. Still, there is a need for further clarification regarding the cutoff value for detecting PCa. In the studies included in our meta-analysis, cutoff values were predefined in only six studies, while the majority (15/21) were exploratory in nature, testing multiple criteria. When using a cutoff value of ≥4, sensitivity (0.89) and specificity (0.74) were generally good, whereas using ≥3 yielded excellent sensitivity (0.95) and poor specificity (0.47). These results may be taken into consideration when generating the next updated PI-RADS. For instance, using the former may be adequate for general use of PI-RADS, whereas the latter could be indicated when a higher cancer detection rate is clinically required (eg, persistently high PSA level despite a previously negative biopsy).
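
The trade-off between the two cutoff values can be made concrete by converting the pooled estimates into expected counts for a hypothetical cohort. The sketch below is illustrative only: the cohort size of 1000 and the prevalence of 50% are assumptions chosen for round numbers, and the function name is hypothetical. Only the pooled sensitivity and specificity values are taken from the subgroup analysis.

```python
def expected_counts(sensitivity, specificity, prevalence, n=1000):
    """Expected 2 x 2 cell counts for n patients at a given prevalence."""
    diseased = n * prevalence
    healthy = n - diseased
    tp = sensitivity * diseased
    tn = specificity * healthy
    return {"TP": tp, "FN": diseased - tp, "FP": healthy - tn, "TN": tn}

# Pooled (sensitivity, specificity) reported for each cutoff value.
pooled = {"cutoff >=4": (0.89, 0.74), "cutoff >=3": (0.95, 0.47)}

# ASSUMPTION: a prevalence of 0.5 is used purely for illustration.
for label, (sens, spec) in pooled.items():
    counts = expected_counts(sens, spec, prevalence=0.5)
    print(label, {cell: round(value) for cell, value in counts.items()})
```

Under these assumed conditions, lowering the cutoff from ≥4 to ≥3 trades a modest reduction in missed cancers for roughly a doubling of false-positive readings, which is the pattern described above.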

In the current study, subgroup analyses were performed to account for differences in outcomes (any cancer vs clinically significant cancer). There was no significant difference for using either outcome irrespective of whether the criteria of ≥3 or ≥4 were used. However, the definition of clinically significant cancer was different among the 13 studies. Only three studies defined csPCa strictly according to the PI-RADSv2 guidelines (Gleason score ≥7 [including 3 + 4], volume ≥0.5 ml, or extraprostatic extension) [11]. Most others used one or two of the three criteria. Including only the former three studies may have provided more robust results; however, including all available studies was more pragmatic and presents a general overview of the existing literature, as this is the first meta-analysis of studies dealing with PI-RADSv2.

In this meta-analysis, we looked into the technical aspects of MRI. Meta-regression analyses revealed that the use of endorectal coil was not a statistically significant factor. Furthermore, although magnet strength showed a statistically significant difference between 3 and 1.5 T, this difference was not clinically meaningful (sensitivity of 0.90 vs 0.89, respectively, p = 0.03). Although there had been debate over these two issues in the past, both 3 and 1.5 T are now well established, and the overall benefit of using an endorectal coil is not evident [38 39]. The PI-RADSv2 guidelines currently recommend either usage, and the results of our study provide additional evidence to support this.

Regarding the methods of analysis in the studies, there was significant heterogeneity regarding reference standard and type of analysis. Radical prostatectomy was the reference standard in five studies, while the majority were based on a combination of systematic and targeted biopsies. The possibility of PCa despite negative biopsy results in the latter group should be kept in mind. In addition, approximately half of the studies reported outcomes in a per-patient manner (n = 11) and the other half in a per-lesion manner (n = 10). Per-lesion analysis is known to take into account the performance of localizing the disease; however, this was not shown to be a significant factor in the meta-regression analysis.

Our meta-analysis had some limitations. Nearly all studies were retrospective in study design, resulting in a high risk of bias for patient selection. It is possible that pooling data from predominantly retrospective studies may have led to increased diagnostic sensitivity [40]. However, performing a meta-analysis using only the three prospective studies was not technically feasible, and the derived results would not have been representative of the existing literature on PI-RADSv2. Nevertheless, we used validated methods for the systematic review and reported the data using standard reporting guidelines, including PRISMA and the guidelines of the Handbook for Diagnostic Test Accuracy Reviews published by the Cochrane Collaboration [12 41]. Another limitation is the considerable heterogeneity in our pooled analysis, which affected the general applicability of our summary estimates. To explore the heterogeneity of our data, we performed meta-regression and multiple subgroup analyses. According to the analyses, the proportion of patients with PCa, the magnetic field strength, and the reference standard were significant factors affecting the heterogeneity. In particular, the reference standard included various methods, including radical prostatectomy and a combination of systematic and targeted biopsies (ie, MRI guided, MRI–transrectal ultrasound fusion, or cognitive). Furthermore, the fact that various definitions were used for clinically significant cancer needs to be emphasized. Our meta-regression and subgroup analyses may explain some of the heterogeneity, but a portion remains unexplained. Another important limitation is the small number of included studies for head-to-head comparison between PI-RADSv1 and PI-RADSv2. However, we were able to demonstrate a statistically significant difference in sensitivity between the two versions using only the six studies published to date.

PI-RADSv2 shows good performance for the detection of PCa with pooled sensitivity of 0.89 and specificity of 0.73. PI-RADSv2 has higher pooled sensitivity compared with PI-RADSv1 without significantly different specificity.

Author contributions: Sang Youn Kim had full access to all the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis.

Study concept and design: Woo, Suh, S.Y. Kim.

Acquisition of data: Woo, Suh, S.Y. Kim.

Analysis and interpretation of data: Woo, Suh, S.Y. Kim.

Drafting of the manuscript: Woo.

Critical revision of the manuscript for important intellectual content: Suh, S.Y. Kim, Cho, S.H. Kim.

Statistical analysis: Suh.

Obtaining funding: None.

Administrative, technical, or material support: None.

Supervision: S.Y. Kim, Cho, S.H. Kim.

Other: None.

Financial disclosures: Sang Youn Kim certifies that all conflicts of interest, including specific financial interests and relationships and affiliations relevant to the subject matter or materials discussed in the manuscript (eg, employment/affiliation, grants or funding, consultancies, honoraria, stock ownership or options, expert testimony, royalties, or patents filed, received, or pending), are the following: None.

Funding/Support and role of the sponsor: None.

  • [1] American Cancer Society. Key statistics for prostate cancer. http://www.cancer.org/cancer/prostatecancer/detailedguide/prostate-cancer-key-statistics .
  • [2] C.M. Hoeks, J.O. Barentsz, T. Hambrock, et al. Prostate cancer: multiparametric MR imaging for detection, localization, and staging. Radiology. 2011;261:46-66
  • [3] O. Ruprecht, P. Weisser, B. Bodelle, H. Ackermann, T.J. Vogl. MRI of the prostate: interobserver agreement compared with histopathologic outcome after radical prostatectomy. Eur J Radiol. 2012;81:456-460
  • [4] L. Dickinson, H.U. Ahmed, C. Allen, et al. Scoring systems used for the interpretation and reporting of multiparametric MRI for prostate cancer detection, localization, and characterization: could standardization lead to improved utilization of imaging within the diagnostic pathway? J Magn Reson Imaging. 2013;37:48-58
  • [5] J.O. Barentsz, J. Richenberg, R. Clements, et al. ESUR prostate MR guidelines 2012. Eur Radiol. 2012;22:746-757
  • [6] E.H. Hamoen, M. de Rooij, J.A. Witjes, J.O. Barentsz, M.M. Rovers. Use of the Prostate Imaging Reporting and Data System (PI-RADS) for prostate cancer detection with multiparametric magnetic resonance imaging: a diagnostic meta-analysis. Eur Urol. 2015;67:1112-1121
  • [7] M. Abd-Alazeez, H.U. Ahmed, M. Arya, et al. The accuracy of multiparametric MRI in men with negative biopsy and elevated PSA level—can it rule out clinically significant prostate cancer? Urol Oncol. 2014;32:45.e17-22
  • [8] A.B. Rosenkrantz, S. Kim, R.P. Lim, et al. Prostate cancer localization using multiparametric MR imaging: comparison of Prostate Imaging Reporting and Data System (PI-RADS) and Likert scales. Radiology. 2013;269:482-492
  • [9] B.G. Hansford, Y. Peng, Y. Jiang, et al. Dynamic contrast-enhanced MR imaging curve-type analysis: is it helpful in the differentiation of prostate cancer from healthy peripheral zone? Radiology. 2015;275:448-457
  • [10] A.D. Baur, A. Maxeiner, T. Franiel, et al. Evaluation of the Prostate Imaging Reporting and Data System for the detection of prostate cancer by the results of targeted biopsy of the prostate. Invest Radiol. 2014;49:411-420
  • [11] J.C. Weinreb, J.O. Barentsz, P.L. Choyke, et al. PI-RADS Prostate Imaging-Reporting and Data System: 2015, version 2. Eur Urol. 2016;69:16-40
  • [12] A. Liberati, D.G. Altman, J. Tetzlaff, et al. The PRISMA statement for reporting systematic reviews and meta-analyses of studies that evaluate health care interventions: explanation and elaboration. J Clin Epidemiol. 2009;62:e1-e34
  • [13] P. Whiting, A.W. Rutjes, J.B. Reitsma, P.M. Bossuyt, J. Kleijnen. The development of QUADAS: a tool for the quality assessment of studies of diagnostic accuracy included in systematic reviews. BMC Med Res Methodol. 2003;3:25
  • [14] C.H. Suh, S.H. Park. Successful publication of systematic review and meta-analysis of studies evaluating diagnostic test accuracy. Korean J Radiol. 2016;17:5-6
  • [15] J.J. Deeks, P. Macaskill, L. Irwig. The performance of tests of publication bias and other sample size effects in systematic reviews of diagnostic test accuracy was assessed. J Clin Epidemiol. 2005;58:882-893
  • [16] T. Auer, M. Edlinger, J. Bektic, et al. Performance of PI-RADS version 1 versus version 2 regarding the relation with histopathological results. World J Urol. 2017;35:687-693
  • [17] M. Baldisserotto, E.J. Neto, G. Carvalhal, et al. Validation of PI-RADS v.2 for prostate cancer diagnosis with MRI at 3T using an external phased-array coil. J Magn Reson Imaging. 2016;44:1354-1359
  • [18] P. De Visschere, N. Lumen, P. Ost, K. Decaestecker, E. Pattyn, G. Villeirs. Dynamic contrast-enhanced imaging has limited added value over T2-weighted imaging and diffusion-weighted imaging when using PI-RADSv2 for diagnosis of clinically significant prostate cancer in patients with elevated PSA. Clin Radiol. 2017;72:23-32
  • [19] H.A.E.-K.A. El-Samei, M.F. Amin, E.E. Hassan. Assessment of the accuracy of multi-parametric MRI with PI-RADS 2.0 scoring system in the discrimination of suspicious prostatic focal lesions. Egypt J Radiol Nucl Med. 2016;47:1075-1082
  • [20] Z.Y. Feng, L. Wang, X.D. Min, S.G. Wang, G.P. Wang, J. Cai. Prostate cancer detection with multiparametric magnetic resonance imaging: Prostate Imaging Reporting and Data System version 1 versus version 2. Chin Med J. 2016;129:2451-2459
  • [21] M. Kasel-Seibert, T. Lehmann, R. Aschenbach, et al. Assessment of PI-RADS v2 for the detection of prostate cancer. Eur J Radiol. 2016;85:726-731
  • [22] W.C. Lin, V.F. Muglia, G.E. Silva, S. Chodraui Filho, R.B. Reis, A.C. Westphalen. Multiparametric MRI of the prostate: diagnostic performance and interreader agreement of two scoring systems. Br J Radiol. 2016;89:20151056
  • [23] E. Martorana, G.M. Pirola, M. Scialpi, et al. Lesion volume predicts prostate cancer risk and aggressiveness: validation of its value alone and matched with Prostate Imaging Reporting and Data System score. BJU Int. In press. doi:10.1111/bju.13649
  • [24] F.V. Mertan, M.D. Greer, J.H. Shih, et al. Prospective evaluation of the Prostate Imaging Reporting and Data System version 2 for prostate cancer detection. J Urol. 2016;196:690-696
  • [25] B.G. Muller, J.H. Shih, S. Sankineni, et al. Prostate cancer: interobserver agreement and accuracy with the revised Prostate Imaging Reporting and Data System at multiparametric MR imaging. Radiology. 2015;277:741-750
  • [26] J.J. Park, B.K. Park. Role of PI-RADSv2 with multiparametric MRI in determining who needs active surveillance or definitive treatment according to PRIAS. J Magn Reson Imaging. 2017;45:1753-1759
  • [27] S.Y. Park, D.C. Jung, Y.T. Oh, et al. Prostate cancer: PI-RADS version 2 helps preoperatively predict clinically significant cancers. Radiology. 2016;280:108-116
  • [28] S. Polanec, T.H. Helbich, H. Bickel, et al. Head-to-head comparison of PI-RADS v2 and PI-RADS v1. Eur J Radiol. 2016;85:1125-1131
  • [29] A.R. Rastinehad, N. Waingankar, B. Turkbey, et al. Comparison of multiparametric MRI scoring systems and the impact on cancer detection in patients undergoing MR US fusion guided prostate biopsies. PLoS One. 2015;10:e0143404
  • [30] A.B. Rosenkrantz, J.S. Babb, S.S. Taneja, J.M. Ream. Proposed adjustments to PI-RADS version 2 decision rules: impact on prostate cancer detection. Radiology. In press. doi:10.1148/radiol.2016161124
  • [31] A. Stanzione, M. Imbriaco, S. Cocozza, et al. Biparametric 3T magnetic resonance imaging for prostatic cancer detection in a biopsy-naive patient population: a further improvement of PI-RADS v2? Eur J Radiol. 2016;85:2269-2274
  • [32] N. Tan, W.C. Lin, P. Khoshnoodi, et al. In-Bore 3-T MR-guided transrectal targeted prostate biopsy: Prostate Imaging Reporting and Data System version 2-based diagnostic performance for detection of prostate cancer. Radiology. In press. doi:10.1148/radiol.2016152827
  • [33] S. Tewes, N. Mokov, D. Hartung, et al. Standardized reporting of prostate MRI: comparison of the Prostate Imaging Reporting and Data System (PI-RADS) version 1 and version 2. PLoS One. 2016;11:e0162879
  • [34] S. Washino, T. Okochi, K. Saito, et al. Combination of PI-RADS score and PSA density predicts biopsy outcome in biopsy naive patients. BJU Int. 2017;119:225-233
  • [35] S. Woo, S.Y. Kim, J. Lee, S.H. Kim, J.Y. Cho. PI-RADS version 2 for prediction of pathological downgrading after radical prostatectomy: a preliminary study in patients with biopsy-proven Gleason Score 7 (3 + 4) prostate cancer. Eur Radiol. 2016;26:3580-3587
  • [36] C. Zhao, G. Gao, D. Fang, et al. The efficiency of multiparametric magnetic resonance imaging (mpMRI) using PI-RADS Version 2 in the diagnosis of clinically significant prostate cancer. Clin Imaging. 2016;40:885-888
  • [37] M. de Rooij, E.H. Hamoen, J.J. Futterer, J.O. Barentsz, M.M. Rovers. Accuracy of multiparametric MRI for prostate cancer detection: a meta-analysis. Am J Roentgenol. 2014;202:343-351
  • [38] Z.K. Shah ,S.N. Elias ,R. Abaza ,et al. Performance comparison of 1.5-T endorectal coil MRI with 3.0-T nonendorectal coil MRI in patients with prostate cancer. Acad Radiol. 2015;22 :467-474 Crossref
  • [39] Y. Mazaheri ,H.A. Vargas ,G. Nyman ,O. Akin ,H. Hricak. Image artifacts on prostate diffusion-weighted magnetic resonance imaging: trade-offs at 1.5 Tesla and 3.0 Tesla. Acad Radiol. 2013;20 :1041-1047 Crossref
  • [40] J.J. Deeks. Systematic reviews in health care: systematic reviews of evaluations of diagnostic and screening tests. BMJ. 2001;323 :157-162 Crossref
  • [41] J.J.W.S. Deeks ,C. Davenport. Guide to the contents of a Cochrane diagnostic test accuracy protocol. J.J. Deeks, P.M. Bossuyt, C. Gatsonis. Cochrane handbook for systematic reviews of diagnostic test accuracy version 1.0.0. Chapter 4. (The Cochrane Collaboration, 2013) http://srdta.cochrane.org/

Prostate cancer (PCa) is the second leading cause of cancer-related mortality in Western men [1]. With recent technological advancements and growing availability, multiparametric magnetic resonance imaging (mpMRI) is increasingly being used to guide several aspects of PCa management, including detection, staging, and treatment planning [2]. Despite abundant evidence in the literature reporting high accuracy of mpMRI for PCa diagnosis, widespread acceptance has been hampered by several factors, including difficulty of interpretation, lack of standardized criteria for interpretation (ie, use of Likert scales based on the radiologist's subjective level of suspicion for PCa), and the resulting substantial inter-reader variability [3 4].

To bring standardization to the evaluation and reporting of mpMRI of the prostate, the European Society of Urogenital Radiology (ESUR) published a guideline termed Prostate Imaging Reporting and Data System (PI-RADS) in 2012 [5] . PI-RADS was generated based on expert consensus and provides a detailed scoring system for each MRI sequence (T2-weighted imaging [T2WI], diffusion-weighted imaging [DWI], dynamic contrast-enhanced MRI [DCE-MRI], and MR spectroscopy) for the presence of clinically significant PCa (csPCa). Several investigators have validated the accuracy and reproducibility of PI-RADS, and a recent meta-analysis reported pooled sensitivity and specificity of 0.78 and 0.79, respectively [6] . However, as there was no guideline for the generation of an overall score, different research groups utilized various measures for this purpose—some used a sum of the scores from each sequence (ranging from 3 to 15), whereas others used an overall score of 1–5 [7 8] . Furthermore, emerging data questioned the value of curve-type analysis of DCE-MRI [9] . In addition, investigators suggested that some sequences may be more important in determining the likelihood of PCa (ie, DWI in the peripheral zone [PZ] and T2WI in the transition zone [TZ]) rather than equal weighting for all sequences [10] .

To address these issues, the ESUR and American College of Radiology recently released the updated PI-RADS version 2 (PI-RADSv2) [11]. The main changes from PI-RADSv1 to PI-RADSv2 are the following: (1) introduction of dominant sequences according to zonal anatomy (DWI for the PZ and T2WI for the TZ), (2) restriction of the contribution of DCE-MRI data to merely the presence or absence of early focal enhancement, and (3) generation of an overall score (1–5) integrating findings across all MRI sequences.

Since then, several studies assessing the value of PI-RADSv2 have been published. However, the diagnostic performance of this new scoring system has not been evaluated systematically. Therefore, the purpose of our study was to assess the diagnostic performance of PI-RADSv2 for the detection of PCa. In addition, we aimed to compare the diagnostic performance of PI-RADSv1 and PI-RADSv2 in studies available for head-to-head comparison.

This meta-analysis was performed and written according to the preferred reporting items for systematic reviews and meta-analyses (PRISMA) guidelines [12] .

2.1 Literature search

A computerized search of MEDLINE and EMBASE up to December 7, 2016, was performed in order to identify studies evaluating the diagnostic performance of PI-RADSv2 for the detection of PCa. The search query combined synonyms for PCa, MRI, and PI-RADS as follows: (prostate cancer OR prostatic cancer OR prostate neoplasm OR prostatic neoplasm OR prostate tumor OR prostatic tumor OR prostate carcinoma OR prostatic carcinoma OR PCa) AND (magnetic resonance imaging OR MRI OR MR) AND (prostate imaging reporting and data system OR pi-rads OR pi rads OR pirads). Bibliographies of identified articles were also screened to expand the scope of search. Our search was limited to publications in English.

2.2 Study selection

2.2.1 Inclusion criteria

Studies were included if they satisfied all the following requirements according to the PICOS criteria [12] : (1) included patients with suspected or diagnosed PCa; (2) for index test, mpMRI of the prostate including all required sequences of T2WI, DWI, and DCE-MRI was performed and assessed with a PI-RADSv2 scoring system; (3) for comparison, a reference standard based on the histopathological examination of radical prostatectomy or biopsy was used; (4) results were reported in sufficient detail for the reconstruction of 2 × 2 tables and determination of sensitivity and specificity at specified cutoff values for evaluating the diagnostic performance of PI-RADSv2; and (5) studies had to be original articles.
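Criterion (4) is what makes a study poolable: the counts of true and false positives and negatives at a stated cutoff must be recoverable. The following is a minimal sketch of that reconstruction, assuming hypothetical per-lesion PI-RADSv2 scores and pathology labels. The toy data and the function names are illustrative only and do not come from any included study.

```python
def two_by_two(scores, has_cancer, cutoff):
    """Return (TP, FP, FN, TN) when a score >= cutoff is called positive."""
    tp = fp = fn = tn = 0
    for score, cancer in zip(scores, has_cancer):
        positive = score >= cutoff
        if positive and cancer:
            tp += 1
        elif positive:
            fp += 1
        elif cancer:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def sensitivity(tp, fn):
    """Proportion of cancers that were scored positive."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """Proportion of non-cancers that were scored negative."""
    return tn / (tn + fp)


# Hypothetical toy data: one PI-RADSv2 score (1-5) and one pathology label per lesion.
scores = [5, 4, 4, 3, 3, 2, 2, 1, 5, 3]
has_cancer = [True, True, False, True, False, False, False, False, True, False]

for cutoff in (3, 4):
    tp, fp, fn, tn = two_by_two(scores, has_cancer, cutoff)
    print(cutoff, (tp, fp, fn, tn), sensitivity(tp, fn), specificity(tn, fp))
```

In this toy example, moving the cutoff from ≥3 to ≥4 lowers sensitivity and raises specificity, which is why the cutoff has to be stated before a 2 × 2 table can be interpreted.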

2.2.2 Exclusion criteria

Studies were excluded if any of the following criteria were met: (1) studies involving <10 patients; (2) review articles, guidelines, consensus statements, letters, editorials, and conference abstracts; (3) studies using only PI-RADSv1 for the evaluation of mpMRI of the prostate; (4) studies focusing on topics other than using the PI-RADSv2 system for diagnosing PCa (ie, staging and prediction of biochemical recurrence); and (5) studies with overlapping patient population.

Two reviewers (S.W. and C.H.S., with 3 yr of experience in performing systematic reviews and meta-analyses) independently evaluated the eligibility of the selected studies from the literature. Disagreements, if present between the two reviewers, were resolved by consensus via discussion with a third reviewer (S.Y.K.).

2.3 Data extraction and quality assessment

We extracted the following data regarding study design and results from the selected studies using a standardized form:

  1. Study characteristics: authors, year of publication, country and institution of origin, duration of patient recruitment, and study design (prospective vs retrospective and consecutive or not)

  2. Demographic and clinical characteristics: sample size, number of patients with PCa, patient age, prostate-specific antigen (PSA) level and Gleason score, number of previous biopsies, and PCa diagnosis prior to mpMRI

  3. Technical characteristics of mpMRI: scanner model and manufacturer, magnetic field strength (1.5 vs 3 T), coil type (endorectal vs pelvic phased array), and specific sequences used (T2WI, DWI, DCE, or MR spectroscopy)

  4. Interpretation of mpMRI: number of reviewers and experience in prostate mpMRI, independent or consensus reading, and blinding to clinicopathological information

  5. Reference standard: type of reference standard (radical prostatectomy, targeted biopsy, or systematic biopsy), interval between MRI and pathology, outcomes assessed (any PCa vs csPCa), definition of csPCa (studies assessing “clinically significant”, “aggressive”, or “high-grade” PCa were all considered to assess csPCa; however, only studies that used the definition as provided by the PI-RADSv2 guideline [Gleason score ≥7 (3 + 4), volume >0.5 ml, or extraprostatic extension] were considered not to have concern for applicability), separate analysis for the PZ and TZ, and type of analysis (per patient vs per lesion)

  6. Diagnostic performance of PI-RADSv2, including criteria or cutoff values (in case of multiple readers, the results of the most experienced reader were extracted for this meta-analysis)

The methodological quality of the included studies was assessed using tailored questionnaires and criteria provided by Quality Assessment of Diagnostic Accuracy Studies-2 [13] . Data extraction and quality assessment were performed independently by two reviewers (S.W. and C.H.S.). All disagreements were resolved by consensus through discussion with the third reviewer (S.Y.K.).

2.4 Data synthesis and analysis

The diagnostic performance of PI-RADSv2 for the detection of PCa was the primary outcome for this meta-analysis. In addition, a comparison between the diagnostic performance of PI-RADSv2 and that of PI-RADSv1 using studies that reported head-to-head comparison data of the two PI-RADS versions was considered a secondary outcome.

Pooled estimates of sensitivity and specificity were calculated using hierarchical logistic regression modeling including bivariate modeling and hierarchical summary receiver operating characteristic (HSROC) modeling [14] . For graphical presentation of the results, an HSROC curve with 95% confidence region and prediction region was plotted. Publication bias was evaluated using the Deeks’ funnel plot, and statistical significance was tested with the Deeks’ asymmetry test [15] .
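To give a feel for what pooling on the logit scale involves, here is a simplified sketch. It is not the bivariate or HSROC model described above, which models sensitivity and specificity jointly. It swaps in a univariate DerSimonian-Laird random-effects model applied to one proportion at a time. The function name, the 0.5 continuity correction, and the example counts are assumptions made for illustration.

```python
import math


def pool_logit_proportion(events, totals):
    """Pool proportions with a DerSimonian-Laird random-effects model on the logit scale.

    events[i] / totals[i] is the proportion in study i (for sensitivity:
    true positives / all patients with cancer). Returns (estimate, lower, upper)
    with an approximate 95% confidence interval.
    """
    y, v = [], []
    for e, n in zip(events, totals):
        a, b = e + 0.5, (n - e) + 0.5   # continuity correction keeps logits finite
        y.append(math.log(a / b))       # logit of the corrected proportion
        v.append(1.0 / a + 1.0 / b)     # approximate variance of that logit

    w = [1.0 / vi for vi in v]          # fixed-effect (inverse-variance) weights
    fixed = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, y))   # Cochran's Q
    c = sum(w) - sum(wi * wi for wi in w) / sum(w)
    tau2 = max(0.0, (q - (len(y) - 1)) / c) if c > 0 else 0.0  # between-study variance

    w_star = [1.0 / (vi + tau2) for vi in v]   # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_star, y)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))

    def expit(x):
        return 1.0 / (1.0 + math.exp(-x))

    return expit(pooled), expit(pooled - 1.96 * se), expit(pooled + 1.96 * se)


# Hypothetical true-positive counts and numbers of patients with cancer in three studies.
sens, lower, upper = pool_logit_proportion([45, 88, 30], [50, 94, 38])
print(round(sens, 3), round(lower, 3), round(upper, 3))
```

A univariate model of this kind ignores the correlation between sensitivity and specificity across studies, which is the main reason hierarchical models are preferred for diagnostic accuracy data.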

We performed meta-regression analyses to investigate the cause of heterogeneity. The following covariates were considered for the bivariate model: (1) proportion of patients with PCa (>50% vs ≤50%), (2) magnet strength of MRI (3 vs 1.5 T), (3) use of endorectal coil, (4) cutoff value (≥4 vs ≥3), (5) reference standard (radical prostatectomy vs biopsy), and (6) type of analysis (per patient vs per lesion). In addition, multiple subgroup analyses were performed for cutoff value, outcome, and previous biopsy history to assess various clinical settings: (1) a cutoff value of ≥4 for all studies, (2) a cutoff value of ≥3 for all studies, (3) a cutoff value of ≥4 for determining any PCa, (4) a cutoff value of ≥3 for determining any PCa, (5) a cutoff value of ≥4 for determining csPCa, (6) a cutoff value of ≥3 for determining csPCa, (7) a cutoff value of ≥4 in studies using per-patient analysis, (8) a cutoff value of ≥4 in studies using per-lesion analysis, (9) studies analyzing PZ PCa, (10) studies analyzing TZ PCa, (11) patients without previous biopsies, and (12) patients with previous biopsies. The “metandi” and “midas” modules in Stata 10.0 (StataCorp LP, College Station, TX, USA) and “mada” package in R software version 3.2.1 (R Foundation for Statistical Computing, Vienna, Austria) were used for statistical analyses, with p < 0.05 signifying statistical significance.

3.1 Literature search

A systematic literature search initially identified 287 articles. After removing 46 duplicates, screening of the 241 titles and abstracts yielded 105 potentially eligible articles. Full-text reviews were performed, and 84 studies were excluded for the following reasons: not in the field of interest (n = 80, including 68 studies that used only PI-RADSv1), insufficient data to reconstruct 2 × 2 tables (n = 2), and shared study population with other studies (n = 2). Ultimately, 21 original articles including a total of 3857 patients assessing the diagnostic performance of PI-RADSv2 were included in the meta-analysis [16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36]. No additional studies were identified via screening the bibliographies of these 21 studies. Among them, 15 studies including 3099 patients dealt with PI-RADSv2 alone, whereas six studies including 758 patients provided a head-to-head comparison between PI-RADSv1 and PI-RADSv2 [16 20 21 28 32 33]. The detailed study selection process is described in Fig. 1.
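The selection counts can be checked with simple arithmetic. The sketch below uses only the numbers stated in the paragraph above; the variable names are illustrative.

```python
# Counts taken from the study selection paragraph.
identified = 287
duplicates = 46
screened = identified - duplicates

full_text_reviewed = 105
excluded = {
    "not in the field of interest": 80,
    "insufficient data for 2 x 2 tables": 2,
    "shared study population": 2,
}
included = full_text_reviewed - sum(excluded.values())

patients = {
    "PI-RADSv2 alone (15 studies)": 3099,
    "head-to-head comparison (6 studies)": 758,
}

print(screened, included, sum(patients.values()))  # prints: 241 21 3857
```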

Fig. 1
PRISMA flow diagram showing study selection process for meta-analysis. a Included original articles for qualitative and quantitative analyses are references 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 . MRI = magnetic resonance imaging; PI-RADS = Prostate Imaging Reporting and Data System; PRISMA = Preferred Reporting Items for Systematic Reviews and Meta-analyses.

3.2 Characteristics of included studies

Patient characteristics are shown in Table 1 . The size of the study population ranged from 49 to 456 patients, with the percentage of those with PCa ranging from 37% to 100%. The patients had a median age of 62–69.6 yr, median PSA of 3.97–15 ng/ml, and a Gleason score ranging from 5 to 10. Patients had already been diagnosed with PCa prior to MRI in all or some of the study populations in seven studies [16 17 22 26 27 32 35] . Biopsy was performed before MRI in seven studies [16 21 22 23 26 27 35] , all patients were biopsy-naïve in three studies [18 25 34] , both patient types were included in three studies [28 32 33] , and data regarding previous biopsy were not reported in the remaining eight studies.

| First author (year) | Country | Institution | Duration of patient recruitment | Patients (n) | Patients with PCa (n) | Age, median (yr) | Age, range (yr) | PSA, median (ng/ml) | PSA, range (ng/ml) | Gleason score, median | Gleason score, range | No. of previous biopsies | PCa diagnosis before MRI |
| Auer (2016) [16] | Austria | Medical University of Innsbruck, Medizinische Hochschule Hannover | NR | 50 | NR | 63 a | NR | 7.3 a | NR | 7 (3 + 4) | 6–10 | ≥1 | Yes (all) |
| Baldisserotto (2016) [17] | Brazil | Pontifícia Universidade Católica do Rio Grande do Sul | 2013.6–2015.6 | 54 | 33 | 65.9 a | 53–81 | 8.4 a | 3–31 | 7 (3 + 4) | 6–9 | NR | Yes (some) |
| De Visschere (2017) [18] | Belgium | Ghent University Hospital | 2011.5–2014.12 | 245 | 144 | 66 | 44–85 | 9 | 1.4–935.5 | 7 (3 + 4) | ≥6 | 0 | No |
| El-Samei (2016) [19] | Egypt | Al-Azher University, El-Minia University | 2014.5–2015.10 | 55 | 38 | 62 a | 51–79 | NR | NR | 7 | 6–10 | NR | No |
| Feng (2016) [20] | China | Tongji Hospital | 2013.6–2015.7 | 401 | 150 | 64.4 a | 34–88 | 10.7 | 0.2–1763 | 7 (4 + 3) | ≤6–≥8 | NR | NR |
| Kasel-Seibert (2016) [21] | Germany | University Hospital Jena | 2013.7–2015.3 | 82 | 31 | 65 | 48–81 | 13 | 1–111 | 7 (3 + 4) | 6–9 | 1–5 | No |
| Lin (2016) [22] | Brazil, Taiwan | Ribeirão Preto School of Medicine, China Medical University, Ribeirão Preto School of Medicine | 2011.5–2014.6 | 49 | 49 | 63 | 46–73 | 13.27 | 1.8–41.4 | | | ≥1 | Yes (all) |
| Martorana (2016) [23] | Italy | University of Modena and Reggio Emilia, Perugia University | 2014.1–2016.2 | 157 | 79 | 65 | 47–79 b | 10.7 | 1–75 | 6 | 6–8 | 1–3 | No |
| Mertan (2016) [24] | USA | National Cancer Institute | 2015.3–2015.9 | 62 | 38 | 65.5 | 50.3–76.6 | 7.1 | 0.5–863 | 7 (3 + 4) | 6–9 | NR | NR |
| Muller (2015) [25] | USA, the Netherlands | National Cancer Institute, AMC University Hospital, Edward Hébert School of Medicine | 2011.12–2014.5 | 94 | 94 | 62 | 37–79 | 8.51 | 0.7–51.1 | 7 (3 + 4) | 6–10 | 0 | No |
| Park (2016) [26] | South Korea | Samsung Medical Center | 2012.1–2014.12 | 456 | 456 | 65 | 46–81 | 3.97 | 3.8–4.2 | NR | ≥6 | ≥1 | Yes (all) |
| Park (2016) [27] | South Korea | Yonsei University College of Medicine | 2012.1–2013.3 | 425 | 425 | NR | NR | NR | 1.4–156.9 | 5–6 | 5–10 | ≥1 | Yes (all) |
| Polanec (2016) [28] | Austria | Medical University of Vienna, Confraternität Vienna | 2011.6–2015.9 | 65 | NR | 65.3 a | 62.3–87.4 | 10.8 a | 4.2–74.5 | 7 (4 + 3) | 6–9 | ≥0 | No |
| Rastinehad (2015) [29] | USA | Icahn School of Medicine at Mount Sinai, Fox Chase Cancer Center, National Institutes of Health, Hofstra North Shore LIJ School of Medicine | 2012.2–2014.11 | 312 | 202 | 65.1 | 60.3–70.3 b | 7.3 | 5.0–11.4 b | NR | NR | NR | No |
| Rosenkrantz (2016) [30] | USA | NYU Langone Medical Center | 2013.9–2015.2 | 343 | 134 | 64 | NR | 5.8 | NR | 7 (3 + 4) | 6–9 | NR | NR |
| Stanzione (2016) [31] | Italy | University “Federico II”, Ospedale S. Maria delle Grazie | NR | 82 | 34 | 65 a | 43–84 | 8.8 a | NR | 7 (3 + 4) | 6–9 | NR | No |
| Tan (2016) [32] | USA, Taiwan | UCLA, Cathay General Hospital, Hoag Hospital | 2013.3–2016.12 | 106 | 63 | 66.5 | 43–79 | 7.9 | 5.6–10.6 | 7 (3 + 4) | 6–10 | ≥0 | Yes (some) |
| Tewes (2016) [33] | Germany | Hannover Medical School, Klinikum der Region Hannover | 2012.12–2014.12 | 54 | 31 | 69.6 a | NR | 8.7 a | NR | 6 | 6–9 | ≥0 | No |
| Washino (2017) [34] | Japan | Jichi Medical University Saitama Medical Center | 2010.6–2014.4 | 288 | 159 | 69 | 64–74 b | 7.5 | 5.5–11 b | NR | NR | 0 | No |
| Woo (2016) [35] | South Korea | Seoul National University College of Medicine | 2011.1–2013.12 | 105 | 105 | 69 | 49–79 | 8.22 | 0.9–44.2 | 7 | 6–7 | ≥1 | Yes (all) |
| Zhao (2016) [36] | China | Peking University First Hospital | 2010.11–2013.12 | 372 | 185 | 68.5 a | NR | 15 a | NR | 7 | 6–10 | NR | No |

a Mean.

b Interquartile range.

MRI = magnetic resonance imaging; NR = not reported; PCa = prostate cancer; PSA = prostate-specific antigen.

Table 1. Patient characteristics

Characteristics of the studies are summarized in Table 2. MRI was performed using 3-T scanners in 16 studies [16 17 18 20 24 25 26 27 28 29 30 31 32 33 35 36], 1.5-T scanners in four studies [19 21 22 23], and either 3 or 1.5 T in one study [34]. Endorectal coils were used in seven studies [19 21 22 23 24 25 29]. In all studies, the mpMRI protocol consisted of T2WI, DWI, and DCE-MRI. The reference standard was radical prostatectomy in five studies [16 22 26 27 35], a combination of systematic and targeted biopsies in seven studies [17 20 23 25 29 34 36], and only targeted biopsy in seven studies [19 21 24 28 30 32 33]; the reference standard was not consistent throughout the study population in two studies [18 31]. PI-RADSv2 scoring was performed by one to five radiologists, either in consensus or independently. The level of experience of the radiologists was heterogeneous, ranging from 4 to 22 yr of experience in prostate imaging. In most studies, the readers were blinded; however, in three studies, the radiologists were aware that the patients had biopsy-proven PCa [22 26 35], and five studies were not explicit regarding blinding [18 19 30 32 36]. In the majority of the studies, the interval between MRI and the reference standard was less than 6 mo; however, the details were not reported in 10 studies [16 17 18 24 27 28 29 30 32 33]. PCa was separately assessed according to zonal anatomy in seven studies [20 24 25 28 30 31 33]. However, in one study [31], only PCa in the PZ could be evaluated, as no detailed data were provided in the article and the attempt to contact the authors for further information was unsuccessful. Regarding the outcome assessed, eight studies evaluated any cancer [17 19 20 23 25 28 31 33], eight evaluated clinically significant cancer [18 22 26 27 30 34 35 36], and five evaluated both [16 21 24 29 32]. With regard to cutoff values, 13 studies used ≥4 [16 18 19 20 21 24 25 26 27 28 30 31 35], four studies used ≥3 [32 33 34 36], and four studies used both [17 22 23 29]. The location of PCa was separately reported by the PZ and TZ in six studies [20 24 25 28 30 33].

Study Reader MRI
| First author (year) | Design | Consecutive enrollment | Reference standard | MRI–reference standard interval | No. of readers (reading) | Experience (yr) | Blinding | Magnet strength (T) | Vendor | Model | Endorectal coil | PI-RADSv2 application | Cutoff values | Localization | Type of analysis | Outcome assessed | Definition of csPCa |
| Auer (2016) [16] | Retrospective | Yes | RP | NR | 1 | >5 | Yes | 3 | Siemens | Skyra | No | Strict | 4 | Whole | Lesion | Any + csPCa | GS ≥7 (4 + 3) |
| Baldisserotto (2016) [17] | Retrospective | Yes | STRUSGB + targeted MRI–TRUS biopsy (cognitive) or RP | NR | 2 (independent) | >10/1 | Yes | 3 | GE | SignaHDxt | No | Strict | 3, 4 | Whole | Patient | Any | |
| De Visschere (2017) [18] | Retrospective | NR | STRUSGB or RP | NR | NR | NR | NR | 3 | Siemens | Trio | No | Strict | 4 | Whole | Patient | csPCa | GS ≥7, ≥0.5 cc, or EPE |
| El-Samei (2016) [19] | NR | NR | Targeted MRI–TRUS biopsy | ≤2 wk | NR | NR | NR | 1.5 | Philips | Gyroscan | Yes | Strict | 4 | Whole | Lesion | Any | |
| Feng (2016) [20] | Retrospective | Yes | STRUSGB + targeted MRI–TRUS biopsy (cognitive) | 1–35 d | 2 (consensus) | 5/4 | Yes | 3 | Siemens | Skyra | No | Strict | 4 | PZ, TZ | Patient | Any | |
| Kasel-Seibert (2016) [21] | Retrospective | NR | Targeted MRGB | ≤7 d | 2 (independent) | 10/<1 | Yes | 1.5 | Siemens | Avanto | Yes | Strict | 4 | Whole | Lesion | Any + csPCa | GS ≥7 |
| Lin (2016) [22] | Retrospective | NR | RP | ≤6 mo | 2 (independent) | 12/5 | Yes a | 1.5 | Philips | Achieva | Yes | Strict | 3, 4 | Whole | Lesion | csPCa | GS ≥7 and >0.5 cc |
| Martorana (2016) [23] | Retrospective | Yes | TTB + targeted MRI–TRUS biopsy | 30–78 | 2 (consensus) | 4/4 | Yes | 1.5 | Philips | Achieva | Yes | Strict | 3, 4 | Whole | Lesion | Any | GS ≥7, >0.5 cc, or EPE |
| Mertan (2016) [24] | Prospective | Yes | STRUSGB + targeted MRI–TRUS biopsy | NR | 1 | >8 | Yes | 3 | Philips | Achieva | Yes | Strict | 4 | PZ, TZ | Lesion | Any + csPCa | GS ≥7 |
| Muller (2015) [25] | Retrospective | Yes | STRUSGB + targeted MRI–TRUS biopsy | ≤6 wk | 5 (independent) | 12/7/1/1/0.5 | Yes | 3 | Philips | Achieva | Yes | Strict | 4 | PZ, TZ | Lesion | Any | GS ≥7 (4 + 3) |
| Park (2016) [26] | Retrospective | Yes | RP | 21–48 | 2 (independent) | 14/3 | Yes a | 3 | Philips | Achieva | No | Strict | 4 | Whole | Patient | csPCa | GS ≥7, ≥0.5 cc, or EPE |
| Park (2016) [27] | Retrospective | NR | RP | NR | 2 (independent) | 9/4 | Yes | 3 | GE, Philips, Siemens | DiscoveryMR750, Achieva, Trio | No | Strict | 4 | Whole | Patient | csPCa | GS ≥7, ≥0.5 cc, or EPE |
| Polanec (2016) [28] | Retrospective | Yes | Targeted MRGB | NR | 2 (independent) | NR/NR | Yes | 3 | Siemens | Trio | No | Strict | 4 | PZ, TZ | Lesion | Any | |
| Rastinehad (2015) [29] | Prospective | NR | STRUSGB + targeted MRI–TRUS biopsy | NR | 3 (consensus) | NR | Yes | 3 | Siemens | Verio | Yes | Calculated b | 3, 4 | Whole | Patient | Any + csPCa | Epstein's criteria c for systematic and GS ≥7 or >0.5 cc for targeted biopsy |
| Rosenkrantz (2016) [30] | Retrospective | NR | Targeted MRI–TRUS biopsy | NR | 2 (independent) | 8/3 | NR | 3 | Siemens | Biograph mMR/Prisma/Skyra/Trio | No | Strict | 4 | PZ, TZ | Lesion | csPCa | GS ≥7 |
| Stanzione (2016) [31] | Prospective | NR | STRUSGB (±targeted TRUSGB) | 20–30 d | 2 (independent) | 14/10 | Yes | 3 | Siemens | Trio | No | Strict | 4 | PZ | Patient | Any | |
| Tan (2016) [32] | Retrospective | Yes | Targeted MRGB | NR | 3 (consensus) | 16/10 | NR | 3 | Siemens | Skyra, Trio, Verio | No | Strict | 3 | Whole | Lesion | Any + csPCa | GS ≥7 |
| Tewes (2016) [33] | Retrospective | Yes | Targeted MRGB | NR | 2 (independent) | 5/2 | Yes | 3 | Siemens | Skyra | No | Strict | 3 d | PZ, TZ | Patient | Any | |
| Washino (2017) [34] | Retrospective | NR | TTB + targeted MRI–TRUS biopsy (cognitive) | 0.5–1.6 mo | 1 | 14 | Yes | 1.5 or 3 | Toshiba | Excelart Vantage/Vantage Titan 3T | No | Strict | 3 | Whole | Patient | csPCa | GS ≥7 or maximum core length ≥4 mm |
| Woo (2016) [35] | Retrospective | NR | RP | ≤6 mo | 2 (consensus) | 22/10 | Yes a | 3 | Philips, Siemens | Ingenia, Verio/Trio | No | Strict | 4 | Whole | Patient | csPCa | GS ≥7 |
| Zhao (2016) [36] | Retrospective | NR | STRUSGB + targeted MRI–TRUS biopsy | ≤3 mo | 2 (independent) | NR | NR | 3 | NR | NR | No | Strict | 3 | Whole | Patient | csPCa | GS ≥7 |

a Blinded but aware that patients had biopsy-proven PCa.

b PI-RADSv2 scores were generated from reports based on PI-RADSv1 and simplified qualitative system.

c Epstein's criteria = Gleason pattern ≥4, or Gleason 3 + 3 disease with core length ≥50% and/or >2 cores positive.

d For transition zone, cutoff value = 4.

csPCa = clinically significant prostate cancer; GS = Gleason score; EPE = extraprostatic extension; MRGB = magnetic resonance imaging-guided biopsy; MRI = magnetic resonance imaging; MRI–TRUS = fusion of magnetic resonance imaging and transrectal ultrasound images; NR = not reported; PCa = prostate cancer; PI-RADSv2 = Prostate Imaging Reporting and Data System version 2; PZ = peripheral zone; RP = radical prostatectomy; STRUSGB = systematic transrectal ultrasound-guided biopsy; TTB = transperineal template biopsy; TZ = transition zone.

Table 2. Study characteristics

3.3 Quality assessment

Overall, the quality of the studies was not considered high, mainly due to the patient selection domain (Fig. 2). Regarding the patient selection domain, there was generally a high risk of bias as all but four of the studies were retrospective in nature [16 17 18 20 21 22 23 25 26 27 28 30 32 33 34 35 36]. Seven studies were considered to have high concern for applicability, as all or some of the patients had a pathological diagnosis of PCa prior to MRI [16 17 22 26 27 32 35]. Regarding the index test domain, there was a high risk of bias in nine studies. In three of the nine studies, reviewers were aware that patients had biopsy-proven PCa [22 26 35]. In the other six studies, the cutoff value for determining PCa was not specified prior to interpretation [16 20 29 31 33 34]. Only one study had concern for applicability, as PI-RADSv2 scores were indirectly generated from existing clinical radiological reports based on PI-RADSv1 or an in-house scoring system [29]. Regarding the reference standard domain, eight studies had a high risk of bias. Seven were based on either only systematic biopsy or only targeted biopsy [18 19 21 28 30 32 33]; in one study, targeted biopsy was performed, but on lesions that were suspicious on ultrasonography and not on MRI [31]. Studies in which radical prostatectomy or a systematic plus targeted biopsy (MRI guided, MRI–transrectal ultrasound fusion, or cognitive) was used as the reference standard were considered to have a low risk of bias. In 10 studies, the definition of clinically significant cancer did not follow that described in the PI-RADSv2 guidelines, and these studies therefore showed high concern for applicability [16 21 22 24 29 30 32 34 35 36]. Regarding the flow and timing domain, two studies had a high risk of bias as patients did not receive the same reference standard [17 18].

Fig. 2
Grouped bar charts show (A) risk of bias and (B) concerns for applicability of 21 included studies using QUADAS-2. QUADAS-2 = Quality Assessment of Diagnostic Accuracy Studies-2.

3.4 Diagnostic accuracy of PI-RADSv2

The sensitivity and specificity of individual studies were 73–100% and 8–100%, respectively. The Q-test revealed that substantial heterogeneity was present (p < 0.001). The Higgins I² statistic demonstrated substantial heterogeneity in terms of the sensitivity (I² = 85.55%) and considerable heterogeneity in terms of the specificity (I² = 95.30%). The coupled forest plot of the sensitivity and specificity demonstrated the absence of a threshold effect (Fig. 3). The Spearman correlation coefficient between the sensitivity and the false-positive rate was 0.45 (95% confidence interval [CI] 0.023–0.738), also indicating the lack of a threshold effect.
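For readers unfamiliar with these heterogeneity statistics, the sketch below shows the usual univariate definitions: Cochran's Q as the inverse-variance weighted sum of squared deviations, and Higgins I² as the share of Q in excess of its degrees of freedom. Whether the software used in this study computes I² in exactly this way is not stated here, and the inputs below are hypothetical.

```python
def cochran_q(effects, variances):
    """Cochran's Q: weighted squared deviations from the inverse-variance mean."""
    weights = [1.0 / v for v in variances]
    mean = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    return sum(w * (y - mean) ** 2 for w, y in zip(weights, effects))


def higgins_i2(q, k):
    """I^2 in percent: 100 * (Q - df) / Q with df = k - 1, truncated at 0."""
    df = k - 1
    if q <= 0:
        return 0.0
    return max(0.0, 100.0 * (q - df) / q)


# Hypothetical logit-transformed sensitivities and their variances for four studies.
effects = [2.0, 2.5, 1.2, 3.0]
variances = [0.10, 0.08, 0.12, 0.20]

q = cochran_q(effects, variances)
print(q, higgins_i2(q, len(effects)))
```

Values of I² close to 100% mean that almost all observed variation between studies exceeds what sampling error alone would produce.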

Fig. 3
Coupled forest plots of pooled sensitivity and specificity. Numbers are pooled estimates with 95% CI in parentheses. Corresponding heterogeneity statistics are provided at bottom right corners. Horizontal lines indicate 95% CIs. CI = confidence intervals.

For all 21 studies combined, the pooled sensitivity was 0.89 (95% CI 0.86–0.92) with specificity of 0.73 (95% CI 0.60–0.83; Fig. 3 ). In the HSROC curve, there was a large difference between the 95% confidence region and the 95% prediction region, thus indicating heterogeneity between the studies ( Fig. 4 ). The area under the HSROC curve was 0.91 (95% CI 0.88–0.93). According to the Deeks’ funnel plot, the likelihood of publication bias was low, with a p value of 0.75 for the slope coefficient ( Fig. 5 ).
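One way to read pooled sensitivity and specificity is through likelihood ratios. The sketch below derives them from the pooled point estimates above (0.89 and 0.73) using the standard formulas. These derived values are not reported in the text, they are not the same as likelihood ratios pooled within a bivariate model, and the pre-test probability is a hypothetical input chosen for illustration.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (positive LR, negative LR) from sensitivity and specificity."""
    lr_pos = sensitivity / (1.0 - specificity)
    lr_neg = (1.0 - sensitivity) / specificity
    return lr_pos, lr_neg


def post_test_probability(pre_test_probability, likelihood_ratio):
    """Update a pre-test probability with a likelihood ratio via odds."""
    pre_odds = pre_test_probability / (1.0 - pre_test_probability)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1.0 + post_odds)


# Pooled point estimates from the text; confidence intervals are ignored here.
lr_pos, lr_neg = likelihood_ratios(0.89, 0.73)

pretest = 0.40  # hypothetical pre-test probability of PCa
print(round(lr_pos, 2), round(lr_neg, 2))
print(round(post_test_probability(pretest, lr_pos), 2),
      round(post_test_probability(pretest, lr_neg), 2))
```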

Fig. 4
Hierarchical summary receiver operating characteristic curve of the diagnostic performance of PI-RADSv2 for detecting prostate cancer. HSROC = hierarchical summary receiver operating characteristic; PI-RADSv2 = Prostate Imaging Reporting and Data System version 2.

Fig. 5
Deeks’ funnel plot. A p value of 0.75 suggests that the likelihood of publication bias is low.

For the six studies that provided a head-to-head comparison between PI-RADSv1 and PI-RADSv2, PI-RADSv2 demonstrated higher pooled sensitivity of 0.95 (95% CI 0.85–0.98) compared with 0.88 (95% CI 0.80–0.93) for PI-RADSv1 ( p = 0.04). However, the pooled specificity did not show a significant difference between the two versions of PI-RADS: 0.73 (95% CI 0.47–0.89) for v2 and 0.75 (95% CI 0.36–0.94) for v1 ( p = 0.90).

3.5 Heterogeneity exploration

As we found considerable heterogeneity among the included studies, meta-regression analyses were performed (Supplementary Table 1). Among several potential variables, proportion of patients with PCa, magnetic field strength, and reference standard were significant factors affecting the heterogeneity (p < 0.01 for all). However, among these three, only specificity according to the proportion of patients with PCa showed a statistically significant and clinically meaningful difference: 0.65 (95% CI 0.52–0.78) in studies with ≤50% of patients with PCa versus 0.86 (95% CI 0.75–0.97) in studies with >50% of patients with PCa. Otherwise, no clinically meaningful differences were seen: for magnet strength (3 vs 1.5 T), sensitivity of 0.90 (95% CI 0.86–0.94) versus 0.89 (95% CI 0.81–0.97, p = 0.03) and specificity of 0.73 (95% CI 0.59–0.86) versus 0.72 (95% CI 0.44–1.00, p = 0.81); for reference standard (radical prostatectomy vs biopsy), sensitivity of 0.89 (95% CI 0.83–0.95) versus 0.91 (95% CI 0.88–0.94, p < 0.01) and specificity of 0.65 (95% CI 0.37–0.94) versus 0.73 (95% CI 0.58–0.87, p = 0.48). Other variables, including the cutoff value, use of endorectal coil, and type of analysis (per patient vs per lesion), were not significant factors (p = 0.32–0.70).

3.6 Subgroup analysis

As four studies used both ≥3 and ≥4 as cutoff values [17 22 23 29] and five studies determined both any cancer and clinically significant cancer as outcomes [16 21 24 29 32], multiple subgroup analyses were performed in order to assess various clinical settings (Supplementary Table 2). Regarding cutoff values, the pooled sensitivity was 0.89 (95% CI 0.84–0.92) with specificity of 0.74 (95% CI 0.58–0.85) for 17 studies using ≥4 [16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 35], whereas these were 0.95 (95% CI 0.89–0.97) and 0.47 (95% CI 0.21–0.74) in eight studies using ≥3 [17 22 23 29 32 33 34 36]. When we stratified studies according to the outcome assessed, the following results were obtained: (1) cutoff of ≥4 for determining any PCa, sensitivity of 0.89 (95% CI 0.83–0.93) with specificity of 0.80 (95% CI 0.62–0.90); (2) cutoff of ≥3 for determining any PCa, sensitivity of 0.96 (95% CI 0.93–0.98) with specificity of 0.49 (95% CI 0.29–0.70); (3) for determining csPCa regardless of cutoff values, sensitivity of 0.89 (95% CI 0.84–0.92) with specificity of 0.64 (95% CI 0.46–0.78); (4) cutoff of ≥4 for determining csPCa, sensitivity of 0.90 (95% CI 0.85–0.94) with specificity of 0.62 (95% CI 0.45–0.77); and (5) cutoff of ≥3 for determining csPCa, sensitivity of 0.96 (95% CI 0.87–0.99) with specificity of 0.29 (95% CI 0.05–0.77). When the studies using a cutoff value of ≥4 were separately assessed according to the type of analysis, per-patient analysis in eight studies [17 18 20 26 27 29 31 35] yielded pooled sensitivity of 0.89 (95% CI 0.81–0.93) with specificity of 0.76 (95% CI 0.60–0.88), whereas per-lesion analysis in nine studies [16 19 21 22 23 24 25 28 30] yielded pooled sensitivity of 0.87 (95% CI 0.83–0.91) with specificity of 0.70 (95% CI 0.44–0.88).

Based on the localization of PCa, the pooled sensitivity was 0.93 (95% CI 0.87–0.96) with specificity of 0.68 (95% CI 0.43–0.86) in seven studies analyzing PZ cancers [20 24 25 28 30 31 33]. The same studies, except for that by Stanzione et al [31], also analyzed TZ cancers, for which the pooled sensitivity and specificity were 0.88 (95% CI 0.77–0.94) and 0.75 (95% CI 0.59–0.86), respectively.

Studies including patients without previous biopsies yielded sensitivity of 0.82 (95% CI 0.72–0.90) and specificity of 0.75 (95% CI 0.65–0.83), whereas the diagnostic performance values were 0.87 (95% CI 0.80–0.92) and 0.71 (95% CI 0.42–0.89) in studies including patients with a history of previous biopsy.

3.7 Discussion

In our meta-analysis, we assessed the diagnostic accuracy of PI-RADSv2 for detecting PCa. The pooled sensitivity and specificity of all 21 studies were 0.89 (95% CI 0.86–0.92) and 0.73 (95% CI 0.60–0.83), respectively. When comparing our data with the only two existing meta-analyses using mpMRI for detecting PCa, a trend toward higher sensitivity and lower specificity can be inferred. In the study by de Rooij et al [37], which evaluated seven studies using a combination of T2WI, DWI, and DCE-MRI, the pooled sensitivity and specificity were 0.74 (95% CI 0.66–0.81) and 0.88 (95% CI 0.82–0.92), respectively. In a more recent meta-analysis by Hamoen et al [6], which analyzed 14 studies using PI-RADSv1, the pooled sensitivity and specificity were 0.78 (95% CI 0.70–0.84) and 0.79 (95% CI 0.68–0.86), respectively. However, setting the three meta-analyses side by side provides only an indirect comparison. To address this issue, we separately assessed a subgroup of studies using both PI-RADSv1 and PI-RADSv2. In a head-to-head comparison between them, PI-RADSv2 demonstrated higher pooled sensitivity (0.95) compared with PI-RADSv1 (0.88, p = 0.04) without a statistically significant difference in specificity (0.73 vs 0.75, p = 0.90). This increase in sensitivity compared with its predecessor may imply that the revisions undertaken during the development of PI-RADSv2, including the introduction of dominant sequences according to zonal anatomy, limited contribution of DCE-MRI secondary to DWI and T2WI, and specific guidelines for deriving an integrated overall score, were, in fact, on the right track. In particular, we speculate that the use of dominant sequences, that is, DWI for the PZ and T2WI for the TZ, may have been crucial for the improved sensitivity without a loss in specificity, as suggested by Baur et al [10].

Considering that one of the main intentions for the generation of PI-RADS was to standardize reporting of mpMRI in order to decrease variability and bring about widespread acceptance and implementation in daily practice, it was promising to find that nearly all (20 of 21) studies used PI-RADSv2 strictly according to published guidelines [11]. Only one study formed PI-RADSv2 scores from existing clinical radiological reports that were based on PI-RADSv1 or an in-house scoring system [29]. This is an improvement when compared with prior studies conducted using PI-RADSv1, where investigators used varying methods in determining the overall score (overall five-point score or sum of the scores from each modality) [6]. Still, there is a need for further clarification regarding the cutoff value for detecting PCa. In the studies included in our meta-analysis, cutoff values were predefined in only six studies, while the majority (15/21) were exploratory in nature, testing multiple criteria. When using a cutoff value of ≥4, sensitivity (0.89) and specificity (0.74) were generally good, whereas using ≥3 yielded excellent sensitivity (0.95) and poor specificity (0.47). These results may be taken into consideration when generating the next updated PI-RADS. For instance, a cutoff of ≥4 may be adequate for general use of PI-RADS, whereas a cutoff of ≥3 could be proposed when a higher cancer detection rate is clinically required (eg, persistently high PSA level despite a previously negative biopsy).
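
To make the trade-off between the two cutoffs concrete, the pooled estimates quoted above can be converted into likelihood ratios and post-test probabilities. The following Python sketch is illustrative only: the function names and the 50% pre-test probability are assumptions chosen for the example, not values or methods taken from the included studies.

```python
def likelihood_ratios(sensitivity: float, specificity: float) -> tuple[float, float]:
    """Return (positive LR, negative LR) for a binary test."""
    lr_pos = sensitivity / (1.0 - specificity)
    lr_neg = (1.0 - sensitivity) / specificity
    return lr_pos, lr_neg


def post_test_probability(pre_test: float, lr: float) -> float:
    """Update a pre-test probability with a likelihood ratio via odds."""
    post_odds = pre_test / (1.0 - pre_test) * lr
    return post_odds / (1.0 + post_odds)


# Pooled (sensitivity, specificity) reported in the text for each cutoff.
pooled = {">=4": (0.89, 0.74), ">=3": (0.95, 0.47)}

# Hypothetical pre-test probability, used only for illustration.
PRE_TEST = 0.50

results = {}
for cutoff, (sens, spec) in pooled.items():
    lr_pos, lr_neg = likelihood_ratios(sens, spec)
    results[cutoff] = {
        "lr_pos": lr_pos,
        "lr_neg": lr_neg,
        "prob_if_positive": post_test_probability(PRE_TEST, lr_pos),
        "prob_if_negative": post_test_probability(PRE_TEST, lr_neg),
    }
    print(cutoff, {k: round(v, 2) for k, v in results[cutoff].items()})
```

Under these assumptions, a positive result at ≥4 shifts the probability of cancer more than a positive result at ≥3, while a negative result at ≥3 lowers it slightly further, which matches the qualitative reading given in the text.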

In the current study, subgroup analyses were performed to account for differences in outcomes (any cancer vs clinically significant cancer). There was no significant difference for using either outcome irrespective of whether the criteria of ≥3 or ≥4 were used. However, the definition of clinically significant cancer was different among the 13 studies. Only three studies defined csPCa strictly according to the PI-RADSv2 guidelines (Gleason score ≥7 [3 + 4], volume >0.5 ml, or extraprostatic extension) [11]. Most others used one or two of the three criteria. Including only the former three studies may have provided more robust results; however, including all available studies was more pragmatic and presents a general overview of the existing literature, as this is the first meta-analysis of studies dealing with PI-RADSv2.

In this meta-analysis, we looked into the technical aspects of MRI. Meta-regression analyses revealed that the use of endorectal coil was not a statistically significant factor. Furthermore, although sensitivity differed significantly between 3 and 1.5 T, the difference was not clinically meaningful (0.90 vs 0.89, respectively; p = 0.03). Although there had been debate over these two issues in the past, both 3 and 1.5 T are now well established, and the overall benefit of using an endorectal coil is not evident [38 39]. The PI-RADSv2 guidelines currently accept either field strength and imaging with or without an endorectal coil, and the results of our study provide additional evidence to support this.

Regarding the methods of analysis in the studies, there was significant heterogeneity regarding reference standard and type of analysis. Radical prostatectomy was the reference standard in five studies, while the majority were based on a combination of systematic and targeted biopsies. The possibility of PCa despite negative biopsy results in the latter group should be kept in mind. In addition, the studies were split roughly evenly between those reporting outcomes in a per-patient (n = 11) and those reporting in a per-lesion (n = 10) manner. Per-lesion analysis is known to take into account the performance of localizing the disease; however, this was not shown to be a significant factor in the meta-regression analysis.

Our meta-analysis had some limitations. Nearly all studies were retrospective in study design, resulting in a high risk of bias for patient selection. It is possible that pooling data from predominantly retrospective studies may have led to increased diagnostic sensitivity [40]. However, performing a meta-analysis using only the three prospective studies was technically unfeasible, and the derived results would not be representative of the existing literature on PI-RADSv2. Furthermore, we used validated methods for the systematic review and reported the data using standard reporting guidelines, including PRISMA and the guidelines of the Handbook for Diagnostic Test Accuracy Reviews published by the Cochrane Collaboration [12 41]. Another limitation is considerable heterogeneity in our pooled analysis, which affects the general applicability of our summary estimates. To explore the heterogeneity of our data, we performed meta-regression and multiple subgroup analyses. According to the analyses, the proportion of patients with PCa, the magnetic field strength, and the reference standard were significant factors affecting the heterogeneity. In particular, the reference standard comprised various methods, including radical prostatectomy and a combination of systematic and targeted biopsies (ie, MRI guided, MRI-transrectal ultrasound fusion, or cognitive). Furthermore, the fact that various definitions were used for clinically significant cancer needs to be emphasized. Our meta-regression and subgroup analyses may explain some of the heterogeneity, but a portion remains unexplained. Another important limitation is the small number of included studies for head-to-head comparison between PI-RADSv1 and PI-RADSv2. Nevertheless, using only the six studies published to date, we were able to demonstrate a statistically significant difference in sensitivity between the two versions.

PI-RADSv2 shows good performance for the detection of PCa with pooled sensitivity of 0.89 and specificity of 0.73. PI-RADSv2 has higher pooled sensitivity compared with PI-RADSv1 without significantly different specificity.

Author contributions: Sang Youn Kim had full access to all the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis.

Study concept and design: Woo, Suh, S.Y. Kim.

Acquisition of data: Woo, Suh, S.Y. Kim.

Analysis and interpretation of data: Woo, Suh, S.Y. Kim.

Drafting of the manuscript: Woo.

Critical revision of the manuscript for important intellectual content: Suh, S.Y. Kim, Cho, S.H. Kim.

Statistical analysis: Suh.

Obtaining funding: None.

Administrative, technical, or material support: None.

Supervision: S.Y. Kim, Cho, S.H. Kim.

Other: None.

Financial disclosures: Sang Youn Kim certifies that all conflicts of interest, including specific financial interests and relationships and affiliations relevant to the subject matter or materials discussed in the manuscript (eg, employment/affiliation, grants or funding, consultancies, honoraria, stock ownership or options, expert testimony, royalties, or patents filed, received, or pending), are the following: None.

Funding/Support and role of the sponsor: None.

  • [1] American Cancer Society. Key statistics for prostate cancer. http://www.cancer.org/cancer/prostatecancer/detailedguide/prostate-cancer-key-statistics
  • [2] C.M. Hoeks, J.O. Barentsz, T. Hambrock, et al. Prostate cancer: multiparametric MR imaging for detection, localization, and staging. Radiology. 2011;261:46-66
  • [3] O. Ruprecht, P. Weisser, B. Bodelle, H. Ackermann, T.J. Vogl. MRI of the prostate: interobserver agreement compared with histopathologic outcome after radical prostatectomy. Eur J Radiol. 2012;81:456-460
  • [4] L. Dickinson, H.U. Ahmed, C. Allen, et al. Scoring systems used for the interpretation and reporting of multiparametric MRI for prostate cancer detection, localization, and characterization: could standardization lead to improved utilization of imaging within the diagnostic pathway? J Magn Reson Imaging. 2013;37:48-58
  • [5] J.O. Barentsz, J. Richenberg, R. Clements, et al. ESUR prostate MR guidelines 2012. Eur Radiol. 2012;22:746-757
  • [6] E.H. Hamoen, M. de Rooij, J.A. Witjes, J.O. Barentsz, M.M. Rovers. Use of the Prostate Imaging Reporting and Data System (PI-RADS) for prostate cancer detection with multiparametric magnetic resonance imaging: a diagnostic meta-analysis. Eur Urol. 2015;67:1112-1121
  • [7] M. Abd-Alazeez, H.U. Ahmed, M. Arya, et al. The accuracy of multiparametric MRI in men with negative biopsy and elevated PSA level: can it rule out clinically significant prostate cancer? Urol Oncol. 2014;32:45.e17-22
  • [8] A.B. Rosenkrantz, S. Kim, R.P. Lim, et al. Prostate cancer localization using multiparametric MR imaging: comparison of Prostate Imaging Reporting and Data System (PI-RADS) and Likert scales. Radiology. 2013;269:482-492
  • [9] B.G. Hansford, Y. Peng, Y. Jiang, et al. Dynamic contrast-enhanced MR imaging curve-type analysis: is it helpful in the differentiation of prostate cancer from healthy peripheral zone? Radiology. 2015;275:448-457
  • [10] A.D. Baur, A. Maxeiner, T. Franiel, et al. Evaluation of the Prostate Imaging Reporting and Data System for the detection of prostate cancer by the results of targeted biopsy of the prostate. Invest Radiol. 2014;49:411-420
  • [11] J.C. Weinreb, J.O. Barentsz, P.L. Choyke, et al. PI-RADS Prostate Imaging-Reporting and Data System: 2015, version 2. Eur Urol. 2016;69:16-40
  • [12] A. Liberati, D.G. Altman, J. Tetzlaff, et al. The PRISMA statement for reporting systematic reviews and meta-analyses of studies that evaluate health care interventions: explanation and elaboration. J Clin Epidemiol. 2009;62:e1-e34
  • [13] P. Whiting, A.W. Rutjes, J.B. Reitsma, P.M. Bossuyt, J. Kleijnen. The development of QUADAS: a tool for the quality assessment of studies of diagnostic accuracy included in systematic reviews. BMC Med Res Methodol. 2003;3:25
  • [14] C.H. Suh, S.H. Park. Successful publication of systematic review and meta-analysis of studies evaluating diagnostic test accuracy. Korean J Radiol. 2016;17:5-6
  • [15] J.J. Deeks, P. Macaskill, L. Irwig. The performance of tests of publication bias and other sample size effects in systematic reviews of diagnostic test accuracy was assessed. J Clin Epidemiol. 2005;58:882-893
  • [16] T. Auer, M. Edlinger, J. Bektic, et al. Performance of PI-RADS version 1 versus version 2 regarding the relation with histopathological results. World J Urol. 2017;35:687-693
  • [17] M. Baldisserotto, E.J. Neto, G. Carvalhal, et al. Validation of PI-RADS v.2 for prostate cancer diagnosis with MRI at 3T using an external phased-array coil. J Magn Reson Imaging. 2016;44:1354-1359
  • [18] P. De Visschere, N. Lumen, P. Ost, K. Decaestecker, E. Pattyn, G. Villeirs. Dynamic contrast-enhanced imaging has limited added value over T2-weighted imaging and diffusion-weighted imaging when using PI-RADSv2 for diagnosis of clinically significant prostate cancer in patients with elevated PSA. Clin Radiol. 2017;72:23-32
  • [19] H.A.E.-K.A. El-Samei, M.F. Amin, E.E. Hassan. Assessment of the accuracy of multi-parametric MRI with PI-RADS 2.0 scoring system in the discrimination of suspicious prostatic focal lesions. Egypt J Radiol Nucl Med. 2016;47:1075-1082
  • [20] Z.Y. Feng, L. Wang, X.D. Min, S.G. Wang, G.P. Wang, J. Cai. Prostate cancer detection with multiparametric magnetic resonance imaging: Prostate Imaging Reporting and Data System version 1 versus version 2. Chin Med J. 2016;129:2451-2459
  • [21] M. Kasel-Seibert, T. Lehmann, R. Aschenbach, et al. Assessment of PI-RADS v2 for the detection of prostate cancer. Eur J Radiol. 2016;85:726-731
  • [22] W.C. Lin, V.F. Muglia, G.E. Silva, S. Chodraui Filho, R.B. Reis, A.C. Westphalen. Multiparametric MRI of the prostate: diagnostic performance and interreader agreement of two scoring systems. Br J Radiol. 2016;89:20151056
  • [23] E. Martorana, G.M. Pirola, M. Scialpi, et al. Lesion volume predicts prostate cancer risk and aggressiveness: validation of its value alone and matched with Prostate Imaging Reporting and Data System score. BJU Int. In press. doi:10.1111/bju.13649
  • [24] F.V. Mertan, M.D. Greer, J.H. Shih, et al. Prospective evaluation of the Prostate Imaging Reporting and Data System version 2 for prostate cancer detection. J Urol. 2016;196:690-696
  • [25] B.G. Muller, J.H. Shih, S. Sankineni, et al. Prostate cancer: interobserver agreement and accuracy with the revised Prostate Imaging Reporting and Data System at multiparametric MR imaging. Radiology. 2015;277:741-750
  • [26] J.J. Park, B.K. Park. Role of PI-RADSv2 with multiparametric MRI in determining who needs active surveillance or definitive treatment according to PRIAS. J Magn Reson Imaging. 2017;45:1753-1759
  • [27] S.Y. Park, D.C. Jung, Y.T. Oh, et al. Prostate cancer: PI-RADS version 2 helps preoperatively predict clinically significant cancers. Radiology. 2016;280:108-116
  • [28] S. Polanec, T.H. Helbich, H. Bickel, et al. Head-to-head comparison of PI-RADS v2 and PI-RADS v1. Eur J Radiol. 2016;85:1125-1131
  • [29] A.R. Rastinehad, N. Waingankar, B. Turkbey, et al. Comparison of multiparametric MRI scoring systems and the impact on cancer detection in patients undergoing MR US fusion guided prostate biopsies. PLoS One. 2015;10:e0143404
  • [30] A.B. Rosenkrantz, J.S. Babb, S.S. Taneja, J.M. Ream. Proposed adjustments to PI-RADS version 2 decision rules: impact on prostate cancer detection. Radiology. In press. doi:10.1148/radiol.2016161124
  • [31] A. Stanzione, M. Imbriaco, S. Cocozza, et al. Biparametric 3T magnetic resonance imaging for prostatic cancer detection in a biopsy-naive patient population: a further improvement of PI-RADS v2? Eur J Radiol. 2016;85:2269-2274
  • [32] N. Tan, W.C. Lin, P. Khoshnoodi, et al. In-Bore 3-T MR-guided transrectal targeted prostate biopsy: Prostate Imaging Reporting and Data System version 2-based diagnostic performance for detection of prostate cancer. Radiology. In press. doi:10.1148/radiol.2016152827
  • [33] S. Tewes, N. Mokov, D. Hartung, et al. Standardized reporting of prostate MRI: comparison of the Prostate Imaging Reporting and Data System (PI-RADS) version 1 and version 2. PLoS One. 2016;11:e0162879
  • [34] S. Washino, T. Okochi, K. Saito, et al. Combination of PI-RADS score and PSA density predicts biopsy outcome in biopsy naive patients. BJU Int. 2017;119:225-233
  • [35] S. Woo, S.Y. Kim, J. Lee, S.H. Kim, J.Y. Cho. PI-RADS version 2 for prediction of pathological downgrading after radical prostatectomy: a preliminary study in patients with biopsy-proven Gleason Score 7 (3 + 4) prostate cancer. Eur Radiol. 2016;26:3580-3587
  • [36] C. Zhao, G. Gao, D. Fang, et al. The efficiency of multiparametric magnetic resonance imaging (mpMRI) using PI-RADS Version 2 in the diagnosis of clinically significant prostate cancer. Clin Imaging. 2016;40:885-888
  • [37] M. de Rooij, E.H. Hamoen, J.J. Futterer, J.O. Barentsz, M.M. Rovers. Accuracy of multiparametric MRI for prostate cancer detection: a meta-analysis. Am J Roentgenol. 2014;202:343-351
  • [38] Z.K. Shah, S.N. Elias, R. Abaza, et al. Performance comparison of 1.5-T endorectal coil MRI with 3.0-T nonendorectal coil MRI in patients with prostate cancer. Acad Radiol. 2015;22:467-474
  • [39] Y. Mazaheri, H.A. Vargas, G. Nyman, O. Akin, H. Hricak. Image artifacts on prostate diffusion-weighted magnetic resonance imaging: trade-offs at 1.5 Tesla and 3.0 Tesla. Acad Radiol. 2013;20:1041-1047
  • [40] J.J. Deeks. Systematic reviews in health care: systematic reviews of evaluations of diagnostic and screening tests. BMJ. 2001;323:157-162
  • [41] J.J. Deeks, S. Wisniewski, C. Davenport. Guide to the contents of a Cochrane diagnostic test accuracy protocol. In: J.J. Deeks, P.M. Bossuyt, C. Gatsonis (eds). Cochrane handbook for systematic reviews of diagnostic test accuracy version 1.0.0. Chapter 4. The Cochrane Collaboration, 2013. http://srdta.cochrane.org/

Prostate cancer (PCa) is the second leading cause of cancer-related mortality in Western men [1]. With recent technological advancements and growing availability, multiparametric magnetic resonance imaging (mpMRI) is increasingly being used for guiding several aspects of PCa management, including detection, staging, and treatment planning [2]. Despite abundant evidence in the literature reporting high accuracy of mpMRI for PCa diagnosis, widespread acceptance has been hampered by several factors including difficulty of interpretation, lack of standardized criteria for interpretation (ie, use of Likert scales based on the radiologist's subjective level of suspicion for PCa), and resulting substantial inter-reader variability [3 4].

To bring standardization to the evaluation and reporting of mpMRI of the prostate, the European Society of Urogenital Radiology (ESUR) published a guideline termed Prostate Imaging Reporting and Data System (PI-RADS) in 2012 [5] . PI-RADS was generated based on expert consensus and provides a detailed scoring system for each MRI sequence (T2-weighted imaging [T2WI], diffusion-weighted imaging [DWI], dynamic contrast-enhanced MRI [DCE-MRI], and MR spectroscopy) for the presence of clinically significant PCa (csPCa). Several investigators have validated the accuracy and reproducibility of PI-RADS, and a recent meta-analysis reported pooled sensitivity and specificity of 0.78 and 0.79, respectively [6] . However, as there was no guideline for the generation of an overall score, different research groups utilized various measures for this purpose—some used a sum of the scores from each sequence (ranging from 3 to 15), whereas others used an overall score of 1–5 [7 8] . Furthermore, emerging data questioned the value of curve-type analysis of DCE-MRI [9] . In addition, investigators suggested that some sequences may be more important in determining the likelihood of PCa (ie, DWI in the peripheral zone [PZ] and T2WI in the transition zone [TZ]) rather than equal weighting for all sequences [10] .

To address these issues, the ESUR and American College of Radiology recently released the updated PI-RADS version 2 (PI-RADSv2) [11]. The main changes from PI-RADSv1 to PI-RADSv2 are the following: (1) introduction of dominant sequences according to zonal anatomy (DWI for the PZ and T2WI for the TZ), (2) limited contribution of DCE-MRI data, reduced to the mere presence or absence of early focal enhancement, and (3) generation of an overall score (1–5) integrating findings across all MRI sequences.

Since then, several studies assessing the value of PI-RADSv2 have been published. However, the diagnostic performance of this new scoring system has not been evaluated systematically. Therefore, the purpose of our study was to assess the diagnostic performance of PI-RADSv2 for the detection of PCa. In addition, we aimed to compare the diagnostic performance of PI-RADSv1 and PI-RADSv2 in studies available for head-to-head comparison.

This meta-analysis was performed and written according to the preferred reporting items for systematic reviews and meta-analyses (PRISMA) guidelines [12] .

2.1 Literature search

A computerized search of MEDLINE and EMBASE up to December 7, 2016, was performed in order to identify studies evaluating the diagnostic performance of PI-RADSv2 for the detection of PCa. The search query combined synonyms for PCa, MRI, and PI-RADS as follows: (prostate cancer OR prostatic cancer OR prostate neoplasm OR prostatic neoplasm OR prostate tumor OR prostatic tumor OR prostate carcinoma OR prostatic carcinoma OR PCa) AND (magnetic resonance imaging OR MRI OR MR) AND (prostate imaging reporting and data system OR pi-rads OR pi rads OR pirads). Bibliographies of identified articles were also screened to expand the scope of search. Our search was limited to publications in English.
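
The Boolean structure of this search query can be sketched in Python as follows. The helper `or_block` and the variable names are hypothetical, and the resulting string mirrors the query as written above rather than the exact field syntax of either database interface.

```python
def or_block(terms: list[str]) -> str:
    """Join synonyms with OR and wrap them in parentheses."""
    return "(" + " OR ".join(terms) + ")"


# Synonym groups copied from the search strategy described in the text.
pca_terms = [
    "prostate cancer", "prostatic cancer",
    "prostate neoplasm", "prostatic neoplasm",
    "prostate tumor", "prostatic tumor",
    "prostate carcinoma", "prostatic carcinoma",
    "PCa",
]
mri_terms = ["magnetic resonance imaging", "MRI", "MR"]
pirads_terms = [
    "prostate imaging reporting and data system",
    "pi-rads", "pi rads", "pirads",
]

# The three concept groups are combined with AND.
query = " AND ".join(or_block(group) for group in (pca_terms, mri_terms, pirads_terms))
print(query)
```

Keeping each concept as its own list makes it easy to see that a record must match at least one synonym from every group.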

2.2 Study selection

2.2.1 Inclusion criteria

Studies were included if they satisfied all the following requirements according to the PICOS criteria [12] : (1) included patients with suspected or diagnosed PCa; (2) for index test, mpMRI of the prostate including all required sequences of T2WI, DWI, and DCE-MRI was performed and assessed with a PI-RADSv2 scoring system; (3) for comparison, a reference standard based on the histopathological examination of radical prostatectomy or biopsy was used; (4) results were reported in sufficient detail for the reconstruction of 2 × 2 tables and determination of sensitivity and specificity at specified cutoff values for evaluating the diagnostic performance of PI-RADSv2; and (5) studies had to be original articles.
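
Criterion (4) requires enough detail to rebuild a 2 × 2 table for each study. As a minimal sketch, assuming hypothetical counts that are not taken from any included study, sensitivity and specificity follow from such a table as shown here; the class name `TwoByTwo` is an illustrative choice.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Cross-tabulation of an index test against the reference standard."""
    tp: int  # PI-RADSv2 positive, cancer on histopathology
    fp: int  # PI-RADSv2 positive, no cancer on histopathology
    fn: int  # PI-RADSv2 negative, cancer on histopathology
    tn: int  # PI-RADSv2 negative, no cancer on histopathology

    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)


# Hypothetical counts for illustration only.
example = TwoByTwo(tp=45, fp=10, fn=5, tn=40)
print(f"sensitivity={example.sensitivity():.2f}, specificity={example.specificity():.2f}")
```

A study that reports only sensitivity and specificity, without the number of patients with and without cancer, does not allow these four cells to be recovered and would therefore fail criterion (4).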

2.2.2 Exclusion criteria

Studies were excluded if any of the following criteria were met: (1) studies involving <10 patients; (2) review articles, guidelines, consensus statements, letters, editorials, and conference abstracts; (3) studies using only PI-RADSv1 for the evaluation of mpMRI of the prostate; (4) studies focusing on topics other than using the PI-RADSv2 system for diagnosing PCa (ie, staging and prediction of biochemical recurrence); and (5) studies with overlapping patient population.

Two reviewers (S.W. and C.H.S., with 3 yr of experience in performing systematic reviews and meta-analyses) independently evaluated the eligibility of the selected studies from the literature. Disagreements, if present between the two reviewers, were resolved by consensus via discussion with a third reviewer (S.Y.K.).

2.3 Data extraction and quality assessment

We extracted the following data regarding study design and results from the selected studies using a standardized form:

  1. Study characteristics: authors, year of publication, country and institution of origin, duration of patient recruitment, and study design (prospective vs retrospective and consecutive or not)

  2. Demographic and clinical characteristics: sample size, number of patients with PCa, patient age, prostate-specific antigen (PSA) level and Gleason score, number of previous biopsies, and PCa diagnosis prior to mpMRI

  3. Technical characteristics of mpMRI: scanner model and manufacturer, magnetic field strength (1.5 vs 3 T), coil type (endorectal vs pelvic phased array), and specific sequences used (T2WI, DWI, DCE, or MR spectroscopy)

  4. Interpretation of mpMRI: number of reviewers and experience in prostate mpMRI, independent or consensus reading, and blinding to clinicopathological information

  5. Reference standard: type of reference standard (radical prostatectomy, targeted biopsy, or systematic biopsy), interval between MRI and pathology, outcomes assessed (any PCa vs csPCa), definition of csPCa (studies assessing “clinically significant”, “aggressive”, or “high-grade” PCa were all considered to assess csPCa; however, only studies that used the definition provided by the PI-RADSv2 guideline [Gleason score ≥7 (3 + 4), volume >0.5 ml, or extraprostatic extension] were considered not to have concern for applicability), separate analysis for the PZ and TZ, and type of analysis (per patient vs per lesion)

  6. Diagnostic performance of PI-RADSv2, including criteria or cutoff values (in case of multiple readers, the results of the most experienced reader were extracted for this meta-analysis)

The methodological quality of the included studies was assessed using tailored questionnaires and criteria provided by Quality Assessment of Diagnostic Accuracy Studies-2 [13] . Data extraction and quality assessment were performed independently by two reviewers (S.W. and C.H.S.). All disagreements were resolved by consensus through discussion with the third reviewer (S.Y.K.).

2.4 Data synthesis and analysis

The diagnostic performance of PI-RADSv2 for the detection of PCa was the primary outcome for this meta-analysis. In addition, a comparison between the diagnostic performance of PI-RADSv2 and that of PI-RADSv1 using studies that reported head-to-head comparison data of the two PI-RADS versions was considered a secondary outcome.

Pooled estimates of sensitivity and specificity were calculated using hierarchical logistic regression modeling including bivariate modeling and hierarchical summary receiver operating characteristic (HSROC) modeling [14] . For graphical presentation of the results, an HSROC curve with 95% confidence region and prediction region was plotted. Publication bias was evaluated using the Deeks’ funnel plot, and statistical significance was tested with the Deeks’ asymmetry test [15] .
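
The bivariate and HSROC models jointly model sensitivity and specificity and are fitted with dedicated statistical software. As a deliberately simplified illustration of pooling on the logit scale, the sketch below applies univariate DerSimonian-Laird random-effects pooling to sensitivity alone. This is a swapped-in, simpler technique and not the model used in this meta-analysis; the function name and the counts are hypothetical.

```python
import math


def inv_logit(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def pool_logit_dl(events, totals, z=1.96):
    """Pool proportions on the logit scale (DerSimonian-Laird random effects).

    Returns (pooled proportion, lower CI bound, upper CI bound).
    """
    ys, vs = [], []
    for e, n in zip(events, totals):
        # A 0.5 continuity correction keeps the logit finite when e == 0 or e == n.
        pos, neg = e + 0.5, (n - e) + 0.5
        ys.append(math.log(pos / neg))
        vs.append(1.0 / pos + 1.0 / neg)

    # Fixed-effect (inverse-variance) estimate and Cochran's Q.
    w = [1.0 / v for v in vs]
    sw = sum(w)
    y_fixed = sum(wi * yi for wi, yi in zip(w, ys)) / sw
    q = sum(wi * (yi - y_fixed) ** 2 for wi, yi in zip(w, ys))

    # Method-of-moments estimate of the between-study variance.
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - (len(ys) - 1)) / c) if c > 0 else 0.0

    # Random-effects weights and pooled logit with its standard error.
    w_re = [1.0 / (v + tau2) for v in vs]
    y_re = sum(wi * yi for wi, yi in zip(w_re, ys)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    return inv_logit(y_re), inv_logit(y_re - z * se), inv_logit(y_re + z * se)


# Hypothetical per-study counts: true positives and patients with cancer.
true_positives = [45, 80, 30]
diseased = [50, 95, 40]
estimate, lower, upper = pool_logit_dl(true_positives, diseased)
print(f"pooled sensitivity {estimate:.2f} (95% CI {lower:.2f}-{upper:.2f})")
```

Unlike this univariate sketch, the bivariate model also accounts for the correlation between sensitivity and specificity across studies, which is why it is preferred for diagnostic accuracy data.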

We performed meta-regression analyses to investigate the cause of heterogeneity. The following covariates were considered for the bivariate model: (1) proportion of patients with PCa (>50% vs ≤50%), (2) magnet strength of MRI (3 vs 1.5 T), (3) use of endorectal coil, (4) cutoff value (≥4 vs ≥3), (5) reference standard (radical prostatectomy vs biopsy), and (6) type of analysis (per patient vs per lesion). In addition, multiple subgroup analyses were performed for cutoff value, outcome, and previous biopsy history to assess various clinical settings: (1) a cutoff value of ≥4 for all studies, (2) a cutoff value of ≥3 for all studies, (3) a cutoff value of ≥4 for determining any PCa, (4) a cutoff value of ≥3 for determining any PCa, (5) a cutoff value of ≥4 for determining csPCa, (6) a cutoff value of ≥3 for determining csPCa, (7) a cutoff value of ≥4 in studies using per-patient analysis, (8) a cutoff value of ≥4 in studies using per-lesion analysis, (9) studies analyzing PZ PCa, (10) studies analyzing TZ PCa, (11) patients without previous biopsies, and (12) patients with previous biopsies. The “metandi” and “midas” modules in Stata 10.0 (StataCorp LP, College Station, TX, USA) and “mada” package in R software version 3.2.1 (R Foundation for Statistical Computing, Vienna, Austria) were used for statistical analyses, with p < 0.05 signifying statistical significance.

3.1 Literature search

A systematic literature search initially identified 287 articles. After removing 46 duplicates, screening of the 241 titles and abstracts yielded 105 potentially eligible articles. Full-text reviews were performed, and 84 studies were excluded for the following reasons: not in the field of interest (n = 80, including 68 studies that used only PI-RADSv1), insufficient data to reconstruct 2 × 2 tables (n = 2), and shared study population with other studies (n = 2). Ultimately, 21 original articles including a total of 3857 patients assessing the diagnostic performance of PI-RADSv2 were included in the meta-analysis [16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36]. No additional studies were identified via screening the bibliographies of these 21 studies. Among them, 15 studies including 3099 patients dealt with PI-RADSv2 alone, whereas six studies including 758 patients provided a head-to-head comparison between PI-RADSv1 and PI-RADSv2 [16 20 21 28 32 33]. The detailed study selection process is described in Fig. 1.
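
The counts in this selection process can be checked with simple arithmetic. The short Python sketch below uses only the numbers stated in the paragraph above; the variable names are illustrative.

```python
# Counts as reported in the study selection paragraph.
identified = 287
duplicates = 46
screened = identified - duplicates

full_text_reviewed = 105
exclusions = {
    "not in the field of interest": 80,
    "insufficient data to reconstruct 2 x 2 tables": 2,
    "shared study population": 2,
}
excluded = sum(exclusions.values())
included = full_text_reviewed - excluded

# Split of included studies and patients by study type.
studies = {"PI-RADSv2 alone": 15, "head-to-head v1 vs v2": 6}
patients = {"PI-RADSv2 alone": 3099, "head-to-head v1 vs v2": 758}

print(screened, excluded, included, sum(studies.values()), sum(patients.values()))
```

Each derived value agrees with the figure reported in the text: 241 records screened, 84 full-text exclusions, 21 included studies, and 3857 patients in total.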

Fig. 1
PRISMA flow diagram showing the study selection process for the meta-analysis. a Included original articles for qualitative and quantitative analyses are references [16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36]. MRI = magnetic resonance imaging; PI-RADS = Prostate Imaging Reporting and Data System; PRISMA = Preferred Reporting Items for Systematic Reviews and Meta-analyses.

3.2 Characteristics of included studies

Patient characteristics are shown in Table 1 . The size of the study population ranged from 49 to 456 patients, with the percentage of those with PCa ranging from 37% to 100%. The patients had a median age of 62–69.6 yr, median PSA of 3.97–15 ng/ml, and a Gleason score ranging from 5 to 10. Patients had already been diagnosed with PCa prior to MRI in all or some of the study populations in seven studies [16 17 22 26 27 32 35] . Biopsy was performed before MRI in seven studies [16 21 22 23 26 27 35] , all patients were biopsy-naïve in three studies [18 25 34] , both patient types were included in three studies [28 32 33] , and data regarding previous biopsy were not reported in the remaining eight studies.

First author (year) Origin Duration of patient recruitment Patients ( n ) Patients with PCa ( n ) Age (yr) PSA (ng/ml) Gleason score No. of previous biopsies PCa diagnosis before MRI
Country Institution Median Range Median Range Median Range
Auer (2016) [16] Austria Medical University of Innsbruck, Medizinische Hochschule Hannover NR 50 NR 63 a NR 7.3 a NR 7 (3 + 4) 6–10 ≥1 Yes (all)
Baldisserotto (2016) [17] Brazil Pontifícia Universidade Católica do Rio Grande do Sul 2013.6–2015.6 54 33 65.9 a 53–81 8.4 a 3–31 7 (3 + 4) 6–9 NR Yes (some)
De Visschere (2017) [18] Belgium Ghent University Hospital 2011.5–2014.12 245 144 66 44–85 9 1.4–935.5 7 (3 + 4) ≥6 0 No
El-Samei (2016) [19] Egypt Al-Azher University, El-Minia University 2014.5–2015.10 55 38 62 a 51–79 NR NR 7 6–10 NR No
Feng (2016) [20] China Tongji Hospital 2013.6–2015.7 401 150 64.4 a 34–88 10.7 0.2–1763 7 (4 + 3) ≤6–≥8 NR NR
Kasel-Seibert (2016) [21] Germany University Hospital Jena 2013.7–2015.3 82 31 65 48–81 13 1–111 7 (3 + 4) 6–9 1–5 No
Lin (2016) [22] Brazil, Taiwan Ribeirão Preto School of Medicine, China Medical University, Ribeirão Preto School of Medicine 2011.5–2014.6 49 49 63 46–73 13.27 1.8–41.4 ≥1 Yes (all)
Martorana (2016) [23] Italy University of Modena and Reggio Emilia, Perugia University 2014.1–2016.2 157 79 65 47–79 b 10.7 1–75 6 6–8 1–3 No
Mertan (2016) [24] USA National Cancer Institute 2015.3–2015.9 62 38 65.5 50.3–76.6 7.1 0.5–863 7 (3 + 4) 6–9 NR NR
Muller (2015) [25] USA, the Netherlands National Cancer Institute, AMC University Hospital, Edward Hébert School of Medicine 2011.12–2014.5 94 94 62 37–79 8.51 0.7–51.1 7 (3 + 4) 6–10 0 No
Park (2016) [26] South Korea Samsung Medical Center 2012.1–2014.12 456 456 65 46–81 3.97 3.8–4.2 NR ≥6 ≥1 Yes (all)
Park (2016) [27] South Korea Yonsei University College of Medicine 2012.1–2013.3 425 425 NR NR NR 1.4–156.9 5–6 5–10 ≥1 Yes (all)
Polanec (2016) [28] Austria Medical University of Vienna, Confraternität Vienna 2011.6–2015.9 65 NR 65.3 a 62.3–87.4 10.8 a 4.2–74.5 7 (4 + 3) 6–9 ≥0 No
Rastinehad (2015) [29] USA Icahn School of Medicine at Mount Sinai, Fox Chase Cancer Center, National Institutes of Health, Hofstra North Shore LIJ School of Medicine 2012.2–2014.11 312 202 65.1 60.3–70.3 b 7.3 5.0–11.4 b NR NR NR No
Rosenkrantz (2016) [30] USA NYU Langone Medical Center 2013.9–2015.2 343 134 64 NR 5.8 NR 7 (3 + 4) 6–9 NR NR
Stanzione (2016) [31] Italy University “Federico II”, Ospedale S. Maria delle Grazie NR 82 34 65 a 43–84 8.8 a NR 7 (3 + 4) 6–9 NR No
Tan (2016) [32] USA, Taiwan UCLA, Cathay General Hospital, Hoag Hospital 2013.3–2016.12 106 63 66.5 43–79 7.9 5.6–10.6 7 (3 + 4) 6–10 ≥0 Yes (some)
Tewes (2016) [33] Germany Hannover Medical School, Klinikum der Region Hannover 2012.12–2014.12 54 31 69.6 a NR 8.7 a NR 6 6–9 ≥0 No
Washino (2017) [34] Japan Jichi Medical University Saitama Medical Center 2010.6–2014.4 288 159 69 64–74 b 7.5 5.5–11 b NR NR 0 No
Woo (2016) [35] South Korea Seoul National University College of Medicine 2011.1–2013.12 105 105 69 49–79 8.22 0.9–44.2 7 6–7 ≥1 Yes (all)
Zhao (2016) [36] China Peking University First Hospital 2010.11–2013.12 372 185 68.5 a NR 15 a NR 7 6–10 NR No

a Mean.

b Interquartile range.

MRI = magnetic resonance imaging; NR = not reported; PCa = prostate cancer; PSA = prostate-specific antigen.

Table 1 Patient characteristics

Characteristics of the studies are summarized in Table 2. MRI was performed using 3-T scanners in 16 studies [16 17 18 20 24 25 26 27 28 29 30 31 32 33 35 36], 1.5-T scanners in four studies [19 21 22 23], and either 3 or 1.5 T in one study [34]. Endorectal coils were used in seven studies [19 21 22 23 24 25 29]. In all studies, the mpMRI protocol consisted of T2WI, DWI, and DCE-MRI. The reference standard was radical prostatectomy in five studies [16 22 26 27 35], a combination of systematic and targeted biopsies in seven studies [17 20 23 25 29 34 36], and only targeted biopsy in seven studies [19 21 24 28 30 32 33]; the reference standard was not consistent throughout the study population in two studies [18 31]. PI-RADSv2 scoring was performed by one to five radiologists, either in consensus or independently. The level of experience of the radiologists was heterogeneous, ranging from 4 to 22 yr of experience in prostate imaging. In most studies, the readers were blinded; however, in three studies, the radiologists were aware that the patients had biopsy-proven PCa [22 26 35], and five studies were not explicit regarding blinding [18 19 30 32 36]. In the majority of the studies, the interval between MRI and the reference standard was less than 6 mo; however, the details were not reported in 10 studies [16 17 18 24 27 28 29 30 32 33]. PCa was separately assessed according to zonal anatomy in seven studies [20 24 25 28 30 31 33]. However, in one study [31], only PCa in the PZ could be evaluated, as no detailed data were provided in the article and the attempt to contact the authors for provision of further information was unsuccessful. Regarding the outcome assessed, eight studies evaluated any cancer [17 19 20 23 25 28 31 33], eight evaluated clinically significant cancer [18 22 26 27 30 34 35 36], and five evaluated both [16 21 24 29 32].

With regard to cutoff values, 13 studies used ≥4 [16 18 19 20 21 24 25 26 27 28 30 31 35], four studies used ≥3 [32 33 34 36], and four studies used both [17 22 23 29]. The location of PCa was separately reported by the PZ and TZ in six studies [20 24 25 28 30 33].

Study Reader MRI
First author (year) Design Consecutive enrollment Reference standard MRI–reference standard interval No. of readers Experience (yr) Blinding Magnet strength (T) Vendor Model Endorectal coil PI-RADSv2 application Cutoff values Localization Type of analysis Outcome assessed Definition of csPCa
Auer (2016) [16] Retrospective Yes RP NR 1 >5 Yes 3 Siemens Skyra No Strict 4 Whole Lesion Any + csPCa GS ≥7 (4 + 3)
Baldisserotto (2016) [17] Retrospective Yes STRUSGB + targeted MRI–TRUS biopsy (cognitive) or RP NR 2 (independent) >10/1 Yes 3 GE Signa HDxt No Strict 3, 4 Whole Patient Any
De Visschere (2017) [18] Retrospective NR STRUSGB or RP NR NR NR NR 3 Siemens Trio No Strict 4 Whole Patient csPCa GS ≥7, ≥0.5 cc, or EPE
El-Samei (2016) [19] NR NR Targeted MRI–TRUS biopsy ≤2 wk NR NR NR 1.5 Philips Gyroscan Yes Strict 4 Whole Lesion Any
Feng (2016) [20] Retrospective Yes STRUSGB + targeted MRI–TRUS biopsy (cognitive) 1–35 d 2 (consensus) 5/4 Yes 3 Siemens Skyra No Strict 4 PZ, TZ Patient Any
Kasel-Seibert (2016) [21] Retrospective NR Targeted MRGB ≤7 d 2 (independent) 10/<1 Yes 1.5 Siemens Avanto Yes Strict 4 Whole Lesion Any + csPCa GS ≥7
Lin (2016) [22] Retrospective NR RP ≤6 mo 2 (independent) 12/5 Yes a 1.5 Philips Achieva Yes Strict 3, 4 Whole Lesion csPCa GS ≥7 and >0.5 cc
Martorana (2016) [23] Retrospective Yes TTB + targeted MRI–TRUS biopsy 30–78 2 (consensus) 4/4 Yes 1.5 Philips Achieva Yes Strict 3, 4 Whole Lesion Any GS ≥7, >0.5 cc, or EPE
Mertan (2016) [24] Prospective Yes STRUSGB + targeted MRI–TRUS biopsy NR 1 >8 Yes 3 Philips Achieva Yes Strict 4 PZ, TZ Lesion Any + csPCa GS ≥7
Muller (2015) [25] Retrospective Yes STRUSGB + targeted MRI–TRUS biopsy ≤6 wk 5 (independent) 12/7/1/1/0.5 Yes 3 Philips Achieva Yes Strict 4 PZ, TZ Lesion Any GS ≥7 (4 + 3)
Park (2016) [26] Retrospective Yes RP 21–48 2 (independent) 14/3 Yes a 3 Philips Achieva No Strict 4 Whole Patient csPCa GS ≥7, ≥0.5 cc, or EPE
Park (2016) [27] Retrospective NR RP NR 2 (independent) 9/4 Yes 3 GE, Philips, Siemens Discovery MR750, Achieva, Trio No Strict 4 Whole Patient csPCa GS ≥7, ≥0.5 cc, or EPE
Polanec (2016) [28] Retrospective Yes Targeted MRGB NR 2 (independent) NR/NR Yes 3 Siemens Trio No Strict 4 PZ, TZ Lesion Any
Rastinehad (2015) [29] Prospective NR STRUSGB + targeted MRI–TRUS biopsy NR 3 (consensus) NR Yes 3 Siemens Verio Yes Calculated b 3, 4 Whole Patient Any + csPCa Epstein's criteria c for systematic and GS ≥7 or >0.5 cc for targeted biopsy
Rosenkrantz (2016) [30] Retrospective NR Targeted MRI–TRUS biopsy NR 2 (independent) 8/3 NR 3 Siemens Biograph mMR/Prisma/Skyra/Trio No Strict 4 PZ, TZ Lesion csPCa GS ≥7
Stanzione (2016) [31] Prospective NR STRUSGB (±targeted TRUSGB) 20–30 d 2 (independent) 14/10 Yes 3 Siemens Trio No Strict 4 PZ Patient Any
Tan (2016) [32] Retrospective Yes Targeted MRGB NR 3 (consensus) 16/10 NR 3 Siemens Skyra, Trio, Verio No Strict 3 Whole Lesion Any + csPCa GS ≥7
Tewes (2016) [33] Retrospective Yes Targeted MRGB NR 2 (independent) 5/2 Yes 3 Siemens Skyra No Strict 3 d PZ, TZ Patient Any
Washino (2017) [34] Retrospective NR TTB + targeted MRI–TRUS biopsy (cognitive) 0.5–1.6 mo 1 14 Yes 1.5 or 3 Toshiba Excelart Vantage/Vantage Titan 3T No Strict 3 Whole Patient csPCa GS ≥7 or maximum core length ≥4 mm
Woo (2016) [35] Retrospective NR RP ≤6 mo 2 (consensus) 22/10 Yes a 3 Philips, Siemens Ingenia, Verio/Trio No Strict 4 Whole Patient csPCa GS ≥7
Zhao (2016) [36] Retrospective NR STRUSGB + targeted MRI–TRUS biopsy ≤3 mo 2 (independent) NR NR 3 NR NR No Strict 3 Whole Patient csPCa GS ≥7

a Blinded but aware that patients had biopsy-proven PCa.

b PI-RADSv2 scores were generated from reports based on PI-RADSv1 and simplified qualitative system.

c Epstein's criteria = Gleason pattern ≥4, or Gleason 3 + 3 disease with core length ≥50% and/or >2 cores positive.

d For transition zone, cutoff value = 4.

csPCa = clinically significant prostate cancer; GS = Gleason score; EPE = extraprostatic extension; MRGB = magnetic resonance imaging-guided biopsy; MRI = magnetic resonance imaging; MRI–TRUS = fusion of magnetic resonance imaging and transrectal ultrasound images; NR = not reported; PCa = prostate cancer; PI-RADSv2 = Prostate Imaging Reporting and Data System version 2; PZ = peripheral zone; RP = radical prostatectomy; STRUSGB = systematic transrectal ultrasound-guided biopsy; TTB = transperineal template biopsy; TZ = transition zone.

Table 2 Study characteristics

3.3 Quality assessment

Overall, the quality of the studies was not considered high, mainly due to the patient selection domain (Fig. 2). Regarding the patient selection domain, there was generally a high risk of bias as all but four of the studies were retrospective in nature [16 17 18 20 21 22 23 25 26 27 28 30 32 33 34 35 36]. Seven studies were considered to have high concern for applicability, as all or some of the patients had a pathological diagnosis of PCa prior to MRI [16 17 22 26 27 32 35]. Regarding the index test domain, there was a high risk of bias in nine studies. In three of nine studies, reviewers were aware that patients had biopsy-proven PCa [22 26 35]. In the other six studies, the cutoff value for determining PCa was not specified prior to interpretation [16 20 29 31 33 34]. Only one study had concern for applicability, as PI-RADSv2 scores were indirectly generated from existing clinical radiological reports based on PI-RADSv1 or an in-house scoring system [29]. Regarding the reference standard domain, eight studies had a high risk of bias. Seven were based on either only systematic biopsy or targeted biopsy [18 19 21 28 30 32 33]; in one study, targeted biopsy was performed, but on lesions that were suspicious on ultrasonography, and not on MRI [31]. Those in which radical prostatectomy or a systematic plus targeted biopsy (MRI guided, MRI–transrectal ultrasound fusion, or cognitive) was used as the reference standard were considered to have a low risk of bias. In 10 studies, the definition of clinically significant cancer did not abide by those described in the PI-RADSv2 guidelines, and therefore showed high concern for applicability [16 21 22 24 29 30 32 34 35 36]. Regarding the flow and timing domain, two studies had a high risk of bias as patients did not receive the same reference standard [17 18].

Fig. 2
Grouped bar charts show (A) risk of bias and (B) concerns for applicability of 21 included studies using QUADAS-2. QUADAS-2 = Quality Assessment of Diagnostic Accuracy Studies-2.

3.4 Diagnostic accuracy of PI-RADSv2

The sensitivity and specificity of individual studies were 73–100% and 8–100%, respectively. The Q-test revealed that substantial heterogeneity was present (p < 0.001). The Higgins I² statistics demonstrated substantial heterogeneity in terms of the sensitivity (I² = 85.55%) and considerable heterogeneity in terms of the specificity (I² = 95.30%). The coupled forest plot of the sensitivity and specificity demonstrated the absence of a threshold effect (Fig. 3). The Spearman correlation coefficient between the sensitivity and the false-positive rate was 0.45 (95% confidence interval [CI] 0.023–0.738), also indicating the lack of a threshold effect.
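As a rough illustration of how the Higgins I² statistic relates to Cochran's Q, the sketch below applies the standard formula I² = max(0, (Q − df)/Q) × 100%, with df equal to the number of studies minus one. The function name and the example Q value are hypothetical; the Q values behind the I² figures reported here are not given in the text, so the numbers in the example are for illustration only.

```python
def higgins_i_squared(q: float, n_studies: int) -> float:
    """Return Higgins I² (in percent) from Cochran's Q and the number of studies."""
    if n_studies < 2:
        raise ValueError("need at least two studies")
    df = n_studies - 1  # degrees of freedom of the Q-test
    if q <= 0:
        return 0.0
    # I² is truncated at zero when Q is smaller than its degrees of freedom.
    return max(0.0, (q - df) / q) * 100.0


# Hypothetical example: Q = 100 across 21 studies (df = 20).
print(round(higgins_i_squared(100.0, 21), 1))
```

Read this way, an I² above roughly 75% means that most of the observed between-study variation exceeds what sampling error alone would be expected to produce.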

Fig. 3
Coupled forest plots of pooled sensitivity and specificity. Numbers are pooled estimates with 95% CI in parentheses. Corresponding heterogeneity statistics are provided at bottom right corners. Horizontal lines indicate 95% CIs. CI = confidence interval.

For all 21 studies combined, the pooled sensitivity was 0.89 (95% CI 0.86–0.92) with specificity of 0.73 (95% CI 0.60–0.83; Fig. 3 ). In the HSROC curve, there was a large difference between the 95% confidence region and the 95% prediction region, thus indicating heterogeneity between the studies ( Fig. 4 ). The area under the HSROC curve was 0.91 (95% CI 0.88–0.93). According to the Deeks’ funnel plot, the likelihood of publication bias was low, with a p value of 0.75 for the slope coefficient ( Fig. 5 ).
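To make the pooled estimates easier to interpret, the following sketch converts a sensitivity and specificity pair into likelihood ratios and a diagnostic odds ratio using the standard definitions. These derived values are not reported in the article; they are computed here from the pooled point estimates only (0.89 and 0.73), ignore the uncertainty of the bivariate model, and the function name is hypothetical.

```python
def likelihood_ratios(sensitivity: float, specificity: float) -> dict:
    """Derive LR+, LR- and the diagnostic odds ratio from sensitivity and specificity."""
    if not (0.0 < sensitivity < 1.0 and 0.0 < specificity < 1.0):
        raise ValueError("sensitivity and specificity must lie strictly between 0 and 1")
    lr_pos = sensitivity / (1.0 - specificity)   # how much a positive result raises the odds
    lr_neg = (1.0 - sensitivity) / specificity   # how much a negative result lowers the odds
    return {"LR+": lr_pos, "LR-": lr_neg, "DOR": lr_pos / lr_neg}


# Pooled point estimates quoted in the text above.
pooled = likelihood_ratios(0.89, 0.73)
for name, value in pooled.items():
    print(f"{name}: {value:.2f}")
```

With these inputs the positive likelihood ratio is about 3.3 and the negative likelihood ratio about 0.15, which is in line with a test that is better at ruling out than at ruling in disease.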

Fig. 4
Hierarchical summary receiver operating characteristic curve of the diagnostic performance of PI-RADSv2 for detecting prostate cancer. HSROC = hierarchical summary receiver operating characteristic; PI-RADSv2 = Prostate Imaging Reporting and Data System version 2.

Fig. 5
Deeks’ funnel plot. A p value of 0.75 suggests that the likelihood of publication bias is low.

For the six studies that provided a head-to-head comparison between PI-RADSv1 and PI-RADSv2, PI-RADSv2 demonstrated higher pooled sensitivity of 0.95 (95% CI 0.85–0.98) compared with 0.88 (95% CI 0.80–0.93) for PI-RADSv1 ( p = 0.04). However, the pooled specificity did not show a significant difference between the two versions of PI-RADS: 0.73 (95% CI 0.47–0.89) for v2 and 0.75 (95% CI 0.36–0.94) for v1 ( p = 0.90).

3.5 Heterogeneity exploration

As we found considerable heterogeneity among the included studies, meta-regression analyses were performed (Supplementary Table 1). Among several potential variables, proportion of patients with PCa, magnetic field strength, and reference standard were significant factors affecting the heterogeneity (p < 0.01 for all). However, among these three, only specificity according to the proportion of patients with PCa showed a statistically significant and clinically meaningful difference: 0.65 (95% CI 0.52–0.78) in studies with ≤50% of patients with PCa versus 0.86 (95% CI 0.75–0.97) in studies with >50% of patients with PCa. Otherwise, no clinically meaningful differences were seen: for magnet strength (3 vs 1.5 T), sensitivity of 0.90 (95% CI 0.86–0.94) versus 0.89 (95% CI 0.81–0.97, p = 0.03) and specificity of 0.73 (95% CI 0.59–0.86) versus 0.72 (95% CI 0.44–1.00, p = 0.81); for reference standard (radical prostatectomy vs biopsy), sensitivity of 0.89 (95% CI 0.83–0.95) versus 0.91 (95% CI 0.88–0.94, p < 0.01) and specificity of 0.65 (95% CI 0.37–0.94) versus 0.73 (95% CI 0.58–0.87, p = 0.48). Other variables, including the cutoff value, use of endorectal coil, and type of analysis (per patient vs per lesion), were not significant factors (p = 0.32–0.70).

3.6 Subgroup analysis

As four studies used both ≥3 and ≥4 as cutoff values [17 22 23 29], and five determined both any cancer and clinically significant cancer as outcomes [16 21 24 29 32], multiple subgroup analyses were performed in order to assess various clinical settings (Supplementary Table 2). Regarding cutoff values, the pooled sensitivity was 0.89 (95% CI 0.84–0.92) with specificity of 0.74 (95% CI 0.58–0.85) for 17 studies using ≥4 [16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 35], whereas these were 0.95 (95% CI 0.89–0.97) and 0.47 (95% CI 0.21–0.74) in eight studies using ≥3 [17 22 23 29 32 33 34 36]. When we stratified studies according to the outcome assessed, the results were as follows: (1) cutoff of ≥4 for determining any PCa, sensitivity of 0.89 (95% CI 0.83–0.93) with specificity of 0.80 (95% CI 0.62–0.90); (2) cutoff of ≥3 for determining any PCa, sensitivity of 0.96 (95% CI 0.93–0.98) with specificity of 0.49 (95% CI 0.29–0.70); (3) for determining csPCa regardless of cutoff values, sensitivity of 0.89 (95% CI 0.84–0.92) with specificity of 0.64 (95% CI 0.46–0.78); (4) cutoff of ≥4 for determining csPCa, sensitivity of 0.90 (95% CI 0.85–0.94) with specificity of 0.62 (95% CI 0.45–0.77); and (5) cutoff of ≥3 for determining csPCa, sensitivity of 0.96 (95% CI 0.87–0.99) with specificity of 0.29 (95% CI 0.05–0.77). When the studies using a cutoff value of ≥4 were separately assessed according to the type of analysis, per-patient analysis in eight studies [17 18 20 26 27 29 31 35] yielded pooled sensitivity of 0.89 (95% CI 0.81–0.93) with specificity of 0.76 (95% CI 0.60–0.88), whereas per-lesion analysis in nine studies [16 19 21 22 23 24 25 28 30] yielded pooled sensitivity of 0.87 (95% CI 0.83–0.91) with specificity of 0.70 (95% CI 0.44–0.88).
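The trade-off between the two cutoff values can be made concrete with a small worked example. The sketch below takes the pooled point estimates quoted above for cutoffs of ≥4 (0.89, 0.74) and ≥3 (0.95, 0.47) and computes Youden's index together with predictive values from Bayes' rule. The 50% prevalence is an assumption chosen purely for illustration, the function names are hypothetical, and none of these derived figures are reported in the article.

```python
def youden_index(sens: float, spec: float) -> float:
    """Youden's J = sensitivity + specificity - 1."""
    return sens + spec - 1.0


def predictive_values(sens: float, spec: float, prevalence: float) -> tuple:
    """Return (PPV, NPV) for a given prevalence using Bayes' rule."""
    tp = sens * prevalence                    # true-positive fraction
    fp = (1.0 - spec) * (1.0 - prevalence)    # false-positive fraction
    fn = (1.0 - sens) * prevalence            # false-negative fraction
    tn = spec * (1.0 - prevalence)            # true-negative fraction
    return tp / (tp + fp), tn / (tn + fn)


# Pooled point estimates from the subgroup analysis by cutoff value.
cutoffs = {">=4": (0.89, 0.74), ">=3": (0.95, 0.47)}
ASSUMED_PREVALENCE = 0.50  # illustrative assumption, not a value from the article

for label, (sens, spec) in cutoffs.items():
    ppv, npv = predictive_values(sens, spec, ASSUMED_PREVALENCE)
    print(f"cutoff {label}: J={youden_index(sens, spec):.2f}, PPV={ppv:.2f}, NPV={npv:.2f}")
```

Under this assumed prevalence, lowering the cutoff to ≥3 raises the negative predictive value slightly while lowering the positive predictive value noticeably, which mirrors the sensitivity and specificity pattern described in the text.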

Based on the localization of PCa, the pooled sensitivity was 0.93 (95% CI 0.87–0.96) with specificity of 0.68 (95% CI 0.43–0.86) in seven studies analyzing PZ cancers [20 24 25 28 30 31 33]. For TZ cancers, which were analyzed in the same studies except for that by Stanzione et al [31], the pooled sensitivity and specificity were 0.88 (95% CI 0.77–0.94) and 0.75 (95% CI 0.59–0.86), respectively.

Studies including patients without previous biopsies yielded sensitivity of 0.82 (95% CI 0.72–0.90) and specificity of 0.75 (95% CI 0.65–0.83), whereas the diagnostic performance values were 0.87 (95% CI 0.80–0.92) and 0.71 (95% CI 0.42–0.89) in studies including patients with a history of previous biopsy.

3.7 Discussion

In our meta-analysis, we assessed the diagnostic accuracy of PI-RADSv2 for detecting PCa. The pooled sensitivity and specificity of all 21 studies were 0.89 (95% CI 0.86–0.92) and 0.73 (95% CI 0.60–0.83), respectively. When comparing our data with the only two existing meta-analyses using mpMRI for detecting PCa, a trend toward higher sensitivity and lower specificity can be inferred. In the study by de Rooij et al [37], which evaluated seven studies using a combination of T2WI, DWI, and DCE-MRI, the pooled sensitivity and specificity were 0.74 (95% CI 0.66–0.81) and 0.88 (95% CI 0.82–0.92), respectively. In a more recent meta-analysis by Hamoen et al [6], which analyzed 14 studies using PI-RADSv1, the pooled sensitivity and specificity were 0.78 (95% CI 0.70–0.84) and 0.79 (95% CI 0.68–0.86), respectively. However, setting the three meta-analyses side by side provides only an indirect comparison. In order to address this issue, we separately assessed a subgroup of studies using both PI-RADSv1 and PI-RADSv2. In a head-to-head comparison between them, PI-RADSv2 demonstrated higher pooled sensitivity (0.95) compared with PI-RADSv1 (0.88, p = 0.04) without a statistically significant difference in specificity (0.73 vs 0.75, p = 0.90). This increase in sensitivity compared with its predecessor may imply that the revisions undertaken during the development of PI-RADSv2, including the introduction of dominant sequences according to zonal anatomy, limited contribution of DCE-MRI secondary to DWI and T2WI, and specific guidelines for deriving an integrated overall score, were, in fact, on the right track. In particular, we speculate that the use of dominant sequences, that is, DWI for the PZ and T2WI for the TZ, may have been crucial for the improved sensitivity without a loss in specificity, as suggested by Baur et al [10].

Considering that one of the main intentions for the generation of PI-RADS was to standardize reporting of mpMRI in order to decrease variability and bring about widespread acceptance and implementation in daily practice, it was promising to find that nearly all (20 of 21) studies used PI-RADSv2 strictly according to published guidelines [11]. Only one study formed PI-RADSv2 scores from existing clinical radiological reports that were based on PI-RADSv1 or an in-house scoring system [29]. This is an improvement when compared with prior studies conducted using PI-RADSv1, where investigators used varying methods in determining the overall score (overall five-point score or sum of the scores from each modality) [6]. Still, there is a need for further clarification regarding the cutoff value for detecting PCa. In the studies included in our meta-analysis, cutoff values were predefined in the majority (15/21), whereas six studies were exploratory in nature, testing multiple criteria. When using a cutoff value of ≥4, sensitivity (0.89) and specificity (0.74) were generally good, whereas using ≥3 yielded excellent sensitivity (0.95) and poor specificity (0.47). These results may be taken into consideration when generating the next updated PI-RADS. For instance, using the former may be adequate for general use of PI-RADS, whereas the latter could be proposed when a higher cancer detection rate is clinically required (eg, persistently high PSA level despite a previously negative biopsy).

In the current study, subgroup analyses were performed to account for differences in outcomes (any cancer vs clinically significant cancer). There was no significant difference for using either outcome irrespective of whether the criteria of ≥3 or ≥4 were used. However, the definition of clinically significant cancer differed among the 13 studies. Only three studies defined csPCa strictly according to the PI-RADSv2 guidelines (Gleason score ≥7 [3 + 4], volume >0.5 ml, or extraprostatic extension) [11]. Most others used one or two of the three criteria. Including only the former three studies may have provided more robust results; yet including all available studies was not only pragmatic but also presents a general overview of the existing literature, as this is the first meta-analysis of studies currently dealing with PI-RADSv2.

In this meta-analysis, we looked into the technical aspects of MRI. Meta-regression analyses revealed that the use of endorectal coil was not a statistically significant factor. Furthermore, although magnet strength showed a statistically significant difference between 3 and 1.5 T, this difference was not clinically meaningful (sensitivity of 0.90 vs 0.89, p = 0.03, respectively). Although there had been debate over these two issues in the past, both 3 and 1.5 T are now well established, and the overall benefit of using an endorectal coil is not evident [38 39]. The PI-RADSv2 guidelines currently recommend either usage, and the results of our study provide additional evidence to support this.

Regarding the methods of analysis in the studies, there was significant heterogeneity regarding reference standard and type of analysis. Radical prostatectomy was the reference standard in five studies, while the majority were based on a combination of systematic and targeted biopsies. The possibility of PCa despite negative biopsy results in the latter group should be kept in mind. In addition, approximately half of the studies each reported outcomes in a per-patient (n = 11) and per-lesion (n = 10) manner. Per-lesion analysis is known to take into account the performance of localizing the disease; however, this was not shown to be a significant factor in the meta-regression analysis.

Our meta-analysis had some limitations. Nearly all studies were retrospective in design, resulting in a high risk of bias for patient selection. It is possible that pooling data from predominantly retrospective studies may have led to increased diagnostic sensitivity [40]. However, not only was performing a meta-analysis using only three prospective studies technically unfeasible, but the derived results would also not be representative of the existing literature on PI-RADSv2. Nevertheless, we used validated methods for the systematic review and reported the data using standard reporting guidelines, including PRISMA and the guidelines of the Handbook for Diagnostic Test Accuracy Reviews published by the Cochrane Collaboration [12 41]. Another limitation is the considerable heterogeneity in our pooled analysis, which affected the general applicability of our summary estimates. To explore the heterogeneity of our data, we performed meta-regression and multiple subgroup analyses. According to the analyses, the proportion of patients with PCa, the magnetic field strength, and the reference standard were significant factors affecting the heterogeneity. In particular, the reference standard included various methods, including radical prostatectomy and a combination of systematic and targeted biopsies (ie, MRI guided, MRI–transrectal ultrasound fusion, or cognitive). Furthermore, the fact that various definitions were used for clinically significant cancer needs to be emphasized. Our meta-regression and subgroup analyses may explain some of the heterogeneity, but a portion remains unexplained. Another important limitation is the small number of included studies for head-to-head comparison between PI-RADSv1 and PI-RADSv2. However, we were able to demonstrate a statistically significant difference in sensitivity between the two versions using only the six studies published to date.

PI-RADSv2 shows good performance for the detection of PCa with pooled sensitivity of 0.89 and specificity of 0.73. PI-RADSv2 has higher pooled sensitivity compared with PI-RADSv1 without significantly different specificity.

Author contributions: Sang Youn Kim had full access to all the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis.

Study concept and design: Woo, Suh, S.Y. Kim.

Acquisition of data: Woo, Suh, S.Y. Kim.

Analysis and interpretation of data: Woo, Suh, S.Y. Kim.

Drafting of the manuscript: Woo.

Critical revision of the manuscript for important intellectual content: Suh, S.Y. Kim, Cho, S.H. Kim.

Statistical analysis: Suh.

Obtaining funding: None.

Administrative, technical, or material support: None.

Supervision: S.Y. Kim, Cho, S.H. Kim.

Other: None.

Financial disclosures: Sang Youn Kim certifies that all conflicts of interest, including specific financial interests and relationships and affiliations relevant to the subject matter or materials discussed in the manuscript (eg, employment/affiliation, grants or funding, consultancies, honoraria, stock ownership or options, expert testimony, royalties, or patents filed, received, or pending), are the following: None.

Funding/Support and role of the sponsor: None.

  • [1] American Cancer Society. Key statistics for prostate cancer. http://www.cancer.org/cancer/prostatecancer/detailedguide/prostate-cancer-key-statistics
  • [2] C.M. Hoeks, J.O. Barentsz, T. Hambrock, et al. Prostate cancer: multiparametric MR imaging for detection, localization, and staging. Radiology. 2011;261:46-66
  • [3] O. Ruprecht, P. Weisser, B. Bodelle, H. Ackermann, T.J. Vogl. MRI of the prostate: interobserver agreement compared with histopathologic outcome after radical prostatectomy. Eur J Radiol. 2012;81:456-460
  • [4] L. Dickinson, H.U. Ahmed, C. Allen, et al. Scoring systems used for the interpretation and reporting of multiparametric MRI for prostate cancer detection, localization, and characterization: could standardization lead to improved utilization of imaging within the diagnostic pathway? J Magn Reson Imaging. 2013;37:48-58
  • [5] J.O. Barentsz, J. Richenberg, R. Clements, et al. ESUR prostate MR guidelines 2012. Eur Radiol. 2012;22:746-757
  • [6] E.H. Hamoen, M. de Rooij, J.A. Witjes, J.O. Barentsz, M.M. Rovers. Use of the Prostate Imaging Reporting and Data System (PI-RADS) for prostate cancer detection with multiparametric magnetic resonance imaging: a diagnostic meta-analysis. Eur Urol. 2015;67:1112-1121
  • [7] M. Abd-Alazeez, H.U. Ahmed, M. Arya, et al. The accuracy of multiparametric MRI in men with negative biopsy and elevated PSA level - can it rule out clinically significant prostate cancer? Urol Oncol. 2014;32:45.e17-22
  • [8] A.B. Rosenkrantz, S. Kim, R.P. Lim, et al. Prostate cancer localization using multiparametric MR imaging: comparison of Prostate Imaging Reporting and Data System (PI-RADS) and Likert scales. Radiology. 2013;269:482-492
  • [9] B.G. Hansford, Y. Peng, Y. Jiang, et al. Dynamic contrast-enhanced MR imaging curve-type analysis: is it helpful in the differentiation of prostate cancer from healthy peripheral zone? Radiology. 2015;275:448-457
  • [10] A.D. Baur, A. Maxeiner, T. Franiel, et al. Evaluation of the Prostate Imaging Reporting and Data System for the detection of prostate cancer by the results of targeted biopsy of the prostate. Invest Radiol. 2014;49:411-420
  • [11] J.C. Weinreb, J.O. Barentsz, P.L. Choyke, et al. PI-RADS Prostate Imaging-Reporting and Data System: 2015, version 2. Eur Urol. 2016;69:16-40
  • [12] A. Liberati, D.G. Altman, J. Tetzlaff, et al. The PRISMA statement for reporting systematic reviews and meta-analyses of studies that evaluate health care interventions: explanation and elaboration. J Clin Epidemiol. 2009;62:e1-e34
  • [13] P. Whiting, A.W. Rutjes, J.B. Reitsma, P.M. Bossuyt, J. Kleijnen. The development of QUADAS: a tool for the quality assessment of studies of diagnostic accuracy included in systematic reviews. BMC Med Res Methodol. 2003;3:25
  • [14] C.H. Suh, S.H. Park. Successful publication of systematic review and meta-analysis of studies evaluating diagnostic test accuracy. Korean J Radiol. 2016;17:5-6
  • [15] J.J. Deeks, P. Macaskill, L. Irwig. The performance of tests of publication bias and other sample size effects in systematic reviews of diagnostic test accuracy was assessed. J Clin Epidemiol. 2005;58:882-893
  • [16] T. Auer, M. Edlinger, J. Bektic, et al. Performance of PI-RADS version 1 versus version 2 regarding the relation with histopathological results. World J Urol. 2017;35:687-693
  • [17] M. Baldisserotto, E.J. Neto, G. Carvalhal, et al. Validation of PI-RADS v.2 for prostate cancer diagnosis with MRI at 3T using an external phased-array coil. J Magn Reson Imaging. 2016;44:1354-1359
  • [18] P. De Visschere, N. Lumen, P. Ost, K. Decaestecker, E. Pattyn, G. Villeirs. Dynamic contrast-enhanced imaging has limited added value over T2-weighted imaging and diffusion-weighted imaging when using PI-RADSv2 for diagnosis of clinically significant prostate cancer in patients with elevated PSA. Clin Radiol. 2017;72:23-32
  • [19] El-Samei HAE-KA, M.F. Amin, E.E. Hassan. Assessment of the accuracy of multi-parametric MRI with PI-RADS 2.0 scoring system in the discrimination of suspicious prostatic focal lesions. Egypt J Radiol Nucl Med. 2016;47:1075-1082
  • [20] Z.Y. Feng, L. Wang, X.D. Min, S.G. Wang, G.P. Wang, J. Cai. Prostate cancer detection with multiparametric magnetic resonance imaging: Prostate Imaging Reporting and Data System version 1 versus version 2. Chin Med J. 2016;129:2451-2459
  • [21] M. Kasel-Seibert, T. Lehmann, R. Aschenbach, et al. Assessment of PI-RADS v2 for the detection of prostate cancer. Eur J Radiol. 2016;85:726-731
  • [22] W.C. Lin, V.F. Muglia, G.E. Silva, S. Chodraui Filho, R.B. Reis, A.C. Westphalen. Multiparametric MRI of the prostate: diagnostic performance and interreader agreement of two scoring systems. Br J Radiol. 2016;89:20151056
  • [23] E. Martorana, G.M. Pirola, M. Scialpi, et al. Lesion volume predicts prostate cancer risk and aggressiveness: validation of its value alone and matched with Prostate Imaging Reporting and Data System score. BJU Int. In press. doi:10.1111/bju.13649
  • [24] F.V. Mertan, M.D. Greer, J.H. Shih, et al. Prospective evaluation of the Prostate Imaging Reporting and Data System version 2 for prostate cancer detection. J Urol. 2016;196:690-696
  • [25] B.G. Muller, J.H. Shih, S. Sankineni, et al. Prostate cancer: interobserver agreement and accuracy with the revised Prostate Imaging Reporting and Data System at multiparametric MR imaging. Radiology. 2015;277:741-750
  • [26] J.J. Park, B.K. Park. Role of PI-RADSv2 with multiparametric MRI in determining who needs active surveillance or definitive treatment according to PRIAS. J Magn Reson Imaging. 2017;45:1753-1759
  • [27] S.Y. Park, D.C. Jung, Y.T. Oh, et al. Prostate cancer: PI-RADS version 2 helps preoperatively predict clinically significant cancers. Radiology. 2016;280:108-116
  • [28] S. Polanec, T.H. Helbich, H. Bickel, et al. Head-to-head comparison of PI-RADS v2 and PI-RADS v1. Eur J Radiol. 2016;85:1125-1131
  • [29] A.R. Rastinehad, N. Waingankar, B. Turkbey, et al. Comparison of multiparametric MRI scoring systems and the impact on cancer detection in patients undergoing MR US fusion guided prostate biopsies. PLoS One. 2015;10:e0143404
  • [30] A.B. Rosenkrantz, J.S. Babb, S.S. Taneja, J.M. Ream. Proposed adjustments to PI-RADS version 2 decision rules: impact on prostate cancer detection. Radiology. In press. doi:10.1148/radiol.2016161124
  • [31] A. Stanzione, M. Imbriaco, S. Cocozza, et al. Biparametric 3T magnetic resonance imaging for prostatic cancer detection in a biopsy-naive patient population: a further improvement of PI-RADS v2? Eur J Radiol. 2016;85:2269-2274
  • [32] N. Tan, W.C. Lin, P. Khoshnoodi, et al. In-Bore 3-T MR-guided transrectal targeted prostate biopsy: Prostate Imaging Reporting and Data System version 2-based diagnostic performance for detection of prostate cancer. Radiology. In press. doi:10.1148/radiol.2016152827
  • [33] S. Tewes, N. Mokov, D. Hartung, et al. Standardized reporting of prostate MRI: comparison of the Prostate Imaging Reporting and Data System (PI-RADS) version 1 and version 2. PLoS One. 2016;11:e0162879
  • [34] S. Washino, T. Okochi, K. Saito, et al. Combination of PI-RADS score and PSA density predicts biopsy outcome in biopsy naive patients. BJU Int. 2017;119:225-233
  • [35] S. Woo, S.Y. Kim, J. Lee, S.H. Kim, J.Y. Cho. PI-RADS version 2 for prediction of pathological downgrading after radical prostatectomy: a preliminary study in patients with biopsy-proven Gleason Score 7 (3 + 4) prostate cancer. Eur Radiol. 2016;26:3580-3587
  • [36] C. Zhao, G. Gao, D. Fang, et al. The efficiency of multiparametric magnetic resonance imaging (mpMRI) using PI-RADS Version 2 in the diagnosis of clinically significant prostate cancer. Clin Imaging. 2016;40:885-888
  • [37] M. de Rooij, E.H. Hamoen, J.J. Futterer, J.O. Barentsz, M.M. Rovers. Accuracy of multiparametric MRI for prostate cancer detection: a meta-analysis. Am J Roentgenol. 2014;202:343-351
  • [38] Z.K. Shah ,S.N. Elias ,R. Abaza ,et al. Performance comparison of 1.5-T endorectal coil MRI with 3.0-T nonendorectal coil MRI in patients with prostate cancer. Acad Radiol. 2015;22 :467-474 Crossref
  • [39] Y. Mazaheri ,H.A. Vargas ,G. Nyman ,O. Akin ,H. Hricak. Image artifacts on prostate diffusion-weighted magnetic resonance imaging: trade-offs at 1.5 Tesla and 3.0 Tesla. Acad Radiol. 2013;20 :1041-1047 Crossref
  • [40] J.J. Deeks. Systematic reviews in health care: systematic reviews of evaluations of diagnostic and screening tests. BMJ. 2001;323 :157-162 Crossref
  • [41] J.J.W.S. Deeks ,C. Davenport. Guide to the contents of a Cochrane diagnostic test accuracy protocol. J.J. Deeks, P.M. Bossuyt, C. Gatsonis. Cochrane handbook for systematic reviews of diagnostic test accuracy version 1.0.0. Chapter 4. (The Cochrane Collaboration, 2013) http://srdta.cochrane.org/

Prostate cancer (PCa) is the second leading cause of cancer-related mortality in Western men [1]. With recent technological advancements and growing availability, multiparametric magnetic resonance imaging (mpMRI) is increasingly being used to guide several aspects of PCa management, including detection, staging, and treatment planning [2]. Despite abundant evidence in the literature reporting high accuracy of mpMRI for PCa diagnosis, widespread acceptance has been hampered by several factors, including difficulty of interpretation, lack of standardized criteria for interpretation (ie, use of Likert scales based on the radiologist's subjective level of suspicion for PCa), and the resulting substantial inter-reader variability [3 4].

To bring standardization to the evaluation and reporting of mpMRI of the prostate, the European Society of Urogenital Radiology (ESUR) published a guideline termed Prostate Imaging Reporting and Data System (PI-RADS) in 2012 [5] . PI-RADS was generated based on expert consensus and provides a detailed scoring system for each MRI sequence (T2-weighted imaging [T2WI], diffusion-weighted imaging [DWI], dynamic contrast-enhanced MRI [DCE-MRI], and MR spectroscopy) for the presence of clinically significant PCa (csPCa). Several investigators have validated the accuracy and reproducibility of PI-RADS, and a recent meta-analysis reported pooled sensitivity and specificity of 0.78 and 0.79, respectively [6] . However, as there was no guideline for the generation of an overall score, different research groups utilized various measures for this purpose—some used a sum of the scores from each sequence (ranging from 3 to 15), whereas others used an overall score of 1–5 [7 8] . Furthermore, emerging data questioned the value of curve-type analysis of DCE-MRI [9] . In addition, investigators suggested that some sequences may be more important in determining the likelihood of PCa (ie, DWI in the peripheral zone [PZ] and T2WI in the transition zone [TZ]) rather than equal weighting for all sequences [10] .

To address these issues, the ESUR and American College of Radiology recently released the updated PI-RADS version 2 (PI-RADSv2) [11] . The main changes from PI-RADSv1 to PI-RADSv2 are the following: (1) introduction of dominant sequences according to zonal anatomy (DWI for the PZ and T2WI for the TZ), (2) limited contribution of DCE-MRI data as merely presence and absence of early focal enhancement, and (3) generation of an overall score (1–5) integrating findings across all MRI sequences.

Since then, several studies assessing the value of PI-RADSv2 have been published. However, the diagnostic performance of this new scoring system has not been evaluated systematically. Therefore, the purpose of our study was to assess the diagnostic performance of PI-RADSv2 for the detection of PCa. In addition, we aimed to compare the diagnostic performance of PI-RADSv1 and PI-RADSv2 in studies available for head-to-head comparison.

This meta-analysis was performed and written according to the Preferred Reporting Items for Systematic Reviews and Meta-analyses (PRISMA) guidelines [12].

2.1 Literature search

A computerized search of MEDLINE and EMBASE up to December 7, 2016, was performed in order to identify studies evaluating the diagnostic performance of PI-RADSv2 for the detection of PCa. The search query combined synonyms for PCa, MRI, and PI-RADS as follows: (prostate cancer OR prostatic cancer OR prostate neoplasm OR prostatic neoplasm OR prostate tumor OR prostatic tumor OR prostate carcinoma OR prostatic carcinoma OR PCa) AND (magnetic resonance imaging OR MRI OR MR) AND (prostate imaging reporting and data system OR pi-rads OR pi rads OR pirads). Bibliographies of identified articles were also screened to expand the scope of search. Our search was limited to publications in English.

2.2 Study selection

2.2.1 Inclusion criteria

Studies were included if they satisfied all the following requirements according to the PICOS criteria [12] : (1) included patients with suspected or diagnosed PCa; (2) for index test, mpMRI of the prostate including all required sequences of T2WI, DWI, and DCE-MRI was performed and assessed with a PI-RADSv2 scoring system; (3) for comparison, a reference standard based on the histopathological examination of radical prostatectomy or biopsy was used; (4) results were reported in sufficient detail for the reconstruction of 2 × 2 tables and determination of sensitivity and specificity at specified cutoff values for evaluating the diagnostic performance of PI-RADSv2; and (5) studies had to be original articles.
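Criterion (4) can be made concrete with a short sketch of how sensitivity and specificity follow from a reconstructed 2 × 2 table. The class name `TwoByTwo` and the counts below are hypothetical and serve only as illustration; they are not taken from any included study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Counts of a 2 x 2 table: index test (PI-RADSv2 at a cutoff) vs reference standard."""
    tp: int  # index test positive, cancer on histopathology
    fp: int  # index test positive, no cancer on histopathology
    fn: int  # index test negative, cancer on histopathology
    tn: int  # index test negative, no cancer on histopathology

    def sensitivity(self) -> float:
        # Proportion of reference-positive cases flagged by the index test.
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        # Proportion of reference-negative cases correctly not flagged.
        return self.tn / (self.tn + self.fp)


# Hypothetical counts, chosen only to make the arithmetic easy to follow.
example = TwoByTwo(tp=45, fp=10, fn=5, tn=40)
print(round(example.sensitivity(), 2))  # 0.9
print(round(example.specificity(), 2))  # 0.8
```

A study that reports only a summary accuracy figure, without enough detail to recover all four counts at a stated cutoff, would not satisfy this criterion.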

2.2.2 Exclusion criteria

Studies were excluded if any of the following criteria were met: (1) studies involving <10 patients; (2) review articles, guidelines, consensus statements, letters, editorials, and conference abstracts; (3) studies using only PI-RADSv1 for the evaluation of mpMRI of the prostate; (4) studies focusing on topics other than using the PI-RADSv2 system for diagnosing PCa (ie, staging and prediction of biochemical recurrence); and (5) studies with overlapping patient population.

Two reviewers (S.W. and C.H.S., with 3 yr of experience in performing systematic reviews and meta-analyses) independently evaluated the eligibility of the selected studies from the literature. Disagreements, if present between the two reviewers, were resolved by consensus via discussion with a third reviewer (S.Y.K.).

2.3 Data extraction and quality assessment

We extracted the following data regarding study design and results from the selected studies using a standardized form:

  1. Study characteristics: authors, year of publication, country and institution of origin, duration of patient recruitment, and study design (prospective vs retrospective and consecutive or not)
  2. Demographic and clinical characteristics: sample size, number of patients with PCa, patient age, prostate-specific antigen (PSA) level and Gleason score, number of previous biopsies, and PCa diagnosis prior to mpMRI
  3. Technical characteristics of mpMRI: scanner model and manufacturer, magnetic field strength (1.5 vs 3 T), coil type (endorectal vs pelvic phased array), and specific sequences used (T2WI, DWI, DCE, or MR spectroscopy)
  4. Interpretation of mpMRI: number of reviewers and experience in prostate mpMRI, independent or consensus reading, and blinding to clinicopathological information
  5. Reference standard: type of reference standard (radical prostatectomy, targeted biopsy, or systematic biopsy), interval between MRI and pathology, outcomes assessed (any PCa vs csPCa), definition of csPCa (studies assessing “clinically significant”, “aggressive”, or “high-grade” PCa were all considered to assess csPCa; however, only studies that used the definition provided by the PI-RADSv2 guideline [Gleason score ≥7 (including 3 + 4), volume ≥0.5 ml, or extraprostatic extension] were considered not to have concern for applicability), separate analysis for the PZ and TZ, and type of analysis (per patient vs per lesion)
  6. Diagnostic performance of PI-RADSv2, including criteria or cutoff values (in case of multiple readers, the results of the most experienced reader were extracted for this meta-analysis)

The methodological quality of the included studies was assessed using tailored questionnaires and criteria provided by Quality Assessment of Diagnostic Accuracy Studies-2 [13] . Data extraction and quality assessment were performed independently by two reviewers (S.W. and C.H.S.). All disagreements were resolved by consensus through discussion with the third reviewer (S.Y.K.).

2.4 Data synthesis and analysis

The diagnostic performance of PI-RADSv2 for the detection of PCa was the primary outcome for this meta-analysis. In addition, a comparison between the diagnostic performance of PI-RADSv2 and that of PI-RADSv1 using studies that reported head-to-head comparison data of the two PI-RADS versions was considered a secondary outcome.

Pooled estimates of sensitivity and specificity were calculated using hierarchical logistic regression modeling including bivariate modeling and hierarchical summary receiver operating characteristic (HSROC) modeling [14] . For graphical presentation of the results, an HSROC curve with 95% confidence region and prediction region was plotted. Publication bias was evaluated using the Deeks’ funnel plot, and statistical significance was tested with the Deeks’ asymmetry test [15] .
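For readers unfamiliar with the bivariate approach, its usual textbook form can be written as below. This is a generic sketch of the model class and not necessarily the exact specification fitted here; the symbols $n_{1i}$ (number of reference-positive cases in study $i$), $n_{0i}$ (number of reference-negative cases), $\mu$, and $\sigma$ are notation introduced for illustration.

```latex
\begin{align}
\mathrm{TP}_i &\sim \mathrm{Binomial}(n_{1i}, \mathrm{Se}_i), \qquad
\mathrm{TN}_i \sim \mathrm{Binomial}(n_{0i}, \mathrm{Sp}_i), \\
\begin{pmatrix} \operatorname{logit}\mathrm{Se}_i \\ \operatorname{logit}\mathrm{Sp}_i \end{pmatrix}
&\sim \mathcal{N}\!\left(
\begin{pmatrix} \mu_{\mathrm{Se}} \\ \mu_{\mathrm{Sp}} \end{pmatrix},
\begin{pmatrix}
\sigma_{\mathrm{Se}}^{2} & \sigma_{\mathrm{Se,Sp}} \\
\sigma_{\mathrm{Se,Sp}} & \sigma_{\mathrm{Sp}}^{2}
\end{pmatrix}
\right)
\end{align}
```

Under this formulation, the pooled sensitivity and specificity are the back-transformed means, $\operatorname{logit}^{-1}(\mu_{\mathrm{Se}})$ and $\operatorname{logit}^{-1}(\mu_{\mathrm{Sp}})$, and the covariance term allows the two measures to be correlated across studies.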

We performed meta-regression analyses to investigate the cause of heterogeneity. The following covariates were considered for the bivariate model: (1) proportion of patients with PCa (>50% vs ≤50%), (2) magnet strength of MRI (3 vs 1.5 T), (3) use of endorectal coil, (4) cutoff value (≥4 vs ≥3), (5) reference standard (radical prostatectomy vs biopsy), and (6) type of analysis (per patient vs per lesion). In addition, multiple subgroup analyses were performed for cutoff value, outcome, and previous biopsy history to assess various clinical settings: (1) a cutoff value of ≥4 for all studies, (2) a cutoff value of ≥3 for all studies, (3) a cutoff value of ≥4 for determining any PCa, (4) a cutoff value of ≥3 for determining any PCa, (5) a cutoff value of ≥4 for determining csPCa, (6) a cutoff value of ≥3 for determining csPCa, (7) a cutoff value of ≥4 in studies using per-patient analysis, (8) a cutoff value of ≥4 in studies using per-lesion analysis, (9) studies analyzing PZ PCa, (10) studies analyzing TZ PCa, (11) patients without previous biopsies, and (12) patients with previous biopsies. The “metandi” and “midas” modules in Stata 10.0 (StataCorp LP, College Station, TX, USA) and “mada” package in R software version 3.2.1 (R Foundation for Statistical Computing, Vienna, Austria) were used for statistical analyses, with p < 0.05 signifying statistical significance.

3.1 Literature search

A systematic literature search initially identified 287 articles. After removing 46 duplicates, screening of the 241 titles and abstracts yielded 105 potentially eligible articles. Full-text reviews were performed, and 84 studies were excluded for the following reasons: not in the field of interest (n = 80, including 68 studies that used only PI-RADSv1), insufficient data to reconstruct 2 × 2 tables (n = 2), and shared study population with other studies (n = 2). Ultimately, 21 original articles including a total of 3857 patients assessing the diagnostic performance of PI-RADSv2 were included in the meta-analysis [16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36]. No additional studies were identified via screening the bibliographies of these 21 studies. Among them, 15 studies including 3099 patients dealt with PI-RADSv2 alone, whereas six studies including 758 patients provided a head-to-head comparison between PI-RADSv1 and PI-RADSv2 [16 20 21 28 32 33]. The detailed study selection process is described in Fig. 1.
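The selection counts above can be checked with simple arithmetic. The sketch below uses only the numbers stated in this paragraph; the variable names are illustrative.

```python
# Study selection flow, using the counts reported in the text.
identified = 287
duplicates = 46
screened = identified - duplicates  # titles and abstracts screened

potentially_eligible = 105
excluded_after_full_text = {
    "not in the field of interest": 80,
    "insufficient data for 2 x 2 tables": 2,
    "shared study population": 2,
}
included = potentially_eligible - sum(excluded_after_full_text.values())

# Patient totals for the two groups of included studies.
patients_v2_only = 3099       # 15 studies assessing PI-RADSv2 alone
patients_head_to_head = 758   # 6 studies comparing PI-RADSv1 and PI-RADSv2
total_patients = patients_v2_only + patients_head_to_head

print(screened, included, total_patients)  # 241 21 3857
```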

Fig. 1
PRISMA flow diagram showing study selection process for meta-analysis. a Included original articles for qualitative and quantitative analyses are references [16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36]. MRI = magnetic resonance imaging; PI-RADS = Prostate Imaging Reporting and Data System; PRISMA = Preferred Reporting Items for Systematic Reviews and Meta-analyses.

3.2 Characteristics of included studies

Patient characteristics are shown in Table 1 . The size of the study population ranged from 49 to 456 patients, with the percentage of those with PCa ranging from 37% to 100%. The patients had a median age of 62–69.6 yr, median PSA of 3.97–15 ng/ml, and a Gleason score ranging from 5 to 10. Patients had already been diagnosed with PCa prior to MRI in all or some of the study populations in seven studies [16 17 22 26 27 32 35] . Biopsy was performed before MRI in seven studies [16 21 22 23 26 27 35] , all patients were biopsy-naïve in three studies [18 25 34] , both patient types were included in three studies [28 32 33] , and data regarding previous biopsy were not reported in the remaining eight studies.

First author (year) | Country | Institution | Duration of patient recruitment | Patients (n) | Patients with PCa (n) | Age (yr), median | Age (yr), range | PSA (ng/ml), median | PSA (ng/ml), range | Gleason score, median | Gleason score, range | No. of previous biopsies | PCa diagnosis before MRI
Auer (2016) [16] | Austria | Medical University of Innsbruck, Medizinische Hochschule Hannover | NR | 50 | NR | 63 a | NR | 7.3 a | NR | 7 (3 + 4) | 6–10 | ≥1 | Yes (all)
Baldisserotto (2016) [17] | Brazil | Pontifícia Universidade Católica do Rio Grande do Sul | 2013.6–2015.6 | 54 | 33 | 65.9 a | 53–81 | 8.4 a | 3–31 | 7 (3 + 4) | 6–9 | NR | Yes (some)
De Visschere (2017) [18] | Belgium | Ghent University Hospital | 2011.5–2014.12 | 245 | 144 | 66 | 44–85 | 9 | 1.4–935.5 | 7 (3 + 4) | ≥6 | 0 | No
El-Samei (2016) [19] | Egypt | Al-Azher University, El-Minia University | 2014.5–2015.10 | 55 | 38 | 62 a | 51–79 | NR | NR | 7 | 6–10 | NR | No
Feng (2016) [20] | China | Tongji Hospital | 2013.6–2015.7 | 401 | 150 | 64.4 a | 34–88 | 10.7 | 0.2–1763 | 7 (4 + 3) | ≤6–≥8 | NR | NR
Kasel-Seibert (2016) [21] | Germany | University Hospital Jena | 2013.7–2015.3 | 82 | 31 | 65 | 48–81 | 13 | 1–111 | 7 (3 + 4) | 6–9 | 1–5 | No
Lin (2016) [22] | Brazil, Taiwan | Ribeirão Preto School of Medicine, China Medical University, Ribeirão Preto School of Medicine | 2011.5–2014.6 | 49 | 49 | 63 | 46–73 | 13.27 | 1.8–41.4 |  |  | ≥1 | Yes (all)
Martorana (2016) [23] | Italy | University of Modena and Reggio Emilia, Perugia University | 2014.1–2016.2 | 157 | 79 | 65 | 47–79 b | 10.7 | 1–75 | 6 | 6–8 | 1–3 | No
Mertan (2016) [24] | USA | National Cancer Institute | 2015.3–2015.9 | 62 | 38 | 65.5 | 50.3–76.6 | 7.1 | 0.5–863 | 7 (3 + 4) | 6–9 | NR | NR
Muller (2015) [25] | USA, the Netherlands | National Cancer Institute, AMC University Hospital, Edward Hébert School of Medicine | 2011.12–2014.5 | 94 | 94 | 62 | 37–79 | 8.51 | 0.7–51.1 | 7 (3 + 4) | 6–10 | 0 | No
Park (2016) [26] | South Korea | Samsung Medical Center | 2012.1–2014.12 | 456 | 456 | 65 | 46–81 | 3.97 | 3.8–4.2 | NR | ≥6 | ≥1 | Yes (all)
Park (2016) [27] | South Korea | Yonsei University College of Medicine | 2012.1–2013.3 | 425 | 425 | NR | NR | NR | 1.4–156.9 | 5–6 | 5–10 | ≥1 | Yes (all)
Polanec (2016) [28] | Austria | Medical University of Vienna, Confraternität Vienna | 2011.6–2015.9 | 65 | NR | 65.3 a | 62.3–87.4 | 10.8 a | 4.2–74.5 | 7 (4 + 3) | 6–9 | ≥0 | No
Rastinehad (2015) [29] | USA | Icahn School of Medicine at Mount Sinai, Fox Chase Cancer Center, National Institutes of Health, Hofstra North Shore LIJ School of Medicine | 2012.2–2014.11 | 312 | 202 | 65.1 | 60.3–70.3 b | 7.3 | 5.0–11.4 b | NR | NR | NR | No
Rosenkrantz (2016) [30] | USA | NYU Langone Medical Center | 2013.9–2015.2 | 343 | 134 | 64 | NR | 5.8 | NR | 7 (3 + 4) | 6–9 | NR | NR
Stanzione (2016) [31] | Italy | University “Federico II”, Ospedale S. Maria delle Grazie | NR | 82 | 34 | 65 a | 43–84 | 8.8 a | NR | 7 (3 + 4) | 6–9 | NR | No
Tan (2016) [32] | USA, Taiwan | UCLA, Cathay General Hospital, Hoag Hospital | 2013.3–2016.12 | 106 | 63 | 66.5 | 43–79 | 7.9 | 5.6–10.6 | 7 (3 + 4) | 6–10 | ≥0 | Yes (some)
Tewes (2016) [33] | Germany | Hannover Medical School, Klinikum der Region Hannover | 2012.12–2014.12 | 54 | 31 | 69.6 a | NR | 8.7 a | NR | 6 | 6–9 | ≥0 | No
Washino (2017) [34] | Japan | Jichi Medical University Saitama Medical Center | 2010.6–2014.4 | 288 | 159 | 69 | 64–74 b | 7.5 | 5.5–11 b | NR | NR | 0 | No
Woo (2016) [35] | South Korea | Seoul National University College of Medicine | 2011.1–2013.12 | 105 | 105 | 69 | 49–79 | 8.22 | 0.9–44.2 | 7 | 6–7 | ≥1 | Yes (all)
Zhao (2016) [36] | China | Peking University First Hospital | 2010.11–2013.12 | 372 | 185 | 68.5 a | NR | 15 a | NR | 7 | 6–10 | NR | No

a Mean.

b Interquartile range.

MRI = magnetic resonance imaging; NR = not reported; PCa = prostate cancer; PSA = prostate-specific antigen.

Table 1. Patient characteristics

Characteristics of the studies are summarized in Table 2. MRI was performed using 3-T scanners in 16 studies [16 17 18 20 24 25 26 27 28 29 30 31 32 33 35 36], 1.5-T scanners in four studies [19 21 22 23], and either 3 or 1.5 T in one study [34]. Endorectal coils were used in seven studies [19 21 22 23 24 25 29]. In all studies, the mpMRI protocol consisted of T2WI, DWI, and DCE-MRI. The reference standard was radical prostatectomy in five studies [16 22 26 27 35], a combination of systematic and targeted biopsies in seven studies [17 20 23 25 29 34 36], and only targeted biopsy in seven studies [19 21 24 28 30 32 33]; the reference standard was not consistent throughout the study population in two studies [18 31]. PI-RADSv2 scoring was performed by one to five radiologists, either in consensus or independently. The level of experience of the radiologists was heterogeneous, ranging from 4 to 22 yr of experience in prostate imaging. In most studies, the readers were blinded; however, in three studies, the radiologists were aware that the patients had biopsy-proven PCa [22 26 35], and five studies were not explicit regarding blinding [18 19 30 32 36]. In the majority of the studies, the interval between MRI and the reference standard was less than 6 mo; however, the details were not reported in 10 studies [16 17 18 24 27 28 29 30 32 33]. PCa was separately assessed according to zonal anatomy in seven studies [20 24 25 28 30 31 33]. However, in one study [31], only PCa in the PZ could be evaluated, as no detailed data were provided in the article and the attempt to contact the authors for provision of further information was unsuccessful; the location of PCa was therefore separately reported for the PZ and TZ in six studies [20 24 25 28 30 33]. Regarding the outcome assessed, eight studies evaluated any cancer [17 19 20 23 25 28 31 33], eight evaluated clinically significant cancer [18 22 26 27 30 34 35 36], and five evaluated both [16 21 24 29 32]. With regard to cutoff values, 13 studies used ≥4 [16 18 19 20 21 24 25 26 27 28 30 31 35], four studies used ≥3 [32 33 34 36], and four studies used both [17 22 23 29].

First author (year) | Design | Consecutive enrollment | Reference standard | MRI–reference standard interval | No. of readers (reading mode) | Experience (yr) | Blinding | Magnet strength (T) | Vendor | Model | Endorectal coil | PI-RADSv2 application | Cutoff values | Localization | Type of analysis | Outcome assessed | Definition of csPCa
Auer (2016) [16] | Retrospective | Yes | RP | NR | 1 | >5 | Yes | 3 | Siemens | Skyra | No | Strict | 4 | Whole | Lesion | Any + csPCa | GS ≥7 (4 + 3)
Baldisserotto (2016) [17] | Retrospective | Yes | STRUSGB + targeted MRI–TRUS biopsy (cognitive) or RP | NR | 2 (independent) | >10/1 | Yes | 3 | GE | SignaHDxt | No | Strict | 3, 4 | Whole | Patient | Any | 
De Visschere (2017) [18] | Retrospective | NR | STRUSGB or RP | NR | NR | NR | NR | 3 | Siemens | Trio | No | Strict | 4 | Whole | Patient | csPCa | GS ≥7, ≥0.5 cc, or EPE
El-Samei (2016) [19] | NR | NR | Targeted MRI–TRUS biopsy | ≤2 wk | NR | NR | NR | 1.5 | Philips | Gyroscan | Yes | Strict | 4 | Whole | Lesion | Any | 
Feng (2016) [20] | Retrospective | Yes | STRUSGB + targeted MRI–TRUS biopsy (cognitive) | 1–35 d | 2 (consensus) | 5/4 | Yes | 3 | Siemens | Skyra | No | Strict | 4 | PZ, TZ | Patient | Any | 
Kasel-Seibert (2016) [21] | Retrospective | NR | Targeted MRGB | ≤7 d | 2 (independent) | 10/<1 | Yes | 1.5 | Siemens | Avanto | Yes | Strict | 4 | Whole | Lesion | Any + csPCa | GS ≥7
Lin (2016) [22] | Retrospective | NR | RP | ≤6 mo | 2 (independent) | 12/5 | Yes a | 1.5 | Philips | Achieva | Yes | Strict | 3, 4 | Whole | Lesion | csPCa | GS ≥7 and >0.5 cc
Martorana (2016) [23] | Retrospective | Yes | TTB + targeted MRI–TRUS biopsy | 30–78 | 2 (consensus) | 4/4 | Yes | 1.5 | Philips | Achieva | Yes | Strict | 3, 4 | Whole | Lesion | Any | GS ≥7, >0.5 cc, or EPE
Mertan (2016) [24] | Prospective | Yes | STRUSGB + targeted MRI–TRUS biopsy | NR | 1 | >8 | Yes | 3 | Philips | Achieva | Yes | Strict | 4 | PZ, TZ | Lesion | Any + csPCa | GS ≥7
Muller (2015) [25] | Retrospective | Yes | STRUSGB + targeted MRI–TRUS biopsy | ≤6 wk | 5 (independent) | 12/7/1/1/0.5 | Yes | 3 | Philips | Achieva | Yes | Strict | 4 | PZ, TZ | Lesion | Any | GS ≥7 (4 + 3)
Park (2016) [26] | Retrospective | Yes | RP | 21–48 | 2 (independent) | 14/3 | Yes a | 3 | Philips | Achieva | No | Strict | 4 | Whole | Patient | csPCa | GS ≥7, ≥0.5 cc, or EPE
Park (2016) [27] | Retrospective | NR | RP | NR | 2 (independent) | 9/4 | Yes | 3 | GE, Philips, Siemens | DiscoveryMR750, Achieva, Trio | No | Strict | 4 | Whole | Patient | csPCa | GS ≥7, ≥0.5 cc, or EPE
Polanec (2016) [28] | Retrospective | Yes | Targeted MRGB | NR | 2 (independent) | NR/NR | Yes | 3 | Siemens | Trio | No | Strict | 4 | PZ, TZ | Lesion | Any | 
Rastinehad (2015) [29] | Prospective | NR | STRUSGB + targeted MRI–TRUS biopsy | NR | 3 (consensus) | NR | Yes | 3 | Siemens | Verio | Yes | Calculated b | 3, 4 | Whole | Patient | Any + csPCa | Epstein's criteria c for systematic and GS ≥7 or >0.5 cc for targeted biopsy
Rosenkrantz (2016) [30] | Retrospective | NR | Targeted MRI–TRUS biopsy | NR | 2 (independent) | 8/3 | NR | 3 | Siemens | Biograph mMR/Prisma/Skyra/Trio | No | Strict | 4 | PZ, TZ | Lesion | csPCa | GS ≥7
Stanzione (2016) [31] | Prospective | NR | STRUSGB (±targeted TRUSGB) | 20–30 d | 2 (independent) | 14/10 | Yes | 3 | Siemens | Trio | No | Strict | 4 | PZ | Patient | Any | 
Tan (2016) [32] | Retrospective | Yes | Targeted MRGB | NR | 3 (consensus) | 16/10 | NR | 3 | Siemens | Skyra, Trio, Verio | No | Strict | 3 | Whole | Lesion | Any + csPCa | GS ≥7
Tewes (2016) [33] | Retrospective | Yes | Targeted MRGB | NR | 2 (independent) | 5/2 | Yes | 3 | Siemens | Skyra | No | Strict | 3 d | PZ, TZ | Patient | Any | 
Washino (2017) [34] | Retrospective | NR | TTB + targeted MRI–TRUS biopsy (cognitive) | 0.5–1.6 mo | 1 | 14 | Yes | 1.5 or 3 | Toshiba | Excelart Vantage/Vantage Titan 3T | No | Strict | 3 | Whole | Patient | csPCa | GS ≥7 or maximum core length ≥4 mm
Woo (2016) [35] | Retrospective | NR | RP | ≤6 mo | 2 (consensus) | 22/10 | Yes a | 3 | Philips, Siemens | Ingenia, Verio/Trio | No | Strict | 4 | Whole | Patient | csPCa | GS ≥7
Zhao (2016) [36] | Retrospective | NR | STRUSGB + targeted MRI–TRUS biopsy | ≤3 mo | 2 (independent) | NR | NR | 3 | NR | NR | No | Strict | 3 | Whole | Patient | csPCa | GS ≥7

a Blinded but aware that patients had biopsy-proven PCa.

b PI-RADSv2 scores were generated from reports based on PI-RADSv1 and simplified qualitative system.

c Epstein's criteria = Gleason pattern ≥4, or Gleason 3 + 3 disease with core length ≥50% and/or >2 cores positive.

d For transition zone, cutoff value = 4.

csPCa = clinically significant prostate cancer; GS = Gleason score; EPE = extraprostatic extension; MRGB = magnetic resonance imaging-guided biopsy; MRI = magnetic resonance imaging; MRI–TRUS = fusion of magnetic resonance imaging and transrectal ultrasound images; NR = not reported; PCa = prostate cancer; PI-RADSv2 = Prostate Imaging Reporting and Data System version 2; PZ = peripheral zone; RP = radical prostatectomy; STRUSGB = systematic transrectal ultrasound-guided biopsy; TTB = transperineal template biopsy; TZ = transition zone.

Table 2. Study characteristics

3.3 Quality assessment

Overall, the quality of the studies was not considered high, mainly due to the patient selection domain (Fig. 2). Regarding the patient selection domain, there was generally a high risk of bias as all but four of the studies were retrospective in nature [16 17 18 20 21 22 23 25 26 27 28 30 32 33 34 35 36]. Seven studies were considered to have high concern for applicability, as all or some of the patients had a pathological diagnosis of PCa prior to MRI [16 17 22 26 27 32 35]. Regarding the index test domain, there was a high risk of bias in nine studies. In three of nine studies, reviewers were aware that patients had biopsy-proven PCa [22 26 35]. In the other six studies, the cutoff value for determining PCa was not specified prior to interpretation [16 20 29 31 33 34]. Only one study had concern for applicability, as PI-RADSv2 scores were indirectly generated from existing clinical radiological reports based on PI-RADSv1 or an in-house scoring system [29]. Regarding the reference standard domain, eight studies had a high risk of bias. Seven were based on either only systematic biopsy or targeted biopsy [18 19 21 28 30 32 33]; in one study, targeted biopsy was performed, but on lesions that were suspicious on ultrasonography, and not on MRI [31]. Those in which radical prostatectomy or a systematic plus targeted biopsy (MRI guided, MRI–transrectal ultrasound fusion, or cognitive) was used as the reference standard were considered to have a low risk of bias. In 10 studies, the definition of clinically significant cancer did not follow that described in the PI-RADSv2 guidelines, and these studies therefore showed high concern for applicability [16 21 22 24 29 30 32 34 35 36]. Regarding the flow and timing domain, two studies had a high risk of bias as patients did not receive the same reference standard [17 18].

Fig. 2
Grouped bar charts show (A) risk of bias and (B) concerns for applicability of 21 included studies using QUADAS-2. QUADAS-2 = Quality Assessment of Diagnostic Accuracy Studies-2.

3.4 Diagnostic accuracy of PI-RADSv2

The sensitivity and specificity of individual studies were 73–100% and 8–100%, respectively. The Q-test revealed that substantial heterogeneity was present (p < 0.001). The Higgins I² statistic demonstrated substantial heterogeneity in terms of the sensitivity (I² = 85.55%) and considerable heterogeneity in terms of the specificity (I² = 95.30%). The coupled forest plot of the sensitivity and specificity demonstrated the absence of a threshold effect (Fig. 3). The Spearman correlation coefficient between the sensitivity and the false-positive rate was 0.45 (95% confidence interval [CI] 0.023–0.738), also indicating the lack of a threshold effect.
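As a reminder of how the Higgins statistic relates to Cochran's Q, the standard definition is sketched below. The Q value in the example is hypothetical and was chosen only for easy arithmetic; it is not the Q statistic of this meta-analysis, which is not reported numerically in the text.

```python
def higgins_i2(q: float, df: int) -> float:
    """Higgins I-squared (%) from Cochran's Q and its degrees of freedom.

    I2 = max(0, (Q - df) / Q) * 100, where df is the number of studies minus 1.
    """
    if q <= 0:
        return 0.0
    return max(0.0, 100.0 * (q - df) / q)


# Hypothetical example: 21 studies give df = 20; Q = 100 is an assumed value.
print(higgins_i2(q=100.0, df=20))  # 80.0
```

Values near 0% mean that the observed variation is compatible with chance alone, whereas values such as those reported above mean that most of the variation reflects genuine between-study differences.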

Fig. 3
Coupled forest plots of pooled sensitivity and specificity. Numbers are pooled estimates with 95% CI in parentheses. Corresponding heterogeneity statistics are provided at bottom right corners. Horizontal lines indicate 95% CIs. CI = confidence intervals.

For all 21 studies combined, the pooled sensitivity was 0.89 (95% CI 0.86–0.92) with specificity of 0.73 (95% CI 0.60–0.83; Fig. 3 ). In the HSROC curve, there was a large difference between the 95% confidence region and the 95% prediction region, thus indicating heterogeneity between the studies ( Fig. 4 ). The area under the HSROC curve was 0.91 (95% CI 0.88–0.93). According to the Deeks’ funnel plot, the likelihood of publication bias was low, with a p value of 0.75 for the slope coefficient ( Fig. 5 ).
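To give the pooled estimates a more clinical reading, likelihood ratios can be derived from them. The sketch below applies the standard formulas to the pooled point estimates only; the resulting ratios are an illustrative calculation, are not reported in this meta-analysis, and ignore both the confidence intervals and the correlation between sensitivity and specificity.

```python
def likelihood_ratios(sensitivity: float, specificity: float) -> tuple[float, float]:
    """Return (LR+, LR-) from a sensitivity and specificity pair."""
    lr_positive = sensitivity / (1.0 - specificity)
    lr_negative = (1.0 - sensitivity) / specificity
    return lr_positive, lr_negative


# Pooled point estimates from the text: sensitivity 0.89, specificity 0.73.
lr_pos, lr_neg = likelihood_ratios(0.89, 0.73)
print(round(lr_pos, 2), round(lr_neg, 2))  # 3.3 0.15
```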

Fig. 4
Hierarchical summary receiver operating characteristic curve of the diagnostic performance of PI-RADSv2 for detecting prostate cancer. HSROC = hierarchical summary receiver operating characteristic; PI-RADSv2 = Prostate Imaging Reporting and Data System version 2.

Fig. 5
Deeks’ funnel plot. A p value of 0.75 suggests that the likelihood of publication bias is low.

For the six studies that provided a head-to-head comparison between PI-RADSv1 and PI-RADSv2, PI-RADSv2 demonstrated higher pooled sensitivity of 0.95 (95% CI 0.85–0.98) compared with 0.88 (95% CI 0.80–0.93) for PI-RADSv1 ( p = 0.04). However, the pooled specificity did not show a significant difference between the two versions of PI-RADS: 0.73 (95% CI 0.47–0.89) for v2 and 0.75 (95% CI 0.36–0.94) for v1 ( p = 0.90).

3.5 Heterogeneity exploration

As we found considerable heterogeneity among the included studies, meta-regression analyses were performed (Supplementary Table 1). Among several potential variables, proportion of patients with PCa, magnetic field strength, and reference standard were significant factors affecting the heterogeneity (p < 0.01 for all). However, among these three, only specificity according to the proportion of patients with PCa showed a statistically significant and clinically meaningful difference: 0.65 (95% CI 0.52–0.78) in studies with ≤50% of patients with PCa versus 0.86 (95% CI 0.75–0.97) in studies with >50% of patients with PCa. Otherwise, no clinically meaningful differences were seen: for magnet strength (3 vs 1.5 T), sensitivity of 0.90 (95% CI 0.86–0.94) versus 0.89 (95% CI 0.81–0.97, p = 0.03) and specificity of 0.73 (95% CI 0.59–0.86) versus 0.72 (95% CI 0.44–1.00, p = 0.81); for reference standard (radical prostatectomy vs biopsy), sensitivity of 0.89 (95% CI 0.83–0.95) versus 0.91 (95% CI 0.88–0.94, p < 0.01) and specificity of 0.65 (95% CI 0.37–0.94) versus 0.73 (95% CI 0.58–0.87, p = 0.48). Other variables, including the cutoff value, use of endorectal coil, and type of analysis (per patient vs per lesion), were not significant factors (p = 0.32–0.70).

3.6 Subgroup analysis

As four studies used both ≥3 and ≥4 as cutoff values [17 22 23 29] and five studies determined both any cancer and clinically significant cancer as outcomes [16 21 24 29 32], multiple subgroup analyses were performed in order to assess various clinical settings (Supplementary Table 2). Regarding cutoff values, the pooled sensitivity was 0.89 (95% CI 0.84–0.92) with specificity of 0.74 (95% CI 0.58–0.85) for 17 studies using ≥4 [16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 35], whereas these were 0.95 (95% CI 0.89–0.97) and 0.47 (95% CI 0.21–0.74) in eight studies using ≥3 [17 22 23 29 32 33 34 36]. When we stratified studies according to the outcome assessed, the results were as follows: (1) cutoff of ≥4 for determining any PCa, sensitivity of 0.89 (95% CI 0.83–0.93) with specificity of 0.80 (95% CI 0.62–0.90); (2) cutoff of ≥3 for determining any PCa, sensitivity of 0.96 (95% CI 0.93–0.98) with specificity of 0.49 (95% CI 0.29–0.70); (3) for determining csPCa regardless of cutoff values, sensitivity of 0.89 (95% CI 0.84–0.92) with specificity of 0.64 (95% CI 0.46–0.78); (4) cutoff of ≥4 for determining csPCa, sensitivity of 0.90 (95% CI 0.85–0.94) with specificity of 0.62 (95% CI 0.45–0.77); and (5) cutoff of ≥3 for determining csPCa, sensitivity of 0.96 (95% CI 0.87–0.99) with specificity of 0.29 (95% CI 0.05–0.77). When the studies using a cutoff value of ≥4 were separately assessed according to the type of analysis, per-patient analysis in eight studies [17 18 20 26 27 29 31 35] yielded pooled sensitivity of 0.89 (95% CI 0.81–0.93) with specificity of 0.76 (95% CI 0.60–0.88), whereas per-lesion analysis in nine studies [16 19 21 22 23 24 25 28 30] yielded pooled sensitivity of 0.87 (95% CI 0.83–0.91) with specificity of 0.70 (95% CI 0.44–0.88).

Based on the localization of PCa, the pooled sensitivity was 0.93 (95% CI 0.87–0.96) with specificity of 0.68 (95% CI 0.43–0.86) in seven studies analyzing PZ cancers [20 24 25 28 30 31 33]. In the same studies, excluding that by Stanzione et al [31], the pooled sensitivity and specificity for TZ cancers were 0.88 (95% CI 0.77–0.94) and 0.75 (95% CI 0.59–0.86), respectively.

Studies including patients without previous biopsies yielded sensitivity of 0.82 (95% CI 0.72–0.90) and specificity of 0.75 (95% CI 0.65–0.83), whereas the diagnostic performance values were 0.87 (95% CI 0.80–0.92) and 0.71 (95% CI 0.42–0.89) in studies including patients with a history of previous biopsy.

3.7 Discussion

In our meta-analysis, we assessed the diagnostic accuracy of PI-RADSv2 for detecting PCa. The pooled sensitivity and specificity of all 21 studies were 0.89 (95% CI 0.86–0.92) and 0.73 (95% CI 0.60–0.83), respectively. When comparing our data with the only two existing meta-analyses using mpMRI for detecting PCa, a trend toward higher sensitivity and lower specificity can be inferred. In the study by de Rooij et al [37], which evaluated seven studies using a combination of T2WI, DWI, and DCE-MRI, the pooled sensitivity and specificity were 0.74 (95% CI 0.66–0.81) and 0.88 (95% CI 0.82–0.92), respectively. In a more recent meta-analysis by Hamoen et al [6], which analyzed 14 studies using PI-RADSv1, the pooled sensitivity and specificity were 0.78 (95% CI 0.70–0.84) and 0.79 (95% CI 0.68–0.86), respectively. However, setting the three meta-analyses side by side provides only an indirect comparison. To address this issue, we separately assessed a subgroup of studies using both PI-RADSv1 and PI-RADSv2. In a head-to-head comparison between them, PI-RADSv2 demonstrated higher pooled sensitivity (0.95) compared with PI-RADSv1 (0.88, p = 0.04) without a statistically significant difference in specificity (0.73 vs 0.75, p = 0.90). This increase in sensitivity compared with its predecessor may imply that the revisions undertaken during the development of PI-RADSv2, including the introduction of dominant sequences according to zonal anatomy, limited contribution of DCE-MRI secondary to DWI and T2WI, and specific guidelines for deriving an integrated overall score, were on the right track. In particular, we speculate that the use of dominant sequences, that is, DWI for the PZ and T2WI for the TZ, may have been crucial for the improved sensitivity without a loss in specificity, as suggested by Baur et al [10].

Considering that one of the main intentions for the generation of PI-RADS was to standardize reporting of mpMRI in order to decrease variability and bring about widespread acceptance and implementation in daily practice, it was promising to find that nearly all (20 of 21) studies used PI-RADSv2 strictly according to published guidelines [11] . Only one study formed PI-RADSv2 scores from existing clinical radiological reports that were based on PI-RADSv1 or an in-house scoring system [29] . This is an improvement when compared with prior studies conducted using PI-RADSv1, where investigators used varying methods in determining the overall score (overall five-point score or sum of the scores from each modality) [6] . Still, there is a need for further clarification regarding the cutoff value for detecting PCa. In the studies included in our meta-analysis, cutoff values were predefined in only six studies, while the majority (15/21) were exploratory in nature, testing multiple criteria. When using a cutoff value of ≥4, sensitivity (0.89) and specificity (0.74) were generally good, whereas using ≥3 yielded excellent sensitivity (0.95) and poor specificity (0.47). These results may be taken into consideration when generating the next updated PI-RADS. For instance, using the former may be adequate for general use of PI-RADS, whereas the latter could be proposed to be indicated when a higher cancer detection rate is clinically required (ie, persistently high PSA level despite a previously negative biopsy).
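The trade-off between the two cutoff values can also be expressed as likelihood ratios. The following is a minimal sketch that derives them from the pooled point estimates quoted above; likelihood ratios are not reported in this meta-analysis, and the calculation ignores the confidence intervals and the correlation between sensitivity and specificity, so the values are illustrative only. The function name is hypothetical.

```python
def likelihood_ratios(sensitivity: float, specificity: float):
    """Return (LR+, LR-) computed from point estimates of sensitivity and specificity."""
    lr_positive = sensitivity / (1 - specificity)
    lr_negative = (1 - sensitivity) / specificity
    return lr_positive, lr_negative


# Pooled point estimates quoted in the text for each cutoff value.
pooled = {
    ">=4": {"sensitivity": 0.89, "specificity": 0.74},
    ">=3": {"sensitivity": 0.95, "specificity": 0.47},
}

# Derived, illustrative values only (not results of the meta-analysis).
ratios = {cutoff: likelihood_ratios(**est) for cutoff, est in pooled.items()}

for cutoff, (lr_pos, lr_neg) in ratios.items():
    print(f"cutoff {cutoff}: LR+ = {lr_pos:.2f}, LR- = {lr_neg:.2f}")
```

Under these assumptions, a score of ≥4 gives a larger positive likelihood ratio, whereas a score of ≥3 gives a slightly smaller negative likelihood ratio, which is consistent with the suggestion that the lower cutoff may suit settings where missing a cancer is the main concern.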

In the current study, subgroup analyses were performed to account for differences in outcomes (any cancer vs clinically significant cancer). Diagnostic performance did not differ significantly between the two outcomes, irrespective of whether a cutoff of ≥3 or ≥4 was used. However, the definition of clinically significant cancer differed among the 13 studies. Only three studies defined csPCa strictly according to the PI-RADSv2 guidelines (Gleason score ≥7 [3 + 4], volume >0.5 ml, or extraprostatic extension) [11]. Most others used one or two of the three criteria. Including only the former three studies may have provided more robust results; however, including all available studies was more pragmatic and presents a general overview of the existing literature, as this is the first meta-analysis of studies dealing with PI-RADSv2.

In this meta-analysis, we looked into the technical aspects of MRI. Meta-regression analyses revealed that the use of endorectal coil was not a statistically significant factor. Furthermore, although magnet strength showed a statistically significant difference between 3 and 1.5 T, this difference was not clinically meaningful (sensitivity of 0.90 vs 0.89, respectively; p = 0.03). Although there had been debate over these two issues in the past, both 3 and 1.5 T are now well established, and the overall benefit of using an endorectal coil is not evident [38 39]. The PI-RADSv2 guidelines currently recommend either usage, and the results of our study provide additional evidence to support this.

Regarding the methods of analysis in the studies, there was significant heterogeneity in the reference standard and the type of analysis. Radical prostatectomy was the reference standard in five studies, while the majority were based on a combination of systematic and targeted biopsies. The possibility of PCa despite negative biopsy results in the latter group should be kept in mind. In addition, approximately half of the studies reported outcomes in a per-patient manner (n = 11) and the other half in a per-lesion manner (n = 10). Per-lesion analysis is known to take into account the performance of localizing the disease; however, this was not shown to be a significant factor in the meta-regression analysis.

Our meta-analysis had some limitations. Nearly all studies were retrospective in design, resulting in a high risk of bias for patient selection. It is possible that pooling data from predominantly retrospective studies may have led to increased diagnostic sensitivity [40]. However, performing a meta-analysis using only the three prospective studies was technically unfeasible, and the derived results would not have been representative of the existing literature on PI-RADSv2. Moreover, we used validated methods for the systematic review and reported the data using standard reporting guidelines, including PRISMA and the guidelines of the Handbook for Diagnostic Test Accuracy Reviews published by the Cochrane Collaboration [12 41]. Another limitation is the considerable heterogeneity in our pooled analysis, which affects the general applicability of our summary estimates. To explore the heterogeneity of our data, we performed meta-regression and multiple subgroup analyses. According to these analyses, the proportion of patients with PCa, the magnetic field strength, and the reference standard were significant factors affecting the heterogeneity. In particular, the reference standard comprised various methods, including radical prostatectomy and a combination of systematic and targeted biopsies (ie, MRI guided, MRI-transrectal ultrasound fusion, or cognitive). Furthermore, the fact that various definitions were used for clinically significant cancer needs to be emphasized. Our meta-regression and subgroup analyses may explain some of the heterogeneity, but a portion remains unexplained. Another important limitation is the small number of studies available for head-to-head comparison between PI-RADSv1 and PI-RADSv2. Nevertheless, we were able to demonstrate a statistically significant difference in sensitivity between the two versions using only the six studies published to date.

PI-RADSv2 shows good performance for the detection of PCa with pooled sensitivity of 0.89 and specificity of 0.73. PI-RADSv2 has higher pooled sensitivity compared with PI-RADSv1 without significantly different specificity.

Author contributions: Sang Youn Kim had full access to all the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis.

Study concept and design: Woo, Suh, S.Y. Kim.

Acquisition of data: Woo, Suh, S.Y. Kim.

Analysis and interpretation of data: Woo, Suh, S.Y. Kim.

Drafting of the manuscript: Woo.

Critical revision of the manuscript for important intellectual content: Suh, S.Y. Kim, Cho, S.H. Kim.

Statistical analysis: Suh.

Obtaining funding: None.

Administrative, technical, or material support: None.

Supervision: S.Y. Kim, Cho, S.H. Kim.

Other: None.

Financial disclosures: Sang Youn Kim certifies that all conflicts of interest, including specific financial interests and relationships and affiliations relevant to the subject matter or materials discussed in the manuscript (eg, employment/affiliation, grants or funding, consultancies, honoraria, stock ownership or options, expert testimony, royalties, or patents filed, received, or pending), are the following: None.

Funding/Support and role of the sponsor: None.

  • [1] American Cancer Society. Key statistics for prostate cancer. http://www.cancer.org/cancer/prostatecancer/detailedguide/prostate-cancer-key-statistics
  • [2] C.M. Hoeks, J.O. Barentsz, T. Hambrock, et al. Prostate cancer: multiparametric MR imaging for detection, localization, and staging. Radiology. 2011;261:46-66
  • [3] O. Ruprecht, P. Weisser, B. Bodelle, H. Ackermann, T.J. Vogl. MRI of the prostate: interobserver agreement compared with histopathologic outcome after radical prostatectomy. Eur J Radiol. 2012;81:456-460
  • [4] L. Dickinson, H.U. Ahmed, C. Allen, et al. Scoring systems used for the interpretation and reporting of multiparametric MRI for prostate cancer detection, localization, and characterization: could standardization lead to improved utilization of imaging within the diagnostic pathway? J Magn Reson Imaging. 2013;37:48-58
  • [5] J.O. Barentsz, J. Richenberg, R. Clements, et al. ESUR prostate MR guidelines 2012. Eur Radiol. 2012;22:746-757
  • [6] E.H. Hamoen, M. de Rooij, J.A. Witjes, J.O. Barentsz, M.M. Rovers. Use of the Prostate Imaging Reporting and Data System (PI-RADS) for prostate cancer detection with multiparametric magnetic resonance imaging: a diagnostic meta-analysis. Eur Urol. 2015;67:1112-1121
  • [7] M. Abd-Alazeez, H.U. Ahmed, M. Arya, et al. The accuracy of multiparametric MRI in men with negative biopsy and elevated PSA level: can it rule out clinically significant prostate cancer? Urol Oncol. 2014;32:45.e17–22
  • [8] A.B. Rosenkrantz, S. Kim, R.P. Lim, et al. Prostate cancer localization using multiparametric MR imaging: comparison of Prostate Imaging Reporting and Data System (PI-RADS) and Likert scales. Radiology. 2013;269:482-492
  • [9] B.G. Hansford, Y. Peng, Y. Jiang, et al. Dynamic contrast-enhanced MR imaging curve-type analysis: is it helpful in the differentiation of prostate cancer from healthy peripheral zone? Radiology. 2015;275:448-457
  • [10] A.D. Baur, A. Maxeiner, T. Franiel, et al. Evaluation of the Prostate Imaging Reporting and Data System for the detection of prostate cancer by the results of targeted biopsy of the prostate. Invest Radiol. 2014;49:411-420
  • [11] J.C. Weinreb, J.O. Barentsz, P.L. Choyke, et al. PI-RADS Prostate Imaging-Reporting and Data System: 2015, version 2. Eur Urol. 2016;69:16-40
  • [12] A. Liberati, D.G. Altman, J. Tetzlaff, et al. The PRISMA statement for reporting systematic reviews and meta-analyses of studies that evaluate health care interventions: explanation and elaboration. J Clin Epidemiol. 2009;62:e1-e34
  • [13] P. Whiting, A.W. Rutjes, J.B. Reitsma, P.M. Bossuyt, J. Kleijnen. The development of QUADAS: a tool for the quality assessment of studies of diagnostic accuracy included in systematic reviews. BMC Med Res Methodol. 2003;3:25
  • [14] C.H. Suh, S.H. Park. Successful publication of systematic review and meta-analysis of studies evaluating diagnostic test accuracy. Korean J Radiol. 2016;17:5-6
  • [15] J.J. Deeks, P. Macaskill, L. Irwig. The performance of tests of publication bias and other sample size effects in systematic reviews of diagnostic test accuracy was assessed. J Clin Epidemiol. 2005;58:882-893
  • [16] T. Auer, M. Edlinger, J. Bektic, et al. Performance of PI-RADS version 1 versus version 2 regarding the relation with histopathological results. World J Urol. 2017;35:687-693
  • [17] M. Baldisserotto, E.J. Neto, G. Carvalhal, et al. Validation of PI-RADS v.2 for prostate cancer diagnosis with MRI at 3T using an external phased-array coil. J Magn Reson Imaging. 2016;44:1354-1359
  • [18] P. De Visschere, N. Lumen, P. Ost, K. Decaestecker, E. Pattyn, G. Villeirs. Dynamic contrast-enhanced imaging has limited added value over T2-weighted imaging and diffusion-weighted imaging when using PI-RADSv2 for diagnosis of clinically significant prostate cancer in patients with elevated PSA. Clin Radiol. 2017;72:23-32
  • [19] H.A.E.-K.A. El-Samei, M.F. Amin, E.E. Hassan. Assessment of the accuracy of multi-parametric MRI with PI-RADS 2.0 scoring system in the discrimination of suspicious prostatic focal lesions. Egypt J Radiol Nucl Med. 2016;47:1075-1082
  • [20] Z.Y. Feng, L. Wang, X.D. Min, S.G. Wang, G.P. Wang, J. Cai. Prostate cancer detection with multiparametric magnetic resonance imaging: Prostate Imaging Reporting and Data System version 1 versus version 2. Chin Med J. 2016;129:2451-2459
  • [21] M. Kasel-Seibert, T. Lehmann, R. Aschenbach, et al. Assessment of PI-RADS v2 for the detection of prostate cancer. Eur J Radiol. 2016;85:726-731
  • [22] W.C. Lin, V.F. Muglia, G.E. Silva, S. Chodraui Filho, R.B. Reis, A.C. Westphalen. Multiparametric MRI of the prostate: diagnostic performance and interreader agreement of two scoring systems. Br J Radiol. 2016;89:20151056
  • [23] E. Martorana, G.M. Pirola, M. Scialpi, et al. Lesion volume predicts prostate cancer risk and aggressiveness: validation of its value alone and matched with Prostate Imaging Reporting and Data System score. BJU Int. In press. doi:10.1111/bju.13649
  • [24] F.V. Mertan, M.D. Greer, J.H. Shih, et al. Prospective evaluation of the Prostate Imaging Reporting and Data System version 2 for prostate cancer detection. J Urol. 2016;196:690-696
  • [25] B.G. Muller, J.H. Shih, S. Sankineni, et al. Prostate cancer: interobserver agreement and accuracy with the revised Prostate Imaging Reporting and Data System at multiparametric MR imaging. Radiology. 2015;277:741-750
  • [26] J.J. Park, B.K. Park. Role of PI-RADSv2 with multiparametric MRI in determining who needs active surveillance or definitive treatment according to PRIAS. J Magn Reson Imaging. 2017;45:1753-1759
  • [27] S.Y. Park, D.C. Jung, Y.T. Oh, et al. Prostate cancer: PI-RADS version 2 helps preoperatively predict clinically significant cancers. Radiology. 2016;280:108-116
  • [28] S. Polanec, T.H. Helbich, H. Bickel, et al. Head-to-head comparison of PI-RADS v2 and PI-RADS v1. Eur J Radiol. 2016;85:1125-1131
  • [29] A.R. Rastinehad, N. Waingankar, B. Turkbey, et al. Comparison of multiparametric MRI scoring systems and the impact on cancer detection in patients undergoing MR US fusion guided prostate biopsies. PLoS One. 2015;10:e0143404
  • [30] A.B. Rosenkrantz, J.S. Babb, S.S. Taneja, J.M. Ream. Proposed adjustments to PI-RADS version 2 decision rules: impact on prostate cancer detection. Radiology. In press. doi:10.1148/radiol.2016161124
  • [31] A. Stanzione, M. Imbriaco, S. Cocozza, et al. Biparametric 3T magnetic resonance imaging for prostatic cancer detection in a biopsy-naive patient population: a further improvement of PI-RADS v2? Eur J Radiol. 2016;85:2269-2274
  • [32] N. Tan, W.C. Lin, P. Khoshnoodi, et al. In-Bore 3-T MR-guided transrectal targeted prostate biopsy: Prostate Imaging Reporting and Data System version 2-based diagnostic performance for detection of prostate cancer. Radiology. In press. doi:10.1148/radiol.2016152827
  • [33] S. Tewes, N. Mokov, D. Hartung, et al. Standardized reporting of prostate MRI: comparison of the Prostate Imaging Reporting and Data System (PI-RADS) version 1 and version 2. PLoS One. 2016;11:e0162879
  • [34] S. Washino, T. Okochi, K. Saito, et al. Combination of PI-RADS score and PSA density predicts biopsy outcome in biopsy naive patients. BJU Int. 2017;119:225-233
  • [35] S. Woo, S.Y. Kim, J. Lee, S.H. Kim, J.Y. Cho. PI-RADS version 2 for prediction of pathological downgrading after radical prostatectomy: a preliminary study in patients with biopsy-proven Gleason Score 7 (3 + 4) prostate cancer. Eur Radiol. 2016;26:3580-3587
  • [36] C. Zhao, G. Gao, D. Fang, et al. The efficiency of multiparametric magnetic resonance imaging (mpMRI) using PI-RADS Version 2 in the diagnosis of clinically significant prostate cancer. Clin Imaging. 2016;40:885-888
  • [37] M. de Rooij, E.H. Hamoen, J.J. Futterer, J.O. Barentsz, M.M. Rovers. Accuracy of multiparametric MRI for prostate cancer detection: a meta-analysis. Am J Roentgenol. 2014;202:343-351
  • [38] Z.K. Shah, S.N. Elias, R. Abaza, et al. Performance comparison of 1.5-T endorectal coil MRI with 3.0-T nonendorectal coil MRI in patients with prostate cancer. Acad Radiol. 2015;22:467-474
  • [39] Y. Mazaheri, H.A. Vargas, G. Nyman, O. Akin, H. Hricak. Image artifacts on prostate diffusion-weighted magnetic resonance imaging: trade-offs at 1.5 Tesla and 3.0 Tesla. Acad Radiol. 2013;20:1041-1047
  • [40] J.J. Deeks. Systematic reviews in health care: systematic reviews of evaluations of diagnostic and screening tests. BMJ. 2001;323:157-162
  • [41] J.J.W.S. Deeks, C. Davenport. Guide to the contents of a Cochrane diagnostic test accuracy protocol. In: J.J. Deeks, P.M. Bossuyt, C. Gatsonis (Eds.). Cochrane handbook for systematic reviews of diagnostic test accuracy version 1.0.0. Chapter 4. (The Cochrane Collaboration, 2013) http://srdta.cochrane.org/

Prostate cancer (PCa) is the second leading cause of cancer-related mortality in Western men [1]. With recent technological advancements and growing availability, multiparametric magnetic resonance imaging (mpMRI) is increasingly being used to guide several aspects of PCa management, including detection, staging, and treatment planning [2]. Despite abundant evidence in the literature reporting high accuracy of mpMRI for PCa diagnosis, widespread acceptance has been hampered by several factors, including difficulty of interpretation, lack of standardized criteria for interpretation (ie, use of Likert scales based on the radiologist's subjective level of suspicion for PCa), and the resulting substantial inter-reader variability [3 4].

To bring standardization to the evaluation and reporting of mpMRI of the prostate, the European Society of Urogenital Radiology (ESUR) published a guideline termed Prostate Imaging Reporting and Data System (PI-RADS) in 2012 [5] . PI-RADS was generated based on expert consensus and provides a detailed scoring system for each MRI sequence (T2-weighted imaging [T2WI], diffusion-weighted imaging [DWI], dynamic contrast-enhanced MRI [DCE-MRI], and MR spectroscopy) for the presence of clinically significant PCa (csPCa). Several investigators have validated the accuracy and reproducibility of PI-RADS, and a recent meta-analysis reported pooled sensitivity and specificity of 0.78 and 0.79, respectively [6] . However, as there was no guideline for the generation of an overall score, different research groups utilized various measures for this purpose—some used a sum of the scores from each sequence (ranging from 3 to 15), whereas others used an overall score of 1–5 [7 8] . Furthermore, emerging data questioned the value of curve-type analysis of DCE-MRI [9] . In addition, investigators suggested that some sequences may be more important in determining the likelihood of PCa (ie, DWI in the peripheral zone [PZ] and T2WI in the transition zone [TZ]) rather than equal weighting for all sequences [10] .

To address these issues, the ESUR and American College of Radiology recently released the updated PI-RADS version 2 (PI-RADSv2) [11] . The main changes from PI-RADSv1 to PI-RADSv2 are the following: (1) introduction of dominant sequences according to zonal anatomy (DWI for the PZ and T2WI for the TZ), (2) limited contribution of DCE-MRI data as merely presence and absence of early focal enhancement, and (3) generation of an overall score (1–5) integrating findings across all MRI sequences.

Since then, several studies assessing the value of PI-RADSv2 have been published. However, the diagnostic performance of this new scoring system has not been evaluated systematically. Therefore, the purpose of our study was to assess the diagnostic performance of PI-RADSv2 for the detection of PCa. In addition, we aimed to compare the diagnostic performance of PI-RADSv1 and PI-RADSv2 in studies available for head-to-head comparison.

This meta-analysis was performed and written according to the preferred reporting items for systematic reviews and meta-analyses (PRISMA) guidelines [12] .

2.1 Literature search

A computerized search of MEDLINE and EMBASE up to December 7, 2016, was performed in order to identify studies evaluating the diagnostic performance of PI-RADSv2 for the detection of PCa. The search query combined synonyms for PCa, MRI, and PI-RADS as follows: (prostate cancer OR prostatic cancer OR prostate neoplasm OR prostatic neoplasm OR prostate tumor OR prostatic tumor OR prostate carcinoma OR prostatic carcinoma OR PCa) AND (magnetic resonance imaging OR MRI OR MR) AND (prostate imaging reporting and data system OR pi-rads OR pi rads OR pirads). Bibliographies of identified articles were also screened to expand the scope of search. Our search was limited to publications in English.
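The structure of the search query (three groups of synonyms joined by OR within a group and AND between groups) can be sketched as follows. This is only an illustration of how the string quoted above is assembled; the variable and function names are hypothetical, and the sketch does not reproduce database-specific syntax for MEDLINE or EMBASE.

```python
# Synonym groups exactly as listed in the search query above.
pca_terms = [
    "prostate cancer", "prostatic cancer", "prostate neoplasm",
    "prostatic neoplasm", "prostate tumor", "prostatic tumor",
    "prostate carcinoma", "prostatic carcinoma", "PCa",
]
mri_terms = ["magnetic resonance imaging", "MRI", "MR"]
pirads_terms = [
    "prostate imaging reporting and data system",
    "pi-rads", "pi rads", "pirads",
]


def or_group(terms):
    """Join synonyms with OR and wrap the group in parentheses."""
    return "(" + " OR ".join(terms) + ")"


# Groups are combined with AND, so a record must match all three concepts.
query = " AND ".join(or_group(group) for group in (pca_terms, mri_terms, pirads_terms))
print(query)
```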

2.2 Study selection

2.2.1 Inclusion criteria

Studies were included if they satisfied all the following requirements according to the PICOS criteria [12] : (1) included patients with suspected or diagnosed PCa; (2) for index test, mpMRI of the prostate including all required sequences of T2WI, DWI, and DCE-MRI was performed and assessed with a PI-RADSv2 scoring system; (3) for comparison, a reference standard based on the histopathological examination of radical prostatectomy or biopsy was used; (4) results were reported in sufficient detail for the reconstruction of 2 × 2 tables and determination of sensitivity and specificity at specified cutoff values for evaluating the diagnostic performance of PI-RADSv2; and (5) studies had to be original articles.
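Criterion (4) requires that a 2 × 2 table can be reconstructed for each study. As a minimal sketch of what that reconstruction yields, the following computes sensitivity and specificity from the four cells, together with a Wilson score interval. The counts are made up for illustration and are not taken from any included study; the class and function names are hypothetical, and the Wilson interval is an assumption rather than the method used in this meta-analysis.

```python
import math
from dataclasses import dataclass


@dataclass
class TwoByTwo:
    """Cell counts for one study at one PI-RADSv2 cutoff value."""
    tp: int  # index test positive, reference standard positive
    fp: int  # index test positive, reference standard negative
    fn: int  # index test negative, reference standard positive
    tn: int  # index test negative, reference standard negative

    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)


def wilson_interval(successes: int, total: int, z: float = 1.96):
    """Wilson score interval for a binomial proportion (about 95% for z = 1.96)."""
    p = successes / total
    denom = 1 + z**2 / total
    centre = (p + z**2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return centre - half, centre + half


# Made-up counts for illustration only.
table = TwoByTwo(tp=45, fp=10, fn=5, tn=40)
sens = table.sensitivity()
spec = table.specificity()
sens_ci = wilson_interval(table.tp, table.tp + table.fn)
spec_ci = wilson_interval(table.tn, table.tn + table.fp)
print(f"sensitivity {sens:.2f} (95% CI {sens_ci[0]:.2f}-{sens_ci[1]:.2f})")
print(f"specificity {spec:.2f} (95% CI {spec_ci[0]:.2f}-{spec_ci[1]:.2f})")
```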

2.2.2 Exclusion criteria

Studies were excluded if any of the following criteria were met: (1) studies involving <10 patients; (2) review articles, guidelines, consensus statements, letters, editorials, and conference abstracts; (3) studies using only PI-RADSv1 for the evaluation of mpMRI of the prostate; (4) studies focusing on topics other than using the PI-RADSv2 system for diagnosing PCa (ie, staging and prediction of biochemical recurrence); and (5) studies with overlapping patient population.

Two reviewers (S.W. and C.H.S., with 3 yr of experience in performing systematic reviews and meta-analyses) independently evaluated the eligibility of the selected studies from the literature. Disagreements, if present between the two reviewers, were resolved by consensus via discussion with a third reviewer (S.Y.K.).

2.3 Data extraction and quality assessment

We extracted the following data regarding study design and results from the selected studies using a standardized form:

  1. Study characteristics: authors, year of publication, country and institution of origin, duration of patient recruitment, and study design (prospective vs retrospective and consecutive or not)

  2. Demographic and clinical characteristics: sample size, number of patients with PCa, patient age, prostate-specific antigen (PSA) level and Gleason score, number of previous biopsies, and PCa diagnosis prior to mpMRI

  3. Technical characteristics of mpMRI: scanner model and manufacturer, magnetic field strength (1.5 vs 3 T), coil type (endorectal vs pelvic phased array), and specific sequences used (T2WI, DWI, DCE, or MR spectroscopy)

  4. Interpretation of mpMRI: number of reviewers and experience in prostate mpMRI, independent or consensus reading, and blinding to clinicopathological information

  5. Reference standard: type of reference standard (radical prostatectomy, targeted biopsy, or systematic biopsy), interval between MRI and pathology, outcomes assessed (any PCa vs csPCa), definition of csPCa (studies assessing “clinically significant”, “aggressive”, or “high-grade” PCa were all considered to assess csPCa; however, only studies that used the definition provided by the PI-RADSv2 guideline, that is, Gleason score ≥7 [3 + 4], volume >0.5 ml, or extraprostatic extension, were considered not to have concern for applicability), separate analysis for the PZ and TZ, and type of analysis (per patient vs per lesion)

  6. Diagnostic performance of PI-RADSv2, including criteria or cutoff values (in case of multiple readers, the results of the most experienced reader were extracted for this meta-analysis)

The methodological quality of the included studies was assessed using tailored questionnaires and criteria provided by Quality Assessment of Diagnostic Accuracy Studies-2 [13] . Data extraction and quality assessment were performed independently by two reviewers (S.W. and C.H.S.). All disagreements were resolved by consensus through discussion with the third reviewer (S.Y.K.).

2.4 Data synthesis and analysis

The diagnostic performance of PI-RADSv2 for the detection of PCa was the primary outcome for this meta-analysis. In addition, a comparison between the diagnostic performance of PI-RADSv2 and that of PI-RADSv1 using studies that reported head-to-head comparison data of the two PI-RADS versions was considered a secondary outcome.

Pooled estimates of sensitivity and specificity were calculated using hierarchical logistic regression modeling including bivariate modeling and hierarchical summary receiver operating characteristic (HSROC) modeling [14] . For graphical presentation of the results, an HSROC curve with 95% confidence region and prediction region was plotted. Publication bias was evaluated using the Deeks’ funnel plot, and statistical significance was tested with the Deeks’ asymmetry test [15] .
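To illustrate the idea of pooling on the logit scale, the following sketch applies a univariate DerSimonian-Laird random-effects model to sensitivity alone. This is a deliberately simplified stand-in, not the bivariate or HSROC model used in this meta-analysis, which models sensitivity and specificity jointly and accounts for their correlation. The counts are made up for illustration, and the function names are hypothetical.

```python
import math


def inv_logit(x: float) -> float:
    return 1 / (1 + math.exp(-x))


def pool_logit_random_effects(events, totals):
    """DerSimonian-Laird pooling of proportions on the logit scale.

    Returns (pooled estimate, lower 95% limit, upper 95% limit).
    """
    y, v = [], []
    for e, n in zip(events, totals):
        # Adding 0.5 to both cells keeps the logit finite when e == 0 or e == n.
        pos, neg = e + 0.5, n - e + 0.5
        y.append(math.log(pos / neg))
        v.append(1 / pos + 1 / neg)

    # Fixed-effect weights and Cochran's Q.
    w = [1 / vi for vi in v]
    y_fixed = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    q = sum(wi * (yi - y_fixed) ** 2 for wi, yi in zip(w, y))

    # Method-of-moments estimate of the between-study variance.
    c = sum(w) - sum(wi**2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (len(y) - 1)) / c)

    # Random-effects weights, pooled logit, and its standard error.
    w_re = [1 / (vi + tau2) for vi in v]
    mu = sum(wi * yi for wi, yi in zip(w_re, y)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))
    return inv_logit(mu), inv_logit(mu - 1.96 * se), inv_logit(mu + 1.96 * se)


# Made-up true-positive counts and numbers of patients with PCa (illustration only).
true_positives = [45, 80, 30]
diseased = [50, 95, 33]
pooled_sens, lower, upper = pool_logit_random_effects(true_positives, diseased)
print(f"pooled sensitivity {pooled_sens:.2f} (95% CI {lower:.2f}-{upper:.2f})")
```

Pooling specificity in the same way would treat the two measures as independent, which is the main reason hierarchical models are preferred for diagnostic accuracy data.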

We performed meta-regression analyses to investigate the cause of heterogeneity. The following covariates were considered for the bivariate model: (1) proportion of patients with PCa (>50% vs ≤50%), (2) magnet strength of MRI (3 vs 1.5 T), (3) use of endorectal coil, (4) cutoff value (≥4 vs ≥3), (5) reference standard (radical prostatectomy vs biopsy), and (6) type of analysis (per patient vs per lesion). In addition, multiple subgroup analyses were performed for cutoff value, outcome, and previous biopsy history to assess various clinical settings: (1) a cutoff value of ≥4 for all studies, (2) a cutoff value of ≥3 for all studies, (3) a cutoff value of ≥4 for determining any PCa, (4) a cutoff value of ≥3 for determining any PCa, (5) a cutoff value of ≥4 for determining csPCa, (6) a cutoff value of ≥3 for determining csPCa, (7) a cutoff value of ≥4 in studies using per-patient analysis, (8) a cutoff value of ≥4 in studies using per-lesion analysis, (9) studies analyzing PZ PCa, (10) studies analyzing TZ PCa, (11) patients without previous biopsies, and (12) patients with previous biopsies. The “metandi” and “midas” modules in Stata 10.0 (StataCorp LP, College Station, TX, USA) and “mada” package in R software version 3.2.1 (R Foundation for Statistical Computing, Vienna, Austria) were used for statistical analyses, with p < 0.05 signifying statistical significance.

3.1 Literature search

A systematic literature search initially identified 287 articles. After removing 46 duplicates, screening of the 241 titles and abstracts yielded 105 potentially eligible articles. Full-text reviews were performed, and 84 studies were excluded for the following reasons: not in the field of interest (n = 80, including 68 studies that used only PI-RADSv1), insufficient data to reconstruct 2 × 2 tables (n = 2), and shared study population with other studies (n = 2). Ultimately, 21 original articles including a total of 3857 patients assessing the diagnostic performance of PI-RADSv2 were included in the meta-analysis [16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36]. No additional studies were identified via screening the bibliographies of these 21 studies. Among them, 15 studies including 3099 patients dealt with PI-RADSv2 alone, whereas six studies including 758 patients provided a head-to-head comparison between PI-RADSv1 and PI-RADSv2 [16 20 21 28 32 33]. The detailed study selection process is described in Fig. 1.
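The counts in the selection process can be checked with simple arithmetic. The sketch below uses only the numbers reported in the paragraph above; the variable names are hypothetical.

```python
# Counts reported in the text.
identified = 287
duplicates = 46
screened = identified - duplicates  # titles and abstracts screened

full_text_reviewed = 105
excluded = {
    "not in the field of interest": 80,
    "insufficient data to reconstruct 2 x 2 tables": 2,
    "shared study population": 2,
}
included = full_text_reviewed - sum(excluded.values())

# (number of studies, number of patients) for each group of included studies.
groups = {
    "PI-RADSv2 alone": (15, 3099),
    "head-to-head PI-RADSv1 vs PI-RADSv2": (6, 758),
}
total_studies = sum(n_studies for n_studies, _ in groups.values())
total_patients = sum(n_patients for _, n_patients in groups.values())

print(screened, included, total_studies, total_patients)
```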

Fig. 1
PRISMA flow diagram showing study selection process for meta-analysis. a Included original articles for qualitative and quantitative analyses are references 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 . MRI = magnetic resonance imaging; PI-RADS = Prostate Imaging Reporting and Data System; PRISMA = Preferred Reporting Items for Systematic Reviews and Meta-analyses.

3.2 Characteristics of included studies

Patient characteristics are shown in Table 1 . The size of the study population ranged from 49 to 456 patients, with the percentage of those with PCa ranging from 37% to 100%. The patients had a median age of 62–69.6 yr, median PSA of 3.97–15 ng/ml, and a Gleason score ranging from 5 to 10. Patients had already been diagnosed with PCa prior to MRI in all or some of the study populations in seven studies [16 17 22 26 27 32 35] . Biopsy was performed before MRI in seven studies [16 21 22 23 26 27 35] , all patients were biopsy-naïve in three studies [18 25 34] , both patient types were included in three studies [28 32 33] , and data regarding previous biopsy were not reported in the remaining eight studies.

First author (year) Origin Duration of patient recruitment Patients ( n ) Patients with PCa ( n ) Age (yr) PSA (ng/ml) Gleason score No. of previous biopsies PCa diagnosis before MRI
Country Institution Median Range Median Range Median Range
Auer (2016) [16] Austria Medical University of Innsbruck, Medizinische Hochschule Hannover NR 50 NR 63 a NR 7.3 a NR 7 (3 + 4) 6–10 ≥1 Yes (all)
Baldisserotto (2016) [17] Brazil Pontifícia Universidade Católica do Rio Grande do Sul 2013.6–2015.6 54 33 65.9 a 53–81 8.4 a 3–31 7 (3 + 4) 6–9 NR Yes (some)
De Visschere (2017) [18] Belgium Ghent University Hospital 2011.5–2014.12 245 144 66 44–85 9 1.4–935.5 7 (3 + 4) ≥6 0 No
El-Samei (2016) [19] Egypt Al-Azher University, El-Minia University 2014.5–2015.10 55 38 62 a 51–79 NR NR 7 6–10 NR No
Feng (2016) [20] China Tongji Hospital 2013.6–2015.7 401 150 64.4 a 34–88 10.7 0.2–1763 7 (4 + 3) ≤6–≥8 NR NR
Kasel-Seibert (2016) [21] Germany University Hospital Jena 2013.7–2015.3 82 31 65 48–81 13 1–111 7 (3 + 4) 6–9 1–5 No
Lin (2016) [22] Brazil, Taiwan Ribeirão Preto School of Medicine, China Medical University, Ribeirão Preto School of Medicine 2011.5–2014.6 49 49 63 46–73 13.27 1.8–41.4 ≥1 Yes (all)
Martorana (2016) [23] Italy University of Modena and Reggio Emilia, Perugia University 2014.1–2016.2 157 79 65 47–79 b 10.7 1–75 6 6–8 1–3 No
Mertan (2016) [24] USA National Cancer Institute 2015.3–2015.9 62 38 65.5 50.3–76.6 7.1 0.5–863 7 (3 + 4) 6–9 NR NR
Muller (2015) [25] USA, the Netherlands National Cancer Institute, AMC University Hospital, Edward Hébert School of Medicine 2011.12–2014.5 94 94 62 37–79 8.51 0.7–51.1 7 (3 + 4) 6–10 0 No
Park (2016) [26] South Korea Samsung Medical Center 2012.1–2014.12 456 456 65 46–81 3.97 3.8–4.2 NR ≥6 ≥1 Yes (all)
Park (2016) [27] South Korea Yonsei University College of Medicine 2012.1–2013.3 425 425 NR NR NR 1.4–156.9 5–6 5–10 ≥1 Yes (all)
Polanec (2016) [28] Vienna Medical University of Vienna, Confraternität Vienna 2011.6–2015.9 65 NR 65.3 a 62.3–87.4 10.8 a 4.2–74.5 7 (4 + 3) 6–9 ≥0 No
Rastinehad (2015) [29] USA Icahn School of Medicine at Mount Sinai, Fox Chase Cancer Center, National Institutes of Health, Hofstra North Shore LIJ School of Medicine 2012.2–2014.11 312 202 65.1 60.3–70.3 b 7.3 5.0–11.4 b NR NR NR No
Rosenkrantz (2016) [30] USA NYU Langone Medical Center 2013.9–2015.2 343 134 64 NR 5.8 NR 7 (3 + 4) 6–9 NR NR
Stanzione (2016) [31] Italy University “Federico II”, Ospedale S. Maria delle Grazie NR 82 34 65 a 43–84 8.8 a NR 7 (3 + 4) 6–9 NR No
Tan (2016) [32] USA, Taiwan UCLA, Cathay General Hospital, Hoag Hospital 2013.3–2016.12 106 63 66.5 43–79 7.9 5.6–10.6 7 (3 + 4) 6–10 ≥0 Yes (some)
Tewes (2016) [33] Germany Hannover Medical School, Klinikum der Region Hannover 2012.12–2014.12 54 31 69.6 a NR 8.7 a NR 6 6–9 ≥0 No
Washino (2017) [34] Japan Jichi Medical University Saitama Medical Center 2010.6–2014.4 288 159 69 64–74 b 7.5 5.5–11 b NR NR 0 No
Woo (2016) [35] South Korea Seoul National University College of Medicine 2011.1–2013.12 105 105 69 49–79 8.22 0.9–44.2 7 6–7 ≥1 Yes (all)
Zhao (2016) [36] China Peking University First Hospital 2010.11–2013.12 372 185 68.5 a NR 15 a NR 7 6–10 NR No

a Mean.

b Interquartile range.

MRI = magnetic resonance imaging; NR = not reported; PCa = prostate cancer; PSA = prostate-specific antigen.

Table 1 Patient characteristics

Characteristics of the studies are summarized in Table 2. MRI was performed using 3-T scanners in 16 studies [16 17 18 20 24 25 26 27 28 29 30 31 32 33 35 36], 1.5-T scanners in four studies [19 21 22 23], and either 3 or 1.5 T in one study [34]. Endorectal coils were used in seven studies [19 21 22 23 24 25 29]. In all studies, the mpMRI protocol consisted of T2WI, DWI, and DCE-MRI. The reference standard was radical prostatectomy in five studies [16 22 26 27 35], a combination of systematic and targeted biopsies in seven studies [17 20 23 25 29 34 36], and only targeted biopsy in seven studies [19 21 24 28 30 32 33]; the reference standard was not consistent throughout the study population in two studies [18 31]. PI-RADSv2 scoring was performed by one to five radiologists, either in consensus or independently. The level of experience of the radiologists was heterogeneous, ranging from 4 to 22 yr of experience in prostate imaging. In most studies, the readers were blinded; however, in three studies, the radiologists were aware that the patients had biopsy-proven PCa [22 26 35], and five studies were not explicit regarding blinding [18 19 30 32 36]. In the majority of the studies, the interval between MRI and the reference standard was less than 6 mo; however, the details were not reported in 10 studies [16 17 18 24 27 28 29 30 32 33]. PCa was separately assessed according to zonal anatomy in seven studies [20 24 25 28 30 31 33]. However, in one study [31], only PCa in the PZ could be evaluated, as no detailed data were provided in the article and the attempt to contact the authors for provision of further information was unsuccessful. Regarding the outcome assessed, eight studies evaluated any cancer [17 19 20 23 25 28 31 33], eight evaluated clinically significant cancer [18 22 26 27 30 34 35 36], and five evaluated both [16 21 24 29 32].
With regard to cutoff values, 13 studies used ≥4 [16 18 19 20 21 24 25 26 27 28 30 31 35], four studies used ≥3 [32 33 34 36], and four studies used both [17 22 23 29]. The location of PCa was separately reported for the PZ and TZ in six studies [20 24 25 28 30 33].

Study Reader MRI
First author (year) Design Consecutive enrollment Reference standard MRI–reference standard interval No. of readers Experience (yr) Blinding Magnet strength (T) Vendor Model Endorectal coil PI-RADSv2 application Cutoff values Localization Type of analysis Outcome assessed Definition of csPCa
Auer (2016) [16] Retrospective Yes RP NR 1 >5 Yes 3 Siemens Skyra No Strict 4 Whole Lesion Any + csPCa GS ≥7 (4 + 3)
Baldisserotto (2016) [17] Retrospective Yes STRUSGB + targeted MRI–TRUS biopsy (cognitive) or RP NR 2
Independent
>10/1 Yes 3 GE SignaHDxt No Strict 3, 4 Whole Patient Any
De Visschere (2017) [18] Retrospective NR STRUSGB or RP NR NR NR NR 3 Siemens Trio No Strict 4 Whole Patient csPCa GS ≥7, ≥0.5 cc, or EPE
El-Samei (2016) [19] NR NR Targeted MRI–TRUS biopsy ≤2 wk NR NR NR 1.5 Philips Gyroscan Yes Strict 4 Whole Lesion Any
Feng (2016) [20] Retrospective Yes STRUSGB + targeted MRI–TRUS biopsy (cognitive) 1–35 d 2
Consensus
5/4 Yes 3 Siemens Skyra No Strict 4 PZ, TZ Patient Any
Kasel-Seibert (2016) [21] Retrospective NR Targeted MRGB ≤7 d 2
Independent
10/<1 Yes 1.5 Siemens Avanto Yes Strict 4 Whole Lesion Any + csPCa GS ≥7
Lin (2016) [22] Retrospective NR RP ≤6 mo 2
Independent
12/5 Yes a 1.5 Philips Achieva Yes Strict 3, 4 Whole Lesion csPCa GS ≥7 and >0.5 cc
Martorana (2016) [23] Retrospective Yes TTB + targeted MRI–TRUS biopsy 30–78 2
Consensus
4/4 Yes 1.5 Philips Achieva Yes Strict 3, 4 Whole Lesion Any GS ≥7, >0.5 cc, or EPE
Mertan (2016) [24] Prospective Yes STRUSGB + targeted MRI–TRUS biopsy NR 1 >8 Yes 3 Philips Achieva Yes Strict 4 PZ, TZ Lesion Any + csPCa GS ≥7
Muller (2015) [25] Retrospective Yes STRUSGB + targeted MRI–TRUS biopsy ≤6 wk 5
Independent
12/7/1/1/0.5 Yes 3 Philips Achieva Yes Strict 4 PZ, TZ Lesion Any GS ≥7 (4 + 3)
Park (2016) [26] Retrospective Yes RP 21–48 2
Independent
14/3 Yes a 3 Philips Achieva No Strict 4 Whole Patient csPCa GS ≥7, ≥0.5 cc, or EPE
Park (2016) [27] Retrospective NR RP NR 2
Independent
9/4 Yes 3 GE, Philips, Siemens DiscoveryMR750, Achieva, Trio No Strict 4 Whole Patient csPCa GS ≥7, ≥0.5 cc, or EPE
Polanec (2016) [28] Retrospective Yes Targeted MRGB NR 2
Independent
NR/NR Yes 3 Siemens Trio No Strict 4 PZ, TZ Lesion Any
Rastinehad (2015) [29] Prospective NR STRUSGB + targeted MRI–TRUS biopsy NR 3
Consensus
NR Yes 3 Siemens Verio Yes Calculated b 3, 4 Whole Patient Any + csPCa Epstein's criteria c for systematic and GS ≥7 or >0.5 cc for targeted biopsy
Rosenkrantz (2016) [30] Retrospective NR Targeted MRI–TRUS biopsy NR 2
Independent
8/3 NR 3 Siemens Biograph mMR/Prisma/Skyra/Trio No Strict 4 PZ, TZ Lesion csPCa GS ≥7
Stanzione (2016) [31] Prospective NR STRUSGB (±targeted TRUSGB) 20–30 d 2
Independent
14/10 Yes 3 Siemens Trio No Strict 4 PZ Patient Any
Tan (2016) [32] Retrospective Yes Targeted MRGB NR 3
Consensus
16/10 NR 3 Siemens Skyra, Trio, Verio No Strict 3 Whole Lesion Any + csPCa GS ≥7
Tewes (2016) [33] Retrospective Yes Targeted MRGB NR 2
Independent
5/2 Yes 3 Siemens Skyra No Strict 3 d PZ, TZ Patient Any
Washino (2017) [34] Retrospective NR TTB + targeted MRI–TRUS biopsy (cognitive) 0.5–1.6 mo 1 14 Yes 1.5 or 3 Toshiba Excelart Vantage/Vantage Titan 3T No Strict 3 Whole Patient csPCa GS ≥7 or maximum core length ≥4 mm
Woo (2016) [35] Retrospective NR RP ≤6 mo 2
Consensus
22/10 Yes a 3 Philips, Siemens Ingenia, Verio/Trio No Strict 4 Whole Patient csPCa GS ≥7
Zhao (2016) [36] Retrospective NR STRUSGB + targeted MRI–TRUS biopsy ≤3 mo 2
Independent
NR NR 3 NR NR No Strict 3 Whole Patient csPCa GS ≥7

a Blinded but aware that patients had biopsy-proven PCa.

b PI-RADSv2 scores were generated from reports based on PI-RADSv1 and simplified qualitative system.

c Epstein's criteria = Gleason pattern ≥4, or Gleason 3 + 3 disease with core length ≥50% and/or >2 cores positive.

d For transition zone, cutoff value = 4.

csPCa = clinically significant prostate cancer; GS = Gleason score; EPE = extraprostatic extension; MRGB = magnetic resonance imaging-guided biopsy; MRI = magnetic resonance imaging; MRI–TRUS = fusion of magnetic resonance imaging and transrectal ultrasound images; NR = not reported; PCa = prostate cancer; PI-RADSv2 = Prostate Imaging Reporting and Data System version 2; PZ = peripheral zone; RP = radical prostatectomy; STRUSGB = systematic transrectal ultrasound-guided biopsy; TTB = transperineal template biopsy; TZ = transition zone.

Table 2 Study characteristics

3.3 Quality assessment

Overall, the quality of the studies was not considered high, mainly due to the patient selection domain (Fig. 2). Regarding the patient selection domain, there was generally a high risk of bias as all but four of the studies were retrospective in nature [16 17 18 20 21 22 23 25 26 27 28 30 32 33 34 35 36]. Seven studies were considered to have high concern for applicability, as all or some of the patients had a pathological diagnosis of PCa prior to MRI [16 17 22 26 27 32 35]. Regarding the index test domain, there was a high risk of bias in nine studies. In three of the nine studies, reviewers were aware that patients had biopsy-proven PCa [22 26 35]. In the other six studies, the cutoff value for determining PCa was not specified prior to interpretation [16 20 29 31 33 34]. Only one study had concern for applicability, as PI-RADSv2 scores were indirectly generated from existing clinical radiological reports based on PI-RADSv1 or an in-house scoring system [29]. Regarding the reference standard domain, eight studies had a high risk of bias. Seven were based on either only systematic biopsy or targeted biopsy [18 19 21 28 30 32 33]; in one study, targeted biopsy was performed, but on lesions that were suspicious on ultrasonography, and not on MRI [31]. Those in which radical prostatectomy or a systematic plus targeted biopsy (MRI guided, MRI–transrectal ultrasound fusion, or cognitive) was used as the reference standard were considered to have a low risk of bias. In 10 studies, the definition of clinically significant cancer did not abide by those described in the PI-RADSv2 guidelines, and therefore showed high concern for applicability [16 21 22 24 29 30 32 34 35 36]. Regarding the flow and timing domain, two studies had a high risk of bias as patients did not receive the same reference standard [17 18].

Fig. 2
Grouped bar charts show (A) risk of bias and (B) concerns for applicability of 21 included studies using QUADAS-2. QUADAS-2 = Quality Assessment of Diagnostic Accuracy Studies-2.

3.4 Diagnostic accuracy of PI-RADSv2

The sensitivity and specificity of individual studies were 73–100% and 8–100%, respectively. The Q-test revealed that substantial heterogeneity was present (p < 0.001). The Higgins I² statistic demonstrated substantial heterogeneity in terms of the sensitivity (I² = 85.55%) and considerable heterogeneity in terms of the specificity (I² = 95.30%). The coupled forest plot of the sensitivity and specificity demonstrated the absence of a threshold effect (Fig. 3). The Spearman correlation coefficient between the sensitivity and the false-positive rate was 0.45 (95% confidence interval [CI] 0.023–0.738), also indicating the lack of a threshold effect.
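As a minimal sketch of how the Higgins I² statistic relates to Cochran's Q, the following Python function applies the standard definition I² = max(0, (Q − df)/Q) × 100%, where df is the number of studies minus one. The function name is illustrative, and the Q value in the usage line is a hypothetical placeholder, not a value reported in this meta-analysis.

```python
def higgins_i2(q: float, df: int) -> float:
    """Return Higgins I^2 (in percent) from Cochran's Q and its degrees of freedom."""
    if q <= 0:
        # No observed between-study variation: I^2 is defined as 0.
        return 0.0
    # Share of total variation attributable to heterogeneity, floored at 0.
    return max(0.0, (q - df) / q) * 100.0


# Hypothetical example: Q = 80 across 21 studies (df = 20); not data from this review.
print(f"I^2 = {higgins_i2(80.0, 20):.1f}%")
```

Under this definition, a Q no larger than its degrees of freedom gives I² = 0, and larger values of Q relative to df push I² toward 100%.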

Fig. 3
Coupled forest plots of pooled sensitivity and specificity. Numbers are pooled estimates with 95% CI in parentheses. Corresponding heterogeneity statistics are provided at bottom right corners. Horizontal lines indicate 95% CIs. CI = confidence interval.

For all 21 studies combined, the pooled sensitivity was 0.89 (95% CI 0.86–0.92) with specificity of 0.73 (95% CI 0.60–0.83; Fig. 3). In the HSROC curve, there was a large difference between the 95% confidence region and the 95% prediction region, thus indicating heterogeneity between the studies (Fig. 4). The area under the HSROC curve was 0.91 (95% CI 0.88–0.93). According to Deeks’ funnel plot, the likelihood of publication bias was low, with a p value of 0.75 for the slope coefficient (Fig. 5).
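To make the pooled estimates easier to interpret, the sketch below derives likelihood ratios and a diagnostic odds ratio from the pooled point estimates of sensitivity (0.89) and specificity (0.73). These derived quantities are not reported in the article; they are simple arithmetic on the point estimates and may differ from values that a hierarchical model would produce. The function name is illustrative.

```python
from typing import Tuple


def likelihood_ratios(sensitivity: float, specificity: float) -> Tuple[float, float, float]:
    """Return (LR+, LR-, diagnostic odds ratio) for a binary test."""
    lr_pos = sensitivity / (1.0 - specificity)   # how much a positive result raises the odds
    lr_neg = (1.0 - sensitivity) / specificity   # how much a negative result lowers the odds
    return lr_pos, lr_neg, lr_pos / lr_neg


# Pooled point estimates for all 21 studies, taken from the text above.
lr_pos, lr_neg, dor = likelihood_ratios(0.89, 0.73)
print(f"LR+ = {lr_pos:.2f}, LR- = {lr_neg:.2f}, DOR = {dor:.1f}")
```

On these point estimates, the positive likelihood ratio is roughly 3.3 and the negative likelihood ratio roughly 0.15, which fits the pattern of high sensitivity with more moderate specificity.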

Fig. 4
Hierarchical summary receiver operating characteristic curve of the diagnostic performance of PI-RADSv2 for detecting prostate cancer. HSROC = hierarchical summary receiver operating characteristic; PI-RADSv2 = Prostate Imaging Reporting and Data System version 2.

Fig. 5
Deeks’ funnel plot. A p value of 0.75 suggests that the likelihood of publication bias is low.

For the six studies that provided a head-to-head comparison between PI-RADSv1 and PI-RADSv2, PI-RADSv2 demonstrated higher pooled sensitivity of 0.95 (95% CI 0.85–0.98) compared with 0.88 (95% CI 0.80–0.93) for PI-RADSv1 (p = 0.04). However, the pooled specificity did not show a significant difference between the two versions of PI-RADS: 0.73 (95% CI 0.47–0.89) for v2 and 0.75 (95% CI 0.36–0.94) for v1 (p = 0.90).

3.5 Heterogeneity exploration

As we found considerable heterogeneity among the included studies, meta-regression analyses were performed (Supplementary Table 1). Among several potential variables, proportion of patients with PCa, magnetic field strength, and reference standard were significant factors affecting the heterogeneity (p < 0.01 for all). However, among these three, only specificity according to the proportion of patients with PCa showed a statistically significant and clinically meaningful difference: 0.65 (95% CI 0.52–0.78) in studies with ≤50% of patients with PCa versus 0.86 (95% CI 0.75–0.97) in studies with >50% of patients with PCa. Otherwise, no clinically meaningful differences were seen: for magnet strength (3 vs 1.5 T), sensitivity of 0.90 (95% CI 0.86–0.94) versus 0.89 (95% CI 0.81–0.97, p = 0.03) and specificity of 0.73 (95% CI 0.59–0.86) versus 0.72 (95% CI 0.44–1.00, p = 0.81); for reference standard (radical prostatectomy vs biopsy), sensitivity of 0.89 (95% CI 0.83–0.95) versus 0.91 (95% CI 0.88–0.94, p < 0.01) and specificity of 0.65 (95% CI 0.37–0.94) versus 0.73 (95% CI 0.58–0.87, p = 0.48). Other variables, including the cutoff value, use of endorectal coil, and type of analysis (per patient vs per lesion), were not significant factors (p = 0.32–0.70).

3.6 Subgroup analysis

As four studies used both ≥3 and ≥4 as cutoff values [17 22 23 29], and five studies determined both any cancer and clinically significant cancer as outcomes [16 21 24 29 32], multiple subgroup analyses were performed in order to assess various clinical settings (Supplementary Table 2). Regarding cutoff values, the pooled sensitivity was 0.89 (95% CI 0.84–0.92) with specificity of 0.74 (95% CI 0.58–0.85) for 17 studies using ≥4 [16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 35], whereas these were 0.95 (95% CI 0.89–0.97) and 0.47 (95% CI 0.21–0.74) in eight studies using ≥3 [17 22 23 29 32 33 34 36]. When we stratified studies according to the outcome assessed, the results were as follows: (1) cutoff of ≥4 for determining any PCa, sensitivity of 0.89 (95% CI 0.83–0.93) with specificity of 0.80 (95% CI 0.62–0.90); (2) cutoff of ≥3 for determining any PCa, sensitivity of 0.96 (95% CI 0.93–0.98) with specificity of 0.49 (95% CI 0.29–0.70); (3) for determining csPCa regardless of cutoff values, sensitivity of 0.89 (95% CI 0.84–0.92) with specificity of 0.64 (95% CI 0.46–0.78); (4) cutoff of ≥4 for determining csPCa, sensitivity of 0.90 (95% CI 0.85–0.94) with specificity of 0.62 (95% CI 0.45–0.77); and (5) cutoff of ≥3 for determining csPCa, sensitivity of 0.96 (95% CI 0.87–0.99) with specificity of 0.29 (95% CI 0.05–0.77). When the studies using a cutoff value of ≥4 were separately assessed according to the type of analysis, per-patient analysis in eight studies [17 18 20 26 27 29 31 35] yielded pooled sensitivity of 0.89 (95% CI 0.81–0.93) with specificity of 0.76 (95% CI 0.60–0.88), whereas per-lesion analysis in nine studies [16 19 21 22 23 24 25 28 30] yielded pooled sensitivity of 0.87 (95% CI 0.83–0.91) with specificity of 0.70 (95% CI 0.44–0.88).

Based on the localization of PCa, the pooled sensitivity was 0.93 (95% CI 0.87–0.96) with specificity of 0.68 (95% CI 0.43–0.86) in seven studies analyzing PZ cancers [20 24 25 28 30 31 33]. In the six of these studies that also analyzed TZ cancers (all except that by Stanzione et al [31]), the pooled sensitivity and specificity were 0.88 (95% CI 0.77–0.94) and 0.75 (95% CI 0.59–0.86), respectively.

Studies including patients without previous biopsies yielded sensitivity of 0.82 (95% CI 0.72–0.90) and specificity of 0.75 (95% CI 0.65–0.83), whereas the diagnostic performance values were 0.87 (95% CI 0.80–0.92) and 0.71 (95% CI 0.42–0.89) in studies including patients with a history of previous biopsy.

3.7 Discussion

In our meta-analysis, we assessed the diagnostic accuracy of PI-RADSv2 for detecting PCa. The pooled sensitivity and specificity of all 21 studies were 0.89 (95% CI 0.86–0.92) and 0.73 (95% CI 0.60–0.83), respectively. When comparing our data with the only two existing meta-analyses using mpMRI for detecting PCa, a trend toward higher sensitivity and lower specificity can be inferred. In the study by de Rooij et al [37], which evaluated seven studies using a combination of T2WI, DWI, and DCE-MRI, the pooled sensitivity and specificity were 0.74 (95% CI 0.66–0.81) and 0.88 (95% CI 0.82–0.92), respectively. In a more recent meta-analysis by Hamoen et al [6], which analyzed 14 studies using PI-RADSv1, the pooled sensitivity and specificity were 0.78 (95% CI 0.70–0.84) and 0.79 (95% CI 0.68–0.86), respectively. However, setting the three meta-analyses side by side provides only an indirect comparison. In order to address this issue, we separately assessed a subgroup of studies using both PI-RADSv1 and PI-RADSv2. In a head-to-head comparison between them, PI-RADSv2 demonstrated higher pooled sensitivity (0.95) compared with PI-RADSv1 (0.88, p = 0.04) without a statistically significant difference in specificity (0.73 vs 0.75, p = 0.90). This increase in sensitivity compared with its predecessor may imply that the revisions undertaken during the development of PI-RADSv2, including the introduction of dominant sequences according to zonal anatomy, limited contribution of DCE-MRI secondary to DWI and T2WI, and specific guidelines for deriving an integrated overall score, were, in fact, on the right track. In particular, we speculate that the use of dominant sequences, that is, DWI for the PZ and T2WI for the TZ, may have been crucial for the improved sensitivity without a loss in specificity, as suggested by Baur et al [10].

Considering that one of the main intentions for the generation of PI-RADS was to standardize reporting of mpMRI in order to decrease variability and bring about widespread acceptance and implementation in daily practice, it was promising to find that nearly all (20 of 21) studies used PI-RADSv2 strictly according to published guidelines [11]. Only one study formed PI-RADSv2 scores from existing clinical radiological reports that were based on PI-RADSv1 or an in-house scoring system [29]. This is an improvement when compared with prior studies conducted using PI-RADSv1, where investigators used varying methods in determining the overall score (overall five-point score or sum of the scores from each modality) [6]. Still, there is a need for further clarification regarding the cutoff value for detecting PCa. In the studies included in our meta-analysis, cutoff values were predefined in only six studies, while the majority (15/21) were exploratory in nature, testing multiple criteria. When using a cutoff value of ≥4, sensitivity (0.89) and specificity (0.74) were generally good, whereas using ≥3 yielded excellent sensitivity (0.95) and poor specificity (0.47). These results may be taken into consideration when generating the next updated PI-RADS. For instance, using the former may be adequate for general use of PI-RADS, whereas the latter could be proposed when a higher cancer detection rate is clinically required (eg, persistently high PSA level despite a previously negative biopsy).
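The trade-off between the two cutoffs can be illustrated with a small sketch that converts the pooled sensitivity and specificity into expected true-positive, false-negative, false-positive, and true-negative counts per 1000 patients. The prevalence of 40% used here is a purely hypothetical assumption chosen for illustration (the included studies ranged from 37% to 100%), and the function name is illustrative; this is not an analysis performed in the article.

```python
def expected_counts(sensitivity, specificity, prevalence, n=1000):
    """Expected 2x2 cell counts for n patients at a given disease prevalence."""
    diseased = n * prevalence
    healthy = n - diseased
    tp = sensitivity * diseased   # cancers flagged by the cutoff
    fn = diseased - tp            # cancers missed
    tn = specificity * healthy    # non-cancers correctly not flagged
    fp = healthy - tn             # non-cancers flagged
    return {"TP": tp, "FN": fn, "FP": fp, "TN": tn}


ASSUMED_PREVALENCE = 0.40  # hypothetical value, not taken from the article

# Pooled estimates for each cutoff, taken from the text above.
for label, sens, spec in [(">=4", 0.89, 0.74), (">=3", 0.95, 0.47)]:
    counts = expected_counts(sens, spec, ASSUMED_PREVALENCE)
    print(label, {k: round(v) for k, v in counts.items()})
```

Under this assumed prevalence, lowering the cutoff from ≥4 to ≥3 would miss fewer cancers at the cost of substantially more false-positive findings, which is the pattern described above; the absolute numbers shift with the prevalence chosen.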

In the current study, subgroup analyses were performed to account for differences in outcomes (any cancer vs clinically significant cancer). There was no significant difference between the two outcomes, irrespective of whether the criterion of ≥3 or ≥4 was used. However, the definition of clinically significant cancer was different among the 13 studies. Only three studies defined csPCa strictly according to the PI-RADSv2 guidelines (Gleason score ≥7 [3 + 4], volume ≥0.5 ml, or extraprostatic extension) [11]. Most others used one or two of the three criteria. Including only the former three studies may have provided more robust results; yet it was not only pragmatic to include all available studies, but this approach would present a general overview of the existing literature, as it is the first meta-analysis of studies currently dealing with PI-RADSv2.

In this meta-analysis, we looked into the technical aspects of MRI. Meta-regression analyses revealed that the use of endorectal coil was not a statistically significant factor. Furthermore, although magnet strength showed statistically significant differences between 3 and 1.5 T, this did not prove to be clinically meaningful (sensitivity of 0.90 vs 0.89, respectively; p = 0.03). Although there had been debate over these two issues in the past, both 3 and 1.5 T are now well established, and the overall benefit of using an endorectal coil is not evident [38 39]. The PI-RADSv2 guidelines currently recommend either usage, and the results of our study provide additional evidence to support this.

Regarding the methods of analysis in the studies, there was significant heterogeneity regarding reference standard and type of analysis. Radical prostatectomy was the reference standard in five studies, while the majority were based on a combination of systematic and targeted biopsies. The possibility of PCa despite negative biopsy results in the latter group should be kept in mind. In addition, outcomes were reported in a per-patient manner in approximately half of the studies (n = 11) and in a per-lesion manner in the other half (n = 10). Per-lesion analysis is known to take into account the performance of localizing the disease; however, this was not shown to be a significant factor in the meta-regression analysis.

Our meta-analysis had some limitations. Nearly all studies were retrospective in study design, resulting in a high risk of bias for patient selection. It is possible that pooling data from predominantly retrospective studies may have led to increased diagnostic sensitivity [40]. However, not only was performing a meta-analysis using only three prospective studies technically unfeasible, but the derived results would also not be representative of the existing literature on PI-RADSv2. Nevertheless, we used validated methods for the systematic review and reported the data using standard reporting guidelines, including PRISMA and the guidelines of the Handbook for Diagnostic Test Accuracy Reviews published by the Cochrane Collaboration [12 41]. Another limitation is the considerable heterogeneity in our pooled analysis, which affected the general applicability of our summary estimates. To explore the heterogeneity of our data, we performed meta-regression and multiple subgroup analyses. According to the analyses, the proportion of patients with PCa, the magnetic field strength, and the reference standard were significant factors affecting the heterogeneity. In particular, the reference standard included various methods, including radical prostatectomy and a combination of systematic and targeted biopsies (ie, MRI guided, MRI–transrectal ultrasound fusion, or cognitive). Furthermore, the fact that various definitions were used for clinically significant cancer needs to be emphasized. Our meta-regression and subgroup analyses may explain some of the heterogeneity, but a portion remains unexplained. Another important limitation is the small number of included studies for head-to-head comparison between PI-RADSv1 and PI-RADSv2. However, we were able to demonstrate a statistically significant difference in sensitivity between the two versions using the only six such studies published to date.

PI-RADSv2 shows good performance for the detection of PCa with pooled sensitivity of 0.89 and specificity of 0.73. PI-RADSv2 has higher pooled sensitivity compared with PI-RADSv1 without significantly different specificity.

Author contributions: Sang Youn Kim had full access to all the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis.

Study concept and design: Woo, Suh, S.Y. Kim.

Acquisition of data: Woo, Suh, S.Y. Kim.

Analysis and interpretation of data: Woo, Suh, S.Y. Kim.

Drafting of the manuscript: Woo.

Critical revision of the manuscript for important intellectual content: Suh, S.Y. Kim, Cho, S.H. Kim.

Statistical analysis: Suh.

Obtaining funding: None.

Administrative, technical, or material support: None.

Supervision: S.Y. Kim, Cho, S.H. Kim.

Other: None.

Financial disclosures: Sang Youn Kim certifies that all conflicts of interest, including specific financial interests and relationships and affiliations relevant to the subject matter or materials discussed in the manuscript (eg, employment/affiliation, grants or funding, consultancies, honoraria, stock ownership or options, expert testimony, royalties, or patents filed, received, or pending), are the following: None.

Funding/Support and role of the sponsor: None.

  • [1] American Cancer Society. Key statistics for prostate cancer. http://www.cancer.org/cancer/prostatecancer/detailedguide/prostate-cancer-key-statistics .
  • [2] C.M. Hoeks ,J.O. Barentsz ,T. Hambrock ,et al. Prostate cancer: multiparametric MR imaging for detection, localization, and staging. Radiology. 2011;261 :46-66 Crossref
  • [3] O. Ruprecht ,P. Weisser ,B. Bodelle ,H. Ackermann ,T.J. Vogl. MRI of the prostate: interobserver agreement compared with histopathologic outcome after radical prostatectomy. Eur J Radiol. 2012;81 :456-460 Crossref
  • [4] L. Dickinson ,H.U. Ahmed ,C. Allen ,et al. Scoring systems used for the interpretation and reporting of multiparametric MRI for prostate cancer detection, localization, and characterization: could standardization lead to improved utilization of imaging within the diagnostic pathway?. J Magn Reson Imaging. 2013;37 :48-58 Crossref
  • [5] J.O. Barentsz ,J. Richenberg ,R. Clements ,et al. ESUR prostate MR guidelines 2012. Eur Radiol. 2012;22 :746-757 Crossref
  • [6] E.H. Hamoen ,M. de Rooij ,J.A. Witjes ,J.O. Barentsz ,M.M. Rovers. Use of the Prostate Imaging Reporting and Data System (PI-RADS) for prostate cancer detection with multiparametric magnetic resonance imaging: a diagnostic meta-analysis. Eur Urol. 2015;67 :1112-1121 Crossref
  • [7] M. Abd-Alazeez ,H.U. Ahmed ,M. Arya ,et al. The accuracy of multiparametric MRI in men with negative biopsy and elevated PSA level—can it rule out clinically significant prostate cancer?. Urol Oncol. 2014;32 45 e17–22
  • [8] A.B. Rosenkrantz ,S. Kim ,R.P. Lim ,et al. Prostate cancer localization using multiparametric MR imaging: comparison of Prostate Imaging Reporting and Data System (PI-RADS) and Likert scales. Radiology. 2013;269 :482-492
  • [9] B.G. Hansford ,Y. Peng ,Y. Jiang ,et al. Dynamic contrast-enhanced MR imaging curve-type analysis: is it helpful in the differentiation of prostate cancer from healthy peripheral zone?. Radiology. 2015;275 :448-457 Crossref
  • [10] A.D. Baur ,A. Maxeiner ,T. Franiel ,et al. Evaluation of the Prostate Imaging Reporting and Data System for the detection of prostate cancer by the results of targeted biopsy of the prostate. Invest Radiol. 2014;49 :411-420 Crossref
  • [11] J.C. Weinreb ,J.O. Barentsz ,P.L. Choyke ,et al. PI-RADS Prostate Imaging-Reporting and Data System: 2015, version 2. Eur Urol. 2016;69 :16-40 Crossref
  • [12] A. Liberati ,D.G. Altman ,J. Tetzlaff ,et al. The PRISMA statement for reporting systematic reviews and meta-analyses of studies that evaluate health care interventions: explanation and elaboration. J Clin Epidemiol. 2009;62 :e1-e34 Crossref
  • [13] P. Whiting ,A.W. Rutjes ,J.B. Reitsma ,P.M. Bossuyt ,J. Kleijnen. The development of QUADAS: a tool for the quality assessment of studies of diagnostic accuracy included in systematic reviews. BMC Med Res Methodol. 2003;3 :25 Crossref
  • [14] C.H. Suh ,S.H. Park. Successful publication of systematic review and meta-analysis of studies evaluating diagnostic test accuracy. Korean J Radiol. 2016;17 :5-6 Crossref
  • [15] J.J. Deeks ,P. Macaskill ,L. Irwig. The performance of tests of publication bias and other sample size effects in systematic reviews of diagnostic test accuracy was assessed. J Clin Epidemiol. 2005;58 :882-893 Crossref
  • [16] T. Auer ,M. Edlinger ,J. Bektic ,et al. Performance of PI-RADS version 1 versus version 2 regarding the relation with histopathological results. World J Urol. 2017;35 :687-693 Crossref
  • [17] M. Baldisserotto ,E.J. Neto ,G. Carvalhal ,et al. Validation of PI-RADS v.2 for prostate cancer diagnosis with MRI at 3T using an external phased-array coil. J Magn Reson Imaging. 2016;44 :1354-1359 Crossref
  • [18] P. De Visschere ,N. Lumen ,P. Ost ,K. Decaestecker ,E. Pattyn ,G. Villeirs. Dynamic contrast-enhanced imaging has limited added value over T2-weighted imaging and diffusion-weighted imaging when using PI-RADSv2 for diagnosis of clinically significant prostate cancer in patients with elevated PSA. Clin Radiol. 2017;72 :23-32 Crossref
  • [19] El-Samei HAE-KA ,M.F. Amin ,E.E. Hassan. Assessment of the accuracy of multi-parametric MRI with PI-RADS 2.0 scoring system in the discrimination of suspicious prostatic focal lesions. Egypt J Radiol Nucl Med. 2016;47 :1075-1082
  • [20] Z.Y. Feng ,L. Wang ,X.D. Min ,S.G. Wang ,G.P. Wang ,J. Cai. Prostate cancer detection with multiparametric magnetic resonance imaging: Prostate Imaging Reporting and Data System version 1 versus version 2. Chin Med J. 2016;129 :2451-2459
  • [21] M. Kasel-Seibert ,T. Lehmann ,R. Aschenbach ,et al. Assessment of PI-RADS v2 for the detection of prostate cancer. Eur J Radiol. 2016;85 :726-731 Crossref
  • [22] W.C. Lin ,V.F. Muglia ,G.E. Silva ,S. Chodraui Filho ,R.B. Reis ,A.C. Westphalen ,M.R.I. Multiparametric. of the prostate: diagnostic performance and interreader agreement of two scoring systems. Br J Radiol. 2016;89 :20151056 Crossref
  • [23] E. Martorana, G.M. Pirola, M. Scialpi, et al. Lesion volume predicts prostate cancer risk and aggressiveness: validation of its value alone and matched with Prostate Imaging Reporting and Data System score. BJU Int. In press. doi:10.1111/bju.13649
  • [24] F.V. Mertan, M.D. Greer, J.H. Shih, et al. Prospective evaluation of the Prostate Imaging Reporting and Data System version 2 for prostate cancer detection. J Urol. 2016;196:690-696
  • [25] B.G. Muller, J.H. Shih, S. Sankineni, et al. Prostate cancer: interobserver agreement and accuracy with the revised Prostate Imaging Reporting and Data System at multiparametric MR imaging. Radiology. 2015;277:741-750
  • [26] J.J. Park, B.K. Park. Role of PI-RADSv2 with multiparametric MRI in determining who needs active surveillance or definitive treatment according to PRIAS. J Magn Reson Imaging. 2017;45:1753-1759
  • [27] S.Y. Park, D.C. Jung, Y.T. Oh, et al. Prostate cancer: PI-RADS version 2 helps preoperatively predict clinically significant cancers. Radiology. 2016;280:108-116
  • [28] S. Polanec, T.H. Helbich, H. Bickel, et al. Head-to-head comparison of PI-RADS v2 and PI-RADS v1. Eur J Radiol. 2016;85:1125-1131
  • [29] A.R. Rastinehad, N. Waingankar, B. Turkbey, et al. Comparison of multiparametric MRI scoring systems and the impact on cancer detection in patients undergoing MR US fusion guided prostate biopsies. PLoS One. 2015;10:e0143404
  • [30] A.B. Rosenkrantz, J.S. Babb, S.S. Taneja, J.M. Ream. Proposed adjustments to PI-RADS version 2 decision rules: impact on prostate cancer detection. Radiology. In press. doi:10.1148/radiol.2016161124
  • [31] A. Stanzione, M. Imbriaco, S. Cocozza, et al. Biparametric 3T magnetic resonance imaging for prostatic cancer detection in a biopsy-naive patient population: a further improvement of PI-RADS v2?. Eur J Radiol. 2016;85:2269-2274
  • [32] N. Tan, W.C. Lin, P. Khoshnoodi, et al. In-Bore 3-T MR-guided transrectal targeted prostate biopsy: Prostate Imaging Reporting and Data System version 2-based diagnostic performance for detection of prostate cancer. Radiology. In press. doi:10.1148/radiol.2016152827
  • [33] S. Tewes, N. Mokov, D. Hartung, et al. Standardized reporting of prostate MRI: comparison of the Prostate Imaging Reporting and Data System (PI-RADS) version 1 and version 2. PLoS One. 2016;11:e0162879
  • [34] S. Washino, T. Okochi, K. Saito, et al. Combination of PI-RADS score and PSA density predicts biopsy outcome in biopsy naive patients. BJU Int. 2017;119:225-233
  • [35] S. Woo, S.Y. Kim, J. Lee, S.H. Kim, J.Y. Cho. PI-RADS version 2 for prediction of pathological downgrading after radical prostatectomy: a preliminary study in patients with biopsy-proven Gleason Score 7 (3 + 4) prostate cancer. Eur Radiol. 2016;26:3580-3587
  • [36] C. Zhao, G. Gao, D. Fang, et al. The efficiency of multiparametric magnetic resonance imaging (mpMRI) using PI-RADS Version 2 in the diagnosis of clinically significant prostate cancer. Clin Imaging. 2016;40:885-888
  • [37] M. de Rooij, E.H. Hamoen, J.J. Futterer, J.O. Barentsz, M.M. Rovers. Accuracy of multiparametric MRI for prostate cancer detection: a meta-analysis. Am J Roentgenol. 2014;202:343-351
  • [38] Z.K. Shah, S.N. Elias, R. Abaza, et al. Performance comparison of 1.5-T endorectal coil MRI with 3.0-T nonendorectal coil MRI in patients with prostate cancer. Acad Radiol. 2015;22:467-474
  • [39] Y. Mazaheri, H.A. Vargas, G. Nyman, O. Akin, H. Hricak. Image artifacts on prostate diffusion-weighted magnetic resonance imaging: trade-offs at 1.5 Tesla and 3.0 Tesla. Acad Radiol. 2013;20:1041-1047
  • [40] J.J. Deeks. Systematic reviews in health care: systematic reviews of evaluations of diagnostic and screening tests. BMJ. 2001;323:157-162
  • [41] J.J. Deeks, S. Wisniewski, C. Davenport. Guide to the contents of a Cochrane diagnostic test accuracy protocol. In: J.J. Deeks, P.M. Bossuyt, C. Gatsonis (Eds.). Cochrane handbook for systematic reviews of diagnostic test accuracy version 1.0.0. Chapter 4. (The Cochrane Collaboration, 2013) http://srdta.cochrane.org/

Prostate cancer (PCa) is the second leading cause of cancer-related mortality in Western men [1]. With recent technological advancements and growing availability, multiparametric magnetic resonance imaging (mpMRI) is increasingly being used to guide several aspects of PCa management, including detection, staging, and treatment planning [2]. Despite abundant evidence in the literature reporting high accuracy of mpMRI for PCa diagnosis, widespread acceptance has been hampered by several factors, including difficulty of interpretation, lack of standardized criteria for interpretation (ie, use of Likert scales based on the radiologist's subjective level of suspicion for PCa), and the resulting substantial inter-reader variability [3 4].

To bring standardization to the evaluation and reporting of mpMRI of the prostate, the European Society of Urogenital Radiology (ESUR) published a guideline termed Prostate Imaging Reporting and Data System (PI-RADS) in 2012 [5] . PI-RADS was generated based on expert consensus and provides a detailed scoring system for each MRI sequence (T2-weighted imaging [T2WI], diffusion-weighted imaging [DWI], dynamic contrast-enhanced MRI [DCE-MRI], and MR spectroscopy) for the presence of clinically significant PCa (csPCa). Several investigators have validated the accuracy and reproducibility of PI-RADS, and a recent meta-analysis reported pooled sensitivity and specificity of 0.78 and 0.79, respectively [6] . However, as there was no guideline for the generation of an overall score, different research groups utilized various measures for this purpose—some used a sum of the scores from each sequence (ranging from 3 to 15), whereas others used an overall score of 1–5 [7 8] . Furthermore, emerging data questioned the value of curve-type analysis of DCE-MRI [9] . In addition, investigators suggested that some sequences may be more important in determining the likelihood of PCa (ie, DWI in the peripheral zone [PZ] and T2WI in the transition zone [TZ]) rather than equal weighting for all sequences [10] .

To address these issues, the ESUR and American College of Radiology recently released the updated PI-RADS version 2 (PI-RADSv2) [11] . The main changes from PI-RADSv1 to PI-RADSv2 are the following: (1) introduction of dominant sequences according to zonal anatomy (DWI for the PZ and T2WI for the TZ), (2) limited contribution of DCE-MRI data as merely presence and absence of early focal enhancement, and (3) generation of an overall score (1–5) integrating findings across all MRI sequences.

Since then, several studies assessing the value of PI-RADSv2 have been published. However, the diagnostic performance of this new scoring system has not been evaluated systematically. Therefore, the purpose of our study was to assess the diagnostic performance of PI-RADSv2 for the detection of PCa. In addition, we aimed to compare the diagnostic performance of PI-RADSv1 and PI-RADSv2 in studies available for head-to-head comparison.

This meta-analysis was performed and written according to the preferred reporting items for systematic reviews and meta-analyses (PRISMA) guidelines [12] .

2.1 Literature search

A computerized search of MEDLINE and EMBASE up to December 7, 2016, was performed in order to identify studies evaluating the diagnostic performance of PI-RADSv2 for the detection of PCa. The search query combined synonyms for PCa, MRI, and PI-RADS as follows: (prostate cancer OR prostatic cancer OR prostate neoplasm OR prostatic neoplasm OR prostate tumor OR prostatic tumor OR prostate carcinoma OR prostatic carcinoma OR PCa) AND (magnetic resonance imaging OR MRI OR MR) AND (prostate imaging reporting and data system OR pi-rads OR pi rads OR pirads). Bibliographies of identified articles were also screened to expand the scope of search. Our search was limited to publications in English.
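The structure of the search query above (three groups of synonyms, joined by OR within a group and AND between groups) can be sketched in a few lines of Python. This is only an illustration of how the reported string is composed; the variable and function names are ours, and database-specific field tags or syntax are not reproduced because the text does not report them.

```python
# Synonym groups copied from the search query reported in the text.
DISEASE_TERMS = [
    "prostate cancer", "prostatic cancer", "prostate neoplasm",
    "prostatic neoplasm", "prostate tumor", "prostatic tumor",
    "prostate carcinoma", "prostatic carcinoma", "PCa",
]
IMAGING_TERMS = ["magnetic resonance imaging", "MRI", "MR"]
SCORING_TERMS = [
    "prostate imaging reporting and data system",
    "pi-rads", "pi rads", "pirads",
]


def any_of(terms):
    """Join synonyms with OR and wrap the group in parentheses."""
    return "(" + " OR ".join(terms) + ")"


def build_query(*groups):
    """Require one match from every group by joining the groups with AND."""
    return " AND ".join(any_of(group) for group in groups)


query = build_query(DISEASE_TERMS, IMAGING_TERMS, SCORING_TERMS)
print(query)
```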

2.2 Study selection

2.2.1 Inclusion criteria

Studies were included if they satisfied all the following requirements according to the PICOS criteria [12] : (1) included patients with suspected or diagnosed PCa; (2) for index test, mpMRI of the prostate including all required sequences of T2WI, DWI, and DCE-MRI was performed and assessed with a PI-RADSv2 scoring system; (3) for comparison, a reference standard based on the histopathological examination of radical prostatectomy or biopsy was used; (4) results were reported in sufficient detail for the reconstruction of 2 × 2 tables and determination of sensitivity and specificity at specified cutoff values for evaluating the diagnostic performance of PI-RADSv2; and (5) studies had to be original articles.
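Criterion (4) requires that a 2 × 2 table can be rebuilt from what a study reports. A minimal sketch of that reconstruction is given below, assuming a study reports the numbers of lesions or patients with and without PCa together with sensitivity and specificity at one cutoff. The class and function names, and the example counts, are hypothetical and are not taken from any included study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Cross-classification of index test result against reference standard."""
    tp: int  # test positive, PCa present
    fp: int  # test positive, PCa absent
    fn: int  # test negative, PCa present
    tn: int  # test negative, PCa absent

    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)


def reconstruct(n_with_pca: int, n_without_pca: int,
                sensitivity: float, specificity: float) -> TwoByTwo:
    """Rebuild cell counts from reported group sizes and accuracy values."""
    tp = round(sensitivity * n_with_pca)
    tn = round(specificity * n_without_pca)
    return TwoByTwo(tp=tp, fp=n_without_pca - tn, fn=n_with_pca - tp, tn=tn)


# Hypothetical study: 100 with PCa, 50 without, sensitivity 0.90, specificity 0.80.
table = reconstruct(100, 50, 0.90, 0.80)
print(table, table.sensitivity(), table.specificity())
```

Rounding to whole counts means the rebuilt table can differ by one from the true counts when a study reports only rounded percentages, which is one reason such reconstructions are usually cross-checked against any raw counts given in the article.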

2.2.2 Exclusion criteria

Studies were excluded if any of the following criteria were met: (1) studies involving <10 patients; (2) review articles, guidelines, consensus statements, letters, editorials, and conference abstracts; (3) studies using only PI-RADSv1 for the evaluation of mpMRI of the prostate; (4) studies focusing on topics other than using the PI-RADSv2 system for diagnosing PCa (eg, staging and prediction of biochemical recurrence); and (5) studies with overlapping patient populations.

Two reviewers (S.W. and C.H.S., with 3 yr of experience in performing systematic reviews and meta-analyses) independently evaluated the eligibility of the selected studies from the literature. Disagreements, if present between the two reviewers, were resolved by consensus via discussion with a third reviewer (S.Y.K.).

2.3 Data extraction and quality assessment

We extracted the following data regarding study design and results from the selected studies using a standardized form:

  1. Study characteristics: authors, year of publication, country and institution of origin, duration of patient recruitment, and study design (prospective vs retrospective and consecutive or not)

  2. Demographic and clinical characteristics: sample size, number of patients with PCa, patient age, prostate-specific antigen (PSA) level and Gleason score, number of previous biopsies, and PCa diagnosis prior to mpMRI

  3. Technical characteristics of mpMRI: scanner model and manufacturer, magnetic field strength (1.5 vs 3 T), coil type (endorectal vs pelvic phased array), and specific sequences used (T2WI, DWI, DCE, or MR spectroscopy)

  4. Interpretation of mpMRI: number of reviewers and experience in prostate mpMRI, independent or consensus reading, and blinding to clinicopathological information

  5. Reference standard: type of reference standard (radical prostatectomy, targeted biopsy, or systematic biopsy), interval between MRI and pathology, outcomes assessed (any PCa vs csPCa), definition of csPCa (studies assessing “clinically significant”, “aggressive”, or “high-grade” PCa were all considered to assess csPCa; however, only studies that used the definition provided by the PI-RADSv2 guideline [Gleason score ≥7 (3 + 4), volume ≥0.5 ml, or extraprostatic extension] were considered not to have concern for applicability), separate analysis for the PZ and TZ, and type of analysis (per patient vs per lesion)

  6. Diagnostic performance of PI-RADSv2, including criteria or cutoff values (in case of multiple readers, the results of the most experienced reader were extracted for this meta-analysis)

The methodological quality of the included studies was assessed using tailored questionnaires and criteria provided by Quality Assessment of Diagnostic Accuracy Studies-2 [13] . Data extraction and quality assessment were performed independently by two reviewers (S.W. and C.H.S.). All disagreements were resolved by consensus through discussion with the third reviewer (S.Y.K.).

2.4 Data synthesis and analysis

The diagnostic performance of PI-RADSv2 for the detection of PCa was the primary outcome for this meta-analysis. In addition, a comparison between the diagnostic performance of PI-RADSv2 and that of PI-RADSv1 using studies that reported head-to-head comparison data of the two PI-RADS versions was considered a secondary outcome.

Pooled estimates of sensitivity and specificity were calculated using hierarchical logistic regression modeling including bivariate modeling and hierarchical summary receiver operating characteristic (HSROC) modeling [14] . For graphical presentation of the results, an HSROC curve with 95% confidence region and prediction region was plotted. Publication bias was evaluated using the Deeks’ funnel plot, and statistical significance was tested with the Deeks’ asymmetry test [15] .
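For orientation, the bivariate model referred to above is usually written as follows. This is the standard textbook formulation rather than a specification taken from the article; the symbols (Se, Sp, μ, σ, and the subscript i for study) are our notation, with n_{1i} and n_{0i} denoting the numbers with and without PCa in study i.

```latex
\begin{align}
\mathrm{TP}_i &\sim \mathrm{Binomial}(n_{1i}, \mathrm{Se}_i), \qquad
\mathrm{TN}_i \sim \mathrm{Binomial}(n_{0i}, \mathrm{Sp}_i), \\
\begin{pmatrix} \operatorname{logit}\mathrm{Se}_i \\ \operatorname{logit}\mathrm{Sp}_i \end{pmatrix}
&\sim \mathcal{N}\!\left(
\begin{pmatrix} \mu_{\mathrm{Se}} \\ \mu_{\mathrm{Sp}} \end{pmatrix},
\begin{pmatrix} \sigma_{\mathrm{Se}}^{2} & \sigma_{\mathrm{Se,Sp}} \\
\sigma_{\mathrm{Se,Sp}} & \sigma_{\mathrm{Sp}}^{2} \end{pmatrix}
\right), \\
\widehat{\mathrm{Se}} &= \operatorname{logit}^{-1}\!\left(\hat{\mu}_{\mathrm{Se}}\right), \qquad
\widehat{\mathrm{Sp}} = \operatorname{logit}^{-1}\!\left(\hat{\mu}_{\mathrm{Sp}}\right).
\end{align}
```

The first line is the within-study sampling model, the second lets the logit-transformed sensitivity and specificity vary jointly between studies, and the third back-transforms the fitted means into the pooled estimates. The covariance term is what allows sensitivity and specificity to be negatively correlated across studies, as happens when studies operate at different thresholds.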

We performed meta-regression analyses to investigate the cause of heterogeneity. The following covariates were considered for the bivariate model: (1) proportion of patients with PCa (>50% vs ≤50%), (2) magnet strength of MRI (3 vs 1.5 T), (3) use of endorectal coil, (4) cutoff value (≥4 vs ≥3), (5) reference standard (radical prostatectomy vs biopsy), and (6) type of analysis (per patient vs per lesion). In addition, multiple subgroup analyses were performed for cutoff value, outcome, and previous biopsy history to assess various clinical settings: (1) a cutoff value of ≥4 for all studies, (2) a cutoff value of ≥3 for all studies, (3) a cutoff value of ≥4 for determining any PCa, (4) a cutoff value of ≥3 for determining any PCa, (5) a cutoff value of ≥4 for determining csPCa, (6) a cutoff value of ≥3 for determining csPCa, (7) a cutoff value of ≥4 in studies using per-patient analysis, (8) a cutoff value of ≥4 in studies using per-lesion analysis, (9) studies analyzing PZ PCa, (10) studies analyzing TZ PCa, (11) patients without previous biopsies, and (12) patients with previous biopsies. The “metandi” and “midas” modules in Stata 10.0 (StataCorp LP, College Station, TX, USA) and “mada” package in R software version 3.2.1 (R Foundation for Statistical Computing, Vienna, Austria) were used for statistical analyses, with p < 0.05 signifying statistical significance.

3.1 Literature search

A systematic literature search initially identified 287 articles. After removing 46 duplicates, screening of the 241 titles and abstracts yielded 105 potentially eligible articles. Full-text reviews were performed, and 84 studies were excluded for the following reasons: not in the field of interest (n = 80, including 68 studies that used only PI-RADSv1), insufficient data to reconstruct 2 × 2 tables (n = 2), and shared study population with other studies (n = 2). Ultimately, 21 original articles including a total of 3857 patients assessing the diagnostic performance of PI-RADSv2 were included in the meta-analysis [16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36]. No additional studies were identified via screening the bibliographies of these 21 studies. Among them, 15 studies including 3099 patients dealt with PI-RADSv2 alone, whereas six studies including 758 patients provided a head-to-head comparison between PI-RADSv1 and PI-RADSv2 [16 20 21 28 32 33]. The detailed study selection process is described in Fig. 1.
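The counts in this selection process can be checked for internal consistency with simple arithmetic. The sketch below uses only numbers stated in the paragraph above; the variable names are ours.

```python
# Counts as reported in the study selection paragraph.
identified = 287
duplicates = 46
full_text_reviewed = 105
excluded_at_full_text = {
    "not in the field of interest": 80,
    "insufficient data for 2 x 2 tables": 2,
    "shared study population": 2,
}
# (number of studies, number of patients) per group of included studies.
included_groups = {
    "PI-RADSv2 alone": (15, 3099),
    "head-to-head v1 vs v2": (6, 758),
}

screened = identified - duplicates
included = full_text_reviewed - sum(excluded_at_full_text.values())
total_studies = sum(n for n, _ in included_groups.values())
total_patients = sum(p for _, p in included_groups.values())

print(f"screened titles/abstracts: {screened}")
print(f"included after full-text review: {included}")
print(f"studies: {total_studies}, patients: {total_patients}")
```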


Fig. 1
PRISMA flow diagram showing study selection process for meta-analysis. a Included original articles for qualitative and quantitative analyses are references [16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36]. MRI = magnetic resonance imaging; PI-RADS = Prostate Imaging Reporting and Data System; PRISMA = Preferred Reporting Items for Systematic Reviews and Meta-analyses.

3.2 Characteristics of included studies

Patient characteristics are shown in Table 1 . The size of the study population ranged from 49 to 456 patients, with the percentage of those with PCa ranging from 37% to 100%. The patients had a median age of 62–69.6 yr, median PSA of 3.97–15 ng/ml, and a Gleason score ranging from 5 to 10. Patients had already been diagnosed with PCa prior to MRI in all or some of the study populations in seven studies [16 17 22 26 27 32 35] . Biopsy was performed before MRI in seven studies [16 21 22 23 26 27 35] , all patients were biopsy-naïve in three studies [18 25 34] , both patient types were included in three studies [28 32 33] , and data regarding previous biopsy were not reported in the remaining eight studies.

First author (year) | Country | Institution | Duration of patient recruitment | Patients (n) | Patients with PCa (n) | Age, median (yr) | Age, range (yr) | PSA, median (ng/ml) | PSA, range (ng/ml) | Gleason score, median | Gleason score, range | No. of previous biopsies | PCa diagnosis before MRI
Auer (2016) [16] | Austria | Medical University of Innsbruck, Medizinische Hochschule Hannover | NR | 50 | NR | 63 a | NR | 7.3 a | NR | 7 (3 + 4) | 6–10 | ≥1 | Yes (all)
Baldisserotto (2016) [17] | Brazil | Pontifícia Universidade Católica do Rio Grande do Sul | 2013.6–2015.6 | 54 | 33 | 65.9 a | 53–81 | 8.4 a | 3–31 | 7 (3 + 4) | 6–9 | NR | Yes (some)
De Visschere (2017) [18] | Belgium | Ghent University Hospital | 2011.5–2014.12 | 245 | 144 | 66 | 44–85 | 9 | 1.4–935.5 | 7 (3 + 4) | ≥6 | 0 | No
El-Samei (2016) [19] | Egypt | Al-Azher University, El-Minia University | 2014.5–2015.10 | 55 | 38 | 62 a | 51–79 | NR | NR | 7 | 6–10 | NR | No
Feng (2016) [20] | China | Tongji Hospital | 2013.6–2015.7 | 401 | 150 | 64.4 a | 34–88 | 10.7 | 0.2–1763 | 7 (4 + 3) | ≤6–≥8 | NR | NR
Kasel-Seibert (2016) [21] | Germany | University Hospital Jena | 2013.7–2015.3 | 82 | 31 | 65 | 48–81 | 13 | 1–111 | 7 (3 + 4) | 6–9 | 1–5 | No
Lin (2016) [22] | Brazil, Taiwan | Ribeirão Preto School of Medicine, China Medical University, Ribeirão Preto School of Medicine | 2011.5–2014.6 | 49 | 49 | 63 | 46–73 | 13.27 | 1.8–41.4 |  |  | ≥1 | Yes (all)
Martorana (2016) [23] | Italy | University of Modena and Reggio Emilia, Perugia University | 2014.1–2016.2 | 157 | 79 | 65 | 47–79 b | 10.7 | 1–75 | 6 | 6–8 | 1–3 | No
Mertan (2016) [24] | USA | National Cancer Institute | 2015.3–2015.9 | 62 | 38 | 65.5 | 50.3–76.6 | 7.1 | 0.5–863 | 7 (3 + 4) | 6–9 | NR | NR
Muller (2015) [25] | USA, the Netherlands | National Cancer Institute, AMC University Hospital, Edward Hébert School of Medicine | 2011.12–2014.5 | 94 | 94 | 62 | 37–79 | 8.51 | 0.7–51.1 | 7 (3 + 4) | 6–10 | 0 | No
Park (2016) [26] | South Korea | Samsung Medical Center | 2012.1–2014.12 | 456 | 456 | 65 | 46–81 | 3.97 | 3.8–4.2 | NR | ≥6 | ≥1 | Yes (all)
Park (2016) [27] | South Korea | Yonsei University College of Medicine | 2012.1–2013.3 | 425 | 425 | NR | NR | NR | 1.4–156.9 | 5–6 | 5–10 | ≥1 | Yes (all)
Polanec (2016) [28] | Austria | Medical University of Vienna, Confraternität Vienna | 2011.6–2015.9 | 65 | NR | 65.3 a | 62.3–87.4 | 10.8 a | 4.2–74.5 | 7 (4 + 3) | 6–9 | ≥0 | No
Rastinehad (2015) [29] | USA | Icahn School of Medicine at Mount Sinai, Fox Chase Cancer Center, National Institutes of Health, Hofstra North Shore LIJ School of Medicine | 2012.2–2014.11 | 312 | 202 | 65.1 | 60.3–70.3 b | 7.3 | 5.0–11.4 b | NR | NR | NR | No
Rosenkrantz (2016) [30] | USA | NYU Langone Medical Center | 2013.9–2015.2 | 343 | 134 | 64 | NR | 5.8 | NR | 7 (3 + 4) | 6–9 | NR | NR
Stanzione (2016) [31] | Italy | University “Federico II”, Ospedale S. Maria delle Grazie | NR | 82 | 34 | 65 a | 43–84 | 8.8 a | NR | 7 (3 + 4) | 6–9 | NR | No
Tan (2016) [32] | USA, Taiwan | UCLA, Cathay General Hospital, Hoag Hospital | 2013.3–2016.12 | 106 | 63 | 66.5 | 43–79 | 7.9 | 5.6–10.6 | 7 (3 + 4) | 6–10 | ≥0 | Yes (some)
Tewes (2016) [33] | Germany | Hannover Medical School, Klinikum der Region Hannover | 2012.12–2014.12 | 54 | 31 | 69.6 a | NR | 8.7 a | NR | 6 | 6–9 | ≥0 | No
Washino (2017) [34] | Japan | Jichi Medical University Saitama Medical Center | 2010.6–2014.4 | 288 | 159 | 69 | 64–74 b | 7.5 | 5.5–11 b | NR | NR | 0 | No
Woo (2016) [35] | South Korea | Seoul National University College of Medicine | 2011.1–2013.12 | 105 | 105 | 69 | 49–79 | 8.22 | 0.9–44.2 | 7 | 6–7 | ≥1 | Yes (all)
Zhao (2016) [36] | China | Peking University First Hospital | 2010.11–2013.12 | 372 | 185 | 68.5 a | NR | 15 a | NR | 7 | 6–10 | NR | No

a Mean.

b Interquartile range.

MRI = magnetic resonance imaging; NR = not reported; PCa = prostate cancer; PSA = prostate-specific antigen.

Table 1 Patient characteristics

Characteristics of the studies are summarized in Table 2. MRI was performed using 3-T scanners in 16 studies [16 17 18 20 24 25 26 27 28 29 30 31 32 33 35 36], 1.5-T scanners in four studies [19 21 22 23], and either 3 or 1.5 T in one study [34]. Endorectal coils were used in seven studies [19 21 22 23 24 25 29]. In all studies, the mpMRI protocol consisted of T2WI, DWI, and DCE-MRI. The reference standard was radical prostatectomy in five studies [16 22 26 27 35], a combination of systematic and targeted biopsies in seven studies [17 20 23 25 29 34 36], and only targeted biopsy in seven studies [19 21 24 28 30 32 33]; the reference standard was not consistent throughout the study population in two studies [18 31]. PI-RADSv2 scoring was performed by one to five radiologists, either in consensus or independently. The level of experience of the radiologists was heterogeneous, ranging from 4 to 22 yr of experience in prostate mpMRI. In most studies, the readers were blinded; however, in three studies, the radiologists were aware that the patients had biopsy-proven PCa [22 26 35], and five studies were not explicit regarding blinding [18 19 30 32 36]. In the majority of the studies, the interval between MRI and the reference standard was less than 6 mo; however, the details were not reported in 10 studies [16 17 18 24 27 28 29 30 32 33]. PCa was separately assessed according to zonal anatomy in seven studies [20 24 25 28 30 31 33]. However, in one study [31], only PCa in the PZ could be evaluated, as no detailed data were provided in the article and the attempt to contact the authors for further information was unsuccessful. Regarding the outcome assessed, eight studies evaluated any cancer [17 19 20 23 25 28 31 33], eight evaluated clinically significant cancer [18 22 26 27 30 34 35 36], and five evaluated both [16 21 24 29 32]. With regard to cutoff values, 13 studies used ≥4 [16 18 19 20 21 24 25 26 27 28 30 31 35], four studies used ≥3 [32 33 34 36], and four studies used both [17 22 23 29]. The location of PCa was separately reported by the PZ and TZ in six studies [20 24 25 28 30 33].

First author (year) | Design | Consecutive enrollment | Reference standard | MRI–reference standard interval | No. of readers (reading) | Reader experience (yr) | Blinding | Magnet strength (T) | MRI vendor | MRI model | Endorectal coil | PI-RADSv2 application | Cutoff values | Localization | Type of analysis | Outcome assessed | Definition of csPCa
Auer (2016) [16] | Retrospective | Yes | RP | NR | 1 | >5 | Yes | 3 | Siemens | Skyra | No | Strict | 4 | Whole | Lesion | Any + csPCa | GS ≥7 (4 + 3)
Baldisserotto (2016) [17] | Retrospective | Yes | STRUSGB + targeted MRI–TRUS biopsy (cognitive) or RP | NR | 2 (independent) | >10/1 | Yes | 3 | GE | Signa HDxt | No | Strict | 3, 4 | Whole | Patient | Any |
De Visschere (2017) [18] | Retrospective | NR | STRUSGB or RP | NR | NR | NR | NR | 3 | Siemens | Trio | No | Strict | 4 | Whole | Patient | csPCa | GS ≥7, ≥0.5 cc, or EPE
El-Samei (2016) [19] | NR | NR | Targeted MRI–TRUS biopsy | ≤2 wk | NR | NR | NR | 1.5 | Philips | Gyroscan | Yes | Strict | 4 | Whole | Lesion | Any |
Feng (2016) [20] | Retrospective | Yes | STRUSGB + targeted MRI–TRUS biopsy (cognitive) | 1–35 d | 2 (consensus) | 5/4 | Yes | 3 | Siemens | Skyra | No | Strict | 4 | PZ, TZ | Patient | Any |
Kasel-Seibert (2016) [21] | Retrospective | NR | Targeted MRGB | ≤7 d | 2 (independent) | 10/<1 | Yes | 1.5 | Siemens | Avanto | Yes | Strict | 4 | Whole | Lesion | Any + csPCa | GS ≥7
Lin (2016) [22] | Retrospective | NR | RP | ≤6 mo | 2 (independent) | 12/5 | Yes a | 1.5 | Philips | Achieva | Yes | Strict | 3, 4 | Whole | Lesion | csPCa | GS ≥7 and >0.5 cc
Martorana (2016) [23] | Retrospective | Yes | TTB + targeted MRI–TRUS biopsy | 30–78 | 2 (consensus) | 4/4 | Yes | 1.5 | Philips | Achieva | Yes | Strict | 3, 4 | Whole | Lesion | Any | GS ≥7, >0.5 cc, or EPE
Mertan (2016) [24] | Prospective | Yes | STRUSGB + targeted MRI–TRUS biopsy | NR | 1 | >8 | Yes | 3 | Philips | Achieva | Yes | Strict | 4 | PZ, TZ | Lesion | Any + csPCa | GS ≥7
Muller (2015) [25] | Retrospective | Yes | STRUSGB + targeted MRI–TRUS biopsy | ≤6 wk | 5 (independent) | 12/7/1/1/0.5 | Yes | 3 | Philips | Achieva | Yes | Strict | 4 | PZ, TZ | Lesion | Any | GS ≥7 (4 + 3)
Park (2016) [26] | Retrospective | Yes | RP | 21–48 | 2 (independent) | 14/3 | Yes a | 3 | Philips | Achieva | No | Strict | 4 | Whole | Patient | csPCa | GS ≥7, ≥0.5 cc, or EPE
Park (2016) [27] | Retrospective | NR | RP | NR | 2 (independent) | 9/4 | Yes | 3 | GE, Philips, Siemens | Discovery MR750, Achieva, Trio | No | Strict | 4 | Whole | Patient | csPCa | GS ≥7, ≥0.5 cc, or EPE
Polanec (2016) [28] | Retrospective | Yes | Targeted MRGB | NR | 2 (independent) | NR/NR | Yes | 3 | Siemens | Trio | No | Strict | 4 | PZ, TZ | Lesion | Any |
Rastinehad (2015) [29] | Prospective | NR | STRUSGB + targeted MRI–TRUS biopsy | NR | 3 (consensus) | NR | Yes | 3 | Siemens | Verio | Yes | Calculated b | 3, 4 | Whole | Patient | Any + csPCa | Epstein's criteria c for systematic and GS ≥7 or >0.5 cc for targeted biopsy
Rosenkrantz (2016) [30] | Retrospective | NR | Targeted MRI–TRUS biopsy | NR | 2 (independent) | 8/3 | NR | 3 | Siemens | Biograph mMR/Prisma/Skyra/Trio | No | Strict | 4 | PZ, TZ | Lesion | csPCa | GS ≥7
Stanzione (2016) [31] | Prospective | NR | STRUSGB (±targeted TRUSGB) | 20–30 d | 2 (independent) | 14/10 | Yes | 3 | Siemens | Trio | No | Strict | 4 | PZ | Patient | Any |
Tan (2016) [32] | Retrospective | Yes | Targeted MRGB | NR | 3 (consensus) | 16/10 | NR | 3 | Siemens | Skyra, Trio, Verio | No | Strict | 3 | Whole | Lesion | Any + csPCa | GS ≥7
Tewes (2016) [33] | Retrospective | Yes | Targeted MRGB | NR | 2 (independent) | 5/2 | Yes | 3 | Siemens | Skyra | No | Strict | 3 d | PZ, TZ | Patient | Any |
Washino (2017) [34] | Retrospective | NR | TTB + targeted MRI–TRUS biopsy (cognitive) | 0.5–1.6 mo | 1 | 14 | Yes | 1.5 or 3 | Toshiba | Excelart Vantage/Vantage Titan 3T | No | Strict | 3 | Whole | Patient | csPCa | GS ≥7 or maximum core length ≥4 mm
Woo (2016) [35] | Retrospective | NR | RP | ≤6 mo | 2 (consensus) | 22/10 | Yes a | 3 | Philips, Siemens | Ingenia, Verio/Trio | No | Strict | 4 | Whole | Patient | csPCa | GS ≥7
Zhao (2016) [36] | Retrospective | NR | STRUSGB + targeted MRI–TRUS biopsy | ≤3 mo | 2 (independent) | NR | NR | 3 | NR | NR | No | Strict | 3 | Whole | Patient | csPCa | GS ≥7

a Blinded but aware that patients had biopsy-proven PCa.

b PI-RADSv2 scores were generated from reports based on PI-RADSv1 and simplified qualitative system.

c Epstein's criteria = Gleason pattern ≥4, or Gleason 3 + 3 disease with core length ≥50% and/or >2 cores positive.

d For transition zone, cutoff value = 4.

csPCa = clinically significant prostate cancer; GS = Gleason score; EPE = extraprostatic extension; MRGB = magnetic resonance imaging-guided biopsy; MRI = magnetic resonance imaging; MRI–TRUS = fusion of magnetic resonance imaging and transrectal ultrasound images; NR = not reported; PCa = prostate cancer; PI-RADSv2 = Prostate Imaging Reporting and Data System version 2; PZ = peripheral zone; RP = radical prostatectomy; STRUSGB = systematic transrectal ultrasound-guided biopsy; TTB = transperineal template biopsy; TZ = transition zone.

Table 2 Study characteristics

3.3 Quality assessment

Overall, the quality of the studies was not considered high, mainly due to the patient selection domain (Fig. 2). Regarding the patient selection domain, there was generally a high risk of bias as all but four of the studies were retrospective in nature [16 17 18 20 21 22 23 25 26 27 28 30 32 33 34 35 36]. Seven studies were considered to have high concern for applicability, as all or some of the patients had a pathological diagnosis of PCa prior to MRI [16 17 22 26 27 32 35]. Regarding the index test domain, there was a high risk of bias in nine studies. In three of the nine studies, reviewers were aware that patients had biopsy-proven PCa [22 26 35]. In the other six studies, the cutoff value for determining PCa was not specified prior to interpretation [16 20 29 31 33 34]. Only one study had concern for applicability, as PI-RADSv2 scores were indirectly generated from existing clinical radiological reports based on PI-RADSv1 or an in-house scoring system [29]. Regarding the reference standard domain, eight studies had a high risk of bias. Seven were based on either only systematic biopsy or only targeted biopsy [18 19 21 28 30 32 33]; in one study, targeted biopsy was performed, but on lesions that were suspicious on ultrasonography and not on MRI [31]. Studies in which radical prostatectomy or a systematic plus targeted biopsy (MRI guided, MRI–transrectal ultrasound fusion, or cognitive) was used as the reference standard were considered to have a low risk of bias. In 10 studies, the definition of clinically significant cancer did not abide by that described in the PI-RADSv2 guidelines, and these studies therefore showed high concern for applicability [16 21 22 24 29 30 32 34 35 36]. Regarding the flow and timing domain, two studies had a high risk of bias as patients did not receive the same reference standard [17 18].


Fig. 2
Grouped bar charts show (A) risk of bias and (B) concerns for applicability of 21 included studies using QUADAS-2. QUADAS-2 = Quality Assessment of Diagnostic Accuracy Studies-2.

3.4 Diagnostic accuracy of PI-RADSv2

The sensitivity and specificity of individual studies were 73–100% and 8–100%, respectively. The Q-test revealed that substantial heterogeneity was present (p < 0.001). The Higgins I² statistic demonstrated substantial heterogeneity in terms of the sensitivity (I² = 85.55%) and considerable heterogeneity in terms of the specificity (I² = 95.30%). The coupled forest plot of the sensitivity and specificity demonstrated the absence of a threshold effect (Fig. 3). The Spearman correlation coefficient between the sensitivity and the false-positive rate was 0.45 (95% confidence interval [CI] 0.023–0.738), also indicating the lack of a threshold effect.
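As a reminder of what the inconsistency statistic measures, the usual Higgins definition expresses the share of Cochran's Q that exceeds its degrees of freedom. The sketch below implements that textbook formula only; the Q value in the example is hypothetical, since the article reports the p value and the resulting percentages but not Q itself, and the software used by the authors may compute the statistic separately for sensitivity and specificity in a different way.

```python
def i_squared(q: float, df: int) -> float:
    """Higgins inconsistency (%) from Cochran's Q and its degrees of freedom.

    Uses the textbook definition 100 * (Q - df) / Q, truncated at zero.
    """
    if q <= 0:
        return 0.0
    return max(0.0, 100.0 * (q - df) / q)


# Hypothetical example: 21 studies give df = 20; Q = 100 is an invented value.
example = i_squared(q=100.0, df=20)
print(f"I^2 = {example:.1f}%")
```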


Fig. 3
Coupled forest plots of pooled sensitivity and specificity. Numbers are pooled estimates with 95% CI in parentheses. Corresponding heterogeneity statistics are provided at bottom right corners. Horizontal lines indicate 95% CIs. CI = confidence interval.

For all 21 studies combined, the pooled sensitivity was 0.89 (95% CI 0.86–0.92) with specificity of 0.73 (95% CI 0.60–0.83; Fig. 3 ). In the HSROC curve, there was a large difference between the 95% confidence region and the 95% prediction region, thus indicating heterogeneity between the studies ( Fig. 4 ). The area under the HSROC curve was 0.91 (95% CI 0.88–0.93). According to the Deeks’ funnel plot, the likelihood of publication bias was low, with a p value of 0.75 for the slope coefficient ( Fig. 5 ).
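To give the pooled estimates a more clinical reading, they can be converted into likelihood ratios and a post-test probability. The following is a back-of-the-envelope sketch, not a result reported by the meta-analysis: it uses only the pooled point estimates (0.89 and 0.73), ignores their confidence intervals and the between-study heterogeneity noted above, and the 50% pre-test probability is a purely hypothetical value chosen for illustration.

```python
def likelihood_ratios(sensitivity: float, specificity: float):
    """Return (positive LR, negative LR) for a binary test."""
    lr_positive = sensitivity / (1.0 - specificity)
    lr_negative = (1.0 - sensitivity) / specificity
    return lr_positive, lr_negative


def post_test_probability(pre_test: float, likelihood_ratio: float) -> float:
    """Update a pre-test probability through odds and a likelihood ratio."""
    pre_odds = pre_test / (1.0 - pre_test)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1.0 + post_odds)


POOLED_SENSITIVITY = 0.89  # pooled estimate reported in the text
POOLED_SPECIFICITY = 0.73  # pooled estimate reported in the text
PRE_TEST = 0.50            # hypothetical pre-test probability (assumption)

lr_pos, lr_neg = likelihood_ratios(POOLED_SENSITIVITY, POOLED_SPECIFICITY)
p_after_positive = post_test_probability(PRE_TEST, lr_pos)
p_after_negative = post_test_probability(PRE_TEST, lr_neg)

print(f"LR+ = {lr_pos:.2f}, LR- = {lr_neg:.2f}")
print(f"post-test probability after positive result: {p_after_positive:.2f}")
print(f"post-test probability after negative result: {p_after_negative:.2f}")
```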


Fig. 4
Hierarchical summary receiver operating characteristic curve of the diagnostic performance of PI-RADSv2 for detecting prostate cancer. HSROC = hierarchical summary receiver operating characteristic; PI-RADSv2 = Prostate Imaging Reporting and Data System version 2.


Fig. 5
Deeks’ funnel plot. A p value of 0.75 suggests that the likelihood of publication bias is low.

For the six studies that provided a head-to-head comparison between PI-RADSv1 and PI-RADSv2, PI-RADSv2 demonstrated higher pooled sensitivity of 0.95 (95% CI 0.85–0.98) compared with 0.88 (95% CI 0.80–0.93) for PI-RADSv1 ( p = 0.04). However, the pooled specificity did not show a significant difference between the two versions of PI-RADS: 0.73 (95% CI 0.47–0.89) for v2 and 0.75 (95% CI 0.36–0.94) for v1 ( p = 0.90).

3.5 Heterogeneity exploration

As we found considerable heterogeneity among the included studies, meta-regression analyses were performed (Supplementary Table 1). Among several potential variables, proportion of patients with PCa, magnetic field strength, and reference standard were significant factors affecting the heterogeneity (p < 0.01 for all). However, among these three, only specificity according to the proportion of patients with PCa showed a statistically significant and clinically meaningful difference: 0.65 (95% CI 0.52–0.78) in studies with ≤50% of patients with PCa versus 0.86 (95% CI 0.75–0.97) in studies with >50% of patients with PCa. Otherwise, no clinically meaningful differences were seen: for magnet strength (3 vs 1.5 T), sensitivity of 0.90 (95% CI 0.86–0.94) versus 0.89 (95% CI 0.81–0.97, p = 0.03) and specificity of 0.73 (95% CI 0.59–0.86) versus 0.72 (95% CI 0.44–1.00, p = 0.81); for reference standard (radical prostatectomy vs biopsy), sensitivity of 0.89 (95% CI 0.83–0.95) versus 0.91 (95% CI 0.88–0.94, p < 0.01) and specificity of 0.65 (95% CI 0.37–0.94) versus 0.73 (95% CI 0.58–0.87, p = 0.48). Other variables, including the cutoff value, use of endorectal coil, and type of analysis (per patient vs per lesion), were not significant factors (p = 0.32–0.70).

3.6 Subgroup analysis

As four studies used both ≥3 and ≥4 as cutoff values [17 22 23 29] and five studies determined both any cancer and clinically significant cancer as outcomes [16 21 24 29 32], multiple subgroup analyses were performed in order to assess various clinical settings (Supplementary Table 2). Regarding cutoff values, the pooled sensitivity was 0.89 (95% CI 0.84–0.92) with specificity of 0.74 (95% CI 0.58–0.85) for 17 studies using ≥4 [16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 35], whereas these were 0.95 (95% CI 0.89–0.97) and 0.47 (95% CI 0.21–0.74) in eight studies using ≥3 [17 22 23 29 32 33 34 36]. Stratifying studies according to the outcome assessed yielded the following results: (1) cutoff of ≥4 for determining any PCa, sensitivity of 0.89 (95% CI 0.83–0.93) with specificity of 0.80 (95% CI 0.62–0.90); (2) cutoff of ≥3 for determining any PCa, sensitivity of 0.96 (95% CI 0.93–0.98) with specificity of 0.49 (95% CI 0.29–0.70); (3) for determining csPCa regardless of cutoff values, sensitivity of 0.89 (95% CI 0.84–0.92) with specificity of 0.64 (95% CI 0.46–0.78); (4) cutoff of ≥4 for determining csPCa, sensitivity of 0.90 (95% CI 0.85–0.94) with specificity of 0.62 (95% CI 0.45–0.77); and (5) cutoff of ≥3 for determining csPCa, sensitivity of 0.96 (95% CI 0.87–0.99) with specificity of 0.29 (95% CI 0.05–0.77). When the studies using a cutoff value of ≥4 were separately assessed according to the type of analysis, per-patient analysis in eight studies [17 18 20 26 27 29 31 35] yielded pooled sensitivity of 0.89 (95% CI 0.81–0.93) with specificity of 0.76 (95% CI 0.60–0.88), whereas per-lesion analysis in nine studies [16 19 21 22 23 24 25 28 30] yielded pooled sensitivity of 0.87 (95% CI 0.83–0.91) with specificity of 0.70 (95% CI 0.44–0.88).
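The trade-off between the two cutoffs can be made concrete by expressing the pooled estimates as expected counts per 1000 patients. The sketch below is illustrative only: it uses the pooled point estimates for cutoffs ≥4 and ≥3 from the paragraph above, ignores their confidence intervals, and assumes a hypothetical PCa prevalence of 40% that does not come from the article.

```python
def errors_per_1000(sensitivity: float, specificity: float, prevalence: float):
    """Expected missed cancers and false positives among 1000 patients."""
    with_pca = 1000.0 * prevalence
    without_pca = 1000.0 - with_pca
    return {
        "missed_cancers": with_pca * (1.0 - sensitivity),
        "false_positives": without_pca * (1.0 - specificity),
    }


# Pooled (sensitivity, specificity) per cutoff, as reported in the text.
POOLED_BY_CUTOFF = {">=4": (0.89, 0.74), ">=3": (0.95, 0.47)}
PREVALENCE = 0.40  # hypothetical prevalence (assumption, for illustration)

results = {
    cutoff: errors_per_1000(sens, spec, PREVALENCE)
    for cutoff, (sens, spec) in POOLED_BY_CUTOFF.items()
}
for cutoff, counts in results.items():
    print(f"cutoff {cutoff}: "
          f"{counts['missed_cancers']:.0f} missed cancers, "
          f"{counts['false_positives']:.0f} false positives per 1000")
```

Under these assumptions, lowering the cutoff from ≥4 to ≥3 reduces missed cancers at the cost of roughly doubling false-positive readings, which is the pattern the subgroup estimates describe.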

Based on the localization of PCa, the pooled sensitivity was 0.93 (95% CI 0.87–0.96) with specificity of 0.68 (95% CI 0.43–0.86) in seven studies analyzing PZ cancers [20 24 25 28 30 31 33]. The same studies, except for that by Stanzione et al [31], also analyzed TZ cancers; for these, the pooled sensitivity and specificity were 0.88 (95% CI 0.77–0.94) and 0.75 (95% CI 0.59–0.86), respectively.

Studies including patients without previous biopsies yielded sensitivity of 0.82 (95% CI 0.72–0.90) and specificity of 0.75 (95% CI 0.65–0.83), whereas the diagnostic performance values were 0.87 (95% CI 0.80–0.92) and 0.71 (95% CI 0.42–0.89) in studies including patients with a history of previous biopsy.

3.7 Discussion

In our meta-analysis, we assessed the diagnostic accuracy of PI-RADSv2 for detecting PCa. The pooled sensitivity and specificity of all 21 studies were 0.89 (95% CI 0.86–0.92) and 0.73 (95% CI 0.60–0.83), respectively. When comparing our data with the only two existing meta-analyses using mpMRI for detecting PCa, a trend toward higher sensitivity and lower specificity can be inferred. In the study by de Rooij et al [37], which evaluated seven studies using a combination of T2WI, DWI, and DCE-MRI, the pooled sensitivity and specificity were 0.74 (95% CI 0.66–0.81) and 0.88 (95% CI 0.82–0.92), respectively. In a more recent meta-analysis by Hamoen et al [6], which analyzed 14 studies using PI-RADSv1, the pooled sensitivity and specificity were 0.78 (95% CI 0.70–0.84) and 0.79 (95% CI 0.68–0.86), respectively. However, contrasting the three meta-analyses provides only an indirect comparison. To address this issue, we separately assessed the subgroup of studies that used both PI-RADSv1 and PI-RADSv2. In this head-to-head comparison, PI-RADSv2 demonstrated higher pooled sensitivity (0.95) than PI-RADSv1 (0.88, p = 0.04) without a statistically significant difference in specificity (0.73 vs 0.75, p = 0.90). This increase in sensitivity over its predecessor may imply that the revisions undertaken during the development of PI-RADSv2, including the introduction of dominant sequences according to zonal anatomy, limited contribution of DCE-MRI secondary to DWI and T2WI, and specific guidelines for deriving an integrated overall score, were, in fact, on the right track. In particular, we speculate that the use of dominant sequences, that is, DWI for the PZ and T2WI for the TZ, may have been crucial for the improved sensitivity without a loss in specificity, as suggested by Baur et al [10].

Considering that one of the main intentions for the generation of PI-RADS was to standardize reporting of mpMRI in order to decrease variability and bring about widespread acceptance and implementation in daily practice, it was promising to find that nearly all (20 of 21) studies used PI-RADSv2 strictly according to published guidelines [11]. Only one study formed PI-RADSv2 scores from existing clinical radiological reports that were based on PI-RADSv1 or an in-house scoring system [29]. This is an improvement when compared with prior studies conducted using PI-RADSv1, where investigators used varying methods in determining the overall score (overall five-point score or sum of the scores from each modality) [6]. Still, there is a need for further clarification regarding the cutoff value for detecting PCa. In the studies included in our meta-analysis, cutoff values were predefined in only six studies, while the majority (15/21) were exploratory in nature, testing multiple criteria. When using a cutoff value of ≥4, sensitivity (0.89) and specificity (0.74) were generally good, whereas using ≥3 yielded excellent sensitivity (0.95) but poor specificity (0.47). These results may be taken into consideration when developing the next update of PI-RADS. For instance, the former cutoff may be adequate for general use of PI-RADS, whereas the latter could be indicated when a higher cancer detection rate is clinically required (eg, persistently high PSA level despite a previously negative biopsy).
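The trade-off between the two cutoff values can be illustrated with likelihood ratios. The sketch below is not part of the original analysis: it simply applies the standard definitions LR+ = sensitivity / (1 − specificity) and LR− = (1 − sensitivity) / specificity to the pooled estimates quoted above. The function names are hypothetical, and ratios derived from pooled summary points should be read as rough approximations only.

```python
def positive_lr(sensitivity, specificity):
    """LR+ = sensitivity / (1 - specificity)."""
    return sensitivity / (1.0 - specificity)


def negative_lr(sensitivity, specificity):
    """LR- = (1 - sensitivity) / specificity."""
    return (1.0 - sensitivity) / specificity


# Pooled (sensitivity, specificity) per cutoff, as quoted in the text.
# The likelihood ratios themselves are NOT reported in the article.
POOLED = {
    "PI-RADSv2 >=4": (0.89, 0.74),
    "PI-RADSv2 >=3": (0.95, 0.47),
}

for label, (sens, spec) in POOLED.items():
    lr_pos = positive_lr(sens, spec)
    lr_neg = negative_lr(sens, spec)
    print(f"{label}: LR+ = {lr_pos:.2f}, LR- = {lr_neg:.2f}")
```

A higher LR+ at ≥4 and a lower LR− at ≥3 would be consistent with the suggestion above that ≥4 suits general use, whereas ≥3 suits settings in which missing a cancer is the main concern.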

In the current study, subgroup analyses were performed to account for differences in outcomes (any cancer vs clinically significant cancer). There was no significant difference for using either outcome irrespective of whether the criteria of ≥3 or ≥4 were used. However, the definition of clinically significant cancer differed among the 13 studies. Only three studies defined csPCa strictly according to the PI-RADSv2 guidelines (Gleason score ≥7 [3 + 4], volume ≥0.5 ml, or extraprostatic extension) [11]. Most others used one or two of the three criteria. Including only the former three studies might have provided more robust results; however, including all available studies was not only pragmatic but also presents a general overview of the existing literature, as this is the first meta-analysis of studies dealing with PI-RADSv2.

In this meta-analysis, we also looked into the technical aspects of MRI. Meta-regression analyses revealed that the use of endorectal coil was not a statistically significant factor. Furthermore, although magnet strength showed a statistically significant difference between 3 and 1.5 T, this did not prove to be clinically meaningful (sensitivity of 0.90 vs 0.89, respectively; p = 0.03). Although there had been debate over these two issues in the past, both 3 and 1.5 T are now well established, and the overall benefit of using an endorectal coil is not evident [38 39]. The PI-RADSv2 guidelines currently recommend either usage, and the results of our study provide additional evidence to support this.

Regarding the methods of analysis in the studies, there was significant heterogeneity in reference standard and type of analysis. Radical prostatectomy was the reference standard in five studies, while the majority were based on a combination of systematic and targeted biopsies. The possibility of PCa despite negative biopsy results in the latter group should be kept in mind. In addition, approximately half of the studies reported outcomes in a per-patient manner (n = 11) and half in a per-lesion manner (n = 10). Per-lesion analysis is known to take into account the performance of localizing the disease; however, this was not shown to be a significant factor in the meta-regression analysis.

Our meta-analysis had some limitations. Nearly all studies were retrospective in design, resulting in a high risk of bias for patient selection. It is possible that pooling data from predominantly retrospective studies may have led to increased diagnostic sensitivity [40]. However, performing a meta-analysis using only the three prospective studies was technically unfeasible, and the derived results would not have been representative of the existing literature on PI-RADSv2. Moreover, we used validated methods for the systematic review and reported the data using standard reporting guidelines, including PRISMA and the guidelines of the Handbook for Diagnostic Test Accuracy Reviews published by the Cochrane Collaboration [12 41]. Another limitation is the considerable heterogeneity in our pooled analysis, which affects the general applicability of our summary estimates. To explore the heterogeneity of our data, we performed meta-regression and multiple subgroup analyses. According to these analyses, the proportion of patients with PCa, the magnetic field strength, and the reference standard were significant factors affecting the heterogeneity. In particular, the reference standard comprised various methods, including radical prostatectomy and a combination of systematic and targeted biopsies (ie, MRI guided, MRI-transrectal ultrasound fusion, or cognitive). Furthermore, the fact that various definitions were used for clinically significant cancer needs to be emphasized. Our meta-regression and subgroup analyses may explain some of the heterogeneity, but a portion remains unexplained. Another important limitation is the small number of studies available for head-to-head comparison between PI-RADSv1 and PI-RADSv2. However, we were able to demonstrate a statistically significant difference in sensitivity between the two versions using only the six studies published to date.

PI-RADSv2 shows good performance for the detection of PCa with pooled sensitivity of 0.89 and specificity of 0.73. PI-RADSv2 has higher pooled sensitivity compared with PI-RADSv1 without significantly different specificity.

Author contributions: Sang Youn Kim had full access to all the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis.

Study concept and design: Woo, Suh, S.Y. Kim.

Acquisition of data: Woo, Suh, S.Y. Kim.

Analysis and interpretation of data: Woo, Suh, S.Y. Kim.

Drafting of the manuscript: Woo.

Critical revision of the manuscript for important intellectual content: Suh, S.Y. Kim, Cho, S.H. Kim.

Statistical analysis: Suh.

Obtaining funding: None.

Administrative, technical, or material support: None.

Supervision: S.Y. Kim, Cho, S.H. Kim.

Other: None.

Financial disclosures: Sang Youn Kim certifies that all conflicts of interest, including specific financial interests and relationships and affiliations relevant to the subject matter or materials discussed in the manuscript (eg, employment/affiliation, grants or funding, consultancies, honoraria, stock ownership or options, expert testimony, royalties, or patents filed, received, or pending), are the following: None.

Funding/Support and role of the sponsor: None.

  • [1] American Cancer Society. Key statistics for prostate cancer. http://www.cancer.org/cancer/prostatecancer/detailedguide/prostate-cancer-key-statistics
  • [2] C.M. Hoeks, J.O. Barentsz, T. Hambrock, et al. Prostate cancer: multiparametric MR imaging for detection, localization, and staging. Radiology. 2011;261:46-66
  • [3] O. Ruprecht, P. Weisser, B. Bodelle, H. Ackermann, T.J. Vogl. MRI of the prostate: interobserver agreement compared with histopathologic outcome after radical prostatectomy. Eur J Radiol. 2012;81:456-460
  • [4] L. Dickinson, H.U. Ahmed, C. Allen, et al. Scoring systems used for the interpretation and reporting of multiparametric MRI for prostate cancer detection, localization, and characterization: could standardization lead to improved utilization of imaging within the diagnostic pathway? J Magn Reson Imaging. 2013;37:48-58
  • [5] J.O. Barentsz, J. Richenberg, R. Clements, et al. ESUR prostate MR guidelines 2012. Eur Radiol. 2012;22:746-757
  • [6] E.H. Hamoen, M. de Rooij, J.A. Witjes, J.O. Barentsz, M.M. Rovers. Use of the Prostate Imaging Reporting and Data System (PI-RADS) for prostate cancer detection with multiparametric magnetic resonance imaging: a diagnostic meta-analysis. Eur Urol. 2015;67:1112-1121
  • [7] M. Abd-Alazeez, H.U. Ahmed, M. Arya, et al. The accuracy of multiparametric MRI in men with negative biopsy and elevated PSA level: can it rule out clinically significant prostate cancer? Urol Oncol. 2014;32:45.e17-22
  • [8] A.B. Rosenkrantz, S. Kim, R.P. Lim, et al. Prostate cancer localization using multiparametric MR imaging: comparison of Prostate Imaging Reporting and Data System (PI-RADS) and Likert scales. Radiology. 2013;269:482-492
  • [9] B.G. Hansford, Y. Peng, Y. Jiang, et al. Dynamic contrast-enhanced MR imaging curve-type analysis: is it helpful in the differentiation of prostate cancer from healthy peripheral zone? Radiology. 2015;275:448-457
  • [10] A.D. Baur, A. Maxeiner, T. Franiel, et al. Evaluation of the Prostate Imaging Reporting and Data System for the detection of prostate cancer by the results of targeted biopsy of the prostate. Invest Radiol. 2014;49:411-420
  • [11] J.C. Weinreb, J.O. Barentsz, P.L. Choyke, et al. PI-RADS Prostate Imaging-Reporting and Data System: 2015, version 2. Eur Urol. 2016;69:16-40
  • [12] A. Liberati, D.G. Altman, J. Tetzlaff, et al. The PRISMA statement for reporting systematic reviews and meta-analyses of studies that evaluate health care interventions: explanation and elaboration. J Clin Epidemiol. 2009;62:e1-e34
  • [13] P. Whiting, A.W. Rutjes, J.B. Reitsma, P.M. Bossuyt, J. Kleijnen. The development of QUADAS: a tool for the quality assessment of studies of diagnostic accuracy included in systematic reviews. BMC Med Res Methodol. 2003;3:25
  • [14] C.H. Suh, S.H. Park. Successful publication of systematic review and meta-analysis of studies evaluating diagnostic test accuracy. Korean J Radiol. 2016;17:5-6
  • [15] J.J. Deeks, P. Macaskill, L. Irwig. The performance of tests of publication bias and other sample size effects in systematic reviews of diagnostic test accuracy was assessed. J Clin Epidemiol. 2005;58:882-893
  • [16] T. Auer, M. Edlinger, J. Bektic, et al. Performance of PI-RADS version 1 versus version 2 regarding the relation with histopathological results. World J Urol. 2017;35:687-693
  • [17] M. Baldisserotto, E.J. Neto, G. Carvalhal, et al. Validation of PI-RADS v.2 for prostate cancer diagnosis with MRI at 3T using an external phased-array coil. J Magn Reson Imaging. 2016;44:1354-1359
  • [18] P. De Visschere, N. Lumen, P. Ost, K. Decaestecker, E. Pattyn, G. Villeirs. Dynamic contrast-enhanced imaging has limited added value over T2-weighted imaging and diffusion-weighted imaging when using PI-RADSv2 for diagnosis of clinically significant prostate cancer in patients with elevated PSA. Clin Radiol. 2017;72:23-32
  • [19] El-Samei HAE-KA, M.F. Amin, E.E. Hassan. Assessment of the accuracy of multi-parametric MRI with PI-RADS 2.0 scoring system in the discrimination of suspicious prostatic focal lesions. Egypt J Radiol Nucl Med. 2016;47:1075-1082
  • [20] Z.Y. Feng, L. Wang, X.D. Min, S.G. Wang, G.P. Wang, J. Cai. Prostate cancer detection with multiparametric magnetic resonance imaging: Prostate Imaging Reporting and Data System version 1 versus version 2. Chin Med J. 2016;129:2451-2459
  • [21] M. Kasel-Seibert, T. Lehmann, R. Aschenbach, et al. Assessment of PI-RADS v2 for the detection of prostate cancer. Eur J Radiol. 2016;85:726-731
  • [22] W.C. Lin, V.F. Muglia, G.E. Silva, S. Chodraui Filho, R.B. Reis, A.C. Westphalen. Multiparametric MRI of the prostate: diagnostic performance and interreader agreement of two scoring systems. Br J Radiol. 2016;89:20151056
  • [23] Martorana E, Pirola GM, Scialpi M, et al. Lesion volume predicts prostate cancer risk and aggressiveness: validation of its value alone and matched with Prostate Imaging Reporting and Data System score. BJU Int. In press. doi:10.1111/bju.13649
  • [24] F.V. Mertan, M.D. Greer, J.H. Shih, et al. Prospective evaluation of the Prostate Imaging Reporting and Data System version 2 for prostate cancer detection. J Urol. 2016;196:690-696
  • [25] B.G. Muller, J.H. Shih, S. Sankineni, et al. Prostate cancer: interobserver agreement and accuracy with the revised Prostate Imaging Reporting and Data System at multiparametric MR imaging. Radiology. 2015;277:741-750
  • [26] J.J. Park, B.K. Park. Role of PI-RADSv2 with multiparametric MRI in determining who needs active surveillance or definitive treatment according to PRIAS. J Magn Reson Imaging. 2017;45:1753-1759
  • [27] S.Y. Park, D.C. Jung, Y.T. Oh, et al. Prostate cancer: PI-RADS version 2 helps preoperatively predict clinically significant cancers. Radiology. 2016;280:108-116
  • [28] S. Polanec, T.H. Helbich, H. Bickel, et al. Head-to-head comparison of PI-RADS v2 and PI-RADS v1. Eur J Radiol. 2016;85:1125-1131
  • [29] A.R. Rastinehad, N. Waingankar, B. Turkbey, et al. Comparison of multiparametric MRI scoring systems and the impact on cancer detection in patients undergoing MR US fusion guided prostate biopsies. PLoS One. 2015;10:e0143404
  • [30] Rosenkrantz AB, Babb JS, Taneja SS, Ream JM. Proposed adjustments to PI-RADS version 2 decision rules: impact on prostate cancer detection. Radiology. In press. doi:10.1148/radiol.2016161124
  • [31] A. Stanzione, M. Imbriaco, S. Cocozza, et al. Biparametric 3T magnetic resonance imaging for prostatic cancer detection in a biopsy-naive patient population: a further improvement of PI-RADS v2? Eur J Radiol. 2016;85:2269-2274
  • [32] Tan N, Lin WC, Khoshnoodi P, et al. In-Bore 3-T MR-guided transrectal targeted prostate biopsy: Prostate Imaging Reporting and Data System version 2-based diagnostic performance for detection of prostate cancer. Radiology. In press. doi:10.1148/radiol.2016152827
  • [33] S. Tewes, N. Mokov, D. Hartung, et al. Standardized reporting of prostate MRI: comparison of the Prostate Imaging Reporting and Data System (PI-RADS) version 1 and version 2. PLoS One. 2016;11:e0162879
  • [34] S. Washino, T. Okochi, K. Saito, et al. Combination of PI-RADS score and PSA density predicts biopsy outcome in biopsy naive patients. BJU Int. 2017;119:225-233
  • [35] S. Woo, S.Y. Kim, J. Lee, S.H. Kim, J.Y. Cho. PI-RADS version 2 for prediction of pathological downgrading after radical prostatectomy: a preliminary study in patients with biopsy-proven Gleason Score 7 (3 + 4) prostate cancer. Eur Radiol. 2016;26:3580-3587
  • [36] C. Zhao, G. Gao, D. Fang, et al. The efficiency of multiparametric magnetic resonance imaging (mpMRI) using PI-RADS Version 2 in the diagnosis of clinically significant prostate cancer. Clin Imaging. 2016;40:885-888
  • [37] M. de Rooij, E.H. Hamoen, J.J. Futterer, J.O. Barentsz, M.M. Rovers. Accuracy of multiparametric MRI for prostate cancer detection: a meta-analysis. Am J Roentgenol. 2014;202:343-351
  • [38] Z.K. Shah, S.N. Elias, R. Abaza, et al. Performance comparison of 1.5-T endorectal coil MRI with 3.0-T nonendorectal coil MRI in patients with prostate cancer. Acad Radiol. 2015;22:467-474
  • [39] Y. Mazaheri, H.A. Vargas, G. Nyman, O. Akin, H. Hricak. Image artifacts on prostate diffusion-weighted magnetic resonance imaging: trade-offs at 1.5 Tesla and 3.0 Tesla. Acad Radiol. 2013;20:1041-1047
  • [40] J.J. Deeks. Systematic reviews in health care: systematic reviews of evaluations of diagnostic and screening tests. BMJ. 2001;323:157-162
  • [41] J.J. Deeks, S. Wisniewski, C. Davenport. Guide to the contents of a Cochrane diagnostic test accuracy protocol. In: J.J. Deeks, P.M. Bossuyt, C. Gatsonis, editors. Cochrane handbook for systematic reviews of diagnostic test accuracy version 1.0.0. Chapter 4. The Cochrane Collaboration, 2013. http://srdta.cochrane.org/

Prostate cancer (PCa) is the second leading cause of cancer-related mortality in Western men [1]. With recent technological advancements and growing availability, multiparametric magnetic resonance imaging (mpMRI) is increasingly being used to guide several aspects of PCa management, including detection, staging, and treatment planning [2]. Despite abundant evidence in the literature reporting high accuracy of mpMRI for PCa diagnosis, widespread acceptance has been hampered by several factors, including difficulty of interpretation, lack of standardized criteria for interpretation (ie, use of Likert scales based on the radiologist's subjective level of suspicion for PCa), and the resulting substantial inter-reader variability [3 4].

To bring standardization to the evaluation and reporting of mpMRI of the prostate, the European Society of Urogenital Radiology (ESUR) published a guideline termed Prostate Imaging Reporting and Data System (PI-RADS) in 2012 [5]. PI-RADS was generated based on expert consensus and provides a detailed scoring system for each MRI sequence (T2-weighted imaging [T2WI], diffusion-weighted imaging [DWI], dynamic contrast-enhanced MRI [DCE-MRI], and MR spectroscopy) for the presence of clinically significant PCa (csPCa). Several investigators have validated the accuracy and reproducibility of PI-RADS, and a recent meta-analysis reported pooled sensitivity and specificity of 0.78 and 0.79, respectively [6]. However, as there was no guideline for the generation of an overall score, different research groups utilized various measures for this purpose: some used a sum of the scores from each sequence (ranging from 3 to 15), whereas others used an overall score of 1–5 [7 8]. Furthermore, emerging data questioned the value of curve-type analysis of DCE-MRI [9]. In addition, investigators suggested that some sequences may be more important in determining the likelihood of PCa (ie, DWI in the peripheral zone [PZ] and T2WI in the transition zone [TZ]) rather than equal weighting for all sequences [10].

To address these issues, the ESUR and American College of Radiology recently released the updated PI-RADS version 2 (PI-RADSv2) [11]. The main changes from PI-RADSv1 to PI-RADSv2 are the following: (1) introduction of dominant sequences according to zonal anatomy (DWI for the PZ and T2WI for the TZ), (2) limited contribution of DCE-MRI data, reduced to merely the presence or absence of early focal enhancement, and (3) generation of an overall score (1–5) integrating findings across all MRI sequences.

Since then, several studies assessing the value of PI-RADSv2 have been published. However, the diagnostic performance of this new scoring system has not been evaluated systematically. Therefore, the purpose of our study was to assess the diagnostic performance of PI-RADSv2 for the detection of PCa. In addition, we aimed to compare the diagnostic performance of PI-RADSv1 and PI-RADSv2 in studies available for head-to-head comparison.

This meta-analysis was performed and written according to the Preferred Reporting Items for Systematic Reviews and Meta-analyses (PRISMA) guidelines [12].

2.1 Literature search

A computerized search of MEDLINE and EMBASE up to December 7, 2016, was performed in order to identify studies evaluating the diagnostic performance of PI-RADSv2 for the detection of PCa. The search query combined synonyms for PCa, MRI, and PI-RADS as follows: (prostate cancer OR prostatic cancer OR prostate neoplasm OR prostatic neoplasm OR prostate tumor OR prostatic tumor OR prostate carcinoma OR prostatic carcinoma OR PCa) AND (magnetic resonance imaging OR MRI OR MR) AND (prostate imaging reporting and data system OR pi-rads OR pi rads OR pirads). Bibliographies of identified articles were also screened to expand the scope of search. Our search was limited to publications in English.
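The structure of the search query (synonyms joined with OR inside each concept, concepts joined with AND) can be sketched as follows. This is only an illustration of how such a string might be assembled; the helper names are hypothetical, and the exact syntax accepted by the MEDLINE and EMBASE interfaces is not specified in the text.

```python
# Synonym groups copied from the search strategy described above.
PCA_TERMS = [
    "prostate cancer", "prostatic cancer",
    "prostate neoplasm", "prostatic neoplasm",
    "prostate tumor", "prostatic tumor",
    "prostate carcinoma", "prostatic carcinoma",
    "PCa",
]
MRI_TERMS = ["magnetic resonance imaging", "MRI", "MR"]
PIRADS_TERMS = [
    "prostate imaging reporting and data system",
    "pi-rads", "pi rads", "pirads",
]


def or_group(terms):
    """Join the synonyms of one concept with OR and parenthesize the group."""
    return "(" + " OR ".join(terms) + ")"


def build_query(*groups):
    """Combine the concept groups with AND (hypothetical helper)."""
    return " AND ".join(or_group(group) for group in groups)


query = build_query(PCA_TERMS, MRI_TERMS, PIRADS_TERMS)
print(query)
```

Keeping each concept in its own list makes it easy to add a synonym without disturbing the Boolean structure.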

2.2 Study selection

2.2.1 Inclusion criteria

Studies were included if they satisfied all the following requirements according to the PICOS criteria [12]: (1) included patients with suspected or diagnosed PCa; (2) for index test, mpMRI of the prostate including all required sequences of T2WI, DWI, and DCE-MRI was performed and assessed with a PI-RADSv2 scoring system; (3) for comparison, a reference standard based on the histopathological examination of radical prostatectomy or biopsy was used; (4) results were reported in sufficient detail for the reconstruction of 2 × 2 tables and determination of sensitivity and specificity at specified cutoff values for evaluating the diagnostic performance of PI-RADSv2; and (5) studies had to be original articles.
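Criterion (4) requires that each study allow a 2 × 2 table to be rebuilt. A minimal sketch of how sensitivity and specificity follow from such a table is given below; the class name is hypothetical and the counts are invented for illustration, not taken from any included study.

```python
from dataclasses import dataclass


@dataclass
class TwoByTwo:
    """2 x 2 table of index test (PI-RADSv2 at a cutoff) vs reference standard."""
    tp: int  # test positive, cancer on histopathology
    fp: int  # test positive, no cancer
    fn: int  # test negative, cancer
    tn: int  # test negative, no cancer

    def sensitivity(self) -> float:
        # Proportion of patients (or lesions) with cancer that test positive.
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        # Proportion of patients (or lesions) without cancer that test negative.
        return self.tn / (self.tn + self.fp)


# Hypothetical counts for illustration only.
example = TwoByTwo(tp=45, fp=10, fn=5, tn=40)
print(f"sensitivity = {example.sensitivity():.2f}")  # 45 / 50
print(f"specificity = {example.specificity():.2f}")  # 40 / 50
```

A study reporting only, say, sensitivity without the underlying counts would not satisfy this criterion, because the table cannot be reconstructed.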

2.2.2 Exclusion criteria

Studies were excluded if any of the following criteria were met: (1) studies involving <10 patients; (2) review articles, guidelines, consensus statements, letters, editorials, and conference abstracts; (3) studies using only PI-RADSv1 for the evaluation of mpMRI of the prostate; (4) studies focusing on topics other than using the PI-RADSv2 system for diagnosing PCa (ie, staging and prediction of biochemical recurrence); and (5) studies with overlapping patient population.

Two reviewers (S.W. and C.H.S., with 3 yr of experience in performing systematic reviews and meta-analyses) independently evaluated the eligibility of the selected studies from the literature. Disagreements, if present between the two reviewers, were resolved by consensus via discussion with a third reviewer (S.Y.K.).

2.3 Data extraction and quality assessment

We extracted the following data regarding study design and results from the selected studies using a standardized form:

  1. Study characteristics: authors, year of publication, country and institution of origin, duration of patient recruitment, and study design (prospective vs retrospective and consecutive or not)
  2. Demographic and clinical characteristics: sample size, number of patients with PCa, patient age, prostate-specific antigen (PSA) level and Gleason score, number of previous biopsies, and PCa diagnosis prior to mpMRI
  3. Technical characteristics of mpMRI: scanner model and manufacturer, magnetic field strength (1.5 vs 3 T), coil type (endorectal vs pelvic phased array), and specific sequences used (T2WI, DWI, DCE, or MR spectroscopy)
  4. Interpretation of mpMRI: number of reviewers and experience in prostate mpMRI, independent or consensus reading, and blinding to clinicopathological information
  5. Reference standard: type of reference standard (radical prostatectomy, targeted biopsy, or systematic biopsy), interval between MRI and pathology, outcomes assessed (any PCa vs csPCa), definition of csPCa (studies assessing “clinically significant”, “aggressive”, or “high-grade” PCa were all considered to assess csPCa; however, only studies that used the definition as provided by the PI-RADSv2 guideline [Gleason score ≥7 [3 + 4], volume ≥0.5 ml, or extraprostatic extension] were considered not to have concern for applicability), separate analysis for the PZ and TZ, and type of analysis (per patient vs per lesion)
  6. Diagnostic performance of PI-RADSv2, including criteria or cutoff values (in case of multiple readers, the results of the most experienced reader were extracted for this meta-analysis)

The methodological quality of the included studies was assessed using tailored questionnaires and criteria provided by Quality Assessment of Diagnostic Accuracy Studies-2 [13]. Data extraction and quality assessment were performed independently by two reviewers (S.W. and C.H.S.). All disagreements were resolved by consensus through discussion with the third reviewer (S.Y.K.).

2.4 Data synthesis and analysis

The diagnostic performance of PI-RADSv2 for the detection of PCa was the primary outcome for this meta-analysis. In addition, a comparison between the diagnostic performance of PI-RADSv2 and that of PI-RADSv1 using studies that reported head-to-head comparison data of the two PI-RADS versions was considered a secondary outcome.

Pooled estimates of sensitivity and specificity were calculated using hierarchical logistic regression modeling including bivariate modeling and hierarchical summary receiver operating characteristic (HSROC) modeling [14]. For graphical presentation of the results, an HSROC curve with 95% confidence region and prediction region was plotted. Publication bias was evaluated using Deeks’ funnel plot, and statistical significance was tested with Deeks’ asymmetry test [15].
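As a rough illustration of Deeks’ asymmetry test, the sketch below regresses the log diagnostic odds ratio on the inverse square root of the effective sample size, weighted by the effective sample size. It is not the authors’ implementation (they used Stata and R modules): all function names are hypothetical, the 0.5 continuity correction is an assumption, the study counts are invented, and the p value (from a t distribution with k − 2 degrees of freedom) is deliberately left out to keep the example dependency-free.

```python
import math


def ln_dor(tp, fp, fn, tn):
    """Log diagnostic odds ratio; adds 0.5 to every cell if any cell is zero
    (a common continuity correction, assumed here)."""
    if 0 in (tp, fp, fn, tn):
        tp, fp, fn, tn = (c + 0.5 for c in (tp, fp, fn, tn))
    return math.log((tp * tn) / (fp * fn))


def effective_sample_size(n_diseased, n_nondiseased):
    """ESS = 4 * n1 * n2 / (n1 + n2)."""
    return 4 * n_diseased * n_nondiseased / (n_diseased + n_nondiseased)


def weighted_linear_fit(x, y, w):
    """Weighted least squares for y = a + b * x; returns (a, b, se_b)."""
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    b = sxy / sxx
    a = ybar - b * xbar
    ssr = sum(wi * (yi - a - b * xi) ** 2 for wi, xi, yi in zip(w, x, y))
    se_b = math.sqrt(ssr / (len(x) - 2) / sxx)
    return a, b, se_b


def deeks_asymmetry(studies):
    """studies: iterable of (tp, fp, fn, tn). Returns (slope, t statistic)."""
    x, y, w = [], [], []
    for tp, fp, fn, tn in studies:
        ess = effective_sample_size(tp + fn, fp + tn)
        x.append(1 / math.sqrt(ess))
        y.append(ln_dor(tp, fp, fn, tn))
        w.append(ess)
    _, slope, se = weighted_linear_fit(x, y, w)
    return slope, slope / se


# Invented 2 x 2 counts (tp, fp, fn, tn), for illustration only.
hypothetical_studies = [(45, 10, 5, 40), (80, 30, 12, 90), (30, 4, 6, 20), (150, 60, 20, 170)]
slope, t_stat = deeks_asymmetry(hypothetical_studies)
print(f"slope = {slope:.3f}, t = {t_stat:.3f}")
```

A slope that differs significantly from zero would suggest small-study effects such as publication bias; a flat slope corresponds to a symmetric funnel plot.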

We performed meta-regression analyses to investigate the cause of heterogeneity. The following covariates were considered for the bivariate model: (1) proportion of patients with PCa (>50% vs ≤50%), (2) magnet strength of MRI (3 vs 1.5 T), (3) use of endorectal coil, (4) cutoff value (≥4 vs ≥3), (5) reference standard (radical prostatectomy vs biopsy), and (6) type of analysis (per patient vs per lesion). In addition, multiple subgroup analyses were performed for cutoff value, outcome, and previous biopsy history to assess various clinical settings: (1) a cutoff value of ≥4 for all studies, (2) a cutoff value of ≥3 for all studies, (3) a cutoff value of ≥4 for determining any PCa, (4) a cutoff value of ≥3 for determining any PCa, (5) a cutoff value of ≥4 for determining csPCa, (6) a cutoff value of ≥3 for determining csPCa, (7) a cutoff value of ≥4 in studies using per-patient analysis, (8) a cutoff value of ≥4 in studies using per-lesion analysis, (9) studies analyzing PZ PCa, (10) studies analyzing TZ PCa, (11) patients without previous biopsies, and (12) patients with previous biopsies. The “metandi” and “midas” modules in Stata 10.0 (StataCorp LP, College Station, TX, USA) and “mada” package in R software version 3.2.1 (R Foundation for Statistical Computing, Vienna, Austria) were used for statistical analyses, with p < 0.05 signifying statistical significance.
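Each meta-regression covariate is a dichotomous, study-level variable. As a small sketch of how the first covariate could be coded, the snippet below assigns studies to the ">50%" or "≤50%" stratum from the patient counts listed in Table 1. The function name is hypothetical, and only three studies are shown.

```python
# (patients, patients with PCa) for three studies, as listed in Table 1.
STUDIES = {
    "Feng (2016)": (401, 150),
    "De Visschere (2017)": (245, 144),
    "Woo (2016)": (105, 105),
}


def pca_proportion_group(n_patients, n_pca, threshold=0.5):
    """Return the stratum for the proportion of patients with PCa."""
    return ">50%" if n_pca / n_patients > threshold else "≤50%"


for name, (n_patients, n_pca) in STUDIES.items():
    group = pca_proportion_group(n_patients, n_pca)
    print(f"{name}: {n_pca / n_patients:.0%} -> {group}")
```

The other covariates (magnet strength, endorectal coil, cutoff value, reference standard, and type of analysis) can be coded in the same binary fashion before being entered into the bivariate model.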

3.1 Literature search

A systematic literature search initially identified 287 articles. After removing 46 duplicates, screening of the 241 titles and abstracts yielded 105 potentially eligible articles. Full-text reviews were performed, and 84 studies were excluded for the following reasons: not in the field of interest (n = 80, including 68 studies that used only PI-RADSv1), insufficient data to reconstruct 2 × 2 tables (n = 2), and shared study population with other studies (n = 2). Ultimately, 21 original articles including a total of 3857 patients assessing the diagnostic performance of PI-RADSv2 were included in the meta-analysis [16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36]. No additional studies were identified via screening the bibliographies of these 21 studies. Among them, 15 studies including 3099 patients dealt with PI-RADSv2 alone, whereas six studies including 758 patients provided a head-to-head comparison between PI-RADSv1 and PI-RADSv2 [16 20 21 28 32 33]. The detailed study selection process is described in Fig. 1.
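The counts in the study selection process can be cross-checked with simple arithmetic. The snippet below is only a consistency check on the numbers quoted in this paragraph; the variable names are illustrative.

```python
# Counts quoted in the study selection paragraph.
identified = 287
duplicates = 46
screened = identified - duplicates

full_text_reviewed = 105
exclusions = {
    "not in the field of interest": 80,
    "insufficient data to reconstruct 2 x 2 tables": 2,
    "shared study population": 2,
}
excluded = sum(exclusions.values())
included = full_text_reviewed - excluded

pirads_v2_only = {"studies": 15, "patients": 3099}
head_to_head = {"studies": 6, "patients": 758}
total_studies = pirads_v2_only["studies"] + head_to_head["studies"]
total_patients = pirads_v2_only["patients"] + head_to_head["patients"]

print(f"screened: {screened}")
print(f"excluded after full-text review: {excluded}")
print(f"included: {included} studies, {total_patients} patients")
```

All reported totals agree: 241 records screened, 84 excluded at full-text review, and 21 studies with 3857 patients included.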

Fig. 1
PRISMA flow diagram showing study selection process for meta-analysis. a Included original articles for qualitative and quantitative analyses are references [16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36]. MRI = magnetic resonance imaging; PI-RADS = Prostate Imaging Reporting and Data System; PRISMA = Preferred Reporting Items for Systematic Reviews and Meta-analyses.

3.2 Characteristics of included studies

Patient characteristics are shown in Table 1. The size of the study population ranged from 49 to 456 patients, with the percentage of those with PCa ranging from 37% to 100%. The patients had a median age of 62–69.6 yr, median PSA of 3.97–15 ng/ml, and a Gleason score ranging from 5 to 10. Patients had already been diagnosed with PCa prior to MRI in all or some of the study populations in seven studies [16 17 22 26 27 32 35]. Biopsy was performed before MRI in seven studies [16 21 22 23 26 27 35], all patients were biopsy-naïve in three studies [18 25 34], both patient types were included in three studies [28 32 33], and data regarding previous biopsy were not reported in the remaining eight studies.

First author (year) Origin Duration of patient recruitment Patients ( n ) Patients with PCa ( n ) Age (yr) PSA (ng/ml) Gleason score No. of previous biopsies PCa diagnosis before MRI
Country Institution Median Range Median Range Median Range
Auer (2016) [16] Austria Medical University of Innsbruck, Medizinische Hochschule Hannover NR 50 NR 63 a NR 7.3 a NR 7 (3 + 4) 6–10 ≥1 Yes (all)
Baldisserotto (2016) [17] Brazil Pontifícia Universidade Católica do Rio Grande do Sul 2013.6–2015.6 54 33 65.9 a 53–81 8.4 a 3–31 7 (3 + 4) 6–9 NR Yes (some)
De Visschere (2017) [18] Belgium Ghent University Hospital 2011.5–2014.12 245 144 66 44–85 9 1.4–935.5 7 (3 + 4) ≥6 0 No
El-Samei (2016) [19] Egypt Al-Azher University, El-Minia University 2014.5–2015.10 55 38 62 a 51–79 NR NR 7 6–10 NR No
Feng (2016) [20] China Tongji Hospital 2013.6–2015.7 401 150 64.4 a 34–88 10.7 0.2–1763 7 (4 + 3) ≤6–≥8 NR NR
Kasel-Seibert (2016) [21] Germany University Hospital Jena 2013.7–2015.3 82 31 65 48–81 13 1–111 7 (3 + 4) 6–9 1–5 No
Lin (2016) [22] Brazil, Taiwan Ribeirão Preto School of Medicine, China Medical University, Ribeirão Preto School of Medicine 2011.5–2014.6 49 49 63 46–73 13.27 1.8–41.4 ≥1 Yes (all)
Martorana (2016) [23] Italy University of Modena and Reggio Emilia, Perugia University 2014.1–2016.2 157 79 65 47–79 b 10.7 1–75 6 6–8 1–3 No
Mertan (2016) [24] USA National Cancer Institute 2015.3–2015.9 62 38 65.5 50.3–76.6 7.1 0.5–863 7 (3 + 4) 6–9 NR NR
Muller (2015) [25] USA, the Netherlands National Cancer Institute, AMC University Hospital, Edward Hébert School of Medicine 2011.12–2014.5 94 94 62 37–79 8.51 0.7–51.1 7 (3 + 4) 6–10 0 No
Park (2016) [26] South Korea Samsung Medical Center 2012.1–2014.12 456 456 65 46–81 3.97 3.8–4.2 NR ≥6 ≥1 Yes (all)
Park (2016) [27] South Korea Yonsei University College of Medicine 2012.1–2013.3 425 425 NR NR NR 1.4–156.9 5–6 5–10 ≥1 Yes (all)
Polanec (2016) [28] Austria Medical University of Vienna, Confraternität Vienna 2011.6–2015.9 65 NR 65.3 a 62.3–87.4 10.8 a 4.2–74.5 7 (4 + 3) 6–9 ≥0 No
Rastinehad (2015) [29] USA Icahn School of Medicine at Mount Sinai, Fox Chase Cancer Center, National Institutes of Health, Hofstra North Shore LIJ School of Medicine 2012.2–2014.11 312 202 65.1 60.3–70.3 b 7.3 5.0–11.4 b NR NR NR No
Rosenkrantz (2016) [30] USA NYU Langone Medical Center 2013.9–2015.2 343 134 64 NR 5.8 NR 7 (3 + 4) 6–9 NR NR
Stanzione (2016) [31] Italy University “Federico II”, Ospedale S. Maria delle Grazie NR 82 34 65 a 43–84 8.8 a NR 7 (3 + 4) 6–9 NR No
Tan (2016) [32] USA, Taiwan UCLA, Cathay General Hospital, Hoag Hospital 2013.3–2016.12 106 63 66.5 43–79 7.9 5.6–10.6 7 (3 + 4) 6–10 ≥0 Yes (some)
Tewes (2016) [33] Germany Hannover Medical School, Klinikum der Region Hannover 2012.12–2014.12 54 31 69.6 a NR 8.7 a NR 6 6–9 ≥0 No
Washino (2017) [34] Japan Jichi Medical University Saitama Medical Center 2010.6–2014.4 288 159 69 64–74 b 7.5 5.5–11 b NR NR 0 No
Woo (2016) [35] South Korea Seoul National University College of Medicine 2011.1–2013.12 105 105 69 49–79 8.22 0.9–44.2 7 6–7 ≥1 Yes (all)
Zhao (2016) [36] China Peking University First Hospital 2010.11–2013.12 372 185 68.5 a NR 15 a NR 7 6–10 NR No

a Mean.

b Interquartile range.

MRI = magnetic resonance imaging; NR = not reported; PCa = prostate cancer; PSA = prostate-specific antigen.

Table 1 Patient characteristics

Characteristics of the studies are summarized in Table 2. MRI was performed using 3-T scanners in 16 studies [16 17 18 20 24 25 26 27 28 29 30 31 32 33 35 36], 1.5-T scanners in four studies [19 21 22 23], and either 3 or 1.5 T in one study [34]. Endorectal coils were used in seven studies [19 21 22 23 24 25 29]. In all studies, the mpMRI protocol consisted of T2WI, DWI, and DCE-MRI. The reference standard was radical prostatectomy in five studies [16 22 26 27 35], a combination of systematic and targeted biopsies in seven studies [17 20 23 25 29 34 36], and only targeted biopsy in seven studies [19 21 24 28 30 32 33]; the reference standard was not consistent throughout the study population in two studies [18 31]. PI-RADSv2 scoring was performed by one to five radiologists, either in consensus or independently. The level of experience of the radiologists was heterogeneous, ranging from 4 to 22 yr of experience in prostate imaging. In most studies, the readers were blinded; however, in three studies, the radiologists were aware that the patients had biopsy-proven PCa [22 26 35], and five studies were not explicit regarding blinding [18 19 30 32 36]. In the majority of the studies, the interval between MRI and the reference standard was less than 6 mo; however, the details were not reported in 10 studies [16 17 18 24 27 28 29 30 32 33]. PCa was separately assessed according to zonal anatomy in seven studies [20 24 25 28 30 31 33]. However, in one study [31], only PCa in the PZ could be evaluated, as no detailed data were provided in the article and the attempt to contact the authors for provision of further information was unsuccessful. Regarding the outcome assessed, eight studies evaluated any cancer [17 19 20 23 25 28 31 33], eight evaluated clinically significant cancer [18 22 26 27 30 34 35 36], and five evaluated both [16 21 24 29 32].
With regard to cutoff values, 13 studies used ≥4 [16 18 19 20 21 24 25 26 27 28 30 31 35], four studies used ≥3 [32 33 34 36], and four studies used both [17 22 23 29]. The location of PCa was separately reported by the PZ and TZ in six studies [20 24 25 28 30 33].

Study Reader MRI
First author (year) Design Consecutive enrollment Reference standard MRI–reference standard interval No. of readers Experience (yr) Blinding Magnet strength (T) Vendor Model Endorectal coil PI-RADSv2 application Cutoff values Localization Type of analysis Outcome assessed Definition of csPCa
Auer (2016) [16] Retrospective Yes RP NR 1 >5 Yes 3 Siemens Skyra No Strict 4 Whole Lesion Any + csPCa GS ≥7 (4 + 3)
Baldisserotto (2016) [17] Retrospective Yes STRUSGB + targeted MRI–TRUS biopsy (cognitive) or RP NR 2 (independent) >10/1 Yes 3 GE Signa HDxt No Strict 3, 4 Whole Patient Any
De Visschere (2017) [18] Retrospective NR STRUSGB or RP NR NR NR NR 3 Siemens Trio No Strict 4 Whole Patient csPCa GS ≥7, ≥0.5 cc, or EPE
El-Samei (2016) [19] NR NR Targeted MRI–TRUS biopsy ≤2 wk NR NR NR 1.5 Philips Gyroscan Yes Strict 4 Whole Lesion Any
Feng (2016) [20] Retrospective Yes STRUSGB + targeted MRI–TRUS biopsy (cognitive) 1–35 d 2 (consensus) 5/4 Yes 3 Siemens Skyra No Strict 4 PZ, TZ Patient Any
Kasel-Seibert (2016) [21] Retrospective NR Targeted MRGB ≤7 d 2 (independent) 10/<1 Yes 1.5 Siemens Avanto Yes Strict 4 Whole Lesion Any + csPCa GS ≥7
Lin (2016) [22] Retrospective NR RP ≤6 mo 2 (independent) 12/5 Yes a 1.5 Philips Achieva Yes Strict 3, 4 Whole Lesion csPCa GS ≥7 and >0.5 cc
Martorana (2016) [23] Retrospective Yes TTB + targeted MRI–TRUS biopsy 30–78 2 (consensus) 4/4 Yes 1.5 Philips Achieva Yes Strict 3, 4 Whole Lesion Any GS ≥7, >0.5 cc, or EPE
Mertan (2016) [24] Prospective Yes STRUSGB + targeted MRI–TRUS biopsy NR 1 >8 Yes 3 Philips Achieva Yes Strict 4 PZ, TZ Lesion Any + csPCa GS ≥7
Muller (2015) [25] Retrospective Yes STRUSGB + targeted MRI–TRUS biopsy ≤6 wk 5 (independent) 12/7/1/1/0.5 Yes 3 Philips Achieva Yes Strict 4 PZ, TZ Lesion Any GS ≥7 (4 + 3)
Park (2016) [26] Retrospective Yes RP 21–48 2 (independent) 14/3 Yes a 3 Philips Achieva No Strict 4 Whole Patient csPCa GS ≥7, ≥0.5 cc, or EPE
Park (2016) [27] Retrospective NR RP NR 2 (independent) 9/4 Yes 3 GE, Philips, Siemens Discovery MR750, Achieva, Trio No Strict 4 Whole Patient csPCa GS ≥7, ≥0.5 cc, or EPE
Polanec (2016) [28] Retrospective Yes Targeted MRGB NR 2 (independent) NR/NR Yes 3 Siemens Trio No Strict 4 PZ, TZ Lesion Any
Rastinehad (2015) [29] Prospective NR STRUSGB + targeted MRI–TRUS biopsy NR 3 (consensus) NR Yes 3 Siemens Verio Yes Calculated b 3, 4 Whole Patient Any + csPCa Epstein's criteria c for systematic and GS ≥7 or >0.5 cc for targeted biopsy
Rosenkrantz (2016) [30] Retrospective NR Targeted MRI–TRUS biopsy NR 2 (independent) 8/3 NR 3 Siemens Biograph mMR/Prisma/Skyra/Trio No Strict 4 PZ, TZ Lesion csPCa GS ≥7
Stanzione (2016) [31] Prospective NR STRUSGB (±targeted TRUSGB) 20–30 d 2 (independent) 14/10 Yes 3 Siemens Trio No Strict 4 PZ Patient Any
Tan (2016) [32] Retrospective Yes Targeted MRGB NR 3 (consensus) 16/10 NR 3 Siemens Skyra, Trio, Verio No Strict 3 Whole Lesion Any + csPCa GS ≥7
Tewes (2016) [33] Retrospective Yes Targeted MRGB NR 2 (independent) 5/2 Yes 3 Siemens Skyra No Strict 3 d PZ, TZ Patient Any
Washino (2017) [34] Retrospective NR TTB + targeted MRI–TRUS biopsy (cognitive) 0.5–1.6 mo 1 14 Yes 1.5 or 3 Toshiba Excelart Vantage/Vantage Titan 3T No Strict 3 Whole Patient csPCa GS ≥7 or maximum core length ≥4 mm
Woo (2016) [35] Retrospective NR RP ≤6 mo 2 (consensus) 22/10 Yes a 3 Philips, Siemens Ingenia, Verio/Trio No Strict 4 Whole Patient csPCa GS ≥7
Zhao (2016) [36] Retrospective NR STRUSGB + targeted MRI–TRUS biopsy ≤3 mo 2 (independent) NR NR 3 NR NR No Strict 3 Whole Patient csPCa GS ≥7

a Blinded but aware that patients had biopsy-proven PCa.

b PI-RADSv2 scores were generated from reports based on PI-RADSv1 and simplified qualitative system.

c Epstein's criteria = Gleason pattern ≥4, or Gleason 3 + 3 disease with core length ≥50% and/or >2 cores positive.

d For transition zone, cutoff value = 4.

csPCa = clinically significant prostate cancer; GS = Gleason score; EPE = extraprostatic extension; MRGB = magnetic resonance imaging-guided biopsy; MRI = magnetic resonance imaging; MRI–TRUS = fusion of magnetic resonance imaging and transrectal ultrasound images; NR = not reported; PCa = prostate cancer; PI-RADSv2 = Prostate Imaging Reporting and Data System version 2; PZ = peripheral zone; RP = radical prostatectomy; STRUSGB = systematic transrectal ultrasound-guided biopsy; TTB = transperineal template biopsy; TZ = transition zone.

Table 2 Study characteristics

3.3 Quality assessment

Overall, the quality of the studies was not considered high, mainly due to the patient selection domain (Fig. 2). Regarding the patient selection domain, there was generally a high risk of bias as all but four of the studies were retrospective in nature [16 17 18 20 21 22 23 25 26 27 28 30 32 33 34 35 36]. Seven studies were considered to have high concern for applicability, as all or some of the patients had a pathological diagnosis of PCa prior to MRI [16 17 22 26 27 32 35]. Regarding the index test domain, there was a high risk of bias in nine studies. In three of the nine studies, reviewers were aware that patients had biopsy-proven PCa [22 26 35]. In the other six studies, the cutoff value for determining PCa was not specified prior to interpretation [16 20 29 31 33 34]. Only one study had concern for applicability, as PI-RADSv2 scores were indirectly generated from existing clinical radiological reports based on PI-RADSv1 or an in-house scoring system [29]. Regarding the reference standard domain, eight studies had a high risk of bias. Seven were based on either systematic biopsy only or targeted biopsy only [18 19 21 28 30 32 33]; in one study, targeted biopsy was performed, but on lesions that were suspicious on ultrasonography, and not on MRI [31]. Studies in which radical prostatectomy or a systematic plus targeted biopsy (MRI guided, MRI–transrectal ultrasound fusion, or cognitive) was used as the reference standard were considered to have a low risk of bias. In 10 studies, the definition of clinically significant cancer did not abide by that described in the PI-RADSv2 guidelines, and these studies therefore showed high concern for applicability [16 21 22 24 29 30 32 34 35 36]. Regarding the flow and timing domain, two studies had a high risk of bias as patients did not receive the same reference standard [17 18].

Fig. 2
Grouped bar charts show (A) risk of bias and (B) concerns for applicability of 21 included studies using QUADAS-2. QUADAS-2 = Quality Assessment of Diagnostic Accuracy Studies-2.

3.4 Diagnostic accuracy of PI-RADSv2

The sensitivity and specificity of individual studies were 73–100% and 8–100%, respectively. The Q-test revealed that substantial heterogeneity was present (p < 0.001). The Higgins I² statistics demonstrated substantial heterogeneity in terms of the sensitivity (I² = 85.55%) and considerable heterogeneity in terms of the specificity (I² = 95.30%). The coupled forest plot of the sensitivity and specificity demonstrated the absence of a threshold effect (Fig. 3). The Spearman correlation coefficient between the sensitivity and the false-positive rate was 0.45 (95% confidence interval [CI] 0.023–0.738), also indicating the lack of a threshold effect.
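As a reminder of how the I² statistic relates to Cochran's Q, the following Python sketch applies the standard Higgins formula, I² = 100% × (Q − df)/Q, truncated at zero. The function name and the Q value in the example are hypothetical and chosen only for illustration; the Q values underlying the I² figures reported here are not given in the text.

```python
def higgins_i_squared(q: float, df: int) -> float:
    """Return Higgins I² (in percent) from Cochran's Q and its degrees of freedom.

    I² = 100 * (Q - df) / Q, truncated at 0 when Q <= df.
    """
    if q <= 0 or q <= df:
        return 0.0
    return 100.0 * (q - df) / q


# Hypothetical illustration: 21 studies give df = 20; Q = 100 is an assumed value,
# not a statistic taken from the meta-analysis.
example_i2 = higgins_i_squared(q=100.0, df=20)
print(f"I² = {example_i2:.1f}%")
```

The truncation at zero reflects the convention that negative values of (Q − df)/Q are reported as no observed heterogeneity.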

Fig. 3
Coupled forest plots of pooled sensitivity and specificity. Numbers are pooled estimates with 95% CI in parentheses. Corresponding heterogeneity statistics are provided at bottom right corners. Horizontal lines indicate 95% CIs. CI = confidence interval.

For all 21 studies combined, the pooled sensitivity was 0.89 (95% CI 0.86–0.92) with specificity of 0.73 (95% CI 0.60–0.83; Fig. 3 ). In the HSROC curve, there was a large difference between the 95% confidence region and the 95% prediction region, thus indicating heterogeneity between the studies ( Fig. 4 ). The area under the HSROC curve was 0.91 (95% CI 0.88–0.93). According to the Deeks’ funnel plot, the likelihood of publication bias was low, with a p value of 0.75 for the slope coefficient ( Fig. 5 ).
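For readers who prefer likelihood ratios, the pooled point estimates can be converted by simple arithmetic. The Python sketch below is an illustration only: the function name is hypothetical, the likelihood ratios are not results reported in this meta-analysis, and values estimated directly from a bivariate model may differ from this plug-in calculation.

```python
from typing import Tuple


def likelihood_ratios(sensitivity: float, specificity: float) -> Tuple[float, float]:
    """Return (positive LR, negative LR) for a sensitivity/specificity pair.

    LR+ = sensitivity / (1 - specificity)
    LR- = (1 - sensitivity) / specificity
    """
    if not (0.0 <= sensitivity <= 1.0 and 0.0 < specificity < 1.0):
        raise ValueError("sensitivity must be in [0, 1] and specificity in (0, 1)")
    lr_positive = sensitivity / (1.0 - specificity)
    lr_negative = (1.0 - sensitivity) / specificity
    return lr_positive, lr_negative


# Plug-in calculation from the pooled point estimates (0.89 and 0.73).
lr_pos, lr_neg = likelihood_ratios(0.89, 0.73)
print(f"LR+ = {lr_pos:.2f}, LR- = {lr_neg:.2f}")
```

Because the calculation ignores the uncertainty and the correlation between sensitivity and specificity, it should be read as a rough orientation rather than a pooled estimate.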

Fig. 4
Hierarchical summary receiver operating characteristic curve of the diagnostic performance of PI-RADSv2 for detecting prostate cancer. HSROC = hierarchical summary receiver operating characteristic; PI-RADSv2 = Prostate Imaging Reporting and Data System version 2.

Fig. 5
Deeks’ funnel plot. A p value of 0.75 suggests that the likelihood of publication bias is low.

For the six studies that provided a head-to-head comparison between PI-RADSv1 and PI-RADSv2, PI-RADSv2 demonstrated higher pooled sensitivity of 0.95 (95% CI 0.85–0.98) compared with 0.88 (95% CI 0.80–0.93) for PI-RADSv1 ( p = 0.04). However, the pooled specificity did not show a significant difference between the two versions of PI-RADS: 0.73 (95% CI 0.47–0.89) for v2 and 0.75 (95% CI 0.36–0.94) for v1 ( p = 0.90).

3.5 Heterogeneity exploration

As we found considerable heterogeneity among the included studies, meta-regression analyses were performed (Supplementary Table 1). Among several potential variables, the proportion of patients with PCa, magnetic field strength, and reference standard were significant factors affecting the heterogeneity (p < 0.01 for all). However, among these three, only specificity according to the proportion of patients with PCa showed a statistically significant and clinically meaningful difference: 0.65 (95% CI 0.52–0.78) in studies with ≤50% of patients with PCa versus 0.86 (95% CI 0.75–0.97) in studies with >50% of patients with PCa. Otherwise, no clinically meaningful differences were seen: for magnet strength (3 vs 1.5 T), sensitivity of 0.90 (95% CI 0.86–0.94) versus 0.89 (95% CI 0.81–0.97, p = 0.03) and specificity of 0.73 (95% CI 0.59–0.86) versus 0.72 (95% CI 0.44–1.00, p = 0.81); for reference standard (radical prostatectomy vs biopsy), sensitivity of 0.89 (95% CI 0.83–0.95) versus 0.91 (95% CI 0.88–0.94, p < 0.01) and specificity of 0.65 (95% CI 0.37–0.94) versus 0.73 (95% CI 0.58–0.87, p = 0.48). Other variables, including the cutoff value, use of endorectal coil, and type of analysis (per patient vs per lesion), were not significant factors (p = 0.32–0.70).

3.6 Subgroup analysis

As four studies used both ≥3 and ≥4 as cutoff values [17 22 23 29] and five studies determined both any cancer and clinically significant cancer as outcomes [16 21 24 29 32], multiple subgroup analyses were performed in order to assess various clinical settings (Supplementary Table 2). Regarding cutoff values, the pooled sensitivity was 0.89 (95% CI 0.84–0.92) with specificity of 0.74 (95% CI 0.58–0.85) for 17 studies using ≥4 [16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 35], whereas these were 0.95 (95% CI 0.89–0.97) and 0.47 (95% CI 0.21–0.74) in eight studies using ≥3 [17 22 23 29 32 33 34 36]. When we stratified studies according to the outcome assessed, the results were as follows: (1) cutoff of ≥4 for determining any PCa, sensitivity of 0.89 (95% CI 0.83–0.93) with specificity of 0.80 (95% CI 0.62–0.90); (2) cutoff of ≥3 for determining any PCa, sensitivity of 0.96 (95% CI 0.93–0.98) with specificity of 0.49 (95% CI 0.29–0.70); (3) for determining csPCa regardless of cutoff values, sensitivity of 0.89 (95% CI 0.84–0.92) with specificity of 0.64 (95% CI 0.46–0.78); (4) cutoff of ≥4 for determining csPCa, sensitivity of 0.90 (95% CI 0.85–0.94) with specificity of 0.62 (95% CI 0.45–0.77); and (5) cutoff of ≥3 for determining csPCa, sensitivity of 0.96 (95% CI 0.87–0.99) with specificity of 0.29 (95% CI 0.05–0.77). When the studies using a cutoff value of ≥4 were separately assessed according to the type of analysis, per-patient analysis in eight studies [17 18 20 26 27 29 31 35] yielded pooled sensitivity of 0.89 (95% CI 0.81–0.93) with specificity of 0.76 (95% CI 0.60–0.88), whereas per-lesion analysis in nine studies [16 19 21 22 23 24 25 28 30] yielded pooled sensitivity of 0.87 (95% CI 0.83–0.91) with specificity of 0.70 (95% CI 0.44–0.88).

Based on the localization of PCa, the pooled sensitivity was 0.93 (95% CI 0.87–0.96) with specificity of 0.68 (95% CI 0.43–0.86) in seven studies analyzing PZ cancers [20 24 25 28 30 31 33]. In the same studies except that by Stanzione et al [31], analysis of TZ cancers yielded pooled sensitivity and specificity of 0.88 (95% CI 0.77–0.94) and 0.75 (95% CI 0.59–0.86), respectively.

Studies including patients without previous biopsies yielded sensitivity of 0.82 (95% CI 0.72–0.90) and specificity of 0.75 (95% CI 0.65–0.83), whereas the diagnostic performance values were 0.87 (95% CI 0.80–0.92) and 0.71 (95% CI 0.42–0.89) in studies including patients with a history of previous biopsy.

3.7 Discussion

In our meta-analysis, we assessed the diagnostic accuracy of PI-RADSv2 for detecting PCa. The pooled sensitivity and specificity of all 21 studies were 0.89 (95% CI 0.86–0.92) and 0.73 (95% CI 0.60–0.83), respectively. When comparing our data with the only two existing meta-analyses using mpMRI for detecting PCa, a trend toward higher sensitivity and lower specificity can be inferred. In the study by de Rooij et al [37], which evaluated seven studies using a combination of T2WI, DWI, and DCE-MRI, the pooled sensitivity and specificity were 0.74 (95% CI 0.66–0.81) and 0.88 (95% CI 0.82–0.92), respectively. In a more recent meta-analysis by Hamoen et al [6], which analyzed 14 studies using PI-RADSv1, the pooled sensitivity and specificity were 0.78 (95% CI 0.70–0.84) and 0.79 (95% CI 0.68–0.86), respectively. However, setting the three meta-analyses side by side provides only an indirect comparison. In order to address this issue, we separately assessed a subgroup of studies using both PI-RADSv1 and PI-RADSv2. In a head-to-head comparison between them, PI-RADSv2 demonstrated higher pooled sensitivity (0.95) compared with PI-RADSv1 (0.88, p = 0.04) without a statistically significant difference in specificity (0.73 vs 0.75, p = 0.90). This increase in sensitivity compared with its predecessor may imply that the revisions undertaken during the development of PI-RADSv2, including the introduction of dominant sequences according to zonal anatomy, limited contribution of DCE-MRI secondary to DWI and T2WI, and specific guidelines for deriving an integrated overall score, were, in fact, on the right track. In particular, we speculate that the use of dominant sequences, that is, DWI for the PZ and T2WI for the TZ, may have been crucial for the improved sensitivity without a loss in specificity, as suggested by Baur et al [10].

Considering that one of the main intentions for the generation of PI-RADS was to standardize reporting of mpMRI in order to decrease variability and bring about widespread acceptance and implementation in daily practice, it was promising to find that nearly all (20 of 21) studies used PI-RADSv2 strictly according to published guidelines [11]. Only one study formed PI-RADSv2 scores from existing clinical radiological reports that were based on PI-RADSv1 or an in-house scoring system [29]. This is an improvement when compared with prior studies conducted using PI-RADSv1, where investigators used varying methods in determining the overall score (overall five-point score or sum of the scores from each modality) [6]. Still, there is a need for further clarification regarding the cutoff value for detecting PCa. In the studies included in our meta-analysis, cutoff values were predefined in the majority (15/21), while six studies were exploratory in nature, testing multiple criteria. When using a cutoff value of ≥4, sensitivity (0.89) and specificity (0.74) were generally good, whereas using ≥3 yielded excellent sensitivity (0.95) and poor specificity (0.47). These results may be taken into consideration when generating the next updated PI-RADS. For instance, the former cutoff may be adequate for general use of PI-RADS, whereas the latter could be indicated when a higher cancer detection rate is clinically required (eg, persistently high PSA level despite a previously negative biopsy).
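To make the trade-off between the two cutoffs concrete, the pooled estimates can be applied to an imagined cohort. The following Python sketch is illustrative only: the cohort size (1000 men) and the 50% prevalence are assumptions chosen for round numbers, not data from the included studies, and the function name is hypothetical.

```python
def expected_counts(sensitivity: float, specificity: float,
                    prevalence: float, n: int) -> dict:
    """Expected 2x2 cell counts when a test is applied to n patients."""
    diseased = n * prevalence
    non_diseased = n - diseased
    true_pos = sensitivity * diseased
    true_neg = specificity * non_diseased
    return {
        "TP": true_pos,
        "FN": diseased - true_pos,
        "TN": true_neg,
        "FP": non_diseased - true_neg,
    }


# Pooled subgroup estimates quoted in the text; cohort and prevalence are assumed.
cutoffs = {"PI-RADSv2 >=4": (0.89, 0.74), "PI-RADSv2 >=3": (0.95, 0.47)}
for label, (sens, spec) in cutoffs.items():
    counts = expected_counts(sens, spec, prevalence=0.5, n=1000)
    summary = ", ".join(f"{cell} = {round(value)}" for cell, value in counts.items())
    print(f"{label}: {summary}")
```

Under these assumed inputs, lowering the cutoff trades a modest reduction in missed cancers for a much larger number of false-positive readings, which is the pattern described above.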

In the current study, subgroup analyses were performed to account for differences in outcomes (any cancer vs clinically significant cancer). There was no significant difference between the two outcomes, irrespective of whether a cutoff of ≥3 or ≥4 was used. However, the definition of clinically significant cancer differed among the 13 studies. Only three studies defined csPCa strictly according to the PI-RADSv2 guidelines (Gleason score ≥7 [3 + 4], volume ≥0.5 ml, or extraprostatic extension) [11]. Most others used one or two of the three criteria. Including only the former three studies may have provided more robust results; yet including all available studies was not only pragmatic but also presents a general overview of the existing literature, as this is the first meta-analysis of studies dealing with PI-RADSv2.

In this meta-analysis, we looked into the technical aspects of MRI. Meta-regression analyses revealed that the use of endorectal coil was not a statistically significant factor. Furthermore, although magnet strength showed statistically significant differences between 3 and 1.5 T, this did not prove to be clinically meaningful (sensitivity of 0.90 vs 0.89, p = 0.03, respectively). Although there had been debate over these two issues in the past, both 3 and 1.5 T are now well established, and the overall benefit of using an endorectal coil is not evident [38 39]. The PI-RADSv2 guidelines currently recommend either usage, and the results of our study provide additional evidence to support this.

Regarding the methods of analysis in the studies, there was significant heterogeneity regarding reference standard and type of analysis. Radical prostatectomy was the reference standard in five studies, while the majority were based on a combination of systematic and targeted biopsies. The possibility of PCa despite negative biopsy results in the latter group should be kept in mind. In addition, approximately half of the studies each reported outcomes in a per-patient (n = 11) and per-lesion (n = 10) manner. Per-lesion analysis is known to take into account the performance of localizing the disease; however, this was not shown to be a significant factor in the meta-regression analysis.

Our meta-analysis had some limitations. Nearly all studies were retrospective in study design, resulting in a high risk of bias for patient selection. It is possible that pooling data from predominantly retrospective studies may have led to increased diagnostic sensitivity [40]. However, not only was performing a meta-analysis using only three prospective studies technically unfeasible, but the derived results would also not be representative of the existing literature on PI-RADSv2. Nevertheless, we used validated methods for the systematic review and reported the data using standard reporting guidelines, including PRISMA and the guidelines of the Handbook for Diagnostic Test Accuracy Reviews published by the Cochrane Collaboration [12 41]. Another limitation is the considerable heterogeneity in our pooled analysis, which affects the general applicability of our summary estimates. To explore the heterogeneity of our data, we performed meta-regression and multiple subgroup analyses. According to the analyses, the proportion of patients with PCa, the magnetic field strength, and the reference standard were significant factors affecting the heterogeneity. In particular, the reference standard included various methods, including radical prostatectomy and a combination of systematic and targeted biopsies (ie, MRI guided, MRI–transrectal ultrasound fusion, or cognitive). Furthermore, the fact that various definitions were used for clinically significant cancer needs to be emphasized. Our meta-regression and subgroup analyses may explain some of the heterogeneity, but a portion remains unexplained. Another important limitation is the small number of included studies for head-to-head comparison between PI-RADSv1 and PI-RADSv2. However, we were able to demonstrate a statistically significant difference in sensitivity between the two versions using the six studies published to date.

PI-RADSv2 shows good performance for the detection of PCa with pooled sensitivity of 0.89 and specificity of 0.73. PI-RADSv2 has higher pooled sensitivity compared with PI-RADSv1 without significantly different specificity.

Author contributions: Sang Youn Kim had full access to all the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis.

Study concept and design: Woo, Suh, S.Y. Kim.

Acquisition of data: Woo, Suh, S.Y. Kim.

Analysis and interpretation of data: Woo, Suh, S.Y. Kim.

Drafting of the manuscript: Woo.

Critical revision of the manuscript for important intellectual content: Suh, S.Y. Kim, Cho, S.H. Kim.

Statistical analysis: Suh.

Obtaining funding: None.

Administrative, technical, or material support: None.

Supervision: S.Y. Kim, Cho, S.H. Kim.

Other: None.

Financial disclosures: Sang Youn Kim certifies that all conflicts of interest, including specific financial interests and relationships and affiliations relevant to the subject matter or materials discussed in the manuscript (eg, employment/affiliation, grants or funding, consultancies, honoraria, stock ownership or options, expert testimony, royalties, or patents filed, received, or pending), are the following: None.

Funding/Support and role of the sponsor: None.

  • [1] American Cancer Society. Key statistics for prostate cancer. http://www.cancer.org/cancer/prostatecancer/detailedguide/prostate-cancer-key-statistics .
  • [2] C.M. Hoeks ,J.O. Barentsz ,T. Hambrock ,et al. Prostate cancer: multiparametric MR imaging for detection, localization, and staging. Radiology. 2011;261 :46-66 Crossref
  • [3] O. Ruprecht ,P. Weisser ,B. Bodelle ,H. Ackermann ,T.J. Vogl. MRI of the prostate: interobserver agreement compared with histopathologic outcome after radical prostatectomy. Eur J Radiol. 2012;81 :456-460 Crossref
  • [4] L. Dickinson ,H.U. Ahmed ,C. Allen ,et al. Scoring systems used for the interpretation and reporting of multiparametric MRI for prostate cancer detection, localization, and characterization: could standardization lead to improved utilization of imaging within the diagnostic pathway?. J Magn Reson Imaging. 2013;37 :48-58 Crossref
  • [5] J.O. Barentsz ,J. Richenberg ,R. Clements ,et al. ESUR prostate MR guidelines 2012. Eur Radiol. 2012;22 :746-757 Crossref
  • [6] E.H. Hamoen ,M. de Rooij ,J.A. Witjes ,J.O. Barentsz ,M.M. Rovers. Use of the Prostate Imaging Reporting and Data System (PI-RADS) for prostate cancer detection with multiparametric magnetic resonance imaging: a diagnostic meta-analysis. Eur Urol. 2015;67 :1112-1121 Crossref
  • [7] M. Abd-Alazeez ,H.U. Ahmed ,M. Arya ,et al. The accuracy of multiparametric MRI in men with negative biopsy and elevated PSA level—can it rule out clinically significant prostate cancer?. Urol Oncol. 2014;32 45 e17–22
  • [8] A.B. Rosenkrantz ,S. Kim ,R.P. Lim ,et al. Prostate cancer localization using multiparametric MR imaging: comparison of Prostate Imaging Reporting and Data System (PI-RADS) and Likert scales. Radiology. 2013;269 :482-492
  • [9] B.G. Hansford ,Y. Peng ,Y. Jiang ,et al. Dynamic contrast-enhanced MR imaging curve-type analysis: is it helpful in the differentiation of prostate cancer from healthy peripheral zone?. Radiology. 2015;275 :448-457 Crossref
  • [10] A.D. Baur ,A. Maxeiner ,T. Franiel ,et al. Evaluation of the Prostate Imaging Reporting and Data System for the detection of prostate cancer by the results of targeted biopsy of the prostate. Invest Radiol. 2014;49 :411-420 Crossref
  • [11] J.C. Weinreb ,J.O. Barentsz ,P.L. Choyke ,et al. PI-RADS Prostate Imaging-Reporting and Data System: 2015, version 2. Eur Urol. 2016;69 :16-40 Crossref
  • [12] A. Liberati ,D.G. Altman ,J. Tetzlaff ,et al. The PRISMA statement for reporting systematic reviews and meta-analyses of studies that evaluate health care interventions: explanation and elaboration. J Clin Epidemiol. 2009;62 :e1-e34 Crossref
  • [13] P. Whiting ,A.W. Rutjes ,J.B. Reitsma ,P.M. Bossuyt ,J. Kleijnen. The development of QUADAS: a tool for the quality assessment of studies of diagnostic accuracy included in systematic reviews. BMC Med Res Methodol. 2003;3 :25 Crossref
  • [14] C.H. Suh ,S.H. Park. Successful publication of systematic review and meta-analysis of studies evaluating diagnostic test accuracy. Korean J Radiol. 2016;17 :5-6 Crossref
  • [15] J.J. Deeks ,P. Macaskill ,L. Irwig. The performance of tests of publication bias and other sample size effects in systematic reviews of diagnostic test accuracy was assessed. J Clin Epidemiol. 2005;58 :882-893 Crossref
  • [16] T. Auer ,M. Edlinger ,J. Bektic ,et al. Performance of PI-RADS version 1 versus version 2 regarding the relation with histopathological results. World J Urol. 2017;35 :687-693 Crossref
  • [17] M. Baldisserotto ,E.J. Neto ,G. Carvalhal ,et al. Validation of PI-RADS v.2 for prostate cancer diagnosis with MRI at 3T using an external phased-array coil. J Magn Reson Imaging. 2016;44 :1354-1359 Crossref
  • [18] P. De Visschere ,N. Lumen ,P. Ost ,K. Decaestecker ,E. Pattyn ,G. Villeirs. Dynamic contrast-enhanced imaging has limited added value over T2-weighted imaging and diffusion-weighted imaging when using PI-RADSv2 for diagnosis of clinically significant prostate cancer in patients with elevated PSA. Clin Radiol. 2017;72 :23-32 Crossref
  • [19] El-Samei HAE-KA ,M.F. Amin ,E.E. Hassan. Assessment of the accuracy of multi-parametric MRI with PI-RADS 2.0 scoring system in the discrimination of suspicious prostatic focal lesions. Egypt J Radiol Nucl Med. 2016;47 :1075-1082
  • [20] Z.Y. Feng ,L. Wang ,X.D. Min ,S.G. Wang ,G.P. Wang ,J. Cai. Prostate cancer detection with multiparametric magnetic resonance imaging: Prostate Imaging Reporting and Data System version 1 versus version 2. Chin Med J. 2016;129 :2451-2459
  • [21] M. Kasel-Seibert ,T. Lehmann ,R. Aschenbach ,et al. Assessment of PI-RADS v2 for the detection of prostate cancer. Eur J Radiol. 2016;85 :726-731 Crossref
  • [22] W.C. Lin ,V.F. Muglia ,G.E. Silva ,S. Chodraui Filho ,R.B. Reis ,A.C. Westphalen ,M.R.I. Multiparametric. of the prostate: diagnostic performance and interreader agreement of two scoring systems. Br J Radiol. 2016;89 :20151056 Crossref
  • [23] Martorana E, Pirola GM, Scialpi M, et al. Lesion volume predicts prostate cancer risk and aggressiveness: validation of its value alone and matched with Prostate Imaging Reporting and Data System score. BJU Int. In press. doi:10.1111/bju.13649 .
  • [24] F.V. Mertan ,M.D. Greer ,J.H. Shih ,et al. Prospective evaluation of the Prostate Imaging Reporting and Data System version 2 for prostate cancer detection. J Urol. 2016;196 :690-696 Crossref
  • [25] B.G. Muller ,J.H. Shih ,S. Sankineni ,et al. Prostate cancer: interobserver agreement and accuracy with the revised Prostate Imaging Reporting and Data System at multiparametric MR imaging. Radiology. 2015;277 :741-750 Crossref
  • [26] J.J. Park ,B.K. Park. Role of PI-RADSv2 with multiparametric MRI in determining who needs active surveillance or definitive treatment according to PRIAS. J Magn Reson Imaging. 2017;45 :1753-1759 Crossref
  • [27] S.Y. Park ,D.C. Jung ,Y.T. Oh ,et al. Prostate cancer: PI-RADS version 2 helps preoperatively predict clinically significant cancers. Radiology. 2016;280 :108-116 Crossref
  • [28] S. Polanec ,T.H. Helbich ,H. Bickel ,et al. Head-to-head comparison of PI-RADS v2 and PI-RADS v1. Eur J Radiol. 2016;85 :1125-1131 Crossref
  • [29] A.R. Rastinehad, N. Waingankar, B. Turkbey, et al. Comparison of multiparametric MRI scoring systems and the impact on cancer detection in patients undergoing MR US fusion guided prostate biopsies. PLoS One. 2015;10:e0143404
  • [30] A.B. Rosenkrantz, J.S. Babb, S.S. Taneja, J.M. Ream. Proposed adjustments to PI-RADS version 2 decision rules: impact on prostate cancer detection. Radiology. In press. doi:10.1148/radiol.2016161124
  • [31] A. Stanzione, M. Imbriaco, S. Cocozza, et al. Biparametric 3T magnetic resonance imaging for prostatic cancer detection in a biopsy-naive patient population: a further improvement of PI-RADS v2? Eur J Radiol. 2016;85:2269-2274
  • [32] N. Tan, W.C. Lin, P. Khoshnoodi, et al. In-bore 3-T MR-guided transrectal targeted prostate biopsy: Prostate Imaging Reporting and Data System version 2-based diagnostic performance for detection of prostate cancer. Radiology. In press. doi:10.1148/radiol.2016152827
  • [33] S. Tewes, N. Mokov, D. Hartung, et al. Standardized reporting of prostate MRI: comparison of the Prostate Imaging Reporting and Data System (PI-RADS) version 1 and version 2. PLoS One. 2016;11:e0162879
  • [34] S. Washino, T. Okochi, K. Saito, et al. Combination of PI-RADS score and PSA density predicts biopsy outcome in biopsy naive patients. BJU Int. 2017;119:225-233
  • [35] S. Woo, S.Y. Kim, J. Lee, S.H. Kim, J.Y. Cho. PI-RADS version 2 for prediction of pathological downgrading after radical prostatectomy: a preliminary study in patients with biopsy-proven Gleason Score 7 (3 + 4) prostate cancer. Eur Radiol. 2016;26:3580-3587
  • [36] C. Zhao, G. Gao, D. Fang, et al. The efficiency of multiparametric magnetic resonance imaging (mpMRI) using PI-RADS Version 2 in the diagnosis of clinically significant prostate cancer. Clin Imaging. 2016;40:885-888
  • [37] M. de Rooij, E.H. Hamoen, J.J. Futterer, J.O. Barentsz, M.M. Rovers. Accuracy of multiparametric MRI for prostate cancer detection: a meta-analysis. Am J Roentgenol. 2014;202:343-351
  • [38] Z.K. Shah, S.N. Elias, R. Abaza, et al. Performance comparison of 1.5-T endorectal coil MRI with 3.0-T nonendorectal coil MRI in patients with prostate cancer. Acad Radiol. 2015;22:467-474
  • [39] Y. Mazaheri, H.A. Vargas, G. Nyman, O. Akin, H. Hricak. Image artifacts on prostate diffusion-weighted magnetic resonance imaging: trade-offs at 1.5 Tesla and 3.0 Tesla. Acad Radiol. 2013;20:1041-1047
  • [40] J.J. Deeks. Systematic reviews in health care: systematic reviews of evaluations of diagnostic and screening tests. BMJ. 2001;323:157-162
  • [41] J.J. Deeks, S. Wisniewski, C. Davenport. Guide to the contents of a Cochrane diagnostic test accuracy protocol. In: J.J. Deeks, P.M. Bossuyt, C. Gatsonis, editors. Cochrane handbook for systematic reviews of diagnostic test accuracy version 1.0.0. Chapter 4. The Cochrane Collaboration, 2013. http://srdta.cochrane.org/