Context
In 2015, the updated Prostate Imaging Reporting and Data System version 2 (PI-RADSv2) for the detection of prostate cancer (PCa) was established. Since then, several studies assessing the value of PI-RADSv2 have been published.
Objective
To review the diagnostic performance of PI-RADSv2 for the detection of PCa.
Evidence acquisition
MEDLINE and EMBASE databases were searched up to December 7, 2016. We included diagnostic accuracy studies that used PI-RADSv2 for PCa detection, using prostatectomy or biopsy as the reference standard. The methodological quality was assessed by two independent reviewers using the Quality Assessment of Diagnostic Accuracy Studies-2 tool. Sensitivity and specificity of all studies were calculated. Results were pooled and plotted in a hierarchical summary receiver operating characteristic plot with further exploration using meta-regression and multiple subgroup analyses. Head-to-head comparison between PI-RADSv1 and PI-RADSv2 was performed for available studies.
Evidence synthesis
Twenty-one studies (3857 patients) were included. The pooled sensitivity was 0.89 (95% confidence interval [CI] 0.86–0.92) with specificity of 0.73 (95% CI 0.60–0.83) for PCa detection. Proportion of patients with PCa, magnetic field strength, and reference standard were significant factors affecting heterogeneity (p < 0.01). Multiple subgroup analyses showed consistent results. In six studies performing head-to-head comparison, PI-RADSv2 demonstrated higher pooled sensitivity of 0.95 (95% CI 0.85–0.98) compared with 0.88 (95% CI 0.80–0.93) for PI-RADSv1 (p = 0.04). However, the pooled specificity was not significantly different (0.73 [95% CI 0.47–0.89] vs 0.75 [95% CI 0.36–0.94], respectively; p = 0.90).
Conclusions
PI-RADSv2 shows good performance for the detection of PCa. PI-RADSv2 has higher pooled sensitivity than PI-RADSv1 without significantly different specificity.
Patient summary
We reviewed all previous studies using Prostate Imaging Reporting and Data System version 2 (PI-RADSv2) for prostate cancer detection. We found that the updated PI-RADSv2 shows significant improvement compared with the original PI-RADSv1.
Prostate cancer (PCa) is the second leading cause of cancer-related mortality in Western men [1]
To bring standardization to the evaluation and reporting of mpMRI of the prostate, the European Society of Urogenital Radiology (ESUR) published a guideline termed Prostate Imaging Reporting and Data System (PI-RADS) in 2012 [5]
To address these issues, the ESUR and American College of Radiology recently released the updated PI-RADS version 2 (PI-RADSv2) [11]
Since then, several studies assessing the value of PI-RADSv2 have been published. However, the diagnostic performance of this new scoring system has not been evaluated systematically. Therefore, the purpose of our study was to assess the diagnostic performance of PI-RADSv2 for the detection of PCa. In addition, we aimed to compare the diagnostic performance of PI-RADSv1 and PI-RADSv2 in studies available for head-to-head comparison.
This meta-analysis was performed and written according to the preferred reporting items for systematic reviews and meta-analyses (PRISMA) guidelines [12]
A computerized search of MEDLINE and EMBASE up to December 7, 2016, was performed in order to identify studies evaluating the diagnostic performance of PI-RADSv2 for the detection of PCa. The search query combined synonyms for PCa, MRI, and PI-RADS as follows: (prostate cancer OR prostatic cancer OR prostate neoplasm OR prostatic neoplasm OR prostate tumor OR prostatic tumor OR prostate carcinoma OR prostatic carcinoma OR PCa) AND (magnetic resonance imaging OR MRI OR MR) AND (prostate imaging reporting and data system OR pi-rads OR pi rads OR pirads). Bibliographies of identified articles were also screened to expand the scope of search. Our search was limited to publications in English.
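The structure of the search query described above (synonyms joined with OR within each concept, concepts joined with AND) can be sketched as follows. This is only an illustration of how the Boolean string is assembled; the function names `or_group` and `build_query` are hypothetical, and the sketch does not reproduce any database-specific field tags or syntax.

```python
# Synonym groups copied from the search strategy described in the text.
PCA_TERMS = [
    "prostate cancer", "prostatic cancer", "prostate neoplasm",
    "prostatic neoplasm", "prostate tumor", "prostatic tumor",
    "prostate carcinoma", "prostatic carcinoma", "PCa",
]
MRI_TERMS = ["magnetic resonance imaging", "MRI", "MR"]
PIRADS_TERMS = [
    "prostate imaging reporting and data system",
    "pi-rads", "pi rads", "pirads",
]


def or_group(terms):
    """Join the synonyms of one concept with OR and wrap them in parentheses."""
    return "(" + " OR ".join(terms) + ")"


def build_query(*groups):
    """Combine concept groups with AND, so a record must match every concept."""
    return " AND ".join(or_group(group) for group in groups)


query = build_query(PCA_TERMS, MRI_TERMS, PIRADS_TERMS)
print(query)
```

Keeping each concept as a separate list makes it straightforward to add or remove a synonym without disturbing the AND/OR structure.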
Studies were included if they satisfied all the following requirements according to the PICOS criteria [12]
Studies were excluded if any of the following criteria were met: (1) studies involving <10 patients; (2) review articles, guidelines, consensus statements, letters, editorials, and conference abstracts; (3) studies using only PI-RADSv1 for the evaluation of mpMRI of the prostate; (4) studies focusing on topics other than using the PI-RADSv2 system for diagnosing PCa (eg, staging and prediction of biochemical recurrence); and (5) studies with overlapping patient population.
Two reviewers (S.W. and C.H.S., with 3 yr of experience in performing systematic reviews and meta-analyses) independently evaluated the eligibility of the selected studies from the literature. Disagreements, if present between the two reviewers, were resolved by consensus via discussion with a third reviewer (S.Y.K.).
We extracted the following data regarding study design and results from the selected studies using a standardized form:
Study characteristics: authors, year of publication, country and institution of origin, duration of patient recruitment, and study design (prospective vs retrospective and consecutive or not)
Demographic and clinical characteristics: sample size, number of patients with PCa, patient age, prostate-specific antigen (PSA) level and Gleason score, number of previous biopsies, and PCa diagnosis prior to mpMRI
Technical characteristics of mpMRI: scanner model and manufacturer, magnetic field strength (1.5 vs 3 T), coil type (endorectal vs pelvic phased array), and specific sequences used (T2WI, DWI, DCE, or MR spectroscopy)
Interpretation of mpMRI: number of reviewers and experience in prostate mpMRI, independent or consensus reading, and blinding to clinicopathological information
Reference standard: type of reference standard (radical prostatectomy, targeted biopsy, or systematic biopsy), interval between MRI and pathology, outcomes assessed (any PCa vs csPCa), definition of csPCa (studies assessing “clinically significant”, “aggressive”, or “high-grade” PCa were all considered to assess csPCa; however, only studies that used the definition as provided by the PI-RADSv2 guideline [Gleason score ≥7 (including 3 + 4), volume >0.5 ml, or extraprostatic extension] were considered not to have concern for applicability), separate analysis for the PZ and TZ, and type of analysis (per patient vs per lesion)
Diagnostic performance of PI-RADSv2 including criteria or cutoff values (in case of multiple readers, the results of the most experienced reader were extracted for this meta-analysis)
The methodological quality of the included studies was assessed using tailored questionnaires and criteria provided by Quality Assessment of Diagnostic Accuracy Studies-2 [13]
The diagnostic performance of PI-RADSv2 for the detection of PCa was the primary outcome for this meta-analysis. In addition, a comparison between the diagnostic performance of PI-RADSv2 and that of PI-RADSv1 using studies that reported head-to-head comparison data of the two PI-RADS versions was considered a secondary outcome.
Pooled estimates of sensitivity and specificity were calculated using hierarchical logistic regression modeling including bivariate modeling and hierarchical summary receiver operating characteristic (HSROC) modeling [14]
We performed meta-regression analyses to investigate the cause of heterogeneity. The following covariates were considered for the bivariate model: (1) proportion of patients with PCa (>50% vs ≤50%), (2) magnet strength of MRI (3 vs 1.5 T), (3) use of endorectal coil, (4) cutoff value (≥4 vs ≥3), (5) reference standard (radical prostatectomy vs biopsy), and (6) type of analysis (per patient vs per lesion). In addition, multiple subgroup analyses were performed for cutoff value, outcome, and previous biopsy history to assess various clinical settings: (1) a cutoff value of ≥4 for all studies, (2) a cutoff value of ≥3 for all studies, (3) a cutoff value of ≥4 for determining any PCa, (4) a cutoff value of ≥3 for determining any PCa, (5) a cutoff value of ≥4 for determining csPCa, (6) a cutoff value of ≥3 for determining csPCa, (7) a cutoff value of ≥4 in studies using per-patient analysis, (8) a cutoff value of ≥4 in studies using per-lesion analysis, (9) studies analyzing PZ PCa, (10) studies analyzing TZ PCa, (11) patients without previous biopsies, and (12) patients with previous biopsies. The “metandi” and “midas” modules in Stata 10.0 (StataCorp LP, College Station, TX, USA) and “mada” package in R software version 3.2.1 (R Foundation for Statistical Computing, Vienna, Austria) were used for statistical analyses, with p < 0.05 signifying statistical significance.
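The inputs to the pooled analysis are per-study sensitivity and specificity values derived from 2 × 2 tables. The following is a minimal sketch of that per-study step only; it does not implement the bivariate or HSROC models used for pooling. The Wilson score interval is our choice for illustration and is not stated in the text, and the counts in the usage example are hypothetical.

```python
import math


def wilson_interval(successes, total, z=1.96):
    """Wilson score interval for a binomial proportion (z = 1.96 for ~95%)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    denom = 1 + z ** 2 / total
    centre = (p + z ** 2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z ** 2 / (4 * total ** 2)) / denom
    return centre - half, centre + half


def accuracy_from_2x2(tp, fp, fn, tn):
    """Per-study sensitivity and specificity with approximate 95% intervals."""
    return {
        "sensitivity": tp / (tp + fn),          # true positives among diseased
        "sensitivity_ci": wilson_interval(tp, tp + fn),
        "specificity": tn / (tn + fp),          # true negatives among non-diseased
        "specificity_ci": wilson_interval(tn, tn + fp),
    }


# Hypothetical counts, not taken from any included study.
result = accuracy_from_2x2(tp=89, fp=27, fn=11, tn=73)
print(result)
```

A study for which these four counts cannot be reconstructed cannot contribute to the pooled estimates, which is why insufficient 2 × 2 data is treated as an exclusion criterion.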
A systematic literature search initially identified 287 articles. After removing 46 duplicates, screening of the 241 titles and abstracts yielded 105 potentially eligible articles. Full-text reviews were performed, and 84 studies were excluded for the following reasons: not in the field of interest (n = 80, including 68 studies that used only PI-RADSv1), insufficient data to reconstruct 2 × 2 tables (n = 2), and shared study population with other studies (n = 2). Ultimately, 21 original articles including a total of 3857 patients assessing the diagnostic performance of PI-RADSv2 were included in the meta-analysis 16
Fig. 1
PRISMA flow diagram showing study selection process for meta-analysis. a Included original articles for qualitative and quantitative analyses are references 16
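The counts reported for the study selection process can be checked for internal consistency with a few lines of arithmetic. The sketch below uses only the numbers stated in the text; the number of records excluded at title and abstract screening is derived here and is not reported explicitly.

```python
# Counts as reported in the study selection paragraph.
identified = 287
duplicates = 46
full_text_reviewed = 105
full_text_exclusions = {
    "not in the field of interest": 80,
    "insufficient data for 2 x 2 tables": 2,
    "shared study population": 2,
}

screened = identified - duplicates
# Derived value (not stated in the text): records dropped at screening.
excluded_at_screening = screened - full_text_reviewed
excluded_at_full_text = sum(full_text_exclusions.values())
included = full_text_reviewed - excluded_at_full_text

print(screened, excluded_at_screening, excluded_at_full_text, included)
```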
Patient characteristics are shown in Table 1. The size of the study population ranged from 49 to 456 patients, with the percentage of those with PCa ranging from 37% to 100%. The patients had a median age of 62–69.6 yr, median PSA of 3.97–15 ng/ml, and a Gleason score ranging from 5 to 10. Patients had already been diagnosed with PCa prior to MRI in all or some of the study populations in seven studies [16
Table 1. Patient characteristics
Characteristics of the studies are summarized in Table 2 . MRI was performed using 3-T scanners in 16 studies 16
a Blinded but aware that patients had biopsy-proven PCa.
b PI-RADSv2 scores were generated from reports based on PI-RADSv1 and simplified qualitative system.
c Epstein's criteria = Gleason pattern ≥4, or Gleason 3 + 3 disease with core length ≥50% and/or >2 cores positive.
d For transition zone, cutoff value = 4.
csPCa = clinically significant prostate cancer; GS = Gleason score; EPE = extraprostatic extension; MRGB = magnetic resonance imaging-guided biopsy; MRI = magnetic resonance imaging; MRI–TRUS = fusion of magnetic resonance imaging and transrectal ultrasound images; NR = not reported; PCa = prostate cancer; PI-RADSv2 = Prostate Imaging Reporting and Data System version 2; PZ = peripheral zone; RP = radical prostatectomy; STRUSGB = systematic transrectal ultrasound-guided biopsy; TTB = transperineal template biopsy; TZ = transition zone.
Table 2. Study characteristics
Overall, the quality of the studies was not considered high, mainly due to the patient selection domain (Fig. 2). Regarding the patient selection domain, there was generally a high risk of bias as all but four of the studies were retrospective in nature 16
The sensitivity and specificity of individual studies were 73–100% and 8–100%, respectively. The Q-test revealed that substantial heterogeneity was present (p < 0.001). The Higgins I² statistic demonstrated substantial heterogeneity in terms of the sensitivity (I² = 85.55%) and considerable heterogeneity in terms of the specificity (I² = 95.30%). The coupled forest plot of the sensitivity and specificity demonstrated the absence of a threshold effect (Fig. 3). The Spearman correlation coefficient between the sensitivity and the false-positive rate was 0.45 (95% confidence interval [CI] 0.023–0.738), also indicating the lack of a threshold effect.
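The two heterogeneity checks mentioned above can be illustrated with a short sketch: Cochran's Q with the Higgins I² statistic, computed as I² = max(0, (Q − df)/Q) × 100, and the Spearman correlation between sensitivity and false-positive rate. The logit transformation, the 0.5 continuity correction, and the inverse-variance weights are assumptions made for illustration and may differ from what the Stata and R routines used in the analysis do internally; all data in the usage example are hypothetical.

```python
import math


def cochran_q_and_i2(events, totals):
    """Cochran's Q and Higgins I² (%) for logit-transformed proportions."""
    effects, weights = [], []
    for e, n in zip(events, totals):
        a, b = e + 0.5, n - e + 0.5                 # 0.5 continuity correction
        effects.append(math.log(a / b))             # logit of the proportion
        weights.append(1.0 / (1.0 / a + 1.0 / b))   # inverse-variance weight
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return q, i2


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical per-study data: true positives and number of diseased patients.
q, i2 = cochran_q_and_i2(events=[90, 75, 60], totals=[100, 100, 80])
# Hypothetical per-study sensitivities and false-positive rates.
rho = spearman([0.90, 0.85, 0.95, 0.80], [0.30, 0.20, 0.25, 0.10])
print(q, i2, rho)
```

A strong positive correlation between sensitivity and false-positive rate would suggest that studies differ mainly in the positivity threshold they apply; a value of about 0.6 or higher is a commonly cited rule of thumb for this, which we mention as background rather than as a criterion stated in the text.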
Fig. 3
Coupled forest plots of pooled sensitivity and specificity. Numbers are pooled estimates with 95% CI in parentheses. Corresponding heterogeneity statistics are provided at bottom right corners. Horizontal lines indicate 95% CIs. CI = confidence intervals.
For all 21 studies combined, the pooled sensitivity was 0.89 (95% CI 0.86–0.92) with specificity of 0.73 (95% CI 0.60–0.83; Fig. 3). In the HSROC curve, there was a large difference between the 95% confidence region and the 95% prediction region, thus indicating heterogeneity between the studies (Fig. 4). The area under the HSROC curve was 0.91 (95% CI 0.88–0.93). According to the Deeks’ funnel plot, the likelihood of publication bias was low, with a p value of 0.75 for the slope coefficient (Fig. 5).
Fig. 4
Hierarchical summary receiver operating characteristic curve of the diagnostic performance of PI-RADSv2 for detecting prostate cancer. HSROC = hierarchical summary receiver operating characteristic; PI-RADSv2 = Prostate Imaging Reporting and Data System version 2.
Fig. 5
Deeks’ funnel plot. A p value of 0.75 suggests that the likelihood of publication bias is low.
For the six studies that provided a head-to-head comparison between PI-RADSv1 and PI-RADSv2, PI-RADSv2 demonstrated higher pooled sensitivity of 0.95 (95% CI 0.85–0.98) compared with 0.88 (95% CI 0.80–0.93) for PI-RADSv1 (p = 0.04). However, the pooled specificity did not show a significant difference between the two versions of PI-RADS: 0.73 (95% CI 0.47–0.89) for v2 and 0.75 (95% CI 0.36–0.94) for v1 (p = 0.90).
As we found considerable heterogeneity among the included studies, meta-regression analyses were performed (Supplementary Table 1). Among several potential variables, proportion of patients with PCa, magnetic field strength, and reference standard were significant factors affecting the heterogeneity (p < 0.01 for all). However, among these three, only specificity according to the proportion of patients with PCa showed a statistically significant and clinically meaningful difference: 0.65 (95% CI 0.52–0.78) in studies with ≤50% of patients with PCa versus 0.86 (95% CI 0.75–0.97) in studies with >50% of patients with PCa. Otherwise, no clinically meaningful differences were seen: for magnet strength (3 vs 1.5 T), sensitivity of 0.90 (95% CI 0.86–0.94) versus 0.89 (95% CI 0.81–0.97, p = 0.03) and specificity of 0.73 (95% CI 0.59–0.86) versus 0.72 (95% CI 0.44–1.00, p = 0.81); for reference standard (radical prostatectomy vs biopsy), sensitivity of 0.89 (95% CI 0.83–0.95) versus 0.91 (95% CI 0.88–0.94, p < 0.01) and specificity of 0.65 (95% CI 0.37–0.94) versus 0.73 (95% CI 0.58–0.87, p = 0.48). Other variables, including the cutoff value, use of endorectal coil, and type of analysis (per patient vs per lesion), were not significant factors (p = 0.32–0.70).
As there were four studies that used both ≥3 and ≥4 as cutoff values [17
Based on the localization of PCa, the pooled sensitivity was 0.93 (95% CI 0.87–0.96) with specificity of 0.68 (95% CI 0.43–0.86) in seven studies analyzing PZ cancers [20
Studies including patients without previous biopsies yielded sensitivity of 0.82 (95% CI 0.72–0.90) and specificity of 0.75 (95% CI 0.65–0.83), whereas the diagnostic performance values were 0.87 (95% CI 0.80–0.92) and 0.71 (95% CI 0.42–0.89) in studies including patients with a history of previous biopsy.
In our meta-analysis, we assessed the diagnostic accuracy of PI-RADSv2 for detecting PCa. The pooled sensitivity and specificity of all 21 studies were 0.89 (95% CI 0.86–0.92) and 0.73 (95% CI 0.60–0.83), respectively. When comparing our data with the only two existing meta-analyses using mpMRI for detecting PCa, a trend toward higher sensitivity and lower specificity can be inferred. In the study by de Rooij et al [37]
Considering that one of the main intentions for the generation of PI-RADS was to standardize reporting of mpMRI in order to decrease variability and bring about widespread acceptance and implementation in daily practice, it was promising to find that nearly all (20 of 21) studies used PI-RADSv2 strictly according to published guidelines [11]
In the current study, subgroup analyses were performed to account for differences in outcomes (any cancer vs clinically significant cancer). There was no significant difference between the two outcomes, irrespective of whether a cutoff of ≥3 or ≥4 was used. However, the definition of clinically significant cancer differed among the 13 studies. Only three studies defined csPCa strictly according to the PI-RADSv2 guidelines (Gleason score ≥7 [including 3 + 4], volume >0.5 ml, or extraprostatic extension) [11]
In this meta-analysis, we looked into the technical aspects of MRI. Meta-regression analyses revealed that the use of endorectal coil was not a statistically significant factor. Furthermore, although magnet strength showed a statistically significant difference between 3 and 1.5 T, the difference was not clinically meaningful (sensitivity of 0.90 vs 0.89, respectively; p = 0.03). Although there had been debate over these two issues in the past, both 3 and 1.5 T are now well established, and the overall benefit of using an endorectal coil is not evident [38
Regarding the methods of analysis in the studies, there was significant heterogeneity regarding reference standard and type of analysis. Radical prostatectomy was the reference standard in five studies, while the majority were based on a combination of systematic and targeted biopsies. The possibility of PCa despite negative biopsy results in the latter group should be kept in mind. In addition, approximately half of the studies reported outcomes in a per-patient manner (n = 11) and the other half in a per-lesion manner (n = 10). Per-lesion analysis is known to take into account the performance of localizing the disease; however, this was not shown to be a significant factor in the meta-regression analysis.
Our meta-analysis had some limitations. Nearly all studies were retrospective in study design, resulting in a high risk of bias for patient selection. It is possible that pooling data from predominantly retrospective studies may have led to increased diagnostic sensitivity [40]
PI-RADSv2 shows good performance for the detection of PCa with pooled sensitivity of 0.89 and specificity of 0.73. PI-RADSv2 has higher pooled sensitivity compared with PI-RADSv1 without significantly different specificity.
Author contributions: Sang Youn Kim had full access to all the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis.
Study concept and design: Woo, Suh, S.Y. Kim.
Acquisition of data: Woo, Suh, S.Y. Kim.
Analysis and interpretation of data: Woo, Suh, S.Y. Kim.
Drafting of the manuscript: Woo.
Critical revision of the manuscript for important intellectual content: Suh, S.Y. Kim, Cho, S.H. Kim.
Statistical analysis: Suh.
Obtaining funding: None.
Administrative, technical, or material support: None.
Supervision: S.Y. Kim, Cho, S.H. Kim.
Other: None.
Financial disclosures: Sang Youn Kim certifies that all conflicts of interest, including specific financial interests and relationships and affiliations relevant to the subject matter or materials discussed in the manuscript (eg, employment/affiliation, grants or funding, consultancies, honoraria, stock ownership or options, expert testimony, royalties, or patents filed, received, or pending), are the following: None.
Funding/Support and role of the sponsor: None.
Prostate cancer (PCa) is the second leading cause of cancer-related mortality in Western men [1]
To bring standardization to the evaluation and reporting of mpMRI of the prostate, the European Society of Urogenital Radiology (ESUR) published a guideline termed Prostate Imaging Reporting and Data System (PI-RADS) in 2012 [5]
To address these issues, the ESUR and American College of Radiology recently released the updated PI-RADS version 2 (PI-RADSv2) [11]
Since then, several studies assessing the value of PI-RADSv2 have been published. However, the diagnostic performance of this new scoring system has not been evaluated systematically. Therefore, the purpose of our study was to assess the diagnostic performance of PI-RADSv2 for the detection of PCa. In addition, we aimed to compare the diagnostic performance of PI-RADSv1 and PI-RADSv2 in studies available for head-to-head comparison.
This meta-analysis was performed and written according to the preferred reporting items for systematic reviews and meta-analyses (PRISMA) guidelines [12]
A computerized search of MEDLINE and EMBASE up to December 7, 2016, was performed in order to identify studies evaluating the diagnostic performance of PI-RADSv2 for the detection of PCa. The search query combined synonyms for PCa, MRI, and PI-RADS as follows: (prostate cancer OR prostatic cancer OR prostate neoplasm OR prostatic neoplasm OR prostate tumor OR prostatic tumor OR prostate carcinoma OR prostatic carcinoma OR PCa) AND (magnetic resonance imaging OR MRI OR MR) AND (prostate imaging reporting and data system OR pi-rads OR pi rads OR pirads). Bibliographies of identified articles were also screened to expand the scope of search. Our search was limited to publications in English.
Studies were included if they satisfied all the following requirements according to the PICOS criteria [12]
Studies were excluded if any of the following criteria were met: (1) studies involving <10 patients; (2) review articles, guidelines, consensus statements, letters, editorials, and conference abstracts; (3) studies using only PI-RADSv1 for the evaluation of mpMRI of the prostate; (4) studies focusing on topics other than using the PI-RADSv2 system for diagnosing PCa (ie, staging and prediction of biochemical recurrence); and (5) studies with overlapping patient population.
Two reviewers (S.W. and C.H.S., with 3 yr of experience in performing systematic reviews and meta-analyses) independently evaluated the eligibility of the selected studies from the literature. Disagreements, if present between the two reviewers, were resolved by consensus via discussion with a third reviewer (S.Y.K.).
We extracted the following data regarding study design and results from the selected studies using a standardized form:
Study characteristics—authors, year of publication, country and institution of origin, duration of patient recruitment, and study design (prospective vs retrospective and consecutive or not)
Demographic and clinical characteristics—sample size, number of patients with PCa, patient age, prostate-specific antigen (PSA) level and Gleason score, number of previous biopsies, and PCa diagnosis prior to mpMRI
Technical characteristics of mpMRI—scanner model and manufacturer, magnetic field strength (1.5 vs 3 T), coil type (endorectal vs pelvic phased array), and specific sequences used (T2WI, DWI, DCE, or MR spectroscopy)
Interpretation of mpMRI—number of reviewers and experience in prostate mpMRI, independent or consensus reading, and blinding to clinicopathological information
Reference standard—type of reference standard (radical prostatectomy, targeted biopsy, or systematic biopsy), interval between MRI and pathology, outcomes assessed (any PCa vs csPCa), definition of csPCa (studies assessing “clinically significant”, “aggressive”, or “high-grade” PCa were all considered to assess csPCa; however, only studies that used the definition as provided by the PI-RADSv2 guideline [Gleason score >7 [3 + 4], volume >0.5 ml, or extraprostatic extension] were considered not to have concern for applicability), separate analysis for the PZ and TZ, and type of analysis (per patient vs per lesion)
Diagnostic performance of PI-RADSv2 including criteria or cutoff values (in case of multiple readers, the results of the most experienced reader were extracted for this meta-analysis)
The methodological quality of the included studies was assessed using tailored questionnaires and criteria provided by Quality Assessment of Diagnostic Accuracy Studies-2 [13]
The diagnostic performance of PI-RADSv2 for the detection of PCa was the primary outcome for this meta-analysis. In addition, a comparison between the diagnostic performance of PI-RADSv2 and that of PI-RADSv1 using studies that reported head-to-head comparison data of the two PI-RADS versions was considered a secondary outcome.
Pooled estimates of sensitivity and specificity were calculated using hierarchical logistic regression modeling including bivariate modeling and hierarchical summary receiver operating characteristic (HSROC) modeling [14]
We performed meta-regression analyses to investigate the cause of heterogeneity. The following covariates were considered for the bivariate model: (1) proportion of patients with PCa (>50% vs ≤50%), (2) magnet strength of MRI (3 vs 1.5 T), (3) use of endorectal coil, (4) cutoff value (≥4 vs ≥3), (5) reference standard (radical prostatectomy vs biopsy), and (6) type of analysis (per patient vs per lesion). In addition, multiple subgroup analyses were performed for cutoff value, outcome, and previous biopsy history to assess various clinical settings: (1) a cutoff value of ≥4 for all studies, (2) a cutoff value of ≥3 for all studies, (3) a cutoff value of ≥4 for determining any PCa, (4) a cutoff value of ≥3 for determining any PCa, (5) a cutoff value of ≥4 for determining csPCa, (6) a cutoff value of ≥3 for determining csPCa, (7) a cutoff value of ≥4 in studies using per-patient analysis, (8) a cutoff value of ≥4 in studies using per-lesion analysis, (9) studies analyzing PZ PCa, (10) studies analyzing TZ PCa, (11) patients without previous biopsies, and (12) patients with previous biopsies. The “metandi” and “midas” modules in Stata 10.0 (StataCorp LP, College Station, TX, USA) and “mada” package in R software version 3.2.1 (R Foundation for Statistical Computing, Vienna, Austria) were used for statistical analyses, with p < 0.05 signifying statistical significance.
A systematic literature search initially identified 287 articles. After removing 46 duplicates, screening of the 241 titles and abstracts yielded 105 potentially eligible articles. Full-text reviews were performed, and 84 studies were excluded
for the following reasons: not in the field of interest ( n = 80, including 68 studies that used only PI-RADSv1), insufficient data to reconstruct 2 × 2 tables ( n = 2), and shared study population with other studies ( n = 2). Ultimately, 21 original articles including a total of 3857 patients assessing the diagnostic performance of PI-RADSv2 were included in the meta-analysis 16
Fig. 1
PRISMA flow diagram showing study selection process for meta-analysis. a Included original articles for qualitative and quantitative analyses are references 16
Patient characteristics are shown in Table 1 . The size of the study population ranged from 49 to 456 patients, with the percentage of
those with PCa ranging from 37% to 100%. The patients had a median age of 62–69.6 yr, median PSA of 3.97–15 ng/ml, and a Gleason score ranging from 5 to 10. Patients had already been diagnosed with PCa prior to MRI in all or some of the study
populations in seven studies [16
Table 1Patient characteristics
Characteristics of the studies are summarized in Table 2 . MRI was performed using 3-T scanners in 16 studies 16
a Blinded but aware that patients had biopsy-proven PCa.
b PI-RADSv2 scores were generated from reports based on PI-RADSv1 and simplified qualitative system.
c Epstein's criteria = Gleason pattern ≥4, or Gleason 3 + 3 disease with core length ≥50% and/or >2 cores positive.
d For transition zone, cutoff value = 4.
csPCa = clinically significant prostate cancer; GS = Gleason score; EPE = extraprostatic extension; MRGB = magnetic resonance imaging-guided biopsy; MRI = magnetic resonance imaging; MRI–TRUS = fusion of magnetic resonance imaging and transrectal ultrasound images; NR = not reported; PCa = prostate cancer; PI-RADSv2 = Prostate Imaging Reporting and Data System version 2; PZ = peripheral zone; RP = radical prostatectomy; STRUSGB = systematic transrectal ultrasound-guided biopsy; TTB = transperineal template biopsy; TZ = transition zone.Table 2Study characteristics
Overall, the quality of the studies was not considered high, mainly due to the patient selection domain ( Fig. 2 ). Regarding the patient
selection domain, there was generally a high risk of bias as all but four of the studies were retrospective in nature 16
The sensitivity and specificity of individual studies were 73–100% and 8–100%, respectively. The Q-test revealed that substantial heterogeneity was present ( p < 0.001). The Higgins I 2 statistics demonstrated substantial heterogeneity in terms of the sensitivity ( I 2 = 85.55%) and considerable heterogeneity in terms of the specificity ( I 2 = 95.30%). The coupled forest plot of the sensitivity and specificity demonstrated the absence of a threshold effect ( Fig. 3 ). The Spearman correlation coefficient between the sensitivity and the false-positive rate was 0.45 (95% confidence interval [CI] 0.023–0.738), also indicating the lack of a threshold effect.
Fig. 3
Coupled forest plots of pooled sensitivity and specificity. Numbers are pooled estimates with 95% CI in parentheses. Corresponding heterogeneity statistics are provided at bottom right corners. Horizontal lines indicate 95% CIs. CI = confidence intervals.
For all 21 studies combined, the pooled sensitivity was 0.89 (95% CI 0.86–0.92) with specificity of 0.73 (95% CI 0.60–0.83; Fig. 3 ). In the HSROC curve, there was a large difference between the 95% confidence region and the 95% prediction region, thus indicating heterogeneity between the studies ( Fig. 4 ). The area under the HSROC curve was 0.91 (95% CI 0.88–0.93). According to the Deeks’ funnel plot, the likelihood of publication bias was low, with a p value of 0.75 for the slope coefficient ( Fig. 5 ).
Fig. 4
Hierarchical summary receiver operating characteristic curve of the diagnostic performance of PI-RADSv2 for detecting prostate cancer. HSROC = hierarchical summary receiver operating characteristic; PI-RADSv2 = Prostate Imaging Reporting and Data System version 2.
Fig. 5
Deeks’ funnel plot. A p value of 0.75 suggests that the likelihood of publication bias is low.
For the six studies that provided a head-to-head comparison between PI-RADSv1 and PI-RADSv2, PI-RADSv2 demonstrated higher pooled sensitivity of 0.95 (95% CI 0.85–0.98) compared with 0.88 (95% CI 0.80–0.93) for PI-RADSv1 ( p = 0.04). However, the pooled specificity did not show a significant difference between the two versions of PI-RADS: 0.73 (95% CI 0.47–0.89) for v2 and 0.75 (95% CI 0.36–0.94) for v1 ( p = 0.90).
As we found a considerable heterogeneity among the included studies, meta-regression analyses were performed (Supplementary Table 1 ). Among several potential variables, proportion of patients with PCa, magnetic field strength, and reference standard were significant factors affecting the heterogeneity ( p < 0.01 for all). However, among these three, only specificity according to the proportion of patients with PCa showed a statistically significant and clinically meaningful difference: 0.65 (95% CI 0.52–0.78) in studies ≤50% of patients with PCa versus 0.86 (0.75–0.97) in studies with >50% of patients with PCa. Otherwise, no clinically meaningful differences were seen: for magnet strength (3 vs 1.5 T), sensitivity of 0.90 (95% CI 0.86–0.94) versus 0.89 (95% CI 0.81–0.97, p = 0.03) and specificity of 0.73 (95% CI 0.59–0.86) versus 0.72 (95% CI 0.44–1.00, p = 0.81); for reference standard (radical prostatectomy vs biopsy), sensitivity of 0.89 (95% CI 0.83–0.95) versus 0.91 (95% CI 0.88–0.94, p < 0.01) and specificity of 0.65 (95% CI 0.37–0.94) versus 0.73 (95% CI 0.58–0.87, p = 0.48). Other variables, including the cutoff value, use of endorectal coil, and type of analysis (per patient vs per lesion), were not significant factors ( p = 0.32–0.70).
As there were four studies that used both ≥3 and ≥4 as cutoff values [17
Based on the localization of PCa, the pooled sensitivity was 0.93 (95% CI 0.87–0.96) with specificity of 0.68 (95% CI 0.43–0.86) in seven studies analyzing PZ cancers [20
Studies including patients without previous biopsies yielded sensitivity of 0.82 (95% CI 0.72–0.90) and specificity of 0.75 (95% CI 0.65–0.83), whereas the diagnostic performance values were 0.87 (95% CI 0.80–0.92) and 0.71 (95% CI 0.42–0.89) in studies including patients with a history of previous biopsy.
In our meta-analysis, we assessed the diagnostic accuracy of PI-RADSv2 for detecting PCa. The pooled sensitivity and specificity of all 21 studies were 0.89 (95% CI 0.86–0.92) and 0.73 (95% CI 0.60–0.83), respectively. When comparing our data with the only two existing meta-analyses using mpMRI for detecting PCa, a trend toward higher sensitivity and lower specificity can be inferred. In the study by de Rooij et al [37]
Considering that one of the main intentions for the generation of PI-RADS was to standardize reporting of mpMRI in order to decrease variability and bring about widespread acceptance and implementation in daily practice, it was promising to find that nearly all (20 of 21) studies used PI-RADSv2 strictly according to published guidelines [11]
In the current study, subgroup analyses were performed to account for differences in outcomes (any cancer vs clinically significant cancer). There was no significant difference between the two outcomes, irrespective of whether a cutoff of ≥3 or ≥4 was used. However, the definition of clinically significant cancer differed among the 13 studies. Only three studies defined csPCa strictly according to the PI-RADSv2 guidelines (Gleason score ≥7 [including 3 + 4], volume >0.5 ml, or extraprostatic extension) [11]
In this meta-analysis, we looked into the technical aspects of MRI. Meta-regression analyses revealed that the use of endorectal coil was not a statistically significant factor. Furthermore, although magnet strength showed statistically significant differences between 3 and 1.5 T, this did not prove to be clinically meaningful (sensitivity of 0.90 vs 0.89, respectively; p = 0.03). Although there had been debate over these two issues in the past, both 3 and 1.5 T are now well established, and the overall benefit of using an endorectal coil is not evident [38
Regarding the methods of analysis in the studies, there was significant heterogeneity regarding reference standard and type of analysis. Radical prostatectomy was the reference standard in five studies, while the majority were based on a combination of systematic and targeted biopsies. The possibility of PCa despite negative biopsy results in the latter group should be kept in mind. In addition, the studies were split almost evenly between per-patient (n = 11) and per-lesion (n = 10) reporting of outcomes. Per-lesion analysis is known to take into account the performance of localizing the disease; however, this was not shown to be a significant factor in the meta-regression analysis.
Our meta-analysis had some limitations. Nearly all studies were retrospective in study design, resulting in a high risk of bias for patient selection. It is possible that pooling data from predominantly retrospective studies may have led to increased diagnostic sensitivity [40]
PI-RADSv2 shows good performance for the detection of PCa with pooled sensitivity of 0.89 and specificity of 0.73. PI-RADSv2 has higher pooled sensitivity compared with PI-RADSv1 without significantly different specificity.
Author contributions: Sang Youn Kim had full access to all the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis.
Study concept and design: Woo, Suh, S.Y. Kim.
Acquisition of data: Woo, Suh, S.Y. Kim.
Analysis and interpretation of data: Woo, Suh, S.Y. Kim.
Drafting of the manuscript: Woo.
Critical revision of the manuscript for important intellectual content: Suh, S.Y. Kim, Cho, S.H. Kim.
Statistical analysis: Suh.
Obtaining funding: None.
Administrative, technical, or material support: None.
Supervision: S.Y. Kim, Cho, S.H. Kim.
Other: None.
Financial disclosures: Sang Youn Kim certifies that all conflicts of interest, including specific financial interests and relationships and affiliations relevant to the subject matter or materials discussed in the manuscript (eg, employment/affiliation, grants or funding, consultancies, honoraria, stock ownership or options, expert testimony, royalties, or patents filed, received, or pending), are the following: None.
Funding/Support and role of the sponsor: None.
Prostate cancer (PCa) is the second leading cause of cancer-related mortality in Western men [1]
To bring standardization to the evaluation and reporting of mpMRI of the prostate, the European Society of Urogenital Radiology (ESUR) published a guideline termed Prostate Imaging Reporting and Data System (PI-RADS) in 2012 [5]
To address these issues, the ESUR and American College of Radiology recently released the updated PI-RADS version 2 (PI-RADSv2) [11]
Since then, several studies assessing the value of PI-RADSv2 have been published. However, the diagnostic performance of this new scoring system has not been evaluated systematically. Therefore, the purpose of our study was to assess the diagnostic performance of PI-RADSv2 for the detection of PCa. In addition, we aimed to compare the diagnostic performance of PI-RADSv1 and PI-RADSv2 in studies available for head-to-head comparison.
This meta-analysis was performed and written according to the preferred reporting items for systematic reviews and meta-analyses (PRISMA) guidelines [12]
A computerized search of MEDLINE and EMBASE up to December 7, 2016, was performed in order to identify studies evaluating the diagnostic performance of PI-RADSv2 for the detection of PCa. The search query combined synonyms for PCa, MRI, and PI-RADS as follows: (prostate cancer OR prostatic cancer OR prostate neoplasm OR prostatic neoplasm OR prostate tumor OR prostatic tumor OR prostate carcinoma OR prostatic carcinoma OR PCa) AND (magnetic resonance imaging OR MRI OR MR) AND (prostate imaging reporting and data system OR pi-rads OR pi rads OR pirads). Bibliographies of identified articles were also screened to expand the scope of search. Our search was limited to publications in English.
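The Boolean structure of the search query can be sketched in a few lines of Python. This is only an illustration of how the three synonym groups combine; the helper names `or_group` and `build_query` are hypothetical, and the actual database interfaces may require field tags or other syntax that is not reported here.

```python
# Synonym groups copied from the search strategy described in the text.
PCA_TERMS = [
    "prostate cancer", "prostatic cancer", "prostate neoplasm",
    "prostatic neoplasm", "prostate tumor", "prostatic tumor",
    "prostate carcinoma", "prostatic carcinoma", "PCa",
]
MRI_TERMS = ["magnetic resonance imaging", "MRI", "MR"]
PIRADS_TERMS = [
    "prostate imaging reporting and data system",
    "pi-rads", "pi rads", "pirads",
]


def or_group(terms):
    """Join the synonyms of one concept with OR inside parentheses (hypothetical helper)."""
    return "(" + " OR ".join(terms) + ")"


def build_query(*groups):
    """Require every concept by joining the OR-groups with AND (hypothetical helper)."""
    return " AND ".join(or_group(group) for group in groups)


query = build_query(PCA_TERMS, MRI_TERMS, PIRADS_TERMS)
print(query)
```

Keeping each concept as a separate list makes it easy to see that a record must match at least one term from every group.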
Studies were included if they satisfied all the following requirements according to the PICOS criteria [12]
Studies were excluded if any of the following criteria were met: (1) studies involving <10 patients; (2) review articles, guidelines, consensus statements, letters, editorials, and conference abstracts; (3) studies using only PI-RADSv1 for the evaluation of mpMRI of the prostate; (4) studies focusing on topics other than using the PI-RADSv2 system for diagnosing PCa (ie, staging and prediction of biochemical recurrence); and (5) studies with overlapping patient population.
Two reviewers (S.W. and C.H.S., with 3 yr of experience in performing systematic reviews and meta-analyses) independently evaluated the eligibility of the selected studies from the literature. Disagreements, if present between the two reviewers, were resolved by consensus via discussion with a third reviewer (S.Y.K.).
We extracted the following data regarding study design and results from the selected studies using a standardized form:
Study characteristics: authors, year of publication, country and institution of origin, duration of patient recruitment, and study design (prospective vs retrospective and consecutive or not)
Demographic and clinical characteristics: sample size, number of patients with PCa, patient age, prostate-specific antigen (PSA) level and Gleason score, number of previous biopsies, and PCa diagnosis prior to mpMRI
Technical characteristics of mpMRI: scanner model and manufacturer, magnetic field strength (1.5 vs 3 T), coil type (endorectal vs pelvic phased array), and specific sequences used (T2WI, DWI, DCE, or MR spectroscopy)
Interpretation of mpMRI: number of reviewers and experience in prostate mpMRI, independent or consensus reading, and blinding to clinicopathological information
Reference standard: type of reference standard (radical prostatectomy, targeted biopsy, or systematic biopsy), interval between MRI and pathology, outcomes assessed (any PCa vs csPCa), definition of csPCa (studies assessing “clinically significant”, “aggressive”, or “high-grade” PCa were all considered to assess csPCa; however, only studies that used the definition provided by the PI-RADSv2 guideline [Gleason score ≥7 (including 3 + 4), volume >0.5 ml, or extraprostatic extension] were considered not to have concern for applicability), separate analysis for the PZ and TZ, and type of analysis (per patient vs per lesion)
Diagnostic performance of PI-RADSv2, including criteria or cutoff values (in case of multiple readers, the results of the most experienced reader were extracted for this meta-analysis)
The methodological quality of the included studies was assessed using tailored questionnaires and criteria provided by Quality Assessment of Diagnostic Accuracy Studies-2 [13]
The diagnostic performance of PI-RADSv2 for the detection of PCa was the primary outcome for this meta-analysis. In addition, a comparison between the diagnostic performance of PI-RADSv2 and that of PI-RADSv1 using studies that reported head-to-head comparison data of the two PI-RADS versions was considered a secondary outcome.
Pooled estimates of sensitivity and specificity were calculated using hierarchical logistic regression modeling including bivariate modeling and hierarchical summary receiver operating characteristic (HSROC) modeling [14]
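As a minimal sketch of the per-study inputs to such models, the following Python code computes sensitivity and specificity from a 2 × 2 table and the logit-transformed proportion with its approximate variance, which is the scale on which bivariate models are usually fitted. It does not fit the bivariate or HSROC model itself. The function names, the 0.5 continuity correction, and the example counts are assumptions for illustration and are not taken from the included studies.

```python
import math


def study_accuracy(tp, fp, fn, tn):
    """Return (sensitivity, specificity) from the cells of a 2 x 2 table."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity, specificity


def logit_with_variance(events, total, correction=0.5):
    """Logit of a proportion and its approximate variance.

    A continuity correction (assumed here to be 0.5) is added to both cells
    so that proportions of exactly 0 or 1 remain finite on the logit scale.
    """
    successes = events + correction
    failures = (total - events) + correction
    return math.log(successes / failures), 1.0 / successes + 1.0 / failures


# Hypothetical study: 90 true positives, 20 false positives,
# 10 false negatives, 80 true negatives.
sens, spec = study_accuracy(tp=90, fp=20, fn=10, tn=80)
logit_sens, var_logit_sens = logit_with_variance(events=90, total=100)
print(sens, spec, logit_sens, var_logit_sens)
```

A hierarchical model then treats the pair of logit sensitivity and logit specificity from each study as a draw from a bivariate distribution, which is what the Stata and R routines named in the statistical analysis section estimate.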
We performed meta-regression analyses to investigate the cause of heterogeneity. The following covariates were considered for the bivariate model: (1) proportion of patients with PCa (>50% vs ≤50%), (2) magnet strength of MRI (3 vs 1.5 T), (3) use of endorectal coil, (4) cutoff value (≥4 vs ≥3), (5) reference standard (radical prostatectomy vs biopsy), and (6) type of analysis (per patient vs per lesion). In addition, multiple subgroup analyses were performed for cutoff value, outcome, and previous biopsy history to assess various clinical settings: (1) a cutoff value of ≥4 for all studies, (2) a cutoff value of ≥3 for all studies, (3) a cutoff value of ≥4 for determining any PCa, (4) a cutoff value of ≥3 for determining any PCa, (5) a cutoff value of ≥4 for determining csPCa, (6) a cutoff value of ≥3 for determining csPCa, (7) a cutoff value of ≥4 in studies using per-patient analysis, (8) a cutoff value of ≥4 in studies using per-lesion analysis, (9) studies analyzing PZ PCa, (10) studies analyzing TZ PCa, (11) patients without previous biopsies, and (12) patients with previous biopsies. The “metandi” and “midas” modules in Stata 10.0 (StataCorp LP, College Station, TX, USA) and “mada” package in R software version 3.2.1 (R Foundation for Statistical Computing, Vienna, Austria) were used for statistical analyses, with p < 0.05 signifying statistical significance.
A systematic literature search initially identified 287 articles. After removing 46 duplicates, screening of the 241 titles and abstracts yielded 105 potentially eligible articles. Full-text reviews were performed, and 84 studies were excluded for the following reasons: not in the field of interest (n = 80, including 68 studies that used only PI-RADSv1), insufficient data to reconstruct 2 × 2 tables (n = 2), and shared study population with other studies (n = 2). Ultimately, 21 original articles including a total of 3857 patients assessing the diagnostic performance of PI-RADSv2 were included in the meta-analysis 16
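The counts in the selection process can be checked with simple arithmetic. The short Python sketch below uses only the numbers reported in the text; the variable names are illustrative.

```python
# Counts reported in the study selection paragraph.
identified = 287
duplicates = 46
screened = identified - duplicates  # titles and abstracts screened

full_text_reviewed = 105
exclusions = {
    "not in the field of interest": 80,
    "insufficient data to reconstruct 2 x 2 tables": 2,
    "shared study population": 2,
}
excluded = sum(exclusions.values())
included = full_text_reviewed - excluded

print(screened, excluded, included)
```

The totals agree with the text: 241 records screened, 84 full-text exclusions, and 21 included studies.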
Fig. 1
PRISMA flow diagram showing study selection process for meta-analysis. a Included original articles for qualitative and quantitative analyses are references 16
Patient characteristics are shown in Table 1. The size of the study population ranged from 49 to 456 patients, with the percentage of those with PCa ranging from 37% to 100%. The patients had a median age of 62–69.6 yr, median PSA of 3.97–15 ng/ml, and a Gleason score ranging from 5 to 10. Patients had already been diagnosed with PCa prior to MRI in all or some of the study populations in seven studies [16
Table 1. Patient characteristics
Characteristics of the studies are summarized in Table 2 . MRI was performed using 3-T scanners in 16 studies 16
Table 2. Study characteristics
a Blinded but aware that patients had biopsy-proven PCa.
b PI-RADSv2 scores were generated from reports based on PI-RADSv1 and simplified qualitative system.
c Epstein's criteria = Gleason pattern ≥4, or Gleason 3 + 3 disease with core length ≥50% and/or >2 cores positive.
d For transition zone, cutoff value = 4.
csPCa = clinically significant prostate cancer; GS = Gleason score; EPE = extraprostatic extension; MRGB = magnetic resonance imaging-guided biopsy; MRI = magnetic resonance imaging; MRI–TRUS = fusion of magnetic resonance imaging and transrectal ultrasound images; NR = not reported; PCa = prostate cancer; PI-RADSv2 = Prostate Imaging Reporting and Data System version 2; PZ = peripheral zone; RP = radical prostatectomy; STRUSGB = systematic transrectal ultrasound-guided biopsy; TTB = transperineal template biopsy; TZ = transition zone.
Overall, the quality of the studies was not considered high, mainly due to the patient selection domain (Fig. 2). Regarding the patient selection domain, there was generally a high risk of bias as all but four of the studies were retrospective in nature 16
The sensitivity and specificity of individual studies were 73–100% and 8–100%, respectively. The Q-test revealed that substantial heterogeneity was present (p < 0.001). The Higgins I² statistics demonstrated substantial heterogeneity in terms of the sensitivity (I² = 85.55%) and considerable heterogeneity in terms of the specificity (I² = 95.30%). The coupled forest plot of the sensitivity and specificity demonstrated the absence of a threshold effect (Fig. 3). The Spearman correlation coefficient between the sensitivity and the false-positive rate was 0.45 (95% confidence interval [CI] 0.023–0.738), also indicating the lack of a threshold effect.
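The two heterogeneity checks described above can be illustrated with a small Python sketch: Cochran's Q with the derived I² statistic, and a Spearman rank correlation between sensitivity and false-positive rate, where a strong positive correlation is usually read as a sign of a threshold effect. This is a generic inverse-variance version under stated assumptions, not the exact routine used by the statistical software in this study; all function names and input values are hypothetical.

```python
import math


def cochran_q_and_i2(effects, variances):
    """Cochran's Q and I^2 (in percent) using inverse-variance weights."""
    weights = [1.0 / v for v in variances]
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    q = sum(w * (e - pooled) ** 2 for w, e in zip(weights, effects))
    df = len(effects) - 1
    # I^2 is truncated at 0 when Q does not exceed its degrees of freedom.
    i2 = 100.0 * (q - df) / q if q > df else 0.0
    return q, i2


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(vx * vy)


# Hypothetical logit-scale effects with equal variances.
q, i2 = cochran_q_and_i2([1.0, 2.0, 3.0], [0.1, 0.1, 0.1])
# Hypothetical per-study sensitivities and false-positive rates.
rho = spearman([0.80, 0.90, 0.95], [0.10, 0.20, 0.40])
print(q, i2, rho)
```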
Fig. 3
Coupled forest plots of pooled sensitivity and specificity. Numbers are pooled estimates with 95% CI in parentheses. Corresponding heterogeneity statistics are provided at bottom right corners. Horizontal lines indicate 95% CIs. CI = confidence intervals.
For all 21 studies combined, the pooled sensitivity was 0.89 (95% CI 0.86–0.92) with specificity of 0.73 (95% CI 0.60–0.83; Fig. 3). In the HSROC curve, there was a large difference between the 95% confidence region and the 95% prediction region, thus indicating heterogeneity between the studies (Fig. 4). The area under the HSROC curve was 0.91 (95% CI 0.88–0.93). According to the Deeks’ funnel plot, the likelihood of publication bias was low, with a p value of 0.75 for the slope coefficient (Fig. 5).
Fig. 4
Hierarchical summary receiver operating characteristic curve of the diagnostic performance of PI-RADSv2 for detecting prostate cancer. HSROC = hierarchical summary receiver operating characteristic; PI-RADSv2 = Prostate Imaging Reporting and Data System version 2.
Fig. 5
Deeks’ funnel plot. A p value of 0.75 suggests that the likelihood of publication bias is low.
Prostate cancer (PCa) is the second leading cause of cancer-related mortality in Western men [1]
To bring standardization to the evaluation and reporting of mpMRI of the prostate, the European Society of Urogenital Radiology (ESUR) published a guideline termed Prostate Imaging Reporting and Data System (PI-RADS) in 2012 [5]
To address these issues, the ESUR and American College of Radiology recently released the updated PI-RADS version 2 (PI-RADSv2) [11]
Since then, several studies assessing the value of PI-RADSv2 have been published. However, the diagnostic performance of this new scoring system has not been evaluated systematically. Therefore, the purpose of our study was to assess the diagnostic performance of PI-RADSv2 for the detection of PCa. In addition, we aimed to compare the diagnostic performance of PI-RADSv1 and PI-RADSv2 in studies available for head-to-head comparison.
This meta-analysis was performed and written according to the preferred reporting items for systematic reviews and meta-analyses (PRISMA) guidelines [12]
A computerized search of MEDLINE and EMBASE up to December 7, 2016, was performed in order to identify studies evaluating the diagnostic performance of PI-RADSv2 for the detection of PCa. The search query combined synonyms for PCa, MRI, and PI-RADS as follows: (prostate cancer OR prostatic cancer OR prostate neoplasm OR prostatic neoplasm OR prostate tumor OR prostatic tumor OR prostate carcinoma OR prostatic carcinoma OR PCa) AND (magnetic resonance imaging OR MRI OR MR) AND (prostate imaging reporting and data system OR pi-rads OR pi rads OR pirads). Bibliographies of identified articles were also screened to expand the scope of search. Our search was limited to publications in English.
Studies were included if they satisfied all the following requirements according to the PICOS criteria [12]
Studies were excluded if any of the following criteria were met: (1) studies involving <10 patients; (2) review articles, guidelines, consensus statements, letters, editorials, and conference abstracts; (3) studies using only PI-RADSv1 for the evaluation of mpMRI of the prostate; (4) studies focusing on topics other than using the PI-RADSv2 system for diagnosing PCa (ie, staging and prediction of biochemical recurrence); and (5) studies with overlapping patient population.
Two reviewers (S.W. and C.H.S., with 3 yr of experience in performing systematic reviews and meta-analyses) independently evaluated the eligibility of the selected studies from the literature. Disagreements, if present between the two reviewers, were resolved by consensus via discussion with a third reviewer (S.Y.K.).
We extracted the following data regarding study design and results from the selected studies using a standardized form:
Study characteristics—authors, year of publication, country and institution of origin, duration of patient recruitment, and study design (prospective vs retrospective and consecutive or not)
Demographic and clinical characteristics—sample size, number of patients with PCa, patient age, prostate-specific antigen (PSA) level and Gleason score, number of previous biopsies, and PCa diagnosis prior to mpMRI
Technical characteristics of mpMRI—scanner model and manufacturer, magnetic field strength (1.5 vs 3 T), coil type (endorectal vs pelvic phased array), and specific sequences used (T2WI, DWI, DCE, or MR spectroscopy)
Interpretation of mpMRI—number of reviewers and experience in prostate mpMRI, independent or consensus reading, and blinding to clinicopathological information
Reference standard—type of reference standard (radical prostatectomy, targeted biopsy, or systematic biopsy), interval between MRI and pathology, outcomes assessed (any PCa vs csPCa), definition of csPCa (studies assessing “clinically significant”, “aggressive”, or “high-grade” PCa were all considered to assess csPCa; however, only studies that used the definition as provided by the PI-RADSv2 guideline [Gleason score >7 [3 + 4], volume >0.5 ml, or extraprostatic extension] were considered not to have concern for applicability), separate analysis for the PZ and TZ, and type of analysis (per patient vs per lesion)
Diagnostic performance of PI-RADSv2 including criteria or cutoff values (in case of multiple readers, the results of the most experienced reader were extracted for this meta-analysis)
The methodological quality of the included studies was assessed using tailored questionnaires and criteria provided by Quality Assessment of Diagnostic Accuracy Studies-2 [13]
The diagnostic performance of PI-RADSv2 for the detection of PCa was the primary outcome for this meta-analysis. In addition, a comparison between the diagnostic performance of PI-RADSv2 and that of PI-RADSv1 using studies that reported head-to-head comparison data of the two PI-RADS versions was considered a secondary outcome.
Pooled estimates of sensitivity and specificity were calculated using hierarchical logistic regression modeling including bivariate modeling and hierarchical summary receiver operating characteristic (HSROC) modeling [14]
We performed meta-regression analyses to investigate the cause of heterogeneity. The following covariates were considered for the bivariate model: (1) proportion of patients with PCa (>50% vs ≤50%), (2) magnet strength of MRI (3 vs 1.5 T), (3) use of endorectal coil, (4) cutoff value (≥4 vs ≥3), (5) reference standard (radical prostatectomy vs biopsy), and (6) type of analysis (per patient vs per lesion). In addition, multiple subgroup analyses were performed for cutoff value, outcome, and previous biopsy history to assess various clinical settings: (1) a cutoff value of ≥4 for all studies, (2) a cutoff value of ≥3 for all studies, (3) a cutoff value of ≥4 for determining any PCa, (4) a cutoff value of ≥3 for determining any PCa, (5) a cutoff value of ≥4 for determining csPCa, (6) a cutoff value of ≥3 for determining csPCa, (7) a cutoff value of ≥4 in studies using per-patient analysis, (8) a cutoff value of ≥4 in studies using per-lesion analysis, (9) studies analyzing PZ PCa, (10) studies analyzing TZ PCa, (11) patients without previous biopsies, and (12) patients with previous biopsies. The “metandi” and “midas” modules in Stata 10.0 (StataCorp LP, College Station, TX, USA) and “mada” package in R software version 3.2.1 (R Foundation for Statistical Computing, Vienna, Austria) were used for statistical analyses, with p < 0.05 signifying statistical significance.
A systematic literature search initially identified 287 articles. After removing 46 duplicates, screening of the 241 titles and abstracts yielded 105 potentially eligible articles. Full-text reviews were performed, and 84 studies were excluded
for the following reasons: not in the field of interest ( n = 80, including 68 studies that used only PI-RADSv1), insufficient data to reconstruct 2 × 2 tables ( n = 2), and shared study population with other studies ( n = 2). Ultimately, 21 original articles including a total of 3857 patients assessing the diagnostic performance of PI-RADSv2 were included in the meta-analysis 16
Fig. 1
PRISMA flow diagram showing study selection process for meta-analysis. a Included original articles for qualitative and quantitative analyses are references 16
Patient characteristics are shown in Table 1 . The size of the study population ranged from 49 to 456 patients, with the percentage of
those with PCa ranging from 37% to 100%. The patients had a median age of 62–69.6 yr, median PSA of 3.97–15 ng/ml, and a Gleason score ranging from 5 to 10. Patients had already been diagnosed with PCa prior to MRI in all or some of the study
populations in seven studies [16
Table 1Patient characteristics
Characteristics of the studies are summarized in Table 2 . MRI was performed using 3-T scanners in 16 studies 16
a Blinded but aware that patients had biopsy-proven PCa.
b PI-RADSv2 scores were generated from reports based on PI-RADSv1 and simplified qualitative system.
c Epstein's criteria = Gleason pattern ≥4, or Gleason 3 + 3 disease with core length ≥50% and/or >2 cores positive.
d For transition zone, cutoff value = 4.
csPCa = clinically significant prostate cancer; GS = Gleason score; EPE = extraprostatic extension; MRGB = magnetic resonance imaging-guided biopsy; MRI = magnetic resonance imaging; MRI–TRUS = fusion of magnetic resonance imaging and transrectal ultrasound images; NR = not reported; PCa = prostate cancer; PI-RADSv2 = Prostate Imaging Reporting and Data System version 2; PZ = peripheral zone; RP = radical prostatectomy; STRUSGB = systematic transrectal ultrasound-guided biopsy; TTB = transperineal template biopsy; TZ = transition zone.Table 2Study characteristics
Overall, the quality of the studies was not considered high, mainly due to the patient selection domain ( Fig. 2 ). Regarding the patient
selection domain, there was generally a high risk of bias as all but four of the studies were retrospective in nature 16
The sensitivity and specificity of individual studies were 73–100% and 8–100%, respectively. The Q-test revealed that substantial heterogeneity was present ( p < 0.001). The Higgins I 2 statistics demonstrated substantial heterogeneity in terms of the sensitivity ( I 2 = 85.55%) and considerable heterogeneity in terms of the specificity ( I 2 = 95.30%). The coupled forest plot of the sensitivity and specificity demonstrated the absence of a threshold effect ( Fig. 3 ). The Spearman correlation coefficient between the sensitivity and the false-positive rate was 0.45 (95% confidence interval [CI] 0.023–0.738), also indicating the lack of a threshold effect.
Fig. 3
Coupled forest plots of pooled sensitivity and specificity. Numbers are pooled estimates with 95% CI in parentheses. Corresponding heterogeneity statistics are provided at bottom right corners. Horizontal lines indicate 95% CIs. CI = confidence intervals.
For all 21 studies combined, the pooled sensitivity was 0.89 (95% CI 0.86–0.92) with specificity of 0.73 (95% CI 0.60–0.83; Fig. 3 ). In the HSROC curve, there was a large difference between the 95% confidence region and the 95% prediction region, thus indicating heterogeneity between the studies ( Fig. 4 ). The area under the HSROC curve was 0.91 (95% CI 0.88–0.93). According to the Deeks’ funnel plot, the likelihood of publication bias was low, with a p value of 0.75 for the slope coefficient ( Fig. 5 ).
Fig. 4
Hierarchical summary receiver operating characteristic curve of the diagnostic performance of PI-RADSv2 for detecting prostate cancer. HSROC = hierarchical summary receiver operating characteristic; PI-RADSv2 = Prostate Imaging Reporting and Data System version 2.
Fig. 5
Deeks’ funnel plot. A p value of 0.75 suggests that the likelihood of publication bias is low.
For the six studies that provided a head-to-head comparison between PI-RADSv1 and PI-RADSv2, PI-RADSv2 demonstrated higher pooled sensitivity of 0.95 (95% CI 0.85–0.98) compared with 0.88 (95% CI 0.80–0.93) for PI-RADSv1 (p = 0.04). However, the pooled specificity did not show a significant difference between the two versions of PI-RADS: 0.73 (95% CI 0.47–0.89) for v2 and 0.75 (95% CI 0.36–0.94) for v1 (p = 0.90).
As we found considerable heterogeneity among the included studies, meta-regression analyses were performed (Supplementary Table 1). Among several potential variables, proportion of patients with PCa, magnetic field strength, and reference standard were significant factors affecting the heterogeneity (p < 0.01 for all). However, among these three, only specificity according to the proportion of patients with PCa showed a statistically significant and clinically meaningful difference: 0.65 (95% CI 0.52–0.78) in studies with ≤50% of patients with PCa versus 0.86 (95% CI 0.75–0.97) in studies with >50% of patients with PCa. Otherwise, no clinically meaningful differences were seen: for magnet strength (3 vs 1.5 T), sensitivity of 0.90 (95% CI 0.86–0.94) versus 0.89 (95% CI 0.81–0.97, p = 0.03) and specificity of 0.73 (95% CI 0.59–0.86) versus 0.72 (95% CI 0.44–1.00, p = 0.81); for reference standard (radical prostatectomy vs biopsy), sensitivity of 0.89 (95% CI 0.83–0.95) versus 0.91 (95% CI 0.88–0.94, p < 0.01) and specificity of 0.65 (95% CI 0.37–0.94) versus 0.73 (95% CI 0.58–0.87, p = 0.48). Other variables, including the cutoff value, use of endorectal coil, and type of analysis (per patient vs per lesion), were not significant factors (p = 0.32–0.70).
As there were four studies that used both ≥3 and ≥4 as cutoff values [17
Based on the localization of PCa, the pooled sensitivity was 0.93 (95% CI 0.87–0.96) with specificity of 0.68 (95% CI 0.43–0.86) in seven studies analyzing PZ cancers [20
Studies including patients without previous biopsies yielded sensitivity of 0.82 (95% CI 0.72–0.90) and specificity of 0.75 (95% CI 0.65–0.83), whereas the diagnostic performance values were 0.87 (95% CI 0.80–0.92) and 0.71 (95% CI 0.42–0.89) in studies including patients with a history of previous biopsy.
In our meta-analysis, we assessed the diagnostic accuracy of PI-RADSv2 for detecting PCa. The pooled sensitivity and specificity of all 21 studies were 0.89 (95% CI 0.86–0.92) and 0.73 (95% CI 0.60–0.83), respectively. When comparing our data with the only two existing meta-analyses using mpMRI for detecting PCa, a trend toward higher sensitivity and lower specificity can be inferred. In the study by de Rooij et al [37]
Considering that one of the main intentions for the generation of PI-RADS was to standardize reporting of mpMRI in order to decrease variability and bring about widespread acceptance and implementation in daily practice, it was promising to find that nearly all (20 of 21) studies used PI-RADSv2 strictly according to published guidelines [11]
In the current study, subgroup analyses were performed to account for differences in outcomes (any cancer vs clinically significant cancer). There was no significant difference between the two outcomes, irrespective of whether a cutoff of ≥3 or ≥4 was used. However, the definition of clinically significant cancer differed among the 13 studies. Only three studies defined csPCa strictly according to the PI-RADSv2 guidelines (Gleason score ≥7 [including 3 + 4], volume ≥0.5 ml, or extraprostatic extension) [11]
In this meta-analysis, we looked into the technical aspects of MRI. Meta-regression analyses revealed that the use of endorectal coil was not a statistically significant factor. Furthermore, although magnet strength showed a statistically significant difference between 3 and 1.5 T, this difference was not clinically meaningful (sensitivity of 0.90 vs 0.89, respectively; p = 0.03). Although there had been debate over these two issues in the past, both 3 and 1.5 T are now well established, and the overall benefit of using an endorectal coil is not evident [38
Regarding the methods of analysis in the studies, there was significant heterogeneity regarding reference standard and type of analysis. Radical prostatectomy was the reference standard in five studies, while the majority were based on a combination of systematic and targeted biopsies. The possibility of PCa despite negative biopsy results in the latter group should be kept in mind. In addition, the studies were split roughly evenly between per-patient (n = 11) and per-lesion (n = 10) reporting of outcomes. Per-lesion analysis is known to take into account the performance of localizing the disease; however, this was not shown to be a significant factor in the meta-regression analysis.
Our meta-analysis had some limitations. Nearly all studies were retrospective in study design, resulting in a high risk of bias for patient selection. It is possible that pooling data from predominantly retrospective studies may have led to increased diagnostic sensitivity [40]
PI-RADSv2 shows good performance for the detection of PCa with pooled sensitivity of 0.89 and specificity of 0.73. PI-RADSv2 has higher pooled sensitivity compared with PI-RADSv1 without significantly different specificity.
Author contributions: Sang Youn Kim had full access to all the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis.
Study concept and design: Woo, Suh, S.Y. Kim.
Acquisition of data: Woo, Suh, S.Y. Kim.
Analysis and interpretation of data: Woo, Suh, S.Y. Kim.
Drafting of the manuscript: Woo.
Critical revision of the manuscript for important intellectual content: Suh, S.Y. Kim, Cho, S.H. Kim.
Statistical analysis: Suh.
Obtaining funding: None.
Administrative, technical, or material support: None.
Supervision: S.Y. Kim, Cho, S.H. Kim.
Other: None.
Financial disclosures: Sang Youn Kim certifies that all conflicts of interest, including specific financial interests and relationships and affiliations relevant to the subject matter or materials discussed in the manuscript (eg, employment/affiliation, grants or funding, consultancies, honoraria, stock ownership or options, expert testimony, royalties, or patents filed, received, or pending), are the following: None.
Funding/Support and role of the sponsor: None.
Prostate cancer (PCa) is the second leading cause of cancer-related mortality in Western men [1]
To bring standardization to the evaluation and reporting of mpMRI of the prostate, the European Society of Urogenital Radiology (ESUR) published a guideline termed Prostate Imaging Reporting and Data System (PI-RADS) in 2012 [5]
To address these issues, the ESUR and American College of Radiology recently released the updated PI-RADS version 2 (PI-RADSv2) [11]
Since then, several studies assessing the value of PI-RADSv2 have been published. However, the diagnostic performance of this new scoring system has not been evaluated systematically. Therefore, the purpose of our study was to assess the diagnostic performance of PI-RADSv2 for the detection of PCa. In addition, we aimed to compare the diagnostic performance of PI-RADSv1 and PI-RADSv2 in studies available for head-to-head comparison.
This meta-analysis was performed and written according to the preferred reporting items for systematic reviews and meta-analyses (PRISMA) guidelines [12]
A computerized search of MEDLINE and EMBASE up to December 7, 2016, was performed in order to identify studies evaluating the diagnostic performance of PI-RADSv2 for the detection of PCa. The search query combined synonyms for PCa, MRI, and PI-RADS as follows: (prostate cancer OR prostatic cancer OR prostate neoplasm OR prostatic neoplasm OR prostate tumor OR prostatic tumor OR prostate carcinoma OR prostatic carcinoma OR PCa) AND (magnetic resonance imaging OR MRI OR MR) AND (prostate imaging reporting and data system OR pi-rads OR pi rads OR pirads). Bibliographies of identified articles were also screened to expand the scope of search. Our search was limited to publications in English.
Studies were included if they satisfied all the following requirements according to the PICOS criteria [12]
Studies were excluded if any of the following criteria were met: (1) studies involving <10 patients; (2) review articles, guidelines, consensus statements, letters, editorials, and conference abstracts; (3) studies using only PI-RADSv1 for the evaluation of mpMRI of the prostate; (4) studies focusing on topics other than using the PI-RADSv2 system for diagnosing PCa (ie, staging and prediction of biochemical recurrence); and (5) studies with overlapping patient population.
Two reviewers (S.W. and C.H.S., with 3 yr of experience in performing systematic reviews and meta-analyses) independently evaluated the eligibility of the selected studies from the literature. Disagreements, if present between the two reviewers, were resolved by consensus via discussion with a third reviewer (S.Y.K.).
We extracted the following data regarding study design and results from the selected studies using a standardized form:
Study characteristics—authors, year of publication, country and institution of origin, duration of patient recruitment, and study design (prospective vs retrospective and consecutive or not)
Demographic and clinical characteristics—sample size, number of patients with PCa, patient age, prostate-specific antigen (PSA) level and Gleason score, number of previous biopsies, and PCa diagnosis prior to mpMRI
Technical characteristics of mpMRI—scanner model and manufacturer, magnetic field strength (1.5 vs 3 T), coil type (endorectal vs pelvic phased array), and specific sequences used (T2WI, DWI, DCE, or MR spectroscopy)
Interpretation of mpMRI—number of reviewers and experience in prostate mpMRI, independent or consensus reading, and blinding to clinicopathological information
Reference standard—type of reference standard (radical prostatectomy, targeted biopsy, or systematic biopsy), interval between MRI and pathology, outcomes assessed (any PCa vs csPCa), definition of csPCa (studies assessing “clinically significant”, “aggressive”, or “high-grade” PCa were all considered to assess csPCa; however, only studies that used the definition as provided by the PI-RADSv2 guideline [Gleason score ≥7 [including 3 + 4], volume ≥0.5 ml, or extraprostatic extension] were considered not to have concern for applicability), separate analysis for the PZ and TZ, and type of analysis (per patient vs per lesion)
Diagnostic performance of PI-RADSv2 including criteria or cutoff values (in case of multiple readers, the results of the most experienced reader were extracted for this meta-analysis)
The methodological quality of the included studies was assessed using tailored questionnaires and criteria provided by Quality Assessment of Diagnostic Accuracy Studies-2 [13]
The diagnostic performance of PI-RADSv2 for the detection of PCa was the primary outcome for this meta-analysis. In addition, a comparison between the diagnostic performance of PI-RADSv2 and that of PI-RADSv1 using studies that reported head-to-head comparison data of the two PI-RADS versions was considered a secondary outcome.
Pooled estimates of sensitivity and specificity were calculated using hierarchical logistic regression modeling including bivariate modeling and hierarchical summary receiver operating characteristic (HSROC) modeling [14]
We performed meta-regression analyses to investigate the cause of heterogeneity. The following covariates were considered for the bivariate model: (1) proportion of patients with PCa (>50% vs ≤50%), (2) magnet strength of MRI (3 vs 1.5 T), (3) use of endorectal coil, (4) cutoff value (≥4 vs ≥3), (5) reference standard (radical prostatectomy vs biopsy), and (6) type of analysis (per patient vs per lesion). In addition, multiple subgroup analyses were performed for cutoff value, outcome, and previous biopsy history to assess various clinical settings: (1) a cutoff value of ≥4 for all studies, (2) a cutoff value of ≥3 for all studies, (3) a cutoff value of ≥4 for determining any PCa, (4) a cutoff value of ≥3 for determining any PCa, (5) a cutoff value of ≥4 for determining csPCa, (6) a cutoff value of ≥3 for determining csPCa, (7) a cutoff value of ≥4 in studies using per-patient analysis, (8) a cutoff value of ≥4 in studies using per-lesion analysis, (9) studies analyzing PZ PCa, (10) studies analyzing TZ PCa, (11) patients without previous biopsies, and (12) patients with previous biopsies. The “metandi” and “midas” modules in Stata 10.0 (StataCorp LP, College Station, TX, USA) and “mada” package in R software version 3.2.1 (R Foundation for Statistical Computing, Vienna, Austria) were used for statistical analyses, with p < 0.05 signifying statistical significance.
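The inputs to these models are per-study 2 × 2 tables of index test result against reference standard. As a minimal sketch of how sensitivity, specificity, and the false-positive rate follow from one such table, consider the snippet below. The class name and the example counts are hypothetical, and the snippet covers only the per-study step, not the bivariate or HSROC pooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Counts for one study: index test result versus reference standard."""
    tp: int  # test positive, PCa present
    fp: int  # test positive, PCa absent
    fn: int  # test negative, PCa present
    tn: int  # test negative, PCa absent

    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    def false_positive_rate(self) -> float:
        return 1.0 - self.specificity()


# Hypothetical counts, for illustration only.
study = TwoByTwo(tp=90, fp=20, fn=10, tn=80)
summary = (study.sensitivity(), study.specificity(), study.false_positive_rate())
```

Pairing sensitivity with the false-positive rate per study gives the points that are plotted in receiver operating characteristic space.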
A systematic literature search initially identified 287 articles. After removing 46 duplicates, screening of the 241 titles and abstracts yielded 105 potentially eligible articles. Full-text reviews were performed, and 84 studies were excluded for the following reasons: not in the field of interest (n = 80, including 68 studies that used only PI-RADSv1), insufficient data to reconstruct 2 × 2 tables (n = 2), and shared study population with other studies (n = 2). Ultimately, 21 original articles including a total of 3857 patients assessing the diagnostic performance of PI-RADSv2 were included in the meta-analysis 16
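The study selection counts reported above can be checked with simple arithmetic. The short sketch below does so, using only the numbers given in the text; the function and key names are hypothetical.

```python
def prisma_counts() -> dict:
    """Recompute the study-selection flow from the reported counts."""
    identified = 287
    duplicates = 46
    screened = identified - duplicates  # titles and abstracts screened
    full_text = 105  # potentially eligible articles reviewed in full
    excluded = {
        "not in the field of interest": 80,
        "insufficient data for 2 x 2 tables": 2,
        "shared study population": 2,
    }
    included = full_text - sum(excluded.values())
    return {
        "screened": screened,
        "excluded_full_text": sum(excluded.values()),
        "included": included,
    }


flow = prisma_counts()
```

The recomputed values agree with the 241 screened records, 84 full-text exclusions, and 21 included studies stated in the text.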
Fig. 1
PRISMA flow diagram showing study selection process for meta-analysis. a Included original articles for qualitative and quantitative analyses are references 16
Patient characteristics are shown in Table 1. The size of the study population ranged from 49 to 456 patients, with the percentage of those with PCa ranging from 37% to 100%. The patients had a median age of 62–69.6 yr, median PSA of 3.97–15 ng/ml, and a Gleason score ranging from 5 to 10. Patients had already been diagnosed with PCa prior to MRI in all or some of the study populations in seven studies [16
Table 1. Patient characteristics
Characteristics of the studies are summarized in Table 2. MRI was performed using 3-T scanners in 16 studies 16
a Blinded but aware that patients had biopsy-proven PCa.
b PI-RADSv2 scores were generated from reports based on PI-RADSv1 and simplified qualitative system.
c Epstein's criteria = Gleason pattern ≥4, or Gleason 3 + 3 disease with core length ≥50% and/or >2 cores positive.
d For transition zone, cutoff value = 4.
csPCa = clinically significant prostate cancer; GS = Gleason score; EPE = extraprostatic extension; MRGB = magnetic resonance imaging-guided biopsy; MRI = magnetic resonance imaging; MRI–TRUS = fusion of magnetic resonance imaging and transrectal ultrasound images; NR = not reported; PCa = prostate cancer; PI-RADSv2 = Prostate Imaging Reporting and Data System version 2; PZ = peripheral zone; RP = radical prostatectomy; STRUSGB = systematic transrectal ultrasound-guided biopsy; TTB = transperineal template biopsy; TZ = transition zone.Table 2Study characteristics
Overall, the quality of the studies was not considered high, mainly due to the patient selection domain ( Fig. 2 ). Regarding the patient selection domain, there was generally a high risk of bias as all but four of the studies were retrospective in nature 16
The sensitivity and specificity of individual studies were 73–100% and 8–100%, respectively. The Q-test revealed that substantial heterogeneity was present ( p < 0.001). The Higgins I² statistics demonstrated substantial heterogeneity in terms of the sensitivity ( I² = 85.55%) and considerable heterogeneity in terms of the specificity ( I² = 95.30%). The coupled forest plot of the sensitivity and specificity demonstrated the absence of a threshold effect ( Fig. 3 ). The Spearman correlation coefficient between the sensitivity and the false-positive rate was 0.45 (95% confidence interval [CI] 0.023–0.738), also indicating the lack of a threshold effect.
Fig. 3
Coupled forest plots of pooled sensitivity and specificity. Numbers are pooled estimates with 95% CI in parentheses. Corresponding heterogeneity statistics are provided at bottom right corners. Horizontal lines indicate 95% CIs. CI = confidence intervals.
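The text does not state how the confidence interval for the Spearman correlation was obtained. As a hedged illustration, the sketch below applies the plain Fisher z approximation, assuming n = 21 (one sensitivity and false-positive rate pair per included study). The function name and the choice of approximation are assumptions, not the authors' documented method.

```python
import math

def fisher_ci(r, n, z_crit=1.959964):
    """Approximate 95% CI for a correlation coefficient via Fisher's z transform.

    Assumes the simple standard error 1/sqrt(n - 3); other corrections
    exist for rank correlations and would give slightly different limits.
    """
    z = math.atanh(r)                   # transform r to the z scale
    se = 1.0 / math.sqrt(n - 3)         # approximate standard error on the z scale
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)

low, high = fisher_ci(0.45, 21)
print(f"{low:.3f} to {high:.3f}")       # 0.023 to 0.738
```

Under these assumptions the limits match the reported interval of 0.023–0.738.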
For all 21 studies combined, the pooled sensitivity was 0.89 (95% CI 0.86–0.92) with specificity of 0.73 (95% CI 0.60–0.83; Fig. 3 ). In the HSROC curve, there was a large difference between the 95% confidence region and the 95% prediction region, thus indicating heterogeneity between the studies ( Fig. 4 ). The area under the HSROC curve was 0.91 (95% CI 0.88–0.93). According to the Deeks’ funnel plot, the likelihood of publication bias was low, with a p value of 0.75 for the slope coefficient ( Fig. 5 ).
Fig. 4
Hierarchical summary receiver operating characteristic curve of the diagnostic performance of PI-RADSv2 for detecting prostate cancer. HSROC = hierarchical summary receiver operating characteristic; PI-RADSv2 = Prostate Imaging Reporting and Data System version 2.
Fig. 5
Deeks’ funnel plot. A p value of 0.75 suggests that the likelihood of publication bias is low.
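For readers unfamiliar with the asymmetry test behind this plot, the following is a minimal sketch of the regression usually attributed to Deeks: the log diagnostic odds ratio of each study is regressed on the inverse square root of its effective sample size, weighted by effective sample size. The 2 × 2 counts are hypothetical and are not data from the included studies. The sketch returns only the fitted coefficients, not the p value for the slope, and it is not the "midas" implementation used by the authors.

```python
import math

def deeks_slope(tables):
    """Weighted least-squares fit of ln(DOR) on 1/sqrt(ESS), weights = ESS.

    tables: iterable of (tp, fp, fn, tn) counts, one tuple per study.
    Returns (intercept, slope).
    """
    xs, ys, ws = [], [], []
    for tp, fp, fn, tn in tables:
        n_dis, n_non = tp + fn, fp + tn
        ess = 4 * n_dis * n_non / (n_dis + n_non)   # effective sample size
        # 0.5 continuity correction guards against zero cells in the odds ratio
        a, b, c, d = (v + 0.5 for v in (tp, fp, fn, tn))
        xs.append(1 / math.sqrt(ess))
        ys.append(math.log((a * d) / (b * c)))      # ln diagnostic odds ratio
        ws.append(ess)
    sw = sum(ws)
    mx = sum(w * x for w, x in zip(ws, xs)) / sw    # weighted mean of x
    my = sum(w * y for w, y in zip(ws, ys)) / sw    # weighted mean of y
    sxx = sum(w * (x - mx) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - mx) * (y - my) for w, x, y in zip(ws, xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope

# Hypothetical 2x2 counts, for illustration only
example = [(90, 20, 10, 80), (45, 12, 5, 38), (180, 35, 20, 165), (30, 9, 4, 27)]
intercept, slope = deeks_slope(example)
print(f"intercept={intercept:.2f}, slope={slope:.2f}")
```

A slope close to zero relative to its standard error corresponds to a symmetric funnel, which is the pattern the reported p value of 0.75 suggests.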
For the six studies that provided a head-to-head comparison between PI-RADSv1 and PI-RADSv2, PI-RADSv2 demonstrated higher pooled sensitivity of 0.95 (95% CI 0.85–0.98) compared with 0.88 (95% CI 0.80–0.93) for PI-RADSv1 ( p = 0.04). However, the pooled specificity did not show a significant difference between the two versions of PI-RADS: 0.73 (95% CI 0.47–0.89) for v2 and 0.75 (95% CI 0.36–0.94) for v1 ( p = 0.90).
As we found considerable heterogeneity among the included studies, meta-regression analyses were performed (Supplementary Table 1 ). Among several potential variables, proportion of patients with PCa, magnetic field strength, and reference standard were significant factors affecting the heterogeneity ( p < 0.01 for all). However, among these three, only specificity according to the proportion of patients with PCa showed a statistically significant and clinically meaningful difference: 0.65 (95% CI 0.52–0.78) in studies with ≤50% of patients with PCa versus 0.86 (95% CI 0.75–0.97) in studies with >50% of patients with PCa. Otherwise, no clinically meaningful differences were seen: for magnet strength (3 vs 1.5 T), sensitivity of 0.90 (95% CI 0.86–0.94) versus 0.89 (95% CI 0.81–0.97, p = 0.03) and specificity of 0.73 (95% CI 0.59–0.86) versus 0.72 (95% CI 0.44–1.00, p = 0.81); for reference standard (radical prostatectomy vs biopsy), sensitivity of 0.89 (95% CI 0.83–0.95) versus 0.91 (95% CI 0.88–0.94, p < 0.01) and specificity of 0.65 (95% CI 0.37–0.94) versus 0.73 (95% CI 0.58–0.87, p = 0.48). Other variables, including the cutoff value, use of endorectal coil, and type of analysis (per patient vs per lesion), were not significant factors ( p = 0.32–0.70).
As there were four studies that used both ≥3 and ≥4 as cutoff values [17
Based on the localization of PCa, the pooled sensitivity was 0.93 (95% CI 0.87–0.96) with specificity of 0.68 (95% CI 0.43–0.86) in seven studies analyzing PZ cancers [20
Studies including patients without previous biopsies yielded sensitivity of 0.82 (95% CI 0.72–0.90) and specificity of 0.75 (95% CI 0.65–0.83), whereas the diagnostic performance values were 0.87 (95% CI 0.80–0.92) and 0.71 (95% CI 0.42–0.89) in studies including patients with a history of previous biopsy.
In our meta-analysis, we assessed the diagnostic accuracy of PI-RADSv2 for detecting PCa. The pooled sensitivity and specificity of all 21 studies were 0.89 (95% CI 0.86–0.92) and 0.73 (95% CI 0.60–0.83), respectively. When comparing our data with the only two existing meta-analyses using mpMRI for detecting PCa, a trend toward higher sensitivity and lower specificity can be inferred. In the study by de Rooij et al [37]
Considering that one of the main intentions for the generation of PI-RADS was to standardize reporting of mpMRI in order to decrease variability and bring about widespread acceptance and implementation in daily practice, it was promising to find that nearly all (20 of 21) studies used PI-RADSv2 strictly according to published guidelines [11]
In the current study, subgroup analyses were performed to account for differences in outcomes (any cancer vs clinically significant cancer). There was no significant difference for using either outcome irrespective of whether the criteria of ≥3 or ≥4 were used. However, the definition of clinically significant cancer was different among the 13 studies. Only three studies defined csPCa strictly according to the PI-RADSv2 guidelines (Gleason score ≥7 [including 3 + 4], volume ≥0.5 ml, or extraprostatic extension) [11]
In this meta-analysis, we looked into the technical aspects of MRI. Meta-regression analyses revealed that the use of endorectal coil was not a statistically significant factor. Furthermore, although the difference between 3 and 1.5 T was statistically significant, it was not clinically meaningful (sensitivity of 0.90 vs 0.89, respectively; p = 0.03). Although there had been debate over these two issues in the past, both 3 and 1.5 T are now well established, and the overall benefit of using an endorectal coil is not evident [38
Regarding the methods of analysis in the studies, there was significant heterogeneity regarding reference standard and type of analysis. Radical prostatectomy was the reference standard in five studies, while the majority were based on a combination of systematic and targeted biopsies. The possibility of PCa despite negative biopsy results in the latter group should be kept in mind. In addition, approximately half of the studies reported outcomes in a per-patient manner ( n = 11) and the other half in a per-lesion manner ( n = 10). Per-lesion analysis is known to take into account the performance of localizing the disease; however, this was not shown to be a significant factor in the meta-regression analysis.
Our meta-analysis had some limitations. Nearly all studies were retrospective in study design, resulting in a high risk of bias for patient selection. It is possible that pooling data from predominantly retrospective studies may have led to increased diagnostic sensitivity [40]
PI-RADSv2 shows good performance for the detection of PCa with pooled sensitivity of 0.89 and specificity of 0.73. PI-RADSv2 has higher pooled sensitivity compared with PI-RADSv1 without significantly different specificity.
Author contributions: Sang Youn Kim had full access to all the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis.
Study concept and design: Woo, Suh, S.Y. Kim.
Acquisition of data: Woo, Suh, S.Y. Kim.
Analysis and interpretation of data: Woo, Suh, S.Y. Kim.
Drafting of the manuscript: Woo.
Critical revision of the manuscript for important intellectual content: Suh, S.Y. Kim, Cho, S.H. Kim.
Statistical analysis: Suh.
Obtaining funding: None.
Administrative, technical, or material support: None.
Supervision: S.Y. Kim, Cho, S.H. Kim.
Other: None.
Financial disclosures: Sang Youn Kim certifies that all conflicts of interest, including specific financial interests and relationships and affiliations relevant to the subject matter or materials discussed in the manuscript (eg, employment/affiliation, grants or funding, consultancies, honoraria, stock ownership or options, expert testimony, royalties, or patents filed, received, or pending), are the following: None.
Funding/Support and role of the sponsor: None.
Prostate cancer (PCa) is the second leading cause of cancer-related mortality in Western men [1]
To bring standardization to the evaluation and reporting of mpMRI of the prostate, the European Society of Urogenital Radiology (ESUR) published a guideline termed Prostate Imaging Reporting and Data System (PI-RADS) in 2012 [5]
To address these issues, the ESUR and American College of Radiology recently released the updated PI-RADS version 2 (PI-RADSv2) [11]
Since then, several studies assessing the value of PI-RADSv2 have been published. However, the diagnostic performance of this new scoring system has not been evaluated systematically. Therefore, the purpose of our study was to assess the diagnostic performance of PI-RADSv2 for the detection of PCa. In addition, we aimed to compare the diagnostic performance of PI-RADSv1 and PI-RADSv2 in studies available for head-to-head comparison.
This meta-analysis was performed and written according to the preferred reporting items for systematic reviews and meta-analyses (PRISMA) guidelines [12]
A computerized search of MEDLINE and EMBASE up to December 7, 2016, was performed in order to identify studies evaluating the diagnostic performance of PI-RADSv2 for the detection of PCa. The search query combined synonyms for PCa, MRI, and PI-RADS as follows: (prostate cancer OR prostatic cancer OR prostate neoplasm OR prostatic neoplasm OR prostate tumor OR prostatic tumor OR prostate carcinoma OR prostatic carcinoma OR PCa) AND (magnetic resonance imaging OR MRI OR MR) AND (prostate imaging reporting and data system OR pi-rads OR pi rads OR pirads). Bibliographies of identified articles were also screened to expand the scope of search. Our search was limited to publications in English.
Studies were included if they satisfied all the following requirements according to the PICOS criteria [12]
Studies were excluded if any of the following criteria were met: (1) studies involving <10 patients; (2) review articles, guidelines, consensus statements, letters, editorials, and conference abstracts; (3) studies using only PI-RADSv1 for the evaluation of mpMRI of the prostate; (4) studies focusing on topics other than using the PI-RADSv2 system for diagnosing PCa (ie, staging and prediction of biochemical recurrence); and (5) studies with overlapping patient population.
Two reviewers (S.W. and C.H.S., with 3 yr of experience in performing systematic reviews and meta-analyses) independently evaluated the eligibility of the selected studies from the literature. Disagreements, if present between the two reviewers, were resolved by consensus via discussion with a third reviewer (S.Y.K.).
We extracted the following data regarding study design and results from the selected studies using a standardized form:
Study characteristics—authors, year of publication, country and institution of origin, duration of patient recruitment, and study design (prospective vs retrospective and consecutive or not)
Demographic and clinical characteristics—sample size, number of patients with PCa, patient age, prostate-specific antigen (PSA) level and Gleason score, number of previous biopsies, and PCa diagnosis prior to mpMRI
Technical characteristics of mpMRI—scanner model and manufacturer, magnetic field strength (1.5 vs 3 T), coil type (endorectal vs pelvic phased array), and specific sequences used (T2WI, DWI, DCE, or MR spectroscopy)
Interpretation of mpMRI—number of reviewers and experience in prostate mpMRI, independent or consensus reading, and blinding to clinicopathological information
Reference standard—type of reference standard (radical prostatectomy, targeted biopsy, or systematic biopsy), interval between MRI and pathology, outcomes assessed (any PCa vs csPCa), definition of csPCa (studies assessing “clinically significant”, “aggressive”, or “high-grade” PCa were all considered to assess csPCa; however, only studies that used the definition as provided by the PI-RADSv2 guideline [Gleason score ≥7 (including 3 + 4), volume ≥0.5 ml, or extraprostatic extension] were considered not to have concern for applicability), separate analysis for the PZ and TZ, and type of analysis (per patient vs per lesion)
Diagnostic performance of PI-RADSv2 including criteria or cutoff values (in case of multiple readers, the results of the most experienced reader were extracted for this meta-analysis)
The methodological quality of the included studies was assessed using tailored questionnaires and criteria provided by Quality Assessment of Diagnostic Accuracy Studies-2 [13]
The diagnostic performance of PI-RADSv2 for the detection of PCa was the primary outcome for this meta-analysis. In addition, a comparison between the diagnostic performance of PI-RADSv2 and that of PI-RADSv1 using studies that reported head-to-head comparison data of the two PI-RADS versions was considered a secondary outcome.
Pooled estimates of sensitivity and specificity were calculated using hierarchical logistic regression modeling including bivariate modeling and hierarchical summary receiver operating characteristic (HSROC) modeling [14]
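As a minimal sketch of the per-study inputs to such a model, the code below derives sensitivity and specificity from one reconstructed 2 × 2 table and applies the logit transform, the scale on which bivariate models typically operate. The counts are hypothetical, chosen only to echo the pooled estimates, and the function names are illustrative. The hierarchical fitting itself (done by the authors with "metandi", "midas", and "mada") is not reproduced here.

```python
import math

def sens_spec(tp, fp, fn, tn):
    """Sensitivity and specificity from the four cells of a 2x2 table."""
    sens = tp / (tp + fn)   # proportion of diseased cases called positive
    spec = tn / (tn + fp)   # proportion of non-diseased cases called negative
    return sens, spec

def logit(p):
    """Log-odds transform; undefined for p equal to 0 or 1."""
    return math.log(p / (1 - p))

# Hypothetical counts for a single study, for illustration only
tp, fp, fn, tn = 89, 27, 11, 73
sens, spec = sens_spec(tp, fp, fn, tn)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
print(f"logit(sens)={logit(sens):.3f}, logit(spec)={logit(spec):.3f}")
```

Studies with a zero cell need a continuity correction before the logit step, which is one reason studies without a reconstructible 2 × 2 table were excluded.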
We performed meta-regression analyses to investigate the cause of heterogeneity. The following covariates were considered for the bivariate model: (1) proportion of patients with PCa (>50% vs ≤50%), (2) magnet strength of MRI (3 vs 1.5 T), (3) use of endorectal coil, (4) cutoff value (≥4 vs ≥3), (5) reference standard (radical prostatectomy vs biopsy), and (6) type of analysis (per patient vs per lesion). In addition, multiple subgroup analyses were performed for cutoff value, outcome, and previous biopsy history to assess various clinical settings: (1) a cutoff value of ≥4 for all studies, (2) a cutoff value of ≥3 for all studies, (3) a cutoff value of ≥4 for determining any PCa, (4) a cutoff value of ≥3 for determining any PCa, (5) a cutoff value of ≥4 for determining csPCa, (6) a cutoff value of ≥3 for determining csPCa, (7) a cutoff value of ≥4 in studies using per-patient analysis, (8) a cutoff value of ≥4 in studies using per-lesion analysis, (9) studies analyzing PZ PCa, (10) studies analyzing TZ PCa, (11) patients without previous biopsies, and (12) patients with previous biopsies. The “metandi” and “midas” modules in Stata 10.0 (StataCorp LP, College Station, TX, USA) and “mada” package in R software version 3.2.1 (R Foundation for Statistical Computing, Vienna, Austria) were used for statistical analyses, with p < 0.05 signifying statistical significance.
A systematic literature search initially identified 287 articles. After removing 46 duplicates, screening of the 241 titles and abstracts yielded 105 potentially eligible articles. Full-text reviews were performed, and 84 studies were excluded
for the following reasons: not in the field of interest ( n = 80, including 68 studies that used only PI-RADSv1), insufficient data to reconstruct 2 × 2 tables ( n = 2), and shared study population with other studies ( n = 2). Ultimately, 21 original articles including a total of 3857 patients assessing the diagnostic performance of PI-RADSv2 were included in the meta-analysis 16
Fig. 1
PRISMA flow diagram showing study selection process for meta-analysis. a Included original articles for qualitative and quantitative analyses are references 16
Patient characteristics are shown in Table 1 . The size of the study population ranged from 49 to 456 patients, with the percentage of
those with PCa ranging from 37% to 100%. The patients had a median age of 62–69.6 yr, median PSA of 3.97–15 ng/ml, and a Gleason score ranging from 5 to 10. Patients had already been diagnosed with PCa prior to MRI in all or some of the study
populations in seven studies [16
Table 1Patient characteristics
Characteristics of the studies are summarized in Table 2 . MRI was performed using 3-T scanners in 16 studies 16
a Blinded but aware that patients had biopsy-proven PCa.
b PI-RADSv2 scores were generated from reports based on PI-RADSv1 and simplified qualitative system.
c Epstein's criteria = Gleason pattern ≥4, or Gleason 3 + 3 disease with core length ≥50% and/or >2 cores positive.
d For transition zone, cutoff value = 4.
csPCa = clinically significant prostate cancer; GS = Gleason score; EPE = extraprostatic extension; MRGB = magnetic resonance imaging-guided biopsy; MRI = magnetic resonance imaging; MRI–TRUS = fusion of magnetic resonance imaging and transrectal ultrasound images; NR = not reported; PCa = prostate cancer; PI-RADSv2 = Prostate Imaging Reporting and Data System version 2; PZ = peripheral zone; RP = radical prostatectomy; STRUSGB = systematic transrectal ultrasound-guided biopsy; TTB = transperineal template biopsy; TZ = transition zone.Table 2Study characteristics
Overall, the quality of the studies was not considered high, mainly due to the patient selection domain ( Fig. 2 ). Regarding the patient
selection domain, there was generally a high risk of bias as all but four of the studies were retrospective in nature 16
The sensitivity and specificity of individual studies were 73–100% and 8–100%, respectively. The Q-test revealed that substantial heterogeneity was present ( p < 0.001). The Higgins I 2 statistics demonstrated substantial heterogeneity in terms of the sensitivity ( I 2 = 85.55%) and considerable heterogeneity in terms of the specificity ( I 2 = 95.30%). The coupled forest plot of the sensitivity and specificity demonstrated the absence of a threshold effect ( Fig. 3 ). The Spearman correlation coefficient between the sensitivity and the false-positive rate was 0.45 (95% confidence interval [CI] 0.023–0.738), also indicating the lack of a threshold effect.
Fig. 3
Coupled forest plots of pooled sensitivity and specificity. Numbers are pooled estimates with 95% CI in parentheses. Corresponding heterogeneity statistics are provided at bottom right corners. Horizontal lines indicate 95% CIs. CI = confidence intervals.
For all 21 studies combined, the pooled sensitivity was 0.89 (95% CI 0.86–0.92) with specificity of 0.73 (95% CI 0.60–0.83; Fig. 3 ). In the HSROC curve, there was a large difference between the 95% confidence region and the 95% prediction region, thus indicating heterogeneity between the studies ( Fig. 4 ). The area under the HSROC curve was 0.91 (95% CI 0.88–0.93). According to the Deeks’ funnel plot, the likelihood of publication bias was low, with a p value of 0.75 for the slope coefficient ( Fig. 5 ).
Fig. 4
Hierarchical summary receiver operating characteristic curve of the diagnostic performance of PI-RADSv2 for detecting prostate cancer. HSROC = hierarchical summary receiver operating characteristic; PI-RADSv2 = Prostate Imaging Reporting and Data System version 2.
Fig. 5
Deeks’ funnel plot. A p value of 0.75 suggests that the likelihood of publication bias is low.
For the six studies that provided a head-to-head comparison between PI-RADSv1 and PI-RADSv2, PI-RADSv2 demonstrated higher pooled sensitivity of 0.95 (95% CI 0.85–0.98) compared with 0.88 (95% CI 0.80–0.93) for PI-RADSv1 ( p = 0.04). However, the pooled specificity did not show a significant difference between the two versions of PI-RADS: 0.73 (95% CI 0.47–0.89) for v2 and 0.75 (95% CI 0.36–0.94) for v1 ( p = 0.90).
As we found a considerable heterogeneity among the included studies, meta-regression analyses were performed (Supplementary Table 1 ). Among several potential variables, proportion of patients with PCa, magnetic field strength, and reference standard were significant factors affecting the heterogeneity ( p < 0.01 for all). However, among these three, only specificity according to the proportion of patients with PCa showed a statistically significant and clinically meaningful difference: 0.65 (95% CI 0.52–0.78) in studies ≤50% of patients with PCa versus 0.86 (0.75–0.97) in studies with >50% of patients with PCa. Otherwise, no clinically meaningful differences were seen: for magnet strength (3 vs 1.5 T), sensitivity of 0.90 (95% CI 0.86–0.94) versus 0.89 (95% CI 0.81–0.97, p = 0.03) and specificity of 0.73 (95% CI 0.59–0.86) versus 0.72 (95% CI 0.44–1.00, p = 0.81); for reference standard (radical prostatectomy vs biopsy), sensitivity of 0.89 (95% CI 0.83–0.95) versus 0.91 (95% CI 0.88–0.94, p < 0.01) and specificity of 0.65 (95% CI 0.37–0.94) versus 0.73 (95% CI 0.58–0.87, p = 0.48). Other variables, including the cutoff value, use of endorectal coil, and type of analysis (per patient vs per lesion), were not significant factors ( p = 0.32–0.70).
As there were four studies that used both ≥3 and ≥4 as cutoff values [17
Based on the localization of PCa, the pooled sensitivity was 0.93 (95% CI 0.87–0.96) with specificity of 0.68 (95% CI 0.43–0.86) in seven studies analyzing PZ cancers [20
Studies including patients without previous biopsies yielded sensitivity of 0.82 (95% CI 0.72–0.90) and specificity of 0.75 (95% CI 0.65–0.83), whereas the diagnostic performance values were 0.87 (95% CI 0.80–0.92) and 0.71 (95% CI 0.42–0.89) in studies including patients with a history of previous biopsy.
In our meta-analysis, we assessed the diagnostic accuracy of PI-RADSv2 for detecting PCa. The pooled sensitivity and specificity of all 21 studies were 0.89 (95% CI 0.86–0.92) and 0.73 (95% CI 0.60–0.83), respectively. When comparing our data
with the only two existing meta-analyses using mpMRI for detecting PCa, a trend toward higher sensitivity and lower specificity can be inferred. In the study by de Rooij et al [37]
Considering that one of the main intentions for the generation of PI-RADS was to standardize reporting of mpMRI in order to decrease variability and bring about widespread acceptance and implementation in daily practice, it was promising to find
that nearly all (20 of 21) studies used PI-RADSv2 strictly according to published guidelines [11]
In the current study, subgroup analyses were performed to account for differences in outcomes (any cancer vs clinically significant cancer). There was no significant difference for using either outcome irrespective of whether the criteria of ≥3
or ≥4 were used. However, the definition of clinically significant cancer was different among the 13 studies. Only three studies defined csPCa strictly according to the PI-RADSv2 guidelines (Gleason score >7 [3 + 4], volume >0.5 ml,
or extraprostatic extension) [11]
In this meta-analysis, we looked into the technical aspects of MRI. Meta-regression analyses revealed that the use of endorectal coil was not a statistically significant factor. Furthermore, although magnet strength showed statistically significant
differences between 3 and 1.5 T, this did not reveal to be clinically meaningful (sensitivity of 0.90 vs 0.89, p = 0.03, respectively). Although there had been debate over these two issues in the past, both 3 and 1.5 T are now well
established, and the overall benefit of using an endorectal coil is not evident [38
Regarding the methods of analysis in the studies, there was significant heterogeneity regarding reference standard and type of analysis. Radical prostatectomy was the reference standard in five studies, while the majority were based on a combination of systematic and targeted biopsies. The possibility of PCa despite negative biopsy results in the latter group should be kept in mind. In addition, approximately half of the studies each reported outcomes in a per-patient ( n = 11) and per-lesion ( n = 10) Manner. Per-lesion analysis is known to take into account the performance of localizing the disease; however, this was not shown to be a significant factor in the meta-regression analysis.
Our meta-analysis had some limitations. Nearly all studies were retrospective in study design, resulting in a high risk of bias for patient selection. It is possible that pooling data from predominantly retrospective studies may have led to increased
diagnostic sensitivity [40]
PI-RADSv2 shows good performance for the detection of PCa with pooled sensitivity of 0.89 and specificity of 0.73. PI-RADSv2 has higher pooled sensitivity compared with PI-RADSv1 without significantly different specificity.
Author contributions: Sang Youn Kim had full access to all the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis.
Study concept and design: Woo, Suh, S.Y. Kim.
Acquisition of data: Woo, Suh, S.Y. Kim.
Analysis and interpretation of data: Woo, Suh, S.Y. Kim.
Drafting of the manuscript: Woo.
Critical revision of the manuscript for important intellectual content: Suh, S.Y. Kim, Cho, S.H. Kim.
Statistical analysis: Suh.
Obtaining funding: None.
Administrative, technical, or material support: None.
Supervision: S.Y. Kim, Cho, S.H. Kim.
Other: None.
Financial disclosures: Sang Youn Kim certifies that all conflicts of interest, including specific financial interests and relationships and affiliations relevant to the subject matter or materials discussed in the manuscript (eg, employment/affiliation, grants or funding, consultancies, honoraria, stock ownership or options, expert testimony, royalties, or patents filed, received, or pending), are the following: None.
Funding/Support and role of the sponsor: None.
Prostate cancer (PCa) is the second leading cause of cancer-related mortality in Western men [1]
To bring standardization to the evaluation and reporting of mpMRI of the prostate, the European Society of Urogenital Radiology (ESUR) published a guideline termed Prostate Imaging Reporting and Data System (PI-RADS) in 2012 [5]
To address these issues, the ESUR and American College of Radiology recently released the updated PI-RADS version 2 (PI-RADSv2) [11]
Since then, several studies assessing the value of PI-RADSv2 have been published. However, the diagnostic performance of this new scoring system has not been evaluated systematically. Therefore, the purpose of our study was to assess the diagnostic performance of PI-RADSv2 for the detection of PCa. In addition, we aimed to compare the diagnostic performance of PI-RADSv1 and PI-RADSv2 in studies available for head-to-head comparison.
This meta-analysis was performed and written according to the preferred reporting items for systematic reviews and meta-analyses (PRISMA) guidelines [12]
A computerized search of MEDLINE and EMBASE up to December 7, 2016, was performed in order to identify studies evaluating the diagnostic performance of PI-RADSv2 for the detection of PCa. The search query combined synonyms for PCa, MRI, and PI-RADS as follows: (prostate cancer OR prostatic cancer OR prostate neoplasm OR prostatic neoplasm OR prostate tumor OR prostatic tumor OR prostate carcinoma OR prostatic carcinoma OR PCa) AND (magnetic resonance imaging OR MRI OR MR) AND (prostate imaging reporting and data system OR pi-rads OR pi rads OR pirads). Bibliographies of identified articles were also screened to expand the scope of search. Our search was limited to publications in English.
Studies were included if they satisfied all the following requirements according to the PICOS criteria [12]
Studies were excluded if any of the following criteria were met: (1) studies involving <10 patients; (2) review articles, guidelines, consensus statements, letters, editorials, and conference abstracts; (3) studies using only PI-RADSv1 for the evaluation of mpMRI of the prostate; (4) studies focusing on topics other than using the PI-RADSv2 system for diagnosing PCa (ie, staging and prediction of biochemical recurrence); and (5) studies with overlapping patient population.
Two reviewers (S.W. and C.H.S., with 3 yr of experience in performing systematic reviews and meta-analyses) independently evaluated the eligibility of the selected studies from the literature. Disagreements, if present between the two reviewers, were resolved by consensus via discussion with a third reviewer (S.Y.K.).
We extracted the following data regarding study design and results from the selected studies using a standardized form:
Study characteristics—authors, year of publication, country and institution of origin, duration of patient recruitment, and study design (prospective vs retrospective and consecutive or not)
Demographic and clinical characteristics—sample size, number of patients with PCa, patient age, prostate-specific antigen (PSA) level and Gleason score, number of previous biopsies, and PCa diagnosis prior to mpMRI
Technical characteristics of mpMRI—scanner model and manufacturer, magnetic field strength (1.5 vs 3 T), coil type (endorectal vs pelvic phased array), and specific sequences used (T2WI, DWI, DCE, or MR spectroscopy)
Interpretation of mpMRI—number of reviewers and experience in prostate mpMRI, independent or consensus reading, and blinding to clinicopathological information
Reference standard—type of reference standard (radical prostatectomy, targeted biopsy, or systematic biopsy), interval between MRI and pathology, outcomes assessed (any PCa vs csPCa), definition of csPCa (studies assessing “clinically significant”, “aggressive”, or “high-grade” PCa were all considered to assess csPCa; however, only studies that used the definition as provided by the PI-RADSv2 guideline [Gleason score >7 [3 + 4], volume >0.5 ml, or extraprostatic extension] were considered not to have concern for applicability), separate analysis for the PZ and TZ, and type of analysis (per patient vs per lesion)
Diagnostic performance of PI-RADSv2 including criteria or cutoff values (in case of multiple readers, the results of the most experienced reader were extracted for this meta-analysis)
The methodological quality of the included studies was assessed using tailored questionnaires and criteria provided by Quality Assessment of Diagnostic Accuracy Studies-2 [13]
The diagnostic performance of PI-RADSv2 for the detection of PCa was the primary outcome for this meta-analysis. In addition, a comparison between the diagnostic performance of PI-RADSv2 and that of PI-RADSv1 using studies that reported head-to-head comparison data of the two PI-RADS versions was considered a secondary outcome.
Pooled estimates of sensitivity and specificity were calculated using hierarchical logistic regression modeling including bivariate modeling and hierarchical summary receiver operating characteristic (HSROC) modeling [14]
We performed meta-regression analyses to investigate the cause of heterogeneity. The following covariates were considered for the bivariate model: (1) proportion of patients with PCa (>50% vs ≤50%), (2) magnet strength of MRI (3 vs 1.5 T), (3) use of endorectal coil, (4) cutoff value (≥4 vs ≥3), (5) reference standard (radical prostatectomy vs biopsy), and (6) type of analysis (per patient vs per lesion). In addition, multiple subgroup analyses were performed for cutoff value, outcome, and previous biopsy history to assess various clinical settings: (1) a cutoff value of ≥4 for all studies, (2) a cutoff value of ≥3 for all studies, (3) a cutoff value of ≥4 for determining any PCa, (4) a cutoff value of ≥3 for determining any PCa, (5) a cutoff value of ≥4 for determining csPCa, (6) a cutoff value of ≥3 for determining csPCa, (7) a cutoff value of ≥4 in studies using per-patient analysis, (8) a cutoff value of ≥4 in studies using per-lesion analysis, (9) studies analyzing PZ PCa, (10) studies analyzing TZ PCa, (11) patients without previous biopsies, and (12) patients with previous biopsies. The “metandi” and “midas” modules in Stata 10.0 (StataCorp LP, College Station, TX, USA) and “mada” package in R software version 3.2.1 (R Foundation for Statistical Computing, Vienna, Austria) were used for statistical analyses, with p < 0.05 signifying statistical significance.
A systematic literature search initially identified 287 articles. After removing 46 duplicates, screening of the 241 titles and abstracts yielded 105 potentially eligible articles. Full-text reviews were performed, and 84 studies were excluded for the following reasons: not in the field of interest (n = 80, including 68 studies that used only PI-RADSv1), insufficient data to reconstruct 2 × 2 tables (n = 2), and shared study population with other studies (n = 2). Ultimately, 21 original articles including a total of 3857 patients assessing the diagnostic performance of PI-RADSv2 were included in the meta-analysis 16
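The selection counts above can be checked with simple arithmetic. The short sketch below uses only the numbers reported in the text; the variable names are illustrative.

```python
# Counts reported in the study selection paragraph.
identified = 287
duplicates = 46
screened = identified - duplicates  # titles and abstracts screened

full_text_reviewed = 105
exclusions = {
    "not in the field of interest": 80,
    "insufficient data to reconstruct 2 x 2 tables": 2,
    "shared study population": 2,
}
excluded = sum(exclusions.values())
included = full_text_reviewed - excluded

print(screened, excluded, included)
```

The three printed values should match the 241 screened records, 84 full-text exclusions, and 21 included studies stated in the text.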
Fig. 1
PRISMA flow diagram showing study selection process for meta-analysis. a Included original articles for qualitative and quantitative analyses are references 16
Patient characteristics are shown in Table 1. The size of the study population ranged from 49 to 456 patients, with the percentage of those with PCa ranging from 37% to 100%. The patients had a median age of 62–69.6 yr, median PSA of 3.97–15 ng/ml, and a Gleason score ranging from 5 to 10. Patients had already been diagnosed with PCa prior to MRI in all or some of the study populations in seven studies [16
Table 1
Patient characteristics
Characteristics of the studies are summarized in Table 2. MRI was performed using 3-T scanners in 16 studies 16
a Blinded but aware that patients had biopsy-proven PCa.
b PI-RADSv2 scores were generated from reports based on PI-RADSv1 and simplified qualitative system.
c Epstein's criteria = Gleason pattern ≥4, or Gleason 3 + 3 disease with core length ≥50% and/or >2 cores positive.
d For transition zone, cutoff value = 4.
csPCa = clinically significant prostate cancer; GS = Gleason score; EPE = extraprostatic extension; MRGB = magnetic resonance imaging-guided biopsy; MRI = magnetic resonance imaging; MRI–TRUS = fusion of magnetic resonance imaging and transrectal ultrasound images; NR = not reported; PCa = prostate cancer; PI-RADSv2 = Prostate Imaging Reporting and Data System version 2; PZ = peripheral zone; RP = radical prostatectomy; STRUSGB = systematic transrectal ultrasound-guided biopsy; TTB = transperineal template biopsy; TZ = transition zone.
Table 2
Study characteristics
Overall, the quality of the studies was not considered high, mainly due to the patient selection domain (Fig. 2). Regarding the patient selection domain, there was generally a high risk of bias as all but four of the studies were retrospective in nature 16
The sensitivity and specificity of individual studies were 73–100% and 8–100%, respectively. The Q-test revealed that substantial heterogeneity was present (p < 0.001). The Higgins I² statistics demonstrated substantial heterogeneity in terms of the sensitivity (I² = 85.55%) and considerable heterogeneity in terms of the specificity (I² = 95.30%). The coupled forest plot of the sensitivity and specificity demonstrated the absence of a threshold effect (Fig. 3). The Spearman correlation coefficient between the sensitivity and the false-positive rate was 0.45 (95% confidence interval [CI] 0.023–0.738), also indicating the lack of a threshold effect.
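The two heterogeneity diagnostics named above can be sketched as follows. This is a simplified univariate illustration, assuming inverse-variance weights on logit-transformed estimates and the usual relation I² = max(0, (Q − df)/Q) × 100%; it is not the routine used by the Stata or R modules in the analysis. All function names and data are hypothetical.

```python
import math


def cochran_q_i2(estimates, variances):
    """Cochran's Q and Higgins I^2 (%) for inverse-variance weighted estimates."""
    weights = [1 / v for v in variances]
    pooled = sum(w * y for w, y in zip(weights, estimates)) / sum(weights)
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, estimates))
    df = len(estimates) - 1
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return q, i2


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


# Hypothetical per-study values, for illustration only.
sensitivity = [0.95, 0.90, 0.85, 0.80]
false_positive_rate = [0.40, 0.30, 0.20, 0.10]
rho = spearman_rho(sensitivity, false_positive_rate)  # monotone, so near 1
```

A strongly positive correlation between sensitivity and false-positive rate would suggest that studies differ mainly in the positivity threshold they applied.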
Fig. 3
Coupled forest plots of pooled sensitivity and specificity. Numbers are pooled estimates with 95% CI in parentheses. Corresponding heterogeneity statistics are provided at bottom right corners. Horizontal lines indicate 95% CIs. CI = confidence intervals.
For all 21 studies combined, the pooled sensitivity was 0.89 (95% CI 0.86–0.92) with specificity of 0.73 (95% CI 0.60–0.83; Fig. 3). In the HSROC curve, there was a large difference between the 95% confidence region and the 95% prediction region, thus indicating heterogeneity between the studies (Fig. 4). The area under the HSROC curve was 0.91 (95% CI 0.88–0.93). According to the Deeks’ funnel plot, the likelihood of publication bias was low, with a p value of 0.75 for the slope coefficient (Fig. 5).
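The slope coefficient in a Deeks' funnel plot comes from regressing the log diagnostic odds ratio on the inverse square root of the effective sample size, weighted by the effective sample size. The sketch below computes only that slope; the p value would additionally need a t test on the slope, which is omitted here. The function name, the 0.5 continuity correction, and the example counts are assumptions for illustration.

```python
import math


def deeks_slope(studies):
    """Weighted least-squares slope of ln(DOR) on 1/sqrt(ESS).

    studies: iterable of (tp, fp, fn, tn) counts, one tuple per study.
    ESS = 4 * n_diseased * n_nondiseased / (n_diseased + n_nondiseased).
    """
    xs, ys, ws = [], [], []
    for tp, fp, fn, tn in studies:
        # Assumption: add 0.5 to every cell so that zero cells stay finite.
        a, b, c, d = (v + 0.5 for v in (tp, fp, fn, tn))
        ln_dor = math.log((a * d) / (b * c))
        n_dis, n_non = tp + fn, fp + tn
        ess = 4 * n_dis * n_non / (n_dis + n_non)
        xs.append(1 / math.sqrt(ess))
        ys.append(ln_dor)
        ws.append(ess)
    sw = sum(ws)
    x_bar = sum(w * x for w, x in zip(ws, xs)) / sw
    y_bar = sum(w * y for w, y in zip(ws, ys)) / sw
    sxx = sum(w * (x - x_bar) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - x_bar) * (y - y_bar) for w, x, y in zip(ws, xs, ys))
    return sxy / sxx


# Hypothetical studies: the smallest one reports the largest odds ratio.
example = [(19, 1, 1, 19), (40, 10, 10, 40), (80, 20, 20, 80)]
slope = deeks_slope(example)  # positive here: small studies look better
```

A slope close to zero, as implied by the reported p value of 0.75, means that smaller studies did not systematically report better accuracy.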
Fig. 4
Hierarchical summary receiver operating characteristic curve of the diagnostic performance of PI-RADSv2 for detecting prostate cancer. HSROC = hierarchical summary receiver operating characteristic; PI-RADSv2 = Prostate Imaging Reporting and Data System version 2.
Fig. 5
Deeks’ funnel plot. A p value of 0.75 suggests that the likelihood of publication bias is low.
For the six studies that provided a head-to-head comparison between PI-RADSv1 and PI-RADSv2, PI-RADSv2 demonstrated higher pooled sensitivity of 0.95 (95% CI 0.85–0.98) compared with 0.88 (95% CI 0.80–0.93) for PI-RADSv1 (p = 0.04). However, the pooled specificity did not show a significant difference between the two versions of PI-RADS: 0.73 (95% CI 0.47–0.89) for v2 and 0.75 (95% CI 0.36–0.94) for v1 (p = 0.90).
As we found considerable heterogeneity among the included studies, meta-regression analyses were performed (Supplementary Table 1). Among several potential variables, proportion of patients with PCa, magnetic field strength, and reference standard were significant factors affecting the heterogeneity (p < 0.01 for all). However, among these three, only specificity according to the proportion of patients with PCa showed a statistically significant and clinically meaningful difference: 0.65 (95% CI 0.52–0.78) in studies with ≤50% of patients with PCa versus 0.86 (95% CI 0.75–0.97) in studies with >50% of patients with PCa. Otherwise, no clinically meaningful differences were seen: for magnet strength (3 vs 1.5 T), sensitivity of 0.90 (95% CI 0.86–0.94) versus 0.89 (95% CI 0.81–0.97, p = 0.03) and specificity of 0.73 (95% CI 0.59–0.86) versus 0.72 (95% CI 0.44–1.00, p = 0.81); for reference standard (radical prostatectomy vs biopsy), sensitivity of 0.89 (95% CI 0.83–0.95) versus 0.91 (95% CI 0.88–0.94, p < 0.01) and specificity of 0.65 (95% CI 0.37–0.94) versus 0.73 (95% CI 0.58–0.87, p = 0.48). Other variables, including the cutoff value, use of endorectal coil, and type of analysis (per patient vs per lesion), were not significant factors (p = 0.32–0.70).
As there were four studies that used both ≥3 and ≥4 as cutoff values [17
Based on the localization of PCa, the pooled sensitivity was 0.93 (95% CI 0.87–0.96) with specificity of 0.68 (95% CI 0.43–0.86) in seven studies analyzing PZ cancers [20
Studies including patients without previous biopsies yielded sensitivity of 0.82 (95% CI 0.72–0.90) and specificity of 0.75 (95% CI 0.65–0.83), whereas the diagnostic performance values were 0.87 (95% CI 0.80–0.92) and 0.71 (95% CI 0.42–0.89) in studies including patients with a history of previous biopsy.
In our meta-analysis, we assessed the diagnostic accuracy of PI-RADSv2 for detecting PCa. The pooled sensitivity and specificity of all 21 studies were 0.89 (95% CI 0.86–0.92) and 0.73 (95% CI 0.60–0.83), respectively. When our data are compared with the only two existing meta-analyses using mpMRI for detecting PCa, a trend toward higher sensitivity and lower specificity can be inferred. In the study by de Rooij et al [37]
Considering that one of the main intentions for the generation of PI-RADS was to standardize reporting of mpMRI in order to decrease variability and bring about widespread acceptance and implementation in daily practice, it was promising to find that nearly all (20 of 21) studies used PI-RADSv2 strictly according to published guidelines [11].
In the current study, subgroup analyses were performed to account for differences in outcomes (any cancer vs clinically significant cancer). There was no significant difference for using either outcome irrespective of whether the criteria of ≥3 or ≥4 were used. However, the definition of clinically significant cancer was different among the 13 studies. Only three studies defined csPCa strictly according to the PI-RADSv2 guidelines (Gleason score ≥7 [including 3 + 4], volume ≥0.5 ml, or extraprostatic extension) [11].
In this meta-analysis, we looked into the technical aspects of MRI. Meta-regression analyses revealed that the use of endorectal coil was not a statistically significant factor. Furthermore, although magnet strength showed a statistically significant difference between 3 and 1.5 T, the difference was not clinically meaningful (sensitivity of 0.90 vs 0.89, respectively; p = 0.03). Although there had been debate over these two issues in the past, both 3 and 1.5 T are now well established, and the overall benefit of using an endorectal coil is not evident [38
Regarding the methods of analysis in the studies, there was significant heterogeneity regarding reference standard and type of analysis. Radical prostatectomy was the reference standard in five studies, while the majority were based on a combination of systematic and targeted biopsies. The possibility of PCa despite negative biopsy results in the latter group should be kept in mind. In addition, approximately half of the studies reported outcomes in a per-patient manner (n = 11) and the other half in a per-lesion manner (n = 10). Per-lesion analysis is known to take into account the performance of localizing the disease; however, this was not shown to be a significant factor in the meta-regression analysis.
Our meta-analysis had some limitations. Nearly all studies were retrospective in study design, resulting in a high risk of bias for patient selection. It is possible that pooling data from predominantly retrospective studies may have led to increased diagnostic sensitivity [40].
PI-RADSv2 shows good performance for the detection of PCa with pooled sensitivity of 0.89 and specificity of 0.73. PI-RADSv2 has higher pooled sensitivity compared with PI-RADSv1 without significantly different specificity.
Author contributions: Sang Youn Kim had full access to all the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis.
Study concept and design: Woo, Suh, S.Y. Kim.
Acquisition of data: Woo, Suh, S.Y. Kim.
Analysis and interpretation of data: Woo, Suh, S.Y. Kim.
Drafting of the manuscript: Woo.
Critical revision of the manuscript for important intellectual content: Suh, S.Y. Kim, Cho, S.H. Kim.
Statistical analysis: Suh.
Obtaining funding: None.
Administrative, technical, or material support: None.
Supervision: S.Y. Kim, Cho, S.H. Kim.
Other: None.
Financial disclosures: Sang Youn Kim certifies that all conflicts of interest, including specific financial interests and relationships and affiliations relevant to the subject matter or materials discussed in the manuscript (eg, employment/affiliation, grants or funding, consultancies, honoraria, stock ownership or options, expert testimony, royalties, or patents filed, received, or pending), are the following: None.
Funding/Support and role of the sponsor: None.
Prostate cancer (PCa) is the second leading cause of cancer-related mortality in Western men [1].
To bring standardization to the evaluation and reporting of mpMRI of the prostate, the European Society of Urogenital Radiology (ESUR) published a guideline termed Prostate Imaging Reporting and Data System (PI-RADS) in 2012 [5].
To address these issues, the ESUR and American College of Radiology recently released the updated PI-RADS version 2 (PI-RADSv2) [11].
Since then, several studies assessing the value of PI-RADSv2 have been published. However, the diagnostic performance of this new scoring system has not been evaluated systematically. Therefore, the purpose of our study was to assess the diagnostic performance of PI-RADSv2 for the detection of PCa. In addition, we aimed to compare the diagnostic performance of PI-RADSv1 and PI-RADSv2 in studies available for head-to-head comparison.
This meta-analysis was performed and written according to the preferred reporting items for systematic reviews and meta-analyses (PRISMA) guidelines [12].
A computerized search of MEDLINE and EMBASE up to December 7, 2016, was performed in order to identify studies evaluating the diagnostic performance of PI-RADSv2 for the detection of PCa. The search query combined synonyms for PCa, MRI, and PI-RADS as follows: (prostate cancer OR prostatic cancer OR prostate neoplasm OR prostatic neoplasm OR prostate tumor OR prostatic tumor OR prostate carcinoma OR prostatic carcinoma OR PCa) AND (magnetic resonance imaging OR MRI OR MR) AND (prostate imaging reporting and data system OR pi-rads OR pi rads OR pirads). Bibliographies of identified articles were also screened to expand the scope of search. Our search was limited to publications in English.
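The structure of the search query, three synonym groups joined by AND with OR inside each group, can be reproduced programmatically. The sketch below rebuilds the exact string quoted above; the helper name `or_group` is hypothetical, and database-specific field tags are not modeled.

```python
def or_group(terms):
    """Join synonyms with OR and wrap the group in parentheses."""
    return "(" + " OR ".join(terms) + ")"


pca_terms = [
    "prostate cancer", "prostatic cancer",
    "prostate neoplasm", "prostatic neoplasm",
    "prostate tumor", "prostatic tumor",
    "prostate carcinoma", "prostatic carcinoma",
    "PCa",
]
mri_terms = ["magnetic resonance imaging", "MRI", "MR"]
pirads_terms = [
    "prostate imaging reporting and data system",
    "pi-rads", "pi rads", "pirads",
]

# A record must match at least one term from every group.
query = " AND ".join(or_group(g) for g in (pca_terms, mri_terms, pirads_terms))
print(query)
```

Keeping the synonym lists separate from the joining logic makes it straightforward to rerun or extend the search at a later date.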
Studies were included if they satisfied all the following requirements according to the PICOS criteria [12]
Studies were excluded if any of the following criteria were met: (1) studies involving <10 patients; (2) review articles, guidelines, consensus statements, letters, editorials, and conference abstracts; (3) studies using only PI-RADSv1 for the evaluation of mpMRI of the prostate; (4) studies focusing on topics other than using the PI-RADSv2 system for diagnosing PCa (ie, staging and prediction of biochemical recurrence); and (5) studies with overlapping patient population.
Two reviewers (S.W. and C.H.S., with 3 yr of experience in performing systematic reviews and meta-analyses) independently evaluated the eligibility of the selected studies from the literature. Disagreements, if present between the two reviewers, were resolved by consensus via discussion with a third reviewer (S.Y.K.).
We extracted the following data regarding study design and results from the selected studies using a standardized form:
Study characteristics: authors, year of publication, country and institution of origin, duration of patient recruitment, and study design (prospective vs retrospective and consecutive or not)
Demographic and clinical characteristics: sample size, number of patients with PCa, patient age, prostate-specific antigen (PSA) level and Gleason score, number of previous biopsies, and PCa diagnosis prior to mpMRI
Technical characteristics of mpMRI: scanner model and manufacturer, magnetic field strength (1.5 vs 3 T), coil type (endorectal vs pelvic phased array), and specific sequences used (T2WI, DWI, DCE, or MR spectroscopy)
Interpretation of mpMRI: number of reviewers and experience in prostate mpMRI, independent or consensus reading, and blinding to clinicopathological information
Reference standard: type of reference standard (radical prostatectomy, targeted biopsy, or systematic biopsy), interval between MRI and pathology, outcomes assessed (any PCa vs csPCa), definition of csPCa (studies assessing “clinically significant”, “aggressive”, or “high-grade” PCa were all considered to assess csPCa; however, only studies that used the definition as provided by the PI-RADSv2 guideline [Gleason score ≥7 (including 3 + 4), volume ≥0.5 ml, or extraprostatic extension] were considered not to have concern for applicability), separate analysis for the PZ and TZ, and type of analysis (per patient vs per lesion)
Diagnostic performance of PI-RADSv2, including criteria or cutoff values (in case of multiple readers, the results of the most experienced reader were extracted for this meta-analysis)
The methodological quality of the included studies was assessed using tailored questionnaires and criteria provided by Quality Assessment of Diagnostic Accuracy Studies-2 [13]
The diagnostic performance of PI-RADSv2 for the detection of PCa was the primary outcome for this meta-analysis. In addition, a comparison between the diagnostic performance of PI-RADSv2 and that of PI-RADSv1 using studies that reported head-to-head comparison data of the two PI-RADS versions was considered a secondary outcome.
Pooled estimates of sensitivity and specificity were calculated using hierarchical logistic regression modeling including bivariate modeling and hierarchical summary receiver operating characteristic (HSROC) modeling [14]
We performed meta-regression analyses to investigate the cause of heterogeneity. The following covariates were considered for the bivariate model: (1) proportion of patients with PCa (>50% vs ≤50%), (2) magnet strength of MRI (3 vs 1.5 T), (3) use of endorectal coil, (4) cutoff value (≥4 vs ≥3), (5) reference standard (radical prostatectomy vs biopsy), and (6) type of analysis (per patient vs per lesion). In addition, multiple subgroup analyses were performed for cutoff value, outcome, and previous biopsy history to assess various clinical settings: (1) a cutoff value of ≥4 for all studies, (2) a cutoff value of ≥3 for all studies, (3) a cutoff value of ≥4 for determining any PCa, (4) a cutoff value of ≥3 for determining any PCa, (5) a cutoff value of ≥4 for determining csPCa, (6) a cutoff value of ≥3 for determining csPCa, (7) a cutoff value of ≥4 in studies using per-patient analysis, (8) a cutoff value of ≥4 in studies using per-lesion analysis, (9) studies analyzing PZ PCa, (10) studies analyzing TZ PCa, (11) patients without previous biopsies, and (12) patients with previous biopsies. The “metandi” and “midas” modules in Stata 10.0 (StataCorp LP, College Station, TX, USA) and “mada” package in R software version 3.2.1 (R Foundation for Statistical Computing, Vienna, Austria) were used for statistical analyses, with p < 0.05 signifying statistical significance.
A systematic literature search initially identified 287 articles. After removing 46 duplicates, screening of the 241 titles and abstracts yielded 105 potentially eligible articles. Full-text reviews were performed, and 84 studies were excluded
for the following reasons: not in the field of interest ( n = 80, including 68 studies that used only PI-RADSv1), insufficient data to reconstruct 2 × 2 tables ( n = 2), and shared study population with other studies ( n = 2). Ultimately, 21 original articles including a total of 3857 patients assessing the diagnostic performance of PI-RADSv2 were included in the meta-analysis 16
Fig. 1
PRISMA flow diagram showing study selection process for meta-analysis. a Included original articles for qualitative and quantitative analyses are references 16
Patient characteristics are shown in Table 1 . The size of the study population ranged from 49 to 456 patients, with the percentage of
those with PCa ranging from 37% to 100%. The patients had a median age of 62–69.6 yr, median PSA of 3.97–15 ng/ml, and a Gleason score ranging from 5 to 10. Patients had already been diagnosed with PCa prior to MRI in all or some of the study
populations in seven studies [16
Table 1Patient characteristics
Characteristics of the studies are summarized in Table 2 . MRI was performed using 3-T scanners in 16 studies 16
a Blinded but aware that patients had biopsy-proven PCa.
b PI-RADSv2 scores were generated from reports based on PI-RADSv1 and simplified qualitative system.
c Epstein's criteria = Gleason pattern ≥4, or Gleason 3 + 3 disease with core length ≥50% and/or >2 cores positive.
d For transition zone, cutoff value = 4.
csPCa = clinically significant prostate cancer; GS = Gleason score; EPE = extraprostatic extension; MRGB = magnetic resonance imaging-guided biopsy; MRI = magnetic resonance imaging; MRI–TRUS = fusion of magnetic resonance imaging and transrectal ultrasound images; NR = not reported; PCa = prostate cancer; PI-RADSv2 = Prostate Imaging Reporting and Data System version 2; PZ = peripheral zone; RP = radical prostatectomy; STRUSGB = systematic transrectal ultrasound-guided biopsy; TTB = transperineal template biopsy; TZ = transition zone.Table 2Study characteristics
Overall, the quality of the studies was not considered high, mainly due to the patient selection domain ( Fig. 2 ). Regarding the patient
selection domain, there was generally a high risk of bias as all but four of the studies were retrospective in nature 16
The sensitivity and specificity of individual studies were 73–100% and 8–100%, respectively. The Q-test revealed that substantial heterogeneity was present ( p < 0.001). The Higgins I 2 statistics demonstrated substantial heterogeneity in terms of the sensitivity ( I 2 = 85.55%) and considerable heterogeneity in terms of the specificity ( I 2 = 95.30%). The coupled forest plot of the sensitivity and specificity demonstrated the absence of a threshold effect ( Fig. 3 ). The Spearman correlation coefficient between the sensitivity and the false-positive rate was 0.45 (95% confidence interval [CI] 0.023–0.738), also indicating the lack of a threshold effect.
Fig. 3
Coupled forest plots of pooled sensitivity and specificity. Numbers are pooled estimates with 95% CI in parentheses. Corresponding heterogeneity statistics are provided at bottom right corners. Horizontal lines indicate 95% CIs. CI = confidence intervals.
For all 21 studies combined, the pooled sensitivity was 0.89 (95% CI 0.86–0.92) with specificity of 0.73 (95% CI 0.60–0.83; Fig. 3 ). In the HSROC curve, there was a large difference between the 95% confidence region and the 95% prediction region, thus indicating heterogeneity between the studies ( Fig. 4 ). The area under the HSROC curve was 0.91 (95% CI 0.88–0.93). According to the Deeks’ funnel plot, the likelihood of publication bias was low, with a p value of 0.75 for the slope coefficient ( Fig. 5 ).
Fig. 4
Hierarchical summary receiver operating characteristic curve of the diagnostic performance of PI-RADSv2 for detecting prostate cancer. HSROC = hierarchical summary receiver operating characteristic; PI-RADSv2 = Prostate Imaging Reporting and Data System version 2.
Fig. 5
Deeks’ funnel plot. A p value of 0.75 suggests that the likelihood of publication bias is low.
For the six studies that provided a head-to-head comparison between PI-RADSv1 and PI-RADSv2, PI-RADSv2 demonstrated higher pooled sensitivity of 0.95 (95% CI 0.85–0.98) compared with 0.88 (95% CI 0.80–0.93) for PI-RADSv1 ( p = 0.04). However, the pooled specificity did not show a significant difference between the two versions of PI-RADS: 0.73 (95% CI 0.47–0.89) for v2 and 0.75 (95% CI 0.36–0.94) for v1 ( p = 0.90).
As we found a considerable heterogeneity among the included studies, meta-regression analyses were performed (Supplementary Table 1 ). Among several potential variables, proportion of patients with PCa, magnetic field strength, and reference standard were significant factors affecting the heterogeneity ( p < 0.01 for all). However, among these three, only specificity according to the proportion of patients with PCa showed a statistically significant and clinically meaningful difference: 0.65 (95% CI 0.52–0.78) in studies ≤50% of patients with PCa versus 0.86 (0.75–0.97) in studies with >50% of patients with PCa. Otherwise, no clinically meaningful differences were seen: for magnet strength (3 vs 1.5 T), sensitivity of 0.90 (95% CI 0.86–0.94) versus 0.89 (95% CI 0.81–0.97, p = 0.03) and specificity of 0.73 (95% CI 0.59–0.86) versus 0.72 (95% CI 0.44–1.00, p = 0.81); for reference standard (radical prostatectomy vs biopsy), sensitivity of 0.89 (95% CI 0.83–0.95) versus 0.91 (95% CI 0.88–0.94, p < 0.01) and specificity of 0.65 (95% CI 0.37–0.94) versus 0.73 (95% CI 0.58–0.87, p = 0.48). Other variables, including the cutoff value, use of endorectal coil, and type of analysis (per patient vs per lesion), were not significant factors ( p = 0.32–0.70).
As there were four studies that used both ≥3 and ≥4 as cutoff values [17
Based on the localization of PCa, the pooled sensitivity was 0.93 (95% CI 0.87–0.96) with specificity of 0.68 (95% CI 0.43–0.86) in seven studies analyzing PZ cancers [20
Studies including patients without previous biopsies yielded sensitivity of 0.82 (95% CI 0.72–0.90) and specificity of 0.75 (95% CI 0.65–0.83), whereas the diagnostic performance values were 0.87 (95% CI 0.80–0.92) and 0.71 (95% CI 0.42–0.89) in studies including patients with a history of previous biopsy.
In our meta-analysis, we assessed the diagnostic accuracy of PI-RADSv2 for detecting PCa. The pooled sensitivity and specificity of all 21 studies were 0.89 (95% CI 0.86–0.92) and 0.73 (95% CI 0.60–0.83), respectively. When comparing our data
with the only two existing meta-analyses using mpMRI for detecting PCa, a trend toward higher sensitivity and lower specificity can be inferred. In the study by de Rooij et al [37]
Considering that one of the main intentions for the generation of PI-RADS was to standardize reporting of mpMRI in order to decrease variability and bring about widespread acceptance and implementation in daily practice, it was promising to find
that nearly all (20 of 21) studies used PI-RADSv2 strictly according to published guidelines [11]
In the current study, subgroup analyses were performed to account for differences in outcomes (any cancer vs clinically significant cancer). There was no significant difference for using either outcome irrespective of whether the criteria of ≥3
or ≥4 were used. However, the definition of clinically significant cancer was different among the 13 studies. Only three studies defined csPCa strictly according to the PI-RADSv2 guidelines (Gleason score >7 [3 + 4], volume >0.5 ml,
or extraprostatic extension) [11]
In this meta-analysis, we looked into the technical aspects of MRI. Meta-regression analyses revealed that the use of endorectal coil was not a statistically significant factor. Furthermore, although magnet strength showed statistically significant
differences between 3 and 1.5 T, this did not reveal to be clinically meaningful (sensitivity of 0.90 vs 0.89, p = 0.03, respectively). Although there had been debate over these two issues in the past, both 3 and 1.5 T are now well
established, and the overall benefit of using an endorectal coil is not evident [38
Regarding the methods of analysis in the studies, there was significant heterogeneity regarding reference standard and type of analysis. Radical prostatectomy was the reference standard in five studies, while the majority were based on a combination of systematic and targeted biopsies. The possibility of PCa despite negative biopsy results in the latter group should be kept in mind. In addition, approximately half of the studies each reported outcomes in a per-patient ( n = 11) and per-lesion ( n = 10) Manner. Per-lesion analysis is known to take into account the performance of localizing the disease; however, this was not shown to be a significant factor in the meta-regression analysis.
Our meta-analysis had some limitations. Nearly all studies were retrospective in study design, resulting in a high risk of bias for patient selection. It is possible that pooling data from predominantly retrospective studies may have led to increased
diagnostic sensitivity [40]
PI-RADSv2 shows good performance for the detection of PCa with pooled sensitivity of 0.89 and specificity of 0.73. PI-RADSv2 has higher pooled sensitivity compared with PI-RADSv1 without significantly different specificity.
Author contributions: Sang Youn Kim had full access to all the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis.
Study concept and design: Woo, Suh, S.Y. Kim.
Acquisition of data: Woo, Suh, S.Y. Kim.
Analysis and interpretation of data: Woo, Suh, S.Y. Kim.
Drafting of the manuscript: Woo.
Critical revision of the manuscript for important intellectual content: Suh, S.Y. Kim, Cho, S.H. Kim.
Statistical analysis: Suh.
Obtaining funding: None.
Administrative, technical, or material support: None.
Supervision: S.Y. Kim, Cho, S.H. Kim.
Other: None.
Financial disclosures: Sang Youn Kim certifies that all conflicts of interest, including specific financial interests and relationships and affiliations relevant to the subject matter or materials discussed in the manuscript (eg, employment/affiliation, grants or funding, consultancies, honoraria, stock ownership or options, expert testimony, royalties, or patents filed, received, or pending), are the following: None.
Funding/Support and role of the sponsor: None.
Prostate cancer (PCa) is the second leading cause of cancer-related mortality in Western men [1]
To bring standardization to the evaluation and reporting of mpMRI of the prostate, the European Society of Urogenital Radiology (ESUR) published a guideline termed Prostate Imaging Reporting and Data System (PI-RADS) in 2012 [5]
To address these issues, the ESUR and American College of Radiology recently released the updated PI-RADS version 2 (PI-RADSv2) [11]
Since then, several studies assessing the value of PI-RADSv2 have been published. However, the diagnostic performance of this new scoring system has not been evaluated systematically. Therefore, the purpose of our study was to assess the diagnostic performance of PI-RADSv2 for the detection of PCa. In addition, we aimed to compare the diagnostic performance of PI-RADSv1 and PI-RADSv2 in studies available for head-to-head comparison.
This meta-analysis was performed and written according to the preferred reporting items for systematic reviews and meta-analyses (PRISMA) guidelines [12]
A computerized search of MEDLINE and EMBASE up to December 7, 2016, was performed in order to identify studies evaluating the diagnostic performance of PI-RADSv2 for the detection of PCa. The search query combined synonyms for PCa, MRI, and PI-RADS as follows: (prostate cancer OR prostatic cancer OR prostate neoplasm OR prostatic neoplasm OR prostate tumor OR prostatic tumor OR prostate carcinoma OR prostatic carcinoma OR PCa) AND (magnetic resonance imaging OR MRI OR MR) AND (prostate imaging reporting and data system OR pi-rads OR pi rads OR pirads). Bibliographies of identified articles were also screened to expand the scope of search. Our search was limited to publications in English.
Studies were included if they satisfied all the following requirements according to the PICOS criteria [12]
Studies were excluded if any of the following criteria were met: (1) studies involving <10 patients; (2) review articles, guidelines, consensus statements, letters, editorials, and conference abstracts; (3) studies using only PI-RADSv1 for the evaluation of mpMRI of the prostate; (4) studies focusing on topics other than using the PI-RADSv2 system for diagnosing PCa (ie, staging and prediction of biochemical recurrence); and (5) studies with overlapping patient population.
Two reviewers (S.W. and C.H.S., with 3 yr of experience in performing systematic reviews and meta-analyses) independently evaluated the eligibility of the selected studies from the literature. Disagreements, if present between the two reviewers, were resolved by consensus via discussion with a third reviewer (S.Y.K.).
We extracted the following data regarding study design and results from the selected studies using a standardized form:
Study characteristics—authors, year of publication, country and institution of origin, duration of patient recruitment, and study design (prospective vs retrospective and consecutive or not)
Demographic and clinical characteristics—sample size, number of patients with PCa, patient age, prostate-specific antigen (PSA) level and Gleason score, number of previous biopsies, and PCa diagnosis prior to mpMRI
Technical characteristics of mpMRI—scanner model and manufacturer, magnetic field strength (1.5 vs 3 T), coil type (endorectal vs pelvic phased array), and specific sequences used (T2WI, DWI, DCE, or MR spectroscopy)
Interpretation of mpMRI—number of reviewers and experience in prostate mpMRI, independent or consensus reading, and blinding to clinicopathological information
Reference standard—type of reference standard (radical prostatectomy, targeted biopsy, or systematic biopsy), interval between MRI and pathology, outcomes assessed (any PCa vs csPCa), definition of csPCa (studies assessing “clinically significant”, “aggressive”, or “high-grade” PCa were all considered to assess csPCa; however, only studies that used the definition as provided by the PI-RADSv2 guideline [Gleason score >7 [3 + 4], volume >0.5 ml, or extraprostatic extension] were considered not to have concern for applicability), separate analysis for the PZ and TZ, and type of analysis (per patient vs per lesion)
Diagnostic performance of PI-RADSv2 including criteria or cutoff values (in case of multiple readers, the results of the most experienced reader were extracted for this meta-analysis)
The methodological quality of the included studies was assessed using tailored questionnaires and criteria provided by Quality Assessment of Diagnostic Accuracy Studies-2 [13].
The diagnostic performance of PI-RADSv2 for the detection of PCa was the primary outcome for this meta-analysis. In addition, a comparison between the diagnostic performance of PI-RADSv2 and that of PI-RADSv1 using studies that reported head-to-head comparison data of the two PI-RADS versions was considered a secondary outcome.
Pooled estimates of sensitivity and specificity were calculated using hierarchical logistic regression modeling including bivariate modeling and hierarchical summary receiver operating characteristic (HSROC) modeling [14].
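As an illustration of the study-level quantities that feed such hierarchical models, the following minimal Python sketch computes sensitivity and specificity from one 2 × 2 table and transforms them to the logit scale. It does not fit the bivariate or HSROC model itself; the class and function names, the example counts, and the 0.5 continuity correction for zero cells are assumptions made for illustration and are not taken from the study.

```python
import math
from dataclasses import dataclass


@dataclass
class TwoByTwo:
    """Counts from one study's 2 x 2 table (index test vs reference standard)."""
    tp: int  # true positives
    fp: int  # false positives
    fn: int  # false negatives
    tn: int  # true negatives


def sensitivity(t: TwoByTwo) -> float:
    """Proportion of reference-positive cases with a positive index test."""
    return t.tp / (t.tp + t.fn)


def specificity(t: TwoByTwo) -> float:
    """Proportion of reference-negative cases with a negative index test."""
    return t.tn / (t.tn + t.fp)


def logit(p: float) -> float:
    """Log-odds transform; defined only for 0 < p < 1."""
    return math.log(p / (1.0 - p))


def logit_pair(t: TwoByTwo, correction: float = 0.5) -> tuple[float, float]:
    """Return (logit sensitivity, logit specificity) for one study.

    Assumption: if any cell is zero, `correction` is added to every cell
    so that both logits stay finite.
    """
    cells = (t.tp, t.fp, t.fn, t.tn)
    add = correction if 0 in cells else 0.0
    tp, fp, fn, tn = (c + add for c in cells)
    return logit(tp / (tp + fn)), logit(tn / (tn + fp))


# Hypothetical counts, not data from any included study.
example = TwoByTwo(tp=90, fp=30, fn=10, tn=70)
print(sensitivity(example), specificity(example))
print(logit_pair(example))
```

Working on the logit scale keeps the transformed values unbounded, which is why hierarchical models of this kind are usually specified there rather than on raw proportions.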
We performed meta-regression analyses to investigate the cause of heterogeneity. The following covariates were considered for the bivariate model: (1) proportion of patients with PCa (>50% vs ≤50%), (2) magnet strength of MRI (3 vs 1.5 T), (3) use of endorectal coil, (4) cutoff value (≥4 vs ≥3), (5) reference standard (radical prostatectomy vs biopsy), and (6) type of analysis (per patient vs per lesion). In addition, multiple subgroup analyses were performed for cutoff value, outcome, and previous biopsy history to assess various clinical settings: (1) a cutoff value of ≥4 for all studies, (2) a cutoff value of ≥3 for all studies, (3) a cutoff value of ≥4 for determining any PCa, (4) a cutoff value of ≥3 for determining any PCa, (5) a cutoff value of ≥4 for determining csPCa, (6) a cutoff value of ≥3 for determining csPCa, (7) a cutoff value of ≥4 in studies using per-patient analysis, (8) a cutoff value of ≥4 in studies using per-lesion analysis, (9) studies analyzing PZ PCa, (10) studies analyzing TZ PCa, (11) patients without previous biopsies, and (12) patients with previous biopsies. The “metandi” and “midas” modules in Stata 10.0 (StataCorp LP, College Station, TX, USA) and “mada” package in R software version 3.2.1 (R Foundation for Statistical Computing, Vienna, Austria) were used for statistical analyses, with p < 0.05 signifying statistical significance.
A systematic literature search initially identified 287 articles. After removing 46 duplicates, screening of the 241 titles and abstracts yielded 105 potentially eligible articles. Full-text reviews were performed, and 84 studies were excluded for the following reasons: not in the field of interest (n = 80, including 68 studies that used only PI-RADSv1), insufficient data to reconstruct 2 × 2 tables (n = 2), and shared study population with other studies (n = 2). Ultimately, 21 original articles including a total of 3857 patients assessing the diagnostic performance of PI-RADSv2 were included in the meta-analysis 16
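The counts in this selection process can be checked with simple arithmetic. The short Python sketch below uses only the numbers stated in the text; the variable names are illustrative.

```python
# Counts reported in the study selection paragraph.
identified = 287
duplicates = 46
potentially_eligible = 105

# Reasons for exclusion at full-text review, with their reported counts.
excluded = {
    "not in the field of interest": 80,
    "insufficient data to reconstruct 2 x 2 tables": 2,
    "shared study population with other studies": 2,
}

screened = identified - duplicates            # titles and abstracts screened
total_excluded = sum(excluded.values())       # full-text exclusions
included = potentially_eligible - total_excluded

print(screened, total_excluded, included)
```

The three derived values agree with the 241 screened records, 84 exclusions, and 21 included articles given in the text.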
Fig. 1
PRISMA flow diagram showing study selection process for meta-analysis. a Included original articles for qualitative and quantitative analyses are references 16
Patient characteristics are shown in Table 1. The size of the study population ranged from 49 to 456 patients, with the percentage of those with PCa ranging from 37% to 100%. The patients had a median age of 62–69.6 yr, median PSA of 3.97–15 ng/ml, and a Gleason score ranging from 5 to 10. Patients had already been diagnosed with PCa prior to MRI in all or some of the study populations in seven studies [16
Table 1. Patient characteristics
Characteristics of the studies are summarized in Table 2 . MRI was performed using 3-T scanners in 16 studies 16
a Blinded but aware that patients had biopsy-proven PCa.
b PI-RADSv2 scores were generated from reports based on PI-RADSv1 and simplified qualitative system.
c Epstein's criteria = Gleason pattern ≥4, or Gleason 3 + 3 disease with core length ≥50% and/or >2 cores positive.
d For transition zone, cutoff value = 4.
csPCa = clinically significant prostate cancer; GS = Gleason score; EPE = extraprostatic extension; MRGB = magnetic resonance imaging-guided biopsy; MRI = magnetic resonance imaging; MRI–TRUS = fusion of magnetic resonance imaging and transrectal ultrasound images; NR = not reported; PCa = prostate cancer; PI-RADSv2 = Prostate Imaging Reporting and Data System version 2; PZ = peripheral zone; RP = radical prostatectomy; STRUSGB = systematic transrectal ultrasound-guided biopsy; TTB = transperineal template biopsy; TZ = transition zone.
Table 2. Study characteristics
Overall, the quality of the studies was not considered high, mainly due to the patient selection domain (Fig. 2). Regarding the patient selection domain, there was generally a high risk of bias as all but four of the studies were retrospective in nature 16
The sensitivity and specificity of individual studies were 73–100% and 8–100%, respectively. The Q-test revealed that substantial heterogeneity was present (p < 0.001). The Higgins I² statistic demonstrated substantial heterogeneity in sensitivity (I² = 85.55%) and considerable heterogeneity in specificity (I² = 95.30%). The coupled forest plot of the sensitivity and specificity demonstrated the absence of a threshold effect (Fig. 3). The Spearman correlation coefficient between the sensitivity and the false-positive rate was 0.45 (95% confidence interval [CI] 0.023–0.738), also indicating the lack of a threshold effect.
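For readers unfamiliar with these heterogeneity measures, the following Python sketch shows how the Higgins I² statistic is conventionally derived from Cochran's Q and how a Spearman rank correlation between sensitivity and false-positive rate can be computed. It is a minimal illustration, not the authors' Stata or R code; the function names and the input values in the usage lines are hypothetical. A commonly cited rule of thumb treats a strong positive correlation (for example, above 0.6) as suggestive of a threshold effect.

```python
import math


def i_squared(q: float, df: int) -> float:
    """Higgins I² in percent: share of variability beyond chance, floored at 0."""
    if q <= 0:
        return 0.0
    return max(0.0, (q - df) / q) * 100.0


def average_ranks(values: list[float]) -> list[float]:
    """Rank values from 1..n, assigning tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def spearman(x: list[float], y: list[float]) -> float:
    """Spearman correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical inputs, not values from the included studies.
print(i_squared(q=100.0, df=20))
sens = [0.80, 0.85, 0.90, 0.95]
fpr = [0.10, 0.30, 0.20, 0.40]
print(spearman(sens, fpr))
```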
Fig. 3
Coupled forest plots of pooled sensitivity and specificity. Numbers are pooled estimates with 95% CI in parentheses. Corresponding heterogeneity statistics are provided at bottom right corners. Horizontal lines indicate 95% CIs. CI = confidence intervals.
For all 21 studies combined, the pooled sensitivity was 0.89 (95% CI 0.86–0.92) with specificity of 0.73 (95% CI 0.60–0.83; Fig. 3). In the HSROC curve, there was a large difference between the 95% confidence region and the 95% prediction region, thus indicating heterogeneity between the studies (Fig. 4). The area under the HSROC curve was 0.91 (95% CI 0.88–0.93). According to the Deeks’ funnel plot, the likelihood of publication bias was low, with a p value of 0.75 for the slope coefficient (Fig. 5).
Fig. 4
Hierarchical summary receiver operating characteristic curve of the diagnostic performance of PI-RADSv2 for detecting prostate cancer. HSROC = hierarchical summary receiver operating characteristic; PI-RADSv2 = Prostate Imaging Reporting and Data System version 2.
Fig. 5
Deeks’ funnel plot. A p value of 0.75 suggests that the likelihood of publication bias is low.
For the six studies that provided a head-to-head comparison between PI-RADSv1 and PI-RADSv2, PI-RADSv2 demonstrated higher pooled sensitivity of 0.95 (95% CI 0.85–0.98) compared with 0.88 (95% CI 0.80–0.93) for PI-RADSv1 (p = 0.04). However, the pooled specificity did not show a significant difference between the two versions of PI-RADS: 0.73 (95% CI 0.47–0.89) for v2 and 0.75 (95% CI 0.36–0.94) for v1 (p = 0.90).
As we found considerable heterogeneity among the included studies, meta-regression analyses were performed (Supplementary Table 1). Among several potential variables, proportion of patients with PCa, magnetic field strength, and reference standard were significant factors affecting the heterogeneity (p < 0.01 for all). However, among these three, only specificity according to the proportion of patients with PCa showed a statistically significant and clinically meaningful difference: 0.65 (95% CI 0.52–0.78) in studies with ≤50% of patients with PCa versus 0.86 (95% CI 0.75–0.97) in studies with >50% of patients with PCa. Otherwise, no clinically meaningful differences were seen: for magnet strength (3 vs 1.5 T), sensitivity of 0.90 (95% CI 0.86–0.94) versus 0.89 (95% CI 0.81–0.97, p = 0.03) and specificity of 0.73 (95% CI 0.59–0.86) versus 0.72 (95% CI 0.44–1.00, p = 0.81); for reference standard (radical prostatectomy vs biopsy), sensitivity of 0.89 (95% CI 0.83–0.95) versus 0.91 (95% CI 0.88–0.94, p < 0.01) and specificity of 0.65 (95% CI 0.37–0.94) versus 0.73 (95% CI 0.58–0.87, p = 0.48). Other variables, including the cutoff value, use of endorectal coil, and type of analysis (per patient vs per lesion), were not significant factors (p = 0.32–0.70).
As there were four studies that used both ≥3 and ≥4 as cutoff values [17
Based on the localization of PCa, the pooled sensitivity was 0.93 (95% CI 0.87–0.96) with specificity of 0.68 (95% CI 0.43–0.86) in seven studies analyzing PZ cancers [20
Studies including patients without previous biopsies yielded sensitivity of 0.82 (95% CI 0.72–0.90) and specificity of 0.75 (95% CI 0.65–0.83), whereas the diagnostic performance values were 0.87 (95% CI 0.80–0.92) and 0.71 (95% CI 0.42–0.89) in studies including patients with a history of previous biopsy.
In our meta-analysis, we assessed the diagnostic accuracy of PI-RADSv2 for detecting PCa. The pooled sensitivity and specificity of all 21 studies were 0.89 (95% CI 0.86–0.92) and 0.73 (95% CI 0.60–0.83), respectively. When comparing our data with the only two existing meta-analyses using mpMRI for detecting PCa, a trend toward higher sensitivity and lower specificity can be inferred. In the study by de Rooij et al [37]
Considering that one of the main intentions for the generation of PI-RADS was to standardize reporting of mpMRI in order to decrease variability and bring about widespread acceptance and implementation in daily practice, it was promising to find that nearly all (20 of 21) studies used PI-RADSv2 strictly according to published guidelines [11]
In the current study, subgroup analyses were performed to account for differences in outcomes (any cancer vs clinically significant cancer). There was no significant difference between the two outcomes, irrespective of whether a cutoff of ≥3 or ≥4 was used. However, the definition of clinically significant cancer differed among the 13 studies. Only three studies defined csPCa strictly according to the PI-RADSv2 guidelines (Gleason score ≥7 [including 3 + 4], volume >0.5 ml, or extraprostatic extension) [11]
In this meta-analysis, we looked into the technical aspects of MRI. Meta-regression analyses revealed that the use of endorectal coil was not a statistically significant factor. Furthermore, although magnet strength showed a statistically significant difference between 3 and 1.5 T, this difference did not prove to be clinically meaningful (sensitivity of 0.90 vs 0.89, respectively; p = 0.03). Although there had been debate over these two issues in the past, both 3 and 1.5 T are now well established, and the overall benefit of using an endorectal coil is not evident [38
Regarding the methods of analysis in the studies, there was considerable variation in reference standard and type of analysis. Radical prostatectomy was the reference standard in five studies, while the majority were based on a combination of systematic and targeted biopsies. The possibility of PCa despite negative biopsy results in the latter group should be kept in mind. In addition, approximately half of the studies reported outcomes in a per-patient manner (n = 11) and the other half in a per-lesion manner (n = 10). Per-lesion analysis is known to take into account the performance of localizing the disease; however, this was not shown to be a significant factor in the meta-regression analysis.
Our meta-analysis had some limitations. Nearly all studies were retrospective in study design, resulting in a high risk of bias for patient selection. It is possible that pooling data from predominantly retrospective studies may have led to increased diagnostic sensitivity [40]
PI-RADSv2 shows good performance for the detection of PCa with pooled sensitivity of 0.89 and specificity of 0.73. PI-RADSv2 has higher pooled sensitivity compared with PI-RADSv1 without significantly different specificity.
Author contributions: Sang Youn Kim had full access to all the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis.
Study concept and design: Woo, Suh, S.Y. Kim.
Acquisition of data: Woo, Suh, S.Y. Kim.
Analysis and interpretation of data: Woo, Suh, S.Y. Kim.
Drafting of the manuscript: Woo.
Critical revision of the manuscript for important intellectual content: Suh, S.Y. Kim, Cho, S.H. Kim.
Statistical analysis: Suh.
Obtaining funding: None.
Administrative, technical, or material support: None.
Supervision: S.Y. Kim, Cho, S.H. Kim.
Other: None.
Financial disclosures: Sang Youn Kim certifies that all conflicts of interest, including specific financial interests and relationships and affiliations relevant to the subject matter or materials discussed in the manuscript (eg, employment/affiliation, grants or funding, consultancies, honoraria, stock ownership or options, expert testimony, royalties, or patents filed, received, or pending), are the following: None.
Funding/Support and role of the sponsor: None.
Prostate cancer (PCa) is the second leading cause of cancer-related mortality in Western men [1]
To bring standardization to the evaluation and reporting of mpMRI of the prostate, the European Society of Urogenital Radiology (ESUR) published a guideline termed Prostate Imaging Reporting and Data System (PI-RADS) in 2012 [5]
To address these issues, the ESUR and American College of Radiology recently released the updated PI-RADS version 2 (PI-RADSv2) [11]
Since then, several studies assessing the value of PI-RADSv2 have been published. However, the diagnostic performance of this new scoring system has not been evaluated systematically. Therefore, the purpose of our study was to assess the diagnostic performance of PI-RADSv2 for the detection of PCa. In addition, we aimed to compare the diagnostic performance of PI-RADSv1 and PI-RADSv2 in studies available for head-to-head comparison.
This meta-analysis was performed and written according to the preferred reporting items for systematic reviews and meta-analyses (PRISMA) guidelines [12]
A computerized search of MEDLINE and EMBASE up to December 7, 2016, was performed in order to identify studies evaluating the diagnostic performance of PI-RADSv2 for the detection of PCa. The search query combined synonyms for PCa, MRI, and PI-RADS as follows: (prostate cancer OR prostatic cancer OR prostate neoplasm OR prostatic neoplasm OR prostate tumor OR prostatic tumor OR prostate carcinoma OR prostatic carcinoma OR PCa) AND (magnetic resonance imaging OR MRI OR MR) AND (prostate imaging reporting and data system OR pi-rads OR pi rads OR pirads). Bibliographies of identified articles were also screened to expand the scope of search. Our search was limited to publications in English.
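The structure of this query (three groups of synonyms, joined by OR within a group and by AND between groups) can be sketched in a few lines of Python. The terms are copied from the text; the helper name `or_group` is an illustrative assumption, and database-specific field tags or syntax are deliberately left out.

```python
# Synonym groups as listed in the search strategy.
pca_terms = [
    "prostate cancer", "prostatic cancer",
    "prostate neoplasm", "prostatic neoplasm",
    "prostate tumor", "prostatic tumor",
    "prostate carcinoma", "prostatic carcinoma",
    "PCa",
]
mri_terms = ["magnetic resonance imaging", "MRI", "MR"]
pirads_terms = [
    "prostate imaging reporting and data system",
    "pi-rads", "pi rads", "pirads",
]


def or_group(terms: list[str]) -> str:
    """Join synonyms with OR and wrap the group in parentheses."""
    return "(" + " OR ".join(terms) + ")"


# Combine the three concept groups with AND.
query = " AND ".join(or_group(g) for g in (pca_terms, mri_terms, pirads_terms))
print(query)
```

Keeping each concept in its own list makes it straightforward to add or remove a synonym without disturbing the boolean structure.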
Studies were included if they satisfied all the following requirements according to the PICOS criteria [12]
Studies were excluded if any of the following criteria were met: (1) studies involving <10 patients; (2) review articles, guidelines, consensus statements, letters, editorials, and conference abstracts; (3) studies using only PI-RADSv1 for the evaluation of mpMRI of the prostate; (4) studies focusing on topics other than using the PI-RADSv2 system for diagnosing PCa (ie, staging and prediction of biochemical recurrence); and (5) studies with overlapping patient population.
Two reviewers (S.W. and C.H.S., with 3 yr of experience in performing systematic reviews and meta-analyses) independently evaluated the eligibility of the selected studies from the literature. Disagreements, if present between the two reviewers, were resolved by consensus via discussion with a third reviewer (S.Y.K.).
We extracted the following data regarding study design and results from the selected studies using a standardized form:
Study characteristics—authors, year of publication, country and institution of origin, duration of patient recruitment, and study design (prospective vs retrospective and consecutive or not)
Demographic and clinical characteristics—sample size, number of patients with PCa, patient age, prostate-specific antigen (PSA) level and Gleason score, number of previous biopsies, and PCa diagnosis prior to mpMRI
Technical characteristics of mpMRI—scanner model and manufacturer, magnetic field strength (1.5 vs 3 T), coil type (endorectal vs pelvic phased array), and specific sequences used (T2WI, DWI, DCE, or MR spectroscopy)
Interpretation of mpMRI—number of reviewers and experience in prostate mpMRI, independent or consensus reading, and blinding to clinicopathological information
Reference standard—type of reference standard (radical prostatectomy, targeted biopsy, or systematic biopsy), interval between MRI and pathology, outcomes assessed (any PCa vs csPCa), definition of csPCa (studies assessing “clinically significant”, “aggressive”, or “high-grade” PCa were all considered to assess csPCa; however, only studies that used the definition as provided by the PI-RADSv2 guideline [Gleason score >7 [3 + 4], volume >0.5 ml, or extraprostatic extension] were considered not to have concern for applicability), separate analysis for the PZ and TZ, and type of analysis (per patient vs per lesion)
Diagnostic performance of PI-RADSv2 including criteria or cutoff values (in case of multiple readers, the results of the most experienced reader were extracted for this meta-analysis)
The methodological quality of the included studies was assessed using tailored questionnaires and criteria provided by Quality Assessment of Diagnostic Accuracy Studies-2 [13]
The diagnostic performance of PI-RADSv2 for the detection of PCa was the primary outcome for this meta-analysis. In addition, a comparison between the diagnostic performance of PI-RADSv2 and that of PI-RADSv1 using studies that reported head-to-head comparison data of the two PI-RADS versions was considered a secondary outcome.
Pooled estimates of sensitivity and specificity were calculated using hierarchical logistic regression modeling including bivariate modeling and hierarchical summary receiver operating characteristic (HSROC) modeling [14]
We performed meta-regression analyses to investigate the cause of heterogeneity. The following covariates were considered for the bivariate model: (1) proportion of patients with PCa (>50% vs ≤50%), (2) magnet strength of MRI (3 vs 1.5 T), (3) use of endorectal coil, (4) cutoff value (≥4 vs ≥3), (5) reference standard (radical prostatectomy vs biopsy), and (6) type of analysis (per patient vs per lesion). In addition, multiple subgroup analyses were performed for cutoff value, outcome, and previous biopsy history to assess various clinical settings: (1) a cutoff value of ≥4 for all studies, (2) a cutoff value of ≥3 for all studies, (3) a cutoff value of ≥4 for determining any PCa, (4) a cutoff value of ≥3 for determining any PCa, (5) a cutoff value of ≥4 for determining csPCa, (6) a cutoff value of ≥3 for determining csPCa, (7) a cutoff value of ≥4 in studies using per-patient analysis, (8) a cutoff value of ≥4 in studies using per-lesion analysis, (9) studies analyzing PZ PCa, (10) studies analyzing TZ PCa, (11) patients without previous biopsies, and (12) patients with previous biopsies. The “metandi” and “midas” modules in Stata 10.0 (StataCorp LP, College Station, TX, USA) and “mada” package in R software version 3.2.1 (R Foundation for Statistical Computing, Vienna, Austria) were used for statistical analyses, with p < 0.05 signifying statistical significance.
A systematic literature search initially identified 287 articles. After removing 46 duplicates, screening of the 241 titles and abstracts yielded 105 potentially eligible articles. Full-text reviews were performed, and 84 studies were excluded
for the following reasons: not in the field of interest ( n = 80, including 68 studies that used only PI-RADSv1), insufficient data to reconstruct 2 × 2 tables ( n = 2), and shared study population with other studies ( n = 2). Ultimately, 21 original articles including a total of 3857 patients assessing the diagnostic performance of PI-RADSv2 were included in the meta-analysis 16
Fig. 1
PRISMA flow diagram showing study selection process for meta-analysis. a Included original articles for qualitative and quantitative analyses are references 16
Patient characteristics are shown in Table 1 . The size of the study population ranged from 49 to 456 patients, with the percentage of
those with PCa ranging from 37% to 100%. The patients had a median age of 62–69.6 yr, median PSA of 3.97–15 ng/ml, and a Gleason score ranging from 5 to 10. Patients had already been diagnosed with PCa prior to MRI in all or some of the study
populations in seven studies [16
Table 1Patient characteristics
Characteristics of the studies are summarized in Table 2 . MRI was performed using 3-T scanners in 16 studies 16
a Blinded but aware that patients had biopsy-proven PCa.
b PI-RADSv2 scores were generated from reports based on PI-RADSv1 and simplified qualitative system.
c Epstein's criteria = Gleason pattern ≥4, or Gleason 3 + 3 disease with core length ≥50% and/or >2 cores positive.
d For transition zone, cutoff value = 4.
csPCa = clinically significant prostate cancer; GS = Gleason score; EPE = extraprostatic extension; MRGB = magnetic resonance imaging-guided biopsy; MRI = magnetic resonance imaging; MRI–TRUS = fusion of magnetic resonance imaging and transrectal ultrasound images; NR = not reported; PCa = prostate cancer; PI-RADSv2 = Prostate Imaging Reporting and Data System version 2; PZ = peripheral zone; RP = radical prostatectomy; STRUSGB = systematic transrectal ultrasound-guided biopsy; TTB = transperineal template biopsy; TZ = transition zone.Table 2Study characteristics
Overall, the quality of the studies was not considered high, mainly due to the patient selection domain ( Fig. 2 ). Regarding the patient
selection domain, there was generally a high risk of bias as all but four of the studies were retrospective in nature 16
The sensitivity and specificity of individual studies were 73–100% and 8–100%, respectively. The Q-test revealed that substantial heterogeneity was present ( p < 0.001). The Higgins I 2 statistics demonstrated substantial heterogeneity in terms of the sensitivity ( I 2 = 85.55%) and considerable heterogeneity in terms of the specificity ( I 2 = 95.30%). The coupled forest plot of the sensitivity and specificity demonstrated the absence of a threshold effect ( Fig. 3 ). The Spearman correlation coefficient between the sensitivity and the false-positive rate was 0.45 (95% confidence interval [CI] 0.023–0.738), also indicating the lack of a threshold effect.
Fig. 3
Coupled forest plots of pooled sensitivity and specificity. Numbers are pooled estimates with 95% CI in parentheses. Corresponding heterogeneity statistics are provided at bottom right corners. Horizontal lines indicate 95% CIs. CI = confidence intervals.
For all 21 studies combined, the pooled sensitivity was 0.89 (95% CI 0.86–0.92) with specificity of 0.73 (95% CI 0.60–0.83; Fig. 3 ). In the HSROC curve, there was a large difference between the 95% confidence region and the 95% prediction region, thus indicating heterogeneity between the studies ( Fig. 4 ). The area under the HSROC curve was 0.91 (95% CI 0.88–0.93). According to the Deeks’ funnel plot, the likelihood of publication bias was low, with a p value of 0.75 for the slope coefficient ( Fig. 5 ).
Fig. 4
Hierarchical summary receiver operating characteristic curve of the diagnostic performance of PI-RADSv2 for detecting prostate cancer. HSROC = hierarchical summary receiver operating characteristic; PI-RADSv2 = Prostate Imaging Reporting and Data System version 2.
Fig. 5
Deeks’ funnel plot. A p value of 0.75 suggests that the likelihood of publication bias is low.
For the six studies that provided a head-to-head comparison between PI-RADSv1 and PI-RADSv2, PI-RADSv2 demonstrated higher pooled sensitivity of 0.95 (95% CI 0.85–0.98) compared with 0.88 (95% CI 0.80–0.93) for PI-RADSv1 ( p = 0.04). However, the pooled specificity did not show a significant difference between the two versions of PI-RADS: 0.73 (95% CI 0.47–0.89) for v2 and 0.75 (95% CI 0.36–0.94) for v1 ( p = 0.90).
As we found a considerable heterogeneity among the included studies, meta-regression analyses were performed (Supplementary Table 1 ). Among several potential variables, proportion of patients with PCa, magnetic field strength, and reference standard were significant factors affecting the heterogeneity ( p < 0.01 for all). However, among these three, only specificity according to the proportion of patients with PCa showed a statistically significant and clinically meaningful difference: 0.65 (95% CI 0.52–0.78) in studies ≤50% of patients with PCa versus 0.86 (0.75–0.97) in studies with >50% of patients with PCa. Otherwise, no clinically meaningful differences were seen: for magnet strength (3 vs 1.5 T), sensitivity of 0.90 (95% CI 0.86–0.94) versus 0.89 (95% CI 0.81–0.97, p = 0.03) and specificity of 0.73 (95% CI 0.59–0.86) versus 0.72 (95% CI 0.44–1.00, p = 0.81); for reference standard (radical prostatectomy vs biopsy), sensitivity of 0.89 (95% CI 0.83–0.95) versus 0.91 (95% CI 0.88–0.94, p < 0.01) and specificity of 0.65 (95% CI 0.37–0.94) versus 0.73 (95% CI 0.58–0.87, p = 0.48). Other variables, including the cutoff value, use of endorectal coil, and type of analysis (per patient vs per lesion), were not significant factors ( p = 0.32–0.70).
As there were four studies that used both ≥3 and ≥4 as cutoff values [17
Based on the localization of PCa, the pooled sensitivity was 0.93 (95% CI 0.87–0.96) with specificity of 0.68 (95% CI 0.43–0.86) in seven studies analyzing PZ cancers [20
Studies including patients without previous biopsies yielded sensitivity of 0.82 (95% CI 0.72–0.90) and specificity of 0.75 (95% CI 0.65–0.83), whereas the diagnostic performance values were 0.87 (95% CI 0.80–0.92) and 0.71 (95% CI 0.42–0.89) in studies including patients with a history of previous biopsy.
In our meta-analysis, we assessed the diagnostic accuracy of PI-RADSv2 for detecting PCa. The pooled sensitivity and specificity of all 21 studies were 0.89 (95% CI 0.86–0.92) and 0.73 (95% CI 0.60–0.83), respectively. When comparing our data
with the only two existing meta-analyses using mpMRI for detecting PCa, a trend toward higher sensitivity and lower specificity can be inferred. In the study by de Rooij et al [37]
Considering that one of the main intentions for the generation of PI-RADS was to standardize reporting of mpMRI in order to decrease variability and bring about widespread acceptance and implementation in daily practice, it was promising to find
that nearly all (20 of 21) studies used PI-RADSv2 strictly according to published guidelines [11]
In the current study, subgroup analyses were performed to account for differences in outcomes (any cancer vs clinically significant cancer). There was no significant difference for using either outcome irrespective of whether the criteria of ≥3
or ≥4 were used. However, the definition of clinically significant cancer was different among the 13 studies. Only three studies defined csPCa strictly according to the PI-RADSv2 guidelines (Gleason score >7 [3 + 4], volume >0.5 ml,
or extraprostatic extension) [11]
In this meta-analysis, we looked into the technical aspects of MRI. Meta-regression analyses revealed that the use of endorectal coil was not a statistically significant factor. Furthermore, although magnet strength showed statistically significant
differences between 3 and 1.5 T, this did not reveal to be clinically meaningful (sensitivity of 0.90 vs 0.89, p = 0.03, respectively). Although there had been debate over these two issues in the past, both 3 and 1.5 T are now well
established, and the overall benefit of using an endorectal coil is not evident [38
Regarding the methods of analysis in the studies, there was significant heterogeneity regarding reference standard and type of analysis. Radical prostatectomy was the reference standard in five studies, while the majority were based on a combination of systematic and targeted biopsies. The possibility of PCa despite negative biopsy results in the latter group should be kept in mind. In addition, approximately half of the studies each reported outcomes in a per-patient ( n = 11) and per-lesion ( n = 10) Manner. Per-lesion analysis is known to take into account the performance of localizing the disease; however, this was not shown to be a significant factor in the meta-regression analysis.
Our meta-analysis had some limitations. Nearly all studies were retrospective in study design, resulting in a high risk of bias for patient selection. It is possible that pooling data from predominantly retrospective studies may have led to increased
diagnostic sensitivity [40]
PI-RADSv2 shows good performance for the detection of PCa with pooled sensitivity of 0.89 and specificity of 0.73. PI-RADSv2 has higher pooled sensitivity compared with PI-RADSv1 without significantly different specificity.
Author contributions: Sang Youn Kim had full access to all the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis.
Study concept and design: Woo, Suh, S.Y. Kim.
Acquisition of data: Woo, Suh, S.Y. Kim.
Analysis and interpretation of data: Woo, Suh, S.Y. Kim.
Drafting of the manuscript: Woo.
Critical revision of the manuscript for important intellectual content: Suh, S.Y. Kim, Cho, S.H. Kim.
Statistical analysis: Suh.
Obtaining funding: None.
Administrative, technical, or material support: None.
Supervision: S.Y. Kim, Cho, S.H. Kim.
Other: None.
Financial disclosures: Sang Youn Kim certifies that all conflicts of interest, including specific financial interests and relationships and affiliations relevant to the subject matter or materials discussed in the manuscript (eg, employment/affiliation, grants or funding, consultancies, honoraria, stock ownership or options, expert testimony, royalties, or patents filed, received, or pending), are the following: None.
Funding/Support and role of the sponsor: None.
Prostate cancer (PCa) is the second leading cause of cancer-related mortality in Western men [1]
To bring standardization to the evaluation and reporting of mpMRI of the prostate, the European Society of Urogenital Radiology (ESUR) published a guideline termed Prostate Imaging Reporting and Data System (PI-RADS) in 2012 [5]
To address these issues, the ESUR and American College of Radiology recently released the updated PI-RADS version 2 (PI-RADSv2) [11]
Since then, several studies assessing the value of PI-RADSv2 have been published. However, the diagnostic performance of this new scoring system has not been evaluated systematically. Therefore, the purpose of our study was to assess the diagnostic performance of PI-RADSv2 for the detection of PCa. In addition, we aimed to compare the diagnostic performance of PI-RADSv1 and PI-RADSv2 in studies available for head-to-head comparison.
This meta-analysis was performed and written according to the preferred reporting items for systematic reviews and meta-analyses (PRISMA) guidelines [12]
A computerized search of MEDLINE and EMBASE up to December 7, 2016, was performed in order to identify studies evaluating the diagnostic performance of PI-RADSv2 for the detection of PCa. The search query combined synonyms for PCa, MRI, and PI-RADS as follows: (prostate cancer OR prostatic cancer OR prostate neoplasm OR prostatic neoplasm OR prostate tumor OR prostatic tumor OR prostate carcinoma OR prostatic carcinoma OR PCa) AND (magnetic resonance imaging OR MRI OR MR) AND (prostate imaging reporting and data system OR pi-rads OR pi rads OR pirads). Bibliographies of identified articles were also screened to expand the scope of search. Our search was limited to publications in English.
Studies were included if they satisfied all the following requirements according to the PICOS criteria [12]
Studies were excluded if any of the following criteria were met: (1) studies involving <10 patients; (2) review articles, guidelines, consensus statements, letters, editorials, and conference abstracts; (3) studies using only PI-RADSv1 for the evaluation of mpMRI of the prostate; (4) studies focusing on topics other than using the PI-RADSv2 system for diagnosing PCa (ie, staging and prediction of biochemical recurrence); and (5) studies with overlapping patient population.
Two reviewers (S.W. and C.H.S., with 3 yr of experience in performing systematic reviews and meta-analyses) independently evaluated the eligibility of the selected studies from the literature. Disagreements, if present between the two reviewers, were resolved by consensus via discussion with a third reviewer (S.Y.K.).
We extracted the following data regarding study design and results from the selected studies using a standardized form:
Study characteristics—authors, year of publication, country and institution of origin, duration of patient recruitment, and study design (prospective vs retrospective and consecutive or not)
Demographic and clinical characteristics—sample size, number of patients with PCa, patient age, prostate-specific antigen (PSA) level and Gleason score, number of previous biopsies, and PCa diagnosis prior to mpMRI
Technical characteristics of mpMRI—scanner model and manufacturer, magnetic field strength (1.5 vs 3 T), coil type (endorectal vs pelvic phased array), and specific sequences used (T2WI, DWI, DCE, or MR spectroscopy)
Interpretation of mpMRI—number of reviewers and experience in prostate mpMRI, independent or consensus reading, and blinding to clinicopathological information
Reference standard—type of reference standard (radical prostatectomy, targeted biopsy, or systematic biopsy), interval between MRI and pathology, outcomes assessed (any PCa vs csPCa), definition of csPCa (studies assessing “clinically significant”, “aggressive”, or “high-grade” PCa were all considered to assess csPCa; however, only studies that used the definition provided by the PI-RADSv2 guideline [Gleason score ≥7 (including 3 + 4), volume ≥0.5 ml, or extraprostatic extension] were considered not to raise concern for applicability), separate analysis for the PZ and TZ, and type of analysis (per patient vs per lesion)
Diagnostic performance of PI-RADSv2 including criteria or cutoff values (in case of multiple readers, the results of the most experienced reader were extracted for this meta-analysis)
The methodological quality of the included studies was assessed using tailored questionnaires and criteria provided by Quality Assessment of Diagnostic Accuracy Studies-2 [13]
The diagnostic performance of PI-RADSv2 for the detection of PCa was the primary outcome for this meta-analysis. In addition, a comparison between the diagnostic performance of PI-RADSv2 and that of PI-RADSv1 using studies that reported head-to-head comparison data of the two PI-RADS versions was considered a secondary outcome.
Pooled estimates of sensitivity and specificity were calculated using hierarchical logistic regression modeling including bivariate modeling and hierarchical summary receiver operating characteristic (HSROC) modeling [14]
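As a rough illustration of the inputs to such a model, the sketch below derives sensitivity and specificity from a single 2 × 2 table and pools proportions on the logit scale with inverse-variance weights. It is a deliberately simplified, univariate fixed-effect stand-in and not the bivariate or HSROC model used in this study. The function names and the counts in the usage note are hypothetical.

```python
import math


def sens_spec(tp, fp, fn, tn):
    """Return (sensitivity, specificity) for one 2 x 2 table."""
    return tp / (tp + fn), tn / (tn + fp)


def logit_with_variance(events, total):
    """Logit of a proportion and its approximate variance.

    A 0.5 continuity correction is added to both cells so that
    studies with 0% or 100% do not produce infinite logits.
    """
    a = events + 0.5
    b = total - events + 0.5
    return math.log(a / b), 1.0 / a + 1.0 / b


def pool_logit(pairs):
    """Inverse-variance fixed-effect pooling of proportions.

    pairs is a list of (events, total) tuples, one per study.
    Returns the pooled proportion back-transformed from the logit scale.
    """
    num = 0.0
    den = 0.0
    for events, total in pairs:
        y, v = logit_with_variance(events, total)
        num += y / v
        den += 1.0 / v
    pooled_logit = num / den
    return 1.0 / (1.0 + math.exp(-pooled_logit))
```

For a hypothetical study with 90 true positives, 20 false positives, 10 false negatives, and 80 true negatives, `sens_spec(90, 20, 10, 80)` gives a sensitivity of 0.9 and a specificity of 0.8. Unlike this sketch, a bivariate model estimates sensitivity and specificity jointly and allows for between-study variation.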
We performed meta-regression analyses to investigate the cause of heterogeneity. The following covariates were considered for the bivariate model: (1) proportion of patients with PCa (>50% vs ≤50%), (2) magnet strength of MRI (3 vs 1.5 T), (3) use of endorectal coil, (4) cutoff value (≥4 vs ≥3), (5) reference standard (radical prostatectomy vs biopsy), and (6) type of analysis (per patient vs per lesion). In addition, multiple subgroup analyses were performed for cutoff value, outcome, and previous biopsy history to assess various clinical settings: (1) a cutoff value of ≥4 for all studies, (2) a cutoff value of ≥3 for all studies, (3) a cutoff value of ≥4 for determining any PCa, (4) a cutoff value of ≥3 for determining any PCa, (5) a cutoff value of ≥4 for determining csPCa, (6) a cutoff value of ≥3 for determining csPCa, (7) a cutoff value of ≥4 in studies using per-patient analysis, (8) a cutoff value of ≥4 in studies using per-lesion analysis, (9) studies analyzing PZ PCa, (10) studies analyzing TZ PCa, (11) patients without previous biopsies, and (12) patients with previous biopsies. The “metandi” and “midas” modules in Stata 10.0 (StataCorp LP, College Station, TX, USA) and “mada” package in R software version 3.2.1 (R Foundation for Statistical Computing, Vienna, Austria) were used for statistical analyses, with p < 0.05 signifying statistical significance.
A systematic literature search initially identified 287 articles. After removing 46 duplicates, screening of the 241 titles and abstracts yielded 105 potentially eligible articles. Full-text reviews were performed, and 84 studies were excluded for the following reasons: not in the field of interest (n = 80, including 68 studies that used only PI-RADSv1), insufficient data to reconstruct 2 × 2 tables (n = 2), and shared study population with other studies (n = 2). Ultimately, 21 original articles including a total of 3857 patients assessing the diagnostic performance of PI-RADSv2 were included in the meta-analysis 16
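The counts in this selection process can be checked with simple arithmetic. The short sketch below uses only the numbers reported above; the variable names are illustrative.

```python
# Counts reported in the study selection paragraph.
identified = 287
duplicates = 46
screened = identified - duplicates          # titles and abstracts screened

full_text_reviewed = 105
exclusions = {
    "not in the field of interest": 80,
    "insufficient data to reconstruct 2 x 2 tables": 2,
    "shared study population": 2,
}
excluded = sum(exclusions.values())         # full-text exclusions
included = full_text_reviewed - excluded    # studies in the meta-analysis

print(screened, excluded, included)
```

The three derived values agree with the 241 screened records, 84 exclusions, and 21 included studies stated in the text.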
Fig. 1
PRISMA flow diagram showing study selection process for meta-analysis. a Included original articles for qualitative and quantitative analyses are references 16
Patient characteristics are shown in Table 1. The size of the study population ranged from 49 to 456 patients, with the percentage of those with PCa ranging from 37% to 100%. The patients had a median age of 62–69.6 yr, median PSA of 3.97–15 ng/ml, and a Gleason score ranging from 5 to 10. Patients had already been diagnosed with PCa prior to MRI in all or some of the study populations in seven studies [16
Table 1. Patient characteristics
Characteristics of the studies are summarized in Table 2 . MRI was performed using 3-T scanners in 16 studies 16
a Blinded but aware that patients had biopsy-proven PCa.
b PI-RADSv2 scores were generated from reports based on PI-RADSv1 and simplified qualitative system.
c Epstein's criteria = Gleason pattern ≥4, or Gleason 3 + 3 disease with core length ≥50% and/or >2 cores positive.
d For transition zone, cutoff value = 4.
csPCa = clinically significant prostate cancer; GS = Gleason score; EPE = extraprostatic extension; MRGB = magnetic resonance imaging-guided biopsy; MRI = magnetic resonance imaging; MRI–TRUS = fusion of magnetic resonance imaging and transrectal ultrasound images; NR = not reported; PCa = prostate cancer; PI-RADSv2 = Prostate Imaging Reporting and Data System version 2; PZ = peripheral zone; RP = radical prostatectomy; STRUSGB = systematic transrectal ultrasound-guided biopsy; TTB = transperineal template biopsy; TZ = transition zone.
Table 2. Study characteristics
Overall, the quality of the studies was not considered high, mainly due to the patient selection domain (Fig. 2). Regarding the patient selection domain, there was generally a high risk of bias as all but four of the studies were retrospective in nature 16
The sensitivity and specificity of individual studies were 73–100% and 8–100%, respectively. The Q-test revealed that substantial heterogeneity was present (p < 0.001). The Higgins I² statistic demonstrated substantial heterogeneity in terms of sensitivity (I² = 85.55%) and considerable heterogeneity in terms of specificity (I² = 95.30%). The coupled forest plot of the sensitivity and specificity demonstrated the absence of a threshold effect (Fig. 3). The Spearman correlation coefficient between the sensitivity and the false-positive rate was 0.45 (95% confidence interval [CI] 0.023–0.738), also indicating the lack of a threshold effect.
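To make the heterogeneity statistics concrete, the following sketch computes Cochran's Q and the Higgins I² statistic from per-study effect estimates and their variances, using the standard definition I² = 100% × (Q − df)/Q, truncated at zero. The function name and the example inputs are hypothetical, and the sketch is not the Stata or R routine used in this study.

```python
def cochran_q_and_i2(effects, variances):
    """Return (Q, I2 in percent) for per-study effects and variances.

    Effects would typically be on a transformed scale, for example
    logit sensitivity. Weights are the inverse variances.
    """
    weights = [1.0 / v for v in variances]
    total_weight = sum(weights)
    pooled = sum(w * e for w, e in zip(weights, effects)) / total_weight
    # Q is the weighted sum of squared deviations from the pooled effect.
    q = sum(w * (e - pooled) ** 2 for w, e in zip(weights, effects))
    df = len(effects) - 1
    if q <= 0.0:
        return q, 0.0
    i2 = max(0.0, (q - df) / q) * 100.0
    return q, i2
```

With two hypothetical studies that have effects 0 and 2 and a variance of 0.1 each, Q is 20 on 1 degree of freedom and I² is 95%, a level that would be described as considerable heterogeneity.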
Fig. 3
Coupled forest plots of pooled sensitivity and specificity. Numbers are pooled estimates with 95% CI in parentheses. Corresponding heterogeneity statistics are provided at bottom right corners. Horizontal lines indicate 95% CIs. CI = confidence interval.
For all 21 studies combined, the pooled sensitivity was 0.89 (95% CI 0.86–0.92) with specificity of 0.73 (95% CI 0.60–0.83; Fig. 3 ). In the HSROC curve, there was a large difference between the 95% confidence region and the 95% prediction region, thus indicating heterogeneity between the studies ( Fig. 4 ). The area under the HSROC curve was 0.91 (95% CI 0.88–0.93). According to the Deeks’ funnel plot, the likelihood of publication bias was low, with a p value of 0.75 for the slope coefficient ( Fig. 5 ).
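The Deeks' test behind the funnel plot can be sketched as a weighted linear regression of the log diagnostic odds ratio on the inverse square root of the effective sample size, with the effective sample size as the weight. The code below is a minimal illustration under that formulation. The function names and the example tables are hypothetical, a 0.5 continuity correction is assumed, and the p value for the slope is not computed because it requires a t distribution that the Python standard library does not provide.

```python
import math


def effective_sample_size(n_diseased, n_nondiseased):
    """ESS = 4 * n1 * n2 / (n1 + n2)."""
    return 4.0 * n_diseased * n_nondiseased / (n_diseased + n_nondiseased)


def log_dor(tp, fp, fn, tn):
    """Log diagnostic odds ratio with a 0.5 continuity correction."""
    return math.log(((tp + 0.5) * (tn + 0.5)) / ((fp + 0.5) * (fn + 0.5)))


def weighted_line(x, y, w):
    """Weighted least squares fit of y on x.

    Returns (slope, intercept, standard error of the slope).
    Requires at least three points with non-identical x values.
    """
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    intercept = ybar - slope * xbar
    rss = sum(wi * (yi - intercept - slope * xi) ** 2
              for wi, xi, yi in zip(w, x, y))
    se_slope = math.sqrt(rss / (len(x) - 2) / sxx)
    return slope, intercept, se_slope


def deeks_regression(tables):
    """tables is a list of (tp, fp, fn, tn) tuples, one per study."""
    x, y, w = [], [], []
    for tp, fp, fn, tn in tables:
        ess = effective_sample_size(tp + fn, fp + tn)
        x.append(1.0 / math.sqrt(ess))
        y.append(log_dor(tp, fp, fn, tn))
        w.append(ess)
    return weighted_line(x, y, w)
```

Dividing the slope by its standard error gives a t statistic on n − 2 degrees of freedom. A slope that is not significantly different from zero, as with the p value of 0.75 reported here, suggests little funnel plot asymmetry.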
Fig. 4
Hierarchical summary receiver operating characteristic curve of the diagnostic performance of PI-RADSv2 for detecting prostate cancer. HSROC = hierarchical summary receiver operating characteristic; PI-RADSv2 = Prostate Imaging Reporting and Data System version 2.
Fig. 5
Deeks’ funnel plot. A p value of 0.75 suggests that the likelihood of publication bias is low.
For the six studies that provided a head-to-head comparison between PI-RADSv1 and PI-RADSv2, PI-RADSv2 demonstrated higher pooled sensitivity of 0.95 (95% CI 0.85–0.98) compared with 0.88 (95% CI 0.80–0.93) for PI-RADSv1 ( p = 0.04). However, the pooled specificity did not show a significant difference between the two versions of PI-RADS: 0.73 (95% CI 0.47–0.89) for v2 and 0.75 (95% CI 0.36–0.94) for v1 ( p = 0.90).
As we found considerable heterogeneity among the included studies, meta-regression analyses were performed (Supplementary Table 1). Among several potential variables, proportion of patients with PCa, magnetic field strength, and reference standard were significant factors affecting the heterogeneity (p < 0.01 for all). However, among these three, only specificity according to the proportion of patients with PCa showed a statistically significant and clinically meaningful difference: 0.65 (95% CI 0.52–0.78) in studies with ≤50% of patients with PCa versus 0.86 (95% CI 0.75–0.97) in studies with >50% of patients with PCa. Otherwise, no clinically meaningful differences were seen: for magnet strength (3 vs 1.5 T), sensitivity of 0.90 (95% CI 0.86–0.94) versus 0.89 (95% CI 0.81–0.97, p = 0.03) and specificity of 0.73 (95% CI 0.59–0.86) versus 0.72 (95% CI 0.44–1.00, p = 0.81); for reference standard (radical prostatectomy vs biopsy), sensitivity of 0.89 (95% CI 0.83–0.95) versus 0.91 (95% CI 0.88–0.94, p < 0.01) and specificity of 0.65 (95% CI 0.37–0.94) versus 0.73 (95% CI 0.58–0.87, p = 0.48). Other variables, including the cutoff value, use of endorectal coil, and type of analysis (per patient vs per lesion), were not significant factors (p = 0.32–0.70).
As there were four studies that used both ≥3 and ≥4 as cutoff values [17
Based on the localization of PCa, the pooled sensitivity was 0.93 (95% CI 0.87–0.96) with specificity of 0.68 (95% CI 0.43–0.86) in seven studies analyzing PZ cancers [20
Studies including patients without previous biopsies yielded sensitivity of 0.82 (95% CI 0.72–0.90) and specificity of 0.75 (95% CI 0.65–0.83), whereas the diagnostic performance values were 0.87 (95% CI 0.80–0.92) and 0.71 (95% CI 0.42–0.89) in studies including patients with a history of previous biopsy.
In our meta-analysis, we assessed the diagnostic accuracy of PI-RADSv2 for detecting PCa. The pooled sensitivity and specificity of all 21 studies were 0.89 (95% CI 0.86–0.92) and 0.73 (95% CI 0.60–0.83), respectively. When comparing our data with the only two existing meta-analyses using mpMRI for detecting PCa, a trend toward higher sensitivity and lower specificity can be inferred. In the study by de Rooij et al [37]
Considering that one of the main intentions for the generation of PI-RADS was to standardize reporting of mpMRI in order to decrease variability and bring about widespread acceptance and implementation in daily practice, it was promising to find that nearly all (20 of 21) studies used PI-RADSv2 strictly according to published guidelines [11].
In the current study, subgroup analyses were performed to account for differences in outcomes (any cancer vs clinically significant cancer). There was no significant difference between the two outcomes, irrespective of whether a cutoff of ≥3 or ≥4 was used. However, the definition of clinically significant cancer differed among the 13 studies. Only three studies defined csPCa strictly according to the PI-RADSv2 guidelines (Gleason score ≥7 [including 3 + 4], volume ≥0.5 ml, or extraprostatic extension) [11].
In this meta-analysis, we looked into the technical aspects of MRI. Meta-regression analyses revealed that the use of an endorectal coil was not a statistically significant factor. Furthermore, although magnet strength showed a statistically significant difference between 3 and 1.5 T, the difference was not clinically meaningful (sensitivity of 0.90 vs 0.89, respectively; p = 0.03). Although there had been debate over these two issues in the past, both 3 and 1.5 T are now well established, and the overall benefit of using an endorectal coil is not evident [38
Regarding the methods of analysis in the studies, there was significant heterogeneity regarding reference standard and type of analysis. Radical prostatectomy was the reference standard in five studies, while the majority were based on a combination of systematic and targeted biopsies. The possibility of PCa despite negative biopsy results in the latter group should be kept in mind. In addition, approximately half of the studies reported outcomes in a per-patient manner (n = 11) and the other half in a per-lesion manner (n = 10). Per-lesion analysis is known to take into account the performance of localizing the disease; however, this was not shown to be a significant factor in the meta-regression analysis.
Our meta-analysis had some limitations. Nearly all studies were retrospective in study design, resulting in a high risk of bias for patient selection. It is possible that pooling data from predominantly retrospective studies may have led to increased diagnostic sensitivity [40].
PI-RADSv2 shows good performance for the detection of PCa with pooled sensitivity of 0.89 and specificity of 0.73. PI-RADSv2 has higher pooled sensitivity compared with PI-RADSv1 without significantly different specificity.
Author contributions: Sang Youn Kim had full access to all the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis.
Study concept and design: Woo, Suh, S.Y. Kim.
Acquisition of data: Woo, Suh, S.Y. Kim.
Analysis and interpretation of data: Woo, Suh, S.Y. Kim.
Drafting of the manuscript: Woo.
Critical revision of the manuscript for important intellectual content: Suh, S.Y. Kim, Cho, S.H. Kim.
Statistical analysis: Suh.
Obtaining funding: None.
Administrative, technical, or material support: None.
Supervision: S.Y. Kim, Cho, S.H. Kim.
Other: None.
Financial disclosures: Sang Youn Kim certifies that all conflicts of interest, including specific financial interests and relationships and affiliations relevant to the subject matter or materials discussed in the manuscript (eg, employment/affiliation, grants or funding, consultancies, honoraria, stock ownership or options, expert testimony, royalties, or patents filed, received, or pending), are the following: None.
Funding/Support and role of the sponsor: None.
Prostate cancer (PCa) is the second leading cause of cancer-related mortality in Western men [1]
To bring standardization to the evaluation and reporting of mpMRI of the prostate, the European Society of Urogenital Radiology (ESUR) published a guideline termed Prostate Imaging Reporting and Data System (PI-RADS) in 2012 [5]
To address these issues, the ESUR and American College of Radiology recently released the updated PI-RADS version 2 (PI-RADSv2) [11]
Since then, several studies assessing the value of PI-RADSv2 have been published. However, the diagnostic performance of this new scoring system has not been evaluated systematically. Therefore, the purpose of our study was to assess the diagnostic performance of PI-RADSv2 for the detection of PCa. In addition, we aimed to compare the diagnostic performance of PI-RADSv1 and PI-RADSv2 in studies available for head-to-head comparison.
This meta-analysis was performed and written according to the preferred reporting items for systematic reviews and meta-analyses (PRISMA) guidelines [12]
A computerized search of MEDLINE and EMBASE up to December 7, 2016, was performed in order to identify studies evaluating the diagnostic performance of PI-RADSv2 for the detection of PCa. The search query combined synonyms for PCa, MRI, and PI-RADS as follows: (prostate cancer OR prostatic cancer OR prostate neoplasm OR prostatic neoplasm OR prostate tumor OR prostatic tumor OR prostate carcinoma OR prostatic carcinoma OR PCa) AND (magnetic resonance imaging OR MRI OR MR) AND (prostate imaging reporting and data system OR pi-rads OR pi rads OR pirads). Bibliographies of identified articles were also screened to expand the scope of search. Our search was limited to publications in English.
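The Boolean structure of the search query described above can be sketched as follows. This is a minimal illustration only: the helper names `or_group` and `build_query` are hypothetical, and the actual database syntax (field tags, truncation) used by the authors is not stated in the text.

```python
# Synonym groups copied from the search strategy described above.
PCA_TERMS = [
    "prostate cancer", "prostatic cancer",
    "prostate neoplasm", "prostatic neoplasm",
    "prostate tumor", "prostatic tumor",
    "prostate carcinoma", "prostatic carcinoma",
    "PCa",
]
MRI_TERMS = ["magnetic resonance imaging", "MRI", "MR"]
PIRADS_TERMS = [
    "prostate imaging reporting and data system",
    "pi-rads", "pi rads", "pirads",
]


def or_group(terms):
    """Join synonyms with OR and wrap them in parentheses (hypothetical helper)."""
    return "(" + " OR ".join(terms) + ")"


def build_query(*groups):
    """Combine the OR-groups with AND, one group per concept (hypothetical helper)."""
    return " AND ".join(or_group(group) for group in groups)


query = build_query(PCA_TERMS, MRI_TERMS, PIRADS_TERMS)
print(query)
```

Keeping one list per concept makes it straightforward to see that a record must match at least one synonym from each of the three groups.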
Studies were included if they satisfied all the following requirements according to the PICOS criteria [12]
Studies were excluded if any of the following criteria were met: (1) studies involving <10 patients; (2) review articles, guidelines, consensus statements, letters, editorials, and conference abstracts; (3) studies using only PI-RADSv1 for the evaluation of mpMRI of the prostate; (4) studies focusing on topics other than using the PI-RADSv2 system for diagnosing PCa (ie, staging and prediction of biochemical recurrence); and (5) studies with overlapping patient population.
Two reviewers (S.W. and C.H.S., with 3 yr of experience in performing systematic reviews and meta-analyses) independently evaluated the eligibility of the selected studies from the literature. Disagreements, if present between the two reviewers, were resolved by consensus via discussion with a third reviewer (S.Y.K.).
We extracted the following data regarding study design and results from the selected studies using a standardized form:
Study characteristics: authors, year of publication, country and institution of origin, duration of patient recruitment, and study design (prospective vs retrospective and consecutive or not)
Demographic and clinical characteristics: sample size, number of patients with PCa, patient age, prostate-specific antigen (PSA) level and Gleason score, number of previous biopsies, and PCa diagnosis prior to mpMRI
Technical characteristics of mpMRI: scanner model and manufacturer, magnetic field strength (1.5 vs 3 T), coil type (endorectal vs pelvic phased array), and specific sequences used (T2WI, DWI, DCE, or MR spectroscopy)
Interpretation of mpMRI: number of reviewers and experience in prostate mpMRI, independent or consensus reading, and blinding to clinicopathological information
Reference standard: type of reference standard (radical prostatectomy, targeted biopsy, or systematic biopsy), interval between MRI and pathology, outcomes assessed (any PCa vs csPCa), definition of csPCa (studies assessing “clinically significant”, “aggressive”, or “high-grade” PCa were all considered to assess csPCa; however, only studies that used the definition provided by the PI-RADSv2 guideline [Gleason score ≥7 (including 3 + 4), volume >0.5 ml, or extraprostatic extension] were considered not to have concern for applicability), separate analysis for the PZ and TZ, and type of analysis (per patient vs per lesion)
Diagnostic performance of PI-RADSv2 including criteria or cutoff values (in case of multiple readers, the results of the most experienced reader were extracted for this meta-analysis)
The methodological quality of the included studies was assessed using tailored questionnaires and criteria provided by Quality Assessment of Diagnostic Accuracy Studies-2 [13]
The diagnostic performance of PI-RADSv2 for the detection of PCa was the primary outcome for this meta-analysis. In addition, a comparison between the diagnostic performance of PI-RADSv2 and that of PI-RADSv1 using studies that reported head-to-head comparison data of the two PI-RADS versions was considered a secondary outcome.
Pooled estimates of sensitivity and specificity were calculated using hierarchical logistic regression modeling including bivariate modeling and hierarchical summary receiver operating characteristic (HSROC) modeling [14]
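As a rough illustration of the inputs to such pooling, the sketch below computes per-study sensitivity and specificity from a 2 × 2 table and transforms them to the logit scale, which is the scale on which bivariate models typically operate. It is not the bivariate or HSROC model itself (the analyses here were run with dedicated Stata and R modules); the function names and the example counts are hypothetical.

```python
import math


def sensitivity_specificity(tp, fp, fn, tn):
    """Return (sensitivity, specificity) for one study's 2 x 2 table."""
    sensitivity = tp / (tp + fn)  # proportion of diseased correctly detected
    specificity = tn / (tn + fp)  # proportion of non-diseased correctly ruled out
    return sensitivity, specificity


def logit(p):
    """Log-odds transform; undefined for p equal to 0 or 1."""
    return math.log(p / (1 - p))


# Hypothetical counts, for illustration only (not taken from any included study).
sens, spec = sensitivity_specificity(tp=90, fp=30, fn=10, tn=70)
print(round(sens, 2), round(spec, 2))
print(round(logit(sens), 3), round(logit(spec), 3))
```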
We performed meta-regression analyses to investigate the cause of heterogeneity. The following covariates were considered for the bivariate model: (1) proportion of patients with PCa (>50% vs ≤50%), (2) magnet strength of MRI (3 vs 1.5 T), (3) use of endorectal coil, (4) cutoff value (≥4 vs ≥3), (5) reference standard (radical prostatectomy vs biopsy), and (6) type of analysis (per patient vs per lesion). In addition, multiple subgroup analyses were performed for cutoff value, outcome, and previous biopsy history to assess various clinical settings: (1) a cutoff value of ≥4 for all studies, (2) a cutoff value of ≥3 for all studies, (3) a cutoff value of ≥4 for determining any PCa, (4) a cutoff value of ≥3 for determining any PCa, (5) a cutoff value of ≥4 for determining csPCa, (6) a cutoff value of ≥3 for determining csPCa, (7) a cutoff value of ≥4 in studies using per-patient analysis, (8) a cutoff value of ≥4 in studies using per-lesion analysis, (9) studies analyzing PZ PCa, (10) studies analyzing TZ PCa, (11) patients without previous biopsies, and (12) patients with previous biopsies. The “metandi” and “midas” modules in Stata 10.0 (StataCorp LP, College Station, TX, USA) and “mada” package in R software version 3.2.1 (R Foundation for Statistical Computing, Vienna, Austria) were used for statistical analyses, with p < 0.05 signifying statistical significance.
A systematic literature search initially identified 287 articles. After removing 46 duplicates, screening of the 241 titles and abstracts yielded 105 potentially eligible articles. Full-text reviews were performed, and 84 studies were excluded for the following reasons: not in the field of interest (n = 80, including 68 studies that used only PI-RADSv1), insufficient data to reconstruct 2 × 2 tables (n = 2), and shared study population with other studies (n = 2). Ultimately, 21 original articles including a total of 3857 patients assessing the diagnostic performance of PI-RADSv2 were included in the meta-analysis 16
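The counts in the selection process can be checked with simple arithmetic. The sketch below uses only the numbers reported in the text; the variable names are illustrative.

```python
# Counts as reported in the study selection paragraph.
identified = 287
duplicates = 46
screened = identified - duplicates  # titles and abstracts screened

full_text_reviewed = 105
exclusions = {
    "not in the field of interest": 80,
    "insufficient data to reconstruct 2 x 2 tables": 2,
    "shared study population": 2,
}
excluded = sum(exclusions.values())
included = full_text_reviewed - excluded

print(screened, excluded, included)
```

The three derived values agree with the 241 screened records, 84 full-text exclusions, and 21 included studies stated above.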
Fig. 1
PRISMA flow diagram showing study selection process for meta-analysis. a Included original articles for qualitative and quantitative analyses are references 16
Patient characteristics are shown in Table 1. The size of the study population ranged from 49 to 456 patients, with the percentage of those with PCa ranging from 37% to 100%. Median age ranged from 62 to 69.6 yr, median PSA from 3.97 to 15 ng/ml, and Gleason score from 5 to 10. Patients had already been diagnosed with PCa prior to MRI in all or some of the study populations in seven studies [16
Table 1. Patient characteristics
Characteristics of the studies are summarized in Table 2 . MRI was performed using 3-T scanners in 16 studies 16
a Blinded but aware that patients had biopsy-proven PCa.
b PI-RADSv2 scores were generated from reports based on PI-RADSv1 and simplified qualitative system.
c Epstein's criteria = Gleason pattern ≥4, or Gleason 3 + 3 disease with core length ≥50% and/or >2 cores positive.
d For transition zone, cutoff value = 4.
csPCa = clinically significant prostate cancer; GS = Gleason score; EPE = extraprostatic extension; MRGB = magnetic resonance imaging-guided biopsy; MRI = magnetic resonance imaging; MRI–TRUS = fusion of magnetic resonance imaging and transrectal ultrasound images; NR = not reported; PCa = prostate cancer; PI-RADSv2 = Prostate Imaging Reporting and Data System version 2; PZ = peripheral zone; RP = radical prostatectomy; STRUSGB = systematic transrectal ultrasound-guided biopsy; TTB = transperineal template biopsy; TZ = transition zone.
Table 2. Study characteristics
Overall, the quality of the studies was not considered high, mainly due to the patient selection domain (Fig. 2). Regarding the patient selection domain, there was generally a high risk of bias as all but four of the studies were retrospective in nature 16
The sensitivity and specificity of individual studies were 73–100% and 8–100%, respectively. The Q-test revealed that substantial heterogeneity was present (p < 0.001). The Higgins I² statistic demonstrated substantial heterogeneity in terms of the sensitivity (I² = 85.55%) and considerable heterogeneity in terms of the specificity (I² = 95.30%). The coupled forest plot of the sensitivity and specificity demonstrated the absence of a threshold effect (Fig. 3). The Spearman correlation coefficient between the sensitivity and the false-positive rate was 0.45 (95% confidence interval [CI] 0.023–0.738), also indicating the lack of a threshold effect.
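For readers unfamiliar with these heterogeneity statistics, the following sketch shows how Cochran's Q and the Higgins I² statistic are conventionally computed from per-study estimates and their variances, using I² = max(0, (Q − df)/Q) × 100 with df = k − 1. The input values are hypothetical and the function names are illustrative; the statistical modules used in this meta-analysis may differ in detail.

```python
def cochran_q(estimates, variances):
    """Cochran's Q: weighted sum of squared deviations from the pooled estimate."""
    weights = [1.0 / v for v in variances]  # inverse-variance weights
    pooled = sum(w * e for w, e in zip(weights, estimates)) / sum(weights)
    return sum(w * (e - pooled) ** 2 for w, e in zip(weights, estimates))


def i_squared(q, k):
    """Higgins I^2 (%) from Q and the number of studies k; floored at 0."""
    if q <= 0:
        return 0.0
    return max(0.0, (q - (k - 1)) / q) * 100.0


# Hypothetical logit-scale estimates and variances, for illustration only.
estimates = [1.0, 2.0, 3.0]
variances = [0.1, 0.1, 0.1]
q = cochran_q(estimates, variances)
print(round(q, 2), round(i_squared(q, len(estimates)), 1))
```

I² expresses the share of the observed variability that exceeds what chance alone would produce, which is why values above 85% are read here as marked between-study heterogeneity.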
Fig. 3
Coupled forest plots of pooled sensitivity and specificity. Numbers are pooled estimates with 95% CI in parentheses. Corresponding heterogeneity statistics are provided at bottom right corners. Horizontal lines indicate 95% CIs. CI = confidence intervals.
For all 21 studies combined, the pooled sensitivity was 0.89 (95% CI 0.86–0.92) with specificity of 0.73 (95% CI 0.60–0.83; Fig. 3 ). In the HSROC curve, there was a large difference between the 95% confidence region and the 95% prediction region, thus indicating heterogeneity between the studies ( Fig. 4 ). The area under the HSROC curve was 0.91 (95% CI 0.88–0.93). According to the Deeks’ funnel plot, the likelihood of publication bias was low, with a p value of 0.75 for the slope coefficient ( Fig. 5 ).
Fig. 4
Hierarchical summary receiver operating characteristic curve of the diagnostic performance of PI-RADSv2 for detecting prostate cancer. HSROC = hierarchical summary receiver operating characteristic; PI-RADSv2 = Prostate Imaging Reporting and Data System version 2.
Fig. 5
Deeks’ funnel plot. A p value of 0.75 suggests that the likelihood of publication bias is low.
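The quantity tested in a Deeks' funnel plot is the slope of a regression of the log diagnostic odds ratio on the inverse square root of the effective sample size, weighted by the effective sample size. The sketch below computes that slope only; it does not compute the p value, and the 0.5 continuity correction, the function names, and the 2 × 2 counts are assumptions made for illustration rather than details taken from this study.

```python
import math


def log_dor(tp, fp, fn, tn):
    """Log diagnostic odds ratio; adds 0.5 to every cell if any cell is zero."""
    if 0 in (tp, fp, fn, tn):
        tp, fp, fn, tn = (cell + 0.5 for cell in (tp, fp, fn, tn))
    return math.log((tp * tn) / (fp * fn))


def effective_sample_size(tp, fp, fn, tn):
    """ESS = 4 * n_diseased * n_non_diseased / (n_diseased + n_non_diseased)."""
    diseased = tp + fn
    non_diseased = fp + tn
    return 4.0 * diseased * non_diseased / (diseased + non_diseased)


def weighted_slope(x, y, w):
    """Slope of a weighted least-squares line of y on x with weights w."""
    total = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / total
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / total
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    return sxy / sxx


def deeks_slope(tables):
    """Slope of ln(DOR) on 1/sqrt(ESS), weighted by ESS."""
    ess = [effective_sample_size(*t) for t in tables]
    x = [1.0 / math.sqrt(e) for e in ess]
    y = [log_dor(*t) for t in tables]
    return weighted_slope(x, y, ess)


# Hypothetical (tp, fp, fn, tn) tables, for illustration only.
tables = [(90, 30, 10, 70), (45, 10, 5, 40), (20, 8, 5, 17)]
print(deeks_slope(tables))
```

A slope close to zero means that smaller studies do not report systematically different diagnostic odds ratios, which is the pattern interpreted as a low likelihood of publication bias.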
For the six studies that provided a head-to-head comparison between PI-RADSv1 and PI-RADSv2, PI-RADSv2 demonstrated higher pooled sensitivity of 0.95 (95% CI 0.85–0.98) compared with 0.88 (95% CI 0.80–0.93) for PI-RADSv1 ( p = 0.04). However, the pooled specificity did not show a significant difference between the two versions of PI-RADS: 0.73 (95% CI 0.47–0.89) for v2 and 0.75 (95% CI 0.36–0.94) for v1 ( p = 0.90).
As we found considerable heterogeneity among the included studies, meta-regression analyses were performed (Supplementary Table 1). Among several potential variables, proportion of patients with PCa, magnetic field strength, and reference standard were significant factors affecting the heterogeneity (p < 0.01 for all). However, among these three, only specificity according to the proportion of patients with PCa showed a statistically significant and clinically meaningful difference: 0.65 (95% CI 0.52–0.78) in studies with ≤50% of patients with PCa versus 0.86 (95% CI 0.75–0.97) in studies with >50% of patients with PCa. Otherwise, no clinically meaningful differences were seen: for magnet strength (3 vs 1.5 T), sensitivity of 0.90 (95% CI 0.86–0.94) versus 0.89 (95% CI 0.81–0.97, p = 0.03) and specificity of 0.73 (95% CI 0.59–0.86) versus 0.72 (95% CI 0.44–1.00, p = 0.81); for reference standard (radical prostatectomy vs biopsy), sensitivity of 0.89 (95% CI 0.83–0.95) versus 0.91 (95% CI 0.88–0.94, p < 0.01) and specificity of 0.65 (95% CI 0.37–0.94) versus 0.73 (95% CI 0.58–0.87, p = 0.48). Other variables, including the cutoff value, use of endorectal coil, and type of analysis (per patient vs per lesion), were not significant factors (p = 0.32–0.70).
As there were four studies that used both ≥3 and ≥4 as cutoff values [17
Based on the localization of PCa, the pooled sensitivity was 0.93 (95% CI 0.87–0.96) with specificity of 0.68 (95% CI 0.43–0.86) in seven studies analyzing PZ cancers [20
Studies including patients without previous biopsies yielded sensitivity of 0.82 (95% CI 0.72–0.90) and specificity of 0.75 (95% CI 0.65–0.83), whereas the diagnostic performance values were 0.87 (95% CI 0.80–0.92) and 0.71 (95% CI 0.42–0.89) in studies including patients with a history of previous biopsy.
In our meta-analysis, we assessed the diagnostic accuracy of PI-RADSv2 for detecting PCa. The pooled sensitivity and specificity of all 21 studies were 0.89 (95% CI 0.86–0.92) and 0.73 (95% CI 0.60–0.83), respectively. When comparing our data with the only two existing meta-analyses using mpMRI for detecting PCa, a trend toward higher sensitivity and lower specificity can be inferred. In the study by de Rooij et al [37]
Considering that one of the main intentions for the generation of PI-RADS was to standardize reporting of mpMRI in order to decrease variability and bring about widespread acceptance and implementation in daily practice, it was promising to find that nearly all (20 of 21) studies used PI-RADSv2 strictly according to published guidelines [11]
In the current study, subgroup analyses were performed to account for differences in outcomes (any cancer vs clinically significant cancer). There was no significant difference between the two outcomes, irrespective of whether a cutoff of ≥3 or ≥4 was used. However, the definition of clinically significant cancer differed among the 13 studies. Only three studies defined csPCa strictly according to the PI-RADSv2 guidelines (Gleason score ≥7 [including 3 + 4], volume >0.5 ml, or extraprostatic extension) [11]
In this meta-analysis, we looked into the technical aspects of MRI. Meta-regression analyses revealed that the use of endorectal coil was not a statistically significant factor. Furthermore, although the difference between 3 and 1.5 T was statistically significant, it was not clinically meaningful (sensitivity of 0.90 vs 0.89, respectively; p = 0.03). Although there had been debate over these two issues in the past, both 3 and 1.5 T are now well established, and the overall benefit of using an endorectal coil is not evident [38
Prostate cancer (PCa) is the second leading cause of cancer-related mortality in Western men [1]
To bring standardization to the evaluation and reporting of mpMRI of the prostate, the European Society of Urogenital Radiology (ESUR) published a guideline termed Prostate Imaging Reporting and Data System (PI-RADS) in 2012 [5]
To address these issues, the ESUR and American College of Radiology recently released the updated PI-RADS version 2 (PI-RADSv2) [11]
Since then, several studies assessing the value of PI-RADSv2 have been published. However, the diagnostic performance of this new scoring system has not been evaluated systematically. Therefore, the purpose of our study was to assess the diagnostic performance of PI-RADSv2 for the detection of PCa. In addition, we aimed to compare the diagnostic performance of PI-RADSv1 and PI-RADSv2 in studies available for head-to-head comparison.
This meta-analysis was performed and written according to the preferred reporting items for systematic reviews and meta-analyses (PRISMA) guidelines [12]
A computerized search of MEDLINE and EMBASE up to December 7, 2016, was performed in order to identify studies evaluating the diagnostic performance of PI-RADSv2 for the detection of PCa. The search query combined synonyms for PCa, MRI, and PI-RADS as follows: (prostate cancer OR prostatic cancer OR prostate neoplasm OR prostatic neoplasm OR prostate tumor OR prostatic tumor OR prostate carcinoma OR prostatic carcinoma OR PCa) AND (magnetic resonance imaging OR MRI OR MR) AND (prostate imaging reporting and data system OR pi-rads OR pi rads OR pirads). Bibliographies of identified articles were also screened to expand the scope of search. Our search was limited to publications in English.
Studies were included if they satisfied all the following requirements according to the PICOS criteria [12]
Studies were excluded if any of the following criteria were met: (1) studies involving <10 patients; (2) review articles, guidelines, consensus statements, letters, editorials, and conference abstracts; (3) studies using only PI-RADSv1 for the evaluation of mpMRI of the prostate; (4) studies focusing on topics other than using the PI-RADSv2 system for diagnosing PCa (ie, staging and prediction of biochemical recurrence); and (5) studies with overlapping patient population.
Two reviewers (S.W. and C.H.S., with 3 yr of experience in performing systematic reviews and meta-analyses) independently evaluated the eligibility of the selected studies from the literature. Disagreements, if present between the two reviewers, were resolved by consensus via discussion with a third reviewer (S.Y.K.).
We extracted the following data regarding study design and results from the selected studies using a standardized form:
Study characteristics—authors, year of publication, country and institution of origin, duration of patient recruitment, and study design (prospective vs retrospective and consecutive or not)
Demographic and clinical characteristics—sample size, number of patients with PCa, patient age, prostate-specific antigen (PSA) level and Gleason score, number of previous biopsies, and PCa diagnosis prior to mpMRI
Technical characteristics of mpMRI—scanner model and manufacturer, magnetic field strength (1.5 vs 3 T), coil type (endorectal vs pelvic phased array), and specific sequences used (T2WI, DWI, DCE, or MR spectroscopy)
Interpretation of mpMRI—number of reviewers and experience in prostate mpMRI, independent or consensus reading, and blinding to clinicopathological information
Reference standard—type of reference standard (radical prostatectomy, targeted biopsy, or systematic biopsy), interval between MRI and pathology, outcomes assessed (any PCa vs csPCa), definition of csPCa (studies assessing “clinically significant”, “aggressive”, or “high-grade” PCa were all considered to assess csPCa; however, only studies that used the definition as provided by the PI-RADSv2 guideline [Gleason score >7 [3 + 4], volume >0.5 ml, or extraprostatic extension] were considered not to have concern for applicability), separate analysis for the PZ and TZ, and type of analysis (per patient vs per lesion)
Diagnostic performance of PI-RADSv2 including criteria or cutoff values (in case of multiple readers, the results of the most experienced reader were extracted for this meta-analysis)
The methodological quality of the included studies was assessed using tailored questionnaires and criteria provided by Quality Assessment of Diagnostic Accuracy Studies-2 [13]
The diagnostic performance of PI-RADSv2 for the detection of PCa was the primary outcome for this meta-analysis. In addition, a comparison between the diagnostic performance of PI-RADSv2 and that of PI-RADSv1 using studies that reported head-to-head comparison data of the two PI-RADS versions was considered a secondary outcome.
Pooled estimates of sensitivity and specificity were calculated using hierarchical logistic regression modeling including bivariate modeling and hierarchical summary receiver operating characteristic (HSROC) modeling [14]
We performed meta-regression analyses to investigate the cause of heterogeneity. The following covariates were considered for the bivariate model: (1) proportion of patients with PCa (>50% vs ≤50%), (2) magnet strength of MRI (3 vs 1.5 T), (3) use of endorectal coil, (4) cutoff value (≥4 vs ≥3), (5) reference standard (radical prostatectomy vs biopsy), and (6) type of analysis (per patient vs per lesion). In addition, multiple subgroup analyses were performed for cutoff value, outcome, and previous biopsy history to assess various clinical settings: (1) a cutoff value of ≥4 for all studies, (2) a cutoff value of ≥3 for all studies, (3) a cutoff value of ≥4 for determining any PCa, (4) a cutoff value of ≥3 for determining any PCa, (5) a cutoff value of ≥4 for determining csPCa, (6) a cutoff value of ≥3 for determining csPCa, (7) a cutoff value of ≥4 in studies using per-patient analysis, (8) a cutoff value of ≥4 in studies using per-lesion analysis, (9) studies analyzing PZ PCa, (10) studies analyzing TZ PCa, (11) patients without previous biopsies, and (12) patients with previous biopsies. The “metandi” and “midas” modules in Stata 10.0 (StataCorp LP, College Station, TX, USA) and “mada” package in R software version 3.2.1 (R Foundation for Statistical Computing, Vienna, Austria) were used for statistical analyses, with p < 0.05 signifying statistical significance.
A systematic literature search initially identified 287 articles. After removing 46 duplicates, screening of the 241 titles and abstracts yielded 105 potentially eligible articles. Full-text reviews were performed, and 84 studies were excluded
for the following reasons: not in the field of interest ( n = 80, including 68 studies that used only PI-RADSv1), insufficient data to reconstruct 2 × 2 tables ( n = 2), and shared study population with other studies ( n = 2). Ultimately, 21 original articles including a total of 3857 patients assessing the diagnostic performance of PI-RADSv2 were included in the meta-analysis 16
Fig. 1
PRISMA flow diagram showing study selection process for meta-analysis. a Included original articles for qualitative and quantitative analyses are references 16
Patient characteristics are shown in Table 1 . The size of the study population ranged from 49 to 456 patients, with the percentage of
those with PCa ranging from 37% to 100%. The patients had a median age of 62–69.6 yr, median PSA of 3.97–15 ng/ml, and a Gleason score ranging from 5 to 10. Patients had already been diagnosed with PCa prior to MRI in all or some of the study
populations in seven studies [16
Table 1Patient characteristics
Characteristics of the studies are summarized in Table 2 . MRI was performed using 3-T scanners in 16 studies 16
a Blinded but aware that patients had biopsy-proven PCa.
b PI-RADSv2 scores were generated from reports based on PI-RADSv1 and simplified qualitative system.
c Epstein's criteria = Gleason pattern ≥4, or Gleason 3 + 3 disease with core length ≥50% and/or >2 cores positive.
d For transition zone, cutoff value = 4.
csPCa = clinically significant prostate cancer; GS = Gleason score; EPE = extraprostatic extension; MRGB = magnetic resonance imaging-guided biopsy; MRI = magnetic resonance imaging; MRI–TRUS = fusion of magnetic resonance imaging and transrectal ultrasound images; NR = not reported; PCa = prostate cancer; PI-RADSv2 = Prostate Imaging Reporting and Data System version 2; PZ = peripheral zone; RP = radical prostatectomy; STRUSGB = systematic transrectal ultrasound-guided biopsy; TTB = transperineal template biopsy; TZ = transition zone.Table 2Study characteristics
Overall, the quality of the studies was not considered high, mainly due to the patient selection domain ( Fig. 2 ). Regarding the patient
selection domain, there was generally a high risk of bias as all but four of the studies were retrospective in nature 16
The sensitivity and specificity of individual studies were 73–100% and 8–100%, respectively. The Q-test revealed that substantial heterogeneity was present ( p < 0.001). The Higgins I 2 statistics demonstrated substantial heterogeneity in terms of the sensitivity ( I 2 = 85.55%) and considerable heterogeneity in terms of the specificity ( I 2 = 95.30%). The coupled forest plot of the sensitivity and specificity demonstrated the absence of a threshold effect ( Fig. 3 ). The Spearman correlation coefficient between the sensitivity and the false-positive rate was 0.45 (95% confidence interval [CI] 0.023–0.738), also indicating the lack of a threshold effect.
Fig. 3
Coupled forest plots of pooled sensitivity and specificity. Numbers are pooled estimates with 95% CI in parentheses. Corresponding heterogeneity statistics are provided at bottom right corners. Horizontal lines indicate 95% CIs. CI = confidence intervals.
For all 21 studies combined, the pooled sensitivity was 0.89 (95% CI 0.86–0.92) with specificity of 0.73 (95% CI 0.60–0.83; Fig. 3 ). In the HSROC curve, there was a large difference between the 95% confidence region and the 95% prediction region, thus indicating heterogeneity between the studies ( Fig. 4 ). The area under the HSROC curve was 0.91 (95% CI 0.88–0.93). According to the Deeks’ funnel plot, the likelihood of publication bias was low, with a p value of 0.75 for the slope coefficient ( Fig. 5 ).
Fig. 4
Hierarchical summary receiver operating characteristic curve of the diagnostic performance of PI-RADSv2 for detecting prostate cancer. HSROC = hierarchical summary receiver operating characteristic; PI-RADSv2 = Prostate Imaging Reporting and Data System version 2.
Fig. 5
Deeks’ funnel plot. A p value of 0.75 suggests that the likelihood of publication bias is low.
For the six studies that provided a head-to-head comparison between PI-RADSv1 and PI-RADSv2, PI-RADSv2 demonstrated higher pooled sensitivity of 0.95 (95% CI 0.85–0.98) compared with 0.88 (95% CI 0.80–0.93) for PI-RADSv1 ( p = 0.04). However, the pooled specificity did not show a significant difference between the two versions of PI-RADS: 0.73 (95% CI 0.47–0.89) for v2 and 0.75 (95% CI 0.36–0.94) for v1 ( p = 0.90).
As we found a considerable heterogeneity among the included studies, meta-regression analyses were performed (Supplementary Table 1 ). Among several potential variables, proportion of patients with PCa, magnetic field strength, and reference standard were significant factors affecting the heterogeneity ( p < 0.01 for all). However, among these three, only specificity according to the proportion of patients with PCa showed a statistically significant and clinically meaningful difference: 0.65 (95% CI 0.52–0.78) in studies ≤50% of patients with PCa versus 0.86 (0.75–0.97) in studies with >50% of patients with PCa. Otherwise, no clinically meaningful differences were seen: for magnet strength (3 vs 1.5 T), sensitivity of 0.90 (95% CI 0.86–0.94) versus 0.89 (95% CI 0.81–0.97, p = 0.03) and specificity of 0.73 (95% CI 0.59–0.86) versus 0.72 (95% CI 0.44–1.00, p = 0.81); for reference standard (radical prostatectomy vs biopsy), sensitivity of 0.89 (95% CI 0.83–0.95) versus 0.91 (95% CI 0.88–0.94, p < 0.01) and specificity of 0.65 (95% CI 0.37–0.94) versus 0.73 (95% CI 0.58–0.87, p = 0.48). Other variables, including the cutoff value, use of endorectal coil, and type of analysis (per patient vs per lesion), were not significant factors ( p = 0.32–0.70).
As there were four studies that used both ≥3 and ≥4 as cutoff values [17
Based on the localization of PCa, the pooled sensitivity was 0.93 (95% CI 0.87–0.96) with specificity of 0.68 (95% CI 0.43–0.86) in seven studies analyzing PZ cancers [20
Studies including patients without previous biopsies yielded sensitivity of 0.82 (95% CI 0.72–0.90) and specificity of 0.75 (95% CI 0.65–0.83), whereas the diagnostic performance values were 0.87 (95% CI 0.80–0.92) and 0.71 (95% CI 0.42–0.89) in studies including patients with a history of previous biopsy.
In our meta-analysis, we assessed the diagnostic accuracy of PI-RADSv2 for detecting PCa. The pooled sensitivity and specificity of all 21 studies were 0.89 (95% CI 0.86–0.92) and 0.73 (95% CI 0.60–0.83), respectively. When comparing our data
with the only two existing meta-analyses using mpMRI for detecting PCa, a trend toward higher sensitivity and lower specificity can be inferred. In the study by de Rooij et al [37]
Considering that one of the main intentions for the generation of PI-RADS was to standardize reporting of mpMRI in order to decrease variability and bring about widespread acceptance and implementation in daily practice, it was promising to find
that nearly all (20 of 21) studies used PI-RADSv2 strictly according to published guidelines [11]
In the current study, subgroup analyses were performed to account for differences in outcomes (any cancer vs clinically significant cancer). There was no significant difference for using either outcome irrespective of whether the criteria of ≥3
or ≥4 were used. However, the definition of clinically significant cancer was different among the 13 studies. Only three studies defined csPCa strictly according to the PI-RADSv2 guidelines (Gleason score >7 [3 + 4], volume >0.5 ml,
or extraprostatic extension) [11]
In this meta-analysis, we looked into the technical aspects of MRI. Meta-regression analyses revealed that the use of endorectal coil was not a statistically significant factor. Furthermore, although magnet strength showed statistically significant
differences between 3 and 1.5 T, this did not reveal to be clinically meaningful (sensitivity of 0.90 vs 0.89, p = 0.03, respectively). Although there had been debate over these two issues in the past, both 3 and 1.5 T are now well
established, and the overall benefit of using an endorectal coil is not evident [38
Regarding the methods of analysis in the studies, there was significant heterogeneity regarding reference standard and type of analysis. Radical prostatectomy was the reference standard in five studies, while the majority were based on a combination of systematic and targeted biopsies. The possibility of PCa despite negative biopsy results in the latter group should be kept in mind. In addition, approximately half of the studies each reported outcomes in a per-patient ( n = 11) and per-lesion ( n = 10) Manner. Per-lesion analysis is known to take into account the performance of localizing the disease; however, this was not shown to be a significant factor in the meta-regression analysis.
Our meta-analysis had some limitations. Nearly all studies were retrospective in study design, resulting in a high risk of bias for patient selection. It is possible that pooling data from predominantly retrospective studies may have led to increased diagnostic sensitivity [40].
PI-RADSv2 shows good performance for the detection of PCa with pooled sensitivity of 0.89 and specificity of 0.73. PI-RADSv2 has higher pooled sensitivity compared with PI-RADSv1 without significantly different specificity.
Author contributions: Sang Youn Kim had full access to all the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis.
Study concept and design: Woo, Suh, S.Y. Kim.
Acquisition of data: Woo, Suh, S.Y. Kim.
Analysis and interpretation of data: Woo, Suh, S.Y. Kim.
Drafting of the manuscript: Woo.
Critical revision of the manuscript for important intellectual content: Suh, S.Y. Kim, Cho, S.H. Kim.
Statistical analysis: Suh.
Obtaining funding: None.
Administrative, technical, or material support: None.
Supervision: S.Y. Kim, Cho, S.H. Kim.
Other: None.
Financial disclosures: Sang Youn Kim certifies that all conflicts of interest, including specific financial interests and relationships and affiliations relevant to the subject matter or materials discussed in the manuscript (eg, employment/affiliation, grants or funding, consultancies, honoraria, stock ownership or options, expert testimony, royalties, or patents filed, received, or pending), are the following: None.
Funding/Support and role of the sponsor: None.
Prostate cancer (PCa) is the second leading cause of cancer-related mortality in Western men [1].
To bring standardization to the evaluation and reporting of mpMRI of the prostate, the European Society of Urogenital Radiology (ESUR) published a guideline termed Prostate Imaging Reporting and Data System (PI-RADS) in 2012 [5].
To address these issues, the ESUR and American College of Radiology recently released the updated PI-RADS version 2 (PI-RADSv2) [11].
Since then, several studies assessing the value of PI-RADSv2 have been published. However, the diagnostic performance of this new scoring system has not been evaluated systematically. Therefore, the purpose of our study was to assess the diagnostic performance of PI-RADSv2 for the detection of PCa. In addition, we aimed to compare the diagnostic performance of PI-RADSv1 and PI-RADSv2 in studies available for head-to-head comparison.
This meta-analysis was performed and written according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines [12].
A computerized search of MEDLINE and EMBASE up to December 7, 2016, was performed in order to identify studies evaluating the diagnostic performance of PI-RADSv2 for the detection of PCa. The search query combined synonyms for PCa, MRI, and PI-RADS as follows: (prostate cancer OR prostatic cancer OR prostate neoplasm OR prostatic neoplasm OR prostate tumor OR prostatic tumor OR prostate carcinoma OR prostatic carcinoma OR PCa) AND (magnetic resonance imaging OR MRI OR MR) AND (prostate imaging reporting and data system OR pi-rads OR pi rads OR pirads). Bibliographies of identified articles were also screened to expand the scope of search. Our search was limited to publications in English.
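The Boolean structure of the search strategy can be made explicit with a short sketch. The synonym lists are copied from the query above; the helper names (`or_group`, `build_query`) are hypothetical and serve only to illustrate how the three OR groups are joined with AND. This is not the authors' search script, and database-specific field tags are not modeled.

```python
# Synonym groups copied from the search strategy described in the text.
PCA_TERMS = [
    "prostate cancer", "prostatic cancer",
    "prostate neoplasm", "prostatic neoplasm",
    "prostate tumor", "prostatic tumor",
    "prostate carcinoma", "prostatic carcinoma",
    "PCa",
]
MRI_TERMS = ["magnetic resonance imaging", "MRI", "MR"]
PIRADS_TERMS = [
    "prostate imaging reporting and data system",
    "pi-rads", "pi rads", "pirads",
]


def or_group(terms):
    """Join synonyms for one concept with OR and wrap them in parentheses."""
    return "(" + " OR ".join(terms) + ")"


def build_query(*groups):
    """Require every concept to be present by joining the OR groups with AND."""
    return " AND ".join(or_group(group) for group in groups)


QUERY = build_query(PCA_TERMS, MRI_TERMS, PIRADS_TERMS)
print(QUERY)
```

Keeping each concept in its own list makes it easy to see that a record must match at least one term from every group to be retrieved.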
Studies were included if they satisfied all the following requirements according to the PICOS criteria [12]
Studies were excluded if any of the following criteria were met: (1) studies involving <10 patients; (2) review articles, guidelines, consensus statements, letters, editorials, and conference abstracts; (3) studies using only PI-RADSv1 for the evaluation of mpMRI of the prostate; (4) studies focusing on topics other than using the PI-RADSv2 system for diagnosing PCa (ie, staging and prediction of biochemical recurrence); and (5) studies with overlapping patient population.
Two reviewers (S.W. and C.H.S., with 3 yr of experience in performing systematic reviews and meta-analyses) independently evaluated the eligibility of the selected studies from the literature. Disagreements, if present between the two reviewers, were resolved by consensus via discussion with a third reviewer (S.Y.K.).
We extracted the following data regarding study design and results from the selected studies using a standardized form:
Study characteristics: authors, year of publication, country and institution of origin, duration of patient recruitment, and study design (prospective vs retrospective and consecutive or not)
Demographic and clinical characteristics: sample size, number of patients with PCa, patient age, prostate-specific antigen (PSA) level and Gleason score, number of previous biopsies, and PCa diagnosis prior to mpMRI
Technical characteristics of mpMRI: scanner model and manufacturer, magnetic field strength (1.5 vs 3 T), coil type (endorectal vs pelvic phased array), and specific sequences used (T2WI, DWI, DCE, or MR spectroscopy)
Interpretation of mpMRI: number of reviewers and experience in prostate mpMRI, independent or consensus reading, and blinding to clinicopathological information
Reference standard: type of reference standard (radical prostatectomy, targeted biopsy, or systematic biopsy), interval between MRI and pathology, outcomes assessed (any PCa vs csPCa), definition of csPCa (studies assessing “clinically significant”, “aggressive”, or “high-grade” PCa were all considered to assess csPCa; however, only studies that used the definition as provided by the PI-RADSv2 guideline [Gleason score ≥7 (including 3 + 4), volume >0.5 ml, or extraprostatic extension] were considered not to have concern for applicability), separate analysis for the PZ and TZ, and type of analysis (per patient vs per lesion)
Diagnostic performance of PI-RADSv2: criteria or cutoff values (in case of multiple readers, the results of the most experienced reader were extracted for this meta-analysis)
The methodological quality of the included studies was assessed using tailored questionnaires and criteria provided by Quality Assessment of Diagnostic Accuracy Studies-2 [13].
The diagnostic performance of PI-RADSv2 for the detection of PCa was the primary outcome for this meta-analysis. In addition, a comparison between the diagnostic performance of PI-RADSv2 and that of PI-RADSv1 using studies that reported head-to-head comparison data of the two PI-RADS versions was considered a secondary outcome.
Pooled estimates of sensitivity and specificity were calculated using hierarchical logistic regression modeling including bivariate modeling and hierarchical summary receiver operating characteristic (HSROC) modeling [14].
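As a minimal sketch of the per-study quantities that feed such models, the snippet below computes sensitivity and specificity from a single 2 × 2 table and applies the logit transform on which bivariate models typically operate. The counts are hypothetical and not taken from any included study, and the function names are illustrative; the hierarchical pooling itself (done in the source with Stata and R modules) is not reproduced here.

```python
import math


def sens_spec(tp, fp, fn, tn):
    """Return (sensitivity, specificity) for one study's 2 x 2 table."""
    sensitivity = tp / (tp + fn)  # proportion of diseased cases called positive
    specificity = tn / (tn + fp)  # proportion of non-diseased cases called negative
    return sensitivity, specificity


def logit(p):
    """Log-odds transform; undefined for p equal to 0 or 1."""
    return math.log(p / (1 - p))


# Hypothetical counts for illustration only.
tp, fp, fn, tn = 90, 30, 10, 70
sens, spec = sens_spec(tp, fp, fn, tn)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
print(f"logit(sens)={logit(sens):.3f}, logit(spec)={logit(spec):.3f}")
```

Studies reporting a sensitivity or specificity of exactly 0 or 1 would need a continuity correction before the logit step, which this sketch deliberately leaves out.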
We performed meta-regression analyses to investigate the cause of heterogeneity. The following covariates were considered for the bivariate model: (1) proportion of patients with PCa (>50% vs ≤50%), (2) magnet strength of MRI (3 vs 1.5 T), (3) use of endorectal coil, (4) cutoff value (≥4 vs ≥3), (5) reference standard (radical prostatectomy vs biopsy), and (6) type of analysis (per patient vs per lesion). In addition, multiple subgroup analyses were performed for cutoff value, outcome, and previous biopsy history to assess various clinical settings: (1) a cutoff value of ≥4 for all studies, (2) a cutoff value of ≥3 for all studies, (3) a cutoff value of ≥4 for determining any PCa, (4) a cutoff value of ≥3 for determining any PCa, (5) a cutoff value of ≥4 for determining csPCa, (6) a cutoff value of ≥3 for determining csPCa, (7) a cutoff value of ≥4 in studies using per-patient analysis, (8) a cutoff value of ≥4 in studies using per-lesion analysis, (9) studies analyzing PZ PCa, (10) studies analyzing TZ PCa, (11) patients without previous biopsies, and (12) patients with previous biopsies. The “metandi” and “midas” modules in Stata 10.0 (StataCorp LP, College Station, TX, USA) and “mada” package in R software version 3.2.1 (R Foundation for Statistical Computing, Vienna, Austria) were used for statistical analyses, with p < 0.05 signifying statistical significance.
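The role of the cutoff value (≥4 vs ≥3) can be illustrated with a short sketch that turns PI-RADSv2 scores into a 2 × 2 table at a chosen threshold. The scores and reference-standard labels below are made up for illustration, and `confusion_at_cutoff` is a hypothetical helper rather than part of any package named in the text.

```python
def confusion_at_cutoff(scores, has_cancer, cutoff):
    """Count (tp, fp, fn, tn) when a score >= cutoff is read as test positive."""
    tp = fp = fn = tn = 0
    for score, cancer in zip(scores, has_cancer):
        positive = score >= cutoff
        if positive and cancer:
            tp += 1
        elif positive and not cancer:
            fp += 1
        elif not positive and cancer:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


# Hypothetical PI-RADSv2 scores (1 to 5) and reference-standard results.
scores = [2, 3, 3, 4, 4, 5, 5, 1, 3, 4]
has_cancer = [False, False, True, True, False, True, True, False, False, True]

for cutoff in (3, 4):
    tp, fp, fn, tn = confusion_at_cutoff(scores, has_cancer, cutoff)
    print(f"cutoff >= {cutoff}: tp={tp}, fp={fp}, fn={fn}, tn={tn}")
```

In this toy data set, raising the cutoff from ≥3 to ≥4 trades one missed cancer for two fewer false positives, which is the kind of sensitivity and specificity shift the subgroup analyses are meant to capture.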
A systematic literature search initially identified 287 articles. After removing 46 duplicates, screening of the 241 titles and abstracts yielded 105 potentially eligible articles. Full-text reviews were performed, and 84 studies were excluded for the following reasons: not in the field of interest (n = 80, including 68 studies that used only PI-RADSv1), insufficient data to reconstruct 2 × 2 tables (n = 2), and shared study population with other studies (n = 2). Ultimately, 21 original articles including a total of 3857 patients assessing the diagnostic performance of PI-RADSv2 were included in the meta-analysis 16
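The counts in the selection process can be checked with simple arithmetic. The sketch below uses only the numbers stated in the paragraph above; the variable names are illustrative.

```python
# Counts as reported in the study selection paragraph.
identified = 287
duplicates = 46
screened = identified - duplicates  # titles and abstracts screened

full_text_reviewed = 105
exclusion_reasons = {
    "not in the field of interest": 80,
    "insufficient data for 2 x 2 tables": 2,
    "shared study population": 2,
}
excluded = sum(exclusion_reasons.values())
included = full_text_reviewed - excluded

print(f"screened={screened}, excluded at full text={excluded}, included={included}")
```

The three exclusion categories add up to the reported 84 exclusions, and subtracting them from the 105 full-text articles gives the 21 included studies.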
Fig. 1
PRISMA flow diagram showing study selection process for meta-analysis. a Included original articles for qualitative and quantitative analyses are references 16
Patient characteristics are shown in Table 1. The size of the study population ranged from 49 to 456 patients, with the percentage of those with PCa ranging from 37% to 100%. The patients had a median age of 62–69.6 yr, median PSA of 3.97–15 ng/ml, and a Gleason score ranging from 5 to 10. Patients had already been diagnosed with PCa prior to MRI in all or some of the study populations in seven studies [16
Table 1. Patient characteristics
Characteristics of the studies are summarized in Table 2. MRI was performed using 3-T scanners in 16 studies 16
a Blinded but aware that patients had biopsy-proven PCa.
b PI-RADSv2 scores were generated from reports based on PI-RADSv1 and simplified qualitative system.
c Epstein's criteria = Gleason pattern ≥4, or Gleason 3 + 3 disease with core length ≥50% and/or >2 cores positive.
d For transition zone, cutoff value = 4.
csPCa = clinically significant prostate cancer; GS = Gleason score; EPE = extraprostatic extension; MRGB = magnetic resonance imaging-guided biopsy; MRI = magnetic resonance imaging; MRI–TRUS = fusion of magnetic resonance imaging and transrectal ultrasound images; NR = not reported; PCa = prostate cancer; PI-RADSv2 = Prostate Imaging Reporting and Data System version 2; PZ = peripheral zone; RP = radical prostatectomy; STRUSGB = systematic transrectal ultrasound-guided biopsy; TTB = transperineal template biopsy; TZ = transition zone.
Table 2. Study characteristics
Overall, the quality of the studies was not considered high, mainly due to the patient selection domain (Fig. 2). Regarding the patient selection domain, there was generally a high risk of bias as all but four of the studies were retrospective in nature 16
The sensitivity and specificity of individual studies were 73–100% and 8–100%, respectively. The Q-test revealed that substantial heterogeneity was present (p < 0.001). The Higgins I² statistics demonstrated substantial heterogeneity in terms of the sensitivity (I² = 85.55%) and considerable heterogeneity in terms of the specificity (I² = 95.30%). The coupled forest plot of the sensitivity and specificity demonstrated the absence of a threshold effect (Fig. 3). The Spearman correlation coefficient between the sensitivity and the false-positive rate was 0.45 (95% confidence interval [CI] 0.023–0.738), also indicating the lack of a threshold effect.
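For readers unfamiliar with the Higgins statistic, it is conventionally derived from Cochran's Q as I² = max(0, (Q − df)/Q) × 100%, where df is the number of studies minus one. The sketch below implements that standard formula; the Q values used are hypothetical, because the source reports only the resulting I² percentages and the Q-test p value, and the assumption of df = 20 (21 studies) is ours.

```python
def i_squared(q, df):
    """Higgins I^2 in percent from Cochran's Q and its degrees of freedom."""
    if q <= 0:
        return 0.0
    return max(0.0, (q - df) / q) * 100.0


def q_from_i_squared(i2_percent, df):
    """Invert the formula: the Q that would yield a given I^2 (requires I^2 < 100)."""
    return df / (1.0 - i2_percent / 100.0)


DF = 20  # assumption: 21 studies minus one

# Hypothetical Q values for illustration only.
for q in (10.0, 40.0, 100.0):
    print(f"Q={q:.0f}, df={DF}: I^2={i_squared(q, DF):.1f}%")

# Q implied by the reported I^2 for sensitivity, under the df assumption above.
print(f"implied Q for I^2=85.55%: {q_from_i_squared(85.55, DF):.1f}")
```

When Q does not exceed its degrees of freedom, I² is truncated to 0%, meaning that the observed variation is no larger than sampling error alone would produce.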
Fig. 3
Coupled forest plots of pooled sensitivity and specificity. Numbers are pooled estimates with 95% CI in parentheses. Corresponding heterogeneity statistics are provided at bottom right corners. Horizontal lines indicate 95% CIs. CI = confidence intervals.
For all 21 studies combined, the pooled sensitivity was 0.89 (95% CI 0.86–0.92) with specificity of 0.73 (95% CI 0.60–0.83; Fig. 3). In the HSROC curve, there was a large difference between the 95% confidence region and the 95% prediction region, thus indicating heterogeneity between the studies (Fig. 4). The area under the HSROC curve was 0.91 (95% CI 0.88–0.93). According to the Deeks’ funnel plot, the likelihood of publication bias was low, with a p value of 0.75 for the slope coefficient (Fig. 5).
Fig. 4
Hierarchical summary receiver operating characteristic curve of the diagnostic performance of PI-RADSv2 for detecting prostate cancer. HSROC = hierarchical summary receiver operating characteristic; PI-RADSv2 = Prostate Imaging Reporting and Data System version 2.
Fig. 5
Deeks’ funnel plot. A p value of 0.75 suggests that the likelihood of publication bias is low.
For the six studies that provided a head-to-head comparison between PI-RADSv1 and PI-RADSv2, PI-RADSv2 demonstrated higher pooled sensitivity of 0.95 (95% CI 0.85–0.98) compared with 0.88 (95% CI 0.80–0.93) for PI-RADSv1 (p = 0.04). However, the pooled specificity did not show a significant difference between the two versions of PI-RADS: 0.73 (95% CI 0.47–0.89) for v2 and 0.75 (95% CI 0.36–0.94) for v1 (p = 0.90).
As we found considerable heterogeneity among the included studies, meta-regression analyses were performed (Supplementary Table 1). Among several potential variables, proportion of patients with PCa, magnetic field strength, and reference standard were significant factors affecting the heterogeneity (p < 0.01 for all). However, among these three, only specificity according to the proportion of patients with PCa showed a statistically significant and clinically meaningful difference: 0.65 (95% CI 0.52–0.78) in studies with ≤50% of patients with PCa versus 0.86 (95% CI 0.75–0.97) in studies with >50% of patients with PCa. Otherwise, no clinically meaningful differences were seen: for magnet strength (3 vs 1.5 T), sensitivity of 0.90 (95% CI 0.86–0.94) versus 0.89 (95% CI 0.81–0.97, p = 0.03) and specificity of 0.73 (95% CI 0.59–0.86) versus 0.72 (95% CI 0.44–1.00, p = 0.81); for reference standard (radical prostatectomy vs biopsy), sensitivity of 0.89 (95% CI 0.83–0.95) versus 0.91 (95% CI 0.88–0.94, p < 0.01) and specificity of 0.65 (95% CI 0.37–0.94) versus 0.73 (95% CI 0.58–0.87, p = 0.48). Other variables, including the cutoff value, use of endorectal coil, and type of analysis (per patient vs per lesion), were not significant factors (p = 0.32–0.70).
As there were four studies that used both ≥3 and ≥4 as cutoff values [17
Based on the localization of PCa, the pooled sensitivity was 0.93 (95% CI 0.87–0.96) with specificity of 0.68 (95% CI 0.43–0.86) in seven studies analyzing PZ cancers [20
Studies including patients without previous biopsies yielded sensitivity of 0.82 (95% CI 0.72–0.90) and specificity of 0.75 (95% CI 0.65–0.83), whereas the diagnostic performance values were 0.87 (95% CI 0.80–0.92) and 0.71 (95% CI 0.42–0.89) in studies including patients with a history of previous biopsy.
In our meta-analysis, we assessed the diagnostic accuracy of PI-RADSv2 for detecting PCa. The pooled sensitivity and specificity of all 21 studies were 0.89 (95% CI 0.86–0.92) and 0.73 (95% CI 0.60–0.83), respectively. When comparing our data with the only two existing meta-analyses using mpMRI for detecting PCa, a trend toward higher sensitivity and lower specificity can be inferred. In the study by de Rooij et al [37]
Our meta-analysis had some limitations. Nearly all studies were retrospective in study design, resulting in a high risk of bias for patient selection. It is possible that pooling data from predominantly retrospective studies may have led to increased
diagnostic sensitivity [40]
PI-RADSv2 shows good performance for the detection of PCa with pooled sensitivity of 0.89 and specificity of 0.73. PI-RADSv2 has higher pooled sensitivity compared with PI-RADSv1 without significantly different specificity.
Author contributions: Sang Youn Kim had full access to all the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis.
Study concept and design: Woo, Suh, S.Y. Kim.
Acquisition of data: Woo, Suh, S.Y. Kim.
Analysis and interpretation of data: Woo, Suh, S.Y. Kim.
Drafting of the manuscript: Woo.
Critical revision of the manuscript for important intellectual content: Suh, S.Y. Kim, Cho, S.H. Kim.
Statistical analysis: Suh.
Obtaining funding: None.
Administrative, technical, or material support: None.
Supervision: S.Y. Kim, Cho, S.H. Kim.
Other: None.
Financial disclosures: Sang Youn Kim certifies that all conflicts of interest, including specific financial interests and relationships and affiliations relevant to the subject matter or materials discussed in the manuscript (eg, employment/affiliation, grants or funding, consultancies, honoraria, stock ownership or options, expert testimony, royalties, or patents filed, received, or pending), are the following: None.
Funding/Support and role of the sponsor: None.
Prostate cancer (PCa) is the second leading cause of cancer-related mortality in Western men [1]
To bring standardization to the evaluation and reporting of mpMRI of the prostate, the European Society of Urogenital Radiology (ESUR) published a guideline termed Prostate Imaging Reporting and Data System (PI-RADS) in 2012 [5]
To address these issues, the ESUR and American College of Radiology recently released the updated PI-RADS version 2 (PI-RADSv2) [11]
Since then, several studies assessing the value of PI-RADSv2 have been published. However, the diagnostic performance of this new scoring system has not been evaluated systematically. Therefore, the purpose of our study was to assess the diagnostic performance of PI-RADSv2 for the detection of PCa. In addition, we aimed to compare the diagnostic performance of PI-RADSv1 and PI-RADSv2 in studies available for head-to-head comparison.
This meta-analysis was performed and written according to the preferred reporting items for systematic reviews and meta-analyses (PRISMA) guidelines [12]
A computerized search of MEDLINE and EMBASE up to December 7, 2016, was performed in order to identify studies evaluating the diagnostic performance of PI-RADSv2 for the detection of PCa. The search query combined synonyms for PCa, MRI, and PI-RADS as follows: (prostate cancer OR prostatic cancer OR prostate neoplasm OR prostatic neoplasm OR prostate tumor OR prostatic tumor OR prostate carcinoma OR prostatic carcinoma OR PCa) AND (magnetic resonance imaging OR MRI OR MR) AND (prostate imaging reporting and data system OR pi-rads OR pi rads OR pirads). Bibliographies of identified articles were also screened to expand the scope of search. Our search was limited to publications in English.
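The Boolean structure of the search query above can be sketched as follows. This is only an illustration of how the three synonym groups combine; the helper names `or_group` and `build_query` are hypothetical, and database-specific field tags or syntax are not modeled.

```python
# Synonym groups copied from the search strategy described in the text.
PCA_TERMS = [
    "prostate cancer", "prostatic cancer", "prostate neoplasm", "prostatic neoplasm",
    "prostate tumor", "prostatic tumor", "prostate carcinoma", "prostatic carcinoma", "PCa",
]
MRI_TERMS = ["magnetic resonance imaging", "MRI", "MR"]
PIRADS_TERMS = ["prostate imaging reporting and data system", "pi-rads", "pi rads", "pirads"]


def or_group(terms):
    """Join synonyms with OR and wrap the group in parentheses (hypothetical helper)."""
    return "(" + " OR ".join(terms) + ")"


def build_query(*groups):
    """Combine the parenthesized OR-groups with AND (hypothetical helper)."""
    return " AND ".join(or_group(group) for group in groups)


query = build_query(PCA_TERMS, MRI_TERMS, PIRADS_TERMS)
print(query)
```

A study must match at least one term from each of the three groups to be retrieved, which is why the groups are joined with AND and the synonyms within a group with OR.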
Studies were included if they satisfied all the following requirements according to the PICOS criteria [12]
Studies were excluded if any of the following criteria were met: (1) studies involving <10 patients; (2) review articles, guidelines, consensus statements, letters, editorials, and conference abstracts; (3) studies using only PI-RADSv1 for the evaluation of mpMRI of the prostate; (4) studies focusing on topics other than using the PI-RADSv2 system for diagnosing PCa (ie, staging and prediction of biochemical recurrence); and (5) studies with overlapping patient population.
Two reviewers (S.W. and C.H.S., with 3 yr of experience in performing systematic reviews and meta-analyses) independently evaluated the eligibility of the selected studies from the literature. Disagreements, if present between the two reviewers, were resolved by consensus via discussion with a third reviewer (S.Y.K.).
We extracted the following data regarding study design and results from the selected studies using a standardized form:
Study characteristics: authors, year of publication, country and institution of origin, duration of patient recruitment, and study design (prospective vs retrospective and consecutive or not)
Demographic and clinical characteristics: sample size, number of patients with PCa, patient age, prostate-specific antigen (PSA) level and Gleason score, number of previous biopsies, and PCa diagnosis prior to mpMRI
Technical characteristics of mpMRI: scanner model and manufacturer, magnetic field strength (1.5 vs 3 T), coil type (endorectal vs pelvic phased array), and specific sequences used (T2WI, DWI, DCE, or MR spectroscopy)
Interpretation of mpMRI: number of reviewers and experience in prostate mpMRI, independent or consensus reading, and blinding to clinicopathological information
Reference standard: type of reference standard (radical prostatectomy, targeted biopsy, or systematic biopsy), interval between MRI and pathology, outcomes assessed (any PCa vs csPCa), definition of csPCa (studies assessing “clinically significant”, “aggressive”, or “high-grade” PCa were all considered to assess csPCa; however, only studies that used the definition provided by the PI-RADSv2 guideline [Gleason score ≥7 (including 3 + 4), volume >0.5 ml, or extraprostatic extension] were considered not to have concern for applicability), separate analysis for the PZ and TZ, and type of analysis (per patient vs per lesion)
Diagnostic performance of PI-RADSv2 including criteria or cutoff values (in case of multiple readers, the results of the most experienced reader were extracted for this meta-analysis)
The methodological quality of the included studies was assessed using tailored questionnaires and criteria provided by Quality Assessment of Diagnostic Accuracy Studies-2 [13]
The diagnostic performance of PI-RADSv2 for the detection of PCa was the primary outcome for this meta-analysis. In addition, a comparison between the diagnostic performance of PI-RADSv2 and that of PI-RADSv1 using studies that reported head-to-head comparison data of the two PI-RADS versions was considered a secondary outcome.
Pooled estimates of sensitivity and specificity were calculated using hierarchical logistic regression modeling including bivariate modeling and hierarchical summary receiver operating characteristic (HSROC) modeling [14]
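As a rough illustration of how per-study 2 × 2 tables feed into pooled estimates, the sketch below pools sensitivity and specificity separately on the logit scale with a DerSimonian-Laird random-effects model. This is a deliberately simplified univariate stand-in, not the bivariate or HSROC model used in this meta-analysis (which models sensitivity and specificity jointly and accounts for their correlation). The 2 × 2 counts and all function names are hypothetical.

```python
import math


def inv_logit(x):
    """Back-transform a logit to a proportion."""
    return 1 / (1 + math.exp(-x))


def logit_proportion(events, non_events):
    """Logit of events/(events + non_events) and its approximate variance.

    A 0.5 continuity correction is added to both cells to avoid division by zero.
    """
    e, n = events + 0.5, non_events + 0.5
    return math.log(e / n), 1 / e + 1 / n


def pool_random_effects(estimates, variances):
    """DerSimonian-Laird random-effects pooled estimate and its standard error."""
    w = [1 / v for v in variances]
    fixed = sum(wi * yi for wi, yi in zip(w, estimates)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, estimates))
    df = len(estimates) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0  # between-study variance
    w_star = [1 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_star, estimates)) / sum(w_star)
    return pooled, math.sqrt(1 / sum(w_star))


def pooled_proportion(pairs):
    """Pooled proportion with an approximate 95% CI from (events, non_events) pairs."""
    ys, vs = zip(*(logit_proportion(e, n) for e, n in pairs))
    pooled, se = pool_random_effects(ys, vs)
    return inv_logit(pooled), inv_logit(pooled - 1.96 * se), inv_logit(pooled + 1.96 * se)


# Hypothetical 2 x 2 counts per study: (TP, FP, FN, TN). Not data from the included studies.
studies = [(90, 20, 10, 80), (45, 30, 5, 70), (120, 15, 20, 60)]

sens = pooled_proportion([(tp, fn) for tp, fp, fn, tn in studies])  # TP / (TP + FN)
spec = pooled_proportion([(tn, fp) for tp, fp, fn, tn in studies])  # TN / (TN + FP)
print("sensitivity (estimate, lower, upper):", sens)
print("specificity (estimate, lower, upper):", spec)
```

Pooling on the logit scale keeps the back-transformed estimates and confidence limits within 0 and 1, which is also the scale on which the hierarchical models operate.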
We performed meta-regression analyses to investigate the cause of heterogeneity. The following covariates were considered for the bivariate model: (1) proportion of patients with PCa (>50% vs ≤50%), (2) magnet strength of MRI (3 vs 1.5 T), (3) use of endorectal coil, (4) cutoff value (≥4 vs ≥3), (5) reference standard (radical prostatectomy vs biopsy), and (6) type of analysis (per patient vs per lesion). In addition, multiple subgroup analyses were performed for cutoff value, outcome, and previous biopsy history to assess various clinical settings: (1) a cutoff value of ≥4 for all studies, (2) a cutoff value of ≥3 for all studies, (3) a cutoff value of ≥4 for determining any PCa, (4) a cutoff value of ≥3 for determining any PCa, (5) a cutoff value of ≥4 for determining csPCa, (6) a cutoff value of ≥3 for determining csPCa, (7) a cutoff value of ≥4 in studies using per-patient analysis, (8) a cutoff value of ≥4 in studies using per-lesion analysis, (9) studies analyzing PZ PCa, (10) studies analyzing TZ PCa, (11) patients without previous biopsies, and (12) patients with previous biopsies. The “metandi” and “midas” modules in Stata 10.0 (StataCorp LP, College Station, TX, USA) and “mada” package in R software version 3.2.1 (R Foundation for Statistical Computing, Vienna, Austria) were used for statistical analyses, with p < 0.05 signifying statistical significance.
A systematic literature search initially identified 287 articles. After removing 46 duplicates, screening of the 241 titles and abstracts yielded 105 potentially eligible articles. Full-text reviews were performed, and 84 studies were excluded for the following reasons: not in the field of interest (n = 80, including 68 studies that used only PI-RADSv1), insufficient data to reconstruct 2 × 2 tables (n = 2), and shared study population with other studies (n = 2). Ultimately, 21 original articles including a total of 3857 patients assessing the diagnostic performance of PI-RADSv2 were included in the meta-analysis 16
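The counts in the study selection process can be cross-checked with simple arithmetic. The short sketch below uses only the numbers reported above; the variable names are illustrative.

```python
# Counts reported in the study selection process.
identified = 287
duplicates = 46
screened = identified - duplicates  # titles and abstracts screened

full_text_reviewed = 105
full_text_exclusions = {
    "not in the field of interest": 80,
    "insufficient data to reconstruct 2 x 2 tables": 2,
    "shared study population with other studies": 2,
}
excluded = sum(full_text_exclusions.values())
included = full_text_reviewed - excluded

print(f"screened: {screened}, excluded at full text: {excluded}, included: {included}")
```

The derived values agree with the 241 screened records, 84 full-text exclusions, and 21 included studies stated in the text.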
Fig. 1
PRISMA flow diagram showing study selection process for meta-analysis. a Included original articles for qualitative and quantitative analyses are references 16
Patient characteristics are shown in Table 1. The size of the study population ranged from 49 to 456 patients, with the percentage of those with PCa ranging from 37% to 100%. The patients had a median age of 62–69.6 yr, median PSA of 3.97–15 ng/ml, and a Gleason score ranging from 5 to 10. Patients had already been diagnosed with PCa prior to MRI in all or some of the study populations in seven studies [16
Table 1. Patient characteristics
Characteristics of the studies are summarized in Table 2. MRI was performed using 3-T scanners in 16 studies 16
a Blinded but aware that patients had biopsy-proven PCa.
b PI-RADSv2 scores were generated from reports based on PI-RADSv1 and simplified qualitative system.
c Epstein's criteria = Gleason pattern ≥4, or Gleason 3 + 3 disease with core length ≥50% and/or >2 cores positive.
d For transition zone, cutoff value = 4.
csPCa = clinically significant prostate cancer; GS = Gleason score; EPE = extraprostatic extension; MRGB = magnetic resonance imaging-guided biopsy; MRI = magnetic resonance imaging; MRI–TRUS = fusion of magnetic resonance imaging and transrectal ultrasound images; NR = not reported; PCa = prostate cancer; PI-RADSv2 = Prostate Imaging Reporting and Data System version 2; PZ = peripheral zone; RP = radical prostatectomy; STRUSGB = systematic transrectal ultrasound-guided biopsy; TTB = transperineal template biopsy; TZ = transition zone.
Table 2. Study characteristics
Overall, the quality of the studies was not considered high, mainly due to the patient selection domain (Fig. 2). Regarding the patient selection domain, there was generally a high risk of bias as all but four of the studies were retrospective in nature 16
The sensitivity and specificity of individual studies were 73–100% and 8–100%, respectively. The Q-test revealed that substantial heterogeneity was present (p < 0.001). The Higgins I² statistic demonstrated substantial heterogeneity in terms of sensitivity (I² = 85.55%) and considerable heterogeneity in terms of specificity (I² = 95.30%). The coupled forest plot of the sensitivity and specificity demonstrated the absence of a threshold effect (Fig. 3). The Spearman correlation coefficient between the sensitivity and the false-positive rate was 0.45 (95% confidence interval [CI] 0.023–0.738), also indicating the lack of a threshold effect.
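For readers less familiar with the heterogeneity statistics reported above, a minimal sketch of Cochran's Q and the Higgins I² statistic is given below. It assumes study-level estimates on a common scale (for example, logit sensitivity) with known within-study variances. The input values are hypothetical and the function names are illustrative; this is not the output or internals of the Stata modules used in this study.

```python
def cochran_q(estimates, variances):
    """Cochran's Q: weighted sum of squared deviations from the fixed-effect pooled estimate."""
    weights = [1 / v for v in variances]
    pooled = sum(w * y for w, y in zip(weights, estimates)) / sum(weights)
    return sum(w * (y - pooled) ** 2 for w, y in zip(weights, estimates))


def higgins_i2(q, n_studies):
    """I^2 in percent: share of total variability attributed to between-study heterogeneity."""
    df = n_studies - 1
    if q <= 0:
        return 0.0
    return max(0.0, (q - df) / q) * 100


# Hypothetical logit-sensitivity estimates and within-study variances (not data from this study).
logit_sens = [2.0, 2.5, 1.5, 3.1]
variances = [0.04, 0.05, 0.06, 0.10]

q = cochran_q(logit_sens, variances)
i2 = higgins_i2(q, len(logit_sens))
print(f"Q = {q:.2f}, I2 = {i2:.1f}%")
```

The p value of the Q-test would come from a chi-square distribution with k − 1 degrees of freedom (k being the number of studies); it is left out here to keep the sketch within the standard library.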
Fig. 3
Coupled forest plots of pooled sensitivity and specificity. Numbers are pooled estimates with 95% CI in parentheses. Corresponding heterogeneity statistics are provided at bottom right corners. Horizontal lines indicate 95% CIs. CI = confidence interval.
For all 21 studies combined, the pooled sensitivity was 0.89 (95% CI 0.86–0.92) with specificity of 0.73 (95% CI 0.60–0.83; Fig. 3). In the HSROC curve, there was a large difference between the 95% confidence region and the 95% prediction region, thus indicating heterogeneity between the studies (Fig. 4). The area under the HSROC curve was 0.91 (95% CI 0.88–0.93). According to Deeks’ funnel plot, the likelihood of publication bias was low, with a p value of 0.75 for the slope coefficient (Fig. 5).
Fig. 4
Hierarchical summary receiver operating characteristic curve of the diagnostic performance of PI-RADSv2 for detecting prostate cancer. HSROC = hierarchical summary receiver operating characteristic; PI-RADSv2 = Prostate Imaging Reporting and Data System version 2.
Fig. 5
Deeks’ funnel plot. A p value of 0.75 suggests that the likelihood of publication bias is low.
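The slope coefficient referred to in Deeks’ funnel plot comes from a regression of the log diagnostic odds ratio (ln DOR) on the inverse square root of the effective sample size (ESS), weighted by ESS. A minimal sketch under these assumptions is shown below. The 2 × 2 counts are hypothetical, the function names are illustrative, and the code is not the routine used to produce Fig. 5.

```python
import math


def deeks_inputs(tp, fp, fn, tn):
    """Return (1/sqrt(ESS), ln DOR, ESS) for one study.

    ESS = 4 * n_diseased * n_non_diseased / (n_diseased + n_non_diseased).
    A 0.5 continuity correction is applied to the cells used for ln DOR only.
    """
    diseased, non_diseased = tp + fn, fp + tn
    ess = 4 * diseased * non_diseased / (diseased + non_diseased)
    ln_dor = math.log(((tp + 0.5) * (tn + 0.5)) / ((fp + 0.5) * (fn + 0.5)))
    return 1 / math.sqrt(ess), ln_dor, ess


def weighted_slope(x, y, w):
    """Weighted least-squares fit of y on x; returns (intercept, slope, slope standard error)."""
    sw = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    rss = sum(wi * (yi - intercept - slope * xi) ** 2 for wi, xi, yi in zip(w, x, y))
    se = math.sqrt(rss / (len(x) - 2) / sxx)  # requires more than two studies
    return intercept, slope, se


# Hypothetical 2 x 2 counts per study: (TP, FP, FN, TN). Not data from the included studies.
studies = [(90, 20, 10, 80), (45, 30, 5, 70), (120, 15, 20, 60), (30, 8, 6, 40)]

xs, ys, ws = zip(*(deeks_inputs(*s) for s in studies))
deeks_intercept, deeks_slope, deeks_se = weighted_slope(xs, ys, ws)
print(f"slope = {deeks_slope:.3f}, standard error = {deeks_se:.3f}")
```

A slope that is not significantly different from zero (judged with a t test on k − 2 degrees of freedom, omitted here) suggests a symmetric funnel plot and therefore a low likelihood of publication bias.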
For the six studies that provided a head-to-head comparison between PI-RADSv1 and PI-RADSv2, PI-RADSv2 demonstrated higher pooled sensitivity of 0.95 (95% CI 0.85–0.98) compared with 0.88 (95% CI 0.80–0.93) for PI-RADSv1 (p = 0.04). However, the pooled specificity did not show a significant difference between the two versions of PI-RADS: 0.73 (95% CI 0.47–0.89) for v2 and 0.75 (95% CI 0.36–0.94) for v1 (p = 0.90).
As we found considerable heterogeneity among the included studies, meta-regression analyses were performed (Supplementary Table 1). Among several potential variables, the proportion of patients with PCa, magnetic field strength, and reference standard were significant factors affecting the heterogeneity (p < 0.01 for all). However, among these three, only specificity according to the proportion of patients with PCa showed a statistically significant and clinically meaningful difference: 0.65 (95% CI 0.52–0.78) in studies with ≤50% of patients with PCa versus 0.86 (95% CI 0.75–0.97) in studies with >50% of patients with PCa. Otherwise, no clinically meaningful differences were seen: for magnet strength (3 vs 1.5 T), sensitivity of 0.90 (95% CI 0.86–0.94) versus 0.89 (95% CI 0.81–0.97, p = 0.03) and specificity of 0.73 (95% CI 0.59–0.86) versus 0.72 (95% CI 0.44–1.00, p = 0.81); for reference standard (radical prostatectomy vs biopsy), sensitivity of 0.89 (95% CI 0.83–0.95) versus 0.91 (95% CI 0.88–0.94, p < 0.01) and specificity of 0.65 (95% CI 0.37–0.94) versus 0.73 (95% CI 0.58–0.87, p = 0.48). Other variables, including the cutoff value, use of endorectal coil, and type of analysis (per patient vs per lesion), were not significant factors (p = 0.32–0.70).
As there were four studies that used both ≥3 and ≥4 as cutoff values [17
Based on the localization of PCa, the pooled sensitivity was 0.93 (95% CI 0.87–0.96) with specificity of 0.68 (95% CI 0.43–0.86) in seven studies analyzing PZ cancers [20
Studies including patients without previous biopsies yielded sensitivity of 0.82 (95% CI 0.72–0.90) and specificity of 0.75 (95% CI 0.65–0.83), whereas the diagnostic performance values were 0.87 (95% CI 0.80–0.92) and 0.71 (95% CI 0.42–0.89) in studies including patients with a history of previous biopsy.
In our meta-analysis, we assessed the diagnostic accuracy of PI-RADSv2 for detecting PCa. The pooled sensitivity and specificity of all 21 studies were 0.89 (95% CI 0.86–0.92) and 0.73 (95% CI 0.60–0.83), respectively. When comparing our data with the only two existing meta-analyses using mpMRI for detecting PCa, a trend toward higher sensitivity and lower specificity can be inferred. In the study by de Rooij et al [37]
Considering that one of the main intentions for the generation of PI-RADS was to standardize reporting of mpMRI in order to decrease variability and bring about widespread acceptance and implementation in daily practice, it was promising to find that nearly all (20 of 21) studies used PI-RADSv2 strictly according to published guidelines [11]
In the current study, subgroup analyses were performed to account for differences in outcomes (any cancer vs clinically significant cancer). There was no significant difference for using either outcome irrespective of whether the criteria of ≥3 or ≥4 were used. However, the definition of clinically significant cancer was different among the 13 studies. Only three studies defined csPCa strictly according to the PI-RADSv2 guidelines (Gleason score ≥7 [including 3 + 4], volume >0.5 ml, or extraprostatic extension) [11]
In this meta-analysis, we looked into the technical aspects of MRI. Meta-regression analyses revealed that the use of endorectal coil was not a statistically significant factor. Furthermore, although magnet strength showed statistically significant differences between 3 and 1.5 T, this difference did not prove to be clinically meaningful (sensitivity of 0.90 vs 0.89, respectively; p = 0.03). Although there had been debate over these two issues in the past, both 3 and 1.5 T are now well established, and the overall benefit of using an endorectal coil is not evident [38
Regarding the methods of analysis in the studies, there was significant heterogeneity regarding reference standard and type of analysis. Radical prostatectomy was the reference standard in five studies, while the majority were based on a combination of systematic and targeted biopsies. The possibility of PCa despite negative biopsy results in the latter group should be kept in mind. In addition, approximately half of the studies reported outcomes in a per-patient manner (n = 11) and the other half in a per-lesion manner (n = 10). Per-lesion analysis is known to take into account the performance of localizing the disease; however, this was not shown to be a significant factor in the meta-regression analysis.
Our meta-analysis had some limitations. Nearly all studies were retrospective in study design, resulting in a high risk of bias for patient selection. It is possible that pooling data from predominantly retrospective studies may have led to increased diagnostic sensitivity [40]
PI-RADSv2 shows good performance for the detection of PCa with pooled sensitivity of 0.89 and specificity of 0.73. PI-RADSv2 has higher pooled sensitivity compared with PI-RADSv1 without significantly different specificity.
Author contributions: Sang Youn Kim had full access to all the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis.
Study concept and design: Woo, Suh, S.Y. Kim.
Acquisition of data: Woo, Suh, S.Y. Kim.
Analysis and interpretation of data: Woo, Suh, S.Y. Kim.
Drafting of the manuscript: Woo.
Critical revision of the manuscript for important intellectual content: Suh, S.Y. Kim, Cho, S.H. Kim.
Statistical analysis: Suh.
Obtaining funding: None.
Administrative, technical, or material support: None.
Supervision: S.Y. Kim, Cho, S.H. Kim.
Other: None.
Financial disclosures: Sang Youn Kim certifies that all conflicts of interest, including specific financial interests and relationships and affiliations relevant to the subject matter or materials discussed in the manuscript (eg, employment/affiliation, grants or funding, consultancies, honoraria, stock ownership or options, expert testimony, royalties, or patents filed, received, or pending), are the following: None.
Funding/Support and role of the sponsor: None.