Artificial intelligence for diagnosis and grading of prostate cancer in biopsies: a population-based, diagnostic study

Peter Ström,
Kimmo Kartasalo,
Henrik Olsson,
Leslie Solorzano,
Brett Delahunt,
Daniel M Berney,
David G Bostwick,
Andrew J Evans,
David J Grignon,
Peter A Humphrey,
Kenneth A Iczkowski,
James G Kench,
Glen Kristiansen,
Theodorus H van der Kwast,
Katia R M Leite,
Jesse K McKenney,
Jon Oxley,
Chin-Chen Pan,
Hemamali Samaratunga,
John R Srigley,
Hiroyuki Takahashi,
Toyonori Tsuzuki,
Murali Varma,
Ming Zhou,
Johan Lindberg,
Cecilia Lindskog,
Pekka Ruusuvuori,
Carolina Wählby,
Henrik Grönberg,
Mattias Rantalainen,
Lars Egevad,
Martin Eklund

Publication: Lancet Oncology, January 2020

https://www.thelancet.com/journals/lanonc/article/PIIS1470-2045(19)30738-7/fulltext#articleInformation

Background

An increasing volume of prostate biopsies and a worldwide shortage of urological pathologists put a strain on pathology departments. Additionally, the high intra-observer and inter-observer variability in grading can result in overtreatment and undertreatment of prostate cancer. To alleviate these problems, we aimed to develop an artificial intelligence (AI) system with clinically acceptable accuracy for prostate cancer detection, localisation, and Gleason grading.

Methods

We digitised 6682 slides from needle core biopsies from 976 randomly selected participants aged 50–69 in the Swedish prospective and population-based STHLM3 diagnostic study done between May 28, 2012, and Dec 30, 2014 (ISRCTN84445406), and another 271 from 93 men from outside the study. The resulting images were used to train deep neural networks for assessment of prostate biopsies. The networks were evaluated by predicting the presence, extent, and Gleason grade of malignant tissue for an independent test dataset comprising 1631 biopsies from 246 men from STHLM3 and an external validation dataset of 330 biopsies from 73 men. We also evaluated grading performance on 87 biopsies individually graded by 23 experienced urological pathologists from the International Society of Urological Pathology. We assessed discriminatory performance by receiver operating characteristics and tumour extent predictions by correlating predicted cancer length against measurements by the reporting pathologist. We quantified the concordance between grades assigned by the AI system and the expert urological pathologists using Cohen’s kappa.
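As an illustration of the concordance measure described above, the following minimal Python sketch computes Cohen's kappa between two raters and then averages it over the pairings of one rater with each of the others. It assumes the unweighted form of kappa, since the abstract does not state whether a weighted variant was used. The function names `cohen_kappa` and `mean_pairwise_kappa` and the toy grades are hypothetical and are not taken from the study.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters labelling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: fraction of items given identical labels.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the raters' marginal label frequencies.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    expected = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single, identical label throughout
    return (observed - expected) / (1.0 - expected)


def mean_pairwise_kappa(target, others):
    """Average kappa between one rater (e.g. the AI) and each other rater."""
    if not others:
        raise ValueError("at least one comparison rater is required")
    return sum(cohen_kappa(target, other) for other in others) / len(others)


if __name__ == "__main__":
    # Toy grade groups for six biopsies (illustrative values only).
    ai_grades = [1, 2, 3, 1, 2, 3]
    pathologists = [
        [1, 2, 3, 1, 2, 1],
        [1, 2, 3, 1, 2, 3],
    ]
    print(mean_pairwise_kappa(ai_grades, pathologists))
```

Computing the same average for each pathologist against the rest of the panel would give a range of values against which the AI's mean pairwise kappa can be compared.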

Findings

The AI achieved an area under the receiver operating characteristics curve of 0·997 (95% CI 0·994–0·999) for distinguishing between benign (n=910) and malignant (n=721) biopsy cores on the independent test dataset and 0·986 (0·972–0·996) on the external validation dataset (benign n=108, malignant n=222). The correlation between cancer length predicted by the AI and assigned by the reporting pathologist was 0·96 (95% CI 0·95–0·97) for the independent test dataset and 0·87 (0·84–0·90) for the external validation dataset. For assigning Gleason grades, the AI achieved a mean pairwise kappa of 0·62, which was within the range of the corresponding values for the expert pathologists (0·60–0·73).
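To make the discrimination metric concrete, the sketch below computes an area under the receiver operating characteristics curve directly from its rank interpretation: the probability that a randomly chosen malignant core receives a higher score than a randomly chosen benign core, with ties counted as one half. The function name `roc_auc` and the example scores are hypothetical. This is only an illustration of the metric, not the authors' evaluation code, and it does not reproduce the reported confidence intervals.

```python
def roc_auc(scores_benign, scores_malignant):
    """AUC as the fraction of (malignant, benign) pairs ranked correctly.

    Ties between a malignant and a benign score count as half a correct pair.
    """
    if not scores_benign or not scores_malignant:
        raise ValueError("both classes need at least one score")
    correct = 0.0
    for m in scores_malignant:
        for b in scores_benign:
            if m > b:
                correct += 1.0
            elif m == b:
                correct += 0.5
    return correct / (len(scores_malignant) * len(scores_benign))


if __name__ == "__main__":
    # Illustrative per-core malignancy scores (not study data).
    benign = [0.1, 0.2, 0.3]
    malignant = [0.25, 0.8, 0.9]
    print(roc_auc(benign, malignant))
```

The pairwise loop is quadratic in the number of cores, which is acceptable for a sketch. A sort-based rank computation would be preferable for large datasets.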

Interpretation

An AI system can be trained to detect and grade cancer in prostate needle biopsy samples at a ranking comparable to that of international experts in prostate pathology. Clinical application could reduce pathology workload by reducing the assessment of benign biopsies and by automating the task of measuring cancer length in positive biopsy cores. An AI system with expert-level grading performance might contribute a second opinion, aid in standardising grading, and provide pathology expertise in parts of the world where it does not exist.

Funding

Swedish Research Council, Swedish Cancer Society, Swedish eScience Research Center, EIT Health.

To determine whether someone has prostate cancer, a doctor takes a prostate biopsy. The biopsy is examined under a microscope to distinguish benign from malignant tissue. The result informs not only the diagnosis but also the treatment plan. However, human assessment is variable: different doctors, or even the same doctor examining the same material more than once, may reach different conclusions, which can lead to over- or undertreatment of prostate cancer.

In this study, the authors evaluated, for the first time, a tool that until recently was familiar mainly from science fiction: artificial intelligence (AI). The AI-assisted prostate evaluation serves the same function as the doctor’s microscopic examination. The authors found that the AI system performed similarly to experienced doctors in detecting prostate cancer and grading the malignant tissue. They concluded that implementing such an AI system in daily routine could reduce the number of missed cancer diagnoses and decrease the doctor’s workload.

Dr. Ploussard

The pathological assessment of prostate biopsies for the diagnosis of prostate cancer is subject to high inter-observer and intra-observer variability in grading. This can result in overtreatment and undertreatment of prostate cancer. However, to date, no automated estimation of tumour burden in biopsies has been reported.

In the present series, the authors used the prospective, population-based STHLM3 diagnostic study in Sweden to evaluate the usefulness of an artificial intelligence (AI) system for diagnostic and grading purposes. In that study, patients underwent a systematic 10–12-core biopsy scheme.

The deep neural networks were trained on needle core biopsies from more than one thousand participants to assess the presence, grade, and extent of malignant tissue. The performance of the AI system was then tested in an independent dataset comprising 1631 biopsies from 246 men from the STHLM3 study, and external validation was done in a dataset of 330 biopsies from 73 men. Grading performance was also evaluated on 87 biopsies individually graded by 23 experienced urological pathologists. Concordance between the expert pathologists and the AI system was quantified using Cohen’s kappa.

The authors found that the trained AI system achieved high performance in the detection of prostate cancer. The area under the receiver operating characteristics curve was 0.997 in the independent test dataset and 0.986 in the external validation dataset. The correlation coefficient for cancer length assessment was 0.96 (0.87 in the external validation dataset). For Gleason grading, the mean pairwise kappa achieved by the AI system was 0.62, which was within the range obtained by the expert pathologists (0.60 to 0.73).
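The cancer length comparison can be illustrated with a short Python sketch that correlates AI-predicted lengths with pathologist-measured lengths. It assumes Pearson's correlation coefficient, because the text does not state which coefficient was used. The function name `pearson_r` and the example lengths in millimetres are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length with at least 2 items")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    # Illustrative cancer lengths in mm per biopsy core (not study data).
    predicted_mm = [0.0, 1.5, 4.0, 7.5, 10.0]
    measured_mm = [0.0, 2.0, 3.5, 8.0, 9.5]
    print(pearson_r(predicted_mm, measured_mm))
```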

This study reports for the first time the potential benefits of an AI-assisted prostate pathology evaluation. The algorithm reached a performance comparable to that of experienced pathologists in detection, tumour burden estimation, and grading of malignant tissue.

Thus, an AI system could assist pathologists in routine practice by decreasing workload through automated pre-screening, by improving the detection of prostate cancer, and by reducing variability in grading. Prospective validation is warranted.