In-depth Retrospective Review of Originally Negative Screening Mammograms from Women with Confirmed Breast Cancer

Objectives: We aim to contribute to the assessment of the screening performance in Flanders (Belgium) and to identify valuable mammograms for subsequent studies and training. Materials and Methods: Initially negative prior screening mammograms (sMx) of 210 women with confirmed breast cancer detected by the Flemish screening programme between 2011 and 2013 were reviewed by a highly experienced radiologist. The review of the prior sMx was performed in three steps: 1) only prior mammograms available; 2) with index sMx (= subsequent positive sMx) present; 3) with index sMx and clinical information present. Results: The radiological review yielded 94 (45%) mammograms 'without suspicious lesions', 77 (37%) 'with minimal signs in at least one breast', and 39 (19%) 'with clearly visible tumours'. In univariate analyses, the reclassification of prior sMx was significantly associated with the date of the prior sMx, the need for a third reader for arbitration, image quality, and the detector system used (computed radiography versus direct readout digital radiography). It was not associated with the interval between screening rounds, age at prior sMx, breast density, or tumour characteristics (<T2 versus ≥T2, in situ versus invasive). In multivariate analyses, the date of the prior sMx (p = 0.001), need for arbitration (p = 0.001), and image quality (p < 0.001) remained significantly associated with the reclassification. Conclusion: This retrospective review reclassified 19% of the sMx as clearly visible tumours. This places the performance of the Flemish screening programme in line with that reported in similar studies. The sMx reviewed in this study form a valuable set of mammograms for training and further research.


INTRODUCTION
Breast cancer screening programmes have substantially increased the number of early detected cancers [1]. However, studies have made clear that current screening programmes only capture about 70% of all breast cancers that occur in participating women [2][3][4].
To improve cancer detection by mammography screening the European guidelines advise quality control using predefined performance indicators and quality assurance including review and training. An important performance indicator is rating interval cancers (breast cancers arising after a negative screening episode and before the next scheduled screening round). Performing a radiological review of prior screening mammograms (sMx) of interval cancers is part of the quality assurance and also an important teaching tool [1]. Screen-detected cancers have different characteristics than interval cancers [3,5], and it is therefore useful to also review the priors of screen-detected cancers in order to improve the programme's quality [1,6].
This study comprises a review of confirmed breast cancer cases detected by the Flemish screening programme. The aims were to quantify the proportion of visible tumours on the prior sMx and to gather insight into associated variables that may hinder cancer detection, such as breast density, age, image quality, imaging technique, tumour size, type of tumour, need of arbitration, screening interval, and date of prior sMx. The study also aimed to identify a valuable set of sMx for training and subsequent studies.

MATERIALS AND METHODS
In the breast cancer screening programme in Flanders, biennial two-view mammographic screening is offered free of charge to women aged 50-69 years. Two radiologists (first and second reader) independently evaluate the screening mammograms, with third reader arbitration if needed.
Between 2009 and 2013, 254,350 women participated in the Flemish Breast Cancer Screening Programme. From this group, cases for review were selected based on the following inclusion criteria: 1) informed consent for use of data in scientific research; 2) participation in at least two consecutive screening rounds; 3) a screening interval of 16-30 months; 4) an index sMx (latest sMx) in 2011, 2012, or 2013 that resulted in a referral for further diagnostic workup, which confirmed and correctly documented breast cancer; 5) a prior sMx (previous sMx) that was considered negative; and 6) index and prior sMx that were digital and available in the PACS (Picture Archiving and Communication System) at the Centre for Prevention and Early Detection of Cancer. In total, 292 cases met these inclusion criteria. From these, a sample of the predefined size of 210 was drawn using the standard SPSS algorithms for random selection.
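The random selection step can be illustrated with a minimal sketch. The study used SPSS for this step; the Python code below is only an analogous illustration of simple random sampling without replacement, and the case identifiers, the function name `select_cases`, and the seed are hypothetical.

```python
import random

def select_cases(case_ids, sample_size, seed):
    """Draw a simple random sample without replacement."""
    rng = random.Random(seed)  # a fixed seed makes the draw reproducible
    return sorted(rng.sample(case_ids, sample_size))

# Hypothetical identifiers standing in for the 292 eligible cases.
eligible = list(range(1, 293))
selected = select_cases(eligible, 210, seed=2013)
```

Every eligible case has the same probability of being selected, and recording the seed allows the same sample to be redrawn later.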
The 210 prior sMx were thoroughly reviewed by a single, highly experienced radiologist (reading more than 10,000 sMx per year since 2006). The review followed a stepwise procedure: 1) review of the prior sMx in the absence of other images or information; 2) review of the prior sMx with the index sMx (subsequent positive screening mammogram) present; and 3) review of the prior sMx with the index sMx and clinical information on tumour localization and characteristics (size, type, and stage) from diagnostic follow-up present. All steps were performed per case in succession. The expert radiologist reviewed all prior sMx for the presence of malignancy, the image quality, and breast density. The reviewing radiologist was not informed of the purpose of the study.
Because of the limited number of clearly visible tumours in the intermediate and final classification, bootstrap validation with bias-corrected and accelerated (BCa) bootstrap intervals was performed. Statistical significance was set at p < 0.05.
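A minimal sketch of how a bias-corrected and accelerated bootstrap interval is constructed is given here. The text does not state which statistic was bootstrapped or which software was used, so the choice of statistic (the proportion of clearly visible tumours, 39 of 210), the number of resamples, the seed, and all function names are assumptions made purely for illustration.

```python
import random
from statistics import NormalDist

def proportion(xs):
    return sum(xs) / len(xs)

def bca_interval(data, stat=proportion, n_boot=2000, alpha=0.05, seed=0):
    """Bias-corrected and accelerated (BCa) bootstrap interval for stat(data)."""
    rng = random.Random(seed)
    n = len(data)
    nd = NormalDist()
    theta_hat = stat(data)

    # Bootstrap distribution: the statistic on resamples drawn with replacement.
    boots = sorted(stat(rng.choices(data, k=n)) for _ in range(n_boot))

    # Bias correction z0; ties count half because a proportion is discrete.
    below = sum(b < theta_hat for b in boots)
    ties = sum(b == theta_hat for b in boots)
    p0 = (below + 0.5 * ties) / n_boot
    p0 = min(max(p0, 1 / (n_boot + 1)), n_boot / (n_boot + 1))
    z0 = nd.inv_cdf(p0)

    # Acceleration from leave-one-out (jackknife) estimates.
    jack = [stat(data[:i] + data[i + 1:]) for i in range(n)]
    jack_mean = sum(jack) / n
    num = sum((jack_mean - j) ** 3 for j in jack)
    den = 6 * sum((jack_mean - j) ** 2 for j in jack) ** 1.5
    accel = num / den if den else 0.0

    def adjusted_index(q):
        # Shift the nominal percentile q using z0 and the acceleration.
        z = nd.inv_cdf(q)
        adj = nd.cdf(z0 + (z0 + z) / (1 - accel * (z0 + z)))
        return min(n_boot - 1, max(0, int(adj * n_boot)))

    return boots[adjusted_index(alpha / 2)], boots[adjusted_index(1 - alpha / 2)]

# Illustrative data only: 1 = clearly visible tumour, 0 = otherwise.
cases = [1] * 39 + [0] * 171
lower, upper = bca_interval(cases)
```

Compared with a plain percentile interval, the BCa construction shifts the percentile cut-offs to account for bias and skewness in the bootstrap distribution, which matters when one outcome group is small.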
In the multivariate analysis, the group of clearly visible tumours was first compared with the compound group of minimal and no signs. Subsequently, the group of clearly visible tumours was compared with the no signs group only. Table 1 lists data from prior and index sMx and diagnostic follow-up.

DESCRIPTIVE CHARACTERISTICS OF SMX
The sMx dataset contained images of 102 left, 103 right, and 5 bilateral breast cancers.

EXPERT REVIEW OF PRIOR SMX
The results of the expert review are summarized in Table 2.
By reviewing prior sMx alone (step 1), 24 of the sMx (11.4%) were labelled 'probably malignant' and might have been referred. The intermediate classification (step 2), prior sMx with index sMx present, identified 25 cases (11.9%) with 'clearly visible tumours'. The final classification of prior sMx (step 3), including the use of index images and clinical information, revealed 39 'clearly visible tumours' (18.6%).
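The percentages reported for the three review steps follow directly from the counts and the sample size of 210. The short sketch below reproduces that arithmetic; the variable names and labels are illustrative only.

```python
TOTAL_CASES = 210

# Counts reported for each step of the expert review.
counts = {
    "step 1: probably malignant": 24,
    "step 2: clearly visible tumours": 25,
    "step 3: clearly visible tumours": 39,
}

# Percentage of the 210 reviewed prior sMx, rounded to one decimal.
percentages = {label: round(100 * n / TOTAL_CASES, 1) for label, n in counts.items()}

# Relative difference between step 2 (images only) and step 3 (with clinical information).
relative_drop = 1 - counts["step 2: clearly visible tumours"] / counts["step 3: clearly visible tumours"]
```

The relative difference between steps 2 and 3 comes out at roughly one third, which is the effect of adding clinical information to the image review.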

UNIVARIATE ANALYSES
The intermediate classification was significantly associated with the date of the prior sMx (p ≤ 0.001) and the need for arbitration on the prior sMx (p = 0.002). The final classification was significantly associated with the date of the prior sMx (p ≤ 0.001), the need for arbitration (p = 0.004), the image quality (p = 0.004), and the detector system used (CR versus DR) (p = 0.036). See Table 3. More 'clearly visible tumours' were found in older sMx, in sMx that required arbitration, in sMx of inferior quality, and in sMx acquired with the CR technique.
Table 2: The results of the expert review of the prior sMx.

MULTIVARIATE ANALYSES
When clearly visible tumours were compared to the compound group of minimal and no signs, the need for arbitration on the prior sMx (p = 0.005) and the date of the prior images (p = 0.044) were independently and significantly associated with false negative clearly visible tumours in step 2 (i.e., using only prior and index images). When clearly visible tumours were compared only to the group of no signs, the associations with the need for arbitration (p = 0.001) and the date of the priors (p = 0.004) were even stronger.
In step 3, the final classification (i.e., with prior and index images and clinical information available), the need for arbitration (p = 0.001) and the date of the prior images (p = 0.006) were still independently and significantly associated with false negative clearly visible tumours. Furthermore, the image quality was significantly associated (p < 0.001). These conclusions held whether the comparison was made with the compound group of minimal and no signs or with the no signs group only. See Table 4.
All statistically significant associations were confirmed by bootstrap validation.

DISCUSSION
This review of a substantial set of 'initially negative' prior sMx resulted in 39 (19%) being labelled as 'clearly visible tumours'. This result is in accordance with similar studies [6,7]. These tumours were missed twice during the normal screening procedure (by the first and second reader or, if arbitration was necessary, by the third reader and one of the first two readers) and are therefore very valuable for training. The 19% missed tumours cannot automatically be considered 'screening errors', for several reasons:

1. The proportion of cases with 'clearly visible tumours' based on image review alone was 1/3 lower, at 12%. The availability of clinical information is known to alter the reading outcome [8,9].
2. Although we tried to reproduce the conditions of routine sMx assessment in the screening programme, the radiologist's attention was presumably triggered by the clustering of challenging image sets, the slightly different protocol form, the specific categorization, and the stepwise assessment for the review [3,9].
3. The normal response of the human mind to low probability events (i.e., the low prevalence of cancer in the sMx) can be a substantial contributor to false negative errors in breast cancer screening [4].
Therefore, the clustering of challenging sMx in this study may have affected the reader's awareness and the results of the review. The image quality was significantly associated with the final categorisation of clearly visible tumours. This confirms the importance of good image quality, which therefore requires special attention [1].
In order to obtain a sufficient number of prior sMx we had to include sMx from the early stages of digital mammography screening in Flanders. The 'date of screening' effect may reflect a learning curve for the radiologists involved in the screening programme.
In several studies, DR detector systems seem to be superior to CR detector systems, also in clinical screening performance. Higher sensitivity is often found, with higher cancer detection rates and fewer interval cancers, especially in dense breasts [2,10,11]. Since this review was performed by a single, albeit highly experienced, radiologist, the results of this retrospective review could not be corrected for interobserver variability. This is a major limitation of this study.
Table 4: Multivariate analyses: Variables associated with the interim or final classification after reviewing prior mammograms.
The screening mammograms assessed in this review are valuable for training and subsequent studies.

DATA ACCESSIBILITY STATEMENT
All documentation and data needed to verify the validity of the presented results are available, but not openly. Owing to the nature of this research, the participants in this study did not agree to have their data shared publicly.

ETHICS AND CONSENT
All participants gave their written informed consent for the breast cancer screening programme in Flanders, including its quality assessment. This research project was approved by the Ethics Committee of Ghent University Hospital (B670201318961).