Evaluation of Abdominal CT Obtained Using a Deep Learning-Based Image Reconstruction Engine Compared with CT Using Adaptive Statistical Iterative Reconstruction

Purpose: To compare the image quality of CT obtained using a deep learning-based image reconstruction (DLIR) engine with images with adaptive statistical iterative reconstruction-V (AV). Materials and Methods: Using a phantom, the noise power spectrum (NPS) and task-based transfer function (TTF) were measured in images with different reconstructions (filtered back projection [FBP], AV30, 50, 100, DLIR-L, M, H) at multiple doses. One hundred and twenty abdominal CTs with 30% dose reduction were processed using AV30, AV50, DLIR-L, M, H. Objective and subjective analyses were performed. Results: The NPS peak of DLIR was lower than that of AV30 or AV50. Compared with AV30, the NPS average spatial frequencies were higher with DLIR-L or DLIR-M. For lower contrast objects, TTF in images with DLIR were higher than those with AV. The standard deviation in DLIR-H and DLIR-M was significantly lower than AV30 and AV50. The overall image quality was the best for DLIR-M (p < 0.001). Conclusions: DLIR showed improved image quality and decreased noise under a decreased radiation dose.

Recently, image denoising algorithms using artificial neural networks, termed deep learning-based denoising algorithms (DLA), have been developed to overcome the drawbacks of IR [11,12]. Shin et al. showed that although their DLAs achieved less noise than filtered back projection (FBP) and advanced modeled iterative reconstruction (ADMIRE) in low-dose CT, they did not maintain spatial resolution [13]. Jensen et al. reported that TrueFidelity, a type of DLA, improves image quality through noise reduction and increased contrast-to-noise ratio (CNR) in routine-dose CT [14].
Therefore, this study aimed to assess the quality, including noise and spatial resolution, of phantom and abdominal CT acquired at a decreased radiation dose and reconstructed using a deep learning-based image reconstruction (DLIR) engine (TrueFidelity, GE Healthcare), and to compare it with CT reconstructed using AV, which is commonly used in abdominal CT.

PHANTOM STUDIES
The raw data were reconstructed into seven different axial image sets: FBP; ASIR-V with blending factors of 30%, 50%, or 100% (AV30, AV50, and AV100, respectively); and DLIR at low, medium, or high strength (DLIR-L, DLIR-M, and DLIR-H, respectively). The noise power spectrum (NPS), calculated by the standard Fourier transform technique, describes the amount of noise (magnitude) and the noise characteristics (texture) in the spatial frequency domain [15][16][17]. To characterize the NPS, we calculated the NPS peak and the average spatial frequency in module 3 of the American College of Radiology (ACR) phantom (Gammex 464, Sun Nuclear, Middleton, WI, USA) at multiple doses (Figure 1). Computed tomography (CT) was performed using the following parameters: peak kilovoltage (kVp), 100; beam collimation, 0.625 × 64 mm; tube current modulation range, 50-250 mAs. The task-based transfer function (TTF) is a representative metric of spatial resolution [13]. We measured the TTF for two materials (bone and acrylic) in module 1. To quantify the TTF, we calculated the spatial frequency at which the measured TTF curve fell to 0.5 (TTF50%). The NPS was calculated using MATLAB (Version R2017a, The MathWorks, Inc., Natick, MA, USA), and the TTF was measured using imQuest (Duke University) software implemented in MATLAB.
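The two summary metrics described above can be illustrated with a minimal sketch. It assumes that a one-dimensional (radially averaged) NPS curve and a TTF curve have already been measured. The function names `nps_peak_and_mean_frequency` and `ttf50` are hypothetical, and the definition of the average frequency as the NPS-weighted mean of the frequency axis is an assumption, not necessarily the exact definition used in the MATLAB or imQuest implementations.

```python
from typing import Sequence, Tuple


def nps_peak_and_mean_frequency(
    freqs: Sequence[float], nps: Sequence[float]
) -> Tuple[float, float]:
    """Return (peak frequency, average frequency) of a 1D NPS curve.

    Assumption: the average frequency is the NPS-weighted mean of the
    frequency axis, sum(f * NPS) / sum(NPS).
    """
    if len(freqs) != len(nps) or not freqs:
        raise ValueError("freqs and nps must be non-empty and equally long")
    total = sum(nps)
    if total <= 0:
        raise ValueError("NPS must have positive total power")
    # Frequency bin at which the noise power is largest.
    peak_index = max(range(len(nps)), key=lambda i: nps[i])
    mean_freq = sum(f * p for f, p in zip(freqs, nps)) / total
    return freqs[peak_index], mean_freq


def ttf50(freqs: Sequence[float], ttf: Sequence[float]) -> float:
    """Return the frequency at which the TTF curve first falls to 0.5.

    Linear interpolation is applied between the two samples bracketing 0.5.
    """
    for i in range(1, len(ttf)):
        upper, lower = ttf[i - 1], ttf[i]
        if upper >= 0.5 >= lower and upper != lower:
            frac = (upper - 0.5) / (upper - lower)
            return freqs[i - 1] + frac * (freqs[i] - freqs[i - 1])
    raise ValueError("TTF curve does not cross 0.5")


# Hypothetical curves for illustration only (not measured data).
example_freqs = [0.0, 0.2, 0.4, 0.6, 0.8]   # spatial frequency, mm^-1
example_nps = [1.0, 4.0, 3.0, 1.5, 0.5]     # arbitrary units
example_ttf = [1.0, 0.9, 0.6, 0.4, 0.2]     # unitless
```

A shift of the average frequency toward lower values corresponds to a coarser noise texture, which is why this quantity is compared across reconstructions alongside the peak.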

PATIENT STUDIES
This retrospective study was approved by the Institutional Review Board. Two hundred and three patients underwent abdominal CT (Revolution CT; GE Healthcare) from February 2020 to April 2020. CT scans were excluded because of a different combination of reconstructions (n = 70), large hepatic lesions > 2 cm (n = 8), or poor image quality (n = 5). The CT examinations of the remaining 120 individuals were retrospectively reviewed (Table 1). The mean body mass index of the patients in this study was 23.6 ± 3.6 (SD) kg/m².

CT EXAMINATION AND POSTPROCESSING
All patients underwent abdominal CT using a CT system (Revolution, GE Healthcare) that supports both AV and DLIR reconstruction. CT was performed using the following parameters: peak kilovoltage (kVp), 100; beam collimation, 0.625 × 128 mm; tube current modulation range, 100-550 mAs; noise index, 17; gantry rotation time, 0.6 s; coverage speed, 132.29 mm/s; pitch, 0.992:1; and slice thickness, 2.5 mm. The mean volume CT dose index was 5.06 ± 1.85 (SD) mGy, and the mean dose length product (DLP) was 281.29 ± 92.69 (SD) mGy·cm. A nonionic contrast medium (Ioversol 320 mg/mL; 2 mL/kg body weight) was administered for contrast enhancement. The portal venous phase was acquired with a fixed delay of 90 s after contrast administration. The raw data were reconstructed using six different reconstructions: FBP, AV30, AV50, and DLIR (DLIR-Low, DLIR-Medium, and DLIR-High).
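As a consistency check on the acquisition parameters, the coverage speed of a helical scan can be estimated from the standard relation: table speed = pitch × total beam collimation / gantry rotation time. The sketch below applies this relation to the values listed above. The function name `table_speed_mm_per_s` is hypothetical.

```python
def table_speed_mm_per_s(
    pitch: float,
    detector_rows: int,
    row_width_mm: float,
    rotation_time_s: float,
) -> float:
    """Estimate helical table speed from pitch, collimation, and rotation time."""
    # Total beam collimation along the z-axis, e.g. 128 rows x 0.625 mm = 80 mm.
    total_collimation_mm = detector_rows * row_width_mm
    # Table travel per rotation is pitch x collimation; divide by rotation time.
    return pitch * total_collimation_mm / rotation_time_s


# Parameters reported for the patient scans.
estimated_speed = table_speed_mm_per_s(
    pitch=0.992, detector_rows=128, row_width_mm=0.625, rotation_time_s=0.6
)
```

With these inputs the estimate is about 132.27 mm/s, close to the reported 132.29 mm/s. The small gap is presumably due to rounding of the displayed pitch.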

QUANTITATIVE ANALYSIS
One radiologist placed three circular ROIs to measure the mean attenuation (HU) and noise (SD) (Figure 2). The ROIs were placed in the right lobe of the liver at the level of the right portal vein, in the abdominal aorta below the origins of both renal arteries, and in the subcutaneous fat of the right buttock. Each ROI was placed to avoid confounding structures, such as large vessels.

[...] was held for the participating radiologists. Readers were blinded to the reconstruction methods, and the order of the image sets was randomized for each patient. Each reader independently graded the images with a pair-wise approach using a two-monitor high-resolution PACS workstation (EIZO RX 240). The results of one radiologist were used for the analysis, and those of the other were used to evaluate the inter-reader agreement. Each image set was ranked against the others on a comparative scale for overall image quality, image noise, and image sharpness. A score of 5 was assigned to the images with the best quality. Image sharpness was rated by evaluating the liver parenchyma, the pancreas contour, and the kidneys.
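The ROI measurement can be sketched as follows, assuming the CT slice is available as a two-dimensional grid of HU values. The function name `circular_roi_stats` is hypothetical, and the use of the sample standard deviation is an assumption, since the source does not state which estimator the workstation applies.

```python
from statistics import mean, stdev
from typing import Sequence, Tuple


def circular_roi_stats(
    image: Sequence[Sequence[float]],
    center_row: int,
    center_col: int,
    radius: float,
) -> Tuple[float, float]:
    """Return (mean HU, SD of HU) of the pixels inside a circular ROI."""
    # Collect every pixel whose centre lies within `radius` of the ROI centre.
    values = [
        image[r][c]
        for r in range(len(image))
        for c in range(len(image[r]))
        if (r - center_row) ** 2 + (c - center_col) ** 2 <= radius ** 2
    ]
    if len(values) < 2:
        raise ValueError("ROI must contain at least two pixels")
    # The mean is the attenuation; the SD is taken as the image noise.
    return mean(values), stdev(values)


# Hypothetical 3 x 3 patch of HU values for illustration only.
example_patch = [
    [0, 10, 0],
    [20, 30, 40],
    [0, 50, 0],
]
roi_mean, roi_sd = circular_roi_stats(example_patch, 1, 1, 1)
```

In practice, the same ROI position and size would be copied across all reconstructions of a patient so that the noise values remain directly comparable.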

PHANTOM STUDIES
The CTDIvol values were 2.1, 4.2, 6.3, 8.4, and 10.5 mGy. The NPS peak decreased in the order of DLIR-L, DLIR-M, and DLIR-H. Overall, the NPS peak of DLIR was lower than that of AV30 or AV50 (Table 2).
The highest values of the NPS average spatial frequency were obtained for FBP. The NPS spatial frequency decreased as the percentage of AV factor increased and decreased as the DLIR level increased (Figure 3). Compared with AV30, the NPS spatial frequencies were 5 to 10% higher with DLIR-L or DLIR-M. Compared with AV50, the NPS spatial frequencies were 10 to 20% higher for all DLIR levels.
For lower-contrast objects, TTF values in images with DLIR were higher than those with AV (Table 3). The differences in TTF were greater at low doses. For higher-contrast objects, TTF values did not show significant differences between images with DLIR and those with AV.

PATIENT STUDIES
The mean HU showed no significant difference among the six different reconstructions. The SD of the liver and aorta showed significant differences (p < 0.001) (Table 4). The SD of fat differed significantly between protocols (p < 0.001), except between AV50 and DLIR-L.
A higher blending factor in AV (AV30 < AV50) and a higher strength in DLIR (DLIR-L < DLIR-M < DLIR-H) yielded significantly lower SD. Comparison of DLIR images with AV images showed that the SD in DLIR-H and DLIR-M was 10 to 50% lower than that in both AV30 and AV50 (p < 0.001).
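Because every patient contributes one noise measurement per reconstruction, the comparison is a within-subject design, and the table legend states that repeated-measures ANOVA was used. The sketch below shows how the F statistic of a one-way repeated-measures ANOVA is formed. The function name `rm_anova_f` and the example values are hypothetical, and sphericity correction and post hoc testing are not covered.

```python
from typing import Sequence, Tuple


def rm_anova_f(data: Sequence[Sequence[float]]) -> Tuple[float, int, int]:
    """One-way repeated-measures ANOVA.

    `data[i][j]` is the measurement for subject i under condition j.
    Returns (F statistic, df for conditions, df for error).
    """
    n = len(data)        # number of subjects
    k = len(data[0])     # number of conditions (reconstructions)
    if n < 2 or k < 2 or any(len(row) != k for row in data):
        raise ValueError("need >= 2 subjects and >= 2 conditions, no gaps")
    grand = sum(sum(row) for row in data) / (n * k)
    cond_means = [sum(row[j] for row in data) / n for j in range(k)]
    subj_means = [sum(row) / k for row in data]
    # Partition the total variability into condition, subject, and error parts.
    ss_cond = n * sum((m - grand) ** 2 for m in cond_means)
    ss_subj = k * sum((m - grand) ** 2 for m in subj_means)
    ss_total = sum((x - grand) ** 2 for row in data for x in row)
    ss_error = ss_total - ss_cond - ss_subj
    df_cond = k - 1
    df_error = (n - 1) * (k - 1)
    f_stat = (ss_cond / df_cond) / (ss_error / df_error)
    return f_stat, df_cond, df_error


# Hypothetical noise values (HU): 3 patients x 3 reconstructions.
example_noise = [
    [1, 2, 4],
    [2, 3, 4],
    [3, 4, 7],
]
f_value, df1, df2 = rm_anova_f(example_noise)
```

The p-value would then be read from the F distribution with (df1, df2) degrees of freedom, for example with `scipy.stats.f.sf(f_value, df1, df2)` if SciPy is available. Removing the between-subject variability from the error term is what makes this design sensitive to small but consistent noise differences between reconstructions.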

QUALITATIVE ANALYSIS
The five reconstruction protocols showed significant differences (p < 0.001). The overall image quality was the best for DLIR-M (p < 0.001) (Table 5).

DISCUSSION
Using a phantom and abdominal CT, our study demonstrated that, compared with AV30 or AV50, CT reconstructed with DLIR showed lower noise magnitude, with noise texture and image sharpness similar to those of FBP. DLIR was designed to differentiate the signal from noise without changing its texture [18]. In the phantom study, DLIR images at any level showed decreased noise magnitude compared with images with AV30 or AV50, which are commonly used in clinical settings for abdominal CT. According to the NPS spatial frequency, images with all DLIR levels showed better texture, closer to that of FBP, than those with AV50 or AV100. Moreover, images with DLIR-L or DLIR-M showed better texture than those with AV30, and DLIR-H gave results comparable to those of AV30.
For lower-contrast objects, images with DLIR showed better image sharpness than those with AV. For higher-contrast objects, there were no significant differences between the AV and DLIR images. A previous study reported that the difference in image sharpness between DLIR and AV50 or AV100 was greater for low-contrast objects, but that differences were also present for high-contrast objects [19]. As our study did not include extremely low doses, different results were obtained.
In the patient study, images with DLIR-M or DLIR-H had lower noise than those with AV30 or AV50. CT with DLIR-L did not show significantly different noise compared with AV50. These results differed from those of our phantom study, which showed significantly lower noise in the DLIR-L images.

[Figure] ... 100 mAs (b), 150 mAs (c), and 200 mAs (d). FBP, filtered back projection; AV30 and AV50, ASIR-V with a blending factor of 30% and 50%, respectively; DLIR-L, DLIR-M, and DLIR-H, deep learning-based image reconstruction with low, medium, or high levels, respectively; NPS, noise power spectrum.

Table 3. TTF50% values (mm⁻¹) of the 25% ACR phantom CT according to disc material (bone, 955 HU; acrylic, 120 HU) and reconstruction. TTF, task-based transfer function; ACR, American College of Radiology; FBP, filtered back projection; AV30 and AV50, ASIR-V with blending factors of 30% and 50%, respectively; DLIR-L, DLIR-M, and DLIR-H, deep learning-based image reconstruction with low, medium, or high levels, respectively.

Table 4. Mean image noise (HU) according to the image reconstruction method. Data are presented as mean ± standard deviation. Subscript letters denote groups that did not differ in the post hoc analysis (letters are assigned in alphabetical order, starting from the lowest mean value). P-values were calculated using repeated-measures ANOVA among the six groups. FBP, filtered back projection; AV30, ASIR-V with a blending factor of 30%; AV50, ASIR-V with a blending factor of 50%; DLIR-L, DLIR-M, and DLIR-H, deep learning-based image reconstruction with low, medium, or high strength levels, respectively; HU, Hounsfield unit; SD, standard deviation.

In the qualitative analysis, DLIR effectively reduced noise. Jensen et al. showed that readers rated images with DLIR-H as having the best overall image quality [14]. Those authors performed CT with a noise index of 10 [14], whereas we performed CT with a noise index of 17. In our study, CT with DLIR-M showed the best overall image quality, although DLIR-H showed lower noise. This could be due to image sharpness and texture characteristics. In the phantom study, the NPS spatial frequency was higher with DLIR-L and DLIR-M than with AV30, whereas it did not differ significantly between DLIR-H and AV30. In the patient study, the evaluation of spatial resolution showed only fair inter-reader agreement, and further research on this point is needed. The time required for reconstruction is similar between DLIR and AV. Our study suggests that DLIR is suitable as the first-choice reconstruction in daily practice.

The present study had several limitations. First, the phantom we used does not closely approximate the human body. The acrylic insert has a lower HU than bone, and we assumed that it could represent materials with attenuation between that of water and bone. Further studies are needed for low-contrast materials. Second, this study did not compare diagnostic performance.
In conclusion, the phantom data suggest that DLIR provides improved spatial resolution, FBP-like image texture, and effective noise reduction under a decreased radiation dose. The patient data suggest that DLIR provides effective noise reduction while preserving image quality. DLIR-M ranked higher than AV30 or AV50 in both image quality and image sharpness in abdominal CT.

Table 5. Image quality assessment ranking of the image reconstruction methods. Data are mean ranking score ± standard deviation. FBP, filtered back projection; AV30, ASIR-V with a blending factor of 30%; AV50, ASIR-V with a blending factor of 50%; DLIR-L, DLIR-M, and DLIR-H, deep learning-based image reconstruction with low, medium, or high strength levels, respectively.