Original Article
OPEN ACCESS

Hepatocellular Carcinoma Risk Stratification for Cirrhosis Patients: Integrating Radiomics and Deep Learning Computed Tomography Signatures of the Liver and Spleen into a Clinical Model

Rong Fan^1,#,*,
Ya-Ru Shi^1,#,
Lei Chen^2,#,
Chuan-Xin Wang³,
Yun-Song Qian⁴,
Yan-Hang Gao⁵,
Chun-Ying Wang⁶,
Xiao-Tang Fan⁷,
Xiao-Long Liu⁸,
Hong-Lian Bai⁹,
Dan Zheng¹⁰,
Guo-Qing Jiang¹¹,
Yan-Long Yu¹²,
Xie-Er Liang¹,
Jin-Jun Chen¹,
Wei-Fen Xie¹³,
Lu-Tao Du³,
Hua-Dong Yan⁴,
Yu-Jin Gao⁶,
Hao Wen¹⁴,
Jing-Feng Liu^8,15,
Min-Feng Liang⁹,
Fei Kong⁵,
Jian Sun¹,
Sheng-Hong Ju¹⁶,
Hong-Yang Wang^2,* and
Jin-Lin Hou^1,*

Author information

1Department of Infectious Diseases, Nanfang Hospital, Southern Medical University; Guangdong Provincial Key Laboratory for Prevention and Control of Major Liver Diseases; Guangdong Provincial Clinical Research Center for Viral Hepatitis; Key Laboratory of Infectious Diseases Research in South China, Ministry of Education, Guangzhou, Guangdong, China

2International Cooperation Laboratory on Signal Transduction, National Center for Liver Cancer, Eastern Hepatobiliary Surgery Institute/Hospital, Shanghai, China

3Department of Clinical Laboratory, The Second Hospital, Cheeloo College of Medicine, Shandong University, Jinan, Shandong, China

4Hepatology Department, Ningbo Hwamei Hospital, University of Chinese Academy of Sciences, Ningbo, Zhejiang, China

5The First Hospital of Jilin University, Changchun, Jilin, China

6Xuzhou Infectious Diseases Hospital, Xuzhou, Jiangsu, China

7Department of Hepatology, First Affiliated Hospital of Xinjiang Medical University, Urumqi, Xinjiang, China

8The United Innovation of Mengchao Hepatobiliary Technology Key Laboratory of Fujian Province, Mengchao Hepatobiliary Hospital of Fujian Medical University, Fuzhou, Fujian, China

9The Department of Infectious Disease, The First People’s Hospital of Foshan, Foshan, Guangdong, China

10Department of Gastroenterology, The Central Hospital of Wuhan, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, Hubei, China

11Department of Hepatobiliary Surgery, Clinical Medical College, Yangzhou University, Yangzhou, Jiangsu, China

12Chifeng Clinical Medical School of Inner Mongolia Medical University, Chifeng, Inner Mongolia, China

13Department of Gastroenterology, Changzheng Hospital, Naval Medical University, Shanghai, China

14State Key Laboratory of Pathogenesis, Prevention and Treatment of High Incidence Diseases in Central Asia, First Affiliated Hospital of Xinjiang Medical University, Urumqi, Xinjiang, China

15Department of Hepatopancreatobiliary Surgery, Clinical Oncology School of Fujian Medical University, Fujian Cancer Hospital, Fuzhou, Fujian, China

16Nurturing Center of Jiangsu Province for State Laboratory of AI Imaging & Interventional Radiology (Southeast University); Department of Radiology, Zhongda Hospital, Medical School of Southeast University, Nanjing, Jiangsu, China

Journal of Clinical and Translational Hepatology 2025;13(9):743-753

DOI: 10.14218/JCTH.2025.00091

Abstract

Background and Aims

Given the high burden of hepatocellular carcinoma (HCC), risk stratification in patients with cirrhosis is critical but remains inadequate. In this study, we aimed to develop and validate an HCC prediction model by integrating radiomics and deep learning features from liver and spleen computed tomography (CT) images into the established age-male-ALBI-platelet (aMAP) clinical model.

Methods

Patients were enrolled between 2018 and 2023 from a Chinese multicenter, prospective, observational cirrhosis cohort, all of whom underwent 3-phase contrast-enhanced abdominal CT scans at enrollment. The aMAP clinical score was calculated, and radiomic (PyRadiomics) and deep learning (ResNet-18) features were extracted from liver and spleen regions of interest. Feature selection was performed using the least absolute shrinkage and selection operator.

Results

Among 2,411 patients (median follow-up: 42.7 months [IQR: 32.9–54.1]), 118 developed HCC (three-year cumulative incidence: 3.59%). Chronic hepatitis B virus infection was the main etiology, accounting for 91.5% of cases. The aMAP-CT model, which incorporates CT signatures, significantly outperformed existing models (area under the receiver-operating characteristic curve: 0.809–0.869 in three cohorts). It stratified patients into high-risk (three-year HCC incidence: 26.3%) and low-risk (1.7%) groups. Stepwise application (aMAP → aMAP-CT) further refined stratification (three-year incidences: 1.8% [93.0% of the cohort] vs. 27.2% [7.0%]).

Conclusions

The aMAP-CT model improves HCC risk prediction by integrating CT-based liver and spleen signatures, enabling precise identification of high-risk cirrhosis patients. This approach personalizes surveillance strategies, potentially facilitating earlier detection and improved outcomes.

Keywords

Hepatocellular carcinoma, Liver cirrhosis, Radiomics, Deep learning, Machine learning, Prediction algorithms

Introduction

Hepatocellular carcinoma (HCC) is a major global health challenge and ranks as the third leading cause of cancer-related deaths worldwide.¹ Most HCC cases develop in the context of liver cirrhosis. Regular surveillance for these patients enables early detection, diagnosis, and treatment, which enhances treatment efficacy and reduces mortality. However, current HCC surveillance strategies for cirrhotic patients, which rely on biannual ultrasound (US) and alpha-fetoprotein testing, have limited sensitivity, missing one-third of early-stage HCC cases.^2,3 This highlights the need for more effective risk stratification approaches to identify high-risk cirrhotic individuals and optimize monitoring, thereby improving the cost-effectiveness of HCC screening programs.

Current risk stratification models for predicting HCC have made significant progress.^4–8 Our group developed the age-male-ALBI-platelet (aMAP) score for HCC risk prediction using data from 11 global prospective cohorts of individuals with chronic hepatitis.⁹ However, the aMAP score, like other predictive models, faces challenges, particularly in patients with cirrhosis, where its performance is diminished (C-index of 0.74). To address this, our team recently developed the aMAP-2 Plus, which utilizes cell-free DNA (cfDNA) and has demonstrated excellent predictive performance in patients with cirrhosis.¹⁰ Despite its promise, the aMAP-2 Plus faces challenges due to the limited availability and high costs of cfDNA, potentially restricting its practical utility.^11,12 Therefore, efforts should focus on developing a non-invasive model using more accessible and advanced biomarkers as substitutes for cfDNA, thereby meeting the early-warning needs of cirrhosis patients.

Emerging imaging-based surveillance strategies show promise by capturing comprehensive gene expression patterns through advanced medical imaging modalities.¹³ Artificial intelligence (AI), particularly deep learning, a subset of machine learning, enables computers to learn from medical images, identify hidden patterns, and assist clinicians in the diagnosis and prognosis of liver disease.¹⁴ We hypothesize that a biomarker based on image signatures extracted through radiomics and deep learning could significantly enhance stratification performance. Integrating these advanced techniques into the aMAP score could provide more accurate and individualized risk assessments.

Given the potential of imaging biomarkers, selecting the optimal imaging modality, whether US, computed tomography (CT), or magnetic resonance imaging (MRI), is critical for generating reliable indicators. A meta-analysis of prospective cohorts shows that biphasic CT has significantly higher sensitivity than US for detecting very early-stage HCC, as US is highly operator-dependent.^15,16 Additionally, compared to MRI, CT is a more commonly used tool for diagnosing cirrhosis, assessing decompensation, and evaluating the risk of HCC progression.¹⁷

Therefore, we aimed to develop and validate a non-invasive HCC risk predictive model for cirrhosis patients by integrating liver and spleen CT image signatures utilizing AI technology into the aMAP score based on a nationwide cohort.

Methods

This study followed the CLEAR checklist to ensure comprehensive and standardized reporting.¹⁸ The study was approved by the Ethics Committee of Nanfang Hospital (approval number: NFEC-2018-101) and was conducted in accordance with the guidelines of the Declaration of Helsinki. Patient informed consent was waived given the retrospective design, and all data were de-identified.

Study population

This retrospective study was based on a prospective multicenter observational cirrhotic cohort in China (PreCar cohort, NCT03588442). In this cohort, 4,692 adults with cirrhosis were enrolled from June 2018 to January 2020 at 16 centers across 11 provinces in China. The main etiology of cirrhosis was chronic hepatitis B virus (HBV) infection, and all HBV-infected patients received antiviral therapy during the follow-up period. Upon enrollment, all patients underwent contrast-enhanced CT or MRI according to protocol to rule out pre-existing HCC. Diagnoses of cirrhosis and HCC were based on standard histological and/or compatible radiological findings. For detailed information, please refer to the Supplementary File 1. Subsequently, all patients underwent biannual protocol follow-up.^10,19

For this study, we excluded patients who met any of the following criteria: (1) loss to follow-up or tumorigenesis within 3 months before/after enrollment, or uncertain outcomes; (2) lack of available CT images at enrollment; (3) incomplete clinical data; (4) poor-quality or incomplete CT images; and (5) history of splenectomy. Finally, patients from 11 centers in the PreCar cohort were included (Supplementary Table 1).

Data and modeling

To ensure generalizability, patients from multiple centers were divided into training and validation cohorts (7:3 ratio), while patients from the center with the largest sample size (Nanfang Hospital) were assigned to the test cohort. Liver and spleen CT images from the arterial, venous, and delayed phases were retrieved, preprocessed, and normalized to enhance image consistency. Image characteristics across different centers are detailed in Supplementary Table 2. Clinical data, specifically the aMAP score, were calculated at enrollment. Segmentation of the liver and spleen was achieved using nnU-Net,²⁰ employing a two-step process of pre-training and formal training for accurate delineation of regions of interest. Radiomics and deep learning features (Supplementary Fig. 1) were extracted from the CT images, with least absolute shrinkage and selection operator (LASSO) regression applied for feature selection. Radiomics feature extraction followed the Image Biomarker Standardisation Initiative guidelines (Supplementary Table 3). Logistic regression was used to construct the image signature score, which was then combined with the aMAP model through another logistic regression to develop the aMAP-CT model. Model evaluation encompassed the area under the receiver operating characteristic curve (AUC), net reclassification improvement, calibration, subgroup analyses, decision curve analysis, and comparisons with existing scores. Further details on data preparation, segmentation methodology, feature extraction, and model evaluation are available in Supplementary File 1.

Statistical analysis

Statistical analyses were conducted using R software (version 4.3.0; http://www.r-project.org ) and Python (version 3.9; https://www.python.org ). Descriptive results are presented as medians (IQR) for continuous variables and as numbers (percentages) for categorical data. Patient characteristics at enrollment were compared among the three subsets of the cohort using the Kruskal–Wallis H test for continuous variables and the chi-squared test for categorical variables. All statistical tests were two-sided, with p < 0.01 considered statistically significant unless otherwise specified.

Results

Patient characteristics

A total of 2,411 patients from 11 centers in the PreCar cohort were included after excluding those without definitive outcomes or available CT images, along with other exclusion criteria (Supplementary Fig. 2). All patients had confirmed cirrhosis. Chronic HBV infection was the main etiology, accounting for 91.5% of cases. During a median follow-up of 42.7 (IQR 32.9–54.1) months, 118 patients developed HCC, with a three-year cumulative incidence of 3.59% (Supplementary Fig. 3). The clinical characteristics at enrollment are shown in Table 1. The three cohorts had similar distributions of clinical features.

Table 1

Clinical characteristics of the patients at enrollment

Characteristics	Overall	Training cohort	Validation cohort	Test cohort	p-value
Total patients, n	2,411	809	348	1,254	-
Follow-up time, months	42.7 [32.9, 54.1]	33.2 [15.9, 37.2]	33.8 [15.8, 37.3]	53.8 [50.0, 58.0]	0.981
Age, years	49.67 [42.8, 56.3]	52.48 [45.2, 59.5]	50.52 [44.3, 57.1]	47.74 [41.1, 54.6]	0.057
Male, n (%)	1,879 (77.9)	544 (67.2)	260 (74.7)	1,075 (85.7)	0.014
Etiology, n (%)					0.010
HBV	2,206 (91.5)	685 (84.7)	315 (90.5)	1,206 (96.2)
Other^a	205 (8.5)	124 (15.3)	33 (9.5)	48 (3.8)
ALT, IU/L	28.00 [21.0, 40.0]	28.00 [20.0, 41.0]	27.00 [20.0, 39.1]	29.00 [21.0, 40.0]	0.522
TBIL, µmol/L	16.30 [12.0, 24.0]	17.19 [12.7, 26.0]	17.00 [12.9, 23.7]	15.50 [11.1, 22.4]	0.699
Albumin, g/L	43.40 [39.9, 46.3]	43.00 [38.9, 46.0]	43.25 [38.9, 46.3]	43.70 [40.7, 46.4]	0.363
PLT, ×10³/mm³	115.00 [77.0, 156.0]	105.00 [71.0, 145.0]	102.00 [74.8, 148.3]	125.00 [83.0, 165.8]	0.916
AFP, ng/ml	2.93 [1.8, 5.2]	3.22 [2.1, 5.81]	3.10 [2.0, 4.9]	2.66 [1.6, 4.8]	0.153
aMAP score^b	58.70 [54.2, 63.3]	60.66 [56.6, 65.2]	60.07 [56.4, 65.2]	56.89 [52.4, 61.5]	0.022
aMAP HCC risk, n (%)					0.036
Low-risk (<50)	245/2,411 (10.2)	49/809 (6.1)	16/348 (4.6)	180/1,254 (14.4)
Medium-risk (50–60)	1,152/2,411 (47.8)	324/809 (40.1)	157/348 (45.1)	671/1,254 (53.5)
High-risk (>60)	1,014/2,411 (42.1)	436/809 (53.9)	175/348 (50.3)	403/1,254 (32.1)
HCC cases during follow-up, n (%)	118 (4.9)	24 (3.0)	11 (3.2)	83 (6.6)	<0.001

The PreCar cohort was used to develop and validate the model. The p-value measures the difference across all three datasets, with p < 0.01 considered statistically significant. The values in square brackets indicate the IQR for variables with a non-normal distribution and the SD for variables with a normal distribution. ^aOther etiologies include hepatitis C virus infection, alcoholic fatty liver disease, non-alcoholic fatty liver disease, and unknown. ^bThe aMAP score is an index of underlying HCC risk calculated from age, sex, albumin, total bilirubin, and platelet count; it has shown excellent predictive performance among patients of different etiologies and ethnicities in an international cohort collaboration. ALT, alanine aminotransferase; TBIL, total bilirubin; PLT, platelet; AFP, alpha-fetoprotein; HBV, hepatitis B virus; HCC, hepatocellular carcinoma; IQR, interquartile range; SD, standard deviation; aMAP, age-male-ALBI-platelet.
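For orientation, the aMAP score and the risk bands listed in Table 1 can be sketched as below. The coefficients are not given in this article; they are reproduced here as an assumption from the original aMAP publication (reference 9) and should be verified against it before any use. The function names and the unit conventions (bilirubin in µmol/L, albumin in g/L, platelets in 10³/mm³) are likewise assumptions made for illustration.

```python
import math


def amap_score(age_years, is_male, tbil_umol_l, albumin_g_l, platelets_k_mm3):
    """aMAP score on a 0-100 scale (coefficients assumed from reference 9)."""
    # ALBI component: log10 bilirubin and albumin.
    albi = 0.66 * math.log10(tbil_umol_l) - 0.085 * albumin_g_l
    linear = (
        0.06 * age_years
        + 0.89 * (1 if is_male else 0)
        + 0.48 * albi
        - 0.01 * platelets_k_mm3
    )
    return (linear + 7.4) / 14.77 * 100


def amap_risk_group(score):
    """Risk bands as listed in Table 1: <50 low, 50-60 medium, >60 high."""
    if score < 50:
        return "low"
    if score <= 60:
        return "medium"
    return "high"


# Hypothetical patient assembled from the overall-cohort medians in Table 1.
example = amap_score(49.67, True, 16.3, 43.4, 115.0)
example_group = amap_risk_group(example)
```

For this hypothetical patient the score is about 59, which falls in the medium-risk band and is close to the cohort median aMAP score reported in Table 1.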

Model construction

Based on the state-of-the-art nnU-Net, the Dice scores for liver and spleen segmentation reached 0.974 and 0.979, respectively, during formal training. Original images and masks were cropped to the maximal 3D segmentation dimensions (Supplementary Fig. 4).

After segmentation and preprocessing, the image signature score was constructed. A total of 8,184 features were extracted, including 2,556 radiomics features and 1,536 deep features for the liver, and 2,556 and 1,536, respectively, for the spleen. Subsequently, using LASSO regression models (with three-fold cross-validation), the optimal features with non-zero weights were selected (Supplementary Fig. 5). Using logistic regression, the selected features were quantitatively integrated into the image signature score. The features and their corresponding coefficients are shown in Supplementary Table 4.
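The selection step relies on a defining property of LASSO: the L1 penalty shrinks the weights of uninformative features to exactly zero, so only features with non-zero weights remain. The following is a minimal sketch of that mechanism using coordinate descent on a linear model with toy data. It is not the study's pipeline (which used cross-validation to tune the penalty over thousands of features); all names, data, and the penalty value are hypothetical.

```python
def soft_threshold(value, penalty):
    """Shrink value toward zero; return 0.0 when |value| <= penalty."""
    if value > penalty:
        return value - penalty
    if value < -penalty:
        return value + penalty
    return 0.0


def lasso_coordinate_descent(rows, targets, penalty, n_iter=100):
    """Minimise (1/2n)*||y - Xw||^2 + penalty*||w||_1 (no intercept)."""
    n, p = len(rows), len(rows[0])
    weights = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            # Correlation of feature j with the residual that ignores feature j.
            rho = 0.0
            for row, y in zip(rows, targets):
                prediction = sum(w * x for w, x in zip(weights, row))
                rho += row[j] * (y - prediction + weights[j] * row[j])
            rho /= n
            scale = sum(row[j] ** 2 for row in rows) / n
            weights[j] = soft_threshold(rho, penalty) / scale
    return weights


# Toy data: the target depends only on the first feature.
X = [[-1, 1], [-1, -1], [0, 1], [0, -1], [1, 1], [1, -1]]
y = [2 * row[0] for row in X]
w = lasso_coordinate_descent(X, y, penalty=0.1)
selected = [j for j, weight in enumerate(w) if weight != 0.0]
```

In this toy case only the first feature is retained, and its weight is shrunk slightly below the true value of 2, which is the usual trade-off of an L1 penalty.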

Subsequently, the CT image signature scores for both liver and spleen were added to the aMAP model using logistic regression (termed the aMAP-CT model), resulting in the final formula: aMAP-CT score = 0.52 × CT image score + 1.07 × aMAP score − 4.26.
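The reported formula is a plain linear combination and can be written as a short function. This is a sketch only: the scale on which the CT image score and the aMAP term enter the formula is not stated in this section, so the inputs below are hypothetical placeholders rather than real patient values.

```python
def amap_ct_score(ct_image_score, amap_score):
    """Linear combination reported in the text for the aMAP-CT model."""
    return 0.52 * ct_image_score + 1.07 * amap_score - 4.26


# Hypothetical inputs, chosen only to show the arithmetic.
example_low = amap_ct_score(ct_image_score=1.0, amap_score=3.0)
example_high = amap_ct_score(ct_image_score=2.0, amap_score=4.0)
```

Because both coefficients are positive, a higher CT image score or a higher aMAP term always raises the combined score.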

Discrimination and calibration performance of the model

The aMAP-CT score demonstrated superior discrimination performance across all three cohorts. It achieved an AUC of 0.869 (95% confidence interval (CI), 0.789–0.931) in the training cohort, 0.809 (95% CI, 0.686–0.927) in the validation cohort, and 0.815 (95% CI, 0.762–0.868) in the test cohort, all significantly higher than those of the aMAP model and the models involving only aMAP and liver signatures (Fig. 1). This enhancement was further supported by net reclassification improvement values of 0.41 (95% CI, 0.21–0.60) in the training cohort, 0.06 (95% CI, 0.00–0.16) in the validation cohort, and 0.40 (95% CI, 0.27–0.50) in the test cohort, all with p-values < 0.05. The sensitivity, specificity, positive predictive value, negative predictive value, accuracy, and F1-score of the aMAP-CT model were also satisfactory (Table 2). Additionally, the calibration curve showed excellent agreement between predicted and observed probabilities for HCC development across all cohorts (Supplementary Fig. 6).


Fig. 1 ROC curves of aMAP-CT (aMAP + liver + spleen), model with aMAP and liver image signatures (aMAP + liver), and aMAP to predict HCC occurrence in the training cohort (A), validation cohort (B), and test cohort (C).

AUC, the area under the receiver operating characteristic curve; CI, confidence interval; HCC, hepatocellular carcinoma; aMAP, age-male-ALBI-platelet; CT, computed tomography.
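As a reminder of what the AUC measures, it equals the probability that a randomly chosen patient who developed HCC receives a higher score than a randomly chosen patient who did not. A minimal sketch of this pairwise definition follows; the scores and labels are toy values, and the function name is hypothetical.

```python
def auc_from_scores(scores, labels):
    """AUC as the share of case/non-case pairs ranked correctly (ties count 0.5)."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("AUC needs at least one case and one non-case")
    wins = 0.0
    for case_score in cases:
        for control_score in controls:
            if case_score > control_score:
                wins += 1.0
            elif case_score == control_score:
                wins += 0.5
    return wins / (len(cases) * len(controls))


# Toy example: two cases and four non-cases.
toy_scores = [0.9, 0.8, 0.4, 0.35, 0.2, 0.1]
toy_labels = [1, 0, 1, 0, 0, 0]
toy_auc = auc_from_scores(toy_scores, toy_labels)
```

With only two cases, a single misranked pair changes the AUC by 0.125, which illustrates why AUC estimates become unstable when a cohort or subgroup contains few HCC events.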

Table 2

Performance evaluation of the aMAP-CT model

Cohort	n	HCC, n (%)	AUC	SEN	SPE	PPV	NPV	ACC	F1-score
Training cohort	809	24 (3.0)	0.869 [0.789, 0.931]	0.792 [0.663, 0.885]	0.789 [0.756, 0.820]	0.103 [0.070, 0.150]	0.992 [0.984, 0.997]	0.789 [0.759, 0.818]	0.182 [0.122, 0.242]
Validation cohort	348	11 (3.2)	0.809 [0.686, 0.927]	0.727 [0.601, 0.853]	0.780 [0.713, 0.847]	0.098 [0.079, 0.117]	0.989 [0.973, 1.000]	0.779 [0.712, 0.846]	0.172 [0.144, 0.200]
Test cohort	1,254	83 (6.6)	0.815 [0.762, 0.868]	0.602 [0.511, 0.693]	0.878 [0.803, 0.953]	0.259 [0.220, 0.298]	0.969 [0.951, 0.987]	0.860 [0.785, 0.935]	0.362 [0.318, 0.406]

The values in square brackets represent the 95% CI. SEN, sensitivity; SPE, specificity; PPV, positive predictive value; NPV, negative predictive value; ACC, accuracy; HCC, hepatocellular carcinoma; CI, confidence interval; aMAP, age-male-ALBI-platelet; CT, computed tomography.
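The threshold-based metrics in Table 2 all derive from a single 2 × 2 confusion matrix. The sketch below computes them from raw counts. The counts used in the example are not reported by the authors; they were back-calculated for illustration so that they reproduce the rounded training-cohort row (809 patients, 24 HCC cases), and other counts could be consistent with the same rounding.

```python
def classification_metrics(tp, fp, fn, tn):
    """Sensitivity, specificity, PPV, NPV, accuracy, and F1 from a 2x2 table."""
    nan = float("nan")
    return {
        "SEN": tp / (tp + fn),
        "SPE": tn / (tn + fp),
        "PPV": tp / (tp + fp) if (tp + fp) else nan,
        "NPV": tn / (tn + fn) if (tn + fn) else nan,
        "ACC": (tp + tn) / (tp + fp + fn + tn),
        # F1 is undefined when no case is detected.
        "F1": 2 * tp / (2 * tp + fp + fn) if tp else nan,
    }


# Illustrative counts (assumption): 19 of 24 HCC cases flagged, 166 false alarms.
training = classification_metrics(tp=19, fp=166, fn=5, tn=619)
rounded = {name: round(value, 3) for name, value in training.items()}
```

The example shows why PPV stays low (about 0.10) despite good sensitivity and specificity: with an HCC prevalence of roughly 3%, even a modest false-positive rate produces far more false alarms than true cases.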

HCC risk stratification based on the aMAP-CT model

Using the optimal cut-off value (0.37), patients were classified into low- and high-risk groups. In the training cohort (n = 809), 61 patients (7.5%) were classified as high-risk, while the remaining 748 (92.5%) were categorized as low-risk by the aMAP-CT model. The three-year cumulative incidence of HCC was 20.3% in the high-risk group and 2.2% in the low-risk group (p < 0.0001) (Fig. 2A). Similar results were observed in the validation and test cohorts (Fig. 2B and C). There was a greater distinction between the low- and high-risk groups identified by aMAP-CT (hazard ratio (HR): 12.3; 95% CI: 5.8–26.0), compared with the aMAP score (HR: 3.1; 95% CI: 2.2–4.5), the model involving only aMAP and spleen signatures (HR: 3.9; 95% CI: 1.7–8.7), and the model involving only aMAP and liver signatures (HR: 4.0; 95% CI: 2.2–7.1) (Supplementary Figs. 7–9, Supplementary Table 5).

Fig. 2 Cumulative incidence of HCC in the training (A), validation (B), and test (C) cohorts stratified by the aMAP-CT model.

HCC, hepatocellular carcinoma; aMAP, age-male-ALBI-platelet; CT, computed tomography.
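Cumulative incidence at a fixed horizon has to account for patients who are censored before that horizon. The sketch below assumes a Kaplan-Meier type estimator, which this section does not name explicitly, and uses toy follow-up data in months; the function name and data are hypothetical.

```python
def km_cumulative_incidence(times, events, horizon):
    """Return 1 - Kaplan-Meier survival at `horizon` (events: 1 = HCC, 0 = censored)."""
    at_risk = len(times)
    survival = 1.0
    for t in sorted(set(times)):
        if t > horizon:
            break
        # Events at time t are counted before patients leave the risk set.
        n_events = sum(1 for ti, ei in zip(times, events) if ti == t and ei == 1)
        n_leaving = sum(1 for ti in times if ti == t)
        if n_events:
            survival *= 1.0 - n_events / at_risk
        at_risk -= n_leaving
    return 1.0 - survival


# Toy cohort: follow-up in months, two HCC events, four censored patients.
toy_times = [6, 12, 20, 30, 36, 40]
toy_events = [1, 0, 1, 0, 0, 0]
three_year_incidence = km_cumulative_incidence(toy_times, toy_events, horizon=36)
```

In the toy cohort the estimate is 0.375 rather than the crude 2/6, because the second event occurs after one patient has already been censored and is therefore divided by a smaller risk set.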

Decision curves were plotted to evaluate the clinical utility of models for three-year HCC risk prediction (Supplementary Fig. 10). In all three cohorts, the aMAP-CT model demonstrated superior net clinical benefit compared to the reference strategies, as evidenced by its higher overall net benefit values. The aMAP-CT model significantly outperformed the aMAP model in net clinical benefit, underscoring the value of incorporating image signatures.
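Decision curve analysis compares strategies by net benefit, which credits true positives and penalizes false positives according to the chosen threshold probability. The following is a minimal sketch of the standard net-benefit formula, assuming predicted probabilities are available for each patient; the data are toy values and the function names are hypothetical.

```python
def net_benefit(probabilities, outcomes, threshold):
    """Net benefit of intervening on everyone with predicted risk >= threshold."""
    n = len(outcomes)
    pairs = list(zip(probabilities, outcomes))
    tp = sum(1 for p, y in pairs if p >= threshold and y == 1)
    fp = sum(1 for p, y in pairs if p >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1.0 - threshold)


def net_benefit_treat_all(outcomes, threshold):
    """Reference strategy: intervene on every patient."""
    prevalence = sum(outcomes) / len(outcomes)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


# Toy cohort of ten patients with two events.
toy_probs = [0.8, 0.6, 0.3, 0.2, 0.1, 0.05, 0.05, 0.02, 0.02, 0.01]
toy_outcomes = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]
model_nb = net_benefit(toy_probs, toy_outcomes, threshold=0.25)
treat_all_nb = net_benefit_treat_all(toy_outcomes, threshold=0.25)
```

The second reference strategy, intervening on no one, has a net benefit of zero by definition, so a model is useful at a given threshold only if its net benefit exceeds both zero and the treat-all value.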

Subgroup analysis

The predictive accuracy of the aMAP-CT model in subgroups of each cohort is shown in Table 3 and Supplementary Table 6. In all three cohorts, the combined score performed well across most subgroups regardless of sex, age, and aMAP risk grades. However, because few HCC events occurred in certain subgroups (e.g., one event among females in the validation cohort and five in the test cohort), AUC and sensitivity were lower and less stable there; larger samples would be needed to address this. Notably, among aMAP-defined medium- to high-risk subgroups, time-to-event risk curves showed that the aMAP-CT score could further stratify patients into two groups with significant differences in HCC risk (Fig. 3).

Table 3

Performance of the aMAP-CT model and related subgroup analysis in the training, validation, and test cohorts

	n	HCC, n (%)	AUC	SEN	SPE	PPV	NPV	ACC	F1-score
Training cohort	809	24 (3.0)	0.869	0.792	0.789	0.103	0.992	0.789	0.182
Males	544	16 (2.9)	0.836	0.750	0.777	0.092	0.990	0.776	0.164
Females	265	8 (3.0)	0.930	0.875	0.813	0.127	0.995	0.815	0.222
Age, years
≤45	200	3 (1.5)	0.942	0.667	0.898	0.091	0.994	0.895	0.160
45–55	293	6 (2.1)	0.852	0.667	0.805	0.067	0.991	0.802	0.121
≥55	316	15 (4.8)	0.838	0.867	0.701	0.126	0.991	0.709	0.220
aMAP score
low-risk	49	1 (2.0)	1.000	1.000	0.958	0.333	1.000	0.959	0.500
medium-risk	324	5 (1.5)	0.953	0.800	0.912	0.125	0.997	0.910	0.216
high-risk	436	18 (4.1)	0.809	0.778	0.675	0.093	0.986	0.679	0.167
Validation cohort	348	11 (3.2)	0.809	0.727	0.780	0.098	0.989	0.779	0.172
Males	259	10 (3.9)	0.801	0.727	0.767	0.121	0.985	0.765	0.208
Females	89	1 (1.1)	0.352	0.000	0.818	0.000	0.986	0.809	n.a.
Age, years
≤45	96	2 (2.1)	0.718	0.500	0.883	0.083	0.988	0.875	0.143
45–55	136	4 (2.9)	0.780	0.500	0.818	0.077	0.982	0.809	0.133
≥55	116	5 (4.3)	0.877	1.000	0.649	0.114	1.000	0.664	0.204
aMAP score
low-risk	16	1 (6.3)	0.933	1.000	0.867	0.333	1.000	0.875	0.500
medium-risk	157	3 (1.9)	0.742	0.333	0.922	0.077	0.986	0.911	0.125
high-risk	175	7 (4.0)	0.816	0.857	0.643	0.091	0.991	0.651	0.164
Test cohort	1,254	83 (6.6)	0.815	0.602	0.878	0.259	0.969	0.860	0.362
Males	1,075	78 (7.3)	0.826	0.628	0.876	0.283	0.968	0.858	0.390
Females	179	5 (2.8)	0.640	0.200	0.891	0.050	0.975	0.872	0.080
Age, years
≤45	479	16 (3.3)	0.734	0.375	0.948	0.200	0.978	0.929	0.261
45–55	477	34 (7.1)	0.881	0.706	0.871	0.296	0.975	0.860	0.417
≥55	298	33 (11.1)	0.718	0.606	0.766	0.244	0.940	0.748	0.348
aMAP score
low-risk	180	5 (2.8)	0.725	0.000	0.989	0.000	0.972	0.961	n.a.
medium-risk	671	26 (3.9)	0.803	0.423	0.953	0.268	0.976	0.933	0.328
high-risk	403	52 (12.9)	0.765	0.750	0.684	0.260	0.949	0.692	0.386

SEN, sensitivity; SPE, specificity; PPV, positive predictive value; NPV, negative predictive value; ACC, accuracy; HCC, hepatocellular carcinoma; aMAP, age-male-ALBI-platelet; CT, computed tomography.


Fig. 3 Cumulative incidence of HCC in aMAP-defined medium- to high-risk patients in the training (A), validation (B), and test (C) cohorts stratified by the aMAP-CT model.

HCC, hepatocellular carcinoma; aMAP, age-male-ALBI-platelet; CT, computed tomography.

Comparison of the predictive performance of the aMAP-CT model with other existing HCC risk scores

Existing HCC risk scores, including aMAP-2, aMAP-2 Plus, CU-HCC, LSM-HCC, PAGE-B, mPAGE-B, and THRI, were calculated for all patients. Compared with the aMAP-2 Plus score, the aMAP-CT model showed no significant difference in terms of AUC values (p > 0.1) and sensitivity (p > 0.01) for predicting HCC occurrence within 18 months after enrollment. Furthermore, the aMAP-CT score demonstrated superior performance in predicting HCC risk compared to the other scores mentioned above, with significantly higher AUC and sensitivity values (Table 4; Supplementary Table 7).

Table 4

Comparison of the AUC values of the aMAP-CT model with other existing HCC risk scores in predicting HCC development among each cohort

Model	LSM-HCC	CU-HCC	PAGE-B	mPAGE-B	THRI
Training cohort	0.512 (0.394, 0.632)*	0.544 (0.408, 0.647)*	0.594 (0.470, 0.707)*	0.614 (0.495, 0.733)*	0.694 (0.575, 0.816)*
Validation cohort	0.450 (0.287, 0.616)*	0.572 (0.394, 0.748)	0.748 (0.569, 0.892)	0.739 (0.541, 0.897)	0.715 (0.534, 0.882)
Test cohort	0.350 (0.293, 0.402)*	0.651 (0.594, 0.729)*	0.676 (0.618, 0.734)*	0.671 (0.608, 0.736)*	0.688 (0.627, 0.763)*

Model	aMAP	aMAP-2	aMAP-2 Plus (18 months)^a	aMAP-CT (18 months)^b	aMAP-CT
Training cohort	0.643 (0.517, 0.757)*	0.782 (0.689, 0.865)	0.943 (0.894, 0.979)#	0.882 (0.809, 0.954)	0.869 (0.789, 0.931)
Validation cohort	0.686 (0.472, 0.873)	0.649 (0.437, 0.833)	0.773 (0.634, 0.890)#	0.824 (0.686, 0.951)	0.809 (0.686, 0.927)
Test cohort	0.692 (0.630, 0.750)*	0.759 (0.702, 0.809)	0.922 (0.886, 0.950)#	0.897 (0.863, 0.932)	0.815 (0.762, 0.868)

^acfDNA signatures were only available within the first 12 months after enrollment in the PreCar cohort; thus, the performance of the aMAP-2 Plus score was evaluated for HCC risk within 18 months. ^bFor comparison with aMAP-2 Plus, aMAP-CT was assessed for HCC risk over the same 18-month period. *p-value (vs. aMAP-CT score) < 0.05 (DeLong test). #p-value (vs. aMAP-CT score) > 0.1 (DeLong test). AUC, the area under the receiver operating characteristic curve; HCC, hepatocellular carcinoma; aMAP, age-male-ALBI-platelet; CT, computed tomography.

Stepwise application of aMAP and aMAP-CT

Considering cost-effectiveness, we adopted a stepwise approach using the aMAP score and the aMAP-CT score (aMAP → aMAP-CT) (Fig. 4). This approach was designed to achieve two key objectives: (1) to further refine the identification of super high-risk patients for more intensive monitoring, and (2) to exclude low-risk individuals who only require routine screening. Specifically, the aMAP-CT model stratified the medium- and high-risk groups initially identified by the aMAP score, pinpointing a subset of individuals at super high risk for HCC.

Fig. 4 Stepwise application of aMAP → aMAP-CT.

(A) Sankey plot for stepwise application. (B) Time-to-event risk analysis of HCC incidence of high- and low-risk groups among the overall cohort classified by the stepwise application of aMAP → aMAP-CT (compared using the log-rank test). HCC, hepatocellular carcinoma; aMAP, age-male-ALBI-platelet; CT, computed tomography.

Figure 4A illustrates the reclassification of patients using the stepwise approach, emphasizing the additional value provided by the aMAP-CT model in enhancing risk stratification. Figure 4B depicts the cumulative incidence of HCC in the reclassified groups. Notably, the stepwise application enriched 169 individuals, accounting for only 7% of the cohort, into the super high-risk group, who exhibited a significantly higher three-year HCC incidence of 27.2%, compared to 1.8% in the low-risk group (p < 0.0001).
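The stepwise rule described above can be expressed as a two-stage triage. The sketch below assumes that the aMAP low-risk band (<50, as in Table 1) exits at the first step and that the aMAP-CT cut-off of 0.37 reported for risk stratification is reused at the second step; the text does not spell out the second assumption, and the function name and the handling of scores exactly at the cut-off are hypothetical.

```python
def stepwise_risk(amap_score, amap_ct_score=None, amap_low_cutoff=50.0, ct_cutoff=0.37):
    """Two-step triage: aMAP first, then aMAP-CT for medium/high-risk patients."""
    # Step 1: aMAP low-risk patients stay on routine surveillance without CT scoring.
    if amap_score < amap_low_cutoff:
        return "low-risk"
    # Step 2: aMAP medium/high-risk patients are re-stratified by aMAP-CT.
    if amap_ct_score is None:
        raise ValueError("aMAP-CT score is required for aMAP medium/high-risk patients")
    return "super high-risk" if amap_ct_score > ct_cutoff else "low-risk"


# Hypothetical patients.
patient_a = stepwise_risk(45.0)        # exits at step 1
patient_b = stepwise_risk(62.0, 0.50)  # flagged at step 2
patient_c = stepwise_risk(55.0, 0.10)  # cleared at step 2
```

Structuring the rule this way means the CT-based score only needs to be computed for patients who are not already classified as low-risk by the clinical score alone.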

Discussion

In this nationwide, multicenter study, we developed and externally validated the aMAP-CT model for HCC risk prediction by integrating liver and spleen CT image signatures with the aMAP model, using data from 2,411 cirrhosis patients across 11 centers in mainland China. Adding both liver and spleen image signatures enhanced robustness and patient stratification. The stepwise application of the aMAP and aMAP-CT scores improved cost-effectiveness by enriching a more targeted population at higher risk for intensive HCC surveillance. To our knowledge, this is the first HCC risk model to incorporate liver and spleen CT image signatures, thus supporting more precise screening strategies.

As our outcome of interest is HCC development among cirrhotic patients, the aMAP-CT model is designed to capture features truly predictive of HCC occurrence rather than merely reflecting cirrhosis severity. The aMAP-CT model achieved an AUC of 0.809–0.869, outperforming the aMAP score. It serves as an alternative to the cfDNA-dependent aMAP-2 Plus, which faces cost and availability limitations.^9,10 All model training and tuning relied exclusively on heterogeneous data from multiple centers, strengthening generalizability and minimizing overfitting. Interestingly, the single-center test cohort performed at least as well as the multicenter validation cohort (AUC 0.815 vs. 0.809), although not as well as the training cohort (0.869); this is likely due to differences in data distribution rather than data leakage. The training set, derived from multiple centers, enhances generalizability but may introduce noise or spurious patterns, while the more homogeneous test set from one center exhibited less noise and a more balanced distribution, which may explain its stable performance.

Although CT is not routinely used as a screening tool for cirrhotic patients, they often undergo CT scans for various clinical reasons. Clinical guidelines recognize CT as a superior modality for evaluating liver size, cirrhosis progression, and screening high-risk patients for HCC, particularly those with virus-related cirrhosis.²¹ Research has also shown that CT provides critical information for assessing complications such as portal vein thrombosis and evaluating the risk of upper gastrointestinal bleeding and liver venous pressure gradients non-invasively.^22–24 Our study demonstrates that a single CT scan can accurately assess HCC risk, making it a more practical and cost-effective alternative to aMAP-2 Plus.

In addition, CT imaging offers several other advantages when combined with AI, which improves the detection of microscopic lesions. More importantly, AI can recognize subtle anomalies that are imperceptible to human readers, allowing for the prediction of disease progression or treatment response.^25,26 The ALARM model, developed by our team using similar CT-based techniques, accurately predicts HCC onset three to twelve months in advance, supporting the value of integrating AI and medical imaging through unique pattern recognition capabilities.²⁷ This synergy of AI and CT enables personalized treatment planning, accelerates radiologic workflows, and improves intervention timing.

Beyond liver image signatures, our study also incorporated spleen information, an often underappreciated but critical factor in hepatocarcinogenesis. Beyond functioning as a reservoir of immune cells, alterations in the splenic immune microenvironment, driven by chronic inflammation and portal hypertension, can contribute to tumor-promoting systemic immunosuppression.^28–30 Both clinical and experimental evidence suggest that spleen-derived immune modulation influences HCC progression and therapeutic responsiveness.^31–37 AI-extracted spleen features can predict HCC development and have been associated with late recurrence after curative-intent resection in patients with HCC and cirrhosis.^38,39 In our study, overfitting observed in the model with only aMAP and liver signatures (aMAP + liver model) (AUC 0.853 in the training cohort vs. 0.691 in the validation cohort) was mitigated by including spleen signatures (AUC 0.869 in the training cohort vs. 0.809 in the validation cohort) (Fig. 1). Time-to-event risk analysis further confirmed that the aMAP-CT model more effectively identifies high-risk patients, highlighting the enhanced robustness and predictive power achieved through integrating spleen signatures.

The radiomic features selected in this study demonstrated potential associations with key biological processes of HCC, thereby enhancing the model’s pathophysiological interpretability. For instance, liver features such as Coarseness and Gray Level Non-Uniformity reflect increased parenchymal texture heterogeneity and spatial heterogeneity of intrahepatic angiogenesis. Among the selected spleen features, Skewness represents asymmetry in the gray-level distribution, suggesting tissue remodeling potentially caused by chronic portal hypertension. Although no existing literature directly links the radiomic features identified in our study to inflammation or portal hypertension, prior studies have indirectly demonstrated associations between CT-based spleen/liver texture features and these pathological processes.^22,40 While splenic volume is a recognized prognostic factor in HCC,³⁹ it was not retained in our LASSO-selected feature set. Additional analysis including splenic volume (Supplementary Table 8) revealed that its inclusion did not significantly enhance model performance, suggesting that volume-related information may have been implicitly captured by deep learning features.

Considering cost-effectiveness, we applied a stepwise analysis (aMAP → aMAP-CT) to stratify cirrhosis patients into two groups. This strategy narrowed the high-risk group to only 7.0% of the overall cirrhosis population while raising its annual HCC incidence to 13.2%. In contrast, the low-risk group constituted 93.0% of the overall cirrhosis population and had an annual HCC incidence of just 0.8%. Patients classified as low-risk can continue routine surveillance with US and alpha-fetoprotein testing every six months, whereas super-high-risk individuals require intensified surveillance strategies. The ideal monitoring interval (e.g., every three months), the alternating use of contrast-enhanced MRI and CT, and the best combination of serum biomarkers remain to be determined through well-designed prospective randomized controlled trials. This approach optimizes resource allocation by focusing intensive surveillance and care on the small high-risk subset while minimizing unnecessary interventions for low-risk patients. At the same time, HCC risk evolves dynamically with disease progression, so low-risk patients who may progress over time still need to be monitored. To address this, it is essential to adhere to standard monitoring protocols for low-risk populations, integrate dynamic mechanisms to update risk factors, and develop short-term warning models to complement long-term predictions.
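The enrichment achieved by the stepwise strategy can be illustrated with a back-of-the-envelope calculation from the group shares and annual incidences quoted above. This sketch assumes that the two groups partition the whole cirrhosis population and that each incidence applies uniformly within its group. The cohort size of 1,000 is hypothetical, and the derived quantities (overall incidence, share of cases in the high-risk group) are not figures reported in the text.

```python
# Group shares and annual HCC incidences as quoted in the text.
groups = {
    "high-risk": {"share": 0.070, "annual_incidence": 0.132},
    "low-risk": {"share": 0.930, "annual_incidence": 0.008},
}


def expected_cases(cohort_size, group):
    """Expected new HCC cases per year contributed by one risk group."""
    return cohort_size * group["share"] * group["annual_incidence"]


cohort_size = 1000  # hypothetical cohort, for illustration only
cases = {name: expected_cases(cohort_size, g) for name, g in groups.items()}
total_cases = sum(cases.values())

overall_incidence = total_cases / cohort_size  # implied population-wide rate
high_risk_case_share = cases["high-risk"] / total_cases
enrichment = groups["high-risk"]["annual_incidence"] / overall_incidence

print(f"Implied overall annual incidence: {overall_incidence:.2%}")
print(f"Share of expected cases in the high-risk group: {high_risk_case_share:.1%}")
print(f"Incidence enrichment in the high-risk group: {enrichment:.1f}x")
```

Under these assumptions, roughly 55% of expected annual HCC cases would arise in the 7.0% of patients flagged as high-risk, which is the rationale for concentrating intensified surveillance on that subset.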

The key strength and innovation of our study lie in integrating liver and spleen CT image signatures to introduce a novel visual capability to traditional models, substantially enhancing their performance and unlocking new potential for early detection and timely intervention. However, several limitations should be acknowledged. First, this study focused on cirrhosis patients within the Chinese population, primarily those with HBV-related cirrhosis, reflecting the national epidemiological profile. Further validation in non-HBV-dominant populations, such as those with MASLD or HCV, is therefore an important direction for future research. Second, although patients from 11 different institutions were included to assess reproducibility, the retrospective design made potential selection bias unavoidable. Prospective studies will be crucial to verify the model’s performance in the future.

Third, while variability in acquisition parameters across different CT scanners may have contributed to the generalizability of our model, scanner-induced variability in radiomic features remains a limitation. Although standardized preprocessing and resampling steps were applied, more advanced harmonization methods, such as ComBat, have been shown to effectively reduce scanner-related bias in a multicenter radiomics study.⁴¹ Future research will explore the implementation and comparison of such harmonization frameworks to further minimize variability. Fourth, although the aMAP-CT model is built on established radiomics and deep learning techniques, it lacks significant algorithmic innovations. Future work will focus on integrating advanced algorithms to enhance feature extraction, address overfitting, and improve predictive accuracy to ensure broader applicability in diverse populations. Looking ahead, developing software or online tools that integrate radiomics and deep learning for broader population analysis will also be necessary.
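The idea behind scanner harmonization can be sketched with a deliberately simplified location-scale adjustment: each feature value is standardized within its scanner group and then mapped back to the pooled mean and standard deviation. This is not ComBat itself, which additionally uses empirical Bayes estimation of batch effects and can preserve biological covariates. It is also not the preprocessing used in this study. The function name, scanner labels, and feature values below are hypothetical.

```python
from collections import defaultdict
from statistics import fmean, pstdev


def harmonize_by_scanner(values, scanners):
    """Remove per-scanner shifts in mean and spread for a single radiomic feature.

    Simplified location-scale adjustment (NOT full ComBat): z-score within each
    scanner group, then rescale to the pooled mean and standard deviation.
    """
    pooled_mean, pooled_sd = fmean(values), pstdev(values)
    grouped = defaultdict(list)
    for value, scanner in zip(values, scanners):
        grouped[scanner].append(value)
    stats = {s: (fmean(g), pstdev(g)) for s, g in grouped.items()}

    harmonized = []
    for value, scanner in zip(values, scanners):
        mean, sd = stats[scanner]
        z = (value - mean) / sd if sd > 0 else 0.0
        harmonized.append(pooled_mean + pooled_sd * z)
    return harmonized


# Toy feature values from two hypothetical scanners with a systematic offset.
feature = [10.0, 12.0, 14.0, 20.0, 22.0, 24.0]
scanner = ["A", "A", "A", "B", "B", "B"]
adjusted = harmonize_by_scanner(feature, scanner)
print(adjusted)
```

After adjustment, both scanner groups share the same mean and spread. A caveat of this naive approach is that it also removes genuine differences in case mix between centers, which is one reason covariate-aware methods such as ComBat are preferred in practice.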

Conclusions

Incorporating liver and spleen image signatures into the aMAP score using AI techniques offers a more accessible and superior approach for individualized HCC risk prediction in cirrhosis patients. The stepwise application of the aMAP and aMAP-CT scores enhances enrichment strategies, effectively identifying 7% of cirrhosis patients at very high risk for HCC. This method provides a powerful tool for guiding individualized HCC surveillance, potentially improving early detection and patient outcomes.

Supporting information

Supplementary File 1

Supplementary Methods.

(DOCX)
