Introduction
Hepatocellular carcinoma (HCC) is a major global health challenge and ranks as the third leading cause of cancer-related deaths worldwide.1 Most HCC cases develop in the context of liver cirrhosis. Regular surveillance for these patients enables early detection, diagnosis, and treatment, which enhances treatment efficacy and reduces mortality. However, current HCC surveillance strategies for cirrhotic patients, which rely on biannual ultrasound (US) and alpha-fetoprotein testing, have limited sensitivity, missing one-third of early-stage HCC cases.2,3 This highlights the need for more effective risk stratification approaches to identify high-risk cirrhotic individuals and optimize monitoring, thereby improving the cost-effectiveness of HCC screening programs.
Current risk stratification models for predicting HCC have made significant progress.4–8 Our group developed the age-male-ALBI-platelet (aMAP) score for HCC risk prediction using data from 11 global prospective cohorts of individuals with chronic hepatitis.9 However, the aMAP score, like other predictive models, faces challenges, particularly in patients with cirrhosis, where its performance is diminished (C-index of 0.74). To address this, our team recently developed the aMAP-2 Plus, which utilizes cell-free DNA (cfDNA) and has demonstrated excellent predictive performance in patients with cirrhosis.10 Despite its promise, the aMAP-2 Plus faces challenges due to the limited availability and high costs of cfDNA, potentially restricting its practical utility.11,12 Therefore, efforts should focus on developing a non-invasive model using more accessible and advanced biomarkers as substitutes for cfDNA, thereby meeting the early-warning needs of cirrhosis patients.
Emerging imaging-based surveillance strategies show promise by capturing comprehensive gene expression patterns through advanced medical imaging modalities.13 Artificial intelligence (AI), particularly deep learning, a subset of machine learning, enables computers to learn from medical images, identify hidden patterns, and assist clinicians in the diagnosis and prognosis of liver disease.14 We hypothesize that a biomarker based on image signatures extracted through radiomics and deep learning could significantly enhance stratification performance. Integrating these advanced techniques into the aMAP score could provide more accurate and individualized risk assessments.
Given the potential of imaging biomarkers, selecting the optimal imaging modality, whether US, computed tomography (CT), or magnetic resonance imaging (MRI), is critical for generating reliable indicators. A meta-analysis of prospective cohorts shows that biphasic CT has significantly higher sensitivity than US for detecting very early-stage HCC, as US is highly operator-dependent.15,16 Additionally, compared to MRI, CT is a more commonly used tool for diagnosing cirrhosis, assessing decompensation, and evaluating the risk of HCC progression.17
Therefore, we aimed to develop and validate a non-invasive HCC risk predictive model for cirrhosis patients by integrating liver and spleen CT image signatures utilizing AI technology into the aMAP score based on a nationwide cohort.
Methods
This study followed the CLEAR checklist to ensure comprehensive and standardized reporting.18 The study was approved by the Ethics Committee of Nanfang Hospital (approval number: NFEC-2018-101) and was conducted in accordance with the guidelines of the Declaration of Helsinki. Patient informed consent was waived given the retrospective design, and all data were de-identified.
Study population
This retrospective study was based on a prospective multicenter observational cirrhotic cohort in China (PreCar cohort, NCT03588442). In this cohort, 4,692 adults with cirrhosis were enrolled from June 2018 to January 2020 at 16 centers across 11 provinces in China. The main etiology of cirrhosis was chronic hepatitis B virus (HBV) infection, and all HBV-infected patients received antiviral therapy during the follow-up period. Upon enrollment, all patients underwent contrast-enhanced CT or MRI according to protocol to rule out pre-existing HCC. Diagnoses of cirrhosis and HCC were based on standard histological and/or compatible radiological findings. For detailed information, please refer to the Supplementary File 1. Subsequently, all patients underwent biannual protocol follow-up.10,19
For this study, we excluded patients who met any of the following criteria: (1) loss to follow-up or tumorigenesis within 3 months before/after enrollment, or uncertain outcomes; (2) lack of available CT images at enrollment; (3) incomplete clinical data; (4) poor-quality or incomplete CT images; and (5) history of splenectomy. Finally, patients from 11 centers in the PreCar cohort were included (Supplementary Table 1).
Data and modeling
To ensure generalizability, patients from multiple centers were divided into training and validation cohorts (7:3 ratio), while patients from the center with the largest sample size (Nanfang Hospital) were assigned to the test cohort. Liver and spleen CT images from the arterial, venous, and delayed phases were retrieved, preprocessed, and normalized to enhance image consistency. Image characteristics across different centers are detailed in Supplementary Table 2. Clinical data, specifically the aMAP score, were calculated at enrollment. Segmentation of the liver and spleen was achieved using nnU-Net,20 employing a two-step process of pre-training and formal training for accurate delineation of regions of interest. Radiomics and deep learning features (Supplementary Fig. 1) were extracted from the CT images, with least absolute shrinkage and selection operator (LASSO) regression applied for feature selection. Radiomics feature extraction followed the Image Biomarker Standardisation Initiative guidelines (details in Supplementary Table 3). Logistic regression was used to construct the image signature score, which was then combined with the aMAP model through another logistic regression to develop the aMAP-CT model. Model evaluation encompassed the area under the receiver operating characteristic curve (AUC), net reclassification improvement, calibration, subgroup analyses, decision curve analysis, and comparisons with existing scores to assess whether the aMAP-CT model offered superior performance. Further details on data preparation, segmentation methodology, feature extraction, and model evaluation are available in Supplementary File 1.
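The cohort assignment described above can be sketched as follows. This is a minimal illustration only: the record layout, the center labels, the function name `split_by_center`, and the random seed are hypothetical, and the text does not say whether the 7:3 split was stratified by outcome, so a plain random split is assumed here.

```python
import random


def split_by_center(patients, test_center, train_fraction=0.7, seed=0):
    """Hold out one center as the test cohort; split the rest into train/validation.

    Assumption: a simple random split (the source does not state whether
    the 7:3 split was stratified).
    """
    test = [p for p in patients if p["center"] == test_center]
    rest = [p for p in patients if p["center"] != test_center]
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    rng.shuffle(rest)
    n_train = round(train_fraction * len(rest))
    return rest[:n_train], rest[n_train:], test


# Hypothetical toy records: 10 patients from centers 01-10, 5 from center 11.
patients = [{"id": i, "center": "center_%02d" % (i % 10 + 1)} for i in range(10)]
patients += [{"id": 100 + i, "center": "center_11"} for i in range(5)]

train, validation, test = split_by_center(patients, test_center="center_11")
print(len(train), len(validation), len(test))
```

Holding out an entire center, rather than sampling test patients from every center, means no scanner or site in the test cohort is seen during training.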
Statistical analysis
Statistical analyses were conducted using R software (version 4.3.0; http://www.r-project.org) and Python (version 3.9; https://www.python.org). Descriptive results are presented as medians (IQR) for continuous variables and as numbers (percentages) for categorical data. Patient characteristics at enrollment were compared among the three subsets of the cohort using the Kruskal–Wallis H test for continuous variables and the chi-squared test for categorical variables. All statistical tests were two-sided, with p < 0.01 considered statistically significant unless otherwise specified.
Results
Patient characteristics
A total of 2,411 patients from 11 centers in the PreCar cohort were included after excluding those without definitive outcomes or available CT images, along with other exclusion criteria (Supplementary Fig. 2). All patients had confirmed cirrhosis. Chronic HBV infection was the main etiology, accounting for 91.5% of cases. During a median follow-up of 42.7 (IQR 32.9–54.1) months, 118 patients developed HCC, with a three-year cumulative incidence of 3.59% (Supplementary Fig. 3). The clinical characteristics at enrollment are shown in Table 1. The three cohorts had similar distributions of clinical features.
Table 1. Clinical characteristics of the patients at enrollment
Characteristics | Overall | Training cohort | Validation cohort | Test cohort | p-value |
---|---|---|---|---|---|
Total patients, n | 2,411 | 809 | 348 | 1,254 | - |
Follow-up time, months | 42.7 [32.9, 54.1] | 33.2 [15.9, 37.2] | 33.8 [15.8, 37.3] | 53.8 [50.0, 58.0] | 0.981 |
Age, years | 49.67 [42.8, 56.3] | 52.48 [45.2, 59.5] | 50.52 [44.3, 57.1] | 47.74 [41.1, 54.6] | 0.057 |
Male, n (%) | 1,879 (77.9) | 544 (67.2) | 260 (74.7) | 1,075 (85.7) | 0.014 |
Etiology, n (%) | | | | | 0.010 |
HBV | 2,206 (91.5) | 685 (84.7) | 315 (90.5) | 1,206 (96.2) | |
Otherᵃ | 205 (8.5) | 124 (15.3) | 33 (9.5) | 48 (3.8) | |
ALT, IU/L | 28.00 [21.0, 40.0] | 28.00 [20.0, 41.0] | 27.00 [20.0, 39.1] | 29.00 [21.0, 40.0] | 0.522 |
TBIL, µmol/L | 16.30 [12.0, 24.0] | 17.19 [12.7, 26.0] | 17.00 [12.9, 23.7] | 15.50 [11.1, 22.4] | 0.699 |
Albumin, g/L | 43.40 [39.9, 46.3] | 43.00 [38.9, 46.0] | 43.25 [38.9, 46.3] | 43.70 [40.7, 46.4] | 0.363 |
PLT, ×10³/mm³ | 115.00 [77.0, 156.0] | 105.00 [71.0, 145.0] | 102.00 [74.8, 148.3] | 125.00 [83.0, 165.8] | 0.916 |
AFP, ng/ml | 2.93 [1.8, 5.2] | 3.22 [2.1, 5.81] | 3.10 [2.0, 4.9] | 2.66 [1.6, 4.8] | 0.153 |
aMAP scoreᵇ | 58.70 [54.2, 63.3] | 60.66 [56.6, 65.2] | 60.07 [56.4, 65.2] | 56.89 [52.4, 61.5] | 0.022 |
aMAP HCC risk, n (%) | | | | | 0.036 |
Low-risk (<50) | 245/2,411 (10.2) | 49/809 (6.1) | 16/348 (4.6) | 180/1,254 (14.4) | |
Medium-risk (50–60) | 1,152/2,411 (47.8) | 324/809 (40.1) | 157/348 (45.1) | 671/1,254 (53.5) | |
High-risk (>60) | 1,014/2,411 (42.1) | 436/809 (53.9) | 175/348 (50.3) | 403/1,254 (32.1) | |
HCC cases during follow-up, n (%) | 118 (4.9) | 24 (3.0) | 11 (3.2) | 83 (6.6) | <0.001 |
Model construction
Based on the state-of-the-art nnU-Net, the Dice scores for liver and spleen segmentation reached 0.974 and 0.979, respectively, during formal training. Original images and masks were cropped to the maximal 3D segmentation dimensions (Supplementary Fig. 4).
After segmentation and preprocessing, the image signature score was constructed. A total of 8,184 features were extracted, including 2,556 radiomics features and 1,536 deep features for the liver, and 2,556 and 1,536, respectively, for the spleen. Subsequently, using LASSO regression models (with three-fold cross-validation), the optimal features with non-zero weights were selected (Supplementary Fig. 5). Using logistic regression, the selected features were quantitatively integrated into the image signature score. The features and their corresponding coefficients are shown in Supplementary Table 4.
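To illustrate how LASSO drives the weights of uninformative features to exactly zero, here is a toy coordinate-descent sketch in plain Python. It is not the authors' pipeline: the penalty `lam` is fixed by hand rather than chosen by three-fold cross-validation, there is no intercept (the data are assumed centred), and the two-feature dataset is hypothetical.

```python
def soft_threshold(value, lam):
    """Shrink value toward zero by lam; return exactly 0.0 inside [-lam, lam]."""
    if value > lam:
        return value - lam
    if value < -lam:
        return value + lam
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=100):
    """Minimise (1/2n)*||y - Xw||^2 + lam*||w||_1 (no intercept; assumes centred data)."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    residual = list(y)  # equals y - Xw while w is all zeros
    for _ in range(n_iter):
        for j in range(p):
            col = [row[j] for row in X]
            z = sum(v * v for v in col) / n
            if z == 0.0:
                continue  # constant-zero column carries no information
            # Correlation of feature j with the residual that excludes its own contribution.
            rho = sum(col[i] * (residual[i] + col[i] * w[j]) for i in range(n)) / n
            new_wj = soft_threshold(rho, lam) / z
            delta = new_wj - w[j]
            residual = [residual[i] - col[i] * delta for i in range(n)]
            w[j] = new_wj
    return w


# Hypothetical data: y depends strongly on feature 0 and only weakly on feature 1.
X = [[1.0, 1.0], [-1.0, 1.0], [1.0, -1.0], [-1.0, -1.0]]
y = [2.05, -1.95, 1.95, -2.05]

weights = lasso_coordinate_descent(X, y, lam=0.1)
selected = [j for j, wj in enumerate(weights) if wj != 0.0]
print(weights, selected)
```

Features whose weight survives the soft-thresholding step play the role of the "features with non-zero weights" retained for the image signature score.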
Subsequently, the CT image signature scores for both liver and spleen were added to the aMAP model using logistic regression (termed the aMAP-CT model), resulting in the final formula: aMAP-CT score = 0.52 × CT image score + 1.07 × aMAP score − 4.26.
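As a worked illustration of the formula, the sketch below evaluates the linear combination for one hypothetical patient. The input values are invented, and the scale on which the CT image score and the aMAP score enter the formula is not specified in the text. Applying the logistic function treats the score as the linear predictor of the logistic regression, which is an assumption; the text also does not state whether the 0.37 cut-off applies to the score or to a probability.

```python
import math


def amap_ct_score(ct_image_score, amap_score):
    """Linear combination reported in the text (input scaling is an assumption)."""
    return 0.52 * ct_image_score + 1.07 * amap_score - 4.26


def logistic(x):
    """Standard logistic function; treating the score as a logit is an assumption."""
    return 1.0 / (1.0 + math.exp(-x))


# Hypothetical inputs, chosen only to show the arithmetic.
score = amap_ct_score(ct_image_score=1.0, amap_score=3.0)
probability = logistic(score)
print(round(score, 2), round(probability, 3))
```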
Discrimination and calibration performance of the model
The aMAP-CT score demonstrated superior discrimination performance across all three cohorts. It achieved an AUC of 0.869 (95% confidence interval (CI), 0.789–0.931) in the training cohort, 0.809 (95% CI, 0.686–0.927) in the validation cohort, and 0.815 (95% CI, 0.762–0.868) in the test cohort, all significantly higher than those of the aMAP model and the models involving only aMAP and liver signatures (Fig. 1). This enhancement was further supported by net reclassification improvement values of 0.41 (95% CI, 0.21–0.60) in the training cohort, 0.06 (95% CI, 0.00–0.16) in the validation cohort, and 0.40 (95% CI, 0.27–0.50) in the test cohort, all with p-values < 0.05. The sensitivity, specificity, positive predictive value, negative predictive value, accuracy, and F1-score of the aMAP-CT model were also satisfactory (Table 2). Additionally, the calibration curve showed excellent agreement between predicted and observed probabilities for HCC development across all cohorts (Supplementary Fig. 6).
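For readers unfamiliar with the AUC, the sketch below computes it through its Mann–Whitney interpretation: the probability that a randomly chosen patient who developed HCC has a higher score than a randomly chosen patient who did not, with ties counted as one half. The scores and labels are hypothetical, and this pairwise version is meant for illustration rather than for large cohorts.

```python
def auc_mann_whitney(scores, labels):
    """AUC as the fraction of (case, non-case) pairs ranked correctly; ties count 0.5."""
    cases = [s for s, label in zip(scores, labels) if label == 1]
    controls = [s for s, label in zip(scores, labels) if label == 0]
    if not cases or not controls:
        raise ValueError("AUC is undefined without at least one case and one non-case")
    concordant = 0.0
    for case_score in cases:
        for control_score in controls:
            if case_score > control_score:
                concordant += 1.0
            elif case_score == control_score:
                concordant += 0.5
    return concordant / (len(cases) * len(controls))


# Hypothetical scores: 1 = developed HCC, 0 = did not.
scores = [0.9, 0.8, 0.4, 0.3]
labels = [1, 0, 1, 0]
auc = auc_mann_whitney(scores, labels)
print(auc)  # 3 of 4 case/non-case pairs are ranked correctly -> 0.75
```

Because the estimate is built from case/non-case pairs, subgroups with very few HCC events yield unstable values.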
Table 2. Performance evaluation of the aMAP-CT model
Cohort | n | HCC, n (%) | AUC | SEN | SPE | PPV | NPV | ACC | F1-score |
---|---|---|---|---|---|---|---|---|---|
Training cohort | 809 | 24 (3.0) | 0.869 [0.789, 0.931] | 0.792 [0.663, 0.885] | 0.789 [0.756, 0.820] | 0.103 [0.070, 0.150] | 0.992 [0.984, 0.997] | 0.789 [0.759, 0.818] | 0.182 [0.122, 0.242] |
Validation cohort | 348 | 11 (3.2) | 0.809 [0.686, 0.927] | 0.727 [0.601, 0.853] | 0.780 [0.713, 0.847] | 0.098 [0.079, 0.117] | 0.989 [0.973, 1.000] | 0.779 [0.712, 0.846] | 0.172 [0.144, 0.200] |
Test cohort | 1,254 | 83 (6.6) | 0.815 [0.762, 0.868] | 0.602 [0.511, 0.693] | 0.878 [0.803, 0.953] | 0.259 [0.220, 0.298] | 0.969 [0.951, 0.987] | 0.860 [0.785, 0.935] | 0.362 [0.318, 0.406] |
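The threshold-based metrics in Table 2 all derive from a single 2×2 confusion matrix. The sketch below shows the arithmetic using counts for the test cohort that are back-calculated from the reported percentages (50 true positives, 33 false negatives, 143 false positives, 1,028 true negatives). These counts are an inference from the table, not figures reported in the text.

```python
def ratio(numerator, denominator):
    """Return numerator/denominator, or None when the denominator is zero."""
    return numerator / denominator if denominator else None


def classification_metrics(tp, fp, fn, tn):
    """Threshold-based metrics from the four cells of a confusion matrix."""
    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
        "accuracy": ratio(tp + tn, tp + fp + fn + tn),
        "f1": ratio(2 * tp, 2 * tp + fp + fn),
    }


# Counts inferred from Table 2 (test cohort: 83 HCC cases among 1,254 patients).
metrics = classification_metrics(tp=50, fp=143, fn=33, tn=1028)
for name, value in metrics.items():
    print(name, round(value, 3))
```

With an event rate of a few percent, a low PPV alongside a high NPV is expected even when sensitivity and specificity are reasonable, which is the pattern seen in all three cohorts.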
HCC risk stratification based on the aMAP-CT model
Using the optimal cut-off value (0.37), patients were classified into low- and high-risk groups. In the training cohort (n = 809), 61 patients (7.5%) were classified as high-risk, while the remaining 748 (92.5%) were categorized as low-risk by the aMAP-CT model. The three-year cumulative incidence of HCC was 20.3% in the high-risk group and 2.2% in the low-risk group (p <0.0001) (Fig. 2A). Similar results were observed in the validation and test cohorts (Fig. 2B and C). There was a greater distinction between the low- and high-risk groups identified by aMAP-CT (hazard ratio (HR): 12.3; 95% CI: 5.8–26.0), compared with the aMAP score (HR: 3.1; 95% CI: 2.2–4.5), the model involving only aMAP and spleen signatures (HR: 3.9; 95% CI: 1.7–8.7), and the model involving only aMAP and liver signatures (HR: 4.0; 95% CI: 2.2–7.1) (Supplementary Figs. 7–9, Supplementary Table 5).
Decision curves were plotted to evaluate the clinical utility of models for three-year HCC risk prediction (Supplementary Fig. 10). In all three cohorts, the aMAP-CT model demonstrated superior net clinical benefit compared to the reference strategies, as evidenced by its higher overall net benefit values. The aMAP-CT model significantly outperformed the aMAP model in net clinical benefit, underscoring the value of incorporating image signatures.
Subgroup analysis
The predictive accuracy of the aMAP-CT model in subgroups of each cohort is shown in Table 3 and Supplementary Table 6. In all three cohorts, the combined score performed well across most subgroups regardless of sex, age, and aMAP risk grades. However, due to low HCC occurrence in certain subgroups (e.g., females in the validation and test cohorts), AUC and sensitivity were lower—an issue that could be addressed by collecting more data. Notably, among aMAP-defined medium- to high-risk subgroups, time-to-event risk curves showed that the aMAP-CT score could clearly further stratify patients into two groups with significant differences in HCC risk (Fig. 3).
Table 3. Performance of the aMAP-CT model and related subgroup analysis in the training, validation, and test cohorts
Subgroup | n | HCC, n (%) | AUC | SEN | SPE | PPV | NPV | ACC | F1-score |
---|---|---|---|---|---|---|---|---|---|
Training cohort | 809 | 24 (3.0) | 0.869 | 0.792 | 0.789 | 0.103 | 0.992 | 0.789 | 0.182 |
Males | 544 | 16 (2.9) | 0.836 | 0.750 | 0.777 | 0.092 | 0.990 | 0.776 | 0.164 |
Females | 265 | 8 (3.0) | 0.930 | 0.875 | 0.813 | 0.127 | 0.995 | 0.815 | 0.222 |
Age, years | | | | | | | | | |
≤45 | 200 | 3 (1.5) | 0.942 | 0.667 | 0.898 | 0.091 | 0.994 | 0.895 | 0.160 |
45–55 | 293 | 6 (2.1) | 0.852 | 0.667 | 0.805 | 0.067 | 0.991 | 0.802 | 0.121 |
≥55 | 316 | 15 (4.8) | 0.838 | 0.867 | 0.701 | 0.126 | 0.991 | 0.709 | 0.220 |
aMAP score | | | | | | | | | |
low-risk | 49 | 1 (2.0) | 1.000 | 1.000 | 0.958 | 0.333 | 1.000 | 0.959 | 0.500 |
medium-risk | 324 | 5 (1.5) | 0.953 | 0.800 | 0.912 | 0.125 | 0.997 | 0.910 | 0.216 |
high-risk | 436 | 18 (4.1) | 0.809 | 0.778 | 0.675 | 0.093 | 0.986 | 0.679 | 0.167 |
Validation cohort | 348 | 11 (3.2) | 0.809 | 0.727 | 0.780 | 0.098 | 0.989 | 0.779 | 0.172 |
Males | 259 | 10 (3.9) | 0.801 | 0.727 | 0.767 | 0.121 | 0.985 | 0.765 | 0.208 |
Females | 89 | 1 (1.1) | 0.352 | 0.000 | 0.818 | 0.000 | 0.986 | 0.809 | n.a. |
Age, years | | | | | | | | | |
≤45 | 96 | 2 (2.1) | 0.718 | 0.500 | 0.883 | 0.083 | 0.988 | 0.875 | 0.143 |
45–55 | 136 | 4 (2.9) | 0.780 | 0.500 | 0.818 | 0.077 | 0.982 | 0.809 | 0.133 |
≥55 | 116 | 5 (4.3) | 0.877 | 1.000 | 0.649 | 0.114 | 1.000 | 0.664 | 0.204 |
aMAP score | | | | | | | | | |
low-risk | 16 | 1 (6.3) | 0.933 | 1.000 | 0.867 | 0.333 | 1.000 | 0.875 | 0.500 |
medium-risk | 157 | 3 (1.9) | 0.742 | 0.333 | 0.922 | 0.077 | 0.986 | 0.911 | 0.125 |
high-risk | 175 | 7 (4.0) | 0.816 | 0.857 | 0.643 | 0.091 | 0.991 | 0.651 | 0.164 |
Test cohort | 1,254 | 83 (6.6) | 0.815 | 0.602 | 0.878 | 0.259 | 0.969 | 0.860 | 0.362 |
Males | 1,075 | 78 (7.3) | 0.826 | 0.628 | 0.876 | 0.283 | 0.968 | 0.858 | 0.390 |
Females | 179 | 5 (2.8) | 0.640 | 0.200 | 0.891 | 0.050 | 0.975 | 0.872 | 0.080 |
Age, years | | | | | | | | | |
≤45 | 479 | 16 (3.3) | 0.734 | 0.375 | 0.948 | 0.200 | 0.978 | 0.929 | 0.261 |
45–55 | 477 | 34 (7.1) | 0.881 | 0.706 | 0.871 | 0.296 | 0.975 | 0.860 | 0.417 |
≥55 | 298 | 33 (11.1) | 0.718 | 0.606 | 0.766 | 0.244 | 0.940 | 0.748 | 0.348 |
aMAP score | | | | | | | | | |
low-risk | 180 | 5 (2.8) | 0.725 | 0.000 | 0.989 | 0.000 | 0.972 | 0.961 | n.a. |
medium-risk | 671 | 26 (3.9) | 0.803 | 0.423 | 0.953 | 0.268 | 0.976 | 0.933 | 0.328 |
high-risk | 403 | 52 (12.9) | 0.765 | 0.750 | 0.684 | 0.260 | 0.949 | 0.692 | 0.386 |
Comparison of the predictive performance of the aMAP-CT model with other existing HCC risk scores
Existing HCC risk scores, including aMAP-2, aMAP-2 Plus, CU-HCC, LSM-HCC, PAGE-B, mPAGE-B, and THRI, were calculated for all patients. Compared with the aMAP-2 Plus score, the aMAP-CT model showed no significant difference in terms of AUC values (p > 0.1) and sensitivity (p > 0.01) for predicting HCC occurrence within 18 months after enrollment. Furthermore, the aMAP-CT score demonstrated superior performance in predicting HCC risk compared to the other scores mentioned above, with significantly higher AUC and sensitivity values (Table 4; Supplementary Table 7).
Table 4. Comparison of the AUC values of the aMAP-CT model with other existing HCC risk scores in predicting HCC development among each cohort
Model | LSM-HCC | CU-HCC | PAGE-B | mPAGE-B | THRI |
---|---|---|---|---|---|
Training cohort | 0.512 (0.394, 0.632)* | 0.544 (0.408, 0.647)* | 0.594 (0.470, 0.707)* | 0.614 (0.495, 0.733)* | 0.694 (0.575, 0.816)* |
Validation cohort | 0.450 (0.287, 0.616)* | 0.572 (0.394, 0.748) | 0.748 (0.569, 0.892) | 0.739 (0.541, 0.897) | 0.715 (0.534, 0.882) |
Test cohort | 0.350 (0.293, 0.402)* | 0.651 (0.594, 0.729)* | 0.676 (0.618, 0.734)* | 0.671 (0.608, 0.736)* | 0.688 (0.627, 0.763)* |
Model | aMAP | aMAP-2 | aMAP-2 Plus (18 months)ᵃ | aMAP-CT (18 months)ᵇ | aMAP-CT |
---|---|---|---|---|---|
Training cohort | 0.643 (0.517, 0.757)* | 0.782 (0.689, 0.865) | 0.943 (0.894, 0.979)# | 0.882 (0.809, 0.954) | 0.869 (0.789, 0.931) |
Validation cohort | 0.686 (0.472, 0.873) | 0.649 (0.437, 0.833) | 0.773 (0.634, 0.890)# | 0.824 (0.686, 0.951) | 0.809 (0.686, 0.927) |
Test cohort | 0.692 (0.630, 0.750)* | 0.759 (0.702, 0.809) | 0.922 (0.886, 0.950)# | 0.897 (0.863, 0.932) | 0.815 (0.762, 0.868) |
Stepwise application of aMAP and aMAP-CT
Considering cost-effectiveness, we adopted a stepwise approach using the aMAP score and the aMAP-CT score (aMAP → aMAP-CT) (Fig. 4). This approach was designed to achieve two key objectives: (1) to further refine the identification of super high-risk patients for more intensive monitoring, and (2) to exclude low-risk individuals who only require routine screening. Specifically, the aMAP-CT model stratified the medium- and high-risk groups initially identified by the aMAP score, pinpointing a subset of individuals at super high risk for HCC.
Figure 4A illustrates the reclassification of patients using the stepwise approach, emphasizing the additional value provided by the aMAP-CT model in enhancing risk stratification. Figure 4B depicts the cumulative incidence of HCC in the reclassified groups. Notably, the stepwise application enriched 169 individuals, accounting for only 7% of the cohort, into the super high-risk group, who exhibited a significantly higher three-year HCC incidence of 27.2%, compared to 1.8% in the low-risk group (p < 0.0001).
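The stepwise rule can be sketched as a two-stage decision. The aMAP bands (<50, 50–60, >60) come from Table 1 and the 0.37 cut-off from the risk stratification section. Two points are assumptions: whether a score exactly at the cut-off counts as high-risk, and the choice to fold medium- and high-risk aMAP patients who fall below the cut-off into the low-risk group. The function names are hypothetical.

```python
def amap_risk_group(amap_score):
    """aMAP bands as listed in Table 1: <50 low, 50-60 medium, >60 high."""
    if amap_score < 50:
        return "low"
    if amap_score <= 60:
        return "medium"
    return "high"


def stepwise_group(amap_score, amap_ct_score, cutoff=0.37):
    """Two-stage triage: aMAP first, then aMAP-CT for medium/high aMAP only.

    Assumptions: scores at or above the cut-off count as super high-risk,
    and everyone else is treated as low-risk.
    """
    if amap_risk_group(amap_score) == "low":
        return "low-risk"  # aMAP-CT is not needed for these patients
    if amap_ct_score >= cutoff:
        return "super high-risk"
    return "low-risk"


# Hypothetical patients: (aMAP score, aMAP-CT score).
for amap, amap_ct in [(45.0, 0.90), (55.0, 0.50), (65.0, 0.10)]:
    print(amap, amap_ct, stepwise_group(amap, amap_ct))
```

Under this rule the CT-based score is only evaluated for patients the inexpensive aMAP score has already flagged, which is where the cost saving of the stepwise design comes from.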
Discussion
In this nationwide, multicenter study, we developed and externally validated the aMAP-CT model for HCC risk prediction by integrating liver and spleen CT image signatures with the aMAP model, using data from 2,411 cirrhosis patients across 11 centers in mainland China. Adding both liver and spleen image signatures enhanced robustness and patient stratification. The stepwise application of the aMAP and aMAP-CT scores improved cost-effectiveness by enriching a more targeted population at higher risk for intensive HCC surveillance. To our knowledge, this is the first HCC risk model to incorporate liver and spleen CT image signatures, thus supporting more precise screening strategies.
As our outcome of interest is HCC development among cirrhotic patients, the aMAP-CT model is designed to capture features truly predictive of HCC occurrence rather than merely reflecting cirrhosis severity. The aMAP-CT model achieved an AUC of 0.809–0.869, outperforming the aMAP score. It serves as an alternative to the cfDNA-dependent aMAP-2 Plus, which faces cost and availability limitations.9,10 All model training and tuning relied exclusively on heterogeneous data from multiple centers, strengthening generalizability and minimizing overfitting. Interestingly, the test set outperformed both the training and validation sets in specificity, PPV, accuracy, and F1-score (Table 2), although not in AUC or sensitivity, likely due to differences in data distribution rather than data leakage. The training set, derived from multiple centers, enhances generalizability but may introduce noise or spurious patterns, while the more homogeneous test set from one center exhibited less noise and a more balanced distribution, contributing to better performance on these metrics.
Although CT is not routinely used as a screening tool for cirrhotic patients, they often undergo CT scans for various clinical reasons. Clinical guidelines recognize CT as a superior modality for evaluating liver size, cirrhosis progression, and screening high-risk patients for HCC, particularly those with virus-related cirrhosis.21 Research has also shown that CT provides critical information for assessing complications such as portal vein thrombosis and evaluating the risk of upper gastrointestinal bleeding and liver venous pressure gradients non-invasively.22–24 Our study demonstrates that a single CT scan can accurately assess HCC risk, making it a more practical and cost-effective alternative to aMAP-2 Plus.
In addition, CT imaging offers several other advantages when combined with AI, which improves the detection of microscopic lesions. More importantly, AI can recognize subtle anomalies, insights imperceptible to humans, allowing for the prediction of disease progression or treatment response.25,26 The ALARM model, developed by our team using similar CT-based techniques, accurately predicts HCC onset three to twelve months in advance, confirming the superiority of integrating AI and medical imaging through unique pattern recognition capabilities.27 This synergy of AI and CT enables personalized treatment planning, accelerates radiologic workflows, and improves intervention timing.
Beyond liver image signatures, our study also incorporated spleen information, an often underappreciated but critical factor in hepatocarcinogenesis. Beyond functioning as a reservoir of immune cells, alterations in the splenic immune microenvironment, driven by chronic inflammation and portal hypertension, can contribute to tumor-promoting systemic immunosuppression.28–30 Both clinical and experimental evidence suggest that spleen-derived immune modulation influences HCC progression and therapeutic responsiveness.31–37 AI-extracted spleen features can predict HCC development and have been associated with late recurrence after curative-intent resection in patients with HCC and cirrhosis.38,39 In our study, overfitting observed in the model with only aMAP and liver signatures (aMAP + liver model) (AUC 0.853 in the training cohort vs. 0.691 in the validation cohort) was mitigated by including spleen signatures (AUC 0.869 in the training cohort vs. 0.809 in the validation cohort) (Fig. 1). Time-to-event risk analysis further confirmed that the aMAP-CT model more effectively identifies high-risk patients, highlighting the enhanced robustness and predictive power achieved through integrating spleen signatures.
The radiomic features selected in this study demonstrated potential associations with key biological processes of HCC, thereby enhancing the model’s pathophysiological interpretability. For instance, liver features such as Coarseness and Gray Level Non-Uniformity reflect increased parenchymal texture heterogeneity and spatial heterogeneity of intrahepatic angiogenesis. Among the selected spleen features, Skewness represents asymmetry in the gray-level distribution, suggesting tissue remodeling potentially caused by chronic portal hypertension. Although no existing literature directly links the radiomic features identified in our study to inflammation or portal hypertension, prior studies have indirectly demonstrated associations between CT-based spleen/liver texture features and these pathological processes.22,40 While splenic volume is a recognized prognostic factor in HCC,39 it was not retained in our LASSO-selected feature set. Additional analysis including splenic volume (Supplementary Table 8) revealed that its inclusion did not significantly enhance model performance, suggesting that volume-related information may have been implicitly captured by deep learning features.
Considering cost-effectiveness, we applied a stepwise analysis (aMAP → aMAP-CT) to stratify cirrhosis patients into two groups. This strategy enriched the high-risk group to only 7.0% of the overall cirrhosis population, with an increased annual HCC incidence of 13.2%. In contrast, the low-risk group constituted 93.0% of the overall cirrhosis population, exhibiting an annual HCC incidence of just 0.8%. Patients classified as low-risk can continue routine surveillance with US and alpha-fetoprotein testing every six months, whereas super–high-risk individuals require intensified surveillance strategies. The ideal monitoring interval (e.g., every three months), alternating use of contrast-enhanced MRI and CT, and the best combination of serum biomarkers remain to be determined through well-designed prospective randomized controlled trials. This approach optimizes resource allocation by focusing intensive surveillance and care on the small high-risk subset while minimizing unnecessary interventions for low-risk patients. At the same time, it is important to note that HCC risk evolves dynamically with disease progression, underscoring the need to monitor low-risk patients who may progress over time. To address this, it is essential to adhere to standard monitoring protocols for low-risk populations, integrate dynamic mechanisms to update risk factors, and develop short-term warning models to complement long-term predictions.
The key strength and innovation of our study lie in integrating liver and spleen CT image signatures to introduce a novel visual capability to traditional models, substantially enhancing their performance and unlocking new potential for early detection and timely intervention. However, it is important to acknowledge its limitations. First, this study focused on cirrhosis patients within the Chinese population, primarily those with HBV-related cirrhosis due to the national epidemiological profile. Therefore, further validation in non-HBV dominant populations, such as those with MASLD or HCV, is an important direction for future research. Second, despite including patients from 11 different institutions to assess reproducibility, as a retrospective study, potential selection bias was unavoidable. Integrating prospective studies will be crucial to verify the model’s performance in the future.
Third, while variability in acquisition parameters across different CT scanners may have contributed to the generalizability of our model, scanner-induced variability in radiomic features remains a limitation. Although standardized preprocessing and resampling steps were applied, more advanced harmonization methods, such as ComBat, have been shown to effectively reduce scanner-related bias in a multicenter radiomics study.41 Future research will explore the implementation and comparison of such harmonization frameworks to further minimize variability. While the aMAP-CT model is built on established radiomics and deep learning techniques, it lacks significant algorithmic innovations. Future work will focus on integrating advanced algorithms to enhance feature extraction, address overfitting, and improve predictive accuracy to ensure broader applicability in diverse populations. Looking ahead, developing software or online tools that integrate radiomics and deep learning for broader population analysis will also be necessary.
Conclusions
Incorporating liver and spleen image signatures into the aMAP score using AI techniques offers a more accessible and superior approach for individualized HCC risk prediction in cirrhosis patients. The stepwise application of the aMAP and aMAP-CT scores enhances enrichment strategies, effectively identifying 7% of cirrhosis patients at very high risk for HCC. This method provides a powerful tool for guiding individualized HCC surveillance, potentially improving early detection and patient outcomes.
Supporting information
Supplementary File 1
Supplementary Methods.
(DOCX)
Supplementary Table 1
Details regarding patients included from the PreCar cohort. A total of 1,157 patients from multiple centers (centers 01–10) and 1,254 patients from Nanfang Hospital (center 11) were included.
(DOCX)
Supplementary Table 2
Detailed information about CT acquisition in each center.
(DOCX)
Supplementary Table 3
Details regarding radiomics feature extraction based on the Image Biomarker Standardisation Initiative (IBSI) guidelines.
(DOCX)
Supplementary Table 4
Image features selected by least absolute shrinkage and selection operator (LASSO) were integrated using logistic regression.
(DOCX)
Supplementary Table 5
Comparison of the aMAP-CT model, the model with aMAP and liver signatures (aMAP+liver), the model with aMAP and spleen signatures (aMAP+spleen), and the aMAP model in HCC risk stratification.
Abbreviations: HCC, hepatocellular carcinoma.
(DOCX)
Supplementary Table 6
Performance of the aMAP-CT model and related subgroup analysis in the training, validation, and test cohorts (with 95% confidence interval).
(DOCX)
Supplementary Table 7
Comparison of the sensitivity of the aMAP-CT model with other existing HCC risk scores in predicting HCC development among each cohort.
(DOCX)
Supplementary Table 8
Predictive performance evaluation of the model with the additional VoxelVolume variable.
(DOCX)
Supplementary Fig. 1
The structure of fine-tuned ResNet-18 model.
(TIF)
Supplementary Fig. 2
Flow diagram showing patient exclusion from the PreCar cohort. The dataset split is also shown.
Note: images with obvious artifacts were excluded as poor-quality images; incomplete images are those with CT slices missing for technical reasons. Abbreviations: HCC, hepatocellular carcinoma.
(TIF)
Supplementary Fig. 3
Cumulative incidence of HCC in the overall cohort (A), training cohort (B), validation cohort (C), and test cohort (D). Abbreviations: HCC, hepatocellular carcinoma.
(TIF)
Supplementary Fig. 4
Slices of original images and the segmentations of a sample case. Note: the original images shown have been preprocessed.
(TIF)
Supplementary Fig. 5
Feature selection using LASSO regression model.
(A) Tuning parameter (λ) selection in the LASSO model used 3-fold cross-validation via minimum criteria. The mean square error (MSE) curve was plotted versus λ. Dotted vertical lines were drawn at the optimal values by using the minimum criteria and 1 standard error of the minimum criteria (the 1-SE criteria). A λ value of 0.012 was chosen (1-SE criteria) according to 3-fold cross-validation. (B) LASSO coefficient profiles of the liver and spleen features. A coefficient profile plot was produced against the λ sequence. A vertical line was drawn at the value selected using 3-fold cross-validation, where optimal λ resulted in 9 nonzero coefficients.
(TIF)
Supplementary Fig. 6
Calibration curves of aMAP-CT score in the training cohort, validation cohort, and test cohort.
(TIF)
Supplementary Fig. 7
Time-to-event risk analysis of HCC incidence of high and low risk groups classified by aMAP score in the training (A), validation (B), and test (C) cohorts.
Abbreviations: HCC, hepatocellular carcinoma.
(TIF)
Supplementary Fig. 8
Time-to-event risk analysis of HCC incidence of high and low risk groups classified by model with aMAP score and liver image signatures (aMAP+liver) in the training (A), validation (B), and test (C) cohorts.
Abbreviations: HCC, hepatocellular carcinoma.
(TIF)
Supplementary Fig. 9
Time-to-event risk analysis of HCC incidence of high and low risk groups classified by model with aMAP score and spleen image signatures (aMAP+spleen) in the training (A), validation (B), and test (C) cohorts.
Abbreviations: HCC, hepatocellular carcinoma.
(TIF)
Supplementary Fig. 10
Decision curve analysis (DCA) of aMAP and the combined model for predicting 1-year HCC occurrence in the training (A), validation (B), and test (C) cohorts.
The y-axis measures the net benefit.
(TIF)
Declarations
Acknowledgement
The authors thank the study investigators of the PreCar cohort, as well as the coordinators, patients, and their families for their contributions. Additionally, the authors appreciate the important support provided by the Department of Medical Imaging Center at Nanfang Hospital in the organ segmentation process.
Ethical statement
The study was approved by the Ethics Committee of Nanfang Hospital (approval number: NFEC-2018-101) and was conducted in accordance with the guidelines of the Declaration of Helsinki (as revised in 2024). Patient informed consent was waived given the retrospective design, and all data were de-identified.
Data sharing statement
The data, Python, and R code used in this study are available from the corresponding authors upon reasonable request.
Funding
This work was supported by the National Key Research and Development Program of China (2022YFC2304800, 2022YFC2303600, 2023YFC2507500), the National Natural Science Foundation of China (82170610, 92359304), and the Guangdong Basic and Applied Basic Research Foundation (2023A1515011211).
Conflict of interest
JLH has received consulting fees from AbbVie, Arbutus, Bristol Myers Squibb, Gilead Sciences, Johnson & Johnson, and Roche; has received grants from Bristol Myers Squibb, Berry, and Johnson & Johnson; and has been an Executive Associate Editor of Journal of Clinical and Translational Hepatology since 2013. The other authors declare no conflicts of interest related to this work.
Authors’ contributions
Study concept and design (RF, JLH), coordination of the study (JLH, HYW, JS, RF, LC), data acquisition (CXW, YSQ, YHG, CYW, XTF, XLL, HLB, DZ, GQJ, YLY, XEL, JJC, LTD, HDY, YJG, WH, JFL, MFL, FK, JS, SHJ), statistical analysis, data interpretation, underlying data verification (RF, JLH, YRS), drafting of the manuscript (RF, YRS), and review of the manuscript (RF, LC, WFX, SHJ, HYW, JLH). All authors approved the final manuscript and had final responsibility for the decision to submit for publication.