Background and aims: Current prioritization models for liver transplantation (LT) are hampered by their linear nature, which does not fully capture the severity of illness in patients with extreme analytical values. We aimed to develop and externally validate the Gender-Equity Model for Liver Allocation built on Artificial Intelligence (GEMA-AI) to predict waiting list outcomes in candidates for LT.

Method: Cohort study including adult patients who qualified for elective LT in the United Kingdom (2010–2020; model training and internal validation) and in two Australian institutions (1998–2020; external validation). The Gender-Equity Model for Liver Allocation corrected by serum sodium (GEMA-Na) was compared with GEMA-AI, which was built on a shallow artificial neural network optimized by neuroevolution and hybridization using the same input variables. The primary outcome was mortality or delisting for sickness within the first 90 days. Discrimination was assessed by Harrell's c-statistic (Hc). This study was funded by the Instituto de Salud Carlos III (Project no. PI22/00312) and co-funded by the European Union.

Results: The study population comprised 9,320 patients: training cohort n = 5,762, internal validation cohort n = 1,920, and external validation cohort n = 1,638. The prevalence of 90-day mortality or delisting for sickness ranged from 5.3% to 6% across cohorts. The transition from a linear to a non-linear score (from GEMA-Na to GEMA-AI) improved discrimination in both the internal and external validation cohorts (Hc = 0.766 vs Hc = 0.781, p = 0.035; and Hc = 0.774 vs Hc = 0.793, p = 0.003, respectively), and these differences were more pronounced in women (Hc = 0.802 vs Hc = 0.826, p = 0.048; and Hc = 0.796 vs Hc = 0.836, p = 0.014, respectively). Among the 1,403 patients (39.4% of the merged validation cohorts) who showed at least one extreme analytical value, GEMA-AI had Hc = 0.823 compared with Hc = 0.797 for GEMA-Na (p = 0.036). In this subpopulation, GEMA-AI showed good calibration (chi-square = 5.04; p = 0.66), whereas GEMA-Na did not (chi-square = 18.94; p = 0.015). A meaningful change of ≥2 prioritization score points occurred in 27.8% of patients (11.4% upgraded, 16.4% downgraded). Differential prioritization would affect the allocation of 6.4% of the available organs within the first 90 days and would prevent one in 59 deaths overall and one in 13 deaths among women.

Conclusion: The use of non-linear explainable machine learning models may improve predictions of waiting list outcomes, particularly in the sickest patients showing extreme analytical values. Their use should be preferred over Cox's regression-based models.
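Harrell's c-statistic (Hc), used in this study to compare GEMA-Na and GEMA-AI, is the proportion of usable patient pairs in which the patient with the higher predicted risk experiences the event first. The following is a minimal pure-Python sketch, not the authors' implementation. It assumes right-censored data (follow-up time plus an event indicator) and a risk score where higher means sicker. The function name `harrell_c`, its parameters, and the toy data are hypothetical. In this sketch, pairs with tied follow-up times are skipped and tied risk scores count as half-concordant.

```python
from itertools import combinations


def harrell_c(times, events, scores):
    """Harrell's concordance index for right-censored data (illustrative sketch).

    times  : follow-up time per patient (e.g. days on the waiting list)
    events : 1 if the outcome (death/delisting) was observed, 0 if censored
    scores : predicted risk; higher means worse expected prognosis
    """
    concordant = 0.0
    usable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that `a` has the shorter (or equal) follow-up time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # Usable only if times differ and the earlier time is an observed event;
        # tied times are simply skipped in this simplified version.
        if times[a] == times[b] or not events[a]:
            continue
        usable += 1
        if scores[a] > scores[b]:
            concordant += 1.0  # earlier event had the higher predicted risk
        elif scores[a] == scores[b]:
            concordant += 0.5  # tied risk scores count as half
    if usable == 0:
        raise ValueError("no usable pairs")
    return concordant / usable


if __name__ == "__main__":
    # Hypothetical toy data, not taken from the study cohorts.
    times = [10, 30, 45, 90, 90]
    events = [1, 1, 0, 0, 1]
    scores = [25, 20, 22, 10, 12]
    print(round(harrell_c(times, events, scores), 3))  # 6 of 7 usable pairs concordant
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking. Established survival-analysis libraries provide tested implementations with more careful handling of ties.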