Chinese Medical E-journals Database

Chinese Journal of Obstetrics & Gynecology and Pediatrics (Electronic Edition), 2023, Vol. 19, Issue 04: 446-454. DOI: 10.3877/cma.j.issn.1673-5250.2023.04.010

Original Article

Identification of hub genes associated with bronchopulmonary dysplasia in preterm infants based on machine learning

Zhengyun Hu, Jianwei Shi, Jianwei Shen, Bing Wang, Chunmiao Jiang, Chong Liu

  1. Department of Pediatrics, Shanghai Songjiang District Central Hospital, Shanghai 201600, China
  2. Department of Neurosurgery, the Affiliated Brain Hospital of Nanjing Medical University, Nanjing 210024, Jiangsu Province, China
  • Received: 2022-12-18; Revised: 2023-05-10; Published: 2023-08-01
  • Corresponding author: Chong Liu
  • Supported by: National Natural Science Foundation of China (82270068)

Objective

This study aimed to screen for and identify key genes (hub genes) associated with bronchopulmonary dysplasia (BPD) in preterm infants by applying machine learning to gene expression data, and thereby to provide a theoretical basis for understanding the pathogenesis of BPD.

Methods

We obtained the gene microarray dataset GSE32472 from the Gene Expression Omnibus (GEO) database. It includes 68 preterm infants with BPD (research group) and 43 preterm infants without BPD born during the same period (control group). Weighted gene co-expression network analysis (WGCNA) was used to cluster genes into color-labeled co-expression modules and to identify the module correlated with BPD. Least absolute shrinkage and selection operator (LASSO) regression was then applied to the genes in this module: a coefficient was estimated for each gene across a range of penalty values (λ), and genes with nonzero coefficients at the selected λ were retained as candidate hub genes for BPD in preterm infants. In parallel, random forest analysis ranked the same module genes by variable importance, and the top 10 were retained. The intersection of the genes selected by LASSO regression and by random forest analysis yielded 6 hub genes for BPD.
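The LASSO step above can be illustrated with a minimal Python sketch of L1-penalized logistic regression. It is not the study's code and not glmnet: glmnet fits the penalty along a λ path with cyclical coordinate descent, whereas this sketch swaps in plain proximal gradient descent (ISTA) on standardized toy data. The helper names (`soft_threshold`, `fit_logistic_lasso`, `lambda_max`, `make_toy_data`) and the simulated data are hypothetical. The objective is the mean negative log-likelihood plus λ times the sum of absolute coefficients, with an unpenalized intercept, and the sketch shows why LASSO acts as a variable selector: the soft-thresholding step sets weak coefficients exactly to zero, and at any λ above λ_max every feature is dropped.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_threshold(z, gamma):
    """Proximal operator of gamma*|b|: shrink z toward 0, returning exactly 0 when |z| <= gamma."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def standardize(X):
    """Center each column and scale it to unit (population) standard deviation."""
    n, p = len(X), len(X[0])
    out = [[0.0] * p for _ in range(n)]
    for j in range(p):
        col = [row[j] for row in X]
        mean = sum(col) / n
        sd = math.sqrt(sum((v - mean) ** 2 for v in col) / n) or 1.0
        for i in range(n):
            out[i][j] = (col[i] - mean) / sd
    return out


def lambda_max(X, y):
    """Smallest penalty at which all coefficients are zero (X must be column-centered)."""
    n, p = len(X), len(X[0])
    ybar = sum(y) / n
    return max(abs(sum((y[i] - ybar) * X[i][j] for i in range(n))) / n for j in range(p))


def fit_logistic_lasso(X, y, lam, n_iter=1000):
    """Minimize mean negative log-likelihood + lam * sum(|beta_j|) by proximal gradient (ISTA).

    The intercept is not penalized; X should be standardized first.
    """
    n, p = len(X), len(X[0])
    # Step size 1/L, where L = 0.25 * ||[1, X]||_F^2 / n upper-bounds the
    # Lipschitz constant of the logistic-loss gradient.
    lip = 0.25 * sum(1.0 + sum(v * v for v in row) for row in X) / n
    t = 1.0 / lip
    b0, beta = 0.0, [0.0] * p
    for _ in range(n_iter):
        # Residuals (fitted probability - observed label) at the current estimate
        resid = [sigmoid(b0 + sum(b * v for b, v in zip(beta, row))) - yi
                 for row, yi in zip(X, y)]
        g0 = sum(resid) / n
        grad = [sum(r * row[j] for r, row in zip(resid, X)) / n for j in range(p)]
        b0 -= t * g0  # ordinary gradient step for the unpenalized intercept
        # Gradient step, then soft-thresholding: this is where the L1 penalty
        # sets weak coefficients exactly to zero.
        beta = [soft_threshold(b - t * g, t * lam) for b, g in zip(beta, grad)]
    return b0, beta


def make_toy_data(n=120, p=6, seed=0):
    """Hypothetical data: only features 0 and 1 affect the binary outcome."""
    rng = random.Random(seed)
    X = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
    y = [1 if rng.random() < sigmoid(2.0 * row[0] - 1.5 * row[1]) else 0 for row in X]
    return standardize(X), y


if __name__ == "__main__":
    X, y = make_toy_data()
    lmax = lambda_max(X, y)
    for frac in (1.01, 0.5, 0.1, 0.01):
        _, beta = fit_logistic_lasso(X, y, frac * lmax)
        nonzero = sum(1 for b in beta if b != 0.0)
        print(f"lambda = {frac:.2f} x lambda_max: {nonzero} nonzero coefficients")
```

Standardizing the features first mirrors glmnet's default behavior and is also what makes the λ_max formula valid here. Choosing λ by ten-fold cross-validation, as the study did, would sit on top of this fitting routine; the zeroing mechanism itself is unchanged.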

Results

① Using the WGCNA package in R (version 4.1.3), we set the soft-thresholding power to 18, the value at which the scale-free topology fit index (signed R²) just exceeded 0.9. Clustering of the resulting network identified 11 color-labeled co-expression modules, of which the yellow module showed a significant correlation with the occurrence of BPD in preterm infants.
② Using the glmnet package in R, we performed LASSO regression on the 189 genes in the yellow module, combining L1-regularized parameter estimation with variable selection. As the penalty increased, most regression coefficients shrank to zero, which reduces the risk of overfitting the training data. Ten-fold cross-validation was used to select λ: the prediction error was minimized at λ = 0.0114, at which the model retained 41 candidate hub genes.
③ Using the randomForest package in R, we ranked the same 189 yellow-module genes by variable importance. Ten candidate genes (SPON1, TMEM204, CD28, ICOS, LOC100996619, NOL9, GCSAM, UBASH3A, CCNI2, AQP3) had importance scores above 1.0, well above those of the remaining candidate genes.
④ Intersecting the LASSO-selected genes with these ten random forest genes identified six BPD hub genes in the yellow module of the GSE32472 dataset: SPON1, TMEM204, CD28, ICOS, LOC100996619, and NOL9.
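The soft-threshold choice in result ① can be made concrete with a small sketch of the criterion behind WGCNA's soft-threshold selection. Assuming an unsigned network (the study does not state the network type), the adjacency is a_ij = |cor(x_i, x_j)|^β, each gene's connectivity is k_i = Σ_{j≠i} a_ij, and the scale-free fit index is the R² of a linear fit of log10 p(k) on log10 k, signed by the negative of the slope. The functions below are hypothetical plain-Python helpers, not the WGCNA API, and they simplify the binning (equal-width bins, empty bins skipped), so their values will not match WGCNA's exactly. `pick_soft_power` returns the lowest power whose signed R² reaches a cutoff, set here to 0.9 to mirror the threshold reported above.

```python
import math
import random


def pearson(a, b):
    """Pearson correlation between two expression profiles of equal length."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def correlation_matrix(genes):
    """Gene-by-gene correlation matrix; genes is a list of expression profiles."""
    m = len(genes)
    cor = [[1.0] * m for _ in range(m)]
    for i in range(m):
        for j in range(i + 1, m):
            cor[i][j] = cor[j][i] = pearson(genes[i], genes[j])
    return cor


def connectivity(cor, power):
    """k_i = sum over j != i of |cor_ij| ** power (unsigned soft-threshold adjacency)."""
    m = len(cor)
    return [sum(abs(cor[i][j]) ** power for j in range(m) if j != i) for i in range(m)]


def scale_free_fit(k, n_bins=10):
    """Return (signed R^2, slope) of the fit log10 p(k) ~ log10 k over equal-width bins.

    Simplified binning: empty bins are skipped, so values differ slightly from WGCNA.
    """
    kmin, kmax = min(k), max(k)
    width = (kmax - kmin) / n_bins or 1.0
    bins = [[] for _ in range(n_bins)]
    for v in k:
        bins[min(int((v - kmin) / width), n_bins - 1)].append(v)
    xs = [math.log10(sum(b) / len(b)) for b in bins if b]  # log10 of mean k per bin
    ys = [math.log10(len(b) / len(k)) for b in bins if b]  # log10 of frequency p(k)
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    if sxx == 0.0 or syy == 0.0:
        return 0.0, 0.0  # degenerate: one bin, or all bins equally populated
    slope = sxy / sxx
    r2 = sxy * sxy / (sxx * syy)
    return -math.copysign(1.0, slope) * r2, slope


def pick_soft_power(genes, powers, cutoff=0.9):
    """Lowest power whose signed R^2 reaches cutoff (None if none does), plus all fits."""
    cor = correlation_matrix(genes)
    fits = {pw: scale_free_fit(connectivity(cor, pw))[0] for pw in powers}
    chosen = next((pw for pw in powers if fits[pw] >= cutoff), None)
    return chosen, fits


if __name__ == "__main__":
    rng = random.Random(1)
    # Hypothetical toy matrix: 25 genes x 12 samples of random noise
    genes = [[rng.gauss(0.0, 1.0) for _ in range(12)] for _ in range(25)]
    power, fits = pick_soft_power(genes, range(1, 21))
    for pw, r2 in fits.items():
        print(pw, round(r2, 3))
    print("chosen power:", power)
```

Random noise is not scale-free, so on the toy matrix the cutoff may never be reached and `None` is returned; the point is the shape of the computation. Raising β suppresses weak correlations more strongly, which is why a fairly high power such as 18 can be needed before the connectivity distribution approaches a power law.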

Conclusions

By constructing a gene co-expression network for BPD and using machine learning algorithms to select six BPD-related hub genes, this study provides a theoretical basis for exploring the pathogenesis of BPD and potential therapeutic targets.

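Figure 1 Selection of the soft-thresholding power for scale-free topology (Figure 1A: scale-free topology fit index at different soft-thresholding powers; Figure 1B: mean connectivity at different soft-thresholding powers). Note: soft threshold (power) denotes the soft-thresholding power; scale free topology model fit, signed R2 denotes the scale-free topology model fit index; the panels are titled scale independence (A) and mean connectivity (B).
Figure 2 Removal of outlier samples from the GSE32472 dataset using a sample clustering tree (each line represents one sample, labeled with its GSM accession number; outlier samples were identified by cluster analysis and removed). Note: height, clustering height.
Figure 3 Clustering of hub gene color modules from serum samples of the two groups of preterm infants in the GSE32472 dataset. Note: height, clustering height; ME, module eigengene; BPD, bronchopulmonary dysplasia.
Figure 4 Gene clustering dendrogram of WGCNA co-expression modules, built directly from the topological overlap dissimilarity among the selected mRNAs (different colors represent different hub gene modules). Note: height, clustering height; dynamic tree cut, module assignment by dynamic tree cutting; WGCNA, weighted gene co-expression network analysis.
Figure 5 Relationships between WGCNA co-expression modules and clinical traits in the two groups of preterm infants (with and without BPD) in the GSE32472 dataset (each row represents a color module and each column a clinical trait; each cell contains the correlation coefficient and P value. Red indicates a positive correlation between candidate hub genes and BPD occurrence, blue a negative correlation; darker colors indicate stronger correlations). Note: WGCNA, weighted gene co-expression network analysis; ME, module eigengene; BPD, bronchopulmonary dysplasia.
Figure 6 Regression coefficient paths of hub genes in the yellow module. Note: coefficients, estimated regression coefficients; fraction deviance explained, the fraction of deviance in the response variable explained by the model.
Figure 7 Selection of the penalty parameter λ by cross-validation. Note: binomial deviance, a statistic measuring how well a model fits a binomial response variable.
Figure 8 Random forest error as the number of regression trees increases. Note: error, cross-validation error; trees, number of regression trees built during variable selection.
Figure 9 Variable importance scores of candidate hub genes for BPD in preterm infants from the random forest. Note: IncNodePurity, increase in node purity; BPD, bronchopulmonary dysplasia.
Figure 10 Venn diagram of the genes shared by LASSO regression and random forest analysis. Note: LASSO, least absolute shrinkage and selection operator.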
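The intersection shown in the Venn diagram of Figure 10 amounts to a simple set operation. The sketch below uses hypothetical Python helpers, not the authors' code: it keeps genes with a nonzero LASSO coefficient, ranks genes by random forest importance (for example IncNodePurity), takes the top N, and returns the shared genes in importance order, together with the sizes of the three Venn regions. The gene names and scores in the example are invented placeholders. With the counts reported in the Results (41 LASSO-selected genes, a random forest top 10, and 6 shared hub genes), the three regions would contain 35, 6 and 4 genes.

```python
def rank_by_importance(importance, top_n=10):
    """Top-n gene names by descending importance score (ties broken alphabetically)."""
    ranked = sorted(importance.items(), key=lambda kv: (-kv[1], kv[0]))
    return [gene for gene, _ in ranked[:top_n]]


def hub_genes(lasso_coefs, rf_importance, top_n=10):
    """Genes with a nonzero LASSO coefficient that are also in the random forest top-n, in RF order."""
    selected = {g for g, b in lasso_coefs.items() if b != 0.0}
    return [g for g in rank_by_importance(rf_importance, top_n) if g in selected]


def venn_counts(lasso_coefs, rf_importance, top_n=10):
    """Sizes of the three Venn regions: (LASSO only, shared, random forest only)."""
    lasso_set = {g for g, b in lasso_coefs.items() if b != 0.0}
    rf_set = set(rank_by_importance(rf_importance, top_n))
    return len(lasso_set - rf_set), len(lasso_set & rf_set), len(rf_set - lasso_set)


if __name__ == "__main__":
    # Hypothetical placeholder genes and scores, not values from the study
    lasso = {"GENE_A": 0.8, "GENE_B": 0.0, "GENE_C": -0.3, "GENE_D": 0.1}
    rf = {"GENE_A": 2.5, "GENE_B": 1.9, "GENE_C": 0.4, "GENE_D": 1.2}
    print(hub_genes(lasso, rf, top_n=3))    # ['GENE_A', 'GENE_D']
    print(venn_counts(lasso, rf, top_n=3))  # (1, 2, 1)
```

Returning the shared genes in random forest order keeps the importance ranking visible in the final list; a deterministic tie-break avoids the top-N cut depending on dictionary insertion order.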