
Chinese Journal of Obstetrics & Gynecology and Pediatrics (Electronic Edition), 2023, Vol. 19, Issue 4: 446-454. doi: 10.3877/cma.j.issn.1673-5250.2023.04.010

Original Article


Identification of hub genes associated with bronchopulmonary dysplasia in preterm infants based on machine learning

Zhengyun Hu, Jianwei Shi, Jianwei Shen, Bing Wang, Chunmiao Jiang, Chong Liu

  1. Department of Pediatrics, Shanghai Songjiang District Central Hospital, Shanghai 201600, China
  2. Department of Neurosurgery, the Affiliated Brain Hospital of Nanjing Medical University, Nanjing 210024, Jiangsu Province, China
  • Received: 2022-12-18; Revised: 2023-05-10; Published: 2023-08-01
  • Corresponding author: Chong Liu
  • Supported by: National Natural Science Foundation of China (82270068)
Cite this article:


Zhengyun Hu, Jianwei Shi, Jianwei Shen, Bing Wang, Chunmiao Jiang, Chong Liu. Identification of hub genes associated with bronchopulmonary dysplasia in preterm infants based on machine learning[J]. Chinese Journal of Obstetrics & Gynecology and Pediatrics(Electronic Edition), 2023, 19(04): 446-454.


Objective

To screen for and identify key genes (hub genes) associated with bronchopulmonary dysplasia (BPD) in preterm infants using machine learning, and thereby provide a theoretical basis for elucidating the pathogenesis of BPD.

Methods

We obtained the gene microarray dataset GSE32472 from the Gene Expression Omnibus (GEO) database; it contains serum-sample data from 68 preterm infants with BPD (study group) and 43 preterm infants without BPD born during the same period (control group). Weighted gene co-expression network analysis (WGCNA) was used to identify color modules and their hub gene sets. Least absolute shrinkage and selection operator (LASSO) regression was then used to estimate a coefficient for each gene according to the penalty value (λ) and to select candidate hub genes for BPD in preterm infants. Random forest analysis was used to rank the genes in the BPD-associated color module and select the top 10. Taking the intersection of the genes selected by LASSO regression and by random forest analysis yielded 6 hub genes for BPD.
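For reference, with a binary outcome and the pure LASSO penalty, glmnet minimizes the penalized negative log-likelihood below. We assume, since the abstract does not say so, that the model was fitted with family = "binomial" (the binomial-deviance axis in Figure 7 points that way). Here y_i = 1 for BPD and 0 otherwise, x_i holds the p = 189 yellow-module expression values of sample i, and N is the number of samples retained after outlier removal. Genes with a non-zero coefficient at the λ that minimizes the ten-fold cross-validated deviance form the LASSO candidate set S(λ); according to the Results, this set contained 41 genes at λ = 0.0114.

```latex
\min_{\beta_0,\,\beta}\;
  -\frac{1}{N}\sum_{i=1}^{N}\Big[\,y_i\big(\beta_0 + x_i^{\top}\beta\big)
  - \log\big(1 + e^{\beta_0 + x_i^{\top}\beta}\big)\Big]
  + \lambda \sum_{j=1}^{p} \lvert \beta_j \rvert,
\qquad
S(\lambda) = \{\, j : \hat{\beta}_j(\lambda) \neq 0 \,\}
```

The L1 term is what drives most coefficients exactly to zero as λ grows, which is why LASSO doubles as a variable-selection step here.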

Results

① Using the "WGCNA" package in R (version 4.1.3), a soft-thresholding power of 18 was chosen, at which the scale-free topology fit index was slightly above 0.9. Hub gene modules for BPD were then identified and clustered, yielding 11 characteristic color modules. Further analysis showed that genes in the yellow module were significantly correlated with the occurrence of BPD in preterm infants.
② Using the "glmnet" package in R, LASSO regression was performed on the 189 genes in the yellow module, combining L1-regularized parameter estimation with variable selection. The penalty shrank most regression coefficients to zero, which helps prevent overfitting to the training data. Ten-fold cross-validation was used to select the model: prediction error was minimized when the model contained 41 candidate hub genes, corresponding to λ = 0.0114.
③ Random forest analysis of the same 189 yellow-module genes with the "randomForest" package in R showed that ten candidate genes (SPON1, TMEM204, CD28, ICOS, LOC100996619, NOL9, GCSAM, UBASH3A, CCNI2, and AQP3) had importance scores above 1.0, clearly higher than those of the remaining candidate genes.
④ Intersecting the LASSO and random forest results for the BPD-associated yellow module of the GSE32472 dataset yielded six BPD hub genes: SPON1, TMEM204, CD28, ICOS, LOC100996619, and NOL9.
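The soft-threshold step in result ① can be illustrated with a simplified, standard-library Python sketch. It loosely follows the logic of WGCNA's pickSoftThreshold (unsigned adjacency |cor|^β, connectivity binned into 10 bins, a linear fit of log10 p(k) on log10 k, reported as -sign(slope) x R²) but it is not the package's code. The function names, the genes-as-rows layout, and the 0.9 cutoff in `choose_power` are assumptions for illustration, and the demo data are random noise rather than GSE32472, so the chosen power may well be `None`.

```python
import math
import random
from statistics import mean


def correlation_matrix(expr):
    """Pearson correlation between every pair of genes (rows of `expr`).

    Assumes each gene has non-zero variance across samples.
    """
    unit = []
    for row in expr:
        m = mean(row)
        dev = [v - m for v in row]
        norm = math.sqrt(sum(d * d for d in dev))
        unit.append([d / norm for d in dev])
    return [[sum(a * b for a, b in zip(u, w)) for w in unit] for u in unit]


def connectivity(cor, power):
    """Unsigned soft-threshold adjacency a_ij = |cor_ij|**power; k_i sums a_ij over j != i."""
    n = len(cor)
    return [sum(abs(cor[i][j]) ** power for j in range(n) if j != i) for i in range(n)]


def signed_scale_free_r2(k, n_breaks=10):
    """Bin k, regress log10 p(k) on log10 k, and return -sign(slope) * R^2."""
    lo, hi = min(k), max(k)
    width = (hi - lo) / n_breaks or 1.0
    bins = [[] for _ in range(n_breaks)]
    for v in k:
        bins[min(int((v - lo) / width), n_breaks - 1)].append(v)
    xs, ys = [], []
    for b, members in enumerate(bins):
        # Empty bins fall back to the bin midpoint; 1e-9 keeps log10 finite.
        dk = mean(members) if members else lo + (b + 0.5) * width
        xs.append(math.log10(dk))
        ys.append(math.log10(len(members) / len(k) + 1e-9))
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        return 0.0
    r2 = min(1.0, sxy * sxy / (sxx * syy))
    slope_sign = (sxy > 0) - (sxy < 0)  # slope = sxy / sxx, and sxx > 0
    return -slope_sign * r2


def pick_soft_threshold(expr, powers):
    """Return (power, signed R^2, mean connectivity) for each candidate power."""
    cor = correlation_matrix(expr)
    table = []
    for p in powers:
        k = connectivity(cor, p)
        table.append((p, signed_scale_free_r2(k), mean(k)))
    return table


def choose_power(table, cutoff=0.9):
    """Smallest power whose signed R^2 reaches the (assumed) cutoff, else None."""
    for p, r2, _ in table:
        if r2 >= cutoff:
            return p
    return None


if __name__ == "__main__":
    random.seed(0)
    # Hypothetical toy data: 40 genes x 24 samples of random noise (not GSE32472).
    expr = [[random.gauss(0, 1) for _ in range(24)] for _ in range(40)]
    table = pick_soft_threshold(expr, powers=range(1, 21))
    for p, r2, mk in table:
        print(f"power={p:2d}  signed R^2={r2:6.3f}  mean k={mk:8.4f}")
    print("chosen power:", choose_power(table))
```

The two columns printed per power correspond to the two panels of Figure 1: raising the power pushes the connectivity distribution toward scale-free behavior while mean connectivity steadily falls, so the usual practice is to take the lowest power at which the fit index levels off near the cutoff.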

Conclusions

By constructing a co-expression hub gene regulatory network for BPD and screening six BPD-related hub genes with machine learning algorithms, this study provides a theoretical basis for exploring the pathogenesis of BPD and potential therapeutic targets.

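Figure 1 Screening of the soft-thresholding power (Figure 1A: scale-free fit index for each candidate soft-thresholding power; Figure 1B: mean connectivity for each candidate soft-thresholding power). Note: soft threshold (power), soft-thresholding power; scale free topology model fit, signed R2, signed scale-free topology model fit index; scale independence, panel showing scale independence; mean connectivity, mean network connectivity.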
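Figure 2 Outlier removal for GSE32472 samples based on a dynamic tree of sample clustering (each line represents one sample, labeled with its GSM accession number; outlier samples were identified by cluster analysis and removed). Note: height, clustering height.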
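Figure 3 Color-module clustering of hub genes in serum samples from the two groups of preterm infants (BPD and non-BPD) in the GSE32472 dataset. Note: height, clustering height; ME, module eigengene; BPD, bronchopulmonary dysplasia.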
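Figure 4 Gene clustering dendrogram of WGCNA co-expression modules, constructed from differences in topological overlap among the selected mRNAs (each color represents a different hub-gene module). Note: height, clustering height; dynamic tree cut, module assignment by the dynamic tree-cut method; WGCNA, weighted gene co-expression network analysis.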
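Figure 5 Relationship between WGCNA co-expression modules and clinical traits in the two groups of preterm infants in the GSE32472 dataset (each row is a color module and each column a clinical trait; each cell shows the correlation coefficient and P value. Red indicates that a module's candidate hub genes are positively correlated with BPD occurrence and blue that they are negatively correlated; darker colors indicate stronger correlations). Note: WGCNA, weighted gene co-expression network analysis; ME, module eigengene; BPD, bronchopulmonary dysplasia.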
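Figure 6 Coefficient paths of the yellow-module hub genes in LASSO regression. Note: coefficients, estimated coefficients; fraction deviance explained, the fraction of the null deviance explained by the model.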
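Figure 7 Selection of the penalty parameter λ by cross-validation. Note: binomial deviance, a statistic measuring how well a model fits a binomial (binary) response variable.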
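Figure 8 Trend of random-forest error with the number of regression trees. Note: error, cross-validation error; trees, number of regression trees grown during variable selection.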
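Figure 9 Variable importance scores of candidate hub genes for BPD in preterm infants in the random forest. Note: IncNodePurity, increase in node purity; BPD, bronchopulmonary dysplasia.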
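Figure 10 Venn diagram of the genes selected by both LASSO regression and random forest analysis. Note: LASSO, least absolute shrinkage and selection operator.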
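The Venn diagram of LASSO and random-forest genes (Figure 10) corresponds to a plain set intersection. Below is a minimal Python sketch, assuming the LASSO coefficients at the cross-validated λ and the random-forest importance scores have already been exported as gene-to-value mappings. The function names and the toy genes G1 to G6 are hypothetical; only the gene lists in the final check are taken from the abstract.

```python
def lasso_selected(coefs):
    """Genes with a non-zero LASSO coefficient at the chosen lambda."""
    return {gene for gene, beta in coefs.items() if beta != 0.0}


def rf_top_k(importance, k=10):
    """The k genes with the highest random-forest importance (ties broken by name)."""
    ranked = sorted(importance, key=lambda gene: (-importance[gene], gene))
    return set(ranked[:k])


def hub_genes(coefs, importance, k=10):
    """Genes selected by both methods, sorted for a stable, readable result."""
    return sorted(lasso_selected(coefs) & rf_top_k(importance, k))


# Hypothetical toy values (not from the study) to show the mechanics.
toy_coefs = {"G1": 0.8, "G2": 0.0, "G3": -0.4, "G4": 0.0, "G5": 0.1, "G6": 0.0}
toy_importance = {"G1": 2.5, "G2": 2.1, "G3": 1.7, "G4": 0.6, "G5": 0.3, "G6": 0.2}
print(hub_genes(toy_coefs, toy_importance, k=3))  # ['G1', 'G3']

# Consistency check using only genes named in the abstract: the four top-10
# random-forest genes absent from the final six cannot have been among the
# 41 LASSO-selected genes.
RF_TOP10 = ["SPON1", "TMEM204", "CD28", "ICOS", "LOC100996619",
            "NOL9", "GCSAM", "UBASH3A", "CCNI2", "AQP3"]
HUB6 = ["SPON1", "TMEM204", "CD28", "ICOS", "LOC100996619", "NOL9"]
print(sorted(set(RF_TOP10) - set(HUB6)))  # ['AQP3', 'CCNI2', 'GCSAM', 'UBASH3A']
```

Requiring agreement between a penalized linear selector and a nonlinear tree ensemble is a conservative filter: a gene survives only if both methods rank it as informative.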