Scripts and code used in the analysis of the TFM
Background: Lung cancer is the most frequent cause of cancer-related death worldwide, and more than 3.1 million new cases are expected by 2040. The 5-year survival rate of patients with lung cancer strongly depends on the stage of the tumor. The main causal factor of lung cancer is smoking and the use of tobacco products. Lung cancer is classified into two broad histologic classes, which grow and spread differently: small-cell lung carcinoma (SCLC) and non-small-cell lung carcinoma (NSCLC). The two predominant NSCLC subtypes are lung adenocarcinoma (LUAD) and lung squamous cell carcinoma (LUSC). Lung cancer shows one of the most diverse profiles, not only in its histopathological landscape but also in the molecular mechanisms of carcinogenesis, which involve a large number of genetic alterations. Treatment options for lung cancer include surgery, radiation therapy, chemotherapy, and targeted therapy. Therapeutic recommendations depend on several factors, such as the type and stage of the cancer. Despite improvements in diagnosis and therapy, only a few patients benefit from targeted therapy, and the prognosis of many patients remains unsatisfactory. A better understanding of the molecular profiles of these patients might lead to the development of more efficacious and perhaps more specific drugs. The main purpose of this project was to predict personalized treatments for groups of lung cancer patients lacking therapy, based on their transcriptomic profiles.
Methods: RNA-Seq data of LUAD and LUSC were downloaded from The Cancer Genome Atlas (TCGA). First, a clustering analysis was performed to identify the possible groups within each pathology, using non-negative matrix factorization (NMF) and K-means as clustering algorithms. Second, several bioinformatic analyses were carried out to characterize the biological profile of the identified groups: 1) the differentially expressed genes (DEGs) were identified, 2) a functional enrichment analysis was done to identify the enriched GO terms in each group's signature, 3) the tumor microenvironment (TM) was analyzed and the tumor-infiltrating immune cells (TIICs) were identified, 4) the differentially expressed (DE) transcription factors (TFs) were identified, and 5) a survival analysis was performed with the clinical data from TCGA to estimate the survival of each group. Third, a more extensive study of the TFs was carried out with a gene regulatory network (GRN) to find the most biologically relevant TFs, and the rewired TFs were identified. Lastly, a drug repositioning analysis was performed to predict the drugs that could reverse the signature of each group.
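To illustrate the clustering step described in the Methods, here is a minimal sketch of K-means applied to a tiny expression matrix. It is not the code used in the project: the matrix `expression`, the helper names (`squared_distance`, `farthest_first_init`, `kmeans`) and the choice of k = 3 are assumptions made for illustration only, and the values are invented rather than taken from TCGA. The sketch uses plain Python with a deterministic farthest-first initialization so that it runs without external packages; a real analysis would more likely rely on an established K-means or NMF implementation.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two expression vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_first_init(samples, k):
    """Pick k starting centroids: the first sample, then repeatedly the
    sample farthest from all centroids chosen so far (deterministic)."""
    centroids = [list(samples[0])]
    while len(centroids) < k:
        farthest = max(
            samples,
            key=lambda s: min(squared_distance(s, c) for c in centroids),
        )
        centroids.append(list(farthest))
    return centroids


def kmeans(samples, k, max_iter=100):
    """Plain K-means: alternate assignment and centroid update until
    the cluster labels stop changing or max_iter is reached."""
    centroids = farthest_first_init(samples, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each sample goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(s, centroids[c]))
            for s in samples
        ]
        if new_labels == labels:
            break  # converged: no sample changed cluster
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [s for s, lab in zip(samples, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical toy matrix: rows are samples, columns are three genes
# (invented log-expression values, NOT real TCGA data).
expression = [
    [1.0, 1.2, 9.0], [0.8, 1.0, 9.5], [1.1, 0.9, 8.8],  # pattern A
    [9.0, 1.0, 1.0], [9.3, 1.2, 0.7], [8.7, 0.8, 1.1],  # pattern B
    [1.0, 9.0, 1.0], [1.2, 9.4, 0.9], [0.9, 8.8, 1.3],  # pattern C
]

labels, centroids = kmeans(expression, k=3)
print(labels)
```

On real RNA-Seq data the number of groups is not known in advance, so k would normally be chosen by comparing several values with a cluster-quality criterion rather than fixed as in this toy example.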
Results: In both pathologies we obtained 3 groups that presented relevant differences at the biological and clinical level. In both pathologies the groups presented different enriched terms, different DE TFs, and different rewired TFs. However, survival differences were only found in LUSC. Furthermore, the proposed drugs were specific to each group's signature.