Publication

Methodology to identify a gene expression signature by merging microarray datasets

dc.contributor.author: Fajarda, Olga
dc.contributor.author: Almeida, João Rafael
dc.contributor.author: Duarte-Pereira, Sara
dc.contributor.author: Silva, Raquel M.
dc.contributor.author: Oliveira, José Luís
dc.date.accessioned: 2023-04-19T13:07:02Z
dc.date.available: 2023-04-19T13:07:02Z
dc.date.issued: 2023-06
dc.description.abstract: A vast number of microarray datasets have been produced as a way to identify differentially expressed genes and gene expression signatures. A better understanding of these biological processes can help in the diagnosis and prognosis of diseases, as well as in the therapeutic response to drugs. However, most of the available datasets are composed of a reduced number of samples, leading to low statistical, predictive and generalization power. One way to overcome this problem is by merging several microarray datasets into a single dataset, which is typically a challenging task. Statistical methods or supervised machine learning algorithms are usually used to determine gene expression signatures. Nevertheless, statistical methods require an arbitrary threshold to be defined, and supervised machine learning methods can be ineffective when applied to high-dimensional datasets like microarrays. We propose a methodology to identify gene expression signatures by merging microarray datasets. This methodology uses statistical methods to obtain several sets of differentially expressed genes and uses supervised machine learning algorithms to select the gene expression signature. This methodology was validated using two distinct research applications: one using heart failure and the other using autism spectrum disorder microarray datasets. For the first, we obtained a gene expression signature composed of 117 genes, with a classification accuracy of approximately 98%. For the second use case, we obtained a gene expression signature composed of 79 genes, with a classification accuracy of approximately 82%. This methodology was implemented in R language and is available, under the MIT licence, at https://github.com/bioinformatics-ua/MicroGES. [pt_PT]
dc.description.version: info:eu-repo/semantics/publishedVersion [pt_PT]
dc.identifier.doi: 10.1016/j.compbiomed.2023.106867 [pt_PT]
dc.identifier.eid: 85152129348
dc.identifier.issn: 0010-4825
dc.identifier.pmid: 37060770
dc.identifier.uri: http://hdl.handle.net/10400.14/40884
dc.identifier.wos: 000982862600001
dc.language.iso: eng [pt_PT]
dc.peerreviewed: yes [pt_PT]
dc.rights.uri: http://creativecommons.org/licenses/by/4.0/ [pt_PT]
dc.subject: Autism spectrum disorder [pt_PT]
dc.subject: Gene expression signature [pt_PT]
dc.subject: Heart failure [pt_PT]
dc.subject: LSVM [pt_PT]
dc.subject: Microarray data [pt_PT]
dc.subject: Neural network [pt_PT]
dc.subject: Random forest [pt_PT]
dc.title: Methodology to identify a gene expression signature by merging microarray datasets [pt_PT]
dc.type: journal article
dspace.entity.type: Publication
oaire.citation.title: Computers in Biology and Medicine [pt_PT]
oaire.citation.volume: 159 [pt_PT]
rcaap.rights: openAccess [pt_PT]
rcaap.type: article [pt_PT]
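
The abstract describes a three-stage idea: merge several small datasets, use a statistical test at several thresholds to obtain candidate gene sets, then let a supervised classifier pick the set that classifies best. The sketch below illustrates that idea only. It is not the authors' MicroGES code (which is written in R), and every concrete choice in it is an assumption made for illustration: per-dataset z-scoring as the merging step, Welch's t statistic as the statistical method, a nearest-centroid classifier with leave-one-out validation as the learner, the thresholds 2, 3 and 4, and the synthetic toy data.

```python
import random
import statistics as st

def zscore_rows(matrix):
    """Standardize each gene (row) within one dataset to mean 0, sd 1."""
    out = []
    for row in matrix:
        mu, sd = st.mean(row), st.pstdev(row)
        out.append([(x - mu) / sd if sd > 0 else 0.0 for x in row])
    return out

def merge(datasets):
    """Concatenate samples after per-dataset standardization.
    Assumes every dataset lists the same genes in the same order."""
    n_genes = len(datasets[0][0])
    merged, labels = [[] for _ in range(n_genes)], []
    for matrix, labs in datasets:
        for g, row in enumerate(zscore_rows(matrix)):
            merged[g].extend(row)
        labels.extend(labs)
    return merged, labels

def welch_t(row, labels):
    """Welch's t statistic for one gene, cases (1) versus controls (0)."""
    a = [x for x, y in zip(row, labels) if y == 1]
    b = [x for x, y in zip(row, labels) if y == 0]
    se = (st.variance(a) / len(a) + st.variance(b) / len(b)) ** 0.5
    return (st.mean(a) - st.mean(b)) / se if se > 0 else 0.0

def loo_accuracy(merged, labels, genes):
    """Leave-one-out accuracy of a nearest-centroid classifier on `genes`."""
    n, correct = len(labels), 0
    for i in range(n):
        dist = {}
        for c in (0, 1):
            idx = [j for j in range(n) if j != i and labels[j] == c]
            centroid = [st.mean(merged[g][j] for j in idx) for g in genes]
            dist[c] = sum((merged[g][i] - m) ** 2 for g, m in zip(genes, centroid))
        correct += min(dist, key=dist.get) == labels[i]
    return correct / n

def select_signature(merged, labels, thresholds=(2.0, 3.0, 4.0)):
    """Build one candidate gene set per threshold, keep the best classifier.
    Note: genes are ranked on all samples, so the accuracy is optimistic."""
    t_abs = [abs(welch_t(row, labels)) for row in merged]
    best_acc, best_genes = -1.0, []
    for thr in thresholds:
        genes = [g for g, v in enumerate(t_abs) if v >= thr]
        if not genes:
            continue
        acc = loo_accuracy(merged, labels, genes)
        # Prefer higher accuracy; on ties prefer the smaller gene set.
        if acc > best_acc or (acc == best_acc and len(genes) < len(best_genes)):
            best_acc, best_genes = acc, genes
    return best_acc, best_genes

def make_toy_dataset(rng, n_per_class, n_genes, n_informative, batch_offset):
    """Synthetic genes-by-samples matrix; the first genes differ between classes."""
    labels = [0] * n_per_class + [1] * n_per_class
    matrix = []
    for g in range(n_genes):
        shift = 4.0 if g < n_informative else 0.0
        matrix.append([rng.gauss(batch_offset + shift * y, 1.0) for y in labels])
    return matrix, labels

rng = random.Random(0)
# Two small "studies" with different baseline offsets (a crude batch effect).
datasets = [make_toy_dataset(rng, 12, 50, 5, 0.0),
            make_toy_dataset(rng, 12, 50, 5, 10.0)]
merged, labels = merge(datasets)
accuracy, signature = select_signature(merged, labels)
print(f"signature genes: {signature}, leave-one-out accuracy: {accuracy:.2f}")
```

In a real analysis the gene ranking would be repeated inside each validation fold to avoid the optimistic bias noted in the comment, and the merging step would typically need a proper batch-effect correction rather than plain z-scoring.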

Files

Original bundle
Name: 67612790.pdf
Size: 1.29 MB
Format: Adobe Portable Document Format
License bundle
Name: license.txt
Size: 3.44 KB
Format: Item-specific license agreed upon to submission