Bioinformatic analysis of secretory protein candidates
Abstract
We used two data sets: first, 200 proteins whose secretion has been demonstrated by biological and physicochemical detection outside of cells and/or in bodily fluids; second, 200 intracellular proteins that have not been detected outside of cells. We then applied our scoring method and analyzed the score distribution histograms of both populations. By applying various thresholds, we were able to achieve better accuracy than any individual prediction algorithm. The results are encouraging for further studies, in which we plan to use machine learning algorithms to assign specific weights to the individual outcome scores and further optimize the method. Because secreted proteins are significant as potential disease-specific biomarkers and therapeutic targets, applying our technology to the entire genome may have an important impact.
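The threshold step can be illustrated with a minimal sketch. It assumes each protein already has a single numeric score and that a score at or above the threshold is read as "secreted". The function names and the toy scores below are hypothetical illustrations, not the scoring method or the data of this study.

```python
from typing import Iterable, Sequence


def accuracy_at_threshold(
    secreted_scores: Sequence[float],
    intracellular_scores: Sequence[float],
    threshold: float,
) -> float:
    """Fraction of proteins classified correctly when
    score >= threshold is interpreted as 'secreted'."""
    true_positives = sum(1 for s in secreted_scores if s >= threshold)
    true_negatives = sum(1 for s in intracellular_scores if s < threshold)
    total = len(secreted_scores) + len(intracellular_scores)
    return (true_positives + true_negatives) / total


def best_threshold(
    secreted_scores: Sequence[float],
    intracellular_scores: Sequence[float],
    candidates: Iterable[float],
) -> float:
    """Return the candidate threshold with the highest accuracy
    (the first one encountered if several tie)."""
    return max(
        candidates,
        key=lambda t: accuracy_at_threshold(
            secreted_scores, intracellular_scores, t
        ),
    )


# Invented toy scores, standing in for the two 200-protein populations.
secreted = [4, 5, 3, 5, 2, 4]
intracellular = [0, 1, 2, 1, 3, 0]

threshold = best_threshold(secreted, intracellular, range(6))
print(threshold, accuracy_at_threshold(secreted, intracellular, threshold))
```

Sweeping the threshold in this way over the two score histograms shows where the populations separate best. A weighted combination of individual predictor outputs, as planned for later work, would change only how the scores are computed, not this evaluation step.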