Highlights - Computational Methods for Motif Discovery in Biological Units
Principal Investigator: Yang Du (Computer Systems and Software Engineering, Valley City State University)
Dr. Du’s research focuses on discovering recurring motifs, or patterns, among biological units. A recurring motif can be anything that repeats itself. For example, a motif can be the order in which specific units occur, such as a stretch of DNA or amino acid sequence; it can be a local or global spatial structure, such as a geometric arrangement of molecules; or it can be the expression of features. Motifs can recur within an individual, across a species, or along an evolutionary path. They play crucial roles in many biological processes because repetition reflects conserved functional sites. Units that perform the same biological function often share the same motif, much as an animal with a pair of wings can be expected to fly. Motifs can help predict the functions of unknown proteins, understand diseases, develop medicines, and engineer artificial biological units.
Sigma 70 promoter sequence motifs of Escherichia coli.
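To make the idea of a sequence motif concrete, here is a minimal Python sketch that reports short substrings (k-mers) recurring across a set of sequences. It is only an illustration under simplifying assumptions: the function name `recurring_kmers`, its parameters, and the toy sequences are hypothetical and not taken from Dr. Du's work, and it uses exact string matching, whereas practical motif discovery usually has to tolerate mismatches. The toy sequences were built to share TATAAT, the commonly cited -10 consensus of sigma 70 promoters.

```python
from collections import Counter


def recurring_kmers(sequences, k, min_support):
    """Return k-mers that occur in at least `min_support` of the sequences."""
    support = Counter()
    for seq in sequences:
        # Use a set so each sequence contributes at most once per k-mer.
        kmers = {seq[i:i + k] for i in range(len(seq) - k + 1)}
        support.update(kmers)
    return {kmer: n for kmer, n in support.items() if n >= min_support}


# Made-up toy sequences for illustration only, not real promoter data.
toy_sequences = [
    "GCGTATAATGC",
    "TTTATAATCCA",
    "ACCGGTATAATG",
]

motifs = recurring_kmers(toy_sequences, k=6, min_support=3)
print(motifs)  # {'TATAAT': 3}
```

Lowering `min_support` to 2 would also report k-mers shared by only two of the three sequences, which shows how the support threshold controls what counts as "recurring".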
Their studies began with exact computational methods [1–2] and later turned to machine learning techniques. The simulations are performed on HPC clusters at CCAST/NDSU. Thanks to these HPC resources, the tasks can be completed in an acceptable amount of time; without them, it would be impossible to experiment across such a wide range of settings, especially when grid-searching hyperparameters for deep learning models.
[1] Y. Du and C. Yan, “An efficient method for discovering functionally important motifs in a group of protein structures,” International Conference on Computational Science and Computational Intelligence (CSCI), IEEE, Dec. 12, 2018: 1345-1350.
[2] Y. Du and C. Yan, “An improved clique-based method for discovery of novel spatial motifs in protein structures,” International Conference on Biological Information and Biomedical Engineering (BIBE 2018), VDE, 2018: 1-5.