Canada Research Chair in Computational Statistics
Tier 1 - 2015-10-01
Developing computationally intensive statistical methods for modern data sets.
This research will lead to the development of approaches for analyzing big and complex data sets in areas such as bioinformatics, finance, health, marketing, and nutrigenomics.
FINDING MEANING IN BIG AND COMPLEX DATA
The old adage that “bigger is better” may or may not apply to data. Nevertheless, big data have become a fact of life across virtually all areas of societal and scientific endeavor. Paul McNicholas is developing new, computationally intensive statistical methods to gain better insight into big and otherwise complex data.
Among the most problematic big data, from an analytics viewpoint, are data sets where very many measurements are taken for each observation. While statisticians and computer scientists routinely deal with data that can contain hundreds or thousands of variables, modern data sets often have upwards of ten thousand variables. Unfortunately, there is a dearth of effective methodology for so-called ultra high-dimensional data.
Paul McNicholas is combining expertise in computing and statistics to develop computational statistics approaches for ultra high-dimensional data. These approaches focus on methods that find subgroups of similar observations – known as classification or clustering methods – and are applicable in any setting where ultra high-dimensional data arise, from management science to disease diagnostics and bioinformatics.
Beyond ultra high-dimensional data, Paul McNicholas is developing computational statistics methods that will allow users to make sense of massive data sets with measurements of different types. As with his work on ultra high-dimensional data, this work promises to simplify and facilitate data analytics in many fields of social, economic, and scientific endeavor.
Concrete examples of applications of Paul McNicholas’ work include developing computational statistics approaches to: look for subtypes of certain cancers; help identify candidate genes for modification to allow food crops to grow in developing countries; and combine genetic, fitness, and other health data to study the relationship between obesity, genes, nutrition, and exercise.