A statistician by training, CHÉOS Scientist Dr. Ehsan Karim decided to pursue research in biostatistics because he was driven to investigate real-life problems.
“When I was doing research in more theoretical statistics, I found that sometimes it’s hard to see how a hypothetical construct or theory can be useful to anybody,” said Dr. Karim, “but with biostatistics, all of your work is motivated by real problems.”
Dr. Karim holds a Ph.D. in statistics from the University of British Columbia and completed his postdoctoral training in Biostatistics and Epidemiology at McGill University.
In his research, Dr. Karim uses observational data to study the effectiveness of treatments and potential causes of disease. In essence, he applies causal inference techniques to large volumes of linked health care administrative datasets to emulate clinical trial settings.
For example, during his Ph.D., Dr. Karim investigated the effectiveness of treatment for multiple sclerosis (MS). Because MS is a chronic condition, understanding it requires following patients throughout their lifetimes.
“The drugs that were approved for MS treatment were based on clinical trials of only three to five years,” said Dr. Karim. “What happens after 10 or 15 years? Does the drug still work?”
Applying causal inference techniques to data collected over 13 years, his research showed that beta-interferon, a widely used drug for MS, may not be as effective in slowing disease progression in the long term as was previously believed.
Since then, Dr. Karim has continued to explore how causal inference and big-data analytics can be used to study the effectiveness of drugs.
The sheer volume of observational epidemiological data obtained from health care administrative datasets can be both an asset and a challenge in his research, Dr. Karim said. On the one hand, these datasets resemble routine clinical practice more closely than clinical trials do.
On the other hand, although these large datasets capture thousands of variables over time, some vital information may still be missing, since the data are not generally collected with a specific study question in mind.
“Conventionally, you try to pick variables that are important for your analysis by incorporating expert knowledge,” Dr. Karim said. “But when you are dealing with a massive data set that has 10,000 variables, that’s not really possible.”
That’s where Dr. Karim’s research on machine learning comes in.
Machine learning methods can automatically detect various features of a dataset and determine if interactions between variables need to be considered. This allows statisticians to examine data attributes which may otherwise get overlooked by standard statistical models.
If crucial elements are missing from a dataset, machine learning tools can identify “proxy variables” that are closely associated with the missing information and can potentially compensate for it.
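To give a rough sense of the proxy idea, here is a small illustrative sketch in Python. It is not Dr. Karim's method: it assumes simulated data, an unrecorded factor that influences who gets treated, and a plain correlation screen standing in for the machine learning tools described above. The column positions, sample size, and function names are all hypothetical.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def simulate(n=2000, p=50, proxy_cols=(3, 17, 42), seed=0):
    """Simulate p recorded variables, a few of which are noisy proxies
    for a hidden factor that never appears in the dataset itself."""
    rng = random.Random(seed)
    hidden = [rng.gauss(0, 1) for _ in range(n)]  # unrecorded factor
    # Patients with a higher hidden value are more likely to be treated.
    treated = [
        1 if rng.random() < 1 / (1 + math.exp(-2 * u)) else 0
        for u in hidden
    ]
    columns = []
    for j in range(p):
        if j in proxy_cols:
            # Proxy: the hidden factor plus measurement noise.
            col = [u + rng.gauss(0, 0.5) for u in hidden]
        else:
            # Unrelated variable: pure noise.
            col = [rng.gauss(0, 1) for _ in range(n)]
        columns.append(col)
    return columns, treated


def rank_candidates(columns, treated, top_k=3):
    """Return indices of the top_k variables most associated with treatment."""
    scores = [abs(pearson(col, treated)) for col in columns]
    order = sorted(range(len(columns)), key=lambda j: scores[j], reverse=True)
    return order[:top_k]


columns, treated = simulate()
selected = rank_candidates(columns, treated)
print(sorted(selected))
```

In this toy setup the screen never sees the hidden factor, yet the variables it ranks highest are the ones built from it. Real administrative data are far messier, and a simple correlation screen is only a stand-in for the more flexible algorithms used in practice.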
“We often don’t know which variables are important or which functional form of the model is more appropriate in a given big-data scenario,” said Dr. Karim. “With the use of machine learning techniques, you can push the boundaries and go beyond what the simple standard statistical tools have to offer.”
On April 12, Dr. Karim will present a Work in Progress Seminar on using popular machine learning algorithms to enhance high-dimensional propensity scores, a technique for analyzing health care data that has recently come into wider use.