Automatic detection of nursing quality metrics using EHR data

This collaborative project, involving multiple departments at Emory University, aims to apply computational techniques such as noisy labelling, weak supervision, and truth inference to Electronic Health Records that are freely and openly accessible on the Internet, in order to automatically extract relevant nursing quality metrics. These extracted metrics can inform both nursing staff and management, helping them better identify the needs of patients.

Using Graphical Models to Extract Meaningful, High-Quality Features from Electronic Health Records

I’m focusing on applying graphical models to Electronic Health Records that are freely and openly accessible on the Internet, to extract more robust features that can later be fed into well-known machine learning algorithms. Once this new set of features has been carefully tested and validated, it may be used for disease prediction (a classification task) or for predicting other health-related outcomes.

Tensor Factorization for Phenotyping of patients with Chronic Diseases

The purpose of this project was to obtain novel clusters of patient features and attributes by applying tensor factorization to Electronic Health Records data. Once these clusters, or phenotypes, are chosen, each patient's membership in a specific cluster or set of attributes can serve as a summarized, less noisy feature in future machine learning tasks. The ultimate goal is that more robust and reliable biomedical predictions will help us deliver appropriate interventions and courses of treatment to patients.
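To make the idea of phenotypes and membership concrete, here is a minimal sketch of a CP (CANDECOMP/PARAFAC) factorization fitted by alternating least squares on a toy patient x diagnosis x medication tensor. It is only an illustration: the toy data, the function names (`khatri_rao`, `unfold`, `cp_als`), and the choice of plain CP-ALS are all assumptions, and the project may well have used a different or constrained factorization.

```python
import numpy as np

def khatri_rao(B, C):
    """Column-wise Kronecker product: (J x R) and (K x R) -> (J*K x R)."""
    J, R = B.shape
    K, _ = C.shape
    return (B[:, None, :] * C[None, :, :]).reshape(J * K, R)

def unfold(X, mode):
    """Matricize X along `mode`, keeping the other modes in their original order."""
    return np.moveaxis(X, mode, 0).reshape(X.shape[mode], -1)

def cp_als(X, rank, n_iter=500, seed=0):
    """Fit a rank-`rank` CP model to a 3-way tensor by alternating least squares."""
    rng = np.random.default_rng(seed)
    factors = [rng.random((dim, rank)) for dim in X.shape]
    for _ in range(n_iter):
        for mode in range(3):
            first, second = [factors[m] for m in range(3) if m != mode]
            kr = khatri_rao(first, second)
            # Gram matrix of the Khatri-Rao product is a Hadamard product of Grams.
            gram = (first.T @ first) * (second.T @ second)
            factors[mode] = unfold(X, mode) @ kr @ np.linalg.pinv(gram)
    return factors

# Hypothetical toy data: 6 patients x 4 diagnoses x 3 medications,
# built from two disjoint "phenotypes" (this is made-up data, not EHR data).
A_true = np.array([[1, 0], [2, 0], [1, 0], [0, 1], [0, 2], [0, 1]], dtype=float)
B_true = np.array([[1, 0], [1, 0], [0, 1], [0, 1]], dtype=float)
C_true = np.array([[1, 0], [0, 1], [0, 1]], dtype=float)
X = np.einsum("ir,jr,kr->ijk", A_true, B_true, C_true)

A, B, C = cp_als(X, rank=2)
X_hat = np.einsum("ir,jr,kr->ijk", A, B, C)
rel_error = np.linalg.norm(X - X_hat) / np.linalg.norm(X)

# Each patient's dominant component acts as a phenotype membership label.
# Component order and sign are arbitrary in CP, hence the absolute value.
membership = np.abs(A).argmax(axis=1)
```

The rows of `A` (or the derived `membership` labels) are the kind of summarized, lower-dimensional patient features that could be passed to a downstream classifier, while the columns of `B` and `C` describe which diagnoses and medications define each phenotype.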

Analysis of the performance of truth inference methods in adversarial settings and robust inference through matrix completion

Crowdsourcing is a paradigm that provides a cost-effective solution for obtaining services or data from a large group of users. It is increasingly being used in modern society for data collection in domains such as image annotation and real-time traffic reporting. A key component of these crowdsourcing applications is truth inference, which aims to derive the true answer for a given task from the user-contributed data, e.g. the existence of objects in an image or the true traffic condition of a road. In addition to the variable quality of the contributed data, a potential challenge for crowdsourcing applications is data poisoning attacks, in which malicious users intentionally and strategically report incorrect information in order to mislead the system into inferring the wrong truth for all tasks or for a targeted set of tasks. We propose a comprehensive taxonomy of data poisoning attacks on truth inference in crowdsourcing and systematically evaluate state-of-the-art truth inference methods under various data poisoning attacks. We use several evaluation metrics to analyze the robustness or susceptibility of truth inference methods to these attacks, which sheds light on the resilience of existing methods and ultimately helps in building more robust truth inference methods in an open setting.
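As a small illustration of the setting described above, the sketch below runs majority voting, the simplest truth inference baseline, on a toy set of binary tasks and then repeats it after colluding attackers flip the label on targeted tasks. Everything here is hypothetical: the data, the function names (`majority_vote`, `accuracy`, `poison`), and the attack itself are assumptions chosen for brevity, not the methods or attacks evaluated in the project.

```python
from collections import Counter

def majority_vote(answers):
    """Infer one label per task. `answers` maps task -> list of (worker, label)."""
    truths = {}
    for task, votes in answers.items():
        counts = Counter(label for _, label in votes)
        best = max(counts.values())
        # Break ties deterministically by picking the smallest label.
        truths[task] = min(label for label, c in counts.items() if c == best)
    return truths

def accuracy(inferred, ground_truth):
    """Fraction of tasks whose inferred label matches the ground truth."""
    correct = sum(inferred[t] == ground_truth[t] for t in ground_truth)
    return correct / len(ground_truth)

def poison(answers, n_attackers, targets, ground_truth):
    """Add colluding workers who report the wrong binary label on target tasks."""
    poisoned = {task: list(votes) for task, votes in answers.items()}
    for a in range(n_attackers):
        for task in targets:
            poisoned[task].append((f"attacker_{a}", 1 - ground_truth[task]))
    return poisoned

# Hypothetical toy data: three binary tasks answered by three normal workers.
ground_truth = {"t1": 1, "t2": 0, "t3": 1}
answers = {
    "t1": [("w1", 1), ("w2", 1), ("w3", 0)],
    "t2": [("w1", 0), ("w2", 0), ("w3", 0)],
    "t3": [("w1", 1), ("w2", 0), ("w3", 1)],
}

clean_acc = accuracy(majority_vote(answers), ground_truth)

# Targeted attack: two attackers flip the label on tasks t1 and t3 only.
attacked = poison(answers, n_attackers=2, targets=["t1", "t3"], ground_truth=ground_truth)
attacked_acc = accuracy(majority_vote(attacked), ground_truth)
```

In this toy case the two attackers are enough to outvote the honest majority on both targeted tasks, so accuracy falls from 1.0 to 1/3. Comparing accuracy before and after an attack in this way is one simple instance of the kind of robustness metric the evaluation relies on.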