Location: USA - MA - Boston
Internal Number: 56664BR
The Center for Computational Biomedicine (CCB) is a new center within the Blavatnik Institute at Harvard Medical School. Our mission is to provide cutting-edge computational capabilities, data analysis, and data integration technologies to support medical and biological research within the Medical School. Based at the Harvard Medical School Longwood Campus, we are part of a vibrant community of scientists, physicians, and engineers whose goal is to advance the boundaries of knowledge and improve patient care. The working environment combines the best features of a startup (fast pace, flexibility, flat hierarchies) with those of one of the leading medical schools (excellent benefits, outstanding opportunities for learning, great resources, name recognition).
CCB is looking for an individual to join the Data and Analytic Platforms Group, a group of engineers and scientists developing data warehousing and analytic solutions in support of epidemiology, healthcare economics, machine learning, and basic science research.
The Group works to reduce the burden on faculty by developing centrally managed, shareable data solutions that can be used across research silos. We curate very large public and private healthcare utilization (insurance claims, electronic health records), multi-omics, environmental exposure, and social determinants data sets, provision access to those curated data sets, and develop analytic frameworks that accelerate reproducible academic research on top of them. Collectively, these data sets contain information on hundreds of millions of patients.
This position reports to the Director of the CCB Data and Analytic Platforms Group. Primary responsibilities will include designing and implementing relational database architecture (schemas, indexing, stored procedures, ETL processes, etc.) to warehouse multi-terabyte data sets in Microsoft SQL Server. This will include periodically evaluating query performance metrics to ensure real-time availability to the research community and recommending modifications to the underlying database platform to resolve any issues identified. The bulk of this design work will be left to the candidate, while a smaller portion will involve refactoring (or strategically deciding to abandon) existing ETL and indexing strategies. The data sets will be staged in a combination of proprietary schemas and the open-source i2b2 data model.
The candidate will also have opportunities to work with individual scientific research teams to help improve their workflows.
Additional Qualifications and Skills