Medidata: Power Smarter Treatments and Healthier People
Medidata is leading the digital transformation of life sciences, creating hope for millions of patients. Medidata helps generate the evidence and insights to help pharmaceutical, biotech, medical device and diagnostics companies, and academic researchers accelerate value, minimize risk, and optimize outcomes. More than one million registered users across 1,900+ customers and partners access the world's most trusted platform for clinical development, commercial, and real-world data. Medidata, a Dassault Systèmes company, is headquartered in New York City and has offices around the world to meet the needs of its customers. Discover more at www.medidata.com and follow us @medidata.
The Medidata AI Summer internship program is a competitive and comprehensive 12-week rotational program. We provide pioneering data and analytics products to most of the $100B pharmaceutical development industry and our team is made up of data scientists, statisticians, computer scientists, implementation designers, engineers, and business experts.
We are looking for interns to be an integral part of this dynamic team, where you would drive innovative research and client-facing deliverables and collaboratively build new data science solutions. Our team leverages industry-leading data assets and analytical models to transform the clinical development industry, driving clinical and operational success for our clients and partners.
Participation in the internship program requires that you are located in the United States for the duration of the internship program. Roles are based out of either New York City or Boston. This internship is intended for students who are currently pursuing a Master’s degree program in a quantitative discipline with an anticipated graduation date on or before June 2024, depending on their program.
Position Overview / Project Description:
Clinical trial data volume has increased seven-fold in the last 20 years to over 3.6 million data points in a typical Phase III study. Medidata’s Clinical Cloud solution tackles the evolving data landscape by offering real-time, automated clinical data review using machine learning, analytics, and visualizations in a unified platform. In this platform, users can write custom code, known as a custom function, to enable edit checks on clinical data to meet clinical requirements. For instance, an edit check can be made to ensure that patient body temperatures are within plausible ranges.
This data-driven research project aims to leverage machine learning and natural language processing to identify similar custom functions across the pool of over 6 million functions. These insights will facilitate the consolidation of custom functions to improve the user experience and reduce software maintenance in the platform.
This role will work alongside Machine Learning engineers and researchers in the Data Science team. The research team specializes in building predictive models and advanced algorithms to support key Medidata products.
This research initiative involves developing a machine learning pipeline to analyze the source code to characterize similar functions. Specifically, research is required to explore and implement a transformer-based model to embed the source code, followed by dimensionality reduction, clustering, and visualization.
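As a rough illustration of what "identifying similar custom functions" can look like, here is a minimal sketch in Python. It is not Medidata's pipeline: the transformer embedding is swapped for a simple bag-of-tokens vector, dimensionality reduction and visualization are left out, and clustering is a plain similarity threshold with union-find grouping. The snippets, function names, the `raise_query` call, and the 0.7 threshold are all hypothetical and chosen only for the example.

```python
import math
import re
from collections import Counter

# Hypothetical custom-function snippets (assumption: real ones live in the platform).
FUNCTIONS = {
    "check_temp_a": "if temp < 35.0 or temp > 42.0: raise_query('temperature out of range')",
    "check_temp_b": "if temperature < 35.0 or temperature > 42.0: raise_query('temperature out of range')",
    "check_visit_date": "if visit_date < consent_date: raise_query('visit before consent')",
}


def embed(source):
    """Stand-in for a learned embedding: counts of identifier, number, and symbol tokens."""
    tokens = re.findall(r"[A-Za-z_]\w*|\d+(?:\.\d+)?|[^\s\w]", source)
    return Counter(tokens)


def cosine(a, b):
    """Cosine similarity between two sparse token-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster(functions, threshold=0.7):
    """Group functions whose pairwise similarity reaches the threshold (single-link)."""
    names = list(functions)
    vectors = {name: embed(functions[name]) for name in names}
    parent = {name: name for name in names}

    def find(name):
        # Follow parent links to the group representative, compressing the path.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if cosine(vectors[a], vectors[b]) >= threshold:
                parent[find(a)] = find(b)

    groups = {}
    for name in names:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(group) for group in groups.values())


if __name__ == "__main__":
    for group in cluster(FUNCTIONS):
        print(group)
```

With these three snippets, the two temperature checks end up in one group and the visit-date check stays on its own. The all-pairs comparison is quadratic, so a pool of millions of functions would need a learned embedding plus approximate nearest-neighbor search or a scalable clustering method instead.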
This internship also offers a unique opportunity to participate in an Innovation Lab. This allows interns to partner with industry leaders and cross-functional teams to work on a real-world business problem that Medidata currently faces. All participants present their solutions to the leadership of the AI team, and the winner presents to the SVPs and CEO of Medidata and other key leadership.
Design, develop and validate machine learning models for novel medical applications. Areas of team focus include clinical trial document NLP, classification and clustering algorithms
Evaluate and assess novel tools, algorithms, and technologies that enable data science capabilities
Provide support functions around model-building, including data curation, cleaning and transformation
Ability to understand and explore machine learning models (including transformer models), as well as perform model evaluation
Ability to work independently on complex and diverse issues and propose intelligent solutions
Bring developed methods and code to production for integration with existing and new products
Work directly with our team, composed of the brightest minds in technology, research, and mathematics, as well as senior interfaces from leading life sciences companies across the globe
Qualifications / Competencies:
Ability to translate business challenges into data pipelines and model frameworks, owning and driving successful projects
Strong communication skills to articulate highly technical methods to diverse audiences to shape decision-making with a collaborative focus
Fluency in statistical tools and programming languages that allow you to be self-sufficient in handling data (e.g. Python, SQL, bash)
Ability to apply machine learning and analytical algorithms (feature selection, classification, clustering, etc.)
Education and Experience:
Bachelor's, Master's, or PhD in Math, Statistics, Computer Science, Physics, Engineering, Bioinformatics, or another quantitative field with a strong foundation in statistical methodology and computation.
Experience with machine learning techniques (classification, clustering, deep learning, etc.)
Experience using Git version control
Experience with large healthcare datasets and experience with deep learning is a plus
Experience or interest in NLP is a plus
Experience with Linux environments and containers is a plus
The salary range posted below refers only to positions that will be physically based in New York City. As with all roles, Medidata sets ranges based on a number of factors including function, level, candidate expertise and experience, and geographic location. Pay ranges for candidates in locations other than New York City may differ based on the local market data in that region. The base salary pay range for this position is $32.00 to $37.00 per hour with a $3,500 sign-on bonus.