Be the Next Game Changer

Data Science Intern

United States, NY, New York

Medidata: Power Smarter Treatments and Healthier People


Medidata is leading the digital transformation of life sciences, creating hope for millions of patients. Medidata helps generate the evidence and insights to help pharmaceutical, biotech, medical device and diagnostics companies, and academic researchers accelerate value, minimize risk, and optimize outcomes. More than one million registered users across 1,900+ customers and partners access the world's most trusted platform for clinical development, commercial, and real-world data. Medidata, a Dassault Systèmes company, is headquartered in New York City and has offices around the world to meet the needs of its customers. Discover more at and follow us @medidata.


Job Description:


Program Overview:

The Medidata AI Summer internship program is a competitive and comprehensive 12-week rotational program. We provide pioneering data and analytics products to most of the $100B pharmaceutical development industry. Our team is made up of data scientists, statisticians, computer scientists, implementation designers, engineers, and business experts.

We are looking for interns to be an integral part of this dynamic team, where you would drive innovative research and client-facing deliverables and collaborate to build new data science solutions. Our team leverages industry-leading data assets and analytical models to transform the clinical development industry, driving clinical and operational success for our clients and partners.

Participation in the internship program requires that you are located in the United States for the duration of the internship program. Roles are based out of either New York City or Boston. This internship is intended for students who are currently pursuing a Master’s degree program in a quantitative discipline with an anticipated graduation date on or before June 2024, depending on their program. 

Position Overview / Project Description:

Clinical trial data volume has increased seven-fold in the last 20 years to over 3.6 million data points in a typical Phase III study. Medidata’s Clinical Cloud solution tackles the evolving data landscape by offering real-time, automated clinical data review using machine learning, analytics, and visualizations in a unified platform. In this platform, users can write custom code, known as a custom function, to enable edit checks on clinical data to meet clinical requirements. For instance, an edit check can be made to ensure that patient body temperatures are within plausible ranges.

This data-driven research project aims to leverage machine learning and natural language processing to identify similar custom functions across the pool of over 6 million functions. These insights will facilitate the consolidation of custom functions to improve the user experience and reduce software maintenance in the platform.

This role will work alongside Machine Learning engineers and researchers in the Data Science team. The research team specializes in building predictive models and advanced algorithms to support key Medidata products.

This research initiative involves developing a machine learning pipeline to analyze the source code to characterize similar functions. Specifically, research is required to explore and implement a transformer-based model to embed the source code, followed by dimensionality reduction, clustering, and visualization.
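To give a rough feel for the "find similar custom functions" idea, the sketch below groups a few invented edit-check snippets by similarity. It is an illustration under loud assumptions, not the project's method: the snippets, their C-like syntax, the function names, and the 0.8 threshold are all hypothetical, and a simple bag-of-tokens count vector with cosine similarity stands in for the transformer embedding. Dimensionality reduction and visualization are left out entirely, and greedy threshold grouping stands in for a real clustering algorithm.

```python
import math
import re
from collections import Counter

# Hypothetical custom functions, invented for illustration only.
FUNCTIONS = {
    "check_temp_a": "if (temp < 35.0 || temp > 42.0) { raise_query('Temperature out of range'); }",
    "check_temp_b": "if (body_temp < 35.0 || body_temp > 42.0) { raise_query('Temperature out of range'); }",
    "check_visit_date": "if (visit_date < consent_date) { raise_query('Visit precedes consent'); }",
}

# Identifiers, numbers, and single punctuation characters each count as a token.
TOKEN_PATTERN = re.compile(r"[A-Za-z_]+|\d+(?:\.\d+)?|[^\sA-Za-z_\d]")


def embed(source):
    """Stand-in embedding: a bag-of-tokens count vector (NOT a transformer)."""
    return Counter(TOKEN_PATTERN.findall(source))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def group_similar(functions, threshold=0.8):
    """Greedy grouping: a function joins the first group whose
    representative (its first member) is at least `threshold` similar."""
    groups = []  # list of (representative_vector, member_names)
    for name, source in functions.items():
        vector = embed(source)
        for representative, members in groups:
            if cosine(vector, representative) >= threshold:
                members.append(name)
                break
        else:
            groups.append((vector, [name]))
    return [members for _, members in groups]


GROUPS = group_similar(FUNCTIONS)
print(GROUPS)
```

In this toy corpus the two temperature checks differ only in a variable name, so they land in one group while the date check stays separate. A learned code embedding would be expected to capture similarity that token counts miss, such as reordered conditions or renamed helpers.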


This internship also offers a unique opportunity to participate in an Innovation Lab. This allows interns to partner with industry leaders and cross-functional teams to work on a real-world business problem that Medidata currently faces. All participants present their solutions to the leadership of the AI team, and the winner presents to the SVPs and CEO of Medidata and other key leadership.

Role responsibilities: 

  • Design, develop and validate machine learning models for novel medical applications. Areas of team focus include clinical trial document NLP, classification and clustering algorithms

  • Evaluate and assess novel tools, algorithms, and technologies that enable data science capabilities

  • Provide support functions around model-building, including data curation, cleaning and transformation

  • Ability to understand and explore machine learning models (including transformer models), as well as perform model evaluation

  • Ability to work independently on complex and diverse issues and propose intelligent solutions

  • Bring developed methods and code to production for integration with existing and new products

  • Work directly with our team comprised of the brightest minds in technology, research, and mathematics as well as senior interfaces from leading life sciences companies across the globe

Qualifications / Competencies:

  • Ability to translate business challenges into data pipelines and model frameworks, owning and driving successful projects

  • Strong communication skills to articulate highly technical methods to diverse audiences to shape decision-making with a collaborative focus

  • Fluency in statistical tools and programming languages that allow you to be self-sufficient in handling data (e.g. Python, SQL, bash)

  • Ability to apply machine learning and analytical algorithms (feature selection, classification, clustering, etc.)

Education and Experience:

  • Bachelor's, Master's, or PhD in Math, Statistics, Computer Science, Physics, Engineering, Bioinformatics, or another quantitative field with a strong foundation in statistical methodology and computation

  • Experience with machine learning techniques (classification, clustering, deep learning, etc.)

  • Experience using Git version control

  • Experience with large healthcare datasets and with deep learning is a plus

  • Experience or interest in NLP is a plus

  • Experience with Linux environments and containers is a plus


The salary range posted below refers only to positions that will be physically based in New York City. As with all roles, Medidata sets ranges based on a number of factors including function, level, candidate expertise and experience, and geographic location. Pay ranges for candidates in locations other than New York City may differ based on the local market data in that region. The base salary pay range for this position is $32.00 to $37.00 per hour with a $3,500 sign-on bonus.





Equal opportunity

In order to provide equal employment and advancement opportunities to all individuals, employment decisions at 3DS are based on merit, qualifications and abilities. 3DS is committed to a policy of non-discrimination and equal opportunity for all employees and qualified applicants without regard to race, color, religion, gender, sex (including pregnancy, childbirth or medical or common conditions related to pregnancy or childbirth), sexual orientation, gender identity, gender expression, marital status, familial status, national origin, ancestry, age (40 and above), disability, veteran status, military service, application for military service, genetic information, receipt of free medical care, or any other characteristic protected under applicable law. 3DS will make reasonable accommodations for qualified individuals with known disabilities, in accordance with applicable law.

Covid statement

Our Company requires all U.S. employees to be fully vaccinated against COVID-19 and to provide documentation of full vaccination, unless qualified for a medical, religious or state-required accommodation or otherwise exempt consistent with applicable law. Although accommodation requests will be considered (and granted where appropriate/possible), it may be determined that a candidate is unable to adequately perform the essential functions of the position without imposing an undue hardship due to customer requirements, staffing needs, or other business reasons. Definition of full vaccination: Employees are considered to be fully vaccinated two weeks after their second dose in a 2-dose series or two weeks after a single-dose vaccine.