Data Scientist Intern

United States, NY, New York

Medidata: Power Smarter Treatments and Healthier People

Medidata is leading the digital transformation of life sciences, creating hope for millions of patients. Medidata helps generate the evidence and insights to help pharmaceutical, biotech, medical device and diagnostics companies, and academic researchers accelerate value, minimize risk, and optimize outcomes. More than one million registered users across 1,900+ customers and partners access the world's most trusted platform for clinical development, commercial, and real-world data. Medidata, a Dassault Systèmes company, is headquartered in New York City and has offices around the world to meet the needs of its customers. Discover more and follow us @medidata.

At Medidata, interns have the opportunity to accelerate their careers by working closely with experienced professionals and gaining valuable, hands-on, full-time work experience. As part of our global organization, interns work alongside talented and committed professionals who help them build a strong foundation for achieving their career goals. For 12 weeks, beginning May 20, 2024, interns will gain a deep understanding of what it means to be a Medidatian. United around a single goal of empowering smarter treatments and healthier people, Medidatians work in a culture of curiosity, innovation and fun. You will contribute to the line of business with sustainable and meaningful work.

Our Summer Internship program also includes instructor-led training, guided mentorship, exposure to senior leadership and community service. In addition to individual, role-specific responsibilities, each intern will participate in our Intern Innovation Lab. Assigned to cross-functional teams, interns will work closely together to develop an innovative solution to a business problem currently facing Medidata. As they work diligently to present their final solutions to a panel of top Medidata leaders, we are confident that our interns will make a significant impact on our business.

The Position: We are seeking an intern to play a key role in our dynamic team, driving innovative research in synthetic data generation and deriving insights from clinical trial data using LLM-based solutions. Utilizing industry-leading data assets and analytical models, our team is dedicated to transforming the clinical development industry, ensuring both clinical and operational success for our clients and partners.

The key components of this internship are:

  • Performing research and development in the area of synthetic data generation and LLMs
  • Implementing and evaluating algorithms based on research literature
  • Creating, documenting, and maintaining code
  • Reporting findings to internal teams

This project centers on leveraging generative models to enhance the generation of synthetic clinical trial data and derive valuable clinical insights. The scope of the project includes processing and training generative models using large-scale, complex clinical datasets and external data sources. Key aspects of the project encompass data standardization, feature extraction, identification of external data for augmentation, model development, and evaluation. Collaborating with the Medidata AI Synthetic Data Science team, specialists in clinical trial datasets, this role focuses on building advanced models to extract clinical insights across disease indications. The project demands expertise in longitudinal datasets, machine learning, deep learning, LLMs, NLP/NLU models, and/or generative AI models to craft a cutting-edge solution for processing and extracting insights from clinical trial longitudinal datasets.

Your Requirements:

  • Strong performance in a Bachelor's program in Data Science, Mathematics, Statistics, or Computer Science.
  • Proficiency in Python (with pandas) that allows self-sufficiency in analyzing tabular and longitudinal data.
  • 2+ years of experience with Machine Learning and AI, with a focus on areas such as NLP, Deep Learning, Language Modeling, and/or Generative Modeling.
  • Ability to apply statistical analysis techniques.
  • Competence in utilizing ML techniques including Transformers, LSTM, GANs, CNNs, VAEs or other deep gradient-based methods.
  • Demonstrated ability to think creatively, independently access and analyze data, and effectively evaluate both the big picture and key details.
  • Excellent interpersonal, verbal, and written communication skills.
  • Strong time management and problem-solving abilities.
  • Capable of multitasking in a fast-paced environment, with the ability to prioritize deliverables for optimal results.

As with all roles, Medidata sets ranges based on a number of factors including function, level, candidate expertise and experience, and geographic location. The salary range for positions that will be physically based in New York, NY is $32.00 to $37.00 per hour with a $3,500 sign-on bonus.

Diversity statement

As a game-changer in sustainable technology and innovation, Dassault Systèmes is striving to build more inclusive and diverse teams across the globe. We believe that our people are our number one asset and we want all employees to feel empowered to bring their whole selves to work every day. It is our goal that our people feel a sense of pride and a passion for belonging. As a company leading change, it’s our responsibility to foster opportunities for all people to participate in a harmonized Workforce of the Future.

Equal opportunity

In order to provide equal employment and advancement opportunities to all individuals, employment decisions at 3DS are based on merit, qualifications and abilities. 3DS is committed to a policy of non-discrimination and equal opportunity for all employees and qualified applicants without regard to race, color, religion, gender, sex (including pregnancy, childbirth or medical or common conditions related to pregnancy or childbirth), sexual orientation, gender identity, gender expression, marital status, familial status, national origin, ancestry, age (40 and above), disability, veteran status, military service, application for military service, genetic information, receipt of free medical care, or any other characteristic protected under applicable law. 3DS will make reasonable accommodations for qualified individuals with known disabilities, in accordance with applicable law.