Description
### Overview ###
We are seeking **Data Research Engineers** to join our **Multimodal team**, where we are building the next generation of foundation models across vision, language, audio, and beyond. If you are passionate about exploring, designing, and building high-quality datasets to drive frontier AI models, this role is for you.
At Microsoft AI, data is at the heart of innovation, and in this role you will collaborate closely with scientists, engineers, and annotators to curate, analyze, and evaluate diverse multimodal data sources critical to model development. You'll lead efforts in developing novel data collection strategies, improving dataset quality, understanding data-driven model behaviors, and aligning datasets with ethical and societal values.
This is a cross-disciplinary, high-impact role ideal for engineers who want to push the boundaries of what AI can learn from data, especially in multimodal contexts.
Microsoft's mission is to empower every person and every organization on the planet to achieve more. As employees we come together with a growth mindset, innovate to empower others, and collaborate to realize our shared goals. Each day we build on our values of respect, integrity, and accountability to create a culture of inclusion where everyone can thrive at work and beyond.
### Qualifications ###
**Required Qualifications:**
* Bachelor's Degree in AI, Computer Science, Data Science, Statistics, Physics, Engineering, or a related technical field AND technical engineering experience with coding in languages including, but not limited to, Python and common data libraries (Pandas, NumPy, etc.)
* OR equivalent experience
* Experience in data analysis or data engineering
* Proficiency in statistics and exploratory data analysis methods
* Ability to communicate technical findings effectively to research and product teams
**Preferred Qualifications:**
* Master's Degree in Computer Science or related technical field AND technical engineering experience with coding in languages including, but not limited to, Python and common data libraries (Pandas, NumPy, etc.)
* OR equivalent experience
* Familiarity with data processing frameworks such as Spark, Ray, and Apache Beam
* Experience working with large-scale, real-world datasets that are unstructured or semi-structured
\#copilot \#microsoftAI
### Responsibilities ###
* Create high-quality datasets for training and evaluation; run experiments on new datasets (data ablations) to assess their impact and determine the most effective data
* Develop and maintain scalable data pipelines for multimodal ingestion, preprocessing, filtering, and annotation
* Analyze real-world multimodal datasets to assess their quality, diversity, and relevance, and to identify areas for improvement
* Build lightweight tools and workflows for dataset auditing, visualization, and versioning
* Collaborate with Safety, Ethics, and Governance teams to ensure datasets meet standards for quality, privacy, and responsible AI practices