Senior ML Infrastructure Engineer
Job Description
Databricks is looking for a Senior ML Infrastructure Engineer to build and scale the platform that powers machine learning innovation for thousands of companies worldwide. You will play a critical role in designing, developing, and maintaining the robust, scalable, and efficient infrastructure required for training and deploying large-scale machine learning models. This is an opportunity to work on a high-impact platform at the heart of the AI revolution.
Responsibilities:
- Design, build, and maintain scalable and reliable ML infrastructure, including distributed training systems, model serving platforms, and data pipelines.
- Collaborate with ML engineers and data scientists to understand their infrastructure needs and deliver solutions that meet them.
- Optimize infrastructure performance, cost, and resource utilization.
- Develop automation tools and frameworks for ML workflows.
- Troubleshoot and resolve complex infrastructure issues in production environments.
Qualifications:
- Bachelor's or Master's degree in Computer Science, Engineering, or a related field.
- 5+ years of experience in software engineering, with a focus on distributed systems, cloud infrastructure, or ML platforms.
- Strong programming skills in Python and/or Scala/Java.
- Experience with cloud platforms (AWS, Azure, GCP) and containerization technologies (Docker, Kubernetes).
- Familiarity with ML frameworks (e.g., PyTorch, TensorFlow) and MLOps concepts.
- Experience with big data technologies (e.g., Spark, Hadoop) is a plus.
Benefits:
- Competitive salary, stock options, and performance bonuses.
- Comprehensive health, dental, and vision benefits.
- Generous paid time off and parental leave.
- Professional development and learning opportunities.
- Dynamic and fast-paced work environment.
Skills & Tags
mlops, infrastructure, distributed systems, python, cloud