Senior ML Infrastructure Engineer
Job Description
Meta is seeking an experienced Senior ML Infrastructure Engineer to build and maintain the next generation of our machine learning platform. You will be responsible for developing scalable, reliable, and efficient infrastructure that enables our researchers and engineers to train and deploy cutting-edge AI models. This role requires a deep understanding of distributed systems, cloud computing, and ML workflows.
Responsibilities:
- Design, build, and operate large-scale ML infrastructure, including compute clusters, storage systems, and networking.
- Develop tools and services to streamline ML workflows, such as data pipelines, experiment tracking, and model serving.
- Optimize ML training and inference performance.
- Collaborate with ML researchers and engineers to understand their needs and provide robust solutions.
- Contribute to the overall architecture and strategy of Meta's AI infrastructure.
Requirements:
- Bachelor's or Master's degree in Computer Science, Electrical Engineering, or a related field, or equivalent practical experience.
- 5+ years of experience in software engineering, with a significant focus on distributed systems and infrastructure.
- Experience with cloud computing platforms (AWS, Azure, GCP).
- Proficiency in at least one programming language such as Python, C++, or Go.
- Familiarity with ML frameworks (PyTorch, TensorFlow) and MLOps concepts.
- Experience with containerization technologies (Docker, Kubernetes).
Benefits:
- Highly competitive salary, bonus, and stock grants.
- Comprehensive health, wellness, and retirement benefits.
- Relocation assistance (if applicable).
- Opportunities to shape the future of AI infrastructure at a global scale.
Skills & Tags
mlops, infrastructure, distributed systems, python, c++