Job Description:
• Implement scalable and reliable systems leveraging cloud-based architectures, technologies and platforms to handle model inference at scale.
• Deploy and manage machine learning & data pipelines in production environments.
• Work on containerization and orchestration solutions for model deployment.
• Participate in fast iteration cycles, adapting to evolving project requirements.
• Collaborate as part of a cross-functional Agile team to create and enhance software that enables state-of-the-art big data and ML applications.
• Leverage CI/CD best practices, including test automation and monitoring, to ensure successful deployment of ML models and application code.
• Ensure all code is well-managed to reduce vulnerabilities, models are well-governed from a risk perspective, and ML work follows best practices in Responsible and Explainable AI.
• Collaborate with data scientists, software engineers, data engineers, and other stakeholders to develop and implement best practices for MLOps, including CI/CD pipelines, version control, model versioning, monitoring, alerting, and automated model deployment.
• Manage and monitor machine learning infrastructure, ensuring high availability and performance.
• Implement robust monitoring and logging solutions for tracking model performance and system health.
• Monitor deployed models in real time, analyze performance data, and proactively identify and address issues before they degrade model performance.
• Troubleshoot and resolve production issues related to ML model deployment, performance, and scalability in a timely and efficient manner.
• Implement security best practices for machine learning systems and ensure compliance with data protection and privacy regulations.
• Collaborate with platform engineers to effectively manage cloud compute resources for ML model deployment, monitoring, and performance optimization.
• Develop and maintain documentation, standard operating procedures, and guidelines related to MLOps processes, tools, and best practices.
Requirements:
• Master's or doctoral degree in computer science, electrical engineering, mathematics, or a similar field.
• Typically requires 7+ years of hands-on work experience developing and applying advanced analytics solutions in a corporate environment, including at least 4 years of experience programming in Python.
• At least 3 years of experience designing and building data-intensive solutions using distributed computing.
• At least 3 years of experience productionizing, monitoring, and maintaining models.
• Must-have skills:
• Understanding of the Azure stack, including Azure Machine Learning, Azure Data Factory, Azure Databricks, Azure Kubernetes Service, and Azure Monitor.
• Demonstrated expertise in building and deploying AI/Machine Learning solutions at scale on cloud platforms such as AWS, Azure, or Google Cloud Platform.
• Experience in developing and maintaining APIs (e.g., REST).
• Experience specifying infrastructure and working with Infrastructure as Code tools (e.g., Ansible, Terraform).
• Experience in designing, developing & scaling complex data & feature pipelines feeding ML models and evaluating their performance.
• Ability to work across the full stack and move fluidly between programming languages and MLOps technologies (e.g., Python, Spark, Databricks, GitHub, MLflow, Airflow).
• Expertise in Unix Shell scripting and dependency-driven job schedulers.
• Understanding of security and compliance requirements in ML infrastructure.
• Experience with visualization technologies (e.g., R Shiny, Streamlit, Dash, Tableau, Power BI).
• Familiarity with data privacy standards, methodologies, and best practices.
Benefits:
• Significant career development opportunities as the company grows.
• A unique opportunity to be part of a small, fast-growing, challenging, and entrepreneurial environment with a high degree of individual responsibility.