Role ID:
14
Software Engineer - DevOps and MLOps
Location:
San Francisco Bay Area, California, USA
Role Description

We are looking to recruit an exceptional Software Engineer (DevOps and MLOps) to build and maintain the infrastructure that supports our software development, machine learning models, and AI operations.

In this role you will:

  • Design, implement, and manage CI/CD pipelines to facilitate seamless code integration and deployment.
  • Monitor and optimize system performance, availability, and security.
  • Automate infrastructure orchestration and configuration management using tools such as Kubernetes, Ansible, or similar.
  • Configure and maintain data infrastructure appliances.
  • Troubleshoot and resolve issues related to applications, infrastructure, and deployments.
  • Work closely with our development and AI teams to deliver solutions that increase efficiency and stability.
Qualifications

Must-have:

  • BS or MS in software engineering, computer science, or a related field.
  • Proven experience standing up a CI/CD system from scratch.
  • Experience with multi-language build systems (e.g., Bazel, Bob).
  • Proficiency with cloud platforms (e.g., AWS, Azure, GCP) and containerization technologies (e.g., Docker, Kubernetes).
  • Experience with automation tools (e.g., Terraform, Ansible, GitHub Actions, Jenkins) and version control systems (e.g., Git).
  • Strong programming skills in languages such as Python, Go, or Java.
  • A self-starter attitude, with a strong ability to identify and prioritize problems, then plan and execute working solutions.
  • Enthusiasm for working in a fast-paced startup environment and eagerness to support the team on a variety of topics.

Nice-to-have:

  • Experience with MLOps platforms (e.g., MLflow, Kubeflow, or SageMaker).
  • Knowledge of big data technologies (e.g., Hadoop, Spark, or Kafka).
  • Experience with monitoring and observability tools (e.g., Prometheus, Grafana, ELK stack).
  • Understanding of machine learning frameworks (e.g., TensorFlow, PyTorch, or scikit-learn).
  • Experience with edge computing and IoT device management.
  • Knowledge of security best practices and compliance standards in AI/ML environments.
  • Proficiency in database management systems (e.g., PostgreSQL, MongoDB, or Cassandra).
  • Experience with infrastructure-as-code tools (e.g., CloudFormation, Pulumi).
  • Knowledge of GitOps practices and tools (e.g., ArgoCD, Flux).