Member of Technical Staff, Fullstack 4014

San Francisco, United States
About The Company

This company is an AI-driven organization focused on building intelligent solutions that enhance productivity and decision-making. By leveraging advanced machine learning, data analysis, and automation, it develops tools that help businesses streamline workflows, personalize experiences, and unlock insights from complex information. With a strong emphasis on usability and real-world impact, the company bridges cutting-edge AI technology with practical business needs.


Key Responsibilities

– Architect Multi-Environment Deployments: Design and manage the infrastructure to deploy our decision engine into diverse environments, including our own cloud and private client VPCs. You will solve the complex challenges of managing software lifecycles across isolated and secure instances.
– Operationalize the Quant Engine: Transform experimental ML models into resilient production services. You will build the training, inference and retraining pipelines that ensure our models are always fresh and performant.
– Build the Event-Driven Backbone: Architect the scheduling and event-driven triggers that power our high-frequency control loops. You will manage the message queues (e.g., Kafka, SQS) and orchestration layers (e.g., Airflow, Dagster) that coordinate data flow between our context, decision and execution engines.
– Guardian of Reliability: Unexpected issues inevitably arise. You will own system health, setting up comprehensive observability (monitoring, logging, tracing) and acting as the primary troubleshooter when things break. You will convert every incident into an automated prevention mechanism.
– Elevate Developer Experience: You view internal engineers as your customers. You will build the CI/CD pipelines, local development environments and tooling that allow us to ship code faster and with higher confidence.


Requirements

– A Senior Operator: You have 6-8+ years of experience in ML Platform, Developer Experience or Infrastructure Software Engineering roles. You have seen how systems break at scale and know how to design them to survive.
– Platform Mindset: You view internal teams as your customers. You have experience in high-leverage domains like ML Platform or Developer Experience, focusing on building the paved road that maximizes engineering velocity without sacrificing reliability.
– Distributed Systems Native: You are comfortable working with complexity at scale. You understand the challenges of data consistency, concurrency and latency in distributed environments.
– Architect of Reliability: You prioritize system health and visibility. You build the observability and automation required to run critical workloads in production, ensuring that complex deployments remain stable even as they scale across diverse environments.


Bonus Points

– MLOps Tooling: Experience with feature stores, model registries and tools like Ray, MLflow or Kubeflow.
– Client VPC / On-Prem Experience: Experience deploying software into customer-controlled environments (AWS/GCP/Azure).
– Security First: Experience with SOC2 compliance, IAM policies and securing sensitive financial data.

Job Features

Job Category: Fullstack
Seniority: Senior IC / Tech Lead
Base Salary: $180,000 - $250,000
Recruiter: judy.zhu@ocbridge.ai
