About HILOS
HILOS is building the first 3D design platform made specifically for footwear—no CAD required. Powered by machine learning, our core geometry engine breaks down barriers between 2D and 3D, design and manufacturing.
Our cross-functional team sits at the intersection of footwear and software, combining craft and intelligence to give designers a second brain and third hand. We’re here to amplify human impact in real-world creation—and we believe the tools should be just as inspiring as the work they enable.
Role Overview
We're seeking a Lead Platform Engineer to own and evolve our production infrastructure as we scale our creator platform and computational design capabilities. The role carries significant ownership: you'll be the infrastructure partner who keeps our platform reliable, observable, and scalable while enabling product engineers to ship features rapidly and safely.
You'll architect and operate the systems that power 3D geometry processing, algorithm execution, and manufacturing workflows—building platform abstractions that allow us to integrate specialized services cleanly and swap providers as our needs evolve. This is a high-impact, hands-on role where you'll make fundamental technical decisions, establish engineering standards, and directly shape how we build software at HILOS.
Responsibilities
- Production Infrastructure & Reliability:
- Own and operate AWS infrastructure including ECS container orchestration, managed database services, and event streaming platforms
- Ensure high uptime for creator-facing services through proactive monitoring and incident response
- Implement automated alerting, runbooks, and on-call practices that enable rapid diagnosis and resolution of production issues
- Conduct capacity planning and performance optimization to support platform growth
- Platform Architecture & Service Integration:
- Design and implement platform abstractions for integrating external services (3D geometry engines, ML inference, rendering pipelines, manufacturing APIs)
- Build systems that allow clean swapping of providers and independent scaling of components as requirements evolve
- Lead our transition from REST-based to event-driven architecture, including the implementation of our Kafka-based messaging infrastructure, while maintaining production stability
- Define and improve the coding standards and architectural patterns the engineering team follows
- Technical Decision-Making & Strategy:
- Drive build-vs-buy decisions for platform services, database technologies, event streaming approaches, and observability tooling
- Balance startup constraints (cost, time-to-market, team size) with technical excellence and future scalability
- Evaluate distributed systems patterns (microservices, service mesh, event-driven architecture) and make pragmatic choices
- Establish and maintain service contract standards using a hybrid approach: OpenAPI specs for external-facing APIs, workflow YAML for internal service orchestration
- Lead the adoption of consumer-driven contract testing to enable independent deployments across frontend and backend teams
- Implement cost optimization strategies and monitoring to ensure efficient resource utilization
- Observability & Operational Excellence:
- Maintain a comprehensive observability stack, including APM tooling, distributed tracing across microservices, and structured log aggregation
- Build and maintain custom metrics dashboards that provide visibility into system health and business-critical workflows
- Implement monitoring and alerting strategies that catch issues before they impact creators
- Document architecture, data flows, and operational procedures to enable team autonomy
- Computational Workload Infrastructure:
- Build and maintain async job processing infrastructure for computationally intensive workloads (3D model processing, algorithm execution, batch operations)
- Design robust queuing, retry logic, dead-letter handling, and job status visibility
- Collaborate with service teams to understand the characteristics of computational design workloads and optimize infrastructure accordingly
- Translate research prototypes from our Lab into production-ready systems
- Infrastructure Automation & DevOps:
- Own and evolve GitHub Actions CI/CD pipelines for containerized deployments to AWS ECS
- Implement infrastructure-as-code practices (Terraform preferred) for AWS resource management
- Establish and maintain environment parity across dev, staging, and production (currently a known gap)
- Maintain secrets management via AWS Secrets Manager with build-time and runtime injection patterns
- Extend CI/CD pipelines with automated testing and deployment for infrastructure changes
- Establish practices that allow other engineers to safely make infrastructure changes independently
- Create self-service tooling that reduces infrastructure-related blockers for product engineering
- Database & Data Infrastructure:
- Design and operate relational database systems (RDS/Aurora) with proper schema design, indexing, and query optimization
- Implement backup, recovery, and high-availability strategies for critical data
- Work with product engineers to design data models that support current features while anticipating future needs
- Establish data migration and schema evolution practices
- Team Enablement & Knowledge Sharing:
- Lead the team on API design, service architecture, and deployment best practices
- Mentor other engineers on infrastructure practices, distributed systems concepts, and operational excellence
- Foster a culture of ownership, reliability, and continuous improvement
Qualifications
- 5-7+ years of production experience with AWS infrastructure, specifically ECS/Fargate, RDS/Aurora, and event streaming platforms (Kinesis, MSK, or similar)
- Strong hands-on experience building and operating distributed systems at scale, including microservices architecture and inter-service communication patterns
- Expertise in containerization and orchestration (Docker, ECS) including resource optimization, health checks, and deployment strategies
- Experience with event-driven architecture patterns