2026.02.28
Senior Software Engineer, Incident Lake (SIGQ)
Description
As a Senior Software Engineer for Incident Lake, you will own the full lifecycle of our incident data platform. Incident Lake is the mission-critical backbone of our reliability strategy, so it must remain operational even during catastrophic infrastructure failures.
You will lead the design and implementation of a resilient, multi-cloud, multi-region architecture that spans every layer from the CDN to the database. You will also drive technical decisions from first principles: we do not reach for "fuzzy" LLM API calls by default. Instead, you will engage in rigorous discussions to select the best method for each reliability challenge, from classical machine learning to state-of-the-art LLMs.
Core Responsibilities
・Resilient Architecture: Design and operate a highly available system utilizing multi-cloud and multi-region strategies (from CDN to DB) to guarantee the reliability of Incident Lake.
・Full-Stack Ownership: Drive the entire product lifecycle, from high-level architecture to hands-on implementation and proactive day-to-day improvements.
・Pragmatic AI/ML Integration: Evaluate and implement the most effective algorithmic approaches. Lead technical discussions to decide between classical ML and LLM-based solutions based on performance, cost, and reliability.
・Operational Excellence: Own the stability of the platform by managing on-call rotations and implementing long-term architectural fixes to prevent systemic issues.
・Cross-functional Leadership: Collaborate with SREs and stakeholders to ensure the platform meets the highest standards of the SIGQ infrastructure.
Requirements
・Education: Bachelor’s degree or higher in Computer Science, Information Engineering, or a related technical field (required for foundational knowledge of algorithms, data structures, and system design).
・System Design: 3+ years of experience building large-scale backend systems, with a focus on multi-cloud (GCP/AWS) and multi-region architecture.
・Infrastructure Mastery: Deep understanding of the full stack, including CDN configuration, load balancing, and distributed database management.
・Analytical Approach: Ability to discuss and select technologies (ML vs. LLM) based on mathematical and engineering logic rather than trends.
・Observability: Strong knowledge of modern infrastructure and observability technologies (Kubernetes, Terraform, Datadog).
・Data Modeling: Expert knowledge of RDBMS/NoSQL design and query optimization for high-load environments.
・Testing & Quality: Experience in comprehensive software testing (Unit, Functional, E2E).
・Soft Skills: Strong communication skills to collaborate with stakeholders.
Language Requirements
・English: Independent (CEFR - B2) — Required
・Japanese: Independent (N2 or higher) — Optional
Preferred Experience
・Language Stack: Experience designing and operating APIs in TypeScript (Node.js) and Go.
・Advanced Tech: Experience with agentic systems or MCP (Model Context Protocol) servers and tools.
・Incident Response: Direct experience in on-call support and managing high-priority production incidents.
・Methodology: Experience with Agile development and third-party API integrations.
We are looking for
・Business-Driven Engineering: You recognize that technology is a powerful tool to drive the business forward, not an end in itself. You possess the ability to maintain a bird's-eye view of the entire organization and make technical decisions that provide the highest strategic value.
・Proactive & Flat Proposals: Even in the absence of established processes or frameworks, you can independently propose solutions from both a business and engineering perspective. You engage in "flat" discussions, evaluating ideas based on their merit and impact rather than hierarchy or trends.
・The Architect-Operator: You are not just a coder; you are an architect who takes pride in the operational health of the systems you build. You think in terms of "five nines" (99.999%) availability and are always looking for ways to make the system more robust against regional outages.
・Scientific Rigor: You approach problem-solving with the rigor of a computer scientist, preferring evidence-based technical selection (from classical ML to LLMs) over hype.
How to apply
Please apply via this website.