Site Reliability Engineer (SRE) – Next-Link – Amsterdam

Next-Link

Job Description

We are seeking an experienced Site Reliability Engineer (SRE) with a deep understanding of service reliability and observability and a strong background in the ELK Stack (Elasticsearch, Logstash, Kibana) to join our team in Amsterdam. As an SRE, you will play a crucial role in ensuring the reliability, performance, and scalability of our systems. You will work closely with development teams to maintain high availability and reduce the impact of incidents, while driving automation and monitoring initiatives to improve service quality.

Responsibilities:

  • Service Reliability & Performance: Ensure the availability, reliability, and scalability of production systems and services.
  • Incident Management: Lead the investigation and resolution of incidents, focusing on reducing time-to-recovery (TTR) and preventing recurrence.
  • Monitoring & Observability: Develop and maintain comprehensive observability systems using the ELK Stack (Elasticsearch, Logstash, Kibana) to ensure the team has real-time insights into system health and performance.
  • Automation & Tooling: Develop automation scripts and tooling to reduce manual intervention, enhance operational efficiency, and improve system resilience.
  • Collaboration: Work closely with development teams to design and implement monitoring, logging, and alerting frameworks that ensure systems are production-ready.
  • Capacity Planning & Scaling: Monitor capacity and usage trends, and help with scaling services to meet demand while optimizing costs.
  • Continuous Improvement: Proactively identify opportunities to enhance the reliability of systems, optimize performance, and reduce technical debt.


Requirements

Required Skills and Experience:

  • 10+ years of experience in Site Reliability Engineering (SRE), DevOps, or a similar role with a focus on service reliability.
  • Expert knowledge of ELK Stack (Elasticsearch, Logstash, Kibana) for log aggregation, monitoring, and observability.
  • Proven experience with monitoring frameworks (e.g., Prometheus, Grafana, Nagios) and incident management tools.
  • Strong experience in cloud environments (AWS, GCP, Azure) and with infrastructure-as-code and orchestration tools (e.g., Terraform, Ansible, Kubernetes).
  • Proficiency in programming and scripting languages such as Python, Go, Bash, or Ruby.
  • Experience with CI/CD pipelines and automation tools to streamline operational tasks and deployments.
  • Strong understanding of distributed systems, microservices architectures, and containerization (e.g., Docker, Kubernetes).
  • Experience in performance tuning, capacity planning, and troubleshooting complex production environments.
  • Excellent understanding of security best practices in production environments.
  • Strong problem-solving and analytical skills, with the ability to approach complex problems with a methodical and solution-oriented mindset.
  • Excellent communication skills and ability to collaborate effectively with both technical and non-technical teams.
  • Familiarity with incident response practices and postmortem analysis.
