BNY Mellon Careers

Principal Reliability Engineer - App Engine

Jersey City, New Jersey
Information Technology

Job Description

Technology Services Group's Application Engine is a strategic container-as-a-service platform that schedules and runs containerized and non-containerized applications on Linux and Windows across our data centers. Our bank's systems power nearly a quarter of the global economy, and our application runtime engine will power our private cloud and enable transparent usage of public clouds.


We're building our Application Runtime platform to enable highly available and resilient applications, an efficient developer workflow and an automated "no-ops" environment for our operators. Our team is responsible for the architecture, design, and implementation of a new, cutting-edge application runtime platform. Our team's skillset is broad, and includes engineers who consider themselves Developers, DevOps, Systems Engineers and Reliability Engineers. As a diverse platform team, we know how software is built, configured and deployed. We write highly available and resilient services, plugins and agents using Golang, Java and JavaScript. We configure, automate and run infrastructure and platform services; this includes Mesos and Docker clusters, a logging services platform based on ELK, Docker registries, monitoring using Prometheus and secrets services using Vault. We understand middleware and infrastructure and provide the tools and services that allow developers to run their applications. Additionally, our on-boarding and engagement team helps developers understand and use the platform.


Because we are diverse, we want a range of skillsets, from developer to Linux and reliability engineer. We are looking for engineers and leaders who are passionate about this space and want to work in a highly collaborative and technical environment, using innovative tooling and technologies, with a team that has years of experience building and operating mission-critical systems.


On this team, you will have the opportunity to:

  • Work with and become an expert on Docker, Linux and application orchestrators such as Nomad, Mesos and K8s.
  • Research, design, and implement software components powering our cloud platform.
  • Develop features in an agile environment where we quickly prototype and iterate on functionality.
  • Develop robust functionality in a complex, distributed-systems codebase.
  • Work extensively with open source software.  You may even modify or extend code maintained as part of an open source project.
  • Deploy and scale critical services and features that are used by thousands of developers and potentially impact millions of end users.
  • Employ both Object-Oriented development skills and Systems Engineering skills.
  • Code services and User Interfaces in Golang, Java, Groovy and JavaScript using frameworks such as Vert.x, Spring Boot and Angular.
  • Use forward-thinking tools such as Terraform, Salt and Puppet to automate on Linux and Windows.


As a Reliability Engineer, you use your diverse skillset to:

  • Ensure our platform is reliable and available.
  • Detect, investigate and resolve issues using Linux and container introspection and monitoring tooling.
  • Design and deliver software to improve the visibility, reliability, availability, scalability and security of the platform and its components.
  • Develop automation to auto-correct or completely prevent issues in our platform.
  • Build and manage systems, infrastructure, clusters and applications through data collection and automation.
  • Deploy, support and monitor new and existing services.
  • Design systems to enable rapid development, high availability, and clear observability.
  • Improve monitoring, alerting and documentation.
  • Engage with our software engineering teams on support issues and improvements to our tools, processes, and software.
  • Collaborate with engineers to ensure services are designed to be cloud-native, scalable, and easily operated.
  • Act as a conduit between infrastructure and development teams, being sympathetic to the concerns and priorities of both.
  • Participate in a regular shift and on-call rotation; this will include a weekend working schedule.


Principal Developer: Consults with internal business groups to provide high-level application software development services or technical support. Provides comprehensive senior-level technical consulting to IT management and senior technical staffs. Evaluates compliance with the organization's technology standards. Works with internal business groups on implementation opportunities, challenges, and requirements of various applications. Analyzes information and provides recommendations to address and resolve business issues for a specific business group. Guides and consults with IT management and technical staffs regarding use of emerging technologies and associated services. Participates in defining corporate implementation and integration strategies of new technologies. Advocates for innovative, creative technology solutions. Contributes to the achievement of area objectives.


  • Bachelor's degree in computer science, engineering or a related discipline, or equivalent work experience required.
  • Ten to twelve (10-12) years of experience in software development is required.
  • Experience in the securities or financial services industry is a plus.


Preferred Qualifications:

  • Strong teamwork, sense of ownership, customer service, and integrity demonstrated through clear communication.
  • Demonstrated ability to write programs using a high-level programming language such as Go, Java, Python or Ruby.
  • Experience managing large numbers of diverse systems with configuration management systems such as Puppet, Chef, Ansible, or Salt.
  • Good understanding of the Linux operating system, including kernel, memory, processes, threads and storage.
  • Understanding of standard networking protocols and components such as HTTP, DNS, TCP/IP, ICMP, the OSI model, subnetting and load balancing.
  • Demonstrated ability to troubleshoot errors across multiple layers/components.
  • Seven or more (7+) years of Linux system administration experience.
  • Experience automating code deployment through the use of containers.
  • Deep understanding of Linux networking (iptables, etc.), security (SELinux), storage and containers.
  • Excellent analytical and problem-solving skills.
  • Excellent interpersonal and communication skills, verbal and written.

For over 230 years, the people of BNY Mellon have been at the forefront of finance, expanding the financial markets while supporting investors throughout the investment lifecycle. BNY Mellon can act as a single point of contact for clients looking to create, trade, hold, manage, service, distribute or restructure investments, and safeguards nearly one-fifth of the world's financial assets. BNY Mellon remains one of the safest, most trusted and admired companies. Every day our employees make their mark by helping clients better manage and service their financial assets around the world. Whether providing financial services for institutions, corporations or individual investors, clients count on the people of BNY Mellon across time zones and in 35 countries and more than 100 markets. It's the collective ambition, innovative thinking and exceptionally focused client service paired with a commitment to doing what is right that continues to set us apart. Make your mark:

Client Technology Solutions provides our business partners with client-focused, technology-based solutions. These enhance their ability to be successful through world-class software solutions and leading-edge infrastructure. Client Technology Solutions provides employees with the tools and resources to enhance their professional qualifications and careers.

BNY Mellon is an Equal Employment Opportunity/Affirmative Action Employer.
Minorities/Females/Individuals With Disabilities/Protected Veterans.

Primary Location: United States-New Jersey-Jersey City
Internal Jobcode: 45198
Job: Information Technology
Organization: Technology Services Group-HR06725
Requisition Number: 1804933