Role Description:
We are looking for a skilled DevOps Engineer who will set up and maintain CI/CD pipelines, manage cloud environments on Google Cloud Platform (GKE or Cloud Run), implement monitoring and logging solutions, and ensure that the platform remains secure, scalable, and reliable. The ideal candidate is proactive, detail-oriented, and experienced in building robust cloud-based environments.
Key Responsibilities:
Infrastructure & Environment Management
Set up and maintain GCP environments (BigQuery, Cloud Composer, Vertex AI, storage, networking).
Manage IAM roles, service accounts, and security policies for data access.
Provision and optimize infrastructure using infrastructure-as-code tools (e.g., Terraform or Deployment Manager).
CI/CD & Automation
Build and maintain CI/CD pipelines for Python scripts, orchestration workflows, and Dataform/dbt jobs.
Automate the deployment of ETL pipelines and data models from notebooks to production.
Ensure reproducibility and scalability of data pipelines.
Monitoring & Reliability
Set up logging, monitoring, and alerting (e.g., with Cloud Logging and Cloud Monitoring, formerly Stackdriver).
Track pipeline health, failures, and performance.
Implement data quality checks and alerts for anomalies.
Performance & Cost Optimization
Optimize BigQuery queries and storage costs.
Manage resource utilization (compute, storage, API calls).
Proactively tune workflows for reliability and efficiency.
Collaboration & Support
Partner with data engineers to orchestrate pipelines and support ingestion processes.
Support ML engineers by ensuring infrastructure is ready for model training and deployment.
Provide tooling and documentation that keep environments consistent and easy to use for our small team.
Key Skills & Qualifications:
Strong experience with Google Cloud Platform (GCP), including GKE and Cloud Run.
Proficiency in Kubernetes and container orchestration.
Hands-on experience with Terraform or other Infrastructure as Code tools.
Experience with CI/CD tools such as Jenkins, GitLab CI, or similar.
Proficiency in scripting languages (e.g., Bash, Python).
Solid understanding of monitoring and logging frameworks (e.g., Prometheus, Grafana, ELK, or Stackdriver).
Knowledge of security best practices in cloud environments.
Strong problem-solving skills and the ability to work in a fast-paced environment.