System Administrator (GCP/AWS/Azure, PySpark, BigQuery, and Apache Airflow)

Macpower Digital Assets Edge Private Limited

San Jose, CA

JOB DETAILS
SALARY
$60 per hour
SKILLS
Amazon Web Services (AWS), Ansible, Automation, Best Practices, Big Data, Cloud Computing, Cloud Storage, Command Line, Continuous Deployment/Delivery, Continuous Improvement, Continuous Integration, Cost Control, Data Analysis, Data Collection, Data Management, Data Processing, Data Quality, Data Recovery, Data Sets, Data Storage, Data Warehousing, Database Design, Database Programming, DevOps, Disaster Recovery, Ecosystems, Financial Transactions, GCP (Google Cloud Platform), High Availability, Identify Issues, Jenkins, Linux Operating System, Metrics, Microsoft Visual Studio, Microsoft Azure, Operational Support, Operations Management, People Management, Performance Tuning/Optimization, Python Programming/Scripting Language, Query Optimization, Red Hat Linux Operating System, Release Management/Engineering, Reporting Dashboards, Requirements Management, SAP, Service Level Agreement (SLA), Software Administration, Software Development, Software Patches, Systems Administration/Management, Team Lead/Manager, Time Management, Unix Shell Programming, Validation Testing
LOCATION
San Jose, CA
POSTED
26 days ago
Job Overview: This role involves managing and optimizing Big Data environments (PySpark, BigQuery, Airflow) across Google Cloud, AWS, or Azure platforms, ensuring efficient, secure, and cost-effective operations. Key responsibilities include 24x7 support, data pipeline optimization, automation, and troubleshooting, with a strong emphasis on DevOps, CI/CD, and disaster recovery.

Roles and Responsibilities (Google Cloud/AWS/Azure, PySpark, BigQuery, and Apache Airflow):
  • Participate in 24x7x365 rotational shift support and operations for SAP environments.
  • Serve as a team lead responsible for maintaining the upstream Big Data ecosystem, which processes millions of financial transactions daily using PySpark, BigQuery, Dataproc, and Apache Airflow.
  • Streamline and optimize existing Big Data systems and pipelines while developing new ones, ensuring efficient and cost-effective performance.
  • Manage the operations team during your designated shift and make necessary changes to the underlying infrastructure.
  • Provide day-to-day support, improve platform functionality using DevOps practices, and collaborate with development teams to enhance database operations.
  • Architect and optimize data warehouse solutions using BigQuery to enable efficient data storage and retrieval.
  • Install, build, patch, upgrade, and configure Big Data applications.
  • Administer and configure BigQuery environments, including datasets and tables.
  • Ensure data integrity, availability, and security on the BigQuery platform.
  • Implement partitioning and clustering strategies for optimized query performance.
  • Define and enforce access policies for BigQuery datasets.
  • Set up query usage caps and alerts to control costs and prevent overages.
  • Troubleshoot issues on Linux-based systems, drawing on strong command-line skills.
  • Create and maintain dashboards and reports to monitor key metrics such as cost and performance.
  • Integrate BigQuery with other GCP services like Dataflow, Pub/Sub, and Cloud Storage.
  • Enable BigQuery usage through tools such as Jupyter Notebook, Visual Studio Code, and CLI utilities.
  • Implement data quality checks and validation processes to maintain data accuracy.
  • Manage and monitor data pipelines using Airflow and CI/CD tools like Jenkins and Screwdriver.
  • Collaborate with data analysts and scientists to gather data requirements and translate them into technical implementations.
  • Provide guidance and support to application development teams for database design, deployment, and monitoring.
  • Demonstrate proficiency in Unix/Linux fundamentals, scripting in Shell/Perl/Python, and using Ansible for automation.
  • Contribute to disaster recovery planning and ensure high availability, including backup and restore operations.
  • Experience with geo-redundant databases and Red Hat clustering is a plus.
  • Ensure timely delivery within defined SLAs and project milestones, adhering to best practices for continuous improvement.
  • Coordinate with support teams, including the database (DB), Google, PySpark data engineering, and infrastructure teams.
  • Participate in Incident, Change, Release, and Problem Management processes.

Must Have Skills, Experience:
  • 4–8 years of relevant experience.
  • Strong experience with Big Data technologies, including PySpark, BigQuery, and Apache Airflow.
  • Hands-on expertise in cloud platforms (Google Cloud, AWS, or Azure) and Linux system troubleshooting.
  • Proficiency in automation and DevOps tools such as Shell/Python scripting, CI/CD processes, and Ansible.

About the Company

Macpower Digital Assets Edge Private Limited