Tentative Schedule: Monday - Friday, 8:00 a.m. - 5:00 p.m.
This position is responsible for designing, evaluating, and creating systems to support data science projects across the O'Reilly organization, as well as expanding and optimizing our data and data pipeline architecture. This includes data cleansing, preparation, and ETL. The ideal candidate will identify and work with the appropriate technology and software engineering solutions to facilitate machine learning and analytic pipeline deployment.
Essential Job Functions
• Move, structure, encode, and condense data from disparate database systems and formats.
• Identify, design, and implement internal process improvements such as automating manual processes, optimizing data delivery, and redesigning infrastructure for greater scalability.
• Create data tools that help analytics and data science team members build and optimize solutions, supporting the organization's goal of becoming an innovative industry leader.
• Evaluate performance of machine learning systems and work with data scientists to improve quality.
• Build processes supporting data transformation, data structures, metadata, dependency and workload management.
• Build analytics tools that use the data pipeline to provide actionable insights into customer acquisition, operational efficiency, and other key business performance metrics.
• Develop software solutions with a focus on maintainability and modularity.
Skills and Qualifications
• Bachelor's degree.
• 2+ years of practical experience with ETL, data processing, database programming, and data analytics.
• Strong knowledge of Python, including pandas, NumPy, and scikit-learn, and experience with notebooks such as Jupyter.
• Demonstrable knowledge of software design and engineering best practices.
• Experience working with large-scale distributed data systems.
• Excellent written and verbal communication skills.
• Desire to work in a dynamic and collaborative environment.