Posted on: 22 January

Big Data Engineer (PySpark/Spark) – REF 114 – 01

Company

Xogito

Xogito is a global B2B digital services provider specializing in data management, real-time bidding, e-commerce, and advertising technology solutions.

Remote Hiring Policy

Xogito supports a fully remote work environment and hires globally; applications are welcome from any region, with no location restrictions.

Job Type

Full-time

Allowed Applicant Locations

Worldwide (including Brazil)

Job Description

Purpose of the Role

We are seeking a skilled PySpark/Spark Developer to join our dynamic team and contribute to the design and implementation of data-driven solutions. You will be responsible for developing and optimizing distributed data processing pipelines, enabling large-scale data analytics, and ensuring the efficient handling of big data. If you are passionate about working with cutting-edge technologies in a fast-paced environment, this role is for you.


Duties and Responsibilities

  • Design, develop, and maintain data pipelines using PySpark and Apache Spark to process and transform large-scale datasets efficiently (a minimal pipeline sketch follows this list).
  • Collaborate with data scientists, analysts, and engineers to understand data requirements and translate them into scalable solutions.
  • Optimize Spark jobs for performance and scalability in distributed environments.
  • Build and deploy big data solutions in cloud environments (e.g., AWS, Azure, GCP) using services such as EMR or Databricks.
  • Implement solutions for real-time data streaming using Spark Streaming or similar frameworks.
  • Develop and maintain data models, ensuring data integrity and consistency.
  • Troubleshoot and debug issues in existing pipelines, ensuring high reliability and availability of systems.
  • Stay updated with the latest trends and advancements in the big data ecosystem.
  • Document technical solutions, data flows, and pipeline architecture to ensure knowledge sharing.

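To give a concrete sense of the day-to-day work behind these responsibilities, below is a minimal, hypothetical PySpark batch pipeline: read raw JSON, derive a daily aggregate, and write partitioned Parquet. The bucket paths, column names, and partition key are illustrative assumptions, not Xogito specifics.

```python
# Minimal sketch of a PySpark batch pipeline (illustrative only).
# Bucket paths, column names, and the partition key are hypothetical.
from pyspark.sql import SparkSession, functions as F

spark = (
    SparkSession.builder
    .appName("example-events-pipeline")
    .getOrCreate()
)

# Read raw JSON events from object storage (e.g., S3 on EMR).
events = spark.read.json("s3://example-bucket/raw/events/")

# Transform: derive an event date and count events per user per day.
daily_counts = (
    events
    .withColumn("event_date", F.to_date(F.col("event_ts")))
    .groupBy("user_id", "event_date")
    .agg(F.count("*").alias("event_count"))
)

# Write partitioned Parquet so downstream jobs can prune by date.
(
    daily_counts.write
    .mode("overwrite")
    .partitionBy("event_date")
    .parquet("s3://example-bucket/curated/daily_counts/")
)
```

Partitioning the output by date is a common design choice in pipelines like this: it keeps file sizes manageable and lets Spark skip irrelevant partitions at read time.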

Required Experience & Knowledge

  • 3+ years of experience working with Apache Spark and PySpark in production environments.
  • Proficiency in Python (for PySpark), with a strong understanding of data structures and algorithms.
  • Solid experience with distributed data processing frameworks and handling large datasets.
  • Familiarity with cloud services like AWS (e.g., S3, EMR, Glue), Azure (e.g., Databricks, Synapse), or GCP (e.g., Dataflow, BigQuery).
  • Experience with the Hadoop ecosystem (e.g., HDFS, Hive, or HBase).
  • Knowledge of real-time data processing frameworks such as Kafka or Spark Streaming (see the streaming sketch after this list).
  • Proficiency in working with structured and unstructured data formats such as JSON, Parquet, and Avro.
  • Understanding of data lake architectures, data partitioning, and schema evolution.
  • Hands-on experience with version control systems (e.g., Git) and CI/CD pipelines.

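For the real-time items above, here is a minimal sketch of Kafka ingestion with Spark Structured Streaming. The broker address, topic name, event schema, and paths are assumptions for illustration; running it also requires the spark-sql-kafka connector package on the classpath.

```python
# Minimal sketch of Kafka ingestion with Spark Structured Streaming.
# Broker, topic, schema, and paths are hypothetical; requires the
# spark-sql-kafka-0-10 connector package to be available.
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.types import StructType, StructField, StringType, LongType

spark = SparkSession.builder.appName("example-kafka-stream").getOrCreate()

# Expected shape of each JSON event on the topic (an assumption).
schema = StructType([
    StructField("user_id", StringType()),
    StructField("event_ts", LongType()),
])

# Subscribe to a Kafka topic; Kafka delivers the payload as bytes in `value`.
raw = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")
    .option("subscribe", "events")
    .load()
)

# Parse the JSON payload into typed columns.
parsed = raw.select(
    F.from_json(F.col("value").cast("string"), schema).alias("e")
).select("e.*")

# Sink micro-batches to Parquet, with checkpointing for fault tolerance.
query = (
    parsed.writeStream
    .format("parquet")
    .option("path", "s3://example-bucket/stream/events/")
    .option("checkpointLocation", "s3://example-bucket/checkpoints/events/")
    .start()
)
query.awaitTermination()
```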

Skills and Attributes

  • Strong analytical and problem-solving abilities, with attention to detail.
  • Excellent collaboration and communication skills to work in cross-functional teams.
  • Ability to adapt quickly to new technologies and a fast-paced work environment.
  • High level of ownership and accountability for deliverables.


Required Education & Qualifications

  • Bachelor’s or Master’s degree in Computer Science, Data Engineering, or a related field (or equivalent practical experience).
  • Advanced level of spoken and written English.
  • Relevant certifications in big data technologies, cloud platforms, or Spark are a plus.


Apply Here