JOB SUMMARY
Seeking a Data Engineer with Spark and streaming skills to build real-time, scalable data pipelines using tools like Spark, Kafka, and cloud services (GCP) to ingest, transform, and deliver data for analytics and ML.
Key Responsibilities
• Design, develop, and maintain ETL/ELT data pipelines for batch and real-time data ingestion, transformation, and loading using Spark (PySpark/Scala) and streaming technologies (Kafka, Flink).
• Build and optimize scalable data architectures, including data lakes, data warehouses (BigQuery), and streaming platforms.
• Optimize Spark jobs, SQL queries, and data processing workflows for speed, efficiency, and cost-effectiveness.
• Implement data quality checks, monitoring, and alerting systems to ensure data accuracy and consistency.
Required Qualifications
• Programming: Strong proficiency in Python and SQL; Scala or Java is a plus.
• Big Data: Expertise in Apache Spark (Spark SQL, DataFrames, Streaming).
• Streaming: Experience with messaging systems such as Apache Kafka or Pub/Sub.
• Cloud: Familiarity with GCP and Azure data services.
• Databases: Knowledge of data warehousing (Snowflake, Redshift) and NoSQL databases.
• Total IT Experience: Minimum 8 years.
• GCP Experience: 4+ years of recent GCP experience.
Preferred Qualifications
• Tools: Experience with Airflow, Databricks, Docker, Kubernetes.
Certifications