DATA ENGINEER We discuss what HPC is and the motivation for using it, and then we choose the world of Slurm to test the...
Data processing with Dataflow SQL (part 2/2)
DATA ENGINEER In the previous blog post we had a short introduction to Dataflow SQL and Apache Beam in general, which will be great...
GCP pipeline: pub/sub-lookup-storage (part 1/2)
DATA ENGINEERS Description of a data pipeline with simple lookup logic implemented in Google Cloud Platform. This blog post will...
Kerberos – Installation guide, Integration with Apache Kafka and CDH (part 3/3)
DATA ENGINEER Welcome to the last part of our Kerberos series, where we will integrate the Kerberos authentication mechanism into...
Kerberos – Installation guide, Integration with Apache Kafka and CDH (part 2/3)
DATA ENGINEER We will show you how to integrate Kerberos with probably the most popular data streaming platform on the...
Write error logs from Composer and create an alert policy on Stackdriver
DATA ENGINEERS Do you want to create a custom alert on Stackdriver for one of your DAGs in Composer Airflow? You have heard about...
Kerberos – Installation guide, Integration with Apache Kafka and CDH (part 1/3)
DATA ENGINEER This is the first part of the Kerberos series, where you will install a Kerberos server, a Kerberos client, and...
Streaming data from Twitter to GCP
DATA ENGINEER This blog post demonstrates how to ingest data from a remote API (i.e. Twitter's API) to the cloud (i.e. Google...
Azure Databricks (part 2/2)
DATA ENGINEER AZURE DATABRICKS Azure Databricks is a fast, easy and collaborative Apache Spark-based analytics platform optimized...