The Big Data Essentials training is designed to lay the foundation for implementing Big Data in your organization by clarifying the various technologies available in the Hadoop ecosystem. We will deliver an up-to-date overview of the quickly changing Big Data world.

The trainers for this course have received great feedback from attendees, which makes us confident that your organization and your employees will be satisfied with the proposed training.

Our goal is to demystify big data for your employees and for you. Through hands-on exercises on our cloud cluster, you will become more comfortable with big data tools and can focus on what matters most to you: how to solve business challenges using big data technologies.

After this training, your organization’s employees will have essential knowledge of the tools available in the Hadoop ecosystem, and we expect that new ideas for solving existing issues or implementing new functionality will arise.

BIG DATA ESSENTIALS AND STREAMING BASICS

Introduction to Big Data concepts and technologies. How to choose the best tool for the task.

Audience

This course builds a fundamental understanding of Big Data challenges and of the Hadoop technology stack as a solution. It is intended for Data Warehouse architects and developers, and anyone who wants to learn about Big Data – business analysts, data analysts and business consultants.

What Will You Learn?

You will learn the purpose of Hadoop and its related tools, and the features they provide for data acquisition, storage, transformation and analysis.

Course Info

Duration: 2 or 3 days

Prerequisites: Understanding of data value, data warehouse concepts and architecture.

Maximum number of participants per classroom: 12

Course Agenda

Day 1 (Big Data)

  • Data storage – HDFS
  • HDFS architecture
  • Distributed data processing (YARN, MapReduce, Spark)
  • Resource scheduler
  • Data manipulation and movement techniques
  • Streaming and batch scheduling
  • Data movement from source to target
  • Downstream integration from HDFS to RDBMS and local file systems
  • Data load with Sqoop and data management
  • Data analysis with Hive and Spark SQL (see the sketch after this list)
  • Which tool to use for a specific purpose
  • When to use HDFS features
  • When and how to use in-memory processing
  • Advantages and disadvantages of each technique
  • All topics grounded in real project experience
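
To give a feel for the data analysis topics above, here is a minimal PySpark sketch that runs a Spark SQL query against a Hive-managed table. It is a sketch only: the table and column names (web_logs, status, url) are hypothetical, not part of the course environment.

    # Minimal PySpark sketch: querying a Hive table with Spark SQL.
    # Assumes Spark is configured to reach the cluster's Hive metastore.
    from pyspark.sql import SparkSession

    # enableHiveSupport() lets Spark read tables registered in the Hive metastore
    spark = (SparkSession.builder
             .appName("hive-sparksql-demo")
             .enableHiveSupport()
             .getOrCreate())

    # A plain SQL query; Spark executes it as a distributed job on the cluster.
    # web_logs, status and url are hypothetical names used for illustration.
    errors_per_url = spark.sql("""
        SELECT url, COUNT(*) AS error_count
        FROM web_logs
        WHERE status >= 500
        GROUP BY url
        ORDER BY error_count DESC
        LIMIT 10
    """)
    errors_per_url.show()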

Day 2 (Big Data)

  • Offloading relational database to Hadoop
  • Airflow / Oozie – scheduling and monitoring workflows (see the sketch after this list)
  • Data format selection and its benefits
  • How to choose the best tool for a given task
  • Advantages of column-based formats
  • Criteria for tool selection
  • Use cases, experiences, best practices, Q&A
  • Use Case identification
  • Use Case execution process through phases
  • Assessment of success
  • Landscape build-up
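
As a taste of the workflow scheduling topic, below is a minimal Apache Airflow sketch of a two-step daily workflow. The DAG id, task ids and the Python callables are hypothetical placeholders, not the course material itself.

    # Minimal Airflow sketch: a daily DAG with two dependent tasks.
    from datetime import datetime

    from airflow import DAG
    from airflow.operators.python import PythonOperator

    def extract():
        print("extract data from the source database")  # placeholder logic

    def load():
        print("load the extracted data into HDFS")  # placeholder logic

    with DAG(
        dag_id="offload_rdbms_to_hadoop",   # hypothetical name
        start_date=datetime(2024, 1, 1),
        schedule_interval="@daily",
        catchup=False,
    ) as dag:
        extract_task = PythonOperator(task_id="extract", python_callable=extract)
        load_task = PythonOperator(task_id="load", python_callable=load)

        # load runs only after extract succeeds
        extract_task >> load_task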

Day 3 (Streaming)

  • Apache Flume architecture
  • Streaming data from Twitter
  • Introduction to Apache Kafka
  • Apache Kafka Architecture
  • Apache Kafka development (see the producer sketch after this list)
  • Schema management in Kafka
  • Kafka Connect for data movement
  • Introduction to Kafka Streams for data processing
  • Basic Kafka Administration
  • Confluent Platform
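
To illustrate the Kafka development topic, here is a minimal producer sketch using the confluent-kafka Python client. The broker address, topic name and key are hypothetical placeholders.

    # Minimal Kafka producer sketch (confluent-kafka Python client).
    from confluent_kafka import Producer

    producer = Producer({"bootstrap.servers": "localhost:9092"})  # hypothetical broker

    def delivery_report(err, msg):
        # Called once the broker acknowledges (or rejects) the message.
        if err is not None:
            print(f"delivery failed: {err}")
        else:
            print(f"delivered to {msg.topic()} [partition {msg.partition()}]")

    # Messages with the same key go to the same partition, preserving per-key order.
    producer.produce("events", key="user-42", value="page_view",
                     on_delivery=delivery_report)

    # flush() blocks until all buffered messages have been delivered.
    producer.flush()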

Coursework

The training is based on slides and hands-on exercises for attendees. Offloading a relational database to Hadoop and other hands-on tasks will be presented on a live, pre-built environment running on Microsoft Azure, with a relational database as a service and a Hadoop distribution cluster.

Our Trainers

Tomislav Domanovac – Tomislav has 20 years of experience in enterprise software solutions, and over 12 years of expertise in numerous DWH/BI projects as a solution architect, data architect, data engineer, senior consultant and R&D manager. He has vast business knowledge in the areas of Telecommunications, Retail and Distribution/Logistics. For over 10 years he has been designing and delivering training courses in the Data Warehousing and Big Data technology areas, always focused on the best application of technology as a solution to business challenges.

Josip Tokmacic – Josip has 13 years of experience in enterprise software solutions and streaming architectures, and has worked as a solution architect and senior consultant on numerous DWH/BI/Big Data projects. He has vast business knowledge in the Banking, Telco and Pharma industries.

This project was co-financed by the European Union’s Competitiveness and Cohesion Operational Programme. The content of the website is the sole responsibility of Syntio d.o.o.