
Cloudera cluster on Alibaba Cloud

Code / 21.12.2020


A Cloudera cluster can run on its own infrastructure or within a cloud environment. In this blog post, Cloud Infrastructure Engineer Karlo Kričkić explains how to install one in the Alibaba Cloud environment in a few straightforward steps.

Cloudera Enterprise is a modern platform for machine learning and analytics, optimized for the cloud to be: 

  • Unified - brings your data warehouse, data science, data engineering, and operational database workloads together on a single integrated platform

  • Hybrid - the most popular data warehouse and machine learning engines that can run on any compute resource for ultimate deployment flexibility

  • Enterprise-grade - the scale and performance required for today’s modern data workloads meets the security and governance demanded by today’s IT departments

Cloudera Enterprise provides the following solutions:

  • Data Warehouse

  • Data Science

  • Data Engineering

  • Operational Database

  • Run Everything in the Cloud, Multi-Cloud, or on a Hybrid "Cloud / On-Premises" Deployment

Task Overview

Cloudera works on the principle of master and worker nodes. The installation consists of several steps:

  • Creating the network
  • Creating the virtual machines
  • Preparing the OS (in this example, CentOS)
  • Installing Cloudera Manager
  • Installing the Cloudera cluster 

Creating a network

By default, Alibaba Cloud requires a virtual private network (VPC) so that the machines can communicate with each other and be reachable from the public Internet. The network is created through the Alibaba Cloud console itself.

Alibaba Cloud network setup

Go to the Virtual Private Cloud console and choose Create VPC.


In the Create VPC dialog, enter:

  • Name: Cloudera
  • IP Range: 192.168.0.0/16 (default CIDR block)
  • VSwitch: Cloudera
  • Zone: Frankfurt Zone A

Click OK to finish creating the private network.
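The same network can also be created from the command line with the Alibaba Cloud CLI. The sketch below is our own addition, not part of the original walkthrough: it assumes the aliyun CLI is installed and configured, that Frankfurt Zone A corresponds to region eu-central-1 and zone eu-central-1a, and it picks 192.168.0.0/24 for the vSwitch, a range the steps above do not specify. By default it only prints the commands.

```bash
#!/usr/bin/env bash
# Sketch: create the "Cloudera" VPC and vSwitch with the Alibaba Cloud CLI.
# Assumptions (not from the article): region eu-central-1 (Frankfurt),
# zone eu-central-1a, and vSwitch CIDR 192.168.0.0/24.
set -euo pipefail

REGION=${REGION:-eu-central-1}
ZONE=${ZONE:-eu-central-1a}
DRY_RUN=${DRY_RUN:-1}   # 1 = only print the commands, 0 = execute them

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Create the VPC with the default 192.168.0.0/16 block.
create_vpc() {
  run aliyun vpc CreateVpc --RegionId "$REGION" \
    --CidrBlock 192.168.0.0/16 --VpcName Cloudera
}

# Create the vSwitch inside the VPC; pass the VpcId returned by CreateVpc.
create_vswitch() {
  local vpc_id=$1
  run aliyun vpc CreateVSwitch --RegionId "$REGION" --ZoneId "$ZONE" \
    --VpcId "$vpc_id" --CidrBlock 192.168.0.0/24 --VSwitchName Cloudera
}
```

Run create_vpc first, then pass the VpcId from its JSON response to create_vswitch; set DRY_RUN=0 to execute the calls instead of printing them.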

Alicloud Virtual Machines

The next step is to create the virtual machines in Alibaba Cloud: two management nodes and five worker nodes.

Basic Configuration


Region: Germany(Frankfurt) – Zone A
Type: ecs.sn2.medium (General Purpose Type sn2, 2 vCPU, 8 GB RAM)
Public image: CentOS 7 – 7.7 64-bit; Security Enhancement
Storage: Standard SSD 80 GB; Release with Instance

Networking


Network: use the existing network you set up in the first step:

  • VPC: Cloudera
  • Assign Public IP Address
  • Bandwidth Billing: Pay-By-Traffic
  • Peak Bandwidth: 20 Mbps
  • Security Group: auto-selected by VPC
  • Elastic Network Interface: VSwitch Cloudera

System Configuration


Logon Credentials: Password
Logon Password:
Instance Name: clouderam01 or clouderaw01 (increment the number for each node: clouderam01-02 for masters, clouderaw01-05 for workers)
Host: same as the instance name

Preview


Click Create Instance, and repeat this procedure for every master and worker node in the cluster.
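Creating seven near-identical instances by hand is repetitive, so a scripted loop can help. This is a sketch under assumptions: the image ID, vSwitch ID and security group ID are placeholders you have to fill in, "Standard SSD" is assumed to map to the cloud_ssd disk category, and the logon password is deliberately omitted, so it has to be set or reset separately. Commands are printed unless DRY_RUN=0.

```bash
#!/usr/bin/env bash
# Sketch: create the seven ECS instances from this article with the Alibaba Cloud CLI.
# Assumptions: IMAGE_ID, VSWITCH_ID and SG_ID are placeholders you must fill in;
# "Standard SSD" is mapped to the cloud_ssd disk category.
# The logon password is intentionally left out of this sketch.
set -euo pipefail

REGION=${REGION:-eu-central-1}
IMAGE_ID=${IMAGE_ID:-REPLACE_WITH_CENTOS_7_7_IMAGE_ID}
VSWITCH_ID=${VSWITCH_ID:-REPLACE_WITH_VSWITCH_ID}
SG_ID=${SG_ID:-REPLACE_WITH_SECURITY_GROUP_ID}
DRY_RUN=${DRY_RUN:-1}   # 1 = only print the commands, 0 = execute them

# Two master (management) nodes and five worker nodes.
NODES=(clouderam0{1..2} clouderaw0{1..5})

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# One RunInstances call per node; a bandwidth above 0 assigns a public IP.
create_instances() {
  local node
  for node in "${NODES[@]}"; do
    run aliyun ecs RunInstances --RegionId "$REGION" \
      --ImageId "$IMAGE_ID" --InstanceType ecs.sn2.medium \
      --VSwitchId "$VSWITCH_ID" --SecurityGroupId "$SG_ID" \
      --InstanceName "$node" --HostName "$node" \
      --InternetChargeType PayByTraffic --InternetMaxBandwidthOut 20 \
      --SystemDisk.Category cloud_ssd --SystemDisk.Size 80
  done
}
```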

CentOS preparation on Alibaba Cloud

Connect to each server using the built-in web console or your favorite SSH client, and create the admin user:

       adduser admin

Use the passwd command to update the new user’s password. 

       passwd admin

Set and confirm the new user’s password at the prompt. 

The password prompts look like this:
Changing password for user admin.
New password:
Retype new password:
passwd: all authentication tokens updated successfully.

Use the usermod command to add the user to the wheel group. 

usermod -aG wheel admin

By default, on CentOS, members of the wheel group have sudo privileges.

Use the su command to switch to the new user account.

su - admin
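Before relying on sudo, it can be worth confirming that the usermod step took effect. A minimal check, assuming the id command from coreutils; in_group is a helper name we made up for this sketch:

```bash
#!/usr/bin/env bash
# Sketch: check that a user is a member of a group (e.g. admin in wheel).
set -euo pipefail

# Succeeds if user $1 belongs to group $2 (primary or supplementary).
in_group() {
  local user=$1 group=$2
  # id -nG lists group names separated by spaces; match one name exactly.
  id -nG "$user" | tr ' ' '\n' | grep -x "$group" >/dev/null
}
```

For example, in_group admin wheel && echo "admin is in wheel" confirms the membership that gives admin sudo rights on CentOS.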

Step 1 - Remove the sudo password prompt for user admin:

sudo visudo

Add this line to the end of the file:

admin ALL=(ALL) NOPASSWD: ALL
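Instead of appending to /etc/sudoers itself, the same rule can live in a drop-in file under /etc/sudoers.d, which the stock CentOS 7 sudoers file includes. This is an alternative we are suggesting, not the step used above; the sketch assumes it runs as root and takes the directory as a parameter so it can be tried on a scratch directory first.

```bash
#!/usr/bin/env bash
# Sketch: grant passwordless sudo to a user through a sudoers drop-in file.
# Alternative to appending the line to /etc/sudoers by hand.
set -euo pipefail

write_sudoers_dropin() {
  local user=$1 dir=${2:-/etc/sudoers.d}
  local file="$dir/$user"
  # Same rule as above: all commands, no password prompt.
  printf '%s ALL=(ALL) NOPASSWD: ALL\n' "$user" > "$file"
  # sudo expects included files to be read-only (mode 0440).
  chmod 0440 "$file"
  echo "$file"
}
```

As root, run write_sudoers_dropin admin, then validate the result with visudo -cf /etc/sudoers.d/admin before logging out.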

Step 2 - Disable the firewall (if enabled):

sudo systemctl disable firewalld 
sudo systemctl stop firewalld

Step 3 - Disable remote root login:

sudo vi /etc/ssh/sshd_config

Set the following option (uncomment the line if necessary):

PermitRootLogin no

Then restart the SSH service:

sudo systemctl restart sshd.service
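The sshd_config edit can also be scripted so it is repeatable on all seven hosts. A minimal sketch, assuming GNU sed (as on CentOS) and root privileges when pointed at the real /etc/ssh/sshd_config; disable_root_login is a hypothetical helper name:

```bash
#!/usr/bin/env bash
# Sketch: set "PermitRootLogin no" in an sshd_config-style file, idempotently.
set -euo pipefail

disable_root_login() {
  local cfg=${1:-/etc/ssh/sshd_config}
  # Rewrite an existing (possibly commented-out) PermitRootLogin line...
  sed -i -E 's/^#?[[:space:]]*PermitRootLogin[[:space:]].*/PermitRootLogin no/' "$cfg"
  # ...or append one if the file had none.
  grep -qE '^PermitRootLogin[[:space:]]' "$cfg" || echo 'PermitRootLogin no' >> "$cfg"
}
```

Afterwards, sshd -t checks the configuration for errors before you restart the service. Make sure you can log in as admin before disabling root login.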

Step 4 - Prepare host name lookup

Add each host's IP address followed by its FQDN (fully qualified domain name) and short name. Make sure the FQDN comes first after the IP address; otherwise Cloudera Manager picks up the short name during installation, which results in a failed installation:

sudo vi /etc/hosts 

Add these lines to the file:  

192.168.0.118 clouderaw05 clouderaw05
192.168.0.117 clouderaw04 clouderaw04
192.168.0.116 clouderaw03 clouderaw03
192.168.0.115 clouderaw02 clouderaw02
192.168.0.114 clouderaw01 clouderaw01
192.168.0.113 clouderam02 clouderam02
192.168.0.112 clouderam01 clouderam01

Comment out the default localhost entries (IPv6 and IPv4) on all hosts:

#::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
#127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
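Since the same entries go into /etc/hosts on every node, they can be generated rather than typed, which avoids the typos that break the installation later. The sketch hard-codes the name-to-IP mapping listed above; DOMAIN is a hypothetical placeholder for your own domain, and when it is empty the short name is repeated, exactly as in the entries above.

```bash
#!/usr/bin/env bash
# Sketch: generate the /etc/hosts entries for the cluster, FQDN first.
# DOMAIN is a hypothetical placeholder; empty DOMAIN repeats the short name.
set -euo pipefail

DOMAIN=${DOMAIN:-}

# Hosts in the order of their IPs: 192.168.0.112 .. 192.168.0.118.
HOSTS=(clouderam01 clouderam02 clouderaw01 clouderaw02 clouderaw03 clouderaw04 clouderaw05)
START=112   # last octet of clouderam01's address

hosts_entries() {
  local i fqdn
  for i in "${!HOSTS[@]}"; do
    # Append ".$DOMAIN" only when DOMAIN is non-empty.
    fqdn=${HOSTS[i]}${DOMAIN:+.$DOMAIN}
    printf '192.168.0.%d %s %s\n' "$((START + i))" "$fqdn" "${HOSTS[i]}"
  done
}
```

For example, DOMAIN=example.internal hosts_entries | sudo tee -a /etc/hosts appends the entries with the FQDN first (example.internal is only a placeholder).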

Step 5 - Set up passwordless login from ‘clouderam01’ to all hosts:

First, create a public/private key pair on the main Cloudera Manager server ‘clouderam01’ (accept the defaults when prompted):

cd /home/admin 
ssh-keygen 
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys 
chmod 700 ~/.ssh
chmod 600 ~/.ssh/*

Now copy the public key to all servers in the cluster:

scp /home/admin/.ssh/id_rsa.pub admin@clouderam01:id_rsa.pub
scp /home/admin/.ssh/id_rsa.pub admin@clouderam02:id_rsa.pub
scp /home/admin/.ssh/id_rsa.pub admin@clouderaw01:id_rsa.pub
scp /home/admin/.ssh/id_rsa.pub admin@clouderaw02:id_rsa.pub
scp /home/admin/.ssh/id_rsa.pub admin@clouderaw03:id_rsa.pub
scp /home/admin/.ssh/id_rsa.pub admin@clouderaw04:id_rsa.pub
scp /home/admin/.ssh/id_rsa.pub admin@clouderaw05:id_rsa.pub

On each server, append this public key to authorized_keys:

mkdir -p ~/.ssh
cat id_rsa.pub >> ~/.ssh/authorized_keys 
chmod 700 ~/.ssh 
chmod 600 ~/.ssh/* 
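The copy-and-append steps above can be replaced by ssh-copy-id from OpenSSH, which appends the key to the remote authorized_keys file (creating ~/.ssh if needed). This is a substitute technique, not the one used in this walkthrough; the sketch assumes the key pair was generated with the defaults (~/.ssh/id_rsa) and only prints the commands unless DRY_RUN=0.

```bash
#!/usr/bin/env bash
# Sketch: distribute clouderam01's public key with ssh-copy-id.
set -euo pipefail

KEY=${KEY:-$HOME/.ssh/id_rsa.pub}
DRY_RUN=${DRY_RUN:-1}   # 1 = only print the commands, 0 = execute them
NODES=(clouderam0{1..2} clouderaw0{1..5})

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

copy_keys() {
  local node
  for node in "${NODES[@]}"; do
    # Prompts once for admin's password on each node.
    run ssh-copy-id -i "$KEY" "admin@$node"
  done
}
```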

Verify that you can SSH from clouderam01 to every host, including clouderam01 itself, without a password:

ssh clouderam01
ssh clouderam02
ssh clouderaw01
ssh clouderaw02
ssh clouderaw03
ssh clouderaw04
ssh clouderaw05
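Logging in to each host by hand works, but a loop gives a quick pass/fail summary. The sketch relies on ssh's BatchMode option, which makes ssh fail instead of prompting for a password; run it after the manual logins above, because BatchMode also fails on host keys that are not yet in known_hosts. check_passwordless is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Sketch: verify passwordless SSH from clouderam01 to every node.
set -euo pipefail

NODES=(clouderam0{1..2} clouderaw0{1..5})

# Prints OK/FAIL per node; returns non-zero if any node failed.
check_passwordless() {
  local node failed=0
  for node in "${NODES[@]}"; do
    # -n: don't read stdin; BatchMode=yes: fail instead of asking for a password.
    if ssh -n -o BatchMode=yes -o ConnectTimeout=5 "$node" true; then
      echo "OK   $node"
    else
      echo "FAIL $node"
      failed=1
    fi
  done
  return "$failed"
}
```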

Cloudera Manager setup

Now that the VMs are properly configured, it's time to install Cloudera Manager on our main node 'clouderam01'.

Connect to the node and download the installer:  

wget https://archive.cloudera.com/cm6/6.3.1/cloudera-manager-installer.bin

Next, make the installer executable:

chmod u+x cloudera-manager-installer.bin

Finally, run the Cloudera Manager Server installer:

sudo ./cloudera-manager-installer.bin

The installation is fairly simple (next – next – next – accept – accept - finish). 

After the installation, the Cloudera Manager service should be up and running within a couple of minutes. Before we can access it, however, we need to open port 7180, which the Cloudera Manager web UI listens on.
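To know when Cloudera Manager has finished starting, you can poll its web UI port. The sketch below uses bash's /dev/tcp redirection, so it needs bash rather than sh, and it is meant to run on clouderam01 itself; against a remote, firewalled port a single connection attempt may hang until the TCP timeout. wait_for_port is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Sketch: wait until a TCP port accepts connections (e.g. Cloudera Manager on 7180).
# Uses bash's /dev/tcp redirection, so it needs bash (not sh).
set -euo pipefail

# wait_for_port HOST PORT TIMEOUT_SECONDS -> 0 once the port is open, 1 on timeout.
wait_for_port() {
  local host=$1 port=$2 timeout=$3 waited=0
  # The subshell opens (and closes) a TCP connection; it fails if nothing listens.
  until (exec 3<>"/dev/tcp/$host/$port") 2>/dev/null; do
    if [ "$waited" -ge "$timeout" ]; then
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
}
```

For example, on clouderam01: wait_for_port 127.0.0.1 7180 600 && echo "Cloudera Manager is up". We use 127.0.0.1 because the localhost entries were commented out of /etc/hosts. The service itself can be inspected with sudo systemctl status cloudera-scm-server.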

Cloudera manager setup

Go to Elastic Compute Service and, in the right-hand menu, select Network & Security > Security Groups, then the security group of the Cloudera VPC.

Click Add Security Group Rule and add an inbound rule for port 7180.
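The same security group rule can be added with the Alibaba Cloud CLI. A sketch under assumptions: the security group ID is a placeholder, the region is assumed to be eu-central-1, and the source range defaults to a placeholder for your own public IP, since exposing port 7180 to 0.0.0.0/0 would make the default admin/admin login reachable from anywhere.

```bash
#!/usr/bin/env bash
# Sketch: open TCP 7180 (Cloudera Manager web UI) in the security group via the CLI.
# Assumptions: SG_ID and SOURCE_CIDR are placeholders; narrow SOURCE_CIDR to your IP.
set -euo pipefail

REGION=${REGION:-eu-central-1}
SG_ID=${SG_ID:-REPLACE_WITH_SECURITY_GROUP_ID}
SOURCE_CIDR=${SOURCE_CIDR:-REPLACE_WITH_YOUR_PUBLIC_IP/32}
DRY_RUN=${DRY_RUN:-1}   # 1 = only print the command, 0 = execute it

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

open_cm_port() {
  run aliyun ecs AuthorizeSecurityGroup --RegionId "$REGION" \
    --SecurityGroupId "$SG_ID" --IpProtocol tcp --PortRange 7180/7180 \
    --SourceCidrIp "$SOURCE_CIDR"
}
```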

Add security group rule

Cloudera cluster setup

Open your browser and enter the address of your Cloudera main node:

http://ip_address:7180/

If everything up to this point was done correctly, you should get a login screen:  

Login screen

Log in with the default username/password: admin/admin.

Accept the license agreement and proceed to choosing which edition to deploy.

Choose the 'free' edition.

Cloudera Manager

Click ‘Continue’ on the next screen and you’ll reach the host specification step of the cluster installation.

To specify our hosts, enter the following: 

clouderam[01-02]
clouderaw[01-05]

Specify hosts for CDH cluster installation

This will now search for all ‘clouderam’ hosts that end with numbers from 01 to 02 and all ‘clouderaw’ hosts that end with numbers from 01 to 05. That would be all our VMs. 
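To make the pattern concrete, the following illustration expands a zero-padded range the same way. It only mimics what the pattern stands for and is not Cloudera Manager's own parser; expand_range is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Illustration: list the host names a pattern like clouderaw[01-05] stands for.
# This mimics the zero-padded range; it is not Cloudera Manager's parser.
set -euo pipefail

expand_range() {
  local prefix=$1 first=$2 last=$3
  local width=${#first} i
  # 10# forces base 10, so "08" or "09" are not read as octal.
  for ((i = 10#$first; i <= 10#$last; i++)); do
    printf "%s%0${width}d\n" "$prefix" "$i"
  done
}
```

For example, expand_range clouderam 01 02 and expand_range clouderaw 01 05 together list the seven VMs.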

Click ‘Search’. 

If done correctly, the search will find our 7 hosts: 

Found hosts for CDH cluster installation

Click ‘Continue’. 

On the next screen, leave everything as is, except under ‘Additional Parcels’, where we chose KAFKA:

Cluster installation

Click ‘Continue’. 

On the next screen, select the checkbox ‘Install Oracle Java SE Development Kit (JDK)’ and click ‘Continue’: 

JDK kit

Do not enable Single User Mode on the next screen. Just click ‘Continue’.  

We log in as the ‘admin’ user, so enter its login details on the next screen:

Admin setup

Continue. 

Agent installation will start. If everything up to this point was done correctly, all bars should be green by the end with text ‘Installation completed successfully’: 

Agent installation

Side note: a common error occurs here if /etc/hosts is not configured properly on the VMs. The agents cannot send heartbeats to Cloudera Manager, and the installation fails, but only after everything has already been installed.

To fix this, check that every entry in your /etc/hosts file was typed correctly.
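Because this failure only shows up late, it pays to check /etc/hosts on each node beforehand. A minimal sketch, assuming a standard hosts-file layout (IP address, then names): it checks that the name you intend as the FQDN is the first one listed for that IP. Pass the intended FQDN explicitly; comparing against hostname -f would be circular, because hostname -f typically reports the first name listed in /etc/hosts. canonical_name and check_host_entry are hypothetical helper names.

```bash
#!/usr/bin/env bash
# Sketch: check that a host's FQDN is the first name after its IP in a hosts file.
set -euo pipefail

# Print the first (canonical) name listed for IP $2 in hosts file $1.
# Commented-out lines never match because their first field starts with "#".
canonical_name() {
  awk -v ip="$2" '$1 == ip { print $2; exit }' "$1"
}

# Succeeds if FQDN $3 is the canonical name for IP $2 in hosts file $1.
check_host_entry() {
  local file=$1 ip=$2 fqdn=$3
  [ "$(canonical_name "$file" "$ip")" = "$fqdn" ]
}
```

For example, on clouderam01: check_host_entry /etc/hosts 192.168.0.112 clouderam01 || echo "check /etc/hosts". Replace clouderam01 with the FQDN if you use a domain.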

Click ‘Continue’ and wait for selected parcels to be installed:  

Cluster installation

Continue when finished. 

At the end, you will see the Cluster Installation Validations and Summary.

There should be no errors, at most a few warnings.

Click ‘Finish’. 

Cloudera custom services

Select ‘Custom Services’ when asked which combination of services to install. A new menu will appear. Select every service type except Isilon, Key-Value Store Indexer and Solr (this was our choice; select according to your needs):

Custom Services

Continue. 

Now it’s time to Customize Role Assignments. 

Here we assign which hosts will be management nodes (NameNode) and which will be workers (DataNode). In our case, clouderam01 and clouderam02 host the primary and secondary NameNode, and the others are DataNodes. Most of the services are split between the two NameNode hosts. Where no host was assigned (e.g. Kafka MirrorMaker), we left it empty.

The result looks like this: 

Cloud setup

Cloudera management services (2)

And in ‘View By Host’:

View by host

Click ‘Continue’ when finished. 

On the next screen, choose ‘Use Embedded Database’, click ‘Test Connection’, and then ‘Continue’.

Cluster Setup

You will arrive at the Review Changes screen.

By default, the Kudu directory paths are not set.

Kudu

Set the values as shown in the picture:

Kudu Master WAL Directory: data/kudu/master_wal
Kudu Tablet Server WAL Directory: data/kudu/tablet_wal
Kudu Master Data Directories: data1/kudu/master_wal
Kudu Tablet Server Data Directories: data1/kudu/tablet_data
     data2/kudu/tablet_data
     data3/kudu/tablet_data

Click ‘Continue’ and finish the setup.

Conclusion

In this blog post, we covered the installation of a Cloudera cluster on Alibaba Cloud and the preparation of the environment itself.

