Code / 21.12.2020

Cloudera cluster on Alibaba Cloud

A Cloudera cluster can be run on its own or within a cloud environment. In this blog, Cloud Infrastructure Engineer Karlo Kričkić will explain how you can install it within the Alibaba Cloud environment in several easy steps.

Cloudera Enterprise is a modern platform for machine learning and analytics, optimized for the cloud to be:

  • Unified - brings your data warehouse, data science, data engineering, and operational database workloads together on a single integrated platform

  • Hybrid - the most popular data warehouse and machine learning engines that can run on any compute resource for ultimate deployment flexibility

  • Enterprise-grade - the scale and performance required for today’s modern data workloads meets the security and governance demanded by today’s IT departments

Cloudera Enterprise provides the following solutions:

  • Data Warehouse

  • Data Science

  • Data Engineering

  • Operational Database

  • Run Everything in the Cloud, Multi-Cloud, or on a Hybrid "Cloud / On-Premises" Deployment

Task Overview

Cloudera works on the principle of master and worker nodes. The installation consists of several steps:

  • Creating the network
  • Creating the virtual machines
  • Preparation of OS (in this example we will use CentOS)
  • Installing Cloudera Manager
  • Installing the Cloudera cluster

Creating a network

Alibaba Cloud by default requires the creation of a virtual network so that machines can communicate with each other and be visible on the public Internet. Network creation takes place through the interface of Alibaba Cloud itself.

Go to the Virtual Private Cloud console and choose Create VPC.

In the create network dialog enter:

  • Name: Cloudera

  • IP Range: 192.168.0.0/16 (Default CIDR Block)

  • VSwitches: Cloudera

  • Frankfurt Zone A

Click OK and you are finished creating the private network.
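Every private address we assign to the nodes later has to fall inside this CIDR block. As a rough sanity check, the membership test can be sketched in plain shell arithmetic. The function names `ip_to_int` and `ip_in_cidr` are our own helpers, not part of any Alibaba Cloud tooling, and the sketch assumes IPv4 addresses without leading zeros.

```shell
#!/bin/sh
# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
  old_ifs=$IFS
  IFS=.
  set -- $1          # split the address into its four octets
  IFS=$old_ifs
  echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# Succeed (exit status 0) if address $1 lies inside CIDR block $2.
ip_in_cidr() {
  addr=$(ip_to_int "$1")
  base=$(ip_to_int "${2%/*}")   # network part before the slash
  bits=${2#*/}                  # prefix length after the slash
  mask=$(( (4294967295 << (32 - $bits)) & 4294967295 ))
  [ $(( addr & mask )) -eq $(( base & mask )) ]
}

# The last octets below are the ones used for the seven nodes in this post.
for last in 112 113 114 115 116 117 118; do
  ip_in_cidr "192.168.0.$last" 192.168.0.0/16 &&
    echo "192.168.0.$last is inside 192.168.0.0/16"
done
```

With a /16 block only the first two octets have to match, so any 192.168.x.y address passes and anything else is rejected.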

Alibaba Cloud Virtual Machines

The next step is to create the virtual machines in Alibaba Cloud: two management nodes and five worker nodes.

Basic Configuration

Region: Germany (Frankfurt), Zone A

Type: ecs.sn2.medium (General Purpose Type sn2; 2 vCPU, 8 GB RAM)

Public image: CentOS 7.7 64-bit; Security Enhancement

Storage: Standard SSD 80 GB; Release with Instance

Networking

Network: use the existing network that you set up in the first step:

  • Cloudera

  • Assign Public IP Address

  • Bandwidth Billing: Pay-By-Traffic

  • Peak Bandwidth: 20 Mbps

  • Security Group: auto-selected by VPC

  • Elastic Network Interface: VSwitch Cloudera

System Configuration

Logon credentials: Password

Logon Password: your chosen password

Instance Name: clouderam01 or clouderaw01

Host: clouderam01 or clouderaw01

Preview

Click Create Instance and repeat this procedure for every master or worker node in the cluster.

CentOS preparation

Connect to each server using the built-in web console or your favorite SSH client and create an admin user:

adduser admin

Use the passwd command to update the new user’s password.

passwd admin

Set and confirm the new user’s password at the prompt.

Set password prompts:
Changing password for user admin.
New password:
Retype new password:
passwd: all authentication tokens updated successfully.

Use the usermod command to add the user to the wheel group.

usermod -aG wheel admin

By default, on CentOS, members of the wheel group have sudo privileges.

Use the su command to switch to the new user account.

su - admin

Step 1 - Remove the sudo password for user admin:

sudo vi /etc/sudoers

Add this line to the end of the file:

admin ALL=(ALL) NOPASSWD: ALL
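Editing /etc/sudoers by hand is risky, because a syntax error can lock you out of sudo. A more cautious variant, sketched below, puts the same rule into a separate drop-in file and checks it with `visudo -cf` before installing it. The helper names and the target path /etc/sudoers.d/admin are our own assumptions; the sketch relies on the stock CentOS sudoers file including /etc/sudoers.d.

```shell
#!/bin/sh
# Print the sudoers rule that lets user $1 run any command without a password.
sudoers_rule() {
  printf '%s ALL=(ALL) NOPASSWD: ALL\n' "$1"
}

# Write the rule for user $1 to a temporary file, syntax-check it with
# visudo (when available), and only then install it as $2 with mode 0440.
install_sudoers_dropin() {
  tmp=$(mktemp) || return 1
  sudoers_rule "$1" > "$tmp"
  if command -v visudo >/dev/null 2>&1 && ! visudo -cf "$tmp" >/dev/null; then
    rm -f "$tmp"
    return 1
  fi
  install -m 0440 "$tmp" "$2"
  status=$?
  rm -f "$tmp"
  return "$status"
}

# Show the rule that would be written for the admin user.
sudoers_rule admin

# As root, the drop-in could then be installed with:
#   install_sudoers_dropin admin /etc/sudoers.d/admin
```

Keeping the rule in its own file also makes it easy to remove later without touching the main sudoers file.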

Step 2 - Disable the firewall (if enabled):

sudo systemctl disable firewalld 
sudo systemctl stop firewalld

Step 3 - Disable remote root login:

sudo vi /etc/ssh/sshd_config

Set the following option:

PermitRootLogin no

Then restart the SSH service:

sudo systemctl restart sshd.service
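With seven machines to prepare, the same change can be scripted instead of made in an editor. The sketch below is one possible way to do it: the function name `disable_root_login` is our own, and it assumes that any existing PermitRootLogin line, commented out or not, should be replaced by a single active setting.

```shell
#!/bin/sh
# Rewrite config file $1 so that it contains exactly one active
# "PermitRootLogin no" line. The first existing PermitRootLogin line
# (commented out or not) is replaced in place, later ones are dropped,
# and the setting is appended if the file had none.
disable_root_login() {
  conf=$1
  tmp=$(mktemp) || return 1
  awk '
    /^[ \t]*#?[ \t]*PermitRootLogin([ \t]|$)/ {
      if (!done) { print "PermitRootLogin no"; done = 1 }
      next
    }
    { print }
    END { if (!done) print "PermitRootLogin no" }
  ' "$conf" > "$tmp" || { rm -f "$tmp"; return 1; }
  cat "$tmp" > "$conf"    # overwrite contents, keep owner and permissions
  rm -f "$tmp"
}

# As root, on each node:
#   disable_root_login /etc/ssh/sshd_config
#   sshd -t && systemctl restart sshd.service
```

Running `sshd -t` first checks the configuration for errors, so a typo does not take the SSH service down on restart.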

Step 4 - Prepare hostname lookup

(Add the IP address followed by the FQDN (fully qualified domain name) and the short name. Make sure the FQDN comes first after the IP address; otherwise Cloudera Manager takes the short name during installation, which results in an unsuccessful installation.)

sudo vi /etc/hosts

Add these lines to the file:

192.168.0.118 clouderaw05 clouderaw05
192.168.0.117 clouderaw04 clouderaw04
192.168.0.116 clouderaw03 clouderaw03
192.168.0.115 clouderaw02 clouderaw02
192.168.0.114 clouderaw01 clouderaw01
192.168.0.113 clouderam02 clouderam02
192.168.0.112 clouderam01 clouderam01

Disable all IPv6 entries on all hosts.

#::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
#127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
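If your nodes do have a DNS domain, the "FQDN first, short name second" layout can be generated rather than typed by hand. The sketch below prints such entries for the address plan used in this post. The domain `example.internal` is purely hypothetical and `hosts_entries` is our own helper name, so substitute your real domain before using the output.

```shell
#!/bin/sh
# Print /etc/hosts lines in the form "IP FQDN short-name" for a run of hosts.
# $1 = host name prefix, $2 = number of hosts,
# $3 = last octet of the first host, $4 = DNS domain (hypothetical here).
hosts_entries() {
  i=1
  while [ "$i" -le "$2" ]; do
    name=$(printf '%s%02d' "$1" "$i")
    printf '192.168.0.%d %s.%s %s\n' $(( $3 + i - 1 )) "$name" "$4" "$name"
    i=$(( i + 1 ))
  done
}

# Two management nodes starting at .112, five workers starting at .114.
hosts_entries clouderam 2 112 example.internal
hosts_entries clouderaw 5 114 example.internal
```

The output can be reviewed and then appended to /etc/hosts on every node, which keeps the file identical across the cluster.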

Step 5 - Set up automatic login from ‘clouderam01’ to all hosts:

First, let’s create a public and private key pair on the main Cloudera Manager server ‘clouderam01’ (leave the defaults when asked):

cd /home/admin
ssh-keygen
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 700 ~/.ssh
chmod 600 ~/.ssh/*

Now let’s copy the public key to all included servers:

scp /home/admin/.ssh/id_rsa.pub admin@clouderam01:id_rsa.pub
scp /home/admin/.ssh/id_rsa.pub admin@clouderam02:id_rsa.pub
scp /home/admin/.ssh/id_rsa.pub admin@clouderaw01:id_rsa.pub
scp /home/admin/.ssh/id_rsa.pub admin@clouderaw02:id_rsa.pub
scp /home/admin/.ssh/id_rsa.pub admin@clouderaw03:id_rsa.pub
scp /home/admin/.ssh/id_rsa.pub admin@clouderaw04:id_rsa.pub
scp /home/admin/.ssh/id_rsa.pub admin@clouderaw05:id_rsa.pub

On each server, add this public key to authorized_keys:

mkdir -p ~/.ssh
cat id_rsa.pub >> ~/.ssh/authorized_keys
chmod 700 ~/.ssh
chmod 600 ~/.ssh/*
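The copy-and-append steps can also be handled by `ssh-copy-id`, which ships with OpenSSH and does both in one go. The sketch below only prints the commands so they can be reviewed first; the helper name `copy_key_commands` is ours, and the key path assumes the ssh-keygen defaults were accepted.

```shell
#!/bin/sh
# Print one ssh-copy-id command per host so the list can be reviewed first.
# $1 = remote user, remaining arguments = host names.
copy_key_commands() {
  user=$1
  shift
  for host in "$@"; do
    printf 'ssh-copy-id -i ~/.ssh/id_rsa.pub %s@%s\n' "$user" "$host"
  done
}

copy_key_commands admin clouderam01 clouderam02 \
  clouderaw01 clouderaw02 clouderaw03 clouderaw04 clouderaw05
```

Piping the output into `sh` on clouderam01 runs the commands; each host asks for the admin password once, after which key-based login should work.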

SSH from clouderam01 to all hosts must work without a password:

ssh clouderam01 (as well)
ssh clouderam02
ssh clouderaw01
ssh clouderaw02
ssh clouderaw03
ssh clouderaw04
ssh clouderaw05
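Instead of logging in to each host by hand, the check can be looped. The sketch below uses the OpenSSH options `BatchMode=yes`, which makes ssh fail rather than prompt for a password, and `ConnectTimeout`. The function name `check_passwordless` and the overridable `SSH` variable are our own additions.

```shell
#!/bin/sh
# Command used to connect; overridable so the function can be tried
# without real hosts.
SSH=${SSH:-ssh}

# Try a non-interactive login to every host given as an argument and
# report the result. Returns non-zero if at least one host failed.
check_passwordless() {
  failed=0
  for host in "$@"; do
    if "$SSH" -o BatchMode=yes -o ConnectTimeout=5 "$host" true >/dev/null 2>&1; then
      echo "OK   $host"
    else
      echo "FAIL $host"
      failed=1
    fi
  done
  return "$failed"
}

# On clouderam01, as the admin user:
#   check_passwordless clouderam01 clouderam02 \
#     clouderaw01 clouderaw02 clouderaw03 clouderaw04 clouderaw05
```

Any host reported as FAIL either still asks for a password or cannot be reached, and is worth fixing before the Cloudera Manager installation.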

Cloudera Manager setup

Now that the VMs are properly configured, it's time to install Cloudera Manager on our main node 'clouderam01'.

Connect to the node and download the installer:

wget https://archive.cloudera.com/cm6/6.3.1/cloudera-manager-installer.bin

Next up, we need to give the installer executable permission:

chmod u+x cloudera-manager-installer.bin

And lastly, run the Cloudera Manager Server installer:

sudo ./cloudera-manager-installer.bin

The installation is fairly simple (next, next, next, accept, accept, finish).

After the installation, the Cloudera Manager service should be up and running after a couple of minutes. But before we're able to access the manager, we need to open up some ports that Cloudera Manager uses.

Go to Elastic Compute Service and select, in the right menu, Network & Security - Security Groups - name of the VPC.

Click Add security group rule and add port 7180.

Cloudera cluster setup

Open your browser and type in the address of your Cloudera main node:

http://ip_address:7180/

If everything up to this point was done correctly, you should get a login screen.

Log in using username/password: admin/admin.

Accept the agreement and move on to choose which edition to deploy.

Choose the 'free' edition.

Click to continue to the next screen and you’ll come to the host specification part of the cluster installation.

To specify our hosts, enter the following:

clouderam[01-02]

clouderaw[01-05]

This will now search for all ‘clouderam’ hosts that end with numbers from 01 to 02 and all ‘clouderaw’ hosts that end with numbers from 01 to 05. That would be all our VMs.
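To see exactly which names such a range covers, the expansion can be mimicked in a few lines of shell. This is only our own illustration of the range described above, not Cloudera Manager's actual implementation, and `expand_hosts` is a hypothetical helper name.

```shell
#!/bin/sh
# Expand a pattern of the form prefix[NN-MM] into individual host names,
# keeping the zero padding of the lower bound. A pattern without a
# bracketed range is printed unchanged.
expand_hosts() {
  case $1 in
    *\[*-*\]) ;;
    *) echo "$1"; return 0 ;;
  esac
  prefix=${1%%\[*}        # text before the opening bracket
  range=${1#*\[}          # "NN-MM]"
  range=${range%\]}       # "NN-MM"
  first=${range%-*}
  last=${range#*-}
  width=${#first}         # pad every number to the width of the lower bound
  # Strip leading zeros so the shell does not read the bounds as octal.
  i=$(echo "$first" | sed 's/^0*//')
  end=$(echo "$last" | sed 's/^0*//')
  i=${i:-0}
  end=${end:-0}
  while [ "$i" -le "$end" ]; do
    printf "%s%0${width}d\n" "$prefix" "$i"
    i=$(( i + 1 ))
  done
}

expand_hosts 'clouderam[01-02]'
expand_hosts 'clouderaw[01-05]'
```

The two calls together list seven names, one for each VM created earlier, which is also the number of hosts the search should find.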

Click ‘Search’.

If done correctly, the search will find our 7 hosts.

Click ‘Continue’.

On the next screen leave everything as is; just choose KAFKA under ‘Additional Parcels’ (at least we did).

Click ‘Continue’.

On the next screen, select the checkbox ‘Install Oracle Java SE Development Kit (JDK)’ and click ‘Continue’.

Do not enable Single User Mode on the next screen. Just click ‘Continue’.

We log in using the ‘admin’ user, so enter its login info on the next screen.

Continue.

Agent installation will start.

If everything up to this point was done correctly, all bars should be green by the end, with the text ‘Installation completed successfully’.

Side note: there’s a common error here if /etc/hosts on our VMs is not configured properly. Agents won’t be able to heartbeat and the installation will fail, but only after everything has already been installed.

To fix this, check your /etc/hosts file and make sure everything was typed in correctly.
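Checking seven entries on seven machines by eye is error-prone, so a small script can help. The sketch below reports, for each expected host, whether the hosts file has an active entry whose first name after the IP address is that host or its FQDN. The function name `check_hosts_file` is our own, and the check covers only this one rule, not every possible misconfiguration.

```shell
#!/bin/sh
# For each host name given after the file name, report whether hosts file $1
# has a non-comment entry whose first name (the field right after the IP)
# is that host or starts with "<host>.". Returns non-zero if any is missing.
check_hosts_file() {
  file=$1
  shift
  status=0
  for host in "$@"; do
    if awk -v h="$host" '
         /^[ \t]*#/ { next }
         $2 == h || index($2, h ".") == 1 { found = 1 }
         END { exit (found ? 0 : 1) }
       ' "$file"; then
      echo "OK      $host"
    else
      echo "MISSING $host"
      status=1
    fi
  done
  return "$status"
}

# On each node:
#   check_hosts_file /etc/hosts clouderam01 clouderam02 \
#     clouderaw01 clouderaw02 clouderaw03 clouderaw04 clouderaw05
```

A host reported as MISSING has no entry, is commented out, or is listed with another name directly after the IP address.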

Click ‘Continue’ and wait for the selected parcels to be installed.

Continue when finished.

At the end you will get the Cluster Installation Validations and Summary.

There should be no errors, maybe only some warnings.

Click ‘Finish’.

Cloudera custom services

Select ‘Custom Services’ when asked which combination of services to install. A new menu will appear. Select every service type except for Isilon, Key-Value Store Indexer, and Solr (at least in our case; choose depending on your needs).

Continue.

Now it’s time to Customize Role Assignments.

Here we assign which hosts will be management nodes (NameNode) and which will be workers (DataNode). In our case clouderam01/02 are the primary/secondary NameNode and the others are DataNodes. Most of the services are divided between the NameNodes. Where a service has no host assigned (e.g. Kafka MirrorMaker), we usually left it empty as it is.

The result looks like this:

[Screenshots: Cloud setup; Cloudera management services]

And in ‘View By Host’:

[Screenshot: View by host]

Click ‘Continue’ when finished.

On the next screen choose ‘Use Embedded Database’, click Test Connection, and Continue.

You will arrive at the Review Changes screen.

By default, the directory paths for the Kudu services are not set.

Add the values as shown in the picture:

Kudu Master Data Directories: data/kudu/master_wal

Kudu Tablet Server WAL Directory: data/kudu/tablet_wal

Kudu Tablet Server WAL Directory: data1/kudu/master_wal

Kudu Tablet Server Data Directories: data1/kudu/tablet_data

data2/kudu/tablet_data

data3/kudu/tablet_data

Click Continue and finish the setup.

Conclusion

In this blog we covered the installation of a Cloudera cluster in an Alibaba Cloud environment and the preparation of the environment itself.

References:

https://eu.alibabacloud.com/en

https://www.cloudera.com/downloads/manager/6-3-1.html
