Cloudera cluster on Alibaba Cloud
Code / 21.12.2020

The Cloudera cluster can be run separately or within a cloud environment. In this blog, Cloud Infrastructure Engineer Karlo Kričkić explains how you can install it within the Alibaba Cloud environment in several easy steps.
Cloudera Enterprise is a modern platform for machine learning and analytics, optimized for the cloud to be:
- Unified - brings your data warehouse, data science, data engineering, and operational database workloads together on a single integrated platform
- Hybrid - the most popular data warehouse and machine learning engines that can run on any compute resource for ultimate deployment flexibility
- Enterprise-grade - the scale and performance required for today’s modern data workloads meets the security and governance demanded by today’s IT departments
Cloudera Enterprise provides the following solutions:
- Data Warehouse
- Data Science
- Data Engineering
- Operational Database
- Run Everything in the Cloud, Multi-Cloud, or on a Hybrid "Cloud / On-Premises" Deployment
Task Overview
Cloudera works on the principle of master and worker nodes. The installation consists of several steps:
- Creating the network
- Creating the virtual machines
- Preparing the OS (in this example we will use CentOS)
- Installing Cloudera Manager
- Installing the Cloudera cluster
Creating a network
By default, Alibaba Cloud requires the creation of a virtual network so that machines can communicate with each other and be visible on the public Internet. Network creation takes place through the interface of Alibaba Cloud itself.
Alibaba Cloud network setup
Go to the Virtual Private Cloud console and choose Create VPC.
In the create network dialog enter:
· Name: Cloudera
· IP Range: 192.168.0.0/16 (Default CIDR Block)
· VSwitches: Cloudera
· Frankfurt Zone A
Click on OK and you are finished creating a private network.
Alicloud Virtual Machines
The next step is to create Virtual Machines in Alibaba Cloud: two management nodes and five worker nodes.
Basic Configuration
Region: Germany (Frankfurt) – Zone A
Type: ecs.sn2.medium – General Purpose Type sn2 (2 vCPU, 8 GB RAM)
Public image: CentOS 7 – 7.7 64-bit; Security Enhancement
Storage: Standard SSD 80 GB; Release with Instance
Networking
Network: Use the existing network which you set up in step 1:
- Cloudera
- Assign Public IP Address
- Bandwidth Billing: Pay-By-Traffic
- Peak Bandwidth: 20 Mbps
- Security Group auto-selected by VPC
- Elastic Network Interface: VSwitch Cloudera
System Configuration
Logon credentials: Password
Logon Password:
Instance Name: clouderam01 or clouderaw01
Host: clouderam01 or clouderaw01
Preview
Click on Create Instance and repeat this procedure for every master or worker node in the cluster.
CentOS preparation on Alibaba Cloud
Connect to each server using the built-in web console, or your favorite SSH client, and create an admin user:
adduser admin
Use the passwd command to update the new user’s password.
passwd admin
Set and confirm the new user’s password at the prompt.
Set password prompts:
Changing password for user admin.
New password:
Retype new password:
passwd: all authentication tokens updated successfully.
Use the usermod command to add the user to the wheel group.
usermod -aG wheel admin
By default, on CentOS, members of the wheel group have sudo privileges.
Use the su command to switch to the new user account.
su - admin
Step 1 - Remove the sudo password for user admin:
sudo vi /etc/sudoers
Add this line to the end of the file:
admin ALL=(ALL) NOPASSWD: ALL
Step 2 - Disable the firewall (if enabled):
sudo systemctl disable firewalld
sudo systemctl stop firewalld
Step 3 - Disable remote root login:
sudo vi /etc/ssh/sshd_config
PermitRootLogin no
sudo systemctl restart sshd.service
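If you would rather not edit the file by hand on every node, the same change can be scripted. The following is only a minimal sketch and not part of the original procedure: the function name set_permit_root_login_no is made up for illustration, and it assumes the file has no active Match block near the end, because a directive appended after a Match block would only apply inside that block. Check your own sshd_config before relying on it.

```shell
#!/bin/sh
# Sketch: force "PermitRootLogin no" in an sshd_config-style file.
# set_permit_root_login_no is a hypothetical helper name, not a standard tool.
# Assumption: the file contains no active "Match" block.
set_permit_root_login_no() {
  cfg="$1"
  tmp="$(mktemp)"
  # Drop every existing PermitRootLogin directive, commented out or not.
  grep -v -E '^[[:space:]]*#?[[:space:]]*PermitRootLogin([[:space:]]|$)' "$cfg" > "$tmp" || true
  # Append the single setting we want.
  printf 'PermitRootLogin no\n' >> "$tmp"
  # Write back through the original file so its owner and mode are kept.
  cat "$tmp" > "$cfg"
  rm -f "$tmp"
}

# Example (run as root on each node, then restart sshd):
#   set_permit_root_login_no /etc/ssh/sshd_config
#   systemctl restart sshd.service
```

Keep a second session open while restarting sshd, so that a mistake in the file does not lock you out of the node.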
Step 4 - Prepare hostname lookup
(add the IP address followed by the FQDN (fully qualified domain name) and the short name; make sure the FQDN comes first after the IP address, otherwise Cloudera Manager takes the short name during installation, which results in an unsuccessful installation):
sudo vi /etc/hosts
Add these lines to the file:
192.168.0.118 clouderaw05 clouderaw05
192.168.0.117 clouderaw04 clouderaw04
192.168.0.116 clouderaw03 clouderaw03
192.168.0.115 clouderaw02 clouderaw02
192.168.0.114 clouderaw01 clouderaw01
192.168.0.113 clouderam02 clouderam02
192.168.0.112 clouderam01 clouderam01
Disable all IPv6 entries on all hosts.
#::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
#127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
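Since the order of names in /etc/hosts matters here, it can help to generate the entries instead of typing them. The sketch below prints one line per node with the FQDN first and the short name second. The domain example.internal and the helper names hosts_entry and print_cluster_hosts are assumptions made for illustration only; the IP addresses and hostnames are the ones used in this post.

```shell
#!/bin/sh
# Sketch: print /etc/hosts entries with the FQDN first, then the short name.
# DOMAIN is a placeholder assumption; substitute your own domain.
DOMAIN="example.internal"

# hosts_entry IP SHORTNAME -> "IP SHORTNAME.DOMAIN SHORTNAME"
hosts_entry() {
  ip="$1"
  short="$2"
  printf '%s %s.%s %s\n' "$ip" "$short" "$DOMAIN" "$short"
}

# Entries for the seven nodes used in this post.
print_cluster_hosts() {
  hosts_entry 192.168.0.112 clouderam01
  hosts_entry 192.168.0.113 clouderam02
  hosts_entry 192.168.0.114 clouderaw01
  hosts_entry 192.168.0.115 clouderaw02
  hosts_entry 192.168.0.116 clouderaw03
  hosts_entry 192.168.0.117 clouderaw04
  hosts_entry 192.168.0.118 clouderaw05
}

# Only prints to standard output; review before appending to /etc/hosts.
print_cluster_hosts
```

Printing rather than writing directly keeps the step reviewable; the output can be appended to /etc/hosts on each node once it looks right.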
Step 5 - Set up auto login from ‘clouderam01’ to all hosts:
First, let’s create a public and private key pair on the main Cloudera Manager server ‘clouderam01’ using (leave defaults when asked):
cd /home/admin
ssh-keygen
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 700 ~/.ssh
chmod 600 ~/.ssh/*
Now let’s copy the public key to all included servers:
scp /home/admin/.ssh/id_rsa.pub admin@clouderam01:id_rsa.pub
scp /home/admin/.ssh/id_rsa.pub admin@clouderam02:id_rsa.pub
scp /home/admin/.ssh/id_rsa.pub admin@clouderaw01:id_rsa.pub
scp /home/admin/.ssh/id_rsa.pub admin@clouderaw02:id_rsa.pub
scp /home/admin/.ssh/id_rsa.pub admin@clouderaw03:id_rsa.pub
scp /home/admin/.ssh/id_rsa.pub admin@clouderaw04:id_rsa.pub
scp /home/admin/.ssh/id_rsa.pub admin@clouderaw05:id_rsa.pub
On each server add this public key into authorized_keys:
mkdir -p ~/.ssh
cat id_rsa.pub >> ~/.ssh/authorized_keys
chmod 700 ~/.ssh
chmod 600 ~/.ssh/*
SSH from clouderam01 to every host must work without a password:
ssh clouderam01 (as well)
ssh clouderam02
ssh clouderaw01
ssh clouderaw02
ssh clouderaw03
ssh clouderaw04
ssh clouderaw05
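Checking seven hosts by hand is easy to get wrong, so the check can be looped. This is a sketch under a few assumptions: it runs on clouderam01 as the admin user, and the helper names cluster_hosts and check_passwordless_ssh are invented for this example. The BatchMode=yes option tells ssh to fail instead of prompting, so a host that still asks for a password, or whose host key has not been accepted yet, is reported as FAILED.

```shell
#!/bin/sh
# Sketch: verify key-based SSH from clouderam01 to every node.

# The seven hostnames used in this post, one per line.
cluster_hosts() {
  printf '%s\n' clouderam01 clouderam02 \
    clouderaw01 clouderaw02 clouderaw03 clouderaw04 clouderaw05
}

# check_passwordless_ssh is a hypothetical helper name.
# Returns 0 only if every host accepted the key without a prompt.
check_passwordless_ssh() {
  failed=0
  for host in $(cluster_hosts); do
    # Run the no-op command "true" remotely; never prompt for input.
    if ssh -o BatchMode=yes -o ConnectTimeout=5 "admin@$host" true; then
      echo "OK     $host"
    else
      echo "FAILED $host"
      failed=1
    fi
  done
  return "$failed"
}

# Usage on clouderam01, as the admin user:
#   check_passwordless_ssh
```

If a host is reported as FAILED on the first run, connecting to it once manually to accept its host key and then re-running the check usually tells you whether the key copy itself worked.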
Cloudera Manager setup
Now that the VMs are properly configured, it's time to install the Cloudera Manager on our main node 'clouderam01'.
Connect to the node and download the installer:
wget https://archive.cloudera.com/cm6/6.3.1/cloudera-manager-installer.bin
Next up, we need to give the installer executable permission:
chmod u+x cloudera-manager-installer.bin
And lastly, run the Cloudera Manager Server installer:
sudo ./cloudera-manager-installer.bin
The installation is fairly simple (next - next - next - accept - accept - finish).
After the installation, the Cloudera Manager service should be up and running within a couple of minutes. But before we're able to access the manager, we need to open up the port that Cloudera Manager uses.
Go to Elastic Compute Service and, in the right menu, select Network & Security - Security Groups - name of VPC.
Click on Add security group rule and add port 7180.
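Because the service can take a few minutes to start, it may be convenient to poll port 7180 rather than refresh the browser. The following is a rough sketch, assuming curl is installed on the machine you run it from; cm_url and wait_for_cm are hypothetical helper names, and the retry count and 10-second pause are arbitrary choices.

```shell
#!/bin/sh
# Sketch: poll the Cloudera Manager web UI until it answers.

# Build the URL for a given host or IP address.
cm_url() {
  printf 'http://%s:7180/\n' "$1"
}

# wait_for_cm HOST [TRIES] - hypothetical helper.
# Tries up to TRIES times (default 30), sleeping 10 seconds in between.
wait_for_cm() {
  url="$(cm_url "$1")"
  tries="${2:-30}"
  while [ "$tries" -gt 0 ]; do
    # --fail makes curl exit non-zero on HTTP errors (status 400 and above).
    if curl --silent --fail --output /dev/null --max-time 5 "$url"; then
      echo "Cloudera Manager is answering at $url"
      return 0
    fi
    tries=$((tries - 1))
    sleep 10
  done
  echo "No answer from $url" >&2
  return 1
}

# Usage, replacing ip_address with the public IP of clouderam01:
#   wait_for_cm ip_address
```

If the loop never succeeds, the security group rule for port 7180 is the first thing worth re-checking.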
Cloudera cluster setup
Open your browser and type in the address of your Cloudera main node:
http://ip_address:7180/
If everything up to this point was done correctly, you should get a login screen:
Log in using username/pass: admin/admin.
Accept the agreement and move on to choose which edition to deploy.
Choose the 'free' edition.
Click to continue to the next screen and you’ll come to the host specification part of the cluster installation.
To specify our hosts, enter the following:
clouderam[01-02]
clouderaw[01-05]
This will now search for all ‘clouderam’ hosts that end with numbers from 01 to 02 and all ‘clouderaw’ hosts that end with numbers from 01 to 05. That would be all our VMs.
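To make the range patterns concrete, here is a small sketch that spells out the hostnames they stand for. It only illustrates the idea and is not how Cloudera Manager itself expands the pattern; expand_range is a hypothetical helper that takes a prefix plus the first and last number.

```shell
#!/bin/sh
# Sketch: list the hostnames that a pattern such as clouderaw[01-05] stands for.
# expand_range PREFIX FIRST LAST - hypothetical helper for illustration only.
expand_range() {
  prefix="$1"
  i="$2"
  last="$3"
  while [ "$i" -le "$last" ]; do
    # %02d pads the number to two digits: 1 becomes 01.
    printf '%s%02d\n' "$prefix" "$i"
    i=$((i + 1))
  done
}

expand_range clouderam 1 2   # clouderam01, clouderam02
expand_range clouderaw 1 5   # clouderaw01 through clouderaw05
```

The two calls together print seven names, which is the number of hosts the search should find.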
Click ‘Search’.
If done correctly, the search will find our 7 hosts:
Click ‘Continue’.
On the next screen leave everything as is; just under ‘Additional Parcels’ choose KAFKA (at least we did):
Click ‘Continue’.
On the next screen, select the checkbox ‘Install Oracle Java SE Development Kit (JDK)’ and click ‘Continue’:
Do not enable Single User Mode on the next screen. Just click ‘Continue’.
We log in using the ‘admin’ user, so put its login info on the next screen:
Continue.
Agent installation will start.
If everything up to this point was done correctly, all bars should be green by the end with the text ‘Installation completed successfully’:
Side note: there’s a common error here if /etc/hosts on our VMs is not configured properly - agents won’t be able to heartbeat and the installation will fail, but only after everything has already been installed.
To fix this, check that everything in your /etc/hosts file was typed in correctly.
Click ‘Continue’ and wait for the selected parcels to be installed:
Continue when finished.
At the end you will get a Cluster Installation Validations and Summary.
There should be no errors, maybe only some warnings.
Click ‘Finish’.
Cloudera custom services
Select ‘Custom Services’ when asked which combination of services to install. A new menu will appear. Select every service type except for Isilon, Key-Value Store Indexer, and Solr (at least in our case; choose depending on your needs):
Continue.
Now it’s time to Customize Role Assignments.
Here we assign which hosts will be management nodes (NameNode) and which will be workers (DataNode). In our case clouderam01/02 are the primary/secondary NameNode and the others are DataNodes. Most of the services are divided between the NameNodes. Where no host was assigned by default (e.g. Kafka MirrorMaker), we left it empty as it is.
The result looks like this:
And in ‘View By Host’:
Click ‘Continue’ when finished.
On the next screen choose ‘Use Embedded Database’, Test Connection and Continue.
You will arrive at the Review Changes screen.
By default, some services don’t have directory paths set for Kudu.
Add the values as shown in the picture.
Kudu Master Data Directories: data/kudu/master_wal
Kudu Tablet Server WAL Directory: data/kudu/tablet_wal
Kudu Tablet Server WAL Directory: data1/kudu/master_wal
Kudu Tablet Server Data Directories: data1/kudu/tablet_data
data2/kudu/tablet_data
data3/kudu/tablet_data
Click Continue and finish the setup.
Conclusion
In this blog we covered the installation of a Cloudera cluster in an Alibaba Cloud environment and the preparation of the environment itself.