I recently ran into an issue when deploying Ambari Agent to a new host in my cluster. Here’s my personal Knowledge Base article on the issue.
Symptoms
When deploying Ambari Agent to a new node, the wizard fails. At the bottom of stderr I found the following:
INFO DataCleaner.py:122 - Data cleanup finished
INFO 11,947 hostname.py:67 - agent:hostname_script configuration not defined thus read hostname '<my host>' using socket.getfqdn().
INFO 11,952 PingPortListener.py:50 - Ping port listener started on port: 8670
INFO 11,954 main.py:439 - Connecting to Ambari server at https://<my host>:8440 (172.31.42.192)
INFO 955 NetUtil.py:70 - Connecting to https://<my host>:8440/ca
ERROR 11,958 NetUtil.py:96 - EOF occurred in violation of protocol (_ssl.c:579)
ERROR 11,958 NetUtil.py:97 - SSLError: Failed to connect. Please check openssl library versions. Refer to: https://bugzilla.redhat.com/show_bug.cgi?id=1022468 for more details.
WARNING 11,958 NetUtil.py:124 - Server at https://<my host>:8440 is not reachable, sleeping for 10 seconds...
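If you want to check quickly whether a host is hitting this same failure, a small helper like the one below can scan a log file for the two error strings shown above. This is only a sketch: the function name has_ssl_handshake_error is my own, and the log location used in the usage note is an assumption about a default install.

```bash
#!/bin/bash
# Hypothetical helper: return 0 if the given log file contains either of the
# SSL failure messages reported by NetUtil.py above, non-zero otherwise.
has_ssl_handshake_error() {
    local log_file="$1"
    grep -q -e 'EOF occurred in violation of protocol' \
            -e 'SSLError: Failed to connect' "$log_file"
}
```

Usage, assuming the agent writes its log to /var/log/ambari-agent/ambari-agent.log: has_ssl_handshake_error /var/log/ambari-agent/ambari-agent.log && echo "SSL handshake failure found"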
Root Cause
After solving the issue, I can say that in my case the cause was previously installed versions of Java that conflicted with my preferred version, even after I selected my version using the alternatives command.
While working through the issue I also found a suggestion to disable certificate validation. I implemented it because this is not a production cluster, and I am listing it as Solution 3 here.
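To see whether more than one Java installation is in play on a node, it can help to list every java binary reachable through PATH and resolve each one to its real location, following the /etc/alternatives symlinks. The helper below is a minimal sketch of that idea. The name list_java_on_path is hypothetical, and it only inspects PATH, so a JDK installed outside PATH will not show up.

```bash
#!/bin/bash
# Hypothetical helper: print the real location of every executable named
# "java" found in a colon-separated directory list (defaults to $PATH).
# Symlinks are followed with readlink -f, and duplicates are removed.
list_java_on_path() {
    local search_path="${1:-$PATH}"
    local dir
    local IFS=':'
    for dir in $search_path; do
        if [ -x "$dir/java" ]; then
            readlink -f "$dir/java"
        fi
    done | sort -u
}
```

If list_java_on_path prints more than one line, or a path other than the JDK you intended, the node is probably still picking up an older installation.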
Solution 1 – Deploy new hosts with no previous JDK
After much tinkering with the alternatives command to repair my JDK configuration, I decided that it was easier to start with a new set of AWS nodes and ensure that no JDK was installed in my image before I began prepping each node. If you have nodes that are having the issue after an upgrade, read Solution 2.
I am including the script I used to download and configure the correct JDK pre-requisite for my version of Ambari and HDP below for your consumption:
#!/bin/bash
#Script Name: ignacio_install_jdk.scr
#Author: Ignacio de la Torre
#Independent Contractor Profile: https://linkedin.com/in/idelatorre
#################
##Install Oracle JDK
export hm=/home/ec2-user
cd /usr
sudo mkdir -p jdk64/jdk1.8.0_112
cd jdk64
sudo wget http://public-repo-1.hortonworks.com/ARTIFACTS/jdk-8u112-linux-x64.tar.gz
sudo gunzip jdk-8u112-linux-x64.tar.gz
sudo tar -xf jdk-8u112-linux-x64.tar
#configure paths
chmod 666 $hm/.bash_profile
#single quotes keep $PATH literal so it is expanded at login, not when this script runs
echo 'export JAVA_HOME=/usr/jdk64/jdk1.8.0_112' >> $hm/.bash_profile
echo 'export PATH=$PATH:/usr/jdk64/jdk1.8.0_112/bin' >> $hm/.bash_profile
#restore tighter permissions on the same file
chmod 640 $hm/.bash_profile
#configure java version using alternatives
sudo alternatives --install /usr/bin/java java /usr/jdk64/jdk1.8.0_112/bin/java 1
#if the link to /usr/bin/java is broken (file displays red), rebuild using:
#ln -s -f /usr/jdk64/jdk1.8.0_112/bin/java /usr/bin/java
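Before running an install script like the one above, it may be worth confirming that the image really has no JDK or JRE packages. The sketch below filters a package list for names that look like Java runtimes. The function name find_java_packages and the pattern are my own heuristic, not an official list, and the usage assumes an RPM-based image.

```bash
#!/bin/bash
# Hypothetical helper: read package names on stdin (one per line) and print
# the ones that look like a JDK or JRE. Heuristic pattern only: it matches
# names containing "jdk" or "jre", or starting with "java-<digit>".
# Returns non-zero when nothing matches.
find_java_packages() {
    grep -i -E 'jdk|jre|^java-[0-9]'
}
```

Usage on an RPM-based host: if rpm -qa | find_java_packages; then echo "Remove the packages listed above first"; fi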
Solution 2 – Install new JDK with utilities
I realize scrapping nodes is not always an option, especially for those experiencing the issue after an install. Because of a tight deadline I did not try the solution shown here, but it addresses what I believe the cause is.
Scenario: On my original nodes, which had a previous non-compatible version of the JDK installed, I issued the following command to select my new JDK as preferred:
#Select Oracle's JDK 1.8 as preferred after install (see install script in Solution 1)
sudo alternatives --install /usr/bin/java java /usr/jdk64/jdk1.8.0_112/bin/java 1
Issue: After selecting my new JDK I was able to see it listed in the configurations with the command below, but all of the Java utilities such as jar, javadoc, etc. were pointing to null for my preferred JDK.
#list Java configured alternatives
sudo alternatives --display java
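To pick out the utilities that point to null without reading the whole listing, the output of the display command can be filtered. The sketch below assumes the output contains lines of the form "slave <name>: (null)", which is what I would expect from the Red Hat alternatives tool. The function name find_null_slaves is hypothetical.

```bash
#!/bin/bash
# Hypothetical helper: read "alternatives --display java" output on stdin and
# print the name of every slave link whose target is "(null)".
# Assumes lines shaped like " slave jar: (null)". It reports null slaves for
# every registered alternative, not only the preferred one.
find_null_slaves() {
    sed -n 's/^[[:space:]]*slave \([^:]*\): (null)[[:space:]]*$/\1/p' | sort -u
}
```

Usage: sudo alternatives --display java | find_null_slaves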
Proposed solution: Use the list of tools from the non-compatible version of the JDK to install your new JDK with all the Java utilities as slaves. Please note that you cannot add slaves to an alternative that is already installed; you need to issue the install command with all the utilities at once. An example adding only the jar and javadoc utilities is displayed below:
#Select Oracle’s JDK 1.8 as preferred after install with a slave configuration for the JAR and javadoc utilities:
sudo alternatives --install "/usr/bin/java" "java" "/usr/jdk64/jdk1.8.0_112/bin/java" 1 \
  --slave "/usr/bin/jar" "jar" "/usr/jdk64/jdk1.8.0_112/bin/jar" \
  --slave "/usr/bin/javadoc" "javadoc" "/usr/jdk64/jdk1.8.0_112/bin/javadoc"
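Typing one --slave entry per utility by hand gets tedious for a full JDK, so the command can be generated from a list of tool names instead. The following is a sketch only: build_alternatives_cmd is a hypothetical name, it assumes every tool lives under the JDK's bin directory and links into /usr/bin, and it does not handle paths containing spaces. It prints the command rather than running it so you can review it first.

```bash
#!/bin/bash
# Hypothetical helper: print an "alternatives --install" command for java
# with one --slave entry per tool name.
# Arguments: <jdk_home> <priority> <tool> [<tool> ...]
build_alternatives_cmd() {
    local jdk_home="$1" priority="$2"
    shift 2
    local cmd="alternatives --install /usr/bin/java java ${jdk_home}/bin/java ${priority}"
    local tool
    for tool in "$@"; do
        cmd+=" --slave /usr/bin/${tool} ${tool} ${jdk_home}/bin/${tool}"
    done
    printf '%s\n' "$cmd"
}
```

Usage: build_alternatives_cmd /usr/jdk64/jdk1.8.0_112 1 jar javadoc prints a single command line that can be run with sudo once it looks right. Keep in mind that --install only registers an alternative; when the group is in automatic mode the highest-priority entry stays active, so you may also need sudo alternatives --set java /usr/jdk64/jdk1.8.0_112/bin/java to make the new JDK the one in use.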
Solution 3 – Disable certificate validation
As I said before, my cluster is not a production one and will not contain sensitive or confidential data, so I opted to implement the suggestion to disable certificate validation as part of my troubleshooting. To do this you have to set verify=disable by editing the /etc/python/cert-verification.cfg file. Do this at your own risk.
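If you prefer to script the change rather than edit the file by hand, something like the following could work. It is a sketch under assumptions: set_cert_verify is a hypothetical name, the file is expected to already contain a verify= line, the accepted values are the ones I know from the comments in that file on Red Hat style systems, and sed -i is used in its GNU form. It keeps a .bak copy so the change is easy to revert.

```bash
#!/bin/bash
# Hypothetical helper: set the verify= line of a cert-verification.cfg style
# file to the given value, keeping a backup copy next to it.
# Arguments: <config_file> <enable|disable|platform_default>
set_cert_verify() {
    local cfg="$1" value="$2"
    case "$value" in
        enable|disable|platform_default) ;;
        *) echo "unsupported value: $value" >&2; return 1 ;;
    esac
    # Back up the current file before rewriting the verify= line in place.
    cp -p "$cfg" "$cfg.bak" || return 1
    sed -i "s/^verify=.*/verify=${value}/" "$cfg"
}
```

Usage, run as root: set_cert_verify /etc/python/cert-verification.cfg disable. To undo it, copy the .bak file back over the original.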