Hadoop & Spark Installation (Part 1) - 创新互联

Hardware environment:


hddcluster1 10.0.0.197 redhat7

hddcluster2 10.0.0.228 centos7 (this machine is the master)

hddcluster3 10.0.0.202 redhat7

hddcluster4 10.0.0.181 centos7

Software environment:

Firewall (firewalld) disabled on all nodes

openssh-clients

openssh-server

java-1.8.0-openjdk

java-1.8.0-openjdk-devel

hadoop-2.7.3.tar.gz
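
The list above asks for the firewall to be off but gives no command. On RHEL 7 and CentOS 7 the default firewall service is firewalld, so a minimal sketch, assuming systemd and sudo rights on each node, could look like this (the function name `disable_firewalld` is made up for illustration):

```shell
# disable_firewalld: stop firewalld now and keep it from starting at boot.
# Assumption: the node uses systemd and the caller may use sudo.
# Run it once on every node of the cluster.
disable_firewalld() {
    sudo systemctl stop firewalld &&
    sudo systemctl disable firewalld
}
```

Afterwards, `systemctl is-active firewalld` can be used to confirm that the service is no longer running.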

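Procedure:

Choose one machine as the Master
On the Master node, create the hadoop user, install the SSH server, and install Java
On the Master node, install and configure Hadoop
On each Slave node, create the hadoop user, install the SSH server, and install Java
Copy the /usr/local/hadoop directory from the Master node to each Slave node
Start Hadoop on the Master node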

# Hostname-to-IP mapping for the nodes
[hadoop@hddcluster2 ~]$ cat /etc/hosts
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
10.0.0.228      hddcluster2
10.0.0.197      hddcluster1
10.0.0.202      hddcluster3
10.0.0.181      hddcluster4
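
Every node needs the same four mappings, and a wrong entry tends to show up only later as a DataNode that cannot reach the master. A minimal sketch for checking a hosts-format file against the table above; `lookup_ip` is a hypothetical helper, not part of any package:

```shell
# lookup_ip FILE NAME: print the first IP address that FILE maps NAME to.
# Comment lines are skipped; NAME must equal a whole field, so
# "hddcluster1" does not match "hddcluster10".
lookup_ip() {
    awk -v name="$2" '
        $1 ~ /^#/ { next }
        {
            for (i = 2; i <= NF; i++) {
                if ($i == name) { print $1; exit }
            }
        }
    ' "$1"
}
```

With the file shown above, `lookup_ip /etc/hosts hddcluster2` should print 10.0.0.228 on every node; an empty result means the entry is missing.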

Create the hadoop user
su  # switch to the root user
useradd -m hadoop -s /bin/bash   # create the new user hadoop
passwd hadoop                     # set the password for hadoop
visudo                            # below the line "root ALL=(ALL) ALL", add "hadoop ALL=(ALL) ALL"

# Log in as the hadoop user, install SSH, and configure passwordless SSH login
[hadoop@hddcluster2 ~]$ rpm -qa | grep ssh
[hadoop@hddcluster2 ~]$ sudo yum install openssh-clients
[hadoop@hddcluster2 ~]$ sudo yum install openssh-server
[hadoop@hddcluster2 ~]$ cd ~/.ssh/     # if this directory does not exist, run ssh localhost once first
[hadoop@hddcluster2 ~]$ ssh-keygen -t rsa              # press Enter at every prompt
[hadoop@hddcluster2 ~]$ ssh-copy-id -i ~/.ssh/id_rsa.pub localhost # authorize the key on this machine
[hadoop@hddcluster2 ~]$ chmod 600 ./authorized_keys    # restrict the file permissions
[hadoop@hddcluster2 ~]$ ssh-copy-id -i ~/.ssh/id_rsa.pub  hadoop@hddcluster1
[hadoop@hddcluster2 ~]$ ssh-copy-id -i ~/.ssh/id_rsa.pub  hadoop@hddcluster3
[hadoop@hddcluster2 ~]$ ssh-copy-id -i ~/.ssh/id_rsa.pub  hadoop@hddcluster4
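
The start scripts later log in to every node over SSH, so it can help to confirm that key-based login works before going further. The following is a sketch only; `check_ssh` is a hypothetical helper, and the 5-second timeout is an arbitrary choice:

```shell
# check_ssh USER HOST...: report which hosts accept key-based login.
# BatchMode=yes makes ssh fail instead of prompting for a password,
# so a host that still wants a password is reported as FAILED.
# Returns non-zero if any host failed. Note: sets the shell variables
# user, host and failed as a side effect.
check_ssh() {
    user=$1
    shift
    failed=0
    for host in "$@"; do
        if ssh -o BatchMode=yes -o ConnectTimeout=5 "$user@$host" true </dev/null; then
            echo "ok      $host"
        else
            echo "FAILED  $host"
            failed=1
        fi
    done
    return "$failed"
}
```

On the master this would be called as `check_ssh hadoop localhost hddcluster1 hddcluster3 hddcluster4`.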

# Extract the Hadoop archive to /usr/local/hadoop
[hadoop@hddcluster2 ~]$ sudo tar -zxf hadoop-2.7.3.tar.gz -C /usr/local/
[hadoop@hddcluster2 ~]$ sudo mv /usr/local/hadoop-2.7.3 /usr/local/hadoop
[hadoop@hddcluster2 ~]$ sudo chown -R hadoop:hadoop /usr/local/hadoop
cd /usr/local/hadoop
./bin/hadoop version    # needs Java, so run it after the Java step below
# Install the Java environment
[hadoop@hddcluster2 ~]$ sudo yum install java-1.8.0-openjdk java-1.8.0-openjdk-devel
[hadoop@hddcluster2 ~]$ rpm -ql java-1.8.0-openjdk-devel | grep '/bin/javac'
/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.111-2.b15.el7_3.x86_64/bin/javac
[hadoop@hddcluster2 ~]$ vim ~/.bashrc    # append the following lines
export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.111-2.b15.el7_3.x86_64
export HADOOP_HOME=/usr/local/hadoop
export HADOOP_INSTALL=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
export HADOOP_PREFIX=$HADOOP_HOME
export HADOOP_OPTS="-Djava.library.path=$HADOOP_PREFIX/lib:$HADOOP_PREFIX/lib/native"
# Test the Java environment
source ~/.bashrc
java -version
$JAVA_HOME/bin/java -version  # should print the same as java -version
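
The JAVA_HOME value above contains the exact package version, so it may stop matching after the JDK package is updated, and it may differ between the RedHat and CentOS nodes. One possible alternative is to derive it from the javac path that rpm reports. This is a sketch; `java_home_from_javac` is a hypothetical helper:

```shell
# java_home_from_javac PATH_TO_JAVAC: print the JDK directory by
# stripping the trailing /bin/javac from the given path.
java_home_from_javac() {
    dirname "$(dirname "$1")"
}

# Possible use in ~/.bashrc, assuming java-1.8.0-openjdk-devel is installed:
# export JAVA_HOME=$(java_home_from_javac "$(rpm -ql java-1.8.0-openjdk-devel | grep '/bin/javac$')")
```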

# Edit the Hadoop configuration files
[hadoop@hddcluster2 hadoop]$ pwd
/usr/local/hadoop/etc/hadoop
[hadoop@hddcluster2 hadoop]$ cat core-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>
        <property>
                <name>fs.defaultFS</name>
                <value>hdfs://hddcluster2:9000</value>
        </property>
        <property>
                <name>hadoop.tmp.dir</name>
                <value>file:/usr/local/hadoop/tmp</value>
                <description>Abase for other temporary directories.</description>
        </property>
</configuration>


[hadoop@hddcluster2 hadoop]$ cat hdfs-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->



<configuration>
        <property>
                <name>dfs.namenode.secondary.http-address</name>
                <value>hddcluster2:50090</value>
        </property>
        <property>
                <name>dfs.replication</name>
                <value>3</value>
        </property>
        <property>
                <name>dfs.namenode.name.dir</name>
                <value>file:/usr/local/hadoop/tmp/dfs/name</value>
        </property>
        <property>
                <name>dfs.datanode.data.dir</name>
                <value>file:/usr/local/hadoop/tmp/dfs/data</value>
        </property>
</configuration>
[hadoop@hddcluster2 hadoop]$ 

[hadoop@hddcluster2 hadoop]$ cat mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->


<configuration>
        <property>
                <name>mapreduce.framework.name</name>
                <value>yarn</value>
        </property>
        <property>
                <name>mapreduce.jobhistory.address</name>
                <value>hddcluster2:10020</value>
        </property>
        <property>
                <name>mapreduce.jobhistory.webapp.address</name>
                <value>hddcluster2:19888</value>
        </property>
</configuration>
[hadoop@hddcluster2 hadoop]$ 

[hadoop@hddcluster2 hadoop]$ cat yarn-site.xml 
<?xml version="1.0"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->
<configuration>

<!-- Site specific YARN configuration properties -->

        <property>
                <name>yarn.resourcemanager.hostname</name>
                <value>hddcluster2</value>
        </property>
        <property>
                <name>yarn.nodemanager.aux-services</name>
                <value>mapreduce_shuffle</value>
        </property>


</configuration>
[hadoop@hddcluster2 hadoop]$ 
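
A typo in one of the four site files is easy to miss by eye. The sketch below reads a single property back from a file; it is not an XML parser and assumes the layout used in the listings above, with `<name>` and `<value>` on separate lines and the name first. `get_prop` is a hypothetical helper:

```shell
# get_prop FILE NAME: print the <value> that follows <name>NAME</name>.
# Assumes one tag per line, as in the listings above; prints nothing
# if the property is not found.
get_prop() {
    awk -v name="$2" '
        index($0, "<name>" name "</name>") { found = 1; next }
        found && /<value>/ {
            sub(/.*<value>/, "")
            sub(/<\/value>.*/, "")
            print
            exit
        }
    ' "$1"
}
```

For example, `get_prop core-site.xml fs.defaultFS` should print hdfs://hddcluster2:9000 for the file shown above. Once Hadoop is on the PATH, `hdfs getconf -confKey fs.defaultFS` reports the value Hadoop itself resolves, which is the more reliable check.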

[hadoop@hddcluster2 hadoop]$ cat slaves 
hddcluster1
hddcluster2
hddcluster3
hddcluster4

$ cd /usr/local
$ sudo rm -r ./hadoop/tmp     # delete the Hadoop temporary files
$ sudo rm -r ./hadoop/logs/*   # delete the log files
$ tar -zcf ~/hadoop.master.tar.gz ./hadoop   # compress first, then copy
$ cd ~
$ scp ./hadoop.master.tar.gz hddcluster1:/home/hadoop
$ scp ./hadoop.master.tar.gz hddcluster3:/home/hadoop
$ scp ./hadoop.master.tar.gz hddcluster4:/home/hadoop
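
With more nodes, the three scp lines could be replaced by a loop over the slaves file, leaving out the master itself. A sketch, with `worker_hosts` as a hypothetical helper:

```shell
# worker_hosts SLAVES_FILE MASTER: print every host listed in SLAVES_FILE
# except MASTER, one per line, skipping blank lines.
worker_hosts() {
    awk -v master="$2" 'NF && $1 != master { print $1 }' "$1"
}

# Possible use on the master, after creating ~/hadoop.master.tar.gz:
# for host in $(worker_hosts /usr/local/hadoop/etc/hadoop/slaves hddcluster2); do
#     scp ~/hadoop.master.tar.gz "$host":/home/hadoop
# done
```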

On each slave node, install the software environment and set up .bashrc as on the master, then run:

sudo tar -zxf ~/hadoop.master.tar.gz -C /usr/local
sudo chown -R hadoop /usr/local/hadoop

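[hadoop@hddcluster2 ~]$ hdfs namenode -format       # needed only before the first start, not afterwards
Hadoop can now be started. Run the start commands on the Master node:
$ start-dfs.sh
$ start-yarn.sh
$ mr-jobhistory-daemon.sh start historyserver
Run jps on each node to see which processes it started. If everything is correct,
the Master node shows the NameNode, ResourceManager, SecondaryNameNode, and JobHistoryServer processes (here also DataNode and NodeManager, because hddcluster2 is listed in slaves).
Also run hdfs dfsadmin -report on the Master node to check that the DataNodes started properly. If Live datanodes is not 0, the cluster started successfully; with the slaves file above it should be 4.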
[hadoop@hddcluster2 ~]$ hdfs dfsadmin -report
Configured Capacity: 2125104381952 (1.93 TB)
Present Capacity: 1975826509824 (1.80 TB)
DFS Remaining: 1975824982016 (1.80 TB)
DFS Used: 1527808 (1.46 MB)
DFS Used%: 0.00%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0
Missing blocks (with replication factor 1): 0

-------------------------------------------------
Live datanodes (4):
The status of the DataNodes and the NameNode can also be viewed on the web page http://hddcluster2:50070/. If startup fails, check the startup logs to find the cause.
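
For a scripted check, the DataNode count can be pulled out of the report. This sketch assumes the "Live datanodes (N):" line format shown in the listing above, which may differ in other Hadoop versions; `live_datanodes` is a hypothetical helper:

```shell
# live_datanodes: read `hdfs dfsadmin -report` output on stdin and print
# the N from the "Live datanodes (N):" line. Prints nothing if the line
# is absent.
live_datanodes() {
    sed -n 's/^Live datanodes (\([0-9][0-9]*\)):.*/\1/p'
}
```

On the master it would be used as `hdfs dfsadmin -report | live_datanodes`, and the result compared with the number of hosts in the slaves file.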

On a Slave node, jps shows the DataNode and NodeManager processes.
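
Comparing jps output against the expected process names can be scripted as well. The sketch below assumes the usual jps output of one "PID ClassName" pair per line; `missing_daemons` is a hypothetical helper:

```shell
# missing_daemons NAME...: read `jps` output on stdin and print each
# expected NAME that is not among the running class names.
# Matching is exact, so NameNode does not match SecondaryNameNode.
missing_daemons() {
    running=$(awk '{ print $2 }')
    for name in "$@"; do
        echo "$running" | grep -qxF "$name" || echo "$name"
    done
}
```

On a slave this would be `jps | missing_daemons DataNode NodeManager`, and on the master `jps | missing_daemons NameNode ResourceManager SecondaryNameNode JobHistoryServer`. Empty output means nothing is missing.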

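Testing a distributed Hadoop example
First create the user directory on HDFS:
hdfs dfs -mkdir -p /user/hadoop
Copy the configuration files in /usr/local/hadoop/etc/hadoop into the distributed file system as input files:
hdfs dfs -mkdir input
hdfs dfs -put /usr/local/hadoop/etc/hadoop/*.xml input
Checking the DataNode status (the used size has changed) confirms that the input files were indeed copied to the DataNodes.
Now the MapReduce job can be run:
hadoop jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar grep input output 'dfs[a-z.]+'
When the job finishes, its results are in the output directory on HDFS.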
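
The result files can be printed with `hdfs dfs -cat output/*`. As a rough cross-check, the same counts can be computed locally from the files that were uploaded. This is a sketch, assuming a grep that supports the -o and -h options (GNU grep does); `local_grep_count` is a hypothetical helper:

```shell
# local_grep_count FILE...: extract every string matching 'dfs[a-z.]+'
# from the given files, count identical strings, and list the counts
# with the highest first.
local_grep_count() {
    grep -ohE 'dfs[a-z.]+' "$@" | sort | uniq -c | sort -rn
}
```

Running `local_grep_count /usr/local/hadoop/etc/hadoop/*.xml` should give counts that agree with the job's output, since both apply the same regular expression to the same files.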

Commands to start Hadoop:
start-dfs.sh
start-yarn.sh
mr-jobhistory-daemon.sh start historyserver
Commands to stop Hadoop:
stop-dfs.sh
stop-yarn.sh
mr-jobhistory-daemon.sh stop historyserver
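
The two command groups can be wrapped in one function so that start and stop always run the same three scripts in the order listed above. A sketch, assuming the scripts are on the PATH as set up in .bashrc; `hadoop_cluster` is a hypothetical name:

```shell
# hadoop_cluster start|stop: run the three control scripts in the order
# listed above. Starting stops at the first script that fails; stopping
# runs all three regardless.
hadoop_cluster() {
    case "$1" in
        start)
            start-dfs.sh &&
            start-yarn.sh &&
            mr-jobhistory-daemon.sh start historyserver
            ;;
        stop)
            stop-dfs.sh
            stop-yarn.sh
            mr-jobhistory-daemon.sh stop historyserver
            ;;
        *)
            echo "usage: hadoop_cluster start|stop" >&2
            return 2
            ;;
    esac
}
```

On the master, `hadoop_cluster start` and `hadoop_cluster stop` then replace the individual commands.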

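PS: If one or two machines in the cluster fail to start, first try deleting the Hadoop temporary files:

cd /usr/local

sudo rm -r ./hadoop/tmp

sudo rm -r ./hadoop/logs/*

Then run the following (formatting erases the existing HDFS metadata, so do this only on a cluster whose data can be discarded):

hdfs namenode -format

and start the cluster again.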

This article follows the site below, and the steps were tested successfully:

http://www.powerxing.com/install-hadoop-cluster/


Original article: https://www.cdcxhl.com/article22/djsgcc.html
