A Best-Practices Guide to Installing and Using Nutch and Related Frameworks
Contents:
1. Nutch 1.2
2. Nutch 1.5.1
3. Nutch 2.0
4. Configuring SSH
5. Installing a Hadoop cluster (pseudo-distributed mode) and running Nutch
6. Installing a Hadoop cluster (fully distributed mode) and running Nutch
7. Configuring Ganglia to monitor the Hadoop and HBase clusters
8. Configuring Snappy compression for Hadoop
9. Configuring LZO compression for Hadoop
10. Configuring a ZooKeeper cluster to run HBase
11. Configuring an HBase cluster to run nutch-2.1 (the Region Servers can crash because of memory problems)
12. Configuring an Accumulo cluster to run nutch-2.1 (Gora has a bug here)
13. Configuring a Cassandra cluster to run nutch-2.1 (Cassandra uses a decentralized architecture)
14. Configuring a standalone MySQL server to run nutch-2.1
15. Using DataFileAvroStore as the data store for nutch-2.1
16. Using AvroStore as the data store for nutch-2.1
17. Configuring SOLR
18. Nagios monitoring
19. Configuring Splunk
20. Configuring Pig
21. Configuring Hive
22. Configuring a Hadoop 2.x cluster

Part 1: Nutch 1.2

The steps are largely the same as in Part 2. In step 5 (configuring the build path) two extra operations are required: in the Package Explorer on the left, right-click the nutch1.2 folder > Build Path > Configure Build Path... > select the Source tab > Default output folder: change nutch1.2/bin to nutch1.2/_bin; then right-click the bin folder under nutch1.2 > Team > Revert.

Compared with Part 2, the differences in version 1.2 are as follows (in the original document, yellow highlighting marks version-number differences, red marks items that version 1.2 does not have, and green marks items that differ):
1. Add JARs... > nutch1.2 > lib, select all the .jar files > OK
2. The URL filter file is crawl-urlfilter.txt
3. Rename crawl-urlfilter.txt.template to crawl-urlfilter.txt
4. Edit crawl-urlfilter.txt, adjusting the following section (substitute the domain you want to crawl for MY.DOMAIN.NAME):
# accept hosts in MY.DOMAIN.NAME
+^http://([a-z0-9]*\.)*MY.DOMAIN.NAME/
# skip everything else
-.
5. cd /home/ysc/workspace/nutch1.2

Nutch 1.2 is a complete search engine, whereas Nutch 1.5.1 is only a crawler. Nutch 1.2 can submit its index to SOLR or build a Lucene index directly; Nutch 1.5.1 can only submit its index to SOLR:
1. cd /home/ysc
2. wget http://mirrors.tuna./apache/tomcat/tomcat-7/v7.0.29/bin/apache-tomcat-7.0.29.tar.gz
3. tar -xvf apache-tomcat-7.0.29.tar.gz
4. In the Package Explorer on the left, right-click build.xml under the nutch1.2 folder > Run As > Ant Build... > select the war target > Run
5. cd /home/ysc/workspace/nutch1.2/build
6. unzip nutch-1.2.war -d nutch-1.2
7. cp -r nutch-1.2 /home/ysc/apache-tomcat-7.0.29/webapps
8. vi /home/ysc/apache-tomcat-7.0.29/webapps/nutch-1.2/WEB-INF/classes/nutch-site.xml and add the following configuration:
<property>
  <name>searcher.dir</name>
  <value>/home/ysc/workspace/nutch1.2/data</value>
  <description>
  Path to root of crawl. This directory is searched (in order) for either the file search-servers.txt, containing a list of distributed search servers, or the directory "index" containing merged indexes, or the directory "segments" containing segment indexes.
  </description>
</property>
9. vi /home/ysc/apache-tomcat-7.0.29/conf/server.xml and change
<Connector port="8080" protocol="HTTP/1.1" connectionTimeout="20000" redirectPort="8443"/>
to
<Connector port="8080" protocol="HTTP/1.1" connectionTimeout="20000" redirectPort="8443" URIEncoding="utf-8"/>
10. cd /home/ysc/apache-tomcat-7.0.29/bin
11. ./startup.sh
12. Visit http://localhost:8080/nutch-1.2/

For more Nutch 1.2 bug fixes and related material, see the resources I have published on CSDN: http://download.csdn.net/user/yangshangchuan
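For convenience, the shell portion of the deployment above can be gathered into one script. This is only a sketch: it assumes the standard Apache archive mirror for the Tomcat download and the /home/ysc layout used throughout this guide, and the Ant war build itself is still done from Eclipse as in step 4.

# Sketch: deploy the nutch-1.2 web application into Tomcat (paths as used in this guide)
cd /home/ysc
wget http://archive.apache.org/dist/tomcat/tomcat-7/v7.0.29/bin/apache-tomcat-7.0.29.tar.gz
tar -xvf apache-tomcat-7.0.29.tar.gz
# nutch-1.2.war is produced by the Ant "war" target run from Eclipse (step 4 above)
cd /home/ysc/workspace/nutch1.2/build
unzip nutch-1.2.war -d nutch-1.2
cp -r nutch-1.2 /home/ysc/apache-tomcat-7.0.29/webapps
# Edit nutch-site.xml (searcher.dir) and server.xml (URIEncoding) as in steps 8-9, then start Tomcat
/home/ysc/apache-tomcat-7.0.29/bin/startup.sh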
Part 2: Nutch 1.5.1

1. Download and unpack Eclipse (the IDE)
Download address: http://www.eclipse.org/downloads/ — choose Eclipse IDE for Java EE Developers
2. Install the Subclipse plugin (SVN client)
Plugin update site: http://subclipse.tigris.org/update_1.8.x
3. Install the IvyDE plugin (to download the dependency JARs)
Plugin update site: http://www.apache.org/dist/ant/ivyde/updatesite/
4. Check out the code
File > New > Project > SVN > Checkout Projects from SVN > Create a new repository location > URL: https://svn.apache.org/repos/asf/nutch/tags/release-1.5.1/ > select the URL > Finish
When the New Project wizard appears, choose Java Project > Next, enter Project name: nutch1.5.1 > Finish
5. Configure the build path
In the Package Explorer on the left, right-click the nutch1.5.1 folder > Build Path > Configure Build Path... > select the Source tab > select src > Remove > Add Folder... > select src/bin, src/java, src/test and src/testresources (for the plugins, also select the src/java and src/test folders under every plugin directory inside src/plugin) > OK
Switch to the Libraries tab > Add Class Folder... > select nutch1.5.1/conf > OK
Add JARs... > select the jar files in the lib directory of every plugin directory inside src/plugin > OK
Add Library... > IvyDE Managed Dependencies > Next > Main > Ivy File > Browse > ivy/ivy.xml > Finish
Switch to the Order and Export tab > select conf > Top
6. Run Ant
In the Package Explorer on the left, right-click build.xml under the nutch1.5.1 folder > Run As > Ant Build
Right-click the nutch1.5.1 folder > Refresh
Right-click the nutch1.5.1 folder > Build Path > Configure Build Path... > select the Libraries tab > Add Class Folder... > select build > OK
7. Edit the configuration files nutch-site.xml and regex-urlfilter.txt
Rename nutch-site.xml.template to nutch-site.xml
Rename regex-urlfilter.txt.template to regex-urlfilter.txt
Right-click the nutch1.5.1 folder > Refresh
Add the following properties to nutch-site.xml:
<property>
  <name>http.agent.name</name>
  <value>nutch</value>
</property>
<property>
  <name>http.content.limit</name>
  <value>-1</value>
</property>
Edit regex-urlfilter.txt and replace
# accept anything else
+.
with:
+^http://([a-z0-9]*\.)*/
-.
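The replacement above makes the crawler accept every http URL. If you only want to crawl one site, a narrower filter can be used instead; the following is only an illustrative sketch in which example.com is a placeholder domain, not something taken from the guide.

# Hypothetical example: limit the crawl to a single domain instead of accepting all http URLs
# (conf/regex-urlfilter.txt is the file edited in step 7; example.com is a placeholder)
cat > conf/regex-urlfilter.txt <<'EOF'
# skip file:, ftp: and mailto: urls
-^(file|ftp|mailto):
# accept pages on example.com and its subdomains
+^http://([a-z0-9]*\.)*example\.com/
# skip everything else
-.
EOF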
8. Development and debugging
In the Package Explorer on the left, right-click the nutch1.5.1 folder > New > Folder > Folder name: urls
In the new urls directory create a text file named url whose content is the seed URL to crawl.
Open the class org.apache.nutch.crawl.Crawl.java under src/java, right-click > Run As > Run Configurations > Arguments > in the Program arguments box enter: urls -dir data -depth 3 > Run
Set breakpoints where needed and run Debug As > Java Application
9. Inspecting the results
Inspecting the segments directory:
Open the class org.apache.nutch.segment.SegmentReader.java under src/java
Right-click > Run As > Java Application; the console prints the command's usage
Right-click > Run As > Run Configurations > Arguments > in the Program arguments box enter: -dump data/segments/* data/segments/dump
Open data/segments/dump/dump in a text editor to see what is stored in the segments
Inspecting the crawldb directory:
Open the class org.apache.nutch.crawl.CrawlDbReader.java under src/java
Right-click > Run As > Java Application; the console prints the command's usage
Right-click > Run As > Run Configurations > Arguments > in the Program arguments box enter: data/crawldb -stats
The console prints the crawldb statistics
Inspecting the linkdb directory:
Open the class org.apache.nutch.crawl.LinkDbReader.java under src/java
Right-click > Run As > Java Application; the console prints the command's usage
Right-click > Run As > Run Configurations > Arguments > in the Program arguments box enter: data/linkdb -dump data/linkdb_dump
Open data/linkdb_dump/part-00000 in a text editor to see what is stored in the linkdb
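The same inspections can also be done outside Eclipse through the bin/nutch wrapper, which exposes these reader classes as the readdb, readseg and readlinkdb commands. A small sketch, assuming the Eclipse runs above wrote their output to /home/ysc/workspace/nutch1.5.1/data and that the Ant build from step 6 has produced runtime/local:

# Sketch: inspect the crawl data from the command line instead of from Eclipse
cd /home/ysc/workspace/nutch1.5.1
runtime/local/bin/nutch readdb data/crawldb -stats            # crawldb statistics
seg=$(ls -d data/segments/* | tail -1)                        # pick the newest segment
runtime/local/bin/nutch readseg -dump "$seg" data/segments/dump
runtime/local/bin/nutch readlinkdb data/linkdb -dump data/linkdb_dump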
10. Whole-web crawling, step by step
In the Package Explorer on the left, right-click build.xml under the nutch1.5.1 folder > Run As > Ant Build
cd /home/ysc/workspace/nutch1.5.1/runtime/local
# Prepare the URL list
wget http://rdf.dmoz.org/rdf/content.rdf.u8.gz
gunzip content.rdf.u8.gz
mkdir dmoz
bin/nutch org.apache.nutch.tools.DmozParser content.rdf.u8 -subset 5000 > dmoz/url
# Inject the URLs
bin/nutch inject crawl/crawldb dmoz
# Generate the fetch list
bin/nutch generate crawl/crawldb crawl/segments
# First crawl round
s1=`ls -d crawl/segments/2* | tail -1`
echo $s1
# Fetch the pages
bin/nutch fetch $s1
# Parse the pages
bin/nutch parse $s1
# Update the URL status
bin/nutch updatedb crawl/crawldb $s1
# Second crawl round
bin/nutch generate crawl/crawldb crawl/segments -topN 1000
s2=`ls -d crawl/segments/2* | tail -1`
echo $s2
bin/nutch fetch $s2
bin/nutch parse $s2
bin/nutch updatedb crawl/crawldb $s2
# Third crawl round
bin/nutch generate crawl/crawldb crawl/segments -topN 1000
s3=`ls -d crawl/segments/2* | tail -1`
echo $s3
bin/nutch fetch $s3
bin/nutch parse $s3
bin/nutch updatedb crawl/crawldb $s3
# Build the inverted link database
bin/nutch invertlinks crawl/linkdb -dir crawl/segments
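The three rounds above repeat the same generate/fetch/parse/updatedb cycle, so they can also be wrapped in a loop. A minimal sketch using the same commands and directory layout (the round count and the -topN value are simply the ones used above):

# Sketch: run the crawl rounds in a loop instead of typing them out by hand
cd /home/ysc/workspace/nutch1.5.1/runtime/local
for round in 1 2 3; do
  bin/nutch generate crawl/crawldb crawl/segments -topN 1000
  seg=$(ls -d crawl/segments/2* | tail -1)
  echo "round $round: $seg"
  bin/nutch fetch "$seg"
  bin/nutch parse "$seg"
  bin/nutch updatedb crawl/crawldb "$seg"
done
# Build the inverted link database once all rounds are done
bin/nutch invertlinks crawl/linkdb -dir crawl/segments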
11. Indexing and searching
cd /home/ysc/
wget http://mirror./apache/lucene/solr/3.6.1/apache-solr-3.6.1.tgz
tar -xvf apache-solr-3.6.1.tgz
cd apache-solr-3.6.1/example
NUTCH_RUNTIME_HOME=/home/ysc/workspace/nutch1.5.1/runtime/local
APACHE_SOLR_HOME=/home/ysc/apache-solr-3.6.1
cp ${NUTCH_RUNTIME_HOME}/conf/schema.xml ${APACHE_SOLR_HOME}/example/solr/conf/
If the page content should be stored in the index, change in schema.xml
<field name="content" type="text" stored="false" indexed="true"/>
to
<field name="content" type="text" stored="true" indexed="true"/>
Edit ${APACHE_SOLR_HOME}/example/solr/conf/solrconfig.xml and replace every <str name="df">text</str> with <str name="df">content</str>
In ${APACHE_SOLR_HOME}/example/solr/conf/schema.xml change <schema name="nutch" version="1.5.1"> to <schema name="nutch" version="1.5">
# Start the SOLR server
java -jar start.jar
http://127.0.0.1:8983/solr/admin/
http://127.0.0.1:8983/solr/admin/stats.jsp
cd /home/ysc/workspace/nutch1.5.1/runtime/local
# Submit the index
bin/nutch solrindex http://127.0.0.1:8983/solr/ crawl/crawldb -linkdb crawl/linkdb crawl/segments/*
To run a complete crawl in one command:
bin/nutch crawl urls -dir data -depth 2 -topN 100 -solr http://127.0.0.1:8983/solr/
Use the following URL to page through all indexed documents:
http://127.0.0.1:8983/solr/select/?q=*%3A*&version=2.2&start=0&rows=10&indent=on
Documents whose title contains "网易":
http://127.0.0.1:8983/solr/select/?q=title%3A%E7%BD%91%E6%98%93&version=2.2&start=0&rows=10&indent=on
12. Viewing index information
cd /home/ysc/
wget /files/lukeall-3.5.0.jar
java -jar lukeall-3.5.0.jar
Path: /home/ysc/apache-solr-3.6.1/example/solr/data
13. Configuring Chinese word segmentation for SOLR
cd /home/ysc/
wget /files/mmseg4j-1.8.5.zip
unzip mmseg4j-1.8.5.zip -d mmseg4j-1.8.5
APACHE_SOLR_HOME=/home/ysc/apache-solr-3.6.1
mkdir $APACHE_SOLR_HOME/example/solr/lib
mkdir $APACHE_SOLR_HOME/example/solr/dic
cp mmseg4j-1.8.5/mmseg4j-all-1.8.5.jar $APACHE_SOLR_HOME/example/solr/lib
cp mmseg4j-1.8.5/data/*.dic $APACHE_SOLR_HOME/example/solr/dic
In ${APACHE_SOLR_HOME}/example/solr/conf/schema.xml replace
<tokenizer class="solr.WhitespaceTokenizerFactory"/>
and
<tokenizer class="solr.StandardTokenizerFactory"/>
with
<tokenizer class="com.chenlb.mmseg4j.solr.MMSegTokenizerFactory" mode="complex" dicPath="/home/ysc/apache-solr-3.6.1/example/solr/dic"/>
# Restart the SOLR server
java -jar start.jar
# Rebuild the index; this shows how to do it from the development environment
Open the class org.apache.nutch.indexer.solr.SolrIndexer.java under src/java
Right-click > Run As > Java Application; the console prints the command's usage
Right-click > Run As > Run Configurations > Arguments > in the Program arguments box enter: http://127.0.0.1:8983/solr/ data/crawldb -linkdb data/linkdb data/segments/*
Reopen the index with Luke and you will see that the segmentation is now in effect.

Part 3: Nutch 2.0

The steps for nutch2.0 are the same as those for nutch1.5.1 in Part 2, but before step 8 (development and debugging) the following configuration is needed:
In the Package Explorer on the left, right-click the nutch2.0 folder > New > Folder > Folder name: data
Then specify how the data is stored, choosing one of the following:
1. Using MySQL as the data store
1) Add the following to nutch2.0/conf/nutch-site.xml:
<property>
  <name>storage.data.store.class</name>
  <value>org.apache.gora.sql.store.SqlStore</value>
</property>
2) In nutch2.0/conf/gora.properties change
gora.sqlstore.jdbc.driver=org.hsqldb.jdbc.JDBCDriver
gora.sqlstore.jdbc.url=jdbc:hsqldb:hsql://localhost/nutchtest
gora.sqlstore.jdbc.user=sa
gora.sqlstore.jdbc.password=
to
gora.sqlstore.jdbc.driver=com.mysql.jdbc.Driver
gora.sqlstore.jdbc.url=jdbc:mysql://127.0.0.1:3306/nutch2
gora.sqlstore.jdbc.user=root
gora.sqlstore.jdbc.password=ROOT
3) Enable the mysql-connector-java dependency in nutch2.0/ivy/ivy.xml
4) sudo apt-get install mysql-server
2. Using HBase as the data store
1) Add the following to nutch2.0/conf/nutch-site.xml:
<property>
  <name>storage.data.store.class</name>
  <value>org.apache.gora.hbase.store.HBaseStore</value>
</property>
2) Enable the gora-hbase dependency in nutch2.0/ivy/ivy.xml
3) cd /home/ysc
4) wget http://mirror./apache/hbase/hbase-0.90.5/hbase-0.90.5.tar.gz
5) tar -xvf hbase-0.90.5.tar.gz
6) vi hbase-0.90.5/conf/hbase-site.xml and add the following configuration:
<property>
  <name>hbase.rootdir</name>
  <value>file:///home/ysc/hbase-0.90.5-database</value>
</property>
7) hbase-0.90.5/bin/start-hbase.sh
8) Add /home/ysc/hbase-0.90.5/hbase-0.90.5.jar to the Eclipse build path

Part 4: Configuring SSH

There are three machines: devcluster01, devcluster02 and devcluster03. Perform the following on each machine:
1. sudo vi /etc/hosts and add:
192.168.1.1 devcluster01
192.168.1.2 devcluster02
192.168.1.3 devcluster03 2、安装 SSH 服务: sudo apt-get install openssh-server 3、(有提示的时候回车键确认) ssh-keygen -t rsa 该命令会在用户主目录下创建 .ssh 目录,并在其中创建两个文件:id_rsa 私钥文件。是基 于 RSA 算法创建。该私钥文件要妥善保管,不要泄漏。id_rsa.pub 公钥文件。和 id_rsa 文 件是一对儿,该文件作为公钥文件,可以公开。 4、cp .ssh/id_rsa.pub .ssh/authorized_keys 把 三 台 机 器 devcluster01 , devcluster02 , devcluster03 的 文 件 /home/ysc/.ssh/authorized_keys 的内容复制出来合并成一个文件并替换每一台机器上的 /home/ysc/.ssh/authorized_keys 文件 在 devcluster01 上面执行时,以下两条命令的主机为 02 和 03 在 devcluster02 上面执行时,以下两条命令的主机为 01 和 03 在 devcluster03 上面执行时,以下两条命令的主机为 01 和 02 5、ssh-copy-id -i .ssh/id_rsa.pub ysc@ devcluster02 6、ssh-copy-id -i .ssh/id_rsa.pub ysc@ devcluster03 以上两条命令实际上是将 .ssh/id_rsa.pub 公钥文件追加到远程主机 server 的 user 主目录 下的 .ssh/authorized_keys 文件中。 五、安装 Hadoop Cluster(伪分布式运行模式)并运行 Nutch 步 骤 和 四 大 同 小 异 , 只 需 要 1 台 机 器 devcluster01 , 所 以黄 色 背 景 部 分全 部 设 置 为 devcluster01,不需要第 11 步 六、安装 Hadoop Cluster(分布式运行模式)并运行 Nutch 三台机器 devcluster01, devcluster02, devcluster03(vi /etc/hostname) 使用用户 ysc 登陆 devcluster01: 1、cd /home/ysc 2 、 wget http://mirrors.tuna./apache/hadoop/common/hadoop-1.1.1/hadoop-1.1.1-bin.ta r.gz 3、tar -xvf hadoop-1.1.1-bin.tar.gz 4、cd hadoop-1.1.1 5、vi conf/masters 替换内容为: devcluster01 6、vi conf/slaves 替换内容为: devcluster02 devcluster03 7、vi conf/core-site.xml 加入配置: &property& &name&fs.default.name&/name& &value&hdfs://devcluster01:9000&/value& &description&
Where to find the Hadoop Filesystem through the network. Note 9000 is not the default port. (This is slightly changed from previous versions which didnt have &hdfs&) &/description& &/property& &property& &name&hadoop.security.authorization&/name& &value&true&/value& &/property& 编辑 conf/hadoop-policy.xml 8、vi conf/hdfs-site.xml 加入配置: &property& &name&dfs.name.dir&/name& &value&/home/ysc/dfs/filesystem/name&/value& &/property& &property& &name&dfs.data.dir&/name& &value&/home/ysc/dfs/filesystem/data&/value& &/property& &property& &name&dfs.replication&/name& &value&1&/value& &/property& &property& &name&dfs.block.size&/name& &value&&/value& &description&The default block size for new files.&/description& &/property& 9、vi conf/mapred-site.xml 加入配置: &property& &name&mapred.job.tracker&/name& &value&devcluster01:9001&/value& &description& The host and port that the MapReduce job tracker runs at. If &local&, then jobs are run in-process as a single map and reduce task. Note 9001 is not the default port. &/description& &/property& &property& &name&mapred.reduce.tasks.speculative.execution&/name& &value&false&/value&
&description&If true, then multiple instances of some reduce tasks may be executed in parallel.&/description& &/property& &property& &name&mapred.map.tasks.speculative.execution&/name& &value&false&/value& &description&If true, then multiple instances of some map tasks may be executed in parallel.&/description& &/property& &property& &name&mapred.child.java.opts&/name& &value&-Xmx2000m&/value& &/property& &property& &name&mapred.tasktracker.map.tasks.maximum&/name& &value&4&/value& &description& the core number of host &/description& &/property& &property& &name&mapred.map.tasks&/name& &value&4&/value& &/property& &property& &name&mapred.tasktracker.reduce.tasks.maximum&/name& &value&4&/value& &description& define mapred.map tasks to be number of slave hosts.the best number is the number of slave hosts plus the core numbers of per host &/description& &/property& &property& &name&mapred.reduce.tasks&/name& &value&4&/value& &description& define mapred.reduce tasks to be number of slave hosts.the best number is the number of slave hosts plus the core numbers of per host &/description& &/property& &property& &name&pression.type&/name& &value&BLOCK&/value& &description&If the job outputs are to compressed as SequenceFiles, how should they be
compressed? Should be one of NONE, RECORD or BLOCK. &/description& &/property& &property& &name&press&/name& &value&true&/value& &description&Should the job outputs be compressed? &/description& &/property& &property& &name&press.map.output&/name& &value&true&/value& &description&Should the outputs of the maps sent across the network. Uses SequenceFile compression. &/description& &/property& &property& &name&mapred.system.dir&/name& &value&/home/ysc/mapreduce/system&/value& &/property& &property& &name&mapred.local.dir&/name& &value&/home/ysc/mapreduce/local&/value& &/property& be compressed before being 10、vi conf/hadoop-env.sh 追加: export JAVA_HOME=/home/ysc/jdk1.7.0_05 export HADOOP_HEAPSIZE=2000 #替换掉默认的垃圾回收器,因为默认的垃圾回收器在多线程环境下会有更多的 wait 等待 export HADOOP_OPTS=&-server -Xmn256m -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70& 11、复制 HADOOP 文件 scp -r /home/ysc/hadoop-1.1.1 :/home/ysc/hadoop-1.1.1 scp -r /home/ysc/hadoop-1.1.1 :/home/ysc/hadoop-1.1.1 12、sudo vi /etc/profile 追加并重启系统: export PATH=/home/ysc/hadoop-1.1.1/bin:$PATH 13、格式化名称节点并启动集群 hadoop namenode -format start-all.sh 14、cd /home/ysc/workspace/nutch1.5.1/runtime/deploy mkdir urls echo
> urls/url
hadoop dfs -put urls urls
bin/nutch crawl urls -dir data -depth 2 -topN 100 15 、 访 问 http://localhost:50030 可 以 查 看 JobTracker 的 运 行 状 态 。 访 问 http://localhost:50060 可以查看 TaskTracker 的运行状态。访问 http://localhost:50070 可以 查看 NameNode 以及整个分布式文件系统的状态,浏览分布式文件系统中的文件以及 log 等 16、通过 stop-all.sh 停止集群 17、 如果 NameNode 和 SecondaryNameNode 不在同一台机器上, 则在 SecondaryNameNode 的 conf/hdfs-site.xml 文件中加入配置: &property& &name&dfs.http.address&/name& &value&namenode:50070&/value& &/property& 七、配置 Ganglia 监控 Hadoop 集群和 HBase 集群 1、服务器端(安装到 master devcluster01 上) 1) 、ssh devcluster01 2) 、addgroup ganglia adduser --ingroup ganglia ganglia 3) 、sudo apt-get install ganglia-monitor ganglia-webfront gmetad //补充:在 Ubuntu10.04 上,ganglia-webfront 这个 package 名字叫 ganglia-webfrontend //如果 install 出错,则运行 sudo apt-get update,如果 update 出错,则删除出错路径 4) /etc/ganglia/gmond.conf 、vi 先找到 setuid = yes,改成 setuid = 在找到 cluster 块中的 name,改成 name =”hadoop-cluster”; 5) 、sudo apt-get install rrdtool 6)、vi /etc/ganglia/gmetad.conf 在这个配置文件中增加一些 datasource,即其他 2 个被监控的节点,增加以下内容: data_source “hadoop-cluster” devcluster01:8649 devcluster02:8649 devcluster03:8649 gridname &Hadoop& 2、数据源端(安装到所有 slaves 上) 1)、ssh devcluster02 addgroup ganglia adduser --ingroup ganglia ganglia sudo apt-get install ganglia-monitor 2)、ssh devcluster03 addgroup ganglia adduser --ingroup ganglia ganglia sudo apt-get install ganglia-monitor 3) 、ssh devcluster01 scp /etc/ganglia/gmond.conf devcluster02:/etc/ganglia/gmond.conf scp /etc/ganglia/gmond.conf devcluster03:/etc/ganglia/gmond.conf 3、配置 WEB 1) 、ssh devcluster01
2) 、sudo ln -s /usr/share/ganglia-webfrontend /var/www/ganglia 3) /etc/apache2/apache2.conf 、vi 添加: ServerName devcluster01 4、重启服务 1) 、ssh devcluster02 sudo /etc/init.d/ganglia-monitor restart ssh devcluster03 sudo /etc/init.d/ganglia-monitor restart 2) 、ssh devcluster01 sudo /etc/init.d/ganglia-monitor restart sudo /etc/init.d/gmetad restart sudo /etc/init.d/apache2 restart 5、访问页面 http:// devcluster01/ganglia 6、集成 hadoop 1) 、ssh devcluster01 2) 、cd /home/ysc/hadoop-1.1.1 3) conf/hadoop-metrics2.properties 、vi # 大 于 0.20 以 后 的 版 本 用 ganglia31 *.sink.ganglia.class=org.apache.hadoop.metrics2.sink.ganglia.GangliaSink31 *.sink.ganglia.period=10 # default for supportsparse is false *.sink.ganglia.supportsparse=true *.sink.ganglia.slope=jvm.metrics.gcCount=zero,jvm.metrics.memHeapUsedM=both *.sink.ganglia.dmax=jvm.metrics.threadsBlocked=70,jvm.metrics.memHeapUsedM=40 #广播 IP 地址,这是缺省的,统一设该值(只能用组播地址 239.2.11.71) namenode.sink.ganglia.servers=239.2.11.71:8649 datanode.sink.ganglia.servers=239.2.11.71:8649 jobtracker.sink.ganglia.servers=239.2.11.71:8649 tasktracker.sink.ganglia.servers=239.2.11.71:8649 maptask.sink.ganglia.servers=239.2.11.71:8649 reducetask.sink.ganglia.servers=239.2.11.71:8649 dfs.class=org.apache.hadoop.metrics.ganglia.GangliaContext31 dfs.period=10 dfs.servers=239.2.11.71:8649 mapred.class=org.apache.hadoop.metrics.ganglia.GangliaContext31 mapred.period=10 mapred.servers=239.2.11.71:8649 jvm.class=org.apache.hadoop.metrics.ganglia.GangliaContext31 jvm.period=10 jvm.servers=239.2.11.71:8649 4 ) 、 scp conf/hadoop-metrics2.properties :/home/ysc/hadoop-1.1.1/conf/hadoop-metrics2.properties
5 ) 、 scp conf/hadoop-metrics2.properties :/home/ysc/hadoop-1.1.1/conf/hadoop-metrics2.properties 6) 、stop-all.sh 7) 、start-all.sh 7、集成 hbase 1) 、ssh devcluster01 2) 、cd /home/ysc/hbase-0.92.2 3) conf/hadoop-metrics.properties(只能用组播地址 239.2.11.71) 、vi hbase.extendedperiod = 3600 hbase.class=org.apache.hadoop.metrics.ganglia.GangliaContext31 hbase.period=10 hbase.servers=239.2.11.71:8649 jvm.class=org.apache.hadoop.metrics.ganglia.GangliaContext31 jvm.period=10 jvm.servers=239.2.11.71:8649 rpc.class=org.apache.hadoop.metrics.ganglia.GangliaContext31 rpc.period=10 rpc.servers=239.2.11.71:8649 4 ) 、 scp conf/hadoop-metrics.properties hbase-0.92.2/conf/hadoop-metrics.properties 5 ) 、 scp conf/hadoop-metrics.properties hbase-0.92.2/conf/hadoop-metrics.properties :/home/ysc/ :/home/ysc/ 6) 、stop-hbase.sh 7) 、start-hbase.sh 八、Hadoop 配置 Snappy 压缩 1、wget /files/snappy-1.0.5.tar.gz 2、tar -xzvf snappy-1.0.5.tar.gz 3、cd snappy-1.0.5 4、./configure 5、make 6、make install 7 、 scp /usr/local/lib/libsnappy* devcluster01:/home/ysc/hadoop-1.1.1/lib/native/Linux-amd64-64/ scp /usr/local/lib/libsnappy* devcluster02:/home/ysc/hadoop-1.1.1/lib/native/Linux-amd64-64/ scp /usr/local/lib/libsnappy* devcluster03:/home/ysc/hadoop-1.1.1/lib/native/Linux-amd64-64/ 8、vi /etc/profile 追加: export LD_LIBRARY_PATH=/home/ysc/hadoop-1.1.1/lib/native/Linux-amd64-64 9、修改 mapred-site.xml &property& &name&pression.type&/name& &value&BLOCK&/value&
&description&If the job outputs are to compressed as SequenceFiles, how should they be compressed? Should be one of NONE, RECORD or BLOCK. &/description& &/property& &property& &name&press&/name& &value&true&/value& &description&Should the job outputs be compressed? &/description& &/property& &property& &name&press.map.output&/name& &value&true&/value& &description&Should the outputs of the maps be compressed before being sent across the network. Uses SequenceFile compression. &/description& &/property& &property& &name&mapred.pression.codec&/name& &value&org.apache.press.SnappyCodec&/value& &description&If the map outputs are compressed, how should they be compressed? &/description& &/property& &property& &name&pression.codec&/name& &value&org.apache.press.SnappyCodec&/value& &description&If the job outputs are compressed, how should they be compressed? &/description& &/property& 九、Hadoop 配置 Lzo 压缩 1、wget /opensource/lzo/download/lzo-2.06.tar.gz 2、tar -zxvf lzo-2.06.tar.gz 3、cd lzo-2.06 4、./configure --enable-shared 5、make 6、make install 7、scp /usr/local/lib/liblzo2.* devcluster01:/lib/x86_64-linux-gnu scp /usr/local/lib/liblzo2.* devcluster02:/lib/x86_64-linux-gnu scp /usr/local/lib/liblzo2.* devcluster03:/lib/x86_64-linux-gnu 8 、 wget http://hadoop-gpl-compression.apache-extras./files/hadoop-gpl-compression0.1.0-rc0.tar.gz 9、tar -xzvf hadoop-gpl-compression-0.1.0-rc0.tar.gz
10、cd hadoop-gpl-compression-0.1.0 11、cp lib/native/Linux-amd64-64/* /home/ysc/hadoop-1.1.1/lib/native/Linux-amd64-64/ 12、cp hadoop-gpl-compression-0.1.0.jar /home/ysc/hadoop-1.1.1/lib/(这里 hadoop 集群的版 本要和 compression 使用的版本一致) 13、scp -r /home/ysc/hadoop-1.1.1/lib devcluster02:/home/ysc/hadoop-1.1.1/ scp -r /home/ysc/hadoop-1.1.1/lib devcluster03:/home/ysc/hadoop-1.1.1/ 14、vi /etc/profile 追加: export LD_LIBRARY_PATH=/home/ysc/hadoop-1.1.1/lib/native/Linux-amd64-64 15、修改 core-site.xml &property& &name&pression.codecs&/name& &value&pression.lzo.LzoCodec,org.apache.press.DefaultCodec,or g.apache.press.GzipCodec,org.apache.press.BZip2Codec,org.apach e.press.SnappyCodec&/value& &description&A list of the compression codec classes that can be used for compression/decompression.&/description& &/property& &property& &name&pression.codec.lzo.class&/name& &value&pression.lzo.LzoCodec&/value& &/property& &property& &name&fs.trash.interval&/name& &value&1440&/value& &description&Number of minutes between trash checkpoints. If zero, the trash feature is disabled. &/description& &/property& 16、修改 mapred-site.xml &property& &name&pression.type&/name& &value&BLOCK&/value& &description&If the job outputs are to compressed as SequenceFiles, how should they be compressed? Should be one of NONE, RECORD or BLOCK. &/description& &/property& &property& &name&press&/name& &value&true&/value& &description&Should the job outputs be compressed? &/description& &/property& &property&
&name&press.map.output&/name& &value&true&/value& &description&Should the outputs of the maps be compressed before being sent across the network. Uses SequenceFile compression. &/description& &/property& &property& &name&mapred.pression.codec&/name& &value&pression.lzo.LzoCodec&/value& &description&If the map outputs are compressed, how should they be compressed? &/description& &/property& &property& &name&pression.codec&/name& &value&pression.lzo.LzoCodec&/value& &description&If the job outputs are compressed, how should they be compressed? &/description& &/property& 十、配置 zookeeper 集群以运行 hbase 1、ssh devcluster01 2、cd /home/ysc 3、wget http://mirror./apache/zookeeper/stable/zookeeper-3.4.5.tar.gz 4、tar -zxvf zookeeper-3.4.5.tar.gz 5、cd zookeeper-3.4.5 6、cp conf/zoo_sample.cfg conf/zoo.cfg 7、vi conf/zoo.cfg 修改:dataDir=/home/ysc/zookeeper 添加: server.1=devcluster01: server.2=devcluster02: server.3=devcluster03: maxClientCnxns=100 8、scp -r zookeeper-3.4.5 devcluster01:/home/ysc scp -r zookeeper-3.4.5 devcluster02:/home/ysc scp -r zookeeper-3.4.5 devcluster03:/home/ysc 9、分别在三台机器上面执行: ssh devcluster01 mkdir /home/ysc/zookeeper(注:dataDir 是 zookeeper 的数据目录,需要手动创建) echo 1 & /home/ysc/zookeeper/myid ssh devcluster02 mkdir /home/ysc/zookeeper echo 2 & /home/ysc/zookeeper/myid ssh devcluster03
mkdir /home/ysc/zookeeper echo 3 & /home/ysc/zookeeper/myid 10、分别在三台机器上面执行: cd /home/ysc/zookeeper-3.4.5 bin/zkServer.sh start bin/zkCli.sh -server devcluster01:2181 bin/zkServer.sh status 十一、配置 Hbase 集群以运行 nutch-2.1(Region Servers 会因为内存的问题宕机) 1、nutch-2.1 使用 gora-0.2.1, gora-0.2.1 使用 hbase-0.90.4,hbase-0.90.4 和 hadoop-1.1.1 不 兼容,hbase-0.94.4 和 gora-0.2.1 不兼容,hbase-0.92.2 没问题。hbase 存在系统时间同步的 问题,并且误差要再 30s 以内。 sudo apt-get install ntp sudo ntpdate -u 210.72.145.44 2、 HBase 是数据库, 会在同一时间使用很多的文件句柄。 大多数 linux 系统使用的默认值 1024 是 不 能 满 足 的 。 还 需 要 修 改 hbase 用 户 的 nproc , 在 压 力 下 , 如 果 过 低 会 造 成 OutOfMemoryError 异常。 vi /etc/security/limits.conf 添加: ysc soft nproc 32000 ysc hard nproc 32000 ysc soft nofile 32768 ysc hard nofile 32768 vi /etc/pam.d/common-session 添加: session required pam_limits.so 3、登陆 master,下载并解压 hbase ssh devcluster01 cd /home/ysc wget /hbase/hbase-0.92.2/hbase-0.92.2.tar.gz tar -zxvf hbase-0.92.2.tar.gz cd hbase-0.92.2 4、修改配置文件 hbase-env.sh vi conf/hbase-env.sh 追加: export JAVA_HOME=/home/ysc/jdk1.7.0_05 export HBASE_MANAGES_ZK=false export HBASE_HEAPSIZE=10000 #替换掉默认的垃圾回收器,因为默认的垃圾回收器在多线程环境下会有更多的 wait 等待 export HBASE_OPTS=&-server -Xmn256m -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70& 5、修改配置文件 hbase-site.xml vi conf/hbase-site.xml &property& &name&hbase.rootdir&/name&
&value&hdfs://devcluster01:9000/hbase&/value& &/property& &property& &name&hbase.cluster.distributed&/name& &value&true&/value& &/property& &property& &name&hbase.zookeeper.quorum&/name& &value&devcluster01,devcluster02,devcluster03&/value& &/property& &property& &name&hfile.block.cache.size&/name& &value&0.25&/value& &description& Percentage of maximum heap (-Xmx setting) to allocate to block cache used by HFile/StoreFile. Default of 0.25 means allocate 25%. Set to 0 to disable but it&#39;s not recommended. &/description& &/property& &property& &name&hbase.regionserver.global.memstore.upperLimit&/name& &value&0.4&/value& &description&Maximum size of all memstores in a region server before new updates are blocked and flushes are forced. Defaults to 40% of heap &/description& &/property& &property& &name&hbase.regionserver.global.memstore.lowerLimit&/name& &value&0.35&/value& &description&When memstores are being forced to flush to make room in memory, keep flushing until we hit this mark. Defaults to 35% of heap. This value equal to hbase.regionserver.global.memstore.upperLimit causes the minimum possible flushing to occur when updates are blocked due to memstore limiting. &/description& &/property& &property& &name&hbase.hregion.majorcompaction&/name& &value&0&/value& &description&The time (in miliseconds) between &#39;major&#39; compactions of all HStoreFiles in a region. Default: 1 day. Set to 0 to disable automated major compactions. &/description& &/property&
6、修改配置文件 regionservers vi conf/regionservers devcluster01 devcluster02 devcluster03 7、 因为 HBase 建立在 Hadoop 之上, Hadoop 使用的 hadoop*.jar 和 HBase 使用的必须一致。 所以要将 HBase lib 目录下的 hadoop*.jar 替换成 Hadoop 里面的那个,防止版本冲突。 cp /home/ysc/hadoop-1.1.1/hadoop-core-1.1.1.jar /home/ysc/hbase-0.92.2/lib rm /home/ysc/hbase-0.92.2/lib/hadoop-core-1.0.3.jar 8、复制文件到 regionservers scp -r /home/ysc/hbase-0.92.2 devcluster01:/home/ysc scp -r /home/ysc/hbase-0.92.2 devcluster02:/home/ysc scp -r /home/ysc/hbase-0.92.2 devcluster03:/home/ysc 9、启动 hadoop 并创建目录 hadoop fs -mkdir /hbase 10、管理 HBase 集群: 启动初始 HBase 集群: bin/start-hbase.sh 停止 HBase 集群: bin/stop-hbase.sh 启动额外备份主服务器,可以启动到 9 个备份服务器 (总数 10 个): bin/local-master-backup.sh start 1 bin/local-master-backup.sh start 2 3 启动更多 regionservers, 支持到 99 个额外 regionservers (总 100 个): bin/local-regionservers.sh start 1 bin/local-regionservers.sh start 2 3 4 5 停止备份主服务器: cat /tmp/hbase-ysc-1-master.pid |xargs kill -9 停止单独 regionserver: bin/local-regionservers.sh stop 1 使用 HBase 命令行模式: bin/hbase shell 11、web 界面 http://devcluster01:60010 http://devcluster01:60030 12、如运行 nutch2.1 则方法一: cp conf/hbase-site.xml /home/ysc/nutch-2.1/conf cd /home/ysc/nutch-2.1 ant cd runtime/deploy unzip -d apache-nutch-2.1 apache-nutch-2.1.job rm apache-nutch-2.1.job cd apache-nutch-2.1 rm lib/hbase-0.90.4.jar
cp /home/ysc/hbase-0.92.2/hbase-0.92.2.jar lib zip -r ../apache-nutch-2.1.job ./* cd .. rm -r apache-nutch-2.1 13、如运行 nutch2.1 则方法二: cp conf/hbase-site.xml /home/ysc/nutch-2.1/conf cd /home/ysc/nutch-2.1 cp /home/ysc/hbase-0.92.2/hbase-0.92.2.jar lib ant cd runtime/deploy zip -d apache-nutch-2.1.job lib/hbase-0.90.4.jar 启用 snappy 压缩: 1、vi conf/gora-hbase-mapping.xml 在 family 上面添加属性:compression=&SNAPPY& 2、mkdir /home/ysc/hbase-0.92.2/lib/native/Linux-amd64-64 3 、 cp /home/ysc/hadoop-1.1.1/lib/native/Linux-amd64-64/* /home/ysc/hbase-0.92.2/lib/native/Linux-amd64-64 4、vi /home/ysc/hbase-0.92.2/conf/hbase-site.xml 增加: &property& &name&hbase.regionserver.codecs&/name& &value&snappy&/value& &/property& 十二、配置 Accumulo 集群以运行 nutch-2.1(gora 存在 BUG) 1、wget /accumulo/1.4.2/accumulo-1.4.2-dist.tar.gz 2、tar -xzvf accumulo-1.4.2-dist.tar.gz 3、cd accumulo-1.4.2 4、cp conf/examples/3GB/standalone/* conf 5、vi conf/accumulo-env.sh export HADOOP_HOME=/home/ysc/cluster3 export ZOOKEEPER_HOME=/home/ysc/zookeeper-3.4.5 export JAVA_HOME=/home/jdk1.7.0_01 export ACCUMULO_HOME=/home/ysc/accumulo-1.4.2 6、vi conf/slaves devcluster01 devcluster02 devcluster03 7、vi conf/masters devcluster01 8、vi conf/accumulo-site.xml &property& &name&instance.zookeeper.host&/name& &value&host6:2181,host8:2181&/value&
&description&comma separated list of zookeeper servers&/description& &/property& &property& &name&logger.dir.walog&/name& &value&walogs&/value& &description&The directory used to store write-ahead logs on the local filesystem. It is possible to specify a comma-separated list of directories.&/description& &/property& &property& &name&instance.secret&/name& &value&ysc&/value& &description&A secret unique to a given instance that all servers must know in order to communicate with one another. Change it before initialization. To change it later use ./bin/accumulo org.apache.accumulo.server.util.ChangeSecret [oldpasswd] [newpasswd], and then update this file. &/description& &/property& &property& &name&tserver.memory.maps.max&/name& &value&3G&/value& &/property& &property& &name&tserver.cache.data.size&/name& &value&50M&/value& &/property& &property& &name&tserver.cache.index.size&/name& &value&512M&/value& &/property& &property& &name&trace.password&/name& &!-change this to the root user&#39;s password, and/or change the user below --& &value&ysc&/value& &/property& &property& &name&trace.user&/name& &value&root&/value& &/property& 9、bin/accumulo init 10、bin/start-all.sh 11、bin/stop-all.sh
12、web 访问:http://devcluster01:50095/ 修改 nutch2.1: 1、cd /home/ysc/nutch-2.1 2、vi conf/gora.properties 增加: gora.datastore.default=org.apache.gora.accumulo.store.AccumuloStore gora.datastore.accumulo.mock=false gora.datastore.accumulo.instance=accumulo gora.datastore.accumulo.zookeepers=host6,host8 gora.datastore.accumulo.user=root gora.datastore.accumulo.password=ysc 3、vi conf/nutch-site.xml 增加: &property& &name&storage.data.store.class&/name& &value&org.apache.gora.accumulo.store.AccumuloStore&/value& &/property& 4、vi ivy/ivy.xml 增加: &dependency org=&org.apache.gora& name=&gora-accumulo& rev=&0.2.1& conf=&*-&default& /& 5、升级 accumulo cp /home/ysc/accumulo-1.4.2/lib/accumulo-core-1.4.2.jar /home/ysc/nutch-2.1/lib cp /home/ysc/accumulo-1.4.2/lib/accumulo-start-1.4.2.jar /home/ysc/nutch-2.1/lib cp /home/ysc/accumulo-1.4.2/lib/cloudtrace-1.4.2.jar /home/ysc/nutch-2.1/lib 6、ant 7、cd runtime/deploy 8、删除旧 jar zip -d apache-nutch-2.1.job lib/accumulo-core-1.4.0.jar zip -d apache-nutch-2.1.job lib/accumulo-start-1.4.0.jar zip -d apache-nutch-2.1.job lib/cloudtrace-1.4.2.jar 十三、配置 Cassandra 集群以运行 nutch-2.1(Cassandra 采用去中心化结构) 1、vi /etc/hosts(注意:需要登录到每一台机器上面,将 localhost 解析到实际地址) 192.168.1.1 localhost 2、wget /apache-mirror/cassandra/1.2.0/apache-cassandra-1.2.0-bin.tar.gz 3、tar -xzvf apache-cassandra-1.2.0-bin.tar.gz 4、cd apache-cassandra-1.2.0 5、vi conf/cassandra-env.sh 增加: MAX_HEAP_SIZE=&4G& HEAP_NEWSIZE=&800M& 6、vi conf/log4j-server.properties 修改: log4j.appender.R.File=/home/ysc/cassandra/system.log 7、vi conf/cassandra.yaml
修改: cluster_name: &#39;Cassandra Cluster&#39; data_file_directories: - /home/ysc/cassandra/data commitlog_directory: /home/ysc/cassandra/commitlog saved_caches_directory: /home/ysc/cassandra/saved_caches - seeds: &192.168.1.1& listen_address: 192.168.1.1 rpc_address: 192.168.1.1 thrift_framed_transport_size_in_mb: 1023 thrift_max_message_length_in_mb: 1024 8、vi bin/stop-server 增加: user=`whoami` pgrep -u $user -f cassandra | xargs kill -9 9、复制 cassandra 到其他节点: cd .. scp -r apache-cassandra-1.2.0 devcluster02:/home/ysc scp -r apache-cassandra-1.2.0 devcluster03:/home/ysc 分别在 devcluster02 和 devcluster03 上面修改: vi conf/cassandra.yaml listen_address: 192.168.1.2 rpc_address: 192.168.1.2 vi conf/cassandra.yaml listen_address: 192.168.1.3 rpc_address: 192.168.1.3 10、分别在 3 个节点上面运行 bin/cassandra bin/cassandra -f 参数 -f 的作用是让 Cassandra 以前端程序方式运行, 这样有利于调试 和观察日志信息, 而在实际生产环境中这个参数是不需要的 (即 Cassandra 会以 daemon 方 式运行) 11、bin/nodetool -host devcluster01 ring bin/nodetool -host devcluster01 info 12、bin/stop-server 13、bin/cassandra-cli 修改 nutch2.1: 1、cd /home/ysc/nutch-2.1 2、vi conf/gora.properties 增加: gora.cassandrastore.servers=host2:9160,host6:9160,host8:9160 3、vi conf/nutch-site.xml 增加: &property& &name&storage.data.store.class&/name&
&value&org.apache.gora.cassandra.store.CassandraStore&/value& &/property& 4、vi ivy/ivy.xml 增加: &dependency org=&org.apache.gora& name=&gora-cassandra& rev=&0.2.1& conf=&*-&default& /& 5、升级 cassandra cp /home/ysc/apache-cassandra-1.2.0/lib/apache-cassandra-1.2.0.jar /home/ysc/nutch-2.1/lib cp /home/ysc/apache-cassandra-1.2.0/lib/apache-cassandra-thrift-1.2.0.jar /home/ysc/nutch-2.1/lib cp /home/ysc/apache-cassandra-1.2.0/lib/jline-1.0.jar /home/ysc/nutch-2.1/lib 6、ant 7、cd runtime/deploy 8、删除旧 jar zip -d apache-nutch-2.1.job lib/cassandra-thrift-1.1.2.jar zip -d apache-nutch-2.1.job lib/jline-0.9.1.jar 十四、配置 MySQL 单机服务器以运行 nutch-2.1 1、apt-get install mysql-server mysql-client 2、vi /etc/f 修改: bind-address = 221.194.43.2 在[client]下增加: default-character-set=utf8 在[mysqld]下增加: default-character-set=utf8 3、mysql –uroot –pysc SHOW VARIABLES LIKE &#39;%character%&#39;; 4、service mysql restart 5、mysql –uroot –pysc GRANT ALL PRIVILEGES ON *.* TO root@&%& IDENTIFIED BY &ysc&; 6、vi conf/gora-sql-mapping.xml 修改字段的长度 &primarykey column=&id& length=&333&/& &field name=&content& column=&content& /& &field name=&text& column=&text& length=&19892&/& 7、启动 nutch 之后登陆 mysql ALTER TABLE webpage MODIFY COLUMN content MEDIUMBLOB; ALTER TABLE webpage MODIFY COLUMN text MEDIUMTEXT; ALTER TABLE webpage MODIFY COLUMN title MEDIUMTEXT; ALTER TABLE webpage MODIFY COLUMN reprUrl MEDIUMTEXT; ALTER TABLE webpage MODIFY COLUMN baseUrl MEDIUMTEXT; ALTER TABLE webpage MODIFY COLUMN typ MEDIUMTEXT; ALTER TABLE webpage MODIFY COLUMN inlinks MEDIUMBLOB; ALTER TABLE webpage MODIFY COLUMN outlinks MEDIUMBLOB;
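After a crawl run it is worth confirming that the widened columns took effect and that rows are being written. A small sketch, assuming the database is named nutch (matching the JDBC URL configured for nutch2.1 in the next step) and the root password ysc used in this section:

# Sketch: verify the webpage table after a nutch-2.1 crawl against MySQL
mysql -uroot -pysc -e "SHOW CREATE TABLE nutch.webpage\G"
mysql -uroot -pysc -e "SELECT id, title FROM nutch.webpage LIMIT 5;"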
修改 nutch2.1: 1、cd /home/ysc/nutch-2.1 2、vi conf/gora.properties 增加: gora.sqlstore.jdbc.driver=com.mysql.jdbc.Driver gora.sqlstore.jdbc.url=jdbc:mysql://host2:3306/nutch?createDatabaseIfNotExist=true&useUnico de=true&characterEncoding=utf8 gora.sqlstore.jdbc.user=root gora.sqlstore.jdbc.password=ysc 3、vi conf/nutch-site.xml 增加: &property& &name&storage.data.store.class&/name& &value&org.apache.gora.sql.store.SqlStore &/value& &/property& &property& &name&encodingdetector.charset.min.confidence&/name& &value&1&/value& &description&A integer between 0-100 indicating minimum confidence value for charset auto-detection. Any negative value disables auto-detection. &/description& &/property& 4、vi ivy/ivy.xml 增加: &dependency org=&mysql& name=&mysql-connector-java& rev=&5.1.18& conf=&*-&default&/& 十五、nutch2.1 使用 DataFileAvroStore 作为数据源 1、cd /home/ysc/nutch-2.1 2、vi conf/gora.properties 增加: gora.datafileavrostore.output.path=datafileavrostore gora.datafileavrostore.input.path=datafileavrostore 3、vi conf/nutch-site.xml 增加: &property& &name&storage.data.store.class&/name& &value&org.apache.gora.avro.store.DataFileAvroStore&/value& &/property& &property& &name&encodingdetector.charset.min.confidence&/name& &value&1&/value& &description&A integer between 0-100 indicating minimum confidence value for charset auto-detection. Any negative value disables auto-detection. &/description&
&/property& 十六、nutch2.1 使用 AvroStore 作为数据源 1、cd /home/ysc/nutch-2.1 2、vi conf/gora.properties 增加: gora.avrostore.codec.type=BINARY gora.avrostore.input.path=avrostore gora.avrostore.output.path=avrostore 3、vi conf/nutch-site.xml 增加: &property& &name&storage.data.store.class&/name& &value&org.apache.gora.avro.store.AvroStore&/value& &/property& &property& &name&encodingdetector.charset.min.confidence&/name& &value&1&/value& &description&A integer between 0-100 indicating minimum confidence value for charset auto-detection. Any negative value disables auto-detection. &/description& &/property& 十七、配置 SOLR 配置 tomcat: 1 、 wget /apache-mirror/tomcat/tomcat-7/v7.0.35/bin/apache-tomcat-7.0.35.tar.gz 2、tar -xzvf apache-tomcat-7.0.35.tar.gz 3、cd apache-tomcat-7.0.35 4、vi conf/server.xml 增加 URIEncoding=&UTF-8&: &Connector port=&8080& protocol=&HTTP/1.1& connectionTimeout=&20000& redirectPort=&8443& URIEncoding=&UTF-8&/& 5、mkdir conf/Catalina 6、mkdir conf/Catalina/localhost 7、vi conf/Catalina/localhost/solr.xml 增加: &Context path=&/solr&& &Environment name=&solr/home& type=&java.lang.String& value=&/home/ysc/solr/configuration/& override=&false&/& &/Context& 8、cd .. 下载 SOLR:
1、wget http://mirrors.tuna./apache/lucene/solr/4.1.0/solr-4.1.0.tgz 2、tar -xzvf solr-4.1.0.tgz 复制资源: 1、mkdir /home/ysc/solr 2、cp -r solr-4.1.0/example/solr /home/ysc/solr/configuration 3 、 unzip solr-4.1.0/example/webapps/solr.war /home/ysc/apache-tomcat-7.0.35/webapps/solr -d 配置 nutch: 1、复制 schema: cp /home/ysc/nutch-1.6/conf/schema-solr4.xml /home/ysc/solr/configuration/collection1/conf/schema.xml 2、vi /home/ysc/solr/configuration/collection1/conf/schema.xml 在&fields&下增加: &field name=&_version_& type=&long& indexed=&true& stored=&true&/& 配置中文分词: 1、wget /files/mmseg4j-1.9.1.v-SNAPSHOT.zip 2、unzip mmseg4j-1.9.1.v-SNAPSHOT.zip 3 、 cp mmseg4j-1.9.1-SNAPSHOT/dist/* /home/ysc/apache-tomcat-7.0.35/webapps/solr/WEB-INF/lib 4 、 unzip mmseg4j-1.9.1-SNAPSHOT/dist/mmseg4j-core-1.9.1-SNAPSHOT.jar -d mmseg4j-1.9.1-SNAPSHOT/dist/mmseg4j-core-1.9.1-SNAPSHOT 5、mkdir /home/ysc/dic 6、cp mmseg4j-1.9.1-SNAPSHOT/dist/mmseg4j-core-1.9.1-SNAPSHOT/data/* /home/ysc/dic 7、vi /home/ysc/solr/configuration/collection1/conf/schema.xml 将文件中的 &tokenizer class=&solr.WhitespaceTokenizerFactory&/& 和 &tokenizer class=&solr.StandardTokenizerFactory&/& 替换为 &tokenizer class=&com.chenlb.mmseg4j.solr.MMSegTokenizerFactory& mode=&complex& dicPath=&/home/ysc/dic&/& 配置 tomcat 本地库: 1、wget http://apache.spd.co.il/apr/apr-1.4.6.tar.gz 2、tar -xzvf apr-1.4.6.tar.gz 3、cd apr-1.4.6 4、./configure 5、make 6、make install 1、wget http://mirror./apache/apr/apr-util-1.5.1.tar.gz 2、tar -xzvf apr-util-1.5.1.tar.gz 3、cd apr-util-1.5.1 4、./configure --with-apr=/usr/local/apr 5、make 6、make install
1 、 wget http://mirror./apache//tomcat/tomcat-connectors/native/1.1.24/source/tomcat-nativ e-1.1.24-src.tar.gz 2、tar -zxvf tomcat-native-1.1.24-src.tar.gz 3、cd tomcat-native-1.1.24-src/jni/native 4、./configure --with-apr=/usr/local/apr \ --with-java-home=/home/ysc/jdk1.7.0_01 \ --with-ssl=no \ --prefix=/home/ysc/apache-tomcat-7.0.35 5、make 6、make install 7、vi /etc/profile 增加: export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/home/ysc/apache-tomcat-7.0.35/lib:/usr/local/apr/lib 8、source /etc/profile 启动 tomcat: cd apache-tomcat-7.0.35 bin/catalina.sh start http://devcluster01:8080/solr/ 十八、Nagios 监控 服务端: 1、apt-get install apache2 nagios3 nagios-nrpe-plugin 输入密码:nagiosadmin 2、apt-get install nagios3-doc 3、vi /etc/nagios3/conf.d/hostgroups_nagios2.cfg define hostgroup { hostgroup_name nagios-servers alias nagios servers members devcluster01,devcluster02,devcluster03 } 4、 cp /etc/nagios3/conf.d/localhost_nagios2.cfg /etc/nagios3/conf.d/devcluster01_nagios2.cfg vi /etc/nagios3/conf.d/devcluster01_nagios2.cfg 替换: g/localhost/s//devcluster01/g g/127.0.0.1/s//192.168.1.1/g 5、 cp /etc/nagios3/conf.d/localhost_nagios2.cfg /etc/nagios3/conf.d/devcluster02_nagios2.cfg vi /etc/nagios3/conf.d/devcluster02_nagios2.cfg 替换: g/localhost/s//devcluster02/g g/127.0.0.1/s//192.168.1.2/g 6、 cp /etc/nagios3/conf.d/localhost_nagios2.cfg /etc/nagios3/conf.d/devcluster03_nagios2.cfg vi /etc/nagios3/conf.d/devcluster03_nagios2.cfg 替换:
g/localhost/s//devcluster03/g g/127.0.0.1/s//192.168.1.3/g 7、vi /etc/nagios3/conf.d/services_nagios2.cfg 将 hostgroup_name 改为 nagios-servers 增加: # check that web services are running define service { hostgroup_name nagios-servers service_description HTTP check_command check_http use generic-service notification_interval 0 ; set & 0 if you want to be renotified } # check that ssh services are running define service { hostgroup_name nagios-servers service_description SSH check_command check_ssh use generic-service notification_interval 0 ; set & 0 if you want to be renotified } 8、vi /etc/nagios3/conf.d/extinfo_nagios2.cfg 将 hostgroup_name 改为 nagios-servers 增加: define hostextinfo{ hostgroup_name nagios-servers notes nagios-servers # http://webserver.localhost.localdomain/hostinfo.pl?host=netware1 icon_image base/debian.png icon_image_alt Debian GNU/Linux vrml_image debian.png statusmap_image base/debian.gd2 } 9、sudo /etc/init.d/nagios3 restart 10、访问 http://devcluster01/nagios3/ 用户名:nagiosadmin 密码:nagiosadmin 监控端: 1、apt-get install nagios-nrpe-server 2、vi /etc/nagios/nrpe.cfg 替换: g/127.0.0.1/s//192.168.1.1/g 3、sudo /etc/init.d/nagios-nrpe-server restart 十九、配置 Splunk notes_url
1 、 wget /releases/5.0.2/splunk/linux/splunk-5.0.2-149561-Linux-x86_64.tgz 2、tar -zxvf splunk-5.0.2-149561-Linux-x86_64.tgz 3、cd splunk 4、bin/splunk start --answer-yes --no-prompt --accept-license 5、访问 http://devcluster01:8000 用户名:admin 密码:changeme 6、添加数据 -&从 UDP 端口 -& UDP 端口 *: 1688 -&来源类型从列表 log4j -&保存 7、配置 hadoop vi /home/ysc/hadoop-1.1.1/conf/log4j.properties 修改: log4j.rootLogger=${hadoop.root.logger}, EventCounter, SYSLOG 增加: log4j.appender.SYSLOG=org.apache.log4j.net.SyslogAppender log4j.appender.SYSLOG.facility=local1 log4j.appender.SYSLOG.layout=org.apache.log4j.PatternLayout log4j.appender.SYSLOG.layout.ConversionPattern=%p %c{2}: %m%n log4j.appender.SYSLOG.SyslogHost=host6:1688 log4j.appender.SYSLOG.threshold=INFO log4j.appender.SYSLOG.Header=true log4j.appender.SYSLOG.FacilityPrinting=true 8、配置 hbase vi /home/ysc/hbase-0.92.2/conf/log4j.properties 修改: log4j.rootLogger=${hbase.root.logger},SYSLOG 增加: log4j.appender.SYSLOG=org.apache.log4j.net.SyslogAppender log4j.appender.SYSLOG.facility=local1 log4j.appender.SYSLOG.layout=org.apache.log4j.PatternLayout log4j.appender.SYSLOG.layout.ConversionPattern=%p %c{2}: %m%n log4j.appender.SYSLOG.SyslogHost=host6:1688 log4j.appender.SYSLOG.threshold=INFO log4j.appender.SYSLOG.Header=true log4j.appender.SYSLOG.FacilityPrinting=true 9、配置 nutch vi /home/lanke/ysc/nutch-2.1-hbase/conf/log4j.properties 修改: log4j.rootLogger=INFO,DRFA,SYSLOG 增加: log4j.appender.SYSLOG=org.apache.log4j.net.SyslogAppender log4j.appender.SYSLOG.facility=local1 log4j.appender.SYSLOG.layout=org.apache.log4j.PatternLayout log4j.appender.SYSLOG.layout.ConversionPattern=%p %c{2}: %m%n log4j.appender.SYSLOG.SyslogHost=host6:1688
log4j.appender.SYSLOG.threshold=INFO log4j.appender.SYSLOG.Header=true log4j.appender.SYSLOG.FacilityPrinting=true 10、启动 hadoop 和 hbase start-all.sh start-hbase.sh 二十、配置 Pig 1、wget /apache-mirror/pig/pig-0.11.0/pig-0.11.0.tar.gz 2、tar -xzvf pig-0.11.0.tar.gz 3、cd pig-0.11.0 4、vi /etc/profile 增加: export PIG_HOME=/home/ysc/pig-0.11.0 export PATH=$PIG_HOME/bin:$PATH 5、source /etc/profile 6、cp conf/log4j.properties.template conf/log4j.properties 7、vi conf/log4j.properties 8、pig 二十一、配置 Hive 1、wget /apache/hive/hive-0.10.0/hive-0.10.0.tar.gz 2、tar -xzvf hive-0.10.0.tar.gz 3、cd hive-0.10.0 4、vi /etc/profile 增加: export HIVE_HOME=/home/ysc/hive-0.10.0 export PATH=$HIVE_HOME/bin:$PATH 5、source /etc/profile 6、cp conf/hive-log4j.properties.template conf/hive-log4j.properties 7、vi conf/hive-log4j.properties 替换: log4j.appender.EventCounter=org.apache.hadoop.metrics.jvm.EventCounter 为: log4j.appender.EventCounter=org.apache.hadoop.log.metrics.EventCounter 二十二、配置 Hadoop2.x 集群 1 、 wget /apache-mirror/hadoop/common/hadoop-2.0.2-alpha/hadoop-2.0.2-alpha.t ar.gz 2、tar -xzvf hadoop-2.0.2-alpha.tar.gz 3、cd hadoop-2.0.2-alpha 4、vi etc/hadoop/hadoop-env.sh 追加: export JAVA_HOME=/home/ysc/jdk1.7.0_05 export HADOOP_HEAPSIZE=2000
5、vi etc/hadoop/core-site.xml &property& &name&fs.defaultFS&/name& &value&hdfs://devcluster01:9000&/value& &description& Where to find the Hadoop Filesystem through the network. Note 9000 is not the default port. (This is slightly changed from previous versions which didnt have &hdfs&) &/description& &/property& &property& &name&io.file.buffer.size&/name& &value&131072&/value& &description&The size of buffer for use in sequence files. The size of this buffer should probably be a multiple of hardware page size (4096 on Intel x86), and it determines how much data is buffered during read and write operations.&/description& &/property& 6、vi etc/hadoop/mapred-site.xml &property& &name&mapreduce.framework.name&/name& &value&yarn&/value& &/property& &property& &name&mapred.job.reduce.input.buffer.percent&/name& &value&1&/value& &description&The percentage of memory- relative to the maximum heap size- to retain map outputs during the reduce. When the shuffle is concluded, any remaining map outputs in memory must consume less than this threshold before the reduce can begin. &/description& &/property& &property& &name&mapred.job.shuffle.input.buffer.percent&/name& &value&1&/value& &description&The percentage of memory to be allocated from the maximum heap size to storing map outputs during the shuffle. &/description& &/property& &property& &name&mapred.inmem.merge.threshold&/name& &value&0&/value& &description&The threshold, in terms of the number of files for the in-memory merge process. When we accumulate threshold number of files
we initiate the in-memory merge and spill to disk. A value of 0 or less than 0 indicates we want to DON&#39;T have any threshold and instead depend only on the ramfs&#39;s memory consumption to trigger the merge. &/description& &/property& &property& &name&io.sort.factor&/name& &value&100&/value& &description&The number of streams to merge at once while sorting files. This determines the number of open file handles.&/description& &/property& &property& &name&io.sort.mb&/name& &value&240&/value& &description&The total amount of buffer memory to use while sorting files, in megabytes. By default, gives each merge stream 1MB, which should minimize seeks.&/description& &/property& &property& &name&mapred.pression.codec&/name& &value&org.apache.press.SnappyCodec&/value& &description&If the map outputs are compressed, how should they be compressed? &/description& &/property& &property& &name&pression.codec&/name& &value&org.apache.press.SnappyCodec&/value& &description&If the job outputs are compressed, how should they be compressed? &/description& &/property& &property& &name&pression.type&/name& &value&BLOCK&/value& &description&If the job outputs are to compressed as SequenceFiles, how should they be compressed? Should be one of NONE, RECORD or BLOCK. &/description& &/property& &property& &name&mapred.child.java.opts&/name& &value&-Xmx2000m&/value& &/property& &property& &name&press&/name&
&value&true&/value& &description&Should the job outputs be compressed? &/description& &/property& &property& &name&press.map.output&/name& &value&true&/value& &description&Should the outputs of the maps be compressed before being sent across the network. Uses SequenceFile compression. &/description& &/property& &property& &name&mapred.tasktracker.map.tasks.maximum&/name& &value&5&/value& &/property& &property& &name&mapred.map.tasks&/name& &value&15&/value& &/property& &property& &name&mapred.tasktracker.reduce.tasks.maximum&/name& &value&5&/value& &description& define mapred.map tasks to be number of slave hosts.the best number is the number of slave hosts plus the core numbers of per host &/description& &/property& &property& &name&mapred.reduce.tasks&/name& &value&15&/value& &description& define mapred.reduce tasks to be number of slave hosts.the best number is the number of slave hosts plus the core numbers of per host &/description& &/property& &property& &name&mapred.system.dir&/name& &value&/home/ysc/mapreduce/system&/value& &/property& &property& &name&mapred.local.dir&/name& &value&/home/ysc/mapreduce/local&/value& &/property& &property&
&name&mapreduce.job.counters.max&/name& &value&12000&/value& &description&Limit on the number of counters allowed per job. &/description& &/property& 7、vi etc/hadoop/yarn-site.xml &property& &name&yarn.resourcemanager.resource-tracker.address&/name& &value&devcluster01:8031&/value& &/property& &property& &name&yarn.resourcemanager.address&/name& &value&devcluster01:8032&/value& &/property& &property& &name&yarn.resourcemanager.scheduler.address&/name& &value&devcluster01:8030&/value& &/property& &property& &name&yarn.resourcemanager.admin.address&/name& &value&devcluster01:8033&/value& &/property& &property& &name&yarn.resourcemanager.webapp.address&/name& &value&devcluster01:8088&/value& &/property& &property& &description&Classpath for typical applications.&/description& &name&yarn.application.classpath&/name& &value& $HADOOP_CONF_DIR, $HADOOP_COMMON_HOME/*,$HADOOP_COMMON_HOME/lib/*, $HADOOP_HDFS_HOME/*,$HADOOP_HDFS_HOME/lib/*, $HADOOP_MAPRED_HOME/*,$HADOOP_MAPRED_HOME/lib/*, $YARN_HOME/*,$YARN_HOME/lib/* &/value& &/property& &property& &name&yarn.nodemanager.aux-services&/name& &value&mapreduce.shuffle&/value& &/property& &property& &name&yarn.nodemanager.aux-services.mapreduce.shuffle.class&/name& &value&org.apache.hadoop.mapred.ShuffleHandler&/value&
&/property& &property& &name&yarn.nodemanager.local-dirs&/name&&value&/home/ysc/h2/data/1/yarn/local,/home/ys c/h2/data/2/yarn/local,/home/ysc/h2/data/3/yarn/local&/value& &/property& &property& &name&yarn.nodemanager.log-dirs&/name&&value&/home/ysc/h2/data/1/yarn/logs,/home/ysc/ h2/data/2/yarn/logs,/home/ysc/h2/data/3/yarn/logs&/value& &/property& &property& &description&Where to aggregate logs&/description& &name&yarn.nodemanager.remote-app-log-dir&/name& &value&/home/ysc/h2/var/log/hadoop-yarn/apps&/value& &/property& &property& &name&mapreduce.jobhistory.address&/name& &value&devcluster01:10020&/value& &/property& &property& &name&mapreduce.jobhistory.webapp.address&/name& &value&devcluster01:19888&/value& &/property& 8、vi etc/hadoop/hdfs-site.xml &property& &name&dfs.permissions.superusergroup&/name& &value&root&/value& &/property& &property& &name&dfs.name.dir&/name& &value&/home/ysc/dfs/filesystem/name&/value& &/property& &property& &name&dfs.data.dir&/name& &value&/home/ysc/dfs/filesystem/data&/value& &/property& &property& &name&dfs.replication&/name& &value&3&/value& &/property& &property& &name&dfs.block.size&/name& &value&&/value& &description&The default block size for new files.&/description& &/property&
9、启动 hadoop bin/hdfs namenode -format sbin/start-dfs.sh sbin/start-yarn.sh 10、访问管理页面 http://devcluster01:8088 http://devcluster01:50070
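As a quick smoke test after starting the daemons, the standard Hadoop 2.x client commands can be used. A hedged sketch, run from the hadoop-2.0.2-alpha directory; the example jar path may differ slightly between builds:

# Sketch: basic health checks for the freshly started Hadoop 2.x cluster
bin/hdfs dfsadmin -report        # datanodes that joined the filesystem
bin/yarn node -list              # node managers registered with the resource manager
# Run a tiny example job to confirm YARN can execute MapReduce
bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar pi 2 10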