nutch-1.3 distributed mode: terminal session walkthrough
(Note: the shell prompts below show a nutch-1.2 directory tree; Hadoop is 0.20.2.)

The transcript formats HDFS, starts the Hadoop daemons, confirms the DataNode has registered, uploads a seed URL list, and runs a Nutch crawl.

Source: Internet · Edited by: 程序博客网 · Date: 2024/06/03 23:01

kaiwii@master:~/nutch-1.2/bin$ ./hadoop namenode -format
11/08/13 19:52:20 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = master/127.0.1.1
STARTUP_MSG:   args = [-format]
STARTUP_MSG:   version = 0.20.2
STARTUP_MSG:   build =
https://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20 -r 911707; compiled by 'chrisdo' on Fri Feb 19 08:07:34 UTC 2010
************************************************************/
Re-format filesystem in /home/kaiwii/tmp/hadoop/hadoop-kaiwii/dfs/name ? (Y or N) Y
11/08/13 19:52:23 INFO namenode.FSNamesystem: fsOwner=kaiwii,kaiwii,adm,dialout,cdrom,floppy,audio,dip,video,plugdev,fuse,lpadmin,admin
11/08/13 19:52:23 INFO namenode.FSNamesystem: supergroup=supergroup
11/08/13 19:52:23 INFO namenode.FSNamesystem: isPermissionEnabled=true
11/08/13 19:52:23 INFO common.Storage: Image file of size 96 saved in 0 seconds.
11/08/13 19:52:23 INFO common.Storage: Storage directory /home/kaiwii/tmp/hadoop/hadoop-kaiwii/dfs/name has been successfully formatted.
11/08/13 19:52:23 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at master/127.0.1.1
************************************************************/
kaiwii@master:~/nutch-1.2/bin$ ./start-all.sh
starting namenode, logging to /home/kaiwii/hadoop-0.20.2/logs/hadoop-kaiwii-namenode-master.out
localhost: starting datanode, logging to /home/kaiwii/hadoop-0.20.2/logs/hadoop-kaiwii-datanode-master.out
localhost: starting secondarynamenode, logging to /home/kaiwii/hadoop-0.20.2/logs/hadoop-kaiwii-secondarynamenode-master.out
starting jobtracker, logging to /home/kaiwii/hadoop-0.20.2/logs/hadoop-kaiwii-jobtracker-master.out
localhost: starting tasktracker, logging to /home/kaiwii/hadoop-0.20.2/logs/hadoop-kaiwii-tasktracker-master.out
kaiwii@master:~/nutch-1.2/bin$ ./hadoop dfsadmin -report
Configured Capacity: 0 (0 KB)
Present Capacity: 0 (0 KB)
DFS Remaining: 0 (0 KB)
DFS Used: 0 (0 KB)
DFS Used%: �%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0

-------------------------------------------------
Datanodes available: 0 (0 total, 0 dead)
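Every report so far shows "Datanodes available: 0": the DataNode process never registered with the NameNode (the "�%" above is most likely Java's DecimalFormat rendering NaN, since used/capacity is 0/0). After re-running "hadoop namenode -format", a classic cause is an "Incompatible namespaceIDs" error in the DataNode log. A sketch of that check follows; the log path mirrors the pattern printed by start-all.sh, and the ".log" suffix is an assumption (start-all.sh names the ".out" file, but the daemon also writes a ".log"):

```shell
# Hypothetical DataNode log path, patterned on the start-all.sh output above.
DN_LOG=/home/kaiwii/hadoop-0.20.2/logs/hadoop-kaiwii-datanode-master.log
if [ -f "$DN_LOG" ]; then
  # The classic post-reformat failure announces itself with this phrase:
  grep -i "Incompatible namespaceIDs" "$DN_LOG" || echo "no namespaceID mismatch logged"
else
  echo "DataNode log not found at $DN_LOG"
fi
```

If a mismatch is logged, the usual recovery on a scratch cluster is: stop-all.sh, delete the DataNode data directory (presumably under /home/kaiwii/tmp/hadoop/hadoop-kaiwii/dfs, alongside the name directory shown above), re-format, and restart. This destroys any data already in HDFS.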

kaiwii@master:~/nutch-1.2/bin$ ./hadoop dfsadmin -report
Configured Capacity: 0 (0 KB)
Present Capacity: 0 (0 KB)
DFS Remaining: 0 (0 KB)
DFS Used: 0 (0 KB)
DFS Used%: �%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0

-------------------------------------------------
Datanodes available: 0 (0 total, 0 dead)

kaiwii@master:~/nutch-1.2/bin$ ./hadoop dfsadmin -report
Configured Capacity: 0 (0 KB)
Present Capacity: 0 (0 KB)
DFS Remaining: 0 (0 KB)
DFS Used: 0 (0 KB)
DFS Used%: �%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0

-------------------------------------------------
Datanodes available: 0 (0 total, 0 dead)

kaiwii@master:~/nutch-1.2/bin$ jps
11910 SecondaryNameNode
12305 Jps
11973 JobTracker
11737 NameNode
12048 TaskTracker
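The jps listing confirms the diagnosis: NameNode, SecondaryNameNode, JobTracker and TaskTracker are all up, but there is no DataNode process at all. The same condition can be checked in a script (sketch):

```shell
# Ask jps whether a DataNode is among the running JVMs; stderr is silenced
# in case jps itself is unavailable on this machine.
if jps 2>/dev/null | grep -q "DataNode"; then
  echo "DataNode is running"
else
  echo "DataNode is NOT running"
fi
```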
kaiwii@master:~/nutch-1.2/bin$ ./hadoop dfsadmin -report
Configured Capacity: 0 (0 KB)
Present Capacity: 0 (0 KB)
DFS Remaining: 0 (0 KB)
DFS Used: 0 (0 KB)
DFS Used%: �%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0

-------------------------------------------------
Datanodes available: 0 (0 total, 0 dead)

kaiwii@master:~/nutch-1.2/bin$ ./stop-all.sh
stopping jobtracker
localhost: stopping tasktracker
stopping namenode
localhost: no datanode to stop
localhost: stopping secondarynamenode
kaiwii@master:~/nutch-1.2/bin$ ./hadoop namenode -format
11/08/13 20:02:30 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = master/127.0.1.1
STARTUP_MSG:   args = [-format]
STARTUP_MSG:   version = 0.20.2
STARTUP_MSG:   build =
https://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20 -r 911707; compiled by 'chrisdo' on Fri Feb 19 08:07:34 UTC 2010
************************************************************/
11/08/13 20:02:30 INFO namenode.FSNamesystem: fsOwner=kaiwii,kaiwii,adm,dialout,cdrom,floppy,audio,dip,video,plugdev,fuse,lpadmin,admin
11/08/13 20:02:30 INFO namenode.FSNamesystem: supergroup=supergroup
11/08/13 20:02:30 INFO namenode.FSNamesystem: isPermissionEnabled=true
11/08/13 20:02:30 INFO common.Storage: Image file of size 96 saved in 0 seconds.
11/08/13 20:02:30 INFO common.Storage: Storage directory /home/kaiwii/tmp/hadoop/hadoop-kaiwii/dfs/name has been successfully formatted.
11/08/13 20:02:30 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at master/127.0.1.1
************************************************************/
kaiwii@master:~/nutch-1.2/bin$ ./start-all.sh
starting namenode, logging to /home/kaiwii/hadoop-0.20.2/logs/hadoop-kaiwii-namenode-master.out
localhost: starting datanode, logging to /home/kaiwii/hadoop-0.20.2/logs/hadoop-kaiwii-datanode-master.out
localhost: starting secondarynamenode, logging to /home/kaiwii/hadoop-0.20.2/logs/hadoop-kaiwii-secondarynamenode-master.out
starting jobtracker, logging to /home/kaiwii/hadoop-0.20.2/logs/hadoop-kaiwii-jobtracker-master.out
localhost: starting tasktracker, logging to /home/kaiwii/hadoop-0.20.2/logs/hadoop-kaiwii-tasktracker-master.out
kaiwii@master:~/nutch-1.2/bin$ ./hadoop dfsadmin -report
Configured Capacity: 0 (0 KB)
Present Capacity: 0 (0 KB)
DFS Remaining: 0 (0 KB)
DFS Used: 0 (0 KB)
DFS Used%: �%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0

-------------------------------------------------
Datanodes available: 0 (0 total, 0 dead)
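Right after a restart the DataNode needs a few seconds to register, so the first dfsadmin -report can still show 0 nodes even when everything is healthy; that is exactly what happens here, since the very next report finds 1 live node. Instead of re-running the command by hand, the live-node count can be parsed out of the report text (awk sketch, fed with the report line format shown in this transcript):

```shell
# Extract the live-node count from a dfsadmin report; demonstrated here on
# the transcript's own "Datanodes available" line.
report="Datanodes available: 1 (1 total, 0 dead)"
count=$(printf '%s\n' "$report" | awk '/Datanodes available/ {print $3; exit}')
echo "live datanodes: ${count:-0}"
# prints "live datanodes: 1"
```

Looping on this with a short sleep until the count reaches the expected number is a convenient way to wait for a freshly started cluster.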

kaiwii@master:~/nutch-1.2/bin$ ./hadoop dfsadmin -report
Configured Capacity: 20368445440 (18.97 GB)
Present Capacity: 13561008128 (12.63 GB)
DFS Remaining: 13560983552 (12.63 GB)
DFS Used: 24576 (24 KB)
DFS Used%: 0%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0

-------------------------------------------------
Datanodes available: 1 (1 total, 0 dead)

Name: 127.0.0.1:50010
Decommission Status : Normal
Configured Capacity: 20368445440 (18.97 GB)
DFS Used: 24576 (24 KB)
Non DFS Used: 6807437312 (6.34 GB)
DFS Remaining: 13560983552(12.63 GB)
DFS Used%: 0%
DFS Remaining%: 66.58%
Last contact: Sat Aug 13 20:07:32 PDT 2011
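The byte counts and the human-readable units in the report are consistent: Hadoop divides by 1024³ for GB, so the configured capacity of 20368445440 bytes prints as 18.97 GB. The conversion can be checked directly:

```shell
# 20368445440 bytes / 1024^3 matches the "Configured Capacity" line above.
awk 'BEGIN { printf "%.2f GB\n", 20368445440 / (1024 * 1024 * 1024) }'
# prints "18.97 GB"
```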


kaiwii@master:~/nutch-1.2/bin$ ./hadoop dfsadmin -report
Configured Capacity: 20368445440 (18.97 GB)
Present Capacity: 13561008143 (12.63 GB)
DFS Remaining: 13560983552 (12.63 GB)
DFS Used: 24591 (24.01 KB)
DFS Used%: 0%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0

-------------------------------------------------
Datanodes available: 1 (1 total, 0 dead)

Name: 127.0.0.1:50010
Decommission Status : Normal
Configured Capacity: 20368445440 (18.97 GB)
DFS Used: 24591 (24.01 KB)
Non DFS Used: 6807437297 (6.34 GB)
DFS Remaining: 13560983552(12.63 GB)
DFS Used%: 0%
DFS Remaining%: 66.58%
Last contact: Sat Aug 13 20:07:35 PDT 2011


kaiwii@master:~/nutch-1.2/bin$ ./hadoop dfs -copyFromLocal ../urls urls
kaiwii@master:~/nutch-1.2/bin$ ./hadoop dfs -lsr
-rw-r--r--   1 kaiwii supergroup         18 2011-08-13 20:08 /user/kaiwii/urls
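The 18-byte /user/kaiwii/urls file listed above is the crawl seed list: a plain text file with one URL per line, copied from the local ../urls into HDFS. Creating such a file looks like this (the URL is an illustrative example; the actual content of the transcript's seed file is not shown):

```shell
# One seed URL per line; example URL, not the transcript's real seed.
printf 'http://example.com/\n' > urls.txt
cat urls.txt
# Then, from nutch-1.2/bin as in the transcript:
# ./hadoop dfs -copyFromLocal urls.txt urls
```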
kaiwii@master:~/nutch-1.2/bin$ nutch crawl urls -dir crawled -depth 3 -topN 10
bash: nutch: command not found
kaiwii@master:~/nutch-1.2/bin$ ./nutch crawl urls -dir crawled -depth 3 -topN 10
11/08/13 20:09:30 INFO crawl.Crawl: crawl started in: crawled
11/08/13 20:09:30 INFO crawl.Crawl: rootUrlDir = urls
11/08/13 20:09:30 INFO crawl.Crawl: threads = 10
11/08/13 20:09:30 INFO crawl.Crawl: depth = 3
11/08/13 20:09:30 INFO crawl.Crawl: indexer=lucene
11/08/13 20:09:30 INFO crawl.Crawl: topN = 10
11/08/13 20:09:30 INFO crawl.Injector: Injector: starting at 2011-08-13 20:09:30
11/08/13 20:09:30 INFO crawl.Injector: Injector: crawlDb: crawled/crawldb
11/08/13 20:09:30 INFO crawl.Injector: Injector: urlDir: urls
11/08/13 20:09:30 INFO crawl.Injector: Injector: Converting injected urls to crawl db entries.
11/08/13 20:09:30 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
11/08/13 20:09:36 INFO mapred.FileInputFormat: Total input paths to process : 1
11/08/13 20:09:43 INFO mapred.JobClient: Running job: job_201108132007_0001
11/08/13 20:09:44 INFO mapred.JobClient:  map 0% reduce 0%
11/08/13 20:10:30 INFO mapred.JobClient:  map 100% reduce 0%
11/08/13 20:10:52 INFO mapred.JobClient:  map 100% reduce 100%
11/08/13 20:10:55 INFO mapred.JobClient: Job complete: job_201108132007_0001
11/08/13 20:10:57 INFO mapred.JobClient: Counters: 18
11/08/13 20:10:57 INFO mapred.JobClient:   Job Counters
11/08/13 20:10:57 INFO mapred.JobClient:     Launched reduce tasks=1
11/08/13 20:10:57 INFO mapred.JobClient:     Launched map tasks=2
11/08/13 20:10:57 INFO mapred.JobClient:     Data-local map tasks=2
11/08/13 20:10:57 INFO mapred.JobClient:   FileSystemCounters
11/08/13 20:10:57 INFO mapred.JobClient:     FILE_BYTES_READ=6
11/08/13 20:10:57 INFO mapred.JobClient:     HDFS_BYTES_READ=28
11/08/13 20:10:57 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=82
11/08/13 20:10:57 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=86
11/08/13 20:10:57 INFO mapred.JobClient:   Map-Reduce Framework
11/08/13 20:10:57 INFO mapred.JobClient:     Reduce input groups=0
11/08/13 20:10:57 INFO mapred.JobClient:     Combine output records=0
11/08/13 20:10:57 INFO mapred.JobClient:     Map input records=1
11/08/13 20:10:57 INFO mapred.JobClient:     Reduce shuffle bytes=6
11/08/13 20:10:57 INFO mapred.JobClient:     Reduce output records=0
11/08/13 20:10:57 INFO mapred.JobClient:     Spilled Records=0
11/08/13 20:10:57 INFO mapred.JobClient:     Map output bytes=0
11/08/13 20:10:57 INFO mapred.JobClient:     Map input bytes=18
11/08/13 20:10:57 INFO mapred.JobClient:     Combine input records=0
11/08/13 20:10:57 INFO mapred.JobClient:     Map output records=0
11/08/13 20:10:57 INFO mapred.JobClient:     Reduce input records=0
11/08/13 20:10:57 INFO crawl.Injector: Injector: Merging injected urls into crawl db.
11/08/13 20:10:57 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
11/08/13 20:11:03 INFO mapred.FileInputFormat: Total input paths to process : 1
11/08/13 20:11:05 INFO mapred.JobClient: Running job: job_201108132007_0002
11/08/13 20:11:07 INFO mapred.JobClient:  map 0% reduce 0%
11/08/13 20:11:33 INFO mapred.JobClient:  map 100% reduce 0%
11/08/13 20:11:48 INFO mapred.JobClient:  map 100% reduce 100%
11/08/13 20:11:50 INFO mapred.JobClient: Job complete: job_201108132007_0002
11/08/13 20:11:50 INFO mapred.JobClient: Counters: 18
11/08/13 20:11:50 INFO mapred.JobClient:   Job Counters
11/08/13 20:11:50 INFO mapred.JobClient:     Launched reduce tasks=1
11/08/13 20:11:50 INFO mapred.JobClient:     Launched map tasks=1
11/08/13 20:11:50 INFO mapred.JobClient:     Data-local map tasks=1
11/08/13 20:11:50 INFO mapred.JobClient:   FileSystemCounters
11/08/13 20:11:50 INFO mapred.JobClient:     FILE_BYTES_READ=6
11/08/13 20:11:50 INFO mapred.JobClient:     HDFS_BYTES_READ=86
11/08/13 20:11:50 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=44
11/08/13 20:11:50 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=215
11/08/13 20:11:50 INFO mapred.JobClient:   Map-Reduce Framework
11/08/13 20:11:50 INFO mapred.JobClient:     Reduce input groups=0
11/08/13 20:11:50 INFO mapred.JobClient:     Combine output records=0
11/08/13 20:11:50 INFO mapred.JobClient:     Map input records=0
11/08/13 20:11:50 INFO mapred.JobClient:     Reduce shuffle bytes=0
11/08/13 20:11:50 INFO mapred.JobClient:     Reduce output records=0
11/08/13 20:11:50 INFO mapred.JobClient:     Spilled Records=0
11/08/13 20:11:50 INFO mapred.JobClient:     Map output bytes=0
11/08/13 20:11:50 INFO mapred.JobClient:     Map input bytes=0
11/08/13 20:11:50 INFO mapred.JobClient:     Combine input records=0
11/08/13 20:11:50 INFO mapred.JobClient:     Map output records=0
11/08/13 20:11:50 INFO mapred.JobClient:     Reduce input records=0
11/08/13 20:11:50 INFO crawl.Injector: Injector: finished at 2011-08-13 20:11:50, elapsed: 00:02:20
11/08/13 20:11:50 INFO crawl.Generator: Generator: starting at 2011-08-13 20:11:50
11/08/13 20:11:50 INFO crawl.Generator: Generator: Selecting best-scoring urls due for fetch.
11/08/13 20:11:50 INFO crawl.Generator: Generator: filtering: true
11/08/13 20:11:50 INFO crawl.Generator: Generator: normalizing: true
11/08/13 20:11:50 INFO crawl.Generator: Generator: topN: 10
11/08/13 20:11:50 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
11/08/13 20:11:55 INFO mapred.FileInputFormat: Total input paths to process : 1
11/08/13 20:11:57 INFO mapred.JobClient: Running job: job_201108132007_0003
11/08/13 20:11:58 INFO mapred.JobClient:  map 0% reduce 0%
11/08/13 20:12:19 INFO mapred.JobClient:  map 100% reduce 0%
11/08/13 20:12:29 INFO mapred.JobClient:  map 100% reduce 100%
11/08/13 20:12:31 INFO mapred.JobClient: Job complete: job_201108132007_0003
11/08/13 20:12:31 INFO mapred.JobClient: Counters: 17
11/08/13 20:12:31 INFO mapred.JobClient:   Job Counters
11/08/13 20:12:31 INFO mapred.JobClient:     Launched reduce tasks=1
11/08/13 20:12:31 INFO mapred.JobClient:     Launched map tasks=1
11/08/13 20:12:31 INFO mapred.JobClient:     Data-local map tasks=1
11/08/13 20:12:31 INFO mapred.JobClient:   FileSystemCounters
11/08/13 20:12:31 INFO mapred.JobClient:     FILE_BYTES_READ=6
11/08/13 20:12:31 INFO mapred.JobClient:     HDFS_BYTES_READ=86
11/08/13 20:12:31 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=44
11/08/13 20:12:31 INFO mapred.JobClient:   Map-Reduce Framework
11/08/13 20:12:31 INFO mapred.JobClient:     Reduce input groups=0
11/08/13 20:12:31 INFO mapred.JobClient:     Combine output records=0
11/08/13 20:12:31 INFO mapred.JobClient:     Map input records=0
11/08/13 20:12:31 INFO mapred.JobClient:     Reduce shuffle bytes=0
11/08/13 20:12:31 INFO mapred.JobClient:     Reduce output records=0
11/08/13 20:12:31 INFO mapred.JobClient:     Spilled Records=0
11/08/13 20:12:31 INFO mapred.JobClient:     Map output bytes=0
11/08/13 20:12:31 INFO mapred.JobClient:     Map input bytes=0
11/08/13 20:12:31 INFO mapred.JobClient:     Combine input records=0
11/08/13 20:12:31 INFO mapred.JobClient:     Map output records=0
11/08/13 20:12:31 INFO mapred.JobClient:     Reduce input records=0
11/08/13 20:12:31 WARN crawl.Generator: Generator: 0 records selected for fetching, exiting ...
11/08/13 20:12:31 INFO crawl.Crawl: Stopping at depth=0 - no more URLs to fetch.
11/08/13 20:12:31 WARN crawl.Crawl: No URLs to fetch - check your seed list and URL filters.
11/08/13 20:12:31 INFO crawl.Crawl: crawl finished: crawled
kaiwii@master:~/nutch-1.2/bin$
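Note that the crawl above did not actually fetch anything. The inject job read one seed (Map input records=1) but emitted nothing (Map output records=0), so the URL was rejected during injection, the crawldb stayed empty, and the Generator exited with "0 records selected for fetching". As the final WARN suggests, the usual culprit in Nutch 1.x is the URL filter configuration: conf/crawl-urlfilter.txt ships with a MY.DOMAIN.NAME placeholder that must be edited to match the seed's domain, and regex-urlfilter.txt can also reject the seed. A sketch of that check, assuming the stock nutch-1.2 layout relative to bin/:

```shell
# If the placeholder is still present, every seed outside MY.DOMAIN.NAME is
# filtered out at inject time.
for f in ../conf/crawl-urlfilter.txt ../conf/regex-urlfilter.txt; do
  if [ -f "$f" ]; then
    grep -n "MY.DOMAIN.NAME" "$f" || echo "$f: no placeholder left"
  else
    echo "$f not found (adjust path)"
  fi
done
```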

 
