Spark Python WordCount

Source: Internet. Editor: 程序博客网. Date: 2024/05/24 06:15
#!/usr/bin/python
# -*- coding: UTF-8 -*-
'''
Initialize SparkConf and SparkContext
(both are imported from pyspark).
'''
from pyspark import SparkConf, SparkContext

conf = SparkConf().setMaster("local").setAppName("My App")
sc = SparkContext(conf=conf)

inputFile = "hdfs://192.168.10.101:9000/input/test.txt"
outputFile = "hdfs://192.168.10.101:9000/output"

# Read the input data
lines = sc.textFile(inputFile)
# Split each line into individual words
words = lines.flatMap(lambda line: line.split(" "))
# Convert to (word, 1) key-value pairs and sum the counts per word
counts = words.map(lambda word: (word, 1)).reduceByKey(lambda a, b: a + b)
# Save the word counts to a text file; this action triggers evaluation
counts.repartition(1).saveAsTextFile(outputFile)

# stop() is an instance method, so call it on sc, not on the SparkContext class
sc.stop()
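
To check the counting logic without a cluster or HDFS, the three steps of the script above (flatMap, map, reduceByKey) can be imitated in plain Python. This is only a rough sketch, not Spark itself: the function `word_count`, its `sep` parameter, and the `sample` input are hypothetical names introduced for illustration.

```python
from collections import defaultdict
from itertools import chain


def word_count(lines, sep=" "):
    """Plain-Python stand-in for flatMap -> map -> reduceByKey (no Spark needed)."""
    # flatMap: split every line and flatten the pieces into one stream of words
    words = chain.from_iterable(line.split(sep) for line in lines)
    # map: pair each word with an initial count of 1
    pairs = ((word, 1) for word in words)
    # reduceByKey: add up the counts that share the same key
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)


# Hypothetical sample input; the second line contains a double space on purpose
sample = ["a b a", "b  c"]
result = word_count(sample)
print(result)
```

With this sample, the result contains an empty-string key, because `line.split(" ")` yields an empty token between two consecutive spaces. If that is not wanted, calling `line.split()` with no argument (here, `word_count(sample, sep=None)`) splits on any run of whitespace and drops the empty tokens; the same change should apply to the lambda passed to `flatMap` in the Spark script.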
