Sorting data by column

Source: Internet · Published: 淘宝客 知乎 · Editor: 程序博客网 · Date: 2024/05/11 01:55
For data with a known field delimiter, sort the records by the value of a chosen column.


Original data:
    hadoop@sh-hadoop:more sourText.txt
    hadoop|234|2346|sdfasdgadfgdfg
    spark|534|65745|fhsdfghdfgh
    hive|65|6585|shsfghfgh
    hbase|98|456|jhgjdfghj
    tachyon|345|567|sfhrtyhert
    kafka|455|567|dghrtyh
    storm|86|345|dgsdfg
    redis|45|56|ergerg
    sqoop|45|765|fghd
    flume|34|67|sdfgrty
    oozie|23|45|adfgdfg
    pig|54|456|dfg
    zookeeper|23|543|dfgd
    solr|75|54|ertgergt



1. Sorting with MapReduce, descending by the 2nd column:
    hadoop@sh-hadoop:/home/hadoop/blb$ hdfs dfs -text /user/hadoop/libin/input/sourText.txt | wc -l
    14
    hadoop@sh-hadoop:/home/hadoop/blb$ hdfs dfs -text /user/hadoop/libin/Domain800_level2/merge1/out1/* | wc -l
    14
    hadoop@sh-hadoop:/home/hadoop/blb$ hdfs dfs -text /user/hadoop/libin/Domain800_level2/merge1/out1/* | more
    spark|534|65745|fhsdfghdfgh
    kafka|455|567|dghrtyh
    tachyon|345|567|sfhrtyhert
    hadoop|234|2346|sdfasdgadfgdfg
    hbase|98|456|jhgjdfghj
    storm|86|345|dgsdfg
    solr|75|54|ertgergt
    hive|65|6585|shsfghfgh
    pig|54|456|dfg
    redis|45|56|ergerg
    sqoop|45|765|fghd
    flume|34|67|sdfgrty
    oozie|23|45|adfgdfg
    zookeeper|23|543|dfgd
    hadoop@sh-hadoop:/home/hadoop/blb$


2. Sorting with shell commands:
-r: sort orders ascending by default; add -r to reverse this to descending.
-n: tells sort to compare the key numerically instead of lexically.
-t: sets the field separator (here "|").
-k: selects which column to sort by, once the separator is set.
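The same column sort can be sketched in plain Java with a comparator, for the case where the data is already in memory (a minimal, hypothetical example; the delimiter and sample lines are taken from the data above):

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

public class ColumnSort {
    // Sort pipe-delimited lines in descending numeric order of the given column (0-based).
    static List<String> sortByColumnDesc(List<String> lines, int col) {
        return lines.stream()
                .sorted(Comparator.comparingLong(
                        (String line) -> Long.parseLong(line.split("\\|")[col].trim()))
                        .reversed())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "hadoop|234|2346|sdfasdgadfgdfg",
                "spark|534|65745|fhsdfghdfgh",
                "hive|65|6585|shsfghfgh");
        // Prints the spark, hadoop, and hive lines, in that order (descending by column 2).
        sortByColumnDesc(lines, 1).forEach(System.out::println);
    }
}
```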

2.1. Descending sort by the 2nd column:

sort -t "|" -nrk2 sourText.txt 

    hadoop@sh-hadoop:/home/hadoop/blb$ sort -t "|" -nrk2 sourText.txt
    spark|534|65745|fhsdfghdfgh
    kafka|455|567|dghrtyh
    tachyon|345|567|sfhrtyhert
    hadoop|234|2346|sdfasdgadfgdfg
    hbase|98|456|jhgjdfghj
    storm|86|345|dgsdfg
    solr|75|54|ertgergt
    hive|65|6585|shsfghfgh
    pig|54|456|dfg
    sqoop|45|765|fghd
    redis|45|56|ergerg
    flume|34|67|sdfgrty
    zookeeper|23|543|dfgd
    oozie|23|45|adfgdfg

2.2. Descending sort by the 3rd column:
    hadoop@sh-hadoop:/home/hadoop/blb$ sort -t "|" -nrk3 sourText.txt
    spark|534|65745|fhsdfghdfgh
    hive|65|6585|shsfghfgh
    hadoop|234|2346|sdfasdgadfgdfg
    sqoop|45|765|fghd
    tachyon|345|567|sfhrtyhert
    kafka|455|567|dghrtyh
    zookeeper|23|543|dfgd
    pig|54|456|dfg
    hbase|98|456|jhgjdfghj
    storm|86|345|dgsdfg
    flume|34|67|sdfgrty
    redis|45|56|ergerg
    solr|75|54|ertgergt
    oozie|23|45|adfgdfg


Writing the sorted output to a new file:

 sort -t "|" -nrk2 part-r-00000 > merge.txt
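The same read-sort-write step can be sketched in plain Java (a minimal sketch, not the MapReduce path; the input and output file names are taken from the command above):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

public class SortToFile {
    // Read pipe-delimited lines, sort descending by the 2nd field, write the result out.
    static void sortFileByColumn2Desc(Path in, Path out) throws IOException {
        List<String> sorted = Files.readAllLines(in).stream()
                .sorted(Comparator.comparingLong(
                        (String l) -> Long.parseLong(l.split("\\|")[1].trim()))
                        .reversed())
                .collect(Collectors.toList());
        Files.write(out, sorted);
    }

    public static void main(String[] args) throws IOException {
        sortFileByColumn2Desc(Path.of("part-r-00000"), Path.of("merge.txt"));
    }
}
```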



Appendix:

MapReduce implementation:

    import java.io.DataInput;
    import java.io.DataOutput;
    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.io.WritableComparable;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
    import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
    import org.apache.hadoop.util.GenericOptionsParser;

    public class Domain_merge {
        public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
            Configuration conf = new Configuration();
            String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
            if (otherArgs.length != 2) {
                System.err.println("Usage: Domain_merge <input> <output>");
                System.exit(2);
            }

            Job job = Job.getInstance(conf, Domain_merge.class.getSimpleName());
            job.setJarByClass(Domain_merge.class);
            job.setMapOutputKeyClass(Toptaobao500.class);
            job.setMapOutputValueClass(Text.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(NullWritable.class);
            job.setMapperClass(MyMapper2.class);
            job.setNumReduceTasks(1);  // a single reducer yields one globally sorted output file
            job.setReducerClass(MyReducer2.class);
            job.setInputFormatClass(TextInputFormat.class);
            job.setOutputFormatClass(TextOutputFormat.class);
            FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
            FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }

        /**
         * Mapper: wrap each line in a composite key that sorts by the numeric
         * 2nd column, so the shuffle delivers records to the reducer already sorted.
         */
        public static class MyMapper2 extends Mapper<LongWritable, Text, Toptaobao500, Text> {
            private final Toptaobao500 mw = new Toptaobao500();

            @Override
            protected void map(LongWritable key, Text value, Context context)
                    throws IOException, InterruptedException {
                String[] spl = value.toString().split("\\|");
                String trait = spl[0].trim();
                String uv = spl[1].trim();  // the sort column
                String pv = spl[2].trim();
                String fenlei = spl[3].trim();

                mw.setKind(trait + "|" + uv + "|" + pv + "|" + fenlei);
                mw.setCount(Long.parseLong(uv));
                context.write(mw, new Text(value));
            }
        }

        /** Reducer: emit the original lines in key (i.e. sorted) order. */
        public static class MyReducer2 extends Reducer<Toptaobao500, Text, Text, NullWritable> {
            @Override
            protected void reduce(Toptaobao500 k4, Iterable<Text> v4s, Context context)
                    throws IOException, InterruptedException {
                for (Text v4 : v4s) {
                    context.write(v4, NullWritable.get());
                }
            }
        }

        /** Composite key: the whole record plus its numeric count; sorts by count, descending. */
        public static class Toptaobao500 implements WritableComparable<Toptaobao500> {
            String kind;
            long count;

            public Toptaobao500() {
            }

            public Toptaobao500(String kind, long count) {
                this.kind = kind;
                this.count = count;
            }

            public void setKind(String kind) {
                this.kind = kind;
            }

            public void setCount(long count) {
                this.count = count;
            }

            public String getKind() {
                return this.kind;
            }

            public long getCount() {
                return this.count;
            }

            @Override
            public void write(DataOutput out) throws IOException {
                out.writeUTF(kind);
                out.writeLong(count);
            }

            @Override
            public void readFields(DataInput in) throws IOException {
                this.kind = in.readUTF();
                this.count = in.readLong();
            }

            @Override
            public int compareTo(Toptaobao500 o) {
                // Long.compare avoids the overflow risk of subtracting the counts;
                // the arguments are swapped to get descending order.
                return Long.compare(o.count, this.count);
            }

            @Override
            public String toString() {
                return this.kind;
            }
        }
    }
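A note on compareTo for numeric keys like the one above: comparing two long counts by subtraction can overflow and invert the sign, silently misordering records, while Long.compare always returns the correct sign. A minimal demonstration with extreme (hypothetical) values:

```java
public class CompareDemo {
    public static void main(String[] args) {
        long a = Long.MAX_VALUE, b = -1L;
        // Subtraction overflows: a - b wraps around to a negative number
        // even though a is actually greater than b.
        System.out.println((a - b) < 0);             // true: the subtraction trick misorders a and b
        // Long.compare gives the correct sign without any overflow.
        System.out.println(Long.compare(a, b) > 0);  // true: a really is greater than b
    }
}
```

The counts in this data set are small, so subtraction would happen to work here, but Long.compare is the safe idiom in general.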
