
Source: Internet  Editor: 程序博客网  Date: 2024/05/11 16:05


Spark Asia-Pacific Research Institute, 100-Session Public Lecture Series [Session 6: Interactive Q&A]


Q1: Can Spark Streaming join different data streams?

        Yes. Different data streams in Spark Streaming can be joined with one another.

        Spark Streaming is an extension of the core Spark API that enables high-throughput, fault-tolerant stream processing of live data streams. Data can be ingested from many sources like Kafka, Flume, Twitter, ZeroMQ or plain old TCP sockets and be processed using complex algorithms expressed with high-level functions like map, reduce, join and window.

        join(otherStream, [numTasks]): When called on two DStreams of (K, V) and (K, W) pairs, return a new DStream of (K, (V, W)) pairs with all pairs of elements for each key.
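
        To make the (K, (V, W)) result concrete, here is a minimal plain-Python sketch of what such a join computes on the data of a single batch interval. It is only a model of the semantics, not the Spark API: `join_batch`, `clicks` and `purchases` are hypothetical names chosen for illustration, and a real DStream join is applied by Spark to the batches of the two streams.

```python
from collections import defaultdict
from itertools import product


def join_batch(left, right):
    """Inner-join one batch of (K, V) pairs with one batch of (K, W) pairs."""
    # Group the values of each side by key.
    left_by_key = defaultdict(list)
    for key, value in left:
        left_by_key[key].append(value)
    right_by_key = defaultdict(list)
    for key, value in right:
        right_by_key[key].append(value)

    joined = []
    for key, left_values in left_by_key.items():
        # A key missing from the right batch yields nothing (inner join);
        # otherwise emit every (V, W) combination for that key.
        for v, w in product(left_values, right_by_key.get(key, [])):
            joined.append((key, (v, w)))
    return joined


# Hypothetical batches that arrived in the same batch interval.
clicks = [("user1", "home"), ("user1", "cart"), ("user2", "home")]
purchases = [("user1", 9.99), ("user3", 5.00)]
result = join_batch(clicks, purchases)
```

        Only `user1` appears in both batches, so `result` holds two pairs, one for each of its click values combined with its single purchase; `user2` and `user3` are dropped because they have no match on the other side.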

Q2: Are Flume and Spark Streaming suited to cluster mode?

        Yes. Both Flume and Spark Streaming were built to run on clusters.

        For input streams that receive data over the network (such as Kafka, Flume, sockets, etc.), the default persistence level is set to replicate the data to two nodes for fault-tolerance.

        Using any input source that receives data through a network: for network-based data sources like Kafka and Flume, the received input data is replicated in memory between nodes of the cluster (default replication factor is 2).
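
        The effect of a replication factor of 2 can be illustrated with a toy model in plain Python. This is a sketch under stated assumptions, not Spark's actual block placement logic: the round-robin policy, the node names, and the helpers `place_block` and `surviving_blocks` are all hypothetical. In Spark itself, two-way replication is what the `_2` suffix on storage levels such as `MEMORY_AND_DISK_SER_2` denotes.

```python
def place_block(block_id, nodes, replication=2):
    """Pick `replication` distinct nodes for a block (toy round-robin policy)."""
    if replication > len(nodes):
        raise ValueError("not enough nodes for the requested replication factor")
    start = block_id % len(nodes)
    return [nodes[(start + i) % len(nodes)] for i in range(replication)]


def surviving_blocks(placement, failed_node):
    """Return the ids of blocks that still have at least one live replica."""
    return [block for block, holders in placement.items()
            if any(node != failed_node for node in holders)]


nodes = ["node-a", "node-b", "node-c"]  # hypothetical three-node cluster

# With two replicas, every received block survives the loss of one node.
placement = {b: place_block(b, nodes) for b in range(6)}
still_readable = surviving_blocks(placement, failed_node="node-b")

# With a single replica, blocks held only by the failed node are lost.
single = {b: place_block(b, nodes, replication=1) for b in range(6)}
readable_without_replication = surviving_blocks(single, failed_node="node-b")
```

        In this model all six blocks remain readable after `node-b` fails when each block has two replicas, while the single-replica placement loses the blocks that lived only on `node-b`.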







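Q4: Is Spark Streaming used in production today?

        Spark Streaming is very easy to use in a production environment.

        It needs no separate deployment: once Spark is installed, Spark Streaming is installed as well.

        In China, companies such as 皮皮网 are already using Spark Streaming.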
