"Principles and Practice of Basic Big Data Processing Frameworks" PPT Courseware (16 Lectures), Lecture 14: Practical Operations with Spark Streaming



Integrating Flume with Spark Streaming (pull mode): (2) Create the Spark Streaming application

Create a Maven-based WordCount project in IDEA.

pom.xml file:

<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>com.liu</groupId>
  <artifactId>socketSparkStreaming</artifactId>
  <version>1.0-SNAPSHOT</version>

  <dependencies>
    <!-- Dependency on Spark Core -->
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-core_2.11</artifactId>
      <version>2.4.5</version>
    </dependency>
    <!-- Dependency on Spark Streaming -->
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-streaming_2.11</artifactId>
      <version>2.4.5</version>
    </dependency>
    <!-- Dependency on the Flume connector for Spark Streaming -->
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-streaming-flume_2.11</artifactId>
      <version>2.4.2</version>
    </dependency>
  </dependencies>
</project>

Scala class file:

import java.net.InetSocketAddress

import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.dstream.{DStream, ReceiverInputDStream}
import org.apache.spark.streaming.flume.{FlumeUtils, SparkFlumeEvent}
import org.apache.spark.streaming.{Seconds, StreamingContext}

object FlumeWordCount {
  def main(args: Array[String]): Unit = {
    // Create a StreamingContext that runs locally with two threads
    // and slices the data stream into 20-second batches
    val sc = new StreamingContext("local[2]", "flumeWordCount", Seconds(20))

    // Host and port of the Flume sink to poll
    val ncAddresses = Seq(new InetSocketAddress("localhost", 8888))

    // Pull the data from Flume
    val inputDstream: ReceiverInputDStream[SparkFlumeEvent] =
      FlumeUtils.createPollingStream(sc, ncAddresses, StorageLevel.MEMORY_ONLY)

    // Take the payload out of each Flume event and convert it to a string
    val lines: DStream[String] = inputDstream.map(x => new String(x.event.getBody.array()))

    // Split each line of the input stream into words, using a space as the delimiter
    val words = lines.flatMap(line => line.split(" "))

    // Count the number of each word within one time slice
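The slide text is truncated at this point. A minimal completion in the standard Spark Streaming word-count pattern (assumed here, not recovered from the original slide) is:

    // Assumed completion: map each word to a (word, 1) pair and sum per key
    val wordCounts = words.map(word => (word, 1)).reduceByKey(_ + _)

    // Print the counts for each batch, then start the streaming job
    wordCounts.print()
    sc.start()
    sc.awaitTermination()
  }
}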

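In pull mode, the Flume agent must deliver events to a SparkSink listening on the address the application polls (localhost:8888 above), and the spark-streaming-flume-sink jar must be on the agent's classpath. A minimal sketch of the sink section of the agent configuration, assuming a hypothetical agent named a1 with a channel c1:

# Hypothetical agent/channel names; only the sink settings are specific to pull mode
a1.sinks = spark
a1.sinks.spark.type = org.apache.spark.streaming.flume.sink.SparkSink
a1.sinks.spark.hostname = localhost
a1.sinks.spark.port = 8888
a1.sinks.spark.channel = c1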