李林超博客
首页
归档
留言
友链
动态
关于
归档
留言
友链
动态
关于
首页
大数据
正文
Spark Core案例实操(三)
Leefs
2021-10-30 PM
1635℃
0条
### 一、需求 > Top10 热门品类中每个品类的 Top10 活跃 Session 统计 **说明** 在上个需求的基础上,增加每个品类用户 session 的点击统计 ### 二、功能实现 #### 2.1 实现步骤 ``` 1.过滤原始数据,保留点击和前10品类ID 2.根据品类ID和sessionId进行点击量的统计 3.将统计的结果进行结构的转换 (( 品类ID,sessionId ),sum) => ( 品类ID,(sessionId, sum) ) 4.相同的品类进行分组 5.将分组后的数据进行点击量的排序,取前10名 ``` #### 2.2 代码 ```scala import org.apache.spark.rdd.RDD import org.apache.spark.{SparkConf, SparkContext} /** * @author lilinchao * @date 2021/10/30 * @description Top10 热门品类中每个品类的Top10活跃Session统计 **/ object HotCategoryTop10SessionAnalysis { def main(args: Array[String]): Unit = { val sparConf = new SparkConf().setMaster("local[*]").setAppName("HotCategoryTop10Analysis") val sc = new SparkContext(sparConf) val actionRDD = sc.textFile("datas/user_visit_action.txt") actionRDD.cache() val top10Ids:Array[String] = top10Category(actionRDD) //1. 过滤原始数据,保留点击和前10品类ID val filterActionRDD = actionRDD.filter( action => { val datas = action.split("_") if(datas(6) != "-1"){ top10Ids.contains(datas(6)) }else{ false } } ) //2. 根据品类ID和sessionId进行点击量的统计 val reduceRDD:RDD[((String,String),Int)] = filterActionRDD.map( action => { val datas = action.split("_") ((datas(6),datas(2)),1) } ).reduceByKey(_ + _) //3. 将统计的结果进行结构的转换 // (( 品类ID,sessionId ),sum) => ( 品类ID,(sessionId, sum) ) val mapRDD = reduceRDD.map{ case ((cid,sid),sum) => { (cid,(sid,sum)) } } //4. 相同的品类进行分组 val groupRDD: RDD[(String, Iterable[(String, Int)])] = mapRDD.groupByKey() //5. 将分组后的数据进行点击量的排序,取前10名 val resultRDD = groupRDD.mapValues( iter => { iter.toList.sortBy(_._2)(Ordering.Int.reverse).take(10) } ) resultRDD.collect().foreach(println) sc.stop() } def top10Category(actionRDD:RDD[String]) = { val flatRDD: RDD[(String, (Int, Int, Int))] = actionRDD.flatMap( action => { val datas = action.split("_") if(datas(6) != "-1"){ //点击的场合 List((datas(6),(1,0,0))) }else if(datas(8) != "null") { //下单的场合 val ids = datas(8).split(",") ids.map(id => (id,(0,1,0))) }else if(datas(10) !="null"){ //支付的场合 val ids = datas(10).split(",") ids.map(id => (id,(0,0,1))) }else{ Nil } } ) val analysisRDD = flatRDD.reduceByKey( (t1, t2) => { ( t1._1+t2._1, t1._2 + t2._2, t1._3 + t2._3 ) } ) analysisRDD.sortBy(_._2, false).take(10).map(_._1) } } ``` **运行结果** ``` (20,List((199f8e1d-db1a-4174-b0c2-ef095aaef3ee,8), (7eacf77a-c019-4072-8e09-840e5cca6569,8), (22e78a14-c5eb-45fe-a67d-2ce538814d98,7), (07b5fb82-da25-4968-9fd8-47485f4cf61e,7), (cde33446-095b-433c-927b-263ba7cd102a,7), (85157915-aa25-4a8d-8ca0-9da1ee67fa70,7), (215bdee7-db27-458d-80f4-9088d2361a2e,7), (5e3545a0-1521-4ad6-91fe-e792c20c46da,7), (ab27e376-3405-46e2-82cb-e247bf2a16fb,7), (d500c602-55db-4eb7-a343-3540c3ec7a36,7))) (19,List((fde62452-7c09-4733-9655-5bd3fb705813,9), (85157915-aa25-4a8d-8ca0-9da1ee67fa70,9), (d4c2b45d-7fa1-4eff-8473-42cecdaffd62,9), (329b966c-d61b-46ad-949a-7e37142d384a,8), (1b5e5ce7-cd04-4e78-9a6f-1c3dbb29ce39,8), (4d93913f-a892-490d-aa58-3a74b9099e29,7), (22e78a14-c5eb-45fe-a67d-2ce538814d98,7), (46e6b58a-ad5b-4330-9e67-bc2651f9c5e2,7), (5e3545a0-1521-4ad6-91fe-e792c20c46da,7), (b61734fd-b10d-456d-b189-2e3fe0adf31d,7))) (15,List((632972a4-f811-4000-b920-dc12ea803a41,10), (9fa653ec-5a22-4938-83c5-21521d083cd0,8), (66a421b0-839d-49ae-a386-5fa3ed75226f,8), (5e3545a0-1521-4ad6-91fe-e792c20c46da,8), (f34878b8-1784-4d81-a4d1-0c93ce53e942,8), (524378b7-e676-43e6-a566-a5493a9058d4,7), (e306c00b-a6c5-44c2-9c77-15e919340324,7), (89278c1a-1e33-45aa-a4b7-33223e37e9df,7), (394490ea-4e99-4b83-a79a-a97aaee5b021,7), (a9b9806c-90fb-4df1-94c6-d0fb12425fa3,7))) (2,List((66c96daa-0525-4e1b-ba55-d38a4b462b97,11), (b4589b16-fb45-4241-a576-28f77c6e4b96,11), (f34878b8-1784-4d81-a4d1-0c93ce53e942,10), (25fdfa71-c85d-4c28-a889-6df726f08ffb,9), (ab27e376-3405-46e2-82cb-e247bf2a16fb,8), (0b17692b-d603-479e-a031-c5001ab9009e,8), (39cd210e-9d54-4315-80bf-bed004996861,8), (213bc2d5-be6b-49a3-9cb6-f9afc5b69b3d,8), (f666d6ba-b3e8-45b1-a269-c5d6c08413c3,8), (eb7e9f6f-e0aa-4b10-b13b-75746150b8eb,7))) (17,List((4509c42c-3aa3-4d28-84c6-5ed27bbf2444,12), (bf390289-5c9d-4037-88b3-fdf386b3acd5,8), (1b5ac69b-5e00-4ff3-8a5c-6822e92ecc0c,8), (0416a1f7-350f-4ea9-9603-a05f8cfa0838,8), (9bdc044f-8593-49fc-bbf0-14c28f901d42,8), (dd3704d5-a2f9-40c1-b491-87d24bbddbdb,8), (ab16e1e4-b3fc-4d43-9c95-3d49ec26d59c,7), (fde62452-7c09-4733-9655-5bd3fb705813,7), (abbf9c96-eca3-4ecf-9c44-04193eb4b562,7), (329b966c-d61b-46ad-949a-7e37142d384a,7))) (13,List((329b966c-d61b-46ad-949a-7e37142d384a,8), (f736ee4a-cc14-4aa9-9a96-a98b0ad7cc3d,8), (1fb79ba2-4780-4652-9574-b1577c7112db,7), (0f227059-7006-419c-87b0-b2057b94505b,7), (632972a4-f811-4000-b920-dc12ea803a41,7), (7eacce38-ffbc-4f9c-a3ee-b1711f8927b0,7), (c0f70b31-fc3b-4908-af97-9b4936340367,7), (1b5e5ce7-cd04-4e78-9a6f-1c3dbb29ce39,7), (1b5ac69b-5e00-4ff3-8a5c-6822e92ecc0c,6), (a1cef5b4-9451-480f-9c9a-a52c9404a51a,6))) (11,List((329b966c-d61b-46ad-949a-7e37142d384a,12), (99f48b83-8f85-4bea-8506-c78cfe5a2136,7), (4509c42c-3aa3-4d28-84c6-5ed27bbf2444,7), (dc226249-ce13-442c-b6e4-bfc84649fff6,7), (2cd89b09-bae3-49b5-a422-9f9e0c12a040,7), (df9f26b6-385a-4ba3-ad75-6d3bd9f95c88,6), (e7f9a91d-ff65-4b5f-9488-2a3195e1d0c6,6), (5d2f3efb-be1c-4ee2-8fd5-545fd049e70c,6), (8fb4a0a9-e128-4d9a-8e11-fd4a9fd22474,6), (45e35ffa-f0e0-400e-a252-5605b4089625,6))) (7,List((a41bc6ea-b3e3-47ce-98af-48169da7c91b,9), (4dbd319c-3f44-48c9-9a71-a917f1d922c1,7), (9fa653ec-5a22-4938-83c5-21521d083cd0,7), (aef6615d-4c71-4d39-8062-9d5d778e55f1,7), (2d4b9c3e-2a9e-41b6-9573-9fde3533ed89,7), (95cb71b8-7033-448f-a4db-ae9861dd996b,7), (f34878b8-1784-4d81-a4d1-0c93ce53e942,7), (1eb7b1af-6532-4b39-ac36-c76b2611b1f1,6), (71f1c966-11e4-450f-81d2-c0b334710ccc,6), (329b966c-d61b-46ad-949a-7e37142d384a,6))) (9,List((199f8e1d-db1a-4174-b0c2-ef095aaef3ee,9), (329b966c-d61b-46ad-949a-7e37142d384a,8), (5e3545a0-1521-4ad6-91fe-e792c20c46da,8), (66c96daa-0525-4e1b-ba55-d38a4b462b97,7), (e306c00b-a6c5-44c2-9c77-15e919340324,7), (f205fd4f-f312-46d2-a850-26a16ac2734c,7), (4f0261cc-2cb1-40e0-9ffb-5587920c1084,7), (cbdbd1a4-7760-4195-bfba-fa44492bf906,7), (1caec16d-6af8-49a5-b5f1-d7f377f98a4e,7), (8a0f8fe1-d0f4-4687-aff3-7ce37c52ab71,7))) (12,List((a4b05ea2-2869-4f20-a82a-86352aa60e9f,8), (b4589b16-fb45-4241-a576-28f77c6e4b96,8), (22a687a0-07c9-4e84-adff-49dfc4fe96df,8), (73203aee-de2e-443e-93cb-014e38c0d30c,8), (64285623-54ad-4a1f-ae84-d8f85ebf94c6,7), (4c90a8a8-91d0-4888-908c-95dad1c5194e,7), (a735881e-4c30-4ddc-a1d9-ef2069d5fb5b,7), (89278c1a-1e33-45aa-a4b7-33223e37e9df,7), (ab16e1e4-b3fc-4d43-9c95-3d49ec26d59c,7), (c32dc073-4454-4fcd-bf55-fbfcc8e650f3,7))) ``` ### 结尾 因为本篇使用的示例数据`user_visit_action.txt`文件由于数据量较大将不在下方贴出。 直接在微信公众号【Java和大数据进阶】回复:**sparkdata**,即可获取。
标签:
Spark
,
Spark Core
非特殊说明,本博所有文章均为博主原创。
如若转载,请注明出处:
https://lilinchao.com/archives/1597.html
上一篇
Spark Core案例实操(二)
下一篇
Spark Core案例实操(四)
评论已关闭
栏目分类
随笔
2
Java
326
大数据
229
工具
31
其它
25
GO
47
NLP
4
标签云
FastDFS
Filter
nginx
随笔
Zookeeper
gorm
Golang基础
字符串
Git
Jenkins
Flink
Stream流
Thymeleaf
递归
MyBatis
线程池
LeetCode刷题
VUE
Http
Livy
BurpSuite
Quartz
Tomcat
Python
队列
国产数据库改造
Jquery
Java编程思想
并发编程
Linux
友情链接
申请
范明明
庄严博客
Mx
陶小桃Blog
虫洞
评论已关闭