Here we group the rows by key and count how many times each key occurs.
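// spark-shell imports spark.implicits._ automatically; an application must import it itself
import spark.implicits._

val datardd = sc.parallelize(Seq("a" -> 1, "b" -> 1, "a" -> 1, "c" -> 1))
// Name the tuple columns; otherwise toDF yields _1 and _2, and $"name" would not resolve
val mydf = datardd.toDF("name", "count")
// agg("count" -> "count") applies count() to the column "count", producing "count(count)"
mydf.groupBy($"name").agg("count" -> "count").
  withColumnRenamed("count(count)", "noofoccurrences").
  orderBy($"noofoccurrences".desc).show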
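Output (the relative order of the tied keys b and c is not guaranteed, because the sort is only on noofoccurrences):
+----+---------------+
|name|noofoccurrences|
+----+---------------+
|   a|              2|
|   b|              1|
|   c|              1|
+----+---------------+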
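To sanity-check the expected counts without starting Spark, the same group-and-count can be sketched with plain Scala collections. This is only an illustration of the logic, not how Spark executes the query. The object name `CountByKey` is hypothetical, and the secondary sort on the key is my own addition so that `b` reliably comes before `c`. Because every value in the `count` column is non-null, counting non-null values (what `count(count)` does) is the same as counting rows, which is what this sketch does.

```scala
object CountByKey {
  // Same input pairs as the RDD above
  val data: Seq[(String, Int)] = Seq("a" -> 1, "b" -> 1, "a" -> 1, "c" -> 1)

  // Group rows by key, count the rows in each group, then sort by
  // count descending and by key ascending to get a deterministic order
  def counts(pairs: Seq[(String, Int)]): Seq[(String, Int)] =
    pairs
      .groupBy(_._1)
      .map { case (key, rows) => key -> rows.size }
      .toSeq
      .sortBy { case (key, n) => (-n, key) }

  def main(args: Array[String]): Unit =
    counts(data).foreach { case (key, n) => println(s"$key | $n") }
}
```

Running `main` prints `a | 2`, `b | 1` and `c | 1`, one per line. In Spark itself, `groupBy($"name").count()` is a shorter way to get the same per-key row counts, in a column named `count`.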