What is the significance of Resilient Distributed Datasets in Spark?
Resilient Distributed Datasets (RDDs) are the fundamental data structure of Apache Spark. They are immutable, fault-tolerant, distributed collections of objects that can be operated on in parallel. RDDs are split into partitions, which can be processed on different nodes of a cluster. RDDs are created either by transforming existing RDDs or by loading an external dataset from stable storage such as HDFS or HBase. When a transformation such as map() is called on an RDD, the operation is not performed instantly; instead, Spark records the sequence of instructions (the lineage) and evaluates it lazily, only when an action requires a result. So far, if you have any doubts regarding these Apache Spark interview questions and answers, please comment below.
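The lazy-evaluation behavior described above can be illustrated with a minimal sketch in plain Python (no Spark installation required). The `LazyRDD` class here is a hypothetical stand-in, not the real Spark API: transformations such as map() only record a step in the lineage, and nothing is computed until an action like collect() is called.

```python
# Conceptual sketch of lazy evaluation: transformations are recorded,
# not executed. LazyRDD is a hypothetical illustration, not Spark's API.

class LazyRDD:
    def __init__(self, data, ops=None):
        self.data = data
        self.ops = ops or []  # recorded transformations (the "lineage")

    def map(self, fn):
        # Transformation: nothing is computed yet; the step is remembered.
        return LazyRDD(self.data, self.ops + [("map", fn)])

    def filter(self, pred):
        # Also a transformation: just appended to the lineage.
        return LazyRDD(self.data, self.ops + [("filter", pred)])

    def collect(self):
        # Action: only now are the recorded steps actually applied.
        result = list(self.data)
        for kind, fn in self.ops:
            if kind == "map":
                result = [fn(x) for x in result]
            else:
                result = [x for x in result if fn(x)]
        return result

rdd = LazyRDD([1, 2, 3, 4])
doubled = rdd.map(lambda x: x * 2)     # no computation happens here
evens = doubled.filter(lambda x: x > 4)  # still nothing computed
print(evens.collect())                 # computation happens only now: [6, 8]
```

In real Spark the same split applies: map(), filter(), and other transformations build up a lineage graph, and only actions such as collect() or count() trigger execution on the cluster.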