Chapter 10 - Batch Processing Questions

Easy Question

1. What is a key characteristic of batch processing systems?

A. They process data interactively as it arrives

B. They perform large-scale computations on a dataset in discrete jobs

C. They require real-time dashboards for each step

D. They must respond to user queries in milliseconds

Medium Question

2. Which framework popularized the batch processing model built on "map" and "reduce" functions?

A. Apache Spark

B. MapReduce (Hadoop)

C. Apache Flink

D. Apache Kafka

Hard Question

3. What advantage does a DAG-based (directed acyclic graph) execution engine have over traditional MapReduce?

A. It can only run on a single machine

B. It supports iterative processing and more complex data flows

C. It prevents any form of shuffling or partitioning

D. It eliminates the need for any distributed file system

Very Hard Question

4. In an incremental batch processing workflow, what is a key challenge?

A. Ensuring the entire dataset is always processed from scratch

B. Managing partial updates and ensuring consistency between old and new data

C. Guaranteeing that the data volume never changes

D. Allowing only a single job to run at a time