Wednesday, January 1, 2020

BigData Pig Concepts

- Pig is a data flow language.
- Pig lets us describe how datasets are filtered, combined, split, and delivered from source to final destination.
- Pig commands are internally translated into MapReduce jobs. Common operators:
  • LOAD
  • FILTER
  • FOREACH ... GENERATE
  • SPLIT
  • GROUP
  • JOIN
- Pig works on any source of tuples. Unlike SQL, where a schema must be specified before the data is loaded, Pig does not require a schema up front.
- Pig can use complex, nested data structures.
- Pig supports streaming of data as well as bulk reads and writes. A Pig script describes a series of operations applied one after another.
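
The operators listed above can be pictured with plain Python collections. This is only an analogy for the data flow, not Pig Latin and not how Pig actually executes; the records, field layout, and variable names are made up for illustration.

```python
from collections import defaultdict

# Hypothetical input standing in for LOAD: (name, dept, salary) tuples.
employees = [
    ("ann", "eng", 120),
    ("bob", "eng", 100),
    ("cid", "ops", 80),
    ("dee", "ops", 90),
]

# FILTER: keep only the tuples that satisfy a condition.
well_paid = [t for t in employees if t[2] >= 90]

# FOREACH ... GENERATE: derive a new tuple from each input tuple.
projected = [(name, dept) for (name, dept, _salary) in well_paid]

# SPLIT: route tuples into several outputs by condition.
eng = [t for t in employees if t[1] == "eng"]
ops = [t for t in employees if t[1] == "ops"]

# GROUP: collect the tuples that share a key into one bag per key.
grouped = defaultdict(list)
for t in employees:
    grouped[t[1]].append(t)

# JOIN: pair tuples from two inputs that agree on a key.
locations = [("eng", "Berlin"), ("ops", "Austin")]
joined = [e + l for e in employees for l in locations if e[1] == l[0]]
```

Each step takes a collection of tuples and produces another one, which is the sense in which the script describes a flow of data.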

- A Pig script defines a logical plan for executing a workflow. No actual data is read until execution time.
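
A toy Python sketch of that idea follows. The `Plan` class, its methods, and the `load` function are hypothetical names invented here to show deferred execution; they are not part of Pig.

```python
class Plan:
    """A toy logical plan: records steps and runs them only on execute()."""

    def __init__(self, source, steps=()):
        self.source = source          # callable that returns tuples when called
        self.steps = tuple(steps)     # recorded operations, not yet applied

    def filter(self, pred):
        return Plan(self.source, self.steps + (("filter", pred),))

    def foreach(self, fn):
        return Plan(self.source, self.steps + (("foreach", fn),))

    def execute(self):
        rows = self.source()          # the data is read only here
        for kind, fn in self.steps:
            if kind == "filter":
                rows = [r for r in rows if fn(r)]
            else:
                rows = [fn(r) for r in rows]
        return list(rows)


reads = []  # records every time the source is actually read


def load():
    reads.append("read")
    return [(1, "a"), (2, "b"), (3, "c")]


# Building the plan touches no data.
plan = Plan(load).filter(lambda r: r[0] > 1).foreach(lambda r: r[1].upper())
reads_before_execute = len(reads)

# Only execute() triggers the read and applies the recorded steps.
result = plan.execute()
```

Because the whole plan is known before anything runs, the engine is free to decide how to run it, which is what makes the translation into MapReduce jobs possible.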

Pig data structures:
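
Pig's data model is usually described in terms of four kinds of value: atoms (single scalar values), tuples (ordered sequences of fields), bags (collections of tuples), and maps (string keys mapped to values). A rough Python analogy, with made-up values, might look like this; Python lists and dicts only approximate the Pig types.

```python
# Atom: a single scalar value such as a number or a string.
atom = "alice"

# Tuple: an ordered sequence of fields.
row = ("alice", 30)

# Bag: a collection of tuples; duplicates are allowed.
# (A Python list is ordered, which a Pig bag is not; this is an approximation.)
bag = [("alice", 30), ("bob", 25), ("alice", 30)]

# Map: string keys mapped to values, which may themselves be complex.
props = {"city": "Pune", "langs": [("pig",), ("hive",)]}

# Nesting: a tuple whose second field is a bag, as GROUP would produce.
nested = ("eng", bag)
```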

Relational (DB) concepts:

Relational operations: Relational algebra contains a set of operations that transform one or more relations into other relations.

  • Selection
  • Projection
  • Cartesian Product
  • Extended Projection
  • Aggregation
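
The five operations above can be sketched on small in-memory relations. The relations `emp` and `dept`, their columns, and the values are hypothetical; each relation is modeled as a list of dicts purely for readability.

```python
from collections import defaultdict
from itertools import product

# Hypothetical relations.
emp = [
    {"name": "ann", "dept": "eng", "salary": 120},
    {"name": "bob", "dept": "eng", "salary": 100},
    {"name": "cid", "dept": "ops", "salary": 80},
]
dept = [
    {"dept": "eng", "city": "Berlin"},
    {"dept": "ops", "city": "Austin"},
]

# Selection: keep the rows that satisfy a predicate.
selection = [r for r in emp if r["salary"] > 90]

# Projection: keep only some of the columns.
projection = [{"name": r["name"]} for r in emp]

# Cartesian product: every row of emp paired with every row of dept.
cartesian = [
    {**e, "d_dept": d["dept"], "city": d["city"]}
    for e, d in product(emp, dept)
]

# Extended projection: output columns may be computed expressions.
extended = [{"name": r["name"], "monthly": r["salary"] / 12} for r in emp]

# Aggregation: combine rows per group into a single value.
totals = defaultdict(int)
for r in emp:
    totals[r["dept"]] += r["salary"]
aggregation = dict(totals)
```

Every operation takes relations as input and returns a relation, so the results can be fed into further operations.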



Query optimizer: A relational database query optimizer considers all the operations needed to produce the result and chooses an efficient plan to compute it.
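
As a small illustration of what "an efficient plan" can mean, the sketch below computes the same result two ways: filtering after a Cartesian product, and filtering one input first. The tables, the two plan functions, and the use of intermediate row count as a cost measure are assumptions made for this example; real optimizers use far richer cost models.

```python
from itertools import product

# Hypothetical tables: (order_id, customer_id) and (customer_id, name).
orders = [(i, i % 3) for i in range(6)]
customers = [(0, "ann"), (1, "bob"), (2, "cid")]


def plan_naive():
    """Build the full product, then apply both conditions."""
    pairs = list(product(orders, customers))
    out = [(o, c) for o, c in pairs if o[1] == c[0] and c[1] == "ann"]
    return out, len(pairs)


def plan_pushed():
    """Filter customers first, then build a smaller product."""
    only_ann = [c for c in customers if c[1] == "ann"]
    pairs = list(product(orders, only_ann))
    out = [(o, c) for o, c in pairs if o[1] == c[0]]
    return out, len(pairs)


naive_rows, naive_cost = plan_naive()
pushed_rows, pushed_cost = plan_pushed()
```

Both plans return the same rows, but the second builds 6 intermediate pairs instead of 18. An optimizer's job is to recognize that such plans are equivalent and pick the cheaper one.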