BigData PIG Concepts
- PIG is a data flow language.
- PIG allows us to describe how datasets are filtered, combined, split and delivered from source to final destination.
- PIG commands are internally translated into MapReduce jobs. Common commands:
  - LOAD
  - FILTER
  - FOREACH ... GENERATE
  - SPLIT
  - GROUP
  - JOIN
- PIG can use complex, nested data structures.
- PIG supports streaming of data as well as bulk reads and writes.
- A PIG script describes a series of operations and so defines a logical plan for executing a workflow. No actual data is read until execution time.
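
The points above (a chain of operators that forms a plan, with nothing read until execution) can be mimicked in plain Python with generators. This is only an analogy and not how PIG is implemented: the helper names (`load`, `filter_by`, `split`, `join`, `group_by`, `foreach_generate`), the sample rows, and the `reads` log are all hypothetical, and the Pig Latin in the comments is a rough counterpart rather than a tested script.

```python
from collections import defaultdict
from itertools import tee

# Hypothetical in-memory stand-ins for input files.
USER_ROWS = [("alice", 34, "paris"), ("bob", 15, "rome"),
             ("carol", 52, "paris"), ("dave", 17, "oslo")]
CITY_ROWS = [("paris", "FR"), ("rome", "IT"), ("oslo", "NO")]

reads = []  # records the moment each source is actually read


def load(rows, label):
    """LOAD: a generator, so the body runs only when rows are requested."""
    reads.append(label)
    yield from rows


def filter_by(rel, pred):
    """FILTER: keep the tuples that satisfy pred."""
    return (t for t in rel if pred(t))


def foreach_generate(rel, fn):
    """FOREACH ... GENERATE: compute one output tuple per input tuple."""
    return (fn(t) for t in rel)


def split(rel, *preds):
    """SPLIT: route one relation into several, one per predicate."""
    copies = tee(rel, len(preds))
    return [filter_by(c, p) for c, p in zip(copies, preds)]


def join(left, right, lkey, rkey):
    """JOIN: hash join on equal keys; output is the two tuples concatenated."""
    index = defaultdict(list)
    for r in right:
        index[rkey(r)].append(r)
    for l in left:
        for r in index[lkey(l)]:
            yield l + r


def group_by(rel, key):
    """GROUP: collect tuples into one bag (list) per key."""
    bags = defaultdict(list)
    for t in rel:
        bags[key(t)].append(t)
    yield from sorted(bags.items())


# Building the plan. No source is read in this part.
users = load(USER_ROWS, "users")        # users = LOAD 'users' AS (name, age, city);
cities = load(CITY_ROWS, "cities")      # cities = LOAD 'cities' AS (city, country);
# SPLIT users INTO minors IF age < 18, adults IF age >= 18;
minors, adults = split(users, lambda t: t[1] < 18, lambda t: t[1] >= 18)
# joined = JOIN adults BY city, cities BY city;
joined = join(adults, cities, lambda t: t[2], lambda t: t[0])
# projected = FOREACH joined GENERATE name, country;
projected = foreach_generate(joined, lambda t: (t[0], t[4]))
# grouped = GROUP projected BY country;
grouped = group_by(projected, lambda t: t[1])
# counts = FOREACH grouped GENERATE group, COUNT(projected);
counts = foreach_generate(grouped, lambda kv: (kv[0], len(kv[1])))

reads_before_run = list(reads)  # still empty: only the plan exists so far

# "Execution time": consuming the last relation pulls data through every step.
result = list(counts)
print(result)  # [('FR', 2)]
```

After the last line, `reads` holds both source labels, while `reads_before_run` is empty, which is the "no data is read until execution time" behaviour in miniature.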
PIG data structures:
Relational (DB) concepts:
Relational operations: Relational algebra contains a set of operations that transform one or more relations into other relations, including:
- Selection
- Projection
- Cartesian Product
- Extended Projection
- Aggregation
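
As a minimal sketch of the five operations listed above, a relation can be modelled in Python as a list of dicts, one dict per row. The function names, the `employees` and `depts` relations, and their columns are hypothetical and exist only for illustration.

```python
from collections import defaultdict
from itertools import product

# Hypothetical sample relations.
employees = [
    {"name": "alice", "dept": "eng", "salary": 100},
    {"name": "bob", "dept": "eng", "salary": 80},
    {"name": "carol", "dept": "ops", "salary": 60},
]
depts = [{"dept_id": "eng", "floor": 3}, {"dept_id": "ops", "floor": 1}]


def selection(rel, pred):
    """Selection: keep the rows that satisfy a predicate."""
    return [row for row in rel if pred(row)]


def projection(rel, cols):
    """Projection: keep only the named columns, dropping duplicate rows."""
    seen, out = set(), []
    for row in rel:
        key = tuple(row[c] for c in cols)
        if key not in seen:
            seen.add(key)
            out.append(dict(zip(cols, key)))
    return out


def cartesian_product(left, right):
    """Cartesian product: pair every left row with every right row."""
    return [{**l, **r} for l, r in product(left, right)]


def extended_projection(rel, exprs):
    """Extended projection: each output column is an expression over a row."""
    return [{name: fn(row) for name, fn in exprs.items()} for row in rel]


def aggregation(rel, group_col, agg_col, fn):
    """Aggregation: group rows by one column and reduce another with fn."""
    groups = defaultdict(list)
    for row in rel:
        groups[row[group_col]].append(row[agg_col])
    return [{group_col: k, agg_col: fn(v)} for k, v in groups.items()]


well_paid = selection(employees, lambda r: r["salary"] > 70)
dept_names = projection(employees, ["dept"])
pairs = cartesian_product(employees, depts)
raised = extended_projection(
    employees,
    {"name": lambda r: r["name"], "raised": lambda r: r["salary"] + 10},
)
totals = aggregation(employees, "dept", "salary", sum)
```

Each function takes one or two relations and returns a new relation, so the results can be fed into further operations, which is the property the definition above describes.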
Query optimizer: A relational database query optimizer considers all the operations needed to produce the result, and chooses the plan it estimates to be the most efficient way to compute it.
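
One classic rewrite an optimizer may apply is to filter before joining rather than after. The sketch below is a toy illustration with hypothetical data and function names: it runs both plans to show that they return the same rows at different cost. A real optimizer would estimate the cost from statistics instead of executing every candidate plan.

```python
# Hypothetical relations: orders(id, customer, amount), customers(name, city).
orders = [(1, "alice", 250), (2, "bob", 40), (3, "alice", 15), (4, "carol", 300)]
customers = [("alice", "paris"), ("bob", "rome"), ("carol", "oslo")]


def nested_loop_join(left, right, lkey, rkey):
    """Join on equal keys; also return how many row pairs were compared."""
    out, compared = [], 0
    for l in left:
        for r in right:
            compared += 1
            if l[lkey] == r[rkey]:
                out.append(l + r)
    return out, compared


def plan_join_then_filter():
    """Plan A: join everything, then keep orders above 100."""
    joined, cost = nested_loop_join(orders, customers, 1, 0)
    return [t for t in joined if t[2] > 100], cost


def plan_filter_then_join():
    """Plan B: keep orders above 100 first, then join the smaller input."""
    big_orders = [t for t in orders if t[2] > 100]
    return nested_loop_join(big_orders, customers, 1, 0)


result_a, cost_a = plan_join_then_filter()
result_b, cost_b = plan_filter_then_join()

# Pick the cheaper plan, using comparisons made as the cost measure.
best_plan, best_cost = min(
    [("join_then_filter", cost_a), ("filter_then_join", cost_b)],
    key=lambda plan: plan[1],
)
```

Here plan A compares 12 row pairs and plan B compares 6, while both produce the same two joined rows, so `best_plan` is `"filter_then_join"`.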