Elastic MapReduce Essentials:

  • Amazon EMR (Elastic MapReduce) is a managed service that deploys clusters of EC2 instances running the Hadoop big data framework.
  • EMR is used to analyze and process vast amounts of data.

  • EMR also supports other distributed frameworks, such as:

    • Apache Spark
    • HBase
    • Presto
    • Flink

General EMR Workflow:

  • Data stored in S3, DynamoDB, or Redshift is sent to EMR.
  • The data is mapped to a "cluster" of Hadoop Master/Slave nodes for processing.
  • Computations (code created by the developer) are used to process the data.
  • The processed data is then reduced to a single output data set.
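
The workflow above can be sketched in plain Python as a word count. This is only an illustration of the map, group, and reduce steps on one machine; it does not use EMR or Hadoop, and the function names (`map_phase`, `shuffle`, `reduce_phase`, `run_job`) and sample records are hypothetical.

```python
from collections import defaultdict


def map_phase(record):
    """Emit a (word, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in record.split()]


def shuffle(pairs):
    """Group emitted values by key, as the framework does between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Collapse all values for one key into a single result."""
    return key, sum(values)


def run_job(records):
    """Run map, shuffle, and reduce over the records and return one output set."""
    mapped = [pair for record in records for pair in map_phase(record)]
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


# Hypothetical input records standing in for data read from S3.
records = ["EMR runs Hadoop", "Hadoop runs jobs"]
result = run_job(records)
print(result)
```

On a real cluster the map and reduce calls would run on different nodes; here they run sequentially so the data flow is easy to follow.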

Other Important EMR Facts:

  • You (the admin) have the ability to access the underlying operating system.
  • You can run custom scripts (similar to EC2 user data) on the instances launched into the cluster by using bootstrap actions.
  • EMR processes data in parallel across the nodes of the cluster, which shortens processing time.
  • You can resize a running cluster at any time, and you can deploy multiple clusters.

EMR Slave Nodes:

  • There are two types of slave nodes:

    • Core node:
      • A slave node that has software components which run tasks AND store data in the Hadoop Distributed File System (HDFS) on your cluster.
      • The core nodes do the "heavy lifting" with the data.
    • Task node:
      • A slave node that has software components which only run tasks.
      • Task nodes are optional.
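
As a rough sketch, the node types above could be described as instance groups in the shape that the `Instances` argument of boto3's `run_job_flow` call accepts, where the role values are `MASTER`, `CORE`, and `TASK`. The helper name `build_instance_groups`, the instance type, and the counts are assumptions for illustration, and nothing here contacts AWS.

```python
def build_instance_groups(core_count, task_count=0, instance_type="m5.xlarge"):
    """Return instance group definitions for one master, N core, and optional task nodes.

    The instance type and counts are placeholder values, not recommendations.
    """
    groups = [
        {
            "Name": "Master",
            "InstanceRole": "MASTER",
            "InstanceType": instance_type,
            "InstanceCount": 1,
        },
        {
            # Core nodes run tasks and store data in HDFS.
            "Name": "Core",
            "InstanceRole": "CORE",
            "InstanceType": instance_type,
            "InstanceCount": core_count,
        },
    ]
    if task_count > 0:
        # Task nodes only run tasks, so the group is left out when not needed.
        groups.append(
            {
                "Name": "Task",
                "InstanceRole": "TASK",
                "InstanceType": instance_type,
                "InstanceCount": task_count,
            }
        )
    return groups


without_tasks = build_instance_groups(core_count=2)
with_tasks = build_instance_groups(core_count=2, task_count=4)
print([group["InstanceRole"] for group in with_tasks])
```

Leaving the task group out entirely when `task_count` is zero mirrors the point that task nodes are optional.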

EMR Map Phase:

  • Mapping is a function that defines how the large data file is split for processing.
  • During the mapping phase, the data is split into 128 MB "chunks".
  • The larger the instance size used in your EMR cluster, the more chunks you can map and process at the same time.
  • If there are more chunks than nodes/mappers, the chunks will queue for processing.
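
The chunk arithmetic above can be worked through with a short calculation. This sketch assumes the 128 MB chunk size from these notes and a fixed number of mappers running in parallel; the function names and the example figures (a 1024 MB file, 3 mappers) are hypothetical.

```python
import math

CHUNK_SIZE_MB = 128  # chunk size stated in the notes above


def count_chunks(file_size_mb):
    """Number of chunks needed to cover a file; the last chunk may be partial."""
    return math.ceil(file_size_mb / CHUNK_SIZE_MB)


def count_waves(num_chunks, parallel_mappers):
    """Number of rounds needed when chunks queue for a limited set of mappers."""
    return math.ceil(num_chunks / parallel_mappers)


chunks = count_chunks(1024)     # 1024 / 128 = 8 chunks
waves = count_waves(chunks, 3)  # 8 chunks on 3 mappers take 3 rounds
print(chunks, waves)
```

With 3 mappers, the first two rounds process 3 chunks each and the remaining 2 chunks wait in the queue for the third round.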

EMR Reduce Phase:

  • Reducing is a function that aggregates the split data back into one data source.
  • Reduced data needs to be stored in a persistent service such as S3, because data held on the EMR cluster is lost when the cluster is terminated.
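
A minimal sketch of the reduce step, assuming each mapper has already produced a partial count for its chunk: the partial results are merged into one data set and serialized so they could be written to persistent storage. The sample counts and the names `merge_counts` and `output_body` are hypothetical, and the upload to S3 itself is not shown.

```python
import json
from collections import Counter
from functools import reduce


def merge_counts(left, right):
    """Combine two partial results by adding the counts for matching keys."""
    return left + right


# Hypothetical per-chunk outputs from the map phase.
partial_results = [
    Counter({"error": 2, "ok": 5}),
    Counter({"error": 1, "ok": 7}),
    Counter({"timeout": 3}),
]

# Fold all partial results into a single output set.
combined = reduce(merge_counts, partial_results, Counter())

# Serialize the result; this string is what would be saved outside the cluster.
output_body = json.dumps(combined, sort_keys=True)
print(output_body)
```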
