Hadoop Ecosystem


1. Non-relational store (HBase)

2. Maps the query onto cluster nodes (the MapReduce map phase)

3. Coordinator jobs are recurrent Oozie Workflow jobs that are triggered by time and data availability.

4. Reduces the aggregated results into answers (the MapReduce reduce phase)

5. Links jobs together (Oozie)

5.1. Workflow processing

6. An Oozie Bundle provides a way to package multiple coordinator and workflow jobs and to manage the lifecycle of those jobs

6.1. Connects non-Hadoop stores such as RDBMSs (Sqoop)

6.2. Moves data between RDBMSs and Hadoop (Sqoop)

7. Workflow jobs are Directed Acyclic Graphs (DAGs) specifying a sequence of actions to execute. The workflow job has to wait for each action to complete before the next one starts.
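To make the DAG idea concrete, here is a toy sketch in plain Python (the action names and dependency structure are invented for illustration; real Oozie workflows are defined in XML, not with this API):

```python
from graphlib import TopologicalSorter

# Hypothetical workflow: each action lists the actions it depends on.
workflow = {
    "import": [],           # e.g. a Sqoop import action
    "clean":  ["import"],   # a Pig/MapReduce step after the import
    "load":   ["clean"],    # a Hive load once cleaning finishes
    "notify": ["load"],     # final notification action
}

# A valid execution order runs every action only after all of its
# dependencies have completed -- exactly what a DAG guarantees.
order = list(TopologicalSorter(workflow).static_order())
```

Because the graph is acyclic, a topological order always exists; a cycle would mean some action waits on itself and the workflow could never run.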

8. Hive

8.1. SQL-like querying

8.2. Combiners (a MapReduce feature) can be used to optimize reducer performance

8.3. Structured data warehousing

8.4. Uses partition columns instead of indexes
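A toy sketch of why partition columns work, assuming a table partitioned by a date column (the data and layout are invented; real Hive stores partitions as HDFS directories like `dt=2024-01-01/`):

```python
# Rows live under partition-column "directories", so a filter on the
# partition column skips whole partitions instead of using an index.
table = {
    "dt=2024-01-01": [("a", 1), ("b", 2)],
    "dt=2024-01-02": [("c", 3)],
}

def query(table, dt):
    # Only the matching partition is scanned; the others are never read.
    return table.get(f"dt={dt}", [])

rows = query(table, "2024-01-02")
```

This is partition pruning: the filter narrows the scan to one directory up front, which is why Hive can stay fast without per-row indexes.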

9. Pig

9.1. Scripting for Hadoop

10. HBase

10.1. Column-family (wide-column) store

10.2. Transactional lookups
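A minimal sketch of the HBase data model in Python dicts (the row keys, families, and helper names are invented; this is not the HBase client API):

```python
# Values are addressed by (row key, column family, qualifier), and a
# single-row get is a cheap keyed lookup rather than a table scan.
store = {}

def put(row, family, qualifier, value):
    store.setdefault(row, {}).setdefault(family, {})[qualifier] = value

def get(row, family, qualifier):
    return store.get(row, {}).get(family, {}).get(qualifier)

put("user42", "info", "name", "Ada")
put("user42", "metrics", "logins", 3)
name = get("user42", "info", "name")
```

The nested-dict lookup mirrors why point reads by row key are fast in this model, while ad-hoc queries on values are not.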

11. Flume

11.1. Log collector

11.2. Integrates into Hadoop

12. Oozie

13. Avro

13.1. Data parsing

13.2. Binary data serialization

13.3. RPC

13.4. Language-neutral

13.5. Optional code generation

13.6. Schema evolution

13.7. Untagged data

13.8. Dynamic typing
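A toy illustration of "untagged data" and schema evolution using Python's `struct` module (this is not the real Avro wire format; the schemas and default value are invented to show the idea):

```python
import struct

# Untagged encoding: fields are packed in schema order with no names or
# tags; only the schema tells a reader how to slice the bytes apart.
V1 = "<id"  # writer schema v1: id (int32), score (float64)

def write_v1(rec):
    return struct.pack(V1, rec["id"], rec["score"])

def read_v2(buf):
    # Reader schema v2 adds a "label" field with a default, so old v1
    # data stays readable: the missing field falls back to the default.
    rid, score = struct.unpack(V1, buf)
    return {"id": rid, "score": score, "label": "unknown"}

old_bytes = write_v1({"id": 7, "score": 0.5})
rec = read_v2(old_bytes)
```

Dropping per-field tags keeps the encoding compact, and resolving writer schema against reader schema (with defaults for new fields) is what makes evolution safe.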

14. Mahout

14.1. Machine learning

14.2. Algorithms implemented on top of MapReduce

15. Sqoop

15.1. Auto-generates Java InputFormat code for data access

16. MapReduce

16.1. Distributed compute
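The map/shuffle/reduce flow from items 2 and 4 above can be sketched as a word count in plain Python (function names are illustrative, and everything runs in one process rather than across a cluster):

```python
from collections import defaultdict

def map_phase(line):
    # Map: emit a (word, 1) pair for every word in an input line.
    for word in line.split():
        yield (word, 1)

def shuffle(pairs):
    # Shuffle: group intermediate values by key across all mappers.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    # Reduce: aggregate the grouped values into one answer per key.
    return (key, sum(values))

lines = ["big data big compute", "big data"]
intermediate = [pair for line in lines for pair in map_phase(line)]
result = dict(reduce_phase(k, v) for k, v in shuffle(intermediate).items())
```

On a real cluster, map tasks run in parallel on the nodes holding the data and the shuffle moves keyed groups to reducers; the logic per record is the same.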

17. Ambari

17.1. Cluster deployment and administration

17.2. Driven by Hortonworks

18. ZooKeeper

18.1. Coordinates shared state between applications

18.2. Naming, configuration, and synchronization services

19. YARN

19.1. Cluster management

19.2. Introduced in Hadoop 2

19.3. Resource manager

19.4. Job scheduler

20. BigTop

20.1. Packages the Hadoop ecosystem

20.2. Tests Hadoop ecosystem packages

21. Related Apache Ecosystems

22. HDFS

22.1. Distributed storage
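A toy sketch of how HDFS spreads a file across nodes (block size, node names, and round-robin placement are simplified for illustration; real HDFS uses 128 MB blocks by default and rack-aware replica placement):

```python
BLOCK_SIZE = 4          # bytes here, purely for the demo
REPLICATION = 2         # copies of each block
NODES = ["node1", "node2", "node3"]

def place(data):
    # Split the "file" into fixed-size blocks, then assign each block's
    # replicas to distinct nodes round-robin.
    blocks = [data[i:i + BLOCK_SIZE] for i in range(0, len(data), BLOCK_SIZE)]
    placement = {}
    for n, block in enumerate(blocks):
        placement[block] = [NODES[(n + r) % len(NODES)] for r in range(REPLICATION)]
    return placement

layout = place(b"abcdefgh")
```

Replicating each block on multiple nodes is what lets HDFS survive node failures and lets MapReduce schedule compute next to a local copy of the data.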

23. Spark

23.1. In-memory distributed compute engine

24. Impala

24.1. SQL query engine

24.2. Query data stored in HDFS and HBase

24.3. Real-time (low-latency) queries

25. Cascading

25.1. Higher-level abstraction over MapReduce

25.2. Creates a Flow that assembles MapReduce jobs