
1. Lambda
1.1. Twitter Summingbird
1.2. Lambdoop
2. Architectures
3. Distributed Framework Manager
3.1. Large-Scale Data Transfer
3.1.1. DistCp
3.2. Scheduling Type
3.2.1. Monolithic
3.2.2. Two-Level
3.2.3. Shared State
3.3. Software
3.3.1. Apache Mesos
3.3.1.1. Scheduling
3.3.1.2. Monitoring
3.3.2. Apache Ambari
3.3.2.1. Monitoring
3.3.2.2. Manage Cluster
3.3.2.3. Automated Deployment
3.3.3. Ganglia
3.3.3.1. Monitoring
3.3.4. Ooyala Spark Job-Server
3.3.5. Google Kubernetes
4. Configuration Management
4.1. Software
4.1.1. Apache ZooKeeper
5. Core
5.1. Distributed Filesystem
5.1.1. HDFS
5.2. Scheduling Big Data Jobs
5.2.1. YARN
5.2.1.1. MapReduce
5.2.2. Job Schedule Manager
5.2.2.1. Apache REEF
6. Search
6.1. Solr
7. Relational Databases
7.1. General
7.1.1. Cloudera Impala
7.2. Warehouse (OLAP)
7.2.1. Apache Hive
7.2.1.1. Apache Tajo
7.2.1.2. Spark SQL
7.3. Query Language
7.3.1. HiveQL
7.3.2. SQL
7.4. Read-Only/Low Latency
7.4.1. SploutSQL
7.5. Transactions
7.5.1. Row-Based ACID
7.5.2. ACID
7.5.3. Eventually Consistent
7.5.4. Eventually Durable
7.6. Interfaces
7.6.1. Software
7.6.1.1. Apache Thrift
7.6.2. JDBC
7.6.3. ODBC
7.7. Transactional
7.7.1. Splice
7.7.2. Stinger.next/Apache Hive
8. Event Processing
8.1. Spark Streaming
9. Use Cases
9.1. Large-Scale Logging and Failure Analysis
9.1.1. Apache Chukwa
9.2. Predictive Maintenance
9.3. Personalized Advertisement
9.4. Master Data Management
9.5. Preference Learning
9.6. Gamification
9.7. Business Warehouse
10. NoSQL
10.1. Data Storage Type
10.1.1. Key/Value
10.1.1.1. Apache HBase
10.1.1.1.1. SQL
10.1.1.2. Apache Accumulo
10.1.2. Graph
10.1.2.1. Neo4j
10.1.2.2. Apache Giraph
10.1.2.2.1. Bagel
10.1.2.3. GraphX
10.1.3. Columnar
10.1.3.1. Parquet (Storage Format)
10.1.3.2. Apache Drill
10.1.3.3. Apache Cassandra
10.1.4. GIS
10.1.4.1. GIS Tools for Hadoop
10.1.4.2. Spatial Hadoop
10.2. Query Language
10.2.1. Apache Pig
11. Processing Paradigm
11.1. Batch Processing
11.1.1. MapReduce
11.1.2. Tez
11.1.3. Spark
11.2. Stream Processing (Real-Time)
11.2.1. Software
11.2.1.1. Apache Spark
11.2.1.2. Apache Flink
11.3. Integration of Both
11.3.1. Software
11.3.1.1. Twitter Summingbird
11.4. In-Memory
11.4.1. Apache Tez
11.4.2. Apache Tachyon
11.4.3. Apache Ignite
11.4.4. Apache Flink
11.5. Libraries
11.5.1. Apache Crunch
12. Statistical Analytics/Machine Learning
12.1. Software
12.1.1. RHadoop
12.1.2. RHIPE
12.1.3. Apache Mahout
12.1.4. SparkR
12.1.5. Apache Hama
12.1.6. MLlib
12.1.7. Weka (distributedWekaHadoop)
12.1.8. DDF.io - Distributed Data Frame
12.1.9. Kepler
12.2. Languages
12.2.1. R
12.2.2. Java
12.2.3. Python
12.3. GUI
12.3.1. Browser
12.3.1.1. RStudioWeb
12.3.1.2. Cloudera Hue
12.3.1.3. Apache Zeppelin
12.4. In-Database Analytics
12.4.1. Hivemall
13. Alternatives
13.1. Event Processing
13.1.1. Apache Storm
14. Managing Environments
14.1. Software
14.1.1. Puppet
14.1.2. Chef
14.1.3. Google Kubernetes
14.2. Software Container
14.2.1. Software
14.2.1.1. Docker
14.3. Deploy
14.3.1. Software
14.3.1.1. Apache Slider
15. Cloud Manager
15.1. Software
15.1.1. Apache Deltacloud
15.1.2. Ubuntu Juju
15.1.3. Apache Whirr
15.1.4. Cloudera Cloud Manager
15.1.5. OpenStack
15.1.5.1. Savanna (now OpenStack Sahara)
16. Data Import/Export
16.1. Software
16.1.1. Apache Flume
16.1.2. Apache Sqoop
17. Reporting
17.1. Software
17.1.1. R
17.1.1.1. Markdown
17.1.1.2. knitr
18. Workflows
18.1. Software
18.1.1. Apache Oozie
18.1.1.1. Apache Falcon
18.1.2. Apache Flink (Stratosphere)
18.1.3. Spotify Luigi
18.2. Run-time / Query Optimization
18.3. Data transformation
19. Packaging/Distribution
19.1. Cloud
19.1.1. Amazon Elastic MapReduce (EMR)
19.1.2. Microsoft Azure HDInsight
19.1.3. Google Compute Hadoop
19.1.4. Altiscale
19.2. On-Premise
19.2.1. MapR
19.2.2. Apache Bigtop
19.2.3. Hortonworks
19.2.4. Microsoft HDInsight
19.2.5. Cloudera Enterprise
19.2.6. Buildoop
19.2.6.1. Lambda Architecture
20. Distributed File Systems
20.1. Windows Azure Blob Storage
20.2. CassandraFS
20.3. CephFS
20.4. Cleversafe Object Store
20.5. Google Cloud Storage Connector
20.6. GlusterFS
20.7. GridGain
20.8. Lustre
20.9. MapR FileSystem
20.10. OrangeFS
20.11. Quantcast File System
20.12. Symantec Veritas Cluster File System
20.13. Amazon S3
21. Security
21.1. Cluster
21.1.1. Software
21.1.1.1. Apache Knox
21.2. Data
21.2.1. Authorization
21.2.1.1. Software
21.2.1.1.1. Apache Sentry
22. Messaging
22.1. Software
22.1.1. Apache Kafka
22.1.1.1. Apache Samza
22.1.2. Akka
23. System Tools
23.1. JVM Garbage Collection
23.1.1. GCViewer
23.2. HDFS Live Statistics
23.2.1. Twitter HDFS Du
23.3. Disk Image Analytics
23.3.1. HDFS FSImage
23.4. User Monitor
23.4.1. LinkedIn White Elephant
23.5. MapReduce Monitor
23.5.1. Twitter hRaven