Big Data Architecture Workshop

Big data took the IT industry by storm in 2012. Enterprises that now want to leverage big data environments need Big Data Architects who can design, build, and deploy large-scale Hadoop applications. Market research has indicated that the market for big data frameworks such as Hadoop will grow at a CAGR of 58% and will be worth $1 BN by 2022. IIHT’s Big Data Architecture Workshop is designed to transform course participants into skilled, qualified Big Data Architects. The training program enables them to master Big Data technologies such as Hadoop, Kafka, and Impala. The objective of this workshop is to bring a multitude of technical contributors together to design and architect solutions for complex business challenges.

Understand how enterprises set up and leverage Big Data clusters

Master Big Data architecture concepts such as clusters, scalability, and configuration

Candidates attending this course are required to have a basic knowledge of APIs and SQL.


Course Content


Workshop Application Use Cases

  • Architectural questions

Application Vertical Slice

  • Definition
  • Minimizing risk of an unsound architecture
  • Selecting a vertical slice

Application Processing

  • Real-time and near-real-time processing
  • Batch processing
  • Data access patterns
  • Delivery and processing guarantees
  • Machine Learning pipelines

Application Data

  • Three V’s of Big Data
  • Data Lifecycle
  • Data Formats
  • Transforming Data


Scalable Applications

  • Scale up, scale out, scale to X
  • Determining if an application will scale
  • Poll: scalable airport terminal designs
  • Hadoop and Spark Scalability

Fault Tolerant Distributed Systems

  • Principles
  • Transparency
  • Hardware vs. software redundancy
  • Tolerating disasters
  • Stateless functional fault tolerance
  • Stateful fault tolerance
  • Replication and group consistency
  • Fault tolerance in Spark and MapReduce
  • Application tolerance for failures

Security and Privacy

  • Principles
  • Privacy
  • Threats
  • Technologies


  • Cluster sizing and evolution
  • On-premises vs. cloud
  • Edge computing

Technology Selection

  • HDFS
  • HBase
  • Kudu
  • Relational Database Management Systems
  • MapReduce
  • Spark, including Spark Streaming, Spark SQL, and Spark ML
  • Hive
  • Impala
  • Cloudera Search
  • Data Sets and Formats

Software Architecture

  • Architecture artifacts
  • One platform or multiple platforms; Lambda architecture
  • Team activity: produce a high-level architecture, select technologies, and revisit the vertical slice
  • Vertical Slice demonstration