Designing Data Intense Application – Chapter 10: Batch Processing

A system cannot be successful if it is too strongly influenced by a single person. Once the initial design is complete and fairly robust, the real test begins as people with many different viewpoints undertake their own experiments. —Donald Knuth Three different types of systems: Services (online systems): Response time is usually the primary measure…

Designing Data Intense Application – Chapter 9: Consistency and Consensus

Is it better to be alive and wrong or right and dead? —Jay Kreps, A Few Notes on Kafka and Jepsen (2013)  In this chapter, we will talk about some examples of algorithms and protocols for building fault-tolerant distributed systems. We will assume that all the problems from Chapter 8 can occur:  packets can be…

Designing Data Intense Application – Chapter 8: The Trouble with Distributed Systems

This chapter is a thoroughly pessimistic and depressing overview of things that may go wrong in a distributed system. Networks issues Clocks & timing issues Faults and Partial Failures Single machine software is deterministic; An individual computer with good software is usually either fully functional or entirely broken, but not something in between. In distributed…

Designing Data Intense Application – Chapter 7:Transactions

Some authors have claimed that general two-phase commit is too expensive to support, because of the performance or availability problems that it brings. We believe it is better to have application programmers deal with performance problems due to overuse of transactions as bottlenecks arise, rather than always coding around the lack of transactions. —James Corbett…

Designing Data Intense Application – Chapter 6: Partitioning

The main reason for wanting to partition data is scalability. For very large datasets, or very high query throughput, that is not sufficient: we need to break the data up into partitions, also known as sharding. What we call a partition here is called a shard in MongoDB, Elasticsearch, and SolrCloud; it’s known as a…

Designing Data-Intensive Applications – Chapter 5: Replication

The major difference between a thing that might go wrong and a thing that cannot possibly go wrong is that when a thing that cannot possibly go wrong goes wrong it usually turns out to be impossible to get at or repair. —Douglas Adams, Mostly Harmless (1992) Replication means keeping a copy of the same…