Designing Data Intense Application – Chapter 12: The Future of Data Systems

If a thing be ordained to another as to its end, its last end cannot consist in the preservation of its being. Hence a captain does not intend as a last end, the preservation of the ship entrusted to him, since a ship is ordained to something else as its end, viz. to navigation. (Often…

Designing Data Intense Application – Chapter 11: Stream Processing

A complex system that works is invariably found to have evolved from a simple system that works. The inverse proposition also appears to be true: A complex system designed from scratch never works and cannot be made to work. —John Gall, Systemantics (1975)
Batch processing assumes that all the data is bounded, which…

Designing Data Intense Application – Chapter 10: Batch Processing

A system cannot be successful if it is too strongly influenced by a single person. Once the initial design is complete and fairly robust, the real test begins as people with many different viewpoints undertake their own experiments. —Donald Knuth
Three different types of systems:
Services (online systems): response time is usually the primary measure…

Designing Data Intense Application – Chapter 9: Consistency and Consensus

Is it better to be alive and wrong or right and dead? —Jay Kreps, A Few Notes on Kafka and Jepsen (2013)
In this chapter, we will talk about some examples of algorithms and protocols for building fault-tolerant distributed systems. We will assume that all the problems from Chapter 8 can occur: packets can be…

Designing Data Intense Application – Chapter 8: The Trouble with Distributed Systems

This chapter is a thoroughly pessimistic and depressing overview of things that may go wrong in a distributed system: network issues, and clock and timing issues.
Faults and Partial Failures
Single-machine software is deterministic: an individual computer with good software is usually either fully functional or entirely broken, but not something in between. In distributed…

Designing Data Intense Application – Chapter 7: Transactions

Some authors have claimed that general two-phase commit is too expensive to support, because of the performance or availability problems that it brings. We believe it is better to have application programmers deal with performance problems due to overuse of transactions as bottlenecks arise, rather than always coding around the lack of transactions. —James Corbett…