Designing Data-Intensive Applications – Chapter 12: The Future of Data Systems

<Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems> If a thing be ordained to another as to its end, its last end cannot consist in the preservation of its being. Hence a captain does not intend as a last end, the preservation of the ship entrusted to him, since a ship…

Designing Data-Intensive Applications – Chapter 11: Stream Processing

<Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems> A complex system that works is invariably found to have evolved from a simple system that works. The inverse proposition also appears to be true: A complex system designed from scratch never works and cannot be made to work. —John Gall, Systemantics (1975)…

Designing Data-Intensive Applications – Chapter 10: Batch Processing

<Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems> A system cannot be successful if it is too strongly influenced by a single person. Once the initial design is complete and fairly robust, the real test begins as people with many different viewpoints undertake their own experiments. —Donald Knuth Three different types…

Designing Data-Intensive Applications – Chapter 9: Consistency and Consensus

<Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems> Is it better to be alive and wrong or right and dead? —Jay Kreps, A Few Notes on Kafka and Jepsen (2013)  In this chapter, we will talk about some examples of algorithms and protocols for building fault-tolerant distributed systems. We will assume…

Designing Data-Intensive Applications – Chapter 8: The Trouble with Distributed Systems

<Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems> This chapter is a thoroughly pessimistic and depressing overview of things that may go wrong in a distributed system: network issues, and clock and timing issues. Faults and Partial Failures: Single-machine software is deterministic; an individual computer with good software is usually either…

Designing Data-Intensive Applications – Chapter 7: Transactions

<Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems> Some authors have claimed that general two-phase commit is too expensive to support, because of the performance or availability problems that it brings. We believe it is better to have application programmers deal with performance problems due to overuse of transactions as bottlenecks…