Liu Xu's Personal Website

Kafka relies heavily on the filesystem for storing and caching messages. There is a general perception that “disks are slow”, which makes people skeptical that a persistent structure can offer competitive performance. In fact, disks are both much slower and much faster than people expect, depending on how they are used, and a properly designed disk structure can often be as fast as the network.

A modern operating system provides read-ahead and write-behind techniques that prefetch data in large block multiples and group smaller logical writes into large physical writes. Modern operating systems have also become increasingly aggressive in their use of main memory for disk caching.
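To make the write-behind idea concrete, here is a minimal Python sketch, not Kafka code. The function name `write_records` and its `durable` flag are made up for illustration. Many small logical writes are handed to the operating system, which is free to group them into larger physical writes. Only the optional `os.fsync` call asks for the data to be pushed to the storage device right away.

```python
import os
import tempfile


def write_records(path, records, durable=False):
    """Write records sequentially and return the resulting file size.

    Hypothetical helper for illustration; not part of Kafka.
    """
    with open(path, "wb") as f:
        for record in records:
            f.write(record)  # small logical write into the user-space buffer
        f.flush()  # hand the buffered bytes to the OS page cache
        if durable:
            # Ask the OS to push cached data to the storage device now,
            # instead of leaving the timing to write-behind.
            os.fsync(f.fileno())
    return os.path.getsize(path)


with tempfile.TemporaryDirectory() as tmp:
    size = write_records(os.path.join(tmp, "demo.log"), [b"0123456789"] * 1000)
    print(size)
```

Without `durable=True` the call returns once the data is in the page cache, which is why sequential writes of this kind can be fast.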

Kafka's persistent queue is built on simple reads and appends to files, as is commonly the case with logging solutions. This structure has the advantage that all operations are O(1) and reads do not block writes or each other. The performance benefit is clear: the cost of an operation is completely decoupled from the data size.
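The append-and-read structure described above can be sketched as a toy log in Python. This is an illustration under assumptions, not Kafka's actual on-disk format. The class name `AppendOnlyLog`, the 4-byte length prefix, and the use of byte offsets as record identifiers are all choices made for this example.

```python
import os
import struct
import tempfile


class AppendOnlyLog:
    """Toy append-only log: length-prefixed records in a single file."""

    _HEADER = struct.Struct(">I")  # 4-byte big-endian payload length

    def __init__(self, path):
        self.path = path
        open(path, "ab").close()  # create the file if it does not exist

    def append(self, payload: bytes) -> int:
        """Append one record at the end and return its byte offset."""
        with open(self.path, "ab") as f:
            f.seek(0, os.SEEK_END)
            offset = f.tell()
            f.write(self._HEADER.pack(len(payload)) + payload)
        return offset

    def read(self, offset: int) -> bytes:
        """Read the record at a known offset with one seek.

        The cost does not depend on how large the file has grown.
        """
        with open(self.path, "rb") as f:
            f.seek(offset)
            (length,) = self._HEADER.unpack(f.read(self._HEADER.size))
            return f.read(length)


with tempfile.TemporaryDirectory() as tmp:
    log = AppendOnlyLog(os.path.join(tmp, "demo.log"))
    first = log.append(b"hello")
    second = log.append(b"kafka")
    print(first, second, log.read(second))
```

Writers only ever touch the end of the file and readers seek directly to a known offset, so neither operation needs to scan or rewrite existing data.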

Having access to virtually unlimited disk space without any performance penalty means that we can provide some features not usually found in a messaging system. For example, in Kafka, instead of attempting to delete messages as soon as they are consumed, we can retain messages for a relatively long period. This leads to a great deal of flexibility for consumers.
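The retention idea can be sketched as a policy that discards data by age rather than by whether it has been consumed. The `Segment` class and `expired_segments` function below are hypothetical names used for illustration only. In Kafka itself, retention is controlled through broker settings such as `log.retention.hours`.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """Hypothetical log segment: a chunk of the log stored as one file."""
    base_offset: int         # offset of the first message in the segment
    last_append_time: float  # time of the newest message, in seconds


def expired_segments(segments, now, retention_seconds):
    """Return closed segments whose newest message is past the retention window.

    The last segment is treated as active and is never returned.
    Consumer progress plays no part in the decision.
    """
    closed = segments[:-1]
    return [s for s in closed if now - s.last_append_time > retention_seconds]


segments = [Segment(0, 100.0), Segment(50, 500.0), Segment(90, 950.0)]
print(expired_segments(segments, now=1000.0, retention_seconds=600))
```

Because deletion depends only on age, a consumer can re-read old messages or start late, as long as it stays within the retention window.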