Streaming vs Memoization

Streaming vs Memoization: A Deep Dive

Reactive programming revolves around the capability to subscribe to changes. Think of a reactive program as something that calculates values, and you want to observe and know how these values change when the inputs they use change. This whole concept differs slightly from incremental computing, which focuses on minimal re-computation upon any change. However, these two approaches often work in tandem. If you’re striving for efficient reactive programming, you’ll likely need to embrace some form of incremental computing. Why? Because the most straightforward reactive model could re-evaluate everything upon a change – and that’s not particularly efficient. Thus, one cannot talk about reactive programming without worrying about performance.

There are two prevalent methodologies or “schools of thought” in implementing reactive programming:

Streaming

Think of this as a network where updates or changes (visualize them as streams of water in interconnected pipes) drive computations. Essentially, when an input receives a change, it emits an update that travels through these “pipes”, causing other dependent computations to adjust accordingly. Streaming is eager; it immediately updates.

Memoization

With memoization, computations are stored or “cached”. When the program requires a value, it first checks if it’s been previously computed and cached. If not, it calculates it. Memoization drives computation based on the need for specific values. It’s more of a lazy approach, using stored values when required and updating them based on changes.

Which one should you choose?

When do you employ one over the other? In reactive programs that are complex enough, you’ll likely need a mix of both. Streaming, being eager, is quintessential for tasks where immediate updates are needed, like constructing an index (what’s the point in building an index lazily?). Conversely, for processes like determining a transitive closure (i.e., a set of all reachable nodes), the lazy, cached (or memoized) approach is more efficient.

At SkipLabs, all of our reactive programs are written in SKIP. And SKIP defines a Reactive Store called SKStore to structure the graph of computation. SKStore has two types of directories: ‘eager’ and ‘lazy’. Eager directories are the moral equivalent of streaming; they update quickly but take up more memory, like fast-acting storage spaces. Lazy directories, on the other hand, work like memoization; they update slower, waiting until you ask for something, and use less memory because they can “forget” values that aren’t needed.

Let’s take an example. First, we will need an input directory. All input directories are eager (by definition).

  // Keys are integers and values also
  myInput = SKStore.mkdir(
    context,
    SKStore.DirName::create("/input/")
  );

Now, we can create an index out of this input directory, using the “map” method. The result is an eager directory called /index/.

  // To build an index we need an eager (streaming) approach
  index = myInput.map(
    SKStore.IID::keyType,     // The type of the keys
    SKStore.IntFile::type,    // The type of the values
    context,
    SKStore.DirName::create("/index/"),
    (context, writer, key, values) ~> {
      for(value in values) {
        writer.write(
  	SKStore.IID(value.value),
  	SKStore.IntFile(key.value));
      }
    }
  );

That went well, because we want the data in myInput to “drive” the computation. When myInput changes, we want the index to update accordingly. But that is not always going to be the case. Some times, it is much more convenient to let the values drive the computation. For example, to compute the transitive closure of the key/value pairs involved. What would the code look like with an eager directory? It would be rather unnatural. You would have to keep on iterating to find a fixpoint.

The code is a lot more natural using a lazy directory. Because they let us look up the objects we need directly.

  // To build the transitive closure, lazy is better
  transitiveClosure = LHandle::create(
    SKStore.IID::keyType,     // The type of the keys
    SKStore.IntFile::type,    // The type of the values
	context,
	SKStore.DirName::create("/transClosure/"),
	(context, self, key) ~> {
	  assocs = myInput.getArray(key);
	  result = mutable Vector[];
	  for(file in assocs) {
	    result.extend(self.getArray(file.value));
	  };
	  result.toArray();
	}
  );

Conclusion

In the world of reactive programming, there’s a distinct difference between ‘eager’ (streaming) and ‘lazy’ (memoization). Eager directories are primarily driven by updates, meaning they actively respond and adjust to changes in their input directories. Think of them as always on their toes, ready to adapt the moment something changes. On the other hand, lazy directories, instead of being proactive, react only when a user specifically requests a value. This distinction helps in optimizing memory usage and responsiveness based on the specific needs of the system.

In simple words, a reactive program is a program that changes when its inputs do. With SKStore, programmers can mix both streaming and memoization methods, using the same abstraction, to build fast and effective systems.

Streaming vs Memoization

Streaming vs Memoization