The BSE Engine
==============

The following outlines the conceptual approach used to simulate
flow-system behavior and provides details of the implementation
along the way.

Introductory remarks:
---------------------
The BSE Engine simulates signal flow systems by mapping them
onto a network of signal processing modules which are connected
via signal streams.
The signal streams are value quantized and time discrete, where
floats are used to store the quantized values and the samples are
taken at arbitrary but equidistant points in time. Also, the
sampling periods are assumed to be synchronous for all nodes.
In the public BSE C API, engine modules are exposed as BseModule
structures; in the internal engine implementation, each BseModule
is embedded in an EngineNode structure.
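
To illustrate this embedding, a strongly simplified sketch; the
field names apart from the embedded BseModule are merely
illustrative, not the engine's actual layout:

    /* simplified sketch of the public/internal split; the real
     * structures carry many more fields than shown here */
    typedef struct {
      /* public engine API view of a processing module */
      const struct BseModuleClass *klass;      /* process(), reset(), ... */
      void                        *user_data;
    } BseModule;

    typedef struct {
      BseModule module;     /* public part, embedded as first member */
      /* internal fields: connections, scheduling state, time stamps, ... */
      unsigned int sched_tag : 1;
    } EngineNode;

    /* the engine can recover its internal node from a public module */
    #define ENGINE_NODE(module)   ((EngineNode *) (module))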

Node Model:
-----------
* a node has n_istreams input streams
* a node has n_jstreams joint input stream facilities, that is,
  an unlimited number of output streams can be connected to each
  "joint" input stream
* a node has n_ostreams output streams
* all streams are equally value quantized as IEEE 754 floats,
  usually within the range -1..1
* all streams are synchronously time discrete
* the flow-system behavior can be iteratively approximated
  by calculating a node's output streams from its input streams
* since all streams are equally time discrete, n output
  values for all output streams can be calculated from n input
  values at all input streams of a single network
* some nodes always react with a delay ("deferred" nodes) and
  guarantee that they can always produce n output values ahead
  of receiving the corresponding n input values, with n>=1
* a node that has no output facilities (n_ostreams==0) or
  has none of its output streams connected and is flagged by
  the user as a "consumer node" must be processed

Node Methods:
-------------
->process()
  This method specifies through one of its arguments
  the number of iterations the node has to perform,
  and therefore the number of values that are supplied
  in its stream input buffers and which have to be supplied
  in its stream output buffers.
->process_deferred()
  This method specifies the number of input values supplied
  and the number of output values that should be supplied.
  The number of input values may be smaller than the number
  of output values requested, in which case the node may return
  fewer output values than requested.
->reset()
  The purpose of this method is to reset local state kept in
  a node to the initial state it possessed before the first call
  to process() or process_deferred().
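
To illustrate the process() and reset() contracts, here is a minimal
sketch of a gain node with one input and one output stream; the
callback signatures are simplified assumptions, the engine's real
callbacks operate on BseModule and engine-specific types:

    /* local state of a hypothetical gain node */
    typedef struct {
      const float *in;      /* input stream buffer */
      float       *out;     /* output stream buffer */
      float        gain;    /* local node state */
    } GainNode;

    /* process(): n_values iterations are requested, i.e. n_values
     * input values are supplied and n_values output values have to
     * be produced */
    static void
    gain_process (GainNode *node, unsigned int n_values)
    {
      for (unsigned int i = 0; i < n_values; i++)
        node->out[i] = node->in[i] * node->gain;
    }

    /* reset(): restore the state the node had before the first call
     * to process() or process_deferred() */
    static void
    gain_reset (GainNode *node)
    {
      node->gain = 1.0f;
    }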

Node Relationships:
-------------------
Node B is an "input" of node A if:
  * one of A's input streams is connected to one of B's output streams,
  or
  * node C is an "input" of A and B is an "input" of C

Processing Order:
-----------------
If node A has an input node B and A is not a deferred node, B has to
be processed prior to processing A.

Connection Cycles:
------------------
Nodes A and B "form a cycle" if A is an input to B and B is an
input to A.

Invalid Connections:
--------------------
For nodes A and B (not necessarily distinct) which form a cycle,
the connections that the cycle consists of are only valid if
there exists a node C for which the following is true:
  (C is a deferred node) and
  (C==A, or C==B, or completely disconnecting C would cause the
  nodes A and B to no longer form the cycle)
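
In other words, every cycle has to contain at least one deferred
node. A minimal sketch of that check, assuming the nodes making up
the cycle have already been collected:

    #include <stdbool.h>
    #include <stddef.h>

    typedef struct {
      unsigned int deferred : 1;    /* node always reacts with a delay */
    } CycleNode;

    /* the connections of a cycle are valid only if at least one node
     * on the cycle (possibly A or B themselves) is deferred */
    static bool
    cycle_is_valid (CycleNode *const *cycle_nodes, size_t n_nodes)
    {
      for (size_t i = 0; i < n_nodes; i++)
        if (cycle_nodes[i]->deferred)
          return true;
      return false;
    }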

Implementation Notes
====================
* if a node is deferred, all output channels are delayed
* independent leaf nodes (nodes that have no inputs) can be
  scheduled separately
* nodes contained in a cycle have to be scheduled together

Scheduling Algorithm
--------------------
To schedule a consumer and its dependency nodes, schedule_query() it:

Query and Schedule Node:
* tag current node
* ignore scheduled input nodes
* schedule_query_node on untagged input nodes, then do one of:
  * schedule the input node (if it has no cycles)
  * resolve all of the input node's cycles and then schedule
    the input node's cycle (if self is not part of the cycle)
  * take over cycle dependencies from the input node
* a tagged node is added to the precondition list (opens new cycle)
* own leaf level is MAX() of input node leaf-levels + 1
* untag node
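
A strongly simplified sketch of this recursion; cycle resolution and
the precondition bookkeeping are only hinted at in comments, and all
type and field names are assumptions:

    typedef struct SchedNode SchedNode;
    struct SchedNode {
      SchedNode   **inputs;
      unsigned int  n_inputs;
      unsigned int  leaf_level;
      unsigned int  tagged : 1;       /* currently on the recursion stack */
      unsigned int  scheduled : 1;    /* already part of the schedule */
    };

    static void
    schedule_query_node (SchedNode *node)
    {
      unsigned int level = 0;
      node->tagged = 1;                        /* tag current node */
      for (unsigned int i = 0; i < node->n_inputs; i++)
        {
          SchedNode *input = node->inputs[i];
          if (input->tagged)
            {
              /* back edge: add `input' to the precondition list,
               * which opens a new cycle (bookkeeping omitted) */
              continue;
            }
          if (!input->scheduled)
            {
              schedule_query_node (input);
              /* now: schedule the input node, or resolve and schedule
               * its cycle, or take over its cycle dependencies (omitted) */
            }
          if (input->leaf_level + 1 > level)
            level = input->leaf_level + 1;     /* MAX() of input leaf-levels + 1 */
        }
      node->leaf_level = level;
      node->tagged = 0;                        /* untag node */
    }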

Resolving Cycles:
* at each scheduling stage, eliminate immediate child from
  precondition list; once the list is empty the cycle is resolved
* at least one node being eliminated has to be deferred
  for the cycle to be valid

Scheduling:
* nodes need to be processed in the order of leaf-level
* within leaf-levels, processing is determined by a per-node
  processing costs hint (cheap, normal, expensive)
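
Expressed as a sorting key, the processing order could be sketched
as follows; the cost enumeration and the direction of the cost
ordering within a leaf-level are assumptions:

    #include <stdlib.h>

    typedef enum { COST_CHEAP, COST_NORMAL, COST_EXPENSIVE } CostHint;

    typedef struct {
      unsigned int leaf_level;    /* primary key: lower levels first */
      CostHint     cost;          /* secondary key within a leaf-level */
    } ScheduleEntry;

    /* qsort(3)-compatible comparison for schedule entries */
    static int
    schedule_entry_cmp (const void *p1, const void *p2)
    {
      const ScheduleEntry *a = p1, *b = p2;
      if (a->leaf_level != b->leaf_level)
        return a->leaf_level < b->leaf_level ? -1 : +1;
      return (int) a->cost - (int) b->cost;
    }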

Suspending and sample accurate activation
-----------------------------------------
In music synthesis practice, large branches of a flow-graph are
often unused, e.g. a certain branch might make up a logical voice
which stays muted for long periods. To save processing power, nodes
can be suspended and resumed at arbitrary times (not necessarily
block boundaries). Being suspended causes a node's processing method
to be skipped and its output buffers to be filled up with zeros.
Ordinary connection/disconnection of nodes is not sufficient to
fulfill this purpose, because connection changes can only happen
at block boundaries, while sample accurate timing is required
(for voice activation), and nodes which hold internal state usually
need to be reset (by means of calling their reset() method) upon
resumption.
Suspending a node automatically suspends its input nodes unless
they have outputs connected to other nodes which are not suspended.
Due to the structural nature of connections within branches
(connections may form cycles or rhomboids of nodes), two propagation
passes are required to propagate the activation (suspend) time
through a branch.
In order to determine the activation time of a node for which an output
node is known to have been suspended (this output node doesn't need to 
be directly connected), all directly connected output nodes need to be
examined and must contain updated activation time stamps already.
It is thus mandatory to maintain information about the validity of a
node's activation time stamp, which results in the requirement for
a second recursion pass.
During the first pass, all possibly affected nodes (that is, all
inputs of the node whose activation time changed) are flagged as
requiring an update of their activation time.
During the second pass, a node's activation time is determined from the
activation time of the nodes connected to its outputs (which may first
need updating themselves).
Various optimizations are possible during the two recursion passes,
e.g. by allowing suspending only at block boundaries or by integrating
one recursion pass with other recursive tasks on the graph (scheduling).
Nodes should not be activated or suspended due to a cyclic connection
to one of their outputs, so, to determine the activation time of a
node, cyclic connection paths must be ignored.
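
A condensed sketch of the two passes; function and field names are
assumptions, and the handling of cyclic connection paths as well as
of virtual modules (see below) is omitted:

    /* activation time stamp in sample ticks; ~0 means "suspended" */
    typedef unsigned long long TimeStamp;

    typedef struct SNode SNode;
    struct SNode {
      SNode       **inputs, **outputs;
      unsigned int  n_inputs, n_outputs;
      TimeStamp     activation_time;
      unsigned int  needs_update : 1;    /* activation_time is stale */
    };

    /* pass 1: flag all (direct and indirect) inputs of the node whose
     * activation time changed as requiring an update */
    static void
    flag_inputs (SNode *node)
    {
      for (unsigned int i = 0; i < node->n_inputs; i++)
        if (!node->inputs[i]->needs_update)
          {
            node->inputs[i]->needs_update = 1;
            flag_inputs (node->inputs[i]);
          }
    }

    /* pass 2: derive a node's activation time from the nodes connected
     * to its outputs (the earliest activation wins), updating those
     * first if their time stamps are stale themselves; cyclic output
     * paths must be ignored here (omitted) */
    static TimeStamp
    update_activation_time (SNode *node)
    {
      if (node->needs_update)
        {
          TimeStamp earliest = ~(TimeStamp) 0;     /* stays "suspended" */
          for (unsigned int i = 0; i < node->n_outputs; i++)
            {
              TimeStamp t = update_activation_time (node->outputs[i]);
              if (t < earliest)
                earliest = t;
            }
          node->activation_time = earliest;
          node->needs_update = 0;
        }
      return node->activation_time;
    }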

Virtual modules
---------------
In order to ease implementations of object networks which use the
engine for the actual calculations, virtual modules are supported.
Virtual modules are not scheduled and thus don't consume processor
cycles during the calculation phase. Their input/output streams are
mere reconnection points for streams of real processing modules.
As virtual modules are not part of the active calculation schedule,
flow and suspend jobs may not be queued on them, but input recursive
suspend/activation propagation (and thus indirect resumption)
propagates correctly across virtual modules (though not separately
along their input/output streams).
To let activation time propagate per-stream, multiple virtual
modules can be used.

Deferred Node Implementation Considerations:
--------------------------------------------
For deferred nodes, the number n specifying the amount of output
values that are produced ahead of input can be considered
mostly-fixed. That is, it's unlikely to change often and will do
so only at block boundaries.
Supporting n to be completely variable or considering it mostly
fixed has certain implications. Here are the considerations that
led to supporting a completely variable n for the implementation:

n is block-boundary fixed:
+ for complex cycles (i.e. cycles that contain other cycles,
  "subcycles"), the subcycles can be scheduled separately
  if the n of the subcycle is >= block_size
- if n is the only thing that changed at a block-boundary,
  rescheduling the flow-graph is required in the cases
  where n = old_n + x with old_n < block_size or if x < 0
- deferred nodes cannot change their delay in response to
  values of an input stream

n is variable for every iteration step:
+ no rescheduling is required if n changes at block-boundary
- subcycles cannot be scheduled separately from their outermost
  cycle
+ the delay of deferred nodes can correlate to an input stream

Threads, communication, main loops
==================================

Thread types:
* UserThread; for the scope of the engine (the functions exposed in
  bseengine.h), only one user thread may execute API functions
  at a time.
  That is, if more than one user thread needs to call engine API
  functions, the user has to take measures to avoid concurrency
  in calling these functions, e.g. by using an SfiMutex which is
  locked around engine API calls (see the sketch after this list).
* MasterThread; the engine, if configured accordingly,
  sets up one master thread which
  - processes transactions from the UserThread
  - schedules processing order of engine modules
  - processes single modules when required
  - processes module cycles when required
  - passes back processed transactions and flow jobs to the
    UserThread for garbage collection.
* SlaveThread; the engine can be configured to spawn slave threads which,
  in addition to the master thread
  - process single modules when required
  - process module cycles when required.
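
As an example of the UserThread serialization requirement mentioned
above, here is a sketch; the sfi_mutex_lock()/sfi_mutex_unlock()
calls and the header path are assumptions and should be checked
against the actual SFI threading API:

    #include <bse/bseengine.h>    /* engine API and SfiMutex (path may differ) */

    /* one mutex shared by all user threads that call into the engine;
     * it has to be initialized via the appropriate SFI call (omitted) */
    static SfiMutex engine_api_mutex;

    static void
    user_thread_engine_call (void (*engine_call) (void *data), void *data)
    {
      sfi_mutex_lock (&engine_api_mutex);     /* serialize engine API usage */
      engine_call (data);                     /* any function from bseengine.h */
      sfi_mutex_unlock (&engine_api_mutex);
    }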

Communication at thread boundaries:
* Job transaction queue; the UserThread constructs job transactions and
  enqueues them for the MasterThread. The UserThread also dequeues already
  processed transactions, in order for destroy functions of modules and
  accessors to only be executed within the UserThread.
  Also, the UserThread can wait (block) until all pending transactions
  have been processed by the MasterThread (in order to sync state with
  the module network contained in the engine).
* Flow job collection list; the MasterThread adds processed flow jobs into
  a collection queue, the UserThread then collects the queued flow jobs
  and frees them.
* Module/cycle pool; the MasterThread fills in the module/cycle pool with
  modules which need to be processed. The MasterThread and the SlaveThreads
  pop modules/cycles from this pool, process them, and push back processed
  nodes.
* load control; // FIXME

Main loop integration:
In order to process certain engine modules only from within
the UserThread, and to drive the engine even without master
or slave threads, the engine can be hooked up to a main loop
mechanism supplied by the UserThread.
The engine provides API entry points to:
- export file descriptors and timeout, suitable for main loop backends
  such as poll(2)
- check whether dispatching is necessary
- dispatch outstanding work to be performed by the engine
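
A sketch of what such a hookup could look like in the UserThread;
the engine_prepare()/engine_check()/engine_dispatch() declarations
are placeholders mirroring the three entry points above, the actual
prototypes are found in bseengine.h:

    #include <poll.h>

    /* placeholder declarations for the three entry points described
     * above; the real prototypes live in bseengine.h */
    extern unsigned int engine_prepare  (struct pollfd *fds,
                                         unsigned int   max_fds,
                                         int           *timeout_ms);
    extern int          engine_check    (const struct pollfd *fds,
                                         unsigned int          n_fds);
    extern void         engine_dispatch (void);

    static void
    user_thread_main_loop (void)
    {
      struct pollfd fds[16];
      for (;;)
        {
          int timeout_ms = -1;
          unsigned int n_fds = engine_prepare (fds, 16, &timeout_ms);
          poll (fds, n_fds, timeout_ms);       /* block on fds and/or timeout */
          if (engine_check (fds, n_fds))       /* dispatching necessary? */
            engine_dispatch ();                /* perform outstanding work */
        }
    }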

TODO:
=====
- self-input cycles need to be resolved in parent as well
- needs description of pollfd/callback jobs
- load control for slave threads

Nov 27 2004	Tim Janik
	* reflect renames in the code base
Sep 06 2003	Tim Janik
	* introduce time stamps indicating suspend state
	  of a node (activation time)
	* update TODO
Jul 15 2002	Tim Janik
	* describe virtual modules
Jun 30 2002	Tim Janik
	* describe reset()
	* describe suspension
	* fix consumer node definition
	* TODO cleanups
Jan 07 2002	Tim Janik
	* cosmetic updates, flow jobs
Aug 19 2001	Tim Janik
	* notes on threads, communication, main loops
Jul 29 2001	Tim Janik
	* wording/spelling fixups
May 05 2001	Tim Janik
	* initial writeup

LocalWords:  BSE API BseModule EngineNode istreams ostreams A's B's sync
LocalWords:  bseengine SfiMutex UserThread MasterThread SlaveThread SlaveThreads