# Documentation for pyadjoint¶

pyadjoint is a generic algorithmic differentiation framework. The aim of this page is to introduce the reader to how pyadjoint works and how it can be used. It can be useful to also have a look at the pyadjoint API. We stress that this is not an algorithmic differentiation tutorial. For an introduction to algorithmic differentiation see for example the book by Naumann

## The core idea¶

In general, we will be interested in differentiating some output vector \(y \in \mathbb{R}^m\) with respect to some input vector \(x \in \mathbb{R}^n\). Assuming we can decompose the forward computations into

then the derivative of the output in direction \(\hat{x}\) is

for some intermediate solutions \(w_i, i = 1, 2, ..., k-1\). In reverse mode algorithmic differentiation, we are interested in the adjoints:

for some weights \(\bar{y} \in \mathbb{R}^m\). In order to compute the adjoints we must somehow remember all the operations performed in the forward computations, and either remember or recompute the intermediate variables.

In pyadjoint, the `Tape`

class is responsible for remembering these operations.
The operations are stored in a `list`

, and can be thought of as nodes in a directed acyclic graph.
The `Tape`

objects thus offer ways to do graph manipulations and visualisations.

The operations or nodes are represented by instances of subclasses of `Block`

.
These subclasses implement operation-specific methods, while the parent class, `Block`

,
implements common methods for use with the tape.
These `Block`

instances also keep track of their inputs and outputs,
which are represented by `BlockVariable`

instances.

For all this to be set up correctly, we require a way to populate the `Tape`

and instantiate `Block`

objects.
To achieve this, we create *overloaded functions* that replace original functions and
that are responsible for both of these tasks.
An overloaded function should, for the user, act exactly as the original function.

In addition, to deal with mutating data objects used as inputs/outputs,
all data types should have an overloaded counterpart.
The overloaded data-types should inherit from both the original data-type and the pyadjoint class `OverloadedType`

.
`OverloadedType`

ensures that the data-type has a `BlockVariable`

instance attached,
and declares some abstract methods for interaction with pyadjoint API functions that should be implemented by
the specific data-types.

The core classes of pyadjoint are thus `Tape`

, `Block`

, `BlockVariable`

and `OverloadedType`

.
We will now discuss each class individually, starting with `OverloadedType`

.

## OverloadedType¶

The pyadjoint user-API provides several useful functions that act on the tape.
For example, the function `taylor_test()`

for verifying implementation, and `minimize()`

in the optimization subpackage for
minimizing functionals. To allow these functions to work without any knowledge of the structure of the data-types,
some logic is moved into abstract methods of the `OverloadedType`

, and are expected to be implemented
for the individual data-types. At pyadjoint API you can see the individual abstract methods.
Some methods are more important than others, because some of the abstract methods are only required for specific functions,
while for instance `OverloadedType._ad_create_checkpoint()`

and `OverloadedType._ad_restore_at_checkpoint()`

are required for just working with the tape at all.

The `OverloadedType`

class also has a single attribute, `block_variable`

, which holds an instance of `BlockVariable`

.
In addition it defines the method `OverloadedType.create_block_variable()`

which sets `block_variable`

attribute to a new
`BlockVariable`

instance, and returns it. This is used when adding the data-type as an output to a block.
More information on that below.

To ensure that all pyadjoint specific methods are available, all data-type instances exposed to the end-user
must be converted to overloaded versions.
This is achieved through the `create_overloaded_object()`

function,
which combines a dictionary mapping original data-types to overloaded data-types, and the individually implemented
`OverloadedType._ad_init_object()`

method.

To populate the dictionary map, one must call `register_overloaded_type()`

.
This can conveniently be accomplished by using the function as a decorator when defining the overloaded data-type.
In that case, you must use `OverloadedType`

as the first base class, and the original data-type as second base class.
Apart from implementing the abstract methods, one must also remember to call the constructor `OverloadedType.__init__()`

in the overloaded data-type constructor.

## BlockVariable¶

To track intermediate solutions, pyadjoint employs the class `BlockVariable`

.
Storing `interm_sol = y`

does not guarantee that `interm_sol`

remains the same until the end
of the program execution if `y`

is a mutable type.
Thus, to ensure that the right values are kept, we create copies of the values used as input/output to operations.

Every time an instance of a data-type changes values, it should be assigned a new `BlockVariable`

.
Hence, `BlockVariable`

can be thought of as an identifier for a specific version of a specific data-type object.

The `BlockVariable`

class is quite simple.
It holds a reference to the `OverloadedType`

instance that created it, a checkpoint,
some attributes for storing values in adjoint and tangent linear sweeps, and some flags.

The checkpoint is a copy of the values of the data-type (`OverloadedType`

) instance.
It does not have to be an exact copy of the instance.
All that is required is that it is possible to restore an instance of the same type with the same values
at a later point.
This is implemented in the `OverloadedType._ad_create_checkpoint()`

and `OverloadedType._ad_restore_at_checkpoint()`

methods in the `OverloadedType`

class.
As an example, if a data-type was a function parameterised by a `float`

, then a checkpont only requires storing this `float`

,
and the restoring method can create the same function using the same parameter value.

The attribute `tlm_value`

holds the derivative direction for the forward mode AD sweep.
This should be an instance of the same type as the corresponding `OverloadedType`

instance.

The attribute `adj_value`

holds the computed adjoint derivative action (can be thought of as the gradient).
The type is in theory up to the block implementations, but to ensure compatibility across different blocks it should
be an array-like type, such as a numpy array.
Also it must be ensured that the choice of inner product is consistent between blocks.
Thus, it is recommended that all blocks employ the \(l^2\) inner product, i.e \((u, v)_{l^2} = u^Tv\)
where \(u, v \in \mathbb{R}^n\).
If the gradient with some other inner-product is desired, one can use Riesz representation theorem
in the `OverloadedType._ad_convert_type()`

method of `OverloadedType`

.

The attribute `hessian_value`

holds the computed hessian vector action.
This should have the same type as `adj_value`

.

## Block¶

Before we go into how Blocks are implemented, let us take a look at a basic implementation of an overloaded function.
Instead of using `overload_function()`

we manually define the overloaded function in a similar way that the
pyadjoint function would automatically do for you.

```
backend_example_function = example_function
def example_function(*args, **kwargs):
annotate = annotate_tape(kwargs)
if annotate:
tape = get_working_tape()
b_kwargs = ExampleBlock.pop_kwargs(kwargs)
b_kwargs.update(kwargs)
block = ExampleBlock(*args, **b_kwargs)
tape.add_block(block)
with stop_annotating():
output = backend_example_function(*args, **kwargs)
output = create_overloaded_object(output)
if annotate:
block.add_output(output.create_block_variable())
return output
```

Let us go line by line through this. First we store a reference to the original function,
then we start defining the overloaded function.
Since overloaded functions can take some extra keyword arguments, one should use varying length keyword arguments
in the function definition.
Then we pass the keyword arguments to the pyadjoint function `annotate_tape()`

.
This will try to pop the keyword argument *annotate* from the keyword arguments dictionary,
and return whether annotation is turned on. If annotation is turned on, we must add the operation to the tape.
We first fetch the current tape using `get_working_tape()`

, then we pop block-specific keyword arguments
and merge them with the actual keyword arguments. These are then used when we instantiate the block, which
in our example is `ExampleBlock`

. Then the block instance is added to the tape.

No matter if we annotate or not, we must run the original function.
To prevent the inner code of the original function to be annotated, we use the pyadjoint context manager `stop_annotating()`

.
After calling the original function, we convert the returned output to `OverloadedType`

.
Finally, if we are annotating then we create a new block variable for the output and add it as output of the block.

We now focus on the implementation of the block (`ExampleBlock`

in the case above).
The implementation of the constructor of the Block is largely up to the implementing user,
as the main requirement is that the overloaded function and the block constructor are on the same page
regarding how inputs/outputs are passed and what should be handled in the constructor and what is handled in the overloaded function.

For our example above, the constructor must first call the parent-class constructor, and also add the *dependencies* (inputs)
using the `Block.add_dependency()`

method. This method takes a block variable as input and appends it to a `list`

,
and thus it is important that all objects that are to be added to the dependencies should be an overloaded type.
Below we show an example of a block constructor.

```
class ExampleBlock(Block):
def __init__(self, *args, **kwargs):
super(ExampleBlock, self).__init__()
self.kwargs = kwargs
for arg in args:
self.add_dependency(arg.block_variable)
```

Note that not necessarily all arguments need to be dependencies. Only the inputs for which we wish to enable derivatives are strictly needed as dependencies.

Similarly to the dependencies, the output is also a list of block variables.
Although it is often not needed, we can obtain the list of dependencies or outputs using the `Block.get_dependencies()`

and `Block.get_outputs()`

methods.
It is important to note that the block only stores `BlockVariable`

instances in these lists, and that to get the real values you need to access attributes of the `BlockVariable`

.
For example, to restore the checkpoint and get the restored object, use `x = block_variable.saved_output`

.

The core methods of `Block`

that allow for recomputations and derivatives to be computed are
`Block.recompute()`

, `Block.evaluate_adj()`

, `Block.evaluate_tlm()`

and `Block.evaluate_hessian()`

.
These methods are implemented in the abstract `Block`

class, and by default delegate
to the abstract methods `*_component()`

(i.e `Block.evaluate_adj_component()`

).

We first inspect how `Block.recompute()`

works.
The core idea is to use dependency checkpoints to compute new outputs and overwrite the output checkpoints with these new values.
In the most basic form, the recompute method can be implemented as follows.

```
def recompute(self, markings=False):
x = self.get_dependencies()[0].saved_output
y = backend_example_function(x)
self.get_outputs()[0].checkpoint = y
```

Here we have assumed that there is only one real dependency, hence `self.get_dependencies()`

is a list of
length one. Similarly we assume that this is the only input needed to the original function, and that the
output is given explicitly through the return value of the original function. Lastly, we assume that the
block has only one output and thus the length of `self.get_outputs()`

is one.

The optional keyword argument `markings`

is set to `True`

when relevant block variables have been flagged.
In that case, the recompute implementation can do optimizations by not recomputing outputs that are not relevant for
what the user is interested in.

This unwrapping and working with attributes of `BlockVariable`

instances may seem unnecessarily complicated,
but it offers great flexibility.
The `Block.recompute_component()`

method tries to impose a more rigid structure,
but can be replaced by individual blocks by just overloading the `Block.recompute()`

method directly.

The following is an example of the same implementation with `Block.recompute_component()`

```
def recompute_component(self, inputs, block_variable, idx, prepared):
return backend_example_function(inputs[0])
```

Here the typically important variables are already sorted for you. `inputs`

is a list of the new input values
i.e the same as making a list of the `saved_output`

of all the dependencies.
Furthermore, each call to the `Block.recompute_compontent()`

method is only for recomputing a single output,
thus alleviating the need for code that optimizes based on block variable flags when `markings == True`

.
The `block_variable`

parameter is the block variable of the output to recompute, while the `idx`

is
the index of the output in the `self.get_outputs()`

list.

Sometimes you might want to do something once, that is common for all output recomputations.
For example, your original function might return all the outputs, or you must prepare the input in a special way.
Instead of doing this repeatedly for each call to `Block.recompute_component()`

,
one can implement the method `Block.prepare_recompute_component()`

. This method by default returns `None`

,
but can return anything. The return value is supplied to the `prepared`

argument of `Block.recompute_component()`

.
For each time `Block.recompute()`

is called, `Block.prepare_recompute_component()`

is called once and
`Block.recompute_component()`

is called once for each relevant output.

Now we take a look at `Block.evaluate_tlm()`

. This method is used for the forward AD sweep and should
compute the Jacobian vector product. More precisely, using the decomposition above, the method should compute

where \(\hat{w}_i\) is some derivative direction, and \(g_{i+1}\) is the operation represented by the block.
In `Block.evaluate_tlm()`

, \(\hat{w}_i\) has the same type as the function inputs (block dependencies) \(w_{i}\).
The following is a sketch of how `Block.evaluate_tlm()`

can be implemented

```
def evaluate_tlm(self, markings=False):
x = self.get_dependencies()[0].saved_output
x_hat = self.get_dependencies()[0].tlm_value
y_hat = derivative_example_function(x, x_hat)
self.get_outputs()[0].add_tlm_output(y_hat)
```

We have again assumed that the example function only has one input and one output.
Furthermore, we assume that we have implemented some derivative function in `derivative_example_function()`

.
The last line is the way to propagate the derivative directions forward in the tape.
It essentially just adds the value to the `tlm_value`

attribute of the output block variable,
so that the next block can fetch it using `tlm_value`

.

As with the recompute method, pyadjoint also offers a default `Block.evaluate_tlm()`

implementation,
that delegates to `Block.evaluate_tlm_component()`

for each output.
In our case, with only one output, the component method could look like this

```
def evaluate_tlm_component(self, inputs, tlm_inputs, block_variable, idx, prepared):
return derivative_example_function(inputs[0], tlm_inputs[0])
```

The `prepared`

parameter can be populated in the `Block.prepare_evaluate_tlm()`

method.

`Block.evaluate_adj()`

is responsible for computing the adjoint action or vector Jacobian product.
Using the notation above, `Block.evaluate_adj()`

should compute the following

where the adjoint operator should be defined through the \(l^2\) inner product. Assuming \(g_{i} : \mathbb{R}^n \rightarrow \mathbb{R}^m\), then the adjoint should be defined by

for all \(u \in \mathbb{R}^n, v \in \mathbb{R}^m\). Where \((a, b)_{\mathbb{R}^k} = a^Tb\) for all \(a,b \in \mathbb{R}^k, k \in \mathbb{N}\).

Using the same assumptions as earlier the implementation could look similar to this

```
def evaluate_adj(self, markings=False):
y_bar = self.get_outputs()[0].adj_value
x = self.get_dependencies()[0].saved_output
x_bar = derivative_adj_example_function(x, y_bar)
self.get_dependencies()[0].add_adj_output(x_bar)
```

There is also a default implementation for `Block.evaluate_adj()`

,
that calls the method `Block.evaluate_adj_component()`

for each relevant dependency.
This method could be implemented as follows

```
def evaluate_adj_component(self, inputs, adj_inputs, block_variable, idx, prepared):
return derivative_adj_example_function(inputs[0], adj_inputs[0])
```

If there is any common computations across dependencies, these can be implemented in
`Block.prepare_evaluate_adj()`

.

## Tape¶

As we have seen, we store the created block instances in a `Tape`

instance.
Each `Tape`

instance holds a `list`

of the block instances added to it.
There can exists multiple `Tape`

instances, but only one can be the current *working tape*.
The working tape is the tape which is annotated to, i.e in which we will store any block instances created.
It is also the tape that is by default interacted with when you run different pyadjoint functions that rely on
a tape. The current working tape can be set and retrieved with the functions `set_working_tape()`

and
`get_working_tape()`

.

Annotation can be temporarily disabled using `pause_annotation()`

and enabled again using `continue_annotation()`

.
It is recommended to use `stop_annotating`

and `no_annotations()`

for annotation control.
`stop_annotating`

is a context manager and should be used as follows

```
with stop_annotating():
# Code without annotation
...
```

`no_annotations()`

is a decorator for disabling annotation within functions or methods.
To check if annotation is enabled, use the function `annotate_tape()`

.

Apart from storing the block instances, the `Tape`

class offers a few methods for interaction
with the computational graph. `Tape.visualise()`

can be used to visualise the computational graph
in a graph format. This can be useful for debugging purposes. `Tape.optimize()`

offers a way to
remove block instances that are not required for a reduced function. For optimizing the tape based on either
a reduced output or input space, use the methods `Tape.optimize_for_functionals()`

and `Tape.optimize_for_controls()`

.
Because these optimize methods mutate the tape, it can be useful to use the `Tape.copy()`

method to
keep a copy of the original list of block instances.
To add block instances to the tape and retrieve the list of block instances, use `Tape.add_block()`

and `Tape.get_blocks()`

.

Other `Tape`

methods are primarily used internally and users will rarely access these directly.
However, it can be useful to know and use these methods when implementing custom overloaded functions.
The tape instance methods that activate the `Block.evaluate_adj()`

and `Block.evaluate_tlm()`

methods are
`Tape.evaluate_adj()`

, `Tape.evaluate_tlm()`

.
These methods just iterate over all the blocks and call the corresponding evaluate method of the block.
Usually some initialization is required, which is why these methods will likely not be called directly by the user.
For example, for the backward sweep (`Tape.evaluate_adj()`

) to work, you must initialize your functional
adjoint value with the value 1. This is the default behaviour of the `compute_gradient()`

function.

Similarly, to run the `Tape.evaluate_tlm()`

properly, a direction, \(\hat{x}\), must be specified.
This can be done as follows

```
y = example_function(x)
x.block_variable.tlm_value = x_hat
tape = get_working_tape()
tape.evaluate_tlm()
dydx = y.block_variable.tlm_value
```

In a similar way, one can compute the gradient without using `compute_gradient()`

```
y = example_function(x)
y.block_variable.adj_value = y_bar
tape = get_working_tape()
tape.evaluate_adj()
grady = x.block_variable.adj_value
```

Where `y_bar`

could be 1 if `y`

is a float.
However, `compute_gradient()`

also performs other convenient operations.
For example, it utilizes the markings flag in the `Block.evaluate_adj()`

method.
The markings are applied using the context manager `Tape.marked_nodes()`

.
In addition, `compute_gradient()`

converts `adj_value`

to overloaded types using the
`OverloadedType._ad_convert_type()`

method.