Basics

Note

What you see in this notebook depends on whether you have run it before and written annotations to the annotations.db database. For reproducibility, the rest of this notebook assumes that annotations.db has been deleted (if it existed).

Setup

This basic example uses a time series in which we annotate various time intervals, illustrating the basics of the annotation system. Annotators can annotate many kinds of elements (e.g., Image, Scatter) with many different region types, as demonstrated later.

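import hvplot.pandas  # registers the .hvplot plotting accessor on pandas objects
import numpy as np
import pandas as pd
from holonote.annotate import Annotator


# Load the example time series and plot SPEED against TIME as a curve
speed_data = pd.read_parquet('../assets/example.parquet')
speed_curve = speed_data.hvplot('TIME', 'SPEED')
speed_curve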

In the simplest case, wrap the element (here, a curve) in an Annotator:

annotator = Annotator(speed_curve, fields=['description'])

The fields argument lists the fields associated with the annotations we will be defining. When working with tabular data (the typical case), you can think of fields as the columns of your table containing information about annotated regions.
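As a rough mental model (a plain-Python sketch, not holonote's actual storage code), each annotation can be pictured as one row keyed by a UUID string, with a start/end column pair for the range region on TIME and one column per field. The column names mimic the start[TIME]/end[TIME] headers in the annotator.df output shown later; the FIELDS list and the make_row helper are hypothetical.

```python
import uuid
from datetime import datetime

# Hypothetical illustration of the tabular layout of annotations:
# one row per annotation, keyed by a UUID string.
FIELDS = ["description"]  # plays the role of fields=['description']


def make_row(start, end, **field_values):
    """Build one annotation row for a single range region on the TIME dimension."""
    missing = set(FIELDS) - set(field_values)
    if missing:
        raise ValueError(f"missing field values: {sorted(missing)}")
    row = {"start[TIME]": start, "end[TIME]": end}
    row.update({name: field_values[name] for name in FIELDS})
    return row


table = {}  # uuid -> row, analogous to the DataFrame index
key = uuid.uuid4().hex
table[key] = make_row(datetime(2022, 6, 6), datetime(2022, 6, 8),
                      description="A programmatically defined annotation")

for k, row in table.items():
    print(k, row["start[TIME]"].date(), row["end[TIME]"].date(), row["description"])
```

In this picture, adding a field to fields simply adds another column to every row.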

Here we supplied an element to the Annotator, but most of the functionality of annotators is also available by specifying only the key dimensions and their types. The following is equivalent to the above declaration:

annotator = Annotator({'TIME': np.datetime64}, fields=['description'])

Now we can create an overlay of our element, a DynamicMap that shows the defined annotation regions, and a DynamicMap used to define new regions:

annotator * speed_curve  # If you have a database file generated by a previous run, your annotations will now be displayed

Note: The tools provided by the region editor depend on both the enabled region types and the dimensionality of the element; here, that is a single key dimension on the x-axis.

Basic operations on annotations

Adding single annotations

Use the select tool to define a region of interest, then run the following cell to annotate it:

annotator.add_annotation(description='My first annotation!')

You can set the range of interest programmatically as well:

annotator.set_regions(TIME=(np.datetime64('2022-06-06'), np.datetime64('2022-06-08')))
annotator.add_annotation(description='A programmatically defined annotation')
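set_regions receives a (start, end) pair for each key dimension. The sketch below shows one way such a pair could be checked against a dimension-to-type specification like {'TIME': np.datetime64}. It is a hypothetical helper, not holonote code, and it swaps in the standard-library datetime type so the example has no dependencies.

```python
from datetime import datetime

# Hypothetical spec analogous to {'TIME': np.datetime64}, using the
# standard-library datetime type so the sketch has no dependencies.
SPEC = {"TIME": datetime}


def check_region(spec, **regions):
    """Validate (start, end) range regions against a dimension -> type spec."""
    for dim, bounds in regions.items():
        if dim not in spec:
            raise KeyError(f"unknown dimension: {dim!r}")
        start, end = bounds
        if not (isinstance(start, spec[dim]) and isinstance(end, spec[dim])):
            raise TypeError(f"{dim} bounds must be {spec[dim].__name__} values")
        if start > end:
            raise ValueError(f"{dim} start must not be after end")
    return regions


result = check_region(SPEC, TIME=(datetime(2022, 6, 6), datetime(2022, 6, 8)))
print(result)
```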

The annotated regions should now appear in the plot above. We can view the collected data as a DataFrame:

annotator.df
                                 start[TIME]   end[TIME]                            description
uuid
31077baf2fbc46f798e385def9c369fa  2022-06-06  2022-06-08  A programmatically defined annotation

Note the uuid index, which is generated automatically by default; it is discussed in the next section.
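The index value shown above is a 32-character hexadecimal string. A value of that shape can be produced with Python's standard uuid module, as sketched below. This only illustrates the format; it is an assumption about, not a copy of, holonote's default key generator, which can be customized as described in the Persisting Annotations notebook.

```python
import uuid

# Generate a random (version 4) UUID and render it as 32 hex characters,
# the same shape as the index values shown in annotator.df.
key = uuid.uuid4().hex
print(key)

# Two random UUIDs practically never collide, which is why such keys can be
# generated without first consulting the database.
other = uuid.uuid4().hex
print(key != other)
```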

To persist these annotations, we call the .commit() method:

annotator.commit()

If you now restart the notebook session, your annotations are loaded and displayed automatically.

Simple selection of annotations

We refer to individual annotations by the uuid index of the DataFrame above. We can use this index directly; for instance, we can get the UUID of the last annotation as follows:

uuid_of_last_annotation = annotator.df.index[-1]
f'Last UUID in the dataframe: {uuid_of_last_annotation}'
'Last UUID in the dataframe: 31077baf2fbc46f798e385def9c369fa'

Note that UUID values are randomly generated by default, so we do not know them ahead of time and need a programmatic way to access them. Using the DataFrame index directly is awkward, so annotators also offer a more natural, interactive way to select annotations: simply click on them in the plot.

Click on a range region in the plot above and run the following cell to see its UUID:

annotator.selected_index  # None if no annotations are selected

Deleting single annotations

Now that we have added some annotations and have a way to select them, we can delete them.

Select an annotation on the plot and run the following cell to delete it (if nothing is selected, the cell falls back to the last annotation in the DataFrame):

# Fall back to the most recent annotation if nothing is selected
selected_index = annotator.selected_index if annotator.selected_index is not None else annotator.df.index[-1]
annotator.delete_annotation(selected_index)

Updating annotations

First, let us add a new annotation to update:

annotator.set_regions(TIME=(np.datetime64('2022-06-15'), np.datetime64('2022-06-18')))
annotator.add_annotation(description='An annotation description we will update...')

Now click on the new annotation and run the following cell:

# Fall back to the most recent annotation if nothing is selected
annotator.update_annotation_fields(
    annotator.selected_index if annotator.selected_index is not None else annotator.df.index[-1],
    description='The description is now updated!',
)

To verify that this worked, hover over the annotation in the plot above: the hover information now shows the updated description.

Remember that none of your annotation changes are persisted in the database until you call commit(), so commit frequently.

annotator.commit()
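The commit behaviour is analogous to a database transaction: changes stay pending until they are committed. The sketch below illustrates that idea with Python's built-in sqlite3 module and a temporary database file. It assumes nothing about holonote's actual schema; the file, table, and column names are made up for illustration.

```python
import os
import sqlite3
import tempfile

# Hypothetical stand-in for an annotations database file.
path = os.path.join(tempfile.mkdtemp(), "demo_annotations.db")

writer = sqlite3.connect(path)  # the "notebook" session making changes
writer.execute("CREATE TABLE annotations (uuid TEXT PRIMARY KEY, description TEXT)")

# Python's sqlite3 opens a transaction implicitly before this INSERT,
# so the row stays pending until commit() is called.
writer.execute("INSERT INTO annotations VALUES (?, ?)", ("abc123", "pending change"))

reader = sqlite3.connect(path)  # e.g. a fresh session opening the same file
before = reader.execute("SELECT COUNT(*) FROM annotations").fetchone()[0]

writer.commit()  # persist the pending change
after = reader.execute("SELECT COUNT(*) FROM annotations").fetchone()[0]

print(before, after)  # 0 before the commit, 1 after
writer.close()
reader.close()
```

A second session only sees what has been committed, which is why annotations that were never committed do not reappear after a restart.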

Explicit primary keys

If necessary, we can specify our own index value when creating an annotation:

input_uuid = 'deadcafe'
description = f'Annotation with set UUID {input_uuid!r}'
annotator.set_regions(TIME=(np.datetime64('2022-06-10'), np.datetime64('2022-06-13')))
annotator.add_annotation(description=description, uuid=input_uuid)
description
"Annotation with set UUID 'deadcafe'"

This option gives you complete control over what is entered into the database. However, because of the following caveats, you should avoid specifying the primary key value yourself unless you need to:

Caveats when picking your own primary keys

While it may occasionally be convenient to give your annotations fixed primary key values in notebooks, be aware that the value you pick is passed directly to the database. This implies the following restrictions:

  • It is your responsibility to ensure the key is unique and not used by any other annotation in the database.

  • It is your responsibility to ensure the key is of a valid type and format for storage in the database.

For these reasons, it is generally best to let the annotator pick key values automatically (a process you can customize, as detailed in the Persisting Annotations notebook) and then refer to annotations via the DataFrame index or interactive selection, as demonstrated previously.
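If you do pick your own keys, you could check both restrictions before adding an annotation. The helper below is a hypothetical sketch, not part of holonote. It treats a valid key as a non-empty alphanumeric string of at most 32 characters, an assumed constraint chosen for illustration; the real constraints depend on your database and schema.

```python
import re

# Assumed constraint for illustration only: non-empty strings of up to
# 32 alphanumeric characters. Adjust to match your actual database schema.
KEY_PATTERN = re.compile(r"[A-Za-z0-9]{1,32}")


def check_explicit_key(key, existing_keys):
    """Raise ValueError if `key` is badly formed or already in use."""
    if not isinstance(key, str) or KEY_PATTERN.fullmatch(key) is None:
        raise ValueError(f"invalid key format: {key!r}")
    if key in existing_keys:
        raise ValueError(f"key already in use: {key!r}")
    return key


existing = {"31077baf2fbc46f798e385def9c369fa"}
print(check_explicit_key("deadcafe", existing))  # deadcafe

try:
    check_explicit_key("dead cafe", existing)
except ValueError as err:
    print(err)  # invalid key format: 'dead cafe'
```

In the notebook, existing_keys could be taken from annotator.df.index, although that may only cover the annotations loaded into this annotator rather than every row in the database.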

Now that we have demonstrated the creation of an explicitly named annotation, we can delete it (along with the remaining annotation) and return to the recommended mechanisms for selecting annotations:

annotator.delete_annotation(['deadcafe', annotator.df.index[0]])  # Example of deleting multiple annotations
annotator.commit()
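The cell above passes a list of keys to delete_annotation, while earlier cells passed a single key. One plausible way to support both forms is to normalize the argument first. The helpers below are a hypothetical sketch of that pattern on a dict-based table, not holonote's code.

```python
def normalize_keys(keys):
    """Return a list of keys from a single key or an iterable of keys."""
    if isinstance(keys, str):
        return [keys]  # a lone string key, not an iterable of characters
    return list(keys)


def delete_rows(table, keys):
    """Delete each key from a dict-based table, failing on unknown keys."""
    keys = normalize_keys(keys)
    unknown = [k for k in keys if k not in table]
    if unknown:
        raise KeyError(f"unknown keys: {unknown}")
    for k in keys:
        del table[k]


table = {"deadcafe": "explicit", "31077baf": "generated"}
delete_rows(table, ["deadcafe", "31077baf"])
print(table)  # {}
```

Checking every key before deleting anything means a typo in one key leaves the table untouched instead of partially deleted.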

Adding and deleting multiple annotations

Loading from a dataframe

Sometimes we already have a DataFrame of annotations that we want to load, and looping over its rows to insert them one at a time would be awkward. Suppose we have the following DataFrame:

starts = pd.date_range("2022-06-06", freq="3D", periods=3)
ends = starts + pd.Timedelta("2D")
descriptions = ["Annotation 0", "Annotation 1", "Annotation 2"]
data = pd.DataFrame({'start':starts, 'end':ends, 'description':descriptions})
data
       start        end   description
0 2022-06-06 2022-06-08  Annotation 0
1 2022-06-09 2022-06-11  Annotation 1
2 2022-06-12 2022-06-14  Annotation 2
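The table follows from simple date arithmetic: the starts are three days apart and each end is two days after its start. The standard datetime module reproduces the same values, which can help when checking the intervals by hand. This is only a cross-check of the pandas code above, not something holonote requires; the check_starts/check_ends names are chosen so as not to overwrite the starts and ends variables used later.

```python
from datetime import date, timedelta

first = date(2022, 6, 6)
# Equivalent of pd.date_range("2022-06-06", freq="3D", periods=3)
check_starts = [first + timedelta(days=3 * i) for i in range(3)]
# Equivalent of starts + pd.Timedelta("2D")
check_ends = [s + timedelta(days=2) for s in check_starts]

for i, (s, e) in enumerate(zip(check_starts, check_ends)):
    print(i, s.isoformat(), e.isoformat(), f"Annotation {i}")
# 0 2022-06-06 2022-06-08 Annotation 0
# 1 2022-06-09 2022-06-11 Annotation 1
# 2 2022-06-12 2022-06-14 Annotation 2
```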

To load this data, we call define_annotations, passing the DataFrame and mapping its columns to the region dimension and the fields:

annotator.define_annotations(data, TIME=("start", "end"), description="description")

If a column name matches the name of a region or a field, that column is used automatically, so the description="description" argument in the line above is not strictly needed.

annotator.commit()
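The column-matching rule for define_annotations can be sketched in plain Python. The function resolve_columns is hypothetical and only mirrors the behaviour described in the text (explicit keyword mappings first, then same-named columns). It is not holonote's implementation, and it assumes a range region is mapped as a (start_column, end_column) pair.

```python
def resolve_columns(columns, fields, regions, **mapping):
    """Map each region dimension and field to DataFrame column name(s).

    columns: column names available in the input table.
    fields: field names declared on the annotator, e.g. ["description"].
    regions: region dimension names, e.g. ["TIME"].
    mapping: explicit keyword mappings such as TIME=("start", "end").
    """
    resolved = {}
    for name in list(regions) + list(fields):
        if name in mapping:
            resolved[name] = mapping[name]  # an explicit mapping wins
        elif name in columns:
            resolved[name] = name  # fall back to a same-named column
        else:
            raise KeyError(f"no column found for {name!r}")
    return resolved


cols = ["start", "end", "description"]
print(resolve_columns(cols, ["description"], ["TIME"], TIME=("start", "end")))
# {'TIME': ('start', 'end'), 'description': 'description'}
```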

Preserving the index

Sometimes, the annotations you are loading have meaningful primary keys defined elsewhere (e.g., in a pre-existing database) that need to be preserved. This is possible by passing index=True to the define_annotations method.

Note: You bear the same responsibility for using appropriate index values as described in the Caveats when picking your own primary keys section.

uuids = ['DEADC0DE', 'CAFED00D', 'BAADF00D']
indexed_data = pd.DataFrame({'uuid':uuids,
                             'start':[s+pd.Timedelta(0.5, unit='days') for s in starts],
                             'end': [e+pd.Timedelta(0.5, unit='days') for e in ends],
                             'description':[f'Labelled {el}' for el in descriptions]}).set_index('uuid')
indexed_data
                       start                  end            description
uuid
DEADC0DE 2022-06-06 12:00:00  2022-06-08 12:00:00  Labelled Annotation 0
CAFED00D 2022-06-09 12:00:00  2022-06-11 12:00:00  Labelled Annotation 1
BAADF00D 2022-06-12 12:00:00  2022-06-14 12:00:00  Labelled Annotation 2

To preserve the index, define_annotations must be called with index=True:

annotator.define_annotations(indexed_data, TIME=("start", "end"), description="description", index=True)
annotator.df
                                          start[TIME]            end[TIME]            description
uuid
4e05751db8c644db809ac5b43a163e75  2022-06-06 00:00:00  2022-06-08 00:00:00           Annotation 0
adefbaf832bb48fbaa140e4a2526af86  2022-06-09 00:00:00  2022-06-11 00:00:00           Annotation 1
a1ecd473712a414e9814569c2bce6d84  2022-06-12 00:00:00  2022-06-14 00:00:00           Annotation 2
DEADC0DE                          2022-06-06 12:00:00  2022-06-08 12:00:00  Labelled Annotation 0
CAFED00D                          2022-06-09 12:00:00  2022-06-11 12:00:00  Labelled Annotation 1
BAADF00D                          2022-06-12 12:00:00  2022-06-14 12:00:00  Labelled Annotation 2

annotator.commit()
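The difference between the two define_annotations calls can be sketched with a dict-based table: without index=True each row receives a freshly generated key, while with index=True the supplied keys are kept and must not collide with existing ones. The function define_rows is hypothetical and only mirrors the behaviour described in this section; raising on a duplicate key is an assumption about how a collision should be handled, not documented holonote behaviour.

```python
import uuid


def define_rows(table, rows, index=False):
    """Add rows to `table` (a dict keyed by annotation key).

    rows: list of (key, row) pairs; the key is ignored unless index=True.
    Returns the keys that were added, in order.
    """
    new = {}
    for key, row in rows:
        if index:
            if key in table or key in new:
                raise ValueError(f"duplicate key: {key!r}")
        else:
            key = uuid.uuid4().hex  # generated key, like the default uuid index
        new[key] = row
    table.update(new)  # only add rows once every key has been checked
    return list(new)


table = {}
define_rows(table, [(0, "Annotation 0"), (1, "Annotation 1"), (2, "Annotation 2")])
kept = define_rows(table, [("DEADC0DE", "Labelled Annotation 0"),
                           ("CAFED00D", "Labelled Annotation 1"),
                           ("BAADF00D", "Labelled Annotation 2")], index=True)
print(len(table), kept)
# 6 ['DEADC0DE', 'CAFED00D', 'BAADF00D']
```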