Basics#
Note
What you see in this notebook will depend on whether you’ve run this notebook before and written annotations to the annotations.db database! For reproducibility, the rest of the notebook will assume the annotations.db has been deleted (if it exists).
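One way to start from a clean state is to remove the database file before running the rest of the notebook. The following is a minimal sketch, assuming the database is a file named annotations.db in the current working directory (the note above does not state its location). The helper name remove_annotation_db is hypothetical and is not part of HoloNote.

```python
from pathlib import Path


def remove_annotation_db(path="annotations.db"):
    """Delete the annotations database file if present.

    Hypothetical helper: returns True if a file existed and was removed,
    False if there was nothing to delete.
    """
    db_file = Path(path)
    existed = db_file.exists()
    # missing_ok=True makes this a no-op when the file is absent (Python 3.8+)
    db_file.unlink(missing_ok=True)
    return existed
```

Calling remove_annotation_db() would most sensibly happen before any Annotator is created, so that no annotator holds a reference to the old file.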
Setup#
This basic example uses a time series in which we annotate various time intervals to illustrate the basics of the annotation system. Note that annotators can annotate all sorts of elements (e.g., Image, Scatter, etc.) with many different region types, which will be demonstrated later.
import hvplot.pandas
import numpy as np
import pandas as pd
from holonote.annotate import Annotator
speed_data = pd.read_parquet('../assets/example.parquet')
speed_curve = speed_data.hvplot('TIME', 'SPEED')
speed_curve
In the simplest case, wrap the element (here, a curve) in an Annotator:
annotator = Annotator(speed_curve, fields=['description'])
The fields argument lists the fields associated with the annotations we will be defining. When working with tabular data (the typical case), you can think of fields as the columns of your table containing information about annotated regions.
Here we supplied an element to the Annotator, but note that most of the functionality of annotators can be made available by specifying the key dimensions and their types instead. The following is equivalent to the above declaration:
annotator = Annotator({'TIME': np.datetime64}, fields=['description'])
Now we can create an overlay of our element, a dynamicmap that shows the defined annotation regions and a dynamicmap used to define new regions:
annotator * speed_curve # If you have a database file generated by a previous run, your annotations will now be displayed
Basic operations on annotations#
Adding single annotations#
Using the select tool, you can define a region of interest to annotate and run the following cell:
annotator.add_annotation(description='My first annotation!')
You can set the range of interest programmatically as well:
annotator.set_regions(TIME=(np.datetime64('2022-06-06'), np.datetime64('2022-06-08')))
annotator.add_annotation(description='A programmatically defined annotation')
You should now see that annotated regions have appeared in the plot above. We can view a DataFrame of the data collected as follows:
annotator.df
uuid | start[TIME] | end[TIME] | description
---|---|---|---
31077baf2fbc46f798e385def9c369fa | 2022-06-06 | 2022-06-08 | A programmatically defined annotation
Note the uuid index, which is generated automatically by default and is discussed in the next section.
To persist these annotations, we call the .commit() method:
annotator.commit()
Now, if you restart the notebook session, you will see that your annotations are automatically loaded and displayed.
Simple selection of annotations#
The uuid index column of the dataframe above is how we refer to individual annotations. We may use this column directly. For instance, we could get the uuid of the last annotation as follows:
uuid_of_last_annotation = annotator.df.index[-1]
f'Last UUID in the dataframe: {uuid_of_last_annotation}'
'Last UUID in the dataframe: 31077baf2fbc46f798e385def9c369fa'
Note that UUID values are randomly generated (by default), which means we do not know what these values will be ahead of time. As a result, we need a programmatic way to access them. Using the dataframe index directly is awkward, so annotators offer a more natural, interactive way to select annotations - simply click on them in the plot to select them.
Click on a range region in the plot above and run the following cell to see its UUID:
annotator.selected_index # None if no annotations are selected
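To illustrate why the randomly generated keys mentioned above cannot be known ahead of time, here is a small standard-library sketch. It assumes the default keys resemble the hex form of a random (version 4) UUID, which is consistent with the 32-character value shown in the dataframe earlier. It is an illustration only, not HoloNote's actual key-generation code.

```python
import uuid

# Assumption: default keys look like the hex form of a random (version 4) UUID,
# similar to the 32-character value shown in the dataframe above.
first_key = uuid.uuid4().hex
second_key = uuid.uuid4().hex

print(first_key)        # a different value on every run
print(len(first_key))   # the hex form of a UUID is always 32 characters long
```

Because each run produces new values, code that hard-codes a generated key will not be reproducible, which is why selecting annotations through the dataframe index or the plot is the more practical route.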
Deleting single annotations#
Now that we have added some annotations and have a way to select them, we can delete them.
Select an annotation on the plot and run the following cell to delete it:
selected_index = annotator.selected_index if annotator.selected_index else annotator.df.index[-1]
annotator.delete_annotation(selected_index)
Updating annotations#
First, let us add a new annotation to update:
annotator.set_regions(TIME=(np.datetime64('2022-06-15'), np.datetime64('2022-06-18')))
annotator.add_annotation(description='An annotation description we will update...')
Now click on the new annotation and run the following cell:
annotator.update_annotation_fields(
    annotator.selected_index if annotator.selected_index else annotator.df.index[-1],
    description='The description is now updated!',
)
To verify this operation worked, note how the hover information has been updated in the plot above.
Remember that none of your annotation changes are persisted to the database until you call commit! Frequent commits are recommended.
annotator.commit()
Explicit primary keys#
We have the option to specify our own index when creating an annotation if necessary:
input_uuid = 'deadcafe'
description = f'Annotation with set UUID {input_uuid!r}'
annotator.set_regions(TIME=(np.datetime64('2022-06-10'), np.datetime64('2022-06-13')))
annotator.add_annotation(description=description, uuid=input_uuid)
description
"Annotation with set UUID 'deadcafe'"
This option gives you complete control over what is entered into the database. However, because of the caveats below, it is recommended that you do not specify the primary key value yourself unless you need to.
Caveats when picking your own primary keys#
While it may occasionally be convenient to name your annotations in notebooks with set primary key values, you should be aware that the primary key value you pick is then supplied to the database. This implies the following restrictions:
It is your responsibility to ensure the key is unique and not used by any other annotation in the database.
It is your responsibility to ensure the key is of the valid type and format for storage in the database.
For these reasons, it is generally recommended you allow the annotator to pick the key values automatically (a process you can customize as detailed in the Persisting Annotations notebook) and then refer to annotations via the dataframe index or interactive selection as previously demonstrated.
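If you do need to supply your own keys, the two responsibilities listed above can be checked before adding an annotation. The sketch below is hypothetical: check_new_key is not a HoloNote function, and it assumes keys are non-empty strings, as in the examples in this notebook. The valid key type and format for your database may differ.

```python
def check_new_key(key, existing_keys):
    """Return `key` if it looks usable as a new primary key, else raise ValueError.

    Hypothetical helper, not part of HoloNote.
    """
    # Assumption: keys are stored as non-empty strings, as in this notebook's examples.
    if not isinstance(key, str) or not key:
        raise ValueError(f"Key must be a non-empty string, got {key!r}")
    # Uniqueness check against the keys that are already known
    if key in set(existing_keys):
        raise ValueError(f"Key {key!r} is already used by another annotation")
    return key


existing = ["31077baf2fbc46f798e385def9c369fa", "deadcafe"]
check_new_key("cafed00d", existing)  # unused key: returned unchanged
```

In this notebook, such a helper could be called as check_new_key(input_uuid, annotator.df.index) before add_annotation. Note that this only checks the annotations visible in annotator.df.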
Now that we have demonstrated the creation of an explicitly named annotation, we can delete it (as well as the remaining annotation) and revert to using the recommended mechanisms for selecting annotations:
annotator.delete_annotation(['deadcafe', annotator.df.index[0]]) # Example of deleting multiple annotations
annotator.commit()
Adding and deleting multiple annotations#
Loading from a dataframe#
Sometimes we already have a DataFrame containing annotations that we want to load, and it does not make sense to loop over the rows to insert them one at a time. Suppose we have the following dataframe:
starts = pd.date_range("2022-06-06", freq="3D", periods=3)
ends = starts + pd.Timedelta("2D")
descriptions = ["Annotation 0", "Annotation 1", "Annotation 2"]
data = pd.DataFrame({'start':starts, 'end':ends, 'description':descriptions})
data
(index) | start | end | description
---|---|---|---
0 | 2022-06-06 | 2022-06-08 | Annotation 0
1 | 2022-06-09 | 2022-06-11 | Annotation 1
2 | 2022-06-12 | 2022-06-14 | Annotation 2
To load this data, we use define_annotations and pass in the columns from the DataFrame.
annotator.define_annotations(data, TIME=("start", "end"), description="description")
If a column name matches the name of a region or a field, it is used automatically. This means the description="description" argument in the line above is not needed.
annotator.commit()
Preserving the index#
Sometimes, the annotations you are loading have meaningful primary keys defined elsewhere (e.g., some pre-existing database) that need to be preserved. This is possible by supplying index=True in the define_annotations method.
uuids = ['DEADC0DE', 'CAFED00D', 'BAADF00D']
indexed_data = pd.DataFrame({
    'uuid': uuids,
    'start': [s + pd.Timedelta(0.5, unit='days') for s in starts],
    'end': [e + pd.Timedelta(0.5, unit='days') for e in ends],
    'description': [f'Labelled {el}' for el in descriptions],
}).set_index('uuid')
indexed_data
uuid | start | end | description
---|---|---|---
DEADC0DE | 2022-06-06 12:00:00 | 2022-06-08 12:00:00 | Labelled Annotation 0
CAFED00D | 2022-06-09 12:00:00 | 2022-06-11 12:00:00 | Labelled Annotation 1
BAADF00D | 2022-06-12 12:00:00 | 2022-06-14 12:00:00 | Labelled Annotation 2
To preserve the index, define_annotations must be called with index=True:
annotator.define_annotations(indexed_data, TIME=("start", "end"), description="description", index=True)
annotator.df
uuid | start[TIME] | end[TIME] | description
---|---|---|---
4e05751db8c644db809ac5b43a163e75 | 2022-06-06 00:00:00 | 2022-06-08 00:00:00 | Annotation 0
adefbaf832bb48fbaa140e4a2526af86 | 2022-06-09 00:00:00 | 2022-06-11 00:00:00 | Annotation 1
a1ecd473712a414e9814569c2bce6d84 | 2022-06-12 00:00:00 | 2022-06-14 00:00:00 | Annotation 2
DEADC0DE | 2022-06-06 12:00:00 | 2022-06-08 12:00:00 | Labelled Annotation 0
CAFED00D | 2022-06-09 12:00:00 | 2022-06-11 12:00:00 | Labelled Annotation 1
BAADF00D | 2022-06-12 12:00:00 | 2022-06-14 12:00:00 | Labelled Annotation 2
annotator.commit()