Key facts about the Bimanual Categories Extension of the KIT Bimanual Actions Dataset.
Fact | Description |
---|---|
Subjects | 6 subjects (3 female, 3 male; 5 right-handed, 1 left-handed) |
Tasks | 2 tasks (in a kitchen context) |
Recordings | 120 recordings in total (6 subjects performed 2 tasks with 10 repetitions) |
Playtime | 1 hour and 2 minutes |
Quality | 1920 px × 1080 px image resolution at 30 fps (640 px × 576 px for depth data) |
Objects | 6 objects (bowl, cup, plate, rolling pin, spoon, whisk) |
Annotations | Actions fully labelled for both hands individually; x xxx frames labelled with object bounding boxes |
If you use the Bimanual Categories Extension of the KIT Bimanual Actions Dataset, please consider citing our corresponding work.
@inproceedings{krebsleven2023,
  title        = {Recognition of Bimanual Manipulation Categories in RGB-D Human Demonstration},
  author       = {Krebs, Franziska and Leven, Leonie and Asfour, Tamim},
  booktitle    = {IEEE-RAS 22nd International Conference on Humanoid Robots (Humanoids)},
  year         = {2023},
  organization = {IEEE}
}
In the following sections you can download the RGB-D dataset, derived data, and relevant documents. Clicking the icon redirects you to the relevant part of the information page, which gives an overview of the corresponding data format and other relevant information.
Data derived from the RGB-D dataset, such as human pose or object bounding boxes. The data of the original KIT Bimanual Actions Dataset and of the extension are combined. From the original KIT Bimanual Actions Dataset, the following scenarios are included: cooking, cooking with bowls, pouring, wiping, and cereals. The data from subjects with the same number is combined across both datasets. In addition, the data is mirrored to account for the underrepresentation of left-handed subjects. Each downloadable file contains the information of all subjects.
File | Updated | Size | SHA256 hash |
---|---|---|---|
Training data (combined dataset + mirrored) | computing… | computing… | computing… |
Ground truth labels (all subjects and all scenarios) | computing… | computing… | computing… |
The RGB-D dataset, split by individual subject, and its annotations. The subjects are different from those of the KIT Bimanual Actions Dataset.
File | Updated | Size | SHA256 hash |
---|---|---|---|
RGB-D videos part 1/6 (subject 1) | computing… | computing… | computing… |
RGB-D videos part 2/6 (subject 2) | computing… | computing… | computing… |
RGB-D videos part 3/6 (subject 3) | computing… | computing… | computing… |
RGB-D videos part 4/6 (subject 4) | computing… | computing… | computing… |
RGB-D videos part 5/6 (subject 5) | computing… | computing… | computing… |
RGB-D videos part 6/6 (subject 6) | computing… | computing… | computing… |
Data derived from the RGB-D dataset, such as human pose or object bounding boxes. Each downloadable file contains the information of all subjects.
File | Updated | Size | SHA256 hash |
---|---|---|---|
3D human body pose (Azure Kinect Body Tracking) | computing… | computing… | computing… |
2D object bounding boxes (YOLO) | computing… | computing… | computing… |
3D bounding boxes | computing… | computing… | computing… |
3D spatial relations | computing… | computing… | computing… |
Relevant documents for this dataset.
File | Updated | Size | SHA256 hash |
---|---|---|---|
Original briefing document | computing… | computing… | computing… |
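Once the files and their hashes are published, a download can be verified against the listed SHA256 hash, for example with a short Python check using the standard hashlib module (the file name below is a placeholder):

```python
import hashlib

def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the SHA256 hex digest of a file in streaming fashion."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

# Compare the output against the hash listed in the table above.
print(sha256_of("bimacs_rgbd_data_subject_1.zip"))  # placeholder file name
```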
Refer to the following table for a mapping between bimanual category IDs and their symbolic names.
ID | Action |
---|---|
0 | no action |
1 | unimanual left |
2 | unimanual right |
3 | loosely |
4 | tightly symmetric |
5 | tightly asymmetric right dominant |
6 | tightly asymmetric left dominant |
In these recordings, ArUco markers are used to construct a global coordinate system. One marker is attached to each of the two front corners. The left one defines the origin of the global coordinate system.
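As an illustration, the following sketch shows how such a marker can anchor a global frame with OpenCV's aruco module. The marker dictionary, marker size, and camera intrinsics are assumptions rather than values from the recordings, and the aruco API differs slightly between OpenCV versions (this uses the pre-4.7 function-style API):

```python
import cv2
import numpy as np

# Assumed values -- the actual dictionary, marker size, and intrinsics
# used for the recordings are not specified here.
dictionary = cv2.aruco.getPredefinedDictionary(cv2.aruco.DICT_4X4_50)
marker_length_m = 0.10                                       # assumed edge length
K = np.array([[900.0, 0, 960], [0, 900.0, 540], [0, 0, 1]])  # assumed intrinsics
dist = np.zeros(5)

image = cv2.imread("frame.png")                              # placeholder frame
corners, ids, _ = cv2.aruco.detectMarkers(image, dictionary)

# Pose of the (first detected) origin marker in the camera frame.
rvecs, tvecs, _ = cv2.aruco.estimatePoseSingleMarkers(
    corners, marker_length_m, K, dist)
R, _ = cv2.Rodrigues(rvecs[0])       # rotation: marker frame -> camera frame
t = tvecs[0].reshape(3)

# A camera-frame point p_cam maps into the global (marker) frame via
# p_global = R^T @ (p_cam - t).
p_cam = np.array([0.0, 0.0, 1.0])
print(R.T @ (p_cam - t))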
The new recordings were made to provide more data for the classification of bimanual categories in the context of kitchen activities. Therefore, both the new recordings and the old kitchen-context scenarios are labelled with these categories. The structure is similar to the other ground truth files:
{"category": [0, 0, 38, 3, 111, 5, 315, 3, 574, 5, 847, 3, 950, 0, 1014]}
Elements at even indices are key frames (frame numbers); elements at odd indices are the bimanual category IDs, each of which holds from the preceding key frame until the next one.
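As a minimal sketch, the following assumes a label file of the form shown above (the file name is a placeholder) and unpacks it into per-segment annotations using the ID table above:

```python
import json

# Symbolic names of the bimanual category IDs (see the table above).
CATEGORIES = {
    0: "no action",
    1: "unimanual left",
    2: "unimanual right",
    3: "loosely",
    4: "tightly symmetric",
    5: "tightly asymmetric right dominant",
    6: "tightly asymmetric left dominant",
}

with open("ground_truth.json") as f:   # placeholder file name
    seq = json.load(f)["category"]

# Even indices hold key frames, odd indices hold category IDs; each
# category holds from its key frame up to the next key frame.
frames, labels = seq[0::2], seq[1::2]
for start, end, cat in zip(frames, frames[1:], labels):
    print(f"frames {start}-{end}: {CATEGORIES[cat]}")
```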
In contrast to the old recordings, Azure Kinect Body Tracking is used for the new ones to extract the human body pose. Apart from the hands' bounding boxes in the 3D object files, the raw body tracking data of the whole body is also provided in separate files. The structure of these files is as follows: every entry of the frames array contains an array with the different bodies recognized in that frame (always one entry in our case). For each body, the joint positions are listed, ordered by their joint index (see here).
{
"bodies": [
{
"body_id": 1,
"joint_positions": [
[
96.6239242553711,
-952.4153442382813,
1994.73486328125
],
[
94.67913818359375,
-1012.8400268554688,
1946.5592041015625
],
[...]
[
22.749197006225586,
-1078.6497802734375,
1924.5601806640625
]
]
}
],
"frame_id": 0,
"num_bodies": 1,
"timestamp_usec": 51066
}
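A minimal sketch for reading these files, assuming the top-level JSON holds the frames array described above; the file name is a placeholder, and joint index 0 is the pelvis in the Azure Kinect Body Tracking joint ordering:

```python
import json

PELVIS = 0  # joint index 0 is the pelvis in Azure Kinect Body Tracking

with open("body_tracking.json") as f:   # placeholder file name
    data = json.load(f)

# Each frame entry lists the recognized bodies (always one in this dataset)
# with joint positions in millimeters, ordered by joint index.
for frame in data["frames"]:
    if frame["num_bodies"] == 0:
        continue
    body = frame["bodies"][0]
    x, y, z = body["joint_positions"][PELVIS]
    print(f"frame {frame['frame_id']} @ {frame['timestamp_usec']} us: "
          f"pelvis at ({x:.1f}, {y:.1f}, {z:.1f}) mm")
```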
Since we need a balance between right-handed and left-handed subjects, we mirrored the newly recorded data along the x-axis. For this, the labels were swapped from right to left and vice versa. In addition, the relations left of and right of were switched, and the objects' x-coordinates were multiplied by -1. With this, every subject is available both as a right-handed and as a left-handed one.
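As an illustration, the mirroring reduces to negating x-coordinates and swapping left/right in the labels; the sketch below uses plain tuples and label strings as assumptions about the concrete file format:

```python
# Mirroring a recording: negate x-coordinates and swap left/right
# everywhere in the labels. The label strings are illustrative.

def mirror_point(p):
    """Negate the x-coordinate of an (x, y, z) point."""
    x, y, z = p
    return (-x, y, z)

def mirror_label(label):
    """Swap the words 'left' and 'right' within a label string."""
    swap = {"left": "right", "right": "left"}
    return " ".join(swap.get(word, word) for word in label.split())

print(mirror_point((96.6, -952.4, 1994.7)))               # (-96.6, -952.4, 1994.7)
print(mirror_label("unimanual left"))                     # unimanual right
print(mirror_label("tightly asymmetric right dominant"))  # ... left dominant
```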
Additional information about the formats of the 2D bounding boxes and the spatial relations can be found here.