Bimanual Categories

The Dataset in Numbers

Some facts about the extension of the Bimanual Actions Dataset.

Subjects 6 subjects (3 female, 3 male; 5 right-handed, 1 left-handed)
Tasks 2 tasks (in a kitchen context)
Recordings 120 recordings in total (6 subjects performed 2 tasks with 10 repetitions)
Playtime 1 hour and 2 minutes
Quality 1920 px × 1080 px image resolution; 30 fps (640 px × 576 px for depth data)
Objects 6 objects (bowl, cup, plate, rolling pin, spoon, whisk)
Annotations Actions fully labelled for both hands individually; x xxx frames labelled with object bounding boxes

Cite

If you use the Bimanual Categories Extension of the KIT Bimanual Actions Dataset, please consider citing our corresponding work.

@inproceedings{krebsleven2023,
  title        = {Recognition of Bimanual Manipulation Categories in RGB-D Human Demonstration},
  author       = {Krebs, Franziska and Leven, Leonie and Asfour, Tamim},
  booktitle    = {IEEE-RAS 22nd International Conference on Humanoid Robots (Humanoids)},
  year         = {2023},
  organization = {IEEE}
}

Data Download

In the following sections you can download the RGB-D dataset, derived data, and relevant documents. Clicking the icon takes you to the part of the information page that describes the corresponding data format or gives other relevant information.

Category Ground Truth Labels

Data derived from the RGB-D dataset, such as human pose or object bounding boxes. The data of the original KIT Bimanual Actions Dataset and of the extension are combined. The following scenarios from the original KIT Bimanual Actions Dataset are included: cooking, cooking with bowls, pouring, wiping, and cereals. Data from subjects with the same number in both datasets is combined. In addition, the data is mirrored to compensate for the underrepresentation of left-handed subjects. Each downloadable file contains the data of all subjects.

File Updated Size SHA256 hash
Training data (combined dataset + mirrored) computing… computing… computing…
Ground truth labels (all subjects and all scenarios) computing… computing… computing…

Extension Bimanual Actions RGB-D Dataset

The RGB-D dataset, including annotations, split by subject. The subjects are different from the subjects of the KIT Bimanual Actions Dataset.

File Updated Size SHA256 hash
RGB-D videos part 1/6 (subject 1) computing… computing… computing…
RGB-D videos part 2/6 (subject 2) computing… computing… computing…
RGB-D videos part 3/6 (subject 3) computing… computing… computing…
RGB-D videos part 4/6 (subject 4) computing… computing… computing…
RGB-D videos part 5/6 (subject 5) computing… computing… computing…
RGB-D videos part 6/6 (subject 6) computing… computing… computing…

Extension Bimanual Actions Dataset, Appendix: Derived Data

Data derived from the RGB-D dataset, such as human pose or object bounding boxes. Each downloadable file contains the data of all subjects.

File Updated Size SHA256 hash
3D human body pose (Azure Kinect Body Tracking) computing… computing… computing…
2D object bounding boxes (YOLO) computing… computing… computing…
3D bounding boxes computing… computing… computing…
3D spatial relations computing… computing… computing…

Documents

Relevant documents for this dataset.

File Updated Size SHA256 hash
Original briefing document computing… computing… computing…
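
The SHA256 hashes listed in the download tables can be used to check that a file was transferred completely. The following is a minimal sketch of such a check using only the Python standard library; the function names and the chunk size are illustrative choices, not part of the dataset tooling.

```python
import hashlib


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large archives need not fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path: str, published_hash: str) -> bool:
    """Compare a local file against the hash copied from the download table."""
    return sha256_of_file(path) == published_hash.strip().lower()
```

Calling `matches_published_hash` with the path of a downloaded archive and the hash shown next to it returns `True` only if the two digests agree.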

Information

Bimanual Category Label Mapping

The following table maps bimanual category IDs to their symbolic names.

# Category
0 no action
1 unimanual left
2 unimanual right
3 loosely
4 tightly symmetric
5 tightly asymmetric right dominant
6 tightly asymmetric left dominant
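
For use in code, the mapping above could be written as an enumeration. This is only a sketch: the class name `BimanualCategory`, the member names, and the helper `category_name` are hypothetical and not part of the dataset; only the IDs and symbolic names are taken from the table.

```python
from enum import IntEnum


class BimanualCategory(IntEnum):
    """Bimanual category IDs from the table (class and member names are hypothetical)."""
    NO_ACTION = 0
    UNIMANUAL_LEFT = 1
    UNIMANUAL_RIGHT = 2
    LOOSELY = 3
    TIGHTLY_SYMMETRIC = 4
    TIGHTLY_ASYMMETRIC_RIGHT_DOMINANT = 5
    TIGHTLY_ASYMMETRIC_LEFT_DOMINANT = 6


def category_name(category_id: int) -> str:
    """Return the symbolic name for a category ID as written in the table."""
    return BimanualCategory(category_id).name.lower().replace("_", " ")
```

An unknown ID raises a `ValueError`, which makes malformed label files easier to spot.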

RGB-D Coordinate System

In these recordings, ArUco markers are used to construct a global coordinate system. One marker is attached to each of the two frontal corners. The left marker defines the origin of the global coordinate system.

Ground Truth

The new recordings were made to provide more data for the classification of bimanual categories in the context of kitchen activities. Therefore, both the new recordings and the old scenarios with a kitchen context are labeled with these categories. The structure is similar to that of the other ground truth files:

{"category": [0, 0, 38, 3, 111, 5, 315, 3, 574, 5, 847, 3, 950, 0, 1014]}

Counting from zero, all elements at even positions are key frames, and all elements at odd positions are bimanual category IDs.
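
The alternating layout can be unpacked into segments as sketched below. The sketch assumes that each category ID applies to the interval between the key frame before it and the key frame after it; whether the closing key frame belongs to the segment is not stated here. The function name `parse_category_segments` is hypothetical.

```python
import json
from typing import List, Tuple


def parse_category_segments(flat: List[int]) -> List[Tuple[int, int, int]]:
    """Turn [kf0, cat0, kf1, cat1, ..., kfN] into (start, end, category) tuples."""
    if len(flat) % 2 == 0:
        raise ValueError("expected an odd number of elements (key frames enclose categories)")
    key_frames = flat[0::2]   # even positions: key frames
    categories = flat[1::2]   # odd positions: bimanual category IDs
    # Assumption: category i spans key frame i to key frame i + 1.
    return [
        (key_frames[i], key_frames[i + 1], categories[i])
        for i in range(len(categories))
    ]


ground_truth = json.loads(
    '{"category": [0, 0, 38, 3, 111, 5, 315, 3, 574, 5, 847, 3, 950, 0, 1014]}'
)
segments = parse_category_segments(ground_truth["category"])
```

For the example above, the first segment covers frames 0 to 38 with category 0 (no action), and the second covers frames 38 to 111 with category 3.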

TODO: List for which of the old recordings those are available

Human Body Pose Data

In contrast to the old recordings, the human body pose in the new recordings is extracted with Azure Kinect Body Tracking. In addition to the hands' bounding boxes in the 3D object files, the raw body tracking data of the whole body is provided in separate files. These files are structured as follows: every entry of the frames array contains an array of the bodies recognized in that frame (always one entry in our case). For each body, the joint positions are listed in order of their joint index (see here).

{
    "bodies": [
        {
            "body_id": 1,
            "joint_positions": [
                [
                    96.6239242553711,
                    -952.4153442382813,
                    1994.73486328125
                ],
                [
                    94.67913818359375,
                    -1012.8400268554688,
                    1946.5592041015625
                ],
                [...]
                [
                    22.749197006225586,
                    -1078.6497802734375,
                    1924.5601806640625
                ]
            ]
        }
    ],
    "frame_id": 0,
    "num_bodies": 1,
    "timestamp_usec": 51066
}
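
A frame entry like the one above could be read as sketched below. The sample is a shortened, illustrative frame with only two joints and rounded values; the helper name `joints_of_first_body` is hypothetical, and the sketch relies on the statement above that each frame contains exactly one body.

```python
import json
from typing import List

# Shortened, illustrative frame entry (two joints, rounded values).
frame_json = """
{
    "bodies": [
        {
            "body_id": 1,
            "joint_positions": [
                [96.62, -952.41, 1994.73],
                [94.67, -1012.84, 1946.55]
            ]
        }
    ],
    "frame_id": 0,
    "num_bodies": 1,
    "timestamp_usec": 51066
}
"""


def joints_of_first_body(frame: dict) -> List[List[float]]:
    """Return the [x, y, z] joint positions of the first body, or [] if none was tracked."""
    bodies = frame["bodies"]
    if not bodies:
        return []
    return bodies[0]["joint_positions"]


frame = json.loads(frame_json)
joints = joints_of_first_body(frame)
```

The position of a joint in the returned list is its joint index, so `joints[0]` is the joint with index 0.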
            

Data Mirroring

Since we need a balance between right-handed and left-handed subjects, we mirrored the newly recorded data along the x-axis. To do so, the labels were swapped from right to left and vice versa, the relations left of and right of were exchanged, and the objects' x-coordinates were multiplied by -1. As a result, every subject is available both as a right-handed and as a left-handed subject.
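
The mirroring steps can be sketched as follows. This is not the code used to produce the dataset: the sketch assumes that the label swap applies to the category IDs 1 and 2 (unimanual left/right) and 5 and 6 (tightly asymmetric right/left dominant), that relations are stored as the strings "left of" and "right of", and that points are [x, y, z] lists. All function and variable names are hypothetical.

```python
from typing import Dict, List

# Assumption: categories naming a side swap with their counterpart,
# all other categories stay unchanged.
MIRRORED_CATEGORY: Dict[int, int] = {0: 0, 1: 2, 2: 1, 3: 3, 4: 4, 5: 6, 6: 5}

# Assumption: relation names as written in the text above.
MIRRORED_RELATION: Dict[str, str] = {"left of": "right of", "right of": "left of"}


def mirror_categories(flat: List[int]) -> List[int]:
    """Swap left/right category IDs in a [kf0, cat0, kf1, ...] ground truth list."""
    mirrored = list(flat)
    for i in range(1, len(mirrored), 2):  # odd positions hold category IDs
        mirrored[i] = MIRRORED_CATEGORY[mirrored[i]]
    return mirrored


def mirror_point(point: List[float]) -> List[float]:
    """Multiply the x-coordinate of an [x, y, z] point by -1."""
    x, y, z = point
    return [-x, y, z]


def mirror_relation(name: str) -> str:
    """Exchange 'left of' and 'right of'; other relations are kept as they are."""
    return MIRRORED_RELATION.get(name, name)
```

Applying the three functions twice returns the original data, which is a quick sanity check for any mirroring implementation.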

Additional information about the format of the 2D bounding boxes and the spatial relations can be found here.