Processing Collected Data

After you collect new data with the Stick, you need to process it into a dataset before you can train policies on it. This guide walks you through that processing.

Clone the Repo

git clone git@github.com:notmahi/dobb-e
cd dobb-e/stick-data-collection

Installation

  • On your machine, create a new conda/virtual environment:

    mamba env create -f conda_env.yaml

Usage

For extracting a single environment:

  1. Compress the videos recorded with the Record3D app: export them as .r3d files and pack them into a .zip file.

  2. Get the files on your machine.

    1. Using Google Drive:

      1. [Only once] Generate a Google Service Account API key so that you can download from private folders on Google Drive. This Stack Overflow answer has instructions: https://stackoverflow.com/a/72076913

      2. [Only once] Rename the .json file to client_secret.json and put it in the same directory as gdrive_downloader.py

      3. Upload the .zip file into its own folder on Google Drive, then copy the folder ID from the folder's URL and set it as GDRIVE_FOLDER_ID in the ./do-all.sh file.

    2. Manually:

      • Comment out the GDRIVE_FOLDER_ID line in ./do-all.sh and create the following hierarchy locally:

        dataset/
        |--- task1/
        |------ home1/
        |--------- env1/
        |------------ {data_file}.zip
        |--------- env2/
        |------------ {data_file}.zip
        |--------- env.../
        |------------ {data_file}.zip
        |------ home2/
        |------ home.../
        |--- task2/
        |--- task.../
      • The .zip files should contain .r3d files exported from the Record3D app in the previous step.

  3. Modify required variables in do-all.sh.

    1. TASK_NO: the task ID; see gdrive_downloader.py for more information.

    2. HOME: the name or ID of the home.

    3. ROOT_FOLDER: the folder where the data is stored after downloading.

    4. EXPORT_FOLDER: the folder where the dataset is stored after processing. It should be different from ROOT_FOLDER.

    5. ENV_NO: the current environment number within the same home and task set.

    6. GRIPPER_MODEL_PATH: the path to the gripper model. The model should already be in the GitHub repo; it can also be downloaded from http://dl.dobb-e.com/models/gripper_model.pth.

  4. Change the current working directory to the root folder of the local repository and run:

    ./do-all.sh

  5. Split the extracted data to include a validation set for each environment. The data should follow the hierarchy below. Be sure to change the corresponding paths in r3d_files.txt to include "_val".

    dataset/
    |--- task1/
    |------ home1/
    |--------- env1/
    |--------- env1_val/
    |--------- env2/
    |--------- env2_val/
    |--------- env.../
    |------ home2/
    |--------- env1/
    |--------- env1_val/
    |--------- env.../
    |------ home.../
    |--- task2/
    |------ home1/
    |--------- env1/
    |--------- env1_val/
    |--------- env.../
    |------ home2/
    |--------- env1/
    |--------- env1_val/
    |--------- env.../
    |------ home.../
    |--- task.../
    |--- r3d_files.txt
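
The validation split in step 5 can be done by hand, but a small script makes it repeatable. The following is a minimal sketch, not part of the repository. It rests on two assumptions that you should check against your own data: that each subdirectory of an environment folder holds one extracted demonstration, and that r3d_files.txt stores demonstration paths as plain text, written with the same prefix as the env_dir you pass in. The function names split_validation and update_r3d_list are hypothetical helpers introduced only for illustration.

```python
import random
import re
import shutil
from pathlib import Path
from typing import List, Tuple


def split_validation(env_dir: Path, val_fraction: float = 0.1,
                     seed: int = 0) -> List[Tuple[Path, Path]]:
    """Move a random subset of demonstrations from env_dir to <env_dir>_val.

    ASSUMPTION: every subdirectory of env_dir is one extracted demonstration.
    Returns a list of (old_path, new_path) pairs for the moved folders.
    """
    demos = sorted(p for p in env_dir.iterdir() if p.is_dir())
    if len(demos) < 2:
        return []  # too few demonstrations to split
    n_val = max(1, round(len(demos) * val_fraction))
    n_val = min(n_val, len(demos) - 1)  # always keep one for training
    chosen = sorted(random.Random(seed).sample(demos, n_val))

    val_dir = env_dir.with_name(env_dir.name + "_val")
    val_dir.mkdir(exist_ok=True)
    moved = []
    for demo in chosen:
        target = val_dir / demo.name
        shutil.move(str(demo), str(target))
        moved.append((demo, target))
    return moved


def update_r3d_list(list_file: Path, moved: List[Tuple[Path, Path]]) -> None:
    """Point entries of list_file at the new "_val" locations.

    ASSUMPTION: list_file is plain text containing the demonstration paths
    with forward slashes and the same prefix as the paths in `moved`.
    """
    text = list_file.read_text()
    for old, new in moved:
        # The lookahead stops ".../env1/1" from also matching ".../env1/10".
        pattern = re.escape(old.as_posix()) + r"(?![\w.\-])"
        text = re.sub(pattern, lambda _m, repl=new.as_posix(): repl, text)
    list_file.write_text(text)
```

A typical call would be moved = split_validation(Path("dataset/task1/home1/env1")), followed by update_r3d_list(Path("dataset/r3d_files.txt"), moved). Fixing the seed keeps the split reproducible. Back up r3d_files.txt before rewriting it, since its exact format is assumed here.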
