
###########
Operations
###########

How to run
~~~~~~~~~~

We provide a set of basic operations for building and modifying scenario databases for ML applications.
These operations include building databases from different data providers, aggregating datasets from diverse sources,
splitting datasets into training/test sets, and sanity-checking or filtering scenarios.
All commands can be run with ``python -m scenarionet.[command]``, e.g. ``python -m scenarionet.list`` lists the available operations.
The parameters of each script can be shown by adding the ``-h`` flag.

.. note::
    When running ``python -m``, make sure the current working directory doesn't contain a folder called ``scenarionet``.
    Otherwise the command may fail, because Python may import that folder instead of the installed package.
    This usually happens if you install ScenarioNet or MetaDrive via ``git clone`` and put it under a directory you often work in, such as your home directory.
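
The check described in the note can be automated before launching a command.
The following is a minimal sketch and not part of ScenarioNet: the helper name ``shadows_installed_package`` is hypothetical,
and it only looks for a folder with the package name in a given directory.

.. code-block:: python

    from pathlib import Path


    def shadows_installed_package(directory, package="scenarionet"):
        """Return True if `directory` holds a folder named like `package`.

        Such a folder may be imported by ``python -m`` instead of the
        installed package when the command is launched from `directory`.
        """
        return (Path(directory) / package).is_dir()


    # Check the directory the command would be launched from.
    if shadows_installed_package(Path.cwd()):
        print("Found a local 'scenarionet' folder; run the command from another directory.")
    else:
        print("No local 'scenarionet' folder found.")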

List
~~~~~

This command lists all operations with detailed descriptions::

    python -m scenarionet.list


Convert
~~~~~~~~

.. generated by python -m convert.command -h | fold -w 80

**ScenarioNet doesn't provide any data.**
Instead, it provides converters that parse common open-source driving datasets into an internal scenario description, from which scenario databases are built.
Thus, converting scenarios to our internal scenario description is the first step in building a database.
Currently, we provide converters for the Waymo, nuPlan, and nuScenes (Lyft) datasets.

Convert Waymo
------------------------

.. code-block:: text

    python -m scenarionet.convert_waymo [-h] [--database_path DATABASE_PATH]
                            [--dataset_name DATASET_NAME] [--version VERSION]
                            [--overwrite] [--num_workers NUM_WORKERS]
                            [--raw_data_path RAW_DATA_PATH]
                            [--start_file_index START_FILE_INDEX]
                            [--num_files NUM_FILES]

    Build database from Waymo scenarios

    optional arguments:
      -h, --help            show this help message and exit
      --database_path DATABASE_PATH, -d DATABASE_PATH
                            A directory, the path to place the converted data
      --dataset_name DATASET_NAME, -n DATASET_NAME
                            Dataset name, will be used to generate scenario files
      --version VERSION, -v VERSION
                            version
      --overwrite           If the database_path exists, whether to overwrite it
      --num_workers NUM_WORKERS
                            number of workers to use
      --raw_data_path RAW_DATA_PATH
                            The directory stores all waymo tfrecord
      --start_file_index START_FILE_INDEX
                            Control how many files to use. We will list all files
                            in the raw data folder and select
                            files[start_file_index: start_file_index+num_files]
      --num_files NUM_FILES
                            Control how many files to use. We will list all files
                            in the raw data folder and select
                            files[start_file_index: start_file_index+num_files]


This script converts the recorded Waymo scenarios into our scenario description.
A detailed guide is available in Section :ref:`waymo`.
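
The help text above describes the file selection as ``files[start_file_index: start_file_index+num_files]``.
The sketch below illustrates that slice on a made-up file list, which can help when splitting a large conversion into several jobs.
The function name ``select_files``, the sorted ordering, and the behaviour when ``num_files`` is omitted are assumptions for illustration, not the converter's actual code.

.. code-block:: python

    def select_files(files, start_file_index=0, num_files=None):
        """Return files[start_file_index: start_file_index + num_files]."""
        ordered = sorted(files)  # assumption: files are listed in a stable order
        if num_files is None:
            # Assumption: without num_files, everything from the start index is used.
            return ordered[start_file_index:]
        return ordered[start_file_index:start_file_index + num_files]


    # Hypothetical raw-data file names.
    shards = ["shard-%05d" % i for i in range(10)]

    # Job 1 takes files 0..4, job 2 takes files 5..9.
    print(select_files(shards, start_file_index=0, num_files=5))
    print(select_files(shards, start_file_index=5, num_files=5))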

Convert nuPlan
-------------------------

.. code-block:: text

    python -m scenarionet.convert_nuplan [-h] [--database_path DATABASE_PATH]
                         [--dataset_name DATASET_NAME] [--version VERSION]
                         [--overwrite] [--num_workers NUM_WORKERS]
                         [--raw_data_path RAW_DATA_PATH] [--test]

    Build database from nuPlan scenarios

    optional arguments:
      -h, --help            show this help message and exit
      --database_path DATABASE_PATH, -d DATABASE_PATH
                            A directory, the path to place the data
      --dataset_name DATASET_NAME, -n DATASET_NAME
                            Dataset name, will be used to generate scenario files
      --version VERSION, -v VERSION
                            version of the raw data
      --overwrite           If the database_path exists, whether to overwrite it
      --num_workers NUM_WORKERS
                            number of workers to use
      --raw_data_path RAW_DATA_PATH
                            the place store .db files
      --test                for test use only. convert one log


This script converts the recorded nuPlan scenarios into our scenario description.
It requires installing ``nuplan-devkit`` and downloading the source data from https://www.nuscenes.org/nuplan.
A detailed guide is available in Section :ref:`nuplan`.

Convert nuScenes (Lyft)
------------------------------------

.. code-block:: text

    python -m scenarionet.convert_nuscenes [-h] [--database_path DATABASE_PATH]
                               [--dataset_name DATASET_NAME] [--version VERSION]
                               [--overwrite] [--num_workers NUM_WORKERS]

    Build database from nuScenes/Lyft scenarios

    optional arguments:
      -h, --help            show this help message and exit
      --database_path DATABASE_PATH, -d DATABASE_PATH
                            directory, The path to place the data
      --dataset_name DATASET_NAME, -n DATASET_NAME
                            Dataset name, will be used to generate scenario files
      --version VERSION, -v VERSION
                            version of nuscenes data, scenario of this version
                            will be converted
      --overwrite           If the database_path exists, whether to overwrite it
      --num_workers NUM_WORKERS
                            number of workers to use


This script converts the recorded nuScenes scenarios into our scenario description.
It requires installing ``nuscenes-devkit`` and downloading the source data from https://www.nuscenes.org/nuscenes.
For Lyft datasets, this API can only convert the old version of the Lyft data, as only the old data can be parsed via ``nuscenes-devkit``.
Lyft is now a part of Woven Planet, and the new data has to be parsed via a new toolkit.
We are working on supporting this new toolkit so that the new Lyft dataset can be converted.
A detailed guide is available in Section :ref:`nuscenes`.

Convert PG
-------------------------

.. code-block:: text

    python -m scenarionet.convert_pg [-h] [--database_path DATABASE_PATH]
                         [--dataset_name DATASET_NAME] [--version VERSION]
                         [--overwrite] [--num_workers NUM_WORKERS]
                         [--num_scenarios NUM_SCENARIOS]
                         [--start_index START_INDEX]

    Build database from synthetic or procedurally generated scenarios

    optional arguments:
      -h, --help            show this help message and exit
      --database_path DATABASE_PATH, -d DATABASE_PATH
                            directory, The path to place the data
      --dataset_name DATASET_NAME, -n DATASET_NAME
                            Dataset name, will be used to generate scenario files
      --version VERSION, -v VERSION
                            version
      --overwrite           If the database_path exists, whether to overwrite it
      --num_workers NUM_WORKERS
                            number of workers to use
      --num_scenarios NUM_SCENARIOS
                            how many scenarios to generate (default: 30)
      --start_index START_INDEX
                            which index to start


PG refers to Procedural Generation.
Scenario databases generated in this way are created by a set of rules with hand-crafted maps.
The scenarios are collected by driving the ego car with an IDM (Intelligent Driver Model) policy in different scenarios.
A detailed guide is available in Section :ref:`pg`.


Merge
~~~~~~~~~

This command merges existing databases to build a larger one.
This is what makes it possible to build a ScenarioNet!
After converting data recorded in different formats to the unified scenario description,
we can aggregate them freely and enlarge the database.

.. code-block:: text

    python -m scenarionet.merge [-h] --database_path DATABASE_PATH --from FROM [FROM ...]
                    [--exist_ok] [--overwrite] [--filter_moving_dist]
                    [--sdc_moving_dist_min SDC_MOVING_DIST_MIN]

    Merge a list of databases. e.g. scenario.merge --from db_1 db_2 db_3...db_n
    --to db_dest

    optional arguments:
    -h, --help            show this help message and exit
    --database_path DATABASE_PATH, -d DATABASE_PATH
                        The name of the new combined database. It will create
                        a new directory to store dataset_summary.pkl and
                        dataset_mapping.pkl. If exists_ok=True, those two .pkl
                        files will be stored in an existing directory and turn
                        that directory into a database.
    --from FROM [FROM ...]
                        Which datasets to combine. It takes any number of
                        directory path as input
    --exist_ok            Still allow to write, if the dir exists already. This
                        write will only create two .pkl files and this
                        directory will become a database.
    --overwrite           When exists ok is set but summary.pkl and map.pkl
                        exists in existing dir, whether to overwrite both
                        files
    --filter_moving_dist  add this flag to select cases with SDC moving dist >
                        sdc_moving_dist_min
    --sdc_moving_dist_min SDC_MOVING_DIST_MIN
                        Selecting case with sdc_moving_dist > this value. We
                        will add more filter conditions in the future.
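
Conceptually, a merge combines the summaries and mappings of several databases and can drop scenarios whose SDC moving distance is too short.
The sketch below works on plain dictionaries and is only an illustration under stated assumptions:
the ``{scenario_file: metadata}`` and ``{scenario_file: relative_dir}`` layouts, the ``sdc_moving_dist`` key,
and the function name ``merge_databases`` are hypothetical and may differ from the real ``.pkl`` contents.

.. code-block:: python

    import posixpath


    def merge_databases(dest_dir, sources, sdc_moving_dist_min=None):
        """Combine several in-memory databases into one summary and one mapping.

        `sources` maps a database directory to a (summary, mapping) pair.
        Passing sdc_moving_dist_min=None disables the distance filter.
        """
        summary, mapping = {}, {}
        for src_dir, (src_summary, src_mapping) in sources.items():
            for name, meta in src_summary.items():
                too_short = (sdc_moving_dist_min is not None
                             and meta["sdc_moving_dist"] <= sdc_moving_dist_min)
                if too_short:
                    continue
                # Directory in which the scenario file is actually stored.
                real_dir = posixpath.normpath(posixpath.join(src_dir, src_mapping[name]))
                summary[name] = meta
                # Record that directory relative to the new database.
                mapping[name] = posixpath.relpath(real_dir, dest_dir)
        return summary, mapping


    sources = {
        "/data/waymo": ({"a.pkl": {"sdc_moving_dist": 40.0}}, {"a.pkl": "."}),
        "/data/nuscenes": ({"b.pkl": {"sdc_moving_dist": 5.0}}, {"b.pkl": "sub"}),
    }
    summary, mapping = merge_databases("/data/merged", sources, sdc_moving_dist_min=10)
    print(mapping)  # {'a.pkl': '../waymo'}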


Split
~~~~~~~~~~

The split action extracts a subset of scenarios from an existing database and builds a new database from them.
It is usually used to build training/test/validation sets.

.. code-block:: text

    python -m scenarionet.split [-h] --from FROM --to TO [--num_scenarios NUM_SCENARIOS]
                [--start_index START_INDEX] [--random] [--exist_ok]
                [--overwrite]

    Build a new database containing a subset of scenarios from an existing
    database.

    optional arguments:
      -h, --help            show this help message and exit
      --from FROM           Which database to extract data from.
      --to TO               The name of the new database. It will create a new
                            directory to store dataset_summary.pkl and
                            dataset_mapping.pkl. If exists_ok=True, those two .pkl
                            files will be stored in an existing directory and turn
                            that directory into a database.
      --num_scenarios NUM_SCENARIOS
                            how many scenarios to extract (default: 30)
      --start_index START_INDEX
                            which index to start
      --random              If set to true, it will choose scenarios randomly from
                            all_scenarios[start_index:]. Otherwise, the scenarios
                            will be selected sequentially
      --exist_ok            Still allow to write, if the to_folder exists already.
                            This write will only create two .pkl files and this
                            directory will become a database.
      --overwrite           When exists ok is set but summary.pkl and map.pkl
                            exists in existing dir, whether to overwrite both
                            files
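
The selection rule described by ``--start_index``, ``--num_scenarios`` and ``--random`` can be sketched as follows.
This is an illustration of the documented behaviour, not the tool's code: the function name ``pick_scenarios`` and the ``seed`` parameter are hypothetical.

.. code-block:: python

    import random


    def pick_scenarios(all_scenarios, num_scenarios=30, start_index=0,
                       use_random=False, seed=None):
        """Pick scenarios sequentially or randomly from all_scenarios[start_index:]."""
        candidates = list(all_scenarios)[start_index:]
        if use_random:
            rng = random.Random(seed)  # seed is only here to make the sketch repeatable
            return rng.sample(candidates, min(num_scenarios, len(candidates)))
        return candidates[:num_scenarios]


    # Hypothetical scenario names.
    scenarios = ["sd_%d.pkl" % i for i in range(100)]

    # Sequential, non-overlapping training and test sets.
    train = pick_scenarios(scenarios, num_scenarios=80, start_index=0)
    test = pick_scenarios(scenarios, num_scenarios=20, start_index=80)
    print(len(train), len(test), set(train) & set(test))  # 80 20 set()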


Copy (Move)
~~~~~~~~~~~~~~~~

As the database built by ScenarioNet stores the scenarios through a virtual mapping,
directly moving or copying an existing database to a new location with the ``cp`` or ``mv`` command will break the soft links.
To move or copy the scenarios to a new path, use this command instead.
When ``--remove_source`` is added, this ``copy`` command becomes a ``move``.

.. code-block:: text

    python -m scenarionet.copy [-h] --from FROM --to TO [--remove_source] [--copy_raw_data]
                   [--exist_ok] [--overwrite]

    Move or Copy an existing database

    optional arguments:
      -h, --help       show this help message and exit
      --from FROM      Which database to move.
      --to TO          The name of the new database. It will create a new
                       directory to store dataset_summary.pkl and
                       dataset_mapping.pkl. If exists_ok=True, those two .pkl
                       files will be stored in an existing directory and turn that
                       directory into a database.
      --remove_source  Remove the `from_database` if set this flag
      --copy_raw_data  Instead of creating virtual file mapping, copy raw
                       scenario.pkl file
      --exist_ok       Still allow to write, if the to_folder exists already. This
                       write will only create two .pkl files and this directory
                       will become a database.
      --overwrite      When exists ok is set but summary.pkl and map.pkl exists in
                       existing dir, whether to overwrite both files
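
To see why a plain ``cp`` or ``mv`` breaks the links, consider a mapping that stores directories relative to the database folder.
The sketch below is an illustration only: the mapping layout, the paths, and the helper ``remap_for_new_location`` are hypothetical.
It shows the stale path produced by a naive move and the relative path that would have to be stored instead.

.. code-block:: python

    import posixpath


    def remap_for_new_location(mapping, old_db_dir, new_db_dir):
        """Recompute relative scenario directories for a database at a new location."""
        remapped = {}
        for name, rel_dir in mapping.items():
            real_dir = posixpath.normpath(posixpath.join(old_db_dir, rel_dir))
            remapped[name] = posixpath.relpath(real_dir, new_db_dir)
        return remapped


    # The database at /data/train points at scenario files in /data/raw/waymo.
    mapping = {"sd_0.pkl": "../raw/waymo"}

    # After a plain `mv /data/train /backup/db/train` the old relative path
    # resolves to a directory that does not hold the scenario files.
    stale = posixpath.normpath(posixpath.join("/backup/db/train", mapping["sd_0.pkl"]))
    print(stale)  # /backup/db/raw/waymo

    # The relative path has to be recomputed for the new location.
    print(remap_for_new_location(mapping, "/data/train", "/backup/db/train"))
    # {'sd_0.pkl': '../../../data/raw/waymo'}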


Num
~~~~~~~~~~

Report the number of scenarios in a database.

.. code-block:: text

    python -m scenarionet.num [-h] --database_path DATABASE_PATH

    The number of scenarios in the specified database

    optional arguments:
      -h, --help            show this help message and exit
      --database_path DATABASE_PATH, -d DATABASE_PATH
                            Database to check number of scenarios


Filter
~~~~~~~~

Some scenarios contain overpasses, short ego-car trajectories, or traffic signals.
These scenarios can be filtered out of the database with this command.
Currently, we only provide filters for ego-car moving distance, number of objects, traffic lights, overpasses, and scenario IDs.
If you would like to contribute new filters,
feel free to create an issue or pull request on our `GitHub repo <https://github.com/metadriverse/scenarionet>`_.

.. code-block:: text

    python -m scenarionet.filter [-h] --database_path DATABASE_PATH --from FROM
                          [--exist_ok] [--overwrite] [--moving_dist]
                          [--sdc_moving_dist_min SDC_MOVING_DIST_MIN]
                          [--num_object] [--max_num_object MAX_NUM_OBJECT]
                          [--no_overpass] [--no_traffic_light] [--id_filter]
                          [--exclude_ids EXCLUDE_IDS [EXCLUDE_IDS ...]]

    Filter unwanted scenarios out and build a new database

    optional arguments:
      -h, --help            show this help message and exit
      --database_path DATABASE_PATH, -d DATABASE_PATH
                            The name of the new database. It will create a new
                            directory to store dataset_summary.pkl and
                            dataset_mapping.pkl. If exists_ok=True, those two .pkl
                            files will be stored in an existing directory and turn
                            that directory into a database.
      --from FROM           Which dataset to filter. It takes one directory path
                            as input
      --exist_ok            Still allow to write, if the dir exists already. This
                            write will only create two .pkl files and this
                            directory will become a database.
      --overwrite           When exists ok is set but summary.pkl and map.pkl
                            exists in existing dir, whether to overwrite both
                            files
      --moving_dist         add this flag to select cases with SDC moving dist >
                            sdc_moving_dist_min
      --sdc_moving_dist_min SDC_MOVING_DIST_MIN
                            Selecting case with sdc_moving_dist > this value.
      --num_object          add this flag to select cases with object_num <
                            max_num_object
      --max_num_object MAX_NUM_OBJECT
                            case will be selected if num_obj < this argument
      --no_overpass         Scenarios with overpass WON'T be selected
      --no_traffic_light    Scenarios with traffic light WON'T be selected
      --id_filter           Scenarios with indicated name will NOT be selected
      --exclude_ids EXCLUDE_IDS [EXCLUDE_IDS ...]
                            Scenarios with indicated name will NOT be selected
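
The filter conditions listed above can be read as one predicate per scenario.
The following is a minimal sketch under stated assumptions: the metadata keys (``id``, ``sdc_moving_dist``, ``num_object``, ``has_overpass``, ``has_traffic_light``)
and the function ``keep_scenario`` are hypothetical, and combining all enabled filters with a logical AND is an assumption rather than documented behaviour.

.. code-block:: python

    def keep_scenario(meta, sdc_moving_dist_min=None, max_num_object=None,
                      no_overpass=False, no_traffic_light=False, exclude_ids=()):
        """Return True if a scenario passes every enabled filter.

        A threshold of None stands for a filter flag that was not given.
        """
        if meta["id"] in exclude_ids:
            return False
        if sdc_moving_dist_min is not None and not meta["sdc_moving_dist"] > sdc_moving_dist_min:
            return False
        if max_num_object is not None and not meta["num_object"] < max_num_object:
            return False
        if no_overpass and meta["has_overpass"]:
            return False
        if no_traffic_light and meta["has_traffic_light"]:
            return False
        return True


    # Made-up scenario metadata.
    scenarios = [
        {"id": "a", "sdc_moving_dist": 50.0, "num_object": 20,
         "has_overpass": False, "has_traffic_light": True},
        {"id": "b", "sdc_moving_dist": 3.0, "num_object": 10,
         "has_overpass": False, "has_traffic_light": False},
        {"id": "c", "sdc_moving_dist": 80.0, "num_object": 120,
         "has_overpass": True, "has_traffic_light": False},
    ]
    kept = [s["id"] for s in scenarios
            if keep_scenario(s, sdc_moving_dist_min=10, max_num_object=100, no_overpass=True)]
    print(kept)  # ['a']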


Build from Errors
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

This script generates a new database that excludes (or only includes) broken scenarios.
This is useful for debugging broken scenarios or building a completely clean dataset for training or testing.

.. code-block:: text

    python -m scenarionet.generate_from_error_file [-h] --database_path DATABASE_PATH --file
                                   FILE [--overwrite] [--broken]

    Generate a new database excluding or only including the failed scenarios
    detected by 'check_simulation' and 'check_existence'

    optional arguments:
      -h, --help            show this help message and exit
      --database_path DATABASE_PATH, -d DATABASE_PATH
                            The path of the newly generated database
      --file FILE, -f FILE  The path of the error file, should be xyz.json
      --overwrite           If the database_path exists, overwrite it
      --broken              By default, only successful scenarios will be picked
                            to build the new database. If turn on this flog, it
                            will generate database containing only broken
                            scenarios.
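
The effect of the ``--broken`` flag can be sketched as a simple partition of the scenario list.
The JSON layout used here (an ``errors`` list whose entries carry a ``file_name``) is a hypothetical stand-in for the real error file,
and ``partition_by_errors`` is an illustrative helper, not a ScenarioNet function.

.. code-block:: python

    import json


    def partition_by_errors(all_files, error_json, broken=False):
        """Keep only the healthy scenarios, or only the broken ones if broken=True."""
        report = json.loads(error_json)
        failed = {entry["file_name"] for entry in report["errors"]}
        if broken:
            return [f for f in all_files if f in failed]
        return [f for f in all_files if f not in failed]


    # Made-up error report and scenario list.
    error_json = json.dumps({"errors": [{"file_name": "sd_1.pkl", "error": "cannot load"}]})
    all_files = ["sd_0.pkl", "sd_1.pkl", "sd_2.pkl"]

    print(partition_by_errors(all_files, error_json))               # ['sd_0.pkl', 'sd_2.pkl']
    print(partition_by_errors(all_files, error_json, broken=True))  # ['sd_1.pkl']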


Sim
~~~~~~~~~~~

Load a database into the simulator and replay the scenarios.
We provide different render modes that allow users to visualize them.
For more details on simulation,
please check Section :ref:`simulation` or the `MetaDrive documentation <https://metadrive-simulator.readthedocs.io/en/latest/>`_.

.. code-block:: text

    python -m scenarionet.sim [-h] --database_path DATABASE_PATH
              [--render {none,2D,3D,advanced}]
              [--scenario_index SCENARIO_INDEX]

    Load a database to simulator and replay scenarios

    optional arguments:
      -h, --help            show this help message and exit
      --database_path DATABASE_PATH, -d DATABASE_PATH
                            The path of the database
      --render {none,2D,3D,advanced}
      --scenario_index SCENARIO_INDEX
                            Specifying a scenario to run


Check Existence
~~~~~~~~~~~~~~~~~~~~~

We provide a tool to check whether the scenarios in a database exist on your machine and are runnable.
This is needed because scenarios are included in a database, which is a folder, through a virtual mapping.
Each database only records the path of each scenario relative to the database directory.
This script therefore makes sure that all original scenario files exist and can be loaded.

If it finds any broken scenarios, an error file will be written to the specified path.
By using ``generate_from_error_file``, a new database can be created that excludes or only includes these broken scenarios.
In this way, we can debug the broken scenarios to find out what causes the error, or simply ignore and remove the broken
scenarios to keep the database intact.

.. code-block:: text

    python -m scenarionet.check_existence [-h] --database_path DATABASE_PATH
                              [--error_file_path ERROR_FILE_PATH] [--overwrite]
                              [--num_workers NUM_WORKERS] [--random_drop]

    Check if the database is intact and all scenarios can be found and recorded in
    internal scenario description

    optional arguments:
      -h, --help            show this help message and exit
      --database_path DATABASE_PATH, -d DATABASE_PATH
                            Dataset path, a directory containing summary.pkl and
                            mapping.pkl
      --error_file_path ERROR_FILE_PATH
                            Where to save the error file. One can generate a new
                            database excluding or only including the failed
                            scenarios.For more details, see operation
                            'generate_from_error_file'
      --overwrite           If an error file already exists in error_file_path,
                            whether to overwrite it
      --num_workers NUM_WORKERS
                            number of workers to use
      --random_drop         Randomly make some scenarios fail. for test only!
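
The existence part of this check amounts to resolving every relative path against the database directory.
The sketch below is a simplified illustration: the mapping layout and the helper ``find_missing`` are hypothetical,
and unlike the real command it only tests that each file exists, not that it can be loaded.

.. code-block:: python

    import os
    import tempfile


    def find_missing(database_dir, mapping):
        """Return the scenario files whose mapped location does not exist."""
        missing = []
        for name, rel_dir in mapping.items():
            path = os.path.normpath(os.path.join(database_dir, rel_dir, name))
            if not os.path.isfile(path):
                missing.append(name)
        return missing


    # Build a throwaway database whose mapping points at one existing
    # and one missing scenario file.
    with tempfile.TemporaryDirectory() as root:
        db_dir = os.path.join(root, "db")
        raw_dir = os.path.join(root, "raw")
        os.makedirs(db_dir)
        os.makedirs(raw_dir)
        open(os.path.join(raw_dir, "ok.pkl"), "wb").close()
        mapping = {"ok.pkl": "../raw", "gone.pkl": "../raw"}
        print(find_missing(db_dir, mapping))  # ['gone.pkl']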

Check Simulation
~~~~~~~~~~~~~~~~~

This is an upgraded version of the existence check.
It not only checks the existence and completeness of the database, but also checks whether all scenarios can be loaded
and run in the simulator.

.. code-block:: text

    python -m scenarionet.check_simulation [-h] --database_path DATABASE_PATH
                           [--error_file_path ERROR_FILE_PATH] [--overwrite]
                           [--num_workers NUM_WORKERS] [--random_drop]

    Check if all scenarios can be simulated in simulator. We recommend doing this
    before close-loop training/testing

    optional arguments:
      -h, --help            show this help message and exit
      --database_path DATABASE_PATH, -d DATABASE_PATH
                            Dataset path, a directory containing summary.pkl and
                            mapping.pkl
      --error_file_path ERROR_FILE_PATH
                            Where to save the error file. One can generate a new
                            database excluding or only including the failed
                            scenarios.For more details, see operation
                            'generate_from_error_file'
      --overwrite           If an error file already exists in error_file_path,
                            whether to overwrite it
      --num_workers NUM_WORKERS
                            number of workers to use
      --random_drop         Randomly make some scenarios fail. for test only!

Check Overlap
~~~~~~~~~~~~~~~~

This script checks whether two databases contain overlapping scenarios.
The main goal of this command is to ensure that the training and test sets are isolated.

.. code-block:: text

    python -m scenarionet.check_overlap [-h] --d_1 D_1 --d_2 D_2 [--show_id]

    Check if there are overlapped scenarios between two databases. If so, return
    the number of overlapped scenarios and id list

    optional arguments:
      -h, --help  show this help message and exit
      --d_1 D_1   The path of the first database
      --d_2 D_2   The path of the second database
      --show_id   whether to show the id of overlapped scenarios
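
At its core, an overlap check is a set intersection over scenario identifiers.
The sketch below assumes that scenarios are identified by the keys of each database summary;
this layout and the helper name ``overlapping_ids`` are assumptions made for illustration.

.. code-block:: python

    def overlapping_ids(summary_1, summary_2):
        """Return the sorted scenario ids that appear in both summaries."""
        return sorted(set(summary_1) & set(summary_2))


    # Made-up summaries of a training and a test database.
    train = {"sd_0.pkl": {}, "sd_1.pkl": {}, "sd_2.pkl": {}}
    test = {"sd_2.pkl": {}, "sd_3.pkl": {}}

    overlap = overlapping_ids(train, test)
    print(len(overlap), overlap)  # 1 ['sd_2.pkl']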