Proposal#

Project Name: End-to-end autonomous vehicle driving based on text-based instructions.

Abstract#

JdeRobot is an open-source toolkit for developing robotics applications. Among its various projects is Behavior Metrics, a tool for evaluating the performance of deep learning (DL) models in autonomous driving tasks. In this project, the aim is to integrate a Language Model (LM) with an end-to-end autonomous driving model. Building on previous knowledge and successful projects, the goal is to enable users to give text-based commands directly to the vehicle, similar to interacting with a real-life taxi. The project will commence with a focus on simplicity, utilizing models like BERT, and gradually iterate towards more complex architectures.

_images/BERT-open-loop-light.svg

Integrating BERT or a similar Language Model (LM) to generate High-Level Commands (HLCs) for autonomous driving in the CARLA simulator.#

_images/BERT-closed-loop-light.svg

Using Vision-Language Models such as LLaVA to provide feedback to the language model.#

Timeline#

Community Bonding Period

• Thoroughly familiarize myself with the code base.
• Set up a blog website for project documentation and updates.
• Conduct a comprehensive literature survey to identify relevant LM architectures and fine-tuning techniques.
• Discuss project groundwork and implementation strategies with mentors.

Week 1, 2 & 3
(May 27 - Jun 17)

• Implement a basic NLP-based controller using BERT or a similar LM.
• Develop an initial prototype for text-based command input and vehicle control.
• Fine-tune and train the BERT model to generate High-Level Commands (HLCs).
• Update get_random_hlc so that HLCs are derived from the text instruction instead of being chosen at random (a minimal sketch follows this list).
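
A minimal sketch of the Weeks 1-3 idea, assuming a Hugging Face BERT classifier; the HLC label set and the text_to_hlc helper that would replace get_random_hlc are illustrative assumptions, not the final Behavior Metrics API:

```python
# Map a free-form text instruction to a High-Level Command (HLC)
# with a BERT sequence classifier (to be fine-tuned on instruction/HLC pairs).
import torch
from transformers import BertTokenizer, BertForSequenceClassification

HLC_LABELS = ["LANE_FOLLOW", "LEFT", "RIGHT", "STRAIGHT"]  # assumed label set

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=len(HLC_LABELS)
)

def text_to_hlc(instruction: str) -> str:
    """Hypothetical replacement for get_random_hlc(): derive the HLC from text."""
    inputs = tokenizer(instruction, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits
    return HLC_LABELS[int(logits.argmax(dim=-1))]

print(text_to_hlc("take the next left, then keep to your lane"))
```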

Week 4 & 5
(Jun 17 - 30)

• Understand and integrate vision encoders (as in LMDrive) for closed-loop control; the setting above is open-loop. A rough sketch of one control tick follows this list.
• Study and discuss the feasibility of reproducing other approaches like Driving-with-LLMs within the project framework.
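
The closed-loop setting could look roughly like the sketch below, where a vision encoder summarises each camera frame and the result is fused with the text-derived HLC every simulation tick. The ResNet stand-in (in place of an LMDrive-style encoder), the policy argument and the closed_loop_step name are all assumptions for illustration:

```python
# One closed-loop control tick: encode the camera frame, then let a policy
# head combine the frame features with the current HLC to pick an action.
import torch
from torchvision.models import resnet18

vision_encoder = resnet18(weights=None)   # stand-in for an LMDrive-style encoder
vision_encoder.fc = torch.nn.Identity()   # expose 512-d frame features

def closed_loop_step(frame_tensor: torch.Tensor, hlc: str, policy):
    """frame_tensor: (1, 3, H, W) camera image; returns the policy's action."""
    with torch.no_grad():
        features = vision_encoder(frame_tensor)  # (1, 512)
    return policy(features, hlc)                 # hypothetical fusion/policy head
```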

Week 6 & 7 (Phase 1 Evaluation)
(Jul 1 - 12)

• Train the integrated system on the LMDrive dataset for performance evaluation.
• Explore the possibility of creating a custom dataset using data_collector.py to further enhance training data diversity and model robustness (a dataset-wrapping sketch follows this list).
• Meet the Phase 1 Evaluation deadline.
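
One way the training data could be wrapped, whether it comes from the LMDrive dataset or from a custom run of data_collector.py, is sketched below; the CSV layout with instruction and hlc columns is an assumed intermediate format, not the actual output of either source:

```python
# Wrap (instruction, HLC) pairs as a PyTorch Dataset for fine-tuning BERT.
import csv
from torch.utils.data import Dataset

class InstructionHLCDataset(Dataset):
    def __init__(self, csv_path, tokenizer, hlc_labels):
        with open(csv_path) as f:
            self.rows = list(csv.DictReader(f))  # expects instruction,hlc columns
        self.tokenizer = tokenizer
        self.hlc_labels = hlc_labels

    def __len__(self):
        return len(self.rows)

    def __getitem__(self, idx):
        row = self.rows[idx]
        enc = self.tokenizer(row["instruction"], truncation=True,
                             padding="max_length", max_length=32,
                             return_tensors="pt")
        item = {k: v.squeeze(0) for k, v in enc.items()}
        item["labels"] = self.hlc_labels.index(row["hlc"])
        return item
```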

Jul 12

Phase 1 Evaluation deadline

Week 8 & 9
(Jul 15 - 29)

• Explore the use of Vision-Language Models (VLMs) like LLaVA to improve the system’s understanding of visual inputs.
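
As a starting point, LLaVA can be queried about a camera frame through Hugging Face transformers, as in the sketch below; the model id and prompt template follow the public llava-hf examples, while feeding the answer back into the HLC model is the open part of the design:

```python
# Ask a Vision-Language Model (LLaVA) a question about the current frame.
from PIL import Image
from transformers import AutoProcessor, LlavaForConditionalGeneration

model_id = "llava-hf/llava-1.5-7b-hf"
processor = AutoProcessor.from_pretrained(model_id)
model = LlavaForConditionalGeneration.from_pretrained(model_id)

def describe_scene(frame: Image.Image, question: str) -> str:
    prompt = f"USER: <image>\n{question} ASSISTANT:"
    inputs = processor(text=prompt, images=frame, return_tensors="pt")
    output = model.generate(**inputs, max_new_tokens=60)
    return processor.decode(output[0], skip_special_tokens=True)

# e.g. describe_scene(camera_frame, "Is it safe to turn left at this intersection?")
```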

Week 10 & 11
(Jul 29 - Aug 12)

• Investigate extending the evaluation metrics using Visual Question Answering (VQA) techniques, such as LingoQA, to enhance system comprehension and response accuracy.
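
A first-pass metric for such VQA-style evaluation could be as simple as word overlap between predicted and reference answers, as sketched below; LingoQA itself proposes a learned judge model, so this is only a placeholder for early experiments:

```python
# Placeholder VQA metric: mean word-overlap (0-1) between predictions and references.
def vqa_overlap_score(predictions, references):
    scores = []
    for pred, ref in zip(predictions, references):
        pred_words, ref_words = set(pred.lower().split()), set(ref.lower().split())
        scores.append(len(pred_words & ref_words) / max(len(ref_words), 1))
    return sum(scores) / max(len(scores), 1)

print(vqa_overlap_score(["the light is red"], ["the traffic light ahead is red"]))
```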

Week 12 & 13
(Aug 12 - 26)

• Finalize project deliverables, including code, documentation, and any additional materials.
• Conduct thorough testing and validation of the integrated system to ensure reliability and performance consistency.
• Prepare the final report summarizing project outcomes, challenges faced, solutions implemented, and future directions for potential improvements.

Week 14 & 15

• Buffer period for any unexpected delays or additional tasks.
• Finalize project deliverables and ensure all code and documentation are properly organized and submitted.

References#