Proposal#

Project Name: End-to-end autonomous vehicle driving based on text-based instructions.

Abstract#

JdeRobot is an open-source toolkit for developing robotics applications. Among its various projects is Behavior Metrics, a tool for evaluating the performance of deep learning (DL) models in autonomous driving tasks. In this project, the aim is to integrate a Language Model (LM) with an end-to-end autonomous driving model. Building on previous knowledge and successful projects, the goal is to enable users to give text-based commands directly to the vehicle, similar to interacting with a real-life taxi. The project will commence with a focus on simplicity, utilizing models like BERT, and gradually iterate towards more complex architectures.

_images/BERT-open-loop-light.svg

Integrating BERT or a similar Language Model (LM) to generate High-Level Commands (HLCs) for autonomous driving in the CARLA simulator.#

_images/BERT-closed-loop-light.svg

Using Vision-Language Models such as LLaVA to provide feedback to the language model.#

Timeline#

Community Bonding Period

• Thoroughly familiarize myself with the code base.
• Set up a blog website for project documentation and updates.
• Conduct a comprehensive literature survey to identify relevant LM architectures and fine-tuning techniques.
• Discuss project groundwork and implementation strategies with mentors.

Week 1, 2 & 3
(May 27 - Jun 17)

• Implement a basic NLP-based controller using BERT or a similar LM.
• Develop an initial prototype for text-based command input and vehicle control.
• Fine-tune and train the BERT model to generate High-Level Commands (HLCs).
• Update get_random_hlc so that HLCs are derived from the text instruction instead of being chosen at random (a minimal sketch follows this list).
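
A minimal sketch of the Weeks 1-3 idea, assuming a Hugging Face BERT classifier; the HLC label set and the text_to_hlc helper that would replace get_random_hlc are illustrative assumptions, not the final Behavior Metrics API:

```python
# Map a free-form text instruction to a High-Level Command (HLC)
# with a BERT sequence classifier (to be fine-tuned on instruction/HLC pairs).
import torch
from transformers import BertTokenizer, BertForSequenceClassification

HLC_LABELS = ["LANE_FOLLOW", "LEFT", "RIGHT", "STRAIGHT"]  # assumed label set

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=len(HLC_LABELS)
)

def text_to_hlc(instruction: str) -> str:
    """Hypothetical replacement for get_random_hlc(): derive the HLC from text."""
    inputs = tokenizer(instruction, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits
    return HLC_LABELS[int(logits.argmax(dim=-1))]

print(text_to_hlc("take the next left, then keep to your lane"))
```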

Week 4 & 5
(Jun 17 - 30)

• Understand and integrate vision encoders (as in LMDrive) for closed-loop control; the setting above is open-loop. A rough sketch of one control tick follows this list.
• Study and discuss the feasibility of reproducing other approaches like Driving-with-LLMs within the project framework.
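
The closed-loop setting could look roughly like the sketch below, where a vision encoder summarises each camera frame and the result is fused with the text-derived HLC every simulation tick. The ResNet stand-in (in place of an LMDrive-style encoder), the policy argument and the closed_loop_step name are all assumptions for illustration:

```python
# One closed-loop control tick: encode the camera frame, then let a policy
# head combine the frame features with the current HLC to pick an action.
import torch
from torchvision.models import resnet18

vision_encoder = resnet18(weights=None)   # stand-in for an LMDrive-style encoder
vision_encoder.fc = torch.nn.Identity()   # expose 512-d frame features

def closed_loop_step(frame_tensor: torch.Tensor, hlc: str, policy):
    """frame_tensor: (1, 3, H, W) camera image; returns the policy's action."""
    with torch.no_grad():
        features = vision_encoder(frame_tensor)  # (1, 512)
    return policy(features, hlc)                 # hypothetical fusion/policy head
```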

Week 6 & 7 (Phase 1 Evaluation)
(Jul 1 - 12)

• Train the integrated system on the LMDrive dataset for performance evaluation.
• Explore the possibility of creating a custom dataset using data_collector.py to further enhance training data diversity and model robustness (a dataset-wrapping sketch follows this list).
• Meet the Phase 1 Evaluation deadline.
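
One way the training data could be wrapped, whether it comes from the LMDrive dataset or from a custom run of data_collector.py, is sketched below; the CSV layout with instruction and hlc columns is an assumed intermediate format, not the actual output of either source:

```python
# Wrap (instruction, HLC) pairs as a PyTorch Dataset for fine-tuning BERT.
import csv
from torch.utils.data import Dataset

class InstructionHLCDataset(Dataset):
    def __init__(self, csv_path, tokenizer, hlc_labels):
        with open(csv_path) as f:
            self.rows = list(csv.DictReader(f))  # expects instruction,hlc columns
        self.tokenizer = tokenizer
        self.hlc_labels = hlc_labels

    def __len__(self):
        return len(self.rows)

    def __getitem__(self, idx):
        row = self.rows[idx]
        enc = self.tokenizer(row["instruction"], truncation=True,
                             padding="max_length", max_length=32,
                             return_tensors="pt")
        item = {k: v.squeeze(0) for k, v in enc.items()}
        item["labels"] = self.hlc_labels.index(row["hlc"])
        return item
```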

Jul 12

Phase 1 Evaluation deadline

Week 8 & 9
(Jul 15 - 29)

• Explore the use of Vision-Language Models (VLMs) like LLaVA to improve the system’s understanding of visual inputs.
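
As a starting point, LLaVA can be queried about a camera frame through Hugging Face transformers, as in the sketch below; the model id and prompt template follow the public llava-hf examples, while feeding the answer back into the HLC model is the open part of the design:

```python
# Ask a Vision-Language Model (LLaVA) a question about the current frame.
from PIL import Image
from transformers import AutoProcessor, LlavaForConditionalGeneration

model_id = "llava-hf/llava-1.5-7b-hf"
processor = AutoProcessor.from_pretrained(model_id)
model = LlavaForConditionalGeneration.from_pretrained(model_id)

def describe_scene(frame: Image.Image, question: str) -> str:
    prompt = f"USER: <image>\n{question} ASSISTANT:"
    inputs = processor(text=prompt, images=frame, return_tensors="pt")
    output = model.generate(**inputs, max_new_tokens=60)
    return processor.decode(output[0], skip_special_tokens=True)

# e.g. describe_scene(camera_frame, "Is it safe to turn left at this intersection?")
```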

Week 10 & 11
(Jul 29 - Aug 12)

• Investigate extending the evaluation metrics using Visual Question Answering (VQA) techniques, such as LingoQA, to enhance system comprehension and response accuracy.
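
A first-pass metric for such VQA-style evaluation could be as simple as word overlap between predicted and reference answers, as sketched below; LingoQA itself proposes a learned judge model, so this is only a placeholder for early experiments:

```python
# Placeholder VQA metric: mean word-overlap (0-1) between predictions and references.
def vqa_overlap_score(predictions, references):
    scores = []
    for pred, ref in zip(predictions, references):
        pred_words, ref_words = set(pred.lower().split()), set(ref.lower().split())
        scores.append(len(pred_words & ref_words) / max(len(ref_words), 1))
    return sum(scores) / max(len(scores), 1)

print(vqa_overlap_score(["the light is red"], ["the traffic light ahead is red"]))
```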

Week 12 & 13
(Aug 12 - 26)

• Finalize project deliverables, including code, documentation, and any additional materials.
• Conduct thorough testing and validation of the integrated system to ensure reliability and performance consistency.
• Prepare the final report summarizing project outcomes, challenges faced, solutions implemented, and future directions for potential improvements.

Week 14 & 15

• Buffer period for any unexpected delays or additional tasks.
• Finalize project deliverables and ensure all code and documentation are properly organized and submitted.

References#