Generalization of a Machine Learning Classifier of CAN Bus Signals
Thesis posted on 01.08.2020, 00:00 by Andrea Tricarico
Over the last few decades, the automotive field has evolved considerably. Modern cars contain a complex network of Electronic Control Units (ECUs): small computing units that process data captured by sensors and then act through actuators. ECUs can optimize a variety of actions performed by the driver (e.g., fuel injection during acceleration) and activate safety mechanisms. Moreover, in recent years, manufacturers have introduced new functionality, such as autonomous driving features and remote control of the car, which requires a complex network of computing units inside the vehicle. All the ECUs are connected by an in-vehicle network that usually follows the CAN bus protocol, developed by Bosch in the 1980s and now the de facto standard for the internal networks of modern vehicles. This protocol is particularly well suited to building real-time networks that are simple and cheap. Security against external attacks was not a concern at the time because vehicles were not accessible from the outside world. In recent years, however, more and more cars have gained features that connect the vehicle to external devices (such as the driver's smartphone) and networks (such as the Internet). These changes have considerably increased the attack surface, and the lack of security in the CAN bus protocol is becoming a serious problem. Many studies have demonstrated that an attacker can gain control of some of a vehicle's functions by injecting malicious packets into the CAN bus, and that this can be done even remotely by exploiting vulnerabilities in the computing units that are connected to external devices.
Unfortunately, manufacturers are still not paying enough attention to the security of in-vehicle networks and rely on a security-by-obscurity paradigm, keeping secret the detailed specification of the logic inside the ECUs and the syntax of the packets. Researchers who study the security of modern vehicles have to manually reverse engineer the packets sent by the ECUs to understand their internal logic and find their vulnerabilities. In this work, we propose a new approach to the reverse engineering of CAN bus packets that reduces the effort and time this process requires. We work directly on the raw logs that researchers collect from their vehicles during a driving session. We extract the different signals that are encoded in the recorded sequence of CAN packets and process them. We then use a machine learning classifier based on LSTM networks (a type of recurrent neural network particularly suited to the analysis of time series) to identify specific physical signals without any prior knowledge of the vehicle architecture and without performing long experiments on the vehicle. In our evaluation, we demonstrate that this model can drastically reduce the time needed to reverse engineer the messages and signals of a vehicle when its proprietary specification is not available. We then explain how to collect a dataset suitable for training the model so that it generalizes and can classify the signals of unknown vehicles.
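To make the signal-extraction step concrete, the following is a minimal sketch of how one candidate signal could be turned into a time series from a raw CAN log. It is an illustration under stated assumptions, not the method used in the thesis: the names `Frame`, `extract_signal`, and `min_max_normalize` are hypothetical, the log is assumed to be already parsed into (timestamp, CAN ID, payload) tuples, the bit field is assumed to be big-endian with bit 0 as the most significant bit of the first byte, and the field boundaries are given by hand, whereas the thesis has to infer them. The LSTM classifier itself is not shown.

```python
from typing import Iterable, List, Tuple

# Hypothetical frame layout: (timestamp in seconds, CAN ID, payload bytes).
Frame = Tuple[float, int, bytes]


def extract_signal(
    frames: Iterable[Frame], can_id: int, start_bit: int, length: int
) -> List[Tuple[float, int]]:
    """Return (timestamp, raw value) pairs for one bit field of one CAN ID.

    Assumption: bits are counted from the most significant bit of the
    first payload byte, and the field is read as a big-endian unsigned int.
    """
    series = []
    for timestamp, frame_id, payload in frames:
        if frame_id != can_id:
            continue
        total_bits = len(payload) * 8
        if start_bit + length > total_bits:
            continue  # payload too short to contain the field
        as_int = int.from_bytes(payload, "big")
        shift = total_bits - start_bit - length
        value = (as_int >> shift) & ((1 << length) - 1)
        series.append((timestamp, value))
    return series


def min_max_normalize(values: List[int]) -> List[float]:
    """Scale raw values to [0, 1]; a constant series maps to all zeros."""
    if not values:
        return []
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


# Toy log with two CAN IDs; the values are made up for illustration.
log = [
    (0.00, 0x123, bytes([0x00, 0x10, 0x00, 0x00])),
    (0.01, 0x456, bytes([0xFF, 0xFF])),
    (0.02, 0x123, bytes([0x00, 0x20, 0x00, 0x00])),
    (0.04, 0x123, bytes([0x00, 0x40, 0x00, 0x00])),
]

signal = extract_signal(log, can_id=0x123, start_bit=8, length=8)
raw_values = [value for _, value in signal]
normalized = min_max_normalize(raw_values)
```

A normalized series such as `normalized` is the kind of input a time-series classifier could consume; how the thesis actually segments, scales, and labels its signals is not specified in this abstract.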