The AI of Your ML Products Can Also Be an Attack Surface


Even though non-ML fields in computer science are swiftly turning into auxiliary, domain-specific machine learning research fields (look, for example, at the number of papers using or addressing machine learning at this year’s International Conference on Software Engineering), core research in ML and deep learning moves so fast that it is impossible these days for researchers from other fields to keep track of all the new developments. Serious researchers, however, have already started to look into an aspect of machine learning systems that is important but, because of that very pace, often overlooked: security.

Security, in this article, is a catch-all for a number of related but, strictly speaking, distinct concepts such as robustness, safety, and resistance to adversarial attacks. If we take an artifact-based view of software engineering, security for machine learning systems should focus on three main artifacts:

  1. Training data
  2. Trained models
  3. Inference endpoints or APIs

Training data, especially when sourced from ELT pipelines and labeled automatically, can be the first source of poisoning: either because meaningful data quality metrics are not set up and made visible (more on this in a later post), or because an active-learning loop adds adversarial edge-case inputs to the training dataset without their labels being corrected first.
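
One cheap mitigation for the active-learning case is to never let automatically labeled samples flow straight into the training set. The sketch below illustrates the idea with a quarantine-then-promote pattern; all names here (`Sample`, `TrainingPool`, `promote_reviewed`) are hypothetical and only meant to show the shape of such a guard, not a production pipeline.

```python
# Minimal sketch: auto-labeled active-learning samples are quarantined and only
# promoted to the training set once a human has reviewed their labels.

from dataclasses import dataclass, field
from typing import List


@dataclass
class Sample:
    features: List[float]
    label: int
    label_reviewed: bool = False  # set by a human-in-the-loop review step


@dataclass
class TrainingPool:
    training: List[Sample] = field(default_factory=list)
    quarantine: List[Sample] = field(default_factory=list)

    def ingest(self, sample: Sample) -> None:
        """Auto-labeled samples go to quarantine, never straight into training."""
        self.quarantine.append(sample)

    def promote_reviewed(self) -> int:
        """Move only samples whose labels were confirmed by a reviewer."""
        reviewed = [s for s in self.quarantine if s.label_reviewed]
        self.training.extend(reviewed)
        self.quarantine = [s for s in self.quarantine if not s.label_reviewed]
        return len(reviewed)


if __name__ == "__main__":
    pool = TrainingPool()
    pool.ingest(Sample(features=[0.1, 0.2], label=1))                       # unreviewed edge case
    pool.ingest(Sample(features=[0.9, 0.8], label=0, label_reviewed=True))  # reviewed
    print(pool.promote_reviewed(), "sample(s) promoted")                    # -> 1 sample(s) promoted
```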

Trained models can further add to the attack surface of an ML system when they are not updated over the course of serving the business use case and drift because the initial assumptions about the independent variables or the problem statement are no longer valid. Moreover, it can be argued that deep models whose outputs cannot be sufficiently explained are hard to debug and are a compliance and security liability in themselves.
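
One way to catch this kind of drift is to compare the serving-time distribution of an input feature against its training-time distribution. The sketch below uses a two-sample Kolmogorov-Smirnov test for a single feature; the synthetic data, the 0.05 threshold, and the single-feature focus are illustrative assumptions rather than a full monitoring setup.

```python
# Minimal drift check: compare a feature's distribution at serving time
# against the distribution it had at training time.

import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(seed=42)

# Feature values seen during training vs. values arriving at the inference endpoint.
training_feature = rng.normal(loc=0.0, scale=1.0, size=5_000)
serving_feature = rng.normal(loc=0.7, scale=1.2, size=5_000)  # shifted: old assumptions no longer hold

statistic, p_value = ks_2samp(training_feature, serving_feature)

if p_value < 0.05:
    # In practice this would trigger an alert, a retraining job, or at least an investigation.
    print(f"Possible drift detected (KS statistic={statistic:.3f}, p={p_value:.3g})")
else:
    print("No significant drift detected for this feature")
```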

Inference endpoints, typically in the form of REST or GraphQL APIs, may, in addition to being vulnerable to common software vulnerabilities, also be abused in ML-specific ways, e.g. by attackers querying them to train proxy models or by feeding them adversarial inputs.
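
A first, admittedly blunt, line of defence against proxy-model extraction is simply to watch query volume per client. The sketch below (hypothetical names throughout, and thresholds chosen for illustration) counts queries per API key in a sliding window and flags clients whose volume looks more like systematic harvesting than normal use; a real deployment would combine this with gateway rate limiting, authentication, and anomaly detection on the query distribution itself.

```python
# Minimal sketch: flag API keys that exceed a query budget within a sliding window.

import time
from collections import defaultdict, deque

WINDOW_SECONDS = 60
MAX_QUERIES_PER_WINDOW = 100  # illustrative threshold, tune per use case

_query_log: dict[str, deque] = defaultdict(deque)


def is_suspicious(api_key: str, now: float | None = None) -> bool:
    """Record a query for this api_key and return True if it exceeds the window budget."""
    now = time.monotonic() if now is None else now
    window = _query_log[api_key]
    window.append(now)
    # Drop timestamps that have fallen out of the sliding window.
    while window and now - window[0] > WINDOW_SECONDS:
        window.popleft()
    return len(window) > MAX_QUERIES_PER_WINDOW


if __name__ == "__main__":
    # Simulate a client hammering the endpoint 150 times within one second.
    flagged = [is_suspicious("client-123", now=float(i) / 150) for i in range(150)]
    print("Flagged after", flagged.index(True) + 1, "queries")  # -> Flagged after 101 queries
```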