
As Artificial Intelligence (AI) and Machine Learning (ML) technologies advance, so does the need for methods to test them. The ML/AI and software development communities have recently developed several methods for testing ML and AI systems. These methods assume that the team developing the system is also responsible for testing it and therefore has access to the datasets on which the ML models were trained and knowledge of the environment in which the system is expected to operate. This assumption does not hold in certain situations, such as Department of Defense (DoD) acquisition, where systems are developed by external organizations; as a result, existing methods for Test and Evaluation (T&E) of AI-enabled systems are not adequate for DoD acquisition. To address this gap, we propose a multi-fidelity approach to test and evaluation that consists of (i) a representation of the model space, with dimensions along which models of different fidelities can be developed, and (ii) a method for integrating multiple fidelities to support continuous T&E of AI-enabled systems. The approach is illustrated with a visual perception system in an autonomous vehicle (AV) use case, in which a simulation space spanning different fidelities is constructed to test how well the system meets its stated requirements. A model space is first identified, in which models are characterized by their cost and performance. A method to generate test plans is then devised to maximize utility across the span of given system requirements. We show how the proposed approach can be used to develop test combinations that minimize cost and maximize utility under a set of system requirements to be tested.
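To make the idea of selecting test combinations from a model space concrete, the following is a minimal sketch of a cost-constrained, utility-maximizing test-plan selection. It is not the paper's actual formulation: the model space entries, the requirement identifiers (R1, R2, R3), the per-requirement coverage values, and the cost budget are all hypothetical stand-ins chosen only to illustrate the trade-off between cost and utility across fidelity levels.

```python
# Illustrative sketch only. The model space, requirements, coverage values,
# and budget below are hypothetical; the paper's actual utility and cost
# definitions are not reproduced here.
from dataclasses import dataclass
from itertools import combinations

@dataclass
class FidelityModel:
    name: str       # e.g., a low-fidelity simulator vs. a closed-track vehicle test
    cost: float     # assumed per-test cost in arbitrary units
    coverage: dict  # requirement id -> utility contribution in [0, 1]

# Hypothetical model space spanning three fidelity levels.
MODEL_SPACE = [
    FidelityModel("synthetic-images", cost=1.0, coverage={"R1": 0.4, "R2": 0.2}),
    FidelityModel("photorealistic-sim", cost=3.0, coverage={"R1": 0.7, "R2": 0.5, "R3": 0.3}),
    FidelityModel("closed-track-vehicle", cost=10.0, coverage={"R2": 0.9, "R3": 0.8}),
]

REQUIREMENTS = ["R1", "R2", "R3"]  # hypothetical perception requirements
COST_BUDGET = 12.0                 # assumed total test budget

def utility(plan):
    """Aggregate utility: best coverage achieved for each requirement, summed."""
    return sum(max(m.coverage.get(r, 0.0) for m in plan) for r in REQUIREMENTS)

def cost(plan):
    return sum(m.cost for m in plan)

# Exhaustively enumerate test combinations within budget and keep the one
# with the highest utility, breaking ties in favor of lower cost.
best = max(
    (plan
     for k in range(1, len(MODEL_SPACE) + 1)
     for plan in combinations(MODEL_SPACE, k)
     if cost(plan) <= COST_BUDGET),
    key=lambda plan: (utility(plan), -cost(plan)),
)

print([m.name for m in best], "utility:", round(utility(best), 2), "cost:", cost(best))
```

With these illustrative numbers, the selected plan pairs the cheapest simulation model with the high-fidelity vehicle test, since that combination covers all three requirements within budget; a larger model space would call for a solver rather than exhaustive enumeration.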

With the broad range of Machine Learning (ML) and Artificial Intelligence (AI)-enabled systems being deployed across industries, there is a need for systematic ways to test these systems. While ML and AI researchers have made significant efforts to develop new methods for testing ML/AI models, many of these approaches are inadequate for DoD acquisition programs because of the unique challenges introduced by the organizational separation between the entities that develop AI-enabled systems and the government organizations that test them. To address these challenges, the emerging paradigm is to test these systems rigorously throughout the development process and after deployment. Implementing testing across the acquisition life cycle therefore requires using data and models at various levels of fidelity to build a cohesive body of evidence that can support the design and execution of efficient test programs for systems acquisition. This paper provides a perspective on what is needed to implement efficient and effective testing of an AI-enabled system, based on a literature review and a walk-through of an illustrative computer vision example.
