
Outlining Traceability: A Principle for Operationalizing Accountability in Computing Systems

Joshua A. Kroll [email protected] Naval Postgraduate School, Monterey, CA
(2021)
Abstract.

Accountability is widely understood as a goal for well governed computer systems, and is a sought-after value in many governance contexts. But how can it be achieved? Recent work on standards for governable artificial intelligence systems offers a related principle: traceability. Traceability requires establishing not only how a system worked but how it was created and for what purpose, in a way that explains why a system has particular dynamics or behaviors. It connects records of how the system was constructed and what the system did mechanically to the broader goals of governance, in a way that highlights human understanding of that mechanical operation and the decision processes underlying it. We examine the various ways in which the principle of traceability has been articulated in AI principles and other policy documents from around the world, distill from these a set of requirements on software systems driven by the principle, and systematize the technologies available to meet those requirements. From our map of requirements to supporting tools, techniques, and procedures, we identify gaps and needs separating what traceability requires from the toolbox available for practitioners. This map reframes existing discussions around accountability and transparency, using the principle of traceability to show how, when, and why transparency can be deployed to serve accountability goals and thereby improve the normative fidelity of systems and their development processes.

traceability, accountability, transparency, AI principles, AI ethics
journalyear: 2021; copyright: licensedusgovmixed; conference: FAccT ’21: ACM Conference on Fairness, Accountability, and Transparency, March 3–10, 2021, Virtual Event, Canada; booktitle: FAccT ’21: ACM Conference on Fairness, Accountability, and Transparency, March 3–10, 2021, Virtual Event, Canada; price: 15.00; doi: 10.1145/3442188.3445937; isbn: 978-1-4503-8309-7/21/03; ccs: Software and its engineering → Traceability; ccs: Computer systems organization → Maintainability and maintenance; ccs: Software and its engineering → Software version control

1. Introduction

Accountability is a long sought-after value in decision-making systems, especially when those systems are computerized (Jabbra and Dwivedi, 1989; Nissenbaum, 1996; Friedman et al., 1999; Mulgan, 2003; Breaux et al., 2006a; Weitzner et al., 2007; Feigenbaum et al., 2012; Editorial, 2016; Kroll et al., 2017; Raji et al., 2020; Kroll, 2020). It is a multifaceted concept, presenting challenges for operationalization (Wieringa, 2020). Work to date on operationalizing accountability in computing systems focuses on keeping records along the dimensions of time, information, and action (Feigenbaum et al., 2012), on ensuring that misbehavior in a protocol can be attributed to a specific participant (Argyraki et al., 2007; Haeberlen et al., 2007; Küsters et al., 2010; Künnemann et al., 2019), and on demonstrating partial information about the contents of those records to affected people or to oversight entities (Kroll et al., 2017). But, despite work on requirements engineering (Breaux et al., 2006a, 2009), there remains no clear approach to defining what requirements exist for generating and maintaining records, neither regarding what records must contain nor regarding how to store them. Further, there is minimal work describing how records lead to accountability in the sense of responsibility or in the sense of fidelity to values and norms, such as fairness, nondiscrimination, or safety (Kroll, 2020). However, the related principle of traceability has recently been espoused by many organizations as an approach to making software systems robustly transparent in order to facilitate accountability. This article explores the notion of traceability, asking how it might be operationalized in practical systems and how it serves the broader value of accountability.

Traceability refers broadly to the idea that the outputs of a computer system can be understood through the process by which that system was designed and developed (Kroll, 2018). Specifically, traceability requires that transparency about the development process and its goals be tied to outcomes through the auditability of the methods used in both the creation and the operation of the system itself. This includes ensuring the existence and legibility of records related to technology choice, design procedure, development process, operation, data sources, and system documentation. In this way, traceability unifies many desirable goals in the governance of computing systems, relating transparency about system design, construction (e.g., components and data), and operation to the auditability of the system to investigate its properties and to establish responsibility for system outcomes. Tracing how a system functions must go beyond providing a simple, mechanical description of how operations led from input to output or what the system’s dynamics and behaviors are. Instead, traceability demands an answer to the question of why the system works the way it does and what decisions led to that design or operation. In other words, traceability relates the objects of transparency (disclosures about a system or records created within that system) to the goals of accountability (holding the designers, developers, and operators of a computer system responsible for that system’s behaviors and ultimately assessing that the system reflects and upholds desired norms).

In this article, we unpack the concept of traceability analytically, examining several ways the principle of traceability has been articulated in official policy guidance and consider how it might be operationalized in real systems to achieve the lofty goal of connecting the operation of a computer system to normative concerns such as fidelity to policy goals or embodiment of desired values. In addition, we systematize existing technological tools which can help achieve traceability and suggest a set of system requirements implied by the principle and its goals. We find that traceability ties together a number of existing threads of research around the responsible and ethical use of software technology for socially important applications. For example, traceability relates work on accounting for computing systems concretely (Argyraki et al., 2007; Backes et al., 2009; Haeberlen et al., 2007), calls for better governance of computer systems (Citron, 2007, 2008; Mulligan and Bamberger, 2018; Citron and Calo, 2020), demands for reproducibility of software design and construction (Stodden and Miguez, 2014; Stodden et al., 2014; Lamb et al., 2020; Nikitin et al., 2017), work on the security of software supply chains (Ellison et al., 2010), and the burgeoning field of trustworthy artificial intelligence (Brundage et al., 2020; Floridi, 2019).

In another sense, traceability is a human factors problem. For a system to be traceable, key information about the development and operation of the system must be understandable to relevant stakeholders. Thus, traceability relates to questions of whether systems can be adequately understood for their given purpose, including an understanding of their provenance and development and of their operation (Doshi-Velez et al., 2017). We refrain, however, from saying that traceability requires explainability (at least as that concept is understood within computer science as a mechanical description of the mapping between inputs and outputs) — it does not. As noted above, a mechanical tracing of inputs to outputs, even if causal, does not describe a system’s origins or answer questions about why it has particular dynamics or exhibits particular behavior. Establishing appropriate human factors evaluation will depend on an understanding of stakeholders and their needs (Young et al., 2019a; Wolf et al., 2018; Lee et al., 2019a; Lee et al., 2019b, 2020), along with robust explanations for operational behavior that are appropriately causal, contrastive, and selective (Miller, 2019).

Ultimately, we find that traceability provides a grounding for other ethical principles and normative desiderata within computing systems. Traceability is more easily amenable to measurement and assessment than other principles because it is clear when efforts towards traceability have been undertaken and when those efforts have been successful. This is not true for other commonly espoused but more abstract principles (such as fairness and equitability or even transparency and accountability). Partly, this is because traceability, like transparency, is well understood as an instrumental value that serves as a path to achieving other goals, rather than an end to achieve in itself. That is not to say traceability is not worthy of pursuit, but rather to say that its pursuit serves other ends.

Achieving traceability may be more concrete than achieving other ethical goals such as fairness, but it is by no means an easy task. Although the path to reaching traceable systems is straightforward, few if any systems can claim to have successfully navigated it. We identify gaps in achieving real-world traceability in systems at the level understood within policy documents as a requirement or obligation on the entities bound by those policies. These gaps fit into four basic categories: relevant technologies such as tools for reproducibility or requirements on lifecycle and versioning management for software and data science artifacts have not been adopted; related human factors problems defining what it means for people to understand system provenance and its implication for their goals and tasks have not been solved; there are not yet accepted standards or even shared best practices for test and evaluation of software systems for critical applications in many domains; and there is often insufficient oversight and review of the behavior of computing systems or capacity for disaffected persons to seek redress. Some work has aimed at each of these gaps—traceability systematizes the project of these disparate threads into the operationalization of a single ethical principle for computing systems.

Finally, a note about terminology: in this work, we explicitly avoid the term “artificial intelligence”, which eludes a rigorous definition. Instead, we speak of automation, or the embodiment of tasks in technological artifacts (which may or may not be software-based), and computing systems, which use computational processes, software, and related technologies (e.g., data science and machine learning) to implement automation. The use of the term “systems” here is purposeful – artifacts on their own (such as models or programs) cannot exhibit traceability, which is a property of a tool in a context of use. We can model this by considering traceability as a property of a system, a set of elements which interact to produce aggregated behavior by virtue of those interactions (Mitchell, 2009; Martin Jr et al., 2020). Although a tool such as a piece of software can support traceability, because traceability relates the specifics of the tool’s design to the effects of its use, the tool cannot on its own be said to be traceable. Traceability is a functional property of the tool in use, and is poorly defined without this context.

2. Adoption of Traceability

Traceability has been adopted as an explicit principle, either by organizations committing to the responsible use of technology or as policy guidance in official documents. For example, the United States President’s Executive Order 13960 (Executive Office of the President of the United States, 2020) gives nine principles for the development of “trustworthy” AI and requires that agencies of the U.S. Federal Government make AI systems “Responsible and Traceable” when “designing, developing, acquiring, and using” AI. Specifically, that principle states:

Responsible and traceable. Agencies shall ensure that human roles and responsibilities are clearly defined, understood, and appropriately assigned for the design, development, acquisition, and use of AI. Agencies shall ensure that AI is used in a manner consistent with these Principles and the purposes for which each use of AI is intended. The design, development, acquisition, and use of AI, as well as relevant inputs and outputs of particular AI applications, should be well documented and traceable, as appropriate and to the extent practicable.

The order is unfortunately light on details, deferring these to reports demanded of agencies in the weeks following its release. However, we see that traceability is related to the documentation of development and acquisition of “AI” (a term the order also leaves undefined).

Almost a year prior to this government-wide policy directive, the United States Department of Defense adopted traceability as one of its five “AI Ethics Principles” (Office of the Secretary of Defense, 2020). Specifically, department guidance states the principle:

Traceable. The Department’s AI capabilities will be developed and deployed such that relevant personnel possess an appropriate understanding of the technology, development processes, and operational methods applicable to AI capabilities, including with transparent and auditable methodologies, data sources, and design procedure and documentation.

This language originates in a report from a Federal Advisory Committee, the Defense Innovation Board (Defense Innovation Board, 2019). That study recommends as part of traceability various improvements in software development discipline, including “simulation environments, modeling, automated testing, and validation tools” but also improvements to design methodology and assurance that relevant stakeholders are apprised of development progress. In addition, traceability is operationalized during system deployment through a combination of online auditing and careful testing, possibly in simulated environments. The principle of traceability is also referenced in documents from the U.S. National Security Commission on Artificial Intelligence, which adopts the Defense Department language and approach (National Security Commission on Artificial Intelligence, 2020).

A similar principle that AI systems have “Documentation of Purpose, Parameters, Limitations, and Design Outcomes” can also be found in the United States Intelligence Community’s “AI Ethics Framework” (United States Office of the Director of National Intelligence, 2020). This principle goes further, calling for documentation stored in a way that is “accessible to all potential consumers” of the technology, as well as documentation of “how to verify and validate” it. This principle fits into a larger framework that also demands “Accounting for Builds, Versions, and Evolutions of an AI” as well as documentation of “the test methodology, results, and changes made based on the test results”. Overall, although the IC framework does not explicitly use the term “traceability”, it clearly espouses the concepts this term signifies in other policy documents.

The U.S. Federal Data Strategy echoes language from an earlier Executive Order 13859 (Executive Office of the President of the United States, 2019), “Maintaining American Leadership in Artificial Intelligence”, which calls for U.S. government agencies to “Enhance access to high-quality and fully traceable Federal data, models, and computing resources to increase the value of such resources for AI R&D, while maintaining safety, security, privacy, and confidentiality protections consistent with applicable laws and policies [emphasis added].” This meaning is expanded in the Federal Data Strategy “2020 Action Plan” (Leveraging Data as a Strategic Asset Cross-Agency Priority Team, 2020) to cover:

  • Expanding access to government data and enhancing its quality;

  • Improving guidance around maintaining inventory of data and models of that data;

  • Developing standard metadata and formats for identified assets to facilitate a government-wide data inventory; and

  • Establishing pilot projects to demonstrate this traceability.

Here, traceability is focused on the provenance of software systems and the decisions made during their creation. However, unlike transparency, which is often framed as a burden placed on system creators and controllers, traceability is described as an enabling value, providing a route to new and more capable systems and a way to tie access to data and other resources to the provenance of systems.

Traceability is by no means limited to U.S. policy documents, however. The E.U. High Level Expert Group’s “Ethics Guidelines for Trustworthy AI” (European Commission Independent High-Level Expert Group on Artificial Intelligence, 2020) calls for traceability as a part of its broader theme of transparency, saying:

The data sets and the processes that yield the AI system’s decision […] should be documented to the best possible standard to allow for traceability and an increase in transparency. This also applies to the decisions made by the AI system. This enables identification of the reasons why an AI-decision was erroneous which, in turn, could help prevent future mistakes. Traceability facilitates auditability as well as explainability.

Again, we see that traceability is tied explicitly to improvements in the development process, such as documenting design methodology, but also to improvements in test and validation as well as examining outcomes driven by the system.

Traceability also appears in international policy guidance. For example, the Organization for Economic Cooperation and Development (OECD) states in its “Recommendation of the Council on Artificial Intelligence” that AI systems should have “Robustness, security, and safety” and “[t]o this end, AI actors should ensure traceability, including in relation to datasets, processes and decisions made during the AI system lifecycle, to enable analysis of the AI system’s outcomes and responses to inquiry, appropriate to the context and consistent with the state of art” (Organization for Economic Cooperation and Development, 2019). In this formulation, traceability is explicitly called out for its information security value, and it is also made clear that operationalizing traceability must be done in a way that is “appropriate to the context” and which “enable[s] analysis of the […] outcome”. However, we see again a focus on making clear the reasons for a system’s design and the origins of its components as well as the tools and datasets those components rely on.

Beyond policy documents from major Western superpowers and related transnational coordinating institutions, traceability also appears in China’s “Governance Principles” for “Responsible AI” (National New Generation Artificial Intelligence Governance Expert Committee (Ministry of Science and Technology convening), 2019). Here it comes in somewhat different guise, attached to a principle that AI must be “Secure/Safe and controllable”, although this principle also addresses issues of transparency and provenance (here described in terms of “tamper-resistance”) as the above principles do. Of particular interest is that the document calls not for traceability as a requirement but rather says that “AI systems should […] gradually achieve auditability, supervisability, traceability, and trustworthiness.” These principles should be viewed in the context of the earlier Beijing AI Principles, which similarly aspire to traceability without claiming it as a requirement. Thus, non-Western conceptions of traceability are similar in substance if perhaps different in emphasis, calling for “various parties related to AI development” to “form AI security assessment and management capabilities” while describing transparency, accountability, and traceability all in aspirational terms rather than as requirements.

Across a variety of global policy documents, then, we see that traceability has emerged as a key requirement for the responsible use of software systems. This property entails systems where the design methodology, underlying data sources, and problem definitions are clearly documented and released to stakeholders (a kind of structured transparency of the system’s structure and development). Additionally, traceability requires connecting this transparency to outcomes and behaviors of the system, encompassing auditability of the system-in-operation as well as the testability of the system during both development and operation. Further, traceability seeks to relate disclosed information to the problem of whom to hold responsible for these behaviors in cases both of failure and success, providing a link between transparency and disclosure of system provenance and determinations of accountability (Kroll, 2020; Wieringa, 2020). An expansive requirement, traceability lies at the core of system hazard mitigation and risk management decisions by system controllers.

2.1. Values Served by Traceability

To understand the requirements demanded by this conception of traceability, we must explore the goals articulated by the documents which espouse it. Traceability is an expansive concept, serving many values both concrete and abstract.

Although traceability is often described as a kind of transparency, it does not speak directly to the question of what systems exist or what the scope of their operations and outputs is, a classical goal of transparency requirements in policy (Gellman, 2017; Bruening and Kroll, 2019). Instead, as noted in Section 1, traceability ties the reasons a system works as it does to its actual operation, supporting audit and interrogation into the structure and function of a system, thereby serving to operationalize accountability. However, in requiring structured disclosure about systems, traceability does serve to enhance transparency where it exists, providing a link between transparency’s instrumental operation and the values it serves and showing to what end that transparency is useful. Additionally, traceability provides a path to understanding a system’s integrity in contexts where a system’s supply chain may be in doubt, whether for reasons of complexity or for reasons of security. A system which is robustly traceable is less likely to suffer manipulation by an adversary, as the provenance of the system’s components and their arrangement is made plain to affected stakeholders. Thus, if an adversary were to substitute or modify components of or inputs to the system for some or all decisions, robust traceability would require that this manipulation become visible to affected parties. And in systems where the origin of a decision may be complex, robust traceability requires that a mechanistic justification for that decision be producible on demand, supporting both system governance and the contestability of system outputs (Mulligan et al., 2019a). Relatedly, a traceable system must be understandable to the humans intended to trace its operation, so traceability encompasses aspects of explainability and is at least partially a human factors question. Finally, traceability serves to make plain the reasons behind failures, showing where investigators and analysts can interrogate a system once an undesired behavior occurs and relating design choices and operational facts to specific outcomes.

3. Requirements for Traceability

In this section, we expand on the notion of traceability as adopted in policy to understand what it requires in technology. Our explication of requirements is driven by the conceptualization of traceability and the goals it serves in the policy documents described in Section 2. These requirements should be viewed as a lower bound: traceability requires doing these at a minimum, but may require other activities depending on the application context.

3.1. Design Transparency

The primary demand of traceability is that the design choices made by system designers be made available to system stakeholders affected by the system’s operation. This could be accomplished via transparency such as making system source documentation, code, or data available (Citron, 2008) or through more abstracted disclosures such as impact assessments (Reisman et al., 2018; Selbst, 2017).

Many proposed approaches would provide standardized disclosures in support of transparency or traceability (Partnership on AI, 2019). These include data sheets (Gebru et al., 2018), fact sheets (Arnold et al., 2019), data statements (Bender and Friedman, 2018), data nutrition labels (Holland et al., 2018), model cards (Mitchell et al., 2019), and other standardized disclosure formats. But disclosure alone does not provide traceability, and while traceability requires disclosure, they must not be equated. Indeed, if traceability is more akin to an operationalization of accountability than of transparency, it may be said that while such tools improve traceability, they do so only to the extent that they reflect and enable a broader process of assessment, test, and evaluation.
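
As a concrete (and purely hypothetical) illustration of such a disclosure, the sketch below records a model-card-style summary as structured data alongside the artifact it describes; the field names and values are invented for illustration and are not drawn from any of the cited formats.

```python
# A minimal, hypothetical model-card-style disclosure captured as structured
# data. Field names and values are illustrative only; real formats (model
# cards, datasheets, fact sheets) define their own required fields.
import json

disclosure = {
    "model_name": "loan-risk-classifier",            # hypothetical system
    "version": "2.3.1",
    "intended_use": "Pre-screening of loan applications for manual review.",
    "out_of_scope_uses": ["Automated final denial of credit"],
    "training_data": {
        "source": "internal-applications-2015-2019",  # hypothetical dataset
        "known_gaps": ["applicants under 21 underrepresented"],
    },
    "evaluation": {
        "metric": "AUC",
        "value": 0.87,                                # illustrative number
        "disaggregated_by": ["age_band", "region"],
    },
    "approved_by": "model-review-board",              # hypothetical process
    "review_date": "2020-11-02",
}

with open("disclosure.json", "w") as f:
    json.dump(disclosure, f, indent=2)
```

A record of this kind supports traceability only to the degree that it is produced by, and kept consistent with, the assessment and review process it summarizes.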

Design proceeds from requirements. Because traceability asks that stakeholders be able to understand why design decisions were taken, traceability also requires that requirements be disclosed as part of transparency about design. Requirements reflect the way a particular system’s goal was articulated and approached by designers, the critical aspect of problem formulation not otherwise subject to investigation by persons affected by a computing system and often invisible in even required disclosures of artifacts like code, data, or documentation (Passi and Barocas, 2019). When requirements are not specified formally, they should at least be described in an unambiguous natural language form as relied on during development. Thus, systems that result from an exploratory process (e.g., many models derived from standard data science practices) must be augmented with descriptions of the governance attendant to that exploration, which controls why particular avenues were or were not explored and how the fruits of that exploration were considered to be worthy of application in a particular context.

Transparency of design not only provides a window into how systems work; it also provides visibility into where design choices were taken that have significant impact. For example, software is used to automate the work of lab technicians who analyze forensic evidence, but it is often unclear how that software arrives at its determinations, a problem which has spawned much litigation (Kwong, 2017). One tool, New York City’s Forensic Statistical Tool, has essentially fallen out of use after journalists raised questions around its accuracy, leading to a court-ordered release of its source code and subsequent public scrutiny (Kirchner, 2017). Many commercial offerings remain trade secrets, despite being used in criminal proceedings regularly. As a hypothetical example, imagine software which measures a parameter of a forensic sample and performs differing analysis based on whether the measured value was above or below a given threshold. The very existence of this threshold and the process by which it was determined may be unknown without sufficient traceability. And without that understanding, neither the suitability of the analysis, nor the thresholding that triggered it, nor the value of that threshold can be challenged or reviewed. Thus, sufficient traceability means raising decisions about how parameters internal to the design of a system are set so they are legible to those outside the design process.
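
To make the hypothetical concrete, the sketch below contrasts a threshold buried as an unexplained constant with one declared alongside its provenance; the values, units, and review artifacts named here are invented for illustration and do not describe any real forensic tool.

```python
# Hypothetical illustration only: a threshold buried in code versus one that
# is declared with its provenance so it can be reviewed and challenged.

# Opaque version: the threshold and its origin are invisible to outsiders.
def analyze_opaque(sample_value: float) -> str:
    return "protocol_A" if sample_value >= 0.37 else "protocol_B"

# Traceable version: the threshold is a named, documented design decision.
ANALYSIS_THRESHOLD = {
    "value": 0.37,
    "units": "relative fluorescence",             # hypothetical units
    "rationale": "validation study VS-12, 2018",  # hypothetical record
    "approved_by": "method-validation-committee", # hypothetical reviewer
}

def analyze_traceable(sample_value: float) -> str:
    threshold = ANALYSIS_THRESHOLD["value"]
    branch = "protocol_A" if sample_value >= threshold else "protocol_B"
    # Recording which branch was taken, against which threshold, supports
    # later review of both the decision and the design choice behind it.
    print(f"sample={sample_value} threshold={threshold} branch={branch}")
    return branch
```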

3.1.1. Testing

In some descriptions, traceability is established by the ability to test or audit for particular outcomes. Testing of software systems is an entire discipline with a long history (Gelperin and Hetzel, 1988; Myers et al., 1979), but it can help support external interrogations of a system’s behavior under specific conditions (Desai and Kroll, 2018; Kroll, 2018). Traceability requires developers and those involved in testing and system evaluation to minimize the gap between what is known to system developers (through their knowledge of design decisions and the actualization of those decisions in a test and evaluation regime) and what is known to outside stakeholders who do not see the development process or the output of developmental testing. Because the halting problem prevents testing external to development from being formally sound (Rice, 1953), minimizing this gap necessarily requires disclosing information about the design as well as information about the system’s performance under test.

Thus, traceable systems must have and release information about robust test and evaluation plans. Further, such systems must be designed to be testable during development and operation, and ideally to be testable by outsiders as well as developers. This is driven by the close relationship between traceability and auditability.
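
As a small illustration of what externally runnable testing might look like, the sketch below checks behavioral invariants of a hypothetical scoring function; both the function and the invariants are placeholders standing in for requirements that a real test and evaluation plan would specify.

```python
# A minimal sketch of externally runnable invariant tests for a hypothetical
# scoring function. The function and the invariants are placeholders; real
# test plans are derived from the system's stated requirements.

def score(income: float, debt: float) -> float:
    """Hypothetical scoring function under test."""
    return max(0.0, min(1.0, 0.7 * income / (income + debt + 1.0)))

def test_scores_are_bounded():
    # Assumed requirement: scores always fall in [0, 1].
    for income, debt in [(0, 0), (50_000, 10_000), (1e9, 0), (10, 1e9)]:
        s = score(income, debt)
        assert 0.0 <= s <= 1.0, f"score {s} out of bounds for {(income, debt)}"

def test_monotone_in_income():
    # Assumed requirement: holding debt fixed, more income never lowers the score.
    debt = 20_000
    previous = -1.0
    for income in [0, 10_000, 50_000, 100_000]:
        s = score(income, debt)
        assert s >= previous
        previous = s

if __name__ == "__main__":
    test_scores_are_bounded()
    test_monotone_in_income()
    print("invariant tests passed")
```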

3.2. Reproducibility

Related to the requirement to minimize the gap between the view of developers and other insiders and that of stakeholders outside the process of creating or operating the system is the issue of reproducing a system’s behavior so it can be evaluated for correctness against the stated requirements. If a system’s behavior cannot be reproduced by a developer, it cannot be made plain to an outside stakeholder. A practical issue is that even disclosures of source code and data (or the use of fully open-source code and data) cannot be related to compiled software or trained models unless that compilation or training can be precisely reproduced, which is possible using an emerging set of tools and software practices (Lamb et al., 2020). More careful reasoning about compilation can enable efficient updating (Nikitin et al., 2017) or make verification of software function straightforward from transparency (Ka-Ping Yee, 2007).

More broadly, it should be possible to reproduce even abstract conclusions from data or any reported experimental results that claim scientific authority (Stodden and Miguez, 2014; Stodden et al., 2014). Without this, an external stakeholder aiming to verify why and how a system was designed a particular way or what a system did will be unable to do so.

A related risk is that loss or modification of development information (code, data, built components) will lead to a situation where the system-as-deployed no longer relates to supporting information that might be disclosed, possibly subsequently. Thus, robust reproducibility of both artifacts and conclusions must be a requirement for any traceable system.
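
One low-tech way to detect the drift this paragraph warns about is to rebuild (or retrain) from recorded inputs and compare digests against the artifact as released; the sketch below assumes a hypothetical build command and file names and is not tied to any particular build or training system.

```python
# A minimal sketch of a reproducibility check: rebuild (or retrain) from the
# recorded inputs and compare the digest of the result with the digest of the
# artifact as released. Build commands and file names are hypothetical.
import hashlib
import subprocess
from pathlib import Path

def sha256_of(path: str) -> str:
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()

def reproduces(build_cmd, rebuilt_path, released_digest) -> bool:
    # e.g., build_cmd = ["make", "model.pkl"] in a hypothetical project
    subprocess.run(build_cmd, check=True)
    return sha256_of(rebuilt_path) == released_digest

# Example (hypothetical): the released artifact's digest would come from a
# manifest published alongside the deployed system.
# ok = reproduces(["make", "model.pkl"], "model.pkl",
#                 released_digest="<digest-from-published-manifest>")
```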

3.3. Operational Recordkeeping

Beyond traceability at the design stage, traceability is understood to apply to system operations. Thus, traceable systems must keep records of their behaviors. But what records must they keep, how and how long must those records be maintained or retained, and what must those records show? These questions are highly contextual, and mapping the associated concrete requirements demands the engagement of subject-matter experts able to understand both the application area and the technology intervening on it. Abstractly, the requirement must be that records support sufficient inquiry by external stakeholders or oversight entities (Kroll et al., 2017; Wieringa, 2020; Kroll, 2020).

A further problem is relating the contents of records kept during operation to a system’s observed behavior. For systems which are fully reproducible, this is possible in principle through transparency or through the intervention of trusted oversight entities which can receive disclosures under seal (e.g., law enforcement agencies, courts, or adversarial parties in litigation). However, it is possible to use tools from cryptography to bind the contents of records to the computations performed on those records to make this relation both more efficient to establish and possible to establish more broadly even when portions of the retained information must remain secret or private (Kroll, 2015). For example, cryptocurrencies such as ZeroCash contain protocols to convince the receiver of a payment that the digital currency the payment represents has not previously been spent, without revealing the payment’s origin or full history, simulating the privacy properties of physical cash (Sasson et al., 2014).
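
At the simplest end of this spectrum sits a plain hash commitment: an operator publishes a digest of a record when a decision is made and reveals the record later, letting anyone check that the two match. The sketch below shows only this minimal pattern; it does not capture the zero-knowledge techniques in the cited work, which allow properties of a record to be checked without revealing it.

```python
# A minimal hash-commitment sketch: publish a digest of a record now, reveal
# the record later, and let anyone verify that the two match. Zero-knowledge
# protocols (as in the cited work) go further, proving properties of the
# record without revealing it; this sketch does not do that.
import hashlib
import json
import secrets

def commit(record):
    """Return (commitment, opening). The opening stays secret until disclosure."""
    salt = secrets.token_hex(16)
    opening = salt + json.dumps(record, sort_keys=True)
    commitment = hashlib.sha256(opening.encode()).hexdigest()
    return commitment, opening

def verify(commitment, opening):
    return hashlib.sha256(opening.encode()).hexdigest() == commitment

# Example: the operator publishes `c` when the decision is made and discloses
# `o` to an oversight body later. Record contents are hypothetical.
c, o = commit({"decision_id": 1234,
               "inputs_digest": "<digest-of-inputs>",
               "outcome": "deny"})
assert verify(c, o)
```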

An alternative model is to vest recordkeeping and data access in a trusted third party such as an oversight entity or an independent data trust (Young et al., 2019b). Such an entity can make assertions about records and determine the equities of their release on a case-specific basis.

Finally, the maintenance of records of routine system activity along with records of failures and near-miss incidents provides a foundation upon which an understanding of the system’s patterns of function can be built, enabling the development of interventions that improve outcomes, safety, and overall system function and reliability (Cook, 1998; Barach and Small, 2000; Leveson, 2016).

3.4. Human Understanding

As noted in Section 2, the principle of traceability is often justified by the need for humans to understand the decision rules embedded in the system. Principles call for “appropriate understanding of the technology” and transparency which “enables identification of the reasons an AI-decision was erroneous”. Achieving these abstract principles requires not only transparency about system function, but careful system engineering to ensure that systems do not confuse their human operators (Leveson et al., 1997), obscure paths of accountability for accidents where affordances in the design of technology lead human operators to make judgments that lead to accidents (Elish and Hwang, 2015; Elish, 2019), or confuse the loci of agency and power within a system (Bainbridge, 1983).

This requires thinking beyond the current focus on explainability and causality in AI (Halpern and Pearl, 2005b; Doshi-Velez et al., 2017; Doshi-Velez and Kim, 2017; Guidotti et al., 2018; Molnar, 2018) and a push toward tying explanations and disclosures to shifts in power (Miller, 2019; Selbst and Powles, 2017; Edwards and Veale, 2017b, a; Selbst and Barocas, 2018; Selbst et al., 2019; Kalluri, 2020). Such thinking has been a robust component of research in human factors (Salvendy, 2012; Sendak et al., 2020), and developing appropriate strategies for establishing when and why and how much humans understand technology remains an important research domain necessary to enable robust human-machine collaboration (Abdul et al., 2018; Rader et al., 2018). Traceability thus requires that systems be transparent not just about their function but about whether that function is appropriately communicated to operators and other affected humans. This is also important in the context of failure analysis, as many accidents result from inappropriate modeling of machines by human operators, of humans by machines and their designers, or at the point of handoff between human and machine (Bainbridge, 1983; Mulligan and Nissenbaum, 2020).

3.5. Auditability

Another component of the traceability principles as stated is that they support the auditability of systems both before they are fielded and during operation. This has several meanings.

First, systems must maintain sufficient records during development and operation that their creation can be reliably established and reproduced. This requirement is largely encapsulated in the reproducibility and operational/developmental recordkeeping requirements listed above.

Beyond requiring that evidence of how a system operated be established, auditability requires that this evidence be amenable to review and critique of the system’s operation, as well as comparison of the fidelity of that evidence to reality. Such assessments can be qualitative or quantitative in nature, and could happen during development or once a system is fielded. In the accounting literature, an audit compares recorded evidence (“accounting” for a system’s behavior) to reality to determine whether that evidence is reliable; alternatively, an assessment is the ascription of some value to that evidence or a judgment about the meaning of that evidence (Espeland and Vannebo, 2007). Scholarly critiques have noted that understanding the ethical and social implications of computing systems as simple as sorting (Sandvig, 2015) and as complex as news curation (Sandvig et al., 2014) requires assessment, but describe this requirement as one of auditing.

Although the term “audit” is widely used in policy and principles documents to describe what is needed (and performed) for computing systems (Chin and Ozsoyoglu, 1982; Blocki et al., 2013; Hannak et al., 2013, [n.d.]; Sandvig et al., 2014; Bashir et al., 2016; Chen and Wilson, 2017; Kearns et al., 2017; Kim, 2017; Amit Elazari Bar On, 2018; Reyes et al., 2018; Chen et al., 2018; Raji et al., 2020), the task referenced by the traceability principles is more in line with assessment (Jagadeesan et al., 2009). Likely, this is due to the history in computer security of the use of “audit methods” to assess the security state of a system (Lunt, 1988; Habra et al., 1992; Bishop, 1995; Colbert and Bowen, 1996), to control disclosure of sensitive information in databases (Helman and Liepins, 1993; Kenthapadi et al., 2005; Nabar et al., 2006; Dwork et al., 2006; Nabar et al., 2008), and to establish security policy compliance of the behavior of system infrastructure (Haeberlen et al., 2007; Haeberlen et al., 2010; Haeberlen, 2010). Practical applications often proceed by applying standard assessment models that use well defined checklists and controls to achieve desired outcomes (National Institute of Standards and Technology, 2018, 2020). Other work looks to connect “audit” data to sociotechnical goals like the correct operation of an election to establish an assessment of the entire sociotechnical system (Waters et al., 2004; Adida, 2008; Hall, 2010). However, some practitioners have criticized the gap between audit (in a compliance sense) and the assessment of security derived from it (Bellovin, 2006; Clark, 2018). Still others have begun to seek data on the validity of audit approaches for building assessable metrics of abstract properties such as security (de Castro et al., 2020).

Scholars from the social sciences have been critical of this over-quantification of auditing, and the collapse of auditing and assessment into purely quantitative methods (Carruthers and Espeland, 1991; Porter, 1992; Espeland and Vannebo, 2007). It is notable that widely used governmental auditing standards allow for both quantitative methods and qualitative assessment (Government Accountability Office, 2018). The collapse of the word “audit” onto system assessments in this context is likely due to the naming of a particular discrimination detection methodology, the “audit study”, in which similarly qualified test subjects differing only in (perceived) race or gender or other protected characteristic under study are subjected to the same process to test for facial evidence of discrimination (Siegelman and Heckman, 1993), an approach which has been, and must continue to be, updated to include studies of the behavior of automated systems (Ajunwa, 2021; Consumer Financial Protection Bureau, 2014).
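
For automated systems, the paired-testing logic of an audit study can be mimicked directly: matched inputs that differ only in the characteristic under study are submitted and the outputs compared. The sketch below is a schematic of that comparison against a hypothetical model; it is not a complete or statistically defensible audit methodology.

```python
# A schematic paired audit test against a hypothetical model: matched inputs
# differing only in a protected attribute are scored and compared. This
# illustrates the logic of an audit study only; a defensible audit also needs
# sampling design, statistical testing, and domain review.

def model_predict(applicant: dict) -> float:
    """Placeholder for the system under audit."""
    raise NotImplementedError

def paired_audit(base_profiles, attribute, value_a, value_b):
    gaps = []
    for profile in base_profiles:
        a = dict(profile, **{attribute: value_a})
        b = dict(profile, **{attribute: value_b})
        gaps.append(model_predict(a) - model_predict(b))
    return sum(gaps) / len(gaps)  # average outcome gap across matched pairs

# Example (hypothetical profiles and attribute values):
# mean_gap = paired_audit(profiles, attribute="perceived_race",
#                         value_a="group_1", value_b="group_2")
```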

Impact assessments are also proffered as a requirement for the governance of computing systems, and such assessments could guide requirements for auditability of the system during development or after it is fielded (Selbst, 2017; Reisman et al., 2018). As with other transparency tools, impact assessments are valuable insofar as they enable traceability, and considering the requirements of traceability can help establish the scope of appropriate impact assessment.

Finally, the extent to which a system can be audited effectively is a question both of design and governance. The system must have interfaces that enable audit and also policies that allow effective uses of those interfaces. In bureaucracies, this is driven by freedom-of-information laws in addition to standards that define the sorts of engagements with auditors the system expects. To assess whether something like an online advertising system admits a sufficient level of auditing, we need the system to report out the information auditors want, for the system’s controller to allow (or be compelled to allow) that information to be given to auditors, and for there to be standards around how that auditing will take place. This is to say nothing of the problem of finding a sufficient number of skilled auditors who can apply such standards, a problem which exists even in domains where standards are clear.

4. Supporting Traceability

In order to meet the requirements laid out in Section 3 and to support traceability in built systems, we need technical, organizational, and policy-level tools. Many such tools exist, but many more do not. Rarely are these tools brought together in an assemblage that resembles anything like the traceability principles espoused by many organizations and summarized in Section 2 or the requirements unpacked from these principles in Section 3. And yet, building robustly accountable computing systems requires embodying this principle in its full power.

In this section, we summarize known tools and relate their capabilities and limitations to the requirements laid out in Section 3. Our focus is primarily on technical tools here, though as noted many nontechnical tools are also necessary. It is likely this cross-functional nature of operationalizing traceability (and, indeed, any ethical principle) in technology that makes doing so such a challenge.

4.1. Development Methodology

The history of system development, especially software system development, is littered with failures (Ewusi-Mensah, 2003)—failures of the system to meet its stated requirements, failures of the system to function once delivered, or failures of the system to be developed to the point of operability at all. These risks have long been recognized—Brooks famously pointed out in 1975, based on experience developing IBM’s System 360 mainframe operating system, that “adding manpower to a late software project makes it later.” (Brooks Jr, 1995).

Early development methodologies flowed linearly from requirements to specifications to deliverables—a “waterfall” process, where the system’s components flow along a defined path to delivery (Petersen et al., 2009). But all too often, diversions along this path or changes in the environment by the time a system is delivered mean that the system does not meet the stated requirements upon delivery (Brooks Jr, 1995). This has led to the development of various “iterative” modalities of software delivery, such as the principle of “iterative enhancement” (Basili and Turner, 1975), the Agile movement (Beck et al., [n.d.]), or test-driven development (Beck, 2003). In all cases, the goal is to feed back outputs of the development process into future rounds of development for continuous, incremental improvement and learning. The result is meant to be a product which is closer in its delivered form to the envisaged requirements.

However, although iterative development often leads to better outcomes faster, issues can arise from the way the problem to be solved is defined initially (Argyris, 1977) and these methods may be based on unfounded assumptions in many cases (Turk et al., 2005). These methods have also been critiqued for attending too much to problems and bugs identified early in a project’s history (as the visibility and tracking of issues demands their rectification and continuous improvement) as well as for creating a model unsuitable for high-assurance tasks (as the project is under continuous revision, there is not an obvious point at which to verify or validate its functionality and these assessments can become problematic in a continuous-assessment setting) (Boehm, 2002; McBreen and Foreword By-Beck, 2002).

Yet there has been some progress towards traceability in software development. Discipline around version control of digital artifacts like code, data sets, and model products has been enhanced by the adoption of versioning-focused methodologies (Bass et al., 2015; Carter and Sholler, 2016) and the introduction of powerful, general digital object versioning systems (Loeliger and McCullough, 2012). Although the problem of keeping digital archives of any sort is a major challenge (Hedstrom, 1997; Rothenberg, 1999), standards exist for creating trustworthy repositories and auditing their reliability (ISO, 2012), especially for high-assurance applications like space data systems (Consultative Committee for Space Data Systems, 2011). Coupled with careful testing of desired invariants and augmented by disclosure of decision points in the development methodology, these tools can be extended to methods for trustworthy system development (Brundage et al., 2020).
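
As a small illustration of this versioning discipline, a build or training script can stamp each run with the exact code revision and input digests it used; the sketch below assumes the project lives in a git repository, and the file names are hypothetical.

```python
# A minimal sketch of stamping a build or training run with the code revision
# and input digests it used, so the resulting artifact can be traced back to a
# specific, versioned state of the project. Assumes the project is a git
# repository; file names are hypothetical.
import hashlib
import json
import subprocess
from datetime import datetime, timezone
from pathlib import Path

def git_revision() -> str:
    return subprocess.check_output(["git", "rev-parse", "HEAD"], text=True).strip()

def sha256_of(path: str) -> str:
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()

def stamp_run(inputs, out_path="run_record.json"):
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "git_revision": git_revision(),
        "inputs": {p: sha256_of(p) for p in inputs},
    }
    Path(out_path).write_text(json.dumps(record, indent=2))
    return record

# Example (hypothetical inputs):
# stamp_run(["config.yaml", "train.csv"])
```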

4.2. Reproducibility and Provenance

Many tools exist to support the reproducibility of created software artifacts and models of data sets (Lamb et al., 2020; Boettiger, 2015). This is critical to scientific research (Stodden and Miguez, 2014; Stodden et al., 2014; Geiger and Halfaker, 2017), and to the practice of software development and data science in industry both for reasons of maintaining clarity of system function (Bass et al., 2015) and security (Multistakeholder Process on Software Component Transparency, 2019). Despite the importance of this function, software projects and especially data science and machine learning products are rarely able to precisely reproduce their work to a sufficient level to provide reliability, let alone traceability (Warden, 2018).

Another relevant genre of tooling and research effort concerns the maintenance of system-level metadata which can be used to establish data and artifact provenance (Muniswamy-Reddy et al., 2006; Moreau et al., 2008; Buneman et al., 2001; Muniswamy-Reddy et al., 2010; Herschel et al., 2017; Pérez et al., 2018). Work in this area out of the operating systems and database research communities has led to efficient systems that can be transparently layered into existing scientific workflows (McPhillips et al., 2015; Ludäscher, 2016) or standard infrastructure components (Muniswamy-Reddy et al., 2010).

These tools support the requirements of reproducibility and can be used along with development methodology and discipline to support requirements about documenting and substantiating the origin of data and components. Unlike other aspects of traceability, this is an area with well developed, field-tested tooling, and solutions which can be brought immediately into practice for clear gains, both for traceability and for broader benefits (e.g., the management of these artifacts; capabilities to share results within communities of practice or research).

4.3. Design and Structure

The stated principles of traceability tie system records to system understanding and ultimately to the way that systems embody values. There is a long history of research into the way design reflects values, and the analytical framework of this line of research gives a foundational framework for making both of these links (Moor, 1985; Star and Ruhleder, 1994; Friedman, 1996; Star and Ruhleder, 1996; Nissenbaum, 2001; Flanagan et al., 2005; Nissenbaum, 2005; Friedman et al., 2008; Le Dantec et al., 2009; Gürses et al., 2011; Knobel and Bowker, 2011; Irani and Silberman, 2014; Jackson et al., 2014; Shilton et al., 2014; JafariNaimi et al., 2015; Steinhardt, 2016; Wagenknecht et al., 2016; Ziewitz, 2017; Mulligan and Bamberger, 2018; Shilton, 2018; Zhu et al., 2018; Young et al., 2019a). Specifically, this line of work notes that systems must be designed at a fundamental level to embody values and that it is not enough to add abstraction layers or minor changes at the end of a design process divorced from values. Tools such as analytic frameworks for operationalizing values (Burrell, 2016; Mulligan et al., 2016, 2019b), techniques for explicitly capturing certain concrete operationalizations of contested values in systems (Albarghouthi et al., 2016; Bonchi et al., 2016; Chouldechova, 2017; Albarghouthi and Vinitsky, 2019; Beutel et al., 2019; Wong and Mulligan, 2019), and mechanisms for observing the ultimate value-sensitivity of the resulting system (Lum and Isaac, 2016; Lipton et al., 2017; Buolamwini and Gebru, 2018; Raghavan et al., 2020) all arise from this rich and deep vein of work and provide approaches to capturing the value of traceability in real systems. We hope the requirements of Section 3 can serve as a similar analytic framework for the principle of traceability, and that the techniques summarized in this section can be useful in making such requirements actionable in real systems.

One well-studied value that can be operationalized through design is privacy. Privacy-by-design is a principle recognized both by technologists and lawyers, academics and practitioners (Cavoukian et al., 2009; Cavoukian, 2011; Computing Community Consortium, 2015c, b, a). The exact operationalization of privacy can be a contested, or even an essentially contested point (Mulligan et al., 2016). Many systems purport to protect privacy through such design approaches, generally operationalizing privacy as the restriction of some information from some parties in the system (Camenisch et al., 2006; Barth et al., 2007; Balasch et al., 2010). Some scholars have argued that privacy-by-design leads toward a narrow view of what privacy is and away from issues of power and control of information (Dwork and Mulligan, 2013). Legal obligations for privacy-by-design may hypothetically limit the number and severity of privacy incidents in practical systems (Rubinstein and Good, 2013). Empirical studies have demonstrated that attitudes toward privacy and compliance with data protection law vary drastically by sector and by country (Bamberger and Mulligan, 2010, 2015).

Of course, design means little if the ultimate implementation does not comport with the design. Translating values-sensitive design to values-embodying implementation remains a key open challenge, as does assuring a relationship between specification and implementation. Closing this specification-implementation gap is a core competency of the traditional computer science field of software verification (Boehm, 1984; Appel, 2011). A related problem is that of validation, ensuring that a given specification captures the intended value. The discipline of requirements engineering has developed formal and semi-formal approaches to the latter problem (Breaux et al., 2006b; Gordon and Breaux, 2013).

Capturing principles in the design of systems that include software and computers is not an idea that originated with values. Indeed, some of the earliest large computing systems derived their designs from core principles, such as the Internet’s “end-to-end” principle (Gillespie, 2006). This focus on design principles even under an imagined “value free” ideal remains to this day in the design of Internet platforms such as ISPs, social media and forum websites, and hosting providers (Gillespie, 2010).

Traceability demands that the design of systems be visible to affected stakeholders. Another approach to legible design, beyond disclosure of design desiderata, is the use of transparent and inclusive design processes, a topic which has received much attention in the technical literature recently, but which has a longer history in political science (Rosner and Ames, 2014; Wolf et al., 2018; Young et al., 2019b; Young et al., 2019a; Lee et al., 2019b; Lee et al., 2019a; Lee et al., 2020). Such community-driven design can lead to better overall decision-making that affected stakeholders also find more acceptable, even given tradeoffs (Chouldechova et al., 2018). Ethnographic work has found substantially similar attitudes toward decision-making which is fully embodied in a system across disparate domains (Christin, 2017). Survey-driven human factors work has similarly discovered that participatory design can lead to more acceptance of system-driven outcomes (Binns et al., 2018). Design toolkits which combine participation in design with explicit consideration of equity and also drive toward auditability provide concrete tools for advancing the state of play in values-oriented system development, realizing the goal of values-driven design in practical system implementation and fielding (Katell et al., 2019). Research questions remain, however, around the efficacy of such tools in actually preventing harms such as discrimination or the amassing of technologically centralized and entrenched power. One approach under active scrutiny is the idea of designing systems for contestability, both the ability to appeal incorrect decisions and the ability to understand the mechanism of decisions well enough to determine when decision guidance should be disregarded or overridden (Mulligan et al., 2019a).

Finally, the design of systems must take into account the role of humans within those systems. Systems include not only technical components and rules-driven automation (Desai and Kroll, 2018), but also decisions by human decision-makers who can exercise discretion. Assuring that discretion within systems is exercised transparently and at appropriate times requires careful attention to design, including requirements on when humans create records or review existing records for accuracy or policy compliance (Ellison, 2007). Humans may at times prefer decisions which are driven by discretion or negotiation rather than purely algorithmic decisions (Lee and Baykal, 2017). In general, automation increases the capabilities of humans by taking over certain tasks and freeing human efforts for strategic optimization and additional productive work. But in doing so, automation also robs involved humans of situational awareness and expertise in the detailed operation of the system. Thus, humans coupled to automation are both more critical to the outcomes driven by the system and less able to control those outcomes (Bainbridge, 1983).

4.4. Structured Logs

As traceability requires recordkeeping both at the development stage and the operational stage, we must consider the classic tradeoff in computer science between direct policy enforcement via the design of and compliance with specifications and the detection of policy violations via recordkeeping and review (Breaux et al., 2006a; Jagadeesan et al., 2009; Pearson and Charlesworth, 2009; Küsters et al., 2010; Datta, 2014). Several authors have proposed tools for verified computation, which provides a proof of its execution that can be reviewed later to ensure particular computations took place as claimed (Ben-Sasson et al., 2012, 2013; Braun et al., 2013; Vu et al., 2013; Ben-Sasson et al., 2014, 2015). Others have also proposed extending such technology to create structured logs in a commit-and-prove style to provide explicit technological demonstration to a skeptical stakeholder or oversight entity of the values-appropriateness of a computation or its procedural fairness and regularity (Kroll, 2015; Kroll et al., 2017). Structured logging with disclosable side-conditions is a well-studied problem in computing, with associated properties ranging from simple integrity checking (Merkle, 1987; Crosby and Wallach, 2009) to full real-world systems for proving the nonexistence of security-critical digital objects such as the certificates used for Internet security (Laurie et al., 2013). A related line of work is the vast field of tracing in operating systems and distributed systems (Mace et al., 2015).
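
The simplest of the integrity-checking constructions cited here is a hash chain over an append-only log: each entry’s digest covers its predecessor, so truncating or altering earlier entries is detectable by anyone holding the latest digest. The sketch below shows only that minimal construction; Merkle trees, transparency logs, and commit-and-prove systems build richer properties on the same idea.

```python
# A minimal hash-chained, append-only log: each entry's digest covers the
# previous digest, so removing or altering earlier entries changes every later
# digest and is detectable by anyone holding the most recent one. Real systems
# (Merkle trees, transparency logs) extend this idea with efficient proofs.
import hashlib
import json

class ChainedLog:
    def __init__(self):
        self.entries = []
        self.head = "0" * 64  # digest of the (empty) chain so far

    def append(self, event: dict) -> str:
        body = json.dumps(event, sort_keys=True)
        self.head = hashlib.sha256((self.head + body).encode()).hexdigest()
        self.entries.append({"event": event, "digest": self.head})
        return self.head

    def verify(self) -> bool:
        digest = "0" * 64
        for entry in self.entries:
            body = json.dumps(entry["event"], sort_keys=True)
            digest = hashlib.sha256((digest + body).encode()).hexdigest()
            if digest != entry["digest"]:
                return False
        return True

# Example with hypothetical operational events:
log = ChainedLog()
log.append({"decision_id": 1, "model_version": "2.3.1"})
log.append({"decision_id": 2, "model_version": "2.3.1"})
assert log.verify()
```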

We observed above that traceability is often conceptualized as a transparency principle, but it is more akin to a principle for enabling accountability. Many authors have called explicitly for the use of structured recordkeeping tools to enhance accountability and have built the conceptual framework tying such recordkeeping to improved accountability (Nissenbaum, 1996; Editorial, 2016; Wieringa, 2020).

Empirical studies of trust in data science projects have shown that the sort of recordkeeping called for under traceability requirements may enhance the credibility of data science work products both within and beyond the teams creating them (Passi and Jackson, 2018). Along with traditional software engineering and project management discipline (lifecycles, versioning tools), tools for reproducibility and recordkeeping comprise a set of currently available techniques with the capacity to improve traceability in real-world applications but which are currently underused. Wider adoption of these techniques could improve not only traceability, but thereby the broader governance of computing systems generally.

4.5. Structured Transparency Disclosures

The sort of developmental and operational recordkeeping imagined in the prior section is often shorthanded into demands for better system documentation, often in a structured format such as a checklist (Gawande, 2009; Gebru et al., 2018; Mitchell et al., 2019; Arnold et al., 2019; Bender and Friedman, 2018; Holland et al., 2018; Partnership on AI, 2019). Although documentation of computing systems does support traceability, there is little research establishing its effectiveness at communicating within organizations or at actually mitigating harms. Further, there is often a substantial practical gap between the state of the system-as-documented and the state of the system-as-realized. It is important to remember that a healthy governance process generates documentation, but only as a side effect; documentation does not, on its own, engender complete or useful governance of systems.
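One way to narrow that gap slightly is to keep the documentation itself machine-readable, so that its presence and completeness can be checked by the same pipelines that build and ship the system. The sketch below is loosely inspired by documentation proposals such as model cards and datasheets, but its schema and all of its values are hypothetical, not any published standard.

```python
import json

# Illustrative only: field names loosely echo documentation proposals such as model
# cards and datasheets, but this schema and these values are entirely hypothetical.
model_card = {
    "model": {"name": "loan-risk-classifier", "version": "1.2.0"},
    "intended_use": "Prioritize applications for human review; not for automated denial.",
    "training_data": {"source": "applications 2015-2019", "known_gaps": ["thin-file applicants"]},
    "evaluation": {"metric": "AUC", "overall": 0.81, "by_group": {"group_a": 0.83, "group_b": 0.74}},
    "limitations": ["Performance degrades on applicants outside the training distribution."],
    "maintainer": "risk-modeling-team@example.org",
}

# A machine-readable card lets a release pipeline refuse to ship a model version
# whose documentation is missing required sections.
REQUIRED_SECTIONS = {"intended_use", "evaluation", "limitations", "maintainer"}
missing = REQUIRED_SECTIONS - model_card.keys()
assert not missing, f"model card missing sections: {missing}"
print(json.dumps(model_card, indent=2))
```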

This gap between the system-as-documented and the system-as-realized is highlighted in the operationalization of data protection law. To maintain “adequacy” and enable legal transfers of data between the European Union, where the General Data Protection Regulation (GDPR) applies to all data processing, and the United States, which has a considerably more flexible legal regime for data protection based on sector-specific regulation, the Privacy Shield framework was set up to replace a prior Safe Harbor agreement (Bruening and Kroll, 2019). However, both structures have since been found insufficient to protect fundamental rights guaranteed under EU law. Despite this, analysis done while these frameworks were in force examined the transparency and governance rights in the US sector-specific approach (e.g., in laws like the Fair Credit Reporting Act and the Equal Credit Opportunity Act) and found them to be as powerful as, if not more powerful than, the analogous rights within the GDPR (Bodea et al., 2018a, b).

5. Gaps and Needs

Although many technologies support traceability, and although many organizations aspire to embody this principle in practice or even claim to achieve it at a sufficient level, substantial gaps remain at the basic technical level. This is to say nothing of gaps in organizational governance or in the remainder of the sociotechnical control structure: no technical control can be more capable than the implementing organization's willingness to act on it (Vaughan, 1996; Leveson, 2016).

5.0.1. Adoption

As noted in several places in Section 4, there exist tools and technologies that could be deployed immediately in service of better traceability, and yet they go unused. Understanding why this is so will be critical to advancing the principle. And because traceability is straightforwardly recognizable and potentially more amenable to assessment and measurement than abstract goals such as fairness or equitability, understanding the barriers to operationalizing it bears on the feasibility and success of the entire values-in-design project.

Of specific note is the lack of methodologies and standards around the use of tools for the reproducibility of data science products or machine learning models. Substantial progress has been made at integrating such tools into common modeling frameworks, yet even where the capability exists to checkpoint models or to set the seeds of pseudorandom generators, it is rarely used in practice, and the best guidance takes the form of folkloric knowledge scattered across blogs, question-answering sites, and forum posts. This is despite the fact that frameworks could be designed to afford reproducibility by default, or at least as an easily and straightforwardly achievable property of implementations. Tools like data science notebooks instead lean toward interactive environments in service of the “democratization” of powerful tools, but in ways that make it hard to establish what, precisely, the resulting insights actually mean (Jacobs and Wallach, 2021). And even rectifying this dismal situation would not address the problem of framework dependencies and the potential for numerical instability, bugs in core algorithms, or differences in behavior sufficient to change the output of models in practice (Selsam et al., 2017).
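To make the point concrete, the following is a minimal sketch of the kind of seed discipline frameworks could make the default. It covers only the Python standard library and NumPy (assumed available in a typical data science environment); deep learning frameworks maintain their own generators and may require further settings for deterministic hardware kernels.

```python
import os
import random

import numpy as np  # assumed available in a typical data science environment

def set_global_seeds(seed: int) -> None:
    """Seed common sources of nondeterminism in a basic Python data science stack.

    Note: this covers only the standard library and NumPy. Deep learning frameworks
    maintain separate generators (e.g., torch.manual_seed, tf.random.set_seed) and
    may need additional flags to make GPU kernels deterministic.
    """
    os.environ["PYTHONHASHSEED"] = str(seed)  # affects hash randomization in child processes
    random.seed(seed)
    np.random.seed(seed)

set_global_seeds(2021)
sample_a = np.random.normal(size=3)
set_global_seeds(2021)
sample_b = np.random.normal(size=3)
assert (sample_a == sample_b).all()  # the draw is repeatable given the recorded seed
```

Recording the seed (for example, in a run manifest like the one sketched earlier) is what turns this from a convenience into a traceability measure: a later reviewer can rerun the analysis and expect the same result.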

5.0.2. Human Factors and Understanding

Traceability explicitly claims to serve better understanding of the decision-making processes at work in an automated system. Yet while there is a massive literature on machine learning explainability (Halpern and Pearl, 2005a; Doshi-Velez et al., 2017; Doshi-Velez and Kim, 2017; Edwards and Veale, 2017b; Guidotti et al., 2018; Molnar, 2018), there is comparatively little work on the human factors of how such explanation methods serve understanding (Abdul et al., 2018; Rader et al., 2018), and little that explicitly relates the value of generated explanations to a theory or philosophy of what explanations should achieve (Miller, 2019; Selbst and Powles, 2017; Edwards and Veale, 2017b, a; Selbst and Barocas, 2018; Selbst et al., 2019; Kalluri, 2020). For example, in one framework, Miller argues that good explanations must be causal, contrastive, selective, and social, and that they act both as a product conveying information and as a process of inquiry between the explainer and the explainee (Miller, 2019). Yet few techniques from the machine learning literature demonstrably have more than one of these properties, and precious little work examines such systems in situ to determine whether they improve outcomes (Lou et al., 2012; Sendak et al., 2020).

5.0.3. Test and Evaluation, Standards

In a similar vein, although much work considers specific robustness and generalization properties, or responses to bias or adversarially generated input, test and evaluation methodologies for data-driven automation remain nascent, and the test and evaluation of software still lags the capabilities established for physical systems despite decades of research. Investigating how to assess algorithmic systems generally leads to more questions than answers.
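One modest ingredient such methodologies might include is an invariance (or metamorphic) test: a check that a model's output does not change under a transformation that should be irrelevant to the task. The sketch below uses a stand-in scoring function; score_applicant and its inputs are hypothetical placeholders for a deployed model interface, not any particular system.

```python
# A minimal invariance check in the spirit of metamorphic testing: the output should
# not change under a transformation that is irrelevant to the task being performed.

def score_applicant(applicant: dict) -> float:
    # Hypothetical stand-in: a real test would call the deployed model here.
    return 0.01 * applicant["income_thousands"] - 0.05 * applicant["missed_payments"]

def add_irrelevant_field(applicant: dict) -> dict:
    # A transformation the model should ignore entirely.
    perturbed = dict(applicant)
    perturbed["free_text_note"] = "applicant prefers email contact"
    return perturbed

def test_irrelevant_field_invariance():
    applicant = {"income_thousands": 55, "missed_payments": 1}
    assert score_applicant(applicant) == score_applicant(add_irrelevant_field(applicant))

test_irrelevant_field_invariance()
```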

At present, there is no generally accepted standard for establishing what evidence an automated decision-making system should produce so that its outputs can be considered traceable. This stands in contrast to other complex systems, such as corporate finance and risk management (Division of Banking Supervision and Regulation, 2011), nuclear safety (Rees, 2009), or aviation safety (Huang, 2009). And while standards are only part of a governance approach (Leveson, 2016), they can help build the organizational structures necessary to anticipate unintended behaviors and to assess and investigate them when they occur. Updates to accounting or audit standards, in both the public and private sectors, would make the assessment of traceability substantially more straightforward. Further, investigating the effectiveness of such standards in furthering the goals of traceability (as measured by, say, stakeholder perceptions of a system's performance (Veale, 2017; Binns et al., 2018)) would provide useful benchmarks for those charged with the test and evaluation of practical systems. Understanding how to operationalize this principle, which seems more amenable to testing and assessment than other, contested principles dealing with fairness, bias, and equity, would demonstrate a path to operationalizing ethical principles in the development of practical systems, and is thus of paramount importance.

5.1. Oversight, Governance, and Responsibility

Closing the gaps between the lofty goals of principles documents and the behavior of algorithmic systems in practice is the key research challenge for the values-oriented design movement and research community (Metcalf et al., 2019; Mittelstadt, 2019a, b; Fjeld et al., 2020). It is particularly imperative that the size and nature of any such gap be legible both to affected stakeholders and to competent oversight authorities in order to support effective governance and accountability (Mulgan, 2000, 2003; Kroll et al., 2017; Kroll, 2018). Operationalizing traceability provides an excellent opportunity for progress in this critical research domain. Our investigation opens related research questions, such as why available tools go unused even when their use would deliver clear benefit to the functioning of development teams, the functioning of governance structures, and the embodiment of human values.

6. Conclusion

Traceability is a widely demanded principle, appearing in a variety of AI policy and AI ethics principles documents adopted around the world, notably in several important pieces of government-issued guidance, in coordinating documents promulgated by transnational organizations such as the United Nations and the OECD, and in the guidance of the European Union's High-Level Expert Group on building trustworthy and responsible AI. Although it is often conceptualized in these documents as a principle that operationalizes transparency as a way to achieve governance, in reality that governance is achieved through enhanced accountability and the improved capability of both affected parties and competent authorities to notice when systems are going wrong and to rectify the issue. Traceability serves to demonstrate when and why transparency is valuable, connecting the desire for disclosures about how a system functions to the consumption of those disclosures for a defined purpose.

Traceability is an excellent principle for driving system assessment, as it is seemingly more concretely realizable and recognizable than other goals like equitability or nondiscrimination. Like transparency, it is an instrumental principle that serves ends beyond itself (including other ethical principles such as fairness, accountability, or governability). While it is possible to treat such instruments as goals unto themselves, their primary role in the operationalization of policies for ethical computing systems is to enable more abstract assessments.

That said, the concreteness of traceability does not equate to ease of realization in practical systems. We conclude via our analysis of operationalizing this principle that substantial gaps exist between the requirements we have identified and the tools presently available to meet those requirements. A major gap is simply adoption: tools which could improve traceability remain unused despite decades of development and the existence of mature realizations. Understanding why these tools go un-adopted remains an important open research question for the governance of computing systems. Other gaps are more serious and require new research to close: establishing an appropriate human factors understanding for computing systems that work in tandem with humans challenges traceability as much as it challenges any other question of ethical computing; the lack of accepted standards or even widely used best practices for the assessment of computing system ethics remains a barrier; and a lack of practical governance, oversight, and review limits how well robust traceability can support meaningful assignments of responsibility for the behaviors of computing systems or assessments of those systems’ fidelity to normative governance goals.

Traceability serves as a foundation for other goals: aligning the behavior of automated systems with human values, and governing those systems so that this alignment is plain to anyone potentially affected or harmed by their operation or the social outcomes they drive. Only by making systems traceable can we hold them accountable and ensure that they comport with applicable, contextually appropriate social, political, and legal norms.

Acknowledgements.
The author wishes to thank the anonymous reviewers for their helpful comments and insights, and is grateful for feedback from and discussions with colleagues including CDR Ed Jatho, USN; Abigail Jacobs; and Andrew Smart. This work was sponsored by a grant from the Naval Postgraduate School’s Research Initiation Program for new faculty. Views expressed are those of the author and not of the Naval Postgraduate School, the Department of the Navy, the Department of Defense, or the United States Government.

References

  • Abdul et al. (2018) Ashraf Abdul, Jo Vermeulen, Danding Wang, Brian Y. Lim, and Mohan Kankanhalli. 2018. Trends and Trajectories for Explainable, Accountable and Intelligible Systems. Proceedings of the International Conference on Human Factors in Computer Systems (CHI) (2018).
  • Adida (2008) Ben Adida. 2008. Helios: Web-based Open-Audit Voting.. In USENIX Security Symposium, Vol. 17. 335–348.
  • Ajunwa (2021) Ifeoma Ajunwa. 2021. The Auditing Imperative for Automated Hiring. Harvard Journal of Law & Technology 34 (2021).
  • Albarghouthi et al. (2016) Aws Albarghouthi, Loris D’Antoni, Samuel Drews, and Aditya Nori. 2016. Fairness as a program property. arXiv preprint arXiv:1610.06067 (2016).
  • Albarghouthi and Vinitsky (2019) Aws Albarghouthi and Samuel Vinitsky. 2019. Fairness-aware programming. In Conference on Fairness, Accountability, and Transparency. ACM, 211–219.
  • Amit Elazari Bar On (2018) Amit Elazari Bar On. 2018. We Need Bug Bounties for Bad Algorithms. https://www.vice.com/en/article/8xkyj3/we-need-bug-bounties-for-bad-algorithms. Vice News (3 May 2018).
  • Appel (2011) Andrew W Appel. 2011. Verified software toolchain. In European Symposium on Programming. Springer, 1–17.
  • Argyraki et al. (2007) K Argyraki, P Maniatis, O Irzak, and S Shenker. 2007. An accountability interface for the Internet. In Proc. 14th ICNP.
  • Argyris (1977) Chris Argyris. 1977. Double loop learning in organizations. Harvard business review 55, 5 (1977), 115–125.
  • Arnold et al. (2019) Matthew Arnold, Rachel KE Bellamy, Michael Hind, Stephanie Houde, Sameep Mehta, A Mojsilović, Ravi Nair, K Natesan Ramamurthy, Alexandra Olteanu, David Piorkowski, et al. 2019. FactSheets: Increasing trust in AI services through supplier’s declarations of conformity. IBM Journal of Research and Development 63, 4/5 (2019), 6–1.
  • Backes et al. (2009) Michael Backes, Peter Druschel, Andreas Haeberlen, and Dominique Unruh. 2009. CSAR: A Practical and Provable Technique to Make Randomized Systems Accountable. Proc. NDSS (2009).
  • Bainbridge (1983) Lisanne Bainbridge. 1983. Ironies of automation. In Analysis, design and evaluation of man–machine systems. Elsevier, 129–135.
  • Balasch et al. (2010) Josep Balasch, Alfredo Rial, Carmela Troncoso, Bart Preneel, Ingrid Verbauwhede, and Christophe Geuens. 2010. PrETP: Privacy-Preserving Electronic Toll Pricing.. In USENIX Security Symposium. 63–78.
  • Bamberger and Mulligan (2010) Kenneth A Bamberger and Deirdre K Mulligan. 2010. Privacy on the Books and on the Ground. Stan. L. Rev. 63 (2010), 247.
  • Bamberger and Mulligan (2015) Kenneth A Bamberger and Deirdre K Mulligan. 2015. Privacy on the ground: driving corporate behavior in the United States and Europe. MIT Press.
  • Barach and Small (2000) Paul Barach and Stephen D Small. 2000. Reporting and preventing medical mishaps: lessons from non-medical near miss reporting systems. Bmj 320, 7237 (2000), 759–763.
  • Barth et al. (2007) Adam Barth, John C Mitchell, Anupam Datta, and Sharada Sundaram. 2007. Privacy and utility in business processes. In Computer Security Foundations Symposium, 2007. CSF’07. 20th IEEE. IEEE, 279–294.
  • Bashir et al. (2016) Muhammad Ahmad Bashir, Sajjad Arshad, and Christo Wilson. 2016. Recommended For You: A First Look at Content Recommendation Networks. In Proceedings of the 2016 Internet Measurement Conference. ACM.
  • Basil and Turner (1975) Victor R Basil and Albert J Turner. 1975. Iterative enhancement: A practical technique for software development. IEEE Transactions on Software Engineering 4 (1975), 390–396.
  • Bass et al. (2015) Len Bass, Ingo Weber, and Liming Zhu. 2015. DevOps: A software architect’s perspective. Addison-Wesley Professional.
  • Beck (2003) Kent Beck. 2003. Test-driven development: by example. Addison-Wesley Prof.
  • Beck et al. ([n.d.]) Kent Beck, Mike Beedle, Arie Van Bennekum, Alistair Cockburn, Ward Cunningham, et al. [n.d.]. Manifesto for agile software development, 2001. ([n. d.]).
  • Bellovin (2006) Steven M Bellovin. 2006. On the brittleness of software and the infeasibility of security metrics. IEEE Annals of the History of Computing 4, 04 (2006), 96–96.
  • Ben-Sasson et al. (2012) Eli Ben-Sasson, Alessandro Chiesa, Daniel Genkin, and Eran Tromer. 2012. On the Concrete-Efficiency Threshold of Probabilistically-Checkable Proofs. Technical Report. Electronic Colloquium on Computational Complexity. http://eccc.hpi-web.de/report/2012/045/.
  • Ben-Sasson et al. (2013) Eli Ben-Sasson, Alessandro Chiesa, Daniel Genkin, Eran Tromer, and Madars Virza. 2013. SNARKs for C: Verifying program executions succinctly and in zero knowledge. CRYPTO (2013).
  • Ben-Sasson et al. (2015) Eli Ben-Sasson, Alessandro Chiesa, Matthew Green, Eran Tromer, and Madars Virza. 2015. Secure sampling of public parameters for succinct zero knowledge proofs. In IEEE Symposium on Security and Privacy.
  • Ben-Sasson et al. (2014) Eli Ben-Sasson, Alessandro Chiesa, Eran Tromer, and Madars Virza. 2014. Succinct non-interactive zero knowledge for a von Neumann architecture. In USENIX Security. 781–796.
  • Bender and Friedman (2018) Emily M Bender and Batya Friedman. 2018. Data statements for natural language processing: Toward mitigating system bias and enabling better science. Transactions of the Association for Computational Linguistics 6 (2018), 587–604.
  • Beutel et al. (2019) Alex Beutel, Jilin Chen, Tulsee Doshi, Hai Qian, Allison Woodruff, Christine Luu, Pierre Kreitmann, Jonathan Bischof, and Ed H Chi. 2019. Putting Fairness Principles into Practice: Challenges, Metrics, and Improvements. arXiv preprint arXiv:1901.04562 (2019).
  • Binns et al. (2018) Reuben Binns, Max Van Kleek, Michael Veale, Ulrik Lyngs, Jun Zhao, and Nigel Shadbolt. 2018. ’It’s Reducing a Human Being to a Percentage’: Perceptions of Justice in Algorithmic Decisions. In Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems. ACM, 377.
  • Bishop (1995) Matt Bishop. 1995. A standard audit trail format. In Proceedings of the 18th National Information Systems Security Conference. 136–145.
  • Blocki et al. (2013) Jeremiah Blocki, Nicolas Christin, Anupam Datta, Ariel D Procaccia, and Arunesh Sinha. 2013. Audit games. In Proceedings of the Twenty-Third international joint conference on Artificial Intelligence. AAAI Press, 41–47.
  • Bodea et al. (2018a) Gabriela Bodea, Kristina Karanikolova, Deirdre K. Mulligan, and Jael Makagon. 2018a. Automated decision-making on the basis of personal data that has been transferred from the EU to companies certified under the EU-U.S. Privacy Shield: Fact-finding and assessment of safeguards provided by U.S. law. Technical Report. European Commission. https://ec.europa.eu/info/sites/info/files/independent_study_on_automated_decision-making.pdf
  • Bodea et al. (2018b) Gabriela Bodea, Kristina Karanikolova, Deirdre K. Mulligan, and Jael Makagon. 2018b. Automated decision-making on the basis of personal data that has been transferred from the EU to companies certified under the EU-U.S. Privacy Shield. Fact-finding and assessment of safeguards provided by U.S. law. Independent Report promulgated by the European Commission, Directorate-General for Justice and Consumers. https://ec.europa.eu/info/sites/info/files/independent_study_on_automated_decision-making.pdf
  • Boehm (2002) Barry Boehm. 2002. Get ready for agile methods, with care. Computer 35, 1 (2002), 64–69.
  • Boehm (1984) Barry W. Boehm. 1984. Verifying and validating software requirements and design specifications. IEEE software 1, 1 (1984), 75.
  • Boettiger (2015) Carl Boettiger. 2015. An introduction to Docker for reproducible research. ACM SIGOPS Operating Systems Review 49, 1 (2015), 71–79.
  • Bonchi et al. (2016) Francisco Bonchi, Carlos Castillo, and Sara Hajian. 2016. Algorithmic bias: from discrimination discovery to fairness-aware data mining. Knowledge Discovery and Data Mining, Tutorials Track.
  • Braun et al. (2013) Benjamin Braun, Ariel J Feldman, Zuocheng Ren, Srinath Setty, Andrew J Blumberg, and Michael Walfish. 2013. Verifying computations with state. In Twenty-Fourth ACM Symposium on Operating Systems Principles. ACM, 341–357.
  • Breaux et al. (2006a) Travis D Breaux, Annie I Antón, C-M Karat, and John Karat. 2006a. Enforceability vs. accountability in electronic policies. In Policies for Distributed Systems and Networks, 2006. Policy 2006. Seventh IEEE International Workshop on. IEEE, 4–pp.
  • Breaux et al. (2009) Travis D Breaux, Annie I Antón, and Eugene H Spafford. 2009. A distributed requirements management framework for legal compliance and accountability. computers & security 28, 1-2 (2009), 8–17.
  • Breaux et al. (2006b) Travis D Breaux, Matthew W Vail, and Annie I Anton. 2006b. Towards regulatory compliance: Extracting rights and obligations to align requirements with regulations. In 14th IEEE International Requirements Engineering Conference (RE’06). IEEE, 49–58.
  • Brooks Jr (1995) Frederick P Brooks Jr. 1995. The mythical man-month: essays on software engineering. Addison-Wesley.
  • Bruening and Kroll (2019) Paula Bruening and Joshua A. Kroll. 2019. Considering Transparency: AI, Data Science, and the GDPR. Computers, Privacy, and Data Protection (2019).
  • Brundage et al. (2020) Miles Brundage, Shahar Avin, Jasmine Wang, Haydn Belfield, Gretchen Krueger, Gillian Hadfield, Heidy Khlaaf, Jingying Yang, Helen Toner, Ruth Fong, et al. 2020. Toward trustworthy AI development: mechanisms for supporting verifiable claims. OpenAI Technical Report, arXiv preprint arXiv:2004.07213 (2020).
  • Buneman et al. (2001) Peter Buneman, Sanjeev Khanna, and Tan Wang-Chiew. 2001. Why and where: A characterization of data provenance. In Database Theory—ICDT 2001. Springer.
  • Buolamwini and Gebru (2018) Joy Buolamwini and Timnit Gebru. 2018. Gender shades: Intersectional accuracy disparities in commercial gender classification. In Conference on Fairness, Accountability and Transparency. 77–91.
  • Burrell (2016) Jenna Burrell. 2016. How the machine ‘thinks’: Understanding opacity in machine learning algorithms. Big Data & Society 3, 1 (2016).
  • Camenisch et al. (2006) Jan Camenisch, Susan Hohenberger, and Anna Lysyanskaya. 2006. Balancing accountability and privacy using e-cash. In Security and Cryptography for Networks. Springer, 141–155.
  • Carruthers and Espeland (1991) Bruce G Carruthers and Wendy Nelson Espeland. 1991. Accounting for rationality: Double-entry bookkeeping and the rhetoric of economic rationality. American journal of sociology 97, 1 (1991), 31–69.
  • Carter and Sholler (2016) Daniel Carter and Dan Sholler. 2016. Data science on the ground: Hype, criticism, and everyday work. Journal of the Association for Information Science and Technology 67, 10 (2016), 2309–2319.
  • Cavoukian (2011) Ann Cavoukian. 2011. Privacy by design in law, policy and practice. A white paper for regulators, decision-makers and policy-makers (2011).
  • Cavoukian et al. (2009) Ann Cavoukian et al. 2009. Privacy by design: The 7 foundational principles. Information and Privacy Commissioner of Ontario, Canada 5 (2009).
  • Chen et al. (2018) Le Chen, Ruijun Ma, Anikó Hannák, and Christo Wilson. 2018. Investigating the Impact of Gender on Rank in Resume Search Engines. In Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems. ACM.
  • Chen and Wilson (2017) Le Chen and Christo Wilson. 2017. Observing algorithmic marketplaces in-the-wild. ACM SIGecom Exchanges 15, 2 (2017), 34–39.
  • Chin and Ozsoyoglu (1982) Francis Y Chin and Gultekin Ozsoyoglu. 1982. Auditing and inference control in statistical databases. IEEE Transactions on Software Engineering (1982), 574–582.
  • Chouldechova (2017) Alexandra Chouldechova. 2017. Fair prediction with disparate impact: A study of bias in recidivism prediction instruments. Big data 5, 2 (2017), 153–163.
  • Chouldechova et al. (2018) Alexandra Chouldechova, Diana Benavides-Prado, Oleksandr Fialko, and Rhema Vaithianathan. 2018. A case study of algorithm-assisted decision making in child maltreatment hotline screening decisions. In Conference on Fairness, Accountability and Transparency. 134–148.
  • Christin (2017) Angèle Christin. 2017. Algorithms in practice: Comparing web journalism and criminal justice. Big Data & Society 4, 2 (2017).
  • Citron (2007) Danielle Citron. 2007. Technological due process. Washington University Law Review 85 (2007), 1249–1313.
  • Citron (2008) Danielle Keats Citron. 2008. Open code governance. In University of Chicago Legal Forum. 355–387.
  • Citron and Calo (2020) Danielle K Citron and Ryan Calo. 2020. The Automated Administrative State: A Crisis of Legitimacy. Emory Law Journal (2020).
  • Clark (2018) Robert Clark. 2018. Compliance != Security (Except When It Might Be). In Enigma 2018.
  • Colbert and Bowen (1996) Janet L Colbert and Paul Bowen. 1996. A Comparison of Internal Control: COBIT, SAC, COSO and SAS. IS Audit and Control Journal (1996), 26–35.
  • Computing Community Consortium (2015a) Computing Community Consortium. 2015a. Privacy by Design—Engineering Privacy: Workshop 3 Report. https://cra.org/ccc/wp-content/uploads/sites/2/2015/12/PbD3-Workshop-Report-v2.pdf.
  • Computing Community Consortium (2015b) Computing Community Consortium. 2015b. Privacy by Design—Privacy Enabling Design: Workshop 2 Report. https://cra.org/ccc/wp-content/uploads/sites/2/2015/05/PbD2-Report-v5.pdf.
  • Computing Community Consortium (2015c) Computing Community Consortium. 2015c. Privacy by Design-State of Research and Practice: Workshop 1 Report. https://cra.org/ccc/wp-content/uploads/sites/2/2015/02/PbD-Workshop-1-Report-.pdf.
  • Consultative Committee for Space Data Systems (2011) Consultative Committee for Space Data Systems. 2011. Audit and certification of Trustworthy Digital Repositories, Recommended Practice.
  • Consumer Financial Protection Bureau (2014) Consumer Financial Protection Bureau. 2014. Using Publicly Available Information to Proxy for Unidentified Race and Ethnicity: A Methodology and Assessment. https://www.consumerfinance.gov/data-research/research-reports/using-publicly-available-information-to-proxy-for-unidentified-race-and-ethnicity/.
  • Cook (1998) Richard I Cook. 1998. How complex systems fail. Cognitive Technologies Laboratory, University of Chicago. Chicago IL (1998).
  • Crosby and Wallach (2009) Scott A Crosby and Dan S Wallach. 2009. Efficient Data Structures For Tamper-Evident Logging.. In USENIX Security Symposium. 317–334.
  • Datta (2014) Anupam Datta. 2014. Privacy through Accountability: A Computer Science Perspective. In Distributed Computing and Internet Technology. Springer, 43–49.
  • de Castro et al. (2020) Leo de Castro, Andrew W Lo, Taylor Reynolds, Fransisca Susan, Vinod Vaikuntanathan, Daniel Weitzner, and Nicolas Zhang. 2020. SCRAM: A Platform for Securely Measuring Cyber Risk. Harvard Data Science Review (2020).
  • Defense Innovation Board (2019) Defense Innovation Board. 2019. AI Principles: Recommendations on the Ethical Use of Artificial Intelligence by the Department of Defense: Supporting Document. https://innovation.defense.gov/ai/.
  • Desai and Kroll (2018) Deven Desai and Joshua A. Kroll. 2018. Trust but Verify: A Guide to Algorithms and the Law. Harvard J. of Law and Tech. 31, 1 (2018).
  • Division of Banking Supervision and Regulation (2011) Division of Banking Supervision and Regulation. 2011. SR 11-7: Guidance on Model Risk Management.
  • Doshi-Velez and Kim (2017) Finale Doshi-Velez and Been Kim. 2017. Towards a rigorous science of interpretable machine learning. arXiv preprint arXiv:1702.08608 (2017).
  • Doshi-Velez et al. (2017) Finale Doshi-Velez, Mason Kortz, Ryan Budish, Chris Bavitz, Sam Gershman, David O’Brien, Stuart Schieber, James Waldo, David Weinberger, and Alexandra Wood. 2017. Accountability of AI Under the Law: The Role of Explanation. arXiv preprint arXiv:1711.01134 (2017).
  • Dwork et al. (2006) Cynthia Dwork, Frank McSherry, Kobbi Nissim, and Adam Smith. 2006. Calibrating noise to sensitivity in private data analysis. In Theory of cryptography conference. Springer, 265–284.
  • Dwork and Mulligan (2013) Cynthia Dwork and Deirdre K Mulligan. 2013. It’s not privacy, and it’s not fair. Stanford Law Review Online 66 (2013), 35.
  • Editorial (2016) Editorial. 2016. More Accountability for Big Data Algorithms. Nature 537 (2016). Issue 7621.
  • Edwards and Veale (2017a) Lilian Edwards and Michael Veale. 2017a. Enslaving the Algorithm: From a “Right to an Explanation” to a “Right to Better Decisions”? IEEE Security and Privacy (2017).
  • Edwards and Veale (2017b) Lilian Edwards and Michael Veale. 2017b. Slave to the Algorithm? Why a “Right to Explanation” is Probably Not the Remedy You are Looking for. Duke Technology Law Journal 16, 1 (2017), 18–84.
  • Elish and Hwang (2015) MC Elish and Tim Hwang. 2015. Praise the Machine! Punish the Human! The contradictory history of accountability in automated aviation. Data & Society Report (2015).
  • Elish (2019) Madeleine Clare Elish. 2019. Moral crumple zones: Cautionary tales in human-robot interaction. Engaging Science, Technology, and Society 5 (2019), 40–60.
  • Ellison (2007) Carl Ellison. 2007. Ceremony Design and Analysis. IACR eprint archive 399 (2007). http://eprint.iacr.org/2007/399.pdf.
  • Ellison et al. (2010) Robert J Ellison, John B Goodenough, Charles B Weinstock, and Carol Woody. 2010. Evaluating and mitigating software supply chain security risks. Technical Report. Carnegie-Mellon University Software Engineering Institute.
  • Espeland and Vannebo (2007) Wendy Nelson Espeland and Berit Irene Vannebo. 2007. Accountability, quantification, and law. Annu. Rev. Law Soc. Sci. 3 (2007), 21–43.
  • European Commission Independent High-Level Expert Group on Artificial Intelligence (2020) European Commission Independent High-Level Expert Group on Artificial Intelligence. 2020. Ethics Guidelines for Trustworthy AI. https://ec.europa.eu/digital-single-market/en/news/ethics-guidelines-trustworthy-ai.
  • Ewusi-Mensah (2003) Kweku Ewusi-Mensah. 2003. Software development failures. Mit Press.
  • Executive Office of the President of the United States (2019) Executive Office of the President of the United States. 2019. Executive Order 13859: Maintaining American Leadership in Artificial Intelligence. https://www.govinfo.gov/app/details/DCPD-201900073.
  • Executive Office of the President of the United States (2020) Executive Office of the President of the United States. 2020. Executive Order 13960: Promoting the Use of Trustworthy Artificial Intelligence in the Federal Government. https://www.govinfo.gov/app/details/DCPD-202000870.
  • Feigenbaum et al. (2012) Joan Feigenbaum, Aaron D Jaggard, Rebecca N Wright, and Hongda Xiao. 2012. Systematizing “Accountability” in Computer Science (Version of Feb. 17, 2012). Technical Report YALEU/DCS/TR-1452. Yale University, New Haven, CT.
  • Fjeld et al. (2020) Jessica Fjeld, Nele Achten, Hannah Hilligoss, Adam Nagy, and Madhulika Srikumar. 2020. Principled artificial intelligence: Mapping consensus in ethical and rights-based approaches to principles for AI. Berkman Klein Center Research Publication 2020-1 (2020).
  • Flanagan et al. (2005) Mary Flanagan, Daniel C. Howe, and Helen Nissenbaum. 2005. Values at Play: Design Tradeoffs in Socially-oriented Game Design. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI ’05). ACM, New York, NY, USA, 751–760.
  • Floridi (2019) Luciano Floridi. 2019. Establishing the rules for building trustworthy AI. Nature Machine Intelligence 1, 6 (2019), 261–262.
  • Friedman (1996) Batya Friedman. 1996. Value-sensitive design. interactions 3, 6 (1996), 16–23.
  • Friedman et al. (2008) Batya Friedman, Peter H. Kahn, and Alan Borning. 2008. Value Sensitive Design and Information Systems. In The Handbook of Information and Computer Ethics, Kenneth Einar Himma and Herman T. Tavani (Eds.). John Wiley & Sons, Inc., Chapter 4, 69–101.
  • Friedman et al. (1999) Batya Friedman, John C Thomas, Jonathan Grudin, Clifford Nass, Helen Nissenbaum, Mark Schlager, and Ben Shneiderman. 1999. Trust me, I’m accountable: trust and accountability online. In CHI’99 Extended Abstracts on Human Factors in Computing Systems. ACM, 79–80.
  • Gawande (2009) Atul Gawande. 2009. The Checklist Manifesto: How to Get Things Right. Metropolitan Books.
  • Gebru et al. (2018) Timnit Gebru, Jamie Morgenstern, Briana Vecchione, Jennifer Wortman Vaughan, Hanna Wallach, Hal Daumé III, and Kate Crawford. 2018. Datasheets for datasets. arXiv preprint arXiv:1803.09010 (2018).
  • Geiger and Halfaker (2017) R. Stuart Geiger and Aaron Halfaker. 2017. Operationalizing Conflict and Cooperation between Automated Software Agents in Wikipedia: A Replication and Expansion of “Even Good Bots Fight”. Proceedings of the ACM on Human-Computer Interaction 1, CSCW (dec 2017), 1–33.
  • Gellman (2017) Robert Gellman. 2017. Fair information practices: A basic history. SSRN, 2415020 (2017).
  • Gelperin and Hetzel (1988) David Gelperin and Bill Hetzel. 1988. The growth of software testing. Commun. ACM 31, 6 (1988), 687–695.
  • Gillespie (2006) Tarleton Gillespie. 2006. Engineering a Principle: ‘End-to-End’in the Design of the Internet. Social Studies of Science 36, 3 (2006), 427–457.
  • Gillespie (2010) Tarleton Gillespie. 2010. The politics of ‘platforms’. New media & society 12, 3 (2010), 347–364.
  • Gordon and Breaux (2013) David G Gordon and Travis D Breaux. 2013. Assessing regulatory change through legal requirements coverage modeling. In 2013 21st IEEE International Requirements Engineering Conference (RE). IEEE, 145–154.
  • Government Accountability Office (2018) Government Accountability Office. 2018. Government Auditing Standards, 2018 Revision.
  • Guidotti et al. (2018) Riccardo Guidotti, Anna Monreale, Franco Turini, Dino Pedreschi, and Fosca Giannotti. 2018. A Survey Of Methods For Explaining Black Box Models. arXiv preprint arXiv:1802.01933 (2018).
  • Gürses et al. (2011) Seda Gürses, Carmela Troncoso, and Claudia Diaz. 2011. Engineering privacy by design. Conference on Computers, Privacy, and Data Protection (2011).
  • Habra et al. (1992) Naji Habra, Baudouin Le Charlier, Abdelaziz Mounji, and Isabelle Mathieu. 1992. ASAX: Software architecture and rule-based language for universal audit trail analysis. In Computer Security—ESORICS 92. Springer, 435–450.
  • Haeberlen (2010) Andreas Haeberlen. 2010. A case for the accountable cloud. ACM SIGOPS Operating Systems Review 44, 2 (2010), 52–57.
  • Haeberlen et al. (2010) Andreas Haeberlen, Paarijaat Aditya, Rodrigo Rodrigues, and Peter Druschel. 2010. Accountable Virtual Machines.. In OSDI. 119–134.
  • Haeberlen et al. (2007) Andreas Haeberlen, Petr Kouznetsov, and Peter Druschel. 2007. PeerReview: Practical accountability for distributed systems. In ACM SIGOPS Operating Systems Review, Vol. 41:6. ACM, 175–188.
  • Hall (2010) Joseph Lorenzo Hall. 2010. Election Auditing Bibliography. https://josephhall.org/papers/auditing_biblio.pdf.
  • Halpern and Pearl (2005a) Joseph Y Halpern and Judea Pearl. 2005a. Causes and explanations: A structural-model approach. Part I: Causes. The British journal for the philosophy of science 56, 4 (2005), 843–887.
  • Halpern and Pearl (2005b) Joseph Y Halpern and Judea Pearl. 2005b. Causes and explanations: A structural-model approach. Part II: Explanations. The British Journal for the Philosophy of Science 56, 4 (2005), 889–911.
  • Hannak et al. (2013) Aniko Hannak, Piotr Sapiezynski, Arash Molavi Kakhki, Balachander Krishnamurthy, David Lazer, Alan Mislove, and Christo Wilson. 2013. Measuring personalization of web search. In Proceedings of the 22nd international conference on World Wide Web. ACM.
  • Hannak et al. ([n.d.]) Aniko Hannak, Gary Soeller, David Lazer, Alan Mislove, and Christo Wilson. [n.d.]. Measuring price discrimination and steering on e-commerce web sites. In Proceedings of the 2014 conference on internet measurement conference. ACM, 305–318.
  • Hedstrom (1997) Margaret Hedstrom. 1997. Digital preservation: a time bomb for digital libraries. Computers and the Humanities 31, 3 (1997), 189.
  • Helman and Liepins (1993) Paul Helman and Gunar Liepins. 1993. Statistical foundations of audit trail analysis for the detection of computer misuse. Software Engineering, IEEE Transactions on 19, 9 (1993), 886–901.
  • Herschel et al. (2017) Melanie Herschel, Ralf Diestelkämper, and Houssem Ben Lahmar. 2017. A survey on provenance: What for? What form? What from? The VLDB Journal 26, 6 (2017), 881–906.
  • Holland et al. (2018) Sarah Holland, Ahmed Hosny, Sarah Newman, Joshua Joseph, and Kasia Chmielinski. 2018. The dataset nutrition label: A framework to drive higher data quality standards. arXiv preprint arXiv:1805.03677 (2018).
  • Huang (2009) Jiefang Huang. 2009. Aviation safety through the rule of law: ICAO’s mechanisms and practices. Kluwer Law International BV.
  • Irani and Silberman (2014) Lilly Irani and M. Six Silberman. 2014. From critical design to critical infrastructure. Interactions 21, 4 (2014), 32–35.
  • ISO (2012) ISO. 2012. Certification of Trustworthy Digital Repositories.
  • Jabbra and Dwivedi (1989) J.G. Jabbra and O.P. Dwivedi (Eds.). 1989. Public Service Accountability: A Comparative Perspective. Kumarian Press.
  • Jackson et al. (2014) Steven J. Jackson, Tarleton Gillespie, and Sandy Payette. 2014. The Policy Knot: Re-integrating Policy, Practice and Design in CSCW Studies of Social Computing. In Proceedings of the 17th ACM Conference on Computer Supported Cooperative Work & Social Computing (CSCW ’14). ACM, New York, NY, USA, 588–602.
  • Jacobs and Wallach (2021) Abigail Z. Jacobs and Hanna Wallach. 2021. Measurement and Fairness. In ACM Conference on Fairness, Accountability, and Transparency.
  • JafariNaimi et al. (2015) Nassim JafariNaimi, Lisa Nathan, and Ian Hargraves. 2015. Values as Hypotheses: Design, Inquiry, and the Service of Values. Design Issues 31, 4 (oct 2015), 91–104.
  • Jagadeesan et al. (2009) Radha Jagadeesan, Alan Jeffrey, Corin Pitcher, and James Riely. 2009. Towards a theory of accountability and audit. In Computer Security–ESORICS 2009. Springer, 152–167.
  • Ka-Ping Yee (2007) Ka-Ping Yee. 2007. Building Reliable Voting Machine Software. Ph.D. Dissertation. University of California. http://zesty.ca/pubs/yee-phd.pdf.
  • Kalluri (2020) Pratyusha Kalluri. 2020. Don’t ask if artificial intelligence is good or fair, ask how it shifts power. Nature 583, 7815 (2020), 169–169.
  • Katell et al. (2019) Michael Katell, Meg Young, Bernease Herman, Dharma Dailey, Aaron Tam, Vivian Guetler, Corinne Binz, Daniella Raz, and PM Krafft. 2019. An Algorithmic Equity Toolkit for Technology Audits by Community Advocates and Activists. arXiv preprint arXiv:1912.02943 (2019).
  • Kearns et al. (2017) Michael Kearns, Seth Neel, Aaron Roth, and Zhiwei Steven Wu. 2017. Preventing Fairness Gerrymandering: Auditing and Learning for Subgroup Fairness. arXiv preprint arXiv:1711.05144 (2017).
  • Kenthapadi et al. (2005) Krishnaram Kenthapadi, Nina Mishra, and Kobbi Nissim. 2005. Simulatable auditing. In Proceedings of the twenty-fourth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems. ACM, 118–127.
  • Kim (2017) Pauline Kim. 2017. Auditing Algorithms for Discrimination. University of Pennsylvania Law Review Online 166, 189 (2017).
  • Kirchner (2017) Lauren Kirchner. 2017. Thousands of Criminal Cases in New York Relied on Disputed DNA Testing Techniques. ProPublica (4 Sept. 2017).
  • Knobel and Bowker (2011) Cory Knobel and Geoffrey C. Bowker. 2011. Values in design. Commun. ACM 54 (2011), 26.
  • Kroll (2015) Joshua A. Kroll. 2015. Accountable Algorithms. Ph.D. Dissertation. Princeton University.
  • Kroll (2018) Joshua A Kroll. 2018. The fallacy of inscrutability. Phil. Trans. R. Soc. A 376, 2133 (2018), 14.
  • Kroll (2020) Joshua A. Kroll. 2020. Accountability in Computer Systems. In The Oxford Handbook of the Ethics of Artificial Intelligence, Markus Dubber, Frank Pasquale, and Sunit Das (Eds.). Oxford University Press, Oxford, UK, 181–196.
  • Kroll et al. (2017) Joshua A. Kroll, Joanna Huey, Solon Barocas, Edward W. Felten, Joel R. Reidenberg, David G. Robinson, and Harlan Yu. 2017. Accountable Algorithms. University of Pennsylvania Law Review (to appear) 165 (2017), 633–705. Issue 3.
  • Künnemann et al. (2019) Robert Künnemann, Ilkan Esiyok, and Michael Backes. 2019. Automated Verification of Accountability in Security Protocols. In 2019 IEEE 32nd Computer Security Foundations Symposium (CSF). IEEE, 397–39716.
  • Küsters et al. (2010) Ralf Küsters, Tomasz Truderung, and Andreas Vogt. 2010. Accountability: definition and relationship to verifiability. In Proc. 17th ACM conf. Computer and Communications Security. ACM, 526–535.
  • Kwong (2017) Katherine Kwong. 2017. The Algorithm says you did it: The use of Black Box Algorithms to analyze complex DNA evidence. Harv. JL & Tech. 31 (2017), 275.
  • Lamb et al. (2020) Chris Lamb, Holger Levson, Mattia Rizzolo, and Vagrant Cascadian. 2020. Reproducible Builds. https://reproducible-builds.org.
  • Laurie et al. (2013) Ben Laurie, Adam Langley, and Emilia Kasper. 2013. Certificate Transparency. Technical Report RFC 6962. Internet Engineering Task Force.
  • Le Dantec et al. (2009) Christopher A. Le Dantec, Erika Shehan Poole, and Susan P. Wyche. 2009. Values as lived experience: Evolving value sensitive design in support of value discovery. In Proceedings of the 27th international conference on Human factors in computing systems - CHI 09. ACM Press, New York, New York, USA, 1141.
  • Lee and Baykal (2017) Min Kyung Lee and Su Baykal. 2017. Algorithmic Mediation in Group Decisions: Fairness Perceptions of Algorithmically Mediated vs. Discussion-Based Social Division. In Proceedings of the 2017 ACM Conference on Computer Supported Cooperative Work and Social Computing (CSCW ’17). ACM, New York, NY, USA, 1035–1048.
  • Lee et al. (2020) Min Kyung Lee, Nina Grgić-Hlača, Michael Carl Tschantz, Reuben Binns, Adrian Weller, Michelle Carney, and Kori Inkpen. 2020. Human-Centered Approaches to Fair and Responsible AI. In Extended Abstracts of the 2020 CHI Conference on Human Factors in Computing Systems. 1–8.
  • Lee et al. (2019a) Min Kyung Lee, Anuraag Jain, Hea Jin Cha, Shashank Ojha, and Daniel Kusbit. 2019a. Procedural justice in algorithmic fairness: Leveraging transparency and outcome control for fair algorithmic mediation. Proceedings of the ACM on Human-Computer Interaction 3, CSCW (2019), 1–26.
  • Lee et al. (2019b) Min Kyung Lee, Daniel Kusbit, Anson Kahng, Ji Tae Kim, Xinran Yuan, Allissa Chan, Daniel See, Ritesh Noothigattu, Siheon Lee, Alexandros Psomas, et al. 2019b. WeBuildAI: Participatory framework for algorithmic governance. Proceedings of the ACM on Human-Computer Interaction 3, CSCW (2019), 1–35.
  • Leveraging Data as a Strategic Asset Cross-Agency Priority Team (2020) Leveraging Data as a Strategic Asset Cross-Agency Priority Team. 2020. Federal Data Strategy: 2020 Action Plan. https://strategy.data.gov/action-plan/.
  • Leveson et al. (1997) Nancy Leveson, L Denise Pinnel, Sean David Sandys, Shuichi Koga, and Jon Damon Reese. 1997. Analyzing software specifications for mode confusion potential. In Proceedings of a workshop on human error and system development. Glasgow Accident Analysis Group, 132–146.
  • Leveson (2016) Nancy G Leveson. 2016. Engineering a safer world: Systems thinking applied to safety. The MIT Press.
  • Lipton et al. (2017) Zachary C Lipton, Alexandra Chouldechova, and Julian McAuley. 2017. Does mitigating ML’s disparate impact require disparate treatment? arXiv preprint arXiv:1711.07076 (2017).
  • Loeliger and McCullough (2012) Jon Loeliger and Matthew McCullough. 2012. Version Control with Git: Powerful tools and techniques for collaborative software development. O’Reilly Media.
  • Lou et al. (2012) Yin Lou, Rich Caruana, and Johannes Gehrke. 2012. Intelligible models for classification and regression. In Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining. 150–158.
  • Ludäscher (2016) Bertram Ludäscher. 2016. A brief tour through provenance in scientific workflows and databases. In Building Trust in Information. Springer, 103–126.
  • Lum and Isaac (2016) Kristian Lum and William Isaac. 2016. To predict and serve? Significance 13, 5 (2016), 14–19.
  • Lunt (1988) Teresa F Lunt. 1988. Automated audit trail analysis and intrusion detection: A survey. In In Proceedings of the 11th National Computer Security Conference.
  • Mace et al. (2015) Jonathan Mace, Ryan Roelke, and Rodrigo Fonseca. 2015. Pivot tracing: Dynamic causal monitoring for distributed systems. In Proceedings of the 25th Symposium on Operating Systems Principles. 378–393.
  • Martin Jr et al. (2020) Donald Martin Jr, Vinodkumar Prabhakaran, Jill Kuhlberg, Andrew Smart, and William S Isaac. 2020. Extending the machine learning abstraction boundary: A Complex systems approach to incorporate societal context. arXiv preprint arXiv:2006.09663 (2020).
  • McBreen and Foreword By-Beck (2002) Pete McBreen and Kent Foreword By-Beck. 2002. Questioning extreme programming. Addison-Wesley Longman Publishing Co., Inc.
  • McPhillips et al. (2015) Timothy McPhillips, Tianhong Song, Tyler Kolisnik, Steve Aulenbach, Khalid Belhajjame, Kyle Bocinsky, Yang Cao, Fernando Chirigati, Saumen Dey, Juliana Freire, Deborah Huntzinger, Christopher Jones, David Koop, Paolo Missier, Mark Schildhauer, Christopher Schwalm, Yaxing Wei, James Cheney, Mark Bieda, and Bertram Ludascher. 2015. YesWorkflow: a user-oriented, language-independent tool for recovering workflow information from scripts. International Data Curation Conference (2015).
  • Merkle (1987) Ralph C. Merkle. 1987. A Digital Signature Based on a Conventional Encryption Function. CRYPTO (1987).
  • Metcalf et al. (2019) Jacob Metcalf, Emanuel Moss, et al. 2019. Owning Ethics: Corporate Logics, Silicon Valley, and the Institutionalization of Ethics. Social Research: An International Quarterly 86, 2 (2019), 449–476.
  • Miller (2019) Tim Miller. 2019. Explanation in artificial intelligence: Insights from the social sciences. Artificial Intelligence 267 (2019), 1–38.
  • Mitchell (2009) Melanie Mitchell. 2009. Complexity: A guided tour. Oxford University Press, Oxford, UK.
  • Mitchell et al. (2019) Margaret Mitchell, Simone Wu, Andrew Zaldivar, Parker Barnes, Lucy Vasserman, Ben Hutchinson, Elena Spitzer, Inioluwa Deborah Raji, and Timnit Gebru. 2019. Model cards for model reporting. In Proceedings of the Conference on Fairness, Accountability, and Transparency. ACM, 220–229.
  • Mittelstadt (2019a) Brent Mittelstadt. 2019a. AI Ethics–Too principled to fail. arXiv preprint arXiv:1906.06668 (2019).
  • Mittelstadt (2019b) Brent Mittelstadt. 2019b. Principles alone cannot guarantee ethical AI. Nature Machine Intelligence (2019), 1–7.
  • Molnar (2018) Christoph Molnar. 2018. Interpretable Machine Learning: A Guide for Making Black-box Models Explainable. Online: https://christophm.github.io/interpretable-ml-book/.
  • Moor (1985) James H Moor. 1985. What is computer ethics? Metaphilosophy 16, 4 (1985).
  • Moreau et al. (2008) Luc Moreau, Paul Groth, Simon Miles, Javier Vazquez-Salceda, John Ibbotson, Sheng Jiang, Steve Munroe, Omer Rana, Andreas Schreiber, Victor Tan, et al. 2008. The provenance of electronic data. Commun. ACM 51, 4 (2008), 52–58.
  • Mulgan (2000) Richard Mulgan. 2000. ‘Accountability’: An ever-expanding concept? Public administration 78, 3 (2000), 555–573.
  • Mulgan (2003) Richard G Mulgan. 2003. Holding power to account: accountability in modern democracies. Palgrave Macmillan.
  • Mulligan and Bamberger (2018) Deirdre K Mulligan and Kenneth A Bamberger. 2018. Saving Governance-by-Design. California Law Review 106, 101 (2018).
  • Mulligan et al. (2019a) Deirdre K Mulligan, Daniel Kluttz, and Nitin Kohli. 2019a. Shaping Our Tools: Contestability as a Means to Promote Responsible Algorithmic Decision Making in the Professions. Preprint available at SSRN 3311894 (2019).
  • Mulligan et al. (2016) Deirdre K Mulligan, Colin Koopman, and Nick Doty. 2016. Privacy is an essentially contested concept: a multi-dimensional analytic for mapping privacy. Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences 374, 2083 (2016), 20160118.
  • Mulligan et al. (2019b) Deirdre K Mulligan, Joshua A Kroll, Nitin Kohli, and Richmond Y Wong. 2019b. This Thing Called Fairness: Disciplinary Confusion Realizing a Value in Technology. Proceedings of the ACM on Human-Computer Interaction 3, CSCW (2019), 119.
  • Mulligan and Nissenbaum (2020) Deirdre K Mulligan and Helen Nissenbaum. 2020. The Concept of Handoff as a Model for Ethical Analysis and Design. In The Oxford Handbook of Ethics of AI, Markus Dubber, Frank Pasquale, and Sunit Das (Eds.). Oxford University Press, Oxford, UK.
  • Multistakeholder Process on Software Component Transparency (2019) Multistakeholder Process on Software Component Transparency. 2019. Framing Software Component Transparency: Establishing a Common Software Bill of Material (SBOM). https://www.ntia.gov/files/ntia/publications/framingsbom_20191112.pdf.
  • Muniswamy-Reddy et al. (2006) Kiran-Kumar Muniswamy-Reddy, David A Holland, Uri Braun, and Margo I Seltzer. 2006. Provenance-Aware Storage Systems.. In USENIX Annual Technical Conference, General Track. 43–56.
  • Muniswamy-Reddy et al. (2010) Kiran-Kumar Muniswamy-Reddy, Peter Macko, and Margo I Seltzer. 2010. Provenance for the Cloud.. In FAST, Vol. 10. 15–14.
  • Myers et al. (1979) Glenford J Myers, Corey Sandler, and Tom Badgett. 2011 (1979). The art of software testing. John Wiley & Sons.
  • Nabar et al. (2008) Shubha U Nabar, Krishnaram Kenthapadi, Nina Mishra, and Rajeev Motwani. 2008. A survey of query auditing techniques for data privacy. In Privacy-Preserving Data Mining. Springer, 415–431.
  • Nabar et al. (2006) Shubha U Nabar, Bhaskara Marthi, Krishnaram Kenthapadi, Nina Mishra, and Rajeev Motwani. 2006. Towards robustness in query auditing. In Proceedings of the 32nd international conference on Very large data bases. 151–162.
  • National Institute of Standards and Technology (2018) National Institute of Standards and Technology. 2018. Framework for Improving Critical Infrastructure Cybersecurity. https://www.nist.gov/cyberframework.
  • National Institute of Standards and Technology (2020) National Institute of Standards and Technology. 2020. NIST Privacy Framework: A Tool for Improving Privacy through Enterprise Risk Management. https://www.nist.gov/privacy-framework.
  • National New Generation Artificial Intelligence Governance Expert Committee (Ministry of Science and Technology convening) (2019) National New Generation Artificial Intelligence Governance Expert Committee (Ministry of Science and Technology convening). 2019. Governance Principles for a New Generation of Artificial Intelligence: Develop Responsible Artificial Intelligence.
  • National Security Commission on Artificial Intelligence (2020) National Security Commission on Artificial Intelligence. 2020. Second Quarter Recommendations. https://www.nscai.gov.
  • Nikitin et al. (2017) Kirill Nikitin, Eleftherios Kokoris-Kogias, Philipp Jovanovic, Nicolas Gailly, Linus Gasser, Ismail Khoffi, Justin Cappos, and Bryan Ford. 2017. CHAINIAC: Proactive software-update transparency via collectively signed skipchains and verified builds. In 26th USENIX Security Symposium (USENIX Security 17). 1271–1287.
  • Nissenbaum (1996) Helen Nissenbaum. 1996. Accountability in a computerized society. Science and engineering ethics 2, 1 (1996), 25–42.
  • Nissenbaum (2001) Helen Nissenbaum. 2001. How computer systems embody values. Computer 34, 3 (March 2001), 120–119.
  • Nissenbaum (2005) Helen Nissenbaum. 2005. Values in technical design. In Encyclopedia of science, technology, and ethics. Macmillan New York, NY, 66–70.
  • Office of the Secretary of Defense (2020) Office of the Secretary of Defense. 2020. Artificial Intelligence Ethical Principles for the Department of Defense. OSD Memorandum.
  • Organization for Economic Cooperation and Development (2019) Organization for Economic Cooperation and Development. 2019. Recommendation of the Council on Artificial Intelligence. OECD/LEGAL/0449, https://legalinstruments.oecd.org/en/instruments/OECD-LEGAL-0449.
  • Partnership on AI (2019) Partnership on AI. 2019. Annotation and Benchmarking on Understanding and Transparency of Machine Learning Lifecycles (ABOUT ML). https://partnershiponai.com/about-ml.
  • Passi and Barocas (2019) Samir Passi and Solon Barocas. 2019. Problem formulation and fairness. In Conference on Fairness, Accountability, and Transparency.
  • Passi and Jackson (2018) Samir Passi and Steven J Jackson. 2018. Trust in Data Science: Collaboration, Translation, and Accountability in Corporate Data Science Projects. Proceedings of the ACM on Human-Computer Interaction 2, CSCW (nov 2018), 1–28.
  • Pearson and Charlesworth (2009) Siani Pearson and Andrew Charlesworth. 2009. Accountability as a way forward for privacy protection in the cloud. In Cloud computing. Springer, 131–144.
  • Pérez et al. (2018) Beatriz Pérez, Julio Rubio, and Carlos Sáenz-Adán. 2018. A systematic review of provenance systems. Knowledge and Information Systems (2018), 1–49.
  • Petersen et al. (2009) Kai Petersen, Claes Wohlin, and Dejan Baca. 2009. The waterfall model in large-scale development. In International Conference on Product-Focused Software Process Improvement. Springer, 386–400.
  • Porter (1992) Theodore M Porter. 1992. Quantification and the accounting ideal in science. Social studies of science 22, 4 (1992), 633–651.
  • Rader et al. (2018) Emilee Rader, Kelley Cotter, and Janghee Cho. 2018. Explanations as Mechanisms for Supporting Algorithmic Transparency. Proceedings of the International Conference on Human Factors in Computer Systems (CHI) (2018).
  • Raghavan et al. (2020) Manish Raghavan, Solon Barocas, Jon Kleinberg, and Karen Levy. 2020. Mitigating bias in algorithmic hiring: Evaluating claims and practices. In Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency. 469–481.
  • Raji et al. (2020) Inioluwa Deborah Raji, Andrew Smart, Rebecca N White, Margaret Mitchell, Timnit Gebru, Ben Hutchinson, Jamila Smith-Loud, Daniel Theron, and Parker Barnes. 2020. Closing the AI Accountability Gap: Defining an End-to-End Framework for Internal Algorithmic Auditing. ACM Conference on Fairness, Accountability, and Transparency (2020).
  • Rees (2009) Joseph V Rees. 2009. Hostages of each other: The transformation of nuclear safety since Three Mile Island. University of Chicago Press.
  • Reisman et al. (2018) Dillon Reisman, Jason Schultz, Kate Crawford, and Meredith Whittaker. 2018. Algorithmic Impact Assessments: A Practical Framework for Public Agency Accountability. AI Now Institute Report https://ainowinstitute.org/aiareport2018.pdf.
  • Reyes et al. (2018) Irwin Reyes, Primal Wijesekera, Joel Reardon, Amit Elazari Bar On, Abbas Razaghpanah, Narseo Vallina-Rodriguez, and Serge Egelman. 2018. “Won’t somebody think of the children?” examining COPPA compliance at scale. Proceedings on Privacy Enhancing Technologies 2018, 3 (2018), 63–83.
  • Rice (1953) Henry Gordon Rice. 1953. Classes of recursively enumerable sets and their decision problems. Trans. Amer. Math. Soc. (1953), 358–366.
  • Rosner and Ames (2014) Daniela K. Rosner and Morgan Ames. 2014. Designing for repair?: infrastructures and materialities of breakdown. In Proceedings of the 17th ACM conference on Computer supported cooperative work & social computing - CSCW ’14. ACM Press, Baltimore, 319–331.
  • Rothenberg (1999) Jeff Rothenberg. 1999. Avoiding Technological Quicksand: Finding a Viable Technical Foundation for Digital Preservation. A Report to the Council on Library and Information Resources. ERIC.
  • Rubinstein and Good (2013) Ira S Rubinstein and Nathaniel Good. 2013. Privacy by design: A counterfactual analysis of Google and Facebook privacy incidents. Berkeley Tech. LJ 28 (2013), 1333.
  • Salvendy (2012) Gavriel Salvendy. 2012. Handbook of human factors and ergonomics. John Wiley & Sons.
  • Sandvig (2015) Christian Sandvig. 2015. Seeing the Sort: The Aesthetic and Industrial Defense of “The Algorithm”. Journal of the New Media Caucus (2015). ISSN: 1942-017X, [Online] http://median.newmediacaucus.org/art-infrastructures-information/seeing-the-sort-the-aesthetic-and-industrial-defense-of-the-algorithm/.
  • Sandvig et al. (2014) Christian Sandvig, Kevin Hamilton, Karrie Karahalios, and Cedric Langbort. 2014. Auditing algorithms: Research methods for detecting discrimination on internet platforms. Data and Discrimination: Converting Critical Concerns into Productive Inquiry (2014).
  • Sasson et al. (2014) Eli Ben-Sasson, Alessandro Chiesa, Christina Garman, Matthew Green, Ian Miers, Eran Tromer, and Madars Virza. 2014. Zerocash: Decentralized anonymous payments from bitcoin. In 2014 IEEE Symposium on Security and Privacy. IEEE, 459–474.
  • Selbst (2017) Andrew Selbst. 2017. Disparate Impact in Big Data Policing. Georgia Law Review 52, 109 (2017).
  • Selbst and Barocas (2018) Andrew D Selbst and Solon Barocas. 2018. The intuitive appeal of explainable machines. Fordham L. Rev. 87 (2018).
  • Selbst et al. (2019) Andrew D Selbst, Danah Boyd, Sorelle A Friedler, Suresh Venkatasubramanian, and Janet Vertesi. 2019. Fairness and abstraction in sociotechnical systems. In Conference on Fairness, Accountability, and Transparency. ACM, 59–68.
  • Selbst and Powles (2017) Andrew D Selbst and Julia Powles. 2017. Meaningful information and the right to explanation. International Data Privacy Law 7, 4 (2017), 233–242.
  • Selsam et al. (2017) Daniel Selsam, Percy Liang, and David L Dill. 2017. Developing bug-free machine learning systems with formal mathematics. arXiv:1706.08605 (2017).
  • Sendak et al. (2020) Mark Sendak, Madeleine Clare Elish, Michael Gao, Joseph Futoma, William Ratliff, Marshall Nichols, Armando Bedoya, Suresh Balu, and Cara O’Brien. 2020. “The human body is a black box”: Supporting clinical decision-making with deep learning. In Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency. 99–109.
  • Shilton (2018) Katie Shilton. 2018. Values and Ethics in Human-Computer Interaction. Foundations and Trends in Human-Computer Interaction 12, 2 (2018), 107–171.
  • Shilton et al. (2014) Katie Shilton, Jes A. Koepfler, and Kenneth R. Fleischmann. 2014. How to see values in social computing: Methods for Studying Values Dimensions. In Proceedings of the 17th ACM Conference on Computer Supported Cooperative Work & Social Computing (CSCW ’14). 426–435.
  • Siegelman and Heckman (1993) Peter Siegelman and James Heckman. 1993. The Urban Institute audit studies: Their methods and findings. In Clear and Convincing Evidence: Measurement of Discrimination in America. Urban Institute Press, Washington, DC, 187–258.
  • Star and Ruhleder (1994) Susan Leigh Star and Karen Ruhleder. 1994. Steps towards an ecology of infrastructure: complex problems in design and access for large-scale collaborative systems. In Proceedings of the 1994 ACM conference on Computer supported cooperative work. ACM, 253–264.
  • Star and Ruhleder (1996) Susan Leigh Star and Karen Ruhleder. 1996. Steps toward an ecology of infrastructure: Design and access for large information spaces. Information systems research 7, 1 (1996), 111–134.
  • Steinhardt (2016) Stephanie B Steinhardt. 2016. Breaking Down While Building Up: Design and Decline in Emerging Infrastructures. In Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems - CHI ’16. ACM Press, New York, New York, USA, 2198–2208.
  • Stodden et al. (2014) Victoria Stodden, Friedrich Leisch, and Roger D Peng. 2014. Implementing reproducible research. CRC Press.
  • Stodden and Miguez (2014) Victoria Stodden and Sheila Miguez. 2014. Best practices for computational science: Software infrastructure and environments for reproducible and extensible research. Journal of Open Research Software 2, 1 (2014), 21.
  • Turk et al. (2005) Daniel Turk, Robert France, and Bernhard Rumpe. 2005. Assumptions underlying agile software-development processes. Journal of Database Management (JDM) 16, 4 (2005), 62–87.
  • United States Office of the Director of National Intelligence (2020) United States Office of the Director of National Intelligence. 2020. Artificial Intelligence Ethics Framework for the Intelligence Community.
  • Vaughan (1996) Diane Vaughan. 1996. The Challenger launch decision: Risky technology, culture, and deviance at NASA. University of Chicago press.
  • Veale (2017) Michael Veale. 2017. Logics and practices of transparency and opacity in real-world applications of public sector machine learning. Workshop on Fairness, Accountability, and Transparency in Machine Learning (FAT/ML) (2017).
  • Vu et al. (2013) Victor Vu, Srinath Setty, Andrew J Blumberg, and Michael Walfish. 2013. A hybrid architecture for interactive verifiable computation. In 2013 IEEE Symposium on Security and Privacy. IEEE.
  • Wagenknecht et al. (2016) Susann Wagenknecht, Min Lee, Caitlin Lustig, Jacki O’Neill, and Himanshu Zade. 2016. Algorithms at Work: Empirical Diversity, Analytic Vocabularies, Design Implications. In Proceedings of the 19th ACM Conference on Computer Supported Cooperative Work and Social Computing Companion - CSCW ’16 Companion. ACM Press, New York, New York, USA, 536–543.
  • Warden (2018) Pete Warden. 2018. The Machine Learning Reproducibility Crisis. https://petewarden.com/2018/03/19/the-machine-learning-reproducibility-crisis/.
  • Waters et al. (2004) Brent R Waters, Dirk Balfanz, Glenn Durfee, and Diana K Smetters. 2004. Building an Encrypted and Searchable Audit Log. In NDSS, Vol. 4. 5–6.
  • Weitzner et al. (2007) Daniel J Weitzner, Harold Abelson, Tim Berners-Lee, Joan Feigenbaum, James Hendler, and Gerald Jay Sussman. 2007. Information Accountability. Technical Report MIT-CSAIL-TR-2007-034. Massachusetts Institute of Technology, Computer Science and Artificial Intelligence Laboratory.
  • Wieringa (2020) Maranke Wieringa. 2020. What to account for when accounting for algorithms: a systematic literature review on algorithmic accountability. In Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency. ACM, New York, NY, 1–18.
  • Wolf et al. (2018) Christine T Wolf, Haiyi Zhu, Julia Bullard, Min Kyung Lee, and Jed R Brubaker. 2018. The Changing Contours of “Participation” in Data-driven, Algorithmic Ecosystems: Challenges, Tactics, and an Agenda. In Companion of the 2018 ACM Conference on Computer Supported Cooperative Work and Social Computing - CSCW ’18. ACM Press, New York, New York, USA, 377–384.
  • Wong and Mulligan (2019) Richmond Y Wong and Deirdre K Mulligan. 2019. Bringing Design to the Privacy Table: Broadening “Design” in “Privacy by Design” Through the Lens of HCI. In 2019 CHI Conference on Human Factors in Computing Systems. 1–17.
  • Young et al. (2019a) Meg Young, Lassana Magassa, and Batya Friedman. 2019a. Toward inclusive tech policy design: a method for underrepresented voices to strengthen tech policy documents. Ethics and Information Technology 21, 2 (2019), 89–103.
  • Young et al. (2019b) Meg Young, Luke Rodriguez, Emily Keller, Feiyang Sun, Boyang Sa, Jan Whittington, and Bill Howe. 2019b. Beyond open vs. closed: Balancing individual privacy and public accountability in data sharing. In Proceedings of the Conference on Fairness, Accountability, and Transparency. 191–200.
  • Zhu et al. (2018) Haiyi Zhu, Bowen Yu, Aaron Halfaker, and Loren Terveen. 2018. Value-Sensitive Algorithm Design: Method, Case Study, and Lessons. Proceedings of the ACM on Human-Computer Interaction 2, CSCW (Nov. 2018), 1–23.
  • Ziewitz (2017) Malte Ziewitz. 2017. A not quite random walk: Experimenting with the ethnomethods of the algorithm. Big Data & Society 4, 2 (2017).