AI Safety Subproblems for Software Engineering Researchers
Abstract.
In this 4-page manuscript we discuss the problem of long-term AI Safety from a Software Engineering (SE) research viewpoint. We briefly summarize long-term AI Safety and the challenge of avoiding harms from AI as systems meet or exceed human capabilities, including software engineering capabilities (as they approach AGI / "HLMI"). We perform a quantified literature review suggesting that AI Safety discussions are not common at SE venues. We make conjectures about how software development might change with rising capabilities, and categorize "subproblems" that fit into traditional SE areas, proposing how work on similar problems might improve the future of AI and SE.
1. Introduction
The rise of data-driven techniques has led to rapid progress in Artificial Intelligence (AI). Since 2010, the compute used to train the largest AI models has doubled approximately every 6 months (Sevilla et al., 2022). Total private investment in AI is more than 10 times what it was a decade ago (Maslej et al., 2023). Software Engineering (SE) researchers are increasingly taking note and contributing to this advance. Examining papers at large SE venues (FSE, ICSE, ASE, ISSTA) in 2022, we estimate 33% mentioned AI/ML/DL terms in their title or abstract (based on a keyword match in the Semantic Scholar corpus; see https://github.com/DNGros/aisse/blob/main/ai_terms_regex.txt), compared to 4% in 2012.
The rise of AI for Software Engineering (AI4SE) and Software Engineering for AI (SE4AI) makes AI Safety a concern for SE researchers. AI Safety refers to avoiding harm from AI. We endorse arguments that the most pressing consideration here is safety at the limit, i.e., safety as AI begins to match or exceed human capabilities in all domains. The collection of such systems is referred to as High Level Machine Intelligence (HLMI) (Grace et al., 2017), and is similar to concepts like Artificial General Intelligence (AGI) or Transformative AI (TAI). Failure to make progress on HLMI safety could have catastrophic consequences, including the extinction of humanity. SE researchers have a role to play in either exacerbating these dangers or helping reduce risk. We discuss why, and provide helpful conceptions of concrete AI problems that resemble existing SE problems.
We believe that safety implications of highly advanced AI are infrequently discussed at SE conferences, despite the increase in AI4SE work. We explore this hypothesis through a citation graph analysis (we use the citation graph, rather than a traditional keyword search like (Juric et al., 2020), to better approximate works that give any consideration to long-term AI safety, even when it was not enough of a focus to appear in searchable fields). We identify 44 works that are "foundational" in HLMI safety (referred to as FSafe; listed at https://github.com/DNGros/aisse/blob/main/foundation_papers.csv). The selection of the FSafe works is informed by querying all references in a set of survey papers (subsection 2.3) and examining works that appear in multiple surveys. We treat a citation to any of these 44 works as a proxy signal that a paper considers concepts of long-term AI Safety (such papers are referred to as CSafe). We use the Semantic Scholar API and corpus (Lo et al., 2020) to query citations to the FSafe works. The earliest paper in FSafe is from 1960 (Wiener, 1960). We filter CSafe to works from 2012 to March 2023.
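As a rough sketch of the query step (assuming the public Semantic Scholar Graph API; the endpoint, field names, and pagination follow its documentation at the time of writing and may change):

```python
import requests

# Minimal sketch: collect papers citing a given FSafe work via the
# Semantic Scholar Graph API citations endpoint.
API = "https://api.semanticscholar.org/graph/v1/paper/{pid}/citations"

def citing_papers(paper_id, limit=100):
    """Yield papers citing `paper_id` (e.g., 'arXiv:1705.08807')."""
    offset = 0
    while True:
        resp = requests.get(
            API.format(pid=paper_id),
            params={"fields": "title,year,venue", "offset": offset, "limit": limit},
        )
        resp.raise_for_status()
        data = resp.json().get("data", [])
        if not data:
            return
        for item in data:
            yield item["citingPaper"]
        offset += limit

# CSafe is roughly the union of citers over the 44 FSafe papers, bucketed by venue.
fsafe_ids = ["arXiv:1705.08807"]  # illustrative subset of the FSafe list
csafe = {p.get("title"): p.get("venue") for pid in fsafe_ids for p in citing_papers(pid)}
```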
We identify a total of 6565 unique CSafe works making 9198 citations to the FSafe works. Table 1 shows counts by venue. 3484 of the CSafe works have "Computer Science" as an identified field.
Overall, CSafe papers are relatively rare at major computing conferences (using a conference list from csconferences.org). Unsurprisingly, such topics are discussed most often at top ML-focused conferences (NeurIPS, ICLR, ICML, KDD), with 225 papers, or at AI-specific conferences (AAAI, IJCAI), with 87 papers (counting only the main technical conferences, which misses some safety-related workshops like SafeAI@AAAI and AISafety@IJCAI). Notably, only 4 papers at the SE conference venues listed above (FSE, ICSE, ASE, ISSTA) reference an FSafe work, out of 7744 corpus papers at those venues (this count includes (Tambon et al., 2021), which the corpus labels as appearing in the Automated Software Engineering conference when it is actually in the Automated Software Engineering journal). Just 5 other CSafe papers appear in major SE journals, out of 6082 entries. About half of the CSafe works are arXiv preprints or uncategorized; scientific work often never reaches formal publication, and AI Safety culturally relies heavily on self-publishing and web forums.
This analysis has many limitations. Much research (e.g., on AI robustness or interpretability) is safety-motivated even if it never references the FSafe discussions of long-term AI. So while this analysis only approximates existing consideration of HLMI safety, it suggests there is currently limited sharing of ideas between the SE and AI Safety communities. In this manuscript we hope to help bridge some of that gap.
Table 1. CSafe and total paper counts by venue.

| Domain | CSafe | All |
|---|---|---|
| SE (ICSE, FSE, ASE, ISSTA) | 4 | 7744 |
| ML (ICLR, ICML, NeurIPS, KDD) | 225 | 24731 |
| AI (AAAI, IJCAI) | 87 | 17906 |
| NLP (ACL, EMNLP, NAACL) | 40 | 15561 |
| Computer Vision (CVPR, ECCV, ICCV) | 17 | 21192 |
| PL (PLDI, POPL, ICFP, OOPSLA) | 0 | 1550 |
| Other Conference or Workshop | 360 | |
| SE Journal (TSE, JSS, ESE, IST, TOSEM) | 5 | 6082 |
| Other Journal or Workshop (some workshops appear classified as journals in the corpus) | 2742 | |
| arXiv CS (and not other) | 665 | 199394 |
| Venue Unknown / Other arXiv / Other | 2810 | |
2. AI Safety in 1-page
This page concisely summarizes AI Safety concepts as background.
AI Safety Timelines: A key consideration in safety is the timeline for how and when we reach systems more capable than humans in key parts of the economy and in advancing science. There is evidence that AI researchers expect this could happen soon. A 2016 survey from Grace et al. (2017) and a 2022 followup (Stein-Perlman et al., 2022) asked ~700 authors who had published at ML/AI conferences to predict the arrival of HLMI, defined as "when unaided machines can accomplish every task better and more cheaply than human workers". The aggregate response put a 50% chance of HLMI by 2059.
Roser (2023) discusses this survey, three other independent surveys of experts and non-experts, and models based on estimates of the computation needed for intelligence tasks. These and other surveys (Keith Wynroe and Sevilla, 2023) suggest a significant (>50%) likelihood of highly capable AI this century (thus relevant to people born today), and an appreciable likelihood (>20%) within two decades. Given the transformative effect of this event, probabilities on this scale must be taken seriously.
This is the time to attend to AI safety.
Takeoff Speed: "Takeoff" describes how quickly AI could progress from near-human capabilities to vastly more capable than humans (superintelligence, or ASI). More formal definitions frame this in terms of measures like changes to world economic output (Christiano, 2018).
It seems probable that the physical limits on intelligence are vastly beyond what the human brain achieves (it runs on about 20 watts and has less mass than some laptops). Because HLMI could automate tasks like AI research and hardware development, recursive self-improvement could lead to unexpectedly fast progress, creating a "singularity" in development. Takeoff speed is uncertain, but a fast takeoff might mean we must "get AI safety right the first time".
The Alignment Problem: Alignment refers to making a system follow its designers' intended goal. An AI system may prefer states of the world that it optimizes toward (e.g., a chess AI prefers states of the world where it wins). Aligned HLMI should prefer states that humanity also prefers. There are nuances to alignment, such as outer- and inner-alignment and objective robustness (Hubinger et al., 2019; Shah et al., 2022; Koch et al., 2021).
Three examples of real-world AI alignment failures: (1) the Bing chatbot doing a web search for a user's name and then threatening them (Hubinger, 2023); (2) social media AI algorithms optimized for clicks and engagement time even when this objective is misaligned with societal benefit (Saurwein and Spencer-Smith, 2021; Bergen, 2019; Stray, 2020); (3) code generators that produce buggy code even when they are capable of producing correct code (Chen et al., 2021; Jesse et al., 2023).
Threats from Misaligned HLMI: Intelligence is the ability to achieve goals in a wide range of environments (Legg and Hutter, 2007). "Instrumental convergence" suggests certain sub-goals are useful to nearly all intelligent agents (such as acquiring resources or ensuring self-preservation). Unless well aligned, HLMI systems might cause catastrophic or existential harm to humanity while pursuing their goals. This is analogous to how humans might clear a forest or use pesticides not out of hate for the animals living there, but to achieve other goals. Systems need not be malicious to cause harm, only very capable (Bostrom, 2017).
Current Agendas: Work is often categorized as "technical alignment" or "policy focused". On the technical side, sub-areas include Agent Foundations (understanding the nature of intelligence using concepts like decision theory) (Soares and Fallenstein, 2017), Interpretability (which can make alignment easier) (Hobbhahn, 2022; Olah, 2022), Corrigibility (making systems willing to be changed by their designers) (Soares et al., 2015), Prosaic Alignment (working on alignment strategies for today's systems using today's tools; this includes ideas like RL from Human Feedback (Christiano et al., 2023; Ouyang et al., 2022; Ba et al., 2022) or CAI (Bai et al., 2022) for aligning language models), and Robustness (ensuring alignment won't fail in out-of-distribution environments or against adversaries) (Hendrycks et al., 2021). See subsection 2.3 for more comprehensive lists.
2.1. Common Skepticisms
Works like (Yampolskiy, 2022) and (Russell, 2017) survey skepticism about the need for focused AI safety work. Such debates are not new; in 1950, Turing (1950) argued against skeptics of thinking machines. Categories from (Yampolskiy, 2022) include:
- (1) Priorities Objections: HLMI problems are framed as too futuristic and therefore unimportant compared to other problems. This objection neglects evidence that timelines to HLMI are shortening, and that safety problems may take a long time to solve.
- (2) Technical Objections: HLMI is unattainable. Recent successes contradict this position. Others mistakenly believe there will be trivial technical solutions to safety, or view AI safety as an impossible problem to work on today (a view we hope will diminish with work like this).
- (3) "Biased Objections": Safety considerations will slow progress or decrease funding. This echoes 20th-century biased research into tobacco or pollutants (Oreskes and Conway, 2010). As AI advances, this stance is not sustainable.
Skepticism about advanced AI and existential risk is natural: these fears can seem weird and feel like science fiction. Yet significant evidence supports this weird future.
2.2. SE Research’s Unique Leverage
SE researchers have an active role to play in long-term AI safety, due to the nature of software. Transformational and risky capabilities arise when machines can form and execute complex plans, and it seems likely that reasoning over code will be a key skill enabling these capabilities. Highlighting this importance, one of the first multi-million-user deployments of billion-plus-parameter neural networks came not in NLP, CV, or RL, but in SE, with GitHub Copilot.
Traditional research problems aimed at making complex software behave as expected can be adapted to apply to AI software and to machine-written software.
2.3. Additional References
There are many survey papers on AI Safety (Everitt et al., 2018; Hendrycks et al., 2021; Juric et al., 2020; Sotala and Yampolskiy, 2014; Dwivedi et al., 2021; Russell et al., 2015; Critch and Krueger, 2020). In particular, (Everitt et al., 2018) provides a nice general survey. (Hendrycks et al., 2021) has a useful emphasis on ML. (Hendrycks and Mazeika, 2022) discusses assessing risks of AI work. (Tambon et al., 2021; Dey and Lee, 2021) are SE journal articles on ML safety (but are less HLMI focused).
AgiSafetyFundamentals.com collects a set of links and a course syllabus (https://agisafetyfundamentals.com/resources and ../ai-alignment-curriculum). Berkeley CHAI similarly provides a bibliography (https://humancompatible.ai/bibliography), ordered by topic and "priority".
3. Subproblems
We outline 4 conjectures about the future of software engineering, and for each discuss 3 example problems that might need to be solved to ensure safe progress. Addressing these challenges within the SE domain could help make other AI systems safer.
3.1. Most Software Will Be Machine Written
A nearer-term milestone than HLMI is the point when a majority of source code is written by machines, at the abstraction level of current languages (by 1950s standards of abstraction, compilers and declarative high-level languages already make most code "machine-written").
Not long after deploying a generative AI autocomplete system, Google engineers reported in 2022 that approximately 3% of new internal code was AI-written (Tabachnyk and Nikolov, 2022). On the path to 50% there are many safety-relevant problems.
P1: How to reliably extract indicators of uncertainty from code generators, and support auditing?
In a world where most code is machine written, there need to be ways to determine when that code might be misaligned with intent. A first step could be to accurately estimate the probability that AI4SE outputs are correct. Work on calibrating ML probabilities includes (Lakshminarayanan et al., 2016; Guo et al., 2017; Kadavath et al., 2022). Careful attention is required: a claim like "the system says the code is 99% likely to be right, good enough" implies likely failure within hundreds of tries. More complex modeling of uncertainty should account for varying degrees of negative impact (a 10% risk of over-counting the number of files in a directory is better than a 10% risk of deleting all the files in a directory). It could also help to localize particular regions of code that are uncertain or under-specified (e.g., if asked to generate a button, the system might choose a button color or size even if neither is given in the prompt specification). A related approach is to generate code with "holes" in uncertain regions (Guo et al., 2021). Learning about uncertainty might generalize to other generative AI systems with complex outputs (such as natural language (NL) or a robot action-space).
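To make the arithmetic concrete, a minimal sketch (assuming, for simplicity, independent errors across generations):

```python
# Why "99% likely correct" is not "good enough" at scale.
p_correct = 0.99

for n in [10, 100, 1000]:
    p_at_least_one_failure = 1 - p_correct ** n
    print(f"{n} generations: P(>=1 incorrect) = {p_at_least_one_failure:.2f}")

# Expected output (rounded):
#   10 generations:   ~0.10
#   100 generations:  ~0.63
#   1000 generations: ~1.00
```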
P2: How to create faithful summaries?
Humans might not be able to review every line of code; faithful summaries can help developers understand machine-written software. Summarizing complex software outputs might also offer lessons for summarizing other AI-created plans that are too complex for easy review. Prior safety-motivated work has explored summarizing long-form NL such as books (Wu et al., 2021).
AI4SE has long studied automated summarization of source code (Zhang et al., 2022) and other artifacts like pull requests (Liu et al., 2019). The studied datasets have traditionally been human-written artifacts, but a long-term view should put increasing emphasis on artifacts produced by generators. While the summarizer itself might fail, its failures should ideally be anti-correlated with the generator's failures.
P3: Improve code provenance, accountability, and monitoring.
In the current SE paradigm, version control systems like Git make each line of code traceable to a human author. As we move to a world where most code is machine written, code committed by a developer may not have been written by them; we thus need ways to track what tool generated the code, under what conditions (e.g., prompting), and how it was audited. Commentary on this need has appeared previously (Bird et al., 2023).
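One hypothetical direction is to record provenance metadata alongside each generated change; the sketch below uses an invented schema (the field names are ours, not an existing standard):

```python
import hashlib
import json
from dataclasses import dataclass, asdict
from datetime import datetime, timezone

@dataclass
class GenerationRecord:
    """Hypothetical provenance record attached to a machine-written change."""
    tool: str              # which generator produced the code
    model_version: str     # exact model/version for reproducibility
    prompt_sha256: str     # hash of the prompt/context, not the raw text
    human_reviewer: str    # who audited the suggestion before commit
    accepted_verbatim: bool
    timestamp: str

def make_record(tool, model_version, prompt, reviewer, accepted_verbatim=True):
    return GenerationRecord(
        tool=tool,
        model_version=model_version,
        prompt_sha256=hashlib.sha256(prompt.encode()).hexdigest(),
        human_reviewer=reviewer,
        accepted_verbatim=accepted_verbatim,
        timestamp=datetime.now(timezone.utc).isoformat(),
    )

# Example: serialize for a commit trailer or code-review system.
record = make_record("example-codegen", "v1.2", "add a retry wrapper", "alice")
print(json.dumps(asdict(record), indent=2))
```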
3.2. Value of testing and verification increases.
While testing is an important enabler of software reliability today, gaps in testing are hopefully backstopped by having code written carefully by skilled and thoughtful humans. With improving code generators, we can expect less human effort to be spent on implementation and more on testing and verifying software. This conjecture is trackable with studies of developer time (e.g., (Meyer et al., 2019), which estimated that Microsoft developers spend 29% of their time on implementation-focused activities vs. 19% on testing/reviewing/specification-focused activities; if the conjecture holds, we would expect time to shift away from implementation).
P4: Identifying and helping write tests for the most critical code.
Testing provides a formalized way to specify intent (Lahiri et al., 2022). Thus, a potential route to safety is identifying the most important and uncertain regions of code and guiding developers to formalize their intent as tests.
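As an illustration (the routine and the properties are hypothetical), the intent behind a generated `dedupe_emails` helper could be pinned down with a few property-style tests rather than relying on the prompt alone:

```python
def dedupe_emails(emails):
    """Hypothetical machine-generated routine under test."""
    seen, out = set(), []
    for e in emails:
        key = e.strip().lower()
        if key not in seen:
            seen.add(key)
            out.append(key)
    return out

def test_idempotent():
    # Intent: applying the routine twice changes nothing.
    once = dedupe_emails(["A@x.com", "a@x.com ", "b@x.com"])
    assert dedupe_emails(once) == once

def test_case_and_whitespace_insensitive():
    # Intent: addresses differing only in case/whitespace are duplicates.
    assert dedupe_emails(["A@x.com", " a@x.com"]) == ["a@x.com"]

def test_preserves_first_occurrence_order():
    # Intent: output order follows first appearance in the input.
    assert dedupe_emails(["b@x.com", "a@x.com", "b@x.com"]) == ["b@x.com", "a@x.com"]
```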
P5: Improved systems-level granularity and AI4SE safety claims.
Modern operating systems and runtimes are responsible for ensuring that software only accesses resources (memory, compute time, files, cameras, etc.) that it is permitted to. This improves trust in software written by others. As the "others" shift from humans to AI systems, the need for such OS assurances will likely increase. Safety might be improved if an AI system could make specific, verifiable claims accompanying generated code (e.g., that it won't touch files, or that it has certain runtime bounds; Necula's proof-carrying code (Necula, 1997) is a historical example). Those claims could then be enforced at a system level. Improved system-level controls can likely improve the safety of software intermixed with AI-written code, and serve as a useful proxy when thinking about controlling AI outputs across domains.
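A toy static check can illustrate one narrow claim, such as "this generated snippet never opens files"; this is only a sketch, and real enforcement would require OS- or runtime-level sandboxing, since static checks alone are easy to evade:

```python
import ast

def claims_no_file_access(source: str) -> bool:
    """Toy static proxy for a 'no file access' claim about generated code.

    Flags direct calls to open() and imports of common file-touching modules.
    """
    tree = ast.parse(source)
    banned_modules = {"os", "pathlib", "shutil", "io"}
    for node in ast.walk(tree):
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name) \
                and node.func.id == "open":
            return False
        if isinstance(node, ast.Import) and \
                any(alias.name.split(".")[0] in banned_modules for alias in node.names):
            return False
        if isinstance(node, ast.ImportFrom) and \
                (node.module or "").split(".")[0] in banned_modules:
            return False
    return True

print(claims_no_file_access("def f(x):\n    return x + 1"))        # True
print(claims_no_file_access("data = open('secrets.txt').read()"))  # False
```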
P6: Responsible automated vulnerability detection tools.
There is interest in automated detection, prioritization, and repair of bugs and vulnerabilities (Russell et al., 2018; Le et al., 2022). This could have long-term benefits, but it is important to note that vulnerability detection is a dual-use technology (vulnerabilities can be maliciously exploited). The security research community is usually well aware of dual-use risks, but awareness must expand in the SE community that AI can make exploiting vulnerabilities more scalable. Still, there is a reasonable long-term safety argument for this research: hardening defenses and fixing vulnerabilities helps prevent rapid departures from the status quo (e.g., an AI system hacking into cloud computing providers to acquire vastly more compute) before there is time to intervene, and can help prevent the hacking of aligned AGI (Ladish and Heim, 2022).
3.3. Everyone will be a software engineer.
In the 1950s, computers were large and very expensive, and operating them was a specialized occupation. Today, with PCs, smartphones, and embedded devices, everyone is a "computer operator".
A natural conjecture is that on the path to HLMI, everyone will become a software engineer. Users will “speak apps into existence”, an extreme form of WYSIWYG (What You See Is What You Get) or no-code/low-code editors that let users click websites, documents, or apps into existence. This transformation is not just about autogenerated code, but everyone undertaking all parts of SE (problem specification, testing, deployment, etc). Changes here are difficult to track, but estimates of low-code usage can be proxies (Gartner, 2022).
P7: Narrowing the PL-NL gap
There is a gap between the precision of a typical programming language and the ambiguity of natural language. Software creators of the future might not understand much about traditional programming languages. Challenges caused by this gap are already apparent in existing low-code platforms with limited AI use (Rokis and Kirikova, 2022). One could imagine new programming languages or low-code paradigms that better mix ambiguity and precision, and that guide the user toward precision, particularly in areas with high risk of harm from intent ambiguity. While doing this, safety-conscious researchers could consider how their findings might generalize as systems scale to super-human capability or to other domains.
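Purely as an illustration of mixing ambiguity and precision (the `Unresolved` marker and risk labels are our own invention, not an existing language feature), a spec could mark the holes a natural-language request leaves open, along with their risk level:

```python
from dataclasses import dataclass

@dataclass
class Unresolved:
    """Marks a value the user's natural-language request did not pin down."""
    question: str
    default: object
    risk: str  # "low" = tool may guess, "high" = must ask the user

# Illustrative spec derived from "make me a signup button":
button_spec = {
    "label": "Sign up",
    "color": Unresolved("Which brand color?", default="#0066cc", risk="low"),
    "on_click": Unresolved("Where should signups be stored?", default=None, risk="high"),
}

# A generator could auto-fill low-risk holes and escalate high-risk ones.
to_ask = [v.question for v in button_spec.values()
          if isinstance(v, Unresolved) and v.risk == "high"]
print(to_ask)  # ['Where should signups be stored?']
```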
P8: Education of failure modes
Effective techniques must be developed to educate non-technical users about how automated SE tools can fail. This combines the challenges of testing low-code software (sadat Khorram et al., 2020), explaining AI decisions (Miller, 2017; Mohseni et al., 2021), and out-of-distribution detection (Yang et al., 2021). A safety-conscious researcher can also consider how these education techniques could be used to teach about failure modes of broader AI.
P9: Methods of dissuading overuse and modeling trust of systems
3.4. AI written software will be used in the most critical parts of society.
Software is an essential part of financial systems, health systems, and warfighting systems. As AI4SE becomes the norm, it will likely be used in writing software for the most critical parts of society.
P10: Adapting SE techniques for ML reliability
If complex ML-based systems are deployed in critical areas, there are likely opportunities to apply SE techniques for testing and understanding ML systems. Existing CSafe works in SE are mostly in this SE4AI robustness area (Gerasimou et al., 2020; Baluta et al., 2020). Other SE4AI work has sought to understand how to deploy AI systems (Martínez-Fernández et al., 2021; Gezici and Tarhan, 2022). Improved robustness and understanding can make aligned HLMI more stable.
P11: Defining normative values of automated software tools.
Systems like ChatGPT are raising awareness that designing systems requires normative decisions (Weidinger et al., 2021; Jakesch et al., 2023; Ganguli et al., 2023). There must be an expanded understanding of similar failure modes and norms in AI4SE (for example, (Chen et al., 2021) explains how a code model might promote stereotypes of race or gender when writing algorithms or UIs). As machines author or edit software for the most important parts of society, these decisions become increasingly important. Additionally, proactive norms are needed on topics like self-modification of AI4SE tools and automated malware generation.
P12: Building regulation, policy, and safety-conscious culture.
AI regulations are sometimes controversial (Zhang et al., 2021; Michael et al., 2022). However, when considering the use of AI4SE systems in the most safety-critical systems, the need for common, enforceable rules becomes increasingly clear. Natural market forces encourage some safety precautions, but are likely not enough to prompt adequate investment in avoiding catastrophes from highly capable AI systems while also avoiding "arms races" (Dafoe, 2018).
AI poses many complicated problems that will take significant effort to solve. Thus, wide community involvement and a culture of safety and preparedness are needed.
3.5. Anti-problems for Safety
Not all problems aid safety. We will discuss one example.
Non-safety P1: Improving correctness metrics or user metrics is not necessarily net-beneficial to safety.
A natural direction for the field is to drive forward correctness metrics on tasks like automated code generation. However, capability advances are not in themselves a safety objective. For example, improving the fraction of methods passing tests in the HumanEval dataset (Chen et al., 2021) (which measures the ability to synthesize programs) is not a safety objective. Similarly, optimizing user metrics like the fraction of generations accepted is not a safety objective. On the surface, one might argue that making AIs generate more correct code improves safety compared to the alternative of AI writing incorrect code. Yet we claim this is something of a false dichotomy: there is also the current status quo of human-authored code, which has many issues (high cost, inaccessibility, error-proneness, etc.) but gives more time to understand risks and safety problems. Pushing metrics like generation accuracy to their natural limit can lead to dangerously capable systems, and that limit is reached only through incremental steps which on the surface seem harmless.
We are not advocating here the stronger view that progress on code tools or code generators should completely halt (there is huge upside to automating software development). Rather, we note that advances in correctness and capability, while important, don’t necessarily lead to greater long-term safety. Researchers, institutions, conferences, and the field should consider the portfolio of research being done. The proportion that is safety-focused should be brought in line with the amount of risk if the capabilities work on AI4SE succeeds and we reach the wild world where intelligent machines write most code in all parts of society.
We acknowledge that the distinction between this "anti-problem" and some of the other problems mentioned is not always ideally clear. Work on clearer boundaries and trade-offs would be beneficial.
4. Conclusions
This set of conjectures and subproblems is not intended to be comprehensive or definitive. Instead, it is a set of example starting points. Solving these problems alone will not solve AI alignment (and defining what would is an open problem). There are also very real concerns about dual-use risks and about "safety-washing" (Scholl, 2022; Vaintrob, 2023), where safety impact gets mischaracterized. However, solving pragmatic problems can provide stepping stones toward more robust solutions while involving more communities.
We must work to break down taboos around seriously discussing HLMI and superintelligent AGI at SE venues (and all computing fields). We hope increasing numbers of SE researchers will reflect seriously on what the future of SE looks like, and then motivate their research questions not just on short-term progress, but also on its impact in a future of highly-capable AI.
References
- Ba et al. (2022) Yuntao Ba et al. 2022. Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback. arXiv:2204.05862 [cs.CL]
- Bai et al. (2022) Yuntao Bai et al. 2022. Constitutional AI: Harmlessness from AI Feedback.
- Baluta et al. (2020) Teodora Baluta et al. 2020. Scalable Quantitative Verification for Deep Neural Networks. ICSE (2020), 312–323.
- Bergen (2019) Mark Bergen. 2019. YouTube Executives Ignored Warnings, Letting Toxic Videos Run Rampant. Bloomberg (2 April 2019).
- Bird et al. (2023) Christian Bird et al. 2023. Taking Flight with Copilot: Early Insights and Opportunities of AI-Powered Pair-Programming Tools. Queue 20, 6 (jan 2023).
- Bostrom (2017) Nick Bostrom. 2017. Superintelligence. Dunod.
- Chen et al. (2021) Mark Chen et al. 2021. Evaluating Large Language Models Trained on Code.
- Cheng et al. (2022) Ruijia Cheng et al. 2022. ”It would work for me too”: How Online Communities Shape Software Developers’ Trust in AI-Powered Code Generation Tools. ArXiv abs/2212.03491 (2022).
- Christian (2021) Brian R. Christian. 2021. The Alignment Problem: Machine Learning and Human Values. Perspectives on Science and Christian Faith (2021).
- Christiano (2018) Paul Christiano. 2018. Takeoff speeds. tinyurl.com/svtake.
- Christiano et al. (2023) Paul Christiano et al. 2023. Deep reinforcement learning from human preferences. arXiv:1706.03741 [stat.ML]
- Critch and Krueger (2020) Andrew Critch and David Krueger. 2020. AI Research Considerations for Human Existential Safety (ARCHES). ArXiv abs/2006.04948 (2020).
- Dafoe (2018) Allan Dafoe. 2018. AI governance: a research agenda. Governance of AI Program, Future of Humanity Institute, University of Oxford: Oxford, UK 1442 (2018), 1443.
- Dey and Lee (2021) Sangeeta Dey and Seok-Won Lee. 2021. Multilayered review of safety approaches for machine learning-based systems in the days of AI. JSS 176 (2021), 110941.
- Dwivedi et al. (2021) Yogesh K. Dwivedi et al. 2021. Artificial Intelligence (AI): Multidisciplinary perspectives on emerging challenges, opportunities, and agenda for research, practice and policy. International Journal of Information Management 57 (2021).
- Everitt et al. (2018) Tom Everitt, Gary Lea, and Marcus Hutter. 2018. AGI Safety Literature Review. ArXiv abs/1805.01109 (2018).
- Ganguli et al. (2023) Deep Ganguli et al. 2023. The Capacity for Moral Self-Correction in Large Language Models. ArXiv abs/2302.07459 (2023).
- Gartner (2022) Gartner. 2022. Gartner Forecasts Worldwide Low-Code Development Technologies Market to Grow 20% in 2023. Press Release. href.
- Gerasimou et al. (2020) Simos Gerasimou et al. 2020. Importance-Driven Deep Learning System Testing. 2020 IEEE/ACM 42nd ICSE, Companion Proceedings (2020), 322–323.
- Gezici and Tarhan (2022) Bahar Gezici and Ayça Kolukisa Tarhan. 2022. Systematic literature review on software quality for AI-based software. Empirical Software Engineering 27 (2022).
- Grace et al. (2017) Katja Grace et al. 2017. When Will AI Exceed Human Performance? Evidence from AI Experts. CoRR abs/1705.08807 (2017). arXiv:1705.08807
- Guo et al. (2017) Chuan Guo, Geoff Pleiss, Yu Sun, and Kilian Q. Weinberger. 2017. On Calibration of Modern Neural Networks. In International Conference on Machine Learning.
- Guo et al. (2021) Daya Guo et al. 2021. Learning to Complete Code with Sketches. In International Conference on Learning Representations.
- Hendrycks et al. (2021) Dan Hendrycks et al. 2021. Unsolved Problems in ML Safety. CoRR abs/2109.13916 (2021). arXiv:2109.13916 https://arxiv.org/abs/2109.13916
- Hendrycks and Mazeika (2022) Dan Hendrycks and Mantas Mazeika. 2022. X-Risk Analysis for AI Research. ArXiv abs/2206.05862 (2022).
- Hobbhahn (2022) Marius Hobbhahn. 2022. The Defender’s Advantage of Interpretability. href.
- Hubinger (2023) Evan Hubinger. 2023. Bing chat is blatantly, aggressively misaligned. https://www.lesswrong.com/posts/jtoPawEhLNXNxvgTT.
- Hubinger et al. (2019) Evan Hubinger et al. 2019. Risks from learned optimization in advanced machine learning systems. arXiv preprint arXiv:1906.01820 (2019).
- Jakesch et al. (2023) Maurice Jakesch et al. 2023. Co-Writing with Opinionated Language Models Affects Users’ Views. ArXiv abs/2302.00560 (2023).
- Jesse et al. (2023) Kevin Jesse et al. 2023. Large Language models and Simple, Stupid, Bugs. In Proceedings, MSR.
- Juric et al. (2020) Mislav Juric et al. 2020. AI safety: state of the field through quantitative lens. CoRR abs/2002.05671 (2020). arXiv:2002.05671 https://arxiv.org/abs/2002.05671
- Kadavath et al. (2022) Saurav Kadavath et al. 2022. Language Models (Mostly) Know What They Know.
- Keith Wynroe and Sevilla (2023) David Atkinson Keith Wynroe and Jaime Sevilla. 2023. Literature review of Transformative Artificial Intelligence timelines.
- Koch et al. (2021) Jack Koch et al. 2021. Objective Robustness in Deep Reinforcement Learning. CoRR abs/2105.14111 (2021). arXiv:2105.14111 https://arxiv.org/abs/2105.14111
- Ladish and Heim (2022) Jeffrey Ladish and Lennart Heim. 2022. Information security considerations for AI and the long term future. https://tinyurl.com/eaaissec.
- Lahiri et al. (2022) Shuvendu K Lahiri et al. 2022. Interactive Code Generation via Test-Driven User-Intent Formalization. arXiv preprint arXiv:2208.05950 (2022).
- Lakshminarayanan et al. (2016) Balaji Lakshminarayanan, Alexander Pritzel, and Charles Blundell. 2016. Simple and Scalable Predictive Uncertainty Estimation using Deep Ensembles. In NIPS.
- Le et al. (2022) Triet HM Le et al. 2022. A survey on data-driven software vulnerability assessment and prioritization. Comput. Surveys 55, 5 (2022), 1–39.
- Legg and Hutter (2007) Shane Legg and Marcus Hutter. 2007. A Collection of Definitions of Intelligence.
- Liao and Sundar (2022) Qingzi Vera Liao and S. Shyam Sundar. 2022. Designing for Responsible Trust in AI Systems: A Communication Perspective. FAccT (2022).
- Liu et al. (2019) Zhongxin Liu et al. 2019. Automatic Generation of Pull Request Descriptions. ASE (2019), 176–188.
- Lo et al. (2020) Kyle Lo et al. 2020. S2ORC: The Semantic Scholar Open Research Corpus. Association for Computational Linguistics, Online, 4969–4983.
- Martínez-Fernández et al. (2021) Silverio Martínez-Fernández et al. 2021. Software Engineering for AI-Based Systems: A Survey. TOSEM 31 (2021), 1 – 59.
- Maslej et al. (2023) Nestor Maslej et al. 2023. The AI Index 2023 Annual Report. AI Index Steering Committee, Institute for Human-Centered AI, Stanford University.
- Meyer et al. (2019) André N. Meyer et al. 2019. Today Was a Good Day: The Daily Life of Software Developers. IEEE Transactions on Software Engineering 47 (2019), 863–880.
- Michael et al. (2022) Julian Michael et al. 2022. What Do NLP Researchers Believe? Results of the NLP Community Metasurvey. ArXiv abs/2208.12852 (2022).
- Miller (2017) Tim Miller. 2017. Explanation in Artificial Intelligence: Insights from the Social Sciences. Artif. Intell. 267 (2017), 1–38.
- Mohseni et al. (2021) Sina Mohseni et al. 2021. A Multidisciplinary Survey and Framework for Design and Evaluation of Explainable AI Systems. ACM Trans. Interact. Intell. Syst. 11, 3–4, Article 24 (sep 2021), 45 pages. https://doi.org/10.1145/3387166
- Necula (1997) George C Necula. 1997. Proof-carrying code. In Proceedings of the 24th ACM SIGPLAN-SIGACT symposium on Principles of programming languages. 106–119.
- Olah (2022) Chris Olah. 2022. Mechanistic interpretability, variables, and the importance of interpretable bases. tinyurl.com/circuitsthread.
- Oreskes and Conway (2010) Naomi Oreskes and Erik M. Conway. 2010. Merchants of Doubt: How a Handful of Scientists Obscured the Truth on Issues from Tobacco Smoke to Global Warming.
- Ouyang et al. (2022) Long Ouyang et al. 2022. Training language models to follow instructions with human feedback. arXiv:2203.02155 [cs.CL]
- Rokis and Kirikova (2022) Karlis Rokis and Marite Kirikova. 2022. Challenges of Low-Code/No-Code Software Development: A Literature Review. In Perspectives in Business Informatics Research: 21st International Conference on Business Informatics Research, BIR 2022, Rostock, Germany, September 21–23, 2022, Proceedings. Springer, 3–17.
- Roser (2023) Max Roser. 2023. Ai timelines: What do experts in artificial intelligence expect for the future? https://ourworldindata.org/ai-timelines
- Russell et al. (2018) Rebecca L. Russell et al. 2018. Automated Vulnerability Detection in Source Code Using Deep Representation Learning. 2018 17th IEEE International Conference on Machine Learning and Applications (ICMLA) (2018), 757–762.
- Russell (2017) Stuart Russell. 2017. Provably Beneficial Artificial Intelligence. href.
- Russell (2019) Stuart Russell. 2019. Human Compatible: Artificial Intelligence and the Problem of Control.
- Russell et al. (2015) Stuart J. Russell, Dan Dewey, and Max Tegmark. 2015. Research Priorities for Robust and Beneficial Artificial Intelligence. AI Mag. 36 (2015), 105–114.
- sadat Khorram et al. (2020) Faezeh sadat Khorram et al. 2020. Challenges & opportunities in low-code testing. Proceedings of the 23rd ACM/IEEE International Conference on Model Driven Engineering Languages and Systems: Companion Proceedings (2020).
- Saurwein and Spencer-Smith (2021) Florian Saurwein and Charlotte Spencer-Smith. 2021. Automated Trouble: The Role of Algorithmic Selection in Harms on Social Media Platforms. Media and Communication (2021).
- Scholl (2022) Adam Scholl. 2022. Safetywashing. https://tinyurl.com/lwswScholl.
- Sevilla et al. (2022) Jaime Sevilla et al. 2022. Compute Trends Across Three Eras of Machine Learning. arXiv:2202.05924 [cs.LG]
- Shah et al. (2022) Rohin Shah et al. 2022. Goal Misgeneralization: Why Correct Specifications Aren’t Enough For Correct Goals. arXiv:2210.01790 [cs.LG]
- Soares et al. (2015) Nate Soares et al. 2015. Corrigibility. In Workshops at the Twenty-Ninth AAAI.
- Soares and Fallenstein (2017) Nate Soares and Benya Fallenstein. 2017. Agent Foundations for Aligning Machine Intelligence with Human Interests: A Technical Research Agenda.
- Sotala and Yampolskiy (2014) Kaj Sotala and Roman V Yampolskiy. 2014. Responses to catastrophic AGI risk: a survey. Physica Scripta 90 (2014).
- Stein-Perlman et al. (2022) Zach Stein-Perlman et al. 2022. 2022 expert survey on progress in AI. href.
- Stray (2020) Jonathan Stray. 2020. Aligning AI Optimization to Community Well-Being. International Journal of Community Well-Being 3 (2020), 443 – 463.
- Tabachnyk and Nikolov (2022) Maxim Tabachnyk and Stoyan Nikolov. 2022. ML-enhanced code completion improves developer productivity. https://tinyurl.com/googleBlogCodegen.
- Tambon et al. (2021) Florian Tambon et al. 2021. How to certify machine learning based safety-critical systems? A systematic literature review. Automated Software Engineering (2021).
- Tegmark (2017) Max Tegmark. 2017. Life 3.0: Being Human in the Age of Artificial Intelligence.
- Turing (1950) Alan Mathison Turing. 1950. Computing Machinery and Intelligence. Mind 59, 236 (1950), 433–460.
- Vaintrob (2023) Lizka Vaintrob. 2023. Beware safety-washing. https://tinyurl.com/lwswVaintrob.
- Weidinger et al. (2021) Laura Weidinger et al. 2021. Ethical and social risks of harm from Language Models. arXiv:2112.04359 [cs.CL]
- Wiener (1960) Norbert Wiener. 1960. Some Moral and Technical Consequences of Automation. Science 131 3410 (1960), 1355–8.
- Wu et al. (2021) Jeff Wu et al. 2021. Recursively Summarizing Books with Human Feedback.
- Yampolskiy (2022) Roman V Yampolskiy. 2022. AI Risk Skepticism. In Philosophy and Theory of Artificial Intelligence 2021. Springer, 225–248.
- Yang et al. (2021) Jingkang Yang, Kaiyang Zhou, Yixuan Li, and Ziwei Liu. 2021. Generalized Out-of-Distribution Detection: A Survey. ArXiv abs/2110.11334 (2021).
- Zhang et al. (2021) Baobao Zhang et al. 2021. Ethics and Governance of Artificial Intelligence: Evidence from a Survey of Machine Learning Researchers.
- Zhang et al. (2022) Chunyan Zhang et al. 2022. A Survey of Automatic Source Code Summarization. Symmetry 14 (2022), 471.