Two arguments against human-friendly AI

forthcoming in AI and Ethics

Link to paper: https://link.springer.com/article/10.1007/s43681-021-00051-6

Abstract

The past few decades have seen a substantial increase in the focus on the myriad ethical implications of artificial intelligence. Included amongst the numerous issues is the existential risk that some believe could arise from the development of artificial general intelligence (AGI), an as-yet hypothetical form of AI able to perform all the same intellectual feats as humans. This has led to extensive research into how humans can avoid losing control of an AI that is at least as intelligent as the best of us. This ‘control problem’ has given rise to research into the development of ‘friendly AI’, a highly competent AGI that will benefit, or at the very least not be hostile toward, humans. Though my question concerns AI, ethics and issues surrounding the value of friendliness, I want to question the pursuit of human-friendly AI (hereafter FAI). In other words, we might ask whether worries regarding harm to humans are sufficient reason to develop FAI rather than impartially ethical AGI, or an AGI designed to take the interests of all moral patients—both human and non-human—into consideration. I argue that, given that we are capable of developing AGI, it ought to be developed with impartial, species-neutral values rather than those prioritizing friendliness to humans above all else.

Introduction

The past few decades have seen a geometric increase in the number of pages dedicated to the ethical implications of artificial intelligence, and rightly so. Included amongst the numerous issues is the existential risk that some believe could arise from the development of artificial general intelligence (AGI), an as-yet hypothetical form of artificial intelligence able to perform all the same intellectual tasks as humans. This has led to extensive research into how humans can avoid losing control of an AI that is at least as intelligent as the best of us. This is often known as the ‘control problem’, and it has given rise to research into the more specific areas of ‘friendly AI’1 and value-alignment (i.e., the aligning of AGI’s values, or at least its behaviors, with those of humans). ‘Friendly AI’ refers to highly competent AGI that will benefit, or at the very least not be hostile toward, humans. In this paper, I want to question the pursuit of human-friendly AI (hereafter FAI).

Worries regarding the failure to successfully develop FAI have led to a significant focus on issues of ‘motivation control’ (e.g., via machine ethics,2 value-alignment,3 Oracle AI,4 etc.). But it remains reasonable to ask whether worries regarding harm to humans are sufficient reason to develop FAI rather than impartially ethical AGI, or AGI designed to take the interests of all moral patients into consideration. I will argue that, given that we are capable of developing AGI, it ought to be developed with impartial, species-neutral values rather than those prioritizing friendliness to humans above all else.5

Before proceeding, some brief definitions will help set the stage.

  • By ‘artificial general intelligence’ or ‘AGI’ I mean a non-biological intelligence that possesses all of the same intellectual capabilities as a mentally competent adult human.6
  • By ‘artificial superintelligence’ or ‘ASI’ I intend a non-biological intelligence that far outstrips even the brightest of human minds across all intellectual domains and capacities.7
  • By ‘Friendly AI’ or ‘FAI’ I intend artificial general intelligence which will benefit or, at the very least, not be hostile toward or bring harm to human beings. In particular, FAI will make decisions based upon the assumption that human interests alone are intrinsically valuable.8
  • By ‘Impartial AI’ or ‘IAI’ I intend AGI that is developed so that its decision-making procedures consider the interests9 of all moral patients (i.e., any being that can be harmed or benefitted10) to be intrinsically valuable. Moreover, in being truly impartial, such procedures will be species-neutral rather than human-centered in nature. That is, IAI will not favor any entity simply because it’s a member of a particular species or exists in the present.
  • By a ‘moral patient’ I mean any entity that is worthy of any level of moral consideration for its own sake. While there might be significant disagreement regarding exactly which entities are worthy of such consideration, I take it to be uncontroversial that the capacity to suffer is a sufficient condition for being worthy of consideration. So, at minimum, all beings that can suffer will enter into the moral calculus of IAI.

Assumptions

For the purposes of this paper, I will assume the following:

  1. We are capable of creating AGI. We are, or soon enough will be, capable of creating artificial general intelligence.11
  2. AGI will become ASI. The emergence of AGI will eventually, and possibly quite quickly, lead to artificial superintelligence (ASI).12
  3. We can create either impartial AI (IAI) or human-friendly AI (FAI). AGI will learn, accept, retain and act upon the values that it’s given. Because of its vast intelligence, together with its lack of emotional and cognitive shortcomings, ASI will be able to discern and then act according to either (a) impartial and species-neutral ethical values13; or (b) human-friendly ethical values (i.e., it will either give moral consideration only to humans or it will favor humans in any situation in which there are conflicting interests).14
  4. IAI may pose a substantial, even existential threat to humans. An ethically impartial artificial intelligence may, in the course of acting on impartial values, pose a substantial, even existential threat to human beings.15

While none of these assumptions are uncontroversial, they are accepted in some form or other by those searching for ways to align the values of AGI and humans as well as those who fret about the control problem more generally.

The central question of this paper is as follows: given the conjunction of these assumptions, would we be morally obligated to create IAI rather than FAI? Or, would it be morally permissible for humans to create FAI, where FAI [1] is generally intelligent, [2] will become superintelligent, [3] is programmed in accordance with, and even causally determined by, values focused exclusively on friendliness to humans; even if [4] such values are not consistent with an impartial, non-speciesist perspective16—and may, as a result, lead to significant suffering for non-human moral patients17 or reduce the possibility of other intelligent, sentient species that might have otherwise evolved from those we harm or extinguish? I will argue that, given the assumptions listed above, we are obligated to create IAI rather than FAI, even if IAI poses a significant, even existential, threat to humans.

I will provide two arguments for favoring IAI over FAI, each based upon our moral responsibilities to human beings as well as non-human moral patients. The first (Sect. 3) will focus upon our responsibilities to actual, currently existing beings while the second (Sect. 4.2) will focus upon our responsibilities to possible18 beings other than humans. Before doing so, I will expand upon the distinction between FAI and IAI, then provide a brief discussion of speciesism to clarify the species-neutral approach I will be defending.

Friendly AI (FAI) vs ethically impartial AI (IAI)

Friendly AI is artificial general (or, super-) intelligence that will respect the interests of, or at the very least, not be hostile toward humanity. It’s likely to be the case that friendliness is a necessary condition for humanity to enjoy the potential benefits that a general or superintelligence might provide (e.g., ending world hunger, curing diseases, solving the challenges of global warming and issues of social organization more generally). Of course, none of this is beneficial for humans if ASI views us as something to be rid of. A being of such vastly superior intelligence, if given goals that turn out to be inconsistent with human interests, could view us as competition or mere obstacles to be done away with.19 As intelligence is the primary tool by which humans have come to their place at the top of the terrestrial hierarchy, the superior intelligence of a hostile ASI will make our survival highly unlikely and our thriving even less so.

To avoid such disastrous results, AGI might be developed in a way that prohibits or disables its capacity to harm humans, or in a way that ensures it does what humans command (other than commands whose execution would bring harm to humans). There are at least two different things that researchers might mean when they speak of ‘friendly AI’. First, one might intend AI that will act according to what humans believe to be in the interest of humans. In the second case, one might intend AI that will act according to what is, in fact, in the interest of humans. These are certainly different, as it might be that we are mistaken about what is in our best interest.20 And while there are problems with either of these approaches, even apart from there existing a possible alternative in IAI, this is not the place to develop them in any detail.21

As opposed to FAI, the idea of ‘ethically impartial’ AI (or, IAI) assumes, first, that there exists a set of impartial, species-neutral moral facts.22 Second, due to its vast intelligence, IAI will have the ability to discover and act according to these facts, in addition to having a far superior ability to accurately calculate the consequences of any action. Third, because of its lack of akratic emotions, cognitive biases and weakness of will, together with its superintelligence and impartial, non-speciesist goal-orientation, it will be far more likely to act effectively according to this set of moral facts than humans likely ever could.
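To make the contrast concrete, consider the following deliberately simplified sketch. It is not drawn from the literature discussed here: the class, the interest scores and the additive aggregation rule are hypothetical assumptions introduced purely for illustration. The structural point is that an FAI-style procedure counts non-human interests only insofar as they bear on human interests, while an IAI-style procedure counts the interests of every affected moral patient directly.

```python
# Toy illustration only: the classes, scores and aggregation rule are
# hypothetical assumptions, not a specification of FAI or IAI.

from dataclasses import dataclass

@dataclass
class Patient:
    name: str
    is_human: bool
    interest_satisfaction: float              # effect of the action on this patient
    instrumental_value_to_humans: float = 0.0  # spill-over effect on human interests

def fai_score(action_effects):
    """Human-friendly scoring: non-humans count only instrumentally."""
    return sum(p.interest_satisfaction if p.is_human
               else p.instrumental_value_to_humans
               for p in action_effects)

def iai_score(action_effects):
    """Impartial scoring: every moral patient's interests count directly."""
    return sum(p.interest_satisfaction for p in action_effects)

# A hypothetical action that mildly benefits humans but severely harms
# a sentient non-human population.
effects = [
    Patient("human community", True, +2.0),
    Patient("wild boar population", False, -8.0, instrumental_value_to_humans=+0.5),
]

print(fai_score(effects))  # +2.5: the action looks good to the FAI-style scorer
print(iai_score(effects))  # -6.0: the action looks bad to the IAI-style scorer
```

On these toy numbers, an action that mildly benefits humans while severely harming a sentient non-human population is endorsed by the first procedure and rejected by the second; that asymmetry is the whole difference at issue.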

Existing comparable literature

Though there is little, I believe, that has been written about Impartial Artificial Intelligence, there do exist views of FAI that may be consistent with IAI depending on how they are filled out. In particular, I have in mind Muehlhauser and Bostrom [12] as well as Yudkowsky [29].23 According to the former:

The problem of encoding human (or at least humane) values into an AI’s utility function is a challenging one, but it may be possible. If we can build such a ‘Friendly AI,’ we may not only avert catastrophe but also use the powers of machine superintelligence to do enormous good.24

Yudkowsky [29] characterizes the “problem statement of Friendly AI” as to “[e]nsure that the creation of a generally intelligent, self-improving, eventually superintelligent system realizes a positive outcome”.25

The inclusion of ‘humane’ and ‘enormous good’ in the Muehlhauser and Bostrom [12] characterization is important but, depending on how one interprets these terms, this characterization may or may not imply a ‘species-neutral’ approach to the development of AGI. One way to read this paper is as an argument for why we ought to interpret these terms so as to have this implication. The same can be said for the use of ‘positive outcome’ in Yudkowsky [29].

Simply put, if one includes a commitment to species-neutrality in one’s view of FAI then the distinction between FAI and IAI collapses. Nonetheless, my point still stands insofar as there would then be two distinct interpretations of FAI (exclusively human friendly and species-neutral) and I should be read as supporting the more inclusive, species-neutral interpretation.

I turn now to a brief discussion of the moral status of non-humans to specify what I mean by a ‘species-neutral’ approach to ethics.

Speciesism and the moral standing of non-humans

The classic presentation of, and objections to, speciesism can be found in Peter Singer’s Animal Liberation.26 He characterizes speciesism as “a prejudice or attitude of bias in favor of the interests of members of one’s own species and against those of members of other species”.27

I will understand speciesism—meant to be analogous with other ‘isms’ such as racism and sexism—as favoring members of one’s own species on the basis of their being members of one’s own species.

I will also understand the rejection of speciesism as the acceptance of the view that sentient non-humans have at least some intrinsic moral status. That is, sentient non-humans have interests that are morally significant in their own right and not simply as instrumental in furthering human interests.

To clarify the position I’m proposing, it will be helpful to appeal to DeGrazia’s [7] distinction between Singer’s ‘equal-consideration framework’ and the ‘sliding-scale model’.28 According to the equal-consideration approach:

No matter what the nature of the being, the principle of equality requires that its suffering be counted equally with the like suffering—insofar as rough comparisons can be made—of any other being.29

As it concerns pain, all animals are equal on the equal-consideration view. But, as Singer notes, equal consideration of suffering does not imply the equal value of lives. Certain characteristics can make a being’s life more valuable even if they do not make that being’s suffering of greater significance. He suggests self-awareness, the capacity to think about one’s future, the capacity for abstract thought, meaningful relations with others and having goals amongst the relevant “mental powers”.30 So, on Singer’s equal-consideration view, while all animals are equal when it comes to pain, their lives need not be of equal value.

On the ‘sliding-scale’ approach:

Beings at the very top have the highest moral status and deserve full consideration. Beings somewhat lower deserve very serious consideration but less than what the beings on top deserve. As one moves down this scale of moral status or moral consideration, the amount of consideration one owes to beings at a particular level decreases. At some point, one reaches beings who deserve just a little consideration. Their interests have direct moral significance, but not much, so where their interests conflict with those with much higher moral status, the former usually lose out.31

This sliding-scale approach is what I intend by an ‘impartial, species-neutral approach’ to moral consideration, both as it concerns the significance of suffering and as it concerns the value of lives.32 Like Singer’s equal-consideration approach, this view rejects speciesism insofar as it appeals to morally relevant features rather than species membership when grounding moral judgments with regard to both the treatment and the consideration owed to beings. But it does not require that all beings be given equal consideration when it comes to pain.
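The sliding-scale idea can be given a toy operationalization along the following lines. The feature list, the weights and the linear combination below are entirely hypothetical assumptions of mine, not part of DeGrazia’s or Singer’s views; what matters is only that the degree of consideration is computed from morally relevant capacities and that species membership never enters the computation.

```python
# Illustrative only: the feature list, weights and linear combination are
# hypothetical; the point is that species membership never appears.

def consideration_weight(capacities: dict) -> float:
    """Return a degree of moral consideration in [0, 1] from morally
    relevant capacities, graded rather than all-or-nothing."""
    if not capacities.get("sentience", 0.0):
        return 0.0  # a non-sentient entity is not a moral patient on this sketch
    weights = {
        "sentience": 0.5,           # capacity to suffer / feel pleasure
        "self_awareness": 0.2,
        "future_directedness": 0.2,
        "social_bonds": 0.1,
    }
    score = sum(w * capacities.get(k, 0.0) for k, w in weights.items())
    return min(score, 1.0)

# Two hypothetical beings, described only by capacities (0-1), never by species.
adult_human = {"sentience": 1.0, "self_awareness": 1.0,
               "future_directedness": 1.0, "social_bonds": 1.0}
pig = {"sentience": 1.0, "self_awareness": 0.5,
       "future_directedness": 0.3, "social_bonds": 0.7}

print(round(consideration_weight(adult_human), 2))  # 1.0
print(round(consideration_weight(pig), 2))          # 0.73
```

Any particular choice of features and weights is, of course, contestable; the sketch is meant only to show what it is for consideration to be graded without being species-based.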

Just as membership in a sub-class of humans picked out by superficial features such as gender, skin color or age is not morally relevant in itself, membership in a species is likewise not morally relevant in itself. On the other hand, features that contribute to one’s ability to experience or enjoy one’s life more fully—such as intelligence, awareness of one’s past and future, or meaningful communication with others—are morally relevant and also serve to ground our commonsense moral judgments. Features of this sort, as well as the capacity to suffer, are clearly relevant to how a being ought to be treated in ways that its biological or genetic makeup is not.

As for those who still believe that our biological humanity (or membership in the species Homo sapiens) makes us morally superior to other creatures, consider a non-human species with all the same intellectual, perceptual and phenomenological features possessed by typical humans. The members of this species are self-aware and rational; they can plan for the future and form goals as well as hopes; and they feel pleasure and pain of various physical, emotional and intellectual sorts. The extreme speciesist is committed to the view that such beings are not worthy of consideration (or are worthy of less consideration) simply because they are not human. I take it that this view is untenable.

If we accept that certain cognitive features allow us to make principled moral distinctions in choosing differential treatment of certain beings, then we should accept that some sentient non-humans deserve an elevated level of consideration relative to others in light of evidence that some non-humans possess self-awareness, altruism, tool and language-use, the capacity to reciprocate, varying forms of communication, empathy and a sense of fairness, among other morally relevant features.33

With this said, I will proceed on the assumption that the view that only humans are worthy of moral consideration, simply because of their ‘humanness’, is false, and that a being ought to be given moral consideration on the basis of its morally relevant features.

The argument from non-human sentients

  1. As moral agents, humans have responsibilities to all sentients (i.e., moral patients) that can be affected by our actions.
  2. Given humans are capable of creating more than one kind of AI (and given that we will create at least one kind), if one of these will better respect the interests of all sentients, then ceteris paribus we ought, morally, to create that kind of AI.
  3. IAI will, ceteris paribus, better respect the interests of all sentients than FAI.
  4. Humans are capable of creating IAI or FAI (this is assumption 3 in the introductory section above).
  5. ∴ Given the option to create either IAI or FAI, ceteris paribus we ought, morally, to create IAI.
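The deductive core of this argument is straightforward, and for readers who want to check its validity it can be written out formally. The following is a minimal sketch in Lean 4 in which the moral content is abstracted into bare propositions; the proposition names are my own placeholders, and premise 1 is left implicit since it grounds premise 2 rather than entering the inference directly.

```lean
-- Minimal sketch of the argument's logical form; proposition names are
-- illustrative placeholders, not the author's.
example (CanCreateIAI IAIBetterRespects OughtCreateIAI : Prop)
    -- Premise 2, instantiated to IAI: if we can create it and it will better
    -- respect the interests of all sentients, then (ceteris paribus) we
    -- ought, morally, to create it.
    (p2 : CanCreateIAI ∧ IAIBetterRespects → OughtCreateIAI)
    -- Premise 3: IAI will better respect the interests of all sentients.
    (p3 : IAIBetterRespects)
    -- Premise 4: we are capable of creating IAI.
    (p4 : CanCreateIAI) :
    -- Conclusion 5: ceteris paribus, we ought to create IAI.
    OughtCreateIAI :=
  p2 ⟨p4, p3⟩
```

What the formalization makes plain is that the philosophical work is done by premises 2 and 3; the inference itself is uncontroversial.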

Premise 1 (i.e., humans have responsibilities toward all sentients that can be affected by our actions): Accepting this premise does not require accepting equal treatment or equal consideration of all moral patients. It only requires that all moral patients are worthy of moral consideration and that their interests ought to be respected. In fact, it is consistent with premise 1 that humans are exceptional and worthy of greater moral consideration, but this doesn’t imply that we are to be favored, or even protected, at all costs! The sheer magnitude of some costs to non-human moral patients may override the justification for protecting human interests, especially when the interests in question are uncontroversially trivial in nature (e.g., our interest in sport-killing, palate-pleasures, or believing ourselves to be dominant and superior).

Premise 2 (If we can create something that will better respect the interests of all sentients, then, all else being equal, we ought morally to do so): This follows from premise 1. Suppose humans are on the verge of creating AGI. Now suppose that we have a choice between creating a particular version of AGI which will wreak havoc and cause massive suffering to all non-humans and a different version which will leave things as they are (i.e., it will cause no more suffering than currently exists for both humans and non-humans). I take it that, all else being equal, we clearly ought to opt for the latter.

Premise 3 (IAI will better respect the interests of all sentients than FAI): To support this premise I will begin by explaining how FAI will not be equivalent to IAI. I will then explain how IAI will better respect the interests of all sentients. This will involve both how IAI will, itself, better respect all relevant interests as well as the positive consequences this fact will have on human actions. More explicitly, the development of IAI will lead to better consequences for all moral patients collectively because: [a] IAI will better respect the interests of all moral patients than FAI, and [b] the impact of IAI upon human behavior will have better consequences for non-humans than the impact of FAI upon human behavior.

First, note that FAI’s consideration of non-human interests will be restricted to cases in which such consideration is, or such interests are, instrumentally valuable. Otherwise, such interests will be disregarded entirely. This is implied by the notion of artificial intelligence that is solely focused upon friendliness to humans. While there may be cases where friendliness to humans implies granting consideration to non-humans, this will only be because such consideration is in the interest of humans. Were FAI to grant non-humans consideration for their own sake, it would simply collapse into IAI.34

While there are sure to be many cases where the interests of non-humans coincide with those of humans (e.g., cases where the value of biodiversity, and its importance to humans, leads a purely rational FAI to preserve and respect some species that are otherwise dangerous to humans), there are countless others in which this is unlikely. For example, FAI with the extraordinary capabilities often attributed to AGI/ASI35 could manufacture alternative ways of preserving the human value of biodiversity while extinguishing some sentient species that are dangerous to humans. For instance, it might eradicate aggressive species of bears, dogs, lions, tigers, rhinos, sharks, snakes, primates, etc. and replace them with non-aggressive versions (or, to preserve the ‘food chain’, versions that are only aggressive toward their natural, non-human prey).36 Such cases do not constitute instances where human interests are consistent with those of existing non-human sentients.

While the replacement of aggressive species with non-aggressive versions would certainly benefit humans, it would not constitute consideration of the interests of such species. In fact, preserving non-aggressive versions of such species exemplifies the view that their consideration is merely instrumentally valuable. For this reason, FAI would not, in fact, amount to, or be consistent with, IAI.

Second, being imbued with impartial moral values will make IAI less likely than both humans and FAI to harm sentient non-humans to protect relatively trivial interests of humans (e.g., ‘palate-pleasures’, ‘sport-killing’, feelings of superiority and dominance, etc.). It will also be more likely to benefit non-humans directly as well as discover and pursue strategies for optimizing the interests of all species simultaneously whenever possible.

Moreover, FAI, being human-centric in its behavior, may also encourage and reinforce already widespread human tendencies to disregard or trivialize non-human interests. This is because the awe and deference with which we are likely to regard the actions and commands of FAI are liable to lead us to see such actions as justified and, in turn, to exacerbate human harm to non-humans.

Numerous classic studies suggest a strong human tendency to defer to authority and to conform (e.g., the Stanford prison experiment, Milgram’s obedience studies, Asch’s conformity experiments). At the Nuremberg trials, many defendants likewise claimed that they were only ‘following orders’. A being as intelligent and powerful as FAI is at least as likely as any charismatic human to induce a sense of authority, even awe, in us as it pursues its goals. And given that convincing humans to consider non-human interests would, all else being equal, yield consequences preferable to those of humans remaining species-centered, it seems highly likely that IAI would convince humans to do so. And if charisma is deemed to be a necessary condition for persuading humans, AGI will acquire charisma as well.

Relatedly, it’s important not to underestimate or overlook FAI’s (or any ASI’s) powers of persuasion. By logical influence, effective appeals to emotional weakness, threats, brute force or some combination, FAI will convince us to aid in the achievement of its goals. Just as AGI/ASI would employ inanimate resources in the pursuit of its goals, there’s no good reason to believe that it wouldn’t also use living resources (especially relatively competent ones such as ourselves) for the same reasons.37 Such powers are likely to lead to our deferring to FAI when making moral choices. Just as a child tends to imitate the adults it observes, we are likely to imitate and internalize the values of an FAI, the intelligence of which will surpass our own to an extent far beyond that of an adult relative to a child. Of course, if the AI’s values are human-centric this will lead to widespread dismissal of non-human interests, or consideration only where it’s instrumentally valuable for us.

On the other hand, IAI, as impartial, will not reinforce these shortcomings. Observing a superior moral actor (or one that can surely convince us that it’s a superior moral actor) is likely to encourage better practices in humans.

Much of what was said in support of premise 3 can be used here to support the likelihood that IAI will also affect our beliefs as well as our behavioral tendencies. It’s again important to avoid overlooking the persuasive power IAI is sure to possess. Its immense intelligence will allow it to quickly and easily persuade us of whatever it wants to persuade us of.38 In fact, it may also employ force to persuade us to act in accordance with its species-neutral goals. As noted above, if it would use technology and inanimate resources to pursue its goals, it’s hard to fathom why it wouldn’t also persuade (or force) us to assist in the achievement of its goals. As is the case for FAI, IAI’s powers of persuasion are likely to lead to our deferring to it when making moral choices. Just as children tend to imitate the adults they observe, we are likely to imitate and, over time, internalize the values of IAI as its intelligence and authority become ever clearer. This will more than likely result in far greater consideration for non-human moral patients than would result from the FAI scenario.

While the foregoing focuses upon our responsibilities to actual beings beyond humans, I turn now to our responsibilities to possible39 beings other than humans.

The argument from future species

Even if one rejects the claim that any currently existing non-humans are worthy of moral consideration, one ought to accept that any species morally equal, or even superior, to humans ought to be granted, at the very least, consideration equal to that which we grant to humans. This requires only the weakest anti-speciesist position, and one that I expect will be very widely accepted.40

Preliminaries

Evolutionary theory suggests that humans exist on a biological continuum that just so happens to have us at the intellectual and, perhaps, moral peak for the time being. Darwin himself believed that the difference in ‘mental powers’ between humans and non-humans is a matter of degree rather than kind.

If no organic being excepting man had possessed any mental power, or if his powers had been of a wholly different nature from those of the lower animals, then we should never have been able to convince ourselves that our high faculties had been gradually developed. But it can be clearly shewn that there is no fundamental difference of this kind. We must also admit that there is a much wider interval in mental power between one of the lowest fishes…and one of the higher apes, than between an ape and man; yet this immense interval is filled up by numberless gradations…There is no fundamental difference between man and the higher mammals in their mental faculties.41

This notion of a biological continuum raises a further issue for the tendency to overemphasize friendliness to humans in AGI development. I will argue that if the reasons put forward against speciesism in Sect. 2.1 succeed in the very weakest sense, then there are further reasons for favoring IAI over FAI beyond those given in Sect. 3. Moreover, if my reasoning is correct, there is a less obvious implication—that IAI ought to decide whether to become an existential threat to humans or a more limited threat to the freedom of humans—that, as I will argue, also follows.

Regarding the latter point, I will argue that IAI should be developed to determine the likelihood of a ‘superior’ or even equal species emerging from currently existing (or soon-to-exist) species. It ought also to be designed to determine the global moral value (i.e., the value to all moral patients, including itself) of such species emerging, together with the chance of their emerging. Finally, IAI ought to be designed to determine whether, if the continued existence of humans lessens the chances of superior species emerging, eliminating humans would be morally preferable to allowing our continued existence but with sufficiently restricted freedom of choice and action.

The argument

Given our continued existence, humans are sure to destroy (or contribute to the destruction of) the majority of species on the planet.42 This is nothing more than a well-founded extrapolation from the extinctions we’ve already contributed to together with the likelihood of the continuation of the reasons why these extinctions occurred (e.g., human singlemindedness, shortsightedness, weakness of will, time preference, temporal discounting, selfishness, etc.). In so doing, humans are also, in essence, destroying all of the species that would have otherwise evolved from each of these extinguished species. In fact, it’s quite possible that we will, or maybe already have, extinguished species that would have given rise to one or more species that would have been morally superior to ourselves (according to any plausible moral metric). For whatever species-independent features one might believe underlie the supposed current moral superiority of humans, there is no good reason to believe that such features could not evolve, even to a greater degree, in non-humans. Nonetheless, the above-noted widely shared human features (i.e., singlemindedness, shortsightedness, etc.) suggest that even if humans could calculate the likelihood that some of these species would be morally superior to themselves, they are unlikely to submit to the relevant changes that would be required to allow for the emergence of such species.

With that said, epistemic and cognitive limitations suggest that humans are not in a position to calculate the likelihood of such species emerging. On the other hand, if we could develop something that could [1] calculate whether any such species would be morally superior (or equal) to humans, [2] calculate the likelihood of such species emerging, and [3] act impartially on the basis of such calculations, then we ought to do so. Note that this implication holds because of the weakest non-speciesist version of premise 1 in the argument in Sect. 3 (i.e., as moral agents, humans have responsibilities to all moral patients that can be affected by our actions) as well as for the very reasons that one might believe humans to be currently morally superior to non-humans (e.g., rationality, greater future-directed preferences, capacity for complex acts of communication, etc.). Of course, IAI could accomplish [1–3]. At the very least, it would be far more likely to be able to do so than humans (and far more likely to actually do so than FAI).

Beyond this, IAI should be created to decide whether humans continue to exist at all, and if it decides that we do, it should also decide what kind of latitude (e.g., freedom, privacy, property, etc.) we are to be allowed. This is consistent with premise 1 from my argument in Sect. 3 for the following reasons:

  1. IAI will know (or, eventually come to know) the impartial, species-neutral moral facts.
  2. IAI will be capable of coming to possess a superior understanding of adaptation and natural selection. This could be accomplished by way of training via relevant materials (e.g., raw data, the internet, scientific journals, Darwin’s On the Origin of Species, the ability to run countless simulations, etc.) together with its superior reasoning capabilities.
  3. As impartial, IAI will be motivated to possess and apply this understanding of adaptation and natural selection on the basis of its judgment that morality is species-neutral and that, if this judgment is correct, future beings are morally relevant.
  4. Given 1 and 2, IAI will be well-positioned (and far better so than humans) to determine whether the species that may evolve from those we are likely to endanger will be equal or morally superior to humans.

Given 1–4, IAI ought to be created in order to decide whether to become an existential threat to human beings, or merely a threat to the freedom of humans.43 This claim is distinct from the claims in the previous section insofar as there I argued that we ought to create IAI rather than FAI. Here I’m making the further claim that not only should we opt for IAI rather than FAI, but we also ought to develop an impartial ASI so that it can decide what ought to be done with us.44
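To give a sense of the kind of comparison being asked of IAI here, the sketch below is a toy expected-value calculation. The probabilities, the values and the use of a simple probability-weighted sum as the species-neutral aggregation rule are all invented assumptions of mine; nothing in the paper specifies how such a calculation would actually be carried out.

```python
# Purely illustrative numbers and an assumed expected-value rule; nothing
# here is a claim about what IAI would actually compute or decide.

def expected_value(outcomes):
    """Probability-weighted, species-neutral moral value of a policy."""
    return sum(p * v for p, v in outcomes)

# Each policy maps to (probability, value) pairs over possible futures, where
# 'value' is meant to include the interests of all moral patients, present
# and future (including species that might yet evolve).
policies = {
    "unrestricted human latitude": [
        (0.7, 40.0),   # humans flourish; many candidate lineages extinguished
        (0.3, 15.0),   # humans flourish; a few candidate lineages survive
    ],
    "restricted human latitude": [
        (0.6, 65.0),   # humans persist with less freedom; most lineages preserved
        (0.4, 45.0),   # humans persist with less freedom; some lineages preserved
    ],
}

for name, outcomes in policies.items():
    print(f"{name}: {expected_value(outcomes):.1f}")
# unrestricted: 0.7*40 + 0.3*15 = 32.5
# restricted:   0.6*65 + 0.4*45 = 57.0
```

On these made-up numbers the restricted-latitude policy wins; with different numbers it obviously need not. The sketch is only meant to make vivid the claim that such a comparison is, in principle, calculable by a sufficiently capable impartial agent.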

Why create AGI at all?

While one might agree that IAI is morally preferable to FAI, one might still question the permissibility of developing AGI to begin with. In other words, while I’ve been assuming that we can and will create AGI, one might reject the idea that we ought to do so.45

I must admit that I am somewhat sympathetic to this objection. In fact, assuming that my thesis is correct, it seems that creating IAI where the interests of all sentients are taken into consideration—rather than just humans—will be far more difficult. Assuming that we eventually come upon the technical expertise to develop such a thing, how are we to know just how to balance the interests of all sentient beings, or to develop artificial beings with the capacity and direction to do so? I take it that it isn’t a rare occurrence for humans to be stumped by how best to balance their own interests with those of other humans who will be affected by their actions—even when they have a sincere desire to do so. Given this, how are we to know how best to balance the interests of all moral patients such that any relevant technical expertise could effectively be put to use?

Nevertheless, whether or not we ought to create AGI (be it IAI, FAI or any other AGI for that matter), I expect that humanity will, in fact, do so if it can. The reasons are many, but they surely include the pursuit of profit, power (e.g., military), and simple scientific curiosity.46

But with all this said, given that intelligence is, at the very least, the primary reason for the emergence of science and all of its innovations, it’s important to keep in mind the range of potential benefits that a superior intelligence might provide (e.g., curing diseases, resolving the dangers of global warming, addressing the erosion of democracy, fostering an environment more conducive to the harmony of interests overall, as well as a plethora of benefits that may be beyond our powers of conceiving). If our understanding of intelligence is adequate to develop sufficiently beneficial AGI then, all things considered, the foregoing provides two arguments for developing the sort of AGI that will be suitably impartial. And if a suitably impartial AGI can calculate the likelihood of future species that are morally superior to humans then, morally speaking, we have reason—one might even say that we have a duty—to develop such an AGI.

And while one might respond by claiming that we humans have a right of self-defense and therefore a right to not produce that which will potentially lead to our destruction or a significant reduction in our freedoms, it should be noted that the possession of rights doesn’t, by itself, imply that such rights are absolute or limitless. Presumably, there is some point at which the amount of good that is likely to result from an action swamps the rights that might be overridden by that action.

Finally, if AGI is to be developed recklessly, without a sufficient understanding of the workings of artificial intelligences, then we’ll be left hoping that, somewhere in an accessible part of the universe, a more enlightened species has developed AGI in time to reverse the inevitable damage.

Conclusion

I’ve argued that given certain assumptions regarding artificial general intelligence, as well as certain facts about human beings, AGI development ought to aim toward impartial AGI rather than a human-centric sort, the latter of which dominates current research and literature on AI and existential risk. My reasons rest upon [1] the claim, argued for in Sect. 2.1, that humans are not the only beings worthy of moral consideration, and [2] the fact that humans have likely destroyed and are likely to destroy species that could very well evolve into species that are morally equal, or even superior, to ourselves (Sect. 4). So if it turns out that humans are as special as we seem to think that we are, and if we are, in fact, headed toward the development of AGI, then we have very good reason to develop an AI that is impartial in its moral orientation so that we might be more likely to facilitate beings with equivalent or improved versions of just this sort of specialness. At the very least, the issue of exactly whose interests should be given consideration, and why, should receive more attention than it currently receives.

Notes

  1. See, for example, Yudkowsky [27].
  2. See, for example, Tarleton [22], Allen et al. [1], Anderson and Anderson [2], and Wallach et al. [26].
  3. See, for example, Omohundro [16], Bostrom [4], ch. 12; Taylor et al. [23], Soares [21], and Russell [18].
  4. See Armstrong et al. [3] and Bostrom [4], pp. 177–181.
  5. As an example of a company aiming at the latter, see https://openai.com/charter/.
  6. While ‘intelligence’ is notoriously difficult to define, Russell [18], p. 9 claims that entities are intelligent “to the extent that their actions can be expected to achieve their objectives”. According to Tegmark [24], p. 50, intelligence is the “ability to accomplish complex goals”. And Yudkowsky [25]: intelligence is “an evolutionary advantage” that “enables us to model, predict, and manipulate regularities in reality”.
  7. Central to explaining AGI’s move to ASI is ‘recursive self-improvement’ described in Omohundro [14].
  8. This is consistent with Yudkowsky [27], p. 2, according to which: “The term ‘Friendly AI’ refers to the production of human-benefiting, non-human-harming actions in Artificial Intelligence systems that have advanced to the point of making real-world plans in pursuit of goals”.
  9. With ‘considers the interests’ I’m anthropomorphizing for simplicity. I expect it to be a matter of controversy whether AGI of any sort can consider the interests of anything whatsoever.
  10. See Regan [17], chapter 5 for a discussion of the notions of ‘moral patient’ and ‘moral agent’.
  11. For opinions regarding when AGI will be attained see Bostrom [4], pp. 23–24 and Müller and Bostrom [13].
  12. See, for example, Bostrom [4], Kurzweil [11], Yudkowsky [7], Chalmers [5], Vinge [25], Good [9]. There are differing views on the timelines involved in the move from AGI to ASI. For a discussion of the differences between ‘hard’ and ‘soft takeoffs’ see, for example, Bostrom [4] chapter 4 (especially pp. 75–80), Yudkowsky [25], Yudkowsky [30], and Tegmark [24], pp. 150–157.
  13. IAI may favor particular species if species-neutral values dictate favoring some species over others. For example, it may be the case that while all animals are worthy of moral consideration, some species are worthy of a greater level of consideration than others.
  14. Of course, another possibility is that AGI develops hostile values in which case issues of human and non-human interests are likely moot.
  15. Of course, it should be noted that while IAI may not be consistent with FAI, it is at least possible that IAI will be consistent with FAI. I take it that we are not in a position to know which is more likely with any degree of certainty.
  16. The term ‘speciesism’, coined by Ryder [19], is meant to express a bias toward the interests of one’s own species and against those of other species.
  17. By ‘moral patient’ I mean anything which is sentient or conscious and can be harmed or benefitted. A moral patient is anything toward which moral agents (i.e., those entities that bear moral responsibilities) can have responsibilities for its own sake. For present purposes, I will take the capacity to suffer as a reasonable sufficient (and possibly necessary) condition for being a moral patient.
  18. By ‘possible’ here I don’t intend a distant, modal sense according to which there exists some possible world in which the relevant beings exist. I mean that, in this world, such beings could very well actually exist in the future given that we don’t exterminate the preceding species or beings.
  19. Even if the goals, as specified, are consistent with human interests, ASI might take unintended paths toward the accomplishing of these goals, or it may develop subgoals (or, instrumental goals) that are ultimately inconsistent with human interests. For the latter issue, see Omohundro [14, 15] and Bostrom [4], ch. 7.
  20. I acknowledge that there is a debate to be had regarding what is ‘in the interest’ of a species. Nonetheless, I do not see the plausibility of my thesis turning on the choices one might make here.
  21. In terms of FAI based upon values we believe to be consistent with human interests, the main problem involves the widely discussed ‘unintended consequences’. The worry stems from our inability to foresee the possible ways in which AGI might pursue the goals we provide it with. Granting that it will become significantly more intelligent than the brightest humans, it’s unlikely that we’ll be capable of discerning the full range of possible paths cognitively available to AGI for pursuing whatever goal we provide it. In light of this, something as powerful as AGI might produce especially catastrophic scenarios (see, for example, Bostrom [4] ch. 8 and Omohundro [15]). As for FAI based upon what are, in fact, human-centric values, an initial problem arises when we consider that what we believe is in our interest and what is actually in our interest might be quite distinct. If so, how could we possibly go about developing such an AI? It seems that any hopeful approach to such an FAI would require our discovering the correct theory of human wellbeing, whatever that might happen to be. Nonetheless, for the purposes of this paper I want to grant that we are, in fact, capable of developing such an objectively human-friendly AI.
  22. By ‘a set of impartial, species-neutral moral facts’ I mean simply that, given the assumption that the interests of all moral patients are valuable, there is a set of moral facts that follows. Basically, there is a set of facts that determines rightness and wrongness in any possible situation, given the moral value of all moral patients, where this value is understood in a non-speciesist way (i.e., as based upon morally relevant features rather than species membership).
  23. I thank an anonymous reviewer for this point.
  24. Muehlhauser and Bostrom [12], p. 43.
  25. Yudkowsky [29], p. 388.
  26. Singer [20].
  27. Singer [20], p. 6.
  28. DeGrazia [7], p. 36.
  29. Singer [20], p. 8.
  30. See Singer [20], p. 20.
  31. DeGrazia [7], pp. 35–36.
  32. The arguments in the remainder of the paper will clearly still follow for proponents of the ‘equal consideration approach’. In fact, my conclusions may still follow on an even weaker anti-speciesist view according to which we ought to treat species as morally equal to humans (or of even greater moral worth than humans) if such beings evolve from current species (see Sect. 4 below).
  33. See, for example, De Waal [8].
  34. In addition, it’s also likely that there will be many cases in which, despite non-human interests receiving no consideration, such interests will remain consistent with human interests. I happily admit this. The point I’m making is that there will be cases where non-human interests will not be consistent with human interests and therefore will be disregarded by FAI.
  35. See, for example, Bostrom [4], Yudkowsky [31], Omohundro [14, 15], Häggström [10], and Russell [18].
  36. This might be accomplished by harvesting and altering their genetic information then producing the new ‘versions’ via in vitro fertilization. This is outlandish, of course, but no more so than the scenarios suggested by many AI researchers regarding existential threats to humanity via unintended consequences.
  37. See Omohundro [15] for a discussion of ‘basic AI drives’. Of these, the most relevant to the current point is ‘resource acquisition’. ‘Efficiency’ is another relevant subgoal, as AGI/ASI will become more efficient with regard to pursuing its goals as well as in its use of resources.
  38. It’s also important to recall that there’s every reason to believe that IAI, just like FAI, will develop the basic AI drives presented in Omohundro [15].
  39. I remind the reader that by ‘possible’ beings here I intend those that could very well actually exist in the future given that we don’t exterminate the relevant preceding beings and not some logically distant, modal sense of beings.
  40. In addition, given that such species could develop from currently existing species, it is not a major leap to accept that we ought to develop AGI with them in mind as well, even if one denies that currently existing species are worthy of consideration now.
  41. Darwin [6], pp. 34–35.
  42. See, for example, https://www.theguardian.com/environment/2018/oct/30/humanity-wiped-out-animals-since-1970-major-report-finds, https://www.ipbes.net/news/Media-Release-Global-Assessment and https://www.forbes.com/sites/trevornace/2018/10/16/humans-are-exterminating-animal-species-faster-than-evolution-can-keep-up/#451b4d6415f3.
  43. I would suggest that this is analogous to cases in which, when presented with a moral dilemma, children should defer to suitable adults to make decisions that will have morally relevant consequences.
  44. In fact, it seems that beyond all of the foregoing, a sufficiently competent and powerful ASI could well fit the environment of the earth, as well as the universe beyond, to the most morally superior of possible biological beings. If it turns out that the optimal moral scenario is one in which the highest of possible moral beings exists and has its interests maximized, then we ought to develop IAI to bring about just this scenario, regardless of whether we are included in such a scenario. On the other hand, if we’re supposed to, morally speaking, develop that which will most benefit humans, then we are left not only scrambling to do so, but also hoping that there are no smarter beings somewhere in the universe working on the analogous project.
  45. I thank an anonymous reviewer for this point as well.
  46. Unfortunately, there is precedent in past human behavior for this attitude. For example, I expect that, with the benefit of hindsight, many believe that nuclear weapons ought not to have been created. The same can be said for the development of substances and practices employed in processes that continue to contribute to climate change. Nonetheless, global dismantling of nuclear weapons and moving away from practices that proliferate greenhouse gases remain far-off hopes. If this is correct, then I would suggest not only that the foregoing provides support for the preferability of species-neutral AGI, but also that the scope of interests to be considered by AGI ought to be given far more attention than it currently receives.

References

  1. Allen, C., Smit, I., Wallach, W.: Artificial morality: top-down, bottom-up, and hybrid approaches. Ethics Inf. Technol. 7, 149–155 (2006)
  2. Anderson, M., Anderson, S.: Machine ethics: creating an ethical intelligent agent. AI Mag. 28(4), 15–26 (2007)
  3. Armstrong, S., Sandberg, A., Bostrom, N.: Thinking inside the box: controlling and using an oracle AI. Minds Mach. 22, 299–324 (2011)
  4. Bostrom, N.: Superintelligence. Oxford University Press, Oxford (2014)
  5. Chalmers, D.: The singularity: a philosophical analysis. J. Conscious. Stud. 17(9–10), 7–65 (2010)
  6. Darwin, C.: The Descent of Man, and Selection in Relation to Sex. John Murray, London (1871)
  7. DeGrazia, D.: Animal Rights: A Very Short Introduction. Oxford University Press, New York, NY (2002)
  8. De Waal, F.: Chimpanzee Politics. Johns Hopkins University Press, Baltimore, MD (1998)
  9. Good, I.J.: Speculations concerning the first ultraintelligent machine. In: Franz, L., Rubinoff, M. (eds.) Advances in Computers, vol. 6, pp. 31–88. Academic Press, New York (1965)
  10. Häggström, O.: Challenges to the Omohundro–Bostrom framework for AI motivations. Foresight 21(1), 153–166 (2019)
  11. Kurzweil, R.: The Singularity is Near: When Humans Transcend Biology. Penguin Books, New York (2005)
  12. Muehlhauser, L., Bostrom, N.: Why we need friendly AI. Think 13(36) (Spring 2014)
  13. Müller, V., Bostrom, N.: Future progress in artificial intelligence: a survey of expert opinion. In: Fundamental Issues of Artificial Intelligence, pp. 555–572 (2016)
  14. Omohundro, S.: The nature of self-improving artificial intelligence [steveomohundro.com/scientific-contributions/] (2007)
  15. Omohundro, S.: The basic AI drives. In: Wang, P., Goertzel, B., Franklin, S. (eds.) Artificial General Intelligence 2008: Proceedings of the First AGI Conference. IOS, Amsterdam, pp. 483–492 (2008)
  16. Omohundro, S.: Autonomous technology and the greater human good. J. Exp. Theor. Artif. Intellig. 26(3), 303–315 (2014). https://doi.org/10.1080/0952813X.2014.895111.
  17. Regan, T.: The Case for Animal Rights. University of California Press, California (2004)
  18. Russell, S.: Human Compatible: Artificial Intelligence and the Problem of Control. Viking, New York (2019)
  19. Ryder, R.: Speciesism again: the original leaflet. Critical Society (2010). http://www.criticalsocietyjournal.org.uk/Archives_files/1.SpeciesismAgain.pdf
  20. Singer, P.: Animal Liberation. HarperCollins, New York, NY (2002)
  21. Soares, N.: The value learning problem. In: Ethics for Artificial Intelligence Workshop at 25th International Joint Conference on Artificial Intelligence (IJCAI-2016), New York, NY, USA, 9–15 July 2016 (2016)
  22. Tarleton, N.: Coherent Extrapolated Volition: A Meta-Level Approach to Machine Ethics. The Singularity Institute, San Francisco, CA (2010)
  23. Taylor, J., Yudkowsky, E., LaVictoire, P., Critch, A.: Alignment for Advanced Machine Learning Systems. Machine Intelligence Research Institute, July 27, 2016 (2016)
  24. Tegmark, M.: Life 3.0: Being Human in the Age of Artificial Intelligence. Alfred A. Knopf, New York, NY (2017)
  25. Vinge, V.: The coming technological singularity: how to survive in the post-human era. Whole Earth Rev. 77 (1993)
  26. Wallach, W., Allen, C., Smit, I.: Machine morality: bottom-up and top-down approaches for modelling human moral faculties. AI Soc. 22(4), 565–582 (2008). https://doi.org/10.1007/s00146-007-0099-0
  27. Yudkowsky, E.: Creating Friendly AI 1.0: The Analysis and Design of Benevolent Goal Architectures. The Singularity Institute, San Francisco, CA, June 15 (2001)
  28. Yudkowsky, E.: Artificial intelligence as a positive and negative factor in global risk. In: Bostrom, N., Cirkovic, M. (eds.) Global Catastrophic Risks. Oxford University Press, Oxford, pp 308–345 (2008)
  29. Yudkowsky, E.: Complex value systems in friendly AI. In: Schmidhuber, J., Thórisson, K.R., Looks, M. (eds.) Artificial General Intelligence: 4th International Conference. AGI 2011, LNAI 6830, pp. 388–393 (2011)
  30. Yudkowsky, E.: Intelligence Explosion Microeconomics. Technical Report 2013-1. Machine Intelligence Research Institute, Berkeley, CA. Last modified September 13, 2013 (2013)
  31. Yudkowsky, E.: There’s No Fire Alarm for Artificial General Intelligence (2017). https://intelligence.org/2017/10/13/fire-alarm/