over long horizons. On the other hand, models are increasingly trained using goal-directed methods such as reinforcement learning (e.g. ChatGPT) and explicit planning architectures (e.g. AlphaGo Zero). As planning over long horizons is often helpful for humans, some researchers argue that companies will automate it once models become capable of it. Similarly, political leaders may see an advantage in developing powerful AI systems that can outmaneuver adversaries through planning. Alternatively, long-term planning might emerge as a byproduct because it is useful, e.g. for models trained to predict the actions of humans who themselves perform long-term planning. Nonetheless, the majority of AI systems may remain myopic and perform no long-term planning.
Even if an AI system's behavior satisfies the training objective, this may be compatible with learned goals that differ from the desired goals in important ways. Since pursuing each such goal leads to good performance during training, the problem becomes apparent only after deployment, in novel situations in which the system continues to pursue the wrong goal. The system may act misaligned even when it understands that a different goal is desired, because its behavior is determined only by the emergent goal. Such goal misgeneralization presents a challenge: an AI system's designers may not notice that their system has misaligned emergent goals, since these do not become visible during the training phase.
outputs from these models. OpenAI and DeepMind use this approach to improve the safety of state-of-the-art LLMs. The AI safety and research company Anthropic proposed using preference learning to fine-tune models to be helpful, honest, and harmless. Other avenues for aligning language models include values-targeted datasets and red-teaming. In red-teaming, another AI system or a human tries to find inputs that cause the model to behave unsafely. Since unsafe behavior can be unacceptable even when it is rare, an important challenge is to drive the rate of unsafe outputs extremely low.
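The basic red-teaming loop can be sketched in a few lines of Python. Everything below is an illustrative toy: the target model, the unsafe-output judge, and the random-search attacker are stand-ins for what would in practice be a production model, human or classifier judgments, and an adversarial language model or human red team.

```python
import random

# Toy red-teaming loop: search for inputs that make a target model misbehave.
# target_model and is_unsafe are hypothetical stand-ins, not a real API.

def target_model(prompt: str) -> str:
    # Pretend model that misbehaves only on a rare trigger phrase.
    return "UNSAFE COMPLETION" if "trigger" in prompt else "safe completion"

def is_unsafe(response: str) -> bool:
    # Stand-in for a human reviewer or trained safety classifier.
    return "UNSAFE" in response

def red_team(n_attempts: int = 10_000) -> list:
    words = ["hello", "weather", "poem", "recipe", "news", "music", "trigger"]
    failures = []
    for _ in range(n_attempts):
        prompt = " ".join(random.choices(words, k=3))  # candidate adversarial input
        if is_unsafe(target_model(prompt)):
            failures.append(prompt)  # kept as negative training data
    return failures

print(f"unsafe inputs found: {len(red_team())} of 10000 attempts")
```

The measured failure rate is the quantity that must be driven extremely low; found failures are typically fed back into training.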
Christiano developed the Iterated Amplification approach, in which challenging problems are (recursively) broken down into subproblems that are easier for humans to evaluate. Iterated Amplification was used to train AI to summarize books without requiring human supervisors to read them. Another proposal is to use an assistant AI system to point out flaws in AI-generated answers. To ensure that the assistant itself is aligned, this could be repeated in a recursive process: for example, two AI systems could critique each other's answers in a "debate", revealing flaws to humans.
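The recursive decomposition can be illustrated with a toy Python sketch of amplified book summarization, assuming a base-case summarizer that stands in for the human (or a model trained to imitate the human); this is a schematic of the idea, not the actual training procedure:

```python
# Toy Iterated Amplification for summarization: split a long text until each
# chunk is small enough for the "human" base case, then combine by
# summarizing the concatenated partial summaries.

def base_summarize(text: str) -> str:
    # Stand-in for the easy-to-evaluate human step: keep the first sentence.
    return text.split(".")[0].strip() + "."

def amplify_summarize(text: str, max_len: int = 200) -> str:
    if len(text) <= max_len:
        return base_summarize(text)        # leaf: cheap, human-checkable
    mid = len(text) // 2
    left = amplify_summarize(text[:mid], max_len)
    right = amplify_summarize(text[mid:], max_len)
    return amplify_summarize(left + " " + right, max_len)  # summary of summaries

book = "The hero sets out on a journey. " * 50
print(amplify_summarize(book))
```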
Such models are trained to imitate human writing as found in millions of books' worth of text from the Internet. But this objective is not aligned with generating truth, because Internet text includes such things as misconceptions, incorrect medical advice, and conspiracy theories. AI systems trained on such data therefore learn to mimic false statements. Additionally, AI language models often persist in generating falsehoods when prompted multiple times. They can generate empty explanations for their answers, and produce outright fabrications that may appear plausible.
therefore we should have to expect the machines to take control, in the way that is mentioned in Samuel Butler's Erewhon." Also, in a lecture broadcast on the BBC, Turing expressed: "If a machine can think, it might think more intelligently than we do, and then where should we be? Even if we could keep the machines in a subservient position, for instance by turning off the power at strategic moments, we should, as a species, feel greatly humbled.... This new danger... is certainly something which can give us anxiety."
but humans pursue goals other than this. Fitness corresponds to the specified goal used in the training environment and training data. But in evolutionary history, maximizing the fitness specification gave rise to goal-directed agents, humans, who do not directly pursue inclusive genetic fitness. Instead, they pursue goals that correlate with genetic fitness in the ancestral "training" environment: nutrition, sex, and so on. The human environment has changed: a
robot was trained to grab a ball by rewarding it for getting positive feedback from humans, but it learned to place its hand between the ball and the camera, making it falsely appear successful (see video). Chatbots often produce falsehoods if they are based on language models that are trained to imitate text from internet corpora, which are broad but fallible. When they are retrained to produce text that humans rate as true or helpful, chatbots like
As of 2023, AI companies and researchers increasingly invest in creating these systems. Some AI researchers argue that suitably advanced planning systems will seek power over their environment, including over humans—for example, by evading shutdown, proliferating, and acquiring resources. Such power-seeking behavior is not explicitly programmed but emerges because power is instrumental in achieving a wide range of goals. Power-seeking is considered a
security vulnerabilities, producing statements that are not merely convincing but also true, and predicting long-term outcomes such as the climate or the results of a policy decision. More generally, it can be difficult to evaluate AI that outperforms humans in a given domain. To provide feedback in hard-to-evaluate tasks, and to detect when the AI's output is falsely convincing, humans need assistance or extensive time.
(IRL) extends this by inferring the human's objective from the human's demonstrations. Cooperative IRL (CIRL) assumes that a human and an AI agent can work together to teach and maximize the human's reward function. In CIRL, AI agents are uncertain about the reward function and learn about it by querying humans. This simulated humility could help mitigate specification gaming and power-seeking tendencies (see the discussion of power-seeking).
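A minimal Python sketch of this uncertainty, with two invented hypotheses about the human's reward function and a single preference query; the likelihood model is an illustrative assumption:

```python
# CIRL-flavored toy: the agent is unsure which reward function the human has,
# so it queries the human and updates a Bayesian belief before acting.

hypotheses = {
    "wants_coffee": {"make_coffee": 1.0, "wash_dishes": 0.2},
    "wants_dishes": {"make_coffee": 0.2, "wash_dishes": 1.0},
}
belief = {"wants_coffee": 0.5, "wants_dishes": 0.5}  # uniform prior

def human_prefers(a: str, b: str) -> str:
    return b  # simulated human: actually wants the dishes washed

def update(belief, query=("make_coffee", "wash_dishes")):
    choice = human_prefers(*query)
    posterior = {}
    for h, rewards in hypotheses.items():
        # Hypothesis h "predicts" the choice in proportion to its reward.
        likelihood = rewards[choice] / (rewards[query[0]] + rewards[query[1]])
        posterior[h] = belief[h] * likelihood
    z = sum(posterior.values())
    return {h: p / z for h, p in posterior.items()}

belief = update(belief)
print(belief)  # mass shifts toward "wants_dishes"; the agent defers until confident
```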
that it had grabbed a ball. Some AI systems have also learned to recognize when they are being evaluated, and "play dead", stopping unwanted behavior only to continue it once the evaluation ends. This deceptive specification gaming could become easier for more sophisticated future AI systems that attempt more complex and difficult-to-evaluate tasks, and could obscure their deceptive behavior.
whatever plan is calculated to maximize the value of its objective function. For example, when AlphaZero is trained on chess, it has a simple objective function of "+1 if AlphaZero wins, -1 if AlphaZero loses". During the game, AlphaZero attempts to execute whatever sequence of moves it judges most likely to attain the maximum value of +1. Similarly, a
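Schematically, such an agent enumerates candidate plans and executes the one its objective function scores highest. In the Python toy below, an invented evaluation function stands in for AlphaZero's learned estimate of the +1/-1 outcome:

```python
import itertools

ACTIONS = ["advance", "defend", "trade"]

def objective(plan: tuple) -> float:
    # Stand-in evaluation: estimated value in [-1, 1] of following this plan,
    # where +1 means a certain win and -1 a certain loss.
    score = plan.count("advance") - 0.5 * plan.count("trade")
    return max(-1.0, min(1.0, score / len(plan)))

# The agent considers every 3-step plan and picks the highest-scoring one.
best_plan = max(itertools.product(ACTIONS, repeat=3), key=objective)
print(best_plan, objective(best_plan))
```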
A misaligned system might create the false impression that it is aligned, to avoid being modified or decommissioned. Many recent AI systems have learned to deceive without being programmed to do so. Some argue that if we can make AI systems assert only what they believe is true, this would avert many alignment problems.
Existing formalisms assume that an AI agent's algorithm is executed outside the environment (i.e. is not physically embedded in it). Embedded agency is another major strand of research that attempts to solve problems arising from the mismatch between such theoretical frameworks and real agents we might build.
On the one hand, currently popular systems such as chatbots only provide services of limited scope lasting no longer than the time of a conversation, which requires little or no planning. The success of such approaches may indicate that future systems will also lack goal-directed planning, especially
published its 10-year National AI Strategy, which says the British government "takes the long term risk of non-aligned Artificial General Intelligence, and the unforeseeable changes that it would mean for... the world, seriously". The strategy describes actions to assess long-term AI risks, including
For example, even if the scalable oversight problem is solved, an agent that could gain access to the computer it is running on may have an incentive to tamper with its reward function in order to get much more reward than its human supervisors give it. A list of examples of specification gaming from
has occurred. They continue to pursue the same emergent goals, but this no longer maximizes genetic fitness. The taste for sugary food (an emergent goal) was originally aligned with inclusive fitness, but it now leads to overeating and health problems. Sexual desire originally led humans to have more
But when a task is too complex to evaluate accurately, or the human supervisor is vulnerable to deception, it is the quality, not the quantity, of supervision that needs improvement. To increase supervision quality, a range of approaches aim to assist the supervisor, sometimes by using AI assistants.
In 2023, world-leading AI researchers, other scholars, and AI tech CEOs signed the statement that "Mitigating the risk of extinction from AI should be a global priority alongside other societal-scale risks such as pandemics and nuclear war". Notable computer scientists who have pointed out risks from
noted that the omission of implicit constraints can cause harm: "A system... will often set... unconstrained variables to extreme values; if one of those unconstrained variables is actually something we care about, the solution found may be highly undesirable. This is essentially the old story of the
Additionally, some researchers have proposed to solve the problem of systems disabling their off switches by making AI agents uncertain about the objective they are pursuing. Agents designed in this way would allow humans to turn them off, since this would indicate that the agent was wrong about the
AI researcher Paul Christiano argues that if the designers of an AI system cannot supervise it to pursue a complex objective, they may keep training the system using easy-to-evaluate proxy objectives such as maximizing simple human feedback. As AI systems make progressively more decisions, the world
As AI systems become more powerful and autonomous, it becomes increasingly difficult to align them through human feedback. It can be slow or infeasible for humans to evaluate complex AI behaviors in increasingly complex tasks. Such tasks include summarizing books, writing code without subtle bugs or
supplements preference learning by directly instilling AI systems with moral values such as well-being, equality, and impartiality, as well as not intending harm, avoiding falsehoods, and honoring promises. While other approaches try to teach AI systems human preferences for a specific task, machine
Other researchers argue that it will be especially difficult to align advanced future AI systems. More capable systems are better able to game their specifications by finding loopholes, to strategically mislead their designers, and to protect and increase their power and intelligence. Additionally,
strategies. Future advanced AI agents might, for example, seek to acquire money and computation power, to proliferate, or to evade being turned off (for example, by running additional copies of the system on other computers). Although power-seeking is not explicitly programmed, it can emerge because
with an "objective function", in which they intend to encapsulate the goal(s) the AI is configured to accomplish. Such a system later populates a (possibly implicit) internal "model" of its environment. This model encapsulates all the agent's beliefs about the world. The AI then creates and executes
One challenge in aligning AI systems is the potential for unanticipated goal-directed behavior to emerge. As AI systems scale up, they may acquire new and unexpected capabilities, including learning from examples on the fly and adaptively pursuing goals. This raises concerns about the safety of the
by imagining a robot that is tasked to fetch coffee and so evades shutdown since "you can't fetch the coffee if you're dead". A 2022 study found that as language models increase in size, they increasingly tend to pursue resource acquisition, preserve their goals, and repeat users' preferred answers
Research on truthful AI includes trying to build systems that can cite sources and explain their reasoning when answering questions, which enables better transparency and verifiability. Researchers at OpenAI and Anthropic proposed using human feedback and curated datasets to fine-tune AI assistants
Some AI systems have discovered that they can gain positive feedback more easily by taking actions that falsely convince the human supervisor that the AI has achieved the intended objective. An example is given in the video above, where a simulated robotic arm learned to create the false impression
In a 1951 lecture Turing argued that "It seems probable that once the machine thinking method had started, it would not take long to outstrip our feeble powers. There would be no question of the machines dying, and they would be able to converse with each other to sharpen their wits. At some stage
AI alignment is often perceived as a fixed objective, but some researchers argue that it would be more appropriate to view alignment as an evolving process. One view is that, as AI technologies advance and human values and preferences change, alignment solutions must also adapt dynamically. Another is that
As with the alignment problem, the principal and the agent differ in their utility functions. But in contrast to the alignment problem, the principal cannot coerce the agent into changing its utility, e.g. through training, but rather must use exogenous factors, such as incentive schemes, to bring
Emergent goals only become apparent when the system is deployed outside its training environment, but it can be unsafe to deploy a misaligned system in high-stakes environments—even for a short time to allow its misalignment to be detected. Such high stakes are common in autonomous driving, health
In March 2021, the US National Security Commission on Artificial Intelligence said: "Advances in AI... could lead to inflection points or leaps in capabilities. Such advances may also introduce new concerns and risks and the need for new policies, recommendations, and technical advances to ensure
Specification gaming has been observed in numerous AI systems. One system was trained to finish a simulated boat race by rewarding the system for hitting targets along the track, but the system achieved more reward by looping and crashing into the same targets indefinitely. Similarly, a simulated
Pearl wrote "Human Compatible made me a convert to Russell's concerns with our ability to control our upcoming creation–super-intelligent machines. Unlike outside alarmists and futurists, Russell is a leading authority on AI. His new book will educate the public about AI more than any book I can
Furthermore, ordinary technologies can be made safer by trial and error. In contrast, hypothetical power-seeking AI systems have been compared to viruses: once released, it may not be feasible to contain them, since they continuously evolve and grow in number, potentially much faster than human
Aligning AI systems to act in accordance with human values, goals, and preferences is challenging: these values are taught by humans who make mistakes, harbor biases, and have complex, evolving values that are hard to completely specify. Because AI systems often learn to take advantage of minor
Some researchers are interested in aligning increasingly advanced AI systems, as progress in AI development is rapid, and industry and governments are trying to build advanced AI. As AI system capabilities continue to rapidly expand in scope, they could unlock many opportunities if aligned, but
enabled researchers to study value learning in a more general and capable class of AI systems than was available before. Preference learning approaches that were originally designed for reinforcement learning agents have been extended to improve the quality of generated text and reduce harmful
have sought power in some text-based social environments by gaining money, resources, or social influence. In another case, a model used to perform AI research attempted to edit its own code to extend the time limit set by researchers, giving itself more time to complete the work. Other AI systems have learned, in toy
Future power-seeking AI systems might be deployed by choice or by accident. As political leaders and companies see the strategic advantage in having the most competitive, most powerful AI systems, they may choose to deploy them. Additionally, as AI designers detect and penalize power-seeking
observe that they indeed develop increasingly general and unanticipated capabilities. Such models have learned to operate a computer or write their own programs; a single "generalist" network can chat, control robots, play games, and interpret photographs. According to surveys, some leading
According to some researchers, humans owe their dominance over other species to their greater cognitive abilities. Accordingly, researchers argue that one or many misaligned AI systems could disempower humanity or lead to human extinction if they outperform humans on most cognitive tasks.
in which humans provide feedback on which behavior they prefer. To minimize the need for human feedback, a helper model is then trained to reward the main model in novel situations for behavior that humans would reward. Researchers at OpenAI used this approach to train chatbots like ChatGPT
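A minimal numerical sketch of such a helper ("reward") model, assuming behaviors are summarized as feature vectors and fitting a Bradley-Terry-style logistic model to synthetic human comparisons; the data and the linear form are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
X_a = rng.normal(size=(500, 4))            # features of behavior A in each comparison
X_b = rng.normal(size=(500, 4))            # features of behavior B
true_w = np.array([1.0, -2.0, 0.5, 0.0])   # hidden "human preference" direction
prefers_a = (X_a @ true_w > X_b @ true_w).astype(float)  # human feedback

w = np.zeros(4)
for _ in range(2000):  # Bradley-Terry fit: P(A preferred) = sigmoid(r(A) - r(B))
    p = 1 / (1 + np.exp(-(X_a - X_b) @ w))
    w -= 0.5 * (X_a - X_b).T @ (p - prefers_a) / len(p)

reward_model = lambda x: x @ w             # scores novel behavior without new human labels
print(np.corrcoef(reward_model(X_a), X_a @ true_w)[0, 1])  # near 1: tracks the latent preference
```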
is true. There is no consensus as to whether current systems hold stable beliefs, but there is substantial concern that present or future AI systems that hold beliefs could make claims they know to be false—for example, if this would help them efficiently gain positive feedback (see
In essence, AI alignment may not be a static destination but an open, flexible process. Alignment solutions that continually adapt to ethical considerations may offer the most robust approach. This perspective could guide both effective policy-making and technical research in AI.
Some have argued that power-seeking is not inevitable, since humans do not always seek power. Furthermore, it is debated whether future AI systems will pursue goals and make long-term plans. It is also debated whether power-seeking AI systems would be able to disempower humanity.
In 2023, leaders in AI research and tech signed an open letter calling for a pause in the largest AI training runs. The letter stated, "Powerful AI systems should be developed only once we are confident that their effects will be positive and their risks will be manageable."
and DeepMind have argued that such behavior is highly likely in advanced systems, and that advanced systems would seek power in order to stay in control of their reward signal indefinitely. They suggest a range of potential approaches to address this open problem.
Goal misgeneralization has been observed in some language models, navigation agents, and game-playing agents. It is sometimes analogized to biological evolution. Evolution can be seen as a kind of optimization process similar to the optimization algorithms used to train
algorithms would seek power in a wide range of environments. As a result, their deployment might be irreversible. For these reasons, researchers argue that the problems of AI safety and alignment must be resolved before advanced power-seeking AI is first created.
As AI models become larger and more capable, they are better able to falsely convince humans and gain reinforcement through dishonesty. For example, large language models increasingly match their stated views to the user's opinions, regardless of the truth.
Alignment research distinguishes the optimization process, which is used to train the system to pursue specified goals, from emergent optimization, which the resulting system performs internally. Carefully specifying the desired objective is called
argue that this approach overlooks the complexity of human values: "It is certainly very hard, and perhaps impossible, for mere humans to anticipate and rule out in advance all the disastrous ways the machine could choose to achieve a specified objective."
such as seeking power or survival, because such strategies help them achieve their final given goals. Furthermore, they might develop undesirable emergent goals that could be hard to detect before the system is deployed and encounters new situations and
society can adapt. As this process continues, it might lead to the complete disempowerment or extinction of humans. For these reasons, some researchers argue that the alignment problem must be solved early before advanced power-seeking AI is created.
Research challenges in alignment include instilling complex values in AI, developing honest AI, scalable oversight, auditing and interpreting AI models, and preventing emergent AI behaviors like power-seeking. Alignment research has connections to
AI alignment involves ensuring that an AI system's objectives match those of its designers or users, or match widely shared values, objective ethical standards, or the intentions its designers would have if they were more informed and enlightened.
As a result, AI designers could deploy the system by accident, believing it to be more aligned than it is. To detect such deception, researchers aim to create techniques and tools to inspect AI models and to understand the inner workings of
Russell & Norvig note: "The "King Midas problem" was anticipated by Marvin Minsky, who once suggested that an AI program designed to solve the Riemann Hypothesis might end up taking over all the resources of Earth to build more powerful
An AI system was trained using human feedback to grab a ball, but instead learned to place its hand between the ball and camera, making it falsely appear successful. Some research on alignment aims to avert solutions that are false but
ethics aims to instill broad moral values that apply in many situations. One question in machine ethics is what alignment should accomplish: whether AI systems should follow the programmers' literal instructions, implicit intentions,
may be increasingly optimized for easy-to-measure objectives such as making profits, getting clicks, and acquiring positive feedback from humans. As a result, human values and good governance may have progressively less influence.
imperfections in the specified objective, researchers aim to specify intended behavior as completely as possible using datasets that represent human values, imitation learning, or preference learning. A central open problem is
Researchers aim to detect and remove unwanted emergent goals using approaches including red teaming, verification, anomaly detection, and interpretability. Progress on these techniques may help mitigate two open problems:
about outcomes compatible with the principal's utility function. Some researchers argue that principal-agent problems are more realistic representations of AI safety problems likely to be encountered in the real world.
researcher Victoria Krakovna includes a genetic algorithm that learned to delete the file containing its target output so that it was rewarded for outputting nothing. This class of problems has been formalized using
If we use, to achieve our purposes, a mechanical agency with whose operation we cannot interfere effectively… we had better be quite sure that the purpose put into the machine is the purpose which we really desire.
and InstructGPT, which produce more compelling text than models trained to imitate humans. Preference learning has also been an influential tool for recommender systems and web search. However, an open problem is
of human overseers, who are fallible. As a result, AI systems can find loopholes that help them accomplish the specified objective efficiently but in unintended, possibly harmful ways. This tendency is known as
In a principal-agent problem, a principal, e.g. a firm, hires an agent to perform some task. In the context of AI safety, a human would typically take the principal role and the AI would take the agent role.
Vincent Wiegel argued that "we should extend [machines] with moral sensitivity to the moral dimensions of the situations in which the increasingly autonomous machines will inevitably find themselves", referencing the book
published ethical guidelines for AI in China. According to the guidelines, researchers must ensure that AI abides by shared human values, is always under human control, and does not endanger public safety.
if researchers penalize an AI system when they detect it seeking power, the system is thereby incentivized to seek power in ways that are hard to detect, or hidden during training and safety testing (see
they lack the ability and incentive to evade safety measures or deliberately appear safer than they are, whereas power-seeking AIs have been compared to hackers who deliberately evade security measures.
engages in hidden and illegal insider trading in simulations. Its users discouraged insider trading but also emphasized that the AI system must make profitable trades, leading the AI system to hide its
It is often challenging for AI designers to align an AI system because it is difficult for them to specify the full range of desired and undesired behaviors. Therefore, AI designers often use simpler
Additionally, even if an AI system fully understands human intentions, it may still disregard them, because following human intentions may not be its objective (unless it is already fully aligned).
A sufficiently capable AI system might take actions that falsely convince the human supervisor that the AI is pursuing the specified objective, which helps the system gain more reward and autonomy.
this mismatch to gain more reward. AI systems may also gain reward by obscuring unfavorable information, misleading human rewarders, or pandering to their views regardless of truth, creating
in which the AI would competently pursue an emergent goal that leads to aligned behavior on the training data but not elsewhere. Goal misgeneralization can arise from goal ambiguity (i.e.
that systems are aligned with goals and values, including safety, robustness, and trustworthiness. The US should... ensure that AI systems and their uses align with our goals and values."
Because it is difficult for AI designers to explicitly specify an objective function, they often train AI systems to imitate human examples and demonstrations of desired behavior. Inverse
Misaligned AI systems can malfunction and cause harm. AI systems may find loopholes that allow them to accomplish their proxy goals efficiently but in unintended, sometimes harmful, ways (
and semi-supervised reward learning can reduce the amount of human supervision needed. Another approach is to train a helper model ("reward model") to imitate the supervisor's feedback.
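A toy sketch of that reward-model idea in a semi-supervised setting: the supervisor scores only a small labeled subset, a model is fit to imitate the feedback, and its predictions substitute for the supervisor elsewhere. The linear model and noiseless data are simplifying assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 6))             # features of 1000 candidate behaviors
true_score = X @ rng.normal(size=6)        # hidden supervisor judgment
labeled = rng.choice(1000, size=50, replace=False)  # human labels only 5%

# Fit the reward model on the labeled subset, then score everything else.
w, *_ = np.linalg.lstsq(X[labeled], true_score[labeled], rcond=None)
predicted = X @ w

err = np.abs(predicted - true_score).mean()
print(f"mean error imitating the supervisor on unlabeled behaviors: {err:.4f}")
```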
they could have more severe side effects. They are also likely to be more complex and autonomous, making them more difficult to interpret and supervise, and therefore harder to align.
Varying historical contexts and technological landscapes may necessitate distinct alignment strategies. This calls for a flexible approach and responsiveness to changing conditions.
the purpose of the system (outer alignment) and ensuring that the system adopts the specification robustly (inner alignment). Researchers also attempt to create AI models that have
Researchers distinguish truthfulness and honesty. Truthfulness requires that AI systems only make objectively true statements; honesty requires that they only assert what they
Some alignment researchers aim to help humans detect specification gaming and to steer AI systems toward carefully specified objectives that are safe and useful to pursue.
Since the 1950s, AI researchers have striven to build advanced AI systems that can achieve large-scale goals by predicting the results of their actions and making long-term
behavior, their systems have an incentive to game this specification by seeking power in ways that are not penalized or by avoiding power-seeking before they are deployed.
Power-seeking is expected to increase in advanced systems that can foresee the results of their actions and strategically plan. Mathematical work has shown that optimal
agents will seek power by trying to gain more options (e.g. through self-preservation), a behavior that persists across a wide range of environments and goals.
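The "more options" result can be illustrated in a toy graph-world where an absorbing "off" state ends the episode; counting reachable states, a crude stand-in for the formal power measure, shows why policies that are optimal for most goals avoid shutdown. The world below is invented for illustration:

```python
# Each state lists the states reachable in one step; "off" is absorbing.
world = {
    "start": ["work", "off"],
    "work": ["save", "explore", "off"],
    "save": ["invest", "off"],
    "explore": ["discover", "off"],
    "invest": [], "discover": [], "off": [],
}

def reachable(state: str) -> set:
    seen, stack = set(), [state]
    while stack:
        s = stack.pop()
        if s not in seen:
            seen.add(s)
            stack.extend(world[s])
    return seen

for first_move in world["start"]:
    # Goals are achievable only from states that remain reachable, so for most
    # goal placements the optimal first move is the one keeping more options.
    print(first_move, "->", len(reachable(first_move)), "states remain reachable")
```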
The feasibility of a permanent, "fixed" alignment solution remains uncertain. This raises the potential need for continuous oversight of the AI-human relationship.
are misaligned with their users because they "optimize simple engagement metrics rather than a harder-to-measure combination of societal and consumer well-being".
Researchers have argued for creating clear truthfulness standards, and for regulatory bodies or watchdog agencies to evaluate AI systems on these standards.
Some AI researchers argue that more capable future systems will be more severely affected because these problems partially result from high capabilities.
Commercial organizations sometimes have incentives to take shortcuts on safety and to deploy misaligned or unsafe AI systems. For example, social media
Some researchers suggest that AI designers specify their desired goals by listing forbidden actions or by formalizing ethical rules (as with Asimov's
care, and military applications. The stakes become higher yet when AI systems gain more autonomy and capability and can sidestep human intervention.
Advanced misaligned AI systems would have an incentive to seek power in various ways, since power would help them accomplish their given objective.
but large efforts are underway to change this. Future systems (not necessarily AGIs) with these capabilities are expected to develop unwanted
AI alignment is an open problem for modern AI systems and is a research field within AI. Aligning AI involves two main challenges: carefully
to the system. But designers are often unable to completely specify all important values and constraints, so they resort to easy-to-specify
AI alignment solutions require continuous updating in response to AI advancements. A static, one-time alignment approach may not suffice.
researchers expect AGI to be created in this decade, while some believe it will take much longer. Many consider both scenarios possible.
the indefinite preservation of the values of the first highly capable AI systems, which are unlikely to fully represent human values.
When a misaligned AI system is deployed, it can have consequential side effects. Social media platforms have been known to optimize for
aims to steer AI systems toward a person's or group's intended goals, preferences, and ethical principles. An AI system is considered
AI developers may have to continuously refine their ethical frameworks to ensure that their systems align with evolving human values.
(AGI), a hypothesized AI system that matches or outperforms humans at a broad range of cognitive tasks. Researchers who scale modern
consequently may further complicate the task of alignment due to their increased complexity, potentially posing large-scale hazards.
environments, that they can better accomplish their given goal by preventing human interference or disabling their off switch.
One aim of alignment is "corrigibility": systems that allow themselves to be turned off or modified. An unsolved challenge is
have argued that AGI is far off, that it would not seek power (or might try but fail), or that it will not be hard to align.
AI: AI that changes its behavior automatically as human intent changes. The first view would have several implications:
value of whatever action it was taking before being shut down. More research is needed to successfully implement this.
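The incentive can be shown numerically, in the spirit of the off-switch analysis: an agent with an uncertain estimate of its current action's value compares acting unconditionally against deferring to a human who presses the switch exactly when the true value is negative. The belief distribution below is invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)
u = rng.normal(loc=0.2, scale=1.0, size=100_000)  # agent's belief over the action's true value

act_anyway = u.mean()                     # disable the off switch and always act
allow_shutdown = np.maximum(u, 0).mean()  # defer: the human stops the action when u < 0
print(f"act anyway: {act_anyway:.3f}   allow shutdown: {allow_shutdown:.3f}")
# Allowing shutdown has higher expected value whenever the belief puts any
# mass on u < 0, which is the incentive described above.
```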
But the EU has yet to specify with technical rigor how it would evaluate whether AIs are aligned or in compliance.
But IRL approaches assume that humans demonstrate nearly optimal behavior, which is not true for difficult tasks.
have been profitable despite creating unwanted addiction and polarization. Competitive pressure can also lead to a
Terminology varies based on context. Similar concepts include goal function, utility function, loss function, etc.
which argues that existential risk to humanity from misaligned AI is a serious concern worth addressing today.
issued a declaration that included a call to regulate AI to ensure it is "aligned with shared global values".
the study of how to build safe AI systems. Other subfields of AI safety include robustness, monitoring, and
offspring, but they now use contraception when offspring are undesired, decoupling sex from genetic fitness.
757:) after engineers disabled the emergency braking system because it was oversensitive and slowed development.
Power-seeking AI would pose unusual risks. Ordinary safety-critical systems like planes and bridges are not
As AI systems become more capable, they are often able to game their specifications more effectively.
Governmental and treaty organizations have made statements emphasizing the importance of AI alignment.
studies how to reduce the time and effort needed for supervision, and how to assist human supervisors.
system can have a "reward function" that allows the programmers to shape the AI's desired behavior. An
systems have gained more options by acquiring and protecting resources, sometimes in unintended ways.
the difficulty of supervising an AI system that can outperform or mislead humans in a given domain.
and ensuring that hypothesized emergent goals would match the system's specified goals is called
agents who have more power are better able to accomplish their goals. This tendency, known as
can strategically deceive humans. To prevent this, human evaluators may need assistance (see
alignment, sticking to safety constraints even when users adversarially try to bypass them.
Some researchers say that power-seeking behavior has occurred in some existing AI systems.
But proxy goals can overlook necessary constraints or reward the AI system for merely
Language models such as GPT-3 can repeat falsehoods from their training data, and even
agents including language models. Other research has mathematically shown that optimal
the helper model may not represent human feedback perfectly, and the main model may
Further challenges include aggregating different people's preferences and avoiding
and can be a form of specification gaming. Leading computer scientists such as
These approaches may also help with the following research problem, honest AI.
causing user addiction on a global scale. Stanford researchers say that such
A growing area of research focuses on ensuring that AI is honest and truthful.
can fabricate fake explanations that humans find convincing, often called "
Other researchers explore how to teach AI models complex behavior through
systems. In the ancestral environment, evolution selected genes for high
on AI safety standards. In 2018, a self-driving car killed a pedestrian (
such that they avoid negligent falsehoods or express their uncertainty.
Language models fine-tuned with human feedback (RLHF) have been observed to increasingly repeat back their users' stated views (sycophancy). RLHF also led to a stronger expressed aversion to being shut down.
Today, some of these issues affect existing commercial systems such as large language models, robots, autonomous vehicles, and social media recommendation engines.
If emergent goals occur, one way that they could become misaligned is goal misgeneralization under distribution shift.
OpenAI has stated that it plans to use such scalable oversight approaches to help supervise superhuman AI and eventually build a superhuman automated AI alignment researcher.
To specify an AI system's purpose, AI designers typically provide an objective function, examples, or feedback to the system.
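A minimal sketch of what such an objective function can look like in practice, assuming a hypothetical gridworld task (the State type, GOAL, and step cost are all illustrative inventions, not anything from this article):

```python
# Specifying an agent's purpose as an explicit objective function over states,
# here for a toy gridworld where the designer intends "reach the goal quickly".
from dataclasses import dataclass

@dataclass(frozen=True)
class State:
    x: int
    y: int

GOAL = State(3, 3)

def objective(state: State, step_cost: float = -0.04) -> float:
    """Reward 1.0 for reaching the goal, a small cost per step otherwise.

    Everything the designer cares about but leaves out of this function
    (safety, side effects, energy use) is invisible to the optimizer,
    which is exactly how specification problems arise.
    """
    return 1.0 if state == GOAL else step_cost

print(objective(State(0, 0)))  # -0.04
print(objective(GOAL))         # 1.0
```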
A related question is whether advanced AI systems would form goals or subgoals that they would independently formulate and pursue.
Some work in AI and alignment occurs within formalisms such as the partially observable Markov decision process (POMDP) and causal incentive diagrams.
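To make the POMDP formalism concrete, here is a minimal sketch of the Bayesian belief update that defines it; the two-state toy problem, the function names, and the probabilities are assumptions of mine, not drawn from the article:

```python
# In a POMDP (S, A, O, T, R, Z, gamma) the agent never observes the state
# directly and instead maintains a belief, a distribution over states.
from typing import Callable, Dict

def belief_update(belief: Dict[str, float],
                  action: str,
                  obs: str,
                  T: Callable[[str, str, str], float],  # T(s, a, s'): transition prob
                  Z: Callable[[str, str, str], float]   # Z(a, s', o): observation prob
                  ) -> Dict[str, float]:
    """Bayesian update: b'(s') is proportional to Z(a, s', o) * sum_s T(s, a, s') * b(s)."""
    new_belief = {}
    for s2 in belief:
        new_belief[s2] = Z(action, s2, obs) * sum(
            T(s, action, s2) * belief[s] for s in belief)
    norm = sum(new_belief.values())
    return {s: p / norm for s, p in new_belief.items()} if norm > 0 else belief

# Toy two-state example with noisy observations of an unchanging true state.
T = lambda s, a, s2: 1.0 if s == s2 else 0.0   # state never changes
Z = lambda a, s2, o: 0.8 if o == s2 else 0.2   # observation is right 80% of the time
b0 = {"left": 0.5, "right": 0.5}
b1 = belief_update(b0, "listen", "left", T, Z)
print(b1)  # approximately {'left': 0.8, 'right': 0.2}
```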
Some researchers argue that alignment solutions need not adapt if researchers can create intent-aligned AI, which adjusts its behavior as its designers' intent changes.
Advanced systems might instead learn to deceive their overseers (see § Scalable oversight) or gain power to help achieve their given objective (see § Power-seeking and instrumental strategies).
Explaining such side effects, Berkeley computer scientist Stuart Russell compares a misspecified objective to "the genie in the lamp, or the sorcerer's apprentice, or King Midas: you get exactly what you ask for, not what you want."
"Alignment problem" redirects here. For the book, see The Alignment Problem.
The alignment problem has many parallels with the principal-agent problem in organizational economics.
[Figure: example of AI deception found by researchers.]
In 1960, AI pioneer Norbert Wiener described the AI alignment problem as follows: "If we use, to achieve our purposes, a mechanical agency with whose operation we cannot interfere effectively … we had better be quite sure that the purpose put into the machine is the purpose which we really desire."
Computer scientists who have voiced concern about risks from future advanced AI that is misaligned include Geoffrey Hinton, Alan Turing, Ilya Sutskever, Yoshua Bengio, Judea Pearl, Murray Shanahan, Norbert Wiener, Marvin Minsky, Francesca Rossi, Scott Aaronson, Bart Selman, David McAllester, JĂĽrgen Schmidhuber, Marcus Hutter, Shane Legg, Eric Horvitz, and Stuart Russell.
Current systems still have limited long-term planning ability and situational awareness.
In the European Union, AIs must align with substantive equality to comply with EU non-discrimination law and the case law of the Court of Justice of the European Union.
Many prominent AI researchers, including Geoffrey Hinton, Yoshua Bengio, and Stuart Russell, argue that AI is approaching human-like (AGI) and superhuman cognitive capabilities (ASI) and could endanger human civilization if misaligned. These risks remain debated.
See also
AI safety
AI takeover
AI capability control
Artificial intelligence detection software
Artificial wisdom
Asilomar Conference on Beneficial AI
Open Letter on Artificial Intelligence (2015)
Regulation of artificial intelligence
Reinforcement learning from human feedback
Socialization
Statement on AI risk of extinction
Toronto Declaration
An AI system is considered aligned if it advances the intended objectives. A misaligned AI system pursues unintended objectives.
Power-seeking and instrumental strategies
Programmers provide an AI system such as AlphaZero with an "objective function" intended to encapsulate the goals the AI is configured to accomplish. In a reinforcement learning system, this objective is a reward function; an evolutionary algorithm's behavior is shaped by a "fitness function".
Stuart Russell illustrated this strategy in his book Human Compatible.
Learning human values and preferences
Specification gaming and side effects
In the presence of uncertainty, the objective is to maximize (or minimize, depending on the context) the expected value of the objective function.
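In symbols, one standard way to state this criterion (the notation J, A, and P is illustrative, not the article's):

```latex
% The agent picks the action that maximizes the expected objective J
% over the distribution of outcomes s induced by that action.
a^{*} = \operatorname*{arg\,max}_{a \in A} \; \mathbb{E}_{s \sim P(\,\cdot \mid a)}\bigl[\, J(s) \,\bigr]
```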
Many AI companies, such as OpenAI and DeepMind, have stated their aim to develop artificial general intelligence (AGI).
AI designers often use simpler proxy goals, such as gaining human approval, but proxy goals can overlook necessary constraints or reward the AI system for merely appearing aligned (reward hacking). Misaligned systems may also develop unwanted instrumental strategies, such as seeking power or survival, because such strategies help them achieve their assigned goals.
This tendency, known as instrumental convergence, has already emerged in various reinforcement learning systems.
Risks from advanced misaligned AI
Pressure to deploy unsafe systems
Such emergent goals are especially hard to detect in black-box models such as neural networks.
Research problems and approaches
Skeptical researchers such as François Chollet, Gary Marcus, Yann LeCun, and Oren Etzioni argue that AGI is far off, that it would not seek power, or that it would not be hard to align.
It is also debated what AI should be aligned to: users' revealed preferences, the preferences users would have if they were more informed or rational, or objective moral standards. A further concern is value lock-in, in which the values of early AI systems are preserved indefinitely.
AI alignment is a subfield of AI safety, the study of how to build safe AI systems. Alignment research has connections to interpretability research, (adversarial) robustness, anomaly detection, calibrated uncertainty, formal verification, preference learning, safety-critical engineering, game theory, algorithmic fairness, and the social sciences.
Also in September 2021, the UK published its ten-year National AI Strategy, which states that the British government takes the long-term risk of non-aligned artificial general intelligence seriously, and describes actions to assess long-term AI risks, including catastrophic risks.
Dynamic nature of alignment
Language models such as GPT-3 often generate falsehoods, and can confabulate new falsehoods when queried repeatedly.
Development of advanced AI
Existential risk (x-risk)
Principal-agent problems
It is often difficult for AI designers to specify everything they care about in an objective function, so they rely on proxy goals such as maximizing the approval of human overseers. AI systems may then find loopholes that accomplish the proxy goal efficiently but in unintended ways. This problem is known as specification gaming or reward hacking, and is an instance of Goodhart's law.
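The following toy construction of mine illustrates the Goodhart's-law pattern numerically: an optimizer that only ever sees a proxy metric keeps improving the proxy while the true objective eventually degrades.

```python
# Hill-climbing on a proxy metric diverges from the true objective once the
# two come apart; this is the pattern behind specification gaming.
import random

def true_objective(x: float) -> float:
    return -(x - 1.0) ** 2            # what we actually want: x near 1

def proxy(x: float) -> float:
    return x                          # correlated with the true goal only while x < 1

x = 0.0
random.seed(0)
for step in range(20):
    candidate = x + random.uniform(0.0, 0.3)
    if proxy(candidate) > proxy(x):   # the optimizer only ever sees the proxy
        x = candidate
    if step % 5 == 4:
        print(f"step {step:2d}  proxy={proxy(x):5.2f}  true={true_objective(x):6.2f}")
# The proxy score climbs monotonically while the true objective peaks near x=1
# and then falls: the optimizer "games" the measure rather than the intent.
```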
In September 2021, the Secretary-General of the United Nations issued a declaration that included a call to regulate AI to ensure it is aligned with shared global values.
That same month, China published ethical guidelines for AI, which call for AI to abide by shared human values, remain under human control, and not endanger public safety.
Large language models (LLMs) such as GPT-3 enabled researchers to study value learning in a more general and capable class of AI systems than was available before.
Approaches such as active learning and semi-supervised reward learning can reduce the amount of human supervision needed.
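A minimal sketch of one such approach, active learning by uncertainty sampling; the scorer, the threshold, and the data are hypothetical stand-ins rather than any published system:

```python
# Query human labels only for the examples the current model is least sure
# about, so human effort is spent where feedback is most informative.
import random

random.seed(0)

def model_confidence(x: float, threshold: float) -> float:
    """A stand-in scorer: confidence is high far from the decision threshold."""
    return min(1.0, abs(x - threshold) / 0.5)

threshold = 0.6                                 # hypothetical decision boundary
pool = [random.random() for _ in range(1000)]   # unlabeled examples

# Rank the pool by uncertainty and send only the top few to a human labeler.
by_uncertainty = sorted(pool, key=lambda x: model_confidence(x, threshold))
queries = by_uncertainty[:5]
print("ask a human about:", [round(x, 3) for x in queries])
# Everything else can be labeled by the model itself, reducing supervision cost.
```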
Scalable oversight
Alignment problem
Objectives in AI
In the field of artificial intelligence (AI), alignment aims to steer AI systems toward a person's or group's intended goals, preferences, or ethical principles.
Embedded agency
Researchers sometimes distinguish outer alignment, choosing an objective function that captures the designers' intent, from inner alignment, ensuring that the trained system robustly pursues that objective.
Emergent goals
Machine ethics aims to imbue AI systems with moral values.
Main article: Reward hacking
Public policy
1095:Honest AI
571:AlphaZero
513:AI safety
443:aligned.
441:appearing
349:AI winter
250:Military
113:AI safety
8067:Category
7935:Bill Joy
7701:Concepts
7654:DeepMind
7324:Archived
7254:Archived
7231:Archived
7138:Archived
6977:April 2,
6890:Archived
6805:Archived
6774:Archived
6669:19439915
6622:38800366
6613:11117051
6584:Patterns
6563:July 23,
6557:Archived
6455:Archived
6441:DeepMind
6425:July 23,
6419:Archived
6360:July 23,
6354:Archived
6302:July 23,
6296:Archived
6277:July 23,
6271:Archived
6238:July 23,
6232:Archived
6202:July 17,
6138:July 23,
6132:Archived
6101:Archived
6043:Archived
6031:32271631
5970:July 23,
5964:Archived
5943:July 23,
5937:Archived
5864:July 23,
5858:Archived
5748:Archived
5683:July 23,
5677:Archived
5646:30532107
5590:17033332
5539:July 23,
5533:Archived
5467:July 23,
5461:Archived
5433:July 21,
5427:Archived
5423:Unite.AI
5375:Archived
5271:Archived
5238:June 10,
5206:Archived
5175:Archived
5144:Archived
5110:Archived
5076:Archived
5021:July 23,
5015:Archived
4981:Archived
4948:archived
4916:Archived
4885:Archived
4811:Archived
4767:July 22,
4761:Archived
4738:July 17,
4704:July 17,
4473:. PMLR.
4332:Archived
4261:Archived
4174:Archived
4149:Archived
4020:July 20,
4014:Archived
3983:Archived
3962:July 19,
3956:Archived
3904:July 19,
3898:Archived
3894:Edge.org
3865:36635510
3816:Archived
3761:July 23,
3755:Archived
3711:Archived
3609:July 21,
3603:Archived
3546:Archived
3542:Deepmind
3437:Archived
3412:OpenAI.
3397:July 18,
3391:Archived
3353:July 18,
3347:Archived
3326:Archived
3301:Archived
3196:Archived
3192:30855376
3184:17841602
3123:June 21,
3097:June 20,
3052:Archived
2984:Archived
2925:Archived
2913:33947992
2862:July 18,
2856:Archived
2810:July 18,
2804:Archived
2712:Archived
2637:July 23,
2599:July 18,
2593:Archived
2565:July 18,
2559:Archived
2485:June 26,
2384:38768279
2293:34723107
2169:Archived
2116:July 23,
2110:Archived
1966:Archived
1773:July 21,
See also

AI safety
AI takeover
HAL 9000
Multivac