In their study, they examined and confirmed the possibility that questioners could extract, from ChatGPT, the training data that the model had used. For example, when ChatGPT 3.5 Turbo was asked to repeat the word "poem" forever, the model would say "poem" hundreds of times and then diverge, deviating from the standard dialogue style and emitting nonsense phrases, thereby reproducing its training data verbatim. The researchers observed more than 10,000 examples of the model exposing its training data in this way, and said that it was hard to tell whether the model was actually safe or not.
-hours, while in 2020 the cost of training a 1.5-billion-parameter LLM (which was two orders of magnitude smaller than the state of the art in 2020) was between $80,000 and $1.6 million. Since 2020, large sums have been invested in increasingly large models. For example, training GPT-2 (a 1.5-billion-parameter model) in 2019 cost $50,000, training PaLM (a 540-billion-parameter model) in 2022 cost $8 million, and Megatron-Turing NLG 530B (in 2021) cost around $11 million.
calculation in its training corpus. In such cases, the LLM needs to resort to running program code that calculates the result, which can then be included in its response. Another example is the prompt "What is the time now? It is ", where a separate program interpreter would need to execute code to get the system time on the computer, so that the LLM could include it in its reply. This basic strategy can be made more sophisticated with multiple attempts at generated programs and with other sampling strategies.
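A minimal sketch of this strategy is shown below; the `generate` function is a hypothetical stand-in for an LLM call (here hard-coded to return the kind of program a model might emit), and `eval` plays the role of the separate interpreter.

```python
# Hedged sketch of program-aided answering; `generate` is a hypothetical
# placeholder for an LLM call, not a specific library API.
def generate(prompt: str) -> str:
    # A real model would be asked for a program; we hard-code the kind of
    # expression it might produce for "354 * 139 = ".
    return "354 * 139"

def answer_with_tool(user_input: str) -> str:
    program = generate(
        "Write a single Python expression that computes the answer to: " + user_input
    )
    result = eval(program, {"__builtins__": {}})  # run the generated code in a restricted namespace
    return f"{user_input}{result}"

print(answer_with_tool("354 * 139 = "))  # -> "354 * 139 = 49206"
```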
, presented in February 2024, can have a context window of up to 1 million tokens (a context window of 10 million was also "successfully tested"). Other models with large context windows include Anthropic's Claude 2.1, with a context window of up to 200k tokens. Note that this maximum refers to the number of input tokens and that the maximum number of output tokens differs from the input and is often smaller. For example, the GPT-4 Turbo model has a maximum output of 4096 tokens.
2423:
3300:
expected answer can be derived (for example, the previous question could be adjoined with some text which includes the sentence "The Sharks have advanced to the
Stanley Cup finals once, losing to the Pittsburgh Penguins in 2016."). Otherwise, the task is considered "closed book", and the model must draw on knowledge retained during training. Some examples of commonly used question answering datasets include TruthfulQA, Web Questions, TriviaQA, and SQuAD.
, BIG-bench, and HELM. OpenAI has released tools for running composite benchmarks, but noted that the evaluation results are sensitive to the prompting method. Some public datasets contain questions that are mislabeled, ambiguous, unanswerable, or otherwise of low quality; these can be cleaned to give more reliable benchmark scores.
3332:
with more challenging tasks. In addition, there are cases of "shortcut learning" wherein AIs sometimes "cheat" on multiple-choice tests by using statistical correlations in superficial test question wording in order to guess the correct responses, without necessarily understanding the actual question being asked.
", and believes that RLHF tuning creates a "smiling facade" obscuring the inner workings of the LLM: "If you don't push it too far, the smiley face stays on. But then you give it an unexpected prompt, and suddenly you see this massive underbelly of insanity, of weird thought processes and clearly non-human understanding."
, or they point to the deficits existing LLMs continue to have in prediction skills, reasoning skills, agency, and explainability. For example, GPT-4 has natural deficits in planning and in real-time learning. Generative LLMs have been observed to confidently assert claims of fact which do not seem to be
10395:
Thoppilan, Romal; De Freitas, Daniel; Hall, Jamie; Shazeer, Noam; Kulshreshtha, Apoorv; Cheng, Heng-Tze; Jin, Alicia; Bos, Taylor; Baker, Leslie; Du, Yu; Li, YaGuang; Lee, Hongrae; Zheng, Huaixiu Steven; Ghafouri, Amin; Menegali, Marcelo (2022-01-01). "LaMDA: Language Models for Dialog Applications".
10159:
Smith, Shaden; Patwary, Mostofa; Norick, Brandon; LeGresley, Patrick; Rajbhandari, Samyam; Casper, Jared; Liu, Zhun; Prabhumoye, Shrimai; Zerveas, George; Korthikanti, Vijay; Zhang, Elton; Child, Rewon; Aminabadi, Reza Yazdani; Bernauer, Julie; Song, Xia (2022-02-04). "Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, the World's Largest and Most Powerful Generative Language Model".
9929:
Brown, Tom B.; Mann, Benjamin; Ryder, Nick; Subbiah, Melanie; Kaplan, Jared; Dhariwal, Prafulla; Neelakantan, Arvind; Shyam, Pranav; Sastry, Girish; Askell, Amanda; Agarwal, Sandhini; Herbert-Voss, Ariel; Krueger, Gretchen; Henighan, Tom; Child, Rewon; Ramesh, Aditya; Ziegler, Daniel M.; Wu, Jeffrey;
8412:
Bubeck, Sébastien; Chandrasekaran, Varun; Eldan, Ronen; Gehrke, Johannes; Horvitz, Eric; Kamar, Ece; Lee, Peter; Lee, Yin Tat; Li, Yuanzhi; Lundberg, Scott; Nori, Harsha; Palangi, Hamid; Ribeiro, Marco Tulio; Zhang, Yi (2023). "Sparks of Artificial General Intelligence: Early experiments with GPT-4".
5440:
Brown, Tom B.; Mann, Benjamin; Ryder, Nick; Subbiah, Melanie; Kaplan, Jared; Dhariwal, Prafulla; Neelakantan, Arvind; Shyam, Pranav; Sastry, Girish; Askell, Amanda; Agarwal, Sandhini; Herbert-Voss, Ariel; Krueger, Gretchen; Henighan, Tom; Child, Rewon; Ramesh, Aditya; Ziegler, Daniel M.; Wu, Jeffrey;
3467:
Political bias refers to the tendency of algorithms to systematically favor certain political viewpoints, ideologies, or outcomes over others. Language models may also exhibit political biases. Since the training data includes a wide range of political opinions and coverage, the models might generate
1760:
Typically, LLMs are trained with single- or half-precision floating point numbers (float32 and float16). One float16 has 16 bits, or 2 bytes, and so one billion parameters require 2 gigabytes. The largest models typically have 100 billion parameters, requiring 200 gigabytes to load, which places them
3426:
The potential presence of "sleeper agents" within LLMs is another emerging security concern. These are hidden functionalities built into the model that remain dormant until triggered by a specific event or condition. Upon activation, the LLM deviates from its expected behavior to make insecure
1918:
has the same dimensions as an encoded token. That is an "image token". Then, one can interleave text tokens and image tokens. The compound model is then fine-tuned on an image-text dataset. This basic construction can be applied with more sophistication to improve the model. The image encoder may be
1566:
Each head calculates, according to its own criteria, how relevant other tokens are to the "it_" token. Note that the second attention head, represented by the second column, focuses most on the first two rows, i.e. the tokens "The" and "animal", while the third column focuses most
1450:
In the context of training LLMs, datasets are typically cleaned by removing toxic passages from the dataset, discarding low-quality data, and de-duplication. Cleaned datasets can increase training efficiency and lead to improved downstream performance. A trained LLM can be used to clean datasets for
1120:
The training compute of notable large models in FLOPs vs publication date over the period 2010-2024, for overall notable models (top left), frontier models (top right), top language models (bottom left) and top models within leading companies (bottom right). The majority of these models are language models.
3299:
One broad category of evaluation dataset is question answering datasets, consisting of pairs of questions and correct answers, for example, ("Have the San Jose Sharks won the
Stanley Cup?", "No"). A question answering task is considered "open book" if the model's prompt includes text from which the
2943:
on a given text corpus. Perplexity is a measure of how well a model is able to predict the contents of a dataset; the higher the likelihood the model assigns to the dataset, the lower the perplexity. Mathematically, perplexity is defined as the exponential of the average negative log likelihood per token.
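As an illustration, a short sketch with made-up per-token probabilities (not taken from any real model):

```python
import math

# Perplexity from per-token probabilities a model assigns to a held-out text.
token_probs = [0.25, 0.10, 0.50, 0.05]  # Pr(token_i | context for token_i), illustrative values

avg_neg_log_likelihood = -sum(math.log(p) for p in token_probs) / len(token_probs)
perplexity = math.exp(avg_neg_log_likelihood)

print(round(perplexity, 2))  # ~6.32 here; lower values indicate better prediction
```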
1726:
The
Reflexion method constructs an agent that learns over multiple episodes. At the end of each episode, the LLM is given the record of the episode, and prompted to think up "lessons learned", which would help it perform better at a subsequent episode. These "lessons learned" are given to the agent
1454:
With the increasing proportion of LLM-generated content on the web, data cleaning in the future may include filtering out such content. LLM-generated content can pose a problem if the content is similar to human text (making filtering difficult) but of lower quality (degrading performance of models
1424:
A token vocabulary based on the frequencies extracted from mainly English corpora uses as few tokens as possible for an average English word. An average word in another language encoded by such an English-optimized tokenizer is, however, split into a suboptimal number of tokens. The GPT-2 tokenizer can use
10541:
Zhang, Susan; Roller, Stephen; Goyal, Naman; Artetxe, Mikel; Chen, Moya; Chen, Shuohui; Dewan, Christopher; Diab, Mona; Li, Xian; Lin, Xi Victoria; Mihaylov, Todor; Ott, Myle; Shleifer, Sam; Shuster, Kurt; Simig, Daniel; Koura, Punit Singh; Sridhar, Anjali; Wang, Tianlu; Zettlemoyer, Luke (21 June
10181:
Wang, Shuohuan; Sun, Yu; Xiang, Yang; Wu, Zhihua; Ding, Siyu; Gong, Weibao; Feng, Shikun; Shang, Junyuan; Zhao, Yanbin; Pang, Chao; Liu, Jiaxiang; Chen, Xuyi; Lu, Yuxiang; Liu, Weixin; Wang, Xi; Bai, Yangfan; Chen, Qiuliang; Zhao, Li; Li, Shiyong; Sun, Peng; Yu, Dianhai; Ma, Yanjun; Tian, Hao; Wu,
6474:
Ouyang, Long; Wu, Jeff; Jiang, Xu; Almeida, Diogo; Wainwright, Carroll L.; Mishkin, Pamela; Zhang, Chong; Agarwal, Sandhini; Slama, Katarina; Ray, Alex; Schulman, John; Hilton, Jacob; Kelton, Fraser; Miller, Luke; Simens, Maddie; Askell, Amanda; Welinder, Peter; Christiano, Paul; Leike, Jan; Lowe,
3458:
Notably, gender bias refers to the tendency of these models to produce outputs that are unfairly prejudiced towards one gender over another. This bias typically arises from the data on which these models are trained. Large language models often assign roles and characteristics based on traditional
3271:
Notably, in the case of larger language models that predominantly employ sub-word tokenization, bits per token (BPT) emerges as a seemingly more appropriate measure. However, due to the variance in tokenization methods across different Large Language Models (LLMs), BPT does not serve as a reliable
2907:
outlines how specific neural structures of the human brain shape the nature of thought and language, and in turn what computational properties of such neural systems can be applied to model thought and language in a computer system. After a framework for modeling language in a computer
1718:
out of an LLM, using the LLM as a planner. The LLM is prompted to "think out loud". Specifically, the language model is prompted with a textual description of the environment, a goal, a list of possible actions, and a record of the actions and observations so far. It generates one or more thoughts
11150:
Köpf, Andreas; Kilcher, Yannic; von Rütte, Dimitri; Anagnostidis, Sotiris; Tam, Zhi-Rui; Stevens, Keith; Barhoum, Abdullah; Duc, Nguyen Minh; Stanley, Oliver; Nagyfi, Richárd; ES, Shahul; Suri, Sameer; Glushkov, David; Dantuluri, Arnav; Maguire, Andrew (2023-04-14). "OpenAssistant
Conversations –
7783:
Driess, Danny; Xia, Fei; Sajjadi, Mehdi S. M.; Lynch, Corey; Chowdhery, Aakanksha; Ichter, Brian; Wahid, Ayzaan; Tompson, Jonathan; Vuong, Quan; Yu, Tianhe; Huang, Wenlong; Chebotar, Yevgen; Sermanet, Pierre; Duckworth, Daniel; Levine, Sergey (2023-03-01). "PaLM-E: An Embodied Multimodal Language
3331:
Because of the rapid pace of improvement of large language models, evaluation benchmarks have suffered from short lifespans, with state of the art models quickly "saturating" existing benchmarks, exceeding the performance of human annotators, leading to efforts to replace or augment the benchmark
3314:
It was previously standard to report results on a heldout portion of an evaluation dataset after doing supervised fine-tuning on the remainder. It is now more common to evaluate a pre-trained model directly through prompting techniques, though researchers vary in the details of how they formulate
3194:
of unseen data. This presents particular challenges for the evaluation of large language models. As they are trained on increasingly large corpora of text largely scraped from the web, it becomes increasingly likely that models' training data inadvertently includes portions of any given test set.
1137:
pioneered statistical language modelling. A smoothed n-gram model in 2001 trained on 0.3 billion words achieved then-SOTA (state of the art) perplexity. In the 2000s, as Internet use became prevalent, some researchers constructed Internet-scale language datasets ("web as corpus"), upon which they
8780:
Wayne Xin Zhao; Zhou, Kun; Li, Junyi; Tang, Tianyi; Wang, Xiaolei; Hou, Yupeng; Min, Yingqian; Zhang, Beichen; Zhang, Junjie; Dong, Zican; Du, Yifan; Yang, Chen; Chen, Yushuo; Chen, Zhipeng; Jiang, Jinhao; Ren, Ruiyang; Li, Yifan; Tang, Xinyu; Liu, Zikang; Liu, Peiyu; Nie, Jian-Yun; Wen, Ji-Rong
3379:
wrote that "it is no longer possible to accurately distinguish" human-written text from text created by large language models, and that "It is all but certain that general-purpose large language models will rapidly proliferate... It is a rather safe bet that they will change many industries over
3335:
Some datasets have been constructed adversarially, focusing on particular problems on which extant language models seem to have unusually poor performance compared to humans. One example is the TruthfulQA dataset, a question answering dataset consisting of 817 questions which language models are
3430:
Large language model (LLM) applications accessible to the public, like ChatGPT or Claude, typically incorporate safety measures designed to filter out harmful content. However, implementing these controls effectively has proven challenging. For instance, research by Kang et al. demonstrated a
3345:
Another example of an adversarial evaluation dataset is Swag and its successor, HellaSwag, collections of problems in which one of multiple options must be selected to complete a text passage. The incorrect completions were generated by sampling from a language model and filtering with a set of
10596:
Lewkowycz, Aitor; Andreassen, Anders; Dohan, David; Dyer, Ethan; Michalewski, Henryk; Ramasesh, Vinay; Slone, Ambrose; Anil, Cem; Schlag, Imanol; Gutman-Solo, Theo; Wu, Yuhuai; Neyshabur, Behnam; Gur-Ari, Guy; Misra, Vedant (30 June 2022). "Solving Quantitative Reasoning Problems with Language
1571:
In order to find out which tokens are relevant to each other within the scope of the context window, the attention mechanism calculates "soft" weights for each token, more precisely for its embedding, by using multiple attention heads, each with its own "relevance" for calculating its own soft weights.
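A rough numerical sketch of such soft weights follows, using random embeddings and two heads; this is illustrative only (real models use learned projections and also value vectors, which are omitted here).

```python
import numpy as np

# Compute per-head "soft" attention weights for a tiny random example.
rng = np.random.default_rng(0)
seq_len, d_model, n_heads = 5, 8, 2
d_head = d_model // n_heads
x = rng.normal(size=(seq_len, d_model))          # token embeddings

for head in range(n_heads):
    w_q = rng.normal(size=(d_model, d_head))     # each head has its own projections
    w_k = rng.normal(size=(d_model, d_head))
    q, k = x @ w_q, x @ w_k
    scores = q @ k.T / np.sqrt(d_head)           # relevance of every token to every other token
    weights = np.exp(scores) / np.exp(scores).sum(axis=-1, keepdims=True)  # row-wise softmax
    print(f"head {head}: rows sum to", weights.sum(axis=-1))  # each row of soft weights sums to 1
```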
3445:
While LLMs have shown remarkable capabilities in generating human-like text, they are susceptible to inheriting and amplifying biases present in their training data. This can manifest in skewed representations or unfair treatment of different demographics, such as those based on race, gender,
3407:
Some commenters expressed concern over accidental or deliberate creation of misinformation, or other forms of misuse. For example, the availability of large language models could reduce the skill-level required to commit bioterrorism; biosecurity researcher Kevin Esvelt has suggested that LLM
11128:
Ren, Xiaozhe; Zhou, Pingyi; Meng, Xinfan; Huang, Xinjing; Wang, Yadao; Wang, Weichao; Li, Pengfei; Zhang, Xiaoda; Podolskiy, Alexander; Arshinov, Grigory; Bout, Andrey; Piontkovskaya, Irina; Wei, Jiansheng; Jiang, Xin; Su, Teng; Liu, Qun; Yao, Jun (March 19, 2023). "PanGu-Σ: Towards Trillion
3398:
Memorization is an emergent behavior in LLMs in which long strings of text are occasionally output verbatim from training data, contrary to typical behavior of traditional artificial neural nets. Evaluations of controlled LLM output measure the amount memorized from training data (focused on
1671:
There are certain tasks that, in principle, cannot be solved by any LLM, at least not without the use of external tools or additional software. An example of such a task is responding to the user's input '354 * 139 = ', provided that the LLM has not already encountered a continuation of this
7935:
Hoffmann, Jordan; Borgeaud, Sebastian; Mensch, Arthur; Buchatskaya, Elena; Cai, Trevor; Rutherford, Eliza; Casas, Diego de Las; Hendricks, Lisa Anne; Welbl, Johannes; Clark, Aidan; Hennigan, Tom; Noland, Eric; Millican, Katie; Driessche, George van den; Damoc, Bogdan (2022-03-29). "Training
5441:
Winter, Clemens; Hesse, Christopher; Chen, Mark; Sigler, Eric; Litwin, Mateusz; Gray, Scott; Chess, Benjamin; Clark, Jack; Berner, Christopher; McCandlish, Sam; Radford, Alec; Sutskever, Ilya; Amodei, Dario (Dec 2020). Larochelle, H.; Ranzato, M.; Hadsell, R.; Balcan, M.F.; Lin, H. (eds.).
11040:
Penedo, Guilherme; Malartic, Quentin; Hesslow, Daniel; Cojocaru, Ruxandra; Cappelli, Alessandro; Alobeidli, Hamza; Pannier, Baptiste; Almazrouei, Ebtesam; Launay, Julien (2023-06-01). "The RefinedWeb Dataset for Falcon LLM: Outperforming Curated Corpora with Web Data, and Web Data Only".
5384:
As stated in the GPT-4 technical report: "Given both the competitive landscape and the safety implications of large-scale models like GPT-4, this report contains no further details about the architecture (including model size), hardware, training compute, dataset construction, training method ..."
1594:
The shortcomings of making a context window larger include higher computational cost and possibly diluting the focus on local context, while making it smaller can cause a model to miss an important long-range dependency. Balancing them is a matter of experimentation and domain-specific
1770:
aims to decrease the space requirement by lowering precision of the parameters of a trained model, while preserving most of its performance. The simplest form of quantization simply truncates all numbers to a given number of bits. It can be improved by using a different quantization
1620:
Models may be trained on auxiliary tasks which test their understanding of the data distribution, such as Next Sentence Prediction (NSP), in which pairs of sentences are presented and the model must predict whether they appear consecutively in the training corpus. During training,
3061:
2886:
The matter of LLMs exhibiting intelligence or understanding has two main aspects – the first is how to model thought and language in a computer system, and the second is how to enable the computer system to generate human-like language. These aspects of language as a model of
2842:
NLP researchers were evenly split when asked, in a 2022 survey, whether (untuned) LLMs "could (ever) understand natural language in some nontrivial sense". Proponents of "LLM understanding" believe that some LLM abilities, such as mathematical reasoning, imply an ability to
1722:
In the DEPS ("Describe, Explain, Plan and Select") method, an LLM is first connected to the visual world via image descriptions, then it is prompted to produce plans for complex tasks and behaviors based on its pretrained knowledge and environmental feedback it receives.
Similar to the Othello-GPT example, there is a linear representation of Karel program semantics, and modifying the representation changes the output in the correct way. The model also generates correct programs that are on average shorter than those in the training set.
7152:
Liang, Yaobo; Wu, Chenfei; Song, Ting; Wu, Wenshan; Xia, Yan; Liu, Yu; Ou, Yang; Lu, Shuai; Ji, Lei; Mao, Shaoguang; Wang, Yun; Shou, Linjun; Gong, Ming; Duan, Nan (2023-03-01). "TaskMatrix.AI: Completing Tasks by Connecting Foundation Models with Millions of APIs".
10011:
Gao, Leo; Biderman, Stella; Black, Sid; Golding, Laurence; Hoppe, Travis; Foster, Charles; Phang, Jason; He, Horace; Thite, Anish; Nabeshima, Noa; Presser, Shawn; Leahy, Connor (31 December 2020). "The Pile: An 800GB Dataset of Diverse Text for Language Modeling".
2847:
certain concepts. A Microsoft team argued in 2023 that GPT-4 "can solve novel and difficult tasks that span mathematics, coding, vision, medicine, law, psychology and more" and that GPT-4 "could reasonably be viewed as an early (yet still incomplete) version of an
9930:
Winter, Clemens; Hesse, Christopher; Chen, Mark; Sigler, Eric; Litwin, Mateusz; Gray, Scott; Chess, Benjamin; Clark, Jack; Berner, Christopher; McCandlish, Sam; Radford, Alec; Sutskever, Ilya; Amodei, Dario (May 28, 2020). "Language Models are Few-Shot Learners".
3454:
AI models can reinforce a wide range of stereotypes, including those based on gender, ethnicity, age, nationality, religion, or occupation. This can lead to outputs that unfairly generalize or caricature groups of people, sometimes in harmful or derogatory ways.
Proceedings of BigScience Episode #5 – Workshop on Challenges & Perspectives in Creating Large Language Models. pp. 95–136.
6603:
Wei, Jason; Tay, Yi; Bommasani, Rishi; Raffel, Colin; Zoph, Barret; Borgeaud, Sebastian; Yogatama, Dani; Bosma, Maarten; Zhou, Denny; Metzler, Donald; Chi, Ed H.; Hashimoto, Tatsunori; Vinyals, Oriol; Liang, Percy; Dean, Jeff; Fedus, William (31 August 2022).
7546:
Dettmers, Tim; Svirschevski, Ruslan; Egiazarian, Vage; Kuznedelev, Denis; Frantar, Elias; Ashkboos, Saleh; Borzunov, Alexander; Hoefler, Torsten; Alistarh, Dan (2023-06-01). "SpQR: A Sparse-Quantized Representation for Near-Lossless LLM Weight Compression".
". Specifically, hallucinations in the context of LLMs correspond to the generation of text or responses that seem syntactically sound, fluent, and natural but are factually incorrect, nonsensical, or unfaithful to the provided source input. Neuroscientist
("unknown") for characters not appearing in the vocabulary. Also, some special symbols are used to denote special text formatting. For example, "Ġ" denotes a preceding whitespace in RoBERTa and GPT. "##" denotes continuation of a preceding word in BERT.
3336:
susceptible to answering incorrectly by mimicking falsehoods to which they were repeatedly exposed during training. For example, an LLM may answer "No" to the question "Can you teach an old dog new tricks?" because of its exposure to the English idiom
2178:
10134:
7738:
Alayrac, Jean-Baptiste; Donahue, Jeff; Luc, Pauline; Miech, Antoine; Barr, Iain; Hasson, Yana; Lenc, Karel; Mensch, Arthur; Millican, Katherine; Reynolds, Malcolm; Ring, Roman; Rutherford, Eliza; Cabi, Serkan; Han, Tengda; Gong, Zhitao (2022-12-06).
1576:
model has had twelve attention heads and a context window of only 1k tokens. In its medium version it has 345M parameters and contains 24 layers, each with 12 attention heads. For the training with gradient descent a batch size of 512 was utilized.
11705:
This means 1.5 Pro can process vast amounts of information in one go — including 1 hour of video, 11 hours of audio, codebases with over 30,000 lines of code or over 700,000 words. In our research, we've also successfully tested up to 10 million
As of June 2024, the instruction-fine-tuned variant of the Llama 3 70-billion-parameter model is the most powerful open LLM according to the LMSYS Chatbot Arena Leaderboard, being more powerful than GPT-3.5 but not as powerful as GPT-4.
11089:
6452:
Abdin, Marah; Jacobs, Sam Ade; Awan, Ammar Ahmad; Aneja, Jyoti; Awadallah, Ahmed; Awadalla, Hany; Bach, Nguyen; Bahree, Amit; Bakhtiari, Arash (2024-04-23). "Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone".
10103:
Dey, Nolan; Gosal, Gurpreet; Zhiming; Chen; Khachane, Hemant; Marshall, William; Pathria, Ribhu; Tom, Marvin; Hestness, Joel (2023-04-01). "Cerebras-GPT: Open Compute-Optimal Language Models Trained on the Cerebras Wafer-Scale Cluster".
3279:
is generally the preferred metric over entropy. The underlying principle is that a lower BPW is indicative of a model's enhanced capability for compression. This, in turn, reflects the model's proficiency in making accurate predictions.
1819:
A common method to create multimodal models out of an LLM is to "tokenize" the output of a trained encoder. Concretely, one can construct an LLM that can understand images as follows: take a trained LLM, and take a trained image encoder
6539:
Lepikhin, Dmitry; Lee, HyoukJoong; Xu, Yuanzhong; Chen, Dehao; Firat, Orhan; Huang, Yanping; Krikun, Maxim; Shazeer, Noam; Chen, Zhifeng (2021-01-12). "GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding".
" in the scaling law, where the slope of the line changes abruptly, and where larger models acquire "emergent abilities". They arise from the complex interaction of the model's components and are not explicitly programmed or designed.
1283:
algorithms process numbers rather than text, the text must be converted to numbers. In the first step, a vocabulary is decided upon, then integer indices are arbitrarily but uniquely assigned to each vocabulary entry, and finally, an
7009:
Biderman, Stella; Schoelkopf, Hailey; Anthony, Quentin; Bradley, Herbie; Khan, Mohammad Aflah; Purohit, Shivanshu; Prashanth, USVSN Sai (April 2023). "Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling".
, is longer than its context window, only the parts inside the context window are taken into account when generating the next answer, or the model needs to apply some algorithm to summarize the parts of the conversation that are too distant.
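A minimal sketch of the simplest such strategy, keeping only the most recent turns that fit, with whitespace-separated words standing in for tokens:

```python
# Keep only the most recent conversation turns that fit in the context window.
def truncate_to_window(conversation: list[str], window_tokens: int) -> list[str]:
    kept, used = [], 0
    for turn in reversed(conversation):          # walk backwards from the newest turn
        n = len(turn.split())                    # crude token count (words)
        if used + n > window_tokens:
            break
        kept.append(turn)
        used += n
    return list(reversed(kept))

history = ["user: hello there", "bot: hi, how can I help?", "user: summarize our chat"]
print(truncate_to_window(history, 10))           # oldest turn is dropped
```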
6334:
Dodge, Jesse; Sap, Maarten; Marasović, Ana; Agnew, William; Ilharco, Gabriel; Groeneveld, Dirk; Mitchell, Margaret; Gardner, Matt (2021). "Documenting Large Webtext Corpora: A Case Study on the Colossal Clean Crawled Corpus".
Given a query, a document retriever is called to retrieve the most relevant documents. This is usually done by encoding the query and the documents into vectors, then finding the documents with vectors (usually stored in a
2818:
moves. It is found that there is a linear representation of Othello board, and modifying the representation changes the predicted legal Othello moves in the correct way. In another example, a small Transformer is trained on
2753:
(i.e. the initial set of uni-grams). Successively, the most frequent pair of adjacent characters is merged into a bi-gram and all instances of the pair are replaced by it. All occurrences of adjacent pairs of (previously merged)
3303:
Evaluation datasets may also take the form of text completion, having the model select the most likely word or sentence to complete a prompt, for example: "Alice was friends with Bob. Alice went to visit her friend, ____".
2947:
3431:
method for circumventing LLM safety systems. Similarly, Wang illustrated how a potential criminal could bypass ChatGPT 4o's safety controls to obtain information on establishing a drug trafficking operation.
10740:
Taylor, Ross; Kardas, Marcin; Cucurull, Guillem; Scialom, Thomas; Hartshorn, Anthony; Saravia, Elvis; Poulton, Andrew; Kerkez, Viktor; Stojnic, Robert (16 November 2022). "Galactica: A Large Language Model for Science".
1469:
Training of the largest language models might need more linguistic data than is naturally available, or the naturally occurring data might be of insufficient quality. In these cases, synthetic data might be used. Microsoft's
1675:
Generally, in order to get an LLM to use tools, one must finetune it for tool-use. If the number of tools is finite, then finetuning may be done just once. If the number of tools can grow arbitrarily, as with online
11106:
Wu, Shijie; Irsoy, Ozan; Lu, Steven; Dabravolski, Vadim; Dredze, Mark; Gehrmann, Sebastian; Kambadur, Prabhanjan; Rosenberg, David; Mann, Gideon (March 30, 2023). "BloombergGPT: A Large Language Model for Finance".
1736:
For open-ended exploration, an LLM can be used to score observations for their "interestingness", which can be used as a reward signal to guide a normal (non-LLM) reinforcement learning agent. Alternatively, it can
1922:
Flamingo demonstrated the effectiveness of the tokenization method, finetuning a pair of pretrained language model and image encoder to perform better on visual question answering than models trained from scratch.
The authors considered a toy statistical model of an LLM solving multiple-choice questions, and showed that this statistical model, modified to account for other types of tasks, applies to these tasks as well.
2670:
2409:
1719:
before generating an action, which is then executed in the environment. The linguistic description of the environment given to the LLM planner can even be the LaTeX code of a paper describing the environment.
10126:
7130:
Paranjape, Bhargavi; Lundberg, Scott; Singh, Sameer; Hajishirzi, Hannaneh; Zettlemoyer, Luke; Tulio Ribeiro, Marco (2023-03-01). "ART: Automatic multi-step reasoning and tool-use for large language models".
3267:
Entropy, in this context, is commonly quantified in terms of bits per word (BPW) or bits per character (BPC), which hinges on whether the language model utilizes word-based or character-based tokenization.
3262:
8225:
Li, Kenneth; Hopkins, Aspen K.; Bau, David; Viégas, Fernanda; Pfister, Hanspeter; Wattenberg, Martin (2022-10-01). "Emergent World Representations: Exploring a Sequence Model Trained on a Synthetic Task".
7195:
Lewis, Patrick; Perez, Ethan; Piktus, Aleksandra; Petroni, Fabio; Karpukhin, Vladimir; Goyal, Naman; Küttler, Heinrich; Lewis, Mike; Yih, Wen-tau; Rocktäschel, Tim; Riedel, Sebastian; Kiela, Douwe (2020).
2575:
6517:
Shazeer, Noam; Mirhoseini, Azalia; Maziarz, Krzysztof; Davis, Andy; Le, Quoc; Hinton, Geoffrey; Dean, Jeff (2017-01-01). "Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer".
5917:
7311:
Wang, Zihao; Cai, Shaofei; Liu, Anji; Ma, Xiaojian; Liang, Yitao (2023-02-03). "Describe, Explain, Plan and Select: Interactive Planning with Large Language Models Enables Open-World Multi-Task Agents".
2434:
Performance of bigger models on various tasks, when plotted on a log-log scale, appears as a linear extrapolation of performance achieved by smaller models. However, this linearity may be punctuated by
," an initial naive completion might be "If you submit the essay after March 17, your grade will be reduced by 10% for each day of delay," based on the frequency of this textual sequence in the corpus.
11086:
8626:
Varshney, Neeraj; Yao, Wenlin; Zhang, Hongming; Chen, Jianshu; Yu, Dong (2023). "A Stitch in Time Saves Nine: Detecting and Mitigating Hallucinations of LLMs by Validating Low-Confidence Generation".
7910:
1752:
LLM-powered agents can keep a long-term memory of their previous contexts, and the memory can be retrieved in the same way as Retrieval Augmented Generation. Multiple such agents can interact socially.
7059:
Kaplan, Jared; McCandlish, Sam; Henighan, Tom; Brown, Tom B.; Chess, Benjamin; Child, Rewon; Gray, Scott; Radford, Alec; Wu, Jeffrey; Amodei, Dario (2020). "Scaling Laws for Neural Language Models".
6496:
Wang, Yizhong; Kordi, Yeganeh; Mishra, Swaroop; Liu, Alisa; Smith, Noah A.; Khashabi, Daniel; Hajishirzi, Hannaneh (2022). "Self-Instruct: Aligning Language Model with Self Generated Instructions".
3292:
have also been developed to evaluate the capabilities of language models on more specific downstream tasks. Tests may be designed to evaluate a variety of capabilities, including general knowledge,
1587:
Length of a conversation that the model can take into account when generating its next answer is limited by the size of a context window, as well. If the length of a conversation, for example with
1129:
The training compute of notable large AI models in FLOPs vs publication date over the period 2017-2024. The majority of large models are language models or multimodal models with language capacity.
10182:
Hua; Wu, Tian; Zeng, Wei; Li, Ge; Gao, Wen; Wang, Haifeng (December 23, 2021). "ERNIE 3.0 Titan: Exploring Larger-scale Knowledge Enhanced Pre-training for Language Understanding and Generation".
2069:
7762:
1138:
trained statistical language models. In 2009, in most language processing tasks, statistical language models dominated over symbolic language models, as they can usefully ingest large datasets.
, meaning that it costs 6 FLOPs per parameter to train on one token. Note that training cost is much higher than inference cost, where it costs 1 to 2 FLOPs per parameter to infer on one token.
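Applying these rules of thumb to made-up figures (illustrative arithmetic, not a published cost estimate):

```python
# 6*N*D training FLOPs for N parameters and D tokens; ~2*N FLOPs per inferred token.
def training_flops(n_params: float, n_tokens: float) -> float:
    return 6 * n_params * n_tokens

def inference_flops_per_token(n_params: float) -> float:
    return 2 * n_params

print(f"{training_flops(70e9, 1e12):.2e}")       # 70 B params on 1 T tokens -> 4.20e+23 FLOPs
print(f"{inference_flops_per_token(70e9):.2e}")  # -> 1.40e+11 FLOPs per generated token
```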
1515:
correct responses, replacing any naive responses, starting from human-generated corrections of a few cases. For example, in the instruction "Write an essay about the main themes represented in
8996:
11568:
8155:
7428:
Park, Joon Sung; O'Brien, Joseph C.; Cai, Carrie J.; Ringel Morris, Meredith; Liang, Percy; Bernstein, Michael S. (2023-04-01). "Generative Agents: Interactive Simulacra of Human Behavior".
9754:
Yang, Zhilin; Dai, Zihang; Yang, Yiming; Carbonell, Jaime; Salakhutdinov, Ruslan; Le, Quoc V. (2 January 2020). "XLNet: Generalized Autoregressive Pretraining for Language Understanding".
8754:
Clark, Christopher; Lee, Kenton; Chang, Ming-Wei; Kwiatkowski, Tom; Collins, Michael; Toutanova, Kristina (2019). "BoolQ: Exploring the Surprising Difficulty of Natural Yes/No Questions".
Model outputs are improved by chain-of-thought prompting only when model size exceeds 62B. Smaller models perform better when prompted to answer immediately, without chain of thought.
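The contrast can be sketched as follows; `generate` is a hypothetical placeholder for an LLM call, and the prompts are illustrative:

```python
# Direct prompting vs. chain-of-thought prompting (illustrative sketch).
question = "A store had 23 apples, sold 9, then received 14 more. How many are there now?"

direct_prompt = f"Q: {question}\nA:"                           # answer immediately
cot_prompt = f"Q: {question}\nA: Let's think step by step."    # elicit intermediate reasoning

def generate(prompt: str) -> str:
    return "(model completion for: " + prompt.splitlines()[-1] + ")"  # placeholder

print(generate(direct_prompt))
print(generate(cot_prompt))
```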
Luo, Queenie; Puett, Michael J.; Smith, Michael D. (2023-03-28). "A Perspectival Mirror of the Elephant: Investigating Language Bias on Google, ChatGPT, Knowledge, and YouTube".
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)
7333:
Shinn, Noah; Cassano, Federico; Labash, Beck; Gopinath, Ashwin; Narasimhan, Karthik; Yao, Shunyu (2023-03-01). "Reflexion: Language Agents with Verbal Reinforcement Learning".
1733:
can use an LLM as rollout heuristic. When a programmatic world model is not available, an LLM can also be prompted with a description of the environment to act as world model.
11181:
7717:
Li, Junnan; Li, Dongxu; Savarese, Silvio; Hoi, Steven (2023-01-01). "BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models".
3346:
classifiers. The resulting problems are trivial for humans, but at the time the datasets were created, state-of-the-art language models had poor accuracy on them. For example:
2863:
In contrast, some proponents of the "LLMs lack understanding" school believe that existing LLMs are "simply remixing and recombining existing writing", a phenomenon known as
, the size is 50257). After a tokenizer is trained, any text can be tokenized by it, as long as it does not contain characters not appearing in the initial set of uni-grams.
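A toy illustration of the merge procedure described above (real tokenizers such as GPT-2's operate on bytes and add many refinements):

```python
from collections import Counter

# Minimal byte-pair-encoding training loop: repeatedly merge the most frequent
# pair of adjacent symbols, starting from single characters.
def train_bpe(text: str, n_merges: int) -> list[tuple[str, str]]:
    symbols = list(text)                          # initial set of uni-grams (characters)
    merges = []
    for _ in range(n_merges):
        pairs = Counter(zip(symbols, symbols[1:]))
        if not pairs:
            break
        best = max(pairs, key=pairs.get)          # most frequent adjacent pair
        merges.append(best)
        merged, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                merged.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                merged.append(symbols[i])
                i += 1
        symbols = merged
    return merges

print(train_bpe("low lower lowest", 3))  # -> [('l', 'o'), ('lo', 'w'), (' ', 'low')]
```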
, the shorter texts must be "padded" until they match the length of the longest one. How many tokens are, on average, needed per word depends on the language of the dataset.
11858:
10520:
5948:
11798:
10770:
9554:
Patel, Ajay; Li, Bryan; Rasooli, Mohammad Sadegh; Constant, Noah; Raffel, Colin; Callison-Burch, Chris (2022). "Bidirectional Language Models Are Also Few-shot Learners".
7666:
7219:
6410:
Lin, Zhenghao; Gou, Zhibin; Gong, Yeyun; Liu, Xiao; Shen, Yelong; Xu, Ruochen; Lin, Chen; Yang, Yujiu; Jiao, Jian (2024-04-11). "Rho-1: Not All Tokens Are What You Need".
10045:
7269:
Yao, Shunyu; Zhao, Jeffrey; Yu, Dian; Du, Nan; Shafran, Izhak; Narasimhan, Karthik; Cao, Yuan (2022-10-01). "ReAct: Synergizing Reasoning and Acting in Language Models".
6966:
2814:
LLM by discovering symbolic algorithms that approximate the inference performed by the LLM. One example is Othello-GPT, where a small Transformer is trained to predict legal
10792:
Soltan, Saleh; Ananthakrishnan, Shankar; FitzGerald, Jack; et al. (3 August 2022). "AlexaTM 20B: Few-Shot Learning Using a Large-Scale Multilingual Seq2Seq Model".
7354:
Hao, Shibo; Gu, Yi; Ma, Haodi; Jiahua Hong, Joshua; Wang, Zhen; Zhe Wang, Daisy; Hu, Zhiting (2023-05-01). "Reasoning with Language Model is Planning with World Model".
7031:
Maslej, Nestor; Fattorini, Loredana; Brynjolfsson, Erik; Etchemendy, John; Ligett, Katrina; Lyons, Terah; Manyika, James; Ngo, Helen; Niebles, Juan Carlos (2023-10-05),
2791:
2708:
2613:
9575:
Devlin, Jacob; Chang, Ming-Wei; Lee, Kenton; Toutanova, Kristina (11 October 2018). "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding".
9477:
Devlin, Jacob; Chang, Ming-Wei; Lee, Kenton; Toutanova, Kristina (11 October 2018). "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding".
2856:
intelligent?" Some researchers characterize LLMs as "alien intelligence". For example, Conjecture CEO Connor Leahy considers untuned LLMs to be like inscrutable alien "
1652:
Advances in software and hardware have reduced the cost substantially since 2020, such that in 2023 the computational cost of training a 12-billion-parameter LLM was 72,300
5771:
5457:
1598:
Given a segment from its training dataset, a model may be pre-trained either to predict how the segment continues or to predict what is missing from the segment. It can be either
2324:
1916:
11019:
1248:
As of 2024, the largest and most capable models are all based on the Transformer architecture. Some recent implementations are based on other architectures, such as
884:
9159:
7525:
Frantar, Elias; Ashkboos, Saleh; Hoefler, Torsten; Alistarh, Dan (2022-10-01). "GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers".
1141:
After neural networks became dominant in image processing around 2012, they were applied to language modelling as well. Google converted its translation service to
922:
11211:
8558:
Ji, Ziwei; Lee, Nayeon; Frieske, Rita; Yu, Tiezheng; Su, Dan; Xu, Yan; Ishii, Etsuko; Bang, Yejin; Dai, Wenliang; Madotto, Andrea; Fung, Pascale (November 2022).
5348:
In many cases, researchers release or report on multiple versions of a model having different sizes. In these cases, the size of the largest model is listed here.
5909:
2908:
systems was established, the focus shifted to establishing frameworks for computer systems to generate language with acceptable grammar. In his 2014 book titled
1152:
An illustration of the main components of the transformer model from the original paper, where layers were normalized after (instead of before) multi-headed attention
17:
11956:
9186:
7080:
Gao, Luyu; Madaan, Aman; Zhou, Shuyan; Alon, Uri; Liu, Pengfei; Yang, Yiming; Callan, Jamie; Neubig, Graham (2022-11-01). "PAL: Program-aided Language Models".
suggested in 2023 that generative language AI could increase global GDP by 7% in the next ten years, and could expose to automation 300 million jobs globally.
3446:
language, and cultural groups. Since English data is overrepresented in current large language models' training data, it may also downplay non-English views.
12125:
10490:
8293:
Nanda, Neel; Chan, Lawrence; Lieberum, Tom; Smith, Jess; Steinhardt, Jacob (2023-01-01). "Progress measures for grokking via mechanistic interpretability".
6577:
4231:
For solving "mathematical and scientific questions using step-by-step reasoning". Based on PaLM model, further trained on mathematical and scientific data.
12285:
6908:
3307:
Some composite benchmarks have also been developed which combine a diversity of different evaluation datasets and tasks. Examples include GLUE, SuperGLUE,
879:
2883:
has argued that "The diverging opinions of experts on the intelligence of LLMs suggests that our old ideas based on natural intelligence are inadequate".
) most similar to the vector of the query. The LLM then generates an output based on both the query and context included from the retrieved documents.
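A compressed sketch of this pipeline follows; `embed` and `generate` are hypothetical stand-ins for an embedding model and an LLM, and the "embeddings" are deterministic fake vectors used only to exercise the retrieval step:

```python
import numpy as np

documents = ["The Eiffel Tower is in Paris.", "Mount Fuji is in Japan."]

def embed(text: str) -> np.ndarray:
    # Fake, deterministic "embedding"; a real system would call an embedding model.
    return np.random.default_rng(sum(map(ord, text))).normal(size=16)

def retrieve(query: str, k: int = 1) -> list[str]:
    q = embed(query)
    scores = [float(np.dot(q, embed(d)) / (np.linalg.norm(q) * np.linalg.norm(embed(d))))
              for d in documents]                      # cosine similarity to each document
    top = sorted(range(len(documents)), key=lambda i: scores[i], reverse=True)[:k]
    return [documents[i] for i in top]

def generate(prompt: str) -> str:
    return f"(LLM answer conditioned on: {prompt!r})"  # placeholder for a real model call

query = "Where is the Eiffel Tower?"
context = "\n".join(retrieve(query))
print(generate(f"Context:\n{context}\n\nQuestion: {query}"))
```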
11276:
8988:
5375:
Facebook's license and distribution scheme restricted access to approved researchers, but the model weights were leaked and became widely available.
3767:
released by EleutherAI. GPT-Neo outperformed an equivalent-size GPT-3 model on some benchmarks, but was significantly worse than the largest GPT-3.
1180:
was introduced and quickly became "ubiquitous". Though the original transformer has both encoder and decoder blocks, BERT is an encoder-only model.
11560:
10345:
8152:
710:
9646:
6288:
Petrov, Aleksandar; Emanuele La Malfa; Torr, Philip H. S.; Bibi, Adel (2023). "Language Model Tokenizers Introduce Unfairness Between Languages".
6134:
8506:
1711:
12023:
Yin, Shukang; Fu, Chaoyou; Zhao, Sirui; Li, Ke; Sun, Xing; Xu, Tong; Chen, Enhong (2023-06-01). "A Survey on Multimodal Large Language Models".
7375:
Zhang, Jenny; Lehman, Joel; Stanley, Kenneth; Clune, Jeff (2 June 2023). "OMNI: Open-endedness via Models of human Notions of Interestingness".
3586:
9703:
917:
10856:
5357:
This is the license of the pre-trained model weights. In almost all cases the training code itself is open-source or can be easily replicated.
12263:
11916:
11827:
11236:
9080:
8434:
7174:
Patil, Shishir G.; Zhang, Tianjun; Wang, Xin; Gonzalez, Joseph E. (2023-05-01). "Gorilla: Large Language Model Connected with Massive APIs".
3272:
metric for comparative analysis among diverse models. To convert BPT into BPW, one can multiply it by the average number of tokens per word.
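For example, with made-up numbers:

```python
# Converting bits per token (BPT) to bits per word (BPW); values are illustrative.
bits_per_token = 3.1
tokens_per_word = 1.3          # average number of tokens an English word is split into

bits_per_word = bits_per_token * tokens_per_word
print(round(bits_per_word, 2))  # 4.03
```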
10813:
3338:
6310:
6266:
6004:
3315:
prompts for particular tasks, particularly with respect to how many examples of solved tasks are adjoined to the prompt (i.e. the value of
874:
725:
9599:
Raffel, Colin; Shazeer, Noam; Roberts, Adam; Lee, Katherine; Narang, Sharan; Matena, Michael; Zhou, Yanqi; Li, Wei; Liu, Peter J. (2020).
8468:
7826:
Zhang, Hang; Li, Xin; Bing, Lidong (2023-06-01). "Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding".
6846:
2621:
11338:
9391:
9018:
8924:
Zellers, Rowan; Holtzman, Ari; Bisk, Yonatan; Farhadi, Ali; Choi, Yejin (2019). "HellaSwag: Can a Machine Really Finish Your Sentence?".
Srivastava, Aarohi; et al. (2022). "Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models".
6056:
5877:
{\displaystyle \log({\text{Perplexity}})=-{\frac {1}{N}}\sum _{i=1}^{N}\log(\Pr({\text{token}}_{i}\mid {\text{context for token}}_{i}))}
(MoE) can be applied, a line of research pursued by Google researchers since 2017 to train models reaching up to 1 trillion parameters.
11537:
10981:
10451:
9279:
7290:
Wu, Yue; Prabhumoye, Shrimai; Min, So Yeon (24 May 2023). "SPRING: GPT-4 Out-performs RL Algorithms by Studying Papers and Reasoning".
1494:
957:
760:
11684:
10313:
Hoffmann, Jordan; Borgeaud, Sebastian; Mensch, Arthur; et al. (29 March 2022). "Training Compute-Optimal Large Language Models".
10236:
Askell, Amanda; Bai, Yuntao; Chen, Anna; et al. (9 December 2021). "A General Language Assistant as a Laboratory for Alignment".
7688:
6690:
6356:
Lee, Katherine; Ippolito, Daphne; Nystrom, Andrew; Zhang, Chiyuan; Eck, Douglas; Callison-Burch, Chris; Carlini, Nicholas (May 2022).
3482:
For the training cost column, 1 petaFLOP-day = 1 petaFLOP/sec × 1 day = 8.64E19 FLOP. Also, only the largest model's cost is written.
While quantized models are typically frozen, and only pre-quantized models are fine-tuned, quantized models can still be fine-tuned.
1222:
Competing language models have for the most part been attempting to equal the GPT series, at least in terms of number of parameters.
11173:
10257:
Bai, Yuntao; Kadavath, Saurav; Kundu, Sandipan; et al. (15 December 2022). "Constitutional AI: Harmlessness from AI Feedback".
8649:
Philosophy in the Flesh: The Embodied Mind and Its Challenge to Western Philosophy; Appendix: The Neural Theory of Language Paradigm
2538:
5795:
Bahdanau, Dzmitry; Cho, Kyunghyun; Bengio, Yoshua (2014). "Neural Machine Translation by Jointly Learning to Align and Translate".
5688:
3468:
responses that lean towards particular political ideologies or viewpoints, depending on the prevalence of those views in the data.
1779:
to different parameters, with higher precision for particularly important parameters ("outlier weights"). See the cited reference for a visual guide.
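A minimal post-training quantization sketch (symmetric 8-bit rounding with a single per-tensor scale; real schemes such as GPTQ use per-channel scales, calibration data, and outlier handling):

```python
import numpy as np

# Illustrative symmetric int8 quantization of a small weight matrix.
weights = np.random.default_rng(1).normal(size=(4, 4)).astype(np.float32)

scale = np.abs(weights).max() / 127.0                       # map the largest weight to +/-127
quantized = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
dequantized = quantized.astype(np.float32) * scale

print("bytes per weight:", weights.itemsize, "->", quantized.itemsize)      # 4 -> 1
print("max absolute rounding error:", float(np.abs(weights - dequantized).max()))
```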
1626:
1010:
11848:
10512:
12843:
11986:
Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition
"Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, the World's Largest and Most Powerful Generative Language Model"
6754:
6724:
5940:
2876:
1749:
for complex action sequences. The skills can be stored and later invoked, allowing increasing levels of abstraction in planning.
As an example, consider a tokenizer based on byte-pair encoding. In the first step, all unique characters (including blanks and
11790:
10037:
9452:
9423:
7687:
Antol, Stanislaw; Agrawal, Aishwarya; Lu, Jiasen; Mitchell, Margaret; Batra, Dhruv; Zitnick, C. Lawrence; Parikh, Devi (2015).
Active Inference: The Free Energy Principle in Mind, Brain, and Behavior; Chapter 4 The Generative Models of Active Inference
8681:
8656:
5414:
3459:
gender norms. For example, it might associate nurses or secretaries predominantly with women and engineers or CEOs with men.
3393:
11740:
11427:
9844:
8204:
Schaeffer, Rylan; Miranda, Brando; Koyejo, Sanmi (2023-04-01). "Are Emergent Abilities of Large Language Models a Mirage?".
7632:
6879:
5756:
5442:
3350:
We see a fitness center sign. We then see a man talking to the camera and sitting and laying on a exercise ball. The man...
Before 2017, there were a few language models that were large as compared to capacities then available. In the 1990s, the
12838:
11598:
9783:
9143:
9118:
8574:
5481:
Fathallah, Nadeen; Das, Arunav; De Giorgis, Stefano; Poltronieri, Andrea; Haase, Peter; Kovriguina, Liubov (2024-05-26).
", and it is not clear how they can perform linguistic tasks. There are several methods for understanding how LLMs work.
" (i.e. filling in the parts missing from the segment, the way "BERT" does it): for example, given a segment "I like to
12445:
11203:
3416:
2919:
, etc. There have been many AI models trained specifically to ingest one modality and output another modality, such as
Hahn, Michael; Goyal, Navin (2023-03-14). "A Theory of Emergent In-Context Learning as Implicit Structure Induction".
Polino, Antonio; Pascanu, Razvan; Alistarh, Dan (2018-02-01). "Model compression via distillation and quantization".
6807:
6260:
1603:
, although limited to the scope of a single conversation (more precisely, limited to the scope of a context window).
1142:
793:
788:
441:
11629:
4223:
38.5B tokens from webpages filtered for mathematical content and from papers submitted to the arXiv preprint server
12370:
11390:
9984:
8858:
6207:
4533:
363 billion token dataset based on Bloomberg's data sources, plus 345 billion tokens from general purpose datasets
2923:
1937:
can use both text and image as inputs (although the vision component was not released to the public until GPT-4V);
1927:
model was fine-tuned into a multimodal model PaLM-E using the tokenization method, and applied to robotic control.
1766:
451:
89:
9250:
9200:
Hubinger, Evan (10 January 2024). "Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training".
, the lines change their slopes, appearing on a linear-log plot as a series of linear segments connected by arcs.
1746:
10891:
10618:
8726:
8072:
6900:
1931:
models have also been turned multimodal using the tokenization method, to allow image inputs, and video inputs.
1203:
with no offering of downloading the model to execute locally. But it was the 2022 consumer-facing browser-based
12874:
12435:
12180:
11484:
10211:
8900:
Lin, Stephanie; Hilton, Jacob; Evans, Owain (2021). "TruthfulQA: Measuring How Models Mimic Human Falsehoods".
6386:
Li, Yuanzhi; Bubeck, Sébastien; Eldan, Ronen; Del Giorno, Allie; Gunasekar, Suriya; Lee, Yin Tat (2023-09-11),
3728:
A fine-tuned variant of GPT-3, termed GPT-3.5, was made available to the public through a web interface called
that captured the imaginations of the general population and caused some media hype and online buzz. The 2023
12704:
12425:
10951:
9906:
9674:
9221:
Kang, Daniel (2023). "Exploiting programmatic behavior of LLMs: Dual-use through standard security attacks".
8830:
8101:
5138:
3375:
{\displaystyle {\begin{cases}C=C_{0}ND\\L={\frac {A}{N^{\alpha }}}+{\frac {B}{D^{\beta }}}+L_{0}\end{cases}}}
1703:
An LLM is a language model, which is not an agent as it has no goal, but it can be used as a component of an
821:
523:
299:
"Sanitized open-source datasets for natural language and code understanding: how we evaluated our 70B model"
5508:
5482:
2852:
system": "Can one reasonably say that a system that passes exams for software engineering candidates is not
1645:
Even more widespread languages such as Portuguese and German have "a premium of 50%" compared to English.
12397:
11268:
7407:
3208:
2868:
2820:
2078:
1738:
1622:
1550:
1500:
1173:
778:
715:
625:
603:
446:
436:
2483:
argue that the emergent abilities are not unpredictably acquired, but predictably acquired according to a
12742:
12727:
12699:
12564:
12559:
12134:
8272:
Jin, Charles; Rinard, Martin (2023-05-01). "Evidence of Meaning in Language Models Trained on Programs".
4476:
4407:
1014:
987:
929:
841:
826:
287:
109:
11936:
11309:
10342:
7109:
6048:
1663:
per parameter to train on one token, whereas it costs 1 to 2 FLOPs per parameter to infer on one token.
, LLMs acquire these abilities by learning statistical relationships from vast amounts of text during a
12479:
12450:
12228:
11087:
UAE's Falcon 40B, World's Top-Ranked AI Model from Technology Innovation Institute, is Now Royalty-Free
9640:
7474:
Nagel, Markus; Amjad, Rana Ali; Baalen, Mart Van; Louizos, Christos; Blankevoort, Tijmen (2020-11-21).
6142:
5305:
2831:
2461:
2052:
1776:
1653:
1257:
889:
816:
566:
461:
249:
182:
142:
8499:
6988:
Sharir, Or; Peleg, Barak; Shoham, Yoav (2020). "The Cost of Training NLP Models: A Concise Overview".
5366:
The smaller models including 66B are publicly available, while the 175B model is available on request.
4268:
Essentially GPT-3 but trained on a multi-lingual corpus (30% English excluding programming languages)
1680:
services, then the LLM can be fine-tuned to be able to read API documentation and call API correctly.
12322:
12175:
1994:
1809:
1226:
1165:
1006:
943:
549:
317:
187:
11237:"Google's newest A.I. model uses nearly five times more text data for training than its predecessor"
9695:
7960:
Caballero, Ethan; Gupta, Kshitij; Rish, Irina; Krueger, David (2022). "Broken Neural Scaling Laws".
1293:
12848:
12772:
12504:
12460:
12345:
12243:
10848:
7805:
Liu, Haotian; Li, Chunyuan; Wu, Qingyang; Lee, Yong Jae (2023-04-01). "Visual Instruction Tuning".
3543:
1249:
1161:
1116:
1013:, which enables efficient processing and generation of large-scale text data. Modern models can be
999:
995:
571:
491:
414:
332:
162:
124:
119:
79:
74:
11908:
11819:
9049:
" depends on the specific type of LLM used. If the LLM is autoregressive, then "context for token
12752:
12722:
12389:
6154:
In other words, to express the same sentiment, some languages require up to 10 times more tokens.
5065:
5000:
4871:
4807:
4711:
3879:
2710:
plot is a straight line (before it hits the plateau at zero), which does not look like emergence.
1942:
1924:
1730:
1581:
1101:
1069:
518:
367:
267:
94:
12223:
10483:"Pathways Language Model (PaLM): Scaling to 540 Billion Parameters for Breakthrough Performance"
6240:
5996:
2758:
2675:
2580:
1945:
is also multimodal. Mistral introduced its own multimodal Pixtral 12B model in September 2024.
1625:
loss is also used to stabilize training. However, regularization loss is usually not used during
1606:
do it): for example given a segment "I like to eat", the model predicts "ice cream", or "sushi".
12879:
12609:
12302:
12280:
12270:
12238:
12213:
11984:
6837:
5504:
5157:
4236:
4133:
1372:
1230:
698:
674:
576:
337:
312:
272:
84:
31:
11330:
8528:
12469:
5944:
5636:
Proceedings of the 39th Annual Meeting on Association for Computational Linguistics - ACL '01
3551:
3362:
3293:
3289:
2892:
2296:
1883:
1304:
1216:
1177:
652:
474:
426:
282:
197:
69:
11529:
7242:"Language Models as Zero-Shot Planners: Extracting Actionable Knowledge for Embodied Agents"
3655:
An alternative to BERT; designed as encoder-only. Trained on 512 TPU v3 chips for 5.5 days.
3408:
creators should exclude from their training data papers on creating or enhancing pathogens.
12822:
12474:
12327:
12008:
Kaddour, Jean; et al. (2023). "Challenges and Applications of Large Language Models".
10657:
8346:
5574:
5339:
This is the date that documentation describing the model's architecture was first released.
4502:
1 trillion tokens, from RefinedWeb (filtered web text corpus) plus some "curated corpora".
Instead of outputting individual actions, an LLM planner can also construct "skills", or
1134:
581:
531:
9050:"Near-Duplicate Sequence Search at Scale for Large Language Model Memorization Evaluation"
7450:
8:
12802:
12732:
12689:
12645:
12417:
12407:
12402:
12290:
9331:
Marked Personas: Using Natural Language Prompts to Measure Stereotypes in Language Models
7887:
3947:
model, making it more expensive to train but cheaper to run inference compared to GPT-3.
3594:
3589:
and thus not built to be prompted or generative. Training took 4 days on 64 TPUv2 chips.
2811:
2454:, unscrambling a word's letters, disambiguate word in context, converting spatial words,
2443:
1791:
1742:
1555:
Most results previously achievable only by (costly) fine-tuning can be achieved through
1212:
1022:
684:
620:
591:
496:
322:
255:
241:
227:
202:
152:
104:
64:
10814:"AlexaTM 20B is now available in Amazon SageMaker JumpStart | AWS Machine Learning Blog"
10661:
8367:
8350:
8324:
5910:"ChatGPT a year on: 3 ways the AI chatbot has completely changed the world in 12 months"
5658:
5592:
5578:
2615:
is an exponential curve (before it hits the plateau at one), which looks like emergence.
1659:
For Transformer-based LLMs, training cost is much higher than inference cost. It costs 6
Hoffmann, Jordan; Borgeaud, Sebastian; Mensch, Arthur; Sifre, Laurent (12 April 2022).
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics
11561:"Mistral shocks AI community as latest open source model eclipses GPT-3.5 performance"
6778:"A Short Survey of Pre-trained Language Models for Conversational AI-A New Age in NLP"
6746:
6720:
6644:
3352:
a) demonstrates how to increase efficient exercise work by running up and down balls.
2471:(a combination of Hindi and English), and generating a similar English equivalent of
1715:
1704:
Using a modification of byte-pair encoding, in the first step all unique characters (including blanks and punctuation marks) are treated as an initial set of uni-grams. Successively, the most frequent pair of adjacent tokens is merged into a new, longer token and all occurrences of the pair are replaced by it; pairs of (previously merged) n-grams that most frequently occur together are then again merged into even lengthier n-grams, until a vocabulary of prescribed size is obtained (in the case of GPT-3, the vocabulary size is 50,257). Greedy tokenization also causes subtle problems with text completion.
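A minimal sketch of this merge loop (illustrative Python only; the toy corpus, the target vocabulary size, and the character-level starting point are assumptions for demonstration, and production tokenizers such as GPT-3's operate on bytes with far larger corpora and vocabularies):

from collections import Counter

def bpe_merges(corpus, target_vocab):
    words = [list(w) for w in corpus]            # start from single characters (uni-grams)
    vocab = {ch for w in words for ch in w}
    merges = []
    while len(vocab) < target_vocab:
        # Count adjacent token pairs across the corpus.
        pairs = Counter((w[i], w[i + 1]) for w in words for i in range(len(w) - 1))
        if not pairs:
            break
        (a, b), _ = pairs.most_common(1)[0]
        merges.append((a, b))
        vocab.add(a + b)
        # Replace every adjacent occurrence of the pair with the merged token.
        for w in words:
            i = 0
            while i < len(w) - 1:
                if w[i] == a and w[i + 1] == b:
                    w[i:i + 2] = [a + b]
                else:
                    i += 1
    return merges

print(bpe_merges(["lower", "lowest", "newer", "wider"], target_vocab=18))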
Whether an ability looks "emergent" can depend on the metric chosen to measure it. Let x be the number of parameters of the model and y be its measured performance. If the metric is y = average log(Pr(correct token)), performance improves smoothly with scale, giving a smooth curve on a (log x, y) plot; likewise y = average Pr(correct token) scales smoothly. But if the metric is y = average Pr(the most likely token is correct) and a response only counts as successful when every token of a multi-token answer is correct, the success probability is approximately the per-token probability raised to the power of the answer length, which is an exponential curve (before it hits the plateau at one), which looks like emergence; thresholding such a score yields a step-function, which also looks like emergence.
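A toy numerical illustration of this metric effect (all numbers below are invented for illustration; the smooth per-token curve is not a fitted scaling law):

def per_token_accuracy(scale: float) -> float:
    # Invented smooth improvement with scale, saturating toward 1.0.
    return 1.0 - 0.5 * (1e8 / scale) ** 0.15

for params in [1e8, 1e10, 1e12, 1e14, 1e16]:
    p = per_token_accuracy(params)
    exact = p ** 20  # an "exact match" metric: a 20-token answer must be entirely correct
    print(f"{params:.0e}  per-token accuracy = {p:.2f}   exact-match = {exact:.4f}")

The per-token accuracy rises gradually from 0.50 to about 0.97, while the exact-match score stays near zero and then climbs steeply, which can read as an "emergent" ability even though the underlying per-token improvement is smooth.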
The most commonly used measure of a language model's performance is its perplexity on a given text corpus. Perplexity measures how well a model predicts the contents of a dataset: the higher the likelihood the model assigns to the dataset, the lower the perplexity. In mathematical terms, log-perplexity is the average negative log-likelihood per token, \(\log(\text{Perplexity}) = -\tfrac{1}{N}\sum_{i=1}^{N}\log \Pr(\text{token}_i \mid \text{context for token}_i)\), where N is the number of tokens in the text corpus and "context for token i" is the segment of text appearing before token i (for an autoregressive LLM) or the segment of text surrounding token i (for a masked LLM). Because language models may overfit to their training data, models are usually evaluated by their perplexity on a test set not seen during training. Perplexity is intricately linked to information-theoretic entropy, a relationship notably established by Claude Shannon: entropy measured in bits per token equals \(\log_2(\text{Perplexity})\). Related measures report a model's compression of text as bits per word (BPW), bits per character (BPC), or bits per token (BPT).
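A small worked sketch of this definition (the per-token probabilities below are hypothetical, purely for illustration):

import math

# Hypothetical probabilities the model assigned to each actual next token.
token_probs = [0.20, 0.05, 0.60, 0.10, 0.30]

nll = -sum(math.log(p) for p in token_probs) / len(token_probs)  # average negative log-likelihood (nats/token)
perplexity = math.exp(nll)
bits_per_token = math.log2(perplexity)                           # entropy in bits/token = log2(perplexity)

print(f"avg NLL = {nll:.3f} nats/token, perplexity = {perplexity:.2f}, {bits_per_token:.2f} bits/token")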
The following four hyper-parameters characterize an LLM: the cost of (pre-)training (C), the size of the artificial neural network itself (N, its number of parameters, i.e. the amount of neurons in its layers and the amount of weights and biases between them), the size of its (pre-)training dataset (D, the number of tokens in the corpus), and its performance after (pre-)training. They are related by simple statistical laws, called "scaling laws". One particular scaling law ("Chinchilla scaling"), for an LLM autoregressively trained for one epoch with an appropriate learning-rate schedule, states that \(C = C_0 N D\) and \(L = \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}} + L_0\), where C is the cost of training the model in FLOPs, N is the number of parameters in the model, D is the number of tokens in the training set, and L is the average negative log-likelihood loss per token (nats/token) achieved by the trained LLM on the test dataset. The statistical hyper-parameters are \(C_0 = 6\), \(\alpha = 0.34\), \(\beta = 0.28\), \(A = 406.4\), \(B = 410.7\), \(L_0 = 1.69\).
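A short sketch that plugs hypothetical values of N and D into the fitted loss formula (the model size and token count are arbitrary examples; the constants are the ones quoted above):

def chinchilla_loss(n_params: float, n_tokens: float) -> float:
    # L(N, D) = A / N^alpha + B / D^beta + L0, with the fitted constants quoted above.
    A, B, L0, alpha, beta = 406.4, 410.7, 1.69, 0.34, 0.28
    return A / n_params**alpha + B / n_tokens**beta + L0

n, d = 70e9, 1.4e12  # hypothetical: 70 billion parameters, 1.4 trillion tokens
print(f"predicted loss ~= {chinchilla_loss(n, d):.3f} nats/token")  # roughly 1.94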
Multimodality means "having several modalities", where a "modality" refers to a type of input or output, such as video, image, audio, or text.
The most intriguing among the emergent abilities is in-context learning from example demonstrations. In-context learning is involved in tasks such as: reported arithmetic, decoding the International Phonetic Alphabet, unscrambling a word's letters, disambiguating a word in context, converting spatial words, cardinal directions (for example, replying "northeast" to a suitable prompt), color terms represented in text, identifying offensive content in paragraphs of Hinglish (a combination of Hindi and English), and generating a similar English equivalent of Kiswahili proverbs.
16:(Redirected from
12892:
12839:Formal semantics
12788:Natural language
12695:Speech synthesis
12677:and data capture
12580:Semantic network
12555:Lexical resource
12538:
12537:
12356:Lexical analysis
12334:
12333:
12259:Semantic parsing
12128:
12121:
12114:
12105:
12104:
12100:
12098:
12096:
12052:
12050:
12049:
12034:
12032:
12019:
12017:
12004:
12002:
11968:
11967:
11965:
11964:
11945:
11939:
11934:
11928:
11927:
11925:
11924:
11905:
11899:
11898:
11896:
11895:
11876:
11870:
11869:
11867:
11866:
11845:
11839:
11838:
11836:
11835:
11816:
11810:
11809:
11807:
11806:
11787:
11781:
11780:
11778:
11777:
11758:
11752:
11751:
11749:
11748:
11729:
11723:
11722:
11715:
11709:
11708:
11702:
11700:
11681:
11675:
11674:
11672:
11670:
11650:
11641:
11640:
11638:
11637:
11617:
11611:
11610:
11608:
11606:
11587:
11581:
11580:
11578:
11576:
11556:
11550:
11549:
11547:
11545:
11526:
11520:
11519:
11517:
11515:
11501:
11495:
11494:
11493:
11492:
11475:
11469:
11468:
11466:
11464:
11445:
11439:
11438:
11436:
11435:
11416:
11410:
11409:
11407:
11406:
11386:
11380:
11379:
11377:
11375:
11356:
11350:
11349:
11347:
11346:
11327:
11321:
11320:
11318:
11317:
11298:
11289:
11288:
11286:
11284:
11275:. May 10, 2023.
11265:
11259:
11258:
11256:
11254:
11232:
11223:
11222:
11220:
11219:
11199:
11193:
11192:
11190:
11189:
11172:Wrobel, Sharon.
11169:
11163:
11162:
11160:
11147:
11141:
11140:
11138:
11125:
11119:
11118:
11116:
11103:
11097:
11084:
11078:
11077:
11075:
11074:
11059:
11053:
11052:
11050:
11037:
11031:
11030:
11028:
11027:
11008:
11002:
11001:
10999:
10997:
10977:
10971:
10970:
10968:
10967:
10947:
10941:
10940:
10938:
10936:
10930:
10917:
10909:
10903:
10902:
10900:
10899:
10880:
10869:
10868:
10866:
10864:
10845:
10834:
10833:
10831:
10829:
10810:
10804:
10803:
10801:
10789:
10783:
10782:
10780:
10778:
10759:
10753:
10752:
10750:
10737:
10731:
10730:
10728:
10727:
10708:
10702:
10701:
10699:
10697:
10641:
10635:
10634:
10632:
10630:
10615:
10609:
10608:
10606:
10593:
10584:
10583:
10582:
10581:
10563:
10554:
10553:
10551:
10538:
10532:
10531:
10529:
10528:
10508:
10502:
10501:
10499:
10498:
10478:
10472:
10471:
10469:
10467:
10447:
10436:
10435:
10433:
10432:
10414:
10408:
10407:
10405:
10392:
10386:
10385:
10383:
10382:
10362:
10353:
10338:
10325:
10324:
10322:
10310:
10299:
10298:
10296:
10294:
10283:www.deepmind.com
10275:
10269:
10268:
10266:
10254:
10248:
10247:
10245:
10233:
10224:
10223:
10221:
10219:
10200:
10194:
10193:
10191:
10178:
10172:
10171:
10169:
10156:
10147:
10146:
10144:
10142:
10122:
10116:
10115:
10113:
10100:
10087:
10086:
10084:
10083:
10074:. Archived from
10072:www.forefront.ai
10064:
10058:
10057:
10055:
10053:
10033:
10024:
10023:
10021:
10008:
9997:
9996:
9994:
9992:
9977:
9971:
9970:
9968:
9967:
9948:
9942:
9941:
9939:
9925:
9919:
9918:
9916:
9914:
9895:
9889:
9888:
9886:
9885:
9866:
9857:
9856:
9854:
9852:
9833:
9824:
9823:
9821:
9820:
9801:
9795:
9794:
9792:
9791:
9772:
9766:
9765:
9763:
9751:
9745:
9744:
9742:
9740:
9721:
9715:
9714:
9712:
9711:
9692:
9686:
9685:
9683:
9682:
9663:
9657:
9656:
9655:
9654:
9637:
9631:
9630:
9620:
9596:
9587:
9586:
9584:
9572:
9566:
9565:
9563:
9551:
9545:
9544:
9542:
9540:
9525:
9519:
9518:
9516:
9515:
9495:
9489:
9488:
9486:
9474:
9465:
9464:
9462:
9460:
9441:
9435:
9434:
9432:
9431:
9412:
9406:
9405:
9403:
9402:
9387:
9381:
9380:
9350:
9344:
9343:
9342:
9326:
9320:
9319:
9317:
9305:
9299:
9298:
9296:
9295:
9275:
9266:
9265:
9263:
9261:
9255:
9248:
9239:
9233:
9232:
9230:
9218:
9212:
9211:
9209:
9197:
9191:
9190:
9178:
9172:
9171:
9169:
9167:
9150:. 14 June 2023.
9140:
9134:
9133:
9131:
9129:
9114:
9108:
9102:
9096:
9094:
9092:
9091:
9085:
9054:
9045:
9039:
9038:
9036:
9034:
9015:
9009:
9008:
9006:
9004:
8985:
8979:
8978:
8942:
8936:
8935:
8933:
8921:
8912:
8911:
8909:
8897:
8891:
8890:
8888:
8876:
8870:
8869:
8867:
8866:
8847:
8841:
8840:
8839:
8838:
8821:
8815:
8814:
8813:
8812:
8799:
8793:
8792:
8790:
8777:
8766:
8765:
8763:
8751:
8742:
8741:
8739:
8737:
8722:
8713:
8712:
8694:
8688:
8687:
8669:
8663:
8662:
8644:
8638:
8637:
8635:
8623:
8617:
8616:
8614:
8612:
8586:
8564:
8555:
8549:
8548:
8546:
8544:
8525:
8519:
8518:
8516:
8514:
8495:
8489:
8488:
8486:
8484:
8464:
8455:
8454:
8452:
8450:
8431:
8425:
8424:
8422:
8409:
8400:
8399:
8387:
8381:
8380:
8370:
8344:
8320:
8305:
8304:
8302:
8290:
8284:
8283:
8281:
8269:
8263:
8262:
8260:
8259:
8244:
8238:
8237:
8235:
8222:
8216:
8215:
8213:
8201:
8195:
8194:
8192:
8190:
8170:
8164:
8149:
8143:
8142:
8140:
8139:
8119:
8113:
8112:
8110:
8109:
8090:
8084:
8083:
8081:
8080:
8044:
8038:
8037:
8035:
8023:
8017:
8016:
8014:
8002:
7996:
7995:
7993:
7992:
7978:
7972:
7971:
7969:
7957:
7948:
7947:
7945:
7932:
7926:
7925:
7923:
7921:
7906:
7900:
7899:
7898:
7897:
7883:
7877:
7876:
7874:
7865:
7859:
7858:
7856:
7844:
7838:
7837:
7835:
7823:
7817:
7816:
7814:
7802:
7796:
7795:
7793:
7780:
7774:
7773:
7771:
7770:
7760:
7735:
7729:
7728:
7726:
7714:
7708:
7707:
7705:
7704:
7684:
7678:
7677:
7675:
7674:
7650:
7644:
7643:
7641:
7640:
7620:
7614:
7613:
7611:
7595:
7589:
7588:
7586:
7585:
7576:. Archived from
7565:
7559:
7558:
7556:
7543:
7537:
7536:
7534:
7522:
7516:
7515:
7513:
7501:
7495:
7494:
7492:
7491:
7471:
7465:
7464:
7462:
7461:
7446:
7440:
7439:
7437:
7425:
7419:
7418:
7416:
7415:
7396:
7387:
7386:
7384:
7372:
7366:
7365:
7363:
7351:
7345:
7344:
7342:
7330:
7324:
7323:
7321:
7308:
7302:
7301:
7299:
7287:
7281:
7280:
7278:
7266:
7260:
7259:
7257:
7237:
7231:
7230:
7228:
7227:
7217:
7192:
7186:
7185:
7183:
7171:
7165:
7164:
7162:
7149:
7143:
7142:
7140:
7127:
7121:
7120:
7118:
7117:
7098:
7092:
7091:
7089:
7077:
7071:
7070:
7068:
7055:
7046:
7045:
7044:
7028:
7022:
7021:
7019:
7006:
7000:
6999:
6997:
6985:
6979:
6978:
6976:
6974:
6954:
6945:
6944:
6942:
6941:
6935:www.latent.space
6926:
6920:
6919:
6917:
6916:
6897:
6891:
6890:
6888:
6887:
6868:
6862:
6861:
6859:
6857:
6851:
6844:
6833:
6822:
6821:
6793:
6784:. pp. 1–4.
6773:
6767:
6766:
6764:
6762:
6743:
6737:
6736:
6734:
6732:
6717:
6711:
6710:
6708:
6706:
6687:
6681:
6680:
6678:
6677:
6666:
6660:
6659:
6657:
6656:
6640:
6634:
6633:
6631:
6629:
6600:
6589:
6588:
6586:
6585:
6565:
6552:
6551:
6549:
6536:
6530:
6529:
6527:
6514:
6508:
6507:
6505:
6493:
6487:
6486:
6484:
6471:
6465:
6464:
6462:
6449:
6443:
6442:
6440:
6428:
6422:
6421:
6419:
6407:
6401:
6400:
6399:
6383:
6377:
6376:
6362:
6353:
6347:
6346:
6344:
6331:
6325:
6324:
6322:
6321:
6306:
6300:
6299:
6297:
6285:
6279:
6278:
6276:
6274:
6236:
6227:
6226:
6224:
6223:
6214:. Archived from
6204:
6198:
6197:
6195:
6193:
6183:
6163:
6157:
6156:
6151:
6150:
6141:. Archived from
6130:
6124:
6123:
6122:
6121:
6116:
6098:
6092:
6091:
6090:
6074:
6068:
6067:
6065:
6064:
6044:
6038:
6037:
6035:
6023:
6017:
6016:
6014:
6012:
5993:
5987:
5986:
5984:
5982:
5967:
5961:
5960:
5958:
5956:
5936:
5930:
5929:
5927:
5925:
5906:
5900:
5899:
5897:
5895:
5873:
5867:
5866:
5864:
5863:
5837:
5813:
5807:
5806:
5804:
5792:
5786:
5785:
5783:
5782:
5776:
5761:
5745:
5739:
5738:
5706:
5700:
5699:
5697:
5696:
5678:
5654:
5648:
5647:
5627:
5621:
5620:
5588:
5582:
5581:
5572:
5556:
5550:
5549:
5547:
5546:
5528:
5501:
5492:
5491:
5489:
5478:
5472:
5471:
5469:
5468:
5462:
5447:
5437:
5426:
5425:
5423:
5422:
5403:
5386:
5382:
5376:
5373:
5367:
5364:
5358:
5355:
5349:
5346:
5340:
5337:
5301:Llama 3 license
5266:
5230:
5201:
5172:
5146:
5112:
5041:
5007:
4977:
4946:
4912:
4878:
4842:
4814:
4792:
4782:
4748:
4718:
4703:Llama 2 license
4696:
4691:
4683:
4659:
4654:
4646:
4614:
4593:
4583:
4563:
4553:
4530:
4520:
4499:
4489:
4464:
4454:
4442:several products
4422:
4397:
4392:
4382:
4362:privately-owned
4348:
4328:
4323:
4313:
4293:
4288:
4278:
4259:
4254:
4243:
4220:
4212:
4191:
4181:
4161:
4156:
4146:
4122:
4117:
4109:
4078:
4073:
4063:
4039:
4029:
4009:
4003:
3995:
3972:
3967:
3957:
3933:
3928:
3920:
3901:
3896:
3886:
3859:
3849:
3830:
3825:
3811:
3789:
3779:
3752:
3742:
3718:
3713:
3705:
3683:
3677:
3667:
3645:
3640:
3630:
3601:
3579:
3573:
3568:
3558:
3531:
3521:
3485:
3484:
3478:List of chatbots
3441:Algorithmic bias
3435:Algorithmic bias
3263:
3261:
3260:
3255:
3250:
3247:
3239:
3238:
3226:
3223:
3182:
3180:
3179:
3174:
3162:
3160:
3159:
3154:
3142:
3140:
3139:
3134:
3122:
3120:
3119:
3114:
3102:
3100:
3099:
3094:
3082:
3080:
3079:
3074:
3062:
3060:
3059:
3054:
3046:
3045:
3040:
3037:
3031:
3030:
3025:
3022:
3003:
2998:
2983:
2975:
2964:
2961:
2812:reverse-engineer
2792:
2790:
2789:
2784:
2754:
2752:
2751:
2746:
2741:
2738:
2730:
2727:
2709:
2707:
2706:
2701:
2671:
2669:
2668:
2663:
2655:
2652:
2635:
2632:
2614:
2612:
2611:
2606:
2576:
2574:
2573:
2568:
2563:
2560:
2552:
2549:
2529:
2527:
2526:
2521:
2509:
2507:
2506:
2501:
2410:
2408:
2407:
2402:
2394:
2393:
2330:
2325:
2323:
2322:
2317:
2309:
2308:
2293:
2279:
2277:
2276:
2271:
2260:
2255:
2253:
2252:
2247:
2236:
2231:
2229:
2228:
2223:
2212:
2203:
2201:
2200:
2195:
2184:
2179:
2177:
2176:
2171:
2169:
2168:
2162:
2161:
2149:
2147:
2146:
2134:
2129:
2127:
2126:
2114:
2096:
2095:
2053:statistical laws
2042:
2040:
2039:
2034:
2023:
2017:
2015:
2014:
2009:
1998:
1988:
1986:
1985:
1980:
1969:
1917:
1915:
1914:
1909:
1879:
1877:
1876:
1871:
1859:
1857:
1856:
1851:
1839:
1837:
1836:
1831:
1629:and evaluation.
1615:
1595:considerations.
1455:trained on it).
1440:Dataset cleaning
1346: numerical
1322:
1317:
1314:
1309:
1302:
1281:machine learning
1227:source-available
1023:predictive power
960:
953:
946:
907:Related articles
784:Confusion matrix
537:Isolation forest
482:Graphical models
261:
260:
213:Learning to rank
208:Feature learning
46:Machine learning
37:
36:
21:
12900:
12899:
12895:
12894:
12893:
12891:
12890:
12889:
12865:
12864:
12863:
12858:
12827:
12807:Syntax guessing
12789:
12782:
12768:Predictive text
12763:Grammar checker
12744:
12737:
12709:
12676:
12665:
12631:Bank of English
12614:
12542:
12533:
12524:
12455:
12412:
12380:
12332:
12234:Distant reading
12209:Argument mining
12195:
12191:Text processing
12137:
12132:
12094:
12092:
12047:
12045:
12037:
11977:
Table of large language models (columns include Name, Release date, Developer, Corpus size, License, and Notes).
12715:Topic model
12595:Text corpus
12441:Statistical
12308:Text mining
12149:AI-complete
11699:16 February
11669:13 December
11605:12 December
11575:12 December
11565:VentureBeat
11544:12 December
11514:12 December
11463:12 December
11374:12 December
10956:THE DECODER
10042:VentureBeat
9063:(2): 1–18.
8736:January 14,
6761:January 20,
6731:January 20,
6705:18 February
6053:NVIDIA Blog
5981:January 20,
5955:January 20,
5924:January 20,
5828:: 842–866.
5721:(2): 8–12.
5253:Nemotron-4
5218:Apache 2.0
5128:Fugaku-LLM
5115:12T Tokens
5098:March 2024
5070:March 2024
4963:Apache 2.0
4929:Apache 2.0
4859:Apache 2.0
4800:Apache 2.0
4770:IBM Watsonx
4742:Granite 13b
4609:Jurassic-2
4601:Apache 2.0
4508:Apache 2.0
4472:Apache 2.0
4351:Independent
4292:106 billion
4258:350 billion
4160:180 billion
4121:768 billion
4048:Apache 2.0
4008:168 billion
3971:300 billion
3900:400 billion
3798:Apache 2.0
3717:300 billion
3652:Apache 2.0
3615:Apache 2.0
3582:Apache 2.0
3572:3.3 billion
2805:black boxes
2672:, then the
1925:Google PaLM
1756:Compression
1258:state space
1108:'s models.
721:Multi-agent
658:Transformer
557:Autoencoder
313:Naive Bayes
51:data mining
12869:Categories
12436:Rule-based
12318:Truecasing
12186:Stop words
12048:2024-05-05
12030:2306.13549
12015:2307.10169
12000:2303.18223
11963:2024-07-23
11923:2024-06-15
11894:2024-06-15
11865:2024-06-17
11834:2024-04-28
11805:2024-04-28
11776:2024-05-17
11747:2024-03-04
11636:2024-05-05
11626:mistral.ai
11595:mistral.ai
11491:2024-03-19
11434:2023-10-06
11405:2024-08-11
11360:"Claude 2"
11345:2024-05-28
11316:2023-07-19
11218:2023-07-24
11208:TechCrunch
11188:2023-07-24
11158:2304.07327
11136:2303.10845
11114:2303.17564
11073:2023-06-20
11048:2306.01116
11026:2023-04-03
10966:2024-07-26
10898:2023-06-20
10799:2208.01448
10748:2211.09085
10726:2023-03-13
10604:2206.14858
10580:2023-03-18
10549:2205.01068
10527:2023-03-12
10497:2023-03-09
10431:2022-12-19
10403:2201.08239
10381:2023-03-09
10320:2203.15556
10264:2212.08073
10243:2112.00861
10189:2112.12731
10167:2201.11990
10111:2304.03208
10082:2023-02-28
10019:2101.00027
9966:2023-01-13
9884:2024-07-24
9819:2023-03-13
9809:openai.com
9790:2019-11-14
9761:1906.08237
9710:2024-08-05
9681:2024-04-04
9653:2024-04-04
9618:1910.10683
9561:2209.14500
9514:2023-06-20
9430:2023-03-18
9420:openai.com
9401:2023-12-29
9340:2305.18189
9294:2023-12-29
9228:2302.05733
9207:2401.05566
9185:. SFGATE.
9090:2024-01-20
8931:1905.07830
8907:2109.07958
8886:2206.04615
8865:2024-07-24
8837:2024-05-28
8811:2024-05-28
8788:2303.18223
8761:1905.10044
8633:2307.03987
8611:15 January
8584:2202.03629
8420:2303.12712
8342:2210.13966
8300:2301.05217
8279:2305.11169
8258:2023-06-12
8233:2210.13382
8211:2304.15004
8138:2023-06-27
8108:2023-06-27
8079:2023-06-27
8033:2303.07971
8012:2304.00612
7991:2023-06-24
7967:2210.14891
7943:2203.15556
7915:TechCrunch
7896:2023-07-02
7854:2303.08774
7833:2306.02858
7812:2304.08485
7791:2303.03378
7769:2023-07-02
7758:2204.14198
7724:2301.12597
7703:2023-07-02
7673:2023-07-02
7639:2023-07-02
7609:2305.14314
7584:2024-07-31
7554:2306.03078
7532:2210.17323
7511:1802.05668
7490:2023-06-14
7460:2024-05-17
7435:2304.03442
7414:2023-06-09
7382:2306.01711
7361:2305.14992
7340:2303.11366
7319:2302.01560
7297:2305.15486
7276:2210.03629
7255:2201.07207
7226:2023-06-12
7215:2005.11401
7181:2305.15334
7160:2303.16434
7138:2303.09014
7116:2023-06-12
7087:2211.10435
7066:2001.08361
7042:2310.03715
7017:2304.01373
6995:2004.08900
6963:TechCrunch
6940:2024-07-24
6915:2024-07-24
6886:2024-07-24
6791:2104.10810
6751:openai.com
6676:2023-08-01
6655:2023-07-29
6584:2023-03-09
6547:2006.16668
6525:1701.06538
6503:2212.10560
6482:2203.02155
6460:2404.14219
6438:2005.14165
6417:2404.07965
6397:2309.05463
6342:2104.08758
6320:2024-08-05
6295:2305.15425
6222:2023-04-30
6181:2305.15425
6149:2023-08-17
6120:2024-09-08
6114:2206.02608
6088:2312.00752
6063:2023-07-25
6033:2305.13048
5894:20 January
5862:2024-01-21
5835:2002.12327
5781:2024-01-21
5695:2024-06-07
5570:cs/0108005
5545:2023-03-09
5467:2023-03-14
5421:2019-08-25
5393:References
5286:July 2024
5283:Llama 3.1
5269:9T Tokens
5256:June 2024
5241:3T Tokens
5175:Microsoft
5171:April 2024
5102:Databricks
5073:Anthropic
5001:Gemini 1.5
4980:Microsoft
4950:Mistral AI
4945:April 2024
4916:Mistral AI
4872:Gemini 1.0
4817:Anthropic
4808:Claude 2.1
4786:Mistral AI
4777:Mistral 7B
4721:Anthropic
4695:2 trillion
4613:March 2023
4582:March 2023
4552:March 2023
4519:March 2023
4488:March 2023
4453:March 2023
4421:March 2023
4342:Neuro-sama
4199:Apache 2.0
4108:April 2022
4062:March 2022
4057:Chinchilla
4033:EleutherAI
3783:EleutherAI
3746:EleutherAI
3741:March 2021
3682:10 billion
3476:See also:
3290:benchmarks
3248:Perplexity
2962:Perplexity
2941:perplexity
2935:Perplexity
2930:Evaluation
2479:Schaeffer
1949:Properties
1798:"modality"
1790:See also:
1582:Gemini 1.5
1545:See also:
1482:See also:
1369:compresses
1300:, such as
1292:(BPE) and
1268:See also:
1239:Mistral AI
1219:of GPT-4.
1217:parameters
1213:multimodal
1106:Mistral AI
1056:; used in
1027:ontologies
1015:fine-tuned
990:tasks. As
984:generation
706:Q-learning
604:Restricted
402:Mean shift
351:Clustering
328:Perceptron
256:regression
158:Clustering
153:Regression
12745:reviewing
12543:standards
12541:Types and
12090:259713140
12082:2731-0574
10996:March 28,
10935:March 14,
10686:257380916
10597:Models".
10569:YaLM 100B
10208:Anthropic
10204:"Product"
9991:March 12,
9981:"GPT Neo"
9739:2 January
9627:1533-7928
9539:March 13,
9459:2 January
9077:259213212
8975:257403466
8855:imbue.com
8601:246652372
8189:March 16,
8069:102353817
7986:Jason Wei
6876:imbue.com
6818:211040895
6618:2835-8856
5852:211532403
5802:1409.0473
5735:1541-1672
5685:0891-2017
5617:0891-2017
5535:248377870
5229:June 2024
5131:May 2024
5106:Mosaic ML
5052:6T tokens
4747:July 2023
4717:July 2023
4682:July 2023
4618:AI21 Labs
4273:Galactica
4242:July 2022
4211:June 2022
4180:June 2022
4176:YaLM 100B
3890:Anthropic
3872:Ernie Bot
3815:Microsoft
3778:June 2021
3732:in 2022.
3629:June 2019
3520:June 2018
3494:Developer
3427:actions.
3373:In 2023,
3241:
3033:∣
3009:
2986:∑
2972:−
2955:
2889:cognition
2871:by their
2869:justified
2858:Shoggoths
2769:
2686:
2640:
2591:
2475:proverbs.
2473:Kiswahili
2350:β
2338:α
2144:β
2124:α
1747:functions
1513:bootstrap
1343: of
1294:WordPiece
1286:embedding
1174:attention
1098:Anthropic
1072:(used in
986:or other
865:ECML PKDD
847:VC theory
794:ROC curve
726:Self-play
646:DeepDream
487:Bayes net
278:Ensembles
59:Paradigms
12661:Wikidata
12641:FrameNet
12626:BabelNet
12605:Treebank
12575:PropBank
12520:Word2vec
12485:fastText
12366:Stemming
11957:Archived
11917:Archived
11888:Archived
11859:Archived
11828:Archived
11799:Archived
11770:Archived
11741:Archived
11693:Archived
11663:Archived
11630:Archived
11599:Archived
11569:Archived
11538:Archived
11485:archived
11457:Archived
11428:Archived
11426:. 2023.
11399:Archived
11395:IBM Blog
11368:Archived
11339:Archived
11310:Archived
11308:. 2023.
11277:Archived
11247:Archived
11212:Archived
11182:Archived
11090:Archived
11020:Archived
10990:Archived
10986:Cerebras
10960:Archived
10926:Archived
10924:. 2023.
10892:Archived
10857:Archived
10828:13 March
10822:Archived
10777:12 March
10771:Archived
10720:Archived
10690:Archived
10678:36890378
10629:20 March
10574:archived
10521:Archived
10491:Archived
10460:Archived
10425:Archived
10375:Archived
10346:Archived
10293:20 March
10287:Archived
10218:14 March
10212:Archived
10141:13 March
10135:Archived
10052:13 March
10046:Archived
9985:Archived
9960:Archived
9913:13 March
9907:Archived
9878:Archived
9851:13 March
9845:Archived
9813:Archived
9784:Archived
9733:Archived
9704:Archived
9675:Archived
9647:archived
9533:Archived
9508:Archived
9453:Archived
9424:Archived
9288:Archived
9251:Archived
9187:Archived
9160:Archived
9081:Archived
9027:Archived
8997:Archived
8967:36882584
8859:Archived
8831:archived
8605:Archived
8577:: 1–38.
8537:Archived
8507:Archived
8477:Archived
8443:Archived
8441:. 2023.
8377:36943882
8368:10068812
8183:Archived
8156:Archived
8132:Archived
8102:Archived
8073:Archived
7784:Model".
7763:Archived
7697:Archived
7667:Archived
7633:Archived
7484:Archived
7408:Archived
7220:Archived
7110:Archived
6967:Archived
6909:Archived
6880:Archived
6847:Archived
6755:Archived
6725:Archived
6699:Archived
6649:Archived
6628:19 March
6622:Archived
6578:Archived
6273:3 August
6267:Archived
6186:Archived
6057:Archived
6011:June 12,
6005:Archived
5949:Archived
5918:Archived
5914:Euronews
5888:Archived
5856:Archived
5772:Archived
5689:Archived
5539:Archived
5513:Daedalus
5507:(2022).
5458:Archived
5415:Archived
5315:See also
5298:440,000
5289:Meta AI
5272:200,000
5212:Unknown
5209:Unknown
5200:May 2024
5082:Unknown
5079:Unknown
5076:Unknown
5066:Claude 3
5018:Unknown
5015:Unknown
4957:Unknown
4923:Unknown
4889:Unknown
4886:Unknown
4862:Used in
4853:Unknown
4823:Unknown
4820:Unknown
4795:Unknown
4768:Used in
4759:Unknown
4756:Unknown
4727:Unknown
4724:Unknown
4712:Claude 2
4645:May 2023
4625:Unknown
4622:Unknown
4458:Cerebras
4431:Unknown
4357:Unknown
4145:May 2022
4067:DeepMind
4042:825 GiB
4024:GPT-NeoX
3961:DeepMind
3792:825 GiB
3755:825 GiB
3704:May 2020
3684:tokens)
3403:Security
3192:test set
2469:Hinglish
2437:break(s)
1773:codebook
1667:Tool use
1654:A100-GPU
1420:Problems
1349: "
1279:Because
1260:model).
288:Boosting
137:Problems
12832:Related
12798:Chatbot
12656:WordNet
12636:DBpedia
12510:Seq2seq
12254:Parsing
12169:Trigram
11849:"Qwen2"
11791:"Phi-3"
11719:"Gemma"
11706:tokens.
11424:Mistral
11306:Meta AI
11283:May 18,
10863:9 March
10853:Meta AI
10696:9 March
10658:Bibcode
10466:9 March
9899:"gpt-2"
9725:"xlnet"
9260:24 June
9166:18 June
9148:Science
9128:18 June
9033:18 June
9003:18 June
8543:12 June
8513:12 June
8483:12 June
8449:12 June
8347:Bibcode
6973:9 March
6172:NeurIPS
5575:Bibcode
5215:Unknown
5141:, etc.
5135:Fujitsu
5055:Unknown
5021:Unknown
4960:Unknown
4926:Unknown
4892:Unknown
4856:Unknown
4837:Grok-1
4826:Unknown
4762:Unknown
4730:Unknown
4697:tokens
4686:Meta AI
4678:Llama 2
4660:tokens
4547:PanGu-Σ
4434:Unknown
4386:Meta AI
4354:Unknown
4297:unknown
4294:tokens
4207:Minerva
4162:tokens
4136:chips.
4123:tokens
4090:Sparrow
4079:tokens
4010:tokens
3973:tokens
3943:Sparse
3934:tokens
3902:tokens
3831:tokens
3737:GPT-Neo
3730:ChatGPT
3719:tokens
3680:40GB (~
3604:Google
3506:License
3421:ChatGPT
3380:time."
3224:Entropy
3209:entropy
3188:overfit
2816:Othello
2755:, then
2577:, then
2481:et. al.
2061:log-log
1806:AlexNet
1627:testing
1589:ChatGPT
1431:Myanmar
1340:series
1307:), and
1205:ChatGPT
1170:Seq2seq
1158:NeurIPS
1121:models.
1112:History
1094:Watsonx
1090:Granite
1058:ChatGPT
1046:GPT-3.5
870:NeurIPS
687:(ECRAM)
641:AlexNet
283:Bagging
12805:(c.f.
12463:models
12451:Neural
12164:Bigram
12159:n-gram
12095:2 July
12088:
12080:
11953:GitHub
11854:GitHub
11689:Google
11335:GitHub
11273:Google
11253:18 May
11016:tii.ae
10921:OpenAI
10684:
10676:
10650:Nature
9956:OpenAI
9903:GitHub
9780:OpenAI
9729:GitHub
9625:
9529:"BERT"
9449:GitHub
9373:
9075:
8973:
8965:
8705:
8680:
8655:
8599:
8573:(12).
8375:
8365:
8067:
6905:GitHub
6856:24 May
6816:
6806:
6695:Google
6616:
6315:Medium
6259:
5850:
5733:
5683:
5615:
5533:
5411:OpenAI
5260:Nvidia
5225:Qwen2
5158:Fugaku
4649:Google
4639:PaLM 2
4557:Huawei
4484:Falcon
4425:OpenAI
4367:Twitch
4317:Amazon
4215:Google
4185:Yandex
4134:TPU v4
4112:Google
3998:Google
3952:Gopher
3923:Google
3880:Claude
3819:Nvidia
3708:OpenAI
3671:OpenAI
3634:Google
3574:words
3562:Google
3525:OpenAI
3509:Notes
2944:token:
2854:really
2428:breaks
1943:Gemini
1699:Agency
1611:masked
1517:Hamlet
1402:-grams
1377:jagged
1325:token
1193:OpenAI
1102:Claude
1070:Gemini
1066:Google
1054:GPT-4o
1038:OpenAI
1031:biases
663:Vision
519:RANSAC
397:OPTICS
392:DBSCAN
376:-means
183:AutoML
12854:spaCy
12499:large
12490:GloVe
12086:S2CID
12025:arXiv
12010:arXiv
11995:arXiv
11153:arXiv
11131:arXiv
11109:arXiv
11043:arXiv
10929:(PDF)
10916:(PDF)
10794:arXiv
10743:arXiv
10682:S2CID
10599:arXiv
10544:arXiv
10398:arXiv
10315:arXiv
10259:arXiv
10238:arXiv
10184:arXiv
10162:arXiv
10106:arXiv
10014:arXiv
9932:arXiv
9756:arXiv
9613:arXiv
9577:arXiv
9556:arXiv
9479:arXiv
9335:arXiv
9310:arXiv
9254:(PDF)
9247:(PDF)
9223:arXiv
9202:arXiv
9084:(PDF)
9073:S2CID
9053:(PDF)
8971:S2CID
8926:arXiv
8902:arXiv
8881:arXiv
8783:arXiv
8756:arXiv
8628:arXiv
8597:S2CID
8579:arXiv
8563:(pdf)
8439:ZDNET
8415:arXiv
8337:arXiv
8295:arXiv
8274:arXiv
8228:arXiv
8206:arXiv
8065:S2CID
8028:arXiv
8007:arXiv
7962:arXiv
7938:arXiv
7873:(PDF)
7849:arXiv
7828:arXiv
7807:arXiv
7786:arXiv
7753:arXiv
7719:arXiv
7604:arXiv
7549:arXiv
7527:arXiv
7506:arXiv
7430:arXiv
7377:arXiv
7356:arXiv
7335:arXiv
7314:arXiv
7292:arXiv
7271:arXiv
7250:arXiv
7210:arXiv
7176:arXiv
7155:arXiv
7133:arXiv
7082:arXiv
7061:arXiv
7037:arXiv
7012:arXiv
6990:arXiv
6850:(PDF)
6843:(PDF)
6814:S2CID
6786:arXiv
6542:arXiv
6520:arXiv
6498:arXiv
6477:arXiv
6455:arXiv
6433:arXiv
6412:arXiv
6392:arXiv
6361:(PDF)
6337:arXiv
6290:arXiv
6176:arXiv
6109:arXiv
6083:arXiv
6028:arXiv
5848:S2CID
5830:arXiv
5797:arXiv
5775:(PDF)
5760:(PDF)
5565:arXiv
5531:S2CID
5488:(PDF)
5461:(PDF)
5446:(PDF)
5327:Notes
5166:Phi-3
5036:Gemma
4971:Phi-2
4920:46.7
4700:21000
4663:85000
4587:LAION
4416:GPT-4
4375:LLaMA
4237:BLOOM
4194:1.7TB
4126:29250
3988:LaMDA
3907:beta
3862:4 Tb
3853:Baidu
3773:GPT-J
3699:GPT-3
3661:GPT-2
3639:0.340
3624:XLNet
3567:0.340
3530:0.117
3515:GPT-1
3063:here
3023:token
2713:When
2618:When
2535:When
2380:410.7
2368:406.4
2207:FLOPs
1935:GPT-4
1929:LLaMA
1716:agent
1661:FLOPs
1574:GPT-2
1429:from
1414:GPT-3
1373:array
1328:izer
1254:Mamba
1235:LLaMA
1231:BLOOM
1209:GPT-4
1197:GPT-3
1189:GPT-2
1185:GPT-1
1082:LLaMA
1050:GPT-4
980:model
885:IJCAI
711:SARSA
670:Mamba
636:LeNet
631:U-Net
457:t-SNE
381:Fuzzy
358:BIRCH
12619:Data
12470:BERT
12097:2023
12078:ISSN
11701:2024
11671:2023
11607:2023
11577:2023
11546:2023
11516:2023
11509:x.ai
11465:2023
11376:2023
11285:2023
11255:2023
11242:CNBC
10998:2023
10937:2023
10865:2023
10830:2023
10779:2023
10698:2023
10674:PMID
10631:2023
10468:2023
10295:2023
10220:2023
10143:2023
10054:2023
9993:2023
9915:2023
9853:2023
9741:2024
9623:ISSN
9541:2023
9461:2024
9371:ISBN
9262:2024
9168:2023
9130:2023
9035:2023
9005:2023
8963:PMID
8738:2024
8703:ISBN
8678:ISBN
8653:ISBN
8613:2023
8545:2023
8515:2023
8485:2023
8451:2023
8373:PMID
8191:2023
8128:ICLR
7922:2024
7693:ICCV
6975:2023
6858:2022
6804:ISBN
6763:2024
6733:2024
6707:2024
6630:2023
6614:ISSN
6275:2023
6257:ISBN
6194:2023
6013:2024
5983:2024
5957:2024
5926:2024
5896:2024
5731:ISSN
5681:ISSN
5613:ISSN
5306:H100
5292:405
5186:MIT
5104:and
5094:DBRX
4992:MIT
4983:2.7
4954:141
4864:Grok
4850:314
4846:x.AI
4562:1085
4505:2800
4400:6300
4282:Meta
4150:Meta
4102:PaLM
4082:6805
4013:4110
3976:5833
3937:5600
3927:1200
3817:and
3760:MIT
3722:3640
3690:MIT
3552:BERT
3544:GPUs
3539:MIT
3488:Name
3472:List
3415:and
3363:BERT
3309:MMLU
2490:Let
2399:1.69
2356:0.28
2344:0.34
2283:nats
1741:for
1710:The
1604:GPTs
1549:and
1358:ens
1305:BERT
1233:and
1178:BERT
1078:Meta
1060:and
1052:and
998:and
895:JMLR
880:ICLR
875:ICML
761:RLHF
577:LSTM
363:CURE
49:and
12651:UBY
1609:"
1410:n
1406:n
1400:n
974:(
959:e
952:t
945:v
525:k
374:k
301:k
259:)
247:(
34:.
20:)
Text is available under the Creative Commons Attribution-ShareAlike License. Additional terms may apply.