ChatGPT- What, How, Why, and Where?

8 minute read

Published:

ChatGPT is now the talk of the town. A language model many people claim to be the answer to all of our problems. Let us understand what it is, how we can use it, and if it can solve all our problems. We will also look into other GPT models built by OpenAI.

What is ChatGPT?

ChatGPT is a Generative Language model and GPT stands for Generative Pre-Trained Transformers. Let us break this into parts and look at them one at a time.

Language Models

A language model in this case is a machine learning model designed to represent the language domain. It models the entire language it was trained on, to then be used to perform tasks like Question answering, summarization, or identifying similar texts. There are many ways to build a Language model. In the case of ChatGPT and other GPT models, neural networks and deep learning was used to build the language model.

Transformers

Transformers are models that learn context and understanding through sequential data analysis. They take sequences of data as input and generate sequences of output data. Transformers are commonly used as the architecture for language models because they are really good at modeling long-range dependencies in text (they can capture complex relationships between different parts of an input text), which make them really good at understanding and modeling grammar in natural language.

Generative Language model and Generative Pre-Trained Transformer

Generative Language models are Language models that can generate text. Generative Pre-Trained Transformers are Language models, that use the Transformer Architecture to generate text and are already trained on some data.

How do GPT models generate text?

One token at a time.

The models do not generate an entire sentence in a single shot. Given a text input, it outputs probabilities for all the possible tokens. It picks the token with the highest probability (that crosses a set threshold) as the most probable next token. This token is appended to the input text and used to generate the next token. This process continues till a stopping condition is met. Some examples of stopping conditions are a stopping word, a limit on the number of tokens to be generated, and no token having a high enough probability to be considered a valid prediction.

Why ChatGPT?

Why is ChatGPT so good at this, and why is everyone using it and not building their own model?

One of the main reasons for this is their data volume and training strategy. ChatGPT was trained on a really large volume of data. This allowed it to learn rich representations of language which can be used to finetune for a variety of downstream tasks.

The methods section on this page describes the training methods for ChatGPT. The data collection step involves people labeling the data with desired output and the training strategy involves humans validating the model outputs for identifying the best outputs which are used to train a reward model. This model is then used to train the actual model. This method is called Reinforcement Learning from Human Feedback.

Training a model on such large volumes of data and hiring human labelers can be expensive and time-consuming. So a lot of people and organizations just pay to use the OpenAI APIs for their use cases.

You can use ChatGPT to train your own model. We’ll get to that at the end of this post.

Where and How can ChatGPT and other GPT models be used?

These models can be accessed using OpenAI’s APIs. The models that are currently available can be found here. You can run small experiments with the models in the OpenAI Playground found here. You can find the preset I used to experiment with GPT to write this post here.

You can use these generative models to perform a variety of tasks like question-answering or extracting data from natural language. You can do a lot of things with the latest GPT models as long as you are very specific and clear about what you want.

GPT Example

Limitations

GPT models are essentially language models. They are really good at understanding language and predicting the next word in a sentence. They are and can be used for a variety of other tasks, but might not be perfect at them.

Consider the above screenshot of the Entity Extraction example. You will notice that it missed a “ after the order ID. You might face issues like this.

It has limited knowledge of the world and events after 2021. But OpenAI recently launched plugins for ChatGPT which should help improve this to some extent.

The model can hallucinate. Hallucination is when a model generates random data with high confidence because the model deemed it to be plausible. This can lead to unexpected results in many cases.

Cost

GPT models are priced based on the number of tokens in the prompt and the number of generated tokens (as of 28 March 2023). So large volumes of text can get expensive. To keep your costs low, you can be smart about how you use the GPT models.

For example, let us assume that we have all of Wikipedia as our Knowledge base and we need an answer to the question “Who is the author of Berserk?”. GPT can answer this question directly, but for the sake of this example, we are going to assume that it can’t and we need to provide it with some data as the context.

Now there are several ways to do it

  1. Provide all of Wikipedia as the context to the question (ridiculous, but still an option)
  2. Create a fine-tuned model using all of Wikipedia and then use the question as a prompt
  3. Identify which part of Wikipedia might have an answer to this question (there are a lot of cheaper ways to do this) then only use that part as the context for the prompt
  4. Use a combination of 2 and 3

Option 1 is going to be incredibly expensive. Option 2 will make sense if the data is static and you are getting a lot of questions on the data because using Option 3 will make it expensive as you need to send the context every time the query is asked. Option 3 will be a good choice if you have low overall traffic. Option 4 will be a good choice if you get a lot of traffic, but it is focused mostly on certain parts of the data. So you can use a model finetuned on the data that gets a lot of queries and send context with the queries on the rest of the data.

Primitive RAG

These are just some ways to cut costs when using OpenAI APIs.

Token level operations

The model will not be able to perform operations on the token itself. For example, it cannot count the number of words in a sentence, it cannot give words that end with specific characters, or find numbers that add up to another number.

Token level operation

Here’s more for you, ChatGPT can’t do anything related to words itself. For example, it can’t count words, syllables, lines, sentences. It can’t encrypt and decrypt messages properly. It can’t draw ASCIII art. It can’t make alliterations. It can’t find words restricted in a sense, like second syllable, last letter etc.

Any prompt that restricts the building blocks of ChatGPT, which are the words, aka tokens, are the limitations of that ChatGPT. Ask it to make essays, computer programs, analysis of poems, philosophies, alternate history, Nordic runes, and it’ll happily do it for you. Just don’t touch the words.
u/kaenith108 on Reddit

To summarize this post (yes, I used GPT for this), ChatGPT is a Generative Language Model and GPT stands for Generative Pre-Trained Transformers. It is a machine learning model designed to represent the language domain and can generate text. It was trained on a large volume of data and its training strategy involves humans validating the model outputs for identifying the best outputs which are used to train a reward model. ChatGPT and other GPT models can be used to perform a variety of tasks like question-answering or extracting data from natural language. However, it has some limitations like limited knowledge of the world and events after 2021, and the ability to hallucinate. Moreover, it can get expensive to use due to its pricing structure, but there are ways to reduce the cost by being intelligent about how you use the GPT models.

P.S. I had some conversations with ChatGPT to better understand it before writing this post. You can find the screenshots conversations here.