Teaching EleutherAI's GPT-J 6B To Reason

18 May 2022

I came across this wonderful and intriguing Twitter thread from Peter Welinder (VP Product, OpenAI) about GPT-3.

This fascinating thread from Peter (Twitter handle @npew) pushed me to run a smaller, similar experiment of my own using #EleutherAI's GPT-J-6B (hosted on @HuggingFace), and the results are highly comparable. In my humble opinion, this ‘smaller’ 6-billion-parameter model might be comparable in performance to the mighty 175-billion-parameter GPT-3.
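For anyone who wants to reproduce the setup, GPT-J-6B can be pulled straight from the Hugging Face Hub. Below is a minimal sketch, assuming the transformers and torch libraries and enough GPU memory for a 6B model; the helper name and generation settings are mine, not the exact ones used in the experiment.

```python
# Minimal sketch: loading EleutherAI's GPT-J-6B from the Hugging Face Hub.
# Assumes transformers + torch and enough GPU memory for a 6B-parameter model;
# the generation settings are illustrative, not the exact ones I used.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-j-6B")
model = AutoModelForCausalLM.from_pretrained(
    "EleutherAI/gpt-j-6B", torch_dtype=torch.float16
).to("cuda")

def complete(prompt: str, max_new_tokens: int = 20) -> str:
    """Greedily complete a prompt and return only the newly generated text."""
    inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
    outputs = model.generate(
        **inputs,
        max_new_tokens=max_new_tokens,
        do_sample=False,
        pad_token_id=tokenizer.eos_token_id,
    )
    # Strip the prompt tokens and decode only the continuation.
    return tokenizer.decode(
        outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
    )
```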

Here I have tried using GPT-J to find words hidden within a random mix of letters. I suspect this is not as easy a task for an LLM as it is for humans. Some samples are shown below -

Merely prompting the GPT-J-6B model was not enough in this case. As you can see below, the model was unable to spot ‘insomnia’ and ‘protest’ even after increasing the number of examples in the prompt -
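A plain few-shot prompt along these lines looked roughly like the sketch below; the letter scrambles here are hypothetical stand-ins, not the exact examples from my runs.

```python
# Sketch of a plain few-shot prompt: each example maps the jumbled letters
# directly to the hidden word, with no intermediate reasoning step.
# The scrambles are hypothetical stand-ins, not the exact ones I used.
naive_prompt = """Find the hidden word in the letters.

Letters: qbtablezk
Word: table

Letters: worangevp
Word: orange

Letters: cvinsomniart
Word:"""

print(complete(naive_prompt, max_new_tokens=5))
```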

However, following the “Teaching GPT to Reason” approach advocated by Peter in his GPT-3 thread, and keeping in mind the effects of tokenization, I was able to prompt GPT-J-6B to catch the correct words successfully -
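The reasoning-style prompt, roughly in the spirit of Peter's approach, separates the letters with spaces (so GPT-J's tokenizer treats each one as its own token) and spells out the intermediate step of reading the letters before naming the word. The wording below is my reconstruction, not the verbatim prompt from the screenshots.

```python
# Sketch of a reasoning-style prompt: letters are space-separated so the
# tokenizer sees them individually, and each example walks through reading
# the letters before naming the hidden word. A reconstruction, not verbatim.
reasoning_prompt = """Find the hidden word by reading the letters one by one.

Letters: q b t a b l e z k
Reasoning: the letters are q, b, t, a, b, l, e, z, k. The letters t, a, b, l, e spell "table".
Word: table

Letters: c v i n s o m n i a r t
Reasoning: the letters are c, v, i, n, s, o, m, n, i, a, r, t. The letters i, n, s, o, m, n, i, a spell "insomnia".
Word: insomnia

Letters: x p r o t e s t q z
Reasoning:"""

print(complete(reasoning_prompt, max_new_tokens=60))
```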

Though this was a very small experiment with some cherry-picked examples, my takeaway is that large language models can actually be ‘taught’ in their own ways.

By intuitively understanding their operating logic, we can break a problem down into steps and guide the model towards the correct answer.

I have also posted this in the form of a Twitter thread, which can be found over here -