Google recently announced its new large language model (LLM) family, Gemini, which comes in three sizes: Nano, Pro, and Ultra. Today we will analyze the latency of Gemini Pro compared to GPT-3.5 Turbo, since both models show similar output quality.
Source: https://redblink.com/chatgpt-vs-gemini/
For this quick experiment, I will make 10 API calls in each of 3 scenarios.
I will measure the time to first token and the total response latency. I will use the OpenAI and Vertex AI Python SDKs, and I will call Gemini in the us-central1 region, since OpenAI's servers are presumably also located in the USA.
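The two latency measurements can be taken in one pass over a streaming response. A minimal sketch of such a timing helper is shown below; it is SDK-agnostic and takes a zero-argument callable (`make_stream`, a hypothetical name) that issues the request and returns the token stream, so that the network round trip is included in the measurement:

```python
import time


def measure_stream_latency(make_stream):
    """Measure time-to-first-token and total latency for a streamed response.

    make_stream: a zero-argument callable that issues the API request and
    returns an iterable of response chunks. The clock starts before the
    request is sent, so both measurements include the network round trip.
    Returns (first_token_seconds, total_seconds).
    """
    start = time.perf_counter()
    stream = make_stream()  # request is sent here
    first_token_latency = None
    for _chunk in stream:
        if first_token_latency is None:
            # First chunk received: record time to first token.
            first_token_latency = time.perf_counter() - start
    total_latency = time.perf_counter() - start
    return first_token_latency, total_latency
```

With the OpenAI SDK, `make_stream` could be something like `lambda: client.chat.completions.create(model=..., messages=..., stream=True)`; with Vertex AI, a wrapper around the model's streaming call. Both are assumptions about how you wire up the clients, not part of the helper itself.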
Scenario 1 prompt:
Please classify if the following sentence is a fact or an opinion: The sky is blue. USE only 1 word
Scenario 2 prompt:
Please write a 2 line text explaining the benefits of artificial intelligence.
Scenario 3 prompt:
Please write a 500-word essay explaining the benefits of artificial intelligence.