Google recently announced its new large language model (LLM) family, Gemini, which comes in three sizes: Nano, Pro, and Ultra. Today we will analyze the latency of Gemini Pro compared to GPT-3.5 Turbo, since both models show similar output quality.
Source: https://redblink.com/chatgpt-vs-gemini/
For this quick experiment, I will make 10 API calls in each of 3 scenarios.
I will measure the time to first token and the total response latency. I will use the OpenAI and Vertex AI Python SDKs, and I will call Gemini in the us-central1 region, since OpenAI's servers are presumably also located in the USA.
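The two latency measurements can be taken in one pass over a streaming response. A minimal sketch of such a timing helper is shown below; it is SDK-agnostic and takes a zero-argument callable (`make_stream`, a hypothetical name) that issues the request and returns the token stream, so that the network round trip is included in the measurement:

```python
import time


def measure_stream_latency(make_stream):
    """Measure time-to-first-token and total latency for a streamed response.

    make_stream: a zero-argument callable that issues the API request and
    returns an iterable of response chunks. The clock starts before the
    request is sent, so both measurements include the network round trip.
    Returns (first_token_seconds, total_seconds).
    """
    start = time.perf_counter()
    stream = make_stream()  # request is sent here
    first_token_latency = None
    for _chunk in stream:
        if first_token_latency is None:
            # First chunk received: record time to first token.
            first_token_latency = time.perf_counter() - start
    total_latency = time.perf_counter() - start
    return first_token_latency, total_latency
```

With the OpenAI SDK, `make_stream` could be something like `lambda: client.chat.completions.create(model=..., messages=..., stream=True)`; with Vertex AI, a wrapper around the model's streaming call. Both are assumptions about how you wire up the clients, not part of the helper itself.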
Scenario 1 prompt:
Please classify if the following sentence is a fact or an opinion: The sky is blue. USE only 1 word
Scenario 2 prompt:
Please write a 2 line text explaining the benefits of artificial intelligence.
Scenario 3 prompt:
Please write a 500-word essay explaining the benefits of artificial intelligence.