Leveraging Gemini-1.5-Pro-Latest for Smarter Eating | by Mary Ara | Aug, 2024

Learn how to use Google’s Gemini-1.5-pro-latest model to develop a generative AI app for calorie counting

Photo by Pickled Stardust on Unsplash

Have you ever wondered how many calories you consume when you eat your dinner, for example? I do, all the time. Wouldn’t it be wonderful if you could simply pass a picture of your plate through an app and get an estimate of the total number of calories before you decide how far in you want to dip?

This calorie counter app that I created can help you achieve this. It is a Python application that uses Google’s Gemini-1.5-Pro-Latest model to estimate the number of calories in food items.

The app takes two inputs: a question about the food and an image of the food or food items, or simply, a plate of food. It outputs an answer to the question, the total number of calories in the image and a breakdown of calories by each food item in the image.

In this article, I will explain the entire end-to-end process of building the app from scratch with Gemini-1.5-pro-latest, Google’s multimodal generative large language model, and show how I developed the front end of the application using Streamlit.

It is worth noting that as AI advances, data scientists increasingly need to move beyond traditional deep learning and add generative AI techniques to their toolkit, and that is the main reason I am writing about this subject.

Let me start by briefly explaining Gemini-1.5-pro-latest and the Streamlit framework, since they are the major components of this calorie counter app.

Gemini-1.5-pro-latest is an advanced AI language model developed by Google. As the latest version, it offers faster response times and improved accuracy over previous versions when used for natural language processing and application building.

This is a multimodal model that works with both text and images, an advancement over Google’s Gemini-pro model, which only works with text prompts.

The model works by understanding and generating text, much like a human, based on the prompts given to it. In this article, the model will be used to generate text for our calorie counter app.

Gemini-1.5-pro-latest can be integrated into other applications to reinforce their AI capabilities. In this application, the model uses image recognition and object detection to break the uploaded image into individual food items, estimates the calories in each item based on its contextual understanding of food and nutrition, and then totals the calories for all items in the image.
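As a quick illustration of this multimodal behaviour, here is a minimal, self-contained sketch, separate from the app we will build, assuming you already have an API key set in your environment and a local photo named dinner.jpg. It simply sends a text prompt and an image to the model in a single call:

# minimal multimodal sketch: one text prompt plus one image
# assumes GOOGLE_API_KEY is set in the environment and dinner.jpg exists locally
import os
import google.generativeai as genai
from PIL import Image

genai.configure(api_key=os.getenv("GOOGLE_API_KEY"))

model = genai.GenerativeModel("gemini-1.5-pro-latest")
food_photo = Image.open("dinner.jpg")  # a PIL image can be passed directly as a prompt part

response = model.generate_content(["List the food items you can see in this image.", food_photo])
print(response.text)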

Streamlit is an open-source Python framework that will manage the user interface. It simplifies web development so that, throughout the project, you do not need to write any HTML or CSS for the front end.
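If you have never used Streamlit, here is a tiny, hypothetical example, unrelated to the calorie counter itself, showing how a few lines of Python become an interactive page once you run streamlit run hello.py:

# hello.py - a minimal Streamlit page
import streamlit as st

st.header("Hello, Streamlit")
name = st.text_input("What is your name?")
if name:
    st.write(f"Nice to meet you, {name}!")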

Let us dive into building the app.

I will show you how to build the app in 5 clear steps.

1. Set up your Folder structure

To start, open your favorite code editor (mine is VS Code) and create a project folder. Call it Calories-Counter, for example. This is your current working directory. Create a virtual environment (venv) and activate it in your terminal, then create the following files: .env, calories.py, requirements.txt.

Here’s a recommendation for the look of your folder structure:

Calories-Counter/
├── venv/
│   ├── xxx
│   └── xxx
├── .env
├── calories.py
└── requirements.txt

Please note that Gemini-1.5-Pro works best with Python versions 3.9 and greater.

2. Get the Google API key

Like other Gemini models, Gemini-1.5-pro-latest is currently free for public use. Accessing it requires an API key, which you can obtain from Google AI Studio under “Get API key”. Once the key is generated, copy it for use in your code and save it as an environment variable in the .env file, as follows.

GOOGLE_API_KEY="paste the generated key here"
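If you want to confirm that the key is picked up before writing the app itself, an optional sanity check (run from the project folder, with the virtual environment active) could look like this:

# optional check: verify that the key in .env is visible to Python
import os
from dotenv import load_dotenv

load_dotenv()  # reads the .env file in the current working directory
print("API key loaded:", os.getenv("GOOGLE_API_KEY") is not None)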

3. Install dependencies

Type the following libraries into your requirements.txt file.

  • streamlit
  • google-generativeai
  • python-dotenv

In the terminal, install the libraries in requirements.txt with:

python -m pip install -r requirements.txt

4. Write the Python script

Now, let’s start writing the Python script in calories.py. With the following code, import all required libraries:

# import the libraries
from dotenv import load_dotenv
import streamlit as st
import os
import google.generativeai as genai
from PIL import Image

Here’s how the various modules imported will be used:

  • dotenv — since the application is configured through a Google API key environment variable, dotenv is used to load that configuration from the .env file.
  • streamlit — used to create the interactive user interface for the front end.
  • os — used to read the API key from the environment variables once they have been loaded.
  • google.generativeai — gives us access to the Gemini model we are about to use.
  • PIL — the Python Imaging Library (Pillow), used to open and display the uploaded image.

The following lines load the environment variables from the .env file and then configure the API key.

load_dotenv()  # load the variables defined in the .env file

genai.configure(api_key=os.getenv("GOOGLE_API_KEY"))

Define a function that, when called, will load Gemini-1.5-pro-latest and return the model’s response, as follows:

def get_gemini_response(input_prompt, image, user_prompt):
    # load the model and generate a response from the prompts and the image
    model = genai.GenerativeModel('gemini-1.5-pro-latest')
    response = model.generate_content([input_prompt, image[0], user_prompt])
    return response.text

The function takes three inputs: the input prompt that will be specified further down in the script, the image supplied by the user, and the user’s question. All three go into the Gemini model, which returns the response text.

Since Gemini-1.5-pro-latest accepts images as raw bytes together with their MIME type, the next step is to write a function that converts the uploaded file into that format.

def input_image_setup(uploaded_file):
    # Check if a file has been uploaded
    if uploaded_file is not None:
        # Read the file into bytes
        bytes_data = uploaded_file.getvalue()

        image_parts = [
            {
                "mime_type": uploaded_file.type,  # Get the mime type of the uploaded file
                "data": bytes_data
            }
        ]
        return image_parts
    else:
        raise FileNotFoundError("No file uploaded")

Next, specify the input prompt that will determine the behaviour of your app. Here, we are simply telling Gemini what to do with the text and image that the user supplies.

input_prompt="""
You are an expert nutritionist.
You should answer the question entered by the user in the input based on the uploaded image you see.
You should also look at the food items found in the uploaded image and calculate the total calories.
Also, provide the details of every food item with calories intake in the format below:

1. Item 1 - no of calories
2. Item 2 - no of calories
----
----

"""

The next step is to initialize streamlit and create a simple user interface for your calorie counter app.

st.set_page_config(page_title="Gemini Calorie Counter App")
st.header("Calorie Counter App")
input = st.text_input("Ask any question related to your food: ", key="input")
uploaded_file = st.file_uploader("Upload an image of your food", type=["jpg", "jpeg", "png"])
image = ""
if uploaded_file is not None:
    image = Image.open(uploaded_file)
    st.image(image, caption="Uploaded Image.", use_column_width=True)  # show the image

submit = st.button("Submit & Process")  # creates a "Submit & Process" button

The above steps have all the pieces of the app. At this point, the user is able to open the app, enter a question and upload an image.

Finally, let’s put all the pieces together such that once the “Submit & Process” button is clicked, the user will get the required response text.

# Once the "Submit & Process" button is clicked
if submit:
    image_data = input_image_setup(uploaded_file)
    response = get_gemini_response(input_prompt, image_data, input)
    st.subheader("The Response is")
    st.write(response)

5. Run the script and interact with your app

Now that the app development is complete, you can execute it in the terminal using the command:

streamlit run calories.py

To interact with your app and see how it performs, open the local URL or network URL that Streamlit prints in the terminal in your browser.

This is how your Streamlit app looks when it is first opened in the browser.

Demo image of the initial display of the Calorie Counter App: Photo by author.

Once the user asks a question and uploads an image, here is the display:

Demo image of the Calorie Counter App with user input question and user uploaded image: Photo by author. The food image loaded in the app: Photo by Odiseo Castrejon on Unsplash

Once the user pushes the “Submit & Process” button, the response in the image below is generated at the bottom of the screen.

Demo image of the Calorie Counter App with the generated response: Photo by author

For external access, consider deploying your app using a cloud service such as AWS, Heroku, or Streamlit Community Cloud. In this case, let’s use Streamlit Community Cloud to deploy the app for free.

On the top right of the app screen, click ‘Deploy’ and follow the prompts to complete the deployment.

After deployment, you can share the generated app URL with other users.

Just like other AI applications, the results it outputs are the model’s best estimates, so before relying on the app completely, please note the following potential risks:

  • The calorie counter app may misclassify certain food items and thus, give the wrong number of calories.
  • The app does not have a reference point to estimate the size of the food — portion — based on the uploaded image. This can lead to errors.
  • Over-reliance on the app can lead to stress and mental health issues as one may become obsessed with counting calories and worrying about results that may not be too accurate.

To help reduce the risks that come with using the calorie counter, here are possible enhancements that could be integrated into its development:

  • Adding contextual analysis of the image, which would help gauge the size of the food portion being analysed. For instance, the app could be built so that a standard object included in the food image, such as a spoon, serves as a reference point for measuring the sizes of the food items. This would reduce errors in the resulting total calories (a rough prompt sketch follows after this list).
  • Google could improve the diversity of the food items in their training set to reduce misclassification errors. They could expand it to include food from more cultures so that even rare African food items will be identified.
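For example, the first enhancement could start as nothing more than a revised input prompt. The version below is only a sketch of the idea, not tested guidance:

# hypothetical variant of input_prompt that asks the model to use a common
# object in the photo (a spoon, fork or plate) as a portion-size reference
input_prompt = """
You are an expert nutritionist.
Answer the question entered by the user based on the uploaded image.
If a common object such as a spoon, fork or plate is visible, use it as a
reference to estimate portion sizes before estimating calories.
List every food item with its estimated portion size and calories:

1. Item 1 - estimated portion - no of calories
2. Item 2 - estimated portion - no of calories

Finally, state the total calories and how confident you are in the estimate.
"""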
