Portfolio Project: RAG for LLMs¶

Rob Boswell¶


The following portfolio project implements Retrieval-Augmented Generation (RAG) for a large language model (LLM). Please note that this is only a proof-of-concept version that illustrates one way to build RAG. Specifically, I use Meta's Llama 3.1-8B Instruct LLM (i.e., the 8-billion-parameter version of Llama 3.1) in combination with a RAG system built with Python libraries from LangChain, NVIDIA NIM, and Streamlit.¶

Because I am using the smallest version of the Llama 3.1 model series, the language model's abilities are likely significantly limited compared to a model I would normally use for deployment in industry.¶

This example version of RAG searches for answers to a user's query based only on the PDF documents the user has stored in a specific directory on their computer. The code could be modified to also search internal files of other types, such as text files, Microsoft Word files, and graph databases (a sketch of this appears after the note below), and to search sources on the internet as well.¶

Note: This project is for demonstration purposes and is not intended to explain in detail how the RAG process works.¶
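
For illustration, here is a hedged sketch of how the document-loading step could be extended beyond PDFs by combining several LangChain loaders. The directory path and glob patterns below are placeholders, not part of this project:¶

In [ ]:
# Sketch only: mix PDF, plain-text, and Word documents in the knowledge base.
# The directory path below is a placeholder.
from langchain_community.document_loaders import (
    PyPDFDirectoryLoader, DirectoryLoader, TextLoader, Docx2txtLoader
)

docs = []
docs += PyPDFDirectoryLoader("path/to/knowledge_base/").load()  # PDF files
docs += DirectoryLoader("path/to/knowledge_base/", glob="**/*.txt",
                        loader_cls=TextLoader).load()  # plain-text files
docs += DirectoryLoader("path/to/knowledge_base/", glob="**/*.docx",
                        loader_cls=Docx2txtLoader).load()  # Microsoft Word files
# The combined docs list can then be chunked and embedded exactly as in the app below.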

In [ ]:
import os
In [ ]:
os.chdir('C:/Users/rsb84/Documents/GitHub/portfolio/LLMs/RAG/nvidia_nim-langchain-llama-3.1-8b/')
In [ ]:
# In an Anaconda command prompt, run the following lines in order to create a new conda environment and to make it accessible as a Jupyter Notebook kernel:

# conda create --name my_rag_env python=3.11
# conda activate my_rag_env
# conda install ipykernel
# python -m ipykernel install --user --name my_rag_env --display-name "Python (my_rag_env)"
In [ ]:
import sys
print(sys.executable)
C:\Users\rsb84\anaconda3\envs\my_rag_env\python.exe
In [ ]:
# Running pip install directly inside a Jupyter Notebook cell can unintentionally install packages into a different 
# environment, so it is best to run the following command in the Anaconda prompt after you create and activate 
# the my_rag_env environment:

# pip install openai python-dotenv langchain_nvidia_ai_endpoints langchain_community faiss-cpu streamlit jupyter-server-proxy pypdf

# A more efficient way to do the same thing is to create a text file (e.g., "requirements.txt") with each of the above 
# packages listed on a separate line, save the file, and then run this command in the Anaconda prompt: 

# pip install -r requirements.txt
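
As an illustration, the requirements.txt file could also be generated from the notebook itself. This is a minimal sketch that simply writes out the same package list shown above; writing the file by hand works just as well:¶

In [ ]:
# Sketch: create a requirements.txt containing the packages listed above
requirements = """openai
python-dotenv
langchain_nvidia_ai_endpoints
langchain_community
faiss-cpu
streamlit
jupyter-server-proxy
pypdf
"""
with open("requirements.txt", "w") as f:
    f.write(requirements)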
In [ ]:
# Alternatively, using conda install instead of pip install can reduce the likelihood of package dependency conflicts.
# First install every package that conda will permit from the conda-forge channel. For any remaining packages not 
# available on conda-forge, use pip install:

# conda install -c conda-forge openai python-dotenv faiss-cpu streamlit jupyter-server-proxy pypdf
# pip install langchain_nvidia_ai_endpoints langchain_community
In [ ]:
# First, save your .env file containing your API token(s) in the working directory. Then, run this:
from dotenv import load_dotenv
load_dotenv()
Out[ ]:
True
In [ ]:
# Uncomment the following line and run it
# nvidia_api_key = os.getenv('NVIDIA_API_KEY')
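
As a quick, hedged sanity check, you can confirm that the key was actually picked up from the .env file without printing the key itself (any subprocess launched from this notebook, including the Streamlit app below, inherits the environment variable):¶

In [ ]:
# Sketch: confirm the NVIDIA API key is available to this process
nvidia_api_key = os.getenv('NVIDIA_API_KEY')
print("NVIDIA_API_KEY loaded:", nvidia_api_key is not None)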
In [ ]:
# Show the current working directory
print(f"Current working directory: {os.getcwd()}")
Current working directory: C:\Users\rsb84\Documents\GitHub\portfolio\LLMs\RAG\nvidia_nim-langchain-llama-3.1-8b
In [ ]:
# Define the Streamlit app code as a string
streamlit_code = """
import streamlit as st
import os
from langchain_nvidia_ai_endpoints import NVIDIAEmbeddings, ChatNVIDIA
from langchain_core.prompts import ChatPromptTemplate
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain.chains import create_retrieval_chain
from langchain_community.vectorstores import FAISS
from langchain_community.document_loaders import PyPDFDirectoryLoader
# from langchain_community.document_loaders import TextLoader
# from langchain_community.document_loaders import Docx2txtLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter

llm = ChatNVIDIA(model="meta/llama-3.1-8b-instruct")  # This model is served through NVIDIA NIM inference endpoints

# Define the folder path and index name for saving/loading the FAISS index
FAISS_FOLDER_PATH = "faiss_index"
FAISS_INDEX_NAME = "index"

# Create vector embeddings for all pdf files in your knowledge base, and store them in a vector database
def vector_embedding():
    if "vectors" not in st.session_state:
        # Check if the FAISS index for the vector embeddings already exists. If so, load it.
        if os.path.exists(os.path.join(FAISS_FOLDER_PATH, f"{FAISS_INDEX_NAME}.faiss")):
            st.session_state.embeddings = NVIDIAEmbeddings()
            # Load the saved FAISS index
            st.session_state.vectors = FAISS.load_local(
                FAISS_FOLDER_PATH, st.session_state.embeddings, index_name=FAISS_INDEX_NAME, allow_dangerous_deserialization=True
            ) 
            # Note: allow_dangerous_deserialization=True is acceptable when you are certain the indexed files do not
            # contain malicious code (e.g., internal company files are usually safe). Be careful, however, if the index
            # was built from pdf files downloaded from the internet.
        else:
            # If the index doesn't exist, create the embeddings and vector store
            st.session_state.embeddings = NVIDIAEmbeddings()  # Will create the embeddings used to convert the text into vectors
            st.session_state.loader = PyPDFDirectoryLoader("C:/Users/rsb84/Documents/GitHub/portfolio/LLMs/RAG/pdf/")
            st.session_state.docs = st.session_state.loader.load()
            st.session_state.text_splitter = RecursiveCharacterTextSplitter(chunk_size=700, chunk_overlap=50)
            st.session_state.final_documents = st.session_state.text_splitter.split_documents(st.session_state.docs)
            st.session_state.vectors = FAISS.from_documents(st.session_state.final_documents, st.session_state.embeddings)
            # Save the FAISS index for future use
            st.session_state.vectors.save_local(FAISS_FOLDER_PATH, index_name=FAISS_INDEX_NAME)

st.title("NVIDIA NIM RAG Demo")

# Initialize the document processing on app load
if 'initialized' not in st.session_state:
    vector_embedding()
    st.session_state.initialized = True  # Mark setup as complete so it is not repeated on every rerun

prompt = ChatPromptTemplate.from_template(
    \"""
    Answer the question based on the provided context only.
    Please provide the most accurate response based on the question.
    <context>
    {context}
    <context>
    Questions: {input}
    \"""
)

prompt1 = st.text_area("Enter Your Question related to Your Documents:", height=150)

# Create an optional button to use to perform the search
search_button = st.button("Search")

# Once a question has been entered (or the Search button pressed), the following block builds the document chain.
# The first condition ensures the vector embeddings have already been created before the question is processed.
if (prompt1 and 'vectors' in st.session_state) or search_button:
    document_chain = create_stuff_documents_chain(llm, prompt)
    retriever = st.session_state.vectors.as_retriever() # as_retriever() exposes the vector store as a retriever
    # interface that fetches the chunks most relevant to the query
    retrieval_chain = create_retrieval_chain(retriever, document_chain)
    response = retrieval_chain.invoke({'input': prompt1})
    st.write(response['answer'])

    # This will display all of the context used, using streamlit expander()
    with st.expander("Document Similarity Search"):
        # Identify relevant chunks
        for i, doc in enumerate(response["context"]):
            st.write(doc.page_content)
            st.write("----------------------------------")

# Separate demo: call a larger NIM-hosted model (Llama 3.1 70B) through NVIDIA's OpenAI-compatible endpoint.
# Its output is printed to the terminal rather than to the Streamlit page.
from openai import OpenAI

nvidia_api_key = os.getenv("NVIDIA_API_KEY")  # Read the key here, since app.py runs as its own process

client = OpenAI(
  base_url = "https://integrate.api.nvidia.com/v1",
  api_key = nvidia_api_key
)

completion = client.chat.completions.create(
  model="meta/llama-3.1-70b-instruct",
  messages=[{"role":"user","content":"List the 5 least wealthy countries, and the main reasons each struggles with poverty."}],
  temperature=0.2,
  top_p=0.7,
  max_tokens=1024,
  stream=True
)

for chunk in completion:
  if chunk.choices[0].delta.content is not None:
    print(chunk.choices[0].delta.content, end="")

"""

# Specify the filename
filename = 'app.py'

# Write the code to a Python script file
with open(filename, 'w') as file:
    file.write(streamlit_code)

# Verify the file has been created
if os.path.exists(filename):
    # Get the full path of the file
    full_path = os.path.abspath(filename)
    print(f"File '{filename}' has been created successfully.")
    print(f"Full path of the file: {full_path}")

    # Verify the file content
    with open(filename, 'r') as file:
        content = file.read()

    # Check if the content matches the expected code
    if content == streamlit_code:
        print("File content verified successfully.")
    else:
        print("Warning: File content does not match expected content.")
else:
    print(f"Error: File '{filename}' could not be created.")
File 'app.py' has been created successfully.
Full path of the file: C:\Users\rsb84\Documents\GitHub\portfolio\LLMs\RAG\nvidia_nim-langchain-llama-3.1-8b\app.py
File content verified successfully.
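
Before launching the Streamlit app, the same retrieval chain can also be exercised directly in the notebook. The sketch below assumes the FAISS index has already been saved to the faiss_index folder by a previous run of the app; the example question is hypothetical and this cell is not part of app.py:¶

In [ ]:
# Sketch: query the saved FAISS index directly, without Streamlit
from langchain_nvidia_ai_endpoints import NVIDIAEmbeddings, ChatNVIDIA
from langchain_core.prompts import ChatPromptTemplate
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain.chains import create_retrieval_chain
from langchain_community.vectorstores import FAISS

embeddings = NVIDIAEmbeddings()
vectors = FAISS.load_local("faiss_index", embeddings, index_name="index",
                           allow_dangerous_deserialization=True)  # Safe here: the index was built locally

llm = ChatNVIDIA(model="meta/llama-3.1-8b-instruct")
prompt = ChatPromptTemplate.from_template(
    """Answer the question based on the provided context only.
    <context>
    {context}
    <context>
    Questions: {input}"""
)
chain = create_retrieval_chain(vectors.as_retriever(), create_stuff_documents_chain(llm, prompt))

# Hypothetical example query:
# print(chain.invoke({"input": "What do my documents say about X?"})["answer"])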
In [ ]:
# Set Streamlit configuration environment variables
os.environ["STREAMLIT_EMAIL_ADDRESS"] = ""
os.environ["STREAMLIT_SERVER_RUN_ON_SAVE"] = "true"
In [ ]:
# Path to the directory where the app.py file is located
app_directory = 'C:/Users/rsb84/Documents/GitHub/portfolio/LLMs/RAG/nvidia_nim-langchain-llama-3.1-8b/'

# Set the absolute path for the .streamlit directory
streamlit_dir = os.path.join(app_directory, ".streamlit")

# Ensure the .streamlit directory exists
os.makedirs(streamlit_dir, exist_ok=True)

# Path to the credentials.toml file
credentials_file = os.path.join(streamlit_dir, "credentials.toml")

# Create credentials.toml with the desired configuration if it doesn't exist
if not os.path.exists(credentials_file):
    with open(credentials_file, 'w') as f:
        f.write("[general]\nshowWarningOnDirectExecution = false\n")
In [ ]:
import subprocess
import time
from IPython.display import IFrame, display

# Path to the directory where the app.py file is located
app_directory = 'C:/Users/rsb84/Documents/GitHub/portfolio/LLMs/RAG/nvidia_nim-langchain-llama-3.1-8b/'

# Ensure we are in the right directory
os.chdir(app_directory)

# Define the port for the Streamlit app
streamlit_port = 8501

# Start the Streamlit app as a subprocess
streamlit_process = subprocess.Popen(
    [sys.executable, "-m", "streamlit", "run", "app.py", f"--server.port={streamlit_port}", "--server.address=localhost"],
    stdout=subprocess.PIPE,
    stderr=subprocess.PIPE
)

# Wait for the server to start (the first run also builds the FAISS index, which can take a while)
time.sleep(200)

# Attempt to capture and display any output from the Streamlit process. If the server is still running,
# communicate() times out, which simply means there is no final output to collect yet.
try:
    stdout, stderr = streamlit_process.communicate(timeout=10)
    print("Streamlit stdout:\n", stdout.decode(errors='replace'))
    print("Streamlit stderr:\n", stderr.decode(errors='replace'))
except subprocess.TimeoutExpired:
    print("Streamlit is still running; no final output to collect.")

# Create an iframe pointing to the proxied service
iframe_url = f"http://localhost:{streamlit_port}/"
display(IFrame(src=iframe_url, width='100%', height='800px'))
Streamlit stdout:
 
Streamlit stderr:
 2024-08-07 13:11:40.877 Port 8501 is already in use

The error shown above occurs because this HTML file is a static representation of the chatbot interface generated from my code (the Streamlit server was already running on port 8501 when this cell was re-executed), not because of a problem with the code itself.¶
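
When the demo is finished, the Streamlit subprocess started above can be shut down from the notebook. A minimal sketch:¶

In [ ]:
# Sketch: stop the Streamlit server started earlier in this notebook
streamlit_process.terminate()
streamlit_process.wait(timeout=10)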

Here is a screenshot of the output produced when I ran the above code and entered an example query. Notice that after the chatbot answers the query, it also displays the context from the pdf documents stored on my computer that its response is based on:¶


[Screenshots: the chatbot's answer to the example query, followed by the document context retrieved from the local pdf files]