OpenAI Utility

The openai_util.py file contains utility functions for interacting with the OpenAI API in the Bayard app. It provides functionality to generate model outputs based on user input and a set of relevant documents retrieved from Elasticsearch.

Configuration

The configuration section of the code loads the necessary environment variables using the dotenv library. The OpenAI API key is retrieved from the OPENAI_API_KEY environment variable.
```python
dotenv.load_dotenv()
```
The initialize_openai function initializes the OpenAI client with the API key. It sets the api_key attribute on the openai module using the retrieved key and returns the module as the client instance.
```python
def initialize_openai():
    openai.api_key = os.environ.get("OPENAI_API_KEY")
    return openai
```
The openai_client variable is then assigned the initialized OpenAI client instance.
```python
openai_client = initialize_openai()
```

Functions

generate_model_output(input_text, filtered_docs, max_hits=5, max_tokens=4096)

The generate_model_output function is the main entry point for generating model outputs based on user input and relevant documents. It takes four parameters:
  • input_text: The user's input text or query.
  • filtered_docs: A list of filtered documents retrieved from Elasticsearch based on the user's input.
  • max_hits (optional): The maximum number of documents to consider for generating the model output. Default is 5.
  • max_tokens (optional): The maximum number of tokens allowed in the generated model output. Default is 4096.
The function starts by defining a set of system instructions that provide guidance to the AI assistant on how to respond to the user's query. The instructions specify the desired behavior, such as seamlessly weaving information from the documents into the response, maintaining a conversational tone, and focusing on directly addressing the user's information need.
```python
system_instructions = """
You are an AI assistant designed to help users explore and understand an extensive academic corpus on LGBTQ+ topics.
...
"""
```
Next, the function processes the filtered_docs list and extracts relevant information from each document, such as the title, authors, abstract, classification, concepts, emotion, year published, download URL, sentiment, categories, and unique identifier. The extracted information is stored in a list of dictionaries called relevant_documents.
```python
relevant_documents = []
if filtered_docs:
    for doc in filtered_docs[:max_hits]:
        document_info = {
            "title": doc.get("title", ""),
            "authors": doc.get("authors", []),
            # ... remaining fields (abstract, classification, etc.)
        }
        relevant_documents.append(document_info)
```
The model_input variable is then constructed by concatenating the retrieved documents' information and a prompt for the AI assistant to provide a helpful response based on the documents.
```python
model_input = "Retrieved Documents:\n" + json.dumps(relevant_documents, indent=2)
model_input += "\n\nBased on the retrieved documents, provide a helpful response.\n\nResponse:"
```
The OpenAI API is then used to generate a response using the specified model (retrieved from the OPENAI_MODEL_ID environment variable) and the constructed model_input. The chat.completions.create method is called with the appropriate parameters, including the system instructions, user input, maximum tokens, and temperature.
```python
response = openai_client.chat.completions.create(
    model=os.getenv('OPENAI_MODEL_ID'),
    messages=[
        {"role": "system", "content": system_instructions},
        {"role": "user", "content": model_input}
    ],
    max_tokens=max_tokens,
    temperature=0.75
)
```
The generated model output is extracted from the API response and stored in the model_output variable. The output_data dictionary is then constructed, containing the relevantDocuments and modelOutput fields.
```python
model_output = response.choices[0].message.content
output_data = {
    "relevantDocuments": relevant_documents,
    "modelOutput": model_output
}
```
Finally, the output_data dictionary is serialized to JSON and returned as the result of the generate_model_output function.
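This final serialization step can be sketched as follows; the helper function name is illustrative, not part of the module:

```python
import json

def serialize_output(relevant_documents, model_output):
    # Bundle the retrieved documents and the generated text into one
    # dictionary, then serialize it to a JSON string for the caller.
    output_data = {
        "relevantDocuments": relevant_documents,
        "modelOutput": model_output,
    }
    return json.dumps(output_data)
```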

Usage

To use the OpenAI functionality provided by the openai_util.py file, you can call the generate_model_output function with the user's input text and the filtered documents retrieved from Elasticsearch. For example:
```python
input_text = "What are the key challenges faced by the LGBTQ+ community?"
filtered_docs = search_elasticsearch(input_text)
model_output = generate_model_output(input_text, filtered_docs)
```
The function will generate a model output based on the user's input and the relevant documents, using the OpenAI API. The generated output will be returned as a JSON string containing the relevantDocuments and modelOutput fields.
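Since the function returns a JSON string rather than a dictionary, callers typically deserialize it before accessing its fields. A minimal sketch (the sample payload below is illustrative, standing in for an actual return value):

```python
import json

# Illustrative stand-in for the JSON string returned by generate_model_output.
raw_output = json.dumps({
    "relevantDocuments": [{"title": "Example study", "authors": ["A. Author"]}],
    "modelOutput": "Here is a summary of the retrieved documents."
})

# Deserialize the string, then read the two documented fields.
data = json.loads(raw_output)
model_text = data["modelOutput"]
documents = data["relevantDocuments"]
```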
By leveraging the power of OpenAI's language models and the retrieved documents from Elasticsearch, the openai_util.py file enables the generation of informative and contextually relevant responses to user queries within the Bayard app. It provides a seamless integration between the document retrieval process and the generation of natural language responses, enhancing the user's experience and understanding of LGBTQ+ topics.