Elasticsearch Utility

The elasticsearch_util.py file contains the utility functions the Bayard app uses to talk to Elasticsearch. It connects to the Elasticsearch cluster, builds a search query from the user's input, runs it through the Elasticsearch search API, and processes the results into a list of relevant documents.

Configuration

The configuration section reads the connection details for the Elasticsearch cluster from two environment variables: ES_URL holds the cluster URL and ES_API_KEY holds the API key. Both must be set for the connection to succeed.
```python
import os

ES_URL = os.environ.get("ES_URL")
ES_API_KEY = os.environ.get("ES_API_KEY")
```
After retrieving the URL and API key, an Elasticsearch client instance named es_client is created using the Elasticsearch class from the elasticsearch library. The client is initialized with the provided URL and API key, allowing the application to interact with the Elasticsearch cluster.
```python
from elasticsearch import Elasticsearch

es_client = Elasticsearch(ES_URL, api_key=ES_API_KEY)
```
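Because os.environ.get returns None for an unset variable, it can help to fail with a clear message before the client is created. The following is a minimal sketch of such a check; the helper name load_es_config and its error message are hypothetical and not part of elasticsearch_util.py.

```python
import os
from typing import Mapping, Tuple


def load_es_config(env: Mapping[str, str] = os.environ) -> Tuple[str, str]:
    """Return (url, api_key), raising early if either variable is unset or empty.

    Hypothetical helper for illustration; not part of elasticsearch_util.py.
    """
    missing = [name for name in ("ES_URL", "ES_API_KEY") if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return env["ES_URL"], env["ES_API_KEY"]
```

The returned pair could then be passed to the Elasticsearch constructor in place of the module-level lookups.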

Functions

search_elasticsearch(user_input)

The search_elasticsearch function is the main entry point for performing searches in Elasticsearch. It takes a single parameter, user_input, which represents the user's search query. The function constructs a search query using the text_expansion query type with the ELSER model, executes the search, and processes the search results.
The search query is defined in the search_body dictionary, which runs a text_expansion query against the content_embedding field. The model_id parameter names the ELSER model used to expand the query text, and model_text is set to the user_input value. The size parameter caps the number of documents returned (3 in this case).
```python
search_body = {
    "query": {
        "text_expansion": {
            "content_embedding": {
                "model_id": ".elser_model_2_linux-x86_64",
                "model_text": user_input
            }
        }
    },
    "size": 3
}
```
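If the result count ever needs to vary, the same body can be produced by a small builder function. This is only a sketch: the function name build_search_body, the ELSER_MODEL_ID constant, and the size parameter with its default are assumptions for illustration, while the model ID and field name are copied from the search_body described in this section.

```python
from typing import Any, Dict

# Model ID taken from the search_body in this section.
ELSER_MODEL_ID = ".elser_model_2_linux-x86_64"


def build_search_body(user_input: str, size: int = 3) -> Dict[str, Any]:
    """Build a text_expansion query body against the content_embedding field.

    Hypothetical helper; the original code defines the dictionary inline.
    """
    return {
        "query": {
            "text_expansion": {
                "content_embedding": {
                    "model_id": ELSER_MODEL_ID,
                    "model_text": user_input,
                }
            }
        },
        "size": size,
    }
```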
The search is executed using the search method of the es_client instance. The search is performed on the "bayardcorpus" index, and the search_body dictionary is passed as the request body.
```python
search_results = es_client.search(index="bayardcorpus", body=search_body)
```
After executing the search, the function retrieves the search hits from the hits field of the search results. It then processes each hit, extracting relevant fields such as the document title, abstract, authors, classification, concepts, year published, download URL, emotion, sentiment, categories, and unique identifier. The extracted information is stored in a dictionary named filtered_doc.
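The per-hit extraction might look roughly like the sketch below. Each Elasticsearch hit carries its stored document under _source and its identifier under _id; everything else here is an assumption. In particular, the keys in SOURCE_FIELDS are hypothetical stand-ins for a subset of the fields listed above, and the real index mapping may name them differently.

```python
from typing import Any, Dict

# Assumed _source keys (a subset); the actual bayardcorpus mapping may differ.
SOURCE_FIELDS = ("title", "abstract", "authors", "categories")


def extract_fields(hit: Dict[str, Any]) -> Dict[str, Any]:
    """Copy the fields of interest from one search hit into a flat dictionary."""
    source = hit.get("_source", {})
    # Missing fields become None rather than raising KeyError.
    filtered_doc = {field: source.get(field) for field in SOURCE_FIELDS}
    filtered_doc["id"] = hit.get("_id")
    return filtered_doc
```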
To avoid duplicate results based on the document title, the function maintains a set called seen_titles. If a document title has already been encountered, it is skipped to ensure uniqueness in the search results.
The processed documents are appended to the filtered_docs list, which is returned by the function. If no documents are found, an empty list is returned.
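The title-based de-duplication can be sketched on its own as follows. The function name dedupe_by_title and the "title" key are assumptions for illustration; the original code performs this step inline inside search_elasticsearch.

```python
from typing import Any, Dict, Iterable, List


def dedupe_by_title(docs: Iterable[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Keep only the first document seen for each title, preserving order."""
    seen_titles = set()
    filtered_docs = []
    for doc in docs:
        title = doc.get("title")  # assumed key; untitled documents share the title None
        if title in seen_titles:
            continue
        seen_titles.add(title)
        filtered_docs.append(doc)
    return filtered_docs
```

Because hits arrive ordered by relevance score, keeping the first occurrence retains the highest-ranked copy of each title.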
If an error occurs during the search, the function catches the exception, prints an error message, and returns None. Callers should therefore distinguish None (the search failed) from an empty list (the search succeeded but matched nothing).
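The error path can be exercised without a live cluster by substituting a stand-in client. In this sketch, safe_search and FailingClient are hypothetical names, and the wording of the printed message is an assumption; only the catch, print, and return None behaviour comes from the description above.

```python
from typing import Any, Dict, List, Optional


def safe_search(client: Any, index: str, body: Dict[str, Any]) -> Optional[List[Dict[str, Any]]]:
    """Return the raw hits from a search, or None if the search raises."""
    try:
        results = client.search(index=index, body=body)
        return results["hits"]["hits"]
    except Exception as exc:  # broad catch, matching the behaviour described above
        print(f"Error searching Elasticsearch: {exc}")
        return None


class FailingClient:
    """Stand-in client that always raises; used only to trigger the error path."""

    def search(self, **kwargs: Any) -> Dict[str, Any]:
        raise ConnectionError("cluster unreachable")


failed = safe_search(FailingClient(), "bayardcorpus", {})
```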

Usage

To run a search, call the search_elasticsearch function with the user's search query as its only argument.
For example:
```python
user_input = "LGBTQ+ rights"
search_results = search_elasticsearch(user_input)
```
The function will execute the search in Elasticsearch using the provided user input and return a list of filtered documents that match the search query. Each document in the list is represented as a dictionary containing various fields such as the document title, abstract, authors, and more.
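Since the function can return a populated list, an empty list, or None, calling code typically needs a branch for each case. A minimal sketch, assuming a hypothetical summarize_results helper and a "title" key in each document:

```python
from typing import Any, Dict, List, Optional


def summarize_results(results: Optional[List[Dict[str, Any]]]) -> str:
    """Turn the three possible outcomes of a search into a short message."""
    if results is None:
        return "Search failed; please try again later."
    if not results:
        return "No matching documents found."
    # "title" is an assumed key; fall back to a placeholder if it is absent.
    titles = [str(doc.get("title", "(untitled)")) for doc in results]
    return "Found: " + "; ".join(titles)
```

Checking for None before the emptiness test matters, because both None and an empty list are falsy in Python.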
Together, Elasticsearch and the ELSER model let elasticsearch_util.py retrieve documents by semantic relevance to the user's input rather than by exact keyword match. The returned list can then be passed to other parts of the Bayard app for further processing.