On January 4, 2024, CMU professor Andy Pavlo, known for his database lectures, published his 2023 database review, focusing primarily on the rise of vector databases. As the popularity of generative artificial intelligence (AI) models continues to soar, databases with vector storage and search capabilities have taken center stage, providing a cost-effective mechanism for extending the capabilities of a foundation model (FM). With advancements in distributed computing, cloud-centric architectures, and specialized hardware accelerators, databases with vector search are likely to become even more powerful and scalable.
In this post, we explore the key factors to consider when selecting a database for your generative AI applications. We focus on high-level considerations and service characteristics that are relevant to fully managed databases with vector search capabilities currently available on AWS. We examine how these databases differ in terms of their behavior and performance, and provide guidance on how to make an informed decision based on your specific requirements. By understanding these essential aspects, you will be well-equipped to choose the most suitable database for your generative AI workloads and achieve optimal performance, scalability, and ease of implementation.
Retrieval Augmented Generation (RAG) is the process of optimizing the output of a large language model (LLM) so that it references an authoritative knowledge base outside of its training data sources before generating a response. Databases with vector search extend the already powerful capabilities of LLMs to specific domains or an organization's internal knowledge base, all without the need to retrain the model. RAG is a cost-effective approach to improving LLM output so it remains relevant, accurate, and useful in various contexts.
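To make the workflow concrete, the following minimal sketch wires these steps together with Amazon Bedrock. It substitutes an in-memory document list for the vector database, and the Titan model IDs are examples that may differ in your account and Region.

```python
"""A minimal, illustrative RAG loop. Assumes Amazon Bedrock access with
Titan models enabled; in a real application, the corpus and its
embeddings would live in a vector-enabled database."""
import json
import math
import boto3

bedrock = boto3.client("bedrock-runtime")

def embed(text: str) -> list[float]:
    # Convert text into a vector embedding with Amazon Titan Embeddings.
    resp = bedrock.invoke_model(
        modelId="amazon.titan-embed-text-v1",
        body=json.dumps({"inputText": text}),
    )
    return json.loads(resp["body"].read())["embedding"]

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Stand-in knowledge base: two documents and their embeddings.
corpus = ["Our return policy allows refunds within 30 days.",
          "Support is available 24/7 via chat and phone."]
index = [(doc, embed(doc)) for doc in corpus]

def answer(question: str) -> str:
    # Retrieve the most relevant document by cosine similarity.
    q = embed(question)
    context = max(index, key=lambda pair: cosine(q, pair[1]))[0]
    # Augment the prompt with the retrieved context, then generate.
    prompt = f"Use this context to answer.\nContext: {context}\nQuestion: {question}"
    resp = bedrock.invoke_model(
        modelId="amazon.titan-text-express-v1",
        body=json.dumps({"inputText": prompt}),
    )
    return json.loads(resp["body"].read())["results"][0]["outputText"]

print(answer("How long do I have to return an item?"))
```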
The following diagram illustrates the RAG workflow.
When building a RAG application, one of the first decisions is which database is right for your use case. The discussion around databases and generative AI has been vast and multifaceted; this post seeks to simplify it by focusing on the decision-making process of choosing a database with vector search capabilities. If you're seeking guidance on LLMs themselves, refer to Generative AI with LLMs.
As of publishing this post, AWS offers the following databases with the full suite of vector capabilities, including vector storage, retrieval, indexing, and search. AWS also offers databases that integrate with Amazon Bedrock Knowledge Bases.
In the following sections, we explore key considerations that can help you choose the right database for your generative AI applications:
Opting for familiar technology, when possible, will ultimately save developer hours and reduce complexity. When development teams are already familiar with a particular database engine, using that same engine for vector search leverages existing knowledge for a streamlined experience. Instead of learning a new skillset, developers can apply their current skills, tools, frameworks, and processes to a new feature of an existing database engine.
For example, a team of database engineers may already manage a set of 100 relational databases hosted on Aurora PostgreSQL. If they need to support a new vector search requirement for their applications, they should start by evaluating the pgvector extension on their existing Aurora PostgreSQL databases.
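As an illustration of what that evaluation can look like, the following sketch enables pgvector and runs a nearest-neighbor query. The connection details, table, and 3-dimensional embeddings are placeholders for your environment.

```python
"""A brief sketch of enabling and querying pgvector on an existing
Aurora PostgreSQL database. All identifiers are illustrative."""
import psycopg2

conn = psycopg2.connect(
    host="my-aurora-cluster.cluster-xxxx.us-east-1.rds.amazonaws.com",
    dbname="mydb", user="myuser", password="<password>")
cur = conn.cursor()

# Enable the pgvector extension (one time; requires sufficient privileges).
cur.execute("CREATE EXTENSION IF NOT EXISTS vector;")

# Store embeddings alongside regular relational data.
cur.execute("""CREATE TABLE IF NOT EXISTS documents (
                   id bigserial PRIMARY KEY,
                   content text,
                   embedding vector(3));""")
cur.execute("INSERT INTO documents (content, embedding) VALUES (%s, %s::vector)",
            ("example document", "[0.1, 0.2, 0.3]"))

# Find the nearest neighbors by Euclidean distance (the <-> operator).
cur.execute("""SELECT content FROM documents
               ORDER BY embedding <-> %s::vector LIMIT 5;""",
            ("[0.1, 0.2, 0.25]",))
print(cur.fetchall())
conn.commit()
```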
Similarly, if the team is working with graph data, they can consider using Neptune Analytics, which seamlessly integrates with their existing AWS infrastructure and provides powerful graph querying and visualization features.
In cases where the team deals with JSON documents and needs a scalable, fully managed document database, Amazon DocumentDB offers a familiar, MongoDB-compatible experience, allowing them to use their existing skills and knowledge.
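For illustration, the following pymongo snippet stores an embedding in Amazon DocumentDB and runs a k-nearest-neighbor query with the $search vectorSearch stage. The connection string and embeddings are placeholders, and the available vectorSearch options can vary by engine version and index type.

```python
"""An illustrative vector query against Amazon DocumentDB using pymongo."""
from pymongo import MongoClient

# Placeholder connection string; DocumentDB typically requires TLS with
# the Amazon RDS CA bundle.
client = MongoClient("mongodb://myuser:<password>@my-docdb-cluster"
                     ".cluster-xxxx.us-east-1.docdb.amazonaws.com:27017/?tls=true")
collection = client["mydb"]["documents"]

# Store a document with its embedding in an ordinary array field.
collection.insert_one({"content": "example document",
                       "embedding": [0.1, 0.2, 0.3]})

# Run a k-nearest-neighbor search over the embedding field.
results = collection.aggregate([{
    "$search": {
        "vectorSearch": {
            "vector": [0.1, 0.2, 0.25],  # query embedding
            "path": "embedding",
            "similarity": "euclidean",
            "k": 5,
        }
    }
}])
for doc in results:
    print(doc["content"])
```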
If the team is experienced with Redis OSS and needs a highly scalable, in-memory database for real-time applications, they can consider using MemoryDB. MemoryDB provides a familiar Redis OSS-compatible interface, allowing the team to use their existing Redis OSS knowledge and client libraries while benefiting from fully managed, durable, and scalable capabilities.
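As a sketch of that familiar interface, the following uses the standard redis-py client against a MemoryDB cluster with vector search enabled. The endpoint, index schema, and embeddings are placeholders.

```python
"""An illustrative vector search against MemoryDB via redis-py."""
import numpy as np
import redis
from redis.commands.search.field import TextField, VectorField
from redis.commands.search.query import Query

# Placeholder MemoryDB cluster endpoint; vector search must be enabled
# on the cluster.
r = redis.Redis(host="clustercfg.my-memorydb.xxxxxx.memorydb.us-east-1.amazonaws.com",
                port=6379, ssl=True)

# Define an HNSW vector index over hash fields.
r.ft("docs").create_index((
    TextField("content"),
    VectorField("embedding", "HNSW",
                {"TYPE": "FLOAT32", "DIM": 3, "DISTANCE_METRIC": "COSINE"}),
))

# Store a document with its embedding as packed float32 bytes.
r.hset("doc:1", mapping={
    "content": "example document",
    "embedding": np.array([0.1, 0.2, 0.3], dtype=np.float32).tobytes(),
})

# Run a KNN query for the 5 nearest neighbors of a query embedding.
query = (Query("*=>[KNN 5 @embedding $vec AS score]")
         .sort_by("score").return_fields("content", "score").dialect(2))
results = r.ft("docs").search(
    query,
    query_params={"vec": np.array([0.1, 0.2, 0.25], dtype=np.float32).tobytes()})
print([doc.content for doc in results.docs])
```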
In these examples, the database engineering team doesn't need to onboard new database software, which would mean adding more developer tooling and integrations. Instead, they can focus on enabling new capabilities within their domain of expertise. Development best practices, operations, and query languages remain the same, reducing the number of new variables in the equation of successful outcomes. Another benefit of using your existing database is that it already aligns with the application's requirements for security, availability, scalability, reliability, and performance. Moreover, your database administrators can continue using familiar technology, skills, and programming tools.
If your current tech stack lacks vector search support, you can take advantage of serverless offerings to help fill the gap in your vector search needs. For example, OpenSearch Serverless with a quick create experience on the Amazon Bedrock console lets you get started without having to create or manage a cluster. Although onboarding a new technology is inevitable in this case, opting for serverless minimizes the management overhead.
Beyond familiarity, the expected process to actually implement a given database is the next primary consideration in the evaluation process. A seamless integration process can minimize disruption and accelerate time to value for your database. In this section, we explore how to evaluate databases across some core implementation focus areas such as vectorization, management, access control, compliance, and interface.
The foremost implementation consideration is the process of populating your database with vector embeddings. If your data isn't already represented as vectors, you'll need to use an embedding model to convert it into vectors and store it in a database enabled for vectors. For example, when you use OpenSearch Service as a knowledge base in Amazon Bedrock, your generative AI application can take unstructured data stored in Amazon Simple Storage Service (Amazon S3), convert it to text chunks and vectors, and store it in OpenSearch Service automatically.
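If you populate the database yourself instead of relying on that automation, the pipeline can look like the following sketch: chunk the text, embed each chunk with Amazon Titan on Amazon Bedrock, and index it into an OpenSearch Service k-NN index. The endpoint, credentials, and chunking strategy are placeholders.

```python
"""An illustrative manual ingestion pipeline into OpenSearch Service."""
import json
import boto3
from opensearchpy import OpenSearch

bedrock = boto3.client("bedrock-runtime")
# Placeholder domain endpoint and credentials.
client = OpenSearch(hosts=["https://my-domain.us-east-1.es.amazonaws.com"],
                    http_auth=("user", "<password>"), use_ssl=True)

# Create an index that stores both the text chunk and its embedding.
client.indices.create(index="docs", body={
    "settings": {"index.knn": True},
    "mappings": {"properties": {
        "content": {"type": "text"},
        "embedding": {"type": "knn_vector", "dimension": 1536},  # Titan v1 size
    }},
})

def embed(text):
    # Convert a text chunk into a vector embedding with Amazon Titan.
    resp = bedrock.invoke_model(modelId="amazon.titan-embed-text-v1",
                                body=json.dumps({"inputText": text}))
    return json.loads(resp["body"].read())["embedding"]

document = open("mydoc.txt").read()
# Naive fixed-size chunking; production systems often split on
# semantic boundaries such as paragraphs or sections.
chunks = [document[i:i + 1000] for i in range(0, len(document), 1000)]
for chunk in chunks:
    client.index(index="docs", body={"content": chunk, "embedding": embed(chunk)})
```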
Day-to-day management is another consideration for overall implementation. Select a database that won't overburden your existing team's database management workload. For example, instead of taking on the management overhead of a self-managed database on Amazon Elastic Compute Cloud (Amazon EC2), you can opt for a serverless database that supports vector search, such as Aurora Serverless v2 or OpenSearch Serverless.
Access control is a critical consideration when integrating vector search into your existing infrastructure. In order to adhere to your current security standards, you should thoroughly evaluate the access control mechanisms offered by potential databases. For instance, if you have robust role-based access control (RBAC) for your non-vector Amazon DocumentDB implementation, choosing Amazon DocumentDB for vector search is ideal because it already aligns with your established access control requirements.
Compliance certifications are key evaluation criteria for a chosen database. If a database doesn't meet essential compliance needs for your application, it's a non-starter. AWS is backed by a deep set of cloud security tools, with over 300 security, compliance, and governance services and features. In addition, AWS supports 143 security standards and compliance certifications, including PCI-DSS, HIPAA/HITECH, FedRAMP, GDPR, FIPS 140-2, and NIST 800-171, helping you adhere to compliance requirements for virtually every regulatory agency around the globe and making sure your databases can meet your security and compliance needs.
How your generative AI application will interact with your database is another implementation consideration, with implications for the general usability of your database. Evaluate how you will connect to and interact with the database, choosing an option with a simple, intuitive interface that meets your needs. For instance, Neptune Analytics simplifies vector search through API calls and stored procedures, making it an attractive choice if you prioritize a streamlined, user-friendly interface. For more details, refer to Working with vector similarity in Neptune Analytics.
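As a rough sketch of that interface, the following boto3 snippet runs a vector similarity query through the Neptune Analytics openCypher endpoint. The graph identifier and embedding are placeholders, and the algorithm name and options should be verified against the Neptune Analytics documentation for your engine version.

```python
"""A sketch of Neptune Analytics vector similarity search via openCypher."""
import boto3

client = boto3.client("neptune-graph")
embedding = [0.1, 0.2, 0.3]  # placeholder query embedding

# topKByEmbedding returns the graph nodes whose stored embeddings are
# closest to the supplied query embedding.
response = client.execute_query(
    graphIdentifier="g-abcd1234",  # placeholder graph ID
    language="OPEN_CYPHER",
    queryString=f"""
        CALL neptune.algo.vectors.topKByEmbedding({embedding}, {{topK: 5}})
        YIELD node, score
        RETURN node, score
    """,
)
print(response["payload"].read())
```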
Integration with Amazon Bedrock Knowledge Bases is important if you're looking to automate the data ingestion and runtime orchestration workflows. Both Aurora PostgreSQL-Compatible and OpenSearch Service are integrated with Amazon Bedrock Knowledge Bases, with more to come. Similarly, integration with Amazon SageMaker enables a seamless, scalable, and customizable solution for building applications that rely on vector similarity search, personalized recommendations, or other vector-based operations, while using the power of machine learning (ML) and the AWS environment. In addition to Aurora PostgreSQL-Compatible and OpenSearch Service, Amazon DocumentDB and Neptune Analytics are integrated with SageMaker.
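For example, with a knowledge base already configured, a single RetrieveAndGenerate call handles embedding the query, searching the vector store, and generating a grounded answer. The knowledge base ID and model ARN below are placeholders.

```python
"""A brief example of the Amazon Bedrock Knowledge Bases runtime integration."""
import boto3

client = boto3.client("bedrock-agent-runtime")

response = client.retrieve_and_generate(
    input={"text": "What is our refund policy?"},
    retrieveAndGenerateConfiguration={
        "type": "KNOWLEDGE_BASE",
        "knowledgeBaseConfiguration": {
            "knowledgeBaseId": "ABCDEFGHIJ",  # placeholder knowledge base ID
            "modelArn": "arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-v2",
        },
    },
)
print(response["output"]["text"])
```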
Additionally, open source frameworks like LangChain and LlamaIndex can be helpful for building LLM applications because they provide a powerful set of tools, abstractions, and utilities that simplify the development process, improve productivity, and enable developers to focus on creating value-added features. LangChain integrates with various AWS databases, storage systems, and external APIs, in addition to Amazon Bedrock, allowing developers to easily use AWS databases and Amazon Bedrock models within the LangChain framework. LlamaIndex supports Neptune Analytics both as a vector store and as a graph store for building GraphRAG applications. Similarly, Hugging Face is a popular platform that provides a wide range of pre-trained models, including BERT, GPT, and T5. Hugging Face is integrated with AWS services, allowing you to deploy models on AWS infrastructure and use them with databases like OpenSearch Service, Aurora PostgreSQL-Compatible, Neptune Analytics, or MemoryDB.
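The following sketch shows one such integration, pairing Amazon Bedrock embeddings with OpenSearch Service as a LangChain vector store. The package paths reflect recent langchain-community releases and may differ by version; the endpoint is a placeholder.

```python
"""A short sketch of LangChain with Amazon Bedrock and OpenSearch Service."""
from langchain_community.embeddings import BedrockEmbeddings
from langchain_community.vectorstores import OpenSearchVectorSearch

embeddings = BedrockEmbeddings(model_id="amazon.titan-embed-text-v1")

# Index documents and their embeddings in OpenSearch Service.
store = OpenSearchVectorSearch.from_texts(
    texts=["Our return policy allows refunds within 30 days."],
    embedding=embeddings,
    opensearch_url="https://my-domain.us-east-1.es.amazonaws.com",  # placeholder
    index_name="docs",
)

# Retrieve the most similar documents for a query.
for doc in store.similarity_search("How long do I have to return an item?", k=1):
    print(doc.page_content)
```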
Scalability is a key factor when evaluating databases, enabling production applications to run efficiently without disruption. The scalability of databases for vector search is tied to their ability to support high-dimensional vectors and vast numbers of embeddings. Different databases scale in different ways to support increased utilization; for example, the scaling mechanisms of Aurora PostgreSQL-Compatible function differently from those of OpenSearch Service or MemoryDB. Understanding a database's scaling mechanisms is essential to planning for the continued growth of your applications. Consider a music company building a rapidly growing music recommendation engine: OpenSearch Service makes it straightforward to operationalize the scalability of the recommendation engine by providing scale-out distributed infrastructure. Similarly, a global financial services company can use Amazon Aurora Global Database to build a scalable and resilient vector search solution for personalized investment recommendations. By deploying a primary database in one AWS Region and replicating it to multiple secondary Regions, the company can provide high availability, disaster recovery, and global application access with minimal latency, while delivering accurate and personalized recommendations to clients worldwide.
AWS databases offer an extensive set of database engines with diverse scaling mechanisms to help meet the scaling demands of nearly any generative AI requirement.
Another critical consideration is database performance. As organizations strive to extract valuable insights from vast amounts of high-dimensional vector data, the ability to run complex vector searches and operations at scale becomes paramount. When evaluating the performance of databases with vector capabilities, it's important to consider the following characteristics:
For example, a global bank building a real-time recommendation engine for its financial instruments needs to stay within an established end-to-end latency budget while delivering highly relevant vector search results at single-digit millisecond latencies for tens of thousands of concurrent users. In this scenario, MemoryDB is the right choice. The choice of indexing technique also significantly impacts query performance. Approximate nearest neighbor (ANN) techniques such as Hierarchical Navigable Small World (HNSW) and inverted file with flat compression (IVFFlat) trade a small amount of result accuracy for substantially faster searches compared to exact k-NN techniques. Understanding these indexing methods is essential for choosing the best database for your specific use case and performance requirements. For example, Aurora Optimized Reads with NVMe caching using HNSW indexing can provide up to a nine-times increase in average query throughput compared to instances without Optimized Reads.
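For illustration, the following sketch builds both index types discussed above on the pgvector table from the earlier example. The connection details and tuning values are placeholders to adjust for your dataset size and recall requirements.

```python
"""An illustrative comparison of the two pgvector index types."""
import psycopg2

# Reuse the Aurora PostgreSQL connection from the earlier pgvector example.
conn = psycopg2.connect(
    host="my-aurora-cluster.cluster-xxxx.us-east-1.rds.amazonaws.com",
    dbname="mydb", user="myuser", password="<password>")
cur = conn.cursor()

# HNSW: graph-based ANN index with higher build cost but strong query
# performance; m and ef_construction control graph density and build quality.
cur.execute("""CREATE INDEX ON documents
               USING hnsw (embedding vector_l2_ops)
               WITH (m = 16, ef_construction = 64);""")

# IVFFlat: clusters vectors into lists; build it after the table has data,
# and size lists relative to row count (a common starting point is rows / 1000).
cur.execute("""CREATE INDEX ON documents
               USING ivfflat (embedding vector_l2_ops)
               WITH (lists = 100);""")
conn.commit()
```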
AWS offers databases that can help fulfill the vector search performance needs of your application. These databases provide various performance optimization techniques and advanced monitoring tools, allowing organizations to fine-tune their vector data management solutions, address performance bottlenecks, and achieve consistent performance at scale. By using AWS databases, businesses can unlock the full potential of their vector search requirements, enabling real-time insights, personalized experiences, and innovative AI and ML applications that drive growth and innovation.
Choosing a fully managed database on AWS to run your vector workloads, supported by the pillars of the Well-Architected Framework, offers significant advantages. It combines the scalability, security, and reliability of AWS infrastructure with the operational excellence and best practices around management, allowing businesses to use well-established database technologies while benefiting from streamlined operations, reduced overhead, and the ability to scale seamlessly as their needs evolve. The following are the characteristics that we have observed as pivotal in the database evaluation process:
Selecting the right database for your generative AI application is crucial for success. While factors like familiarity, ease of implementation, scalability, and performance are important, we recognize the need for more specific guidance in this rapidly evolving space. Based on our current understanding and available options within the AWS managed database portfolio, we recommend the following:
The generative AI landscape is dynamic and continues to evolve rapidly. We encourage testing different database services with your specific datasets and ML algorithms with consideration for how your data will grow over time so your solution can scale seamlessly with your workload.
AWS offers a diverse range of database options, each with unique strengths. By leveraging AWS's powerful ecosystem, organizations can empower their applications with efficient and scalable databases featuring vector storage and search capabilities, driving innovation and competitive advantage. AWS is committed to helping you navigate this journey. If you have further questions or need assistance in designing your optimal path forward for generative AI, don't hesitate to contact our team of experts.
Shayon Sanyal is a Principal Database Specialist Solutions Architect and a Subject Matter Expert for Amazon's flagship relational database, Amazon Aurora. He has over 15 years of experience managing relational databases and analytics workloads. Shayon's relentless dedication to customer success allows him to help customers design scalable, secure and robust cloud native architectures. Shayon also helps service teams with design and delivery of pioneering features, such as Generative AI.
Graham Kutchek is a Database Specialist Solutions Architect with expertise across all of Amazon's database offerings. He is an industry specialist in media and entertainment, helping some of the largest media companies in the world run scalable, efficient, and reliable database deployments. Graham has a particular focus on graph databases, vector databases, and AI recommendation systems.