An Introduction to Vector Databases

Vector databases have been hitting the headlines recently. Pinecone raised a $100 million Series B at a $750 million valuation. Weaviate announced a $50M round of funding. There are other similar databases like Chroma, Qdrant, Milvus, Typesense – the list goes on. Fuelled by generative AI, these databases are fast-tracking the automation process and making the lives of AI engineers and data scientists easier.

What Are Vector Databases?

Source

Vector databases are databases that specialize in the housing or development of vectors - where vectors are essentially a concept where it takes a piece of data and converts it into a string of numbers. Once the data is transformed into vectors, or vector embeddings that are stored in a database, we can perform searches and operations on them. What is unique about these databases is that they store the data only in the form of numbers which the computer understands. This drastically increases the robustness and decreases both computational and time complexity. It also helps to build relationships between various entities in the database. It ensures that data is organized in an intelligent manner.

Why Do We Need to Shift from Relational Databases?

Data is everywhere. Take, for instance, an employee database:

‍

This database is an example of a traditional database. If you query for a specific department or any other parameter, you will get back the result easily. Traditional databases store data in rows and columns. They have different data types. They have both numerical and categorical data types mixed together.

Now, we have all kinds of unstructured data flooding the internet. It becomes hard to browse through all of them at once and extract the desired data of your choice. Even major search engines fail to generate the desired results when we query for a specific topic.

In comes vector databases. Before the advent of vector databases, data analysts and scientists had to go through tons of data-wrangling processes before actually working on the data. The generative nature of vector databases has solved this problem to a great extent. Hence, they save a lot of time and energy.

What Happens Under the Hood?

‍

Vector databases use algorithms like Approximate Nearest Neighbours (ANN) to query through the docs. Above, we can see how same-colored data points are linked together with one another. This facilitates storage, retrieval, and other similarity search operations.

Text, images, or audio data that are stored in the form of embeddings are capable of retaining their meaning by saving similar data closer to each other. This ensures long-term memory.

Major Players in the Industry

The following table depicts the major players in the industry of vector databases:

Source

The below plot depicts the interest in vector databases over time.

Source

Build an Image Search Engine from Scratch on E2E Cloud

We will build an image search engine that takes an image as an input and will return a similar image as an output. We will be using the Weaviate vector database, which is open-source. Follow the steps below:

Step 1: Setting Up on the E2E Cloud

Create a vGPU NVIDIA A100 node:

Go to your local terminal, and log in to your account via ssh:

ssh account_name@ip_address‍

Enter the password when prompted to. Then add your public ssh keys and disable password login.

Step 2: Installation

It is always a good practice to update the machine:

apt update

Moving on, install the stable version of Node.JS using the following commands:

curl -sL https://deb.nodesource.com/setup_18.x | sudo -E bash
sudo apt-get install -y nodejs

Install npm:

sudo npm install -g npm@latest

Docker comes pre-installed in the machine. Now, we are done with the installation process.

Step 3: Setting Up Weaviate

Now to run Weaviate, note that it provides a docker-compose wizard from where we can select options.

Click Next:

Disable these options and set them to false.

Copy the docker-compose command:

curl -o docker-compose.yml "https://configuration.weaviate.io/v2/docker-compose/docker-compose.yml?generative_cohere=false&generative_openai=false&generative_palm=false&image_neural_model=pytorch-resnet50&media_type=image&modules=modules&ref2vec_centroid=false&reranker_cohere=false&runtime=docker-compose&weaviate_version=v1.20.5"

It will generate the following output:

Step 4: Install the Weaviate Client

npm init -y
npm i weaviate-ts-client

Step 5: Initialize the Client

import weaviate from 'weaviate-ts-client';
const client = weaviate.client({
   scheme: 'http',
   host: 'localhost:8080',
});
const schemaRes = await client.schema.getter().do();
console.log(schemaRes)

Step 6: Create a Schema

const schemaConfig = {
const schemaConfig = {
    'class': 'Meme',
    'vectorizer': 'img2vec-neural',
    'vectorIndexType': 'hnsw',
    'moduleConfig': {
        'img2vec-neural': {
            'imageFields': [
                'image'
            ]
        }
    },
    'properties': [
        {
            'name': 'image',
            'dataType': ['blob']
        },
        {
            'name': 'text',
            'dataType': ['string']
        }
    ]
}

await client.schema
    .classCreator()
    .withClass(schemaConfig)
    .do();

  'class': 'Meme',

  'vectorizer': 'img2vec-neural',

  'vectorIndexType': 'hnsw',

  'moduleConfig': {

      'img2vec-neural': {

          'imageFields': [

              'image'

          ]

      }

  },

  'properties': [

      {

          'name': 'image',

          'dataType': ['blob']

      },

      {

          'name': 'text',

          'dataType': ['string']

      }

  ]

}

await client.schema

  .classCreator()

  .withClass(schemaConfig)

  .do();

Step 7: Store an Image

const img = readFileSync('./img/hi-mom.jpg');

const b64 = Buffer.from(img).toString('base64');

await client.data.creator()

.withClassName('Meme')

.withProperties({

  image: b64,

  text: 'matrix meme'

})

.do();

Step 8: Query an Image

const test = Buffer.from( readFileSync('./test.jpg') ).toString('base64');

const resImage = await client.graphql.get()

.withClassName('Meme')

.withFields(['image'])

.withNearImage({ image: test })

.withLimit(1)

.do();

// Write result to filesystem

const result = resImage.data.Get.Meme[0].image;

writeFileSync('./result.jpg', result, 'base64');

Step 9: Create a Directory and Store Some Meme Images

Insert at least 5 meme images in this directory and name the directory, ‘img’. Also, add a test image to query and find a similar image.

Step 10: Run index.js

node index.js

This will show the following output:

An image result.jpg will be created in the directory.

Our test image is:

‍

Result.jpg is one of the images in our database. Hence, it returns a similar output.

Conclusion

In this article, we saw how to create an image search engine on our own. As time passes, we can add more images to the database and populate it. Then our search engine will become more efficient. With the power of generative AI, we have seen how E2E Cloud can be used to create our own search engine powered by Weaviate.

References

[1] https://unzip.dev/p/0x014-vector-databases/

[2] [Ep. 127] Vector databases in AI | Next in Tech

[3] I built an image search engine

An Introduction to Vector Databases

What Are Vector Databases?

Why Do We Need to Shift from Relational Databases?

What Happens Under the Hood?

Major Players in the Industry

Build an Image Search Engine from Scratch on E2E Cloud

Step 1: Setting Up on the E2E Cloud

Step 2: Installation

Step 3: Setting Up Weaviate

Step 4: Install the Weaviate Client

Step 5: Initialize the Client

Step 6: Create a Schema

Step 7: Store an Image

Step 8: Query an Image

Step 9: Create a Directory and Store Some Meme Images

Step 10: Run index.js

Conclusion

References

Related Articles

Making AI Deployment Affordable and Scalable: Cost Efficiency of Quantization

Interpretable vs. Black-Box Models: A Comprehensive Exploration on Early Prediction under Uncertainty

Generative AI in Healthcare: Applications, Benefits, and Its Future

Company

Legal & Policies

Investor Relations

Resources