Elasticsearch can add a wonderful working value to any relational database management system which is in vogue now-a-days, but it needs to be used with a working knowledge of “why to use” and “how to use” before being used with any SQL server.
SQL server may be storing data in any form which can be either relational or multi-dimensional. In case it is multi-dimensional form, services of SQL Server Analysis Services (SSAS) may be used. The default search service of SQL server, Full Text Search service may provide solutions, but proves to be ineffective in cases when the query definitions are more complex and when an exhaustive search has to be made over huge sets of data. Elasticsearch is a search service but has features like Facets and Aggregation Search in its kitty, which can add dynamism to the problems related to data analysis. Many e-commerce sites have a feature of dynamically creating categories as one sifts through the range of products. For such features, one can use the Elasticsearch .
Now the real question: How to use Elasticsearch with an SQL server? As of today, the best means to import data from SQL server to Elasticsearch index is to use “Elasticsearch JDBC River”. In the context of database management systems, a river is basically a library to read data from the external source into the cluster.
The steps to install are as follows:
- First of all, you need to install Elasticsearch and all the related plug-ins and libraries. And since Elasticsearch is written in Java, you need to have JDK installed in your machine and the path must be set properly. You can download the latest archived file, available in zip file format (or tar file format in case you use Linux-based systems). Un-compress the file and switch to work from inside the directory. It is better to use command-line interface for this matter.
- Switch to the “bin” folder and once there type “Elasticsearch -f” to start the installation in the foreground.
- You can choose to install the “JDBC River” available freely from GitHub repository or install it through the “plug-in” command which comes along with the bundle you just downloaded.
- If there is no SQL Server JDBC driver in the path, you have the option of either using jTDS driver or using a JDBC driver available from Microsoft itself.
Configuring the JDBC River
Once that is done, input of the command “sql” will let you start doing everything which is valid in SQL. You can add joins, aliases, conditions and work out the ordering as well.
The command “poll” defines the time interval between which data is brought from the database.
The command “index” allows the user to configure and customize the metadata for the indexes.
Every time a new PUT request is sent, the field “_version” increments automatically.
The best way to start the process of indexing is to start the server again, as the river is started immediately after the restart. But there is a downside. It can result in entire data set being reloaded in the index. If only the IDs are being brought in, all the records that already exist, would receive an update. But if Elasticsearch generates IDs automatically, then there would be a set of new data which would be a duplicate of already existing data. So the river has to be used very carefully. Once the data is loaded into the index, the river may be deleted, if there is no further intention of the data migration.