Architectural Considerations for Using Elasticsearch

If you have decided to consider Elasticsearch in your solution design, it is imperative to understand certain aspects of it from a design perspective.

Some of the most important initial considerations from an architectural design point of view are listed below:

  1. Elasticsearch is a document-oriented database system. It is most useful when the application demands scalability and when queries are based on data content rather than on key values.
  2. Elasticsearch stores data as JSON (JavaScript Object Notation) documents. JSON is a lightweight data-interchange format with a limited number of data types. So if you need a system that enforces strong data typing, you should reconsider your strategy for storing or migrating the data.
  3. When using Elasticsearch, always remember that its REST-based application programming interfaces (APIs) can be reached by both database and web servers, so the decision about which network zone to place it in needs extra thought.
  4. Elasticsearch does not come packaged with any module responsible for authentication. Hence, Elasticsearch should preferably be kept in the back end and not in the front end.
  5. Elasticsearch does not ship with any development editors. Plug-ins therefore need to be developed against the Elasticsearch APIs, or installed from the libraries that are available online.
  6. Elasticsearch front-end programming wrappers are widely popular because of their familiar syntax, which eliminates the need to learn the Elasticsearch APIs. The downside is that wrapper frameworks must be updated constantly to keep pace with the Elasticsearch API, which sees at least three or four product releases per year, some of them with major changes.
  7. Elasticsearch applies default behaviour and manages all data automatically, and it continues to do so until you explicitly override it with custom settings. For example, if a document is indexed with some ID and the same document is then indexed again under the same ID, the operation succeeds with an incremented version number and no exception is raised. As another example, Elasticsearch decides where data lives. If you do not specify otherwise, a document could be placed on any shard, on any node, or even on any machine.
  8. Elasticsearch is a multi-version concurrency control (MVCC) system. On receiving an update request, it does not modify the record in place; it writes a new copy of the record with an incremented version number and marks the old copy as deleted. Old copies keep occupying disk space until they are purged, so frequent updates to a large number of records can increase storage requirements.
  9. Elasticsearch integrates with external systems through “rivers”. The same is not true when data has to be brought into Elasticsearch from relational database management systems. Some community plug-ins can help with a one-time import, but if you want continuous data extraction or data loading into Elasticsearch, a database system with distributed ETL is more advisable.
  10. With respect to data management settings, Elasticsearch can be very rigid, and these settings should be given due consideration before the system is created. For example, if an index is created with the default number of shards, that number remains constant throughout the life of the index, irrespective of the number of records.
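
To make item 2 concrete, the following minimal sketch uses only Python's standard `json` module to show how type information is lost when a record is turned into JSON. It does not talk to Elasticsearch, and the document and its field names (`title`, `tags`, `published`) are hypothetical.

```python
import json
from datetime import date

# A hypothetical document; the field names are illustrative only.
doc = {
    "title": "Intro",
    "tags": ("search", "json"),        # a tuple in Python
    "published": date(2015, 1, 31),    # a date in Python
}


def to_json(document):
    """Serialize a document, converting dates to ISO 8601 strings.

    JSON has no date type, so the value has to travel as a string.
    """
    def fallback(value):
        if isinstance(value, date):
            return value.isoformat()
        raise TypeError(f"not JSON serializable: {type(value).__name__}")

    return json.dumps(document, default=fallback)


round_tripped = json.loads(to_json(doc))
print(round_tripped)
```

After the round trip the tuple comes back as a plain list and the date comes back as a string, so any stricter typing has to be enforced somewhere other than in the JSON itself.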
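
The re-indexing behaviour described in item 7 can be sketched with a toy in-memory model. `ToyIndex` and its `create_only` flag are hypothetical names invented for this illustration; this is not the Elasticsearch client, only a simplified picture of "same ID again means a higher version, not an error".

```python
class ToyIndex:
    """In-memory stand-in for an index. NOT the Elasticsearch client."""

    def __init__(self):
        self._docs = {}  # doc_id -> (version, source)

    def index(self, doc_id, source, create_only=False):
        """Store a document and return its new version number.

        With create_only=True the call is refused if the ID already
        exists, loosely mirroring an explicit "create" operation.
        """
        current_version, _ = self._docs.get(doc_id, (0, None))
        if create_only and current_version:
            raise KeyError(f"document {doc_id!r} already exists")
        new_version = current_version + 1
        self._docs[doc_id] = (new_version, source)
        return new_version


idx = ToyIndex()
first = idx.index("42", {"user": "alice"})
second = idx.index("42", {"user": "alice"})  # same ID again: no exception
print(first, second)  # prints: 1 2
```

In Elasticsearch itself, the comparable opt-in is the `op_type=create` parameter of the index API, which rejects the request with a conflict when the ID already exists. The default silently replaces the document, which is why the override has to be explicit.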
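
One way to see why the shard count in item 10 cannot simply be changed later is to look at how a document is assigned to a shard. Elasticsearch's documentation describes routing of the form `shard = hash(routing) % number_of_primary_shards`, where the routing value defaults to the document ID. The sketch below imitates that formula under stated assumptions: `pick_shard` is a hypothetical helper, and `zlib.crc32` is a stand-in chosen for determinism, not the hash function Elasticsearch actually uses.

```python
import zlib


def pick_shard(routing_value, number_of_primary_shards):
    """Map a routing value (by default the document ID) to a shard number.

    zlib.crc32 is an assumption of this sketch, used only because it is
    deterministic; it is not Elasticsearch's real hash function.
    """
    digest = zlib.crc32(routing_value.encode("utf-8"))
    return digest % number_of_primary_shards


doc_ids = [f"doc-{n}" for n in range(1000)]
with_5 = {d: pick_shard(d, 5) for d in doc_ids}
with_6 = {d: pick_shard(d, 6) for d in doc_ids}

# Count the documents whose shard would change if the index grew
# from 5 to 6 primary shards.
moved = sum(1 for d in doc_ids if with_5[d] != with_6[d])
print(f"{moved} of {len(doc_ids)} documents would map to a different shard")
```

Because the shard number depends on the modulus, changing the number of primary shards would send lookups for most existing documents to the wrong shard. This is also the sense in which, as item 7 notes, a document can land on any shard unless you control the routing yourself.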