Azure Blob storage is Microsoft’s object storage solution for the cloud. Blob storage is optimized for storing massive amounts of unstructured data. Unstructured data is data that doesn’t adhere to a particular data model or definition, such as text or binary data. Any file, image, or text can be uploaded to Blob storage.

In some scenarios, data needs to be encrypted before it is uploaded to Blob storage. …
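The article's own approach is not shown in this preview. As one possible illustration, here is a minimal sketch of client-side encryption with AES-GCM from the JDK's `javax.crypto` package, assuming the encrypted bytes are what gets uploaded as the blob body. The class name `BlobPayloadCrypto`, the "IV prefix + ciphertext" layout, and the in-memory key handling are assumptions made for illustration; the actual upload call through the Azure SDK is deliberately left out.

```java
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

// Hypothetical helper: encrypts a payload before it is uploaded as a blob.
final class BlobPayloadCrypto {
    private static final String TRANSFORMATION = "AES/GCM/NoPadding";
    private static final int IV_LENGTH_BYTES = 12;  // 96-bit IV, the usual size for GCM
    private static final int TAG_LENGTH_BITS = 128; // GCM authentication tag length
    private static final SecureRandom RANDOM = new SecureRandom();

    // Creates a random 256-bit AES key. In a real system the key would live in a
    // key-management service, never next to the blob.
    static SecretKey newKey() {
        try {
            KeyGenerator generator = KeyGenerator.getInstance("AES");
            generator.init(256);
            return generator.generateKey();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("Could not create AES key", e);
        }
    }

    // Encrypts the payload and returns IV followed by ciphertext+tag,
    // which would then be uploaded as the blob body.
    static byte[] encrypt(byte[] plaintext, SecretKey key) {
        try {
            byte[] iv = new byte[IV_LENGTH_BYTES];
            RANDOM.nextBytes(iv); // fresh IV per blob; never reuse an IV with the same key
            Cipher cipher = Cipher.getInstance(TRANSFORMATION);
            cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(TAG_LENGTH_BITS, iv));
            byte[] ciphertext = cipher.doFinal(plaintext);
            byte[] out = new byte[iv.length + ciphertext.length];
            System.arraycopy(iv, 0, out, 0, iv.length);
            System.arraycopy(ciphertext, 0, out, iv.length, ciphertext.length);
            return out;
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("Encryption failed", e);
        }
    }

    // Reverses encrypt(): reads the IV prefix, then decrypts and verifies the tag.
    static byte[] decrypt(byte[] blobBody, SecretKey key) {
        try {
            Cipher cipher = Cipher.getInstance(TRANSFORMATION);
            cipher.init(Cipher.DECRYPT_MODE, key,
                    new GCMParameterSpec(TAG_LENGTH_BITS, blobBody, 0, IV_LENGTH_BYTES));
            return cipher.doFinal(blobBody, IV_LENGTH_BYTES, blobBody.length - IV_LENGTH_BYTES);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("Decryption failed (wrong key or tampered data)", e);
        }
    }
}
```

GCM was chosen here because it both encrypts and authenticates, so a blob that was modified in storage fails to decrypt instead of silently yielding garbage.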


Apache POI already provides utilities to read and write Excel files in Java. If fewer than 50K rows are to be inserted into a file, any Excel utility will do. But when we need to insert a million rows into an Excel file, we have to evaluate the available utilities to find the best one for our use case.

Let’s take an example to understand it better. Suppose our Java application generates a report of the documents a user queries from Elasticsearch. The query matches 1 million documents, so it should be executed in batches to avoid a single huge response (1 million documents) from Elasticsearch. A batch size is defined (let’s say 5,000). …
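The batching idea can be sketched as a loop that fetches one page at a time and writes it out before asking for the next. This is a minimal sketch, not the article's implementation: `fetchPage` is a hypothetical stand-in for the real Elasticsearch call, and the sketch assumes a cursor-style request (the last sort value seen, in the spirit of Elasticsearch's `search_after`), because plain `from`/`size` paging is capped by `index.max_result_window`, which defaults to 10,000.

```java
import java.util.List;
import java.util.function.BiFunction;
import java.util.function.Consumer;

// Hypothetical exporter: pulls documents page by page so that at most
// batchSize documents are held in memory at any time.
final class BatchedExport {
    /**
     * fetchPage (hypothetical): given the sort value of the last document seen
     * (null for the first page) and a page size, returns the next page of
     * document ids in sort order. Returns the number of non-empty batches written.
     */
    static int export(BiFunction<Long, Integer, List<Long>> fetchPage,
                      int batchSize,
                      Consumer<List<Long>> writeBatch) {
        Long lastSeen = null;
        int batches = 0;
        while (true) {
            List<Long> page = fetchPage.apply(lastSeen, batchSize);
            if (page.isEmpty()) {
                return batches;                    // nothing more matched the query
            }
            writeBatch.accept(page);               // e.g. append these rows to the report
            lastSeen = page.get(page.size() - 1);  // cursor for the next request
            batches++;
            if (page.size() < batchSize) {
                return batches;                    // short page: this was the last one
            }
        }
    }
}
```

With 1,000,000 matching documents and a batch size of 5,000, the loop writes 200 full batches, makes one final request that comes back empty, and never holds more than 5,000 documents at once.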


Every Java application requires memory to run on the JVM. This memory is taken from the available RAM of the system where the application is running. There are two main kinds of memory: stack and heap.

Stack

It is the region of memory used to store local (temporary) variables in Java, such as primitive values declared inside a method. It also stores references to objects, while the objects themselves are created on the heap.

It stores the variables created by methods in last-in-first-out (LIFO) order, and frees all the memory allocated for a method's variables when that method exits.

Each thread in Java has its own stack, so a stack's scope is limited to its thread. A stack is smaller than the heap. …
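To make the stack/heap split concrete, here is a small sketch (the class and method names are illustrative, not from the article). Each worker thread keeps a primitive counter in its own stack frame, while all of them update one `AtomicInteger` object that lives on the heap.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative demo: per-thread stack variables vs. one shared heap object.
final class StackHeapDemo {
    static final class Result {
        final int sharedTotal;       // value read from the shared heap object
        final int[] perThreadLocals; // what each thread's private stack variable reached

        Result(int sharedTotal, int[] perThreadLocals) {
            this.sharedTotal = sharedTotal;
            this.perThreadLocals = perThreadLocals;
        }
    }

    static Result run(int threads, int incrementsPerThread) {
        // The AtomicInteger object is allocated on the heap; 'shared' is only a
        // reference held in run()'s stack frame.
        AtomicInteger shared = new AtomicInteger();
        int[] locals = new int[threads]; // arrays are objects, so this also lives on the heap
        List<Thread> workers = new ArrayList<>();

        for (int t = 0; t < threads; t++) {
            final int index = t;
            Thread worker = new Thread(() -> {
                int local = 0; // primitive local variable in this worker's own stack frame
                for (int i = 0; i < incrementsPerThread; i++) {
                    local++;                  // no other thread can see or change this
                    shared.incrementAndGet(); // every thread updates the same heap object
                }
                locals[index] = local;
            });
            workers.add(worker);
            worker.start();
        }
        for (Thread worker : workers) {
            try {
                worker.join(); // join() also makes the workers' writes visible here
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return new Result(shared.get(), locals);
    }
}
```

Calling `StackHeapDemo.run(4, 1000)` reports a shared total of 4000, while each thread's local counter only reaches 1000, because local variables are private to each thread's stack.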


Kafka is an open-source stream processing platform. It is designed to provide high throughput and low latency for handling real-time data.

Before we read about how to make our Kafka producer/consumer production-ready, let’s first understand the basic terminology of Kafka.

Kafka topic: A named category used to store a specific kind of message or a particular stream of data. A topic contains messages of a similar kind.

Kafka partition: A topic can be divided into one or more partitions, which further segregate the messages of a topic into multiple buckets. …
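How a keyed message ends up in a particular partition can be sketched as hashing its key modulo the partition count, so every message with the same key lands in the same bucket. This is a simplified, hypothetical stand-in: Kafka's default partitioner hashes the serialized key with murmur2, whereas this sketch uses `Arrays.hashCode` purely for illustration, so the partition numbers it produces will not match a real cluster's.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical partitioner illustrating key -> partition routing.
final class KeyPartitioner {
    // Maps a key to a partition index in [0, numPartitions).
    // Simplified: Kafka's default partitioner uses murmur2, not Arrays.hashCode.
    static int partitionFor(String key, int numPartitions) {
        byte[] keyBytes = key.getBytes(StandardCharsets.UTF_8);
        int hash = Arrays.hashCode(keyBytes);
        return (hash & 0x7fffffff) % numPartitions; // clear the sign bit so the index is never negative
    }

    // Groups keys into buckets the way a producer spreads keyed messages over partitions.
    static Map<Integer, List<String>> route(List<String> keys, int numPartitions) {
        Map<Integer, List<String>> buckets = new TreeMap<>();
        for (String key : keys) {
            buckets.computeIfAbsent(partitionFor(key, numPartitions), p -> new ArrayList<>()).add(key);
        }
        return buckets;
    }
}
```

Because the mapping is deterministic, all messages for a given key go to one partition, which is what lets consumers see them in the order they were written to that partition.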


We live in an era where applications handle huge volumes of data. To keep our applications search-efficient and storage-economical, we need to purge aged data from the data store. Removing old data shrinks the search space that queries have to cover to retrieve results, and it requires less hardware to store the documents. Removing individual documents from an Elasticsearch index is quite an expensive operation. Elasticsearch provides a better way to achieve this. …


In a previous article, we learned how Docker makes application deployment easy and efficient. Docker containers can be scaled, but scaling them requires manual effort. Without a container orchestrator, we run into several problems:

  1. Containers cannot communicate with each other.
  2. Traffic distribution becomes a big problem.
  3. Managing the cluster of containers manually is an overhead.
  4. Auto-scaling is not possible.

In a production environment, we need to address these problems to have a robust, highly available, and economical application. This is where a container orchestrator comes to the rescue. Many orchestrators are available today, and Kubernetes, originally developed by Google, is the best known and most widely used. …


Java is a language that supports sequential, parallel, and asynchronous programming by programmatically creating lightweight processes known as threads. This helps us write efficient programs.

Let’s first understand these three styles of program:

  1. Sequential program: Given a list of N tasks, the program picks each task in turn and performs some action on it. If each task takes 1 second to complete, the sequential program takes N seconds to complete.
  2. Parallel program: To improve on the performance of the sequential program, a parallel program uses threads to divide the list of tasks among them. The threads work on these tasks in parallel, so the whole list completes in less time.
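The difference between the two styles can be sketched with an `ExecutorService`. This is a minimal illustration under assumptions of our own: the tasks are plain integers, `slowSquare` is a hypothetical task that sleeps 100 ms to mimic slow work, and a fixed pool of four threads is an arbitrary choice.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.IntUnaryOperator;

final class SequentialVsParallel {
    // Sequential: the calling thread processes the tasks one after another.
    static List<Integer> runSequential(List<Integer> tasks, IntUnaryOperator work) {
        List<Integer> results = new ArrayList<>();
        for (int task : tasks) {
            results.add(work.applyAsInt(task));
        }
        return results;
    }

    // Parallel: the tasks are handed to a fixed pool of worker threads.
    static List<Integer> runParallel(List<Integer> tasks, IntUnaryOperator work, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Integer>> futures = new ArrayList<>();
            for (int task : tasks) {
                futures.add(pool.submit(() -> work.applyAsInt(task)));
            }
            List<Integer> results = new ArrayList<>();
            for (Future<Integer> future : futures) {
                results.add(future.get()); // waits for each task; keeps submission order
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    // Hypothetical slow task: sleeps 100 ms, then squares its input.
    static int slowSquare(int x) {
        try {
            Thread.sleep(100);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return x * x;
    }

    public static void main(String[] args) {
        List<Integer> tasks = new ArrayList<>();
        for (int i = 1; i <= 8; i++) {
            tasks.add(i);
        }
        long start = System.nanoTime();
        List<Integer> seq = runSequential(tasks, SequentialVsParallel::slowSquare);
        long seqMs = (System.nanoTime() - start) / 1_000_000;

        start = System.nanoTime();
        List<Integer> par = runParallel(tasks, SequentialVsParallel::slowSquare, 4);
        long parMs = (System.nanoTime() - start) / 1_000_000;

        System.out.println("sequential " + seq + " in ~" + seqMs + " ms");
        System.out.println("parallel   " + par + " in ~" + parMs + " ms");
    }
}
```

Both runs produce the same results in the same order. On a typical machine, `main` should show the sequential run taking roughly 800 ms and the parallel run roughly 200 ms, though exact timings vary.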


Functional programming is a well-known concept in JavaScript and Python, but it has long been a novelty for Java developers. Prior to Java 8, Java was a strictly object-oriented language in which objects are first-class citizens, and we wrote imperative code. From version 8 onwards, Java brings in new capabilities for functional programming. However, Java is not a functional programming language in the way JavaScript is.

What is functional programming?

To understand functional programming, we first need to understand some basic concepts.

  1. Imperative programming: In this paradigm, we specify both what needs to be done and how it should be done. …
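A small contrast helps here. The sketch below (names are illustrative) solves the same task twice: once imperatively, spelling out the loop and the mutation, and once with the Java 8 Stream API and lambdas, which describes the result as a pipeline of operations.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

// Illustrative comparison: keep words longer than 3 characters, upper-cased.
final class ImperativeVsFunctional {
    // Imperative: we spell out HOW, step by step: loop, test each word,
    // and mutate an accumulator list.
    static List<String> longWordsUpperImperative(List<String> words) {
        List<String> result = new ArrayList<>();
        for (String word : words) {
            if (word.length() > 3) {
                result.add(word.toUpperCase(Locale.ROOT));
            }
        }
        return result;
    }

    // Functional style (Java 8 streams and lambdas): we describe WHAT we want,
    // a filter followed by a map, with no explicit loop or mutable accumulator.
    static List<String> longWordsUpperFunctional(List<String> words) {
        return words.stream()
                .filter(word -> word.length() > 3)
                .map(word -> word.toUpperCase(Locale.ROOT))
                .collect(Collectors.toList());
    }
}
```

For the input `["java", "is", "fun", "lambda"]`, both methods return `["JAVA", "LAMBDA"]`.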


In my previous post, I explained how to implement pagination in Cassandra. Here, we’ll look at how data is written, read, updated, and deleted in Cassandra. Cassandra is a horizontally scalable NoSQL database.

Before we talk about how Cassandra maintains data, we first need to understand some basic terminology:

  1. CommitLog: This is an append-only log of all changes local to a Cassandra node. Any data written to Cassandra is first written to the commit log and then to a memtable. This makes it possible to restore data after a node failure. The CommitLog also optimizes upsert operations: Cassandra does not need to flush every update to disk (as SSTables) to be safe from data loss on failure, because the update is already recorded in the commit log. You may be wondering: if Cassandra has to write updates to disk anyway, why not write them to SSTables directly? The reason is that the CommitLog is optimized for writing. Unlike SSTables, which store rows in sorted order, the CommitLog stores updates in the order in which Cassandra processed them. …
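The write path described above can be modeled with a deliberately simplified, in-memory toy. It is not how Cassandra is implemented: the commit log here is a list rather than an on-disk file, "SSTables" are immutable sorted maps, reads just check the memtable and then the newest SSTable first, and the log is cleared wholesale on flush. The class `ToyWritePath` is hypothetical and only illustrates the ordering of steps: append to the commit log, update the sorted memtable, flush to a sorted SSTable, and replay the log after a crash.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical toy model of a commit log + memtable + SSTable write path.
final class ToyWritePath {
    // Append-only log: entries stay in the order they were processed.
    final List<String[]> commitLog = new ArrayList<>();
    // In-memory table kept sorted by key, like a memtable.
    NavigableMap<String, String> memtable = new TreeMap<>();
    // Flushed tables: immutable and sorted by key, like SSTables.
    final List<NavigableMap<String, String>> sstables = new ArrayList<>();

    void write(String key, String value) {
        commitLog.add(new String[] {key, value}); // 1. record the change in the log first
        memtable.put(key, value);                 // 2. then apply it to the memtable
    }

    void flush() {
        sstables.add(Collections.unmodifiableNavigableMap(memtable));
        memtable = new TreeMap<>();
        commitLog.clear(); // simplification: flushed data no longer needs replaying
    }

    // Simulates a node failure: the memtable is lost and rebuilt from the log.
    void crashAndRecover() {
        memtable = new TreeMap<>();
        for (String[] entry : commitLog) {
            memtable.put(entry[0], entry[1]);
        }
    }

    String read(String key) {
        String value = memtable.get(key);
        if (value != null) {
            return value;
        }
        for (int i = sstables.size() - 1; i >= 0; i--) { // newest SSTable first
            value = sstables.get(i).get(key);
            if (value != null) {
                return value;
            }
        }
        return null;
    }
}
```

Writing `b` then `a` leaves the log in processing order (`b`, `a`) while the memtable and any flushed SSTable are sorted (`a`, `b`), which mirrors the contrast the paragraph draws between the CommitLog and SSTables.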


Before we start talking about Docker, we need to understand the problem that Docker solves efficiently and economically. Before Docker gained popularity, companies used virtualization to run multiple applications, since different applications might need different sets of libraries and operating systems to run.

Why do organizations prefer virtual machines over provisioning physical servers to host their applications?

  1. Better response time. Virtualization can improve application performance, and provisioning a virtual machine takes only minutes rather than weeks. At more advanced levels of implementation, users can even provision machines themselves.
  2. Improved application availability. When physical servers have problems, need routine maintenance, or require upgrades, the result is costly downtime. With virtual servers, applications can be readily moved between hosts to keep everyone up and running. …

About

Shivanshu Goyal

Software Engineer @Walmartlabs, India
