The market is abuzz with terms like NoSQL, Big Data, NewSQL, Database Appliance, etc. Often, IT decision makers can get very confused with all the noise. They do not understand why they should consider a newer, alternative database when RDBMSs have been around for 20+ years. However, many leading enterprises are already using alternative databases and are saving money, innovating more quickly, and completing projects they could not pursue before as a result. Let’s discuss how one can determine if NoSQL is a fit for current or future applications.
Nature of data
The first consideration that needs to be made when selecting a database is the characteristics of the data you are looking to leverage. If the data has a simple tabular structure, like an accounting spreadsheet, then the relational model could be adequate.
Data such as geo-spatial, engineering parts, or molecular modeling, on the other hand, tends to be very complex. It may have multiple levels of nesting and the complete data model can be complicated. Such data has, in the past, been modeled into relational tables, but has not fit into that two-dimensional row-column structure naturally.
In similar cases today, one should consider NoSQL databases as an option. Multi-level nesting and hierarchies are very easily represented in the JavaScript Object Notation (JSON) format used by some NoSQL products.
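As a rough illustration of the nesting point above, the sketch below stores a made-up engineering part as a single nested object and round-trips it through JSON. The record and every field name in it are assumptions invented for this example, not taken from any real product.

```python
import json

# Hypothetical engineering-part record; all field names are illustrative.
part = {
    "part_id": "P-100",
    "name": "gearbox",
    "location": {"type": "Point", "coordinates": [-122.4, 37.8]},
    "components": [
        {
            "name": "shaft",
            "material": "steel",
            "subparts": [{"name": "bearing", "count": 2}],
        },
        {"name": "housing", "material": "aluminum", "subparts": []},
    ],
}


def count_nodes(node):
    """Count this node plus every nested component or subpart, at any depth."""
    children = node.get("components", []) + node.get("subparts", [])
    return 1 + sum(count_nodes(child) for child in children)


# The whole hierarchy travels as one JSON string, with no extra tables.
encoded = json.dumps(part)
decoded = json.loads(encoded)
print(count_nodes(decoded))
```

In a relational design the same data would likely be spread over several tables (parts, components, subparts) and reassembled with joins.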
The next question to ask is "what is the volatility of the data model?" Is the data model likely to change and evolve or is it most likely going to stay the same? Generally speaking, all the facts about the data model are not known at design time, so some flexibility is needed. This presents many issues to the relational database management system (RDBMS) users of the world.
During my time at IBM, we spent many hours cautioning users to design the schema right the first time, as revisions made later slowed or stopped the database from operating. For that reason, any potential changes made down the road had to be minimal. The issue of schema-rigidity still rings true today, leading to little flexibility when it comes to application development and evolution.
This "get it right first" approach may have worked in the old world of static schema, but it will not be suitable for the new world of dynamic schema, where changes need to be made daily, if not hourly, to fit the ever changing data model. It is no wonder that many NoSQL users are Web-centric businesses which require a greater amount of flexibility.
Application development (high coding velocity & agility)
The key constituency of the DBMS is the application developer community. In the past, the industry delineated the database administrator (DBA) from the application developer. The new world blurs such distinctions and demands very little dependency on dedicated DBAs. The software developer becomes the most important user.
The developer requires high coding velocity and great agility in the application building process. NoSQL databases have proven to be a better choice in that regard, using object-focused technologies such as JSON, for example. Even if you are a SQL shop, the incremental time to learn emerging database technologies will save lots of development cost over time.
The learning curve on JSON, for example, is quite fast and programmers can build a prototype in days and weeks. Since many NoSQL offerings include an open system, the community provides many productivity tools, another big advantage over single-vendor proprietary products. Some organizations, such as MongoDB, even offer free courses online that train employees and interested users in how to use the technology.
Operational issues (scale, performance, and high availability)
I know from experience that as a database grows in size or the number of users multiplies, many RDBMS-based sites suffer serious performance issues.
Next, consultants are brought in to look at the problem and provide solutions. Vertical scaling is usually recommended at high cost. As processors are added, linear scaling occurs, up to a point where other bottlenecks can appear. Many commercial RDBMS products offer horizontal scaling (clustering) as well, but these are bolted-on solutions and can be very expensive and complex.
If an organization is facing such issues, then it should consider NoSQL technologies, as many of them were designed specifically to address these scale (horizontal scaling or scale-out using commodity servers) and performance issues. Just like the Hadoop Distributed File System (HDFS), whose horizontal scaling architecture for batch processing was modeled on Google’s distributed file system, these newer NoSQL technologies were built to host distributed databases for online systems. Redundancy (in triplicate) is implemented here for high availability.
A common complaint about NoSQL databases is that they forfeit consistency in favor of high availability. However, this can't be said for all NoSQL databases. In general, one should consider an RDBMS if one has multi-row transactions and complex joins. In a NoSQL database like MongoDB, for example, a document (aka complex object) can be the equivalent of rows joined across multiple tables, and consistency is guaranteed within that object.
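To make the "document as joined rows" idea concrete, here is a minimal sketch using Python's built-in sqlite3 module. The orders and order_items tables and their columns are hypothetical, and the resulting dictionary only imitates the shape of a document; it is not MongoDB's API.

```python
import sqlite3

# Relational side: one order spread across two tables (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer TEXT);
    CREATE TABLE order_items (order_id INTEGER, sku TEXT, qty INTEGER);
    INSERT INTO orders VALUES (1, 'Ada');
    INSERT INTO order_items VALUES (1, 'A-1', 2), (1, 'B-7', 1);
""")

# Reading the order back requires a join across both tables.
rows = conn.execute("""
    SELECT o.order_id, o.customer, i.sku, i.qty
    FROM orders AS o
    JOIN order_items AS i ON i.order_id = o.order_id
    ORDER BY i.sku
""").fetchall()

# Document side: the same facts folded into one self-contained object.
order_doc = {
    "order_id": rows[0][0],
    "customer": rows[0][1],
    "items": [{"sku": sku, "qty": qty} for _, _, sku, qty in rows],
}
print(order_doc)
```

Because the order and its items live in one object, a store that updates a single document atomically can keep them consistent without a multi-row transaction.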
NoSQL databases, in general, avoid RDBMS functions like multi-table joins that can be the cause of high latency. In the new world of big data, NoSQL offers choices of strict to relaxed consistency that need to be looked at on a case-by-case basis.
Data warehousing & analytics
RDBMSes are ideally suited for complex query and analysis. Originally DB2 and Oracle were mostly used for query-intensive workloads. Data from production systems were extracted and transformed (via ETL processes) and loaded into an RDBMS for slicing and dicing. Even in today’s world, Hadoop data is sometimes loaded back to an RDBMS for reporting purposes. So an RDBMS is a good choice if the query and reporting needs are very critical.
Real time analytics for operational data is better suited to a NoSQL setting. Further, in cases where data is brought together from many upstream systems to build an application (not just reporting), NoSQL is a must. Today, BI tool-support for NoSQL is new, but growing rapidly.
Co-existence of RDBMS and NoSQL databases
IBM just announced the implementation of the MongoDB API, data representation, query language and wire protocol, thus establishing a way for mobile and other next-generation applications to connect with enterprise database systems such as IBM’s DB2 relational database and its WebSphere eXtreme Scale data grid. This could usher in a new wave of flexible applications that add significant value by spanning multiple data systems.
Oracle also introduced its NoSQL product last year.
Data exchange and interoperability will continue to evolve as other industry leaders follow in IBM's footsteps, and the functionality of NoSQL databases will continue to mature over time. Fortune 1000 companies will be well-advised to look at NoSQL database solutions to meet their needs in a data-intensive business world.
The rapid adoption of these alternative databases in just a few years is a testament to their attractiveness to the new world of Big Data, where agility, performance, and scalability reign supreme.
Overview of SQL and NoSQL
Relational database management systems (RDBMSs), which are queried through SQL, have been the primary model for database management during the past few decades. Today, however, non-relational “NoSQL” databases are gaining prominence as an alternative model. Let’s discuss why this evolution in database management is happening.
SQL (Structured Query Language) is the standard language used to communicate with a relational database. It is used to manage, store, and retrieve data in relational databases through applications and queries, either on the same computer or over the network. An SQL server hosts a relational database, which comprises a set of tables containing data in predefined categories or columns. It holds structured data like names, email addresses, and phone numbers. A relational database relates data through common characteristics found in the dataset, and the resulting organization of tables and columns is termed the schema.
As we now need to handle immense amounts of data of different categories, the “one size fits all” approach of SQL is in question. This has led to the emergence of NoSQL, commonly expanded as “Not Only SQL”. With NoSQL, unstructured data can be stored across multiple processing nodes; it does not require fixed table schemas, usually avoids join operations, and typically scales horizontally.
NoSQL – A Substitute for the Limitations of SQL?
At present, as we are handling a humongous amount of data, data being organized and well-structured actually creates a problem, especially at extremely large volumes. The structured approach of an RDBMS slows down performance as data volume grows, and it does not scale easily to meet the needs of Big Data.
So NoSQL was conceived as a completely different framework of databases that allows for high-performance, agile processing of information at a much bigger scale. This is the kind of database well-adapted to the high demands of big data. MongoDB, for example, is a NoSQL database that stores unstructured data. This means that you don’t need to know in advance exactly what kind of data you’ll be collecting and storing. You can collect a lot more data of different kinds and can access and analyze data much faster.
NoSQL is centered on the concept of distributed databases, where unstructured data may be stored across multiple processing nodes, and often across multiple servers. This distributed architecture allows NoSQL databases to be horizontally scalable; as data continues to explode, just add more hardware to keep up, with no slowdown in performance.
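The idea of spreading data across nodes can be sketched with simple hash-based partitioning. This is an illustrative assumption about one possible routing scheme, with made-up node names and keys; it is not how any particular NoSQL product places data.

```python
import hashlib


def shard_for(key, nodes):
    """Pick a node for a key by hashing the key (simple modulo scheme)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(nodes)
    return nodes[index]


# Hypothetical cluster of three commodity servers.
nodes = ["node-a", "node-b", "node-c"]

# Route 1000 hypothetical user keys and count how many land on each node.
placement = {}
for user_id in range(1000):
    node = shard_for(f"user:{user_id}", nodes)
    placement[node] = placement.get(node, 0) + 1

print(placement)
```

With plain modulo hashing, adding a node changes the target of most keys, which is why production systems tend to use schemes such as consistent hashing or range partitioning that move less data when the cluster grows.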
Benefits of NoSQL
Elastic scaling
Earlier, with relational databases, database administrators relied on scaling up (buying bigger, more expensive servers) as database load increased, rather than scaling out (distributing the database across multiple hosts). The new breed of NoSQL databases is designed to expand transparently and horizontally to take advantage of new nodes, and they’re usually designed with low-cost commodity hardware in mind. With NoSQL, servers can be added to or removed from the data layer without application downtime.
Bigger Data Handling Capability
RDBMS capacity has been growing to match the increase in volumes of data, but the limits on the data volume that a single RDBMS can handle are intolerable for some enterprises. Hadoop, an enabler of certain types of NoSQL distributed databases, allows data to be spread across thousands of servers with little reduction in performance, at volumes that outstrip what the biggest RDBMS can handle.
Maintaining NoSQL Servers is Cheaper
Maintaining high-end RDBMS systems is expensive and can only be done with the assistance of expensive, highly trained DBAs. NoSQL databases, on the other hand, require less management. Features like automatic repair, easier data distribution, and simpler data models reduce administration and tuning requirements in NoSQL.
Lower Server Cost
NoSQL databases typically use clusters of cheap commodity servers to manage the exploding data and transaction volumes, while RDBMSs tend to rely on expensive proprietary servers and storage systems. So the cost per gigabyte of storing and processing data with NoSQL can be many times lower than with an RDBMS.
No Schema or Fixed Data model
Data can be inserted into a NoSQL database without first defining a rigid database schema. So the format or data model of what is being inserted can be changed at any time, without application disruption. This provides immense application and business flexibility. By contrast, change management is a big headache in SQL. There, even minor changes to the data model have to be carefully managed and may necessitate downtime or reduced service levels.
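A small sketch of what schema-free insertion looks like in practice: documents with different fields sit side by side in the same collection. The in-memory list and the insert helper below are stand-ins invented for this example, not a real database client.

```python
# Stand-in for a document collection; a real store would persist these.
collection = []


def insert(doc):
    """Hypothetical insert helper: accepts any dictionary, no schema check."""
    collection.append(dict(doc))


# The shape of the data evolves from one insert to the next.
insert({"name": "Ada", "email": "ada@example.com"})
insert({"name": "Lin", "email": "lin@example.com", "phone": "555-0100"})
insert({"name": "Sam", "social": {"handle": "@sam"}})

# Readers must tolerate missing fields instead of relying on fixed columns.
with_phone = [d["name"] for d in collection if "phone" in d]
all_fields = sorted({field for d in collection for field in d})
print(with_phone, all_fields)
```

The flexibility has a cost: the checks a relational schema would enforce move into application code, as the missing-field handling in the last lines suggests.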
Integrated Caching Facility
In order to increase data throughput and performance, advanced NoSQL techniques cache data in system memory. This is in contrast to SQL databases, where caching has to be done using separate infrastructure.
Limitations of NoSQL
Though NoSQL databases have generated a lot of enthusiasm, there are several obstacles they have to overcome before they become appealing to mainstream companies.
NoSQL alternatives and solutions are still in nascent and pre-production stages and many key features are yet to be implemented.
Customer support is also better for RDBMS products, whose vendors provide a higher level of enterprise support. In contrast, NoSQL system support is often provided by small start-up companies without the global reach, resources, or credibility of Oracle, Microsoft, or IBM, the big names associated with SQL.
NoSQL databases have evolved to meet the scaling demands of modern Web 2.0 applications and are oriented to meet the demands of these applications. They offer few facilities for ad-hoc query and analysis. It is much easier to code an SQL query, but in NoSQL even a simple query requires significant programming expertise and commonly used BI tools do not provide connectivity to NoSQL.
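The gap in ad-hoc query support can be illustrated by answering one question two ways. The sales figures and table layout are made up, and the second half assumes a store that exposes no query language at all, which is a simplification; many NoSQL products do ship their own query facilities.

```python
import sqlite3
from collections import defaultdict

# Hypothetical sales facts: (region, amount).
sales = [("east", 120), ("west", 80), ("east", 30)]

# SQL: the question "total per region" is a single declarative statement.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?)", sales)
sql_totals = dict(
    conn.execute("SELECT region, SUM(amount) FROM sales GROUP BY region")
)

# Without a query language: the aggregation is written by hand in code.
docs = [{"region": region, "amount": amount} for region, amount in sales]
code_totals = defaultdict(int)
for doc in docs:
    code_totals[doc["region"]] += doc["amount"]

print(sql_totals, dict(code_totals))
```

Both halves produce the same totals; the difference is that the SQL version can be changed by an analyst editing one statement, while the hand-written version needs a programmer for every new question.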
Conclusion
SQL and NoSQL have both been great inventions in the area of data management and have been used to keep data storage and retrieval optimized and smooth. It’s still hard to criticize one and go completely with the other. Both technologies are good at what they do, and it is up to the developer to put them to the best use depending on the business situation and needs. Though NoSQL databases are becoming an important part of the database landscape, enterprises should proceed with caution and be aware of the legitimate limitations associated with these databases.