Wednesday, November 26, 2014

SQL or NoSQL - Basics

I am going to write a multi-part series on this topic, as it is a vast one to cover. I want to start with the basics and then dig deeper into technologies, architecture choices, security, etc.

Basics:

SQL: I think every developer and architect has used this type of database in some shape or form. Examples of such dbs are MySQL, Oracle, MS SQL Server, Sybase, etc. They are better known as RDBMS, or relational databases. The primary idea behind such dbs is the set of ACID properties:

A - Atomicity
C - Consistency
I - Isolation
D - Durability

In short, they are transactional and relational in nature, allowing SQL queries to be written against them, including joins and sub-queries across multiple tables.
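To make the two ideas concrete, here is a minimal sketch using Python's built-in sqlite3 module (the table names and data are hypothetical): the `with conn:` block gives an atomic transaction, and the query at the end is a join across two tables.

```python
import sqlite3

# In-memory relational database; sqlite3 provides ACID transactions.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
cur.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, "
            "customer_id INTEGER REFERENCES customers(id), amount REAL)")

# Atomicity: both inserts commit together, or neither does.
with conn:
    cur.execute("INSERT INTO customers VALUES (1, 'Alice')")
    cur.execute("INSERT INTO orders VALUES (10, 1, 99.5)")

# A join across the two tables.
cur.execute("""
    SELECT c.name, o.amount
    FROM customers c
    JOIN orders o ON o.customer_id = c.id
""")
rows = cur.fetchall()
print(rows)  # [('Alice', 99.5)]
```

If any statement inside the `with conn:` block raises, the whole transaction rolls back, which is exactly the atomicity guarantee described above.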

NoSQL: NoSQL dbs have become popular in the last 7-10 years and, as the name suggests, they are non-RDBMS dbs. They don't adhere to strict relational properties and are more like key-value storage. The reason for their popularity has been the growth in data: with huge data growth, RDBMSs have a lot of issues scaling out and becoming distributed. They have to keep the ACID properties in check while they grow and scale out, and so they become bottlenecks for fast applications. Hence NoSQL. Examples of such dbs are MongoDB, Cassandra, CouchDB, etc.
The guiding property for such dbs is the CAP theorem:

  • Consistency (eventually)
  • Availability
  • tolerance to network Partitioning
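As a toy illustration of the key-value style of storage described above (this is a plain in-memory sketch, not the API of any real NoSQL product; the key and field names are hypothetical): values are just blobs looked up by key, so documents with different shapes can live side by side with no fixed schema.

```python
import json

# Toy in-memory key-value store; real NoSQL systems shard keys
# across nodes, which is where the CAP trade-offs come in.
store = {}

def put(key, doc):
    store[key] = json.dumps(doc)  # serialize the document as a blob

def get(key):
    return json.loads(store[key])

# Two "rows" with different fields -- no ALTER TABLE needed.
put("user:1", {"name": "Alice", "email": "alice@example.com"})
put("user:2", {"name": "Bob", "tags": ["beta", "mobile"]})

print(get("user:2")["tags"])  # ['beta', 'mobile']
```

The flexibility is the appeal; the cost is that joins and multi-key transactions, which an RDBMS gives you for free, become the application's problem.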

I will cover slightly more advanced NoSQL topics in the next writeup.

Monday, November 10, 2014

Data accuracy and trends - how to measure anomalies and resolve them

There is always a disconnect between the applications team and the data-warehouse team in an enterprise about the accuracy of data from source systems. The data team has its own arguments, with their own merits, as to why it needs accurate and predictable data from source systems, and the applications team has its own as to why the data can't always be 100% accurate.

The primary reason the application team has trouble providing 100% accurate data is the continuous evolution of business applications and systems, plus the ongoing bugs in applications, which tend to create anomalies in the data. The data team's argument is that unless it gets accurate data, it is hard to do meaningful analysis and build reports on it.

I have some ideas for how this gap can be bridged, if not eliminated entirely:

  1. How much accuracy is a good level of accuracy: Let's face it: unless you are in some kind of transactional domain such as banking, you will always have to rely on unpredictable and random end-user behavior. This is aggravated in a B2C application, where data is primarily generated by end-user behavior and interaction with the application. There will always be edge cases, not to mention bugs in the application (which will also never go to zero). That makes it a good point for the two teams to get together and define a business-level target for the accuracy of the data.
  2. Fix it by process: Another way to reduce data inaccuracy in an ever-evolving business application is tighter integration of the data team with the app development team in their development process. It is always a good idea to have these two teams communicate on an ongoing basis about upcoming development plans and changes to the application.
  3. QA it! Another way to reduce data gaps introduced by bugs in a release is to have QA plan and execute test plans with the reporting team's input. Any good QA org should build test cases keeping data needs in mind, with the data team's input.
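Once the two teams agree on a target (idea 1 above), it can be checked automatically instead of argued about per load. This is a hedged sketch: the field names and the 98% target are hypothetical, and a real check would run against the warehouse, not an in-memory list.

```python
# A sketch of an agreed accuracy-threshold check between app and data teams.
records = [
    {"user_id": 1, "country": "US"},
    {"user_id": 2, "country": None},    # anomaly from an app bug
    {"user_id": 3, "country": "IN"},
    {"user_id": None, "country": "DE"},  # anomaly: missing key field
]

def accuracy(rows, required_fields):
    """Fraction of rows where every required field is present."""
    good = sum(1 for r in rows
               if all(r.get(f) is not None for f in required_fields))
    return good / len(rows)

TARGET = 0.98  # business-level adherence agreed by both teams
score = accuracy(records, ["user_id", "country"])
print(f"accuracy={score:.0%}, meets target={score >= TARGET}")
```

A check like this, run on every load, turns the accuracy argument into a number both teams already signed off on.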
Please share any other ideas you may have.