Friday, December 12, 2014

SQL or No SQL- Big Data angle

Continuing from my previous post, let's look at what big data is and how modern No SQL DBs help in capturing it.
Big data, as the name suggests, is a vast amount of data, measured in TBs or PBs. The cost of storage and network has dropped drastically over the past few years, and that, together with the expansion of the cloud, has let companies capture as much data as they can. The new paradigm is "Just collect it": gather any and all kinds of data now, and figure out later how to slice and dice it and how to use it for analysis. This 'just collect it' principle is also a factor driving No SQL DBs, since most of this data is unstructured or semi-structured, like raw text in the form of tweets and user comments, or log files.
The three important aspects in fulfilling the business needs of big data are Volume (amount of data), Velocity (speed of storing and retrieving) and Variety (types of structured and unstructured data).
Business needs for big data can broadly be classified into two: Operational and Analytical.

In the next post I will talk in more detail about how these two needs are solved by technology products and how No SQL helps.

Wednesday, November 26, 2014

SQL or No SQL- Basics

I am gonna write a multi-part series on this topic, as it is a vast one to cover. I want to start with the basics and then dig deep into technologies, architecture choices, security etc.


SQL: I think every developer or architect has used this type of database in some shape or form. Examples of such DBs are MySQL, Oracle, MS SQL Server, Sybase etc. They are better called RDBMS or relational databases. The primary idea behind such DBs is the ACID properties.

A- Atomicity
C- Consistency
I- Isolation
D- Durability

In short, they are transactional and relational in nature, allowing SQL queries to be written against them with joins and sub-queries across multiple tables.
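To make "transactional and relational" a bit more concrete, here is a minimal sketch using Python's built-in sqlite3 module. The customers and accounts tables, the names and the transfer helper are all made up for illustration; the point is only that both updates in a transfer either apply together or not at all, and that a join can read across the two tables.

```python
import sqlite3

def setup():
    """Create a tiny in-memory database with two related tables (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute(
        "CREATE TABLE accounts (id INTEGER PRIMARY KEY, "
        "customer_id INTEGER REFERENCES customers(id), balance INTEGER)"
    )
    conn.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Alice"), (2, "Bob")])
    conn.executemany("INSERT INTO accounts VALUES (?, ?, ?)", [(10, 1, 100), (20, 2, 50)])
    conn.commit()
    return conn

def transfer(conn, src, dst, amount):
    """Move money between two accounts atomically; returns False if it was rolled back."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes the block
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src))
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst))
            (balance,) = conn.execute(
                "SELECT balance FROM accounts WHERE id = ?", (src,)
            ).fetchone()
            if balance < 0:
                raise ValueError("insufficient funds")
        return True
    except ValueError:
        return False

def balances_by_name(conn):
    """Join customers to accounts and return {customer name: balance}."""
    rows = conn.execute(
        "SELECT c.name, a.balance FROM customers c "
        "JOIN accounts a ON a.customer_id = c.id ORDER BY c.name"
    )
    return dict(rows.fetchall())
```

A transfer that would overdraw the source account is undone as a whole, so the balances never show only one half of the move.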

No SQL: No SQL DBs have become popular in the last 7-10 years and are, as the name suggests, mainly non-RDBMS DBs. They don't adhere to strict relational properties, and many of them are closer to key-value storage. The reason for their popularity has been the growth in data: with huge data growth, an RDBMS has a lot of issues scaling and becoming distributed. It has to keep the ACID properties in check as it grows and scales out, and so it becomes a bottleneck for fast applications. Hence No SQL. Examples of such DBs are MongoDB, Cassandra, CouchDB etc.
The guiding principle for such DBs is the CAP theorem, which says a distributed data store cannot guarantee all three of the following at the same time:

  • Consistency (many No SQL DBs settle for eventual consistency)
  • Availability
  • tolerance to network Partitioning
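As a rough illustration of what eventual consistency means, here is a toy sketch of two replicas of a key-value store. It is not how any particular No SQL product works; the Replica class, the version numbers and the sync function are assumptions made up for this example. A write lands on one replica first, and the other replica only catches up when the two exchange data.

```python
class Replica:
    """One node of a toy key-value store; each value carries a version number."""

    def __init__(self):
        self.data = {}  # key -> (version, value)

    def put(self, key, value, version):
        # Keep the write only if it is newer than what this replica already has.
        current = self.data.get(key)
        if current is None or version > current[0]:
            self.data[key] = (version, value)

    def get(self, key):
        entry = self.data.get(key)
        return None if entry is None else entry[1]


def sync(a, b):
    """Exchange entries so both replicas end up with the newest version of each key."""
    for key, (version, value) in list(a.data.items()):
        b.put(key, value, version)
    for key, (version, value) in list(b.data.items()):
        a.put(key, value, version)
```

Until sync runs, a read from the second replica can return stale or missing data; after it runs, both replicas agree. That window is the price paid for staying available while the nodes cannot talk to each other.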

I will cover a little more advanced stuff about No SQL in the next write-up.

Monday, November 10, 2014

Data accuracy and trends- How to measure the anomaly and resolve it

There is always a disconnect between the applications team and the data-warehouse team in an enterprise about the accuracy of the data from source systems. The data team has its own arguments, with their own merits, as to why it needs accurate and predictable data from source systems, and the applications team has its own as to why the data can't always be 100% accurate.

The primary reason the application team has a problem providing 100% accurate data is the continuous evolution of business applications and systems, plus ongoing bugs in applications, which tend to create anomalies in the data. The data team argues that unless it gets accurate data, it's hard to do meaningful analysis and build reports on it.

Here are some ideas on how this gap can be bridged, if not eliminated totally:

  1. How much accuracy is a good level of accuracy: Let's face it, unless you are in some kind of transactional application, such as a banking application, you will always have to rely on unpredictable and random end-user behavior. This gets worse in a B2C application, where data is primarily generated by end-user behavior and interaction with the application. There will always be edge cases, even setting aside the bugs in the application (which will also never go to zero). That makes it a good point for the two teams to get together and define the level of data accuracy the business actually needs.
  2. Fix it by process: Another way to reduce data inaccuracy in an ever-evolving business application is tighter integration of the data team with the app development team in the development process. It's always a good idea to have these two teams communicate on an ongoing basis about the upcoming development plan and changes to the application.
  3. QA it! Another way to reduce data gaps introduced by bugs in any release of an application is to plan QA and execute test plans with the reporting team's input. Any good QA org should build test cases with data needs in mind and with the data team's input.
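One way to turn the first idea into something the two teams can agree on is to measure accuracy as the share of records that pass a set of checks, and compare it with a target both teams sign off on. The sketch below is only an illustration: the field names, the two checks and the idea of a single numeric target are assumptions, not a prescription.

```python
def accuracy(records, checks):
    """Return the fraction of records that pass every check (1.0 for an empty batch)."""
    if not records:
        return 1.0
    passed = sum(1 for record in records if all(check(record) for check in checks))
    return passed / len(records)


def meets_target(records, checks, target):
    """True if the measured accuracy reaches the level both teams agreed on."""
    return accuracy(records, checks) >= target


# Hypothetical checks on hypothetical order records.
def has_user_id(record):
    return bool(record.get("user_id"))


def amount_is_valid(record):
    return isinstance(record.get("amount"), (int, float)) and record["amount"] >= 0
```

The same checks can double as QA test cases for a release, which ties this back to the third idea.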
Please share other ideas if you have any.

Monday, October 20, 2014

Five basic and fundamental SEO tips as Google recommends and wants it

Here are five basic tips which I use for all my websites; they are SEO 101 tips recommended by Google:
  1. Define a good title on your website which reflects the website's nature and content: This is what Google uses as the heading in its search result display. e.g. for   <title>Relevant and intelligently picked daily technology news and articles. It cuts the noise and shows most relevant news item in the tech world. </title>
  2. Use a good and relevant domain name, and keep your URL structure user-readable with descriptive text. e.g.
  3. Use the description meta tag to give a good description of your site. Google can use this tag to show details about your site in its search result list, e.g. <meta name="description" content="Technology, News, Landscape, hand picked and relevant daily technology news, news about startups, daily technical landscape">
  4. If possible, prepare a sitemap for your site, and keep the site structure simple and easy for Googlebot to navigate and crawl.
  5. Finally, the most important basic concept: fill your site with good, relevant content, keep it updated and manage it actively.
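Tips 1 and 3 are easy to check automatically. Here is a minimal sketch, using only Python's standard html.parser module, that pulls the title and the description meta tag out of a page so you can see whether they are present. The class name SeoTagParser and the function seo_report are made up for this example, and it makes no attempt to judge whether the text is actually good.

```python
from html.parser import HTMLParser


class SeoTagParser(HTMLParser):
    """Collects the <title> text and the content of <meta name="description">."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.description = None
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and (attrs.get("name") or "").lower() == "description":
            self.description = attrs.get("content")

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Text can arrive in several chunks, so accumulate while inside <title>.
        if self._in_title:
            self.title += data


def seo_report(html):
    """Return the page title and meta description (None if the meta tag is missing)."""
    parser = SeoTagParser()
    parser.feed(html)
    return {"title": parser.title.strip(), "description": parser.description}
```

Running seo_report on the HTML of your own home page is a quick sanity check before you start worrying about anything more advanced.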

Friday, October 10, 2014

How to crack Amazon Web Services (AWS) certification for Cloud Architect

I recently became an AWS certified cloud solutions architect. I will try to share my experience and some tidbits on how one can achieve this certification.

I have been involved with the AWS platform for more than a year and have been hands-on with it. I architected a high-load, highly performant customer portal on AWS using many AWS components and services, including EC2, RDS, S3, CloudFront, ElastiCache, DynamoDB and IAM. It's a cloud platform that is very easy to use and configure. After working on AWS for more than a year, I decided to get certified. Following is my list of things to do if you want to get AWS certification.

  1. Read and understand the various AWS offerings and services: The first thing you need to do is understand what AWS is all about, what it has to offer, and how and where its services can be used.
  2. Get hands-on: Nothing can replace hands-on experience while learning any new thing, and that includes AWS. AWS offers free account creation with a limited-time free tier to try things out. If you or your company already has an AWS account where you can try things, even better.
  3. Understand AWS certification requirements: The following link can help you understand what AWS looks for in its certification process.
  4. Read about all AWS services and technology in depth: AWS, in my opinion, has some of the best-documented services on the web. For each service, especially read two important sections: Product Details and FAQs. e.g. for EC2 read and Also understand some key concepts in detail, such as VPC, CloudFront etc.
  5. Practice questions: Try various free sites such as to take some practice questions. Once comfortable, take the AWS-offered practice exam, which mimics the real exam but with a smaller set of questions.
  6. Crack it and share!

Thursday, October 2, 2014

Centralized or Distributed?

In a corporation, I always struggle to find a good answer about whether it's better to consolidate technology and create centralized services, or to distribute it and let individual teams create and manage their own applications, even if there are some common features and synergy among them.

There are pros and cons to both approaches.
Centralized services provide synergy by avoiding repeated work, or in other words by not reinventing the wheel. They also provide more security for the underlying data, with only a few apps, services and people having access to it. On the other hand, they come with their own baggage. Primarily, it's a bigger architectural challenge: creating a dependency on centralized services puts load on them, and a failure of those services impacts the various dependent services. It also creates red tape within a corporation, where consumer teams become unnecessarily dependent on the centralized services team, which slows them down and curtails their creativity and productivity.

Distributed services, while encouraging productivity and creativity, also give each team the ownership to build the best experience for its stakeholders and customers without depending on central services to meet one-off requirements. They are also more aligned with service-oriented architecture and more architecturally sound, since a failure only impacts one part of the business, not the entire company. Just like centralized services, they come with their own cons: reinventing the wheel, weaker security and more data management issues.

In the end, I don't think there is one good answer to this problem; each case should be evaluated on its own merits. Personally, though, I always keep a little bias towards distributed services: I start from that angle and evaluate whether I need to move towards centralization or stay on the side of distribution.

I believe this argument can never be settled, just as in government there are always two sides to the question of power: strong federal or strong states?

The next big thing(s)

There are a lot of trends in the technology space, but three stand out so far as possibly the next big thing(s).
Wait for it!
Here is my list:

1) Big Data or just DATA: This one is obvious, but I believe it is still in its infancy, or maybe in its toddler years. Data was always there, and anything and everything is data, e.g. our heartbeat, our pulse count, the food we ate today, or the miles we drove today; all of this is data which can be stored, crunched and analyzed. For the first time in history we have the power to store, compute on and analyze this data and make deductions from it.

2) Virtual Reality: There is a lot of action in this space from companies like Google, Facebook and Apple, and it seems to be one of the next big things in experience.

3) Internet of Things: All the wearables and smart-watches, connected to each other. It seems Bill Gates' prophecy in Business @ the Speed of Thought may finally be coming true.