The Cowboy Problem: How not to get shot - Part 1

Apr 30, 2020 2:54:44 AM / by Prad Upadrashta posted in Data Science


“My name is Sherlock Holmes. It is my business to know what other people don’t know.” – Sherlock Holmes

Many people think that Data Science looks like this:


(How) Can Business Problem be Solved Using Data Science?

Apr 1, 2020 3:04:13 PM / by Prad Upadrashta posted in Data Science


The goal of data science is to translate ALL business problems into scientific problems which can be managed and/or improved in a systematic data-driven way. It is the job of a data scientist to do exactly this. All business problems will benefit from the application of a more scientific and data-driven approach.


The Most Valuable Skills to Learn as a Data Scientist Today

Nov 14, 2019 10:00:00 AM / by Prad Upadrashta posted in Data Science


There is a certain set of timeless skills that one should aim to develop over the course of a career in data science:


The Magic of Data Science (Part 2)

Sep 12, 2019 3:21:41 AM / by Prad Upadrashta posted in Data Science


A two-part blog series on what data science can do, with real-world examples.


The Magic of Data Science

Aug 20, 2019 2:40:53 AM / by Prad Upadrashta posted in Data Science


A two-part blog series on what data science can do, with real-world examples.


Predictive Analytics in the Retail Industry

Jul 26, 2019 4:48:07 AM / by Garima Jain posted in Data Science, Intelligence



Introduction

Big Data Analytics and augmented patient care

Jul 25, 2019 7:42:46 AM / by Aradhana Pandey posted in Data Science, Intelligence



Why is Big Data in healthcare so necessary?

Governed Data Lake for Customer Critical Data Analytics

Jul 25, 2019 7:31:21 AM / by Vaishakh.R posted in Data Management, Data Science



Overview

Retail chains that have brick-and-mortar stores as well as online platforms often struggle to identify the customers visiting their site. Even with all the information at their disposal, the probability of identifying the customers accessing their website is a mere 30%.


The Brutal Truth about Data Science and Data Scientists (Part 2)

Jul 12, 2019 9:01:18 AM / by Prad Upadrashta posted in Data Science


A two-part blog series by Prad Upadrashta, Ph.D., Chief Data Scientist, Mastech InfoTrellis


The Brutal Truth about Data Science and Data Scientists

Jun 26, 2019 5:29:00 AM / by Prad Upadrashta posted in Data Science


A two-part blog series by Prad Upadrashta, Ph.D., Chief Data Scientist, Mastech InfoTrellis

Most data scientists, and the organizations that employ them, don’t seem to understand what data science is or how it is actually done. They jumped on the bandwagon without really understanding it, or why it mattered to them in a very visceral way.

Many organizations approach data science as though it were a marketing tool, relabeling things they already do as ‘data science’ because those things involve the use of data. That is not real data science, and it completely misses the point of engaging in data science. It is like comparing kids playing in their sandboxes with the operations of the oil majors when they are scouting for oil. The core value of data science, which appears to be overlooked, is the word science.

Science is not merely predictive — at its heart, it is explanatory as well as diagnostic. Science leads to engineering — a systematic mathematical approach to creating technology solutions based on the exploitation of some natural phenomenon.

Winning Kaggle competitions is not data science, although I suppose it is a reasonable start, even if the best models on Kaggle are actually built by machines running genetic algorithms, where natural selection drives the outcome. For all its limitations, Kaggle is certainly a good training ground to get one’s feet wet.

Data science is about understanding the underlying generative process, or mechanism, that results in the data that you observe. It is about exploiting that knowledge to derive statistically significant pockets of value, to drive operational change into an enterprise, resulting in the creation of measurable ROI. It is about systematically driving the decision-making process, in a repeatable, scalable, and iterative way.

When you can translate business voodoo into an engineered revenue stream — that is when you can claim you have done real data science — it means you fundamentally understand how your business works at a very granular level.

Yes, “80%+ of the job of a data scientist is cleaning,” as is oft repeated, but that isn’t just some low-level, thoughtless job. Cleaning intelligently requires you to understand the solution as you refine it iteratively, paying careful attention to what matters, why it matters, and how it matters. The word cleaning should be eliminated in favor of the word curation.

If you don’t understand the endgame, you will inevitably botch the launch off the starting line, and then wonder why you don’t see any results for all the work you’ve put in. You are constructing a well-curated data set that conforms to a certain standard of quality to ensure that your model reflects the simple truth you are trying to uncover, capture, and/or replicate. This requires some intuition about what you are modeling and its inherently complex, possibly layered, structure. Merely curve fitting and claiming that “you have a model” is barely table stakes, and certainly offers no sustainable competitive advantage. The real question is whether you understand the science of your business.

You need to know when you are throwing out the baby with the bathwater. There is a fine line between feature engineering and data cleansing — you might just be cleansing out the most important stuff that is telling you what is really going on! So, NO, any random fresh graduate is unlikely to get this right — it’s just not that simple. It is actually telling that many data scientists I interview don’t understand that data cleansing is also modeling in a very real sense — because to identify noise, you have to have a model of the signal! There’s a reason why companies still pay top dollar for the 0.1% talent pool.
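To make the point about noise and signal concrete, here is a minimal sketch on made-up numbers. It assumes the signal is a simple linear trend, which is purely an illustrative choice; the series, the thresholds, and the function names (fit_line, naive_outliers, model_outliers) are all hypothetical and not taken from any real project. It compares a cleaning rule that ignores structure with one that first fits a model of the signal and then looks at what is left over.

```python
import statistics

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x (the assumed signal model)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b

def naive_outliers(ys, k=1.5):
    """Flag values far from the global mean, ignoring any structure."""
    mu, sd = statistics.fmean(ys), statistics.pstdev(ys)
    return [i for i, y in enumerate(ys) if abs(y - mu) > k * sd]

def model_outliers(xs, ys, k=3.0):
    """Flag values far from the fitted trend, using a robust residual scale."""
    a, b = fit_line(xs, ys)
    residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
    med = statistics.median(residuals)
    # Median absolute deviation, scaled to be comparable to a standard deviation.
    mad = 1.4826 * statistics.median([abs(r - med) for r in residuals])
    return [i for i, r in enumerate(residuals) if abs(r - med) > k * mad]

# Made-up series: steady growth with a small wiggle, plus one bad reading.
xs = list(range(20))
ys = [10 + 5 * x + (1 if x % 2 == 0 else -1) for x in xs]
ys[10] = 30  # the glitch: the trend would put this value near 61

print("naive rule flags:", naive_outliers(ys))
print("model-based rule flags:", model_outliers(xs, ys))
```

On this toy series the naive rule flags only points at the two ends of the trend, which are perfectly legitimate values, and misses the glitch at index 10, while the model-based rule flags index 10 alone. The cleaning step was only as good as the model of the signal behind it.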

To read the next part of the blog, click here.


The author, “Prad” for short, is a senior analytics executive and experienced data science practitioner with a distinguished track record of driving AI thought leadership, strategy, and innovation at enterprise scale. His focus areas are Artificial Intelligence, Machine/Deep Learning, Blockchain, IIoT/IoT, and Industry 4.0.
