Guiding Principles for a Data Science Stack
Enterprises and other large organizations need guiding principles for their data science tools and platforms (the data science stack).
When you are hired at an organization with a low level of AI maturity, the AI/ML tools and platforms at your disposal as a data scientist will rarely be optimal. You may find that other data scientists in such environments either make do with the limited tools they are given, on premises (on prem) or in the cloud, or resort to their own tools and platforms that IT neither endorses nor supports.
As such, it is up to the organization's data science leadership to bring forth the vision for an optimal enterprise data science stack, one supported by both the business and IT. To make this happen, the lead data scientists should have high-level guiding principles for the target state of that stack:
Principle 1: Focus on Platform as a Service (PaaS), where the vendor delivers hardware and software to users via the cloud.
The key here is the ability to scale your infrastructure and capacity according to the needs of the organization. Let someone else do the heavy lifting of configuring and scaling hardware and software, so that you can focus on your data science lifecycle and on the solutions architecture for your end-to-end AI applications across the enterprise.
Principle 2: Begin to move away from on prem for AI/ML.
On prem tools for AI/ML will become obsolete soon, and organizations need to plan their move to the cloud. AI is already cloud-first in most organizations.
Principle 3: If you must stay on prem due to institutional, bureaucratic, privacy, security, or governance barriers, then careful consideration must be given to Spark clusters, distributed computing, and GPU vs CPU:
Distributed computing: If you want to provision and process Big Data, then you will need distributed computing. You will need a system with a Hadoop cluster, where data stored in HDFS (spread across the disks of many nodes) feeds a Spark cluster. The Spark cluster then trains ML models in distributed fashion, holding data in memory across its nodes and using both CPUs and GPUs. A Spark cluster with its associated data lake will suffice to showcase data science algorithms.
GPU vs CPU: If you can’t adopt distributed computing, then favor GPU over CPU, while still retaining some CPU capacity for data engineering and for training simple ML algorithms on small datasets (using multiple cores for larger datasets).
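The choices in Principle 3 can be sketched as a small decision helper. This is only an illustration of the reasoning above, not a sizing tool: the `Workload` class, the `choose_compute` function, the workload labels, and the `SMALL_DATASET_ROWS` cutoff are all hypothetical names and values chosen for this example, and the reading that larger CPU-bound jobs should use multiple cores is one interpretation of the guidance.

```python
from dataclasses import dataclass

# Assumption: the principles do not say what counts as a "small" dataset,
# so this row-count cutoff is purely illustrative.
SMALL_DATASET_ROWS = 1_000_000

# Hypothetical labels for the work the text keeps on CPU.
CPU_WORKLOADS = {"data_engineering", "simple_ml"}


@dataclass(frozen=True)
class Workload:
    kind: str  # "data_engineering", "simple_ml", or "model_training"
    rows: int  # approximate number of rows in the dataset


def choose_compute(workload: Workload, spark_available: bool, gpu_available: bool) -> str:
    """Return a label for the on prem compute tier suggested by Principle 3."""
    # Distributed computing comes first whenever a Spark cluster exists.
    if spark_available:
        return "spark_cluster"
    # Keep data engineering and simple ML on CPU; use more cores as data grows.
    if workload.kind in CPU_WORKLOADS:
        if workload.rows <= SMALL_DATASET_ROWS:
            return "cpu_single_core"
        return "cpu_multi_core"
    # All other training prefers GPU, falling back to multi-core CPU.
    return "gpu" if gpu_available else "cpu_multi_core"


if __name__ == "__main__":
    job = Workload(kind="model_training", rows=50_000_000)
    print(choose_compute(job, spark_available=False, gpu_available=True))  # gpu
```

In practice the cutoff and the workload categories would come from your own benchmarks and hardware inventory; the point is only that the ordering (distributed first, then GPU, with CPU reserved for lighter work) can be made explicit and reviewed with IT.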
Even if you are stuck building a data science stack on prem, you should build it on the premise that the on prem stack will become obsolete. You still need a plan for moving the organization to the cloud.
Principle 4: Data science in industry is moving to Azure, AWS, and GCP.
Your organization should have a plan to upskill its data scientists, data engineers, ML engineers, and cloud engineers on these platforms for AI/ML. If you can’t upskill your workforce, then consider bringing in consultants to fill those cloud AI/ML roles. AWS, Azure, and GCP will remain foundational platforms for enterprise data science for the foreseeable future, so encourage your data science professionals to earn AWS, Azure, or GCP certifications in cloud AI/ML.
Each of the Big 3 cloud AI platforms has its own strengths for powering your data science stack. GCP excels at big data processing. Azure provides high-level services for AI/ML and pipeline automation. AWS is highly customizable and secure, which suits organizations with high security needs (such as hospitals and government agencies).
Following these 4 principles will help you and your organization build an optimal data science stack that fits your business use cases and is supported by IT across the enterprise.