Defining the Data Science Cluster for your Enterprise Cloud Roadmap
When organizations adopt AI, it is important to have a roadmap to reach your target state. Increasingly, data analytics platforms are moving to the cloud. Here we define the Data Science cluster and its place on that roadmap.
AI-naive organizations often hire a platoon of data scientists and produce a plethora of AI Proofs of Concept (PoCs). But how do you build complete AI systems that are used in operations and deployed beyond your desktop or local environment? Well, first off, you should hire an AI strategist (like yours truly), who can perform an environmental scan of your enterprise, assess your current state, and then define a target state that aligns with your enterprise's mission statement and strategic vision.
So after the environmental scan and gap analysis, your AI strategist will come up with an AI roadmap. And since many enterprises are moving to the cloud, that AI roadmap will eventually align with and map onto your enterprise's cloud roadmap. A data and AI system for the enterprise has three main components:
Data Landing (data lakehouse in the cloud),
Data Science Cluster, and
Enterprise application to consume the outputs.
Most organizations are already mature in data landing and in building and maintaining enterprise applications, so I will not go into detail on those capabilities. However, I notice that many enterprises do not completely design and build out their Data Science cluster. Therefore, this article focuses on properly defining the Data Science cluster so that enterprise AI/ML capabilities are properly represented in your organization's cloud roadmap.
Before defining your AI/ML capabilities on the cloud roadmap, make sure you partner with the data team and the application team to define the data landing and application capabilities (respectively) on that roadmap. Then make sure your DS cluster can ingest data from the data lakehouse, and that the predictions from the DS cluster are consumed by the enterprise applications, a downstream business facts table, or an API.
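To make the DS cluster's inputs and outputs concrete, here is a minimal sketch of that flow. All names are hypothetical: the lakehouse extract is simulated as an in-memory list, the model is a trivial placeholder rule, and the downstream business facts table is an in-memory SQLite table.

```python
import sqlite3

def score(row):
    # Placeholder "model": flag customers with low recent activity.
    # A real DS cluster would load a trained model here instead.
    return 1.0 if row["logins_last_30d"] < 3 else 0.0

def write_predictions(rows, conn):
    # Land predictions in a downstream facts table that enterprise
    # applications (or an API) can consume.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS churn_predictions "
        "(customer_id TEXT, churn_score REAL)"
    )
    conn.executemany(
        "INSERT INTO churn_predictions VALUES (?, ?)",
        [(r["customer_id"], score(r)) for r in rows],
    )
    conn.commit()

# Simulated extract from the data lakehouse (hypothetical schema).
lakehouse_extract = [
    {"customer_id": "c1", "logins_last_30d": 1},
    {"customer_id": "c2", "logins_last_30d": 12},
]
conn = sqlite3.connect(":memory:")
write_predictions(lakehouse_extract, conn)
```

The point is the shape of the contract, not the model: the cluster reads curated data from the landing zone and writes predictions somewhere downstream systems already know how to query.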
After you determine how your DS cluster fits into the overall cloud roadmap, you can begin to define the capabilities in your DS cluster. From your environmental scan, you may have validated one or two prototyped AI projects that the organization now wants to operationalize. These projects make great pathfinders for validating the capabilities you want in your DS cluster. For an AI/ML operations capability, use your first AI MVP (Minimum Viable Product), built on an MVPlat (Minimum Viable Platform), as the pathfinder for this new capability. As we don't want to boil the ocean, consider starting with an AI project small enough to deliver quickly yet still operationally useful to the organization. Your CEO, CIO, CTO, CDO (or CAIO) can help you pick one. You will also need one of these CxOs to grant Authority to Operate (ATO) so you can proceed with the project as a pathfinder.
Once you pick the pathfinder and have ATO, you can own the AI/ML swimlane on the roadmap and start placing capabilities on it based on this MVP pathfinder. Complete AI/ML systems share similar components, and you can start with the minimal set that comprises a complete AI/ML system:
Central repository: version-controls the ML code and infrastructure definitions
Feature pipeline: wrangles/transforms the data into features
ML training pipeline: utilizes features and target to train ML model
Inference pipeline: serves the ML model in production (consuming features)
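The three executable components above (the central repository simply version-controls this code) can be sketched as plain functions. This is a hedged illustration, not a reference implementation: the raw records, the features, and the mean-threshold "model" are toy stand-ins chosen to keep the sketch self-contained.

```python
def feature_pipeline(raw_rows):
    # Wrangle raw records into model-ready features and a target.
    features = [[r["visits"], r["spend"] / 100.0] for r in raw_rows]
    target = [r["churned"] for r in raw_rows]
    return features, target

def training_pipeline(features, target):
    # Toy "training": compute a per-feature mean to use as a threshold.
    # A real pipeline would fit an actual model here.
    n = len(features)
    means = [sum(f[i] for f in features) / n for i in range(len(features[0]))]
    return {"means": means}

def inference_pipeline(model, features):
    # Serve predictions: score feature rows with the trained model.
    return [1 if f[0] < model["means"][0] else 0 for f in features]

raw = [
    {"visits": 1, "spend": 50, "churned": 1},
    {"visits": 9, "spend": 400, "churned": 0},
]
X, y = feature_pipeline(raw)
model = training_pipeline(X, y)
preds = inference_pipeline(model, X)
```

In the MVP, each function would run as its own scheduled job in the DS cluster, with the central repository holding the code for all three.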
The components above can be the initial boxes, the initial capabilities, that you outline in the AI/ML swimlane of your cloud roadmap. Further down the timeline, you can incorporate a larger AI project to operationalize, and use it to test those boxes/capabilities against a new use case. With this larger AI project on the timeline, you can add more components to your DS cluster, making it more enterprise-ready (usable by more than one use case), beyond the MVP. For this larger AI project, consider adding the following components (which were optional in the MVP):
Feature store: stores features from feature pipeline, for use in other pipelines/environments/projects
ML model registry: stores trained ML models for use in other pipelines/environments/projects
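A minimal sketch of what these two shared components buy you, assuming purely hypothetical names and in-memory storage (real deployments would use a managed feature store and model registry service):

```python
class FeatureStore:
    """Publish features once; let other pipelines/projects read them back."""

    def __init__(self):
        self._tables = {}

    def publish(self, name, version, rows):
        self._tables[(name, version)] = rows

    def read(self, name, version):
        return self._tables[(name, version)]

class ModelRegistry:
    """Track trained models by name/version and promote them across stages."""

    def __init__(self):
        self._models = {}

    def register(self, name, version, model, stage="dev"):
        self._models[(name, version)] = {"model": model, "stage": stage}

    def promote(self, name, version, stage):
        self._models[(name, version)]["stage"] = stage

    def latest(self, name, stage):
        # Return the highest-versioned model for this name in this stage.
        candidates = [
            (v, entry["model"])
            for (n, v), entry in self._models.items()
            if n == name and entry["stage"] == stage
        ]
        return max(candidates)[1]

store = FeatureStore()
store.publish("churn_features", "v1", [[1, 0.5], [9, 4.0]])

registry = ModelRegistry()
registry.register("churn_model", "v1", {"threshold": 5.0})
registry.promote("churn_model", "v1", "prod")
```

The design point is reuse: the inference pipeline reads the same published features and the same registered model that training produced, rather than each project recomputing its own.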
So when you define an enterprise DS cluster in the AI/ML swimlane, it can look like this:
Central repository (dev/prod)
Feature pipeline (dev)
ML training pipeline (dev)
Inference pipeline (prod)
Feature store (dev/prod)
ML model registry (dev/prod)
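The layout above can also be captured as a simple config mapping, which is a useful starting point for an infrastructure-as-code definition of the cluster. Names here are illustrative, not a standard.

```python
# Which environments each DS-cluster capability is deployed to,
# mirroring the enterprise layout listed above.
DS_CLUSTER = {
    "central_repository": ["dev", "prod"],
    "feature_pipeline": ["dev"],
    "ml_training_pipeline": ["dev"],
    "inference_pipeline": ["prod"],
    "feature_store": ["dev", "prod"],
    "ml_model_registry": ["dev", "prod"],
}

def capabilities_in(env):
    # List which capabilities must be provisioned in a given environment.
    return sorted(name for name, envs in DS_CLUSTER.items() if env in envs)
```

Note the asymmetry this makes explicit: feature and training pipelines run in dev, inference runs in prod, and the repository, feature store, and model registry span both so that artifacts can cross the boundary.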
When you reuse and build toward these enterprise capabilities, you are well on your way to your target state: an enterprise AI/ML operations capability, which you can place right after your pathfinders on the roadmap. Once you see the cloud roadmap through to implementation and reach the target state, you can put another feather in your cap and look for the next opportunity to build enterprise AI/ML operations capabilities at another AI-naive organization.
If you're looking for support, here is how to contact me:
Coaching and Mentorship: I offer coaching and mentorship; book a coaching session here