
A Guide to CPMAI Phase 2: Data Understanding for Project Leaders
In our last discussion, we laid out the CEO’s AI Blueprint. We identified the “why,” the “what,” and the business ROI. But if Phase 1 (Business Understanding) is the architectural drawing of a skyscraper, Phase 2 (Data Understanding) is the geological survey of the land where you intend to build.
You wouldn’t pour a concrete foundation on a swamp just because the blueprints look pretty. Yet, in the rush to “get to the AI,” many organizations skip the survey and head straight to the construction site. According to the CPMAI (Cognitive Project Management for AI) methodology, this is where the most expensive mistakes are made.
Phase 2 isn’t about “cleaning” data—that’s for later. It’s about interrogating it. It’s the reality check that determines if your AI dreams are supported by data facts.
The Four Pillars of Data Discovery
To understand your data, you have to look at it through four distinct lenses. These aren’t just checkboxes; they are the filters that catch project-killing risks before they become line items on a budget.
1. Data Collection: Finding the Ingredients
Think of this as an inventory check. Before a chef can cook a five-star meal, they need to know if the ingredients are in the pantry, at the local market, or stuck on a delivery truck.
In AI projects, your “ingredients” are often scattered. You might have structured data in a SQL database, unstructured PDFs in a legacy folder, and real-time streams coming from an API. In this stage, we identify the sources. We aren’t moving the data yet; we’re just mapping out where it lives and how hard it will be to get our hands on it.
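The inventory step above can be sketched as a simple source map. This is a minimal illustration, not a prescribed tool: the source names, categories, and effort labels are hypothetical stand-ins for whatever your own landscape contains.

```python
from dataclasses import dataclass

# Hypothetical inventory of data sources for an AI project.
# Names and access notes are illustrative, not prescriptive.
@dataclass
class DataSource:
    name: str
    kind: str           # "structured", "unstructured", or "stream"
    location: str
    access_effort: str  # "low", "medium", "high"

inventory = [
    DataSource("customer_db", "structured", "SQL database", "low"),
    DataSource("contracts", "unstructured", "legacy PDF folder", "high"),
    DataSource("clickstream", "stream", "real-time events API", "medium"),
]

# Flag the sources that will be hardest to get our hands on.
hard_to_get = [s.name for s in inventory if s.access_effort == "high"]
```

Even a list this crude makes the point: at this stage we are mapping where the data lives and how painful retrieval will be, not moving anything.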
2. Data Description: The Nutritional Label
Once you find the ingredients, you need to read the labels. This is the metadata phase. How much data do we actually have? Is it measured in gigabytes or petabytes? Is the format consistent, or are we dealing with a chaotic mix of CSVs, JSONs, and handwritten notes?
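A first "nutritional label" can often be produced in a few lines. The sketch below, with hypothetical column names and pandas as an assumed tool, pulls the basic metadata the text describes: row counts, memory footprint, types, and how far back the history actually goes.

```python
import pandas as pd

# Hypothetical sample of a customer table; in practice this would be
# read from one of the sources identified during Data Collection.
df = pd.DataFrame({
    "customer_id": [1, 2, 3, 4],
    "signup_date": pd.to_datetime(
        ["2024-01-05", "2024-03-12", "2024-06-30", "2025-02-01"]),
    "plan": ["basic", "pro", "pro", "basic"],
})

# Basic description: how much data, in what shape, covering what span.
description = {
    "rows": len(df),
    "memory_bytes": int(df.memory_usage(deep=True).sum()),
    "dtypes": df.dtypes.astype(str).to_dict(),
    "history_days": (df["signup_date"].max() - df["signup_date"].min()).days,
}
```

If the blueprint demanded ten years of history and `history_days` comes back at barely over a year, you have found your pivot point before writing a single model.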
If your “Blueprint” requires 10 years of historical customer data to predict churn, but your “Description” reveals you only started saving logs 14 months ago, you’ve just hit your first major pivot point. It’s better to know this on Tuesday than six months into a failed deployment.
3. Data Exploration: The Taste Test
You can’t trust a label blindly; you have to taste the soup. Data exploration involves initial visualizations—scatter plots, histograms, and basic queries.
Are there obvious patterns? Are there massive gaps where data just… disappeared for three months in 2024? Exploration helps you see if the data actually correlates with the business problem you defined in Phase 1. If you’re trying to predict machine failure, but your exploration shows that the “temperature” sensor was broken and reporting a constant 0°C for half the year, your model won’t be “smart”—it will be delusional.
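Both failure modes described here, a gap in the calendar and a sensor stuck at a constant value, are cheap to detect programmatically. The following sketch uses fabricated monthly readings to illustrate the checks; the column and the stuck-at-zero pattern mirror the broken-thermometer example above.

```python
import pandas as pd

# Fabricated monthly temperature readings: the sensor reports a
# constant 0.0 for half the year, as in the example above.
idx = pd.date_range("2024-01-01", periods=12, freq="MS")
temp = pd.Series(
    [21.0, 19.5, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 18.2, 17.9, 16.4, 15.1],
    index=idx,
)

# A long constant run is a red flag for a dead sensor.
stuck_months = int((temp == 0.0).sum())

# Gap detection: reindex to the expected calendar and count the
# months that simply have no readings at all.
full_idx = pd.date_range("2024-01-01", "2024-12-01", freq="MS")
missing_months = int(temp.reindex(full_idx).isna().sum())
```

Here `stuck_months` comes back as 6: half a year of "data" that carries no signal. Feeding that into a failure-prediction model produces exactly the delusional behavior described above.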
4. Verifying Quality: Hunting for Skeletons
This is the most critical pillar. We are looking for the “truth” of the data. We look for:
- Missing Values: How many nulls are hiding in the dataset?
- Outliers: Are those $1,000,000 sales entries real, or did someone trip on the keyboard?
- Inconsistencies: Does “Country” appear as “USA,” “U.S.A.,” and “United States”?
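All three skeleton hunts can be run as quick assessment queries. The sketch below uses a tiny hypothetical sales table containing one of each problem; the IQR fence is one common outlier heuristic among several, not the only defensible choice.

```python
import pandas as pd

# Hypothetical records exhibiting all three skeleton types.
df = pd.DataFrame({
    "amount": [120.0, 95.0, None, 1_000_000.0, 110.0, 88.0],
    "country": ["USA", "U.S.A.", "United States", "USA", None, "usa"],
})

# 1. Missing values: percentage of nulls per column.
missing_pct = (df.isna().mean() * 100).round(1).to_dict()

# 2. Outliers: flag amounts beyond a 1.5x IQR fence.
q1, q3 = df["amount"].quantile([0.25, 0.75])
fence = q3 + 1.5 * (q3 - q1)
outliers = df.loc[df["amount"] > fence, "amount"].tolist()

# 3. Inconsistencies: normalize spellings, then count how many
#    distinct values actually remain.
normalized = df["country"].dropna().str.upper().str.replace(".", "", regex=False)
distinct_countries = normalized.nunique()
```

In this toy table, the raw `country` column shows five distinct spellings but only two real values survive normalization, and the million-dollar entry lands well outside the fence. Note that in Phase 2 we only measure and report these issues; the fixing belongs to Phase 3.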
The Strategic “Go/No-Go” Decision
The biggest misconception in AI project management is that every project should move forward. In the CPMAI framework, Phase 2 is actually a filter.
Sometimes, the data understanding phase reveals that the data is too “noisy,” too biased, or simply non-existent. A “No-Go” decision in Phase 2 is a massive win for the organization. It prevents the “Sunk Cost Fallacy” where teams spend millions trying to fix data that was never viable to begin with.
If the data is 40% missing and the sources are legally locked behind third-party privacy walls, your “Blueprint” needs a redesign. You don’t ignore the swamp; you change the build site or drain the land.
The Phase 3 Trap: Why “Understanding” Must Precede “Preparation”
There is a massive temptation to start fixing things the moment you see them. You see a misspelled city name and you want to write a script to fix it. Stop.
In CPMAI, Phase 2 is about assessment, not remediation.
If you start cleaning (Phase 3) before you fully understand the scope of the mess (Phase 2), you end up in a “Whack-a-Mole” cycle. You fix one column, only to realize later that the entire data source is irrelevant to the business goal. By separating “Understanding” from “Preparation,” you ensure that every hour of engineering time spent in the next phase is actually moving the needle toward the ROI you promised the CEO.
Your Phase 2 Checklist: What Success Looks Like
If you are a project leader trying to get an AI project off the ground, you shouldn’t leave Phase 2 without these three “artifacts”:
- The Data Requirements Report: A definitive list of what data we need vs. what we have.
- The Exploration Report: Visual evidence that the data can actually answer the questions we are asking.
- The Quality Assessment: A brutally honest score of how “clean” or “dirty” the data is, highlighting the risks for the next phase.
Conclusion: Build on Bedrock, Not Sand
A “Blueprint” is a statement of intent, but “Data Understanding” is a statement of reality. By treating Phase 2 with the respect it deserves, you aren’t just managing a project—you’re de-risking a business investment.
You can have the most sophisticated neural network in the world, but if the data feeding it is fundamentally misunderstood, the output will be “precisely wrong.” Take the time to audit your ingredients. Your future ROI depends on it.
