Data Is the Foundation
Every AI project depends on data. Not "big data", but the right data, in the right format, accessible to the right systems. Many AI project failures trace back to data problems that were ignored or underestimated at the start.
Before you commit budget and time to an AI initiative, run through this checklist and answer each question with a yes or a no. Be honest with yourself.
The 12 Questions
Data Availability
1. Can you access the data you need in under a day?
Not "does the data exist somewhere" but "can you actually get it into a format your AI system can use?" If getting data requires manual exports, email requests to other departments, or custom scripts that only one person understands, you have an availability problem.
2. Is the data machine-readable?
Scanned documents and screenshots are not machine-readable without OCR, and even text-based PDFs usually need extraction work before a system can use them. Excel files with merged cells and colour-coded categories are barely machine-readable. APIs, structured databases, and clean CSVs are machine-readable.
3. Do you have at least 6 months of historical data?
Most AI models need historical data to learn patterns. If you only started tracking something last month, you probably don't have enough signal. A long enough history is especially important for time-series use cases such as prediction and anomaly detection.
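As a rough illustration of question 3, the sketch below measures how many days a set of record dates covers. The function names, the sample dates, and the 183-day threshold (roughly six months) are assumptions made for this example, not a standard.

```python
from datetime import date


def history_span_days(dates):
    """Return the number of days between the earliest and latest record."""
    if not dates:
        return 0
    return (max(dates) - min(dates)).days


def has_enough_history(dates, min_days=183):
    # 183 days is an assumed stand-in for "about six months".
    return history_span_days(dates) >= min_days


# Hypothetical record dates pulled from a source system.
sample_dates = [date(2024, 1, 5), date(2024, 3, 18), date(2024, 8, 2)]
span = history_span_days(sample_dates)
enough = has_enough_history(sample_dates)
```

A span check like this only tells you how far back the data goes. It says nothing about how dense or reliable the records are within that window.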
Data Quality
4. Is the data consistently formatted?
Is every date stored in the same format? Does "revenue" include or exclude GST consistently? Are categories standardised, or do people enter free text? Inconsistent formatting is one of the most common data quality killers for AI projects.
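One way to test the date question is to count how many different formats appear in a single column. The sketch below is a minimal example using only the standard library. The three patterns and the sample values are hypothetical, so extend them to match the formats you expect in your own data.

```python
import re
from collections import Counter

# Hypothetical patterns for illustration only.
DATE_PATTERNS = {
    "iso": re.compile(r"\d{4}-\d{2}-\d{2}"),            # 2024-03-01
    "slash": re.compile(r"\d{1,2}/\d{1,2}/\d{4}"),      # 01/03/2024
    "text": re.compile(r"\d{1,2} [A-Za-z]{3,9} \d{4}"),  # 3 March 2024
}


def classify_date_format(value):
    """Return the name of the first pattern that matches the whole value."""
    for name, pattern in DATE_PATTERNS.items():
        if pattern.fullmatch(value.strip()):
            return name
    return "unrecognised"


def format_counts(values):
    """Count how many values fall into each format."""
    return Counter(classify_date_format(v) for v in values)


sample = ["2024-03-01", "01/03/2024", "2024-03-02", "3 March 2024", "n/a"]
counts = format_counts(sample)
```

If more than one format shows up, the column needs standardising before it goes anywhere near a model. Note that a slash-separated date is ambiguous on its own (day first or month first), so matching a pattern does not settle what the value means.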
5. Is the data accurate?
When was the last time someone validated the data against reality? Databases accumulate errors over time. Duplicate records, stale entries, and incorrectly categorised items all degrade AI performance.
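Duplicate records are often the easiest of these problems to detect automatically. The sketch below groups records whose key fields match after lower-casing and collapsing whitespace. The field names and sample customers are invented for the example, and real matching usually needs fuzzier rules than this.

```python
from collections import defaultdict


def normalise(value):
    """Lower-case the value and collapse runs of whitespace."""
    return " ".join(str(value).lower().split())


def find_duplicates(records, key_fields):
    """Return lists of record indexes that share the same normalised key."""
    groups = defaultdict(list)
    for index, record in enumerate(records):
        key = tuple(normalise(record.get(field, "")) for field in key_fields)
        groups[key].append(index)
    return [indexes for indexes in groups.values() if len(indexes) > 1]


# Hypothetical customer records.
customers = [
    {"name": "Acme Pty Ltd", "email": "ops@acme.example"},
    {"name": "acme  pty ltd", "email": "OPS@acme.example"},
    {"name": "Birch & Co", "email": "hi@birch.example"},
]
duplicates = find_duplicates(customers, ["name", "email"])
```

Here the first two records are reported as one duplicate group. Deciding which copy to keep is a business decision that code like this cannot make for you.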
6. Is the data free of significant gaps?
Missing values aren't always a dealbreaker, because statistical methods such as imputation can handle some gaps. But if 40% of your records are missing a critical field, that's a problem. Map your data completeness before you start.
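Mapping completeness can start as simply as counting, for each field, the share of records that have a value. The following is a minimal sketch. The field names and sample orders are hypothetical, and it treats `None` and the empty string as missing, which may not match how your systems encode gaps.

```python
def completeness_by_field(records, fields):
    """Return the share of records with a non-empty value for each field."""
    total = len(records)
    result = {}
    for field in fields:
        # Assumption: None and "" mean missing; 0 counts as a real value.
        filled = sum(1 for r in records if r.get(field) not in (None, ""))
        result[field] = filled / total if total else 0.0
    return result


# Hypothetical order records with some gaps.
orders = [
    {"order_id": "A1", "postcode": "3000", "amount": 19.9},
    {"order_id": "A2", "postcode": "", "amount": 5.0},
    {"order_id": "A3", "postcode": None, "amount": 0},
    {"order_id": "A4", "postcode": "2000", "amount": 12.5},
    {"order_id": "A5", "postcode": "4000"},
]
completeness = completeness_by_field(orders, ["order_id", "postcode", "amount"])
```

In this sample, `postcode` is filled in 3 of 5 records, so it has the same 40% gap described above, while `amount` is filled in 4 of 5.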
Data Governance
7. Do you know who owns the data?
Data ownership determines who can authorise its use for AI. In many organisations, data ownership is ambiguous — IT manages the systems, business units generate the data, and nobody has clear authority to approve AI training.
8. Do you understand the privacy and regulatory constraints on the data?
Australian Privacy Principles, industry-specific regulations, and contractual obligations all affect what data you can use for AI. If the data contains personal information, you need to understand your obligations before processing it.
9. Do you have consent for AI processing?
Some data was collected with specific consent — e.g., "for billing purposes." Using it for AI training may not be covered by that original consent. Check your privacy notices and terms of service.
Data Infrastructure
10. Can your systems handle the processing load?
AI training and inference require compute. Can your current infrastructure handle it, or will you need cloud resources? Have you estimated the cost?
11. Do you have a data pipeline or ETL process?
AI needs fresh data. If loading data into your AI system is a manual process, it won't scale. You need automated pipelines that keep your AI system fed with current data.
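To make the idea of an automated pipeline concrete, here is a deliberately small extract, transform, load sketch using Python's built-in `csv` and `sqlite3` modules. The `orders` table, the column names, and the sample CSV are assumptions made for the example. A production pipeline would add scheduling, logging, and error handling.

```python
import csv
import io
import sqlite3


def extract(csv_text):
    """Parse CSV text into a list of dictionaries, one per row."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def transform(rows):
    """Trim whitespace, convert amounts to floats, skip rows with no amount."""
    cleaned = []
    for row in rows:
        amount = row["amount"].strip()
        if not amount:
            continue
        cleaned.append((row["order_id"].strip(), float(amount)))
    return cleaned


def load(conn, rows):
    """Insert rows, replacing existing ones, so reruns do not duplicate data."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders (order_id TEXT PRIMARY KEY, amount REAL)"
    )
    conn.executemany("INSERT OR REPLACE INTO orders VALUES (?, ?)", rows)
    conn.commit()


def run_pipeline(csv_text, conn):
    load(conn, transform(extract(csv_text)))


# Hypothetical export; the second row has no amount and is skipped.
raw = "order_id,amount\nA1, 19.90\nA2,\nA3,5\n"
conn = sqlite3.connect(":memory:")
run_pipeline(raw, conn)
row_count = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
```

Keying the table on `order_id` and using `INSERT OR REPLACE` means the pipeline can be rerun on a schedule without creating duplicates, which is the property that lets it run unattended.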
12. Is there a plan for ongoing data maintenance?
Data quality degrades over time. New data sources appear. Formats change. Without a maintenance plan, your AI system's accuracy will decline after deployment.
Scoring Your Readiness
- 10-12 yes: You're data-ready. Focus on use case selection and model development.
- 7-9 yes: You're close. Address the gaps before investing heavily in AI tooling.
- 4-6 yes: Significant data work needed. Budget 3-6 months for data preparation before AI development.
- 0-3 yes: Start with data fundamentals. AI projects will fail without this foundation.
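The bands above can be expressed as a small lookup, which is handy if you collect answers from several teams. This is a sketch only. The function name and the short band labels are shorthand invented for the example.

```python
def readiness_band(yes_count):
    """Map the number of 'yes' answers (0 to 12) to a readiness band."""
    if not 0 <= yes_count <= 12:
        raise ValueError("yes_count must be between 0 and 12")
    if yes_count >= 10:
        return "data-ready"
    if yes_count >= 7:
        return "close"
    if yes_count >= 4:
        return "significant data work needed"
    return "start with data fundamentals"


# Hypothetical answers to the 12 questions, in order.
answers = [True, True, False, True, False, False,
           True, True, False, True, False, False]
band = readiness_band(sum(answers))
```

With six yes answers, this example falls into the band that calls for 3-6 months of data preparation.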
The Honest Assessment
Many organisations score 5-7 on this checklist. That's not a failure; it's a starting point. The mistake is pretending you're at 10 and launching an AI project on a shaky foundation.
Get your real data readiness score as part of our AI Readiness Quick Scan. It evaluates data infrastructure alongside seven other dimensions to give you a complete picture.