Keeping your data clean: untidy data costs you time and money

August 2, 2021 Neil Glover

When clients ask you questions, it’s always best to be able to answer immediately and with confidence. And that’s the power of clean data – a single version of the truth, at your fingertips.

The flipside is, of course, that untidy, misaligned data causes confusion, leads to mistakes, wastes time and, crucially, can hit the bottom line.

Messy data has been in the news in the past few months.

In March this year, 32-year-old journalist Liam Thorp was surprised to be summoned for a COVID-19 vaccine by the NHS despite being in a non-priority group. When he investigated, he discovered that his height had been recorded as 6′2″ but interpreted as 6.2 cm, prompting the system to flag him as being at high risk due to a supposed body-mass index of 28,000.

In October last year, Public Health England had a similar issue when it ‘lost’ 16,000 positive coronavirus cases because a data source, reportedly an older Excel file format, was capped at around 65,000 rows.

Often, the issues faced by businesses are less dramatic than this, but they can still cause serious problems.

For example, if one dataset uses US date format (month-day-year) and another follows UK conventions (day-month-year) it’s all too easy for a 6 July deadline to turn into a 7 June one.
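To see how easily this happens, here is a minimal sketch in Python. The raw value is a made-up example of how a date might appear in an export; the only thing that changes between the two readings is the format we assume.

```python
from datetime import datetime

# Hypothetical raw value, as it might appear in an exported file.
raw = "06/07/2021"

# Read the same text under UK (day first) and US (month first) conventions.
as_uk = datetime.strptime(raw, "%d/%m/%Y").date()
as_us = datetime.strptime(raw, "%m/%d/%Y").date()

print(as_uk.isoformat())  # 2021-07-06 (6 July)
print(as_us.isoformat())  # 2021-06-07 (7 June)
```

Both readings are valid dates, so nothing fails and nobody is warned. Storing dates in an unambiguous form such as ISO 8601 (year-month-day) is one common way to avoid the mix-up.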

Another common problem is caused by records which are almost, but not quite, duplicates.

For example, a client might be listed in one database as ‘Smyth’ but in another as ‘Smythe’. Even if every other detail from birthdate to address matches perfectly, most systems attempting to merge or map would treat this as a separate record.
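As a rough illustration of how software can catch near-duplicates like this, the sketch below uses Python's standard difflib module to score how similar two surnames are. The field names, the sample records and the 0.85 threshold are all assumptions made for the example, not a description of how any particular product works.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a score from 0 to 1 for how alike two strings are, ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def likely_same_client(rec_a: dict, rec_b: dict, threshold: float = 0.85) -> bool:
    """Treat two records as a probable match when the other details agree
    exactly and the surnames are nearly identical.
    The threshold is an illustrative choice, not a recommended value."""
    same_details = (
        rec_a["dob"] == rec_b["dob"] and rec_a["postcode"] == rec_b["postcode"]
    )
    return same_details and similarity(rec_a["surname"], rec_b["surname"]) >= threshold


# Hypothetical records from two different systems.
crm_record = {"surname": "Smyth", "dob": "1980-01-15", "postcode": "AB1 2CD"}
billing_record = {"surname": "Smythe", "dob": "1980-01-15", "postcode": "AB1 2CD"}

print(likely_same_client(crm_record, billing_record))  # True
```

An exact comparison would call these two different clients. A similarity score lets you flag them as a probable match for someone to confirm.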

Compensating for human error

You’ll sometimes hear that it takes human intervention to correct this kind of problem, making the kind of lateral leap of which only the human brain is capable.

But, in fact, the example above is very much the epitome of human error: at some point, somebody has misspelled or mistyped that surname while entering data. A similar problem can arise when the person managing a particular database takes shortcuts or develops their own shorthand.

Maybe they’re in the habit of recording retail businesses as RTL in your customer relationship management system (CRM) – how, without manual intervention, can you pick that up?

This is why these days it’s generally considered good practice to use standardised data sets, such as the postcode directory provided by the Office for National Statistics (ONS). With the above example in mind, the ONS also manages the UK standard industrial classification of economic activities (UK SIC) which breaks things down to the Nth degree.
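As a simple sketch of the idea, the Python below maps in-house shorthand to one agreed label per sector and flags anything it doesn't recognise. The abbreviations and labels are invented for illustration; in practice the right-hand side would come from a published standard such as UK SIC rather than a hand-written list.

```python
from typing import Optional

# Hypothetical in-house shorthand mapped to one agreed label each.
# These labels are placeholders, not actual UK SIC codes.
SHORTHAND_TO_STANDARD = {
    "RTL": "Retail",
    "RETAIL": "Retail",
    "MFG": "Manufacturing",
}


def standardise_sector(raw: str) -> Optional[str]:
    """Return the agreed label for a raw sector entry, or None if the
    entry is not recognised and needs a person to look at it."""
    return SHORTHAND_TO_STANDARD.get(raw.strip().upper())


print(standardise_sector(" rtl "))  # Retail
print(standardise_sector("XYZ"))    # None
```

Returning None for unknown entries, rather than guessing, keeps the unrecognised shorthand visible so it can be added to the mapping once someone has confirmed what it means.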

Artificial intelligence in practice

The great news is, when it comes to cleaning and aligning data, artificial intelligence (AI) comes into its own.

AIRPA uses AI to automate the mapping of data between your systems – and we intend to do more of this as the software grows and develops. At present, it’s primarily used for charts of accounts.

It uses a body of previous learning to recognise data types regardless of format, rounding or presentation and automatically make connections.

Where it can’t make a connection automatically, you’ll be prompted to manually map one or two records from one dataset to a record in the other.

AIRPA will then derive a set of rules from the example you set and automatically apply them from then on.

This can all feel pretty magical the first time you see it in action. It might feel all the more so when you’re able to see every item of data relating to a particular client across all your systems on a single master screen.

Clean, properly organised, properly matched data is a beautiful thing.

Get in touch for a free trial so you can play with AIRPA yourself or arrange a guided tour with one of our team.