Applying Bayesian Theory in Data Retention.
When statistics is playing against you.
In: Data Retention, Security, Policies, Privacy

Assuming you have ever been into system administration, the below probably rings a bell: the Postfix Bayesian spam filter. Good old one.
The recent consulting work I got was related to data retention and wiping some old data in accordance with retention policies. The customer wanted to be sure that customer records older than 3 years were completely wiped out of the corporate system. They needed an independent researcher to put a signature on a document stating that all the records were destroyed.

The team started with the standard procedure: moving all new records to the new drives, wiping the data from the old ones, then physically destroying them. It was a routine job, waiting for my signature upon completion.

In a talk with the company CEO, I understood that it was of top importance for the data to get destroyed. It was probably a legal requirement or something similar. I did not dig too deep, but based on the amount they were ready to pay, the deadline and the nervous faces, it was clear they wanted that data out.

Once the job was done, the guy was looking at the list of tasks performed with a happy face. "Can we finalize the documentation?" he asked. Instead, I asked him to give me the names of two of his suppliers, and the net turnover with them for the period whose data we were deleting.

Since it was a small industry, with everyone inter-connected, I was able to do a complete calculation in just 3 days, reconstructing almost the whole set of customer records that had been deleted. Tools used: public records of the revenue / losses of each company within the industry. There was only a single pattern that could support the turnover among the 3 companies within the industry, starting with the supply chain. Not only was I able to reconstruct the suppliers, but also the level of trades between the customers. He was stunned.
All I had was two customer / supplier records, with the breakdown of trades. Without them, such an analysis would have been impossible. But with those 2 records it was relatively easy (OK, about 3 sleepless nights).
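To make the idea concrete, here is a toy sketch of that kind of reconstruction. It is not the calculation from the engagement: the company names, the figures and the `consistent_patterns` helper are all made up for illustration, and the real case involved far more companies. The point is only that public totals leave a handful of possible trade patterns, and a single leaked trade figure can collapse them to one.

```python
from itertools import product

# Hypothetical public figures in arbitrary units (NOT from the real engagement).
supplier_revenue = {"S1": 7, "S2": 5}    # total sales per supplier
customer_purchases = {"C1": 8, "C2": 4}  # total purchases per customer


def consistent_patterns(known=None):
    """Enumerate integer trade patterns that match the public totals.

    `known` optionally maps (supplier, customer) to a disclosed trade value.
    """
    known = known or {}
    suppliers = list(supplier_revenue)
    customers = list(customer_purchases)
    pairs = [(s, c) for s in suppliers for c in customers]
    limit = max(supplier_revenue.values())
    results = []
    # Brute force is fine for a toy this small (8**4 combinations).
    for values in product(range(limit + 1), repeat=len(pairs)):
        flows = dict(zip(pairs, values))
        rows_ok = all(
            sum(flows[s, c] for c in customers) == supplier_revenue[s]
            for s in suppliers
        )
        cols_ok = all(
            sum(flows[s, c] for s in suppliers) == customer_purchases[c]
            for c in customers
        )
        known_ok = all(flows[pair] == value for pair, value in known.items())
        if rows_ok and cols_ok and known_ok:
            results.append(flows)
    return results


without_leak = consistent_patterns()
with_leak = consistent_patterns({("S1", "C1"): 5})
print(len(without_leak), "patterns fit the public totals alone")
print(len(with_leak), "pattern(s) remain once one trade is known")
```

In this toy, the public totals alone leave several candidate patterns, while one disclosed trade leaves exactly one. The deleted figures are then no longer secret, even though no drive was ever read.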
Now let’s get back to the applied Bayesian theory here. This is the diagram I presented to the company’s DevOps team.
When 100% is not really that high.
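For readers who prefer code to diagrams, a minimal sketch of a Bayes update applied to the question "is the deleted record really gone?" follows. This is my own illustration and not a reproduction of the diagram; the hypothesis names, the prior and both likelihoods are assumed numbers picked only to show the mechanics.

```python
def bayes_update(prior, likelihood):
    """Return the posterior P(H | E) for each hypothesis H via Bayes' rule."""
    unnormalised = {h: prior[h] * likelihood[h] for h in prior}
    evidence = sum(unnormalised.values())  # P(E), the normalising constant
    return {h: p / evidence for h, p in unnormalised.items()}


# Assumed belief right after the wipe: almost certain the record is gone.
prior = {"unrecoverable": 0.99, "recoverable": 0.01}

# Assumed chance of seeing the evidence "public figures admit exactly one
# trade pattern" under each hypothesis (illustrative values only).
likelihood = {"unrecoverable": 0.01, "recoverable": 0.90}

posterior = bayes_update(prior, likelihood)
print(posterior)
```

With these assumed numbers, the belief that the record is unrecoverable drops from 0.99 to roughly one half after a single piece of evidence. The wipe itself was complete; it is the information outside the drives that moves the number.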
