Data management

September 20th, 2013

There’s a software-developer maxim that I heard recently: “if you’re writing code, you should be using source control”. Source control, also known as version control, is a system that stores all the code you write, along with a record of every change you make to it, in a form that lets you revert to previous versions easily. It also allows you to work collaboratively with other people. By analogy with this principle, let me introduce a new maxim: “if you’re collecting data, you should be doing data management”. What’s “data management”? Let me explain…

Data management is looking after your data, and storing it in a form that makes it easy to retrieve and understand later. A common situation is that you start out doing some “play” experiments, fiddling about to try to get a handle on some new piece of equipment. You collect some data, perhaps some numbers in a logbook, perhaps some sort of data file. Then you do more experiments, and unless you are meticulous, you end up with a whole load of different experimental results with filenames like DATA_EXPT.xls, DATA_EXPT2.XLS, DATA_TUESDAY.xls, etc. You put them aside for a week and then come back to them. The filenames were meaningful when you chose them, but now you’ve forgotten what the parameters were, or which of the runs produced interesting results. Now you have a data management problem.