Research Data Management
In this blog post, I am going to write about techniques for research data management. Research data management is an essential aspect of any research project. Properly managed data makes research reproducible, improves its integrity, and facilitates wider access to knowledge in a cost-effective and timely manner. Universities and publishers often require a data management plan to be submitted before the data are collected or analysed.
A data management plan is a document that provides information on the type of data to be created, the policy on data ownership and access, the required facilities and equipment, data management practices, the people responsible for these activities, and so on. I will briefly discuss some important points that will help in preparing a data management plan in light of the above. First, we need to provide a short description of the type of data to be collected. Data can come in different forms: experimental, observational, images, models, etc. The form should be stated clearly.
Second, we need to give information on the data formats and any software used. This includes the format in which the data will be produced and the format in which they will be stored. When software is required, it is imperative to consider long-term access to the data so that they do not become unreadable. To avoid obsolescence, it is recommended to store data in at least two formats, preferably open, non-proprietary ones, to record the software version, and to archive the software used if possible. When third-party data are used, the conditions of use should be properly acknowledged. The details (licence, conditions, access fee, related training, URLs, contact person, etc.) should be mentioned in the document.
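To make the "two open formats plus software version" advice concrete, here is a minimal Python sketch. It is only an illustration under my own assumptions: the function name `save_in_two_formats`, the choice of CSV and JSON as the two open formats, and the layout of the provenance file are all hypothetical and should be adapted to your own project.

```python
import csv
import json
import platform
from pathlib import Path


def save_in_two_formats(rows, out_dir, stem):
    """Write the same records as CSV and JSON, plus a small provenance note.

    `rows` is a non-empty list of dicts that all share the same keys.
    The file names and the provenance layout are illustrative assumptions.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    fieldnames = list(rows[0].keys())

    # Format 1: CSV, readable by almost any spreadsheet or statistics package.
    csv_path = out_dir / f"{stem}.csv"
    with csv_path.open("w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)

    # Format 2: JSON, another open, plain-text format.
    json_path = out_dir / f"{stem}.json"
    json_path.write_text(json.dumps(rows, indent=2), encoding="utf-8")

    # Record which software version produced the files.
    provenance = {
        "python_version": platform.python_version(),
        "platform": platform.platform(),
        "formats": ["csv", "json"],
    }
    prov_path = out_dir / f"{stem}_provenance.json"
    prov_path.write_text(json.dumps(provenance, indent=2), encoding="utf-8")

    return csv_path, json_path, prov_path
```

Calling `save_in_two_formats(rows, "data/processed", "soil_ph")` would produce three files side by side. If the analysis relies on other packages, their versions could be added to the provenance note in the same way.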
Third, the file naming convention and the folder organisation should be stated in the data management plan. It is recommended to avoid long file names, spaces, and punctuation, as many computer systems may not cope with them. The same goes for folder names. It is also good practice to include a “README” file, a text file that describes the organisation of the folders.
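A naming convention is easier to follow when it can be checked automatically. The sketch below is one possible way to do that in Python. The 50-character limit and the allowed character set (letters, digits, underscore, hyphen) are my own assumptions, not a standard, and the function names are hypothetical.

```python
import re

MAX_LENGTH = 50  # assumed limit; choose whatever your plan specifies
# Assumed convention: letters, digits, '_' and '-', with one optional extension.
SAFE_NAME = re.compile(r"^[A-Za-z0-9_-]+(\.[A-Za-z0-9]+)?$")


def check_name(name):
    """Return a list of problems found in a file or folder name."""
    problems = []
    if len(name) > MAX_LENGTH:
        problems.append("too long")
    if " " in name:
        problems.append("contains spaces")
    if not SAFE_NAME.match(name):
        problems.append("contains characters outside letters, digits, '_' and '-'")
    return problems


def suggest_name(name):
    """Suggest a cleaned-up name by replacing runs of unsafe characters with '_'."""
    stem, dot, ext = name.rpartition(".")
    if not dot:  # no extension present
        stem, ext = name, ""
    stem = re.sub(r"[^A-Za-z0-9_-]+", "_", stem).strip("_")
    return f"{stem}.{ext}" if ext else stem


print(check_name("2021-03-04_soil-ph_v02.csv"))  # no problems expected
print(suggest_name("final results (new).csv"))
```

The same check can be run over folder names, and the chosen rules can be written down in the README file so that collaborators know what is expected.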
Fourth, a clear description of the version control system should be included in the data management plan. This can cover, but is not limited to, backup and restore, synchronisation, short-term undo, long-term undo, tracking changes, tracking ownership, sandboxing, branching, and merging. A nice visual explanation of these concepts can be found at https://betterexplained.com/articles/a-visual-guide-to-version-control/.
Fifth, a plan for data storage is a must. The data storage plan should include the expected size of the data, the nature of the data (physical or digital), the storage location, and the data retention policy. For long-term use, data should be digitised. With digital data, one should take into account the security issues related to different types of storage and justify the choice. A local hard drive can be a convenient way for the lead researcher to access the data, but it poses a greater security risk and is prone to physical damage. It is also not a user-friendly option for collaborative research. Cloud storage and network drives are the recommended options for data storage these days. Many institutions provide cloud storage or network drives for their researchers up to and beyond the retention period. Western Sydney University provides 1 TB of storage on both CloudStor and OneDrive. In addition, the library offers the option of archiving data records after the retention period.
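When stating the expected size of the data, a pilot dataset can give a rough starting figure. The following Python sketch adds up the size of every file under a folder and prints it in readable units. The function names and the folder path in the usage note are hypothetical, and the result is only an estimate to be scaled up to the full study.

```python
from pathlib import Path


def total_size_bytes(root):
    """Sum the sizes of all regular files under `root`, including subfolders."""
    return sum(p.stat().st_size for p in Path(root).rglob("*") if p.is_file())


def human_readable(n_bytes):
    """Convert a byte count to a short string using units of 1024."""
    size = float(n_bytes)
    for unit in ("B", "KB", "MB", "GB", "TB"):
        if size < 1024 or unit == "TB":
            return f"{size:.1f} {unit}"
        size /= 1024
```

For example, `human_readable(total_size_bytes("pilot_data"))` reports the size of a pilot folder, which can then be multiplied by the planned number of samples or sessions and compared with the storage quota on offer.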
Sixth, the ethics approval number and information on any sensitivities related to the data are an essential part of a data management plan. Researchers are required to state how they will protect sensitive data so that its use does not introduce a risk of discrimination, harm, or other unwanted incidents. Anonymising data is one way to protect sensitive data. Another is to require approval from designated people at several steps before any information is taken out of secured data storage (cloud or network drive).
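As a rough illustration of the anonymising step, the Python sketch below replaces a participant identifier with a keyed hash and drops fields that identify a person directly. Strictly speaking this is pseudonymisation rather than full anonymisation, because anyone holding the secret key can regenerate the codes. Whether it is sufficient is a matter for your ethics approval. The field names, the "P-" prefix, and the 12-character code length are assumptions of mine.

```python
import hashlib
import hmac


def pseudonymise(identifier, secret_key):
    """Map an identifier to a stable code using HMAC-SHA256.

    `secret_key` is bytes and must be stored separately from the data.
    """
    digest = hmac.new(secret_key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()
    return "P-" + digest[:12]  # shortened for readability (assumed length)


def strip_identifiers(record, id_field, drop_fields, secret_key):
    """Return a copy of `record` without direct identifiers, plus a participant code."""
    cleaned = {
        key: value
        for key, value in record.items()
        if key != id_field and key not in drop_fields
    }
    cleaned["participant_code"] = pseudonymise(str(record[id_field]), secret_key)
    return cleaned


# Hypothetical record and key, for illustration only.
record = {"student_id": "18001234", "name": "Jane Citizen", "score": 71}
print(strip_identifiers(record, "student_id", {"name"}, b"keep-this-key-off-the-shared-drive"))
```

The same participant always receives the same code, so records can still be linked across files without exposing the original identifier.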
Last but not least, there should be a section on ownership, licensing, and intellectual property of the data. I intend to elaborate on this post in the future. I hope the reader finds this brief presentation helpful. In conclusion, I would like to share an interesting video with you: https://youtu.be/N2zK3sAtr-4
This should not happen if a proper data management plan is prepared and executed.