I tend to agree with Claire about the inclusion of version control. It is one of the best practices generally covered by effective data management, and put into practice through data management plans.
Before I comment on 'research dataset', I think it would be beneficial to first understand what we mean by a dataset. The issue that arises in the CASRAI context is that there are two different definitions:
A series of structured observations, measurements or facts identified from the research which can be stored in a database
Any organized collection of data in a computational format, defined by a theme or category that reflects what is being measured/observed/monitored. The presentation of the data in the application is enabled through metadata.
Before any compound terms can be made, we must first agree upon which of the data set / dataset definitions we want to use.
The current definition (Research Dataset) blends elements of other definitions. Assuming we can agree on what a dataset actually is, we then need to understand how a "Research Dataset" is different from a 'regular' dataset. By 'research' do we mean that the dataset is created in the context of a research project? Or that the information in a dataset is used to support research? My concern with the term "Research Dataset" is that it seems as though it could apply to all datasets.
As a minor side note, I am not clear as to why the definition includes the condition "in a computational format". While probably 99.99% of datasets are in a computational format, the format itself is not a vital element (or is it?). Most research datasets will be captured in computational formats. That being said, I can imagine cases where a researcher collects data and observations by hand, capturing them in a paper notebook (i.e. a non-computational format). These datasets are paper-based, and would therefore not fit under the definition we are proposing here. This is an unusual case, I admit. But I don't see the need to include the computational format clause, and I think removing it would actually improve the definition.