When managing a project with team members, it is important to save all the key components of a deliverable for proper documentation, for easy reference, and so that the delivered results can be reproduced. Our clients often require or expect this. This document describes good practices for managing project files and deliverables.
Who is responsible: the project biostatistician or lead programmer
All important study documents are to be saved on the server in the \Docs folder, organized in subfolders based on type of document. The most current or final versions of the following documents should be saved in their respective folders:
Subfolders within each type of document folder can be created to keep older versions or drafts of the documents.
All versions of datasets associated with a data transfer (received or sent) and datasets that support a deliverable are to be saved in a way that someone could go back and recreate the deliverable.
All data files for a project are kept in the \SASDATA folder within each project folder. Subfolders are used to organize the types and versions of the data. Some examples of data folders within \SASDATA include:
Create a folder for each type of data transfer (external labs, PK, test transfers, etc.) received. The date of the transfer should be included in the name of the folder or file containing the data transfer. Subfolders can be used to manage multiple data transfers. If files are received (or sent) as password-protected files, create a document in that same folder that contains the password (or remove the password from the file).
For each delivery of output or data transfers sent to clients, a copy of the complete set of data used for that deliverable must be kept. This includes all raw data, derived analysis data, and any other supporting data for that deliverable. A subfolder (or .ZIP file) is created, with the date and description of the deliverable, that includes all supporting data.
For example, when delivering draft TLFs on September 15, 2016, subfolders are created for the ADaM and SDTM data used for the output as follows:
\SASDATA\Deliverables\20160915 Draft TLFs\ADaM
\SASDATA\Deliverables\20160915 Draft TLFs\SDTM
Creating a .ZIP file that contains the data files provides some extra protection against overwriting the data files in the future, and saves disk space when datasets are large.
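The .ZIP approach can be scripted so that every deliverable is archived the same way. The following is a minimal Python sketch, not an established team tool: the function name `archive_deliverable_data`, its parameters, and the choice to refuse to overwrite an existing archive are all assumptions made for illustration.

```python
import zipfile
from pathlib import Path


def archive_deliverable_data(source_dirs, deliverables_root, delivery_date, description):
    """Zip the data folders behind one deliverable into a dated archive.

    source_dirs: mapping of a label (e.g. "ADaM") to the folder holding those datasets.
    Returns the path of the .ZIP file that was written.
    (Hypothetical helper; names and layout are illustrative assumptions.)
    """
    deliverables_root = Path(deliverables_root)
    deliverables_root.mkdir(parents=True, exist_ok=True)
    zip_path = deliverables_root / f"{delivery_date} {description}.zip"
    if zip_path.exists():
        # Never silently replace an earlier snapshot of a deliverable.
        raise FileExistsError(f"Archive already exists: {zip_path}")
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for label, folder in source_dirs.items():
            for file in sorted(Path(folder).rglob("*")):
                if file.is_file():
                    # Store each file under its label, e.g. ADaM/adsl.sas7bdat
                    arcname = Path(label) / file.relative_to(folder)
                    archive.write(file, arcname=arcname.as_posix())
    return zip_path
```

Calling it with `{"ADaM": ..., "SDTM": ...}`, a `Deliverables` folder, `"20160915"`, and `"Draft TLFs"` would write `20160915 Draft TLFs.zip` with the datasets grouped under `ADaM/` and `SDTM/` inside the archive.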
A copy of all output files (TLFs) delivered to clients is to be saved in the \Output folder, in the respective subfolders (\Tables, \Listings, \Figures). Each deliverable gets a new subfolder under these locations, named with a short description of the deliverable and the date delivered. For example:
\Output\Tables\20160915 Draft Tables
\Output\Listings\20160915 Draft Listings
In addition to keeping a version of the data and output files for each deliverable, the version of the SAS programs used to create the deliverable also needs to be retained. This allows the team to go back and reproduce results from the deliverable as required.
This is done by tagging the repository, which gives us a snapshot of the SAS program versions used for the deliverable that we can return to and rerun at any time.
To review, this is done simply by:
Tagging creates a "copy" of the project files from /trunk under a tag name that you choose. The naming convention should be something like:
/tags/20160915-IA9-draft-TLFs
Using consistent names for deliverable subfolders under \SASDATA and \Output folders will clearly identify which files were part of the same deliverable.
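One way to keep these names consistent is to derive all of them from the same delivery date and description. The Python sketch below is illustrative only: the function `deliverable_names` is hypothetical, and it assumes a single description is reused for the data folder, the output folders, and the tag, whereas in practice the output folders may carry a more specific description such as "Draft Tables".

```python
from datetime import date


def deliverable_names(delivered_on, description):
    """Build matching folder and tag names for one deliverable.

    Hypothetical helper: folder names use "YYYYMMDD Description" and the
    tag uses the same parts joined with hyphens.
    """
    stamp = delivered_on.strftime("%Y%m%d")
    folder = f"{stamp} {description}"
    names = {
        "data": f"\\SASDATA\\Deliverables\\{folder}",
        # Tags replace spaces with hyphens, e.g. /tags/20160915-IA9-draft-TLFs
        "tag": f"/tags/{stamp}-" + "-".join(description.split()),
    }
    for kind in ("Tables", "Listings", "Figures"):
        names[kind.lower()] = f"\\Output\\{kind}\\{folder}"
    return names


names = deliverable_names(date(2016, 9, 15), "IA9 draft TLFs")
print(names["data"])  # \SASDATA\Deliverables\20160915 IA9 draft TLFs
print(names["tag"])   # /tags/20160915-IA9-draft-TLFs
```

Because every name starts with the same date stamp, the data, output, and tag for a deliverable sort together and can be matched at a glance.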
With the repository tagged at the time of each deliverable, we will always have a record of the versions used and the ability to point our LocalDev working folder to any tag, which will load those versions into our folder to execute again if needed.
A copy of the SAS log files from the generation of each deliverable is to be saved as part of the study documentation.
SAS programs should be written to produce a clean log file (no errors or warnings that indicate a potential problem). The log files should show all SAS code executed, but it is not required to output all macro code (no MPRINT option is needed, and when running tables set _DEBUG_=0).
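Before saving log files with a deliverable, it can help to scan them for problem messages. The Python sketch below is a simplified check, not a complete log review: the function `find_log_problems` is hypothetical, it flags only lines that begin with ERROR or WARNING, and it does not catch NOTE lines that may also indicate a problem. The sample log text is made up for illustration.

```python
import re

# SAS writes problem messages at the start of a log line,
# e.g. "ERROR: ..." or "WARNING: ...".
PROBLEM_LINE = re.compile(r"^(ERROR|WARNING)\b")


def find_log_problems(log_text):
    """Return (line_number, text) for every ERROR or WARNING line in a SAS log."""
    return [
        (number, line.rstrip())
        for number, line in enumerate(log_text.splitlines(), start=1)
        if PROBLEM_LINE.match(line)
    ]


# Illustrative sample only; not taken from a real study log.
sample_log = (
    "NOTE: There were 10 observations read from the data set WORK.ADSL.\n"
    "WARNING: The data set WORK.ADSL may be incomplete.\n"
    "ERROR: File WORK.ADAE.DATA does not exist.\n"
)
for number, text in find_log_problems(sample_log):
    print(f"line {number}: {text}")
```

An empty result from this check is a necessary but not sufficient sign of a clean log; the log should still be reviewed for notes that signal a potential problem.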