Managing Statistical Deliverables

Introduction

When a project involves several team members, all key components of a deliverable must be saved to provide proper documentation, allow easy reference, and enable reproduction of the delivered results. Our clients often require or expect this. This document describes good practices for managing project files and deliverables.

Who is responsible: the project biostatistician or lead programmer

Study Documents

All important study documents are to be saved on the server in the \Docs folder, organized in subfolders based on type of document. The most current or final versions of the following documents should be saved in their respective folders:

  • Study Protocol
  • Case Record Form (CRF) and annotated CRF
  • Statistical Analysis Plan (SAP) and mock table shells
  • Data Specifications

Subfolders can be created within each document-type folder to keep older versions or drafts of the documents.

Study Datasets and Databases

All versions of datasets associated with a data transfer (received or sent), and all datasets that support a deliverable, are to be saved so that someone can later go back and recreate the deliverable.

All data files for a project are kept in the \SASDATA folder within each project folder. Subfolders are used to organize the types and versions of the data. Some examples of data folders within \SASDATA include:

  • CRF
  • SDTM
  • ADaM
  • Excel
  • Transfers

DB Transfers

Create a folder for each type of data transfer received (external labs, PK, test transfers, etc.). The date of the transfer should be included in the name of the folder or file that contains the data transfer. Subfolders can be used to manage multiple data transfers. If files are received (or sent) as password-protected files, create a document in that same folder that contains the password (or remove the password from the file).

Data that Support Deliverables

For each delivery of output or data transfer sent to clients, a copy of the complete set of data used for that deliverable must be kept. This includes all raw data, derived analysis data, and any other supporting data for that deliverable. Create a subfolder (or .ZIP file), named with the date and a description of the deliverable, that contains all supporting data.

For example, when delivering draft TLFs on September 15, 2016, subfolders are created for the ADaM and SDTM data used for the output as follows:

\SASDATA\Deliverables\20160915 Draft TLFs\ADaM
\SASDATA\Deliverables\20160915 Draft TLFs\SDTM 

Creating a .ZIP file that contains the data files provides some extra protection against overwriting the data files in the future, and saves disk space when datasets are large.
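The archiving step could be scripted along the following lines. This is only a minimal sketch in Python: the function name zip_deliverable_data and the example paths are hypothetical and not part of any existing project tooling. It refuses to replace an archive that already exists, which matches the goal of protecting an earlier deliverable from being overwritten.

```python
import zipfile
from pathlib import Path


def zip_deliverable_data(source_dirs, zip_path):
    """Archive every file under each source folder into one .ZIP file.

    Raises FileExistsError if the archive is already there, so an earlier
    deliverable snapshot cannot be replaced by accident.
    """
    zip_path = Path(zip_path)
    if zip_path.exists():
        raise FileExistsError(f"{zip_path} already exists")
    zip_path.parent.mkdir(parents=True, exist_ok=True)

    archived = []
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for source in map(Path, source_dirs):
            for file in sorted(source.rglob("*")):
                if file.is_file():
                    # Store names relative to the parent folder so the archive
                    # keeps the ADaM/ and SDTM/ folder names.
                    arcname = file.relative_to(source.parent).as_posix()
                    zf.write(file, arcname)
                    archived.append(arcname)
    return archived
```

A call might look like zip_deliverable_data(["SASDATA/ADaM", "SASDATA/SDTM"], "SASDATA/Deliverables/20160915 Draft TLFs.zip"), with the paths adjusted to the actual project layout. The archive should be written outside the folders being archived.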

Output Files (TLFs)

A copy of all output files (TLFs) delivered to clients is to be saved in the \Output folder, in the respective subfolders (\Tables, \Listings, \Figures). Each deliverable gets a new subfolder under these locations, named with the date delivered and a short description of the deliverable. For example:

\Output\Tables\20160915 Draft Tables
\Output\Listings\20160915 Draft Listings

SAS Programs

In addition to keeping a version of the data and output files for each deliverable, the version of the SAS programs used to create the deliverable also needs to be retained. This allows the team to go back and reproduce results from the deliverable as required.

This is done by tagging the repository, which gives us a snapshot of the SAS program versions used for the deliverable. We can come back to that snapshot at any time and rerun the programs.

To review, the steps are:

  1. Right-click the project root folder and select the SVN-Branch/tag command.
  2. This creates a "copy" of the project files from /trunk under a tag name that you choose. The naming convention should be something like:

    /tags/20160915-IA9-draft-TLFs
  3. Commit, and you're done. When asked, you probably don't want to switch your working folder to the newly created tag; typically we stay connected to trunk.
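For teams that prefer scripting, the same tag can be created with the Subversion command-line client instead of the right-click menu. The sketch below is an illustration under assumptions: it assumes the svn client is installed, that it runs inside a working copy, and that the repository uses the /trunk and /tags layout shown above. The function names are hypothetical. A URL-to-URL svn copy commits immediately, so no separate commit step is needed in this variant.

```python
import subprocess


def svn_tag_command(tag_name, message):
    """Build the `svn copy` call that copies /trunk to /tags/<tag_name>.

    The ^/ prefix means "relative to the repository root" and only works
    when the command is run from inside a working copy.
    """
    return [
        "svn", "copy",
        "^/trunk",
        f"^/tags/{tag_name}",
        "-m", message,
    ]


def create_tag(tag_name, message, working_copy="."):
    """Run the tag command from the given working copy (needs the svn client)."""
    subprocess.run(svn_tag_command(tag_name, message), cwd=working_copy, check=True)
```

For example, create_tag("20160915-IA9-draft-TLFs", "Tag programs for draft TLFs") would create the tag named in step 2. The working folder stays on trunk because nothing is switched.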

Using consistent names for deliverable subfolders under \SASDATA and \Output folders will clearly identify which files were part of the same deliverable.
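One way to keep the names consistent is to derive every location from a single delivery date and description. The helper below is a hypothetical sketch, not an existing utility. The folder layout follows the examples in this document, and the tag name is simply the folder label with spaces replaced by hyphens, which is an assumption about the convention.

```python
from datetime import date


def deliverable_locations(delivered, description):
    """Return the data, output, and tag locations for one deliverable.

    All names share the same YYYYMMDD date stamp and description, so the
    files that belong to one deliverable are easy to match up.
    """
    label = f"{delivered:%Y%m%d} {description}"
    return {
        "data": "\\".join(["", "SASDATA", "Deliverables", label]),
        "tables": "\\".join(["", "Output", "Tables", label]),
        "listings": "\\".join(["", "Output", "Listings", label]),
        "figures": "\\".join(["", "Output", "Figures", label]),
        # Assumed convention: tag names use hyphens instead of spaces.
        "tag": "/tags/" + label.replace(" ", "-"),
    }
```

Calling deliverable_locations(date(2016, 9, 15), "Draft TLFs") yields names that all begin with 20160915, for the \SASDATA folder, the \Output subfolders, and the repository tag.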

With the repository tagged at the time of each deliverable, we will always have a record of the versions used. We can also point our LocalDev working folder at any tag, which loads those versions into the folder so they can be executed again if needed.
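Pointing the working folder at a tag corresponds to the Subversion switch operation. As a rough sketch, assuming the svn command-line client is installed and the /trunk and /tags layout described earlier, the switch could be scripted as follows. The function names are hypothetical.

```python
import subprocess


def svn_switch_command(target):
    """Build the `svn switch` call for a repository-relative target,
    such as 'tags/20160915-IA9-draft-TLFs' or 'trunk'."""
    return ["svn", "switch", f"^/{target.strip('/')}"]


def switch_working_copy(target, working_copy="."):
    """Switch the given working copy to the target (needs the svn client)."""
    subprocess.run(svn_switch_command(target), cwd=working_copy, check=True)
```

After rerunning the tagged programs, calling switch_working_copy("trunk") returns the folder to trunk. It is usually safest to switch only when the working folder has no uncommitted changes.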

Log Files

A copy of the SAS log files from the generation of each deliverable is to be saved as part of the study documentation.

SAS programs should be written to produce a clean log file (no errors or warnings that indicate a potential problem). The log files should show all SAS code executed, but it is not required to output all macro code (the MPRINT option is not needed, and when running tables set _DEBUG_=0).
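A saved log can be checked for problems before it is filed. The following is a minimal sketch of such a check, not an established tool: it flags lines that begin with ERROR or WARNING, plus a short and deliberately incomplete list of NOTE messages that often signal a real issue. The function name and the choice of notes are assumptions and should be adapted to the project.

```python
import re

# SAS writes problems as lines starting with ERROR or WARNING, optionally
# followed by a message number, for example "ERROR 22-322:".
PROBLEM = re.compile(r"^(ERROR|WARNING)(\s+\d+-\d+)?:")

# NOTE messages that often point to a real problem (illustrative list only).
SUSPECT_NOTES = (
    "is uninitialized",
    "repeats of BY values",
    "Invalid data",
    "Division by zero",
)


def scan_sas_log(lines):
    """Return (line number, text) pairs for log lines that need review."""
    findings = []
    for number, line in enumerate(lines, start=1):
        text = line.rstrip("\r\n")
        if PROBLEM.match(text):
            findings.append((number, text))
        elif text.startswith("NOTE:") and any(s in text for s in SUSPECT_NOTES):
            findings.append((number, text))
    return findings
```

The function accepts any iterable of lines, so an open log file can be passed directly. An empty result suggests the log is clean with respect to these patterns. It does not replace reading the log.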