Data Access

How do I get access to the data?

All requests to access the CLSA data are reviewed by the Data and Sample Access Committee (DSAC). Please consult the Data and Sample Access Policy and Guiding Principles, the pertinent sections of the CLSA protocol(s), the CLSA Data Collection Tools, and the information on the Data Access Application Process and timelines, in advance of preparing an application. The most up-to-date Data Application Forms are available on the website. The steps involved in the data access process are also outlined here.

For guidance on the use of the access@clsa-elcv.ca email address by researchers, trainees and approved users, please click here.


Do I need an institutional email address to access CLSA data?

Yes. Anyone on the Project Team who requires access to data must use an email address from the institution with which they are affiliated. Data will only be released to institutional email addresses. Email addresses with domains such as Gmail or Hotmail are not acceptable.

What are the data access application deadlines?

There are 3 data access application deadlines per year. For upcoming dates, please see the Application Deadlines page of our website under the Data Access section. Please note that applications must be received by 11:59 p.m. Eastern Time on the day of the submission deadline.

How are data released to users?

The Data and Sample Access Committee (DSAC) will review all applications for the use of CLSA data and biospecimens and make a recommendation to the Scientific Management Team (SMT). Once the project has been approved by the SMT, the CLSA Access Agreement has been signed, and proof of ethics approval has been received by the CLSA, we will send a download link for the dataset to the Primary Applicant. The link is valid for 7 days and the number of downloads is determined by the number of Project Team members who have signed Schedule F of the CLSA Access Agreement, indicating that they require direct access to the data. The Primary Applicant will need to share the download link with the Project Team members who have signed Schedule F. It is the Primary Applicant's responsibility to ensure that all of the Project Team members respect the terms of the signed CLSA Access Agreement. Please refer to the CLSA Access Agreement for more information on the responsibilities of users.

What is the format of the dataset when released?

Data are provided to researchers in a comma-separated values (.csv) file. Please note that the complete CLSA alphanumeric dataset contains over 4,000 variables collected from more than 51,000 participants. Depending on your choice of statistical software and proposed analyses, automatic data imports may not succeed and you may need to instruct your software how to read the file. This may require the use of advanced scripting and/or macros in some cases. The CLSA encourages you to include someone experienced in working with such complex datasets on your Project Team.
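As a rough illustration of what "instructing your software how to read the file" can look like, the Python sketch below streams a .csv file row by row, keeps only the columns of interest, and converts each one to an explicit type. The column names (entity_id, AGE_NMBR, SEX_ASK) and the sample values are hypothetical placeholders, not actual CLSA content, and the choice to store blank cells as None is an assumption made for this example.

```python
import csv
import io

# Hypothetical column names and converters; replace them with variables
# from your own approved dataset.
WANTED = {"entity_id": str, "AGE_NMBR": int, "SEX_ASK": str}

def read_selected(handle, wanted=WANTED):
    """Stream a CSV and keep only the wanted columns, converting types explicitly."""
    reader = csv.DictReader(handle)
    missing = [name for name in wanted if name not in (reader.fieldnames or [])]
    if missing:
        raise KeyError(f"columns not found in file: {missing}")
    rows = []
    for record in reader:
        row = {}
        for name, convert in wanted.items():
            raw = record[name]
            # Keep blank cells as None instead of guessing a value.
            row[name] = convert(raw) if raw != "" else None
        rows.append(row)
    return rows

# Tiny made-up sample standing in for the real file.
sample = io.StringIO(
    "entity_id,AGE_NMBR,SEX_ASK,OTHER_VAR\n"
    "1001,67,F,x\n"
    "1002,,M,y\n"
)
rows = read_selected(sample)
print(rows)
```

With a real file, the same function could be called on a handle opened with open(path, newline=""); the appropriate text encoding depends on the file you receive and is not specified here.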

How large are the datasets?

The combined size of the Baseline alphanumeric dataset including Tracking, Maintaining Contact, and Comprehensive datasets is approximately 325 MB. Presently, the CLSA provides users with separate files for the Tracking dataset (65 MB) and the Comprehensive dataset (205 MB). The Tracking Maintaining Contact Questionnaire (MCQ) and the Comprehensive MCQ datasets are approximately 30 MB each and may be requested in conjunction with the Tracking and Comprehensive datasets. If you are concerned about the file size, you can make a request in your application for the data to be made available to you in 'chunks', grouped into smaller files.
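Separately from requesting smaller files, a large .csv file can also be processed locally in batches of rows so that the whole file never has to sit in memory at once. The Python sketch below is one possible approach under that assumption; the column names and the batch size are made up for illustration.

```python
import csv
import io
from itertools import islice

def iter_chunks(handle, chunk_size):
    """Yield lists of up to chunk_size rows (as dicts) from a CSV file handle."""
    reader = csv.DictReader(handle)
    while True:
        chunk = list(islice(reader, chunk_size))
        if not chunk:
            break
        yield chunk

# Made-up five-row sample standing in for a large file.
sample = io.StringIO(
    "entity_id,VALUE\n" + "".join(f"{i},{i * 2}\n" for i in range(5))
)
chunk_sizes = [len(chunk) for chunk in iter_chunks(sample, chunk_size=2)]
print(chunk_sizes)  # [2, 2, 1]
```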

Does the CLSA provide guidance on how to analyse my data?

No, it is not within the purview of the CLSA to advise approved users on statistical analyses for approved projects. Data Support Documentation is available under the Researchers tab of our website, including a detailed document on the use of Sampling Weights. For further help, please consult with a statistician.

How can I request biospecimens?

The anticipated release date for biospecimens is 2019. There will be one application deadline to submit biospecimen access requests per year. The application deadline is yet to be determined, and will be posted on our website. The form to request biospecimens (i.e. Part 3 of the CLSA Data and Biospecimen Request Application) will be made available at a later date. For further questions related to biospecimen access, please contact the Biorepository and Bioanalysis Centre (BBC) at bbc@clsa-elcv.ca.

How long do I have to use and analyze the data once I receive them?

Researchers who have received data will have a specified timeframe within which the proposed analyses must be completed. This timeframe is defined in the CLSA Access Agreement. If the analyses are not completed within this timeframe, the applicant must submit a request for a time extension; otherwise, their data access agreement may be terminated (CLSA Access Agreement, Section 13.2). The CLSA will monitor the approved applications for adherence to the timeline. To make a request for a timeline extension, please request an Amendment Form by sending an email to access@clsa-elcv.ca.

Can I share the data?

Strict CLSA security and confidentiality rules are in place governing the use of CLSA data. The CLSA requires users to sign a CLSA Access Agreement that details the specific uses of data and the CLSA’s expectations with regard to privacy and confidentiality. Only the Primary Applicant and the Project Team members who have signed Schedule F of the CLSA Access Agreement are allowed to have direct access to the raw data. No approved user or member of their research team is allowed to share the CLSA dataset, in whole or in part, with individuals who have not signed Schedule F of the CLSA Access Agreement.

Can I add an investigator or a student to the Project Team, so that they may have access to the data?

Yes, you can add personnel to your study while your CLSA Access Agreement is valid.  Please request an Amendment Form by sending an email to access@clsa-elcv.ca with ‘Amendment Form Request’ in the subject line of your email. Once completed, please return the form via access@clsa-elcv.ca. Only once your amendment has been approved, and the CLSA Access Agreement has been amended (if required), can you allow the new person(s) access to the dataset and/or biospecimens.

What if there appears to be an error or omission in the data that I receive?

The CLSA takes great care to check the accuracy and completeness of the data prior to release. However, because of the size of the dataset and the large number of variables, we cannot guarantee the accuracy, completeness, or fitness for any particular purpose of the data. It is the responsibility of each data user to verify their dataset. If you think your data are incomplete or if you identify errors while conducting your analyses, please contact us at access@clsa-elcv.ca with any questions.

Occasionally, there may be a change in the data after you have already received your dataset. If this occurs, we will send a Data Release Update to all approved users, explaining the change(s). You will be able to request the updated dataset if relevant to your study.

Can data access be expedited?

Data access cannot be expedited; all interested researchers are required to follow the same application procedures to gain access to the CLSA dataset. Please see the Data Access Process and Data Release Timeline sections of our website for further information.


Will the CLSA dataset be linked to provincial health administrative databases across Canada? When will these data be available?

The CLSA is working centrally on strategies to link individual-level CLSA data with data from health administrative databases across Canada. Please continue to monitor the website for updates.

Are CLSA data available in Research Data Centres (RDC)?

No. Currently, CLSA data are available only through a direct application to the CLSA. For more information on how to apply, please consult the Data Access Application Process section of our website.


Where can I find publications about and using the CLSA dataset?

Publications about the CLSA, as well as those using CLSA data, can be found under Publications, in the Stay Informed section of our website.

Who owns the intellectual property arising from my use of CLSA data/samples?

The CLSA and its lead institution (McMaster University) do not claim any ownership of, or exploitation rights to, any intellectual property (IP) arising from approved research projects using CLSA data/samples.

Ownership of any IP resulting from research using CLSA data/samples shall be in accordance with the policies of the institution of the researcher(s), or where applicable, the terms of any existing agreements between the researcher’s institution(s) and third parties or any relevant research contract applicable to the development of the IP.

Given its public nature, the CLSA research platform aims to promote a wide and accessible distribution of knowledge developed through the use of this resource and to achieve maximum public benefit. Thus, CLSA data and sample users are strongly encouraged to make their results (including research tools) rapidly and widely available to the scientific community.


Who can apply to use CLSA data?

The CLSA alphanumeric data are currently available to approved public sector researchers, with no preferential or exclusive access for any individual. The CLSA welcomes applications from graduate students and postdoctoral fellows who wish to use data for their thesis research or for their postdoctoral work, respectively. For student applications (MSc, PhD), the Primary Applicant must be the supervisor and the student must be clearly identified. Postdoctoral fellows requesting a fee waiver must apply as the Primary Applicant, but the application must be co-signed by their supervisor.

As an international researcher, can I apply for CLSA data?

Yes, investigators affiliated with public research organizations outside Canada can apply to access alphanumeric data collected as part of the CLSA.

Currently, there is no provision to transfer biospecimens to applicants outside of Canada. However, international researchers may choose to collaborate with Canadian researchers to access biospecimens, as long as the biospecimens are analyzed in Canada.

Does my project need to have secured funding before I apply?

No, you do not need to secure funding before applying to request CLSA data. If funding has been requested, but not yet approved, please provide the name of the funding agency in Section A5 of Part 1 of the CLSA Data and Biospecimen Request Application.

Do I have to obtain ethics approval for my project?

Yes. Please note that ethics approval is not required at the time of the application to use CLSA data, but no data or biospecimens will be released until proof of ethics approval has been received by the CLSA. Should your institution not require a full ethical review for the use of de-identified data, please provide a letter from your Institutional Review Board to this effect. Ethics approval must be obtained only from the Primary Applicant’s institution, not from all of the institutions of the members of the Project Team.

What are the fees for access to CLSA data?

Currently, the charge for partial cost recovery for retrieval and preparation of a dataset based on data from the Baseline assessment of the CLSA cohort is $3,000. The Baseline assessment includes data from 21,241 participants in the Tracking assessment, from 30,097 participants in the Comprehensive assessment, and from all Tracking and Comprehensive participants who completed the Maintaining Contact Questionnaire. The data access fees are payable for the retrieval and preparation of the dataset per approved project, not for each Project Team member.

Additional fees may be applied for data access requests that require more complex customization of datasets.

Do trainees have to pay for access to CLSA data?

Graduate students (M.Sc. or Ph.D.) and postdoctoral fellows who are enrolled at Canadian institutions can apply for a fee waiver if they wish to obtain the CLSA data for the sole purpose of their thesis or their postdoctoral project, respectively. Postdoctoral fellows are limited to 1 waiver each. Canadian trainees working outside Canada but funded through a Canadian source are also eligible for a fee waiver. The request for a fee waiver must be checked in Part 1 of the CLSA Data and Biospecimen Access Request Application.

If I have a student or trainee as part of my Project Team, am I eligible to get a fee waiver for data access?

To be eligible for a fee waiver, the trainee must be enrolled at a Canadian university as a graduate student (M.Sc. or Ph.D.) who wishes to obtain the CLSA data for the sole purpose of their thesis or a postdoctoral fellow (limit 1 waiver per postdoc) who wishes to obtain the CLSA data for the sole purpose of their postdoctoral project. Canadian trainees working outside Canada but funded through a Canadian peer review agency are also eligible. Simply having trainees as part of the Project Team does not satisfy criteria for eligibility for a fee waiver.

Will my proposal undergo a scientific peer review?

Evidence of peer-reviewed funding will be considered evidence of scientific review for data access applications. If there are no plans to submit an application for financial support for your project, please provide evidence of peer review (e.g. internal departmental review or thesis protocol defense) if available. If no evidence of scientific peer review is provided with the application, then the project will undergo scientific review by the Data and Sample Access Committee.

How long will it take to receive my CLSA dataset?

Once you submit your CLSA Data and Biospecimen Request Application, you will receive an auto-reply email confirming that your submission has been received. You will be contacted once the review process is complete, or sooner if additional information is required. You will be notified about the approval status of your application approximately 3 months after the submission deadline. If your application is approved, a CLSA Access Agreement must be negotiated and signed between McMaster University and your institution. This part of the process can take a variable length of time (up to an additional 3 months) and is not under the control of the CLSA. You will also need to provide evidence of ethics approval for your project, if you did not do so in your initial application. Please be aware that these steps may affect the length of time that it takes for the data to be released to you. Once all parties have signed the CLSA Access Agreement and proof of ethics approval has been received by the CLSA, your data will be released within 7 to 10 working days. When planning for your project, please include in your timeframe at least six (6) months from the application submission deadline to the time you receive your dataset.

Can my application be rejected?

The goal of the CLSA is to enable data access as far as possible. There may be some instances when an applicant is asked to revise and resubmit an application at the recommendation of the Data and Sample Access Committee (DSAC). To avoid applications being sent to the DSAC that are not appropriate, we try to work with interested researchers before the application is submitted, to provide them with information about the available data and the feasibility of a project. During the application process itself, we ask applicants to correct errors and omissions, and we provide feedback from the DSAC review so that applicants can clarify or revise and resubmit a proposal.

I have already received the Tracking dataset. What do I need to do to get the Comprehensive and Maintaining Contact datasets as well?

The Comprehensive and Maintaining Contact (MCQ) datasets became available for access in the spring of 2016, and projects approved before this release were approved to receive Tracking cohort data only. Should you be interested in the Comprehensive or MCQ data, we invite you to contact us via access@clsa-elcv.ca to request an Amendment Form.

What do I do if I have problems completing the application forms?

Should you encounter any issues completing the application forms, please contact us via access@clsa-elcv.ca.


When will data from the first Follow-Up be available?

We are currently in the data collection phase for the first follow-up, which is due to end in mid-2018. The data from the first follow-up (FUP1) will be verified and prepared for release before they can be made available. We anticipate first release of FUP1 alphanumeric data to start in Spring 2019. 

If I already have an approved CLSA project, do I get access to Follow-Up data as well?

When the Follow-Up data are released, all current and new users will be required to complete a new CLSA Data and Biospecimen Request Application to access Follow-Up 1 (FUP1) data. FUP1 data cannot be requested through an Amendment to an existing project. The partial cost recovery fees for access to FUP1 data have not been determined yet.

Can I publish multiple peer-reviewed manuscripts based on my approved CLSA project?

As a publicly funded research platform, the CLSA encourages the dissemination of research findings from approved projects. The CLSA expects users to publish their findings in peer-reviewed journals. Multiple publications may be prepared based on a single approved project as long as the publications are directly linked to the objectives of the approved project.

Final drafts of all manuscripts describing research using CLSA data and/or biospecimens must be sent to the CLSA for review at least 15 working days prior to anticipated submission to the journal. Please review our Publication Policy for additional information.

Data Preview Portal

What is the best browser to use to explore the DataPreview Portal?

The DataPreview Portal works best with Chrome and Firefox. Mac OS users can also use Safari. Internet Explorer appears to have limited functionality, and is currently not recommended. 

How can the variables list be searched or filtered?

From the Variables Search, accessed through the DataPreview Portal, full-text searches can be carried out on either the variable Name or Label. The variables can also be filtered by classification through the clickable boxes on the left. A Help button is available on the DataPreview Portal page, next to the main search bar, with detailed instructions. To see all of the questionnaires used by the CLSA, you can visit the Data Collection Tools under the Researchers tab.


What variables are included in each questionnaire?

To obtain all the variables contained in a questionnaire, type the two- or three-letter prefix (e.g. SDC for socio-demographic variables) into the full-text search box in the Variables listing. You can also use more general terms such as ‘food’ or ‘work’ to find variables related to those terms. However, searches of this kind are not exhaustive. For more information on the variables included in a questionnaire, please visit the Data Collection Tools under the Researchers tab.

How are multiple choice questions represented as variables in the CLSA dataset?

Multiple choice questions are represented by either a single variable or multiple variables, depending on what the question allows:

- A question allowing only one response is represented by a single variable that can take on multiple values. Open-text responses are permitted in many questions; common and distinct responses are recoded to create new categories within the variable itself.

- For a question allowing multiple responses, each possible response category is assigned its own binary variable. Open-text responses are also permitted in many of these questions; common and distinct options are also recoded to create additional variables within the question scope. The number of variables corresponding to that question matches the number of response options.
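To make the second case concrete, the Python sketch below collapses a set of binary variables belonging to one multiple-response question back into a list of the options a participant selected. The variable names (TRN_BUS, TRN_CAR, TRN_WALK), their labels, and the coding of "1" for a selected option are hypothetical assumptions for illustration only; the actual names and codes should be taken from the variable information pages.

```python
# Hypothetical binary variables for one "select all that apply" question.
# Names, labels, and the coding ("1" = selected) are illustrative assumptions.
OPTION_LABELS = {
    "TRN_BUS": "Bus",
    "TRN_CAR": "Car",
    "TRN_WALK": "Walking",
}

def selected_options(record, option_labels=OPTION_LABELS):
    """Collapse one binary variable per response option into a list of chosen labels."""
    return [
        label
        for name, label in option_labels.items()
        if record.get(name) == "1"
    ]

# One made-up participant record, as it might be read from a .csv row.
record = {"entity_id": "1001", "TRN_BUS": "1", "TRN_CAR": "0", "TRN_WALK": "1"}
chosen = selected_options(record)
print(chosen)  # ['Bus', 'Walking']
```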

Where can I find information about response categories for multiple choice questions?

In the variable view, clicking on the name of each variable reveals more detailed information. For example, under Categories, the variable information page will include the following information:

• Name: the value entered for a response in the questionnaire;

• Label: the response (or response category) corresponding to each value (Name);

• Missing: values corresponding to a question that was not answered (don’t know or not applicable; refused). Summary statistics reported on the DataPreview Portal exclude missing values.

Where can I find supplementary information about each of the variables?

Clicking on the variable name in the variable view reveals more detailed information about that variable. (This function is not available for all study variables.) This information includes the question pertinent to the variable, the variable label, a list of the response option categories and some automatically generated summary statistics. In some instances, additional notes on skip patterns or references are included as well.

What do blank values for a variable represent?

In general, variables in the CLSA dataset reflect the interview process. In some cases, follow-up questions were only asked if specific answers were given to preceding questions. Blank values represent valid skip patterns. For example, the numbers of daughters and sons are asked only if the participant answered that they have at least one child. In the CLSA dataset, participants with no children will have blank values for both.
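The children example can be sketched in Python as follows. The variable names (HAS_CHILDREN, N_DAUGHTERS, N_SONS) and the "YES"/"NO" coding are hypothetical stand-ins, not actual CLSA variables; the point is only that a blank follow-up value is interpreted through the gating question rather than treated as lost data.

```python
def to_int_or_none(raw):
    """Convert a CSV cell to int, keeping blank cells as None."""
    return int(raw) if raw.strip() else None

def children_counts(record):
    """Interpret follow-up values in light of the (hypothetical) gating question."""
    asked = record["HAS_CHILDREN"] == "YES"
    return {
        # False means the follow-up questions were validly skipped.
        "asked": asked,
        "daughters": to_int_or_none(record["N_DAUGHTERS"]) if asked else None,
        "sons": to_int_or_none(record["N_SONS"]) if asked else None,
    }

# Two made-up participants: one with children, one without.
parent = children_counts({"HAS_CHILDREN": "YES", "N_DAUGHTERS": "2", "N_SONS": "0"})
no_children = children_counts({"HAS_CHILDREN": "NO", "N_DAUGHTERS": "", "N_SONS": ""})
print(parent)
print(no_children)
```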

How were “don’t know” or “refused to answer” responses coded in the CLSA dataset?

There are specific codes assigned to participants’ “don’t know” (or no answer) and “refused to answer” responses. On the DataPreview Portal, these codes are marked as “missing” and are excluded in summary statistic calculations. These observations should usually be excluded when calculating statistics in basic summaries. For questions with possible multiple answers, variables were created for don’t know (containing DK_NA in the name) and refused to answer (REFUSED) categories.
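A minimal Python sketch of excluding such codes from a basic summary is shown below. The code values "8" and "9" are invented for this example; the real codes differ by variable and are listed under Categories on each variable information page.

```python
from statistics import mean

# Hypothetical codes for illustration; check the Categories list of each
# variable for the codes actually flagged as missing.
MISSING_CODES = {"8": "don't know / no answer", "9": "refused"}

def summarize(values, missing_codes=MISSING_CODES):
    """Compute a basic summary that leaves out missing codes and blank cells."""
    valid = [int(v) for v in values if v != "" and v not in missing_codes]
    n_excluded = sum(1 for v in values if v in missing_codes)
    return {
        "n_valid": len(valid),
        "n_excluded": n_excluded,
        "mean": mean(valid) if valid else None,
    }

# Made-up responses to a single-answer question, read as text from a .csv file.
responses = ["1", "3", "8", "2", "9", "", "2"]
summary = summarize(responses)
print(summary)
```

Leaving the codes in place would treat "8" and "9" as real answers and distort the mean, which is why they are filtered out before the calculation.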

What are derived variables?

Within the CLSA, derived variables (DVs) are variables that are created from other variables. DVs are ‘derived’ by re-grouping or re-classifying the original variables, to glean information otherwise not available. Some DVs are based on published measures or scales. You will find documentation related to DVs on our Data Support Documentation page, under the Researchers tab of our website.
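As a generic illustration of deriving a variable by re-grouping an original one, the Python sketch below turns an age in years into an age-group category. The cut points and labels are arbitrary choices made for this example and do not reproduce any documented CLSA derived variable.

```python
def age_group(age):
    """Re-group an age in years into illustrative categories (arbitrary cut points)."""
    if age is None:
        # Propagate missing input instead of assigning a category.
        return None
    if age < 55:
        return "under 55"
    if age < 65:
        return "55-64"
    if age < 75:
        return "65-74"
    return "75 and over"

# Made-up ages, including one missing value.
ages = [54, 55, 70, 80, None]
groups = [age_group(a) for a in ages]
print(groups)
```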

How many participants are part of the CLSA?

At Baseline, 21,241 participants were enrolled in the Tracking cohort and 30,097 participants in the Comprehensive cohort for a total of 51,338 CLSA participants. 

What were the Baseline exclusion criteria?

Please refer to Section 5.3 of the CLSA Protocol, available under the Researchers section of our website.

When were Baseline data collected?

Periods of data collection for the Baseline assessments were as follows:

Baseline Tracking: 2011-09 to 2014-05

Baseline Comprehensive: 2011-12 to 2015-07

Maintaining Contact Questionnaire (MCQ) Tracking: 2013-09 to 2016-02

Maintaining Contact Questionnaire (MCQ) Comprehensive: 2014-05 to 2016-01

Are there any differences in questionnaire content between the telephone interview and the in-home interview?

The Baseline Tracking Telephone Interview and the Comprehensive In-Home Questionnaire are very similar in content, while the Tracking and Comprehensive Maintaining Contact Questionnaires contain somewhat different content. For downloadable copies of the questionnaires, we recommend visiting the Data Collection Tools, under the Researchers section of our website.


How is participant death captured in the CLSA?

Participant death is currently captured in 3 ways: 1) from the next of kin contacting the CLSA directly, 2) through the ‘maintaining contact’ telephone calls that occur between main waves of data collection, or 3) from linkage to provincial vital statistics. Mortality data are not yet available.