In many areas of the social sciences, one of the most commonly used research methods is the secondary analysis of publicly available data files. The federal government, as well as large data consolidation bureaus and consortiums, provides public access to many data sets. Additionally, many federal funding programs, social science professional organizations, and journals now require that researchers make the data they collect publicly available to encourage scholarly replication of research. Data may also be available from previously IRB-approved protocols where the data sets do not contain information that could be used to identify individual research participants.
Research Where IRB Review Is Not Required for Publicly Available Data
Under the federal regulations for human subjects (45 CFR Part 46), research involving publicly available data sets would not require IRB review – no application is required – as long as:
- the data come from sources that are publicly available, and
- the data are de-identified, that is, stripped of identifiers and uncoded.
The University of Maryland, Baltimore County’s IRB has created a list of data holders whose archives include publicly available, de-identified data. Review this list and follow the respective links below to learn more about the access and download procedures for each data source.
Caution: If you are designing a research project that merges more than one public data set and you recognize that this may increase the risk of identification of individual research participants, please contact the ORC.
Exempt Determination of Research with Identifiable Private Information
Research projects in which investigators initially have access to identifiable private information and then abstract the data needed for the research in such a way that the information can no longer be connected to the identity of the subjects would fall under exemption category 4 – secondary research use of identifiable private information or identifiable biospecimens where consent is not required. This means that the abstracted data set does not include direct identifiers (names, Social Security numbers, addresses, phone numbers, etc.) or indirect identifiers (codes or pseudonyms that are linked to the subject’s identity).
NOTE: The collection and analysis of protected health information (PHI) and personally identifiable information that is not allowed under this exempt category would likely fall under Expedited Review Category #5 and would require a data use agreement (DUA) [more info here]. Please see the excellent review of Category #5 by Teachers College, Columbia University, for more information.