Data Builder and cohort-extractor🔗
Missing features from cohort-extractor🔗
- Many features of cohort-extractor are not yet implemented in Data
Builder.
- The current development approach is to implement a few features in Data Builder fully end-to-end.
- See the ehrQL reference for a complete list of supported features.
- Data Builder has no current way to generate dummy data.
- You can supply a CSV file containing dummy data to Data Builder.
- It is possible to generate dummy data via the previous cohort-extractor.
The development plan for cohort-extractor🔗
cohort-extractor will continue to be supported by OpenSAFELY while Data Builder is in this initial design phase.
Once Data Builder is ready for general use, cohort-extractor will continue to be maintained, where possible, so that ongoing OpenSAFELY studies can continue to be run.
However:
- New features are likely to only be added to Data Builder.
- It may become infeasible to support cohort-extractor if the currently supported data backends undergo considerable change.
More detail for existing cohort-extractor users
The dataset definition used by Data Builder has the same underlying purpose as cohort-extractor's study definition.
To extract data, an OpenSAFELY research study would typically use one of:
- Data Builder with a dataset definition
- cohort-extractor with a study definition
Dataset definitions have a considerably different structure from the study definitions. You will need to refer to the new language to write a dataset definition.
Cohorts are now referred to as datasets. This accommodates the possibility of handling other types of data, other than purely patient data.
The main researcher-facing change with the introduction of Data Builder is the new language for extracting datasets. Data Builder does not affect the rest of the structure of an OpenSAFELY project.