The European Journal of Social & Behavioural Sciences
Online ISSN: 2301-2218
European Publisher
A Framework to Assess Healthcare Data Quality
Table 1: Final Data Quality Framework with Definitions
Criteria | Measurement | Definition |
Accessibility | Which researchers need access to the data, and does access need to be authorised? | To ensure that only those who need to use the dataset have access to the file |
Accessibility | Has the data been protected from deliberate bias? | Can the process of acquiring the dataset be traced? |
Accessibility | Will appropriate steps be taken to ensure the dataset cannot be damaged or misused? | Ensure the dataset is saved in a secure file for analysis |
Relevance | Are the concepts in the dataset needed by the current user? | Refer to the hypotheses and evaluate whether the dataset is relevant to them |
Relevance | Are the statistics produced needed by the user? | Investigate whether statistics have already been produced and whether these could be used in the present research |
Accuracy | Is the coefficient of variation available? | Compare the degree of variation from one data series to another |
Accuracy | What is the response rate? | Reported as the percentage of participants who returned the data collection |
Accuracy | Does the data represent a complete list of eligible persons or units, rather than just a fraction of the list? | Review the response rate and determine whether any datasets were not submitted or were incomplete. Depending on the severity of the issue, contact the data source or consider using statistical methods to account for missing values. |
Accuracy | Is the imputation rate available? | The proportion of fields that have been imputed to account for missing data |
Accuracy | Has the dataset been revised? | Check the number of revisions and ensure that researchers access the latest version |
Accuracy | Were data cleansing methods used? | Identify the responsible statistician and review the cleansing methods |
Reliability | Is the data generated using protocols and procedures that do not change according to who is using them? That is, is the data completely objective and independent of user or use? | Search for published guidelines for data collection and examine the process |
Reliability | Are variables defined, and are these definitions standardised and based on a referenced source? | Determine whether definitions of the variables are available |
Timeliness | Can the amount of time between the dataset and its reference point be calculated? | Important when planning further research and comparisons |
Clarity | Is the metadata complete? | Essential for assessing data quality. Contact the source if metadata is not available |
Comparability | What is the length of the time series? | How often the dataset is published |
Comparability | Which geographical areas are used, and can these be aggregated into larger geographies? | List the levels of geographical granularity, for example, County and District |
Comparability | Can the data be easily manipulated and presented as needed? | Can the dataset be modified to suit the researcher’s needs, for example, can units be converted? |
Coherence | Taking the above questions into account, can the current data be compared with other datasets? | Prompts the researcher to reflect on the information |
Validity | Is engagement with researchers evident? | Were relevant researchers consulted during the data collection process and the publication of the dataset? |
Validity | Are the reports provisional and subject to change, or have inaccuracies been reported separately? | Find out whether the report is provisional and/or search for documentation of inaccuracies |
Validity | Is there evidence of positive reports and no negative reports on the findings? | Review the data source. Negative reports are those suggesting contradictions between different data sources for the same data. |
Validity | Overall, does the dataset meet the validation criteria? | Based on the answers above, mark the dataset as Pass, Borderline or Fail. |
Confidentiality | Does the dataset meet the BPS code of conduct for confidentiality? | Check that the data contains no identifiable information |
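The quantitative Accuracy measures in Table 1 (coefficient of variation, response rate and imputation rate) can be computed directly once the underlying counts are known. The sketch below is illustrative and does not form part of the framework itself. The function names are hypothetical. It assumes the coefficient of variation is the sample standard deviation divided by the mean, and that each field carries a flag recording whether its value was imputed.

```python
import statistics


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean, as a ratio (assumed definition)."""
    mean = statistics.mean(values)
    if mean == 0:
        raise ValueError("coefficient of variation is undefined when the mean is 0")
    return statistics.stdev(values) / mean


def response_rate(returned, invited):
    """Percentage of invited participants who returned the data collection."""
    if invited <= 0:
        raise ValueError("invited must be a positive count")
    return 100.0 * returned / invited


def imputation_rate(imputed_flags):
    """Percentage of fields flagged True, i.e. imputed rather than observed (hypothetical flagging)."""
    if not imputed_flags:
        raise ValueError("no fields supplied")
    return 100.0 * sum(imputed_flags) / len(imputed_flags)


if __name__ == "__main__":
    # Illustrative numbers only: compare variation between two data series.
    series_a = [10, 12, 11, 13]
    series_b = [10, 20, 5, 25]
    print("CV A:", round(coefficient_of_variation(series_a), 3))
    print("CV B:", round(coefficient_of_variation(series_b), 3))
    print("Response rate (%):", response_rate(180, 240))
    print("Imputation rate (%):", imputation_rate([False, True, False, False]))
```

A lower coefficient of variation indicates that a series varies less relative to its mean. This is the comparison "from one data series to another" that the Accuracy row describes.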
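The framework asks the researcher to mark a dataset as Pass, Borderline or Fail "dependent on the aforementioned" questions, but it does not give a scoring rule. The following is a minimal sketch of one possible way to record the answers and derive a rating. The rule is entirely hypothetical: Pass if every question is satisfied, Fail if fewer than half are, and Borderline otherwise. The question keys in the example are likewise invented for illustration.

```python
from enum import Enum


class Rating(Enum):
    PASS = "Pass"
    BORDERLINE = "Borderline"
    FAIL = "Fail"


def overall_rating(answers, fail_below=0.5):
    """Map yes/no answers to framework questions onto Pass, Borderline or Fail.

    Hypothetical rule (not defined in the framework): Pass if all answers are
    satisfied, Fail if the share satisfied is below `fail_below`, else Borderline.
    """
    if not answers:
        raise ValueError("no answers supplied")
    met = sum(1 for satisfied in answers.values() if satisfied)
    if met == len(answers):
        return Rating.PASS
    if met / len(answers) < fail_below:
        return Rating.FAIL
    return Rating.BORDERLINE


if __name__ == "__main__":
    # Hypothetical keys standing in for the Validity questions in Table 1.
    validity_answers = {
        "researcher_engagement_evident": True,
        "inaccuracies_documented": True,
        "no_negative_reports": False,
    }
    print(overall_rating(validity_answers).value)
```

In practice a researcher might weight some questions more heavily than others, for example treating a failed confidentiality check as an automatic Fail. The flat count used here is only the simplest starting point.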