Data Engineering in Automatic Machine Learning – Opportunities at University of Aberdeen – Scotland, UK

The application deadline for this job has passed.
  • Post Date: October 15, 2021
Job Overview

Data Engineering in Automatic Machine Learning


Artificial intelligence (AI) based on machine learning (ML) is widely used in industries such as finance, engineering, health care, energy and chemistry. However, building and deploying the ML models embedded in AI systems requires humans to prepare data in a form the models can accept. It is estimated that data scientists may spend 80% of their time on data preparation, a process more formally called data engineering (DE) [1]. Yet more ML research is devoted to model fitting than to data engineering, despite compelling evidence that ML projects that overlook DE fail to meet their goals.

Automatic machine learning (AML) is the main research topic of this PhD project, which will focus on automating the development of AI systems end-to-end. Specifically, we want to develop an AML pipeline in which the AI system reads data directly, prepares it for modelling, builds models for user tasks and produces the task results, all without significant human intervention. This means that the AML system will complete the DE task automatically.

We will develop DE methods that enable AML, working with various existing big data sets, including recent household energy consumption data [2] as one of our case studies. Specifically, we will develop machine learning methods to 1) automatically recognise the data schema by labelling the data features and names, 2) identify and unify feature names, 3) recognise and correct data quality issues and 4) perform feature engineering to produce the data structures required by ML. Our aim is an AML model that can read data and produce analysis results without significant human intervention.
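As a rough illustration of the four DE steps listed above, the sketch below applies simple hand-written rules to a toy table. It is not the project's method, which aims to learn these steps with ML. The alias table, function names and sample data are all hypothetical, and the rules (float parsing, median imputation, min-max scaling) are stand-ins chosen only to keep the example short.

```python
import re
import statistics

# Hypothetical alias table: maps normalised column names to one canonical name.
ALIASES = {"kwh": "energy_kwh", "energy": "energy_kwh",
           "temp": "temperature_c", "temperature": "temperature_c"}

def normalise_name(name):
    """Step 2: lower-case, collapse non-alphanumerics to '_', then map known aliases."""
    key = re.sub(r"[^a-z0-9]+", "_", name.strip().lower()).strip("_")
    return ALIASES.get(key, key)

def to_float(value):
    """Return value as a float, or None if it cannot be parsed."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None

def infer_type(values):
    """Step 1: label a column 'numeric' if every non-missing value parses as a float."""
    present = [v for v in values if v not in ("", None)]
    if present and all(to_float(v) is not None for v in present):
        return "numeric"
    return "categorical"

def impute_median(values):
    """Step 3: replace missing numeric entries with the column median."""
    parsed = [to_float(v) for v in values]
    median = statistics.median(x for x in parsed if x is not None)
    return [median if x is None else x for x in parsed]

def min_max_scale(values):
    """Step 4: rescale a numeric column to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(x - lo) / (hi - lo) for x in values]

def prepare(table):
    """Run the four steps on a dict of column name -> list of raw strings."""
    schema, features = {}, {}
    for raw_name, values in table.items():
        name = normalise_name(raw_name)
        kind = infer_type(values)
        schema[name] = kind
        if kind == "numeric":
            features[name] = min_max_scale(impute_median(values))
        else:
            features[name] = list(values)
    return schema, features

# Toy input with a messy column name and one missing reading.
raw = {" Energy (kWh) ": ["1.0", "", "3.0"], "Household": ["a", "b", "c"]}
schema, features = prepare(raw)
```

Each step is a separate function so that any one of them could later be replaced by a learned component without changing the rest of the pipeline.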

Selection will be made on the basis of academic merit. The successful candidate should have, or expect to obtain, a UK Honours degree at 2.1 or above (or equivalent) in computer science or related subjects.

Formal applications can be completed online:

• Apply for Degree of Doctor of Philosophy in Computing Science

• State name of the lead supervisor as the Name of Proposed Supervisor

• State ‘Self-funded’ as Intended Source of Funding

• State the exact project title on the application form

When applying please ensure all required documents are attached:

• All degree certificates and transcripts (Undergraduate AND Postgraduate MSc, officially translated into English where necessary)

• Detailed CV, Personal Statement/Motivation Letter and Intended source of funding

Informal enquiries can be made to Dr M Zhong, with a copy of your curriculum vitae and cover letter. All general enquiries should be directed to the Postgraduate Research School.

Funding Notes

This PhD project has no funding attached and is therefore available to students (UK/International) who are able to seek their own funding or sponsorship. Supervisors will not be able to respond to requests to source funding. Details of the cost of study can be found by visiting

Job Detail
  • Offered Salary: Not Specified
  • Career Level: Not Specified
  • Experience: Not Specified
  • Gender: Both
  • Industry: Education
  • Qualification: Master's Degree (M.Sc.)