Gaze estimation in the wild. Creating a ground-breaking eye-tracking system for in-context market research

After we built the successful Spark platform together, our partners wanted to take their market research further, out into the wild. So we partnered again to create something that hadn't been done before: a web eye-tracking solution for customer experience research in natural environments.

Client

Name: eye square GmbH
Line of business: market research, human experience
Founding year: 1999
Size: 50-200 employees
Country: Germany (Berlin)

Challenge

After a successful collaboration on the Spark surveying platform, the client came back to us with a new project. The goal was to deepen their research with a more hands-on approach, using gaze estimation to track users' attention while they browse content on their smartphones. The key issue was that this approach is still highly innovative and largely uncharted territory.

Solution

Collecting training and test data using crowd-sourcing platforms and a dedicated web app
Creating an AI-powered computer vision pipeline to detect the point of gaze
Developing a full-stack framework consisting of a set of web services, backend infrastructure, a front-end SDK and a web app
Continuous research and development to improve algorithm accuracy

Developers’ insights – the challenge

One of the biggest challenges was estimating the gaze point on the screen in an uncontrolled environment with varying lighting and camera positions. Existing solutions rely on expensive hardware and large collections of recordings of people. Few solutions currently on the market can do what ours does using only the smartphone's built-in camera.

What we find most formidable is the sheer complexity of the problem. To find the point where the gaze meets the screen surface, we must solve several tasks at once, and each of them is a complicated research challenge in its own right.
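The final step in that chain is purely geometric. The sketch below assumes that earlier stages, which are not shown, have already produced a 3D eye position and a gaze direction in the camera's coordinate frame. The on-screen point is then where that gaze ray meets the screen plane. This is a generic ray-plane intersection, not the project's actual code, and the function and parameter names are hypothetical.

```python
import numpy as np

def intersect_gaze_with_screen(eye_pos, gaze_dir, plane_point, plane_normal):
    """Return the 3D point where the gaze ray hits the screen plane, or None.

    eye_pos, gaze_dir: ray origin and direction in camera coordinates
        (hypothetical outputs of earlier face/eye estimation stages).
    plane_point, plane_normal: any point on the screen plane and its normal.
    """
    eye_pos = np.asarray(eye_pos, dtype=float)
    gaze_dir = np.asarray(gaze_dir, dtype=float)
    plane_point = np.asarray(plane_point, dtype=float)
    plane_normal = np.asarray(plane_normal, dtype=float)

    denom = np.dot(plane_normal, gaze_dir)
    if abs(denom) < 1e-9:
        # Gaze is parallel to the screen plane: no intersection.
        return None
    # Solve n . (eye + t * dir - p0) = 0 for the ray parameter t.
    t = np.dot(plane_normal, plane_point - eye_pos) / denom
    if t < 0:
        # Intersection lies behind the eye: the user is looking away.
        return None
    return eye_pos + t * gaze_dir
```

The example below uses the simplifying assumption that the screen lies in the plane z = 0 of the camera frame, which roughly holds on phones because the front camera sits in the screen's plane. An eye 30 cm in front of the camera, looking along (0.1, -0.2, -1), hits the plane at (3, -6, 0) in camera coordinates. Converting that point to screen pixels would also require the camera's offset from the screen origin and the screen's physical size, and both vary from device to device.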

The biggest challenge is achieving high accuracy in estimating the point of gaze, i.e. the point the smartphone user is looking at. Our client wants the margin of error not to exceed 1 cm across a wide range of phones and users. This is difficult for several reasons. Firstly, every person is different, and their eyes have a unique shape, colour, etc. Secondly, each phone has a different camera, screen size and aspect ratio. Thirdly, lighting conditions vary from one session to the next, and people hold their phones differently. Together, these factors make the ultimate goal very hard to reach.
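Checking results against the 1 cm target requires converting pixel-space predictions into physical distances on the screen. The minimal sketch below computes the mean Euclidean error in centimetres. It assumes the screen's physical pixel density (pixels per inch) is known for the test device. The function name and the PPI value in the example are illustrative and are not taken from the project.

```python
import numpy as np

CM_PER_INCH = 2.54

def mean_gaze_error_cm(pred_px, target_px, ppi):
    """Mean Euclidean distance (in cm) between predicted and true gaze points.

    pred_px, target_px: arrays of shape (N, 2), screen coordinates in physical pixels.
    ppi: physical pixel density of the screen in pixels per inch (assumed known).
    """
    pred_px = np.asarray(pred_px, dtype=float)
    target_px = np.asarray(target_px, dtype=float)
    # Per-sample pixel distance, then average and convert px -> inch -> cm.
    dist_px = np.linalg.norm(pred_px - target_px, axis=1)
    return float(np.mean(dist_px) / ppi * CM_PER_INCH)
```

For example, on a hypothetical 254 PPI screen, 100 px equals 1 cm. Errors of 100 px and 200 px therefore average out to 1.5 cm, which misses the target. In a browser, coordinates are reported in CSS pixels, so they must first be multiplied by `window.devicePixelRatio` to obtain physical pixels.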

Python
PyTorch
OpenCV
MediaPipe
JavaScript
FastAPI
AWS
Docker
ClearML
DVC

The challenges of innovation in computer vision

The collaboration on the web eye-tracking project evolved organically from our previous project, the Spark platform for market research. Given that project's success, eye square asked us to start work on a new solution that would let them deepen their research even further. They wanted to track customers' attention while they browse content on their smartphones, enabling more accurate marketing and experience research. This required an accurate system for tracking the user's gaze across the phone's screen. The project's key challenge was that eye tracking on mobile phones was an innovative concept with no practical market application yet. They therefore needed competent experts and engineers to build something entirely from scratch. Moreover, the solution had to work in real-life contexts, without a dedicated research setup. The aim was to create something that would work in everyday settings and during regular phone use.

Setting the right course for the product

Our initial discussions involved the company's COO, Phillip Reiter; the technical project managers Garrit Güldenpfennig, Frederic Neitzel and Olaf Briese; and the CFO, Friedrich Jakobi. They outlined their expectations and needs for the project.

The company had previously used several third-party solutions for laptop-based use cases, some of which required external hardware. This time, the main goal was to create a new solution suited to mobile phones. The initial agreements took approximately two weeks. After that, the team started working on the Proof of Concept to make sure both sides' visions were aligned before moving on to the subsequent phases of such a complex R&D project.

Building a team of gaze estimation and computer vision experts

Our team comprises a Senior Computer Vision Engineer, Machine Learning Engineers, DevOps specialists, frontend developers and a project manager.

eye square supported us with three technical project managers (one of whom became the project coordinator), the CFO, and a developer who provided extra help. An essential part of their contribution was coordinating the crowd-sourcing platform used to recruit testers and acquire datasets. The technical project managers Garrit, Frederic and Olaf also contributed their expertise and help whenever needed.

Technological aspects of the gaze estimation project

To go beyond the state of the art, we had to make the most of the available resources. The crucial part of the process was building a stable foundation on which new features could be developed, step by step, bringing the product closer to its intended final form.

Python, PyTorch and OpenCV, among other libraries, were used to create the base algorithm. Development was later driven by data from early tests and by a larger dataset gathered via the ClickWorker crowd-sourcing platform.
JavaScript was used to develop the WETSDK, the Training Data Collection App (TDCA) and an example web application illustrating the production use case.
FastAPI was used to build the communication interface between the end-user web app and the algorithm running on the backend server.
AWS allowed us to store the training and validation data in the cloud.
Docker made it easier to encapsulate the algorithm in self-contained software images that can run in the cloud.

Because the project is so complex and novel, it had to be divided into multiple stages and required extensive research, including trial and error. The biggest challenges stem from the dynamic environment, since we wanted a solution that would work "in the wild", without any specific hardware and with minimal prerequisites for the user.

This raised several obstacles, including the complexity of calibrating the phone camera. Phone screens and cameras differ from model to model, so it's hard to find a generic estimation method, especially since phone manufacturers don't disclose the physical dimensions of their devices.

Users need complete freedom in how they use their phones, so the gaze estimation algorithm must be able to auto-calibrate on different smartphone models. This is difficult given the varying viewing angles, distances and face detection capabilities. Our neural networks are trained on a wide variety of faces and angles so that they perform well under conditions similar to regular use.
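The case study doesn't describe how auto-calibration works. As an illustration only, one common lightweight technique is a per-session affine correction fitted by least squares. The user looks at a few known on-screen targets, and a 2D affine map from the model's raw predictions to the true target positions is estimated. That map is then applied to all later predictions. The function names below are hypothetical. This is a swapped-in example technique and should not be read as the project's method.

```python
import numpy as np

def fit_affine_correction(raw_pts, target_pts):
    """Fit a 3x2 matrix A so that [x, y, 1] @ A approximates the target point.

    raw_pts, target_pts: (N, 2) arrays with N >= 3 non-collinear points,
    e.g. raw model predictions and known calibration-target positions.
    """
    raw = np.asarray(raw_pts, dtype=float)
    tgt = np.asarray(target_pts, dtype=float)
    # Homogeneous coordinates allow translation as well as scale/rotation/shear.
    design = np.hstack([raw, np.ones((len(raw), 1))])   # (N, 3)
    A, *_ = np.linalg.lstsq(design, tgt, rcond=None)     # (3, 2)
    return A

def apply_affine_correction(A, pts):
    """Apply a fitted correction to new raw predictions of shape (M, 2)."""
    pts = np.asarray(pts, dtype=float)
    design = np.hstack([pts, np.ones((len(pts), 1))])
    return design @ A
```

An affine map is cheap to fit from a handful of targets and can absorb systematic offsets and scaling differences between devices. It cannot correct errors that vary non-linearly across the screen, which would require a richer model or more calibration points.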

Given the vast amount of crowd-sourced data from the ClickWorker platform, the team needs to evaluate its quality. An automatic framework was developed to filter out recordings that don't meet basic quality criteria, such as proper lighting and absence of blur.
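The framework's exact metrics aren't given here. As a minimal sketch of the idea, assume each frame is an 8-bit grayscale NumPy array. A frame can then be rejected when it is too dark or too bright on average, or too blurry as measured by the variance of the Laplacian. That is a widely used sharpness heuristic, which OpenCV users typically compute with `cv2.Laplacian`. A recording is kept only if enough of its frames pass. All function names and thresholds below are hypothetical placeholders and are not values from the project.

```python
import numpy as np

def laplacian_variance(gray):
    """Sharpness score: variance of a 4-neighbour Laplacian (low means blurry)."""
    g = np.asarray(gray, dtype=float)
    # Discrete Laplacian on interior pixels: sum of 4 neighbours minus 4x centre.
    lap = (g[:-2, 1:-1] + g[2:, 1:-1] + g[1:-1, :-2] + g[1:-1, 2:]
           - 4.0 * g[1:-1, 1:-1])
    return float(lap.var())

def frame_passes(gray, min_brightness=40.0, max_brightness=220.0, min_sharpness=50.0):
    """Hypothetical per-frame check on mean brightness and sharpness (0-255 scale)."""
    mean = float(np.mean(gray))
    if not (min_brightness <= mean <= max_brightness):
        return False
    return laplacian_variance(gray) >= min_sharpness

def recording_passes(frames, min_good_ratio=0.9):
    """Keep a recording only if a sufficient share of its frames pass the checks."""
    if len(frames) == 0:
        return False
    good = sum(frame_passes(f) for f in frames)
    return good / len(frames) >= min_good_ratio
```

Filtering per recording rather than per frame keeps whole sessions consistent for training. In practice, thresholds like these would need tuning on labelled examples of good and bad recordings.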

Developers’ Insights – overcoming obstacles

One obstacle we are particularly proud to have overcome is putting the product's elements together. Both parts are strongly interdependent, and now that they are integrated, we can quickly move the solution into the cloud.

We needed to overcome the barrier of training the neural network on our own dataset. This was especially hard in our case because acquiring the training data was complicated.

One of the most significant barriers we overcame was collecting enough data to train the neural network models. Such data exists, but it isn't available for commercial use and doesn't always match our specific use case. That's why we put a lot of work into collecting and analysing a large amount of data from hundreds of people to build our own training set.

Developers’ insights – technologies

The most important technologies we use are computer vision systems (based on the PyTorch framework) and linear algebra. Just as important is our skill in carefully reading research papers and applying the solutions they describe.

The technologies I found especially helpful were Python, NumPy, OpenCV, PyTorch, MediaPipe, SciPy, GPUs and CUDA.

The WET (web eye-tracking) project wouldn't be possible without machine learning. We use deep learning at every stage of image processing, from detecting the face as a three-dimensional surface to estimating the gaze point. Python and PyTorch have proved invaluable there.

Milestones 1-3: developing a PoC
- Milestone 1 – achieving the required minimum algorithm accuracy (06.06-31.07.2022)
- Milestone 2 – creating the SDK and an example application design (05.08-30.09.2022)
- Milestone 3 – building the web eye-tracking service and further improving the algorithm's accuracy to meet the acceptance criteria (01.10-31.10.2022)
Milestone 4: continuous research, improving the algorithm and preparing the application for collecting a large amount of data (TDCA – Test Data Collection Application) via ClickWorker, a crowd-sourcing platform
Milestone 5: processing the data from ClickWorker, adding TDCA features, and continuing work on the algorithm's accuracy

The project started as a proof of concept. Initially, we aimed to achieve the required minimum algorithm accuracy.
Once the essential accuracy was established, our team worked on the SDK environment and an application design ready for testing and data collection.
After establishing these features, we created a web eye-tracking service for further testing.
The next step involved continuous research to improve the algorithm and preparing the application to collect larger amounts of data via the ClickWorker crowd-sourcing platform.
We are currently working on the next round of testing, aiming to bring the algorithm's average point-of-gaze estimation error down to at most 1 cm across a wide range of test subjects.

What were the key metrics of our journey?

80% – the target accuracy for the next phase
<1 cm – the target margin of gaze point detection error

Outcomes and further steps towards reliable eye-tracking experience research

Even though the bar for innovation was set high for this project, both sides are satisfied with the results. The initial aim was to prepare a Proof of Concept. However, our partner keeps extending our collaboration because the results are good and the prospects promising.

We are gathering and processing more training data to improve the algorithm's accuracy. We have already exceeded the technological state of the art and continue to work toward better accuracy and the next steps in turning the solution into a working product.

Several elements of the processing pipeline developed by DAC.digital are currently considered for patent submission.

Words from eye square

It was the synergy between your coordination, efforts on our side, and all of your team effort. We are looking forward to the next project with you guys.

So, on this eye-tracking project in particular, we were very impressed with the competence of the team, the timelines, and how they are met. It's very good to work together, and you have a great project team on site, and communication, as Michael mentioned, is very good.

So although the project is not yet over, we now have a prototype, and now the prototype has to become real live-action and has to be integrated in our technology. So these are the next steps on our journey. And we hope that DAC will continue to support us, as it has before.

(…) we are confident in looking forward that we have a good partner on our side.