Mamta Aggarwal Rajnayak, Managing Director, AI Hub at Accenture India, and Reema Malhotra Chawla, Data Science Senior Manager at Accenture India, talk with Women Who Code about Accenture’s intelligent product analyzer. They discuss how Reema’s idea came about and how it can help clients.
Mamta: I've been with Accenture for the last seven years. I grew from a manager to a managing director over this time. I really thrive here because I can do so many different things. I work with an extraordinarily talented team that works on cutting-edge technologies like machine learning, computer vision, speech processing, virtual agents, and more. Accenture has a very strong patenting program where everyone in the organization can request to file their innovation with the patent office.
Reema: I joined Accenture straight out of college. In a span of eight years, I've had the opportunity to work across various facets of data science. It has helped me create an accelerated career path. I have gone from an analyst role to a manager role in about seven to seven and a half years.
Mamta: When I talk to the CDOs, CTOs, and CEOs, the chief analytics, data, and technology officers, specifically in the retail domain, what I hear is that when they launch a new product, defining the product hierarchy takes them anywhere between 6 and 11 months. It takes a retailer that long to come up with the hierarchy, and even then the hierarchy is not clean data. As a result, we end up spending a lot of time cleaning this data.
At Accenture, we have knowledge-sharing forums where anyone can come and showcase their work. These forums are think tanks. At times when we are listening to the solutions, we end up actually getting some clues about other complicated problems we are trying to solve. This is exactly what happened in this case.
Reema did a project on text mining to organize objects into classes. Listening to her project and its technical details, I felt that we could apply a lot of this concept to solve the main problem I'm talking about. The solution is called Intelligent Product Analyser. I've been working in this industry for 15 years, and my clients have been asking for this kind of solution. Reema, why don't you tell us a little bit more about how we go about doing this?
Reema: So many clients have no way to recognize which product is being talked about in the description because these are all data terms that have never been classified and organized.
We recognized that text classification alone is probably not the means to get there. The means to get there is to first have the ability to recognize which product is being talked about in any given product description. Then we need to know the attributes of the product, which are equally important. We need the ability to go beyond the SKU and actually analyze the different facets.
There's always a starting point. When I started, I knew some text mining. I knew some of the things that we wanted to do. For example, recognizing a product name by creating your own named-entity recognizer (NER). We faced so many blocks. Accenture has a vast repository of learning and training material. Whenever we were stuck with a problem, there was a lot of in-house material that helped us. There's always that culture of learning and innovation.
Let’s talk about what an intelligent product analyzer is. When we were trying to build it, we knew that we wanted to make something so robust that we could apply it to retail today and to supply chain or any other industry in the future. Components should be reusable, so that the same models or architecture can be leveraged. We are a data-driven consulting firm. At times we don't have time to build things from scratch. The idea was, instead of making just one engine, start to end, why not divide it into smaller engines and modules? Each takes care of an individual task. When we have a similar problem at hand, we may not want to use them all together, so we use only the tools that are really required for that particular problem.
When we go to our clients, some of them are very rigid about a particular hierarchy for their products. Others may say they don't have a hierarchy. In that case, we create a very generic, three-level hierarchy. The solution is intelligent enough to classify similarly described products together and then propose a simple three-level hierarchy that the clients can leverage.
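As a rough illustration of "classifying similarly described products together", here is a minimal sketch that groups descriptions by word overlap (Jaccard similarity). The function names, the greedy grouping strategy, and the 0.4 threshold are all assumptions made for this example; the interview does not say which similarity measure or clustering method the actual solution uses, and this covers only a single level of grouping, not a full three-level hierarchy.

```python
def jaccard(a, b):
    """Token-set overlap between two descriptions, from 0.0 to 1.0."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def group_products(descriptions, threshold=0.4):
    """Greedy single pass: a description joins the first group whose
    seed (first member) is similar enough, otherwise starts a new group.
    The threshold is an illustrative assumption, not a tuned value."""
    groups = []
    for desc in descriptions:
        for group in groups:
            if jaccard(desc, group[0]) >= threshold:
                group.append(desc)
                break
        else:
            # No existing group was similar enough.
            groups.append([desc])
    return groups


catalog = [
    "red cotton shirt",
    "blue cotton shirt",
    "steel water bottle",
    "glass water bottle",
]
for members in group_products(catalog):
    print(members)
```

Each resulting group could then be treated as one candidate node in a proposed hierarchy, with a person reviewing and naming the groups.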
We are trying to build an entire mechanism that can spot a product name and then classify it, based on some amount of labeled data that was either already present or that we have created. When a new product comes in, you don’t want to repeat the steps each time, so we train the models in a way that they remember how a particular product is usually described. All of that information is stored in our predictive models. We also have a sensitivity analysis that estimates how strong, or how confident, each prediction is.
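To make the idea of "classify, then report how strong the prediction is" concrete, here is a minimal sketch of a word-count Naive Bayes classifier that returns a label together with a probability. This is a stand-in chosen for brevity: the interview does not name the models used, and treating the posterior probability as the "strength" of a prediction is an assumption of this example. The training rows and category names are invented.

```python
import math
from collections import Counter, defaultdict


def train(examples):
    """examples: list of (description, label) pairs. Returns count tables."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    vocab = set()
    for text, label in examples:
        tokens = text.lower().split()
        word_counts[label].update(tokens)
        label_counts[label] += 1
        vocab.update(tokens)
    return word_counts, label_counts, vocab


def predict(model, text):
    """Return (best_label, posterior probability of that label)."""
    word_counts, label_counts, vocab = model
    total_docs = sum(label_counts.values())
    log_scores = {}
    for label, n_docs in label_counts.items():
        total_words = sum(word_counts[label].values())
        score = math.log(n_docs / total_docs)  # class prior
        for tok in text.lower().split():
            if tok not in vocab:
                continue  # skip words never seen during training
            # Laplace (add-one) smoothing avoids zero probabilities.
            score += math.log(
                (word_counts[label][tok] + 1) / (total_words + len(vocab))
            )
        log_scores[label] = score
    # Turn log scores into probabilities; subtracting the max keeps exp() stable.
    top = max(log_scores.values())
    exp_scores = {k: math.exp(v - top) for k, v in log_scores.items()}
    norm = sum(exp_scores.values())
    best = max(exp_scores, key=exp_scores.get)
    return best, exp_scores[best] / norm


# Invented labeled data, for illustration only.
model = train([
    ("stainless steel water bottle", "drinkware"),
    ("glass water bottle", "drinkware"),
    ("cotton t shirt", "apparel"),
    ("cotton polo shirt", "apparel"),
])
label, confidence = predict(model, "steel bottle")
print(label, round(confidence, 2))
```

A low probability on a new product could be used as a signal to route that item to a person for review rather than accepting the predicted class automatically.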
I'll talk about some of the modules now, for example, the name normalizer. People who have worked in text mining know the pain of cleaning text data. There is no standardization in text. Even with so many systems in place, people and organizations tend to do their own thing. For example, some acronyms are standard and understood, while others are not. So we compiled an entire dictionary on our own, based on our experience and some of the data that we had at hand. As a result, we are much better prepared to clean and pre-process data.
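A dictionary-driven normalizer of the kind described here might look roughly like the sketch below. The abbreviation entries and the function name are hypothetical; the dictionary compiled by the team is not shown in the interview and is certainly far larger.

```python
import re

# Hypothetical abbreviation dictionary, for illustration only.
ABBREVIATIONS = {
    "blk": "black",
    "wht": "white",
    "ss": "stainless steel",
    "pk": "pack",
    "oz": "ounce",
}


def normalize_name(description: str) -> str:
    """Lowercase, strip punctuation, and expand known abbreviations."""
    text = description.lower()
    # Replace anything that is not a letter, digit, or space with a space.
    text = re.sub(r"[^a-z0-9 ]+", " ", text)
    # Separate digits from letters glued to them, e.g. "12oz" -> "12 oz".
    text = re.sub(r"(?<=\d)(?=[a-z])|(?<=[a-z])(?=\d)", " ", text)
    # Expand each token if the dictionary knows it; otherwise keep it.
    tokens = [ABBREVIATIONS.get(tok, tok) for tok in text.split()]
    return " ".join(tokens)


print(normalize_name("SS Bottle, BLK 12oz (2-PK)"))
# -> stainless steel bottle black 12 ounce 2 pack
```

Keeping the dictionary as plain data, separate from the cleaning logic, means new client-specific acronyms can be added without touching the code.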
Let's say you have to develop your own NER. There are open-source NERs available. The problem with those is that an off-the-shelf named-entity recognizer is usually trained to recognize the name of a person, an organization, a location, or a date. A custom-made NER can help you identify a product name or its attributes, like color, brand name, etcetera.
To develop your own NER, you have to train the model, and that requires annotated data. If somebody even thinks of annotating it manually, it's going to be a humongous task, because these are deep-learning-backed algorithms, so you need an enormous amount of data to even begin to train them. This is where we built our in-house annotator. It reduces the manual annotation task by 80-85%. It gives you an indication of what the probable product names are, etcetera. Similarly, to expand the scope, we built it for other attributes, which could be the color, shape, size, brand name, etcetera.
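One common way to bootstrap annotations like this is to pre-tag text with seed word lists and let a person correct the suggestions. The sketch below does that with BIO tags (B- marks the beginning of an entity, I- its continuation, O everything else). The seed lists, label names, and function name are hypothetical, and the in-house annotator described in the interview may work quite differently; this example makes no claim about how much manual effort it would save.

```python
# Hypothetical seed lists; a real annotator would use much larger ones.
SEEDS = {
    "PRODUCT": ["water bottle", "t shirt", "running shoes"],
    "COLOR": ["black", "red", "navy blue"],
    "BRAND": ["acme"],
}


def pre_annotate(tokens):
    """Suggest one BIO tag per token using longest-match seed lookup."""
    # Longest phrases first, so "navy blue" wins over a shorter overlap.
    phrases = sorted(
        (
            (phrase.split(), label)
            for label, items in SEEDS.items()
            for phrase in items
        ),
        key=lambda pair: len(pair[0]),
        reverse=True,
    )
    tags = ["O"] * len(tokens)
    lowered = [t.lower() for t in tokens]
    i = 0
    while i < len(tokens):
        for words, label in phrases:
            n = len(words)
            if lowered[i:i + n] == words:
                tags[i] = "B-" + label
                for j in range(i + 1, i + n):
                    tags[j] = "I-" + label
                i += n
                break
        else:
            i += 1  # no seed phrase starts at this token
    return tags


tokens = "Acme navy blue water bottle 500ml".split()
for token, tag in zip(tokens, pre_annotate(tokens)):
    print(token, tag)
```

The suggested tags are only a starting point: a reviewer confirms or fixes them, and the corrected output becomes training data for the custom NER model.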
We are creating a data science solution where we are trying to spot a product name and classify it automatically. Data comes in from so many channels that clients are not able to assimilate all of it in a properly standardized way.
Mamta: Sometimes the description is not good enough. That's the reason for the next module that Reema and her team are working on. We don't really need to have a description of the product. If we just have a picture or video of the product, we can apply image or video analytics to figure out what the product is. Then we can go ahead and define the hierarchy.