Predicting the Demand of Products Sold Online

Can the demand for an item be predicted before its market launch?

How about 1 month into launch, once the initial numbers are in?

Which signals best prophesy future demand? Can these be controlled?

We set out to answer the questions above using the ~10 billion price & demand signals in our database, and our Universal Product Catalog. Our approach was to build a forecasting model tuned to predict publicly available sales metrics such as sales rank.

Click here to read the rest of this article on the Semantics3 blog

Introducing Attribute Extraction from User-Generated Content

Attribute Extraction from ecommerce data – the generation of structured fields from unstructured text – is a popular product offering of ours. Our customers use it to improve the quality of their search catalogs, and thereby, their search relevance, faceted search and ad targeting.

Thus far, the scope of this product offering has extended to catalog data, primarily product titles, descriptions and specifications. Now, we’re extending this capability to process user-generated content, including customer questions & answers.

[Images: Catalog Data | User-Generated Content]

By distilling factual information from customer content, these algorithms can help boost the number and relevancy of structured attributes on popular ecommerce product pages. 
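
To make the idea concrete, here is a minimal sketch of pulling structured attributes out of a customer answer. It is only an illustration under stated assumptions: the `PATTERNS` table, the attribute names and the `extract_attributes` helper are hypothetical, and simple regular expressions stand in for whatever models are actually used in production.

```python
import re

# Hypothetical attribute patterns, chosen purely for illustration.
# A production system would likely rely on learned models rather than regexes.
PATTERNS = {
    "weight": re.compile(r"(\d+(?:\.\d+)?)\s*(kg|g|lbs?|oz)\b", re.IGNORECASE),
    "screen_size": re.compile(r"(\d+(?:\.\d+)?)\s*-?\s*inch(?:es)?", re.IGNORECASE),
}


def extract_attributes(text):
    """Return a dict mapping attribute name to the first matching span in text."""
    attributes = {}
    for name, pattern in PATTERNS.items():
        match = pattern.search(text)
        if match:
            attributes[name] = match.group(0).strip()
    return attributes


# Example: an answer taken from a (made-up) customer Q&A thread.
answer = "Yes, it weighs about 1.2 kg and has a 15.6 inch display."
print(extract_attributes(answer))
```

Opinionated text with no factual content, such as "Great product!", yields an empty dict, which is the behavior you would want before adding anything to a structured catalog.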

Click here to read the rest of this article on the Semantics3 blog

To Read Less — My 2020 Resolution

Each December, I religiously set a list of personal targets for the new year. Usually, the first item on this list is a reading target, and inevitably, the target is to read more than I did the previous year.

This year though, my target is to read less than I did the previous year… cue shock, horror, cries of blasphemy! What’s gotten into you, Govind?

I’ll get to my justification in a bit, but first, a bit of a back story that I think many may identify with.


Like most people I knew growing up, I viewed reading as the noblest of virtues, but one that I didn’t practice enough. At times, I’d get through a couple of books a week, but at others, I’d find myself staring at the same page for hours on end. I tried to be deliberate about making reading more of a habit, but I had many, many stretches of downtime that I was quite self-critical of.

Click here to read the rest of this article on Medium

Shipping Line, Port and Route Dynamics of US Shipment Imports

This is the first in an article series in which we attempt to unearth dynamics of the shipping industry, by analyzing publicly available import shipment data in the United States.

In this article, we’ll take a look at market share dynamics for shipping lines – businesses that transport cargo aboard ships – and see how they’ve varied over the last few years. We’ll also look at traffic into ports in the US, origin ports from which the shipments leave, and the route links taken. The goal in exploring these questions is to understand market dynamics around key players, geographical locations and entrenched behaviors in the market, and trace how dynamics have changed through the years as market forces have had their say.
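
As a rough illustration of what a market share calculation over shipment records might look like, the sketch below counts shipments per shipping line per year and converts the counts to fractions. The `(year, carrier)` record layout, the carrier names and the `market_share_by_year` helper are assumptions made for this example, not the schema of the actual dataset.

```python
from collections import Counter, defaultdict


def market_share_by_year(shipments):
    """Compute each carrier's share of shipments per year.

    shipments: iterable of (year, carrier) tuples (a hypothetical layout).
    Returns {year: {carrier: fraction of that year's shipments}}.
    """
    counts = defaultdict(Counter)
    for year, carrier in shipments:
        counts[year][carrier] += 1

    shares = {}
    for year, carrier_counts in counts.items():
        total = sum(carrier_counts.values())
        shares[year] = {c: n / total for c, n in carrier_counts.items()}
    return shares


# Made-up records, purely for illustration.
shipments = [
    (2018, "Line A"), (2018, "Line A"), (2018, "Line B"), (2018, "Line B"),
    (2019, "Line A"), (2019, "Line B"), (2019, "Line B"), (2019, "Line B"),
]
shares = market_share_by_year(shipments)
print(shares[2018], shares[2019])
```

Counting shipments is only one possible measure of share; weighting by container volume or cargo weight would follow the same shape with a sum in place of the count.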

The answers to these questions come from an analysis of publicly available import data from the United States Customs and Border Protection’s (CBP) Automated Manifest System. This data spans approximately five years, and covers 180 million data points. Here’s what we found.

Shipping Lines

Click here to read the rest of this article on the Semantics3 blog

Using AI to Automate Web Crawling

Writing crawlers to extract data from websites is a seemingly intractable problem. While it’s easy to build a one-off crawler, writing systems that generalize across sites is hard, since each website usually has its own distinct underlying patterns. What’s more, website structures change with time, so these systems have to be robust to change.

In the age of machine learning, is there a smarter, more hands-off way of doing crawling? This is a goal that we’ve been chipping away at for years now, and over time we’ve made decent progress in doing automated generalizable crawling for a specific domain — ecommerce. In this article, I’d like to describe the system that we’ve built and the algorithms behind it; this work is the subject of a recent patent filing by our team.

The goal of our automated crawling project

Our goal in this project is to extract the entire catalog of an ecommerce site given just its homepage URL (see image above). This involves three key challenges.

Click here to read the rest of this article on the Semantics3 blog

Experiments with Shrinking my Garbage Footprint

I wrote this post back in 2017, but left it languishing in my drafts folder. A present-day reflection follows at the end of this article.

“Garbage City”. The city that I live in, Bengaluru, has been conferred this unceremonious moniker in one too many articles of late. An increase in waste generation, poor waste segregation practices, non-operational processing plants and apathy from the citizenry have spawned a growing environmental and health crisis in the city, which has in turn affected the aesthetic beauty of the “Garden City of India”.

And the rest of urban India isn’t far behind. Mumbai, Chennai (link to a previous post on garbage in Chennai), Delhi and Kolkata face their own equally daunting challenges. “According to a World Bank 2015 report, India produces 109,589 tonnes of municipal solid waste a day which is projected to triple to 376,639 tonnes a day by 2025.” [Ref.]

Confronted with these terrifying facts, what is a concerned citizen to do?

Click here to read the rest of this article on Medium

Neutralizing Emissions from International Air Travel — On Project CORSIA and its Shortcomings

If you were to take a flight from India to the United States, to which country would the carbon emissions produced be attributed? To India, since that’s where the flight took off? An equal split between the two countries? Or to all the countries along the route that the flight takes?

This is not just a hypothetical question. It matters, because it informs graphs like this:

Credit: Union of Concerned Scientists

And these graphs are in turn important, because they tell us where the clock on the metaphorical time bomb of climate doom stands. They determine the targets that each country needs to achieve to keep temperature rise below 2 degrees.

So which country is it? The answer is … drumroll …

Click here to read the rest of this article on Medium

The Problem with the Way We Measure Carbon Emissions

I have been struck by how important measurement is to improving the human condition. You can achieve incredible progress if you set a clear goal and find a measure that will drive progress toward that goal. — Bill Gates

In the global effort to tackle climate change, the volume of greenhouse gas emissions is perhaps the most important measure. This metric helps inform the targets that nations agree to in international arenas, and serves as a barometer for ongoing assessment of the impact of policy initiatives. That’s why it’s important to build a strong understanding of how this metric is measured, understand any inherent biases that its measurement may carry, and counteract any economic misincentives that these biases might create.

Click here to read the rest of this article on Medium

The Ecommerce Knowledge Graph – Semantics3 Labs

Over the past 7 years, we’ve built an extensive Universal Product Catalog by curating and understanding public data from across the e-commerce web. This includes information about hundreds of millions of products, ~1000 standardized attribute types, billions of attribute values and tens of billions of pricing and ranking signals.

Now, as part of our latest research initiative, we’ve built an Ecommerce Knowledge Graph to harness the value of the relationships between the entities in our datasets. At the core of this graph is the set of relationships between the structured attributes that describe products in the catalog; the graph is also layered with the billions of relationships between products themselves through characteristics like shoppability, browsability and compatibility.
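
For readers who think in code, a graph of this kind can be pictured as a set of (subject, relation, object) triples. The sketch below is a toy under stated assumptions: the `KnowledgeGraph` class, the relation names and the product identifiers are all hypothetical and say nothing about how the real graph is stored or queried.

```python
from collections import defaultdict


class KnowledgeGraph:
    """Toy triple store: (subject, relation) -> set of objects."""

    def __init__(self):
        self._edges = defaultdict(set)

    def add(self, subject, relation, obj):
        """Record a directed, typed edge from subject to obj."""
        self._edges[(subject, relation)].add(obj)

    def neighbors(self, subject, relation):
        """Return the objects linked from subject via relation, sorted."""
        return sorted(self._edges.get((subject, relation), set()))


# Made-up products and relations, purely for illustration.
graph = KnowledgeGraph()
graph.add("phone-x", "has_attribute", "color:black")
graph.add("phone-x", "has_attribute", "storage:128GB")
graph.add("case-y", "compatible_with", "phone-x")

print(graph.neighbors("phone-x", "has_attribute"))
print(graph.neighbors("case-y", "compatible_with"))
```

Keying edges by relation type keeps attribute links and product-to-product links such as compatibility in the same structure while letting each be queried separately.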


Click here to read the rest of this article on the Semantics3 blog