Imports, exports and tariffs are a recurring theme in the news these days, be it in the context of Brexit, the US-China trade war or the Iran nuclear deal. Executive decisions on what duties should be levied on goods crossing borders are the order of the day. Have you ever wondered how these decisions are actually implemented on the ground, though? The answer is Harmonized Tariff Schedules (HTS): national tariff taxonomies built on the Harmonized System (HS), which the World Customs Organization (WCO) maintains to classify and define internationally traded goods. Semantics3 offers automated HTS code classification solutions to help logistics providers modernize their customs workflows.
Harmonized Tariff Schedule (HTS) code classification is a surprisingly challenging machine learning problem. At face value it is a simple multi-class classification task, in which each product is assigned a single code, but the real-world specifics make it deceptively hard:
- For starters, the quality of data available from most sources is rather poor, so automated decision-making systems have to learn to pull in external knowledge and to develop a good grasp of implicit, commonly understood norms.
- In addition, target code classes change across geographies and over time, so algorithms have to keep an eye out for stale data.
- What’s more, it’s surprisingly difficult to get trained human annotators to agree on the right HTS code for a given product: in datasets annotated by trained professionals, we usually see differing labels for the same product at least 30% of the time.
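To make the annotator disagreement figure concrete, here is a minimal sketch of how such a rate could be measured. It is not our production tooling: the `(product_id, annotator_id, code)` record layout, the function names and the sample values are all assumptions made for illustration, and the codes are not verified classifications. The optional `digits` argument compares codes on a leading prefix only, which is useful because the first six digits of a code are the internationally shared Harmonized System subheading, while the remaining digits are country-specific.

```python
from collections import defaultdict


def normalize(code, digits=None):
    """Strip separators and optionally keep only the leading `digits` digits."""
    cleaned = code.replace(".", "").replace(" ", "")
    return cleaned[:digits] if digits else cleaned


def disagreement_rate(annotations, digits=None):
    """Fraction of multiply-annotated products that received more than one distinct code.

    `annotations` is an iterable of (product_id, annotator_id, code) tuples
    (a hypothetical layout). Products with a single annotation are ignored.
    """
    codes = defaultdict(set)
    counts = defaultdict(int)
    for product_id, _annotator_id, code in annotations:
        codes[product_id].add(normalize(code, digits))
        counts[product_id] += 1
    compared = [p for p, n in counts.items() if n >= 2]
    if not compared:
        return 0.0
    return sum(len(codes[p]) > 1 for p in compared) / len(compared)


# Illustrative records only; the codes are placeholders, not verified classifications.
SAMPLE = [
    ("sku-1", "ann-a", "6109.10.0012"),
    ("sku-1", "ann-b", "6109.10.0040"),  # differs only in the country-specific digits
    ("sku-2", "ann-a", "8517.13.0000"),
    ("sku-2", "ann-b", "8517.13.0000"),  # full agreement
    ("sku-3", "ann-a", "4202.92.3120"),
    ("sku-3", "ann-b", "4202.12.8130"),  # differs already at the six-digit level
    ("sku-4", "ann-a", "9503.00.0073"),  # single annotation, skipped
]

for digits in (None, 6, 4):
    print(digits, round(disagreement_rate(SAMPLE, digits), 2))
```

Comparing the rate at full length against the six-digit and four-digit prefixes gives a rough sense of whether annotators disagree about what the product fundamentally is or only about the finer, country-specific subdivisions.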
How do you build automated systems that can deal with these challenges? In this article, I’ll cover five techniques that have helped us tackle them.