Facilitating Trade:
Improving Customs Risk Management Systems
In the OIC Member States
and the data mining tasks to be performed. The major approaches to text mining, classified by the kind of data they take as input, are:
The keyword-based approach, where the input is a set of keywords or terms in the documents;
The tagging approach, where the input is a set of tags; and
The information-extraction approach, where the input is semantic information, such as events, facts or entities uncovered by information extraction.
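To illustrate the input of the keyword-based approach, the following minimal sketch reduces each document to the set of vocabulary keywords it contains. The keyword list, the sample sentences and the function name `keyword_profile` are hypothetical, and the tokenisation is deliberately simple (lower-case alphabetic words only). The tagging and information-extraction approaches would replace this step with a tagger or an extraction tool.

```python
import re

# Hypothetical keyword vocabulary; a real customs administration would
# maintain its own list of risk-related terms.
KEYWORDS = {"cigarettes", "heroin", "truck", "border", "seizure", "night"}


def keyword_profile(document: str, vocabulary: set[str]) -> set[str]:
    """Return the vocabulary keywords that occur in the document.

    In the keyword-based approach, each document is reduced to the set of
    keywords/terms it contains, and mining operates on these sets.
    """
    tokens = set(re.findall(r"[a-z]+", document.lower()))
    return tokens & vocabulary


# Hypothetical news snippets used only for illustration.
docs = [
    "Customs seizure of cigarettes hidden in a truck at the border",
    "Heroin found during a night inspection at the border crossing",
]
profiles = [keyword_profile(d, KEYWORDS) for d in docs]
for p in profiles:
    print(sorted(p))
```

The resulting keyword sets can then be fed to association or classification techniques.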
Data that can be used for risk assessment come from the intelligence database, analytical intelligence reports, offense reports/protocols, criminal reports, the mass media, etc. By applying text mining to these sources, the CRM can enter a process of self-learning about risk assessment. For example, the Macedonian Customs Administration carried out a practical test of text mining in customs risk assessment, using web news articles published in the media. The process consisted of the following steps:
Collecting news articles from the web through RSS feeds, using keywords. This approach allows hundreds of pieces of textual information about seizures or customs fraud to be collected quickly;
Structuring the information in a database, classified by the keyword used to find each text;
Applying text mining techniques;
Obtaining the results of text mining, which can take various forms (rules, association networks, classification trees, etc.).
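The first two steps (keyword-based collection via RSS and storage classified by keyword) can be sketched as follows. This is a minimal illustration under stated assumptions, not the Administration's actual tooling: the feed content, the keyword list and the helper names `collect` and `store` are hypothetical. The feed is parsed from an inline RSS 2.0 string; a live system would fetch it from a news site, for example with `urllib.request.urlopen`. Matching is a plain case-insensitive substring test, so one article can be filed under several keywords.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical RSS 2.0 feed standing in for a real news feed.
SAMPLE_FEED = """<rss version="2.0"><channel>
  <title>Example news</title>
  <item><title>Police seize smuggled cigarettes</title>
        <description>Truck stopped at border with 500 cartons.</description></item>
  <item><title>Weather forecast</title>
        <description>Sunny weekend expected.</description></item>
  <item><title>Heroin seizure at airport</title>
        <description>Passenger arrested with drugs in luggage.</description></item>
</channel></rss>"""

KEYWORDS = ["cigarettes", "heroin", "drugs"]  # hypothetical search terms


def collect(feed_xml, keywords):
    """Step 1: keep (keyword, text) pairs for items mentioning a keyword."""
    root = ET.fromstring(feed_xml)
    hits = []
    for item in root.iter("item"):
        text = f"{item.findtext('title', '')} {item.findtext('description', '')}"
        for kw in keywords:
            if kw in text.lower():
                hits.append((kw, text))
    return hits


def store(hits):
    """Step 2: structure the hits in a database table, classified by keyword."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE article (keyword TEXT, text TEXT)")
    conn.executemany("INSERT INTO article VALUES (?, ?)", hits)
    return conn


conn = store(collect(SAMPLE_FEED, KEYWORDS))
query = "SELECT keyword, COUNT(*) FROM article GROUP BY keyword ORDER BY keyword"
for keyword, count in conn.execute(query):
    print(keyword, count)
```

The table produced here is the structured input on which the text mining step (the third step) would run.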
As a result, the data are presented in the form of a dependency network, which shows the elements related to drug and cigarette smuggling: risky time frames, modes of transport and modus operandi.
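As a rough idea of how such relationships can be derived, the sketch below builds a simple co-occurrence network with association-rule support and confidence between commodities and risk attributes. This is a simplification swapped in for illustration. It is not the dependency-network algorithm the Macedonian Customs Administration used, which the text does not specify. The article attribute sets and the function name `association_edges` are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical attribute sets, one per article: commodity, mode of
# transport, time frame and modus operandi labels.
ARTICLES = [
    {"cigarettes", "road", "night", "hidden_compartment"},
    {"cigarettes", "road", "weekend", "hidden_compartment"},
    {"drugs", "air", "luggage"},
    {"drugs", "road", "night"},
    {"cigarettes", "road", "night"},
]


def association_edges(articles, min_support=2):
    """Return directed edges (a, b, support, confidence of a -> b).

    support    = number of articles containing both a and b
    confidence = support / number of articles containing a
    Only pairs seen together in at least `min_support` articles are kept.
    """
    item_count = Counter()
    pair_count = Counter()
    for items in articles:
        item_count.update(items)
        pair_count.update(combinations(sorted(items), 2))
    edges = []
    for (a, b), n in pair_count.items():
        if n >= min_support:
            edges.append((a, b, n, n / item_count[a]))
            edges.append((b, a, n, n / item_count[b]))
    return sorted(edges)


for a, b, n, conf in association_edges(ARTICLES):
    if a == "cigarettes":
        print(f"{a} -> {b}: support={n}, confidence={conf:.2f}")
```

Raising `min_support` removes weak links. With more articles, the strongest edges point to the time frames, transport modes and methods most often reported together with each commodity.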
The advantages of text mining for self-learning in customs risk assessment are considerable, since it allows knowledge about previously unknown events to be extracted. Without text mining techniques, it would be nearly impossible to extract any knowledge or information from the collected material. A good example is the collection and analysis by the Macedonian Customs Administration of over 100 news articles totalling more than 200 pages (see Figure 12).