Guest Contribution: “Can Media and Text Analytics Provide Insights into Labour Market Conditions in China?”

Today we are pleased to present a guest contribution written by Jeannine Bailliu, Xinfen Han, Mark Kruger, Yu-Hsien Liu and Sri Thanabalasingam (all Bank of Canada). This research may support or challenge prevailing policy orthodoxy. Therefore, the views expressed in this paper are solely those of the authors and may differ from official Bank of Canada views. No responsibility for them should be attributed to the Bank.


Although issues have been raised with respect to many of China’s official statistics, those pertaining to the labour market have been seen as particularly problematic. The main problem is that while the official statistics capture formal employment, they do not appear to include migrant workers, who are typically engaged on an informal basis. This omission matters because migrant workers represent a large share of the labour force. It is generally agreed that the official unemployment rate underestimates the level of unemployment in China and has failed to capture what is known about key historical developments in China’s labour market. The official rate has remained fairly stable over time; notably, it did not increase by much during the global financial crisis (GFC) in spite of the significant job losses over that period.

Bailliu et al. (2018) use machine learning techniques, specifically text analytics, to construct a labour market conditions index (LMCI) for China by extracting labour market information from mainland Chinese-language newspapers over the period 2003 to 2017. We employ a supervised machine learning approach, training a support vector machine (SVM) in a two-stage process. In the first stage, we train our SVM to find articles that are relevant to the state of the Chinese labour market. In the second stage, we train the classifier to distinguish between articles that express positive and negative labour market sentiment.
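
The two-stage idea can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration rather than the setup used in the paper: the headlines are made-up English examples, the function names are hypothetical, the classifier is a small linear SVM without an intercept trained by sub-gradient descent on the hinge loss, and the final "net balance" aggregation is only one plausible way to turn article-level sentiment into an index.

```python
def tokenize(text):
    """Whitespace tokeniser; real Chinese text would need word segmentation."""
    return text.lower().split()


def score(weights, text):
    """Linear decision value: sum of the word weights over the article's tokens."""
    return sum(weights.get(tok, 0.0) for tok in tokenize(text))


def train_linear_svm(texts, labels, epochs=50, lr=0.1, lam=0.01):
    """Sub-gradient descent on the L2-regularised hinge loss (labels are +1/-1).

    Assumption: no intercept term, to keep the sketch short.
    """
    weights = {}
    for _ in range(epochs):
        for text, y in zip(texts, labels):
            violated = y * score(weights, text) < 1.0
            for tok in weights:  # shrinkage from the L2 penalty
                weights[tok] *= 1.0 - lr * lam
            if violated:  # margin violated: push the words towards class y
                for tok in tokenize(text):
                    weights[tok] = weights.get(tok, 0.0) + lr * y
    return weights


def predict(weights, text):
    return 1 if score(weights, text) > 0 else -1


# Made-up training set: (headline, relevant?, sentiment if relevant).
labelled = [
    ("factories hiring more staff as wages rise", 1, 1),
    ("job fair sees strong hiring demand", 1, 1),
    ("layoffs spread and unemployment climbs among migrant workers", 1, -1),
    ("factory closures leave workers without jobs", 1, -1),
    ("football team wins national championship", -1, None),
    ("new film breaks box office record", -1, None),
    ("heavy rain expected across the south", -1, None),
    ("smartphone maker unveils new model", -1, None),
]

# Stage 1: relevant vs. not relevant, trained on all articles.
relevance_model = train_linear_svm(
    [t for t, _, _ in labelled], [r for _, r, _ in labelled]
)

# Stage 2: positive vs. negative sentiment, trained on relevant articles only.
relevant = [(t, s) for t, r, s in labelled if r == 1]
sentiment_model = train_linear_svm(
    [t for t, _ in relevant], [s for _, s in relevant]
)


def classify_article(text):
    """Return +1/-1 sentiment, or None if stage 1 deems the article irrelevant."""
    if predict(relevance_model, text) != 1:
        return None
    return predict(sentiment_model, text)


def net_balance(texts):
    """Hypothetical index: (positive - negative) / relevant articles."""
    sentiments = [s for s in map(classify_article, texts) if s is not None]
    if not sentiments:
        return 0.0
    return sum(sentiments) / len(sentiments)


new_articles = [
    "wages rise as factories keep hiring",
    "unemployment climbs after factory closures",
    "job fair sees strong demand",
    "heavy rain across the south",
]
print(net_balance(new_articles))
```

With a linear classifier of this kind, each word carries a single weight towards one class or the other, so the learned dictionaries can be inspected directly to see which terms drive the relevance and sentiment decisions.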

We find that the behaviour of our LMCI appears to be consistent with the economic shocks that have impacted the Chinese labour market (Figure 1):


Figure 1: Labour Market Conditions Index for China (2003-2017). Source: Bailliu et al. (2018)

The usefulness of our LMCI will depend on the extent to which it captures direct measures of labour market outcomes. Moreover, the LMCI’s value added needs to be assessed against that of the official measures of labour market activity. We test the ability of the LMCI to explain and predict the behaviour of wages and credit against official measures: the registered urban unemployment rate, the urban labour demand-supply ratio and the employment sub-indices of the purchasing managers’ indices. Our results suggest that, although each of the official labour market indicators does contain some information for either wage or credit growth, the information in our LMCI is more consistent. Moreover, the LMCI provides wage and credit forecasts that are better than those from any single official labour market indicator.
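
One common way to compare the forecasting value of competing indicators is to compute out-of-sample forecast errors for each. The sketch below is a generic illustration under stated assumptions, not the procedure used in the paper: it fits a one-regressor OLS model on an expanding window, forecasts the target one period ahead from the lagged indicator, and reports the root mean squared error (RMSE). The series are made-up numbers and the function names are hypothetical.

```python
import math


def ols(x, y):
    """Fit y = a + b * x by ordinary least squares; returns (a, b)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx if sxx else 0.0  # constant regressor: fall back to the mean
    return mean_y - b * mean_x, b


def out_of_sample_rmse(indicator, target, first_forecast=3):
    """Forecast target[t] from indicator[t-1], using only data observed before t."""
    errors = []
    for t in range(first_forecast, len(target)):
        # Pair indicator[s] with target[s + 1] for every s up to t - 2.
        a, b = ols(indicator[: t - 1], target[1:t])
        errors.append(target[t] - (a + b * indicator[t - 1]))
    return math.sqrt(sum(e * e for e in errors) / len(errors))


# Made-up series: the target follows indicator_a with a one-period lag,
# while indicator_b is unrelated to it.
indicator_a = [1, 3, 2, 5, 4, 6, 5, 7]
indicator_b = [2, 2, 3, 2, 3, 2, 3, 2]
target = [3, 3, 7, 5, 11, 9, 13, 11]

rmse_a = out_of_sample_rmse(indicator_a, target)
rmse_b = out_of_sample_rmse(indicator_b, target)
print(rmse_a < rmse_b)
```

An indicator with the lower out-of-sample RMSE is the more useful predictor by this yardstick; in practice one would also test whether the difference is statistically significant.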

Since our dataset covers newspapers from a range of Chinese cities, we can also analyze how regional labour market conditions may vary. To test this, we construct two LMCI sub-indices: one for the export-oriented coastal provinces and a second for the remaining inland provinces. We find that exports are a predictor of labour market conditions in the coastal region (and for the country as a whole) but not for the inland region.

These results suggest that text analytics can be used to extract useful labour market information from Chinese media.


This post written by Jeannine Bailliu, Xinfen Han, Mark Kruger, Yu-Hsien Liu and Sri Thanabalasingam.

11 thoughts on “Guest Contribution: “Can Media and Text Analytics Provide Insights into Labour Market Conditions in China?”

  1. pgl

    I have to admit I know little about China’s labor market. I’m listening to Trump bloviate right now and he is just making up BS about the US labor market. It seems his tax cut was the biggest ever and his regulatory cuts were even bigger. And according to Trump – they have worked together to create the most explosive labor market ever. Blah, blah, blah. Oh wait – Kudlow is in the room so we can expect even bigger lies in a few minutes!

  2. 2slugbaits

    The authors identified the software package they used to translate the Chinese, but unfortunately they did not identify the SVM package they used. Was it one that they created? Was it some C++ wrapper around “libsvm”? It also wasn’t clear to me how many support vectors they used.

    1. Moses Herzog

      @ 2slugbaits
      I am only mildly ashamed to say you are talking above my head on SVM packages. However, I wonder if the answer you are looking for is in the Tobback2018 paper, as they say at least twice in the intro that they are “building off of” Tobback’s work. My guess is they have “tweaked” it ever so slightly from Tobback’s. Hope that helps with your question.

    2. Moses Herzog

      On the 3rd page of the Tobback paper (near bottom 2nd column) it says: “We use an SVM with a linear kernel and get the output of a linear model where each word is assigned a weight in favour of either class 1 (EPU) or -1 (no EPU)”

      Also, near the very end of the Tobback paper it says “To encourage further research on the influence of uncertainty on the economy, a daily updated version of the EPU SVM indicator can be downloaded from our website http://www.applieddatamining.com”

      And here (I think) is the specific link: http://applieddatamining.com/cms/?q=content/economic-policy-uncertainty-index

      Maybe if you download it, it tells more about the support vectors?? I was afraid it would take up huge amounts of computer storage.

  3. pgl

    Breaking news. Remember Michael Cohen’s claim that he had thousands of legal clients. Well make that only 10 clients with 7 of them being business consulting clients. So we were down to 3 – only 2 he was willing to disclose their names. The judge ordered Cohen to identify the third – and that would be Sean Hannity!

    The Onion is writing this entire tale!

  4. PeakTrader

    China’s working-age population is declining, while its retired population is rising. Both trends are accelerating because of the one-child policy, slow population growth, little immigration, etc. Median income is low and there isn’t much of a safety net (including for the disabled) or retirement benefits. China is heading into a demographic crisis – a rapidly shrinking poor working population with a rapidly growing poor retirement population.

  5. Moses Herzog

    We have some very smart commenters on this blog (you know who you are) because Menzie runs the content to attract the sharp cookies. So I need you guys help with something. I mean I am in dire need of help here.

    Do you guys know what “the fixin’ section” of the bookstore is?? I been thinking about this for the last 5–10 minutes and I’m going blank. What would “the fixin’ section” of the bookstore be?? Would that be for food?? “I’m fixin’ to make some waffles”?? Would it be like books about contract assassins “fixin’ to kill”?? See I really wanna know where “the fixin’ section” is because there is this former FBI guy who wrote a book, who is trying to save a country that used to be known for democracy, and save a country that used to take the better quality apples (i.e Jewish immigrants on the run) that Hitler was shaking out of the trees that is now being “led” by a orange-headed madman. And I been in Barnes and Noble, Borders books, B Daltons, Half-Price-Books, Waldenbooks and I never remember “the fixin’ section”. Is this some unusual discipline of learning Mike Huckabee acquired as a child in special ed??
    https://www.youtube.com/watch?v=1TX3hJ9xLY8

    P.S. Menzie, remember when I told you I knew many mainland Chinese who had never left their country who could speak better English than “native” English speakers?? Of those “native” English speakers who many mainland Chinese can speak better than, I think we can safely add “SHS” (the disheveled housewife with 6 empty Hostess snacks boxes on her living room coffee table) to this list.
