Research Assistant in natural language processing

I have a 10-hour Graduate Research Assistant position available in spring 2021 (UT students only). The project will examine bias and social stereotypes in the nonprofit sector using computational methods. Publication and authorship are possible, depending on contribution. The successful applicant is expected to have the following qualifications:

  • Python as the primary coding language.
  • Familiarity with web crawling and markup languages (e.g., XML and HTML).
  • Proficiency in natural language processing, particularly contextual word embeddings (e.g., BERT).
  • Ability to examine and process bias in word embeddings (e.g., ).
  • Knowledge of sociology or psychology is a plus.

Please send 1) your CV, 2) a letter stating your qualifications (300 words max), and 3) source code from a project of yours that meets the qualifications, to

Application deadline: 2021-01-10

Build your own computing cluster on ChameleonCloud

Social scientists also run heavy computational jobs. In one of my projects, I need to analyze the psychological states expressed in a few billion Telegram messages. ChameleonCloud provides hosts with up to 64 cores (or “threads”, sometimes “workers”; yes, these terms are confusing, but CS folks are to blame). But even with parallel computing on the best server, the job would run for years, and I need this project for tenure.


Operating large files on ChameleonCloud

I primarily use Chameleon Cloud (CC) for my research projects. It provides great flexibility because I can run bare-metal servers (e.g., 44 threads/cores, 128 GB+ RAM) on a seven-day lease, which is renewable if the hosts I’m using are not booked by others. Its support team is also amazing.

But everything becomes slow when you work with a really big dataset. For example, I’m working on a Telegram project and have 1 TB+ of data, which really gives me a headache. The CC machines can handle this, but they need extra configuration.
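As a rough illustration of why big files need special handling, here is a minimal Python sketch of one generic technique: streaming a text file in fixed-size batches so that memory use stays bounded no matter how large the file is. This is not the CC configuration discussed in the full post. The function names `iter_batches` and `count_nonempty_lines`, the default batch size, and the one-record-per-line layout are all assumptions made for the example.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def iter_batches(lines: Iterable[str], batch_size: int) -> Iterator[List[str]]:
    """Yield lists of at most `batch_size` items from `lines`.

    Hypothetical helper: only one batch is held in memory at a time.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(lines)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def count_nonempty_lines(path: str, batch_size: int = 100_000) -> int:
    """Count non-empty lines in a UTF-8 text file, one batch at a time.

    Assumes one record per line; the counting step stands in for
    whatever per-record analysis a real project would run.
    """
    total = 0
    with open(path, "r", encoding="utf-8") as handle:
        # A file object is a lazy iterator over lines, so the whole
        # file is never read into memory at once.
        for batch in iter_batches(handle, batch_size):
            total += sum(1 for line in batch if line.strip())
    return total
```

Each batch could also be handed to a separate worker process, which is where a many-core host starts to pay off.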


Lineage–the Yangs

In August 2019, we finished our fieldwork in two rural villages in southeast China. The graph below shows how the self-governance organizations are woven together through local elites (xiangxian). I wrote a non-academic article introducing our work, which was featured in the Nonprofit Academic Centers Council’s monthly newsletter and on IC2’s website. You can read the full article here.



  • Schedule: TBD
  • Location: TBD





Continue reading “Basic etiquette for attending academic conferences”

[Voluntas] A Century of Nonprofit Studies: Scaling the Knowledge of the Field

I started working on this project in early 2015, and the first paper was finally accepted by Voluntas today, which is my birthday on the civil calendar. Our family tradition is to use the Chinese lunar calendar, but it is still a nice gift.

Sara and I started working on the first draft at Mo’Joe Coffeehouse, which closed permanently in June this year. Another coffeehouse, Thirsty Scholar, also closed around the same time. I have lots of memories with friends in both places.

There are at least three versions of this paper. The first draft relied almost entirely on a citation analysis software package named CiteSpace. It was a very simple paper, but it helped me get familiar with the relevant concepts and methodology, and it cleaned part of the dataset used in the final analysis. In the second draft, I started to write Python scripts for processing and analyzing data. In early 2017, while I was waiting for my wife, parents, and parents-in-law at Kuala Lumpur airport to start a wonderful journey in Malaysia, I received a rejection from a journal. Then I tried to rewrite the whole paper to analyze the literature published in the last century. I still remember the classroom in which I crawled the first few hundred records: it was on the first floor of Teaching Building 2 at Beijing Normal University, where I also spent many nights preparing my Ph.D. application. I then had lunch with a good friend who had returned from UPenn about a year earlier. She said she felt her heart was at peace, and she was sure about the direction of her career. That was a day in March, and it was snowing heavily in Beijing.

In late June of this year, I submitted the third draft to Voluntas from the office at IQSS, where Prof. Peter Bol treated me so well. We received a “minor revision” decision in early August, and I had a phone call with Sara on the third day after moving to Austin. Life was pretty hectic.

For me, a paper has two meanings: the words and numbers for reviewers and readers, and the memories for myself. All things grow; I’m waiting and watching (万物并作,吾以观复).


This empirical study examines knowledge production between 1925 and 2015 in nonprofit and philanthropic studies from quantitative and thematic perspectives. Quantitative results suggest that scholars in this field have been actively generating a considerable amount of literature and a solid intellectual base for developing this field towards a new discipline. Thematic analyses suggest that knowledge production in this field is also growing in cohesion – several main themes have been formed and actively advanced since the 1980s, and the study of volunteering can be identified as a unique core theme of this field. The lack of geographic and cultural diversity is a critical challenge for advancing nonprofit studies. New paradigms are needed for developing this research field and mitigating the tension between academia and practice. Methodological and pedagogical implications, limitations, and future studies are discussed.

Keywords: nonprofit and philanthropic studies; network analysis; knowledge production; paradigm shift; science mapping


Datasets in “state power and elite autonomy in a networked civil society”

The paper State power and elite autonomy in a networked civil society: The board interlocking of Chinese non-profits is published in Social Networks (Open Access; you can get the paper free of charge because we’ve paid for the knowledge we produced). Here are the hand-coded datasets used in the paper. You are welcome to use them as long as you give appropriate attribution.

All the datasets used in this paper are open to use, review, or replicate. Feel free to send me an email if you need more information.


The research infrastructure of Chinese foundations, a database for Chinese civil society studies @Scientific Data

Ma, J., Wang, Q., Dong, C., & Li, H. (2017). The research infrastructure of Chinese foundations, a database for Chinese civil society studies. Scientific Data, 4, sdata201794.
