Data and Software

npoclass – Classify nonprofits using NTEE codes

This research developed a machine-learning classifier that reliably automates the coding process, using the National Taxonomy of Exempt Entities (NTEE) as a schema, and remapped the U.S. nonprofit sector. The classifier achieved 90% overall accuracy in classifying nonprofits into nine broad categories and 88% in classifying them into 25 major groups. Intercoder reliability between the algorithm and human coders, measured by kappa statistics, falls within the "almost perfect" range of 0.80–1.00. The results suggest that a state-of-the-art machine-learning algorithm can approximate human coders and substantially improve researchers' productivity. I also assigned multiple category codes to over 439,000 nonprofits and discovered a considerable number of organizational activities that were previously overlooked. The classifier is an essential methodological prerequisite for large-N and Big Data analyses, and the remapped U.S. nonprofit sector can serve as an important instrument for asking new, or reexamining fundamental, questions in nonprofit studies.

Citation: Ma, Ji. 2020. "Automated Coding Using Machine-Learning and Remapping the U.S. Nonprofit Sector: A Guide and Benchmark." Nonprofit and Voluntary Sector Quarterly, forthcoming.

The Research Infrastructure of Chinese Foundations (RICF)

“A database of Chinese foundations, civil society, and social development in general. The structure of the RICF is deliberately designed and normalized according to the Three Normal Forms. The database schema consists of three major themes: foundations’ basic organizational profile (i.e., basic profile, board member, supervisor, staff, and related party tables), program information (i.e., program information, major program, program relationship, and major recipient tables), and financial information (i.e., financial position, financial activities, cash flow, activity overview, and large donation tables).”

Visit the project website for more information (as of May 2019, I no longer update this project regularly).

Citation: Ma, J., Wang, Q., Dong, C., and Li, H. (2017). The research infrastructure of Chinese foundations, a database for Chinese civil society studies. Scientific Data, 4:170094.


Datasets in "State Power and Elite Autonomy in a Networked Civil Society"

“In response to failures of central planning, the Chinese government has experimented not only with free-market trade zones, but with allowing non-profit foundations to operate in a decentralized fashion. A network study shows how these foundations have connected together by sharing board members, in a structural parallel to what is seen in corporations in the United States and Europe. This board interlocking leads to the emergence of an elite group with privileged network positions. While the presence of government officials on non-profit boards is widespread, government officials are much less common in a subgroup of foundations that control just over half of all revenue in the network. This subgroup, associated with business elites, not only enjoys higher levels of within-elite links, but even preferentially excludes government officials from the NGOs with higher degree. The emergence of this structurally autonomous sphere is associated with major political and social events in the state–society relationship. Cluster analysis reveals multiple internal components within this sphere that share similar levels of network influence. Rather than a core-periphery structure centered around government officials, the Chinese non-profit world appears to be a multipolar one of distinct elite groups, many of which achieve high levels of independence from direct government control.”

Citation: Ma, J., & DeDeo, S. (2018). State power and elite autonomy in a networked civil society: The board interlocking of Chinese non-profits. Social Networks, 54, 291–302.

Bibliographic Records of Nonprofit and Philanthropic Studies (1925–2016)

A collection of 12,016 bibliographic records from 19 journals worldwide, published between 1925 and 2016 (inclusive). Each record contains data fields including the title of the citing article, author names, publication year, publication title, and the article's cited references. The cited references comprise 311,212 entries representing journal articles, books, dissertations, technical reports, and other works. Because of copyright restrictions, the raw dataset cannot be posted publicly.

Citation: Ma, J., & Konrath, S. (2018). A century of nonprofit studies: Scaling the knowledge of the field. Voluntas, 29, 1139–1158.

A list of papers on nonprofit and philanthropic education research.

Full-text corpus of People's Daily (1946–2017)

The corpus includes over 1.6 million full-text records of People's Daily, the official newspaper of the Chinese Communist Party. It is useful for research in political science, public administration, sociology, nonprofit studies, and related fields. Because of copyright restrictions, the raw dataset cannot be posted publicly.