2023 Spring Course: Computational Social Science Methods (Text and Network Analysis)

The course may also be cross-listed with UT's sociology, information, and government departments.

The course has demanding prerequisites (https://css.jima.me/prerequisites/), and I hope to recruit highly motivated students to join a self-learning group that helps them meet the requirements. Students interested in this course should prepare over the summer and fall of 2022 so that they meet the prerequisites before spring 2023. Course details are below.

Computational Social Science Methods (2023 Spring, graduate course)

This course introduces computational social science methods and situates them within social science research design. The first part of the course (w1–w3) gives an overview of the course and covers programming fundamentals and the use of high-performance cloud computing resources (https://www.tacc.utexas.edu/systems/chameleon). The second part (w4–w12) is analysis-oriented: it primarily covers text analysis (w4–w8, with an emphasis on multilingual analysis) and network analysis (w9–w12). The final weeks focus on research design with computational methods and on the final project. Bilingual or multilingual language ability is a plus. Programming is an essential part of this course, but it is not the course's purpose and will not be taught. We will be coding for social good.

Because the course has demanding prerequisites (https://css.jima.me/prerequisites), highly motivated students may need to work on them during summer and fall 2022. All registrations must be approved by the instructor in late fall 2022. Students interested in this course can join the learning group (https://uta-css.slack.com/), where more learning resources will be shared.

Research Assistant in natural language processing

I have a 10-hour Graduate Research Assistant position available in spring 2021 (for UT students only). The project will examine bias and social stereotypes in the nonprofit sector using computational methods. Publication and authorship are possible depending on contribution. The successful applicant is expected to have the following qualifications:

  • Use Python as a primary programming language.
  • Be familiar with web crawling and markup languages (e.g., XML and HTML).
  • Be proficient in natural language processing, particularly contextual word embeddings (e.g., BERT).
  • Know how to examine and mitigate bias in word embeddings (e.g., https://arxiv.org/abs/1607.06520).
  • Knowledge of sociology or psychology is a plus.
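For applicants unfamiliar with the linked paper (Bolukbasi et al., arXiv:1607.06520), here is a rough sketch of the core idea, not the project's actual pipeline. It uses made-up 3-dimensional vectors in a hypothetical `emb` dictionary, estimates a gender direction from a single word pair (the paper uses PCA over several definitional pairs), scores words by cosine similarity with that direction, and applies a "neutralize" step that removes the component along it. Applying this to contextual embeddings such as BERT requires extra choices (e.g., how to pool token vectors) that the sketch does not address.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def normalize(v: Sequence[float]) -> Vector:
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two vectors."""
    return sum(a * b for a, b in zip(normalize(u), normalize(v)))


def bias_direction(emb: Dict[str, Vector], pair: Tuple[str, str]) -> Vector:
    """Unit vector from the second word toward the first.

    Simplification: one word pair, no PCA over multiple definitional pairs.
    """
    a, b = pair
    return normalize([x - y for x, y in zip(emb[a], emb[b])])


def neutralize(v: Sequence[float], direction: Sequence[float]) -> Vector:
    """Remove the component of v along a unit-length bias direction."""
    dot = sum(a * d for a, d in zip(v, direction))
    return [a - dot * d for a, d in zip(v, direction)]


# Toy, made-up vectors; real work would load trained embeddings instead.
emb: Dict[str, Vector] = {
    "he": [1.0, 0.2, 0.0],
    "she": [-1.0, 0.2, 0.0],
    "executive": [0.5, 0.6, 0.2],
    "volunteer": [-0.3, 0.8, 0.1],
}

gender = bias_direction(emb, ("he", "she"))
for word in ("executive", "volunteer"):
    # Positive scores lean toward "he", negative toward "she" in this toy space.
    print(word, round(cosine(emb[word], gender), 3))

debiased = neutralize(emb["volunteer"], gender)
print("volunteer after neutralize:", round(cosine(debiased, gender), 3))
```

After neutralizing, the word's projection on the bias direction is zero while its other components are untouched, which is the intuition behind the "hard debiasing" step in the paper.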

Please send (1) your CV, (2) a letter describing your qualifications (300 words max), and (3) source code from a project that demonstrates these qualifications to maji@austin.utexas.edu.

Application deadline: 2021-01-10

Build your own computing cluster on ChameleonCloud

Social scientists run heavy computational jobs, too. In one of my projects, I need to analyze the psychological states expressed in a few billion Telegram messages. ChameleonCloud provides hosts with up to 64 cores (or “threads,” sometimes “workers”; yes, these terms are confusing, but blame the CS folks). Even with parallel computing on the best server, however, the job would run for years, and I need this project done for tenure.

Continue reading “Build your own computing cluster on ChameleonCloud”

Operating large files on ChameleonCloud

I primarily use Chameleon Cloud (CC) for my research projects. It offers great flexibility: I can run bare-metal servers (e.g., 44 threads/cores and 128 GB+ of RAM) on a seven-day lease, which is renewable if the hosts I'm using are not booked by others. Its support team is also amazing.

But everything becomes slow when you work with a really big dataset. For example, my Telegram project involves more than 1 TB of data, which has given me quite a headache. The CC machines can handle this, but they need extra configuration.
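The CC-specific configuration is covered in the full post. Independent of any machine setup, one general habit helps with files this large: stream records instead of loading the whole file into memory. The sketch below assumes the messages are stored as JSON Lines with a hypothetical `channel` field; both the format and the field name are illustrative assumptions, not the project's actual schema.

```python
import json
from collections import Counter
from typing import Iterable, Iterator


def iter_records(lines: Iterable[str]) -> Iterator[dict]:
    """Parse JSON Lines lazily, holding one record in memory at a time."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        yield json.loads(line)


def count_by_channel(lines: Iterable[str]) -> Counter:
    """Aggregate incrementally: memory grows with distinct channels, not messages.

    Records without the (hypothetical) "channel" field are counted as "unknown".
    """
    counts: Counter = Counter()
    for record in iter_records(lines):
        counts[record.get("channel", "unknown")] += 1
    return counts
```

Because a Python file object is itself an iterator over lines, passing it straight in streams the file rather than reading it all at once, e.g. `with open("messages.jsonl", encoding="utf-8") as f: counts = count_by_channel(f)` (the file name is hypothetical).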

Continue reading “Operating large files on ChameleonCloud”

Lineage–the Yangs

In August 2019, we finished our fieldwork in two rural villages in southeastern China. The graph below shows how self-governance organizations are woven together through local elites (xiangxian). I wrote a non-academic article introducing our work, which was featured in the Nonprofit Academic Centers Council's monthly newsletter and on IC2's website. You can read the full article here.

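2020 Summer Mixed-Methods Research Training: Related Materials

2020 Summer Training

  • Schedule: TBD
  • Location: TBD

Past Training Schedules and Related Materials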

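Basic Etiquette for Attending Academic Conferences

I just attended the 2019 MPSA (Midwest Political Science Association) Annual Conference. Looking back on the many conferences I have attended across different fields over the years, I have a lot of thoughts, but what struck me most has nothing to do with scholarship: it is the wide variety of people I have dealt with at these meetings. Among them were academic giants and unknown students, arrogant clever people and humble doers. At every academic conference, the number of Chinese scholars attending has surged over the past few years. Drawing on my own conference experience, this short essay summarizes some basic etiquette.

I should note that I am not writing this because Chinese scholars have poor conference etiquette. Wherever they come from, some scholars have poor conference etiquette, and there are also role models well worth learning from. As a Chinese scholar myself, I hope we can show the world not only our first-class research but also our first-class conduct.

Continue reading “Basic Etiquette for Attending Academic Conferences”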

[Voluntas] A Century of Nonprofit Studies: Scaling the Knowledge of the Field

I started working on this project in early 2015, and the first paper was finally accepted by Voluntas today, which is my birthday on the civil (Gregorian) calendar. Although our family tradition is to use the Chinese lunar calendar, it is still a nice gift.

Sara and I started working on the first draft at Mo'Joe Coffeehouse, which closed permanently in June of this year. Another coffeehouse, Thirsty Scholar, also closed around the same time. I have lots of memories with friends in both places.

There are at least three versions of this paper. The first draft relied almost entirely on a citation analysis software package named CiteSpace. It was a very simple paper, but it helped me get familiar with the relevant concepts and methodology, and through it I cleaned part of the dataset used in the final analysis. In the second draft, I started writing Python scripts to process and analyze the data. In early 2017, while I was waiting at Kuala Lumpur airport for my wife, parents, and parents-in-law so we could start a wonderful journey in Malaysia, I received a rejection from a journal. I then rewrote the whole paper to analyze the literature published over the past century. I still remember the classroom where I crawled the first few hundred records: a classroom on the first floor of Teaching Building 2 at Beijing Normal University, where I had also spent many nights preparing my Ph.D. application. Afterward, I had lunch with a good friend who had returned from UPenn about a year earlier. She said her heart was at peace and she was sure about the direction of her career. It was a day in March, and Beijing was snowing heavily.

In late June of this year, I submitted the third draft to Voluntas from the office at IQSS, where Prof. Peter Bol treated me so well. We received a “minor revision” decision in early August, and I had a phone call with Sara on the third day after moving to Austin. Life was pretty hectic.

For me, a paper has two meanings: the words and numbers for reviewers and readers, and the memories for myself. All things grow; I'm waiting and watching (万物并作,吾以观复).

Abstract

This empirical study examines knowledge production between 1925 and 2015 in nonprofit and philanthropic studies from quantitative and thematic perspectives. Quantitative results suggest that scholars in this field have been actively generating a considerable amount of literature and a solid intellectual base for developing this field towards a new discipline. Thematic analyses suggest that knowledge production in this field is also growing in cohesion – several main themes have been formed and actively advanced since the 1980s, and the study of volunteering can be identified as a unique core theme of this field. The lack of geographic and cultural diversity is a critical challenge for advancing nonprofit studies. New paradigms are needed for developing this research field and mitigating the tension between academia and practice. Methodological and pedagogical implications, limitations, and future studies are discussed.

Keywords: nonprofit and philanthropic studies; network analysis; knowledge production; paradigm shift; science mapping

Fulltext: https://papers.ssrn.com/abstract=2834121

Datasets in “state power and elite autonomy in a networked civil society”

The paper “State power and elite autonomy in a networked civil society: The board interlocking of Chinese non-profits” is published in Social Networks (open access: you can read the paper free of charge because we've paid for the knowledge we produced). Here are the hand-coded datasets used in the paper. You are welcome to use them as long as you give appropriate attribution.

All the datasets used in this paper are open for use, review, and replication. Feel free to send me an email if you need more information.
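As a rough illustration of what "board interlocking" means computationally, the sketch below turns person-to-organization board memberships into an organization-to-organization network: two nonprofits are linked when they share at least one board member, and the edge weight counts the shared members. The input format (a list of `(person, organization)` pairs) and the `interlock_edges` helper are hypothetical; they do not describe the schema of the published datasets or the paper's exact procedure.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, Iterable, Set, Tuple


def interlock_edges(
    affiliations: Iterable[Tuple[str, str]],
) -> Dict[Tuple[str, str], int]:
    """Project (person, organization) board ties onto organization pairs.

    Two organizations are interlocked when at least one person sits on both
    boards; the edge weight counts the shared board members.
    """
    boards: Dict[str, Set[str]] = defaultdict(set)  # person -> organizations
    for person, org in affiliations:
        boards[person].add(org)  # a set ignores duplicate rows

    edges: Dict[Tuple[str, str], int] = defaultdict(int)
    for orgs in boards.values():
        # sorted() gives each organization pair one canonical key
        for a, b in combinations(sorted(orgs), 2):
            edges[(a, b)] += 1
    return dict(edges)


# Hypothetical hand-coded rows, not taken from the published datasets.
rows = [
    ("Li", "Org A"), ("Li", "Org B"),
    ("Wang", "Org A"), ("Wang", "Org B"), ("Wang", "Org C"),
    ("Zhao", "Org C"),
]
print(interlock_edges(rows))
```

If you already work in NetworkX, `networkx.algorithms.bipartite.weighted_projected_graph` performs a comparable weighted one-mode projection on a bipartite person-organization graph.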

Continue reading “Datasets in “state power and elite autonomy in a networked civil society””