Sitemap

A list of all the posts and pages found on the site. For you robots out there, an XML version is available for digesting as well.

Pages

Posts

Synthetic DID and its implementation in Stata

9 minute read

Published:

As the name suggests, synthetic DID combines synthetic control and difference-in-differences (DID). One main challenge of DID is validating the parallel trends assumption, because the treated and control groups do not always share the same trend or characteristics before the treatment event. Building on the synthetic control method, synthetic DID relaxes the need for parallel trends by constructing matched control units.

OCR Tools in Python: Azure Vision and Tesseract

7 minute read

Published:

Optical Character Recognition (OCR) is a technology that enables computers to read text from images. OCR tools have become increasingly popular due to their ability to automate data entry and extract information from scanned documents. In this blog post, I will cover the usage of two OCR tools in Python: Azure Vision and Tesseract.

Accelerating Data Processing in Python with Parallelism

3 minute read

Published:

Parallel data processing is a technique used to speed up data processing tasks by dividing the data into smaller chunks and processing them simultaneously on multiple processors. It is useful as long as the tasks do not need to run sequentially. In contrast to threading, which suits I/O-bound tasks but not CPU-bound ones, multiprocessing runs on multiple cores and helps when tasks are CPU-bound. Python provides several ways to implement parallel data processing.

“Difference-in-Differences When Parallel Trends Might Be Violated” by Jonathan Roth

6 minute read

Published:

The validity of difference-in-differences relies on the parallel trends (PT) assumption: in the absence of the treatment, the treatment and control groups would have followed a similar trend over time. This assumption allows us to attribute any difference in outcomes after the treatment to the treatment itself, rather than to other confounding factors.

How to Build Your Personal Website?

6 minute read

Published:

Designing a personal website may appear challenging, yet it is much simpler than it seems. In this article, I will share my experience of creating my website.

How ChatGPT Changes Our Life?

4 minute read

Published:

ChatGPT has been in the spotlight recently, garnering significant attention from people across the world. In this article, I will share how ChatGPT has changed my daily workflow and pose open questions for further discussion.

How to Customize My Plot with Matplotlib?

7 minute read

Published:

Matplotlib is a powerful data visualization library in Python that offers many customization options for plotting. In this post, I will introduce some of the most common customization options in Matplotlib.

ESG Ratings: KLD data

3 minute read

Published:

In this post, I will introduce the ESG ratings provided by KLD (Kinder, Lydenberg, and Domini).

Word Embedding (I): Count-based Word Vectors

14 minute read

Published:

If you are interested in NLP (natural language processing), you have probably heard of word embedding. In simple words, we use word embedding to map words to real-valued vectors. It is the first step for computers to understand human language.
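
To make "mapping words to vectors" concrete, here is a minimal sketch of one common count-based approach: counting how often each word appears next to every other word. The function name `cooccurrence_vectors`, the `window` parameter, and the toy sentences are all illustrative assumptions, not code from the post itself.

```python
def cooccurrence_vectors(sentences, window=1):
    """Build count-based word vectors from tokenized sentences.

    Each word is represented by how often every vocabulary word
    appears within `window` positions of it (hypothetical helper).
    """
    # Sorted vocabulary gives every word a fixed vector position.
    vocab = sorted({word for sentence in sentences for word in sentence})
    index = {word: i for i, word in enumerate(vocab)}
    vectors = {word: [0] * len(vocab) for word in vocab}

    for sentence in sentences:
        for i, word in enumerate(sentence):
            start = max(0, i - window)
            stop = min(len(sentence), i + window + 1)
            for j in range(start, stop):
                if j != i:  # a word is not its own context
                    vectors[word][index[sentence[j]]] += 1
    return vocab, vectors


if __name__ == "__main__":
    toy_corpus = [["i", "like", "nlp"], ["i", "like", "python"]]
    vocab, vectors = cooccurrence_vectors(toy_corpus, window=1)
    print(vocab)
    for word in vocab:
        print(word, vectors[word])
```

In this toy corpus, "nlp" and "python" end up with the same vector because they appear in the same context, which is the intuition behind count-based embeddings.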

Evaluate Performance of Classification Models and its Visualization in Python

10 minute read

Published:

If you are developing models for classification problems, you have probably heard of metrics like accuracy, precision, recall, and F1-score for performance evaluation. You could easily look up their meanings through a quick search on Google, but you may end up with lines of formulas while still wondering why we need so many metrics. In this article, I will try to explain the rationale behind those metrics and share the Python code for their computation and visualization.
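
As a quick illustration of why a single metric is not enough, here is a minimal sketch that computes the four metrics from scratch for a binary problem. The helper name `classification_metrics` and the toy labels are assumptions made for this example; the post itself may compute them differently (for instance with a library).

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall, and F1 for binary labels.

    Hypothetical helper for illustration; undefined ratios
    (zero denominators) are reported as 0.0.
    """
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive and truth != positive:
            fp += 1
        elif pred != positive and truth == positive:
            fn += 1
        else:
            tn += 1

    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Imbalanced toy data: only 2 of 10 samples are positive.
    y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
    y_pred = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]  # always predicts negative
    print(classification_metrics(y_true, y_pred))
```

A model that always predicts the negative class scores high accuracy on this imbalanced toy data, yet its recall is zero, which is one reason the other metrics exist.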

portfolio

publications

talks

teaching