Posts by Tags

ESG

ESG Ratings: KLD data

3 minute read

Published:

In this post, I will introduce the ESG ratings provided by KLD (Kinder, Lydenberg, and Domini).

Econometrics

Synthetic DID and its implementation in Stata

9 minute read

Published:

As the name suggests, synthetic DID combines synthetic control and DID. One main challenge of DID is validating the parallel trends assumption, because treated and control groups do not always share identical trends or characteristics before the treatment event. Building on the synthetic control method, synthetic DID relaxes the need for the parallel trends assumption by constructing matched control units.

“Difference-in-Differences When Parallel Trends Might Be Violated” by Jonathan Roth

6 minute read

Published:

The validity of difference-in-differences relies on the parallel trends (PT) assumption: in the absence of the treatment, the treatment and control groups would have followed similar trends over time. This assumption allows us to attribute any difference in outcomes after the treatment to the treatment itself, rather than to other confounding factors.

ML

Evaluate Performance of Classification Models and its Visualization in Python

10 minute read

Published:

If you are developing models for classification problems, you have probably heard of metrics such as accuracy, precision, recall, and F1-score for performance evaluation. You can easily look up their definitions with a quick search on Google, but you may end up with lines of formulas and still wonder why we need so many metrics. In this article, I will try to explain the rationale behind those metrics and share Python code for computing and visualizing them.
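As a rough illustration of how these four metrics relate, here is a minimal sketch for binary labels in plain Python. The function name `classification_metrics` and the toy labels are my own assumptions for illustration and are not taken from the post itself.

```python
def classification_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall and F1 for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    # Precision: of everything predicted positive, how much was right?
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: of everything actually positive, how much was found?
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1: harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical toy labels, for illustration only.
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

Accuracy counts every correct prediction, while precision and recall look only at the positive class from two different angles, which is why they can diverge on imbalanced data.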

NLP

OCR Tools in Python: Azure Vision and Tesseract

7 minute read

Published:

Optical Character Recognition (OCR) is a technology that enables computers to read text from images. OCR tools have become increasingly popular due to their ability to automate data entry and extract information from scanned documents. In this blog post, I will cover the usage of two OCR tools in Python:

How ChatGPT Changes Our Life?

4 minute read

Published:

ChatGPT has been under the spotlight in recent times, garnering significant attention from people across the world. In this article, I will share with you how ChatGPT changes my daily workflow, and pose open questions for further discussion.

Word Embedding (I): Count-based Word Vectors

14 minute read

Published:

If you are interested in NLP (natural language processing), you have probably heard of word embedding. Put simply, we use word embedding to map words to real-valued vectors. It is the first step for computers to understand human language.

Python

OCR Tools in Python: Azure Vision and Tesseract

7 minute read

Published:

Optical Character Recognition (OCR) is a technology that enables computers to read text from images. OCR tools have become increasingly popular due to their ability to automate data entry and extract information from scanned documents. In this blog post, I will cover the usage of two OCR tools in Python:

Accelerating Data Processing in Python with Parallelism

3 minute read

Published:

Parallel data processing is a technique that speeds up data processing tasks by dividing the data into smaller chunks and processing them simultaneously on multiple processors. It is useful as long as the tasks do not need to run sequentially. In contrast to threading, which suits I/O-bound tasks but not CPU-bound ones, multiprocessing runs on multiple cores and helps when tasks are CPU-bound. Python provides several ways to implement parallel data processing.
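The chunk-and-process idea can be sketched with the standard library's `multiprocessing.Pool`. This is only one of several possible approaches, and the helper names (`split_into_chunks`, `sum_of_squares`, `parallel_sum_of_squares`) and the workload are assumptions chosen for illustration, not code from the post.

```python
from multiprocessing import Pool


def split_into_chunks(data, n_chunks):
    """Split a list into at most n_chunks contiguous, roughly equal parts."""
    size = max(1, -(-len(data) // n_chunks))  # ceiling division
    return [data[i:i + size] for i in range(0, len(data), size)]


def sum_of_squares(chunk):
    """CPU-bound work that can run on each chunk independently."""
    return sum(x * x for x in chunk)


def parallel_sum_of_squares(data, n_workers=4):
    """Process the chunks in separate worker processes, then combine the results."""
    chunks = split_into_chunks(data, n_workers)
    with Pool(processes=n_workers) as pool:
        partial_sums = pool.map(sum_of_squares, chunks)
    return sum(partial_sums)


if __name__ == "__main__":
    # The guard is required on platforms that start workers by re-importing this module.
    print(parallel_sum_of_squares(list(range(1_000_000))))
```

For small inputs the cost of starting processes and transferring data can outweigh the gain, so this pattern tends to pay off only when each chunk carries substantial CPU work.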

How to Customize My Plot with Matplotlib?

7 minute read

Published:

Matplotlib is a powerful data visualization library in Python that offers many customization options for plotting. In this post, I will introduce some of the most common customization options in Matplotlib.

Evaluate Performance of Classification Models and its Visualization in Python

10 minute read

Published:

If you are developing models for classification problems, you have probably heard of metrics such as accuracy, precision, recall, and F1-score for performance evaluation. You can easily look up their definitions with a quick search on Google, but you may end up with lines of formulas and still wonder why we need so many metrics. In this article, I will try to explain the rationale behind those metrics and share Python code for computing and visualizing them.

Website

How to Build Your Personal Website?

6 minute read

Published:

Designing a personal website may appear challenging, yet it is much simpler than it seems. In this article, I will share my experience of creating my own website.