Blog posts

I have benefited a lot from reading the blogs of others, for which I am deeply grateful. As a way to pay it forward, I would like to share some of what I have learned with you here. All feedback and suggestions are welcome!

View it by tags: Econometrics ESG ML NLP Python Website

2024

NLP in Finance and Accounting (VI): Generative LLM

7 minute read

Published: April 28, 2024

This is the sixth post of the series on NLP in finance and accounting. In this post, we will talk about the use of generative large language models (LLMs) in finance research.

Rise of Generative LLM
Applications of Generative LLM
Practical Advice

Synthetic DID and its implementation in Stata

9 minute read

Published: January 22, 2024

As the name suggests, synthetic DID combines synthetic control and DID. A main challenge of DID is validating the parallel trends assumption, because treated and control groups do not always share the same trends or characteristics before the treatment event. Building on the synthetic control method, synthetic DID relaxes the reliance on the parallel trends assumption by constructing matched control units.

2023

OCR Tools in Python: Azure Vision and Tesseract

8 minute read

Published: September 24, 2023

Optical Character Recognition (OCR) is a technology that enables computers to read text from images. OCR tools have become increasingly popular due to their ability to automate data entry and extract information from scanned documents. In this blog post, I will cover the usage of two OCR tools in Python:

Azure AI Vision
Tesseract

Accelerating Data Processing in Python with Parallelism

3 minute read

Published: June 28, 2023

Parallel data processing speeds up data processing tasks by dividing the data into smaller chunks and processing them simultaneously on multiple processors. It is useful as long as the tasks do not need to run sequentially. In contrast to threading, which suits I/O-bound tasks but not CPU-bound ones, multiprocessing runs on multiple cores and helps when tasks are CPU-bound. Python provides several ways to implement parallel data processing.
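As a minimal sketch of the chunk-and-combine idea, assuming a toy CPU-bound task (summing squares), the standard library's `multiprocessing.Pool` can process the chunks in separate worker processes. The function names, chunk size, and worker count below are illustrative choices, not taken from the post itself.

```python
from multiprocessing import Pool


def sum_of_squares(chunk):
    """Toy CPU-bound work applied independently to one chunk."""
    return sum(x * x for x in chunk)


def split_into_chunks(data, chunk_size):
    """Split a list into consecutive chunks of at most chunk_size items."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def parallel_sum_of_squares(data, n_workers=2, chunk_size=1000):
    """Process the chunks in worker processes, then combine the partial results."""
    chunks = split_into_chunks(data, chunk_size)
    with Pool(processes=n_workers) as pool:
        partial_sums = pool.map(sum_of_squares, chunks)
    return sum(partial_sums)


if __name__ == "__main__":
    # The guard keeps worker processes from re-running this section on import.
    data = list(range(10_000))
    assert parallel_sum_of_squares(data) == sum(x * x for x in data)
```

This only works because each chunk can be processed without knowing the result of any other chunk; for small inputs the cost of starting processes can outweigh the gain.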

“Difference-in-Differences When Parallel Trends Might Be Violated” by Jonathan Roth

6 minute read

Published: May 28, 2023

The validity of difference-in-differences relies on the parallel trends (PT) assumption. Namely, in the absence of the treatment, the treatment and control groups would have followed a similar trend over time. This assumption allows us to attribute any difference in outcomes after the treatment to the treatment itself, rather than to other confounding factors.
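For a two-period, two-group setting, the assumption is commonly written as follows. The notation is an assumption made here for illustration: $Y_{i,t}(0)$ is the untreated potential outcome of unit $i$ in period $t$, and $D_i = 1$ marks the treated group.

```latex
\mathbb{E}\left[ Y_{i,\text{post}}(0) - Y_{i,\text{pre}}(0) \mid D_i = 1 \right]
=
\mathbb{E}\left[ Y_{i,\text{post}}(0) - Y_{i,\text{pre}}(0) \mid D_i = 0 \right]
```

The left-hand side is never observed, since treated units do receive the treatment in the post period, which is why the assumption cannot be tested directly.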

How to Build Your Personal Website?

6 minute read

Published: May 06, 2023

Designing a personal website may appear challenging, yet it is much simpler than it seems. In this article, I will share my experience of creating my own website.

How ChatGPT Changes Our Life?

4 minute read

Published: March 26, 2023

ChatGPT has been under the spotlight in recent times, garnering significant attention from people across the world. In this article, I will share with you how ChatGPT changes my daily workflow, and pose open questions for further discussion.

How to Customize My Plot with Matplotlib?

7 minute read

Published: March 17, 2023

Matplotlib is a powerful data visualization library in Python that offers many customization options for plotting. In this post, I will introduce some of the most common customization options in Matplotlib.

2022

NLP in Finance and Accounting (V): BERT

5 minute read

Published: November 27, 2022

This is the fifth post of the series on NLP in finance and accounting. In this post, we will talk about measures based on the BERT model.

BERT Measure
Related Literature

NLP in Finance and Accounting (IV): Readability Measure

5 minute read

Published: November 20, 2022

This is the fourth post of the series on NLP in finance and accounting. In this post, we will talk about readability measures.

Readability Measure
Related Literature

NLP in Finance and Accounting (III): Sentiment Measure

8 minute read

Published: November 13, 2022

This is the third post of the series on NLP in finance and accounting. In this post, we will talk about sentiment measures. There are two main approaches to constructing sentiment measures.

Dictionary-based Approach
Machine Learning Approach

NLP in Finance and Accounting (II): Text Similarity Measure

5 minute read

Published: November 06, 2022

This is the second post of the series on NLP in finance and accounting. In this post, we will talk about text similarity measures.

Text Similarity Measure
Related Literature

NLP in Finance and Accounting (I): Term Frequency Measure

10 minute read

Published: October 30, 2022

In this series of posts, I would like to summarize the current applications of NLP in finance and accounting research, with a focus on how different text-based measures are constructed. I plan to cover term frequency, text similarity, sentiment, readability, and BERT.

ESG Ratings: KLD data

3 minute read

Published: October 16, 2022

In this post, I will introduce the ESG ratings provided by KLD (Kinder, Lydenberg, and Domini).

The Divergence of ESG Ratings

9 minute read

Published: October 09, 2022

In this post, I will introduce the paper Aggregate Confusion: The Divergence of ESG Ratings by Florian Berg, Julian F Kölbel, and Roberto Rigobon.

Motivation
Sources of Divergence
Decomposition
Rater Effect
Contribution and Implication

Word Embedding (III): GloVe (Global Vectors)

10 minute read

Published: October 02, 2022

In this article, we will introduce another prediction-based word vector model, GloVe.

Background
GloVe Model and Performance
Python Implementation

Word Embedding (II): Word2Vec

10 minute read

Published: September 25, 2022

In this article, we will start to discuss prediction-based word vectors. The first model is Word2Vec. We will cover its model structure and implementation.

Background
Skip-gram
CBOW
Python Implementation

Word Embedding (I): Count-based Word Vectors

14 minute read

Published: September 18, 2022

If you are interested in NLP (natural language processing), you have probably heard of word embedding. Put simply, we use word embedding to map words to real-valued vectors. It is the first step for computers to understand human language.

Evaluate Performance of Classification Models and its Visualization in Python

10 minute read

Published: September 11, 2022

If you are developing models for classification problems, you have probably heard of metrics such as accuracy, precision, recall, and F1-score for performance evaluation. You can easily look up their definitions with a quick search on Google, but you may end up with lines of formulas while still wondering why we need so many metrics. In this article, I will try to explain the rationale behind those metrics and share Python code for computing and visualizing them.
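As a small illustration of why a single metric is not enough, the sketch below computes the four metrics from scratch for a binary problem. The function names and the toy labels are hypothetical, and returning 0.0 when a denominator is zero is one common convention rather than the only one.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary problem."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def classification_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall, and F1 as a dict."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Imbalanced toy data: 2 positives, 8 negatives, and a model that
# always predicts the negative class.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [0] * 10
metrics = classification_metrics(y_true, y_pred)
```

Here accuracy looks respectable because most samples are negative, while precision, recall, and F1 are all zero because the model never finds a positive case.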

Set Up Environment for Deep Learning on Mac M1

2 minute read

Published: September 05, 2022

This article introduces the steps to set up a Python environment for deep learning on a Mac M1. We will cover:

Install Miniforge 3
Install TensorFlow
Install Transformers

Menghan Wang
