ESG Ratings: KLD data
Published:
In this post, I will introduce the ESG ratings provided by KLD (Kinder, Lydenberg, and Domini).
Published:
In this post, I will introduce the paper Aggregate Confusion: The Divergence of ESG Ratings by Florian Berg, Julian F Kölbel, and Roberto Rigobon.
Published:
As one can tell from the name, synthetic DID is a combination of synthetic control and DID. One main challenge of DID is validating the parallel trends assumption: it does not always hold in the data that treated and control groups have identical trends or characteristics prior to the treatment event. Built on the method of synthetic control, synthetic DID relaxes the parallel trends requirement by constructing matched control units.
Published:
The validity of difference-in-differences relies on the parallel trends (PT) assumption: in the absence of the treatment, the treatment and control groups would have followed the same trend over time. This assumption allows us to attribute any post-treatment difference in outcomes to the treatment itself, rather than to other confounding factors.
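Formally, with a pre-period $t=0$, a post-period $t=1$, $Y_{it}(0)$ denoting unit $i$'s untreated potential outcome, and $D_i$ its treatment status, the PT assumption can be written as follows (standard textbook notation, not necessarily the post's own):

```latex
% Parallel trends: untreated outcomes evolve identically in both groups
E[Y_{i1}(0) - Y_{i0}(0) \mid D_i = 1] = E[Y_{i1}(0) - Y_{i0}(0) \mid D_i = 0]

% Under PT, the DID estimand recovers the treatment effect on the treated:
\tau = \big(E[Y_{i1} \mid D_i = 1] - E[Y_{i0} \mid D_i = 1]\big)
     - \big(E[Y_{i1} \mid D_i = 0] - E[Y_{i0} \mid D_i = 0]\big)
```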
Published:
If you are developing models to deal with classification problems, you have probably heard of metrics like accuracy, precision, recall, and F1-score for performance evaluation. You could easily look up their meanings with a quick search on Google, but you may end up with lines of formulas while wondering why we need so many different metrics. In this article, I will try to explain the rationale behind those metrics, and share Python code for their computation and visualization.
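To make the four metrics concrete, here is a minimal from-scratch sketch for the binary case (the post itself may use a library; the toy labels below are illustrative):

```python
# Compute accuracy, precision, recall, and F1 from scratch for a
# binary classification task (labels are 0/1).

def confusion_counts(y_true, y_pred):
    """Count true positives, false positives, false negatives, true negatives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn

def metrics(y_true, y_pred):
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0  # of predicted positives, how many are right
    recall = tp / (tp + fn) if tp + fn else 0.0     # of actual positives, how many are found
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, precision, recall, f1

acc, prec, rec, f1 = metrics([1, 0, 1, 1, 0, 0], [1, 0, 0, 1, 1, 0])
```

Precision and recall trade off against each other, which is why F1 (their harmonic mean) is often reported alongside accuracy.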
Published:
This article introduces the steps to set up a Python environment for deep learning on the Mac M1. We will cover
Published:
This is the sixth post of the series on NLP in finance and accounting. In this post, we will talk about the use of generative large language models (LLMs) in finance research.
Published:
Optical Character Recognition (OCR) is a technology that enables computers to read text from images. OCR tools have become increasingly popular due to their ability to automate data entry and extract information from scanned documents. In this blog post, I will cover the usage of two OCR tools in Python:
Published:
ChatGPT has been under the spotlight in recent times, garnering significant attention from people across the world. In this article, I will share with you how ChatGPT changes my daily workflow, and pose open questions for further discussion.
Published:
This is the fifth post of the series on NLP in finance and accounting. In this post, we will talk about measures based on BERT model.
Published:
This is the fourth post of the series on NLP in finance and accounting. In this post, we will talk about readability measures.
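One widely used readability formula in this literature is the Gunning Fog index (whether or not it is the one the post focuses on). A minimal sketch, taking the word counts as inputs since syllable counting is the hard part:

```python
# Gunning Fog index: Fog = 0.4 * (words/sentences + 100 * complex_words/words),
# where "complex" words are those with three or more syllables.
def gunning_fog(n_words: int, n_sentences: int, n_complex: int) -> float:
    avg_sentence_length = n_words / n_sentences
    pct_complex = 100 * n_complex / n_words
    return 0.4 * (avg_sentence_length + pct_complex)

# e.g. 100 words in 5 sentences, 10 of them complex
fog = gunning_fog(n_words=100, n_sentences=5, n_complex=10)
```

A higher Fog score roughly corresponds to more years of schooling needed to understand the text on first reading.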
Published:
This is the third post of the series on NLP in finance and accounting. In this post, we will talk about sentiment measures. There are two main approaches to constructing sentiment measures.
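One common approach in this literature is dictionary-based counting (e.g. with the Loughran-McDonald word lists). A minimal sketch with made-up word lists, not the real dictionary:

```python
# Dictionary-based sentiment: count positive and negative words and
# scale the difference by document length. The word sets below are
# tiny illustrative stand-ins for a real sentiment dictionary.
POSITIVE = {"gain", "improve", "strong"}
NEGATIVE = {"loss", "decline", "weak"}

def sentiment_score(text: str) -> float:
    """(positive - negative) word counts, scaled by total words."""
    words = text.lower().split()
    if not words:
        return 0.0
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    return (pos - neg) / len(words)

score = sentiment_score("strong gain despite a small loss")
```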
Published:
This is the second post of the series on NLP in finance and accounting. In this post, we will talk about the text similarity measures.
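A common baseline for text similarity is cosine similarity between bag-of-words term-frequency vectors; a self-contained sketch (the post may use a more sophisticated representation):

```python
import math
from collections import Counter

def cosine_similarity(doc_a: str, doc_b: str) -> float:
    """Cosine similarity between term-frequency vectors of two documents."""
    a, b = Counter(doc_a.lower().split()), Counter(doc_b.lower().split())
    vocab = set(a) | set(b)
    dot = sum(a[w] * b[w] for w in vocab)  # Counter returns 0 for missing words
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0
```

Identical documents score 1.0; documents sharing no words score 0.0.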
Published:
In this series of posts, I would like to summarize the current applications of NLP in finance and accounting research, with a focus on how different text-based measures are constructed. I plan to cover Term Frequency, Text Similarity, Sentiment, Readability, and BERT.
Published:
In this article, we will introduce another prediction-based word vector model: GloVe.
Published:
In this article, we will start to discuss prediction-based word vectors. The first model is Word2Vec. We will cover its model structure and implementation.
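To illustrate one piece of the model structure, here is a sketch of generating (center, context) training pairs for the skip-gram variant of Word2Vec (the post may cover CBOW and the full training loop as well):

```python
# Generate (center, context) pairs for skip-gram training with a
# symmetric context window.
def skipgram_pairs(tokens, window=1):
    pairs = []
    for i, center in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # skip the center word itself
                pairs.append((center, tokens[j]))
    return pairs

pairs = skipgram_pairs(["the", "cat", "sat"])
```

The model is then trained to predict each context word from its center word, and the learned weights become the word vectors.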
Published:
If you are interested in NLP (natural language processing), you probably have heard of word embedding. In simple words, we use word embedding to map words to real vectors. It is the first step for computers to understand human language.
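A toy illustration of the idea (the 3-dimensional vectors below are made up, not trained embeddings): each word maps to a real vector, and related words end up closer together:

```python
import math

# Made-up embedding table for illustration only
embedding = {
    "king":  [0.8, 0.3, 0.1],
    "queen": [0.7, 0.4, 0.1],
    "apple": [0.1, 0.9, 0.6],
}

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

sim_royal = cosine(embedding["king"], embedding["queen"])
sim_fruit = cosine(embedding["king"], embedding["apple"])
```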
Published:
Parallel data processing is a technique used to speed up data processing tasks by dividing the data into smaller chunks and processing them simultaneously on multiple processors. It can be useful whenever the tasks do not need to run sequentially. In contrast to threading, which suits tasks that are I/O-bound but not CPU-bound, multiprocessing runs on multiple cores and helps when tasks are CPU-bound. Python provides several ways to implement parallel data processing.
Published:
Matplotlib is a powerful data visualization library in Python that offers many customization options for plotting. In this post, I will introduce some of the most common customization options in Matplotlib.
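A few of those common customizations in one place (illustrative values throughout, not the post's specific examples):

```python
# Common Matplotlib customizations: figure size, line style, labels,
# legend placement, grid, and export resolution.
import matplotlib
matplotlib.use("Agg")  # headless backend; no display window needed
import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(6, 4))            # custom figure size
ax.plot([1, 2, 3], [2, 4, 1],
        color="tab:blue", linewidth=2,
        linestyle="--", marker="o", label="series")
ax.set_title("Customized plot", fontsize=14)      # title styling
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.legend(loc="upper right")                      # legend placement
ax.grid(True, linestyle=":", alpha=0.5)           # dotted, semi-transparent grid
fig.savefig("plot.png", dpi=150)                  # high-resolution export
```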
Published:
Designing a personal website may appear challenging, yet it is much simpler than it seems. In this article, I will share my personal experience when creating my website.