DDG-DA paper code #743

Merged
merged 47 commits into from
Jan 10, 2022
48f8694
Merge data selection to main
wendili-cs Jul 1, 2021
5bb06cd
Update trainer for reweighter
wendili-cs Jul 1, 2021
4f442f5
Typos fixed.
wendili-cs Jul 8, 2021
da013fd
Merge branch 'main' into ds
you-n-g Jul 30, 2021
81b4383
update data selection interface
you-n-g Aug 9, 2021
aa2699f
successfully run exp after refactor some interface
you-n-g Aug 13, 2021
d17aaac
data selection share handler & trainer
you-n-g Aug 20, 2021
82b4115
fix meta model time series bug
you-n-g Aug 22, 2021
5b118c4
fix online workflow set_uri bug
you-n-g Sep 13, 2021
3b073f7
fix set_uri bug
you-n-g Sep 26, 2021
384b670
Merge remote-tracking branch 'origin/main' into ds
you-n-g Sep 26, 2021
b0850b0
updawte ds docs and delay trainer bug
you-n-g Sep 27, 2021
051b261
Merge remote-tracking branch 'wd_ds/ds' into ds
you-n-g Oct 9, 2021
f10d726
Merge branch 'main' into ds
you-n-g Nov 14, 2021
cdcfe30
Merge remote-tracking branch 'origin/main' into ds
you-n-g Nov 14, 2021
6d61ad0
Merge remote-tracking branch 'origin/main' into ds
you-n-g Nov 16, 2021
f32a7ad
docs
you-n-g Nov 16, 2021
8fb37b6
resume reweighter
you-n-g Nov 16, 2021
21baead
add reweighting result
you-n-g Nov 16, 2021
12afe61
fix qlib model import
you-n-g Nov 17, 2021
1d9732b
make recorder more friendly
you-n-g Nov 17, 2021
20a8fe5
fix experiment workflow bug
you-n-g Nov 18, 2021
faf3e03
commit for merging master incase of conflictions
you-n-g Dec 9, 2021
76d1bd9
Merge remote-tracking branch 'origin/main' into ds
you-n-g Dec 9, 2021
3bc4030
Successful run DDG-DA with a single command
you-n-g Dec 11, 2021
49c4074
remove unused code
you-n-g Dec 11, 2021
ce66d9a
asdd more docs
you-n-g Dec 13, 2021
cea134d
Update README.md
you-n-g Dec 13, 2021
a4a2b32
Update & fix some bugs.
demon143 Jan 8, 2022
8241832
Update configuration & remove debug functions
wendili-cs Jan 8, 2022
e1b079d
Update README.md
wendili-cs Jan 9, 2022
6a3f471
Modfify horizon from code rather than yaml
wendili-cs Jan 9, 2022
c3364cd
Update performance in README.md
wendili-cs Jan 9, 2022
b3d1081
Merge remote-tracking branch 'origin/main' into ds
you-n-g Jan 9, 2022
fa2d047
fix part comments
you-n-g Jan 9, 2022
efab5cb
Remove unfinished TCTS.
wendili-cs Jan 10, 2022
5a184eb
Fix some details.
wendili-cs Jan 10, 2022
8fee1b4
Update meta docs
wendili-cs Jan 10, 2022
a31a4d5
Update README.md of the benchmarks_dynamic
wendili-cs Jan 10, 2022
ca3fe76
Merge branch 'main' into ds
you-n-g Jan 10, 2022
97f61d5
Update README.md files
wendili-cs Jan 10, 2022
2726560
Merge branch 'ds' of wd_git:you-n-g/qlib into ds
wendili-cs Jan 10, 2022
da68103
Add README.md to the rolling_benchmark baseline.
wendili-cs Jan 10, 2022
7e1183b
Refine the docs and link
you-n-g Jan 10, 2022
b0857c2
Rename README.md in benchmarks_dynamic.
wendili-cs Jan 10, 2022
38b83dd
Remove comments.
wendili-cs Jan 10, 2022
34f5bd2
auto download data
you-n-g Jan 10, 2022
35 changes: 28 additions & 7 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -11,6 +11,7 @@
Recent released features
| Feature | Status |
| -- | ------ |
| Meta-Learning-based framework & DDG-DA | [Released](https://github.com/microsoft/qlib/pull/743) on Jan 10, 2022 |
| Planning-based portfolio optimization | [Released](https://github.com/microsoft/qlib/pull/754) on Dec 28, 2021 |
| Release Qlib v0.8.0 | [Released](https://github.com/microsoft/qlib/releases/tag/v0.8.0) on Dec 8, 2021 |
| ADD model | [Released](https://github.com/microsoft/qlib/pull/704) on Nov 22, 2021 |
Expand Down Expand Up @@ -50,9 +51,12 @@ For more details, please refer to our paper ["Qlib: An AI-oriented Quantitative
- [Data Preparation](#data-preparation)
- [Auto Quant Research Workflow](#auto-quant-research-workflow)
- [Building Customized Quant Research Workflow by Code](#building-customized-quant-research-workflow-by-code)
- [**Quant Model(Paper) Zoo**](#quant-model-paper-zoo)
- [Run a single model](#run-a-single-model)
- [Run multiple models](#run-multiple-models)
- [Main Challenges & Solutions in Quant Research](#main-challenges--solutions-in-quant-research)
- [Forecasting: Finding Valuable Signals/Patterns](#forecasting-finding-valuable-signalspatterns)
- [**Quant Model (Paper) Zoo**](#quant-model-paper-zoo)
- [Run a Single Model](#run-a-single-model)
- [Run Multiple Models](#run-multiple-models)
- [Adapting to Market Dynamics](#adapting-to-market-dynamics)
- [**Quant Dataset Zoo**](#quant-dataset-zoo)
- [More About Qlib](#more-about-qlib)
- [Offline Mode and Online Mode](#offline-mode-and-online-mode)
Expand All @@ -69,7 +73,6 @@ Your feedbacks about the features are very important.
| -- | ------ |
| Point-in-Time database | Under review: https://github.com/microsoft/qlib/pull/343 |
| Orderbook database | Under review: https://github.com/microsoft/qlib/pull/744 |
| Meta-Learning-based data selection | Under review: https://github.com/microsoft/qlib/pull/743 |

# Framework of Qlib

Expand Down Expand Up @@ -280,8 +283,18 @@ Qlib provides a tool named `qrun` to run the whole workflow automatically (inclu
## Building Customized Quant Research Workflow by Code
The automatic workflow may not suit the research workflow of all Quant researchers. To support a flexible Quant research workflow, Qlib also provides a modularized interface to allow researchers to build their own workflow by code. [Here](examples/workflow_by_code.ipynb) is a demo for customized Quant research workflow by code.

# Main Challenges & Solutions in Quant Research
Quant investment is a unique scenario with many key challenges to be solved.
Currently, Qlib provides some solutions for several of them.

# [Quant Model (Paper) Zoo](examples/benchmarks)
## Forecasting: Finding Valuable Signals/Patterns
Accurate forecasting of stock price trends is an important part of constructing profitable portfolios.
However, the huge amount of data in various formats in the financial market makes it challenging to build forecasting models.

An increasing number of SOTA Quant research works/papers, which focus on building forecasting models to mine valuable signals/patterns in complex financial data, are released in `Qlib`.


### [Quant Model (Paper) Zoo](examples/benchmarks)

Here is a list of models built on `Qlib`.
- [GBDT based on XGBoost (Tianqi Chen, et al. KDD 2016)](examples/benchmarks/XGBoost/)
Expand All @@ -308,7 +321,7 @@ Your PR of new Quant models is highly welcomed.

The performance of each model on the `Alpha158` and `Alpha360` dataset can be found [here](examples/benchmarks/README.md).

## Run a single model
### Run a single model
All the models listed above are runnable with ``Qlib``. Users can find the config files we provide and some details about the model through the [benchmarks](examples/benchmarks) folder. More information can be retrieved at the model files listed above.

`Qlib` provides three different ways to run a single model, users can pick the one that fits their cases best:
Expand All @@ -318,7 +331,7 @@ All the models listed above are runnable with ``Qlib``. Users can find the confi
- Users can use the script [`run_all_model.py`](examples/run_all_model.py) listed in the `examples` folder to run a model. Here is an example of the specific shell command to be used: `python run_all_model.py run --models=lightgbm`, where the `--models` argument can take any number of the models listed above (the available models can be found in [benchmarks](examples/benchmarks/)). For more use cases, please refer to the file's [docstrings](examples/run_all_model.py).
- **NOTE**: Each baseline has different environment dependencies, so please make sure that your Python version aligns with the requirements (e.g. TFT only supports Python 3.6~3.7 due to the limitation of `tensorflow==1.15.0`).

## Run multiple models
### Run multiple models
`Qlib` also provides a script [`run_all_model.py`](examples/run_all_model.py) which can run multiple models for several iterations. (**Note**: the script only supports *Linux* for now. Other operating systems will be supported in the future. In addition, it does not yet support running the same model multiple times in parallel; this will also be fixed in future development.)

The script will create a unique virtual environment for each model, and delete the environments after training. Thus, only experiment results such as `IC` and `backtest` results will be generated and stored.
Expand All @@ -330,6 +343,14 @@ python run_all_model.py run 10

It also provides the API to run specific models at once. For more use cases, please refer to the file's [docstrings](examples/run_all_model.py).

## [Adapting to Market Dynamics](examples/benchmarks_dynamic)

Due to the non-stationary nature of the financial market environment, the data distribution may change across periods, which causes the performance of models built on training data to decay on future test data.
Adapting the forecasting models/strategies to market dynamics is therefore very important to their performance.

Here is a list of solutions built on `Qlib`.
- [Rolling Retraining](examples/benchmarks_dynamic/baseline/)
- [DDG-DA on pytorch (Wendi, et al. AAAI 2022)](examples/benchmarks_dynamic/DDG-DA/)
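
As a rough illustration of the rolling-retraining idea, the sketch below generates sliding train/test windows over a sequence of time periods. It does not use Qlib's API; the function name `rolling_windows` and its parameters are hypothetical and chosen only for this example.

```python
from typing import List, Tuple


def rolling_windows(n_periods: int, train_size: int, test_size: int) -> List[Tuple[range, range]]:
    """Return (train, test) index ranges that slide forward by `test_size` each step.

    Hypothetical helper for illustration only; it is not part of Qlib.
    """
    windows = []
    start = 0
    while start + train_size + test_size <= n_periods:
        train = range(start, start + train_size)
        test = range(start + train_size, start + train_size + test_size)
        windows.append((train, test))
        start += test_size  # roll forward so each test block is used exactly once
    return windows


if __name__ == "__main__":
    # A model would be retrained on each `train` range and evaluated on the `test` range after it.
    for train, test in rolling_windows(n_periods=10, train_size=4, test_size=2):
        print(f"train [{train.start}, {train.stop}) -> test [{test.start}, {test.stop})")
```

Each model only ever sees data that precedes its test block, so the most recent observations enter the training set as the window moves.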

# Quant Dataset Zoo
Dataset plays a very important role in Quant. Here is a list of the datasets built on `Qlib`:
Expand Down
68 changes: 68 additions & 0 deletions docs/component/meta.rst
@@ -0,0 +1,68 @@
.. _meta:
Review comment (Collaborator, Author): Check CI


======================================================
Meta Controller: Meta-Task & Meta-Dataset & Meta-Model
======================================================
.. currentmodule:: qlib


Introduction
=============
``Meta Controller`` provides guidance to ``Forecast Model``. It aims to learn regular patterns among a series of forecasting tasks and to use the learned patterns to guide forthcoming forecasting tasks. Users can implement their own meta-model instance based on the ``Meta Controller`` module.

Meta Task
=============

A `Meta Task` instance is the basic element in the meta-learning framework. It saves the data that can be used for the `Meta Model`. Multiple `Meta Task` instances may share the same `Data Handler`, controlled by `Meta Dataset`. Users should use `prepare_task_data()` to obtain the data that can be directly fed into the `Meta Model`.

.. autoclass:: qlib.model.meta.task.MetaTask
    :members:

Meta Dataset
=============

`Meta Dataset` controls the meta-information generating process. It is responsible for providing data for training the `Meta Model`. Users should use `prepare_tasks` to retrieve a list of `Meta Task` instances.

.. autoclass:: qlib.model.meta.dataset.MetaTaskDataset
    :members:

Meta Model
=============

General Meta Model
------------------
A `Meta Model` instance is the part that controls the workflow. The usage of the `Meta Model` includes:

1. Users train their `Meta Model` with the `fit` function.
2. The `Meta Model` instance guides the workflow by giving useful information via the `inference` function.

.. autoclass:: qlib.model.meta.model.MetaModel
    :members:

Meta Task Model
------------------
This type of meta-model may interact with task definitions directly, and `Meta Task Model` is the class for such models to inherit from. They guide the base tasks by modifying the base task definitions. The function `prepare_tasks` can be used to obtain the modified base task definitions.
you-n-g marked this conversation as resolved.

.. autoclass:: qlib.model.meta.model.MetaTaskModel
    :members:

Meta Guide Model
------------------
This type of meta-model participates in the training process of the base forecasting model. The meta-model may guide the base forecasting models during their training to improve their performance.

.. autoclass:: qlib.model.meta.model.MetaGuideModel
    :members:


Example
=============
``Qlib`` provides an implementation of the ``Meta Model`` module, ``DDG-DA``,
which adapts to the market dynamics.

``DDG-DA`` includes four steps:

1. Calculate meta-information and encapsulate it into ``Meta Task`` instances. All the meta-tasks form a ``Meta Dataset`` instance.
2. Train ``DDG-DA`` based on the training data of the meta-dataset.
3. Do the inference of the ``DDG-DA`` to get guide information.
4. Apply guide information to the forecasting models to improve their performance.

The `above example <https://github.com/microsoft/qlib/tree/main/examples/benchmarks_dynamic/DDG-DA>`_ can be found in ``examples/benchmarks_dynamic/DDG-DA/workflow.py``.
3 changes: 2 additions & 1 deletion docs/index.rst
Expand Up @@ -36,10 +36,11 @@ Document Structure
:caption: COMPONENTS:

Workflow: Workflow Management <component/workflow.rst>
Data Layer: Data Framework&Usage <component/data.rst>
Data Layer: Data Framework & Usage <component/data.rst>
Forecast Model: Model Training & Prediction <component/model.rst>
Portfolio Management and Backtest <component/strategy.rst>
Nested Decision Execution: High-Frequency Trading <component/highfreq.rst>
Meta Controller: Meta-Task & Meta-Dataset & Meta-Model <component/meta.rst>
Qlib Recorder: Experiment Management <component/recorder.rst>
Analysis: Evaluation & Results Analysis <component/report.rst>
Online Serving: Online Management & Strategy & Tool <component/online.rst>
Expand Down
Expand Up @@ -22,7 +22,6 @@ data_handler_config: &data_handler_config
- class: CSRankNorm
kwargs:
fields_group: label
label: ["Ref($close, -2) / Ref($close, -1) - 1"]
port_analysis_config: &port_analysis_config
strategy:
class: TopkDropoutStrategy
Expand Down
1 change: 0 additions & 1 deletion examples/benchmarks/TFT/tft.py
Expand Up @@ -209,7 +209,6 @@ def fit(self, dataset: DatasetH, MODEL_FOLDER="qlib_tft_model", USE_GPU_ID=0, **
fixed_params = self.data_formatter.get_experiment_params()
params = self.data_formatter.get_default_model_params()

# Wendi: merge the tuned and non-tuned parameters
params = {**params, **fixed_params}

if not os.path.exists(self.model_folder):
Expand Down
27 changes: 27 additions & 0 deletions examples/benchmarks_dynamic/DDG-DA/README.md
@@ -0,0 +1,27 @@
# Introduction
This is the implementation of `DDG-DA` based on the `Meta Controller` component provided by `Qlib`.

## Background
In many real-world scenarios, we often deal with streaming data that is sequentially collected over time. Due to the non-stationary nature of the environment, the streaming data distribution may change in unpredictable ways, which is known as concept drift. To handle concept drift, previous methods first detect when/where the concept drift happens and then adapt models to fit the distribution of the latest data. However, there are still many cases in which some underlying factors of the environment's evolution are predictable, making it possible to model the future concept drift trend of the streaming data; such cases have not been fully explored in previous work.

Therefore, we propose a novel method, `DDG-DA`, that can effectively forecast the evolution of the data distribution and improve the performance of models. Specifically, we first train a predictor to estimate the future data distribution, then leverage it to generate training samples, and finally train models on the generated data.
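
One way to picture the last two steps is to assume that the predicted future distribution is expressed as per-sample weights over the historical data (the commit log of this PR mentions a reweighter, but the exact mechanism is not shown here). The sketch below is not the `DDG-DA` implementation: the weights are hand-written rather than predicted, and `weighted_linear_fit` is a hypothetical helper that fits a linear model by weighted least squares with NumPy.

```python
import numpy as np


def weighted_linear_fit(X: np.ndarray, y: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Solve min_b sum_i w_i * (y_i - x_i . b)^2.

    Scaling each row by sqrt(w_i) turns the weighted problem into ordinary least squares.
    Hypothetical helper for illustration only; it is not part of Qlib.
    """
    sw = np.sqrt(w)
    coef, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return coef


if __name__ == "__main__":
    # Toy data with drift: the slope is 1 in the first half and 3 in the second half.
    idx = np.arange(20)
    x = np.linspace(1.0, 2.0, 20)
    X = x[:, None]
    y = np.where(idx < 10, 1.0 * x, 3.0 * x)

    uniform = np.ones(20)                 # treat all history equally
    recent = np.where(idx < 10, 0.0, 1.0)  # assumed weights favouring the newer regime

    print("uniform weights:", weighted_linear_fit(X, y, uniform))
    print("recent weights: ", weighted_linear_fit(X, y, recent))
```

With uniform weights the fitted slope is a compromise between the two regimes, while weights concentrated on the newer regime recover its slope. This is the effect a good distribution predictor is meant to achieve ahead of time.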

## Dataset
The data used in the paper are private, so we conduct experiments on Qlib's public dataset instead.
Though the dataset is different, the conclusion remains the same. By applying `DDG-DA`, users can see rising trends in the test phase, both in the proxy models' ICs and in the performance of the forecasting models.
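
For readers unfamiliar with the metric, IC (information coefficient) is commonly computed as the Pearson correlation between predicted scores and realized returns over a cross-section of instruments. The sketch below assumes that common definition; the function name is hypothetical and the exact computation used in this repository may differ.

```python
import numpy as np


def information_coefficient(pred, label) -> float:
    """Pearson correlation between predictions and realized labels.

    Hypothetical helper for illustration only; it is not part of Qlib.
    """
    pred = np.asarray(pred, dtype=float)
    label = np.asarray(label, dtype=float)
    return float(np.corrcoef(pred, label)[0, 1])


if __name__ == "__main__":
    scores = [0.1, 0.4, 0.2, 0.9]         # made-up model scores for four instruments
    returns = [0.01, 0.03, 0.00, 0.05]    # made-up next-period returns
    print(information_coefficient(scores, returns))
```

A higher IC means the ranking implied by the predictions agrees more closely with the realized returns.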

you-n-g marked this conversation as resolved.
## Run the Code
Users can try `DDG-DA` by running the following command:
```bash
python workflow.py run_all
```

The default forecasting model is `Linear`. Users can choose another forecasting model by changing the `forecast_model` parameter when `DDG-DA` initializes. For example, users can try the `LightGBM` forecasting model by running the following command:
```bash
python workflow.py --forecast_model="gbdt" run_all
```


## Results

The results of other methods on Qlib's public dataset can be found [here](../).
1 change: 1 addition & 0 deletions examples/benchmarks_dynamic/DDG-DA/requirements.txt
@@ -0,0 +1 @@
torch==1.10.0