Article Preprocessing

To run the article preprocessing, which cleans and shortens the dummy scraped articles, run the following command with the environment activated:

python src/process_articles/shorten_scraped_articles.py
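
As a rough illustration of what cleaning and shortening an article can involve, here is a minimal sketch. The function names `clean_text` and `shorten_text`, the whitespace rule, and the word limit are all assumptions made for this example; the actual logic lives in `shorten_scraped_articles.py` and may differ.

```python
import re


def clean_text(text: str) -> str:
    """Collapse runs of whitespace (including newlines) into single spaces.

    Hypothetical helper, not taken from the repository.
    """
    return re.sub(r"\s+", " ", text).strip()


def shorten_text(text: str, max_words: int = 50) -> str:
    """Keep at most `max_words` whitespace-separated words.

    Hypothetical helper; the word limit is an assumed parameter.
    """
    words = text.split()
    return " ".join(words[:max_words])


if __name__ == "__main__":
    # Made-up sample article, used only to demonstrate the two steps.
    raw = "  Dette er   en dummy-artikel.\n\nDen har  ekstra mellemrum. "
    print(shorten_text(clean_text(raw), max_words=5))
```

Cleaning before shortening means the word limit is applied to normalized text, so stray whitespace does not affect where the article is cut.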

To shorten and prepare the DaNewsRoom corpus, first request the data by contacting the authors (see danielvarab/da-newsroom). Then place the data in the data folder and run the following script with the environment activated:

python src/process_articles/shorten_danewsroom.py