Write tests #3

Open
gedankenstuecke opened this issue Dec 15, 2017 · 10 comments

@gedankenstuecke (Owner)

The old problem :P

@troublemagnet (Contributor)

@gedankenstuecke may I work on this?

@gedankenstuecke (Owner, Author)

Absolutely! 👍

@jobliz (Contributor) commented May 16, 2018

@gedankenstuecke I've been thinking about this issue, and I have some doubts about how to test the data loading and analysis process. The create_main_dataframe function needs a Twitter zip archive to be tested, and the functions in analyse_data.py all need the created dataframe as a parameter, so I think it comes down to where a Twitter archive file should be stored. I see two alternatives:

  • The tests could be general enough to pass with any Twitter archive that the developer places in a specific directory. The test data would then be downloaded separately after cloning the repository, but I'm not sure how one could assert specific analysis return values when the archives may differ.

  • The tests could be archive-specific (for example, depending on test_archive.zip). In that case I see three ways to go: the documentation could instruct the user to download the file from a static URL, the file could be committed to the repository, or a new script could download the archive from a static URL once, before the tests are run.

In a past project we placed the test data inside the code repository, but I'm not sure whether that fits the idea behind twitter-analyser, so I wanted to ask before opening a pull request.
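
To make the second alternative concrete, a minimal pytest sketch could look like this (the import path, the archive location, and the assumption that create_main_dataframe takes a path to the zip file are mine, not the project's actual layout):

```python
import os

import pytest

# Import path is an assumption; adjust to wherever create_main_dataframe lives.
from twitter_analyser import create_main_dataframe

ARCHIVE_PATH = os.path.join(os.path.dirname(__file__), "test_archive.zip")


@pytest.fixture(scope="module")
def dataframe():
    """Build the dataframe from the fixed test archive once per test module."""
    return create_main_dataframe(ARCHIVE_PATH)


def test_dataframe_has_tweets(dataframe):
    # With a fixed test_archive.zip, exact tweet counts and specific analysis
    # return values could be asserted here instead of this loose sanity check.
    assert len(dataframe) > 0
```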

@gedankenstuecke (Owner, Author)

@jobliz I think an archive-specific test would be fine with me.

To keep the file size (and test duration) manageable, the data in test_archive.zip could be cut down to about two months' worth of tweets (e.g. December 2008 and January 2009, which also covers the turn of the year).

Does that sound good?
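
For reference, one way to trim an extracted archive down to those two months, assuming the export keeps one tweet file per month under a tweets/ directory (e.g. data/js/tweets/2008_12.js; the real layout may differ):

```python
import glob
import os

# Months to keep: December 2008 and January 2009.
KEEP = {"2008_12", "2009_01"}

# Path and per-month file naming are assumptions about the export layout.
for path in glob.glob("archive/data/js/tweets/*.js"):
    month = os.path.splitext(os.path.basename(path))[0]  # e.g. "2008_12"
    if month not in KEEP:
        os.remove(path)

# Note: payload_details.js still records the original tweet total and would
# need to be updated to match the remaining tweets before re-zipping.
```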

@jobliz (Contributor) commented May 16, 2018

Sure! I'll create a minimized archive and write new tests against it.

@gedankenstuecke (Owner, Author)

Awesome, thanks!

@jobliz (Contributor) commented May 17, 2018

I made a minified archive by deleting JSON files in the /tweets directory and manually setting a new tweet total in payload_details.js, then ran the analysis functions on the resulting dataframe. Unexpectedly, two functions raised exceptions: test_create_timeline raises `NotImplementedError: Not supported for type RangeIndex`, and create_hourly_stats raises `KeyError: 'Weekday'`. This doesn't happen with the full archive (and yes, testing with all tweets takes far too long to be practical). I also tried the same tests with a four-month archive spanning November 2008 through February 2009, but it raised the same exceptions.

Since I'm not entirely sure why this happens and the tests aren't passing yet, I'm holding off on a PR, but the code is in a branch in my fork.

Do you have any idea why a minimized archive might raise exceptions?

@gedankenstuecke (Owner, Author)

I haven't found time to look closely into the data yet, but I suspect that both create_hourly_stats and create_timeline crash because the 2008/2009 data lacks GPS information: the time-zone lookup only works for tweets that have a latitude and longitude, which are also needed to plot points on the map later on.

Looking at my own data, it seems geolocation only became available on Twitter around mid-2011. So I probably just made a bad recommendation on which months to pick for the test set.

This would also explain why it doesn't happen for the full data set, since the full archive does contain GPS coordinates.
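
If that's the cause, a small guard before the time-zone lookup could turn these cryptic exceptions into a clear error. This is just a sketch, and the latitude/longitude column names are assumptions about the dataframe schema:

```python
def geotagged_tweets(dataframe):
    """Return only the tweets that carry GPS coordinates."""
    # 'latitude'/'longitude' are assumed column names, not the actual schema.
    subset = dataframe.dropna(subset=["latitude", "longitude"])
    if subset.empty:
        raise ValueError(
            "archive contains no geolocated tweets; time zone and map "
            "analyses need latitude/longitude"
        )
    return subset
```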

@jobliz (Contributor) commented May 18, 2018

That makes sense. I'll make a minimized archive from months after mid-2011 and see how it goes. If I'm understanding this correctly, the same crashes will also happen when loading full archives from people who haven't turned on location data; when I receive my own archive from Twitter I'll check whether that's the case.
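
In the tests themselves, the location-dependent functions could also be skipped explicitly instead of being left to crash. A sketch reusing the dataframe fixture from the earlier comment (the latitude column name is again an assumption):

```python
import pytest

from analyse_data import create_hourly_stats  # import path may differ


def test_create_hourly_stats(dataframe):
    # Skip rather than crash when the archive has no geolocated tweets.
    if dataframe["latitude"].dropna().empty:
        pytest.skip("archive has no geolocated tweets")
    assert create_hourly_stats(dataframe) is not None
```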

@gedankenstuecke (Owner, Author)

Great, thanks so much!
