Error: A valid tsibble must have distinct rows identified by key and index. Please use duplicates() to check the duplicated rows. #191
Comments
Even if the measured value remains constant, the time of observation (the index) should still be distinct for each row. Perhaps the error message can be tweaked to de-emphasise 'duplicated rows' and instead emphasise the temporal nature of the duplicate.
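A minimal sketch of what the check looks for (the column names mirror this thread and are not from the original report): two rows that share the same key and index are flagged, even though every other column is identical.

library(tsibble)
library(tibble)

# Two readings for machine1 on the same date: the (key, index) pair repeats,
# so as_tsibble() refuses to build the object and points at duplicates().
df <- tibble(
  snsr_dt = as.Date(c("2011-03-14", "2011-03-14", "2011-03-21")),
  machine = c("machine1", "machine1", "machine1"),
  signal  = c(71, 71, 71)
)

duplicates(df, key = machine, index = snsr_dt)    # reports rows 1 and 2
# as_tsibble(df, key = machine, index = snsr_dt)  # would raise the error above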
I am afraid that I do not understand the error message:

duplicates(df1)
Using `snsr_dt` as index variable.
# A tibble: 1,755,129 x 4
  snsr_dt    machine  signal instrument
  <date>     <chr>     <dbl>      <dbl>
1 2011-03-14 machine1     71          1
2 2011-03-21 machine1     71          1
3 2011-03-28 machine1     71          1

The snsr_dt seems different to me. Also, the process that is generating this data actually drops duplicates at the end, so I am a little bit confused here. BR
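One likely reason the output looks odd (an inference from the "Using `snsr_dt` as index variable." line, not confirmed in the thread): duplicates(df1) was called without a key, so any two rows anywhere in the table that share a date are flagged, regardless of machine, signal or instrument. Passing the key explicitly should narrow the report to genuine key-plus-index clashes:

library(tsibble)
library(dplyr)

# Only rows that repeat the same (machine, signal, instrument, snsr_dt):
df1 %>%
  duplicates(key = c(machine, signal, instrument), index = snsr_dt)

# The same check in plain dplyr, counting how often each combination occurs:
df1 %>%
  count(machine, signal, instrument, snsr_dt) %>%
  filter(n > 1)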
When using the
Dear @mitchelloharawild,

Thank you for the details. In fact I have 62 duplicates. I might have misunderstood the dplyr::distinct() function, but basically I was doing:

# (uses DBI, dplyr, zoo::na.approx and tidyr::drop_na)
df_raw <- dbGetQuery(myDB,
  "SELECT * FROM target_table
   WHERE locf_tag='N'
     AND (EXTRACT(WEEKDAY FROM timestamp)=1)
     AND (EXTRACT(HOUR FROM timestamp)=1)") %>%
  rename_all(tolower) %>%
  group_by(machine, signal, instrument)

df <- df_raw %>%
  filter(n() >= 52*2) %>%  # drop groups with less than 2 years of weekly observations
  mutate(snsr_val_clean = if_else(condition = inc_tag == 'Y', true = snsr_val, false = NA_real_)) %>%
  mutate(snsr_val_clean = na.approx(snsr_val_clean, na.rm = FALSE)) %>%  # interpolate the gaps
  # tk_augment_timeseries_signature(snsr_dt) %>%
  # select_if(negate(is.factor)) %>%
  # select(-c(diff)) %>%
  select(-c(interp_qlty, base_qlty, inc_tag, locf_tag, timestamp, snsr_val)) %>%
  # select(-c("hour", "minute", "second", "hour12", "wday", "wday.xts", "am.pm")) %>%
  drop_na() %>%
  distinct()

Shouldn't this remove all of the duplicates? BR
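A possible explanation for why that distinct() call does not clear the error (a sketch assuming the column names from the pipeline above): with no columns named, distinct() compares every column, so two rows that share (machine, signal, instrument, snsr_dt) but differ in snsr_val_clean are both kept, and tsibble still sees a key-plus-index clash. Restricting the comparison to the key and index columns drops them, at the cost of silently keeping the first value of each clash:

# Keep one row per (machine, signal, instrument, snsr_dt);
# .keep_all = TRUE retains the other columns from the first matching row.
df <- df %>%
  ungroup() %>%
  distinct(machine, signal, instrument, snsr_dt, .keep_all = TRUE)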
How you remove your duplicates is problem-specific, although
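One problem-specific alternative (a sketch, not taken from the thread) is to aggregate the clashing rows instead of discarding them, for example averaging the readings per key and date:

# Collapse duplicates by averaging the cleaned value within each key/date pair.
df <- df %>%
  group_by(machine, signal, instrument, snsr_dt) %>%
  summarise(snsr_val_clean = mean(snsr_val_clean, na.rm = TRUE), .groups = "drop")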
I get the same error message after downloading a csv file and using pivot_longer() to build a three-column file: the first column is the index (monthly data), the second (the key) holds 14 observations per month coded as factors, and the third is the value for each index-key pair; all rows are distinct. Trying to find duplicates I get the error message: Error in dublicates(., key = Pattern, index = yr.mnth) :
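The truncated message is consistent with a typo in the function name (dublicates instead of duplicates), although the tail of the error is cut off here. A sketch of the intended call, reusing the Pattern and yr.mnth names from the message (the data frame name monthly is hypothetical), with the index first converted to a proper monthly class:

library(tsibble)
library(dplyr)

monthly <- monthly %>%
  mutate(yr.mnth = yearmonth(yr.mnth))               # monthly index class

duplicates(monthly, key = Pattern, index = yr.mnth)  # note the spelling
# as_tsibble(monthly, key = Pattern, index = yr.mnth)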
Dear colleagues,
I am trying to fit an auto.arima model to multiple weekly time series grouped by 3 columns (machine, signal and instrument).
Created on 2020-05-25 by the reprex package (v0.3.0)
However, when I try to convert them to a tsibble I get the following error:
Created on 2020-05-25 by the reprex package (v0.3.0)
Every row in that data frame is unique in terms of the combination of machine, signal, instrument and snsr_val. It is possible, however, that for a specific combination of machine, signal and instrument I have duplicated values (the sensor is a cumulative one, and if it is not measuring, the value stays constant). Is this why I have to check into the duplicates?
Is there any workaround for this?
BR
/Edgar
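For completeness, a sketch of the overall workflow once the key-plus-index duplicates are resolved (df and snsr_val_clean are assumed from the pipeline earlier in the thread; ARIMA() from fable stands in for auto.arima and fits one automatically selected model per series):

library(dplyr)
library(tsibble)
library(fable)

ts_weekly <- df %>%
  as_tsibble(key = c(machine, signal, instrument), index = snsr_dt)

fits <- ts_weekly %>%
  model(arima = ARIMA(snsr_val_clean))   # one ARIMA per machine/signal/instrument

fc <- forecast(fits, h = 12)             # 12 steps ahead, i.e. 12 weeks here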