Update value_count serialization/deserialization to be consistent with original schema #111

Merged
merged 21 commits on Nov 19, 2022

Conversation

@oliverholworthy (Member) commented Aug 3, 2022

Goal

  • Ensure that the Schema (ColumnSchema) value_count property is restored to the same value when serialized and deserialized.
  • Update the default values of is_list/is_ragged to be inferred from the value_count property.

Details

value_count property serialization/deserialization:

  • fails to serialize to json
    • value_count = {'max': 1, 'min': 1}
    • AttributeError: 'int' object has no attribute '_serialized_on_wire'
  • silently fails to re-construct the same schema object (serialize -> deserialize)
    • value_count = {'max': 0, 'min': 0}
    • value_count = {'max': 2, 'min': 1}

These value_count examples may not be an issue in practice, but they raise questions about where validation should take place for the Schema (at construction time or at serialization time), and about what assumptions or constraints apply during serialization.
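
A minimal sketch of the round trip (mirroring the test added in this PR; the exact import locations are assumptions based on the merlin/schema/io/tensorflow_metadata.py path in the CI logs below):

    from merlin.schema import ColumnSchema, Schema  # assumed import location
    from merlin.schema.io.tensorflow_metadata import TensorflowMetadata

    # A column carrying a value_count property.
    schema = Schema(
        [
            ColumnSchema(
                "example",
                properties={"value_count": {"min": 1, "max": 1}},
                is_list=True,
            )
        ]
    )

    # Before this fix: raises AttributeError: 'int' object has no attribute '_serialized_on_wire'
    json_schema = TensorflowMetadata.from_merlin_schema(schema).to_json()

    # Before this fix: with value_count = {'min': 0, 'max': 0} the property is
    # silently dropped, so the deserialized schema is not equal to the original.
    output_schema = TensorflowMetadata.from_json(json_schema).to_merlin_schema()
    assert output_schema == schema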

is_list/is_ragged default value

  • value_count max > min
    • implies that is_ragged=True, is_list=True
  • value_count max == min (and nonzero)
    • implies that is_ragged=False, is_list=True
  • value_count max == 0
    • implies that is_ragged=False, is_list=False
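
A minimal sketch of these defaults, assuming value_count is a plain dict (infer_list_flags is an illustrative name, not the PR's actual helper):

    def infer_list_flags(value_count):
        """Return the (is_list, is_ragged) defaults implied by a value_count property."""
        min_count = value_count.get("min", 0)
        max_count = value_count.get("max", 0)
        if max_count == 0:
            return False, False  # no values per row -> scalar column
        if max_count > min_count:
            return True, True    # variable-length rows -> ragged list
        return True, False       # fixed-length rows (max == min) -> non-ragged list

    assert infer_list_flags({"min": 1, "max": 2}) == (True, True)
    assert infer_list_flags({"min": 1, "max": 1}) == (True, False)
    assert infer_list_flags({"min": 0, "max": 0}) == (False, False)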

@nvidia-merlin-bot

Click to view CI Results
GitHub pull request #111 of commit 6b66c872515fc178ad2b55b1ce97837aa31ac558, no merge conflicts.
Running as SYSTEM
Setting status of 6b66c872515fc178ad2b55b1ce97837aa31ac558 to PENDING with url https://10.20.13.93:8080/job/merlin_core/87/console and message: 'Pending'
Using context: Jenkins
Building on master in workspace /var/jenkins_home/workspace/merlin_core
using credential ce87ff3c-94f0-400a-8303-cb4acb4918b5
 > git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > git config remote.origin.url /~https://github.com/NVIDIA-Merlin/core # timeout=10
Fetching upstream changes from /~https://github.com/NVIDIA-Merlin/core
 > git --version # timeout=10
using GIT_ASKPASS to set credentials login for merlin-systems username and pass
 > git fetch --tags --force --progress -- /~https://github.com/NVIDIA-Merlin/core +refs/pull/111/*:refs/remotes/origin/pr/111/* # timeout=10
 > git rev-parse 6b66c872515fc178ad2b55b1ce97837aa31ac558^{commit} # timeout=10
Checking out Revision 6b66c872515fc178ad2b55b1ce97837aa31ac558 (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 6b66c872515fc178ad2b55b1ce97837aa31ac558 # timeout=10
Commit message: "Add test for value_count property"
 > git rev-list --no-walk a5137c1f066c1577794aa12ad48d20f21d9b6afb # timeout=10
[merlin_core] $ /bin/bash /tmp/jenkins2111088128308957226.sh
============================= test session starts ==============================
platform linux -- Python 3.8.10, pytest-7.1.2, pluggy-1.0.0
rootdir: /var/jenkins_home/workspace/merlin_core/core, configfile: pyproject.toml
plugins: anyio-3.6.1, xdist-2.5.0, forked-1.4.0, cov-3.0.0
collected 344 items / 1 skipped

tests/unit/core/test_dispatch.py .. [ 0%]
tests/unit/dag/test_base_operator.py .... [ 1%]
tests/unit/dag/test_column_selector.py .......................... [ 9%]
tests/unit/dag/test_graph.py . [ 9%]
tests/unit/dag/test_tags.py ...... [ 11%]
tests/unit/dag/ops/test_selection.py ... [ 12%]
tests/unit/io/test_io.py ..................................FFFF......... [ 25%]
................................................................ [ 44%]
tests/unit/schema/test_column_schemas.py ............................... [ 53%]
........................................................................ [ 74%]
....................................................................... [ 95%]
tests/unit/schema/test_schema.py ...... [ 96%]
tests/unit/schema/test_schema_io.py ..F [ 97%]
tests/unit/utils/test_utils.py ........ [100%]

=================================== FAILURES ===================================
_________________ test_dask_dataset_from_dataframe[True-cudf] __________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-3/test_dask_dataset_from_datafra4')
origin = 'cudf', cpu = True

@pytest.mark.parametrize("origin", ["cudf", "dask_cudf", "pd", "dd"])
@pytest.mark.parametrize("cpu", [None, True])
def test_dask_dataset_from_dataframe(tmpdir, origin, cpu):

    # Generate a DataFrame-based input
    if origin in ("pd", "dd"):
        df = pd.DataFrame({"a": range(100)})
        if origin == "dd":
            df = dask.dataframe.from_pandas(df, npartitions=4)
    elif origin in ("cudf", "dask_cudf"):
        df = cudf.DataFrame({"a": range(100)})
        if origin == "dask_cudf":
            df = dask_cudf.from_cudf(df, npartitions=4)

    # Convert to an NVTabular Dataset and back to a ddf
    dataset = merlin.io.Dataset(df, cpu=cpu)
    result = dataset.to_ddf()

    # Check resulting data
    assert_eq(df, result)

    # Check that the cpu kwarg is working correctly
    if cpu:
        assert isinstance(result.compute(), pd.DataFrame)

        # Should still work if we move to the GPU
        # (test behavior after repetitive conversion)
        dataset.to_gpu()
        dataset.to_cpu()
        dataset.to_cpu()
        dataset.to_gpu()
        result = dataset.to_ddf()
        assert isinstance(result.compute(), cudf.DataFrame)
        dataset.to_cpu()
    else:
        assert isinstance(result.compute(), cudf.DataFrame)

        # Should still work if we move to the CPU
        # (test behavior after repetitive conversion)
        dataset.to_cpu()
        dataset.to_gpu()
        dataset.to_gpu()
        dataset.to_cpu()
        result = dataset.to_ddf()
        assert isinstance(result.compute(), pd.DataFrame)
        dataset.to_gpu()

    # Write to disk and read back
    path = str(tmpdir)
    dataset.to_parquet(path, out_files_per_proc=1, shuffle=None)
  ddf_check = dask_cudf.read_parquet(path).compute()

tests/unit/io/test_io.py:290:


/usr/local/lib/python3.8/dist-packages/dask/base.py:288: in compute
(result,) = compute(self, traverse=False, **kwargs)
/usr/local/lib/python3.8/dist-packages/dask/base.py:571: in compute
results = schedule(dsk, keys, **kwargs)
/usr/local/lib/python3.8/dist-packages/dask/local.py:553: in get_sync
return get_async(
/usr/local/lib/python3.8/dist-packages/dask/local.py:496: in get_async
for key, res_info, failed in queue_get(queue).result():
/usr/lib/python3.8/concurrent/futures/_base.py:437: in result
return self.__get_result()
/usr/lib/python3.8/concurrent/futures/_base.py:389: in __get_result
raise self._exception
/usr/local/lib/python3.8/dist-packages/dask/local.py:538: in submit
fut.set_result(fn(*args, **kwargs))
/usr/local/lib/python3.8/dist-packages/dask/local.py:234: in batch_execute_tasks
return [execute_task(a) for a in it]
/usr/local/lib/python3.8/dist-packages/dask/local.py:234: in <listcomp>
return [execute_task(a) for a in it]
/usr/local/lib/python3.8/dist-packages/dask/local.py:225: in execute_task
result = pack_exception(e, dumps)
/usr/local/lib/python3.8/dist-packages/dask/local.py:220: in execute_task
result = _execute_task(task, data)
/usr/local/lib/python3.8/dist-packages/dask/core.py:119: in _execute_task
return func(
(_execute_task(a, cache) for a in args))
/usr/local/lib/python3.8/dist-packages/dask/optimization.py:969: in __call__
return core.get(self.dsk, self.outkey, dict(zip(self.inkeys, args)))
/usr/local/lib/python3.8/dist-packages/dask/core.py:149: in get
result = _execute_task(task, cache)
/usr/local/lib/python3.8/dist-packages/dask/core.py:119: in _execute_task
return func(
(_execute_task(a, cache) for a in args))
/usr/local/lib/python3.8/dist-packages/dask/dataframe/io/parquet/core.py:87: in __call__
return read_parquet_part(
/usr/local/lib/python3.8/dist-packages/dask/dataframe/io/parquet/core.py:431: in read_parquet_part
dfs = [
/usr/local/lib/python3.8/dist-packages/dask/dataframe/io/parquet/core.py:432: in <listcomp>
func(fs, rg, columns.copy(), index, **toolz.merge(kwargs, kw))
/usr/local/lib/python3.8/dist-packages/dask_cudf/io/parquet.py:216: in read_partition
cls._read_paths(
/usr/local/lib/python3.8/dist-packages/dask_cudf/io/parquet.py:92: in _read_paths
df = cudf.read_parquet(
/usr/local/lib/python3.8/dist-packages/nvtx/nvtx.py:101: in inner
result = func(*args, **kwargs)
/usr/local/lib/python3.8/dist-packages/cudf/io/parquet.py:379: in read_parquet
) = _process_dataset(
/usr/local/lib/python3.8/dist-packages/nvtx/nvtx.py:101: in inner
result = func(*args, **kwargs)
/usr/local/lib/python3.8/dist-packages/cudf/io/parquet.py:205: in _process_dataset
dataset = ds.dataset(
/usr/local/lib/python3.8/dist-packages/pyarrow/dataset.py:683: in dataset
return _filesystem_dataset(source, **kwargs)
/usr/local/lib/python3.8/dist-packages/pyarrow/dataset.py:435: in _filesystem_dataset
return factory.finish(schema)
pyarrow/_dataset.pyx:2473: in pyarrow._dataset.DatasetFactory.finish
???
pyarrow/error.pxi:143: in pyarrow.lib.pyarrow_internal_check_status
???


???
E pyarrow.lib.ArrowInvalid: Error creating dataset. Could not read schema from '/tmp/pytest-of-jenkins/pytest-3/test_dask_dataset_from_datafra4/part_0.parquet': Could not open Parquet input source '/tmp/pytest-of-jenkins/pytest-3/test_dask_dataset_from_datafra4/part_0.parquet': Parquet file size is 0 bytes. Is this a 'parquet' file?

pyarrow/error.pxi:99: ArrowInvalid
_______________ test_dask_dataset_from_dataframe[True-dask_cudf] _______________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-3/test_dask_dataset_from_datafra5')
origin = 'dask_cudf', cpu = True

@pytest.mark.parametrize("origin", ["cudf", "dask_cudf", "pd", "dd"])
@pytest.mark.parametrize("cpu", [None, True])
def test_dask_dataset_from_dataframe(tmpdir, origin, cpu):

    # Generate a DataFrame-based input
    if origin in ("pd", "dd"):
        df = pd.DataFrame({"a": range(100)})
        if origin == "dd":
            df = dask.dataframe.from_pandas(df, npartitions=4)
    elif origin in ("cudf", "dask_cudf"):
        df = cudf.DataFrame({"a": range(100)})
        if origin == "dask_cudf":
            df = dask_cudf.from_cudf(df, npartitions=4)

    # Convert to an NVTabular Dataset and back to a ddf
    dataset = merlin.io.Dataset(df, cpu=cpu)
    result = dataset.to_ddf()

    # Check resulting data
    assert_eq(df, result)

    # Check that the cpu kwarg is working correctly
    if cpu:
        assert isinstance(result.compute(), pd.DataFrame)

        # Should still work if we move to the GPU
        # (test behavior after repetitive conversion)
        dataset.to_gpu()
        dataset.to_cpu()
        dataset.to_cpu()
        dataset.to_gpu()
        result = dataset.to_ddf()
        assert isinstance(result.compute(), cudf.DataFrame)
        dataset.to_cpu()
    else:
        assert isinstance(result.compute(), cudf.DataFrame)

        # Should still work if we move to the CPU
        # (test behavior after repetitive conversion)
        dataset.to_cpu()
        dataset.to_gpu()
        dataset.to_gpu()
        dataset.to_cpu()
        result = dataset.to_ddf()
        assert isinstance(result.compute(), pd.DataFrame)
        dataset.to_gpu()

    # Write to disk and read back
    path = str(tmpdir)
    dataset.to_parquet(path, out_files_per_proc=1, shuffle=None)
  ddf_check = dask_cudf.read_parquet(path).compute()

tests/unit/io/test_io.py:290:


/usr/local/lib/python3.8/dist-packages/dask/base.py:288: in compute
(result,) = compute(self, traverse=False, **kwargs)
/usr/local/lib/python3.8/dist-packages/dask/base.py:571: in compute
results = schedule(dsk, keys, **kwargs)
/usr/local/lib/python3.8/dist-packages/dask/local.py:553: in get_sync
return get_async(
/usr/local/lib/python3.8/dist-packages/dask/local.py:496: in get_async
for key, res_info, failed in queue_get(queue).result():
/usr/lib/python3.8/concurrent/futures/_base.py:437: in result
return self.__get_result()
/usr/lib/python3.8/concurrent/futures/_base.py:389: in __get_result
raise self._exception
/usr/local/lib/python3.8/dist-packages/dask/local.py:538: in submit
fut.set_result(fn(*args, **kwargs))
/usr/local/lib/python3.8/dist-packages/dask/local.py:234: in batch_execute_tasks
return [execute_task(a) for a in it]
/usr/local/lib/python3.8/dist-packages/dask/local.py:234: in <listcomp>
return [execute_task(a) for a in it]
/usr/local/lib/python3.8/dist-packages/dask/local.py:225: in execute_task
result = pack_exception(e, dumps)
/usr/local/lib/python3.8/dist-packages/dask/local.py:220: in execute_task
result = _execute_task(task, data)
/usr/local/lib/python3.8/dist-packages/dask/core.py:119: in _execute_task
return func(
(_execute_task(a, cache) for a in args))
/usr/local/lib/python3.8/dist-packages/dask/optimization.py:969: in __call__
return core.get(self.dsk, self.outkey, dict(zip(self.inkeys, args)))
/usr/local/lib/python3.8/dist-packages/dask/core.py:149: in get
result = _execute_task(task, cache)
/usr/local/lib/python3.8/dist-packages/dask/core.py:119: in _execute_task
return func(
(_execute_task(a, cache) for a in args))
/usr/local/lib/python3.8/dist-packages/dask/dataframe/io/parquet/core.py:87: in __call__
return read_parquet_part(
/usr/local/lib/python3.8/dist-packages/dask/dataframe/io/parquet/core.py:431: in read_parquet_part
dfs = [
/usr/local/lib/python3.8/dist-packages/dask/dataframe/io/parquet/core.py:432: in <listcomp>
func(fs, rg, columns.copy(), index, **toolz.merge(kwargs, kw))
/usr/local/lib/python3.8/dist-packages/dask_cudf/io/parquet.py:216: in read_partition
cls._read_paths(
/usr/local/lib/python3.8/dist-packages/dask_cudf/io/parquet.py:92: in _read_paths
df = cudf.read_parquet(
/usr/local/lib/python3.8/dist-packages/nvtx/nvtx.py:101: in inner
result = func(*args, **kwargs)
/usr/local/lib/python3.8/dist-packages/cudf/io/parquet.py:379: in read_parquet
) = _process_dataset(
/usr/local/lib/python3.8/dist-packages/nvtx/nvtx.py:101: in inner
result = func(*args, **kwargs)
/usr/local/lib/python3.8/dist-packages/cudf/io/parquet.py:205: in _process_dataset
dataset = ds.dataset(
/usr/local/lib/python3.8/dist-packages/pyarrow/dataset.py:683: in dataset
return _filesystem_dataset(source, **kwargs)
/usr/local/lib/python3.8/dist-packages/pyarrow/dataset.py:435: in _filesystem_dataset
return factory.finish(schema)
pyarrow/_dataset.pyx:2473: in pyarrow._dataset.DatasetFactory.finish
???
pyarrow/error.pxi:143: in pyarrow.lib.pyarrow_internal_check_status
???


???
E pyarrow.lib.ArrowInvalid: Error creating dataset. Could not read schema from '/tmp/pytest-of-jenkins/pytest-3/test_dask_dataset_from_datafra5/part_0.parquet': Could not open Parquet input source '/tmp/pytest-of-jenkins/pytest-3/test_dask_dataset_from_datafra5/part_0.parquet': Parquet file size is 0 bytes. Is this a 'parquet' file?

pyarrow/error.pxi:99: ArrowInvalid
__________________ test_dask_dataset_from_dataframe[True-pd] ___________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-3/test_dask_dataset_from_datafra6')
origin = 'pd', cpu = True

@pytest.mark.parametrize("origin", ["cudf", "dask_cudf", "pd", "dd"])
@pytest.mark.parametrize("cpu", [None, True])
def test_dask_dataset_from_dataframe(tmpdir, origin, cpu):

    # Generate a DataFrame-based input
    if origin in ("pd", "dd"):
        df = pd.DataFrame({"a": range(100)})
        if origin == "dd":
            df = dask.dataframe.from_pandas(df, npartitions=4)
    elif origin in ("cudf", "dask_cudf"):
        df = cudf.DataFrame({"a": range(100)})
        if origin == "dask_cudf":
            df = dask_cudf.from_cudf(df, npartitions=4)

    # Convert to an NVTabular Dataset and back to a ddf
    dataset = merlin.io.Dataset(df, cpu=cpu)
    result = dataset.to_ddf()

    # Check resulting data
    assert_eq(df, result)

    # Check that the cpu kwarg is working correctly
    if cpu:
        assert isinstance(result.compute(), pd.DataFrame)

        # Should still work if we move to the GPU
        # (test behavior after repetitive conversion)
        dataset.to_gpu()
        dataset.to_cpu()
        dataset.to_cpu()
        dataset.to_gpu()
        result = dataset.to_ddf()
        assert isinstance(result.compute(), cudf.DataFrame)
        dataset.to_cpu()
    else:
        assert isinstance(result.compute(), cudf.DataFrame)

        # Should still work if we move to the CPU
        # (test behavior after repetitive conversion)
        dataset.to_cpu()
        dataset.to_gpu()
        dataset.to_gpu()
        dataset.to_cpu()
        result = dataset.to_ddf()
        assert isinstance(result.compute(), pd.DataFrame)
        dataset.to_gpu()

    # Write to disk and read back
    path = str(tmpdir)
    dataset.to_parquet(path, out_files_per_proc=1, shuffle=None)
  ddf_check = dask_cudf.read_parquet(path).compute()

tests/unit/io/test_io.py:290:


/usr/local/lib/python3.8/dist-packages/dask/base.py:288: in compute
(result,) = compute(self, traverse=False, **kwargs)
/usr/local/lib/python3.8/dist-packages/dask/base.py:571: in compute
results = schedule(dsk, keys, **kwargs)
/usr/local/lib/python3.8/dist-packages/dask/local.py:553: in get_sync
return get_async(
/usr/local/lib/python3.8/dist-packages/dask/local.py:496: in get_async
for key, res_info, failed in queue_get(queue).result():
/usr/lib/python3.8/concurrent/futures/_base.py:437: in result
return self.__get_result()
/usr/lib/python3.8/concurrent/futures/_base.py:389: in __get_result
raise self._exception
/usr/local/lib/python3.8/dist-packages/dask/local.py:538: in submit
fut.set_result(fn(*args, **kwargs))
/usr/local/lib/python3.8/dist-packages/dask/local.py:234: in batch_execute_tasks
return [execute_task(a) for a in it]
/usr/local/lib/python3.8/dist-packages/dask/local.py:234: in <listcomp>
return [execute_task(a) for a in it]
/usr/local/lib/python3.8/dist-packages/dask/local.py:225: in execute_task
result = pack_exception(e, dumps)
/usr/local/lib/python3.8/dist-packages/dask/local.py:220: in execute_task
result = _execute_task(task, data)
/usr/local/lib/python3.8/dist-packages/dask/core.py:119: in _execute_task
return func(
(_execute_task(a, cache) for a in args))
/usr/local/lib/python3.8/dist-packages/dask/optimization.py:969: in __call__
return core.get(self.dsk, self.outkey, dict(zip(self.inkeys, args)))
/usr/local/lib/python3.8/dist-packages/dask/core.py:149: in get
result = _execute_task(task, cache)
/usr/local/lib/python3.8/dist-packages/dask/core.py:119: in _execute_task
return func(
(_execute_task(a, cache) for a in args))
/usr/local/lib/python3.8/dist-packages/dask/dataframe/io/parquet/core.py:87: in __call__
return read_parquet_part(
/usr/local/lib/python3.8/dist-packages/dask/dataframe/io/parquet/core.py:431: in read_parquet_part
dfs = [
/usr/local/lib/python3.8/dist-packages/dask/dataframe/io/parquet/core.py:432: in <listcomp>
func(fs, rg, columns.copy(), index, **toolz.merge(kwargs, kw))
/usr/local/lib/python3.8/dist-packages/dask_cudf/io/parquet.py:216: in read_partition
cls._read_paths(
/usr/local/lib/python3.8/dist-packages/dask_cudf/io/parquet.py:92: in _read_paths
df = cudf.read_parquet(
/usr/local/lib/python3.8/dist-packages/nvtx/nvtx.py:101: in inner
result = func(*args, **kwargs)
/usr/local/lib/python3.8/dist-packages/cudf/io/parquet.py:379: in read_parquet
) = _process_dataset(
/usr/local/lib/python3.8/dist-packages/nvtx/nvtx.py:101: in inner
result = func(*args, **kwargs)
/usr/local/lib/python3.8/dist-packages/cudf/io/parquet.py:205: in _process_dataset
dataset = ds.dataset(
/usr/local/lib/python3.8/dist-packages/pyarrow/dataset.py:683: in dataset
return _filesystem_dataset(source, **kwargs)
/usr/local/lib/python3.8/dist-packages/pyarrow/dataset.py:435: in _filesystem_dataset
return factory.finish(schema)
pyarrow/_dataset.pyx:2473: in pyarrow._dataset.DatasetFactory.finish
???
pyarrow/error.pxi:143: in pyarrow.lib.pyarrow_internal_check_status
???


???
E pyarrow.lib.ArrowInvalid: Error creating dataset. Could not read schema from '/tmp/pytest-of-jenkins/pytest-3/test_dask_dataset_from_datafra6/part_0.parquet': Could not open Parquet input source '/tmp/pytest-of-jenkins/pytest-3/test_dask_dataset_from_datafra6/part_0.parquet': Parquet file size is 0 bytes. Is this a 'parquet' file?

pyarrow/error.pxi:99: ArrowInvalid
__________________ test_dask_dataset_from_dataframe[True-dd] ___________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-3/test_dask_dataset_from_datafra7')
origin = 'dd', cpu = True

@pytest.mark.parametrize("origin", ["cudf", "dask_cudf", "pd", "dd"])
@pytest.mark.parametrize("cpu", [None, True])
def test_dask_dataset_from_dataframe(tmpdir, origin, cpu):

    # Generate a DataFrame-based input
    if origin in ("pd", "dd"):
        df = pd.DataFrame({"a": range(100)})
        if origin == "dd":
            df = dask.dataframe.from_pandas(df, npartitions=4)
    elif origin in ("cudf", "dask_cudf"):
        df = cudf.DataFrame({"a": range(100)})
        if origin == "dask_cudf":
            df = dask_cudf.from_cudf(df, npartitions=4)

    # Convert to an NVTabular Dataset and back to a ddf
    dataset = merlin.io.Dataset(df, cpu=cpu)
    result = dataset.to_ddf()

    # Check resulting data
    assert_eq(df, result)

    # Check that the cpu kwarg is working correctly
    if cpu:
        assert isinstance(result.compute(), pd.DataFrame)

        # Should still work if we move to the GPU
        # (test behavior after repetitive conversion)
        dataset.to_gpu()
        dataset.to_cpu()
        dataset.to_cpu()
        dataset.to_gpu()
        result = dataset.to_ddf()
        assert isinstance(result.compute(), cudf.DataFrame)
        dataset.to_cpu()
    else:
        assert isinstance(result.compute(), cudf.DataFrame)

        # Should still work if we move to the CPU
        # (test behavior after repetitive conversion)
        dataset.to_cpu()
        dataset.to_gpu()
        dataset.to_gpu()
        dataset.to_cpu()
        result = dataset.to_ddf()
        assert isinstance(result.compute(), pd.DataFrame)
        dataset.to_gpu()

    # Write to disk and read back
    path = str(tmpdir)
    dataset.to_parquet(path, out_files_per_proc=1, shuffle=None)
  ddf_check = dask_cudf.read_parquet(path).compute()

tests/unit/io/test_io.py:290:


/usr/local/lib/python3.8/dist-packages/dask/base.py:288: in compute
(result,) = compute(self, traverse=False, **kwargs)
/usr/local/lib/python3.8/dist-packages/dask/base.py:571: in compute
results = schedule(dsk, keys, **kwargs)
/usr/local/lib/python3.8/dist-packages/dask/local.py:553: in get_sync
return get_async(
/usr/local/lib/python3.8/dist-packages/dask/local.py:496: in get_async
for key, res_info, failed in queue_get(queue).result():
/usr/lib/python3.8/concurrent/futures/_base.py:437: in result
return self.__get_result()
/usr/lib/python3.8/concurrent/futures/_base.py:389: in __get_result
raise self._exception
/usr/local/lib/python3.8/dist-packages/dask/local.py:538: in submit
fut.set_result(fn(*args, **kwargs))
/usr/local/lib/python3.8/dist-packages/dask/local.py:234: in batch_execute_tasks
return [execute_task(a) for a in it]
/usr/local/lib/python3.8/dist-packages/dask/local.py:234: in <listcomp>
return [execute_task(a) for a in it]
/usr/local/lib/python3.8/dist-packages/dask/local.py:225: in execute_task
result = pack_exception(e, dumps)
/usr/local/lib/python3.8/dist-packages/dask/local.py:220: in execute_task
result = _execute_task(task, data)
/usr/local/lib/python3.8/dist-packages/dask/core.py:119: in _execute_task
return func(
(_execute_task(a, cache) for a in args))
/usr/local/lib/python3.8/dist-packages/dask/optimization.py:969: in __call__
return core.get(self.dsk, self.outkey, dict(zip(self.inkeys, args)))
/usr/local/lib/python3.8/dist-packages/dask/core.py:149: in get
result = _execute_task(task, cache)
/usr/local/lib/python3.8/dist-packages/dask/core.py:119: in _execute_task
return func(
(_execute_task(a, cache) for a in args))
/usr/local/lib/python3.8/dist-packages/dask/dataframe/io/parquet/core.py:87: in __call__
return read_parquet_part(
/usr/local/lib/python3.8/dist-packages/dask/dataframe/io/parquet/core.py:431: in read_parquet_part
dfs = [
/usr/local/lib/python3.8/dist-packages/dask/dataframe/io/parquet/core.py:432: in <listcomp>
func(fs, rg, columns.copy(), index, **toolz.merge(kwargs, kw))
/usr/local/lib/python3.8/dist-packages/dask_cudf/io/parquet.py:216: in read_partition
cls._read_paths(
/usr/local/lib/python3.8/dist-packages/dask_cudf/io/parquet.py:92: in _read_paths
df = cudf.read_parquet(
/usr/local/lib/python3.8/dist-packages/nvtx/nvtx.py:101: in inner
result = func(*args, **kwargs)
/usr/local/lib/python3.8/dist-packages/cudf/io/parquet.py:379: in read_parquet
) = _process_dataset(
/usr/local/lib/python3.8/dist-packages/nvtx/nvtx.py:101: in inner
result = func(*args, **kwargs)
/usr/local/lib/python3.8/dist-packages/cudf/io/parquet.py:205: in _process_dataset
dataset = ds.dataset(
/usr/local/lib/python3.8/dist-packages/pyarrow/dataset.py:683: in dataset
return _filesystem_dataset(source, **kwargs)
/usr/local/lib/python3.8/dist-packages/pyarrow/dataset.py:435: in _filesystem_dataset
return factory.finish(schema)
pyarrow/_dataset.pyx:2473: in pyarrow._dataset.DatasetFactory.finish
???
pyarrow/error.pxi:143: in pyarrow.lib.pyarrow_internal_check_status
???


???
E pyarrow.lib.ArrowInvalid: Error creating dataset. Could not read schema from '/tmp/pytest-of-jenkins/pytest-3/test_dask_dataset_from_datafra7/part_0.parquet': Could not open Parquet input source '/tmp/pytest-of-jenkins/pytest-3/test_dask_dataset_from_datafra7/part_0.parquet': Parquet file size is 0 bytes. Is this a 'parquet' file?

pyarrow/error.pxi:99: ArrowInvalid
_______________________________ test_value_count _______________________________

def test_value_count():
    schema = Schema(
        [
            ColumnSchema(
                "example",
                properties={
                    "value_count": {"min": 1, "max": 1},
                },
                is_list=True,
            )
        ]
    )
  json_schema = TensorflowMetadata.from_merlin_schema(schema).to_json()

tests/unit/schema/test_schema_io.py:86:


merlin/schema/io/tensorflow_metadata.py:216: in to_json
return self.proto_schema.to_json()
/usr/local/lib/python3.8/dist-packages/betterproto/__init__.py:909: in to_json
return json.dumps(self.to_dict(), indent=indent)
/usr/local/lib/python3.8/dist-packages/betterproto/__init__.py:811: in to_dict
v = [i.to_dict(casing, include_default_values) for i in v]
/usr/local/lib/python3.8/dist-packages/betterproto/__init__.py:811: in <listcomp>
v = [i.to_dict(casing, include_default_values) for i in v]
/usr/local/lib/python3.8/dist-packages/betterproto/__init__.py:816: in to_dict
output[cased_name] = v.to_dict(casing, include_default_values)


self = FixedShape(dim=1), casing = <function camelcase at 0x7f7a1593e700>
include_default_values = False

def to_dict(
    self, casing: Casing = Casing.CAMEL, include_default_values: bool = False
) -> dict:
    """
    Returns a dict representation of this message instance which can be
    used to serialize to e.g. JSON. Defaults to camel casing for
    compatibility but can be set to other modes.

    `include_default_values` can be set to `True` to include default
    values of fields. E.g. an `int32` type field with `0` value will
    not be in returned dict if `include_default_values` is set to
    `False`.
    """
    output: Dict[str, Any] = {}
    for field in dataclasses.fields(self):
        meta = FieldMetadata.get(field)
        v = getattr(self, field.name)
        cased_name = casing(field.name).rstrip("_")  # type: ignore
        if meta.proto_type == "message":
            if isinstance(v, datetime):
                if v != DATETIME_ZERO or include_default_values:
                    output[cased_name] = _Timestamp.timestamp_to_json(v)
            elif isinstance(v, timedelta):
                if v != timedelta(0) or include_default_values:
                    output[cased_name] = _Duration.delta_to_json(v)
            elif meta.wraps:
                if v is not None or include_default_values:
                    output[cased_name] = v
            elif isinstance(v, list):
                # Convert each item.
                v = [i.to_dict(casing, include_default_values) for i in v]
                if v or include_default_values:
                    output[cased_name] = v
            else:
              if v._serialized_on_wire or include_default_values:

E AttributeError: 'int' object has no attribute '_serialized_on_wire'

/usr/local/lib/python3.8/dist-packages/betterproto/__init__.py:815: AttributeError
=============================== warnings summary ===============================
tests/unit/dag/test_base_operator.py: 4 warnings
tests/unit/io/test_io.py: 71 warnings
/usr/local/lib/python3.8/dist-packages/cudf/core/frame.py:384: UserWarning: The deep parameter is ignored and is only included for pandas compatibility.
warnings.warn(

tests/unit/io/test_io.py::test_validate_and_regenerate_dataset
/var/jenkins_home/workspace/merlin_core/core/merlin/io/parquet.py:551: DeprecationWarning: 'ParquetDataset.pieces' attribute is deprecated as of pyarrow 5.0.0 and will be removed in a future version. Specify 'use_legacy_dataset=False' while constructing the ParquetDataset, and then use the '.fragments' attribute instead.
paths = [p.path for p in pa_dataset.pieces]

tests/unit/utils/test_utils.py::test_nvt_distributed[True-True]
/usr/local/lib/python3.8/dist-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 45691 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed[True-False]
/usr/local/lib/python3.8/dist-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 42353 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed[False-True]
/usr/local/lib/python3.8/dist-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 36545 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed[False-False]
/usr/local/lib/python3.8/dist-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 40371 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed_force[True]
/usr/local/lib/python3.8/dist-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 46753 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed_force[False]
/usr/local/lib/python3.8/dist-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 38309 instead
warnings.warn(

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
=========================== short test summary info ============================
FAILED tests/unit/io/test_io.py::test_dask_dataset_from_dataframe[True-cudf]
FAILED tests/unit/io/test_io.py::test_dask_dataset_from_dataframe[True-dask_cudf]
FAILED tests/unit/io/test_io.py::test_dask_dataset_from_dataframe[True-pd] - ...
FAILED tests/unit/io/test_io.py::test_dask_dataset_from_dataframe[True-dd] - ...
FAILED tests/unit/schema/test_schema_io.py::test_value_count - AttributeError...
============ 5 failed, 339 passed, 1 skipped, 82 warnings in 51.65s ============
Build step 'Execute shell' marked build as failure
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.github.com/repos/NVIDIA-Merlin/core/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[merlin_core] $ /bin/bash /tmp/jenkins12869842802791269842.sh

@oliverholworthy added the bug (Something isn't working) label on Aug 3, 2022
@nvidia-merlin-bot

Click to view CI Results
GitHub pull request #111 of commit a8b6dd5ccb520d8852368c3be924d3b2dfc3904a, no merge conflicts.
Running as SYSTEM
Setting status of a8b6dd5ccb520d8852368c3be924d3b2dfc3904a to PENDING with url https://10.20.13.93:8080/job/merlin_core/88/console and message: 'Pending'
Using context: Jenkins
Building on master in workspace /var/jenkins_home/workspace/merlin_core
using credential ce87ff3c-94f0-400a-8303-cb4acb4918b5
 > git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > git config remote.origin.url /~https://github.com/NVIDIA-Merlin/core # timeout=10
Fetching upstream changes from /~https://github.com/NVIDIA-Merlin/core
 > git --version # timeout=10
using GIT_ASKPASS to set credentials login for merlin-systems username and pass
 > git fetch --tags --force --progress -- /~https://github.com/NVIDIA-Merlin/core +refs/pull/111/*:refs/remotes/origin/pr/111/* # timeout=10
 > git rev-parse a8b6dd5ccb520d8852368c3be924d3b2dfc3904a^{commit} # timeout=10
Checking out Revision a8b6dd5ccb520d8852368c3be924d3b2dfc3904a (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f a8b6dd5ccb520d8852368c3be924d3b2dfc3904a # timeout=10
Commit message: "Add test for value_count property"
 > git rev-list --no-walk 6b66c872515fc178ad2b55b1ce97837aa31ac558 # timeout=10
[merlin_core] $ /bin/bash /tmp/jenkins1305244692537516776.sh
============================= test session starts ==============================
platform linux -- Python 3.8.10, pytest-7.1.2, pluggy-1.0.0
rootdir: /var/jenkins_home/workspace/merlin_core/core, configfile: pyproject.toml
plugins: anyio-3.6.1, xdist-2.5.0, forked-1.4.0, cov-3.0.0
collected 346 items / 1 skipped

tests/unit/core/test_dispatch.py .. [ 0%]
tests/unit/dag/test_base_operator.py .... [ 1%]
tests/unit/dag/test_column_selector.py .......................... [ 9%]
tests/unit/dag/test_graph.py . [ 9%]
tests/unit/dag/test_tags.py ...... [ 11%]
tests/unit/dag/ops/test_selection.py ... [ 12%]
tests/unit/io/test_io.py ..................................FFFF......... [ 25%]
................................................................ [ 44%]
tests/unit/schema/test_column_schemas.py ............................... [ 53%]
........................................................................ [ 73%]
....................................................................... [ 94%]
tests/unit/schema/test_schema.py ...... [ 96%]
tests/unit/schema/test_schema_io.py ..FFF [ 97%]
tests/unit/utils/test_utils.py ........ [100%]

=================================== FAILURES ===================================
_________________ test_dask_dataset_from_dataframe[True-cudf] __________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-4/test_dask_dataset_from_datafra4')
origin = 'cudf', cpu = True

@pytest.mark.parametrize("origin", ["cudf", "dask_cudf", "pd", "dd"])
@pytest.mark.parametrize("cpu", [None, True])
def test_dask_dataset_from_dataframe(tmpdir, origin, cpu):

    # Generate a DataFrame-based input
    if origin in ("pd", "dd"):
        df = pd.DataFrame({"a": range(100)})
        if origin == "dd":
            df = dask.dataframe.from_pandas(df, npartitions=4)
    elif origin in ("cudf", "dask_cudf"):
        df = cudf.DataFrame({"a": range(100)})
        if origin == "dask_cudf":
            df = dask_cudf.from_cudf(df, npartitions=4)

    # Convert to an NVTabular Dataset and back to a ddf
    dataset = merlin.io.Dataset(df, cpu=cpu)
    result = dataset.to_ddf()

    # Check resulting data
    assert_eq(df, result)

    # Check that the cpu kwarg is working correctly
    if cpu:
        assert isinstance(result.compute(), pd.DataFrame)

        # Should still work if we move to the GPU
        # (test behavior after repetitive conversion)
        dataset.to_gpu()
        dataset.to_cpu()
        dataset.to_cpu()
        dataset.to_gpu()
        result = dataset.to_ddf()
        assert isinstance(result.compute(), cudf.DataFrame)
        dataset.to_cpu()
    else:
        assert isinstance(result.compute(), cudf.DataFrame)

        # Should still work if we move to the CPU
        # (test behavior after repetitive conversion)
        dataset.to_cpu()
        dataset.to_gpu()
        dataset.to_gpu()
        dataset.to_cpu()
        result = dataset.to_ddf()
        assert isinstance(result.compute(), pd.DataFrame)
        dataset.to_gpu()

    # Write to disk and read back
    path = str(tmpdir)
    dataset.to_parquet(path, out_files_per_proc=1, shuffle=None)
  ddf_check = dask_cudf.read_parquet(path).compute()

tests/unit/io/test_io.py:290:


/usr/local/lib/python3.8/dist-packages/dask/base.py:288: in compute
(result,) = compute(self, traverse=False, **kwargs)
/usr/local/lib/python3.8/dist-packages/dask/base.py:571: in compute
results = schedule(dsk, keys, **kwargs)
/usr/local/lib/python3.8/dist-packages/dask/local.py:553: in get_sync
return get_async(
/usr/local/lib/python3.8/dist-packages/dask/local.py:496: in get_async
for key, res_info, failed in queue_get(queue).result():
/usr/lib/python3.8/concurrent/futures/_base.py:437: in result
return self.__get_result()
/usr/lib/python3.8/concurrent/futures/_base.py:389: in __get_result
raise self._exception
/usr/local/lib/python3.8/dist-packages/dask/local.py:538: in submit
fut.set_result(fn(*args, **kwargs))
/usr/local/lib/python3.8/dist-packages/dask/local.py:234: in batch_execute_tasks
return [execute_task(a) for a in it]
/usr/local/lib/python3.8/dist-packages/dask/local.py:234: in <listcomp>
return [execute_task(a) for a in it]
/usr/local/lib/python3.8/dist-packages/dask/local.py:225: in execute_task
result = pack_exception(e, dumps)
/usr/local/lib/python3.8/dist-packages/dask/local.py:220: in execute_task
result = _execute_task(task, data)
/usr/local/lib/python3.8/dist-packages/dask/core.py:119: in _execute_task
return func(
(_execute_task(a, cache) for a in args))
/usr/local/lib/python3.8/dist-packages/dask/optimization.py:969: in __call__
return core.get(self.dsk, self.outkey, dict(zip(self.inkeys, args)))
/usr/local/lib/python3.8/dist-packages/dask/core.py:149: in get
result = _execute_task(task, cache)
/usr/local/lib/python3.8/dist-packages/dask/core.py:119: in _execute_task
return func(
(_execute_task(a, cache) for a in args))
/usr/local/lib/python3.8/dist-packages/dask/dataframe/io/parquet/core.py:87: in __call__
return read_parquet_part(
/usr/local/lib/python3.8/dist-packages/dask/dataframe/io/parquet/core.py:431: in read_parquet_part
dfs = [
/usr/local/lib/python3.8/dist-packages/dask/dataframe/io/parquet/core.py:432: in <listcomp>
func(fs, rg, columns.copy(), index, **toolz.merge(kwargs, kw))
/usr/local/lib/python3.8/dist-packages/dask_cudf/io/parquet.py:216: in read_partition
cls._read_paths(
/usr/local/lib/python3.8/dist-packages/dask_cudf/io/parquet.py:92: in _read_paths
df = cudf.read_parquet(
/usr/local/lib/python3.8/dist-packages/nvtx/nvtx.py:101: in inner
result = func(*args, **kwargs)
/usr/local/lib/python3.8/dist-packages/cudf/io/parquet.py:379: in read_parquet
) = _process_dataset(
/usr/local/lib/python3.8/dist-packages/nvtx/nvtx.py:101: in inner
result = func(*args, **kwargs)
/usr/local/lib/python3.8/dist-packages/cudf/io/parquet.py:205: in _process_dataset
dataset = ds.dataset(
/usr/local/lib/python3.8/dist-packages/pyarrow/dataset.py:683: in dataset
return _filesystem_dataset(source, **kwargs)
/usr/local/lib/python3.8/dist-packages/pyarrow/dataset.py:435: in _filesystem_dataset
return factory.finish(schema)
pyarrow/_dataset.pyx:2473: in pyarrow._dataset.DatasetFactory.finish
???
pyarrow/error.pxi:143: in pyarrow.lib.pyarrow_internal_check_status
???


???
E pyarrow.lib.ArrowInvalid: Error creating dataset. Could not read schema from '/tmp/pytest-of-jenkins/pytest-4/test_dask_dataset_from_datafra4/part_0.parquet': Could not open Parquet input source '/tmp/pytest-of-jenkins/pytest-4/test_dask_dataset_from_datafra4/part_0.parquet': Parquet file size is 0 bytes. Is this a 'parquet' file?

pyarrow/error.pxi:99: ArrowInvalid
_______________ test_dask_dataset_from_dataframe[True-dask_cudf] _______________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-4/test_dask_dataset_from_datafra5')
origin = 'dask_cudf', cpu = True

@pytest.mark.parametrize("origin", ["cudf", "dask_cudf", "pd", "dd"])
@pytest.mark.parametrize("cpu", [None, True])
def test_dask_dataset_from_dataframe(tmpdir, origin, cpu):

    # Generate a DataFrame-based input
    if origin in ("pd", "dd"):
        df = pd.DataFrame({"a": range(100)})
        if origin == "dd":
            df = dask.dataframe.from_pandas(df, npartitions=4)
    elif origin in ("cudf", "dask_cudf"):
        df = cudf.DataFrame({"a": range(100)})
        if origin == "dask_cudf":
            df = dask_cudf.from_cudf(df, npartitions=4)

    # Convert to an NVTabular Dataset and back to a ddf
    dataset = merlin.io.Dataset(df, cpu=cpu)
    result = dataset.to_ddf()

    # Check resulting data
    assert_eq(df, result)

    # Check that the cpu kwarg is working correctly
    if cpu:
        assert isinstance(result.compute(), pd.DataFrame)

        # Should still work if we move to the GPU
        # (test behavior after repetitive conversion)
        dataset.to_gpu()
        dataset.to_cpu()
        dataset.to_cpu()
        dataset.to_gpu()
        result = dataset.to_ddf()
        assert isinstance(result.compute(), cudf.DataFrame)
        dataset.to_cpu()
    else:
        assert isinstance(result.compute(), cudf.DataFrame)

        # Should still work if we move to the CPU
        # (test behavior after repetitive conversion)
        dataset.to_cpu()
        dataset.to_gpu()
        dataset.to_gpu()
        dataset.to_cpu()
        result = dataset.to_ddf()
        assert isinstance(result.compute(), pd.DataFrame)
        dataset.to_gpu()

    # Write to disk and read back
    path = str(tmpdir)
    dataset.to_parquet(path, out_files_per_proc=1, shuffle=None)
  ddf_check = dask_cudf.read_parquet(path).compute()

tests/unit/io/test_io.py:290:


/usr/local/lib/python3.8/dist-packages/dask/base.py:288: in compute
(result,) = compute(self, traverse=False, **kwargs)
/usr/local/lib/python3.8/dist-packages/dask/base.py:571: in compute
results = schedule(dsk, keys, **kwargs)
/usr/local/lib/python3.8/dist-packages/dask/local.py:553: in get_sync
return get_async(
/usr/local/lib/python3.8/dist-packages/dask/local.py:496: in get_async
for key, res_info, failed in queue_get(queue).result():
/usr/lib/python3.8/concurrent/futures/_base.py:437: in result
return self.__get_result()
/usr/lib/python3.8/concurrent/futures/_base.py:389: in __get_result
raise self._exception
/usr/local/lib/python3.8/dist-packages/dask/local.py:538: in submit
fut.set_result(fn(*args, **kwargs))
/usr/local/lib/python3.8/dist-packages/dask/local.py:234: in batch_execute_tasks
return [execute_task(a) for a in it]
/usr/local/lib/python3.8/dist-packages/dask/local.py:234: in <listcomp>
return [execute_task(a) for a in it]
/usr/local/lib/python3.8/dist-packages/dask/local.py:225: in execute_task
result = pack_exception(e, dumps)
/usr/local/lib/python3.8/dist-packages/dask/local.py:220: in execute_task
result = _execute_task(task, data)
/usr/local/lib/python3.8/dist-packages/dask/core.py:119: in _execute_task
return func(
(_execute_task(a, cache) for a in args))
/usr/local/lib/python3.8/dist-packages/dask/optimization.py:969: in __call__
return core.get(self.dsk, self.outkey, dict(zip(self.inkeys, args)))
/usr/local/lib/python3.8/dist-packages/dask/core.py:149: in get
result = _execute_task(task, cache)
/usr/local/lib/python3.8/dist-packages/dask/core.py:119: in _execute_task
return func(
(_execute_task(a, cache) for a in args))
/usr/local/lib/python3.8/dist-packages/dask/dataframe/io/parquet/core.py:87: in __call__
return read_parquet_part(
/usr/local/lib/python3.8/dist-packages/dask/dataframe/io/parquet/core.py:431: in read_parquet_part
dfs = [
/usr/local/lib/python3.8/dist-packages/dask/dataframe/io/parquet/core.py:432: in <listcomp>
func(fs, rg, columns.copy(), index, **toolz.merge(kwargs, kw))
/usr/local/lib/python3.8/dist-packages/dask_cudf/io/parquet.py:216: in read_partition
cls._read_paths(
/usr/local/lib/python3.8/dist-packages/dask_cudf/io/parquet.py:92: in _read_paths
df = cudf.read_parquet(
/usr/local/lib/python3.8/dist-packages/nvtx/nvtx.py:101: in inner
result = func(*args, **kwargs)
/usr/local/lib/python3.8/dist-packages/cudf/io/parquet.py:379: in read_parquet
) = _process_dataset(
/usr/local/lib/python3.8/dist-packages/nvtx/nvtx.py:101: in inner
result = func(*args, **kwargs)
/usr/local/lib/python3.8/dist-packages/cudf/io/parquet.py:205: in _process_dataset
dataset = ds.dataset(
/usr/local/lib/python3.8/dist-packages/pyarrow/dataset.py:683: in dataset
return _filesystem_dataset(source, **kwargs)
/usr/local/lib/python3.8/dist-packages/pyarrow/dataset.py:435: in _filesystem_dataset
return factory.finish(schema)
pyarrow/_dataset.pyx:2473: in pyarrow._dataset.DatasetFactory.finish
???
pyarrow/error.pxi:143: in pyarrow.lib.pyarrow_internal_check_status
???


???
E pyarrow.lib.ArrowInvalid: Error creating dataset. Could not read schema from '/tmp/pytest-of-jenkins/pytest-4/test_dask_dataset_from_datafra5/part_0.parquet': Could not open Parquet input source '/tmp/pytest-of-jenkins/pytest-4/test_dask_dataset_from_datafra5/part_0.parquet': Parquet file size is 0 bytes. Is this a 'parquet' file?

pyarrow/error.pxi:99: ArrowInvalid
__________________ test_dask_dataset_from_dataframe[True-pd] ___________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-4/test_dask_dataset_from_datafra6')
origin = 'pd', cpu = True

@pytest.mark.parametrize("origin", ["cudf", "dask_cudf", "pd", "dd"])
@pytest.mark.parametrize("cpu", [None, True])
def test_dask_dataset_from_dataframe(tmpdir, origin, cpu):

    # Generate a DataFrame-based input
    if origin in ("pd", "dd"):
        df = pd.DataFrame({"a": range(100)})
        if origin == "dd":
            df = dask.dataframe.from_pandas(df, npartitions=4)
    elif origin in ("cudf", "dask_cudf"):
        df = cudf.DataFrame({"a": range(100)})
        if origin == "dask_cudf":
            df = dask_cudf.from_cudf(df, npartitions=4)

    # Convert to an NVTabular Dataset and back to a ddf
    dataset = merlin.io.Dataset(df, cpu=cpu)
    result = dataset.to_ddf()

    # Check resulting data
    assert_eq(df, result)

    # Check that the cpu kwarg is working correctly
    if cpu:
        assert isinstance(result.compute(), pd.DataFrame)

        # Should still work if we move to the GPU
        # (test behavior after repetitive conversion)
        dataset.to_gpu()
        dataset.to_cpu()
        dataset.to_cpu()
        dataset.to_gpu()
        result = dataset.to_ddf()
        assert isinstance(result.compute(), cudf.DataFrame)
        dataset.to_cpu()
    else:
        assert isinstance(result.compute(), cudf.DataFrame)

        # Should still work if we move to the CPU
        # (test behavior after repetitive conversion)
        dataset.to_cpu()
        dataset.to_gpu()
        dataset.to_gpu()
        dataset.to_cpu()
        result = dataset.to_ddf()
        assert isinstance(result.compute(), pd.DataFrame)
        dataset.to_gpu()

    # Write to disk and read back
    path = str(tmpdir)
    dataset.to_parquet(path, out_files_per_proc=1, shuffle=None)
  ddf_check = dask_cudf.read_parquet(path).compute()

tests/unit/io/test_io.py:290:


/usr/local/lib/python3.8/dist-packages/dask/base.py:288: in compute
(result,) = compute(self, traverse=False, **kwargs)
/usr/local/lib/python3.8/dist-packages/dask/base.py:571: in compute
results = schedule(dsk, keys, **kwargs)
/usr/local/lib/python3.8/dist-packages/dask/local.py:553: in get_sync
return get_async(
/usr/local/lib/python3.8/dist-packages/dask/local.py:496: in get_async
for key, res_info, failed in queue_get(queue).result():
/usr/lib/python3.8/concurrent/futures/_base.py:437: in result
return self.__get_result()
/usr/lib/python3.8/concurrent/futures/_base.py:389: in __get_result
raise self._exception
/usr/local/lib/python3.8/dist-packages/dask/local.py:538: in submit
fut.set_result(fn(*args, **kwargs))
/usr/local/lib/python3.8/dist-packages/dask/local.py:234: in batch_execute_tasks
return [execute_task(a) for a in it]
/usr/local/lib/python3.8/dist-packages/dask/local.py:234: in <listcomp>
return [execute_task(a) for a in it]
/usr/local/lib/python3.8/dist-packages/dask/local.py:225: in execute_task
result = pack_exception(e, dumps)
/usr/local/lib/python3.8/dist-packages/dask/local.py:220: in execute_task
result = _execute_task(task, data)
/usr/local/lib/python3.8/dist-packages/dask/core.py:119: in _execute_task
return func(
(_execute_task(a, cache) for a in args))
/usr/local/lib/python3.8/dist-packages/dask/optimization.py:969: in __call__
return core.get(self.dsk, self.outkey, dict(zip(self.inkeys, args)))
/usr/local/lib/python3.8/dist-packages/dask/core.py:149: in get
result = _execute_task(task, cache)
/usr/local/lib/python3.8/dist-packages/dask/core.py:119: in _execute_task
return func(
(_execute_task(a, cache) for a in args))
/usr/local/lib/python3.8/dist-packages/dask/dataframe/io/parquet/core.py:87: in __call__
return read_parquet_part(
/usr/local/lib/python3.8/dist-packages/dask/dataframe/io/parquet/core.py:431: in read_parquet_part
dfs = [
/usr/local/lib/python3.8/dist-packages/dask/dataframe/io/parquet/core.py:432: in <listcomp>
func(fs, rg, columns.copy(), index, **toolz.merge(kwargs, kw))
/usr/local/lib/python3.8/dist-packages/dask_cudf/io/parquet.py:216: in read_partition
cls._read_paths(
/usr/local/lib/python3.8/dist-packages/dask_cudf/io/parquet.py:92: in _read_paths
df = cudf.read_parquet(
/usr/local/lib/python3.8/dist-packages/nvtx/nvtx.py:101: in inner
result = func(*args, **kwargs)
/usr/local/lib/python3.8/dist-packages/cudf/io/parquet.py:379: in read_parquet
) = _process_dataset(
/usr/local/lib/python3.8/dist-packages/nvtx/nvtx.py:101: in inner
result = func(*args, **kwargs)
/usr/local/lib/python3.8/dist-packages/cudf/io/parquet.py:205: in _process_dataset
dataset = ds.dataset(
/usr/local/lib/python3.8/dist-packages/pyarrow/dataset.py:683: in dataset
return _filesystem_dataset(source, **kwargs)
/usr/local/lib/python3.8/dist-packages/pyarrow/dataset.py:435: in _filesystem_dataset
return factory.finish(schema)
pyarrow/_dataset.pyx:2473: in pyarrow._dataset.DatasetFactory.finish
???
pyarrow/error.pxi:143: in pyarrow.lib.pyarrow_internal_check_status
???


???
E pyarrow.lib.ArrowInvalid: Error creating dataset. Could not read schema from '/tmp/pytest-of-jenkins/pytest-4/test_dask_dataset_from_datafra6/part_0.parquet': Could not open Parquet input source '/tmp/pytest-of-jenkins/pytest-4/test_dask_dataset_from_datafra6/part_0.parquet': Parquet file size is 0 bytes. Is this a 'parquet' file?

pyarrow/error.pxi:99: ArrowInvalid
__________________ test_dask_dataset_from_dataframe[True-dd] ___________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-4/test_dask_dataset_from_datafra7')
origin = 'dd', cpu = True

@pytest.mark.parametrize("origin", ["cudf", "dask_cudf", "pd", "dd"])
@pytest.mark.parametrize("cpu", [None, True])
def test_dask_dataset_from_dataframe(tmpdir, origin, cpu):

    # Generate a DataFrame-based input
    if origin in ("pd", "dd"):
        df = pd.DataFrame({"a": range(100)})
        if origin == "dd":
            df = dask.dataframe.from_pandas(df, npartitions=4)
    elif origin in ("cudf", "dask_cudf"):
        df = cudf.DataFrame({"a": range(100)})
        if origin == "dask_cudf":
            df = dask_cudf.from_cudf(df, npartitions=4)

    # Convert to an NVTabular Dataset and back to a ddf
    dataset = merlin.io.Dataset(df, cpu=cpu)
    result = dataset.to_ddf()

    # Check resulting data
    assert_eq(df, result)

    # Check that the cpu kwarg is working correctly
    if cpu:
        assert isinstance(result.compute(), pd.DataFrame)

        # Should still work if we move to the GPU
        # (test behavior after repetitive conversion)
        dataset.to_gpu()
        dataset.to_cpu()
        dataset.to_cpu()
        dataset.to_gpu()
        result = dataset.to_ddf()
        assert isinstance(result.compute(), cudf.DataFrame)
        dataset.to_cpu()
    else:
        assert isinstance(result.compute(), cudf.DataFrame)

        # Should still work if we move to the CPU
        # (test behavior after repetitive conversion)
        dataset.to_cpu()
        dataset.to_gpu()
        dataset.to_gpu()
        dataset.to_cpu()
        result = dataset.to_ddf()
        assert isinstance(result.compute(), pd.DataFrame)
        dataset.to_gpu()

    # Write to disk and read back
    path = str(tmpdir)
    dataset.to_parquet(path, out_files_per_proc=1, shuffle=None)
  ddf_check = dask_cudf.read_parquet(path).compute()

tests/unit/io/test_io.py:290:


/usr/local/lib/python3.8/dist-packages/dask/base.py:288: in compute
(result,) = compute(self, traverse=False, **kwargs)
/usr/local/lib/python3.8/dist-packages/dask/base.py:571: in compute
results = schedule(dsk, keys, **kwargs)
/usr/local/lib/python3.8/dist-packages/dask/local.py:553: in get_sync
return get_async(
/usr/local/lib/python3.8/dist-packages/dask/local.py:496: in get_async
for key, res_info, failed in queue_get(queue).result():
/usr/lib/python3.8/concurrent/futures/_base.py:437: in result
return self.__get_result()
/usr/lib/python3.8/concurrent/futures/_base.py:389: in __get_result
raise self._exception
/usr/local/lib/python3.8/dist-packages/dask/local.py:538: in submit
fut.set_result(fn(*args, **kwargs))
/usr/local/lib/python3.8/dist-packages/dask/local.py:234: in batch_execute_tasks
return [execute_task(a) for a in it]
/usr/local/lib/python3.8/dist-packages/dask/local.py:234: in <listcomp>
return [execute_task(a) for a in it]
/usr/local/lib/python3.8/dist-packages/dask/local.py:225: in execute_task
result = pack_exception(e, dumps)
/usr/local/lib/python3.8/dist-packages/dask/local.py:220: in execute_task
result = _execute_task(task, data)
/usr/local/lib/python3.8/dist-packages/dask/core.py:119: in _execute_task
return func(
(_execute_task(a, cache) for a in args))
/usr/local/lib/python3.8/dist-packages/dask/optimization.py:969: in __call__
return core.get(self.dsk, self.outkey, dict(zip(self.inkeys, args)))
/usr/local/lib/python3.8/dist-packages/dask/core.py:149: in get
result = _execute_task(task, cache)
/usr/local/lib/python3.8/dist-packages/dask/core.py:119: in _execute_task
return func(
(_execute_task(a, cache) for a in args))
/usr/local/lib/python3.8/dist-packages/dask/dataframe/io/parquet/core.py:87: in __call__
return read_parquet_part(
/usr/local/lib/python3.8/dist-packages/dask/dataframe/io/parquet/core.py:431: in read_parquet_part
dfs = [
/usr/local/lib/python3.8/dist-packages/dask/dataframe/io/parquet/core.py:432: in <listcomp>
func(fs, rg, columns.copy(), index, **toolz.merge(kwargs, kw))
/usr/local/lib/python3.8/dist-packages/dask_cudf/io/parquet.py:216: in read_partition
cls._read_paths(
/usr/local/lib/python3.8/dist-packages/dask_cudf/io/parquet.py:92: in _read_paths
df = cudf.read_parquet(
/usr/local/lib/python3.8/dist-packages/nvtx/nvtx.py:101: in inner
result = func(*args, **kwargs)
/usr/local/lib/python3.8/dist-packages/cudf/io/parquet.py:379: in read_parquet
) = _process_dataset(
/usr/local/lib/python3.8/dist-packages/nvtx/nvtx.py:101: in inner
result = func(*args, **kwargs)
/usr/local/lib/python3.8/dist-packages/cudf/io/parquet.py:205: in _process_dataset
dataset = ds.dataset(
/usr/local/lib/python3.8/dist-packages/pyarrow/dataset.py:683: in dataset
return _filesystem_dataset(source, **kwargs)
/usr/local/lib/python3.8/dist-packages/pyarrow/dataset.py:435: in _filesystem_dataset
return factory.finish(schema)
pyarrow/_dataset.pyx:2473: in pyarrow._dataset.DatasetFactory.finish
???
pyarrow/error.pxi:143: in pyarrow.lib.pyarrow_internal_check_status
???


???
E pyarrow.lib.ArrowInvalid: Error creating dataset. Could not read schema from '/tmp/pytest-of-jenkins/pytest-4/test_dask_dataset_from_datafra7/part_0.parquet': Could not open Parquet input source '/tmp/pytest-of-jenkins/pytest-4/test_dask_dataset_from_datafra7/part_0.parquet': Parquet file size is 0 bytes. Is this a 'parquet' file?

pyarrow/error.pxi:99: ArrowInvalid
________________________ test_value_count[value_count0] ________________________

value_count = {'max': 0, 'min': 0}

@pytest.mark.parametrize("value_count", [{"min": 0, "max": 0}, {"min": 1, "max": 1}, {"min": 1, "max": 2}])
def test_value_count(value_count):
    schema = Schema(
        [
            ColumnSchema(
                "example",
                properties={
                    "value_count": value_count,
                },
                is_list=True,
            )
        ]
    )
    json_schema = TensorflowMetadata.from_merlin_schema(schema).to_json()
    output_schema = TensorflowMetadata.from_json(json_schema).to_merlin_schema()
  assert output_schema == schema

E AssertionError: assert [{'name': 'ex...gged': False}] == [{'name': 'ex...gged': False}]
E Full diff:
E - [{'name': 'example', 'tags': set(), 'properties': {'value_count': {'min': 0, 'max': 0}}, 'dtype': dtype('float64'), 'is_list': True, 'is_ragged': False}]
E ? -----------------------------------
E + [{'name': 'example', 'tags': set(), 'properties': {}, 'dtype': dtype('float64'), 'is_list': True, 'is_ragged': False}]

tests/unit/schema/test_schema_io.py:91: AssertionError
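
This first failure is a silent round-trip loss rather than a crash: with `value_count = {'min': 0, 'max': 0}` both fields sit at their proto3 defaults, so the serializer omits the message entirely and `properties` comes back empty (the `+` line in the diff). The behavior can be seen in isolation with a hand-written betterproto message mirroring tensorflow-metadata's `ValueCount` (the class below is ours, for illustration only):

```python
from dataclasses import dataclass

import betterproto


@dataclass
class ValueCount(betterproto.Message):
    min: int = betterproto.int64_field(1)
    max: int = betterproto.int64_field(2)


# Fields at their default value are dropped from the dict/JSON form...
assert ValueCount(min=0, max=0).to_dict() == {}
# ...unless defaults are explicitly requested.
assert ValueCount(min=0, max=0).to_dict(include_default_values=True) != {}
```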
________________________ test_value_count[value_count1] ________________________

value_count = {'max': 1, 'min': 1}

@pytest.mark.parametrize("value_count", [{"min": 0, "max": 0}, {"min": 1, "max": 1}, {"min": 1, "max": 2}])
def test_value_count(value_count):
    schema = Schema(
        [
            ColumnSchema(
                "example",
                properties={
                    "value_count": value_count,
                },
                is_list=True,
            )
        ]
    )
  json_schema = TensorflowMetadata.from_merlin_schema(schema).to_json()

tests/unit/schema/test_schema_io.py:89:


merlin/schema/io/tensorflow_metadata.py:216: in to_json
return self.proto_schema.to_json()
/usr/local/lib/python3.8/dist-packages/betterproto/__init__.py:909: in to_json
return json.dumps(self.to_dict(), indent=indent)
/usr/local/lib/python3.8/dist-packages/betterproto/__init__.py:811: in to_dict
v = [i.to_dict(casing, include_default_values) for i in v]
/usr/local/lib/python3.8/dist-packages/betterproto/__init__.py:811: in <listcomp>
v = [i.to_dict(casing, include_default_values) for i in v]
/usr/local/lib/python3.8/dist-packages/betterproto/__init__.py:816: in to_dict
output[cased_name] = v.to_dict(casing, include_default_values)


self = FixedShape(dim=1), casing = <function camelcase at 0x7f5bdcabe700>
include_default_values = False

def to_dict(
    self, casing: Casing = Casing.CAMEL, include_default_values: bool = False
) -> dict:
    """
    Returns a dict representation of this message instance which can be
    used to serialize to e.g. JSON. Defaults to camel casing for
    compatibility but can be set to other modes.

    `include_default_values` can be set to `True` to include default
    values of fields. E.g. an `int32` type field with `0` value will
    not be in returned dict if `include_default_values` is set to
    `False`.
    """
    output: Dict[str, Any] = {}
    for field in dataclasses.fields(self):
        meta = FieldMetadata.get(field)
        v = getattr(self, field.name)
        cased_name = casing(field.name).rstrip("_")  # type: ignore
        if meta.proto_type == "message":
            if isinstance(v, datetime):
                if v != DATETIME_ZERO or include_default_values:
                    output[cased_name] = _Timestamp.timestamp_to_json(v)
            elif isinstance(v, timedelta):
                if v != timedelta(0) or include_default_values:
                    output[cased_name] = _Duration.delta_to_json(v)
            elif meta.wraps:
                if v is not None or include_default_values:
                    output[cased_name] = v
            elif isinstance(v, list):
                # Convert each item.
                v = [i.to_dict(casing, include_default_values) for i in v]
                if v or include_default_values:
                    output[cased_name] = v
            else:
              if v._serialized_on_wire or include_default_values:

E AttributeError: 'int' object has no attribute '_serialized_on_wire'

/usr/local/lib/python3.8/dist-packages/betterproto/__init__.py:815: AttributeError
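
The `AttributeError` pinpoints the crash: `to_dict` reaches a message-typed field that holds a plain `int` (note `self = FixedShape(dim=1)` above), and an `int` has no `_serialized_on_wire`. In tensorflow-metadata's `schema.proto`, `FixedShape.dim` is a repeated `Dim` message, so the shape needs to be built from `Dim` instances rather than a raw count. A sketch with hand-written stand-ins for the generated classes (ours, for illustration only):

```python
from dataclasses import dataclass
from typing import List

import betterproto


@dataclass
class Dim(betterproto.Message):
    size: int = betterproto.int64_field(1)


@dataclass
class FixedShape(betterproto.Message):
    dim: List[Dim] = betterproto.message_field(1)


FixedShape(dim=[Dim(size=1)]).to_dict()  # serializes: {'dim': [{'size': '1'}]}
# FixedShape(dim=1).to_dict()  # would reproduce the AttributeError above
```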
________________________ test_value_count[value_count2] ________________________

value_count = {'max': 2, 'min': 1}

@pytest.mark.parametrize("value_count", [{"min": 0, "max": 0}, {"min": 1, "max": 1}, {"min": 1, "max": 2}])
def test_value_count(value_count):
    schema = Schema(
        [
            ColumnSchema(
                "example",
                properties={
                    "value_count": value_count,
                },
                is_list=True,
            )
        ]
    )
    json_schema = TensorflowMetadata.from_merlin_schema(schema).to_json()
    output_schema = TensorflowMetadata.from_json(json_schema).to_merlin_schema()
  assert output_schema == schema

E AssertionError: assert [{'name': 'ex...agged': True}] == [{'name': 'ex...gged': False}]
E Full diff:
E - [{'name': 'example', 'tags': set(), 'properties': {'value_count': {'min': 1, 'max': 2}}, 'dtype': dtype('float64'), 'is_list': True, 'is_ragged': False}]
E ? ^^^^
E + [{'name': 'example', 'tags': set(), 'properties': {'value_count': {'min': 1, 'max': 2}}, 'dtype': dtype('float64'), 'is_list': True, 'is_ragged': True}]
E ? ^^^

tests/unit/schema/test_schema_io.py:91: AssertionError
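
The third failure is the mirror image: `value_count = {'min': 1, 'max': 2}` survives the round trip, but deserialization infers `is_ragged=True` from it while the hand-built column kept the default `False`. Taken together, the three parametrized cases pin down one consistent inference rule, sketched below (the helper name is ours; it only restates what these assertions expect):

```python
from typing import Dict, Optional, Tuple


def infer_list_flags(value_count: Optional[Dict[str, int]]) -> Tuple[bool, bool]:
    """Infer ``(is_list, is_ragged)`` from a value_count property."""
    if not value_count or value_count.get("max", 0) == 0:
        return False, False  # no counts (or max == 0): scalar column
    if value_count.get("min") == value_count.get("max"):
        return True, False  # fixed-length list
    return True, True  # min != max: ragged list


assert infer_list_flags({"min": 0, "max": 0}) == (False, False)
assert infer_list_flags({"min": 1, "max": 1}) == (True, False)
assert infer_list_flags({"min": 1, "max": 2}) == (True, True)
```

If `ColumnSchema` defaults `is_list`/`is_ragged` from a rule like this, both sides of the round trip agree and the `is_ragged` diff above disappears.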
=============================== warnings summary ===============================
tests/unit/dag/test_base_operator.py: 4 warnings
tests/unit/io/test_io.py: 71 warnings
/usr/local/lib/python3.8/dist-packages/cudf/core/frame.py:384: UserWarning: The deep parameter is ignored and is only included for pandas compatibility.
warnings.warn(

tests/unit/io/test_io.py::test_validate_and_regenerate_dataset
/var/jenkins_home/workspace/merlin_core/core/merlin/io/parquet.py:551: DeprecationWarning: 'ParquetDataset.pieces' attribute is deprecated as of pyarrow 5.0.0 and will be removed in a future version. Specify 'use_legacy_dataset=False' while constructing the ParquetDataset, and then use the '.fragments' attribute instead.
paths = [p.path for p in pa_dataset.pieces]

tests/unit/utils/test_utils.py::test_nvt_distributed[True-True]
/usr/local/lib/python3.8/dist-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 43747 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed[True-False]
/usr/local/lib/python3.8/dist-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 41063 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed[False-True]
/usr/local/lib/python3.8/dist-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 33331 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed[False-False]
/usr/local/lib/python3.8/dist-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 37257 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed_force[True]
/usr/local/lib/python3.8/dist-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 43745 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed_force[False]
/usr/local/lib/python3.8/dist-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 35217 instead
warnings.warn(

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
=========================== short test summary info ============================
FAILED tests/unit/io/test_io.py::test_dask_dataset_from_dataframe[True-cudf]
FAILED tests/unit/io/test_io.py::test_dask_dataset_from_dataframe[True-dask_cudf]
FAILED tests/unit/io/test_io.py::test_dask_dataset_from_dataframe[True-pd] - ...
FAILED tests/unit/io/test_io.py::test_dask_dataset_from_dataframe[True-dd] - ...
FAILED tests/unit/schema/test_schema_io.py::test_value_count[value_count0] - ...
FAILED tests/unit/schema/test_schema_io.py::test_value_count[value_count1] - ...
FAILED tests/unit/schema/test_schema_io.py::test_value_count[value_count2] - ...
============ 7 failed, 339 passed, 1 skipped, 82 warnings in 51.31s ============
Build step 'Execute shell' marked build as failure
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA-Merlin/core/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[merlin_core] $ /bin/bash /tmp/jenkins16851984469355567604.sh

@nvidia-merlin-bot

Click to view CI Results
GitHub pull request #111 of commit 4516dbfa2b45a8580bbc964afe05e09e5c5a54b5, no merge conflicts.
Running as SYSTEM
Setting status of 4516dbfa2b45a8580bbc964afe05e09e5c5a54b5 to PENDING with url https://10.20.13.93:8080/job/merlin_core/89/console and message: 'Pending'
Using context: Jenkins
Building on master in workspace /var/jenkins_home/workspace/merlin_core
using credential ce87ff3c-94f0-400a-8303-cb4acb4918b5
 > git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > git config remote.origin.url /~https://github.com/NVIDIA-Merlin/core # timeout=10
Fetching upstream changes from /~https://github.com/NVIDIA-Merlin/core
 > git --version # timeout=10
using GIT_ASKPASS to set credentials login for merlin-systems username and pass
 > git fetch --tags --force --progress -- /~https://github.com/NVIDIA-Merlin/core +refs/pull/111/*:refs/remotes/origin/pr/111/* # timeout=10
 > git rev-parse 4516dbfa2b45a8580bbc964afe05e09e5c5a54b5^{commit} # timeout=10
Checking out Revision 4516dbfa2b45a8580bbc964afe05e09e5c5a54b5 (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 4516dbfa2b45a8580bbc964afe05e09e5c5a54b5 # timeout=10
Commit message: "reformat test_schema_io.py"
 > git rev-list --no-walk a8b6dd5ccb520d8852368c3be924d3b2dfc3904a # timeout=10
[merlin_core] $ /bin/bash /tmp/jenkins16948746651111168622.sh
============================= test session starts ==============================
platform linux -- Python 3.8.10, pytest-7.1.2, pluggy-1.0.0
rootdir: /var/jenkins_home/workspace/merlin_core/core, configfile: pyproject.toml
plugins: anyio-3.6.1, xdist-2.5.0, forked-1.4.0, cov-3.0.0
collected 346 items / 1 skipped

tests/unit/core/test_dispatch.py .. [ 0%]
tests/unit/dag/test_base_operator.py .... [ 1%]
tests/unit/dag/test_column_selector.py .......................... [ 9%]
tests/unit/dag/test_graph.py . [ 9%]
tests/unit/dag/test_tags.py ...... [ 11%]
tests/unit/dag/ops/test_selection.py ... [ 12%]
tests/unit/io/test_io.py ..................................FFFF......... [ 25%]
................................................................ [ 44%]
tests/unit/schema/test_column_schemas.py ............................... [ 53%]
........................................................................ [ 73%]
....................................................................... [ 94%]
tests/unit/schema/test_schema.py ...... [ 96%]
tests/unit/schema/test_schema_io.py ..FFF [ 97%]
tests/unit/utils/test_utils.py ........ [100%]

=================================== FAILURES ===================================
_________________ test_dask_dataset_from_dataframe[True-cudf] __________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-5/test_dask_dataset_from_datafra4')
origin = 'cudf', cpu = True

@pytest.mark.parametrize("origin", ["cudf", "dask_cudf", "pd", "dd"])
@pytest.mark.parametrize("cpu", [None, True])
def test_dask_dataset_from_dataframe(tmpdir, origin, cpu):

    # Generate a DataFrame-based input
    if origin in ("pd", "dd"):
        df = pd.DataFrame({"a": range(100)})
        if origin == "dd":
            df = dask.dataframe.from_pandas(df, npartitions=4)
    elif origin in ("cudf", "dask_cudf"):
        df = cudf.DataFrame({"a": range(100)})
        if origin == "dask_cudf":
            df = dask_cudf.from_cudf(df, npartitions=4)

    # Convert to an NVTabular Dataset and back to a ddf
    dataset = merlin.io.Dataset(df, cpu=cpu)
    result = dataset.to_ddf()

    # Check resulting data
    assert_eq(df, result)

    # Check that the cpu kwarg is working correctly
    if cpu:
        assert isinstance(result.compute(), pd.DataFrame)

        # Should still work if we move to the GPU
        # (test behavior after repetitive conversion)
        dataset.to_gpu()
        dataset.to_cpu()
        dataset.to_cpu()
        dataset.to_gpu()
        result = dataset.to_ddf()
        assert isinstance(result.compute(), cudf.DataFrame)
        dataset.to_cpu()
    else:
        assert isinstance(result.compute(), cudf.DataFrame)

        # Should still work if we move to the CPU
        # (test behavior after repetitive conversion)
        dataset.to_cpu()
        dataset.to_gpu()
        dataset.to_gpu()
        dataset.to_cpu()
        result = dataset.to_ddf()
        assert isinstance(result.compute(), pd.DataFrame)
        dataset.to_gpu()

    # Write to disk and read back
    path = str(tmpdir)
    dataset.to_parquet(path, out_files_per_proc=1, shuffle=None)
  ddf_check = dask_cudf.read_parquet(path).compute()

tests/unit/io/test_io.py:290:


/usr/local/lib/python3.8/dist-packages/dask/base.py:288: in compute
(result,) = compute(self, traverse=False, **kwargs)
/usr/local/lib/python3.8/dist-packages/dask/base.py:571: in compute
results = schedule(dsk, keys, **kwargs)
/usr/local/lib/python3.8/dist-packages/dask/local.py:553: in get_sync
return get_async(
/usr/local/lib/python3.8/dist-packages/dask/local.py:496: in get_async
for key, res_info, failed in queue_get(queue).result():
/usr/lib/python3.8/concurrent/futures/_base.py:437: in result
return self.__get_result()
/usr/lib/python3.8/concurrent/futures/_base.py:389: in __get_result
raise self._exception
/usr/local/lib/python3.8/dist-packages/dask/local.py:538: in submit
fut.set_result(fn(*args, **kwargs))
/usr/local/lib/python3.8/dist-packages/dask/local.py:234: in batch_execute_tasks
return [execute_task(a) for a in it]
/usr/local/lib/python3.8/dist-packages/dask/local.py:234: in <listcomp>
return [execute_task(a) for a in it]
/usr/local/lib/python3.8/dist-packages/dask/local.py:225: in execute_task
result = pack_exception(e, dumps)
/usr/local/lib/python3.8/dist-packages/dask/local.py:220: in execute_task
result = _execute_task(task, data)
/usr/local/lib/python3.8/dist-packages/dask/core.py:119: in _execute_task
return func(*(_execute_task(a, cache) for a in args))
/usr/local/lib/python3.8/dist-packages/dask/optimization.py:969: in __call__
return core.get(self.dsk, self.outkey, dict(zip(self.inkeys, args)))
/usr/local/lib/python3.8/dist-packages/dask/core.py:149: in get
result = _execute_task(task, cache)
/usr/local/lib/python3.8/dist-packages/dask/core.py:119: in _execute_task
return func(*(_execute_task(a, cache) for a in args))
/usr/local/lib/python3.8/dist-packages/dask/dataframe/io/parquet/core.py:87: in __call__
return read_parquet_part(
/usr/local/lib/python3.8/dist-packages/dask/dataframe/io/parquet/core.py:431: in read_parquet_part
dfs = [
/usr/local/lib/python3.8/dist-packages/dask/dataframe/io/parquet/core.py:432: in <listcomp>
func(fs, rg, columns.copy(), index, **toolz.merge(kwargs, kw))
/usr/local/lib/python3.8/dist-packages/dask_cudf/io/parquet.py:216: in read_partition
cls._read_paths(
/usr/local/lib/python3.8/dist-packages/dask_cudf/io/parquet.py:92: in _read_paths
df = cudf.read_parquet(
/usr/local/lib/python3.8/dist-packages/nvtx/nvtx.py:101: in inner
result = func(*args, **kwargs)
/usr/local/lib/python3.8/dist-packages/cudf/io/parquet.py:379: in read_parquet
) = _process_dataset(
/usr/local/lib/python3.8/dist-packages/nvtx/nvtx.py:101: in inner
result = func(*args, **kwargs)
/usr/local/lib/python3.8/dist-packages/cudf/io/parquet.py:205: in _process_dataset
dataset = ds.dataset(
/usr/local/lib/python3.8/dist-packages/pyarrow/dataset.py:683: in dataset
return _filesystem_dataset(source, **kwargs)
/usr/local/lib/python3.8/dist-packages/pyarrow/dataset.py:435: in _filesystem_dataset
return factory.finish(schema)
pyarrow/_dataset.pyx:2473: in pyarrow._dataset.DatasetFactory.finish
???
pyarrow/error.pxi:143: in pyarrow.lib.pyarrow_internal_check_status
???


???
E pyarrow.lib.ArrowInvalid: Error creating dataset. Could not read schema from '/tmp/pytest-of-jenkins/pytest-5/test_dask_dataset_from_datafra4/part_0.parquet': Could not open Parquet input source '/tmp/pytest-of-jenkins/pytest-5/test_dask_dataset_from_datafra4/part_0.parquet': Parquet file size is 0 bytes. Is this a 'parquet' file?

pyarrow/error.pxi:99: ArrowInvalid
_______________ test_dask_dataset_from_dataframe[True-dask_cudf] _______________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-5/test_dask_dataset_from_datafra5')
origin = 'dask_cudf', cpu = True

@pytest.mark.parametrize("origin", ["cudf", "dask_cudf", "pd", "dd"])
@pytest.mark.parametrize("cpu", [None, True])
def test_dask_dataset_from_dataframe(tmpdir, origin, cpu):

    # Generate a DataFrame-based input
    if origin in ("pd", "dd"):
        df = pd.DataFrame({"a": range(100)})
        if origin == "dd":
            df = dask.dataframe.from_pandas(df, npartitions=4)
    elif origin in ("cudf", "dask_cudf"):
        df = cudf.DataFrame({"a": range(100)})
        if origin == "dask_cudf":
            df = dask_cudf.from_cudf(df, npartitions=4)

    # Convert to an NVTabular Dataset and back to a ddf
    dataset = merlin.io.Dataset(df, cpu=cpu)
    result = dataset.to_ddf()

    # Check resulting data
    assert_eq(df, result)

    # Check that the cpu kwarg is working correctly
    if cpu:
        assert isinstance(result.compute(), pd.DataFrame)

        # Should still work if we move to the GPU
        # (test behavior after repetitive conversion)
        dataset.to_gpu()
        dataset.to_cpu()
        dataset.to_cpu()
        dataset.to_gpu()
        result = dataset.to_ddf()
        assert isinstance(result.compute(), cudf.DataFrame)
        dataset.to_cpu()
    else:
        assert isinstance(result.compute(), cudf.DataFrame)

        # Should still work if we move to the CPU
        # (test behavior after repetitive conversion)
        dataset.to_cpu()
        dataset.to_gpu()
        dataset.to_gpu()
        dataset.to_cpu()
        result = dataset.to_ddf()
        assert isinstance(result.compute(), pd.DataFrame)
        dataset.to_gpu()

    # Write to disk and read back
    path = str(tmpdir)
    dataset.to_parquet(path, out_files_per_proc=1, shuffle=None)
  ddf_check = dask_cudf.read_parquet(path).compute()

tests/unit/io/test_io.py:290:


/usr/local/lib/python3.8/dist-packages/dask/base.py:288: in compute
(result,) = compute(self, traverse=False, **kwargs)
/usr/local/lib/python3.8/dist-packages/dask/base.py:571: in compute
results = schedule(dsk, keys, **kwargs)
/usr/local/lib/python3.8/dist-packages/dask/local.py:553: in get_sync
return get_async(
/usr/local/lib/python3.8/dist-packages/dask/local.py:496: in get_async
for key, res_info, failed in queue_get(queue).result():
/usr/lib/python3.8/concurrent/futures/_base.py:437: in result
return self.__get_result()
/usr/lib/python3.8/concurrent/futures/_base.py:389: in __get_result
raise self._exception
/usr/local/lib/python3.8/dist-packages/dask/local.py:538: in submit
fut.set_result(fn(*args, **kwargs))
/usr/local/lib/python3.8/dist-packages/dask/local.py:234: in batch_execute_tasks
return [execute_task(a) for a in it]
/usr/local/lib/python3.8/dist-packages/dask/local.py:234: in <listcomp>
return [execute_task(a) for a in it]
/usr/local/lib/python3.8/dist-packages/dask/local.py:225: in execute_task
result = pack_exception(e, dumps)
/usr/local/lib/python3.8/dist-packages/dask/local.py:220: in execute_task
result = _execute_task(task, data)
/usr/local/lib/python3.8/dist-packages/dask/core.py:119: in _execute_task
return func(*(_execute_task(a, cache) for a in args))
/usr/local/lib/python3.8/dist-packages/dask/optimization.py:969: in __call__
return core.get(self.dsk, self.outkey, dict(zip(self.inkeys, args)))
/usr/local/lib/python3.8/dist-packages/dask/core.py:149: in get
result = _execute_task(task, cache)
/usr/local/lib/python3.8/dist-packages/dask/core.py:119: in _execute_task
return func(*(_execute_task(a, cache) for a in args))
/usr/local/lib/python3.8/dist-packages/dask/dataframe/io/parquet/core.py:87: in __call__
return read_parquet_part(
/usr/local/lib/python3.8/dist-packages/dask/dataframe/io/parquet/core.py:431: in read_parquet_part
dfs = [
/usr/local/lib/python3.8/dist-packages/dask/dataframe/io/parquet/core.py:432: in <listcomp>
func(fs, rg, columns.copy(), index, **toolz.merge(kwargs, kw))
/usr/local/lib/python3.8/dist-packages/dask_cudf/io/parquet.py:216: in read_partition
cls._read_paths(
/usr/local/lib/python3.8/dist-packages/dask_cudf/io/parquet.py:92: in _read_paths
df = cudf.read_parquet(
/usr/local/lib/python3.8/dist-packages/nvtx/nvtx.py:101: in inner
result = func(*args, **kwargs)
/usr/local/lib/python3.8/dist-packages/cudf/io/parquet.py:379: in read_parquet
) = _process_dataset(
/usr/local/lib/python3.8/dist-packages/nvtx/nvtx.py:101: in inner
result = func(*args, **kwargs)
/usr/local/lib/python3.8/dist-packages/cudf/io/parquet.py:205: in _process_dataset
dataset = ds.dataset(
/usr/local/lib/python3.8/dist-packages/pyarrow/dataset.py:683: in dataset
return _filesystem_dataset(source, **kwargs)
/usr/local/lib/python3.8/dist-packages/pyarrow/dataset.py:435: in _filesystem_dataset
return factory.finish(schema)
pyarrow/_dataset.pyx:2473: in pyarrow._dataset.DatasetFactory.finish
???
pyarrow/error.pxi:143: in pyarrow.lib.pyarrow_internal_check_status
???


???
E pyarrow.lib.ArrowInvalid: Error creating dataset. Could not read schema from '/tmp/pytest-of-jenkins/pytest-5/test_dask_dataset_from_datafra5/part_0.parquet': Could not open Parquet input source '/tmp/pytest-of-jenkins/pytest-5/test_dask_dataset_from_datafra5/part_0.parquet': Parquet file size is 0 bytes. Is this a 'parquet' file?

pyarrow/error.pxi:99: ArrowInvalid
__________________ test_dask_dataset_from_dataframe[True-pd] ___________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-5/test_dask_dataset_from_datafra6')
origin = 'pd', cpu = True

@pytest.mark.parametrize("origin", ["cudf", "dask_cudf", "pd", "dd"])
@pytest.mark.parametrize("cpu", [None, True])
def test_dask_dataset_from_dataframe(tmpdir, origin, cpu):

    # Generate a DataFrame-based input
    if origin in ("pd", "dd"):
        df = pd.DataFrame({"a": range(100)})
        if origin == "dd":
            df = dask.dataframe.from_pandas(df, npartitions=4)
    elif origin in ("cudf", "dask_cudf"):
        df = cudf.DataFrame({"a": range(100)})
        if origin == "dask_cudf":
            df = dask_cudf.from_cudf(df, npartitions=4)

    # Convert to an NVTabular Dataset and back to a ddf
    dataset = merlin.io.Dataset(df, cpu=cpu)
    result = dataset.to_ddf()

    # Check resulting data
    assert_eq(df, result)

    # Check that the cpu kwarg is working correctly
    if cpu:
        assert isinstance(result.compute(), pd.DataFrame)

        # Should still work if we move to the GPU
        # (test behavior after repetitive conversion)
        dataset.to_gpu()
        dataset.to_cpu()
        dataset.to_cpu()
        dataset.to_gpu()
        result = dataset.to_ddf()
        assert isinstance(result.compute(), cudf.DataFrame)
        dataset.to_cpu()
    else:
        assert isinstance(result.compute(), cudf.DataFrame)

        # Should still work if we move to the CPU
        # (test behavior after repetitive conversion)
        dataset.to_cpu()
        dataset.to_gpu()
        dataset.to_gpu()
        dataset.to_cpu()
        result = dataset.to_ddf()
        assert isinstance(result.compute(), pd.DataFrame)
        dataset.to_gpu()

    # Write to disk and read back
    path = str(tmpdir)
    dataset.to_parquet(path, out_files_per_proc=1, shuffle=None)
  ddf_check = dask_cudf.read_parquet(path).compute()

tests/unit/io/test_io.py:290:


/usr/local/lib/python3.8/dist-packages/dask/base.py:288: in compute
(result,) = compute(self, traverse=False, **kwargs)
/usr/local/lib/python3.8/dist-packages/dask/base.py:571: in compute
results = schedule(dsk, keys, **kwargs)
/usr/local/lib/python3.8/dist-packages/dask/local.py:553: in get_sync
return get_async(
/usr/local/lib/python3.8/dist-packages/dask/local.py:496: in get_async
for key, res_info, failed in queue_get(queue).result():
/usr/lib/python3.8/concurrent/futures/_base.py:437: in result
return self.__get_result()
/usr/lib/python3.8/concurrent/futures/_base.py:389: in __get_result
raise self._exception
/usr/local/lib/python3.8/dist-packages/dask/local.py:538: in submit
fut.set_result(fn(*args, **kwargs))
/usr/local/lib/python3.8/dist-packages/dask/local.py:234: in batch_execute_tasks
return [execute_task(a) for a in it]
/usr/local/lib/python3.8/dist-packages/dask/local.py:234: in <listcomp>
return [execute_task(a) for a in it]
/usr/local/lib/python3.8/dist-packages/dask/local.py:225: in execute_task
result = pack_exception(e, dumps)
/usr/local/lib/python3.8/dist-packages/dask/local.py:220: in execute_task
result = _execute_task(task, data)
/usr/local/lib/python3.8/dist-packages/dask/core.py:119: in _execute_task
return func(*(_execute_task(a, cache) for a in args))
/usr/local/lib/python3.8/dist-packages/dask/optimization.py:969: in __call__
return core.get(self.dsk, self.outkey, dict(zip(self.inkeys, args)))
/usr/local/lib/python3.8/dist-packages/dask/core.py:149: in get
result = _execute_task(task, cache)
/usr/local/lib/python3.8/dist-packages/dask/core.py:119: in _execute_task
return func(*(_execute_task(a, cache) for a in args))
/usr/local/lib/python3.8/dist-packages/dask/dataframe/io/parquet/core.py:87: in __call__
return read_parquet_part(
/usr/local/lib/python3.8/dist-packages/dask/dataframe/io/parquet/core.py:431: in read_parquet_part
dfs = [
/usr/local/lib/python3.8/dist-packages/dask/dataframe/io/parquet/core.py:432: in <listcomp>
func(fs, rg, columns.copy(), index, **toolz.merge(kwargs, kw))
/usr/local/lib/python3.8/dist-packages/dask_cudf/io/parquet.py:216: in read_partition
cls._read_paths(
/usr/local/lib/python3.8/dist-packages/dask_cudf/io/parquet.py:92: in _read_paths
df = cudf.read_parquet(
/usr/local/lib/python3.8/dist-packages/nvtx/nvtx.py:101: in inner
result = func(*args, **kwargs)
/usr/local/lib/python3.8/dist-packages/cudf/io/parquet.py:379: in read_parquet
) = _process_dataset(
/usr/local/lib/python3.8/dist-packages/nvtx/nvtx.py:101: in inner
result = func(*args, **kwargs)
/usr/local/lib/python3.8/dist-packages/cudf/io/parquet.py:205: in _process_dataset
dataset = ds.dataset(
/usr/local/lib/python3.8/dist-packages/pyarrow/dataset.py:683: in dataset
return _filesystem_dataset(source, **kwargs)
/usr/local/lib/python3.8/dist-packages/pyarrow/dataset.py:435: in _filesystem_dataset
return factory.finish(schema)
pyarrow/_dataset.pyx:2473: in pyarrow._dataset.DatasetFactory.finish
???
pyarrow/error.pxi:143: in pyarrow.lib.pyarrow_internal_check_status
???


???
E pyarrow.lib.ArrowInvalid: Error creating dataset. Could not read schema from '/tmp/pytest-of-jenkins/pytest-5/test_dask_dataset_from_datafra6/part_0.parquet': Could not open Parquet input source '/tmp/pytest-of-jenkins/pytest-5/test_dask_dataset_from_datafra6/part_0.parquet': Parquet file size is 0 bytes. Is this a 'parquet' file?

pyarrow/error.pxi:99: ArrowInvalid
__________________ test_dask_dataset_from_dataframe[True-dd] ___________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-5/test_dask_dataset_from_datafra7')
origin = 'dd', cpu = True

@pytest.mark.parametrize("origin", ["cudf", "dask_cudf", "pd", "dd"])
@pytest.mark.parametrize("cpu", [None, True])
def test_dask_dataset_from_dataframe(tmpdir, origin, cpu):

    # Generate a DataFrame-based input
    if origin in ("pd", "dd"):
        df = pd.DataFrame({"a": range(100)})
        if origin == "dd":
            df = dask.dataframe.from_pandas(df, npartitions=4)
    elif origin in ("cudf", "dask_cudf"):
        df = cudf.DataFrame({"a": range(100)})
        if origin == "dask_cudf":
            df = dask_cudf.from_cudf(df, npartitions=4)

    # Convert to an NVTabular Dataset and back to a ddf
    dataset = merlin.io.Dataset(df, cpu=cpu)
    result = dataset.to_ddf()

    # Check resulting data
    assert_eq(df, result)

    # Check that the cpu kwarg is working correctly
    if cpu:
        assert isinstance(result.compute(), pd.DataFrame)

        # Should still work if we move to the GPU
        # (test behavior after repetitive conversion)
        dataset.to_gpu()
        dataset.to_cpu()
        dataset.to_cpu()
        dataset.to_gpu()
        result = dataset.to_ddf()
        assert isinstance(result.compute(), cudf.DataFrame)
        dataset.to_cpu()
    else:
        assert isinstance(result.compute(), cudf.DataFrame)

        # Should still work if we move to the CPU
        # (test behavior after repetitive conversion)
        dataset.to_cpu()
        dataset.to_gpu()
        dataset.to_gpu()
        dataset.to_cpu()
        result = dataset.to_ddf()
        assert isinstance(result.compute(), pd.DataFrame)
        dataset.to_gpu()

    # Write to disk and read back
    path = str(tmpdir)
    dataset.to_parquet(path, out_files_per_proc=1, shuffle=None)
  ddf_check = dask_cudf.read_parquet(path).compute()

tests/unit/io/test_io.py:290:


/usr/local/lib/python3.8/dist-packages/dask/base.py:288: in compute
(result,) = compute(self, traverse=False, **kwargs)
/usr/local/lib/python3.8/dist-packages/dask/base.py:571: in compute
results = schedule(dsk, keys, **kwargs)
/usr/local/lib/python3.8/dist-packages/dask/local.py:553: in get_sync
return get_async(
/usr/local/lib/python3.8/dist-packages/dask/local.py:496: in get_async
for key, res_info, failed in queue_get(queue).result():
/usr/lib/python3.8/concurrent/futures/_base.py:437: in result
return self.__get_result()
/usr/lib/python3.8/concurrent/futures/_base.py:389: in __get_result
raise self._exception
/usr/local/lib/python3.8/dist-packages/dask/local.py:538: in submit
fut.set_result(fn(*args, **kwargs))
/usr/local/lib/python3.8/dist-packages/dask/local.py:234: in batch_execute_tasks
return [execute_task(a) for a in it]
/usr/local/lib/python3.8/dist-packages/dask/local.py:234: in <listcomp>
return [execute_task(a) for a in it]
/usr/local/lib/python3.8/dist-packages/dask/local.py:225: in execute_task
result = pack_exception(e, dumps)
/usr/local/lib/python3.8/dist-packages/dask/local.py:220: in execute_task
result = _execute_task(task, data)
/usr/local/lib/python3.8/dist-packages/dask/core.py:119: in _execute_task
return func(*(_execute_task(a, cache) for a in args))
/usr/local/lib/python3.8/dist-packages/dask/optimization.py:969: in __call__
return core.get(self.dsk, self.outkey, dict(zip(self.inkeys, args)))
/usr/local/lib/python3.8/dist-packages/dask/core.py:149: in get
result = _execute_task(task, cache)
/usr/local/lib/python3.8/dist-packages/dask/core.py:119: in _execute_task
return func(*(_execute_task(a, cache) for a in args))
/usr/local/lib/python3.8/dist-packages/dask/dataframe/io/parquet/core.py:87: in __call__
return read_parquet_part(
/usr/local/lib/python3.8/dist-packages/dask/dataframe/io/parquet/core.py:431: in read_parquet_part
dfs = [
/usr/local/lib/python3.8/dist-packages/dask/dataframe/io/parquet/core.py:432: in <listcomp>
func(fs, rg, columns.copy(), index, **toolz.merge(kwargs, kw))
/usr/local/lib/python3.8/dist-packages/dask_cudf/io/parquet.py:216: in read_partition
cls._read_paths(
/usr/local/lib/python3.8/dist-packages/dask_cudf/io/parquet.py:92: in _read_paths
df = cudf.read_parquet(
/usr/local/lib/python3.8/dist-packages/nvtx/nvtx.py:101: in inner
result = func(*args, **kwargs)
/usr/local/lib/python3.8/dist-packages/cudf/io/parquet.py:379: in read_parquet
) = _process_dataset(
/usr/local/lib/python3.8/dist-packages/nvtx/nvtx.py:101: in inner
result = func(*args, **kwargs)
/usr/local/lib/python3.8/dist-packages/cudf/io/parquet.py:205: in _process_dataset
dataset = ds.dataset(
/usr/local/lib/python3.8/dist-packages/pyarrow/dataset.py:683: in dataset
return _filesystem_dataset(source, **kwargs)
/usr/local/lib/python3.8/dist-packages/pyarrow/dataset.py:435: in _filesystem_dataset
return factory.finish(schema)
pyarrow/_dataset.pyx:2473: in pyarrow._dataset.DatasetFactory.finish
???
pyarrow/error.pxi:143: in pyarrow.lib.pyarrow_internal_check_status
???


???
E pyarrow.lib.ArrowInvalid: Error creating dataset. Could not read schema from '/tmp/pytest-of-jenkins/pytest-5/test_dask_dataset_from_datafra7/part_0.parquet': Could not open Parquet input source '/tmp/pytest-of-jenkins/pytest-5/test_dask_dataset_from_datafra7/part_0.parquet': Parquet file size is 0 bytes. Is this a 'parquet' file?

pyarrow/error.pxi:99: ArrowInvalid
________________________ test_value_count[value_count0] ________________________

value_count = {'max': 0, 'min': 0}

@pytest.mark.parametrize(
    "value_count", [{"min": 0, "max": 0}, {"min": 1, "max": 1}, {"min": 1, "max": 2}]
)
def test_value_count(value_count):
    schema = Schema(
        [
            ColumnSchema(
                "example",
                properties={
                    "value_count": value_count,
                },
                is_list=True,
            )
        ]
    )
    json_schema = TensorflowMetadata.from_merlin_schema(schema).to_json()
    output_schema = TensorflowMetadata.from_json(json_schema).to_merlin_schema()
  assert output_schema == schema

E AssertionError: assert [{'name': 'ex...gged': False}] == [{'name': 'ex...gged': False}]
E Full diff:
E - [{'name': 'example', 'tags': set(), 'properties': {'value_count': {'min': 0, 'max': 0}}, 'dtype': dtype('float64'), 'is_list': True, 'is_ragged': False}]
E ? -----------------------------------
E + [{'name': 'example', 'tags': set(), 'properties': {}, 'dtype': dtype('float64'), 'is_list': True, 'is_ragged': False}]

tests/unit/schema/test_schema_io.py:93: AssertionError
________________________ test_value_count[value_count1] ________________________

value_count = {'max': 1, 'min': 1}

@pytest.mark.parametrize(
    "value_count", [{"min": 0, "max": 0}, {"min": 1, "max": 1}, {"min": 1, "max": 2}]
)
def test_value_count(value_count):
    schema = Schema(
        [
            ColumnSchema(
                "example",
                properties={
                    "value_count": value_count,
                },
                is_list=True,
            )
        ]
    )
  json_schema = TensorflowMetadata.from_merlin_schema(schema).to_json()

tests/unit/schema/test_schema_io.py:91:


merlin/schema/io/tensorflow_metadata.py:216: in to_json
return self.proto_schema.to_json()
/usr/local/lib/python3.8/dist-packages/betterproto/__init__.py:909: in to_json
return json.dumps(self.to_dict(), indent=indent)
/usr/local/lib/python3.8/dist-packages/betterproto/__init__.py:811: in to_dict
v = [i.to_dict(casing, include_default_values) for i in v]
/usr/local/lib/python3.8/dist-packages/betterproto/__init__.py:811: in <listcomp>
v = [i.to_dict(casing, include_default_values) for i in v]
/usr/local/lib/python3.8/dist-packages/betterproto/__init__.py:816: in to_dict
output[cased_name] = v.to_dict(casing, include_default_values)


self = FixedShape(dim=1), casing = <function camelcase at 0x7f234bb6d700>
include_default_values = False

def to_dict(
    self, casing: Casing = Casing.CAMEL, include_default_values: bool = False
) -> dict:
    """
    Returns a dict representation of this message instance which can be
    used to serialize to e.g. JSON. Defaults to camel casing for
    compatibility but can be set to other modes.

    `include_default_values` can be set to `True` to include default
    values of fields. E.g. an `int32` type field with `0` value will
    not be in returned dict if `include_default_values` is set to
    `False`.
    """
    output: Dict[str, Any] = {}
    for field in dataclasses.fields(self):
        meta = FieldMetadata.get(field)
        v = getattr(self, field.name)
        cased_name = casing(field.name).rstrip("_")  # type: ignore
        if meta.proto_type == "message":
            if isinstance(v, datetime):
                if v != DATETIME_ZERO or include_default_values:
                    output[cased_name] = _Timestamp.timestamp_to_json(v)
            elif isinstance(v, timedelta):
                if v != timedelta(0) or include_default_values:
                    output[cased_name] = _Duration.delta_to_json(v)
            elif meta.wraps:
                if v is not None or include_default_values:
                    output[cased_name] = v
            elif isinstance(v, list):
                # Convert each item.
                v = [i.to_dict(casing, include_default_values) for i in v]
                if v or include_default_values:
                    output[cased_name] = v
            else:
              if v._serialized_on_wire or include_default_values:

E AttributeError: 'int' object has no attribute '_serialized_on_wire'

/usr/local/lib/python3.8/dist-packages/betterproto/__init__.py:815: AttributeError
________________________ test_value_count[value_count2] ________________________

value_count = {'max': 2, 'min': 1}

@pytest.mark.parametrize(
    "value_count", [{"min": 0, "max": 0}, {"min": 1, "max": 1}, {"min": 1, "max": 2}]
)
def test_value_count(value_count):
    schema = Schema(
        [
            ColumnSchema(
                "example",
                properties={
                    "value_count": value_count,
                },
                is_list=True,
            )
        ]
    )
    json_schema = TensorflowMetadata.from_merlin_schema(schema).to_json()
    output_schema = TensorflowMetadata.from_json(json_schema).to_merlin_schema()
  assert output_schema == schema

E AssertionError: assert [{'name': 'ex...agged': True}] == [{'name': 'ex...gged': False}]
E Full diff:
E - [{'name': 'example', 'tags': set(), 'properties': {'value_count': {'min': 1, 'max': 2}}, 'dtype': dtype('float64'), 'is_list': True, 'is_ragged': False}]
E ? ^^^^
E + [{'name': 'example', 'tags': set(), 'properties': {'value_count': {'min': 1, 'max': 2}}, 'dtype': dtype('float64'), 'is_list': True, 'is_ragged': True}]
E ? ^^^

tests/unit/schema/test_schema_io.py:93: AssertionError
=============================== warnings summary ===============================
tests/unit/dag/test_base_operator.py: 4 warnings
tests/unit/io/test_io.py: 71 warnings
/usr/local/lib/python3.8/dist-packages/cudf/core/frame.py:384: UserWarning: The deep parameter is ignored and is only included for pandas compatibility.
warnings.warn(

tests/unit/io/test_io.py::test_validate_and_regenerate_dataset
/var/jenkins_home/workspace/merlin_core/core/merlin/io/parquet.py:551: DeprecationWarning: 'ParquetDataset.pieces' attribute is deprecated as of pyarrow 5.0.0 and will be removed in a future version. Specify 'use_legacy_dataset=False' while constructing the ParquetDataset, and then use the '.fragments' attribute instead.
paths = [p.path for p in pa_dataset.pieces]

tests/unit/utils/test_utils.py::test_nvt_distributed[True-True]
/usr/local/lib/python3.8/dist-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 40849 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed[True-False]
/usr/local/lib/python3.8/dist-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 45103 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed[False-True]
/usr/local/lib/python3.8/dist-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 46801 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed[False-False]
/usr/local/lib/python3.8/dist-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 39641 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed_force[True]
/usr/local/lib/python3.8/dist-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 34735 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed_force[False]
/usr/local/lib/python3.8/dist-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 44247 instead
warnings.warn(

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
=========================== short test summary info ============================
FAILED tests/unit/io/test_io.py::test_dask_dataset_from_dataframe[True-cudf]
FAILED tests/unit/io/test_io.py::test_dask_dataset_from_dataframe[True-dask_cudf]
FAILED tests/unit/io/test_io.py::test_dask_dataset_from_dataframe[True-pd] - ...
FAILED tests/unit/io/test_io.py::test_dask_dataset_from_dataframe[True-dd] - ...
FAILED tests/unit/schema/test_schema_io.py::test_value_count[value_count0] - ...
FAILED tests/unit/schema/test_schema_io.py::test_value_count[value_count1] - ...
FAILED tests/unit/schema/test_schema_io.py::test_value_count[value_count2] - ...
============ 7 failed, 339 passed, 1 skipped, 82 warnings in 51.75s ============
Build step 'Execute shell' marked build as failure
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA-Merlin/core/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[merlin_core] $ /bin/bash /tmp/jenkins18257896002267801304.sh

@viswa-nvidia viswa-nvidia added this to the Merlin 22.08 milestone Aug 5, 2022
@nvidia-merlin-bot

Click to view CI Results
GitHub pull request #111 of commit d2b21ecc5495fb500d784ea9b3ee00c37f5950c0, no merge conflicts.
Running as SYSTEM
Setting status of d2b21ecc5495fb500d784ea9b3ee00c37f5950c0 to PENDING with url https://10.20.13.93:8080/job/merlin_core/102/console and message: 'Pending'
Using context: Jenkins
Building on master in workspace /var/jenkins_home/workspace/merlin_core
using credential ce87ff3c-94f0-400a-8303-cb4acb4918b5
 > git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > git config remote.origin.url /~https://github.com/NVIDIA-Merlin/core # timeout=10
Fetching upstream changes from /~https://github.com/NVIDIA-Merlin/core
 > git --version # timeout=10
using GIT_ASKPASS to set credentials login for merlin-systems username and pass
 > git fetch --tags --force --progress -- /~https://github.com/NVIDIA-Merlin/core +refs/pull/111/*:refs/remotes/origin/pr/111/* # timeout=10
 > git rev-parse d2b21ecc5495fb500d784ea9b3ee00c37f5950c0^{commit} # timeout=10
Checking out Revision d2b21ecc5495fb500d784ea9b3ee00c37f5950c0 (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f d2b21ecc5495fb500d784ea9b3ee00c37f5950c0 # timeout=10
Commit message: "Merge branch 'main' into schema-io-value-count"
 > git rev-list --no-walk 5c431330c7a782eae5e4bcef3a628ce76281825c # timeout=10
[merlin_core] $ /bin/bash /tmp/jenkins17375995638170082886.sh
============================= test session starts ==============================
platform linux -- Python 3.8.10, pytest-7.1.2, pluggy-1.0.0
rootdir: /var/jenkins_home/workspace/merlin_core/core, configfile: pyproject.toml
plugins: anyio-3.6.1, xdist-2.5.0, forked-1.4.0, cov-3.0.0
collected 346 items / 1 skipped

tests/unit/core/test_dispatch.py .. [ 0%]
tests/unit/dag/test_base_operator.py .... [ 1%]
tests/unit/dag/test_column_selector.py .......................... [ 9%]
tests/unit/dag/test_graph.py . [ 9%]
tests/unit/dag/test_tags.py ...... [ 11%]
tests/unit/dag/ops/test_selection.py ... [ 12%]
tests/unit/io/test_io.py ............................................... [ 25%]
................................................................ [ 44%]
tests/unit/schema/test_column_schemas.py ............................... [ 53%]
........................................................................ [ 73%]
....................................................................... [ 94%]
tests/unit/schema/test_schema.py ...... [ 96%]
tests/unit/schema/test_schema_io.py ..FFF [ 97%]
tests/unit/utils/test_utils.py ........ [100%]

=================================== FAILURES ===================================
________________________ test_value_count[value_count0] ________________________

value_count = {'max': 0, 'min': 0}

@pytest.mark.parametrize(
    "value_count", [{"min": 0, "max": 0}, {"min": 1, "max": 1}, {"min": 1, "max": 2}]
)
def test_value_count(value_count):
    schema = Schema(
        [
            ColumnSchema(
                "example",
                properties={
                    "value_count": value_count,
                },
                is_list=True,
            )
        ]
    )
    json_schema = TensorflowMetadata.from_merlin_schema(schema).to_json()
    output_schema = TensorflowMetadata.from_json(json_schema).to_merlin_schema()
  assert output_schema == schema

E AssertionError: assert [{'name': 'ex...gged': False}] == [{'name': 'ex...gged': False}]
E Full diff:
E - [{'name': 'example', 'tags': set(), 'properties': {'value_count': {'min': 0, 'max': 0}}, 'dtype': dtype('float64'), 'is_list': True, 'is_ragged': False}]
E ? -----------------------------------
E + [{'name': 'example', 'tags': set(), 'properties': {}, 'dtype': dtype('float64'), 'is_list': True, 'is_ragged': False}]

tests/unit/schema/test_schema_io.py:93: AssertionError
________________________ test_value_count[value_count1] ________________________

value_count = {'max': 1, 'min': 1}

@pytest.mark.parametrize(
    "value_count", [{"min": 0, "max": 0}, {"min": 1, "max": 1}, {"min": 1, "max": 2}]
)
def test_value_count(value_count):
    schema = Schema(
        [
            ColumnSchema(
                "example",
                properties={
                    "value_count": value_count,
                },
                is_list=True,
            )
        ]
    )
  json_schema = TensorflowMetadata.from_merlin_schema(schema).to_json()

tests/unit/schema/test_schema_io.py:91:


merlin/schema/io/tensorflow_metadata.py:216: in to_json
return self.proto_schema.to_json()
/usr/local/lib/python3.8/dist-packages/betterproto/__init__.py:909: in to_json
return json.dumps(self.to_dict(), indent=indent)
/usr/local/lib/python3.8/dist-packages/betterproto/__init__.py:811: in to_dict
v = [i.to_dict(casing, include_default_values) for i in v]
/usr/local/lib/python3.8/dist-packages/betterproto/__init__.py:811: in <listcomp>
v = [i.to_dict(casing, include_default_values) for i in v]
/usr/local/lib/python3.8/dist-packages/betterproto/__init__.py:816: in to_dict
output[cased_name] = v.to_dict(casing, include_default_values)


self = FixedShape(dim=1), casing = <function camelcase at 0x7fd7a8680820>
include_default_values = False

def to_dict(
    self, casing: Casing = Casing.CAMEL, include_default_values: bool = False
) -> dict:
    """
    Returns a dict representation of this message instance which can be
    used to serialize to e.g. JSON. Defaults to camel casing for
    compatibility but can be set to other modes.

    `include_default_values` can be set to `True` to include default
    values of fields. E.g. an `int32` type field with `0` value will
    not be in returned dict if `include_default_values` is set to
    `False`.
    """
    output: Dict[str, Any] = {}
    for field in dataclasses.fields(self):
        meta = FieldMetadata.get(field)
        v = getattr(self, field.name)
        cased_name = casing(field.name).rstrip("_")  # type: ignore
        if meta.proto_type == "message":
            if isinstance(v, datetime):
                if v != DATETIME_ZERO or include_default_values:
                    output[cased_name] = _Timestamp.timestamp_to_json(v)
            elif isinstance(v, timedelta):
                if v != timedelta(0) or include_default_values:
                    output[cased_name] = _Duration.delta_to_json(v)
            elif meta.wraps:
                if v is not None or include_default_values:
                    output[cased_name] = v
            elif isinstance(v, list):
                # Convert each item.
                v = [i.to_dict(casing, include_default_values) for i in v]
                if v or include_default_values:
                    output[cased_name] = v
            else:
              if v._serialized_on_wire or include_default_values:

E AttributeError: 'int' object has no attribute '_serialized_on_wire'

/usr/local/lib/python3.8/dist-packages/betterproto/__init__.py:815: AttributeError
________________________ test_value_count[value_count2] ________________________

value_count = {'max': 2, 'min': 1}

@pytest.mark.parametrize(
    "value_count", [{"min": 0, "max": 0}, {"min": 1, "max": 1}, {"min": 1, "max": 2}]
)
def test_value_count(value_count):
    schema = Schema(
        [
            ColumnSchema(
                "example",
                properties={
                    "value_count": value_count,
                },
                is_list=True,
            )
        ]
    )
    json_schema = TensorflowMetadata.from_merlin_schema(schema).to_json()
    output_schema = TensorflowMetadata.from_json(json_schema).to_merlin_schema()
  assert output_schema == schema

E AssertionError: assert [{'name': 'ex...agged': True}] == [{'name': 'ex...gged': False}]
E Full diff:
E - [{'name': 'example', 'tags': set(), 'properties': {'value_count': {'min': 1, 'max': 2}}, 'dtype': dtype('float64'), 'is_list': True, 'is_ragged': False}]
E ? ^^^^
E + [{'name': 'example', 'tags': set(), 'properties': {'value_count': {'min': 1, 'max': 2}}, 'dtype': dtype('float64'), 'is_list': True, 'is_ragged': True}]
E ? ^^^

tests/unit/schema/test_schema_io.py:93: AssertionError
=============================== warnings summary ===============================
tests/unit/dag/test_base_operator.py: 4 warnings
tests/unit/io/test_io.py: 71 warnings
/usr/local/lib/python3.8/dist-packages/cudf/core/frame.py:384: UserWarning: The deep parameter is ignored and is only included for pandas compatibility.
warnings.warn(

tests/unit/io/test_io.py::test_validate_and_regenerate_dataset
/var/jenkins_home/workspace/merlin_core/core/merlin/io/parquet.py:551: DeprecationWarning: 'ParquetDataset.pieces' attribute is deprecated as of pyarrow 5.0.0 and will be removed in a future version. Specify 'use_legacy_dataset=False' while constructing the ParquetDataset, and then use the '.fragments' attribute instead.
paths = [p.path for p in pa_dataset.pieces]

tests/unit/utils/test_utils.py::test_serial_context[True]
/usr/local/lib/python3.8/dist-packages/tornado/ioloop.py:350: DeprecationWarning: make_current is deprecated; start the event loop first
self.make_current()

tests/unit/utils/test_utils.py::test_nvt_distributed[True-True]
/usr/local/lib/python3.8/dist-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 33193 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed[True-False]
/usr/local/lib/python3.8/dist-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 45721 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed[False-True]
/usr/local/lib/python3.8/dist-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 35157 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed[False-False]
/usr/local/lib/python3.8/dist-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 37185 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed_force[True]
/usr/local/lib/python3.8/dist-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 42931 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed_force[False]
/usr/local/lib/python3.8/dist-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 44417 instead
warnings.warn(

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
=========================== short test summary info ============================
FAILED tests/unit/schema/test_schema_io.py::test_value_count[value_count0] - ...
FAILED tests/unit/schema/test_schema_io.py::test_value_count[value_count1] - ...
FAILED tests/unit/schema/test_schema_io.py::test_value_count[value_count2] - ...
============ 3 failed, 343 passed, 1 skipped, 83 warnings in 54.73s ============
Build step 'Execute shell' marked build as failure
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA-Merlin/core/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[merlin_core] $ /bin/bash /tmp/jenkins5560053937149570237.sh

@nvidia-merlin-bot

Click to view CI Results
GitHub pull request #111 of commit 6b6531116bfe32ee2e932e8d199de5de924c6724, no merge conflicts.
Running as SYSTEM
Setting status of 6b6531116bfe32ee2e932e8d199de5de924c6724 to PENDING with url http://merlin-infra1.nvidia.com:8080/job/merlin_core/288/ and message: 'Pending'
Using context: Jenkins
Building on master in workspace /var/jenkins_home/workspace/merlin_core
using credential ce87ff3c-94f0-400a-8303-cb4acb4918b5
 > git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > git config remote.origin.url /~https://github.com/NVIDIA-Merlin/core # timeout=10
Fetching upstream changes from /~https://github.com/NVIDIA-Merlin/core
 > git --version # timeout=10
using GIT_ASKPASS to set credentials login for merlin-systems username and pass
 > git fetch --tags --force --progress -- /~https://github.com/NVIDIA-Merlin/core +refs/pull/111/*:refs/remotes/origin/pr/111/* # timeout=10
 > git rev-parse 6b6531116bfe32ee2e932e8d199de5de924c6724^{commit} # timeout=10
Checking out Revision 6b6531116bfe32ee2e932e8d199de5de924c6724 (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 6b6531116bfe32ee2e932e8d199de5de924c6724 # timeout=10
Commit message: "Merge branch 'main' into schema-io-value-count"
 > git rev-list --no-walk 1538d39f5e2ec8b983f343ec87dc6db6cf11a0f5 # timeout=10
[merlin_core] $ /bin/bash /tmp/jenkins775823410952237213.sh
GLOB sdist-make: /var/jenkins_home/workspace/merlin_core/core/setup.py
test-gpu recreate: /var/jenkins_home/workspace/merlin_core/core/.tox/test-gpu
test-gpu installdeps: pytest, pytest-cov
WARNING: Discarding $PYTHONPATH from environment, to override specify PYTHONPATH in 'passenv' in your configuration.
test-gpu inst: /var/jenkins_home/workspace/merlin_core/core/.tox/.tmp/package/1/merlin-core-0.8.0+16.g6b65311.zip
WARNING: Discarding $PYTHONPATH from environment, to override specify PYTHONPATH in 'passenv' in your configuration.
test-gpu installed: absl-py==1.2.0,aiohttp==3.8.1,aiosignal==1.2.0,alabaster==0.7.12,alembic==1.8.1,anyio==3.6.1,argon2-cffi==21.3.0,argon2-cffi-bindings==21.2.0,astroid==2.5.6,asttokens==2.0.8,astunparse==1.6.3,asv==0.5.1,asvdb==0.4.2,async-timeout==4.0.2,attrs==22.1.0,autopage==0.5.1,awscli==1.27.12,Babel==2.10.3,backcall==0.2.0,beautifulsoup4==4.11.1,betterproto==1.2.5,black==22.6.0,bleach==5.0.1,boto3==1.24.75,botocore==1.29.12,Brotli==1.0.9,cachetools==5.2.0,certifi==2019.11.28,cffi==1.15.1,chardet==3.0.4,charset-normalizer==2.1.1,clang==5.0,click==8.1.3,cliff==4.0.0,cloudpickle==2.2.0,cmaes==0.9.0,cmake==3.24.1.1,cmd2==2.4.2,colorama==0.4.4,colorlog==6.7.0,contourpy==1.0.5,coverage==6.5.0,cuda-python==11.7.1,cupy-cuda117==10.6.0,cycler==0.11.0,Cython==0.29.32,dask==2022.1.1,dbus-python==1.2.16,debugpy==1.6.3,decorator==5.1.1,defusedxml==0.7.1,dill==0.3.5.1,distlib==0.3.6,distributed==2022.5.1,distro==1.7.0,dm-tree==0.1.6,docker-pycreds==0.4.0,docutils==0.16,emoji==1.7.0,entrypoints==0.4,execnet==1.9.0,executing==1.0.0,faiss==1.7.2,faiss-gpu==1.7.2,fastai==2.7.9,fastapi==0.85.0,fastavro==1.6.1,fastcore==1.5.27,fastdownload==0.0.7,fastjsonschema==2.16.1,fastprogress==1.0.3,fastrlock==0.8,feast==0.19.4,fiddle==0.2.2,filelock==3.8.0,flatbuffers==1.12,fonttools==4.37.3,frozenlist==1.3.1,fsspec==2022.5.0,gast==0.4.0,gevent==21.12.0,geventhttpclient==2.0.2,gitdb==4.0.9,GitPython==3.1.27,google==3.0.0,google-api-core==2.10.1,google-auth==2.11.1,google-auth-oauthlib==0.4.6,google-pasta==0.2.0,googleapis-common-protos==1.52.0,graphviz==0.20.1,greenlet==1.1.3,grpcio==1.41.0,grpcio-channelz==1.49.0,grpcio-reflection==1.48.1,grpclib==0.4.3,h11==0.13.0,h2==4.1.0,h5py==3.7.0,HeapDict==1.0.1,horovod==0.26.1,hpack==4.0.0,httptools==0.5.0,hugectr2onnx==0.0.0,huggingface-hub==0.9.1,hyperframe==6.0.1,idna==2.8,imagesize==1.4.1,implicit==0.6.1,importlib-metadata==4.12.0,importlib-resources==5.9.0,iniconfig==1.1.1,ipykernel==6.15.3,ipython==8.5.0,ipython-genutils==0.2.0,ipywidgets==7.7.0,jedi==0.18.1,Jinja2==3.1.2,jmespath==1.0.1,joblib==1.2.0,json5==0.9.10,jsonschema==4.16.0,jupyter-cache==0.4.3,jupyter-core==4.11.1,jupyter-server==1.18.1,jupyter-server-mathjax==0.2.5,jupyter-sphinx==0.3.2,jupyter_client==7.3.5,jupyterlab==3.4.7,jupyterlab-pygments==0.2.2,jupyterlab-widgets==1.1.0,jupyterlab_server==2.15.1,keras==2.9.0,Keras-Preprocessing==1.1.2,kiwisolver==1.4.4,lazy-object-proxy==1.8.0,libclang==14.0.6,libcst==0.4.7,lightfm==1.16,lightgbm==3.3.2,linkify-it-py==1.0.3,llvmlite==0.39.1,locket==1.0.0,lxml==4.9.1,Mako==1.2.4,Markdown==3.4.1,markdown-it-py==1.1.0,MarkupSafe==2.1.1,matplotlib==3.6.0,matplotlib-inline==0.1.6,mdit-py-plugins==0.2.8,merlin-core==0.8.0+16.g6b65311,merlin-models==0.7.0+11.g280956aa4,merlin-systems==0.5.0+4.g15074ad,mistune==2.0.4,mmh3==3.0.0,mpi4py==3.1.3,msgpack==1.0.4,multidict==6.0.2,mypy-extensions==0.4.3,myst-nb==0.13.2,myst-parser==0.15.2,natsort==8.1.0,nbclassic==0.4.3,nbclient==0.6.8,nbconvert==7.0.0,nbdime==3.1.1,nbformat==5.5.0,nest-asyncio==1.5.5,ninja==1.10.2.3,notebook==6.4.12,notebook-shim==0.1.0,numba==0.56.2,numpy==1.22.4,nvidia-pyindex==1.0.9,# Editable install with no version control (nvtabular==1.4.0+8.g95e12d347),-e 
/usr/local/lib/python3.8/dist-packages,nvtx==0.2.5,oauthlib==3.2.1,oldest-supported-numpy==2022.8.16,onnx==1.12.0,onnxruntime==1.11.1,opt-einsum==3.3.0,optuna==3.0.3,packaging==21.3,pandas==1.3.5,pandavro==1.5.2,pandocfilters==1.5.0,parso==0.8.3,partd==1.3.0,pathtools==0.1.2,pbr==5.11.0,pexpect==4.8.0,pickleshare==0.7.5,Pillow==9.2.0,pkgutil_resolve_name==1.3.10,platformdirs==2.5.2,plotly==5.11.0,pluggy==1.0.0,prettytable==3.5.0,prometheus-client==0.14.1,promise==2.3,prompt-toolkit==3.0.31,proto-plus==1.19.6,protobuf==3.19.5,psutil==5.9.2,ptyprocess==0.7.0,pure-eval==0.2.2,py==1.11.0,pyarrow==7.0.0,pyasn1==0.4.8,pyasn1-modules==0.2.8,pybind11==2.10.0,pycparser==2.21,pydantic==1.10.2,pydot==1.4.2,Pygments==2.13.0,PyGObject==3.36.0,pynvml==11.4.1,pyparsing==3.0.9,pyperclip==1.8.2,pyrsistent==0.18.1,pytest==7.1.3,pytest-cov==4.0.0,pytest-xdist==3.0.2,python-apt==2.0.0+ubuntu0.20.4.8,python-dateutil==2.8.2,python-dotenv==0.21.0,python-rapidjson==1.8,pytz==2022.2.1,PyYAML==5.4.1,pyzmq==24.0.0,regex==2022.9.13,requests==2.22.0,requests-oauthlib==1.3.1,requests-unixsocket==0.2.0,rsa==4.7.2,s3fs==2022.2.0,s3transfer==0.6.0,sacremoses==0.0.53,scikit-build==0.15.0,scikit-learn==1.1.2,scipy==1.8.1,seedir==0.3.0,Send2Trash==1.8.0,sentry-sdk==1.9.8,setproctitle==1.3.2,setuptools-scm==7.0.5,shortuuid==1.0.9,six==1.15.0,sklearn==0.0,smmap==5.0.0,sniffio==1.3.0,snowballstemmer==2.2.0,sortedcontainers==2.4.0,soupsieve==2.3.2.post1,Sphinx==5.3.0,sphinx-multiversion==0.2.4,sphinx-togglebutton==0.3.1,sphinx_external_toc==0.3.0,sphinxcontrib-applehelp==1.0.2,sphinxcontrib-copydirs @ git+/~https://github.com/mikemckiernan/sphinxcontrib-copydirs.git@bd8c5d79b3f91cf5f1bb0d6995aeca3fe84b670e,sphinxcontrib-devhelp==1.0.2,sphinxcontrib-htmlhelp==2.0.0,sphinxcontrib-jsmath==1.0.1,sphinxcontrib-qthelp==1.0.3,sphinxcontrib-serializinghtml==1.1.5,SQLAlchemy==1.4.44,stack-data==0.5.0,starlette==0.20.4,stevedore==4.1.1,stringcase==1.2.0,supervisor==4.1.0,tabulate==0.8.10,tblib==1.7.0,tdqm==0.0.1,tenacity==8.0.1,tensorboard==2.9.1,tensorboard-data-server==0.6.1,tensorboard-plugin-wit==1.8.1,tensorflow==2.9.2,tensorflow-estimator==2.9.0,tensorflow-gpu==2.9.2,tensorflow-io-gcs-filesystem==0.27.0,tensorflow-metadata==1.10.0,termcolor==2.0.1,terminado==0.15.0,testbook==0.4.2,threadpoolctl==3.1.0,tinycss2==1.1.1,tokenizers==0.10.3,toml==0.10.2,tomli==2.0.1,toolz==0.12.0,torch==1.12.1+cu113,torchmetrics==0.3.2,tornado==6.2,tox==3.26.0,tqdm==4.64.1,traitlets==5.4.0,transformers==4.12.0,transformers4rec==0.1.12+2.gbcc939255,treelite==2.3.0,treelite-runtime==2.3.0,tritonclient==2.25.0,typing-inspect==0.8.0,typing_extensions==4.3.0,uc-micro-py==1.0.1,urllib3==1.26.12,uvicorn==0.18.3,uvloop==0.17.0,versioneer==0.20,virtualenv==20.16.5,wandb==0.13.3,watchfiles==0.17.0,wcwidth==0.2.5,webencodings==0.5.1,websocket-client==1.4.1,websockets==10.3,Werkzeug==2.2.2,widgetsnbextension==3.6.0,wrapt==1.12.1,xgboost==1.6.2,yarl==1.8.1,zict==2.2.0,zipp==3.8.1,zope.event==4.5.0,zope.interface==5.4.0
test-gpu run-test-pre: PYTHONHASHSEED='1141691438'
test-gpu run-test: commands[0] | python -m pytest --cov-report term --cov merlin -rxs tests/unit
============================= test session starts ==============================
platform linux -- Python 3.8.10, pytest-7.1.3, pluggy-1.0.0
cachedir: .tox/test-gpu/.pytest_cache
rootdir: /var/jenkins_home/workspace/merlin_core/core, configfile: pyproject.toml
plugins: anyio-3.5.0, xdist-3.0.2, cov-4.0.0
collected 390 items / 1 skipped

tests/unit/core/test_dispatch.py .... [ 1%]
tests/unit/core/test_protocols.py ......... [ 3%]
tests/unit/core/test_version.py . [ 3%]
tests/unit/dag/test_base_operator.py .... [ 4%]
tests/unit/dag/test_column_selector.py .............................. [ 12%]
tests/unit/dag/test_dictarray.py ... [ 13%]
tests/unit/dag/test_executors.py .. [ 13%]
tests/unit/dag/test_graph.py .... [ 14%]
tests/unit/dag/ops/test_selection.py .... [ 15%]
tests/unit/io/test_io.py ............................................... [ 27%]
................................................................ [ 44%]
tests/unit/schema/test_column_schemas.py ....................... [ 50%]
tests/unit/schema/test_schema.py ............. [ 53%]
tests/unit/schema/test_schema_io.py ..FF................................ [ 62%]
........................................................................ [ 81%]
........................................................... [ 96%]
tests/unit/schema/test_tags.py ....... [ 97%]
tests/unit/utils/test_utils.py ........ [100%]

=================================== FAILURES ===================================
________________________ test_value_count[value_count0] ________________________

value_count = {'max': 0, 'min': 0}

@pytest.mark.parametrize(
    "value_count", [{"min": 0, "max": 0}, {"min": 1, "max": 1}, {"min": 1, "max": 2}]
)
def test_value_count(value_count):
    schema = Schema(
        [
            ColumnSchema(
                "example",
                properties={
                    "value_count": value_count,
                },
                is_list=True,
            )
        ]
    )
    json_schema = TensorflowMetadata.from_merlin_schema(schema).to_json()
    output_schema = TensorflowMetadata.from_json(json_schema).to_merlin_schema()
  assert output_schema == schema

E AssertionError: assert [{'name': 'ex...agged': True}] == [{'name': 'ex...agged': True}]
E Use -v to get more diff

tests/unit/schema/test_schema_io.py:97: AssertionError
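
This first failure is the silent round-trip case from the PR description: value_count = {'min': 0, 'max': 0} serializes without error, but the schema that comes back no longer matches. A minimal sketch of the mechanism, assuming only that the JSON encoder omits default-valued (zero) fields the way betterproto's to_dict docstring above says it does; serialize/deserialize here are illustrative stand-ins, not the Merlin code:

def serialize(properties):
    # Protobuf-style JSON encoders drop zero-valued (default) fields,
    # so {"min": 0, "max": 0} encodes to an empty message.
    vc = {k: v for k, v in properties.get("value_count", {}).items() if v != 0}
    return {"value_count": vc} if vc else {}

def deserialize(payload):
    # An absent message can't be told apart from one that held only defaults.
    return dict(payload)

props = {"value_count": {"min": 0, "max": 0}}
assert serialize(props) == {}                  # the property vanishes on the wire
assert deserialize(serialize(props)) != props  # so the round trip isn't lossless
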
________________________ test_value_count[value_count1] ________________________

value_count = {'max': 1, 'min': 1}

@pytest.mark.parametrize(
    "value_count", [{"min": 0, "max": 0}, {"min": 1, "max": 1}, {"min": 1, "max": 2}]
)
def test_value_count(value_count):
    schema = Schema(
        [
            ColumnSchema(
                "example",
                properties={
                    "value_count": value_count,
                },
                is_list=True,
            )
        ]
    )
  json_schema = TensorflowMetadata.from_merlin_schema(schema).to_json()

tests/unit/schema/test_schema_io.py:95:


merlin/schema/io/tensorflow_metadata.py:216: in to_json
return self.proto_schema.to_json()
/usr/local/lib/python3.8/dist-packages/betterproto/__init__.py:909: in to_json
return json.dumps(self.to_dict(), indent=indent)
/usr/local/lib/python3.8/dist-packages/betterproto/__init__.py:811: in to_dict
v = [i.to_dict(casing, include_default_values) for i in v]
/usr/local/lib/python3.8/dist-packages/betterproto/__init__.py:811: in <listcomp>
v = [i.to_dict(casing, include_default_values) for i in v]
/usr/local/lib/python3.8/dist-packages/betterproto/__init__.py:816: in to_dict
output[cased_name] = v.to_dict(casing, include_default_values)


self = FixedShape(dim=1), casing = <function camelcase at 0x7feac5d3f8b0>
include_default_values = False

def to_dict(
    self, casing: Casing = Casing.CAMEL, include_default_values: bool = False
) -> dict:
    """
    Returns a dict representation of this message instance which can be
    used to serialize to e.g. JSON. Defaults to camel casing for
    compatibility but can be set to other modes.

    `include_default_values` can be set to `True` to include default
    values of fields. E.g. an `int32` type field with `0` value will
    not be in returned dict if `include_default_values` is set to
    `False`.
    """
    output: Dict[str, Any] = {}
    for field in dataclasses.fields(self):
        meta = FieldMetadata.get(field)
        v = getattr(self, field.name)
        cased_name = casing(field.name).rstrip("_")  # type: ignore
        if meta.proto_type == "message":
            if isinstance(v, datetime):
                if v != DATETIME_ZERO or include_default_values:
                    output[cased_name] = _Timestamp.timestamp_to_json(v)
            elif isinstance(v, timedelta):
                if v != timedelta(0) or include_default_values:
                    output[cased_name] = _Duration.delta_to_json(v)
            elif meta.wraps:
                if v is not None or include_default_values:
                    output[cased_name] = v
            elif isinstance(v, list):
                # Convert each item.
                v = [i.to_dict(casing, include_default_values) for i in v]
                if v or include_default_values:
                    output[cased_name] = v
            else:
              if v._serialized_on_wire or include_default_values:

E AttributeError: 'int' object has no attribute '_serialized_on_wire'

/usr/local/lib/python3.8/dist-packages/betterproto/__init__.py:815: AttributeError
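
The second failure is the hard error for value_count = {'max': 1, 'min': 1}: the fixed-size code path ends up storing a bare int where betterproto's to_dict expects a message instance carrying _serialized_on_wire. A hypothetical mimic of that mismatch (Dim, its size field, and this to_dict are illustrative; only the attribute check mirrors the traceback):

from dataclasses import dataclass, field
from typing import Any, List

@dataclass
class Dim:                        # hypothetical wrapper message for one dimension
    size: int = 0
    _serialized_on_wire: bool = False

@dataclass
class FixedShape:                 # mimic: `dim` should hold message instances
    dim: List[Any] = field(default_factory=list)

    def to_dict(self):
        out = []
        for v in self.dim:
            v._serialized_on_wire           # AttributeError if v is a bare int
            out.append({"size": v.size})
        return {"dim": out}

FixedShape(dim=[Dim(1)]).to_dict()          # message instance: serializes fine
try:
    FixedShape(dim=[1]).to_dict()           # bare int: the failure mode above
except AttributeError as err:
    print(err)  # 'int' object has no attribute '_serialized_on_wire'
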
=============================== warnings summary ===============================
tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 6 files. Did not have enough partitions to create 7 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 8 files. Did not have enough partitions to create 9 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 8 files. Did not have enough partitions to create 10 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 8 files. Did not have enough partitions to create 11 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 12 files. Did not have enough partitions to create 13 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 12 files. Did not have enough partitions to create 14 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 12 files. Did not have enough partitions to create 15 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 12 files. Did not have enough partitions to create 16 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 12 files. Did not have enough partitions to create 17 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 12 files. Did not have enough partitions to create 18 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 12 files. Did not have enough partitions to create 19 files.
warnings.warn(

tests/unit/io/test_io.py::test_validate_and_regenerate_dataset
/var/jenkins_home/workspace/merlin_core/core/merlin/io/parquet.py:551: DeprecationWarning: 'ParquetDataset.pieces' attribute is deprecated as of pyarrow 5.0.0 and will be removed in a future version. Specify 'use_legacy_dataset=False' while constructing the ParquetDataset, and then use the '.fragments' attribute instead.
paths = [p.path for p in pa_dataset.pieces]

tests/unit/schema/test_column_schemas.py::test_column_schema_tags_normalize
tests/unit/schema/test_tags.py::test_tagset_atomizes_compound_tags
/var/jenkins_home/workspace/merlin_core/core/merlin/schema/tags.py:148: UserWarning: Compound tags like Tags.ITEM_ID have been deprecated and will be removed in a future version. Please use the atomic versions of these tags, like [<Tags.ITEM: 'item'>, <Tags.ID: 'id'>].
warnings.warn(

tests/unit/schema/test_schema_io.py::test_json_serialization_with_embedded_dicts
tests/unit/schema/test_schema_io.py::test_merlin_to_proto_to_json_to_merlin
tests/unit/schema/test_tags.py::test_tagset_atomizes_compound_tags
/var/jenkins_home/workspace/merlin_core/core/merlin/schema/tags.py:148: UserWarning: Compound tags like Tags.USER_ID have been deprecated and will be removed in a future version. Please use the atomic versions of these tags, like [<Tags.USER: 'user'>, <Tags.ID: 'id'>].
warnings.warn(

tests/unit/schema/test_tags.py::test_tagset_atomizes_compound_tags
/var/jenkins_home/workspace/merlin_core/core/merlin/schema/tags.py:148: UserWarning: Compound tags like Tags.SESSION_ID have been deprecated and will be removed in a future version. Please use the atomic versions of these tags, like [<Tags.SESSION: 'session'>, <Tags.ID: 'id'>].
warnings.warn(

tests/unit/schema/test_tags.py::test_tagset_atomizes_compound_tags
/var/jenkins_home/workspace/merlin_core/core/merlin/schema/tags.py:148: UserWarning: Compound tags like Tags.TEXT_TOKENIZED have been deprecated and will be removed in a future version. Please use the atomic versions of these tags, like [<Tags.TEXT: 'text'>, <Tags.TOKENIZED: 'tokenized'>].
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed[True-True]
/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 36059 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed[True-False]
/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 41935 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed[False-True]
/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 39425 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed[False-False]
/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 40581 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed_force[True]
/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 42777 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed_force[False]
/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 35983 instead
warnings.warn(

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html

---------- coverage: platform linux, python 3.8.10-final-0 -----------
Name Stmts Miss Cover

merlin/core/__init__.py 2 0 100%
merlin/core/_version.py 354 205 42%
merlin/core/compat.py 10 4 60%
merlin/core/dispatch.py 356 212 40%
merlin/core/protocols.py 99 45 55%
merlin/core/utils.py 195 56 71%
merlin/dag/__init__.py 5 0 100%
merlin/dag/base_operator.py 122 20 84%
merlin/dag/dictarray.py 55 15 73%
merlin/dag/executors.py 141 68 52%
merlin/dag/graph.py 99 35 65%
merlin/dag/node.py 344 161 53%
merlin/dag/ops/__init__.py 4 0 100%
merlin/dag/ops/concat_columns.py 17 4 76%
merlin/dag/ops/selection.py 22 0 100%
merlin/dag/ops/subset_columns.py 12 4 67%
merlin/dag/ops/subtraction.py 21 11 48%
merlin/dag/selector.py 101 6 94%
merlin/io/__init__.py 4 0 100%
merlin/io/avro.py 88 88 0%
merlin/io/csv.py 57 6 89%
merlin/io/dask.py 181 53 71%
merlin/io/dataframe_engine.py 61 5 92%
merlin/io/dataframe_iter.py 21 1 95%
merlin/io/dataset.py 347 54 84%
merlin/io/dataset_engine.py 37 8 78%
merlin/io/fsspec_utils.py 127 108 15%
merlin/io/hugectr.py 45 35 22%
merlin/io/parquet.py 603 69 89%
merlin/io/shuffle.py 38 12 68%
merlin/io/worker.py 80 66 18%
merlin/io/writer.py 190 52 73%
merlin/io/writer_factory.py 18 4 78%
merlin/schema/__init__.py 2 0 100%
merlin/schema/io/__init__.py 0 0 100%
merlin/schema/io/proto_utils.py 20 4 80%
merlin/schema/io/schema_bp.py 306 5 98%
merlin/schema/io/tensorflow_metadata.py 190 17 91%
merlin/schema/schema.py 209 31 85%
merlin/schema/tags.py 82 0 100%

TOTAL 4665 1464 69%

=========================== short test summary info ============================
SKIPPED [1] tests/unit/io/test_avro.py:34: could not import 'uavro': No module named 'uavro'
======= 2 failed, 388 passed, 1 skipped, 25 warnings in 66.26s (0:01:06) =======
ERROR: InvocationError for command /var/jenkins_home/workspace/merlin_core/core/.tox/test-gpu/bin/python -m pytest --cov-report term --cov merlin -rxs tests/unit (exited with code 1)
___________________________________ summary ____________________________________
ERROR: test-gpu: commands failed
Build step 'Execute shell' marked build as failure
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA-Merlin/core/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[merlin_core] $ /bin/bash /tmp/jenkins6122469300596003724.sh

@oliverholworthy oliverholworthy self-assigned this Nov 18, 2022
@nvidia-merlin-bot

Click to view CI Results
GitHub pull request #111 of commit 814472f0aa1b18d0f43bc01800899c1df3d952e3, no merge conflicts.
Running as SYSTEM
Setting status of 814472f0aa1b18d0f43bc01800899c1df3d952e3 to PENDING with url http://merlin-infra1.nvidia.com:8080/job/merlin_core/289/ and message: 'Pending'
Using context: Jenkins
Building on master in workspace /var/jenkins_home/workspace/merlin_core
using credential ce87ff3c-94f0-400a-8303-cb4acb4918b5
 > git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > git config remote.origin.url /~https://github.com/NVIDIA-Merlin/core # timeout=10
Fetching upstream changes from /~https://github.com/NVIDIA-Merlin/core
 > git --version # timeout=10
using GIT_ASKPASS to set credentials login for merlin-systems username and pass
 > git fetch --tags --force --progress -- /~https://github.com/NVIDIA-Merlin/core +refs/pull/111/*:refs/remotes/origin/pr/111/* # timeout=10
 > git rev-parse 814472f0aa1b18d0f43bc01800899c1df3d952e3^{commit} # timeout=10
Checking out Revision 814472f0aa1b18d0f43bc01800899c1df3d952e3 (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 814472f0aa1b18d0f43bc01800899c1df3d952e3 # timeout=10
Commit message: "Update value_count serizliation and default is_list/is_ragged."
 > git rev-list --no-walk 6b6531116bfe32ee2e932e8d199de5de924c6724 # timeout=10
[merlin_core] $ /bin/bash /tmp/jenkins14230189237701941417.sh
GLOB sdist-make: /var/jenkins_home/workspace/merlin_core/core/setup.py
test-gpu recreate: /var/jenkins_home/workspace/merlin_core/core/.tox/test-gpu
test-gpu installdeps: pytest, pytest-cov
WARNING: Discarding $PYTHONPATH from environment, to override specify PYTHONPATH in 'passenv' in your configuration.
test-gpu inst: /var/jenkins_home/workspace/merlin_core/core/.tox/.tmp/package/1/merlin-core-0.8.0+17.g814472f.zip
WARNING: Discarding $PYTHONPATH from environment, to override specify PYTHONPATH in 'passenv' in your configuration.
test-gpu installed: absl-py==1.2.0,aiohttp==3.8.1,aiosignal==1.2.0,alabaster==0.7.12,alembic==1.8.1,anyio==3.6.1,argon2-cffi==21.3.0,argon2-cffi-bindings==21.2.0,astroid==2.5.6,asttokens==2.0.8,astunparse==1.6.3,asv==0.5.1,asvdb==0.4.2,async-timeout==4.0.2,attrs==22.1.0,autopage==0.5.1,awscli==1.27.12,Babel==2.10.3,backcall==0.2.0,beautifulsoup4==4.11.1,betterproto==1.2.5,black==22.6.0,bleach==5.0.1,boto3==1.24.75,botocore==1.29.12,Brotli==1.0.9,cachetools==5.2.0,certifi==2019.11.28,cffi==1.15.1,chardet==3.0.4,charset-normalizer==2.1.1,clang==5.0,click==8.1.3,cliff==4.0.0,cloudpickle==2.2.0,cmaes==0.9.0,cmake==3.24.1.1,cmd2==2.4.2,colorama==0.4.4,colorlog==6.7.0,contourpy==1.0.5,coverage==6.5.0,cuda-python==11.7.1,cupy-cuda117==10.6.0,cycler==0.11.0,Cython==0.29.32,dask==2022.1.1,dbus-python==1.2.16,debugpy==1.6.3,decorator==5.1.1,defusedxml==0.7.1,dill==0.3.5.1,distlib==0.3.6,distributed==2022.5.1,distro==1.7.0,dm-tree==0.1.6,docker-pycreds==0.4.0,docutils==0.16,emoji==1.7.0,entrypoints==0.4,execnet==1.9.0,executing==1.0.0,faiss==1.7.2,faiss-gpu==1.7.2,fastai==2.7.9,fastapi==0.85.0,fastavro==1.6.1,fastcore==1.5.27,fastdownload==0.0.7,fastjsonschema==2.16.1,fastprogress==1.0.3,fastrlock==0.8,feast==0.19.4,fiddle==0.2.2,filelock==3.8.0,flatbuffers==1.12,fonttools==4.37.3,frozenlist==1.3.1,fsspec==2022.5.0,gast==0.4.0,gevent==21.12.0,geventhttpclient==2.0.2,gitdb==4.0.9,GitPython==3.1.27,google==3.0.0,google-api-core==2.10.1,google-auth==2.11.1,google-auth-oauthlib==0.4.6,google-pasta==0.2.0,googleapis-common-protos==1.52.0,graphviz==0.20.1,greenlet==1.1.3,grpcio==1.41.0,grpcio-channelz==1.49.0,grpcio-reflection==1.48.1,grpclib==0.4.3,h11==0.13.0,h2==4.1.0,h5py==3.7.0,HeapDict==1.0.1,horovod==0.26.1,hpack==4.0.0,httptools==0.5.0,hugectr2onnx==0.0.0,huggingface-hub==0.9.1,hyperframe==6.0.1,idna==2.8,imagesize==1.4.1,implicit==0.6.1,importlib-metadata==4.12.0,importlib-resources==5.9.0,iniconfig==1.1.1,ipykernel==6.15.3,ipython==8.5.0,ipython-genutils==0.2.0,ipywidgets==7.7.0,jedi==0.18.1,Jinja2==3.1.2,jmespath==1.0.1,joblib==1.2.0,json5==0.9.10,jsonschema==4.16.0,jupyter-cache==0.4.3,jupyter-core==4.11.1,jupyter-server==1.18.1,jupyter-server-mathjax==0.2.5,jupyter-sphinx==0.3.2,jupyter_client==7.3.5,jupyterlab==3.4.7,jupyterlab-pygments==0.2.2,jupyterlab-widgets==1.1.0,jupyterlab_server==2.15.1,keras==2.9.0,Keras-Preprocessing==1.1.2,kiwisolver==1.4.4,lazy-object-proxy==1.8.0,libclang==14.0.6,libcst==0.4.7,lightfm==1.16,lightgbm==3.3.2,linkify-it-py==1.0.3,llvmlite==0.39.1,locket==1.0.0,lxml==4.9.1,Mako==1.2.4,Markdown==3.4.1,markdown-it-py==1.1.0,MarkupSafe==2.1.1,matplotlib==3.6.0,matplotlib-inline==0.1.6,mdit-py-plugins==0.2.8,merlin-core==0.8.0+17.g814472f,merlin-models==0.7.0+11.g280956aa4,merlin-systems==0.5.0+4.g15074ad,mistune==2.0.4,mmh3==3.0.0,mpi4py==3.1.3,msgpack==1.0.4,multidict==6.0.2,mypy-extensions==0.4.3,myst-nb==0.13.2,myst-parser==0.15.2,natsort==8.1.0,nbclassic==0.4.3,nbclient==0.6.8,nbconvert==7.0.0,nbdime==3.1.1,nbformat==5.5.0,nest-asyncio==1.5.5,ninja==1.10.2.3,notebook==6.4.12,notebook-shim==0.1.0,numba==0.56.2,numpy==1.22.4,nvidia-pyindex==1.0.9,# Editable install with no version control (nvtabular==1.4.0+8.g95e12d347),-e 
/usr/local/lib/python3.8/dist-packages,nvtx==0.2.5,oauthlib==3.2.1,oldest-supported-numpy==2022.8.16,onnx==1.12.0,onnxruntime==1.11.1,opt-einsum==3.3.0,optuna==3.0.3,packaging==21.3,pandas==1.3.5,pandavro==1.5.2,pandocfilters==1.5.0,parso==0.8.3,partd==1.3.0,pathtools==0.1.2,pbr==5.11.0,pexpect==4.8.0,pickleshare==0.7.5,Pillow==9.2.0,pkgutil_resolve_name==1.3.10,platformdirs==2.5.2,plotly==5.11.0,pluggy==1.0.0,prettytable==3.5.0,prometheus-client==0.14.1,promise==2.3,prompt-toolkit==3.0.31,proto-plus==1.19.6,protobuf==3.19.5,psutil==5.9.2,ptyprocess==0.7.0,pure-eval==0.2.2,py==1.11.0,pyarrow==7.0.0,pyasn1==0.4.8,pyasn1-modules==0.2.8,pybind11==2.10.0,pycparser==2.21,pydantic==1.10.2,pydot==1.4.2,Pygments==2.13.0,PyGObject==3.36.0,pynvml==11.4.1,pyparsing==3.0.9,pyperclip==1.8.2,pyrsistent==0.18.1,pytest==7.1.3,pytest-cov==4.0.0,pytest-xdist==3.0.2,python-apt==2.0.0+ubuntu0.20.4.8,python-dateutil==2.8.2,python-dotenv==0.21.0,python-rapidjson==1.8,pytz==2022.2.1,PyYAML==5.4.1,pyzmq==24.0.0,regex==2022.9.13,requests==2.22.0,requests-oauthlib==1.3.1,requests-unixsocket==0.2.0,rsa==4.7.2,s3fs==2022.2.0,s3transfer==0.6.0,sacremoses==0.0.53,scikit-build==0.15.0,scikit-learn==1.1.2,scipy==1.8.1,seedir==0.3.0,Send2Trash==1.8.0,sentry-sdk==1.9.8,setproctitle==1.3.2,setuptools-scm==7.0.5,shortuuid==1.0.9,six==1.15.0,sklearn==0.0,smmap==5.0.0,sniffio==1.3.0,snowballstemmer==2.2.0,sortedcontainers==2.4.0,soupsieve==2.3.2.post1,Sphinx==5.3.0,sphinx-multiversion==0.2.4,sphinx-togglebutton==0.3.1,sphinx_external_toc==0.3.0,sphinxcontrib-applehelp==1.0.2,sphinxcontrib-copydirs @ git+/~https://github.com/mikemckiernan/sphinxcontrib-copydirs.git@bd8c5d79b3f91cf5f1bb0d6995aeca3fe84b670e,sphinxcontrib-devhelp==1.0.2,sphinxcontrib-htmlhelp==2.0.0,sphinxcontrib-jsmath==1.0.1,sphinxcontrib-qthelp==1.0.3,sphinxcontrib-serializinghtml==1.1.5,SQLAlchemy==1.4.44,stack-data==0.5.0,starlette==0.20.4,stevedore==4.1.1,stringcase==1.2.0,supervisor==4.1.0,tabulate==0.8.10,tblib==1.7.0,tdqm==0.0.1,tenacity==8.0.1,tensorboard==2.9.1,tensorboard-data-server==0.6.1,tensorboard-plugin-wit==1.8.1,tensorflow==2.9.2,tensorflow-estimator==2.9.0,tensorflow-gpu==2.9.2,tensorflow-io-gcs-filesystem==0.27.0,tensorflow-metadata==1.10.0,termcolor==2.0.1,terminado==0.15.0,testbook==0.4.2,threadpoolctl==3.1.0,tinycss2==1.1.1,tokenizers==0.10.3,toml==0.10.2,tomli==2.0.1,toolz==0.12.0,torch==1.12.1+cu113,torchmetrics==0.3.2,tornado==6.2,tox==3.26.0,tqdm==4.64.1,traitlets==5.4.0,transformers==4.12.0,transformers4rec==0.1.12+2.gbcc939255,treelite==2.3.0,treelite-runtime==2.3.0,tritonclient==2.25.0,typing-inspect==0.8.0,typing_extensions==4.3.0,uc-micro-py==1.0.1,urllib3==1.26.12,uvicorn==0.18.3,uvloop==0.17.0,versioneer==0.20,virtualenv==20.16.5,wandb==0.13.3,watchfiles==0.17.0,wcwidth==0.2.5,webencodings==0.5.1,websocket-client==1.4.1,websockets==10.3,Werkzeug==2.2.2,widgetsnbextension==3.6.0,wrapt==1.12.1,xgboost==1.6.2,yarl==1.8.1,zict==2.2.0,zipp==3.8.1,zope.event==4.5.0,zope.interface==5.4.0
test-gpu run-test-pre: PYTHONHASHSEED='3508079650'
test-gpu run-test: commands[0] | python -m pytest --cov-report term --cov merlin -rxs tests/unit
============================= test session starts ==============================
platform linux -- Python 3.8.10, pytest-7.1.3, pluggy-1.0.0
cachedir: .tox/test-gpu/.pytest_cache
rootdir: /var/jenkins_home/workspace/merlin_core/core, configfile: pyproject.toml
plugins: anyio-3.5.0, xdist-3.0.2, cov-4.0.0
collected 390 items / 1 skipped

tests/unit/core/test_dispatch.py .... [ 1%]
tests/unit/core/test_protocols.py ......... [ 3%]
tests/unit/core/test_version.py . [ 3%]
tests/unit/dag/test_base_operator.py .... [ 4%]
tests/unit/dag/test_column_selector.py .............................. [ 12%]
tests/unit/dag/test_dictarray.py ... [ 13%]
tests/unit/dag/test_executors.py .. [ 13%]
tests/unit/dag/test_graph.py .... [ 14%]
tests/unit/dag/ops/test_selection.py .... [ 15%]
tests/unit/io/test_io.py ............................................... [ 27%]
................................................................ [ 44%]
tests/unit/schema/test_column_schemas.py FFFF..................F [ 50%]
tests/unit/schema/test_schema.py ............. [ 53%]
tests/unit/schema/test_schema_io.py .................................... [ 62%]
........................................................................ [ 81%]
..........................................................F [ 96%]
tests/unit/schema/test_tags.py ....... [ 97%]
tests/unit/utils/test_utils.py ........ [100%]

=================================== FAILURES ===================================
______________________ test_dtype_column_schema[float32] _______________________

d_types = <class 'numpy.float32'>

@pytest.mark.parametrize("d_types", [numpy.float32, numpy.float64, numpy.uint32, numpy.uint64])
def test_dtype_column_schema(d_types):
  column = ColumnSchema("name", tags=[], properties=[], dtype=d_types)

tests/unit/schema/test_column_schemas.py:26:


:9: in __init__
???


self = ColumnSchema(name='name', tags=set(), properties=[], dtype=dtype('float32'), is_list=None, is_ragged=None)

def __post_init__(self):
    """Standardize tags and dtypes on initialization

    Raises:
        TypeError: If the provided dtype cannot be cast to a numpy dtype
    """
    tags = TagSet(self.tags)
    object.__setattr__(self, "tags", tags)

    try:
        if hasattr(self.dtype, "numpy_dtype"):
            dtype = np.dtype(self.dtype.numpy_dtype)
        elif hasattr(self.dtype, "_categories"):
            dtype = self.dtype._categories.dtype
        elif isinstance(self.dtype, pd.StringDtype):
            dtype = np.dtype("O")
        else:
            dtype = np.dtype(self.dtype)
    except TypeError as err:
        raise TypeError(
            f"Unsupported dtype {self.dtype}, unable to cast {self.dtype} to a numpy dtype."
        ) from err

    object.__setattr__(self, "dtype", dtype)
  value_count = self.properties.get("value_count")

E AttributeError: 'list' object has no attribute 'get'

merlin/schema/schema.py:78: AttributeError
______________________ test_dtype_column_schema[float64] _______________________

d_types = <class 'numpy.float64'>

@pytest.mark.parametrize("d_types", [numpy.float32, numpy.float64, numpy.uint32, numpy.uint64])
def test_dtype_column_schema(d_types):
  column = ColumnSchema("name", tags=[], properties=[], dtype=d_types)

tests/unit/schema/test_column_schemas.py:26:


:9: in __init__
???


self = ColumnSchema(name='name', tags=set(), properties=[], dtype=dtype('float64'), is_list=None, is_ragged=None)

def __post_init__(self):
    """Standardize tags and dtypes on initialization

    Raises:
        TypeError: If the provided dtype cannot be cast to a numpy dtype
    """
    tags = TagSet(self.tags)
    object.__setattr__(self, "tags", tags)

    try:
        if hasattr(self.dtype, "numpy_dtype"):
            dtype = np.dtype(self.dtype.numpy_dtype)
        elif hasattr(self.dtype, "_categories"):
            dtype = self.dtype._categories.dtype
        elif isinstance(self.dtype, pd.StringDtype):
            dtype = np.dtype("O")
        else:
            dtype = np.dtype(self.dtype)
    except TypeError as err:
        raise TypeError(
            f"Unsupported dtype {self.dtype}, unable to cast {self.dtype} to a numpy dtype."
        ) from err

    object.__setattr__(self, "dtype", dtype)
  value_count = self.properties.get("value_count")

E AttributeError: 'list' object has no attribute 'get'

merlin/schema/schema.py:78: AttributeError
_______________________ test_dtype_column_schema[uint32] _______________________

d_types = <class 'numpy.uint32'>

@pytest.mark.parametrize("d_types", [numpy.float32, numpy.float64, numpy.uint32, numpy.uint64])
def test_dtype_column_schema(d_types):
  column = ColumnSchema("name", tags=[], properties=[], dtype=d_types)

tests/unit/schema/test_column_schemas.py:26:


:9: in __init__
???


self = ColumnSchema(name='name', tags=set(), properties=[], dtype=dtype('uint32'), is_list=None, is_ragged=None)

def __post_init__(self):
    """Standardize tags and dtypes on initialization

    Raises:
        TypeError: If the provided dtype cannot be cast to a numpy dtype
    """
    tags = TagSet(self.tags)
    object.__setattr__(self, "tags", tags)

    try:
        if hasattr(self.dtype, "numpy_dtype"):
            dtype = np.dtype(self.dtype.numpy_dtype)
        elif hasattr(self.dtype, "_categories"):
            dtype = self.dtype._categories.dtype
        elif isinstance(self.dtype, pd.StringDtype):
            dtype = np.dtype("O")
        else:
            dtype = np.dtype(self.dtype)
    except TypeError as err:
        raise TypeError(
            f"Unsupported dtype {self.dtype}, unable to cast {self.dtype} to a numpy dtype."
        ) from err

    object.__setattr__(self, "dtype", dtype)
  value_count = self.properties.get("value_count")

E AttributeError: 'list' object has no attribute 'get'

merlin/schema/schema.py:78: AttributeError
_______________________ test_dtype_column_schema[uint64] _______________________

d_types = <class 'numpy.uint64'>

@pytest.mark.parametrize("d_types", [numpy.float32, numpy.float64, numpy.uint32, numpy.uint64])
def test_dtype_column_schema(d_types):
  column = ColumnSchema("name", tags=[], properties=[], dtype=d_types)

tests/unit/schema/test_column_schemas.py:26:


:9: in __init__
???


self = ColumnSchema(name='name', tags=set(), properties=[], dtype=dtype('uint64'), is_list=None, is_ragged=None)

def __post_init__(self):
    """Standardize tags and dtypes on initialization

    Raises:
        TypeError: If the provided dtype cannot be cast to a numpy dtype
    """
    tags = TagSet(self.tags)
    object.__setattr__(self, "tags", tags)

    try:
        if hasattr(self.dtype, "numpy_dtype"):
            dtype = np.dtype(self.dtype.numpy_dtype)
        elif hasattr(self.dtype, "_categories"):
            dtype = self.dtype._categories.dtype
        elif isinstance(self.dtype, pd.StringDtype):
            dtype = np.dtype("O")
        else:
            dtype = np.dtype(self.dtype)
    except TypeError as err:
        raise TypeError(
            f"Unsupported dtype {self.dtype}, unable to cast {self.dtype} to a numpy dtype."
        ) from err

    object.__setattr__(self, "dtype", dtype)
  value_count = self.properties.get("value_count")

E AttributeError: 'list' object has no attribute 'get'

merlin/schema/schema.py:78: AttributeError
_________________________ test_list_column_attributes __________________________

def test_list_column_attributes():
    col0_schema = ColumnSchema("col0")

    assert not col0_schema.is_list
    assert not col0_schema.is_ragged
    assert col0_schema.quantity == ColumnQuantity.SCALAR

    col1_schema = ColumnSchema("col1", is_list=False, is_ragged=False)

    assert not col1_schema.is_list
    assert not col1_schema.is_ragged
    assert col1_schema.quantity == ColumnQuantity.SCALAR

    col2_schema = ColumnSchema("col2", is_list=True)

    assert col2_schema.is_list
  assert col2_schema.is_ragged

E AssertionError: assert False
E + where False = ColumnSchema(name='col2', tags=set(), properties={}, dtype=dtype('float64'), is_list=True, is_ragged=False).is_ragged

tests/unit/schema/test_column_schemas.py:183: AssertionError
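
The four dtype failures and test_list_column_attributes point at two separate problems in the new __post_init__: it assumes properties is a dict (the dtype tests pass properties=[]), and it infers is_ragged from value_count alone, so a list column with no value_count comes out is_ragged=False where the test expects True. A sketch, assuming the inference table from the PR description plus the extra rule the failing test implies:

def infer_flags(value_count=None, is_list=None, is_ragged=None):
    # Sketch only: names and defaulting are illustrative, not Merlin's API.
    vc = value_count or {"min": 0, "max": 0}
    if is_list is None:
        is_list = vc["max"] > 0                     # any values -> list column
    if is_ragged is None:
        # max > min -> ragged; a list with no known value_count is
        # conservatively treated as ragged, as the failing test expects
        is_ragged = vc["max"] > vc["min"] or (is_list and vc["max"] == 0)
    return is_list, is_ragged

assert infer_flags({"min": 0, "max": 0}) == (False, False)  # scalar
assert infer_flags({"min": 1, "max": 1}) == (True, False)   # fixed-size list
assert infer_flags({"min": 1, "max": 2}) == (True, True)    # ragged list
assert infer_flags(is_list=True) == (True, True)            # test_list_column_attributes
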
______________________ test_tensorflow_metadata_from_json ______________________

def test_tensorflow_metadata_from_json():
    # make sure we can load up tensorflowmetadata serialized json objects, like done by
    # merlin-models
    json_schema = """{"feature": [
    {
      "name": "categories",
      "valueCount": {
        "min": "1",
        "max": "4"
      },
      "type": "INT",
      "intDomain": {
        "name": "categories",
        "min": "1",
        "max": "331",
        "isCategorical": true
      },
      "annotation": {
        "tag": [
          "item"
        ]
      }
    }]}
    """

    schema = TensorflowMetadata.from_json(json_schema).to_merlin_schema()
    column_schema = schema.column_schemas["categories"]

    # make sure the value_count is set appropriately
    assert column_schema.properties["value_count"] == {"min": 1, "max": 4}
  assert column_schema.is_list

E AssertionError: assert False
E + where False = ColumnSchema(name='categories', tags={<Tags.ITEM: 'item'>, <Tags.CATEGORICAL: 'categorical'>}, properties={'domain': {...331, 'name': 'categories'}, 'value_count': {'min': 1, 'max': 4}}, dtype=dtype('int64'), is_list=False, is_ragged=False).is_list

tests/unit/schema/test_schema_io.py:220: AssertionError
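
The from_json failure looks like an ordering problem rather than a wrong rule: the JSON clearly carries valueCount {'min': 1, 'max': 4}, yet the restored column reports is_list=False, which is exactly what the inference yields when it runs before the deserialized value_count has been attached to properties. A sketch under that (unconfirmed) assumption:

value_count = {"min": 1, "max": 4}

# If inference runs before properties are attached, it sees the default:
vc_seen = {}.get("value_count", {"min": 0, "max": 0})
assert (vc_seen["max"] > 0) is False   # -> is_list=False, as in the failure

# If inference runs against the merged properties, it sees the real count:
vc_seen = {"value_count": value_count}["value_count"]
assert (vc_seen["max"] > 0) is True    # -> is_list=True, as the test expects
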
=============================== warnings summary ===============================
tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 6 files. Did not have enough partitions to create 7 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 8 files. Did not have enough partitions to create 9 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 8 files. Did not have enough partitions to create 10 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 8 files. Did not have enough partitions to create 11 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 12 files. Did not have enough partitions to create 13 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 12 files. Did not have enough partitions to create 14 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 12 files. Did not have enough partitions to create 15 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 12 files. Did not have enough partitions to create 16 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 12 files. Did not have enough partitions to create 17 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 12 files. Did not have enough partitions to create 18 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 12 files. Did not have enough partitions to create 19 files.
warnings.warn(

tests/unit/io/test_io.py::test_validate_and_regenerate_dataset
/var/jenkins_home/workspace/merlin_core/core/merlin/io/parquet.py:551: DeprecationWarning: 'ParquetDataset.pieces' attribute is deprecated as of pyarrow 5.0.0 and will be removed in a future version. Specify 'use_legacy_dataset=False' while constructing the ParquetDataset, and then use the '.fragments' attribute instead.
paths = [p.path for p in pa_dataset.pieces]

tests/unit/schema/test_column_schemas.py::test_column_schema_tags_normalize
tests/unit/schema/test_tags.py::test_tagset_atomizes_compound_tags
/var/jenkins_home/workspace/merlin_core/core/merlin/schema/tags.py:148: UserWarning: Compound tags like Tags.ITEM_ID have been deprecated and will be removed in a future version. Please use the atomic versions of these tags, like [<Tags.ITEM: 'item'>, <Tags.ID: 'id'>].
warnings.warn(

tests/unit/schema/test_schema_io.py::test_json_serialization_with_embedded_dicts
tests/unit/schema/test_schema_io.py::test_merlin_to_proto_to_json_to_merlin
tests/unit/schema/test_tags.py::test_tagset_atomizes_compound_tags
/var/jenkins_home/workspace/merlin_core/core/merlin/schema/tags.py:148: UserWarning: Compound tags like Tags.USER_ID have been deprecated and will be removed in a future version. Please use the atomic versions of these tags, like [<Tags.USER: 'user'>, <Tags.ID: 'id'>].
warnings.warn(

tests/unit/schema/test_tags.py::test_tagset_atomizes_compound_tags
/var/jenkins_home/workspace/merlin_core/core/merlin/schema/tags.py:148: UserWarning: Compound tags like Tags.SESSION_ID have been deprecated and will be removed in a future version. Please use the atomic versions of these tags, like [<Tags.SESSION: 'session'>, <Tags.ID: 'id'>].
warnings.warn(

tests/unit/schema/test_tags.py::test_tagset_atomizes_compound_tags
/var/jenkins_home/workspace/merlin_core/core/merlin/schema/tags.py:148: UserWarning: Compound tags like Tags.TEXT_TOKENIZED have been deprecated and will be removed in a future version. Please use the atomic versions of these tags, like [<Tags.TEXT: 'text'>, <Tags.TOKENIZED: 'tokenized'>].
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed[True-True]
/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 36259 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed[True-False]
/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 34657 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed[False-True]
/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 45199 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed[False-False]
/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 39145 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed_force[True]
/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 38521 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed_force[False]
/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 34791 instead
warnings.warn(

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html

---------- coverage: platform linux, python 3.8.10-final-0 -----------
Name Stmts Miss Cover

merlin/core/__init__.py 2 0 100%
merlin/core/_version.py 354 205 42%
merlin/core/compat.py 10 4 60%
merlin/core/dispatch.py 356 212 40%
merlin/core/protocols.py 99 45 55%
merlin/core/utils.py 195 56 71%
merlin/dag/__init__.py 5 0 100%
merlin/dag/base_operator.py 122 20 84%
merlin/dag/dictarray.py 55 15 73%
merlin/dag/executors.py 141 68 52%
merlin/dag/graph.py 99 35 65%
merlin/dag/node.py 344 161 53%
merlin/dag/ops/__init__.py 4 0 100%
merlin/dag/ops/concat_columns.py 17 4 76%
merlin/dag/ops/selection.py 22 0 100%
merlin/dag/ops/subset_columns.py 12 4 67%
merlin/dag/ops/subtraction.py 21 11 48%
merlin/dag/selector.py 101 6 94%
merlin/io/__init__.py 4 0 100%
merlin/io/avro.py 88 88 0%
merlin/io/csv.py 57 6 89%
merlin/io/dask.py 181 53 71%
merlin/io/dataframe_engine.py 61 5 92%
merlin/io/dataframe_iter.py 21 1 95%
merlin/io/dataset.py 347 54 84%
merlin/io/dataset_engine.py 37 8 78%
merlin/io/fsspec_utils.py 127 108 15%
merlin/io/hugectr.py 45 35 22%
merlin/io/parquet.py 603 69 89%
merlin/io/shuffle.py 38 12 68%
merlin/io/worker.py 80 66 18%
merlin/io/writer.py 190 52 73%
merlin/io/writer_factory.py 18 4 78%
merlin/schema/__init__.py 2 0 100%
merlin/schema/io/__init__.py 0 0 100%
merlin/schema/io/proto_utils.py 20 4 80%
merlin/schema/io/schema_bp.py 306 5 98%
merlin/schema/io/tensorflow_metadata.py 183 17 91%
merlin/schema/schema.py 216 36 83%
merlin/schema/tags.py 82 0 100%

TOTAL 4665 1469 69%

=========================== short test summary info ============================
SKIPPED [1] tests/unit/io/test_avro.py:34: could not import 'uavro': No module named 'uavro'
======= 6 failed, 384 passed, 1 skipped, 25 warnings in 64.37s (0:01:04) =======
ERROR: InvocationError for command /var/jenkins_home/workspace/merlin_core/core/.tox/test-gpu/bin/python -m pytest --cov-report term --cov merlin -rxs tests/unit (exited with code 1)
___________________________________ summary ____________________________________
ERROR: test-gpu: commands failed
Build step 'Execute shell' marked build as failure
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA-Merlin/core/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[merlin_core] $ /bin/bash /tmp/jenkins17926776758125310944.sh

@nvidia-merlin-bot

Click to view CI Results
GitHub pull request #111 of commit bd68a1428d8f3a45dbeafa451ffcba1dc8f3d083, no merge conflicts.
Running as SYSTEM
Setting status of bd68a1428d8f3a45dbeafa451ffcba1dc8f3d083 to PENDING with url http://merlin-infra1.nvidia.com:8080/job/merlin_core/290/ and message: 'Pending'
Using context: Jenkins
Building on master in workspace /var/jenkins_home/workspace/merlin_core
using credential ce87ff3c-94f0-400a-8303-cb4acb4918b5
 > git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > git config remote.origin.url /~https://github.com/NVIDIA-Merlin/core # timeout=10
Fetching upstream changes from /~https://github.com/NVIDIA-Merlin/core
 > git --version # timeout=10
using GIT_ASKPASS to set credentials login for merlin-systems username and pass
 > git fetch --tags --force --progress -- /~https://github.com/NVIDIA-Merlin/core +refs/pull/111/*:refs/remotes/origin/pr/111/* # timeout=10
 > git rev-parse bd68a1428d8f3a45dbeafa451ffcba1dc8f3d083^{commit} # timeout=10
Checking out Revision bd68a1428d8f3a45dbeafa451ffcba1dc8f3d083 (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f bd68a1428d8f3a45dbeafa451ffcba1dc8f3d083 # timeout=10
Commit message: "Add check for value count and is_ragged compatibility"
 > git rev-list --no-walk 814472f0aa1b18d0f43bc01800899c1df3d952e3 # timeout=10
[merlin_core] $ /bin/bash /tmp/jenkins2161594793601554336.sh
GLOB sdist-make: /var/jenkins_home/workspace/merlin_core/core/setup.py
test-gpu recreate: /var/jenkins_home/workspace/merlin_core/core/.tox/test-gpu
test-gpu installdeps: pytest, pytest-cov
WARNING: Discarding $PYTHONPATH from environment, to override specify PYTHONPATH in 'passenv' in your configuration.
test-gpu inst: /var/jenkins_home/workspace/merlin_core/core/.tox/.tmp/package/1/merlin-core-0.8.0+18.gbd68a14.zip
WARNING: Discarding $PYTHONPATH from environment, to override specify PYTHONPATH in 'passenv' in your configuration.
test-gpu installed: absl-py==1.2.0,aiohttp==3.8.1,aiosignal==1.2.0,alabaster==0.7.12,alembic==1.8.1,anyio==3.6.1,argon2-cffi==21.3.0,argon2-cffi-bindings==21.2.0,astroid==2.5.6,asttokens==2.0.8,astunparse==1.6.3,asv==0.5.1,asvdb==0.4.2,async-timeout==4.0.2,attrs==22.1.0,autopage==0.5.1,awscli==1.27.12,Babel==2.10.3,backcall==0.2.0,beautifulsoup4==4.11.1,betterproto==1.2.5,black==22.6.0,bleach==5.0.1,boto3==1.24.75,botocore==1.29.12,Brotli==1.0.9,cachetools==5.2.0,certifi==2019.11.28,cffi==1.15.1,chardet==3.0.4,charset-normalizer==2.1.1,clang==5.0,click==8.1.3,cliff==4.0.0,cloudpickle==2.2.0,cmaes==0.9.0,cmake==3.24.1.1,cmd2==2.4.2,colorama==0.4.4,colorlog==6.7.0,contourpy==1.0.5,coverage==6.5.0,cuda-python==11.7.1,cupy-cuda117==10.6.0,cycler==0.11.0,Cython==0.29.32,dask==2022.1.1,dbus-python==1.2.16,debugpy==1.6.3,decorator==5.1.1,defusedxml==0.7.1,dill==0.3.5.1,distlib==0.3.6,distributed==2022.5.1,distro==1.7.0,dm-tree==0.1.6,docker-pycreds==0.4.0,docutils==0.16,emoji==1.7.0,entrypoints==0.4,execnet==1.9.0,executing==1.0.0,faiss==1.7.2,faiss-gpu==1.7.2,fastai==2.7.9,fastapi==0.85.0,fastavro==1.6.1,fastcore==1.5.27,fastdownload==0.0.7,fastjsonschema==2.16.1,fastprogress==1.0.3,fastrlock==0.8,feast==0.19.4,fiddle==0.2.2,filelock==3.8.0,flatbuffers==1.12,fonttools==4.37.3,frozenlist==1.3.1,fsspec==2022.5.0,gast==0.4.0,gevent==21.12.0,geventhttpclient==2.0.2,gitdb==4.0.9,GitPython==3.1.27,google==3.0.0,google-api-core==2.10.1,google-auth==2.11.1,google-auth-oauthlib==0.4.6,google-pasta==0.2.0,googleapis-common-protos==1.52.0,graphviz==0.20.1,greenlet==1.1.3,grpcio==1.41.0,grpcio-channelz==1.49.0,grpcio-reflection==1.48.1,grpclib==0.4.3,h11==0.13.0,h2==4.1.0,h5py==3.7.0,HeapDict==1.0.1,horovod==0.26.1,hpack==4.0.0,httptools==0.5.0,hugectr2onnx==0.0.0,huggingface-hub==0.9.1,hyperframe==6.0.1,idna==2.8,imagesize==1.4.1,implicit==0.6.1,importlib-metadata==4.12.0,importlib-resources==5.9.0,iniconfig==1.1.1,ipykernel==6.15.3,ipython==8.5.0,ipython-genutils==0.2.0,ipywidgets==7.7.0,jedi==0.18.1,Jinja2==3.1.2,jmespath==1.0.1,joblib==1.2.0,json5==0.9.10,jsonschema==4.16.0,jupyter-cache==0.4.3,jupyter-core==4.11.1,jupyter-server==1.18.1,jupyter-server-mathjax==0.2.5,jupyter-sphinx==0.3.2,jupyter_client==7.3.5,jupyterlab==3.4.7,jupyterlab-pygments==0.2.2,jupyterlab-widgets==1.1.0,jupyterlab_server==2.15.1,keras==2.9.0,Keras-Preprocessing==1.1.2,kiwisolver==1.4.4,lazy-object-proxy==1.8.0,libclang==14.0.6,libcst==0.4.7,lightfm==1.16,lightgbm==3.3.2,linkify-it-py==1.0.3,llvmlite==0.39.1,locket==1.0.0,lxml==4.9.1,Mako==1.2.4,Markdown==3.4.1,markdown-it-py==1.1.0,MarkupSafe==2.1.1,matplotlib==3.6.0,matplotlib-inline==0.1.6,mdit-py-plugins==0.2.8,merlin-core==0.8.0+18.gbd68a14,merlin-models==0.7.0+11.g280956aa4,merlin-systems==0.5.0+4.g15074ad,mistune==2.0.4,mmh3==3.0.0,mpi4py==3.1.3,msgpack==1.0.4,multidict==6.0.2,mypy-extensions==0.4.3,myst-nb==0.13.2,myst-parser==0.15.2,natsort==8.1.0,nbclassic==0.4.3,nbclient==0.6.8,nbconvert==7.0.0,nbdime==3.1.1,nbformat==5.5.0,nest-asyncio==1.5.5,ninja==1.10.2.3,notebook==6.4.12,notebook-shim==0.1.0,numba==0.56.2,numpy==1.22.4,nvidia-pyindex==1.0.9,# Editable install with no version control (nvtabular==1.4.0+8.g95e12d347),-e 
/usr/local/lib/python3.8/dist-packages,nvtx==0.2.5,oauthlib==3.2.1,oldest-supported-numpy==2022.8.16,onnx==1.12.0,onnxruntime==1.11.1,opt-einsum==3.3.0,optuna==3.0.3,packaging==21.3,pandas==1.3.5,pandavro==1.5.2,pandocfilters==1.5.0,parso==0.8.3,partd==1.3.0,pathtools==0.1.2,pbr==5.11.0,pexpect==4.8.0,pickleshare==0.7.5,Pillow==9.2.0,pkgutil_resolve_name==1.3.10,platformdirs==2.5.2,plotly==5.11.0,pluggy==1.0.0,prettytable==3.5.0,prometheus-client==0.14.1,promise==2.3,prompt-toolkit==3.0.31,proto-plus==1.19.6,protobuf==3.19.5,psutil==5.9.2,ptyprocess==0.7.0,pure-eval==0.2.2,py==1.11.0,pyarrow==7.0.0,pyasn1==0.4.8,pyasn1-modules==0.2.8,pybind11==2.10.0,pycparser==2.21,pydantic==1.10.2,pydot==1.4.2,Pygments==2.13.0,PyGObject==3.36.0,pynvml==11.4.1,pyparsing==3.0.9,pyperclip==1.8.2,pyrsistent==0.18.1,pytest==7.1.3,pytest-cov==4.0.0,pytest-xdist==3.0.2,python-apt==2.0.0+ubuntu0.20.4.8,python-dateutil==2.8.2,python-dotenv==0.21.0,python-rapidjson==1.8,pytz==2022.2.1,PyYAML==5.4.1,pyzmq==24.0.0,regex==2022.9.13,requests==2.22.0,requests-oauthlib==1.3.1,requests-unixsocket==0.2.0,rsa==4.7.2,s3fs==2022.2.0,s3transfer==0.6.0,sacremoses==0.0.53,scikit-build==0.15.0,scikit-learn==1.1.2,scipy==1.8.1,seedir==0.3.0,Send2Trash==1.8.0,sentry-sdk==1.9.8,setproctitle==1.3.2,setuptools-scm==7.0.5,shortuuid==1.0.9,six==1.15.0,sklearn==0.0,smmap==5.0.0,sniffio==1.3.0,snowballstemmer==2.2.0,sortedcontainers==2.4.0,soupsieve==2.3.2.post1,Sphinx==5.3.0,sphinx-multiversion==0.2.4,sphinx-togglebutton==0.3.1,sphinx_external_toc==0.3.0,sphinxcontrib-applehelp==1.0.2,sphinxcontrib-copydirs @ git+/~https://github.com/mikemckiernan/sphinxcontrib-copydirs.git@bd8c5d79b3f91cf5f1bb0d6995aeca3fe84b670e,sphinxcontrib-devhelp==1.0.2,sphinxcontrib-htmlhelp==2.0.0,sphinxcontrib-jsmath==1.0.1,sphinxcontrib-qthelp==1.0.3,sphinxcontrib-serializinghtml==1.1.5,SQLAlchemy==1.4.44,stack-data==0.5.0,starlette==0.20.4,stevedore==4.1.1,stringcase==1.2.0,supervisor==4.1.0,tabulate==0.8.10,tblib==1.7.0,tdqm==0.0.1,tenacity==8.0.1,tensorboard==2.9.1,tensorboard-data-server==0.6.1,tensorboard-plugin-wit==1.8.1,tensorflow==2.9.2,tensorflow-estimator==2.9.0,tensorflow-gpu==2.9.2,tensorflow-io-gcs-filesystem==0.27.0,tensorflow-metadata==1.10.0,termcolor==2.0.1,terminado==0.15.0,testbook==0.4.2,threadpoolctl==3.1.0,tinycss2==1.1.1,tokenizers==0.10.3,toml==0.10.2,tomli==2.0.1,toolz==0.12.0,torch==1.12.1+cu113,torchmetrics==0.3.2,tornado==6.2,tox==3.26.0,tqdm==4.64.1,traitlets==5.4.0,transformers==4.12.0,transformers4rec==0.1.12+2.gbcc939255,treelite==2.3.0,treelite-runtime==2.3.0,tritonclient==2.25.0,typing-inspect==0.8.0,typing_extensions==4.3.0,uc-micro-py==1.0.1,urllib3==1.26.12,uvicorn==0.18.3,uvloop==0.17.0,versioneer==0.20,virtualenv==20.16.5,wandb==0.13.3,watchfiles==0.17.0,wcwidth==0.2.5,webencodings==0.5.1,websocket-client==1.4.1,websockets==10.3,Werkzeug==2.2.2,widgetsnbextension==3.6.0,wrapt==1.12.1,xgboost==1.6.2,yarl==1.8.1,zict==2.2.0,zipp==3.8.1,zope.event==4.5.0,zope.interface==5.4.0
test-gpu run-test-pre: PYTHONHASHSEED='180264363'
test-gpu run-test: commands[0] | python -m pytest --cov-report term --cov merlin -rxs tests/unit
============================= test session starts ==============================
platform linux -- Python 3.8.10, pytest-7.1.3, pluggy-1.0.0
cachedir: .tox/test-gpu/.pytest_cache
rootdir: /var/jenkins_home/workspace/merlin_core/core, configfile: pyproject.toml
plugins: anyio-3.5.0, xdist-3.0.2, cov-4.0.0
collected 391 items / 1 skipped

tests/unit/core/test_dispatch.py .... [ 1%]
tests/unit/core/test_protocols.py ......... [ 3%]
tests/unit/core/test_version.py . [ 3%]
tests/unit/dag/test_base_operator.py .... [ 4%]
tests/unit/dag/test_column_selector.py .............................. [ 12%]
tests/unit/dag/test_dictarray.py ... [ 13%]
tests/unit/dag/test_executors.py .. [ 13%]
tests/unit/dag/test_graph.py .... [ 14%]
tests/unit/dag/ops/test_selection.py .... [ 15%]
tests/unit/io/test_io.py ...FFFFFF...................................... [ 27%]
................................................................ [ 43%]
tests/unit/schema/test_column_schemas.py FFFF..................F. [ 50%]
tests/unit/schema/test_schema.py ............. [ 53%]
tests/unit/schema/test_schema_io.py .................................... [ 62%]
........................................................................ [ 81%]
..........................................................F [ 96%]
tests/unit/schema/test_tags.py ....... [ 97%]
tests/unit/utils/test_utils.py ........ [100%]

=================================== FAILURES ===================================
_______________________ test_string_datatypes[None-csv] ________________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-2/test_string_datatypes_None_csv0')
engine = 'csv', cpu = None

@pytest.mark.parametrize("engine", ["csv", "parquet", "csv-no-header"])
@pytest.mark.parametrize("cpu", [None, True])
def test_string_datatypes(tmpdir, engine, cpu):
    df_lib = dispatch.get_lib()
    df = df_lib.DataFrame({"column": [[0.1, 0.2]]})
  dataset = merlin.io.Dataset(df)

tests/unit/io/test_io.py:89:


merlin/io/dataset.py:354: in __init__
self.infer_schema()
merlin/io/dataset.py:1140: in infer_schema
col_schema = ColumnSchema(column, dtype=dtype_val, is_list=is_list, is_ragged=is_list)
:9: in __init__
???


self = ColumnSchema(name='column', tags=set(), properties={}, dtype=dtype('float64'), is_list=True, is_ragged=True)

def __post_init__(self):
    """Standardize tags and dtypes on initialization

    Raises:
        TypeError: If the provided dtype cannot be cast to a numpy dtype
    """
    tags = TagSet(self.tags)
    object.__setattr__(self, "tags", tags)

    try:
        if hasattr(self.dtype, "numpy_dtype"):
            dtype = np.dtype(self.dtype.numpy_dtype)
        elif hasattr(self.dtype, "_categories"):
            dtype = self.dtype._categories.dtype
        elif isinstance(self.dtype, pd.StringDtype):
            dtype = np.dtype("O")
        else:
            dtype = np.dtype(self.dtype)
    except TypeError as err:
        raise TypeError(
            f"Unsupported dtype {self.dtype}, unable to cast {self.dtype} to a numpy dtype."
        ) from err

    object.__setattr__(self, "dtype", dtype)

    value_count = self.properties.get("value_count", {"min": 0, "max": 0})

    if self.is_list is None:
        if value_count["max"] > 0:
            object.__setattr__(self, "is_list", True)
        else:
            object.__setattr__(self, "is_list", False)

    if self.is_ragged is None:
        if value_count["max"] > value_count["min"]:
            object.__setattr__(self, "is_ragged", True)
        else:
            object.__setattr__(self, "is_ragged", False)

    if self.is_ragged and not self.is_list:
        raise ValueError(
            "`is_ragged` is set to `True` but `is_list` is not. "
            "Only list columns can set the `is_ragged` flag."
        )

    if self.is_ragged and value_count["max"] == value_count["min"]:
      raise ValueError(
            "`is_ragged` is set to `True` but `value_count.min` == `value_count.max`. "
            "If value_count min/max are equal. This is a fixed size list and `is_ragged` should be set to False"
        )

E ValueError: is_ragged is set to True but value_count.min == value_count.max. If value_count min/max are equal. This is a fixed size list and is_ragged should be set to False

merlin/schema/schema.py:99: ValueError
_____________________ test_string_datatypes[None-parquet] ______________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-2/test_string_datatypes_None_par0')
engine = 'parquet', cpu = None

@pytest.mark.parametrize("engine", ["csv", "parquet", "csv-no-header"])
@pytest.mark.parametrize("cpu", [None, True])
def test_string_datatypes(tmpdir, engine, cpu):
    df_lib = dispatch.get_lib()
    df = df_lib.DataFrame({"column": [[0.1, 0.2]]})
  dataset = merlin.io.Dataset(df)

tests/unit/io/test_io.py:89:


merlin/io/dataset.py:354: in __init__
    self.infer_schema()
merlin/io/dataset.py:1140: in infer_schema
    col_schema = ColumnSchema(column, dtype=dtype_val, is_list=is_list, is_ragged=is_list)
<string>:9: in __init__
    ???


self = ColumnSchema(name='column', tags=set(), properties={}, dtype=dtype('float64'), is_list=True, is_ragged=True)

def __post_init__(self):
    """Standardize tags and dtypes on initialization

    Raises:
        TypeError: If the provided dtype cannot be cast to a numpy dtype
    """
    tags = TagSet(self.tags)
    object.__setattr__(self, "tags", tags)

    try:
        if hasattr(self.dtype, "numpy_dtype"):
            dtype = np.dtype(self.dtype.numpy_dtype)
        elif hasattr(self.dtype, "_categories"):
            dtype = self.dtype._categories.dtype
        elif isinstance(self.dtype, pd.StringDtype):
            dtype = np.dtype("O")
        else:
            dtype = np.dtype(self.dtype)
    except TypeError as err:
        raise TypeError(
            f"Unsupported dtype {self.dtype}, unable to cast {self.dtype} to a numpy dtype."
        ) from err

    object.__setattr__(self, "dtype", dtype)

    value_count = self.properties.get("value_count", {"min": 0, "max": 0})

    if self.is_list is None:
        if value_count["max"] > 0:
            object.__setattr__(self, "is_list", True)
        else:
            object.__setattr__(self, "is_list", False)

    if self.is_ragged is None:
        if value_count["max"] > value_count["min"]:
            object.__setattr__(self, "is_ragged", True)
        else:
            object.__setattr__(self, "is_ragged", False)

    if self.is_ragged and not self.is_list:
        raise ValueError(
            "`is_ragged` is set to `True` but `is_list` is not. "
            "Only list columns can set the `is_ragged` flag."
        )

    if self.is_ragged and value_count["max"] == value_count["min"]:
      raise ValueError(
            "`is_ragged` is set to `True` but `value_count.min` == `value_count.max`. "
            "If value_count min/max are equal. This is a fixed size list and `is_ragged` should be set to False"
        )

E ValueError: is_ragged is set to True but value_count.min == value_count.max. If value_count min/max are equal. This is a fixed size list and is_ragged should be set to False

merlin/schema/schema.py:99: ValueError
__________________ test_string_datatypes[None-csv-no-header] ___________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-2/test_string_datatypes_None_csv1')
engine = 'csv-no-header', cpu = None

@pytest.mark.parametrize("engine", ["csv", "parquet", "csv-no-header"])
@pytest.mark.parametrize("cpu", [None, True])
def test_string_datatypes(tmpdir, engine, cpu):
    df_lib = dispatch.get_lib()
    df = df_lib.DataFrame({"column": [[0.1, 0.2]]})
  dataset = merlin.io.Dataset(df)

tests/unit/io/test_io.py:89:


merlin/io/dataset.py:354: in __init__
    self.infer_schema()
merlin/io/dataset.py:1140: in infer_schema
    col_schema = ColumnSchema(column, dtype=dtype_val, is_list=is_list, is_ragged=is_list)
<string>:9: in __init__
    ???


self = ColumnSchema(name='column', tags=set(), properties={}, dtype=dtype('float64'), is_list=True, is_ragged=True)

def __post_init__(self):
    """Standardize tags and dtypes on initialization

    Raises:
        TypeError: If the provided dtype cannot be cast to a numpy dtype
    """
    tags = TagSet(self.tags)
    object.__setattr__(self, "tags", tags)

    try:
        if hasattr(self.dtype, "numpy_dtype"):
            dtype = np.dtype(self.dtype.numpy_dtype)
        elif hasattr(self.dtype, "_categories"):
            dtype = self.dtype._categories.dtype
        elif isinstance(self.dtype, pd.StringDtype):
            dtype = np.dtype("O")
        else:
            dtype = np.dtype(self.dtype)
    except TypeError as err:
        raise TypeError(
            f"Unsupported dtype {self.dtype}, unable to cast {self.dtype} to a numpy dtype."
        ) from err

    object.__setattr__(self, "dtype", dtype)

    value_count = self.properties.get("value_count", {"min": 0, "max": 0})

    if self.is_list is None:
        if value_count["max"] > 0:
            object.__setattr__(self, "is_list", True)
        else:
            object.__setattr__(self, "is_list", False)

    if self.is_ragged is None:
        if value_count["max"] > value_count["min"]:
            object.__setattr__(self, "is_ragged", True)
        else:
            object.__setattr__(self, "is_ragged", False)

    if self.is_ragged and not self.is_list:
        raise ValueError(
            "`is_ragged` is set to `True` but `is_list` is not. "
            "Only list columns can set the `is_ragged` flag."
        )

    if self.is_ragged and value_count["max"] == value_count["min"]:
      raise ValueError(
            "`is_ragged` is set to `True` but `value_count.min` == `value_count.max`. "
            "If value_count min/max are equal. This is a fixed size list and `is_ragged` should be set to False"
        )

E ValueError: is_ragged is set to True but value_count.min == value_count.max. If value_count min/max are equal. This is a fixed size list and is_ragged should be set to False

merlin/schema/schema.py:99: ValueError
_______________________ test_string_datatypes[True-csv] ________________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-2/test_string_datatypes_True_csv0')
engine = 'csv', cpu = True

@pytest.mark.parametrize("engine", ["csv", "parquet", "csv-no-header"])
@pytest.mark.parametrize("cpu", [None, True])
def test_string_datatypes(tmpdir, engine, cpu):
    df_lib = dispatch.get_lib()
    df = df_lib.DataFrame({"column": [[0.1, 0.2]]})
  dataset = merlin.io.Dataset(df)

tests/unit/io/test_io.py:89:


merlin/io/dataset.py:354: in __init__
    self.infer_schema()
merlin/io/dataset.py:1140: in infer_schema
    col_schema = ColumnSchema(column, dtype=dtype_val, is_list=is_list, is_ragged=is_list)
<string>:9: in __init__
    ???


self = ColumnSchema(name='column', tags=set(), properties={}, dtype=dtype('float64'), is_list=True, is_ragged=True)

def __post_init__(self):
    """Standardize tags and dtypes on initialization

    Raises:
        TypeError: If the provided dtype cannot be cast to a numpy dtype
    """
    tags = TagSet(self.tags)
    object.__setattr__(self, "tags", tags)

    try:
        if hasattr(self.dtype, "numpy_dtype"):
            dtype = np.dtype(self.dtype.numpy_dtype)
        elif hasattr(self.dtype, "_categories"):
            dtype = self.dtype._categories.dtype
        elif isinstance(self.dtype, pd.StringDtype):
            dtype = np.dtype("O")
        else:
            dtype = np.dtype(self.dtype)
    except TypeError as err:
        raise TypeError(
            f"Unsupported dtype {self.dtype}, unable to cast {self.dtype} to a numpy dtype."
        ) from err

    object.__setattr__(self, "dtype", dtype)

    value_count = self.properties.get("value_count", {"min": 0, "max": 0})

    if self.is_list is None:
        if value_count["max"] > 0:
            object.__setattr__(self, "is_list", True)
        else:
            object.__setattr__(self, "is_list", False)

    if self.is_ragged is None:
        if value_count["max"] > value_count["min"]:
            object.__setattr__(self, "is_ragged", True)
        else:
            object.__setattr__(self, "is_ragged", False)

    if self.is_ragged and not self.is_list:
        raise ValueError(
            "`is_ragged` is set to `True` but `is_list` is not. "
            "Only list columns can set the `is_ragged` flag."
        )

    if self.is_ragged and value_count["max"] == value_count["min"]:
      raise ValueError(
            "`is_ragged` is set to `True` but `value_count.min` == `value_count.max`. "
            "If value_count min/max are equal. This is a fixed size list and `is_ragged` should be set to False"
        )

E ValueError: is_ragged is set to True but value_count.min == value_count.max. If value_count min/max are equal. This is a fixed size list and is_ragged should be set to False

merlin/schema/schema.py:99: ValueError
_____________________ test_string_datatypes[True-parquet] ______________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-2/test_string_datatypes_True_par0')
engine = 'parquet', cpu = True

@pytest.mark.parametrize("engine", ["csv", "parquet", "csv-no-header"])
@pytest.mark.parametrize("cpu", [None, True])
def test_string_datatypes(tmpdir, engine, cpu):
    df_lib = dispatch.get_lib()
    df = df_lib.DataFrame({"column": [[0.1, 0.2]]})
  dataset = merlin.io.Dataset(df)

tests/unit/io/test_io.py:89:


merlin/io/dataset.py:354: in __init__
    self.infer_schema()
merlin/io/dataset.py:1140: in infer_schema
    col_schema = ColumnSchema(column, dtype=dtype_val, is_list=is_list, is_ragged=is_list)
<string>:9: in __init__
    ???


self = ColumnSchema(name='column', tags=set(), properties={}, dtype=dtype('float64'), is_list=True, is_ragged=True)

def __post_init__(self):
    """Standardize tags and dtypes on initialization

    Raises:
        TypeError: If the provided dtype cannot be cast to a numpy dtype
    """
    tags = TagSet(self.tags)
    object.__setattr__(self, "tags", tags)

    try:
        if hasattr(self.dtype, "numpy_dtype"):
            dtype = np.dtype(self.dtype.numpy_dtype)
        elif hasattr(self.dtype, "_categories"):
            dtype = self.dtype._categories.dtype
        elif isinstance(self.dtype, pd.StringDtype):
            dtype = np.dtype("O")
        else:
            dtype = np.dtype(self.dtype)
    except TypeError as err:
        raise TypeError(
            f"Unsupported dtype {self.dtype}, unable to cast {self.dtype} to a numpy dtype."
        ) from err

    object.__setattr__(self, "dtype", dtype)

    value_count = self.properties.get("value_count", {"min": 0, "max": 0})

    if self.is_list is None:
        if value_count["max"] > 0:
            object.__setattr__(self, "is_list", True)
        else:
            object.__setattr__(self, "is_list", False)

    if self.is_ragged is None:
        if value_count["max"] > value_count["min"]:
            object.__setattr__(self, "is_ragged", True)
        else:
            object.__setattr__(self, "is_ragged", False)

    if self.is_ragged and not self.is_list:
        raise ValueError(
            "`is_ragged` is set to `True` but `is_list` is not. "
            "Only list columns can set the `is_ragged` flag."
        )

    if self.is_ragged and value_count["max"] == value_count["min"]:
      raise ValueError(
            "`is_ragged` is set to `True` but `value_count.min` == `value_count.max`. "
            "If value_count min/max are equal. This is a fixed size list and `is_ragged` should be set to False"
        )

E ValueError: is_ragged is set to True but value_count.min == value_count.max. If value_count min/max are equal. This is a fixed size list and is_ragged should be set to False

merlin/schema/schema.py:99: ValueError
__________________ test_string_datatypes[True-csv-no-header] ___________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-2/test_string_datatypes_True_csv1')
engine = 'csv-no-header', cpu = True

@pytest.mark.parametrize("engine", ["csv", "parquet", "csv-no-header"])
@pytest.mark.parametrize("cpu", [None, True])
def test_string_datatypes(tmpdir, engine, cpu):
    df_lib = dispatch.get_lib()
    df = df_lib.DataFrame({"column": [[0.1, 0.2]]})
  dataset = merlin.io.Dataset(df)

tests/unit/io/test_io.py:89:


merlin/io/dataset.py:354: in __init__
    self.infer_schema()
merlin/io/dataset.py:1140: in infer_schema
    col_schema = ColumnSchema(column, dtype=dtype_val, is_list=is_list, is_ragged=is_list)
<string>:9: in __init__
    ???


self = ColumnSchema(name='column', tags=set(), properties={}, dtype=dtype('float64'), is_list=True, is_ragged=True)

def __post_init__(self):
    """Standardize tags and dtypes on initialization

    Raises:
        TypeError: If the provided dtype cannot be cast to a numpy dtype
    """
    tags = TagSet(self.tags)
    object.__setattr__(self, "tags", tags)

    try:
        if hasattr(self.dtype, "numpy_dtype"):
            dtype = np.dtype(self.dtype.numpy_dtype)
        elif hasattr(self.dtype, "_categories"):
            dtype = self.dtype._categories.dtype
        elif isinstance(self.dtype, pd.StringDtype):
            dtype = np.dtype("O")
        else:
            dtype = np.dtype(self.dtype)
    except TypeError as err:
        raise TypeError(
            f"Unsupported dtype {self.dtype}, unable to cast {self.dtype} to a numpy dtype."
        ) from err

    object.__setattr__(self, "dtype", dtype)

    value_count = self.properties.get("value_count", {"min": 0, "max": 0})

    if self.is_list is None:
        if value_count["max"] > 0:
            object.__setattr__(self, "is_list", True)
        else:
            object.__setattr__(self, "is_list", False)

    if self.is_ragged is None:
        if value_count["max"] > value_count["min"]:
            object.__setattr__(self, "is_ragged", True)
        else:
            object.__setattr__(self, "is_ragged", False)

    if self.is_ragged and not self.is_list:
        raise ValueError(
            "`is_ragged` is set to `True` but `is_list` is not. "
            "Only list columns can set the `is_ragged` flag."
        )

    if self.is_ragged and value_count["max"] == value_count["min"]:
      raise ValueError(
            "`is_ragged` is set to `True` but `value_count.min` == `value_count.max`. "
            "If value_count min/max are equal. This is a fixed size list and `is_ragged` should be set to False"
        )

E ValueError: is_ragged is set to True but value_count.min == value_count.max. If value_count min/max are equal. This is a fixed size list and is_ragged should be set to False

merlin/schema/schema.py:99: ValueError
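
All six test_string_datatypes failures above reduce to the same root cause: infer_schema constructs the column with is_ragged=is_list but no value_count property, so __post_init__ falls back to {"min": 0, "max": 0} and the max == min guard fires even though no count information was ever supplied. A minimal sketch (assuming this PR's revision of merlin-core) that reproduces the error without the Dataset machinery:

from merlin.schema import ColumnSchema

# No value_count in properties, so __post_init__ falls back to
# {"min": 0, "max": 0}; with is_ragged=True the max == min check raises
# the same ValueError as above.
ColumnSchema("column", dtype="float64", is_list=True, is_ragged=True)

Guarding the check with value_count["max"] > 0, or only validating when a value_count is actually present, would keep it from firing on columns that carry no count information; that is one possible resolution, not necessarily the one this PR lands on.
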
______________________ test_dtype_column_schema[float32] _______________________

d_types = <class 'numpy.float32'>

@pytest.mark.parametrize("d_types", [numpy.float32, numpy.float64, numpy.uint32, numpy.uint64])
def test_dtype_column_schema(d_types):
  column = ColumnSchema("name", tags=[], properties=[], dtype=d_types)

tests/unit/schema/test_column_schemas.py:26:


<string>:9: in __init__
    ???


self = ColumnSchema(name='name', tags=set(), properties=[], dtype=dtype('float32'), is_list=None, is_ragged=None)

def __post_init__(self):
    """Standardize tags and dtypes on initialization

    Raises:
        TypeError: If the provided dtype cannot be cast to a numpy dtype
    """
    tags = TagSet(self.tags)
    object.__setattr__(self, "tags", tags)

    try:
        if hasattr(self.dtype, "numpy_dtype"):
            dtype = np.dtype(self.dtype.numpy_dtype)
        elif hasattr(self.dtype, "_categories"):
            dtype = self.dtype._categories.dtype
        elif isinstance(self.dtype, pd.StringDtype):
            dtype = np.dtype("O")
        else:
            dtype = np.dtype(self.dtype)
    except TypeError as err:
        raise TypeError(
            f"Unsupported dtype {self.dtype}, unable to cast {self.dtype} to a numpy dtype."
        ) from err

    object.__setattr__(self, "dtype", dtype)
  value_count = self.properties.get("value_count", {"min": 0, "max": 0})

E AttributeError: 'list' object has no attribute 'get'

merlin/schema/schema.py:78: AttributeError
______________________ test_dtype_column_schema[float64] _______________________

d_types = <class 'numpy.float64'>

@pytest.mark.parametrize("d_types", [numpy.float32, numpy.float64, numpy.uint32, numpy.uint64])
def test_dtype_column_schema(d_types):
  column = ColumnSchema("name", tags=[], properties=[], dtype=d_types)

tests/unit/schema/test_column_schemas.py:26:


<string>:9: in __init__
    ???


self = ColumnSchema(name='name', tags=set(), properties=[], dtype=dtype('float64'), is_list=None, is_ragged=None)

def __post_init__(self):
    """Standardize tags and dtypes on initialization

    Raises:
        TypeError: If the provided dtype cannot be cast to a numpy dtype
    """
    tags = TagSet(self.tags)
    object.__setattr__(self, "tags", tags)

    try:
        if hasattr(self.dtype, "numpy_dtype"):
            dtype = np.dtype(self.dtype.numpy_dtype)
        elif hasattr(self.dtype, "_categories"):
            dtype = self.dtype._categories.dtype
        elif isinstance(self.dtype, pd.StringDtype):
            dtype = np.dtype("O")
        else:
            dtype = np.dtype(self.dtype)
    except TypeError as err:
        raise TypeError(
            f"Unsupported dtype {self.dtype}, unable to cast {self.dtype} to a numpy dtype."
        ) from err

    object.__setattr__(self, "dtype", dtype)
  value_count = self.properties.get("value_count", {"min": 0, "max": 0})

E AttributeError: 'list' object has no attribute 'get'

merlin/schema/schema.py:78: AttributeError
_______________________ test_dtype_column_schema[uint32] _______________________

d_types = <class 'numpy.uint32'>

@pytest.mark.parametrize("d_types", [numpy.float32, numpy.float64, numpy.uint32, numpy.uint64])
def test_dtype_column_schema(d_types):
  column = ColumnSchema("name", tags=[], properties=[], dtype=d_types)

tests/unit/schema/test_column_schemas.py:26:


<string>:9: in __init__
    ???


self = ColumnSchema(name='name', tags=set(), properties=[], dtype=dtype('uint32'), is_list=None, is_ragged=None)

def __post_init__(self):
    """Standardize tags and dtypes on initialization

    Raises:
        TypeError: If the provided dtype cannot be cast to a numpy dtype
    """
    tags = TagSet(self.tags)
    object.__setattr__(self, "tags", tags)

    try:
        if hasattr(self.dtype, "numpy_dtype"):
            dtype = np.dtype(self.dtype.numpy_dtype)
        elif hasattr(self.dtype, "_categories"):
            dtype = self.dtype._categories.dtype
        elif isinstance(self.dtype, pd.StringDtype):
            dtype = np.dtype("O")
        else:
            dtype = np.dtype(self.dtype)
    except TypeError as err:
        raise TypeError(
            f"Unsupported dtype {self.dtype}, unable to cast {self.dtype} to a numpy dtype."
        ) from err

    object.__setattr__(self, "dtype", dtype)
  value_count = self.properties.get("value_count", {"min": 0, "max": 0})

E AttributeError: 'list' object has no attribute 'get'

merlin/schema/schema.py:78: AttributeError
_______________________ test_dtype_column_schema[uint64] _______________________

d_types = <class 'numpy.uint64'>

@pytest.mark.parametrize("d_types", [numpy.float32, numpy.float64, numpy.uint32, numpy.uint64])
def test_dtype_column_schema(d_types):
  column = ColumnSchema("name", tags=[], properties=[], dtype=d_types)

tests/unit/schema/test_column_schemas.py:26:


<string>:9: in __init__
    ???


self = ColumnSchema(name='name', tags=set(), properties=[], dtype=dtype('uint64'), is_list=None, is_ragged=None)

def __post_init__(self):
    """Standardize tags and dtypes on initialization

    Raises:
        TypeError: If the provided dtype cannot be cast to a numpy dtype
    """
    tags = TagSet(self.tags)
    object.__setattr__(self, "tags", tags)

    try:
        if hasattr(self.dtype, "numpy_dtype"):
            dtype = np.dtype(self.dtype.numpy_dtype)
        elif hasattr(self.dtype, "_categories"):
            dtype = self.dtype._categories.dtype
        elif isinstance(self.dtype, pd.StringDtype):
            dtype = np.dtype("O")
        else:
            dtype = np.dtype(self.dtype)
    except TypeError as err:
        raise TypeError(
            f"Unsupported dtype {self.dtype}, unable to cast {self.dtype} to a numpy dtype."
        ) from err

    object.__setattr__(self, "dtype", dtype)
  value_count = self.properties.get("value_count", {"min": 0, "max": 0})

E AttributeError: 'list' object has no attribute 'get'

merlin/schema/schema.py:78: AttributeError
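
The four test_dtype_column_schema failures are unrelated to the value_count semantics: the test passes properties=[] and the new self.properties.get(...) call assumes a mapping. A small sketch of one possible guard (get_value_count is a hypothetical helper, not part of the PR):

def get_value_count(properties) -> dict:
    """Tolerate a falsy or non-dict `properties` (the test passes
    properties=[]) instead of assuming a mapping."""
    props = properties if isinstance(properties, dict) else {}
    return props.get("value_count", {"min": 0, "max": 0})

assert get_value_count([]) == {"min": 0, "max": 0}
assert get_value_count({"value_count": {"min": 1, "max": 4}}) == {"min": 1, "max": 4}
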
_________________________ test_list_column_attributes __________________________

def test_list_column_attributes():
    col0_schema = ColumnSchema("col0")

    assert not col0_schema.is_list
    assert not col0_schema.is_ragged
    assert col0_schema.quantity == ColumnQuantity.SCALAR

    col1_schema = ColumnSchema("col1", is_list=False, is_ragged=False)

    assert not col1_schema.is_list
    assert not col1_schema.is_ragged
    assert col1_schema.quantity == ColumnQuantity.SCALAR

    col2_schema = ColumnSchema("col2", is_list=True)

    assert col2_schema.is_list
  assert col2_schema.is_ragged

E AssertionError: assert False
E + where False = ColumnSchema(name='col2', tags=set(), properties={}, dtype=dtype('float64'), is_list=True, is_ragged=False).is_ragged

tests/unit/schema/test_column_schemas.py:183: AssertionError
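
Here the inference itself is at issue: ColumnSchema("col2", is_list=True) carries no value_count, so the max > min rule infers is_ragged=False, while the test expects a list column of unknown length to default to ragged. A sketch of a default that would satisfy the assertion (hypothetical; the PR may resolve it differently):

def infer_is_ragged(is_list: bool, value_count: dict) -> bool:
    """A list column with no value_count is assumed ragged, since nothing
    pins its length; a fixed-size count (max == min > 0) is not ragged."""
    if value_count.get("max", 0) > 0:
        return value_count["max"] > value_count["min"]
    return bool(is_list)

assert infer_is_ragged(True, {}) is True                     # col2 above
assert infer_is_ragged(True, {"min": 2, "max": 2}) is False  # fixed-size list
assert infer_is_ragged(False, {}) is False                   # scalar column
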
______________________ test_tensorflow_metadata_from_json ______________________

def test_tensorflow_metadata_from_json():
    # make sure we can load up tensorflowmetadata serialized json objects, like done by
    # merlin-models
    json_schema = """{"feature": [
    {
      "name": "categories",
      "valueCount": {
        "min": "1",
        "max": "4"
      },
      "type": "INT",
      "intDomain": {
        "name": "categories",
        "min": "1",
        "max": "331",
        "isCategorical": true
      },
      "annotation": {
        "tag": [
          "item"
        ]
      }
    }]}
    """

    schema = TensorflowMetadata.from_json(json_schema).to_merlin_schema()
    column_schema = schema.column_schemas["categories"]

    # make sure the value_count is set appropriately
    assert column_schema.properties["value_count"] == {"min": 1, "max": 4}
  assert column_schema.is_list

E AssertionError: assert False
E + where False = ColumnSchema(name='categories', tags={<Tags.ITEM: 'item'>, <Tags.CATEGORICAL: 'categorical'>}, properties={'domain': {...331, 'name': 'categories'}, 'value_count': {'min': 1, 'max': 4}}, dtype=dtype('int64'), is_list=False, is_ragged=False).is_list

tests/unit/schema/test_schema_io.py:220: AssertionError
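
In this failure the value_count round-trips correctly; what is lost is the inference. The deserialized column arrives with is_list=False and is_ragged=False rather than None, so __post_init__ never consults the value_count. Building the column directly with the defaults shows the inference behaves as intended once it is allowed to run (a sketch, assuming this PR's revision):

from merlin.schema import ColumnSchema

col = ColumnSchema(
    "categories",
    dtype="int64",
    properties={"value_count": {"min": 1, "max": 4}},
)
# is_list/is_ragged default to None, so __post_init__ infers them:
assert col.is_list    # value_count["max"] > 0
assert col.is_ragged  # value_count["max"] > value_count["min"]
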
=============================== warnings summary ===============================
tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 6 files. Did not have enough partitions to create 7 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 8 files. Did not have enough partitions to create 9 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 8 files. Did not have enough partitions to create 10 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 8 files. Did not have enough partitions to create 11 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 12 files. Did not have enough partitions to create 13 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 12 files. Did not have enough partitions to create 14 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 12 files. Did not have enough partitions to create 15 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 12 files. Did not have enough partitions to create 16 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 12 files. Did not have enough partitions to create 17 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 12 files. Did not have enough partitions to create 18 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 12 files. Did not have enough partitions to create 19 files.
warnings.warn(

tests/unit/io/test_io.py::test_validate_and_regenerate_dataset
/var/jenkins_home/workspace/merlin_core/core/merlin/io/parquet.py:551: DeprecationWarning: 'ParquetDataset.pieces' attribute is deprecated as of pyarrow 5.0.0 and will be removed in a future version. Specify 'use_legacy_dataset=False' while constructing the ParquetDataset, and then use the '.fragments' attribute instead.
paths = [p.path for p in pa_dataset.pieces]

tests/unit/schema/test_column_schemas.py::test_column_schema_tags_normalize
tests/unit/schema/test_tags.py::test_tagset_atomizes_compound_tags
/var/jenkins_home/workspace/merlin_core/core/merlin/schema/tags.py:148: UserWarning: Compound tags like Tags.ITEM_ID have been deprecated and will be removed in a future version. Please use the atomic versions of these tags, like [<Tags.ITEM: 'item'>, <Tags.ID: 'id'>].
warnings.warn(

tests/unit/schema/test_schema_io.py::test_json_serialization_with_embedded_dicts
tests/unit/schema/test_schema_io.py::test_merlin_to_proto_to_json_to_merlin
tests/unit/schema/test_tags.py::test_tagset_atomizes_compound_tags
/var/jenkins_home/workspace/merlin_core/core/merlin/schema/tags.py:148: UserWarning: Compound tags like Tags.USER_ID have been deprecated and will be removed in a future version. Please use the atomic versions of these tags, like [<Tags.USER: 'user'>, <Tags.ID: 'id'>].
warnings.warn(

tests/unit/schema/test_tags.py::test_tagset_atomizes_compound_tags
/var/jenkins_home/workspace/merlin_core/core/merlin/schema/tags.py:148: UserWarning: Compound tags like Tags.SESSION_ID have been deprecated and will be removed in a future version. Please use the atomic versions of these tags, like [<Tags.SESSION: 'session'>, <Tags.ID: 'id'>].
warnings.warn(

tests/unit/schema/test_tags.py::test_tagset_atomizes_compound_tags
/var/jenkins_home/workspace/merlin_core/core/merlin/schema/tags.py:148: UserWarning: Compound tags like Tags.TEXT_TOKENIZED have been deprecated and will be removed in a future version. Please use the atomic versions of these tags, like [<Tags.TEXT: 'text'>, <Tags.TOKENIZED: 'tokenized'>].
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed[True-True]
/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 33883 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed[True-False]
/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 34367 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed[False-True]
/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 33699 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed[False-False]
/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 36081 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed_force[True]
/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 45019 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed_force[False]
/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 34047 instead
warnings.warn(

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html

---------- coverage: platform linux, python 3.8.10-final-0 -----------
Name                                        Stmts   Miss  Cover
---------------------------------------------------------------
merlin/core/__init__.py                         2      0   100%
merlin/core/_version.py                       354    205    42%
merlin/core/compat.py                          10      4    60%
merlin/core/dispatch.py                       356    212    40%
merlin/core/protocols.py                       99     45    55%
merlin/core/utils.py                          195     56    71%
merlin/dag/__init__.py                          5      0   100%
merlin/dag/base_operator.py                   122     20    84%
merlin/dag/dictarray.py                        55     15    73%
merlin/dag/executors.py                       141     68    52%
merlin/dag/graph.py                            99     35    65%
merlin/dag/node.py                            344    161    53%
merlin/dag/ops/__init__.py                      4      0   100%
merlin/dag/ops/concat_columns.py               17      4    76%
merlin/dag/ops/selection.py                    22      0   100%
merlin/dag/ops/subset_columns.py               12      4    67%
merlin/dag/ops/subtraction.py                  21     11    48%
merlin/dag/selector.py                        101      6    94%
merlin/io/__init__.py                           4      0   100%
merlin/io/avro.py                              88     88     0%
merlin/io/csv.py                               57      6    89%
merlin/io/dask.py                             181     53    71%
merlin/io/dataframe_engine.py                  61      5    92%
merlin/io/dataframe_iter.py                    21      1    95%
merlin/io/dataset.py                          347     54    84%
merlin/io/dataset_engine.py                    37      8    78%
merlin/io/fsspec_utils.py                     127    108    15%
merlin/io/hugectr.py                           45     35    22%
merlin/io/parquet.py                          603     69    89%
merlin/io/shuffle.py                           38     12    68%
merlin/io/worker.py                            80     66    18%
merlin/io/writer.py                           190     52    73%
merlin/io/writer_factory.py                    18      4    78%
merlin/schema/__init__.py                       2      0   100%
merlin/schema/io/__init__.py                    0      0   100%
merlin/schema/io/proto_utils.py                20      4    80%
merlin/schema/io/schema_bp.py                 306      5    98%
merlin/schema/io/tensorflow_metadata.py       183     17    91%
merlin/schema/schema.py                       218     36    83%
merlin/schema/tags.py                          82      0   100%
---------------------------------------------------------------
TOTAL                                        4667   1469    69%

=========================== short test summary info ============================
SKIPPED [1] tests/unit/io/test_avro.py:34: could not import 'uavro': No module named 'uavro'
====== 12 failed, 379 passed, 1 skipped, 25 warnings in 64.77s (0:01:04) =======
ERROR: InvocationError for command /var/jenkins_home/workspace/merlin_core/core/.tox/test-gpu/bin/python -m pytest --cov-report term --cov merlin -rxs tests/unit (exited with code 1)
___________________________________ summary ____________________________________
ERROR: test-gpu: commands failed
Build step 'Execute shell' marked build as failure
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA-Merlin/core/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[merlin_core] $ /bin/bash /tmp/jenkins12932198516773488835.sh

@oliverholworthy oliverholworthy changed the title Add test for value_count property in schema serialization Update value_count serialization/deserialization to be consistent with original schema Nov 18, 2022
@nvidia-merlin-bot

Click to view CI Results
GitHub pull request #111 of commit d5a0ca7d749b703b3ae3b21887c380b87131b060, no merge conflicts.
Running as SYSTEM
Setting status of d5a0ca7d749b703b3ae3b21887c380b87131b060 to PENDING with url http://merlin-infra1.nvidia.com:8080/job/merlin_core/291/ and message: 'Pending'
Using context: Jenkins
Building on master in workspace /var/jenkins_home/workspace/merlin_core
using credential ce87ff3c-94f0-400a-8303-cb4acb4918b5
 > git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > git config remote.origin.url /~https://github.com/NVIDIA-Merlin/core # timeout=10
Fetching upstream changes from /~https://github.com/NVIDIA-Merlin/core
 > git --version # timeout=10
using GIT_ASKPASS to set credentials login for merlin-systems username and pass
 > git fetch --tags --force --progress -- /~https://github.com/NVIDIA-Merlin/core +refs/pull/111/*:refs/remotes/origin/pr/111/* # timeout=10
 > git rev-parse d5a0ca7d749b703b3ae3b21887c380b87131b060^{commit} # timeout=10
Checking out Revision d5a0ca7d749b703b3ae3b21887c380b87131b060 (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f d5a0ca7d749b703b3ae3b21887c380b87131b060 # timeout=10
Commit message: "Update formatting"
 > git rev-list --no-walk bd68a1428d8f3a45dbeafa451ffcba1dc8f3d083 # timeout=10
[merlin_core] $ /bin/bash /tmp/jenkins2975759695102390733.sh
GLOB sdist-make: /var/jenkins_home/workspace/merlin_core/core/setup.py
test-gpu recreate: /var/jenkins_home/workspace/merlin_core/core/.tox/test-gpu
test-gpu installdeps: pytest, pytest-cov
WARNING: Discarding $PYTHONPATH from environment, to override specify PYTHONPATH in 'passenv' in your configuration.
test-gpu inst: /var/jenkins_home/workspace/merlin_core/core/.tox/.tmp/package/1/merlin-core-0.8.0+19.gd5a0ca7.zip
WARNING: Discarding $PYTHONPATH from environment, to override specify PYTHONPATH in 'passenv' in your configuration.
test-gpu installed: absl-py==1.2.0,aiohttp==3.8.1,aiosignal==1.2.0,alabaster==0.7.12,alembic==1.8.1,anyio==3.6.1,argon2-cffi==21.3.0,argon2-cffi-bindings==21.2.0,astroid==2.5.6,asttokens==2.0.8,astunparse==1.6.3,asv==0.5.1,asvdb==0.4.2,async-timeout==4.0.2,attrs==22.1.0,autopage==0.5.1,awscli==1.27.12,Babel==2.10.3,backcall==0.2.0,beautifulsoup4==4.11.1,betterproto==1.2.5,black==22.6.0,bleach==5.0.1,boto3==1.24.75,botocore==1.29.12,Brotli==1.0.9,cachetools==5.2.0,certifi==2019.11.28,cffi==1.15.1,chardet==3.0.4,charset-normalizer==2.1.1,clang==5.0,click==8.1.3,cliff==4.0.0,cloudpickle==2.2.0,cmaes==0.9.0,cmake==3.24.1.1,cmd2==2.4.2,colorama==0.4.4,colorlog==6.7.0,contourpy==1.0.5,coverage==6.5.0,cuda-python==11.7.1,cupy-cuda117==10.6.0,cycler==0.11.0,Cython==0.29.32,dask==2022.1.1,dbus-python==1.2.16,debugpy==1.6.3,decorator==5.1.1,defusedxml==0.7.1,dill==0.3.5.1,distlib==0.3.6,distributed==2022.5.1,distro==1.7.0,dm-tree==0.1.6,docker-pycreds==0.4.0,docutils==0.16,emoji==1.7.0,entrypoints==0.4,execnet==1.9.0,executing==1.0.0,faiss==1.7.2,faiss-gpu==1.7.2,fastai==2.7.9,fastapi==0.85.0,fastavro==1.6.1,fastcore==1.5.27,fastdownload==0.0.7,fastjsonschema==2.16.1,fastprogress==1.0.3,fastrlock==0.8,feast==0.19.4,fiddle==0.2.2,filelock==3.8.0,flatbuffers==1.12,fonttools==4.37.3,frozenlist==1.3.1,fsspec==2022.5.0,gast==0.4.0,gevent==21.12.0,geventhttpclient==2.0.2,gitdb==4.0.9,GitPython==3.1.27,google==3.0.0,google-api-core==2.10.1,google-auth==2.11.1,google-auth-oauthlib==0.4.6,google-pasta==0.2.0,googleapis-common-protos==1.52.0,graphviz==0.20.1,greenlet==1.1.3,grpcio==1.41.0,grpcio-channelz==1.49.0,grpcio-reflection==1.48.1,grpclib==0.4.3,h11==0.13.0,h2==4.1.0,h5py==3.7.0,HeapDict==1.0.1,horovod==0.26.1,hpack==4.0.0,httptools==0.5.0,hugectr2onnx==0.0.0,huggingface-hub==0.9.1,hyperframe==6.0.1,idna==2.8,imagesize==1.4.1,implicit==0.6.1,importlib-metadata==4.12.0,importlib-resources==5.9.0,iniconfig==1.1.1,ipykernel==6.15.3,ipython==8.5.0,ipython-genutils==0.2.0,ipywidgets==7.7.0,jedi==0.18.1,Jinja2==3.1.2,jmespath==1.0.1,joblib==1.2.0,json5==0.9.10,jsonschema==4.16.0,jupyter-cache==0.4.3,jupyter-core==4.11.1,jupyter-server==1.18.1,jupyter-server-mathjax==0.2.5,jupyter-sphinx==0.3.2,jupyter_client==7.3.5,jupyterlab==3.4.7,jupyterlab-pygments==0.2.2,jupyterlab-widgets==1.1.0,jupyterlab_server==2.15.1,keras==2.9.0,Keras-Preprocessing==1.1.2,kiwisolver==1.4.4,lazy-object-proxy==1.8.0,libclang==14.0.6,libcst==0.4.7,lightfm==1.16,lightgbm==3.3.2,linkify-it-py==1.0.3,llvmlite==0.39.1,locket==1.0.0,lxml==4.9.1,Mako==1.2.4,Markdown==3.4.1,markdown-it-py==1.1.0,MarkupSafe==2.1.1,matplotlib==3.6.0,matplotlib-inline==0.1.6,mdit-py-plugins==0.2.8,merlin-core==0.8.0+19.gd5a0ca7,merlin-models==0.7.0+11.g280956aa4,merlin-systems==0.5.0+4.g15074ad,mistune==2.0.4,mmh3==3.0.0,mpi4py==3.1.3,msgpack==1.0.4,multidict==6.0.2,mypy-extensions==0.4.3,myst-nb==0.13.2,myst-parser==0.15.2,natsort==8.1.0,nbclassic==0.4.3,nbclient==0.6.8,nbconvert==7.0.0,nbdime==3.1.1,nbformat==5.5.0,nest-asyncio==1.5.5,ninja==1.10.2.3,notebook==6.4.12,notebook-shim==0.1.0,numba==0.56.2,numpy==1.22.4,nvidia-pyindex==1.0.9,# Editable install with no version control (nvtabular==1.4.0+8.g95e12d347),-e 
/usr/local/lib/python3.8/dist-packages,nvtx==0.2.5,oauthlib==3.2.1,oldest-supported-numpy==2022.8.16,onnx==1.12.0,onnxruntime==1.11.1,opt-einsum==3.3.0,optuna==3.0.3,packaging==21.3,pandas==1.3.5,pandavro==1.5.2,pandocfilters==1.5.0,parso==0.8.3,partd==1.3.0,pathtools==0.1.2,pbr==5.11.0,pexpect==4.8.0,pickleshare==0.7.5,Pillow==9.2.0,pkgutil_resolve_name==1.3.10,platformdirs==2.5.2,plotly==5.11.0,pluggy==1.0.0,prettytable==3.5.0,prometheus-client==0.14.1,promise==2.3,prompt-toolkit==3.0.31,proto-plus==1.19.6,protobuf==3.19.5,psutil==5.9.2,ptyprocess==0.7.0,pure-eval==0.2.2,py==1.11.0,pyarrow==7.0.0,pyasn1==0.4.8,pyasn1-modules==0.2.8,pybind11==2.10.0,pycparser==2.21,pydantic==1.10.2,pydot==1.4.2,Pygments==2.13.0,PyGObject==3.36.0,pynvml==11.4.1,pyparsing==3.0.9,pyperclip==1.8.2,pyrsistent==0.18.1,pytest==7.1.3,pytest-cov==4.0.0,pytest-xdist==3.0.2,python-apt==2.0.0+ubuntu0.20.4.8,python-dateutil==2.8.2,python-dotenv==0.21.0,python-rapidjson==1.8,pytz==2022.2.1,PyYAML==5.4.1,pyzmq==24.0.0,regex==2022.9.13,requests==2.22.0,requests-oauthlib==1.3.1,requests-unixsocket==0.2.0,rsa==4.7.2,s3fs==2022.2.0,s3transfer==0.6.0,sacremoses==0.0.53,scikit-build==0.15.0,scikit-learn==1.1.2,scipy==1.8.1,seedir==0.3.0,Send2Trash==1.8.0,sentry-sdk==1.9.8,setproctitle==1.3.2,setuptools-scm==7.0.5,shortuuid==1.0.9,six==1.15.0,sklearn==0.0,smmap==5.0.0,sniffio==1.3.0,snowballstemmer==2.2.0,sortedcontainers==2.4.0,soupsieve==2.3.2.post1,Sphinx==5.3.0,sphinx-multiversion==0.2.4,sphinx-togglebutton==0.3.1,sphinx_external_toc==0.3.0,sphinxcontrib-applehelp==1.0.2,sphinxcontrib-copydirs @ git+/~https://github.com/mikemckiernan/sphinxcontrib-copydirs.git@bd8c5d79b3f91cf5f1bb0d6995aeca3fe84b670e,sphinxcontrib-devhelp==1.0.2,sphinxcontrib-htmlhelp==2.0.0,sphinxcontrib-jsmath==1.0.1,sphinxcontrib-qthelp==1.0.3,sphinxcontrib-serializinghtml==1.1.5,SQLAlchemy==1.4.44,stack-data==0.5.0,starlette==0.20.4,stevedore==4.1.1,stringcase==1.2.0,supervisor==4.1.0,tabulate==0.8.10,tblib==1.7.0,tdqm==0.0.1,tenacity==8.0.1,tensorboard==2.9.1,tensorboard-data-server==0.6.1,tensorboard-plugin-wit==1.8.1,tensorflow==2.9.2,tensorflow-estimator==2.9.0,tensorflow-gpu==2.9.2,tensorflow-io-gcs-filesystem==0.27.0,tensorflow-metadata==1.10.0,termcolor==2.0.1,terminado==0.15.0,testbook==0.4.2,threadpoolctl==3.1.0,tinycss2==1.1.1,tokenizers==0.10.3,toml==0.10.2,tomli==2.0.1,toolz==0.12.0,torch==1.12.1+cu113,torchmetrics==0.3.2,tornado==6.2,tox==3.26.0,tqdm==4.64.1,traitlets==5.4.0,transformers==4.12.0,transformers4rec==0.1.12+2.gbcc939255,treelite==2.3.0,treelite-runtime==2.3.0,tritonclient==2.25.0,typing-inspect==0.8.0,typing_extensions==4.3.0,uc-micro-py==1.0.1,urllib3==1.26.12,uvicorn==0.18.3,uvloop==0.17.0,versioneer==0.20,virtualenv==20.16.5,wandb==0.13.3,watchfiles==0.17.0,wcwidth==0.2.5,webencodings==0.5.1,websocket-client==1.4.1,websockets==10.3,Werkzeug==2.2.2,widgetsnbextension==3.6.0,wrapt==1.12.1,xgboost==1.6.2,yarl==1.8.1,zict==2.2.0,zipp==3.8.1,zope.event==4.5.0,zope.interface==5.4.0
test-gpu run-test-pre: PYTHONHASHSEED='2218809350'
test-gpu run-test: commands[0] | python -m pytest --cov-report term --cov merlin -rxs tests/unit
============================= test session starts ==============================
platform linux -- Python 3.8.10, pytest-7.1.3, pluggy-1.0.0
cachedir: .tox/test-gpu/.pytest_cache
rootdir: /var/jenkins_home/workspace/merlin_core/core, configfile: pyproject.toml
plugins: anyio-3.5.0, xdist-3.0.2, cov-4.0.0
collected 391 items / 1 skipped

tests/unit/core/test_dispatch.py .... [ 1%]
tests/unit/core/test_protocols.py ......... [ 3%]
tests/unit/core/test_version.py . [ 3%]
tests/unit/dag/test_base_operator.py .... [ 4%]
tests/unit/dag/test_column_selector.py .............................. [ 12%]
tests/unit/dag/test_dictarray.py ... [ 13%]
tests/unit/dag/test_executors.py .. [ 13%]
tests/unit/dag/test_graph.py .... [ 14%]
tests/unit/dag/ops/test_selection.py .... [ 15%]
tests/unit/io/test_io.py ...FFFFFF...................................... [ 27%]
................................................................ [ 43%]
tests/unit/schema/test_column_schemas.py FFFF..................F. [ 50%]
tests/unit/schema/test_schema.py ............. [ 53%]
tests/unit/schema/test_schema_io.py .................................... [ 62%]
........................................................................ [ 81%]
..........................................................F [ 96%]
tests/unit/schema/test_tags.py ....... [ 97%]
tests/unit/utils/test_utils.py ........ [100%]

=================================== FAILURES ===================================
_______________________ test_string_datatypes[None-csv] ________________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-3/test_string_datatypes_None_csv0')
engine = 'csv', cpu = None

@pytest.mark.parametrize("engine", ["csv", "parquet", "csv-no-header"])
@pytest.mark.parametrize("cpu", [None, True])
def test_string_datatypes(tmpdir, engine, cpu):
    df_lib = dispatch.get_lib()
    df = df_lib.DataFrame({"column": [[0.1, 0.2]]})
  dataset = merlin.io.Dataset(df)

tests/unit/io/test_io.py:89:


merlin/io/dataset.py:354: in __init__
    self.infer_schema()
merlin/io/dataset.py:1140: in infer_schema
    col_schema = ColumnSchema(column, dtype=dtype_val, is_list=is_list, is_ragged=is_list)
<string>:9: in __init__
    ???


self = ColumnSchema(name='column', tags=set(), properties={}, dtype=dtype('float64'), is_list=True, is_ragged=True)

def __post_init__(self):
    """Standardize tags and dtypes on initialization

    Raises:
        TypeError: If the provided dtype cannot be cast to a numpy dtype
    """
    tags = TagSet(self.tags)
    object.__setattr__(self, "tags", tags)

    try:
        if hasattr(self.dtype, "numpy_dtype"):
            dtype = np.dtype(self.dtype.numpy_dtype)
        elif hasattr(self.dtype, "_categories"):
            dtype = self.dtype._categories.dtype
        elif isinstance(self.dtype, pd.StringDtype):
            dtype = np.dtype("O")
        else:
            dtype = np.dtype(self.dtype)
    except TypeError as err:
        raise TypeError(
            f"Unsupported dtype {self.dtype}, unable to cast {self.dtype} to a numpy dtype."
        ) from err

    object.__setattr__(self, "dtype", dtype)

    value_count = self.properties.get("value_count", {"min": 0, "max": 0})

    if self.is_list is None:
        if value_count["max"] > 0:
            object.__setattr__(self, "is_list", True)
        else:
            object.__setattr__(self, "is_list", False)

    if self.is_ragged is None:
        if value_count["max"] > value_count["min"]:
            object.__setattr__(self, "is_ragged", True)
        else:
            object.__setattr__(self, "is_ragged", False)

    if self.is_ragged and not self.is_list:
        raise ValueError(
            "`is_ragged` is set to `True` but `is_list` is not. "
            "Only list columns can set the `is_ragged` flag."
        )

    if self.is_ragged and value_count["max"] == value_count["min"]:
      raise ValueError(
            "`is_ragged` is set to `True` but `value_count.min` == `value_count.max`. "
            "If value_count min/max are equal. "
            "This is a fixed size list and `is_ragged` should be set to False. "
        )

E ValueError: is_ragged is set to True but value_count.min == value_count.max. If value_count min/max are equal. This is a fixed size list and is_ragged should be set to False.

merlin/schema/schema.py:99: ValueError
_____________________ test_string_datatypes[None-parquet] ______________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-3/test_string_datatypes_None_par0')
engine = 'parquet', cpu = None

@pytest.mark.parametrize("engine", ["csv", "parquet", "csv-no-header"])
@pytest.mark.parametrize("cpu", [None, True])
def test_string_datatypes(tmpdir, engine, cpu):
    df_lib = dispatch.get_lib()
    df = df_lib.DataFrame({"column": [[0.1, 0.2]]})
  dataset = merlin.io.Dataset(df)

tests/unit/io/test_io.py:89:


merlin/io/dataset.py:354: in __init__
    self.infer_schema()
merlin/io/dataset.py:1140: in infer_schema
    col_schema = ColumnSchema(column, dtype=dtype_val, is_list=is_list, is_ragged=is_list)
<string>:9: in __init__
    ???


self = ColumnSchema(name='column', tags=set(), properties={}, dtype=dtype('float64'), is_list=True, is_ragged=True)

def __post_init__(self):
    """Standardize tags and dtypes on initialization

    Raises:
        TypeError: If the provided dtype cannot be cast to a numpy dtype
    """
    tags = TagSet(self.tags)
    object.__setattr__(self, "tags", tags)

    try:
        if hasattr(self.dtype, "numpy_dtype"):
            dtype = np.dtype(self.dtype.numpy_dtype)
        elif hasattr(self.dtype, "_categories"):
            dtype = self.dtype._categories.dtype
        elif isinstance(self.dtype, pd.StringDtype):
            dtype = np.dtype("O")
        else:
            dtype = np.dtype(self.dtype)
    except TypeError as err:
        raise TypeError(
            f"Unsupported dtype {self.dtype}, unable to cast {self.dtype} to a numpy dtype."
        ) from err

    object.__setattr__(self, "dtype", dtype)

    value_count = self.properties.get("value_count", {"min": 0, "max": 0})

    if self.is_list is None:
        if value_count["max"] > 0:
            object.__setattr__(self, "is_list", True)
        else:
            object.__setattr__(self, "is_list", False)

    if self.is_ragged is None:
        if value_count["max"] > value_count["min"]:
            object.__setattr__(self, "is_ragged", True)
        else:
            object.__setattr__(self, "is_ragged", False)

    if self.is_ragged and not self.is_list:
        raise ValueError(
            "`is_ragged` is set to `True` but `is_list` is not. "
            "Only list columns can set the `is_ragged` flag."
        )

    if self.is_ragged and value_count["max"] == value_count["min"]:
      raise ValueError(
            "`is_ragged` is set to `True` but `value_count.min` == `value_count.max`. "
            "If value_count min/max are equal. "
            "This is a fixed size list and `is_ragged` should be set to False. "
        )

E ValueError: is_ragged is set to True but value_count.min == value_count.max. If value_count min/max are equal. This is a fixed size list and is_ragged should be set to False.

merlin/schema/schema.py:99: ValueError
__________________ test_string_datatypes[None-csv-no-header] ___________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-3/test_string_datatypes_None_csv1')
engine = 'csv-no-header', cpu = None

@pytest.mark.parametrize("engine", ["csv", "parquet", "csv-no-header"])
@pytest.mark.parametrize("cpu", [None, True])
def test_string_datatypes(tmpdir, engine, cpu):
    df_lib = dispatch.get_lib()
    df = df_lib.DataFrame({"column": [[0.1, 0.2]]})
  dataset = merlin.io.Dataset(df)

tests/unit/io/test_io.py:89:


merlin/io/dataset.py:354: in __init__
    self.infer_schema()
merlin/io/dataset.py:1140: in infer_schema
    col_schema = ColumnSchema(column, dtype=dtype_val, is_list=is_list, is_ragged=is_list)
<string>:9: in __init__
    ???


self = ColumnSchema(name='column', tags=set(), properties={}, dtype=dtype('float64'), is_list=True, is_ragged=True)

def __post_init__(self):
    """Standardize tags and dtypes on initialization

    Raises:
        TypeError: If the provided dtype cannot be cast to a numpy dtype
    """
    tags = TagSet(self.tags)
    object.__setattr__(self, "tags", tags)

    try:
        if hasattr(self.dtype, "numpy_dtype"):
            dtype = np.dtype(self.dtype.numpy_dtype)
        elif hasattr(self.dtype, "_categories"):
            dtype = self.dtype._categories.dtype
        elif isinstance(self.dtype, pd.StringDtype):
            dtype = np.dtype("O")
        else:
            dtype = np.dtype(self.dtype)
    except TypeError as err:
        raise TypeError(
            f"Unsupported dtype {self.dtype}, unable to cast {self.dtype} to a numpy dtype."
        ) from err

    object.__setattr__(self, "dtype", dtype)

    value_count = self.properties.get("value_count", {"min": 0, "max": 0})

    if self.is_list is None:
        if value_count["max"] > 0:
            object.__setattr__(self, "is_list", True)
        else:
            object.__setattr__(self, "is_list", False)

    if self.is_ragged is None:
        if value_count["max"] > value_count["min"]:
            object.__setattr__(self, "is_ragged", True)
        else:
            object.__setattr__(self, "is_ragged", False)

    if self.is_ragged and not self.is_list:
        raise ValueError(
            "`is_ragged` is set to `True` but `is_list` is not. "
            "Only list columns can set the `is_ragged` flag."
        )

    if self.is_ragged and value_count["max"] == value_count["min"]:
      raise ValueError(
            "`is_ragged` is set to `True` but `value_count.min` == `value_count.max`. "
            "If value_count min/max are equal. "
            "This is a fixed size list and `is_ragged` should be set to False. "
        )

E ValueError: is_ragged is set to True but value_count.min == value_count.max. If value_count min/max are equal. This is a fixed size list and is_ragged should be set to False.

merlin/schema/schema.py:99: ValueError
_______________________ test_string_datatypes[True-csv] ________________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-3/test_string_datatypes_True_csv0')
engine = 'csv', cpu = True

@pytest.mark.parametrize("engine", ["csv", "parquet", "csv-no-header"])
@pytest.mark.parametrize("cpu", [None, True])
def test_string_datatypes(tmpdir, engine, cpu):
    df_lib = dispatch.get_lib()
    df = df_lib.DataFrame({"column": [[0.1, 0.2]]})
  dataset = merlin.io.Dataset(df)

tests/unit/io/test_io.py:89:


merlin/io/dataset.py:354: in __init__
    self.infer_schema()
merlin/io/dataset.py:1140: in infer_schema
    col_schema = ColumnSchema(column, dtype=dtype_val, is_list=is_list, is_ragged=is_list)
<string>:9: in __init__
    ???


self = ColumnSchema(name='column', tags=set(), properties={}, dtype=dtype('float64'), is_list=True, is_ragged=True)

def __post_init__(self):
    """Standardize tags and dtypes on initialization

    Raises:
        TypeError: If the provided dtype cannot be cast to a numpy dtype
    """
    tags = TagSet(self.tags)
    object.__setattr__(self, "tags", tags)

    try:
        if hasattr(self.dtype, "numpy_dtype"):
            dtype = np.dtype(self.dtype.numpy_dtype)
        elif hasattr(self.dtype, "_categories"):
            dtype = self.dtype._categories.dtype
        elif isinstance(self.dtype, pd.StringDtype):
            dtype = np.dtype("O")
        else:
            dtype = np.dtype(self.dtype)
    except TypeError as err:
        raise TypeError(
            f"Unsupported dtype {self.dtype}, unable to cast {self.dtype} to a numpy dtype."
        ) from err

    object.__setattr__(self, "dtype", dtype)

    value_count = self.properties.get("value_count", {"min": 0, "max": 0})

    if self.is_list is None:
        if value_count["max"] > 0:
            object.__setattr__(self, "is_list", True)
        else:
            object.__setattr__(self, "is_list", False)

    if self.is_ragged is None:
        if value_count["max"] > value_count["min"]:
            object.__setattr__(self, "is_ragged", True)
        else:
            object.__setattr__(self, "is_ragged", False)

    if self.is_ragged and not self.is_list:
        raise ValueError(
            "`is_ragged` is set to `True` but `is_list` is not. "
            "Only list columns can set the `is_ragged` flag."
        )

    if self.is_ragged and value_count["max"] == value_count["min"]:
      raise ValueError(
            "`is_ragged` is set to `True` but `value_count.min` == `value_count.max`. "
            "If value_count min/max are equal. "
            "This is a fixed size list and `is_ragged` should be set to False. "
        )

E ValueError: is_ragged is set to True but value_count.min == value_count.max. If value_count min/max are equal. This is a fixed size list and is_ragged should be set to False.

merlin/schema/schema.py:99: ValueError
_____________________ test_string_datatypes[True-parquet] ______________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-3/test_string_datatypes_True_par0')
engine = 'parquet', cpu = True

@pytest.mark.parametrize("engine", ["csv", "parquet", "csv-no-header"])
@pytest.mark.parametrize("cpu", [None, True])
def test_string_datatypes(tmpdir, engine, cpu):
    df_lib = dispatch.get_lib()
    df = df_lib.DataFrame({"column": [[0.1, 0.2]]})
  dataset = merlin.io.Dataset(df)

tests/unit/io/test_io.py:89:


merlin/io/dataset.py:354: in __init__
    self.infer_schema()
merlin/io/dataset.py:1140: in infer_schema
    col_schema = ColumnSchema(column, dtype=dtype_val, is_list=is_list, is_ragged=is_list)
<string>:9: in __init__
    ???


self = ColumnSchema(name='column', tags=set(), properties={}, dtype=dtype('float64'), is_list=True, is_ragged=True)

def __post_init__(self):
    """Standardize tags and dtypes on initialization

    Raises:
        TypeError: If the provided dtype cannot be cast to a numpy dtype
    """
    tags = TagSet(self.tags)
    object.__setattr__(self, "tags", tags)

    try:
        if hasattr(self.dtype, "numpy_dtype"):
            dtype = np.dtype(self.dtype.numpy_dtype)
        elif hasattr(self.dtype, "_categories"):
            dtype = self.dtype._categories.dtype
        elif isinstance(self.dtype, pd.StringDtype):
            dtype = np.dtype("O")
        else:
            dtype = np.dtype(self.dtype)
    except TypeError as err:
        raise TypeError(
            f"Unsupported dtype {self.dtype}, unable to cast {self.dtype} to a numpy dtype."
        ) from err

    object.__setattr__(self, "dtype", dtype)

    value_count = self.properties.get("value_count", {"min": 0, "max": 0})

    if self.is_list is None:
        if value_count["max"] > 0:
            object.__setattr__(self, "is_list", True)
        else:
            object.__setattr__(self, "is_list", False)

    if self.is_ragged is None:
        if value_count["max"] > value_count["min"]:
            object.__setattr__(self, "is_ragged", True)
        else:
            object.__setattr__(self, "is_ragged", False)

    if self.is_ragged and not self.is_list:
        raise ValueError(
            "`is_ragged` is set to `True` but `is_list` is not. "
            "Only list columns can set the `is_ragged` flag."
        )

    if self.is_ragged and value_count["max"] == value_count["min"]:
      raise ValueError(
            "`is_ragged` is set to `True` but `value_count.min` == `value_count.max`. "
            "If value_count min/max are equal. "
            "This is a fixed size list and `is_ragged` should be set to False. "
        )

E ValueError: is_ragged is set to True but value_count.min == value_count.max. If value_count min/max are equal. This is a fixed size list and is_ragged should be set to False.

merlin/schema/schema.py:99: ValueError
__________________ test_string_datatypes[True-csv-no-header] ___________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-3/test_string_datatypes_True_csv1')
engine = 'csv-no-header', cpu = True

@pytest.mark.parametrize("engine", ["csv", "parquet", "csv-no-header"])
@pytest.mark.parametrize("cpu", [None, True])
def test_string_datatypes(tmpdir, engine, cpu):
    df_lib = dispatch.get_lib()
    df = df_lib.DataFrame({"column": [[0.1, 0.2]]})
  dataset = merlin.io.Dataset(df)

tests/unit/io/test_io.py:89:


merlin/io/dataset.py:354: in __init__
    self.infer_schema()
merlin/io/dataset.py:1140: in infer_schema
    col_schema = ColumnSchema(column, dtype=dtype_val, is_list=is_list, is_ragged=is_list)
<string>:9: in __init__
    ???


self = ColumnSchema(name='column', tags=set(), properties={}, dtype=dtype('float64'), is_list=True, is_ragged=True)

def __post_init__(self):
    """Standardize tags and dtypes on initialization

    Raises:
        TypeError: If the provided dtype cannot be cast to a numpy dtype
    """
    tags = TagSet(self.tags)
    object.__setattr__(self, "tags", tags)

    try:
        if hasattr(self.dtype, "numpy_dtype"):
            dtype = np.dtype(self.dtype.numpy_dtype)
        elif hasattr(self.dtype, "_categories"):
            dtype = self.dtype._categories.dtype
        elif isinstance(self.dtype, pd.StringDtype):
            dtype = np.dtype("O")
        else:
            dtype = np.dtype(self.dtype)
    except TypeError as err:
        raise TypeError(
            f"Unsupported dtype {self.dtype}, unable to cast {self.dtype} to a numpy dtype."
        ) from err

    object.__setattr__(self, "dtype", dtype)

    value_count = self.properties.get("value_count", {"min": 0, "max": 0})

    if self.is_list is None:
        if value_count["max"] > 0:
            object.__setattr__(self, "is_list", True)
        else:
            object.__setattr__(self, "is_list", False)

    if self.is_ragged is None:
        if value_count["max"] > value_count["min"]:
            object.__setattr__(self, "is_ragged", True)
        else:
            object.__setattr__(self, "is_ragged", False)

    if self.is_ragged and not self.is_list:
        raise ValueError(
            "`is_ragged` is set to `True` but `is_list` is not. "
            "Only list columns can set the `is_ragged` flag."
        )

    if self.is_ragged and value_count["max"] == value_count["min"]:
      raise ValueError(
            "`is_ragged` is set to `True` but `value_count.min` == `value_count.max`. "
            "If value_count min/max are equal. "
            "This is a fixed size list and `is_ragged` should be set to False. "
        )

E ValueError: is_ragged is set to True but value_count.min == value_count.max. When the min and max value counts are equal, the column is a fixed-size list, so is_ragged should be set to False.

merlin/schema/schema.py:99: ValueError
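
All of the test_string_datatypes failures follow the same path: `infer_schema` passes `is_ragged=is_list` explicitly while `properties` carries no `value_count`, so `__post_init__` falls back to the `{"min": 0, "max": 0}` default and the `min == max` guard fires even though no real value counts were ever supplied. A minimal repro sketch (assuming `ColumnSchema` is importable from `merlin.schema`, as the tests do):

```python
import numpy as np
from merlin.schema import ColumnSchema

# Mirrors the call made in merlin/io/dataset.py's infer_schema:
# is_ragged is passed explicitly, but properties has no value_count,
# so __post_init__ sees the {"min": 0, "max": 0} default and raises
# the "fixed-size list" ValueError shown above.
ColumnSchema("column", dtype=np.float64, is_list=True, is_ragged=True)
```

Restricting the guard to `value_count["max"] > 0` (i.e. only firing when a real value count is present) would be one way to keep the validation without rejecting this call.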
______________________ test_dtype_column_schema[float32] _______________________

d_types = <class 'numpy.float32'>

@pytest.mark.parametrize("d_types", [numpy.float32, numpy.float64, numpy.uint32, numpy.uint64])
def test_dtype_column_schema(d_types):
  column = ColumnSchema("name", tags=[], properties=[], dtype=d_types)

tests/unit/schema/test_column_schemas.py:26:


<string>:9: in __init__
???


self = ColumnSchema(name='name', tags=set(), properties=[], dtype=dtype('float32'), is_list=None, is_ragged=None)

def __post_init__(self):
    """Standardize tags and dtypes on initialization

    Raises:
        TypeError: If the provided dtype cannot be cast to a numpy dtype
    """
    tags = TagSet(self.tags)
    object.__setattr__(self, "tags", tags)

    try:
        if hasattr(self.dtype, "numpy_dtype"):
            dtype = np.dtype(self.dtype.numpy_dtype)
        elif hasattr(self.dtype, "_categories"):
            dtype = self.dtype._categories.dtype
        elif isinstance(self.dtype, pd.StringDtype):
            dtype = np.dtype("O")
        else:
            dtype = np.dtype(self.dtype)
    except TypeError as err:
        raise TypeError(
            f"Unsupported dtype {self.dtype}, unable to cast {self.dtype} to a numpy dtype."
        ) from err

    object.__setattr__(self, "dtype", dtype)
  value_count = self.properties.get("value_count", {"min": 0, "max": 0})

E AttributeError: 'list' object has no attribute 'get'

merlin/schema/schema.py:78: AttributeError
______________________ test_dtype_column_schema[float64] _______________________

d_types = <class 'numpy.float64'>

@pytest.mark.parametrize("d_types", [numpy.float32, numpy.float64, numpy.uint32, numpy.uint64])
def test_dtype_column_schema(d_types):
  column = ColumnSchema("name", tags=[], properties=[], dtype=d_types)

tests/unit/schema/test_column_schemas.py:26:


<string>:9: in __init__
???


self = ColumnSchema(name='name', tags=set(), properties=[], dtype=dtype('float64'), is_list=None, is_ragged=None)

def __post_init__(self):
    """Standardize tags and dtypes on initialization

    Raises:
        TypeError: If the provided dtype cannot be cast to a numpy dtype
    """
    tags = TagSet(self.tags)
    object.__setattr__(self, "tags", tags)

    try:
        if hasattr(self.dtype, "numpy_dtype"):
            dtype = np.dtype(self.dtype.numpy_dtype)
        elif hasattr(self.dtype, "_categories"):
            dtype = self.dtype._categories.dtype
        elif isinstance(self.dtype, pd.StringDtype):
            dtype = np.dtype("O")
        else:
            dtype = np.dtype(self.dtype)
    except TypeError as err:
        raise TypeError(
            f"Unsupported dtype {self.dtype}, unable to cast {self.dtype} to a numpy dtype."
        ) from err

    object.__setattr__(self, "dtype", dtype)
  value_count = self.properties.get("value_count", {"min": 0, "max": 0})

E AttributeError: 'list' object has no attribute 'get'

merlin/schema/schema.py:78: AttributeError
_______________________ test_dtype_column_schema[uint32] _______________________

d_types = <class 'numpy.uint32'>

@pytest.mark.parametrize("d_types", [numpy.float32, numpy.float64, numpy.uint32, numpy.uint64])
def test_dtype_column_schema(d_types):
  column = ColumnSchema("name", tags=[], properties=[], dtype=d_types)

tests/unit/schema/test_column_schemas.py:26:


<string>:9: in __init__
???


self = ColumnSchema(name='name', tags=set(), properties=[], dtype=dtype('uint32'), is_list=None, is_ragged=None)

def __post_init__(self):
    """Standardize tags and dtypes on initialization

    Raises:
        TypeError: If the provided dtype cannot be cast to a numpy dtype
    """
    tags = TagSet(self.tags)
    object.__setattr__(self, "tags", tags)

    try:
        if hasattr(self.dtype, "numpy_dtype"):
            dtype = np.dtype(self.dtype.numpy_dtype)
        elif hasattr(self.dtype, "_categories"):
            dtype = self.dtype._categories.dtype
        elif isinstance(self.dtype, pd.StringDtype):
            dtype = np.dtype("O")
        else:
            dtype = np.dtype(self.dtype)
    except TypeError as err:
        raise TypeError(
            f"Unsupported dtype {self.dtype}, unable to cast {self.dtype} to a numpy dtype."
        ) from err

    object.__setattr__(self, "dtype", dtype)
  value_count = self.properties.get("value_count", {"min": 0, "max": 0})

E AttributeError: 'list' object has no attribute 'get'

merlin/schema/schema.py:78: AttributeError
_______________________ test_dtype_column_schema[uint64] _______________________

d_types = <class 'numpy.uint64'>

@pytest.mark.parametrize("d_types", [numpy.float32, numpy.float64, numpy.uint32, numpy.uint64])
def test_dtype_column_schema(d_types):
  column = ColumnSchema("name", tags=[], properties=[], dtype=d_types)

tests/unit/schema/test_column_schemas.py:26:


<string>:9: in __init__
???


self = ColumnSchema(name='name', tags=set(), properties=[], dtype=dtype('uint64'), is_list=None, is_ragged=None)

def __post_init__(self):
    """Standardize tags and dtypes on initialization

    Raises:
        TypeError: If the provided dtype cannot be cast to a numpy dtype
    """
    tags = TagSet(self.tags)
    object.__setattr__(self, "tags", tags)

    try:
        if hasattr(self.dtype, "numpy_dtype"):
            dtype = np.dtype(self.dtype.numpy_dtype)
        elif hasattr(self.dtype, "_categories"):
            dtype = self.dtype._categories.dtype
        elif isinstance(self.dtype, pd.StringDtype):
            dtype = np.dtype("O")
        else:
            dtype = np.dtype(self.dtype)
    except TypeError as err:
        raise TypeError(
            f"Unsupported dtype {self.dtype}, unable to cast {self.dtype} to a numpy dtype."
        ) from err

    object.__setattr__(self, "dtype", dtype)
  value_count = self.properties.get("value_count", {"min": 0, "max": 0})

E AttributeError: 'list' object has no attribute 'get'

merlin/schema/schema.py:78: AttributeError
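
The four test_dtype_column_schema failures come from passing `properties=[]` (a list) where `__post_init__` assumes a dict, so the `.get` lookup blows up. One defensive option, sketched here rather than taken from the merged change, is to normalize `properties` before reading `value_count`; `_normalize_properties` below is a hypothetical helper name:

```python
from typing import Any, Dict

def _normalize_properties(properties: Any) -> Dict[str, Any]:
    """Hypothetical helper: accept only mappings for `properties`;
    anything else (e.g. the empty list passed by the failing tests)
    is treated as "no properties" instead of crashing on .get()."""
    return dict(properties) if isinstance(properties, dict) else {}

# The value_count lookup then succeeds regardless of the input shape:
value_count = _normalize_properties([]).get("value_count", {"min": 0, "max": 0})
assert value_count == {"min": 0, "max": 0}
```

Raising a clear TypeError for non-mapping inputs would be an equally reasonable choice; the point is that the failure should not surface as an AttributeError deep inside `__post_init__`.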
_________________________ test_list_column_attributes __________________________

def test_list_column_attributes():
    col0_schema = ColumnSchema("col0")

    assert not col0_schema.is_list
    assert not col0_schema.is_ragged
    assert col0_schema.quantity == ColumnQuantity.SCALAR

    col1_schema = ColumnSchema("col1", is_list=False, is_ragged=False)

    assert not col1_schema.is_list
    assert not col1_schema.is_ragged
    assert col1_schema.quantity == ColumnQuantity.SCALAR

    col2_schema = ColumnSchema("col2", is_list=True)

    assert col2_schema.is_list
  assert col2_schema.is_ragged

E AssertionError: assert False
E + where False = ColumnSchema(name='col2', tags=set(), properties={}, dtype=dtype('float64'), is_list=True, is_ragged=False).is_ragged

tests/unit/schema/test_column_schemas.py:183: AssertionError
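
test_list_column_attributes expects `ColumnSchema("col2", is_list=True)` to default `is_ragged` to `True`, but the inference shown above derives `is_ragged` from `value_count` alone, and the default `value_count` has equal min/max. An illustrative reordering that satisfies the test (a sketch, not necessarily the merged logic):

```python
def infer_is_ragged(is_list: bool, value_count: dict) -> bool:
    """Sketch: default is_ragged from is_list as well as value_count.
    A list column is assumed ragged unless value_count pins it to a
    fixed size (min == max > 0); scalar columns are never ragged."""
    if value_count["max"] > 0 and value_count["min"] == value_count["max"]:
        return False  # fixed-size list
    return bool(is_list)

assert infer_is_ragged(True, {"min": 0, "max": 0}) is True   # col2's case
assert infer_is_ragged(True, {"min": 2, "max": 2}) is False  # fixed-size list
assert infer_is_ragged(False, {"min": 0, "max": 0}) is False # scalar
```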
______________________ test_tensorflow_metadata_from_json ______________________

def test_tensorflow_metadata_from_json():
    # make sure we can load up tensorflowmetadata serialized json objects, like done by
    # merlin-models
    json_schema = """{"feature": [
    {
      "name": "categories",
      "valueCount": {
        "min": "1",
        "max": "4"
      },
      "type": "INT",
      "intDomain": {
        "name": "categories",
        "min": "1",
        "max": "331",
        "isCategorical": true
      },
      "annotation": {
        "tag": [
          "item"
        ]
      }
    }]}
    """

    schema = TensorflowMetadata.from_json(json_schema).to_merlin_schema()
    column_schema = schema.column_schemas["categories"]

    # make sure the value_count is set appropriately
    assert column_schema.properties["value_count"] == {"min": 1, "max": 4}
  assert column_schema.is_list

E AssertionError: assert False
E + where False = ColumnSchema(name='categories', tags={<Tags.ITEM: 'item'>, <Tags.CATEGORICAL: 'categorical'>}, properties={'domain': {...331, 'name': 'categories'}, 'value_count': {'min': 1, 'max': 4}}, dtype=dtype('int64'), is_list=False, is_ragged=False).is_list

tests/unit/schema/test_schema_io.py:218: AssertionError
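
test_tensorflow_metadata_from_json shows the deserialization side of the same issue: `value_count` round-trips correctly as `{"min": 1, "max": 4}`, but `is_list`/`is_ragged` come back `False` because they are not re-derived from the restored property. Applying the inference rules already present in `__post_init__` to the deserialized `value_count` yields the flags the test expects:

```python
# Values taken from the failing test's value_count; the comparisons are
# the same ones __post_init__ uses when is_list/is_ragged are None.
value_count = {"min": 1, "max": 4}
is_list = value_count["max"] > 0                      # -> True
is_ragged = value_count["max"] > value_count["min"]   # -> True
assert (is_list, is_ragged) == (True, True)
```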
=============================== warnings summary ===============================
tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 6 files. Did not have enough partitions to create 7 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 8 files. Did not have enough partitions to create 9 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 8 files. Did not have enough partitions to create 10 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 8 files. Did not have enough partitions to create 11 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 12 files. Did not have enough partitions to create 13 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 12 files. Did not have enough partitions to create 14 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 12 files. Did not have enough partitions to create 15 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 12 files. Did not have enough partitions to create 16 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 12 files. Did not have enough partitions to create 17 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 12 files. Did not have enough partitions to create 18 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 12 files. Did not have enough partitions to create 19 files.
warnings.warn(

tests/unit/io/test_io.py::test_validate_and_regenerate_dataset
/var/jenkins_home/workspace/merlin_core/core/merlin/io/parquet.py:551: DeprecationWarning: 'ParquetDataset.pieces' attribute is deprecated as of pyarrow 5.0.0 and will be removed in a future version. Specify 'use_legacy_dataset=False' while constructing the ParquetDataset, and then use the '.fragments' attribute instead.
paths = [p.path for p in pa_dataset.pieces]

tests/unit/schema/test_column_schemas.py::test_column_schema_tags_normalize
tests/unit/schema/test_tags.py::test_tagset_atomizes_compound_tags
/var/jenkins_home/workspace/merlin_core/core/merlin/schema/tags.py:148: UserWarning: Compound tags like Tags.ITEM_ID have been deprecated and will be removed in a future version. Please use the atomic versions of these tags, like [<Tags.ITEM: 'item'>, <Tags.ID: 'id'>].
warnings.warn(

tests/unit/schema/test_schema_io.py::test_json_serialization_with_embedded_dicts
tests/unit/schema/test_schema_io.py::test_merlin_to_proto_to_json_to_merlin
tests/unit/schema/test_tags.py::test_tagset_atomizes_compound_tags
/var/jenkins_home/workspace/merlin_core/core/merlin/schema/tags.py:148: UserWarning: Compound tags like Tags.USER_ID have been deprecated and will be removed in a future version. Please use the atomic versions of these tags, like [<Tags.USER: 'user'>, <Tags.ID: 'id'>].
warnings.warn(

tests/unit/schema/test_tags.py::test_tagset_atomizes_compound_tags
/var/jenkins_home/workspace/merlin_core/core/merlin/schema/tags.py:148: UserWarning: Compound tags like Tags.SESSION_ID have been deprecated and will be removed in a future version. Please use the atomic versions of these tags, like [<Tags.SESSION: 'session'>, <Tags.ID: 'id'>].
warnings.warn(

tests/unit/schema/test_tags.py::test_tagset_atomizes_compound_tags
/var/jenkins_home/workspace/merlin_core/core/merlin/schema/tags.py:148: UserWarning: Compound tags like Tags.TEXT_TOKENIZED have been deprecated and will be removed in a future version. Please use the atomic versions of these tags, like [<Tags.TEXT: 'text'>, <Tags.TOKENIZED: 'tokenized'>].
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed[True-True]
/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 41273 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed[True-False]
/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 40839 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed[False-True]
/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 35229 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed[False-False]
/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 46001 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed_force[True]
/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 45573 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed_force[False]
/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 45757 instead
warnings.warn(

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html

---------- coverage: platform linux, python 3.8.10-final-0 -----------
Name Stmts Miss Cover

merlin/core/__init__.py 2 0 100%
merlin/core/_version.py 354 205 42%
merlin/core/compat.py 10 4 60%
merlin/core/dispatch.py 356 212 40%
merlin/core/protocols.py 99 45 55%
merlin/core/utils.py 195 56 71%
merlin/dag/__init__.py 5 0 100%
merlin/dag/base_operator.py 122 20 84%
merlin/dag/dictarray.py 55 15 73%
merlin/dag/executors.py 141 68 52%
merlin/dag/graph.py 99 35 65%
merlin/dag/node.py 344 161 53%
merlin/dag/ops/__init__.py 4 0 100%
merlin/dag/ops/concat_columns.py 17 4 76%
merlin/dag/ops/selection.py 22 0 100%
merlin/dag/ops/subset_columns.py 12 4 67%
merlin/dag/ops/subtraction.py 21 11 48%
merlin/dag/selector.py 101 6 94%
merlin/io/__init__.py 4 0 100%
merlin/io/avro.py 88 88 0%
merlin/io/csv.py 57 6 89%
merlin/io/dask.py 181 53 71%
merlin/io/dataframe_engine.py 61 5 92%
merlin/io/dataframe_iter.py 21 1 95%
merlin/io/dataset.py 347 54 84%
merlin/io/dataset_engine.py 37 8 78%
merlin/io/fsspec_utils.py 127 108 15%
merlin/io/hugectr.py 45 35 22%
merlin/io/parquet.py 603 69 89%
merlin/io/shuffle.py 38 12 68%
merlin/io/worker.py 80 66 18%
merlin/io/writer.py 190 52 73%
merlin/io/writer_factory.py 18 4 78%
merlin/schema/__init__.py 2 0 100%
merlin/schema/io/__init__.py 0 0 100%
merlin/schema/io/proto_utils.py 20 4 80%
merlin/schema/io/schema_bp.py 306 5 98%
merlin/schema/io/tensorflow_metadata.py 183 17 91%
merlin/schema/schema.py 218 36 83%
merlin/schema/tags.py 82 0 100%

TOTAL 4667 1469 69%

=========================== short test summary info ============================
SKIPPED [1] tests/unit/io/test_avro.py:34: could not import 'uavro': No module named 'uavro'
====== 12 failed, 379 passed, 1 skipped, 25 warnings in 65.28s (0:01:05) =======
ERROR: InvocationError for command /var/jenkins_home/workspace/merlin_core/core/.tox/test-gpu/bin/python -m pytest --cov-report term --cov merlin -rxs tests/unit (exited with code 1)
___________________________________ summary ____________________________________
ERROR: test-gpu: commands failed
Build step 'Execute shell' marked build as failure
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA-Merlin/core/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[merlin_core] $ /bin/bash /tmp/jenkins18216793390002291021.sh

It would be nice if the pre-commit hook versions matched the versions of the same tools used in CI.
@nvidia-merlin-bot

Click to view CI Results
GitHub pull request #111 of commit 6610d7f31c395fe2e223ea6748ff1432066f3c14, no merge conflicts.
Running as SYSTEM
Setting status of 6610d7f31c395fe2e223ea6748ff1432066f3c14 to PENDING with url http://merlin-infra1.nvidia.com:8080/job/merlin_core/292/ and message: 'Pending'
Using context: Jenkins
Building on master in workspace /var/jenkins_home/workspace/merlin_core
using credential ce87ff3c-94f0-400a-8303-cb4acb4918b5
 > git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > git config remote.origin.url /~https://github.com/NVIDIA-Merlin/core # timeout=10
Fetching upstream changes from /~https://github.com/NVIDIA-Merlin/core
 > git --version # timeout=10
using GIT_ASKPASS to set credentials login for merlin-systems username and pass
 > git fetch --tags --force --progress -- /~https://github.com/NVIDIA-Merlin/core +refs/pull/111/*:refs/remotes/origin/pr/111/* # timeout=10
 > git rev-parse 6610d7f31c395fe2e223ea6748ff1432066f3c14^{commit} # timeout=10
Checking out Revision 6610d7f31c395fe2e223ea6748ff1432066f3c14 (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 6610d7f31c395fe2e223ea6748ff1432066f3c14 # timeout=10
Commit message: "Update formatting."
 > git rev-list --no-walk d5a0ca7d749b703b3ae3b21887c380b87131b060 # timeout=10
[merlin_core] $ /bin/bash /tmp/jenkins13047363149763558308.sh
GLOB sdist-make: /var/jenkins_home/workspace/merlin_core/core/setup.py
test-gpu recreate: /var/jenkins_home/workspace/merlin_core/core/.tox/test-gpu
test-gpu installdeps: pytest, pytest-cov
WARNING: Discarding $PYTHONPATH from environment, to override specify PYTHONPATH in 'passenv' in your configuration.
test-gpu inst: /var/jenkins_home/workspace/merlin_core/core/.tox/.tmp/package/1/merlin-core-0.8.0+20.g6610d7f.zip
WARNING: Discarding $PYTHONPATH from environment, to override specify PYTHONPATH in 'passenv' in your configuration.
test-gpu installed: absl-py==1.2.0,aiohttp==3.8.1,aiosignal==1.2.0,alabaster==0.7.12,alembic==1.8.1,anyio==3.6.1,argon2-cffi==21.3.0,argon2-cffi-bindings==21.2.0,astroid==2.5.6,asttokens==2.0.8,astunparse==1.6.3,asv==0.5.1,asvdb==0.4.2,async-timeout==4.0.2,attrs==22.1.0,autopage==0.5.1,awscli==1.27.12,Babel==2.10.3,backcall==0.2.0,beautifulsoup4==4.11.1,betterproto==1.2.5,black==22.6.0,bleach==5.0.1,boto3==1.24.75,botocore==1.29.12,Brotli==1.0.9,cachetools==5.2.0,certifi==2019.11.28,cffi==1.15.1,chardet==3.0.4,charset-normalizer==2.1.1,clang==5.0,click==8.1.3,cliff==4.0.0,cloudpickle==2.2.0,cmaes==0.9.0,cmake==3.24.1.1,cmd2==2.4.2,colorama==0.4.4,colorlog==6.7.0,contourpy==1.0.5,coverage==6.5.0,cuda-python==11.7.1,cupy-cuda117==10.6.0,cycler==0.11.0,Cython==0.29.32,dask==2022.1.1,dbus-python==1.2.16,debugpy==1.6.3,decorator==5.1.1,defusedxml==0.7.1,dill==0.3.5.1,distlib==0.3.6,distributed==2022.5.1,distro==1.7.0,dm-tree==0.1.6,docker-pycreds==0.4.0,docutils==0.16,emoji==1.7.0,entrypoints==0.4,execnet==1.9.0,executing==1.0.0,faiss==1.7.2,faiss-gpu==1.7.2,fastai==2.7.9,fastapi==0.85.0,fastavro==1.6.1,fastcore==1.5.27,fastdownload==0.0.7,fastjsonschema==2.16.1,fastprogress==1.0.3,fastrlock==0.8,feast==0.19.4,fiddle==0.2.2,filelock==3.8.0,flatbuffers==1.12,fonttools==4.37.3,frozenlist==1.3.1,fsspec==2022.5.0,gast==0.4.0,gevent==21.12.0,geventhttpclient==2.0.2,gitdb==4.0.9,GitPython==3.1.27,google==3.0.0,google-api-core==2.10.1,google-auth==2.11.1,google-auth-oauthlib==0.4.6,google-pasta==0.2.0,googleapis-common-protos==1.52.0,graphviz==0.20.1,greenlet==1.1.3,grpcio==1.41.0,grpcio-channelz==1.49.0,grpcio-reflection==1.48.1,grpclib==0.4.3,h11==0.13.0,h2==4.1.0,h5py==3.7.0,HeapDict==1.0.1,horovod==0.26.1,hpack==4.0.0,httptools==0.5.0,hugectr2onnx==0.0.0,huggingface-hub==0.9.1,hyperframe==6.0.1,idna==2.8,imagesize==1.4.1,implicit==0.6.1,importlib-metadata==4.12.0,importlib-resources==5.9.0,iniconfig==1.1.1,ipykernel==6.15.3,ipython==8.5.0,ipython-genutils==0.2.0,ipywidgets==7.7.0,jedi==0.18.1,Jinja2==3.1.2,jmespath==1.0.1,joblib==1.2.0,json5==0.9.10,jsonschema==4.16.0,jupyter-cache==0.4.3,jupyter-core==4.11.1,jupyter-server==1.18.1,jupyter-server-mathjax==0.2.5,jupyter-sphinx==0.3.2,jupyter_client==7.3.5,jupyterlab==3.4.7,jupyterlab-pygments==0.2.2,jupyterlab-widgets==1.1.0,jupyterlab_server==2.15.1,keras==2.9.0,Keras-Preprocessing==1.1.2,kiwisolver==1.4.4,lazy-object-proxy==1.8.0,libclang==14.0.6,libcst==0.4.7,lightfm==1.16,lightgbm==3.3.2,linkify-it-py==1.0.3,llvmlite==0.39.1,locket==1.0.0,lxml==4.9.1,Mako==1.2.4,Markdown==3.4.1,markdown-it-py==1.1.0,MarkupSafe==2.1.1,matplotlib==3.6.0,matplotlib-inline==0.1.6,mdit-py-plugins==0.2.8,merlin-core==0.8.0+20.g6610d7f,merlin-models==0.7.0+11.g280956aa4,merlin-systems==0.5.0+4.g15074ad,mistune==2.0.4,mmh3==3.0.0,mpi4py==3.1.3,msgpack==1.0.4,multidict==6.0.2,mypy-extensions==0.4.3,myst-nb==0.13.2,myst-parser==0.15.2,natsort==8.1.0,nbclassic==0.4.3,nbclient==0.6.8,nbconvert==7.0.0,nbdime==3.1.1,nbformat==5.5.0,nest-asyncio==1.5.5,ninja==1.10.2.3,notebook==6.4.12,notebook-shim==0.1.0,numba==0.56.2,numpy==1.22.4,nvidia-pyindex==1.0.9,# Editable install with no version control (nvtabular==1.4.0+8.g95e12d347),-e 
/usr/local/lib/python3.8/dist-packages,nvtx==0.2.5,oauthlib==3.2.1,oldest-supported-numpy==2022.8.16,onnx==1.12.0,onnxruntime==1.11.1,opt-einsum==3.3.0,optuna==3.0.3,packaging==21.3,pandas==1.3.5,pandavro==1.5.2,pandocfilters==1.5.0,parso==0.8.3,partd==1.3.0,pathtools==0.1.2,pbr==5.11.0,pexpect==4.8.0,pickleshare==0.7.5,Pillow==9.2.0,pkgutil_resolve_name==1.3.10,platformdirs==2.5.2,plotly==5.11.0,pluggy==1.0.0,prettytable==3.5.0,prometheus-client==0.14.1,promise==2.3,prompt-toolkit==3.0.31,proto-plus==1.19.6,protobuf==3.19.5,psutil==5.9.2,ptyprocess==0.7.0,pure-eval==0.2.2,py==1.11.0,pyarrow==7.0.0,pyasn1==0.4.8,pyasn1-modules==0.2.8,pybind11==2.10.0,pycparser==2.21,pydantic==1.10.2,pydot==1.4.2,Pygments==2.13.0,PyGObject==3.36.0,pynvml==11.4.1,pyparsing==3.0.9,pyperclip==1.8.2,pyrsistent==0.18.1,pytest==7.1.3,pytest-cov==4.0.0,pytest-xdist==3.0.2,python-apt==2.0.0+ubuntu0.20.4.8,python-dateutil==2.8.2,python-dotenv==0.21.0,python-rapidjson==1.8,pytz==2022.2.1,PyYAML==5.4.1,pyzmq==24.0.0,regex==2022.9.13,requests==2.22.0,requests-oauthlib==1.3.1,requests-unixsocket==0.2.0,rsa==4.7.2,s3fs==2022.2.0,s3transfer==0.6.0,sacremoses==0.0.53,scikit-build==0.15.0,scikit-learn==1.1.2,scipy==1.8.1,seedir==0.3.0,Send2Trash==1.8.0,sentry-sdk==1.9.8,setproctitle==1.3.2,setuptools-scm==7.0.5,shortuuid==1.0.9,six==1.15.0,sklearn==0.0,smmap==5.0.0,sniffio==1.3.0,snowballstemmer==2.2.0,sortedcontainers==2.4.0,soupsieve==2.3.2.post1,Sphinx==5.3.0,sphinx-multiversion==0.2.4,sphinx-togglebutton==0.3.1,sphinx_external_toc==0.3.0,sphinxcontrib-applehelp==1.0.2,sphinxcontrib-copydirs @ git+/~https://github.com/mikemckiernan/sphinxcontrib-copydirs.git@bd8c5d79b3f91cf5f1bb0d6995aeca3fe84b670e,sphinxcontrib-devhelp==1.0.2,sphinxcontrib-htmlhelp==2.0.0,sphinxcontrib-jsmath==1.0.1,sphinxcontrib-qthelp==1.0.3,sphinxcontrib-serializinghtml==1.1.5,SQLAlchemy==1.4.44,stack-data==0.5.0,starlette==0.20.4,stevedore==4.1.1,stringcase==1.2.0,supervisor==4.1.0,tabulate==0.8.10,tblib==1.7.0,tdqm==0.0.1,tenacity==8.0.1,tensorboard==2.9.1,tensorboard-data-server==0.6.1,tensorboard-plugin-wit==1.8.1,tensorflow==2.9.2,tensorflow-estimator==2.9.0,tensorflow-gpu==2.9.2,tensorflow-io-gcs-filesystem==0.27.0,tensorflow-metadata==1.10.0,termcolor==2.0.1,terminado==0.15.0,testbook==0.4.2,threadpoolctl==3.1.0,tinycss2==1.1.1,tokenizers==0.10.3,toml==0.10.2,tomli==2.0.1,toolz==0.12.0,torch==1.12.1+cu113,torchmetrics==0.3.2,tornado==6.2,tox==3.26.0,tqdm==4.64.1,traitlets==5.4.0,transformers==4.12.0,transformers4rec==0.1.12+2.gbcc939255,treelite==2.3.0,treelite-runtime==2.3.0,tritonclient==2.25.0,typing-inspect==0.8.0,typing_extensions==4.3.0,uc-micro-py==1.0.1,urllib3==1.26.12,uvicorn==0.18.3,uvloop==0.17.0,versioneer==0.20,virtualenv==20.16.5,wandb==0.13.3,watchfiles==0.17.0,wcwidth==0.2.5,webencodings==0.5.1,websocket-client==1.4.1,websockets==10.3,Werkzeug==2.2.2,widgetsnbextension==3.6.0,wrapt==1.12.1,xgboost==1.6.2,yarl==1.8.1,zict==2.2.0,zipp==3.8.1,zope.event==4.5.0,zope.interface==5.4.0
test-gpu run-test-pre: PYTHONHASHSEED='2747429696'
test-gpu run-test: commands[0] | python -m pytest --cov-report term --cov merlin -rxs tests/unit
============================= test session starts ==============================
platform linux -- Python 3.8.10, pytest-7.1.3, pluggy-1.0.0
cachedir: .tox/test-gpu/.pytest_cache
rootdir: /var/jenkins_home/workspace/merlin_core/core, configfile: pyproject.toml
plugins: anyio-3.5.0, xdist-3.0.2, cov-4.0.0
collected 391 items / 1 skipped

tests/unit/core/test_dispatch.py .... [ 1%]
tests/unit/core/test_protocols.py ......... [ 3%]
tests/unit/core/test_version.py . [ 3%]
tests/unit/dag/test_base_operator.py .... [ 4%]
tests/unit/dag/test_column_selector.py .............................. [ 12%]
tests/unit/dag/test_dictarray.py ... [ 13%]
tests/unit/dag/test_executors.py .. [ 13%]
tests/unit/dag/test_graph.py .... [ 14%]
tests/unit/dag/ops/test_selection.py .... [ 15%]
tests/unit/io/test_io.py ...FFFFFF...................................... [ 27%]
................................................................ [ 43%]
tests/unit/schema/test_column_schemas.py FFFF..................F. [ 50%]
tests/unit/schema/test_schema.py ............. [ 53%]
tests/unit/schema/test_schema_io.py .................................... [ 62%]
........................................................................ [ 81%]
..........................................................F [ 96%]
tests/unit/schema/test_tags.py ....... [ 97%]
tests/unit/utils/test_utils.py ........ [100%]

=================================== FAILURES ===================================
_______________________ test_string_datatypes[None-csv] ________________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-4/test_string_datatypes_None_csv0')
engine = 'csv', cpu = None

@pytest.mark.parametrize("engine", ["csv", "parquet", "csv-no-header"])
@pytest.mark.parametrize("cpu", [None, True])
def test_string_datatypes(tmpdir, engine, cpu):
    df_lib = dispatch.get_lib()
    df = df_lib.DataFrame({"column": [[0.1, 0.2]]})
  dataset = merlin.io.Dataset(df)

tests/unit/io/test_io.py:89:


merlin/io/dataset.py:354: in __init__
self.infer_schema()
merlin/io/dataset.py:1140: in infer_schema
col_schema = ColumnSchema(column, dtype=dtype_val, is_list=is_list, is_ragged=is_list)
<string>:9: in __init__
???


self = ColumnSchema(name='column', tags=set(), properties={}, dtype=dtype('float64'), is_list=True, is_ragged=True)

def __post_init__(self):
    """Standardize tags and dtypes on initialization

    Raises:
        TypeError: If the provided dtype cannot be cast to a numpy dtype
    """
    tags = TagSet(self.tags)
    object.__setattr__(self, "tags", tags)

    try:
        if hasattr(self.dtype, "numpy_dtype"):
            dtype = np.dtype(self.dtype.numpy_dtype)
        elif hasattr(self.dtype, "_categories"):
            dtype = self.dtype._categories.dtype
        elif isinstance(self.dtype, pd.StringDtype):
            dtype = np.dtype("O")
        else:
            dtype = np.dtype(self.dtype)
    except TypeError as err:
        raise TypeError(
            f"Unsupported dtype {self.dtype}, unable to cast {self.dtype} to a numpy dtype."
        ) from err

    object.__setattr__(self, "dtype", dtype)

    value_count = self.properties.get("value_count", {"min": 0, "max": 0})

    if self.is_list is None:
        if value_count["max"] > 0:
            object.__setattr__(self, "is_list", True)
        else:
            object.__setattr__(self, "is_list", False)

    if self.is_ragged is None:
        if value_count["max"] > value_count["min"]:
            object.__setattr__(self, "is_ragged", True)
        else:
            object.__setattr__(self, "is_ragged", False)

    if self.is_ragged and not self.is_list:
        raise ValueError(
            "`is_ragged` is set to `True` but `is_list` is not. "
            "Only list columns can set the `is_ragged` flag."
        )

    if self.is_ragged and value_count["max"] == value_count["min"]:
      raise ValueError(
            "`is_ragged` is set to `True` but `value_count.min` == `value_count.max`. "
            "If value_count min/max are equal. "
            "This is a fixed size list and `is_ragged` should be set to False. "
        )

E ValueError: is_ragged is set to True but value_count.min == value_count.max. When the min and max value counts are equal, the column is a fixed-size list, so is_ragged should be set to False.

merlin/schema/schema.py:99: ValueError
_____________________ test_string_datatypes[None-parquet] ______________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-4/test_string_datatypes_None_par0')
engine = 'parquet', cpu = None

@pytest.mark.parametrize("engine", ["csv", "parquet", "csv-no-header"])
@pytest.mark.parametrize("cpu", [None, True])
def test_string_datatypes(tmpdir, engine, cpu):
    df_lib = dispatch.get_lib()
    df = df_lib.DataFrame({"column": [[0.1, 0.2]]})
  dataset = merlin.io.Dataset(df)

tests/unit/io/test_io.py:89:


merlin/io/dataset.py:354: in __init__
self.infer_schema()
merlin/io/dataset.py:1140: in infer_schema
col_schema = ColumnSchema(column, dtype=dtype_val, is_list=is_list, is_ragged=is_list)
<string>:9: in __init__
???


self = ColumnSchema(name='column', tags=set(), properties={}, dtype=dtype('float64'), is_list=True, is_ragged=True)

def __post_init__(self):
    """Standardize tags and dtypes on initialization

    Raises:
        TypeError: If the provided dtype cannot be cast to a numpy dtype
    """
    tags = TagSet(self.tags)
    object.__setattr__(self, "tags", tags)

    try:
        if hasattr(self.dtype, "numpy_dtype"):
            dtype = np.dtype(self.dtype.numpy_dtype)
        elif hasattr(self.dtype, "_categories"):
            dtype = self.dtype._categories.dtype
        elif isinstance(self.dtype, pd.StringDtype):
            dtype = np.dtype("O")
        else:
            dtype = np.dtype(self.dtype)
    except TypeError as err:
        raise TypeError(
            f"Unsupported dtype {self.dtype}, unable to cast {self.dtype} to a numpy dtype."
        ) from err

    object.__setattr__(self, "dtype", dtype)

    value_count = self.properties.get("value_count", {"min": 0, "max": 0})

    if self.is_list is None:
        if value_count["max"] > 0:
            object.__setattr__(self, "is_list", True)
        else:
            object.__setattr__(self, "is_list", False)

    if self.is_ragged is None:
        if value_count["max"] > value_count["min"]:
            object.__setattr__(self, "is_ragged", True)
        else:
            object.__setattr__(self, "is_ragged", False)

    if self.is_ragged and not self.is_list:
        raise ValueError(
            "`is_ragged` is set to `True` but `is_list` is not. "
            "Only list columns can set the `is_ragged` flag."
        )

    if self.is_ragged and value_count["max"] == value_count["min"]:
      raise ValueError(
            "`is_ragged` is set to `True` but `value_count.min` == `value_count.max`. "
            "If value_count min/max are equal. "
            "This is a fixed size list and `is_ragged` should be set to False. "
        )

E ValueError: is_ragged is set to True but value_count.min == value_count.max. When the min and max value counts are equal, the column is a fixed-size list, so is_ragged should be set to False.

merlin/schema/schema.py:99: ValueError
__________________ test_string_datatypes[None-csv-no-header] ___________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-4/test_string_datatypes_None_csv1')
engine = 'csv-no-header', cpu = None

@pytest.mark.parametrize("engine", ["csv", "parquet", "csv-no-header"])
@pytest.mark.parametrize("cpu", [None, True])
def test_string_datatypes(tmpdir, engine, cpu):
    df_lib = dispatch.get_lib()
    df = df_lib.DataFrame({"column": [[0.1, 0.2]]})
  dataset = merlin.io.Dataset(df)

tests/unit/io/test_io.py:89:


merlin/io/dataset.py:354: in __init__
self.infer_schema()
merlin/io/dataset.py:1140: in infer_schema
col_schema = ColumnSchema(column, dtype=dtype_val, is_list=is_list, is_ragged=is_list)
<string>:9: in __init__
???


self = ColumnSchema(name='column', tags=set(), properties={}, dtype=dtype('float64'), is_list=True, is_ragged=True)

def __post_init__(self):
    """Standardize tags and dtypes on initialization

    Raises:
        TypeError: If the provided dtype cannot be cast to a numpy dtype
    """
    tags = TagSet(self.tags)
    object.__setattr__(self, "tags", tags)

    try:
        if hasattr(self.dtype, "numpy_dtype"):
            dtype = np.dtype(self.dtype.numpy_dtype)
        elif hasattr(self.dtype, "_categories"):
            dtype = self.dtype._categories.dtype
        elif isinstance(self.dtype, pd.StringDtype):
            dtype = np.dtype("O")
        else:
            dtype = np.dtype(self.dtype)
    except TypeError as err:
        raise TypeError(
            f"Unsupported dtype {self.dtype}, unable to cast {self.dtype} to a numpy dtype."
        ) from err

    object.__setattr__(self, "dtype", dtype)

    value_count = self.properties.get("value_count", {"min": 0, "max": 0})

    if self.is_list is None:
        if value_count["max"] > 0:
            object.__setattr__(self, "is_list", True)
        else:
            object.__setattr__(self, "is_list", False)

    if self.is_ragged is None:
        if value_count["max"] > value_count["min"]:
            object.__setattr__(self, "is_ragged", True)
        else:
            object.__setattr__(self, "is_ragged", False)

    if self.is_ragged and not self.is_list:
        raise ValueError(
            "`is_ragged` is set to `True` but `is_list` is not. "
            "Only list columns can set the `is_ragged` flag."
        )

    if self.is_ragged and value_count["max"] == value_count["min"]:
      raise ValueError(
            "`is_ragged` is set to `True` but `value_count.min` == `value_count.max`. "
            "If value_count min/max are equal. "
            "This is a fixed size list and `is_ragged` should be set to False. "
        )

E ValueError: is_ragged is set to True but value_count.min == value_count.max. When the min and max value counts are equal, the column is a fixed-size list, so is_ragged should be set to False.

merlin/schema/schema.py:99: ValueError
_______________________ test_string_datatypes[True-csv] ________________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-4/test_string_datatypes_True_csv0')
engine = 'csv', cpu = True

@pytest.mark.parametrize("engine", ["csv", "parquet", "csv-no-header"])
@pytest.mark.parametrize("cpu", [None, True])
def test_string_datatypes(tmpdir, engine, cpu):
    df_lib = dispatch.get_lib()
    df = df_lib.DataFrame({"column": [[0.1, 0.2]]})
  dataset = merlin.io.Dataset(df)

tests/unit/io/test_io.py:89:


merlin/io/dataset.py:354: in __init__
self.infer_schema()
merlin/io/dataset.py:1140: in infer_schema
col_schema = ColumnSchema(column, dtype=dtype_val, is_list=is_list, is_ragged=is_list)
<string>:9: in __init__
???


self = ColumnSchema(name='column', tags=set(), properties={}, dtype=dtype('float64'), is_list=True, is_ragged=True)

def __post_init__(self):
    """Standardize tags and dtypes on initialization

    Raises:
        TypeError: If the provided dtype cannot be cast to a numpy dtype
    """
    tags = TagSet(self.tags)
    object.__setattr__(self, "tags", tags)

    try:
        if hasattr(self.dtype, "numpy_dtype"):
            dtype = np.dtype(self.dtype.numpy_dtype)
        elif hasattr(self.dtype, "_categories"):
            dtype = self.dtype._categories.dtype
        elif isinstance(self.dtype, pd.StringDtype):
            dtype = np.dtype("O")
        else:
            dtype = np.dtype(self.dtype)
    except TypeError as err:
        raise TypeError(
            f"Unsupported dtype {self.dtype}, unable to cast {self.dtype} to a numpy dtype."
        ) from err

    object.__setattr__(self, "dtype", dtype)

    value_count = self.properties.get("value_count", {"min": 0, "max": 0})

    if self.is_list is None:
        if value_count["max"] > 0:
            object.__setattr__(self, "is_list", True)
        else:
            object.__setattr__(self, "is_list", False)

    if self.is_ragged is None:
        if value_count["max"] > value_count["min"]:
            object.__setattr__(self, "is_ragged", True)
        else:
            object.__setattr__(self, "is_ragged", False)

    if self.is_ragged and not self.is_list:
        raise ValueError(
            "`is_ragged` is set to `True` but `is_list` is not. "
            "Only list columns can set the `is_ragged` flag."
        )

    if self.is_ragged and value_count["max"] == value_count["min"]:
      raise ValueError(
            "`is_ragged` is set to `True` but `value_count.min` == `value_count.max`. "
            "If value_count min/max are equal. "
            "This is a fixed size list and `is_ragged` should be set to False. "
        )

E ValueError: is_ragged is set to True but value_count.min == value_count.max. When the min and max value counts are equal, the column is a fixed-size list, so is_ragged should be set to False.

merlin/schema/schema.py:99: ValueError
_____________________ test_string_datatypes[True-parquet] ______________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-4/test_string_datatypes_True_par0')
engine = 'parquet', cpu = True

@pytest.mark.parametrize("engine", ["csv", "parquet", "csv-no-header"])
@pytest.mark.parametrize("cpu", [None, True])
def test_string_datatypes(tmpdir, engine, cpu):
    df_lib = dispatch.get_lib()
    df = df_lib.DataFrame({"column": [[0.1, 0.2]]})
  dataset = merlin.io.Dataset(df)

tests/unit/io/test_io.py:89:


merlin/io/dataset.py:354: in __init__
self.infer_schema()
merlin/io/dataset.py:1140: in infer_schema
col_schema = ColumnSchema(column, dtype=dtype_val, is_list=is_list, is_ragged=is_list)
<string>:9: in __init__
???


self = ColumnSchema(name='column', tags=set(), properties={}, dtype=dtype('float64'), is_list=True, is_ragged=True)

def __post_init__(self):
    """Standardize tags and dtypes on initialization

    Raises:
        TypeError: If the provided dtype cannot be cast to a numpy dtype
    """
    tags = TagSet(self.tags)
    object.__setattr__(self, "tags", tags)

    try:
        if hasattr(self.dtype, "numpy_dtype"):
            dtype = np.dtype(self.dtype.numpy_dtype)
        elif hasattr(self.dtype, "_categories"):
            dtype = self.dtype._categories.dtype
        elif isinstance(self.dtype, pd.StringDtype):
            dtype = np.dtype("O")
        else:
            dtype = np.dtype(self.dtype)
    except TypeError as err:
        raise TypeError(
            f"Unsupported dtype {self.dtype}, unable to cast {self.dtype} to a numpy dtype."
        ) from err

    object.__setattr__(self, "dtype", dtype)

    value_count = self.properties.get("value_count", {"min": 0, "max": 0})

    if self.is_list is None:
        if value_count["max"] > 0:
            object.__setattr__(self, "is_list", True)
        else:
            object.__setattr__(self, "is_list", False)

    if self.is_ragged is None:
        if value_count["max"] > value_count["min"]:
            object.__setattr__(self, "is_ragged", True)
        else:
            object.__setattr__(self, "is_ragged", False)

    if self.is_ragged and not self.is_list:
        raise ValueError(
            "`is_ragged` is set to `True` but `is_list` is not. "
            "Only list columns can set the `is_ragged` flag."
        )

    if self.is_ragged and value_count["max"] == value_count["min"]:
      raise ValueError(
            "`is_ragged` is set to `True` but `value_count.min` == `value_count.max`. "
            "If value_count min/max are equal. "
            "This is a fixed size list and `is_ragged` should be set to False. "
        )

E ValueError: is_ragged is set to True but value_count.min == value_count.max. When the min and max value counts are equal, the column is a fixed-size list, so is_ragged should be set to False.

merlin/schema/schema.py:99: ValueError
__________________ test_string_datatypes[True-csv-no-header] ___________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-4/test_string_datatypes_True_csv1')
engine = 'csv-no-header', cpu = True

@pytest.mark.parametrize("engine", ["csv", "parquet", "csv-no-header"])
@pytest.mark.parametrize("cpu", [None, True])
def test_string_datatypes(tmpdir, engine, cpu):
    df_lib = dispatch.get_lib()
    df = df_lib.DataFrame({"column": [[0.1, 0.2]]})
  dataset = merlin.io.Dataset(df)

tests/unit/io/test_io.py:89:


merlin/io/dataset.py:354: in __init__
self.infer_schema()
merlin/io/dataset.py:1140: in infer_schema
col_schema = ColumnSchema(column, dtype=dtype_val, is_list=is_list, is_ragged=is_list)
<string>:9: in __init__
???


self = ColumnSchema(name='column', tags=set(), properties={}, dtype=dtype('float64'), is_list=True, is_ragged=True)

def __post_init__(self):
    """Standardize tags and dtypes on initialization

    Raises:
        TypeError: If the provided dtype cannot be cast to a numpy dtype
    """
    tags = TagSet(self.tags)
    object.__setattr__(self, "tags", tags)

    try:
        if hasattr(self.dtype, "numpy_dtype"):
            dtype = np.dtype(self.dtype.numpy_dtype)
        elif hasattr(self.dtype, "_categories"):
            dtype = self.dtype._categories.dtype
        elif isinstance(self.dtype, pd.StringDtype):
            dtype = np.dtype("O")
        else:
            dtype = np.dtype(self.dtype)
    except TypeError as err:
        raise TypeError(
            f"Unsupported dtype {self.dtype}, unable to cast {self.dtype} to a numpy dtype."
        ) from err

    object.__setattr__(self, "dtype", dtype)

    value_count = self.properties.get("value_count", {"min": 0, "max": 0})

    if self.is_list is None:
        if value_count["max"] > 0:
            object.__setattr__(self, "is_list", True)
        else:
            object.__setattr__(self, "is_list", False)

    if self.is_ragged is None:
        if value_count["max"] > value_count["min"]:
            object.__setattr__(self, "is_ragged", True)
        else:
            object.__setattr__(self, "is_ragged", False)

    if self.is_ragged and not self.is_list:
        raise ValueError(
            "`is_ragged` is set to `True` but `is_list` is not. "
            "Only list columns can set the `is_ragged` flag."
        )

    if self.is_ragged and value_count["max"] == value_count["min"]:
      raise ValueError(
            "`is_ragged` is set to `True` but `value_count.min` == `value_count.max`. "
            "If value_count min/max are equal. "
            "This is a fixed size list and `is_ragged` should be set to False. "
        )

E ValueError: is_ragged is set to True but value_count.min == value_count.max. When the min and max value counts are equal, the column is a fixed-size list, so is_ragged should be set to False.

merlin/schema/schema.py:99: ValueError
______________________ test_dtype_column_schema[float32] _______________________

d_types = <class 'numpy.float32'>

@pytest.mark.parametrize("d_types", [numpy.float32, numpy.float64, numpy.uint32, numpy.uint64])
def test_dtype_column_schema(d_types):
  column = ColumnSchema("name", tags=[], properties=[], dtype=d_types)

tests/unit/schema/test_column_schemas.py:26:


<string>:9: in __init__
???


self = ColumnSchema(name='name', tags=set(), properties=[], dtype=dtype('float32'), is_list=None, is_ragged=None)

def __post_init__(self):
    """Standardize tags and dtypes on initialization

    Raises:
        TypeError: If the provided dtype cannot be cast to a numpy dtype
    """
    tags = TagSet(self.tags)
    object.__setattr__(self, "tags", tags)

    try:
        if hasattr(self.dtype, "numpy_dtype"):
            dtype = np.dtype(self.dtype.numpy_dtype)
        elif hasattr(self.dtype, "_categories"):
            dtype = self.dtype._categories.dtype
        elif isinstance(self.dtype, pd.StringDtype):
            dtype = np.dtype("O")
        else:
            dtype = np.dtype(self.dtype)
    except TypeError as err:
        raise TypeError(
            f"Unsupported dtype {self.dtype}, unable to cast {self.dtype} to a numpy dtype."
        ) from err

    object.__setattr__(self, "dtype", dtype)
  value_count = self.properties.get("value_count", {"min": 0, "max": 0})

E AttributeError: 'list' object has no attribute 'get'

merlin/schema/schema.py:78: AttributeError
______________________ test_dtype_column_schema[float64] _______________________

d_types = <class 'numpy.float64'>

@pytest.mark.parametrize("d_types", [numpy.float32, numpy.float64, numpy.uint32, numpy.uint64])
def test_dtype_column_schema(d_types):
  column = ColumnSchema("name", tags=[], properties=[], dtype=d_types)

tests/unit/schema/test_column_schemas.py:26:


<string>:9: in __init__
???


self = ColumnSchema(name='name', tags=set(), properties=[], dtype=dtype('float64'), is_list=None, is_ragged=None)

def __post_init__(self):
    """Standardize tags and dtypes on initialization

    Raises:
        TypeError: If the provided dtype cannot be cast to a numpy dtype
    """
    tags = TagSet(self.tags)
    object.__setattr__(self, "tags", tags)

    try:
        if hasattr(self.dtype, "numpy_dtype"):
            dtype = np.dtype(self.dtype.numpy_dtype)
        elif hasattr(self.dtype, "_categories"):
            dtype = self.dtype._categories.dtype
        elif isinstance(self.dtype, pd.StringDtype):
            dtype = np.dtype("O")
        else:
            dtype = np.dtype(self.dtype)
    except TypeError as err:
        raise TypeError(
            f"Unsupported dtype {self.dtype}, unable to cast {self.dtype} to a numpy dtype."
        ) from err

    object.__setattr__(self, "dtype", dtype)
  value_count = self.properties.get("value_count", {"min": 0, "max": 0})

E AttributeError: 'list' object has no attribute 'get'

merlin/schema/schema.py:78: AttributeError
_______________________ test_dtype_column_schema[uint32] _______________________

d_types = <class 'numpy.uint32'>

@pytest.mark.parametrize("d_types", [numpy.float32, numpy.float64, numpy.uint32, numpy.uint64])
def test_dtype_column_schema(d_types):
  column = ColumnSchema("name", tags=[], properties=[], dtype=d_types)

tests/unit/schema/test_column_schemas.py:26:


<string>:9: in __init__
???


self = ColumnSchema(name='name', tags=set(), properties=[], dtype=dtype('uint32'), is_list=None, is_ragged=None)

def __post_init__(self):
    """Standardize tags and dtypes on initialization

    Raises:
        TypeError: If the provided dtype cannot be cast to a numpy dtype
    """
    tags = TagSet(self.tags)
    object.__setattr__(self, "tags", tags)

    try:
        if hasattr(self.dtype, "numpy_dtype"):
            dtype = np.dtype(self.dtype.numpy_dtype)
        elif hasattr(self.dtype, "_categories"):
            dtype = self.dtype._categories.dtype
        elif isinstance(self.dtype, pd.StringDtype):
            dtype = np.dtype("O")
        else:
            dtype = np.dtype(self.dtype)
    except TypeError as err:
        raise TypeError(
            f"Unsupported dtype {self.dtype}, unable to cast {self.dtype} to a numpy dtype."
        ) from err

    object.__setattr__(self, "dtype", dtype)
  value_count = self.properties.get("value_count", {"min": 0, "max": 0})

E AttributeError: 'list' object has no attribute 'get'

merlin/schema/schema.py:78: AttributeError
_______________________ test_dtype_column_schema[uint64] _______________________

d_types = <class 'numpy.uint64'>

@pytest.mark.parametrize("d_types", [numpy.float32, numpy.float64, numpy.uint32, numpy.uint64])
def test_dtype_column_schema(d_types):
  column = ColumnSchema("name", tags=[], properties=[], dtype=d_types)

tests/unit/schema/test_column_schemas.py:26:


<string>:9: in __init__
???


self = ColumnSchema(name='name', tags=set(), properties=[], dtype=dtype('uint64'), is_list=None, is_ragged=None)

def __post_init__(self):
    """Standardize tags and dtypes on initialization

    Raises:
        TypeError: If the provided dtype cannot be cast to a numpy dtype
    """
    tags = TagSet(self.tags)
    object.__setattr__(self, "tags", tags)

    try:
        if hasattr(self.dtype, "numpy_dtype"):
            dtype = np.dtype(self.dtype.numpy_dtype)
        elif hasattr(self.dtype, "_categories"):
            dtype = self.dtype._categories.dtype
        elif isinstance(self.dtype, pd.StringDtype):
            dtype = np.dtype("O")
        else:
            dtype = np.dtype(self.dtype)
    except TypeError as err:
        raise TypeError(
            f"Unsupported dtype {self.dtype}, unable to cast {self.dtype} to a numpy dtype."
        ) from err

    object.__setattr__(self, "dtype", dtype)
  value_count = self.properties.get("value_count", {"min": 0, "max": 0})

E AttributeError: 'list' object has no attribute 'get'

merlin/schema/schema.py:78: AttributeError
_________________________ test_list_column_attributes __________________________

def test_list_column_attributes():
    col0_schema = ColumnSchema("col0")

    assert not col0_schema.is_list
    assert not col0_schema.is_ragged
    assert col0_schema.quantity == ColumnQuantity.SCALAR

    col1_schema = ColumnSchema("col1", is_list=False, is_ragged=False)

    assert not col1_schema.is_list
    assert not col1_schema.is_ragged
    assert col1_schema.quantity == ColumnQuantity.SCALAR

    col2_schema = ColumnSchema("col2", is_list=True)

    assert col2_schema.is_list
  assert col2_schema.is_ragged

E AssertionError: assert False
E + where False = ColumnSchema(name='col2', tags=set(), properties={}, dtype=dtype('float64'), is_list=True, is_ragged=False).is_ragged

tests/unit/schema/test_column_schemas.py:183: AssertionError
______________________ test_tensorflow_metadata_from_json ______________________

def test_tensorflow_metadata_from_json():
    # make sure we can load up tensorflowmetadata serialized json objects, like done by
    # merlin-models
    json_schema = """{"feature": [
    {
      "name": "categories",
      "valueCount": {
        "min": "1",
        "max": "4"
      },
      "type": "INT",
      "intDomain": {
        "name": "categories",
        "min": "1",
        "max": "331",
        "isCategorical": true
      },
      "annotation": {
        "tag": [
          "item"
        ]
      }
    }]}
    """

    schema = TensorflowMetadata.from_json(json_schema).to_merlin_schema()
    column_schema = schema.column_schemas["categories"]

    # make sure the value_count is set appropriately
    assert column_schema.properties["value_count"] == {"min": 1, "max": 4}
  assert column_schema.is_list

E AssertionError: assert False
E + where False = ColumnSchema(name='categories', tags={<Tags.CATEGORICAL: 'categorical'>, <Tags.ITEM: 'item'>}, properties={'domain': {...331, 'name': 'categories'}, 'value_count': {'min': 1, 'max': 4}}, dtype=dtype('int64'), is_list=False, is_ragged=False).is_list

tests/unit/schema/test_schema_io.py:218: AssertionError
=============================== warnings summary ===============================
tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 6 files. Did not have enough partitions to create 7 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 8 files. Did not have enough partitions to create 9 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 8 files. Did not have enough partitions to create 10 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 8 files. Did not have enough partitions to create 11 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 12 files. Did not have enough partitions to create 13 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 12 files. Did not have enough partitions to create 14 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 12 files. Did not have enough partitions to create 15 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 12 files. Did not have enough partitions to create 16 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 12 files. Did not have enough partitions to create 17 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 12 files. Did not have enough partitions to create 18 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 12 files. Did not have enough partitions to create 19 files.
warnings.warn(

tests/unit/io/test_io.py::test_validate_and_regenerate_dataset
/var/jenkins_home/workspace/merlin_core/core/merlin/io/parquet.py:551: DeprecationWarning: 'ParquetDataset.pieces' attribute is deprecated as of pyarrow 5.0.0 and will be removed in a future version. Specify 'use_legacy_dataset=False' while constructing the ParquetDataset, and then use the '.fragments' attribute instead.
paths = [p.path for p in pa_dataset.pieces]

tests/unit/schema/test_column_schemas.py::test_column_schema_tags_normalize
tests/unit/schema/test_tags.py::test_tagset_atomizes_compound_tags
/var/jenkins_home/workspace/merlin_core/core/merlin/schema/tags.py:148: UserWarning: Compound tags like Tags.ITEM_ID have been deprecated and will be removed in a future version. Please use the atomic versions of these tags, like [<Tags.ITEM: 'item'>, <Tags.ID: 'id'>].
warnings.warn(

tests/unit/schema/test_schema_io.py::test_json_serialization_with_embedded_dicts
tests/unit/schema/test_schema_io.py::test_merlin_to_proto_to_json_to_merlin
tests/unit/schema/test_tags.py::test_tagset_atomizes_compound_tags
/var/jenkins_home/workspace/merlin_core/core/merlin/schema/tags.py:148: UserWarning: Compound tags like Tags.USER_ID have been deprecated and will be removed in a future version. Please use the atomic versions of these tags, like [<Tags.USER: 'user'>, <Tags.ID: 'id'>].
warnings.warn(

tests/unit/schema/test_tags.py::test_tagset_atomizes_compound_tags
/var/jenkins_home/workspace/merlin_core/core/merlin/schema/tags.py:148: UserWarning: Compound tags like Tags.SESSION_ID have been deprecated and will be removed in a future version. Please use the atomic versions of these tags, like [<Tags.SESSION: 'session'>, <Tags.ID: 'id'>].
warnings.warn(

tests/unit/schema/test_tags.py::test_tagset_atomizes_compound_tags
/var/jenkins_home/workspace/merlin_core/core/merlin/schema/tags.py:148: UserWarning: Compound tags like Tags.TEXT_TOKENIZED have been deprecated and will be removed in a future version. Please use the atomic versions of these tags, like [<Tags.TEXT: 'text'>, <Tags.TOKENIZED: 'tokenized'>].
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed[True-True]
/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 44479 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed[True-False]
/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 35819 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed[False-True]
/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 41373 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed[False-False]
/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 36359 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed_force[True]
/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 35435 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed_force[False]
/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 41963 instead
warnings.warn(

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html

---------- coverage: platform linux, python 3.8.10-final-0 -----------
Name Stmts Miss Cover

merlin/core/__init__.py 2 0 100%
merlin/core/_version.py 354 205 42%
merlin/core/compat.py 10 4 60%
merlin/core/dispatch.py 356 212 40%
merlin/core/protocols.py 99 45 55%
merlin/core/utils.py 195 56 71%
merlin/dag/__init__.py 5 0 100%
merlin/dag/base_operator.py 122 20 84%
merlin/dag/dictarray.py 55 15 73%
merlin/dag/executors.py 141 68 52%
merlin/dag/graph.py 99 35 65%
merlin/dag/node.py 344 161 53%
merlin/dag/ops/__init__.py 4 0 100%
merlin/dag/ops/concat_columns.py 17 4 76%
merlin/dag/ops/selection.py 22 0 100%
merlin/dag/ops/subset_columns.py 12 4 67%
merlin/dag/ops/subtraction.py 21 11 48%
merlin/dag/selector.py 101 6 94%
merlin/io/__init__.py 4 0 100%
merlin/io/avro.py 88 88 0%
merlin/io/csv.py 57 6 89%
merlin/io/dask.py 181 53 71%
merlin/io/dataframe_engine.py 61 5 92%
merlin/io/dataframe_iter.py 21 1 95%
merlin/io/dataset.py 347 54 84%
merlin/io/dataset_engine.py 37 8 78%
merlin/io/fsspec_utils.py 127 108 15%
merlin/io/hugectr.py 45 35 22%
merlin/io/parquet.py 603 69 89%
merlin/io/shuffle.py 38 12 68%
merlin/io/worker.py 80 66 18%
merlin/io/writer.py 190 52 73%
merlin/io/writer_factory.py 18 4 78%
merlin/schema/__init__.py 2 0 100%
merlin/schema/io/__init__.py 0 0 100%
merlin/schema/io/proto_utils.py 20 4 80%
merlin/schema/io/schema_bp.py 306 5 98%
merlin/schema/io/tensorflow_metadata.py 183 17 91%
merlin/schema/schema.py 218 36 83%
merlin/schema/tags.py 82 0 100%

TOTAL 4667 1469 69%

=========================== short test summary info ============================
SKIPPED [1] tests/unit/io/test_avro.py:34: could not import 'uavro': No module named 'uavro'
====== 12 failed, 379 passed, 1 skipped, 25 warnings in 64.90s (0:01:04) =======
ERROR: InvocationError for command /var/jenkins_home/workspace/merlin_core/core/.tox/test-gpu/bin/python -m pytest --cov-report term --cov merlin -rxs tests/unit (exited with code 1)
___________________________________ summary ____________________________________
ERROR: test-gpu: commands failed
Build step 'Execute shell' marked build as failure
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA-Merlin/core/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[merlin_core] $ /bin/bash /tmp/jenkins14282197680582606154.sh

@nvidia-merlin-bot

Click to view CI Results
GitHub pull request #111 of commit 8c549eceef416090f082834354288c23a9b1094e, no merge conflicts.
Running as SYSTEM
Setting status of 8c549eceef416090f082834354288c23a9b1094e to PENDING with url http://merlin-infra1.nvidia.com:8080/job/merlin_core/293/ and message: 'Pending'
Using context: Jenkins
Building on master in workspace /var/jenkins_home/workspace/merlin_core
using credential ce87ff3c-94f0-400a-8303-cb4acb4918b5
 > git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > git config remote.origin.url /~https://github.com/NVIDIA-Merlin/core # timeout=10
Fetching upstream changes from /~https://github.com/NVIDIA-Merlin/core
 > git --version # timeout=10
using GIT_ASKPASS to set credentials login for merlin-systems username and pass
 > git fetch --tags --force --progress -- /~https://github.com/NVIDIA-Merlin/core +refs/pull/111/*:refs/remotes/origin/pr/111/* # timeout=10
 > git rev-parse 8c549eceef416090f082834354288c23a9b1094e^{commit} # timeout=10
Checking out Revision 8c549eceef416090f082834354288c23a9b1094e (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 8c549eceef416090f082834354288c23a9b1094e # timeout=10
Commit message: "Only check is_ragged min/max if provided in constructor"
 > git rev-list --no-walk 6610d7f31c395fe2e223ea6748ff1432066f3c14 # timeout=10
[merlin_core] $ /bin/bash /tmp/jenkins473916914072643228.sh
GLOB sdist-make: /var/jenkins_home/workspace/merlin_core/core/setup.py
test-gpu recreate: /var/jenkins_home/workspace/merlin_core/core/.tox/test-gpu
test-gpu installdeps: pytest, pytest-cov
WARNING: Discarding $PYTHONPATH from environment, to override specify PYTHONPATH in 'passenv' in your configuration.
test-gpu inst: /var/jenkins_home/workspace/merlin_core/core/.tox/.tmp/package/1/merlin-core-0.8.0+24.g8c549ec.zip
WARNING: Discarding $PYTHONPATH from environment, to override specify PYTHONPATH in 'passenv' in your configuration.
test-gpu installed: absl-py==1.2.0,aiohttp==3.8.1,aiosignal==1.2.0,alabaster==0.7.12,alembic==1.8.1,anyio==3.6.1,argon2-cffi==21.3.0,argon2-cffi-bindings==21.2.0,astroid==2.5.6,asttokens==2.0.8,astunparse==1.6.3,asv==0.5.1,asvdb==0.4.2,async-timeout==4.0.2,attrs==22.1.0,autopage==0.5.1,awscli==1.27.12,Babel==2.10.3,backcall==0.2.0,beautifulsoup4==4.11.1,betterproto==1.2.5,black==22.6.0,bleach==5.0.1,boto3==1.24.75,botocore==1.29.12,Brotli==1.0.9,cachetools==5.2.0,certifi==2019.11.28,cffi==1.15.1,chardet==3.0.4,charset-normalizer==2.1.1,clang==5.0,click==8.1.3,cliff==4.0.0,cloudpickle==2.2.0,cmaes==0.9.0,cmake==3.24.1.1,cmd2==2.4.2,colorama==0.4.4,colorlog==6.7.0,contourpy==1.0.5,coverage==6.5.0,cuda-python==11.7.1,cupy-cuda117==10.6.0,cycler==0.11.0,Cython==0.29.32,dask==2022.1.1,dbus-python==1.2.16,debugpy==1.6.3,decorator==5.1.1,defusedxml==0.7.1,dill==0.3.5.1,distlib==0.3.6,distributed==2022.5.1,distro==1.7.0,dm-tree==0.1.6,docker-pycreds==0.4.0,docutils==0.16,emoji==1.7.0,entrypoints==0.4,execnet==1.9.0,executing==1.0.0,faiss==1.7.2,faiss-gpu==1.7.2,fastai==2.7.9,fastapi==0.85.0,fastavro==1.6.1,fastcore==1.5.27,fastdownload==0.0.7,fastjsonschema==2.16.1,fastprogress==1.0.3,fastrlock==0.8,feast==0.19.4,fiddle==0.2.2,filelock==3.8.0,flatbuffers==1.12,fonttools==4.37.3,frozenlist==1.3.1,fsspec==2022.5.0,gast==0.4.0,gevent==21.12.0,geventhttpclient==2.0.2,gitdb==4.0.9,GitPython==3.1.27,google==3.0.0,google-api-core==2.10.1,google-auth==2.11.1,google-auth-oauthlib==0.4.6,google-pasta==0.2.0,googleapis-common-protos==1.52.0,graphviz==0.20.1,greenlet==1.1.3,grpcio==1.41.0,grpcio-channelz==1.49.0,grpcio-reflection==1.48.1,grpclib==0.4.3,h11==0.13.0,h2==4.1.0,h5py==3.7.0,HeapDict==1.0.1,horovod==0.26.1,hpack==4.0.0,httptools==0.5.0,hugectr2onnx==0.0.0,huggingface-hub==0.9.1,hyperframe==6.0.1,idna==2.8,imagesize==1.4.1,implicit==0.6.1,importlib-metadata==4.12.0,importlib-resources==5.9.0,iniconfig==1.1.1,ipykernel==6.15.3,ipython==8.5.0,ipython-genutils==0.2.0,ipywidgets==7.7.0,jedi==0.18.1,Jinja2==3.1.2,jmespath==1.0.1,joblib==1.2.0,json5==0.9.10,jsonschema==4.16.0,jupyter-cache==0.4.3,jupyter-core==4.11.1,jupyter-server==1.18.1,jupyter-server-mathjax==0.2.5,jupyter-sphinx==0.3.2,jupyter_client==7.3.5,jupyterlab==3.4.7,jupyterlab-pygments==0.2.2,jupyterlab-widgets==1.1.0,jupyterlab_server==2.15.1,keras==2.9.0,Keras-Preprocessing==1.1.2,kiwisolver==1.4.4,lazy-object-proxy==1.8.0,libclang==14.0.6,libcst==0.4.7,lightfm==1.16,lightgbm==3.3.2,linkify-it-py==1.0.3,llvmlite==0.39.1,locket==1.0.0,lxml==4.9.1,Mako==1.2.4,Markdown==3.4.1,markdown-it-py==1.1.0,MarkupSafe==2.1.1,matplotlib==3.6.0,matplotlib-inline==0.1.6,mdit-py-plugins==0.2.8,merlin-core==0.8.0+24.g8c549ec,merlin-models==0.7.0+11.g280956aa4,merlin-systems==0.5.0+4.g15074ad,mistune==2.0.4,mmh3==3.0.0,mpi4py==3.1.3,msgpack==1.0.4,multidict==6.0.2,mypy-extensions==0.4.3,myst-nb==0.13.2,myst-parser==0.15.2,natsort==8.1.0,nbclassic==0.4.3,nbclient==0.6.8,nbconvert==7.0.0,nbdime==3.1.1,nbformat==5.5.0,nest-asyncio==1.5.5,ninja==1.10.2.3,notebook==6.4.12,notebook-shim==0.1.0,numba==0.56.2,numpy==1.22.4,nvidia-pyindex==1.0.9,# Editable install with no version control (nvtabular==1.4.0+8.g95e12d347),-e 
/usr/local/lib/python3.8/dist-packages,nvtx==0.2.5,oauthlib==3.2.1,oldest-supported-numpy==2022.8.16,onnx==1.12.0,onnxruntime==1.11.1,opt-einsum==3.3.0,optuna==3.0.3,packaging==21.3,pandas==1.3.5,pandavro==1.5.2,pandocfilters==1.5.0,parso==0.8.3,partd==1.3.0,pathtools==0.1.2,pbr==5.11.0,pexpect==4.8.0,pickleshare==0.7.5,Pillow==9.2.0,pkgutil_resolve_name==1.3.10,platformdirs==2.5.2,plotly==5.11.0,pluggy==1.0.0,prettytable==3.5.0,prometheus-client==0.14.1,promise==2.3,prompt-toolkit==3.0.31,proto-plus==1.19.6,protobuf==3.19.5,psutil==5.9.2,ptyprocess==0.7.0,pure-eval==0.2.2,py==1.11.0,pyarrow==7.0.0,pyasn1==0.4.8,pyasn1-modules==0.2.8,pybind11==2.10.0,pycparser==2.21,pydantic==1.10.2,pydot==1.4.2,Pygments==2.13.0,PyGObject==3.36.0,pynvml==11.4.1,pyparsing==3.0.9,pyperclip==1.8.2,pyrsistent==0.18.1,pytest==7.1.3,pytest-cov==4.0.0,pytest-xdist==3.0.2,python-apt==2.0.0+ubuntu0.20.4.8,python-dateutil==2.8.2,python-dotenv==0.21.0,python-rapidjson==1.8,pytz==2022.2.1,PyYAML==5.4.1,pyzmq==24.0.0,regex==2022.9.13,requests==2.22.0,requests-oauthlib==1.3.1,requests-unixsocket==0.2.0,rsa==4.7.2,s3fs==2022.2.0,s3transfer==0.6.0,sacremoses==0.0.53,scikit-build==0.15.0,scikit-learn==1.1.2,scipy==1.8.1,seedir==0.3.0,Send2Trash==1.8.0,sentry-sdk==1.9.8,setproctitle==1.3.2,setuptools-scm==7.0.5,shortuuid==1.0.9,six==1.15.0,sklearn==0.0,smmap==5.0.0,sniffio==1.3.0,snowballstemmer==2.2.0,sortedcontainers==2.4.0,soupsieve==2.3.2.post1,Sphinx==5.3.0,sphinx-multiversion==0.2.4,sphinx-togglebutton==0.3.1,sphinx_external_toc==0.3.0,sphinxcontrib-applehelp==1.0.2,sphinxcontrib-copydirs @ git+/~https://github.com/mikemckiernan/sphinxcontrib-copydirs.git@bd8c5d79b3f91cf5f1bb0d6995aeca3fe84b670e,sphinxcontrib-devhelp==1.0.2,sphinxcontrib-htmlhelp==2.0.0,sphinxcontrib-jsmath==1.0.1,sphinxcontrib-qthelp==1.0.3,sphinxcontrib-serializinghtml==1.1.5,SQLAlchemy==1.4.44,stack-data==0.5.0,starlette==0.20.4,stevedore==4.1.1,stringcase==1.2.0,supervisor==4.1.0,tabulate==0.8.10,tblib==1.7.0,tdqm==0.0.1,tenacity==8.0.1,tensorboard==2.9.1,tensorboard-data-server==0.6.1,tensorboard-plugin-wit==1.8.1,tensorflow==2.9.2,tensorflow-estimator==2.9.0,tensorflow-gpu==2.9.2,tensorflow-io-gcs-filesystem==0.27.0,tensorflow-metadata==1.10.0,termcolor==2.0.1,terminado==0.15.0,testbook==0.4.2,threadpoolctl==3.1.0,tinycss2==1.1.1,tokenizers==0.10.3,toml==0.10.2,tomli==2.0.1,toolz==0.12.0,torch==1.12.1+cu113,torchmetrics==0.3.2,tornado==6.2,tox==3.26.0,tqdm==4.64.1,traitlets==5.4.0,transformers==4.12.0,transformers4rec==0.1.12+2.gbcc939255,treelite==2.3.0,treelite-runtime==2.3.0,tritonclient==2.25.0,typing-inspect==0.8.0,typing_extensions==4.3.0,uc-micro-py==1.0.1,urllib3==1.26.12,uvicorn==0.18.3,uvloop==0.17.0,versioneer==0.20,virtualenv==20.16.5,wandb==0.13.3,watchfiles==0.17.0,wcwidth==0.2.5,webencodings==0.5.1,websocket-client==1.4.1,websockets==10.3,Werkzeug==2.2.2,widgetsnbextension==3.6.0,wrapt==1.12.1,xgboost==1.6.2,yarl==1.8.1,zict==2.2.0,zipp==3.8.1,zope.event==4.5.0,zope.interface==5.4.0
test-gpu run-test-pre: PYTHONHASHSEED='2371883594'
test-gpu run-test: commands[0] | python -m pytest --cov-report term --cov merlin -rxs tests/unit
============================= test session starts ==============================
platform linux -- Python 3.8.10, pytest-7.1.3, pluggy-1.0.0
cachedir: .tox/test-gpu/.pytest_cache
rootdir: /var/jenkins_home/workspace/merlin_core/core, configfile: pyproject.toml
plugins: anyio-3.5.0, xdist-3.0.2, cov-4.0.0
collected 391 items / 1 skipped

tests/unit/core/test_dispatch.py .... [ 1%]
tests/unit/core/test_protocols.py ......... [ 3%]
tests/unit/core/test_version.py . [ 3%]
tests/unit/dag/test_base_operator.py .... [ 4%]
tests/unit/dag/test_column_selector.py .............................. [ 12%]
tests/unit/dag/test_dictarray.py ... [ 13%]
tests/unit/dag/test_executors.py .. [ 13%]
tests/unit/dag/test_graph.py .... [ 14%]
tests/unit/dag/ops/test_selection.py .... [ 15%]
tests/unit/io/test_io.py ............................................... [ 27%]
................................................................ [ 43%]
tests/unit/schema/test_column_schemas.py ........................ [ 50%]
tests/unit/schema/test_schema.py ............. [ 53%]
tests/unit/schema/test_schema_io.py .................................... [ 62%]
........................................................................ [ 81%]
........................................................... [ 96%]
tests/unit/schema/test_tags.py ....... [ 97%]
tests/unit/utils/test_utils.py ........ [100%]

=============================== warnings summary ===============================
tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 6 files. Did not have enough partitions to create 7 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 8 files. Did not have enough partitions to create 9 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 8 files. Did not have enough partitions to create 10 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 8 files. Did not have enough partitions to create 11 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 12 files. Did not have enough partitions to create 13 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 12 files. Did not have enough partitions to create 14 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 12 files. Did not have enough partitions to create 15 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 12 files. Did not have enough partitions to create 16 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 12 files. Did not have enough partitions to create 17 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 12 files. Did not have enough partitions to create 18 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 12 files. Did not have enough partitions to create 19 files.
warnings.warn(

tests/unit/io/test_io.py::test_validate_and_regenerate_dataset
/var/jenkins_home/workspace/merlin_core/core/merlin/io/parquet.py:551: DeprecationWarning: 'ParquetDataset.pieces' attribute is deprecated as of pyarrow 5.0.0 and will be removed in a future version. Specify 'use_legacy_dataset=False' while constructing the ParquetDataset, and then use the '.fragments' attribute instead.
paths = [p.path for p in pa_dataset.pieces]

tests/unit/schema/test_column_schemas.py::test_column_schema_tags_normalize
tests/unit/schema/test_tags.py::test_tagset_atomizes_compound_tags
/var/jenkins_home/workspace/merlin_core/core/merlin/schema/tags.py:148: UserWarning: Compound tags like Tags.ITEM_ID have been deprecated and will be removed in a future version. Please use the atomic versions of these tags, like [<Tags.ITEM: 'item'>, <Tags.ID: 'id'>].
warnings.warn(

tests/unit/schema/test_schema_io.py::test_json_serialization_with_embedded_dicts
tests/unit/schema/test_schema_io.py::test_merlin_to_proto_to_json_to_merlin
tests/unit/schema/test_tags.py::test_tagset_atomizes_compound_tags
/var/jenkins_home/workspace/merlin_core/core/merlin/schema/tags.py:148: UserWarning: Compound tags like Tags.USER_ID have been deprecated and will be removed in a future version. Please use the atomic versions of these tags, like [<Tags.USER: 'user'>, <Tags.ID: 'id'>].
warnings.warn(

tests/unit/schema/test_tags.py::test_tagset_atomizes_compound_tags
/var/jenkins_home/workspace/merlin_core/core/merlin/schema/tags.py:148: UserWarning: Compound tags like Tags.SESSION_ID have been deprecated and will be removed in a future version. Please use the atomic versions of these tags, like [<Tags.SESSION: 'session'>, <Tags.ID: 'id'>].
warnings.warn(

tests/unit/schema/test_tags.py::test_tagset_atomizes_compound_tags
/var/jenkins_home/workspace/merlin_core/core/merlin/schema/tags.py:148: UserWarning: Compound tags like Tags.TEXT_TOKENIZED have been deprecated and will be removed in a future version. Please use the atomic versions of these tags, like [<Tags.TEXT: 'text'>, <Tags.TOKENIZED: 'tokenized'>].
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed[True-True]
/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 39891 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed[True-False]
/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 34709 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed[False-True]
/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 42293 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed[False-False]
/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 44505 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed_force[True]
/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 37431 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed_force[False]
/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 37553 instead
warnings.warn(

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html

---------- coverage: platform linux, python 3.8.10-final-0 -----------
Name Stmts Miss Cover

merlin/core/__init__.py 2 0 100%
merlin/core/_version.py 354 205 42%
merlin/core/compat.py 10 4 60%
merlin/core/dispatch.py 356 212 40%
merlin/core/protocols.py 99 45 55%
merlin/core/utils.py 195 56 71%
merlin/dag/__init__.py 5 0 100%
merlin/dag/base_operator.py 122 20 84%
merlin/dag/dictarray.py 55 15 73%
merlin/dag/executors.py 141 68 52%
merlin/dag/graph.py 99 35 65%
merlin/dag/node.py 344 161 53%
merlin/dag/ops/__init__.py 4 0 100%
merlin/dag/ops/concat_columns.py 17 4 76%
merlin/dag/ops/selection.py 22 0 100%
merlin/dag/ops/subset_columns.py 12 4 67%
merlin/dag/ops/subtraction.py 21 11 48%
merlin/dag/selector.py 101 6 94%
merlin/io/__init__.py 4 0 100%
merlin/io/avro.py 88 88 0%
merlin/io/csv.py 57 6 89%
merlin/io/dask.py 181 53 71%
merlin/io/dataframe_engine.py 61 5 92%
merlin/io/dataframe_iter.py 21 1 95%
merlin/io/dataset.py 347 54 84%
merlin/io/dataset_engine.py 37 8 78%
merlin/io/fsspec_utils.py 127 108 15%
merlin/io/hugectr.py 45 35 22%
merlin/io/parquet.py 603 69 89%
merlin/io/shuffle.py 38 12 68%
merlin/io/worker.py 80 66 18%
merlin/io/writer.py 190 52 73%
merlin/io/writer_factory.py 18 4 78%
merlin/schema/__init__.py 2 0 100%
merlin/schema/io/__init__.py 0 0 100%
merlin/schema/io/proto_utils.py 20 4 80%
merlin/schema/io/schema_bp.py 306 5 98%
merlin/schema/io/tensorflow_metadata.py 185 17 91%
merlin/schema/schema.py 220 33 85%
merlin/schema/tags.py 82 1 99%

TOTAL 4671 1467 69%

=========================== short test summary info ============================
SKIPPED [1] tests/unit/io/test_avro.py:34: could not import 'uavro': No module named 'uavro'
============ 391 passed, 1 skipped, 25 warnings in 64.47s (0:01:04) ============
___________________________________ summary ____________________________________
test-gpu: commands succeeded
congratulations :)
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA-Merlin/core/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[merlin_core] $ /bin/bash /tmp/jenkins16623248719987275115.sh

@oliverholworthy marked this pull request as ready for review November 18, 2022 12:24
@nvidia-merlin-bot

Click to view CI Results
GitHub pull request #111 of commit 3feb59a033fa3fc2da1b57c2b18a7dcf90dac3ee, no merge conflicts.
Running as SYSTEM
Setting status of 3feb59a033fa3fc2da1b57c2b18a7dcf90dac3ee to PENDING with url http://merlin-infra1.nvidia.com:8080/job/merlin_core/294/ and message: 'Pending'
Using context: Jenkins
Building on master in workspace /var/jenkins_home/workspace/merlin_core
using credential ce87ff3c-94f0-400a-8303-cb4acb4918b5
 > git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > git config remote.origin.url /~https://github.com/NVIDIA-Merlin/core # timeout=10
Fetching upstream changes from /~https://github.com/NVIDIA-Merlin/core
 > git --version # timeout=10
using GIT_ASKPASS to set credentials login for merlin-systems username and pass
 > git fetch --tags --force --progress -- /~https://github.com/NVIDIA-Merlin/core +refs/pull/111/*:refs/remotes/origin/pr/111/* # timeout=10
 > git rev-parse 3feb59a033fa3fc2da1b57c2b18a7dcf90dac3ee^{commit} # timeout=10
Checking out Revision 3feb59a033fa3fc2da1b57c2b18a7dcf90dac3ee (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 3feb59a033fa3fc2da1b57c2b18a7dcf90dac3ee # timeout=10
Commit message: "Update formatting"
 > git rev-list --no-walk 8c549eceef416090f082834354288c23a9b1094e # timeout=10
[merlin_core] $ /bin/bash /tmp/jenkins9919008783548112554.sh
GLOB sdist-make: /var/jenkins_home/workspace/merlin_core/core/setup.py
test-gpu recreate: /var/jenkins_home/workspace/merlin_core/core/.tox/test-gpu
test-gpu installdeps: pytest, pytest-cov
WARNING: Discarding $PYTHONPATH from environment, to override specify PYTHONPATH in 'passenv' in your configuration.
test-gpu inst: /var/jenkins_home/workspace/merlin_core/core/.tox/.tmp/package/1/merlin-core-0.8.0+25.g3feb59a.zip
WARNING: Discarding $PYTHONPATH from environment, to override specify PYTHONPATH in 'passenv' in your configuration.
test-gpu installed: absl-py==1.2.0,aiohttp==3.8.1,aiosignal==1.2.0,alabaster==0.7.12,alembic==1.8.1,anyio==3.6.1,argon2-cffi==21.3.0,argon2-cffi-bindings==21.2.0,astroid==2.5.6,asttokens==2.0.8,astunparse==1.6.3,asv==0.5.1,asvdb==0.4.2,async-timeout==4.0.2,attrs==22.1.0,autopage==0.5.1,awscli==1.27.12,Babel==2.10.3,backcall==0.2.0,beautifulsoup4==4.11.1,betterproto==1.2.5,black==22.6.0,bleach==5.0.1,boto3==1.24.75,botocore==1.29.12,Brotli==1.0.9,cachetools==5.2.0,certifi==2019.11.28,cffi==1.15.1,chardet==3.0.4,charset-normalizer==2.1.1,clang==5.0,click==8.1.3,cliff==4.0.0,cloudpickle==2.2.0,cmaes==0.9.0,cmake==3.24.1.1,cmd2==2.4.2,colorama==0.4.4,colorlog==6.7.0,contourpy==1.0.5,coverage==6.5.0,cuda-python==11.7.1,cupy-cuda117==10.6.0,cycler==0.11.0,Cython==0.29.32,dask==2022.1.1,dbus-python==1.2.16,debugpy==1.6.3,decorator==5.1.1,defusedxml==0.7.1,dill==0.3.5.1,distlib==0.3.6,distributed==2022.5.1,distro==1.7.0,dm-tree==0.1.6,docker-pycreds==0.4.0,docutils==0.16,emoji==1.7.0,entrypoints==0.4,execnet==1.9.0,executing==1.0.0,faiss==1.7.2,faiss-gpu==1.7.2,fastai==2.7.9,fastapi==0.85.0,fastavro==1.6.1,fastcore==1.5.27,fastdownload==0.0.7,fastjsonschema==2.16.1,fastprogress==1.0.3,fastrlock==0.8,feast==0.19.4,fiddle==0.2.2,filelock==3.8.0,flatbuffers==1.12,fonttools==4.37.3,frozenlist==1.3.1,fsspec==2022.5.0,gast==0.4.0,gevent==21.12.0,geventhttpclient==2.0.2,gitdb==4.0.9,GitPython==3.1.27,google==3.0.0,google-api-core==2.10.1,google-auth==2.11.1,google-auth-oauthlib==0.4.6,google-pasta==0.2.0,googleapis-common-protos==1.52.0,graphviz==0.20.1,greenlet==1.1.3,grpcio==1.41.0,grpcio-channelz==1.49.0,grpcio-reflection==1.48.1,grpclib==0.4.3,h11==0.13.0,h2==4.1.0,h5py==3.7.0,HeapDict==1.0.1,horovod==0.26.1,hpack==4.0.0,httptools==0.5.0,hugectr2onnx==0.0.0,huggingface-hub==0.9.1,hyperframe==6.0.1,idna==2.8,imagesize==1.4.1,implicit==0.6.1,importlib-metadata==4.12.0,importlib-resources==5.9.0,iniconfig==1.1.1,ipykernel==6.15.3,ipython==8.5.0,ipython-genutils==0.2.0,ipywidgets==7.7.0,jedi==0.18.1,Jinja2==3.1.2,jmespath==1.0.1,joblib==1.2.0,json5==0.9.10,jsonschema==4.16.0,jupyter-cache==0.4.3,jupyter-core==4.11.1,jupyter-server==1.18.1,jupyter-server-mathjax==0.2.5,jupyter-sphinx==0.3.2,jupyter_client==7.3.5,jupyterlab==3.4.7,jupyterlab-pygments==0.2.2,jupyterlab-widgets==1.1.0,jupyterlab_server==2.15.1,keras==2.9.0,Keras-Preprocessing==1.1.2,kiwisolver==1.4.4,lazy-object-proxy==1.8.0,libclang==14.0.6,libcst==0.4.7,lightfm==1.16,lightgbm==3.3.2,linkify-it-py==1.0.3,llvmlite==0.39.1,locket==1.0.0,lxml==4.9.1,Mako==1.2.4,Markdown==3.4.1,markdown-it-py==1.1.0,MarkupSafe==2.1.1,matplotlib==3.6.0,matplotlib-inline==0.1.6,mdit-py-plugins==0.2.8,merlin-core==0.8.0+25.g3feb59a,merlin-models==0.7.0+11.g280956aa4,merlin-systems==0.5.0+4.g15074ad,mistune==2.0.4,mmh3==3.0.0,mpi4py==3.1.3,msgpack==1.0.4,multidict==6.0.2,mypy-extensions==0.4.3,myst-nb==0.13.2,myst-parser==0.15.2,natsort==8.1.0,nbclassic==0.4.3,nbclient==0.6.8,nbconvert==7.0.0,nbdime==3.1.1,nbformat==5.5.0,nest-asyncio==1.5.5,ninja==1.10.2.3,notebook==6.4.12,notebook-shim==0.1.0,numba==0.56.2,numpy==1.22.4,nvidia-pyindex==1.0.9,# Editable install with no version control (nvtabular==1.4.0+8.g95e12d347),-e 
/usr/local/lib/python3.8/dist-packages,nvtx==0.2.5,oauthlib==3.2.1,oldest-supported-numpy==2022.8.16,onnx==1.12.0,onnxruntime==1.11.1,opt-einsum==3.3.0,optuna==3.0.3,packaging==21.3,pandas==1.3.5,pandavro==1.5.2,pandocfilters==1.5.0,parso==0.8.3,partd==1.3.0,pathtools==0.1.2,pbr==5.11.0,pexpect==4.8.0,pickleshare==0.7.5,Pillow==9.2.0,pkgutil_resolve_name==1.3.10,platformdirs==2.5.2,plotly==5.11.0,pluggy==1.0.0,prettytable==3.5.0,prometheus-client==0.14.1,promise==2.3,prompt-toolkit==3.0.31,proto-plus==1.19.6,protobuf==3.19.5,psutil==5.9.2,ptyprocess==0.7.0,pure-eval==0.2.2,py==1.11.0,pyarrow==7.0.0,pyasn1==0.4.8,pyasn1-modules==0.2.8,pybind11==2.10.0,pycparser==2.21,pydantic==1.10.2,pydot==1.4.2,Pygments==2.13.0,PyGObject==3.36.0,pynvml==11.4.1,pyparsing==3.0.9,pyperclip==1.8.2,pyrsistent==0.18.1,pytest==7.1.3,pytest-cov==4.0.0,pytest-xdist==3.0.2,python-apt==2.0.0+ubuntu0.20.4.8,python-dateutil==2.8.2,python-dotenv==0.21.0,python-rapidjson==1.8,pytz==2022.2.1,PyYAML==5.4.1,pyzmq==24.0.0,regex==2022.9.13,requests==2.22.0,requests-oauthlib==1.3.1,requests-unixsocket==0.2.0,rsa==4.7.2,s3fs==2022.2.0,s3transfer==0.6.0,sacremoses==0.0.53,scikit-build==0.15.0,scikit-learn==1.1.2,scipy==1.8.1,seedir==0.3.0,Send2Trash==1.8.0,sentry-sdk==1.9.8,setproctitle==1.3.2,setuptools-scm==7.0.5,shortuuid==1.0.9,six==1.15.0,sklearn==0.0,smmap==5.0.0,sniffio==1.3.0,snowballstemmer==2.2.0,sortedcontainers==2.4.0,soupsieve==2.3.2.post1,Sphinx==5.3.0,sphinx-multiversion==0.2.4,sphinx-togglebutton==0.3.1,sphinx_external_toc==0.3.0,sphinxcontrib-applehelp==1.0.2,sphinxcontrib-copydirs @ git+/~https://github.com/mikemckiernan/sphinxcontrib-copydirs.git@bd8c5d79b3f91cf5f1bb0d6995aeca3fe84b670e,sphinxcontrib-devhelp==1.0.2,sphinxcontrib-htmlhelp==2.0.0,sphinxcontrib-jsmath==1.0.1,sphinxcontrib-qthelp==1.0.3,sphinxcontrib-serializinghtml==1.1.5,SQLAlchemy==1.4.44,stack-data==0.5.0,starlette==0.20.4,stevedore==4.1.1,stringcase==1.2.0,supervisor==4.1.0,tabulate==0.8.10,tblib==1.7.0,tdqm==0.0.1,tenacity==8.0.1,tensorboard==2.9.1,tensorboard-data-server==0.6.1,tensorboard-plugin-wit==1.8.1,tensorflow==2.9.2,tensorflow-estimator==2.9.0,tensorflow-gpu==2.9.2,tensorflow-io-gcs-filesystem==0.27.0,tensorflow-metadata==1.10.0,termcolor==2.0.1,terminado==0.15.0,testbook==0.4.2,threadpoolctl==3.1.0,tinycss2==1.1.1,tokenizers==0.10.3,toml==0.10.2,tomli==2.0.1,toolz==0.12.0,torch==1.12.1+cu113,torchmetrics==0.3.2,tornado==6.2,tox==3.26.0,tqdm==4.64.1,traitlets==5.4.0,transformers==4.12.0,transformers4rec==0.1.12+2.gbcc939255,treelite==2.3.0,treelite-runtime==2.3.0,tritonclient==2.25.0,typing-inspect==0.8.0,typing_extensions==4.3.0,uc-micro-py==1.0.1,urllib3==1.26.12,uvicorn==0.18.3,uvloop==0.17.0,versioneer==0.20,virtualenv==20.16.5,wandb==0.13.3,watchfiles==0.17.0,wcwidth==0.2.5,webencodings==0.5.1,websocket-client==1.4.1,websockets==10.3,Werkzeug==2.2.2,widgetsnbextension==3.6.0,wrapt==1.12.1,xgboost==1.6.2,yarl==1.8.1,zict==2.2.0,zipp==3.8.1,zope.event==4.5.0,zope.interface==5.4.0
test-gpu run-test-pre: PYTHONHASHSEED='3136074863'
test-gpu run-test: commands[0] | python -m pytest --cov-report term --cov merlin -rxs tests/unit
============================= test session starts ==============================
platform linux -- Python 3.8.10, pytest-7.1.3, pluggy-1.0.0
cachedir: .tox/test-gpu/.pytest_cache
rootdir: /var/jenkins_home/workspace/merlin_core/core, configfile: pyproject.toml
plugins: anyio-3.5.0, xdist-3.0.2, cov-4.0.0
collected 391 items / 1 skipped

tests/unit/core/test_dispatch.py .... [ 1%]
tests/unit/core/test_protocols.py ......... [ 3%]
tests/unit/core/test_version.py . [ 3%]
tests/unit/dag/test_base_operator.py .... [ 4%]
tests/unit/dag/test_column_selector.py .............................. [ 12%]
tests/unit/dag/test_dictarray.py ... [ 13%]
tests/unit/dag/test_executors.py .. [ 13%]
tests/unit/dag/test_graph.py .... [ 14%]
tests/unit/dag/ops/test_selection.py .... [ 15%]
tests/unit/io/test_io.py ............................................... [ 27%]
................................................................ [ 43%]
tests/unit/schema/test_column_schemas.py ........................ [ 50%]
tests/unit/schema/test_schema.py ............. [ 53%]
tests/unit/schema/test_schema_io.py .................................... [ 62%]
........................................................................ [ 81%]
........................................................... [ 96%]
tests/unit/schema/test_tags.py ....... [ 97%]
tests/unit/utils/test_utils.py ........ [100%]

=============================== warnings summary ===============================
tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 6 files. Did not have enough partitions to create 7 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 8 files. Did not have enough partitions to create 9 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 8 files. Did not have enough partitions to create 10 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 8 files. Did not have enough partitions to create 11 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 12 files. Did not have enough partitions to create 13 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 12 files. Did not have enough partitions to create 14 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 12 files. Did not have enough partitions to create 15 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 12 files. Did not have enough partitions to create 16 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 12 files. Did not have enough partitions to create 17 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 12 files. Did not have enough partitions to create 18 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 12 files. Did not have enough partitions to create 19 files.
warnings.warn(

tests/unit/io/test_io.py::test_validate_and_regenerate_dataset
/var/jenkins_home/workspace/merlin_core/core/merlin/io/parquet.py:551: DeprecationWarning: 'ParquetDataset.pieces' attribute is deprecated as of pyarrow 5.0.0 and will be removed in a future version. Specify 'use_legacy_dataset=False' while constructing the ParquetDataset, and then use the '.fragments' attribute instead.
paths = [p.path for p in pa_dataset.pieces]

tests/unit/schema/test_column_schemas.py::test_column_schema_tags_normalize
tests/unit/schema/test_tags.py::test_tagset_atomizes_compound_tags
/var/jenkins_home/workspace/merlin_core/core/merlin/schema/tags.py:148: UserWarning: Compound tags like Tags.ITEM_ID have been deprecated and will be removed in a future version. Please use the atomic versions of these tags, like [<Tags.ITEM: 'item'>, <Tags.ID: 'id'>].
warnings.warn(

tests/unit/schema/test_schema_io.py::test_json_serialization_with_embedded_dicts
tests/unit/schema/test_schema_io.py::test_merlin_to_proto_to_json_to_merlin
tests/unit/schema/test_tags.py::test_tagset_atomizes_compound_tags
/var/jenkins_home/workspace/merlin_core/core/merlin/schema/tags.py:148: UserWarning: Compound tags like Tags.USER_ID have been deprecated and will be removed in a future version. Please use the atomic versions of these tags, like [<Tags.USER: 'user'>, <Tags.ID: 'id'>].
warnings.warn(

tests/unit/schema/test_tags.py::test_tagset_atomizes_compound_tags
/var/jenkins_home/workspace/merlin_core/core/merlin/schema/tags.py:148: UserWarning: Compound tags like Tags.SESSION_ID have been deprecated and will be removed in a future version. Please use the atomic versions of these tags, like [<Tags.SESSION: 'session'>, <Tags.ID: 'id'>].
warnings.warn(

tests/unit/schema/test_tags.py::test_tagset_atomizes_compound_tags
/var/jenkins_home/workspace/merlin_core/core/merlin/schema/tags.py:148: UserWarning: Compound tags like Tags.TEXT_TOKENIZED have been deprecated and will be removed in a future version. Please use the atomic versions of these tags, like [<Tags.TEXT: 'text'>, <Tags.TOKENIZED: 'tokenized'>].
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed[True-True]
/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 41305 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed[True-False]
/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 33947 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed[False-True]
/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 38161 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed[False-False]
/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 45481 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed_force[True]
/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 46591 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed_force[False]
/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 43821 instead
warnings.warn(

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html

---------- coverage: platform linux, python 3.8.10-final-0 -----------
Name Stmts Miss Cover

merlin/core/__init__.py 2 0 100%
merlin/core/_version.py 354 205 42%
merlin/core/compat.py 10 4 60%
merlin/core/dispatch.py 356 212 40%
merlin/core/protocols.py 99 45 55%
merlin/core/utils.py 195 56 71%
merlin/dag/__init__.py 5 0 100%
merlin/dag/base_operator.py 122 20 84%
merlin/dag/dictarray.py 55 15 73%
merlin/dag/executors.py 141 68 52%
merlin/dag/graph.py 99 35 65%
merlin/dag/node.py 344 161 53%
merlin/dag/ops/__init__.py 4 0 100%
merlin/dag/ops/concat_columns.py 17 4 76%
merlin/dag/ops/selection.py 22 0 100%
merlin/dag/ops/subset_columns.py 12 4 67%
merlin/dag/ops/subtraction.py 21 11 48%
merlin/dag/selector.py 101 6 94%
merlin/io/__init__.py 4 0 100%
merlin/io/avro.py 88 88 0%
merlin/io/csv.py 57 6 89%
merlin/io/dask.py 181 53 71%
merlin/io/dataframe_engine.py 61 5 92%
merlin/io/dataframe_iter.py 21 1 95%
merlin/io/dataset.py 347 54 84%
merlin/io/dataset_engine.py 37 8 78%
merlin/io/fsspec_utils.py 127 108 15%
merlin/io/hugectr.py 45 35 22%
merlin/io/parquet.py 603 69 89%
merlin/io/shuffle.py 38 12 68%
merlin/io/worker.py 80 66 18%
merlin/io/writer.py 190 52 73%
merlin/io/writer_factory.py 18 4 78%
merlin/schema/__init__.py 2 0 100%
merlin/schema/io/__init__.py 0 0 100%
merlin/schema/io/proto_utils.py 20 4 80%
merlin/schema/io/schema_bp.py 306 5 98%
merlin/schema/io/tensorflow_metadata.py 185 17 91%
merlin/schema/schema.py 220 33 85%
merlin/schema/tags.py 82 1 99%

TOTAL 4671 1467 69%

=========================== short test summary info ============================
SKIPPED [1] tests/unit/io/test_avro.py:34: could not import 'uavro': No module named 'uavro'
============ 391 passed, 1 skipped, 25 warnings in 64.02s (0:01:04) ============
___________________________________ summary ____________________________________
test-gpu: commands succeeded
congratulations :)
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA-Merlin/core/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[merlin_core] $ /bin/bash /tmp/jenkins11761895414974473889.sh

@github-actions

Documentation preview

https://nvidia-merlin.github.io/core/review/pr-111

@oliverholworthy
Member Author

This PR is a replacement for #169, since it removes the line being changed in that PR. #169 could still be merged, but that isn't strictly required if this one is.

@nvidia-merlin-bot

Click to view CI Results
GitHub pull request #111 of commit b89438dbfbabcca48e3c2091194796b90a23d3e1, no merge conflicts.
Running as SYSTEM
Setting status of b89438dbfbabcca48e3c2091194796b90a23d3e1 to PENDING with url http://merlin-infra1.nvidia.com:8080/job/merlin_core/296/ and message: 'Pending'
Using context: Jenkins
Building on master in workspace /var/jenkins_home/workspace/merlin_core
using credential ce87ff3c-94f0-400a-8303-cb4acb4918b5
 > git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > git config remote.origin.url /~https://github.com/NVIDIA-Merlin/core # timeout=10
Fetching upstream changes from /~https://github.com/NVIDIA-Merlin/core
 > git --version # timeout=10
using GIT_ASKPASS to set credentials login for merlin-systems username and pass
 > git fetch --tags --force --progress -- /~https://github.com/NVIDIA-Merlin/core +refs/pull/111/*:refs/remotes/origin/pr/111/* # timeout=10
 > git rev-parse b89438dbfbabcca48e3c2091194796b90a23d3e1^{commit} # timeout=10
Checking out Revision b89438dbfbabcca48e3c2091194796b90a23d3e1 (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f b89438dbfbabcca48e3c2091194796b90a23d3e1 # timeout=10
Commit message: "Merge branch 'main' into schema-io-value-count"
 > git rev-list --no-walk 1cef02f30a7eb98c3d47c20365785717b952c92f # timeout=10
[merlin_core] $ /bin/bash /tmp/jenkins16288681192185688023.sh
GLOB sdist-make: /var/jenkins_home/workspace/merlin_core/core/setup.py
test-gpu recreate: /var/jenkins_home/workspace/merlin_core/core/.tox/test-gpu
test-gpu installdeps: pytest, pytest-cov
WARNING: Discarding $PYTHONPATH from environment, to override specify PYTHONPATH in 'passenv' in your configuration.
test-gpu inst: /var/jenkins_home/workspace/merlin_core/core/.tox/.tmp/package/1/merlin-core-0.8.0+27.gb89438d.zip
WARNING: Discarding $PYTHONPATH from environment, to override specify PYTHONPATH in 'passenv' in your configuration.
test-gpu installed: absl-py==1.2.0,aiohttp==3.8.1,aiosignal==1.2.0,alabaster==0.7.12,alembic==1.8.1,anyio==3.6.1,argon2-cffi==21.3.0,argon2-cffi-bindings==21.2.0,astroid==2.5.6,asttokens==2.0.8,astunparse==1.6.3,asv==0.5.1,asvdb==0.4.2,async-timeout==4.0.2,attrs==22.1.0,autopage==0.5.1,awscli==1.27.12,Babel==2.10.3,backcall==0.2.0,beautifulsoup4==4.11.1,betterproto==1.2.5,black==22.6.0,bleach==5.0.1,boto3==1.24.75,botocore==1.29.12,Brotli==1.0.9,cachetools==5.2.0,certifi==2019.11.28,cffi==1.15.1,chardet==3.0.4,charset-normalizer==2.1.1,clang==5.0,click==8.1.3,cliff==4.0.0,cloudpickle==2.2.0,cmaes==0.9.0,cmake==3.24.1.1,cmd2==2.4.2,colorama==0.4.4,colorlog==6.7.0,contourpy==1.0.5,coverage==6.5.0,cuda-python==11.7.1,cupy-cuda117==10.6.0,cycler==0.11.0,Cython==0.29.32,dask==2022.5.1,dbus-python==1.2.16,debugpy==1.6.3,decorator==5.1.1,defusedxml==0.7.1,dill==0.3.5.1,distlib==0.3.6,distributed==2022.5.1,distro==1.7.0,dm-tree==0.1.6,docker-pycreds==0.4.0,docutils==0.16,emoji==1.7.0,entrypoints==0.4,execnet==1.9.0,executing==1.0.0,faiss==1.7.2,faiss-gpu==1.7.2,fastai==2.7.9,fastapi==0.85.0,fastavro==1.6.1,fastcore==1.5.27,fastdownload==0.0.7,fastjsonschema==2.16.1,fastprogress==1.0.3,fastrlock==0.8,feast==0.19.4,fiddle==0.2.2,filelock==3.8.0,flatbuffers==1.12,fonttools==4.37.3,frozenlist==1.3.1,fsspec==2022.5.0,gast==0.4.0,gevent==21.12.0,geventhttpclient==2.0.2,gitdb==4.0.9,GitPython==3.1.27,google==3.0.0,google-api-core==2.10.1,google-auth==2.11.1,google-auth-oauthlib==0.4.6,google-pasta==0.2.0,googleapis-common-protos==1.52.0,graphviz==0.20.1,greenlet==1.1.3,grpcio==1.41.0,grpcio-channelz==1.49.0,grpcio-reflection==1.48.1,grpclib==0.4.3,h11==0.13.0,h2==4.1.0,h5py==3.7.0,HeapDict==1.0.1,horovod==0.26.1,hpack==4.0.0,httptools==0.5.0,hugectr2onnx==0.0.0,huggingface-hub==0.9.1,hyperframe==6.0.1,idna==2.8,imagesize==1.4.1,implicit==0.6.1,importlib-metadata==4.12.0,importlib-resources==5.9.0,iniconfig==1.1.1,ipykernel==6.15.3,ipython==8.5.0,ipython-genutils==0.2.0,ipywidgets==7.7.0,jedi==0.18.1,Jinja2==3.1.2,jmespath==1.0.1,joblib==1.2.0,json5==0.9.10,jsonschema==4.16.0,jupyter-cache==0.4.3,jupyter-core==4.11.1,jupyter-server==1.18.1,jupyter-server-mathjax==0.2.5,jupyter-sphinx==0.3.2,jupyter_client==7.3.5,jupyterlab==3.4.7,jupyterlab-pygments==0.2.2,jupyterlab-widgets==1.1.0,jupyterlab_server==2.15.1,keras==2.9.0,Keras-Preprocessing==1.1.2,kiwisolver==1.4.4,lazy-object-proxy==1.8.0,libclang==14.0.6,libcst==0.4.7,lightfm==1.16,lightgbm==3.3.2,linkify-it-py==1.0.3,llvmlite==0.39.1,locket==1.0.0,lxml==4.9.1,Mako==1.2.4,Markdown==3.4.1,markdown-it-py==1.1.0,MarkupSafe==2.1.1,matplotlib==3.6.0,matplotlib-inline==0.1.6,mdit-py-plugins==0.2.8,merlin-core==0.8.0+27.gb89438d,merlin-dataloader==0.0.2,merlin-models==0.7.0+11.g280956aa4,merlin-systems==0.5.0+4.g15074ad,mistune==2.0.4,mmh3==3.0.0,mpi4py==3.1.3,msgpack==1.0.4,multidict==6.0.2,mypy-extensions==0.4.3,myst-nb==0.13.2,myst-parser==0.15.2,natsort==8.1.0,nbclassic==0.4.3,nbclient==0.6.8,nbconvert==7.0.0,nbdime==3.1.1,nbformat==5.5.0,nest-asyncio==1.5.5,ninja==1.10.2.3,notebook==6.4.12,notebook-shim==0.1.0,numba==0.56.2,numpy==1.22.4,nvidia-pyindex==1.0.9,nvtabular @ 
git+/~https://github.com/NVIDIA-Merlin/NVTabular.git@21117cfc4c113b30036afcb97b6daa5f377996db,nvtx==0.2.5,oauthlib==3.2.1,oldest-supported-numpy==2022.8.16,onnx==1.12.0,onnxruntime==1.11.1,opt-einsum==3.3.0,optuna==3.0.3,packaging==21.3,pandas==1.3.5,pandavro==1.5.2,pandocfilters==1.5.0,parso==0.8.3,partd==1.3.0,pathtools==0.1.2,pbr==5.11.0,pexpect==4.8.0,pickleshare==0.7.5,Pillow==9.2.0,pkgutil_resolve_name==1.3.10,platformdirs==2.5.2,plotly==5.11.0,pluggy==1.0.0,prettytable==3.5.0,prometheus-client==0.14.1,promise==2.3,prompt-toolkit==3.0.31,proto-plus==1.19.6,protobuf==3.19.5,psutil==5.9.2,ptyprocess==0.7.0,pure-eval==0.2.2,py==1.11.0,pyarrow==7.0.0,pyasn1==0.4.8,pyasn1-modules==0.2.8,pybind11==2.10.0,pycparser==2.21,pydantic==1.10.2,pydot==1.4.2,Pygments==2.13.0,PyGObject==3.36.0,pynvml==11.4.1,pyparsing==3.0.9,pyperclip==1.8.2,pyrsistent==0.18.1,pytest==7.1.3,pytest-cov==4.0.0,pytest-xdist==3.0.2,python-apt==2.0.0+ubuntu0.20.4.8,python-dateutil==2.8.2,python-dotenv==0.21.0,python-rapidjson==1.8,pytz==2022.2.1,PyYAML==5.4.1,pyzmq==24.0.0,regex==2022.9.13,requests==2.22.0,requests-oauthlib==1.3.1,requests-unixsocket==0.2.0,rsa==4.7.2,s3fs==2022.2.0,s3transfer==0.6.0,sacremoses==0.0.53,scikit-build==0.15.0,scikit-learn==1.1.2,scipy==1.8.1,seedir==0.3.0,Send2Trash==1.8.0,sentry-sdk==1.9.8,setproctitle==1.3.2,setuptools-scm==7.0.5,shortuuid==1.0.9,six==1.15.0,sklearn==0.0,smmap==5.0.0,sniffio==1.3.0,snowballstemmer==2.2.0,sortedcontainers==2.4.0,soupsieve==2.3.2.post1,Sphinx==5.3.0,sphinx-multiversion==0.2.4,sphinx-togglebutton==0.3.1,sphinx_external_toc==0.3.0,sphinxcontrib-applehelp==1.0.2,sphinxcontrib-copydirs @ git+/~https://github.com/mikemckiernan/sphinxcontrib-copydirs.git@bd8c5d79b3f91cf5f1bb0d6995aeca3fe84b670e,sphinxcontrib-devhelp==1.0.2,sphinxcontrib-htmlhelp==2.0.0,sphinxcontrib-jsmath==1.0.1,sphinxcontrib-qthelp==1.0.3,sphinxcontrib-serializinghtml==1.1.5,SQLAlchemy==1.4.44,stack-data==0.5.0,starlette==0.20.4,stevedore==4.1.1,stringcase==1.2.0,supervisor==4.1.0,tabulate==0.8.10,tblib==1.7.0,tdqm==0.0.1,tenacity==8.0.1,tensorboard==2.9.1,tensorboard-data-server==0.6.1,tensorboard-plugin-wit==1.8.1,tensorflow==2.9.2,tensorflow-estimator==2.9.0,tensorflow-gpu==2.9.2,tensorflow-io-gcs-filesystem==0.27.0,tensorflow-metadata==1.10.0,termcolor==2.0.1,terminado==0.15.0,testbook==0.4.2,threadpoolctl==3.1.0,tinycss2==1.1.1,tokenizers==0.10.3,toml==0.10.2,tomli==2.0.1,toolz==0.12.0,torch==1.12.1+cu113,torchmetrics==0.3.2,tornado==6.2,tox==3.26.0,tqdm==4.64.1,traitlets==5.4.0,transformers==4.12.0,transformers4rec==0.1.12+2.gbcc939255,treelite==2.3.0,treelite-runtime==2.3.0,tritonclient==2.25.0,typing-inspect==0.8.0,typing_extensions==4.3.0,uc-micro-py==1.0.1,urllib3==1.26.12,uvicorn==0.18.3,uvloop==0.17.0,versioneer==0.20,virtualenv==20.16.5,wandb==0.13.3,watchfiles==0.17.0,wcwidth==0.2.5,webencodings==0.5.1,websocket-client==1.4.1,websockets==10.3,Werkzeug==2.2.2,widgetsnbextension==3.6.0,wrapt==1.12.1,xgboost==1.6.2,yarl==1.8.1,zict==2.2.0,zipp==3.8.1,zope.event==4.5.0,zope.interface==5.4.0
test-gpu run-test-pre: PYTHONHASHSEED='1669387843'
test-gpu run-test: commands[0] | python -m pytest --cov-report term --cov merlin -rxs tests/unit
============================= test session starts ==============================
platform linux -- Python 3.8.10, pytest-7.1.3, pluggy-1.0.0
cachedir: .tox/test-gpu/.pytest_cache
rootdir: /var/jenkins_home/workspace/merlin_core/core, configfile: pyproject.toml
plugins: anyio-3.5.0, xdist-3.0.2, cov-4.0.0
collected 391 items / 1 skipped

tests/unit/core/test_dispatch.py .... [ 1%]
tests/unit/core/test_protocols.py ......... [ 3%]
tests/unit/core/test_version.py . [ 3%]
tests/unit/dag/test_base_operator.py .... [ 4%]
tests/unit/dag/test_column_selector.py .............................. [ 12%]
tests/unit/dag/test_dictarray.py ... [ 13%]
tests/unit/dag/test_executors.py .. [ 13%]
tests/unit/dag/test_graph.py .... [ 14%]
tests/unit/dag/ops/test_selection.py .... [ 15%]
tests/unit/io/test_io.py ............................................... [ 27%]
................................................................ [ 43%]
tests/unit/schema/test_column_schemas.py ........................ [ 50%]
tests/unit/schema/test_schema.py ............. [ 53%]
tests/unit/schema/test_schema_io.py .................................... [ 62%]
........................................................................ [ 81%]
........................................................... [ 96%]
tests/unit/schema/test_tags.py ....... [ 97%]
tests/unit/utils/test_utils.py ........ [100%]

=============================== warnings summary ===============================
tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 6 files. Did not have enough partitions to create 7 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 8 files. Did not have enough partitions to create 9 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 8 files. Did not have enough partitions to create 10 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 8 files. Did not have enough partitions to create 11 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 12 files. Did not have enough partitions to create 13 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 12 files. Did not have enough partitions to create 14 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 12 files. Did not have enough partitions to create 15 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 12 files. Did not have enough partitions to create 16 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 12 files. Did not have enough partitions to create 17 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 12 files. Did not have enough partitions to create 18 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 12 files. Did not have enough partitions to create 19 files.
warnings.warn(

tests/unit/io/test_io.py::test_validate_and_regenerate_dataset
/var/jenkins_home/workspace/merlin_core/core/merlin/io/parquet.py:551: DeprecationWarning: 'ParquetDataset.pieces' attribute is deprecated as of pyarrow 5.0.0 and will be removed in a future version. Specify 'use_legacy_dataset=False' while constructing the ParquetDataset, and then use the '.fragments' attribute instead.
paths = [p.path for p in pa_dataset.pieces]

tests/unit/schema/test_column_schemas.py::test_column_schema_tags_normalize
tests/unit/schema/test_tags.py::test_tagset_atomizes_compound_tags
/var/jenkins_home/workspace/merlin_core/core/merlin/schema/tags.py:148: UserWarning: Compound tags like Tags.ITEM_ID have been deprecated and will be removed in a future version. Please use the atomic versions of these tags, like [<Tags.ITEM: 'item'>, <Tags.ID: 'id'>].
warnings.warn(

tests/unit/schema/test_schema_io.py::test_json_serialization_with_embedded_dicts
tests/unit/schema/test_schema_io.py::test_merlin_to_proto_to_json_to_merlin
tests/unit/schema/test_tags.py::test_tagset_atomizes_compound_tags
/var/jenkins_home/workspace/merlin_core/core/merlin/schema/tags.py:148: UserWarning: Compound tags like Tags.USER_ID have been deprecated and will be removed in a future version. Please use the atomic versions of these tags, like [<Tags.USER: 'user'>, <Tags.ID: 'id'>].
warnings.warn(

tests/unit/schema/test_tags.py::test_tagset_atomizes_compound_tags
/var/jenkins_home/workspace/merlin_core/core/merlin/schema/tags.py:148: UserWarning: Compound tags like Tags.SESSION_ID have been deprecated and will be removed in a future version. Please use the atomic versions of these tags, like [<Tags.SESSION: 'session'>, <Tags.ID: 'id'>].
warnings.warn(

tests/unit/schema/test_tags.py::test_tagset_atomizes_compound_tags
/var/jenkins_home/workspace/merlin_core/core/merlin/schema/tags.py:148: UserWarning: Compound tags like Tags.TEXT_TOKENIZED have been deprecated and will be removed in a future version. Please use the atomic versions of these tags, like [<Tags.TEXT: 'text'>, <Tags.TOKENIZED: 'tokenized'>].
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed[True-True]
/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 43481 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed[True-False]
/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 40769 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed[False-True]
/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 33277 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed[False-False]
/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 41429 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed_force[True]
/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 37233 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed_force[False]
/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 41637 instead
warnings.warn(

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html

---------- coverage: platform linux, python 3.8.10-final-0 -----------
Name Stmts Miss Cover

merlin/core/__init__.py 2 0 100%
merlin/core/_version.py 354 205 42%
merlin/core/compat.py 10 4 60%
merlin/core/dispatch.py 356 212 40%
merlin/core/protocols.py 99 45 55%
merlin/core/utils.py 195 56 71%
merlin/dag/__init__.py 5 0 100%
merlin/dag/base_operator.py 122 20 84%
merlin/dag/dictarray.py 55 15 73%
merlin/dag/executors.py 141 68 52%
merlin/dag/graph.py 99 35 65%
merlin/dag/node.py 344 161 53%
merlin/dag/ops/__init__.py 4 0 100%
merlin/dag/ops/concat_columns.py 17 4 76%
merlin/dag/ops/selection.py 22 0 100%
merlin/dag/ops/subset_columns.py 12 4 67%
merlin/dag/ops/subtraction.py 21 11 48%
merlin/dag/selector.py 101 6 94%
merlin/io/__init__.py 4 0 100%
merlin/io/avro.py 88 88 0%
merlin/io/csv.py 57 6 89%
merlin/io/dask.py 181 53 71%
merlin/io/dataframe_engine.py 61 5 92%
merlin/io/dataframe_iter.py 21 1 95%
merlin/io/dataset.py 347 54 84%
merlin/io/dataset_engine.py 37 8 78%
merlin/io/fsspec_utils.py 127 108 15%
merlin/io/hugectr.py 45 35 22%
merlin/io/parquet.py 603 69 89%
merlin/io/shuffle.py 38 12 68%
merlin/io/worker.py 80 66 18%
merlin/io/writer.py 190 52 73%
merlin/io/writer_factory.py 18 4 78%
merlin/schema/__init__.py 2 0 100%
merlin/schema/io/__init__.py 0 0 100%
merlin/schema/io/proto_utils.py 20 4 80%
merlin/schema/io/schema_bp.py 306 5 98%
merlin/schema/io/tensorflow_metadata.py 185 17 91%
merlin/schema/schema.py 220 33 85%
merlin/schema/tags.py 82 1 99%

TOTAL 4671 1467 69%

=========================== short test summary info ============================
SKIPPED [1] tests/unit/io/test_avro.py:34: could not import 'uavro': No module named 'uavro'
============ 391 passed, 1 skipped, 25 warnings in 65.24s (0:01:05) ============
___________________________________ summary ____________________________________
test-gpu: commands succeeded
congratulations :)
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA-Merlin/core/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[merlin_core] $ /bin/bash /tmp/jenkins3869258974199238942.sh

@nvidia-merlin-bot

Click to view CI Results
GitHub pull request #111 of commit 01010c52e3d54f7e9f01c49b140d36b96949a23c, no merge conflicts.
Running as SYSTEM
Setting status of 01010c52e3d54f7e9f01c49b140d36b96949a23c to PENDING with url http://merlin-infra1.nvidia.com:8080/job/merlin_core/298/ and message: 'Pending'
Using context: Jenkins
Building on master in workspace /var/jenkins_home/workspace/merlin_core
using credential ce87ff3c-94f0-400a-8303-cb4acb4918b5
 > git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > git config remote.origin.url /~https://github.com/NVIDIA-Merlin/core # timeout=10
Fetching upstream changes from /~https://github.com/NVIDIA-Merlin/core
 > git --version # timeout=10
using GIT_ASKPASS to set credentials login for merlin-systems username and pass
 > git fetch --tags --force --progress -- /~https://github.com/NVIDIA-Merlin/core +refs/pull/111/*:refs/remotes/origin/pr/111/* # timeout=10
 > git rev-parse 01010c52e3d54f7e9f01c49b140d36b96949a23c^{commit} # timeout=10
Checking out Revision 01010c52e3d54f7e9f01c49b140d36b96949a23c (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 01010c52e3d54f7e9f01c49b140d36b96949a23c # timeout=10
Commit message: "Merge branch 'main' into schema-io-value-count"
 > git rev-list --no-walk e4045ea065917f972fe9766d1cf208e16a2eda29 # timeout=10
[merlin_core] $ /bin/bash /tmp/jenkins6196049096676985959.sh
GLOB sdist-make: /var/jenkins_home/workspace/merlin_core/core/setup.py
test-gpu recreate: /var/jenkins_home/workspace/merlin_core/core/.tox/test-gpu
test-gpu installdeps: pytest, pytest-cov
WARNING: Discarding $PYTHONPATH from environment, to override specify PYTHONPATH in 'passenv' in your configuration.
test-gpu inst: /var/jenkins_home/workspace/merlin_core/core/.tox/.tmp/package/1/merlin-core-0.8.0+29.g01010c5.zip
WARNING: Discarding $PYTHONPATH from environment, to override specify PYTHONPATH in 'passenv' in your configuration.
test-gpu installed: absl-py==1.2.0,aiohttp==3.8.1,aiosignal==1.2.0,alabaster==0.7.12,alembic==1.8.1,anyio==3.6.1,argon2-cffi==21.3.0,argon2-cffi-bindings==21.2.0,astroid==2.5.6,asttokens==2.0.8,astunparse==1.6.3,asv==0.5.1,asvdb==0.4.2,async-timeout==4.0.2,attrs==22.1.0,autopage==0.5.1,awscli==1.27.12,Babel==2.10.3,backcall==0.2.0,beautifulsoup4==4.11.1,betterproto==1.2.5,black==22.6.0,bleach==5.0.1,boto3==1.24.75,botocore==1.29.12,Brotli==1.0.9,cachetools==5.2.0,certifi==2019.11.28,cffi==1.15.1,chardet==3.0.4,charset-normalizer==2.1.1,clang==5.0,click==8.1.3,cliff==4.0.0,cloudpickle==2.2.0,cmaes==0.9.0,cmake==3.24.1.1,cmd2==2.4.2,colorama==0.4.4,colorlog==6.7.0,contourpy==1.0.5,coverage==6.5.0,cuda-python==11.7.1,cupy-cuda117==10.6.0,cycler==0.11.0,Cython==0.29.32,dask==2022.5.1,dbus-python==1.2.16,debugpy==1.6.3,decorator==5.1.1,defusedxml==0.7.1,dill==0.3.5.1,distlib==0.3.6,distributed==2022.5.1,distro==1.7.0,dm-tree==0.1.6,docker-pycreds==0.4.0,docutils==0.16,emoji==1.7.0,entrypoints==0.4,execnet==1.9.0,executing==1.0.0,faiss==1.7.2,faiss-gpu==1.7.2,fastai==2.7.9,fastapi==0.85.0,fastavro==1.6.1,fastcore==1.5.27,fastdownload==0.0.7,fastjsonschema==2.16.1,fastprogress==1.0.3,fastrlock==0.8,feast==0.19.4,fiddle==0.2.2,filelock==3.8.0,flatbuffers==1.12,fonttools==4.37.3,frozenlist==1.3.1,fsspec==2022.5.0,gast==0.4.0,gevent==21.12.0,geventhttpclient==2.0.2,gitdb==4.0.9,GitPython==3.1.27,google==3.0.0,google-api-core==2.10.1,google-auth==2.11.1,google-auth-oauthlib==0.4.6,google-pasta==0.2.0,googleapis-common-protos==1.52.0,graphviz==0.20.1,greenlet==1.1.3,grpcio==1.41.0,grpcio-channelz==1.49.0,grpcio-reflection==1.48.1,grpclib==0.4.3,h11==0.13.0,h2==4.1.0,h5py==3.7.0,HeapDict==1.0.1,horovod==0.26.1,hpack==4.0.0,httptools==0.5.0,hugectr2onnx==0.0.0,huggingface-hub==0.9.1,hyperframe==6.0.1,idna==2.8,imagesize==1.4.1,implicit==0.6.1,importlib-metadata==4.12.0,importlib-resources==5.9.0,iniconfig==1.1.1,ipykernel==6.15.3,ipython==8.5.0,ipython-genutils==0.2.0,ipywidgets==7.7.0,jedi==0.18.1,Jinja2==3.1.2,jmespath==1.0.1,joblib==1.2.0,json5==0.9.10,jsonschema==4.16.0,jupyter-cache==0.4.3,jupyter-core==4.11.1,jupyter-server==1.18.1,jupyter-server-mathjax==0.2.5,jupyter-sphinx==0.3.2,jupyter_client==7.3.5,jupyterlab==3.4.7,jupyterlab-pygments==0.2.2,jupyterlab-widgets==1.1.0,jupyterlab_server==2.15.1,keras==2.9.0,Keras-Preprocessing==1.1.2,kiwisolver==1.4.4,lazy-object-proxy==1.8.0,libclang==14.0.6,libcst==0.4.7,lightfm==1.16,lightgbm==3.3.2,linkify-it-py==1.0.3,llvmlite==0.39.1,locket==1.0.0,lxml==4.9.1,Mako==1.2.4,Markdown==3.4.1,markdown-it-py==1.1.0,MarkupSafe==2.1.1,matplotlib==3.6.0,matplotlib-inline==0.1.6,mdit-py-plugins==0.2.8,merlin-core==0.8.0+29.g01010c5,merlin-dataloader==0.0.2,merlin-models==0.7.0+11.g280956aa4,merlin-systems==0.5.0+4.g15074ad,mistune==2.0.4,mmh3==3.0.0,mpi4py==3.1.3,msgpack==1.0.4,multidict==6.0.2,mypy-extensions==0.4.3,myst-nb==0.13.2,myst-parser==0.15.2,natsort==8.1.0,nbclassic==0.4.3,nbclient==0.6.8,nbconvert==7.0.0,nbdime==3.1.1,nbformat==5.5.0,nest-asyncio==1.5.5,ninja==1.10.2.3,notebook==6.4.12,notebook-shim==0.1.0,numba==0.56.2,numpy==1.22.4,nvidia-pyindex==1.0.9,nvtabular @ 
git+/~https://github.com/NVIDIA-Merlin/NVTabular.git@21117cfc4c113b30036afcb97b6daa5f377996db,nvtx==0.2.5,oauthlib==3.2.1,oldest-supported-numpy==2022.8.16,onnx==1.12.0,onnxruntime==1.11.1,opt-einsum==3.3.0,optuna==3.0.3,packaging==21.3,pandas==1.3.5,pandavro==1.5.2,pandocfilters==1.5.0,parso==0.8.3,partd==1.3.0,pathtools==0.1.2,pbr==5.11.0,pexpect==4.8.0,pickleshare==0.7.5,Pillow==9.2.0,pkgutil_resolve_name==1.3.10,platformdirs==2.5.2,plotly==5.11.0,pluggy==1.0.0,prettytable==3.5.0,prometheus-client==0.14.1,promise==2.3,prompt-toolkit==3.0.31,proto-plus==1.19.6,protobuf==3.19.5,psutil==5.9.2,ptyprocess==0.7.0,pure-eval==0.2.2,py==1.11.0,pyarrow==7.0.0,pyasn1==0.4.8,pyasn1-modules==0.2.8,pybind11==2.10.0,pycparser==2.21,pydantic==1.10.2,pydot==1.4.2,Pygments==2.13.0,PyGObject==3.36.0,pynvml==11.4.1,pyparsing==3.0.9,pyperclip==1.8.2,pyrsistent==0.18.1,pytest==7.1.3,pytest-cov==4.0.0,pytest-xdist==3.0.2,python-apt==2.0.0+ubuntu0.20.4.8,python-dateutil==2.8.2,python-dotenv==0.21.0,python-rapidjson==1.8,pytz==2022.2.1,PyYAML==5.4.1,pyzmq==24.0.0,regex==2022.9.13,requests==2.22.0,requests-oauthlib==1.3.1,requests-unixsocket==0.2.0,rsa==4.7.2,s3fs==2022.2.0,s3transfer==0.6.0,sacremoses==0.0.53,scikit-build==0.15.0,scikit-learn==1.1.2,scipy==1.8.1,seedir==0.3.0,Send2Trash==1.8.0,sentry-sdk==1.9.8,setproctitle==1.3.2,setuptools-scm==7.0.5,shortuuid==1.0.9,six==1.15.0,sklearn==0.0,smmap==5.0.0,sniffio==1.3.0,snowballstemmer==2.2.0,sortedcontainers==2.4.0,soupsieve==2.3.2.post1,Sphinx==5.3.0,sphinx-multiversion==0.2.4,sphinx-togglebutton==0.3.1,sphinx_external_toc==0.3.0,sphinxcontrib-applehelp==1.0.2,sphinxcontrib-copydirs @ git+/~https://github.com/mikemckiernan/sphinxcontrib-copydirs.git@bd8c5d79b3f91cf5f1bb0d6995aeca3fe84b670e,sphinxcontrib-devhelp==1.0.2,sphinxcontrib-htmlhelp==2.0.0,sphinxcontrib-jsmath==1.0.1,sphinxcontrib-qthelp==1.0.3,sphinxcontrib-serializinghtml==1.1.5,SQLAlchemy==1.4.44,stack-data==0.5.0,starlette==0.20.4,stevedore==4.1.1,stringcase==1.2.0,supervisor==4.1.0,tabulate==0.8.10,tblib==1.7.0,tdqm==0.0.1,tenacity==8.0.1,tensorboard==2.9.1,tensorboard-data-server==0.6.1,tensorboard-plugin-wit==1.8.1,tensorflow==2.9.2,tensorflow-estimator==2.9.0,tensorflow-gpu==2.9.2,tensorflow-io-gcs-filesystem==0.27.0,tensorflow-metadata==1.10.0,termcolor==2.0.1,terminado==0.15.0,testbook==0.4.2,threadpoolctl==3.1.0,tinycss2==1.1.1,tokenizers==0.10.3,toml==0.10.2,tomli==2.0.1,toolz==0.12.0,torch==1.12.1+cu113,torchmetrics==0.3.2,tornado==6.2,tox==3.26.0,tqdm==4.64.1,traitlets==5.4.0,transformers==4.12.0,transformers4rec==0.1.12+2.gbcc939255,treelite==2.3.0,treelite-runtime==2.3.0,tritonclient==2.25.0,typing-inspect==0.8.0,typing_extensions==4.3.0,uc-micro-py==1.0.1,urllib3==1.26.12,uvicorn==0.18.3,uvloop==0.17.0,versioneer==0.20,virtualenv==20.16.5,wandb==0.13.3,watchfiles==0.17.0,wcwidth==0.2.5,webencodings==0.5.1,websocket-client==1.4.1,websockets==10.3,Werkzeug==2.2.2,widgetsnbextension==3.6.0,wrapt==1.12.1,xgboost==1.6.2,yarl==1.8.1,zict==2.2.0,zipp==3.8.1,zope.event==4.5.0,zope.interface==5.4.0
test-gpu run-test-pre: PYTHONHASHSEED='1555762591'
test-gpu run-test: commands[0] | python -m pytest --cov-report term --cov merlin -rxs tests/unit
============================= test session starts ==============================
platform linux -- Python 3.8.10, pytest-7.1.3, pluggy-1.0.0
cachedir: .tox/test-gpu/.pytest_cache
rootdir: /var/jenkins_home/workspace/merlin_core/core, configfile: pyproject.toml
plugins: anyio-3.5.0, xdist-3.0.2, cov-4.0.0
collected 391 items / 1 skipped

tests/unit/core/test_dispatch.py .... [ 1%]
tests/unit/core/test_protocols.py ......... [ 3%]
tests/unit/core/test_version.py . [ 3%]
tests/unit/dag/test_base_operator.py .... [ 4%]
tests/unit/dag/test_column_selector.py .............................. [ 12%]
tests/unit/dag/test_dictarray.py ... [ 13%]
tests/unit/dag/test_executors.py .. [ 13%]
tests/unit/dag/test_graph.py .... [ 14%]
tests/unit/dag/ops/test_selection.py .... [ 15%]
tests/unit/io/test_io.py ............................................... [ 27%]
................................................................ [ 43%]
tests/unit/schema/test_column_schemas.py ........................ [ 50%]
tests/unit/schema/test_schema.py ............. [ 53%]
tests/unit/schema/test_schema_io.py .................................... [ 62%]
........................................................................ [ 81%]
........................................................... [ 96%]
tests/unit/schema/test_tags.py ....... [ 97%]
tests/unit/utils/test_utils.py ........ [100%]

=============================== warnings summary ===============================
tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 6 files. Did not have enough partitions to create 7 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 8 files. Did not have enough partitions to create 9 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 8 files. Did not have enough partitions to create 10 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 8 files. Did not have enough partitions to create 11 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 12 files. Did not have enough partitions to create 13 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 12 files. Did not have enough partitions to create 14 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 12 files. Did not have enough partitions to create 15 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 12 files. Did not have enough partitions to create 16 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 12 files. Did not have enough partitions to create 17 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 12 files. Did not have enough partitions to create 18 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 12 files. Did not have enough partitions to create 19 files.
warnings.warn(

tests/unit/io/test_io.py::test_validate_and_regenerate_dataset
/var/jenkins_home/workspace/merlin_core/core/merlin/io/parquet.py:551: DeprecationWarning: 'ParquetDataset.pieces' attribute is deprecated as of pyarrow 5.0.0 and will be removed in a future version. Specify 'use_legacy_dataset=False' while constructing the ParquetDataset, and then use the '.fragments' attribute instead.
paths = [p.path for p in pa_dataset.pieces]

tests/unit/schema/test_column_schemas.py::test_column_schema_tags_normalize
tests/unit/schema/test_tags.py::test_tagset_atomizes_compound_tags
/var/jenkins_home/workspace/merlin_core/core/merlin/schema/tags.py:148: UserWarning: Compound tags like Tags.ITEM_ID have been deprecated and will be removed in a future version. Please use the atomic versions of these tags, like [<Tags.ITEM: 'item'>, <Tags.ID: 'id'>].
warnings.warn(

tests/unit/schema/test_schema_io.py::test_json_serialization_with_embedded_dicts
tests/unit/schema/test_schema_io.py::test_merlin_to_proto_to_json_to_merlin
tests/unit/schema/test_tags.py::test_tagset_atomizes_compound_tags
/var/jenkins_home/workspace/merlin_core/core/merlin/schema/tags.py:148: UserWarning: Compound tags like Tags.USER_ID have been deprecated and will be removed in a future version. Please use the atomic versions of these tags, like [<Tags.USER: 'user'>, <Tags.ID: 'id'>].
warnings.warn(

tests/unit/schema/test_tags.py::test_tagset_atomizes_compound_tags
/var/jenkins_home/workspace/merlin_core/core/merlin/schema/tags.py:148: UserWarning: Compound tags like Tags.SESSION_ID have been deprecated and will be removed in a future version. Please use the atomic versions of these tags, like [<Tags.SESSION: 'session'>, <Tags.ID: 'id'>].
warnings.warn(

tests/unit/schema/test_tags.py::test_tagset_atomizes_compound_tags
/var/jenkins_home/workspace/merlin_core/core/merlin/schema/tags.py:148: UserWarning: Compound tags like Tags.TEXT_TOKENIZED have been deprecated and will be removed in a future version. Please use the atomic versions of these tags, like [<Tags.TEXT: 'text'>, <Tags.TOKENIZED: 'tokenized'>].
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed[True-True]
/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 34829 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed[True-False]
/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 37093 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed[False-True]
/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 39711 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed[False-False]
/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 37169 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed_force[True]
/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 35723 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed_force[False]
/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 33349 instead
warnings.warn(

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html

---------- coverage: platform linux, python 3.8.10-final-0 -----------
Name Stmts Miss Cover

merlin/core/__init__.py 2 0 100%
merlin/core/_version.py 354 205 42%
merlin/core/compat.py 10 4 60%
merlin/core/dispatch.py 356 212 40%
merlin/core/protocols.py 99 45 55%
merlin/core/utils.py 195 56 71%
merlin/dag/__init__.py 5 0 100%
merlin/dag/base_operator.py 122 20 84%
merlin/dag/dictarray.py 55 15 73%
merlin/dag/executors.py 141 68 52%
merlin/dag/graph.py 99 35 65%
merlin/dag/node.py 344 161 53%
merlin/dag/ops/__init__.py 4 0 100%
merlin/dag/ops/concat_columns.py 17 4 76%
merlin/dag/ops/selection.py 22 0 100%
merlin/dag/ops/subset_columns.py 12 4 67%
merlin/dag/ops/subtraction.py 21 11 48%
merlin/dag/selector.py 101 6 94%
merlin/io/__init__.py 4 0 100%
merlin/io/avro.py 88 88 0%
merlin/io/csv.py 57 6 89%
merlin/io/dask.py 181 53 71%
merlin/io/dataframe_engine.py 61 5 92%
merlin/io/dataframe_iter.py 21 1 95%
merlin/io/dataset.py 347 54 84%
merlin/io/dataset_engine.py 37 8 78%
merlin/io/fsspec_utils.py 127 108 15%
merlin/io/hugectr.py 45 35 22%
merlin/io/parquet.py 603 69 89%
merlin/io/shuffle.py 38 12 68%
merlin/io/worker.py 80 66 18%
merlin/io/writer.py 190 52 73%
merlin/io/writer_factory.py 18 4 78%
merlin/schema/__init__.py 2 0 100%
merlin/schema/io/__init__.py 0 0 100%
merlin/schema/io/proto_utils.py 20 4 80%
merlin/schema/io/schema_bp.py 306 5 98%
merlin/schema/io/tensorflow_metadata.py 185 17 91%
merlin/schema/schema.py 220 33 85%
merlin/schema/tags.py 82 1 99%

TOTAL 4671 1467 69%

=========================== short test summary info ============================
SKIPPED [1] tests/unit/io/test_avro.py:34: could not import 'uavro': No module named 'uavro'
============ 391 passed, 1 skipped, 25 warnings in 66.13s (0:01:06) ============
___________________________________ summary ____________________________________
test-gpu: commands succeeded
congratulations :)
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA-Merlin/core/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[merlin_core] $ /bin/bash /tmp/jenkins14536671718947287545.sh

@nvidia-merlin-bot

Click to view CI Results
GitHub pull request #111 of commit f7fd2e13ff439597401c497c31aabd02c67be6b4, no merge conflicts.
Running as SYSTEM
Setting status of f7fd2e13ff439597401c497c31aabd02c67be6b4 to PENDING with url http://merlin-infra1.nvidia.com:8080/job/merlin_core/299/ and message: 'Pending'
Using context: Jenkins
Building on master in workspace /var/jenkins_home/workspace/merlin_core
using credential ce87ff3c-94f0-400a-8303-cb4acb4918b5
 > git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > git config remote.origin.url /~https://github.com/NVIDIA-Merlin/core # timeout=10
Fetching upstream changes from /~https://github.com/NVIDIA-Merlin/core
 > git --version # timeout=10
using GIT_ASKPASS to set credentials login for merlin-systems username and pass
 > git fetch --tags --force --progress -- /~https://github.com/NVIDIA-Merlin/core +refs/pull/111/*:refs/remotes/origin/pr/111/* # timeout=10
 > git rev-parse f7fd2e13ff439597401c497c31aabd02c67be6b4^{commit} # timeout=10
Checking out Revision f7fd2e13ff439597401c497c31aabd02c67be6b4 (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f f7fd2e13ff439597401c497c31aabd02c67be6b4 # timeout=10
Commit message: "Enable partial value count to be specified"
 > git rev-list --no-walk 01010c52e3d54f7e9f01c49b140d36b96949a23c # timeout=10
[merlin_core] $ /bin/bash /tmp/jenkins11277348768107741401.sh
GLOB sdist-make: /var/jenkins_home/workspace/merlin_core/core/setup.py
test-gpu recreate: /var/jenkins_home/workspace/merlin_core/core/.tox/test-gpu
test-gpu installdeps: pytest, pytest-cov
WARNING: Discarding $PYTHONPATH from environment, to override specify PYTHONPATH in 'passenv' in your configuration.
test-gpu inst: /var/jenkins_home/workspace/merlin_core/core/.tox/.tmp/package/1/merlin-core-0.8.0+33.gf7fd2e1.zip
WARNING: Discarding $PYTHONPATH from environment, to override specify PYTHONPATH in 'passenv' in your configuration.
test-gpu installed: absl-py==1.2.0,aiohttp==3.8.1,aiosignal==1.2.0,alabaster==0.7.12,alembic==1.8.1,anyio==3.6.1,argon2-cffi==21.3.0,argon2-cffi-bindings==21.2.0,astroid==2.5.6,asttokens==2.0.8,astunparse==1.6.3,asv==0.5.1,asvdb==0.4.2,async-timeout==4.0.2,attrs==22.1.0,autopage==0.5.1,awscli==1.27.12,Babel==2.10.3,backcall==0.2.0,beautifulsoup4==4.11.1,betterproto==1.2.5,black==22.6.0,bleach==5.0.1,boto3==1.24.75,botocore==1.29.12,Brotli==1.0.9,cachetools==5.2.0,certifi==2019.11.28,cffi==1.15.1,chardet==3.0.4,charset-normalizer==2.1.1,clang==5.0,click==8.1.3,cliff==4.0.0,cloudpickle==2.2.0,cmaes==0.9.0,cmake==3.24.1.1,cmd2==2.4.2,colorama==0.4.4,colorlog==6.7.0,contourpy==1.0.5,coverage==6.5.0,cuda-python==11.7.1,cupy-cuda117==10.6.0,cycler==0.11.0,Cython==0.29.32,dask==2022.5.1,dbus-python==1.2.16,debugpy==1.6.3,decorator==5.1.1,defusedxml==0.7.1,dill==0.3.5.1,distlib==0.3.6,distributed==2022.5.1,distro==1.7.0,dm-tree==0.1.6,docker-pycreds==0.4.0,docutils==0.16,emoji==1.7.0,entrypoints==0.4,execnet==1.9.0,executing==1.0.0,faiss==1.7.2,faiss-gpu==1.7.2,fastai==2.7.9,fastapi==0.85.0,fastavro==1.6.1,fastcore==1.5.27,fastdownload==0.0.7,fastjsonschema==2.16.1,fastprogress==1.0.3,fastrlock==0.8,feast==0.19.4,fiddle==0.2.2,filelock==3.8.0,flatbuffers==1.12,fonttools==4.37.3,frozenlist==1.3.1,fsspec==2022.5.0,gast==0.4.0,gevent==21.12.0,geventhttpclient==2.0.2,gitdb==4.0.9,GitPython==3.1.27,google==3.0.0,google-api-core==2.10.1,google-auth==2.11.1,google-auth-oauthlib==0.4.6,google-pasta==0.2.0,googleapis-common-protos==1.52.0,graphviz==0.20.1,greenlet==1.1.3,grpcio==1.41.0,grpcio-channelz==1.49.0,grpcio-reflection==1.48.1,grpclib==0.4.3,h11==0.13.0,h2==4.1.0,h5py==3.7.0,HeapDict==1.0.1,horovod==0.26.1,hpack==4.0.0,httptools==0.5.0,hugectr2onnx==0.0.0,huggingface-hub==0.9.1,hyperframe==6.0.1,idna==2.8,imagesize==1.4.1,implicit==0.6.1,importlib-metadata==4.12.0,importlib-resources==5.9.0,iniconfig==1.1.1,ipykernel==6.15.3,ipython==8.5.0,ipython-genutils==0.2.0,ipywidgets==7.7.0,jedi==0.18.1,Jinja2==3.1.2,jmespath==1.0.1,joblib==1.2.0,json5==0.9.10,jsonschema==4.16.0,jupyter-cache==0.4.3,jupyter-core==4.11.1,jupyter-server==1.18.1,jupyter-server-mathjax==0.2.5,jupyter-sphinx==0.3.2,jupyter_client==7.3.5,jupyterlab==3.4.7,jupyterlab-pygments==0.2.2,jupyterlab-widgets==1.1.0,jupyterlab_server==2.15.1,keras==2.9.0,Keras-Preprocessing==1.1.2,kiwisolver==1.4.4,lazy-object-proxy==1.8.0,libclang==14.0.6,libcst==0.4.7,lightfm==1.16,lightgbm==3.3.2,linkify-it-py==1.0.3,llvmlite==0.39.1,locket==1.0.0,lxml==4.9.1,Mako==1.2.4,Markdown==3.4.1,markdown-it-py==1.1.0,MarkupSafe==2.1.1,matplotlib==3.6.0,matplotlib-inline==0.1.6,mdit-py-plugins==0.2.8,merlin-core==0.8.0+33.gf7fd2e1,merlin-dataloader==0.0.2,merlin-models==0.7.0+11.g280956aa4,merlin-systems==0.5.0+4.g15074ad,mistune==2.0.4,mmh3==3.0.0,mpi4py==3.1.3,msgpack==1.0.4,multidict==6.0.2,mypy-extensions==0.4.3,myst-nb==0.13.2,myst-parser==0.15.2,natsort==8.1.0,nbclassic==0.4.3,nbclient==0.6.8,nbconvert==7.0.0,nbdime==3.1.1,nbformat==5.5.0,nest-asyncio==1.5.5,ninja==1.10.2.3,notebook==6.4.12,notebook-shim==0.1.0,numba==0.56.2,numpy==1.22.4,nvidia-pyindex==1.0.9,nvtabular @ 
git+/~https://github.com/NVIDIA-Merlin/NVTabular.git@21117cfc4c113b30036afcb97b6daa5f377996db,nvtx==0.2.5,oauthlib==3.2.1,oldest-supported-numpy==2022.8.16,onnx==1.12.0,onnxruntime==1.11.1,opt-einsum==3.3.0,optuna==3.0.3,packaging==21.3,pandas==1.3.5,pandavro==1.5.2,pandocfilters==1.5.0,parso==0.8.3,partd==1.3.0,pathtools==0.1.2,pbr==5.11.0,pexpect==4.8.0,pickleshare==0.7.5,Pillow==9.2.0,pkgutil_resolve_name==1.3.10,platformdirs==2.5.2,plotly==5.11.0,pluggy==1.0.0,prettytable==3.5.0,prometheus-client==0.14.1,promise==2.3,prompt-toolkit==3.0.31,proto-plus==1.19.6,protobuf==3.19.5,psutil==5.9.2,ptyprocess==0.7.0,pure-eval==0.2.2,py==1.11.0,pyarrow==7.0.0,pyasn1==0.4.8,pyasn1-modules==0.2.8,pybind11==2.10.0,pycparser==2.21,pydantic==1.10.2,pydot==1.4.2,Pygments==2.13.0,PyGObject==3.36.0,pynvml==11.4.1,pyparsing==3.0.9,pyperclip==1.8.2,pyrsistent==0.18.1,pytest==7.1.3,pytest-cov==4.0.0,pytest-xdist==3.0.2,python-apt==2.0.0+ubuntu0.20.4.8,python-dateutil==2.8.2,python-dotenv==0.21.0,python-rapidjson==1.8,pytz==2022.2.1,PyYAML==5.4.1,pyzmq==24.0.0,regex==2022.9.13,requests==2.22.0,requests-oauthlib==1.3.1,requests-unixsocket==0.2.0,rsa==4.7.2,s3fs==2022.2.0,s3transfer==0.6.0,sacremoses==0.0.53,scikit-build==0.15.0,scikit-learn==1.1.2,scipy==1.8.1,seedir==0.3.0,Send2Trash==1.8.0,sentry-sdk==1.9.8,setproctitle==1.3.2,setuptools-scm==7.0.5,shortuuid==1.0.9,six==1.15.0,sklearn==0.0,smmap==5.0.0,sniffio==1.3.0,snowballstemmer==2.2.0,sortedcontainers==2.4.0,soupsieve==2.3.2.post1,Sphinx==5.3.0,sphinx-multiversion==0.2.4,sphinx-togglebutton==0.3.1,sphinx_external_toc==0.3.0,sphinxcontrib-applehelp==1.0.2,sphinxcontrib-copydirs @ git+/~https://github.com/mikemckiernan/sphinxcontrib-copydirs.git@bd8c5d79b3f91cf5f1bb0d6995aeca3fe84b670e,sphinxcontrib-devhelp==1.0.2,sphinxcontrib-htmlhelp==2.0.0,sphinxcontrib-jsmath==1.0.1,sphinxcontrib-qthelp==1.0.3,sphinxcontrib-serializinghtml==1.1.5,SQLAlchemy==1.4.44,stack-data==0.5.0,starlette==0.20.4,stevedore==4.1.1,stringcase==1.2.0,supervisor==4.1.0,tabulate==0.8.10,tblib==1.7.0,tdqm==0.0.1,tenacity==8.0.1,tensorboard==2.9.1,tensorboard-data-server==0.6.1,tensorboard-plugin-wit==1.8.1,tensorflow==2.9.2,tensorflow-estimator==2.9.0,tensorflow-gpu==2.9.2,tensorflow-io-gcs-filesystem==0.27.0,tensorflow-metadata==1.10.0,termcolor==2.0.1,terminado==0.15.0,testbook==0.4.2,threadpoolctl==3.1.0,tinycss2==1.1.1,tokenizers==0.10.3,toml==0.10.2,tomli==2.0.1,toolz==0.12.0,torch==1.12.1+cu113,torchmetrics==0.3.2,tornado==6.2,tox==3.26.0,tqdm==4.64.1,traitlets==5.4.0,transformers==4.12.0,transformers4rec==0.1.12+2.gbcc939255,treelite==2.3.0,treelite-runtime==2.3.0,tritonclient==2.25.0,typing-inspect==0.8.0,typing_extensions==4.3.0,uc-micro-py==1.0.1,urllib3==1.26.12,uvicorn==0.18.3,uvloop==0.17.0,versioneer==0.20,virtualenv==20.16.5,wandb==0.13.3,watchfiles==0.17.0,wcwidth==0.2.5,webencodings==0.5.1,websocket-client==1.4.1,websockets==10.3,Werkzeug==2.2.2,widgetsnbextension==3.6.0,wrapt==1.12.1,xgboost==1.6.2,yarl==1.8.1,zict==2.2.0,zipp==3.8.1,zope.event==4.5.0,zope.interface==5.4.0
test-gpu run-test-pre: PYTHONHASHSEED='1910238395'
test-gpu run-test: commands[0] | python -m pytest --cov-report term --cov merlin -rxs tests/unit
============================= test session starts ==============================
platform linux -- Python 3.8.10, pytest-7.1.3, pluggy-1.0.0
cachedir: .tox/test-gpu/.pytest_cache
rootdir: /var/jenkins_home/workspace/merlin_core/core, configfile: pyproject.toml
plugins: anyio-3.5.0, xdist-3.0.2, cov-4.0.0
collected 393 items / 1 skipped

tests/unit/core/test_dispatch.py .... [ 1%]
tests/unit/core/test_protocols.py ......... [ 3%]
tests/unit/core/test_version.py . [ 3%]
tests/unit/dag/test_base_operator.py .... [ 4%]
tests/unit/dag/test_column_selector.py .............................. [ 12%]
tests/unit/dag/test_dictarray.py ... [ 12%]
tests/unit/dag/test_executors.py .. [ 13%]
tests/unit/dag/test_graph.py .... [ 14%]
tests/unit/dag/ops/test_selection.py .... [ 15%]
tests/unit/io/test_io.py ............................................... [ 27%]
................................................................ [ 43%]
tests/unit/schema/test_column_schemas.py ........................... [ 50%]
tests/unit/schema/test_schema.py ............. [ 53%]
tests/unit/schema/test_schema_io.py .................................... [ 63%]
........................................................................ [ 81%]
.......................................................... [ 96%]
tests/unit/schema/test_tags.py ....... [ 97%]
tests/unit/utils/test_utils.py ........ [100%]

=============================== warnings summary ===============================
tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 6 files. Did not have enough partitions to create 7 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 8 files. Did not have enough partitions to create 9 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 8 files. Did not have enough partitions to create 10 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 8 files. Did not have enough partitions to create 11 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 12 files. Did not have enough partitions to create 13 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 12 files. Did not have enough partitions to create 14 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 12 files. Did not have enough partitions to create 15 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 12 files. Did not have enough partitions to create 16 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 12 files. Did not have enough partitions to create 17 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 12 files. Did not have enough partitions to create 18 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 12 files. Did not have enough partitions to create 19 files.
warnings.warn(

tests/unit/io/test_io.py::test_validate_and_regenerate_dataset
/var/jenkins_home/workspace/merlin_core/core/merlin/io/parquet.py:551: DeprecationWarning: 'ParquetDataset.pieces' attribute is deprecated as of pyarrow 5.0.0 and will be removed in a future version. Specify 'use_legacy_dataset=False' while constructing the ParquetDataset, and then use the '.fragments' attribute instead.
paths = [p.path for p in pa_dataset.pieces]

tests/unit/schema/test_column_schemas.py::test_column_schema_tags_normalize
tests/unit/schema/test_tags.py::test_tagset_atomizes_compound_tags
/var/jenkins_home/workspace/merlin_core/core/merlin/schema/tags.py:148: UserWarning: Compound tags like Tags.ITEM_ID have been deprecated and will be removed in a future version. Please use the atomic versions of these tags, like [<Tags.ITEM: 'item'>, <Tags.ID: 'id'>].
warnings.warn(

tests/unit/schema/test_schema_io.py::test_json_serialization_with_embedded_dicts
tests/unit/schema/test_schema_io.py::test_merlin_to_proto_to_json_to_merlin
tests/unit/schema/test_tags.py::test_tagset_atomizes_compound_tags
/var/jenkins_home/workspace/merlin_core/core/merlin/schema/tags.py:148: UserWarning: Compound tags like Tags.USER_ID have been deprecated and will be removed in a future version. Please use the atomic versions of these tags, like [<Tags.USER: 'user'>, <Tags.ID: 'id'>].
warnings.warn(

tests/unit/schema/test_tags.py::test_tagset_atomizes_compound_tags
/var/jenkins_home/workspace/merlin_core/core/merlin/schema/tags.py:148: UserWarning: Compound tags like Tags.SESSION_ID have been deprecated and will be removed in a future version. Please use the atomic versions of these tags, like [<Tags.SESSION: 'session'>, <Tags.ID: 'id'>].
warnings.warn(

tests/unit/schema/test_tags.py::test_tagset_atomizes_compound_tags
/var/jenkins_home/workspace/merlin_core/core/merlin/schema/tags.py:148: UserWarning: Compound tags like Tags.TEXT_TOKENIZED have been deprecated and will be removed in a future version. Please use the atomic versions of these tags, like [<Tags.TEXT: 'text'>, <Tags.TOKENIZED: 'tokenized'>].
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed[True-True]
/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 34213 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed[True-False]
/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 37149 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed[False-True]
/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 45569 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed[False-False]
/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 45349 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed_force[True]
/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 39409 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed_force[False]
/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 44687 instead
warnings.warn(

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html

---------- coverage: platform linux, python 3.8.10-final-0 -----------
Name Stmts Miss Cover

merlin/core/__init__.py 2 0 100%
merlin/core/_version.py 354 205 42%
merlin/core/compat.py 10 4 60%
merlin/core/dispatch.py 356 212 40%
merlin/core/protocols.py 99 45 55%
merlin/core/utils.py 195 56 71%
merlin/dag/__init__.py 5 0 100%
merlin/dag/base_operator.py 122 20 84%
merlin/dag/dictarray.py 55 15 73%
merlin/dag/executors.py 141 68 52%
merlin/dag/graph.py 99 35 65%
merlin/dag/node.py 344 161 53%
merlin/dag/ops/__init__.py 4 0 100%
merlin/dag/ops/concat_columns.py 17 4 76%
merlin/dag/ops/selection.py 22 0 100%
merlin/dag/ops/subset_columns.py 12 4 67%
merlin/dag/ops/subtraction.py 21 11 48%
merlin/dag/selector.py 101 6 94%
merlin/io/__init__.py 4 0 100%
merlin/io/avro.py 88 88 0%
merlin/io/csv.py 57 6 89%
merlin/io/dask.py 181 53 71%
merlin/io/dataframe_engine.py 61 5 92%
merlin/io/dataframe_iter.py 21 1 95%
merlin/io/dataset.py 347 54 84%
merlin/io/dataset_engine.py 37 8 78%
merlin/io/fsspec_utils.py 127 108 15%
merlin/io/hugectr.py 45 35 22%
merlin/io/parquet.py 603 69 89%
merlin/io/shuffle.py 38 12 68%
merlin/io/worker.py 80 66 18%
merlin/io/writer.py 190 52 73%
merlin/io/writer_factory.py 18 4 78%
merlin/schema/__init__.py 2 0 100%
merlin/schema/io/__init__.py 0 0 100%
merlin/schema/io/proto_utils.py 20 4 80%
merlin/schema/io/schema_bp.py 306 5 98%
merlin/schema/io/tensorflow_metadata.py 185 17 91%
merlin/schema/schema.py 224 33 85%
merlin/schema/tags.py 82 1 99%

TOTAL 4675 1467 69%

=========================== short test summary info ============================
SKIPPED [1] tests/unit/io/test_avro.py:34: could not import 'uavro': No module named 'uavro'
============ 393 passed, 1 skipped, 25 warnings in 64.97s (0:01:04) ============
___________________________________ summary ____________________________________
test-gpu: commands succeeded
congratulations :)
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA-Merlin/core/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[merlin_core] $ /bin/bash /tmp/jenkins6043518755805008060.sh

@nvidia-merlin-bot

Click to view CI Results
GitHub pull request #111 of commit c73f7bba9037f42b4518978bb5c519d16636eb98, no merge conflicts.
Running as SYSTEM
Setting status of c73f7bba9037f42b4518978bb5c519d16636eb98 to PENDING with url http://merlin-infra1.nvidia.com:8080/job/merlin_core/300/ and message: 'Pending'
Using context: Jenkins
Building on master in workspace /var/jenkins_home/workspace/merlin_core
using credential ce87ff3c-94f0-400a-8303-cb4acb4918b5
 > git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > git config remote.origin.url /~https://github.com/NVIDIA-Merlin/core # timeout=10
Fetching upstream changes from /~https://github.com/NVIDIA-Merlin/core
 > git --version # timeout=10
using GIT_ASKPASS to set credentials login for merlin-systems username and pass
 > git fetch --tags --force --progress -- /~https://github.com/NVIDIA-Merlin/core +refs/pull/111/*:refs/remotes/origin/pr/111/* # timeout=10
 > git rev-parse c73f7bba9037f42b4518978bb5c519d16636eb98^{commit} # timeout=10
Checking out Revision c73f7bba9037f42b4518978bb5c519d16636eb98 (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f c73f7bba9037f42b4518978bb5c519d16636eb98 # timeout=10
Commit message: "Add test for specifying only max value count and fix deserialization"
 > git rev-list --no-walk f7fd2e13ff439597401c497c31aabd02c67be6b4 # timeout=10
[merlin_core] $ /bin/bash /tmp/jenkins1730831325243427305.sh
GLOB sdist-make: /var/jenkins_home/workspace/merlin_core/core/setup.py
test-gpu recreate: /var/jenkins_home/workspace/merlin_core/core/.tox/test-gpu
test-gpu installdeps: pytest, pytest-cov
WARNING: Discarding $PYTHONPATH from environment, to override specify PYTHONPATH in 'passenv' in your configuration.
test-gpu inst: /var/jenkins_home/workspace/merlin_core/core/.tox/.tmp/package/1/merlin-core-0.8.0+34.gc73f7bb.zip
WARNING: Discarding $PYTHONPATH from environment, to override specify PYTHONPATH in 'passenv' in your configuration.
test-gpu installed: absl-py==1.2.0,aiohttp==3.8.1,aiosignal==1.2.0,alabaster==0.7.12,alembic==1.8.1,anyio==3.6.1,argon2-cffi==21.3.0,argon2-cffi-bindings==21.2.0,astroid==2.5.6,asttokens==2.0.8,astunparse==1.6.3,asv==0.5.1,asvdb==0.4.2,async-timeout==4.0.2,attrs==22.1.0,autopage==0.5.1,awscli==1.27.12,Babel==2.10.3,backcall==0.2.0,beautifulsoup4==4.11.1,betterproto==1.2.5,black==22.6.0,bleach==5.0.1,boto3==1.24.75,botocore==1.29.12,Brotli==1.0.9,cachetools==5.2.0,certifi==2019.11.28,cffi==1.15.1,chardet==3.0.4,charset-normalizer==2.1.1,clang==5.0,click==8.1.3,cliff==4.0.0,cloudpickle==2.2.0,cmaes==0.9.0,cmake==3.24.1.1,cmd2==2.4.2,colorama==0.4.4,colorlog==6.7.0,contourpy==1.0.5,coverage==6.5.0,cuda-python==11.7.1,cupy-cuda117==10.6.0,cycler==0.11.0,Cython==0.29.32,dask==2022.5.1,dbus-python==1.2.16,debugpy==1.6.3,decorator==5.1.1,defusedxml==0.7.1,dill==0.3.5.1,distlib==0.3.6,distributed==2022.5.1,distro==1.7.0,dm-tree==0.1.6,docker-pycreds==0.4.0,docutils==0.16,emoji==1.7.0,entrypoints==0.4,execnet==1.9.0,executing==1.0.0,faiss==1.7.2,faiss-gpu==1.7.2,fastai==2.7.9,fastapi==0.85.0,fastavro==1.6.1,fastcore==1.5.27,fastdownload==0.0.7,fastjsonschema==2.16.1,fastprogress==1.0.3,fastrlock==0.8,feast==0.19.4,fiddle==0.2.2,filelock==3.8.0,flatbuffers==1.12,fonttools==4.37.3,frozenlist==1.3.1,fsspec==2022.5.0,gast==0.4.0,gevent==21.12.0,geventhttpclient==2.0.2,gitdb==4.0.9,GitPython==3.1.27,google==3.0.0,google-api-core==2.10.1,google-auth==2.11.1,google-auth-oauthlib==0.4.6,google-pasta==0.2.0,googleapis-common-protos==1.52.0,graphviz==0.20.1,greenlet==1.1.3,grpcio==1.41.0,grpcio-channelz==1.49.0,grpcio-reflection==1.48.1,grpclib==0.4.3,h11==0.13.0,h2==4.1.0,h5py==3.7.0,HeapDict==1.0.1,horovod==0.26.1,hpack==4.0.0,httptools==0.5.0,hugectr2onnx==0.0.0,huggingface-hub==0.9.1,hyperframe==6.0.1,idna==2.8,imagesize==1.4.1,implicit==0.6.1,importlib-metadata==4.12.0,importlib-resources==5.9.0,iniconfig==1.1.1,ipykernel==6.15.3,ipython==8.5.0,ipython-genutils==0.2.0,ipywidgets==7.7.0,jedi==0.18.1,Jinja2==3.1.2,jmespath==1.0.1,joblib==1.2.0,json5==0.9.10,jsonschema==4.16.0,jupyter-cache==0.4.3,jupyter-core==4.11.1,jupyter-server==1.18.1,jupyter-server-mathjax==0.2.5,jupyter-sphinx==0.3.2,jupyter_client==7.3.5,jupyterlab==3.4.7,jupyterlab-pygments==0.2.2,jupyterlab-widgets==1.1.0,jupyterlab_server==2.15.1,keras==2.9.0,Keras-Preprocessing==1.1.2,kiwisolver==1.4.4,lazy-object-proxy==1.8.0,libclang==14.0.6,libcst==0.4.7,lightfm==1.16,lightgbm==3.3.2,linkify-it-py==1.0.3,llvmlite==0.39.1,locket==1.0.0,lxml==4.9.1,Mako==1.2.4,Markdown==3.4.1,markdown-it-py==1.1.0,MarkupSafe==2.1.1,matplotlib==3.6.0,matplotlib-inline==0.1.6,mdit-py-plugins==0.2.8,merlin-core==0.8.0+34.gc73f7bb,merlin-dataloader==0.0.2,merlin-models==0.7.0+11.g280956aa4,merlin-systems==0.5.0+4.g15074ad,mistune==2.0.4,mmh3==3.0.0,mpi4py==3.1.3,msgpack==1.0.4,multidict==6.0.2,mypy-extensions==0.4.3,myst-nb==0.13.2,myst-parser==0.15.2,natsort==8.1.0,nbclassic==0.4.3,nbclient==0.6.8,nbconvert==7.0.0,nbdime==3.1.1,nbformat==5.5.0,nest-asyncio==1.5.5,ninja==1.10.2.3,notebook==6.4.12,notebook-shim==0.1.0,numba==0.56.2,numpy==1.22.4,nvidia-pyindex==1.0.9,nvtabular @ 
git+/~https://github.com/NVIDIA-Merlin/NVTabular.git@21117cfc4c113b30036afcb97b6daa5f377996db,nvtx==0.2.5,oauthlib==3.2.1,oldest-supported-numpy==2022.8.16,onnx==1.12.0,onnxruntime==1.11.1,opt-einsum==3.3.0,optuna==3.0.3,packaging==21.3,pandas==1.3.5,pandavro==1.5.2,pandocfilters==1.5.0,parso==0.8.3,partd==1.3.0,pathtools==0.1.2,pbr==5.11.0,pexpect==4.8.0,pickleshare==0.7.5,Pillow==9.2.0,pkgutil_resolve_name==1.3.10,platformdirs==2.5.2,plotly==5.11.0,pluggy==1.0.0,prettytable==3.5.0,prometheus-client==0.14.1,promise==2.3,prompt-toolkit==3.0.31,proto-plus==1.19.6,protobuf==3.19.5,psutil==5.9.2,ptyprocess==0.7.0,pure-eval==0.2.2,py==1.11.0,pyarrow==7.0.0,pyasn1==0.4.8,pyasn1-modules==0.2.8,pybind11==2.10.0,pycparser==2.21,pydantic==1.10.2,pydot==1.4.2,Pygments==2.13.0,PyGObject==3.36.0,pynvml==11.4.1,pyparsing==3.0.9,pyperclip==1.8.2,pyrsistent==0.18.1,pytest==7.1.3,pytest-cov==4.0.0,pytest-xdist==3.0.2,python-apt==2.0.0+ubuntu0.20.4.8,python-dateutil==2.8.2,python-dotenv==0.21.0,python-rapidjson==1.8,pytz==2022.2.1,PyYAML==5.4.1,pyzmq==24.0.0,regex==2022.9.13,requests==2.22.0,requests-oauthlib==1.3.1,requests-unixsocket==0.2.0,rsa==4.7.2,s3fs==2022.2.0,s3transfer==0.6.0,sacremoses==0.0.53,scikit-build==0.15.0,scikit-learn==1.1.2,scipy==1.8.1,seedir==0.3.0,Send2Trash==1.8.0,sentry-sdk==1.9.8,setproctitle==1.3.2,setuptools-scm==7.0.5,shortuuid==1.0.9,six==1.15.0,sklearn==0.0,smmap==5.0.0,sniffio==1.3.0,snowballstemmer==2.2.0,sortedcontainers==2.4.0,soupsieve==2.3.2.post1,Sphinx==5.3.0,sphinx-multiversion==0.2.4,sphinx-togglebutton==0.3.1,sphinx_external_toc==0.3.0,sphinxcontrib-applehelp==1.0.2,sphinxcontrib-copydirs @ git+/~https://github.com/mikemckiernan/sphinxcontrib-copydirs.git@bd8c5d79b3f91cf5f1bb0d6995aeca3fe84b670e,sphinxcontrib-devhelp==1.0.2,sphinxcontrib-htmlhelp==2.0.0,sphinxcontrib-jsmath==1.0.1,sphinxcontrib-qthelp==1.0.3,sphinxcontrib-serializinghtml==1.1.5,SQLAlchemy==1.4.44,stack-data==0.5.0,starlette==0.20.4,stevedore==4.1.1,stringcase==1.2.0,supervisor==4.1.0,tabulate==0.8.10,tblib==1.7.0,tdqm==0.0.1,tenacity==8.0.1,tensorboard==2.9.1,tensorboard-data-server==0.6.1,tensorboard-plugin-wit==1.8.1,tensorflow==2.9.2,tensorflow-estimator==2.9.0,tensorflow-gpu==2.9.2,tensorflow-io-gcs-filesystem==0.27.0,tensorflow-metadata==1.10.0,termcolor==2.0.1,terminado==0.15.0,testbook==0.4.2,threadpoolctl==3.1.0,tinycss2==1.1.1,tokenizers==0.10.3,toml==0.10.2,tomli==2.0.1,toolz==0.12.0,torch==1.12.1+cu113,torchmetrics==0.3.2,tornado==6.2,tox==3.26.0,tqdm==4.64.1,traitlets==5.4.0,transformers==4.12.0,transformers4rec==0.1.12+2.gbcc939255,treelite==2.3.0,treelite-runtime==2.3.0,tritonclient==2.25.0,typing-inspect==0.8.0,typing_extensions==4.3.0,uc-micro-py==1.0.1,urllib3==1.26.12,uvicorn==0.18.3,uvloop==0.17.0,versioneer==0.20,virtualenv==20.16.5,wandb==0.13.3,watchfiles==0.17.0,wcwidth==0.2.5,webencodings==0.5.1,websocket-client==1.4.1,websockets==10.3,Werkzeug==2.2.2,widgetsnbextension==3.6.0,wrapt==1.12.1,xgboost==1.6.2,yarl==1.8.1,zict==2.2.0,zipp==3.8.1,zope.event==4.5.0,zope.interface==5.4.0
test-gpu run-test-pre: PYTHONHASHSEED='3690212739'
test-gpu run-test: commands[0] | python -m pytest --cov-report term --cov merlin -rxs tests/unit
============================= test session starts ==============================
platform linux -- Python 3.8.10, pytest-7.1.3, pluggy-1.0.0
cachedir: .tox/test-gpu/.pytest_cache
rootdir: /var/jenkins_home/workspace/merlin_core/core, configfile: pyproject.toml
plugins: anyio-3.5.0, xdist-3.0.2, cov-4.0.0
collected 394 items / 1 skipped

tests/unit/core/test_dispatch.py .... [ 1%]
tests/unit/core/test_protocols.py ......... [ 3%]
tests/unit/core/test_version.py . [ 3%]
tests/unit/dag/test_base_operator.py .... [ 4%]
tests/unit/dag/test_column_selector.py .............................. [ 12%]
tests/unit/dag/test_dictarray.py ... [ 12%]
tests/unit/dag/test_executors.py .. [ 13%]
tests/unit/dag/test_graph.py .... [ 14%]
tests/unit/dag/ops/test_selection.py .... [ 15%]
tests/unit/io/test_io.py ............................................... [ 27%]
................................................................ [ 43%]
tests/unit/schema/test_column_schemas.py ........................... [ 50%]
tests/unit/schema/test_schema.py ............. [ 53%]
tests/unit/schema/test_schema_io.py ....F............................... [ 62%]
........................................................................ [ 81%]
........................................................... [ 96%]
tests/unit/schema/test_tags.py ....... [ 97%]
tests/unit/utils/test_utils.py ........ [100%]

=================================== FAILURES ===================================
___________________ test_value_count[value_count2-True-True] ___________________

value_count = {'max': 5}, expected_is_list = True, expected_is_ragged = True

@pytest.mark.parametrize(
    ["value_count", "expected_is_list", "expected_is_ragged"],
    [
        [{"min": 1, "max": 1}, True, False],
        [{"min": 1, "max": 2}, True, True],
        [{"max": 5}, True, True],
    ],
)
def test_value_count(value_count, expected_is_list, expected_is_ragged):
    schema = Schema(
        [
            ColumnSchema(
                "example",
                is_list=True,
                properties={
                    "value_count": value_count,
                },
            )
        ]
    )
    assert schema["example"].is_list == expected_is_list
    assert schema["example"].is_ragged == expected_is_ragged

    json_schema = TensorflowMetadata.from_merlin_schema(schema).to_json()
>       output_schema = TensorflowMetadata.from_json(json_schema).to_merlin_schema()

tests/unit/schema/test_schema_io.py:102:


merlin/schema/io/tensorflow_metadata.py:202: in to_merlin_schema
col_schema = _merlin_column(feature)
merlin/schema/io/tensorflow_metadata.py:405: in _merlin_column
properties = _merlin_properties(feature)


feature = Feature(name='example', deprecated=False, presence=FeaturePresence(min_fraction=0.0, min_count=0), group_presence=Feat...0.0)), in_environment=[], not_in_environment=[], lifecycle_stage=0, unique_constraints=UniqueConstraints(min=0, max=0))

def _merlin_properties(feature):
    extra_metadata = feature.annotation.extra_metadata
    if len(extra_metadata) > 1:
        raise ValueError(
            f"{feature.name}: extra_metadata should have 1 item, has \
            {len(feature.annotation.extra_metadata)}"
        )
    elif len(extra_metadata) == 1:
        properties = feature.annotation.extra_metadata[0].value

        if isinstance(properties, bytes):
            properties = schema_bp.Any(value=properties).to_dict()

    else:
        properties = {}

    domain = _merlin_domain(feature)

    if domain:
        properties["domain"] = domain

    value_count = _merlin_value_count(feature)

    if value_count:
        properties["value_count"] = value_count
      properties["is_list"] = value_count.get("min") > 0

E TypeError: '>' not supported between instances of 'NoneType' and 'int'

merlin/schema/io/tensorflow_metadata.py:365: TypeError
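
Note on the failure above: `value_count.get("min")` returns `None` when the serialized schema only specifies `max`, and comparing `None` with an `int` raises the `TypeError` shown. A minimal standalone sketch of the failure mode, plus one defensive way to infer the flags by defaulting missing bounds to 0 (the `infer_flags` helper is hypothetical, for illustration only, and not the fix applied in this PR):

# Reproduces the TypeError from the CI log: a partial value_count
# that only specifies `max` has no "min" key, so .get() returns None.
value_count = {"max": 5}
try:
    value_count.get("min") > 0
except TypeError as exc:
    print(exc)  # '>' not supported between instances of 'NoneType' and 'int'

# Hypothetical defensive helper: default missing bounds to 0 before comparing.
def infer_flags(value_count):
    vc_min = value_count.get("min", 0)
    vc_max = value_count.get("max", 0)
    is_list = vc_max > 0         # any positive max implies a list column
    is_ragged = vc_max > vc_min  # unequal bounds imply ragged lists
    return is_list, is_ragged

assert infer_flags({"max": 5}) == (True, True)
assert infer_flags({"min": 1, "max": 1}) == (True, False)
assert infer_flags({"min": 1, "max": 2}) == (True, True)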
=============================== warnings summary ===============================
tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 6 files. Did not have enough partitions to create 7 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 8 files. Did not have enough partitions to create 9 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 8 files. Did not have enough partitions to create 10 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 8 files. Did not have enough partitions to create 11 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 12 files. Did not have enough partitions to create 13 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 12 files. Did not have enough partitions to create 14 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 12 files. Did not have enough partitions to create 15 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 12 files. Did not have enough partitions to create 16 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 12 files. Did not have enough partitions to create 17 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 12 files. Did not have enough partitions to create 18 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 12 files. Did not have enough partitions to create 19 files.
warnings.warn(

tests/unit/io/test_io.py::test_validate_and_regenerate_dataset
/var/jenkins_home/workspace/merlin_core/core/merlin/io/parquet.py:551: DeprecationWarning: 'ParquetDataset.pieces' attribute is deprecated as of pyarrow 5.0.0 and will be removed in a future version. Specify 'use_legacy_dataset=False' while constructing the ParquetDataset, and then use the '.fragments' attribute instead.
paths = [p.path for p in pa_dataset.pieces]

tests/unit/schema/test_column_schemas.py::test_column_schema_tags_normalize
tests/unit/schema/test_tags.py::test_tagset_atomizes_compound_tags
/var/jenkins_home/workspace/merlin_core/core/merlin/schema/tags.py:148: UserWarning: Compound tags like Tags.ITEM_ID have been deprecated and will be removed in a future version. Please use the atomic versions of these tags, like [<Tags.ITEM: 'item'>, <Tags.ID: 'id'>].
warnings.warn(

tests/unit/schema/test_schema_io.py::test_json_serialization_with_embedded_dicts
tests/unit/schema/test_schema_io.py::test_merlin_to_proto_to_json_to_merlin
tests/unit/schema/test_tags.py::test_tagset_atomizes_compound_tags
/var/jenkins_home/workspace/merlin_core/core/merlin/schema/tags.py:148: UserWarning: Compound tags like Tags.USER_ID have been deprecated and will be removed in a future version. Please use the atomic versions of these tags, like [<Tags.USER: 'user'>, <Tags.ID: 'id'>].
warnings.warn(

tests/unit/schema/test_tags.py::test_tagset_atomizes_compound_tags
/var/jenkins_home/workspace/merlin_core/core/merlin/schema/tags.py:148: UserWarning: Compound tags like Tags.SESSION_ID have been deprecated and will be removed in a future version. Please use the atomic versions of these tags, like [<Tags.SESSION: 'session'>, <Tags.ID: 'id'>].
warnings.warn(

tests/unit/schema/test_tags.py::test_tagset_atomizes_compound_tags
/var/jenkins_home/workspace/merlin_core/core/merlin/schema/tags.py:148: UserWarning: Compound tags like Tags.TEXT_TOKENIZED have been deprecated and will be removed in a future version. Please use the atomic versions of these tags, like [<Tags.TEXT: 'text'>, <Tags.TOKENIZED: 'tokenized'>].
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed[True-True]
/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 43155 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed[True-False]
/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 34311 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed[False-True]
/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 45355 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed[False-False]
/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 46143 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed_force[True]
/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 45737 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed_force[False]
/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 41209 instead
warnings.warn(

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html

---------- coverage: platform linux, python 3.8.10-final-0 -----------
Name Stmts Miss Cover
-------------------------------------------------------------
merlin/core/__init__.py 2 0 100%
merlin/core/_version.py 354 205 42%
merlin/core/compat.py 10 4 60%
merlin/core/dispatch.py 356 212 40%
merlin/core/protocols.py 99 45 55%
merlin/core/utils.py 195 56 71%
merlin/dag/__init__.py 5 0 100%
merlin/dag/base_operator.py 122 20 84%
merlin/dag/dictarray.py 55 15 73%
merlin/dag/executors.py 141 68 52%
merlin/dag/graph.py 99 35 65%
merlin/dag/node.py 344 161 53%
merlin/dag/ops/__init__.py 4 0 100%
merlin/dag/ops/concat_columns.py 17 4 76%
merlin/dag/ops/selection.py 22 0 100%
merlin/dag/ops/subset_columns.py 12 4 67%
merlin/dag/ops/subtraction.py 21 11 48%
merlin/dag/selector.py 101 6 94%
merlin/io/__init__.py 4 0 100%
merlin/io/avro.py 88 88 0%
merlin/io/csv.py 57 6 89%
merlin/io/dask.py 181 53 71%
merlin/io/dataframe_engine.py 61 5 92%
merlin/io/dataframe_iter.py 21 1 95%
merlin/io/dataset.py 347 54 84%
merlin/io/dataset_engine.py 37 8 78%
merlin/io/fsspec_utils.py 127 108 15%
merlin/io/hugectr.py 45 35 22%
merlin/io/parquet.py 603 69 89%
merlin/io/shuffle.py 38 12 68%
merlin/io/worker.py 80 66 18%
merlin/io/writer.py 190 52 73%
merlin/io/writer_factory.py 18 4 78%
merlin/schema/__init__.py 2 0 100%
merlin/schema/io/__init__.py 0 0 100%
merlin/schema/io/proto_utils.py 20 4 80%
merlin/schema/io/schema_bp.py 306 5 98%
merlin/schema/io/tensorflow_metadata.py 190 17 91%
merlin/schema/schema.py 224 33 85%
merlin/schema/tags.py 82 1 99%
-------------------------------------------------------------
TOTAL 4680 1467 69%

=========================== short test summary info ============================
SKIPPED [1] tests/unit/io/test_avro.py:34: could not import 'uavro': No module named 'uavro'
======= 1 failed, 393 passed, 1 skipped, 25 warnings in 65.73s (0:01:05) =======
ERROR: InvocationError for command /var/jenkins_home/workspace/merlin_core/core/.tox/test-gpu/bin/python -m pytest --cov-report term --cov merlin -rxs tests/unit (exited with code 1)
___________________________________ summary ____________________________________
ERROR: test-gpu: commands failed
Build step 'Execute shell' marked build as failure
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA-Merlin/core/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[merlin_core] $ /bin/bash /tmp/jenkins15408042810799427856.sh
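
The failure above comes from merlin/schema/io/tensorflow_metadata.py:365: `value_count.get("min")` returns `None` when the schema carries no minimum bound, and `None > 0` raises the `TypeError`. The later commit "Check min or max when setting is_list" targets exactly this case; below is a minimal None-safe sketch of that idea (the helper name `infer_list_flags` and the treat-missing-as-zero default are illustrative assumptions, not the merged code):

```python
from typing import Tuple


def infer_list_flags(value_count: dict) -> Tuple[bool, bool]:
    """Infer (is_list, is_ragged) from a value_count that may omit bounds."""
    min_count = value_count.get("min") or 0  # a missing bound behaves like 0
    max_count = value_count.get("max") or 0

    is_list = max_count > 0 or min_count > 0        # any positive bound -> list column
    is_ragged = is_list and min_count != max_count  # unequal bounds -> ragged lists
    return is_list, is_ragged
```

With this guard, a value count like `{"max": 4}` with no `"min"` infers a ragged list column instead of raising.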

@nvidia-merlin-bot

Click to view CI Results
GitHub pull request #111 of commit f48634c2e2a4f3f6139125a62666569e5a86a3e2, no merge conflicts.
Running as SYSTEM
Setting status of f48634c2e2a4f3f6139125a62666569e5a86a3e2 to PENDING with url http://merlin-infra1.nvidia.com:8080/job/merlin_core/301/ and message: 'Pending'
Using context: Jenkins
Building on master in workspace /var/jenkins_home/workspace/merlin_core
using credential ce87ff3c-94f0-400a-8303-cb4acb4918b5
 > git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > git config remote.origin.url /~https://github.com/NVIDIA-Merlin/core # timeout=10
Fetching upstream changes from /~https://github.com/NVIDIA-Merlin/core
 > git --version # timeout=10
using GIT_ASKPASS to set credentials login for merlin-systems username and pass
 > git fetch --tags --force --progress -- /~https://github.com/NVIDIA-Merlin/core +refs/pull/111/*:refs/remotes/origin/pr/111/* # timeout=10
 > git rev-parse f48634c2e2a4f3f6139125a62666569e5a86a3e2^{commit} # timeout=10
Checking out Revision f48634c2e2a4f3f6139125a62666569e5a86a3e2 (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f f48634c2e2a4f3f6139125a62666569e5a86a3e2 # timeout=10
Commit message: "Check min or max when setting is_list"
 > git rev-list --no-walk c73f7bba9037f42b4518978bb5c519d16636eb98 # timeout=10
[merlin_core] $ /bin/bash /tmp/jenkins7519629414838475338.sh
GLOB sdist-make: /var/jenkins_home/workspace/merlin_core/core/setup.py
test-gpu recreate: /var/jenkins_home/workspace/merlin_core/core/.tox/test-gpu
test-gpu installdeps: pytest, pytest-cov
WARNING: Discarding $PYTHONPATH from environment, to override specify PYTHONPATH in 'passenv' in your configuration.
test-gpu inst: /var/jenkins_home/workspace/merlin_core/core/.tox/.tmp/package/1/merlin-core-0.8.0+35.gf48634c.zip
WARNING: Discarding $PYTHONPATH from environment, to override specify PYTHONPATH in 'passenv' in your configuration.
test-gpu installed: absl-py==1.2.0,aiohttp==3.8.1,aiosignal==1.2.0,alabaster==0.7.12,alembic==1.8.1,anyio==3.6.1,argon2-cffi==21.3.0,argon2-cffi-bindings==21.2.0,astroid==2.5.6,asttokens==2.0.8,astunparse==1.6.3,asv==0.5.1,asvdb==0.4.2,async-timeout==4.0.2,attrs==22.1.0,autopage==0.5.1,awscli==1.27.12,Babel==2.10.3,backcall==0.2.0,beautifulsoup4==4.11.1,betterproto==1.2.5,black==22.6.0,bleach==5.0.1,boto3==1.24.75,botocore==1.29.12,Brotli==1.0.9,cachetools==5.2.0,certifi==2019.11.28,cffi==1.15.1,chardet==3.0.4,charset-normalizer==2.1.1,clang==5.0,click==8.1.3,cliff==4.0.0,cloudpickle==2.2.0,cmaes==0.9.0,cmake==3.24.1.1,cmd2==2.4.2,colorama==0.4.4,colorlog==6.7.0,contourpy==1.0.5,coverage==6.5.0,cuda-python==11.7.1,cupy-cuda117==10.6.0,cycler==0.11.0,Cython==0.29.32,dask==2022.5.1,dbus-python==1.2.16,debugpy==1.6.3,decorator==5.1.1,defusedxml==0.7.1,dill==0.3.5.1,distlib==0.3.6,distributed==2022.5.1,distro==1.7.0,dm-tree==0.1.6,docker-pycreds==0.4.0,docutils==0.16,emoji==1.7.0,entrypoints==0.4,execnet==1.9.0,executing==1.0.0,faiss==1.7.2,faiss-gpu==1.7.2,fastai==2.7.9,fastapi==0.85.0,fastavro==1.6.1,fastcore==1.5.27,fastdownload==0.0.7,fastjsonschema==2.16.1,fastprogress==1.0.3,fastrlock==0.8,feast==0.19.4,fiddle==0.2.2,filelock==3.8.0,flatbuffers==1.12,fonttools==4.37.3,frozenlist==1.3.1,fsspec==2022.5.0,gast==0.4.0,gevent==21.12.0,geventhttpclient==2.0.2,gitdb==4.0.9,GitPython==3.1.27,google==3.0.0,google-api-core==2.10.1,google-auth==2.11.1,google-auth-oauthlib==0.4.6,google-pasta==0.2.0,googleapis-common-protos==1.52.0,graphviz==0.20.1,greenlet==1.1.3,grpcio==1.41.0,grpcio-channelz==1.49.0,grpcio-reflection==1.48.1,grpclib==0.4.3,h11==0.13.0,h2==4.1.0,h5py==3.7.0,HeapDict==1.0.1,horovod==0.26.1,hpack==4.0.0,httptools==0.5.0,hugectr2onnx==0.0.0,huggingface-hub==0.9.1,hyperframe==6.0.1,idna==2.8,imagesize==1.4.1,implicit==0.6.1,importlib-metadata==4.12.0,importlib-resources==5.9.0,iniconfig==1.1.1,ipykernel==6.15.3,ipython==8.5.0,ipython-genutils==0.2.0,ipywidgets==7.7.0,jedi==0.18.1,Jinja2==3.1.2,jmespath==1.0.1,joblib==1.2.0,json5==0.9.10,jsonschema==4.16.0,jupyter-cache==0.4.3,jupyter-core==4.11.1,jupyter-server==1.18.1,jupyter-server-mathjax==0.2.5,jupyter-sphinx==0.3.2,jupyter_client==7.3.5,jupyterlab==3.4.7,jupyterlab-pygments==0.2.2,jupyterlab-widgets==1.1.0,jupyterlab_server==2.15.1,keras==2.9.0,Keras-Preprocessing==1.1.2,kiwisolver==1.4.4,lazy-object-proxy==1.8.0,libclang==14.0.6,libcst==0.4.7,lightfm==1.16,lightgbm==3.3.2,linkify-it-py==1.0.3,llvmlite==0.39.1,locket==1.0.0,lxml==4.9.1,Mako==1.2.4,Markdown==3.4.1,markdown-it-py==1.1.0,MarkupSafe==2.1.1,matplotlib==3.6.0,matplotlib-inline==0.1.6,mdit-py-plugins==0.2.8,merlin-core==0.8.0+35.gf48634c,merlin-dataloader==0.0.2,merlin-models==0.7.0+11.g280956aa4,merlin-systems==0.5.0+4.g15074ad,mistune==2.0.4,mmh3==3.0.0,mpi4py==3.1.3,msgpack==1.0.4,multidict==6.0.2,mypy-extensions==0.4.3,myst-nb==0.13.2,myst-parser==0.15.2,natsort==8.1.0,nbclassic==0.4.3,nbclient==0.6.8,nbconvert==7.0.0,nbdime==3.1.1,nbformat==5.5.0,nest-asyncio==1.5.5,ninja==1.10.2.3,notebook==6.4.12,notebook-shim==0.1.0,numba==0.56.2,numpy==1.22.4,nvidia-pyindex==1.0.9,nvtabular @ 
git+/~https://github.com/NVIDIA-Merlin/NVTabular.git@21117cfc4c113b30036afcb97b6daa5f377996db,nvtx==0.2.5,oauthlib==3.2.1,oldest-supported-numpy==2022.8.16,onnx==1.12.0,onnxruntime==1.11.1,opt-einsum==3.3.0,optuna==3.0.3,packaging==21.3,pandas==1.3.5,pandavro==1.5.2,pandocfilters==1.5.0,parso==0.8.3,partd==1.3.0,pathtools==0.1.2,pbr==5.11.0,pexpect==4.8.0,pickleshare==0.7.5,Pillow==9.2.0,pkgutil_resolve_name==1.3.10,platformdirs==2.5.2,plotly==5.11.0,pluggy==1.0.0,prettytable==3.5.0,prometheus-client==0.14.1,promise==2.3,prompt-toolkit==3.0.31,proto-plus==1.19.6,protobuf==3.19.5,psutil==5.9.2,ptyprocess==0.7.0,pure-eval==0.2.2,py==1.11.0,pyarrow==7.0.0,pyasn1==0.4.8,pyasn1-modules==0.2.8,pybind11==2.10.0,pycparser==2.21,pydantic==1.10.2,pydot==1.4.2,Pygments==2.13.0,PyGObject==3.36.0,pynvml==11.4.1,pyparsing==3.0.9,pyperclip==1.8.2,pyrsistent==0.18.1,pytest==7.1.3,pytest-cov==4.0.0,pytest-xdist==3.0.2,python-apt==2.0.0+ubuntu0.20.4.8,python-dateutil==2.8.2,python-dotenv==0.21.0,python-rapidjson==1.8,pytz==2022.2.1,PyYAML==5.4.1,pyzmq==24.0.0,regex==2022.9.13,requests==2.22.0,requests-oauthlib==1.3.1,requests-unixsocket==0.2.0,rsa==4.7.2,s3fs==2022.2.0,s3transfer==0.6.0,sacremoses==0.0.53,scikit-build==0.15.0,scikit-learn==1.1.2,scipy==1.8.1,seedir==0.3.0,Send2Trash==1.8.0,sentry-sdk==1.9.8,setproctitle==1.3.2,setuptools-scm==7.0.5,shortuuid==1.0.9,six==1.15.0,sklearn==0.0,smmap==5.0.0,sniffio==1.3.0,snowballstemmer==2.2.0,sortedcontainers==2.4.0,soupsieve==2.3.2.post1,Sphinx==5.3.0,sphinx-multiversion==0.2.4,sphinx-togglebutton==0.3.1,sphinx_external_toc==0.3.0,sphinxcontrib-applehelp==1.0.2,sphinxcontrib-copydirs @ git+/~https://github.com/mikemckiernan/sphinxcontrib-copydirs.git@bd8c5d79b3f91cf5f1bb0d6995aeca3fe84b670e,sphinxcontrib-devhelp==1.0.2,sphinxcontrib-htmlhelp==2.0.0,sphinxcontrib-jsmath==1.0.1,sphinxcontrib-qthelp==1.0.3,sphinxcontrib-serializinghtml==1.1.5,SQLAlchemy==1.4.44,stack-data==0.5.0,starlette==0.20.4,stevedore==4.1.1,stringcase==1.2.0,supervisor==4.1.0,tabulate==0.8.10,tblib==1.7.0,tdqm==0.0.1,tenacity==8.0.1,tensorboard==2.9.1,tensorboard-data-server==0.6.1,tensorboard-plugin-wit==1.8.1,tensorflow==2.9.2,tensorflow-estimator==2.9.0,tensorflow-gpu==2.9.2,tensorflow-io-gcs-filesystem==0.27.0,tensorflow-metadata==1.10.0,termcolor==2.0.1,terminado==0.15.0,testbook==0.4.2,threadpoolctl==3.1.0,tinycss2==1.1.1,tokenizers==0.10.3,toml==0.10.2,tomli==2.0.1,toolz==0.12.0,torch==1.12.1+cu113,torchmetrics==0.3.2,tornado==6.2,tox==3.26.0,tqdm==4.64.1,traitlets==5.4.0,transformers==4.12.0,transformers4rec==0.1.12+2.gbcc939255,treelite==2.3.0,treelite-runtime==2.3.0,tritonclient==2.25.0,typing-inspect==0.8.0,typing_extensions==4.3.0,uc-micro-py==1.0.1,urllib3==1.26.12,uvicorn==0.18.3,uvloop==0.17.0,versioneer==0.20,virtualenv==20.16.5,wandb==0.13.3,watchfiles==0.17.0,wcwidth==0.2.5,webencodings==0.5.1,websocket-client==1.4.1,websockets==10.3,Werkzeug==2.2.2,widgetsnbextension==3.6.0,wrapt==1.12.1,xgboost==1.6.2,yarl==1.8.1,zict==2.2.0,zipp==3.8.1,zope.event==4.5.0,zope.interface==5.4.0
test-gpu run-test-pre: PYTHONHASHSEED='1681900638'
test-gpu run-test: commands[0] | python -m pytest --cov-report term --cov merlin -rxs tests/unit
============================= test session starts ==============================
platform linux -- Python 3.8.10, pytest-7.1.3, pluggy-1.0.0
cachedir: .tox/test-gpu/.pytest_cache
rootdir: /var/jenkins_home/workspace/merlin_core/core, configfile: pyproject.toml
plugins: anyio-3.5.0, xdist-3.0.2, cov-4.0.0
collected 394 items / 1 skipped

tests/unit/core/test_dispatch.py .... [ 1%]
tests/unit/core/test_protocols.py ......... [ 3%]
tests/unit/core/test_version.py . [ 3%]
tests/unit/dag/test_base_operator.py .... [ 4%]
tests/unit/dag/test_column_selector.py .............................. [ 12%]
tests/unit/dag/test_dictarray.py ... [ 12%]
tests/unit/dag/test_executors.py .. [ 13%]
tests/unit/dag/test_graph.py .... [ 14%]
tests/unit/dag/ops/test_selection.py .... [ 15%]
tests/unit/io/test_io.py ............................................... [ 27%]
................................................................ [ 43%]
tests/unit/schema/test_column_schemas.py ........................... [ 50%]
tests/unit/schema/test_schema.py ............. [ 53%]
tests/unit/schema/test_schema_io.py .................................... [ 62%]
........................................................................ [ 81%]
........................................................... [ 96%]
tests/unit/schema/test_tags.py ....... [ 97%]
tests/unit/utils/test_utils.py ........ [100%]

=============================== warnings summary ===============================
tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 6 files. Did not have enough partitions to create 7 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 8 files. Did not have enough partitions to create 9 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 8 files. Did not have enough partitions to create 10 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 8 files. Did not have enough partitions to create 11 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 12 files. Did not have enough partitions to create 13 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 12 files. Did not have enough partitions to create 14 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 12 files. Did not have enough partitions to create 15 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 12 files. Did not have enough partitions to create 16 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 12 files. Did not have enough partitions to create 17 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 12 files. Did not have enough partitions to create 18 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 12 files. Did not have enough partitions to create 19 files.
warnings.warn(

tests/unit/io/test_io.py::test_validate_and_regenerate_dataset
/var/jenkins_home/workspace/merlin_core/core/merlin/io/parquet.py:551: DeprecationWarning: 'ParquetDataset.pieces' attribute is deprecated as of pyarrow 5.0.0 and will be removed in a future version. Specify 'use_legacy_dataset=False' while constructing the ParquetDataset, and then use the '.fragments' attribute instead.
paths = [p.path for p in pa_dataset.pieces]

tests/unit/schema/test_column_schemas.py::test_column_schema_tags_normalize
tests/unit/schema/test_tags.py::test_tagset_atomizes_compound_tags
/var/jenkins_home/workspace/merlin_core/core/merlin/schema/tags.py:148: UserWarning: Compound tags like Tags.ITEM_ID have been deprecated and will be removed in a future version. Please use the atomic versions of these tags, like [<Tags.ITEM: 'item'>, <Tags.ID: 'id'>].
warnings.warn(

tests/unit/schema/test_schema_io.py::test_json_serialization_with_embedded_dicts
tests/unit/schema/test_schema_io.py::test_merlin_to_proto_to_json_to_merlin
tests/unit/schema/test_tags.py::test_tagset_atomizes_compound_tags
/var/jenkins_home/workspace/merlin_core/core/merlin/schema/tags.py:148: UserWarning: Compound tags like Tags.USER_ID have been deprecated and will be removed in a future version. Please use the atomic versions of these tags, like [<Tags.USER: 'user'>, <Tags.ID: 'id'>].
warnings.warn(

tests/unit/schema/test_tags.py::test_tagset_atomizes_compound_tags
/var/jenkins_home/workspace/merlin_core/core/merlin/schema/tags.py:148: UserWarning: Compound tags like Tags.SESSION_ID have been deprecated and will be removed in a future version. Please use the atomic versions of these tags, like [<Tags.SESSION: 'session'>, <Tags.ID: 'id'>].
warnings.warn(

tests/unit/schema/test_tags.py::test_tagset_atomizes_compound_tags
/var/jenkins_home/workspace/merlin_core/core/merlin/schema/tags.py:148: UserWarning: Compound tags like Tags.TEXT_TOKENIZED have been deprecated and will be removed in a future version. Please use the atomic versions of these tags, like [<Tags.TEXT: 'text'>, <Tags.TOKENIZED: 'tokenized'>].
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed[True-True]
/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 38453 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed[True-False]
/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 45769 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed[False-True]
/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 35615 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed[False-False]
/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 37635 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed_force[True]
/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 41337 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed_force[False]
/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 38811 instead
warnings.warn(

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html

---------- coverage: platform linux, python 3.8.10-final-0 -----------
Name Stmts Miss Cover
-------------------------------------------------------------
merlin/core/__init__.py 2 0 100%
merlin/core/_version.py 354 205 42%
merlin/core/compat.py 10 4 60%
merlin/core/dispatch.py 356 212 40%
merlin/core/protocols.py 99 45 55%
merlin/core/utils.py 195 56 71%
merlin/dag/__init__.py 5 0 100%
merlin/dag/base_operator.py 122 20 84%
merlin/dag/dictarray.py 55 15 73%
merlin/dag/executors.py 141 68 52%
merlin/dag/graph.py 99 35 65%
merlin/dag/node.py 344 161 53%
merlin/dag/ops/__init__.py 4 0 100%
merlin/dag/ops/concat_columns.py 17 4 76%
merlin/dag/ops/selection.py 22 0 100%
merlin/dag/ops/subset_columns.py 12 4 67%
merlin/dag/ops/subtraction.py 21 11 48%
merlin/dag/selector.py 101 6 94%
merlin/io/__init__.py 4 0 100%
merlin/io/avro.py 88 88 0%
merlin/io/csv.py 57 6 89%
merlin/io/dask.py 181 53 71%
merlin/io/dataframe_engine.py 61 5 92%
merlin/io/dataframe_iter.py 21 1 95%
merlin/io/dataset.py 347 54 84%
merlin/io/dataset_engine.py 37 8 78%
merlin/io/fsspec_utils.py 127 108 15%
merlin/io/hugectr.py 45 35 22%
merlin/io/parquet.py 603 69 89%
merlin/io/shuffle.py 38 12 68%
merlin/io/worker.py 80 66 18%
merlin/io/writer.py 190 52 73%
merlin/io/writer_factory.py 18 4 78%
merlin/schema/__init__.py 2 0 100%
merlin/schema/io/__init__.py 0 0 100%
merlin/schema/io/proto_utils.py 20 4 80%
merlin/schema/io/schema_bp.py 306 5 98%
merlin/schema/io/tensorflow_metadata.py 190 17 91%
merlin/schema/schema.py 224 33 85%
merlin/schema/tags.py 82 1 99%
-------------------------------------------------------------
TOTAL 4680 1467 69%

=========================== short test summary info ============================
SKIPPED [1] tests/unit/io/test_avro.py:34: could not import 'uavro': No module named 'uavro'
============ 394 passed, 1 skipped, 25 warnings in 64.63s (0:01:04) ============
___________________________________ summary ____________________________________
test-gpu: commands succeeded
congratulations :)
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA-Merlin/core/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[merlin_core] $ /bin/bash /tmp/jenkins13487378482100226259.sh
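
With the guard in place the suite passes, including `tests/unit/schema/test_schema_io.py::test_merlin_to_proto_to_json_to_merlin` seen in the run above. A minimal round-trip sketch of what that style of test exercises (the column name and bounds are illustrative, and the `TensorflowMetadata` calls are assumed to match this release's public API):

```python
from merlin.schema import ColumnSchema, Schema
from merlin.schema.io.tensorflow_metadata import TensorflowMetadata

# A ragged list column whose value_count should survive a round trip.
schema = Schema([ColumnSchema("item_ids", properties={"value_count": {"min": 1, "max": 4}})])

# Merlin schema -> TFMD proto -> JSON -> Merlin schema.
json_blob = TensorflowMetadata.from_merlin_schema(schema).to_json()
restored = TensorflowMetadata.from_json(json_blob).to_merlin_schema()

assert restored == schema  # value_count, is_list, and is_ragged all round-trip
```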
