Update value_count serialization/deserialization to be consistent with original schema #111

Merged
merged 21 commits on Nov 19, 2022

Conversation

@oliverholworthy (Member) commented Aug 3, 2022

Goal

  • Ensure that the Schema (ColumnSchema) value_count property is restored to the same value when serialized and deserialized.
  • Update the default values of is_list/is_ragged to be inferred from the value_count property.

Details

value_count property serialization/deserialization:

  • fails to serialize to json
    • value_count = {'max': 1, 'min': 1}
    • AttributeError: 'int' object has no attribute '_serialized_on_wire'
  • silently fails to re-construct the same schema object (serialize -> deserialize)
    • value_count = {'max': 0, 'min': 0}
    • value_count = {'max': 2, 'min': 1}

These value_count examples may not be an issue in practice, but they raise questions about where validation should take place for the Schema (at construction time or at serialization time), and about what assumptions or constraints apply during serialization.
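
A minimal sketch of the round trip (mirroring the test added in this PR; the exact import locations are assumptions based on the merlin/schema/io/tensorflow_metadata.py path in the CI logs below):

    from merlin.schema import ColumnSchema, Schema  # assumed import location
    from merlin.schema.io.tensorflow_metadata import TensorflowMetadata

    # A column carrying a value_count property.
    schema = Schema(
        [
            ColumnSchema(
                "example",
                properties={"value_count": {"min": 1, "max": 1}},
                is_list=True,
            )
        ]
    )

    # Before this fix: raises AttributeError: 'int' object has no attribute '_serialized_on_wire'
    json_schema = TensorflowMetadata.from_merlin_schema(schema).to_json()

    # Before this fix: with value_count = {'min': 0, 'max': 0} the property is
    # silently dropped, so the deserialized schema is not equal to the original.
    output_schema = TensorflowMetadata.from_json(json_schema).to_merlin_schema()
    assert output_schema == schema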

is_list/is_ragged default value

  • value_count max > min
    • implies that is_ragged=True, is_list=True
  • value_count max == min (and nonzero)
    • implies that is_ragged=False, is_list=True
  • value_count max == 0
    • implies that is_ragged=False, is_list=False
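
A minimal sketch of these defaults, assuming value_count is a plain dict (infer_list_flags is an illustrative name, not the PR's actual helper):

    def infer_list_flags(value_count):
        """Return the (is_list, is_ragged) defaults implied by a value_count property."""
        min_count = value_count.get("min", 0)
        max_count = value_count.get("max", 0)
        if max_count == 0:
            return False, False  # no values per row -> scalar column
        if max_count > min_count:
            return True, True    # variable-length rows -> ragged list
        return True, False       # fixed-length rows (max == min) -> non-ragged list

    assert infer_list_flags({"min": 1, "max": 2}) == (True, True)
    assert infer_list_flags({"min": 1, "max": 1}) == (True, False)
    assert infer_list_flags({"min": 0, "max": 0}) == (False, False)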

@nvidia-merlin-bot

Click to view CI Results
GitHub pull request #111 of commit 6b66c872515fc178ad2b55b1ce97837aa31ac558, no merge conflicts.
Running as SYSTEM
Setting status of 6b66c872515fc178ad2b55b1ce97837aa31ac558 to PENDING with url https://10.20.13.93:8080/job/merlin_core/87/console and message: 'Pending'
Using context: Jenkins
Building on master in workspace /var/jenkins_home/workspace/merlin_core
using credential ce87ff3c-94f0-400a-8303-cb4acb4918b5
 > git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > git config remote.origin.url /~https://github.com/NVIDIA-Merlin/core # timeout=10
Fetching upstream changes from /~https://github.com/NVIDIA-Merlin/core
 > git --version # timeout=10
using GIT_ASKPASS to set credentials login for merlin-systems username and pass
 > git fetch --tags --force --progress -- /~https://github.com/NVIDIA-Merlin/core +refs/pull/111/*:refs/remotes/origin/pr/111/* # timeout=10
 > git rev-parse 6b66c872515fc178ad2b55b1ce97837aa31ac558^{commit} # timeout=10
Checking out Revision 6b66c872515fc178ad2b55b1ce97837aa31ac558 (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 6b66c872515fc178ad2b55b1ce97837aa31ac558 # timeout=10
Commit message: "Add test for value_count property"
 > git rev-list --no-walk a5137c1f066c1577794aa12ad48d20f21d9b6afb # timeout=10
[merlin_core] $ /bin/bash /tmp/jenkins2111088128308957226.sh
============================= test session starts ==============================
platform linux -- Python 3.8.10, pytest-7.1.2, pluggy-1.0.0
rootdir: /var/jenkins_home/workspace/merlin_core/core, configfile: pyproject.toml
plugins: anyio-3.6.1, xdist-2.5.0, forked-1.4.0, cov-3.0.0
collected 344 items / 1 skipped

tests/unit/core/test_dispatch.py .. [ 0%]
tests/unit/dag/test_base_operator.py .... [ 1%]
tests/unit/dag/test_column_selector.py .......................... [ 9%]
tests/unit/dag/test_graph.py . [ 9%]
tests/unit/dag/test_tags.py ...... [ 11%]
tests/unit/dag/ops/test_selection.py ... [ 12%]
tests/unit/io/test_io.py ..................................FFFF......... [ 25%]
................................................................ [ 44%]
tests/unit/schema/test_column_schemas.py ............................... [ 53%]
........................................................................ [ 74%]
....................................................................... [ 95%]
tests/unit/schema/test_schema.py ...... [ 96%]
tests/unit/schema/test_schema_io.py ..F [ 97%]
tests/unit/utils/test_utils.py ........ [100%]

=================================== FAILURES ===================================
_________________ test_dask_dataset_from_dataframe[True-cudf] __________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-3/test_dask_dataset_from_datafra4')
origin = 'cudf', cpu = True

@pytest.mark.parametrize("origin", ["cudf", "dask_cudf", "pd", "dd"])
@pytest.mark.parametrize("cpu", [None, True])
def test_dask_dataset_from_dataframe(tmpdir, origin, cpu):

    # Generate a DataFrame-based input
    if origin in ("pd", "dd"):
        df = pd.DataFrame({"a": range(100)})
        if origin == "dd":
            df = dask.dataframe.from_pandas(df, npartitions=4)
    elif origin in ("cudf", "dask_cudf"):
        df = cudf.DataFrame({"a": range(100)})
        if origin == "dask_cudf":
            df = dask_cudf.from_cudf(df, npartitions=4)

    # Convert to an NVTabular Dataset and back to a ddf
    dataset = merlin.io.Dataset(df, cpu=cpu)
    result = dataset.to_ddf()

    # Check resulting data
    assert_eq(df, result)

    # Check that the cpu kwarg is working correctly
    if cpu:
        assert isinstance(result.compute(), pd.DataFrame)

        # Should still work if we move to the GPU
        # (test behavior after repetitive conversion)
        dataset.to_gpu()
        dataset.to_cpu()
        dataset.to_cpu()
        dataset.to_gpu()
        result = dataset.to_ddf()
        assert isinstance(result.compute(), cudf.DataFrame)
        dataset.to_cpu()
    else:
        assert isinstance(result.compute(), cudf.DataFrame)

        # Should still work if we move to the CPU
        # (test behavior after repetitive conversion)
        dataset.to_cpu()
        dataset.to_gpu()
        dataset.to_gpu()
        dataset.to_cpu()
        result = dataset.to_ddf()
        assert isinstance(result.compute(), pd.DataFrame)
        dataset.to_gpu()

    # Write to disk and read back
    path = str(tmpdir)
    dataset.to_parquet(path, out_files_per_proc=1, shuffle=None)
  ddf_check = dask_cudf.read_parquet(path).compute()

tests/unit/io/test_io.py:290:


/usr/local/lib/python3.8/dist-packages/dask/base.py:288: in compute
(result,) = compute(self, traverse=False, **kwargs)
/usr/local/lib/python3.8/dist-packages/dask/base.py:571: in compute
results = schedule(dsk, keys, **kwargs)
/usr/local/lib/python3.8/dist-packages/dask/local.py:553: in get_sync
return get_async(
/usr/local/lib/python3.8/dist-packages/dask/local.py:496: in get_async
for key, res_info, failed in queue_get(queue).result():
/usr/lib/python3.8/concurrent/futures/_base.py:437: in result
return self.__get_result()
/usr/lib/python3.8/concurrent/futures/_base.py:389: in __get_result
raise self._exception
/usr/local/lib/python3.8/dist-packages/dask/local.py:538: in submit
fut.set_result(fn(*args, **kwargs))
/usr/local/lib/python3.8/dist-packages/dask/local.py:234: in batch_execute_tasks
return [execute_task(a) for a in it]
/usr/local/lib/python3.8/dist-packages/dask/local.py:234: in <listcomp>
return [execute_task(a) for a in it]
/usr/local/lib/python3.8/dist-packages/dask/local.py:225: in execute_task
result = pack_exception(e, dumps)
/usr/local/lib/python3.8/dist-packages/dask/local.py:220: in execute_task
result = _execute_task(task, data)
/usr/local/lib/python3.8/dist-packages/dask/core.py:119: in _execute_task
return func(
(_execute_task(a, cache) for a in args))
/usr/local/lib/python3.8/dist-packages/dask/optimization.py:969: in __call__
return core.get(self.dsk, self.outkey, dict(zip(self.inkeys, args)))
/usr/local/lib/python3.8/dist-packages/dask/core.py:149: in get
result = _execute_task(task, cache)
/usr/local/lib/python3.8/dist-packages/dask/core.py:119: in _execute_task
return func(
(_execute_task(a, cache) for a in args))
/usr/local/lib/python3.8/dist-packages/dask/dataframe/io/parquet/core.py:87: in __call__
return read_parquet_part(
/usr/local/lib/python3.8/dist-packages/dask/dataframe/io/parquet/core.py:431: in read_parquet_part
dfs = [
/usr/local/lib/python3.8/dist-packages/dask/dataframe/io/parquet/core.py:432: in <listcomp>
func(fs, rg, columns.copy(), index, **toolz.merge(kwargs, kw))
/usr/local/lib/python3.8/dist-packages/dask_cudf/io/parquet.py:216: in read_partition
cls._read_paths(
/usr/local/lib/python3.8/dist-packages/dask_cudf/io/parquet.py:92: in _read_paths
df = cudf.read_parquet(
/usr/local/lib/python3.8/dist-packages/nvtx/nvtx.py:101: in inner
result = func(*args, **kwargs)
/usr/local/lib/python3.8/dist-packages/cudf/io/parquet.py:379: in read_parquet
) = _process_dataset(
/usr/local/lib/python3.8/dist-packages/nvtx/nvtx.py:101: in inner
result = func(*args, **kwargs)
/usr/local/lib/python3.8/dist-packages/cudf/io/parquet.py:205: in _process_dataset
dataset = ds.dataset(
/usr/local/lib/python3.8/dist-packages/pyarrow/dataset.py:683: in dataset
return _filesystem_dataset(source, **kwargs)
/usr/local/lib/python3.8/dist-packages/pyarrow/dataset.py:435: in _filesystem_dataset
return factory.finish(schema)
pyarrow/_dataset.pyx:2473: in pyarrow._dataset.DatasetFactory.finish
???
pyarrow/error.pxi:143: in pyarrow.lib.pyarrow_internal_check_status
???


???
E pyarrow.lib.ArrowInvalid: Error creating dataset. Could not read schema from '/tmp/pytest-of-jenkins/pytest-3/test_dask_dataset_from_datafra4/part_0.parquet': Could not open Parquet input source '/tmp/pytest-of-jenkins/pytest-3/test_dask_dataset_from_datafra4/part_0.parquet': Parquet file size is 0 bytes. Is this a 'parquet' file?

pyarrow/error.pxi:99: ArrowInvalid
_______________ test_dask_dataset_from_dataframe[True-dask_cudf] _______________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-3/test_dask_dataset_from_datafra5')
origin = 'dask_cudf', cpu = True

@pytest.mark.parametrize("origin", ["cudf", "dask_cudf", "pd", "dd"])
@pytest.mark.parametrize("cpu", [None, True])
def test_dask_dataset_from_dataframe(tmpdir, origin, cpu):

    # Generate a DataFrame-based input
    if origin in ("pd", "dd"):
        df = pd.DataFrame({"a": range(100)})
        if origin == "dd":
            df = dask.dataframe.from_pandas(df, npartitions=4)
    elif origin in ("cudf", "dask_cudf"):
        df = cudf.DataFrame({"a": range(100)})
        if origin == "dask_cudf":
            df = dask_cudf.from_cudf(df, npartitions=4)

    # Convert to an NVTabular Dataset and back to a ddf
    dataset = merlin.io.Dataset(df, cpu=cpu)
    result = dataset.to_ddf()

    # Check resulting data
    assert_eq(df, result)

    # Check that the cpu kwarg is working correctly
    if cpu:
        assert isinstance(result.compute(), pd.DataFrame)

        # Should still work if we move to the GPU
        # (test behavior after repetitive conversion)
        dataset.to_gpu()
        dataset.to_cpu()
        dataset.to_cpu()
        dataset.to_gpu()
        result = dataset.to_ddf()
        assert isinstance(result.compute(), cudf.DataFrame)
        dataset.to_cpu()
    else:
        assert isinstance(result.compute(), cudf.DataFrame)

        # Should still work if we move to the CPU
        # (test behavior after repetitive conversion)
        dataset.to_cpu()
        dataset.to_gpu()
        dataset.to_gpu()
        dataset.to_cpu()
        result = dataset.to_ddf()
        assert isinstance(result.compute(), pd.DataFrame)
        dataset.to_gpu()

    # Write to disk and read back
    path = str(tmpdir)
    dataset.to_parquet(path, out_files_per_proc=1, shuffle=None)
  ddf_check = dask_cudf.read_parquet(path).compute()

tests/unit/io/test_io.py:290:


/usr/local/lib/python3.8/dist-packages/dask/base.py:288: in compute
(result,) = compute(self, traverse=False, **kwargs)
/usr/local/lib/python3.8/dist-packages/dask/base.py:571: in compute
results = schedule(dsk, keys, **kwargs)
/usr/local/lib/python3.8/dist-packages/dask/local.py:553: in get_sync
return get_async(
/usr/local/lib/python3.8/dist-packages/dask/local.py:496: in get_async
for key, res_info, failed in queue_get(queue).result():
/usr/lib/python3.8/concurrent/futures/_base.py:437: in result
return self.__get_result()
/usr/lib/python3.8/concurrent/futures/_base.py:389: in __get_result
raise self._exception
/usr/local/lib/python3.8/dist-packages/dask/local.py:538: in submit
fut.set_result(fn(*args, **kwargs))
/usr/local/lib/python3.8/dist-packages/dask/local.py:234: in batch_execute_tasks
return [execute_task(a) for a in it]
/usr/local/lib/python3.8/dist-packages/dask/local.py:234: in <listcomp>
return [execute_task(a) for a in it]
/usr/local/lib/python3.8/dist-packages/dask/local.py:225: in execute_task
result = pack_exception(e, dumps)
/usr/local/lib/python3.8/dist-packages/dask/local.py:220: in execute_task
result = _execute_task(task, data)
/usr/local/lib/python3.8/dist-packages/dask/core.py:119: in _execute_task
return func(
(_execute_task(a, cache) for a in args))
/usr/local/lib/python3.8/dist-packages/dask/optimization.py:969: in __call__
return core.get(self.dsk, self.outkey, dict(zip(self.inkeys, args)))
/usr/local/lib/python3.8/dist-packages/dask/core.py:149: in get
result = _execute_task(task, cache)
/usr/local/lib/python3.8/dist-packages/dask/core.py:119: in _execute_task
return func(
(_execute_task(a, cache) for a in args))
/usr/local/lib/python3.8/dist-packages/dask/dataframe/io/parquet/core.py:87: in __call__
return read_parquet_part(
/usr/local/lib/python3.8/dist-packages/dask/dataframe/io/parquet/core.py:431: in read_parquet_part
dfs = [
/usr/local/lib/python3.8/dist-packages/dask/dataframe/io/parquet/core.py:432: in <listcomp>
func(fs, rg, columns.copy(), index, **toolz.merge(kwargs, kw))
/usr/local/lib/python3.8/dist-packages/dask_cudf/io/parquet.py:216: in read_partition
cls._read_paths(
/usr/local/lib/python3.8/dist-packages/dask_cudf/io/parquet.py:92: in _read_paths
df = cudf.read_parquet(
/usr/local/lib/python3.8/dist-packages/nvtx/nvtx.py:101: in inner
result = func(*args, **kwargs)
/usr/local/lib/python3.8/dist-packages/cudf/io/parquet.py:379: in read_parquet
) = _process_dataset(
/usr/local/lib/python3.8/dist-packages/nvtx/nvtx.py:101: in inner
result = func(*args, **kwargs)
/usr/local/lib/python3.8/dist-packages/cudf/io/parquet.py:205: in _process_dataset
dataset = ds.dataset(
/usr/local/lib/python3.8/dist-packages/pyarrow/dataset.py:683: in dataset
return _filesystem_dataset(source, **kwargs)
/usr/local/lib/python3.8/dist-packages/pyarrow/dataset.py:435: in _filesystem_dataset
return factory.finish(schema)
pyarrow/_dataset.pyx:2473: in pyarrow._dataset.DatasetFactory.finish
???
pyarrow/error.pxi:143: in pyarrow.lib.pyarrow_internal_check_status
???


???
E pyarrow.lib.ArrowInvalid: Error creating dataset. Could not read schema from '/tmp/pytest-of-jenkins/pytest-3/test_dask_dataset_from_datafra5/part_0.parquet': Could not open Parquet input source '/tmp/pytest-of-jenkins/pytest-3/test_dask_dataset_from_datafra5/part_0.parquet': Parquet file size is 0 bytes. Is this a 'parquet' file?

pyarrow/error.pxi:99: ArrowInvalid
__________________ test_dask_dataset_from_dataframe[True-pd] ___________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-3/test_dask_dataset_from_datafra6')
origin = 'pd', cpu = True

@pytest.mark.parametrize("origin", ["cudf", "dask_cudf", "pd", "dd"])
@pytest.mark.parametrize("cpu", [None, True])
def test_dask_dataset_from_dataframe(tmpdir, origin, cpu):

    # Generate a DataFrame-based input
    if origin in ("pd", "dd"):
        df = pd.DataFrame({"a": range(100)})
        if origin == "dd":
            df = dask.dataframe.from_pandas(df, npartitions=4)
    elif origin in ("cudf", "dask_cudf"):
        df = cudf.DataFrame({"a": range(100)})
        if origin == "dask_cudf":
            df = dask_cudf.from_cudf(df, npartitions=4)

    # Convert to an NVTabular Dataset and back to a ddf
    dataset = merlin.io.Dataset(df, cpu=cpu)
    result = dataset.to_ddf()

    # Check resulting data
    assert_eq(df, result)

    # Check that the cpu kwarg is working correctly
    if cpu:
        assert isinstance(result.compute(), pd.DataFrame)

        # Should still work if we move to the GPU
        # (test behavior after repetitive conversion)
        dataset.to_gpu()
        dataset.to_cpu()
        dataset.to_cpu()
        dataset.to_gpu()
        result = dataset.to_ddf()
        assert isinstance(result.compute(), cudf.DataFrame)
        dataset.to_cpu()
    else:
        assert isinstance(result.compute(), cudf.DataFrame)

        # Should still work if we move to the CPU
        # (test behavior after repetitive conversion)
        dataset.to_cpu()
        dataset.to_gpu()
        dataset.to_gpu()
        dataset.to_cpu()
        result = dataset.to_ddf()
        assert isinstance(result.compute(), pd.DataFrame)
        dataset.to_gpu()

    # Write to disk and read back
    path = str(tmpdir)
    dataset.to_parquet(path, out_files_per_proc=1, shuffle=None)
  ddf_check = dask_cudf.read_parquet(path).compute()

tests/unit/io/test_io.py:290:


/usr/local/lib/python3.8/dist-packages/dask/base.py:288: in compute
(result,) = compute(self, traverse=False, **kwargs)
/usr/local/lib/python3.8/dist-packages/dask/base.py:571: in compute
results = schedule(dsk, keys, **kwargs)
/usr/local/lib/python3.8/dist-packages/dask/local.py:553: in get_sync
return get_async(
/usr/local/lib/python3.8/dist-packages/dask/local.py:496: in get_async
for key, res_info, failed in queue_get(queue).result():
/usr/lib/python3.8/concurrent/futures/_base.py:437: in result
return self.__get_result()
/usr/lib/python3.8/concurrent/futures/_base.py:389: in __get_result
raise self._exception
/usr/local/lib/python3.8/dist-packages/dask/local.py:538: in submit
fut.set_result(fn(*args, **kwargs))
/usr/local/lib/python3.8/dist-packages/dask/local.py:234: in batch_execute_tasks
return [execute_task(a) for a in it]
/usr/local/lib/python3.8/dist-packages/dask/local.py:234: in <listcomp>
return [execute_task(a) for a in it]
/usr/local/lib/python3.8/dist-packages/dask/local.py:225: in execute_task
result = pack_exception(e, dumps)
/usr/local/lib/python3.8/dist-packages/dask/local.py:220: in execute_task
result = _execute_task(task, data)
/usr/local/lib/python3.8/dist-packages/dask/core.py:119: in _execute_task
return func(
(_execute_task(a, cache) for a in args))
/usr/local/lib/python3.8/dist-packages/dask/optimization.py:969: in __call__
return core.get(self.dsk, self.outkey, dict(zip(self.inkeys, args)))
/usr/local/lib/python3.8/dist-packages/dask/core.py:149: in get
result = _execute_task(task, cache)
/usr/local/lib/python3.8/dist-packages/dask/core.py:119: in _execute_task
return func(
(_execute_task(a, cache) for a in args))
/usr/local/lib/python3.8/dist-packages/dask/dataframe/io/parquet/core.py:87: in __call__
return read_parquet_part(
/usr/local/lib/python3.8/dist-packages/dask/dataframe/io/parquet/core.py:431: in read_parquet_part
dfs = [
/usr/local/lib/python3.8/dist-packages/dask/dataframe/io/parquet/core.py:432: in <listcomp>
func(fs, rg, columns.copy(), index, **toolz.merge(kwargs, kw))
/usr/local/lib/python3.8/dist-packages/dask_cudf/io/parquet.py:216: in read_partition
cls._read_paths(
/usr/local/lib/python3.8/dist-packages/dask_cudf/io/parquet.py:92: in _read_paths
df = cudf.read_parquet(
/usr/local/lib/python3.8/dist-packages/nvtx/nvtx.py:101: in inner
result = func(*args, **kwargs)
/usr/local/lib/python3.8/dist-packages/cudf/io/parquet.py:379: in read_parquet
) = _process_dataset(
/usr/local/lib/python3.8/dist-packages/nvtx/nvtx.py:101: in inner
result = func(*args, **kwargs)
/usr/local/lib/python3.8/dist-packages/cudf/io/parquet.py:205: in _process_dataset
dataset = ds.dataset(
/usr/local/lib/python3.8/dist-packages/pyarrow/dataset.py:683: in dataset
return _filesystem_dataset(source, **kwargs)
/usr/local/lib/python3.8/dist-packages/pyarrow/dataset.py:435: in _filesystem_dataset
return factory.finish(schema)
pyarrow/_dataset.pyx:2473: in pyarrow._dataset.DatasetFactory.finish
???
pyarrow/error.pxi:143: in pyarrow.lib.pyarrow_internal_check_status
???


???
E pyarrow.lib.ArrowInvalid: Error creating dataset. Could not read schema from '/tmp/pytest-of-jenkins/pytest-3/test_dask_dataset_from_datafra6/part_0.parquet': Could not open Parquet input source '/tmp/pytest-of-jenkins/pytest-3/test_dask_dataset_from_datafra6/part_0.parquet': Parquet file size is 0 bytes. Is this a 'parquet' file?

pyarrow/error.pxi:99: ArrowInvalid
__________________ test_dask_dataset_from_dataframe[True-dd] ___________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-3/test_dask_dataset_from_datafra7')
origin = 'dd', cpu = True

@pytest.mark.parametrize("origin", ["cudf", "dask_cudf", "pd", "dd"])
@pytest.mark.parametrize("cpu", [None, True])
def test_dask_dataset_from_dataframe(tmpdir, origin, cpu):

    # Generate a DataFrame-based input
    if origin in ("pd", "dd"):
        df = pd.DataFrame({"a": range(100)})
        if origin == "dd":
            df = dask.dataframe.from_pandas(df, npartitions=4)
    elif origin in ("cudf", "dask_cudf"):
        df = cudf.DataFrame({"a": range(100)})
        if origin == "dask_cudf":
            df = dask_cudf.from_cudf(df, npartitions=4)

    # Convert to an NVTabular Dataset and back to a ddf
    dataset = merlin.io.Dataset(df, cpu=cpu)
    result = dataset.to_ddf()

    # Check resulting data
    assert_eq(df, result)

    # Check that the cpu kwarg is working correctly
    if cpu:
        assert isinstance(result.compute(), pd.DataFrame)

        # Should still work if we move to the GPU
        # (test behavior after repetitive conversion)
        dataset.to_gpu()
        dataset.to_cpu()
        dataset.to_cpu()
        dataset.to_gpu()
        result = dataset.to_ddf()
        assert isinstance(result.compute(), cudf.DataFrame)
        dataset.to_cpu()
    else:
        assert isinstance(result.compute(), cudf.DataFrame)

        # Should still work if we move to the CPU
        # (test behavior after repetitive conversion)
        dataset.to_cpu()
        dataset.to_gpu()
        dataset.to_gpu()
        dataset.to_cpu()
        result = dataset.to_ddf()
        assert isinstance(result.compute(), pd.DataFrame)
        dataset.to_gpu()

    # Write to disk and read back
    path = str(tmpdir)
    dataset.to_parquet(path, out_files_per_proc=1, shuffle=None)
  ddf_check = dask_cudf.read_parquet(path).compute()

tests/unit/io/test_io.py:290:


/usr/local/lib/python3.8/dist-packages/dask/base.py:288: in compute
(result,) = compute(self, traverse=False, **kwargs)
/usr/local/lib/python3.8/dist-packages/dask/base.py:571: in compute
results = schedule(dsk, keys, **kwargs)
/usr/local/lib/python3.8/dist-packages/dask/local.py:553: in get_sync
return get_async(
/usr/local/lib/python3.8/dist-packages/dask/local.py:496: in get_async
for key, res_info, failed in queue_get(queue).result():
/usr/lib/python3.8/concurrent/futures/_base.py:437: in result
return self.__get_result()
/usr/lib/python3.8/concurrent/futures/_base.py:389: in __get_result
raise self._exception
/usr/local/lib/python3.8/dist-packages/dask/local.py:538: in submit
fut.set_result(fn(*args, **kwargs))
/usr/local/lib/python3.8/dist-packages/dask/local.py:234: in batch_execute_tasks
return [execute_task(a) for a in it]
/usr/local/lib/python3.8/dist-packages/dask/local.py:234: in <listcomp>
return [execute_task(a) for a in it]
/usr/local/lib/python3.8/dist-packages/dask/local.py:225: in execute_task
result = pack_exception(e, dumps)
/usr/local/lib/python3.8/dist-packages/dask/local.py:220: in execute_task
result = _execute_task(task, data)
/usr/local/lib/python3.8/dist-packages/dask/core.py:119: in _execute_task
return func(
(_execute_task(a, cache) for a in args))
/usr/local/lib/python3.8/dist-packages/dask/optimization.py:969: in __call__
return core.get(self.dsk, self.outkey, dict(zip(self.inkeys, args)))
/usr/local/lib/python3.8/dist-packages/dask/core.py:149: in get
result = _execute_task(task, cache)
/usr/local/lib/python3.8/dist-packages/dask/core.py:119: in _execute_task
return func(
(_execute_task(a, cache) for a in args))
/usr/local/lib/python3.8/dist-packages/dask/dataframe/io/parquet/core.py:87: in __call__
return read_parquet_part(
/usr/local/lib/python3.8/dist-packages/dask/dataframe/io/parquet/core.py:431: in read_parquet_part
dfs = [
/usr/local/lib/python3.8/dist-packages/dask/dataframe/io/parquet/core.py:432: in <listcomp>
func(fs, rg, columns.copy(), index, **toolz.merge(kwargs, kw))
/usr/local/lib/python3.8/dist-packages/dask_cudf/io/parquet.py:216: in read_partition
cls._read_paths(
/usr/local/lib/python3.8/dist-packages/dask_cudf/io/parquet.py:92: in _read_paths
df = cudf.read_parquet(
/usr/local/lib/python3.8/dist-packages/nvtx/nvtx.py:101: in inner
result = func(*args, **kwargs)
/usr/local/lib/python3.8/dist-packages/cudf/io/parquet.py:379: in read_parquet
) = _process_dataset(
/usr/local/lib/python3.8/dist-packages/nvtx/nvtx.py:101: in inner
result = func(*args, **kwargs)
/usr/local/lib/python3.8/dist-packages/cudf/io/parquet.py:205: in _process_dataset
dataset = ds.dataset(
/usr/local/lib/python3.8/dist-packages/pyarrow/dataset.py:683: in dataset
return _filesystem_dataset(source, **kwargs)
/usr/local/lib/python3.8/dist-packages/pyarrow/dataset.py:435: in _filesystem_dataset
return factory.finish(schema)
pyarrow/_dataset.pyx:2473: in pyarrow._dataset.DatasetFactory.finish
???
pyarrow/error.pxi:143: in pyarrow.lib.pyarrow_internal_check_status
???


???
E pyarrow.lib.ArrowInvalid: Error creating dataset. Could not read schema from '/tmp/pytest-of-jenkins/pytest-3/test_dask_dataset_from_datafra7/part_0.parquet': Could not open Parquet input source '/tmp/pytest-of-jenkins/pytest-3/test_dask_dataset_from_datafra7/part_0.parquet': Parquet file size is 0 bytes. Is this a 'parquet' file?

pyarrow/error.pxi:99: ArrowInvalid
_______________________________ test_value_count _______________________________

def test_value_count():
    schema = Schema(
        [
            ColumnSchema(
                "example",
                properties={
                    "value_count": {"min": 1, "max": 1},
                },
                is_list=True,
            )
        ]
    )
  json_schema = TensorflowMetadata.from_merlin_schema(schema).to_json()

tests/unit/schema/test_schema_io.py:86:


merlin/schema/io/tensorflow_metadata.py:216: in to_json
return self.proto_schema.to_json()
/usr/local/lib/python3.8/dist-packages/betterproto/__init__.py:909: in to_json
return json.dumps(self.to_dict(), indent=indent)
/usr/local/lib/python3.8/dist-packages/betterproto/__init__.py:811: in to_dict
v = [i.to_dict(casing, include_default_values) for i in v]
/usr/local/lib/python3.8/dist-packages/betterproto/__init__.py:811: in <listcomp>
v = [i.to_dict(casing, include_default_values) for i in v]
/usr/local/lib/python3.8/dist-packages/betterproto/__init__.py:816: in to_dict
output[cased_name] = v.to_dict(casing, include_default_values)


self = FixedShape(dim=1), casing = <function camelcase at 0x7f7a1593e700>
include_default_values = False

def to_dict(
    self, casing: Casing = Casing.CAMEL, include_default_values: bool = False
) -> dict:
    """
    Returns a dict representation of this message instance which can be
    used to serialize to e.g. JSON. Defaults to camel casing for
    compatibility but can be set to other modes.

    `include_default_values` can be set to `True` to include default
    values of fields. E.g. an `int32` type field with `0` value will
    not be in returned dict if `include_default_values` is set to
    `False`.
    """
    output: Dict[str, Any] = {}
    for field in dataclasses.fields(self):
        meta = FieldMetadata.get(field)
        v = getattr(self, field.name)
        cased_name = casing(field.name).rstrip("_")  # type: ignore
        if meta.proto_type == "message":
            if isinstance(v, datetime):
                if v != DATETIME_ZERO or include_default_values:
                    output[cased_name] = _Timestamp.timestamp_to_json(v)
            elif isinstance(v, timedelta):
                if v != timedelta(0) or include_default_values:
                    output[cased_name] = _Duration.delta_to_json(v)
            elif meta.wraps:
                if v is not None or include_default_values:
                    output[cased_name] = v
            elif isinstance(v, list):
                # Convert each item.
                v = [i.to_dict(casing, include_default_values) for i in v]
                if v or include_default_values:
                    output[cased_name] = v
            else:
              if v._serialized_on_wire or include_default_values:

E AttributeError: 'int' object has no attribute '_serialized_on_wire'

/usr/local/lib/python3.8/dist-packages/betterproto/__init__.py:815: AttributeError
=============================== warnings summary ===============================
tests/unit/dag/test_base_operator.py: 4 warnings
tests/unit/io/test_io.py: 71 warnings
/usr/local/lib/python3.8/dist-packages/cudf/core/frame.py:384: UserWarning: The deep parameter is ignored and is only included for pandas compatibility.
warnings.warn(

tests/unit/io/test_io.py::test_validate_and_regenerate_dataset
/var/jenkins_home/workspace/merlin_core/core/merlin/io/parquet.py:551: DeprecationWarning: 'ParquetDataset.pieces' attribute is deprecated as of pyarrow 5.0.0 and will be removed in a future version. Specify 'use_legacy_dataset=False' while constructing the ParquetDataset, and then use the '.fragments' attribute instead.
paths = [p.path for p in pa_dataset.pieces]

tests/unit/utils/test_utils.py::test_nvt_distributed[True-True]
/usr/local/lib/python3.8/dist-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 45691 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed[True-False]
/usr/local/lib/python3.8/dist-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 42353 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed[False-True]
/usr/local/lib/python3.8/dist-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 36545 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed[False-False]
/usr/local/lib/python3.8/dist-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 40371 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed_force[True]
/usr/local/lib/python3.8/dist-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 46753 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed_force[False]
/usr/local/lib/python3.8/dist-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 38309 instead
warnings.warn(

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
=========================== short test summary info ============================
FAILED tests/unit/io/test_io.py::test_dask_dataset_from_dataframe[True-cudf]
FAILED tests/unit/io/test_io.py::test_dask_dataset_from_dataframe[True-dask_cudf]
FAILED tests/unit/io/test_io.py::test_dask_dataset_from_dataframe[True-pd] - ...
FAILED tests/unit/io/test_io.py::test_dask_dataset_from_dataframe[True-dd] - ...
FAILED tests/unit/schema/test_schema_io.py::test_value_count - AttributeError...
============ 5 failed, 339 passed, 1 skipped, 82 warnings in 51.65s ============
Build step 'Execute shell' marked build as failure
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.github.com/repos/NVIDIA-Merlin/core/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[merlin_core] $ /bin/bash /tmp/jenkins12869842802791269842.sh

@oliverholworthy added the bug (Something isn't working) label on Aug 3, 2022
@nvidia-merlin-bot

Click to view CI Results
GitHub pull request #111 of commit a8b6dd5ccb520d8852368c3be924d3b2dfc3904a, no merge conflicts.
Running as SYSTEM
Setting status of a8b6dd5ccb520d8852368c3be924d3b2dfc3904a to PENDING with url https://10.20.13.93:8080/job/merlin_core/88/console and message: 'Pending'
Using context: Jenkins
Building on master in workspace /var/jenkins_home/workspace/merlin_core
using credential ce87ff3c-94f0-400a-8303-cb4acb4918b5
 > git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > git config remote.origin.url /~https://github.com/NVIDIA-Merlin/core # timeout=10
Fetching upstream changes from /~https://github.com/NVIDIA-Merlin/core
 > git --version # timeout=10
using GIT_ASKPASS to set credentials login for merlin-systems username and pass
 > git fetch --tags --force --progress -- /~https://github.com/NVIDIA-Merlin/core +refs/pull/111/*:refs/remotes/origin/pr/111/* # timeout=10
 > git rev-parse a8b6dd5ccb520d8852368c3be924d3b2dfc3904a^{commit} # timeout=10
Checking out Revision a8b6dd5ccb520d8852368c3be924d3b2dfc3904a (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f a8b6dd5ccb520d8852368c3be924d3b2dfc3904a # timeout=10
Commit message: "Add test for value_count property"
 > git rev-list --no-walk 6b66c872515fc178ad2b55b1ce97837aa31ac558 # timeout=10
[merlin_core] $ /bin/bash /tmp/jenkins1305244692537516776.sh
============================= test session starts ==============================
platform linux -- Python 3.8.10, pytest-7.1.2, pluggy-1.0.0
rootdir: /var/jenkins_home/workspace/merlin_core/core, configfile: pyproject.toml
plugins: anyio-3.6.1, xdist-2.5.0, forked-1.4.0, cov-3.0.0
collected 346 items / 1 skipped

tests/unit/core/test_dispatch.py .. [ 0%]
tests/unit/dag/test_base_operator.py .... [ 1%]
tests/unit/dag/test_column_selector.py .......................... [ 9%]
tests/unit/dag/test_graph.py . [ 9%]
tests/unit/dag/test_tags.py ...... [ 11%]
tests/unit/dag/ops/test_selection.py ... [ 12%]
tests/unit/io/test_io.py ..................................FFFF......... [ 25%]
................................................................ [ 44%]
tests/unit/schema/test_column_schemas.py ............................... [ 53%]
........................................................................ [ 73%]
....................................................................... [ 94%]
tests/unit/schema/test_schema.py ...... [ 96%]
tests/unit/schema/test_schema_io.py ..FFF [ 97%]
tests/unit/utils/test_utils.py ........ [100%]

=================================== FAILURES ===================================
_________________ test_dask_dataset_from_dataframe[True-cudf] __________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-4/test_dask_dataset_from_datafra4')
origin = 'cudf', cpu = True

@pytest.mark.parametrize("origin", ["cudf", "dask_cudf", "pd", "dd"])
@pytest.mark.parametrize("cpu", [None, True])
def test_dask_dataset_from_dataframe(tmpdir, origin, cpu):

    # Generate a DataFrame-based input
    if origin in ("pd", "dd"):
        df = pd.DataFrame({"a": range(100)})
        if origin == "dd":
            df = dask.dataframe.from_pandas(df, npartitions=4)
    elif origin in ("cudf", "dask_cudf"):
        df = cudf.DataFrame({"a": range(100)})
        if origin == "dask_cudf":
            df = dask_cudf.from_cudf(df, npartitions=4)

    # Convert to an NVTabular Dataset and back to a ddf
    dataset = merlin.io.Dataset(df, cpu=cpu)
    result = dataset.to_ddf()

    # Check resulting data
    assert_eq(df, result)

    # Check that the cpu kwarg is working correctly
    if cpu:
        assert isinstance(result.compute(), pd.DataFrame)

        # Should still work if we move to the GPU
        # (test behavior after repetitive conversion)
        dataset.to_gpu()
        dataset.to_cpu()
        dataset.to_cpu()
        dataset.to_gpu()
        result = dataset.to_ddf()
        assert isinstance(result.compute(), cudf.DataFrame)
        dataset.to_cpu()
    else:
        assert isinstance(result.compute(), cudf.DataFrame)

        # Should still work if we move to the CPU
        # (test behavior after repetitive conversion)
        dataset.to_cpu()
        dataset.to_gpu()
        dataset.to_gpu()
        dataset.to_cpu()
        result = dataset.to_ddf()
        assert isinstance(result.compute(), pd.DataFrame)
        dataset.to_gpu()

    # Write to disk and read back
    path = str(tmpdir)
    dataset.to_parquet(path, out_files_per_proc=1, shuffle=None)
  ddf_check = dask_cudf.read_parquet(path).compute()

tests/unit/io/test_io.py:290:


/usr/local/lib/python3.8/dist-packages/dask/base.py:288: in compute
(result,) = compute(self, traverse=False, **kwargs)
/usr/local/lib/python3.8/dist-packages/dask/base.py:571: in compute
results = schedule(dsk, keys, **kwargs)
/usr/local/lib/python3.8/dist-packages/dask/local.py:553: in get_sync
return get_async(
/usr/local/lib/python3.8/dist-packages/dask/local.py:496: in get_async
for key, res_info, failed in queue_get(queue).result():
/usr/lib/python3.8/concurrent/futures/_base.py:437: in result
return self.__get_result()
/usr/lib/python3.8/concurrent/futures/_base.py:389: in __get_result
raise self._exception
/usr/local/lib/python3.8/dist-packages/dask/local.py:538: in submit
fut.set_result(fn(*args, **kwargs))
/usr/local/lib/python3.8/dist-packages/dask/local.py:234: in batch_execute_tasks
return [execute_task(a) for a in it]
/usr/local/lib/python3.8/dist-packages/dask/local.py:234: in <listcomp>
return [execute_task(a) for a in it]
/usr/local/lib/python3.8/dist-packages/dask/local.py:225: in execute_task
result = pack_exception(e, dumps)
/usr/local/lib/python3.8/dist-packages/dask/local.py:220: in execute_task
result = _execute_task(task, data)
/usr/local/lib/python3.8/dist-packages/dask/core.py:119: in _execute_task
return func(
(_execute_task(a, cache) for a in args))
/usr/local/lib/python3.8/dist-packages/dask/optimization.py:969: in __call__
return core.get(self.dsk, self.outkey, dict(zip(self.inkeys, args)))
/usr/local/lib/python3.8/dist-packages/dask/core.py:149: in get
result = _execute_task(task, cache)
/usr/local/lib/python3.8/dist-packages/dask/core.py:119: in _execute_task
return func(
(_execute_task(a, cache) for a in args))
/usr/local/lib/python3.8/dist-packages/dask/dataframe/io/parquet/core.py:87: in __call__
return read_parquet_part(
/usr/local/lib/python3.8/dist-packages/dask/dataframe/io/parquet/core.py:431: in read_parquet_part
dfs = [
/usr/local/lib/python3.8/dist-packages/dask/dataframe/io/parquet/core.py:432: in <listcomp>
func(fs, rg, columns.copy(), index, **toolz.merge(kwargs, kw))
/usr/local/lib/python3.8/dist-packages/dask_cudf/io/parquet.py:216: in read_partition
cls._read_paths(
/usr/local/lib/python3.8/dist-packages/dask_cudf/io/parquet.py:92: in _read_paths
df = cudf.read_parquet(
/usr/local/lib/python3.8/dist-packages/nvtx/nvtx.py:101: in inner
result = func(*args, **kwargs)
/usr/local/lib/python3.8/dist-packages/cudf/io/parquet.py:379: in read_parquet
) = _process_dataset(
/usr/local/lib/python3.8/dist-packages/nvtx/nvtx.py:101: in inner
result = func(*args, **kwargs)
/usr/local/lib/python3.8/dist-packages/cudf/io/parquet.py:205: in _process_dataset
dataset = ds.dataset(
/usr/local/lib/python3.8/dist-packages/pyarrow/dataset.py:683: in dataset
return _filesystem_dataset(source, **kwargs)
/usr/local/lib/python3.8/dist-packages/pyarrow/dataset.py:435: in _filesystem_dataset
return factory.finish(schema)
pyarrow/_dataset.pyx:2473: in pyarrow._dataset.DatasetFactory.finish
???
pyarrow/error.pxi:143: in pyarrow.lib.pyarrow_internal_check_status
???


???
E pyarrow.lib.ArrowInvalid: Error creating dataset. Could not read schema from '/tmp/pytest-of-jenkins/pytest-4/test_dask_dataset_from_datafra4/part_0.parquet': Could not open Parquet input source '/tmp/pytest-of-jenkins/pytest-4/test_dask_dataset_from_datafra4/part_0.parquet': Parquet file size is 0 bytes. Is this a 'parquet' file?

pyarrow/error.pxi:99: ArrowInvalid
_______________ test_dask_dataset_from_dataframe[True-dask_cudf] _______________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-4/test_dask_dataset_from_datafra5')
origin = 'dask_cudf', cpu = True

@pytest.mark.parametrize("origin", ["cudf", "dask_cudf", "pd", "dd"])
@pytest.mark.parametrize("cpu", [None, True])
def test_dask_dataset_from_dataframe(tmpdir, origin, cpu):

    # Generate a DataFrame-based input
    if origin in ("pd", "dd"):
        df = pd.DataFrame({"a": range(100)})
        if origin == "dd":
            df = dask.dataframe.from_pandas(df, npartitions=4)
    elif origin in ("cudf", "dask_cudf"):
        df = cudf.DataFrame({"a": range(100)})
        if origin == "dask_cudf":
            df = dask_cudf.from_cudf(df, npartitions=4)

    # Convert to an NVTabular Dataset and back to a ddf
    dataset = merlin.io.Dataset(df, cpu=cpu)
    result = dataset.to_ddf()

    # Check resulting data
    assert_eq(df, result)

    # Check that the cpu kwarg is working correctly
    if cpu:
        assert isinstance(result.compute(), pd.DataFrame)

        # Should still work if we move to the GPU
        # (test behavior after repetitive conversion)
        dataset.to_gpu()
        dataset.to_cpu()
        dataset.to_cpu()
        dataset.to_gpu()
        result = dataset.to_ddf()
        assert isinstance(result.compute(), cudf.DataFrame)
        dataset.to_cpu()
    else:
        assert isinstance(result.compute(), cudf.DataFrame)

        # Should still work if we move to the CPU
        # (test behavior after repetitive conversion)
        dataset.to_cpu()
        dataset.to_gpu()
        dataset.to_gpu()
        dataset.to_cpu()
        result = dataset.to_ddf()
        assert isinstance(result.compute(), pd.DataFrame)
        dataset.to_gpu()

    # Write to disk and read back
    path = str(tmpdir)
    dataset.to_parquet(path, out_files_per_proc=1, shuffle=None)
  ddf_check = dask_cudf.read_parquet(path).compute()

tests/unit/io/test_io.py:290:


/usr/local/lib/python3.8/dist-packages/dask/base.py:288: in compute
(result,) = compute(self, traverse=False, **kwargs)
/usr/local/lib/python3.8/dist-packages/dask/base.py:571: in compute
results = schedule(dsk, keys, **kwargs)
/usr/local/lib/python3.8/dist-packages/dask/local.py:553: in get_sync
return get_async(
/usr/local/lib/python3.8/dist-packages/dask/local.py:496: in get_async
for key, res_info, failed in queue_get(queue).result():
/usr/lib/python3.8/concurrent/futures/_base.py:437: in result
return self.__get_result()
/usr/lib/python3.8/concurrent/futures/_base.py:389: in __get_result
raise self._exception
/usr/local/lib/python3.8/dist-packages/dask/local.py:538: in submit
fut.set_result(fn(*args, **kwargs))
/usr/local/lib/python3.8/dist-packages/dask/local.py:234: in batch_execute_tasks
return [execute_task(a) for a in it]
/usr/local/lib/python3.8/dist-packages/dask/local.py:234: in <listcomp>
return [execute_task(a) for a in it]
/usr/local/lib/python3.8/dist-packages/dask/local.py:225: in execute_task
result = pack_exception(e, dumps)
/usr/local/lib/python3.8/dist-packages/dask/local.py:220: in execute_task
result = _execute_task(task, data)
/usr/local/lib/python3.8/dist-packages/dask/core.py:119: in _execute_task
return func(
(_execute_task(a, cache) for a in args))
/usr/local/lib/python3.8/dist-packages/dask/optimization.py:969: in __call__
return core.get(self.dsk, self.outkey, dict(zip(self.inkeys, args)))
/usr/local/lib/python3.8/dist-packages/dask/core.py:149: in get
result = _execute_task(task, cache)
/usr/local/lib/python3.8/dist-packages/dask/core.py:119: in _execute_task
return func(
(_execute_task(a, cache) for a in args))
/usr/local/lib/python3.8/dist-packages/dask/dataframe/io/parquet/core.py:87: in __call__
return read_parquet_part(
/usr/local/lib/python3.8/dist-packages/dask/dataframe/io/parquet/core.py:431: in read_parquet_part
dfs = [
/usr/local/lib/python3.8/dist-packages/dask/dataframe/io/parquet/core.py:432: in <listcomp>
func(fs, rg, columns.copy(), index, **toolz.merge(kwargs, kw))
/usr/local/lib/python3.8/dist-packages/dask_cudf/io/parquet.py:216: in read_partition
cls._read_paths(
/usr/local/lib/python3.8/dist-packages/dask_cudf/io/parquet.py:92: in _read_paths
df = cudf.read_parquet(
/usr/local/lib/python3.8/dist-packages/nvtx/nvtx.py:101: in inner
result = func(*args, **kwargs)
/usr/local/lib/python3.8/dist-packages/cudf/io/parquet.py:379: in read_parquet
) = _process_dataset(
/usr/local/lib/python3.8/dist-packages/nvtx/nvtx.py:101: in inner
result = func(*args, **kwargs)
/usr/local/lib/python3.8/dist-packages/cudf/io/parquet.py:205: in _process_dataset
dataset = ds.dataset(
/usr/local/lib/python3.8/dist-packages/pyarrow/dataset.py:683: in dataset
return _filesystem_dataset(source, **kwargs)
/usr/local/lib/python3.8/dist-packages/pyarrow/dataset.py:435: in _filesystem_dataset
return factory.finish(schema)
pyarrow/_dataset.pyx:2473: in pyarrow._dataset.DatasetFactory.finish
???
pyarrow/error.pxi:143: in pyarrow.lib.pyarrow_internal_check_status
???


???
E pyarrow.lib.ArrowInvalid: Error creating dataset. Could not read schema from '/tmp/pytest-of-jenkins/pytest-4/test_dask_dataset_from_datafra5/part_0.parquet': Could not open Parquet input source '/tmp/pytest-of-jenkins/pytest-4/test_dask_dataset_from_datafra5/part_0.parquet': Parquet file size is 0 bytes. Is this a 'parquet' file?

pyarrow/error.pxi:99: ArrowInvalid
__________________ test_dask_dataset_from_dataframe[True-pd] ___________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-4/test_dask_dataset_from_datafra6')
origin = 'pd', cpu = True

@pytest.mark.parametrize("origin", ["cudf", "dask_cudf", "pd", "dd"])
@pytest.mark.parametrize("cpu", [None, True])
def test_dask_dataset_from_dataframe(tmpdir, origin, cpu):

    # Generate a DataFrame-based input
    if origin in ("pd", "dd"):
        df = pd.DataFrame({"a": range(100)})
        if origin == "dd":
            df = dask.dataframe.from_pandas(df, npartitions=4)
    elif origin in ("cudf", "dask_cudf"):
        df = cudf.DataFrame({"a": range(100)})
        if origin == "dask_cudf":
            df = dask_cudf.from_cudf(df, npartitions=4)

    # Convert to an NVTabular Dataset and back to a ddf
    dataset = merlin.io.Dataset(df, cpu=cpu)
    result = dataset.to_ddf()

    # Check resulting data
    assert_eq(df, result)

    # Check that the cpu kwarg is working correctly
    if cpu:
        assert isinstance(result.compute(), pd.DataFrame)

        # Should still work if we move to the GPU
        # (test behavior after repetitive conversion)
        dataset.to_gpu()
        dataset.to_cpu()
        dataset.to_cpu()
        dataset.to_gpu()
        result = dataset.to_ddf()
        assert isinstance(result.compute(), cudf.DataFrame)
        dataset.to_cpu()
    else:
        assert isinstance(result.compute(), cudf.DataFrame)

        # Should still work if we move to the CPU
        # (test behavior after repetitive conversion)
        dataset.to_cpu()
        dataset.to_gpu()
        dataset.to_gpu()
        dataset.to_cpu()
        result = dataset.to_ddf()
        assert isinstance(result.compute(), pd.DataFrame)
        dataset.to_gpu()

    # Write to disk and read back
    path = str(tmpdir)
    dataset.to_parquet(path, out_files_per_proc=1, shuffle=None)
  ddf_check = dask_cudf.read_parquet(path).compute()

tests/unit/io/test_io.py:290:


/usr/local/lib/python3.8/dist-packages/dask/base.py:288: in compute
(result,) = compute(self, traverse=False, **kwargs)
/usr/local/lib/python3.8/dist-packages/dask/base.py:571: in compute
results = schedule(dsk, keys, **kwargs)
/usr/local/lib/python3.8/dist-packages/dask/local.py:553: in get_sync
return get_async(
/usr/local/lib/python3.8/dist-packages/dask/local.py:496: in get_async
for key, res_info, failed in queue_get(queue).result():
/usr/lib/python3.8/concurrent/futures/_base.py:437: in result
return self.__get_result()
/usr/lib/python3.8/concurrent/futures/_base.py:389: in __get_result
raise self._exception
/usr/local/lib/python3.8/dist-packages/dask/local.py:538: in submit
fut.set_result(fn(*args, **kwargs))
/usr/local/lib/python3.8/dist-packages/dask/local.py:234: in batch_execute_tasks
return [execute_task(a) for a in it]
/usr/local/lib/python3.8/dist-packages/dask/local.py:234: in <listcomp>
return [execute_task(a) for a in it]
/usr/local/lib/python3.8/dist-packages/dask/local.py:225: in execute_task
result = pack_exception(e, dumps)
/usr/local/lib/python3.8/dist-packages/dask/local.py:220: in execute_task
result = _execute_task(task, data)
/usr/local/lib/python3.8/dist-packages/dask/core.py:119: in _execute_task
return func(
(_execute_task(a, cache) for a in args))
/usr/local/lib/python3.8/dist-packages/dask/optimization.py:969: in __call__
return core.get(self.dsk, self.outkey, dict(zip(self.inkeys, args)))
/usr/local/lib/python3.8/dist-packages/dask/core.py:149: in get
result = _execute_task(task, cache)
/usr/local/lib/python3.8/dist-packages/dask/core.py:119: in _execute_task
return func(
(_execute_task(a, cache) for a in args))
/usr/local/lib/python3.8/dist-packages/dask/dataframe/io/parquet/core.py:87: in __call__
return read_parquet_part(
/usr/local/lib/python3.8/dist-packages/dask/dataframe/io/parquet/core.py:431: in read_parquet_part
dfs = [
/usr/local/lib/python3.8/dist-packages/dask/dataframe/io/parquet/core.py:432: in <listcomp>
func(fs, rg, columns.copy(), index, **toolz.merge(kwargs, kw))
/usr/local/lib/python3.8/dist-packages/dask_cudf/io/parquet.py:216: in read_partition
cls._read_paths(
/usr/local/lib/python3.8/dist-packages/dask_cudf/io/parquet.py:92: in _read_paths
df = cudf.read_parquet(
/usr/local/lib/python3.8/dist-packages/nvtx/nvtx.py:101: in inner
result = func(*args, **kwargs)
/usr/local/lib/python3.8/dist-packages/cudf/io/parquet.py:379: in read_parquet
) = _process_dataset(
/usr/local/lib/python3.8/dist-packages/nvtx/nvtx.py:101: in inner
result = func(*args, **kwargs)
/usr/local/lib/python3.8/dist-packages/cudf/io/parquet.py:205: in _process_dataset
dataset = ds.dataset(
/usr/local/lib/python3.8/dist-packages/pyarrow/dataset.py:683: in dataset
return _filesystem_dataset(source, **kwargs)
/usr/local/lib/python3.8/dist-packages/pyarrow/dataset.py:435: in _filesystem_dataset
return factory.finish(schema)
pyarrow/_dataset.pyx:2473: in pyarrow._dataset.DatasetFactory.finish
???
pyarrow/error.pxi:143: in pyarrow.lib.pyarrow_internal_check_status
???


???
E pyarrow.lib.ArrowInvalid: Error creating dataset. Could not read schema from '/tmp/pytest-of-jenkins/pytest-4/test_dask_dataset_from_datafra6/part_0.parquet': Could not open Parquet input source '/tmp/pytest-of-jenkins/pytest-4/test_dask_dataset_from_datafra6/part_0.parquet': Parquet file size is 0 bytes. Is this a 'parquet' file?

pyarrow/error.pxi:99: ArrowInvalid
__________________ test_dask_dataset_from_dataframe[True-dd] ___________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-4/test_dask_dataset_from_datafra7')
origin = 'dd', cpu = True

@pytest.mark.parametrize("origin", ["cudf", "dask_cudf", "pd", "dd"])
@pytest.mark.parametrize("cpu", [None, True])
def test_dask_dataset_from_dataframe(tmpdir, origin, cpu):

    # Generate a DataFrame-based input
    if origin in ("pd", "dd"):
        df = pd.DataFrame({"a": range(100)})
        if origin == "dd":
            df = dask.dataframe.from_pandas(df, npartitions=4)
    elif origin in ("cudf", "dask_cudf"):
        df = cudf.DataFrame({"a": range(100)})
        if origin == "dask_cudf":
            df = dask_cudf.from_cudf(df, npartitions=4)

    # Convert to an NVTabular Dataset and back to a ddf
    dataset = merlin.io.Dataset(df, cpu=cpu)
    result = dataset.to_ddf()

    # Check resulting data
    assert_eq(df, result)

    # Check that the cpu kwarg is working correctly
    if cpu:
        assert isinstance(result.compute(), pd.DataFrame)

        # Should still work if we move to the GPU
        # (test behavior after repetitive conversion)
        dataset.to_gpu()
        dataset.to_cpu()
        dataset.to_cpu()
        dataset.to_gpu()
        result = dataset.to_ddf()
        assert isinstance(result.compute(), cudf.DataFrame)
        dataset.to_cpu()
    else:
        assert isinstance(result.compute(), cudf.DataFrame)

        # Should still work if we move to the CPU
        # (test behavior after repetitive conversion)
        dataset.to_cpu()
        dataset.to_gpu()
        dataset.to_gpu()
        dataset.to_cpu()
        result = dataset.to_ddf()
        assert isinstance(result.compute(), pd.DataFrame)
        dataset.to_gpu()

    # Write to disk and read back
    path = str(tmpdir)
    dataset.to_parquet(path, out_files_per_proc=1, shuffle=None)
  ddf_check = dask_cudf.read_parquet(path).compute()

tests/unit/io/test_io.py:290:


/usr/local/lib/python3.8/dist-packages/dask/base.py:288: in compute
(result,) = compute(self, traverse=False, **kwargs)
/usr/local/lib/python3.8/dist-packages/dask/base.py:571: in compute
results = schedule(dsk, keys, **kwargs)
/usr/local/lib/python3.8/dist-packages/dask/local.py:553: in get_sync
return get_async(
/usr/local/lib/python3.8/dist-packages/dask/local.py:496: in get_async
for key, res_info, failed in queue_get(queue).result():
/usr/lib/python3.8/concurrent/futures/_base.py:437: in result
return self.__get_result()
/usr/lib/python3.8/concurrent/futures/_base.py:389: in __get_result
raise self._exception
/usr/local/lib/python3.8/dist-packages/dask/local.py:538: in submit
fut.set_result(fn(*args, **kwargs))
/usr/local/lib/python3.8/dist-packages/dask/local.py:234: in batch_execute_tasks
return [execute_task(a) for a in it]
/usr/local/lib/python3.8/dist-packages/dask/local.py:234: in <listcomp>
return [execute_task(a) for a in it]
/usr/local/lib/python3.8/dist-packages/dask/local.py:225: in execute_task
result = pack_exception(e, dumps)
/usr/local/lib/python3.8/dist-packages/dask/local.py:220: in execute_task
result = _execute_task(task, data)
/usr/local/lib/python3.8/dist-packages/dask/core.py:119: in _execute_task
return func(
(_execute_task(a, cache) for a in args))
/usr/local/lib/python3.8/dist-packages/dask/optimization.py:969: in __call__
return core.get(self.dsk, self.outkey, dict(zip(self.inkeys, args)))
/usr/local/lib/python3.8/dist-packages/dask/core.py:149: in get
result = _execute_task(task, cache)
/usr/local/lib/python3.8/dist-packages/dask/core.py:119: in _execute_task
return func(
(_execute_task(a, cache) for a in args))
/usr/local/lib/python3.8/dist-packages/dask/dataframe/io/parquet/core.py:87: in __call__
return read_parquet_part(
/usr/local/lib/python3.8/dist-packages/dask/dataframe/io/parquet/core.py:431: in read_parquet_part
dfs = [
/usr/local/lib/python3.8/dist-packages/dask/dataframe/io/parquet/core.py:432: in <listcomp>
func(fs, rg, columns.copy(), index, **toolz.merge(kwargs, kw))
/usr/local/lib/python3.8/dist-packages/dask_cudf/io/parquet.py:216: in read_partition
cls._read_paths(
/usr/local/lib/python3.8/dist-packages/dask_cudf/io/parquet.py:92: in _read_paths
df = cudf.read_parquet(
/usr/local/lib/python3.8/dist-packages/nvtx/nvtx.py:101: in inner
result = func(*args, **kwargs)
/usr/local/lib/python3.8/dist-packages/cudf/io/parquet.py:379: in read_parquet
) = _process_dataset(
/usr/local/lib/python3.8/dist-packages/nvtx/nvtx.py:101: in inner
result = func(*args, **kwargs)
/usr/local/lib/python3.8/dist-packages/cudf/io/parquet.py:205: in _process_dataset
dataset = ds.dataset(
/usr/local/lib/python3.8/dist-packages/pyarrow/dataset.py:683: in dataset
return _filesystem_dataset(source, **kwargs)
/usr/local/lib/python3.8/dist-packages/pyarrow/dataset.py:435: in _filesystem_dataset
return factory.finish(schema)
pyarrow/_dataset.pyx:2473: in pyarrow._dataset.DatasetFactory.finish
???
pyarrow/error.pxi:143: in pyarrow.lib.pyarrow_internal_check_status
???


???
E pyarrow.lib.ArrowInvalid: Error creating dataset. Could not read schema from '/tmp/pytest-of-jenkins/pytest-4/test_dask_dataset_from_datafra7/part_0.parquet': Could not open Parquet input source '/tmp/pytest-of-jenkins/pytest-4/test_dask_dataset_from_datafra7/part_0.parquet': Parquet file size is 0 bytes. Is this a 'parquet' file?

pyarrow/error.pxi:99: ArrowInvalid
________________________ test_value_count[value_count0] ________________________

value_count = {'max': 0, 'min': 0}

@pytest.mark.parametrize("value_count", [{"min": 0, "max": 0}, {"min": 1, "max": 1}, {"min": 1, "max": 2}])
def test_value_count(value_count):
    schema = Schema(
        [
            ColumnSchema(
                "example",
                properties={
                    "value_count": value_count,
                },
                is_list=True,
            )
        ]
    )
    json_schema = TensorflowMetadata.from_merlin_schema(schema).to_json()
    output_schema = TensorflowMetadata.from_json(json_schema).to_merlin_schema()
  assert output_schema == schema

E AssertionError: assert [{'name': 'ex...gged': False}] == [{'name': 'ex...gged': False}]
E Full diff:
E - [{'name': 'example', 'tags': set(), 'properties': {'value_count': {'min': 0, 'max': 0}}, 'dtype': dtype('float64'), 'is_list': True, 'is_ragged': False}]
E ? -----------------------------------
E + [{'name': 'example', 'tags': set(), 'properties': {}, 'dtype': dtype('float64'), 'is_list': True, 'is_ragged': False}]

tests/unit/schema/test_schema_io.py:91: AssertionError
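
This first failure is a silent round-trip loss rather than a crash: with `value_count = {'min': 0, 'max': 0}` both fields sit at their proto3 defaults, so the serializer omits the message entirely and `properties` comes back empty (the `+` line in the diff). The behavior can be seen in isolation with a hand-written betterproto message mirroring tensorflow-metadata's `ValueCount` (the class below is ours, for illustration only):

```python
from dataclasses import dataclass

import betterproto


@dataclass
class ValueCount(betterproto.Message):
    min: int = betterproto.int64_field(1)
    max: int = betterproto.int64_field(2)


# Fields at their default value are dropped from the dict/JSON form...
assert ValueCount(min=0, max=0).to_dict() == {}
# ...unless defaults are explicitly requested.
assert ValueCount(min=0, max=0).to_dict(include_default_values=True) != {}
```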
________________________ test_value_count[value_count1] ________________________

value_count = {'max': 1, 'min': 1}

@pytest.mark.parametrize("value_count", [{"min": 0, "max": 0}, {"min": 1, "max": 1}, {"min": 1, "max": 2}])
def test_value_count(value_count):
    schema = Schema(
        [
            ColumnSchema(
                "example",
                properties={
                    "value_count": value_count,
                },
                is_list=True,
            )
        ]
    )
  json_schema = TensorflowMetadata.from_merlin_schema(schema).to_json()

tests/unit/schema/test_schema_io.py:89:


merlin/schema/io/tensorflow_metadata.py:216: in to_json
return self.proto_schema.to_json()
/usr/local/lib/python3.8/dist-packages/betterproto/__init__.py:909: in to_json
return json.dumps(self.to_dict(), indent=indent)
/usr/local/lib/python3.8/dist-packages/betterproto/__init__.py:811: in to_dict
v = [i.to_dict(casing, include_default_values) for i in v]
/usr/local/lib/python3.8/dist-packages/betterproto/__init__.py:811: in <listcomp>
v = [i.to_dict(casing, include_default_values) for i in v]
/usr/local/lib/python3.8/dist-packages/betterproto/__init__.py:816: in to_dict
output[cased_name] = v.to_dict(casing, include_default_values)


self = FixedShape(dim=1), casing = <function camelcase at 0x7f5bdcabe700>
include_default_values = False

def to_dict(
    self, casing: Casing = Casing.CAMEL, include_default_values: bool = False
) -> dict:
    """
    Returns a dict representation of this message instance which can be
    used to serialize to e.g. JSON. Defaults to camel casing for
    compatibility but can be set to other modes.

    `include_default_values` can be set to `True` to include default
    values of fields. E.g. an `int32` type field with `0` value will
    not be in returned dict if `include_default_values` is set to
    `False`.
    """
    output: Dict[str, Any] = {}
    for field in dataclasses.fields(self):
        meta = FieldMetadata.get(field)
        v = getattr(self, field.name)
        cased_name = casing(field.name).rstrip("_")  # type: ignore
        if meta.proto_type == "message":
            if isinstance(v, datetime):
                if v != DATETIME_ZERO or include_default_values:
                    output[cased_name] = _Timestamp.timestamp_to_json(v)
            elif isinstance(v, timedelta):
                if v != timedelta(0) or include_default_values:
                    output[cased_name] = _Duration.delta_to_json(v)
            elif meta.wraps:
                if v is not None or include_default_values:
                    output[cased_name] = v
            elif isinstance(v, list):
                # Convert each item.
                v = [i.to_dict(casing, include_default_values) for i in v]
                if v or include_default_values:
                    output[cased_name] = v
            else:
              if v._serialized_on_wire or include_default_values:

E AttributeError: 'int' object has no attribute '_serialized_on_wire'

/usr/local/lib/python3.8/dist-packages/betterproto/__init__.py:815: AttributeError
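
The `AttributeError` pinpoints the crash: `to_dict` reaches a message-typed field that holds a plain `int` (note `self = FixedShape(dim=1)` above), and an `int` has no `_serialized_on_wire`. In tensorflow-metadata's `schema.proto`, `FixedShape.dim` is a repeated `Dim` message, so the shape needs to be built from `Dim` instances rather than a raw count. A sketch with hand-written stand-ins for the generated classes (ours, for illustration only):

```python
from dataclasses import dataclass
from typing import List

import betterproto


@dataclass
class Dim(betterproto.Message):
    size: int = betterproto.int64_field(1)


@dataclass
class FixedShape(betterproto.Message):
    dim: List[Dim] = betterproto.message_field(1)


FixedShape(dim=[Dim(size=1)]).to_dict()  # serializes: {'dim': [{'size': '1'}]}
# FixedShape(dim=1).to_dict()  # would reproduce the AttributeError above
```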
________________________ test_value_count[value_count2] ________________________

value_count = {'max': 2, 'min': 1}

@pytest.mark.parametrize("value_count", [{"min": 0, "max": 0}, {"min": 1, "max": 1}, {"min": 1, "max": 2}])
def test_value_count(value_count):
    schema = Schema(
        [
            ColumnSchema(
                "example",
                properties={
                    "value_count": value_count,
                },
                is_list=True,
            )
        ]
    )
    json_schema = TensorflowMetadata.from_merlin_schema(schema).to_json()
    output_schema = TensorflowMetadata.from_json(json_schema).to_merlin_schema()
  assert output_schema == schema

E AssertionError: assert [{'name': 'ex...agged': True}] == [{'name': 'ex...gged': False}]
E Full diff:
E - [{'name': 'example', 'tags': set(), 'properties': {'value_count': {'min': 1, 'max': 2}}, 'dtype': dtype('float64'), 'is_list': True, 'is_ragged': False}]
E ? ^^^^
E + [{'name': 'example', 'tags': set(), 'properties': {'value_count': {'min': 1, 'max': 2}}, 'dtype': dtype('float64'), 'is_list': True, 'is_ragged': True}]
E ? ^^^

tests/unit/schema/test_schema_io.py:91: AssertionError
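
The third failure is the mirror image: `value_count = {'min': 1, 'max': 2}` survives the round trip, but deserialization infers `is_ragged=True` from it while the hand-built column kept the default `False`. Taken together, the three parametrized cases pin down one consistent inference rule, sketched below (the helper name is ours; it only restates what these assertions expect):

```python
from typing import Dict, Optional, Tuple


def infer_list_flags(value_count: Optional[Dict[str, int]]) -> Tuple[bool, bool]:
    """Infer ``(is_list, is_ragged)`` from a value_count property."""
    if not value_count or value_count.get("max", 0) == 0:
        return False, False  # no counts (or max == 0): scalar column
    if value_count.get("min") == value_count.get("max"):
        return True, False  # fixed-length list
    return True, True  # min != max: ragged list


assert infer_list_flags({"min": 0, "max": 0}) == (False, False)
assert infer_list_flags({"min": 1, "max": 1}) == (True, False)
assert infer_list_flags({"min": 1, "max": 2}) == (True, True)
```

If `ColumnSchema` defaults `is_list`/`is_ragged` from a rule like this, both sides of the round trip agree and the `is_ragged` diff above disappears.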
=============================== warnings summary ===============================
tests/unit/dag/test_base_operator.py: 4 warnings
tests/unit/io/test_io.py: 71 warnings
/usr/local/lib/python3.8/dist-packages/cudf/core/frame.py:384: UserWarning: The deep parameter is ignored and is only included for pandas compatibility.
warnings.warn(

tests/unit/io/test_io.py::test_validate_and_regenerate_dataset
/var/jenkins_home/workspace/merlin_core/core/merlin/io/parquet.py:551: DeprecationWarning: 'ParquetDataset.pieces' attribute is deprecated as of pyarrow 5.0.0 and will be removed in a future version. Specify 'use_legacy_dataset=False' while constructing the ParquetDataset, and then use the '.fragments' attribute instead.
paths = [p.path for p in pa_dataset.pieces]

tests/unit/utils/test_utils.py::test_nvt_distributed[True-True]
/usr/local/lib/python3.8/dist-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 43747 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed[True-False]
/usr/local/lib/python3.8/dist-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 41063 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed[False-True]
/usr/local/lib/python3.8/dist-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 33331 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed[False-False]
/usr/local/lib/python3.8/dist-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 37257 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed_force[True]
/usr/local/lib/python3.8/dist-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 43745 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed_force[False]
/usr/local/lib/python3.8/dist-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 35217 instead
warnings.warn(

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
=========================== short test summary info ============================
FAILED tests/unit/io/test_io.py::test_dask_dataset_from_dataframe[True-cudf]
FAILED tests/unit/io/test_io.py::test_dask_dataset_from_dataframe[True-dask_cudf]
FAILED tests/unit/io/test_io.py::test_dask_dataset_from_dataframe[True-pd] - ...
FAILED tests/unit/io/test_io.py::test_dask_dataset_from_dataframe[True-dd] - ...
FAILED tests/unit/schema/test_schema_io.py::test_value_count[value_count0] - ...
FAILED tests/unit/schema/test_schema_io.py::test_value_count[value_count1] - ...
FAILED tests/unit/schema/test_schema_io.py::test_value_count[value_count2] - ...
============ 7 failed, 339 passed, 1 skipped, 82 warnings in 51.31s ============
Build step 'Execute shell' marked build as failure
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA-Merlin/core/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[merlin_core] $ /bin/bash /tmp/jenkins16851984469355567604.sh

@nvidia-merlin-bot

Click to view CI Results
GitHub pull request #111 of commit 4516dbfa2b45a8580bbc964afe05e09e5c5a54b5, no merge conflicts.
Running as SYSTEM
Setting status of 4516dbfa2b45a8580bbc964afe05e09e5c5a54b5 to PENDING with url https://10.20.13.93:8080/job/merlin_core/89/console and message: 'Pending'
Using context: Jenkins
Building on master in workspace /var/jenkins_home/workspace/merlin_core
using credential ce87ff3c-94f0-400a-8303-cb4acb4918b5
 > git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > git config remote.origin.url /~https://github.com/NVIDIA-Merlin/core # timeout=10
Fetching upstream changes from /~https://github.com/NVIDIA-Merlin/core
 > git --version # timeout=10
using GIT_ASKPASS to set credentials login for merlin-systems username and pass
 > git fetch --tags --force --progress -- /~https://github.com/NVIDIA-Merlin/core +refs/pull/111/*:refs/remotes/origin/pr/111/* # timeout=10
 > git rev-parse 4516dbfa2b45a8580bbc964afe05e09e5c5a54b5^{commit} # timeout=10
Checking out Revision 4516dbfa2b45a8580bbc964afe05e09e5c5a54b5 (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 4516dbfa2b45a8580bbc964afe05e09e5c5a54b5 # timeout=10
Commit message: "reformat test_schema_io.py"
 > git rev-list --no-walk a8b6dd5ccb520d8852368c3be924d3b2dfc3904a # timeout=10
[merlin_core] $ /bin/bash /tmp/jenkins16948746651111168622.sh
============================= test session starts ==============================
platform linux -- Python 3.8.10, pytest-7.1.2, pluggy-1.0.0
rootdir: /var/jenkins_home/workspace/merlin_core/core, configfile: pyproject.toml
plugins: anyio-3.6.1, xdist-2.5.0, forked-1.4.0, cov-3.0.0
collected 346 items / 1 skipped

tests/unit/core/test_dispatch.py .. [ 0%]
tests/unit/dag/test_base_operator.py .... [ 1%]
tests/unit/dag/test_column_selector.py .......................... [ 9%]
tests/unit/dag/test_graph.py . [ 9%]
tests/unit/dag/test_tags.py ...... [ 11%]
tests/unit/dag/ops/test_selection.py ... [ 12%]
tests/unit/io/test_io.py ..................................FFFF......... [ 25%]
................................................................ [ 44%]
tests/unit/schema/test_column_schemas.py ............................... [ 53%]
........................................................................ [ 73%]
....................................................................... [ 94%]
tests/unit/schema/test_schema.py ...... [ 96%]
tests/unit/schema/test_schema_io.py ..FFF [ 97%]
tests/unit/utils/test_utils.py ........ [100%]

=================================== FAILURES ===================================
_________________ test_dask_dataset_from_dataframe[True-cudf] __________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-5/test_dask_dataset_from_datafra4')
origin = 'cudf', cpu = True

@pytest.mark.parametrize("origin", ["cudf", "dask_cudf", "pd", "dd"])
@pytest.mark.parametrize("cpu", [None, True])
def test_dask_dataset_from_dataframe(tmpdir, origin, cpu):

    # Generate a DataFrame-based input
    if origin in ("pd", "dd"):
        df = pd.DataFrame({"a": range(100)})
        if origin == "dd":
            df = dask.dataframe.from_pandas(df, npartitions=4)
    elif origin in ("cudf", "dask_cudf"):
        df = cudf.DataFrame({"a": range(100)})
        if origin == "dask_cudf":
            df = dask_cudf.from_cudf(df, npartitions=4)

    # Convert to an NVTabular Dataset and back to a ddf
    dataset = merlin.io.Dataset(df, cpu=cpu)
    result = dataset.to_ddf()

    # Check resulting data
    assert_eq(df, result)

    # Check that the cpu kwarg is working correctly
    if cpu:
        assert isinstance(result.compute(), pd.DataFrame)

        # Should still work if we move to the GPU
        # (test behavior after repetitive conversion)
        dataset.to_gpu()
        dataset.to_cpu()
        dataset.to_cpu()
        dataset.to_gpu()
        result = dataset.to_ddf()
        assert isinstance(result.compute(), cudf.DataFrame)
        dataset.to_cpu()
    else:
        assert isinstance(result.compute(), cudf.DataFrame)

        # Should still work if we move to the CPU
        # (test behavior after repetitive conversion)
        dataset.to_cpu()
        dataset.to_gpu()
        dataset.to_gpu()
        dataset.to_cpu()
        result = dataset.to_ddf()
        assert isinstance(result.compute(), pd.DataFrame)
        dataset.to_gpu()

    # Write to disk and read back
    path = str(tmpdir)
    dataset.to_parquet(path, out_files_per_proc=1, shuffle=None)
  ddf_check = dask_cudf.read_parquet(path).compute()

tests/unit/io/test_io.py:290:


/usr/local/lib/python3.8/dist-packages/dask/base.py:288: in compute
(result,) = compute(self, traverse=False, **kwargs)
/usr/local/lib/python3.8/dist-packages/dask/base.py:571: in compute
results = schedule(dsk, keys, **kwargs)
/usr/local/lib/python3.8/dist-packages/dask/local.py:553: in get_sync
return get_async(
/usr/local/lib/python3.8/dist-packages/dask/local.py:496: in get_async
for key, res_info, failed in queue_get(queue).result():
/usr/lib/python3.8/concurrent/futures/_base.py:437: in result
return self.__get_result()
/usr/lib/python3.8/concurrent/futures/_base.py:389: in __get_result
raise self._exception
/usr/local/lib/python3.8/dist-packages/dask/local.py:538: in submit
fut.set_result(fn(*args, **kwargs))
/usr/local/lib/python3.8/dist-packages/dask/local.py:234: in batch_execute_tasks
return [execute_task(a) for a in it]
/usr/local/lib/python3.8/dist-packages/dask/local.py:234: in <listcomp>
return [execute_task(a) for a in it]
/usr/local/lib/python3.8/dist-packages/dask/local.py:225: in execute_task
result = pack_exception(e, dumps)
/usr/local/lib/python3.8/dist-packages/dask/local.py:220: in execute_task
result = _execute_task(task, data)
/usr/local/lib/python3.8/dist-packages/dask/core.py:119: in _execute_task
return func(*(_execute_task(a, cache) for a in args))
/usr/local/lib/python3.8/dist-packages/dask/optimization.py:969: in __call__
return core.get(self.dsk, self.outkey, dict(zip(self.inkeys, args)))
/usr/local/lib/python3.8/dist-packages/dask/core.py:149: in get
result = _execute_task(task, cache)
/usr/local/lib/python3.8/dist-packages/dask/core.py:119: in _execute_task
return func(*(_execute_task(a, cache) for a in args))
/usr/local/lib/python3.8/dist-packages/dask/dataframe/io/parquet/core.py:87: in __call__
return read_parquet_part(
/usr/local/lib/python3.8/dist-packages/dask/dataframe/io/parquet/core.py:431: in read_parquet_part
dfs = [
/usr/local/lib/python3.8/dist-packages/dask/dataframe/io/parquet/core.py:432: in <listcomp>
func(fs, rg, columns.copy(), index, **toolz.merge(kwargs, kw))
/usr/local/lib/python3.8/dist-packages/dask_cudf/io/parquet.py:216: in read_partition
cls._read_paths(
/usr/local/lib/python3.8/dist-packages/dask_cudf/io/parquet.py:92: in _read_paths
df = cudf.read_parquet(
/usr/local/lib/python3.8/dist-packages/nvtx/nvtx.py:101: in inner
result = func(*args, **kwargs)
/usr/local/lib/python3.8/dist-packages/cudf/io/parquet.py:379: in read_parquet
) = _process_dataset(
/usr/local/lib/python3.8/dist-packages/nvtx/nvtx.py:101: in inner
result = func(*args, **kwargs)
/usr/local/lib/python3.8/dist-packages/cudf/io/parquet.py:205: in _process_dataset
dataset = ds.dataset(
/usr/local/lib/python3.8/dist-packages/pyarrow/dataset.py:683: in dataset
return _filesystem_dataset(source, **kwargs)
/usr/local/lib/python3.8/dist-packages/pyarrow/dataset.py:435: in _filesystem_dataset
return factory.finish(schema)
pyarrow/_dataset.pyx:2473: in pyarrow._dataset.DatasetFactory.finish
???
pyarrow/error.pxi:143: in pyarrow.lib.pyarrow_internal_check_status
???


???
E pyarrow.lib.ArrowInvalid: Error creating dataset. Could not read schema from '/tmp/pytest-of-jenkins/pytest-5/test_dask_dataset_from_datafra4/part_0.parquet': Could not open Parquet input source '/tmp/pytest-of-jenkins/pytest-5/test_dask_dataset_from_datafra4/part_0.parquet': Parquet file size is 0 bytes. Is this a 'parquet' file?

pyarrow/error.pxi:99: ArrowInvalid
_______________ test_dask_dataset_from_dataframe[True-dask_cudf] _______________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-5/test_dask_dataset_from_datafra5')
origin = 'dask_cudf', cpu = True

@pytest.mark.parametrize("origin", ["cudf", "dask_cudf", "pd", "dd"])
@pytest.mark.parametrize("cpu", [None, True])
def test_dask_dataset_from_dataframe(tmpdir, origin, cpu):

    # Generate a DataFrame-based input
    if origin in ("pd", "dd"):
        df = pd.DataFrame({"a": range(100)})
        if origin == "dd":
            df = dask.dataframe.from_pandas(df, npartitions=4)
    elif origin in ("cudf", "dask_cudf"):
        df = cudf.DataFrame({"a": range(100)})
        if origin == "dask_cudf":
            df = dask_cudf.from_cudf(df, npartitions=4)

    # Convert to an NVTabular Dataset and back to a ddf
    dataset = merlin.io.Dataset(df, cpu=cpu)
    result = dataset.to_ddf()

    # Check resulting data
    assert_eq(df, result)

    # Check that the cpu kwarg is working correctly
    if cpu:
        assert isinstance(result.compute(), pd.DataFrame)

        # Should still work if we move to the GPU
        # (test behavior after repetitive conversion)
        dataset.to_gpu()
        dataset.to_cpu()
        dataset.to_cpu()
        dataset.to_gpu()
        result = dataset.to_ddf()
        assert isinstance(result.compute(), cudf.DataFrame)
        dataset.to_cpu()
    else:
        assert isinstance(result.compute(), cudf.DataFrame)

        # Should still work if we move to the CPU
        # (test behavior after repetitive conversion)
        dataset.to_cpu()
        dataset.to_gpu()
        dataset.to_gpu()
        dataset.to_cpu()
        result = dataset.to_ddf()
        assert isinstance(result.compute(), pd.DataFrame)
        dataset.to_gpu()

    # Write to disk and read back
    path = str(tmpdir)
    dataset.to_parquet(path, out_files_per_proc=1, shuffle=None)
  ddf_check = dask_cudf.read_parquet(path).compute()

tests/unit/io/test_io.py:290:


/usr/local/lib/python3.8/dist-packages/dask/base.py:288: in compute
(result,) = compute(self, traverse=False, **kwargs)
/usr/local/lib/python3.8/dist-packages/dask/base.py:571: in compute
results = schedule(dsk, keys, **kwargs)
/usr/local/lib/python3.8/dist-packages/dask/local.py:553: in get_sync
return get_async(
/usr/local/lib/python3.8/dist-packages/dask/local.py:496: in get_async
for key, res_info, failed in queue_get(queue).result():
/usr/lib/python3.8/concurrent/futures/_base.py:437: in result
return self.__get_result()
/usr/lib/python3.8/concurrent/futures/_base.py:389: in __get_result
raise self._exception
/usr/local/lib/python3.8/dist-packages/dask/local.py:538: in submit
fut.set_result(fn(*args, **kwargs))
/usr/local/lib/python3.8/dist-packages/dask/local.py:234: in batch_execute_tasks
return [execute_task(a) for a in it]
/usr/local/lib/python3.8/dist-packages/dask/local.py:234: in <listcomp>
return [execute_task(a) for a in it]
/usr/local/lib/python3.8/dist-packages/dask/local.py:225: in execute_task
result = pack_exception(e, dumps)
/usr/local/lib/python3.8/dist-packages/dask/local.py:220: in execute_task
result = _execute_task(task, data)
/usr/local/lib/python3.8/dist-packages/dask/core.py:119: in _execute_task
return func(*(_execute_task(a, cache) for a in args))
/usr/local/lib/python3.8/dist-packages/dask/optimization.py:969: in __call__
return core.get(self.dsk, self.outkey, dict(zip(self.inkeys, args)))
/usr/local/lib/python3.8/dist-packages/dask/core.py:149: in get
result = _execute_task(task, cache)
/usr/local/lib/python3.8/dist-packages/dask/core.py:119: in _execute_task
return func(*(_execute_task(a, cache) for a in args))
/usr/local/lib/python3.8/dist-packages/dask/dataframe/io/parquet/core.py:87: in __call__
return read_parquet_part(
/usr/local/lib/python3.8/dist-packages/dask/dataframe/io/parquet/core.py:431: in read_parquet_part
dfs = [
/usr/local/lib/python3.8/dist-packages/dask/dataframe/io/parquet/core.py:432: in <listcomp>
func(fs, rg, columns.copy(), index, **toolz.merge(kwargs, kw))
/usr/local/lib/python3.8/dist-packages/dask_cudf/io/parquet.py:216: in read_partition
cls._read_paths(
/usr/local/lib/python3.8/dist-packages/dask_cudf/io/parquet.py:92: in _read_paths
df = cudf.read_parquet(
/usr/local/lib/python3.8/dist-packages/nvtx/nvtx.py:101: in inner
result = func(*args, **kwargs)
/usr/local/lib/python3.8/dist-packages/cudf/io/parquet.py:379: in read_parquet
) = _process_dataset(
/usr/local/lib/python3.8/dist-packages/nvtx/nvtx.py:101: in inner
result = func(*args, **kwargs)
/usr/local/lib/python3.8/dist-packages/cudf/io/parquet.py:205: in _process_dataset
dataset = ds.dataset(
/usr/local/lib/python3.8/dist-packages/pyarrow/dataset.py:683: in dataset
return _filesystem_dataset(source, **kwargs)
/usr/local/lib/python3.8/dist-packages/pyarrow/dataset.py:435: in _filesystem_dataset
return factory.finish(schema)
pyarrow/_dataset.pyx:2473: in pyarrow._dataset.DatasetFactory.finish
???
pyarrow/error.pxi:143: in pyarrow.lib.pyarrow_internal_check_status
???


???
E pyarrow.lib.ArrowInvalid: Error creating dataset. Could not read schema from '/tmp/pytest-of-jenkins/pytest-5/test_dask_dataset_from_datafra5/part_0.parquet': Could not open Parquet input source '/tmp/pytest-of-jenkins/pytest-5/test_dask_dataset_from_datafra5/part_0.parquet': Parquet file size is 0 bytes. Is this a 'parquet' file?

pyarrow/error.pxi:99: ArrowInvalid
__________________ test_dask_dataset_from_dataframe[True-pd] ___________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-5/test_dask_dataset_from_datafra6')
origin = 'pd', cpu = True

@pytest.mark.parametrize("origin", ["cudf", "dask_cudf", "pd", "dd"])
@pytest.mark.parametrize("cpu", [None, True])
def test_dask_dataset_from_dataframe(tmpdir, origin, cpu):

    # Generate a DataFrame-based input
    if origin in ("pd", "dd"):
        df = pd.DataFrame({"a": range(100)})
        if origin == "dd":
            df = dask.dataframe.from_pandas(df, npartitions=4)
    elif origin in ("cudf", "dask_cudf"):
        df = cudf.DataFrame({"a": range(100)})
        if origin == "dask_cudf":
            df = dask_cudf.from_cudf(df, npartitions=4)

    # Convert to an NVTabular Dataset and back to a ddf
    dataset = merlin.io.Dataset(df, cpu=cpu)
    result = dataset.to_ddf()

    # Check resulting data
    assert_eq(df, result)

    # Check that the cpu kwarg is working correctly
    if cpu:
        assert isinstance(result.compute(), pd.DataFrame)

        # Should still work if we move to the GPU
        # (test behavior after repetitive conversion)
        dataset.to_gpu()
        dataset.to_cpu()
        dataset.to_cpu()
        dataset.to_gpu()
        result = dataset.to_ddf()
        assert isinstance(result.compute(), cudf.DataFrame)
        dataset.to_cpu()
    else:
        assert isinstance(result.compute(), cudf.DataFrame)

        # Should still work if we move to the CPU
        # (test behavior after repetitive conversion)
        dataset.to_cpu()
        dataset.to_gpu()
        dataset.to_gpu()
        dataset.to_cpu()
        result = dataset.to_ddf()
        assert isinstance(result.compute(), pd.DataFrame)
        dataset.to_gpu()

    # Write to disk and read back
    path = str(tmpdir)
    dataset.to_parquet(path, out_files_per_proc=1, shuffle=None)
  ddf_check = dask_cudf.read_parquet(path).compute()

tests/unit/io/test_io.py:290:


/usr/local/lib/python3.8/dist-packages/dask/base.py:288: in compute
(result,) = compute(self, traverse=False, **kwargs)
/usr/local/lib/python3.8/dist-packages/dask/base.py:571: in compute
results = schedule(dsk, keys, **kwargs)
/usr/local/lib/python3.8/dist-packages/dask/local.py:553: in get_sync
return get_async(
/usr/local/lib/python3.8/dist-packages/dask/local.py:496: in get_async
for key, res_info, failed in queue_get(queue).result():
/usr/lib/python3.8/concurrent/futures/_base.py:437: in result
return self.__get_result()
/usr/lib/python3.8/concurrent/futures/_base.py:389: in __get_result
raise self._exception
/usr/local/lib/python3.8/dist-packages/dask/local.py:538: in submit
fut.set_result(fn(*args, **kwargs))
/usr/local/lib/python3.8/dist-packages/dask/local.py:234: in batch_execute_tasks
return [execute_task(a) for a in it]
/usr/local/lib/python3.8/dist-packages/dask/local.py:234: in <listcomp>
return [execute_task(a) for a in it]
/usr/local/lib/python3.8/dist-packages/dask/local.py:225: in execute_task
result = pack_exception(e, dumps)
/usr/local/lib/python3.8/dist-packages/dask/local.py:220: in execute_task
result = _execute_task(task, data)
/usr/local/lib/python3.8/dist-packages/dask/core.py:119: in _execute_task
return func(*(_execute_task(a, cache) for a in args))
/usr/local/lib/python3.8/dist-packages/dask/optimization.py:969: in __call__
return core.get(self.dsk, self.outkey, dict(zip(self.inkeys, args)))
/usr/local/lib/python3.8/dist-packages/dask/core.py:149: in get
result = _execute_task(task, cache)
/usr/local/lib/python3.8/dist-packages/dask/core.py:119: in _execute_task
return func(*(_execute_task(a, cache) for a in args))
/usr/local/lib/python3.8/dist-packages/dask/dataframe/io/parquet/core.py:87: in __call__
return read_parquet_part(
/usr/local/lib/python3.8/dist-packages/dask/dataframe/io/parquet/core.py:431: in read_parquet_part
dfs = [
/usr/local/lib/python3.8/dist-packages/dask/dataframe/io/parquet/core.py:432: in <listcomp>
func(fs, rg, columns.copy(), index, **toolz.merge(kwargs, kw))
/usr/local/lib/python3.8/dist-packages/dask_cudf/io/parquet.py:216: in read_partition
cls._read_paths(
/usr/local/lib/python3.8/dist-packages/dask_cudf/io/parquet.py:92: in _read_paths
df = cudf.read_parquet(
/usr/local/lib/python3.8/dist-packages/nvtx/nvtx.py:101: in inner
result = func(*args, **kwargs)
/usr/local/lib/python3.8/dist-packages/cudf/io/parquet.py:379: in read_parquet
) = _process_dataset(
/usr/local/lib/python3.8/dist-packages/nvtx/nvtx.py:101: in inner
result = func(*args, **kwargs)
/usr/local/lib/python3.8/dist-packages/cudf/io/parquet.py:205: in _process_dataset
dataset = ds.dataset(
/usr/local/lib/python3.8/dist-packages/pyarrow/dataset.py:683: in dataset
return _filesystem_dataset(source, **kwargs)
/usr/local/lib/python3.8/dist-packages/pyarrow/dataset.py:435: in _filesystem_dataset
return factory.finish(schema)
pyarrow/_dataset.pyx:2473: in pyarrow._dataset.DatasetFactory.finish
???
pyarrow/error.pxi:143: in pyarrow.lib.pyarrow_internal_check_status
???


???
E pyarrow.lib.ArrowInvalid: Error creating dataset. Could not read schema from '/tmp/pytest-of-jenkins/pytest-5/test_dask_dataset_from_datafra6/part_0.parquet': Could not open Parquet input source '/tmp/pytest-of-jenkins/pytest-5/test_dask_dataset_from_datafra6/part_0.parquet': Parquet file size is 0 bytes. Is this a 'parquet' file?

pyarrow/error.pxi:99: ArrowInvalid
__________________ test_dask_dataset_from_dataframe[True-dd] ___________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-5/test_dask_dataset_from_datafra7')
origin = 'dd', cpu = True

@pytest.mark.parametrize("origin", ["cudf", "dask_cudf", "pd", "dd"])
@pytest.mark.parametrize("cpu", [None, True])
def test_dask_dataset_from_dataframe(tmpdir, origin, cpu):

    # Generate a DataFrame-based input
    if origin in ("pd", "dd"):
        df = pd.DataFrame({"a": range(100)})
        if origin == "dd":
            df = dask.dataframe.from_pandas(df, npartitions=4)
    elif origin in ("cudf", "dask_cudf"):
        df = cudf.DataFrame({"a": range(100)})
        if origin == "dask_cudf":
            df = dask_cudf.from_cudf(df, npartitions=4)

    # Convert to an NVTabular Dataset and back to a ddf
    dataset = merlin.io.Dataset(df, cpu=cpu)
    result = dataset.to_ddf()

    # Check resulting data
    assert_eq(df, result)

    # Check that the cpu kwarg is working correctly
    if cpu:
        assert isinstance(result.compute(), pd.DataFrame)

        # Should still work if we move to the GPU
        # (test behavior after repetitive conversion)
        dataset.to_gpu()
        dataset.to_cpu()
        dataset.to_cpu()
        dataset.to_gpu()
        result = dataset.to_ddf()
        assert isinstance(result.compute(), cudf.DataFrame)
        dataset.to_cpu()
    else:
        assert isinstance(result.compute(), cudf.DataFrame)

        # Should still work if we move to the CPU
        # (test behavior after repetitive conversion)
        dataset.to_cpu()
        dataset.to_gpu()
        dataset.to_gpu()
        dataset.to_cpu()
        result = dataset.to_ddf()
        assert isinstance(result.compute(), pd.DataFrame)
        dataset.to_gpu()

    # Write to disk and read back
    path = str(tmpdir)
    dataset.to_parquet(path, out_files_per_proc=1, shuffle=None)
  ddf_check = dask_cudf.read_parquet(path).compute()

tests/unit/io/test_io.py:290:


/usr/local/lib/python3.8/dist-packages/dask/base.py:288: in compute
(result,) = compute(self, traverse=False, **kwargs)
/usr/local/lib/python3.8/dist-packages/dask/base.py:571: in compute
results = schedule(dsk, keys, **kwargs)
/usr/local/lib/python3.8/dist-packages/dask/local.py:553: in get_sync
return get_async(
/usr/local/lib/python3.8/dist-packages/dask/local.py:496: in get_async
for key, res_info, failed in queue_get(queue).result():
/usr/lib/python3.8/concurrent/futures/_base.py:437: in result
return self.__get_result()
/usr/lib/python3.8/concurrent/futures/_base.py:389: in __get_result
raise self._exception
/usr/local/lib/python3.8/dist-packages/dask/local.py:538: in submit
fut.set_result(fn(*args, **kwargs))
/usr/local/lib/python3.8/dist-packages/dask/local.py:234: in batch_execute_tasks
return [execute_task(a) for a in it]
/usr/local/lib/python3.8/dist-packages/dask/local.py:234: in <listcomp>
return [execute_task(a) for a in it]
/usr/local/lib/python3.8/dist-packages/dask/local.py:225: in execute_task
result = pack_exception(e, dumps)
/usr/local/lib/python3.8/dist-packages/dask/local.py:220: in execute_task
result = _execute_task(task, data)
/usr/local/lib/python3.8/dist-packages/dask/core.py:119: in _execute_task
return func(*(_execute_task(a, cache) for a in args))
/usr/local/lib/python3.8/dist-packages/dask/optimization.py:969: in __call__
return core.get(self.dsk, self.outkey, dict(zip(self.inkeys, args)))
/usr/local/lib/python3.8/dist-packages/dask/core.py:149: in get
result = _execute_task(task, cache)
/usr/local/lib/python3.8/dist-packages/dask/core.py:119: in _execute_task
return func(*(_execute_task(a, cache) for a in args))
/usr/local/lib/python3.8/dist-packages/dask/dataframe/io/parquet/core.py:87: in __call__
return read_parquet_part(
/usr/local/lib/python3.8/dist-packages/dask/dataframe/io/parquet/core.py:431: in read_parquet_part
dfs = [
/usr/local/lib/python3.8/dist-packages/dask/dataframe/io/parquet/core.py:432: in <listcomp>
func(fs, rg, columns.copy(), index, **toolz.merge(kwargs, kw))
/usr/local/lib/python3.8/dist-packages/dask_cudf/io/parquet.py:216: in read_partition
cls._read_paths(
/usr/local/lib/python3.8/dist-packages/dask_cudf/io/parquet.py:92: in _read_paths
df = cudf.read_parquet(
/usr/local/lib/python3.8/dist-packages/nvtx/nvtx.py:101: in inner
result = func(*args, **kwargs)
/usr/local/lib/python3.8/dist-packages/cudf/io/parquet.py:379: in read_parquet
) = _process_dataset(
/usr/local/lib/python3.8/dist-packages/nvtx/nvtx.py:101: in inner
result = func(*args, **kwargs)
/usr/local/lib/python3.8/dist-packages/cudf/io/parquet.py:205: in _process_dataset
dataset = ds.dataset(
/usr/local/lib/python3.8/dist-packages/pyarrow/dataset.py:683: in dataset
return _filesystem_dataset(source, **kwargs)
/usr/local/lib/python3.8/dist-packages/pyarrow/dataset.py:435: in _filesystem_dataset
return factory.finish(schema)
pyarrow/_dataset.pyx:2473: in pyarrow._dataset.DatasetFactory.finish
???
pyarrow/error.pxi:143: in pyarrow.lib.pyarrow_internal_check_status
???


???
E pyarrow.lib.ArrowInvalid: Error creating dataset. Could not read schema from '/tmp/pytest-of-jenkins/pytest-5/test_dask_dataset_from_datafra7/part_0.parquet': Could not open Parquet input source '/tmp/pytest-of-jenkins/pytest-5/test_dask_dataset_from_datafra7/part_0.parquet': Parquet file size is 0 bytes. Is this a 'parquet' file?

pyarrow/error.pxi:99: ArrowInvalid
________________________ test_value_count[value_count0] ________________________

value_count = {'max': 0, 'min': 0}

@pytest.mark.parametrize(
    "value_count", [{"min": 0, "max": 0}, {"min": 1, "max": 1}, {"min": 1, "max": 2}]
)
def test_value_count(value_count):
    schema = Schema(
        [
            ColumnSchema(
                "example",
                properties={
                    "value_count": value_count,
                },
                is_list=True,
            )
        ]
    )
    json_schema = TensorflowMetadata.from_merlin_schema(schema).to_json()
    output_schema = TensorflowMetadata.from_json(json_schema).to_merlin_schema()
  assert output_schema == schema

E AssertionError: assert [{'name': 'ex...gged': False}] == [{'name': 'ex...gged': False}]
E Full diff:
E - [{'name': 'example', 'tags': set(), 'properties': {'value_count': {'min': 0, 'max': 0}}, 'dtype': dtype('float64'), 'is_list': True, 'is_ragged': False}]
E ? -----------------------------------
E + [{'name': 'example', 'tags': set(), 'properties': {}, 'dtype': dtype('float64'), 'is_list': True, 'is_ragged': False}]

tests/unit/schema/test_schema_io.py:93: AssertionError
________________________ test_value_count[value_count1] ________________________

value_count = {'max': 1, 'min': 1}

@pytest.mark.parametrize(
    "value_count", [{"min": 0, "max": 0}, {"min": 1, "max": 1}, {"min": 1, "max": 2}]
)
def test_value_count(value_count):
    schema = Schema(
        [
            ColumnSchema(
                "example",
                properties={
                    "value_count": value_count,
                },
                is_list=True,
            )
        ]
    )
  json_schema = TensorflowMetadata.from_merlin_schema(schema).to_json()

tests/unit/schema/test_schema_io.py:91:


merlin/schema/io/tensorflow_metadata.py:216: in to_json
return self.proto_schema.to_json()
/usr/local/lib/python3.8/dist-packages/betterproto/__init__.py:909: in to_json
return json.dumps(self.to_dict(), indent=indent)
/usr/local/lib/python3.8/dist-packages/betterproto/__init__.py:811: in to_dict
v = [i.to_dict(casing, include_default_values) for i in v]
/usr/local/lib/python3.8/dist-packages/betterproto/__init__.py:811: in <listcomp>
v = [i.to_dict(casing, include_default_values) for i in v]
/usr/local/lib/python3.8/dist-packages/betterproto/__init__.py:816: in to_dict
output[cased_name] = v.to_dict(casing, include_default_values)


self = FixedShape(dim=1), casing = <function camelcase at 0x7f234bb6d700>
include_default_values = False

def to_dict(
    self, casing: Casing = Casing.CAMEL, include_default_values: bool = False
) -> dict:
    """
    Returns a dict representation of this message instance which can be
    used to serialize to e.g. JSON. Defaults to camel casing for
    compatibility but can be set to other modes.

    `include_default_values` can be set to `True` to include default
    values of fields. E.g. an `int32` type field with `0` value will
    not be in returned dict if `include_default_values` is set to
    `False`.
    """
    output: Dict[str, Any] = {}
    for field in dataclasses.fields(self):
        meta = FieldMetadata.get(field)
        v = getattr(self, field.name)
        cased_name = casing(field.name).rstrip("_")  # type: ignore
        if meta.proto_type == "message":
            if isinstance(v, datetime):
                if v != DATETIME_ZERO or include_default_values:
                    output[cased_name] = _Timestamp.timestamp_to_json(v)
            elif isinstance(v, timedelta):
                if v != timedelta(0) or include_default_values:
                    output[cased_name] = _Duration.delta_to_json(v)
            elif meta.wraps:
                if v is not None or include_default_values:
                    output[cased_name] = v
            elif isinstance(v, list):
                # Convert each item.
                v = [i.to_dict(casing, include_default_values) for i in v]
                if v or include_default_values:
                    output[cased_name] = v
            else:
              if v._serialized_on_wire or include_default_values:

E AttributeError: 'int' object has no attribute '_serialized_on_wire'

/usr/local/lib/python3.8/dist-packages/betterproto/__init__.py:815: AttributeError
________________________ test_value_count[value_count2] ________________________

value_count = {'max': 2, 'min': 1}

@pytest.mark.parametrize(
    "value_count", [{"min": 0, "max": 0}, {"min": 1, "max": 1}, {"min": 1, "max": 2}]
)
def test_value_count(value_count):
    schema = Schema(
        [
            ColumnSchema(
                "example",
                properties={
                    "value_count": value_count,
                },
                is_list=True,
            )
        ]
    )
    json_schema = TensorflowMetadata.from_merlin_schema(schema).to_json()
    output_schema = TensorflowMetadata.from_json(json_schema).to_merlin_schema()
  assert output_schema == schema

E AssertionError: assert [{'name': 'ex...agged': True}] == [{'name': 'ex...gged': False}]
E Full diff:
E - [{'name': 'example', 'tags': set(), 'properties': {'value_count': {'min': 1, 'max': 2}}, 'dtype': dtype('float64'), 'is_list': True, 'is_ragged': False}]
E ? ^^^^
E + [{'name': 'example', 'tags': set(), 'properties': {'value_count': {'min': 1, 'max': 2}}, 'dtype': dtype('float64'), 'is_list': True, 'is_ragged': True}]
E ? ^^^

tests/unit/schema/test_schema_io.py:93: AssertionError
=============================== warnings summary ===============================
tests/unit/dag/test_base_operator.py: 4 warnings
tests/unit/io/test_io.py: 71 warnings
/usr/local/lib/python3.8/dist-packages/cudf/core/frame.py:384: UserWarning: The deep parameter is ignored and is only included for pandas compatibility.
warnings.warn(

tests/unit/io/test_io.py::test_validate_and_regenerate_dataset
/var/jenkins_home/workspace/merlin_core/core/merlin/io/parquet.py:551: DeprecationWarning: 'ParquetDataset.pieces' attribute is deprecated as of pyarrow 5.0.0 and will be removed in a future version. Specify 'use_legacy_dataset=False' while constructing the ParquetDataset, and then use the '.fragments' attribute instead.
paths = [p.path for p in pa_dataset.pieces]

tests/unit/utils/test_utils.py::test_nvt_distributed[True-True]
/usr/local/lib/python3.8/dist-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 40849 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed[True-False]
/usr/local/lib/python3.8/dist-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 45103 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed[False-True]
/usr/local/lib/python3.8/dist-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 46801 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed[False-False]
/usr/local/lib/python3.8/dist-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 39641 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed_force[True]
/usr/local/lib/python3.8/dist-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 34735 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed_force[False]
/usr/local/lib/python3.8/dist-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 44247 instead
warnings.warn(

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
=========================== short test summary info ============================
FAILED tests/unit/io/test_io.py::test_dask_dataset_from_dataframe[True-cudf]
FAILED tests/unit/io/test_io.py::test_dask_dataset_from_dataframe[True-dask_cudf]
FAILED tests/unit/io/test_io.py::test_dask_dataset_from_dataframe[True-pd] - ...
FAILED tests/unit/io/test_io.py::test_dask_dataset_from_dataframe[True-dd] - ...
FAILED tests/unit/schema/test_schema_io.py::test_value_count[value_count0] - ...
FAILED tests/unit/schema/test_schema_io.py::test_value_count[value_count1] - ...
FAILED tests/unit/schema/test_schema_io.py::test_value_count[value_count2] - ...
============ 7 failed, 339 passed, 1 skipped, 82 warnings in 51.75s ============
Build step 'Execute shell' marked build as failure
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA-Merlin/core/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[merlin_core] $ /bin/bash /tmp/jenkins18257896002267801304.sh

@viswa-nvidia viswa-nvidia added this to the Merlin 22.08 milestone Aug 5, 2022
@nvidia-merlin-bot

Click to view CI Results
GitHub pull request #111 of commit d2b21ecc5495fb500d784ea9b3ee00c37f5950c0, no merge conflicts.
Running as SYSTEM
Setting status of d2b21ecc5495fb500d784ea9b3ee00c37f5950c0 to PENDING with url https://10.20.13.93:8080/job/merlin_core/102/console and message: 'Pending'
Using context: Jenkins
Building on master in workspace /var/jenkins_home/workspace/merlin_core
using credential ce87ff3c-94f0-400a-8303-cb4acb4918b5
 > git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > git config remote.origin.url /~https://github.com/NVIDIA-Merlin/core # timeout=10
Fetching upstream changes from /~https://github.com/NVIDIA-Merlin/core
 > git --version # timeout=10
using GIT_ASKPASS to set credentials login for merlin-systems username and pass
 > git fetch --tags --force --progress -- /~https://github.com/NVIDIA-Merlin/core +refs/pull/111/*:refs/remotes/origin/pr/111/* # timeout=10
 > git rev-parse d2b21ecc5495fb500d784ea9b3ee00c37f5950c0^{commit} # timeout=10
Checking out Revision d2b21ecc5495fb500d784ea9b3ee00c37f5950c0 (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f d2b21ecc5495fb500d784ea9b3ee00c37f5950c0 # timeout=10
Commit message: "Merge branch 'main' into schema-io-value-count"
 > git rev-list --no-walk 5c431330c7a782eae5e4bcef3a628ce76281825c # timeout=10
[merlin_core] $ /bin/bash /tmp/jenkins17375995638170082886.sh
============================= test session starts ==============================
platform linux -- Python 3.8.10, pytest-7.1.2, pluggy-1.0.0
rootdir: /var/jenkins_home/workspace/merlin_core/core, configfile: pyproject.toml
plugins: anyio-3.6.1, xdist-2.5.0, forked-1.4.0, cov-3.0.0
collected 346 items / 1 skipped

tests/unit/core/test_dispatch.py .. [ 0%]
tests/unit/dag/test_base_operator.py .... [ 1%]
tests/unit/dag/test_column_selector.py .......................... [ 9%]
tests/unit/dag/test_graph.py . [ 9%]
tests/unit/dag/test_tags.py ...... [ 11%]
tests/unit/dag/ops/test_selection.py ... [ 12%]
tests/unit/io/test_io.py ............................................... [ 25%]
................................................................ [ 44%]
tests/unit/schema/test_column_schemas.py ............................... [ 53%]
........................................................................ [ 73%]
....................................................................... [ 94%]
tests/unit/schema/test_schema.py ...... [ 96%]
tests/unit/schema/test_schema_io.py ..FFF [ 97%]
tests/unit/utils/test_utils.py ........ [100%]

=================================== FAILURES ===================================
________________________ test_value_count[value_count0] ________________________

value_count = {'max': 0, 'min': 0}

@pytest.mark.parametrize(
    "value_count", [{"min": 0, "max": 0}, {"min": 1, "max": 1}, {"min": 1, "max": 2}]
)
def test_value_count(value_count):
    schema = Schema(
        [
            ColumnSchema(
                "example",
                properties={
                    "value_count": value_count,
                },
                is_list=True,
            )
        ]
    )
    json_schema = TensorflowMetadata.from_merlin_schema(schema).to_json()
    output_schema = TensorflowMetadata.from_json(json_schema).to_merlin_schema()
  assert output_schema == schema

E AssertionError: assert [{'name': 'ex...gged': False}] == [{'name': 'ex...gged': False}]
E Full diff:
E - [{'name': 'example', 'tags': set(), 'properties': {'value_count': {'min': 0, 'max': 0}}, 'dtype': dtype('float64'), 'is_list': True, 'is_ragged': False}]
E ? -----------------------------------
E + [{'name': 'example', 'tags': set(), 'properties': {}, 'dtype': dtype('float64'), 'is_list': True, 'is_ragged': False}]

tests/unit/schema/test_schema_io.py:93: AssertionError
________________________ test_value_count[value_count1] ________________________

value_count = {'max': 1, 'min': 1}

@pytest.mark.parametrize(
    "value_count", [{"min": 0, "max": 0}, {"min": 1, "max": 1}, {"min": 1, "max": 2}]
)
def test_value_count(value_count):
    schema = Schema(
        [
            ColumnSchema(
                "example",
                properties={
                    "value_count": value_count,
                },
                is_list=True,
            )
        ]
    )
  json_schema = TensorflowMetadata.from_merlin_schema(schema).to_json()

tests/unit/schema/test_schema_io.py:91:


merlin/schema/io/tensorflow_metadata.py:216: in to_json
return self.proto_schema.to_json()
/usr/local/lib/python3.8/dist-packages/betterproto/__init__.py:909: in to_json
return json.dumps(self.to_dict(), indent=indent)
/usr/local/lib/python3.8/dist-packages/betterproto/__init__.py:811: in to_dict
v = [i.to_dict(casing, include_default_values) for i in v]
/usr/local/lib/python3.8/dist-packages/betterproto/__init__.py:811: in <listcomp>
v = [i.to_dict(casing, include_default_values) for i in v]
/usr/local/lib/python3.8/dist-packages/betterproto/__init__.py:816: in to_dict
output[cased_name] = v.to_dict(casing, include_default_values)


self = FixedShape(dim=1), casing = <function camelcase at 0x7fd7a8680820>
include_default_values = False

def to_dict(
    self, casing: Casing = Casing.CAMEL, include_default_values: bool = False
) -> dict:
    """
    Returns a dict representation of this message instance which can be
    used to serialize to e.g. JSON. Defaults to camel casing for
    compatibility but can be set to other modes.

    `include_default_values` can be set to `True` to include default
    values of fields. E.g. an `int32` type field with `0` value will
    not be in returned dict if `include_default_values` is set to
    `False`.
    """
    output: Dict[str, Any] = {}
    for field in dataclasses.fields(self):
        meta = FieldMetadata.get(field)
        v = getattr(self, field.name)
        cased_name = casing(field.name).rstrip("_")  # type: ignore
        if meta.proto_type == "message":
            if isinstance(v, datetime):
                if v != DATETIME_ZERO or include_default_values:
                    output[cased_name] = _Timestamp.timestamp_to_json(v)
            elif isinstance(v, timedelta):
                if v != timedelta(0) or include_default_values:
                    output[cased_name] = _Duration.delta_to_json(v)
            elif meta.wraps:
                if v is not None or include_default_values:
                    output[cased_name] = v
            elif isinstance(v, list):
                # Convert each item.
                v = [i.to_dict(casing, include_default_values) for i in v]
                if v or include_default_values:
                    output[cased_name] = v
            else:
              if v._serialized_on_wire or include_default_values:

E AttributeError: 'int' object has no attribute '_serialized_on_wire'

/usr/local/lib/python3.8/dist-packages/betterproto/__init__.py:815: AttributeError
________________________ test_value_count[value_count2] ________________________

value_count = {'max': 2, 'min': 1}

@pytest.mark.parametrize(
    "value_count", [{"min": 0, "max": 0}, {"min": 1, "max": 1}, {"min": 1, "max": 2}]
)
def test_value_count(value_count):
    schema = Schema(
        [
            ColumnSchema(
                "example",
                properties={
                    "value_count": value_count,
                },
                is_list=True,
            )
        ]
    )
    json_schema = TensorflowMetadata.from_merlin_schema(schema).to_json()
    output_schema = TensorflowMetadata.from_json(json_schema).to_merlin_schema()
  assert output_schema == schema

E AssertionError: assert [{'name': 'ex...agged': True}] == [{'name': 'ex...gged': False}]
E Full diff:
E - [{'name': 'example', 'tags': set(), 'properties': {'value_count': {'min': 1, 'max': 2}}, 'dtype': dtype('float64'), 'is_list': True, 'is_ragged': False}]
E ? ^^^^
E + [{'name': 'example', 'tags': set(), 'properties': {'value_count': {'min': 1, 'max': 2}}, 'dtype': dtype('float64'), 'is_list': True, 'is_ragged': True}]
E ? ^^^

tests/unit/schema/test_schema_io.py:93: AssertionError
=============================== warnings summary ===============================
tests/unit/dag/test_base_operator.py: 4 warnings
tests/unit/io/test_io.py: 71 warnings
/usr/local/lib/python3.8/dist-packages/cudf/core/frame.py:384: UserWarning: The deep parameter is ignored and is only included for pandas compatibility.
warnings.warn(

tests/unit/io/test_io.py::test_validate_and_regenerate_dataset
/var/jenkins_home/workspace/merlin_core/core/merlin/io/parquet.py:551: DeprecationWarning: 'ParquetDataset.pieces' attribute is deprecated as of pyarrow 5.0.0 and will be removed in a future version. Specify 'use_legacy_dataset=False' while constructing the ParquetDataset, and then use the '.fragments' attribute instead.
paths = [p.path for p in pa_dataset.pieces]

tests/unit/utils/test_utils.py::test_serial_context[True]
/usr/local/lib/python3.8/dist-packages/tornado/ioloop.py:350: DeprecationWarning: make_current is deprecated; start the event loop first
self.make_current()

tests/unit/utils/test_utils.py::test_nvt_distributed[True-True]
/usr/local/lib/python3.8/dist-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 33193 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed[True-False]
/usr/local/lib/python3.8/dist-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 45721 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed[False-True]
/usr/local/lib/python3.8/dist-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 35157 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed[False-False]
/usr/local/lib/python3.8/dist-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 37185 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed_force[True]
/usr/local/lib/python3.8/dist-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 42931 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed_force[False]
/usr/local/lib/python3.8/dist-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 44417 instead
warnings.warn(

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
=========================== short test summary info ============================
FAILED tests/unit/schema/test_schema_io.py::test_value_count[value_count0] - ...
FAILED tests/unit/schema/test_schema_io.py::test_value_count[value_count1] - ...
FAILED tests/unit/schema/test_schema_io.py::test_value_count[value_count2] - ...
============ 3 failed, 343 passed, 1 skipped, 83 warnings in 54.73s ============
Build step 'Execute shell' marked build as failure
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA-Merlin/core/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[merlin_core] $ /bin/bash /tmp/jenkins5560053937149570237.sh

@nvidia-merlin-bot

Click to view CI Results
GitHub pull request #111 of commit 6b6531116bfe32ee2e932e8d199de5de924c6724, no merge conflicts.
Running as SYSTEM
Setting status of 6b6531116bfe32ee2e932e8d199de5de924c6724 to PENDING with url http://merlin-infra1.nvidia.com:8080/job/merlin_core/288/ and message: 'Pending'
Using context: Jenkins
Building on master in workspace /var/jenkins_home/workspace/merlin_core
using credential ce87ff3c-94f0-400a-8303-cb4acb4918b5
 > git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > git config remote.origin.url /~https://github.com/NVIDIA-Merlin/core # timeout=10
Fetching upstream changes from /~https://github.com/NVIDIA-Merlin/core
 > git --version # timeout=10
using GIT_ASKPASS to set credentials login for merlin-systems username and pass
 > git fetch --tags --force --progress -- /~https://github.com/NVIDIA-Merlin/core +refs/pull/111/*:refs/remotes/origin/pr/111/* # timeout=10
 > git rev-parse 6b6531116bfe32ee2e932e8d199de5de924c6724^{commit} # timeout=10
Checking out Revision 6b6531116bfe32ee2e932e8d199de5de924c6724 (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 6b6531116bfe32ee2e932e8d199de5de924c6724 # timeout=10
Commit message: "Merge branch 'main' into schema-io-value-count"
 > git rev-list --no-walk 1538d39f5e2ec8b983f343ec87dc6db6cf11a0f5 # timeout=10
[merlin_core] $ /bin/bash /tmp/jenkins775823410952237213.sh
GLOB sdist-make: /var/jenkins_home/workspace/merlin_core/core/setup.py
test-gpu recreate: /var/jenkins_home/workspace/merlin_core/core/.tox/test-gpu
test-gpu installdeps: pytest, pytest-cov
WARNING: Discarding $PYTHONPATH from environment, to override specify PYTHONPATH in 'passenv' in your configuration.
test-gpu inst: /var/jenkins_home/workspace/merlin_core/core/.tox/.tmp/package/1/merlin-core-0.8.0+16.g6b65311.zip
WARNING: Discarding $PYTHONPATH from environment, to override specify PYTHONPATH in 'passenv' in your configuration.
test-gpu installed: absl-py==1.2.0,aiohttp==3.8.1,aiosignal==1.2.0,alabaster==0.7.12,alembic==1.8.1,anyio==3.6.1,argon2-cffi==21.3.0,argon2-cffi-bindings==21.2.0,astroid==2.5.6,asttokens==2.0.8,astunparse==1.6.3,asv==0.5.1,asvdb==0.4.2,async-timeout==4.0.2,attrs==22.1.0,autopage==0.5.1,awscli==1.27.12,Babel==2.10.3,backcall==0.2.0,beautifulsoup4==4.11.1,betterproto==1.2.5,black==22.6.0,bleach==5.0.1,boto3==1.24.75,botocore==1.29.12,Brotli==1.0.9,cachetools==5.2.0,certifi==2019.11.28,cffi==1.15.1,chardet==3.0.4,charset-normalizer==2.1.1,clang==5.0,click==8.1.3,cliff==4.0.0,cloudpickle==2.2.0,cmaes==0.9.0,cmake==3.24.1.1,cmd2==2.4.2,colorama==0.4.4,colorlog==6.7.0,contourpy==1.0.5,coverage==6.5.0,cuda-python==11.7.1,cupy-cuda117==10.6.0,cycler==0.11.0,Cython==0.29.32,dask==2022.1.1,dbus-python==1.2.16,debugpy==1.6.3,decorator==5.1.1,defusedxml==0.7.1,dill==0.3.5.1,distlib==0.3.6,distributed==2022.5.1,distro==1.7.0,dm-tree==0.1.6,docker-pycreds==0.4.0,docutils==0.16,emoji==1.7.0,entrypoints==0.4,execnet==1.9.0,executing==1.0.0,faiss==1.7.2,faiss-gpu==1.7.2,fastai==2.7.9,fastapi==0.85.0,fastavro==1.6.1,fastcore==1.5.27,fastdownload==0.0.7,fastjsonschema==2.16.1,fastprogress==1.0.3,fastrlock==0.8,feast==0.19.4,fiddle==0.2.2,filelock==3.8.0,flatbuffers==1.12,fonttools==4.37.3,frozenlist==1.3.1,fsspec==2022.5.0,gast==0.4.0,gevent==21.12.0,geventhttpclient==2.0.2,gitdb==4.0.9,GitPython==3.1.27,google==3.0.0,google-api-core==2.10.1,google-auth==2.11.1,google-auth-oauthlib==0.4.6,google-pasta==0.2.0,googleapis-common-protos==1.52.0,graphviz==0.20.1,greenlet==1.1.3,grpcio==1.41.0,grpcio-channelz==1.49.0,grpcio-reflection==1.48.1,grpclib==0.4.3,h11==0.13.0,h2==4.1.0,h5py==3.7.0,HeapDict==1.0.1,horovod==0.26.1,hpack==4.0.0,httptools==0.5.0,hugectr2onnx==0.0.0,huggingface-hub==0.9.1,hyperframe==6.0.1,idna==2.8,imagesize==1.4.1,implicit==0.6.1,importlib-metadata==4.12.0,importlib-resources==5.9.0,iniconfig==1.1.1,ipykernel==6.15.3,ipython==8.5.0,ipython-genutils==0.2.0,ipywidgets==7.7.0,jedi==0.18.1,Jinja2==3.1.2,jmespath==1.0.1,joblib==1.2.0,json5==0.9.10,jsonschema==4.16.0,jupyter-cache==0.4.3,jupyter-core==4.11.1,jupyter-server==1.18.1,jupyter-server-mathjax==0.2.5,jupyter-sphinx==0.3.2,jupyter_client==7.3.5,jupyterlab==3.4.7,jupyterlab-pygments==0.2.2,jupyterlab-widgets==1.1.0,jupyterlab_server==2.15.1,keras==2.9.0,Keras-Preprocessing==1.1.2,kiwisolver==1.4.4,lazy-object-proxy==1.8.0,libclang==14.0.6,libcst==0.4.7,lightfm==1.16,lightgbm==3.3.2,linkify-it-py==1.0.3,llvmlite==0.39.1,locket==1.0.0,lxml==4.9.1,Mako==1.2.4,Markdown==3.4.1,markdown-it-py==1.1.0,MarkupSafe==2.1.1,matplotlib==3.6.0,matplotlib-inline==0.1.6,mdit-py-plugins==0.2.8,merlin-core==0.8.0+16.g6b65311,merlin-models==0.7.0+11.g280956aa4,merlin-systems==0.5.0+4.g15074ad,mistune==2.0.4,mmh3==3.0.0,mpi4py==3.1.3,msgpack==1.0.4,multidict==6.0.2,mypy-extensions==0.4.3,myst-nb==0.13.2,myst-parser==0.15.2,natsort==8.1.0,nbclassic==0.4.3,nbclient==0.6.8,nbconvert==7.0.0,nbdime==3.1.1,nbformat==5.5.0,nest-asyncio==1.5.5,ninja==1.10.2.3,notebook==6.4.12,notebook-shim==0.1.0,numba==0.56.2,numpy==1.22.4,nvidia-pyindex==1.0.9,# Editable install with no version control (nvtabular==1.4.0+8.g95e12d347),-e 
/usr/local/lib/python3.8/dist-packages,nvtx==0.2.5,oauthlib==3.2.1,oldest-supported-numpy==2022.8.16,onnx==1.12.0,onnxruntime==1.11.1,opt-einsum==3.3.0,optuna==3.0.3,packaging==21.3,pandas==1.3.5,pandavro==1.5.2,pandocfilters==1.5.0,parso==0.8.3,partd==1.3.0,pathtools==0.1.2,pbr==5.11.0,pexpect==4.8.0,pickleshare==0.7.5,Pillow==9.2.0,pkgutil_resolve_name==1.3.10,platformdirs==2.5.2,plotly==5.11.0,pluggy==1.0.0,prettytable==3.5.0,prometheus-client==0.14.1,promise==2.3,prompt-toolkit==3.0.31,proto-plus==1.19.6,protobuf==3.19.5,psutil==5.9.2,ptyprocess==0.7.0,pure-eval==0.2.2,py==1.11.0,pyarrow==7.0.0,pyasn1==0.4.8,pyasn1-modules==0.2.8,pybind11==2.10.0,pycparser==2.21,pydantic==1.10.2,pydot==1.4.2,Pygments==2.13.0,PyGObject==3.36.0,pynvml==11.4.1,pyparsing==3.0.9,pyperclip==1.8.2,pyrsistent==0.18.1,pytest==7.1.3,pytest-cov==4.0.0,pytest-xdist==3.0.2,python-apt==2.0.0+ubuntu0.20.4.8,python-dateutil==2.8.2,python-dotenv==0.21.0,python-rapidjson==1.8,pytz==2022.2.1,PyYAML==5.4.1,pyzmq==24.0.0,regex==2022.9.13,requests==2.22.0,requests-oauthlib==1.3.1,requests-unixsocket==0.2.0,rsa==4.7.2,s3fs==2022.2.0,s3transfer==0.6.0,sacremoses==0.0.53,scikit-build==0.15.0,scikit-learn==1.1.2,scipy==1.8.1,seedir==0.3.0,Send2Trash==1.8.0,sentry-sdk==1.9.8,setproctitle==1.3.2,setuptools-scm==7.0.5,shortuuid==1.0.9,six==1.15.0,sklearn==0.0,smmap==5.0.0,sniffio==1.3.0,snowballstemmer==2.2.0,sortedcontainers==2.4.0,soupsieve==2.3.2.post1,Sphinx==5.3.0,sphinx-multiversion==0.2.4,sphinx-togglebutton==0.3.1,sphinx_external_toc==0.3.0,sphinxcontrib-applehelp==1.0.2,sphinxcontrib-copydirs @ git+/~https://github.com/mikemckiernan/sphinxcontrib-copydirs.git@bd8c5d79b3f91cf5f1bb0d6995aeca3fe84b670e,sphinxcontrib-devhelp==1.0.2,sphinxcontrib-htmlhelp==2.0.0,sphinxcontrib-jsmath==1.0.1,sphinxcontrib-qthelp==1.0.3,sphinxcontrib-serializinghtml==1.1.5,SQLAlchemy==1.4.44,stack-data==0.5.0,starlette==0.20.4,stevedore==4.1.1,stringcase==1.2.0,supervisor==4.1.0,tabulate==0.8.10,tblib==1.7.0,tdqm==0.0.1,tenacity==8.0.1,tensorboard==2.9.1,tensorboard-data-server==0.6.1,tensorboard-plugin-wit==1.8.1,tensorflow==2.9.2,tensorflow-estimator==2.9.0,tensorflow-gpu==2.9.2,tensorflow-io-gcs-filesystem==0.27.0,tensorflow-metadata==1.10.0,termcolor==2.0.1,terminado==0.15.0,testbook==0.4.2,threadpoolctl==3.1.0,tinycss2==1.1.1,tokenizers==0.10.3,toml==0.10.2,tomli==2.0.1,toolz==0.12.0,torch==1.12.1+cu113,torchmetrics==0.3.2,tornado==6.2,tox==3.26.0,tqdm==4.64.1,traitlets==5.4.0,transformers==4.12.0,transformers4rec==0.1.12+2.gbcc939255,treelite==2.3.0,treelite-runtime==2.3.0,tritonclient==2.25.0,typing-inspect==0.8.0,typing_extensions==4.3.0,uc-micro-py==1.0.1,urllib3==1.26.12,uvicorn==0.18.3,uvloop==0.17.0,versioneer==0.20,virtualenv==20.16.5,wandb==0.13.3,watchfiles==0.17.0,wcwidth==0.2.5,webencodings==0.5.1,websocket-client==1.4.1,websockets==10.3,Werkzeug==2.2.2,widgetsnbextension==3.6.0,wrapt==1.12.1,xgboost==1.6.2,yarl==1.8.1,zict==2.2.0,zipp==3.8.1,zope.event==4.5.0,zope.interface==5.4.0
test-gpu run-test-pre: PYTHONHASHSEED='1141691438'
test-gpu run-test: commands[0] | python -m pytest --cov-report term --cov merlin -rxs tests/unit
============================= test session starts ==============================
platform linux -- Python 3.8.10, pytest-7.1.3, pluggy-1.0.0
cachedir: .tox/test-gpu/.pytest_cache
rootdir: /var/jenkins_home/workspace/merlin_core/core, configfile: pyproject.toml
plugins: anyio-3.5.0, xdist-3.0.2, cov-4.0.0
collected 390 items / 1 skipped

tests/unit/core/test_dispatch.py .... [ 1%]
tests/unit/core/test_protocols.py ......... [ 3%]
tests/unit/core/test_version.py . [ 3%]
tests/unit/dag/test_base_operator.py .... [ 4%]
tests/unit/dag/test_column_selector.py .............................. [ 12%]
tests/unit/dag/test_dictarray.py ... [ 13%]
tests/unit/dag/test_executors.py .. [ 13%]
tests/unit/dag/test_graph.py .... [ 14%]
tests/unit/dag/ops/test_selection.py .... [ 15%]
tests/unit/io/test_io.py ............................................... [ 27%]
................................................................ [ 44%]
tests/unit/schema/test_column_schemas.py ....................... [ 50%]
tests/unit/schema/test_schema.py ............. [ 53%]
tests/unit/schema/test_schema_io.py ..FF................................ [ 62%]
........................................................................ [ 81%]
........................................................... [ 96%]
tests/unit/schema/test_tags.py ....... [ 97%]
tests/unit/utils/test_utils.py ........ [100%]

=================================== FAILURES ===================================
________________________ test_value_count[value_count0] ________________________

value_count = {'max': 0, 'min': 0}

@pytest.mark.parametrize(
    "value_count", [{"min": 0, "max": 0}, {"min": 1, "max": 1}, {"min": 1, "max": 2}]
)
def test_value_count(value_count):
    schema = Schema(
        [
            ColumnSchema(
                "example",
                properties={
                    "value_count": value_count,
                },
                is_list=True,
            )
        ]
    )
    json_schema = TensorflowMetadata.from_merlin_schema(schema).to_json()
    output_schema = TensorflowMetadata.from_json(json_schema).to_merlin_schema()
  assert output_schema == schema

E AssertionError: assert [{'name': 'ex...agged': True}] == [{'name': 'ex...agged': True}]
E Use -v to get more diff

tests/unit/schema/test_schema_io.py:97: AssertionError
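
This first failure is the silent round-trip case from the PR description: value_count = {'min': 0, 'max': 0} serializes without error, but the schema that comes back no longer matches. A minimal sketch of the mechanism, assuming only that the JSON encoder omits default-valued (zero) fields the way betterproto's to_dict docstring above says it does; serialize/deserialize here are illustrative stand-ins, not the Merlin code:

def serialize(properties):
    # Protobuf-style JSON encoders drop zero-valued (default) fields,
    # so {"min": 0, "max": 0} encodes to an empty message.
    vc = {k: v for k, v in properties.get("value_count", {}).items() if v != 0}
    return {"value_count": vc} if vc else {}

def deserialize(payload):
    # An absent message can't be told apart from one that held only defaults.
    return dict(payload)

props = {"value_count": {"min": 0, "max": 0}}
assert serialize(props) == {}                  # the property vanishes on the wire
assert deserialize(serialize(props)) != props  # so the round trip isn't lossless
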
________________________ test_value_count[value_count1] ________________________

value_count = {'max': 1, 'min': 1}

@pytest.mark.parametrize(
    "value_count", [{"min": 0, "max": 0}, {"min": 1, "max": 1}, {"min": 1, "max": 2}]
)
def test_value_count(value_count):
    schema = Schema(
        [
            ColumnSchema(
                "example",
                properties={
                    "value_count": value_count,
                },
                is_list=True,
            )
        ]
    )
  json_schema = TensorflowMetadata.from_merlin_schema(schema).to_json()

tests/unit/schema/test_schema_io.py:95:


merlin/schema/io/tensorflow_metadata.py:216: in to_json
return self.proto_schema.to_json()
/usr/local/lib/python3.8/dist-packages/betterproto/__init__.py:909: in to_json
return json.dumps(self.to_dict(), indent=indent)
/usr/local/lib/python3.8/dist-packages/betterproto/__init__.py:811: in to_dict
v = [i.to_dict(casing, include_default_values) for i in v]
/usr/local/lib/python3.8/dist-packages/betterproto/__init__.py:811: in <listcomp>
v = [i.to_dict(casing, include_default_values) for i in v]
/usr/local/lib/python3.8/dist-packages/betterproto/__init__.py:816: in to_dict
output[cased_name] = v.to_dict(casing, include_default_values)


self = FixedShape(dim=1), casing = <function camelcase at 0x7feac5d3f8b0>
include_default_values = False

def to_dict(
    self, casing: Casing = Casing.CAMEL, include_default_values: bool = False
) -> dict:
    """
    Returns a dict representation of this message instance which can be
    used to serialize to e.g. JSON. Defaults to camel casing for
    compatibility but can be set to other modes.

    `include_default_values` can be set to `True` to include default
    values of fields. E.g. an `int32` type field with `0` value will
    not be in returned dict if `include_default_values` is set to
    `False`.
    """
    output: Dict[str, Any] = {}
    for field in dataclasses.fields(self):
        meta = FieldMetadata.get(field)
        v = getattr(self, field.name)
        cased_name = casing(field.name).rstrip("_")  # type: ignore
        if meta.proto_type == "message":
            if isinstance(v, datetime):
                if v != DATETIME_ZERO or include_default_values:
                    output[cased_name] = _Timestamp.timestamp_to_json(v)
            elif isinstance(v, timedelta):
                if v != timedelta(0) or include_default_values:
                    output[cased_name] = _Duration.delta_to_json(v)
            elif meta.wraps:
                if v is not None or include_default_values:
                    output[cased_name] = v
            elif isinstance(v, list):
                # Convert each item.
                v = [i.to_dict(casing, include_default_values) for i in v]
                if v or include_default_values:
                    output[cased_name] = v
            else:
              if v._serialized_on_wire or include_default_values:

E AttributeError: 'int' object has no attribute '_serialized_on_wire'

/usr/local/lib/python3.8/dist-packages/betterproto/__init__.py:815: AttributeError
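
The second failure is the hard error for value_count = {'max': 1, 'min': 1}: the fixed-size code path ends up storing a bare int where betterproto's to_dict expects a message instance carrying _serialized_on_wire. A hypothetical mimic of that mismatch (Dim, its size field, and this to_dict are illustrative; only the attribute check mirrors the traceback):

from dataclasses import dataclass, field
from typing import Any, List

@dataclass
class Dim:                        # hypothetical wrapper message for one dimension
    size: int = 0
    _serialized_on_wire: bool = False

@dataclass
class FixedShape:                 # mimic: `dim` should hold message instances
    dim: List[Any] = field(default_factory=list)

    def to_dict(self):
        out = []
        for v in self.dim:
            v._serialized_on_wire           # AttributeError if v is a bare int
            out.append({"size": v.size})
        return {"dim": out}

FixedShape(dim=[Dim(1)]).to_dict()          # message instance: serializes fine
try:
    FixedShape(dim=[1]).to_dict()           # bare int: the failure mode above
except AttributeError as err:
    print(err)  # 'int' object has no attribute '_serialized_on_wire'
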
=============================== warnings summary ===============================
tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 6 files. Did not have enough partitions to create 7 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 8 files. Did not have enough partitions to create 9 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 8 files. Did not have enough partitions to create 10 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 8 files. Did not have enough partitions to create 11 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 12 files. Did not have enough partitions to create 13 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 12 files. Did not have enough partitions to create 14 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 12 files. Did not have enough partitions to create 15 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 12 files. Did not have enough partitions to create 16 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 12 files. Did not have enough partitions to create 17 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 12 files. Did not have enough partitions to create 18 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 12 files. Did not have enough partitions to create 19 files.
warnings.warn(

tests/unit/io/test_io.py::test_validate_and_regenerate_dataset
/var/jenkins_home/workspace/merlin_core/core/merlin/io/parquet.py:551: DeprecationWarning: 'ParquetDataset.pieces' attribute is deprecated as of pyarrow 5.0.0 and will be removed in a future version. Specify 'use_legacy_dataset=False' while constructing the ParquetDataset, and then use the '.fragments' attribute instead.
paths = [p.path for p in pa_dataset.pieces]

tests/unit/schema/test_column_schemas.py::test_column_schema_tags_normalize
tests/unit/schema/test_tags.py::test_tagset_atomizes_compound_tags
/var/jenkins_home/workspace/merlin_core/core/merlin/schema/tags.py:148: UserWarning: Compound tags like Tags.ITEM_ID have been deprecated and will be removed in a future version. Please use the atomic versions of these tags, like [<Tags.ITEM: 'item'>, <Tags.ID: 'id'>].
warnings.warn(

tests/unit/schema/test_schema_io.py::test_json_serialization_with_embedded_dicts
tests/unit/schema/test_schema_io.py::test_merlin_to_proto_to_json_to_merlin
tests/unit/schema/test_tags.py::test_tagset_atomizes_compound_tags
/var/jenkins_home/workspace/merlin_core/core/merlin/schema/tags.py:148: UserWarning: Compound tags like Tags.USER_ID have been deprecated and will be removed in a future version. Please use the atomic versions of these tags, like [<Tags.USER: 'user'>, <Tags.ID: 'id'>].
warnings.warn(

tests/unit/schema/test_tags.py::test_tagset_atomizes_compound_tags
/var/jenkins_home/workspace/merlin_core/core/merlin/schema/tags.py:148: UserWarning: Compound tags like Tags.SESSION_ID have been deprecated and will be removed in a future version. Please use the atomic versions of these tags, like [<Tags.SESSION: 'session'>, <Tags.ID: 'id'>].
warnings.warn(

tests/unit/schema/test_tags.py::test_tagset_atomizes_compound_tags
/var/jenkins_home/workspace/merlin_core/core/merlin/schema/tags.py:148: UserWarning: Compound tags like Tags.TEXT_TOKENIZED have been deprecated and will be removed in a future version. Please use the atomic versions of these tags, like [<Tags.TEXT: 'text'>, <Tags.TOKENIZED: 'tokenized'>].
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed[True-True]
/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 36059 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed[True-False]
/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 41935 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed[False-True]
/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 39425 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed[False-False]
/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 40581 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed_force[True]
/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 42777 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed_force[False]
/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 35983 instead
warnings.warn(

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html

---------- coverage: platform linux, python 3.8.10-final-0 -----------
Name Stmts Miss Cover

merlin/core/__init__.py 2 0 100%
merlin/core/_version.py 354 205 42%
merlin/core/compat.py 10 4 60%
merlin/core/dispatch.py 356 212 40%
merlin/core/protocols.py 99 45 55%
merlin/core/utils.py 195 56 71%
merlin/dag/__init__.py 5 0 100%
merlin/dag/base_operator.py 122 20 84%
merlin/dag/dictarray.py 55 15 73%
merlin/dag/executors.py 141 68 52%
merlin/dag/graph.py 99 35 65%
merlin/dag/node.py 344 161 53%
merlin/dag/ops/__init__.py 4 0 100%
merlin/dag/ops/concat_columns.py 17 4 76%
merlin/dag/ops/selection.py 22 0 100%
merlin/dag/ops/subset_columns.py 12 4 67%
merlin/dag/ops/subtraction.py 21 11 48%
merlin/dag/selector.py 101 6 94%
merlin/io/__init__.py 4 0 100%
merlin/io/avro.py 88 88 0%
merlin/io/csv.py 57 6 89%
merlin/io/dask.py 181 53 71%
merlin/io/dataframe_engine.py 61 5 92%
merlin/io/dataframe_iter.py 21 1 95%
merlin/io/dataset.py 347 54 84%
merlin/io/dataset_engine.py 37 8 78%
merlin/io/fsspec_utils.py 127 108 15%
merlin/io/hugectr.py 45 35 22%
merlin/io/parquet.py 603 69 89%
merlin/io/shuffle.py 38 12 68%
merlin/io/worker.py 80 66 18%
merlin/io/writer.py 190 52 73%
merlin/io/writer_factory.py 18 4 78%
merlin/schema/__init__.py 2 0 100%
merlin/schema/io/__init__.py 0 0 100%
merlin/schema/io/proto_utils.py 20 4 80%
merlin/schema/io/schema_bp.py 306 5 98%
merlin/schema/io/tensorflow_metadata.py 190 17 91%
merlin/schema/schema.py 209 31 85%
merlin/schema/tags.py 82 0 100%

TOTAL 4665 1464 69%

=========================== short test summary info ============================
SKIPPED [1] tests/unit/io/test_avro.py:34: could not import 'uavro': No module named 'uavro'
======= 2 failed, 388 passed, 1 skipped, 25 warnings in 66.26s (0:01:06) =======
ERROR: InvocationError for command /var/jenkins_home/workspace/merlin_core/core/.tox/test-gpu/bin/python -m pytest --cov-report term --cov merlin -rxs tests/unit (exited with code 1)
___________________________________ summary ____________________________________
ERROR: test-gpu: commands failed
Build step 'Execute shell' marked build as failure
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA-Merlin/core/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[merlin_core] $ /bin/bash /tmp/jenkins6122469300596003724.sh

@oliverholworthy oliverholworthy self-assigned this Nov 18, 2022
@nvidia-merlin-bot

Click to view CI Results
GitHub pull request #111 of commit 814472f0aa1b18d0f43bc01800899c1df3d952e3, no merge conflicts.
Running as SYSTEM
Setting status of 814472f0aa1b18d0f43bc01800899c1df3d952e3 to PENDING with url http://merlin-infra1.nvidia.com:8080/job/merlin_core/289/ and message: 'Pending'
Using context: Jenkins
Building on master in workspace /var/jenkins_home/workspace/merlin_core
using credential ce87ff3c-94f0-400a-8303-cb4acb4918b5
 > git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > git config remote.origin.url /~https://github.com/NVIDIA-Merlin/core # timeout=10
Fetching upstream changes from /~https://github.com/NVIDIA-Merlin/core
 > git --version # timeout=10
using GIT_ASKPASS to set credentials login for merlin-systems username and pass
 > git fetch --tags --force --progress -- /~https://github.com/NVIDIA-Merlin/core +refs/pull/111/*:refs/remotes/origin/pr/111/* # timeout=10
 > git rev-parse 814472f0aa1b18d0f43bc01800899c1df3d952e3^{commit} # timeout=10
Checking out Revision 814472f0aa1b18d0f43bc01800899c1df3d952e3 (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 814472f0aa1b18d0f43bc01800899c1df3d952e3 # timeout=10
Commit message: "Update value_count serizliation and default is_list/is_ragged."
 > git rev-list --no-walk 6b6531116bfe32ee2e932e8d199de5de924c6724 # timeout=10
[merlin_core] $ /bin/bash /tmp/jenkins14230189237701941417.sh
GLOB sdist-make: /var/jenkins_home/workspace/merlin_core/core/setup.py
test-gpu recreate: /var/jenkins_home/workspace/merlin_core/core/.tox/test-gpu
test-gpu installdeps: pytest, pytest-cov
WARNING: Discarding $PYTHONPATH from environment, to override specify PYTHONPATH in 'passenv' in your configuration.
test-gpu inst: /var/jenkins_home/workspace/merlin_core/core/.tox/.tmp/package/1/merlin-core-0.8.0+17.g814472f.zip
WARNING: Discarding $PYTHONPATH from environment, to override specify PYTHONPATH in 'passenv' in your configuration.
test-gpu installed: absl-py==1.2.0,aiohttp==3.8.1,aiosignal==1.2.0,alabaster==0.7.12,alembic==1.8.1,anyio==3.6.1,argon2-cffi==21.3.0,argon2-cffi-bindings==21.2.0,astroid==2.5.6,asttokens==2.0.8,astunparse==1.6.3,asv==0.5.1,asvdb==0.4.2,async-timeout==4.0.2,attrs==22.1.0,autopage==0.5.1,awscli==1.27.12,Babel==2.10.3,backcall==0.2.0,beautifulsoup4==4.11.1,betterproto==1.2.5,black==22.6.0,bleach==5.0.1,boto3==1.24.75,botocore==1.29.12,Brotli==1.0.9,cachetools==5.2.0,certifi==2019.11.28,cffi==1.15.1,chardet==3.0.4,charset-normalizer==2.1.1,clang==5.0,click==8.1.3,cliff==4.0.0,cloudpickle==2.2.0,cmaes==0.9.0,cmake==3.24.1.1,cmd2==2.4.2,colorama==0.4.4,colorlog==6.7.0,contourpy==1.0.5,coverage==6.5.0,cuda-python==11.7.1,cupy-cuda117==10.6.0,cycler==0.11.0,Cython==0.29.32,dask==2022.1.1,dbus-python==1.2.16,debugpy==1.6.3,decorator==5.1.1,defusedxml==0.7.1,dill==0.3.5.1,distlib==0.3.6,distributed==2022.5.1,distro==1.7.0,dm-tree==0.1.6,docker-pycreds==0.4.0,docutils==0.16,emoji==1.7.0,entrypoints==0.4,execnet==1.9.0,executing==1.0.0,faiss==1.7.2,faiss-gpu==1.7.2,fastai==2.7.9,fastapi==0.85.0,fastavro==1.6.1,fastcore==1.5.27,fastdownload==0.0.7,fastjsonschema==2.16.1,fastprogress==1.0.3,fastrlock==0.8,feast==0.19.4,fiddle==0.2.2,filelock==3.8.0,flatbuffers==1.12,fonttools==4.37.3,frozenlist==1.3.1,fsspec==2022.5.0,gast==0.4.0,gevent==21.12.0,geventhttpclient==2.0.2,gitdb==4.0.9,GitPython==3.1.27,google==3.0.0,google-api-core==2.10.1,google-auth==2.11.1,google-auth-oauthlib==0.4.6,google-pasta==0.2.0,googleapis-common-protos==1.52.0,graphviz==0.20.1,greenlet==1.1.3,grpcio==1.41.0,grpcio-channelz==1.49.0,grpcio-reflection==1.48.1,grpclib==0.4.3,h11==0.13.0,h2==4.1.0,h5py==3.7.0,HeapDict==1.0.1,horovod==0.26.1,hpack==4.0.0,httptools==0.5.0,hugectr2onnx==0.0.0,huggingface-hub==0.9.1,hyperframe==6.0.1,idna==2.8,imagesize==1.4.1,implicit==0.6.1,importlib-metadata==4.12.0,importlib-resources==5.9.0,iniconfig==1.1.1,ipykernel==6.15.3,ipython==8.5.0,ipython-genutils==0.2.0,ipywidgets==7.7.0,jedi==0.18.1,Jinja2==3.1.2,jmespath==1.0.1,joblib==1.2.0,json5==0.9.10,jsonschema==4.16.0,jupyter-cache==0.4.3,jupyter-core==4.11.1,jupyter-server==1.18.1,jupyter-server-mathjax==0.2.5,jupyter-sphinx==0.3.2,jupyter_client==7.3.5,jupyterlab==3.4.7,jupyterlab-pygments==0.2.2,jupyterlab-widgets==1.1.0,jupyterlab_server==2.15.1,keras==2.9.0,Keras-Preprocessing==1.1.2,kiwisolver==1.4.4,lazy-object-proxy==1.8.0,libclang==14.0.6,libcst==0.4.7,lightfm==1.16,lightgbm==3.3.2,linkify-it-py==1.0.3,llvmlite==0.39.1,locket==1.0.0,lxml==4.9.1,Mako==1.2.4,Markdown==3.4.1,markdown-it-py==1.1.0,MarkupSafe==2.1.1,matplotlib==3.6.0,matplotlib-inline==0.1.6,mdit-py-plugins==0.2.8,merlin-core==0.8.0+17.g814472f,merlin-models==0.7.0+11.g280956aa4,merlin-systems==0.5.0+4.g15074ad,mistune==2.0.4,mmh3==3.0.0,mpi4py==3.1.3,msgpack==1.0.4,multidict==6.0.2,mypy-extensions==0.4.3,myst-nb==0.13.2,myst-parser==0.15.2,natsort==8.1.0,nbclassic==0.4.3,nbclient==0.6.8,nbconvert==7.0.0,nbdime==3.1.1,nbformat==5.5.0,nest-asyncio==1.5.5,ninja==1.10.2.3,notebook==6.4.12,notebook-shim==0.1.0,numba==0.56.2,numpy==1.22.4,nvidia-pyindex==1.0.9,# Editable install with no version control (nvtabular==1.4.0+8.g95e12d347),-e 
/usr/local/lib/python3.8/dist-packages,nvtx==0.2.5,oauthlib==3.2.1,oldest-supported-numpy==2022.8.16,onnx==1.12.0,onnxruntime==1.11.1,opt-einsum==3.3.0,optuna==3.0.3,packaging==21.3,pandas==1.3.5,pandavro==1.5.2,pandocfilters==1.5.0,parso==0.8.3,partd==1.3.0,pathtools==0.1.2,pbr==5.11.0,pexpect==4.8.0,pickleshare==0.7.5,Pillow==9.2.0,pkgutil_resolve_name==1.3.10,platformdirs==2.5.2,plotly==5.11.0,pluggy==1.0.0,prettytable==3.5.0,prometheus-client==0.14.1,promise==2.3,prompt-toolkit==3.0.31,proto-plus==1.19.6,protobuf==3.19.5,psutil==5.9.2,ptyprocess==0.7.0,pure-eval==0.2.2,py==1.11.0,pyarrow==7.0.0,pyasn1==0.4.8,pyasn1-modules==0.2.8,pybind11==2.10.0,pycparser==2.21,pydantic==1.10.2,pydot==1.4.2,Pygments==2.13.0,PyGObject==3.36.0,pynvml==11.4.1,pyparsing==3.0.9,pyperclip==1.8.2,pyrsistent==0.18.1,pytest==7.1.3,pytest-cov==4.0.0,pytest-xdist==3.0.2,python-apt==2.0.0+ubuntu0.20.4.8,python-dateutil==2.8.2,python-dotenv==0.21.0,python-rapidjson==1.8,pytz==2022.2.1,PyYAML==5.4.1,pyzmq==24.0.0,regex==2022.9.13,requests==2.22.0,requests-oauthlib==1.3.1,requests-unixsocket==0.2.0,rsa==4.7.2,s3fs==2022.2.0,s3transfer==0.6.0,sacremoses==0.0.53,scikit-build==0.15.0,scikit-learn==1.1.2,scipy==1.8.1,seedir==0.3.0,Send2Trash==1.8.0,sentry-sdk==1.9.8,setproctitle==1.3.2,setuptools-scm==7.0.5,shortuuid==1.0.9,six==1.15.0,sklearn==0.0,smmap==5.0.0,sniffio==1.3.0,snowballstemmer==2.2.0,sortedcontainers==2.4.0,soupsieve==2.3.2.post1,Sphinx==5.3.0,sphinx-multiversion==0.2.4,sphinx-togglebutton==0.3.1,sphinx_external_toc==0.3.0,sphinxcontrib-applehelp==1.0.2,sphinxcontrib-copydirs @ git+/~https://github.com/mikemckiernan/sphinxcontrib-copydirs.git@bd8c5d79b3f91cf5f1bb0d6995aeca3fe84b670e,sphinxcontrib-devhelp==1.0.2,sphinxcontrib-htmlhelp==2.0.0,sphinxcontrib-jsmath==1.0.1,sphinxcontrib-qthelp==1.0.3,sphinxcontrib-serializinghtml==1.1.5,SQLAlchemy==1.4.44,stack-data==0.5.0,starlette==0.20.4,stevedore==4.1.1,stringcase==1.2.0,supervisor==4.1.0,tabulate==0.8.10,tblib==1.7.0,tdqm==0.0.1,tenacity==8.0.1,tensorboard==2.9.1,tensorboard-data-server==0.6.1,tensorboard-plugin-wit==1.8.1,tensorflow==2.9.2,tensorflow-estimator==2.9.0,tensorflow-gpu==2.9.2,tensorflow-io-gcs-filesystem==0.27.0,tensorflow-metadata==1.10.0,termcolor==2.0.1,terminado==0.15.0,testbook==0.4.2,threadpoolctl==3.1.0,tinycss2==1.1.1,tokenizers==0.10.3,toml==0.10.2,tomli==2.0.1,toolz==0.12.0,torch==1.12.1+cu113,torchmetrics==0.3.2,tornado==6.2,tox==3.26.0,tqdm==4.64.1,traitlets==5.4.0,transformers==4.12.0,transformers4rec==0.1.12+2.gbcc939255,treelite==2.3.0,treelite-runtime==2.3.0,tritonclient==2.25.0,typing-inspect==0.8.0,typing_extensions==4.3.0,uc-micro-py==1.0.1,urllib3==1.26.12,uvicorn==0.18.3,uvloop==0.17.0,versioneer==0.20,virtualenv==20.16.5,wandb==0.13.3,watchfiles==0.17.0,wcwidth==0.2.5,webencodings==0.5.1,websocket-client==1.4.1,websockets==10.3,Werkzeug==2.2.2,widgetsnbextension==3.6.0,wrapt==1.12.1,xgboost==1.6.2,yarl==1.8.1,zict==2.2.0,zipp==3.8.1,zope.event==4.5.0,zope.interface==5.4.0
test-gpu run-test-pre: PYTHONHASHSEED='3508079650'
test-gpu run-test: commands[0] | python -m pytest --cov-report term --cov merlin -rxs tests/unit
============================= test session starts ==============================
platform linux -- Python 3.8.10, pytest-7.1.3, pluggy-1.0.0
cachedir: .tox/test-gpu/.pytest_cache
rootdir: /var/jenkins_home/workspace/merlin_core/core, configfile: pyproject.toml
plugins: anyio-3.5.0, xdist-3.0.2, cov-4.0.0
collected 390 items / 1 skipped

tests/unit/core/test_dispatch.py .... [ 1%]
tests/unit/core/test_protocols.py ......... [ 3%]
tests/unit/core/test_version.py . [ 3%]
tests/unit/dag/test_base_operator.py .... [ 4%]
tests/unit/dag/test_column_selector.py .............................. [ 12%]
tests/unit/dag/test_dictarray.py ... [ 13%]
tests/unit/dag/test_executors.py .. [ 13%]
tests/unit/dag/test_graph.py .... [ 14%]
tests/unit/dag/ops/test_selection.py .... [ 15%]
tests/unit/io/test_io.py ............................................... [ 27%]
................................................................ [ 44%]
tests/unit/schema/test_column_schemas.py FFFF..................F [ 50%]
tests/unit/schema/test_schema.py ............. [ 53%]
tests/unit/schema/test_schema_io.py .................................... [ 62%]
........................................................................ [ 81%]
..........................................................F [ 96%]
tests/unit/schema/test_tags.py ....... [ 97%]
tests/unit/utils/test_utils.py ........ [100%]

=================================== FAILURES ===================================
______________________ test_dtype_column_schema[float32] _______________________

d_types = <class 'numpy.float32'>

@pytest.mark.parametrize("d_types", [numpy.float32, numpy.float64, numpy.uint32, numpy.uint64])
def test_dtype_column_schema(d_types):
  column = ColumnSchema("name", tags=[], properties=[], dtype=d_types)

tests/unit/schema/test_column_schemas.py:26:


:9: in __init__
???


self = ColumnSchema(name='name', tags=set(), properties=[], dtype=dtype('float32'), is_list=None, is_ragged=None)

def __post_init__(self):
    """Standardize tags and dtypes on initialization

    Raises:
        TypeError: If the provided dtype cannot be cast to a numpy dtype
    """
    tags = TagSet(self.tags)
    object.__setattr__(self, "tags", tags)

    try:
        if hasattr(self.dtype, "numpy_dtype"):
            dtype = np.dtype(self.dtype.numpy_dtype)
        elif hasattr(self.dtype, "_categories"):
            dtype = self.dtype._categories.dtype
        elif isinstance(self.dtype, pd.StringDtype):
            dtype = np.dtype("O")
        else:
            dtype = np.dtype(self.dtype)
    except TypeError as err:
        raise TypeError(
            f"Unsupported dtype {self.dtype}, unable to cast {self.dtype} to a numpy dtype."
        ) from err

    object.__setattr__(self, "dtype", dtype)
  value_count = self.properties.get("value_count")

E AttributeError: 'list' object has no attribute 'get'

merlin/schema/schema.py:78: AttributeError
______________________ test_dtype_column_schema[float64] _______________________

d_types = <class 'numpy.float64'>

@pytest.mark.parametrize("d_types", [numpy.float32, numpy.float64, numpy.uint32, numpy.uint64])
def test_dtype_column_schema(d_types):
  column = ColumnSchema("name", tags=[], properties=[], dtype=d_types)

tests/unit/schema/test_column_schemas.py:26:


:9: in __init__
???


self = ColumnSchema(name='name', tags=set(), properties=[], dtype=dtype('float64'), is_list=None, is_ragged=None)

def __post_init__(self):
    """Standardize tags and dtypes on initialization

    Raises:
        TypeError: If the provided dtype cannot be cast to a numpy dtype
    """
    tags = TagSet(self.tags)
    object.__setattr__(self, "tags", tags)

    try:
        if hasattr(self.dtype, "numpy_dtype"):
            dtype = np.dtype(self.dtype.numpy_dtype)
        elif hasattr(self.dtype, "_categories"):
            dtype = self.dtype._categories.dtype
        elif isinstance(self.dtype, pd.StringDtype):
            dtype = np.dtype("O")
        else:
            dtype = np.dtype(self.dtype)
    except TypeError as err:
        raise TypeError(
            f"Unsupported dtype {self.dtype}, unable to cast {self.dtype} to a numpy dtype."
        ) from err

    object.__setattr__(self, "dtype", dtype)
  value_count = self.properties.get("value_count")

E AttributeError: 'list' object has no attribute 'get'

merlin/schema/schema.py:78: AttributeError
_______________________ test_dtype_column_schema[uint32] _______________________

d_types = <class 'numpy.uint32'>

@pytest.mark.parametrize("d_types", [numpy.float32, numpy.float64, numpy.uint32, numpy.uint64])
def test_dtype_column_schema(d_types):
  column = ColumnSchema("name", tags=[], properties=[], dtype=d_types)

tests/unit/schema/test_column_schemas.py:26:


:9: in __init__
???


self = ColumnSchema(name='name', tags=set(), properties=[], dtype=dtype('uint32'), is_list=None, is_ragged=None)

def __post_init__(self):
    """Standardize tags and dtypes on initialization

    Raises:
        TypeError: If the provided dtype cannot be cast to a numpy dtype
    """
    tags = TagSet(self.tags)
    object.__setattr__(self, "tags", tags)

    try:
        if hasattr(self.dtype, "numpy_dtype"):
            dtype = np.dtype(self.dtype.numpy_dtype)
        elif hasattr(self.dtype, "_categories"):
            dtype = self.dtype._categories.dtype
        elif isinstance(self.dtype, pd.StringDtype):
            dtype = np.dtype("O")
        else:
            dtype = np.dtype(self.dtype)
    except TypeError as err:
        raise TypeError(
            f"Unsupported dtype {self.dtype}, unable to cast {self.dtype} to a numpy dtype."
        ) from err

    object.__setattr__(self, "dtype", dtype)
  value_count = self.properties.get("value_count")

E AttributeError: 'list' object has no attribute 'get'

merlin/schema/schema.py:78: AttributeError
_______________________ test_dtype_column_schema[uint64] _______________________

d_types = <class 'numpy.uint64'>

@pytest.mark.parametrize("d_types", [numpy.float32, numpy.float64, numpy.uint32, numpy.uint64])
def test_dtype_column_schema(d_types):
  column = ColumnSchema("name", tags=[], properties=[], dtype=d_types)

tests/unit/schema/test_column_schemas.py:26:


:9: in __init__
???


self = ColumnSchema(name='name', tags=set(), properties=[], dtype=dtype('uint64'), is_list=None, is_ragged=None)

def __post_init__(self):
    """Standardize tags and dtypes on initialization

    Raises:
        TypeError: If the provided dtype cannot be cast to a numpy dtype
    """
    tags = TagSet(self.tags)
    object.__setattr__(self, "tags", tags)

    try:
        if hasattr(self.dtype, "numpy_dtype"):
            dtype = np.dtype(self.dtype.numpy_dtype)
        elif hasattr(self.dtype, "_categories"):
            dtype = self.dtype._categories.dtype
        elif isinstance(self.dtype, pd.StringDtype):
            dtype = np.dtype("O")
        else:
            dtype = np.dtype(self.dtype)
    except TypeError as err:
        raise TypeError(
            f"Unsupported dtype {self.dtype}, unable to cast {self.dtype} to a numpy dtype."
        ) from err

    object.__setattr__(self, "dtype", dtype)
  value_count = self.properties.get("value_count")

E AttributeError: 'list' object has no attribute 'get'

merlin/schema/schema.py:78: AttributeError
_________________________ test_list_column_attributes __________________________

def test_list_column_attributes():
    col0_schema = ColumnSchema("col0")

    assert not col0_schema.is_list
    assert not col0_schema.is_ragged
    assert col0_schema.quantity == ColumnQuantity.SCALAR

    col1_schema = ColumnSchema("col1", is_list=False, is_ragged=False)

    assert not col1_schema.is_list
    assert not col1_schema.is_ragged
    assert col1_schema.quantity == ColumnQuantity.SCALAR

    col2_schema = ColumnSchema("col2", is_list=True)

    assert col2_schema.is_list
  assert col2_schema.is_ragged

E AssertionError: assert False
E + where False = ColumnSchema(name='col2', tags=set(), properties={}, dtype=dtype('float64'), is_list=True, is_ragged=False).is_ragged

tests/unit/schema/test_column_schemas.py:183: AssertionError
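
The four dtype failures and test_list_column_attributes point at two separate problems in the new __post_init__: it assumes properties is a dict (the dtype tests pass properties=[]), and it infers is_ragged from value_count alone, so a list column with no value_count comes out is_ragged=False where the test expects True. A sketch, assuming the inference table from the PR description plus the extra rule the failing test implies:

def infer_flags(value_count=None, is_list=None, is_ragged=None):
    # Sketch only: names and defaulting are illustrative, not Merlin's API.
    vc = value_count or {"min": 0, "max": 0}
    if is_list is None:
        is_list = vc["max"] > 0                     # any values -> list column
    if is_ragged is None:
        # max > min -> ragged; a list with no known value_count is
        # conservatively treated as ragged, as the failing test expects
        is_ragged = vc["max"] > vc["min"] or (is_list and vc["max"] == 0)
    return is_list, is_ragged

assert infer_flags({"min": 0, "max": 0}) == (False, False)  # scalar
assert infer_flags({"min": 1, "max": 1}) == (True, False)   # fixed-size list
assert infer_flags({"min": 1, "max": 2}) == (True, True)    # ragged list
assert infer_flags(is_list=True) == (True, True)            # test_list_column_attributes
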
______________________ test_tensorflow_metadata_from_json ______________________

def test_tensorflow_metadata_from_json():
    # make sure we can load up tensorflowmetadata serialized json objects, like done by
    # merlin-models
    json_schema = """{"feature": [
    {
      "name": "categories",
      "valueCount": {
        "min": "1",
        "max": "4"
      },
      "type": "INT",
      "intDomain": {
        "name": "categories",
        "min": "1",
        "max": "331",
        "isCategorical": true
      },
      "annotation": {
        "tag": [
          "item"
        ]
      }
    }]}
    """

    schema = TensorflowMetadata.from_json(json_schema).to_merlin_schema()
    column_schema = schema.column_schemas["categories"]

    # make sure the value_count is set appropriately
    assert column_schema.properties["value_count"] == {"min": 1, "max": 4}
  assert column_schema.is_list

E AssertionError: assert False
E + where False = ColumnSchema(name='categories', tags={<Tags.ITEM: 'item'>, <Tags.CATEGORICAL: 'categorical'>}, properties={'domain': {...331, 'name': 'categories'}, 'value_count': {'min': 1, 'max': 4}}, dtype=dtype('int64'), is_list=False, is_ragged=False).is_list

tests/unit/schema/test_schema_io.py:220: AssertionError
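
The from_json failure looks like an ordering problem rather than a wrong rule: the JSON clearly carries valueCount {'min': 1, 'max': 4}, yet the restored column reports is_list=False, which is exactly what the inference yields when it runs before the deserialized value_count has been attached to properties. A sketch under that (unconfirmed) assumption:

value_count = {"min": 1, "max": 4}

# If inference runs before properties are attached, it sees the default:
vc_seen = {}.get("value_count", {"min": 0, "max": 0})
assert (vc_seen["max"] > 0) is False   # -> is_list=False, as in the failure

# If inference runs against the merged properties, it sees the real count:
vc_seen = {"value_count": value_count}["value_count"]
assert (vc_seen["max"] > 0) is True    # -> is_list=True, as the test expects
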
=============================== warnings summary ===============================
tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 6 files. Did not have enough partitions to create 7 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 8 files. Did not have enough partitions to create 9 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 8 files. Did not have enough partitions to create 10 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 8 files. Did not have enough partitions to create 11 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 12 files. Did not have enough partitions to create 13 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 12 files. Did not have enough partitions to create 14 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 12 files. Did not have enough partitions to create 15 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 12 files. Did not have enough partitions to create 16 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 12 files. Did not have enough partitions to create 17 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 12 files. Did not have enough partitions to create 18 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 12 files. Did not have enough partitions to create 19 files.
warnings.warn(

tests/unit/io/test_io.py::test_validate_and_regenerate_dataset
/var/jenkins_home/workspace/merlin_core/core/merlin/io/parquet.py:551: DeprecationWarning: 'ParquetDataset.pieces' attribute is deprecated as of pyarrow 5.0.0 and will be removed in a future version. Specify 'use_legacy_dataset=False' while constructing the ParquetDataset, and then use the '.fragments' attribute instead.
paths = [p.path for p in pa_dataset.pieces]

tests/unit/schema/test_column_schemas.py::test_column_schema_tags_normalize
tests/unit/schema/test_tags.py::test_tagset_atomizes_compound_tags
/var/jenkins_home/workspace/merlin_core/core/merlin/schema/tags.py:148: UserWarning: Compound tags like Tags.ITEM_ID have been deprecated and will be removed in a future version. Please use the atomic versions of these tags, like [<Tags.ITEM: 'item'>, <Tags.ID: 'id'>].
warnings.warn(

tests/unit/schema/test_schema_io.py::test_json_serialization_with_embedded_dicts
tests/unit/schema/test_schema_io.py::test_merlin_to_proto_to_json_to_merlin
tests/unit/schema/test_tags.py::test_tagset_atomizes_compound_tags
/var/jenkins_home/workspace/merlin_core/core/merlin/schema/tags.py:148: UserWarning: Compound tags like Tags.USER_ID have been deprecated and will be removed in a future version. Please use the atomic versions of these tags, like [<Tags.USER: 'user'>, <Tags.ID: 'id'>].
warnings.warn(

tests/unit/schema/test_tags.py::test_tagset_atomizes_compound_tags
/var/jenkins_home/workspace/merlin_core/core/merlin/schema/tags.py:148: UserWarning: Compound tags like Tags.SESSION_ID have been deprecated and will be removed in a future version. Please use the atomic versions of these tags, like [<Tags.SESSION: 'session'>, <Tags.ID: 'id'>].
warnings.warn(

tests/unit/schema/test_tags.py::test_tagset_atomizes_compound_tags
/var/jenkins_home/workspace/merlin_core/core/merlin/schema/tags.py:148: UserWarning: Compound tags like Tags.TEXT_TOKENIZED have been deprecated and will be removed in a future version. Please use the atomic versions of these tags, like [<Tags.TEXT: 'text'>, <Tags.TOKENIZED: 'tokenized'>].
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed[True-True]
/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 36259 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed[True-False]
/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 34657 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed[False-True]
/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 45199 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed[False-False]
/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 39145 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed_force[True]
/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 38521 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed_force[False]
/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 34791 instead
warnings.warn(

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html

---------- coverage: platform linux, python 3.8.10-final-0 -----------
Name Stmts Miss Cover

merlin/core/__init__.py 2 0 100%
merlin/core/_version.py 354 205 42%
merlin/core/compat.py 10 4 60%
merlin/core/dispatch.py 356 212 40%
merlin/core/protocols.py 99 45 55%
merlin/core/utils.py 195 56 71%
merlin/dag/__init__.py 5 0 100%
merlin/dag/base_operator.py 122 20 84%
merlin/dag/dictarray.py 55 15 73%
merlin/dag/executors.py 141 68 52%
merlin/dag/graph.py 99 35 65%
merlin/dag/node.py 344 161 53%
merlin/dag/ops/__init__.py 4 0 100%
merlin/dag/ops/concat_columns.py 17 4 76%
merlin/dag/ops/selection.py 22 0 100%
merlin/dag/ops/subset_columns.py 12 4 67%
merlin/dag/ops/subtraction.py 21 11 48%
merlin/dag/selector.py 101 6 94%
merlin/io/__init__.py 4 0 100%
merlin/io/avro.py 88 88 0%
merlin/io/csv.py 57 6 89%
merlin/io/dask.py 181 53 71%
merlin/io/dataframe_engine.py 61 5 92%
merlin/io/dataframe_iter.py 21 1 95%
merlin/io/dataset.py 347 54 84%
merlin/io/dataset_engine.py 37 8 78%
merlin/io/fsspec_utils.py 127 108 15%
merlin/io/hugectr.py 45 35 22%
merlin/io/parquet.py 603 69 89%
merlin/io/shuffle.py 38 12 68%
merlin/io/worker.py 80 66 18%
merlin/io/writer.py 190 52 73%
merlin/io/writer_factory.py 18 4 78%
merlin/schema/__init__.py 2 0 100%
merlin/schema/io/__init__.py 0 0 100%
merlin/schema/io/proto_utils.py 20 4 80%
merlin/schema/io/schema_bp.py 306 5 98%
merlin/schema/io/tensorflow_metadata.py 183 17 91%
merlin/schema/schema.py 216 36 83%
merlin/schema/tags.py 82 0 100%

TOTAL 4665 1469 69%

=========================== short test summary info ============================
SKIPPED [1] tests/unit/io/test_avro.py:34: could not import 'uavro': No module named 'uavro'
======= 6 failed, 384 passed, 1 skipped, 25 warnings in 64.37s (0:01:04) =======
ERROR: InvocationError for command /var/jenkins_home/workspace/merlin_core/core/.tox/test-gpu/bin/python -m pytest --cov-report term --cov merlin -rxs tests/unit (exited with code 1)
___________________________________ summary ____________________________________
ERROR: test-gpu: commands failed
Build step 'Execute shell' marked build as failure
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA-Merlin/core/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[merlin_core] $ /bin/bash /tmp/jenkins17926776758125310944.sh

@nvidia-merlin-bot

Click to view CI Results
GitHub pull request #111 of commit bd68a1428d8f3a45dbeafa451ffcba1dc8f3d083, no merge conflicts.
Running as SYSTEM
Setting status of bd68a1428d8f3a45dbeafa451ffcba1dc8f3d083 to PENDING with url http://merlin-infra1.nvidia.com:8080/job/merlin_core/290/ and message: 'Pending'
Using context: Jenkins
Building on master in workspace /var/jenkins_home/workspace/merlin_core
using credential ce87ff3c-94f0-400a-8303-cb4acb4918b5
 > git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > git config remote.origin.url /~https://github.com/NVIDIA-Merlin/core # timeout=10
Fetching upstream changes from /~https://github.com/NVIDIA-Merlin/core
 > git --version # timeout=10
using GIT_ASKPASS to set credentials login for merlin-systems username and pass
 > git fetch --tags --force --progress -- /~https://github.com/NVIDIA-Merlin/core +refs/pull/111/*:refs/remotes/origin/pr/111/* # timeout=10
 > git rev-parse bd68a1428d8f3a45dbeafa451ffcba1dc8f3d083^{commit} # timeout=10
Checking out Revision bd68a1428d8f3a45dbeafa451ffcba1dc8f3d083 (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f bd68a1428d8f3a45dbeafa451ffcba1dc8f3d083 # timeout=10
Commit message: "Add check for value count and is_ragged compatibility"
 > git rev-list --no-walk 814472f0aa1b18d0f43bc01800899c1df3d952e3 # timeout=10
[merlin_core] $ /bin/bash /tmp/jenkins2161594793601554336.sh
GLOB sdist-make: /var/jenkins_home/workspace/merlin_core/core/setup.py
test-gpu recreate: /var/jenkins_home/workspace/merlin_core/core/.tox/test-gpu
test-gpu installdeps: pytest, pytest-cov
WARNING: Discarding $PYTHONPATH from environment, to override specify PYTHONPATH in 'passenv' in your configuration.
test-gpu inst: /var/jenkins_home/workspace/merlin_core/core/.tox/.tmp/package/1/merlin-core-0.8.0+18.gbd68a14.zip
WARNING: Discarding $PYTHONPATH from environment, to override specify PYTHONPATH in 'passenv' in your configuration.
test-gpu installed: absl-py==1.2.0,aiohttp==3.8.1,aiosignal==1.2.0,alabaster==0.7.12,alembic==1.8.1,anyio==3.6.1,argon2-cffi==21.3.0,argon2-cffi-bindings==21.2.0,astroid==2.5.6,asttokens==2.0.8,astunparse==1.6.3,asv==0.5.1,asvdb==0.4.2,async-timeout==4.0.2,attrs==22.1.0,autopage==0.5.1,awscli==1.27.12,Babel==2.10.3,backcall==0.2.0,beautifulsoup4==4.11.1,betterproto==1.2.5,black==22.6.0,bleach==5.0.1,boto3==1.24.75,botocore==1.29.12,Brotli==1.0.9,cachetools==5.2.0,certifi==2019.11.28,cffi==1.15.1,chardet==3.0.4,charset-normalizer==2.1.1,clang==5.0,click==8.1.3,cliff==4.0.0,cloudpickle==2.2.0,cmaes==0.9.0,cmake==3.24.1.1,cmd2==2.4.2,colorama==0.4.4,colorlog==6.7.0,contourpy==1.0.5,coverage==6.5.0,cuda-python==11.7.1,cupy-cuda117==10.6.0,cycler==0.11.0,Cython==0.29.32,dask==2022.1.1,dbus-python==1.2.16,debugpy==1.6.3,decorator==5.1.1,defusedxml==0.7.1,dill==0.3.5.1,distlib==0.3.6,distributed==2022.5.1,distro==1.7.0,dm-tree==0.1.6,docker-pycreds==0.4.0,docutils==0.16,emoji==1.7.0,entrypoints==0.4,execnet==1.9.0,executing==1.0.0,faiss==1.7.2,faiss-gpu==1.7.2,fastai==2.7.9,fastapi==0.85.0,fastavro==1.6.1,fastcore==1.5.27,fastdownload==0.0.7,fastjsonschema==2.16.1,fastprogress==1.0.3,fastrlock==0.8,feast==0.19.4,fiddle==0.2.2,filelock==3.8.0,flatbuffers==1.12,fonttools==4.37.3,frozenlist==1.3.1,fsspec==2022.5.0,gast==0.4.0,gevent==21.12.0,geventhttpclient==2.0.2,gitdb==4.0.9,GitPython==3.1.27,google==3.0.0,google-api-core==2.10.1,google-auth==2.11.1,google-auth-oauthlib==0.4.6,google-pasta==0.2.0,googleapis-common-protos==1.52.0,graphviz==0.20.1,greenlet==1.1.3,grpcio==1.41.0,grpcio-channelz==1.49.0,grpcio-reflection==1.48.1,grpclib==0.4.3,h11==0.13.0,h2==4.1.0,h5py==3.7.0,HeapDict==1.0.1,horovod==0.26.1,hpack==4.0.0,httptools==0.5.0,hugectr2onnx==0.0.0,huggingface-hub==0.9.1,hyperframe==6.0.1,idna==2.8,imagesize==1.4.1,implicit==0.6.1,importlib-metadata==4.12.0,importlib-resources==5.9.0,iniconfig==1.1.1,ipykernel==6.15.3,ipython==8.5.0,ipython-genutils==0.2.0,ipywidgets==7.7.0,jedi==0.18.1,Jinja2==3.1.2,jmespath==1.0.1,joblib==1.2.0,json5==0.9.10,jsonschema==4.16.0,jupyter-cache==0.4.3,jupyter-core==4.11.1,jupyter-server==1.18.1,jupyter-server-mathjax==0.2.5,jupyter-sphinx==0.3.2,jupyter_client==7.3.5,jupyterlab==3.4.7,jupyterlab-pygments==0.2.2,jupyterlab-widgets==1.1.0,jupyterlab_server==2.15.1,keras==2.9.0,Keras-Preprocessing==1.1.2,kiwisolver==1.4.4,lazy-object-proxy==1.8.0,libclang==14.0.6,libcst==0.4.7,lightfm==1.16,lightgbm==3.3.2,linkify-it-py==1.0.3,llvmlite==0.39.1,locket==1.0.0,lxml==4.9.1,Mako==1.2.4,Markdown==3.4.1,markdown-it-py==1.1.0,MarkupSafe==2.1.1,matplotlib==3.6.0,matplotlib-inline==0.1.6,mdit-py-plugins==0.2.8,merlin-core==0.8.0+18.gbd68a14,merlin-models==0.7.0+11.g280956aa4,merlin-systems==0.5.0+4.g15074ad,mistune==2.0.4,mmh3==3.0.0,mpi4py==3.1.3,msgpack==1.0.4,multidict==6.0.2,mypy-extensions==0.4.3,myst-nb==0.13.2,myst-parser==0.15.2,natsort==8.1.0,nbclassic==0.4.3,nbclient==0.6.8,nbconvert==7.0.0,nbdime==3.1.1,nbformat==5.5.0,nest-asyncio==1.5.5,ninja==1.10.2.3,notebook==6.4.12,notebook-shim==0.1.0,numba==0.56.2,numpy==1.22.4,nvidia-pyindex==1.0.9,# Editable install with no version control (nvtabular==1.4.0+8.g95e12d347),-e 
/usr/local/lib/python3.8/dist-packages,nvtx==0.2.5,oauthlib==3.2.1,oldest-supported-numpy==2022.8.16,onnx==1.12.0,onnxruntime==1.11.1,opt-einsum==3.3.0,optuna==3.0.3,packaging==21.3,pandas==1.3.5,pandavro==1.5.2,pandocfilters==1.5.0,parso==0.8.3,partd==1.3.0,pathtools==0.1.2,pbr==5.11.0,pexpect==4.8.0,pickleshare==0.7.5,Pillow==9.2.0,pkgutil_resolve_name==1.3.10,platformdirs==2.5.2,plotly==5.11.0,pluggy==1.0.0,prettytable==3.5.0,prometheus-client==0.14.1,promise==2.3,prompt-toolkit==3.0.31,proto-plus==1.19.6,protobuf==3.19.5,psutil==5.9.2,ptyprocess==0.7.0,pure-eval==0.2.2,py==1.11.0,pyarrow==7.0.0,pyasn1==0.4.8,pyasn1-modules==0.2.8,pybind11==2.10.0,pycparser==2.21,pydantic==1.10.2,pydot==1.4.2,Pygments==2.13.0,PyGObject==3.36.0,pynvml==11.4.1,pyparsing==3.0.9,pyperclip==1.8.2,pyrsistent==0.18.1,pytest==7.1.3,pytest-cov==4.0.0,pytest-xdist==3.0.2,python-apt==2.0.0+ubuntu0.20.4.8,python-dateutil==2.8.2,python-dotenv==0.21.0,python-rapidjson==1.8,pytz==2022.2.1,PyYAML==5.4.1,pyzmq==24.0.0,regex==2022.9.13,requests==2.22.0,requests-oauthlib==1.3.1,requests-unixsocket==0.2.0,rsa==4.7.2,s3fs==2022.2.0,s3transfer==0.6.0,sacremoses==0.0.53,scikit-build==0.15.0,scikit-learn==1.1.2,scipy==1.8.1,seedir==0.3.0,Send2Trash==1.8.0,sentry-sdk==1.9.8,setproctitle==1.3.2,setuptools-scm==7.0.5,shortuuid==1.0.9,six==1.15.0,sklearn==0.0,smmap==5.0.0,sniffio==1.3.0,snowballstemmer==2.2.0,sortedcontainers==2.4.0,soupsieve==2.3.2.post1,Sphinx==5.3.0,sphinx-multiversion==0.2.4,sphinx-togglebutton==0.3.1,sphinx_external_toc==0.3.0,sphinxcontrib-applehelp==1.0.2,sphinxcontrib-copydirs @ git+/~https://github.com/mikemckiernan/sphinxcontrib-copydirs.git@bd8c5d79b3f91cf5f1bb0d6995aeca3fe84b670e,sphinxcontrib-devhelp==1.0.2,sphinxcontrib-htmlhelp==2.0.0,sphinxcontrib-jsmath==1.0.1,sphinxcontrib-qthelp==1.0.3,sphinxcontrib-serializinghtml==1.1.5,SQLAlchemy==1.4.44,stack-data==0.5.0,starlette==0.20.4,stevedore==4.1.1,stringcase==1.2.0,supervisor==4.1.0,tabulate==0.8.10,tblib==1.7.0,tdqm==0.0.1,tenacity==8.0.1,tensorboard==2.9.1,tensorboard-data-server==0.6.1,tensorboard-plugin-wit==1.8.1,tensorflow==2.9.2,tensorflow-estimator==2.9.0,tensorflow-gpu==2.9.2,tensorflow-io-gcs-filesystem==0.27.0,tensorflow-metadata==1.10.0,termcolor==2.0.1,terminado==0.15.0,testbook==0.4.2,threadpoolctl==3.1.0,tinycss2==1.1.1,tokenizers==0.10.3,toml==0.10.2,tomli==2.0.1,toolz==0.12.0,torch==1.12.1+cu113,torchmetrics==0.3.2,tornado==6.2,tox==3.26.0,tqdm==4.64.1,traitlets==5.4.0,transformers==4.12.0,transformers4rec==0.1.12+2.gbcc939255,treelite==2.3.0,treelite-runtime==2.3.0,tritonclient==2.25.0,typing-inspect==0.8.0,typing_extensions==4.3.0,uc-micro-py==1.0.1,urllib3==1.26.12,uvicorn==0.18.3,uvloop==0.17.0,versioneer==0.20,virtualenv==20.16.5,wandb==0.13.3,watchfiles==0.17.0,wcwidth==0.2.5,webencodings==0.5.1,websocket-client==1.4.1,websockets==10.3,Werkzeug==2.2.2,widgetsnbextension==3.6.0,wrapt==1.12.1,xgboost==1.6.2,yarl==1.8.1,zict==2.2.0,zipp==3.8.1,zope.event==4.5.0,zope.interface==5.4.0
test-gpu run-test-pre: PYTHONHASHSEED='180264363'
test-gpu run-test: commands[0] | python -m pytest --cov-report term --cov merlin -rxs tests/unit
============================= test session starts ==============================
platform linux -- Python 3.8.10, pytest-7.1.3, pluggy-1.0.0
cachedir: .tox/test-gpu/.pytest_cache
rootdir: /var/jenkins_home/workspace/merlin_core/core, configfile: pyproject.toml
plugins: anyio-3.5.0, xdist-3.0.2, cov-4.0.0
collected 391 items / 1 skipped

tests/unit/core/test_dispatch.py .... [ 1%]
tests/unit/core/test_protocols.py ......... [ 3%]
tests/unit/core/test_version.py . [ 3%]
tests/unit/dag/test_base_operator.py .... [ 4%]
tests/unit/dag/test_column_selector.py .............................. [ 12%]
tests/unit/dag/test_dictarray.py ... [ 13%]
tests/unit/dag/test_executors.py .. [ 13%]
tests/unit/dag/test_graph.py .... [ 14%]
tests/unit/dag/ops/test_selection.py .... [ 15%]
tests/unit/io/test_io.py ...FFFFFF...................................... [ 27%]
................................................................ [ 43%]
tests/unit/schema/test_column_schemas.py FFFF..................F. [ 50%]
tests/unit/schema/test_schema.py ............. [ 53%]
tests/unit/schema/test_schema_io.py .................................... [ 62%]
........................................................................ [ 81%]
..........................................................F [ 96%]
tests/unit/schema/test_tags.py ....... [ 97%]
tests/unit/utils/test_utils.py ........ [100%]

=================================== FAILURES ===================================
_______________________ test_string_datatypes[None-csv] ________________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-2/test_string_datatypes_None_csv0')
engine = 'csv', cpu = None

@pytest.mark.parametrize("engine", ["csv", "parquet", "csv-no-header"])
@pytest.mark.parametrize("cpu", [None, True])
def test_string_datatypes(tmpdir, engine, cpu):
    df_lib = dispatch.get_lib()
    df = df_lib.DataFrame({"column": [[0.1, 0.2]]})
  dataset = merlin.io.Dataset(df)

tests/unit/io/test_io.py:89:


merlin/io/dataset.py:354: in __init__
self.infer_schema()
merlin/io/dataset.py:1140: in infer_schema
col_schema = ColumnSchema(column, dtype=dtype_val, is_list=is_list, is_ragged=is_list)
:9: in __init__
???


self = ColumnSchema(name='column', tags=set(), properties={}, dtype=dtype('float64'), is_list=True, is_ragged=True)

def __post_init__(self):
    """Standardize tags and dtypes on initialization

    Raises:
        TypeError: If the provided dtype cannot be cast to a numpy dtype
    """
    tags = TagSet(self.tags)
    object.__setattr__(self, "tags", tags)

    try:
        if hasattr(self.dtype, "numpy_dtype"):
            dtype = np.dtype(self.dtype.numpy_dtype)
        elif hasattr(self.dtype, "_categories"):
            dtype = self.dtype._categories.dtype
        elif isinstance(self.dtype, pd.StringDtype):
            dtype = np.dtype("O")
        else:
            dtype = np.dtype(self.dtype)
    except TypeError as err:
        raise TypeError(
            f"Unsupported dtype {self.dtype}, unable to cast {self.dtype} to a numpy dtype."
        ) from err

    object.__setattr__(self, "dtype", dtype)

    value_count = self.properties.get("value_count", {"min": 0, "max": 0})

    if self.is_list is None:
        if value_count["max"] > 0:
            object.__setattr__(self, "is_list", True)
        else:
            object.__setattr__(self, "is_list", False)

    if self.is_ragged is None:
        if value_count["max"] > value_count["min"]:
            object.__setattr__(self, "is_ragged", True)
        else:
            object.__setattr__(self, "is_ragged", False)

    if self.is_ragged and not self.is_list:
        raise ValueError(
            "`is_ragged` is set to `True` but `is_list` is not. "
            "Only list columns can set the `is_ragged` flag."
        )

    if self.is_ragged and value_count["max"] == value_count["min"]:
      raise ValueError(
            "`is_ragged` is set to `True` but `value_count.min` == `value_count.max`. "
            "If value_count min/max are equal. This is a fixed size list and `is_ragged` should be set to False"
        )

E ValueError: is_ragged is set to True but value_count.min == value_count.max. If value_count min/max are equal. This is a fixed size list and is_ragged should be set to False

merlin/schema/schema.py:99: ValueError
_____________________ test_string_datatypes[None-parquet] ______________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-2/test_string_datatypes_None_par0')
engine = 'parquet', cpu = None

@pytest.mark.parametrize("engine", ["csv", "parquet", "csv-no-header"])
@pytest.mark.parametrize("cpu", [None, True])
def test_string_datatypes(tmpdir, engine, cpu):
    df_lib = dispatch.get_lib()
    df = df_lib.DataFrame({"column": [[0.1, 0.2]]})
  dataset = merlin.io.Dataset(df)

tests/unit/io/test_io.py:89:


merlin/io/dataset.py:354: in __init__
    self.infer_schema()
merlin/io/dataset.py:1140: in infer_schema
    col_schema = ColumnSchema(column, dtype=dtype_val, is_list=is_list, is_ragged=is_list)
<string>:9: in __init__
    ???


self = ColumnSchema(name='column', tags=set(), properties={}, dtype=dtype('float64'), is_list=True, is_ragged=True)

def __post_init__(self):
    """Standardize tags and dtypes on initialization

    Raises:
        TypeError: If the provided dtype cannot be cast to a numpy dtype
    """
    tags = TagSet(self.tags)
    object.__setattr__(self, "tags", tags)

    try:
        if hasattr(self.dtype, "numpy_dtype"):
            dtype = np.dtype(self.dtype.numpy_dtype)
        elif hasattr(self.dtype, "_categories"):
            dtype = self.dtype._categories.dtype
        elif isinstance(self.dtype, pd.StringDtype):
            dtype = np.dtype("O")
        else:
            dtype = np.dtype(self.dtype)
    except TypeError as err:
        raise TypeError(
            f"Unsupported dtype {self.dtype}, unable to cast {self.dtype} to a numpy dtype."
        ) from err

    object.__setattr__(self, "dtype", dtype)

    value_count = self.properties.get("value_count", {"min": 0, "max": 0})

    if self.is_list is None:
        if value_count["max"] > 0:
            object.__setattr__(self, "is_list", True)
        else:
            object.__setattr__(self, "is_list", False)

    if self.is_ragged is None:
        if value_count["max"] > value_count["min"]:
            object.__setattr__(self, "is_ragged", True)
        else:
            object.__setattr__(self, "is_ragged", False)

    if self.is_ragged and not self.is_list:
        raise ValueError(
            "`is_ragged` is set to `True` but `is_list` is not. "
            "Only list columns can set the `is_ragged` flag."
        )

    if self.is_ragged and value_count["max"] == value_count["min"]:
      raise ValueError(
            "`is_ragged` is set to `True` but `value_count.min` == `value_count.max`. "
            "If value_count min/max are equal. This is a fixed size list and `is_ragged` should be set to False"
        )

E ValueError: is_ragged is set to True but value_count.min == value_count.max. If value_count min/max are equal. This is a fixed size list and is_ragged should be set to False

merlin/schema/schema.py:99: ValueError
__________________ test_string_datatypes[None-csv-no-header] ___________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-2/test_string_datatypes_None_csv1')
engine = 'csv-no-header', cpu = None

@pytest.mark.parametrize("engine", ["csv", "parquet", "csv-no-header"])
@pytest.mark.parametrize("cpu", [None, True])
def test_string_datatypes(tmpdir, engine, cpu):
    df_lib = dispatch.get_lib()
    df = df_lib.DataFrame({"column": [[0.1, 0.2]]})
  dataset = merlin.io.Dataset(df)

tests/unit/io/test_io.py:89:


merlin/io/dataset.py:354: in __init__
    self.infer_schema()
merlin/io/dataset.py:1140: in infer_schema
    col_schema = ColumnSchema(column, dtype=dtype_val, is_list=is_list, is_ragged=is_list)
<string>:9: in __init__
    ???


self = ColumnSchema(name='column', tags=set(), properties={}, dtype=dtype('float64'), is_list=True, is_ragged=True)

def __post_init__(self):
    """Standardize tags and dtypes on initialization

    Raises:
        TypeError: If the provided dtype cannot be cast to a numpy dtype
    """
    tags = TagSet(self.tags)
    object.__setattr__(self, "tags", tags)

    try:
        if hasattr(self.dtype, "numpy_dtype"):
            dtype = np.dtype(self.dtype.numpy_dtype)
        elif hasattr(self.dtype, "_categories"):
            dtype = self.dtype._categories.dtype
        elif isinstance(self.dtype, pd.StringDtype):
            dtype = np.dtype("O")
        else:
            dtype = np.dtype(self.dtype)
    except TypeError as err:
        raise TypeError(
            f"Unsupported dtype {self.dtype}, unable to cast {self.dtype} to a numpy dtype."
        ) from err

    object.__setattr__(self, "dtype", dtype)

    value_count = self.properties.get("value_count", {"min": 0, "max": 0})

    if self.is_list is None:
        if value_count["max"] > 0:
            object.__setattr__(self, "is_list", True)
        else:
            object.__setattr__(self, "is_list", False)

    if self.is_ragged is None:
        if value_count["max"] > value_count["min"]:
            object.__setattr__(self, "is_ragged", True)
        else:
            object.__setattr__(self, "is_ragged", False)

    if self.is_ragged and not self.is_list:
        raise ValueError(
            "`is_ragged` is set to `True` but `is_list` is not. "
            "Only list columns can set the `is_ragged` flag."
        )

    if self.is_ragged and value_count["max"] == value_count["min"]:
      raise ValueError(
            "`is_ragged` is set to `True` but `value_count.min` == `value_count.max`. "
            "If value_count min/max are equal. This is a fixed size list and `is_ragged` should be set to False"
        )

E ValueError: is_ragged is set to True but value_count.min == value_count.max. If value_count min/max are equal. This is a fixed size list and is_ragged should be set to False

merlin/schema/schema.py:99: ValueError
_______________________ test_string_datatypes[True-csv] ________________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-2/test_string_datatypes_True_csv0')
engine = 'csv', cpu = True

@pytest.mark.parametrize("engine", ["csv", "parquet", "csv-no-header"])
@pytest.mark.parametrize("cpu", [None, True])
def test_string_datatypes(tmpdir, engine, cpu):
    df_lib = dispatch.get_lib()
    df = df_lib.DataFrame({"column": [[0.1, 0.2]]})
  dataset = merlin.io.Dataset(df)

tests/unit/io/test_io.py:89:


merlin/io/dataset.py:354: in __init__
    self.infer_schema()
merlin/io/dataset.py:1140: in infer_schema
    col_schema = ColumnSchema(column, dtype=dtype_val, is_list=is_list, is_ragged=is_list)
<string>:9: in __init__
    ???


self = ColumnSchema(name='column', tags=set(), properties={}, dtype=dtype('float64'), is_list=True, is_ragged=True)

def __post_init__(self):
    """Standardize tags and dtypes on initialization

    Raises:
        TypeError: If the provided dtype cannot be cast to a numpy dtype
    """
    tags = TagSet(self.tags)
    object.__setattr__(self, "tags", tags)

    try:
        if hasattr(self.dtype, "numpy_dtype"):
            dtype = np.dtype(self.dtype.numpy_dtype)
        elif hasattr(self.dtype, "_categories"):
            dtype = self.dtype._categories.dtype
        elif isinstance(self.dtype, pd.StringDtype):
            dtype = np.dtype("O")
        else:
            dtype = np.dtype(self.dtype)
    except TypeError as err:
        raise TypeError(
            f"Unsupported dtype {self.dtype}, unable to cast {self.dtype} to a numpy dtype."
        ) from err

    object.__setattr__(self, "dtype", dtype)

    value_count = self.properties.get("value_count", {"min": 0, "max": 0})

    if self.is_list is None:
        if value_count["max"] > 0:
            object.__setattr__(self, "is_list", True)
        else:
            object.__setattr__(self, "is_list", False)

    if self.is_ragged is None:
        if value_count["max"] > value_count["min"]:
            object.__setattr__(self, "is_ragged", True)
        else:
            object.__setattr__(self, "is_ragged", False)

    if self.is_ragged and not self.is_list:
        raise ValueError(
            "`is_ragged` is set to `True` but `is_list` is not. "
            "Only list columns can set the `is_ragged` flag."
        )

    if self.is_ragged and value_count["max"] == value_count["min"]:
      raise ValueError(
            "`is_ragged` is set to `True` but `value_count.min` == `value_count.max`. "
            "If value_count min/max are equal. This is a fixed size list and `is_ragged` should be set to False"
        )

E ValueError: is_ragged is set to True but value_count.min == value_count.max. If value_count min/max are equal. This is a fixed size list and is_ragged should be set to False

merlin/schema/schema.py:99: ValueError
_____________________ test_string_datatypes[True-parquet] ______________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-2/test_string_datatypes_True_par0')
engine = 'parquet', cpu = True

@pytest.mark.parametrize("engine", ["csv", "parquet", "csv-no-header"])
@pytest.mark.parametrize("cpu", [None, True])
def test_string_datatypes(tmpdir, engine, cpu):
    df_lib = dispatch.get_lib()
    df = df_lib.DataFrame({"column": [[0.1, 0.2]]})
  dataset = merlin.io.Dataset(df)

tests/unit/io/test_io.py:89:


merlin/io/dataset.py:354: in __init__
    self.infer_schema()
merlin/io/dataset.py:1140: in infer_schema
    col_schema = ColumnSchema(column, dtype=dtype_val, is_list=is_list, is_ragged=is_list)
<string>:9: in __init__
    ???


self = ColumnSchema(name='column', tags=set(), properties={}, dtype=dtype('float64'), is_list=True, is_ragged=True)

def __post_init__(self):
    """Standardize tags and dtypes on initialization

    Raises:
        TypeError: If the provided dtype cannot be cast to a numpy dtype
    """
    tags = TagSet(self.tags)
    object.__setattr__(self, "tags", tags)

    try:
        if hasattr(self.dtype, "numpy_dtype"):
            dtype = np.dtype(self.dtype.numpy_dtype)
        elif hasattr(self.dtype, "_categories"):
            dtype = self.dtype._categories.dtype
        elif isinstance(self.dtype, pd.StringDtype):
            dtype = np.dtype("O")
        else:
            dtype = np.dtype(self.dtype)
    except TypeError as err:
        raise TypeError(
            f"Unsupported dtype {self.dtype}, unable to cast {self.dtype} to a numpy dtype."
        ) from err

    object.__setattr__(self, "dtype", dtype)

    value_count = self.properties.get("value_count", {"min": 0, "max": 0})

    if self.is_list is None:
        if value_count["max"] > 0:
            object.__setattr__(self, "is_list", True)
        else:
            object.__setattr__(self, "is_list", False)

    if self.is_ragged is None:
        if value_count["max"] > value_count["min"]:
            object.__setattr__(self, "is_ragged", True)
        else:
            object.__setattr__(self, "is_ragged", False)

    if self.is_ragged and not self.is_list:
        raise ValueError(
            "`is_ragged` is set to `True` but `is_list` is not. "
            "Only list columns can set the `is_ragged` flag."
        )

    if self.is_ragged and value_count["max"] == value_count["min"]:
      raise ValueError(
            "`is_ragged` is set to `True` but `value_count.min` == `value_count.max`. "
            "If value_count min/max are equal. This is a fixed size list and `is_ragged` should be set to False"
        )

E ValueError: is_ragged is set to True but value_count.min == value_count.max. If value_count min/max are equal. This is a fixed size list and is_ragged should be set to False

merlin/schema/schema.py:99: ValueError
__________________ test_string_datatypes[True-csv-no-header] ___________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-2/test_string_datatypes_True_csv1')
engine = 'csv-no-header', cpu = True

@pytest.mark.parametrize("engine", ["csv", "parquet", "csv-no-header"])
@pytest.mark.parametrize("cpu", [None, True])
def test_string_datatypes(tmpdir, engine, cpu):
    df_lib = dispatch.get_lib()
    df = df_lib.DataFrame({"column": [[0.1, 0.2]]})
  dataset = merlin.io.Dataset(df)

tests/unit/io/test_io.py:89:


merlin/io/dataset.py:354: in __init__
    self.infer_schema()
merlin/io/dataset.py:1140: in infer_schema
    col_schema = ColumnSchema(column, dtype=dtype_val, is_list=is_list, is_ragged=is_list)
<string>:9: in __init__
    ???


self = ColumnSchema(name='column', tags=set(), properties={}, dtype=dtype('float64'), is_list=True, is_ragged=True)

def __post_init__(self):
    """Standardize tags and dtypes on initialization

    Raises:
        TypeError: If the provided dtype cannot be cast to a numpy dtype
    """
    tags = TagSet(self.tags)
    object.__setattr__(self, "tags", tags)

    try:
        if hasattr(self.dtype, "numpy_dtype"):
            dtype = np.dtype(self.dtype.numpy_dtype)
        elif hasattr(self.dtype, "_categories"):
            dtype = self.dtype._categories.dtype
        elif isinstance(self.dtype, pd.StringDtype):
            dtype = np.dtype("O")
        else:
            dtype = np.dtype(self.dtype)
    except TypeError as err:
        raise TypeError(
            f"Unsupported dtype {self.dtype}, unable to cast {self.dtype} to a numpy dtype."
        ) from err

    object.__setattr__(self, "dtype", dtype)

    value_count = self.properties.get("value_count", {"min": 0, "max": 0})

    if self.is_list is None:
        if value_count["max"] > 0:
            object.__setattr__(self, "is_list", True)
        else:
            object.__setattr__(self, "is_list", False)

    if self.is_ragged is None:
        if value_count["max"] > value_count["min"]:
            object.__setattr__(self, "is_ragged", True)
        else:
            object.__setattr__(self, "is_ragged", False)

    if self.is_ragged and not self.is_list:
        raise ValueError(
            "`is_ragged` is set to `True` but `is_list` is not. "
            "Only list columns can set the `is_ragged` flag."
        )

    if self.is_ragged and value_count["max"] == value_count["min"]:
      raise ValueError(
            "`is_ragged` is set to `True` but `value_count.min` == `value_count.max`. "
            "If value_count min/max are equal. This is a fixed size list and `is_ragged` should be set to False"
        )

E ValueError: is_ragged is set to True but value_count.min == value_count.max. If value_count min/max are equal. This is a fixed size list and is_ragged should be set to False

merlin/schema/schema.py:99: ValueError
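
All six test_string_datatypes failures above reduce to the same root cause: infer_schema constructs the column with is_ragged=is_list but no value_count property, so __post_init__ falls back to {"min": 0, "max": 0} and the max == min guard fires even though no count information was ever supplied. A minimal sketch (assuming this PR's revision of merlin-core) that reproduces the error without the Dataset machinery:

from merlin.schema import ColumnSchema

# No value_count in properties, so __post_init__ falls back to
# {"min": 0, "max": 0}; with is_ragged=True the max == min check raises
# the same ValueError as above.
ColumnSchema("column", dtype="float64", is_list=True, is_ragged=True)

Guarding the check with value_count["max"] > 0, or only validating when a value_count is actually present, would keep it from firing on columns that carry no count information; that is one possible resolution, not necessarily the one this PR lands on.
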
______________________ test_dtype_column_schema[float32] _______________________

d_types = <class 'numpy.float32'>

@pytest.mark.parametrize("d_types", [numpy.float32, numpy.float64, numpy.uint32, numpy.uint64])
def test_dtype_column_schema(d_types):
  column = ColumnSchema("name", tags=[], properties=[], dtype=d_types)

tests/unit/schema/test_column_schemas.py:26:


<string>:9: in __init__
    ???


self = ColumnSchema(name='name', tags=set(), properties=[], dtype=dtype('float32'), is_list=None, is_ragged=None)

def __post_init__(self):
    """Standardize tags and dtypes on initialization

    Raises:
        TypeError: If the provided dtype cannot be cast to a numpy dtype
    """
    tags = TagSet(self.tags)
    object.__setattr__(self, "tags", tags)

    try:
        if hasattr(self.dtype, "numpy_dtype"):
            dtype = np.dtype(self.dtype.numpy_dtype)
        elif hasattr(self.dtype, "_categories"):
            dtype = self.dtype._categories.dtype
        elif isinstance(self.dtype, pd.StringDtype):
            dtype = np.dtype("O")
        else:
            dtype = np.dtype(self.dtype)
    except TypeError as err:
        raise TypeError(
            f"Unsupported dtype {self.dtype}, unable to cast {self.dtype} to a numpy dtype."
        ) from err

    object.__setattr__(self, "dtype", dtype)
  value_count = self.properties.get("value_count", {"min": 0, "max": 0})

E AttributeError: 'list' object has no attribute 'get'

merlin/schema/schema.py:78: AttributeError
______________________ test_dtype_column_schema[float64] _______________________

d_types = <class 'numpy.float64'>

@pytest.mark.parametrize("d_types", [numpy.float32, numpy.float64, numpy.uint32, numpy.uint64])
def test_dtype_column_schema(d_types):
  column = ColumnSchema("name", tags=[], properties=[], dtype=d_types)

tests/unit/schema/test_column_schemas.py:26:


<string>:9: in __init__
    ???


self = ColumnSchema(name='name', tags=set(), properties=[], dtype=dtype('float64'), is_list=None, is_ragged=None)

def __post_init__(self):
    """Standardize tags and dtypes on initialization

    Raises:
        TypeError: If the provided dtype cannot be cast to a numpy dtype
    """
    tags = TagSet(self.tags)
    object.__setattr__(self, "tags", tags)

    try:
        if hasattr(self.dtype, "numpy_dtype"):
            dtype = np.dtype(self.dtype.numpy_dtype)
        elif hasattr(self.dtype, "_categories"):
            dtype = self.dtype._categories.dtype
        elif isinstance(self.dtype, pd.StringDtype):
            dtype = np.dtype("O")
        else:
            dtype = np.dtype(self.dtype)
    except TypeError as err:
        raise TypeError(
            f"Unsupported dtype {self.dtype}, unable to cast {self.dtype} to a numpy dtype."
        ) from err

    object.__setattr__(self, "dtype", dtype)
  value_count = self.properties.get("value_count", {"min": 0, "max": 0})

E AttributeError: 'list' object has no attribute 'get'

merlin/schema/schema.py:78: AttributeError
_______________________ test_dtype_column_schema[uint32] _______________________

d_types = <class 'numpy.uint32'>

@pytest.mark.parametrize("d_types", [numpy.float32, numpy.float64, numpy.uint32, numpy.uint64])
def test_dtype_column_schema(d_types):
  column = ColumnSchema("name", tags=[], properties=[], dtype=d_types)

tests/unit/schema/test_column_schemas.py:26:


<string>:9: in __init__
    ???


self = ColumnSchema(name='name', tags=set(), properties=[], dtype=dtype('uint32'), is_list=None, is_ragged=None)

def __post_init__(self):
    """Standardize tags and dtypes on initialization

    Raises:
        TypeError: If the provided dtype cannot be cast to a numpy dtype
    """
    tags = TagSet(self.tags)
    object.__setattr__(self, "tags", tags)

    try:
        if hasattr(self.dtype, "numpy_dtype"):
            dtype = np.dtype(self.dtype.numpy_dtype)
        elif hasattr(self.dtype, "_categories"):
            dtype = self.dtype._categories.dtype
        elif isinstance(self.dtype, pd.StringDtype):
            dtype = np.dtype("O")
        else:
            dtype = np.dtype(self.dtype)
    except TypeError as err:
        raise TypeError(
            f"Unsupported dtype {self.dtype}, unable to cast {self.dtype} to a numpy dtype."
        ) from err

    object.__setattr__(self, "dtype", dtype)
  value_count = self.properties.get("value_count", {"min": 0, "max": 0})

E AttributeError: 'list' object has no attribute 'get'

merlin/schema/schema.py:78: AttributeError
_______________________ test_dtype_column_schema[uint64] _______________________

d_types = <class 'numpy.uint64'>

@pytest.mark.parametrize("d_types", [numpy.float32, numpy.float64, numpy.uint32, numpy.uint64])
def test_dtype_column_schema(d_types):
  column = ColumnSchema("name", tags=[], properties=[], dtype=d_types)

tests/unit/schema/test_column_schemas.py:26:


<string>:9: in __init__
    ???


self = ColumnSchema(name='name', tags=set(), properties=[], dtype=dtype('uint64'), is_list=None, is_ragged=None)

def __post_init__(self):
    """Standardize tags and dtypes on initialization

    Raises:
        TypeError: If the provided dtype cannot be cast to a numpy dtype
    """
    tags = TagSet(self.tags)
    object.__setattr__(self, "tags", tags)

    try:
        if hasattr(self.dtype, "numpy_dtype"):
            dtype = np.dtype(self.dtype.numpy_dtype)
        elif hasattr(self.dtype, "_categories"):
            dtype = self.dtype._categories.dtype
        elif isinstance(self.dtype, pd.StringDtype):
            dtype = np.dtype("O")
        else:
            dtype = np.dtype(self.dtype)
    except TypeError as err:
        raise TypeError(
            f"Unsupported dtype {self.dtype}, unable to cast {self.dtype} to a numpy dtype."
        ) from err

    object.__setattr__(self, "dtype", dtype)
  value_count = self.properties.get("value_count", {"min": 0, "max": 0})

E AttributeError: 'list' object has no attribute 'get'

merlin/schema/schema.py:78: AttributeError
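
The four test_dtype_column_schema failures are unrelated to the value_count semantics: the test passes properties=[] and the new self.properties.get(...) call assumes a mapping. A small sketch of one possible guard (get_value_count is a hypothetical helper, not part of the PR):

def get_value_count(properties) -> dict:
    """Tolerate a falsy or non-dict `properties` (the test passes
    properties=[]) instead of assuming a mapping."""
    props = properties if isinstance(properties, dict) else {}
    return props.get("value_count", {"min": 0, "max": 0})

assert get_value_count([]) == {"min": 0, "max": 0}
assert get_value_count({"value_count": {"min": 1, "max": 4}}) == {"min": 1, "max": 4}
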
_________________________ test_list_column_attributes __________________________

def test_list_column_attributes():
    col0_schema = ColumnSchema("col0")

    assert not col0_schema.is_list
    assert not col0_schema.is_ragged
    assert col0_schema.quantity == ColumnQuantity.SCALAR

    col1_schema = ColumnSchema("col1", is_list=False, is_ragged=False)

    assert not col1_schema.is_list
    assert not col1_schema.is_ragged
    assert col1_schema.quantity == ColumnQuantity.SCALAR

    col2_schema = ColumnSchema("col2", is_list=True)

    assert col2_schema.is_list
  assert col2_schema.is_ragged

E AssertionError: assert False
E + where False = ColumnSchema(name='col2', tags=set(), properties={}, dtype=dtype('float64'), is_list=True, is_ragged=False).is_ragged

tests/unit/schema/test_column_schemas.py:183: AssertionError
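
Here the inference itself is at issue: ColumnSchema("col2", is_list=True) carries no value_count, so the max > min rule infers is_ragged=False, while the test expects a list column of unknown length to default to ragged. A sketch of a default that would satisfy the assertion (hypothetical; the PR may resolve it differently):

def infer_is_ragged(is_list: bool, value_count: dict) -> bool:
    """A list column with no value_count is assumed ragged, since nothing
    pins its length; a fixed-size count (max == min > 0) is not ragged."""
    if value_count.get("max", 0) > 0:
        return value_count["max"] > value_count["min"]
    return bool(is_list)

assert infer_is_ragged(True, {}) is True                     # col2 above
assert infer_is_ragged(True, {"min": 2, "max": 2}) is False  # fixed-size list
assert infer_is_ragged(False, {}) is False                   # scalar column
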
______________________ test_tensorflow_metadata_from_json ______________________

def test_tensorflow_metadata_from_json():
    # make sure we can load up tensorflowmetadata serialized json objects, like done by
    # merlin-models
    json_schema = """{"feature": [
    {
      "name": "categories",
      "valueCount": {
        "min": "1",
        "max": "4"
      },
      "type": "INT",
      "intDomain": {
        "name": "categories",
        "min": "1",
        "max": "331",
        "isCategorical": true
      },
      "annotation": {
        "tag": [
          "item"
        ]
      }
    }]}
    """

    schema = TensorflowMetadata.from_json(json_schema).to_merlin_schema()
    column_schema = schema.column_schemas["categories"]

    # make sure the value_count is set appropriately
    assert column_schema.properties["value_count"] == {"min": 1, "max": 4}
  assert column_schema.is_list

E AssertionError: assert False
E + where False = ColumnSchema(name='categories', tags={<Tags.ITEM: 'item'>, <Tags.CATEGORICAL: 'categorical'>}, properties={'domain': {...331, 'name': 'categories'}, 'value_count': {'min': 1, 'max': 4}}, dtype=dtype('int64'), is_list=False, is_ragged=False).is_list

tests/unit/schema/test_schema_io.py:220: AssertionError
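
In this failure the value_count round-trips correctly; what is lost is the inference. The deserialized column arrives with is_list=False and is_ragged=False rather than None, so __post_init__ never consults the value_count. Building the column directly with the defaults shows the inference behaves as intended once it is allowed to run (a sketch, assuming this PR's revision):

from merlin.schema import ColumnSchema

col = ColumnSchema(
    "categories",
    dtype="int64",
    properties={"value_count": {"min": 1, "max": 4}},
)
# is_list/is_ragged default to None, so __post_init__ infers them:
assert col.is_list    # value_count["max"] > 0
assert col.is_ragged  # value_count["max"] > value_count["min"]
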
=============================== warnings summary ===============================
tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 6 files. Did not have enough partitions to create 7 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 8 files. Did not have enough partitions to create 9 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 8 files. Did not have enough partitions to create 10 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 8 files. Did not have enough partitions to create 11 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 12 files. Did not have enough partitions to create 13 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 12 files. Did not have enough partitions to create 14 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 12 files. Did not have enough partitions to create 15 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 12 files. Did not have enough partitions to create 16 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 12 files. Did not have enough partitions to create 17 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 12 files. Did not have enough partitions to create 18 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 12 files. Did not have enough partitions to create 19 files.
warnings.warn(

tests/unit/io/test_io.py::test_validate_and_regenerate_dataset
/var/jenkins_home/workspace/merlin_core/core/merlin/io/parquet.py:551: DeprecationWarning: 'ParquetDataset.pieces' attribute is deprecated as of pyarrow 5.0.0 and will be removed in a future version. Specify 'use_legacy_dataset=False' while constructing the ParquetDataset, and then use the '.fragments' attribute instead.
paths = [p.path for p in pa_dataset.pieces]

tests/unit/schema/test_column_schemas.py::test_column_schema_tags_normalize
tests/unit/schema/test_tags.py::test_tagset_atomizes_compound_tags
/var/jenkins_home/workspace/merlin_core/core/merlin/schema/tags.py:148: UserWarning: Compound tags like Tags.ITEM_ID have been deprecated and will be removed in a future version. Please use the atomic versions of these tags, like [<Tags.ITEM: 'item'>, <Tags.ID: 'id'>].
warnings.warn(

tests/unit/schema/test_schema_io.py::test_json_serialization_with_embedded_dicts
tests/unit/schema/test_schema_io.py::test_merlin_to_proto_to_json_to_merlin
tests/unit/schema/test_tags.py::test_tagset_atomizes_compound_tags
/var/jenkins_home/workspace/merlin_core/core/merlin/schema/tags.py:148: UserWarning: Compound tags like Tags.USER_ID have been deprecated and will be removed in a future version. Please use the atomic versions of these tags, like [<Tags.USER: 'user'>, <Tags.ID: 'id'>].
warnings.warn(

tests/unit/schema/test_tags.py::test_tagset_atomizes_compound_tags
/var/jenkins_home/workspace/merlin_core/core/merlin/schema/tags.py:148: UserWarning: Compound tags like Tags.SESSION_ID have been deprecated and will be removed in a future version. Please use the atomic versions of these tags, like [<Tags.SESSION: 'session'>, <Tags.ID: 'id'>].
warnings.warn(

tests/unit/schema/test_tags.py::test_tagset_atomizes_compound_tags
/var/jenkins_home/workspace/merlin_core/core/merlin/schema/tags.py:148: UserWarning: Compound tags like Tags.TEXT_TOKENIZED have been deprecated and will be removed in a future version. Please use the atomic versions of these tags, like [<Tags.TEXT: 'text'>, <Tags.TOKENIZED: 'tokenized'>].
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed[True-True]
/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 33883 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed[True-False]
/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 34367 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed[False-True]
/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 33699 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed[False-False]
/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 36081 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed_force[True]
/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 45019 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed_force[False]
/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 34047 instead
warnings.warn(

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html

---------- coverage: platform linux, python 3.8.10-final-0 -----------
Name                                        Stmts   Miss  Cover
---------------------------------------------------------------
merlin/core/__init__.py                         2      0   100%
merlin/core/_version.py                       354    205    42%
merlin/core/compat.py                          10      4    60%
merlin/core/dispatch.py                       356    212    40%
merlin/core/protocols.py                       99     45    55%
merlin/core/utils.py                          195     56    71%
merlin/dag/__init__.py                          5      0   100%
merlin/dag/base_operator.py                   122     20    84%
merlin/dag/dictarray.py                        55     15    73%
merlin/dag/executors.py                       141     68    52%
merlin/dag/graph.py                            99     35    65%
merlin/dag/node.py                            344    161    53%
merlin/dag/ops/__init__.py                      4      0   100%
merlin/dag/ops/concat_columns.py               17      4    76%
merlin/dag/ops/selection.py                    22      0   100%
merlin/dag/ops/subset_columns.py               12      4    67%
merlin/dag/ops/subtraction.py                  21     11    48%
merlin/dag/selector.py                        101      6    94%
merlin/io/__init__.py                           4      0   100%
merlin/io/avro.py                              88     88     0%
merlin/io/csv.py                               57      6    89%
merlin/io/dask.py                             181     53    71%
merlin/io/dataframe_engine.py                  61      5    92%
merlin/io/dataframe_iter.py                    21      1    95%
merlin/io/dataset.py                          347     54    84%
merlin/io/dataset_engine.py                    37      8    78%
merlin/io/fsspec_utils.py                     127    108    15%
merlin/io/hugectr.py                           45     35    22%
merlin/io/parquet.py                          603     69    89%
merlin/io/shuffle.py                           38     12    68%
merlin/io/worker.py                            80     66    18%
merlin/io/writer.py                           190     52    73%
merlin/io/writer_factory.py                    18      4    78%
merlin/schema/__init__.py                       2      0   100%
merlin/schema/io/__init__.py                    0      0   100%
merlin/schema/io/proto_utils.py                20      4    80%
merlin/schema/io/schema_bp.py                 306      5    98%
merlin/schema/io/tensorflow_metadata.py       183     17    91%
merlin/schema/schema.py                       218     36    83%
merlin/schema/tags.py                          82      0   100%
---------------------------------------------------------------
TOTAL                                        4667   1469    69%

=========================== short test summary info ============================
SKIPPED [1] tests/unit/io/test_avro.py:34: could not import 'uavro': No module named 'uavro'
====== 12 failed, 379 passed, 1 skipped, 25 warnings in 64.77s (0:01:04) =======
ERROR: InvocationError for command /var/jenkins_home/workspace/merlin_core/core/.tox/test-gpu/bin/python -m pytest --cov-report term --cov merlin -rxs tests/unit (exited with code 1)
___________________________________ summary ____________________________________
ERROR: test-gpu: commands failed
Build step 'Execute shell' marked build as failure
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA-Merlin/core/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[merlin_core] $ /bin/bash /tmp/jenkins12932198516773488835.sh

@oliverholworthy oliverholworthy changed the title Add test for value_count property in schema serialization Update value_count serialization/deserialization to be consistent with original schema Nov 18, 2022
@nvidia-merlin-bot

Click to view CI Results
GitHub pull request #111 of commit d5a0ca7d749b703b3ae3b21887c380b87131b060, no merge conflicts.
Running as SYSTEM
Setting status of d5a0ca7d749b703b3ae3b21887c380b87131b060 to PENDING with url http://merlin-infra1.nvidia.com:8080/job/merlin_core/291/ and message: 'Pending'
Using context: Jenkins
Building on master in workspace /var/jenkins_home/workspace/merlin_core
using credential ce87ff3c-94f0-400a-8303-cb4acb4918b5
 > git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > git config remote.origin.url /~https://github.com/NVIDIA-Merlin/core # timeout=10
Fetching upstream changes from /~https://github.com/NVIDIA-Merlin/core
 > git --version # timeout=10
using GIT_ASKPASS to set credentials login for merlin-systems username and pass
 > git fetch --tags --force --progress -- /~https://github.com/NVIDIA-Merlin/core +refs/pull/111/*:refs/remotes/origin/pr/111/* # timeout=10
 > git rev-parse d5a0ca7d749b703b3ae3b21887c380b87131b060^{commit} # timeout=10
Checking out Revision d5a0ca7d749b703b3ae3b21887c380b87131b060 (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f d5a0ca7d749b703b3ae3b21887c380b87131b060 # timeout=10
Commit message: "Update formatting"
 > git rev-list --no-walk bd68a1428d8f3a45dbeafa451ffcba1dc8f3d083 # timeout=10
[merlin_core] $ /bin/bash /tmp/jenkins2975759695102390733.sh
GLOB sdist-make: /var/jenkins_home/workspace/merlin_core/core/setup.py
test-gpu recreate: /var/jenkins_home/workspace/merlin_core/core/.tox/test-gpu
test-gpu installdeps: pytest, pytest-cov
WARNING: Discarding $PYTHONPATH from environment, to override specify PYTHONPATH in 'passenv' in your configuration.
test-gpu inst: /var/jenkins_home/workspace/merlin_core/core/.tox/.tmp/package/1/merlin-core-0.8.0+19.gd5a0ca7.zip
WARNING: Discarding $PYTHONPATH from environment, to override specify PYTHONPATH in 'passenv' in your configuration.
test-gpu installed: absl-py==1.2.0,aiohttp==3.8.1,aiosignal==1.2.0,alabaster==0.7.12,alembic==1.8.1,anyio==3.6.1,argon2-cffi==21.3.0,argon2-cffi-bindings==21.2.0,astroid==2.5.6,asttokens==2.0.8,astunparse==1.6.3,asv==0.5.1,asvdb==0.4.2,async-timeout==4.0.2,attrs==22.1.0,autopage==0.5.1,awscli==1.27.12,Babel==2.10.3,backcall==0.2.0,beautifulsoup4==4.11.1,betterproto==1.2.5,black==22.6.0,bleach==5.0.1,boto3==1.24.75,botocore==1.29.12,Brotli==1.0.9,cachetools==5.2.0,certifi==2019.11.28,cffi==1.15.1,chardet==3.0.4,charset-normalizer==2.1.1,clang==5.0,click==8.1.3,cliff==4.0.0,cloudpickle==2.2.0,cmaes==0.9.0,cmake==3.24.1.1,cmd2==2.4.2,colorama==0.4.4,colorlog==6.7.0,contourpy==1.0.5,coverage==6.5.0,cuda-python==11.7.1,cupy-cuda117==10.6.0,cycler==0.11.0,Cython==0.29.32,dask==2022.1.1,dbus-python==1.2.16,debugpy==1.6.3,decorator==5.1.1,defusedxml==0.7.1,dill==0.3.5.1,distlib==0.3.6,distributed==2022.5.1,distro==1.7.0,dm-tree==0.1.6,docker-pycreds==0.4.0,docutils==0.16,emoji==1.7.0,entrypoints==0.4,execnet==1.9.0,executing==1.0.0,faiss==1.7.2,faiss-gpu==1.7.2,fastai==2.7.9,fastapi==0.85.0,fastavro==1.6.1,fastcore==1.5.27,fastdownload==0.0.7,fastjsonschema==2.16.1,fastprogress==1.0.3,fastrlock==0.8,feast==0.19.4,fiddle==0.2.2,filelock==3.8.0,flatbuffers==1.12,fonttools==4.37.3,frozenlist==1.3.1,fsspec==2022.5.0,gast==0.4.0,gevent==21.12.0,geventhttpclient==2.0.2,gitdb==4.0.9,GitPython==3.1.27,google==3.0.0,google-api-core==2.10.1,google-auth==2.11.1,google-auth-oauthlib==0.4.6,google-pasta==0.2.0,googleapis-common-protos==1.52.0,graphviz==0.20.1,greenlet==1.1.3,grpcio==1.41.0,grpcio-channelz==1.49.0,grpcio-reflection==1.48.1,grpclib==0.4.3,h11==0.13.0,h2==4.1.0,h5py==3.7.0,HeapDict==1.0.1,horovod==0.26.1,hpack==4.0.0,httptools==0.5.0,hugectr2onnx==0.0.0,huggingface-hub==0.9.1,hyperframe==6.0.1,idna==2.8,imagesize==1.4.1,implicit==0.6.1,importlib-metadata==4.12.0,importlib-resources==5.9.0,iniconfig==1.1.1,ipykernel==6.15.3,ipython==8.5.0,ipython-genutils==0.2.0,ipywidgets==7.7.0,jedi==0.18.1,Jinja2==3.1.2,jmespath==1.0.1,joblib==1.2.0,json5==0.9.10,jsonschema==4.16.0,jupyter-cache==0.4.3,jupyter-core==4.11.1,jupyter-server==1.18.1,jupyter-server-mathjax==0.2.5,jupyter-sphinx==0.3.2,jupyter_client==7.3.5,jupyterlab==3.4.7,jupyterlab-pygments==0.2.2,jupyterlab-widgets==1.1.0,jupyterlab_server==2.15.1,keras==2.9.0,Keras-Preprocessing==1.1.2,kiwisolver==1.4.4,lazy-object-proxy==1.8.0,libclang==14.0.6,libcst==0.4.7,lightfm==1.16,lightgbm==3.3.2,linkify-it-py==1.0.3,llvmlite==0.39.1,locket==1.0.0,lxml==4.9.1,Mako==1.2.4,Markdown==3.4.1,markdown-it-py==1.1.0,MarkupSafe==2.1.1,matplotlib==3.6.0,matplotlib-inline==0.1.6,mdit-py-plugins==0.2.8,merlin-core==0.8.0+19.gd5a0ca7,merlin-models==0.7.0+11.g280956aa4,merlin-systems==0.5.0+4.g15074ad,mistune==2.0.4,mmh3==3.0.0,mpi4py==3.1.3,msgpack==1.0.4,multidict==6.0.2,mypy-extensions==0.4.3,myst-nb==0.13.2,myst-parser==0.15.2,natsort==8.1.0,nbclassic==0.4.3,nbclient==0.6.8,nbconvert==7.0.0,nbdime==3.1.1,nbformat==5.5.0,nest-asyncio==1.5.5,ninja==1.10.2.3,notebook==6.4.12,notebook-shim==0.1.0,numba==0.56.2,numpy==1.22.4,nvidia-pyindex==1.0.9,# Editable install with no version control (nvtabular==1.4.0+8.g95e12d347),-e 
/usr/local/lib/python3.8/dist-packages,nvtx==0.2.5,oauthlib==3.2.1,oldest-supported-numpy==2022.8.16,onnx==1.12.0,onnxruntime==1.11.1,opt-einsum==3.3.0,optuna==3.0.3,packaging==21.3,pandas==1.3.5,pandavro==1.5.2,pandocfilters==1.5.0,parso==0.8.3,partd==1.3.0,pathtools==0.1.2,pbr==5.11.0,pexpect==4.8.0,pickleshare==0.7.5,Pillow==9.2.0,pkgutil_resolve_name==1.3.10,platformdirs==2.5.2,plotly==5.11.0,pluggy==1.0.0,prettytable==3.5.0,prometheus-client==0.14.1,promise==2.3,prompt-toolkit==3.0.31,proto-plus==1.19.6,protobuf==3.19.5,psutil==5.9.2,ptyprocess==0.7.0,pure-eval==0.2.2,py==1.11.0,pyarrow==7.0.0,pyasn1==0.4.8,pyasn1-modules==0.2.8,pybind11==2.10.0,pycparser==2.21,pydantic==1.10.2,pydot==1.4.2,Pygments==2.13.0,PyGObject==3.36.0,pynvml==11.4.1,pyparsing==3.0.9,pyperclip==1.8.2,pyrsistent==0.18.1,pytest==7.1.3,pytest-cov==4.0.0,pytest-xdist==3.0.2,python-apt==2.0.0+ubuntu0.20.4.8,python-dateutil==2.8.2,python-dotenv==0.21.0,python-rapidjson==1.8,pytz==2022.2.1,PyYAML==5.4.1,pyzmq==24.0.0,regex==2022.9.13,requests==2.22.0,requests-oauthlib==1.3.1,requests-unixsocket==0.2.0,rsa==4.7.2,s3fs==2022.2.0,s3transfer==0.6.0,sacremoses==0.0.53,scikit-build==0.15.0,scikit-learn==1.1.2,scipy==1.8.1,seedir==0.3.0,Send2Trash==1.8.0,sentry-sdk==1.9.8,setproctitle==1.3.2,setuptools-scm==7.0.5,shortuuid==1.0.9,six==1.15.0,sklearn==0.0,smmap==5.0.0,sniffio==1.3.0,snowballstemmer==2.2.0,sortedcontainers==2.4.0,soupsieve==2.3.2.post1,Sphinx==5.3.0,sphinx-multiversion==0.2.4,sphinx-togglebutton==0.3.1,sphinx_external_toc==0.3.0,sphinxcontrib-applehelp==1.0.2,sphinxcontrib-copydirs @ git+/~https://github.com/mikemckiernan/sphinxcontrib-copydirs.git@bd8c5d79b3f91cf5f1bb0d6995aeca3fe84b670e,sphinxcontrib-devhelp==1.0.2,sphinxcontrib-htmlhelp==2.0.0,sphinxcontrib-jsmath==1.0.1,sphinxcontrib-qthelp==1.0.3,sphinxcontrib-serializinghtml==1.1.5,SQLAlchemy==1.4.44,stack-data==0.5.0,starlette==0.20.4,stevedore==4.1.1,stringcase==1.2.0,supervisor==4.1.0,tabulate==0.8.10,tblib==1.7.0,tdqm==0.0.1,tenacity==8.0.1,tensorboard==2.9.1,tensorboard-data-server==0.6.1,tensorboard-plugin-wit==1.8.1,tensorflow==2.9.2,tensorflow-estimator==2.9.0,tensorflow-gpu==2.9.2,tensorflow-io-gcs-filesystem==0.27.0,tensorflow-metadata==1.10.0,termcolor==2.0.1,terminado==0.15.0,testbook==0.4.2,threadpoolctl==3.1.0,tinycss2==1.1.1,tokenizers==0.10.3,toml==0.10.2,tomli==2.0.1,toolz==0.12.0,torch==1.12.1+cu113,torchmetrics==0.3.2,tornado==6.2,tox==3.26.0,tqdm==4.64.1,traitlets==5.4.0,transformers==4.12.0,transformers4rec==0.1.12+2.gbcc939255,treelite==2.3.0,treelite-runtime==2.3.0,tritonclient==2.25.0,typing-inspect==0.8.0,typing_extensions==4.3.0,uc-micro-py==1.0.1,urllib3==1.26.12,uvicorn==0.18.3,uvloop==0.17.0,versioneer==0.20,virtualenv==20.16.5,wandb==0.13.3,watchfiles==0.17.0,wcwidth==0.2.5,webencodings==0.5.1,websocket-client==1.4.1,websockets==10.3,Werkzeug==2.2.2,widgetsnbextension==3.6.0,wrapt==1.12.1,xgboost==1.6.2,yarl==1.8.1,zict==2.2.0,zipp==3.8.1,zope.event==4.5.0,zope.interface==5.4.0
test-gpu run-test-pre: PYTHONHASHSEED='2218809350'
test-gpu run-test: commands[0] | python -m pytest --cov-report term --cov merlin -rxs tests/unit
============================= test session starts ==============================
platform linux -- Python 3.8.10, pytest-7.1.3, pluggy-1.0.0
cachedir: .tox/test-gpu/.pytest_cache
rootdir: /var/jenkins_home/workspace/merlin_core/core, configfile: pyproject.toml
plugins: anyio-3.5.0, xdist-3.0.2, cov-4.0.0
collected 391 items / 1 skipped

tests/unit/core/test_dispatch.py .... [ 1%]
tests/unit/core/test_protocols.py ......... [ 3%]
tests/unit/core/test_version.py . [ 3%]
tests/unit/dag/test_base_operator.py .... [ 4%]
tests/unit/dag/test_column_selector.py .............................. [ 12%]
tests/unit/dag/test_dictarray.py ... [ 13%]
tests/unit/dag/test_executors.py .. [ 13%]
tests/unit/dag/test_graph.py .... [ 14%]
tests/unit/dag/ops/test_selection.py .... [ 15%]
tests/unit/io/test_io.py ...FFFFFF...................................... [ 27%]
................................................................ [ 43%]
tests/unit/schema/test_column_schemas.py FFFF..................F. [ 50%]
tests/unit/schema/test_schema.py ............. [ 53%]
tests/unit/schema/test_schema_io.py .................................... [ 62%]
........................................................................ [ 81%]
..........................................................F [ 96%]
tests/unit/schema/test_tags.py ....... [ 97%]
tests/unit/utils/test_utils.py ........ [100%]

=================================== FAILURES ===================================
_______________________ test_string_datatypes[None-csv] ________________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-3/test_string_datatypes_None_csv0')
engine = 'csv', cpu = None

@pytest.mark.parametrize("engine", ["csv", "parquet", "csv-no-header"])
@pytest.mark.parametrize("cpu", [None, True])
def test_string_datatypes(tmpdir, engine, cpu):
    df_lib = dispatch.get_lib()
    df = df_lib.DataFrame({"column": [[0.1, 0.2]]})
  dataset = merlin.io.Dataset(df)

tests/unit/io/test_io.py:89:


merlin/io/dataset.py:354: in __init__
    self.infer_schema()
merlin/io/dataset.py:1140: in infer_schema
    col_schema = ColumnSchema(column, dtype=dtype_val, is_list=is_list, is_ragged=is_list)
<string>:9: in __init__
    ???


self = ColumnSchema(name='column', tags=set(), properties={}, dtype=dtype('float64'), is_list=True, is_ragged=True)

def __post_init__(self):
    """Standardize tags and dtypes on initialization

    Raises:
        TypeError: If the provided dtype cannot be cast to a numpy dtype
    """
    tags = TagSet(self.tags)
    object.__setattr__(self, "tags", tags)

    try:
        if hasattr(self.dtype, "numpy_dtype"):
            dtype = np.dtype(self.dtype.numpy_dtype)
        elif hasattr(self.dtype, "_categories"):
            dtype = self.dtype._categories.dtype
        elif isinstance(self.dtype, pd.StringDtype):
            dtype = np.dtype("O")
        else:
            dtype = np.dtype(self.dtype)
    except TypeError as err:
        raise TypeError(
            f"Unsupported dtype {self.dtype}, unable to cast {self.dtype} to a numpy dtype."
        ) from err

    object.__setattr__(self, "dtype", dtype)

    value_count = self.properties.get("value_count", {"min": 0, "max": 0})

    if self.is_list is None:
        if value_count["max"] > 0:
            object.__setattr__(self, "is_list", True)
        else:
            object.__setattr__(self, "is_list", False)

    if self.is_ragged is None:
        if value_count["max"] > value_count["min"]:
            object.__setattr__(self, "is_ragged", True)
        else:
            object.__setattr__(self, "is_ragged", False)

    if self.is_ragged and not self.is_list:
        raise ValueError(
            "`is_ragged` is set to `True` but `is_list` is not. "
            "Only list columns can set the `is_ragged` flag."
        )

    if self.is_ragged and value_count["max"] == value_count["min"]:
      raise ValueError(
            "`is_ragged` is set to `True` but `value_count.min` == `value_count.max`. "
            "If value_count min/max are equal. "
            "This is a fixed size list and `is_ragged` should be set to False. "
        )

E ValueError: is_ragged is set to True but value_count.min == value_count.max. If value_count min/max are equal. This is a fixed size list and is_ragged should be set to False.

merlin/schema/schema.py:99: ValueError
_____________________ test_string_datatypes[None-parquet] ______________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-3/test_string_datatypes_None_par0')
engine = 'parquet', cpu = None

@pytest.mark.parametrize("engine", ["csv", "parquet", "csv-no-header"])
@pytest.mark.parametrize("cpu", [None, True])
def test_string_datatypes(tmpdir, engine, cpu):
    df_lib = dispatch.get_lib()
    df = df_lib.DataFrame({"column": [[0.1, 0.2]]})
  dataset = merlin.io.Dataset(df)

tests/unit/io/test_io.py:89:


merlin/io/dataset.py:354: in __init__
    self.infer_schema()
merlin/io/dataset.py:1140: in infer_schema
    col_schema = ColumnSchema(column, dtype=dtype_val, is_list=is_list, is_ragged=is_list)
<string>:9: in __init__
    ???


self = ColumnSchema(name='column', tags=set(), properties={}, dtype=dtype('float64'), is_list=True, is_ragged=True)

def __post_init__(self):
    """Standardize tags and dtypes on initialization

    Raises:
        TypeError: If the provided dtype cannot be cast to a numpy dtype
    """
    tags = TagSet(self.tags)
    object.__setattr__(self, "tags", tags)

    try:
        if hasattr(self.dtype, "numpy_dtype"):
            dtype = np.dtype(self.dtype.numpy_dtype)
        elif hasattr(self.dtype, "_categories"):
            dtype = self.dtype._categories.dtype
        elif isinstance(self.dtype, pd.StringDtype):
            dtype = np.dtype("O")
        else:
            dtype = np.dtype(self.dtype)
    except TypeError as err:
        raise TypeError(
            f"Unsupported dtype {self.dtype}, unable to cast {self.dtype} to a numpy dtype."
        ) from err

    object.__setattr__(self, "dtype", dtype)

    value_count = self.properties.get("value_count", {"min": 0, "max": 0})

    if self.is_list is None:
        if value_count["max"] > 0:
            object.__setattr__(self, "is_list", True)
        else:
            object.__setattr__(self, "is_list", False)

    if self.is_ragged is None:
        if value_count["max"] > value_count["min"]:
            object.__setattr__(self, "is_ragged", True)
        else:
            object.__setattr__(self, "is_ragged", False)

    if self.is_ragged and not self.is_list:
        raise ValueError(
            "`is_ragged` is set to `True` but `is_list` is not. "
            "Only list columns can set the `is_ragged` flag."
        )

    if self.is_ragged and value_count["max"] == value_count["min"]:
      raise ValueError(
            "`is_ragged` is set to `True` but `value_count.min` == `value_count.max`. "
            "If value_count min/max are equal. "
            "This is a fixed size list and `is_ragged` should be set to False. "
        )

E ValueError: is_ragged is set to True but value_count.min == value_count.max. If value_count min/max are equal. This is a fixed size list and is_ragged should be set to False.

merlin/schema/schema.py:99: ValueError
__________________ test_string_datatypes[None-csv-no-header] ___________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-3/test_string_datatypes_None_csv1')
engine = 'csv-no-header', cpu = None

@pytest.mark.parametrize("engine", ["csv", "parquet", "csv-no-header"])
@pytest.mark.parametrize("cpu", [None, True])
def test_string_datatypes(tmpdir, engine, cpu):
    df_lib = dispatch.get_lib()
    df = df_lib.DataFrame({"column": [[0.1, 0.2]]})
  dataset = merlin.io.Dataset(df)

tests/unit/io/test_io.py:89:


merlin/io/dataset.py:354: in __init__
    self.infer_schema()
merlin/io/dataset.py:1140: in infer_schema
    col_schema = ColumnSchema(column, dtype=dtype_val, is_list=is_list, is_ragged=is_list)
<string>:9: in __init__
    ???


self = ColumnSchema(name='column', tags=set(), properties={}, dtype=dtype('float64'), is_list=True, is_ragged=True)

def __post_init__(self):
    """Standardize tags and dtypes on initialization

    Raises:
        TypeError: If the provided dtype cannot be cast to a numpy dtype
    """
    tags = TagSet(self.tags)
    object.__setattr__(self, "tags", tags)

    try:
        if hasattr(self.dtype, "numpy_dtype"):
            dtype = np.dtype(self.dtype.numpy_dtype)
        elif hasattr(self.dtype, "_categories"):
            dtype = self.dtype._categories.dtype
        elif isinstance(self.dtype, pd.StringDtype):
            dtype = np.dtype("O")
        else:
            dtype = np.dtype(self.dtype)
    except TypeError as err:
        raise TypeError(
            f"Unsupported dtype {self.dtype}, unable to cast {self.dtype} to a numpy dtype."
        ) from err

    object.__setattr__(self, "dtype", dtype)

    value_count = self.properties.get("value_count", {"min": 0, "max": 0})

    if self.is_list is None:
        if value_count["max"] > 0:
            object.__setattr__(self, "is_list", True)
        else:
            object.__setattr__(self, "is_list", False)

    if self.is_ragged is None:
        if value_count["max"] > value_count["min"]:
            object.__setattr__(self, "is_ragged", True)
        else:
            object.__setattr__(self, "is_ragged", False)

    if self.is_ragged and not self.is_list:
        raise ValueError(
            "`is_ragged` is set to `True` but `is_list` is not. "
            "Only list columns can set the `is_ragged` flag."
        )

    if self.is_ragged and value_count["max"] == value_count["min"]:
      raise ValueError(
            "`is_ragged` is set to `True` but `value_count.min` == `value_count.max`. "
            "If value_count min/max are equal. "
            "This is a fixed size list and `is_ragged` should be set to False. "
        )

E ValueError: is_ragged is set to True but value_count.min == value_count.max. If value_count min/max are equal. This is a fixed size list and is_ragged should be set to False.

merlin/schema/schema.py:99: ValueError
_______________________ test_string_datatypes[True-csv] ________________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-3/test_string_datatypes_True_csv0')
engine = 'csv', cpu = True

@pytest.mark.parametrize("engine", ["csv", "parquet", "csv-no-header"])
@pytest.mark.parametrize("cpu", [None, True])
def test_string_datatypes(tmpdir, engine, cpu):
    df_lib = dispatch.get_lib()
    df = df_lib.DataFrame({"column": [[0.1, 0.2]]})
  dataset = merlin.io.Dataset(df)

tests/unit/io/test_io.py:89:


merlin/io/dataset.py:354: in __init__
    self.infer_schema()
merlin/io/dataset.py:1140: in infer_schema
    col_schema = ColumnSchema(column, dtype=dtype_val, is_list=is_list, is_ragged=is_list)
<string>:9: in __init__
    ???


self = ColumnSchema(name='column', tags=set(), properties={}, dtype=dtype('float64'), is_list=True, is_ragged=True)

def __post_init__(self):
    """Standardize tags and dtypes on initialization

    Raises:
        TypeError: If the provided dtype cannot be cast to a numpy dtype
    """
    tags = TagSet(self.tags)
    object.__setattr__(self, "tags", tags)

    try:
        if hasattr(self.dtype, "numpy_dtype"):
            dtype = np.dtype(self.dtype.numpy_dtype)
        elif hasattr(self.dtype, "_categories"):
            dtype = self.dtype._categories.dtype
        elif isinstance(self.dtype, pd.StringDtype):
            dtype = np.dtype("O")
        else:
            dtype = np.dtype(self.dtype)
    except TypeError as err:
        raise TypeError(
            f"Unsupported dtype {self.dtype}, unable to cast {self.dtype} to a numpy dtype."
        ) from err

    object.__setattr__(self, "dtype", dtype)

    value_count = self.properties.get("value_count", {"min": 0, "max": 0})

    if self.is_list is None:
        if value_count["max"] > 0:
            object.__setattr__(self, "is_list", True)
        else:
            object.__setattr__(self, "is_list", False)

    if self.is_ragged is None:
        if value_count["max"] > value_count["min"]:
            object.__setattr__(self, "is_ragged", True)
        else:
            object.__setattr__(self, "is_ragged", False)

    if self.is_ragged and not self.is_list:
        raise ValueError(
            "`is_ragged` is set to `True` but `is_list` is not. "
            "Only list columns can set the `is_ragged` flag."
        )

    if self.is_ragged and value_count["max"] == value_count["min"]:
      raise ValueError(
            "`is_ragged` is set to `True` but `value_count.min` == `value_count.max`. "
            "If value_count min/max are equal. "
            "This is a fixed size list and `is_ragged` should be set to False. "
        )

E ValueError: is_ragged is set to True but value_count.min == value_count.max. If value_count min/max are equal. This is a fixed size list and is_ragged should be set to False.

merlin/schema/schema.py:99: ValueError
_____________________ test_string_datatypes[True-parquet] ______________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-3/test_string_datatypes_True_par0')
engine = 'parquet', cpu = True

@pytest.mark.parametrize("engine", ["csv", "parquet", "csv-no-header"])
@pytest.mark.parametrize("cpu", [None, True])
def test_string_datatypes(tmpdir, engine, cpu):
    df_lib = dispatch.get_lib()
    df = df_lib.DataFrame({"column": [[0.1, 0.2]]})
  dataset = merlin.io.Dataset(df)

tests/unit/io/test_io.py:89:


merlin/io/dataset.py:354: in __init__
    self.infer_schema()
merlin/io/dataset.py:1140: in infer_schema
    col_schema = ColumnSchema(column, dtype=dtype_val, is_list=is_list, is_ragged=is_list)
<string>:9: in __init__
    ???


self = ColumnSchema(name='column', tags=set(), properties={}, dtype=dtype('float64'), is_list=True, is_ragged=True)

def __post_init__(self):
    """Standardize tags and dtypes on initialization

    Raises:
        TypeError: If the provided dtype cannot be cast to a numpy dtype
    """
    tags = TagSet(self.tags)
    object.__setattr__(self, "tags", tags)

    try:
        if hasattr(self.dtype, "numpy_dtype"):
            dtype = np.dtype(self.dtype.numpy_dtype)
        elif hasattr(self.dtype, "_categories"):
            dtype = self.dtype._categories.dtype
        elif isinstance(self.dtype, pd.StringDtype):
            dtype = np.dtype("O")
        else:
            dtype = np.dtype(self.dtype)
    except TypeError as err:
        raise TypeError(
            f"Unsupported dtype {self.dtype}, unable to cast {self.dtype} to a numpy dtype."
        ) from err

    object.__setattr__(self, "dtype", dtype)

    value_count = self.properties.get("value_count", {"min": 0, "max": 0})

    if self.is_list is None:
        if value_count["max"] > 0:
            object.__setattr__(self, "is_list", True)
        else:
            object.__setattr__(self, "is_list", False)

    if self.is_ragged is None:
        if value_count["max"] > value_count["min"]:
            object.__setattr__(self, "is_ragged", True)
        else:
            object.__setattr__(self, "is_ragged", False)

    if self.is_ragged and not self.is_list:
        raise ValueError(
            "`is_ragged` is set to `True` but `is_list` is not. "
            "Only list columns can set the `is_ragged` flag."
        )

    if self.is_ragged and value_count["max"] == value_count["min"]:
      raise ValueError(
            "`is_ragged` is set to `True` but `value_count.min` == `value_count.max`. "
            "If value_count min/max are equal. "
            "This is a fixed size list and `is_ragged` should be set to False. "
        )

E ValueError: is_ragged is set to True but value_count.min == value_count.max. If value_count min/max are equal. This is a fixed size list and is_ragged should be set to False.

merlin/schema/schema.py:99: ValueError
__________________ test_string_datatypes[True-csv-no-header] ___________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-3/test_string_datatypes_True_csv1')
engine = 'csv-no-header', cpu = True

@pytest.mark.parametrize("engine", ["csv", "parquet", "csv-no-header"])
@pytest.mark.parametrize("cpu", [None, True])
def test_string_datatypes(tmpdir, engine, cpu):
    df_lib = dispatch.get_lib()
    df = df_lib.DataFrame({"column": [[0.1, 0.2]]})
  dataset = merlin.io.Dataset(df)

tests/unit/io/test_io.py:89:


merlin/io/dataset.py:354: in __init__
    self.infer_schema()
merlin/io/dataset.py:1140: in infer_schema
    col_schema = ColumnSchema(column, dtype=dtype_val, is_list=is_list, is_ragged=is_list)
<string>:9: in __init__
    ???


self = ColumnSchema(name='column', tags=set(), properties={}, dtype=dtype('float64'), is_list=True, is_ragged=True)

def __post_init__(self):
    """Standardize tags and dtypes on initialization

    Raises:
        TypeError: If the provided dtype cannot be cast to a numpy dtype
    """
    tags = TagSet(self.tags)
    object.__setattr__(self, "tags", tags)

    try:
        if hasattr(self.dtype, "numpy_dtype"):
            dtype = np.dtype(self.dtype.numpy_dtype)
        elif hasattr(self.dtype, "_categories"):
            dtype = self.dtype._categories.dtype
        elif isinstance(self.dtype, pd.StringDtype):
            dtype = np.dtype("O")
        else:
            dtype = np.dtype(self.dtype)
    except TypeError as err:
        raise TypeError(
            f"Unsupported dtype {self.dtype}, unable to cast {self.dtype} to a numpy dtype."
        ) from err

    object.__setattr__(self, "dtype", dtype)

    value_count = self.properties.get("value_count", {"min": 0, "max": 0})

    if self.is_list is None:
        if value_count["max"] > 0:
            object.__setattr__(self, "is_list", True)
        else:
            object.__setattr__(self, "is_list", False)

    if self.is_ragged is None:
        if value_count["max"] > value_count["min"]:
            object.__setattr__(self, "is_ragged", True)
        else:
            object.__setattr__(self, "is_ragged", False)

    if self.is_ragged and not self.is_list:
        raise ValueError(
            "`is_ragged` is set to `True` but `is_list` is not. "
            "Only list columns can set the `is_ragged` flag."
        )

    if self.is_ragged and value_count["max"] == value_count["min"]:
      raise ValueError(
            "`is_ragged` is set to `True` but `value_count.min` == `value_count.max`. "
            "If value_count min/max are equal. "
            "This is a fixed size list and `is_ragged` should be set to False. "
        )

E ValueError: is_ragged is set to True but value_count.min == value_count.max. When the min and max value counts are equal, the column is a fixed-size list, so is_ragged should be set to False.

merlin/schema/schema.py:99: ValueError
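
All of the test_string_datatypes failures follow the same path: `infer_schema` passes `is_ragged=is_list` explicitly while `properties` carries no `value_count`, so `__post_init__` falls back to the `{"min": 0, "max": 0}` default and the `min == max` guard fires even though no real value counts were ever supplied. A minimal repro sketch (assuming `ColumnSchema` is importable from `merlin.schema`, as the tests do):

```python
import numpy as np
from merlin.schema import ColumnSchema

# Mirrors the call made in merlin/io/dataset.py's infer_schema:
# is_ragged is passed explicitly, but properties has no value_count,
# so __post_init__ sees the {"min": 0, "max": 0} default and raises
# the "fixed-size list" ValueError shown above.
ColumnSchema("column", dtype=np.float64, is_list=True, is_ragged=True)
```

Restricting the guard to `value_count["max"] > 0` (i.e. only firing when a real value count is present) would be one way to keep the validation without rejecting this call.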
______________________ test_dtype_column_schema[float32] _______________________

d_types = <class 'numpy.float32'>

@pytest.mark.parametrize("d_types", [numpy.float32, numpy.float64, numpy.uint32, numpy.uint64])
def test_dtype_column_schema(d_types):
  column = ColumnSchema("name", tags=[], properties=[], dtype=d_types)

tests/unit/schema/test_column_schemas.py:26:


<string>:9: in __init__
???


self = ColumnSchema(name='name', tags=set(), properties=[], dtype=dtype('float32'), is_list=None, is_ragged=None)

def __post_init__(self):
    """Standardize tags and dtypes on initialization

    Raises:
        TypeError: If the provided dtype cannot be cast to a numpy dtype
    """
    tags = TagSet(self.tags)
    object.__setattr__(self, "tags", tags)

    try:
        if hasattr(self.dtype, "numpy_dtype"):
            dtype = np.dtype(self.dtype.numpy_dtype)
        elif hasattr(self.dtype, "_categories"):
            dtype = self.dtype._categories.dtype
        elif isinstance(self.dtype, pd.StringDtype):
            dtype = np.dtype("O")
        else:
            dtype = np.dtype(self.dtype)
    except TypeError as err:
        raise TypeError(
            f"Unsupported dtype {self.dtype}, unable to cast {self.dtype} to a numpy dtype."
        ) from err

    object.__setattr__(self, "dtype", dtype)
  value_count = self.properties.get("value_count", {"min": 0, "max": 0})

E AttributeError: 'list' object has no attribute 'get'

merlin/schema/schema.py:78: AttributeError
______________________ test_dtype_column_schema[float64] _______________________

d_types = <class 'numpy.float64'>

@pytest.mark.parametrize("d_types", [numpy.float32, numpy.float64, numpy.uint32, numpy.uint64])
def test_dtype_column_schema(d_types):
  column = ColumnSchema("name", tags=[], properties=[], dtype=d_types)

tests/unit/schema/test_column_schemas.py:26:


<string>:9: in __init__
???


self = ColumnSchema(name='name', tags=set(), properties=[], dtype=dtype('float64'), is_list=None, is_ragged=None)

def __post_init__(self):
    """Standardize tags and dtypes on initialization

    Raises:
        TypeError: If the provided dtype cannot be cast to a numpy dtype
    """
    tags = TagSet(self.tags)
    object.__setattr__(self, "tags", tags)

    try:
        if hasattr(self.dtype, "numpy_dtype"):
            dtype = np.dtype(self.dtype.numpy_dtype)
        elif hasattr(self.dtype, "_categories"):
            dtype = self.dtype._categories.dtype
        elif isinstance(self.dtype, pd.StringDtype):
            dtype = np.dtype("O")
        else:
            dtype = np.dtype(self.dtype)
    except TypeError as err:
        raise TypeError(
            f"Unsupported dtype {self.dtype}, unable to cast {self.dtype} to a numpy dtype."
        ) from err

    object.__setattr__(self, "dtype", dtype)
  value_count = self.properties.get("value_count", {"min": 0, "max": 0})

E AttributeError: 'list' object has no attribute 'get'

merlin/schema/schema.py:78: AttributeError
_______________________ test_dtype_column_schema[uint32] _______________________

d_types = <class 'numpy.uint32'>

@pytest.mark.parametrize("d_types", [numpy.float32, numpy.float64, numpy.uint32, numpy.uint64])
def test_dtype_column_schema(d_types):
  column = ColumnSchema("name", tags=[], properties=[], dtype=d_types)

tests/unit/schema/test_column_schemas.py:26:


<string>:9: in __init__
???


self = ColumnSchema(name='name', tags=set(), properties=[], dtype=dtype('uint32'), is_list=None, is_ragged=None)

def __post_init__(self):
    """Standardize tags and dtypes on initialization

    Raises:
        TypeError: If the provided dtype cannot be cast to a numpy dtype
    """
    tags = TagSet(self.tags)
    object.__setattr__(self, "tags", tags)

    try:
        if hasattr(self.dtype, "numpy_dtype"):
            dtype = np.dtype(self.dtype.numpy_dtype)
        elif hasattr(self.dtype, "_categories"):
            dtype = self.dtype._categories.dtype
        elif isinstance(self.dtype, pd.StringDtype):
            dtype = np.dtype("O")
        else:
            dtype = np.dtype(self.dtype)
    except TypeError as err:
        raise TypeError(
            f"Unsupported dtype {self.dtype}, unable to cast {self.dtype} to a numpy dtype."
        ) from err

    object.__setattr__(self, "dtype", dtype)
  value_count = self.properties.get("value_count", {"min": 0, "max": 0})

E AttributeError: 'list' object has no attribute 'get'

merlin/schema/schema.py:78: AttributeError
_______________________ test_dtype_column_schema[uint64] _______________________

d_types = <class 'numpy.uint64'>

@pytest.mark.parametrize("d_types", [numpy.float32, numpy.float64, numpy.uint32, numpy.uint64])
def test_dtype_column_schema(d_types):
  column = ColumnSchema("name", tags=[], properties=[], dtype=d_types)

tests/unit/schema/test_column_schemas.py:26:


<string>:9: in __init__
???


self = ColumnSchema(name='name', tags=set(), properties=[], dtype=dtype('uint64'), is_list=None, is_ragged=None)

def __post_init__(self):
    """Standardize tags and dtypes on initialization

    Raises:
        TypeError: If the provided dtype cannot be cast to a numpy dtype
    """
    tags = TagSet(self.tags)
    object.__setattr__(self, "tags", tags)

    try:
        if hasattr(self.dtype, "numpy_dtype"):
            dtype = np.dtype(self.dtype.numpy_dtype)
        elif hasattr(self.dtype, "_categories"):
            dtype = self.dtype._categories.dtype
        elif isinstance(self.dtype, pd.StringDtype):
            dtype = np.dtype("O")
        else:
            dtype = np.dtype(self.dtype)
    except TypeError as err:
        raise TypeError(
            f"Unsupported dtype {self.dtype}, unable to cast {self.dtype} to a numpy dtype."
        ) from err

    object.__setattr__(self, "dtype", dtype)
  value_count = self.properties.get("value_count", {"min": 0, "max": 0})

E AttributeError: 'list' object has no attribute 'get'

merlin/schema/schema.py:78: AttributeError
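
The four test_dtype_column_schema failures come from passing `properties=[]` (a list) where `__post_init__` assumes a dict, so the `.get` lookup blows up. One defensive option, sketched here rather than taken from the merged change, is to normalize `properties` before reading `value_count`; `_normalize_properties` below is a hypothetical helper name:

```python
from typing import Any, Dict

def _normalize_properties(properties: Any) -> Dict[str, Any]:
    """Hypothetical helper: accept only mappings for `properties`;
    anything else (e.g. the empty list passed by the failing tests)
    is treated as "no properties" instead of crashing on .get()."""
    return dict(properties) if isinstance(properties, dict) else {}

# The value_count lookup then succeeds regardless of the input shape:
value_count = _normalize_properties([]).get("value_count", {"min": 0, "max": 0})
assert value_count == {"min": 0, "max": 0}
```

Raising a clear TypeError for non-mapping inputs would be an equally reasonable choice; the point is that the failure should not surface as an AttributeError deep inside `__post_init__`.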
_________________________ test_list_column_attributes __________________________

def test_list_column_attributes():
    col0_schema = ColumnSchema("col0")

    assert not col0_schema.is_list
    assert not col0_schema.is_ragged
    assert col0_schema.quantity == ColumnQuantity.SCALAR

    col1_schema = ColumnSchema("col1", is_list=False, is_ragged=False)

    assert not col1_schema.is_list
    assert not col1_schema.is_ragged
    assert col1_schema.quantity == ColumnQuantity.SCALAR

    col2_schema = ColumnSchema("col2", is_list=True)

    assert col2_schema.is_list
  assert col2_schema.is_ragged

E AssertionError: assert False
E + where False = ColumnSchema(name='col2', tags=set(), properties={}, dtype=dtype('float64'), is_list=True, is_ragged=False).is_ragged

tests/unit/schema/test_column_schemas.py:183: AssertionError
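
test_list_column_attributes expects `ColumnSchema("col2", is_list=True)` to default `is_ragged` to `True`, but the inference shown above derives `is_ragged` from `value_count` alone, and the default `value_count` has equal min/max. An illustrative reordering that satisfies the test (a sketch, not necessarily the merged logic):

```python
def infer_is_ragged(is_list: bool, value_count: dict) -> bool:
    """Sketch: default is_ragged from is_list as well as value_count.
    A list column is assumed ragged unless value_count pins it to a
    fixed size (min == max > 0); scalar columns are never ragged."""
    if value_count["max"] > 0 and value_count["min"] == value_count["max"]:
        return False  # fixed-size list
    return bool(is_list)

assert infer_is_ragged(True, {"min": 0, "max": 0}) is True   # col2's case
assert infer_is_ragged(True, {"min": 2, "max": 2}) is False  # fixed-size list
assert infer_is_ragged(False, {"min": 0, "max": 0}) is False # scalar
```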
______________________ test_tensorflow_metadata_from_json ______________________

def test_tensorflow_metadata_from_json():
    # make sure we can load up tensorflowmetadata serialized json objects, like done by
    # merlin-models
    json_schema = """{"feature": [
    {
      "name": "categories",
      "valueCount": {
        "min": "1",
        "max": "4"
      },
      "type": "INT",
      "intDomain": {
        "name": "categories",
        "min": "1",
        "max": "331",
        "isCategorical": true
      },
      "annotation": {
        "tag": [
          "item"
        ]
      }
    }]}
    """

    schema = TensorflowMetadata.from_json(json_schema).to_merlin_schema()
    column_schema = schema.column_schemas["categories"]

    # make sure the value_count is set appropriately
    assert column_schema.properties["value_count"] == {"min": 1, "max": 4}
  assert column_schema.is_list

E AssertionError: assert False
E + where False = ColumnSchema(name='categories', tags={<Tags.ITEM: 'item'>, <Tags.CATEGORICAL: 'categorical'>}, properties={'domain': {...331, 'name': 'categories'}, 'value_count': {'min': 1, 'max': 4}}, dtype=dtype('int64'), is_list=False, is_ragged=False).is_list

tests/unit/schema/test_schema_io.py:218: AssertionError
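
test_tensorflow_metadata_from_json shows the deserialization side of the same issue: `value_count` round-trips correctly as `{"min": 1, "max": 4}`, but `is_list`/`is_ragged` come back `False` because they are not re-derived from the restored property. Applying the inference rules already present in `__post_init__` to the deserialized `value_count` yields the flags the test expects:

```python
# Values taken from the failing test's value_count; the comparisons are
# the same ones __post_init__ uses when is_list/is_ragged are None.
value_count = {"min": 1, "max": 4}
is_list = value_count["max"] > 0                      # -> True
is_ragged = value_count["max"] > value_count["min"]   # -> True
assert (is_list, is_ragged) == (True, True)
```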
=============================== warnings summary ===============================
tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 6 files. Did not have enough partitions to create 7 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 8 files. Did not have enough partitions to create 9 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 8 files. Did not have enough partitions to create 10 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 8 files. Did not have enough partitions to create 11 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 12 files. Did not have enough partitions to create 13 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 12 files. Did not have enough partitions to create 14 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 12 files. Did not have enough partitions to create 15 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 12 files. Did not have enough partitions to create 16 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 12 files. Did not have enough partitions to create 17 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 12 files. Did not have enough partitions to create 18 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 12 files. Did not have enough partitions to create 19 files.
warnings.warn(

tests/unit/io/test_io.py::test_validate_and_regenerate_dataset
/var/jenkins_home/workspace/merlin_core/core/merlin/io/parquet.py:551: DeprecationWarning: 'ParquetDataset.pieces' attribute is deprecated as of pyarrow 5.0.0 and will be removed in a future version. Specify 'use_legacy_dataset=False' while constructing the ParquetDataset, and then use the '.fragments' attribute instead.
paths = [p.path for p in pa_dataset.pieces]

tests/unit/schema/test_column_schemas.py::test_column_schema_tags_normalize
tests/unit/schema/test_tags.py::test_tagset_atomizes_compound_tags
/var/jenkins_home/workspace/merlin_core/core/merlin/schema/tags.py:148: UserWarning: Compound tags like Tags.ITEM_ID have been deprecated and will be removed in a future version. Please use the atomic versions of these tags, like [<Tags.ITEM: 'item'>, <Tags.ID: 'id'>].
warnings.warn(

tests/unit/schema/test_schema_io.py::test_json_serialization_with_embedded_dicts
tests/unit/schema/test_schema_io.py::test_merlin_to_proto_to_json_to_merlin
tests/unit/schema/test_tags.py::test_tagset_atomizes_compound_tags
/var/jenkins_home/workspace/merlin_core/core/merlin/schema/tags.py:148: UserWarning: Compound tags like Tags.USER_ID have been deprecated and will be removed in a future version. Please use the atomic versions of these tags, like [<Tags.USER: 'user'>, <Tags.ID: 'id'>].
warnings.warn(

tests/unit/schema/test_tags.py::test_tagset_atomizes_compound_tags
/var/jenkins_home/workspace/merlin_core/core/merlin/schema/tags.py:148: UserWarning: Compound tags like Tags.SESSION_ID have been deprecated and will be removed in a future version. Please use the atomic versions of these tags, like [<Tags.SESSION: 'session'>, <Tags.ID: 'id'>].
warnings.warn(

tests/unit/schema/test_tags.py::test_tagset_atomizes_compound_tags
/var/jenkins_home/workspace/merlin_core/core/merlin/schema/tags.py:148: UserWarning: Compound tags like Tags.TEXT_TOKENIZED have been deprecated and will be removed in a future version. Please use the atomic versions of these tags, like [<Tags.TEXT: 'text'>, <Tags.TOKENIZED: 'tokenized'>].
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed[True-True]
/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 41273 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed[True-False]
/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 40839 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed[False-True]
/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 35229 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed[False-False]
/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 46001 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed_force[True]
/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 45573 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed_force[False]
/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 45757 instead
warnings.warn(

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html

---------- coverage: platform linux, python 3.8.10-final-0 -----------
Name Stmts Miss Cover

merlin/core/__init__.py 2 0 100%
merlin/core/_version.py 354 205 42%
merlin/core/compat.py 10 4 60%
merlin/core/dispatch.py 356 212 40%
merlin/core/protocols.py 99 45 55%
merlin/core/utils.py 195 56 71%
merlin/dag/__init__.py 5 0 100%
merlin/dag/base_operator.py 122 20 84%
merlin/dag/dictarray.py 55 15 73%
merlin/dag/executors.py 141 68 52%
merlin/dag/graph.py 99 35 65%
merlin/dag/node.py 344 161 53%
merlin/dag/ops/__init__.py 4 0 100%
merlin/dag/ops/concat_columns.py 17 4 76%
merlin/dag/ops/selection.py 22 0 100%
merlin/dag/ops/subset_columns.py 12 4 67%
merlin/dag/ops/subtraction.py 21 11 48%
merlin/dag/selector.py 101 6 94%
merlin/io/__init__.py 4 0 100%
merlin/io/avro.py 88 88 0%
merlin/io/csv.py 57 6 89%
merlin/io/dask.py 181 53 71%
merlin/io/dataframe_engine.py 61 5 92%
merlin/io/dataframe_iter.py 21 1 95%
merlin/io/dataset.py 347 54 84%
merlin/io/dataset_engine.py 37 8 78%
merlin/io/fsspec_utils.py 127 108 15%
merlin/io/hugectr.py 45 35 22%
merlin/io/parquet.py 603 69 89%
merlin/io/shuffle.py 38 12 68%
merlin/io/worker.py 80 66 18%
merlin/io/writer.py 190 52 73%
merlin/io/writer_factory.py 18 4 78%
merlin/schema/__init__.py 2 0 100%
merlin/schema/io/__init__.py 0 0 100%
merlin/schema/io/proto_utils.py 20 4 80%
merlin/schema/io/schema_bp.py 306 5 98%
merlin/schema/io/tensorflow_metadata.py 183 17 91%
merlin/schema/schema.py 218 36 83%
merlin/schema/tags.py 82 0 100%

TOTAL 4667 1469 69%

=========================== short test summary info ============================
SKIPPED [1] tests/unit/io/test_avro.py:34: could not import 'uavro': No module named 'uavro'
====== 12 failed, 379 passed, 1 skipped, 25 warnings in 65.28s (0:01:05) =======
ERROR: InvocationError for command /var/jenkins_home/workspace/merlin_core/core/.tox/test-gpu/bin/python -m pytest --cov-report term --cov merlin -rxs tests/unit (exited with code 1)
___________________________________ summary ____________________________________
ERROR: test-gpu: commands failed
Build step 'Execute shell' marked build as failure
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA-Merlin/core/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[merlin_core] $ /bin/bash /tmp/jenkins18216793390002291021.sh

It would be nice if the pre-commit hook versions matched the versions of the same tools used in CI.
@nvidia-merlin-bot

Click to view CI Results
GitHub pull request #111 of commit 6610d7f31c395fe2e223ea6748ff1432066f3c14, no merge conflicts.
Running as SYSTEM
Setting status of 6610d7f31c395fe2e223ea6748ff1432066f3c14 to PENDING with url http://merlin-infra1.nvidia.com:8080/job/merlin_core/292/ and message: 'Pending'
Using context: Jenkins
Building on master in workspace /var/jenkins_home/workspace/merlin_core
using credential ce87ff3c-94f0-400a-8303-cb4acb4918b5
 > git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > git config remote.origin.url /~https://github.com/NVIDIA-Merlin/core # timeout=10
Fetching upstream changes from /~https://github.com/NVIDIA-Merlin/core
 > git --version # timeout=10
using GIT_ASKPASS to set credentials login for merlin-systems username and pass
 > git fetch --tags --force --progress -- /~https://github.com/NVIDIA-Merlin/core +refs/pull/111/*:refs/remotes/origin/pr/111/* # timeout=10
 > git rev-parse 6610d7f31c395fe2e223ea6748ff1432066f3c14^{commit} # timeout=10
Checking out Revision 6610d7f31c395fe2e223ea6748ff1432066f3c14 (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 6610d7f31c395fe2e223ea6748ff1432066f3c14 # timeout=10
Commit message: "Update formatting."
 > git rev-list --no-walk d5a0ca7d749b703b3ae3b21887c380b87131b060 # timeout=10
[merlin_core] $ /bin/bash /tmp/jenkins13047363149763558308.sh
GLOB sdist-make: /var/jenkins_home/workspace/merlin_core/core/setup.py
test-gpu recreate: /var/jenkins_home/workspace/merlin_core/core/.tox/test-gpu
test-gpu installdeps: pytest, pytest-cov
WARNING: Discarding $PYTHONPATH from environment, to override specify PYTHONPATH in 'passenv' in your configuration.
test-gpu inst: /var/jenkins_home/workspace/merlin_core/core/.tox/.tmp/package/1/merlin-core-0.8.0+20.g6610d7f.zip
WARNING: Discarding $PYTHONPATH from environment, to override specify PYTHONPATH in 'passenv' in your configuration.
test-gpu installed: absl-py==1.2.0,aiohttp==3.8.1,aiosignal==1.2.0,alabaster==0.7.12,alembic==1.8.1,anyio==3.6.1,argon2-cffi==21.3.0,argon2-cffi-bindings==21.2.0,astroid==2.5.6,asttokens==2.0.8,astunparse==1.6.3,asv==0.5.1,asvdb==0.4.2,async-timeout==4.0.2,attrs==22.1.0,autopage==0.5.1,awscli==1.27.12,Babel==2.10.3,backcall==0.2.0,beautifulsoup4==4.11.1,betterproto==1.2.5,black==22.6.0,bleach==5.0.1,boto3==1.24.75,botocore==1.29.12,Brotli==1.0.9,cachetools==5.2.0,certifi==2019.11.28,cffi==1.15.1,chardet==3.0.4,charset-normalizer==2.1.1,clang==5.0,click==8.1.3,cliff==4.0.0,cloudpickle==2.2.0,cmaes==0.9.0,cmake==3.24.1.1,cmd2==2.4.2,colorama==0.4.4,colorlog==6.7.0,contourpy==1.0.5,coverage==6.5.0,cuda-python==11.7.1,cupy-cuda117==10.6.0,cycler==0.11.0,Cython==0.29.32,dask==2022.1.1,dbus-python==1.2.16,debugpy==1.6.3,decorator==5.1.1,defusedxml==0.7.1,dill==0.3.5.1,distlib==0.3.6,distributed==2022.5.1,distro==1.7.0,dm-tree==0.1.6,docker-pycreds==0.4.0,docutils==0.16,emoji==1.7.0,entrypoints==0.4,execnet==1.9.0,executing==1.0.0,faiss==1.7.2,faiss-gpu==1.7.2,fastai==2.7.9,fastapi==0.85.0,fastavro==1.6.1,fastcore==1.5.27,fastdownload==0.0.7,fastjsonschema==2.16.1,fastprogress==1.0.3,fastrlock==0.8,feast==0.19.4,fiddle==0.2.2,filelock==3.8.0,flatbuffers==1.12,fonttools==4.37.3,frozenlist==1.3.1,fsspec==2022.5.0,gast==0.4.0,gevent==21.12.0,geventhttpclient==2.0.2,gitdb==4.0.9,GitPython==3.1.27,google==3.0.0,google-api-core==2.10.1,google-auth==2.11.1,google-auth-oauthlib==0.4.6,google-pasta==0.2.0,googleapis-common-protos==1.52.0,graphviz==0.20.1,greenlet==1.1.3,grpcio==1.41.0,grpcio-channelz==1.49.0,grpcio-reflection==1.48.1,grpclib==0.4.3,h11==0.13.0,h2==4.1.0,h5py==3.7.0,HeapDict==1.0.1,horovod==0.26.1,hpack==4.0.0,httptools==0.5.0,hugectr2onnx==0.0.0,huggingface-hub==0.9.1,hyperframe==6.0.1,idna==2.8,imagesize==1.4.1,implicit==0.6.1,importlib-metadata==4.12.0,importlib-resources==5.9.0,iniconfig==1.1.1,ipykernel==6.15.3,ipython==8.5.0,ipython-genutils==0.2.0,ipywidgets==7.7.0,jedi==0.18.1,Jinja2==3.1.2,jmespath==1.0.1,joblib==1.2.0,json5==0.9.10,jsonschema==4.16.0,jupyter-cache==0.4.3,jupyter-core==4.11.1,jupyter-server==1.18.1,jupyter-server-mathjax==0.2.5,jupyter-sphinx==0.3.2,jupyter_client==7.3.5,jupyterlab==3.4.7,jupyterlab-pygments==0.2.2,jupyterlab-widgets==1.1.0,jupyterlab_server==2.15.1,keras==2.9.0,Keras-Preprocessing==1.1.2,kiwisolver==1.4.4,lazy-object-proxy==1.8.0,libclang==14.0.6,libcst==0.4.7,lightfm==1.16,lightgbm==3.3.2,linkify-it-py==1.0.3,llvmlite==0.39.1,locket==1.0.0,lxml==4.9.1,Mako==1.2.4,Markdown==3.4.1,markdown-it-py==1.1.0,MarkupSafe==2.1.1,matplotlib==3.6.0,matplotlib-inline==0.1.6,mdit-py-plugins==0.2.8,merlin-core==0.8.0+20.g6610d7f,merlin-models==0.7.0+11.g280956aa4,merlin-systems==0.5.0+4.g15074ad,mistune==2.0.4,mmh3==3.0.0,mpi4py==3.1.3,msgpack==1.0.4,multidict==6.0.2,mypy-extensions==0.4.3,myst-nb==0.13.2,myst-parser==0.15.2,natsort==8.1.0,nbclassic==0.4.3,nbclient==0.6.8,nbconvert==7.0.0,nbdime==3.1.1,nbformat==5.5.0,nest-asyncio==1.5.5,ninja==1.10.2.3,notebook==6.4.12,notebook-shim==0.1.0,numba==0.56.2,numpy==1.22.4,nvidia-pyindex==1.0.9,# Editable install with no version control (nvtabular==1.4.0+8.g95e12d347),-e 
/usr/local/lib/python3.8/dist-packages,nvtx==0.2.5,oauthlib==3.2.1,oldest-supported-numpy==2022.8.16,onnx==1.12.0,onnxruntime==1.11.1,opt-einsum==3.3.0,optuna==3.0.3,packaging==21.3,pandas==1.3.5,pandavro==1.5.2,pandocfilters==1.5.0,parso==0.8.3,partd==1.3.0,pathtools==0.1.2,pbr==5.11.0,pexpect==4.8.0,pickleshare==0.7.5,Pillow==9.2.0,pkgutil_resolve_name==1.3.10,platformdirs==2.5.2,plotly==5.11.0,pluggy==1.0.0,prettytable==3.5.0,prometheus-client==0.14.1,promise==2.3,prompt-toolkit==3.0.31,proto-plus==1.19.6,protobuf==3.19.5,psutil==5.9.2,ptyprocess==0.7.0,pure-eval==0.2.2,py==1.11.0,pyarrow==7.0.0,pyasn1==0.4.8,pyasn1-modules==0.2.8,pybind11==2.10.0,pycparser==2.21,pydantic==1.10.2,pydot==1.4.2,Pygments==2.13.0,PyGObject==3.36.0,pynvml==11.4.1,pyparsing==3.0.9,pyperclip==1.8.2,pyrsistent==0.18.1,pytest==7.1.3,pytest-cov==4.0.0,pytest-xdist==3.0.2,python-apt==2.0.0+ubuntu0.20.4.8,python-dateutil==2.8.2,python-dotenv==0.21.0,python-rapidjson==1.8,pytz==2022.2.1,PyYAML==5.4.1,pyzmq==24.0.0,regex==2022.9.13,requests==2.22.0,requests-oauthlib==1.3.1,requests-unixsocket==0.2.0,rsa==4.7.2,s3fs==2022.2.0,s3transfer==0.6.0,sacremoses==0.0.53,scikit-build==0.15.0,scikit-learn==1.1.2,scipy==1.8.1,seedir==0.3.0,Send2Trash==1.8.0,sentry-sdk==1.9.8,setproctitle==1.3.2,setuptools-scm==7.0.5,shortuuid==1.0.9,six==1.15.0,sklearn==0.0,smmap==5.0.0,sniffio==1.3.0,snowballstemmer==2.2.0,sortedcontainers==2.4.0,soupsieve==2.3.2.post1,Sphinx==5.3.0,sphinx-multiversion==0.2.4,sphinx-togglebutton==0.3.1,sphinx_external_toc==0.3.0,sphinxcontrib-applehelp==1.0.2,sphinxcontrib-copydirs @ git+/~https://github.com/mikemckiernan/sphinxcontrib-copydirs.git@bd8c5d79b3f91cf5f1bb0d6995aeca3fe84b670e,sphinxcontrib-devhelp==1.0.2,sphinxcontrib-htmlhelp==2.0.0,sphinxcontrib-jsmath==1.0.1,sphinxcontrib-qthelp==1.0.3,sphinxcontrib-serializinghtml==1.1.5,SQLAlchemy==1.4.44,stack-data==0.5.0,starlette==0.20.4,stevedore==4.1.1,stringcase==1.2.0,supervisor==4.1.0,tabulate==0.8.10,tblib==1.7.0,tdqm==0.0.1,tenacity==8.0.1,tensorboard==2.9.1,tensorboard-data-server==0.6.1,tensorboard-plugin-wit==1.8.1,tensorflow==2.9.2,tensorflow-estimator==2.9.0,tensorflow-gpu==2.9.2,tensorflow-io-gcs-filesystem==0.27.0,tensorflow-metadata==1.10.0,termcolor==2.0.1,terminado==0.15.0,testbook==0.4.2,threadpoolctl==3.1.0,tinycss2==1.1.1,tokenizers==0.10.3,toml==0.10.2,tomli==2.0.1,toolz==0.12.0,torch==1.12.1+cu113,torchmetrics==0.3.2,tornado==6.2,tox==3.26.0,tqdm==4.64.1,traitlets==5.4.0,transformers==4.12.0,transformers4rec==0.1.12+2.gbcc939255,treelite==2.3.0,treelite-runtime==2.3.0,tritonclient==2.25.0,typing-inspect==0.8.0,typing_extensions==4.3.0,uc-micro-py==1.0.1,urllib3==1.26.12,uvicorn==0.18.3,uvloop==0.17.0,versioneer==0.20,virtualenv==20.16.5,wandb==0.13.3,watchfiles==0.17.0,wcwidth==0.2.5,webencodings==0.5.1,websocket-client==1.4.1,websockets==10.3,Werkzeug==2.2.2,widgetsnbextension==3.6.0,wrapt==1.12.1,xgboost==1.6.2,yarl==1.8.1,zict==2.2.0,zipp==3.8.1,zope.event==4.5.0,zope.interface==5.4.0
test-gpu run-test-pre: PYTHONHASHSEED='2747429696'
test-gpu run-test: commands[0] | python -m pytest --cov-report term --cov merlin -rxs tests/unit
============================= test session starts ==============================
platform linux -- Python 3.8.10, pytest-7.1.3, pluggy-1.0.0
cachedir: .tox/test-gpu/.pytest_cache
rootdir: /var/jenkins_home/workspace/merlin_core/core, configfile: pyproject.toml
plugins: anyio-3.5.0, xdist-3.0.2, cov-4.0.0
collected 391 items / 1 skipped

tests/unit/core/test_dispatch.py .... [ 1%]
tests/unit/core/test_protocols.py ......... [ 3%]
tests/unit/core/test_version.py . [ 3%]
tests/unit/dag/test_base_operator.py .... [ 4%]
tests/unit/dag/test_column_selector.py .............................. [ 12%]
tests/unit/dag/test_dictarray.py ... [ 13%]
tests/unit/dag/test_executors.py .. [ 13%]
tests/unit/dag/test_graph.py .... [ 14%]
tests/unit/dag/ops/test_selection.py .... [ 15%]
tests/unit/io/test_io.py ...FFFFFF...................................... [ 27%]
................................................................ [ 43%]
tests/unit/schema/test_column_schemas.py FFFF..................F. [ 50%]
tests/unit/schema/test_schema.py ............. [ 53%]
tests/unit/schema/test_schema_io.py .................................... [ 62%]
........................................................................ [ 81%]
..........................................................F [ 96%]
tests/unit/schema/test_tags.py ....... [ 97%]
tests/unit/utils/test_utils.py ........ [100%]

=================================== FAILURES ===================================
_______________________ test_string_datatypes[None-csv] ________________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-4/test_string_datatypes_None_csv0')
engine = 'csv', cpu = None

@pytest.mark.parametrize("engine", ["csv", "parquet", "csv-no-header"])
@pytest.mark.parametrize("cpu", [None, True])
def test_string_datatypes(tmpdir, engine, cpu):
    df_lib = dispatch.get_lib()
    df = df_lib.DataFrame({"column": [[0.1, 0.2]]})
  dataset = merlin.io.Dataset(df)

tests/unit/io/test_io.py:89:


merlin/io/dataset.py:354: in __init__
self.infer_schema()
merlin/io/dataset.py:1140: in infer_schema
col_schema = ColumnSchema(column, dtype=dtype_val, is_list=is_list, is_ragged=is_list)
<string>:9: in __init__
???


self = ColumnSchema(name='column', tags=set(), properties={}, dtype=dtype('float64'), is_list=True, is_ragged=True)

def __post_init__(self):
    """Standardize tags and dtypes on initialization

    Raises:
        TypeError: If the provided dtype cannot be cast to a numpy dtype
    """
    tags = TagSet(self.tags)
    object.__setattr__(self, "tags", tags)

    try:
        if hasattr(self.dtype, "numpy_dtype"):
            dtype = np.dtype(self.dtype.numpy_dtype)
        elif hasattr(self.dtype, "_categories"):
            dtype = self.dtype._categories.dtype
        elif isinstance(self.dtype, pd.StringDtype):
            dtype = np.dtype("O")
        else:
            dtype = np.dtype(self.dtype)
    except TypeError as err:
        raise TypeError(
            f"Unsupported dtype {self.dtype}, unable to cast {self.dtype} to a numpy dtype."
        ) from err

    object.__setattr__(self, "dtype", dtype)

    value_count = self.properties.get("value_count", {"min": 0, "max": 0})

    if self.is_list is None:
        if value_count["max"] > 0:
            object.__setattr__(self, "is_list", True)
        else:
            object.__setattr__(self, "is_list", False)

    if self.is_ragged is None:
        if value_count["max"] > value_count["min"]:
            object.__setattr__(self, "is_ragged", True)
        else:
            object.__setattr__(self, "is_ragged", False)

    if self.is_ragged and not self.is_list:
        raise ValueError(
            "`is_ragged` is set to `True` but `is_list` is not. "
            "Only list columns can set the `is_ragged` flag."
        )

    if self.is_ragged and value_count["max"] == value_count["min"]:
      raise ValueError(
            "`is_ragged` is set to `True` but `value_count.min` == `value_count.max`. "
            "If value_count min/max are equal. "
            "This is a fixed size list and `is_ragged` should be set to False. "
        )

E ValueError: is_ragged is set to True but value_count.min == value_count.max. When the min and max value counts are equal, the column is a fixed-size list, so is_ragged should be set to False.

merlin/schema/schema.py:99: ValueError
_____________________ test_string_datatypes[None-parquet] ______________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-4/test_string_datatypes_None_par0')
engine = 'parquet', cpu = None

@pytest.mark.parametrize("engine", ["csv", "parquet", "csv-no-header"])
@pytest.mark.parametrize("cpu", [None, True])
def test_string_datatypes(tmpdir, engine, cpu):
    df_lib = dispatch.get_lib()
    df = df_lib.DataFrame({"column": [[0.1, 0.2]]})
  dataset = merlin.io.Dataset(df)

tests/unit/io/test_io.py:89:


merlin/io/dataset.py:354: in __init__
self.infer_schema()
merlin/io/dataset.py:1140: in infer_schema
col_schema = ColumnSchema(column, dtype=dtype_val, is_list=is_list, is_ragged=is_list)
<string>:9: in __init__
???


self = ColumnSchema(name='column', tags=set(), properties={}, dtype=dtype('float64'), is_list=True, is_ragged=True)

def __post_init__(self):
    """Standardize tags and dtypes on initialization

    Raises:
        TypeError: If the provided dtype cannot be cast to a numpy dtype
    """
    tags = TagSet(self.tags)
    object.__setattr__(self, "tags", tags)

    try:
        if hasattr(self.dtype, "numpy_dtype"):
            dtype = np.dtype(self.dtype.numpy_dtype)
        elif hasattr(self.dtype, "_categories"):
            dtype = self.dtype._categories.dtype
        elif isinstance(self.dtype, pd.StringDtype):
            dtype = np.dtype("O")
        else:
            dtype = np.dtype(self.dtype)
    except TypeError as err:
        raise TypeError(
            f"Unsupported dtype {self.dtype}, unable to cast {self.dtype} to a numpy dtype."
        ) from err

    object.__setattr__(self, "dtype", dtype)

    value_count = self.properties.get("value_count", {"min": 0, "max": 0})

    if self.is_list is None:
        if value_count["max"] > 0:
            object.__setattr__(self, "is_list", True)
        else:
            object.__setattr__(self, "is_list", False)

    if self.is_ragged is None:
        if value_count["max"] > value_count["min"]:
            object.__setattr__(self, "is_ragged", True)
        else:
            object.__setattr__(self, "is_ragged", False)

    if self.is_ragged and not self.is_list:
        raise ValueError(
            "`is_ragged` is set to `True` but `is_list` is not. "
            "Only list columns can set the `is_ragged` flag."
        )

    if self.is_ragged and value_count["max"] == value_count["min"]:
      raise ValueError(
            "`is_ragged` is set to `True` but `value_count.min` == `value_count.max`. "
            "If value_count min/max are equal. "
            "This is a fixed size list and `is_ragged` should be set to False. "
        )

E ValueError: is_ragged is set to True but value_count.min == value_count.max. When the min and max value counts are equal, the column is a fixed-size list, so is_ragged should be set to False.

merlin/schema/schema.py:99: ValueError
__________________ test_string_datatypes[None-csv-no-header] ___________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-4/test_string_datatypes_None_csv1')
engine = 'csv-no-header', cpu = None

@pytest.mark.parametrize("engine", ["csv", "parquet", "csv-no-header"])
@pytest.mark.parametrize("cpu", [None, True])
def test_string_datatypes(tmpdir, engine, cpu):
    df_lib = dispatch.get_lib()
    df = df_lib.DataFrame({"column": [[0.1, 0.2]]})
  dataset = merlin.io.Dataset(df)

tests/unit/io/test_io.py:89:


merlin/io/dataset.py:354: in __init__
self.infer_schema()
merlin/io/dataset.py:1140: in infer_schema
col_schema = ColumnSchema(column, dtype=dtype_val, is_list=is_list, is_ragged=is_list)
<string>:9: in __init__
???


self = ColumnSchema(name='column', tags=set(), properties={}, dtype=dtype('float64'), is_list=True, is_ragged=True)

def __post_init__(self):
    """Standardize tags and dtypes on initialization

    Raises:
        TypeError: If the provided dtype cannot be cast to a numpy dtype
    """
    tags = TagSet(self.tags)
    object.__setattr__(self, "tags", tags)

    try:
        if hasattr(self.dtype, "numpy_dtype"):
            dtype = np.dtype(self.dtype.numpy_dtype)
        elif hasattr(self.dtype, "_categories"):
            dtype = self.dtype._categories.dtype
        elif isinstance(self.dtype, pd.StringDtype):
            dtype = np.dtype("O")
        else:
            dtype = np.dtype(self.dtype)
    except TypeError as err:
        raise TypeError(
            f"Unsupported dtype {self.dtype}, unable to cast {self.dtype} to a numpy dtype."
        ) from err

    object.__setattr__(self, "dtype", dtype)

    value_count = self.properties.get("value_count", {"min": 0, "max": 0})

    if self.is_list is None:
        if value_count["max"] > 0:
            object.__setattr__(self, "is_list", True)
        else:
            object.__setattr__(self, "is_list", False)

    if self.is_ragged is None:
        if value_count["max"] > value_count["min"]:
            object.__setattr__(self, "is_ragged", True)
        else:
            object.__setattr__(self, "is_ragged", False)

    if self.is_ragged and not self.is_list:
        raise ValueError(
            "`is_ragged` is set to `True` but `is_list` is not. "
            "Only list columns can set the `is_ragged` flag."
        )

    if self.is_ragged and value_count["max"] == value_count["min"]:
      raise ValueError(
            "`is_ragged` is set to `True` but `value_count.min` == `value_count.max`. "
            "If value_count min/max are equal. "
            "This is a fixed size list and `is_ragged` should be set to False. "
        )

E ValueError: is_ragged is set to True but value_count.min == value_count.max. When the min and max value counts are equal, the column is a fixed-size list, so is_ragged should be set to False.

merlin/schema/schema.py:99: ValueError
_______________________ test_string_datatypes[True-csv] ________________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-4/test_string_datatypes_True_csv0')
engine = 'csv', cpu = True

@pytest.mark.parametrize("engine", ["csv", "parquet", "csv-no-header"])
@pytest.mark.parametrize("cpu", [None, True])
def test_string_datatypes(tmpdir, engine, cpu):
    df_lib = dispatch.get_lib()
    df = df_lib.DataFrame({"column": [[0.1, 0.2]]})
  dataset = merlin.io.Dataset(df)

tests/unit/io/test_io.py:89:


merlin/io/dataset.py:354: in __init__
self.infer_schema()
merlin/io/dataset.py:1140: in infer_schema
col_schema = ColumnSchema(column, dtype=dtype_val, is_list=is_list, is_ragged=is_list)
<string>:9: in __init__
???


self = ColumnSchema(name='column', tags=set(), properties={}, dtype=dtype('float64'), is_list=True, is_ragged=True)

def __post_init__(self):
    """Standardize tags and dtypes on initialization

    Raises:
        TypeError: If the provided dtype cannot be cast to a numpy dtype
    """
    tags = TagSet(self.tags)
    object.__setattr__(self, "tags", tags)

    try:
        if hasattr(self.dtype, "numpy_dtype"):
            dtype = np.dtype(self.dtype.numpy_dtype)
        elif hasattr(self.dtype, "_categories"):
            dtype = self.dtype._categories.dtype
        elif isinstance(self.dtype, pd.StringDtype):
            dtype = np.dtype("O")
        else:
            dtype = np.dtype(self.dtype)
    except TypeError as err:
        raise TypeError(
            f"Unsupported dtype {self.dtype}, unable to cast {self.dtype} to a numpy dtype."
        ) from err

    object.__setattr__(self, "dtype", dtype)

    value_count = self.properties.get("value_count", {"min": 0, "max": 0})

    if self.is_list is None:
        if value_count["max"] > 0:
            object.__setattr__(self, "is_list", True)
        else:
            object.__setattr__(self, "is_list", False)

    if self.is_ragged is None:
        if value_count["max"] > value_count["min"]:
            object.__setattr__(self, "is_ragged", True)
        else:
            object.__setattr__(self, "is_ragged", False)

    if self.is_ragged and not self.is_list:
        raise ValueError(
            "`is_ragged` is set to `True` but `is_list` is not. "
            "Only list columns can set the `is_ragged` flag."
        )

    if self.is_ragged and value_count["max"] == value_count["min"]:
      raise ValueError(
            "`is_ragged` is set to `True` but `value_count.min` == `value_count.max`. "
            "If value_count min/max are equal. "
            "This is a fixed size list and `is_ragged` should be set to False. "
        )

E ValueError: is_ragged is set to True but value_count.min == value_count.max. When the min and max value counts are equal, the column is a fixed-size list, so is_ragged should be set to False.

merlin/schema/schema.py:99: ValueError
_____________________ test_string_datatypes[True-parquet] ______________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-4/test_string_datatypes_True_par0')
engine = 'parquet', cpu = True

@pytest.mark.parametrize("engine", ["csv", "parquet", "csv-no-header"])
@pytest.mark.parametrize("cpu", [None, True])
def test_string_datatypes(tmpdir, engine, cpu):
    df_lib = dispatch.get_lib()
    df = df_lib.DataFrame({"column": [[0.1, 0.2]]})
  dataset = merlin.io.Dataset(df)

tests/unit/io/test_io.py:89:


merlin/io/dataset.py:354: in __init__
self.infer_schema()
merlin/io/dataset.py:1140: in infer_schema
col_schema = ColumnSchema(column, dtype=dtype_val, is_list=is_list, is_ragged=is_list)
<string>:9: in __init__
???


self = ColumnSchema(name='column', tags=set(), properties={}, dtype=dtype('float64'), is_list=True, is_ragged=True)

def __post_init__(self):
    """Standardize tags and dtypes on initialization

    Raises:
        TypeError: If the provided dtype cannot be cast to a numpy dtype
    """
    tags = TagSet(self.tags)
    object.__setattr__(self, "tags", tags)

    try:
        if hasattr(self.dtype, "numpy_dtype"):
            dtype = np.dtype(self.dtype.numpy_dtype)
        elif hasattr(self.dtype, "_categories"):
            dtype = self.dtype._categories.dtype
        elif isinstance(self.dtype, pd.StringDtype):
            dtype = np.dtype("O")
        else:
            dtype = np.dtype(self.dtype)
    except TypeError as err:
        raise TypeError(
            f"Unsupported dtype {self.dtype}, unable to cast {self.dtype} to a numpy dtype."
        ) from err

    object.__setattr__(self, "dtype", dtype)

    value_count = self.properties.get("value_count", {"min": 0, "max": 0})

    if self.is_list is None:
        if value_count["max"] > 0:
            object.__setattr__(self, "is_list", True)
        else:
            object.__setattr__(self, "is_list", False)

    if self.is_ragged is None:
        if value_count["max"] > value_count["min"]:
            object.__setattr__(self, "is_ragged", True)
        else:
            object.__setattr__(self, "is_ragged", False)

    if self.is_ragged and not self.is_list:
        raise ValueError(
            "`is_ragged` is set to `True` but `is_list` is not. "
            "Only list columns can set the `is_ragged` flag."
        )

    if self.is_ragged and value_count["max"] == value_count["min"]:
      raise ValueError(
            "`is_ragged` is set to `True` but `value_count.min` == `value_count.max`. "
            "If value_count min/max are equal. "
            "This is a fixed size list and `is_ragged` should be set to False. "
        )

E ValueError: is_ragged is set to True but value_count.min == value_count.max. When the min and max value counts are equal, the column is a fixed-size list, so is_ragged should be set to False.

merlin/schema/schema.py:99: ValueError
__________________ test_string_datatypes[True-csv-no-header] ___________________

tmpdir = local('/tmp/pytest-of-jenkins/pytest-4/test_string_datatypes_True_csv1')
engine = 'csv-no-header', cpu = True

@pytest.mark.parametrize("engine", ["csv", "parquet", "csv-no-header"])
@pytest.mark.parametrize("cpu", [None, True])
def test_string_datatypes(tmpdir, engine, cpu):
    df_lib = dispatch.get_lib()
    df = df_lib.DataFrame({"column": [[0.1, 0.2]]})
  dataset = merlin.io.Dataset(df)

tests/unit/io/test_io.py:89:


merlin/io/dataset.py:354: in __init__
self.infer_schema()
merlin/io/dataset.py:1140: in infer_schema
col_schema = ColumnSchema(column, dtype=dtype_val, is_list=is_list, is_ragged=is_list)
<string>:9: in __init__
???


self = ColumnSchema(name='column', tags=set(), properties={}, dtype=dtype('float64'), is_list=True, is_ragged=True)

def __post_init__(self):
    """Standardize tags and dtypes on initialization

    Raises:
        TypeError: If the provided dtype cannot be cast to a numpy dtype
    """
    tags = TagSet(self.tags)
    object.__setattr__(self, "tags", tags)

    try:
        if hasattr(self.dtype, "numpy_dtype"):
            dtype = np.dtype(self.dtype.numpy_dtype)
        elif hasattr(self.dtype, "_categories"):
            dtype = self.dtype._categories.dtype
        elif isinstance(self.dtype, pd.StringDtype):
            dtype = np.dtype("O")
        else:
            dtype = np.dtype(self.dtype)
    except TypeError as err:
        raise TypeError(
            f"Unsupported dtype {self.dtype}, unable to cast {self.dtype} to a numpy dtype."
        ) from err

    object.__setattr__(self, "dtype", dtype)

    value_count = self.properties.get("value_count", {"min": 0, "max": 0})

    if self.is_list is None:
        if value_count["max"] > 0:
            object.__setattr__(self, "is_list", True)
        else:
            object.__setattr__(self, "is_list", False)

    if self.is_ragged is None:
        if value_count["max"] > value_count["min"]:
            object.__setattr__(self, "is_ragged", True)
        else:
            object.__setattr__(self, "is_ragged", False)

    if self.is_ragged and not self.is_list:
        raise ValueError(
            "`is_ragged` is set to `True` but `is_list` is not. "
            "Only list columns can set the `is_ragged` flag."
        )

    if self.is_ragged and value_count["max"] == value_count["min"]:
      raise ValueError(
            "`is_ragged` is set to `True` but `value_count.min` == `value_count.max`. "
            "If value_count min/max are equal. "
            "This is a fixed size list and `is_ragged` should be set to False. "
        )

E ValueError: is_ragged is set to True but value_count.min == value_count.max. When the min and max value counts are equal, the column is a fixed-size list, so is_ragged should be set to False.

merlin/schema/schema.py:99: ValueError
______________________ test_dtype_column_schema[float32] _______________________

d_types = <class 'numpy.float32'>

@pytest.mark.parametrize("d_types", [numpy.float32, numpy.float64, numpy.uint32, numpy.uint64])
def test_dtype_column_schema(d_types):
  column = ColumnSchema("name", tags=[], properties=[], dtype=d_types)

tests/unit/schema/test_column_schemas.py:26:


<string>:9: in __init__
???


self = ColumnSchema(name='name', tags=set(), properties=[], dtype=dtype('float32'), is_list=None, is_ragged=None)

def __post_init__(self):
    """Standardize tags and dtypes on initialization

    Raises:
        TypeError: If the provided dtype cannot be cast to a numpy dtype
    """
    tags = TagSet(self.tags)
    object.__setattr__(self, "tags", tags)

    try:
        if hasattr(self.dtype, "numpy_dtype"):
            dtype = np.dtype(self.dtype.numpy_dtype)
        elif hasattr(self.dtype, "_categories"):
            dtype = self.dtype._categories.dtype
        elif isinstance(self.dtype, pd.StringDtype):
            dtype = np.dtype("O")
        else:
            dtype = np.dtype(self.dtype)
    except TypeError as err:
        raise TypeError(
            f"Unsupported dtype {self.dtype}, unable to cast {self.dtype} to a numpy dtype."
        ) from err

    object.__setattr__(self, "dtype", dtype)
  value_count = self.properties.get("value_count", {"min": 0, "max": 0})

E AttributeError: 'list' object has no attribute 'get'

merlin/schema/schema.py:78: AttributeError
______________________ test_dtype_column_schema[float64] _______________________

d_types = <class 'numpy.float64'>

@pytest.mark.parametrize("d_types", [numpy.float32, numpy.float64, numpy.uint32, numpy.uint64])
def test_dtype_column_schema(d_types):
  column = ColumnSchema("name", tags=[], properties=[], dtype=d_types)

tests/unit/schema/test_column_schemas.py:26:


<string>:9: in __init__
???


self = ColumnSchema(name='name', tags=set(), properties=[], dtype=dtype('float64'), is_list=None, is_ragged=None)

def __post_init__(self):
    """Standardize tags and dtypes on initialization

    Raises:
        TypeError: If the provided dtype cannot be cast to a numpy dtype
    """
    tags = TagSet(self.tags)
    object.__setattr__(self, "tags", tags)

    try:
        if hasattr(self.dtype, "numpy_dtype"):
            dtype = np.dtype(self.dtype.numpy_dtype)
        elif hasattr(self.dtype, "_categories"):
            dtype = self.dtype._categories.dtype
        elif isinstance(self.dtype, pd.StringDtype):
            dtype = np.dtype("O")
        else:
            dtype = np.dtype(self.dtype)
    except TypeError as err:
        raise TypeError(
            f"Unsupported dtype {self.dtype}, unable to cast {self.dtype} to a numpy dtype."
        ) from err

    object.__setattr__(self, "dtype", dtype)
  value_count = self.properties.get("value_count", {"min": 0, "max": 0})

E AttributeError: 'list' object has no attribute 'get'

merlin/schema/schema.py:78: AttributeError
_______________________ test_dtype_column_schema[uint32] _______________________

d_types = <class 'numpy.uint32'>

@pytest.mark.parametrize("d_types", [numpy.float32, numpy.float64, numpy.uint32, numpy.uint64])
def test_dtype_column_schema(d_types):
  column = ColumnSchema("name", tags=[], properties=[], dtype=d_types)

tests/unit/schema/test_column_schemas.py:26:


<string>:9: in __init__
???


self = ColumnSchema(name='name', tags=set(), properties=[], dtype=dtype('uint32'), is_list=None, is_ragged=None)

def __post_init__(self):
    """Standardize tags and dtypes on initialization

    Raises:
        TypeError: If the provided dtype cannot be cast to a numpy dtype
    """
    tags = TagSet(self.tags)
    object.__setattr__(self, "tags", tags)

    try:
        if hasattr(self.dtype, "numpy_dtype"):
            dtype = np.dtype(self.dtype.numpy_dtype)
        elif hasattr(self.dtype, "_categories"):
            dtype = self.dtype._categories.dtype
        elif isinstance(self.dtype, pd.StringDtype):
            dtype = np.dtype("O")
        else:
            dtype = np.dtype(self.dtype)
    except TypeError as err:
        raise TypeError(
            f"Unsupported dtype {self.dtype}, unable to cast {self.dtype} to a numpy dtype."
        ) from err

    object.__setattr__(self, "dtype", dtype)
  value_count = self.properties.get("value_count", {"min": 0, "max": 0})

E AttributeError: 'list' object has no attribute 'get'

merlin/schema/schema.py:78: AttributeError
_______________________ test_dtype_column_schema[uint64] _______________________

d_types = <class 'numpy.uint64'>

@pytest.mark.parametrize("d_types", [numpy.float32, numpy.float64, numpy.uint32, numpy.uint64])
def test_dtype_column_schema(d_types):
  column = ColumnSchema("name", tags=[], properties=[], dtype=d_types)

tests/unit/schema/test_column_schemas.py:26:


<string>:9: in __init__
???


self = ColumnSchema(name='name', tags=set(), properties=[], dtype=dtype('uint64'), is_list=None, is_ragged=None)

def __post_init__(self):
    """Standardize tags and dtypes on initialization

    Raises:
        TypeError: If the provided dtype cannot be cast to a numpy dtype
    """
    tags = TagSet(self.tags)
    object.__setattr__(self, "tags", tags)

    try:
        if hasattr(self.dtype, "numpy_dtype"):
            dtype = np.dtype(self.dtype.numpy_dtype)
        elif hasattr(self.dtype, "_categories"):
            dtype = self.dtype._categories.dtype
        elif isinstance(self.dtype, pd.StringDtype):
            dtype = np.dtype("O")
        else:
            dtype = np.dtype(self.dtype)
    except TypeError as err:
        raise TypeError(
            f"Unsupported dtype {self.dtype}, unable to cast {self.dtype} to a numpy dtype."
        ) from err

    object.__setattr__(self, "dtype", dtype)
  value_count = self.properties.get("value_count", {"min": 0, "max": 0})

E AttributeError: 'list' object has no attribute 'get'

merlin/schema/schema.py:78: AttributeError
_________________________ test_list_column_attributes __________________________

def test_list_column_attributes():
    col0_schema = ColumnSchema("col0")

    assert not col0_schema.is_list
    assert not col0_schema.is_ragged
    assert col0_schema.quantity == ColumnQuantity.SCALAR

    col1_schema = ColumnSchema("col1", is_list=False, is_ragged=False)

    assert not col1_schema.is_list
    assert not col1_schema.is_ragged
    assert col1_schema.quantity == ColumnQuantity.SCALAR

    col2_schema = ColumnSchema("col2", is_list=True)

    assert col2_schema.is_list
  assert col2_schema.is_ragged

E AssertionError: assert False
E + where False = ColumnSchema(name='col2', tags=set(), properties={}, dtype=dtype('float64'), is_list=True, is_ragged=False).is_ragged

tests/unit/schema/test_column_schemas.py:183: AssertionError
______________________ test_tensorflow_metadata_from_json ______________________

def test_tensorflow_metadata_from_json():
    # make sure we can load up tensorflowmetadata serialized json objects, like done by
    # merlin-models
    json_schema = """{"feature": [
    {
      "name": "categories",
      "valueCount": {
        "min": "1",
        "max": "4"
      },
      "type": "INT",
      "intDomain": {
        "name": "categories",
        "min": "1",
        "max": "331",
        "isCategorical": true
      },
      "annotation": {
        "tag": [
          "item"
        ]
      }
    }]}
    """

    schema = TensorflowMetadata.from_json(json_schema).to_merlin_schema()
    column_schema = schema.column_schemas["categories"]

    # make sure the value_count is set appropriately
    assert column_schema.properties["value_count"] == {"min": 1, "max": 4}
  assert column_schema.is_list

E AssertionError: assert False
E + where False = ColumnSchema(name='categories', tags={<Tags.CATEGORICAL: 'categorical'>, <Tags.ITEM: 'item'>}, properties={'domain': {...331, 'name': 'categories'}, 'value_count': {'min': 1, 'max': 4}}, dtype=dtype('int64'), is_list=False, is_ragged=False).is_list

tests/unit/schema/test_schema_io.py:218: AssertionError
=============================== warnings summary ===============================
tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 6 files. Did not have enough partitions to create 7 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 8 files. Did not have enough partitions to create 9 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 8 files. Did not have enough partitions to create 10 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 8 files. Did not have enough partitions to create 11 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 12 files. Did not have enough partitions to create 13 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 12 files. Did not have enough partitions to create 14 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 12 files. Did not have enough partitions to create 15 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 12 files. Did not have enough partitions to create 16 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 12 files. Did not have enough partitions to create 17 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 12 files. Did not have enough partitions to create 18 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 12 files. Did not have enough partitions to create 19 files.
warnings.warn(

tests/unit/io/test_io.py::test_validate_and_regenerate_dataset
/var/jenkins_home/workspace/merlin_core/core/merlin/io/parquet.py:551: DeprecationWarning: 'ParquetDataset.pieces' attribute is deprecated as of pyarrow 5.0.0 and will be removed in a future version. Specify 'use_legacy_dataset=False' while constructing the ParquetDataset, and then use the '.fragments' attribute instead.
paths = [p.path for p in pa_dataset.pieces]

tests/unit/schema/test_column_schemas.py::test_column_schema_tags_normalize
tests/unit/schema/test_tags.py::test_tagset_atomizes_compound_tags
/var/jenkins_home/workspace/merlin_core/core/merlin/schema/tags.py:148: UserWarning: Compound tags like Tags.ITEM_ID have been deprecated and will be removed in a future version. Please use the atomic versions of these tags, like [<Tags.ITEM: 'item'>, <Tags.ID: 'id'>].
warnings.warn(

tests/unit/schema/test_schema_io.py::test_json_serialization_with_embedded_dicts
tests/unit/schema/test_schema_io.py::test_merlin_to_proto_to_json_to_merlin
tests/unit/schema/test_tags.py::test_tagset_atomizes_compound_tags
/var/jenkins_home/workspace/merlin_core/core/merlin/schema/tags.py:148: UserWarning: Compound tags like Tags.USER_ID have been deprecated and will be removed in a future version. Please use the atomic versions of these tags, like [<Tags.USER: 'user'>, <Tags.ID: 'id'>].
warnings.warn(

tests/unit/schema/test_tags.py::test_tagset_atomizes_compound_tags
/var/jenkins_home/workspace/merlin_core/core/merlin/schema/tags.py:148: UserWarning: Compound tags like Tags.SESSION_ID have been deprecated and will be removed in a future version. Please use the atomic versions of these tags, like [<Tags.SESSION: 'session'>, <Tags.ID: 'id'>].
warnings.warn(

tests/unit/schema/test_tags.py::test_tagset_atomizes_compound_tags
/var/jenkins_home/workspace/merlin_core/core/merlin/schema/tags.py:148: UserWarning: Compound tags like Tags.TEXT_TOKENIZED have been deprecated and will be removed in a future version. Please use the atomic versions of these tags, like [<Tags.TEXT: 'text'>, <Tags.TOKENIZED: 'tokenized'>].
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed[True-True]
/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 44479 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed[True-False]
/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 35819 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed[False-True]
/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 41373 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed[False-False]
/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 36359 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed_force[True]
/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 35435 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed_force[False]
/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 41963 instead
warnings.warn(

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html

---------- coverage: platform linux, python 3.8.10-final-0 -----------
Name Stmts Miss Cover

merlin/core/__init__.py 2 0 100%
merlin/core/_version.py 354 205 42%
merlin/core/compat.py 10 4 60%
merlin/core/dispatch.py 356 212 40%
merlin/core/protocols.py 99 45 55%
merlin/core/utils.py 195 56 71%
merlin/dag/__init__.py 5 0 100%
merlin/dag/base_operator.py 122 20 84%
merlin/dag/dictarray.py 55 15 73%
merlin/dag/executors.py 141 68 52%
merlin/dag/graph.py 99 35 65%
merlin/dag/node.py 344 161 53%
merlin/dag/ops/__init__.py 4 0 100%
merlin/dag/ops/concat_columns.py 17 4 76%
merlin/dag/ops/selection.py 22 0 100%
merlin/dag/ops/subset_columns.py 12 4 67%
merlin/dag/ops/subtraction.py 21 11 48%
merlin/dag/selector.py 101 6 94%
merlin/io/__init__.py 4 0 100%
merlin/io/avro.py 88 88 0%
merlin/io/csv.py 57 6 89%
merlin/io/dask.py 181 53 71%
merlin/io/dataframe_engine.py 61 5 92%
merlin/io/dataframe_iter.py 21 1 95%
merlin/io/dataset.py 347 54 84%
merlin/io/dataset_engine.py 37 8 78%
merlin/io/fsspec_utils.py 127 108 15%
merlin/io/hugectr.py 45 35 22%
merlin/io/parquet.py 603 69 89%
merlin/io/shuffle.py 38 12 68%
merlin/io/worker.py 80 66 18%
merlin/io/writer.py 190 52 73%
merlin/io/writer_factory.py 18 4 78%
merlin/schema/__init__.py 2 0 100%
merlin/schema/io/__init__.py 0 0 100%
merlin/schema/io/proto_utils.py 20 4 80%
merlin/schema/io/schema_bp.py 306 5 98%
merlin/schema/io/tensorflow_metadata.py 183 17 91%
merlin/schema/schema.py 218 36 83%
merlin/schema/tags.py 82 0 100%

TOTAL 4667 1469 69%

=========================== short test summary info ============================
SKIPPED [1] tests/unit/io/test_avro.py:34: could not import 'uavro': No module named 'uavro'
====== 12 failed, 379 passed, 1 skipped, 25 warnings in 64.90s (0:01:04) =======
ERROR: InvocationError for command /var/jenkins_home/workspace/merlin_core/core/.tox/test-gpu/bin/python -m pytest --cov-report term --cov merlin -rxs tests/unit (exited with code 1)
___________________________________ summary ____________________________________
ERROR: test-gpu: commands failed
Build step 'Execute shell' marked build as failure
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA-Merlin/core/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[merlin_core] $ /bin/bash /tmp/jenkins14282197680582606154.sh

@nvidia-merlin-bot

Click to view CI Results
GitHub pull request #111 of commit 8c549eceef416090f082834354288c23a9b1094e, no merge conflicts.
Running as SYSTEM
Setting status of 8c549eceef416090f082834354288c23a9b1094e to PENDING with url http://merlin-infra1.nvidia.com:8080/job/merlin_core/293/ and message: 'Pending'
Using context: Jenkins
Building on master in workspace /var/jenkins_home/workspace/merlin_core
using credential ce87ff3c-94f0-400a-8303-cb4acb4918b5
 > git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > git config remote.origin.url /~https://github.com/NVIDIA-Merlin/core # timeout=10
Fetching upstream changes from /~https://github.com/NVIDIA-Merlin/core
 > git --version # timeout=10
using GIT_ASKPASS to set credentials login for merlin-systems username and pass
 > git fetch --tags --force --progress -- /~https://github.com/NVIDIA-Merlin/core +refs/pull/111/*:refs/remotes/origin/pr/111/* # timeout=10
 > git rev-parse 8c549eceef416090f082834354288c23a9b1094e^{commit} # timeout=10
Checking out Revision 8c549eceef416090f082834354288c23a9b1094e (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 8c549eceef416090f082834354288c23a9b1094e # timeout=10
Commit message: "Only check is_ragged min/max if provided in constructor"
 > git rev-list --no-walk 6610d7f31c395fe2e223ea6748ff1432066f3c14 # timeout=10
[merlin_core] $ /bin/bash /tmp/jenkins473916914072643228.sh
GLOB sdist-make: /var/jenkins_home/workspace/merlin_core/core/setup.py
test-gpu recreate: /var/jenkins_home/workspace/merlin_core/core/.tox/test-gpu
test-gpu installdeps: pytest, pytest-cov
WARNING: Discarding $PYTHONPATH from environment, to override specify PYTHONPATH in 'passenv' in your configuration.
test-gpu inst: /var/jenkins_home/workspace/merlin_core/core/.tox/.tmp/package/1/merlin-core-0.8.0+24.g8c549ec.zip
WARNING: Discarding $PYTHONPATH from environment, to override specify PYTHONPATH in 'passenv' in your configuration.
test-gpu installed: absl-py==1.2.0,aiohttp==3.8.1,aiosignal==1.2.0,alabaster==0.7.12,alembic==1.8.1,anyio==3.6.1,argon2-cffi==21.3.0,argon2-cffi-bindings==21.2.0,astroid==2.5.6,asttokens==2.0.8,astunparse==1.6.3,asv==0.5.1,asvdb==0.4.2,async-timeout==4.0.2,attrs==22.1.0,autopage==0.5.1,awscli==1.27.12,Babel==2.10.3,backcall==0.2.0,beautifulsoup4==4.11.1,betterproto==1.2.5,black==22.6.0,bleach==5.0.1,boto3==1.24.75,botocore==1.29.12,Brotli==1.0.9,cachetools==5.2.0,certifi==2019.11.28,cffi==1.15.1,chardet==3.0.4,charset-normalizer==2.1.1,clang==5.0,click==8.1.3,cliff==4.0.0,cloudpickle==2.2.0,cmaes==0.9.0,cmake==3.24.1.1,cmd2==2.4.2,colorama==0.4.4,colorlog==6.7.0,contourpy==1.0.5,coverage==6.5.0,cuda-python==11.7.1,cupy-cuda117==10.6.0,cycler==0.11.0,Cython==0.29.32,dask==2022.1.1,dbus-python==1.2.16,debugpy==1.6.3,decorator==5.1.1,defusedxml==0.7.1,dill==0.3.5.1,distlib==0.3.6,distributed==2022.5.1,distro==1.7.0,dm-tree==0.1.6,docker-pycreds==0.4.0,docutils==0.16,emoji==1.7.0,entrypoints==0.4,execnet==1.9.0,executing==1.0.0,faiss==1.7.2,faiss-gpu==1.7.2,fastai==2.7.9,fastapi==0.85.0,fastavro==1.6.1,fastcore==1.5.27,fastdownload==0.0.7,fastjsonschema==2.16.1,fastprogress==1.0.3,fastrlock==0.8,feast==0.19.4,fiddle==0.2.2,filelock==3.8.0,flatbuffers==1.12,fonttools==4.37.3,frozenlist==1.3.1,fsspec==2022.5.0,gast==0.4.0,gevent==21.12.0,geventhttpclient==2.0.2,gitdb==4.0.9,GitPython==3.1.27,google==3.0.0,google-api-core==2.10.1,google-auth==2.11.1,google-auth-oauthlib==0.4.6,google-pasta==0.2.0,googleapis-common-protos==1.52.0,graphviz==0.20.1,greenlet==1.1.3,grpcio==1.41.0,grpcio-channelz==1.49.0,grpcio-reflection==1.48.1,grpclib==0.4.3,h11==0.13.0,h2==4.1.0,h5py==3.7.0,HeapDict==1.0.1,horovod==0.26.1,hpack==4.0.0,httptools==0.5.0,hugectr2onnx==0.0.0,huggingface-hub==0.9.1,hyperframe==6.0.1,idna==2.8,imagesize==1.4.1,implicit==0.6.1,importlib-metadata==4.12.0,importlib-resources==5.9.0,iniconfig==1.1.1,ipykernel==6.15.3,ipython==8.5.0,ipython-genutils==0.2.0,ipywidgets==7.7.0,jedi==0.18.1,Jinja2==3.1.2,jmespath==1.0.1,joblib==1.2.0,json5==0.9.10,jsonschema==4.16.0,jupyter-cache==0.4.3,jupyter-core==4.11.1,jupyter-server==1.18.1,jupyter-server-mathjax==0.2.5,jupyter-sphinx==0.3.2,jupyter_client==7.3.5,jupyterlab==3.4.7,jupyterlab-pygments==0.2.2,jupyterlab-widgets==1.1.0,jupyterlab_server==2.15.1,keras==2.9.0,Keras-Preprocessing==1.1.2,kiwisolver==1.4.4,lazy-object-proxy==1.8.0,libclang==14.0.6,libcst==0.4.7,lightfm==1.16,lightgbm==3.3.2,linkify-it-py==1.0.3,llvmlite==0.39.1,locket==1.0.0,lxml==4.9.1,Mako==1.2.4,Markdown==3.4.1,markdown-it-py==1.1.0,MarkupSafe==2.1.1,matplotlib==3.6.0,matplotlib-inline==0.1.6,mdit-py-plugins==0.2.8,merlin-core==0.8.0+24.g8c549ec,merlin-models==0.7.0+11.g280956aa4,merlin-systems==0.5.0+4.g15074ad,mistune==2.0.4,mmh3==3.0.0,mpi4py==3.1.3,msgpack==1.0.4,multidict==6.0.2,mypy-extensions==0.4.3,myst-nb==0.13.2,myst-parser==0.15.2,natsort==8.1.0,nbclassic==0.4.3,nbclient==0.6.8,nbconvert==7.0.0,nbdime==3.1.1,nbformat==5.5.0,nest-asyncio==1.5.5,ninja==1.10.2.3,notebook==6.4.12,notebook-shim==0.1.0,numba==0.56.2,numpy==1.22.4,nvidia-pyindex==1.0.9,# Editable install with no version control (nvtabular==1.4.0+8.g95e12d347),-e 
/usr/local/lib/python3.8/dist-packages,nvtx==0.2.5,oauthlib==3.2.1,oldest-supported-numpy==2022.8.16,onnx==1.12.0,onnxruntime==1.11.1,opt-einsum==3.3.0,optuna==3.0.3,packaging==21.3,pandas==1.3.5,pandavro==1.5.2,pandocfilters==1.5.0,parso==0.8.3,partd==1.3.0,pathtools==0.1.2,pbr==5.11.0,pexpect==4.8.0,pickleshare==0.7.5,Pillow==9.2.0,pkgutil_resolve_name==1.3.10,platformdirs==2.5.2,plotly==5.11.0,pluggy==1.0.0,prettytable==3.5.0,prometheus-client==0.14.1,promise==2.3,prompt-toolkit==3.0.31,proto-plus==1.19.6,protobuf==3.19.5,psutil==5.9.2,ptyprocess==0.7.0,pure-eval==0.2.2,py==1.11.0,pyarrow==7.0.0,pyasn1==0.4.8,pyasn1-modules==0.2.8,pybind11==2.10.0,pycparser==2.21,pydantic==1.10.2,pydot==1.4.2,Pygments==2.13.0,PyGObject==3.36.0,pynvml==11.4.1,pyparsing==3.0.9,pyperclip==1.8.2,pyrsistent==0.18.1,pytest==7.1.3,pytest-cov==4.0.0,pytest-xdist==3.0.2,python-apt==2.0.0+ubuntu0.20.4.8,python-dateutil==2.8.2,python-dotenv==0.21.0,python-rapidjson==1.8,pytz==2022.2.1,PyYAML==5.4.1,pyzmq==24.0.0,regex==2022.9.13,requests==2.22.0,requests-oauthlib==1.3.1,requests-unixsocket==0.2.0,rsa==4.7.2,s3fs==2022.2.0,s3transfer==0.6.0,sacremoses==0.0.53,scikit-build==0.15.0,scikit-learn==1.1.2,scipy==1.8.1,seedir==0.3.0,Send2Trash==1.8.0,sentry-sdk==1.9.8,setproctitle==1.3.2,setuptools-scm==7.0.5,shortuuid==1.0.9,six==1.15.0,sklearn==0.0,smmap==5.0.0,sniffio==1.3.0,snowballstemmer==2.2.0,sortedcontainers==2.4.0,soupsieve==2.3.2.post1,Sphinx==5.3.0,sphinx-multiversion==0.2.4,sphinx-togglebutton==0.3.1,sphinx_external_toc==0.3.0,sphinxcontrib-applehelp==1.0.2,sphinxcontrib-copydirs @ git+/~https://github.com/mikemckiernan/sphinxcontrib-copydirs.git@bd8c5d79b3f91cf5f1bb0d6995aeca3fe84b670e,sphinxcontrib-devhelp==1.0.2,sphinxcontrib-htmlhelp==2.0.0,sphinxcontrib-jsmath==1.0.1,sphinxcontrib-qthelp==1.0.3,sphinxcontrib-serializinghtml==1.1.5,SQLAlchemy==1.4.44,stack-data==0.5.0,starlette==0.20.4,stevedore==4.1.1,stringcase==1.2.0,supervisor==4.1.0,tabulate==0.8.10,tblib==1.7.0,tdqm==0.0.1,tenacity==8.0.1,tensorboard==2.9.1,tensorboard-data-server==0.6.1,tensorboard-plugin-wit==1.8.1,tensorflow==2.9.2,tensorflow-estimator==2.9.0,tensorflow-gpu==2.9.2,tensorflow-io-gcs-filesystem==0.27.0,tensorflow-metadata==1.10.0,termcolor==2.0.1,terminado==0.15.0,testbook==0.4.2,threadpoolctl==3.1.0,tinycss2==1.1.1,tokenizers==0.10.3,toml==0.10.2,tomli==2.0.1,toolz==0.12.0,torch==1.12.1+cu113,torchmetrics==0.3.2,tornado==6.2,tox==3.26.0,tqdm==4.64.1,traitlets==5.4.0,transformers==4.12.0,transformers4rec==0.1.12+2.gbcc939255,treelite==2.3.0,treelite-runtime==2.3.0,tritonclient==2.25.0,typing-inspect==0.8.0,typing_extensions==4.3.0,uc-micro-py==1.0.1,urllib3==1.26.12,uvicorn==0.18.3,uvloop==0.17.0,versioneer==0.20,virtualenv==20.16.5,wandb==0.13.3,watchfiles==0.17.0,wcwidth==0.2.5,webencodings==0.5.1,websocket-client==1.4.1,websockets==10.3,Werkzeug==2.2.2,widgetsnbextension==3.6.0,wrapt==1.12.1,xgboost==1.6.2,yarl==1.8.1,zict==2.2.0,zipp==3.8.1,zope.event==4.5.0,zope.interface==5.4.0
test-gpu run-test-pre: PYTHONHASHSEED='2371883594'
test-gpu run-test: commands[0] | python -m pytest --cov-report term --cov merlin -rxs tests/unit
============================= test session starts ==============================
platform linux -- Python 3.8.10, pytest-7.1.3, pluggy-1.0.0
cachedir: .tox/test-gpu/.pytest_cache
rootdir: /var/jenkins_home/workspace/merlin_core/core, configfile: pyproject.toml
plugins: anyio-3.5.0, xdist-3.0.2, cov-4.0.0
collected 391 items / 1 skipped

tests/unit/core/test_dispatch.py .... [ 1%]
tests/unit/core/test_protocols.py ......... [ 3%]
tests/unit/core/test_version.py . [ 3%]
tests/unit/dag/test_base_operator.py .... [ 4%]
tests/unit/dag/test_column_selector.py .............................. [ 12%]
tests/unit/dag/test_dictarray.py ... [ 13%]
tests/unit/dag/test_executors.py .. [ 13%]
tests/unit/dag/test_graph.py .... [ 14%]
tests/unit/dag/ops/test_selection.py .... [ 15%]
tests/unit/io/test_io.py ............................................... [ 27%]
................................................................ [ 43%]
tests/unit/schema/test_column_schemas.py ........................ [ 50%]
tests/unit/schema/test_schema.py ............. [ 53%]
tests/unit/schema/test_schema_io.py .................................... [ 62%]
........................................................................ [ 81%]
........................................................... [ 96%]
tests/unit/schema/test_tags.py ....... [ 97%]
tests/unit/utils/test_utils.py ........ [100%]

=============================== warnings summary ===============================
tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 6 files. Did not have enough partitions to create 7 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 8 files. Did not have enough partitions to create 9 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 8 files. Did not have enough partitions to create 10 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 8 files. Did not have enough partitions to create 11 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 12 files. Did not have enough partitions to create 13 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 12 files. Did not have enough partitions to create 14 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 12 files. Did not have enough partitions to create 15 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 12 files. Did not have enough partitions to create 16 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 12 files. Did not have enough partitions to create 17 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 12 files. Did not have enough partitions to create 18 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 12 files. Did not have enough partitions to create 19 files.
warnings.warn(

tests/unit/io/test_io.py::test_validate_and_regenerate_dataset
/var/jenkins_home/workspace/merlin_core/core/merlin/io/parquet.py:551: DeprecationWarning: 'ParquetDataset.pieces' attribute is deprecated as of pyarrow 5.0.0 and will be removed in a future version. Specify 'use_legacy_dataset=False' while constructing the ParquetDataset, and then use the '.fragments' attribute instead.
paths = [p.path for p in pa_dataset.pieces]

tests/unit/schema/test_column_schemas.py::test_column_schema_tags_normalize
tests/unit/schema/test_tags.py::test_tagset_atomizes_compound_tags
/var/jenkins_home/workspace/merlin_core/core/merlin/schema/tags.py:148: UserWarning: Compound tags like Tags.ITEM_ID have been deprecated and will be removed in a future version. Please use the atomic versions of these tags, like [<Tags.ITEM: 'item'>, <Tags.ID: 'id'>].
warnings.warn(

tests/unit/schema/test_schema_io.py::test_json_serialization_with_embedded_dicts
tests/unit/schema/test_schema_io.py::test_merlin_to_proto_to_json_to_merlin
tests/unit/schema/test_tags.py::test_tagset_atomizes_compound_tags
/var/jenkins_home/workspace/merlin_core/core/merlin/schema/tags.py:148: UserWarning: Compound tags like Tags.USER_ID have been deprecated and will be removed in a future version. Please use the atomic versions of these tags, like [<Tags.USER: 'user'>, <Tags.ID: 'id'>].
warnings.warn(

tests/unit/schema/test_tags.py::test_tagset_atomizes_compound_tags
/var/jenkins_home/workspace/merlin_core/core/merlin/schema/tags.py:148: UserWarning: Compound tags like Tags.SESSION_ID have been deprecated and will be removed in a future version. Please use the atomic versions of these tags, like [<Tags.SESSION: 'session'>, <Tags.ID: 'id'>].
warnings.warn(

tests/unit/schema/test_tags.py::test_tagset_atomizes_compound_tags
/var/jenkins_home/workspace/merlin_core/core/merlin/schema/tags.py:148: UserWarning: Compound tags like Tags.TEXT_TOKENIZED have been deprecated and will be removed in a future version. Please use the atomic versions of these tags, like [<Tags.TEXT: 'text'>, <Tags.TOKENIZED: 'tokenized'>].
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed[True-True]
/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 39891 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed[True-False]
/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 34709 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed[False-True]
/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 42293 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed[False-False]
/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 44505 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed_force[True]
/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 37431 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed_force[False]
/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 37553 instead
warnings.warn(

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html

---------- coverage: platform linux, python 3.8.10-final-0 -----------
Name Stmts Miss Cover

merlin/core/__init__.py 2 0 100%
merlin/core/_version.py 354 205 42%
merlin/core/compat.py 10 4 60%
merlin/core/dispatch.py 356 212 40%
merlin/core/protocols.py 99 45 55%
merlin/core/utils.py 195 56 71%
merlin/dag/__init__.py 5 0 100%
merlin/dag/base_operator.py 122 20 84%
merlin/dag/dictarray.py 55 15 73%
merlin/dag/executors.py 141 68 52%
merlin/dag/graph.py 99 35 65%
merlin/dag/node.py 344 161 53%
merlin/dag/ops/__init__.py 4 0 100%
merlin/dag/ops/concat_columns.py 17 4 76%
merlin/dag/ops/selection.py 22 0 100%
merlin/dag/ops/subset_columns.py 12 4 67%
merlin/dag/ops/subtraction.py 21 11 48%
merlin/dag/selector.py 101 6 94%
merlin/io/__init__.py 4 0 100%
merlin/io/avro.py 88 88 0%
merlin/io/csv.py 57 6 89%
merlin/io/dask.py 181 53 71%
merlin/io/dataframe_engine.py 61 5 92%
merlin/io/dataframe_iter.py 21 1 95%
merlin/io/dataset.py 347 54 84%
merlin/io/dataset_engine.py 37 8 78%
merlin/io/fsspec_utils.py 127 108 15%
merlin/io/hugectr.py 45 35 22%
merlin/io/parquet.py 603 69 89%
merlin/io/shuffle.py 38 12 68%
merlin/io/worker.py 80 66 18%
merlin/io/writer.py 190 52 73%
merlin/io/writer_factory.py 18 4 78%
merlin/schema/__init__.py 2 0 100%
merlin/schema/io/__init__.py 0 0 100%
merlin/schema/io/proto_utils.py 20 4 80%
merlin/schema/io/schema_bp.py 306 5 98%
merlin/schema/io/tensorflow_metadata.py 185 17 91%
merlin/schema/schema.py 220 33 85%
merlin/schema/tags.py 82 1 99%

TOTAL 4671 1467 69%

=========================== short test summary info ============================
SKIPPED [1] tests/unit/io/test_avro.py:34: could not import 'uavro': No module named 'uavro'
============ 391 passed, 1 skipped, 25 warnings in 64.47s (0:01:04) ============
___________________________________ summary ____________________________________
test-gpu: commands succeeded
congratulations :)
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA-Merlin/core/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[merlin_core] $ /bin/bash /tmp/jenkins16623248719987275115.sh

@oliverholworthy marked this pull request as ready for review November 18, 2022 12:24
@nvidia-merlin-bot

Click to view CI Results
GitHub pull request #111 of commit 3feb59a033fa3fc2da1b57c2b18a7dcf90dac3ee, no merge conflicts.
Running as SYSTEM
Setting status of 3feb59a033fa3fc2da1b57c2b18a7dcf90dac3ee to PENDING with url http://merlin-infra1.nvidia.com:8080/job/merlin_core/294/ and message: 'Pending'
Using context: Jenkins
Building on master in workspace /var/jenkins_home/workspace/merlin_core
using credential ce87ff3c-94f0-400a-8303-cb4acb4918b5
 > git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > git config remote.origin.url /~https://github.com/NVIDIA-Merlin/core # timeout=10
Fetching upstream changes from /~https://github.com/NVIDIA-Merlin/core
 > git --version # timeout=10
using GIT_ASKPASS to set credentials login for merlin-systems username and pass
 > git fetch --tags --force --progress -- /~https://github.com/NVIDIA-Merlin/core +refs/pull/111/*:refs/remotes/origin/pr/111/* # timeout=10
 > git rev-parse 3feb59a033fa3fc2da1b57c2b18a7dcf90dac3ee^{commit} # timeout=10
Checking out Revision 3feb59a033fa3fc2da1b57c2b18a7dcf90dac3ee (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 3feb59a033fa3fc2da1b57c2b18a7dcf90dac3ee # timeout=10
Commit message: "Update formatting"
 > git rev-list --no-walk 8c549eceef416090f082834354288c23a9b1094e # timeout=10
[merlin_core] $ /bin/bash /tmp/jenkins9919008783548112554.sh
GLOB sdist-make: /var/jenkins_home/workspace/merlin_core/core/setup.py
test-gpu recreate: /var/jenkins_home/workspace/merlin_core/core/.tox/test-gpu
test-gpu installdeps: pytest, pytest-cov
WARNING: Discarding $PYTHONPATH from environment, to override specify PYTHONPATH in 'passenv' in your configuration.
test-gpu inst: /var/jenkins_home/workspace/merlin_core/core/.tox/.tmp/package/1/merlin-core-0.8.0+25.g3feb59a.zip
WARNING: Discarding $PYTHONPATH from environment, to override specify PYTHONPATH in 'passenv' in your configuration.
test-gpu installed: absl-py==1.2.0,aiohttp==3.8.1,aiosignal==1.2.0,alabaster==0.7.12,alembic==1.8.1,anyio==3.6.1,argon2-cffi==21.3.0,argon2-cffi-bindings==21.2.0,astroid==2.5.6,asttokens==2.0.8,astunparse==1.6.3,asv==0.5.1,asvdb==0.4.2,async-timeout==4.0.2,attrs==22.1.0,autopage==0.5.1,awscli==1.27.12,Babel==2.10.3,backcall==0.2.0,beautifulsoup4==4.11.1,betterproto==1.2.5,black==22.6.0,bleach==5.0.1,boto3==1.24.75,botocore==1.29.12,Brotli==1.0.9,cachetools==5.2.0,certifi==2019.11.28,cffi==1.15.1,chardet==3.0.4,charset-normalizer==2.1.1,clang==5.0,click==8.1.3,cliff==4.0.0,cloudpickle==2.2.0,cmaes==0.9.0,cmake==3.24.1.1,cmd2==2.4.2,colorama==0.4.4,colorlog==6.7.0,contourpy==1.0.5,coverage==6.5.0,cuda-python==11.7.1,cupy-cuda117==10.6.0,cycler==0.11.0,Cython==0.29.32,dask==2022.1.1,dbus-python==1.2.16,debugpy==1.6.3,decorator==5.1.1,defusedxml==0.7.1,dill==0.3.5.1,distlib==0.3.6,distributed==2022.5.1,distro==1.7.0,dm-tree==0.1.6,docker-pycreds==0.4.0,docutils==0.16,emoji==1.7.0,entrypoints==0.4,execnet==1.9.0,executing==1.0.0,faiss==1.7.2,faiss-gpu==1.7.2,fastai==2.7.9,fastapi==0.85.0,fastavro==1.6.1,fastcore==1.5.27,fastdownload==0.0.7,fastjsonschema==2.16.1,fastprogress==1.0.3,fastrlock==0.8,feast==0.19.4,fiddle==0.2.2,filelock==3.8.0,flatbuffers==1.12,fonttools==4.37.3,frozenlist==1.3.1,fsspec==2022.5.0,gast==0.4.0,gevent==21.12.0,geventhttpclient==2.0.2,gitdb==4.0.9,GitPython==3.1.27,google==3.0.0,google-api-core==2.10.1,google-auth==2.11.1,google-auth-oauthlib==0.4.6,google-pasta==0.2.0,googleapis-common-protos==1.52.0,graphviz==0.20.1,greenlet==1.1.3,grpcio==1.41.0,grpcio-channelz==1.49.0,grpcio-reflection==1.48.1,grpclib==0.4.3,h11==0.13.0,h2==4.1.0,h5py==3.7.0,HeapDict==1.0.1,horovod==0.26.1,hpack==4.0.0,httptools==0.5.0,hugectr2onnx==0.0.0,huggingface-hub==0.9.1,hyperframe==6.0.1,idna==2.8,imagesize==1.4.1,implicit==0.6.1,importlib-metadata==4.12.0,importlib-resources==5.9.0,iniconfig==1.1.1,ipykernel==6.15.3,ipython==8.5.0,ipython-genutils==0.2.0,ipywidgets==7.7.0,jedi==0.18.1,Jinja2==3.1.2,jmespath==1.0.1,joblib==1.2.0,json5==0.9.10,jsonschema==4.16.0,jupyter-cache==0.4.3,jupyter-core==4.11.1,jupyter-server==1.18.1,jupyter-server-mathjax==0.2.5,jupyter-sphinx==0.3.2,jupyter_client==7.3.5,jupyterlab==3.4.7,jupyterlab-pygments==0.2.2,jupyterlab-widgets==1.1.0,jupyterlab_server==2.15.1,keras==2.9.0,Keras-Preprocessing==1.1.2,kiwisolver==1.4.4,lazy-object-proxy==1.8.0,libclang==14.0.6,libcst==0.4.7,lightfm==1.16,lightgbm==3.3.2,linkify-it-py==1.0.3,llvmlite==0.39.1,locket==1.0.0,lxml==4.9.1,Mako==1.2.4,Markdown==3.4.1,markdown-it-py==1.1.0,MarkupSafe==2.1.1,matplotlib==3.6.0,matplotlib-inline==0.1.6,mdit-py-plugins==0.2.8,merlin-core==0.8.0+25.g3feb59a,merlin-models==0.7.0+11.g280956aa4,merlin-systems==0.5.0+4.g15074ad,mistune==2.0.4,mmh3==3.0.0,mpi4py==3.1.3,msgpack==1.0.4,multidict==6.0.2,mypy-extensions==0.4.3,myst-nb==0.13.2,myst-parser==0.15.2,natsort==8.1.0,nbclassic==0.4.3,nbclient==0.6.8,nbconvert==7.0.0,nbdime==3.1.1,nbformat==5.5.0,nest-asyncio==1.5.5,ninja==1.10.2.3,notebook==6.4.12,notebook-shim==0.1.0,numba==0.56.2,numpy==1.22.4,nvidia-pyindex==1.0.9,# Editable install with no version control (nvtabular==1.4.0+8.g95e12d347),-e 
/usr/local/lib/python3.8/dist-packages,nvtx==0.2.5,oauthlib==3.2.1,oldest-supported-numpy==2022.8.16,onnx==1.12.0,onnxruntime==1.11.1,opt-einsum==3.3.0,optuna==3.0.3,packaging==21.3,pandas==1.3.5,pandavro==1.5.2,pandocfilters==1.5.0,parso==0.8.3,partd==1.3.0,pathtools==0.1.2,pbr==5.11.0,pexpect==4.8.0,pickleshare==0.7.5,Pillow==9.2.0,pkgutil_resolve_name==1.3.10,platformdirs==2.5.2,plotly==5.11.0,pluggy==1.0.0,prettytable==3.5.0,prometheus-client==0.14.1,promise==2.3,prompt-toolkit==3.0.31,proto-plus==1.19.6,protobuf==3.19.5,psutil==5.9.2,ptyprocess==0.7.0,pure-eval==0.2.2,py==1.11.0,pyarrow==7.0.0,pyasn1==0.4.8,pyasn1-modules==0.2.8,pybind11==2.10.0,pycparser==2.21,pydantic==1.10.2,pydot==1.4.2,Pygments==2.13.0,PyGObject==3.36.0,pynvml==11.4.1,pyparsing==3.0.9,pyperclip==1.8.2,pyrsistent==0.18.1,pytest==7.1.3,pytest-cov==4.0.0,pytest-xdist==3.0.2,python-apt==2.0.0+ubuntu0.20.4.8,python-dateutil==2.8.2,python-dotenv==0.21.0,python-rapidjson==1.8,pytz==2022.2.1,PyYAML==5.4.1,pyzmq==24.0.0,regex==2022.9.13,requests==2.22.0,requests-oauthlib==1.3.1,requests-unixsocket==0.2.0,rsa==4.7.2,s3fs==2022.2.0,s3transfer==0.6.0,sacremoses==0.0.53,scikit-build==0.15.0,scikit-learn==1.1.2,scipy==1.8.1,seedir==0.3.0,Send2Trash==1.8.0,sentry-sdk==1.9.8,setproctitle==1.3.2,setuptools-scm==7.0.5,shortuuid==1.0.9,six==1.15.0,sklearn==0.0,smmap==5.0.0,sniffio==1.3.0,snowballstemmer==2.2.0,sortedcontainers==2.4.0,soupsieve==2.3.2.post1,Sphinx==5.3.0,sphinx-multiversion==0.2.4,sphinx-togglebutton==0.3.1,sphinx_external_toc==0.3.0,sphinxcontrib-applehelp==1.0.2,sphinxcontrib-copydirs @ git+/~https://github.com/mikemckiernan/sphinxcontrib-copydirs.git@bd8c5d79b3f91cf5f1bb0d6995aeca3fe84b670e,sphinxcontrib-devhelp==1.0.2,sphinxcontrib-htmlhelp==2.0.0,sphinxcontrib-jsmath==1.0.1,sphinxcontrib-qthelp==1.0.3,sphinxcontrib-serializinghtml==1.1.5,SQLAlchemy==1.4.44,stack-data==0.5.0,starlette==0.20.4,stevedore==4.1.1,stringcase==1.2.0,supervisor==4.1.0,tabulate==0.8.10,tblib==1.7.0,tdqm==0.0.1,tenacity==8.0.1,tensorboard==2.9.1,tensorboard-data-server==0.6.1,tensorboard-plugin-wit==1.8.1,tensorflow==2.9.2,tensorflow-estimator==2.9.0,tensorflow-gpu==2.9.2,tensorflow-io-gcs-filesystem==0.27.0,tensorflow-metadata==1.10.0,termcolor==2.0.1,terminado==0.15.0,testbook==0.4.2,threadpoolctl==3.1.0,tinycss2==1.1.1,tokenizers==0.10.3,toml==0.10.2,tomli==2.0.1,toolz==0.12.0,torch==1.12.1+cu113,torchmetrics==0.3.2,tornado==6.2,tox==3.26.0,tqdm==4.64.1,traitlets==5.4.0,transformers==4.12.0,transformers4rec==0.1.12+2.gbcc939255,treelite==2.3.0,treelite-runtime==2.3.0,tritonclient==2.25.0,typing-inspect==0.8.0,typing_extensions==4.3.0,uc-micro-py==1.0.1,urllib3==1.26.12,uvicorn==0.18.3,uvloop==0.17.0,versioneer==0.20,virtualenv==20.16.5,wandb==0.13.3,watchfiles==0.17.0,wcwidth==0.2.5,webencodings==0.5.1,websocket-client==1.4.1,websockets==10.3,Werkzeug==2.2.2,widgetsnbextension==3.6.0,wrapt==1.12.1,xgboost==1.6.2,yarl==1.8.1,zict==2.2.0,zipp==3.8.1,zope.event==4.5.0,zope.interface==5.4.0
test-gpu run-test-pre: PYTHONHASHSEED='3136074863'
test-gpu run-test: commands[0] | python -m pytest --cov-report term --cov merlin -rxs tests/unit
============================= test session starts ==============================
platform linux -- Python 3.8.10, pytest-7.1.3, pluggy-1.0.0
cachedir: .tox/test-gpu/.pytest_cache
rootdir: /var/jenkins_home/workspace/merlin_core/core, configfile: pyproject.toml
plugins: anyio-3.5.0, xdist-3.0.2, cov-4.0.0
collected 391 items / 1 skipped

tests/unit/core/test_dispatch.py .... [ 1%]
tests/unit/core/test_protocols.py ......... [ 3%]
tests/unit/core/test_version.py . [ 3%]
tests/unit/dag/test_base_operator.py .... [ 4%]
tests/unit/dag/test_column_selector.py .............................. [ 12%]
tests/unit/dag/test_dictarray.py ... [ 13%]
tests/unit/dag/test_executors.py .. [ 13%]
tests/unit/dag/test_graph.py .... [ 14%]
tests/unit/dag/ops/test_selection.py .... [ 15%]
tests/unit/io/test_io.py ............................................... [ 27%]
................................................................ [ 43%]
tests/unit/schema/test_column_schemas.py ........................ [ 50%]
tests/unit/schema/test_schema.py ............. [ 53%]
tests/unit/schema/test_schema_io.py .................................... [ 62%]
........................................................................ [ 81%]
........................................................... [ 96%]
tests/unit/schema/test_tags.py ....... [ 97%]
tests/unit/utils/test_utils.py ........ [100%]

=============================== warnings summary ===============================
tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 6 files. Did not have enough partitions to create 7 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 8 files. Did not have enough partitions to create 9 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 8 files. Did not have enough partitions to create 10 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 8 files. Did not have enough partitions to create 11 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 12 files. Did not have enough partitions to create 13 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 12 files. Did not have enough partitions to create 14 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 12 files. Did not have enough partitions to create 15 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 12 files. Did not have enough partitions to create 16 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 12 files. Did not have enough partitions to create 17 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 12 files. Did not have enough partitions to create 18 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 12 files. Did not have enough partitions to create 19 files.
warnings.warn(

tests/unit/io/test_io.py::test_validate_and_regenerate_dataset
/var/jenkins_home/workspace/merlin_core/core/merlin/io/parquet.py:551: DeprecationWarning: 'ParquetDataset.pieces' attribute is deprecated as of pyarrow 5.0.0 and will be removed in a future version. Specify 'use_legacy_dataset=False' while constructing the ParquetDataset, and then use the '.fragments' attribute instead.
paths = [p.path for p in pa_dataset.pieces]

tests/unit/schema/test_column_schemas.py::test_column_schema_tags_normalize
tests/unit/schema/test_tags.py::test_tagset_atomizes_compound_tags
/var/jenkins_home/workspace/merlin_core/core/merlin/schema/tags.py:148: UserWarning: Compound tags like Tags.ITEM_ID have been deprecated and will be removed in a future version. Please use the atomic versions of these tags, like [<Tags.ITEM: 'item'>, <Tags.ID: 'id'>].
warnings.warn(

tests/unit/schema/test_schema_io.py::test_json_serialization_with_embedded_dicts
tests/unit/schema/test_schema_io.py::test_merlin_to_proto_to_json_to_merlin
tests/unit/schema/test_tags.py::test_tagset_atomizes_compound_tags
/var/jenkins_home/workspace/merlin_core/core/merlin/schema/tags.py:148: UserWarning: Compound tags like Tags.USER_ID have been deprecated and will be removed in a future version. Please use the atomic versions of these tags, like [<Tags.USER: 'user'>, <Tags.ID: 'id'>].
warnings.warn(

tests/unit/schema/test_tags.py::test_tagset_atomizes_compound_tags
/var/jenkins_home/workspace/merlin_core/core/merlin/schema/tags.py:148: UserWarning: Compound tags like Tags.SESSION_ID have been deprecated and will be removed in a future version. Please use the atomic versions of these tags, like [<Tags.SESSION: 'session'>, <Tags.ID: 'id'>].
warnings.warn(

tests/unit/schema/test_tags.py::test_tagset_atomizes_compound_tags
/var/jenkins_home/workspace/merlin_core/core/merlin/schema/tags.py:148: UserWarning: Compound tags like Tags.TEXT_TOKENIZED have been deprecated and will be removed in a future version. Please use the atomic versions of these tags, like [<Tags.TEXT: 'text'>, <Tags.TOKENIZED: 'tokenized'>].
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed[True-True]
/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 41305 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed[True-False]
/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 33947 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed[False-True]
/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 38161 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed[False-False]
/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 45481 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed_force[True]
/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 46591 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed_force[False]
/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 43821 instead
warnings.warn(

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html

---------- coverage: platform linux, python 3.8.10-final-0 -----------
Name Stmts Miss Cover

merlin/core/__init__.py 2 0 100%
merlin/core/_version.py 354 205 42%
merlin/core/compat.py 10 4 60%
merlin/core/dispatch.py 356 212 40%
merlin/core/protocols.py 99 45 55%
merlin/core/utils.py 195 56 71%
merlin/dag/__init__.py 5 0 100%
merlin/dag/base_operator.py 122 20 84%
merlin/dag/dictarray.py 55 15 73%
merlin/dag/executors.py 141 68 52%
merlin/dag/graph.py 99 35 65%
merlin/dag/node.py 344 161 53%
merlin/dag/ops/__init__.py 4 0 100%
merlin/dag/ops/concat_columns.py 17 4 76%
merlin/dag/ops/selection.py 22 0 100%
merlin/dag/ops/subset_columns.py 12 4 67%
merlin/dag/ops/subtraction.py 21 11 48%
merlin/dag/selector.py 101 6 94%
merlin/io/__init__.py 4 0 100%
merlin/io/avro.py 88 88 0%
merlin/io/csv.py 57 6 89%
merlin/io/dask.py 181 53 71%
merlin/io/dataframe_engine.py 61 5 92%
merlin/io/dataframe_iter.py 21 1 95%
merlin/io/dataset.py 347 54 84%
merlin/io/dataset_engine.py 37 8 78%
merlin/io/fsspec_utils.py 127 108 15%
merlin/io/hugectr.py 45 35 22%
merlin/io/parquet.py 603 69 89%
merlin/io/shuffle.py 38 12 68%
merlin/io/worker.py 80 66 18%
merlin/io/writer.py 190 52 73%
merlin/io/writer_factory.py 18 4 78%
merlin/schema/__init__.py 2 0 100%
merlin/schema/io/__init__.py 0 0 100%
merlin/schema/io/proto_utils.py 20 4 80%
merlin/schema/io/schema_bp.py 306 5 98%
merlin/schema/io/tensorflow_metadata.py 185 17 91%
merlin/schema/schema.py 220 33 85%
merlin/schema/tags.py 82 1 99%

TOTAL 4671 1467 69%

=========================== short test summary info ============================
SKIPPED [1] tests/unit/io/test_avro.py:34: could not import 'uavro': No module named 'uavro'
============ 391 passed, 1 skipped, 25 warnings in 64.02s (0:01:04) ============
___________________________________ summary ____________________________________
test-gpu: commands succeeded
congratulations :)
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA-Merlin/core/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[merlin_core] $ /bin/bash /tmp/jenkins11761895414974473889.sh

@github-actions

Documentation preview

https://nvidia-merlin.github.io/core/review/pr-111

@oliverholworthy
Member Author

This PR is a replacement for #169, since it removes the line being changed in that PR. #169 could still be merged, but that isn't strictly required if this one is.

@nvidia-merlin-bot

Click to view CI Results
GitHub pull request #111 of commit b89438dbfbabcca48e3c2091194796b90a23d3e1, no merge conflicts.
Running as SYSTEM
Setting status of b89438dbfbabcca48e3c2091194796b90a23d3e1 to PENDING with url http://merlin-infra1.nvidia.com:8080/job/merlin_core/296/ and message: 'Pending'
Using context: Jenkins
Building on master in workspace /var/jenkins_home/workspace/merlin_core
using credential ce87ff3c-94f0-400a-8303-cb4acb4918b5
 > git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > git config remote.origin.url /~https://github.com/NVIDIA-Merlin/core # timeout=10
Fetching upstream changes from /~https://github.com/NVIDIA-Merlin/core
 > git --version # timeout=10
using GIT_ASKPASS to set credentials login for merlin-systems username and pass
 > git fetch --tags --force --progress -- /~https://github.com/NVIDIA-Merlin/core +refs/pull/111/*:refs/remotes/origin/pr/111/* # timeout=10
 > git rev-parse b89438dbfbabcca48e3c2091194796b90a23d3e1^{commit} # timeout=10
Checking out Revision b89438dbfbabcca48e3c2091194796b90a23d3e1 (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f b89438dbfbabcca48e3c2091194796b90a23d3e1 # timeout=10
Commit message: "Merge branch 'main' into schema-io-value-count"
 > git rev-list --no-walk 1cef02f30a7eb98c3d47c20365785717b952c92f # timeout=10
[merlin_core] $ /bin/bash /tmp/jenkins16288681192185688023.sh
GLOB sdist-make: /var/jenkins_home/workspace/merlin_core/core/setup.py
test-gpu recreate: /var/jenkins_home/workspace/merlin_core/core/.tox/test-gpu
test-gpu installdeps: pytest, pytest-cov
WARNING: Discarding $PYTHONPATH from environment, to override specify PYTHONPATH in 'passenv' in your configuration.
test-gpu inst: /var/jenkins_home/workspace/merlin_core/core/.tox/.tmp/package/1/merlin-core-0.8.0+27.gb89438d.zip
WARNING: Discarding $PYTHONPATH from environment, to override specify PYTHONPATH in 'passenv' in your configuration.
test-gpu installed: absl-py==1.2.0,aiohttp==3.8.1,aiosignal==1.2.0,alabaster==0.7.12,alembic==1.8.1,anyio==3.6.1,argon2-cffi==21.3.0,argon2-cffi-bindings==21.2.0,astroid==2.5.6,asttokens==2.0.8,astunparse==1.6.3,asv==0.5.1,asvdb==0.4.2,async-timeout==4.0.2,attrs==22.1.0,autopage==0.5.1,awscli==1.27.12,Babel==2.10.3,backcall==0.2.0,beautifulsoup4==4.11.1,betterproto==1.2.5,black==22.6.0,bleach==5.0.1,boto3==1.24.75,botocore==1.29.12,Brotli==1.0.9,cachetools==5.2.0,certifi==2019.11.28,cffi==1.15.1,chardet==3.0.4,charset-normalizer==2.1.1,clang==5.0,click==8.1.3,cliff==4.0.0,cloudpickle==2.2.0,cmaes==0.9.0,cmake==3.24.1.1,cmd2==2.4.2,colorama==0.4.4,colorlog==6.7.0,contourpy==1.0.5,coverage==6.5.0,cuda-python==11.7.1,cupy-cuda117==10.6.0,cycler==0.11.0,Cython==0.29.32,dask==2022.5.1,dbus-python==1.2.16,debugpy==1.6.3,decorator==5.1.1,defusedxml==0.7.1,dill==0.3.5.1,distlib==0.3.6,distributed==2022.5.1,distro==1.7.0,dm-tree==0.1.6,docker-pycreds==0.4.0,docutils==0.16,emoji==1.7.0,entrypoints==0.4,execnet==1.9.0,executing==1.0.0,faiss==1.7.2,faiss-gpu==1.7.2,fastai==2.7.9,fastapi==0.85.0,fastavro==1.6.1,fastcore==1.5.27,fastdownload==0.0.7,fastjsonschema==2.16.1,fastprogress==1.0.3,fastrlock==0.8,feast==0.19.4,fiddle==0.2.2,filelock==3.8.0,flatbuffers==1.12,fonttools==4.37.3,frozenlist==1.3.1,fsspec==2022.5.0,gast==0.4.0,gevent==21.12.0,geventhttpclient==2.0.2,gitdb==4.0.9,GitPython==3.1.27,google==3.0.0,google-api-core==2.10.1,google-auth==2.11.1,google-auth-oauthlib==0.4.6,google-pasta==0.2.0,googleapis-common-protos==1.52.0,graphviz==0.20.1,greenlet==1.1.3,grpcio==1.41.0,grpcio-channelz==1.49.0,grpcio-reflection==1.48.1,grpclib==0.4.3,h11==0.13.0,h2==4.1.0,h5py==3.7.0,HeapDict==1.0.1,horovod==0.26.1,hpack==4.0.0,httptools==0.5.0,hugectr2onnx==0.0.0,huggingface-hub==0.9.1,hyperframe==6.0.1,idna==2.8,imagesize==1.4.1,implicit==0.6.1,importlib-metadata==4.12.0,importlib-resources==5.9.0,iniconfig==1.1.1,ipykernel==6.15.3,ipython==8.5.0,ipython-genutils==0.2.0,ipywidgets==7.7.0,jedi==0.18.1,Jinja2==3.1.2,jmespath==1.0.1,joblib==1.2.0,json5==0.9.10,jsonschema==4.16.0,jupyter-cache==0.4.3,jupyter-core==4.11.1,jupyter-server==1.18.1,jupyter-server-mathjax==0.2.5,jupyter-sphinx==0.3.2,jupyter_client==7.3.5,jupyterlab==3.4.7,jupyterlab-pygments==0.2.2,jupyterlab-widgets==1.1.0,jupyterlab_server==2.15.1,keras==2.9.0,Keras-Preprocessing==1.1.2,kiwisolver==1.4.4,lazy-object-proxy==1.8.0,libclang==14.0.6,libcst==0.4.7,lightfm==1.16,lightgbm==3.3.2,linkify-it-py==1.0.3,llvmlite==0.39.1,locket==1.0.0,lxml==4.9.1,Mako==1.2.4,Markdown==3.4.1,markdown-it-py==1.1.0,MarkupSafe==2.1.1,matplotlib==3.6.0,matplotlib-inline==0.1.6,mdit-py-plugins==0.2.8,merlin-core==0.8.0+27.gb89438d,merlin-dataloader==0.0.2,merlin-models==0.7.0+11.g280956aa4,merlin-systems==0.5.0+4.g15074ad,mistune==2.0.4,mmh3==3.0.0,mpi4py==3.1.3,msgpack==1.0.4,multidict==6.0.2,mypy-extensions==0.4.3,myst-nb==0.13.2,myst-parser==0.15.2,natsort==8.1.0,nbclassic==0.4.3,nbclient==0.6.8,nbconvert==7.0.0,nbdime==3.1.1,nbformat==5.5.0,nest-asyncio==1.5.5,ninja==1.10.2.3,notebook==6.4.12,notebook-shim==0.1.0,numba==0.56.2,numpy==1.22.4,nvidia-pyindex==1.0.9,nvtabular @ 
git+/~https://github.com/NVIDIA-Merlin/NVTabular.git@21117cfc4c113b30036afcb97b6daa5f377996db,nvtx==0.2.5,oauthlib==3.2.1,oldest-supported-numpy==2022.8.16,onnx==1.12.0,onnxruntime==1.11.1,opt-einsum==3.3.0,optuna==3.0.3,packaging==21.3,pandas==1.3.5,pandavro==1.5.2,pandocfilters==1.5.0,parso==0.8.3,partd==1.3.0,pathtools==0.1.2,pbr==5.11.0,pexpect==4.8.0,pickleshare==0.7.5,Pillow==9.2.0,pkgutil_resolve_name==1.3.10,platformdirs==2.5.2,plotly==5.11.0,pluggy==1.0.0,prettytable==3.5.0,prometheus-client==0.14.1,promise==2.3,prompt-toolkit==3.0.31,proto-plus==1.19.6,protobuf==3.19.5,psutil==5.9.2,ptyprocess==0.7.0,pure-eval==0.2.2,py==1.11.0,pyarrow==7.0.0,pyasn1==0.4.8,pyasn1-modules==0.2.8,pybind11==2.10.0,pycparser==2.21,pydantic==1.10.2,pydot==1.4.2,Pygments==2.13.0,PyGObject==3.36.0,pynvml==11.4.1,pyparsing==3.0.9,pyperclip==1.8.2,pyrsistent==0.18.1,pytest==7.1.3,pytest-cov==4.0.0,pytest-xdist==3.0.2,python-apt==2.0.0+ubuntu0.20.4.8,python-dateutil==2.8.2,python-dotenv==0.21.0,python-rapidjson==1.8,pytz==2022.2.1,PyYAML==5.4.1,pyzmq==24.0.0,regex==2022.9.13,requests==2.22.0,requests-oauthlib==1.3.1,requests-unixsocket==0.2.0,rsa==4.7.2,s3fs==2022.2.0,s3transfer==0.6.0,sacremoses==0.0.53,scikit-build==0.15.0,scikit-learn==1.1.2,scipy==1.8.1,seedir==0.3.0,Send2Trash==1.8.0,sentry-sdk==1.9.8,setproctitle==1.3.2,setuptools-scm==7.0.5,shortuuid==1.0.9,six==1.15.0,sklearn==0.0,smmap==5.0.0,sniffio==1.3.0,snowballstemmer==2.2.0,sortedcontainers==2.4.0,soupsieve==2.3.2.post1,Sphinx==5.3.0,sphinx-multiversion==0.2.4,sphinx-togglebutton==0.3.1,sphinx_external_toc==0.3.0,sphinxcontrib-applehelp==1.0.2,sphinxcontrib-copydirs @ git+/~https://github.com/mikemckiernan/sphinxcontrib-copydirs.git@bd8c5d79b3f91cf5f1bb0d6995aeca3fe84b670e,sphinxcontrib-devhelp==1.0.2,sphinxcontrib-htmlhelp==2.0.0,sphinxcontrib-jsmath==1.0.1,sphinxcontrib-qthelp==1.0.3,sphinxcontrib-serializinghtml==1.1.5,SQLAlchemy==1.4.44,stack-data==0.5.0,starlette==0.20.4,stevedore==4.1.1,stringcase==1.2.0,supervisor==4.1.0,tabulate==0.8.10,tblib==1.7.0,tdqm==0.0.1,tenacity==8.0.1,tensorboard==2.9.1,tensorboard-data-server==0.6.1,tensorboard-plugin-wit==1.8.1,tensorflow==2.9.2,tensorflow-estimator==2.9.0,tensorflow-gpu==2.9.2,tensorflow-io-gcs-filesystem==0.27.0,tensorflow-metadata==1.10.0,termcolor==2.0.1,terminado==0.15.0,testbook==0.4.2,threadpoolctl==3.1.0,tinycss2==1.1.1,tokenizers==0.10.3,toml==0.10.2,tomli==2.0.1,toolz==0.12.0,torch==1.12.1+cu113,torchmetrics==0.3.2,tornado==6.2,tox==3.26.0,tqdm==4.64.1,traitlets==5.4.0,transformers==4.12.0,transformers4rec==0.1.12+2.gbcc939255,treelite==2.3.0,treelite-runtime==2.3.0,tritonclient==2.25.0,typing-inspect==0.8.0,typing_extensions==4.3.0,uc-micro-py==1.0.1,urllib3==1.26.12,uvicorn==0.18.3,uvloop==0.17.0,versioneer==0.20,virtualenv==20.16.5,wandb==0.13.3,watchfiles==0.17.0,wcwidth==0.2.5,webencodings==0.5.1,websocket-client==1.4.1,websockets==10.3,Werkzeug==2.2.2,widgetsnbextension==3.6.0,wrapt==1.12.1,xgboost==1.6.2,yarl==1.8.1,zict==2.2.0,zipp==3.8.1,zope.event==4.5.0,zope.interface==5.4.0
test-gpu run-test-pre: PYTHONHASHSEED='1669387843'
test-gpu run-test: commands[0] | python -m pytest --cov-report term --cov merlin -rxs tests/unit
============================= test session starts ==============================
platform linux -- Python 3.8.10, pytest-7.1.3, pluggy-1.0.0
cachedir: .tox/test-gpu/.pytest_cache
rootdir: /var/jenkins_home/workspace/merlin_core/core, configfile: pyproject.toml
plugins: anyio-3.5.0, xdist-3.0.2, cov-4.0.0
collected 391 items / 1 skipped

tests/unit/core/test_dispatch.py .... [ 1%]
tests/unit/core/test_protocols.py ......... [ 3%]
tests/unit/core/test_version.py . [ 3%]
tests/unit/dag/test_base_operator.py .... [ 4%]
tests/unit/dag/test_column_selector.py .............................. [ 12%]
tests/unit/dag/test_dictarray.py ... [ 13%]
tests/unit/dag/test_executors.py .. [ 13%]
tests/unit/dag/test_graph.py .... [ 14%]
tests/unit/dag/ops/test_selection.py .... [ 15%]
tests/unit/io/test_io.py ............................................... [ 27%]
................................................................ [ 43%]
tests/unit/schema/test_column_schemas.py ........................ [ 50%]
tests/unit/schema/test_schema.py ............. [ 53%]
tests/unit/schema/test_schema_io.py .................................... [ 62%]
........................................................................ [ 81%]
........................................................... [ 96%]
tests/unit/schema/test_tags.py ....... [ 97%]
tests/unit/utils/test_utils.py ........ [100%]

=============================== warnings summary ===============================
tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 6 files. Did not have enough partitions to create 7 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 8 files. Did not have enough partitions to create 9 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 8 files. Did not have enough partitions to create 10 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 8 files. Did not have enough partitions to create 11 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 12 files. Did not have enough partitions to create 13 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 12 files. Did not have enough partitions to create 14 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 12 files. Did not have enough partitions to create 15 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 12 files. Did not have enough partitions to create 16 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 12 files. Did not have enough partitions to create 17 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 12 files. Did not have enough partitions to create 18 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 12 files. Did not have enough partitions to create 19 files.
warnings.warn(

tests/unit/io/test_io.py::test_validate_and_regenerate_dataset
/var/jenkins_home/workspace/merlin_core/core/merlin/io/parquet.py:551: DeprecationWarning: 'ParquetDataset.pieces' attribute is deprecated as of pyarrow 5.0.0 and will be removed in a future version. Specify 'use_legacy_dataset=False' while constructing the ParquetDataset, and then use the '.fragments' attribute instead.
paths = [p.path for p in pa_dataset.pieces]

tests/unit/schema/test_column_schemas.py::test_column_schema_tags_normalize
tests/unit/schema/test_tags.py::test_tagset_atomizes_compound_tags
/var/jenkins_home/workspace/merlin_core/core/merlin/schema/tags.py:148: UserWarning: Compound tags like Tags.ITEM_ID have been deprecated and will be removed in a future version. Please use the atomic versions of these tags, like [<Tags.ITEM: 'item'>, <Tags.ID: 'id'>].
warnings.warn(

tests/unit/schema/test_schema_io.py::test_json_serialization_with_embedded_dicts
tests/unit/schema/test_schema_io.py::test_merlin_to_proto_to_json_to_merlin
tests/unit/schema/test_tags.py::test_tagset_atomizes_compound_tags
/var/jenkins_home/workspace/merlin_core/core/merlin/schema/tags.py:148: UserWarning: Compound tags like Tags.USER_ID have been deprecated and will be removed in a future version. Please use the atomic versions of these tags, like [<Tags.USER: 'user'>, <Tags.ID: 'id'>].
warnings.warn(

tests/unit/schema/test_tags.py::test_tagset_atomizes_compound_tags
/var/jenkins_home/workspace/merlin_core/core/merlin/schema/tags.py:148: UserWarning: Compound tags like Tags.SESSION_ID have been deprecated and will be removed in a future version. Please use the atomic versions of these tags, like [<Tags.SESSION: 'session'>, <Tags.ID: 'id'>].
warnings.warn(

tests/unit/schema/test_tags.py::test_tagset_atomizes_compound_tags
/var/jenkins_home/workspace/merlin_core/core/merlin/schema/tags.py:148: UserWarning: Compound tags like Tags.TEXT_TOKENIZED have been deprecated and will be removed in a future version. Please use the atomic versions of these tags, like [<Tags.TEXT: 'text'>, <Tags.TOKENIZED: 'tokenized'>].
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed[True-True]
/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 43481 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed[True-False]
/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 40769 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed[False-True]
/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 33277 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed[False-False]
/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 41429 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed_force[True]
/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 37233 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed_force[False]
/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 41637 instead
warnings.warn(

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html

---------- coverage: platform linux, python 3.8.10-final-0 -----------
Name Stmts Miss Cover

merlin/core/__init__.py 2 0 100%
merlin/core/_version.py 354 205 42%
merlin/core/compat.py 10 4 60%
merlin/core/dispatch.py 356 212 40%
merlin/core/protocols.py 99 45 55%
merlin/core/utils.py 195 56 71%
merlin/dag/__init__.py 5 0 100%
merlin/dag/base_operator.py 122 20 84%
merlin/dag/dictarray.py 55 15 73%
merlin/dag/executors.py 141 68 52%
merlin/dag/graph.py 99 35 65%
merlin/dag/node.py 344 161 53%
merlin/dag/ops/__init__.py 4 0 100%
merlin/dag/ops/concat_columns.py 17 4 76%
merlin/dag/ops/selection.py 22 0 100%
merlin/dag/ops/subset_columns.py 12 4 67%
merlin/dag/ops/subtraction.py 21 11 48%
merlin/dag/selector.py 101 6 94%
merlin/io/__init__.py 4 0 100%
merlin/io/avro.py 88 88 0%
merlin/io/csv.py 57 6 89%
merlin/io/dask.py 181 53 71%
merlin/io/dataframe_engine.py 61 5 92%
merlin/io/dataframe_iter.py 21 1 95%
merlin/io/dataset.py 347 54 84%
merlin/io/dataset_engine.py 37 8 78%
merlin/io/fsspec_utils.py 127 108 15%
merlin/io/hugectr.py 45 35 22%
merlin/io/parquet.py 603 69 89%
merlin/io/shuffle.py 38 12 68%
merlin/io/worker.py 80 66 18%
merlin/io/writer.py 190 52 73%
merlin/io/writer_factory.py 18 4 78%
merlin/schema/__init__.py 2 0 100%
merlin/schema/io/__init__.py 0 0 100%
merlin/schema/io/proto_utils.py 20 4 80%
merlin/schema/io/schema_bp.py 306 5 98%
merlin/schema/io/tensorflow_metadata.py 185 17 91%
merlin/schema/schema.py 220 33 85%
merlin/schema/tags.py 82 1 99%

TOTAL 4671 1467 69%

=========================== short test summary info ============================
SKIPPED [1] tests/unit/io/test_avro.py:34: could not import 'uavro': No module named 'uavro'
============ 391 passed, 1 skipped, 25 warnings in 65.24s (0:01:05) ============
___________________________________ summary ____________________________________
test-gpu: commands succeeded
congratulations :)
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA-Merlin/core/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[merlin_core] $ /bin/bash /tmp/jenkins3869258974199238942.sh

@nvidia-merlin-bot

Click to view CI Results
GitHub pull request #111 of commit 01010c52e3d54f7e9f01c49b140d36b96949a23c, no merge conflicts.
Running as SYSTEM
Setting status of 01010c52e3d54f7e9f01c49b140d36b96949a23c to PENDING with url http://merlin-infra1.nvidia.com:8080/job/merlin_core/298/ and message: 'Pending'
Using context: Jenkins
Building on master in workspace /var/jenkins_home/workspace/merlin_core
using credential ce87ff3c-94f0-400a-8303-cb4acb4918b5
 > git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > git config remote.origin.url /~https://github.com/NVIDIA-Merlin/core # timeout=10
Fetching upstream changes from /~https://github.com/NVIDIA-Merlin/core
 > git --version # timeout=10
using GIT_ASKPASS to set credentials login for merlin-systems username and pass
 > git fetch --tags --force --progress -- /~https://github.com/NVIDIA-Merlin/core +refs/pull/111/*:refs/remotes/origin/pr/111/* # timeout=10
 > git rev-parse 01010c52e3d54f7e9f01c49b140d36b96949a23c^{commit} # timeout=10
Checking out Revision 01010c52e3d54f7e9f01c49b140d36b96949a23c (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f 01010c52e3d54f7e9f01c49b140d36b96949a23c # timeout=10
Commit message: "Merge branch 'main' into schema-io-value-count"
 > git rev-list --no-walk e4045ea065917f972fe9766d1cf208e16a2eda29 # timeout=10
[merlin_core] $ /bin/bash /tmp/jenkins6196049096676985959.sh
GLOB sdist-make: /var/jenkins_home/workspace/merlin_core/core/setup.py
test-gpu recreate: /var/jenkins_home/workspace/merlin_core/core/.tox/test-gpu
test-gpu installdeps: pytest, pytest-cov
WARNING: Discarding $PYTHONPATH from environment, to override specify PYTHONPATH in 'passenv' in your configuration.
test-gpu inst: /var/jenkins_home/workspace/merlin_core/core/.tox/.tmp/package/1/merlin-core-0.8.0+29.g01010c5.zip
WARNING: Discarding $PYTHONPATH from environment, to override specify PYTHONPATH in 'passenv' in your configuration.
test-gpu installed: absl-py==1.2.0,aiohttp==3.8.1,aiosignal==1.2.0,alabaster==0.7.12,alembic==1.8.1,anyio==3.6.1,argon2-cffi==21.3.0,argon2-cffi-bindings==21.2.0,astroid==2.5.6,asttokens==2.0.8,astunparse==1.6.3,asv==0.5.1,asvdb==0.4.2,async-timeout==4.0.2,attrs==22.1.0,autopage==0.5.1,awscli==1.27.12,Babel==2.10.3,backcall==0.2.0,beautifulsoup4==4.11.1,betterproto==1.2.5,black==22.6.0,bleach==5.0.1,boto3==1.24.75,botocore==1.29.12,Brotli==1.0.9,cachetools==5.2.0,certifi==2019.11.28,cffi==1.15.1,chardet==3.0.4,charset-normalizer==2.1.1,clang==5.0,click==8.1.3,cliff==4.0.0,cloudpickle==2.2.0,cmaes==0.9.0,cmake==3.24.1.1,cmd2==2.4.2,colorama==0.4.4,colorlog==6.7.0,contourpy==1.0.5,coverage==6.5.0,cuda-python==11.7.1,cupy-cuda117==10.6.0,cycler==0.11.0,Cython==0.29.32,dask==2022.5.1,dbus-python==1.2.16,debugpy==1.6.3,decorator==5.1.1,defusedxml==0.7.1,dill==0.3.5.1,distlib==0.3.6,distributed==2022.5.1,distro==1.7.0,dm-tree==0.1.6,docker-pycreds==0.4.0,docutils==0.16,emoji==1.7.0,entrypoints==0.4,execnet==1.9.0,executing==1.0.0,faiss==1.7.2,faiss-gpu==1.7.2,fastai==2.7.9,fastapi==0.85.0,fastavro==1.6.1,fastcore==1.5.27,fastdownload==0.0.7,fastjsonschema==2.16.1,fastprogress==1.0.3,fastrlock==0.8,feast==0.19.4,fiddle==0.2.2,filelock==3.8.0,flatbuffers==1.12,fonttools==4.37.3,frozenlist==1.3.1,fsspec==2022.5.0,gast==0.4.0,gevent==21.12.0,geventhttpclient==2.0.2,gitdb==4.0.9,GitPython==3.1.27,google==3.0.0,google-api-core==2.10.1,google-auth==2.11.1,google-auth-oauthlib==0.4.6,google-pasta==0.2.0,googleapis-common-protos==1.52.0,graphviz==0.20.1,greenlet==1.1.3,grpcio==1.41.0,grpcio-channelz==1.49.0,grpcio-reflection==1.48.1,grpclib==0.4.3,h11==0.13.0,h2==4.1.0,h5py==3.7.0,HeapDict==1.0.1,horovod==0.26.1,hpack==4.0.0,httptools==0.5.0,hugectr2onnx==0.0.0,huggingface-hub==0.9.1,hyperframe==6.0.1,idna==2.8,imagesize==1.4.1,implicit==0.6.1,importlib-metadata==4.12.0,importlib-resources==5.9.0,iniconfig==1.1.1,ipykernel==6.15.3,ipython==8.5.0,ipython-genutils==0.2.0,ipywidgets==7.7.0,jedi==0.18.1,Jinja2==3.1.2,jmespath==1.0.1,joblib==1.2.0,json5==0.9.10,jsonschema==4.16.0,jupyter-cache==0.4.3,jupyter-core==4.11.1,jupyter-server==1.18.1,jupyter-server-mathjax==0.2.5,jupyter-sphinx==0.3.2,jupyter_client==7.3.5,jupyterlab==3.4.7,jupyterlab-pygments==0.2.2,jupyterlab-widgets==1.1.0,jupyterlab_server==2.15.1,keras==2.9.0,Keras-Preprocessing==1.1.2,kiwisolver==1.4.4,lazy-object-proxy==1.8.0,libclang==14.0.6,libcst==0.4.7,lightfm==1.16,lightgbm==3.3.2,linkify-it-py==1.0.3,llvmlite==0.39.1,locket==1.0.0,lxml==4.9.1,Mako==1.2.4,Markdown==3.4.1,markdown-it-py==1.1.0,MarkupSafe==2.1.1,matplotlib==3.6.0,matplotlib-inline==0.1.6,mdit-py-plugins==0.2.8,merlin-core==0.8.0+29.g01010c5,merlin-dataloader==0.0.2,merlin-models==0.7.0+11.g280956aa4,merlin-systems==0.5.0+4.g15074ad,mistune==2.0.4,mmh3==3.0.0,mpi4py==3.1.3,msgpack==1.0.4,multidict==6.0.2,mypy-extensions==0.4.3,myst-nb==0.13.2,myst-parser==0.15.2,natsort==8.1.0,nbclassic==0.4.3,nbclient==0.6.8,nbconvert==7.0.0,nbdime==3.1.1,nbformat==5.5.0,nest-asyncio==1.5.5,ninja==1.10.2.3,notebook==6.4.12,notebook-shim==0.1.0,numba==0.56.2,numpy==1.22.4,nvidia-pyindex==1.0.9,nvtabular @ 
git+/~https://github.com/NVIDIA-Merlin/NVTabular.git@21117cfc4c113b30036afcb97b6daa5f377996db,nvtx==0.2.5,oauthlib==3.2.1,oldest-supported-numpy==2022.8.16,onnx==1.12.0,onnxruntime==1.11.1,opt-einsum==3.3.0,optuna==3.0.3,packaging==21.3,pandas==1.3.5,pandavro==1.5.2,pandocfilters==1.5.0,parso==0.8.3,partd==1.3.0,pathtools==0.1.2,pbr==5.11.0,pexpect==4.8.0,pickleshare==0.7.5,Pillow==9.2.0,pkgutil_resolve_name==1.3.10,platformdirs==2.5.2,plotly==5.11.0,pluggy==1.0.0,prettytable==3.5.0,prometheus-client==0.14.1,promise==2.3,prompt-toolkit==3.0.31,proto-plus==1.19.6,protobuf==3.19.5,psutil==5.9.2,ptyprocess==0.7.0,pure-eval==0.2.2,py==1.11.0,pyarrow==7.0.0,pyasn1==0.4.8,pyasn1-modules==0.2.8,pybind11==2.10.0,pycparser==2.21,pydantic==1.10.2,pydot==1.4.2,Pygments==2.13.0,PyGObject==3.36.0,pynvml==11.4.1,pyparsing==3.0.9,pyperclip==1.8.2,pyrsistent==0.18.1,pytest==7.1.3,pytest-cov==4.0.0,pytest-xdist==3.0.2,python-apt==2.0.0+ubuntu0.20.4.8,python-dateutil==2.8.2,python-dotenv==0.21.0,python-rapidjson==1.8,pytz==2022.2.1,PyYAML==5.4.1,pyzmq==24.0.0,regex==2022.9.13,requests==2.22.0,requests-oauthlib==1.3.1,requests-unixsocket==0.2.0,rsa==4.7.2,s3fs==2022.2.0,s3transfer==0.6.0,sacremoses==0.0.53,scikit-build==0.15.0,scikit-learn==1.1.2,scipy==1.8.1,seedir==0.3.0,Send2Trash==1.8.0,sentry-sdk==1.9.8,setproctitle==1.3.2,setuptools-scm==7.0.5,shortuuid==1.0.9,six==1.15.0,sklearn==0.0,smmap==5.0.0,sniffio==1.3.0,snowballstemmer==2.2.0,sortedcontainers==2.4.0,soupsieve==2.3.2.post1,Sphinx==5.3.0,sphinx-multiversion==0.2.4,sphinx-togglebutton==0.3.1,sphinx_external_toc==0.3.0,sphinxcontrib-applehelp==1.0.2,sphinxcontrib-copydirs @ git+/~https://github.com/mikemckiernan/sphinxcontrib-copydirs.git@bd8c5d79b3f91cf5f1bb0d6995aeca3fe84b670e,sphinxcontrib-devhelp==1.0.2,sphinxcontrib-htmlhelp==2.0.0,sphinxcontrib-jsmath==1.0.1,sphinxcontrib-qthelp==1.0.3,sphinxcontrib-serializinghtml==1.1.5,SQLAlchemy==1.4.44,stack-data==0.5.0,starlette==0.20.4,stevedore==4.1.1,stringcase==1.2.0,supervisor==4.1.0,tabulate==0.8.10,tblib==1.7.0,tdqm==0.0.1,tenacity==8.0.1,tensorboard==2.9.1,tensorboard-data-server==0.6.1,tensorboard-plugin-wit==1.8.1,tensorflow==2.9.2,tensorflow-estimator==2.9.0,tensorflow-gpu==2.9.2,tensorflow-io-gcs-filesystem==0.27.0,tensorflow-metadata==1.10.0,termcolor==2.0.1,terminado==0.15.0,testbook==0.4.2,threadpoolctl==3.1.0,tinycss2==1.1.1,tokenizers==0.10.3,toml==0.10.2,tomli==2.0.1,toolz==0.12.0,torch==1.12.1+cu113,torchmetrics==0.3.2,tornado==6.2,tox==3.26.0,tqdm==4.64.1,traitlets==5.4.0,transformers==4.12.0,transformers4rec==0.1.12+2.gbcc939255,treelite==2.3.0,treelite-runtime==2.3.0,tritonclient==2.25.0,typing-inspect==0.8.0,typing_extensions==4.3.0,uc-micro-py==1.0.1,urllib3==1.26.12,uvicorn==0.18.3,uvloop==0.17.0,versioneer==0.20,virtualenv==20.16.5,wandb==0.13.3,watchfiles==0.17.0,wcwidth==0.2.5,webencodings==0.5.1,websocket-client==1.4.1,websockets==10.3,Werkzeug==2.2.2,widgetsnbextension==3.6.0,wrapt==1.12.1,xgboost==1.6.2,yarl==1.8.1,zict==2.2.0,zipp==3.8.1,zope.event==4.5.0,zope.interface==5.4.0
test-gpu run-test-pre: PYTHONHASHSEED='1555762591'
test-gpu run-test: commands[0] | python -m pytest --cov-report term --cov merlin -rxs tests/unit
============================= test session starts ==============================
platform linux -- Python 3.8.10, pytest-7.1.3, pluggy-1.0.0
cachedir: .tox/test-gpu/.pytest_cache
rootdir: /var/jenkins_home/workspace/merlin_core/core, configfile: pyproject.toml
plugins: anyio-3.5.0, xdist-3.0.2, cov-4.0.0
collected 391 items / 1 skipped

tests/unit/core/test_dispatch.py .... [ 1%]
tests/unit/core/test_protocols.py ......... [ 3%]
tests/unit/core/test_version.py . [ 3%]
tests/unit/dag/test_base_operator.py .... [ 4%]
tests/unit/dag/test_column_selector.py .............................. [ 12%]
tests/unit/dag/test_dictarray.py ... [ 13%]
tests/unit/dag/test_executors.py .. [ 13%]
tests/unit/dag/test_graph.py .... [ 14%]
tests/unit/dag/ops/test_selection.py .... [ 15%]
tests/unit/io/test_io.py ............................................... [ 27%]
................................................................ [ 43%]
tests/unit/schema/test_column_schemas.py ........................ [ 50%]
tests/unit/schema/test_schema.py ............. [ 53%]
tests/unit/schema/test_schema_io.py .................................... [ 62%]
........................................................................ [ 81%]
........................................................... [ 96%]
tests/unit/schema/test_tags.py ....... [ 97%]
tests/unit/utils/test_utils.py ........ [100%]

=============================== warnings summary ===============================
tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 6 files. Did not have enough partitions to create 7 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 8 files. Did not have enough partitions to create 9 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 8 files. Did not have enough partitions to create 10 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 8 files. Did not have enough partitions to create 11 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 12 files. Did not have enough partitions to create 13 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 12 files. Did not have enough partitions to create 14 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 12 files. Did not have enough partitions to create 15 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 12 files. Did not have enough partitions to create 16 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 12 files. Did not have enough partitions to create 17 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 12 files. Did not have enough partitions to create 18 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 12 files. Did not have enough partitions to create 19 files.
warnings.warn(

tests/unit/io/test_io.py::test_validate_and_regenerate_dataset
/var/jenkins_home/workspace/merlin_core/core/merlin/io/parquet.py:551: DeprecationWarning: 'ParquetDataset.pieces' attribute is deprecated as of pyarrow 5.0.0 and will be removed in a future version. Specify 'use_legacy_dataset=False' while constructing the ParquetDataset, and then use the '.fragments' attribute instead.
paths = [p.path for p in pa_dataset.pieces]

tests/unit/schema/test_column_schemas.py::test_column_schema_tags_normalize
tests/unit/schema/test_tags.py::test_tagset_atomizes_compound_tags
/var/jenkins_home/workspace/merlin_core/core/merlin/schema/tags.py:148: UserWarning: Compound tags like Tags.ITEM_ID have been deprecated and will be removed in a future version. Please use the atomic versions of these tags, like [<Tags.ITEM: 'item'>, <Tags.ID: 'id'>].
warnings.warn(

tests/unit/schema/test_schema_io.py::test_json_serialization_with_embedded_dicts
tests/unit/schema/test_schema_io.py::test_merlin_to_proto_to_json_to_merlin
tests/unit/schema/test_tags.py::test_tagset_atomizes_compound_tags
/var/jenkins_home/workspace/merlin_core/core/merlin/schema/tags.py:148: UserWarning: Compound tags like Tags.USER_ID have been deprecated and will be removed in a future version. Please use the atomic versions of these tags, like [<Tags.USER: 'user'>, <Tags.ID: 'id'>].
warnings.warn(

tests/unit/schema/test_tags.py::test_tagset_atomizes_compound_tags
/var/jenkins_home/workspace/merlin_core/core/merlin/schema/tags.py:148: UserWarning: Compound tags like Tags.SESSION_ID have been deprecated and will be removed in a future version. Please use the atomic versions of these tags, like [<Tags.SESSION: 'session'>, <Tags.ID: 'id'>].
warnings.warn(

tests/unit/schema/test_tags.py::test_tagset_atomizes_compound_tags
/var/jenkins_home/workspace/merlin_core/core/merlin/schema/tags.py:148: UserWarning: Compound tags like Tags.TEXT_TOKENIZED have been deprecated and will be removed in a future version. Please use the atomic versions of these tags, like [<Tags.TEXT: 'text'>, <Tags.TOKENIZED: 'tokenized'>].
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed[True-True]
/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 34829 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed[True-False]
/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 37093 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed[False-True]
/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 39711 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed[False-False]
/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 37169 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed_force[True]
/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 35723 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed_force[False]
/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 33349 instead
warnings.warn(

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html

---------- coverage: platform linux, python 3.8.10-final-0 -----------
Name Stmts Miss Cover

merlin/core/__init__.py 2 0 100%
merlin/core/_version.py 354 205 42%
merlin/core/compat.py 10 4 60%
merlin/core/dispatch.py 356 212 40%
merlin/core/protocols.py 99 45 55%
merlin/core/utils.py 195 56 71%
merlin/dag/__init__.py 5 0 100%
merlin/dag/base_operator.py 122 20 84%
merlin/dag/dictarray.py 55 15 73%
merlin/dag/executors.py 141 68 52%
merlin/dag/graph.py 99 35 65%
merlin/dag/node.py 344 161 53%
merlin/dag/ops/__init__.py 4 0 100%
merlin/dag/ops/concat_columns.py 17 4 76%
merlin/dag/ops/selection.py 22 0 100%
merlin/dag/ops/subset_columns.py 12 4 67%
merlin/dag/ops/subtraction.py 21 11 48%
merlin/dag/selector.py 101 6 94%
merlin/io/__init__.py 4 0 100%
merlin/io/avro.py 88 88 0%
merlin/io/csv.py 57 6 89%
merlin/io/dask.py 181 53 71%
merlin/io/dataframe_engine.py 61 5 92%
merlin/io/dataframe_iter.py 21 1 95%
merlin/io/dataset.py 347 54 84%
merlin/io/dataset_engine.py 37 8 78%
merlin/io/fsspec_utils.py 127 108 15%
merlin/io/hugectr.py 45 35 22%
merlin/io/parquet.py 603 69 89%
merlin/io/shuffle.py 38 12 68%
merlin/io/worker.py 80 66 18%
merlin/io/writer.py 190 52 73%
merlin/io/writer_factory.py 18 4 78%
merlin/schema/__init__.py 2 0 100%
merlin/schema/io/__init__.py 0 0 100%
merlin/schema/io/proto_utils.py 20 4 80%
merlin/schema/io/schema_bp.py 306 5 98%
merlin/schema/io/tensorflow_metadata.py 185 17 91%
merlin/schema/schema.py 220 33 85%
merlin/schema/tags.py 82 1 99%

TOTAL 4671 1467 69%

=========================== short test summary info ============================
SKIPPED [1] tests/unit/io/test_avro.py:34: could not import 'uavro': No module named 'uavro'
============ 391 passed, 1 skipped, 25 warnings in 66.13s (0:01:06) ============
___________________________________ summary ____________________________________
test-gpu: commands succeeded
congratulations :)
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA-Merlin/core/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[merlin_core] $ /bin/bash /tmp/jenkins14536671718947287545.sh

@nvidia-merlin-bot

Click to view CI Results
GitHub pull request #111 of commit f7fd2e13ff439597401c497c31aabd02c67be6b4, no merge conflicts.
Running as SYSTEM
Setting status of f7fd2e13ff439597401c497c31aabd02c67be6b4 to PENDING with url http://merlin-infra1.nvidia.com:8080/job/merlin_core/299/ and message: 'Pending'
Using context: Jenkins
Building on master in workspace /var/jenkins_home/workspace/merlin_core
using credential ce87ff3c-94f0-400a-8303-cb4acb4918b5
 > git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > git config remote.origin.url /~https://github.com/NVIDIA-Merlin/core # timeout=10
Fetching upstream changes from /~https://github.com/NVIDIA-Merlin/core
 > git --version # timeout=10
using GIT_ASKPASS to set credentials login for merlin-systems username and pass
 > git fetch --tags --force --progress -- /~https://github.com/NVIDIA-Merlin/core +refs/pull/111/*:refs/remotes/origin/pr/111/* # timeout=10
 > git rev-parse f7fd2e13ff439597401c497c31aabd02c67be6b4^{commit} # timeout=10
Checking out Revision f7fd2e13ff439597401c497c31aabd02c67be6b4 (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f f7fd2e13ff439597401c497c31aabd02c67be6b4 # timeout=10
Commit message: "Enable partial value count to be specified"
 > git rev-list --no-walk 01010c52e3d54f7e9f01c49b140d36b96949a23c # timeout=10
[merlin_core] $ /bin/bash /tmp/jenkins11277348768107741401.sh
GLOB sdist-make: /var/jenkins_home/workspace/merlin_core/core/setup.py
test-gpu recreate: /var/jenkins_home/workspace/merlin_core/core/.tox/test-gpu
test-gpu installdeps: pytest, pytest-cov
WARNING: Discarding $PYTHONPATH from environment, to override specify PYTHONPATH in 'passenv' in your configuration.
test-gpu inst: /var/jenkins_home/workspace/merlin_core/core/.tox/.tmp/package/1/merlin-core-0.8.0+33.gf7fd2e1.zip
WARNING: Discarding $PYTHONPATH from environment, to override specify PYTHONPATH in 'passenv' in your configuration.
test-gpu installed: absl-py==1.2.0,aiohttp==3.8.1,aiosignal==1.2.0,alabaster==0.7.12,alembic==1.8.1,anyio==3.6.1,argon2-cffi==21.3.0,argon2-cffi-bindings==21.2.0,astroid==2.5.6,asttokens==2.0.8,astunparse==1.6.3,asv==0.5.1,asvdb==0.4.2,async-timeout==4.0.2,attrs==22.1.0,autopage==0.5.1,awscli==1.27.12,Babel==2.10.3,backcall==0.2.0,beautifulsoup4==4.11.1,betterproto==1.2.5,black==22.6.0,bleach==5.0.1,boto3==1.24.75,botocore==1.29.12,Brotli==1.0.9,cachetools==5.2.0,certifi==2019.11.28,cffi==1.15.1,chardet==3.0.4,charset-normalizer==2.1.1,clang==5.0,click==8.1.3,cliff==4.0.0,cloudpickle==2.2.0,cmaes==0.9.0,cmake==3.24.1.1,cmd2==2.4.2,colorama==0.4.4,colorlog==6.7.0,contourpy==1.0.5,coverage==6.5.0,cuda-python==11.7.1,cupy-cuda117==10.6.0,cycler==0.11.0,Cython==0.29.32,dask==2022.5.1,dbus-python==1.2.16,debugpy==1.6.3,decorator==5.1.1,defusedxml==0.7.1,dill==0.3.5.1,distlib==0.3.6,distributed==2022.5.1,distro==1.7.0,dm-tree==0.1.6,docker-pycreds==0.4.0,docutils==0.16,emoji==1.7.0,entrypoints==0.4,execnet==1.9.0,executing==1.0.0,faiss==1.7.2,faiss-gpu==1.7.2,fastai==2.7.9,fastapi==0.85.0,fastavro==1.6.1,fastcore==1.5.27,fastdownload==0.0.7,fastjsonschema==2.16.1,fastprogress==1.0.3,fastrlock==0.8,feast==0.19.4,fiddle==0.2.2,filelock==3.8.0,flatbuffers==1.12,fonttools==4.37.3,frozenlist==1.3.1,fsspec==2022.5.0,gast==0.4.0,gevent==21.12.0,geventhttpclient==2.0.2,gitdb==4.0.9,GitPython==3.1.27,google==3.0.0,google-api-core==2.10.1,google-auth==2.11.1,google-auth-oauthlib==0.4.6,google-pasta==0.2.0,googleapis-common-protos==1.52.0,graphviz==0.20.1,greenlet==1.1.3,grpcio==1.41.0,grpcio-channelz==1.49.0,grpcio-reflection==1.48.1,grpclib==0.4.3,h11==0.13.0,h2==4.1.0,h5py==3.7.0,HeapDict==1.0.1,horovod==0.26.1,hpack==4.0.0,httptools==0.5.0,hugectr2onnx==0.0.0,huggingface-hub==0.9.1,hyperframe==6.0.1,idna==2.8,imagesize==1.4.1,implicit==0.6.1,importlib-metadata==4.12.0,importlib-resources==5.9.0,iniconfig==1.1.1,ipykernel==6.15.3,ipython==8.5.0,ipython-genutils==0.2.0,ipywidgets==7.7.0,jedi==0.18.1,Jinja2==3.1.2,jmespath==1.0.1,joblib==1.2.0,json5==0.9.10,jsonschema==4.16.0,jupyter-cache==0.4.3,jupyter-core==4.11.1,jupyter-server==1.18.1,jupyter-server-mathjax==0.2.5,jupyter-sphinx==0.3.2,jupyter_client==7.3.5,jupyterlab==3.4.7,jupyterlab-pygments==0.2.2,jupyterlab-widgets==1.1.0,jupyterlab_server==2.15.1,keras==2.9.0,Keras-Preprocessing==1.1.2,kiwisolver==1.4.4,lazy-object-proxy==1.8.0,libclang==14.0.6,libcst==0.4.7,lightfm==1.16,lightgbm==3.3.2,linkify-it-py==1.0.3,llvmlite==0.39.1,locket==1.0.0,lxml==4.9.1,Mako==1.2.4,Markdown==3.4.1,markdown-it-py==1.1.0,MarkupSafe==2.1.1,matplotlib==3.6.0,matplotlib-inline==0.1.6,mdit-py-plugins==0.2.8,merlin-core==0.8.0+33.gf7fd2e1,merlin-dataloader==0.0.2,merlin-models==0.7.0+11.g280956aa4,merlin-systems==0.5.0+4.g15074ad,mistune==2.0.4,mmh3==3.0.0,mpi4py==3.1.3,msgpack==1.0.4,multidict==6.0.2,mypy-extensions==0.4.3,myst-nb==0.13.2,myst-parser==0.15.2,natsort==8.1.0,nbclassic==0.4.3,nbclient==0.6.8,nbconvert==7.0.0,nbdime==3.1.1,nbformat==5.5.0,nest-asyncio==1.5.5,ninja==1.10.2.3,notebook==6.4.12,notebook-shim==0.1.0,numba==0.56.2,numpy==1.22.4,nvidia-pyindex==1.0.9,nvtabular @ 
git+/~https://github.com/NVIDIA-Merlin/NVTabular.git@21117cfc4c113b30036afcb97b6daa5f377996db,nvtx==0.2.5,oauthlib==3.2.1,oldest-supported-numpy==2022.8.16,onnx==1.12.0,onnxruntime==1.11.1,opt-einsum==3.3.0,optuna==3.0.3,packaging==21.3,pandas==1.3.5,pandavro==1.5.2,pandocfilters==1.5.0,parso==0.8.3,partd==1.3.0,pathtools==0.1.2,pbr==5.11.0,pexpect==4.8.0,pickleshare==0.7.5,Pillow==9.2.0,pkgutil_resolve_name==1.3.10,platformdirs==2.5.2,plotly==5.11.0,pluggy==1.0.0,prettytable==3.5.0,prometheus-client==0.14.1,promise==2.3,prompt-toolkit==3.0.31,proto-plus==1.19.6,protobuf==3.19.5,psutil==5.9.2,ptyprocess==0.7.0,pure-eval==0.2.2,py==1.11.0,pyarrow==7.0.0,pyasn1==0.4.8,pyasn1-modules==0.2.8,pybind11==2.10.0,pycparser==2.21,pydantic==1.10.2,pydot==1.4.2,Pygments==2.13.0,PyGObject==3.36.0,pynvml==11.4.1,pyparsing==3.0.9,pyperclip==1.8.2,pyrsistent==0.18.1,pytest==7.1.3,pytest-cov==4.0.0,pytest-xdist==3.0.2,python-apt==2.0.0+ubuntu0.20.4.8,python-dateutil==2.8.2,python-dotenv==0.21.0,python-rapidjson==1.8,pytz==2022.2.1,PyYAML==5.4.1,pyzmq==24.0.0,regex==2022.9.13,requests==2.22.0,requests-oauthlib==1.3.1,requests-unixsocket==0.2.0,rsa==4.7.2,s3fs==2022.2.0,s3transfer==0.6.0,sacremoses==0.0.53,scikit-build==0.15.0,scikit-learn==1.1.2,scipy==1.8.1,seedir==0.3.0,Send2Trash==1.8.0,sentry-sdk==1.9.8,setproctitle==1.3.2,setuptools-scm==7.0.5,shortuuid==1.0.9,six==1.15.0,sklearn==0.0,smmap==5.0.0,sniffio==1.3.0,snowballstemmer==2.2.0,sortedcontainers==2.4.0,soupsieve==2.3.2.post1,Sphinx==5.3.0,sphinx-multiversion==0.2.4,sphinx-togglebutton==0.3.1,sphinx_external_toc==0.3.0,sphinxcontrib-applehelp==1.0.2,sphinxcontrib-copydirs @ git+/~https://github.com/mikemckiernan/sphinxcontrib-copydirs.git@bd8c5d79b3f91cf5f1bb0d6995aeca3fe84b670e,sphinxcontrib-devhelp==1.0.2,sphinxcontrib-htmlhelp==2.0.0,sphinxcontrib-jsmath==1.0.1,sphinxcontrib-qthelp==1.0.3,sphinxcontrib-serializinghtml==1.1.5,SQLAlchemy==1.4.44,stack-data==0.5.0,starlette==0.20.4,stevedore==4.1.1,stringcase==1.2.0,supervisor==4.1.0,tabulate==0.8.10,tblib==1.7.0,tdqm==0.0.1,tenacity==8.0.1,tensorboard==2.9.1,tensorboard-data-server==0.6.1,tensorboard-plugin-wit==1.8.1,tensorflow==2.9.2,tensorflow-estimator==2.9.0,tensorflow-gpu==2.9.2,tensorflow-io-gcs-filesystem==0.27.0,tensorflow-metadata==1.10.0,termcolor==2.0.1,terminado==0.15.0,testbook==0.4.2,threadpoolctl==3.1.0,tinycss2==1.1.1,tokenizers==0.10.3,toml==0.10.2,tomli==2.0.1,toolz==0.12.0,torch==1.12.1+cu113,torchmetrics==0.3.2,tornado==6.2,tox==3.26.0,tqdm==4.64.1,traitlets==5.4.0,transformers==4.12.0,transformers4rec==0.1.12+2.gbcc939255,treelite==2.3.0,treelite-runtime==2.3.0,tritonclient==2.25.0,typing-inspect==0.8.0,typing_extensions==4.3.0,uc-micro-py==1.0.1,urllib3==1.26.12,uvicorn==0.18.3,uvloop==0.17.0,versioneer==0.20,virtualenv==20.16.5,wandb==0.13.3,watchfiles==0.17.0,wcwidth==0.2.5,webencodings==0.5.1,websocket-client==1.4.1,websockets==10.3,Werkzeug==2.2.2,widgetsnbextension==3.6.0,wrapt==1.12.1,xgboost==1.6.2,yarl==1.8.1,zict==2.2.0,zipp==3.8.1,zope.event==4.5.0,zope.interface==5.4.0
test-gpu run-test-pre: PYTHONHASHSEED='1910238395'
test-gpu run-test: commands[0] | python -m pytest --cov-report term --cov merlin -rxs tests/unit
============================= test session starts ==============================
platform linux -- Python 3.8.10, pytest-7.1.3, pluggy-1.0.0
cachedir: .tox/test-gpu/.pytest_cache
rootdir: /var/jenkins_home/workspace/merlin_core/core, configfile: pyproject.toml
plugins: anyio-3.5.0, xdist-3.0.2, cov-4.0.0
collected 393 items / 1 skipped

tests/unit/core/test_dispatch.py .... [ 1%]
tests/unit/core/test_protocols.py ......... [ 3%]
tests/unit/core/test_version.py . [ 3%]
tests/unit/dag/test_base_operator.py .... [ 4%]
tests/unit/dag/test_column_selector.py .............................. [ 12%]
tests/unit/dag/test_dictarray.py ... [ 12%]
tests/unit/dag/test_executors.py .. [ 13%]
tests/unit/dag/test_graph.py .... [ 14%]
tests/unit/dag/ops/test_selection.py .... [ 15%]
tests/unit/io/test_io.py ............................................... [ 27%]
................................................................ [ 43%]
tests/unit/schema/test_column_schemas.py ........................... [ 50%]
tests/unit/schema/test_schema.py ............. [ 53%]
tests/unit/schema/test_schema_io.py .................................... [ 63%]
........................................................................ [ 81%]
.......................................................... [ 96%]
tests/unit/schema/test_tags.py ....... [ 97%]
tests/unit/utils/test_utils.py ........ [100%]

=============================== warnings summary ===============================
tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 6 files. Did not have enough partitions to create 7 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 8 files. Did not have enough partitions to create 9 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 8 files. Did not have enough partitions to create 10 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 8 files. Did not have enough partitions to create 11 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 12 files. Did not have enough partitions to create 13 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 12 files. Did not have enough partitions to create 14 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 12 files. Did not have enough partitions to create 15 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 12 files. Did not have enough partitions to create 16 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 12 files. Did not have enough partitions to create 17 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 12 files. Did not have enough partitions to create 18 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 12 files. Did not have enough partitions to create 19 files.
warnings.warn(

tests/unit/io/test_io.py::test_validate_and_regenerate_dataset
/var/jenkins_home/workspace/merlin_core/core/merlin/io/parquet.py:551: DeprecationWarning: 'ParquetDataset.pieces' attribute is deprecated as of pyarrow 5.0.0 and will be removed in a future version. Specify 'use_legacy_dataset=False' while constructing the ParquetDataset, and then use the '.fragments' attribute instead.
paths = [p.path for p in pa_dataset.pieces]

tests/unit/schema/test_column_schemas.py::test_column_schema_tags_normalize
tests/unit/schema/test_tags.py::test_tagset_atomizes_compound_tags
/var/jenkins_home/workspace/merlin_core/core/merlin/schema/tags.py:148: UserWarning: Compound tags like Tags.ITEM_ID have been deprecated and will be removed in a future version. Please use the atomic versions of these tags, like [<Tags.ITEM: 'item'>, <Tags.ID: 'id'>].
warnings.warn(

tests/unit/schema/test_schema_io.py::test_json_serialization_with_embedded_dicts
tests/unit/schema/test_schema_io.py::test_merlin_to_proto_to_json_to_merlin
tests/unit/schema/test_tags.py::test_tagset_atomizes_compound_tags
/var/jenkins_home/workspace/merlin_core/core/merlin/schema/tags.py:148: UserWarning: Compound tags like Tags.USER_ID have been deprecated and will be removed in a future version. Please use the atomic versions of these tags, like [<Tags.USER: 'user'>, <Tags.ID: 'id'>].
warnings.warn(

tests/unit/schema/test_tags.py::test_tagset_atomizes_compound_tags
/var/jenkins_home/workspace/merlin_core/core/merlin/schema/tags.py:148: UserWarning: Compound tags like Tags.SESSION_ID have been deprecated and will be removed in a future version. Please use the atomic versions of these tags, like [<Tags.SESSION: 'session'>, <Tags.ID: 'id'>].
warnings.warn(

tests/unit/schema/test_tags.py::test_tagset_atomizes_compound_tags
/var/jenkins_home/workspace/merlin_core/core/merlin/schema/tags.py:148: UserWarning: Compound tags like Tags.TEXT_TOKENIZED have been deprecated and will be removed in a future version. Please use the atomic versions of these tags, like [<Tags.TEXT: 'text'>, <Tags.TOKENIZED: 'tokenized'>].
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed[True-True]
/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 34213 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed[True-False]
/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 37149 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed[False-True]
/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 45569 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed[False-False]
/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 45349 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed_force[True]
/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 39409 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed_force[False]
/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 44687 instead
warnings.warn(

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html

---------- coverage: platform linux, python 3.8.10-final-0 -----------
Name Stmts Miss Cover

merlin/core/__init__.py 2 0 100%
merlin/core/_version.py 354 205 42%
merlin/core/compat.py 10 4 60%
merlin/core/dispatch.py 356 212 40%
merlin/core/protocols.py 99 45 55%
merlin/core/utils.py 195 56 71%
merlin/dag/__init__.py 5 0 100%
merlin/dag/base_operator.py 122 20 84%
merlin/dag/dictarray.py 55 15 73%
merlin/dag/executors.py 141 68 52%
merlin/dag/graph.py 99 35 65%
merlin/dag/node.py 344 161 53%
merlin/dag/ops/__init__.py 4 0 100%
merlin/dag/ops/concat_columns.py 17 4 76%
merlin/dag/ops/selection.py 22 0 100%
merlin/dag/ops/subset_columns.py 12 4 67%
merlin/dag/ops/subtraction.py 21 11 48%
merlin/dag/selector.py 101 6 94%
merlin/io/__init__.py 4 0 100%
merlin/io/avro.py 88 88 0%
merlin/io/csv.py 57 6 89%
merlin/io/dask.py 181 53 71%
merlin/io/dataframe_engine.py 61 5 92%
merlin/io/dataframe_iter.py 21 1 95%
merlin/io/dataset.py 347 54 84%
merlin/io/dataset_engine.py 37 8 78%
merlin/io/fsspec_utils.py 127 108 15%
merlin/io/hugectr.py 45 35 22%
merlin/io/parquet.py 603 69 89%
merlin/io/shuffle.py 38 12 68%
merlin/io/worker.py 80 66 18%
merlin/io/writer.py 190 52 73%
merlin/io/writer_factory.py 18 4 78%
merlin/schema/__init__.py 2 0 100%
merlin/schema/io/__init__.py 0 0 100%
merlin/schema/io/proto_utils.py 20 4 80%
merlin/schema/io/schema_bp.py 306 5 98%
merlin/schema/io/tensorflow_metadata.py 185 17 91%
merlin/schema/schema.py 224 33 85%
merlin/schema/tags.py 82 1 99%

TOTAL 4675 1467 69%

=========================== short test summary info ============================
SKIPPED [1] tests/unit/io/test_avro.py:34: could not import 'uavro': No module named 'uavro'
============ 393 passed, 1 skipped, 25 warnings in 64.97s (0:01:04) ============
___________________________________ summary ____________________________________
test-gpu: commands succeeded
congratulations :)
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA-Merlin/core/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[merlin_core] $ /bin/bash /tmp/jenkins6043518755805008060.sh

@nvidia-merlin-bot

Click to view CI Results
GitHub pull request #111 of commit c73f7bba9037f42b4518978bb5c519d16636eb98, no merge conflicts.
Running as SYSTEM
Setting status of c73f7bba9037f42b4518978bb5c519d16636eb98 to PENDING with url http://merlin-infra1.nvidia.com:8080/job/merlin_core/300/ and message: 'Pending'
Using context: Jenkins
Building on master in workspace /var/jenkins_home/workspace/merlin_core
using credential ce87ff3c-94f0-400a-8303-cb4acb4918b5
 > git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > git config remote.origin.url /~https://github.com/NVIDIA-Merlin/core # timeout=10
Fetching upstream changes from /~https://github.com/NVIDIA-Merlin/core
 > git --version # timeout=10
using GIT_ASKPASS to set credentials login for merlin-systems username and pass
 > git fetch --tags --force --progress -- /~https://github.com/NVIDIA-Merlin/core +refs/pull/111/*:refs/remotes/origin/pr/111/* # timeout=10
 > git rev-parse c73f7bba9037f42b4518978bb5c519d16636eb98^{commit} # timeout=10
Checking out Revision c73f7bba9037f42b4518978bb5c519d16636eb98 (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f c73f7bba9037f42b4518978bb5c519d16636eb98 # timeout=10
Commit message: "Add test for specifying only max value count and fix deserialization"
 > git rev-list --no-walk f7fd2e13ff439597401c497c31aabd02c67be6b4 # timeout=10
[merlin_core] $ /bin/bash /tmp/jenkins1730831325243427305.sh
GLOB sdist-make: /var/jenkins_home/workspace/merlin_core/core/setup.py
test-gpu recreate: /var/jenkins_home/workspace/merlin_core/core/.tox/test-gpu
test-gpu installdeps: pytest, pytest-cov
WARNING: Discarding $PYTHONPATH from environment, to override specify PYTHONPATH in 'passenv' in your configuration.
test-gpu inst: /var/jenkins_home/workspace/merlin_core/core/.tox/.tmp/package/1/merlin-core-0.8.0+34.gc73f7bb.zip
WARNING: Discarding $PYTHONPATH from environment, to override specify PYTHONPATH in 'passenv' in your configuration.
test-gpu installed: absl-py==1.2.0,aiohttp==3.8.1,aiosignal==1.2.0,alabaster==0.7.12,alembic==1.8.1,anyio==3.6.1,argon2-cffi==21.3.0,argon2-cffi-bindings==21.2.0,astroid==2.5.6,asttokens==2.0.8,astunparse==1.6.3,asv==0.5.1,asvdb==0.4.2,async-timeout==4.0.2,attrs==22.1.0,autopage==0.5.1,awscli==1.27.12,Babel==2.10.3,backcall==0.2.0,beautifulsoup4==4.11.1,betterproto==1.2.5,black==22.6.0,bleach==5.0.1,boto3==1.24.75,botocore==1.29.12,Brotli==1.0.9,cachetools==5.2.0,certifi==2019.11.28,cffi==1.15.1,chardet==3.0.4,charset-normalizer==2.1.1,clang==5.0,click==8.1.3,cliff==4.0.0,cloudpickle==2.2.0,cmaes==0.9.0,cmake==3.24.1.1,cmd2==2.4.2,colorama==0.4.4,colorlog==6.7.0,contourpy==1.0.5,coverage==6.5.0,cuda-python==11.7.1,cupy-cuda117==10.6.0,cycler==0.11.0,Cython==0.29.32,dask==2022.5.1,dbus-python==1.2.16,debugpy==1.6.3,decorator==5.1.1,defusedxml==0.7.1,dill==0.3.5.1,distlib==0.3.6,distributed==2022.5.1,distro==1.7.0,dm-tree==0.1.6,docker-pycreds==0.4.0,docutils==0.16,emoji==1.7.0,entrypoints==0.4,execnet==1.9.0,executing==1.0.0,faiss==1.7.2,faiss-gpu==1.7.2,fastai==2.7.9,fastapi==0.85.0,fastavro==1.6.1,fastcore==1.5.27,fastdownload==0.0.7,fastjsonschema==2.16.1,fastprogress==1.0.3,fastrlock==0.8,feast==0.19.4,fiddle==0.2.2,filelock==3.8.0,flatbuffers==1.12,fonttools==4.37.3,frozenlist==1.3.1,fsspec==2022.5.0,gast==0.4.0,gevent==21.12.0,geventhttpclient==2.0.2,gitdb==4.0.9,GitPython==3.1.27,google==3.0.0,google-api-core==2.10.1,google-auth==2.11.1,google-auth-oauthlib==0.4.6,google-pasta==0.2.0,googleapis-common-protos==1.52.0,graphviz==0.20.1,greenlet==1.1.3,grpcio==1.41.0,grpcio-channelz==1.49.0,grpcio-reflection==1.48.1,grpclib==0.4.3,h11==0.13.0,h2==4.1.0,h5py==3.7.0,HeapDict==1.0.1,horovod==0.26.1,hpack==4.0.0,httptools==0.5.0,hugectr2onnx==0.0.0,huggingface-hub==0.9.1,hyperframe==6.0.1,idna==2.8,imagesize==1.4.1,implicit==0.6.1,importlib-metadata==4.12.0,importlib-resources==5.9.0,iniconfig==1.1.1,ipykernel==6.15.3,ipython==8.5.0,ipython-genutils==0.2.0,ipywidgets==7.7.0,jedi==0.18.1,Jinja2==3.1.2,jmespath==1.0.1,joblib==1.2.0,json5==0.9.10,jsonschema==4.16.0,jupyter-cache==0.4.3,jupyter-core==4.11.1,jupyter-server==1.18.1,jupyter-server-mathjax==0.2.5,jupyter-sphinx==0.3.2,jupyter_client==7.3.5,jupyterlab==3.4.7,jupyterlab-pygments==0.2.2,jupyterlab-widgets==1.1.0,jupyterlab_server==2.15.1,keras==2.9.0,Keras-Preprocessing==1.1.2,kiwisolver==1.4.4,lazy-object-proxy==1.8.0,libclang==14.0.6,libcst==0.4.7,lightfm==1.16,lightgbm==3.3.2,linkify-it-py==1.0.3,llvmlite==0.39.1,locket==1.0.0,lxml==4.9.1,Mako==1.2.4,Markdown==3.4.1,markdown-it-py==1.1.0,MarkupSafe==2.1.1,matplotlib==3.6.0,matplotlib-inline==0.1.6,mdit-py-plugins==0.2.8,merlin-core==0.8.0+34.gc73f7bb,merlin-dataloader==0.0.2,merlin-models==0.7.0+11.g280956aa4,merlin-systems==0.5.0+4.g15074ad,mistune==2.0.4,mmh3==3.0.0,mpi4py==3.1.3,msgpack==1.0.4,multidict==6.0.2,mypy-extensions==0.4.3,myst-nb==0.13.2,myst-parser==0.15.2,natsort==8.1.0,nbclassic==0.4.3,nbclient==0.6.8,nbconvert==7.0.0,nbdime==3.1.1,nbformat==5.5.0,nest-asyncio==1.5.5,ninja==1.10.2.3,notebook==6.4.12,notebook-shim==0.1.0,numba==0.56.2,numpy==1.22.4,nvidia-pyindex==1.0.9,nvtabular @ 
git+/~https://github.com/NVIDIA-Merlin/NVTabular.git@21117cfc4c113b30036afcb97b6daa5f377996db,nvtx==0.2.5,oauthlib==3.2.1,oldest-supported-numpy==2022.8.16,onnx==1.12.0,onnxruntime==1.11.1,opt-einsum==3.3.0,optuna==3.0.3,packaging==21.3,pandas==1.3.5,pandavro==1.5.2,pandocfilters==1.5.0,parso==0.8.3,partd==1.3.0,pathtools==0.1.2,pbr==5.11.0,pexpect==4.8.0,pickleshare==0.7.5,Pillow==9.2.0,pkgutil_resolve_name==1.3.10,platformdirs==2.5.2,plotly==5.11.0,pluggy==1.0.0,prettytable==3.5.0,prometheus-client==0.14.1,promise==2.3,prompt-toolkit==3.0.31,proto-plus==1.19.6,protobuf==3.19.5,psutil==5.9.2,ptyprocess==0.7.0,pure-eval==0.2.2,py==1.11.0,pyarrow==7.0.0,pyasn1==0.4.8,pyasn1-modules==0.2.8,pybind11==2.10.0,pycparser==2.21,pydantic==1.10.2,pydot==1.4.2,Pygments==2.13.0,PyGObject==3.36.0,pynvml==11.4.1,pyparsing==3.0.9,pyperclip==1.8.2,pyrsistent==0.18.1,pytest==7.1.3,pytest-cov==4.0.0,pytest-xdist==3.0.2,python-apt==2.0.0+ubuntu0.20.4.8,python-dateutil==2.8.2,python-dotenv==0.21.0,python-rapidjson==1.8,pytz==2022.2.1,PyYAML==5.4.1,pyzmq==24.0.0,regex==2022.9.13,requests==2.22.0,requests-oauthlib==1.3.1,requests-unixsocket==0.2.0,rsa==4.7.2,s3fs==2022.2.0,s3transfer==0.6.0,sacremoses==0.0.53,scikit-build==0.15.0,scikit-learn==1.1.2,scipy==1.8.1,seedir==0.3.0,Send2Trash==1.8.0,sentry-sdk==1.9.8,setproctitle==1.3.2,setuptools-scm==7.0.5,shortuuid==1.0.9,six==1.15.0,sklearn==0.0,smmap==5.0.0,sniffio==1.3.0,snowballstemmer==2.2.0,sortedcontainers==2.4.0,soupsieve==2.3.2.post1,Sphinx==5.3.0,sphinx-multiversion==0.2.4,sphinx-togglebutton==0.3.1,sphinx_external_toc==0.3.0,sphinxcontrib-applehelp==1.0.2,sphinxcontrib-copydirs @ git+/~https://github.com/mikemckiernan/sphinxcontrib-copydirs.git@bd8c5d79b3f91cf5f1bb0d6995aeca3fe84b670e,sphinxcontrib-devhelp==1.0.2,sphinxcontrib-htmlhelp==2.0.0,sphinxcontrib-jsmath==1.0.1,sphinxcontrib-qthelp==1.0.3,sphinxcontrib-serializinghtml==1.1.5,SQLAlchemy==1.4.44,stack-data==0.5.0,starlette==0.20.4,stevedore==4.1.1,stringcase==1.2.0,supervisor==4.1.0,tabulate==0.8.10,tblib==1.7.0,tdqm==0.0.1,tenacity==8.0.1,tensorboard==2.9.1,tensorboard-data-server==0.6.1,tensorboard-plugin-wit==1.8.1,tensorflow==2.9.2,tensorflow-estimator==2.9.0,tensorflow-gpu==2.9.2,tensorflow-io-gcs-filesystem==0.27.0,tensorflow-metadata==1.10.0,termcolor==2.0.1,terminado==0.15.0,testbook==0.4.2,threadpoolctl==3.1.0,tinycss2==1.1.1,tokenizers==0.10.3,toml==0.10.2,tomli==2.0.1,toolz==0.12.0,torch==1.12.1+cu113,torchmetrics==0.3.2,tornado==6.2,tox==3.26.0,tqdm==4.64.1,traitlets==5.4.0,transformers==4.12.0,transformers4rec==0.1.12+2.gbcc939255,treelite==2.3.0,treelite-runtime==2.3.0,tritonclient==2.25.0,typing-inspect==0.8.0,typing_extensions==4.3.0,uc-micro-py==1.0.1,urllib3==1.26.12,uvicorn==0.18.3,uvloop==0.17.0,versioneer==0.20,virtualenv==20.16.5,wandb==0.13.3,watchfiles==0.17.0,wcwidth==0.2.5,webencodings==0.5.1,websocket-client==1.4.1,websockets==10.3,Werkzeug==2.2.2,widgetsnbextension==3.6.0,wrapt==1.12.1,xgboost==1.6.2,yarl==1.8.1,zict==2.2.0,zipp==3.8.1,zope.event==4.5.0,zope.interface==5.4.0
test-gpu run-test-pre: PYTHONHASHSEED='3690212739'
test-gpu run-test: commands[0] | python -m pytest --cov-report term --cov merlin -rxs tests/unit
============================= test session starts ==============================
platform linux -- Python 3.8.10, pytest-7.1.3, pluggy-1.0.0
cachedir: .tox/test-gpu/.pytest_cache
rootdir: /var/jenkins_home/workspace/merlin_core/core, configfile: pyproject.toml
plugins: anyio-3.5.0, xdist-3.0.2, cov-4.0.0
collected 394 items / 1 skipped

tests/unit/core/test_dispatch.py .... [ 1%]
tests/unit/core/test_protocols.py ......... [ 3%]
tests/unit/core/test_version.py . [ 3%]
tests/unit/dag/test_base_operator.py .... [ 4%]
tests/unit/dag/test_column_selector.py .............................. [ 12%]
tests/unit/dag/test_dictarray.py ... [ 12%]
tests/unit/dag/test_executors.py .. [ 13%]
tests/unit/dag/test_graph.py .... [ 14%]
tests/unit/dag/ops/test_selection.py .... [ 15%]
tests/unit/io/test_io.py ............................................... [ 27%]
................................................................ [ 43%]
tests/unit/schema/test_column_schemas.py ........................... [ 50%]
tests/unit/schema/test_schema.py ............. [ 53%]
tests/unit/schema/test_schema_io.py ....F............................... [ 62%]
........................................................................ [ 81%]
........................................................... [ 96%]
tests/unit/schema/test_tags.py ....... [ 97%]
tests/unit/utils/test_utils.py ........ [100%]

=================================== FAILURES ===================================
___________________ test_value_count[value_count2-True-True] ___________________

value_count = {'max': 5}, expected_is_list = True, expected_is_ragged = True

@pytest.mark.parametrize(
    ["value_count", "expected_is_list", "expected_is_ragged"],
    [
        [{"min": 1, "max": 1}, True, False],
        [{"min": 1, "max": 2}, True, True],
        [{"max": 5}, True, True],
    ],
)
def test_value_count(value_count, expected_is_list, expected_is_ragged):
    schema = Schema(
        [
            ColumnSchema(
                "example",
                is_list=True,
                properties={
                    "value_count": value_count,
                },
            )
        ]
    )
    assert schema["example"].is_list == expected_is_list
    assert schema["example"].is_ragged == expected_is_ragged

    json_schema = TensorflowMetadata.from_merlin_schema(schema).to_json()
>       output_schema = TensorflowMetadata.from_json(json_schema).to_merlin_schema()

tests/unit/schema/test_schema_io.py:102:


merlin/schema/io/tensorflow_metadata.py:202: in to_merlin_schema
col_schema = _merlin_column(feature)
merlin/schema/io/tensorflow_metadata.py:405: in _merlin_column
properties = _merlin_properties(feature)


feature = Feature(name='example', deprecated=False, presence=FeaturePresence(min_fraction=0.0, min_count=0), group_presence=Feat...0.0)), in_environment=[], not_in_environment=[], lifecycle_stage=0, unique_constraints=UniqueConstraints(min=0, max=0))

def _merlin_properties(feature):
    extra_metadata = feature.annotation.extra_metadata
    if len(extra_metadata) > 1:
        raise ValueError(
            f"{feature.name}: extra_metadata should have 1 item, has \
            {len(feature.annotation.extra_metadata)}"
        )
    elif len(extra_metadata) == 1:
        properties = feature.annotation.extra_metadata[0].value

        if isinstance(properties, bytes):
            properties = schema_bp.Any(value=properties).to_dict()

    else:
        properties = {}

    domain = _merlin_domain(feature)

    if domain:
        properties["domain"] = domain

    value_count = _merlin_value_count(feature)

    if value_count:
        properties["value_count"] = value_count
      properties["is_list"] = value_count.get("min") > 0

E TypeError: '>' not supported between instances of 'NoneType' and 'int'

merlin/schema/io/tensorflow_metadata.py:365: TypeError
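
Note on the failure above: `value_count.get("min")` returns `None` when the serialized schema only specifies `max`, and comparing `None` with an `int` raises the `TypeError` shown. A minimal standalone sketch of the failure mode, plus one defensive way to infer the flags by defaulting missing bounds to 0 (the `infer_flags` helper is hypothetical, for illustration only, and not the fix applied in this PR):

# Reproduces the TypeError from the CI log: a partial value_count
# that only specifies `max` has no "min" key, so .get() returns None.
value_count = {"max": 5}
try:
    value_count.get("min") > 0
except TypeError as exc:
    print(exc)  # '>' not supported between instances of 'NoneType' and 'int'

# Hypothetical defensive helper: default missing bounds to 0 before comparing.
def infer_flags(value_count):
    vc_min = value_count.get("min", 0)
    vc_max = value_count.get("max", 0)
    is_list = vc_max > 0         # any positive max implies a list column
    is_ragged = vc_max > vc_min  # unequal bounds imply ragged lists
    return is_list, is_ragged

assert infer_flags({"max": 5}) == (True, True)
assert infer_flags({"min": 1, "max": 1}) == (True, False)
assert infer_flags({"min": 1, "max": 2}) == (True, True)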
=============================== warnings summary ===============================
tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 6 files. Did not have enough partitions to create 7 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 8 files. Did not have enough partitions to create 9 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 8 files. Did not have enough partitions to create 10 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 8 files. Did not have enough partitions to create 11 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 12 files. Did not have enough partitions to create 13 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 12 files. Did not have enough partitions to create 14 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 12 files. Did not have enough partitions to create 15 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 12 files. Did not have enough partitions to create 16 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 12 files. Did not have enough partitions to create 17 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 12 files. Did not have enough partitions to create 18 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 12 files. Did not have enough partitions to create 19 files.
warnings.warn(

tests/unit/io/test_io.py::test_validate_and_regenerate_dataset
/var/jenkins_home/workspace/merlin_core/core/merlin/io/parquet.py:551: DeprecationWarning: 'ParquetDataset.pieces' attribute is deprecated as of pyarrow 5.0.0 and will be removed in a future version. Specify 'use_legacy_dataset=False' while constructing the ParquetDataset, and then use the '.fragments' attribute instead.
paths = [p.path for p in pa_dataset.pieces]

tests/unit/schema/test_column_schemas.py::test_column_schema_tags_normalize
tests/unit/schema/test_tags.py::test_tagset_atomizes_compound_tags
/var/jenkins_home/workspace/merlin_core/core/merlin/schema/tags.py:148: UserWarning: Compound tags like Tags.ITEM_ID have been deprecated and will be removed in a future version. Please use the atomic versions of these tags, like [<Tags.ITEM: 'item'>, <Tags.ID: 'id'>].
warnings.warn(

tests/unit/schema/test_schema_io.py::test_json_serialization_with_embedded_dicts
tests/unit/schema/test_schema_io.py::test_merlin_to_proto_to_json_to_merlin
tests/unit/schema/test_tags.py::test_tagset_atomizes_compound_tags
/var/jenkins_home/workspace/merlin_core/core/merlin/schema/tags.py:148: UserWarning: Compound tags like Tags.USER_ID have been deprecated and will be removed in a future version. Please use the atomic versions of these tags, like [<Tags.USER: 'user'>, <Tags.ID: 'id'>].
warnings.warn(

tests/unit/schema/test_tags.py::test_tagset_atomizes_compound_tags
/var/jenkins_home/workspace/merlin_core/core/merlin/schema/tags.py:148: UserWarning: Compound tags like Tags.SESSION_ID have been deprecated and will be removed in a future version. Please use the atomic versions of these tags, like [<Tags.SESSION: 'session'>, <Tags.ID: 'id'>].
warnings.warn(

tests/unit/schema/test_tags.py::test_tagset_atomizes_compound_tags
/var/jenkins_home/workspace/merlin_core/core/merlin/schema/tags.py:148: UserWarning: Compound tags like Tags.TEXT_TOKENIZED have been deprecated and will be removed in a future version. Please use the atomic versions of these tags, like [<Tags.TEXT: 'text'>, <Tags.TOKENIZED: 'tokenized'>].
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed[True-True]
/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 43155 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed[True-False]
/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 34311 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed[False-True]
/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 45355 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed[False-False]
/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 46143 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed_force[True]
/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 45737 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed_force[False]
/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 41209 instead
warnings.warn(

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html

---------- coverage: platform linux, python 3.8.10-final-0 -----------
Name Stmts Miss Cover
-------------------------------------------------------------
merlin/core/__init__.py 2 0 100%
merlin/core/_version.py 354 205 42%
merlin/core/compat.py 10 4 60%
merlin/core/dispatch.py 356 212 40%
merlin/core/protocols.py 99 45 55%
merlin/core/utils.py 195 56 71%
merlin/dag/__init__.py 5 0 100%
merlin/dag/base_operator.py 122 20 84%
merlin/dag/dictarray.py 55 15 73%
merlin/dag/executors.py 141 68 52%
merlin/dag/graph.py 99 35 65%
merlin/dag/node.py 344 161 53%
merlin/dag/ops/__init__.py 4 0 100%
merlin/dag/ops/concat_columns.py 17 4 76%
merlin/dag/ops/selection.py 22 0 100%
merlin/dag/ops/subset_columns.py 12 4 67%
merlin/dag/ops/subtraction.py 21 11 48%
merlin/dag/selector.py 101 6 94%
merlin/io/__init__.py 4 0 100%
merlin/io/avro.py 88 88 0%
merlin/io/csv.py 57 6 89%
merlin/io/dask.py 181 53 71%
merlin/io/dataframe_engine.py 61 5 92%
merlin/io/dataframe_iter.py 21 1 95%
merlin/io/dataset.py 347 54 84%
merlin/io/dataset_engine.py 37 8 78%
merlin/io/fsspec_utils.py 127 108 15%
merlin/io/hugectr.py 45 35 22%
merlin/io/parquet.py 603 69 89%
merlin/io/shuffle.py 38 12 68%
merlin/io/worker.py 80 66 18%
merlin/io/writer.py 190 52 73%
merlin/io/writer_factory.py 18 4 78%
merlin/schema/__init__.py 2 0 100%
merlin/schema/io/__init__.py 0 0 100%
merlin/schema/io/proto_utils.py 20 4 80%
merlin/schema/io/schema_bp.py 306 5 98%
merlin/schema/io/tensorflow_metadata.py 190 17 91%
merlin/schema/schema.py 224 33 85%
merlin/schema/tags.py 82 1 99%
-------------------------------------------------------------
TOTAL 4680 1467 69%

=========================== short test summary info ============================
SKIPPED [1] tests/unit/io/test_avro.py:34: could not import 'uavro': No module named 'uavro'
======= 1 failed, 393 passed, 1 skipped, 25 warnings in 65.73s (0:01:05) =======
ERROR: InvocationError for command /var/jenkins_home/workspace/merlin_core/core/.tox/test-gpu/bin/python -m pytest --cov-report term --cov merlin -rxs tests/unit (exited with code 1)
___________________________________ summary ____________________________________
ERROR: test-gpu: commands failed
Build step 'Execute shell' marked build as failure
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA-Merlin/core/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[merlin_core] $ /bin/bash /tmp/jenkins15408042810799427856.sh
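
The failure above comes from merlin/schema/io/tensorflow_metadata.py:365: `value_count.get("min")` returns `None` when the schema carries no minimum bound, and `None > 0` raises the `TypeError`. The later commit "Check min or max when setting is_list" targets exactly this case; below is a minimal None-safe sketch of that idea (the helper name `infer_list_flags` and the treat-missing-as-zero default are illustrative assumptions, not the merged code):

```python
from typing import Tuple


def infer_list_flags(value_count: dict) -> Tuple[bool, bool]:
    """Infer (is_list, is_ragged) from a value_count that may omit bounds."""
    min_count = value_count.get("min") or 0  # a missing bound behaves like 0
    max_count = value_count.get("max") or 0

    is_list = max_count > 0 or min_count > 0        # any positive bound -> list column
    is_ragged = is_list and min_count != max_count  # unequal bounds -> ragged lists
    return is_list, is_ragged
```

With this guard, a value count like `{"max": 4}` with no `"min"` infers a ragged list column instead of raising.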

@nvidia-merlin-bot

Click to view CI Results
GitHub pull request #111 of commit f48634c2e2a4f3f6139125a62666569e5a86a3e2, no merge conflicts.
Running as SYSTEM
Setting status of f48634c2e2a4f3f6139125a62666569e5a86a3e2 to PENDING with url http://merlin-infra1.nvidia.com:8080/job/merlin_core/301/ and message: 'Pending'
Using context: Jenkins
Building on master in workspace /var/jenkins_home/workspace/merlin_core
using credential ce87ff3c-94f0-400a-8303-cb4acb4918b5
 > git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
 > git config remote.origin.url /~https://github.com/NVIDIA-Merlin/core # timeout=10
Fetching upstream changes from /~https://github.com/NVIDIA-Merlin/core
 > git --version # timeout=10
using GIT_ASKPASS to set credentials login for merlin-systems username and pass
 > git fetch --tags --force --progress -- /~https://github.com/NVIDIA-Merlin/core +refs/pull/111/*:refs/remotes/origin/pr/111/* # timeout=10
 > git rev-parse f48634c2e2a4f3f6139125a62666569e5a86a3e2^{commit} # timeout=10
Checking out Revision f48634c2e2a4f3f6139125a62666569e5a86a3e2 (detached)
 > git config core.sparsecheckout # timeout=10
 > git checkout -f f48634c2e2a4f3f6139125a62666569e5a86a3e2 # timeout=10
Commit message: "Check min or max when setting is_list"
 > git rev-list --no-walk c73f7bba9037f42b4518978bb5c519d16636eb98 # timeout=10
[merlin_core] $ /bin/bash /tmp/jenkins7519629414838475338.sh
GLOB sdist-make: /var/jenkins_home/workspace/merlin_core/core/setup.py
test-gpu recreate: /var/jenkins_home/workspace/merlin_core/core/.tox/test-gpu
test-gpu installdeps: pytest, pytest-cov
WARNING: Discarding $PYTHONPATH from environment, to override specify PYTHONPATH in 'passenv' in your configuration.
test-gpu inst: /var/jenkins_home/workspace/merlin_core/core/.tox/.tmp/package/1/merlin-core-0.8.0+35.gf48634c.zip
WARNING: Discarding $PYTHONPATH from environment, to override specify PYTHONPATH in 'passenv' in your configuration.
test-gpu installed: absl-py==1.2.0,aiohttp==3.8.1,aiosignal==1.2.0,alabaster==0.7.12,alembic==1.8.1,anyio==3.6.1,argon2-cffi==21.3.0,argon2-cffi-bindings==21.2.0,astroid==2.5.6,asttokens==2.0.8,astunparse==1.6.3,asv==0.5.1,asvdb==0.4.2,async-timeout==4.0.2,attrs==22.1.0,autopage==0.5.1,awscli==1.27.12,Babel==2.10.3,backcall==0.2.0,beautifulsoup4==4.11.1,betterproto==1.2.5,black==22.6.0,bleach==5.0.1,boto3==1.24.75,botocore==1.29.12,Brotli==1.0.9,cachetools==5.2.0,certifi==2019.11.28,cffi==1.15.1,chardet==3.0.4,charset-normalizer==2.1.1,clang==5.0,click==8.1.3,cliff==4.0.0,cloudpickle==2.2.0,cmaes==0.9.0,cmake==3.24.1.1,cmd2==2.4.2,colorama==0.4.4,colorlog==6.7.0,contourpy==1.0.5,coverage==6.5.0,cuda-python==11.7.1,cupy-cuda117==10.6.0,cycler==0.11.0,Cython==0.29.32,dask==2022.5.1,dbus-python==1.2.16,debugpy==1.6.3,decorator==5.1.1,defusedxml==0.7.1,dill==0.3.5.1,distlib==0.3.6,distributed==2022.5.1,distro==1.7.0,dm-tree==0.1.6,docker-pycreds==0.4.0,docutils==0.16,emoji==1.7.0,entrypoints==0.4,execnet==1.9.0,executing==1.0.0,faiss==1.7.2,faiss-gpu==1.7.2,fastai==2.7.9,fastapi==0.85.0,fastavro==1.6.1,fastcore==1.5.27,fastdownload==0.0.7,fastjsonschema==2.16.1,fastprogress==1.0.3,fastrlock==0.8,feast==0.19.4,fiddle==0.2.2,filelock==3.8.0,flatbuffers==1.12,fonttools==4.37.3,frozenlist==1.3.1,fsspec==2022.5.0,gast==0.4.0,gevent==21.12.0,geventhttpclient==2.0.2,gitdb==4.0.9,GitPython==3.1.27,google==3.0.0,google-api-core==2.10.1,google-auth==2.11.1,google-auth-oauthlib==0.4.6,google-pasta==0.2.0,googleapis-common-protos==1.52.0,graphviz==0.20.1,greenlet==1.1.3,grpcio==1.41.0,grpcio-channelz==1.49.0,grpcio-reflection==1.48.1,grpclib==0.4.3,h11==0.13.0,h2==4.1.0,h5py==3.7.0,HeapDict==1.0.1,horovod==0.26.1,hpack==4.0.0,httptools==0.5.0,hugectr2onnx==0.0.0,huggingface-hub==0.9.1,hyperframe==6.0.1,idna==2.8,imagesize==1.4.1,implicit==0.6.1,importlib-metadata==4.12.0,importlib-resources==5.9.0,iniconfig==1.1.1,ipykernel==6.15.3,ipython==8.5.0,ipython-genutils==0.2.0,ipywidgets==7.7.0,jedi==0.18.1,Jinja2==3.1.2,jmespath==1.0.1,joblib==1.2.0,json5==0.9.10,jsonschema==4.16.0,jupyter-cache==0.4.3,jupyter-core==4.11.1,jupyter-server==1.18.1,jupyter-server-mathjax==0.2.5,jupyter-sphinx==0.3.2,jupyter_client==7.3.5,jupyterlab==3.4.7,jupyterlab-pygments==0.2.2,jupyterlab-widgets==1.1.0,jupyterlab_server==2.15.1,keras==2.9.0,Keras-Preprocessing==1.1.2,kiwisolver==1.4.4,lazy-object-proxy==1.8.0,libclang==14.0.6,libcst==0.4.7,lightfm==1.16,lightgbm==3.3.2,linkify-it-py==1.0.3,llvmlite==0.39.1,locket==1.0.0,lxml==4.9.1,Mako==1.2.4,Markdown==3.4.1,markdown-it-py==1.1.0,MarkupSafe==2.1.1,matplotlib==3.6.0,matplotlib-inline==0.1.6,mdit-py-plugins==0.2.8,merlin-core==0.8.0+35.gf48634c,merlin-dataloader==0.0.2,merlin-models==0.7.0+11.g280956aa4,merlin-systems==0.5.0+4.g15074ad,mistune==2.0.4,mmh3==3.0.0,mpi4py==3.1.3,msgpack==1.0.4,multidict==6.0.2,mypy-extensions==0.4.3,myst-nb==0.13.2,myst-parser==0.15.2,natsort==8.1.0,nbclassic==0.4.3,nbclient==0.6.8,nbconvert==7.0.0,nbdime==3.1.1,nbformat==5.5.0,nest-asyncio==1.5.5,ninja==1.10.2.3,notebook==6.4.12,notebook-shim==0.1.0,numba==0.56.2,numpy==1.22.4,nvidia-pyindex==1.0.9,nvtabular @ 
git+/~https://github.com/NVIDIA-Merlin/NVTabular.git@21117cfc4c113b30036afcb97b6daa5f377996db,nvtx==0.2.5,oauthlib==3.2.1,oldest-supported-numpy==2022.8.16,onnx==1.12.0,onnxruntime==1.11.1,opt-einsum==3.3.0,optuna==3.0.3,packaging==21.3,pandas==1.3.5,pandavro==1.5.2,pandocfilters==1.5.0,parso==0.8.3,partd==1.3.0,pathtools==0.1.2,pbr==5.11.0,pexpect==4.8.0,pickleshare==0.7.5,Pillow==9.2.0,pkgutil_resolve_name==1.3.10,platformdirs==2.5.2,plotly==5.11.0,pluggy==1.0.0,prettytable==3.5.0,prometheus-client==0.14.1,promise==2.3,prompt-toolkit==3.0.31,proto-plus==1.19.6,protobuf==3.19.5,psutil==5.9.2,ptyprocess==0.7.0,pure-eval==0.2.2,py==1.11.0,pyarrow==7.0.0,pyasn1==0.4.8,pyasn1-modules==0.2.8,pybind11==2.10.0,pycparser==2.21,pydantic==1.10.2,pydot==1.4.2,Pygments==2.13.0,PyGObject==3.36.0,pynvml==11.4.1,pyparsing==3.0.9,pyperclip==1.8.2,pyrsistent==0.18.1,pytest==7.1.3,pytest-cov==4.0.0,pytest-xdist==3.0.2,python-apt==2.0.0+ubuntu0.20.4.8,python-dateutil==2.8.2,python-dotenv==0.21.0,python-rapidjson==1.8,pytz==2022.2.1,PyYAML==5.4.1,pyzmq==24.0.0,regex==2022.9.13,requests==2.22.0,requests-oauthlib==1.3.1,requests-unixsocket==0.2.0,rsa==4.7.2,s3fs==2022.2.0,s3transfer==0.6.0,sacremoses==0.0.53,scikit-build==0.15.0,scikit-learn==1.1.2,scipy==1.8.1,seedir==0.3.0,Send2Trash==1.8.0,sentry-sdk==1.9.8,setproctitle==1.3.2,setuptools-scm==7.0.5,shortuuid==1.0.9,six==1.15.0,sklearn==0.0,smmap==5.0.0,sniffio==1.3.0,snowballstemmer==2.2.0,sortedcontainers==2.4.0,soupsieve==2.3.2.post1,Sphinx==5.3.0,sphinx-multiversion==0.2.4,sphinx-togglebutton==0.3.1,sphinx_external_toc==0.3.0,sphinxcontrib-applehelp==1.0.2,sphinxcontrib-copydirs @ git+/~https://github.com/mikemckiernan/sphinxcontrib-copydirs.git@bd8c5d79b3f91cf5f1bb0d6995aeca3fe84b670e,sphinxcontrib-devhelp==1.0.2,sphinxcontrib-htmlhelp==2.0.0,sphinxcontrib-jsmath==1.0.1,sphinxcontrib-qthelp==1.0.3,sphinxcontrib-serializinghtml==1.1.5,SQLAlchemy==1.4.44,stack-data==0.5.0,starlette==0.20.4,stevedore==4.1.1,stringcase==1.2.0,supervisor==4.1.0,tabulate==0.8.10,tblib==1.7.0,tdqm==0.0.1,tenacity==8.0.1,tensorboard==2.9.1,tensorboard-data-server==0.6.1,tensorboard-plugin-wit==1.8.1,tensorflow==2.9.2,tensorflow-estimator==2.9.0,tensorflow-gpu==2.9.2,tensorflow-io-gcs-filesystem==0.27.0,tensorflow-metadata==1.10.0,termcolor==2.0.1,terminado==0.15.0,testbook==0.4.2,threadpoolctl==3.1.0,tinycss2==1.1.1,tokenizers==0.10.3,toml==0.10.2,tomli==2.0.1,toolz==0.12.0,torch==1.12.1+cu113,torchmetrics==0.3.2,tornado==6.2,tox==3.26.0,tqdm==4.64.1,traitlets==5.4.0,transformers==4.12.0,transformers4rec==0.1.12+2.gbcc939255,treelite==2.3.0,treelite-runtime==2.3.0,tritonclient==2.25.0,typing-inspect==0.8.0,typing_extensions==4.3.0,uc-micro-py==1.0.1,urllib3==1.26.12,uvicorn==0.18.3,uvloop==0.17.0,versioneer==0.20,virtualenv==20.16.5,wandb==0.13.3,watchfiles==0.17.0,wcwidth==0.2.5,webencodings==0.5.1,websocket-client==1.4.1,websockets==10.3,Werkzeug==2.2.2,widgetsnbextension==3.6.0,wrapt==1.12.1,xgboost==1.6.2,yarl==1.8.1,zict==2.2.0,zipp==3.8.1,zope.event==4.5.0,zope.interface==5.4.0
test-gpu run-test-pre: PYTHONHASHSEED='1681900638'
test-gpu run-test: commands[0] | python -m pytest --cov-report term --cov merlin -rxs tests/unit
============================= test session starts ==============================
platform linux -- Python 3.8.10, pytest-7.1.3, pluggy-1.0.0
cachedir: .tox/test-gpu/.pytest_cache
rootdir: /var/jenkins_home/workspace/merlin_core/core, configfile: pyproject.toml
plugins: anyio-3.5.0, xdist-3.0.2, cov-4.0.0
collected 394 items / 1 skipped

tests/unit/core/test_dispatch.py .... [ 1%]
tests/unit/core/test_protocols.py ......... [ 3%]
tests/unit/core/test_version.py . [ 3%]
tests/unit/dag/test_base_operator.py .... [ 4%]
tests/unit/dag/test_column_selector.py .............................. [ 12%]
tests/unit/dag/test_dictarray.py ... [ 12%]
tests/unit/dag/test_executors.py .. [ 13%]
tests/unit/dag/test_graph.py .... [ 14%]
tests/unit/dag/ops/test_selection.py .... [ 15%]
tests/unit/io/test_io.py ............................................... [ 27%]
................................................................ [ 43%]
tests/unit/schema/test_column_schemas.py ........................... [ 50%]
tests/unit/schema/test_schema.py ............. [ 53%]
tests/unit/schema/test_schema_io.py .................................... [ 62%]
........................................................................ [ 81%]
........................................................... [ 96%]
tests/unit/schema/test_tags.py ....... [ 97%]
tests/unit/utils/test_utils.py ........ [100%]

=============================== warnings summary ===============================
tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 6 files. Did not have enough partitions to create 7 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 8 files. Did not have enough partitions to create 9 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 8 files. Did not have enough partitions to create 10 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 8 files. Did not have enough partitions to create 11 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 12 files. Did not have enough partitions to create 13 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 12 files. Did not have enough partitions to create 14 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 12 files. Did not have enough partitions to create 15 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 12 files. Did not have enough partitions to create 16 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 12 files. Did not have enough partitions to create 17 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 12 files. Did not have enough partitions to create 18 files.
warnings.warn(

tests/unit/io/test_io.py::test_io_partitions_push
/var/jenkins_home/workspace/merlin_core/core/merlin/io/dataset.py:869: UserWarning: Only creating 12 files. Did not have enough partitions to create 19 files.
warnings.warn(

tests/unit/io/test_io.py::test_validate_and_regenerate_dataset
/var/jenkins_home/workspace/merlin_core/core/merlin/io/parquet.py:551: DeprecationWarning: 'ParquetDataset.pieces' attribute is deprecated as of pyarrow 5.0.0 and will be removed in a future version. Specify 'use_legacy_dataset=False' while constructing the ParquetDataset, and then use the '.fragments' attribute instead.
paths = [p.path for p in pa_dataset.pieces]

tests/unit/schema/test_column_schemas.py::test_column_schema_tags_normalize
tests/unit/schema/test_tags.py::test_tagset_atomizes_compound_tags
/var/jenkins_home/workspace/merlin_core/core/merlin/schema/tags.py:148: UserWarning: Compound tags like Tags.ITEM_ID have been deprecated and will be removed in a future version. Please use the atomic versions of these tags, like [<Tags.ITEM: 'item'>, <Tags.ID: 'id'>].
warnings.warn(

tests/unit/schema/test_schema_io.py::test_json_serialization_with_embedded_dicts
tests/unit/schema/test_schema_io.py::test_merlin_to_proto_to_json_to_merlin
tests/unit/schema/test_tags.py::test_tagset_atomizes_compound_tags
/var/jenkins_home/workspace/merlin_core/core/merlin/schema/tags.py:148: UserWarning: Compound tags like Tags.USER_ID have been deprecated and will be removed in a future version. Please use the atomic versions of these tags, like [<Tags.USER: 'user'>, <Tags.ID: 'id'>].
warnings.warn(

tests/unit/schema/test_tags.py::test_tagset_atomizes_compound_tags
/var/jenkins_home/workspace/merlin_core/core/merlin/schema/tags.py:148: UserWarning: Compound tags like Tags.SESSION_ID have been deprecated and will be removed in a future version. Please use the atomic versions of these tags, like [<Tags.SESSION: 'session'>, <Tags.ID: 'id'>].
warnings.warn(

tests/unit/schema/test_tags.py::test_tagset_atomizes_compound_tags
/var/jenkins_home/workspace/merlin_core/core/merlin/schema/tags.py:148: UserWarning: Compound tags like Tags.TEXT_TOKENIZED have been deprecated and will be removed in a future version. Please use the atomic versions of these tags, like [<Tags.TEXT: 'text'>, <Tags.TOKENIZED: 'tokenized'>].
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed[True-True]
/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 38453 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed[True-False]
/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 45769 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed[False-True]
/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 35615 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed[False-False]
/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 37635 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed_force[True]
/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 41337 instead
warnings.warn(

tests/unit/utils/test_utils.py::test_nvt_distributed_force[False]
/var/jenkins_home/.local/lib/python3.8/site-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.
Perhaps you already have a cluster running?
Hosting the HTTP server on port 38811 instead
warnings.warn(

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html

---------- coverage: platform linux, python 3.8.10-final-0 -----------
Name Stmts Miss Cover
-------------------------------------------------------------
merlin/core/__init__.py 2 0 100%
merlin/core/_version.py 354 205 42%
merlin/core/compat.py 10 4 60%
merlin/core/dispatch.py 356 212 40%
merlin/core/protocols.py 99 45 55%
merlin/core/utils.py 195 56 71%
merlin/dag/__init__.py 5 0 100%
merlin/dag/base_operator.py 122 20 84%
merlin/dag/dictarray.py 55 15 73%
merlin/dag/executors.py 141 68 52%
merlin/dag/graph.py 99 35 65%
merlin/dag/node.py 344 161 53%
merlin/dag/ops/__init__.py 4 0 100%
merlin/dag/ops/concat_columns.py 17 4 76%
merlin/dag/ops/selection.py 22 0 100%
merlin/dag/ops/subset_columns.py 12 4 67%
merlin/dag/ops/subtraction.py 21 11 48%
merlin/dag/selector.py 101 6 94%
merlin/io/__init__.py 4 0 100%
merlin/io/avro.py 88 88 0%
merlin/io/csv.py 57 6 89%
merlin/io/dask.py 181 53 71%
merlin/io/dataframe_engine.py 61 5 92%
merlin/io/dataframe_iter.py 21 1 95%
merlin/io/dataset.py 347 54 84%
merlin/io/dataset_engine.py 37 8 78%
merlin/io/fsspec_utils.py 127 108 15%
merlin/io/hugectr.py 45 35 22%
merlin/io/parquet.py 603 69 89%
merlin/io/shuffle.py 38 12 68%
merlin/io/worker.py 80 66 18%
merlin/io/writer.py 190 52 73%
merlin/io/writer_factory.py 18 4 78%
merlin/schema/__init__.py 2 0 100%
merlin/schema/io/__init__.py 0 0 100%
merlin/schema/io/proto_utils.py 20 4 80%
merlin/schema/io/schema_bp.py 306 5 98%
merlin/schema/io/tensorflow_metadata.py 190 17 91%
merlin/schema/schema.py 224 33 85%
merlin/schema/tags.py 82 1 99%
-------------------------------------------------------------
TOTAL 4680 1467 69%

=========================== short test summary info ============================
SKIPPED [1] tests/unit/io/test_avro.py:34: could not import 'uavro': No module named 'uavro'
============ 394 passed, 1 skipped, 25 warnings in 64.63s (0:01:04) ============
___________________________________ summary ____________________________________
test-gpu: commands succeeded
congratulations :)
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=1 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA-Merlin/core/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[merlin_core] $ /bin/bash /tmp/jenkins13487378482100226259.sh
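
With the guard in place the suite passes, including `tests/unit/schema/test_schema_io.py::test_merlin_to_proto_to_json_to_merlin` seen in the run above. A minimal round-trip sketch of what that style of test exercises (the column name and bounds are illustrative, and the `TensorflowMetadata` calls are assumed to match this release's public API):

```python
from merlin.schema import ColumnSchema, Schema
from merlin.schema.io.tensorflow_metadata import TensorflowMetadata

# A ragged list column whose value_count should survive a round trip.
schema = Schema([ColumnSchema("item_ids", properties={"value_count": {"min": 1, "max": 4}})])

# Merlin schema -> TFMD proto -> JSON -> Merlin schema.
json_blob = TensorflowMetadata.from_merlin_schema(schema).to_json()
restored = TensorflowMetadata.from_json(json_blob).to_merlin_schema()

assert restored == schema  # value_count, is_list, and is_ragged all round-trip
```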
