Merge branch 'main' into 1539-InferenceClient-text-classification
Wauplin authored Sep 6, 2023
2 parents cf8e176 + b94f891 commit 206c06b
Showing 6 changed files with 1,071 additions and 6 deletions.
72 changes: 71 additions & 1 deletion docs/source/en/guides/download.md
@@ -161,4 +161,74 @@ Here is a table that summarizes the different options to help you choose the par
| `local_dir="path/to/folder"`<br>`local_dir_use_symlinks=False` | Yes | file in folder ||| ⚠️<br>_(file has to be cached first)_ | ❌<br>_(file is duplicated)_ |

**Note:** if you are on a Windows machine, you need to enable developer mode or run `huggingface_hub` as admin to enable
symlinks. Check out the [cache limitations](../guides/manage-cache#limitations) section for more details.

## Download from the CLI

You can use the `huggingface-cli download` command from the terminal to directly download files from the Hub.
Internally, it uses the same [`hf_hub_download`] and [`snapshot_download`] helpers described above and prints the
returned path to the terminal:

```bash
>>> huggingface-cli download gpt2 config.json
/home/wauplin/.cache/huggingface/hub/models--gpt2/snapshots/11c5a3d5811f50298f278a704980280950aedb10/config.json
```

By default, the token saved locally (using `huggingface-cli login`) will be used. If you want to authenticate explicitly,
use the `--token` option:

```bash
>>> huggingface-cli download gpt2 config.json --token=hf_****
/home/wauplin/.cache/huggingface/hub/models--gpt2/snapshots/11c5a3d5811f50298f278a704980280950aedb10/config.json
```

You can download multiple files at once. This displays a progress bar and returns the path of the snapshot in which
the files are located:

```bash
>>> huggingface-cli download gpt2 config.json model.safetensors
Fetching 2 files: 100%|████████████████████████████████████████████| 2/2 [00:00<00:00, 23831.27it/s]
/home/wauplin/.cache/huggingface/hub/models--gpt2/snapshots/11c5a3d5811f50298f278a704980280950aedb10
```

If you want to silence the progress bars and potential warnings, use the `--quiet` option. This can prove useful if you
want to pass the output to another command in a script.

```bash
>>> huggingface-cli download gpt2 config.json model.safetensors --quiet
/home/wauplin/.cache/huggingface/hub/models--gpt2/snapshots/11c5a3d5811f50298f278a704980280950aedb10
```
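
As a rough sketch of the scripting use case, the snippet below runs a command and keeps the last line it printed as the path. The helper name `capture_printed_path` is hypothetical, and the command used here is only a stand-in that prints a made-up path so the example runs offline. In a real script, the command would be something like `["huggingface-cli", "download", "gpt2", "config.json", "--quiet"]`.

```python
import subprocess
import sys
from typing import List


def capture_printed_path(command: List[str]) -> str:
    """Run `command` and return the last non-empty line written to stdout."""
    result = subprocess.run(command, capture_output=True, text=True, check=True)
    lines = [line.strip() for line in result.stdout.splitlines() if line.strip()]
    if not lines:
        raise RuntimeError("The command printed nothing to stdout.")
    return lines[-1]


# Stand-in for the real CLI call (assumption): it simply prints a fake snapshot path.
fake_cli = [sys.executable, "-c", "print('/tmp/cache/models--gpt2/snapshots/abc')"]
snapshot_path = capture_printed_path(fake_cli)
print(snapshot_path)
```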

By default, files are downloaded to the cache directory defined by the `HF_HOME` environment variable (or
`~/.cache/huggingface/hub` if not specified). You can override this by using the `--cache-dir` option:

```bash
>>> huggingface-cli download gpt2 config.json --cache-dir=./cache
./cache/models--gpt2/snapshots/11c5a3d5811f50298f278a704980280950aedb10/config.json
```
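
The returned paths in the examples above follow a visible pattern: `<cache_dir>/<type>s--<repo id with "/" replaced by "--">/snapshots/<commit hash>/<filename>`. The sketch below rebuilds such a path from its parts. It is inferred only from the example outputs on this page; the function names are hypothetical and it ignores any other bookkeeping the real cache may keep.

```python
from pathlib import Path


def repo_folder_name(repo_id: str, repo_type: str = "model") -> str:
    """Build a cache folder name such as `models--gpt2` (layout inferred from the examples)."""
    return "--".join([f"{repo_type}s", *repo_id.split("/")])


def cached_file_path(
    cache_dir: str, repo_id: str, commit_hash: str, filename: str, repo_type: str = "model"
) -> Path:
    """Join the parts in the same order as the paths printed by the CLI examples."""
    return Path(cache_dir) / repo_folder_name(repo_id, repo_type) / "snapshots" / commit_hash / filename


path = cached_file_path("./cache", "gpt2", "11c5a3d5811f50298f278a704980280950aedb10", "config.json")
print(path.as_posix())
```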

If you want to download files to a local folder, without the cache directory structure, you can use `--local-dir`.
Downloading to a local folder comes with limitations, which are listed in this [table](https://huggingface.co/docs/huggingface_hub/guides/download#download-files-to-local-folder).


```bash
>>> huggingface-cli download gpt2 config.json --local-dir=./models/gpt2
./models/gpt2/config.json
```


Additional arguments let you download from a different repo type or revision, and include or exclude files using
glob patterns:

```bash
>>> huggingface-cli download bigcode/the-stack --repo-type=dataset --revision=v1.2 --include="data/python/*" --exclude "*.json" "*.zip"
Fetching 206 files: 100%|████████████████████████████████████████████| 206/206 [02:31<2:31, ?it/s]
/home/wauplin/.cache/huggingface/hub/datasets--bigcode--the-stack/snapshots/9ca8fa6acdbc8ce920a0cb58adcdafc495818ae7
```
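
To make the include/exclude semantics concrete, here is a minimal sketch of glob filtering over a list of repo paths. It uses `fnmatch.fnmatchcase` from the standard library as a stand-in; the matching rules used by `huggingface_hub` itself may differ in detail, and `filter_files` and the file names are hypothetical.

```python
from fnmatch import fnmatchcase
from typing import Iterable, List, Optional


def filter_files(
    paths: Iterable[str],
    include: Optional[List[str]] = None,
    exclude: Optional[List[str]] = None,
) -> List[str]:
    """Keep paths matching at least one include pattern and no exclude pattern."""
    kept = []
    for path in paths:
        # When include patterns are given, a path must match one of them.
        if include is not None and not any(fnmatchcase(path, p) for p in include):
            continue
        # Any match against an exclude pattern drops the path.
        if exclude is not None and any(fnmatchcase(path, p) for p in exclude):
            continue
        kept.append(path)
    return kept


files = ["data/python/part-0.parquet", "data/python/meta.json", "data/java/part-0.parquet"]
selected = filter_files(files, include=["data/python/*"], exclude=["*.json", "*.zip"])
print(selected)
```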

For a full list of the arguments, you can run:

```bash
huggingface-cli download --help
```
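
One detail worth knowing when passing several patterns: the parser in this commit's `commands/download.py` declares `--include` and `--exclude` with `nargs="*"`. With standard `argparse` behavior, repeating such an option replaces the earlier value rather than appending to it, so multiple patterns should follow a single flag. The parser below is a minimal stand-in for illustration, not the real CLI.

```python
from argparse import ArgumentParser

# Minimal stand-in parser mirroring how `--exclude` is declared (nargs="*").
parser = ArgumentParser(prog="demo-download")
parser.add_argument("repo_id", type=str)
parser.add_argument("--exclude", nargs="*", type=str)

# Repeating the flag: the second occurrence overwrites the first.
repeated = parser.parse_args(["gpt2", "--exclude", "*.json", "--exclude", "*.zip"])

# Listing both patterns after one flag keeps them all.
grouped = parser.parse_args(["gpt2", "--exclude", "*.json", "*.zip"])

print(repeated.exclude)
print(grouped.exclude)
```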
54 changes: 54 additions & 0 deletions docs/source/en/guides/upload.md
@@ -106,6 +106,60 @@ but before that, all previous logs on the repo on deleted. All of this in a sing
... )
```

## Upload from the CLI

You can use the `huggingface-cli upload` command from the terminal to directly upload files to the Hub. Internally,
it uses the same [`upload_file`] and [`upload_folder`] helpers described above.

You can either upload a single file or an entire folder:

```bash
# Usage: huggingface-cli upload [repo_id] [local_path] [path_in_repo]
>>> huggingface-cli upload Wauplin/my-cool-model ./models/model.safetensors model.safetensors
https://huggingface.co/Wauplin/my-cool-model/blob/main/model.safetensors

>>> huggingface-cli upload Wauplin/my-cool-model ./models .
https://huggingface.co/Wauplin/my-cool-model/tree/main
```

`local_path` and `path_in_repo` are optional and can be inferred. If `local_path` is not set, the tool checks whether
a local folder or file has the same name as the `repo_id`. If so, its content is uploaded. Otherwise, an exception is
raised asking the user to explicitly set `local_path`. In either case, if `path_in_repo` is not set, files are
uploaded at the root of the repo.

```bash
# Upload file at root
huggingface-cli upload my-cool-model model.safetensors

# Upload directory at root
huggingface-cli upload my-cool-model ./models

# Upload `my-cool-model/` directory if it exists, raise otherwise
huggingface-cli upload my-cool-model
```
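
The inference rules described above can be sketched as a small function. This is an illustration of the documented behavior only, not the CLI's actual code: the name `resolve_upload_paths` is hypothetical, the existence check is injectable so the sketch can be exercised without touching the filesystem, and it compares against the full `repo_id` string because the text does not say how a namespaced ID such as `Wauplin/my-cool-model` is matched.

```python
import os
from typing import Callable, Optional, Tuple


def resolve_upload_paths(
    repo_id: str,
    local_path: Optional[str] = None,
    path_in_repo: Optional[str] = None,
    exists: Callable[[str], bool] = os.path.exists,
) -> Tuple[str, str]:
    """Return (local_path, path_in_repo) following the documented defaults."""
    if local_path is None:
        # Fall back to a local file or folder named like the repo, if there is one.
        if not exists(repo_id):
            raise ValueError(f"No local file or folder named '{repo_id}'. Please set `local_path` explicitly.")
        local_path = repo_id
    if path_in_repo is None:
        # "." stands for the root of the repo, as in the folder upload example.
        path_in_repo = "."
    return local_path, path_in_repo


print(resolve_upload_paths("my-cool-model", "./models"))
```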

By default, the token saved locally (using `huggingface-cli login`) will be used. If you want to authenticate explicitly,
use the `--token` option:

```bash
huggingface-cli upload my-cool-model --token=hf_****
```

When uploading a folder, you can use the `--include` and `--exclude` arguments to filter the files to upload. You can
also use `--delete` to delete existing files on the Hub.

```bash
# Sync local Space with Hub (upload new files except those in logs/, delete removed files)
huggingface-cli upload Wauplin/space-example --repo-type=space --exclude="/logs/*" --delete="*" --commit-message="Sync local Space with Hub"
```

Finally, you can also schedule a job that will upload your files regularly (see [scheduled uploads](#scheduled-uploads)).

```bash
# Upload new logs every 10 minutes
huggingface-cli upload training-model logs/ --every=10
```

## Advanced features

In most cases, you won't need more than [`upload_file`] and [`upload_folder`] to upload your files to the Hub.
204 changes: 204 additions & 0 deletions src/huggingface_hub/commands/download.py
@@ -0,0 +1,204 @@
# coding=utf-8
# Copyright 2023-present, the HuggingFace Inc. team.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
"""Contains command to download files from the Hub with the CLI.
Usage:
    huggingface-cli download --help

    # Download file
    huggingface-cli download gpt2 config.json

    # Download entire repo
    huggingface-cli download fffiloni/zeroscope --repo-type=space --revision=refs/pr/78

    # Download repo with filters
    huggingface-cli download gpt2 --include="*.safetensors"

    # Download with token
    huggingface-cli download Wauplin/private-model --token=hf_***

    # Download quietly (no progress bar, no warnings, only the returned path)
    huggingface-cli download gpt2 config.json --quiet

    # Download to local dir
    huggingface-cli download gpt2 --local-dir=./models/gpt2
"""
import warnings
from argparse import Namespace, _SubParsersAction
from typing import List, Literal, Optional, Union

from huggingface_hub import logging
from huggingface_hub._snapshot_download import snapshot_download
from huggingface_hub.commands import BaseHuggingfaceCLICommand
from huggingface_hub.file_download import hf_hub_download
from huggingface_hub.utils import disable_progress_bars, enable_progress_bars


class DownloadCommand(BaseHuggingfaceCLICommand):
    @staticmethod
    def register_subcommand(parser: _SubParsersAction):
        download_parser = parser.add_parser("download", help="Download files from the Hub")
        download_parser.add_argument(
            "repo_id", type=str, help="ID of the repo to download from (e.g. `username/repo-name`)."
        )
        download_parser.add_argument(
            "filenames", type=str, nargs="*", help="Files to download (e.g. `config.json`, `data/metadata.jsonl`)."
        )
        download_parser.add_argument(
            "--repo-type",
            choices=["model", "dataset", "space"],
            default="model",
            help="Type of repo to download from (e.g. `dataset`).",
        )
        download_parser.add_argument(
            "--revision",
            type=str,
            help="An optional Git revision id which can be a branch name, a tag, or a commit hash.",
        )
        download_parser.add_argument(
            "--include", nargs="*", type=str, help="Glob patterns to match files to download."
        )
        download_parser.add_argument(
            "--exclude", nargs="*", type=str, help="Glob patterns to exclude from files to download."
        )
        download_parser.add_argument(
            "--cache-dir", type=str, help="Path to the directory where to save the downloaded files."
        )
        download_parser.add_argument(
            "--local-dir",
            type=str,
            help=(
                "If set, the downloaded file will be placed under this directory either as a symlink (default) or a"
                " regular file. Check out"
                " https://huggingface.co/docs/huggingface_hub/guides/download#download-files-to-local-folder for more"
                " details."
            ),
        )
        download_parser.add_argument(
            "--local-dir-use-symlinks",
            choices=["auto", "True", "False"],
            default="auto",
            help=(
                "To be used with `local_dir`. If set to 'auto', the cache directory will be used and the file will be"
                " either duplicated or symlinked to the local directory depending on its size. If set to `True`, a"
                " symlink will be created, no matter the file size. If set to `False`, the file will either be"
                " duplicated from cache (if already exists) or downloaded from the Hub and not cached."
            ),
        )
        download_parser.add_argument(
            "--force-download",
            action="store_true",
            help="If True, the files will be downloaded even if they are already cached.",
        )
        download_parser.add_argument(
            "--resume-download", action="store_true", help="If True, resume a previously interrupted download."
        )
        download_parser.add_argument(
            "--token", type=str, help="A User Access Token generated from https://huggingface.co/settings/tokens"
        )
        download_parser.add_argument(
            "--quiet",
            action="store_true",
            help="If True, progress bars are disabled and only the path to the downloaded files is printed.",
        )
        download_parser.set_defaults(func=DownloadCommand)

    def __init__(self, args: Namespace) -> None:
        self.token = args.token
        self.repo_id: str = args.repo_id
        self.filenames: List[str] = args.filenames
        self.repo_type: str = args.repo_type
        self.revision: Optional[str] = args.revision
        self.include: Optional[List[str]] = args.include
        self.exclude: Optional[List[str]] = args.exclude
        self.cache_dir: Optional[str] = args.cache_dir
        self.local_dir: Optional[str] = args.local_dir
        self.force_download: bool = args.force_download
        self.resume_download: bool = args.resume_download
        self.quiet: bool = args.quiet

        # Raise if local_dir_use_symlinks is invalid
        self.local_dir_use_symlinks: Union[Literal["auto"], bool]
        use_symlinks_lowercase = args.local_dir_use_symlinks.lower()
        if use_symlinks_lowercase == "true":
            self.local_dir_use_symlinks = True
        elif use_symlinks_lowercase == "false":
            self.local_dir_use_symlinks = False
        elif use_symlinks_lowercase == "auto":
            self.local_dir_use_symlinks = "auto"
        else:
            raise ValueError(
                f"'{args.local_dir_use_symlinks}' is not a valid value for `local_dir_use_symlinks`. It must be either"
                " 'auto', 'True' or 'False'."
            )

    def run(self) -> None:
        if self.quiet:
            disable_progress_bars()
            with warnings.catch_warnings():
                warnings.simplefilter("ignore")
                print(self._download())  # Print path to downloaded files
            enable_progress_bars()
        else:
            logging.set_verbosity_info()
            print(self._download())  # Print path to downloaded files
            logging.set_verbosity_warning()

    def _download(self) -> str:
        # Warn user if patterns are ignored
        if len(self.filenames) > 0:
            if self.include is not None and len(self.include) > 0:
                warnings.warn("Ignoring `--include` since filenames have been explicitly set.")
            if self.exclude is not None and len(self.exclude) > 0:
                warnings.warn("Ignoring `--exclude` since filenames have been explicitly set.")

        # Single file to download: use `hf_hub_download`
        if len(self.filenames) == 1:
            return hf_hub_download(
                repo_id=self.repo_id,
                repo_type=self.repo_type,
                revision=self.revision,
                filename=self.filenames[0],
                cache_dir=self.cache_dir,
                resume_download=self.resume_download,
                force_download=self.force_download,
                token=self.token,
                local_dir=self.local_dir,
                local_dir_use_symlinks=self.local_dir_use_symlinks,
                library_name="huggingface-cli",
            )

        # Otherwise: use `snapshot_download` to ensure all files come from the same revision
        elif len(self.filenames) == 0:
            allow_patterns = self.include
            ignore_patterns = self.exclude
        else:
            allow_patterns = self.filenames
            ignore_patterns = None

        return snapshot_download(
            repo_id=self.repo_id,
            repo_type=self.repo_type,
            revision=self.revision,
            allow_patterns=allow_patterns,
            ignore_patterns=ignore_patterns,
            resume_download=self.resume_download,
            force_download=self.force_download,
            cache_dir=self.cache_dir,
            token=self.token,
            local_dir=self.local_dir,
            local_dir_use_symlinks=self.local_dir_use_symlinks,
            library_name="huggingface-cli",
        )
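
The branching in `_download` can be summarized with a small, dependency-free sketch. The function `plan_download` is hypothetical and only returns which helper would be called and with which patterns; it performs no download and is meant as a reading aid for the logic above.

```python
from typing import List, Optional, Tuple

Plan = Tuple[str, Optional[List[str]], Optional[List[str]]]


def plan_download(
    filenames: List[str],
    include: Optional[List[str]] = None,
    exclude: Optional[List[str]] = None,
) -> Plan:
    """Return (helper name, allow_patterns, ignore_patterns) for the given CLI inputs."""
    if len(filenames) == 1:
        # A single file goes through `hf_hub_download`; patterns do not apply.
        return ("hf_hub_download", None, None)
    if len(filenames) == 0:
        # No explicit files: the whole repo, optionally filtered by the patterns.
        return ("snapshot_download", include, exclude)
    # Several explicit files: they become the allow patterns and `--exclude` is dropped.
    return ("snapshot_download", list(filenames), None)


print(plan_download(["config.json"]))
print(plan_download([], include=["*.safetensors"]))
print(plan_download(["config.json", "model.safetensors"], exclude=["*.json"]))
```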
4 changes: 4 additions & 0 deletions src/huggingface_hub/commands/huggingface_cli.py
@@ -16,9 +16,11 @@
from argparse import ArgumentParser

from huggingface_hub.commands.delete_cache import DeleteCacheCommand
from huggingface_hub.commands.download import DownloadCommand
from huggingface_hub.commands.env import EnvironmentCommand
from huggingface_hub.commands.lfs import LfsCommands
from huggingface_hub.commands.scan_cache import ScanCacheCommand
from huggingface_hub.commands.upload import UploadCommand
from huggingface_hub.commands.user import UserCommands


@@ -29,6 +31,8 @@ def main():
    # Register commands
    EnvironmentCommand.register_subcommand(commands_parser)
    UserCommands.register_subcommand(commands_parser)
    UploadCommand.register_subcommand(commands_parser)
    DownloadCommand.register_subcommand(commands_parser)
    LfsCommands.register_subcommand(commands_parser)
    ScanCacheCommand.register_subcommand(commands_parser)
    DeleteCacheCommand.register_subcommand(commands_parser)