Data Warehouse FAQ (Frequently Asked Questions)
Summary and answers to common data warehouse questions
How do I modify content in a dataset?
Currently, datasets do not support deleting a single file from an existing version, but the following modification operations are supported:
- Add new files to the current version through the "Upload data to current directory" function
- Overwrite an existing file in the current version by uploading a file with the same name through the "Upload data to current directory" function
If you want to modify a dataset from within a workspace, save the modified content to the container's working directory. After the container is shut down and its data has been synced, you can copy that directory into the dataset.
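As a rough sketch, you could stage the modified content in a local directory before re-uploading it to the dataset. The paths and helper below are hypothetical placeholders, not part of HyperAI's API:

```python
# Sketch: stage modified content from a (hypothetical) working directory into
# a local folder that will then be uploaded back to the dataset.
import shutil
from pathlib import Path

WORKDIR = Path("/path/to/container/workdir")  # placeholder: your container's working directory
STAGING = Path("./dataset_update")            # placeholder: local directory you will upload

def stage_modified_files(src: Path, dst: Path) -> None:
    """Copy the modified content into a clean staging directory, preserving layout."""
    if dst.exists():
        shutil.rmtree(dst)        # start from a clean staging directory
    shutil.copytree(src, dst)     # recursive copy, keeps relative paths

if __name__ == "__main__":
    stage_modified_files(WORKDIR, STAGING)
    file_count = sum(1 for p in STAGING.rglob("*") if p.is_file())
    print(f"Staged {file_count} files for upload")
```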
Upload succeeds immediately but new files are not visible on the page
In some cases, HyperAI may treat certain files as already existing (even though they were never successfully uploaded). This causes the upload dialog to close immediately after you click upload and to report that the files were uploaded successfully.
In this situation, you can click the "Upload new version" or "Upload to current directory" button again, click "Clear upload cache" in the upper right corner, and then retry the upload process.
Top-level directory lost during automatic extraction
When compressing a folder, there are two common approaches:
1. Compress the folder directly.
2. Select all the files inside the folder and then package them.
The first method adds an extra directory layer inside the archive, which appears as a top-level directory after extraction. The second method stores the files directly in the archive, so no additional directory is created after extraction. The sketch below illustrates the difference.
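As a small illustration (not tied to any particular compression tool), here is how the two approaches differ when building the archive with Python's zipfile; `train` is a hypothetical local folder:

```python
# Sketch: the two packaging methods produce archives whose entries either
# carry a top-level "train/" prefix (method 1) or not (method 2).
import os
import zipfile

def zip_with_top_dir(folder: str, archive: str) -> None:
    """Method 1: entries are stored as 'train/...', adding a top-level directory."""
    with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED) as zf:
        for root, _dirs, files in os.walk(folder):
            for name in files:
                path = os.path.join(root, name)
                zf.write(path, arcname=path)  # keeps the leading folder name

def zip_flat(folder: str, archive: str) -> None:
    """Method 2: entries are stored relative to the folder, with no top-level directory."""
    with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED) as zf:
        for root, _dirs, files in os.walk(folder):
            for name in files:
                path = os.path.join(root, name)
                zf.write(path, arcname=os.path.relpath(path, folder))

zip_with_top_dir("train", "train.zip")   # extracts into train/...
zip_flat("train", "train_flat.zip")      # extracts the files directly
```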
Most developers compress folders with the first method without realizing that the extracted files will sit under an extra directory layer. To accommodate this common case, when the archive's file name matches its top-level directory name, we automatically remove that directory layer. For example:
If you compress a directory named train, the generated file is named train.zip by default. After uploading, HyperAI will automatically remove the top-level train directory and keep the files underneath.
If you want to keep the train directory layer, rename train.zip to something like train_.zip before uploading. After uploading, HyperAI will see that the file name (train_) differs from the top-level directory (train) and will keep the directory layer.
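If you want to predict which way a given archive will be handled, the rule described above can be approximated locally. This is only a sketch of the check, not HyperAI's actual implementation:

```python
# Sketch: does the archive's top-level directory match the archive's file name?
# If it does, HyperAI strips that directory layer; otherwise the layer is kept.
import os
import zipfile

def top_level_matches(archive_path: str) -> bool:
    stem = os.path.splitext(os.path.basename(archive_path))[0]  # e.g. "train"
    with zipfile.ZipFile(archive_path) as zf:
        top_levels = {name.split("/", 1)[0] for name in zf.namelist()}
    return top_levels == {stem}

print(top_level_matches("train.zip"))   # True  -> top-level directory removed
print(top_level_matches("train_.zip"))  # False -> directory layer kept
```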
Failure displayed after automatic extraction
If "Upload failed, please confirm data package format" is displayed after uploading a compressed package, it means HyperAI cannot extract the compressed package at all. It is recommended to package in another format and upload again.
Actual file count decreased after automatic extraction
There are several reasons why an archive may fail to extract, fully or partially:
- Incompatible filename character encoding, for example the default encoding used for Chinese filenames on Windows
- Partial corruption of the archive introduced during repeated transfers
- The archive uses an incompatible format, see macOS large zip package upload
HyperAI tries multiple extraction tools in order to recover as much of the archive as possible, but even after several attempts some corrupted data may still be lost. If the extraction result on HyperAI does not match what you get when extracting the archive locally, try repackaging the locally extracted data, or transfer it through several smaller uploads.
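As a quick local sanity check before re-uploading (a sketch, not something HyperAI requires), Python's zipfile.testzip() reports the first member whose CRC check fails:

```python
# Sketch: verify a zip archive locally before re-uploading.
# testzip() returns the name of the first corrupted member, or None if all pass.
import zipfile

def verify_zip(archive_path: str) -> bool:
    with zipfile.ZipFile(archive_path) as zf:
        bad = zf.testzip()
        if bad is not None:
            print(f"Corrupted member: {bad}")
            return False
    return True

print(verify_zip("train.zip"))  # "train.zip" is a hypothetical archive
```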
macOS large zip package upload
The original zip standard supports compressed packages up to 4GB in size and containing a maximum of 65535 files.
The newer zip64 standard lifts these limits, supporting larger archives and more files. However, the default compression tool in macOS Sierra and later does not follow this standard when compressing content exceeding 4GB, so archives larger than 4GB created on macOS cannot be successfully decompressed and uploaded on HyperAI.
Therefore, macOS users are advised to use Keka or other compression tools that support the zip64 standard.
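If you prefer scripting, Python's zipfile can also produce zip64-capable archives (allowZip64 is enabled by default in Python 3); this is just one alternative to a dedicated tool such as Keka, and `train` is a hypothetical folder:

```python
# Sketch: build an archive with zip64 extensions enabled so that content
# larger than 4GB remains extractable.
import os
import zipfile

def make_zip64(folder: str, archive: str) -> None:
    with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED, allowZip64=True) as zf:
        for root, _dirs, files in os.walk(folder):
            for name in files:
                path = os.path.join(root, name)
                zf.write(path, arcname=os.path.relpath(path, folder))

make_zip64("train", "train.zip")
```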