FAQ (updated irregularly)

Allowing remote user connections to MongoDB (create a user)

> use target_db

> db.createUser({
    user: 'userName',
    pwd: 'secretPassword',
    roles: [
      { role: 'dbAdmin', db: 'target_db' },
      { role: 'readWrite', db: 'target_db' }
    ]
  })
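The user created above can then connect from another machine with a standard connection string. A minimal Python sketch, assuming the server is actually reachable from the client (mongod listening on a non-localhost interface via net.bindIp, port 27017 open). The helper name mongo_uri and the example host are hypothetical. Credentials are escaped with urllib.parse.quote_plus so characters such as @ or : in a password do not break the URI.

```python
from urllib.parse import quote_plus


def mongo_uri(user, password, host, db, port=27017):
    """Build a connection string for a user created in `db` (hypothetical helper)."""
    # Escape credentials so '@', ':' or '/' in them don't break the URI
    creds = f"{quote_plus(user)}:{quote_plus(password)}"
    # authSource names the database the user was created in (target_db above)
    return f"mongodb://{creds}@{host}:{port}/{db}?authSource={db}"


# Example with a placeholder address:
# mongo_uri("userName", "secretPassword", "203.0.113.5", "target_db")
```

The resulting string can be passed to pymongo.MongoClient(uri) or given to mongosh on the command line.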

Installing a JDK on macOS with brew

brew search jdk 
brew install openjdk@11

java -version
# If Java cannot be found, you get:
# The operation couldn’t be completed. Unable to locate a Java Runtime.
# Please visit http://www.java.com for information on installing Java.

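# Symlink the brew JDK so the macOS java wrapper can find it
# (/opt/homebrew is the Homebrew prefix on Apple Silicon; on Intel Macs it is /usr/local)
sudo ln -sfn /opt/homebrew/opt/openjdk@11/libexec/openjdk.jdk \
  /Library/Java/JavaVirtualMachines/openjdk.jdk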

java -version
# openjdk version "11.0.18" 2023-01-17
# OpenJDK Runtime Environment Homebrew (build 11.0.18+0)
# OpenJDK 64-Bit Server VM Homebrew (build 11.0.18+0, mixed mode)
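To check the result from a script rather than by eye, here is a small Python sketch that runs java -version and extracts the version string. It assumes the output format shown above (java -version writes it to stderr); parse_java_version and installed_java_version are hypothetical helper names.

```python
import re
import subprocess


def parse_java_version(output):
    """Extract the version string from `java -version` output, or None."""
    # The first line looks like: openjdk version "11.0.18" 2023-01-17
    match = re.search(r'version "([^"]+)"', output)
    return match.group(1) if match else None


def installed_java_version():
    """Run `java -version` and parse the result; None if no usable JDK."""
    try:
        proc = subprocess.run(["java", "-version"], capture_output=True, text=True)
    except FileNotFoundError:
        return None  # no java launcher on PATH at all
    # The version banner goes to stderr; include stdout just in case
    return parse_java_version(proc.stderr + proc.stdout)
```

installed_java_version() returns None both when there is no java on PATH and when the macOS stub prints the "Unable to locate a Java Runtime" message, since that text contains no version "..." part.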

HDFS and file uploads

Using https://github.com/big-data-europe/docker-hadoop:

docker-compose up -d
# Copy a local file into the namenode container
docker cp <local> namenode:/<path>

Inside the namenode container (docker exec -it namenode bash):

# List the contents of the HDFS root directory "/"
hdfs dfs -ls /
# Create a directory in HDFS (-p also creates missing parent directories)
hdfs dfs -mkdir -p /user/root
# Upload a file to HDFS; -f overwrites the destination if it already exists
hdfs dfs -put -f <file_name> <path>
# Print the contents of a file stored in HDFS
hdfs dfs -cat <input_file>
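The two steps (docker cp into the container, then hdfs dfs inside it) can also be scripted from the host with docker exec. A Python sketch, assuming the container is named namenode as in docker-hadoop, that hdfs is on the container's PATH, and that /tmp inside the container is a usable staging directory; upload_commands and run_all are hypothetical helpers.

```python
import shlex
import subprocess
from pathlib import PurePosixPath


def upload_commands(local_file, hdfs_dir, container="namenode", tmp_dir="/tmp"):
    """Return the docker/hdfs commands that copy a local file into HDFS."""
    staged = f"{tmp_dir}/{PurePosixPath(local_file).name}"  # path inside the container
    return [
        # 1. copy from the host into the container's local filesystem
        ["docker", "cp", local_file, f"{container}:{staged}"],
        # 2. make sure the target HDFS directory exists
        ["docker", "exec", container, "hdfs", "dfs", "-mkdir", "-p", hdfs_dir],
        # 3. upload into HDFS, overwriting an existing file (-f)
        ["docker", "exec", container, "hdfs", "dfs", "-put", "-f", staged, hdfs_dir],
    ]


def run_all(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)
```

run_all(upload_commands("data/input.csv", "/user/root/input")) only prints the three commands; pass dry_run=False to execute them.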

ERROR:: Could not find a local HDF5 installation (when running pip install tables)

Fix, based on GitHub issues:

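pip install cython
brew install hdf5
brew install c-blosc
# /usr/local/ is the Homebrew prefix on Intel Macs; on Apple Silicon
# use $(brew --prefix hdf5) and $(brew --prefix c-blosc) instead
export HDF5_DIR=/usr/local/
export BLOSC_DIR=/usr/local/
pip install tables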
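Since the right value for HDF5_DIR and BLOSC_DIR depends on where Homebrew lives, here is a Python sketch that picks the default prefix from the CPU architecture and runs pip with those variables set. It assumes Homebrew's default locations (/opt/homebrew on Apple Silicon, /usr/local on Intel); brew --prefix remains the authoritative answer on a customized install. The function names are hypothetical.

```python
import os
import platform
import subprocess
import sys


def default_brew_prefix(machine=None):
    """Homebrew's default install prefix for the given CPU architecture."""
    machine = machine or platform.machine()
    # Apple Silicon reports arm64 and uses /opt/homebrew; Intel Macs use /usr/local
    return "/opt/homebrew" if machine == "arm64" else "/usr/local"


def tables_build_env(machine=None):
    """The variables the PyTables build reads to locate HDF5 and Blosc."""
    prefix = default_brew_prefix(machine)
    return {"HDF5_DIR": prefix, "BLOSC_DIR": prefix}


def install_tables():
    """Run `pip install tables` with the variables above added to the environment."""
    env = {**os.environ, **tables_build_env()}
    subprocess.run([sys.executable, "-m", "pip", "install", "tables"], env=env, check=True)
```

Calling install_tables() is equivalent to the two export lines followed by pip install tables, using the interpreter that runs the script.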

Kaggle API fails to download on featurize

Find the proxy configured in featurize

printenv | grep -i proxy
# http_proxy=http://172.16.0.13:5848
# https_proxy=http://172.16.0.13:5848
# all_proxy=socks5://172.16.0.13:5848
# no_proxy=127.0.0.1,localhost

Point kaggle at the proxy

kaggle config set -n proxy -v http://172.16.0.13:5848

# After that, downloads work
kaggle datasets download -d andrewmvd/hard-hat-detection
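The two manual steps (read the proxy from the environment, pass it to kaggle config) can be combined. A Python sketch assuming the proxy variables look like the printenv output above; only the HTTP(S) proxy is used, and the socks5 all_proxy entry is ignored because the command above passes an http:// URL. find_proxy, kaggle_proxy_command and configure_kaggle_proxy are hypothetical names.

```python
import os
import subprocess


def find_proxy(env=None):
    """Return the HTTP(S) proxy URL from the environment, or None."""
    env = os.environ if env is None else env
    # Prefer the https proxy, then http; accept lower- and upper-case names
    for name in ("https_proxy", "HTTPS_PROXY", "http_proxy", "HTTP_PROXY"):
        if env.get(name):
            return env[name]
    return None


def kaggle_proxy_command(proxy):
    """The kaggle CLI call that stores `proxy` in the kaggle config."""
    return ["kaggle", "config", "set", "-n", "proxy", "-v", proxy]


def configure_kaggle_proxy():
    """Look up the proxy and write it into the kaggle config."""
    proxy = find_proxy()
    if proxy is None:
        raise RuntimeError("no http(s)_proxy variable is set")
    subprocess.run(kaggle_proxy_command(proxy), check=True)
```

On featurize, configure_kaggle_proxy() would run the same kaggle config set command as above with the address taken from the environment instead of typed by hand.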

Running async code in a notebook

Run %autoawait asyncio in a cell; after that, coroutines can be awaited directly at the top level of a cell.
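A Jupyter kernel already runs an asyncio event loop, so calling asyncio.run() inside a cell raises RuntimeError; with autoawait enabled, the coroutine is awaited directly instead. A minimal sketch; fetch and main are placeholder names standing in for real I/O.

```python
import asyncio


async def fetch(name, delay):
    """Stand-in for real I/O (an HTTP request, a DB query, ...)."""
    await asyncio.sleep(delay)
    return f"{name} done"


async def main():
    # gather runs the coroutines concurrently: about 0.2 s in total, not 0.3 s
    return await asyncio.gather(fetch("a", 0.1), fetch("b", 0.2))


# In a notebook cell after %autoawait asyncio:   results = await main()
# In a plain script (no running event loop):     results = asyncio.run(main())
```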