Monthly archive: May 2018

Hadoop Memo

Compute engine layer

One of the biggest differences between the Hadoop ecosystem and other ecosystems is the idea of "one platform, many applications." A traditional database has a single underlying engine and handles only relational workloads, so it is "one platform, one application"; the NoSQL market, with hundreds of NoSQL products, each targeting a different scenario and completely independent of the others, is a "many platforms, many applications" model. Hadoop, by contrast, shares a single HDFS storage layer at the bottom, with many components on top serving different application scenarios, for example:

  • Deterministic data analysis: mostly simple statistical tasks such as OLAP, focused on fast response; implemented by components such as Impala;
  • Exploratory data analysis: mostly tasks that discover relationships in information, such as search, focused on collecting the full set of unstructured information; implemented by components such as Search;
  • Predictive data analysis: mostly machine-learning tasks such as logistic regression, focused on model sophistication and compute power; implemented by components such as Spark and MapReduce;
  • Data processing and transformation: mostly ETL tasks such as data pipelines, focused on I/O throughput and reliability; implemented by components such as MapReduce

Service layer

The service layer wraps the programming API details of the underlying engines and gives business users a higher-level access model, e.g. Pig, Hive, etc.

The hottest part of this space is the OLAP SQL market. Today, 70% of Spark usage comes in through Spark SQL!

So which SQL on Hadoop engine is strongest? Hive, Apache Phoenix, Facebook's Presto, Spark SQL, Cloudera-backed Impala, MapR-backed Drill, IBM's Big SQL, or Pivotal's open-sourced HAWQ?
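As a small illustration of what this "higher abstraction" means in practice (the table name access_log and the query are hypothetical), the same SQL statement can be handed to Hive, which compiles it into MapReduce jobs, or to Spark SQL, which runs it on the Spark engine, without the user writing any engine-level code:

hive -e "SELECT dt, COUNT(*) FROM access_log GROUP BY dt"
spark-sql -e "SELECT dt, COUNT(*) FROM access_log GROUP BY dt"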


Centos kvm+ceph

  • Centos kvm+ceph
  • I. Install KVM on CentOS 6.5
  • 1. Disable SELinux
  • 2. Confirm Intel virtualization support
  • 3. Install the required packages
  • 4. Set up bridged networking
  • 5. Run a KVM instance (only to verify the environment is installed correctly)
  • 6. Connect to KVM
  • II. Install Ceph on CentOS (Firefly release)
  • Prepare the machines
  • Install on the admin node
  • Install the other nodes
  • III. Using Ceph from KVM
  • Create an OSD pool (the container for block devices)
  • Grant an account read/write access to the pool
  • Create an image in the pool with qemu-img
  • Verify the image was created
  • Create a virtual machine with KVM

Centos kvm+ceph

CentOS 7 cannot install Ceph from the public repos here; package dependencies are not satisfied.
CentOS 6 can install it, but the stock kernel does not support rbd, so the kernel needs to be upgraded:
rpm --import http://elrepo.org/RPM-GPG-KEY-elrepo.org

rpm -Uvh http://elrepo.org/elrepo-release-6-5.el6.elrepo.noarch.rpm

yum --enablerepo=elrepo-kernel install kernel-lt -y
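After rebooting into the new kernel, confirm that rbd is available (a quick check; this module is what the later rbd steps rely on):

uname -r
modprobe rbd && lsmod | grep rbd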

I. Install KVM on CentOS 6.5

1. Disable SELinux

vi /etc/selinux/config   # set SELINUX=disabled
reboot

2. Confirm Intel virtualization support

egrep '(vmx|svm)' --color=always /proc/cpuinfo
Empty output means virtualization is not supported.

3. Install the required packages

rpm --import /etc/pki/rpm-gpg/RPM-GPG-KEY*
yum install virt-manager libvirt qemu-kvm openssh-askpass kvm python-virtinst

service libvirtd start
chkconfig libvirtd on

The emulator is usually qemu-kvm, but it may be something else.
/usr/libexec/qemu-kvm -M ?   # list the supported machine types (in my case starting the VM complained that machine type rhel6.5 was unsupported; changing rhel6.5 to pc via virsh edit ... fixed it)
/usr/libexec/qemu-kvm -drive format=?   # list the supported disk formats (rbd must appear; if it does not, install a build with rbd support. From source: git clone git://git.qemu.org/qemu.git; ./configure --enable-rbd. The build may produce qemu-system-x86_64 instead of qemu-kvm, in which case change the emulator in the domain config to the compiled executable)

4. Set up bridged networking

yum install bridge-utils
vi /etc/sysconfig/network-scripts/ifcfg-br0

DEVICE="br0"
NM_CONTROLLED="no"
ONBOOT=yes
TYPE=Bridge
BOOTPROTO=none
IPADDR=192.168.0.100
PREFIX=24
GATEWAY=192.168.0.1
DNS1=8.8.8.8
DNS2=8.8.4.4
DEFROUTE=yes
IPV4_FAILURE_FATAL=yes
IPV6INIT=no
NAME="System br0"

vi /etc/sysconfig/network-scripts/ifcfg-eth0

DEVICE="eth0"
NM_CONTROLLED="no"
ONBOOT=yes
TYPE="Ethernet"
UUID="73cb0b12-1f42-49b0-ad69-731e888276ff"
HWADDR=00:1E:90:F3:F0:02
DEFROUTE=yes
IPV4_FAILURE_FATAL=yes
IPV6INIT=no
NAME="System eth0"
BRIDGE=br0

/etc/init.d/network restart
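A quick sanity check after the restart (br0 and eth0 are the interface names configured above):

brctl show
ip addr show br0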

5. Run a KVM instance (only to verify the environment is installed correctly)

virt-install --connect qemu:///system -n vm10 -r 512 --vcpus=2 --disk path=/var/lib/libvirt/images/vm10.img,size=12 -c /dev/cdrom --vnc --noautoconsole --os-type linux --os-variant debiansqueeze --accelerate --network=bridge:br0 --hvm

6. Connect to KVM

If a GUI is needed:
yum -y groupinstall "Desktop" "Desktop Platform" "X Window System" "Fonts"

II. Install Ceph on CentOS (Firefly release)

Prepare the machines

One admin/deploy machine: admin
One monitor: node1
Two OSD data machines: node2, node3
One client that uses Ceph rbd: ceph-client

Create a ceph user on every machine and give it sudo rights; all subsequent commands are run as the ceph user.
Disable SELinux on all machines.
Disable "Defaults requiretty" on all machines (sudo visudo).
Check iptables and other firewall settings on all machines; inter-node traffic uses ports 22, 6789, 6800 and others, so make sure it is not blocked. (A sketch of this preparation follows.)
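A minimal sketch of that per-node preparation (the ceph user name matches the text above; the sudoers drop-in file is one common way to grant the rights):

sudo useradd -d /home/ceph -m ceph
sudo passwd ceph
echo "ceph ALL = (root) NOPASSWD:ALL" | sudo tee /etc/sudoers.d/ceph
sudo chmod 0440 /etc/sudoers.d/ceph
sudo setenforce 0      # and set SELINUX=disabled in /etc/selinux/config
sudo visudo            # comment out the 'Defaults requiretty' line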

Install on the admin node

Add the repo
sudo vim /etc/yum.repos.d/ceph.repo

[ceph-noarch]
name=Ceph noarch packages
baseurl=http://ceph.com/rpm-{ceph-release}/{distro}/noarch
enabled=1
gpgcheck=1
type=rpm-md
gpgkey=https://ceph.com/git/?p=ceph.git;a=blob_plain;f=keys/release.asc

I replaced baseurl with baseurl=http://ceph.com/rpm-firefly/el6/noarch

Run sudo yum update && sudo yum install ceph-deploy

Let the admin node log in to the other machines with an SSH key:
ssh-keygen
ssh-copy-id ceph@node1
ssh-copy-id ceph@node2
ssh-copy-id ceph@node3

Install the other nodes

On the admin machine:
mkdir my-cluster
cd my-cluster

Initialize the configuration:
ceph-deploy new node1
This creates the configuration file ceph.conf in the current directory. Edit ceph.conf and:
add osd pool default size = 2, because we only have 2 OSDs;
add rbd default format = 2, which sets the default rbd image format to 2 so images can be cloned;
add journal dio = false. The resulting additions are sketched below.
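The additions to ceph.conf look roughly like this (a sketch; the fsid and mon lines that ceph-deploy generated are left untouched):

[global]
osd pool default size = 2
rbd default format = 2
journal dio = false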

Install Ceph on all nodes
ceph-deploy install admin node1 node2 node3

Initialize the monitor node
ceph-deploy mon create-initial node1

Initialize the OSD nodes
ssh node2
sudo mkdir /var/local/osd0
exit
ssh node3
sudo mkdir /var/local/osd1
exit

ceph-deploy osd prepare node2:/var/local/osd0 node3:/var/local/osd1
ceph-deploy osd activate node2:/var/local/osd0 node3:/var/local/osd1
(These two commands should finish within a few seconds; if they hang until the 300-second timeout, check whether a firewall is getting in the way, as sketched below.)
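If the firewall is the cause, opening the Ceph ports on every node is usually enough (a sketch for iptables on CentOS 6; 6789 is the monitor port and 6800-7300 is the default OSD port range):

sudo iptables -I INPUT -p tcp -m multiport --dports 22,6789,6800:7300 -j ACCEPT
sudo service iptables save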

Copy the configuration to every node
ceph-deploy admin admin node1 node2 node3

sudo chmod +r /etc/ceph/ceph.client.admin.keyring
ceph health
ceph status
You want to end up with the active+clean state.

III. Using Ceph from KVM

Create an OSD pool (the container for block devices)

ceph osd pool create libvirt-pool 128 128

Grant an account read/write access to the pool

Assume the account we use is libvirt (the admin account has all permissions by default and needs no setup):
ceph auth get-or-create client.libvirt mon 'allow r' osd 'allow class-read object_prefix rbd_children, allow rwx pool=libvirt-pool'

Create an image in the pool with qemu-img

qemu-img create -f rbd rbd:libvirt-pool/new-libvirt-image 10G
At this step I hit "Unknown file format 'rbd'", which happens when qemu-img has no rbd support.
My qemu-img version number was high enough, so I suspect my package was built without the rbd option. In the end I force-installed the older http://ceph.com/packages/qemu-kvm/centos/x86_64/qemu-img-0.12.1.2-2.355.el6.2.cuttlefish.x86_64.rpm, which does support it (a quick check is sketched below).
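A quick way to see whether the installed qemu-img was built with rbd support (treat this as a sketch; the exact help text listing supported formats varies by version):

qemu-img --help | grep -i rbd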

Verify the image was created

rbd -p libvirt-pool ls

Create a virtual machine with KVM

1. Create a virtual machine with the virsh command or virt-manager
You need an ISO or image under /var/lib/libvirt/images/.
I named it test. For the CD-ROM choose an ISO such as debian.iso and do not add a hard disk (a virt-install equivalent is sketched below).
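A roughly equivalent virt-install invocation (a sketch; the VM name test and the ISO path are the ones mentioned above, while the memory and vcpu sizes are arbitrary):

virt-install --connect qemu:///system -n test -r 1024 --vcpus=1 --nodisks -c /var/lib/libvirt/images/debian.iso --vnc --noautoconsole --os-type linux --hvm --network=bridge:br0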

2. virsh edit test

Change the VM's configuration to use rbd storage.
Find
<devices>

and add the following after it:
<disk type='network' device='disk'>
<source protocol='rbd' name='libvirt-pool/new-libvirt-image'>
<host name='{monitor-host}' port='6789'/>
</source>
<target dev='vda' bus='virtio'/>
</disk>

3. Create the credentials for accessing Ceph
cat > secret.xml <<EOF
<secret ephemeral='no' private='no'>
<usage type='ceph'>
<name>client.libvirt secret</name>
</usage>
</secret>
EOF

sudo virsh secret-define --file secret.xml
<uuid of secret is output here>
Save the key of the libvirt user:
ceph auth get-key client.libvirt | sudo tee client.libvirt.key
Save the UUID that was generated.

sudo virsh secret-set-value --secret {uuid of secret} --base64 $(cat client.libvirt.key) && rm client.libvirt.key secret.xml

virsh edit test

add auth
...
</source>
<auth username='libvirt'>
<secret type='ceph' uuid='9ec59067-fdbc-a6c0-03ff-df165c0587b8'/>
</auth>

4. Start the virtual machine and install the OS
Meanwhile, virsh edit test
and change the configured boot device from cdrom to hd (see the snippet below).
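In the domain XML this amounts to changing the boot element (a sketch; only the <boot> line changes, the machine type stays whatever virsh defined):

<os>
  <type arch='x86_64' machine='pc'>hvm</type>
  <boot dev='hd'/>
</os>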

5. When the installation finishes, reboot the VM; it will boot from vda, i.e. from rbd.

6. As a next step you can use rbd snap, rbd clone, virsh and guestfs to build and use VM templates, for example:
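A sketch of that template workflow using the image and pool created above (the snapshot and clone names are placeholders):

rbd snap create libvirt-pool/new-libvirt-image@template
rbd snap protect libvirt-pool/new-libvirt-image@template
rbd clone libvirt-pool/new-libvirt-image@template libvirt-pool/vm-clone-01
# then point the new domain's <source ... name='libvirt-pool/vm-clone-01'/> at the clone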

 

virt-install

virt-install(1) - Linux man page

Name

virt-install - provision new virtual machines

Synopsis

virt-install [ OPTION ]...

Description

virt-install is a command line tool for creating new KVM , Xen, or Linux container guests using the "libvirt" hypervisor management library. See the EXAMPLES section at the end of this document to quickly get started.

virt-install tool supports both text based & graphical installations, using VNC or SDL graphics, or a text serial console. The guest can be configured to use one or more virtual disks, network interfaces, audio devices, physical USB or PCI devices, among others.

The installation media can be held locally or remotely on NFS , HTTP , FTP servers. In the latter case "virt-install" will fetch the minimal files necessary to kick off the installation process, allowing the guest to fetch the rest of the OS distribution as needed. PXE booting, and importing an existing disk image (thus skipping the install phase) are also supported.

Given suitable command line arguments, "virt-install" is capable of running completely unattended, with the guest 'kickstarting' itself too. This allows for easy automation of guest installs. An interactive mode is also available with the --prompt option, but this will only ask for the minimum required options.

Options

Most options are not required. Minimum requirements are --name, --ram, guest storage (--disk, --filesystem or --nodisks), and an install option.

-h, --help

Show the help message and exit

--connect=CONNECT

Connect to a non-default hypervisor. The default connection is chosen based on the following rules:

xen

If running on a host with the Xen kernel (checks against /proc/xen)

qemu:///system

If running on a bare metal kernel as root (needed for KVM installs)

qemu:///session

If running on a bare metal kernel as non-root

It is only necessary to provide the "--connect" argument if this default prioritization is incorrect, e.g. if wanting to use QEMU while on a Xen kernel.

General Options

General configuration parameters that apply to all types of guest installs.

-n NAME , --name=NAME

Name of the new guest virtual machine instance. This must be unique amongst all guests known to the hypervisor on the connection, including those not currently active. To re-define an existing guest, use the virsh(1) tool to shut it down ('virsh shutdown') & delete ('virsh undefine') it prior to running "virt-install".

-r MEMORY , --ram=MEMORY

Memory to allocate for guest instance in megabytes. If the hypervisor does not have enough free memory, it is usual for it to automatically take memory away from the host operating system to satisfy this allocation.

--arch=ARCH

Request a non-native CPU architecture for the guest virtual machine. If omitted, the host CPU architecture will be used in the guest.

--machine=MACHINE

The machine type to emulate. This will typically not need to be specified for Xen or KVM , but is useful for choosing machine types of more exotic architectures.

-u UUID , --uuid=UUID

UUID for the guest; if none is given a random UUID will be generated. If you specify UUID , you should use a 32-digit hexadecimal number. UUID are intended to be unique across the entire data center, and indeed world. Bear this in mind if manually specifying a UUID

--vcpus=VCPUS[,maxvcpus=MAX][,sockets=#][,cores=#][,threads=#]

Number of virtual cpus to configure for the guest. If 'maxvcpus' is specified, the guest will be able to hotplug up to MAX vcpus while the guest is running, but will start up with VCPUS. CPU topology can additionally be specified with sockets, cores, and threads. If values are omitted, the rest will be autofilled preferring sockets over cores over threads.

--cpuset=CPUSET

Set which physical cpus the guest can use. "CPUSET" is a comma separated list of numbers, which can also be specified in ranges or cpus to exclude. Example:

0,2,3,5     : Use processors 0,2,3 and 5

1-5,^3,8    : Use processors 1,2,4,5 and 8

If the value 'auto' is passed, virt-install attempts to automatically determine an optimal cpu pinning using NUMA data, if available.

--numatune=NODESET,[mode=MODE]

Tune NUMA policy for the domain process. Example invocations

--numatune 1,2,3,4-7

--numatune \"1-3,5\",mode=preferred

Specifies the numa nodes to allocate memory from. This has the same syntax as the "--cpuset" option. mode can be one of 'interleave', 'preferred', or 'strict' (the default). See 'man 8 numactl' for information about each mode. The nodeset string must use escaped-quotes if specifying any other option.

--cpu MODEL[,+feature][,-feature][,match=MATCH][,vendor=VENDOR]

Configure the CPU model and CPU features exposed to the guest. The only required value is MODEL , which is a valid CPU model as listed in libvirt's cpu_map.xml file. Specific CPU features can be specified in a number of ways: using one of libvirt's feature policy values force, require, optional, disable, or forbid, or with the shorthand '+feature' and '-feature', which equal 'force=feature' and 'disable=feature' respectively. Some examples:

--cpu core2duo,+x2apic,disable=vmx

Expose the core2duo CPU model, force enable x2apic, but do not expose vmx

--cpu host

Expose the host CPUs configuration to the guest. This enables the guest to take advantage of many of the host CPUs features (better performance), but may cause issues if migrating the guest to a host without an identical CPU .

--description

Human readable text description of the virtual machine. This will be stored in the guests XML configuration for access by other applications.

--security type=TYPE[,label=LABEL][,relabel=yes|no]

Configure domain security driver settings. Type can be either 'static' or 'dynamic'. 'static' configuration requires a security LABEL . Specifying LABEL without TYPE implies static configuration. To have libvirt automatically apply your static label, you must specify relabel=yes.

Installation Method options

-c CDROM , --cdrom=CDROM

File or device use as a virtual CD-ROM device for fully virtualized guests. It can be path to an ISO image, or to a CDROM device. It can also be a URL from which to fetch/access a minimal boot ISO image. The URLs take the same format as described for the "--location" argument. If a cdrom has been specified via the "--disk" option, and neither "--cdrom" nor any other install option is specified, the "--disk" cdrom is used as the install media.

-l LOCATION , --location=LOCATION

Distribution tree installation source. virt-install can recognize certain distribution trees and fetches a bootable kernel/initrd pair to launch the install. With libvirt 0.9.4 or later, network URL installs work for remote connections. virt-install will download kernel/initrd to the local machine, and then upload the media to the remote host. This option requires the URL to be accessible by both the local and remote host. The "LOCATION" can take one of the following forms:

DIRECTORY

Path to a local directory containing an installable distribution image

nfs:host:/path or nfs://host/path

An NFS server location containing an installable distribution image

http://host/path

An HTTP server location containing an installable distribution image

ftp://host/path

An FTP server location containing an installable distribution image

Some distro specific url samples:

Fedora/Red Hat Based

http://download.fedoraproject.org/pub/fedora/linux/releases/10/Fedora/i386/os/

Debian/Ubuntu

http://ftp.us.debian.org/debian/dists/etch/main/installer-amd64/

Suse

http://download.opensuse.org/distribution/11.0/repo/oss/

Mandriva

ftp://ftp.uwsg.indiana.edu/linux/mandrake/official/2009.0/i586/

--pxe

Use the PXE boot protocol to load the initial ramdisk and kernel for starting the guest installation process.

--import

Skip the OS installation process, and build a guest around an existing disk image. The device used for booting is the first device specified via "--disk" or "--filesystem".

--init=INITPATH

Path to a binary that the container guest will init. If a root "--filesystem" has been specified, virt-install will default to /sbin/init, otherwise it will default to /bin/sh.

--livecd

Specify that the installation media is a live CD and thus the guest needs to be configured to boot off the CDROM device permanently. It may be desirable to also use the "--nodisks" flag in combination.

-x EXTRA , --extra-args=EXTRA

Additional kernel command line arguments to pass to the installer when performing a guest install from "--location". One common usage is specifying an anaconda kickstart file for automated installs, such as --extra-args "ks=http://myserver/my.ks"

--initrd-inject=PATH

Add PATH to the root of the initrd fetched with "--location". This can be used to run an automated install without requiring a network hosted kickstart file:--initrd-inject=/path/to/my.ks --extra-args "ks=file:/my.ks"

--os-type=OS_TYPE

Optimize the guest configuration for a type of operating system (ex. 'linux', 'windows'). This will attempt to pick the most suitable ACPI & APIC settings, optimally supported mouse drivers, virtio, and generally accommodate other operating system quirks. By default, virt-install will attempt to auto detect this value from the install media (currently only supported for URL installs). Autodetection can be disabled with the special value 'none'. See "--os-variant" for valid options.

--os-variant=OS_VARIANT

Further optimize the guest configuration for a specific operating system variant (ex. 'fedora8', 'winxp'). This parameter is optional, and does not require an "--os-type" to be specified. By default, virt-install will attempt to auto detect this value from the install media (currently only supported for URL installs). Autodetection can be disabled with the special value 'none'. If the special value 'list' is passed, virt-install will print the full list of variant values and exit. The printed format is not a stable interface, DO NOT PARSE IT. If the special value 'none' is passed, no os variant is recorded and OS autodetection is disabled. Values for some recent OS options are:

win7 : Microsoft Windows 7

vista : Microsoft Windows Vista

winxp64 : Microsoft Windows XP (x86_64)

winxp : Microsoft Windows XP

win2k8 : Microsoft Windows Server 2008

win2k3 : Microsoft Windows Server 2003

freebsd8 : FreeBSD 8.x

generic : Generic

debiansqueeze : Debian Squeeze

debianlenny : Debian Lenny

fedora16 : Fedora 16

fedora15 : Fedora 15

fedora14 : Fedora 14

mes5.1 : Mandriva Enterprise Server 5.1 and later

mandriva2010 : Mandriva Linux 2010 and later

rhel6 : Red Hat Enterprise Linux 6

rhel5.4 : Red Hat Enterprise Linux 5.4 or later

rhel4 : Red Hat Enterprise Linux 4

sles11 : Suse Linux Enterprise Server 11

sles10 : Suse Linux Enterprise Server

ubuntuoneiric : Ubuntu 11.10 (Oneiric Ocelot)

ubuntunatty : Ubuntu 11.04 (Natty Narwhal)

ubuntumaverick : Ubuntu 10.10 (Maverick Meerkat)

ubuntulucid : Ubuntu 10.04 (Lucid Lynx)

ubuntuhardy : Ubuntu 8.04 LTS (Hardy Heron)

Use '--os-variant list' to see the full OS list

--boot=BOOTOPTS

Optionally specify the post-install VM boot configuration. This option allows specifying a boot device order, permanently booting off kernel/initrd with optional kernel arguments, and enabling a BIOS boot menu (requires libvirt 0.8.3 or later). --boot can be specified in addition to other install options (such as --location, --cdrom, etc.) or can be specified on its own. In the latter case, behavior is similar to the --import install option: there is no 'install' phase, the guest is just created and launched as specified. Some examples:

--boot cdrom,fd,hd,network,menu=on

Set the boot device priority as first cdrom, first floppy, first harddisk, network PXE boot. Additionally enable BIOS boot menu prompt.

--boot kernel=KERNEL,initrd=INITRD,kernel_args="console=/dev/ttyS0"

Have guest permanently boot off a local kernel/initrd pair, with the specified kernel options.

Storage Configuration

--disk=DISKOPTS

Specifies media to use as storage for the guest, with various options. The general format of a disk string is

--disk opt1=val1,opt2=val2,...

To specify media, the command can either be:

--disk /some/storage/path,opt1=val1

or explicitly specify one of the following arguments:

path

A path to some storage media to use, existing or not. Existing media can be a file or block device. If installing on a remote host, the existing media must be shared as a libvirt storage volume. Specifying a non-existent path implies attempting to create the new storage, and will require specifying a 'size' value. If the base directory of the path is a libvirt storage pool on the host, the new storage will be created as a libvirt storage volume. For remote hosts, the base directory is required to be a storage pool if using this method.

pool

An existing libvirt storage pool name to create new storage on. Requires specifying a 'size' value.

vol

An existing libvirt storage volume to use. This is specified as 'poolname/volname'.

Other available options:

device

Disk device type. Value can be 'cdrom', 'disk', or 'floppy'. Default is 'disk'. If a 'cdrom' is specified, and no install method is chosen, the cdrom is used as the install media.

bus

Disk bus type. Value can be 'ide', 'scsi', 'usb', 'virtio' or 'xen'. The default is hypervisor dependent since not all hypervisors support all bus types.

perms

Disk permissions. Value can be 'rw' (Read/Write), 'ro' (Readonly), or 'sh' (Shared Read/Write). Default is 'rw'

size

size (in GB ) to use if creating new storage

sparse

whether to skip fully allocating newly created storage. Value is 'true' or 'false'. Default is 'true' (do not fully allocate). The initial time taken to fully allocate the guest virtual disk (sparse=false) will usually be balanced by faster install times inside the guest. Thus use of this option is recommended to ensure consistently high performance and to avoid I/O errors in the guest should the host filesystem fill up.

cache

The cache mode to be used. The host pagecache provides cache memory. The cache value can be 'none', 'writethrough', or 'writeback'. 'writethrough' provides read caching. 'writeback' provides read and write caching.

format

Image format to be used if creating managed storage. For file volumes, this can be 'raw', 'qcow2', 'vmdk', etc. See format types in <http://libvirt.org/storage.html> for possible values. This is often mapped to the driver_type value as well. With libvirt 0.8.3 and later, this option should be specified if reusing an existing disk image, since libvirt does not autodetect storage format as it is a potential security issue. For example, if reusing an existing qcow2 image, you will want to specify format=qcow2, otherwise the hypervisor may not be able to read your disk image.

driver_name

Driver name the hypervisor should use when accessing the specified storage. Typically does not need to be set by the user.

driver_type

Driver format/type the hypervisor should use when accessing the specified storage. Typically does not need to be set by the user.

io

Disk IO backend. Can be either "threads" or "native".

error_policy

How guest should react if a write error is encountered. Can be one of "stop", "none", or "enospace"

serial

Serial number of the emulated disk device. This is used in linux guests to set /dev/disk/by-id symlinks. An example serial number might be: WD-WMAP9A966149

See the examples section for some uses. This option deprecates "--file", "--file-size", and "--nonsparse".

--filesystem

Specifies a directory on the host to export to the guest. The most simple invocation is:

--filesystem /source/on/host,/target/point/in/guest

Which will work for recent QEMU and linux guest OS or LXC containers. For QEMU , the target point is just a mounting hint in sysfs, so will not be automatically mounted. The following explicit options can be specified:

type

The type or the source directory. Valid values are 'mount' (the default) or 'template' for OpenVZ templates.

mode

The access mode for the source directory from the guest OS . Only used with QEMU and type=mount. Valid modes are 'passthrough' (the default), 'mapped', or 'squash'. See libvirt domain XML documentation for more info.

source

The directory on the host to share.

target

The mount location to use in the guest.

--nodisks

Request a virtual machine without any local disk storage, typically used for running 'Live CD ' images or installing to network storage (iSCSI or NFS root).

-f DISKFILE , --file=DISKFILE

This option is deprecated in favor of "--disk path=DISKFILE".

-s DISKSIZE , --file-size=DISKSIZE

This option is deprecated in favor of "--disk ...,size=DISKSIZE,..."

--nonsparse

This option is deprecated in favor of "--disk ...,sparse=false,..."

Networking Configuration

-w NETWORK , --network=NETWORK,opt1=val1,opt2=val2

Connect the guest to the host network. The value for "NETWORK" can take one of 3 formats:

bridge=BRIDGE

Connect to a bridge device in the host called "BRIDGE". Use this option if the host has static networking config & the guest requires full outbound and inbound connectivity to/from the LAN . Also use this if live migration will be used with this guest.

network=NAME

Connect to a virtual network in the host called "NAME". Virtual networks can be listed, created, deleted using the "virsh" command line tool. In an unmodified install of "libvirt" there is usually a virtual network with a name of "default". Use a virtual network if the host has dynamic networking (eg NetworkManager), or using wireless. The guest will be NATed to the LAN by whichever connection is active.

user

Connect to the LAN using SLIRP . Only use this if running a QEMU guest as an unprivileged user. This provides a very limited form of NAT .

If this option is omitted a single NIC will be created in the guest. If there is a bridge device in the host with a physical interface enslaved, that will be used for connectivity. Failing that, the virtual network called "default" will be used. This option can be specified multiple times to set up more than one NIC. Other available options are:

model

Network device model as seen by the guest. Value can be any nic model supported by the hypervisor, e.g.: 'e1000', 'rtl8139', 'virtio', ...

mac Fixed MAC address for the guest; If this parameter is omitted, or the value "RANDOM" is specified a suitable address will be randomly generated. For Xen virtual machines it is required that the first 3 pairs in the MAC address be the sequence '00:16:3e', while for QEMU or KVM virtual machines it must be '52:54:00'.

--nonetworks

Request a virtual machine without any network interfaces.

-b BRIDGE , --bridge=BRIDGE

This parameter is deprecated in favour of "--network bridge=bridge_name".

-m MAC , --mac=MAC

This parameter is deprecated in favour of "--network NETWORK,mac=12:34..."

Graphics Configuration

If no graphics option is specified, "virt-install" will default to '--graphics vnc' if the DISPLAY environment variable is set, otherwise '--graphics none' is used.

--graphics TYPE ,opt1=arg1,opt2=arg2,...

Specifies the graphical display configuration. This does not configure any virtual hardware, just how the guest's graphical display can be accessed. Typically the user does not need to specify this option; virt-install will try and choose a useful default, and launch a suitable connection. The general format of a graphics string is

--graphics TYPE,opt1=arg1,opt2=arg2,...

For example:

--graphics vnc,password=foobar

The supported options are:

type

The display type. This is one of:

vnc

Setup a virtual console in the guest and export it as a VNC server in the host. Unless the "port" parameter is also provided, the VNC server will run on the first free port number at 5900 or above. The actual VNC display allocated can be obtained using the "vncdisplay" command to "virsh" (or virt-viewer(1) can be used which handles this detail for the user).

sdl

Setup a virtual console in the guest and display an SDL window in the host to render the output. If the SDL window is closed the guest may be unconditionally terminated.

spice

Export the guest's console using the Spice protocol. Spice allows advanced features like audio and USB device streaming, as well as improved graphical performance.

Using spice graphic type will work as if those arguments were given:

--video qxl --channel spicevmc

none

No graphical console will be allocated for the guest. Fully virtualized guests (Xen FV or QEmu/KVM) will need to have a text console configured on the first serial port in the guest (this can be done via the --extra-args option). Xen PV will set this up automatically. The command 'virsh console NAME ' can be used to connect to the serial device.

port

Request a permanent, statically assigned port number for the guest console. This is used by 'vnc' and 'spice'

tlsport

Specify the spice tlsport.

listen

Address to listen on for VNC/Spice connections. Default is typically 127.0.0.1 (localhost only), but some hypervisors allow changing this globally (for example, the qemu driver default can be changed in /etc/libvirt/qemu.conf). Use 0.0.0.0 to allow access from other machines. This is used by 'vnc' and 'spice'

keymap

Request that the virtual VNC console be configured to run with a specific keyboard layout. If the special value 'local' is specified, virt-install will attempt to configure to use the same keymap as the local system. A value of 'none' specifically defers to the hypervisor. Default behavior is hypervisor specific, but typically is the same as 'local'. This is used by 'vnc'

password

Request a VNC password, required at connection time. Beware, this info may end up in virt-install log files, so don't use an important password. This is used by 'vnc' and 'spice'

passwordvalidto

Set an expiration date for password. After the date/time has passed, all new graphical connections are denied until a new password is set. This is used by 'vnc' and 'spice'. The format for this value is YYYY-MM-DDTHH:MM:SS , for example 2011-04-01T14:30:15

--vnc

This option is deprecated in favor of "--graphics vnc,..."

--vncport=VNCPORT

This option is deprecated in favor of "--graphics vnc,port=PORT,..."

--vnclisten=VNCLISTEN

This option is deprecated in favor of "--graphics vnc,listen=LISTEN,..."

-k KEYMAP , --keymap=KEYMAP

This option is deprecated in favor of "--graphics vnc,keymap=KEYMAP,..."

--sdl

This option is deprecated in favor of "--graphics sdl,..."

--nographics

This option is deprecated in favor of "--graphics none"

--noautoconsole

Don't automatically try to connect to the guest console. The default behaviour is to launch a VNC client to display the graphical console, or to run the "virsh" "console" command to display the text console. Use of this parameter will disable this behaviour.

Virtualization Type options

Options to override the default virtualization type choices.

-v, --hvm

Request the use of full virtualization, if both para & full virtualization are available on the host. This parameter may not be available if connecting to a Xen hypervisor on a machine without hardware virtualization support. This parameter is implied if connecting to a QEMU based hypervisor.

-p, --paravirt

This guest should be a paravirtualized guest. If the host supports both para & full virtualization, and neither this parameter nor the "--hvm" are specified, this will be assumed.

--container

This guest should be a container type guest. This option is only required if the hypervisor supports other guest types as well (so for example this option is the default behavior for LXC and OpenVZ, but is provided for completeness).

--virt-type

The hypervisor to install on. Example choices are kvm, qemu, xen, or kqemu. Available options are listed via 'virsh capabilities' in the <domain> tags.

--accelerate

Prefer KVM or KQEMU (in that order) if installing a QEMU guest. This behavior is now the default, and this option is deprecated. To install a plain QEMU guest, use '--virt-type qemu'

--noapic

Force disable APIC for the guest.

--noacpi

Force disable ACPI for the guest.

Device Options

--host-device=HOSTDEV

Attach a physical host device to the guest. Some example values for HOSTDEV:

--host-device pci_0000_00_1b_0

A node device name via libvirt, as shown by 'virsh nodedev-list'

--host-device 001.003

USB by bus, device (via lsusb).

--host-device 0x1234:0x5678

USB by vendor, product (via lsusb).

--host-device 1f.01.02

PCI device (via lspci).

--soundhw MODEL

Attach a virtual audio device to the guest. MODEL specifies the emulated sound card model. Possible values are ich6, ac97, es1370, sb16, pcspk, or default. 'default' will be AC97 if the hypervisor supports it, otherwise it will be ES1370 .This deprecates the old boolean --sound model (which still works the same as a single '--soundhw default')

--watchdog MODEL[,action=ACTION]

Attach a virtual hardware watchdog device to the guest. This requires a daemon and device driver in the guest. The watchdog fires a signal when the virtual machine appears to be hung. ACTION specifies what libvirt will do when the watchdog fires. Values are

reset

Forcefully reset the guest (the default)

poweroff

Forcefully power off the guest

pause

Pause the guest

none

Do nothing

shutdown

Gracefully shutdown the guest (not recommended, since a hung guest probably won't respond to a graceful shutdown)

MODEL is the emulated device model: either i6300esb (the default) or ib700. Some examples:

Use the recommended settings:

--watchdog default

Use the i6300esb with the 'poweroff' action:

--watchdog i6300esb,action=poweroff

--parallel=CHAROPTS

--serial=CHAROPTS

Specifies a serial device to attach to the guest, with various options. The general format of a serial string is

--serial type,opt1=val1,opt2=val2,...

--serial and --parallel devices share all the same options, unless otherwise noted. Some of the types of character device redirection are:

--serial pty

Pseudo TTY . The allocated pty will be listed in the running guests XML description.

--serial dev,path=HOSTPATH

Host device. For serial devices, this could be /dev/ttyS0. For parallel devices, this could be /dev/parport0.

--serial file,path=FILENAME

Write output to FILENAME .

--serial pipe,path=PIPEPATH

Named pipe (see pipe(7))

--serial tcp,host=HOST:PORT,mode=MODE,protocol=PROTOCOL

TCP net console. MODE is either 'bind' (wait for connections on HOST:PORT ) or 'connect' (send output to HOST:PORT ), default is 'connect'. HOST defaults to '127.0.0.1', but PORT is required. PROTOCOL can be either 'raw' or 'telnet' (default 'raw'). If 'telnet', the port acts like a telnet server or client. Some examples:

Connect to localhost, port 1234:

--serial tcp,host=:1234

Wait for connections on any address, port 4567:

--serial tcp,host=0.0.0.0:4567,mode=bind

Wait for telnet connection on localhost, port 2222. The user could then connect interactively to this console via 'telnet localhost 2222':

--serial tcp,host=:2222,mode=bind,protocol=telnet

--serial udp,host=CONNECT_HOST:PORT,bind_host=BIND_HOST:BIND_PORT

UDP net console. HOST:PORT is the destination to send output to (default HOST is '127.0.0.1', PORT is required). BIND_HOST:BIND_PORT is the optional local address to bind to (default BIND_HOST is 127.0.0.1, but is only set if BIND_PORT is specified). Some examples:

Send output to default syslog port (may need to edit /etc/rsyslog.conf accordingly):

--serial udp,host=:514

Send output to remote host 192.168.10.20, port 4444 (this output can be read on the remote host using 'nc -u -l 4444'):

--serial udp,host=192.168.10.20:4444

--serial unix,path=UNIXPATH,mode=MODE

Unix socket, see unix(7). MODE has similar behavior and defaults as --serial tcp,mode=MODE

--channel

Specifies a communication channel device to connect the guest and host machine. This option uses the same options as --serial and --parallel for specifying the host/source end of the channel. Extra 'target' options are used to specify how the guest machine sees the channel. Some of the types of character device redirection are:

--channel SOURCE ,target_type=guestfwd,target_address=HOST:PORT

Communication channel using QEMU usermode networking stack. The guest can connect to the channel using the specified HOST:PORT combination.

--channel SOURCE ,target_type=virtio[,name=NAME]

Communication channel using virtio serial (requires 2.6.34 or later host and guest). Each instance of a virtio --channel line is exposed in the guest as /dev/vport0p1, /dev/vport0p2, etc. NAME is optional metadata, and can be any string, such as org.linux-kvm.virtioport1. If specified, this will be exposed in the guest at /sys/class/virtio-ports/vport0p1/NAME

--channel spicevmc,target_type=virtio[,name=NAME]

Communication channel for QEMU spice agent, using virtio serial (requires 2.6.34 or later host and guest). NAME is optional metadata, and can be any string, such as the default com.redhat.spice.0 that specifies how the guest will see the channel.

--console

Connect a text console between the guest and host. Certain guest and hypervisor combinations can automatically set up a getty in the guest, so an out of the box text login can be provided (target_type=xen for xen paravirt guests, and possibly target_type=virtio in the future). Example:

--console pty,target_type=virtio

Connect a virtio console to the guest, redirected to a PTY on the host. For supported guests, this exposes /dev/hvc0 in the guest. See http://fedoraproject.org/wiki/Features/VirtioSerial for more info. virtio console requires libvirt 0.8.3 or later.

--video=VIDEO

Specify what video device model will be attached to the guest. Valid values for VIDEO are hypervisor specific, but some options for recent kvm are cirrus, vga, qxl, or vmvga (vmware).

--smartcard=MODE[,OPTS]

Configure a virtual smartcard device. Mode is one of host, host-certificates, or passthrough. Additional options are:

type

Character device type to connect to on the host. This is only applicable for passthrough mode.

An example invocation:

--smartcard passthrough,type=spicevmc

Use the smartcard channel of a SPICE graphics device to pass smartcard info to the guest

See "http://libvirt.org/formatdomain.html#elementsSmartcard" for complete details.

Miscellaneous Options

--autostart

Set the autostart flag for a domain. This causes the domain to be started on host boot up.

--print-xml

If the requested guest has no install phase (--import, --boot), print the generated XML instead of defining the guest. By default this WILL do storage creation (can be disabled with --dry-run). If the guest has an install phase, you will need to use --print-step to specify exactly what XML output you want. This option implies --quiet.

--print-step

Acts similarly to --print-xml, except requires specifying which install step to print XML for. Possible values are 1, 2, 3, or all. Stage 1 is typically booting from the install media, and stage 2 is typically the final guest config booting off hard disk. Stage 3 is only relevant for windows installs, which by default have a second install stage. This option implies --quiet.

--noreboot

Prevent the domain from automatically rebooting after the install has completed.

--wait=WAIT

Amount of time to wait (in minutes) for a VM to complete its install. Without this option, virt-install will wait for the console to close (not necessarily indicating the guest has shut down), or in the case of --noautoconsole, simply kick off the install and exit. Any negative value will make virt-install wait indefinitely, a value of 0 triggers the same results as noautoconsole. If the time limit is exceeded, virt-install simply exits, leaving the virtual machine in its current state.

--force

Prevent interactive prompts. If the intended prompt was a yes/no prompt, always say yes. For any other prompts, the application will exit.

--dry-run

Proceed through the guest creation process, but do NOT create storage devices, change host device configuration, or actually teach libvirt about the guest. virt-install may still fetch install media, since this is required to properly detect the OS to install.

--prompt

Specifically enable prompting for required information. Default prompting is off (as of virtinst 0.400.0)

--check-cpu

Check that the number of virtual cpus requested does not exceed physical CPUs and warn if they do.

-q, --quiet

Only print fatal error messages.

-d, --debug

Print debugging information to the terminal when running the install process. The debugging information is also stored in "$HOME/.virtinst/virt-install.log" even if this parameter is omitted.

Examples

Install a Fedora 13 KVM guest with virtio accelerated disk/network, creating a new 8GB storage file, installing from media in the hosts CDROM drive, auto launching a graphical VNC viewer

# virt-install \

--connect qemu:///system \

--virt-type kvm \

--name demo \

--ram 500 \

--disk path=/var/lib/libvirt/images/demo.img,size=8 \

--graphics vnc \

--cdrom /dev/cdrom \

--os-variant fedora13

Install a Fedora 9 plain QEMU guest, using LVM partition, virtual networking, booting from PXE , using VNC server/viewer

# virt-install \

--connect qemu:///system \

--name demo \

--ram 500 \

--disk path=/dev/HostVG/DemoVM \

--network network=default \

--virt-type qemu

--graphics vnc \

--os-variant fedora9

Install a guest with a real partition, with the default QEMU hypervisor for a different architecture using SDL graphics, using a remote kernel and initrd pair:

# virt-install \

--connect qemu:///system \

--name demo \

--ram 500 \

--disk path=/dev/hdc \

--network bridge=eth1 \

--arch ppc64 \

--graphics sdl \

--location http://download.fedora.redhat.com/pub/fedora/linux/core/6/x86_64/os/

Run a Live CD image under Xen fullyvirt, in diskless environment

# virt-install \

--hvm \

--name demo \

--ram 500 \

--nodisks \

--livecd \

--graphics vnc \

--cdrom /root/fedora7live.iso

Run /usr/bin/httpd in a linux container guest ( LXC ). Resource usage is capped at 512 MB of ram and 2 host cpus:

# virt-install \

--connect lxc:/// \

--name httpd_guest \

--ram 512 \

--vcpus 2 \

--init /usr/bin/httpd

Install a paravirtualized Xen guest, 500 MB of RAM , a 5 GB of disk, and Fedora Core 6 from a web server, in text-only mode, with old style --file options:

# virt-install \

--paravirt \

--name demo \

--ram 500 \

--file /var/lib/xen/images/demo.img \

--file-size 6 \

--graphics none \

--location http://download.fedora.redhat.com/pub/fedora/linux/core/6/x86_64/os/

Create a guest from an existing disk image 'mydisk.img' using defaults for the rest of the options.

# virt-install \

--name demo

--ram 512

--disk /home/user/VMs/mydisk.img

--import

Test a custom kernel/initrd using an existing disk image, manually specifying a serial device hooked to a PTY on the host machine.

# virt-install \

--name mykernel

--ram 512

--disk /home/user/VMs/mydisk.img

--boot kernel=/tmp/mykernel,initrd=/tmp/myinitrd,kernel_args="console=ttyS0"

--serial pty

Authors

Written by Daniel P. Berrange, Hugh Brock, Jeremy Katz, Cole Robinson and a team of many other contributors. See the AUTHORS file in the source distribution for the complete list of credits.

 

links:

https://linux.die.net/man/1/virt-install
https://libvirt.org/formatdomain.html#elementsDevices
https://www.linux-kvm.org/page/Processor_support

##################################

CPU LIST:

x86           qemu64  QEMU Virtual CPU version 1.5.3
x86           phenom  AMD Phenom(tm) 9550 Quad-Core Processor
x86         core2duo  Intel(R) Core(TM)2 Duo CPU     T7700  @ 2.40GHz
x86            kvm64  Common KVM processor
x86           qemu32  QEMU Virtual CPU version 1.5.3
x86            kvm32  Common 32-bit KVM processor
x86          coreduo  Genuine Intel(R) CPU           T2600  @ 2.16GHz
x86              486
x86          pentium
x86         pentium2
x86         pentium3
x86           athlon  QEMU Virtual CPU version 1.5.3
x86             n270  Intel(R) Atom(TM) CPU N270   @ 1.60GHz
x86      cpu64-rhel6  QEMU Virtual CPU version (cpu64-rhel6)
x86           Conroe  Intel Celeron_4x0 (Conroe/Merom Class Core 2)
x86           Penryn  Intel Core 2 Duo P9xxx (Penryn Class Core 2)
x86          Nehalem  Intel Core i7 9xx (Nehalem Class Core i7)
x86     Nehalem-IBRS  Intel Core i7 9xx (Nehalem Core i7, IBRS update)
x86         Westmere  Westmere E56xx/L56xx/X56xx (Nehalem-C)
x86    Westmere-IBRS  Westmere E56xx/L56xx/X56xx (IBRS update)
x86      SandyBridge  Intel Xeon E312xx (Sandy Bridge)
x86 SandyBridge-IBRS  Intel Xeon E312xx (Sandy Bridge, IBRS update)
x86        IvyBridge  Intel Xeon E3-12xx v2 (Ivy Bridge)
x86   IvyBridge-IBRS  Intel Xeon E3-12xx v2 (Ivy Bridge, IBRS)
x86          Haswell  Intel Core Processor (Haswell)
x86     Haswell-IBRS  Intel Core Processor (Haswell, IBRS)
x86        Broadwell  Intel Core Processor (Broadwell)
x86   Broadwell-IBRS  Intel Core Processor (Broadwell, IBRS)
x86   Skylake-Client  Intel Core Processor (Skylake)
x86 Skylake-Client-IBRS  Intel Core Processor (Skylake, IBRS)
x86   Skylake-Server  Intel Xeon Processor (Skylake)
x86 Skylake-Server-IBRS  Intel Xeon Processor (Skylake, IBRS)
x86       Opteron_G1  AMD Opteron 240 (Gen 1 Class Opteron)
x86       Opteron_G2  AMD Opteron 22xx (Gen 2 Class Opteron)
x86       Opteron_G3  AMD Opteron 23xx (Gen 3 Class Opteron)
x86       Opteron_G4  AMD Opteron 62xx class CPU
x86       Opteron_G5  AMD Opteron 63xx class CPU
x86             EPYC  AMD EPYC Processor
x86        EPYC-IBPB  AMD EPYC Processor (with IBPB)
x86             host  KVM processor with all supported host features 
(only available in KVM mode)
Recognized CPUID flags:
pbe ia64 tm ht ss sse2 sse fxsr mmx acpi ds clflush pn pse36 pat cmov mca pge mtrr sep apic cx8 mce pae msr tsc pse de vme fpu
hypervisor rdrand f16c avx osxsave xsave aes tsc-deadline popcnt movbe x2apic sse4.2|sse4_2 sse4.1|sse4_1 dca pcid pdcm xtpr cx16 fma cid ssse3 tm2 est smx vmx ds_cpl monitor dtes64 pclmulqdq|pclmuldq pni|sse3
avx512vl avx512bw sha-ni avx512cd avx512er avx512pf clwb clflushopt pcommit avx512ifma smap adx rdseed avx512dq avx512f mpx rtm invpcid erms bmi2 smep avx2 hle bmi1 fsgsbase
avx512-vpopcntdq ospke pku avx512vbmi
ssbd arch-facilities stibp spec-ctrl avx512-4fmaps avx512-4vnniw
3dnow 3dnowext lm|i64 rdtscp pdpe1gb fxsr_opt|ffxsr mmxext nx|xd syscall
perfctr_nb perfctr_core topoext tbm nodeid_msr tce fma4 lwp wdt skinit xop ibs osvw 3dnowprefetch misalignsse sse4a abm cr8legacy extapic svm cmp_legacy lahf_lm
ibpb
pmm-en pmm phe-en phe ace2-en ace2 xcrypt-en xcrypt xstore-en xstore
kvm_pv_unhalt kvm_pv_eoi kvm_steal_time kvm_asyncpf kvmclock kvm_mmu kvm_nopiodelay kvmclock
pfthreshold pause_filter decodeassists flushbyasid vmcb_clean tsc_scale nrip_save svm_lock lbrv npt
xsaves xgetbv1 xsavec xsaveopt

########################################

device

'cdrom', 'disk', or 'floppy'.

target bus
"ide", "scsi", "virtio", "xen", "usb", "sata", or "sd"

rbd
<target dev="hda" bus="ide"/>
<target dev="sda" bus="sata"/>
<target dev="sda" bus="scsi"/>
<target dev='vda' bus='virtio'/>

hda
<target dev="sda" bus="ide"/>

lun
<target dev='sda' bus='scsi'/>

iSCSI
<target dev='vda' bus='virtio'/>


Common Ceph commands

1. Ceph pool configuration
1) Create a pool
ceph osd pool create {pool-name} {pg-num} [{pgp-num}]

2) List pools
ceph osd lspools

3) Set pool quotas: the maximum number of stored objects or the maximum number of stored bytes (either one is enough)
ceph osd pool set-quota {pool-name} [max_objects {obj-count}] [max_bytes {bytes}]

4) Rename a pool
ceph osd pool rename {current-pool-name} {new-pool-name}

5) Show pool usage
rados df

6) Take a snapshot of a pool
ceph osd pool mksnap {pool-name} {snap-name}

7) Remove a pool snapshot
ceph osd pool rmsnap {pool-name} {snap-name}

8) Delete a pool
ceph osd pool delete {pool-name} [{pool-name} --yes-i-really-really-mean-it]

2. Ceph PG configuration
1) Set the number of PGs in a pool
ceph osd pool set {pool-name} pg_num {pg-num}

2) Get the number of PGs in a pool
ceph osd pool get {pool-name} pg_num

3) Set the pgp_num of a pool
ceph osd pool set {pool-name} pgp_num {pgp-num}

4) Get the pgp_num of a pool
ceph osd pool get {pool-name} pgp_num

5) Show the status of the PGs in the cluster
ceph pg dump

6) Show the map of a given PG
ceph pg map {pg-id}

7) Query the status of a given PG
ceph pg {pg-id} query

8) Scrub a PG
ceph pg scrub {pg-id}

3. Ceph image (rbd) configuration
1) Check and, if needed, update the Linux kernel version

The kernel must support modprobe rbd; check the version with: uname -r

If it is not supported, do the following:

sudo rpm --import https://www.elrepo.org/RPM-GPG-KEY-elrepo.org
sudo rpm -Uvh http://www.elrepo.org/elrepo-release-7.0-2.el7.elrepo.noarch.rpm
sudo yum --enablerepo=elrepo-kernel install kernel-ml kernel-ml-devel

Switch to the new kernel:
sudo grub2-set-default 'CentOS Linux (4.4.0-1.el7.elrepo.x86_64) 7 (Core)'
grub2-editenv list   # verify the change

Finally regenerate the grub configuration:
sudo grub2-mkconfig -o /boot/grub2/grub.cfg

2) Create a device image; size is in MB
rbd create {image-name} --size {megabytes} --pool {pool-name}

For example:
rbd create testImage --size 512 --pool testPool

3) List the block devices (images) in a pool
rbd list {pool-name}

4) Read the information of an image
rbd --image {image-name} -p {pool-name} info

5) Resize an image
rbd resize --image {image-name} -p {pool-name} --size 1024

6) Map a block device locally
1. sudo modprobe rbd
2. sudo rbd map {image-name} --pool {pool-name}

7) Show mapped devices
rbd showmapped

8) Delete a block device from a specific pool
rbd rm {image-name} -p {pool-name}

9) Unmap a block device (a worked example follows)
sudo rbd unmap /dev/rbd/{poolname}/{imagename}
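A worked example tying steps 6-9 together, reusing the testPool/testImage names from above (a sketch; it assumes /etc/ceph is configured on this host, and the ext4 filesystem and /mnt mount point are arbitrary choices):

sudo modprobe rbd
sudo rbd map testImage --pool testPool      # appears as /dev/rbd/testPool/testImage
sudo mkfs.ext4 /dev/rbd/testPool/testImage
sudo mount /dev/rbd/testPool/testImage /mnt
sudo umount /mnt
sudo rbd unmap /dev/rbd/testPool/testImage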

4. Ceph snapshot configuration
1) Creating snapshots

1. Create a snapshot of an image
rbd snap create {pool-name}/{image-name}@{snap-name}

2. List all snapshots of an image
rbd snap ls {pool-name}/{image-name}

3. Roll back to a snapshot
rbd snap rollback {pool-name}/{image-name}@{snap-name}

4. Delete a snapshot
rbd snap rm {pool-name}/{image-name}@{snap-name}

5. Purge all snapshots of an image
rbd snap purge {pool-name}/{image-name}

2) Layered snapshots (cloning)

1. First create an image in the special format required for cloning: format 2
rbd create --image-format 2 {image-name} --size {megabytes} --pool {pool-name}

2. Create a snapshot
rbd snap create {pool-name}/{image-name}@{snap-name}

3. Clones access their parent snapshot; if the parent snapshot is deleted, all clones break. To prevent data loss, you must protect the snapshot before cloning it.

rbd snap protect {pool-name}/{image-name}@{snapshot-name}
$ rbd clone rbd/testImage2@test2Snap rbd/testImage3

4. Clone a snapshot: clone a snapshot of an image in one pool into another pool as a new image
rbd clone {pool-name}/{parent-image}@{snap-name} {pool-name}/{child-image-name}

5. Deleting snapshots

1] Method one:
delete all of the snapshot's clones, unprotect the snapshot, then delete the snapshot

2] Method two:
first flatten the clones, then unprotect and delete the snapshot

6. Unprotect a snapshot
rbd snap unprotect {pool-name}/{image-name}@{snapshot-name}

7. List the children of a snapshot
rbd children {pool-name}/{image-name}@{snapshot-name}

8. Flatten a cloned image to remove the reference to its parent snapshot
rbd flatten {pool-name}/{image-name}

5. Ceph with QEMU
1) Create an image with qemu-img
qemu-img create -f raw rbd:{pool-name}/{image-name} {size}
For example: qemu-img create -f raw rbd:testPool/testImage0 1G

2) Resize a qemu image
qemu-img resize rbd:{pool-name}/{image-name} {size}

3) Show the properties of a qemu image
qemu-img info rbd:{pool-name}/{image-name}

4) Convert an image in another format into an rbd block-device image
qemu-img convert -p -f {source-format} {source-image-file} -O {target-format} rbd:poolname/imagename

5) Run a virtual machine from the image
qemu -m {memory-size} -drive format=raw,file=rbd:poolname/imagename

6) QEMU with cache control

1. qemu -m 1024 -drive format=rbd,file=rbd:poolname/imagename,cache=writeback
2. qemu -m 1024 \
-drive format=raw,file=rbd:poolname/imagename:rbd_cache=true,cache=writeback

Note: if rbd_cache=true is set, you must also set cache=writeback or risk data loss; otherwise QEMU will not send flush requests to librbd, and if QEMU exits uncleanly in this configuration, filesystems built on top of rbd can be corrupted.

7) QEMU cache modes

Write-back:
rbd_cache = true
Write-through:
rbd_cache = true
rbd_cache_max_dirty = 0
None:
rbd_cache = false

Git Commands

Git Commands

Git is a very powerful distributed version control system. It is not only well suited to managing the source code of large open-source projects; it also has many advantages for managing personal documents and source code.
Common Git operations:
1) Remote repository commands
Clone a repository: $ git clone git://github.com/jquery/jquery.git
List remotes: $ git remote -v
Add a remote: $ git remote add [name] [url]
Remove a remote: $ git remote rm [name]
Change a remote's push URL: $ git remote set-url --push [name] [newUrl]
Pull from a remote: $ git pull [remoteName] [localBranchName]
Push to a remote: $ git push [remoteName] [localBranchName]

*To push a local branch test to the remote repository as its master branch, or as another branch named test:
$ git push origin test:master // push the local test branch as the remote master branch
$ git push origin test:test // push the local test branch as the remote test branch

2) Branch commands
List local branches: $ git branch
List remote branches: $ git branch -r
Create a local branch: $ git branch [name] ----note that a newly created branch is not checked out automatically
Switch branches: $ git checkout [name]
Create a new branch and switch to it immediately: $ git checkout -b [name]
Delete a branch: $ git branch -d [name] ---- the -d option can only delete branches that have already been merged; unmerged branches cannot be deleted this way. Use the -D option to force-delete a branch
Merge a branch: $ git merge [name] ----merge the branch named [name] into the current branch
Create a remote branch (push a local branch to the remote): $ git push origin [name]
Delete a remote branch: $ git push origin :heads/[name] or $ git push origin :[name]

*Create an empty branch (remember to commit the changes on your current branch first, otherwise they will be wiped out with no way back):
$ git symbolic-ref HEAD refs/heads/[name]
$ rm .git/index
$ git clean -fdx

3) Tag commands
List tags: $ git tag
Create a tag: $ git tag [name]
Delete a tag: $ git tag -d [name]
List remote tags: $ git tag -r
Create a remote tag (push a local tag to the remote): $ git push origin [name]
Delete a remote tag: $ git push origin :refs/tags/[name]
Merge tags from the remote repository into the local one: $ git pull origin --tags
Push local tags to the remote repository: $ git push origin --tags
Create an annotated tag: $ git tag -a [name] -m 'yourMessage'

4) Submodule commands
Add a submodule: $ git submodule add [url] [path]
e.g.: $ git submodule add git://github.com/soberh/ui-libs.git src/main/webapp/ui-libs
Initialize submodules: $ git submodule init ----only needs to run once, after the first checkout of the repository
Update submodules: $ git submodule update ----run after every update or branch switch
Remove a submodule (4 steps):
1) $ git rm --cached [path]
2) Edit the ".gitmodules" file and remove the submodule's configuration section
3) Edit the ".git/config" file and remove the submodule's configuration section
4) Manually delete the submodule's leftover directory

5) Ignore files and folders
Create a file named ".gitignore" in the repository root and list the unwanted folders or files in it, one entry per line, e.g.
target
bin
*.db

=====================
Common Git commands
git branch  list local branches
git status  show the current status
git commit  commit
git branch -a  list all branches (local and remote)
git branch -r  list remote branches
git commit -am "init"  commit with a message
git remote add origin git@192.168.1.119:ndshow
git push origin master  push to the server
git remote show origin  show the resources of the remote origin
git push origin master:develop
git push origin master:hb-dev  associate the local repository with the one on the server
git checkout --track origin/dev  check out and track the remote dev branch
git branch -D master develop  force-delete the local develop branch
git checkout -b dev  create a new local branch dev
git merge origin/dev  merge the dev branch into the current branch
git checkout dev  switch to the local dev branch
git remote show  list the remote repositories
git add .
git rm <file name (with path)>  delete the given file from git
git clone git://github.com/schacon/grit.git  pull the code down from the server
git config --list  list all configuration (including user info)
git ls-files  list the files that are tracked (already committed)
git rm [file name]  delete a file
git commit -a  commit all changes in the current repo
git add [file name]  add a file to the git index
git commit -v  show the diff being committed when using -v
git commit -m "This is the message describing the commit"  add a commit message
git commit -a  -a means add: stage every change into the git index and then commit
git commit -a -v  the usual commit command
git log  view your commit log
git diff  view updates that have not been staged yet
git rm a.a  remove a file (delete it from the index and the working tree)
git rm --cached a.a  remove a file (delete it from the index only)
git commit -m "remove"  remove the file (delete it from Git)
git rm -f a.a  force-remove a modified file (from the index and the working tree)
git diff --cached or $ git diff --staged  view updates that have not been committed yet
git stash push  push the changes into a temporary stash
git stash pop  pop the changes back out of the stash
---------------------------------------------------------
git remote add origin git@github.com:username/Hello-World.git
git push origin master  push the local project to the server
-----------------------------------------------------------
git pull  sync the local repository with the server
-----------------------------------------------------------------
git push (remote name) (branch name)  push a local branch to the server.
git push origin serverfix:awesomebranch
------------------------------------------------------------------
git fetch  fetch the latest version from the remote to the local repo without automatically merging
git commit -a -m "log_message"  (-a commits all changes, -m adds the log message) sync local changes to the server:
git branch branch_0.1 master  create branch_0.1 from the master branch
git branch -m branch_0.1 branch_1.0  rename branch_0.1 to branch_1.0
git checkout branch_1.0/master  switch to the branch_1.0/master branch
du -hs

-----------------------------------------------------------
mkdir WebApp
cd WebApp
git init
touch README
git add README
git commit -m 'first commit'
git remote add origin git@github.com:daixu/WebApp.git
git push -u origin master

Git commands quick-reference chart


Words and phrases describing wisdom

Idioms describing wisdom

1) 悉心竭力:悉心:尽心;竭力:用尽全力。竭尽智慧和力量。

2) 目达耳通:形容感觉灵敏,非常聪明。

3) 七窍玲珑:形容聪明灵巧。相传心有七窍,故称。

4) 颖悟绝伦:颖悟:聪颖。绝伦:超过同辈。聪明过人。亦作“颖悟绝人”。

5) 姱容修态:姱:美好;修:长远;态:志向。美丽的容貌,长远的智慧。

6) 矜智负能:矜:夸耀。夸耀智慧和才能。

7) 材高知深:材:通“才”。知:通“智”。才能出众,智慧高超。

8) 饰智矜愚:装作有智慧而在无知者面前夸耀。

9) 聪明睿达:聪明:聪敏有智慧。形容洞察力强,见识卓越。

10) 私智小慧:私:个人的;慧:智慧。个人的智慧和小聪明。指带有片面性而又自以为是的聪明。

Idioms expressing wisdom

1) 折冲万里:折冲:指抵御敌人。指在远离沙场的庙堂上以谋略和智慧克敌制胜。常用以形容高明的外交才干或在外交争端中取得胜利。

2) 百龙之智:龙:公孙龙,战国时人,著有《公孙龙子》。一百个公孙龙的智慧。形容非常聪明。

3) 竭智尽力:用尽智慧和力量。

4) 积思广益:指集中众人的智慧,可使效果更大更好。

5) 聪慧绝伦:绝伦:同类中无可比拟者。指十分聪明智慧。

6) 足智多谋:足:充实,足够;智:聪明、智慧;谋:计谋。富有智慧,善于谋划。形容人善于料事和用计。

7) 教一识百:形容具有特殊的才能、智慧。

8) 没魂少智:智:智慧。形容失魂落魄的样子。

9) 知出乎争:智:同“智”;争:斗争。聪明才智是在反复斗争中锻炼出来的。比喻智慧来源于实践。

10) 才疏智浅:才:才能;疏:稀少;智:智慧。才识不高,智力短浅。用作自谦之词。

11) 虚室生白:虚:使空虚;室:指心;白:指道。心无任何杂念,就会悟出“道”来,生出智慧。也常用以形容清澈明朗的境界。

12) 智珠在握:智珠:佛教指本性的智慧。比喻具有高深的智慧并能应付任何事情。

13) 集思广益:集:集中;思:思考,意见;广:扩大。指集中群众的智慧,广泛吸收有益的意见。

14) 明镜不疲:明亮的镜子不为频繁地照人而疲劳。比喻人的智慧不会因使用而受损害。

15) 聪明睿知:聪明:聪敏有智慧。形容洞察力强,见识卓越。

16) 殚智竭力:殚:竭尽。用尽智慧和力量。

17) 智尽力穷:智慧和能力都已用尽。

18) 不测之智:测:估计;智:才智,智慧。不可估计的才智。形容智高才广。

19) 矜愚饰智:装作有智慧,在愚人面前夸耀自己。

20) 私智小惠:个人的智慧和小聪明。指带有片面性而又自以为是的聪明。

Idioms describing wisdom, with explanations

1) 一士之智:智:智慧。一个人的智慧。形容有限的才智。

2) 人多智广:人多智慧也多。用来强调人多出智慧。

3) 殚智竭虑:用尽智慧,竭力谋虑。

4) 智尽能索:索:竭尽。智慧和能力都已用尽。

5) 施谋用智:智:智慧,计谋。运用策略计谋。

6) 智贵免祸:智:智慧。人的聪明智慧,正当使用,可以使他避免灾祸。

7) 绝圣弃知:绝:断绝;圣:智慧;弃:舍去,抛开;知:通“智”,智慧。指摒弃聪明智巧,回归天真纯朴。

8) 敏而好学:敏:聪明;好:喜好。天资聪明而又好学。

9) 绝圣弃智:圣、智:智慧,聪明。弃绝聪明才智,返归天真纯朴。这是古代老、庄的无为而治的思想。

10) 虚室上白:虚:使空虚;室:指心;白:指道。心无任何杂念,就会悟出“道”来,生出智慧。也常用以形容清澈明朗的境界。

11) 明昭昏蒙:昭:明白;蒙:愚昧无知。聪明而通晓事理,愚昧而不明事理。

12) 高世之智:高世:超过世人;智:智慧,才智。具有超出世人的才智。形容才智非凡。

13) 双修福慧:修:善。福德和智慧都修行到了。指既有福,又聪明。

14) 才智过人:才能智慧都胜过一般人。

15) 万物之灵:万物:泛指天地间的所有生物;灵:聪明、灵巧。世上一切物种中最有灵性的。指人而言。

16) 负薪之资:负薪:背柴草,旧指地位低微的人;资:资质,指智慧,能力。指卑贱者的资质。

17) 醍醐灌顶:醍醐:酥酪上凝聚的油。用纯酥油浇到头上。佛教指灌输智慧,使人彻底觉悟。比喻听了高明的意见使人受到很大启发。也形容清凉舒适。

18) 知以藏往:知:同“智”,才智;以:已经;藏:包含。人的智慧包含在过去的事物中。比喻聪明才智来源于过去的经验教训。

19) 至智弃智:智慧达到极点,就可舍弃智慧不用。

20) 知尽能索:比喻智慧能力都竭尽了。

LBS / latitude / longitude / GPS


Latitude:

Northern latitudes are positive, southern latitudes are negative.
Latitude is the angle between the line from a point to the earth's center and the equatorial plane; it ranges from 0 to 90 degrees. Points north of the equator have north latitude, written N; points south of the equator have south latitude, written S.
Regions between 0 and 30 degrees are called low latitudes, between 30 and 60 degrees mid latitudes, and between 60 and 90 degrees high latitudes.
The equator, the Tropic of Capricorn, the Tropic of Cancer, the Antarctic Circle and the Arctic Circle are special parallels.
Cities and other geographic regions near each parallel:
北纬90度:北极
北纬80度:
北纬70度:摩尔曼斯克
北纬60度:奥斯陆、斯德哥尔摩、赫尔辛基、圣彼得堡、雷克雅维克
北纬50度:伦敦、巴黎、布鲁塞尔、法兰克福、布拉格、克拉科夫、基辅、温哥华、莫斯科
北纬40度:马德里、伊斯坦布尔、安卡拉、喀什、北京、盐湖城、丹佛、华盛顿、纽约
北纬35度:东京
北纬30度:开罗、苏伊士运河、科威特城、新德里、珠穆朗玛峰、拉萨、三江并流、重庆、长江三峡、武汉、杭州、休斯敦、新奥尔良
北纬20度:香港、撒哈拉沙漠、吉达、台湾、孟买、内比都、广州、海口、福建省、火奴鲁鲁、墨西哥城
北纬10度:墨西哥城、科纳克里、亚的斯亚贝巴、胡志明市、宿务、圣荷西、巴拿马城、巴拿马运河、加拉加斯
赤道:圣多美、利伯维尔、坎帕拉、新加坡、基多
南纬10度:罗安达、帝力、莫尔兹比港、利马、累西腓
南纬20度:塔那那利佛、苏瓦、苏克雷
南纬30度:悉尼、开普敦、布隆方丹、德班、布里斯班、复活节岛、圣地亚哥
南纬35度:堪培拉
南纬40度:惠灵顿
南纬50度:麦哲伦海峡
南纬60度:德雷克海峡
南纬70度:
南纬80度:
南纬90度:南极、阿蒙森-斯科特站

The total length of the earth's meridian circle is about 40008 km.
On average:
1 degree of latitude = about 111 km
1 minute of latitude = about 1.85 km
1 second of latitude = about 30.9 m
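These averages follow directly from the meridian length:

1 degree ≈ 40008 km / 360 ≈ 111.1 km
1 minute ≈ 111.1 km / 60 ≈ 1.85 km
1 second ≈ 1.852 km / 60 ≈ 30.9 m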

The equator is at latitude 0° and divides the planet into the northern and southern hemispheres.

Conversion

Latitude and longitude are expressed in degrees, usually written directly as a decimal, but the fractional part of a degree can also be split into arc minutes (1 minute is one sixtieth of a degree) and seconds (1 second is one sixtieth of a minute). There are several formats; here are some examples.

Degrees, minutes, seconds (D:M:S): 49°30'00", -123d30m00s
Degrees and decimal minutes (D:M): 49°30.0', -123d30.0m
Decimal degrees: 49.5000°, -123.5000d (usually with four decimal places).

Straits

Northern hemisphere
白令海峡(西经170、北极圈)
东南亚的马六甲(东经100、北纬2.2度)
西亚(阿拉伯半岛)的霍尔木兹海峡(东经60、北回归线)
曼德海峡(东经45、北纬14)
土耳其海峡(东经30、北纬40)
直布罗陀海峡(西经5、北纬36)
Southern hemisphere
非洲的莫桑比克海峡(东经40、南纬20)
南美洲的麦哲伦海峡(西经70、南纬53)
Important peninsulas:
朝鲜半岛(东经126、北纬38)
中南半岛(东经100、北纬15)
马来半岛(东经102、北纬5)
印度半岛(东经80、北纬20)
阿拉伯半岛(东经50、北纬20)
西奈半岛(东经35、北纬30)
小亚细亚半岛(东经30、北纬40)
巴尔干半岛(东经20、北纬40)
亚平宁半岛(东经15、北纬40)
伊比利亚半岛(西经5、北纬40)
日德兰半岛(东经5、北纬55)
斯勘的纳维亚半岛(东经10、北纬60)
约克角(东经145、南纬15)
阿拉斯加半岛(西经165、北纬60)
下加利福尼亚半岛(西经110、北回归线)

More on latitude and longitude

Diagram of latitude/longitude division rules

 

Distance

Calculating the distance between two latitude/longitude points

This model treats the earth as a sphere. Suppose A(ja, wa) and B(jb, wb) are two points on the earth (ja and jb are the longitudes of A and B, wa and wb their latitudes). The spherical distance between A and B is the length of arc AB, and arc AB = R * angle AOB (where AOB is the angle between A and B, O is the center of the earth, and R is the earth's radius, about 6,367,000 m). How do we obtain angle AOB? First compute the length of AB, the longest side of triangle AOB, and then use the law of cosines to get the angle.

The distance calculation code from a Google Maps script:

private const double EARTH_RADIUS = 6378.137;   // 地球赤道半径,单位:公里
// 角度转弧度
private static double rad(double d)
{
   return d * Math.PI / 180.0;
}
// 计算两点(纬度lat、经度lng,单位:度)间的大圆距离,返回值单位:公里
public static double GetDistance(double lat1, double lng1, double lat2, double lng2)
{
   double radLat1 = rad(lat1);
   double radLat2 = rad(lat2);
   double a = radLat1 - radLat2;       // 纬度差(弧度)
   double b = rad(lng1) - rad(lng2);   // 经度差(弧度)
   // 半正矢(Haversine)公式
   double s = 2 * Math.Asin(Math.Sqrt(Math.Pow(Math.Sin(a/2),2) +
    Math.Cos(radLat1)*Math.Cos(radLat2)*Math.Pow(Math.Sin(b/2),2)));
   s = s * EARTH_RADIUS;
   s = Math.Round(s * 10000) / 10000;  // 保留4位小数
   return s;
}

公式(即上面代码所用的半正矢 Haversine 公式,R 为地球半径,φ 为纬度,λ 为经度):

d = 2R · arcsin( √( sin²((φ1-φ2)/2) + cos φ1 · cos φ2 · sin²((λ1-λ2)/2) ) )

外形扩展

地球由于受到自转产生的惯性离心力的作用,它并非完美的球形,而是赤道略鼓的椭球。所以若按“离地心最远”来算,地球最高点并不是海拔8848米的珠穆朗玛峰:赤道附近的山峰其实离星空更近一些,因此地球最高点理论上是厄瓜多尔的钦博拉索山(Mount Chimborazo),它的海拔虽然只有6272米,峰顶却比珠峰“高”出约2400米。

 

 

Ceph的安装

官网链接:http://docs.ceph.com/docs/master/install/install-storage-cluster/

2.1 相关依赖的安装

我这里操作系统都是Centos7.2的操作系统。

#yum install yum-plugin-priorities -y

# cat /etc/yum/pluginconf.d/priorities.conf   #确保priorities.conf中启用了该插件。

[main]
enabled = 1

# vim /etc/yum.repos.d/ceph.repo  #做ceph源文件

[Ceph]
name=Ceph packages for $basearch
baseurl=http://download.ceph.com/rpm-jewel/el7/$basearch
enabled=1
gpgcheck=1
type=rpm-md
gpgkey=https://download.ceph.com/keys/release.asc

[Ceph-noarch]
name=Ceph noarch packages
baseurl=http://download.ceph.com/rpm-jewel/el7/noarch
enabled=1
gpgcheck=1
type=rpm-md
gpgkey=https://download.ceph.com/keys/release.asc

[ceph-source]
name=Ceph source packages
baseurl=http://download.ceph.com/rpm-jewel/el7/SRPMS
enabled=1
gpgcheck=1
type=rpm-md
gpgkey=https://download.ceph.com/keys/release.asc

OR

[ceph]
name=ceph
baseurl=http://mirrors.aliyun.com/ceph/rpm-jewel/el7/x86_64/
gpgcheck=0
[ceph-noarch]
name=cephnoarch
baseurl=http://mirrors.aliyun.com/ceph/rpm-jewel/el7/noarch/
gpgcheck=0

#yum install snappy leveldb gdisk python-argparse gperftools-libs -y

2.2 ceph安装

rpm -Uvh http://mirrors.aliyun.com/ceph/rpm-jewel/el7/noarch/ceph-release-1-1.el7.noarch.rpm
yum -y install epel-release
yum -y install ceph-deploy
yum -y install ntp ntpdate ntp-doc
yum -y install openssh-server openssh-client
yum -y install yum-plugin-priorities
yum -y install redhat-lsb
yum -y install ceph ceph-release ceph-common ceph-radosgw

 


性能优化:

 Ceph Configurations

[global]

参数名 | 描述 | 默认值 | 建议值
public network | 客户端访问网络 | - | 192.168.100.0/24
cluster network | 集群网络 | - | 192.168.1.0/24
max open files | 如果设置了该选项,Ceph会设置系统的max open fds | 0 | 131072

  • 查看系统最大文件打开数可以使用命令
cat /proc/sys/fs/file-max

[osd] - filestore

参数名 | 描述 | 默认值 | 建议值
filestore xattr use omap | 为XATTRS使用object map,EXT4文件系统时使用,XFS或者btrfs也可以使用 | false | true
filestore max sync interval | 从日志到数据盘最大同步间隔(seconds) | 5 | 15
filestore min sync interval | 从日志到数据盘最小同步间隔(seconds) | 0.1 | 10
filestore queue max ops | 数据盘最大接受的操作数 | 500 | 25000
filestore queue max bytes | 数据盘一次操作最大字节数(bytes) | 100 << 20 | 10485760
filestore queue committing max ops | 数据盘能够commit的操作数 | 500 | 5000
filestore queue committing max bytes | 数据盘能够commit的最大字节数(bytes) | 100 << 20 | 10485760000
filestore op threads | 并发文件系统操作数 | 2 | 32

  • 调整omap的原因主要是EXT4文件系统的xattr空间默认仅有4K,存不下Ceph较长的XATTRS,所以改用object map存放
  • filestore queue相关的参数对于性能影响很小,参数调整不会对性能优化有本质上提升

[osd] - journal

参数名 | 描述 | 默认值 | 建议值
osd journal size | OSD日志大小(MB) | 5120 | 20000
journal max write bytes | journal一次性写入的最大字节数(bytes) | 10 << 20 | 1073714824
journal max write entries | journal一次性写入的最大记录数 | 100 | 10000
journal queue max ops | journal一次性最大在队列中的操作数 | 500 | 50000
journal queue max bytes | journal一次性最大在队列中的字节数(bytes) | 10 << 20 | 10485760000

  • Ceph OSD Daemon stops writes and synchronizes the journal with the filesystem, allowing Ceph OSD Daemons to trim operations from the journal and reuse the space.
  • 上面这段话的意思就是,Ceph OSD进程在往数据盘上刷数据的过程中,是停止写操作的。

[osd] - osd config tuning

参数名 | 描述 | 默认值 | 建议值
osd max write size | OSD一次可写入的最大值(MB) | 90 | 512
osd client message size cap | 客户端允许在内存中的最大数据(bytes) | 524288000 | 2147483648
osd deep scrub stride | 在Deep Scrub时候允许读取的字节数(bytes) | 524288 | 131072
osd op threads | OSD进程操作的线程数 | 2 | 8
osd disk threads | OSD密集型操作例如恢复和Scrubbing时的线程 | 1 | 4
osd map cache size | 保留OSD Map的缓存(MB) | 500 | 1024
osd map cache bl size | OSD进程在内存中的OSD Map缓存(MB) | 50 | 128
osd mount options xfs | Ceph OSD xfs Mount选项 | rw,noatime,inode64 | rw,noexec,nodev,noatime,nodiratime,nobarrier

  • 增加osd op threads和disk threads会带来额外的CPU开销

[osd] - recovery tuning

参数名 | 描述 | 默认值 | 建议值
osd recovery op priority | 恢复操作优先级,取值1-63,值越高占用资源越高 | 10 | 4
osd recovery max active | 同一时间内活跃的恢复请求数 | 15 | 10
osd max backfills | 一个OSD允许的最大backfills数 | 10 | 4

[osd] - client tuning

参数名 | 描述 | 默认值 | 建议值
rbd cache | RBD缓存 | true | true
rbd cache size | RBD缓存大小(bytes) | 33554432 | 268435456
rbd cache max dirty | 缓存为write-back时允许的最大dirty字节数(bytes),如果为0,使用write-through | 25165824 | 134217728
rbd cache max dirty age | 在被刷新到存储盘前dirty数据存在缓存的时间(seconds) | 1 | 5

关闭Debug
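按照上面各表的建议值并关闭调试日志,一个示意性的 ceph.conf 片段大致如下(仅为参考,具体参数名与取值请以所用版本的官方文档和实际硬件为准):

[global]
max open files = 131072
# 关闭各子系统的debug日志以减少开销
debug ms = 0/0
debug mon = 0/0
debug osd = 0/0
debug filestore = 0/0
debug journal = 0/0

[osd]
filestore max sync interval = 15
filestore min sync interval = 10
filestore op threads = 32
osd journal size = 20000
osd op threads = 8
osd disk threads = 4
osd recovery op priority = 4
osd recovery max active = 10
osd max backfills = 4

[client]
rbd cache = true
rbd cache size = 268435456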

3. PG Number

PG和PGP数量一定要根据OSD的数量进行调整,计算公式如下,但是最后算出的结果一定要接近或者等于一个2的指数。

Total PGs = (Total_number_of_OSD * 100) / max_replication_count

例如15个OSD,副本数为3的情况下,根据公式计算的结果应该为500,最接近512,所以需要设定该pool(volumes)的pg_num和pgp_num都为512.

ceph osd pool set volumes pg_num 512
ceph osd pool set volumes pgp_num 512
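若想按上述公式估算pg_num并自动取最接近的2的幂,可以用类似下面的小脚本(Python,仅为示意):

# 根据OSD数量与副本数估算pg_num,并取最接近的2的幂(示意脚本)
def calc_pg_num(osd_count, replica_count):
    raw = osd_count * 100.0 / replica_count
    upper = 1
    while upper < raw:          # 找到不小于raw的最小2的幂
        upper *= 2
    lower = upper // 2          # 不大于raw的最大2的幂
    return upper if (upper - raw) <= (raw - lower) else lower

print(calc_pg_num(15, 3))       # 输出512,与上面的例子一致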

4. CRUSH Map

CRUSH是一个非常灵活的方式,CRUSH MAP的调整取决于部署的具体环境,这个可能需要根据具体情况进行分析,这里面就不再赘述了。

 


Mon删除

从健康的集群中删除

1、  systemctl stop ceph-mon@{mon-id}

2、  ceph mon remove {mon-id}

3、  从ceph.conf中删除

从不健康的集群中删除

1、  ceph mon dump

2、  service ceph stop mon

3、  ceph-mon -i {mon-id} --extract-monmap {mappath}

4、  monmaptool {mappath} --rm {mon-id}

5、  ceph-mon -i {mon-id} --inject-monmap {mappath}
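下面以一个假设的场景把上述步骤串起来:存活的monitor为node1,要摘除的monitor为node2,monmap临时保存在/tmp/monmap(主机名与路径均为示意):

# 在node1上停止monitor进程
service ceph stop mon
# 从存活的monitor node1导出monmap
ceph-mon -i node1 --extract-monmap /tmp/monmap
# 从monmap中删除故障monitor node2
monmaptool /tmp/monmap --rm node2
# 把修改后的monmap注入node1,然后重新启动node1上的monitor
ceph-mon -i node1 --inject-monmap /tmp/monmap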


参考:

删除Ceph OSD节点

Ceph性能优化

 

 

..

CEPH pg/pgs

PG全称是placement groups,它是ceph的逻辑存储单元。在数据存储到ceph时,先打散成一系列对象,再结合基于对象名的哈希操作、复制级别、PG数量,产生目标PG号。根据复制级别的不同,每个PG在不同的OSD上进行复制和分发。可以把PG想象成存储了多个对象的逻辑容器,这个容器映射到多个具体的OSD。PG存在的意义是提高ceph存储系统的性能和扩展性。

如果没有PG,就难以管理和跟踪数以亿计的对象,它们分布在数百个OSD上。对ceph来说,管理PG比直接管理每个对象要简单得多。每个PG需要消耗一定的系统资源包括CPU、内存等。集群的PG数量应该被精确计算得出。通常来说,增加PG的数量可以减少OSD的负载,但是这个增加应该有计划进行。一个推荐配置是每OSD对应50-100个PG。如果数据规模增大,在集群扩容的同时PG数量也需要调整。CRUSH会管理PG的重新分配。

每个pool应该分配多少个PG,与OSD的数量、复制份数、pool数量有关,有个计算公式在:

http://ceph.com/pgcalc/

《learning ceph》这本书里的计算公式也差不多:

Total PGs = ((Total_number_of_OSD * 100) / max_replication_count) / pool_count

计算的结果往上取靠近2的N次方的值。比如总共OSD数量是160,复制份数3,pool数量也是3,那么按上述公式计算出的结果是1777.7。取跟它接近的2的N次方是2048,那么每个pool分配的PG数量就是2048。

在更改pool的PG数量时,需同时更改PGP的数量。PGP是为了管理placement而存在的专门的PG,它和PG的数量应该保持一致。如果你增加pool的pg_num,就需要同时增加pgp_num,保持它们大小一致,这样集群才能正常rebalancing。下面介绍如何修改pg_num和pgp_num。

(1)检查rbd这个pool里已存在的PG和PGP数量:

$ ceph osd pool get rbd pg_num
pg_num: 128
$ ceph osd pool get rbd pgp_num
pgp_num: 128

(2)检查pool的复制size,执行如下命令:

$ ceph osd dump |grep size|grep rbd
pool 2 'rbd' replicated size 3 min_size 2 crush_ruleset 0 object_hash 
rjenkins pg_num 128 pgp_num 128 last_change 45 flags hashpspool stripe_width 0

(3)使用上述公式,根据OSD数量、复制size、pool的数量,计算出新的PG数量,假设是256.

(4)变更rbd的pg_num和pgp_num为256:

$ ceph osd pool set rbd pg_num 256
$ ceph osd pool set rbd pgp_num 256

(5)如果有其他pool,同步调整它们的pg_num和pgp_num,以使负载更加均衡。

##############################################################

Ceph常用命令总结

1. 创建自定义pool

ceph osd pool create <poolname> pg_num [pgp_num]

其中pgp_num为pg_num的有效归置组个数,是一个可选参数。pg_num应该足够大,不要拘泥于官方文档的计算方法,根据实际情况选择256、512、1024、2048、4096。
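例如,创建一个名为testpool、pg_num与pgp_num均为512的pool(pool名与数值仅为示意):

ceph osd pool create testpool 512 512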

2. 设置pool的副本数、最小副本数、最大副本数

ceph osd pool set <poolname> size 2
ceph osd pool set <poolname> min_size 1
ceph osd pool set <poolname> max_size 10

资源所限,如果不希望保存3副本,可以用该命令对特定的pool更改副本存放数。

利用get可以获得特定pool的副本数。

ceph osd pool get <poolname> size

3. 增加osd

可以利用ceph-deploy增加osd:

ceph-deploy osd prepare monosd1:/mnt/ceph osd2:/mnt/ceph
ceph-deploy osd activate monosd1:/mnt/ceph osd2:/mnt/ceph

#相当于:
ceph-deploy osd create monosd1:/mnt/ceph osd2:/mnt/ceph

#还有一种方法,在安装osd时同时指定对应的journal的安装路径
ceph-deploy osd create osd1:/cephmp1:/dev/sdf1 /cephmp2:/dev/sdf2

也可以手动增加:

## Prepare disk first, create partition and format it
<insert parted oneliner>
mkfs.xfs -f /dev/sdd
mkdir /cephmp1
mount /dev/sdd /cephmp1
cd /cephmp1

ceph-osd -i 12 --mkfs --mkkey
ceph auth add osd.12 osd 'allow *' mon 'allow rwx' -i /cephmp1/keyring

#change the crushmap
ceph osd getcrushmap -o map
crushtool -d map -o map.txt
vim map.txt
crushtool -c map.txt -o map
ceph osd setcrushmap -i map

## Start it
/etc/init.d/ceph start osd.12

4. 删除osd

先将此osd停止工作:

## Mark it out
ceph osd out 5

## Wait for data migration to complete (ceph -w), then stop it
service ceph -a stop osd.5

## Now it is marked out and down

再对其进行删除操作:

## If deleting from active stack, be sure to follow the above to mark it out and down
ceph osd crush remove osd.5

## Remove auth for disk
ceph auth del osd.5

## Remove disk
ceph osd rm 5

## Remove from ceph.conf and copy new conf to all hosts

5. 查看osd总体情况、osd的详细信息、crush的详细信息

ceph osd tree
ceph osd dump --format=json-pretty
ceph osd crush dump --format=json-pretty

6. 获得并修改CRUSH maps

## save current crushmap in binary
ceph osd getcrushmap -o crushmap.bin

## Convert to txt
crushtool -d crushmap.bin -o crushmap.txt

## Edit it and re-convert to binary
crushtool -c crushmap.txt -o crushmap.bin.new

## Inject into running system
ceph osd setcrushmap -i crushmap.bin.new

## If you've added a new ruleset and want to use that for a pool, do something like:
# 在ceph.conf中设置默认rule
osd pool default crush rule = 4

#也可以这样设置一个pool的rule
ceph osd pool set testpool crush_ruleset <ruleset_id>

-o=output; -d=decompile; -c=compile; -i=input

记住这些缩写,上面的命令就很容易理解了。

7. 增加/删除journal

为了提高性能,通常将ceph的journal置于单独的磁盘或分区中:

先利用以下命令设置ceph集群为nodown:

  • ceph osd set nodown
# Relevant ceph.conf options
 -- existing setup --
[osd]
    osd data = /srv/ceph/osd$id
    osd journal = /srv/ceph/osd$id/journal
    osd journal size = 512

# stop the OSD:
/etc/init.d/ceph osd.0 stop
/etc/init.d/ceph osd.1 stop
/etc/init.d/ceph osd.2 stop

# Flush the journal:
ceph-osd -i 0 --flush-journal
ceph-osd -i 1 --flush-journal
ceph-osd -i 2 --flush-journal

# Now update ceph.conf - this is very important or you'll just recreate 
journal on the same disk again
 -- change to [filebased journal] --
[osd]
    osd data = /srv/ceph/osd$id
    osd journal = /srv/ceph/journal/osd$id/journal
    osd journal size = 10000

 -- change to [partitionbased journal (journal
 in this case would be on /dev/sda2)] --
[osd]
    osd data = /srv/ceph/osd$id
    osd journal = /dev/sda2
    osd journal size = 0

# Create new journal on each disk
ceph-osd -i 0 --mkjournal
ceph-osd -i 1 --mkjournal
ceph-osd -i 2 --mkjournal

# Done, now start all OSD again
/etc/init.d/ceph osd.0 start
/etc/init.d/ceph osd.1 start
/etc/init.d/ceph osd.2 start

记得将nodown设置回来:

  • ceph osd unset nodown

8. ceph cache pool

经初步测试,ceph的cache pool性能并不好,有时甚至低于无cache pool时的性能。可以利用flashcache等替代方案来优化ceph的cache。

ceph osd tier add satapool ssdpool
ceph osd tier cache-mode ssdpool writeback
ceph osd pool set ssdpool hit_set_type bloom
ceph osd pool set ssdpool hit_set_count 1

## In this example 80-85% of the cache pool is equal to 280GB
ceph osd pool set ssdpool target_max_bytes $((280*1024*1024*1024))
ceph osd tier set-overlay satapool ssdpool
ceph osd pool set ssdpool hit_set_period 300
ceph osd pool set ssdpool cache_min_flush_age 300   # 5 minutes
ceph osd pool set ssdpool cache_min_evict_age 1800   # 30 minutes
ceph osd pool set ssdpool cache_target_dirty_ratio .4
ceph osd pool set ssdpool cache_target_full_ratio .8

9. 查看运行时配置

ceph --admin-daemon /var/run/ceph/ceph-osd.0.asok config show

10. 查看监控集群状态

ceph health
ceph health detail
ceph status
ceph -s

#可以加上--format=json-pretty
ceph osd stat
ceph osd dump
ceph osd tree

ceph mon stat
ceph quorum_status
ceph mon dump

ceph mds stat
ceph mds dump

11. 查看所有的pool

ceph osd lspools
rados lspools

12. 查看kvm和qemu是否支持rbd

qemu-system-x86_64 -drive format=?
qemu-img -h | grep rbd

13, 查看特定的pool及其中的文件

rbd ls testpool
rbd create testpool/test.img -s 1024 --image-format=2
rbd info testpool/test.img
rbd rm testpool/test.img

#统计块数
rados -p testpool ls | grep ^rb.0.11a1 | wc -l
#导入并查看文件
rados mkpool testpool
rados put -p testpool logo.png logo.png
ceph osd map testpool logo.png

rbd import logo.png testpool/logo.png
rbd info testpool/logo.png

14. 挂载/卸载创建的块设备

ceph osd pool create testpool 256 256
rbd create testpool/test.img -s 1024 --image-format=2
rbd map testpool/test.img
rbd showmapped
mkfs.xfs /dev/rbd0
rbd unmap /dev/rbd0

15. 创建快照

#创建
rbd snap create testpool/test.img@test.img-snap1
#查看
rbd snap ls testpool/test.img
#回滚
rbd snap rollback testpool/test.img@test.img-snap1
#删除
rbd snap rm testpool/test.img@test.img-snap1
#清除所有快照
rbd snap purge testpool/test.img

16. 计算合理的pg数

官方建议每OSD50-100个pg。total pgs=osds*100/副本数,例如6osd、2副本的环境,pgs为6*100/2=300

pg数只能增加,无法减少;增加pg_num后必须同时增加pgp_num

17. 对pool的操作

ceph osd pool create testpool 256 256
ceph osd pool delete testpool testpool --yes-i-really-really-mean-it
ceph osd pool rename testpool anothertestpool
ceph osd pool mksnap testpool testpool-snap

18. 重新安装前的格式化

ceph-deploy purge osd0 osd1
ceph-deploy purgedata osd0 osd1
ceph-deploy forgetkeys

ceph-deploy disk zap --fs-type xfs osd0:/dev/sdb1

19. 修改osd journal的存储路径

#noout标志会阻止osd被标记为out(out会使其权重置0并触发数据迁移)
ceph osd set noout
service ceph stop osd.1
ceph-osd -i 1 --flush-journal
mount /dev/sdc /journal
ceph-osd -i 1 --mkjournal /journal
service ceph start osd.1
ceph osd unset noout

20. xfs挂载参数

mkfs.xfs -n size=64k /dev/sdb1

#/etc/fstab挂载参数
rw,noexec,nodev,noatime,nodiratime,nobarrier
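一个示意性的/etc/fstab挂载条目如下(设备与挂载点均为假设):

/dev/sdb1  /var/lib/ceph/osd/ceph-0  xfs  rw,noexec,nodev,noatime,nodiratime,nobarrier  0 0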

21. 认证配置

[global]
auth cluster required = none
auth service required = none
auth client required = none

#0.56之前
auth supported = none

22. pg_num不够用,进行迁移和重命名

ceph osd pool create new-pool pg_num
rados cppool old-pool new-pool
ceph osd pool delete old-pool
ceph osd pool rename new-pool old-pool

#或者直接增加pool的pg_num

23. 推送config文件

ceph-deploy --overwrite-conf config push mon1 mon2 mon3

24. 在线修改config参数

ceph tell osd.* injectargs '--mon_clock_drift_allowed 1'

使用此命令需要区分配置的参数属于mon、mds还是osd。
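例如,同样的参数如果要注入mon进程,可以这样写(参数与取值仅为示意):

ceph tell mon.* injectargs '--mon_clock_drift_allowed 1'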


Ceph三节点(3 mon+9 osd)集群部署

2017-7-14

一、基础环境准备
3台机器,每台机器标配:1G内存、2块网卡、3块20G的SATA裸盘
"$"符号表示三个节点都进行同样配置
$ cat /etc/hosts
10.20.0.101 ceph-node1
10.20.0.102 ceph-node2
10.20.0.103 ceph-node3
$ yum install epel-release

二、安装ceph-deploy
[root@ceph-node1 ~]# ssh-keygen //在node1节点配置到其他节点的SSH免密登录
[root@ceph-node1 ~]# ssh-copy-id ceph-node2
[root@ceph-node1 ~]# ssh-copy-id ceph-node3
[root@ceph-node1 ~]# yum install ceph-deploy -y

三个节点安装: yum install ceph

[root@ceph01 yum.repos.d]# cat ceph.repo
[ceph]
name=ceph
baseurl=http://mirrors.aliyun.com/ceph/rpm-jewel/el7/x86_64/
gpgcheck=0
[ceph-noarch]
name=cephnoarch
baseurl=http://mirrors.aliyun.com/ceph/rpm-jewel/el7/noarch/
gpgcheck=0
[root@ceph01 yum.repos.d]#

[root@ceph-node1 ~]# ceph-deploy new ceph-node1 ceph-node2 ceph-node3 //new命令会创建一个新的集群配置(以这三个节点为初始monitor),并在当前目录生成配置文件和monitor密钥文件

+++++++++++++++++++++++++++++++++++++++
[root@ceph-node1 ~]# ceph -v
ceph version 0.94.10 (b1e0532418e4631af01acbc0cedd426f1905f4af)
[root@ceph-node1 ~]# ceph-deploy mon create-initial //创建第一个monitor
[root@ceph-node1 ~]# ceph status //此时集群处于error状态是正常的
cluster ea54af9f-f286-40b2-933d-9e98e7595f1a
health HEALTH_ERR
[root@ceph-node1 ~]# systemctl start ceph
[root@ceph-node1 ~]# systemctl enable ceph

 

三、创建对象存储设备OSD,并加入到ceph集群
[root@ceph-node1 ~]# ceph-deploy disk list ceph-node1 //列出ceph-node1已有的磁盘,很奇怪没有列出sdb、sdc、sdd,但是确实存在的
//下面的zap命令慎用,会销毁磁盘中已经存在的分区表和数据。ceph-node1是主机名,同样可以是ceph-node2
[root@ceph-node1 ~]# ceph-deploy disk zap ceph-node1:sdb ceph-node1:sdc ceph-node1:sdd
[root@ceph-node1 ~]# ceph-deploy osd create ceph-node1:sdb ceph-node1:sdc ceph-node1:sdd //擦除磁盘原有数据,并创建新的文件系统,默认是XFS,然后将磁盘的第一个分区作为数据分区,第二个分区作为日志分区。加入到OSD中。
[root@ceph-node1 ~]# ceph status //可以看到集群依旧没有处于健康状态。我们需要再添加一些节点到ceph集群中,以便它能够形成分布式的、冗余的对象存储,这样集群状态才为健康。
cluster ea54af9f-f286-40b2-933d-9e98e7595f1a
health HEALTH_WARN
64 pgs stuck inactive
64 pgs stuck unclean
monmap e1: 1 mons at {ceph-node1=10.20.0.101:6789/0}
election epoch 2, quorum 0 ceph-node1
osdmap e6: 3 osds: 0 up, 0 in
pgmap v7: 64 pgs, 1 pools, 0 bytes data, 0 objects
0 kB used, 0 kB / 0 kB avail
64 creating

 

四、纵向扩展多节点Ceph集群,添加Monitor和OSD
注意:Ceph存储集群最少需要一个Monitor处于运行状态,要提供可用性的话,则需要奇数个monitor,比如3个或5个,以形成仲裁(quorum)。
(1)在Ceph-node2和Ceph-node3部署monitor,但是是在ceph-node1执行命令!
[root@ceph-node1 ~]# ceph-deploy mon add ceph-node2
[root@ceph-node1 ~]# ceph-deploy mon add ceph-node3
++++++++++++++++++++++++++
报错:[root@ceph-node1 ~]# ceph-deploy mon create ceph-node2
[ceph-node3][WARNIN] Executing /sbin/chkconfig ceph on
[ceph-node3][ERROR ] admin_socket: exception getting command descriptions: [Errno 2] No such file or directory
[ceph-node3][WARNIN] monitor: mon.ceph-node3, might not be running yet
[ceph-node3][INFO ] Running command: ceph --cluster=ceph --admin-daemon /var/run/ceph/ceph-mon.ceph-node3.asok mon_status
[ceph-node2][WARNIN] neither `public_addr` nor `public_network` keys are defined for monitors
解决:①从CentOS 7上仍在执行chkconfig的警告来看,怀疑节点1并没有成功远程启动节点2的ceph服务,如果在node2上手动启动的话,应该就可以了
[root@ceph-node2 ~]# systemctl status ceph
● ceph.service - LSB: Start Ceph distributed file system daemons at boot time
Loaded: loaded (/etc/rc.d/init.d/ceph)
Active: inactive (dead)
结果enable后还是失败了
②沃日,原来是书上写错了,在已经添加了监控节点后,后续添加监控节点应该是mon add,真的是醉了!

++++++++++++++++++++++++++++++++
[root@ceph-node1 ~]# ceph status
monmap e3: 3 mons at {ceph-node1=10.20.0.101:6789/0,ceph-node2=10.20.0.102:6789/0,ceph-node3=10.20.0.103:6789/0}
election epoch 8, quorum 0,1,2 ceph-node1,ceph-node2,ceph-node3

(2)添加更多的OSD节点,依然在ceph-node1执行命令即可。
[root@ceph-node1 ~]# ceph-deploy disk list ceph-node2 ceph-node3
//确保磁盘号不要出错,否则的话,容易把系统盘都给格式化了!
[root@ceph-node1 ~]# ceph-deploy disk zap ceph-node2:sdb ceph-node2:sdc ceph-node2:sdd
[root@ceph-node1 ~]# ceph-deploy disk zap ceph-node3:sdb ceph-node3:sdc ceph-node3:sdd
//经过实践,下面的这条命令,osd create最好分两步,即prepare和activate,至于为什么不清楚。
[root@ceph-node1 ~]# ceph-deploy osd create ceph-node2:sdb ceph-node2:sdc ceph-node2:sdd
[root@ceph-node1 ~]# ceph-deploy osd create ceph-node3:sdb ceph-node3:sdc ceph-node3:sdd

++++++++++++++++++++++++++++++++++++++++++++
报错:
[ceph-node3][WARNIN] ceph-disk: Error: Command '['/usr/sbin/sgdisk', '--new=2:0:5120M', '--change-name=2:ceph journal', '--partition-guid=2:fa28bc46-55de-464a-8151-9c2b51f9c00d', '--typecode=2:45b0969e-9b03-4f30-b4c6-b4b80ceff106', '--mbrtogpt', '--', '/dev/sdd']' returned non-zero exit status 4
[ceph-node3][ERROR ] RuntimeError: command returned non-zero exit status: 1
[ceph_deploy.osd][ERROR ] Failed to execute command: ceph-disk -v prepare --fs-type xfs --cluster ceph -- /dev/sdd
[ceph_deploy][ERROR ] GenericError: Failed to create 3 OSDs
未解决:原来敲入osd create命令不小心把node2写成node3了,哎我尼玛,后面越来越难办了
+++++++++++++++++++++++++++++

[root@ceph-node1 ~]# ceph osd tree
ID WEIGHT TYPE NAME UP/DOWN REWEIGHT PRIMARY-AFFINITY
-1 0.08995 root default
-2 0.02998 host ceph-node1
0 0.00999 osd.0 up 1.00000 1.00000
1 0.00999 osd.1 up 1.00000 1.00000
2 0.00999 osd.2 up 1.00000 1.00000
-3 0.02998 host ceph-node2
3 0.00999 osd.3 down 0 1.00000
4 0.00999 osd.4 down 0 1.00000
8 0.00999 osd.8 down 0 1.00000
-4 0.02998 host ceph-node3
5 0.00999 osd.5 down 0 1.00000
6 0.00999 osd.6 down 0 1.00000
7 0.00999 osd.7 down 0 1.00000

有6个OSD都处于down状态,ceph-deploy osd activate 依然失败,根据之前的报告,osd create的时候就是失败的。
未解决:由于刚部署ceph集群,还没有数据,可以把OSD给清空。
参考文档:http://www.cnblogs.com/zhangzhengyan/p/5839897.html
[root@ceph-node1 ~]# ceph-deploy disk zap ceph-node2:sdb ceph-node2:sdc ceph-node2:sdd
[root@ceph-node1 ~]# ceph-deploy disk zap ceph-node3:sdb ceph-node3:sdc ceph-node3:sdd

(1)从ceph osd tree移走crush map的osd.4,还有其他osd号
[root@ceph-node1 ~]# ceph osd crush remove osd.3
[root@ceph-node1 ~]# ceph osd crush remove osd.4
[root@ceph-node1 ~]# ceph osd crush remove osd.8
[root@ceph-node1 ~]# ceph osd crush remove osd.5
[root@ceph-node1 ~]# ceph osd crush remove osd.6
[root@ceph-node1 ~]# ceph osd crush remove osd.7

(2)[root@ceph-node1 ~]# ceph osd rm 3
[root@ceph-node1 ~]# ceph osd rm 4
[root@ceph-node1 ~]# ceph osd rm 5
[root@ceph-node1 ~]# ceph osd rm 6
[root@ceph-node1 ~]# ceph osd rm 7
[root@ceph-node1 ~]# ceph osd rm 8

[root@ceph-node1 ~]# ceph osd tree //终于清理干净了
ID WEIGHT TYPE NAME UP/DOWN REWEIGHT PRIMARY-AFFINITY
-1 0.02998 root default
-2 0.02998 host ceph-node1
0 0.00999 osd.0 up 1.00000 1.00000
1 0.00999 osd.1 up 1.00000 1.00000
2 0.00999 osd.2 up 1.00000 1.00000
-3 0 host ceph-node2
-4 0 host ceph-node3
可以登录到node2和node3确认,sdb/sdc/sdd上的数据都被清掉了,只剩下GPT分区表,现在重新执行ceph-deploy osd create
我尼玛还是报错呀,[ceph_deploy.osd][ERROR ] Failed to execute command: ceph-disk -v prepare --fs-type xfs --cluster ceph -- /dev/sdd

++++++++++++++++++++++++++++++++++
报错:(1)node1远程激活node2的osd出错。prepare和activate能够取代osd create的步骤
[root@ceph-node1 ~]# ceph-deploy osd prepare ceph-node2:sdb ceph-node2:sdc ceph-node2:sdd
[root@ceph-node1 ~]# ceph-deploy osd activate ceph-node2:sdb ceph-node2:sdc ceph-node2:sdd

[ceph-node2][WARNIN] ceph-disk: Cannot discover filesystem type: device /dev/sdb: Line is truncated:
[ceph-node2][ERROR ] RuntimeError: command returned non-zero exit status: 1
[ceph_deploy][ERROR ] RuntimeError: Failed to execute command: ceph-disk -v activate --mark-init sysvinit --mount /dev/sdb
解决:格式分区权限问题,在报错的节点执行ceph-disk activate-all即可。

(2)明明是node2的sdb盘,但是ceph osd tree却发现执行的是node3的sdb盘,当然会报错了
Starting Ceph osd.4 on ceph-node2...
Running as unit ceph-osd.4.1500013086.674713414.service.
Error EINVAL: entity osd.3 exists but key does not match
[root@ceph-node1 ~]# ceph osd tree
3 0 osd.3 down 0 1.00000

解决:[root@ceph-node1 ~]# ceph auth del osd.3
[root@ceph-node1 ~]# ceph osd rm 3
在node2上lsblk发现sdb不正常,没有挂载osd,那么于是
[root@ceph-node1 ~]# ceph-deploy disk zap ceph-node2:sdb
[root@ceph-node1 ~]# ceph-deploy osd prepare ceph-node2:sdb
[root@ceph-node1 ~]# ceph osd tree //至少osd跑到node2上面,而不是node3,还好还好。
-3 0.02998 host ceph-node2
3 0.00999 osd.3 down 0 1.00000
[root@ceph-node1 ~]# ceph-deploy osd activate ceph-node2:sdb //肯定失败,按照上面的经验,必须在Node2上单独激活
[root@ceph-node2 ~]# ceph-disk activate-all

[root@ceph-node1 ~]# ceph osd tree
ID WEIGHT TYPE NAME UP/DOWN REWEIGHT PRIMARY-AFFINITY
-1 0.08995 root default
-2 0.02998 host ceph-node1
0 0.00999 osd.0 up 1.00000 1.00000
1 0.00999 osd.1 up 1.00000 1.00000
2 0.00999 osd.2 up 1.00000 1.00000
-3 0.02998 host ceph-node2
4 0.00999 osd.4 up 1.00000 1.00000
5 0.00999 osd.5 up 1.00000 1.00000
3 0.00999 osd.3 up 1.00000 1.00000
-4 0.02998 host ceph-node3
6 0.00999 osd.6 up 1.00000 1.00000
7 0.00999 osd.7 up 1.00000 1.00000
8 0.00999 osd.8 up 1.00000 1.00000
哎,卧槽,终于解决了,之前只是一个小小的盘符写错了,就害得我搞这么久啊,细心点!

=========================================
拓展:
[root@ceph-node1 ~]# lsblk // OSD up的分区都挂载到/var/lib/ceph/osd目录下
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sda 8:0 0 40G 0 disk
├─sda1 8:1 0 500M 0 part /boot
└─sda2 8:2 0 39.5G 0 part
├─centos-root 253:0 0 38.5G 0 lvm /
└─centos-swap 253:1 0 1G 0 lvm [SWAP]
sdb 8:16 0 20G 0 disk
├─sdb1 8:17 0 15G 0 part /var/lib/ceph/osd/ceph-0
└─sdb2 8:18 0 5G 0 part
sdc 8:32 0 20G 0 disk
├─sdc1 8:33 0 15G 0 part /var/lib/ceph/osd/ceph-1
└─sdc2 8:34 0 5G 0 part
sdd 8:48 0 20G 0 disk
├─sdd1 8:49 0 15G 0 part /var/lib/ceph/osd/ceph-2
└─sdd2 8:50 0 5G 0 part
sr0 11:0 1 1024M 0 rom