Saturday, September 16, 2023

nodetool commands

ubuntu@ds201-node1:~$ /home/ubuntu/node1/resources/cassandra/bin/nodetool status aug_space;

Datacenter: Cassandra

=====================

Status=Up/Down

|/ State=Normal/Leaving/Joining/Moving

-- Address Load Tokens Owns (effective) Host ID Rack

UN 127.0.0.1 380.53 KiB 1 77.1% 39e7bbfa-2250-41a2-975b-f6a32e8677d0 rack1

UN 127.0.0.2 365.82 KiB 1 89.2% 606d2798-fa58-4656-bcba-5ef8a9c5012f rack1

UN 127.0.0.3 335.28 KiB 1 89.2% ab14e681-7a13-4027-8b4c-0f40a2428a85 rack1

UN 127.0.0.4 323.7 KiB 1 44.6% a411cdb0-f8a1-4f4b-ac44-c1b4ff703124 rack1

since replication factor in aug_space is 3 hence total sum of all data % is 300 %

ubuntu@ds201-node1:~$ /home/ubuntu/node1/resources/cassandra/bin/nodetool info

ID : 39e7bbfa-2250-41a2-975b-f6a32e8677d0

Gossip active : true

Native Transport active: true

Load : 392.57 KiB

Generation No : 1694106957

Uptime (seconds) : 145381

Heap Memory (MB) : 374.69 / 512.00

Off Heap Memory (MB) : 0.00

Data Center : Cassandra

Rack : rack1

Exceptions : 0

Key Cache : entries 0, size 0 bytes, capacity 25 MiB, 0 hits, 0 requests, NaN recent hit rate, 14400 save period in seconds

Row Cache : entries 0, size 0 bytes, capacity 0 bytes, 0 hits, 0 requests, NaN recent hit rate, 0 save period in seconds

Counter Cache : entries 52, size 6.65 KiB, capacity 12 MiB, 0 hits, 0 requests, NaN recent hit rate, 7200 save period in seconds

Chunk Cache : entries 702, size 22.2 MiB, capacity 817 MiB, 702 misses, 18130 requests, 0.961 recent hit rate, 2358.204 microseconds miss latency

Percent Repaired : 0.0%

Token : 0

ubuntu@ds201-node1:~$ /honme/ubuntu/node2/resources/cassandra/bin/nodetool info

bash: /honme/ubuntu/node2/resources/cassandra/bin/nodetool: No such file or directory

ubuntu@ds201-node1:~$ /home/ubuntu/node2/resources/cassandra/bin/nodetool info

ID : 606d2798-fa58-4656-bcba-5ef8a9c5012f

Gossip active : true

Native Transport active: true

Load : 359.02 KiB

Generation No : 1694107334

Uptime (seconds) : 145059

Heap Memory (MB) : 378.41 / 512.00

Off Heap Memory (MB) : 0.00

Data Center : Cassandra

Rack : rack1

Exceptions : 0

Key Cache : entries 0, size 0 bytes, capacity 25 MiB, 0 hits, 0 requests, NaN recent hit rate, 14400 save period in seconds

Row Cache : entries 0, size 0 bytes, capacity 0 bytes, 0 hits, 0 requests, NaN recent hit rate, 0 save period in seconds

Counter Cache : entries 28, size 3.59 KiB, capacity 12 MiB, 0 hits, 0 requests, NaN recent hit rate, 7200 save period in seconds

Chunk Cache : entries 736, size 23.56 MiB, capacity 817 MiB, 736 misses, 15944 requests, 0.954 recent hit rate, 7143.424 microseconds miss latency

Percent Repaired : 0.0%

Token : 4223372036854775807

Friday, September 15, 2023

clustering key restrictions

cqlsh:aug_space> select * from movies_by_ratings;

age_rating | movie_id | genres | release_date | title

------------+--------------------------------------+-----------------------+---------------------------------+-----------------

5 | 41d8ffdf-fb1e-43d0-85ae-9c4f358f35c2 | null | 2023-09-09 18:27:16.646000+0000 | null

5 | 93762f96-005c-4bd9-8994-1a6454a91334 | null | 2023-09-09 18:27:16.646000+0000 | null

8 | 236d3970-97f1-4f24-9bb1-d657cc698dac | {'animation', 'kids'} | 2023-09-09 00:00:00.000000+0000 | Jurassic Park 2

8 | 6ae5a279-36e0-49db-b39d-f9b582f082a3 | {'animation', 'kids'} | 2023-09-09 00:00:00.000000+0000 | Avatar 2

7 | d9573b24-02cb-49b4-91ba-c040c5a0127a | {'animation', 'kids'} | 2023-09-09 00:00:00.000000+0000 | Avatar

9 | 2e70b67b-e6f7-4cf3-9493-b8e21641094d | {'animation', 'kids'} | 2023-09-09 00:00:00.000000+0000 | Sturat Little 3

9 | 41d8ffdf-fb1e-43d0-85ae-9c4f358f35c2 | {'animation', 'kids'} | 2023-09-09 00:00:00.000000+0000 | Sturat Little 2

9 | 7b085f61-4202-4bd8-b84c-f2a46baf0853 | {'animation', 'kids'} | 2023-09-09 00:00:00.000000+0000 | Sturat Little

9 | 93762f96-005c-4bd9-8994-1a6454a91334 | {'animation', 'kids'} | 2023-09-09 00:00:00.000000+0000 | Jurassic Park

(9 rows)

cqlsh:aug_space> select * from movies_by_ratings order by movie_id desc;

InvalidRequest: Error from server: code=2200 [Invalid query] message="ORDER BY is only supported when the partition key is restricted by an EQ or an IN."

cqlsh:aug_space> select * from movies_by_ratings where age_rating=9 order by movie_id asc;

age_rating | movie_id | genres | release_date | title

------------+--------------------------------------+-----------------------+---------------------------------+-----------------

9 | 2e70b67b-e6f7-4cf3-9493-b8e21641094d | {'animation', 'kids'} | 2023-09-09 00:00:00.000000+0000 | Sturat Little 3

9 | 41d8ffdf-fb1e-43d0-85ae-9c4f358f35c2 | {'animation', 'kids'} | 2023-09-09 00:00:00.000000+0000 | Sturat Little 2

9 | 7b085f61-4202-4bd8-b84c-f2a46baf0853 | {'animation', 'kids'} | 2023-09-09 00:00:00.000000+0000 | Sturat Little

9 | 93762f96-005c-4bd9-8994-1a6454a91334 | {'animation', 'kids'} | 2023-09-09 00:00:00.000000+0000 | Jurassic Park

(4 rows)

cqlsh:aug_space> select * from movies_by_ratings where age_rating=9 order by title desc;

InvalidRequest: Error from server: code=2200 [Invalid query] message="Order by is currently only supported on the clustered columns of the PRIMARY KEY, got title"

Sunday, September 10, 2023

Gossip is a peer-to-peer communication protocol in which nodes periodically exchange state information about themselves and about other nodes they know about. The gossip process runs every second and exchanges state messages with up to three other nodes in the cluster. The nodes exchange information about themselves and about the other nodes that they have gossiped about, so all nodes quickly learn about all other nodes in the cluster. A gossip message has a version associated with it, so that during a gossip exchange, older information is overwritten with the most current state for a particular node.

Here is how the gossiper works:

1. Once per second, the gossiper will choose a random node in the cluster and initi‐ alize a gossip session with it. Each round of gossip requires three messages.

2. The gossip initiator sends its chosen friend a GossipDigestSyn message.

3. When the friend receives this message, it returns a GossipDigestAck message.

4. When the initiator receives the ack message from the friend, it sends the friend a GossipDigestAck2 message to complete the round of gossip.

more -> https://docs.datastax.com/en/cassandra-oss/3.x/cassandra/architecture/archGossipAbout.html

ubuntu@ds201-node1:~$ /home/ubuntu/node1/resources/cassandra/bin/nodetool describecluster

Cluster Information:

Name: Test Cluster

Snitch: com.datastax.bdp.snitch.DseDelegateSnitch

DynamicEndPointSnitch: enabled

Partitioner: org.apache.cassandra.dht.Murmur3Partitioner

Schema versions:

00abff53-2837-363a-abd8-b75594c5631e: [127.0.0.1, 127.0.0.2, 127.0.0.3, 127.0.0.4]

ubuntu@ds201-node1:~$ /home/ubuntu/node1/resources/cassandra/bin/nodetool status aug_space;

Datacenter: Cassandra

=====================

Status=Up/Down

|/ State=Normal/Leaving/Joining/Moving

-- Address Load Tokens Owns (effective) Host ID Rack

UN 127.0.0.1 380.53 KiB 1 77.1% 39e7bbfa-2250-41a2-975b-f6a32e8677d0 rack1

UN 127.0.0.2 365.82 KiB 1 89.2% 606d2798-fa58-4656-bcba-5ef8a9c5012f rack1

UN 127.0.0.3 335.28 KiB 1 89.2% ab14e681-7a13-4027-8b4c-0f40a2428a85 rack1

UN 127.0.0.4 323.7 KiB 1 44.6% a411cdb0-f8a1-4f4b-ac44-c1b4ff703124 rack1

ubuntu@ds201-node1:~$ /home/ubuntu/node1/resources/cassandra/bin/nodetool gossipinfo;

/127.0.0.1

generation:1694106957

heartbeat:120749

STATUS:25:NORMAL,0

LOAD:120733:389660.0

SCHEMA:117734:00abff53-2837-363a-abd8-b75594c5631e

DC:41:Cassandra

RACK:18:rack1

RELEASE_VERSION:4:4.0.0.2284

NATIVE_TRANSPORT_ADDRESS:3:127.0.0.1

X_11_PADDING:92056:{"dse_version":"6.0.0","workloads":"Cassandra","workload":"Cassandra","active":"true","server_id":"08-00-27-32-1E-DD","graph":false,"health":0.9}

NET_VERSION:1:256

HOST_ID:2:39e7bbfa-2250-41a2-975b-f6a32e8677d0

NATIVE_TRANSPORT_READY:54:true

NATIVE_TRANSPORT_PORT:6:9041

NATIVE_TRANSPORT_PORT_SSL:7:9041

STORAGE_PORT:8:7000

STORAGE_PORT_SSL:9:7001

JMX_PORT:10:7199

TOKENS:24:<hidden>

/127.0.0.2

generation:1694107334

heartbeat:120358

STATUS:51:NORMAL,4223372036854775807

LOAD:120300:374600.0

SCHEMA:117343:00abff53-2837-363a-abd8-b75594c5631e

DC:65:Cassandra

RACK:18:rack1

RELEASE_VERSION:4:4.0.0.2284

NATIVE_TRANSPORT_ADDRESS:3:127.0

Saturday, September 9, 2023

Secondary index in cassandra

cqlsh:aug_space> select * from movies;

movie_id | age_rating | genres | release_date | title

--------------------------------------+------------+-----------------------+---------------------------------+-----------------

236d3970-97f1-4f24-9bb1-d657cc698dac | 8 | {'animation', 'kids'} | 2023-09-09 00:00:00.000000+0000 | Jurassic Park 2

41d8ffdf-fb1e-43d0-85ae-9c4f358f35c2 | 9 | {'animation', 'kids'} | 2023-09-09 00:00:00.000000+0000 | Sturat Little 2

6ae5a279-36e0-49db-b39d-f9b582f082a3 | 8 | {'animation', 'kids'} | 2023-09-09 00:00:00.000000+0000 | Avatar 2

d9573b24-02cb-49b4-91ba-c040c5a0127a | 7 | {'animation', 'kids'} | 2023-09-09 00:00:00.000000+0000 | Avatar

93762f96-005c-4bd9-8994-1a6454a91334 | 9 | {'animation', 'kids'} | 2023-09-09 00:00:00.000000+0000 | Jurassic Park

7b085f61-4202-4bd8-b84c-f2a46baf0853 | 9 | {'animation', 'kids'} | 2023-09-09 00:00:00.000000+0000 | Sturat Little

cqlsh:aug_space> select * from movies where title='avatar';

InvalidRequest: Error from server: code=2200 [Invalid query] message="Cannot execute this query as it might involve data filtering and thus may have unpredictable performance. If you want to execute this query despite the performance unpredictability, use ALLOW FILTERING"

cqlsh:aug_space> select token(movie_id) , movie_id from movies;

system.token(movie_id) | movie_id

------------------------+--------------------------------------

-7679664659544709434 | 236d3970-97f1-4f24-9bb1-d657cc698dac

-5854020856241272382 | 41d8ffdf-fb1e-43d0-85ae-9c4f358f35c2

-5353160429242411980 | 6ae5a279-36e0-49db-b39d-f9b582f082a3

-5014329844387628670 | d9573b24-02cb-49b4-91ba-c040c5a0127a

3478065203213068432 | 93762f96-005c-4bd9-8994-1a6454a91334

6637426536803809222 | 7b085f61-4202-4bd8-b84c-f2a46baf0853

(6 rows)

cqlsh:aug_space> CREATE INDEX IF NOT EXISTS ON movies (title);

cqlsh:aug_space> select * from movies where title='Avatar';

movie_id | age_rating | genres | release_date | title

--------------------------------------+------------+-----------------------+---------------------------------+--------

d9573b24-02cb-49b4-91ba-c040c5a0127a | 7 | {'animation', 'kids'} | 2023-09-09 00:00:00.000000+0000 | Avatar

cqlsh:aug_space> select * from movies_by_ratings where age_rating=9;

age_rating | movie_id | genres | release_date | title

------------+--------------------------------------+-----------------------+---------------------------------+-----------------

9 | 41d8ffdf-fb1e-43d0-85ae-9c4f358f35c2 | {'animation', 'kids'} | 2023-09-09 00:00:00.000000+0000 | Sturat Little 2

9 | 7b085f61-4202-4bd8-b84c-f2a46baf0853 | {'animation', 'kids'} | 2023-09-09 00:00:00.000000+0000 | Sturat Little

9 | 93762f96-005c-4bd9-8994-1a6454a91334 | {'animation', 'kids'} | 2023-09-09 00:00:00.000000+0000 | Jurassic Park

(3 rows)

cqlsh:aug_space> desc table movies;

CREATE TABLE aug_space.movies (

movie_id uuid PRIMARY KEY,

age_rating int,

genres set<text>,

release_date timestamp,

title text

) WITH bloom_filter_fp_chance = 0.01

AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}

AND comment = ''

AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}

AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}

AND crc_check_chance = 1.0

AND dclocal_read_repair_chance = 0.1

AND default_time_to_live = 0

AND gc_grace_seconds = 864000

AND max_index_interval = 2048

AND memtable_flush_period_in_ms = 0

AND min_index_interval = 128

AND read_repair_chance = 0.0

AND speculative_retry = '99PERCENTILE';

CREATE INDEX movies_title_idx ON aug_space.movies (title);

CREATE INDEX movies_age_rating_idx ON aug_space.movies (age_rating);

for more information -->

SASI & regular index ---

https://www.baeldung.com/cassandra-secondary-indexes

Use Where Clause in Cassandra - CQL

How to use Where Clause in Cassandra (CQL)

Where clause in Cassandra Query Language is used to filter the records. To filter the records, you will be using

Partition Key ( Single or composite)
A cluster column with Partition Key
A column on which you have created the index
A column which is not part of Partition Key or no index is created on it, you can use in where clause but you have to use with Allow Filtering.

Friday, September 8, 2023

Consistency level for LWT in Cassandra

How to set consistency level and serial consistency.

Cassandra’s lightweight transactions are limited to a single partition. Internally, Cassandra stores a Paxos state for each partition. This ensures that transactions on different partitions cannot interfere with each other.

cqlsh:aug_space> consistency ONE;

Consistency level set to ONE.

cqlsh:aug_space> SERIAL CONSISTENCY;

Current serial consistency level is SERIAL.

cqlsh:aug_space> Insert into bank_emp_record(Name, bank_name, Id) values('Dinesh', 'Axis Bank', 201) IF NOT EXISTS;

[applied] | name | id | bank_name

-----------+--------+-----+-----------

False | Dinesh | 201 | Axis Bank

cqlsh:aug_space> INSERT INTO users (id, addresses, emails, ip_numbers) VALUES (UUID(), {'home':'192, 10th cross, wilson'} ,['shweta001@yahoo.com'],{'10.10.11.1', '10.10.10.1', '10.10.12.2'}) IF NOT EXISTS;

[applied]

-----------

True

cqlsh:aug_space> select * from users;

id | addresses | emails | ip_numbers

--------------------------------------+---------------------------------------+-------------------------+--------------------------------------------

44b37d64-ba2b-435f-99a7-c5a4bc81a10c | {'home': ('192, 10th cross, wilson')} | ['shweta001@yahoo.com'] | {'10.10.10.1', '10.10.11.1', '10.10.12.2'}

dd463353-a538-430d-810f-7df19a3cc148 | {'home': ('192, 10th cross, wilson')} | ['shweta001@yahoo.com'] | {'10.10.10.1', '10.10.11.1', '10.10.12.2'}

1c914387-7e4f-4e69-9aad-72d9b854d31d | {'home': ('192, 10th cross, wilson')} | ['shweta001@yahoo.com'] | {'10.10.10.1', '10.10.11.1', '10.10.12.2'}

6ab09bec-e68e-48d9-a5f8-97e6fb4c9b47 | null | null | {'10.10.14.1', '10.10.24.1'}

732d1cb7-53cf-4b0a-892e-3edfcc2c1d51 | {'home': ('192, 10th cross, wilson')} | ['shweta001@yahoo.com'] | {'10.10.10.1', '10.10.11.1', '10.10.12.2'}

(5 rows)

cqlsh:aug_space> INSERT INTO users (id, addresses, emails, ip_numbers) VALUES (1234567, {'home':'192, 10th cross, wilson'} ,['shweta001@yahoo.com'],{'10.10.11.1', '10.10.10.1', '10.10.12.2'}) IF NOT EXISTS;

InvalidRequest: Error from server: code=2200 [Invalid query] message="Invalid INTEGER constant (1234567) for "id" of type uuid"

cqlsh:aug_space> INSERT INTO users (id, addresses, emails, ip_numbers) VALUES (732d1cb7-53cf-4b0a-892e-3edfcc2c1d51, {'home':'192, 10th cross, wilson'} ,['shweta001@yahoo.com'],{'10.10.11.1', '10.10.10.1', '10.10.12.2'}) IF NOT EXISTS;

[applied] | id | addresses | emails | ip_numbers

-----------+--------------------------------------+---------------------------------------+-------------------------+--------------------------------------------

False | 732d1cb7-53cf-4b0a-892e-3edfcc2c1d51 | {'home': ('192, 10th cross, wilson')} | ['shweta001@yahoo.com'] | {'10.10.10.1', '10.10.11.1', '10.10.12.2'}

cqlsh:aug_space>

cqlsh:aug_space> select * from bank_emp_record;

name | id | bank_name

--------+-----+--------------------------

David | 410 | ICICI Bank

Dinesh | 201 | Axis Bank

Dinesh | 202 | Axis Bank

Dinesh | 203 | Axis Bank

Dinesh | 204 | Axis Bank

Ashish | 101 | Employee bank change new

Ashish | 102 | Employee bank change new

Ashish | 103 | Employee bank change new

Ashish | 104 | Employee bank change new

Ashish | 105 | Employee bank change new

(10 rows)

cqlsh:aug_space> update bank_emp_record set bank_name ='HDFC Bank' where name='David' IF EXISTS;

[applied]

-----------

True

cqlsh:aug_space> update bank_emp_record set bank_name ='ICICI Bank' where name='David' ;

cqlsh:aug_space> select * from bank_emp_record;

name | id | bank_name

--------+-----+--------------------------

David | 410 | ICICI Bank

Dinesh | 201 | Axis Bank

Dinesh | 202 | Axis Bank

Dinesh | 203 | Axis Bank

Dinesh | 204 | Axis Bank

Ashish | 101 | Employee bank change new

Ashish | 102 | Employee bank change new

Ashish | 103 | Employee bank change new

Ashish | 104 | Employee bank change new

Ashish | 105 | Employee bank change new

(10 rows)

cqlsh:aug_space> update bank_emp_record set bank_name ='ICICI Bank' where name='David' IF bank_name ='HDFC Bank' ;

[applied] | bank_name

-----------+------------

False | ICICI Bank

Tuesday, August 22, 2023

Quick useful Cassandra links

changing Vnode impact on Cluster --> https://thelastpickle.com/blog/2021/01/29/impacts-of-changing-the-number-of-vnodes.html

ER diagram shapes --> https://www.geeksforgeeks.org/introduction-of-er-model/

https://www.edureka.co/blog/interview-questions/cassandra-interview-questions/

https://www.baeldung.com/cassandra-consistency-levels

https://www.instaclustr.com/support/documentation/cassandra/using-cassandra/managing-tombstones-in-cassandra/#section-when-do-tombstones-cause-problems

https://www.instaclustr.com/blog/apache-cassandra-compaction/

Data modelling in cassandra

https://www.datastax.com/learn/data-modeling-by-example

I am getting difficulty in understanding vnode cluster concept . Then i watched https://www.youtube.com/watch?v=xhclDLmhdrc&list=PLtZafdfRI4M5blB0jpYtlg70l-W4xVQnO&index=10&t=903s video twice . Now only the concept of vNode is bit clear.

Query First approach

Cassandra gets a lot of its speed in database reads and writes from the fact that it never has to perform any joins on the database in fact in Cassandra it’s impossible to perform joins, so instead of taking a relational model approach we take a query first approach, this means we design our tables for a specific query, as a result, we want to have every table which is catered for a specific query, rather than flexible tables such as this employee table and the company car table, hence, we only ever have to query one table when we’re reading or writing data, obviously there are some consequences to this approach, as we might end up writing the same data to multiple tables just to satisfy different queries but it’s ideal for a Cassandra’s distributed architecture.

Monday, August 7, 2023

Cassandra is key value or columnar

Cassandra is not a columnar database.
A columnar/column-store/column-oriented database, as you said, guarantees data locality for a single column, within a given node, on disk. This is a column that spans many or all rows depending on if, or how, you specify partitions and what the database supports.

Cassandra is a column-family* store. A column-family store ensures data locality at the partition level, not the column level. In a database like Cassandra a partition is a group of rows and columns split up by a specified partition key, then clustered together by specified clustering column(s) (optional). To query Cassandra, you must know, at a minimum, the partition key in order to avoid full scans of your data.
All data for a given partition in Cassandra is guaranteed to be on the same node and in a given file (SSTable) in the same location within that file. The one thing to note here is that depending on your compaction strategy, the partition can be split across multiple files on disk, so data locality on disk is not a guarantee.

Column stores lend themselves to, and are designed for, analytic workloads. Because each column is in the same location on disk, they can read all information for a given column across many/all rows incredibly fast. This comes at the cost of very slow writes which usually need to be done in batch loads to avoid drastic performance implications.

Column-family stores, like Cassandra, are a great choice if you have high throughput writes and want to be able to linearly scale horizontally. Reads that use the partition key are incredibly fast as the partition key defines where the data resides. The downfall here is that if you need to do any sort of ad-hoc query, a full scan of all data is required.
* The term "column-family" comes from the original storage engine that was a key/value store, where the value was a "family" of column/value tuples. There was no hard limit on the number of columns that each key could have. In Cassandra this was later abstracted into "partitions", then eventually the storage engine was modified to match the abstraction.

Courtsy --> https://www.quora.com/How-is-Cassandra-a-columnar-database

THEVOID

Saturday, September 16, 2023

nodetool commands

Friday, September 15, 2023

clustering key restrictions

Sunday, September 10, 2023

Gossip Protocol

Saturday, September 9, 2023

Secondary index in cassandra

Use Where Clause in Cassandra - CQL

How to use Where Clause in Cassandra (CQL)

Friday, September 8, 2023

Consistency level for LWT in Cassandra

Tuesday, August 22, 2023

Quick useful Cassandra links

Monday, August 7, 2023

Cassandra is key value or columnar