Saturday, September 16, 2023

nodetool commands

 


ubuntu@ds201-node1:~$ /home/ubuntu/node1/resources/cassandra/bin/nodetool status aug_space;

Datacenter: Cassandra

=====================

Status=Up/Down

|/ State=Normal/Leaving/Joining/Moving

--  Address    Load       Tokens       Owns (effective)  Host ID                               Rack

UN  127.0.0.1  380.53 KiB  1            77.1%             39e7bbfa-2250-41a2-975b-f6a32e8677d0  rack1

UN  127.0.0.2  365.82 KiB  1            89.2%             606d2798-fa58-4656-bcba-5ef8a9c5012f  rack1

UN  127.0.0.3  335.28 KiB  1            89.2%             ab14e681-7a13-4027-8b4c-0f40a2428a85  rack1

UN  127.0.0.4  323.7 KiB  1            44.6%             a411cdb0-f8a1-4f4b-ac44-c1b4ff703124  rack1


since replication factor in  aug_space  is 3 hence total sum of all data % is 300 % 


ubuntu@ds201-node1:~$ /home/ubuntu/node1/resources/cassandra/bin/nodetool info

ID                     : 39e7bbfa-2250-41a2-975b-f6a32e8677d0

Gossip active          : true

Native Transport active: true

Load                   : 392.57 KiB

Generation No          : 1694106957

Uptime (seconds)       : 145381

Heap Memory (MB)       : 374.69 / 512.00

Off Heap Memory (MB)   : 0.00

Data Center            : Cassandra

Rack                   : rack1

Exceptions             : 0

Key Cache              : entries 0, size 0 bytes, capacity 25 MiB, 0 hits, 0 requests, NaN recent hit rate, 14400 save period in seconds

Row Cache              : entries 0, size 0 bytes, capacity 0 bytes, 0 hits, 0 requests, NaN recent hit rate, 0 save period in seconds

Counter Cache          : entries 52, size 6.65 KiB, capacity 12 MiB, 0 hits, 0 requests, NaN recent hit rate, 7200 save period in seconds

Chunk Cache            : entries 702, size 22.2 MiB, capacity 817 MiB, 702 misses, 18130 requests, 0.961 recent hit rate, 2358.204 microseconds miss latency

Percent Repaired       : 0.0%

Token                  : 0

ubuntu@ds201-node1:~$ /honme/ubuntu/node2/resources/cassandra/bin/nodetool info

bash: /honme/ubuntu/node2/resources/cassandra/bin/nodetool: No such file or directory

ubuntu@ds201-node1:~$ /home/ubuntu/node2/resources/cassandra/bin/nodetool info

ID                     : 606d2798-fa58-4656-bcba-5ef8a9c5012f

Gossip active          : true

Native Transport active: true

Load                   : 359.02 KiB

Generation No          : 1694107334

Uptime (seconds)       : 145059

Heap Memory (MB)       : 378.41 / 512.00

Off Heap Memory (MB)   : 0.00

Data Center            : Cassandra

Rack                   : rack1

Exceptions             : 0

Key Cache              : entries 0, size 0 bytes, capacity 25 MiB, 0 hits, 0 requests, NaN recent hit rate, 14400 save period in seconds

Row Cache              : entries 0, size 0 bytes, capacity 0 bytes, 0 hits, 0 requests, NaN recent hit rate, 0 save period in seconds

Counter Cache          : entries 28, size 3.59 KiB, capacity 12 MiB, 0 hits, 0 requests, NaN recent hit rate, 7200 save period in seconds

Chunk Cache            : entries 736, size 23.56 MiB, capacity 817 MiB, 736 misses, 15944 requests, 0.954 recent hit rate, 7143.424 microseconds miss latency

Percent Repaired       : 0.0%

Token                  : 4223372036854775807



Friday, September 15, 2023

clustering key restrictions

 cqlsh:aug_space> select * from movies_by_ratings;


 age_rating | movie_id                             | genres                | release_date                    | title

------------+--------------------------------------+-----------------------+---------------------------------+-----------------

          5 | 41d8ffdf-fb1e-43d0-85ae-9c4f358f35c2 |                  null | 2023-09-09 18:27:16.646000+0000 |            null

          5 | 93762f96-005c-4bd9-8994-1a6454a91334 |                  null | 2023-09-09 18:27:16.646000+0000 |            null

          8 | 236d3970-97f1-4f24-9bb1-d657cc698dac | {'animation', 'kids'} | 2023-09-09 00:00:00.000000+0000 | Jurassic Park 2

          8 | 6ae5a279-36e0-49db-b39d-f9b582f082a3 | {'animation', 'kids'} | 2023-09-09 00:00:00.000000+0000 |       Avatar  2

          7 | d9573b24-02cb-49b4-91ba-c040c5a0127a | {'animation', 'kids'} | 2023-09-09 00:00:00.000000+0000 |          Avatar

          9 | 2e70b67b-e6f7-4cf3-9493-b8e21641094d | {'animation', 'kids'} | 2023-09-09 00:00:00.000000+0000 | Sturat Little 3

          9 | 41d8ffdf-fb1e-43d0-85ae-9c4f358f35c2 | {'animation', 'kids'} | 2023-09-09 00:00:00.000000+0000 | Sturat Little 2

          9 | 7b085f61-4202-4bd8-b84c-f2a46baf0853 | {'animation', 'kids'} | 2023-09-09 00:00:00.000000+0000 |   Sturat Little

          9 | 93762f96-005c-4bd9-8994-1a6454a91334 | {'animation', 'kids'} | 2023-09-09 00:00:00.000000+0000 |   Jurassic Park


(9 rows)

cqlsh:aug_space> select * from movies_by_ratings order by movie_id desc;

InvalidRequest: Error from server: code=2200 [Invalid query] message="ORDER BY is only supported when the partition key is restricted by an EQ or an IN."


cqlsh:aug_space> select * from movies_by_ratings where age_rating=9 order by movie_id asc;


 age_rating | movie_id                             | genres                | release_date                    | title

------------+--------------------------------------+-----------------------+---------------------------------+-----------------

          9 | 2e70b67b-e6f7-4cf3-9493-b8e21641094d | {'animation', 'kids'} | 2023-09-09 00:00:00.000000+0000 | Sturat Little 3

          9 | 41d8ffdf-fb1e-43d0-85ae-9c4f358f35c2 | {'animation', 'kids'} | 2023-09-09 00:00:00.000000+0000 | Sturat Little 2

          9 | 7b085f61-4202-4bd8-b84c-f2a46baf0853 | {'animation', 'kids'} | 2023-09-09 00:00:00.000000+0000 |   Sturat Little

          9 | 93762f96-005c-4bd9-8994-1a6454a91334 | {'animation', 'kids'} | 2023-09-09 00:00:00.000000+0000 |   Jurassic Park


(4 rows)

cqlsh:aug_space> select * from movies_by_ratings where age_rating=9 order by title desc;

InvalidRequest: Error from server: code=2200 [Invalid query] message="Order by is currently only supported on the clustered columns of the PRIMARY KEY, got title"

Sunday, September 10, 2023

Gossip Protocol

Gossip is a peer-to-peer communication protocol in which nodes periodically exchange state information about themselves and about other nodes they know about. The gossip process runs every second and exchanges state messages with up to three other nodes in the cluster. The nodes exchange information about themselves and about the other nodes that they have gossiped about, so all nodes quickly learn about all other nodes in the cluster. A gossip message has a version associated with it, so that during a gossip exchange, older information is overwritten with the most current state for a particular node. 

Here is how the gossiper works: 
1. Once per second, the gossiper will choose a random node in the cluster and initi‐ alize a gossip session with it. Each round of gossip requires three messages.
2. The gossip initiator sends its chosen friend a GossipDigestSyn message. 
3. When the friend receives this message, it returns a GossipDigestAck message.
4. When the initiator receives the ack message from the friend, it sends the friend a GossipDigestAck2 message to complete the round of gossip.

more -> https://docs.datastax.com/en/cassandra-oss/3.x/cassandra/architecture/archGossipAbout.html





ubuntu@ds201-node1:~$ /home/ubuntu/node1/resources/cassandra/bin/nodetool describecluster
Cluster Information:
Name: Test Cluster
Snitch: com.datastax.bdp.snitch.DseDelegateSnitch
DynamicEndPointSnitch: enabled
Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
Schema versions:
00abff53-2837-363a-abd8-b75594c5631e: [127.0.0.1, 127.0.0.2, 127.0.0.3, 127.0.0.4]

ubuntu@ds201-node1:~$ /home/ubuntu/node1/resources/cassandra/bin/nodetool status aug_space;
Datacenter: Cassandra
=====================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address    Load       Tokens       Owns (effective)  Host ID                               Rack
UN  127.0.0.1  380.53 KiB  1            77.1%             39e7bbfa-2250-41a2-975b-f6a32e8677d0  rack1
UN  127.0.0.2  365.82 KiB  1            89.2%             606d2798-fa58-4656-bcba-5ef8a9c5012f  rack1
UN  127.0.0.3  335.28 KiB  1            89.2%             ab14e681-7a13-4027-8b4c-0f40a2428a85  rack1
UN  127.0.0.4  323.7 KiB  1            44.6%             a411cdb0-f8a1-4f4b-ac44-c1b4ff703124  rack1


ubuntu@ds201-node1:~$ /home/ubuntu/node1/resources/cassandra/bin/nodetool gossipinfo;
/127.0.0.1
  generation:1694106957
  heartbeat:120749
  STATUS:25:NORMAL,0
  LOAD:120733:389660.0
  SCHEMA:117734:00abff53-2837-363a-abd8-b75594c5631e
  DC:41:Cassandra
  RACK:18:rack1
  RELEASE_VERSION:4:4.0.0.2284
  NATIVE_TRANSPORT_ADDRESS:3:127.0.0.1
  X_11_PADDING:92056:{"dse_version":"6.0.0","workloads":"Cassandra","workload":"Cassandra","active":"true","server_id":"08-00-27-32-1E-DD","graph":false,"health":0.9}
  NET_VERSION:1:256
  HOST_ID:2:39e7bbfa-2250-41a2-975b-f6a32e8677d0
  NATIVE_TRANSPORT_READY:54:true
  NATIVE_TRANSPORT_PORT:6:9041
  NATIVE_TRANSPORT_PORT_SSL:7:9041
  STORAGE_PORT:8:7000
  STORAGE_PORT_SSL:9:7001
  JMX_PORT:10:7199
  TOKENS:24:<hidden>
/127.0.0.2
  generation:1694107334
  heartbeat:120358
  STATUS:51:NORMAL,4223372036854775807
  LOAD:120300:374600.0
  SCHEMA:117343:00abff53-2837-363a-abd8-b75594c5631e
  DC:65:Cassandra
  RACK:18:rack1
  RELEASE_VERSION:4:4.0.0.2284
  NATIVE_TRANSPORT_ADDRESS:3:127.0

Saturday, September 9, 2023

Secondary index in cassandra

 







cqlsh:aug_space> select * from movies;

 movie_id                             | age_rating | genres                | release_date                    | title
--------------------------------------+------------+-----------------------+---------------------------------+-----------------
 236d3970-97f1-4f24-9bb1-d657cc698dac |          8 | {'animation', 'kids'} | 2023-09-09 00:00:00.000000+0000 | Jurassic Park 2
 41d8ffdf-fb1e-43d0-85ae-9c4f358f35c2 |          9 | {'animation', 'kids'} | 2023-09-09 00:00:00.000000+0000 | Sturat Little 2
 6ae5a279-36e0-49db-b39d-f9b582f082a3 |          8 | {'animation', 'kids'} | 2023-09-09 00:00:00.000000+0000 |       Avatar  2
 d9573b24-02cb-49b4-91ba-c040c5a0127a |          7 | {'animation', 'kids'} | 2023-09-09 00:00:00.000000+0000 |          Avatar
 93762f96-005c-4bd9-8994-1a6454a91334 |          9 | {'animation', 'kids'} | 2023-09-09 00:00:00.000000+0000 |   Jurassic Park
 7b085f61-4202-4bd8-b84c-f2a46baf0853 |          9 | {'animation', 'kids'} | 2023-09-09 00:00:00.000000+0000 |   Sturat Little

cqlsh:aug_space> select * from movies where title='avatar';
InvalidRequest: Error from server: code=2200 [Invalid query] message="Cannot execute this query as it might involve data filtering and thus may have unpredictable performance. If you want to execute this query despite the performance unpredictability, use ALLOW FILTERING"
cqlsh:aug_space> select  token(movie_id) , movie_id from movies;

 system.token(movie_id) | movie_id
------------------------+--------------------------------------
   -7679664659544709434 | 236d3970-97f1-4f24-9bb1-d657cc698dac
   -5854020856241272382 | 41d8ffdf-fb1e-43d0-85ae-9c4f358f35c2
   -5353160429242411980 | 6ae5a279-36e0-49db-b39d-f9b582f082a3
   -5014329844387628670 | d9573b24-02cb-49b4-91ba-c040c5a0127a
    3478065203213068432 | 93762f96-005c-4bd9-8994-1a6454a91334
    6637426536803809222 | 7b085f61-4202-4bd8-b84c-f2a46baf0853

(6 rows)
cqlsh:aug_space> CREATE INDEX IF NOT EXISTS ON movies (title);

cqlsh:aug_space> select * from movies where title='Avatar';

 movie_id                             | age_rating | genres                | release_date                    | title
--------------------------------------+------------+-----------------------+---------------------------------+--------
 d9573b24-02cb-49b4-91ba-c040c5a0127a |          7 | {'animation', 'kids'} | 2023-09-09 00:00:00.000000+0000 | Avatar


cqlsh:aug_space> select * from movies_by_ratings where age_rating=9;

 age_rating | movie_id                             | genres                | release_date                    | title
------------+--------------------------------------+-----------------------+---------------------------------+-----------------
          9 | 41d8ffdf-fb1e-43d0-85ae-9c4f358f35c2 | {'animation', 'kids'} | 2023-09-09 00:00:00.000000+0000 | Sturat Little 2
          9 | 7b085f61-4202-4bd8-b84c-f2a46baf0853 | {'animation', 'kids'} | 2023-09-09 00:00:00.000000+0000 |   Sturat Little
          9 | 93762f96-005c-4bd9-8994-1a6454a91334 | {'animation', 'kids'} | 2023-09-09 00:00:00.000000+0000 |   Jurassic Park

(3 rows)
cqlsh:aug_space> desc table movies;

CREATE TABLE aug_space.movies (
    movie_id uuid PRIMARY KEY,
    age_rating int,
    genres set<text>,
    release_date timestamp,
    title text
) WITH bloom_filter_fp_chance = 0.01
    AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
    AND comment = ''
    AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}
    AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
    AND crc_check_chance = 1.0
    AND dclocal_read_repair_chance = 0.1
    AND default_time_to_live = 0
    AND gc_grace_seconds = 864000
    AND max_index_interval = 2048
    AND memtable_flush_period_in_ms = 0
    AND min_index_interval = 128
    AND read_repair_chance = 0.0
    AND speculative_retry = '99PERCENTILE';
CREATE INDEX movies_title_idx ON aug_space.movies (title);
CREATE INDEX movies_age_rating_idx ON aug_space.movies (age_rating);


for more information -->


SASI & regular index ---
https://www.baeldung.com/cassandra-secondary-indexes 


Use Where Clause in Cassandra - CQL

 

How to use Where Clause in Cassandra (CQL)

Where clause in Cassandra Query Language is used to filter the records.  To filter the records, you will be using 

  • Partition Key ( Single or composite)
  • A cluster column with Partition Key
  • A column on which you have created the index
  • A column which is not part of Partition Key or no index is created on it, you can use in where clause but you have to use with Allow Filtering.



Friday, September 8, 2023

Consistency level for LWT in Cassandra

 How to set consistency level  and serial consistency.

Cassandra’s lightweight transactions are limited to a single partition. Internally, Cassandra stores a Paxos state for each partition. This ensures that transactions on different partitions cannot interfere with each other.

cqlsh:aug_space> consistency ONE;
Consistency level set to ONE.
cqlsh:aug_space> SERIAL CONSISTENCY;
Current serial consistency level is SERIAL.
cqlsh:aug_space> Insert into bank_emp_record(Name, bank_name, Id) values('Dinesh', 'Axis Bank', 201) IF NOT EXISTS; 

 [applied] | name   | id  | bank_name
-----------+--------+-----+-----------
     False | Dinesh | 201 | Axis Bank

cqlsh:aug_space> INSERT INTO users (id, addresses, emails, ip_numbers) VALUES (UUID(), {'home':'192, 10th cross, wilson'} ,['shweta001@yahoo.com'],{'10.10.11.1', '10.10.10.1', '10.10.12.2'}) IF NOT EXISTS;

 [applied]
-----------
      True

cqlsh:aug_space> select * from users;

 id                                   | addresses                             | emails                  | ip_numbers
--------------------------------------+---------------------------------------+-------------------------+--------------------------------------------
 44b37d64-ba2b-435f-99a7-c5a4bc81a10c | {'home': ('192, 10th cross, wilson')} | ['shweta001@yahoo.com'] | {'10.10.10.1', '10.10.11.1', '10.10.12.2'}
 dd463353-a538-430d-810f-7df19a3cc148 | {'home': ('192, 10th cross, wilson')} | ['shweta001@yahoo.com'] | {'10.10.10.1', '10.10.11.1', '10.10.12.2'}
 1c914387-7e4f-4e69-9aad-72d9b854d31d | {'home': ('192, 10th cross, wilson')} | ['shweta001@yahoo.com'] | {'10.10.10.1', '10.10.11.1', '10.10.12.2'}
 6ab09bec-e68e-48d9-a5f8-97e6fb4c9b47 |                                  null |                    null |               {'10.10.14.1', '10.10.24.1'}
 732d1cb7-53cf-4b0a-892e-3edfcc2c1d51 | {'home': ('192, 10th cross, wilson')} | ['shweta001@yahoo.com'] | {'10.10.10.1', '10.10.11.1', '10.10.12.2'}

(5 rows)
cqlsh:aug_space> INSERT INTO users (id, addresses, emails, ip_numbers) VALUES (1234567, {'home':'192, 10th cross, wilson'} ,['shweta001@yahoo.com'],{'10.10.11.1', '10.10.10.1', '10.10.12.2'}) IF NOT EXISTS;
InvalidRequest: Error from server: code=2200 [Invalid query] message="Invalid INTEGER constant (1234567) for "id" of type uuid"
cqlsh:aug_space> INSERT INTO users (id, addresses, emails, ip_numbers) VALUES (732d1cb7-53cf-4b0a-892e-3edfcc2c1d51, {'home':'192, 10th cross, wilson'} ,['shweta001@yahoo.com'],{'10.10.11.1', '10.10.10.1', '10.10.12.2'}) IF NOT EXISTS;

 [applied] | id                                   | addresses                             | emails                  | ip_numbers
-----------+--------------------------------------+---------------------------------------+-------------------------+--------------------------------------------
     False | 732d1cb7-53cf-4b0a-892e-3edfcc2c1d51 | {'home': ('192, 10th cross, wilson')} | ['shweta001@yahoo.com'] | {'10.10.10.1', '10.10.11.1', '10.10.12.2'}

cqlsh:aug_space> 


cqlsh:aug_space>  select * from bank_emp_record;

 name   | id  | bank_name
--------+-----+--------------------------
  David | 410 |               ICICI Bank
 Dinesh | 201 |                Axis Bank
 Dinesh | 202 |                Axis Bank
 Dinesh | 203 |                Axis Bank
 Dinesh | 204 |                Axis Bank
 Ashish | 101 | Employee bank change new
 Ashish | 102 | Employee bank change new
 Ashish | 103 | Employee bank change new
 Ashish | 104 | Employee bank change new
 Ashish | 105 | Employee bank change new

(10 rows)
cqlsh:aug_space>  update bank_emp_record set bank_name ='HDFC Bank'  where name='David'  IF EXISTS;

 [applied]
-----------
      True

cqlsh:aug_space>  update bank_emp_record set bank_name ='ICICI Bank'  where name='David' ;
cqlsh:aug_space>  select * from bank_emp_record;

 name   | id  | bank_name
--------+-----+--------------------------
  David | 410 |               ICICI Bank
 Dinesh | 201 |                Axis Bank
 Dinesh | 202 |                Axis Bank
 Dinesh | 203 |                Axis Bank
 Dinesh | 204 |                Axis Bank
 Ashish | 101 | Employee bank change new
 Ashish | 102 | Employee bank change new
 Ashish | 103 | Employee bank change new
 Ashish | 104 | Employee bank change new
 Ashish | 105 | Employee bank change new

(10 rows)
cqlsh:aug_space>  update bank_emp_record set bank_name ='ICICI Bank'  where name='David'  IF bank_name ='HDFC Bank' ;

 [applied] | bank_name
-----------+------------
     False | ICICI Bank


Tuesday, August 22, 2023

Quick useful Cassandra links



changing  Vnode  impact on Cluster --> https://thelastpickle.com/blog/2021/01/29/impacts-of-changing-the-number-of-vnodes.html

ER diagram  shapes  --> https://www.geeksforgeeks.org/introduction-of-er-model/ 

https://www.edureka.co/blog/interview-questions/cassandra-interview-questions/





 Data modelling in cassandra

https://www.datastax.com/learn/data-modeling-by-example




Query First approach 

Cassandra gets a lot of its speed in database reads and writes from the fact that it never has to perform any joins on the database in fact in Cassandra it’s impossible to perform joins, so instead of taking a relational model approach we take a query first approach, this means we design our tables for a specific query, as a result, we want to have every table which is catered for a specific query, rather than flexible tables such as this employee table and the company car table, hence, we only ever have to query one table when we’re reading or writing data, obviously there are some consequences to this approach, as we might end up writing the same data to multiple tables just to satisfy different queries but it’s ideal for a Cassandra’s distributed architecture.



Read  more on read repair 




Monday, August 7, 2023

Cassandra is key value or columnar

 Cassandra is not a columnar database.
A columnar/column-store/column-oriented database, as you said, guarantees data locality for a single column, within a given node, on disk. This is a column that spans many or all rows depending on if, or how, you specify partitions and what the database supports.

Cassandra is a column-family* store. A column-family store ensures data locality at the partition level, not the column level. In a database like Cassandra a partition is a group of rows and columns split up by a specified partition key, then clustered together by specified clustering column(s) (optional). To query Cassandra, you must know, at a minimum, the partition key in order to avoid full scans of your data.
All data for a given partition in Cassandra is guaranteed to be on the same node and in a given file (SSTable) in the same location within that file. The one thing to note here is that depending on your compaction strategy, the partition can be split across multiple files on disk, so data locality on disk is not a guarantee.

Column stores lend themselves to, and are designed for, analytic workloads. Because each column is in the same location on disk, they can read all information for a given column across many/all rows incredibly fast. This comes at the cost of very slow writes which usually need to be done in batch loads to avoid drastic performance implications.

Column-family stores, like Cassandra, are a great choice if you have high throughput writes and want to be able to linearly scale horizontally. Reads that use the partition key are incredibly fast as the partition key defines where the data resides. The downfall here is that if you need to do any sort of ad-hoc query, a full scan of all data is required.
* The term "column-family" comes from the original storage engine that was a key/value store, where the value was a "family" of column/value tuples. There was no hard limit on the number of columns that each key could have. In Cassandra this was later abstracted into "partitions", then eventually the storage engine was modified to match the abstraction.

Courtsy --> https://www.quora.com/How-is-Cassandra-a-columnar-database